1. MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training (arXiv)
Author : Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Mark Lee, et al. (5 additional authors not shown)
Abstract : In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision-language connector, and various pre-training data choices, we identify several crucial design lessons. For example, we demonstrate that for large-scale multimodal pre-training, using a careful mix of image-caption, interleaved image-text, and text-only data is crucial for achieving state-of-the-art (SOTA) few-shot results across multiple benchmarks, compared to other published pre-training results. Further, we show that the image encoder, together with image resolution and the image token count, has substantial impact, while the vision-language connector design is of comparatively negligible importance. By scaling up the presented recipe, we build MM1, a family of multimodal models up to 30B parameters, consisting of both dense models and mixture-of-experts (MoE) variants, that are SOTA in pre-training metrics and achieve competitive performance after supervised fine-tuning on a range of established multimodal benchmarks. Thanks to large-scale pre-training, MM1 enjoys appealing properties such as enhanced in-context learning and multi-image reasoning, enabling few-shot chain-of-thought prompting.
2. Taming Pre-trained LLMs for Generalised Time Series Forecasting via Cross-modal Knowledge Distillation (arXiv)
Author : Peiyuan Liu, Hang Guo, Tao Dai, Naiqi Li, Jigang Bao, Xudong Ren, Yong Jiang, Shu-Tao Xia
Abstract :