1. MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training (arXiv)
Authors : Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Mark Lee, et al. (5 additional authors not shown)
Abstract : In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision-language connector, and various pre-training data choices, we identified several crucial design lessons. For example, we demonstrate that for large-scale multimodal pre-training, using a careful mix of image-caption, interleaved image-text, and text-only data is crucial for achieving state-of-the-art (SOTA) few-shot results across multiple benchmarks, compared to other published pre-training results. Further, we show that the image encoder, together with image resolution and the image token count, has substantial impact, while the vision-language connector design is of comparatively negligible importance. By scaling up the presented recipe, we build MM1, a family of multimodal models up to 30B parameters, consisting of both dense models and mixture-of-experts (MoE) variants, that are SOTA in pre-training metrics and achieve competitive performance after supervised fine-tuning on a range of established multimodal benchmarks. Thanks to large-scale pre-training, MM1 enjoys appealing properties such as enhanced in-context learning and multi-image reasoning, enabling few-shot chain-of-thought prompting.
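To make the "vision-language connector" idea concrete, here is a minimal sketch (not the authors' code) of the kind of module the abstract refers to: it maps patch features from an image encoder into the LLM's embedding space, with the image token count as an explicit knob. Since the paper reports that connector design matters far less than image resolution and token count, a plain average-pool plus linear projection suffices for illustration. All dimensions, class names, and defaults below are assumptions for the example, not MM1's actual configuration.

```python
# Illustrative sketch of a simple vision-language connector (assumed design, not MM1's).
import torch
import torch.nn as nn


class AvgPoolProjector(nn.Module):
    """Average-pool image-encoder patch features down to a fixed number of
    image tokens, then linearly project them to the LLM hidden size."""

    def __init__(self, vit_dim=1024, llm_dim=4096, num_image_tokens=144):
        super().__init__()
        # Per the abstract, the number of image tokens is one of the impactful knobs.
        self.num_image_tokens = num_image_tokens
        self.proj = nn.Linear(vit_dim, llm_dim)

    def forward(self, patch_features):
        # patch_features: (batch, num_patches, vit_dim) from the image encoder.
        pooled = nn.functional.adaptive_avg_pool1d(
            patch_features.transpose(1, 2),  # (batch, vit_dim, num_patches)
            self.num_image_tokens,
        ).transpose(1, 2)                    # (batch, num_image_tokens, vit_dim)
        return self.proj(pooled)             # (batch, num_image_tokens, llm_dim)


if __name__ == "__main__":
    # Example: 576 patches from a ViT-style encoder, reduced to 144 image tokens.
    fake_patches = torch.randn(2, 576, 1024)
    connector = AvgPoolProjector()
    print(connector(fake_patches).shape)  # torch.Size([2, 144, 4096])
```

The projected image tokens would simply be concatenated with text token embeddings before being fed to the decoder; varying `num_image_tokens` (and the input image resolution upstream) is where the paper finds most of the gains.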
2. Taming Pre-trained LLMs for Generalised Time Series Forecasting via Cross-modal Knowledge Distillation (arXiv)
Authors : Peiyuan Liu, Hang Guo, Tao Dai, Naiqi Li, Jigang Bao, Xudong Ren, Yong Jiang, Shu-Tao Xia
Abstract :