OpenFlamingo: An Open-Source Framework for Training Vision-Language Models with In-Context Learning

Abstract

We are thrilled to announce the release of OpenFlamingo, an open-source reproduction of DeepMind’s Flamingo model. At its core, OpenFlamingo is a framework that enables training and evaluation of large multimodal models (LMMs). Check out our GitHub repository and demo to get started! For this first release, our contributions are as follows (1) ๐Ÿ‹๏ธ A Python framework to train Flamingo-style LMMs (based on Lucidrains' flamingo implementation and David Hansmair’s flamingo-mini repository). (2) ๐Ÿช… A large-scale multimodal dataset with interleaved image and text sequences. (3) ๐Ÿงช An in-context learning evaluation benchmark for vision-language tasks. (4) ๐Ÿค– A first version of our OpenFlamingo-9B model based on LLaMA, with much better models to come!

Publication
On arXiv
Wanrong Zhu
Wanrong Zhu
CS Ph.D. Candidate

My research interests include vision-and-language problems and text generation.

Related