Vision-and-Language

Multimodal Procedural Planning via Dual Text-Image Prompting

Preprint (arXiv 2305.01795)

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text

Preprint (arXiv 2304.06939)

OpenFlamingo: An Open-Source Framework for Training Vision-Language Models with In-Context Learning

Stay tuned for the technical report!

Visualize Before You Write: Imagination-Guided Open-Ended Text Generation

The 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023, Findings)

End-to-end Dense Video Captioning as Sequence Generation

The 29th International Conference on Computational Linguistics (COLING 2022)

Imagination-Augmented Natural Language Understanding

The 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2022, Oral)

ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation

The 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023, Findings)

Diagnosing Vision-and-Language Navigation: What Really Matters

The 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2022, Oral)

Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation

The 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021)

Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations

The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020, Short)