Vision-and-Language

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

The Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track (NeurIPS D&B 2023)

Multimodal Procedural Planning via Dual Text-Image Prompting

Preprint (arXiv 2305.01795)

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text

The Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track (NeurIPS D&B 2023)

OpenFlamingo: An Open-Source Framework for Training Vision-Language Models with In-Context Learning

Preprint (arXiv 2308.01390)

Visualize Before You Write: Imagination-Guided Open-Ended Text Generation

The 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023, Findings)

End-to-end Dense Video Captioning as Sequence Generation

The 29th International Conference on Computational Linguistics (COLING 2022)

Imagination-Augmented Natural Language Understanding

The 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2022, Oral)

ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation

The 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023, Findings)

Diagnosing Vision-and-Language Navigation: What Really Matters

The 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2022, Oral)

Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation

The 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021)