Publications & Preprints

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

Preprint (arXiv 2311.07562)

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

The Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)

Multimodal Procedural Planning via Dual Text-Image Prompting

Preprint (arXiv 2305.01795)

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text

The Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track (NeurIPS D&B 2023)

OpenFlamingo: An Open-Source Framework for Training Vision-Language Models with In-Context Learning

Preprint (arXiv 2308.01390)

CLIP also Understands Text: Prompting CLIP for Phrase Understanding

Preprint (arXiv 2210.05836)

Neuro-Symbolic Causal Language Planning with Commonsense Prompting

The 11th International Conference on Learning Representations (ICLR 2023, Spotlight)

End-to-end Dense Video Captioning as Sequence Generation

The 29th International Conference on Computational Linguistics (COLING 2022)

Diagnosing Vision-and-Language Navigation: What Really Matters

The 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2022, Oral)

Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation

The 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations (ACL 2019 System Demonstration)

Text Infilling

Preprint (arXiv 1901.00158)