LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

Preprint (arXiv 2305.15393)

Multimodal Procedural Planning via Dual Text-Image Prompting

Preprint (arXiv 2305.01795)

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text

Preprint (arXiv 2304.06939)

OpenFlamingo: An Open-Source Framework for Training Vision-Language Models with In-Context Learning

Stay tuned for the technical report!

Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation

Preprint (arXiv 2305.11317)

Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning

Preprint (arXiv 2301.11916)

CLIP also Understands Text: Prompting CLIP for Phrase Understanding

Preprint (arXiv 2210.05836)

Text Infilling

Preprint (arXiv 1901.00158)