Wanrong Zhu
Latest
VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View
LayoutGPT: Compositional Visual Planning and Generation with Large Language Models
Multimodal Procedural Planning via Dual Text-Image Prompting
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text
Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation
Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning
Visualize Before You Write: Imagination-Guided Open-Ended Text Generation
CLIP also Understands Text: Prompting CLIP for Phrase Understanding
Neuro-Symbolic Causal Language Planning with Commonsense Prompting
End-to-end Dense Video Captioning as Sequence Generation
Imagination-Augmented Natural Language Understanding
ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation
Diagnosing Vision-and-Language Navigation: What Really Matters
Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation
Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations