
An Yan

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
Visualize Before You Write: Imagination-Guided Open-Ended Text Generation
CLIP also Understands Text: Prompting CLIP for Phrase Understanding
ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation
Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation
