Published: Apr 03, 2024
Assess

LLaVA (Large Language and Vision Assistant) is an open-source, large multimodal model that connects a vision encoder and an LLM for general-purpose visual and language understanding. LLaVA's strong instruction-following proficiency makes it a highly competitive contender among multimodal AI models. Its latest version, LLaVA-NeXT, offers improved question answering. Among open-source language-and-vision models, LLaVA is a promising alternative to GPT-4 Vision. Our teams have been experimenting with it for visual question answering.
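For teams assessing LLaVA, the sketch below shows what visual question answering can look like through the Hugging Face Transformers integration of LLaVA 1.5. The checkpoint id, image URL and prompt template are illustrative assumptions for this example rather than part of the Radar text, and the exact prompt format varies by checkpoint.

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Assumed checkpoint: the llava-hf organization publishes converted
# LLaVA weights on the Hugging Face Hub.
MODEL_ID = "llava-hf/llava-1.5-7b-hf"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,  # half precision to fit on a single consumer GPU
    device_map="auto",
)

# Placeholder image URL; any RGB image works for illustration.
url = "https://example.com/cat.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# LLaVA 1.5 checkpoints expect the <image> token inside a
# USER/ASSISTANT-style prompt; the vision encoder's features are
# spliced in where the token appears.
prompt = "USER: <image>\nWhat is unusual about this image? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

The same pattern applies to LLaVA-NeXT via its corresponding model class and checkpoints; only the model id and prompt template change.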
