Published: Apr 03, 2024
Apr 2024: Assess
LLaVA (Large Language-and-Vision Assistant) is an open-source, large multimodal model that connects a vision encoder and an LLM for general-purpose visual and language understanding. LLaVA's strong proficiency in instruction-following positions it as a highly competitive contender among multimodal AI models. The latest version, LLaVA-NeXT, improves visual reasoning, OCR and world knowledge, which translates into better question answering. Among open-source models for language and vision assistance, LLaVA is a promising alternative to proprietary options such as GPT-4 Vision. Our teams have been experimenting with it for visual question answering.
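To make the visual question answering workflow concrete, here is a minimal sketch using the community Hugging Face transformers port of LLaVA-NeXT (the llava-hf checkpoints). The model id, prompt template and image URL are illustrative assumptions based on that port, not part of the original blip or our teams' setup.

```python
# Minimal visual question answering with the llava-hf port of LLaVA-NeXT.
# Requires: pip install transformers pillow requests
import requests
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

# Assumed checkpoint name from the community llava-hf port.
model_id = "llava-hf/llava-v1.6-mistral-7b-hf"
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(model_id)

# Load an image to ask a question about (hypothetical URL).
image = Image.open(
    requests.get("https://example.com/chart.png", stream=True).raw
)

# Prompt template for the Mistral-based variant; <image> marks where
# the vision encoder's output is spliced into the LLM's input.
prompt = "[INST] <image>\nWhat is shown in this image? [/INST]"

# The processor tokenizes the text and preprocesses the image together.
inputs = processor(text=prompt, images=image, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```

The same pattern extends to other LLaVA-NeXT variants by swapping the checkpoint id and matching the prompt template documented for that backbone.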