Published: Oct 23, 2024
Oct 2024
Assess
LLMLingua enhances LLM efficiency by compressing prompts with a small language model that removes nonessential tokens with minimal performance loss. This approach lets LLMs retain their reasoning and in-context learning abilities while processing longer prompts efficiently, addressing challenges such as cost, inference latency and context handling. Compatible with a wide range of LLMs without additional training, and supporting frameworks such as LlamaIndex, LLMLingua is a strong candidate for optimizing LLM inference performance.
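To make the workflow concrete, here is a minimal sketch of prompt compression with the llmlingua Python package. The model name and parameters follow the project's README at the time of writing, but defaults can differ between library versions, so treat the specifics as assumptions rather than the definitive API.

```python
# Minimal sketch of prompt compression with the llmlingua package
# (pip install llmlingua). Model name and parameters follow the
# project's README; defaults may differ across library versions.
from llmlingua import PromptCompressor

# LLMLingua-2 uses a small token-classification model to decide which
# tokens to keep; the model is downloaded on first use.
compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)

# A long context you want to shrink before sending it to the LLM.
long_prompt = (
    "Meeting notes: The team discussed the Q3 roadmap in detail. "
    "First, the data pipeline migration was reviewed... "
    "Question: What were the main decisions from the meeting?"
)

result = compressor.compress_prompt(
    long_prompt,
    rate=0.33,                 # keep roughly a third of the tokens
    force_tokens=["\n", "?"],  # tokens that must survive compression
)

# The result is a dict; 'compressed_prompt' is what you pass to the LLM
# in place of the original, cutting token cost and latency.
print(result["compressed_prompt"])
print(result["origin_tokens"], "->", result["compressed_tokens"])
```

Because compression happens before the LLM call, the same compressed prompt can be forwarded to any model or framework, which is what makes the technique model-agnostic and easy to slot into pipelines such as LlamaIndex.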