
Retrieval-augmented generation

Retrieval-augmented generation (RAG) is a technique that improves the quality of outputs from the large language models (LLMs) used in Generative AI (GenAI) and reduces the chance of false or inaccurate outputs, usually referred to as 'hallucinations'. It works by augmenting outputs with information from a verified source.

The advantages are much like the benefits of an 'open-book' over a 'closed-book' exam: the LLM can refer to a trustworthy source of information in addition to the material on which it's been trained.

What is it?

A technique for improving the reliability and accuracy of LLM outputs in generative AI applications.

What’s in it for you?

RAG can help you minimize the risk of hallucinations and build more reliable generative AI products without the need for further training.

What are the trade-offs?

RAG can slow LLM response times and requires additional infrastructure, which means more time and money.

How is it being used?

It's being used to develop more reliable and effective products, helping increase confidence that generative AI can be used in production.

What is retrieval-augmented generation?

 

Retrieval-augmented generation is a technique that improves the accuracy and reliability of LLM outputs. It effectively works as a kind of add-on to an LLM, supporting the model by connecting it with additional external sources of knowledge that are relevant and authoritative in the domain in which the LLM is being used.

 

It essentially ‘tops up’ the LLM’s knowledge, ensuring its outputs are less likely to ‘hallucinate’. In AI applications, hallucination is a common issue where results containing false or misleading information are presented as fact.
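To make the flow concrete, here is a minimal sketch in Python. It is illustrative only: the generate() function is a hypothetical stand-in for whichever LLM client you use, and the keyword-overlap retriever is a toy; production systems typically retrieve passages from a vector store using embedding similarity.

# Minimal RAG sketch. Assumptions: generate() wraps your LLM client of
# choice; the keyword-overlap retriever below is a toy stand-in for a
# proper embedding-based search over a vector store.

KNOWLEDGE_BASE = [
    "Scheme A provides housing subsidies for families below the income threshold.",
    "Scheme B covers school fees for children aged six to fourteen.",
    "Scheme C was discontinued in 2023 and replaced by Scheme A.",
]

def generate(prompt: str) -> str:
    # Placeholder: call your LLM API (hosted or local) here.
    raise NotImplementedError("plug in your LLM client")

def retrieve(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    # Rank documents by how many words they share with the question (toy scorer).
    terms = set(question.lower().split())
    ranked = sorted(documents, key=lambda d: len(terms & set(d.lower().split())), reverse=True)
    return ranked[:top_k]

def answer_with_rag(question: str) -> str:
    # 1. Retrieve relevant passages, 2. ground the prompt in them, 3. generate.
    context = "\n".join(retrieve(question, KNOWLEDGE_BASE))
    prompt = (
        "Answer using only the context below. If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return generate(prompt)

The important point is the shape of the flow: retrieve first, then ground the prompt in what was retrieved, so the model answers from verified material rather than from memory alone.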

What’s in it for you?

 

LLMs are prone to producing erroneous results; RAG makes them more accurate and reliable. That means they can be deployed with greater confidence and fewer risks that they’ll damage your reputation or undermine customer experience. 

 

It’s true that RAG isn’t the only way to improve LLM performance. You can, for example, continue to train or fine-tune a model. However, this can raise a number of issues, such as computational costs. RAG can help you get a more specific and accurate LLM without fine-tuning.

What are the trade-offs of retrieval-augmented generation?

 

Although RAG can improve LLM performance, there are a number of drawbacks that need to be considered before adopting it.

 

  • Performance: Searching and processing external data can be slower than a standard LLM response.

  • Resources: Setting up and maintaining RAG requires additional infrastructure and data storage compared to a basic LLM.

  • Relevancy: The retrieved information might not always be perfectly on point, which is why the retrieval system needs to be carefully designed; one simple safeguard is sketched below.
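As a rough illustration of that last point, one common safeguard is to discard retrieved passages whose similarity to the query falls below a threshold, so irrelevant text never reaches the prompt. In this sketch the embeddings are assumed to come from whatever embedding model you already use; cosine similarity is computed by hand to keep the example self-contained, and the 0.75 threshold is an arbitrary starting point to tune.

import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def filter_relevant(query_vec: list[float],
                    passages: list[tuple[str, list[float]]],
                    min_score: float = 0.75) -> list[str]:
    # Keep only passages similar enough to the query to be worth putting
    # in the prompt, ordered from most to least relevant.
    scored = [(cosine(query_vec, vec), text) for text, vec in passages]
    return [text for score, text in sorted(scored, reverse=True) if score >= min_score]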

How is retrieval-augmented generation being used?

 

RAG is being used in a number of ways. In search engines, for example, it helps to better understand search intent and provide more relevant results. RAG is also improving the quality of chatbots, helping them deliver more accurate answers, which makes for a much better customer experience and, ultimately, ensures generative AI is more impactful for a business.


It's also particularly useful in contexts like translation and document summarization: using a RAG architecture that retrieves knowledge from specific resources can make the system's outputs more reliable. Thoughtworks used RAG to improve the outputs of a chatbot designed to help people in India find information about government support services in their native language. Given these services often change, RAG helps ensure information is up to date, so users can trust and rely upon the tool.
