Overview

Large Language Models (LLMs) like OpenAI's GPT-4 are trained on vast amounts of data.

Retrieval-Augmented Generation, or RAG, is a technique used to give an LLM context from external knowledge that it is not otherwise aware of.

This data can come from many places: an API, a web search or scrape, a database, and more.

RAG techniques can also be used to steer the LLM toward a more predictable response. With relevant context supplied in the prompt, the model is less likely to fall back on information embedded in its training data and tuned parameters.

This reduces the chance that the LLM will “hallucinate” an incorrect answer.

Let's take a look at an example. When asking GPT-4 a question, most of us go to chat.openai.com.

Here is where we can ask questions or prompt GPT.

A prompt could look something like this:

What is the name of Elon's youngest son?

Here is what a prompt could look like using RAG:

Use the following context to answer the question:
Question: What is the name of Elon's youngest son?
Context: Musk tweeted, later adding: “Mom & baby all good.” When asked to reveal the name of his son, Musk tweeted: “X Æ A-12 Musk.” Reported by <https://www.independent.co.uk/>

In the example above, we paste the context directly into chat.openai.com, which doesn't make a lot of sense 😅. Why would we first search Google for the answer, then ask ChatGPT to regurgitate it back to us in better words?

However, you could imagine that with code we have much more flexibility in how we create the context and how we send data to GPT-4 through OpenAI's API.
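As a rough sketch of that idea, we could build the same prompt programmatically and hand it to the API. This is a minimal example, not a full implementation: the helper name `build_prompt` is our own, and the API call assumes the `openai` Python package with an `OPENAI_API_KEY` environment variable set.

```python
import os


def build_prompt(question: str, context: str) -> str:
    """Combine a question and retrieved context into a RAG-style prompt."""
    return (
        "Use the following context to answer the question:\n"
        f"Question: {question}\n"
        f"Context: {context}"
    )


prompt = build_prompt(
    "What is the name of Elon's youngest son?",
    'When asked to reveal the name of his son, Musk tweeted: "X Æ A-12 Musk."',
)
print(prompt)

# With a key configured, the prompt could then be sent to GPT-4 via
# OpenAI's chat completions API (sketch; skipped when no key is set):
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)
```

The key point is that the context string no longer has to be typed by hand; any code that can produce a string can feed the model.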

We could programmatically create a prompt that combines a question with context. We could reformat the prompt, which in turn can yield better results from GPT-4. Or we could build that context by grabbing data from an API, searching or scraping the web, querying a database, and more.
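To make the retrieval step concrete, here is a toy sketch: a naive keyword-overlap retriever that picks the most relevant snippet from a small in-memory list before building the prompt. In a real system this stand-in would be replaced by an API call, a web search, or a database query; the `retrieve` function and the sample documents are purely illustrative.

```python
def retrieve(question: str, documents: list[str]) -> str:
    """Naive retrieval: return the document that shares the most
    words with the question (a stand-in for real search)."""
    q_words = set(question.lower().split())
    return max(documents, key=lambda d: len(q_words & set(d.lower().split())))


documents = [
    "When asked the name of his son, Musk tweeted: 'X Æ A-12 Musk.'",
    "Tesla reported record deliveries this quarter.",
    "SpaceX launched another batch of Starlink satellites.",
]

question = "What is the name of Elon's youngest son?"
context = retrieve(question, documents)

prompt = (
    "Use the following context to answer the question:\n"
    f"Question: {question}\n"
    f"Context: {context}"
)
print(prompt)
```

Here the first document wins because it shares the most words with the question ("the", "name", "of"), so it becomes the context, and the final prompt matches the hand-written RAG example above.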