Large Language Models (LLMs) like OpenAI's GPT-4 are trained on vast amounts of data.
Retrieval-Augmented Generation, or RAG, is a technique used to give an LLM context from external knowledge that it is not aware of.
This data can be:
RAG techniques can also be used to steer the LLM toward a predictable response. With this added context, the model is less likely to rely solely on information embedded in its training data and tuned parameters.
This reduces the chance that the LLM will “hallucinate” an incorrect answer.
Let's take a look at an example. When asking GPT-4 a question, most of us go to chat.openai.com.
This is where we can ask questions or prompt GPT.
A prompt could look something like this:
What is the name of Elon's youngest son?
Here is what a prompt could look like using RAG:
Use the following context to answer the question:
Question: What is the name of Elon's youngest son?
Context: Musk tweeted, later adding: “Mom & baby all good.” When asked to reveal the name of his son, Musk tweeted: “X Æ A-12 Musk.” Reported by <https://www.independent.co.uk/>
The example above has us pasting the context directly into chat.openai.com, which doesn't make a lot of sense 😅. Why would we first search Google for the answer, then ask ChatGPT to regurgitate it back to us in better words?
However, you can imagine that with code, we have a lot more flexibility in how we create the context and how we send data over to GPT-4 using OpenAI's API.
We could programmatically create a prompt that includes a question and context, or reformat the prompt to get better results from GPT-4. We could also build that context by grabbing data from an API, searching or scraping the web, querying a database, and more.
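As a minimal sketch of that idea, here's how we might assemble the earlier prompt in Python. The `build_rag_prompt` helper is a hypothetical name for illustration; the commented-out section shows how the result could be sent to GPT-4 with OpenAI's Python client, assuming the `openai` package is installed and an API key is configured.

```python
def build_rag_prompt(question: str, context: str) -> str:
    """Combine a user question and retrieved context into a single RAG-style prompt."""
    return (
        "Use the following context to answer the question:\n"
        f"Question: {question}\n"
        f"Context: {context}"
    )


# The question and context from the example above; in a real pipeline the
# context would come from an API call, a web search/scrape, or a database query.
question = "What is the name of Elon's youngest son?"
context = (
    "Musk tweeted, later adding: “Mom & baby all good.” When asked to reveal "
    "the name of his son, Musk tweeted: “X Æ A-12 Musk.”"
)

prompt = build_rag_prompt(question, context)
print(prompt)

# Sending the prompt to GPT-4 (sketch, assuming the `openai` package and an
# OPENAI_API_KEY environment variable):
#
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(
#     model="gpt-4",
#     messages=[{"role": "user", "content": prompt}],
# )
# print(response.choices[0].message.content)
```

The key point is that the prompt is just a string we control: we can swap in any context source, template, or formatting strategy before the request ever reaches the model.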