RAG: Let ChatGPT handle the formation of contexts
Retrieval/context formation is one of the most important parts of the RAG pipeline. This is the retrieval strategy we use at EazyRAG.
I am currently building a product called EazyRAG. It is a dead simple API for Retrieval-Augmented Generation (RAG); in simple words, an API for ChatGPT with your own data.
In this article, I will share the retrieval strategy I use to form the final context at EazyRAG. Before diving in, there is one concept you need to understand.
What is RAG?
LLMs such as GPT lack factual consistency and up-to-date knowledge. RAG is a technique that addresses this: it retrieves documents related to the user's question, combines them with a base prompt, and sends the result to an LLM like GPT to produce a more factually accurate answer.
I've written an extensive article on building an RAG pipeline without using Langchain. It will give you a good idea of what RAG is.
Here is what the standard RAG pipeline looks like, before our changes.
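To make that concrete, here is a minimal sketch of that flow in Python. Everything in it is illustrative rather than EazyRAG's implementation: the chunk texts, the keyword-overlap search_chunks stand-in for a real vector store, and the model choice are all assumptions, and it assumes the openai>=1.x SDK with an OPENAI_API_KEY in the environment.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A toy "index": in a real pipeline these chunks would come from a vector store.
chunks = [
    "The Eiffel Tower is about 330 metres tall, including its antennas.",
    "It was completed in 1889 for the World's Fair in Paris.",
    "Visitors can take a lift to the top or climb the stairs to the second floor.",
]

def search_chunks(query: str, k: int = 2) -> list[str]:
    # Stand-in for vector search: rank chunks by word overlap with the query.
    words = set(query.lower().split())
    return sorted(chunks, key=lambda c: -len(words & set(c.lower().split())))[:k]

def naive_rag_answer(question: str) -> str:
    # Retrieve the most relevant chunks and shove them under a base prompt.
    context = "\n\n".join(search_chunks(question))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    # Send the combined prompt to the LLM.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(naive_rag_answer("What is the height of the Eiffel Tower?"))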
Retrieval/Context formation
Now, here is how we differ from the general RAG pipeline: we don't just retrieve chunks, shove them under the base prompt, and call it a day. That approach may work for questions like 'What is the height of the Eiffel Tower?' but it is not effective all the time. Imagine you have an article called 'How to make spaghetti,' and you've split the long article into pieces and indexed it. When a user queries it, you retrieve all the relevant chunks. Herein lies the problem: some general cooking instructions, such as 'boil the water and season it,' won't score well against the user's search query, which in turn results in poor context formation and increases hallucinations.
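You can see this effect with a small, self-contained experiment (the chunk texts and the embedding model are just illustrative; it assumes the openai>=1.x SDK): compare how a chunk that mentions spaghetti and a general instruction like 'Boil the water and season it generously' score against the query.

import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

query = "how to make spaghetti"
chunks = [
    "Cook the spaghetti until al dente, then toss it with the tomato sauce.",
    "Boil the water and season it generously.",  # general step, never mentions spaghetti
]

# Embed the query and both chunks in a single call.
resp = client.embeddings.create(model="text-embedding-3-small", input=[query] + chunks)
vectors = [np.array(item.embedding) for item in resp.data]
q, docs = vectors[0], vectors[1:]

# Print the cosine similarity between the query and each chunk.
for chunk, d in zip(chunks, docs):
    score = float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))
    print(f"{score:.3f}  {chunk}")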
Relevance isn't the only issue: shoving unstructured, messy chunks into the prompt also decreases the quality of the answer.
Here is a comparison between Langchain (code taken from its Quickstart example) and EazyRAG.
Langchain
from langchain.document_loaders import WebBaseLoader
from langchain.indexes import VectorstoreIndexCreator

# Load the page, build a vector-store index from it, and query it.
loader = WebBaseLoader("https://bun.sh/docs/installation")
index = VectorstoreIndexCreator().from_loaders([loader])
index.query("How do I install this on a Mac?")
Output by Langchain
" You can install Bun on a Mac using the Homebrew package manager. Run the command 'brew tap oven-sh/bun' followed by 'brew install bun'."
And here is the answer by EazyRAG
Not to brag, but I think EazyRAG's answer is better, since it offers multiple options. What went wrong with Langchain is that when I dug into the final prompt, all the retrieved chunks were in a confusing format without proper spacing. Anyway, this is not a rant about Langchain; I have huge respect for that project.
So how does EazyRAG do it?
When I was building this layer, I knew the API had to work with all types of datasets and any type of search intent. I couldn't manually customize context formation for every type of dataset, so why not let GPT itself handle context formation?
Here's how it works:
When a user submits a query, we retrieve the relevant documents related to the document title and send them to GPT to classify whether we should consider feeding the entire documents in for this query and, if so, which documents to include. We then feed in every document the GPT classifier mentions. If the classifier returns nothing, we simply put all the chunks into the base prompt and send it to the LLM for answer generation. And it is working great!
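To make the idea concrete, here is a rough sketch of what such a classification step could look like. This is not EazyRAG's actual code: the prompt wording, the fetch_full_document helper, and the model choice are all assumptions.

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def fetch_full_document(title: str) -> str:
    # Hypothetical helper: load the full text of a document by its title.
    raise NotImplementedError("plug in your own document store here")

def pick_full_documents(query: str, retrieved: list[dict]) -> list[str]:
    # Ask GPT whether any source documents should be fed in whole, and which ones.
    titles = sorted({chunk["doc_title"] for chunk in retrieved})
    prompt = (
        f"A user asked: {query}\n"
        "The retrieved chunks come from these documents:\n"
        + "\n".join(f"- {t}" for t in titles)
        + "\n\nIf answering well requires any of these documents in full, reply with a "
        'JSON list of their titles, e.g. ["How to make spaghetti"]. Otherwise reply [].'
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    try:
        return json.loads(resp.choices[0].message.content or "[]")
    except json.JSONDecodeError:
        return []

def build_context(query: str, retrieved: list[dict]) -> str:
    full_docs = pick_full_documents(query, retrieved)
    if full_docs:
        # Feed every document the classifier mentioned, in full.
        return "\n\n".join(fetch_full_document(t) for t in full_docs)
    # Otherwise fall back to stuffing the raw chunks into the prompt.
    return "\n\n".join(chunk["text"] for chunk in retrieved)

Asking for a JSON list keeps the classifier's output easy to parse, and an empty list cleanly signals the fallback to plain chunk stuffing.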
In fact, this is an early version of this idea. I have a lot of ideas surrounding this concept of "GPT itself handling context formation" that I need to try.
So, what do you think of this idea? I'd be happy to hear your feedback and improve EazyRAG.
This is an interesting idea! Does it make it more costly to generate a response, though? Since you need to make multiple API calls to the ChatGPT API and include the documents in most of them.
Not sure I understand this part: "... fetching relevant documents related to the document title"... So to clarify, are you fetching the relevant documents related to the user's query, then reranking the docs to figure out which to include in the final prompt? Or are you fetching additional documents based on the document titles of the initial documents retrieved from the user's query?