A short while back, I came across a concept called Retrieval-Augmented Generation, or RAG for short, an approach aimed at improving AI responses. Since then, I have been researching it and building with it extensively, and I realized that the explanations I read didn’t quite hit the mark, so let’s clear things up.

To explain RAG, we first need to review LLMs. Large Language Models (LLMs) are the advanced brains behind tools like ChatGPT. Think of them as highly intelligent librarians who have read vast amounts of text and learned from it. They’re not just programmed with a fixed set of responses; instead, they can understand and generate human-like text by predicting what word should come next in a sentence, much like a seasoned writer might. This allows them to engage in conversations, answer questions, and even create content that feels natural and fluid. When you interact with ChatGPT, you’re actually chatting with an LLM. It uses what it’s learned from a wide range of sources to talk with you about almost anything under the sun, from Shakespeare’s plays to the latest scientific breakthroughs.

Think of RAG as a smart assistant who fetches useful information to help answer questions on a specific subset of knowledge more accurately than a more generalized LLM could. Imagine you’re working on a college project and you ask your AI, which is an LLM, a question. Instead of relying only on what it was taught during training (its built-in ‘knowledge’), it first gathers data related to your question from a range of specific sources to provide a well-informed answer.

Let’s put this into a real-world scenario. I have a friend who is a public figure advocating for the use of MDMA in therapeutic settings. She’s quite active online, sharing her insights across various platforms like her blog, website, YouTube, Facebook, TikTok, and so on. I created a tool using RAG that compiles all her published work into a single database. With her permission, I was able to pull her texts and videos from these different outlets. Now, when I pose a question to this system, it not only provides an answer but also points me to the exact place in her content where the topic is discussed, complete with links and timestamps.

For instance, when I asked it:

When will MDMA be legal?

It responded:

The FDA approval for MDMA-assisted therapy for PTSD is anticipated in early 2024. Once the FDA approves this therapy, the DEA will need to reschedule MDMA, and individual states will have to determine how to regulate and implement this particular healing modality.

Relevant YouTube video link: MDMA Therapy for PTSD – Clinical research being done at MAPS.org and FDA Evaluation in 2023

In other words, it responded directly from the video. This seamless integration happens because, behind the scenes, I gathered and merged all of her content, applied data tagging for easy reference, and created a database. This database was then indexed — think of indexing like creating a highly organized library catalog that the AI can quickly search through — to make the data easily accessible to the LLM.
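To make the “library catalog” idea concrete, here is a minimal sketch of indexing tagged content in Python. The chunks, URLs, and timestamps below are hypothetical stand-ins for my friend’s content, and real RAG systems index embeddings rather than keywords, but the catalog principle is the same: every chunk carries metadata so the answer can point back to its source.

```python
# Toy keyword index over tagged content chunks (hypothetical data).
# Each chunk keeps its source metadata so results can cite it.
from collections import defaultdict

chunks = [
    {"id": 1, "source": "YouTube", "url": "https://example.com/v1",
     "timestamp": "02:15",
     "text": "FDA approval for MDMA-assisted therapy is anticipated in 2024"},
    {"id": 2, "source": "Blog", "url": "https://example.com/post",
     "timestamp": None,
     "text": "States will decide how to regulate this healing modality"},
]

index = defaultdict(set)  # word -> set of chunk ids, like a catalog entry
for chunk in chunks:
    for word in chunk["text"].lower().split():
        index[word].add(chunk["id"])

def search(query):
    """Return every chunk matching any query word, metadata included."""
    ids = set()
    for word in query.lower().split():
        ids |= index.get(word, set())
    return [c for c in chunks if c["id"] in ids]

hits = search("FDA approval")
print(hits[0]["source"], hits[0]["timestamp"])  # YouTube 02:15
```

Because the metadata rides along with every hit, the LLM can be handed not just the text but also the link and timestamp to cite.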

Initially, I assumed that collecting, preparing, and organizing this data would be the most challenging part. However, indexing the data and creating a tailored ‘prompt’ (essentially a detailed request or instruction that guides the AI’s response) required a lot more effort. The part I thought would be easy was hard, and the part I thought would be hard was easy!

Now, let’s talk about ‘prompt engineering’. This is the art of crafting the perfect question to elicit the best response from an AI. You might think it’s like becoming a search engine whiz, but it’s a bit different for LLMs. They interpret our language as ‘word vectors’, numerical representations that capture the essence of what’s being asked regardless of how it’s phrased. In other words, to the LLM the phrase “The sun dipped below the horizon, signaling the end of the day.” is very closely related to the phrase “Nightfall arrived as the celestial orb retreated from view.” The meaning is the same even though the words greatly differ.
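The “closeness” of word vectors is usually measured with cosine similarity. Here is a small sketch using made-up three-dimensional vectors (real embedding models produce vectors with hundreds of dimensions, so these numbers are purely illustrative) to show how the two sunset phrasings land near each other while an unrelated sentence lands far away.

```python
# Cosine similarity between vectors: 1.0 means "same direction/meaning",
# values near 0 mean unrelated. Vectors below are hypothetical.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sunset_a = [0.9, 0.1, 0.2]    # "The sun dipped below the horizon..."
sunset_b = [0.85, 0.15, 0.25] # "Nightfall arrived as the celestial orb..."
taxes    = [0.1, 0.9, 0.7]    # an unrelated sentence

print(cosine(sunset_a, sunset_b))  # close to 1.0: similar meaning
print(cosine(sunset_a, taxes))     # much lower: different meaning
```

This is why spelling and exact wording matter less to an LLM than to a keyword search: two very different sentences can still point in nearly the same direction in vector space.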

It seemed to me that if we had special roles for “Prompt Engineers”, we should also have special roles for “Google Searchers”. In fact, LLMs are probably more forgiving of spelling errors and miswording than a Google search.

However, with RAG, proper prompts are much more important. Since the system combines additional data with your question, the prompt needs to be set up precisely to make the most of that data. When working with my friend’s video content, I had to fine-tune the system to reference videos by their titles, figure out how to extract timestamps, and display URLs, among numerous other adjustments.
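The fine-tuning described above mostly happens in the prompt template that wraps the retrieved chunks. Here is one hedged sketch of how such a template might look; the chunk data, titles, and URLs are hypothetical, and this is one plausible layout rather than the exact one I used.

```python
# Assembling a RAG prompt: retrieved chunks (hypothetical data) are
# pasted into the prompt, with instructions telling the model to cite
# the video title, URL, and timestamp for every claim.
retrieved = [
    {"title": "MDMA Therapy for PTSD", "url": "https://example.com/v1",
     "timestamp": "02:15",
     "text": "FDA approval is anticipated in early 2024."},
]

def build_prompt(question, chunks):
    context = "\n".join(
        f'- "{c["text"]}" (source: {c["title"]}, {c["url"]}, at {c["timestamp"]})'
        for c in chunks
    )
    return (
        "Answer using only the sources below. "
        "Cite each claim with its video title, URL, and timestamp.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

print(build_prompt("When will MDMA be legal?", retrieved))
```

The instruction line is what nudges the model to answer “directly from the video” and hand back links and timestamps instead of a free-floating answer.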

In summary, RAG brings a dynamic twist to AI, where the question you ask is just the beginning. It’s about how the AI can retrieve relevant information and present it in a clear, helpful way. This isn’t just about understanding language — it’s about making connections between vast amounts of information to provide the best possible answer.




