Retrieval Augmented Generation tutorial and OpenAI example
In the digital era, where access to vast amounts of data is ubiquitous, businesses seek ways to efficiently process and utilize this information. One of the latest advancements in natural language processing (NLP) is the technique of Retrieval Augmented Generation (RAG). By combining information retrieval mechanisms with text generation, RAG offers precise, contextually rich responses, providing significant support in scaling business operations. Let’s explore how RAG works, its benefits, and its applications.
Retrieval Augmented Generation – how does it work?
RAG merges two main components: information retrieval and response generation. Traditionally, NLP systems were divided into these two areas, limiting their ability to deliver comprehensive and up-to-date responses. RAG overcomes these barriers by integrating retrieval and generation in one process.
Information retrieval stage
When a user asks a question, the RAG system searches available information sources, such as databases, documents, scientific articles, or websites. Using advanced algorithms, the system identifies the most relevant text fragments that can help answer the query. Retrieval is done using techniques like TF-IDF, BM25, or models like BERT.
Retrieval process:
- Query encoding – the user’s query is processed by NLP algorithms that convert it into a vector representing the query’s semantic meaning. This process can utilize models like BERT, trained on large text corpora to understand the context and meaning of words in the query.
- Database search – the encoded query is compared to a database of documents, also previously encoded into vectors. Algorithms like TF-IDF (Term Frequency-Inverse Document Frequency) or BM25 (Best Matching 25) evaluate the similarity between the query and the documents, identifying the most relevant ones.
- Selecting the most relevant documents – among the identified documents, the system selects those with the highest similarity scores. Selection can be based on various criteria, such as relevance, timeliness, or source authority. Filters may also be applied to eliminate less valuable sources.
- Extracting key fragments – from the selected documents, the system extracts the most relevant text fragments. Extraction can include entire paragraphs, sentences, or even shorter fragments that directly address the user’s query. This process can be aided by additional NLP models evaluating the quality and relevance of the extracted fragments.
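To make the encoding and ranking steps above concrete, here is a minimal dense-retrieval sketch. It assumes the sentence-transformers package and the all-MiniLM-L6-v2 checkpoint (both assumptions, not requirements of RAG itself); any BERT-style encoder would work the same way:

from sentence_transformers import SentenceTransformer, util

# Encode a toy corpus and a query into semantic vectors
encoder = SentenceTransformer("all-MiniLM-L6-v2")
docs = [
    "RAG combines information retrieval with text generation.",
    "BM25 is a classic lexical ranking function.",
]
doc_vecs = encoder.encode(docs, convert_to_tensor=True)
query_vec = encoder.encode("How does RAG work?", convert_to_tensor=True)

# Rank documents by cosine similarity to the query
scores = util.cos_sim(query_vec, doc_vecs)[0]
best = int(scores.argmax())
print(docs[best], float(scores[best]))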
Response generation stage
After identifying the relevant documents, the system proceeds to the response generation stage. Using advanced generative models, such as GPT-3 or GPT-4 by OpenAI, the system creates a coherent and logical response based on the found information. These models are trained on massive text datasets and can generate high-quality texts that are contextually accurate.
Generation process:
- Document analysis and processing – at this stage, the system processes the retrieved documents, identifying key information to be used in answering the query. Generative models analyze the context and meaning of the text to understand how best to integrate this information coherently.
- Formulating the response – after gathering the necessary information, the generative model formulates the response. This involves creating new text that not only answers the question but also presents the information clearly and understandably. Models like GPT-4 can generate natural, fluid text thanks to training on diverse datasets.
- Context integration – generative models integrate the query’s context with the answers found in the documents. This means the response is not just a simple repetition of information but considers the user’s specific needs and context, adjusting the tone and style of the reply.
- Verification and accuracy – the generated response is then verified for factual and grammatical accuracy. NLP models use self-check mechanisms and can be assisted by additional algorithms that check the consistency of facts and logical coherence of the response.
- Presentation of the response – the final response is presented to the user. The system can adjust the response format depending on the medium and specific user requirements, for example, providing answers in written, graphical, or multimedia forms.
- Continuous learning – RAG systems can also learn continuously by analyzing user feedback and adjusting their answer generation algorithms to improve the relevance and quality of future responses.
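The first three steps above boil down to folding the retrieved fragments and the query into a single prompt for the model. Here is a minimal sketch using the official openai Python package; the model name and prompt wording are illustrative choices, not fixed parts of RAG:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
fragments = ["Fragment one.", "Fragment two."]  # output of the retrieval stage
query = "What does the company do?"

# Context integration: combine retrieved fragments with the user's query
context = "\n".join(fragments)
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)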
Key business benefits of using Retrieval Augmented Generation (RAG)
- Precision and accuracy – RAG offers higher precision and accuracy compared to traditional generative systems. By integrating retrieval mechanisms, RAG can provide responses based on the latest and most relevant data, crucial for making informed business decisions.
- Timeliness of information – In a dynamically changing business environment, access to current information is invaluable. RAG systems can search and utilize the latest available data, ensuring that responses are always in line with the latest trends and discoveries.
- Flexibility and versatility – RAG systems are flexible and versatile, capable of answering a wide range of questions and issues. This flexibility is particularly valuable in scaling processes, where the diversity of problems and questions can be significant. RAG allows for quick adaptation to new challenges and needs.
Applications of RAG in scaling online business
- Decision support – supporting management in decision-making is one of the most important applications of RAG. By quickly searching for and analyzing the latest data, RAG can provide key information needed for strategic decisions. For example, analyzing market trends, assessing risks associated with new investments, or predicting consumer behavior can be significantly enhanced by RAG.
- Customer service optimization – RAG can significantly improve the quality and efficiency of responding to customer inquiries. These systems can automatically search for information about products, services, company policies, and other important issues to provide precise answers to customer questions. This increases customer satisfaction and reduces the burden on customer service departments.
- User experience personalization – with RAG, companies can deliver more personalized user experiences by analyzing their behaviors and preferences and generating tailored responses and recommendations. This can lead to increased customer loyalty and higher conversion rates.
- Process automation – RAG systems can be used to automate various processes, from knowledge management to generating reports and analyses. Automating these tasks increases operational efficiency and allows for more strategic use of human resources.
Retrieval Augmented Generation and hallucination
Understanding hallucination in NLP
In NLP, “hallucination” refers to a model generating plausible-sounding information that is factually incorrect or irrelevant. This issue is particularly relevant for generative models like those used in RAG. While these models are adept at producing coherent and contextually appropriate responses, they can sometimes generate outputs that include inaccuracies or entirely fabricated information.
Causes of hallucination in RAG
Hallucination in RAG systems can stem from several sources:
- Training data limitations – generative models like GPT-3 or GPT-4 are trained on vast datasets from the internet and other sources. If the training data contains inaccuracies or biases, the model may inadvertently propagate these errors in its responses.
- Contextual misalignment – even when relevant information is retrieved, the model may misinterpret the context or fail to correctly align the context of the retrieved information with the query, leading to misleading or incorrect outputs.
- Overgeneralization – generative models may sometimes generalize beyond the specific data retrieved, creating responses that include assumptions not supported by the underlying information.
- Insufficient retrieval – if the retrieval phase fails to find sufficiently relevant or specific documents, the generative model may attempt to fill in gaps, leading to the creation of information that is not actually present in the retrieved sources.
Mitigating hallucination in RAG systems
To minimize the occurrence of hallucination in RAG systems, several strategies can be employed:
- Enhanced training data curation – ensuring that the training data is accurate, diverse, and representative of the desired knowledge base can help reduce the likelihood of hallucination. This involves careful selection and filtering of data sources.
- Improved retrieval techniques – utilizing more sophisticated retrieval algorithms, such as those based on transformer models like BERT, can enhance the relevance and accuracy of the documents retrieved, providing a more solid foundation for the generative phase.
- Verification mechanisms – implementing verification steps where the generated output is cross-checked against factual databases or additional sources can help identify and correct hallucinations.
- User feedback integration – continuously incorporating user feedback can assist in identifying patterns of hallucination and refining the system’s responses accordingly.
- Prompt design and context control – careful design of prompts and context control mechanisms can guide the generative model to stay within the bounds of verified information, reducing the risk of drifting into speculative or incorrect content (a small sketch follows this list).
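As an illustration of the prompt-design point, one common pattern is to instruct the model explicitly to answer only from the supplied context. The wording below is an example, not a guaranteed fix; with this context, a well-behaved model should admit it cannot answer the question:

# Guardrail prompt sketch: the instruction wording is illustrative
context = "createIT is a software and web development company."
query = "When was createIT founded?"

messages = [
    {"role": "system", "content": (
        "Answer using only the provided context. If the context "
        "does not contain the answer, say you don't know."
    )},
    {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
]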
Example scenario – Hallucination in customer service
Consider a RAG system implemented in a customer service chatbot. If a customer asks about the features of a newly released product, and the retrieval phase fails to fetch the latest product specifications due to incomplete indexing, the generative model might generate features based on older models or even fabricate plausible-sounding features that don’t exist. This can lead to customer confusion and dissatisfaction.
To address this, the system could include a verification step where the generated features are cross-checked against the latest product database before presenting the response to the customer.
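A simple version of such a verification step might look like the sketch below. The product database and the exact-substring matching rule are hypothetical simplifications; production systems typically use fuzzy or semantic matching:

# Hypothetical product database; in practice this would be a live data source
product_specs = {
    "WidgetPro X2": ["10-hour battery", "USB-C charging", "water resistant"],
}

def unsupported_claims(product, generated_features):
    """Return generated feature claims not backed by the spec database."""
    known = [f.lower() for f in product_specs.get(product, [])]
    return [f for f in generated_features
            if not any(f.lower() in k or k in f.lower() for k in known)]

# Flag anything the model claimed that the database cannot confirm
claims = ["USB-C charging", "built-in projector"]
print(unsupported_claims("WidgetPro X2", claims))  # ['built-in projector']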
Challenges and limitations of RAG
Implementing RAG comes with challenges. One of the main challenges is managing the vast amount of data that can be searched. Efficient indexing and search algorithms are necessary for RAG systems to operate quickly and accurately. Another challenge is ensuring the quality of generated responses. Generative models must be carefully trained to avoid errors and inaccuracies. Monitoring and evaluating response quality is crucial for maintaining high standards.
Hallucination in RAG systems also represents a significant challenge, particularly in domains where accuracy is crucial. By understanding the causes and implementing robust mitigation strategies, developers can enhance the reliability and trustworthiness of these systems. As RAG technology continues to evolve, ongoing research and development efforts will be key to minimizing hallucination and maximizing the benefits of this powerful approach.
Integration of RAG with language models
To effectively combine RAG with language models, it’s essential to understand how these two components can work together. This process involves several key stages:
- Query encoding – the user’s query is encoded into a vector representation.
- Information retrieval – the encoded query is compared against a database of information to identify the most relevant documents.
- Response generation – the retrieved documents are combined with the query and processed by the language model to generate a response.
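Put together, the three stages reduce to a small pipeline skeleton like the one below. The retriever and generator objects are placeholders for whatever components you plug in; the tutorial later in this article shows one concrete implementation:

def build_prompt(query, docs):
    """Combine the query with retrieved documents into one prompt."""
    context = "\n".join(docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

def rag_answer(query, retriever, generator, top_k=3):
    query_vec = retriever.encode(query)              # 1. query encoding
    docs = retriever.search(query_vec, top_k=top_k)  # 2. information retrieval
    prompt = build_prompt(query, docs)
    return generator.generate(prompt)                # 3. response generation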
RAG use cases
- Medicine – supporting medical decisions is one of the key applications of RAG. Doctors can use RAG systems to get up-to-date information on the latest research, therapies, and medical recommendations. This enables them to make more informed decisions based on the latest medical knowledge.
- Scientific research – scientists can use RAG to search the literature and get synthesized answers to research questions. These systems can automatically search databases like PubMed to find the latest publications and provide summaries of key findings.
- Education – RAG also finds applications in education. Students can use these systems to get answers to questions related to their studies. These systems can provide detailed, understandable explanations, helping students better understand complex topics.
- Knowledge base for business – RAG systems can serve as an invaluable resource for businesses by providing a centralized knowledge base. They can automatically pull from various internal and external sources to provide employees with up-to-date information on company policies, product details, market trends, and industry news. This allows for more informed decision-making and efficient access to necessary information. For instance, a sales team can use an RAG system to quickly access product specifications, customer preferences, and competitive analyses, enabling them to tailor their pitches and strategies more effectively. Additionally, management can leverage these systems to stay abreast of emerging trends and potential risks, ensuring that business strategies remain agile and informed.
Simple step-by-step tutorial – Implementing RAG
Prerequisites
Before we begin, ensure you have the following:
- Python installed on your machine.
- An OpenAI API key for accessing GPT-4o-mini.
- Basic understanding of Python and NLP.
Step 1 – Setting up the environment
First, install the necessary libraries:
pip install openai
pip install scikit-learn
pip install numpy
Step 2 – Preparing the retrieval component
For the retrieval part, we’ll use a simple TF-IDF model. You can replace this with a more advanced retrieval system such as BM25 (a sketch follows the code below) or even a neural retriever like BERT.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
# Sample documents
documents = [
    "createIT is a software and web development company.",
    "Located in Poland.",
    "Established in 2002.",
    "Provides IT solutions and services.",
    "Specializes in web and mobile app development.",
    "Offers digital marketing services.",
    "Focuses on quality-oriented people.",
    "Builds long-term client relationships.",
    "Developed over 40,000 websites.",
    "Operates on five continents.",
    "Provides developer outsourcing.",
    "Works with iGaming, eCommerce, eHealth, real estate, and industry sectors.",
    "Uses technologies like PHP, Symfony, Flutter, and blockchain.",
    "Offers SEO and SEM services.",
    "Delivers custom and ready-made solutions.",
    "Implements blockchain for supply chain apps.",
    "Provides consultation and project implementation services.",
    "Experienced in app and mobile design.",
    "Provides full platform design and maintenance.",
    "Focuses on achieving stable business growth for clients."
]
# Create the TF-IDF vectorizer and transform the documents
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(documents)
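If you prefer BM25, as mentioned above, the rank_bm25 package (an extra dependency, installed with pip install rank-bm25) offers a lexical alternative. The whitespace tokenization here is deliberately naive, and the snippet reuses the documents list and numpy import from the code above:

from rank_bm25 import BM25Okapi

# Build a BM25 index over naively tokenized documents
tokenized_docs = [doc.lower().split() for doc in documents]
bm25 = BM25Okapi(tokenized_docs)

# Score every document against the tokenized query
scores = bm25.get_scores("tell me about createit".split())
print(documents[int(np.argmax(scores))])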
Step 3 – Query processing and retrieval
Next, we’ll implement the retrieval process to find the most relevant document for a given query.
# Sample query
query = "Tell me about createIT"
# Transform the query using the same vectorizer
query_vec = vectorizer.transform([query])
# Compute cosine similarities between the query and the documents
cosine_similarities = cosine_similarity(query_vec, tfidf_matrix).flatten()
# Get the index of the most relevant document
most_relevant_doc_index = np.argmax(cosine_similarities)
most_relevant_doc = documents[most_relevant_doc_index]
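Note that this tutorial keeps things simple by passing only the single best document to the model. In practice you would usually pass the top few matches, for example (the value of top_k is an arbitrary choice):

# Take the top-3 documents instead of just the best one
top_k = 3
top_indices = np.argsort(cosine_similarities)[::-1][:top_k]
top_docs = [documents[i] for i in top_indices]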
Step 4 – Generating the answer with GPT-4o-mini
Now that we have the most relevant document, we can use GPT-4o-mini to generate a detailed answer.
from openai import OpenAI

# Set your OpenAI API key
key = 'your-api-key'

# Construct the prompt for gpt-4o-mini
prompt = f"Based on the following document, answer the query:\n\nDocument: {most_relevant_doc}\n\nQuery: {query}\n\nAnswer:"

# Function to handle streaming responses
def stream_openai_response(prompt):
    client = OpenAI(api_key=key)
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    # Print each streamed chunk as it arrives
    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            print(chunk.choices[0].delta.content, end="")
# Generate the response using GPT-4o-mini with streaming
print(f"Query: {query}\n")
print("Most relevant document:", most_relevant_doc)
stream_openai_response(prompt)
print('\n\nWhat else can I assist you with?\n')
Full example
Here is the complete code combined into a single script:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
from openai import OpenAI
# Sample documents
documents = [
    "createIT is a software and web development company.",
    "Located in Poland.",
    "Established in 2002.",
    "Provides IT solutions and services.",
    "Specializes in web and mobile app development.",
    "Offers digital marketing services.",
    "Focuses on quality-oriented people.",
    "Builds long-term client relationships.",
    "Developed over 40,000 websites.",
    "Operates on five continents.",
    "Provides developer outsourcing.",
    "Works with iGaming, eCommerce, eHealth, real estate, and industry sectors.",
    "Uses technologies like PHP, Symfony, Flutter, and blockchain.",
    "Offers SEO and SEM services.",
    "Delivers custom and ready-made solutions.",
    "Implements blockchain for supply chain apps.",
    "Provides consultation and project implementation services.",
    "Experienced in app and mobile design.",
    "Provides full platform design and maintenance.",
    "Focuses on achieving stable business growth for clients."
]
# Create the TF-IDF vectorizer and transform the documents
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(documents)
# Sample query
query = "Tell me about createIT"
# Transform the query using the same vectorizer
query_vec = vectorizer.transform([query])
# Compute cosine similarities between the query and the documents
cosine_similarities = cosine_similarity(query_vec, tfidf_matrix).flatten()
# Get the index of the most relevant document
most_relevant_doc_index = np.argmax(cosine_similarities)
most_relevant_doc = documents[most_relevant_doc_index]
# Set your OpenAI API key
key = 'your-api-key'
# Construct the prompt for gpt-4o-mini
prompt = f"Based on the following document, answer the query:\n\nDocument: {most_relevant_doc}\n\nQuery: {query}\n\nAnswer:"
# Function to handle streaming responses
def stream_openai_response(prompt):
    client = OpenAI(api_key=key)
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    # Print each streamed chunk as it arrives
    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            print(chunk.choices[0].delta.content, end="")
# Generate the response using GPT-4o-mini with streaming
print(f"Query: {query}\n")
print("Most relevant document:", most_relevant_doc)
stream_openai_response(prompt)
print('\n\nWhat else can I assist you with?\n')
Results
Query: Tell me about createIT
Most relevant document: createIT is a software and web development company.
Answer: createIT is a software and web development company that specializes in developing digital solutions.
Query: Which technologies do they work with?
Most relevant document: Uses technologies like PHP, Symfony, Flutter, and blockchain.
Answer: They work with PHP, Symfony, Flutter, and blockchain.
Query: What sectors do they create projects for?
Most relevant document: Works with iGaming, eCommerce, eHealth, real estate, and industry sectors.
Answer: They create projects for the iGaming, eCommerce, eHealth, real estate, and industry sectors.
While the examples provided may appear straightforward, the true potential and effectiveness of a RAG system largely depend on the complexity and architecture of the entire setup. The more sophisticated and well-constructed the RAG system is, the better it can handle diverse and intricate queries. This includes having an extensive, well-organized database of information to draw from and implementing advanced NLP techniques that accurately interpret user queries and context.
The future of Retrieval Augmented Generation
RAG is a technology with the potential to revolutionize many areas of business. Its ability to integrate information retrieval with text generation opens new possibilities in NLP and artificial intelligence. Future research and development can lead to even more advanced RAG systems, which will be more precise, faster, and versatile.
As RAG technology develops, we can expect even higher precision and accuracy in responses. Integration with more advanced machine learning algorithms and greater data availability will make RAG systems increasingly reliable and versatile. This, in turn, will enable better support for business processes, contributing to effective scaling and increased competitiveness in the market. If you are looking for a web development company, feel free to reach out to us!