Make a Question and Answer Bot with Google Gemini and LangChain


Post by Michael Rosario of InnovativeTeams.NET

This blog post will guide you through building a Question Answering (QA) system that leverages the Google Gemini large language model (LLM) and LangChain, an open-source library for building LLM applications. The system will let you ask questions and receive answers directly related to your codebase.

Google Gemini is a large language model (LLM) capable of understanding and responding to complex queries. Compared to OpenAI’s GPT-4 model, Gemini offers a larger context window, allowing it to analyze bigger chunks of information at once. This can be particularly beneficial for tasks like code review, where context is crucial.

LangChain is an open-source framework that simplifies building applications powered by large language models. It streamlines the process of integrating LLMs with your data and allows you to design conversational interfaces for tasks like question answering and code analysis. By leveraging Chroma, a vector database, LangChain facilitates efficient retrieval of relevant information from large datasets.

Start this project by setting up your Python virtual environment. Create a requirements.txt with the following packages.

langchain
langchain-chroma
langchain-community
langchain-google-genai

Install the packages:

pip install -r requirements.txt

Next, set an environment variable with your API key so the code can reach the Google Gemini API.

export GOOGLE_API_KEY="myKeyGoesHere"

You can get an API key from https://aistudio.google.com/app/apikey .

Create a Python file named main.py.

Let’s start by importing the libraries we need.

from langchain_chroma import Chroma
from langchain_community.document_loaders.generic import GenericLoader
from langchain_community.document_loaders.parsers import LanguageParser
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_text_splitters import Language
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.prompts import PromptTemplate

Start by defining the path to the code repository you want to review.

repo_path = "definePathToYourCode"

Using the GenericLoader class, we load the files for analysis into memory. In my test case, I used a C# code base.

loader = GenericLoader.from_filesystem(
    repo_path,
    glob="**/*",
    suffixes=[".cs"],
    exclude=[],
    parser=LanguageParser(language=Language.CSHARP, parser_threshold=500),
)
documents = loader.load()

Large language models have a defined limit on the amount of text they can observe at one time. In the GPT-4 version of this process, documents are split into chunks of roughly 2,000 characters to fit within that limit. Since Google Gemini has a larger context window, you can specify a larger chunk size when splitting your documents. In the following code, we split the documents into smaller windows of content.

splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.CSHARP, chunk_size=2000, chunk_overlap=200
)

texts = splitter.split_documents(documents)
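As a quick sanity check, you can print how many documents were loaded and how many chunks the splitter produced. The exact counts will depend on your repository.

# Optional: verify the loader and splitter did something sensible.
print(f"Loaded {len(documents)} documents")
print(f"Split into {len(texts)} chunks")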

Deep neural networks perceive their input data as large sequences of floating point numbers, or vectors. In the industry, we tend to call the numerical vector representation of a piece of information an embedding.

The system needs an embedding model to convert incoming text chunks into embeddings. In the next line, we set up the embedding model for Google Generative AI.

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
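If you are curious what an embedding looks like, you can embed a short string directly. This is just an illustration; the query text below is arbitrary, and the model returns a vector of several hundred floating point values.

# Illustration only: embed an arbitrary string and inspect the resulting vector.
sample_vector = embeddings.embed_query("What does this repository do?")
print(len(sample_vector))   # dimensionality of the embedding
print(sample_vector[:5])    # first few floating point values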

At this point, we need to set up a vector database to store the text chunks and their embeddings. The vector database also lets the system query the text chunks during the course of a conversation: when a question comes in, the system converts it into an embedding and uses it to find the most closely related text chunks.

Chroma is an open-source vector database that makes it easy to build LLM apps by making knowledge, facts, and skills pluggable. Its API is concise.

db = Chroma.from_documents(texts, embeddings)
retriever = db.as_retriever(
    search_type="mmr",  # Also test "similarity"
    search_kwargs={"k": 8},
)

At this point, we have a vector database loaded with text chunks and embeddings.
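Before wiring the retriever into a chain, you can try it on its own. This is a minimal sketch; the question below is only an example, and the chunks returned will depend on your codebase.

# Illustration only: fetch chunks related to an example question.
docs = retriever.invoke("Where is authentication handled?")
for doc in docs:
    print(doc.metadata.get("source"))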

For the curious, I’ve also built this same experiment using a PGVector-enabled Postgres instance; check out the LangChain documentation for more details on this pattern.
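As a rough sketch of that variation, assuming the langchain-postgres package and a Postgres server with the pgvector extension enabled (the collection name and connection string below are placeholders):

# Sketch only: swap Chroma for a PGVector-backed store.
# Assumes `pip install langchain-postgres` and a Postgres instance with pgvector enabled.
from langchain_postgres import PGVector

pg_db = PGVector.from_documents(
    documents=texts,
    embedding=embeddings,
    collection_name="code_chunks",  # placeholder collection name
    connection="postgresql+psycopg://user:password@localhost:5432/postgres",  # placeholder connection string
)
pg_retriever = pg_db.as_retriever(search_kwargs={"k": 8})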

Ok. Let’s set up the Gemini model.

gllm = ChatGoogleGenerativeAI(model="gemini-pro")

We will set up a prompt template to help focus the chat conversation. You can refine this template for your context.

template = """
You are a helpful AI assistant.
Answer based on the context provided.
context: {context}
input: {input}
answer:
"""
prompt = PromptTemplate.from_template(template)

In the following code, we connect the Gemini model, the retriever, and the prompt into a LangChain retrieval chain.

# Retriever wrapper that can also take chat history into account when one is provided
retriever_chain = create_history_aware_retriever(gllm, retriever, prompt)
# Chain that "stuffs" the retrieved chunks into the prompt as {context}
document_chain = create_stuff_documents_chain(gllm, prompt)
# End-to-end chain: retrieve relevant chunks, then answer the question
qa = create_retrieval_chain(retriever_chain, document_chain)

Time to test drive the question and answer experience with our Gemini model. Using this retrieval augmented generation (RAG) pattern, you should be able to ask Google Gemini questions about your codebase. The LangChain documentation provides similar examples if you simply want to set up a RAG system for other types of content besides code. The LangChain community has built a robust framework for adapting LLM technology to enterprise and commercial usage.

while True:
    query = input("Enter query: ")
    response = qa.invoke({"input": query})
    print(str(response["answer"]))
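The response dictionary returned by create_retrieval_chain also carries the retrieved chunks under the "context" key, so you can show which files each answer was based on. If you like, add these lines inside the loop, right after the print statement:

    # Optional: list the source files of the chunks used for this answer.
    for doc in response["context"]:
        print(doc.metadata.get("source"))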
