Welcome to Day 29 of #30DaysOfLangChain – LangChain 0.3 Edition! We’ve covered a vast landscape, from the fundamentals of LLMs and basic chains to building sophisticated LangGraph agents, creating interactive UIs with Streamlit, exposing APIs with FastAPI, and ensuring observability with LangSmith.
Today is our capstone project: consolidating all this knowledge into a single, comprehensive GenAI application. This is where theory meets practice, and you’ll see how seamlessly these components integrate to form a powerful system.
The Comprehensive GenAI Application: A Research Assistant
For our final project, we’ll build a “Document Research Assistant” that lives in a Streamlit chat interface. This assistant will allow users to:
- Upload their own PDF documents: Providing a personal knowledge base. (Day 23)
- Ask complex questions: Interacting through a natural chat interface. (Day 22)
- Leverage RAG: Grounding answers in the uploaded documents to reduce hallucinations. (Day 23)
- Employ a Multi-step LangGraph Agent: Orchestrating the retrieval and summarization process for robust answers. (Day 28)
- Utilize LLMs (local or cloud): Configurable for either OpenAI or Ollama. (Day 21, Day 24)
- (Implicitly) Enable LangSmith Tracing: For full observability and debugging. (Day 26)
This application mimics a common real-world use case: allowing users to query their specific, private documents with intelligent AI assistance.
Architecting Our Research Assistant
Here’s how the different components will fit together:
- Frontend (Streamlit):
  - `st.file_uploader`: to accept PDF documents.
  - `st.session_state`: to persist the chat history (`messages`) and the processed document's vector store (`vectorstore`).
  - `st.chat_input` & `st.chat_message`: for the conversational interface.
  - Visual feedback: spinners and success/error messages.
- Backend Logic (LangChain & LangGraph):
  - Document Processing (when a PDF is uploaded):
    - `PyPDFLoader`: loads the PDF content.
    - `RecursiveCharacterTextSplitter`: chunks the text into smaller, manageable pieces.
    - `OpenAIEmbeddings` / `OllamaEmbeddings`: creates vector embeddings for the chunks.
    - `Chroma`: stores the embeddings in a local, in-memory vector store.
  - LangGraph Research Agent: the brain of our application, a simple graph with a sequential flow:
    - `retrieve_documents` node: takes the user's question, accesses the `vectorstore`, and retrieves the most relevant document chunks.
    - `generate_raw_answer` node: uses the retrieved chunks and the original question to formulate a comprehensive answer with an LLM.
    - `summarize_and_refine` node: summarizes and refines the raw answer so it is concise and directly addresses the user's query, adding another layer of quality control.
  - `RunnableWithMessageHistory`: while the LangGraph agent manages its internal state, the overall Streamlit app uses `InMemoryChatMessageHistory` via `RunnableWithMessageHistory` to pass the full conversation context into the LangGraph agent on each turn.
Putting It All Together: The Code
Before you run the code:
- Install the necessary libraries:
  `pip install streamlit langchain-openai langchain-ollama langchain-community chromadb pypdf langchain-text-splitters langgraph python-dotenv`
- Ensure your `OPENAI_API_KEY` is set if you are using OpenAI.
- If you are using Ollama, make sure it is running and that you have pulled your desired chat model (e.g., `ollama pull llama2`) and an embedding model (e.g., `ollama pull nomic-embed-text`).
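For reference, here is what a minimal `.env` file for this app might look like. This is an illustrative sketch: the variable names match the code below (plus the LangSmith variables from Day 26), but the values are placeholders you should replace with your own.

```
# .env — example values only; adjust to your setup
LLM_PROVIDER=ollama                 # or "openai"
OLLAMA_MODEL_CHAT=llama2
OLLAMA_MODEL_EMBED=nomic-embed-text
OPENAI_MODEL_CHAT=gpt-3.5-turbo
OPENAI_MODEL_EMBED=text-embedding-3-small
OPENAI_API_KEY=sk-...               # required only when LLM_PROVIDER=openai

# Optional: LangSmith tracing (Day 26)
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=...
LANGCHAIN_PROJECT=document-research-assistant
```

With that in place, here is the full application: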
import streamlit as st
import os
import tempfile
from typing import List, TypedDict, Annotated, Dict, Any, Union
import operator
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_core.messages import HumanMessage, AIMessage, BaseMessage
from langgraph.graph import StateGraph, END, START
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.chat_history import InMemoryChatMessageHistory # For session history
# Load environment variables
from dotenv import load_dotenv
load_dotenv()
# --- Configuration for LLM and Embeddings ---
LLM_PROVIDER = os.getenv("LLM_PROVIDER", "ollama").lower() # 'openai' or 'ollama'
OLLAMA_MODEL_CHAT = os.getenv("OLLAMA_MODEL_CHAT", "llama2").lower()
OLLAMA_MODEL_EMBED = os.getenv("OLLAMA_MODEL_EMBED", "nomic-embed-text").lower()
OPENAI_MODEL_CHAT = os.getenv("OPENAI_MODEL_CHAT", "gpt-3.5-turbo")
OPENAI_MODEL_EMBED = os.getenv("OPENAI_MODEL_EMBED", "text-embedding-3-small")
# --- Initialize LLM and Embeddings ---
@st.cache_resource
def get_llm_and_embeddings():
"""Initializes and returns LLM and Embeddings based on provider."""
llm = None
embeddings = None
if LLM_PROVIDER == "openai":
if not os.getenv("OPENAI_API_KEY"):
st.error("OPENAI_API_KEY not set for OpenAI provider. Please set it in your .env file or environment variables.")
st.stop()
llm = ChatOpenAI(model=OPENAI_MODEL_CHAT, temperature=0.2)
embeddings = OpenAIEmbeddings(model=OPENAI_MODEL_EMBED)
elif LLM_PROVIDER == "ollama":
try:
llm = ChatOllama(model=OLLAMA_MODEL_CHAT, temperature=0.2)
# Test chat LLM connection
llm.invoke("test", config={"stream": False})
st.success(f"Successfully connected to Ollama chat model: {OLLAMA_MODEL_CHAT}")
except Exception as e:
st.error(f"Error connecting to Ollama chat LLM '{OLLAMA_MODEL_CHAT}': {e}")
st.info(f"Please ensure Ollama is running and you have pulled the model: `ollama pull {OLLAMA_MODEL_CHAT}`")
st.stop()
try:
embeddings = OllamaEmbeddings(model=OLLAMA_MODEL_EMBED)
embeddings.embed_query("test")
st.success(f"Successfully connected to Ollama embedding model: {OLLAMA_MODEL_EMBED}")
except Exception as e:
st.error(f"Error connecting to Ollama embedding model '{OLLAMA_MODEL_EMBED}': {e}")
st.info(f"Please ensure Ollama is running and you have pulled the embedding model: `ollama pull {OLLAMA_MODEL_EMBED}`")
st.stop()
else:
st.error(f"Invalid LLM provider: {LLM_PROVIDER}. Must be 'openai' or 'ollama'.")
st.stop()
return llm, embeddings
# Initialize LLMs and Embeddings (cached for efficiency)
rag_llm, embeddings_model = get_llm_and_embeddings()
# --- LangGraph Agent State Definition ---
class ResearchState(TypedDict):
"""
Represents the state of our Research Assistant agent.
- question: The user's current question.
- retrieved_documents: List of retrieved documents from the vector store.
- raw_answer: The initial answer generated by the LLM based on retrieved docs.
- final_answer: The summarized/refined answer presented to the user.
- chat_history: Cumulative chat history for context.
- vectorstore: The ChromaDB instance for retrieval.
"""
question: str
retrieved_documents: Annotated[List[Any], operator.add] # Using Any for Document objects
raw_answer: str
final_answer: str
chat_history: Annotated[List[BaseMessage], operator.add] # For LangGraph internal message passing
vectorstore: Chroma # The actual vectorstore instance
# --- LangGraph Agent Nodes ---
def retrieve_documents(state: ResearchState):
"""
Retrieval Agent: Retrieves relevant documents from the vector store.
"""
print("\n---AGENT: RETRIEVING DOCUMENTS---")
question = state["question"]
vectorstore = state["vectorstore"]
if not vectorstore:
raise ValueError("Vector store not initialized. Please upload a document.")
retriever = vectorstore.as_retriever()
docs = retriever.invoke(question)
# Store retrieved documents in the state
return {"retrieved_documents": docs, "chat_history": [AIMessage(content="Retrieved relevant documents.")]}
def generate_raw_answer(state: ResearchState):
"""
Generator Agent: Generates an initial answer based on retrieved documents.
"""
print("\n---AGENT: GENERATING RAW ANSWER---")
question = state["question"]
retrieved_documents = state["retrieved_documents"]
chat_history = state["chat_history"] # Pass history for better context awareness
context_content = "\n\n".join([doc.page_content for doc in retrieved_documents])
# Prompt for generating the raw answer
raw_answer_prompt = ChatPromptTemplate.from_messages([
("system", "You are a helpful research assistant. Use the following context to answer the user's question. "
"If you cannot find the answer in the context, state that you don't know based on the provided information. "
"Be thorough but focus only on the provided context.\n\nContext: {context}\n\nPrevious conversation:\n{chat_history}"),
("human", "{question}")
])
chain = raw_answer_prompt | rag_llm | StrOutputParser()
# Format chat history for the prompt
formatted_chat_history = "\n".join([f"{msg.type}: {msg.content}" for msg in chat_history])
raw_ans = chain.invoke({
"context": context_content,
"question": question,
"chat_history": formatted_chat_history
})
return {"raw_answer": raw_ans, "chat_history": [AIMessage(content="Generated initial answer.")]}
def summarize_and_refine(state: ResearchState):
"""
Summarizer/Refiner Agent: Summarizes and refines the raw answer.
"""
print("\n---AGENT: SUMMARIZING AND REFINING ANSWER---")
question = state["question"]
raw_answer = state["raw_answer"]
chat_history = state["chat_history"] # Pass history for context
# Prompt for summarizing/refining the answer
refine_prompt = ChatPromptTemplate.from_messages([
("system", "You are an expert editor. Review the following answer and the original question. "
"Refine the answer to be concise, clear, and directly address the question. "
"Remove any redundant information or conversational filler. "
"If the raw answer indicates it doesn't know, simply state that. "
"Question: {question}\n\nRaw Answer: {raw_answer}\n\nRefined Answer:"),
("human", "Refine the raw answer given the original question and previous conversation.")
])
chain = refine_prompt | rag_llm | StrOutputParser()
final_ans = chain.invoke({
"question": question,
"raw_answer": raw_answer,
"chat_history": "\n".join([f"{msg.type}: {msg.content}" for msg in chat_history])
})
return {"final_answer": final_ans, "chat_history": [AIMessage(content="Refined final answer.")]}
# --- Build the LangGraph Workflow ---
workflow = StateGraph(ResearchState)
# Add nodes
workflow.add_node("retrieve", retrieve_documents)
workflow.add_node("generate", generate_raw_answer)
workflow.add_node("refine", summarize_and_refine)
# Define the graph flow
workflow.add_edge(START, "retrieve")
workflow.add_edge("retrieve", "generate")
workflow.add_edge("generate", "refine")
workflow.add_edge("refine", END)
# Compile the graph
research_agent_app = workflow.compile()
# --- Streamlit App Setup ---
st.set_page_config(page_title="Document Research Assistant", page_icon="📝")
st.title("📝 Document Research Assistant")
st.markdown(f"*(LLM: {LLM_PROVIDER.capitalize()} {OPENAI_MODEL_CHAT if LLM_PROVIDER == 'openai' else OLLAMA_MODEL_CHAT}, Embeddings: {OPENAI_MODEL_EMBED if LLM_PROVIDER == 'openai' else OLLAMA_MODEL_EMBED})*")
st.markdown("---")
# --- Initialize session state ---
if "messages" not in st.session_state:
st.session_state.messages = [] # Stores {"role": ..., "content": ..., "sources": ...}
if "vectorstore" not in st.session_state:
st.session_state.vectorstore = None
# --- In-memory store for chat histories (for RunnableWithMessageHistory) ---
# This is a simple in-memory store for different user sessions.
# In production, use a persistent store like Redis or a database.
session_history_store: Dict[str, InMemoryChatMessageHistory] = {}
def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
"""Returns a new BaseChatMessageHistory instance for a given session ID."""
if session_id not in session_history_store:
session_history_store[session_id] = InMemoryChatMessageHistory()
return session_history_store[session_id]
# Wrap the LangGraph agent with RunnableWithMessageHistory for overall chat history management
# The research_agent_app expects a ResearchState dict as input, while
# RunnableWithMessageHistory (RWMH) provides `input` (the current user message string)
# and `history` (a list of BaseMessage). So we adapt between the two formats.
# Adapter to transform RWMH input to LangGraph ResearchState input format
def _prepare_agent_input(input_dict: dict) -> ResearchState:
current_question = input_dict["input"]
# RWMH gives us history as BaseMessage objects, which is what LangGraph expects for chat_history
history_for_agent = input_dict.get("history", [])
return {
"question": current_question,
"retrieved_documents": [], # Will be populated by 'retrieve' node
"raw_answer": "", # Will be populated by 'generate' node
"final_answer": "", # Will be populated by 'refine' node
"chat_history": history_for_agent + [HumanMessage(content=current_question)], # Full conversation for agent's context
"vectorstore": st.session_state.vectorstore # Pass the Streamlit session's vectorstore
}
# Create a runnable that formats input for the agent and then invokes the agent
agent_runner_chain = _prepare_agent_input | research_agent_app
# The final chain that will be invoked by the Streamlit app, handling session history
final_conversational_chain = RunnableWithMessageHistory(
agent_runner_chain, # Our LangGraph agent wrapped in an input adapter
get_session_history,
input_messages_key="input", # Key for the current user message string
history_messages_key="history", # Key where history (list of BaseMessage) will be injected by RWMH
)
# --- Document Upload and Processing ---
uploaded_file = st.sidebar.file_uploader(
"Upload a PDF document to chat with",
type="pdf",
accept_multiple_files=False,
key="pdf_uploader"
)
# If a file is uploaded and no vectorstore exists in session state
if uploaded_file and st.session_state.vectorstore is None:
with st.spinner("Processing document... This may take a moment."):
try:
# 1. Save uploaded file to a temporary file
with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp_file:
tmp_file.write(uploaded_file.getvalue())
tmp_file_path = tmp_file.name
# 2. Load the document
loader = PyPDFLoader(tmp_file_path)
docs = loader.load()
if not docs:
st.warning("Could not extract text from the PDF. Please try another file.")
os.unlink(tmp_file_path) # Clean up temp file
st.stop()
# 3. Split documents into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
splits = text_splitter.split_documents(docs)
# 4. Create embeddings and store in Chroma
st.session_state.vectorstore = Chroma.from_documents(
documents=splits,
embedding=embeddings_model
)
st.sidebar.success(f"Document '{uploaded_file.name}' processed and ready for questions!")
# Clean up temporary file after processing
os.unlink(tmp_file_path)
except Exception as e:
st.sidebar.error(f"Error processing document: {e}")
st.session_state.vectorstore = None # Reset vectorstore on error
if 'tmp_file_path' in locals() and os.path.exists(tmp_file_path):
os.unlink(tmp_file_path)
# --- Display chat messages from history ---
# Initialize session_id for this Streamlit session (a simple unique ID)
# In a real multi-user app, this would come from user login or a more robust mechanism.
if "session_id" not in st.session_state:
import uuid
st.session_state.session_id = str(uuid.uuid4())
for message in st.session_state.messages:
with st.chat_message(message["role"]):
st.markdown(message["content"])
if message["sources"]:
with st.expander("Sources"):
for source in message["sources"]:
st.text(source)
# --- Handle user input ---
if prompt := st.chat_input("Ask a question about the document..."):
if st.session_state.vectorstore is None:
st.warning("Please upload a PDF document first to enable the Research Assistant.")
else:
# Add user message to chat history and display it
st.session_state.messages.append({"role": "user", "content": prompt, "sources": []})
with st.chat_message("user"):
st.markdown(prompt)
with st.chat_message("assistant"):
with st.spinner("Researching and generating answer..."):
full_response = ""
response_container = st.empty()
# The LangGraph agent returns a complete final state at END rather than a
# stream of chunks, so we invoke the conversational chain once and then
# display the final answer. RunnableWithMessageHistory injects the session
# history ("history") alongside the current message ("input"), and
# _prepare_agent_input maps both onto the ResearchState the agent expects.
# For token- or step-level streaming you would instead surface the agent's
# stream()/astream_events() output (as in Day 25 and the FastAPI streaming
# example) and update an st.empty() placeholder per chunk (as in Day 22),
# which adds complexity we skip in this capstone demo.
final_state = final_conversational_chain.invoke(
{"input": prompt},
config={"configurable": {"session_id": st.session_state.session_id}}
)
generated_answer = final_state['final_answer']
# Display the final answer
response_container.markdown(generated_answer)
# Collect sources from the final_state if available
# The 'retrieve_documents' node populates `retrieved_documents`
# Let's ensure our `ResearchState` allows for this.
sources_for_display = []
if final_state and final_state.get("retrieved_documents"):
for i, doc in enumerate(final_state["retrieved_documents"]):
# Assuming documents have page_content and metadata
page_info = doc.metadata.get('page', 'N/A')
source_text = f"Page {page_info}: {doc.page_content[:300]}..."
sources_for_display.append(source_text)
st.session_state.messages.append({
"role": "assistant",
"content": generated_answer,
"sources": sources_for_display
})
if sources_for_display:
with st.expander("Sources"):
for source in sources_for_display:
st.text(source)
# --- How to run this app ---
st.sidebar.markdown("---")
st.sidebar.markdown("### How to run")
st.sidebar.markdown("1. Save this code as `day29-research-assistant.py`")
st.sidebar.markdown("2. Open your terminal in the same directory.")
st.sidebar.markdown("3. Run the command: `streamlit run day29-research-assistant.py`")
st.sidebar.markdown("4. Your browser will open with the Research Assistant application.")
st.sidebar.markdown("---")
st.sidebar.markdown("#### Dependencies")
st.sidebar.markdown("`pip install streamlit langchain-openai langchain-ollama chromadb pypdf langchain-text-splitters langgraph python-dotenv`")
st.sidebar.markdown("---")
st.sidebar.markdown("#### LLM Configuration")
st.sidebar.markdown(f"**Provider:** `{LLM_PROVIDER.capitalize()}`")
if LLM_PROVIDER == 'openai':
st.sidebar.markdown(f"**Chat Model:** `{OPENAI_MODEL_CHAT}`")
st.sidebar.markdown(f"**Embed Model:** `{OPENAI_MODEL_EMBED}`")
else:
st.sidebar.markdown(f"**Chat Model:** `{OLLAMA_MODEL_CHAT}`")
st.sidebar.markdown(f"**Embed Model:** `{OLLAMA_MODEL_EMBED}`")
st.sidebar.markdown("*Set `LLM_PROVIDER`, model names, and API keys (`OPENAI_API_KEY`, `LANGCHAIN_API_KEY`, `LANGCHAIN_PROJECT`) in your `.env` file.*")
Code Explanation & Key Takeaways:
- Unified Configuration: All LLM and embedding model choices (OpenAI vs. Ollama, specific model names) are centrally managed via environment variables, ensuring flexibility.
- `ResearchState` (LangGraph State):
  - Defines the comprehensive state passed through the LangGraph agent. It includes the `question`, lists to hold `retrieved_documents` and `chat_history`, the `raw_answer`, the `final_answer`, and, importantly, the `vectorstore` instance.
  - `Annotated[List[Any], operator.add]` is used for the list fields (`retrieved_documents`, `chat_history`) so that new elements returned by a node are appended to the existing list in the state rather than overwriting it (see the small sketch below).
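To make the reducer behaviour concrete, here is a tiny, self-contained sketch (independent of the app above) showing how `operator.add` causes each node's partial return to be appended to the list instead of replacing it:

```python
import operator
from typing import Annotated, List, TypedDict

from langgraph.graph import StateGraph, START, END


class DemoState(TypedDict):
    # operator.add is the reducer: new values are concatenated onto the old list
    items: Annotated[List[str], operator.add]


def node_a(state: DemoState):
    return {"items": ["from A"]}  # appended, not replaced


def node_b(state: DemoState):
    return {"items": ["from B"]}  # appended after A's entry


graph = StateGraph(DemoState)
graph.add_node("a", node_a)
graph.add_node("b", node_b)
graph.add_edge(START, "a")
graph.add_edge("a", "b")
graph.add_edge("b", END)

result = graph.compile().invoke({"items": ["initial"]})
print(result["items"])  # ['initial', 'from A', 'from B']
```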
- LangGraph Agent Nodes (`retrieve_documents`, `generate_raw_answer`, `summarize_and_refine`):
  - Each function represents a distinct step in our research process.
  - `retrieve_documents`: accesses the `vectorstore` from the state, performs retrieval, and updates `retrieved_documents` in the state.
  - `generate_raw_answer`: takes the `question` and `retrieved_documents` to craft an initial answer, incorporating `chat_history` from the state for conversational context within the prompt.
  - `summarize_and_refine`: takes the `raw_answer` and `question` (plus `chat_history`) to produce a concise, polished `final_answer`. This mimics a self-correction or quality-assurance step.
- LangGraph Workflow (`StateGraph`):
  - A simple sequential graph: `START` -> `retrieve` -> `generate` -> `refine` -> `END`. This represents a clear, defined flow for answering a query from documents; a quick way to inspect the wiring is shown below.
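If you want to double-check that the graph is wired the way the bullet above describes, compiled LangGraph graphs can render their own structure. This is an optional snippet, not part of the app code above; `draw_mermaid()` returns a Mermaid diagram string you can paste into any Mermaid renderer.

```python
# Optional sanity check: print the compiled graph's structure as a Mermaid diagram.
# Expect to see START -> retrieve -> generate -> refine -> END.
print(research_agent_app.get_graph().draw_mermaid())
```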
- Streamlit Integration with RAG and LangGraph:
  - Document Upload: `st.sidebar.file_uploader` for PDFs. The uploaded file is saved to a temporary location, loaded by `PyPDFLoader`, chunked by `RecursiveCharacterTextSplitter`, embedded by our chosen embedding model, and stored in Chroma.
  - `st.session_state.vectorstore`: the `Chroma` instance (containing our document's embeddings) is stored here. This is crucial because Streamlit reruns the script on every interaction, and we don't want to re-process the PDF each time.
  - `RunnableWithMessageHistory`: the top-level LangChain construct managing the entire conversation's history.
  - `session_history_store`: a simple dictionary acts as an in-memory store for different `session_id`s, allowing multiple users or separate conversations within the same app instance.
  - `_prepare_agent_input`: a critical adapter function. `RunnableWithMessageHistory` passes `input` (the current user query string) and `history` (the list of `BaseMessage` objects for the conversation so far), while our LangGraph `ResearchState` expects a different structure (`question`, `chat_history`, `vectorstore`, and so on). This adapter maps the `RunnableWithMessageHistory` input format to the `ResearchState` format that `research_agent_app` expects.
  - `agent_runner_chain`: combines the input adapter with the compiled LangGraph agent.
  - `final_conversational_chain`: the outermost runnable that `st.chat_input` ultimately invokes, managing the session history.
  - Chat Display: messages are retrieved from `st.session_state.messages` and displayed using `st.chat_message`. Critically, `sources` (retrieved document snippets) are extracted from the `final_state` of the LangGraph agent and displayed under an `st.expander` for transparency.
  - User Experience: `st.spinner` provides feedback during document processing and agent thinking. If you want to surface the agent's intermediate steps instead of a single final answer, see the streaming sketch below.
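As noted in the in-code comments, the app displays one final answer per question. If you prefer to show intermediate progress, one option is to call the compiled graph directly with LangGraph's `stream()` and `stream_mode="updates"`, which yields each node's state update as it completes. The sketch below bypasses `RunnableWithMessageHistory` for brevity and assumes the `prompt` and session vector store from the app above:

```python
# Sketch: stream per-node updates from the compiled graph directly.
# `initial_state` is built the same way _prepare_agent_input builds it.
initial_state = {
    "question": prompt,
    "retrieved_documents": [],
    "raw_answer": "",
    "final_answer": "",
    "chat_history": [],
    "vectorstore": st.session_state.vectorstore,
}

for update in research_agent_app.stream(initial_state, stream_mode="updates"):
    # Each `update` is a dict keyed by the node that just finished,
    # e.g. {"retrieve": {...}}, {"generate": {...}}, {"refine": {...}}
    for node_name, node_output in update.items():
        st.write(f"Finished step: {node_name}")
        if node_name == "refine":
            st.markdown(node_output["final_answer"])
```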
This comprehensive application demonstrates a powerful integration of a Streamlit frontend, a LangGraph multi-step agent, a RAG pipeline for document querying, and robust session management. It’s a testament to how LangChain empowers developers to build sophisticated GenAI solutions.
Key Takeaway
Day 29 was the culmination of our journey: building a comprehensive GenAI Research Assistant! This full-stack application integrates Streamlit for an intuitive chat UI, a multi-step LangGraph agent for orchestrated intelligence, and RAG with file uploads for grounded answers from user documents. We leveraged st.session_state for persistence and connected all the dots from previous days. This project showcases the immense power of combining LangChain’s modular components to solve complex, real-world problems.
