Welcome to Day 29 of #30DaysOfLangChain – LangChain 0.3 Edition! We’ve covered a vast landscape, from the fundamentals of LLMs and basic chains to building sophisticated LangGraph agents, creating interactive UIs with Streamlit, exposing APIs with FastAPI, and ensuring observability with LangSmith.
Today is our capstone project: consolidating all this knowledge into a single, comprehensive GenAI application. This is where theory meets practice, and you’ll see how seamlessly these components integrate to form a powerful system.
The Comprehensive GenAI Application: A Research Assistant
For our final project, we’ll build a “Document Research Assistant” that lives in a Streamlit chat interface. This assistant will allow users to:
- Upload their own PDF documents: Providing a personal knowledge base. (Day 23)
- Ask complex questions: Interacting through a natural chat interface. (Day 22)
- Leverage RAG: Grounding answers in the uploaded documents to reduce hallucinations. (Day 23)
- Employ a Multi-step LangGraph Agent: Orchestrating the retrieval and summarization process for robust answers. (Day 28)
- Utilize LLMs (local or cloud): Configurable for either OpenAI or Ollama. (Day 21, Day 24)
- (Implicitly) Enable LangSmith Tracing: For full observability and debugging. (Day 26)
This application mimics a common real-world use case: allowing users to query their specific, private documents with intelligent AI assistance.
Architecting Our Research Assistant
Here’s how the different components will fit together:
- Frontend (Streamlit):
  - `st.file_uploader`: to accept PDF documents.
  - `st.session_state`: to persist the chat history (`messages`) and the processed document's vector store (`vectorstore`).
  - `st.chat_input` & `st.chat_message`: for the conversational interface.
  - Visual feedback: spinners and success/error messages.
- Backend Logic (LangChain & LangGraph):
  - Document Processing (when a PDF is uploaded):
    - `PyPDFLoader`: loads the PDF content.
    - `RecursiveCharacterTextSplitter`: chunks the text into smaller, manageable pieces.
    - `OpenAIEmbeddings` / `OllamaEmbeddings`: creates vector embeddings for the chunks.
    - `Chroma`: stores the embeddings in a local, in-memory vector store.
  - LangGraph Research Agent: the brain of our application, a simple graph with a sequential flow:
    - `retrieve_documents` node: takes the user's question, accesses the `vectorstore`, and retrieves the most relevant document chunks.
    - `generate_raw_answer` node: uses the retrieved chunks and the original question to formulate a comprehensive answer with an LLM.
    - `summarize_and_refine` node: summarizes and refines the raw answer so it is concise and directly addresses the user's query, adding another layer of quality control.
  - `RunnableWithMessageHistory`: while the LangGraph agent manages its internal state, the overall Streamlit app uses `InMemoryChatMessageHistory` via `RunnableWithMessageHistory` to pass the full conversation context into the LangGraph agent on each turn.
Putting It All Together: The Code
Before you run the code:
- Install the necessary libraries:
  `pip install streamlit langchain-openai langchain-ollama langchain-community chromadb pypdf langchain-text-splitters langgraph python-dotenv`
- Ensure your `OPENAI_API_KEY` is set if you are using OpenAI.
- If you are using Ollama, make sure it is running and that you have pulled your desired chat model (e.g., `ollama pull llama2`) and an embedding model (e.g., `ollama pull nomic-embed-text`).
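For reference, here is what a minimal `.env` file for this app might look like. This is an illustrative sketch: the variable names match the code below (plus the LangSmith variables from Day 26), but the values are placeholders you should replace with your own.

```
# .env — example values only; adjust to your setup
LLM_PROVIDER=ollama                 # or "openai"
OLLAMA_MODEL_CHAT=llama2
OLLAMA_MODEL_EMBED=nomic-embed-text
OPENAI_MODEL_CHAT=gpt-3.5-turbo
OPENAI_MODEL_EMBED=text-embedding-3-small
OPENAI_API_KEY=sk-...               # required only when LLM_PROVIDER=openai

# Optional: LangSmith tracing (Day 26)
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=...
LANGCHAIN_PROJECT=document-research-assistant
```

With that in place, here is the full application: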
import streamlit as st
import os
import tempfile
from typing import List, TypedDict, Annotated, Dict, Any, Union
import operator
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_core.messages import HumanMessage, AIMessage, BaseMessage
from langgraph.graph import StateGraph, END, START
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.chat_history import InMemoryChatMessageHistory # For session history
# Load environment variables
from dotenv import load_dotenv
load_dotenv()
# --- Configuration for LLM and Embeddings ---
LLM_PROVIDER = os.getenv("LLM_PROVIDER", "ollama").lower() # 'openai' or 'ollama'
OLLAMA_MODEL_CHAT = os.getenv("OLLAMA_MODEL_CHAT", "llama2").lower()
OLLAMA_MODEL_EMBED = os.getenv("OLLAMA_MODEL_EMBED", "nomic-embed-text").lower()
OPENAI_MODEL_CHAT = os.getenv("OPENAI_MODEL_CHAT", "gpt-3.5-turbo")
OPENAI_MODEL_EMBED = os.getenv("OPENAI_MODEL_EMBED", "text-embedding-3-small")
# --- Initialize LLM and Embeddings ---
@st.cache_resource
def get_llm_and_embeddings():
"""Initializes and returns LLM and Embeddings based on provider."""
llm = None
embeddings = None
if LLM_PROVIDER == "openai":
if not os.getenv("OPENAI_API_KEY"):
st.error("OPENAI_API_KEY not set for OpenAI provider. Please set it in your .env file or environment variables.")
st.stop()
llm = ChatOpenAI(model=OPENAI_MODEL_CHAT, temperature=0.2)
embeddings = OpenAIEmbeddings(model=OPENAI_MODEL_EMBED)
elif LLM_PROVIDER == "ollama":
try:
llm = ChatOllama(model=OLLAMA_MODEL_CHAT, temperature=0.2)
# Test chat LLM connection
llm.invoke("test", config={"stream": False})
st.success(f"Successfully connected to Ollama chat model: {OLLAMA_MODEL_CHAT}")
except Exception as e:
st.error(f"Error connecting to Ollama chat LLM '{OLLAMA_MODEL_CHAT}': {e}")
st.info(f"Please ensure Ollama is running and you have pulled the model: `ollama pull {OLLAMA_MODEL_CHAT}`")
st.stop()
try:
embeddings = OllamaEmbeddings(model=OLLAMA_MODEL_EMBED)
embeddings.embed_query("test")
st.success(f"Successfully connected to Ollama embedding model: {OLLAMA_MODEL_EMBED}")
except Exception as e:
st.error(f"Error connecting to Ollama embedding model '{OLLAMA_MODEL_EMBED}': {e}")
st.info(f"Please ensure Ollama is running and you have pulled the embedding model: `ollama pull {OLLAMA_MODEL_EMBED}`")
st.stop()
else:
st.error(f"Invalid LLM provider: {LLM_PROVIDER}. Must be 'openai' or 'ollama'.")
st.stop()
return llm, embeddings
# Initialize LLMs and Embeddings (cached for efficiency)
rag_llm, embeddings_model = get_llm_and_embeddings()
# --- LangGraph Agent State Definition ---
class ResearchState(TypedDict):
"""
Represents the state of our Research Assistant agent.
- question: The user's current question.
- retrieved_documents: List of retrieved documents from the vector store.
- raw_answer: The initial answer generated by the LLM based on retrieved docs.
- final_answer: The summarized/refined answer presented to the user.
- chat_history: Cumulative chat history for context.
- vectorstore: The ChromaDB instance for retrieval.
"""
question: str
retrieved_documents: Annotated[List[Any], operator.add] # Using Any for Document objects
raw_answer: str
final_answer: str
chat_history: Annotated[List[BaseMessage], operator.add] # For LangGraph internal message passing
vectorstore: Chroma # The actual vectorstore instance
# --- LangGraph Agent Nodes ---
def retrieve_documents(state: ResearchState):
"""
Retrieval Agent: Retrieves relevant documents from the vector store.
"""
print("\n---AGENT: RETRIEVING DOCUMENTS---")
question = state["question"]
vectorstore = state["vectorstore"]
if not vectorstore:
raise ValueError("Vector store not initialized. Please upload a document.")
retriever = vectorstore.as_retriever()
docs = retriever.invoke(question)
# Store retrieved documents in the state
return {"retrieved_documents": docs, "chat_history": [AIMessage(content="Retrieved relevant documents.")]}
def generate_raw_answer(state: ResearchState):
"""
Generator Agent: Generates an initial answer based on retrieved documents.
"""
print("\n---AGENT: GENERATING RAW ANSWER---")
question = state["question"]
retrieved_documents = state["retrieved_documents"]
chat_history = state["chat_history"] # Pass history for better context awareness
context_content = "\n\n".join([doc.page_content for doc in retrieved_documents])
# Prompt for generating the raw answer
raw_answer_prompt = ChatPromptTemplate.from_messages([
("system", "You are a helpful research assistant. Use the following context to answer the user's question. "
"If you cannot find the answer in the context, state that you don't know based on the provided information. "
"Be thorough but focus only on the provided context.\n\nContext: {context}\n\nPrevious conversation:\n{chat_history}"),
("human", "{question}")
])
chain = raw_answer_prompt | rag_llm | StrOutputParser()
# Format chat history for the prompt
formatted_chat_history = "\n".join([f"{msg.type}: {msg.content}" for msg in chat_history])
raw_ans = chain.invoke({
"context": context_content,
"question": question,
"chat_history": formatted_chat_history
})
return {"raw_answer": raw_ans, "chat_history": [AIMessage(content="Generated initial answer.")]}
def summarize_and_refine(state: ResearchState):
"""
Summarizer/Refiner Agent: Summarizes and refines the raw answer.
"""
print("\n---AGENT: SUMMARIZING AND REFINING ANSWER---")
question = state["question"]
raw_answer = state["raw_answer"]
chat_history = state["chat_history"] # Pass history for context
# Prompt for summarizing/refining the answer
refine_prompt = ChatPromptTemplate.from_messages([
("system", "You are an expert editor. Review the following answer and the original question. "
"Refine the answer to be concise, clear, and directly address the question. "
"Remove any redundant information or conversational filler. "
"If the raw answer indicates it doesn't know, simply state that. "
"Question: {question}\n\nRaw Answer: {raw_answer}\n\nRefined Answer:"),
("human", "Refine the raw answer given the original question and previous conversation.")
])
chain = refine_prompt | rag_llm | StrOutputParser()
final_ans = chain.invoke({
"question": question,
"raw_answer": raw_answer,
"chat_history": "\n".join([f"{msg.type}: {msg.content}" for msg in chat_history])
})
return {"final_answer": final_ans, "chat_history": [AIMessage(content="Refined final answer.")]}
# --- Build the LangGraph Workflow ---
workflow = StateGraph(ResearchState)
# Add nodes
workflow.add_node("retrieve", retrieve_documents)
workflow.add_node("generate", generate_raw_answer)
workflow.add_node("refine", summarize_and_refine)
# Define the graph flow
workflow.add_edge(START, "retrieve")
workflow.add_edge("retrieve", "generate")
workflow.add_edge("generate", "refine")
workflow.add_edge("refine", END)
# Compile the graph
research_agent_app = workflow.compile()
# --- Streamlit App Setup ---
st.set_page_config(page_title="Document Research Assistant", page_icon="📝")
st.title("📝 Document Research Assistant")
st.markdown(f"*(LLM: {LLM_PROVIDER.capitalize()} {OPENAI_MODEL_CHAT if LLM_PROVIDER == 'openai' else OLLAMA_MODEL_CHAT}, Embeddings: {OPENAI_MODEL_EMBED if LLM_PROVIDER == 'openai' else OLLAMA_MODEL_EMBED})*")
st.markdown("---")
# --- Initialize session state ---
if "messages" not in st.session_state:
st.session_state.messages = [] # Stores {"role": ..., "content": ..., "sources": ...}
if "vectorstore" not in st.session_state:
st.session_state.vectorstore = None
# --- In-memory store for chat histories (for RunnableWithMessageHistory) ---
# This is a simple in-memory store for different user sessions.
# In production, use a persistent store like Redis or a database.
session_history_store: Dict[str, InMemoryChatMessageHistory] = {}
def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
"""Returns a new BaseChatMessageHistory instance for a given session ID."""
if session_id not in session_history_store:
session_history_store[session_id] = InMemoryChatMessageHistory()
return session_history_store[session_id]
# Wrap the LangGraph agent with RunnableWithMessageHistory for overall chat history management
# The research_agent_app expects a ResearchState dict as input, while
# RunnableWithMessageHistory (RWMH) provides `input` (the current user message string)
# and `history` (a list of BaseMessage). So we adapt between the two formats.
# Adapter to transform RWMH input to LangGraph ResearchState input format
def _prepare_agent_input(input_dict: dict) -> ResearchState:
current_question = input_dict["input"]
# RWMH gives us history as BaseMessage objects, which is what LangGraph expects for chat_history
history_for_agent = input_dict.get("history", [])
return {
"question": current_question,
"retrieved_documents": [], # Will be populated by 'retrieve' node
"raw_answer": "", # Will be populated by 'generate' node
"final_answer": "", # Will be populated by 'refine' node
"chat_history": history_for_agent + [HumanMessage(content=current_question)], # Full conversation for agent's context
"vectorstore": st.session_state.vectorstore # Pass the Streamlit session's vectorstore
}
# Create a runnable that formats input for the agent and then invokes the agent
agent_runner_chain = _prepare_agent_input | research_agent_app
# The final chain that will be invoked by the Streamlit app, handling session history
final_conversational_chain = RunnableWithMessageHistory(
agent_runner_chain, # Our LangGraph agent wrapped in an input adapter
get_session_history,
input_messages_key="input", # Key for the current user message string
history_messages_key="history", # Key where history (list of BaseMessage) will be injected by RWMH
)
# --- Document Upload and Processing ---
uploaded_file = st.sidebar.file_uploader(
"Upload a PDF document to chat with",
type="pdf",
accept_multiple_files=False,
key="pdf_uploader"
)
# If a file is uploaded and no vectorstore exists in session state
if uploaded_file and st.session_state.vectorstore is None:
with st.spinner("Processing document... This may take a moment."):
try:
# 1. Save uploaded file to a temporary file
with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp_file:
tmp_file.write(uploaded_file.getvalue())
tmp_file_path = tmp_file.name
# 2. Load the document
loader = PyPDFLoader(tmp_file_path)
docs = loader.load()
if not docs:
st.warning("Could not extract text from the PDF. Please try another file.")
os.unlink(tmp_file_path) # Clean up temp file
st.stop()
# 3. Split documents into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
splits = text_splitter.split_documents(docs)
# 4. Create embeddings and store in Chroma
st.session_state.vectorstore = Chroma.from_documents(
documents=splits,
embedding=embeddings_model
)
st.sidebar.success(f"Document '{uploaded_file.name}' processed and ready for questions!")
# Clean up temporary file after processing
os.unlink(tmp_file_path)
except Exception as e:
st.sidebar.error(f"Error processing document: {e}")
st.session_state.vectorstore = None # Reset vectorstore on error
if 'tmp_file_path' in locals() and os.path.exists(tmp_file_path):
os.unlink(tmp_file_path)
# --- Display chat messages from history ---
# Initialize session_id for this Streamlit session (a simple unique ID)
# In a real multi-user app, this would come from user login or a more robust mechanism.
if "session_id" not in st.session_state:
import uuid
st.session_state.session_id = str(uuid.uuid4())
for message in st.session_state.messages:
with st.chat_message(message["role"]):
st.markdown(message["content"])
if message["sources"]:
with st.expander("Sources"):
for source in message["sources"]:
st.text(source)
# --- Handle user input ---
if prompt := st.chat_input("Ask a question about the document..."):
if st.session_state.vectorstore is None:
st.warning("Please upload a PDF document first to enable the Research Assistant.")
else:
# Add user message to chat history and display it
st.session_state.messages.append({"role": "user", "content": prompt, "sources": []})
with st.chat_message("user"):
st.markdown(prompt)
with st.chat_message("assistant"):
with st.spinner("Researching and generating answer..."):
full_response = ""
response_container = st.empty()
# The LangGraph agent returns a complete final state at END rather than a
# stream of chunks, so we invoke the conversational chain once and then
# display the final answer. RunnableWithMessageHistory injects the session
# history ("history") alongside the current message ("input"), and
# _prepare_agent_input maps both onto the ResearchState the agent expects.
# For token- or step-level streaming you would instead surface the agent's
# stream()/astream_events() output (as in Day 25 and the FastAPI streaming
# example) and update an st.empty() placeholder per chunk (as in Day 22),
# which adds complexity we skip in this capstone demo.
final_state = final_conversational_chain.invoke(
{"input": prompt},
config={"configurable": {"session_id": st.session_state.session_id}}
)
generated_answer = final_state['final_answer']
# Display the final answer
response_container.markdown(generated_answer)
# Collect sources from the final_state if available
# The 'retrieve_documents' node populates `retrieved_documents`
# Let's ensure our `ResearchState` allows for this.
sources_for_display = []
if final_state and final_state.get("retrieved_documents"):
for i, doc in enumerate(final_state["retrieved_documents"]):
# Assuming documents have page_content and metadata
page_info = doc.metadata.get('page', 'N/A')
source_text = f"Page {page_info}: {doc.page_content[:300]}..."
sources_for_display.append(source_text)
st.session_state.messages.append({
"role": "assistant",
"content": generated_answer,
"sources": sources_for_display
})
if sources_for_display:
with st.expander("Sources"):
for source in sources_for_display:
st.text(source)
# --- How to run this app ---
st.sidebar.markdown("---")
st.sidebar.markdown("### How to run")
st.sidebar.markdown("1. Save this code as `day29-research-assistant.py`")
st.sidebar.markdown("2. Open your terminal in the same directory.")
st.sidebar.markdown("3. Run the command: `streamlit run day29-research-assistant.py`")
st.sidebar.markdown("4. Your browser will open with the Research Assistant application.")
st.sidebar.markdown("---")
st.sidebar.markdown("#### Dependencies")
st.sidebar.markdown("`pip install streamlit langchain-openai langchain-ollama chromadb pypdf langchain-text-splitters langgraph python-dotenv`")
st.sidebar.markdown("---")
st.sidebar.markdown("#### LLM Configuration")
st.sidebar.markdown(f"**Provider:** `{LLM_PROVIDER.capitalize()}`")
if LLM_PROVIDER == 'openai':
st.sidebar.markdown(f"**Chat Model:** `{OPENAI_MODEL_CHAT}`")
st.sidebar.markdown(f"**Embed Model:** `{OPENAI_MODEL_EMBED}`")
else:
st.sidebar.markdown(f"**Chat Model:** `{OLLAMA_MODEL_CHAT}`")
st.sidebar.markdown(f"**Embed Model:** `{OLLAMA_MODEL_EMBED}`")
st.sidebar.markdown("*Set `LLM_PROVIDER`, model names, and API keys (`OPENAI_API_KEY`, `LANGCHAIN_API_KEY`, `LANGCHAIN_PROJECT`) in your `.env` file.*")
Code Explanation & Key Takeaways:
- Unified Configuration: All LLM and embedding model choices (OpenAI vs. Ollama, specific model names) are centrally managed via environment variables, ensuring flexibility.
- `ResearchState` (LangGraph State):
  - Defines the comprehensive state passed through the LangGraph agent. It includes the `question`, lists to hold `retrieved_documents` and `chat_history`, the `raw_answer`, the `final_answer`, and, importantly, the `vectorstore` instance.
  - `Annotated[List[Any], operator.add]` is used for the list fields (`retrieved_documents`, `chat_history`) so that new elements returned by a node are appended to the existing list in the state rather than overwriting it (see the small sketch below).
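To make the reducer behaviour concrete, here is a tiny, self-contained sketch (independent of the app above) showing how `operator.add` causes each node's partial return to be appended to the list instead of replacing it:

```python
import operator
from typing import Annotated, List, TypedDict

from langgraph.graph import StateGraph, START, END


class DemoState(TypedDict):
    # operator.add is the reducer: new values are concatenated onto the old list
    items: Annotated[List[str], operator.add]


def node_a(state: DemoState):
    return {"items": ["from A"]}  # appended, not replaced


def node_b(state: DemoState):
    return {"items": ["from B"]}  # appended after A's entry


graph = StateGraph(DemoState)
graph.add_node("a", node_a)
graph.add_node("b", node_b)
graph.add_edge(START, "a")
graph.add_edge("a", "b")
graph.add_edge("b", END)

result = graph.compile().invoke({"items": ["initial"]})
print(result["items"])  # ['initial', 'from A', 'from B']
```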
- LangGraph Agent Nodes (`retrieve_documents`, `generate_raw_answer`, `summarize_and_refine`):
  - Each function represents a distinct step in our research process.
  - `retrieve_documents`: accesses the `vectorstore` from the state, performs retrieval, and updates `retrieved_documents` in the state.
  - `generate_raw_answer`: takes the `question` and `retrieved_documents` to craft an initial answer, incorporating `chat_history` from the state for conversational context within the prompt.
  - `summarize_and_refine`: takes the `raw_answer` and `question` (plus `chat_history`) to produce a concise, polished `final_answer`. This mimics a self-correction or quality-assurance step.
- LangGraph Workflow (`StateGraph`):
  - A simple sequential graph: `START` -> `retrieve` -> `generate` -> `refine` -> `END`. This represents a clear, defined flow for answering a query from documents; a quick way to inspect the wiring is shown below.
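If you want to double-check that the graph is wired the way the bullet above describes, compiled LangGraph graphs can render their own structure. This is an optional snippet, not part of the app code above; `draw_mermaid()` returns a Mermaid diagram string you can paste into any Mermaid renderer.

```python
# Optional sanity check: print the compiled graph's structure as a Mermaid diagram.
# Expect to see START -> retrieve -> generate -> refine -> END.
print(research_agent_app.get_graph().draw_mermaid())
```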
- Streamlit Integration with RAG and LangGraph:
  - Document Upload: `st.sidebar.file_uploader` for PDFs. The uploaded file is saved to a temporary location, loaded by `PyPDFLoader`, chunked by `RecursiveCharacterTextSplitter`, embedded by our chosen embedding model, and stored in Chroma.
  - `st.session_state.vectorstore`: the `Chroma` instance (containing our document's embeddings) is stored here. This is crucial because Streamlit reruns the script on every interaction, and we don't want to re-process the PDF each time.
  - `RunnableWithMessageHistory`: the top-level LangChain construct managing the entire conversation's history.
  - `session_history_store`: a simple dictionary acts as an in-memory store for different `session_id`s, allowing multiple users or separate conversations within the same app instance.
  - `_prepare_agent_input`: a critical adapter function. `RunnableWithMessageHistory` passes `input` (the current user query string) and `history` (the list of `BaseMessage` objects for the conversation so far), while our LangGraph `ResearchState` expects a different structure (`question`, `chat_history`, `vectorstore`, and so on). This adapter maps the `RunnableWithMessageHistory` input format to the `ResearchState` format that `research_agent_app` expects.
  - `agent_runner_chain`: combines the input adapter with the compiled LangGraph agent.
  - `final_conversational_chain`: the outermost runnable that `st.chat_input` ultimately invokes, managing the session history.
  - Chat Display: messages are retrieved from `st.session_state.messages` and displayed using `st.chat_message`. Critically, `sources` (retrieved document snippets) are extracted from the `final_state` of the LangGraph agent and displayed under an `st.expander` for transparency.
  - User Experience: `st.spinner` provides feedback during document processing and agent thinking. If you want to surface the agent's intermediate steps instead of a single final answer, see the streaming sketch below.
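As noted in the in-code comments, the app displays one final answer per question. If you prefer to show intermediate progress, one option is to call the compiled graph directly with LangGraph's `stream()` and `stream_mode="updates"`, which yields each node's state update as it completes. The sketch below bypasses `RunnableWithMessageHistory` for brevity and assumes the `prompt` and session vector store from the app above:

```python
# Sketch: stream per-node updates from the compiled graph directly.
# `initial_state` is built the same way _prepare_agent_input builds it.
initial_state = {
    "question": prompt,
    "retrieved_documents": [],
    "raw_answer": "",
    "final_answer": "",
    "chat_history": [],
    "vectorstore": st.session_state.vectorstore,
}

for update in research_agent_app.stream(initial_state, stream_mode="updates"):
    # Each `update` is a dict keyed by the node that just finished,
    # e.g. {"retrieve": {...}}, {"generate": {...}}, {"refine": {...}}
    for node_name, node_output in update.items():
        st.write(f"Finished step: {node_name}")
        if node_name == "refine":
            st.markdown(node_output["final_answer"])
```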
This comprehensive application demonstrates a powerful integration of a Streamlit frontend, a LangGraph multi-step agent, a RAG pipeline for document querying, and robust session management. It’s a testament to how LangChain empowers developers to build sophisticated GenAI solutions.
Key Takeaway
Day 29 was the culmination of our journey: building a comprehensive GenAI Research Assistant! This full-stack application integrates Streamlit for an intuitive chat UI, a multi-step LangGraph agent for orchestrated intelligence, and RAG with file uploads for grounded answers from user documents. We leveraged st.session_state for persistence and connected all the dots from previous days. This project showcases the immense power of combining LangChain’s modular components to solve complex, real-world problems.
