Welcome to Day 23 of #30DaysOfLangChain – LangChain 0.3 Edition! Yesterday, we built a simple chat interface with Streamlit. That was a great start, but the real power of Generative AI often lies in its ability to work with your specific, private data. Today, we’re taking our Streamlit skills to the next level by building an advanced application that integrates Retrieval-Augmented Generation (RAG) with file uploads.

This means users will be able to upload their own documents (like PDFs!), and our LangChain application will intelligently answer questions based only on the content of those uploaded files. This is a fundamental pattern for building enterprise-grade chatbots, research assistants, and knowledge management systems.

The Power of RAG with User-Uploaded Data

Traditional LLMs are trained on vast amounts of public data. While impressive, they lack knowledge about your specific documents, internal policies, or private conversations. RAG addresses this by:

  • Providing Context: Retrieving relevant snippets from your private knowledge base.
  • Reducing Hallucinations: Grounding the LLM’s answers in factual, provided information.
  • Ensuring Privacy: Keeping your sensitive data within your control, especially when combined with local LLMs (as discussed on Day 21).

When you combine RAG with file uploads, you empower users to personalize their AI experience, making the AI truly useful for their unique needs.

Key Streamlit Features for File-Based RAG

  1. st.file_uploader:
    • This widget allows users to upload one or more files directly into your Streamlit application.
    • You can specify accepted file types (e.g., type=["pdf", "txt"]) and whether to accept multiple files.
    • When a file is uploaded, it returns an UploadedFile object, which behaves like a standard file object, so you can read its content directly (for example via .getvalue() or .read()).
  2. st.session_state (Revisited):
    • For a RAG application, processing a document (chunking, embedding, indexing) can be time-consuming. You don’t want to do this every time the Streamlit app reruns (which happens with every user interaction).
    • st.session_state is crucial for storing the generated vector store and the chat history, ensuring they persist across reruns for the rest of the user’s session once the document is processed. (A minimal sketch of both pieces follows this list.)
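
To see these two pieces in isolation before the full project, here is a minimal sketch. It only records the uploaded file’s size; the session keys (processed_name, num_bytes) are illustrative and not part of today’s project code.

import streamlit as st

# Cache expensive, one-time work in st.session_state so it survives
# Streamlit's script reruns (every widget interaction reruns the script).
uploaded = st.file_uploader("Upload a PDF", type=["pdf"])

if "processed_name" not in st.session_state:
    st.session_state.processed_name = None

if uploaded is not None and st.session_state.processed_name != uploaded.name:
    # In the real app, chunking/embedding/indexing would happen here.
    st.session_state.num_bytes = len(uploaded.getvalue())
    st.session_state.processed_name = uploaded.name

if st.session_state.processed_name:
    st.write(f"Cached: {st.session_state.processed_name} ({st.session_state.num_bytes} bytes)")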

The RAG Pipeline for Uploaded Documents

The workflow for an uploaded document in a RAG system typically involves these steps (a condensed code sketch follows the list):

  1. Load: Read the content from the uploaded file. For PDFs, we’ll use PyPDFLoader after saving the UploadedFile to a temporary file.
  2. Split: Break down the large document text into smaller, manageable “chunks” (e.g., 500-1000 characters with some overlap) using a RecursiveCharacterTextSplitter. This is vital for efficient retrieval and fitting within LLM context windows.
  3. Embed: Convert each text chunk into a numerical vector (embedding) using an embedding model (OpenAIEmbeddings or OllamaEmbeddings).
  4. Store: Save these embeddings into a vector store (like ChromaDB), making them searchable via similarity.
  5. Retrieve: When a user asks a question, convert the question into an embedding and find the most similar chunks from the vector store.
  6. Generate: Pass the retrieved chunks (context) and the user’s question to the LLM to generate a grounded answer.
  7. Display Sources (Transparency!): Show the user which parts of their document were used to formulate the answer.
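
Condensed into plain Python, steps 1–5 look roughly like this. The file path and question are placeholders; the full Streamlit app below wires the same calls into the UI and handles steps 6 and 7 in the chat loop.

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_ollama import OllamaEmbeddings

# 1. Load: read the PDF into LangChain Document objects
docs = PyPDFLoader("example.pdf").load()  # "example.pdf" is a placeholder path

# 2. Split: break the text into overlapping chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
splits = splitter.split_documents(docs)

# 3 & 4. Embed and Store: index the chunks in a Chroma vector store
vectorstore = Chroma.from_documents(splits, embedding=OllamaEmbeddings(model="nomic-embed-text"))

# 5. Retrieve: find the chunks most similar to a question
relevant_chunks = vectorstore.as_retriever().invoke("What does the document say about X?")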

Project: Interactive RAG with File Uploads in Streamlit

Our project today will build a Streamlit application that allows:

  • Users to upload PDF documents.
  • The app to process these documents (chunk, embed, index into ChromaDB).
  • Users to then ask questions related to the document’s content.
  • The app to display the LLM’s answer along with the source chunks from the document.

Before you run the code:

  • Install necessary libraries: pip install streamlit langchain-openai langchain-ollama langchain-community langchain-text-splitters chromadb pypdf unstructured tiktoken python-dotenv
  • Ensure you have your OPENAI_API_KEY set if using OpenAI.
  • If using Ollama, ensure it’s running and you’ve pulled a chat model (e.g., ollama pull llama2) and an embedding model (e.g., ollama pull nomic-embed-text). Our code defaults to llama2 for chat and nomic-embed-text for embeddings.
import streamlit as st
import os
import tempfile

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_ollama import ChatOllama, OllamaEmbeddings # For local LLM and embeddings
from langchain_community.document_loaders import PyPDFLoader # For loading PDFs
from langchain_text_splitters import RecursiveCharacterTextSplitter # For chunking
from langchain_community.vectorstores import Chroma # Our vector store
from langchain_core.messages import HumanMessage, AIMessage
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Load environment variables
from dotenv import load_dotenv
load_dotenv()

# --- Configuration for LLM and Embeddings ---
LLM_PROVIDER = os.getenv("LLM_PROVIDER", "ollama").lower() # 'openai' or 'ollama'
OLLAMA_MODEL_CHAT = os.getenv("OLLAMA_MODEL_CHAT", "llama2").lower()
OLLAMA_MODEL_EMBED = os.getenv("OLLAMA_MODEL_EMBED", "nomic-embed-text").lower()
OPENAI_MODEL_CHAT = os.getenv("OPENAI_MODEL_CHAT", "gpt-3.5-turbo")
OPENAI_MODEL_EMBED = os.getenv("OPENAI_MODEL_EMBED", "text-embedding-ada-002")

# --- Initialize LLM and Embeddings ---
@st.cache_resource
def get_llm_and_embeddings():
    """Initializes and returns LLM and Embeddings based on provider."""
    llm = None
    embeddings = None

    if LLM_PROVIDER == "openai":
        if not os.getenv("OPENAI_API_KEY"):
            st.error("OPENAI_API_KEY not set for OpenAI provider. Please set it.")
            st.stop()
        llm = ChatOpenAI(model=OPENAI_MODEL_CHAT, temperature=0.3)
        embeddings = OpenAIEmbeddings(model=OPENAI_MODEL_EMBED)
    elif LLM_PROVIDER == "ollama":
        try:
            llm = ChatOllama(model=OLLAMA_MODEL_CHAT, temperature=0.3)
            # Test chat LLM connection
            llm.invoke("test", config={"stream": False})
            st.success(f"Successfully connected to Ollama chat model: {OLLAMA_MODEL_CHAT}")
        except Exception as e:
            st.error(f"Error connecting to Ollama chat LLM '{OLLAMA_MODEL_CHAT}': {e}")
            st.info(f"Please ensure Ollama is running and you have pulled the model: `ollama pull {OLLAMA_MODEL_CHAT}`")
            st.stop()
        
        try:
            embeddings = OllamaEmbeddings(model=OLLAMA_MODEL_EMBED)
            # Test embedding model connection
            embeddings.embed_query("test")
            st.success(f"Successfully connected to Ollama embedding model: {OLLAMA_MODEL_EMBED}")
        except Exception as e:
            st.error(f"Error connecting to Ollama embedding model '{OLLAMA_MODEL_EMBED}': {e}")
            st.info(f"Please ensure Ollama is running and you have pulled the embedding model: `ollama pull {OLLAMA_MODEL_EMBED}`")
            st.stop()
    else:
        st.error(f"Invalid LLM provider: {LLM_PROVIDER}. Must be 'openai' or 'ollama'.")
        st.stop()
    
    return llm, embeddings

# --- Streamlit App Setup ---
# st.set_page_config must be the first Streamlit command, so call it before
# get_llm_and_embeddings(), which renders st.success / st.error messages.
st.set_page_config(page_title="RAG Chat with Your Documents", page_icon="📚")

llm, embeddings = get_llm_and_embeddings()

st.title("📚 RAG Chat with Your Documents")
st.markdown(f"*(LLM: {LLM_PROVIDER.capitalize()} {OPENAI_MODEL_CHAT if LLM_PROVIDER == 'openai' else OLLAMA_MODEL_CHAT}, Embeddings: {OPENAI_MODEL_EMBED if LLM_PROVIDER == 'openai' else OLLAMA_MODEL_EMBED})*")
st.markdown("---")

# --- Initialize chat history in session state ---
if "messages" not in st.session_state:
    st.session_state.messages = [] # Stores list of {"role": "user" or "assistant", "content": "message text", "sources": []}

# --- Initialize vector store in session state ---
# This will hold our document embeddings
if "vectorstore" not in st.session_state:
    st.session_state.vectorstore = None

# --- Document Upload and Processing ---
uploaded_file = st.sidebar.file_uploader(
    "Upload a PDF document",
    type="pdf",
    accept_multiple_files=False,
    key="pdf_uploader"
)

if uploaded_file and st.session_state.vectorstore is None:
    with st.spinner("Processing document... This may take a moment."):
        try:
            # 1. Save uploaded file to a temporary file
            with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp_file:
                tmp_file.write(uploaded_file.getvalue())
                tmp_file_path = tmp_file.name
            
            # 2. Load the document
            loader = PyPDFLoader(tmp_file_path)
            docs = loader.load()
            
            if not docs:
                st.warning("Could not extract text from the PDF. Please try another file.")
                os.unlink(tmp_file_path) # Clean up temp file
                st.stop()

            # 3. Split documents into chunks
            text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
            splits = text_splitter.split_documents(docs)

            # 4. Create embeddings and store in Chroma
            st.session_state.vectorstore = Chroma.from_documents(
                documents=splits,
                embedding=embeddings
            )
            st.sidebar.success(f"Document '{uploaded_file.name}' processed and ready for questions!")
            # Clean up temporary file after processing
            os.unlink(tmp_file_path)
        except Exception as e:
            st.sidebar.error(f"Error processing document: {e}")
            st.session_state.vectorstore = None # Reset vectorstore on error
            if 'tmp_file_path' in locals() and os.path.exists(tmp_file_path):
                os.unlink(tmp_file_path)
    
# --- Display chat messages from history ---
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])
        if message["sources"]:
            with st.expander("Sources"):
                for source in message["sources"]:
                    st.text(source)

# --- Handle user input ---
if prompt := st.chat_input("Ask a question about the document..."):
    if st.session_state.vectorstore is None:
        st.warning("Please upload a PDF document first to enable RAG.")
    else:
        # Add user message to chat history and display it
        st.session_state.messages.append({"role": "user", "content": prompt, "sources": []})
        with st.chat_message("user"):
            st.markdown(prompt)

        # Prepare RAG chain
        retriever = st.session_state.vectorstore.as_retriever()
        
        # Define RAG prompt
        rag_prompt_template = ChatPromptTemplate.from_messages([
            ("system", "You are an AI assistant. Use the following retrieved context to answer the question. "
                       "If you don't know the answer, state that you don't know. Keep your answer concise and to the point. "
                       "Context: {context}"),
            ("human", "{question}")
        ])

        # Reference LCEL RAG chain (retriever -> prompt -> LLM). Below, the
        # retriever is called separately so the source chunks can be displayed.
        rag_chain = (
            {"context": retriever, "question": RunnablePassthrough()}
            | rag_prompt_template
            | llm
            | StrOutputParser()
        )

        with st.chat_message("assistant"):
            with st.spinner("Retrieving and Generating..."):
                full_response = ""
                response_container = st.empty()
                
                # We need to invoke the retriever separately to get sources
                retrieved_docs = retriever.invoke(prompt)
                context_content = "\n\n".join([doc.page_content for doc in retrieved_docs])
                
                # Now, invoke the full chain with the context and question
                # For streaming, we pass the context and question to the prompt directly
                # and then stream the LLM response.
                chain_with_context = (
                    rag_prompt_template | llm | StrOutputParser()
                )
                
                for chunk in chain_with_context.stream({
                    "context": context_content,
                    "question": prompt
                }):
                    full_response += chunk
                    response_container.markdown(full_response + "▌")
                
                response_container.markdown(full_response)

                # Collect and display sources
                source_texts = []
                for i, doc in enumerate(retrieved_docs):
                    source_texts.append(f"Source {i+1} (Page {doc.metadata.get('page', 'N/A')}): {doc.page_content[:200]}...") # Display first 200 chars

                st.session_state.messages.append({"role": "assistant", "content": full_response, "sources": source_texts})
                
                if source_texts:
                    with st.expander("Sources"):
                        for source in source_texts:
                            st.text(source)


# --- How to run this app ---
st.sidebar.markdown("---")
st.sidebar.markdown("### How to run")
st.sidebar.markdown("1. Save this code as `day23-rag-file-upload.py`")
st.sidebar.markdown("2. Open your terminal in the same directory.")
st.sidebar.markdown("3. Run the command: `streamlit run day23-rag-file-upload.py`")
st.sidebar.markdown("4. Your browser will open with the RAG application.")
st.sidebar.markdown("---")
st.sidebar.markdown("#### Dependencies")
st.sidebar.markdown("`pip install streamlit langchain-openai langchain-ollama chromadb pypdf unstructured tiktoken python-dotenv`")
st.sidebar.markdown("---")
st.sidebar.markdown("#### Ollama Setup")
st.sidebar.markdown(f"Ensure Ollama is running and models pulled:")
st.sidebar.markdown(f"`ollama pull {OLLAMA_MODEL_CHAT}`")
st.sidebar.markdown(f"`ollama pull {OLLAMA_MODEL_EMBED}`")

Code Explanation & Key Takeaways:

  1. Dependencies: Notice the expanded pip install command. pypdf reads PDFs, chromadb provides our local vector store, langchain-community supplies PyPDFLoader and the Chroma integration, langchain-text-splitters provides RecursiveCharacterTextSplitter, and tiktoken handles OpenAI tokenization if you use OpenAIEmbeddings (or switch to RecursiveCharacterTextSplitter.from_tiktoken_encoder). unstructured is a more general document loader, but for simple PDFs, PyPDFLoader from langchain_community.document_loaders is sufficient.
  2. LLM and Embeddings Initialization (get_llm_and_embeddings()):
    • We now initialize both a ChatModel (for generation) and an Embeddings model (for vector creation).
    • @st.cache_resource: This decorator is vital! It caches the LLM and embeddings instances, meaning they are initialized only once per Streamlit session, not on every rerun. This saves significant time and resources.
    • Error handling is included to guide users if Ollama models are not pulled or API keys are missing.
    • The Ollama embedding model is configurable via OLLAMA_MODEL_EMBED (nomic-embed-text is a good general-purpose choice).
  3. st.session_state.vectorstore:
    • A new st.session_state key is introduced to store our Chroma vector store. It starts as None.
    • When a file is uploaded, the processing logic checks whether st.session_state.vectorstore is None. This ensures the document is processed only once per session; a second upload is ignored until the session state is cleared (for example, by refreshing the page).
  4. Document Upload and Processing Flow:
    • uploaded_file = st.sidebar.file_uploader(...): The file uploader is placed in the sidebar to keep the main chat area clean. type="pdf" restricts uploads to PDFs.
    • Temporary File Handling: PyPDFLoader requires a file path. uploaded_file.getvalue() retrieves the file’s bytes, which are then written to a tempfile.NamedTemporaryFile. This temporary file is crucial for allowing PyPDFLoader to read the content. It’s then cleaned up using os.unlink().
    • Loading, Splitting, Embedding, Storing: This is the standard RAG pipeline using PyPDFLoader, RecursiveCharacterTextSplitter, and Chroma.from_documents.
    • st.spinner(): Provides visual feedback to the user during the potentially long document processing step.
  5. RAG Chain Construction and Invocation:
    • retriever = st.session_state.vectorstore.as_retriever(): Once the vector store is populated, we get a retriever from it.
    • RAG Prompt: A ChatPromptTemplate is defined, explicitly including a {context} variable, which will be populated by the retrieved document chunks.
    • Retrieving Sources Separately: For the purpose of displaying sources, we explicitly call retriever.invoke(prompt) to get the Document objects. This allows us to extract metadata like page numbers.
    • The rag_chain shows the canonical LCEL pattern, using RunnablePassthrough to route the question to the retriever and then the context/question to the LLM; the streamed answer itself comes from chain_with_context, which receives the already-retrieved context. (A single-chain alternative that returns both answer and sources is sketched after this list.)
    • Streaming with Sources: The LLM response is streamed for better UX. After the response, the sources are extracted from the retrieved_docs and stored in st.session_state.messages alongside the content.
  6. Displaying Sources:
    • The message display loop now checks if message["sources"]:.
    • st.expander("Sources"): Allows users to optionally view the retrieved document chunks that informed the AI’s answer, providing transparency.

This advanced Streamlit application demonstrates a complete RAG workflow where users interact with their own data, showcasing the real-world applicability of LangChain.


Key Takeaway

Day 23 unlocked a powerful capability: building RAG applications where users can upload their own data! We leveraged Streamlit’s st.file_uploader to ingest documents, then engineered a robust LangChain pipeline to load, chunk, embed, and store that data in a Chroma vector store. Critically, st.session_state ensures that the processed document (vector store) persists, avoiding re-processing. The result is an interactive chatbot that intelligently answers questions grounded in user-provided context, complete with source citations!


I’m Arpan

I’m a Software Engineer driven by curiosity and a deep interest in Generative AI Technologies. I believe we’re standing at the frontier of a new era—where machines not only learn but create, and I’m excited to explore what’s possible at this intersection of intelligence and imagination.

When I’m not writing code or experimenting with new AI models, you’ll probably find me travelling, soaking in new cultures, or reading a book that challenges how I think. I thrive on new ideas—especially ones that can be turned into meaningful, impactful projects. If it’s bold, innovative, and GenAI-related, I’m all in.

“The future belongs to those who believe in the beauty of their dreams.” – Eleanor Roosevelt

“Imagination is more important than knowledge. For knowledge is limited, whereas imagination embraces the entire world.” – Albert Einstein

This blog, MLVector, is my space to share technical insights, project breakdowns, and explorations in GenAI—from the models shaping tomorrow to the code powering today.

Let’s build the future, one vector at a time.

Let’s connect