Welcome to Day 23 of #30DaysOfLangChain – LangChain 0.3 Edition! Yesterday, we built a simple chat interface with Streamlit. That was a great start, but the real power of Generative AI often lies in its ability to interact with your specific, private data. Today, we’re taking our Streamlit skills to the next level by building an advanced application that integrates Retrieval-Augmented Generation (RAG) with file uploads.
This means users will be able to upload their own documents (like PDFs!), and our LangChain application will intelligently answer questions based only on the content of those uploaded files. This is a fundamental pattern for building enterprise-grade chatbots, research assistants, and knowledge management systems.
The Power of RAG with User-Uploaded Data
Traditional LLMs are trained on vast amounts of public data. While impressive, they lack knowledge about your specific documents, internal policies, or private conversations. RAG addresses this by:
- Providing Context: Retrieving relevant snippets from your private knowledge base.
- Reducing Hallucinations: Grounding the LLM’s answers in factual, provided information.
- Ensuring Privacy: Keeping your sensitive data within your control, especially when combined with local LLMs (as discussed on Day 21).
When you combine RAG with file uploads, you empower users to personalize their AI experience, making the AI truly useful for their unique needs.
Key Streamlit Features for File-Based RAG
- `st.file_uploader`:
  - This widget allows users to upload one or more files directly into your Streamlit application.
  - You can specify accepted file types (e.g., `type=["pdf", "txt"]`) and whether to accept multiple files.
  - When a file is uploaded, it returns an `UploadedFile` object, which behaves like a file-like object, allowing you to read its content.
- `st.session_state` (Revisited):
  - For a RAG application, processing a document (chunking, embedding, indexing) can be time-consuming. You don’t want to repeat that work every time the Streamlit app reruns (which happens with every user interaction).
  - `st.session_state` is crucial for storing the generated vector store and the chat history, ensuring they persist for the user’s session once the document is processed. The short sketch right after this list shows the two features working together.
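To make the pattern concrete, here is a minimal, hedged sketch of the upload-then-cache idea. The widget label and the `doc_bytes` session key are illustrative, not part of the final project code:

```python
import streamlit as st

# st.file_uploader returns None until the user picks a file.
uploaded_file = st.file_uploader("Upload a PDF", type=["pdf"])

# Only do the expensive work once per session: the result survives reruns
# because it lives in st.session_state.
if uploaded_file is not None and "doc_bytes" not in st.session_state:
    st.session_state.doc_bytes = uploaded_file.getvalue()  # raw bytes of the upload
    st.success(f"Stored {uploaded_file.name} ({len(st.session_state.doc_bytes)} bytes)")
```

In the full project below, the value cached in `st.session_state` is the Chroma vector store rather than the raw bytes.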
The RAG Pipeline for Uploaded Documents
The workflow for an uploaded document in a RAG system typically involves these steps:
- Load: Read the content from the uploaded file. For PDFs, we’ll use `PyPDFLoader` after saving the `UploadedFile` to a temporary file.
- Split: Break down the large document text into smaller, manageable “chunks” (e.g., 500-1000 characters with some overlap) using a `RecursiveCharacterTextSplitter`. This is vital for efficient retrieval and fitting within LLM context windows.
- Embed: Convert each text chunk into a numerical vector (embedding) using an embedding model (`OpenAIEmbeddings` or `OllamaEmbeddings`).
- Store: Save these embeddings into a vector store (like `ChromaDB`), making them searchable via similarity.
- Retrieve: When a user asks a question, convert the question into an embedding and find the most similar chunks from the vector store.
- Generate: Pass the retrieved chunks (context) and the user’s question to the LLM to generate a grounded answer.
- Display Sources (Transparency!): Show the user which parts of their document were used to formulate the answer. (A condensed sketch of these steps follows the list.)
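Here is a condensed, non-Streamlit sketch of those steps, assuming Ollama is running locally with `llama2` and `nomic-embed-text` pulled; the file name `report.pdf` and the question are placeholders (swap in the OpenAI classes if you prefer):

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_ollama import ChatOllama, OllamaEmbeddings

# 1. Load
docs = PyPDFLoader("report.pdf").load()
# 2. Split
splits = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150).split_documents(docs)
# 3-4. Embed and store
vectorstore = Chroma.from_documents(splits, embedding=OllamaEmbeddings(model="nomic-embed-text"))
# 5. Retrieve
question = "What is the main conclusion of this report?"
context_docs = vectorstore.as_retriever().invoke(question)
context = "\n\n".join(d.page_content for d in context_docs)
# 6. Generate, grounding the answer in the retrieved context
answer = ChatOllama(model="llama2").invoke(
    f"Answer using only this context:\n{context}\n\nQuestion: {question}"
)
print(answer.content)
# 7. Display sources
for d in context_docs:
    print("Source page:", d.metadata.get("page"))
```

The full project wraps exactly this flow in Streamlit widgets and session state.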
Project: Interactive RAG with File Uploads in Streamlit
Our project today will build a Streamlit application that allows:
- Users to upload PDF documents.
- The app to process these documents (chunk, embed, index into ChromaDB).
- Users to then ask questions related to the document’s content.
- The app to display the LLM’s answer along with the source chunks from the document.
Before you run the code:
- Install the necessary libraries: `pip install streamlit langchain-community langchain-text-splitters langchain-openai langchain-ollama chromadb pypdf unstructured tiktoken python-dotenv`
- Ensure you have your `OPENAI_API_KEY` set if using OpenAI.
- If using Ollama, ensure it’s running and you’ve pulled a chat model (e.g., `ollama pull llama2`) and a text embedding model (e.g., `ollama pull nomic-embed-text`). Our code will use `nomic-embed-text` by default for Ollama embeddings. An example `.env` with the configurable settings is shown just before the code.
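The script reads its configuration from environment variables (loaded via `python-dotenv`), so you can optionally drop a `.env` file next to it. A sketch showing the defaults the code falls back to (the API key value is a placeholder):

```
# .env — every entry is optional; the values shown are the code's defaults
# Set LLM_PROVIDER to "openai" or "ollama"
LLM_PROVIDER=ollama
OLLAMA_MODEL_CHAT=llama2
OLLAMA_MODEL_EMBED=nomic-embed-text
OPENAI_MODEL_CHAT=gpt-3.5-turbo
OPENAI_MODEL_EMBED=text-embedding-ada-002
# Required only when LLM_PROVIDER=openai:
# OPENAI_API_KEY=sk-...
```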
import streamlit as st
import os
import tempfile
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_ollama import ChatOllama, OllamaEmbeddings # For local LLM and embeddings
from langchain_community.document_loaders import PyPDFLoader # For loading PDFs
from langchain_text_splitters import RecursiveCharacterTextSplitter # For chunking
from langchain_community.vectorstores import Chroma # Our vector store
from langchain_core.messages import HumanMessage, AIMessage
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
# Load environment variables
from dotenv import load_dotenv
load_dotenv()
# --- Configuration for LLM and Embeddings ---
LLM_PROVIDER = os.getenv("LLM_PROVIDER", "ollama").lower() # 'openai' or 'ollama'
OLLAMA_MODEL_CHAT = os.getenv("OLLAMA_MODEL_CHAT", "llama2").lower()
OLLAMA_MODEL_EMBED = os.getenv("OLLAMA_MODEL_EMBED", "nomic-embed-text").lower()
OPENAI_MODEL_CHAT = os.getenv("OPENAI_MODEL_CHAT", "gpt-3.5-turbo")
OPENAI_MODEL_EMBED = os.getenv("OPENAI_MODEL_EMBED", "text-embedding-ada-002")
# --- Initialize LLM and Embeddings ---
@st.cache_resource
def get_llm_and_embeddings():
"""Initializes and returns LLM and Embeddings based on provider."""
llm = None
embeddings = None
if LLM_PROVIDER == "openai":
if not os.getenv("OPENAI_API_KEY"):
st.error("OPENAI_API_KEY not set for OpenAI provider. Please set it.")
st.stop()
llm = ChatOpenAI(model=OPENAI_MODEL_CHAT, temperature=0.3)
embeddings = OpenAIEmbeddings(model=OPENAI_MODEL_EMBED)
elif LLM_PROVIDER == "ollama":
try:
llm = ChatOllama(model=OLLAMA_MODEL_CHAT, temperature=0.3)
# Test chat LLM connection
llm.invoke("test", config={"stream": False})
st.success(f"Successfully connected to Ollama chat model: {OLLAMA_MODEL_CHAT}")
except Exception as e:
st.error(f"Error connecting to Ollama chat LLM '{OLLAMA_MODEL_CHAT}': {e}")
st.info(f"Please ensure Ollama is running and you have pulled the model: `ollama pull {OLLAMA_MODEL_CHAT}`")
st.stop()
try:
embeddings = OllamaEmbeddings(model=OLLAMA_MODEL_EMBED)
# Test embedding model connection
embeddings.embed_query("test")
st.success(f"Successfully connected to Ollama embedding model: {OLLAMA_MODEL_EMBED}")
except Exception as e:
st.error(f"Error connecting to Ollama embedding model '{OLLAMA_MODEL_EMBED}': {e}")
st.info(f"Please ensure Ollama is running and you have pulled the embedding model: `ollama pull {OLLAMA_MODEL_EMBED}`")
st.stop()
else:
st.error(f"Invalid LLM provider: {LLM_PROVIDER}. Must be 'openai' or 'ollama'.")
st.stop()
return llm, embeddings
llm, embeddings = get_llm_and_embeddings()
# --- Streamlit App Setup ---
st.set_page_config(page_title="RAG Chat with Your Documents", page_icon="📚")
st.title("📚 RAG Chat with Your Documents")
st.markdown(f"*(LLM: {LLM_PROVIDER.capitalize()} {OPENAI_MODEL_CHAT if LLM_PROVIDER == 'openai' else OLLAMA_MODEL_CHAT}, Embeddings: {OPENAI_MODEL_EMBED if LLM_PROVIDER == 'openai' else OLLAMA_MODEL_EMBED})*")
st.markdown("---")
# --- Initialize chat history in session state ---
if "messages" not in st.session_state:
st.session_state.messages = [] # Stores list of {"role": "user" or "assistant", "content": "message text", "sources": []}
# --- Initialize vector store in session state ---
# This will hold our document embeddings
if "vectorstore" not in st.session_state:
st.session_state.vectorstore = None
# --- Document Upload and Processing ---
uploaded_file = st.sidebar.file_uploader(
"Upload a PDF document",
type="pdf",
accept_multiple_files=False,
key="pdf_uploader"
)
if uploaded_file and st.session_state.vectorstore is None:
with st.spinner("Processing document... This may take a moment."):
try:
# 1. Save uploaded file to a temporary file
with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp_file:
tmp_file.write(uploaded_file.getvalue())
tmp_file_path = tmp_file.name
# 2. Load the document
loader = PyPDFLoader(tmp_file_path)
docs = loader.load()
if not docs:
st.warning("Could not extract text from the PDF. Please try another file.")
os.unlink(tmp_file_path) # Clean up temp file
st.stop()
# 3. Split documents into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
splits = text_splitter.split_documents(docs)
# 4. Create embeddings and store in Chroma
st.session_state.vectorstore = Chroma.from_documents(
documents=splits,
embedding=embeddings
)
st.sidebar.success(f"Document '{uploaded_file.name}' processed and ready for questions!")
# Clean up temporary file after processing
os.unlink(tmp_file_path)
except Exception as e:
st.sidebar.error(f"Error processing document: {e}")
st.session_state.vectorstore = None # Reset vectorstore on error
if 'tmp_file_path' in locals() and os.path.exists(tmp_file_path):
os.unlink(tmp_file_path)
# --- Display chat messages from history ---
for message in st.session_state.messages:
with st.chat_message(message["role"]):
st.markdown(message["content"])
if message["sources"]:
with st.expander("Sources"):
for source in message["sources"]:
st.text(source)
# --- Handle user input ---
if prompt := st.chat_input("Ask a question about the document..."):
if st.session_state.vectorstore is None:
st.warning("Please upload a PDF document first to enable RAG.")
else:
# Add user message to chat history and display it
st.session_state.messages.append({"role": "user", "content": prompt, "sources": []})
with st.chat_message("user"):
st.markdown(prompt)
# Prepare RAG chain
retriever = st.session_state.vectorstore.as_retriever()
# Define RAG prompt
rag_prompt_template = ChatPromptTemplate.from_messages([
("system", "You are an AI assistant. Use the following retrieved context to answer the question. "
"If you don't know the answer, state that you don't know. Keep your answer concise and to the point. "
"Context: {context}"),
("human", "{question}")
])
        # Define the standard LCEL RAG chain (kept for reference / non-streaming use;
        # the streaming path below invokes the retriever separately so sources can be displayed)
rag_chain = (
{"context": retriever, "question": RunnablePassthrough()}
| rag_prompt_template
| llm
| StrOutputParser()
)
with st.chat_message("assistant"):
with st.spinner("Retrieving and Generating..."):
full_response = ""
response_container = st.empty()
# We need to invoke the retriever separately to get sources
retrieved_docs = retriever.invoke(prompt)
context_content = "\n\n".join([doc.page_content for doc in retrieved_docs])
# Now, invoke the full chain with the context and question
# For streaming, we pass the context and question to the prompt directly
# and then stream the LLM response.
chain_with_context = (
rag_prompt_template | llm | StrOutputParser()
)
for chunk in chain_with_context.stream({
"context": context_content,
"question": prompt
}):
full_response += chunk
response_container.markdown(full_response + "▌")
response_container.markdown(full_response)
# Collect and display sources
source_texts = []
for i, doc in enumerate(retrieved_docs):
source_texts.append(f"Source {i+1} (Page {doc.metadata.get('page', 'N/A')}): {doc.page_content[:200]}...") # Display first 200 chars
st.session_state.messages.append({"role": "assistant", "content": full_response, "sources": source_texts})
if source_texts:
with st.expander("Sources"):
for source in source_texts:
st.text(source)
# --- How to run this app ---
st.sidebar.markdown("---")
st.sidebar.markdown("### How to run")
st.sidebar.markdown("1. Save this code as `day23-rag-file-upload.py`")
st.sidebar.markdown("2. Open your terminal in the same directory.")
st.sidebar.markdown("3. Run the command: `streamlit run day23-rag-file-upload.py`")
st.sidebar.markdown("4. Your browser will open with the RAG application.")
st.sidebar.markdown("---")
st.sidebar.markdown("#### Dependencies")
st.sidebar.markdown("`pip install streamlit langchain-openai langchain-ollama chromadb pypdf unstructured tiktoken python-dotenv`")
st.sidebar.markdown("---")
st.sidebar.markdown("#### Ollama Setup")
st.sidebar.markdown(f"Ensure Ollama is running and models pulled:")
st.sidebar.markdown(f"`ollama pull {OLLAMA_MODEL_CHAT}`")
st.sidebar.markdown(f"`ollama pull {OLLAMA_MODEL_EMBED}`")
Code Explanation & Key Takeaways:
- Dependencies: Notice the new `pip install` packages. `pypdf` is specifically for reading PDFs, `chromadb` provides our local vector store, and `tiktoken` supports OpenAI’s tokenization when using `OpenAIEmbeddings` (the `RecursiveCharacterTextSplitter` can use it implicitly). `unstructured` is a more general document loader, but for simple PDFs, `PyPDFLoader` from `langchain_community.document_loaders` is often sufficient; `langchain-community` and `langchain-text-splitters` supply the loader, vector store, and splitter imports.
- LLM and Embeddings Initialization (`get_llm_and_embeddings()`):
  - We now initialize both a chat model (for generation) and an embeddings model (for vector creation).
  - `@st.cache_resource`: This decorator is vital! It caches the LLM and embeddings instances, meaning they are initialized only once per Streamlit session, not on every rerun. This saves significant time and resources.
  - Error handling is included to guide users if Ollama models are not pulled or API keys are missing.
  - The Ollama embedding model is configurable; `nomic-embed-text` is a good general-purpose choice.
- `st.session_state.vectorstore`:
  - A new `st.session_state` key is introduced to store our `Chroma` vector store. It starts as `None`.
  - When a file is uploaded, the processing logic checks `if st.session_state.vectorstore is None`, so a document is processed only once per session rather than on every rerun.
- Document Upload and Processing Flow:
  - `uploaded_file = st.sidebar.file_uploader(...)`: The file uploader is placed in the sidebar to keep the main chat area clean; `type="pdf"` restricts uploads to PDFs.
  - Temporary File Handling: `PyPDFLoader` requires a file path, so `uploaded_file.getvalue()` retrieves the file’s bytes, which are written to a `tempfile.NamedTemporaryFile`. This temporary file is what allows `PyPDFLoader` to read the content, and it is cleaned up afterwards using `os.unlink()`. (A compact variant of the pattern is sketched right after this list.)
  - Loading, Splitting, Embedding, Storing: This is the standard RAG pipeline using `PyPDFLoader`, `RecursiveCharacterTextSplitter`, and `Chroma.from_documents`.
  - `st.spinner()`: Provides visual feedback to the user during the potentially long document-processing step.
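As a side note, here is a compact variant of that temp-file handling, assuming an `UploadedFile` object named `uploaded_file`; the `try/finally` is an extra safeguard (not in the project code) that guarantees cleanup even if loading fails:

```python
import os
import tempfile

from langchain_community.document_loaders import PyPDFLoader

# Write the uploaded bytes to a temporary file so PyPDFLoader (which needs a path) can read it.
with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp_file:
    tmp_file.write(uploaded_file.getvalue())
    tmp_file_path = tmp_file.name

try:
    docs = PyPDFLoader(tmp_file_path).load()
finally:
    os.unlink(tmp_file_path)  # always remove the temp file, even if loading raised
```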
- RAG Chain Construction and Invocation:
  - `retriever = st.session_state.vectorstore.as_retriever()`: Once the vector store is populated, we get a retriever from it.
  - RAG Prompt: A `ChatPromptTemplate` is defined that explicitly includes a `{context}` variable, which is populated by the retrieved document chunks.
  - Retrieving Sources Separately: To display sources, we explicitly call `retriever.invoke(prompt)` to get the `Document` objects, which lets us extract metadata such as page numbers.
  - The `rag_chain` shows the standard LangChain Expression Language (LCEL) composition, using `RunnablePassthrough` to pass the question to the retriever and then the context and question to the LLM; the streaming path instead feeds the pre-retrieved context straight into the prompt.
  - Streaming with Sources: The LLM response is streamed for better UX. After the response, the sources are extracted from the `retrieved_docs` and stored in `st.session_state.messages` alongside the content. (An alternative single-chain pattern that returns both the answer and its sources is sketched below.)
- The message display loop now checks
if message["sources"]:. st.expander("Sources"): Allows users to optionally view the retrieved document chunks that informed the AI’s answer, providing transparency.
- The message display loop now checks
This advanced Streamlit application demonstrates a complete RAG workflow where users interact with their own data, showcasing the real-world applicability of LangChain.
Key Takeaway
Day 23 unlocked a powerful capability: building RAG applications where users can upload their own data! We leveraged Streamlit’s st.file_uploader to ingest documents, then engineered a robust LangChain pipeline to load, chunk, embed, and store that data in a Chroma vector store. Critically, st.session_state ensures that the processed document (vector store) persists, avoiding re-processing. The result is an interactive chatbot that intelligently answers questions grounded in user-provided context, complete with source citations!
