Welcome to Day 5 of #30DaysOfLangChain! On Day 4, we learned how to prepare our raw data by loading documents and splitting them into manageable chunks. Now, how do we make these chunks searchable by meaning rather than keywords, so the right ones can be retrieved for an LLM? The answer lies in Embeddings and Vector Stores.

These two components are the “brain” behind a RAG system, enabling your LLM to intelligently retrieve context from a vast knowledge base.

1. Embeddings: Giving Text Meaning to Machines

Imagine trying to find a recipe for “spicy chicken” by searching only for “chicken” or “spicy.” You might miss recipes for “hot poultry” or “fiery fowl.” This is where embeddings come in.

  • What are they? An embedding is a high-dimensional vector (a list of numbers) that numerically represents text. Words, sentences, or entire document chunks that are semantically similar are mapped to points that are geometrically “close” to each other in this high-dimensional space.
  • Why are they crucial for RAG? They allow for semantic search. Instead of keyword matching, you can find text chunks that are conceptually similar to your query, even if they don’t share exact words. This is fundamental for finding relevant context for an LLM.
  • How LangChain helps: LangChain provides an Embeddings interface that abstracts away the complexity of various embedding models.
    • OpenAIEmbeddings: Uses OpenAI’s powerful embedding models (e.g., text-embedding-ada-002). Requires an API key.
    • OllamaEmbeddings: Integrates with local embedding models served via Ollama (e.g., nomic-embed-text, mxbai-embed-large). Great for privacy and cost savings. (A quick sketch comparing embeddings follows this list.)
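To make this concrete, here is a minimal sketch (assuming the same local Ollama setup used in the project below, with nomic-embed-text already pulled) that embeds three short phrases and compares them with cosine similarity. The two food-related phrases should land much closer to each other than the unrelated one.

# Minimal sketch: compare embeddings of semantically related vs. unrelated text.
# Assumes a running Ollama server with the nomic-embed-text model pulled;
# swap in OpenAIEmbeddings if you prefer the hosted route.
import numpy as np
from langchain_community.embeddings import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="nomic-embed-text")

vec_a = np.array(embeddings.embed_query("spicy chicken recipe"))
vec_b = np.array(embeddings.embed_query("fiery fowl dish"))
vec_c = np.array(embeddings.embed_query("quarterly tax filing deadline"))

def cosine(u, v):
    # Cosine similarity: ~1.0 means very similar meaning, near 0 means unrelated.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(vec_a, vec_b))  # expected to be relatively high
print(cosine(vec_a, vec_c))  # expected to be noticeably lower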

2. Vector Stores: The Semantic Database

Once your text chunks are converted into numerical embeddings (vectors), you need a place to store them efficiently and perform lightning-fast similarity searches. This is the job of a Vector Store.

  • What is it? A vector store is essentially a database optimized for storing vectors and their associated metadata (like the original text chunk and its source). Crucially, it provides highly optimized algorithms for finding the “nearest neighbors” to a given query vector.
  • How does Similarity Search work?
    1. A user asks a question (e.g., “What is LCEL?”).
    2. This question is converted into an embedding vector using the same embedding model used for your document chunks.
    3. The vector store then calculates the “distance” (or similarity score) between the query vector and all the stored document chunk vectors.
    4. It returns the top-K (e.g., top 4) most similar document chunks, which are then used as context for the LLM. (A toy sketch of this ranking step follows the list.)
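Under the hood, that “nearest neighbor” step is just a ranking of stored vectors by their similarity to the query vector. Here is a toy brute-force sketch with NumPy; real vector stores use optimized index structures (HNSW, IVF, etc.) instead of scanning every vector, but the idea is the same.

# Toy brute-force top-K retrieval: rank stored chunk vectors by cosine
# similarity to a query vector and return the indices of the k best matches.
import numpy as np

def top_k(query_vec, chunk_vecs, k=4):
    q = np.asarray(query_vec, dtype=float)
    m = np.asarray(chunk_vecs, dtype=float)
    # Cosine similarity between the query and every stored chunk vector
    sims = (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
    # Indices of the k most similar chunks, best match first
    return np.argsort(sims)[::-1][:k]

# Tiny 3-dimensional vectors, just to show the mechanics
chunk_vectors = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.85, 0.2, 0.1]]
print(top_k([1.0, 0.0, 0.0], chunk_vectors, k=2))  # -> [0 2]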

Examples:

  • Chroma: A popular, lightweight, open-source vector database that can run in-memory, in client-server mode, or persisted locally to disk. Excellent for rapid prototyping and smaller applications.
  • FAISS (Facebook AI Similarity Search): A library for efficient similarity search and clustering of dense vectors. Primarily used for in-memory or disk-backed indexing, often within Python.
  • Dedicated Cloud Services: Pinecone, Weaviate, Milvus, Qdrant, Supabase, Azure AI Search, etc., for large-scale, production-ready vector databases.

For Day 5, we’ll use Chroma because it’s easy to set up and well suited for demonstration purposes. The sketch below shows how little changes if you swap in another backend like FAISS.
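A nice property of LangChain’s vector store abstraction is that the interface barely changes across backends. As a rough, self-contained sketch (assuming faiss-cpu is installed and the same local nomic-embed-text model as elsewhere in this post), here is what the FAISS equivalent of “embed, store, search” looks like:

# Sketch: the same from_documents / similarity_search interface, backed by
# FAISS instead of Chroma. Assumes `pip install faiss-cpu` and a running
# Ollama server with nomic-embed-text pulled.
from langchain_core.documents import Document
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS

docs = [
    Document(page_content="LCEL composes Runnables with the pipe operator."),
    Document(page_content="Text splitters break large documents into chunks."),
]
embeddings = OllamaEmbeddings(model="nomic-embed-text")
faiss_store = FAISS.from_documents(documents=docs, embedding=embeddings)
print(faiss_store.similarity_search("What is LCEL?", k=1))

Because every vector store implements the same methods, the project code below would only need its from_documents call changed to switch backends.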

For more details, check out the official LangChain documentation on embeddings and vector stores.


Project: Embedding Chunks & Semantic Search with Chroma

We’ll pick up from Day 4’s data preparation. We’ll load our sample_document.txt, split it into chunks, then use an embedding model (configurable, like Day 3) to convert these chunks into vectors. Finally, we’ll store them in an in-memory Chroma vector store and perform a semantic search to retrieve relevant information.

Before you run the code (if using Ollama embeddings):

  1. Ensure Ollama is installed and running (ollama serve).
  2. Pull an embedding model: ollama pull nomic-embed-text (or ollama pull mxbai-embed-large).
import os
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# --- Configuration ---
# Set to 'openai' or 'ollama' to choose your embedding model
# You can set this in your .env file: EMBEDDING_PROVIDER=ollama
EMBEDDING_PROVIDER = os.getenv("EMBEDDING_PROVIDER", "openai").lower()


document_content = """
LangChain is a framework designed to simplify the creation of applications using large language models (LLMs).
It provides a standardized interface for chains, agents, and retrieval.
The core idea is to allow chaining together different components to create more complex use cases.

One of the most important concepts in LangChain 0.3 is the Runnable interface.
Nearly every component, from prompts to models to parsers, implements Runnable.
This standardization enables seamless composition using the LangChain Expression Language (LCEL) and the pipe operator (|).

LCEL allows for highly declarative and concurrent execution of chains.
It supports streaming, asynchronous operations, and is designed for production-grade applications.
Understanding Runnables and LCEL is fundamental to mastering modern LangChain.

When building Retrieval-Augmented Generation (RAG) systems, preparing your data is crucial.
This involves using Document Loaders to ingest data from various sources like text files, PDFs, or web pages.
Once loaded, Text Splitters break down these large documents into smaller, manageable chunks.
These chunks are then typically embedded and stored in a vector database for efficient retrieval based on semantic similarity.
This process ensures that only relevant information is passed to the LLM, optimizing context window usage and improving response quality.
"""

file_path = "sample_document.txt"
with open(file_path, "w") as f:
    f.write(document_content)
print(f"Created '{file_path}' for demonstration.\n")

# --- Step 1: Load and Split the Document ---
print("--- Loading and Splitting Document ---")
loader = TextLoader(file_path)
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=20,
    length_function=len,
    add_start_index=True
)
chunks = text_splitter.split_documents(documents)
print(f"Original document split into {len(chunks)} chunks.\n")

# --- Step 2: Initialize Embedding Model ---
embeddings = None
if EMBEDDING_PROVIDER == "openai":
    if not os.getenv("OPENAI_API_KEY"):
        raise ValueError("OPENAI_API_KEY not set for OpenAI embedding provider.")
    embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
    print("Using OpenAIEmbeddings (text-embedding-ada-002).")
elif EMBEDDING_PROVIDER == "ollama":
    try:
        # Ensure Ollama server is running and model is pulled (e.g., ollama pull nomic-embed-text)
        embeddings = OllamaEmbeddings(model="nomic-embed-text")
        # Test connection by embedding a small string
        _ = embeddings.embed_query("test")
        print("Using OllamaEmbeddings (nomic-embed-text).")
    except Exception as e:
        print(f"Error connecting to Ollama or model 'nomic-embed-text' not found: {e}")
        print("Please ensure:")
        print("1. Ollama is installed and running (`ollama serve`).")
        print("2. The model 'nomic-embed-text' is pulled (`ollama pull nomic-embed-text`).")
        print("Exiting...")
        exit()
else:
    raise ValueError(f"Invalid EMBEDDING_PROVIDER: {EMBEDDING_PROVIDER}. Must be 'openai' or 'ollama'.")

# --- Step 3: Create a Vector Store (Chroma in-memory) ---
print("--- Creating Chroma Vector Store and Adding Documents ---")
# Chroma.from_documents takes chunks and the embedding model to create and populate the store
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    collection_name="langchain_30_days_embeddings" # Optional: name for your collection
)
print(f"Vector store created with {vectorstore._collection.count()} documents.\n")

# --- Step 4: Perform a Similarity Search ---
print("--- Performing Similarity Search ---")
query = "What is LCEL and why is it important?"
print(f"Query: '{query}'")

# Perform a similarity search to retrieve the top 2 most relevant chunks
retrieved_docs = vectorstore.similarity_search(query, k=2)

print("\n--- Retrieved Documents (Top 2) ---")
for i, doc in enumerate(retrieved_docs):
    print(f"Document {i+1} (Score/Relevance not shown by default, but these are closest):")
    print(f"  Content (first 150 chars): {doc.page_content[:150]}...")
    print(f"  Metadata: {doc.metadata}")
    print("-" * 30)

# Clean up the dummy file
# os.remove(file_path)
# print(f"\nCleaned up '{file_path}'.")

Code Explanation:

  1. Document Loading & Splitting: We reuse the logic from Day 4 to prepare our text data.
  2. Embedding Model Initialization:
    • We configure EMBEDDING_PROVIDER (similar to Day 3’s LLM_PROVIDER) to switch between OpenAIEmbeddings and OllamaEmbeddings.
    • Crucially, if using OllamaEmbeddings, we include a try-except block to guide users through potential setup issues (Ollama server not running, embedding model not pulled). Remember to pull an embedding model like nomic-embed-text with ollama pull nomic-embed-text.
  3. Chroma.from_documents(): This is the magic!
    • It takes your chunks (list of Document objects) and your embedding model.
    • Internally, it calls the embedding model to convert each chunk’s page_content into a vector.
    • It then stores these vectors along with their original Document objects (including page_content and metadata) in the Chroma vector store. For this example, Chroma is initialized in-memory, meaning the data is lost when the script ends.
  4. vectorstore.similarity_search(query, k=N):
    • When you provide a query, the vector store first uses its internal embedding model (the one you provided during creation) to convert the query string into a vector.
    • It then performs a similarity search, finding the k most similar vectors (and their associated Document objects) in its database.
    • The returned retrieved_docs are Document objects, ready to be passed to an LLM for RAG. (A short sketch after this list shows how to persist the store to disk and retrieve similarity scores as well.)
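Two optional extras worth knowing about, shown as a rough sketch that reuses the chunks and embeddings built in the code above: passing persist_directory keeps the Chroma collection on disk between runs (instead of losing it when the script ends), and similarity_search_with_score returns (Document, score) pairs, where with Chroma’s defaults the score is a distance, so lower means more similar.

# Sketch: persistence and scored retrieval with Chroma.
# Assumes `chunks` and `embeddings` are the objects created in the project code.
persistent_store = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    collection_name="langchain_30_days_embeddings",
    persist_directory="./chroma_db",  # data survives across script runs
)

# Each result is a (Document, score) pair; the score is a distance,
# so smaller values indicate closer matches.
for doc, score in persistent_store.similarity_search_with_score("What is LCEL?", k=2):
    print(f"{score:.4f}  {doc.page_content[:80]}")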

This project completes the foundational data preparation steps for RAG, showing how unstructured text becomes semantically searchable. Next, we’ll put it all together into a basic RAG chain!

I’m Arpan

I’m a Software Engineer driven by curiosity and a deep interest in Generative AI Technologies. I believe we’re standing at the frontier of a new era—where machines not only learn but create, and I’m excited to explore what’s possible at this intersection of intelligence and imagination.

When I’m not writing code or experimenting with new AI models, you’ll probably find me travelling, soaking in new cultures, or reading a book that challenges how I think. I thrive on new ideas—especially ones that can be turned into meaningful, impactful projects. If it’s bold, innovative, and GenAI-related, I’m all in.

“The future belongs to those who believe in the beauty of their dreams.” – Eleanor Roosevelt

“Imagination is more important than knowledge. For knowledge is limited, whereas imagination embraces the entire world.” – Albert Einstein

This blog, MLVector, is my space to share technical insights, project breakdowns, and explorations in GenAI—from the models shaping tomorrow to the code powering today.

Let’s build the future, one vector at a time.

Let’s connect