Welcome to Day 21 of #30DaysOfLangChain – LangChain 0.3 Edition! So far, many of our sophisticated LangGraph agents have relied on cloud-hosted LLMs like OpenAI’s GPT models. While incredibly powerful, these come with per-token costs and send your data to external servers, which can be a concern for privacy and security.

Today, we shift gears to explore a game-changing alternative: integrating local, open-source Large Language Models (LLMs) into our LangGraph applications. This approach offers significant advantages in terms of cost control, data privacy, and even offline functionality.

The Compelling Case for Local LLMs in AI Agents

Why would you choose to run LLMs locally when powerful cloud APIs are readily available?

  1. Cost-Effectiveness: Once downloaded, local models incur no per-token inference fees. This is a massive saving for high-volume or long-running agentic workflows that consume millions of tokens. Your only cost is the electricity to run your hardware.
  2. Data Privacy & Security: For sensitive applications (healthcare, finance, proprietary business data), keeping data on-premises is paramount. Local LLMs ensure that no sensitive information ever leaves your controlled environment.
  3. Offline Capability: Your applications can run without an internet connection, ideal for remote environments or situations where network access is unreliable.
  4. Reduced Latency: Eliminating network round-trips to cloud APIs can cut response times for iterative agentic loops, provided your hardware can serve the model quickly.
  5. Customization & Control: You have more direct control over the model’s environment, parameters, and even the ability to fine-tune it locally if you have the expertise and resources.

Introducing Ollama: Your Local LLM Companion

Running open-source LLMs locally can be complex: you have to manage dependencies, CUDA/GPU drivers, and various inference engines. This is where Ollama shines. Ollama simplifies downloading, running, and managing open-source LLMs locally. It provides a simple command-line interface and an API, making it incredibly easy to integrate into applications.

Getting Started with Ollama:

  1. Download & Install: Visit ollama.com and download the appropriate installer for your operating system (macOS, Windows, Linux).
  2. Pull a Model: Once Ollama is installed and running (it usually runs in the background as a service), open your terminal and pull a model. Popular choices include llama2, mistral, gemma, and phi3.
ollama pull llama2
ollama pull mistral
ollama pull phi3:mini # or a specific version
  3. Browse Models: You can browse the available models at ollama.com/library.
  4. Verify: You can test a model directly in the terminal: ollama run mistral and start chatting.

Ollama serves these models via a local API endpoint (default: http://localhost:11434), which LangChain’s ChatOllama effortlessly connects to.
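
If your Ollama server runs on a non-default host or port, you can point ChatOllama at it explicitly through its base_url parameter. Here is a minimal sketch; the URL shown is simply the default and would be replaced with your own endpoint:

from langchain_ollama import ChatOllama

# Explicitly point ChatOllama at an Ollama endpoint; replace base_url if the
# server runs on another port or machine.
llm = ChatOllama(model="mistral", base_url="http://localhost:11434")
print(llm.invoke("Reply with one word: ready?").content)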

Integrating ChatOllama into LangGraph Nodes

LangChain’s ChatOllama class acts as a seamless wrapper, making local LLMs behave just like any other BaseChatModel from langchain_core.language_models.chat_models. This means integrating them into your LangGraph nodes is incredibly straightforward. You simply replace your ChatOpenAI (or similar) instantiation with ChatOllama.

from langchain_ollama import ChatOllama

# Initialize a local LLM via Ollama
local_llm = ChatOllama(model="llama2", temperature=0.7) # Ensure 'llama2' is pulled via 'ollama pull llama2'

# Now you can use local_llm just like any other LLM
response = local_llm.invoke("Hello, how are you?")
print(response.content)
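
Because ChatOllama implements the same BaseChatModel interface, the standard methods work as well. Continuing from the snippet above, here is a minimal sketch of token streaming with .stream():

# Stream tokens from the local model as they are generated
for chunk in local_llm.stream("Give me three reasons to run LLMs locally."):
    print(chunk.content, end="", flush=True)
print()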

Project: Transforming a Multi-Agent Workflow to Use Local LLMs

For our project today, we’ll revisit our Multi-Agent Writer-Editor workflow from Day 18. This complex agentic system, originally designed with ChatOpenAI, will be adapted to run entirely on local LLMs powered by Ollama.

The beauty of LangGraph is its modularity. The core logic of our writer_node and editor_node, as well as the conditional routing, remains exactly the same. The only change required is in how we initialize our LLM instances!
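
To make the swap concrete, here is the kind of one-line change involved. This is only a sketch: the ChatOpenAI line is representative of the Day 18 setup rather than a verbatim copy, and the model names are placeholders you would adjust.

# Before (Day 18, cloud-hosted):
# from langchain_openai import ChatOpenAI
# writer_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)

# After (today, local via Ollama):
from langchain_ollama import ChatOllama
writer_llm = ChatOllama(model="llama2", temperature=0.7)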

Key Adaptation Steps:

  1. Ensure Ollama Setup: Make sure Ollama is installed and you’ve pulled the desired models (e.g., llama2 or mistral).
  2. Import ChatOllama: Add from langchain_ollama import ChatOllama.
  3. Initialize LLMs with ChatOllama: Replace ChatOpenAI(...) with ChatOllama(model="your_local_model_name", ...) for both your writer and editor LLMs.
  4. Update Configuration (Optional): Add environment variables to easily switch between LLM providers if desired (see the sketch after this list).
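
For step 4, a provider switch might look like the sketch below. The LLM_PROVIDER variable and the get_llm helper are illustrative names, not part of the Day 18 code, and the cloud branch assumes langchain-openai is installed with an API key configured.

import os
from langchain_ollama import ChatOllama

def get_llm(model_env: str, default_model: str, temperature: float = 0.7):
    """Illustrative helper: pick a local or cloud chat model from env vars."""
    provider = os.getenv("LLM_PROVIDER", "ollama").lower()
    model_name = os.getenv(model_env, default_model)
    if provider == "ollama":
        return ChatOllama(model=model_name, temperature=temperature)
    from langchain_openai import ChatOpenAI  # only needed for the cloud path
    return ChatOpenAI(model=model_name, temperature=temperature)

writer_llm = get_llm("WRITER_MODEL", "llama2")
editor_llm = get_llm("EDITOR_MODEL", "mistral")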

Observe how the multi-turn conversation and iterative refinement process, which previously used a cloud API, now leverage your local machine’s computational power, keeping your data entirely private.

import os
from typing import TypedDict, Annotated, List
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import ChatOllama # Import ChatOllama for local models
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages

# Load environment variables from a .env file (for LLM_PROVIDER if used)
from dotenv import load_dotenv
load_dotenv()

# --- Configuration for Local LLMs ---
# IMPORTANT: Ensure Ollama is installed and running, and you've pulled the models.
# Example: ollama pull llama2
# Example: ollama pull mistral
# Example: ollama pull phi3:mini (smaller, might run faster on less powerful hardware)

WRITER_MODEL = os.getenv("WRITER_MODEL", "llama2") # Model for the writer agent
EDITOR_MODEL = os.getenv("EDITOR_MODEL", "mistral") # Model for the editor agent

# --- LLM Initialization (using ChatOllama for local models) ---
def initialize_local_llm(model_name: str, temp: float = 0.7):
    """Initializes and returns a ChatOllama instance for a local LLM."""
    try:
        llm_instance = ChatOllama(model=model_name, temperature=temp)
        # Test the connection with a small call to confirm the model is available
        llm_instance.invoke("Hello!")
        print(f"Successfully connected to Ollama model: {model_name}")
        return llm_instance
    except Exception as e:
        print(f"Error connecting to Ollama LLM '{model_name}' or model not found: {e}")
        print(f"Please ensure Ollama is running and you have pulled the model:")
        print(f"  ollama pull {model_name}")
        raise SystemExit(1)

# Initialize our local LLMs for the writer and editor
writer_llm = initialize_local_llm(WRITER_MODEL)
editor_llm = initialize_local_llm(EDITOR_MODEL)

print(f"\nWriter Agent using local LLM: {WRITER_MODEL}")
print(f"Editor Agent using local LLM: {EDITOR_MODEL}\n")


# --- 1. Agent State Definition (reusing from Day 18) ---
class AgentState(TypedDict):
    """
    Represents the shared memory for our multi-agent workflow.
    - messages: Conversation history.
    - draft: The current draft of the content.
    - feedback: Feedback on the draft.
    - iterations: Counter for the number of editing rounds.
    - status: Workflow status set by the editor ("approved" or "needs_revision").
    """
    messages: Annotated[List[BaseMessage], add_messages]
    draft: str
    feedback: str
    iterations: int
    status: str


# --- 2. Define Agent Nodes (reusing from Day 18, but with local LLMs) ---

# Node 1: Writer Agent
def writer_node(state: AgentState) -> AgentState:
    """
    Generates an initial draft or revises an existing draft based on feedback.
    Uses the local writer_llm.
    """
    print("--- Node: Writer Agent ---")
    messages = state['messages']
    user_query = messages[0].content # Initial user query

    if state['draft']: # If there's an existing draft, it's a revision
        prompt = ChatPromptTemplate.from_messages([
            ("system", "You are a content writer. You will revise the existing draft based on the editor's feedback. Focus only on the content and quality, do not add meta-commentary like 'I have revised the draft based on the feedback.'"),
            ("human", "Original request: {user_query}\n\nExisting draft:\n{draft}\n\nEditor feedback:\n{feedback}\n\nRevise the draft:")
        ])
        print(f"  Revising draft based on feedback (Iteration: {state['iterations']})...")
    else: # First time, generate initial draft
        prompt = ChatPromptTemplate.from_messages([
            ("system", "You are a content writer. Your task is to write a concise article draft based on the user's request. Keep it focused and to the point."),
            ("human", "Write an article draft on: {user_query}")
        ])
        print("  Generating initial draft...")

    # Fill the template variables and call the local writer LLM
    response = (prompt | writer_llm).invoke({
        "user_query": user_query,
        "draft": state['draft'],
        "feedback": state['feedback'],
    })
    new_draft = response.content.strip()

    print(f"  Draft generated/revised (first 100 chars): {new_draft[:100]}...")
    return {"draft": new_draft, "messages": [AIMessage(content="Writer generated draft.")]}

# Node 2: Editor Agent
def editor_node(state: AgentState) -> AgentState:
    """
    Reviews the draft and provides constructive feedback.
    Uses the local editor_llm.
    """
    print("\n--- Node: Editor Agent ---")
    draft = state['draft']
    iterations = state['iterations']

    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a meticulous content editor. Review the following draft. If it meets high quality standards (clear, concise, accurate, directly addresses the prompt), say 'APPROVED'. Otherwise, provide specific, actionable feedback for revision. Focus on improvements, not just general praise. Limit your feedback to 2-3 concise points."),
        ("human", "Here is the draft:\n{draft}\n\nProvide feedback or approve:")
    ])

    # Fill the template variable and call the local editor LLM
    response = (prompt | editor_llm).invoke({"draft": draft})
    feedback_content = response.content.strip()

    if "APPROVED" in feedback_content.upper():
        print("  Draft APPROVED by Editor.")
        return {"feedback": feedback_content, "status": "approved", "messages": [AIMessage(content="Editor approved the draft.")]}
    else:
        print(f"  Editor provided feedback: {feedback_content[:100]}...")
        # Increment iterations and set status for revision
        return {"feedback": feedback_content, "iterations": iterations + 1, "status": "needs_revision", "messages": [AIMessage(content="Editor provided feedback for revision.")]}


# --- 3. Define Graph Structure and Conditional Logic ---

# Define the function that determines the next step based on the editor's output
def should_continue(state: AgentState) -> str:
    """
    Decides whether the workflow should continue (needs revision) or end (approved).
    """
    if state["status"] == "approved":
        print("\n--- Router: Draft Approved. Ending Workflow. ---")
        return "end"
    elif state["iterations"] >= 3: # Max 3 revision cycles
        print("\n--- Router: Max Iterations Reached. Ending Workflow. ---")
        return "end"
    else:
        print("\n--- Router: Draft Needs Revision. Looping back to Writer. ---")
        return "continue"


# --- 4. Build the LangGraph Workflow ---
print("--- Building the Local LLM Agent Workflow ---")

workflow = StateGraph(AgentState)

# Add nodes
workflow.add_node("writer", writer_node)
workflow.add_node("editor", editor_node)

# Set entry point
workflow.set_entry_point("writer")

# Define edges
workflow.add_edge("writer", "editor")

# Define conditional edge from editor based on 'should_continue'
workflow.add_conditional_edges(
    "editor",
    should_continue,
    {
        "continue": "writer",  # If needs revision, go back to writer
        "end": END             # If approved or max iterations, end the graph
    }
)

# Compile the graph
local_llm_app = workflow.compile()
print("Local LLM Agent workflow compiled successfully.\n")


# --- 5. Run the Workflow with Local LLMs ---
print("--- Running Multi-Agent Workflow with Local LLMs ---")

user_input = "Write a short article about the benefits of local LLMs for developers."

print(f"USER REQUEST: {user_input}\n")

# Initial state for the workflow
initial_state = {
    "messages": [HumanMessage(content=user_input)],
    "draft": "",
    "feedback": "",
    "iterations": 0,
    "status": "" # Will be set by nodes
}

# Run the graph
final_state = local_llm_app.invoke(initial_state)

print("\n" + "="*60)
print("FINAL ARTICLE DRAFT:")
print(final_state['draft'])
print("="*60)
print(f"FINAL STATUS: {final_state['status'].upper()}")
print(f"TOTAL ITERATIONS: {final_state['iterations']}")

# Optional: Print all messages to see the conversation flow
# print("\n--- Full Conversation History ---")
# for msg in final_state['messages']:
#     print(f"{msg.type.upper()}: {msg.content}")

print("\n--- Local LLM Agent Workflow Complete ---")
print(f"Observe the power of LangGraph running entirely on local models ({WRITER_MODEL} & {EDITOR_MODEL}).")

Code Explanation & Key Takeaways:

  1. Ollama Initialization (initialize_local_llm):
    • We now import ChatOllama from langchain_ollama.
    • The initialize_local_llm helper function takes a model_name (e.g., “llama2”, “mistral”) and attempts to create a ChatOllama instance.
    • Crucially, it includes a try-except block and a test invoke call to ensure that Ollama is running and the specified model is available locally. This provides helpful feedback if the setup isn’t correct.
    • WRITER_MODEL and EDITOR_MODEL environment variables (or default values) allow for easy switching of the specific local models used by each agent.
  2. Unchanged Agent Logic:
    • Notice that the writer_node and editor_node functions themselves remain virtually identical to Day 18. They still take the AgentState and return an updated AgentState.
    • The ChatPromptTemplate usage is also unchanged.
    • This highlights a major strength of LangChain/LangGraph: the abstraction of the LLM. As long as you provide a BaseChatModel (whether ChatOpenAI, ChatGoogleGenerativeAI, ChatOllama, etc.), the rest of your agentic logic can remain consistent.
  3. Seamless Integration:
    • The workflow definition, node additions, edges, and conditional logic (should_continue) are exactly the same as when we used remote models. LangGraph doesn’t care where your LLM comes from, only that it conforms to the expected LangChain interface.

This project beautifully illustrates the power of local LLMs in providing a cost-effective, private, and flexible way to run your LangGraph applications. By leveraging tools like Ollama, the barrier to entry for local LLM development is significantly lowered, opening up new possibilities for AI agents in diverse environments.


Key Takeaway

Day 21 was all about empowering your LangGraph agents with local LLMs. We saw how Ollama simplifies running models like Llama 2 or Mistral on your own hardware, and how seamlessly ChatOllama integrates into existing LangChain/LangGraph workflows. This shift unlocks huge benefits in terms of cost savings, enhanced data privacy, and the ability to operate AI applications offline, making our agentic solutions more versatile and robust for real-world deployment.


I’m Arpan

I’m a Software Engineer driven by curiosity and a deep interest in Generative AI Technologies. I believe we’re standing at the frontier of a new era—where machines not only learn but create, and I’m excited to explore what’s possible at this intersection of intelligence and imagination.

When I’m not writing code or experimenting with new AI models, you’ll probably find me travelling, soaking in new cultures, or reading a book that challenges how I think. I thrive on new ideas—especially ones that can be turned into meaningful, impactful projects. If it’s bold, innovative, and GenAI-related, I’m all in.

“The future belongs to those who believe in the beauty of their dreams.” – Eleanor Roosevelt

“Imagination is more important than knowledge. For knowledge is limited, whereas imagination embraces the entire world.” – Albert Einstein

This blog, MLVector, is my space to share technical insights, project breakdowns, and explorations in GenAI—from the models shaping tomorrow to the code powering today.

Let’s build the future, one vector at a time.

Let’s connect