Welcome to Day 3 of #30DaysOfLangChain! After establishing our LCEL foundation on Day 1 and mastering prompts and parsers on Day 2, it’s time to talk about the core intelligence of our applications: the Large Language Models themselves.
Today, we’ll learn how to integrate various LLMs into our LCEL pipelines, focusing on the distinction between remote (API-based) and local (self-hosted) models. Understanding how to swap between them offers immense flexibility in terms of cost, privacy, and performance.
Remote vs. Local LLMs: A Quick Overview
The world of LLMs offers a spectrum of choices, primarily categorized by where they run:
- Remote LLMs (API-based):
  - Examples: OpenAI (GPT models), Anthropic (Claude), Google (Gemini), Cohere.
  - Pros: Generally highest performance, largest models, easy to use (just an API key), no local setup.
  - Cons: Cost per token, data privacy concerns (data sent to provider), reliance on external service, potential latency.
- Local LLMs (Self-hosted):
  - Examples: Llama 2, Mistral, Gemma, Phi-2 (run via tools like Ollama, LM Studio, vLLM).
  - Pros: Zero ongoing cost after initial setup, full data privacy (data stays on your machine), no internet dependency (after model download), greater control and customization.
  - Cons: Requires powerful local hardware (a GPU is often essential for larger models), initial setup can be more involved, performance might lag behind top-tier remote models.
LangChain provides seamless integrations for both, allowing you to build your LCEL pipelines with a clear separation of concerns.
Integrating LLMs with LangChain: A Deep Dive into Ollama
LangChain abstracts away the specifics of each LLM provider through its ChatModel interface. Whether it’s ChatOpenAI, ChatOllama, ChatGoogleGenerativeAI, or others, they all behave as Runnables, making them perfectly compatible with LCEL.
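To make that concrete, here’s a minimal sketch using the two integration packages we’ll set up in a moment (model names are illustrative; you’ll need an OpenAI key and a running Ollama server for it to actually execute):

from langchain_openai import ChatOpenAI
from langchain_ollama import ChatOllama

# Both classes implement the same ChatModel interface, so they respond to the
# same Runnable methods (.invoke, .stream, .batch) and return an AIMessage.
remote_llm = ChatOpenAI(model="gpt-3.5-turbo")   # reads OPENAI_API_KEY from the environment
local_llm = ChatOllama(model="llama2")           # talks to the local Ollama server

for model in (remote_llm, local_llm):
    reply = model.invoke("Say hello in five words.")
    print(type(model).__name__, "->", reply.content)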
Two integration packages matter for today:
- langchain-openai: For integrating with OpenAI’s models. You typically provide your OPENAI_API_KEY as an environment variable.
- langchain-ollama: This is your gateway to running powerful open-source LLMs directly on your machine.

What is Ollama? Ollama is an incredibly user-friendly tool that simplifies downloading, running, and managing large language models locally. It takes care of the complexities of model weights, configurations, and dependencies, presenting a simple command-line interface and a local API (usually on http://localhost:11434). Think of it as a “Docker for LLMs” – it bundles the model, its configuration, and its runtime into a single, easy-to-use package.

Why use Ollama?
- Privacy: Your data never leaves your machine. Ideal for sensitive applications.
- Cost-Effective: No per-token costs; you only pay for your hardware and electricity.
- Offline Capability: Once models are downloaded, you can run them without an internet connection.
- Rapid Iteration: Local inference means faster response times, accelerating development and experimentation.
- Accessibility: Makes powerful open-source models accessible even for those without extensive ML ops experience.
Getting set up takes three quick steps:
1. Install Ollama: Go to https://ollama.ai/download and download the installer for your operating system (macOS, Linux, Windows). Follow the straightforward installation steps. Ollama usually starts a background server automatically upon installation.
2. Pull a Model: Open your terminal or command prompt and use the ollama pull command to download a model. For instance, to get Llama 2 (a popular choice for general use):
ollama pull llama2
You can explore other available models and their sizes in the Ollama Models Library. Common choices include mistral, gemma, codellama, etc.
3. Verify Ollama is Running: You can often see an Ollama icon in your system tray (Windows/macOS) indicating it’s running. To verify via command line, you can try:
ollama run llama2 "hi" # Sends a quick prompt to llama2 and prints the reply
Or simply ensure the server is active: ollama serve (if it’s not automatically running in the background).
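If you prefer to check from Python, a request to the server’s root endpoint works too (a minimal sketch, assuming the default port 11434; Ollama normally answers with a short status message):

import urllib.request

try:
    with urllib.request.urlopen("http://localhost:11434", timeout=2) as resp:
        print(resp.read().decode())  # typically "Ollama is running"
except OSError as err:
    print(f"Could not reach the Ollama server: {err}")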
Once Ollama is installed and you’ve pulled a model, langchain-ollama’s ChatOllama class connects directly to this local server, allowing you to seamlessly integrate your local LLMs into your LangChain applications.
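A minimal connection looks like this (a sketch assuming you pulled llama2; pass base_url only if you changed Ollama’s default address):

from langchain_ollama import ChatOllama

# ChatOllama talks to the local Ollama server (http://localhost:11434 by default).
local_llm = ChatOllama(model="llama2", temperature=0.2)
print(local_llm.invoke("In one sentence, what is LangChain?").content)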
Swapping Models in an LCEL Pipeline
The beauty of LCEL and the Runnable interface is that swapping models is as simple as replacing one Runnable LLM instance with another, provided their input/output schemas are compatible.
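For instance, the sketch below wires the same prompt-and-parser pipeline to either backend; only the middle Runnable changes (model names are examples, not recommendations):

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI
from langchain_ollama import ChatOllama

prompt = ChatPromptTemplate.from_template("Summarize in one line: {text}")
parser = StrOutputParser()

remote_chain = prompt | ChatOpenAI(model="gpt-3.5-turbo") | parser
local_chain = prompt | ChatOllama(model="llama2") | parser

# Both chains take the same input dict and return a plain string:
# remote_chain.invoke({"text": "LCEL separates pipeline structure from model choice."})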
For more details, check out the official LangChain documentation.
Project: Configurable LLM Pipeline
In this project, we’ll build a simple conversational LCEL pipeline that can be configured to use either an OpenAI model or a local Ollama model (e.g., llama2). This demonstrates how easily you can switch between different LLM providers without altering your core LCEL chain structure.
import os
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import ConfigurableField
from langchain_openai import ChatOpenAI
from langchain_ollama import ChatOllama
from dotenv import load_dotenv
# Load environment variables from .env file
load_dotenv()
# --- Configuration ---
# Set to 'openai' or 'ollama' to choose your LLM
# You can set this in your .env file: LLM_PROVIDER=ollama
LLM_PROVIDER = os.getenv("LLM_PROVIDER", "openai").lower() # Default to openai if not set
# --- Initialize LLM based on configuration ---
llm = None
if LLM_PROVIDER == "openai":
    if not os.getenv("OPENAI_API_KEY"):
        raise ValueError("OPENAI_API_KEY not set. Please set it for OpenAI provider.")
    llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7)
    print("Using OpenAI GPT-3.5-Turbo.")
elif LLM_PROVIDER == "ollama":
    try:
        # Ensure Ollama server is running and model is pulled (e.g., ollama pull llama2)
        llm = ChatOllama(model="llama2", temperature=0.7)
        # Test connection by making a small call (optional, but good for debugging)
        llm.invoke("Hello!")
        print("Using local Ollama Llama 2.")
    except Exception as e:
        print(f"Error connecting to Ollama or model 'llama2' not found: {e}")
        print("Please ensure:")
        print("1. Ollama is installed and running (`ollama serve`).")
        print("2. The model 'llama2' is pulled (`ollama pull llama2`).")
        print("Exiting...")
        exit()
else:
    raise ValueError(f"Invalid LLM_PROVIDER: {LLM_PROVIDER}. Must be 'openai' or 'ollama'.")
# --- Define the Prompt Template ---
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise AI assistant. Respond briefly and to the point."),
    ("user", "{question}")
])
# --- Define the Output Parser ---
output_parser = StrOutputParser()
# --- Construct the LCEL Chain ---
# The chain structure remains identical regardless of the LLM provider
chain = prompt | llm | output_parser
# --- Invoke the Chain with questions ---
questions = [
    "What is the capital of Japan?",
    "Explain quantum entanglement in one sentence.",
    "What's a creative use case for LLMs?"
]
print("\n--- Conversational Responses ---")
for q in questions:
    response = chain.invoke({"question": q})
    print(f"Q: {q}")
    print(f"A: {response}\n")
# Example of changing model parameters (e.g., temperature) for a single invocation.
# Expose temperature as a configurable field, then override it per call via .with_config().
configurable_chain = prompt | llm.configurable_fields(
    temperature=ConfigurableField(id="llm_temperature", name="LLM Temperature")
) | output_parser
print("\n--- Example with changed temperature for one invocation ---")
creative_question = "Tell me a very creative and imaginative story idea in 2-3 sentences."
high_temp_response = configurable_chain.with_config(
    run_name="high_temp_query", configurable={"llm_temperature": 1.5}
).invoke({"question": creative_question})
print(f"Q: {creative_question}")
print(f"A: {high_temp_response}\n")
Code Explanation:
- LLM_PROVIDER Configuration: We introduce a simple environment variable, LLM_PROVIDER, to switch between openai and ollama. This makes your script flexible without code changes.
- Conditional LLM Initialization: Based on LLM_PROVIDER, we instantiate either ChatOpenAI or ChatOllama. Both implement the same ChatModel interface, making them interchangeable in the pipeline.
- Ollama Setup Validation: The code includes a try-except block specifically for ChatOllama to catch common setup issues (Ollama server not running, model not pulled) and print clear instructions.
- LCEL Chain Simplicity: The chain = prompt | llm | output_parser line remains unchanged. This highlights the power of LCEL – the underlying llm object can be swapped out, but the pipeline’s structure stays consistent.
- Per-Invocation Configuration: By exposing temperature as a ConfigurableField, we can temporarily override it for a single call via .with_config(configurable=...). This is useful for dynamic adjustments without re-instantiating the LLM.
This project reinforces how LCEL enables flexible and modular design, allowing you to easily manage and switch your LLM providers.
