Welcome to Day 24 of #30DaysOfLangChain – LangChain 0.3 Edition! Over the past few days, we’ve built interactive UIs with Streamlit. While great for demos and internal tools, many real-world AI applications need to integrate with existing systems, mobile apps, or other services. This is where APIs (Application Programming Interfaces) come in.

Today, we’ll shift our focus to the backend, learning how to expose our LangChain applications as RESTful APIs using FastAPI. This allows external clients to interact with our AI models programmatically, opening up a world of integration possibilities.

The Need for APIs in Generative AI

Why build an API for your LangChain application?

  • Decoupling: Separate your AI logic (backend) from your user interface (frontend). This allows different teams to work independently and choose different technologies for each part.
  • Scalability: APIs can be deployed on robust servers, scaled independently, and handle multiple concurrent requests, unlike a single Streamlit app instance.
  • Integration: Easily integrate your AI capabilities into mobile apps, complex web dashboards, internal business systems, or even other microservices.
  • Accessibility: Provide a programmatic interface for developers to build on top of your AI functionality.

FastAPI and Uvicorn: The Dynamic Duo

FastAPI is a modern, fast (high-performance) web framework for building APIs with Python 3.8+ based on standard Python type hints. It’s known for:

  • Incredible Speed: Comparable to Node.js and Go, thanks to Starlette for the web parts and Pydantic for data handling.
  • Automatic Docs: Generates interactive API documentation (Swagger UI / OpenAPI) automatically from your code.
  • Type Hinting: Leverages Python type hints for data validation, serialization, and autocompletion.

Uvicorn is an ASGI (Asynchronous Server Gateway Interface) server that FastAPI applications run on. It’s lightning-fast and designed for asynchronous Python applications. Think of it as the engine that runs your FastAPI car.
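
To make the pieces concrete, here is a minimal, hypothetical example (the file name hello_api.py, the EchoRequest model, and the /echo route are illustrative, not part of today's project). It shows how type hints drive validation and how Uvicorn serves the app:

# hello_api.py -- a minimal, hypothetical FastAPI app
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Hello API")

class EchoRequest(BaseModel):
    text: str

@app.post("/echo")
async def echo(request: EchoRequest):
    # By the time this runs, Pydantic has already validated that `text` is a string.
    return {"echo": request.text}

# Run with: uvicorn hello_api:app --reload
# Interactive docs are generated automatically at http://localhost:8000/docs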

Maintaining Conversation History with RunnableWithMessageHistory

One of the biggest challenges when building conversational AI as an API is that HTTP is stateless. Each request is independent, meaning the server doesn’t inherently remember past interactions. If you’re building a chatbot, this is a problem!

RunnableWithMessageHistory from LangChain solves this elegantly. It’s a runnable wrapper that:

  • Takes a base LangChain runnable (your LLM chain).
  • Manages a BaseChatMessageHistory instance (like InMemoryChatMessageHistory for simple cases or PostgresChatMessageHistory for production).
  • Uses a unique session_id (provided by the client in each request) to retrieve and update the correct conversation history.
  • Feeds the relevant history into your LLM chain, ensuring your bot maintains context.

For this project, we’ll use InMemoryChatMessageHistory for simplicity. In a production setting, you’d typically replace this with a persistent store (e.g., database, Redis).
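
Before wiring this into an API, here is a minimal, self-contained sketch of the pattern. It mirrors the names we'll use in the project below (store, get_session_history), but a RunnableLambda stands in for the real chat model so the snippet runs without any API keys:

from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnableLambda
from langchain_core.runnables.history import RunnableWithMessageHistory

store = {}  # session_id -> InMemoryChatMessageHistory

def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

prompt = ChatPromptTemplate.from_messages([
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

# Stand-in "model": simply reports how many messages it was handed.
fake_llm = RunnableLambda(lambda value: f"I can see {len(value.to_messages())} message(s).")

chain = RunnableWithMessageHistory(
    prompt | fake_llm,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

config = {"configurable": {"session_id": "abc"}}
print(chain.invoke({"input": "Hello"}, config=config))        # sees 1 message
print(chain.invoke({"input": "Hello again"}, config=config))  # sees 3: history + new input

Swap the RunnableLambda for a real chat model and the mechanics stay the same; that is exactly what today's project does.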

Project: A Basic LangChain Chat API with FastAPI

Today’s project will demonstrate how to:

  1. Set up a FastAPI application.
  2. Define Pydantic models for incoming chat requests and outgoing responses.
  3. Integrate a simple LangChain ChatModel.
  4. Wrap the chain (prompt, LLM, and output parser) in RunnableWithMessageHistory to manage per-session chat history.
  5. Create a POST endpoint (/chat) that accepts a message and a session ID, and returns the AI’s response.

Before you run the code:

  • Install necessary libraries: pip install fastapi uvicorn "langchain-openai" "langchain-ollama" python-dotenv
  • Set your OPENAI_API_KEY environment variable if using OpenAI.
  • If using Ollama, ensure it’s running and you’ve pulled your desired chat model (e.g., ollama pull llama2).
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import uvicorn
import os
from typing import Dict, Any, Optional

from langchain_openai import ChatOpenAI
from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage, BaseMessage
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.output_parsers import StrOutputParser

# Load environment variables
from dotenv import load_dotenv
load_dotenv()

# --- Configuration for LLM ---
LLM_PROVIDER = os.getenv("LLM_PROVIDER", "openai").lower() # 'openai' or 'ollama'
OLLAMA_MODEL_CHAT = os.getenv("OLLAMA_MODEL_CHAT", "llama2").lower() # e.g., 'llama2', 'mistral'
OPENAI_MODEL_CHAT = os.getenv("OPENAI_MODEL_CHAT", "gpt-3.5-turbo") # e.g., 'gpt-4o', 'gpt-3.5-turbo'

# --- Initialize LLM ---
def get_llm():
    """Initializes and returns the ChatLargeLanguageModel based on provider."""
    if LLM_PROVIDER == "openai":
        if not os.getenv("OPENAI_API_KEY"):
            raise ValueError("OPENAI_API_KEY not set for OpenAI provider. Please set it.")
        return ChatOpenAI(model=OPENAI_MODEL_CHAT, temperature=0.7)
    elif LLM_PROVIDER == "ollama":
        try:
            llm_instance = ChatOllama(model=OLLAMA_MODEL_CHAT, temperature=0.7)
            # Quick connectivity check (optional, but surfaces a missing model early)
            llm_instance.invoke("test")
            return llm_instance
        except Exception as e:
            raise RuntimeError(f"Error connecting to Ollama LLM '{OLLAMA_MODEL_CHAT}' or model not found: {e}. "
                               f"Please ensure Ollama is running and you have pulled the model: `ollama pull {OLLAMA_MODEL_CHAT}`") from e
    else:
        raise ValueError(f"Invalid LLM provider: {LLM_PROVIDER}. Must be 'openai' or 'ollama'.")

llm = get_llm()

# --- In-memory store for chat histories ---
# In a real application, this would be a persistent database (e.g., Redis, Postgres)
store: Dict[str, InMemoryChatMessageHistory] = {}

def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
    """Returns a new BaseChatMessageHistory instance for a given session ID."""
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

# --- LangChain Runnable with Message History ---
# Define the prompt template with a placeholder for messages
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful AI assistant. Answer user questions concisely.",
        ),
        MessagesPlaceholder(variable_name="history"), # Placeholder for chat history
        ("human", "{input}"), # User's current input
    ]
)

# Create the base chain: prompt -> LLM -> output parser
chain = prompt | llm | StrOutputParser()

# Wrap the chain with RunnableWithMessageHistory
# `get_session_history` is a function that returns the history object for a given session_id
# `input_messages_key` tells LangChain which key in the input dictionary corresponds to the new message
# `history_messages_key` tells LangChain which key in the prompt expects the history
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

# --- FastAPI App Setup ---
app = FastAPI(
    title="LangChain Chatbot API",
    description="A simple FastAPI endpoint for a LangChain chat bot with conversational memory.",
    version="0.1.0",
)

# --- Pydantic Models for Request and Response ---
class ChatRequest(BaseModel):
    """Request schema for the chat endpoint."""
    session_id: str
    message: str

class ChatResponse(BaseModel):
    """Response schema for the chat endpoint."""
    session_id: str
    response: str
    message_count: int # For demonstration, show how many messages in history

# --- API Endpoint ---
@app.post("/chat", response_model=ChatResponse)
async def chat_endpoint(request: ChatRequest):
    """
    Handles chat messages, maintains conversation history, and returns AI response.
    """
    try:
        # Invoke the chain with the current input and session configuration
        # The session_id from the request is used by `get_session_history`
        response = await chain_with_history.ainvoke(
            {"input": request.message},
            config={"configurable": {"session_id": request.session_id}}
        )
        
        # Retrieve the updated message count for the session
        current_history = get_session_history(request.session_id)
        message_count = len(current_history.messages)

        return ChatResponse(
            session_id=request.session_id,
            response=response,
            message_count=message_count
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/health")
async def health_check():
    """Health check endpoint."""
    return {"status": "ok", "llm_provider": LLM_PROVIDER}

# --- How to run this app ---
# Save this file with a valid Python module name, e.g. day24_fastapi_chat.py
# (hyphens are not allowed in module names), then run it for development with auto-reload:
# uvicorn day24_fastapi_chat:app --reload --host 0.0.0.0 --port 8000
#
# Open your browser to http://localhost:8000/docs for interactive API documentation.
# Test with a tool like curl:
# curl -X POST "http://localhost:8000/chat" -H "Content-Type: application/json" -d '{"session_id": "test_session_123", "message": "Hi there!"}'
# curl -X POST "http://localhost:8000/chat" -H "Content-Type: application/json" -d '{"session_id": "test_session_123", "message": "What did I just ask you?"}'

Code Explanation & Key Takeaways:

  1. Dependencies: We added fastapi and uvicorn. InMemoryChatMessageHistory ships with langchain-core, which is installed automatically alongside the provider packages.
  2. LLM Initialization (get_llm()):
    • Similar to previous days, but it now raises a descriptive ValueError or RuntimeError at startup if the provider is misconfigured or unreachable, so the app fails fast with an actionable message instead of serving requests with a broken LLM.
    • Uses os.getenv for flexible configuration of LLM provider and models.
  3. In-Memory Chat History (store, get_session_history):
    • store: A global dictionary store: Dict[str, InMemoryChatMessageHistory] acts as our simple in-memory database to hold chat histories, keyed by session_id.
    • get_session_history(session_id: str): This function is crucial. It’s passed to RunnableWithMessageHistory and is responsible for getting or creating the chat history object for a given session_id.
  4. LangChain RunnableWithMessageHistory:
    • Prompt: The ChatPromptTemplate now includes MessagesPlaceholder(variable_name="history"). This is where RunnableWithMessageHistory will inject the entire conversation history.
    • Base Chain: prompt | llm | StrOutputParser() is our core LLM call.
    • Wrapper: RunnableWithMessageHistory(chain, get_session_history, ...) wraps our base chain.
      • input_messages_key="input": Specifies that the new user message will be under the key “input” in the dictionary passed to chain_with_history.invoke().
      • history_messages_key="history": Specifies that the chat history should be injected into the prompt under the variable name “history”.
  5. FastAPI App (FastAPI(), BaseModel):
    • app = FastAPI(...): Instantiates our FastAPI application with some metadata.
    • Pydantic Models: ChatRequest and ChatResponse define the expected structure for incoming JSON requests and outgoing JSON responses. FastAPI uses these for automatic data validation and serialization/deserialization.
  6. API Endpoint (@app.post("/chat")):
    • @app.post("/chat"): Defines a POST endpoint at the /chat path.
    • async def chat_endpoint(request: ChatRequest): The handler is declared async (FastAPI supports async def endpoints natively) and takes request as an argument, which is automatically validated against our ChatRequest Pydantic model.
    • await chain_with_history.ainvoke(...): The asynchronous invocation of our LangChain chain_with_history.
      • {"input": request.message}: Provides the current user message.
      • config={"configurable": {"session_id": request.session_id}}: This is how RunnableWithMessageHistory receives the session_id to know which conversation history to use.
    • The response is formatted using the ChatResponse model.
    • Error Handling: A try-except block catches potential errors and returns them as a 500 HTTP error.
  7. Running the App:
    • The comments at the end of the file provide clear instructions on how to run the application using uvicorn.
    • uvicorn day24_fastapi_chat:app --reload --host 0.0.0.0 --port 8000 is the standard command (note the underscores; the file needs a valid Python module name).
    • Accessing http://localhost:8000/docs in your browser will show the interactive OpenAPI (Swagger UI) documentation, where you can test the endpoint directly.
    • curl examples are also provided for command-line testing; a small Python client sketch follows this list.
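
Beyond curl, any HTTP client can call the API. Here is a minimal Python client sketch (an assumption, not part of the server file: it requires pip install requests and expects the API to be running locally on port 8000):

# client_sketch.py -- hypothetical client for the /chat endpoint
import requests

BASE_URL = "http://localhost:8000"
SESSION_ID = "test_session_123"

for message in ["Hi there!", "What did I just ask you?"]:
    resp = requests.post(
        f"{BASE_URL}/chat",
        json={"session_id": SESSION_ID, "message": message},
        timeout=60,
    )
    resp.raise_for_status()
    data = resp.json()
    print(f"[{data['message_count']} messages] {data['response']}")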

This project lays the groundwork for building robust, production-ready LangChain backends that can be consumed by any client application, effectively decoupling your AI logic from your frontend presentation.


Key Takeaway

Day 24 was a pivotal step, moving our LangChain applications from local scripts/UIs to scalable, exposed RESTful APIs with FastAPI. We learned how FastAPI, powered by Uvicorn, enables high-performance Python backends. Most importantly, we mastered RunnableWithMessageHistory to seamlessly manage conversational state across stateless HTTP requests, ensuring our AI bots can remember past interactions for each user. This sets the stage for integrating GenAI into diverse applications!

I’m Arpan

I’m a Software Engineer driven by curiosity and a deep interest in Generative AI Technologies. I believe we’re standing at the frontier of a new era—where machines not only learn but create, and I’m excited to explore what’s possible at this intersection of intelligence and imagination.

When I’m not writing code or experimenting with new AI models, you’ll probably find me travelling, soaking in new cultures, or reading a book that challenges how I think. I thrive on new ideas—especially ones that can be turned into meaningful, impactful projects. If it’s bold, innovative, and GenAI-related, I’m all in.

“The future belongs to those who believe in the beauty of their dreams.” – Eleanor Roosevelt

“Imagination is more important than knowledge. For knowledge is limited, whereas imagination embraces the entire world.” – Albert Einstein

This blog, MLVector, is my space to share technical insights, project breakdowns, and explorations in GenAI—from the models shaping tomorrow to the code powering today.

Let’s build the future, one vector at a time.

Let’s connect