Welcome to Day 24 of #30DaysOfLangChain – LangChain 0.3 Edition! Over the past few days, we’ve built interactive UIs with Streamlit. While great for demos and internal tools, many real-world AI applications need to integrate with existing systems, mobile apps, or other services. This is where APIs (Application Programming Interfaces) come in.
Today, we’ll shift our focus to the backend, learning how to expose our LangChain applications as RESTful APIs using FastAPI. This allows external clients to interact with our AI models programmatically, opening up a world of integration possibilities.
The Need for APIs in Generative AI
Why build an API for your LangChain application?
- Decoupling: Separate your AI logic (backend) from your user interface (frontend). This allows different teams to work independently and choose different technologies for each part.
- Scalability: APIs can be deployed on robust servers, scaled independently, and handle multiple concurrent requests, unlike a single Streamlit app instance.
- Integration: Easily integrate your AI capabilities into mobile apps, complex web dashboards, internal business systems, or even other microservices.
- Accessibility: Provide a programmatic interface for developers to build on top of your AI functionality.
FastAPI and Uvicorn: The Dynamic Duo
FastAPI is a modern, fast (high-performance) web framework for building APIs with Python 3.8+ based on standard Python type hints. It’s known for:
- Incredible Speed: Comparable to Node.js and Go, thanks to Starlette for the web parts and Pydantic for data handling.
- Automatic Docs: Generates interactive API documentation (Swagger UI / OpenAPI) automatically from your code.
- Type Hinting: Leverages Python type hints for data validation, serialization, and autocompletion.
Uvicorn is an ASGI (Asynchronous Server Gateway Interface) server that FastAPI applications run on. It’s lightning-fast and designed for asynchronous Python applications. Think of it as the engine that runs your FastAPI car.
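If FastAPI and Uvicorn are new to you, here is a minimal warm-up sketch (hypothetical file `hello_api.py`, separate from today's project) that shows the type-hint-driven workflow in a few lines:

```python
# hello_api.py -- hypothetical warm-up example, not part of today's project
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Hello API")

class EchoRequest(BaseModel):
    text: str  # FastAPI/Pydantic validate this field from the JSON body

@app.post("/echo")
async def echo(request: EchoRequest):
    # The body has already been parsed and validated; just echo it back
    return {"echo": request.text}

# Run with:  uvicorn hello_api:app --reload
# Then open http://localhost:8000/docs for the auto-generated Swagger UI.
```

The interactive docs at `/docs` are generated entirely from the type hints and the Pydantic model; no extra configuration is needed.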
Maintaining Conversation History with RunnableWithMessageHistory
One of the biggest challenges when building conversational AI as an API is that HTTP is stateless. Each request is independent, meaning the server doesn’t inherently remember past interactions. If you’re building a chatbot, this is a problem!
RunnableWithMessageHistory from LangChain solves this elegantly. It’s a runnable wrapper that:
- Takes a base LangChain runnable (your LLM chain).
- Manages a `BaseChatMessageHistory` instance (like `InMemoryChatMessageHistory` for simple cases or `PostgresChatMessageHistory` for production).
- Uses a unique `session_id` (provided by the client in each request) to retrieve and update the correct conversation history.
- Feeds the relevant history into your LLM chain, ensuring your bot maintains context.
For this project, we'll use `InMemoryChatMessageHistory` for simplicity. In a production setting, you'd typically replace this with a persistent store (e.g., database, Redis).
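To make that production swap concrete, here is a hedged sketch (assuming a local Redis server and the `langchain-community` Redis integration plus the `redis` client package): only the history factory that the project below calls `get_session_history` would change, while the chain and the API stay the same.

```python
# Hypothetical production swap: persist chat history in Redis instead of process memory.
# Assumes a Redis server reachable at redis://localhost:6379 and `pip install redis`.
from langchain_community.chat_message_histories import RedisChatMessageHistory

def get_session_history(session_id: str) -> RedisChatMessageHistory:
    # Messages are stored under a Redis key derived from session_id, so history
    # survives restarts and can be shared across multiple API workers.
    return RedisChatMessageHistory(session_id=session_id, url="redis://localhost:6379/0")
```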
Project: A Basic LangChain Chat API with FastAPI
Today’s project will demonstrate how to:
- Set up a FastAPI application.
- Define Pydantic models for incoming chat requests and outgoing responses.
- Integrate a simple LangChain `ChatModel`.
- Wrap the LLM in `RunnableWithMessageHistory` to manage per-session chat history.
- Create a `POST` endpoint (`/chat`) that accepts a message and a session ID, and returns the AI's response.
Before you run the code:
- Install the necessary libraries: `pip install fastapi uvicorn "langchain-openai" "langchain-ollama" "langchain-community" python-dotenv`
- Set your `OPENAI_API_KEY` environment variable if using OpenAI.
- If using Ollama, ensure it's running and you've pulled your desired chat model (e.g., `ollama pull llama2`).
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import uvicorn
import os
from typing import Dict, Any, Optional
from langchain_openai import ChatOpenAI
from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage, BaseMessage
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.output_parsers import StrOutputParser
# Load environment variables
from dotenv import load_dotenv
load_dotenv()
# --- Configuration for LLM ---
LLM_PROVIDER = os.getenv("LLM_PROVIDER", "openai").lower() # 'openai' or 'ollama'
OLLAMA_MODEL_CHAT = os.getenv("OLLAMA_MODEL_CHAT", "llama2").lower() # e.g., 'llama2', 'mistral'
OPENAI_MODEL_CHAT = os.getenv("OPENAI_MODEL_CHAT", "gpt-3.5-turbo") # e.g., 'gpt-4o', 'gpt-3.5-turbo'
# --- Initialize LLM ---
def get_llm():
"""Initializes and returns the ChatLargeLanguageModel based on provider."""
if LLM_PROVIDER == "openai":
if not os.getenv("OPENAI_API_KEY"):
raise ValueError("OPENAI_API_KEY not set for OpenAI provider. Please set it.")
return ChatOpenAI(model=OPENAI_MODEL_CHAT, temperature=0.7)
elif LLM_PROVIDER == "ollama":
try:
llm_instance = ChatOllama(model=OLLAMA_MODEL_CHAT, temperature=0.7)
# Test connection (optional but good practice)
llm_instance.invoke("test", config={"stream": False})
return llm_instance
except Exception as e:
raise RuntimeError(f"Error connecting to Ollama LLM '{OLLAMA_MODEL_CHAT}' or model not found: {e}. "
f"Please ensure Ollama is running and you have pulled the model: `ollama pull {OLLAMA_MODEL_CHAT}`") from e
else:
raise ValueError(f"Invalid LLM provider: {LLM_PROVIDER}. Must be 'openai' or 'ollama'.")
llm = get_llm()
# --- In-memory store for chat histories ---
# In a real application, this would be a persistent database (e.g., Redis, Postgres)
store: Dict[str, InMemoryChatMessageHistory] = {}
def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
"""Returns a new BaseChatMessageHistory instance for a given session ID."""
if session_id not in store:
store[session_id] = InMemoryChatMessageHistory()
return store[session_id]
# --- LangChain Runnable with Message History ---
# Define the prompt template with a placeholder for messages
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You are a helpful AI assistant. Answer user questions concisely.",
),
MessagesPlaceholder(variable_name="history"), # Placeholder for chat history
("human", "{input}"), # User's current input
]
)
# Create the base chain: prompt -> LLM -> output parser
chain = prompt | llm | StrOutputParser()
# Wrap the chain with RunnableWithMessageHistory
# `get_session_history` is a function that returns the history object for a given session_id
# `input_messages_key` tells LangChain which key in the input dictionary corresponds to the new message
# `history_messages_key` tells LangChain which key in the prompt expects the history
chain_with_history = RunnableWithMessageHistory(
chain,
get_session_history,
input_messages_key="input",
history_messages_key="history",
)
# --- FastAPI App Setup ---
app = FastAPI(
title="LangChain Chatbot API",
description="A simple FastAPI endpoint for a LangChain chat bot with conversational memory.",
version="0.1.0",
)
# --- Pydantic Models for Request and Response ---
class ChatRequest(BaseModel):
"""Request schema for the chat endpoint."""
session_id: str
message: str
class ChatResponse(BaseModel):
"""Response schema for the chat endpoint."""
session_id: str
response: str
message_count: int # For demonstration, show how many messages in history
# --- API Endpoint ---
@app.post("/chat", response_model=ChatResponse)
async def chat_endpoint(request: ChatRequest):
"""
Handles chat messages, maintains conversation history, and returns AI response.
"""
try:
# Invoke the chain with the current input and session configuration
# The session_id from the request is used by `get_session_history`
response = await chain_with_history.ainvoke(
{"input": request.message},
config={"configurable": {"session_id": request.session_id}}
)
# Retrieve the updated message count for the session
current_history = get_session_history(request.session_id)
message_count = len(current_history.messages)
return ChatResponse(
session_id=request.session_id,
response=response,
message_count=message_count
)
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
@app.get("/health")
async def health_check():
"""Health check endpoint."""
return {"status": "ok", "llm_provider": LLM_PROVIDER}
# --- How to run this app ---
# To run this file directly (for development with auto-reload):
# uvicorn day24-fastapi-chat:app --reload --host 0.0.0.0 --port 8000
#
# Open your browser to http://localhost:8000/docs for interactive API documentation.
# Test with a tool like curl:
# curl -X POST "http://localhost:8000/chat" -H "Content-Type: application/json" -d '{"session_id": "test_session_123", "message": "Hi there!"}'
# curl -X POST "http://localhost:8000/chat" -H "Content-Type: application/json" -d '{"session_id": "test_session_123", "message": "What did I just ask you?"}'
```
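For reference, the first curl call returns JSON shaped like the `ChatResponse` model; the exact `response` text will vary by model, and `message_count` is 2 because the history now holds one human and one AI message:

```json
{
  "session_id": "test_session_123",
  "response": "Hello! How can I help you today?",
  "message_count": 2
}
```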
Code Explanation & Key Takeaways:
- Dependencies: We added `fastapi` and `uvicorn`. `InMemoryChatMessageHistory` ships with `langchain-core`, and `langchain-community` is installed for swappable persistent history backends.
- LLM Initialization (`get_llm()`):
  - Similar to previous days, but now it raises a `ValueError` or `RuntimeError` with a clear message if the provider is misconfigured, so the app fails fast at startup rather than on the first request.
  - Uses `os.getenv` for flexible configuration of the LLM provider and models.
- In-Memory Chat History (`store`, `get_session_history`):
  - `store`: A global dictionary `store: Dict[str, InMemoryChatMessageHistory]` acts as our simple in-memory database to hold chat histories, keyed by `session_id`.
  - `get_session_history(session_id: str)`: This function is crucial. It's passed to `RunnableWithMessageHistory` and is responsible for getting or creating the chat history object for a given `session_id`.
- LangChain `RunnableWithMessageHistory`:
  - Prompt: The `ChatPromptTemplate` now includes `MessagesPlaceholder(variable_name="history")`. This is where `RunnableWithMessageHistory` will inject the entire conversation history.
  - Base Chain: `prompt | llm | StrOutputParser()` is our core LLM call.
  - Wrapper: `RunnableWithMessageHistory(chain, get_session_history, ...)` wraps our base chain. `input_messages_key="input"` specifies that the new user message will be under the key `"input"` in the dictionary passed to `chain_with_history.invoke()`, and `history_messages_key="history"` specifies that the chat history should be injected into the prompt under the variable name `"history"`.
- FastAPI App (`FastAPI()`, `BaseModel`):
  - `app = FastAPI(...)`: Instantiates our FastAPI application with some metadata.
  - Pydantic Models: `ChatRequest` and `ChatResponse` define the expected structure for incoming JSON requests and outgoing JSON responses. FastAPI uses these for automatic data validation and serialization/deserialization.
- API Endpoint (`@app.post("/chat")`):
  - `@app.post("/chat")`: Defines a `POST` endpoint at the `/chat` path.
  - `async def chat_endpoint(request: ChatRequest)`: The function is `async` (FastAPI handles async endpoints natively) and takes `request` as an argument, which is automatically validated against our `ChatRequest` Pydantic model.
  - `await chain_with_history.ainvoke(...)`: The asynchronous invocation of our LangChain `chain_with_history`. `{"input": request.message}` provides the current user message, and `config={"configurable": {"session_id": request.session_id}}` is how `RunnableWithMessageHistory` receives the `session_id` to know which conversation history to use.
  - The response is formatted using the `ChatResponse` model.
  - Error Handling: A `try`/`except` block catches potential errors and returns them as a 500 HTTP error.
- Running the App:
  - The comments at the end of the file provide clear instructions on how to run the application using `uvicorn`. `uvicorn day24-fastapi-chat:app --reload --host 0.0.0.0 --port 8000` is the standard command.
  - Accessing `http://localhost:8000/docs` in your browser will show the interactive OpenAPI (Swagger UI) documentation, where you can test the endpoint directly.
  - `curl` examples are also provided for command-line testing, and a small Python client sketch follows this list.
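To exercise the session memory from Python rather than curl, here is a small hypothetical client (assuming the server is running locally on port 8000 and the `requests` package is installed) that sends two messages under one `session_id`:

```python
# client_demo.py -- hypothetical client for the /chat endpoint (server must already be running)
import requests

BASE_URL = "http://localhost:8000"
SESSION_ID = "demo-session-1"

def chat(message: str) -> dict:
    # POST one message for this session; the server injects earlier turns automatically.
    resp = requests.post(f"{BASE_URL}/chat", json={"session_id": SESSION_ID, "message": message})
    resp.raise_for_status()
    return resp.json()

print(chat("Hi, my name is Alice.")["response"])
print(chat("What is my name?")["response"])  # should mention "Alice" thanks to the shared history
```

Each call also returns `message_count`, which grows by two per exchange, giving you a quick way to confirm the correct history is being used.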
This project lays the groundwork for building robust, production-ready LangChain backends that can be consumed by any client application, effectively decoupling your AI logic from your frontend presentation.
Key Takeaway
Day 24 was a pivotal step, moving our LangChain applications from local scripts/UIs to scalable, exposed RESTful APIs with FastAPI. We learned how FastAPI, powered by Uvicorn, enables high-performance Python backends. Most importantly, we mastered `RunnableWithMessageHistory` to seamlessly manage conversational state across stateless HTTP requests, ensuring our AI bots can remember past interactions for each user. This sets the stage for integrating GenAI into diverse applications!
