Welcome to Day 13 of #30DaysOfLangChain! We’ve made great strides in building dynamic agents with LangGraph, but what happens when an LLM hallucinates, a tool fails, or an external API returns an unexpected response? In the real world, errors are inevitable. Today, we’ll equip ourselves with strategies for Error Handling and Debugging to make our LangGraph applications more resilient and reliable.
Why Error Handling Matters in LLM Applications
LLM-powered applications, especially those involving tools and complex workflows like LangGraph, introduce unique challenges:
- Non-deterministic LLMs: LLMs are creative and can sometimes output unexpected formats, leading to parsing errors.
- External Dependencies: Tools often rely on APIs, databases, or web services that can fail due to network issues, rate limits, or incorrect inputs.
- State Management: In complex graphs, inconsistencies in the shared `GraphState` can lead to downstream failures.
- User Experience: Unhandled errors result in crashes or poor user experiences, making your application seem unreliable.
Robust error handling allows your application to gracefully recover, inform the user, log issues, or attempt alternative paths.
Debugging Strategies in LangGraph
Before we handle errors, we need to find them! LangGraph provides several powerful debugging aids:
- `verbose=True`: As we’ve seen, setting `verbose=True` when compiling and invoking your `app` is invaluable. It prints every node execution, every state transition, and every conditional decision, giving you a detailed trace of the graph’s execution path.
- `return_intermediate_steps=True`: When invoking your compiled graph, you can set `return_intermediate_steps=True`. This includes a list of all intermediate `ToolMessage` and `AIMessage` objects (representing tool calls and LLM outputs, including thoughts) in the final output dictionary, providing a programmatic way to inspect the full reasoning trace.
- Graph Visualization: LangGraph supports visualizing your graph using Graphviz. This helps you understand complex flows and identify unintended paths. While setup can be tricky (it requires the `graphviz` library plus a system installation), it’s highly recommended for intricate graphs.
- Logging & Print Statements: Don’t underestimate simple `print()` statements or Python’s `logging` module within your nodes. They provide real-time insight into what’s happening at specific points in your graph.
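As a minimal sketch of that last point, a hypothetical node can log its inputs with Python's standard `logging` module before doing any work (the logger name, node name, and state keys here are illustrative, not part of the project code):

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("langgraph.demo")

def my_node(state: dict) -> dict:
    # Log what the node received before doing any work --
    # a cheap, real-time window into the graph's execution.
    logger.info("my_node received %d messages", len(state.get("messages", [])))
    # Return a (here empty) state update, as LangGraph nodes do.
    return {}
```

Because logging goes through a configurable handler, you can later redirect the same trace to a file or monitoring system without touching the node code.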
Implementing Robust Error Handling
The core idea for error handling in LangGraph is to combine Python’s `try-except` blocks with LangGraph’s conditional routing:

- Wrap Node Logic in `try-except`: Any code within your nodes that might fail (e.g., tool calls, API requests, complex parsing) should be wrapped in `try-except` blocks.
- Update `GraphState` on Error: If an exception occurs, catch it and update your `GraphState` to reflect the error. This could mean setting an error flag (e.g., `error: Annotated[bool, operator.or_]`), adding an `error_message`, or modifying a status field.
- Conditional Routing to an Error Handler: Define a dedicated “error handler” node. Use a conditional edge (with a router function) to divert the flow to this node if the error flag (or a similar state variable) is set. This keeps your main workflow clean and ensures errors are addressed.
- The Error Handler Node: This node can perform various actions: log the error, send an alert, provide a generic fallback message to the user, or even attempt a retry.
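The first three steps can be sketched in isolation. This is a minimal, hypothetical example (`risky_api_call`, `fetch_data_node`, and the route names are stand-ins, not part of the project below):

```python
def risky_api_call(query: str) -> str:
    """Stand-in for a real external call that can fail."""
    if query == "bad":
        raise RuntimeError("upstream failure")
    return f"result for {query}"

def fetch_data_node(state: dict) -> dict:
    """Steps 1 & 2: wrap the risky work and record any failure in the state."""
    try:
        return {"data": risky_api_call(state["query"]), "error_occurred": False}
    except Exception as e:
        return {"error_message": str(e), "error_occurred": True}

def route_after_fetch(state: dict) -> str:
    """Step 3: a router function that diverts to the error handler on failure."""
    return "handle_error" if state.get("error_occurred") else "continue"
```

In a real graph, `route_after_fetch` would be passed to `workflow.add_conditional_edges`, with `"handle_error"` mapped to the dedicated error-handler node from step 4.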
For more details, check out the official LangGraph documentation (while specific error handling tutorials are evolving, the core principles of conditional routing apply):
Project: Agent with Error Handling & Debugging Features
We’ll modify our autonomous agent to include intentional error scenarios and robust handling for them. Our calculator tool will sometimes fail, and we’ll route the graph to an error-handling node when this happens.
Before you run the code:
- Ensure all dependencies from Day 12 are installed.
- We’ll introduce a new state variable, `error_occurred: Annotated[bool, operator.or_]`, to signal an error.
import os
import random
from typing import TypedDict, Annotated, List, Union
from datetime import datetime
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage, ToolMessage
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_ollama import ChatOllama
from langchain_core.tools import tool
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
import operator # For Annotated[bool, operator.or_]
# Load environment variables from .env file
from dotenv import load_dotenv
load_dotenv()
# --- Configuration ---
LLM_PROVIDER = os.getenv("LLM_PROVIDER", "openai").lower()
OLLAMA_MODEL_CHAT = os.getenv("OLLAMA_MODEL_CHAT", "llama2").lower()
# --- Step 1: Define Agent State with Error Flag ---
class AgentState(TypedDict):
"""
Represents the state of our agent's graph.
- messages: A list of messages forming the conversation history.
- error_occurred: A flag to indicate if an error happened in a node.
operator.or_ combines boolean updates (True if any update is True).
"""
messages: Annotated[List[BaseMessage], add_messages]
error_occurred: Annotated[bool, operator.or_] # New error flag
# --- Step 2: Define Custom Tools (with a potential for error) ---
@tool
def get_current_datetime() -> str:
"""Returns the current date and time in a human-readable format."""
print("\n--- Tool Action: Executing get_current_datetime ---")
return datetime.now().strftime("%Y-%m-%d %H:%M:%S")
@tool
def risky_calculator(expression: str) -> str:
"""
Evaluates a simple mathematical expression.
This tool has a 30% chance of failing to demonstrate error handling.
Example: '2 + 2', '(5 * 3) / 2'
"""
print(f"\n--- Tool Action: Executing risky_calculator on '{expression}' ---")
if random.random() < 0.3: # 30% chance of failure
raise ValueError("Simulated calculation error!")
    try:
        # NOTE: eval() is unsafe on untrusted input; acceptable only for this demo.
        return str(eval(expression))
except Exception as e:
# Catch explicit evaluation errors as well
raise ValueError(f"Calculation expression error: {e}")
tools = [get_current_datetime, risky_calculator]
print(f"Available tools: {[tool.name for tool in tools]}\n")
# --- Step 3: Initialize LLM ---
def initialize_llm(provider, model_name=None, temp=0.7):
if provider == "openai":
if not os.getenv("OPENAI_API_KEY"):
raise ValueError("OPENAI_API_KEY not set for OpenAI provider.")
return ChatOpenAI(model=model_name or "gpt-3.5-turbo", temperature=temp).bind_tools(tools)
elif provider == "ollama":
try:
llm = ChatOllama(model=model_name or OLLAMA_MODEL_CHAT, temperature=temp)
llm.invoke("Hello!") # Test connection
return llm
except Exception as e:
print(f"Error connecting to Ollama LLM or model '{model_name or OLLAMA_MODEL_CHAT}' not found: {e}")
print("Please ensure Ollama is running and the specified model is pulled.")
exit()
else:
raise ValueError(f"Invalid LLM provider: {provider}. Must be 'openai' or 'ollama'.")
llm = initialize_llm(LLM_PROVIDER)
print(f"Using LLM: {LLM_PROVIDER} ({llm.model_name if hasattr(llm, 'model_name') else OLLAMA_MODEL_CHAT})\n")
# --- Step 4: Define Graph Nodes ---
def call_llm_node(state: AgentState) -> AgentState:
"""Node to call the LLM and get its response."""
print("--- Node: call_llm_node ---")
messages = state['messages']
# Reset error flag at the start of LLM call (new thought cycle)
# This is important if we want to potentially recover or retry in a loop
return_state = {"error_occurred": False}
if LLM_PROVIDER == "ollama":
tool_names = ", ".join([t.name for t in tools])
tool_descriptions = "\n".join([f"Tool Name: {t.name}\nTool Description: {t.description}\nTool Schema: {t.args_schema.schema() if t.args_schema else 'No schema'}" for t in tools])
system_message = (
"You are a helpful AI assistant. You have access to the following tools: "
f"{tool_names}.\n\n"
f"Here are their descriptions and schemas:\n{tool_descriptions}\n\n"
"If you need to use a tool, respond with a JSON object like: "
"```json\n{{\"tool_name\": \"<tool_name>\", \"tool_input\": {{...}}}}\n```. "
"Think step by step. If a tool is useful, call it. Otherwise, provide a direct answer. "
"Your response should be either a tool call JSON or a direct final answer."
)
        # Build the message list directly: a ChatPromptTemplate would try to
        # interpolate the literal braces in the tool schemas as template variables.
        from langchain_core.messages import SystemMessage
        response = llm.invoke([SystemMessage(content=system_message)] + list(messages))
else:
response = llm.invoke(messages)
return_state["messages"] = [response]
return return_state
def call_tool_node(state: AgentState) -> AgentState:
"""Node to execute a tool, with error handling."""
print("--- Node: call_tool_node ---")
messages = state['messages']
last_message = messages[-1]
tool_outputs = []
error_flag = False
    if getattr(last_message, "tool_calls", None):
        for tool_call in last_message.tool_calls:
            # Tool calls on an AIMessage are dicts: {"name": ..., "args": ..., "id": ...}
            tool_name = tool_call["name"]
            tool_input = tool_call["args"]
            print(f"Attempting to execute tool: {tool_name} with input: {tool_input}")
            selected_tool = next((t for t in tools if t.name == tool_name), None)
            if selected_tool:
                try:
                    output = selected_tool.invoke(tool_input)
                    tool_outputs.append(ToolMessage(content=str(output), tool_call_id=tool_call["id"]))
                except Exception as e:
                    print(f"!!! Error executing {tool_name}: {e} !!!")
                    tool_outputs.append(ToolMessage(content=f"Error: {e}", tool_call_id=tool_call["id"]))
                    error_flag = True  # Set error flag on tool failure
            else:
                print(f"!!! Error: Tool '{tool_name}' not found. !!!")
                tool_outputs.append(ToolMessage(content=f"Tool '{tool_name}' not found.", tool_call_id=tool_call["id"]))
                error_flag = True  # Set error flag
elif LLM_PROVIDER == "ollama" and isinstance(last_message.content, str):
import json
try:
json_str = last_message.content[last_message.content.find('{'):last_message.content.rfind('}')+1]
tool_call_data = json.loads(json_str)
tool_name = tool_call_data.get("tool_name")
tool_input = tool_call_data.get("tool_input", {})
print(f"Attempting to execute Ollama-parsed tool: {tool_name} with input: {tool_input}")
selected_tool = next((t for t in tools if t.name == tool_name), None)
if selected_tool:
output = selected_tool.invoke(tool_input)
tool_outputs.append(AIMessage(content=f"Tool output for {tool_name}: {output}"))
else:
print(f"!!! Error: Ollama-parsed Tool '{tool_name}' not found. !!!")
                tool_outputs.append(AIMessage(content=f"Tool '{tool_name}' not found for Ollama."))  # AIMessage takes no tool_call_id
error_flag = True
except (json.JSONDecodeError, StopIteration, ValueError) as e:
print(f"!!! Error parsing Ollama tool call or executing: {e} !!!")
tool_outputs.append(AIMessage(content=f"Error parsing or executing Ollama tool: {e}"))
error_flag = True
else:
print("No tool calls detected or parsed for execution.")
error_flag = True # Treat as error if call_tool was reached but no tool call found
return {"messages": tool_outputs, "error_occurred": error_flag}
def error_handler_node(state: AgentState) -> AgentState:
"""
Node to handle errors. It logs the error and provides a fallback message.
"""
print("\n--- Node: error_handler_node (Error Detected!) ---")
error_message = "An unexpected error occurred during processing. Please try again or rephrase your request."
# Optionally, you could log the full state or specific error details here
final_messages = state['messages'] + [AIMessage(content=error_message)]
print(f"Error handled. Final message: {error_message}")
return {"messages": final_messages, "error_occurred": False} # Reset error flag
# --- Step 5: Define the Routing/Decider Function ---
def route_decision(state: AgentState) -> str:
"""
Decides the next step based on error flag or LLM's last message.
"""
print("--- Decider: route_decision ---")
if state.get("error_occurred", False):
print("Decision: Error detected, routing to error_handler_node.")
return "handle_error"
last_message = state['messages'][-1]
    if getattr(last_message, "tool_calls", None):  # ToolMessages have no tool_calls attribute
print("Decision: LLM wants to call a tool.")
return "tool_call"
if LLM_PROVIDER == "ollama" and isinstance(last_message.content, str):
import json
try:
json_str = last_message.content[last_message.content.find('{'):last_message.content.rfind('}')+1]
json.loads(json_str)
print("Decision: Ollama LLM seems to want to call a tool (parsed JSON).")
return "tool_call"
except json.JSONDecodeError:
print("Decision: Ollama LLM content looks like text (no tool call JSON).")
return "end"
print("Decision: LLM has a final answer or no tool needed.")
return "end"
# --- Step 6: Build the LangGraph with Error Handling ---
print("--- Building the LangGraph with Error Handling & Debugging ---")
workflow = StateGraph(AgentState)
# Add nodes
workflow.add_node("llm", call_llm_node)
workflow.add_node("tool", call_tool_node)
workflow.add_node("error_handler", error_handler_node) # New error handler node
# Set entry point
workflow.set_entry_point("llm")
# Add conditional edge from 'llm' node (can route to tool or end)
workflow.add_conditional_edges(
"llm",
route_decision,
{
"tool_call": "tool",
"end": END,
"handle_error": "error_handler" # If an error somehow occurs before tool
}
)
# Add conditional edge from 'tool' node (after tool execution)
# Crucially, after a tool, we check for errors first, then loop back to LLM.
workflow.add_conditional_edges(
"tool",
route_decision,
{
"tool_call": "tool", # This path shouldn't typically be taken directly after a tool
"end": "llm", # If no error and no more tool calls, loop back to LLM to finalize
"handle_error": "error_handler"
}
)
# After handling an error, we want to terminate the graph for this query,
# or you could try a retry mechanism (more advanced)
workflow.add_edge("error_handler", END)
# Compile the graph
# return_intermediate_steps=True will give us the full trace in the output
app = workflow.compile(
# debug=True # LangGraph also has a debug option for more verbose internal logging
)
print("Agent graph with error handling compiled successfully.\n")
# --- Step 7: Invoke the Agent ---
print("--- Invoking the Agent (Verbose output below) ---")
agent_questions = [
"What is the current date and time?", # Expected to succeed
"Calculate 10 / 0.", # Expected to fail (div by zero in calculator)
"Calculate 5 * 5.", # Expected to sometimes fail (simulated error)
"Tell me a joke.", # Expected to succeed (no tool needed)
]
for i, question in enumerate(agent_questions):
print(f"\n===== Agent Turn {i+1} =====")
print(f"User Question: {question}")
initial_input = {"messages": [HumanMessage(content=question)], "error_occurred": False} # Initialize error flag to False
try:
# Use return_intermediate_steps=True to inspect the full trace programmatically
final_result = app.invoke(initial_input, config={"return_intermediate_steps": True})
# Access the final state and intermediate steps
final_messages = final_result['messages']
intermediate_steps = final_result.get('intermediate_steps', [])
print(f"\nAgent Final Answer: {final_messages[-1].content}")
# print("\n--- Intermediate Steps (for debugging) ---")
# for step in intermediate_steps:
# print(step) # Print the raw step objects if you want
# print("------------------------------------------")
except Exception as e:
print(f"Global Agent Executor encountered an unexpected error: {e}")
print("\n" + "="*80 + "\n")
# You can still use the graph visualization:
# from IPython.display import Image, display
# try:
# display(Image(app.get_graph().draw_mermaid_png()))
# print("Graph visualization generated (if graphviz is installed and path configured).")
# except Exception as e:
# print(f"Could not generate graph visualization: {e}")
# print("Ensure `pip install pygraphviz graphviz` and Graphviz binaries are in PATH.")
Code Explanation:
- `AgentState` with `error_occurred`: We added `error_occurred: Annotated[bool, operator.or_]` to our `GraphState`. `operator.or_` ensures that if `error_occurred` is set to `True` by any node, it remains `True` across state updates for that step. We initialize it to `False` for each new query.
- `risky_calculator` Tool: This tool now has a `random.random() < 0.3` chance to `raise ValueError("Simulated calculation error!")`, allowing us to reliably test our error handling.
- `call_llm_node` Update: We reset `error_occurred` to `False` at the beginning of this node’s execution. This is important for new turns or retries, ensuring the error flag doesn’t persist from a previous, unrelated failure.
- `call_tool_node` with `try-except`: The crucial `try-except` block is wrapped around `selected_tool.invoke(tool_input)`. If an `Exception` occurs during tool execution, we print an error message, append a `ToolMessage` indicating the error, and, most importantly, set `error_flag = True`. This flag is then returned as part of the state update.
- `error_handler_node` (New Node): This is a dedicated node that the graph routes to when an error is detected. It prints a message, generates a user-friendly error response (`AIMessage`), and then resets `error_occurred` to `False` (important if you want to allow new queries without a persistent error state).
- `route_decision` (Updated): The very first check in `route_decision` is now `if state.get("error_occurred", False): return "handle_error"`. This prioritizes routing to the error handler if a previous node (like `call_tool_node`) reported an error. The rest of the logic remains the same for routing to `tool` or `end`.
- Graph Edges with Error Routing:
  - `workflow.add_conditional_edges("llm", route_decision, {... "handle_error": "error_handler" ...})`: If an error were somehow detected immediately after an LLM call (e.g., if LLM parsing failed or if `call_llm_node` itself had an internal issue), it would route to `error_handler`.
  - `workflow.add_conditional_edges("tool", route_decision, {... "handle_error": "error_handler", "end": "llm" ...})`: This is the key error-handling logic. After `call_tool_node` runs, if `route_decision` detects that `error_occurred` is `True` in the state, it routes to `error_handler`. Otherwise, with no error and no further tool calls, it routes back to `llm` (the normal loop path).
  - `workflow.add_edge("error_handler", END)`: After an error is handled, we terminate the graph for that query. For more advanced setups, you might route back to the LLM for a retry.
- `app.invoke(..., config={"return_intermediate_steps": True})`: This configuration lets you programmatically inspect all the steps taken by the agent, which is invaluable for debugging complex flows, especially when errors occur.
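To see why `operator.or_` is the right reducer for the flag, note that it simply ORs the existing value with each incoming update, so a single `True` “sticks” for that step. A quick standalone demonstration (using `functools.reduce` to mimic how successive updates are folded together):

```python
import operator
from functools import reduce

# LangGraph applies the annotated reducer to merge state updates.
# operator.or_(old, new) keeps the flag True once any update sets it.
updates = [False, True, False]  # e.g., two healthy nodes and one failing one
merged = reduce(operator.or_, updates, False)
print(merged)  # True: one failing node is enough to trip the flag
```

This is also why `error_handler_node` and `call_llm_node` must explicitly write `error_occurred: False` when starting fresh: the OR reducer alone never turns the flag back off within a step.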
This project significantly enhances the robustness of our LangGraph agent, allowing it to gracefully respond to errors and provide a better user experience.
