Welcome to Day 7 of #30DaysOfLangChain! We’ve covered the fundamentals of LCEL, integrated various LLMs, and even built a basic RAG chain. Today, we’re going to elevate our LCEL mastery by exploring advanced patterns that make your applications more resilient, efficient, and versatile: Parallelism and Fallbacks.

These patterns are key to moving beyond simple sequential chains and building production-grade solutions that can handle real-world challenges like API rate limits, slow responses, or the need for diverse outputs.

1. Parallelism with RunnableParallel: Doing More at Once

Imagine you need to perform multiple tasks based on a single input, and these tasks don’t depend on each other’s output. Running them sequentially would be slow. This is where parallelism shines.

  • What is it? RunnableParallel (or simply a dictionary of Runnables in LCEL) allows you to execute multiple Runnable components concurrently based on the same input. It takes a dictionary where keys become the output keys, and values are the Runnables to execute.
  • Why use it?
    • Speed: Execute independent operations simultaneously, reducing overall latency.
    • Diverse Outputs: Generate multiple types of responses from a single query (e.g., a summary, a list of keywords, a sentiment analysis) in one go.
    • Comparing Models: Run the same query through different LLMs to compare their outputs or get a “second opinion.”

Example Scenario (Conceptual): You provide a news article. In parallel, you want to:

  1. Summarize the article.
  2. Extract key entities.
  3. Categorize the article.

Instead of three separate sequential calls, RunnableParallel lets these happen at the same time.
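
Here is a minimal sketch of that scenario, assuming a local llama2 model served by Ollama (any chat model works here, and the prompts are only illustrative):

from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel

llm = ChatOllama(model="llama2")  # assumption: swap in ChatOpenAI() or any other chat model

# Three independent branches run concurrently over the same input
article_chain = RunnableParallel(
    summary=ChatPromptTemplate.from_template("Summarize this article:\n{article}") | llm | StrOutputParser(),
    entities=ChatPromptTemplate.from_template("List the key entities in this article:\n{article}") | llm | StrOutputParser(),
    category=ChatPromptTemplate.from_template("Assign a single category to this article:\n{article}") | llm | StrOutputParser(),
)

result = article_chain.invoke({"article": "..."})  # result: {"summary": "...", "entities": "...", "category": "..."}

A plain dictionary of Runnables placed inside a chain is coerced to a RunnableParallel automatically, so the explicit class is optional.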

2. Fallbacks with .with_fallbacks(): Ensuring Reliability

In the real world, APIs can fail, LLMs can hit rate limits, or specific models might be temporarily unavailable. A robust application shouldn’t crash; it should have a backup plan. That’s what fallbacks provide.

  • What is it? The .with_fallbacks() method is a powerful feature available on any Runnable. You chain it to a Runnable (your primary option) and provide one or more fallback Runnables. If the primary Runnable fails (e.g., throws an exception, hits a timeout), LangChain automatically tries the next fallback in the list.
  • Why use it?
    • Resilience: Your application continues to function even if a primary service fails.
    • Cost Optimization: You can try a cheaper, faster LLM first, and only fall back to a more expensive, robust one if the primary fails.
    • Graceful Degradation: Provide a slightly less optimal but still functional response instead of an error.

Example Scenario (Conceptual): You want to use a powerful, expensive LLM (e.g., GPT-4) as your primary. If GPT-4 hits a rate limit or has an error, you fall back to a cheaper, slightly less capable model (e.g., GPT-3.5-Turbo), or even a local Ollama model.
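
A minimal sketch of that setup (the model names are illustrative; substitute whatever you have access to):

from langchain_openai import ChatOpenAI
from langchain_ollama import ChatOllama

primary = ChatOpenAI(model="gpt-4")           # powerful but pricier, may hit rate limits
backup = ChatOpenAI(model="gpt-3.5-turbo")    # cheaper hosted fallback
local = ChatOllama(model="llama2")            # last-resort local fallback

# Fallbacks are tried in order: primary, then backup, then local
resilient_llm = primary.with_fallbacks([backup, local])
print(resilient_llm.invoke("What is the capital of France?").content)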

For more details, check out the official LangChain documentation.


Project: LCEL Chains with Parallelism and Fallbacks

We’ll build a script that demonstrates both concepts. For parallelism, we’ll ask two different LLMs (or the same LLM with different instructions) to respond concurrently. For fallbacks, we’ll set up a primary and a secondary LLM, showing how the system gracefully handles failure.

Before you run the code:

  • Ensure Ollama is installed and running (ollama serve).
  • Pull any necessary Ollama models (e.g., llama2, mistral).
  • Ensure your OPENAI_API_KEY is set if using OpenAI models.
import os
import time
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI
from langchain_ollama import ChatOllama
from langchain_core.runnables import RunnableParallel
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# --- Configuration ---
# LLM Provider for primary and fallback LLMs (you can mix and match)
# Set these in your .env file:
# PRIMARY_LLM_PROVIDER=openai
# FALLBACK_LLM_PROVIDER=ollama
# OLLAMA_MODEL_CHAT=llama2
PRIMARY_LLM_PROVIDER = os.getenv("PRIMARY_LLM_PROVIDER", "openai").lower()
FALLBACK_LLM_PROVIDER = os.getenv("FALLBACK_LLM_PROVIDER", "ollama").lower()
OLLAMA_MODEL_CHAT = os.getenv("OLLAMA_MODEL_CHAT", "llama2").lower()

# --- Initialize LLMs ---
def initialize_llm(provider, model_name=None, temp=0.7, timeout=None):
    if provider == "openai":
        if not os.getenv("OPENAI_API_KEY"):
            raise ValueError("OPENAI_API_KEY not set for OpenAI provider.")
        return ChatOpenAI(model=model_name or "gpt-3.5-turbo", temperature=temp, request_timeout=timeout)
    elif provider == "ollama":
        try:
            llm = ChatOllama(model=model_name or OLLAMA_MODEL_CHAT, temperature=temp)
            # Test connection (optional, but good for debugging)
            llm.invoke("Hello!")
            return llm
        except Exception as e:
            print(f"Error connecting to Ollama LLM or model '{model_name or OLLAMA_MODEL_CHAT}' not found: {e}")
            print("Please ensure Ollama is running and the specified model is pulled.")
            exit()
    else:
        raise ValueError(f"Invalid LLM provider: {provider}. Must be 'openai' or 'ollama'.")

# --- Example 1: Parallelism ---
print("--- Example 1: Parallelism with RunnableParallel ---")

# Define two different LLM instances or instructions
llm_fast = initialize_llm(PRIMARY_LLM_PROVIDER, temp=0.5, model_name="gpt-3.5-turbo" if PRIMARY_LLM_PROVIDER == "openai" else OLLAMA_MODEL_CHAT)
llm_creative = initialize_llm(PRIMARY_LLM_PROVIDER, temp=0.9, model_name="gpt-3.5-turbo" if PRIMARY_LLM_PROVIDER == "openai" else OLLAMA_MODEL_CHAT) # Can be same model, just different temp

# Define prompts for different outputs
summary_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise summarizer. Summarize the following text briefly."),
    ("user", "{text}")
])

keywords_prompt = ChatPromptTemplate.from_messages([
    ("system", "Extract 3-5 keywords from the following text, separated by commas."),
    ("user", "{text}")
])

# Create parallel chains
parallel_chain = RunnableParallel(
    summary=summary_prompt | llm_fast | StrOutputParser(),
    keywords=keywords_prompt | llm_creative | StrOutputParser()
)

# Invoke the parallel chain
input_text = "LangChain is a framework for developing applications powered by language models. It enables chaining together different components to create more complex use cases around LLMs. This includes components for prompt management, LLMs, chat models, output parsers, retrievers, document loaders, and more. It emphasizes composability and supports the LangChain Expression Language (LCEL) for building flexible and robust chains."

print(f"\nInput Text: {input_text[:100]}...\n")
print("Running parallel chain...")
start_time = time.time()
parallel_output = parallel_chain.invoke({"text": input_text})
end_time = time.time()

print(f"Parallel execution took: {end_time - start_time:.2f} seconds")
print(f"Summary: {parallel_output['summary']}")
print(f"Keywords: {parallel_output['keywords']}\n")

# --- Example 2: Fallbacks ---
print("--- Example 2: Fallbacks with .with_fallbacks() ---")

# Define primary LLM (could be more expensive/prone to rate limits)
# We simulate failure by setting a very short timeout (you could also point the primary at a non-existent model)
# Note: initialize_llm only passes the timeout to ChatOpenAI, so this simulated
# failure only kicks in when PRIMARY_LLM_PROVIDER is "openai".
primary_llm = initialize_llm(PRIMARY_LLM_PROVIDER, temp=0.7, model_name="gpt-3.5-turbo" if PRIMARY_LLM_PROVIDER == "openai" else OLLAMA_MODEL_CHAT, timeout=0.01)  # Simulate failure with a tiny timeout

# Define fallback LLM (could be cheaper/local/more reliable)
fallback_llm = initialize_llm(FALLBACK_LLM_PROVIDER, temp=0.7, model_name="gpt-3.5-turbo-0125" if FALLBACK_LLM_PROVIDER == "openai" else OLLAMA_MODEL_CHAT)

# Define a simple prompt
simple_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("user", "{question}")
])

# Create a chain with fallbacks
# The primary_llm will be tried first. If it fails, fallback_llm is used.
fallback_chain = (
    simple_prompt
    | primary_llm.with_fallbacks([fallback_llm])
    | StrOutputParser()
)

question_fallback = "What is the capital of France?"
print(f"Question for fallback: '{question_fallback}'")
print(f"Attempting to use primary LLM ({PRIMARY_LLM_PROVIDER}) first, falling back to ({FALLBACK_LLM_PROVIDER}) if needed...")

start_time_fallback = time.time()
try:
    response_fallback = fallback_chain.invoke({"question": question_fallback})
    print(f"Response: {response_fallback}")
except Exception as e:
    print(f"Fallback chain failed completely: {e}") # Should not happen if fallback is robust
finally:
    end_time_fallback = time.time()
    print(f"Fallback execution took: {end_time_fallback - start_time_fallback:.2f} seconds")


Code Explanation:

  1. Configurable LLMs: We reuse the flexible LLM initialization logic from Day 3, allowing you to easily switch between OpenAI and Ollama for both primary and fallback roles.
  2. RunnableParallel (Example 1):
    • We define two different ChatPromptTemplate instances, one for summarization and one for keyword extraction.
    • RunnableParallel(summary=..., keywords=...) creates a dictionary-like structure. When parallel_chain.invoke({"text": input_text}) is called, both summary and keywords chains run concurrently using the same input_text.
    • The output is a dictionary {'summary': '...', 'keywords': '...'}.
  3. .with_fallbacks() (Example 2):
    • We define primary_llm and fallback_llm.
    • Simulating Failure: For demonstration, primary_llm is initialized with request_timeout=0.01. Note that this only takes effect when the primary provider is OpenAI, since the Ollama branch of initialize_llm ignores the timeout argument. In a real scenario the timeout would be higher, and you would rely on actual API errors (rate limits, network issues) to trigger the fallback.
    • primary_llm.with_fallbacks([fallback_llm]): This is the key. LangChain attempts to run primary_llm; if it raises an exception (such as a timeout or API error), it automatically tries fallback_llm instead. (A chain-level variant is sketched just after this list.)
    • Because the primary is set up to time out almost immediately, the fact that you still get an answer, together with the elapsed time printed at the end, shows that the fallback handled the request.
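
One more pattern worth knowing: because every LCEL chain is itself a Runnable, .with_fallbacks() can wrap an entire prompt | model | parser pipeline, not just the model, so the fallback path could even use a different prompt. A minimal sketch reusing the variables defined in the script above:

# Chain-level fallback: if anything in the primary pipeline raises an exception,
# the whole fallback pipeline runs instead.
robust_chain = (simple_prompt | primary_llm | StrOutputParser()).with_fallbacks(
    [simple_prompt | fallback_llm | StrOutputParser()]
)
print(robust_chain.invoke({"question": "What is the capital of France?"}))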

This project showcases how LCEL’s advanced patterns empower you to build more sophisticated, efficient, and robust LLM applications.


I’m Arpan

I’m a Software Engineer driven by curiosity and a deep interest in Generative AI Technologies. I believe we’re standing at the frontier of a new era—where machines not only learn but create, and I’m excited to explore what’s possible at this intersection of intelligence and imagination.

When I’m not writing code or experimenting with new AI models, you’ll probably find me travelling, soaking in new cultures, or reading a book that challenges how I think. I thrive on new ideas—especially ones that can be turned into meaningful, impactful projects. If it’s bold, innovative, and GenAI-related, I’m all in.

“The future belongs to those who believe in the beauty of their dreams.” – Eleanor Roosevelt

“Imagination is more important than knowledge. For knowledge is limited, whereas imagination embraces the entire world.” – Albert Einstein

This blog, MLVector, is my space to share technical insights, project breakdowns, and explorations in GenAI—from the models shaping tomorrow to the code powering today.

Let’s build the future, one vector at a time.

Let’s connect