
LangChain & Chain-of-Thought Reasoning: Building Smarter AI Applications

2/21/2026

Introduction

Have you ever noticed that when you face a multi-step question, like "What's 7 × 8 + 52?", it helps to show your work? The same principle applies to Large Language Models (LLMs). Instead of jumping straight to an answer, they tend to do better when prompted to think out loud, breaking the problem into intermediate reasoning steps.

This is where LangChain comes in. LangChain is a framework for building LLM applications by composing prompts, model calls, and tools into sequences of steps, which makes it a natural fit for getting a model to work through a problem methodically rather than guess. Combined with reasoning techniques like Chain-of-Thought (CoT) prompting and Chain of Code execution, LangChain helps developers build more reliable applications.

In this article, we'll explore:

  • Why LLMs need chains of reasoning
  • How to implement CoT reasoning in LangChain
  • Advanced techniques like mixing code execution with LM reasoning
  • Practical patterns you can use today

Let's dive in.

Understanding the Problem: Why Simple Prompts Fail

The Limitation of Single-Step Reasoning

Imagine asking an LLM: "If there are 5 apples on a tree, and I pick 2 per day for 3 days, how many remain?"

An LLM prompted for a direct answer might respond immediately and get it wrong. Why? Because the problem requires several computational steps (including noticing that the tree runs out of apples on day 3), and LLMs struggle with this without explicit scaffolding.

Here's the issue:

  1. Pattern matching vs. reasoning: LLMs are fundamentally pattern-matching systems, trained on text sequences. They're phenomenal at recognizing patterns but can struggle with complex logic chains.

  2. Token-level generation: LLMs generate one token at a time. A direct jump to an answer skips the intermediate states that later tokens could build on, so the model has to get the whole computation right in a single step.

  3. No error correction: Without showing steps, there's nowhere for the model to self-correct when it makes a mistake.

The Research Evidence

Research on chain-of-thought prompting (notably Wei et al., 2022) shows that when LLMs are asked to provide intermediate reasoning steps before answering, their accuracy on complex tasks improves substantially. On arithmetic and logic problems, explicit reasoning can improve accuracy by as much as 40-60% in some settings, though the size of the gain depends heavily on the model and the benchmark.

This observation led to the development of Chain-of-Thought reasoning, and frameworks like LangChain make it practical to implement these chains systematically.

What is Chain-of-Thought Reasoning?

The Core Concept

Chain-of-Thought (CoT) is a prompting technique that encourages LLMs to decompose problems into explicit intermediate steps. Instead of:

Q: If there are 5 apples, pick 2 per day for 3 days, how many remain?
A: 1

We get:

Q: If there are 5 apples, pick 2 per day for 3 days, how many remain?
A: Let me think step by step.
Day 1: Pick 2 apples. Remaining: 5 - 2 = 3
Day 2: Pick 2 apples. Remaining: 3 - 2 = 1
Day 3: Pick 2 apples. Remaining: 1 - 2 = -1 (not possible, so we stop)
Actually, on Day 3, there's only 1 apple left, so we can only pick 1.
Final answer: 0 apples remain.

The model shows its work, making reasoning transparent and more reliable.
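The arithmetic in that trace can be checked with a few lines of ordinary Python. This is a minimal sketch: the function name `remaining_after_picking` is made up for illustration, and it assumes you cannot pick more apples than are left on the tree.

```python
def remaining_after_picking(start: int, per_day: int, days: int) -> list[int]:
    """Return the number of apples left at the end of each day."""
    remaining = start
    history = []
    for _ in range(days):
        picked = min(per_day, remaining)  # cannot pick more than what is left
        remaining -= picked
        history.append(remaining)
    return history


print(remaining_after_picking(5, 2, 3))  # [3, 1, 0]
```

The per-day values match the steps in the model's trace, including the Day 3 correction.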

Why This Works

CoT works for several reasons:

  • Explicit intermediate states: Each step creates an anchor point. If reasoning derails, it's visible.
  • Reduced token distance: Instead of predicting the answer from just the question, the model predicts each step from the previous step—a simpler task.
  • Self-correction: With the steps written out, the model can revise a faulty step later in its output (as in the Day 3 correction above), and humans reading the trace can spot errors.
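In practice, the difference between the two prompting styles can be as small as one added instruction. The sketch below builds both variants as plain strings; the helper names and the exact wording are illustrative assumptions, not a fixed CoT recipe.

```python
def build_direct_prompt(question: str) -> str:
    """Ask for the answer only, with no visible reasoning."""
    return f"Q: {question}\nA:"


def build_cot_prompt(question: str) -> str:
    """Nudge the model to write out its reasoning before the final answer."""
    return f"Q: {question}\nA: Let me think step by step."


question = "If there are 5 apples, pick 2 per day for 3 days, how many remain?"
print(build_direct_prompt(question))
print()
print(build_cot_prompt(question))
```

Either string can be sent to any chat or completion model; only the second one invites intermediate steps.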

Implementing Chain-of-Thought in LangChain

Setting Up Your Environment

First, let's set up a basic LangChain project:

python
# Installation
# pip install langchain langchain-openai python-dotenv

import os

from dotenv import load_dotenv
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Load OPENAI_API_KEY from a local .env file, if one exists
load_dotenv()

# Initialize the LLM
llm = ChatOpenAI(
    model="gpt-4",
    temperature=0.7,
    api_key=os.getenv("OPENAI_API_KEY"),
)

Creating a CoT Chain

The simplest CoT implementation uses a prompt template that explicitly asks for reasoning:

python
# Define a Chain-of-Thought prompt
cot_prompt = PromptTemplate(
    input_variables=["question"],
    template="""
Answer the following question by thinking step by step.

Question: {question}

Let me work through this step by step:
1. First, I'll identify what we know:
2. Next, I'll break down the problem:
3. Then, I'll solve each part:
4. Finally, the answer is:

Answer:
"""
)

# Create a chain: prompt -> model -> plain string.
# LLMChain and .run() are deprecated in recent LangChain releases;
# the pipe syntax below is the current equivalent.
cot_chain = cot_prompt | llm | StrOutputParser()

# Use it
question = "A store has 12 items. They receive 8 more, then sell half. How many remain?"
result = cot_chain.invoke({"question": question})
print(result)

Example output (exact wording will vary between runs):

Let me work through this step by step:
1. First, I'll identify what we know:
   - Starting items: 12
   - Items received: 8
   - Items sold: half of the total after receiving

2. Next, I'll break down the problem:
   - After receiving: 12 + 8 = 20 items
   - After selling: 20 / 2 = 10 items sold
   - Remaining: 20 - 10 = 10 items

3. Then, I'll solve each part:
   This is a multi-step arithmetic problem requiring addition, division, and subtraction.

4. Finally, the answer is:
   10 items remain in the store.

Answer: 10 items

Notice how the model naturally fills in the step-by-step reasoning when prompted.
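Because the template ends with an `Answer:` line, the final answer can usually be pulled out of the generated text with a small parser. The sketch below is one way to do that; `extract_final_answer` is a hypothetical helper, and it assumes the model actually follows the template.

```python
import re
from typing import Optional


def extract_final_answer(text: str) -> Optional[str]:
    """Return the text after the last 'Answer:' label, or None if there is none."""
    matches = re.findall(r"^[ \t]*Answer:[ \t]*(.+)$", text, flags=re.MULTILINE)
    return matches[-1].strip() if matches else None


sample_output = """4. Finally, the answer is:
   10 items remain in the store.

Answer: 10 items"""

print(extract_final_answer(sample_output))  # 10 items
```

Returning `None` when the label is missing lets the caller decide whether to retry or fall back to the full text.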

Advanced: Chain of Code

Moving Beyond Natural Language Reasoning

Natural language reasoning is excellent for qualitative problems, but what about tasks requiring precise counting or symbolic computation?

Researchers discovered something powerful: when you ask LLMs to write executable code alongside natural language reasoning, performance improves dramatically. This is called Chain of Code reasoning.

The idea:

  1. LM generates pseudocode with explicit logic
  2. Interpreter executes standard Python operations
  3. When the interpreter encounters undefined functions, the LM emulates them

Here's a concrete example:

python
# Question: "Which countries are these cities in?"
# Problem: The model might hallucinate country associations

# Chain of Code approach:
code = """
places = ["Mumbai", "Shanghai", "Cairo", "Paris"]
countries = set()

for place in places:
    country = get_country(place)  # LM will emulate this
    countries.add(country)

answer = list(countries)
"""

# The interpreter executes the loop and set operations (precise)
# The LM provides the get_country() function outputs (semantic knowledge)
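To see this division of labour without calling a model, the snippet can be executed with a stand-in for the LM. In the sketch below, `fake_get_country` and its lookup table are hypothetical placeholders for the LM's semantic knowledge; the interpreter still handles the loop and the set operations.

```python
code = """
places = ["Mumbai", "Shanghai", "Cairo", "Paris"]
countries = set()

for place in places:
    country = get_country(place)  # answered by the stand-in below
    countries.add(country)

answer = sorted(countries)
"""

# Hypothetical stand-in for the LM: a fixed lookup instead of a model call
KNOWN_COUNTRIES = {
    "Mumbai": "India",
    "Shanghai": "China",
    "Cairo": "Egypt",
    "Paris": "France",
}


def fake_get_country(place: str) -> str:
    return KNOWN_COUNTRIES[place]


namespace = {"get_country": fake_get_country}
exec(code, namespace)  # the interpreter runs the loop; the stub answers get_country()
print(namespace["answer"])  # ['China', 'Egypt', 'France', 'India']
```

Swapping the stub for a function that queries an LLM changes where the knowledge comes from, not how the surrounding code runs.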

Implementing Chain of Code in LangChain

python
import re
from functools import partial

from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI


class ChainOfCodeExecutor:
    def __init__(self, llm, max_retries=5):
        self.llm = llm
        self.max_retries = max_retries

    def get_lm_emulation(self, function_name, *args):
        """Ask the LM to emulate an undefined function."""
        prompt = f"""
        What should the function '{function_name}' return for these inputs: {args}?
        Provide only the return value, no explanation.
        """
        return self.llm.invoke(prompt).content.strip()

    def execute_chain_of_code(self, question):
        """Execute a Chain of Code reasoning chain."""

        # Step 1: Prompt the LM to generate code
        code_generation_prompt = PromptTemplate(
            input_variables=["question"],
            template="""
            Write Python code to solve this question.
            Use standard Python operations where possible.
            For semantic tasks (like getting a country name),
            use helper functions like get_country(place).
            Store the final result in a variable named answer.

            Question: {question}

            Code:
            """
        )

        generated_code = self.llm.invoke(
            code_generation_prompt.format(question=question)
        ).content

        # Models often wrap code in Markdown fences; drop those lines before exec
        generated_code = "\n".join(
            line for line in generated_code.splitlines()
            if not line.strip().startswith("```")
        )

        print("Generated Code:")
        print(generated_code)

        # Step 2: Execute the code, emulating undefined functions with the LM.
        # partial() binds the function name so that get_country("Mumbai")
        # becomes get_lm_emulation("get_country", "Mumbai").
        # WARNING: exec() runs model-written code; sandbox it in real use.
        execution_globals = {
            'get_country': partial(self.get_lm_emulation, 'get_country')
        }

        for _ in range(self.max_retries):
            try:
                exec(generated_code, execution_globals)
                return execution_globals.get('answer', 'No answer found')
            except NameError as e:
                match = re.search(r"name '(\w+)' is not defined", str(e))
                if not match:
                    raise
                name = match.group(1)
                print(f"Undefined function encountered: {name}")
                # Register an LM-backed emulation for the missing name and retry
                execution_globals[name] = partial(self.get_lm_emulation, name)

        return None

# Usage
llm = ChatOpenAI(model="gpt-4", temperature=0)
executor = ChainOfCodeExecutor(llm)

question = "What are the countries of these cities: Mumbai, Tokyo, Cairo?"
result = executor.execute_chain_of_code(question)
print(f"Final Answer: {result}")

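The Chain of Code authors report a gain of about 12% over pure CoT on the BIG-Bench Hard benchmark. The simplified executor above is not that implementation, but it follows the same idea of separating: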

  • Symbolic operations (code execution)
  • Semantic knowledge (LM emulation)
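One way to make that separation explicit is to inspect the generated code before running it and list the function calls that plain Python cannot resolve. The sketch below uses the standard `ast` module; `find_undefined_calls` is a hypothetical helper, it only looks at simple `name(...)` calls, and it treats `def`, `class`, and `import` statements as the only sources of definitions.

```python
import ast
import builtins


def find_undefined_calls(source: str) -> set[str]:
    """Return names called as functions that are not defined, imported, or builtins."""
    tree = ast.parse(source)
    defined = set(dir(builtins))
    called = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            called.add(node.func.id)
    return called - defined


generated = """
countries = set()
for place in ["Mumbai", "Cairo"]:
    countries.add(get_country(place))
answer = len(countries)
"""

print(find_undefined_calls(generated))  # {'get_country'}
```

Everything in the returned set is a candidate for LM emulation; everything else is left to the interpreter.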

Tree of Thought: Going Beyond Linear Chains

The Limitation of Single Chains

Chain-of-Thought generates one reasoning path. But what if that path leads to a dead end? What if there are multiple valid solution approaches, and we want the best one?

Tree of Thought (ToT) extends this idea: instead of a single chain, explore a tree of reasoning paths and select the best one.

              Question
                  |
        ___________+___________
       /           |           \
    Path A       Path B       Path C
     / \          / \          / \
   A1  A2       B1  B2       C1  C2
   |   |        |   |        |   |
  Dead Good    Dead Dead    Good Okay
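Selecting the best path from such a tree amounts to scoring the leaves and keeping the highest one. The sketch below hard-codes the tree from the diagram; the numeric scores are invented purely for illustration (in a real system an evaluator, often the LLM itself, would produce them), and `best_leaf` is a hypothetical helper.

```python
# Leaf labels follow the diagram; the scores are made-up placeholders
tree = {
    "Path A": {"A1": 0.0, "A2": 0.9},  # Dead, Good
    "Path B": {"B1": 0.0, "B2": 0.0},  # Dead, Dead
    "Path C": {"C1": 0.8, "C2": 0.6},  # Good, Okay
}


def best_leaf(tree: dict) -> tuple:
    """Return (path, leaf, score) for the highest-scoring leaf."""
    candidates = [
        (score, path, leaf)
        for path, leaves in tree.items()
        for leaf, score in leaves.items()
    ]
    score, path, leaf = max(candidates)
    return path, leaf, score


# Paths whose every leaf is a dead end could be pruned entirely
dead_paths = [path for path, leaves in tree.items() if max(leaves.values()) == 0.0]

print(best_leaf(tree))  # ('Path A', 'A2', 0.9)
print(dead_paths)       # ['Path B']
```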

Implementing a Simple ToT in LangChain

python
from typing import List, Tuple

from langchain_openai import ChatOpenAI


class TreeOfThought:
    def __init__(self, llm, max_depth=2, branching_factor=3):
        self.llm = llm
        self.max_depth = max_depth
        self.branching_factor = branching_factor

    def _ask(self, prompt: str) -> str:
        """Send a prompt to the chat model and return the text of its reply."""
        return self.llm.invoke(prompt).content

    def generate_next_steps(self, question: str, current_reasoning: str) -> List[str]:
        """Generate multiple possible next reasoning steps."""
        prompt = f"""
        Question: {question}

        Current reasoning so far:
        {current_reasoning}

        Generate {self.branching_factor} different ways to continue reasoning.
        Format as a numbered list (1., 2., 3., etc.)
        """

        response = self._ask(prompt)

        # Keep only the numbered lines ("1. ...", "2. ..."), ignoring indentation
        steps = [
            line.strip()
            for line in response.split('\n')
            if line.strip() and line.strip()[0].isdigit()
        ]
        return steps[:self.branching_factor]

    def evaluate_step(self, reasoning: str) -> float:
        """Evaluate how promising a reasoning step is (0-1 score)."""
        prompt = f"""
        Evaluate how sound this reasoning step is on a scale of 0-1.
        Consider clarity, logical validity, and progress toward a solution.

        Reasoning: {reasoning}

        Score (just a number between 0 and 1):
        """

        response = self._ask(prompt)
        try:
            score = float(response.strip())
        except ValueError:
            return 0.5  # neutral fallback when the reply is not a bare number
        return min(max(score, 0.0), 1.0)  # clamp to the requested range

    def explore_tree(self, question: str, depth: int = 0,
                     current_reasoning: str = "") -> Tuple[str, float]:
        """Recursively explore the reasoning tree."""

        if depth >= self.max_depth:
            # Leaf node: generate final answer
            prompt = f"""
            Based on this reasoning:
            {current_reasoning}

            Question: {question}

            Provide your final answer:
            """
            answer = self._ask(prompt)
            score = self.evaluate_step(answer)
            return answer, score

        # Generate next possible steps
        next_steps = self.generate_next_steps(question, current_reasoning)

        best_answer = ""
        best_score = 0.0

        for step in next_steps:
            # Evaluate this step
            step_score = self.evaluate_step(step)

            # Only explore promising branches (pruning)
            if step_score > 0.3:
                new_reasoning = current_reasoning + "\n" + step
                answer, answer_score = self.explore_tree(
                    question, depth + 1, new_reasoning
                )

                # Keep the first answer found even if it scored 0.0
                if answer_score > best_score or not best_answer:
                    best_score = answer_score
                    best_answer = answer

        return best_answer, best_score

# Usage
llm = ChatOpenAI(model="gpt-4", temperature=0.7)
tot = TreeOfThought(llm, max_depth=2, branching_factor=2)

question = "How would you design a system to detect fake news?"
answer, confidence = tot.explore_tree(question)

print(f"Answer:\n{answer}")
print(f"Confidence: {confidence:.2f}")

The Tree of Thought approach is particularly powerful for:

  • Creative problem solving (multiple valid approaches)
  • Complex planning (different strategies)
  • Diagnosis and troubleshooting (exploring multiple hypotheses)
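That extra exploration has a price in model calls. As a rough worst-case estimate, assume every node yields exactly `branching_factor` steps and no branch is pruned. An implementation shaped like the `TreeOfThought` class above then makes one generation call plus one evaluation call per child at each internal node, and two calls (answer plus evaluation) at each leaf. The helper name `worst_case_llm_calls` is an assumption for illustration.

```python
def worst_case_llm_calls(max_depth: int, branching_factor: int) -> int:
    """Upper bound on LLM calls for a full, unpruned tree."""
    # Nodes at levels 0 .. max_depth - 1 each expand into branching_factor children
    internal_nodes = sum(branching_factor ** level for level in range(max_depth))
    leaves = branching_factor ** max_depth
    return internal_nodes * (1 + branching_factor) + leaves * 2


print(worst_case_llm_calls(max_depth=2, branching_factor=2))  # 17
print(worst_case_llm_calls(max_depth=3, branching_factor=3))  # 106
```

Pruning lowers the real number, but the growth with depth and branching factor is worth budgeting for before raising either setting.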

Understanding CoT Limitations

Is CoT Real Reasoning or Pattern Matching?

An important caveat: recent research suggests that CoT might be less genuine reasoning than we thought. Studies show that:

  1. Distribution matters: CoT works exceptionally well on in-distribution problems but can struggle on out-of-distribution examples.

  2. Memorization vs. computation: Some apparent "reasoning" might be pattern matching rather than logical deduction.

  3. Confidence gaps: Models can produce confident-sounding reasoning that's actually incorrect.

This doesn't diminish CoT's practical value, but it means CoT outputs should be checked against an external verifier whenever one is available:

python
# ⚠️ Best Practice: Validate CoT outputs with external verification

from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

class ValidatedCoT:
    def __init__(self, llm):
        self.llm = llm

    def reason_with_verification(self, question: str,
                                 verification_fn=None) -> dict:
        """Generate CoT reasoning and verify the answer."""

        # Step 1: Generate reasoning
        reasoning_prompt = PromptTemplate(
            input_variables=["question"],
            template="""
            Question: {question}

            Let me think through this step by step:
            """
        )

        reasoning = self.llm.invoke(
            reasoning_prompt.format(question=question)
        ).content

        # Step 2: Extract final answer
        answer_prompt = PromptTemplate(
            input_variables=["reasoning"],
            template="""
            Based on this reasoning:
            {reasoning}

            What is the final answer? (Just the answer, no explanation):
            """
        )

        answer = self.llm.invoke(
            answer_prompt.format(reasoning=reasoning)
        ).content.strip()

        # Step 3: Verify if verification function provided
        verified = True
        verification_result = "No verification provided"

        if verification_fn:
            verified = verification_fn(answer)
            verification_result = "Verified" if verified else "Verification failed"

        # NOTE: the source text is cut off after "Verifie"; the rest of this
        # method is a reconstruction based on the declared dict return type.
        return {
            "reasoning": reasoning,
            "answer": answer,
            "verified": verified,
            "verification_result": verification_result,
        }
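A `verification_fn` receives the extracted answer string and returns a boolean. For numeric questions, one simple option is to compare the first number in the answer against a value computed independently. The factory below is a hypothetical sketch and not part of LangChain; it assumes answers contain plain digits without thousands separators.

```python
import re
from typing import Callable


def make_numeric_verifier(expected: float, tolerance: float = 1e-9) -> Callable[[str], bool]:
    """Build a verifier that checks the first number found in the answer text."""
    def verify(answer: str) -> bool:
        match = re.search(r"-?\d+(?:\.\d+)?", answer)
        if match is None:
            return False  # no number to check
        return abs(float(match.group()) - expected) <= tolerance
    return verify


# Independent computation for the store example: (12 + 8) / 2
verify_store = make_numeric_verifier((12 + 8) / 2)
print(verify_store("10 items"))  # True
print(verify_store("12 items"))  # False
```

The returned function can be passed as the `verification_fn` argument of `reason_with_verification`.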

Chalamaiah Chinnam

AI Engineer & Senior Software Engineer

15+ years of enterprise software experience, specializing in applied AI systems, multi-agent architectures, and RAG pipelines. Currently building AI-powered automation at LinkedIn.