LangChain & Chain-of-Thought Reasoning: Building Smarter AI Applications
Introduction
Have you ever noticed that when you ask a complex question—like "What's 7 × 8 + 52?"—it helps to show your work? The same principle applies to Large Language Models (LLMs). Instead of jumping straight to an answer, modern AI systems are learning to think out loud, breaking problems into intermediate reasoning steps.
This is where LangChain comes in. LangChain is a framework for building applications with LLMs by orchestrating sequences of reasoning steps—essentially teaching AI to work through problems methodically rather than guessing. Combined with advanced reasoning techniques like Chain-of-Thought (CoT) prompting and Chain-of-Code execution, LangChain enables developers to build remarkably intelligent applications.
In this article, we'll explore:
- Why LLMs need chains of reasoning
- How to implement CoT reasoning in LangChain
- Advanced techniques like mixing code execution with LM reasoning
- Practical patterns you can use today
Let's dive in.
Understanding the Problem: Why Simple Prompts Fail
The Limitation of Single-Step Reasoning
Imagine asking an LLM: "If there are 5 apples on a tree, and I pick 2 per day for 3 days, how many remain?"
A naive LLM might respond immediately with an answer—and likely get it wrong. Why? Because arithmetic reasoning requires multiple computational steps, and LLMs struggle with this without explicit scaffolding.
Here's the issue:
- Pattern matching vs. reasoning: LLMs are fundamentally pattern-matching systems, trained on text sequences. They're phenomenal at recognizing patterns but can struggle with complex logic chains.
- Token-level generation: LLMs generate one token at a time. A direct jump to an answer skips the intermediate states where logical errors can compound.
- No error correction: Without showing its steps, the model has no opportunity to self-correct when it makes a mistake.
The Research Evidence
Research on prompting (notably Wei et al., 2022) shows that when LLMs are asked to provide intermediate reasoning steps before answering, their accuracy on complex tasks improves dramatically. On arithmetic and logic benchmarks, explicit reasoning has been reported to lift accuracy by tens of percentage points for large models.
This observation led to the development of Chain-of-Thought reasoning—and by extension, frameworks like LangChain to implement these chains systematically.
What is Chain-of-Thought Reasoning?
The Core Concept
Chain-of-Thought (CoT) is a prompting technique that encourages LLMs to decompose problems into explicit intermediate steps. Instead of:
Q: If there are 5 apples, pick 2 per day for 3 days, how many remain?
A: 1
We get:
Q: If there are 5 apples, pick 2 per day for 3 days, how many remain?
A: Let me think step by step.
Day 1: Pick 2 apples. Remaining: 5 - 2 = 3
Day 2: Pick 2 apples. Remaining: 3 - 2 = 1
Day 3: Pick 2 apples. Remaining: 1 - 2 = -1 (not possible, so we stop)
Actually, on Day 3, there's only 1 apple left, so we can only pick 1.
Final answer: 0 apples remain.
The model shows its work, making reasoning transparent and more reliable.
Why This Works
CoT works for several reasons:
- Explicit intermediate states: Each step creates an anchor point. If reasoning derails, it's visible.
- Reduced token distance: Instead of predicting the answer from just the question, the model predicts each step from the previous step—a simpler task.
- Self-correction: Humans reading the steps can spot errors and the model can learn to avoid them.
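These mechanisms can be exploited even without worked examples: zero-shot CoT simply appends a trigger phrase to the question. A minimal sketch (the trigger wording is a common convention, not a LangChain API; no model call is made here):

```python
def make_cot_prompt(question: str) -> str:
    """Wrap a question in a zero-shot Chain-of-Thought prompt.

    The trailing trigger phrase nudges the model to emit intermediate
    reasoning steps before committing to a final answer.
    """
    return f"Q: {question}\nA: Let's think step by step."


prompt = make_cot_prompt(
    "If there are 5 apples, pick 2 per day for 3 days, how many remain?"
)
print(prompt)
```

The same prompt string can then be sent to any chat model; the rest of this article shows how to package this pattern with LangChain's prompt templates.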
Implementing Chain-of-Thought in LangChain
Setting Up Your Environment
First, let's set up a basic LangChain project:
```python
# Installation:
#   pip install langchain langchain-openai python-dotenv

import os

from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Initialize the LLM
llm = ChatOpenAI(
    model="gpt-4",
    temperature=0.7,
    api_key=os.getenv("OPENAI_API_KEY"),
)
```
Creating a CoT Chain
The simplest CoT implementation uses a prompt template that explicitly asks for reasoning:
```python
# Define a Chain-of-Thought prompt
cot_prompt = PromptTemplate(
    input_variables=["question"],
    template="""
Answer the following question by thinking step by step.

Question: {question}

Let me work through this step by step:
1. First, I'll identify what we know:
2. Next, I'll break down the problem:
3. Then, I'll solve each part:
4. Finally, the answer is:

Answer:
""",
)

# Create a chain
cot_chain = LLMChain(llm=llm, prompt=cot_prompt)

# Use it
question = "A store has 12 items. They receive 8 more, then sell half. How many remain?"
result = cot_chain.run(question=question)
print(result)
```
Output:
Let me work through this step by step:
1. First, I'll identify what we know:
- Starting items: 12
- Items received: 8
- Items sold: half of the total after receiving
2. Next, I'll break down the problem:
- After receiving: 12 + 8 = 20 items
- After selling: 20 / 2 = 10 items sold
- Remaining: 20 - 10 = 10 items
3. Then, I'll solve each part:
This is a multi-step arithmetic problem requiring addition, division, and subtraction.
4. Finally, the answer is:
10 items remain in the store.
Answer: 10 items
Notice how the model naturally fills in the step-by-step reasoning when prompted.
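Because each step is explicit, the model's arithmetic can be checked mechanically. A quick sanity check of the store problem above:

```python
# Recompute the store problem from the CoT output:
# start with 12 items, receive 8 more, then sell half.
start, received = 12, 8
after_receiving = start + received   # 12 + 8 = 20
sold = after_receiving // 2          # half of 20 = 10
remaining = after_receiving - sold   # 20 - 10 = 10
print(remaining)  # → 10
```

This matches the model's answer of 10 items, confirming the reasoning chain step by step.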
Advanced: Chain of Code
Moving Beyond Natural Language Reasoning
Natural language reasoning is excellent for qualitative problems, but what about tasks requiring precise counting or symbolic computation?
Researchers discovered something powerful: when you ask LLMs to write executable code alongside natural language reasoning, performance improves dramatically. This is called Chain of Code reasoning.
The idea:
- LM generates pseudocode with explicit logic
- Interpreter executes standard Python operations
- When the interpreter encounters undefined functions, the LM emulates them
Here's a concrete example:
```python
# Question: "Which countries are these cities in?"
# Problem: the model might hallucinate country associations.

# Chain of Code approach:
code = """
places = ["Mumbai", "Shanghai", "Cairo", "Paris"]
countries = set()
for place in places:
    country = get_country(place)  # LM will emulate this
    countries.add(country)
answer = list(countries)
"""

# The interpreter executes the loop and set operations (precise).
# The LM provides the get_country() outputs (semantic knowledge).
```
Implementing Chain of Code in LangChain
```python
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate


class ChainOfCodeExecutor:
    def __init__(self, llm):
        self.llm = llm

    def get_lm_emulation(self, function_name, *args):
        """Ask the LM to emulate an undefined function."""
        prompt = f"""
What should the function '{function_name}' return for these inputs: {args}?
Provide only the return value, no explanation.
"""
        response = self.llm.predict(prompt)
        return response.strip()

    def execute_chain_of_code(self, question):
        """Execute a Chain of Code reasoning chain."""
        # Step 1: Prompt the LM to generate code
        code_generation_prompt = PromptTemplate(
            input_variables=["question"],
            template="""
Write Python code to solve this question.
Use standard Python operations where possible.
For semantic tasks (like getting a country name), use helper
functions like get_country(place).

Question: {question}

Code:
""",
        )
        generated_code = self.llm.predict(
            code_generation_prompt.format(question=question)
        )
        print("Generated Code:")
        print(generated_code)

        # Step 2: Execute the code, routing semantic helper calls to the LM.
        # Note: get_country must be bound with its name, since
        # get_lm_emulation expects the function name as its first argument.
        execution_globals = {
            "get_country": lambda place: self.get_lm_emulation("get_country", place)
        }
        try:
            exec(generated_code, execution_globals)
            return execution_globals.get("answer", "No answer found")
        except NameError as e:
            # The generated code called a helper we haven't routed to the LM
            print(f"Undefined function encountered: {e}")
            return None


# Usage
llm = ChatOpenAI(model="gpt-4", temperature=0)
executor = ChainOfCodeExecutor(llm)

question = "What are the countries of these cities: Mumbai, Tokyo, Cairo?"
result = executor.execute_chain_of_code(question)
print(f"Final Answer: {result}")
```
In the original Chain of Code evaluation, this approach scored roughly 12 points higher than pure CoT on challenging reasoning benchmarks, because it separates:
- Symbolic operations (code execution)
- Semantic knowledge (LM emulation)
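To make this split concrete without calling a model, here is a minimal, deterministic sketch in which a fixed lookup table stands in for the LM's semantic emulation of `get_country()` (the table and helper are illustrative assumptions, not LangChain APIs); the loop and set operations run as ordinary Python:

```python
# Stand-in for LM emulation: in a real Chain of Code run, this lookup
# would be an LLM call; a fixed table keeps the example deterministic.
KNOWN_COUNTRIES = {
    "Mumbai": "India",
    "Shanghai": "China",
    "Cairo": "Egypt",
    "Paris": "France",
}


def get_country(place: str) -> str:
    """Semantic helper the interpreter cannot compute; the 'LM' answers."""
    return KNOWN_COUNTRIES[place]


# Symbolic part: the interpreter executes this loop and set precisely.
places = ["Mumbai", "Shanghai", "Cairo", "Paris"]
countries = sorted({get_country(p) for p in places})
print(countries)  # → ['China', 'Egypt', 'France', 'India']
```

The precision comes from the interpreter (the deduplication and ordering cannot be hallucinated), while the world knowledge comes from the emulated helper.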
Tree of Thought: Going Beyond Linear Chains
The Limitation of Single Chains
Chain-of-Thought generates one reasoning path. But what if that path leads to a dead end? What if there are multiple valid solution approaches, and we want the best one?
Tree of Thought (ToT) extends this idea: instead of a single chain, explore a tree of reasoning paths and select the best one.
```
                Question
                   |
        +----------+----------+
        |          |          |
     Path A     Path B     Path C
      /  \       /  \       /  \
    A1    A2   B1    B2   C1    C2
     |    |     |    |     |    |
   Dead  Good Dead  Dead  Good  Okay
```
Implementing a Simple ToT in LangChain
```python
from typing import List, Tuple

from langchain_openai import ChatOpenAI


class TreeOfThought:
    def __init__(self, llm, max_depth=2, branching_factor=3):
        self.llm = llm
        self.max_depth = max_depth
        self.branching_factor = branching_factor

    def generate_next_steps(self, question: str, current_reasoning: str) -> List[str]:
        """Generate multiple possible next reasoning steps."""
        prompt = f"""
Question: {question}

Current reasoning so far:
{current_reasoning}

Generate {self.branching_factor} different ways to continue reasoning.
Format as a numbered list (1., 2., 3., etc.)
"""
        response = self.llm.predict(prompt)
        # Parse the response into individual steps
        steps = [
            line.strip()
            for line in response.split("\n")
            if line.strip() and line.strip()[0].isdigit()
        ]
        return steps[: self.branching_factor]

    def evaluate_step(self, reasoning: str) -> float:
        """Evaluate how promising a reasoning step is (0-1 score)."""
        prompt = f"""
Evaluate how sound this reasoning step is on a scale of 0-1.
Consider clarity, logical validity, and progress toward a solution.

Reasoning: {reasoning}

Score (just a number between 0 and 1):
"""
        response = self.llm.predict(prompt)
        try:
            return float(response.strip())
        except ValueError:
            return 0.5

    def explore_tree(
        self, question: str, depth: int = 0, current_reasoning: str = ""
    ) -> Tuple[str, float]:
        """Recursively explore the reasoning tree."""
        if depth >= self.max_depth:
            # Leaf node: generate final answer
            prompt = f"""
Based on this reasoning:
{current_reasoning}

Question: {question}

Provide your final answer:
"""
            answer = self.llm.predict(prompt)
            score = self.evaluate_step(answer)
            return answer, score

        # Generate next possible steps
        next_steps = self.generate_next_steps(question, current_reasoning)

        best_answer = ""
        best_score = 0.0
        for step in next_steps:
            # Evaluate this step
            step_score = self.evaluate_step(step)
            # Only explore promising branches (pruning)
            if step_score > 0.3:
                new_reasoning = current_reasoning + "\n" + step
                answer, answer_score = self.explore_tree(
                    question, depth + 1, new_reasoning
                )
                if answer_score > best_score:
                    best_score = answer_score
                    best_answer = answer
        return best_answer, best_score


# Usage
llm = ChatOpenAI(model="gpt-4", temperature=0.7)
tot = TreeOfThought(llm, max_depth=2, branching_factor=2)

question = "How would you design a system to detect fake news?"
answer, confidence = tot.explore_tree(question)
print(f"Answer:\n{answer}")
print(f"Confidence: {confidence:.2f}")
```
The Tree of Thought approach is particularly powerful for:
- Creative problem solving (multiple valid approaches)
- Complex planning (different strategies)
- Diagnosis and troubleshooting (exploring multiple hypotheses)
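The search-and-prune control flow can be demonstrated without a model. In the sketch below, a hard-coded table of candidate steps and "promise" scores stands in for the LLM generator and evaluator (the node names and scores are invented for illustration); the search discards branches at or below the pruning threshold and returns the best-scoring leaf path:

```python
# Deterministic stand-in for the ToT search: where a real system would
# query an LLM for next steps and scores, we use a fixed tree.
TREE = {
    "root":   [("Path A", 0.2), ("Path B", 0.8), ("Path C", 0.6)],
    "Path B": [("B1", 0.9), ("B2", 0.4)],
    "Path C": [("C1", 0.7), ("C2", 0.5)],
}

PRUNE_THRESHOLD = 0.3


def best_path(node: str, score: float = 1.0):
    """Return (path, score) of the best leaf reachable from `node`,
    pruning children whose score falls below the threshold."""
    children = TREE.get(node)
    if not children:
        return [node], score  # leaf: keep the score that led here
    best = ([node], 0.0)
    for child, child_score in children:
        if child_score <= PRUNE_THRESHOLD:
            continue  # prune unpromising branch (e.g. Path A here)
        path, s = best_path(child, child_score)
        if s > best[1]:
            best = ([node] + path, s)
    return best


path, score = best_path("root")
print(path, score)  # → ['root', 'Path B', 'B1'] 0.9
```

Path A is never expanded (score 0.2 ≤ 0.3), mirroring how the `TreeOfThought` class above skips low-scoring steps to save LLM calls.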
Understanding CoT Limitations
Is CoT Real Reasoning or Pattern Matching?
An important caveat: recent research suggests that CoT may reflect less genuine reasoning than it appears to. Studies show that:
- Distribution matters: CoT works exceptionally well on in-distribution problems but can struggle on out-of-distribution examples.
- Memorization vs. computation: Some apparent "reasoning" may be pattern matching rather than logical deduction.
- Confidence gaps: Models can produce confident-sounding reasoning that is actually incorrect.
This doesn't diminish CoT's practical value, but it means:
```python
# ⚠️ Best Practice: Validate CoT outputs with external verification

from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate


class ValidatedCoT:
    def __init__(self, llm):
        self.llm = llm

    def reason_with_verification(self, question: str, verification_fn=None) -> dict:
        """Generate CoT reasoning and verify the answer."""
        # Step 1: Generate reasoning
        reasoning_prompt = PromptTemplate(
            input_variables=["question"],
            template="""
Question: {question}

Let me think through this step by step:
""",
        )
        reasoning = self.llm.predict(reasoning_prompt.format(question=question))

        # Step 2: Extract the final answer
        answer_prompt = PromptTemplate(
            input_variables=["reasoning"],
            template="""
Based on this reasoning:
{reasoning}

What is the final answer? (Just the answer, no explanation):
""",
        )
        answer = self.llm.predict(answer_prompt.format(reasoning=reasoning))

        # Step 3: Verify if a verification function is provided
        verified = True
        verification_result = "No verification provided"
        if verification_fn:
            verified = verification_fn(answer)
            verification_result = "Verified" if verified else "Failed verification"

        return {
            "reasoning": reasoning,
            "answer": answer,
            "verified": verified,
            "verification": verification_result,
        }
```
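A `verification_fn` can be as simple as recomputing the answer independently and comparing. Here is one sketch for numeric questions (the factory function and its name are illustrative, not part of LangChain):

```python
import re


def make_numeric_verifier(expected: float, tolerance: float = 1e-6):
    """Build a verification_fn that extracts the first number from the
    model's answer string and compares it to an independently computed
    expected value."""
    def verify(answer: str) -> bool:
        match = re.search(r"-?\d+(?:\.\d+)?", answer)
        if match is None:
            return False  # no number found in the answer at all
        return abs(float(match.group()) - expected) <= tolerance
    return verify


# Independently recompute the store problem: (12 + 8) / 2 items remain.
verify = make_numeric_verifier(expected=(12 + 8) / 2)
print(verify("10 items remain"))  # → True
print(verify("12 items remain"))  # → False
```

For non-numeric tasks, the same slot can hold a retrieval check, a unit test, or a second model acting as a judge.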