ReAct in Agentic AI: Building Intelligent Agents That Think and Act
Why This Matters
Imagine asking an AI assistant to help you plan a research project. A traditional chatbot would generate a response in one shot—it thinks, then answers. But what if that assistant could think about what it needs to know, search for that information, evaluate what it found, and think again? That's the fundamental insight behind ReAct (Reasoning + Acting), a groundbreaking framework that's reshaping how we build autonomous AI agents.
ReAct demonstrates something profound: reasoning and action create a virtuous cycle. The agent reasons about what to do, takes an action in the world, observes the result, and uses that feedback to reason further. This simple but powerful pattern is becoming the foundation for everything from code assistants to robotics to knowledge-intensive search systems.
In this article, we'll explore ReAct, understand how it fits into broader agentic AI architectures, and learn practical patterns for building agents that leverage both internal reasoning and external action. Whether you're building a retrieval system, a code assistant, or a robotic controller, these patterns will shape your design decisions.
Part 1: Understanding the Reasoning-Acting Cycle
The Problem with Single-Pass Generation
Traditional language models follow a simple pattern:
- Input: User query
- Processing: Generate response
- Output: Done
This works well for many tasks, but fails when the answer requires:
- Exploration: "I need to search multiple sources"
- Refinement: "That answer doesn't look right; let me reconsider"
- Grounding: "I need to actually execute this command and see what happens"
- Adaptation: "Based on the feedback, here's my revised approach"
How ReAct Changes the Game
ReAct introduces an interleaved pattern that mirrors human problem-solving:
┌─────────────────────────────────────────────┐
│ Thought: "What should I do?" │
├─────────────────────────────────────────────┤
│ Action: Execute in environment │
├─────────────────────────────────────────────┤
│ Observation: "Here's what happened" │
├─────────────────────────────────────────────┤
│ Thought: "What does this mean?" │
├─────────────────────────────────────────────┤
│ Action: Execute next step │
└─────────────────────────────────────────────┘
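The cycle above can be sketched in a few lines of Python. The `llm` and `env` callables here are hypothetical stand-ins for a language model and an environment/tool layer, not a real API:

```python
# Minimal sketch of the ReAct loop. `llm` and `env` are hypothetical
# interfaces: llm(task, history) -> (thought, action); env(action) -> observation.
def react_loop(task, llm, env, max_steps=5):
    history = []
    for _ in range(max_steps):
        thought, action = llm(task, history)   # reason about what to do next
        history.append(("Thought", thought))
        if action == "finish":                 # the agent decides it is done
            break
        observation = env(action)              # act in the world and observe
        history.append(("Action", action))
        history.append(("Observation", observation))
    return history
```

Everything else in this article is elaboration on this loop: what goes into `history`, which actions are available, and how the agent decides to finish.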
The beauty of this cycle is that observations from real-world actions inform subsequent reasoning. The agent isn't just planning in a vacuum—it's grounding its thinking in reality.
A Concrete Example: Question Answering
Let's say you ask: "How large was OpenAI's latest funding round?"
Without ReAct (single-pass):
Model thinks and generates: "OpenAI raised $100 billion in Series C funding..."
[Often hallucinates or uses outdated training data]
With ReAct:
Thought: I need current information about OpenAI's funding.
Action: Search("OpenAI latest funding round 2024")
Observation: Found article from November 2024 about $6.5B funding at $157B valuation
Thought: This seems to be the most recent funding. I have the information needed.
Action: Answer("OpenAI's latest funding round was $6.5 billion...")
The key difference: ReAct agents retrieve real information before answering, dramatically improving accuracy.
Part 2: The Cognitive Architecture of Language Agents
To understand ReAct's place in the broader ecosystem, we need a framework for thinking about agents. Researchers at Princeton developed CoALA (Cognitive Architecture for Language Agents), which decomposes agents into four key components:
Component 1: Memory Systems
Modern agents don't just process one input at a time. They maintain different types of memory:
| Memory Type | Purpose | Example |
|---|---|---|
| Procedural | How to do things; encoded in decision logic | "How to format a search query" |
| Semantic | World knowledge; databases or embeddings | Product catalog, codebase documentation |
| Episodic | Past interactions; conversation history | "The user previously asked about X" |
| Working | Current context; what's in the prompt | Current task state, previous thoughts |
A simple retail assistant might only use working memory (the current conversation) plus semantic memory (the product database). A sophisticated multi-agent code assistant uses all four types.
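As a rough sketch, the four memory types can be modeled as plain containers. The class and method names here are illustrative assumptions, not part of CoALA itself:

```python
from dataclasses import dataclass, field

# Illustrative sketch of CoALA's four memory types as simple containers.
# `AgentMemory` and `remember_turn` are hypothetical names for this example.
@dataclass
class AgentMemory:
    procedural: dict = field(default_factory=dict)  # named skills/routines
    semantic: dict = field(default_factory=dict)    # world knowledge
    episodic: list = field(default_factory=list)    # past interactions
    working: list = field(default_factory=list)     # current prompt context

    def remember_turn(self, user_msg: str, agent_msg: str) -> None:
        """Store a finished turn episodically; keep recent turns in working memory."""
        self.episodic.append((user_msg, agent_msg))
        self.working = self.episodic[-3:]
```

In practice, semantic memory is usually a vector store rather than a dict, and working memory is whatever fits in the prompt window, but the division of labor is the same.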
Component 2: Action Space
Agents can perform two categories of actions:
Internal Actions (thinking within the model):
- Reasoning: Chain-of-thought, planning
- Retrieval: Searching memory systems
- Learning: Updating knowledge or procedures
External Actions (interacting with the world):
- Grounding: Executing code, moving robots, clicking buttons
- Dialogue: Asking humans for clarification
- API Calls: Fetching data from external systems
Component 3: Environmental Grounding
Where can actions actually execute?
- Physical Environments: Robotics with cameras and actuators
- Digital Environments: APIs, websites, software systems
- Dialogue Environments: Human-agent conversation
Each requires converting between LLM outputs (text) and environment-specific formats (pixel commands, API calls, etc.).
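A minimal sketch of that conversion, assuming the agent emits the Thought/Action/Input text format used later in this article; the `/env/...` endpoint convention is a made-up example:

```python
import json

# Illustrative: turn an LLM's text action into an environment-specific request.
# The endpoint scheme here is an assumption for the example, not a real API.
def ground_action(llm_output: str) -> dict:
    """Parse 'Action: <name>\\nInput: <json>' into a structured request."""
    lines = llm_output.strip().splitlines()
    name = lines[0].removeprefix("Action:").strip()
    payload = json.loads(lines[1].removeprefix("Input:").strip())
    return {"endpoint": f"/env/{name}", "body": payload}
```

The reverse direction matters too: observations (pixels, HTTP responses, sensor readings) must be serialized back into text the model can read.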
Component 4: Decision-Making Procedure
How does the agent decide what to do next? This can be:
- Simple: Fixed alternation (think → act → think → act)
- Complex: Learned policies that choose which action type to take
- Hierarchical: High-level planning with low-level execution
Where ReAct Sits in This Framework
Here's how ReAct compares to other agent architectures:
| Agent | Memory | Action Types | Decision Logic | Best For |
|-------|--------|--------------|----------------|----------|
| ReAct | Procedural only | Reasoning + grounding | Fixed: alternating thought/action | Robust, generalizable QA |
| Voyager | Hierarchical procedural | All four | Learned policy with code abstraction | Exploration, learning complex skills |
| SayCan | Procedural + value function | External (551 skills) | Combines LLM utility + learned value | Robotics with pre-trained skills |
| Generative Agents | Full (semantic, episodic, procedural, working) | All four | Complex learned procedures | Social simulation, multi-agent interaction |
Based on: Sumers et al., "Cognitive Architectures for Language Agents"
ReAct's strength: Its simplicity and generalizability. By alternating between reasoning and grounding without complex memory management, ReAct proves robust across diverse domains.
Part 3: Building a ReAct Agent from Scratch
Let's implement a basic ReAct agent for a question-answering task. This will help you understand the mechanics before diving into advanced variations.
```python
from typing import Optional
import json


class ReActAgent:
    """A simple ReAct agent that alternates between thinking and taking actions."""

    def __init__(self, llm_client, tools: dict):
        """
        Args:
            llm_client: Any LLM with a generate() method
            tools: Dictionary mapping tool names to callable functions
                Example: {"search": search_fn, "calculator": calc_fn}
        """
        self.llm = llm_client
        self.tools = tools
        self.max_iterations = 10
        self.step_count = 0

    def _format_thought_action_prompt(self, task: str, history: list) -> str:
        """Format the prompt for the next thought and action."""
        history_str = "\n".join(history)
        return f"""You are an AI agent that answers questions by thinking and taking actions.

Available tools: {list(self.tools.keys())}

Task: {task}

History:
{history_str}

Now, provide your next response in exactly this format:
Thought: <your reasoning about what to do>
Action: <tool_name>
Input: <json input to the tool>

Only respond with Thought, Action, and Input. Do not add anything else."""

    def _parse_action(self, response: str) -> Optional[tuple]:
        """Parse the LLM response into (thought, tool_name, input)."""
        thought = None
        action = None
        action_input = None
        for line in response.strip().split("\n"):
            if line.startswith("Thought:"):
                thought = line[len("Thought:"):].strip()
            elif line.startswith("Action:"):
                action = line[len("Action:"):].strip()
            elif line.startswith("Input:"):
                raw = line[len("Input:"):].strip()
                try:
                    action_input = json.loads(raw)
                except json.JSONDecodeError:
                    action_input = raw
        if thought and action:
            return thought, action, action_input
        return None

    def run(self, task: str) -> dict:
        """
        Run the agent to completion.

        Args:
            task: The question or task to solve

        Returns:
            Dictionary with final_answer and full trajectory
        """
        history = []
        self.step_count = 0

        while self.step_count < self.max_iterations:
            # Get next thought and action
            prompt = self._format_thought_action_prompt(task, history)
            response = self.llm.generate(prompt)

            parsed = self._parse_action(response)
            if not parsed:
                # LLM didn't follow the format; nudge it and try again
                history.append("Invalid format. Please follow Thought/Action/Input format.")
                self.step_count += 1
                continue

            thought, action, action_input = parsed
            history.append(f"Thought: {thought}")
            history.append(f"Action: {action}")

            # Check if agent is done
            if action == "Finish":
                return {
                    "final_answer": action_input,
                    "trajectory": history,
                    "steps": self.step_count,
                }

            # Execute the action
            if action not in self.tools:
                observation = (
                    f"Tool '{action}' not found. "
                    f"Available tools: {list(self.tools.keys())}"
                )
            else:
                try:
                    observation = self.tools[action](action_input)
                except Exception as e:
                    observation = f"Error executing {action}: {e}"

            history.append(f"Observation: {observation}")
            self.step_count += 1

        return {
            "final_answer": "Max iterations reached",
            "trajectory": history,
            "steps": self.step_count,
        }


# Example tools
def search(query: str) -> str:
    """Simulate searching a knowledge base."""
    knowledge_base = {
        "python": "Python is a high-level programming language.",
        "react": "ReAct is a framework for language agents with reasoning and acting.",
        "ai": "Artificial Intelligence enables machines to learn and make decisions.",
    }
    query_lower = query.lower()
    for key, value in knowledge_base.items():
        if key in query_lower:
            return value
    return "No information found for that query."


def calculator(expression: str) -> str:
    """Simple calculator tool (eval is unsafe outside a demo)."""
    try:
        return str(eval(expression))
    except Exception:
        return "Invalid expression"


# Usage example
if __name__ == "__main__":
    # Mock LLM client (replace with a real OpenAI/Anthropic client)
    class MockLLM:
        def generate(self, prompt: str) -> str:
            # In reality, this would call an LLM API. This simplified mock
            # always issues the same search action, so the demo loops
            # until max_iterations is reached.
            return (
                "Thought: I need to search for information about ReAct\n"
                "Action: search\n"
                'Input: "react framework language agents"'
            )

    agent = ReActAgent(
        llm_client=MockLLM(),
        tools={"search": search, "calculator": calculator, "Finish": lambda x: x},
    )

    result = agent.run("What is ReAct?")
    print("Final Answer:", result["final_answer"])
    print("\nTrajectory:")
    for step in result["trajectory"]:
        print(step)
```
Understanding the Flow
- Format Prompt: We ask the LLM to produce Thought → Action → Input
- Parse Response: Extract the three components from the LLM's text
- Execute Action: Call the corresponding tool with the input
- Observation: Capture what happened and add to history
- Loop: Repeat until the agent says "Finish"
Key insight: The history grows with each step, providing context for subsequent reasoning. This is what creates the virtuous cycle—each observation refines the agent's understanding.
Part 4: Agentic RAG—Where ReAct Meets Retrieval
ReAct's greatest impact has been in Retrieval-Augmented Generation (RAG). Traditional RAG retrieves documents once, then generates an answer. Agentic RAG lets agents decide when to retrieve, what to retrieve, and how to integrate multiple retrieval rounds.
The Architecture
The architecture forms a feedback loop. The agent:
- Classifies the user's intent
- Reformulates the query
- Retrieves relevant documents
- Re-ranks results
- Evaluates its confidence in a draft answer (the step traditional RAG lacks)
- If confidence is low, loops back to retrieve differently
- Only produces final answer when confident
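The steps above can be sketched as a confidence-gated loop. The tool names (`classify_intent`, `reformulate`, `retrieve`, `rerank`, `answer`) are illustrative assumptions, not a real library API:

```python
# Hedged sketch of an agentic RAG loop. Each entry in `tools` is a
# hypothetical callable; `answer` returns (draft_answer, confidence).
def agentic_rag(query, tools, max_rounds=3, threshold=0.7):
    intent = tools["classify_intent"](query)
    search_query = tools["reformulate"](query, intent)
    answer = None
    for _ in range(max_rounds):
        docs = tools["rerank"](tools["retrieve"](search_query))
        answer, confidence = tools["answer"](query, docs)
        if confidence >= threshold:   # confident enough: stop looping
            return answer
        # Low confidence: reformulate and retrieve differently next round
        search_query = tools["reformulate"](search_query, intent)
    return answer  # best effort after max_rounds
```

The key design choice is the exit condition: the loop terminates on confidence, not on a fixed number of retrievals, which is what distinguishes agentic RAG from single-shot RAG.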
Optimizing Agentic RAG with RAG-Gym
Xiong et al., in their RAG-Gym framework, identified three optimization dimensions for agentic RAG:
1. Prompt Engineering: Re2Search
```python
# Standard retrieval-based QA prompt
prompt = f"""Given context: {retrieved_docs}
Question: {query}
Answer: """

# ReAct-style reasoning reflection prompt (Re2Search)
prompt = f"""Given context: {retrieved_docs}
Question: {query}

Reason through this step by step:
1. What key concepts does the question ask about?
2. Which parts of the context are relevant?
3. Are there gaps in the retrieved context?
4. What's my confidence in this answer?
   If confidence is low, what should I search for next?

Answer: """
```
The key: Making the agent explicit about gaps and confidence dramatically improves performance.
2. Actor Tuning: Direct Preference Optimization (DPO)
Instead of just fine-tuning on correct answers, fine-tune on preferences:
Preferred answer: [high-quality, reasoning-rich response]
Rejected answer: [low-quality, hallucinated response]
Results show DPO outperforms standard fine-tuning by 3-11% on RAG tasks.
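For one preference pair, the DPO objective is -log σ(β[(log π(y_w|x) − log π_ref(y_w|x)) − (log π(y_l|x) − log π_ref(y_l|x))]), where π is the policy being tuned and π_ref a frozen reference model. A minimal numeric sketch (single pair, scalar log-probabilities; real training operates on batches of token-level log-probs):

```python
import math

# Sketch of the DPO loss for one preference pair.
# logp_w / logp_l: log-probs of preferred / rejected answers under the policy;
# ref_logp_w / ref_logp_l: the same under a frozen reference model.
def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
```

The loss shrinks as the policy assigns relatively more probability to the preferred answer than the reference does, which is what pushes the actor toward reasoning-rich responses.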
3. Critic Training: Learned Evaluation
Train a separate "critic" model to evaluate intermediate steps:
```python
class RagCritic:
    def evaluate_intermediate_step(self, query: str, retrieved_docs: list,
                                   candidate_answer: str) -> float:
        """Rate how good this answer would be (0-1)."""
        # Trained via supervised learning on human preferences
        return self.critic_model(query, retrieved_docs, candidate_answer)


def first_good_attempt(query, docs, candidate_answers, critic, threshold=0.7):
    # Use the critic to filter low-quality generation attempts
    for attempt in candidate_answers:
        if critic.evaluate_intermediate_step(query, docs, attempt) > threshold:
            return attempt
```