ReAct in Agentic AI: Building Intelligent Agents That Think and Act
Why This Matters
Imagine asking an AI assistant to help you plan a research project. A traditional chatbot would generate a response in one shot—it thinks, then answers. But what if that assistant could think about what it needs to know, search for that information, evaluate what it found, and think again? That's the fundamental insight behind ReAct (Reasoning + Acting), a groundbreaking framework that's reshaping how we build autonomous AI agents.
ReAct demonstrates something profound: reasoning and action create a virtuous cycle. The agent reasons about what to do, takes an action in the world, observes the result, and uses that feedback to reason further. This simple but powerful pattern is becoming the foundation for everything from code assistants to robotics to knowledge-intensive search systems.
In this article, we'll explore ReAct, understand how it fits into broader agentic AI architectures, and learn practical patterns for building agents that leverage both internal reasoning and external action. Whether you're building a retrieval system, a code assistant, or a robotic controller, these patterns will shape your design decisions.
Part 1: Understanding the Reasoning-Acting Cycle
The Problem with Single-Pass Generation
Traditional language models follow a simple pattern:
- Input: User query
- Processing: Generate response
- Output: Done
This works well for many tasks, but fails when the answer requires:
- Exploration: "I need to search multiple sources"
- Refinement: "That answer doesn't look right; let me reconsider"
- Grounding: "I need to actually execute this command and see what happens"
- Adaptation: "Based on the feedback, here's my revised approach"
How ReAct Changes the Game
ReAct introduces an interleaved pattern that mirrors human problem-solving:
┌─────────────────────────────────────────────┐
│ Thought: "What should I do?" │
├─────────────────────────────────────────────┤
│ Action: Execute in environment │
├─────────────────────────────────────────────┤
│ Observation: "Here's what happened" │
├─────────────────────────────────────────────┤
│ Thought: "What does this mean?" │
├─────────────────────────────────────────────┤
│ Action: Execute next step │
└─────────────────────────────────────────────┘
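The cycle above can be sketched in a few lines of Python. The `llm` and `env` callables here are hypothetical stand-ins for a language model and an environment/tool layer, not a real API:

```python
# Minimal sketch of the ReAct loop. `llm` and `env` are hypothetical
# interfaces: llm(task, history) -> (thought, action); env(action) -> observation.
def react_loop(task, llm, env, max_steps=5):
    history = []
    for _ in range(max_steps):
        thought, action = llm(task, history)   # reason about what to do next
        history.append(("Thought", thought))
        if action == "finish":                 # the agent decides it is done
            break
        observation = env(action)              # act in the world and observe
        history.append(("Action", action))
        history.append(("Observation", observation))
    return history
```

Everything else in this article is elaboration on this loop: what goes into `history`, which actions are available, and how the agent decides to finish.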
The beauty of this cycle is that observations from real-world actions inform subsequent reasoning. The agent isn't just planning in a vacuum—it's grounding its thinking in reality.
A Concrete Example: Question Answering
Let's say you ask: "How large was OpenAI's latest funding round?"
Without ReAct (single-pass):
Model thinks and generates: "OpenAI raised $100 billion in Series C funding..."
[Often hallucinates or uses outdated training data]
With ReAct:
Thought: I need current information about OpenAI's funding.
Action: Search("OpenAI latest funding round 2024")
Observation: Found article from November 2024 about $6.5B funding at $157B valuation
Thought: This seems to be the most recent funding. I have the information needed.
Action: Answer("OpenAI's latest funding round was $6.5 billion...")
The key difference: ReAct agents retrieve real information before answering, dramatically improving accuracy.
Part 2: The Cognitive Architecture of Language Agents
To understand ReAct's place in the broader ecosystem, we need a framework for thinking about agents. Researchers at Princeton developed CoALA (Cognitive Architecture for Language Agents), which decomposes agents into four key components:
Component 1: Memory Systems
Modern agents don't just process one input at a time. They maintain different types of memory:
| Memory Type | Purpose | Example |
|---|---|---|
| Procedural | How to do things; encoded in decision logic | "How to format a search query" |
| Semantic | World knowledge; databases or embeddings | Product catalog, codebase documentation |
| Episodic | Past interactions; conversation history | "The user previously asked about X" |
| Working | Current context; what's in the prompt | Current task state, previous thoughts |
A simple retail assistant might only use working memory (the current conversation) plus semantic memory (the product database). A sophisticated multi-agent code assistant uses all four types.
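As a rough sketch, the four memory types can be modeled as plain containers. The class and method names here are illustrative assumptions, not part of CoALA itself:

```python
from dataclasses import dataclass, field

# Illustrative sketch of CoALA's four memory types as simple containers.
# `AgentMemory` and `remember_turn` are hypothetical names for this example.
@dataclass
class AgentMemory:
    procedural: dict = field(default_factory=dict)  # named skills/routines
    semantic: dict = field(default_factory=dict)    # world knowledge
    episodic: list = field(default_factory=list)    # past interactions
    working: list = field(default_factory=list)     # current prompt context

    def remember_turn(self, user_msg: str, agent_msg: str) -> None:
        """Store a finished turn episodically; keep recent turns in working memory."""
        self.episodic.append((user_msg, agent_msg))
        self.working = self.episodic[-3:]
```

In practice, semantic memory is usually a vector store rather than a dict, and working memory is whatever fits in the prompt window, but the division of labor is the same.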
Component 2: Action Space
Agents can perform two categories of actions:
Internal Actions (thinking within the model):
- Reasoning: Chain-of-thought, planning
- Retrieval: Searching memory systems
- Learning: Updating knowledge or procedures
External Actions (interacting with the world):
- Grounding: Executing code, moving robots, clicking buttons
- Dialogue: Asking humans for clarification
- API Calls: Fetching data from external systems
Component 3: Environmental Grounding
Where can actions actually execute?
- Physical Environments: Robotics with cameras and actuators
- Digital Environments: APIs, websites, software systems
- Dialogue Environments: Human-agent conversation
Each requires converting between LLM outputs (text) and environment-specific formats (pixel commands, API calls, etc.).
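A minimal sketch of that conversion, assuming the agent emits the Thought/Action/Input text format used later in this article; the `/env/...` endpoint convention is a made-up example:

```python
import json

# Illustrative: turn an LLM's text action into an environment-specific request.
# The endpoint scheme here is an assumption for the example, not a real API.
def ground_action(llm_output: str) -> dict:
    """Parse 'Action: <name>\\nInput: <json>' into a structured request."""
    lines = llm_output.strip().splitlines()
    name = lines[0].removeprefix("Action:").strip()
    payload = json.loads(lines[1].removeprefix("Input:").strip())
    return {"endpoint": f"/env/{name}", "body": payload}
```

The reverse direction matters too: observations (pixels, HTTP responses, sensor readings) must be serialized back into text the model can read.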
Component 4: Decision-Making Procedure
How does the agent decide what to do next? This can be:
- Simple: Fixed alternation (think → act → think → act)
- Complex: Learned policies that choose which action type to take
- Hierarchical: High-level planning with low-level execution
Where ReAct Sits in This Framework
Here's how ReAct compares to other agent architectures:
| Agent | Memory | Action Types | Decision Logic | Best For |
|-------|--------|--------------|----------------|----------|
| ReAct | Procedural only | Reasoning + grounding | Fixed: alternating thought/action | Robust, generalizable QA |
| Voyager | Hierarchical procedural | All four | Learned policy with code abstraction | Exploration, learning complex skills |
| SayCan | Procedural + value function | External (551 skills) | Combines LLM utility + learned value | Robotics with pre-trained skills |
| Generative Agents | Full (semantic, episodic, procedural, working) | All four | Complex learned procedures | Social simulation, multi-agent interaction |
Based on: Sumers et al., "Cognitive Architectures for Language Agents"
ReAct's strength: Its simplicity and generalizability. By alternating between reasoning and grounding without complex memory management, ReAct proves robust across diverse domains.
Part 3: Building a ReAct Agent from Scratch
Let's implement a basic ReAct agent for a question-answering task. This will help you understand the mechanics before diving into advanced variations.
```python
from typing import Optional
import json


class ReActAgent:
    """A simple ReAct agent that alternates between thinking and taking actions."""

    def __init__(self, llm_client, tools: dict):
        """
        Args:
            llm_client: Any LLM with a generate() method
            tools: Dictionary mapping tool names to callable functions
                Example: {"search": search_fn, "calculator": calc_fn}
        """
        self.llm = llm_client
        self.tools = tools
        self.max_iterations = 10
        self.step_count = 0

    def _format_thought_action_prompt(self, task: str, history: list) -> str:
        """Format the prompt for the next thought and action."""
        history_str = "\n".join(history)
        return f"""You are an AI agent that answers questions by thinking and taking actions.

Available tools: {list(self.tools.keys())}

Task: {task}

History:
{history_str}

Now, provide your next response in exactly this format:
Thought: <your reasoning about what to do>
Action: <tool_name>
Input: <json input to the tool>

Only respond with Thought, Action, and Input. Do not add anything else."""

    def _parse_action(self, response: str) -> Optional[tuple]:
        """Parse the LLM response into (thought, tool_name, input)."""
        thought = None
        action = None
        action_input = None
        for line in response.strip().split("\n"):
            if line.startswith("Thought:"):
                thought = line[len("Thought:"):].strip()
            elif line.startswith("Action:"):
                action = line[len("Action:"):].strip()
            elif line.startswith("Input:"):
                raw = line[len("Input:"):].strip()
                try:
                    action_input = json.loads(raw)
                except json.JSONDecodeError:
                    action_input = raw
        if thought and action:
            return thought, action, action_input
        return None

    def run(self, task: str) -> dict:
        """
        Run the agent to completion.

        Args:
            task: The question or task to solve

        Returns:
            Dictionary with final_answer and full trajectory
        """
        history = []
        self.step_count = 0

        while self.step_count < self.max_iterations:
            # Get next thought and action
            prompt = self._format_thought_action_prompt(task, history)
            response = self.llm.generate(prompt)

            parsed = self._parse_action(response)
            if not parsed:
                # LLM didn't follow the format; nudge it and try again
                history.append("Invalid format. Please follow Thought/Action/Input format.")
                self.step_count += 1
                continue

            thought, action, action_input = parsed
            history.append(f"Thought: {thought}")
            history.append(f"Action: {action}")

            # Check if agent is done
            if action == "Finish":
                return {
                    "final_answer": action_input,
                    "trajectory": history,
                    "steps": self.step_count,
                }

            # Execute the action
            if action not in self.tools:
                observation = (
                    f"Tool '{action}' not found. "
                    f"Available tools: {list(self.tools.keys())}"
                )
            else:
                try:
                    observation = self.tools[action](action_input)
                except Exception as e:
                    observation = f"Error executing {action}: {e}"

            history.append(f"Observation: {observation}")
            self.step_count += 1

        return {
            "final_answer": "Max iterations reached",
            "trajectory": history,
            "steps": self.step_count,
        }


# Example tools
def search(query: str) -> str:
    """Simulate searching a knowledge base."""
    knowledge_base = {
        "python": "Python is a high-level programming language.",
        "react": "ReAct is a framework for language agents with reasoning and acting.",
        "ai": "Artificial Intelligence enables machines to learn and make decisions.",
    }
    query_lower = query.lower()
    for key, value in knowledge_base.items():
        if key in query_lower:
            return value
    return "No information found for that query."


def calculator(expression: str) -> str:
    """Simple calculator tool (eval is unsafe outside a demo)."""
    try:
        return str(eval(expression))
    except Exception:
        return "Invalid expression"


# Usage example
if __name__ == "__main__":
    # Mock LLM client (replace with a real OpenAI/Anthropic client)
    class MockLLM:
        def generate(self, prompt: str) -> str:
            # In reality, this would call an LLM API. This simplified mock
            # always issues the same search action, so the demo loops
            # until max_iterations is reached.
            return (
                "Thought: I need to search for information about ReAct\n"
                "Action: search\n"
                'Input: "react framework language agents"'
            )

    agent = ReActAgent(
        llm_client=MockLLM(),
        tools={"search": search, "calculator": calculator, "Finish": lambda x: x},
    )

    result = agent.run("What is ReAct?")
    print("Final Answer:", result["final_answer"])
    print("\nTrajectory:")
    for step in result["trajectory"]:
        print(step)
```
Understanding the Flow
- Format Prompt: We ask the LLM to produce Thought → Action → Input
- Parse Response: Extract the three components from the LLM's text
- Execute Action: Call the corresponding tool with the input
- Observation: Capture what happened and add to history
- Loop: Repeat until the agent says "Finish"
Key insight: The history grows with each step, providing context for subsequent reasoning. This is what creates the virtuous cycle—each observation refines the agent's understanding.
Part 4: Agentic RAG—Where ReAct Meets Retrieval
ReAct's greatest impact has been in Retrieval-Augmented Generation (RAG). Traditional RAG retrieves documents once, then generates an answer. Agentic RAG lets agents decide when to retrieve, what to retrieve, and how to integrate multiple retrieval rounds.
The Architecture
The architecture forms a feedback loop. The agent:
- Classifies the user's intent
- Reformulates the query
- Retrieves relevant documents
- Re-ranks results
- Evaluates its confidence in a draft answer (the step traditional RAG lacks)
- If confidence is low, loops back to retrieve differently
- Only produces final answer when confident
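The steps above can be sketched as a confidence-gated loop. The tool names (`classify_intent`, `reformulate`, `retrieve`, `rerank`, `answer`) are illustrative assumptions, not a real library API:

```python
# Hedged sketch of an agentic RAG loop. Each entry in `tools` is a
# hypothetical callable; `answer` returns (draft_answer, confidence).
def agentic_rag(query, tools, max_rounds=3, threshold=0.7):
    intent = tools["classify_intent"](query)
    search_query = tools["reformulate"](query, intent)
    answer = None
    for _ in range(max_rounds):
        docs = tools["rerank"](tools["retrieve"](search_query))
        answer, confidence = tools["answer"](query, docs)
        if confidence >= threshold:   # confident enough: stop looping
            return answer
        # Low confidence: reformulate and retrieve differently next round
        search_query = tools["reformulate"](search_query, intent)
    return answer  # best effort after max_rounds
```

The key design choice is the exit condition: the loop terminates on confidence, not on a fixed number of retrievals, which is what distinguishes agentic RAG from single-shot RAG.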
Optimizing Agentic RAG with RAG-Gym
Xiong et al., in their RAG-Gym framework, identified three optimization dimensions for agentic RAG:
1. Prompt Engineering: Re2Search
```python
# Standard retrieval-based QA prompt
prompt = f"""Given context: {retrieved_docs}
Question: {query}
Answer: """

# ReAct-style reasoning reflection prompt (Re2Search)
prompt = f"""Given context: {retrieved_docs}
Question: {query}

Reason through this step by step:
1. What key concepts does the question ask about?
2. Which parts of the context are relevant?
3. Are there gaps in the retrieved context?
4. What's my confidence in this answer?
   If confidence is low, what should I search for next?

Answer: """
```
The key: Making the agent explicit about gaps and confidence dramatically improves performance.
2. Actor Tuning: Direct Preference Optimization (DPO)
Instead of just fine-tuning on correct answers, fine-tune on preferences:
Preferred answer: [high-quality, reasoning-rich response]
Rejected answer: [low-quality, hallucinated response]
Results show DPO outperforms standard fine-tuning by 3-11% on RAG tasks.
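For one preference pair, the DPO objective is -log σ(β[(log π(y_w|x) − log π_ref(y_w|x)) − (log π(y_l|x) − log π_ref(y_l|x))]), where π is the policy being tuned and π_ref a frozen reference model. A minimal numeric sketch (single pair, scalar log-probabilities; real training operates on batches of token-level log-probs):

```python
import math

# Sketch of the DPO loss for one preference pair.
# logp_w / logp_l: log-probs of preferred / rejected answers under the policy;
# ref_logp_w / ref_logp_l: the same under a frozen reference model.
def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
```

The loss shrinks as the policy assigns relatively more probability to the preferred answer than the reference does, which is what pushes the actor toward reasoning-rich responses.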
3. Critic Training: Learned Evaluation
Train a separate "critic" model to evaluate intermediate steps:
```python
class RagCritic:
    def evaluate_intermediate_step(self, query: str, retrieved_docs: list,
                                   candidate_answer: str) -> float:
        """Rate how good this answer would be (0-1)."""
        # Trained via supervised learning on human preferences
        return self.critic_model(query, retrieved_docs, candidate_answer)


def first_good_attempt(query, docs, candidate_answers, critic, threshold=0.7):
    # Use the critic to filter low-quality generation attempts
    for attempt in candidate_answers:
        if critic.evaluate_intermediate_step(query, docs, attempt) > threshold:
            return attempt
```