Building Agentic RAG From Scratch in Pure Python

By Dave Ebbelaar

Key Concepts

  • Agentic RAG (Retrieval-Augmented Generation): An iterative AI architecture where an agent uses tools to search, read, and self-correct, as opposed to the linear, single-call process of traditional Semantic RAG.
  • Tool-Use Loop: The mechanism where an LLM calls functions (list, search, read) based on its reasoning, receives output, and decides the next step.
  • Ripgrep (rg): A high-performance, Rust-based command-line search tool used in production environments to efficiently search file systems while respecting .gitignore rules.
  • Subprocess Module: A Python standard-library module used to run external system commands (like ripgrep) from within a Python program and capture their output.
  • Structured Output: Forcing the LLM to return data in a specific schema (e.g., JSON with citations) to ensure compatibility with downstream software components.

1. Agentic RAG vs. Semantic RAG

  • Semantic RAG: A linear process involving a single LLM API call after retrieving information. It is preferred for low-latency and cost-sensitive scenarios.
  • Agentic RAG: An iterative loop where the agent uses search and read tools to gather information, evaluate results, and self-correct if the initial findings are insufficient. It generally outperforms Semantic RAG in complex tasks due to its ability to refine its search strategy.
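The loop described above can be sketched without any LLM at all. In the minimal example below, `scripted_model` is a hypothetical stand-in for a real model call, and `search`, `TOOLS`, and `run_agent` are illustrative names that do not come from the video. The point is only the control flow: call a tool, inspect the output, and retry with a different query when the first search comes back empty.

```python
from typing import Callable

def search(query: str) -> str:
    """Hypothetical search tool over a tiny in-memory note collection."""
    notes = {"rag.md": "Agentic RAG runs tools in a loop."}
    hits = [f"{name}: {text}" for name, text in notes.items()
            if query.lower() in text.lower()]
    return "\n".join(hits) or "No matches found."

# Registry mapping tool names to callables the agent may invoke.
TOOLS: dict[str, Callable[[str], str]] = {"search": search}

def scripted_model(history: list[str]) -> dict:
    """Stand-in for an LLM: picks the next step from what it has seen so far."""
    if not any(entry.startswith("tool:") for entry in history):
        return {"tool": "search", "arg": "vector database"}  # first guess
    if history[-1] == "tool:No matches found.":
        return {"tool": "search", "arg": "agentic"}  # self-correct with a new query
    return {"answer": history[-1].removeprefix("tool:")}

def run_agent(question: str, max_steps: int = 5) -> str:
    """Run the tool-use loop until the model answers or the step limit is hit."""
    history = [f"user:{question}"]
    for _ in range(max_steps):
        step = scripted_model(history)
        if "answer" in step:
            return step["answer"]
        output = TOOLS[step["tool"]](step["arg"])
        history.append(f"tool:{output}")  # feed the tool result back to the model
    return "Stopped: step limit reached."
```

Semantic RAG would correspond to a single pass through this function with no retry branch. The `max_steps` cap is what keeps the loop from running forever when the model never settles on an answer.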

2. Core Tooling Framework

The system relies on three fundamental primitives to interact with a file system (specifically Markdown files):

  1. List Files: Uses pathlib.Path.glob to identify relevant files. The use of .relative_to() is critical to shorten file paths, saving context window tokens.
  2. Grep (Search): Uses regular expressions (re module) to find specific patterns within files. It returns the file name, line number, and the content of each matching line.
  3. Read Files: Opens and extracts text from specific files. Includes a safety check (is_relative_to) to ensure the agent remains contained within the designated "notes" directory.
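A minimal sketch of the three primitives, using only the standard library. The function names, the `NOTES_DIR` constant, and the choice to return an error string from `read_file` are assumptions made for illustration; the video's own code may differ.

```python
import re
from pathlib import Path

NOTES_DIR = Path("notes")  # assumed location of the Markdown notes

def list_files(root: Path = NOTES_DIR) -> list[str]:
    """List Markdown files as paths relative to root (shorter paths, fewer tokens)."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.md"))

def grep(pattern: str, root: Path = NOTES_DIR) -> list[str]:
    """Return 'file:line: text' for each line matching the regex, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)
    matches = []
    for rel in list_files(root):
        lines = (root / rel).read_text(encoding="utf-8").splitlines()
        # start=1 gives the human-readable line numbers an editor would show
        for lineno, line in enumerate(lines, start=1):
            if regex.search(line):
                matches.append(f"{rel}:{lineno}: {line.strip()}")
    return matches

def read_file(rel_path: str, root: Path = NOTES_DIR) -> str:
    """Read one file, refusing any path that resolves outside root."""
    base = root.resolve()
    target = (base / rel_path).resolve()
    if not target.is_relative_to(base):  # Path.is_relative_to needs Python 3.9+
        return f"Error: {rel_path} is outside the notes directory."
    if not target.is_file():
        return f"Error: {rel_path} not found."
    return target.read_text(encoding="utf-8")
```

Resolving both paths before the `is_relative_to` check matters: without it, a request such as `../secret.txt` would still look like it sits under the notes directory.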

3. Step-by-Step Implementation

  • Step 1 (Primitives): Define the directory structure using pathlib.
  • Step 2 (Search Logic): Compile regex patterns with re.IGNORECASE. Use enumerate() during file iteration to provide human-readable line numbers to the LLM.
  • Step 3 (Agent Integration): Use an agent framework (e.g., pydantic-ai) to bind these functions as tools. The LLM uses the function's docstrings to understand when and how to use them.
  • Step 4 (Debugging/Streaming): Implement streaming steps to intercept tool calls. This allows developers to see the exact parameters the LLM passes to the tools, which is essential for optimizing performance.
  • Step 5 (Structured Output): Define a Pydantic model for the final response, requiring the LLM to provide an answer alongside a list of citations (file, quote, line number).
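Step 5 can be illustrated with a standard-library stand-in. The video uses a Pydantic model for this; the sketch below swaps in plain dataclasses and `json` so it runs without extra dependencies. The class names, field names, and `parse_answer` helper are assumptions chosen to mirror the answer-plus-citations shape described above.

```python
import json
from dataclasses import dataclass

@dataclass
class Citation:
    file: str
    quote: str
    line_number: int

@dataclass
class Answer:
    answer: str
    citations: list[Citation]

def parse_answer(raw: str) -> Answer:
    """Parse the model's JSON reply; raise ValueError if it does not fit the schema."""
    try:
        data = json.loads(raw)
        citations = [
            Citation(str(c["file"]), str(c["quote"]), int(c["line_number"]))
            for c in data["citations"]
        ]
        return Answer(answer=str(data["answer"]), citations=citations)
    except (KeyError, TypeError, ValueError) as exc:
        # Malformed JSON, missing keys, and wrong types all end up here.
        raise ValueError(f"Reply does not match the Answer schema: {exc!r}") from exc
```

A reply that omits the citations list is rejected rather than silently accepted, which is the property downstream code relies on. With a real Pydantic model the same validation would come from the library instead of hand-written checks.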

4. Production Best Practices

  • Ripgrep Integration: Replace basic Python regex searches with ripgrep via subprocess for speed and better file handling (e.g., ignoring hidden files).
  • Error Handling: Instead of raising exceptions that crash the agent, return human-readable error messages. This allows the LLM to "see" the error and attempt a different approach (self-correction).
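  • Safety Constraints: Set limits such as agent_request_limit (a cap on how many requests the agent may make per run) and read_max_lines (a cap on how many lines a read returns) to prevent infinite loops and context window overflow.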
  • Logging: Use robust logging to track the agent's decision-making process in production environments.
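The ripgrep, error-handling, and line-limit practices above might look like the sketch below. The function names, the `READ_MAX_LINES` value, the timeout, and the wording of the messages are all assumptions. The ripgrep flags used (`--line-number`, `--ignore-case`, `--no-heading`, `--with-filename`) and its exit codes (1 for no match, 2 for an error) are standard ripgrep behavior.

```python
import shutil
import subprocess
from pathlib import Path

READ_MAX_LINES = 200  # assumed cap; tune to the model's context window

def rg_search(pattern: str, root: str = "notes", max_results: int = 50) -> str:
    """Search with ripgrep; always return text the model can read, never raise."""
    if shutil.which("rg") is None:
        return "Error: ripgrep (rg) is not installed or not on PATH."
    try:
        proc = subprocess.run(
            ["rg", "--line-number", "--ignore-case", "--no-heading",
             "--with-filename", "--", pattern, root],
            capture_output=True, text=True, timeout=10,
        )
    except subprocess.TimeoutExpired:
        return "Error: search timed out. Try a more specific pattern."
    if proc.returncode == 1:  # ripgrep exits with 1 when nothing matched
        return f"No matches for {pattern!r}. Try a different search term."
    if proc.returncode != 0:  # 2 signals an error such as an invalid regex
        return f"Error: ripgrep failed: {proc.stderr.strip()}"
    return "\n".join(proc.stdout.splitlines()[:max_results])

def read_capped(path: str, max_lines: int = READ_MAX_LINES) -> str:
    """Read at most max_lines lines and tell the model when output was cut."""
    try:
        lines = Path(path).read_text(encoding="utf-8").splitlines()
    except (OSError, UnicodeDecodeError) as exc:
        return f"Error: could not read {path}: {exc}"
    if len(lines) > max_lines:
        note = f"[truncated: showing {max_lines} of {len(lines)} lines]"
        return "\n".join(lines[:max_lines] + [note])
    return "\n".join(lines)
```

Both functions return a message instead of raising, so a failed call becomes one more tool result the model can react to. The truncation note tells the model that more of the file exists, which it could not infer from a silently shortened read.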

5. Notable Quotes

  • "Semantic RAG is a more linear process... whereas with Agentic RAG, we have this loop where we have search tools and read tools that can get results, get back to the language model and we can utilize its intelligence in a loop."
  • "Everything that you put in here—your tool definition and even the docstring—this is all information that the large language model will use."

6. Synthesis and Conclusion

Agentic RAG transforms the LLM from a passive responder into an active researcher. By building a modular system from list, grep, and read tools, developers can give LLMs access to private, domain-specific data. While more resource-intensive than traditional RAG, its ability to self-correct and provide verifiable citations makes it the stronger choice for complex engineering or enterprise knowledge tasks. The transition to production requires moving from basic Python regex searches to optimized system tools like ripgrep, and implementing strict error handling so the agent remains reliable and contained.
