Exploration is All You Need!

Agentic File Search: Combining Semantic & Traditional Retrieval for Enhanced RAG

Key Concepts:

Retrieval Augmented Generation (RAG): A technique to improve the performance of Large Language Models (LLMs) by retrieving relevant information from an external knowledge source and incorporating it into the prompt.
Agentic File Search: A system utilizing LLMs as agents with simple tools (file operations, code execution) to explore and search documents.
Parallel Scan: Initial phase of the agentic search, quickly reading document beginnings to identify potentially relevant files.
Deep Dive: Second phase, sending candidate documents to an LLM for detailed analysis.
Backtracking: Third phase, enabling the system to revisit documents referenced within others, crucial for interconnected information.
Smart Chunking: Dividing documents into sub-documents for embedding creation and semantic search.
Dual Path Search Pipeline: Combining semantic search (using embeddings) and agentic search for improved retrieval.
LangExtract: A Gemini-powered library for automated information extraction and metadata creation.
Harness Engineering: A field focused on providing agents with generalized tools for problem-solving, rather than rigid workflows.

I. The Problem with Current RAG Systems & Introduction to the Agentic Approach

The video addresses a core challenge in Retrieval Augmented Generation (RAG): retrieving relevant information accurately. Traditional RAG relies heavily on embedding-based semantic retrieval. The presenter argues, however, that the simple tools already available to coding agents (file operations, code execution) can achieve comparable accuracy. The core idea is to build a retrieval system on these simple tools and then to explore combining it with traditional embedding-based methods.

This work builds on a previous two-part series detailing an agentic file search system. The project is open-source with nearly 500 GitHub stars as of the video’s creation. The system operates in three phases: Parallel Scan, Deep Dive, and Backtracking. The Parallel Scan quickly identifies potentially relevant documents. The Deep Dive uses an LLM to analyze these candidates. Crucially, the Backtracking phase allows the system to revisit documents referenced within others, a critical feature for complex, interconnected information like legal documents (e.g., patents referencing diagrams in other documents).
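
The three phases can be illustrated with a minimal Python sketch. It is not the project's code: the `is_relevant` and `analyze` callables stand in for LLM calls, and the "see <filename>" reference syntax is a hypothetical convention chosen only to make backtracking visible.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set

# Assumption for this sketch: documents reference each other as "see <filename>".
REF_PATTERN = re.compile(r"see\s+([\w\-]+\.\w+)", re.IGNORECASE)


def parallel_scan(docs: Dict[str, str], is_relevant: Callable[[str], bool],
                  preview_chars: int = 200) -> List[str]:
    """Phase 1: judge each document by its beginning only, in parallel."""
    names = list(docs)
    with ThreadPoolExecutor() as pool:
        flags = list(pool.map(lambda n: is_relevant(docs[n][:preview_chars]), names))
    return [n for n, flag in zip(names, flags) if flag]


def find_references(text: str, docs: Dict[str, str]) -> List[str]:
    """Return the names of known documents that this text points to."""
    return [name for name in REF_PATTERN.findall(text) if name in docs]


def agentic_search(docs: Dict[str, str], is_relevant: Callable[[str], bool],
                   analyze: Callable[[str], object]) -> Dict[str, object]:
    queue = parallel_scan(docs, is_relevant)
    visited: Set[str] = set()
    findings: Dict[str, object] = {}
    while queue:
        name = queue.pop(0)
        if name in visited:
            continue
        visited.add(name)
        # Phase 2: deep dive on the full text (an LLM call in the real system).
        findings[name] = analyze(docs[name])
        # Phase 3: backtracking, follow references even if the scan skipped them.
        for ref in find_references(docs[name], docs):
            if ref not in visited:
                queue.append(ref)
    return findings
```

In this sketch a document that the scan skipped is still analyzed once a candidate document refers to it, which is the behavior the Backtracking phase is meant to provide.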

The presenter notes a significant drawback: the system’s sequential nature makes it slow and unsuitable for real-time applications. This leads to the central question explored in the video: can this agentic approach be combined with traditional RAG as a filtering step to improve speed?

II. Introducing the Dual Path Search Pipeline & Smart Chunking

The proposed solution is a Dual Path Search Pipeline. The existing system parses documents into Markdown using Docling but does no pre-processing beyond that. The new setup introduces Smart Chunking, which divides documents into sub-documents for embedding generation. Embeddings are currently generated with Gemini.
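
The summary does not say how Smart Chunking splits a document. As one plausible reading, the sketch below splits parsed Markdown at headings and then packs paragraphs into chunks up to a size limit. The `Chunk` type, the `chunk_markdown` function, and the `max_chars` parameter are assumptions made for illustration, not the project's API.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    doc_id: str
    index: int
    text: str


def chunk_markdown(doc_id: str, markdown: str, max_chars: int = 1000) -> List[Chunk]:
    """Split at Markdown headings, then pack paragraphs up to max_chars."""
    # Zero-width split before each heading line, so every section keeps its heading.
    sections = re.split(r"(?m)^(?=#{1,6}\s)", markdown)
    pieces: List[str] = []
    for section in sections:
        current = ""
        for para in section.strip().split("\n\n"):
            para = para.strip()
            if not para:
                continue
            if current and len(current) + len(para) + 2 > max_chars:
                pieces.append(current)
                current = para
            else:
                current = f"{current}\n\n{para}" if current else para
        if current:
            pieces.append(current)
    # Note: a single paragraph longer than max_chars is kept whole in this sketch.
    return [Chunk(doc_id, i, text) for i, text in enumerate(pieces)]
```

Splitting at headings keeps each chunk inside one section, so a chunk that matches a query points back to a coherent part of its parent document.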

Alongside embeddings, the system extracts metadata from each document. This metadata can be user-defined or automatically extracted using LangExtract, a Gemini-powered information extraction library previously covered by the presenter. All data is stored in a DuckDB database with four tables: the original documents with their metadata, the created chunks, the embeddings, and the metadata schema itself.
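
A rough sketch of such a four-table layout follows. To stay runnable with only the Python standard library it uses `sqlite3` as a stand-in for DuckDB, and every table and column name is an assumption; the project's actual schema is not given in the video summary.

```python
import json
import sqlite3

# Hypothetical schema: names and types are illustrative only.
SCHEMA = """
CREATE TABLE documents (
    doc_id   TEXT PRIMARY KEY,
    path     TEXT NOT NULL,
    metadata TEXT NOT NULL          -- JSON object of extracted fields
);
CREATE TABLE chunks (
    chunk_id INTEGER PRIMARY KEY,
    doc_id   TEXT NOT NULL REFERENCES documents(doc_id),
    position INTEGER NOT NULL,
    text     TEXT NOT NULL
);
CREATE TABLE embeddings (
    chunk_id INTEGER PRIMARY KEY REFERENCES chunks(chunk_id),
    vector   TEXT NOT NULL          -- JSON array of floats
);
CREATE TABLE metadata_schema (
    field_name  TEXT PRIMARY KEY,
    field_type  TEXT NOT NULL,
    description TEXT
);
"""


def create_store() -> sqlite3.Connection:
    """Create an in-memory store with the four tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_document(conn, doc_id, path, metadata, chunks):
    """Insert one document plus its (text, vector) chunk pairs."""
    conn.execute("INSERT INTO documents VALUES (?, ?, ?)",
                 (doc_id, path, json.dumps(metadata)))
    for position, (text, vector) in enumerate(chunks):
        cur = conn.execute(
            "INSERT INTO chunks (doc_id, position, text) VALUES (?, ?, ?)",
            (doc_id, position, text))
        conn.execute("INSERT INTO embeddings VALUES (?, ?)",
                     (cur.lastrowid, json.dumps(vector)))
```

Keeping chunks and embeddings keyed back to `doc_id` is what lets a chunk-level hit be resolved to the whole document that the agent later reads.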

The presenter emphasizes the importance of metadata for real-world applications. For example, with invoice data, an LLM can be instructed to extract specific information (payee, invoice amount, dates) as metadata, enabling targeted filtering before the agentic search begins.

III. Query Time Operation & Four Modes of Operation

At query time, the user query goes through one or more retrieval passes, depending on the selected mode. The system provides both a semantic search path and a metadata-based filtering path. The metadata path filters whole documents (not chunks), which reduces the search space for the agentic search. Results from the active paths are deduplicated before being passed to the agentic file search component.
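
The two paths and the deduplication step might look roughly like the sketch below. The function names and the in-memory data layout are hypothetical, and merging the paths as a deduplicated union is an assumption; the summary only states that results are deduplicated.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_candidates(query_vec: Sequence[float],
                        chunk_index: List[Tuple[str, Sequence[float]]],
                        top_k: int = 3) -> List[str]:
    """Rank chunks by similarity; return the owning doc ids (may repeat)."""
    ranked = sorted(chunk_index, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]


def metadata_candidates(filters: Dict[str, str],
                        doc_metadata: Dict[str, Dict[str, str]]) -> List[str]:
    """Keep documents whose metadata matches every filter exactly."""
    return [doc_id for doc_id, meta in doc_metadata.items()
            if all(meta.get(key) == value for key, value in filters.items())]


def merge_candidates(*paths: List[str]) -> List[str]:
    """Deduplicate document ids while preserving first-seen order."""
    return list(dict.fromkeys(doc_id for path in paths for doc_id in path))
```

Because several chunks of one document can rank highly, the semantic path can return the same document more than once, which is why the merge step works on document ids rather than chunks.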

The presenter highlights that the agentic search’s inherent capabilities minimize the need for perfect initial retrieval accuracy. The system operates in four modes:

Pure Agentic Search: The system described in previous videos, relying solely on the agent and its tools. This mode is inspired by Harness Engineering, which advocates giving agents generalized tools rather than rigid workflows.
Semantic Search as Prefiltering: Semantic search on chunk level is used to pre-filter documents before passing them to the agentic file search.
Metadata-Based Filtering: Documents are filtered based on user-defined or automatically extracted metadata before being sent to the agent.
Combined Mode: Utilizes both semantic search and metadata filtering before the agentic search.
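
One way to picture the four modes is as a switch that decides which documents reach the agent. The enum and function below are hypothetical names for illustration, and treating Combined Mode as the union of both paths is an assumption rather than something the video summary confirms.

```python
from enum import Enum
from typing import Iterable, List


class SearchMode(Enum):
    AGENTIC_ONLY = "agentic_only"
    SEMANTIC_PREFILTER = "semantic_prefilter"
    METADATA_FILTER = "metadata_filter"
    COMBINED = "combined"


def documents_for_agent(mode: SearchMode,
                        all_docs: Iterable[str],
                        semantic_hits: Iterable[str],
                        metadata_hits: Iterable[str]) -> List[str]:
    """Return the deduplicated document ids handed to the agentic search."""
    if mode is SearchMode.AGENTIC_ONLY:
        # No prefiltering: the agent explores the whole folder.
        return list(dict.fromkeys(all_docs))
    if mode is SearchMode.SEMANTIC_PREFILTER:
        return list(dict.fromkeys(semantic_hits))
    if mode is SearchMode.METADATA_FILTER:
        return list(dict.fromkeys(metadata_hits))
    # COMBINED: assumed here to be the union of both paths, in first-seen order.
    return list(dict.fromkeys([*semantic_hits, *metadata_hits]))
```

For example, with semantic hits `["a", "a", "c"]` and metadata hits `["b", "c"]`, Combined Mode in this sketch passes `a`, `c`, and `b` to the agent, while Pure Agentic Search passes every document.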

Currently, the system runs on Gemini because of its large context window and its ability to handle the "needle in a haystack" problem (finding a specific piece of relevant information in a large corpus). However, the system is designed to be flexible and can accommodate other LLM providers.

IV. Installation & Demonstration via Web UI

Installation is straightforward: clone the repository and run the dependency installation commands. Using proprietary models like Gemini requires an API key. A separate branch supports running the system with a local model, but the presenter recommends a 32B model or larger because of the complexity of the multi-step agentic process.

A demonstration using the web UI showcases the system’s functionality. The UI allows selecting a folder containing documents (in this case, 11 legal documents related to an acquisition). The system indexes the documents, parsing them, extracting metadata, and normalizing the data.

The user can then enable or disable different components (semantic search, metadata filtering, agentic pipeline) and submit a query. The example query, “summarize the risk assessment PDF file,” demonstrates the system’s ability to automatically identify the task and retrieve the relevant document. The system then retrieves the full document, sends it to the Gemini API, and generates a comprehensive assessment with references to specific sections within the document.

V. Conclusion & Future Directions

The presenter concludes that this dual-path approach represents a “powerful design pattern” for RAG. He acknowledges that the system is not yet production-ready but presents it as a significant improvement in retrieval accuracy and efficiency.

The presenter offers consulting services for those needing assistance with building RAG systems. He encourages viewers to explore the open-source project and thanks them for watching. He also mentions plans for future videos exploring Harness Engineering in more detail.

Notable Quote:

“You don’t really have to worry about the accuracy of retrieval in this first step because the agent is going to take care of most of that.” – The presenter, emphasizing the strength of the agentic search component.