Beyond RAG - A system that understands your documents

By Prompt Engineering

Agentic File Search: A Deep Dive into Intelligent Document Exploration

Key Concepts:

  • RAG (Retrieval-Augmented Generation): A standard pipeline for question answering involving embedding documents, retrieving relevant chunks, and feeding them to an LLM.
  • Embeddings: Vector representations of text used for semantic similarity search.
  • Vector Database: A database optimized for storing and querying vector embeddings.
  • Agentic File Search: A dynamic approach to document search where an AI agent navigates documents like a human, following references and reasoning about context.
  • Chunking: Splitting documents into smaller pieces (chunks) for embedding and retrieval.
  • Cross-References: Links within a document or between documents pointing to related information.
  • Workflow Engine: A system for orchestrating a sequence of actions or tasks, in this case, the agent's exploration process.
  • LLM (Large Language Model): A powerful AI model capable of understanding and generating human-like text (e.g., Gemini 3 Flash).
  • Grep: A tool for regular-expression search within a file.
  • Glob: A tool for finding files based on patterns.

I. Limitations of Traditional RAG Systems

The video begins by outlining the shortcomings of the standard RAG pipeline when applied to complex, real-world documents. While effective for simple use cases, traditional RAG struggles with:

  • Context Loss: Chunking documents into smaller pieces (e.g., 500 tokens) destroys the relationships between sections. For example, a clause stating the purchase price ends up in a different chunk from the exhibit it relies on (Exhibit B), which appears pages later.
  • Invisible Cross-References: Embedding models cannot follow links or references within and between documents, rendering them effectively invisible to the system.
  • Semantic Similarity vs. Relevance: Similarity in meaning doesn’t guarantee usefulness. Two semantically similar chunks may not both be relevant to answering a specific question.
  • Lack of Reasoning: RAG operates on pattern matching at a surface level, lacking true understanding of the document's structure and content. It simply finds text that looks similar, not text that answers the question based on logical connections.
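
The context-loss problem can be illustrated with a minimal sketch. It assumes a naive fixed-size chunker that counts whitespace-separated words rather than real tokens, and the contract text, the filler, and the price figure are all invented for illustration; none of this comes from the video.

```python
def chunk_words(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


# Hypothetical contract: the clause and the exhibit it points to are far apart.
filler = "Other terms apply. " * 40
document = (
    "Section 2.1: The purchase price is set out in Exhibit B. "
    + filler
    + "Exhibit B: The purchase price is 10 million dollars."
)

chunks = chunk_words(document, size=50)

# Locate the chunk holding the clause and the chunk holding the exhibit body.
clause_chunk = next(i for i, c in enumerate(chunks) if "Section 2.1" in c)
exhibit_chunk = next(i for i, c in enumerate(chunks) if "Exhibit B:" in c)

print(f"clause in chunk {clause_chunk}, exhibit in chunk {exhibit_chunk}")
```

A retriever that returns only the clause chunk sees the pointer to Exhibit B but not the figure itself, which is the disconnect described above.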

II. The Agentic File Search Approach: Mimicking Human Document Navigation

The core argument presented is that instead of relying on precomputed chunks, an AI agent should navigate documents intelligently, mirroring how a human would approach the task. This involves:

  • Initial Scan: The agent begins by quickly previewing all documents in a folder in parallel, gaining a broad understanding of the available information.
  • Targeted Deep Dive: Based on the initial scan, the agent identifies relevant documents and performs a full, in-depth reading of their content.
  • Backtracking & Reference Following: Critically, if a document references another document (e.g., "See Exhibit B"), the agent automatically retrieves and reads that referenced document, even if it was initially skipped. This is the key differentiator from traditional RAG.
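
Reference following can be sketched roughly as below. The regular expression, the label types (Exhibit, Schedule, Appendix), and the in-memory `documents` dictionary are assumptions made for illustration; the system described in the video lets the LLM decide which references to follow rather than relying on a fixed pattern.

```python
import re

# Assumed pattern: references look like "See Exhibit B" or "see Schedule 3".
REFERENCE_PATTERN = re.compile(
    r"\b[Ss]ee\s+((?:Exhibit|Schedule|Appendix)\s+[A-Z0-9]+)"
)


def find_references(text: str) -> list[str]:
    """Return referenced labels in order of first appearance, without duplicates."""
    seen: list[str] = []
    for label in REFERENCE_PATTERN.findall(text):
        normalized = " ".join(label.split())  # collapse internal whitespace
        if normalized not in seen:
            seen.append(normalized)
    return seen


def follow_references(start: str, documents: dict[str, str]) -> list[str]:
    """Read `start`, then every document it references, transitively (breadth-first)."""
    read_order: list[str] = []
    queue = [start]
    while queue:
        name = queue.pop(0)
        if name in read_order or name not in documents:
            continue  # already read, or the reference cannot be resolved
        read_order.append(name)
        queue.extend(find_references(documents[name]))
    return read_order


# Hypothetical document set keyed by label.
documents = {
    "Agreement": "The purchase price is defined elsewhere. See Exhibit B for details.",
    "Exhibit B": "Price terms. See Schedule 3 for adjustments.",
    "Schedule 3": "Adjustment formula.",
    "Exhibit C": "Unrelated material.",
}

order = follow_references("Agreement", documents)
print(order)
```

Exhibit C is never read because nothing points to it, while Schedule 3 is reached only through Exhibit B.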

Example: When searching for a return policy, a human would consult the table of contents, not start reading from page one. A traditional RAG system might find chunks mentioning "return policy" but fail to locate the complete, relevant section. The agentic approach, however, would navigate to the appropriate chapter and pinpoint the exact policy details.

III. Three-Phase Strategy & Technical Implementation

The agentic file search is implemented using a three-phase strategy:

  1. Phase 1: Parallel Scan: All documents are scanned in parallel, extracting a quick preview (approximately 1500 characters) of each. This is designed for speed and categorization.
  2. Phase 2: Deep Dive: The agent reads the full content of documents identified as relevant during the scan phase.
  3. Phase 3: Backtrack: The agent identifies and retrieves any documents referenced within the currently processed documents, ensuring complete context.
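
The three phases above can be sketched with `asyncio`. This is a simplified illustration, not the project's code: `looks_relevant` is a keyword-overlap placeholder for the LLM's relevance decision, the documents live in a hypothetical in-memory dictionary, and a reference is detected simply by finding another file's name in the text.

```python
import asyncio
import re

PREVIEW_CHARS = 1500  # preview length mentioned in the text


async def preview(name: str, documents: dict[str, str]) -> tuple[str, str]:
    """Phase 1 helper: return the first PREVIEW_CHARS characters of one document."""
    await asyncio.sleep(0)  # stands in for real file I/O or parsing
    return name, documents[name][:PREVIEW_CHARS]


def looks_relevant(preview_text: str, question: str) -> bool:
    """Placeholder for the LLM's relevance decision: simple keyword overlap."""
    keywords = {w.lower() for w in re.findall(r"\w+", question) if len(w) > 4}
    return any(k in preview_text.lower() for k in keywords)


async def explore(question: str, documents: dict[str, str]) -> list[str]:
    # Phase 1: scan all documents in parallel.
    previews = await asyncio.gather(*(preview(n, documents) for n in documents))
    # Phase 2: deep-dive into documents whose preview looks relevant.
    to_read = [name for name, text in previews if looks_relevant(text, question)]
    read: list[str] = []
    while to_read:
        name = to_read.pop(0)
        if name in read:
            continue
        read.append(name)
        # Phase 3: backtrack to any document mentioned by name, even if skipped.
        for other in documents:
            if other != name and other in documents[name] and other not in read:
                to_read.append(other)
    return read


# Hypothetical folder contents.
documents = {
    "merger_agreement.md": "Retention benefits are described in retention_plan.md.",
    "retention_plan.md": "Key employees receive a bonus after twelve months.",
    "office_lease.md": "Lease terms for the headquarters building.",
}

result = asyncio.run(explore("What retention benefits apply?", documents))
print(result)
```

In this example the retention plan fails the preview check, yet it is still read because the merger agreement refers to it, which is the behavior the backtrack phase is meant to provide.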

Technical Components:

  • Workflow Engine (LlamaIndex Workflows): Manages the asynchronous execution of the agent's actions, handling timeouts and ensuring separation of concerns.
  • Agent (Gemini 3 Flash): The decision-making component, receiving conversation history and returning structured JSON outputs defining the next action.
  • Document Parser (Docling): Extracts text from various document formats (PDF, Word, PowerPoint) and converts it to clean markdown.
  • Tools: The agent has access to six tools:
    • Scan Documents: Parallel preview of all documents in a folder.
    • Preview File: Quick preview of a single document.
    • Parse File: Full document extraction.
    • Read File: Faster reading of plain text files.
    • Grep: Regular expression search within a file.
    • Glob: File finding based on patterns.
  • Caching: Parsed document content is cached to avoid redundant processing during backtracking.

IV. Key Differences: RAG vs. Agentic File Search

| Feature | RAG | Agentic File Search |
|---|---|---|
| Embeddings | Precomputed, fixed at index time | Dynamic, adapted to each query |
| Context | Limited to pre-defined chunks | Full document context, including cross-references |
| Reasoning | Pattern matching on surface level | Understands document structure and relationships |
| Exploration | Static retrieval | Dynamic exploration based on question and document content |
| Speed | Generally faster | Generally slower |
| Accuracy | Lower for complex queries | Higher for complex, multi-document reasoning |

V. Practical Considerations & Use Cases

  • Performance: Agentic file search is slower and more token-intensive than traditional RAG. It's best suited for scenarios where accuracy is paramount and latency is less critical.
  • Cost: Higher token usage translates to increased cost, especially with proprietary LLMs. Utilizing local, long-context open-weight models can mitigate this.
  • Ideal Use Cases:
    • Complex Multi-Document Reasoning: Situations requiring information from multiple interconnected documents.
    • Structured Documents: Legal documents, technical specifications, financial filings where cross-references are common.
    • Accuracy-Critical Applications: Where precise and complete answers are essential.
  • RAG Still Valuable: RAG remains a suitable choice for simple question-answering tasks, large corpora, and scenarios where speed is a priority.

VI. Demonstration & Step-by-Step Example

The video demonstrates the agentic file search using a dataset from a large acquisition. When asked about employee retention benefits and non-competes, the agent:

  1. Scanned all 26 files in the folder.
  2. Parsed the most relevant document based on the scan.
  3. Identified cross-references to other documents (e.g., "Key Employee Retention Agreement").
  4. Read those referenced documents.
  5. Synthesized a comprehensive answer, citing the specific files and sections used.

This contrasts with a traditional RAG system, which would likely return fragmented information and fail to follow the logical connections between documents.

VII. Extensibility & Future Directions

The system is designed to be easily extensible:

  • Adding New Tools: Adding a new tool involves defining a function, adding it to the tool directory, updating the model configuration, and documenting it in the system prompt.
  • Swapping LLMs: Replacing the LLM (currently Gemini 3 Flash) requires only updating the import and the function that makes the generate call.
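
One way such extensibility might look is sketched below. Every name here (`register_tool`, `TOOL_DESCRIPTIONS`, `LLMClient`, `EchoLLM`, `word_count`) is hypothetical and not taken from the project; the sketch folds tool registration and system-prompt documentation into one decorator and does not model the separate model-configuration step.

```python
from typing import Callable, Protocol

TOOLS: dict[str, Callable[..., str]] = {}
TOOL_DESCRIPTIONS: dict[str, str] = {}


def register_tool(name: str, description: str):
    """Decorator that adds a function to the tool table and records its description."""
    def decorator(func: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = func
        TOOL_DESCRIPTIONS[name] = description
        return func
    return decorator


def build_system_prompt() -> str:
    """List every registered tool so the model knows what it may call."""
    lines = ["You can call these tools:"]
    lines += [f"- {name}: {desc}" for name, desc in TOOL_DESCRIPTIONS.items()]
    return "\n".join(lines)


class LLMClient(Protocol):
    """Minimal interface; any model wrapper with this method can be swapped in."""
    def generate(self, prompt: str) -> str: ...


class EchoLLM:
    """Stand-in model for this sketch; a real client would call a model API."""
    def generate(self, prompt: str) -> str:
        return f"echo: {prompt.splitlines()[0]}"


@register_tool("word_count", "Count the words in a piece of text.")
def word_count(text: str) -> str:
    return str(len(text.split()))


def next_action(llm: LLMClient, question: str) -> str:
    """Ask whichever model was passed in; the rest of the code never changes."""
    return llm.generate(build_system_prompt() + "\n\nQuestion: " + question)


print(build_system_prompt())
print(next_action(EchoLLM(), "How long is the agreement?"))
```

Keeping the model behind a small interface like `LLMClient` means a swap touches only the class that wraps the provider, which matches the claim that changing models is a local edit.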

The presenter concludes that the future of document search lies not just in retrieval, but in exploration – enabling AI agents to navigate and understand documents with human-like intelligence.

Notable Quote:

“RAG is retriever, it finds similar text. Agentic file search is reasoning, it understands documents. Embedding finds similar text, agents understand documents and I think future is not just retrieval, it's exploration.” – Presenter.
