RAG is Dead. Again. (Claude Agent SDK + Memory)

By Prompt Engineering

Key Concepts

  • Agentic RAG (Retrieval-Augmented Generation): An autonomous system in which an AI agent uses tools to retrieve, reason over, and synthesize information from complex documents.
  • Multi-layered Memory System: A dual-approach architecture combining semantic vector search with file-system-based tool access.
  • Milvus: A high-performance, scalable, open-source vector database used for storing and querying high-dimensional embeddings.
  • LlamaIndex Light Parser: A specialized tool for parsing complex PDF structures, including text, tables, and images.
  • Semantic Similarity Search: A method of retrieving information based on the meaning of the query rather than exact keyword matching.
  • Backtracking/Reasoning Loop: The agent’s ability to evaluate its own search results and re-query or scan additional documents if the initial retrieval is insufficient.

1. System Architecture and Methodology

The system provides Claude (via Agent SDK or Claude Code) with a persistent memory layer. It operates through a two-pronged approach:

  • Semantic Layer: Uses Milvus to perform high-speed vector searches. This reduces the search space by identifying the most relevant document chunks based on cosine similarity.
  • File System Layer: Provides the agent with "bash-like" tools to scan, parse, and read specific documents. This allows the agent to perform deep dives into files that the semantic search identified as relevant.
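
As a rough illustration of how the semantic layer narrows the search space before any file is opened, here is a minimal pure-Python sketch of cosine-similarity ranking. It is not the video's implementation: in the described system Milvus performs this search, and the chunk vectors, file names, and function names below are hypothetical stand-ins.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vec, chunks, k=2):
    """Rank stored chunks by similarity to the query and keep the best k."""
    scored = [(cosine_similarity(query_vec, c["vector"]), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]

def candidate_files(query_vec, chunks, k=2):
    """Semantic-layer output: the source files worth reading in full."""
    files = []
    for chunk in top_k_chunks(query_vec, chunks, k):
        if chunk["source"] not in files:
            files.append(chunk["source"])
    return files

# Hypothetical toy data: three chunks with 3-dimensional vectors.
CHUNKS = [
    {"source": "fda_guide.pdf", "vector": [0.9, 0.1, 0.0]},
    {"source": "ada_guide.pdf", "vector": [0.7, 0.3, 0.1]},
    {"source": "unrelated.pdf", "vector": [0.0, 0.1, 0.9]},
]

FILES_TO_READ = candidate_files([1.0, 0.0, 0.0], CHUNKS, k=2)
```

The file-system layer would then open only the files in `FILES_TO_READ`, rather than scanning every document.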

The Ingestion Pipeline:

  1. Parsing: Documents are processed via LlamaIndex Light Parser, which handles complex layouts (tables, free-flowing text).
  2. Visual Extraction: If a page contains images or graphs, the system takes a screenshot and stores the file path separately.
  3. Chunking & Embedding: Text is split into chunks, and each chunk is converted into a high-dimensional vector with a Gemini embedding model.
  4. Storage: Data is stored in Milvus, including the source document, extracted text, embedding vectors, image paths, and metadata for filtering.
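
The pipeline steps above can be sketched as follows, assuming a simple fixed-size chunker with overlap. The summary does not state the chunking strategy, so `chunk_text` and its parameters are assumptions. `toy_embed` is a deterministic placeholder for the Gemini embedding call and carries no semantic meaning, and the record layout only mirrors the fields listed in the storage step rather than an actual Milvus schema.

```python
import hashlib

def chunk_text(text, size=200, overlap=50):
    """Split text into character windows of `size` that overlap by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
        start += size - overlap
    return chunks

def toy_embed(text, dim=8):
    """Placeholder embedding (hypothetical): deterministic, NOT semantic."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]

def build_records(source, pages, embed=toy_embed):
    """Turn parsed pages into records ready for a vector store.

    `pages` is a list of dicts with a 'text' key and an optional
    'image_path' key pointing at a stored page screenshot.
    """
    records = []
    for page_no, page in enumerate(pages, start=1):
        for chunk in chunk_text(page["text"]):
            records.append({
                "source": source,
                "text": chunk,
                "vector": embed(chunk),
                "image_path": page.get("image_path"),
                "metadata": {"page": page_no},
            })
    return records

# Hypothetical input: one long page with a screenshot, one short page without.
PAGES = [
    {"text": "a" * 250, "image_path": "shots/p1.png"},
    {"text": "short page"},
]
RECORDS = build_records("guide.pdf", PAGES)
```

Keeping the image path and page number on every record is what lets the agent later jump from a matching chunk back to the original page or its screenshot.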

2. The Retrieval Process

The agent follows a multi-step reasoning loop:

  1. Planning: The agent analyzes the user's query and creates a plan.
  2. Initial Scan: It performs a semantic search to identify candidate documents.
  3. Deep Dive: It uses file-system tools to read the identified documents in detail.
  4. Reasoning & Backtracking: If the agent determines that the initial retrieval missed critical information, it triggers a secondary search or scans additional files to ensure the final response is well-grounded.
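
A minimal sketch of this loop, under the assumption that searching, reading, judging sufficiency, and refining the query can each be modeled as a plain callable. In the system described, the model itself makes these decisions through tool calls, so every function name here is hypothetical.

```python
def answer_with_backtracking(question, search, read_file, is_sufficient,
                             refine_query, max_rounds=3):
    """Search, read, check coverage, and re-query until coverage is enough."""
    query = question
    opened = []   # files already read, in the order they were opened
    context = []  # text gathered so far
    for _ in range(max_rounds):
        for path in search(query):               # initial scan or secondary search
            if path not in opened:
                opened.append(path)
                context.append(read_file(path))  # deep dive into the file
        if is_sufficient(question, context):
            break
        query = refine_query(question, context)  # backtrack with a new query
    return opened, context

# Hypothetical stand-ins for the agent's tools and judgments.
DOCS = {
    "a.txt": "dosage table",
    "b.txt": "food interactions: grapefruit",
}

def search(query):
    return ["b.txt"] if "interaction" in query else ["a.txt"]

def read_file(path):
    return DOCS[path]

def is_sufficient(question, context):
    return any("grapefruit" in text for text in context)

def refine_query(question, context):
    return question + " interaction"

OPENED, CONTEXT = answer_with_backtracking(
    "blood pressure food", search, read_file, is_sufficient, refine_query
)
```

In this toy run the first search surfaces only `a.txt`, the sufficiency check fails, and the refined query brings in `b.txt`. The `max_rounds` cap is an added safeguard so the loop always terminates.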

3. Technical Specifications: Milvus

Milvus is chosen for its scalability, performance, deployment flexibility, and permissive license:

  • Scalability: Supports distributed, Kubernetes-native architecture capable of handling billions of vectors.
  • Performance: Runs on CPU and also supports GPU-accelerated indexing and search.
  • Flexibility: Offers "Milvus Lite" for single-file local setups and "Zilliz Cloud" for fully managed, hosted solutions.
  • Licensing: Apache 2.0, making it suitable for commercial applications.

4. Real-World Application: Medical Data Analysis

The video demonstrates the system using complex medical documents (e.g., FDA/ADA guidelines) that contain mixed media (tables, images, text).

  • Simple Query: "What are the side effects of [medicine]?" The agent uses semantic search to find the relevant table and provides a grounded answer.
  • Comparative Query: "Compare diabetes medication guides from FDA vs. ADA." The agent performs query decomposition, scanning both specific files and reasoning through the differences.
  • Open-ended/Complex Query: "What food interactions should a patient on blood pressure medication be aware of?" The agent executes a 16-step process, iteratively refining its search queries based on the context gathered to ensure comprehensive coverage.

5. Notable Quotes

  • "The system is smart enough that it can go back and look at a specific document if it was missed during the initial retrieval process."
  • "Instead of retrieval of information, you can use this system as a memory system for your agent."

6. Synthesis and Conclusion

This architecture moves beyond standard RAG by integrating agentic reasoning with multi-modal document parsing. By combining the speed of vector databases (Milvus) with the precision of file-system-level inspection, the system overcomes the limitations of traditional RAG, which often struggles with complex document layouts or multi-document synthesis. The ability to "backtrack" and perform iterative searches makes this a robust framework for long-horizon, knowledge-intensive tasks.