RAG is Dead. Again. (Claude Agent SDK + Memory)
By Prompt Engineering
Key Concepts
- Agentic RAG (Retrieval-Augmented Generation): An autonomous system where an AI agent uses tools to retrieve, reason, and synthesize information from complex documents.
- Multi-layered Memory System: A two-layer architecture that combines semantic vector search with file-system-based tool access.
- Milvus: A high-performance, scalable, open-source vector database used for storing and querying high-dimensional embeddings.
- LlamaIndex LiteParse: A specialized tool for parsing complex PDF structures, including text, tables, and images.
- Semantic Similarity Search: A method of retrieving information based on the meaning of the query rather than exact keyword matching.
- Backtracking/Reasoning Loop: The agent’s ability to evaluate its own search results and re-query or scan additional documents if the initial retrieval is insufficient.
1. System Architecture and Methodology
The system gives Claude (via the Claude Agent SDK or Claude Code) a persistent memory layer. It works in two layers:
- Semantic Layer: Uses Milvus to perform high-speed vector searches. This reduces the search space by identifying the most relevant document chunks based on cosine similarity.
- File System Layer: Provides the agent with "bash-like" tools to scan, parse, and read specific documents. This allows the agent to perform deep dives into files that the semantic search identified as relevant.
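The two layers above can be sketched in a few lines. This is a minimal illustration, not the video's implementation: the in-memory `index` list stands in for Milvus, the three-dimensional vectors are made up, and `semantic_search` and `read_file` are hypothetical names chosen for this example.

```python
import math
import tempfile
from pathlib import Path


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, index, top_k=1):
    """Semantic layer: rank stored rows by cosine similarity to the query."""
    ranked = sorted(
        index,
        key=lambda row: cosine_similarity(query_vec, row["vector"]),
        reverse=True,
    )
    return ranked[:top_k]


def read_file(path):
    """File system layer: return the full text of one document."""
    return Path(path).read_text(encoding="utf-8")


# Toy corpus: two files on disk plus made-up vectors (assumption: a real
# system would get these vectors from an embedding model).
workdir = Path(tempfile.mkdtemp())
(workdir / "drug_a.txt").write_text(
    "Drug A: common side effects include nausea.", encoding="utf-8"
)
(workdir / "drug_b.txt").write_text("Drug B: take with food.", encoding="utf-8")

index = [
    {"path": str(workdir / "drug_a.txt"), "vector": [0.9, 0.1, 0.0]},
    {"path": str(workdir / "drug_b.txt"), "vector": [0.1, 0.9, 0.0]},
]

# Step 1 narrows the search space; step 2 reads the winning file in full.
hits = semantic_search([1.0, 0.0, 0.0], index, top_k=1)
full_text = read_file(hits[0]["path"])
print(full_text)
```

The vector search only has to pick the right file; the answer itself comes from reading that file, which is why the second layer can recover detail that a single chunk would miss.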
The Ingestion Pipeline:
- Parsing: Documents are processed via LlamaIndex LiteParse, which handles complex layouts (tables, free-flowing text).
- Visual Extraction: If a page contains images or graphs, the system takes a screenshot and stores the file path separately.
- Chunking & Embedding: Text is chunked and converted into high-dimensional embeddings using Gemini embeddings.
- Storage: Data is stored in Milvus, including the source document, extracted text, embedding vectors, image paths, and metadata for filtering.
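The pipeline above could produce records shaped roughly like this. It is a sketch under stated assumptions: `toy_embed` is a hash-based stand-in for the Gemini embedding call, the chunk size and overlap are arbitrary, and the `ChunkRecord` field names are hypothetical rather than the actual Milvus schema used in the video.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


def chunk_text(text, size=200, overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
        start += size - overlap
    return chunks


def toy_embed(text, dim=8):
    """Stand-in embedding (assumption): hashed bag of words, NOT a real model."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return vec


@dataclass
class ChunkRecord:
    source: str                       # source document
    text: str                         # extracted text of the chunk
    vector: list                      # embedding vector
    image_path: Optional[str] = None  # screenshot path, if the page had visuals
    metadata: dict = field(default_factory=dict)  # fields used for filtering


def ingest(source, pages, issuer):
    """Turn parsed pages into records ready to insert into a vector store."""
    records = []
    for page_no, page in enumerate(pages, start=1):
        for chunk in chunk_text(page["text"]):
            records.append(
                ChunkRecord(
                    source=source,
                    text=chunk,
                    vector=toy_embed(chunk),
                    image_path=page.get("image_path"),
                    metadata={"page": page_no, "issuer": issuer},
                )
            )
    return records


# Hypothetical parser output: one long text page, one page with a screenshot.
pages = [
    {"text": "x" * 450, "image_path": None},
    {"text": "Table 2: dosage", "image_path": "screens/page_2.png"},
]
records = ingest("guide.pdf", pages, issuer="FDA")
print(len(records), records[-1].image_path)
```

Keeping the image path and metadata next to each vector is what lets a later search filter by source and hand the agent a screenshot to inspect.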
2. The Retrieval Process
The agent follows a four-step reasoning loop:
- Planning: The agent analyzes the user's query and creates a plan.
- Initial Scan: It performs a semantic search to identify candidate documents.
- Deep Dive: It uses file-system tools to read the identified documents in detail.
- Reasoning & Backtracking: If the agent determines that the initial retrieval missed critical information, it triggers a secondary search or scans additional files to ensure the final response is well-grounded.
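The loop above can be sketched as plain control flow. In the real system the model itself decides when the evidence is sufficient and how to rephrase the query; here those decisions are passed in as ordinary functions. All names (`retrieve_with_backtracking`, `search`, `read`, `is_sufficient`, `refine`) and the toy corpus are assumptions made for this example.

```python
def retrieve_with_backtracking(question, search, read, is_sufficient, refine,
                               max_rounds=3):
    """Search, read, check, and re-query until the notes are good enough."""
    notes = {}   # path -> text read so far
    trace = []   # every query issued, in order
    query = question
    for _ in range(max_rounds):
        trace.append(query)
        for path in search(query):        # initial scan
            if path not in notes:
                notes[path] = read(path)  # deep dive
        if is_sufficient(question, notes):
            break
        query = refine(question, notes)   # backtrack with a new query
    return notes, trace


# Toy stand-ins for the tools and for the model's judgment.
corpus = {
    "bp_guide.txt": "ACE inhibitors: see the interactions appendix.",
    "interactions.txt": "Avoid high-potassium salt substitutes.",
}
keyword_index = {
    "blood pressure": ["bp_guide.txt"],
    "appendix": ["interactions.txt"],
}


def search(query):
    """Return paths whose keyword appears in the query."""
    return [p for key, paths in keyword_index.items()
            if key in query.lower() for p in paths]


def read(path):
    return corpus[path]


def is_sufficient(question, notes):
    """Toy check: stop once some note contains concrete advice."""
    return any("avoid" in text.lower() for text in notes.values())


def refine(question, notes):
    """Toy refinement: follow the pointer found in the first document."""
    return "interactions appendix"


notes, trace = retrieve_with_backtracking(
    "food interactions for blood pressure medication",
    search, read, is_sufficient, refine,
)
print(trace)
print(sorted(notes))
```

The `max_rounds` cap is a design choice for the sketch: it keeps a loop that never becomes "sufficient" from running forever.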
3. Technical Specifications: Milvus
Milvus is chosen for four reasons:
- Scalability: Supports a distributed, Kubernetes-native architecture capable of handling billions of vectors.
- Performance: Supports both CPU and GPU acceleration.
- Flexibility: Offers "Milvus Lite" for single-file local setups and "Zilliz Cloud" for fully managed, hosted solutions.
- Licensing: Apache 2.0, making it suitable for commercial applications.
4. Real-World Application: Medical Data Analysis
The video demonstrates the system using complex medical documents (e.g., FDA/ADA guidelines) that contain mixed media (tables, images, text).
- Simple Query: "What are the side effects of [medicine]?" The agent uses semantic search to find the relevant table and provides a grounded answer.
- Comparative Query: "Compare diabetes medication guides from FDA vs. ADA." The agent performs query decomposition, scanning both specific files and reasoning through the differences.
- Open-ended/Complex Query: "What food interactions should a patient on blood pressure medication be aware of?" The agent executes a 16-step process, iteratively refining its search queries based on the context gathered to ensure comprehensive coverage.
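The comparative query above relies on query decomposition plus metadata filtering. A minimal sketch of that idea follows, assuming each stored chunk carries an `issuer` field; the record texts are placeholders, and `decompose` and `filtered_search` are hypothetical helpers, not functions from the video or from any library.

```python
def decompose(topic, issuers):
    """Split one comparative question into one sub-query per source."""
    return [{"query": topic, "issuer": issuer} for issuer in issuers]


def filtered_search(sub_query, records):
    """Keep chunks from the requested issuer that mention every query term."""
    terms = sub_query["query"].lower().split()
    return [
        r["text"]
        for r in records
        if r["issuer"] == sub_query["issuer"]
        and all(term in r["text"].lower() for term in terms)
    ]


# Placeholder chunks (assumption): real records would hold parsed guideline text.
records = [
    {"issuer": "FDA", "text": "Diabetes medication guide: placeholder FDA text."},
    {"issuer": "ADA", "text": "Diabetes medication guide: placeholder ADA text."},
    {"issuer": "FDA", "text": "Blood pressure guide: placeholder text."},
]

# One filtered search per source; the agent then reasons over both result sets.
evidence = {
    sq["issuer"]: filtered_search(sq, records)
    for sq in decompose("diabetes medication", ["FDA", "ADA"])
}
for issuer, texts in evidence.items():
    print(issuer, texts)
```

Running each sub-query against a single source keeps the evidence for the two sides separate, so the final comparison can cite which document each claim came from.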
5. Notable Quotes
- "The system is smart enough that it can go back and look at a specific document if it was missed during the initial retrieval process."
- "Instead of retrieval of information, you can use this system as a memory system for your agent."
6. Synthesis and Conclusion
This architecture moves beyond standard RAG by integrating agentic reasoning with multi-modal document parsing. By combining the speed of vector databases (Milvus) with the precision of file-system-level inspection, the system overcomes the limitations of traditional RAG, which often struggles with complex document layouts or multi-document synthesis. The ability to "backtrack" and perform iterative searches makes this a robust framework for long-horizon, knowledge-intensive tasks.