RAG is Dead. Again. (Claude Agent SDK + Memory)

By Prompt Engineering

Key Concepts

Agentic RAG (Retrieval-Augmented Generation): An autonomous system in which an AI agent uses tools to retrieve, reason over, and synthesize information from complex documents.
Multi-layered Memory System: A two-layer architecture that combines semantic vector search with file-system tool access.
Milvus: A high-performance, scalable, open-source vector database used for storing and querying high-dimensional embeddings.
LlamaIndex Light Parser: A specialized tool for parsing complex PDF structures, including text, tables, and images.
Semantic Similarity Search: A method of retrieving information based on the meaning of the query rather than exact keyword matching.
Backtracking/Reasoning Loop: The agent’s ability to evaluate its own search results and re-query or scan additional documents if the initial retrieval is insufficient.
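Semantic similarity search needs a score for "closeness in meaning", and the summary names cosine similarity for the semantic layer in Section 1. As a reference formula (standard definition, not taken from the video), for a query embedding q and a chunk embedding d, each with n dimensions:

```latex
\cos(\mathbf{q}, \mathbf{d})
  = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}
  = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^2} \, \sqrt{\sum_{i=1}^{n} d_i^2}}
```

For example, q = (1, 0) and d = (1, 1) give 1 / (1 × √2) ≈ 0.707. Scores range from -1 to 1, and a higher score means the chunk points in a more similar direction to the query.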

1. System Architecture and Methodology

The system provides Claude (via Agent SDK or Claude Code) with a persistent memory layer. It operates through a two-pronged approach:

Semantic Layer: Uses Milvus for fast vector search, narrowing the search space to the document chunks whose embeddings are most similar to the query by cosine similarity.
File System Layer: Provides the agent with "bash-like" tools to scan, parse, and read specific documents. This allows the agent to perform deep dives into files that the semantic search identified as relevant.
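A minimal, stdlib-only sketch of how the two layers can work together. It is not the video's code: an in-memory list with hand-written 3-dimensional vectors stands in for the Milvus collection, and `semantic_search`, `read_document`, and `two_layer_lookup` are hypothetical names. The semantic layer ranks chunks by cosine similarity; the file-system layer then reads each matching source file in full.

```python
import math
import tempfile
from pathlib import Path


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_search(query_vec: list[float], chunks: list[dict], top_k: int = 2) -> list[dict]:
    """Semantic layer: rank stored chunks by cosine similarity and keep the top_k."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return ranked[:top_k]


def read_document(root: Path, name: str) -> str:
    """File-system layer: a 'cat'-like tool that returns a whole source document."""
    return (root / name).read_text(encoding="utf-8")


def two_layer_lookup(query_vec: list[float], chunks: list[dict], root: Path,
                     top_k: int = 2) -> dict[str, str]:
    """Narrow the search space with vectors, then read the matching files in full."""
    hits = semantic_search(query_vec, chunks, top_k)
    # Keep each source file once, in rank order.
    sources = list(dict.fromkeys(hit["source"] for hit in hits))
    return {name: read_document(root, name) for name in sources}


if __name__ == "__main__":
    # Toy vectors stand in for real embeddings.
    chunks = [
        {"source": "fda_guide.txt", "vector": [0.9, 0.1, 0.0]},
        {"source": "ada_guide.txt", "vector": [0.1, 0.9, 0.0]},
        {"source": "fda_guide.txt", "vector": [0.8, 0.2, 0.1]},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "fda_guide.txt").write_text("Full FDA medication guide text.", encoding="utf-8")
        (root / "ada_guide.txt").write_text("Full ADA standards of care text.", encoding="utf-8")
        print(sorted(two_layer_lookup([1.0, 0.0, 0.0], chunks, root)))
```

Here the demo prints `['fda_guide.txt']`: both top-ranked chunks come from the same file, so the agent opens that document once and reads it whole instead of relying on the two fragments.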

The Ingestion Pipeline:

Parsing: Documents are processed via LlamaIndex Light Parser, which handles complex layouts (tables, free-flowing text).
Visual Extraction: If a page contains images or graphs, the system takes a screenshot of it and stores the screenshot's file path separately from the extracted text.
Chunking & Embedding: Text is split into chunks, and each chunk is converted into a high-dimensional embedding with a Gemini embedding model.
Storage: Each record in Milvus holds the source document, the extracted text, the embedding vector, any image path, and metadata for filtering.
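The chunking, embedding, and storage steps can be sketched as follows, assuming each page has already been parsed to plain text. This is an illustration, not the video's pipeline: the LlamaIndex parsing step is not reproduced, `toy_embed` is a hypothetical word-hashing stand-in for the Gemini embedding call, the 200-character size and 50-character overlap are arbitrary assumptions, and `ChunkRecord` simply mirrors the fields listed above.

```python
import hashlib
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChunkRecord:
    """One stored row per chunk: source, text, vector, image path, filter metadata."""
    source: str
    text: str
    vector: list[float]
    image_path: Optional[str] = None
    metadata: dict = field(default_factory=dict)


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Hypothetical stand-in for a real embedding model: hash words into buckets, L2-normalize."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def ingest_page(source: str, page_no: int, page_text: str,
                screenshot_path: Optional[str] = None) -> list[ChunkRecord]:
    """Chunk and embed one parsed page; pages with visuals carry their screenshot path."""
    return [
        ChunkRecord(
            source=source,
            text=chunk,
            vector=toy_embed(chunk),
            image_path=screenshot_path,
            metadata={"page": page_no, "chunk_index": i},
        )
        for i, chunk in enumerate(chunk_text(page_text))
    ]


if __name__ == "__main__":
    records = ingest_page("ada_guide.pdf", 12, "Table of oral agents and side effects. " * 12,
                          screenshot_path="screenshots/ada_guide_p12.png")
    print(len(records), records[0].metadata, records[0].image_path)
```

Overlapping windows keep a sentence that straddles a boundary intact in at least one chunk; the page number in `metadata` is the kind of field a filtered search can later restrict on.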

2. The Retrieval Process

The agent follows a multi-step reasoning loop:

Planning: The agent analyzes the user's query and creates a plan.
Initial Scan: It performs a semantic search to identify candidate documents.
Deep Dive: It uses file-system tools to read the identified documents in detail.
Reasoning & Backtracking: If the agent determines that the initial retrieval missed critical information, it triggers a secondary search or scans additional files to ensure the final response is well-grounded.
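The loop above can be sketched as a bounded search, read, check, and re-query cycle. This is a hedged illustration rather than the Claude Agent SDK's API: `search`, `read`, `is_sufficient`, and `refine` are hypothetical callbacks standing in for the semantic-search tool, the file-reading tool, and the model's own judgments, and the planning step is reduced to the initial query.

```python
from typing import Callable


def agentic_retrieve(
    query: str,
    search: Callable[[str], list[str]],
    read: Callable[[str], str],
    is_sufficient: Callable[[str, list[str]], bool],
    refine: Callable[[str, list[str]], str],
    max_rounds: int = 3,
) -> list[str]:
    """Search, read, check coverage, and re-query until the evidence looks sufficient."""
    evidence: list[str] = []
    seen: set[str] = set()
    current_query = query
    for _ in range(max_rounds):
        # Initial or follow-up scan: semantic search proposes candidate document IDs.
        for doc_id in search(current_query):
            if doc_id not in seen:
                seen.add(doc_id)
                # Deep dive: read each newly found document in full.
                evidence.append(read(doc_id))
        # Reasoning: stop once the evidence covers the original question.
        if is_sufficient(query, evidence):
            break
        # Backtracking: build a follow-up query from what is still missing.
        current_query = refine(query, evidence)
    return evidence


if __name__ == "__main__":
    # Toy index and corpus; the lambdas stand in for the model's decisions.
    index = {"metformin": ["fda_guide"], "metformin food": ["ada_guide"]}
    docs = {"fda_guide": "FDA: metformin side effects", "ada_guide": "ADA: metformin and food"}
    found = agentic_retrieve(
        "metformin",
        search=lambda q: index.get(q, []),
        read=docs.__getitem__,
        is_sufficient=lambda q, ev: any("food" in e for e in ev),
        refine=lambda q, ev: q + " food",
    )
    print(found)
```

In the demo, the first round finds only the FDA guide, the sufficiency check fails, and the refined query pulls in the ADA guide. The `max_rounds` cap keeps a long-horizon search from looping indefinitely when the check never passes.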

3. Technical Specifications: Milvus

Milvus is chosen for its scalability and performance:

Scalability: Offers a distributed, Kubernetes-native architecture capable of handling billions of vectors.
Performance: Supports both CPU and GPU acceleration.
Flexibility: Offers Milvus Lite, which runs locally and stores data in a single file, and Zilliz Cloud, a fully managed hosted service.
Licensing: Apache 2.0, making it suitable for commercial applications.

4. Real-World Application: Medical Data Analysis

The video demonstrates the system using complex medical documents (e.g., FDA/ADA guidelines) that contain mixed media (tables, images, text).

Simple Query: "What are the side effects of [medicine]?" The agent uses semantic search to find the relevant table and provides a grounded answer.
Comparative Query: "Compare diabetes medication guides from FDA vs. ADA." The agent decomposes the query, scans both source files, and reasons through the differences.
Open-ended/Complex Query: "What food interactions should a patient on blood pressure medication be aware of?" The agent executes a 16-step process, iteratively refining its search queries based on the context gathered to ensure comprehensive coverage.
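The comparative query relies on query decomposition plus per-source retrieval. A small sketch of that shape, under loud assumptions: in the video the model decides how to decompose the question, whereas `decompose_comparison` here is a hypothetical regex for "Compare <topic> from A vs. B" phrasing; `filtered_search` applies a metadata filter on `source` before ranking (analogous to the scalar filtering a vector database such as Milvus can apply alongside vector search), and word overlap is a crude stand-in for vector similarity.

```python
import re
from dataclasses import dataclass


@dataclass
class SubQuery:
    text: str
    source: str  # metadata filter value; "*" means search every source


def decompose_comparison(query: str) -> list[SubQuery]:
    """Hypothetical rule-based splitter: one sub-query per compared source."""
    match = re.match(r"compare (.+?) from (\w+) vs\.? (\w+)", query.strip(), flags=re.IGNORECASE)
    if match is None:
        # Not a comparison: fall back to a single unfiltered query.
        return [SubQuery(query, source="*")]
    topic, first, second = match.groups()
    return [SubQuery(topic, first.lower()), SubQuery(topic, second.lower())]


def filtered_search(sub: SubQuery, chunks: list[dict], top_k: int = 2) -> list[dict]:
    """Apply the metadata filter first, then rank the surviving chunks."""
    terms = set(sub.text.lower().split())
    candidates = [c for c in chunks if sub.source == "*" or c["source"] == sub.source]
    # Word overlap is a crude stand-in for vector similarity in this sketch.
    candidates.sort(key=lambda c: len(terms & set(c["text"].lower().split())), reverse=True)
    return candidates[:top_k]


if __name__ == "__main__":
    chunks = [
        {"source": "fda", "text": "FDA medication guide for metformin"},
        {"source": "ada", "text": "ADA diabetes medication standards"},
        {"source": "ada", "text": "ADA nutrition therapy"},
    ]
    for sub in decompose_comparison("Compare diabetes medication guides from FDA vs. ADA."):
        print(sub.source, [c["text"] for c in filtered_search(sub, chunks, top_k=1)])
```

Filtering per source guarantees that each side of the comparison gets its own evidence, so one guideline cannot crowd the other out of the top results.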

5. Notable Quotes

"The system is smart enough that it can go back and look at a specific document if it was missed during the initial retrieval process."
"Instead of retrieval of information, you can use this system as a memory system for your agent."

6. Synthesis and Conclusion

This architecture moves beyond standard RAG by integrating agentic reasoning with multi-modal document parsing. By combining the speed of vector databases (Milvus) with the precision of file-system-level inspection, the system overcomes the limitations of traditional RAG, which often struggles with complex document layouts or multi-document synthesis. The ability to "backtrack" and perform iterative searches makes this a robust framework for long-horizon, knowledge-intensive tasks.
