The Complete Agentic RAG Build: 8 Modules, 2+ Hours, Full Stack

By The AI Automators

Key Concepts

  • RAG Evolution: The shift from basic RAG to agentic RAG, incorporating hybrid retrieval strategies and tools beyond vector search.
  • Agentic Capabilities: Utilizing agents, sub-agents, and sandboxed execution environments for flexible data access and processing.
  • Iterative Development: A collaborative “plan, build, validate” loop with AI coding assistance (Claude Code) and version control via Git.
  • Performance Optimization: Addressing bottlenecks in document ingestion and leveraging concurrency to improve processing speed.
  • Security Considerations: Implementing robust security measures, particularly Row-Level Security (RLS) in Supabase, to protect sensitive data.
  • Modular Design: Building a RAG system with interchangeable components (LLMs, embedding models, vector stores) for customization and cost efficiency.

Project Overview & Initial Setup (Part 1)

The project aims to build a feature-rich, agentic RAG web application on a modern tech stack: a Python (FastAPI) backend, a React frontend (TypeScript, Tailwind CSS, shadcn/ui, Vite), Supabase (vector search and storage), and Docling (document processing). The build supports both cloud AI models (Claude, OpenAI, OpenRouter) and local models (Qwen 3 via LM Studio) for air-gapped deployment. The development methodology centers on a collaborative AI dev loop: PRD definition, planning with Claude Code (“plan mode”), building with Claude Code, validation (manual and automated), and iteration. Initial setup involved cloning a GitHub repository, installing dependencies, and configuring environment variables. The foundational RAG application was validated against a Whirlpool refrigerator document, demonstrating successful retrieval and response generation. LangSmith integration was established for observability and debugging.
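The core loop the build implements (retrieve relevant chunks, then ground the model's answer in them) can be sketched in a few lines. This is a minimal illustration only: `retrieve` uses naive keyword overlap as a stand-in for the real vector search, and the LLM call itself is omitted; all names here are hypothetical, not the project's actual API.

```python
# Minimal RAG-loop sketch: rank chunks by relevance, assemble a grounded prompt.
# Keyword overlap stands in for embedding similarity; no LLM call is made.

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks sharing the most words with the query."""
    q_terms = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda c: len(q_terms & set(c.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Assemble a prompt that instructs the model to stay grounded in context."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

if __name__ == "__main__":
    manual = [
        "To reset the ice maker, hold the reset button for three seconds.",
        "The water filter should be replaced every six months.",
        "Warranty coverage lasts one year from purchase.",
    ]
    question = "How do I reset the ice maker?"
    print(build_prompt(question, retrieve(manual=None or question, chunks=manual)))
```

In the real build, `retrieve` is backed by vector search over Supabase and the prompt is sent to the configured LLM; the structure of the loop is the same.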

Transition to Self-Hosted RAG & Debugging (Part 2)

The project transitioned from OpenAI’s managed RAG service to a self-hosted solution for greater control and cost efficiency. Initial testing confirmed basic RAG functionality within the managed service. A significant challenge involved resolving tracing issues in LangSmith, which was ultimately addressed through debugging with Cursor’s debug mode and multiple agents. Git version control was implemented, and appliance manuals (9,000 PDFs, initially testing with 150) were uploaded to the managed service for testing. Its limitations – no visibility into chunking, embedding models, or retrieval tools, along with associated costs ($0.10/GB storage, $2.50 per 1,000 tool calls) – motivated the move to a self-hosted approach. Plans were made to integrate OpenRouter and a locally hosted AI server (Qwen 3 VL 30B) and to use pgvector within Supabase as the vector database. Module 1 was completed and released as version 0.2 on GitHub.
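Self-hosting the vector store with pgvector means similarity ranking happens inside Postgres, typically via an `ORDER BY embedding <=> :query_vec LIMIT k` clause (pgvector's cosine-distance operator). As a database-free illustration of what that operator computes, here is the same ranking in plain Python; the row shapes and IDs are invented for the example.

```python
# Pure-Python stand-in for pgvector's cosine-distance ranking (`<=>`).
# In the real system this runs server-side in Postgres over stored embeddings.
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; 0 means identical direction, 2 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_k(query_vec: list[float], rows: list[tuple[str, list[float]]], k: int = 3):
    """rows: (chunk_id, embedding) pairs; mirrors ORDER BY embedding <=> q LIMIT k."""
    return sorted(rows, key=lambda r: cosine_distance(query_vec, r[1]))[:k]
```

Swapping the embedding model or vector store only changes what produces these vectors and where they live; the retrieval contract stays the same, which is what makes the components interchangeable.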

Expanding Capabilities & Performance Optimization (Part 3)

Performance optimization was a primary focus, addressing initial memory consumption of 8 GB during document ingestion. Claude was used to analyze the process and identify bottlenecks: files were being loaded twice, and there were no concurrency limits. Increasing concurrent ingestions to 10 provided a modest improvement. Module 7 introduced two new tools: text-to-SQL (querying a Supabase “order history” table through a read-only database user for security) and web search (using the Tavily API). Module 8 introduced sub-agents to overcome context window limitations, allowing the main agent to delegate complex tasks such as full-document analysis to isolated sub-agents. Debugging involved resolving issues with database functions, Supabase migrations, UI rendering, and sub-agent document access. The context window was increased to 70,000 tokens.
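The concurrency fix described above, running up to 10 ingestions in parallel while keeping resource use bounded, is typically done with a semaphore. A minimal sketch, assuming an `ingest` placeholder in place of the real Docling parse/chunk/embed pipeline:

```python
# Bounded-concurrency ingestion sketch: a semaphore caps how many documents
# are processed simultaneously. `ingest` is a placeholder, not the real pipeline.
import asyncio

MAX_CONCURRENT = 10  # the limit the build settled on

active = 0  # instrumentation to observe the cap in action
peak = 0

async def ingest(doc: str) -> str:
    """Placeholder for parsing, chunking, and embedding one document."""
    global active, peak
    active += 1
    peak = max(peak, active)
    await asyncio.sleep(0.001)  # simulate I/O-bound work
    active -= 1
    return f"ingested:{doc}"

async def ingest_all(docs: list[str]) -> list[str]:
    sem = asyncio.Semaphore(MAX_CONCURRENT)

    async def bounded(doc: str) -> str:
        async with sem:  # at most MAX_CONCURRENT ingestions hold the semaphore
            return await ingest(doc)

    return await asyncio.gather(*(bounded(d) for d in docs))
```

The semaphore is what turns "no concurrency limits" into a tunable knob: raising `MAX_CONCURRENT` trades memory for throughput without restructuring the pipeline.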

Current Status & Future Directions (Part 4)

The project reached an “alpha” stage, featuring a fully functional agentic RAG system with document management, ingestion (Docling, supporting PDF, DOCX, and PowerPoint, with VLM integration), and a chat interface. The application supports LLM swapping, embedding configuration, rerankers, and web search. User authentication and data isolation are handled by Supabase with Row-Level Security (RLS). A critical security flaw – the lack of RLS on the “sales data” table – was identified and flagged as a priority for remediation. The developer emphasized the extensive work remaining and positioned this project as the first in a series, directing viewers to over 30 related videos on the channel covering advanced RAG topics, including RAG design patterns, deep evaluation, zero-trust RAG, knowledge graphs, and context expansion.
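RLS enforces data isolation inside Postgres itself, via a policy along the lines of `USING (auth.uid() = user_id)`, so no application code can forget to filter. The guarantee it provides can be modeled in plain Python; `rls_filter`, the column names, and the users below are hypothetical, purely to show why an unprotected table (like the “sales data” table flagged above) leaks rows to every authenticated user.

```python
# Conceptual model of Supabase RLS: every query is implicitly restricted to
# rows owned by the requesting user. Real enforcement is a Postgres policy,
# not application code; this just models the guarantee.

def rls_filter(rows: list[dict], current_user: str) -> list[dict]:
    """Return only the rows owned by current_user, as an RLS policy would."""
    return [r for r in rows if r.get("user_id") == current_user]

documents = [
    {"id": 1, "user_id": "alice", "title": "fridge manual"},
    {"id": 2, "user_id": "bob", "title": "oven manual"},
]
```

A table without such a policy behaves as if this filter never runs: every user's query sees both rows, which is exactly the flaw identified in the alpha.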

Conclusion

This project demonstrates a comprehensive approach to building a modern, agentic RAG application. The iterative development process, leveraging AI assistance and emphasizing security and performance optimization, showcases a practical pathway for grounding AI systems in private data. While the current alpha version requires further testing and refinement, it represents a significant step towards a powerful and customizable RAG solution. The project’s modular design and focus on open-source tools provide a flexible foundation for future expansion and innovation.
