OpenRAG: An open-source stack for RAG — Phil Nash

By AI Engineer


Key Concepts

  • RAG (Retrieval-Augmented Generation): A framework for improving LLM responses by grounding them in external, private, or domain-specific data.
  • Agentic Retrieval: An advanced RAG approach where an AI agent autonomously decides when and how to search for information using tools, rather than relying on a static retrieval pipeline.
  • Docling: An IBM-developed open-source library for parsing and structuring complex documents (PDFs, HTML, Word, etc.) into machine-readable formats.
  • OpenSearch: A distributed, open-source search and analytics engine used here for hybrid (vector + keyword) search.
  • Langflow: A visual, drag-and-drop orchestration tool for building AI agents and workflows.
  • JVector: A high-performance, disk-based vector index that allows for live indexing without requiring the entire dataset to reside in RAM.

1. The "RAG is Dead" Debate

Phil Nash addresses the common industry sentiment that RAG is obsolete due to increasing LLM context windows. He argues that RAG is not "dead" but rather "solved" in theory, while remaining difficult in practice. Challenges include:

  • Data Complexity: Handling diverse formats like PDFs, slides, and spreadsheets.
  • Pipeline Maintenance: The need for constant tuning of chunking strategies, embedding models, and search techniques (e.g., re-ranking, query rewriting).
  • Cost/Efficiency: The impracticality of passing millions of tokens into a context window for every query.

2. OpenRAG: The Architecture

OpenRAG is an open-source stack designed by IBM to provide a high-quality, extensible baseline for RAG systems. It integrates three core technologies:

  • Ingestion (Docling):
    • Pipelines: Offers specialized pipelines for different media. Standard pipelines handle text-heavy files; ASR (Automatic Speech Recognition) handles audio/video; and PDF pipelines use layout analysis, table extraction, and optional OCR for scanned documents.
    • Intermediate Representation: Docling converts documents into "Doc Tags" (XML-like structure), which preserves hierarchy and allows for intelligent, context-aware chunking.
  • Indexing (OpenSearch + JVector):
    • Uses JVector for the K-Nearest Neighbor (kNN) plugin, enabling disk-based indexing that scales beyond memory limits.
    • Supports Hybrid Search, combining vector similarity with keyword-based filtering for higher precision.
  • Orchestration (Langflow):
    • Acts as the "glue" for the system, allowing developers to visually design the agent’s logic, add guardrails, and integrate external tools (e.g., calculators, web search).
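Docling's Doc Tags representation is what enables the context-aware chunking mentioned above: because the section hierarchy survives parsing, each chunk can carry its ancestor headings as context. A minimal sketch of the idea in plain Python (the tree structure and field names here are illustrative, not Docling's actual API):

```python
# Sketch of hierarchy-aware chunking over a Doc-Tags-like tree.
# The node structure is illustrative; Docling's real API differs.

def chunk_tree(node, ancestors=(), chunks=None):
    """Walk a nested document tree, emitting one chunk per text
    block, prefixed with the heading path above it for context."""
    if chunks is None:
        chunks = []
    title = node.get("title")
    path = ancestors + ((title,) if title else ())
    for block in node.get("texts", []):
        # Prepend the heading path so the chunk is self-describing.
        chunks.append({"context": " > ".join(path), "text": block})
    for child in node.get("sections", []):
        chunk_tree(child, path, chunks)
    return chunks

doc = {
    "title": "Annual Report",
    "sections": [
        {"title": "Revenue",
         "texts": ["Revenue grew 12% year over year."]},
        {"title": "Risks",
         "texts": ["Supply chain exposure remains high."]},
    ],
}

chunks = chunk_tree(doc)
# Each chunk records where it came from, e.g.
# {"context": "Annual Report > Revenue", "text": "Revenue grew ..."}
```

Because every chunk carries its heading path, a retrieved snippet stays interpretable even when it lands in the prompt far from its siblings.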
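Hybrid search combines a vector ranking with a keyword (BM25) ranking and fuses the two result lists. One common fusion method, sketched below in plain Python, is reciprocal rank fusion (RRF); OpenSearch's hybrid query performs its own score normalization server-side, so treat this only as an illustration of the principle:

```python
# Reciprocal rank fusion (RRF): fuse ranked lists of doc IDs.
# k dampens the influence of top ranks; 60 is a common default.

def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc3", "doc1", "doc7"]    # nearest-neighbour order
keyword_hits = ["doc1", "doc9", "doc3"]   # BM25 order

fused = rrf([vector_hits, keyword_hits])
# Documents appearing in both lists (doc1, doc3) rise to the top.
```

The appeal of rank-based fusion is that vector similarities and BM25 scores live on incompatible scales; fusing by rank sidesteps the need to normalize them against each other.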

3. Agentic Retrieval vs. Traditional RAG

  • Traditional RAG: A linear process where a query is embedded, a top-K search is performed, and the results are passed to the LLM.
  • Agentic Retrieval: The LLM acts as an agent. It receives the user query and a set of tools. The agent decides if it needs to search, how many times to search, and how to synthesize the results. This allows for multi-step reasoning and more accurate information retrieval.
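The agentic loop can be sketched as a simple decide-act cycle: the model is handed a search tool and chooses on each turn whether to search again or answer. In this sketch the "decision" is a stub heuristic standing in for an LLM with tool-calling, and the search tool is a toy keyword match:

```python
# Agentic retrieval sketch. A real system would let an LLM choose
# the tool and query on each turn; the stop condition here is a
# placeholder heuristic, not a model call.

def search_tool(query, corpus):
    """Toy keyword search over an in-memory corpus."""
    return [doc for doc in corpus if query.lower() in doc.lower()]

def agent(question, corpus, max_steps=3):
    evidence = []
    for _ in range(max_steps):
        hits = search_tool(question, corpus)
        evidence.extend(hits)
        if evidence:
            break  # the "model" decides it has enough context
    return {"question": question, "evidence": evidence}

corpus = [
    "Docling parses PDFs into structured Doc Tags.",
    "OpenSearch supports hybrid vector and keyword search.",
]
result = agent("docling", corpus)
```

The key structural difference from traditional RAG is the loop: retrieval happens zero or more times, under the model's control, rather than exactly once before generation.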

4. Practical Implementation & Customization

  • Local vs. Cloud: The stack supports both cloud-based APIs (OpenAI, Anthropic, IBM watsonx) and local execution via Ollama (e.g., running Granite 3B or Qwen 3 models), making it suitable for air-gapped environments.
  • Knowledge Filters: Users can apply metadata-based filters in OpenSearch to restrict the agent’s search scope to specific document sets.
  • External Sync: The system includes OAuth connectors for Google Drive, SharePoint, and OneDrive, allowing for real-time synchronization of external document stores.
  • Extensibility: Because the backend is built on Langflow, developers can "unlock" the flow to add custom nodes, such as input guardrails or specialized data parsers, without rewriting the core application.
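A knowledge filter like the one described above typically compiles down to a metadata clause attached to the vector query. The shape below follows OpenSearch's k-NN query DSL with its `filter` option, but the field names (`embedding`, `source`) and the tag value are assumptions for illustration:

```python
# Schematic OpenSearch k-NN query with a metadata filter that
# restricts retrieval to one document set. Field names are
# illustrative, not taken from the OpenRAG schema.

query_vector = [0.12, -0.08, 0.33]  # embedding of the user query

knn_query = {
    "size": 5,
    "query": {
        "knn": {
            "embedding": {              # vector field (assumed name)
                "vector": query_vector,
                "k": 5,
                "filter": {             # knowledge filter: metadata scope
                    "term": {"source": "hr-handbook"}
                },
            }
        }
    },
}
```

Filtering inside the k-NN clause (rather than post-filtering the top-K results) matters for recall: the engine can return k matches from within the allowed set instead of discarding hits after the fact.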

5. Notable Quotes

  • "If every business has less than a million tokens worth of data then sure RAG is dead and probably so are all those businesses."
  • "I think that agents and models shouldn't be doing arithmetic. They're language models, not math models—so a calculator is always useful."

6. Synthesis and Conclusion

OpenRAG serves as a modular, opinionated, yet highly flexible framework that addresses the "hard" parts of RAG—specifically document parsing and agentic orchestration. By leveraging Docling for robust ingestion, OpenSearch for scalable indexing, and Langflow for visual agent design, it provides a production-ready baseline. The project (currently v0.4.0) emphasizes that while the core RAG process is well-understood, the real value lies in the ability to customize the pipeline to fit specific data structures and user interaction patterns.
