HybridRAG: A Fusion of Graph and Vector Retrieval - Mitesh Patel, NVIDIA

Key Concepts

Knowledge Graph, Graph RAG, Hybrid RAG, Triplets (Entity-Relationship-Entity), Ontology, Semantic Vector Database, Vector Embeddings, Chunking, Overlap, Retrieval Strategies (Single-hop, Multi-hop), Latency, cuGraph, NetworkX, Ragas, LLM Evaluation, Faithfulness, Answer Relevancy, Precision, Recall, Helpfulness, Correctness, Coherence, Complexity, Verbosity, Llama 3, LoRA, Data Cleaning, Acceleration.

1. Introduction

Mitesh Patel, from NVIDIA's developer advocate team, introduces Graph RAG (Retrieval-Augmented Generation) systems and explains how they can be combined with semantic vector retrieval in a hybrid setup. The talk draws on a project done with a partner and explores the advantages of Graph RAG and the cases where a hybrid approach helps. The talk does not walk through the codebase in depth; it gives a high-level overview and points to a GitHub link for the relevant notebooks.

2. Knowledge Graphs: A Refresher

Definition: A network representing relationships between entities (people, places, concepts, events).
Example: Mitesh (entity) is a speaker at (relationship) the AI Engineer World's Fair (entity).
Importance: Knowledge graphs excel at exploiting relationships, making them valuable for RAG systems.
Goal: To create triplets that define relationships between entities.
Advantage over Semantic RAG: Captures information between entities in more detail, providing a comprehensive view of knowledge.
Data Organization: Can organize data from multiple sources.
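
As a rough illustration of the triplet idea above, the sketch below stores (entity, relationship, entity) triplets in a plain Python dictionary keyed by the head entity. The function name build_graph and the sample triplets are illustrative assumptions, not code from the talk; a real system would more likely use a graph library such as NetworkX or a graph database.

```python
from collections import defaultdict

# Each triplet is (head entity, relationship, tail entity).
# The values are illustrative, loosely based on the example above.
triplets = [
    ("Mitesh", "is a speaker at", "conference"),
    ("Mitesh", "works at", "NVIDIA"),
]


def build_graph(triplets):
    """Index triplets by head entity so outgoing relationships are easy to look up."""
    graph = defaultdict(list)
    for head, relation, tail in triplets:
        graph[head].append((relation, tail))
    return graph


graph = build_graph(triplets)
mitesh_edges = graph["Mitesh"]  # all (relationship, entity) pairs leaving "Mitesh"
```

Keeping the relationship label on every edge is what lets a later query ask not only whether two entities are connected but how.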

3. Building a Graph RAG System: The Four Components

The process is broken down into four components:

Data: The foundation of the system.
Data Processing: Crucial for knowledge graph quality.
Graph Creation/Semantic Embedding Vector Database Creation: Building the knowledge base.
Inferencing: Querying the system and generating responses.

These components are further divided into offline (data processing, graph creation) and online (querying and response generation) stages.

Offline: Involves building the semantic vector database (documents -> vector embeddings -> vector database) and creating the knowledge graph.
Online: Focuses on querying the knowledge graph or vector database and converting the retrieved information into a user-readable response.
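
The offline/online split can be sketched for the vector side as follows. This is a toy under stated assumptions: embed is a bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and the document strings are made up for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Offline stage: embed every document once and keep the vectors.
documents = [
    "ExxonMobil cut spending on oil and gas exploration",
    "Knowledge graphs store relationships between entities",
]
index = [(doc, embed(doc)) for doc in documents]


# Online stage: embed the query and rank stored documents by similarity.
def retrieve(query, index, top_k=1):
    query_vector = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


top_docs = retrieve("What did ExxonMobil cut?", index)
```

In a full system the retrieved text would then be passed to an LLM to produce the user-readable response; that step is omitted here.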

4. Creating the Knowledge Graph: Triplet Extraction

Challenge: Extracting triplets from unstructured documents.
Example: From an ExxonMobil results document: "ExxonMobil (entity) cut (relationship) spending on oil and gas exploration (entity)."
Solution: Using LLMs and prompt engineering to extract and structure information based on a defined ontology.
Ontology Definition: Defining the specific relationships and entities relevant to the use case.
Prompt Engineering: Crafting prompts that guide the LLM to extract ontology-specific information and structure it into triplets.
Iterative Process: Refining the ontology and prompts through iterative testing and improvement.
Time Investment: The speaker emphasized that about 80% of the time goes into getting the ontology right.
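
A minimal sketch of ontology-guided extraction might look like the following. The ontology contents, the prompt wording, and the "head | relationship | tail" output format are all assumptions made for illustration; the talk does not specify them. The LLM call itself is left out, and a hard-coded string stands in for the model's reply so the parsing step can run on its own.

```python
# Hypothetical ontology for a financial-results use case.
ONTOLOGY = {
    "entity_types": ["Company", "Activity"],
    "relationships": ["cut", "increased"],
}


def build_prompt(text, ontology):
    """Build a prompt that restricts the model to the ontology's vocabulary."""
    entity_types = ", ".join(ontology["entity_types"])
    relationships = ", ".join(ontology["relationships"])
    return (
        "Extract triplets from the text below.\n"
        f"Allowed entity types: {entity_types}\n"
        f"Allowed relationships: {relationships}\n"
        "Return one triplet per line as: head | relationship | tail\n\n"
        f"Text: {text}"
    )


def parse_triplets(llm_output, ontology):
    """Keep well-formed lines whose relationship is in the ontology."""
    allowed = set(ontology["relationships"])
    triplets = []
    for line in llm_output.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 3 or not all(parts):
            continue  # skip malformed lines
        head, relation, tail = parts
        if relation in allowed:
            triplets.append((head, relation, tail))
    return triplets


# Hypothetical model reply, hard-coded so the sketch runs without an LLM.
sample_output = """ExxonMobil | cut | spending on oil and gas exploration
ExxonMobil | acquired
ExxonMobil | sponsored | a conference"""

extracted = parse_triplets(sample_output, ONTOLOGY)
```

Validating the model's output against the ontology, rather than trusting it, is one simple way to see during iteration which prompts drift outside the intended relationships.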

5. Semantic Vector Database Creation

Process:
1. Pick a document (e.g., the first page of the "Attention is All You Need" paper).
2. Break it into chunks of a defined size.
3. Use an overlap between chunks to maintain context.
4. Convert each chunk into a vector embedding using an embedding model.
5. Store the embeddings in a vector database.
Chunk Size and Overlap: Important parameters to consider for maintaining context between chunks.
Graph RAG Advantage: Exploits relationships between entities, which semantic vector databases often miss.
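
Steps 2 and 3 above can be sketched as a word-based chunker. The function name chunk_words and the choice to count words (rather than characters or tokens) are assumptions for illustration; the embedding and storage steps depend on the chosen model and database and are not shown.

```python
def chunk_words(text, chunk_size, overlap):
    """Split text into chunks of `chunk_size` words, each sharing `overlap` words with the previous chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


chunks = chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1)
# chunks == ["a b c d", "d e f g", "g h i j"]
```

A larger overlap preserves more context across chunk boundaries at the cost of storing and embedding more redundant text.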

6. Retrieval Strategies

Querying: Asking a question (e.g., "What did ExxonMobil cut this quarter?").
Knowledge Graph Retrieval: Retrieving relevant nodes (entities) and their relationships.
Multi-Hop Exploitation: The speaker emphasized exploring relationships through multiple nodes, going beyond a single hop from the starting entity.
Trade-off: Deeper exploration provides better context but increases latency.
Optimization: Finding a balance between depth of exploration and acceptable latency.
Acceleration: Using libraries like cuGraph (integrated with NetworkX) to accelerate graph searches.
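
The depth-versus-latency trade-off can be made concrete with a breadth-first traversal that stops after a fixed number of hops. This is a plain-Python sketch, not the retrieval code from the talk; the function name k_hop_triplets and the second edge in the sample graph are hypothetical.

```python
from collections import deque


def k_hop_triplets(graph, start, max_hops):
    """Collect triplets reachable from `start` within `max_hops` hops (breadth-first)."""
    visited = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand nodes at the hop limit
        for relation, neighbor in graph.get(node, []):
            found.append((node, relation, neighbor))
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return found


# Adjacency list: head entity -> list of (relationship, tail entity).
# The second edge is a made-up example to show a second hop.
graph = {
    "ExxonMobil": [("cut", "exploration spending")],
    "exploration spending": [("affects", "oil and gas projects")],
}

one_hop = k_hop_triplets(graph, "ExxonMobil", max_hops=1)
two_hop = k_hop_triplets(graph, "ExxonMobil", max_hops=2)
```

Each extra hop can multiply the number of edges visited, which is why max_hops is the main knob for balancing context against latency.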

7. Performance Evaluation

Factors: Faithfulness, answer relevancy, precision, recall, helpfulness, correctness, coherence, complexity, verbosity.
Ragas Library: A pip-installable library for end-to-end evaluation of RAG workflows. It evaluates the query, retrieval, and response.
LLM Integration: Ragas uses an LLM (default: GPT) for evaluation but allows integration with custom models.
Reward Models: Using models like the Nemotron-4 340B reward model to evaluate the responses of other LLMs.
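
To make the precision and recall factors concrete, the sketch below computes them for a set of retrieved chunks against a hand-labeled set of relevant chunks. This is the textbook set-based definition, shown only for intuition; it is not how Ragas computes its metrics, which rely on an LLM judge. The chunk IDs are made up.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical chunk IDs: four retrieved, three labeled relevant, two in common.
precision, recall = precision_recall(
    retrieved=["c1", "c2", "c3", "c4"],
    relevant=["c1", "c3", "c5"],
)
# precision is 2/4 and recall is 2/3
```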

8. Optimization Strategies

80/20 Rule: Getting a basic Graph RAG system working takes 20% of the time, but optimizing it for production takes 80%.
Knowledge Graph Quality: Improving the quality of the knowledge graph is crucial.
Data Processing: Cleaning data (e.g., using regex to remove apostrophes, brackets, and similar characters) can improve results.
Fine-tuning LLMs: Fine-tuning the LLM can improve the quality of triplet generation.
Experimentation: Testing different strategies and tweaks to optimize performance.
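
A cleaning pass of the kind described above might look like this. The exact character set is an assumption; the talk names apostrophes and brackets but does not give a full list, so the pattern should be adapted to the data at hand.

```python
import re

# Apostrophes (straight and curly), double quotes, and bracket characters.
_NOISE = re.compile(r"['’\"\[\]\(\)\{\}]")
_WHITESPACE = re.compile(r"\s+")


def clean_text(text):
    """Remove noisy punctuation and collapse runs of whitespace."""
    text = _NOISE.sub("", text)
    text = _WHITESPACE.sub(" ", text)
    return text.strip()


cleaned = clean_text("Exxon's  (Q3) results [draft]")
# cleaned == "Exxons Q3 results draft"
```

Running the cleaner before triplet extraction keeps stray punctuation out of entity names, so the same entity is less likely to appear under several spellings in the graph.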

9. Experiment Results

Data Cleaning: Removing apostrophes and other characters led to better results.
Output Reduction: Reducing the length of the output improved performance.
Fine-tuning Llama 3: Fine-tuning Llama 3.3 with LoRA improved accuracy from 71% to 87% on a test set of 100 documents.
Acceleration with cuGraph: Using cuGraph with NetworkX significantly reduced latency for graph searches.

10. Graph RAG vs. Semantic RAG vs. Hybrid RAG

Diplomatic Answer: "It depends."
Data Structure: Graph-based systems are well-suited for structured data (e.g., retail, FSI, employee databases).
Knowledge Graph Creation: If a good knowledge graph can be created from unstructured data, it's worth experimenting with Graph RAG.
Application Requirements: Use Graph RAG if the use case requires understanding complex relationships.
Compute Intensity: Graph RAG systems are compute-heavy, so consider this factor.

11. Conclusion

The talk gives an overview of Graph RAG systems, covering their advantages, challenges, and optimization strategies. The key takeaway is that building a successful Graph RAG system requires careful attention to data processing, ontology definition, retrieval strategies, and performance evaluation. While compute-intensive, Graph RAG can be a powerful tool for applications that require understanding complex relationships between entities. The speaker encourages attendees to explore the provided GitHub resources and join NVIDIA's developer programs for further learning.