Your RAG Agent Needs a Hybrid Search Engine (n8n)

By The AI Automators

Key Concepts

  • RAG (Retrieval-Augmented Generation): An AI framework that combines information retrieval with text generation.
  • Hybrid Search: Combining different search techniques (dense embeddings, sparse embeddings, pattern matching) for improved retrieval.
  • Dense Embeddings (Semantic Search/Vector Search): Representing text as vectors in a high-dimensional space to capture semantic meaning.
  • Sparse Embeddings (Lexical Retrieval/Full Text Search): Tokenizing text and using techniques like inverted indexes or sparse vectors for keyword-based search.
  • Pattern Matching (Fuzzy Matching): Searching for specific character patterns or substrings within text, often using techniques like n-grams or wildcard searches.
  • Dynamic Hybrid Search: Adjusting the weights of different search techniques based on the query type.
  • n8n: A workflow automation platform used to build the RAG agent.
  • Supabase: A backend-as-a-service platform used for storing and querying data.
  • Pinecone: A vector database used for storing and querying vector embeddings.

1. Introduction: The Information Retrieval Challenge in RAG

  • Building an effective RAG agent hinges on accurate information retrieval.
  • Many knowledge bases are unstructured, necessitating a robust search engine.
  • Semantic search (vector search) alone is insufficient for all query types.
  • A solid text-based search foundation is crucial for handling cases where semantic search fails.
  • The video introduces a dynamic hybrid search system where the AI agent adjusts retrieval weights based on the question.
  • Pattern matching (fuzzy matching) is added as a fallback for cases where embeddings fail, particularly for IDs or codes.

2. The Problem: Inaccurate Retrieval with Existing Hybrid Search

  • The presenter previously loaded thousands of product manuals (10GB, 215,000 chunks) into a Supabase vector store.
  • Initial tests revealed inaccurate retrieval, specifically failing to find a Whirlpool refrigerator manual.
  • The agent failed to retrieve chunks containing a specific product code, even though the code was present in the Supabase database.
  • A direct filter in Supabase using the ILIKE operator with a wildcard search successfully found the product code, highlighting the agent's retrieval failure.
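
A minimal sketch of that direct filter using the supabase-js client; the table name "documents" and the product code "ABC-12345" are placeholders, not values from the video:

```typescript
import { createClient } from "@supabase/supabase-js";

// Placeholder project URL and key.
const supabase = createClient("https://YOUR-PROJECT.supabase.co", "YOUR-ANON-KEY");

// Case-insensitive wildcard match, equivalent to:
//   SELECT id, content FROM documents WHERE content ILIKE '%ABC-12345%';
const { data, error } = await supabase
  .from("documents")                 // placeholder table name
  .select("id, content")
  .ilike("content", "%ABC-12345%");  // placeholder product code

if (error) throw error;
console.log(`${(data ?? []).length} chunks contain the code`);
```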

3. Causes of Inaccurate Retrieval: Messy, Unstructured Data

  • 80-90% of enterprise data is unstructured (PDFs, Word documents, emails, etc.).
  • Unstructured text poses challenges for RAG systems.
  • Example: A 36-page PDF translated to 200 chunks, with only two containing the product code.
  • Text extraction issues:
    • Multilingual documents (English and French) complicate extraction.
    • Product codes are not extracted cleanly, running into other words.
    • The "Extract from PDF" node in N8N extracts machine-readable text but can result in messy layout.
    • Selecting text can be inaccurate due to layout issues.
  • OCR (Optical Character Recognition) can improve text extraction by maintaining layout and hierarchical structure.
  • However, OCR is not perfect and can introduce errors, especially with messy scanned documents.
  • Data pre-processing (human annotation or AI-powered extraction) can clean data before ingestion.
    • LLMs can extract relevant information and prepend it to chunks or add it as metadata (see the sketch after this list).
  • Even with pre-processing, data will never be 100% clean, necessitating fallback search mechanisms.
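
A rough sketch of that pre-processing idea, assuming an OpenAI chat model is used to pull product codes out of a raw chunk and prepend them; the model name and prompt are illustrative, not taken from the video:

```typescript
import OpenAI from "openai";

// Reads OPENAI_API_KEY from the environment.
const openai = new OpenAI();

// Ask an LLM to extract identifiers from a raw chunk, then prepend them so
// downstream keyword/pattern search has clean copies of the codes.
async function enrichChunk(rawChunk: string): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini", // illustrative model choice
    messages: [
      {
        role: "system",
        content:
          "Extract any product codes or part numbers from the text. " +
          "Return them as a comma-separated list, or NONE if there are none.",
      },
      { role: "user", content: rawChunk },
    ],
  });
  const codes = completion.choices[0].message.content ?? "NONE";
  return codes === "NONE" ? rawChunk : `Product codes: ${codes}\n\n${rawChunk}`;
}
```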

4. Dense Embeddings (Semantic Search) Explained

  • Dense embeddings are a typical semantic search approach using a vector store.
  • Embedding models (e.g., OpenAI's text-embedding-3-small) transform text into dense vector embeddings.
  • A sentence from a product manual (including a part number) is transformed into a 1536-dimensional vector.
  • Vectors are plotted in a high-dimensional space, capturing the semantic meaning of the text.
  • Words with similar meanings (e.g., "car," "automobile," "vehicle") are plotted close together.
  • When a query is made (e.g., "why the ice maker was not working"), it's also transformed into a vector.
  • Cosine similarity is used to find the closest vectors in the vector store, representing the most semantically similar content.
  • Strengths:
    • Understands meaning and context.
    • Finds conceptually similar content across different phrasings.
    • Handles synonyms and related terms.
    • Supports multilingual embeddings.
  • Weaknesses:
    • Lacks transparency (black box).
    • Doesn't work well with exact codes and identifiers.
    • Cannot guarantee exact matches.
  • Examples of dense embedding models: OpenAI, Cohere, Gemini, Jina, BERT, ColBERT.
  • MTEB leaderboard (linked in description) provides a comparison of industry-standard embedding models.
  • The original query ("Tell me about the part number") failed because dense embedding models are not trained on arbitrary part numbers.
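
A minimal sketch of the dense-embedding step and the cosine-similarity comparison, assuming the OpenAI embeddings API mentioned above; the example texts are illustrative:

```typescript
import OpenAI from "openai";

// Reads OPENAI_API_KEY from the environment.
const openai = new OpenAI();

// Embed a piece of text into a 1536-dimensional dense vector.
async function embed(text: string): Promise<number[]> {
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return res.data[0].embedding;
}

// Cosine similarity: dot product divided by the product of the magnitudes.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Usage: compare a query against a stored chunk.
const queryVec = await embed("why is the ice maker not working");
const chunkVec = await embed("Troubleshooting: the ice maker stops when the water inlet valve is blocked.");
console.log(cosineSimilarity(queryVec, chunkVec)); // closer to 1 = more semantically similar
```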

5. Sparse Embeddings (Lexical Retrieval/Full Text Search) Explained

  • Also known as full text search, sparse representation, or lexical search.
  • Uses a tokenization approach.
  • Text is processed through an analyzer and tokenizer, which may lowercase, stem, remove stop words, and remove punctuation.
  • The part number can be split apart during tokenization, leading to inaccurate results.
  • Two implementations: inverted index and sparse vectors.
  • Inverted Index (Supabase Example):
    • Chunks are stored with metadata and dense embeddings.
    • The sparse representation is stored in the full-text search column as a Postgres tsvector.
    • Keywords and their positions are extracted and saved in an inverted index.
    • Full text search queries search across these extracted keywords.
    • In the example, missing spaces in the extracted text produced run-together, bizarre keywords in the index.
    • Supabase uses Postgres tsvector for indexing and ts_rank for scoring.
  • Sparse Vector Embeddings (Pinecone Example):
    • Text is sent to a sparse embedding model (e.g., Pinecone's pinecone-sparse-english-v0 model) to get term IDs and weights.
    • Term IDs represent words within a vocabulary.
    • Weights indicate the importance of words within a context.
    • Example: "water" is mapped to ID 4522 with a weight of 0.78.
    • Term IDs and weights are plotted in a vector store.
    • Querying involves tokenizing the query and using cosine similarity to find the closest vectors.
  • Scoring:
    • Inverted index: Scoring happens after candidate retrieval (e.g., using BM25 or ts_rank).
    • Sparse vector: Scoring happens before ingestion into the vector store.
  • Strengths:
    • High precision.
    • Fast.
    • Good for exact matches.
    • Explainable and transparent.
    • Forms the core of many search engines.
    • Some degree of semantic expansion (in learned sparse models).
  • Weaknesses:
    • Not as semantic as dense embeddings.
    • Not ideal for multilingual text.
    • Not great for misspellings and typos.
    • Dependency on training data.
  • Examples: BM25, TF-IDF, Postgres tsvector and ts_rank, SPLADE, DeepImpact.
  • n8n examples:
    • Full-text search with tsvector in Supabase returns relevant chunks about ice dispensers and ice makers (see the sketch after this list).
    • Sparse embedding search using Pinecone's sparse model returns chunks with lexical matches.
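
As an illustration of the inverted-index path, a full-text query against the tsvector column can be issued with the supabase-js client; the table name "documents" and the column name "fts" are assumptions, not values from the video:

```typescript
import { createClient } from "@supabase/supabase-js";

// Placeholder project URL and key.
const supabase = createClient("https://YOUR-PROJECT.supabase.co", "YOUR-ANON-KEY");

// Postgres full-text search over the inverted index built from the tsvector
// column; "websearch" parsing converts the plain query into tsquery operators.
const { data, error } = await supabase
  .from("documents")   // placeholder table name
  .select("id, content")
  .textSearch("fts", "ice maker not dispensing", { type: "websearch" });

if (error) throw error;
// Rows whose lexemes match the query. Ranking with ts_rank would normally be
// done inside a database function rather than through this client call.
console.log((data ?? []).map((row) => row.id));
```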

6. Pattern-Based Retrieval Explained

  • Creating an n-gram index (an array of overlapping character fragments); a conceptual sketch follows this list.
  • Example: "the water inlet" becomes "the," "he w," "e wa," " wat," etc.
  • These fragments are stored in an n-gram or trigram index.
  • Query is also split into character fragments.
  • A lookup happens in the trigram index, and results are ranked based on similarity.
  • A wildcard search in Supabase using the ILIKE operator and percent signs finds chunks containing the product code.
  • The AI agent now retrieves chunks with the product code and generates an accurate answer.
  • The LLM prioritizes pattern matching (70%) for product code queries, with lower weights for sparse (20%) and dense (10%) matching.
  • Strengths:
    • Ultra-high precision for exact matches.
    • Ideal for IDs, codes, and tokens.
    • Ideal for fuzzy matching (typos, partial codes).
    • Supports regex.
    • Language agnostic.
  • Weaknesses:
    • No semantics or meaning.
    • Rudimentary similarity score.
    • Produces large indexes.
  • Examples: wildcard search, ILIKE search, regex, edit distance (the Levenshtein algorithm).
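
A conceptual sketch of the n-gram idea described above, building overlapping character fragments and ranking by overlap. This is only illustrative; the implementation in the video relies on ILIKE wildcard matching in Supabase:

```typescript
// Split text into overlapping character fragments of length n (trigrams when n = 3).
function ngrams(text: string, n = 3): Set<string> {
  const s = text.toLowerCase();
  const grams = new Set<string>();
  for (let i = 0; i + n <= s.length; i++) {
    grams.add(s.slice(i, i + n));
  }
  return grams;
}

// Rudimentary similarity: share of query fragments found in the chunk's fragments.
function ngramSimilarity(query: string, chunk: string): number {
  const q = ngrams(query);
  const c = ngrams(chunk);
  let hits = 0;
  for (const g of q) if (c.has(g)) hits++;
  return q.size === 0 ? 0 : hits / q.size;
}

// Even a partial or slightly mistyped code still shares most of its fragments.
console.log(ngramSimilarity("ABC-1234", "Replace inlet valve ABC-12345 before reassembly"));
```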

7. Implementation of Dynamic Hybrid Search

  • Pattern matching is integrated into the hybrid search function on Supabase.
  • An n8n sub-workflow is called for hybrid search.
  • The sub-workflow generates a dense embedding for the query.
  • The dense embedding, query text, and weights for dense vectors, sparse keywords, and pattern matching are sent to the Supabase dynamic hybrid search edge function.
  • The edge function passes the payload to a database function.
  • The database function declares variables, sets up filter logic, and performs weighted queries:
    • Vector search: queries the vector column.
    • Keyword search: queries the FTS column, scored with ts_rank.
    • Pattern match search: queries the content field using ILIKE with wildcards.
  • Reciprocal rank fusion is used to fuse the result sets based on the weights (see the sketch after this list).
  • The AI agent can now answer the question "Tell me about the product code" accurately.
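
A sketch of weighted reciprocal rank fusion over the three result lists. The k = 60 constant is a common RRF default, and the shape of the result objects is assumed rather than taken from the video's database function:

```typescript
interface RankedResult { id: string; content: string; }

// Weighted reciprocal rank fusion: each list contributes weight / (k + rank)
// per document, and documents are re-ordered by their combined score.
function reciprocalRankFusion(
  lists: { results: RankedResult[]; weight: number }[],
  k = 60,
): RankedResult[] {
  const scores = new Map<string, { doc: RankedResult; score: number }>();
  for (const { results, weight } of lists) {
    results.forEach((doc, rank) => {
      const entry = scores.get(doc.id) ?? { doc, score: 0 };
      entry.score += weight / (k + rank + 1); // rank is 0-based, so +1
      scores.set(doc.id, entry);
    });
  }
  return [...scores.values()]
    .sort((a, b) => b.score - a.score)
    .map((e) => e.doc);
}

// Example weighting for a product-code query: pattern 0.7, sparse 0.2, dense 0.1.
// const fused = reciprocalRankFusion([
//   { results: patternMatches, weight: 0.7 },
//   { results: keywordMatches, weight: 0.2 },
//   { results: vectorMatches,  weight: 0.1 },
// ]);
```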

8. Conclusion

  • The video demonstrates a dynamic hybrid search system that combines dense embeddings, sparse embeddings, and pattern matching for improved RAG performance.
  • Pattern matching is crucial for handling exact matches and fuzzy matching of codes and identifiers.
  • The AI agent can dynamically adjust the weights of different search techniques based on the query type.
  • The next step is to add a tool to load the full document or a summary of the document based on the retrieved chunks.
  • Access to the hybrid search workflows and RAG agent is available in the AI Automators community (link in description).
