Your RAG Agent Needs a Hybrid Search Engine (n8n)
By The AI Automators
Key Concepts
- RAG (Retrieval-Augmented Generation): An AI framework that combines information retrieval with text generation.
- Hybrid Search: Combining different search techniques (dense embeddings, sparse embeddings, pattern matching) for improved retrieval.
- Dense Embeddings (Semantic Search/Vector Search): Representing text as vectors in a high-dimensional space to capture semantic meaning.
- Sparse Embeddings (Lexical Retrieval/Full Text Search): Tokenizing text and using techniques like inverted indexes or sparse vectors for keyword-based search.
- Pattern Matching (Fuzzy Matching): Searching for specific character patterns or substrings within text, often using techniques like n-grams or wildcard searches.
- Dynamic Hybrid Search: Adjusting the weights of different search techniques based on the query type.
- n8n: A workflow automation platform used to build the RAG agent.
- Supabase: A backend-as-a-service platform, built on Postgres, used for storing and querying data.
- Pinecone: A vector database used for storing and querying vector embeddings.
1. Introduction: The Information Retrieval Challenge in RAG
- Building an effective RAG agent hinges on accurate information retrieval.
- Many knowledge bases are unstructured, necessitating a robust search engine.
- Semantic search (vector search) alone is insufficient for all query types.
- A solid text-based search foundation is crucial for handling cases where semantic search fails.
- The video introduces a dynamic hybrid search system where the AI agent adjusts retrieval weights based on the question.
- Pattern matching (fuzzy matching) is added as a fallback for cases where embeddings fail, particularly for IDs or codes.
2. The Problem: Inaccurate Retrieval with Existing Hybrid Search
- The presenter previously loaded thousands of product manuals (10GB, 215,000 chunks) into a Supabase vector store.
- Initial tests revealed inaccurate retrieval, specifically failing to find a Whirlpool refrigerator manual.
- The agent failed to retrieve chunks containing a specific product code, even though the code was present in the Supabase database.
- A direct filter in Supabase using the ILIKE operator with a wildcard search successfully found the product code, highlighting the agent's retrieval failure.
3. Causes of Inaccurate Retrieval: Messy, Unstructured Data
- 80-90% of enterprise data is unstructured (PDFs, Word documents, emails, etc.).
- Unstructured text poses challenges for RAG systems.
- Example: A 36-page PDF was split into 200 chunks, with only two containing the product code.
- Text extraction issues:
- Multilingual documents (English and French) complicate extraction.
- Product codes are not extracted cleanly, running into other words.
- The "Extract from PDF" node in N8N extracts machine-readable text but can result in messy layout.
- Selecting text can be inaccurate due to layout issues.
- OCR (Optical Character Recognition) can improve text extraction by maintaining layout and hierarchical structure.
- However, OCR is not perfect and can introduce errors, especially with messy scanned documents.
- Data pre-processing (human annotation or AI-powered extraction) can clean data before ingestion.
- LLMs can extract relevant information and prepend it to chunks or add it as metadata (a sketch follows this list).
- Even with pre-processing, data will never be 100% clean, necessitating fallback search mechanisms.
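As a rough illustration of that pre-processing idea, here is a TypeScript sketch of prepending LLM-extracted identifiers to each chunk before ingestion. `extractCodesWithLLM` is a hypothetical helper (not from the video) standing in for whatever LLM call you use, and the chunk shape is an assumption:

```typescript
// Hedged sketch: prepend LLM-extracted product codes to each chunk before
// embedding, so both dense and sparse indexing can see them.
interface Chunk {
  text: string;
  metadata: Record<string, string[]>;
}

async function enrichChunk(
  chunk: Chunk,
  // Hypothetical helper; would call your LLM of choice with an extraction prompt.
  extractCodesWithLLM: (text: string) => Promise<string[]>
): Promise<Chunk> {
  const codes = await extractCodesWithLLM(chunk.text);
  return {
    text: codes.length
      ? `Product codes: ${codes.join(", ")}\n${chunk.text}`
      : chunk.text,
    // Also keep the codes as metadata for filtering.
    metadata: { ...chunk.metadata, product_codes: codes },
  };
}
```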
4. Dense Embeddings (Semantic Search) Explained
- Dense embeddings are a typical semantic search approach using a vector store.
- Embedding models (e.g., OpenAI's text-embedding-3-small) transform text into dense vector embeddings.
- A sentence from a product manual (including a part number) is transformed into a 1536-dimensional vector.
- Vectors are plotted in a high-dimensional space, capturing the semantic meaning of the text.
- Words with similar meanings (e.g., "car," "automobile," "vehicle") are plotted close together.
- When a query is made (e.g., "why the ice maker was not working"), it's also transformed into a vector.
- Cosine similarity is used to find the closest vectors in the vector store, representing the most semantically similar content (a sketch follows at the end of this section).
- Strengths:
- Understands meaning and context.
- Finds conceptually similar content across different phrasings.
- Handles synonyms and related terms.
- Supports multilingual embeddings.
- Weaknesses:
- Lacks transparency (black box).
- Doesn't work well with exact codes and identifiers.
- Cannot guarantee exact matches.
- Examples of dense embedding models: OpenAI, Cohere, Gemini, Jina, BERT, ColBERT.
- MTEB leaderboard (linked in description) provides a comparison of industry-standard embedding models.
- The original query ("Tell me about the part number") failed because dense embedding models are not trained on arbitrary part numbers.
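To make the retrieval step concrete, here is a minimal TypeScript sketch of cosine similarity with a brute-force top-k ranking. A real vector store (Supabase pgvector, Pinecone) runs this comparison via an index rather than a loop, and the chunk shape here is illustrative:

```typescript
// Cosine similarity: the comparison at the heart of dense retrieval.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored chunks by similarity to a query embedding
// (e.g., a 1536-dimensional vector from text-embedding-3-small).
function topK(
  query: number[],
  chunks: { id: string; embedding: number[] }[],
  k = 5
) {
  return chunks
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```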
5. Sparse Embeddings (Lexical Retrieval/Full Text Search) Explained
- Also known as full text search, sparse representation, or lexical search.
- Uses a tokenization approach.
- Text is processed through an analyzer and tokenizer, which may lowercase, stem, remove stop words, and remove punctuation.
- The part number can be split apart during tokenization, leading to inaccurate results.
- Two implementations: inverted index and sparse vectors.
- Inverted Index (Supabase Example):
- Chunks are stored with metadata and dense embeddings.
- Sparse representation is stored in a "full text search" field as a Postgres tsvector.
- Keywords and their positions are extracted and saved in an inverted index.
- Full text search queries search across these extracted keywords.
- In the example, missing spaces in the extracted output produced bizarre run-together keywords.
- Supabase uses Postgres tsvector for indexing and ts_rank for scoring (see the query sketch below).
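For reference, a minimal sketch of querying that tsvector column from code with supabase-js, which is one way an n8n Code node or edge function might do it. The table name (`documents`) and column name (`fts`) are assumptions; substitute your own schema:

```typescript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_KEY!
);

// Full text search against a tsvector column using supabase-js.
async function fullTextSearch(query: string) {
  const { data, error } = await supabase
    .from("documents")          // assumed table name
    .select("id, content")
    .textSearch("fts", query, { // assumed tsvector column name
      type: "websearch",
      config: "english",
    });
  if (error) throw error;
  return data;
}

// e.g. await fullTextSearch("ice maker not working");
```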
- Sparse Vector Embeddings (Pinecone Example):
- Text is sent to a sparse embedding model (e.g., Pinecone's sparse English v0 model) to get term IDs and weights.
- Term IDs represent words within a vocabulary.
- Weights indicate the importance of words within a context.
- Example: "water" is mapped to ID 4522 with a weight of 0.78.
- Term IDs and weights are plotted in a vector store.
- Querying involves tokenizing the query and using cosine similarity to find the closest vectors.
- Scoring:
- Inverted index: Scoring happens after candidate retrieval (e.g., using BM25 or ts_rank; see the BM25 sketch below).
- Sparse vector: Scoring happens before ingestion into the vector store.
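As a sketch of that post-retrieval scoring, here is a toy BM25 implementation in TypeScript. The formula is the standard one; production engines compute it against an inverted index rather than raw term arrays:

```typescript
// Toy BM25 scorer: score one document against a query's terms.
// k1 and b are the conventional defaults.
function bm25Score(
  queryTerms: string[],
  docTerms: string[],
  docFreq: Map<string, number>, // how many documents contain each term
  totalDocs: number,
  avgDocLen: number,
  k1 = 1.2,
  b = 0.75
): number {
  // Term frequencies within this document.
  const tf = new Map<string, number>();
  for (const t of docTerms) tf.set(t, (tf.get(t) ?? 0) + 1);

  let score = 0;
  for (const term of queryTerms) {
    const f = tf.get(term) ?? 0;
    if (f === 0) continue;
    const df = docFreq.get(term) ?? 0;
    // Rare terms get a higher inverse document frequency.
    const idf = Math.log((totalDocs - df + 0.5) / (df + 0.5) + 1);
    // Length normalization damps scores for long documents.
    const norm = 1 - b + b * (docTerms.length / avgDocLen);
    score += (idf * f * (k1 + 1)) / (f + k1 * norm);
  }
  return score;
}
```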
- Strengths:
- High precision.
- Fast.
- Good for exact matches.
- Explainable and transparent.
- Forms the core of many search engines.
- Some semantic expansion.
- Weaknesses:
- Not as semantic as dense embeddings.
- Not ideal for multilingual text.
- Not great for misspellings and typos.
- Dependency on training data.
- Examples: BM25, TF-IDF, Postgres tsvector and ts_rank, SPLADE, DeepImpact.
- n8n examples:
- Full text search with tsvector in Supabase returns relevant chunks about ice dispensers and ice makers.
- Sparse embedding search using Pinecone's sparse model returns chunks with lexical matches (a scoring sketch follows).
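To illustrate how sparse term IDs and weights turn into a score, here is a small TypeScript sketch. Matching two sparse vectors reduces to a dot product over the term IDs they share; apart from the "water" → ID 4522 example quoted above, the IDs and weights below are illustrative placeholders, not real model output:

```typescript
// A sparse embedding is just term IDs paired with learned weights.
interface SparseVector {
  indices: number[]; // term IDs within the model's vocabulary
  values: number[];  // importance weights for those terms
}

// Dot product over shared term IDs.
function sparseDotProduct(a: SparseVector, b: SparseVector): number {
  const weights = new Map(a.indices.map((id, i) => [id, a.values[i]]));
  let score = 0;
  b.indices.forEach((id, i) => {
    const w = weights.get(id);
    if (w !== undefined) score += w * b.values[i];
  });
  return score;
}

// e.g. "water" -> ID 4522 with weight 0.78, as in the example above.
const doc: SparseVector = { indices: [4522, 801], values: [0.78, 0.31] };
const query: SparseVector = { indices: [4522], values: [0.9] };
console.log(sparseDotProduct(doc, query)); // 0.702
```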
6. Pattern-Based Retrieval Explained
- Creating an n-gram index (an array of overlapping character fragments).
- Example: "the water inlet" becomes "the," "he w," "e wa," " wat," etc.
- These fragments are stored in an n-gram or trigram index.
- Query is also split into character fragments.
- A lookup happens in the trigram index, and results are ranked based on similarity (see the sketch at the end of this section).
- Wildcard search in Supabase using the ILIKE operator with percent signs finds chunks containing the product code.
- The AI agent now retrieves chunks with the product code and generates an accurate answer.
- The LLM prioritizes pattern matching (70%) for product code queries, with lower weights for sparse (20%) and dense (10%) matching.
- Strengths:
- Ultra-high precision for exact matches.
- Ideal for IDs, codes, and tokens.
- Ideal for fuzzy matching (typos, partial codes).
- Supports regex.
- Language agnostic.
- Weaknesses:
- No semantics or meaning.
- Rudimentary similarity score.
- Produces large indexes.
- Examples: wildcard search, ILIKE search, regex, edit distance (Levenshtein).
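A minimal TypeScript sketch of trigram indexing and similarity, loosely mirroring what Postgres's pg_trgm extension does under the hood; the product codes in the usage example are placeholders:

```typescript
// Split a string into overlapping 3-character fragments (trigrams).
function trigrams(text: string): Set<string> {
  const padded = `  ${text.toLowerCase()} `; // pg_trgm-style padding
  const grams = new Set<string>();
  for (let i = 0; i <= padded.length - 3; i++) {
    grams.add(padded.slice(i, i + 3));
  }
  return grams;
}

// Jaccard similarity over trigram sets: shared / total distinct.
// This is why trigram search tolerates typos and partial codes.
function trigramSimilarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  let shared = 0;
  for (const g of ta) if (tb.has(g)) shared++;
  return shared / (ta.size + tb.size - shared);
}

console.log(trigramSimilarity("ABC123XYZ", "ABC123XYZ00")); // high
console.log(trigramSimilarity("ABC123XYZ", "ice maker"));   // near zero
```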
7. Implementation of Dynamic Hybrid Search
- Pattern matching is integrated into the hybrid search function on Supabase.
- An n8n sub-workflow is called for hybrid search.
- The sub-workflow generates a dense embedding for the query.
- The dense embedding, query text, and weights for dense vectors, sparse keywords, and pattern matching are sent to the Supabase dynamic hybrid search edge function.
- The edge function passes the payload to a database function.
- The database function declares variables, sets up filter logic, and performs weighted queries:
- Vector search: queries the vector column.
- Keyword search: queries the FTS column using ts_rank.
- Pattern match search: queries the content field using ILIKE with wildcards.
- Reciprocal rank fusion (RRF) is used to fuse the result sets based on the weights (see the sketch below).
- The AI agent can now answer the question "Tell me about the product code" accurately.
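A compact TypeScript sketch of weighted reciprocal rank fusion, assuming each search leg returns a ranked list of chunk IDs; the weights mirror the product-code example from section 6, and k = 60 is the conventional smoothing constant:

```typescript
// Weighted RRF: each list contributes weight / (k + rank) per document,
// and the fused list is sorted by the summed score.
function reciprocalRankFusion(
  resultLists: { ids: string[]; weight: number }[],
  k = 60
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const { ids, weight } of resultLists) {
    ids.forEach((id, rank) => {
      // rank + 1 because ranks conventionally start at 1.
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// e.g. a product-code query weighted as in section 6 (IDs are placeholders):
const fused = reciprocalRankFusion([
  { ids: ["c7", "c2"], weight: 0.7 }, // pattern match results
  { ids: ["c2", "c9"], weight: 0.2 }, // keyword (ts_rank) results
  { ids: ["c4", "c7"], weight: 0.1 }, // dense vector results
]);
```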
8. Conclusion
- The video demonstrates a dynamic hybrid search system that combines dense embeddings, sparse embeddings, and pattern matching for improved RAG performance.
- Pattern matching is crucial for handling exact matches and fuzzy matching of codes and identifiers.
- The AI agent can dynamically adjust the weights of different search techniques based on the query type.
- The next step is to add a tool to load the full document or a summary of the document based on the retrieved chunks.
- Access to the hybrid search workflows and RAG agent is available in the AI Automators community (link in description).