Your RAG Agent Needs a Hybrid Search Engine (n8n)

By The AI Automators

Key Concepts

  • RAG (Retrieval-Augmented Generation): An AI framework that combines information retrieval with text generation.
  • Hybrid Search: Combining different search techniques (dense embeddings, sparse embeddings, pattern matching) for improved retrieval.
  • Dense Embeddings (Semantic Search/Vector Search): Representing text as vectors in a high-dimensional space to capture semantic meaning.
  • Sparse Embeddings (Lexical Retrieval/Full Text Search): Tokenizing text and using techniques like inverted indexes or sparse vectors for keyword-based search.
  • Pattern Matching (Fuzzy Matching): Searching for specific character patterns or substrings within text, often using techniques like n-grams or wildcard searches.
  • Dynamic Hybrid Search: Adjusting the weights of different search techniques based on the query type.
  • n8n: A workflow automation platform used to build the RAG agent.
  • Supabase: A backend-as-a-service platform used for storing and querying data.
  • Pinecone: A vector database used for storing and querying vector embeddings.

1. Introduction: The Information Retrieval Challenge in RAG

  • Building an effective RAG agent hinges on accurate information retrieval.
  • Many knowledge bases are unstructured, necessitating a robust search engine.
  • Semantic search (vector search) alone is insufficient for all query types.
  • A solid text-based search foundation is crucial for handling cases where semantic search fails.
  • The video introduces a dynamic hybrid search system where the AI agent adjusts retrieval weights based on the question.
  • Pattern matching (fuzzy matching) is added as a fallback for cases where embeddings fail, particularly for IDs or codes.

2. The Problem: Inaccurate Retrieval with Existing Hybrid Search

  • The presenter previously loaded thousands of product manuals (10GB, 215,000 chunks) into a Supabase vector store.
  • Initial tests revealed inaccurate retrieval, specifically failing to find a Whirlpool refrigerator manual.
  • The agent failed to retrieve chunks containing a specific product code, even though the code was present in the Supabase database.
  • A direct filter in Supabase using the ILIKE operator with a wildcard search successfully found the product code, highlighting the agent's retrieval failure.
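
A minimal sketch of that direct filter using the supabase-js client; the table name "documents" and the product code "ABC-12345" are placeholders, not values from the video:

```typescript
import { createClient } from "@supabase/supabase-js";

// Placeholder project URL and key.
const supabase = createClient("https://YOUR-PROJECT.supabase.co", "YOUR-ANON-KEY");

// Case-insensitive wildcard match, equivalent to:
//   SELECT id, content FROM documents WHERE content ILIKE '%ABC-12345%';
const { data, error } = await supabase
  .from("documents")                 // placeholder table name
  .select("id, content")
  .ilike("content", "%ABC-12345%");  // placeholder product code

if (error) throw error;
console.log(`${(data ?? []).length} chunks contain the code`);
```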

3. Causes of Inaccurate Retrieval: Messy, Unstructured Data

  • 80-90% of enterprise data is unstructured (PDFs, Word documents, emails, etc.).
  • Unstructured text poses challenges for RAG systems.
  • Example: A 36-page PDF translated to 200 chunks, with only two containing the product code.
  • Text extraction issues:
    • Multilingual documents (English and French) complicate extraction.
    • Product codes are not extracted cleanly, running into other words.
    • The "Extract from PDF" node in N8N extracts machine-readable text but can result in messy layout.
    • Selecting text can be inaccurate due to layout issues.
  • OCR (Optical Character Recognition) can improve text extraction by maintaining layout and hierarchical structure.
  • However, OCR is not perfect and can introduce errors, especially with messy scanned documents.
  • Data pre-processing (human annotation or AI-powered extraction) can clean data before ingestion.
    • LLMs can extract relevant information and prepend it to chunks or add it as metadata (see the sketch after this list).
  • Even with pre-processing, data will never be 100% clean, necessitating fallback search mechanisms.
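
A rough sketch of that pre-processing idea, assuming an OpenAI chat model is used to pull product codes out of a raw chunk and prepend them; the model name and prompt are illustrative, not taken from the video:

```typescript
import OpenAI from "openai";

// Reads OPENAI_API_KEY from the environment.
const openai = new OpenAI();

// Ask an LLM to extract identifiers from a raw chunk, then prepend them so
// downstream keyword/pattern search has clean copies of the codes.
async function enrichChunk(rawChunk: string): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini", // illustrative model choice
    messages: [
      {
        role: "system",
        content:
          "Extract any product codes or part numbers from the text. " +
          "Return them as a comma-separated list, or NONE if there are none.",
      },
      { role: "user", content: rawChunk },
    ],
  });
  const codes = completion.choices[0].message.content ?? "NONE";
  return codes === "NONE" ? rawChunk : `Product codes: ${codes}\n\n${rawChunk}`;
}
```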

4. Dense Embeddings (Semantic Search) Explained

  • Dense embeddings are a typical semantic search approach using a vector store.
  • Embedding models (e.g., OpenAI's text-embedding-3-small) transform text into dense vector embeddings.
  • A sentence from a product manual (including a part number) is transformed into a 1536-dimensional vector.
  • Vectors are plotted in a high-dimensional space, capturing the semantic meaning of the text.
  • Words with similar meanings (e.g., "car," "automobile," "vehicle") are plotted close together.
  • When a query is made (e.g., "why the ice maker was not working"), it's also transformed into a vector.
  • Cosine similarity is used to find the closest vectors in the vector store, representing the most semantically similar content.
  • Strengths:
    • Understands meaning and context.
    • Finds conceptually similar content across different phrasings.
    • Handles synonyms and related terms.
    • Supports multilingual embeddings.
  • Weaknesses:
    • Lacks transparency (black box).
    • Doesn't work well with exact codes and identifiers.
    • Cannot guarantee exact matches.
  • Examples of dense embedding models: OpenAI, Cohere, Gemini, Jina, BERT, ColBERT.
  • MTEB leaderboard (linked in description) provides a comparison of industry-standard embedding models.
  • The original query ("Tell me about the part number") failed because dense embedding models are not trained on arbitrary part numbers.
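
A minimal sketch of the dense-embedding step and the cosine-similarity comparison, assuming the OpenAI embeddings API mentioned above; the example texts are illustrative:

```typescript
import OpenAI from "openai";

// Reads OPENAI_API_KEY from the environment.
const openai = new OpenAI();

// Embed a piece of text into a 1536-dimensional dense vector.
async function embed(text: string): Promise<number[]> {
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return res.data[0].embedding;
}

// Cosine similarity: dot product divided by the product of the magnitudes.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Usage: compare a query against a stored chunk.
const queryVec = await embed("why is the ice maker not working");
const chunkVec = await embed("Troubleshooting: the ice maker stops when the water inlet valve is blocked.");
console.log(cosineSimilarity(queryVec, chunkVec)); // closer to 1 = more semantically similar
```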

5. Sparse Embeddings (Lexical Retrieval/Full Text Search) Explained

  • Also known as full text search, sparse representation, or lexical search.
  • Uses a tokenization approach.
  • Text is processed through an analyzer and tokenizer, which may lowercase, stem, remove stop words, and remove punctuation.
  • The part number can be split apart during tokenization, leading to inaccurate results.
  • Two implementations: inverted index and sparse vectors.
  • Inverted Index (Supabase Example):
    • Chunks are stored with metadata and dense embeddings.
    • The sparse representation is stored in the full-text search column as a Postgres tsvector.
    • Keywords and their positions are extracted and saved in an inverted index.
    • Full text search queries search across these extracted keywords.
    • In the example, missing spaces in the extracted text produced run-together, bizarre keywords in the index.
    • Supabase uses Postgres tsvector for indexing and ts_rank for scoring.
  • Sparse Vector Embeddings (Pinecone Example):
    • Text is sent to a sparse embedding model (e.g., Pinecone's pinecone-sparse-english-v0 model) to get term IDs and weights.
    • Term IDs represent words within a vocabulary.
    • Weights indicate the importance of words within a context.
    • Example: "water" is mapped to ID 4522 with a weight of 0.78.
    • Term IDs and weights are plotted in a vector store.
    • Querying involves tokenizing the query and using cosine similarity to find the closest vectors.
  • Scoring:
    • Inverted index: Scoring happens after candidate retrieval (e.g., using BM25 or ts_rank).
    • Sparse vector: Scoring happens before ingestion into the vector store.
  • Strengths:
    • High precision.
    • Fast.
    • Good for exact matches.
    • Explainable and transparent.
    • Forms the core of many search engines.
    • Some degree of semantic expansion (in learned sparse models).
  • Weaknesses:
    • Not as semantic as dense embeddings.
    • Not ideal for multilingual text.
    • Not great for misspellings and typos.
    • Dependency on training data.
  • Examples: BM25, TF-IDF, Postgres tsvector and ts_rank, SPLADE, DeepImpact.
  • n8n examples:
    • Full-text search with tsvector in Supabase returns relevant chunks about ice dispensers and ice makers (see the sketch after this list).
    • Sparse embedding search using Pinecone's sparse model returns chunks with lexical matches.
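
As an illustration of the inverted-index path, a full-text query against the tsvector column can be issued with the supabase-js client; the table name "documents" and the column name "fts" are assumptions, not values from the video:

```typescript
import { createClient } from "@supabase/supabase-js";

// Placeholder project URL and key.
const supabase = createClient("https://YOUR-PROJECT.supabase.co", "YOUR-ANON-KEY");

// Postgres full-text search over the inverted index built from the tsvector
// column; "websearch" parsing converts the plain query into tsquery operators.
const { data, error } = await supabase
  .from("documents")   // placeholder table name
  .select("id, content")
  .textSearch("fts", "ice maker not dispensing", { type: "websearch" });

if (error) throw error;
// Rows whose lexemes match the query. Ranking with ts_rank would normally be
// done inside a database function rather than through this client call.
console.log((data ?? []).map((row) => row.id));
```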

6. Pattern-Based Retrieval Explained

  • Creating an n-gram index (an array of overlapping character fragments); a conceptual sketch follows this list.
  • Example: "the water inlet" becomes "the," "he w," "e wa," " wat," etc.
  • These fragments are stored in an n-gram or trigram index.
  • Query is also split into character fragments.
  • A lookup happens in the trigram index, and results are ranked based on similarity.
  • A wildcard search in Supabase using the ILIKE operator and percent signs finds chunks containing the product code.
  • The AI agent now retrieves chunks with the product code and generates an accurate answer.
  • The LLM prioritizes pattern matching (70%) for product code queries, with lower weights for sparse (20%) and dense (10%) matching.
  • Strengths:
    • Ultra-high precision for exact matches.
    • Ideal for IDs, codes, and tokens.
    • Ideal for fuzzy matching (typos, partial codes).
    • Supports regex.
    • Language agnostic.
  • Weaknesses:
    • No semantics or meaning.
    • Rudimentary similarity score.
    • Produces large indexes.
  • Examples: wildcard search, ILIKE search, regex, edit distance (the Levenshtein algorithm).
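
A conceptual sketch of the n-gram idea described above, building overlapping character fragments and ranking by overlap. This is only illustrative; the implementation in the video relies on ILIKE wildcard matching in Supabase:

```typescript
// Split text into overlapping character fragments of length n (trigrams when n = 3).
function ngrams(text: string, n = 3): Set<string> {
  const s = text.toLowerCase();
  const grams = new Set<string>();
  for (let i = 0; i + n <= s.length; i++) {
    grams.add(s.slice(i, i + n));
  }
  return grams;
}

// Rudimentary similarity: share of query fragments found in the chunk's fragments.
function ngramSimilarity(query: string, chunk: string): number {
  const q = ngrams(query);
  const c = ngrams(chunk);
  let hits = 0;
  for (const g of q) if (c.has(g)) hits++;
  return q.size === 0 ? 0 : hits / q.size;
}

// Even a partial or slightly mistyped code still shares most of its fragments.
console.log(ngramSimilarity("ABC-1234", "Replace inlet valve ABC-12345 before reassembly"));
```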

7. Implementation of Dynamic Hybrid Search

  • Pattern matching is integrated into the hybrid search function on Supabase.
  • An n8n sub-workflow is called for hybrid search.
  • The sub-workflow generates a dense embedding for the query.
  • The dense embedding, query text, and weights for dense vectors, sparse keywords, and pattern matching are sent to the Supabase dynamic hybrid search edge function.
  • The edge function passes the payload to a database function.
  • The database function declares variables, sets up filter logic, and performs weighted queries:
    • Vector search: queries the vector column.
    • Keyword search: queries the FTS column, scored with ts_rank.
    • Pattern match search: queries the content field using ILIKE with wildcards.
  • Reciprocal rank fusion is used to fuse the result sets based on the weights (see the sketch after this list).
  • The AI agent can now answer the question "Tell me about the product code" accurately.
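
A sketch of weighted reciprocal rank fusion over the three result lists. The k = 60 constant is a common RRF default, and the shape of the result objects is assumed rather than taken from the video's database function:

```typescript
interface RankedResult { id: string; content: string; }

// Weighted reciprocal rank fusion: each list contributes weight / (k + rank)
// per document, and documents are re-ordered by their combined score.
function reciprocalRankFusion(
  lists: { results: RankedResult[]; weight: number }[],
  k = 60,
): RankedResult[] {
  const scores = new Map<string, { doc: RankedResult; score: number }>();
  for (const { results, weight } of lists) {
    results.forEach((doc, rank) => {
      const entry = scores.get(doc.id) ?? { doc, score: 0 };
      entry.score += weight / (k + rank + 1); // rank is 0-based, so +1
      scores.set(doc.id, entry);
    });
  }
  return [...scores.values()]
    .sort((a, b) => b.score - a.score)
    .map((e) => e.doc);
}

// Example weighting for a product-code query: pattern 0.7, sparse 0.2, dense 0.1.
// const fused = reciprocalRankFusion([
//   { results: patternMatches, weight: 0.7 },
//   { results: keywordMatches, weight: 0.2 },
//   { results: vectorMatches,  weight: 0.1 },
// ]);
```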

8. Conclusion

  • The video demonstrates a dynamic hybrid search system that combines dense embeddings, sparse embeddings, and pattern matching for improved RAG performance.
  • Pattern matching is crucial for handling exact matches and fuzzy matching of codes and identifiers.
  • The AI agent can dynamically adjust the weights of different search techniques based on the query type.
  • The next step is to add a tool to load the full document or a summary of the document based on the retrieved chunks.
  • Access to the hybrid search workflows and RAG agent is available in the AI Automators community (link in description).
