Offline vector search with SQLite and EmbeddingGemma
By Chrome for Developers
Key Concepts
- Vectors: Numerical representations of data (text, images, etc.) that capture semantic meaning.
- Vector Stores/Databases: Systems designed to store and query vectors efficiently.
- On-device/Local-first Applications: Applications that prioritize running computations and storing data locally on the user's device.
- Encoders/Decoders: Models used to convert data into vectors (encoders) and potentially back into a human-readable format (decoders).
- Firebase Firestore: A NoSQL cloud database with real-time synchronization capabilities, now supporting vector data types.
- SQLite: A lightweight, embedded relational database engine that can be compiled to WebAssembly (Wasm) and supports extensions.
- SQLite VEC: An extension for SQLite that adds vector search capabilities, including k-Nearest Neighbors (k-NN) queries.
- Gemma (EmbeddingGemma): A family of open models developed by Google; EmbeddingGemma is the embedding variant, suitable for on-device deployment, with a 768-dimensional embedding space.
- Transformers.js: A JavaScript library that enables running transformer models (like Gemma) for inference on the client-side using CPU and GPU.
- Ollama: A tool that simplifies the deployment of large language models, including Gemma, on servers.
- k-Nearest Neighbors (k-NN): An algorithm used in vector search to find the 'k' most similar data points to a given query point.
- WebAssembly (Wasm): A binary instruction format for a stack-based virtual machine, enabling high-performance execution of code in web browsers.
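The k-NN concept above can be sketched as a brute-force search over cosine similarity in plain JavaScript (no libraries; the toy 3-dimensional vectors and `k` value are purely illustrative, real embeddings here are 768-dimensional):

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Brute-force k-NN: return the k items most similar to `query`.
function knn(query, items, k) {
  return items
    .map(item => ({ ...item, score: cosine(query, item.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy example; real vector stores replace this scan with indexed lookups.
const docs = [
  { id: 'cat', vector: [1, 0, 0] },
  { id: 'dog', vector: [0.9, 0.1, 0] },
  { id: 'car', vector: [0, 0, 1] },
];
const top = knn([1, 0.05, 0], docs, 2);
console.log(top.map(d => d.id)); // → ['cat', 'dog']
```

Vector databases such as SQLite VEC perform the same nearest-neighbor ranking, but over optimized storage instead of a full in-memory scan.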
On-Device Vector Applications: A Hybrid Approach
Rodie, a Developer Relations Engineer at Google on the AI workflows team, discusses the advantages and implementation of on-device and local-first vector applications at the WebAI summit. The core idea is to leverage local processing for vector data, overcoming limitations of intermittent network connectivity and enhancing user privacy.
Vectors and Databases: Trade-offs and Considerations
- Vector Generation: Vectors can be generated using both hosted (server-side) and local (on-device) models. Each has trade-offs depending on application needs.
- Vector Store Size: Vector stores can become very large and often require APIs for access. This can be problematic for applications with unreliable network connections.
- Encoder Consistency: A crucial point is that the same embedding model must be used for both querying and updating documents; vectors produced by different models live in incompatible embedding spaces, so you cannot mix a powerful server-side encoder with a different lightweight on-device one.
- Server-Side Advantages: Server-side solutions benefit from ample RAM, optimized NVMe storage, and faster processing due to these resources.
- Client-Side Advantages:
- User Privacy: Vectors are stored only for the individual user, preventing cross-user data leakage.
- Partitioning: Data can be pre-partitioned for individual users on the client.
Hybrid Approach: Combining Server and Client Strengths
A hybrid approach is presented as the most appropriate solution, leveraging the strengths of both server and client.
- Server-Side Batch Encoding: Utilize server-side parallel compute for efficient batch encoding of numerous vectors.
- Firebase Firestore for Synchronization: Store these encoded vectors in Firebase Firestore, which offers vector support and a robust syncing modality.
- Vectors can be stored in a "bucket per user."
- Firestore acts as a fallback when the model is not downloaded locally.
- SQLite for Local Querying: Integrate SQLite with vector extensions (like SQLite VEC) to query vectors directly on the client.
- Vectors can be pulled from Firestore into SQLite.
- This enables incremental regeneration of documents using local encoders/decoders as users make changes, eliminating the need for constant network round trips.
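Pulling vectors from Firestore (which hands back plain number arrays) into SQLite (which stores them as BLOBs) requires a stable byte representation. A minimal sketch of the float32 round trip in plain JavaScript; the helper names are illustrative, not from the talk:

```javascript
// Serialize an embedding to the raw float32 bytes a SQLite BLOB column expects.
// Typed arrays use the platform's byte order (little-endian in practice).
function vectorToBlob(values) {
  return new Uint8Array(new Float32Array(values).buffer);
}

// Deserialize a BLOB read back from SQLite into a usable vector.
function blobToVector(bytes) {
  return new Float32Array(bytes.buffer, bytes.byteOffset, bytes.byteLength / 4);
}

const embedding = [0.25, -0.5, 1.0];  // illustrative; real vectors are 768-dim
const blob = vectorToBlob(embedding); // 12 bytes: 3 floats x 4 bytes each
const restored = blobToVector(blob);  // Float32Array [0.25, -0.5, 1]
```

The same Float32Array representation works in both directions: it can be written to Firestore as a number array for sync, and bound as a BLOB parameter for local SQLite VEC queries.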
Embedding Gemma: A Powerful On-Device Model
- Gemma Model: EmbeddingGemma is highlighted as an excellent embedding model, with approximately 308 million parameters, designed for mobile devices but also usable on servers (e.g., with Ollama on Cloud Run).
- Fallback API: It can serve as a fallback API when the model isn't downloaded, providing an ad-hoc experience.
- Dimensionality: EmbeddingGemma produces 768-dimensional embeddings, providing significant quality for various tasks. The Gemma family is configurable, and while Gemma 3n is available, this talk focuses on database and vector support without LLMs.
- Transformers.js Integration: This library allows Gemma models to run inference on client-side CPUs and GPUs, supporting the 768-dimensional vector space.
Code Snippet Example (Embedding Gemma with ONNX Runtime)
A code snippet demonstrates using the ONNX Runtime with the 300-million-parameter version of EmbeddingGemma. It outlines creating a feature-extraction pipeline, specifying task types (query, document, etc.), and normalizing vectors into a Float32Array, which is compatible with both SQLite and Firestore.
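The snippet itself is not reproduced in these notes, but the general shape of such a pipeline with Transformers.js looks roughly like the following. The model id and task prefix below are assumptions based on the talk's description, not verbatim from the slide:

```javascript
// Sketch: client-side feature extraction with Transformers.js.
// The model id and prefix are assumptions; substitute the ONNX build you deploy.
async function embedTexts(texts, prefix = 'task: search result | text: ') {
  const { pipeline } = await import('@huggingface/transformers');
  const extractor = await pipeline(
    'feature-extraction',
    'onnx-community/embeddinggemma-300m-ONNX'
  );
  // Mean-pool and normalize, then return plain Float32Arrays,
  // which both SQLite VEC and Firestore can store.
  const output = await extractor(texts.map(t => prefix + t), {
    pooling: 'mean',
    normalize: true,
  });
  return output.tolist().map(v => new Float32Array(v)); // 768 dims each
}
```

Prefixing the input with a task descriptor matters because embedding models of this kind are typically trained with distinct query and document prompts, and mixing them degrades retrieval quality.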
Firebase Firestore Vector Support
- Seamless Syncing: Firestore's vector support simplifies syncing. When an application loads, it pulls queried documents. Updates are handled incrementally, with Firestore managing the sync logic.
- Collocated Data: Vectors can be added directly to documents as a `vector` type, keeping them collocated with user data and collections.
- JavaScript SDK Example: A snippet shows how to create a Firestore application, add a document with an embedding, and utilize the `vector` type from the SDK.
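A hedged sketch of what that SDK usage looks like with the modular Firebase JS SDK; the collection and field names are illustrative, and `vector()` is the SDK's helper for constructing a vector field value:

```javascript
// Sketch: storing an embedding alongside its document in Firestore.
// Config values, collection, and field names are placeholders, not from the talk.
async function saveNoteWithEmbedding(firebaseConfig, text, embedding) {
  const { initializeApp } = await import('firebase/app');
  const { getFirestore, collection, addDoc, vector } =
    await import('firebase/firestore');

  const app = initializeApp(firebaseConfig);
  const db = getFirestore(app);

  // The embedding is stored as a first-class vector value,
  // collocated with the rest of the document's fields.
  await addDoc(collection(db, 'notes'), {
    text,
    embedding: vector(Array.from(embedding)), // number[] in, vector type out
  });
}
```

Because the vector lives on the document itself, Firestore's normal sync machinery moves it to the client along with everything else, with no separate vector API to call.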
SQLite VEC: Local Vector Databases
- SQLite VEC Extension: This project enables low-level k-NN queries directly within SQLite by extending its syntax.
- Advanced Features: SQLite VEC now includes metadata filtering, partitioning, and virtual columns.
- Optimized Queries: Storing `float32` blobs in virtual tables is optimized for queries, avoiding full table scans.
- Wasm Compilation: SQLite compiles to Wasm, allowing for the inclusion of extensions like SQLite VEC.
- GitHub Example: A forthcoming GitHub example will feature pre-installed SQLite VEC, but custom extensions can also be added.
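A sketch of wiring this up with the official SQLite Wasm package follows. Note the assumption: the stock `@sqlite.org/sqlite-wasm` build does not bundle SQLite VEC, so a build with the extension statically linked (as in the forthcoming example) is presumed:

```javascript
// Sketch: opening an in-browser SQLite database via the official Wasm build.
// Assumes a build with SQLite VEC statically linked; the stock package lacks it.
async function openVectorDb() {
  const { default: sqlite3InitModule } = await import('@sqlite.org/sqlite-wasm');
  const sqlite3 = await sqlite3InitModule();
  const db = new sqlite3.oo1.DB('/vectors.db', 'c'); // 'c' = create if missing
  // Sanity check that the vec extension is present in this build:
  // db.selectValue('SELECT vec_version()');
  return db;
}
```

From here, the database object accepts the vec0 table definitions and k-NN queries shown in the next section, entirely in the browser.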
SQLite VEC Query Example
- Importing Libraries: The example shows importing the official SQLite package and a Wasm module.
- Table Syntax: A
VEX0table syntax is used to define a table with afloat 768dimension, adaptable to different encoder/decoder types. - Client-Side Operation: All operations occur on the client, capable of handling massive datasets and millions of queries per second in the browser.
- SQL-like Querying: Querying is similar to SQL, allowing developers to create custom views.
- `MATCH` Keyword: The `MATCH` keyword, combined with vec0 functions and a k-NN query with a limit, is demonstrated for ordering results by distance.
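Put together, a vec0 table and k-NN query take roughly the shape below. Table and column names are illustrative; the `MATCH`-plus-`LIMIT` form is the k-NN pattern the talk describes:

```sql
-- Virtual table holding 768-dimensional float vectors (one row per document).
CREATE VIRTUAL TABLE doc_vectors USING vec0(
  embedding float[768]
);

-- k-NN query: MATCH takes the query vector (bound as a float32 blob),
-- LIMIT bounds the neighbor count, and results are ordered by distance.
SELECT rowid, distance
FROM doc_vectors
WHERE embedding MATCH :query_embedding
ORDER BY distance
LIMIT 5;
```

Because this is ordinary SQL, the k-NN results can be joined back against regular tables or wrapped in views, which is what enables the custom views mentioned above.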
Demo: Enhanced Emoji Search
- Application: A demo showcases a better vector search for emojis by vectorizing the entire Unicode dataset descriptions.
- Real-time Vectorization: As the user types, the query is vectorized on each keypress, returning the emojis closest to it in the embedding space.
- Offline Capability: Once the model is downloaded to the browser, the application can function completely offline.
- Broader Use Cases: This approach can be expanded to vectorize business data, tool definitions, and more, offering flexibility in how models and data are presented to users.
- GitHub Availability: The code for this demo is available on GitHub under "emojis search". Rodie is also available on GitHub, Twitter, and LinkedIn for further questions.
Conclusion
The presentation advocates for a hybrid approach to on-device vector applications, combining server-side batch processing with client-side storage and querying using Firebase Firestore and SQLite VEC. This strategy enhances user privacy, improves performance, and enables offline functionality, opening up a wide range of new application possibilities. The use of models like Embedding Gemma and libraries like Transformers.js further empowers developers to build sophisticated AI-powered experiences directly on the client.