Gemini Nano on device — Florina Muntenescu & Oli Gaymond, Google DeepMind

By AI Engineer

Key Concepts

  • AI Core: A system-level service on Android that manages, optimizes, and provides access to on-device AI models (specifically Gemini Nano).
  • Gemini Nano: Google’s most efficient, on-device Large Language Model (LLM) optimized for Android hardware.
  • ML Kit GenAI APIs: A set of APIs that allow developers to integrate generative AI features (summarization, proofreading, rewriting, and general prompting) into Android apps.
  • Hybrid Inference: A strategy that uses on-device models when available for privacy and low latency, and falls back to cloud-based models (via Firebase AI Logic) for broader device reach or more complex tasks.
  • LiteRT-LM: A framework for developers who need to deploy their own custom models rather than the standard Gemini Nano offering.
  • RAG (Retrieval-Augmented Generation): A technique for enhancing LLM responses by providing external data; supported via the Prompt API, with upcoming native embedding support.

1. Building Intelligent Experiences on Android

Developers can choose between three primary deployment architectures:

  • On-Device: Processes data locally. Benefits include zero data transmission (privacy), offline capability, and no inference costs.
  • Cloud: Sends requests to larger, more capable models (Gemini Pro/Flash) via Vertex AI or the Gemini Developer API.
  • Hybrid: Automatically switches between on-device and cloud based on device capability, maximizing both performance and reach.
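
The hybrid option above can be sketched as a small router that tries the local model first and falls back to the cloud. This is a language-neutral illustration in Python, not the Firebase AI Logic or ML Kit API: `HybridRouter`, `fake_local`, `fake_cloud`, and `broken_local` are hypothetical names, and the choice of `RuntimeError` as the "local model unavailable" signal is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class HybridRouter:
    """Routes a prompt on-device when possible, otherwise to a cloud backend."""
    on_device: Optional[Callable[[str], str]]  # None when the device has no local model
    cloud: Callable[[str], str]

    def generate(self, prompt: str) -> Tuple[str, str]:
        """Return (backend_used, response)."""
        if self.on_device is not None:
            try:
                return "on_device", self.on_device(prompt)
            except RuntimeError:
                # Assumed failure signal (e.g. model not ready); fall through to cloud.
                pass
        return "cloud", self.cloud(prompt)


# Hypothetical stand-ins for real model calls.
def fake_local(prompt: str) -> str:
    return f"local:{prompt}"


def fake_cloud(prompt: str) -> str:
    return f"cloud:{prompt}"


def broken_local(prompt: str) -> str:
    raise RuntimeError("model unavailable")
```

Returning the backend name alongside the response lets the caller log which path served each request, which is useful when measuring how often the fallback is actually taken.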

2. The Role of AI Core

AI Core acts as a centralized system service. Instead of every app bundling its own 1GB–4GB model, the system manages a single instance of Gemini Nano.

  • Hardware Optimization: It ensures the model is optimized for the specific silicon of the device at runtime.
  • Resource Management: It handles scheduling, queuing, and battery optimization. Foreground apps receive priority, while background tasks are managed to prevent system degradation.
  • Privacy: Data processed via AI Core is isolated; input and output data are not stored on the device.
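
The talk summary does not describe how AI Core schedules requests internally, so the following is only a sketch of the general idea of "foreground first, queue the rest". It uses a priority queue with a counter as tie-breaker so that requests of equal priority are served in arrival order. `InferenceQueue`, `FOREGROUND`, `BACKGROUND`, and `drain` are hypothetical names.

```python
import heapq
import itertools
from typing import List, Tuple

FOREGROUND, BACKGROUND = 0, 1  # lower value is served first


class InferenceQueue:
    """Toy request queue: foreground requests are dequeued before background ones."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        # Monotonic counter keeps FIFO order among requests with the same priority.
        self._counter = itertools.count()

    def submit(self, request_id: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), request_id))

    def next_request(self) -> str:
        _, _, request_id = heapq.heappop(self._heap)
        return request_id

    def __len__(self) -> int:
        return len(self._heap)


def drain(queue: InferenceQueue) -> List[str]:
    """Pop every request and return the IDs in the order they would be served."""
    order = []
    while len(queue):
        order.append(queue.next_request())
    return order
```

Because a single shared service owns the queue, one app's background batch cannot starve another app's interactive request, which is the system-level benefit the section describes.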

3. ML Kit GenAI APIs

These APIs provide a simplified interface for developers to interact with Gemini Nano:

  • Task-Specific APIs: Pre-built functions for summarization, proofreading, and rewriting.
  • Prompt API: The most versatile tool, supporting text and image inputs with text outputs. It is ideal for content analysis, entity extraction, and general assistance.
  • Future Roadmap: Google is actively working on adding an Embedding API to facilitate RAG-like solutions and vectorization of local data.
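
To make the RAG idea concrete, here is a minimal sketch of the retrieve-then-prompt flow. It does not use the ML Kit Prompt API or the planned Embedding API: `embed` is a toy bag-of-words stand-in for a learned embedding model, and `retrieve`, `build_prompt`, and `NOTES` are hypothetical names chosen for illustration.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: List[str], k: int = 1) -> List[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, docs: List[str], k: int = 1) -> str:
    """Prepend the retrieved local context to the user's question."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical local data, e.g. a user's notes stored on the device.
NOTES = [
    "dentist appointment on friday at 3pm",
    "buy oat milk and coffee beans",
    "flight to lisbon departs monday morning",
]
```

The string returned by `build_prompt` is what would be passed to the model as the prompt, so the local data never has to leave the device when the model itself runs on-device.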

4. Addressing Technical Challenges

  • Battery and Performance: Running LLMs has a power cost, but the speakers noted that at typical interactive usage (10–20 times a day) the impact on battery is negligible. For batch processing, developers are encouraged to run tasks while the device is charging.
  • Device Fragmentation: Gemini Nano is currently targeted at flagship devices (e.g., Pixel 9/10 generation). For broader reach, developers should use Firebase AI Logic to implement hybrid inference, so the app remains functional on older or non-flagship hardware.
  • Customization: For developers who need specific model behavior, LiteRT-LM supports custom model deployment, but developers must then handle their own profiling and testing across the device ecosystem.
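
The advice to defer batch work until the device is charging can be sketched as a queue that only runs its jobs when told the device is on power. `BatchScheduler` and `on_power_state` are hypothetical names; on Android the same effect is usually achieved by declaring a charging constraint on background work (for example WorkManager's `setRequiresCharging(true)`) rather than by hand-written code like this.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BatchScheduler:
    """Holds batch inference jobs and runs them only while the device is charging."""
    pending: List[Callable[[], str]] = field(default_factory=list)

    def enqueue(self, job: Callable[[], str]) -> None:
        self.pending.append(job)

    def on_power_state(self, is_charging: bool) -> List[str]:
        """Run and clear all pending jobs if charging; otherwise do nothing."""
        if not is_charging:
            return []
        results = [job() for job in self.pending]
        self.pending.clear()
        return results
```

Interactive requests would bypass a scheduler like this entirely; only work the user is not actively waiting on should be deferred.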

5. Key Perspectives and Quotes

  • On Centralization: "If we put this in the system, we do that once and everyone can share and benefit from it... that shares that cost." (Oli Gaymond, PM for Android AI)
  • On User Value: "If the user feels they're getting value out of the app, they're very happy to use that [battery]. If, however, the app is doing something that doesn't provide a lot of value... they might not want to use that." (Oli Gaymond, PM for Android AI)
  • On Developer Experience: The goal of the Android AI team is to provide consistent APIs that allow developers to "blend what is needed" between local and cloud resources without needing to manage complex infrastructure.

6. Synthesis

The Android AI ecosystem is moving toward a "system-first" approach where complex LLM infrastructure is abstracted away via AI Core. By providing a unified API surface (ML Kit), Google enables developers to build privacy-focused, low-latency features that scale across devices. While flagship devices benefit from local Gemini Nano execution, the Hybrid Inference model ensures that developers can maintain a consistent user experience across the entire Android spectrum by intelligently routing requests to the cloud when necessary.
