What's new in the Gemma open model family
By Google for Developers
Key Concepts
- Gemma 4: The latest generation of Google DeepMind’s open-weight large language model (LLM) family.
- Open Weights: Models that allow developers to access and customize the underlying weights for fine-tuning and deployment.
- Agentic Era: The shift toward AI models capable of multi-step planning, tool use, and autonomous task completion.
- Efficiency (Intelligence per Watt): A core design philosophy focusing on high performance on local hardware with minimal resource consumption.
- Apache 2.0 License: The new, permissive licensing model for Gemma 4, facilitating easier production deployment.
- Multimodality: The ability of models to process and understand text, images, audio, and video.
- Gemmaverse: The ecosystem of over 100,000 community-created model variants and fine-tuned versions.
1. Gemma 4 Model Family Overview
Gemma 4 is designed to run "everywhere," from IoT devices to high-end consumer GPUs. The lineup includes four sizes:
- 2B (Billion Parameters): Optimized for edge and IoT devices.
- 4B: Designed for high-end phones and lower-end laptops.
- 26B (Mixture of Experts - MoE): Optimized for high efficiency and low latency on powerful machines.
- 31B (Dense): The most capable model, designed for high-quality fine-tuning.
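As a rough illustration of how the lineup maps to hardware, the sketch below encodes the four sizes as a lookup table. The device-class keys and the `pick_variant` helper are hypothetical names invented for this example; only the sizes and their intended targets come from the list above.

```python
# Hypothetical device-class labels mapped to the sizes listed above.
LINEUP = {
    "iot_edge": "2B",
    "phone_or_light_laptop": "4B",
    "workstation_low_latency": "26B (MoE)",
    "workstation_max_quality": "31B (Dense)",
}


def pick_variant(device_class: str) -> str:
    """Return the Gemma 4 size suggested for a (hypothetical) device class."""
    try:
        return LINEUP[device_class]
    except KeyError:
        raise ValueError(f"unknown device class: {device_class!r}") from None


if __name__ == "__main__":
    print(pick_variant("phone_or_light_laptop"))
```

In practice the choice would also depend on available memory and latency targets, which this table does not model.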
Key Technical Improvements:
- Context Window: Increased to 128,000 tokens for smaller models and 256,000 tokens for the 26B and 31B variants.
- Architecture: Introduces "per-layer embedding" for edge optimization, plus an MTP (Multi-Token Prediction) Drafter used for speculative decoding, which is reported to give up to 3x speedups.
- Reasoning & Function Calling: Native support for complex planning and tool interaction, making the models "agent-ready."
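The idea behind speculative decoding can be sketched with toy models: a cheap drafter proposes several tokens, and the larger target model checks them, keeping the longest prefix it agrees with. The code below is a minimal greedy variant and not Gemma's MTP Drafter implementation; `toy_target`, `toy_draft`, and the integer "tokens" are stand-ins assumed purely for illustration.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Plain one-token-at-a-time greedy decoding, used as the reference."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding; output matches greedy_decode(target, ...)."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1. The drafter proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2. The target checks each position. A real system would score all
        #    positions in one batched forward pass; here it is a plain loop.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # always keep the target's own token
            if expected != tok:
                break  # first disagreement: discard the rest of the draft
        out.extend(accepted)
    return out


# Toy stand-ins (assumptions): the drafter agrees with the target except
# when the context length is a multiple of 4.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] * 3 + 1) % 17


def toy_draft(ctx: List[int]) -> int:
    t = toy_target(ctx)
    return t if len(ctx) % 4 else (t + 1) % 17


if __name__ == "__main__":
    print(speculative_decode(toy_target, toy_draft, [1, 2, 3], max_new=10))
```

Because every kept token is the target's own choice, the result is identical to ordinary greedy decoding; the speedup in a real system comes from verifying several drafted tokens per target pass.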
2. Deployment Frameworks
Google provides three primary tiers for deploying Gemma:
- Cloud Run: The simplest option. It provides an endpoint that scales to zero when not in use, which keeps idle costs down.
- Gemini Enterprise Agent Platform (Model Garden): A middle-ground solution offering one-click deployment, managed endpoints, and "Model as a Service" (pay-per-token) options.
- Google Kubernetes Engine (GKE): The advanced tier providing full control over infrastructure, VMs, and hardware configurations (GPUs/TPUs).
3. Real-World Applications & Demos
- Agentic Workflows: The "AIventure" project demonstrates an open-source dungeon crawler where Gemma 4 writes code (HTML/CSS/JS) on the fly and executes autonomous tool calls to solve in-game puzzles.
- Mobile/Edge Autonomy: Using LiteRT and AICore, Gemma 4 runs locally on mobile devices (e.g., Pixel 10 Pro) to perform mood tracking, image-to-JSON schema conversion, and offline object identification.
- Accessibility: A prototype "running agent" uses Gemma 4 to provide real-time audio navigation cues for visually impaired runners, identifying obstacles and lane positioning.
- Robotics: The "Odum" (Open Duck Mini) project uses Gemma 4 on Raspberry Pi 5 and Jetson Orin Nano to create interactive, multimodal robots capable of speech-to-text, reasoning, and physical expression.
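The autonomous tool calls described in these demos generally follow a dispatch pattern: the model emits a structured request, and the host application looks up and runs the matching function. The sketch below assumes a JSON shape of `{"tool": ..., "args": {...}}` and a made-up `open_door` tool; neither is taken from Gemma's actual function-calling format or from the AIventure project.

```python
import json
from typing import Any, Callable, Dict

# Registry of functions the model is allowed to call (hypothetical).
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Decorator that registers a function under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def open_door(color: str) -> str:
    # Made-up in-game action used only for this example.
    return f"the {color} door is open"


def dispatch(model_output: str) -> str:
    """Parse one tool call and run it.

    Failures are returned as text rather than raised, so the result can be
    fed back to the model for another attempt.
    """
    try:
        call = json.loads(model_output)
        fn = TOOLS[call["tool"]]
        return str(fn(**call.get("args", {})))
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        return f"error: {exc!r}"


if __name__ == "__main__":
    print(dispatch('{"tool": "open_door", "args": {"color": "red"}}'))
```

Restricting execution to an explicit registry is a deliberate choice here: the model can only trigger functions the developer has opted in.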
4. Benchmarks & Performance
- Efficiency: Gemma 4 models are reported to match the performance of models 20x their size.
- Multilingualism: The 31B model ranks in the top 5 across major European languages on the EuroEval leaderboard and shows strong performance in Japanese and Southeast Asian languages.
- Reasoning: In the "FoodTruck Bench," the 31B model competes with massive closed-source models (e.g., DeepSeek v4 Pro) in deep reasoning and function-calling tasks.
5. The "Gemmaverse" & Fine-Tuning
The ecosystem has grown to over 500 million downloads. Notable variants include:
- MedGemma: Fine-tuned for healthcare and medical imaging analysis.
- Cell2Sentence: A specialized model assisting in cancer treatment research.
- Community Impact: Projects like Crane AI Labs (Swahili optimization) and government implementations (e.g., ePermit in Ukraine) demonstrate the practical utility of fine-tuning for specific regional or institutional needs.
Synthesis
Gemma 4 represents a strategic pivot toward the "agentic era," prioritizing local execution, multimodal capabilities, and developer flexibility. By moving to an Apache 2.0 license and offering a wide range of model sizes, Google DeepMind aims to lower the barrier to deploying capable models on everything from low-power IoT sensors to enterprise-grade cloud clusters. The core takeaway is an emphasis on local autonomy: devices that can see, hear, and reason without constant reliance on internet connectivity.