Gemini Omni, Gemini 3.2 Flash, a 12M Context Window Model, Claude Replaces Analysts, & More! AI NEWS
By WorldofAI
Key Concepts
- Sub-Quadratic Sparse Attention: A novel architecture that optimizes compute by focusing only on relevant word relationships, enabling massive context windows.
- Multi-token Prediction: An inference technique where models generate multiple tokens simultaneously to increase speed without sacrificing quality.
- Omni Models: Multimodal AI systems capable of native video generation and processing.
- Agentic Workflows: AI systems designed to autonomously execute end-to-end tasks (e.g., financial analysis, lead outreach) using pre-built templates.
- Context Window: The amount of data (tokens) an AI model can process at once; the largest figure covered here is the 12 million tokens reported for SubQ's model.
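To make the multi-token prediction idea concrete, here is a minimal sketch of one common way to realize it: a cheap "drafter" proposes several tokens, and the main model keeps only the ones it agrees with. This is a toy illustration under stated assumptions, not the implementation used by any model in this summary. The names `speculative_step`, `toy_draft`, and `toy_target` are hypothetical, and the "models" are simple functions over integer tokens.

```python
from typing import Callable, List

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def speculative_step(prefix: List[Token], draft_next: NextTokenFn,
                     target_next: NextTokenFn, k: int) -> List[Token]:
    """Draft k tokens cheaply, then keep the longest run the target agrees with."""
    # 1. The drafter proposes k tokens in a row.
    ctx = list(prefix)
    drafted: List[Token] = []
    for _ in range(k):
        tok = draft_next(ctx)
        drafted.append(tok)
        ctx.append(tok)

    # 2. The target checks each drafted token. A real system would do this
    #    check in one batched forward pass; here it is a loop for clarity.
    ctx = list(prefix)
    accepted: List[Token] = []
    for tok in drafted:
        expected = target_next(ctx)
        if expected != tok:
            accepted.append(expected)  # the target's token replaces the mismatch
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft was accepted, so the target adds one more token.
    accepted.append(target_next(ctx))
    return accepted


def generate(prefix: List[Token], draft_next: NextTokenFn,
             target_next: NextTokenFn, n_new: int, k: int) -> List[Token]:
    """Repeat speculative steps until n_new tokens have been produced."""
    out = list(prefix)
    while len(out) - len(prefix) < n_new:
        out.extend(speculative_step(out, draft_next, target_next, k))
    return out[: len(prefix) + n_new]


def greedy(prefix: List[Token], target_next: NextTokenFn, n_new: int) -> List[Token]:
    """Baseline: the target model alone, one token at a time."""
    out = list(prefix)
    for _ in range(n_new):
        out.append(target_next(out))
    return out


def toy_target(ctx: List[Token]) -> Token:
    # Hypothetical "large model": counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    # Hypothetical "drafter": agrees with the target except after token 4.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10
```

Because every emitted token is either confirmed or supplied by the target, `generate` produces the same sequence as `greedy`; the speedup comes from accepting several tokens per verification step when the drafter guesses well.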
1. Google’s AI Ecosystem Updates
Google is aggressively preparing for its I/O conference with multiple model variants currently in A/B testing:
- Gemini 3.2 Flash: Positioned as an "all-rounder" model, it combines high-speed performance with reasoning capabilities comparable to Gemini 3.1 Pro. Leaked pricing suggests $0.25 per 1M input tokens and $2 per 1M output tokens, with a January 2026 knowledge cutoff.
- New Checkpoints: Google is testing four new variants: Ajax, Hercules, Hector, and Orpheus.
- Omni & Video: Leaks suggest a new Omni model integrated with "Toucan" (an internal codename for video systems powered by Veo), potentially enabling native video generation within Gemini.
- Project Mariner Evolution: Google has sunset the "Project Mariner" web-browsing agent, shifting focus toward a persistent, 24/7 AI personal agent integrated directly into the Gemini app.
- Gemma 4 & Tools: Gemma 4 now features multi-token prediction drafters, increasing inference speeds by up to 3x. Google AI Studio has added Nano Banana for custom image asset generation, along with a redesigned visual edit tool. NotebookLM received updates to its mind-mapping features. Pomelli was introduced as a free tool for generating marketing campaigns and AI-powered product photoshoots.
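As a rough guide to what the leaked Gemini 3.2 Flash pricing would mean per request, the arithmetic can be sketched as follows. The two rates are the leaked figures quoted above and may not match final pricing; the function name `request_cost_usd` and the example token counts are illustrative assumptions.

```python
# Leaked (unconfirmed) rates from the summary, in USD per 1M tokens.
INPUT_USD_PER_MILLION = 0.25
OUTPUT_USD_PER_MILLION = 2.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its input and output token counts."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: a 200,000-token document with a 4,000-token answer.
    print(f"${request_cost_usd(200_000, 4_000):.4f}")
```

At these rates output tokens cost eight times as much as input tokens, so long answers weigh more heavily on the bill than long prompts.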
2. Breakthrough in Model Architecture: SubQ
A company called SubQ has introduced a model utilizing a fully sub-quadratic sparse attention architecture.
- Technical Significance: By skipping attention between tokens that are not relevant to each other, it reportedly supports a 12 million token context window.
- Performance Metrics: It is reported to be 52 times faster than FlashAttention at 1 million tokens and to require 1,000 times less compute, at less than 5% of the cost of models like Claude Opus.
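To see why sub-quadratic attention matters at this scale, it helps to count how many token pairs each approach has to score. The sketch below compares full causal attention with a fixed sliding window. The sliding window is only a stand-in chosen for illustration: the summary does not describe how SubQ selects relevant tokens, and the function names and the window size of 4,096 are assumptions.

```python
def causal_dense_pairs(n_tokens: int) -> int:
    """Pairs scored when every token attends to itself and all earlier tokens."""
    return n_tokens * (n_tokens + 1) // 2


def causal_windowed_pairs(n_tokens: int, window: int) -> int:
    """Pairs scored when every token attends to at most `window` recent tokens."""
    if n_tokens <= window:
        return causal_dense_pairs(n_tokens)
    # The first `window` tokens see everything before them; the rest see `window`.
    return causal_dense_pairs(window) + (n_tokens - window) * window


if __name__ == "__main__":
    window = 4_096  # illustrative value, not taken from the source
    for n in (100_000, 1_000_000, 12_000_000):
        dense = causal_dense_pairs(n)
        sparse = causal_windowed_pairs(n, window)
        print(f"{n:>12,} tokens: dense/sparse pair ratio = {dense / sparse:,.0f}x")
```

Dense cost grows with the square of the sequence length while the windowed cost grows linearly, so the gap widens as the context gets longer. The reported 52x and 1,000x figures are SubQ's own claims and are not derived from this toy count.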
3. OpenAI and Anthropic Developments
- OpenAI: Released GPT-5.5 Instant, an optimized version of its flagship model designed for real-time use. It features improved factual accuracy, particularly in high-stakes domains like medicine, law, and finance.
- Anthropic: Launched a suite of Claude agent templates specifically for the financial sector. These templates automate repetitive tasks such as pitch building, meeting preparation, and valuation reviews, effectively creating a "digital workforce" that could replace entry-level analyst roles.
4. Perplexity’s Financial Expansion
Perplexity is positioning itself as a financial operating system with the launch of the Perplexity Computer Finance Agent.
- Integration: It plugs into licensed data from providers like Morningstar, PitchBook, and Carbon Arc.
- Functionality: It includes 35 dedicated finance workflows to automate weekly analyst tasks, directly competing with Anthropic’s enterprise offerings.
Synthesis and Conclusion
The AI landscape is currently defined by a shift from simple chatbot interfaces to autonomous agentic workflows and architectural efficiency. Google is consolidating its lead by integrating multimodal capabilities (Omni/Veo) and faster inference (Gemma 4) into a unified ecosystem. Simultaneously, the industry is moving toward massive context windows (SubQ’s 12M tokens) and specialized enterprise automation (Anthropic and Perplexity’s financial agents). The upcoming Google I/O conference is expected to be the catalyst for the next generation of flagship model releases, likely centered around the Gemini 3.2 series.