Intelligence is Getting MORE Expensive (Google I/O 2026, with Sam Witteveen)
By Prompt Engineering
Key Concepts
- Agentic Workflows: The shift from simple task-based AI to long-running, autonomous agents (e.g., Gemini Spark).
- Model vs. Product: The industry trend in which the utility of the product (integration, context, UX) matters more than raw model performance.
- Compute Scarcity & Pricing: The realization that "intelligence" is not becoming cheaper; models are becoming more verbose and compute-intensive, leading to higher costs.
- RL Environments: The use of specialized reinforcement learning environments (e.g., Excel/spreadsheet simulations) to train models for complex skill execution.
- Harness Engineering: The transition from using generic agent frameworks to building custom, application-specific harnesses for observability, sandboxing, and error recovery.
- Jagged Intelligence: The observation that AI models exhibit genius-level performance in specific domains while failing at simple tasks in others.
1. The Shift: Products Over Models
The speakers argue that the AI industry has entered a new era where the focus is shifting from the underlying models (like Gemini 3.5 Flash) to the products built around them.
- Context is King: Google’s competitive advantage lies in its deep integration with user data (Gmail, Calendar, YouTube).
- Personal Agents: Products like "Gemini Spark" represent a move toward long-running agents that perform "cron-job" style tasks, such as monitoring the web or managing personal workflows.
- Ecosystem Integration: Support for MCP (Model Context Protocol) lets these agents extend beyond the Google ecosystem. The resulting third-party integrations also signal to hyperscalers which startups are gaining traction and may be acquisition targets.
2. The Economics of Intelligence
Contrary to the narrative that AI tokens would become "too cheap to meter," the speakers observe a trend toward higher costs.
- Verbosity: Newer models are increasingly verbose, consuming more tokens to reach an answer.
- Chain of Thought (CoT): While CoT improves reasoning, it increases token usage. Labs are currently working to optimize these chains to be shorter while maintaining accuracy.
- Enterprise Demand: Large enterprise customers are consuming trillions of tokens per day, which contributes to the current compute scarcity and hardware shortages.
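The verbosity point can be made concrete with a little arithmetic. The sketch below uses hypothetical token counts and prices, which are not figures from the talk, to show how the cost per answer can rise even when the per-token price falls.

```python
def cost_per_answer(output_tokens: int, price_per_million_usd: float) -> float:
    """Cost in USD of one answer, counting output tokens only."""
    return output_tokens * price_per_million_usd / 1_000_000


# Hypothetical numbers, chosen only to illustrate the arithmetic.
# The "newer" model is half the price per token but emits 8x the tokens
# (for example, because of a long chain of thought).
older = cost_per_answer(output_tokens=500, price_per_million_usd=10.0)
newer = cost_per_answer(output_tokens=4_000, price_per_million_usd=5.0)

print(f"older model: ${older:.4f} per answer")
print(f"newer model: ${newer:.4f} per answer")
print(f"cost ratio: {newer / older:.1f}x")
```

Under these assumed numbers the newer model costs four times as much per answer, which is the effect the speakers describe: the price per token is only half of the bill, and tokens per answer is the other half.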
3. Methodologies: RL and Sandboxing
The speakers highlight that modern AI capabilities—such as skill use and tool calling—cannot be achieved through pre-training alone.
- Post-Training: Effective agents require rigorous post-training in RL environments. Training on thousands of generated "perfect" examples (e.g., Excel files) teaches models to navigate complex tasks.
- Managed Agents vs. SDKs: There is a split in the market between "managed agents" (all-in-one solutions with built-in observability and sandboxing) and "agent SDKs" for companies that require private, custom infrastructure.
- The Death of Frameworks: The speakers suggest that generic agent frameworks are becoming obsolete, replaced by "harness engineering"—the practice of building custom wrappers for hosting, logging, and auto-restarting agents.
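To give a sense of what "harness engineering" might look like at its smallest, here is a sketch of a wrapper that logs each run and restarts the agent after a crash. The names `run_with_restarts`, `agent_step`, and `flaky_agent` are hypothetical, and the "agent" is a stand-in function rather than a real model call. A production harness would also need sandboxing and persistent state, which this sketch leaves out.

```python
import logging
import time
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("harness")


def run_with_restarts(
    agent_step: Callable[[], str],
    max_restarts: int = 3,
    backoff_s: float = 0.0,
) -> str:
    """Run agent_step, logging failures and restarting up to max_restarts times."""
    for attempt in range(1, max_restarts + 2):
        try:
            result = agent_step()
            log.info("agent finished on attempt %d", attempt)
            return result
        except Exception as exc:
            if attempt > max_restarts:
                log.error("giving up after %d attempts: %s", attempt, exc)
                raise
            log.warning("attempt %d failed (%s); restarting", attempt, exc)
            time.sleep(backoff_s * attempt)  # simple linear backoff
    raise RuntimeError("unreachable when max_restarts >= 0")


# Stand-in agent that fails twice and then succeeds.
calls = {"n": 0}


def flaky_agent() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("simulated tool failure")
    return "done"


outcome = run_with_restarts(flaky_agent)
print(outcome, calls["n"])
```

The point of writing this by hand rather than adopting a generic framework is that the retry policy, the log format, and the failure handling can all be fitted to one application.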
4. Open Source and Local AI
The discussion on open-source models (e.g., Gemma 4, Qwen) suggests a tiered future:
- Small Models: User-focused, capable of running locally on devices (e.g., Android phones) using per-layer embedding tricks.
- Mid-Size Models: Ideal for small business applications and specialized workflows.
- Large Models: Enterprise-grade models (1T+ parameters) that require significant hardware, often hosted on-premise by banks or hospitals for privacy and control.
- Licensing: The speakers note that open-weight models are increasingly subject to strict licensing, and some "open" releases are actually fine-tuned versions of proprietary models (e.g., the Cursor/Kimi controversy).
5. Synthesis and Conclusion
The main takeaway is that the "AI race" is no longer just about who has the smartest model, but who can best manage the "jagged intelligence" of these systems.
- Actionable Insight: Practitioners should stop relying on general benchmarks and instead build their own evaluation (eval) suites tailored to their specific use cases.
- Future Outlook: The industry is moving toward a focus on inference efficiency and application-specific harnesses. As the speakers conclude, the real winners are the humans who can leverage these tools to solve problems that were previously cost-prohibitive or impossible.
Notable Quote: "We’re dealing with such jagged intelligence now that the model can be a genius many times over in one area and dumber than a 5-year-old kid in other areas." — Sam (attributed to the speakers' consensus on model reliability).
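Jagged intelligence is the reason general benchmarks can mislead: a model may score well overall and still fail on the tasks that matter to you. A minimal sketch of a use-case-specific eval suite follows. `EvalCase`, `run_evals`, and `toy_model` are hypothetical names, and `toy_model` is a placeholder to be replaced with a call to whichever model is under test.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # returns True when the output is acceptable


def run_evals(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases whose output passes its check."""
    passed = sum(1 for case in cases if case.check(model(case.prompt)))
    return passed / len(cases)


# Placeholder model: answers one question correctly and fails the other.
def toy_model(prompt: str) -> str:
    return "4" if prompt == "What is 2 + 2?" else "I am not sure."


cases = [
    EvalCase("What is 2 + 2?", lambda out: out.strip() == "4"),
    EvalCase("Name the capital of France.", lambda out: "Paris" in out),
]

score = run_evals(toy_model, cases)
print(f"pass rate: {score:.0%}")
```

In practice the cases would come from real examples in your own workflow, and the checks would encode what "acceptable" means for that workflow.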