How Google DeepMind Runs Agents at Scale — KP Sawhney & Ian Ballantyne, Google DeepMind

By AI Engineer

AI Agent Frameworks, LLM Infrastructure, AI-Assisted Software Engineering

Key Concepts

Agentic Software/Harness: Software systems designed to perform autonomous tasks, plan workflows, and utilize tools to achieve specific goals.
Antigravity: An internal Google DeepMind IDE-integrated platform that manages multiple agents, project workflows, and tool execution.
Skills: Modular, reusable functions or capabilities that agents can invoke to perform specific tasks (e.g., debugging, log analysis).
Token Economy/Quota Management: The challenge of managing high computational costs and token consumption in agentic systems.
Human-in-the-loop (HITL): A design philosophy where human oversight is integrated into the agent’s workflow, allowing for intervention, feedback, and approval.
Agent Trajectory Store: A system for logging and auditing the step-by-step actions of an agent to diagnose failures or "looping" behaviors.
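
The idea of a trajectory store that helps diagnose "looping" can be illustrated with a minimal sketch. The summary does not describe DeepMind's actual schema or detection logic, so everything here is an assumption made for illustration: the `Step` fields, the in-memory list, and the window/threshold heuristic are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    """One logged agent action (hypothetical schema, not from the talk)."""
    tool: str
    args: str


@dataclass
class TrajectoryStore:
    """Append-only log of an agent's steps, kept in memory for illustration."""
    steps: list[Step] = field(default_factory=list)

    def record(self, tool: str, args: str) -> None:
        self.steps.append(Step(tool, args))

    def looping(self, window: int = 6, threshold: int = 3) -> bool:
        """Flag a possible loop when an identical step occurs at least
        `threshold` times within the last `window` recorded steps."""
        recent = Counter(self.steps[-window:])
        return any(count >= threshold for count in recent.values())


store = TrajectoryStore()
for _ in range(3):
    store.record("read_file", "main.py")
print(store.looping())
```

A real store would presumably persist steps and link each one to the underlying model request, but the summary gives no detail on that.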

1. Main Topics and Frameworks

The presentation focuses on how Google DeepMind builds and scales agentic software. The core framework discussed is Antigravity, an IDE-integrated environment that allows developers to spawn multiple agents to work on complex projects.

The Agentic Workflow: Agents in this system are capable of analyzing specifications, implementing code, interacting with browsers (DOM inspection), and providing progress reports.
The "Digital Assembly Line": The speakers describe the future of AI development as humans acting as supervisors on a digital assembly line, where agents handle the execution of tasks, and humans provide high-level guidance and review.

2. Real-World Applications and Use Cases

Deep Research Agent: A tool available via the Interactions API that automates complex research tasks. The team is currently working on integrating this into the Antigravity harness to allow research components to collaborate within a shared file system.
Coding and PR Reviews: Agents are used to generate code, rewrite entire projects based on specs, and perform automated code reviews. One speaker noted receiving a PR review comment from an agent that they had not explicitly triggered, highlighting the autonomous nature of these systems.
Game Interaction: The demo showcased an agent attempting to learn and play a web-based game by inspecting the DOM and figuring out controls autonomously.

3. Methodologies and Processes

Skill Library: DeepMind maintains a library of "skills." To prevent "skill sprawl," they employ a Darwinian approach where only the most effective skills are maintained and promoted.
Observability: The team uses a custom web app to monitor agent backends. This allows developers to drill down into the hierarchy of agent actions, all the way to the raw prediction requests sent to the model.
Evaluation: Evaluating agent success is described as "meta" and difficult. The team uses mock TPUs to test the harness and agentic flows without consuming excessive production compute resources.
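
The "Darwinian" approach to the skill library can be sketched as a simple selection rule. The summary does not say how effectiveness is measured, so this sketch assumes a per-skill success rate; the `SkillStats` type, the `select_survivors` function, and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SkillStats:
    """Usage counters for one skill (hypothetical; the metric is an assumption)."""
    name: str
    invocations: int = 0
    successes: int = 0

    @property
    def success_rate(self) -> float:
        return self.successes / self.invocations if self.invocations else 0.0


def select_survivors(stats, min_invocations=5, min_success_rate=0.6):
    """Return the names of skills to keep.

    Skills with fewer than `min_invocations` uses are kept, since there is
    too little data to judge them; the rest must meet `min_success_rate`.
    """
    return [
        s.name
        for s in stats
        if s.invocations < min_invocations or s.success_rate >= min_success_rate
    ]


library = [
    SkillStats("debugging", invocations=10, successes=9),
    SkillStats("log_analysis", invocations=10, successes=2),
    SkillStats("new_skill", invocations=1, successes=0),
]
print(select_survivors(library))
```

Keeping rarely used skills until enough data accumulates is one design choice among several; a stricter library might retire them after a fixed period instead.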

4. Key Arguments and Challenges

Scaling and Compute Costs: A primary challenge is the "token-hungry" nature of agents. The speakers argue that future systems must seamlessly switch between models (e.g., using high-end models for complex tasks and cheaper models like Gemma 4 for simpler ones) to manage costs.
Quota Management: Currently, quota management is handled via "brute force" (stopping jobs when limits are hit). The goal is to build systems that handle these transitions gracefully without interrupting the user's workflow.
Agent-to-Agent Communication: While current systems allow for multiple tracks of operation, the speakers identify efficient agent-to-agent communication as the next frontier for scaling complex, multi-step projects.
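
The model-switching and quota points above can be combined in a small routing sketch: pick a model tier by task complexity, and fall back to another tier when a quota is exhausted rather than stopping the job. This is not the speakers' implementation. The `ModelTier` type, the `pick_model` function, the tier names, and the token numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelTier:
    """A model option with a remaining token quota (hypothetical)."""
    name: str
    remaining_tokens: int


def pick_model(tiers, complex_task, estimated_tokens):
    """Choose a tier for a task and deduct its estimated token cost.

    `tiers` is ordered from most capable to cheapest. Complex tasks try
    the most capable tier first; simple tasks try the cheapest first.
    If the preferred tier lacks quota, the next one is tried instead of
    failing outright. Returns None when no tier has enough quota left.
    """
    order = tiers if complex_task else list(reversed(tiers))
    for tier in order:
        if tier.remaining_tokens >= estimated_tokens:
            tier.remaining_tokens -= estimated_tokens
            return tier.name
    return None


tiers = [ModelTier("large-model", 1000), ModelTier("small-model", 5000)]
print(pick_model(tiers, complex_task=True, estimated_tokens=800))
print(pick_model(tiers, complex_task=True, estimated_tokens=800))
```

In this sketch the second complex task is routed to the cheaper tier because the first call used most of the larger tier's quota, which is the kind of graceful transition the speakers describe as a goal.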

5. Notable Quotes

"I think that’s going to be the future: how do we make agent-to-agent communication efficient and then also how do we give us as the human the ability to really shape that and almost act like a supervisor on a digital assembly line?" - Ian Ballantyne
"I’m definitely team skills... we have these skills contributed by folks who are absolute experts in that particular area, and then I kind of I and the agent get that knowledge for free." - KP Sawhney

6. Synthesis and Conclusion

The session highlights that Google DeepMind is moving toward a highly modular, skill-based agentic architecture. The transition from simple chatbots to autonomous agents requires robust infrastructure for observability, cost management, and human oversight. The ultimate vision is a collaborative environment where agents handle the "boring" execution of tasks—from coding to research—while humans focus on high-level supervision and quality control. The immediate focus for the team remains on optimizing the "harness" to make these systems faster, cheaper, and more reliable at scale.
