The future of Cloud AI: Mastering MCP servers, Gemini, and agentic workflows
By Google Cloud Tech
Key Concepts
- ADK (Agent Development Kit): An open-source framework for building enterprise-ready, production-grade AI agents.
- Planner Agent: A specialized agent architecture designed to orchestrate complex, multi-step tasks by coordinating other agents and tools.
- Skills: A modular framework for agent capabilities consisting of YAML metadata (for context-efficient loading) and markdown bodies (containing logic, scripts, or references).
- MCP (Model Context Protocol): A standard for connecting AI agents to external services, databases, and APIs (e.g., Google Maps).
- Grounding: The process of ensuring AI outputs are based on real-world data (e.g., GeoJSON coordinates) rather than probabilistic hallucinations.
- Non-deterministic Criteria: Guidelines or requirements (like race logistics) that are codified in text rather than hard-coded logic, allowing the model to interpret and apply them contextually.
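To make the MCP entry above more concrete: MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method. The sketch below only builds such a request. The tool name `search_places` and its arguments are hypothetical, not taken from the talk or from any particular server. A real server advertises its actual tools in response to `tools/list`.

```python
import json
from itertools import count

_request_ids = count(1)  # each JSON-RPC request needs its own id


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call("search_places", {"query": "Bellagio fountains, Las Vegas"})
wire_message = json.dumps(request)  # what would be sent over the transport
print(wire_message)
```

In practice an MCP client library handles the transport and this framing; the point is that the agent talks to every external service through the same message shape.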
1. Agent Architecture and Development
The speaker emphasizes that building agents in 2026 requires moving beyond simple prompt engineering toward a structured, modular approach. The ADK serves as the foundational library, supporting multiple languages including Python, Go, TypeScript, and Java.
- The Brain: While the framework is model-agnostic (supporting Claude, GPT, or local GKE-hosted models), the speaker highlights Gemini 3.1 (Flash/Pro) as the preferred model for these agents.
- Context Management: To maintain efficiency, the system uses a "lazy loading" approach for skills. The agent only loads the lightweight YAML metadata into its context initially. The full markdown body (containing complex scripts or documentation) is only fetched when the agent determines that a specific task requires that skill.
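The lazy-loading idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the ADK's actual loader: the `SkillRegistry` class, the skill names, and the in-memory `SKILL_FILES` store are all assumptions. The front-matter parser handles only flat `key: value` pairs, where a real implementation would use a YAML parser.

```python
def parse_skill(text: str) -> tuple[dict, str]:
    """Split a skill file into (metadata, markdown body).

    Only flat `key: value` front matter is handled here; a real loader
    would use a proper YAML parser.
    """
    _, header, body = text.split("---", 2)
    metadata = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        metadata[key.strip()] = value.strip()
    return metadata, body.strip()


# Hypothetical skill files, kept in memory so the sketch is self-contained.
SKILL_FILES = {
    "gis-spatial-engineering": """---
name: gis-spatial-engineering
description: Validate route length and city boundaries from GeoJSON.
---
# GIS Spatial Engineering
Run the route-length script before proposing a course.
""",
    "race-director": """---
name: race-director
description: Apply race logistics requirements to a proposed route.
---
# Race Director
Check water station spacing against the planning guide.
""",
}


class SkillRegistry:
    """Keeps metadata for every skill; parses a body only on first use."""

    def __init__(self, files: dict[str, str]):
        self._files = files
        self._bodies: dict[str, str] = {}
        self.catalog = {name: parse_skill(text)[0] for name, text in files.items()}

    def context_summary(self) -> str:
        """The lightweight text that would sit in the agent's context."""
        return "\n".join(
            f"- {meta['name']}: {meta['description']}" for meta in self.catalog.values()
        )

    def load_body(self, name: str) -> str:
        """Fetch the full markdown body when a task needs this skill."""
        if name not in self._bodies:
            self._bodies[name] = parse_skill(self._files[name])[1]
        return self._bodies[name]

    @property
    def loaded_skills(self) -> list[str]:
        return list(self._bodies)


registry = SkillRegistry(SKILL_FILES)
print(registry.context_summary())           # metadata only
body = registry.load_body("race-director")  # body fetched on demand
print(registry.loaded_skills)
```

The context cost of registering a skill stays at one short line per skill, regardless of how long its body is.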
2. Real-World Application: Marathon Simulation
The presentation features a 3D simulation of Las Vegas used to plan a marathon for 10,000 runners. The agent utilizes three primary skills to solve this:
- GIS Spatial Engineering: Uses Python scripts to process GeoJSON data. The scripts compute the route length to confirm it matches the marathon distance of 42.195 km (about 26.2 miles) and apply geofencing so the route stays within city boundaries instead of straying into the desert.
- Mapping Skill: Integrates with the Google Maps MCP server. This allows the agent to perform natural language queries to find landmarks (e.g., Bellagio fountains, the Sphere) and access historical weather data to optimize the race date.
- Race Director Skill: Converts unstructured documentation (originally in a Google Doc) into a machine-readable skill. This provides the agent with "soft" requirements, such as lane width, water station spacing, and porta-potty density, ensuring the plan is logistically viable.
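The kind of check the GIS skill performs can be sketched as follows. This is an illustration under stated assumptions, not the demo's actual script: the function names, the tolerance, the sample coordinates, and the bounding box are all made up, and the geofence is simplified to a rectangular bounding box where a real check would test against the city boundary polygon.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius
MARATHON_KM = 42.195


def haversine_km(a, b):
    """Great-circle distance between two GeoJSON positions ([lon, lat, ...])."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def route_length_km(feature):
    """Sum the segment lengths of a GeoJSON LineString feature."""
    coords = feature["geometry"]["coordinates"]
    return sum(haversine_km(p, q) for p, q in zip(coords, coords[1:]))


def inside_bounds(feature, bounds):
    """True if every vertex lies inside (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bounds
    return all(min_lon <= p[0] <= max_lon and min_lat <= p[1] <= max_lat
               for p in feature["geometry"]["coordinates"])


def validate_route(feature, bounds, tolerance_km=0.05):
    """Report whether a route has marathon length and stays in bounds."""
    length = route_length_km(feature)
    return {
        "length_km": round(length, 3),
        "length_ok": abs(length - MARATHON_KM) <= tolerance_km,
        "inside_bounds": inside_bounds(feature, bounds),
    }


# Illustrative data only: a short three-point route and a rough bounding box.
sample_route = {
    "type": "Feature",
    "properties": {},
    "geometry": {
        "type": "LineString",
        "coordinates": [[-115.1767, 36.1126], [-115.1728, 36.1215], [-115.1622, 36.1213]],
    },
}
ILLUSTRATIVE_BOUNDS = (-115.40, 35.95, -114.95, 36.40)

report = validate_route(sample_route, ILLUSTRATIVE_BOUNDS)
print(report)
```

Because the result is computed from coordinates rather than generated by the model, the agent can reject or adjust a candidate route on the basis of a deterministic check.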
3. Methodologies and Frameworks
- Skill Conversion: The speaker demonstrates that models are highly effective at converting human-readable documents (like race planning guides) into structured skills. By using the Google Workspace server, the agent can ingest links to documents and transform them into actionable logic.
- Deployment: While the agent can be deployed anywhere, the speaker recommends Cloud Run or GKE for scalability.
- Evolution: The ADK 2.0 release introduces graph-based features, allowing for more complex, non-linear agent workflows.
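To illustrate what a graph-based workflow means in contrast to a linear chain of steps, here is a small sketch using Python's standard `graphlib` module. It does not use or imitate the ADK 2.0 API. The step names, the placeholder return values, and the `run_workflow` helper are hypothetical, and the graph is restricted to a dependency DAG, which is only one kind of non-linear workflow.

```python
from graphlib import TopologicalSorter


# Each step reads the shared state and returns the keys it adds.
def find_landmarks(state):
    return {"landmarks": ["landmark-a", "landmark-b"]}  # placeholder values


def pick_date(state):
    return {"race_date": "placeholder-date"}


def draft_route(state):
    return {"route": f"route via {len(state['landmarks'])} landmarks"}


def review_logistics(state):
    return {"approved": "route" in state and "race_date" in state}


STEPS = {
    "find_landmarks": find_landmarks,
    "pick_date": pick_date,
    "draft_route": draft_route,
    "review_logistics": review_logistics,
}

# Maps each step to the steps that must finish before it can run.
DEPENDS_ON = {
    "find_landmarks": set(),
    "pick_date": set(),
    "draft_route": {"find_landmarks"},
    "review_logistics": {"draft_route", "pick_date"},
}


def run_workflow(steps, depends_on):
    """Run every step once, in an order that respects the dependencies."""
    state = {}
    order = list(TopologicalSorter(depends_on).static_order())
    for name in order:
        state.update(steps[name](state))
    return order, state


order, state = run_workflow(STEPS, DEPENDS_ON)
print(order)
print(state)
```

Here `find_landmarks` and `pick_date` have no dependency on each other, so a graph runner is free to schedule them in either order or in parallel, which a strictly linear pipeline cannot express.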
4. Supporting Evidence and Resources
- Open Source Availability: All code for the "Race Condition" simulation and the planner agent is available on GitHub.
- Code Labs: Google has provided step-by-step code labs that mirror the demo, allowing developers to learn how to implement the GIS, mapping, and race director skills.
- Tooling: The development process relies on coding harnesses such as Gemini CLI and Antigravity to streamline deployment and testing.
5. Notable Quotes
- "We don't [want] our model to randomly guess or make up that information. That information has to be grounded to the real world somehow." This highlights the need to use GeoJSON and external APIs rather than pure LLM generation.
- "It's not about just what model I choose and what agent framework I use. It's more about, how do I give the agent the right tools, the right skills, and the right place to run." This defines the shift in focus for 2026-era AI development.
Synthesis
The session outlines a transition from simple LLM-based chatbots to sophisticated, multi-agent systems. By utilizing the ADK and a modular Skill-based architecture, developers can create agents that are context-efficient, grounded in real-world data, and capable of executing complex, multi-step operations. The key takeaway is that successful enterprise agents are defined by their ability to integrate external tools (via MCP) and interpret codified domain knowledge (via converted documentation) rather than relying solely on the model's internal training data.