Agent-first workflows from prompt to production

By Google Cloud Tech

Key Concepts

  • Agentic Workflows: Systems where AI agents autonomously perform tasks (debugging, optimization, remediation) rather than just generating code.
  • MCP (Model Context Protocol): An open standard for connecting AI agents to external data sources and services (e.g., Cloud Run, BigQuery). With the Google-managed MCP servers used in the demo, the developer does not have to manage API keys by hand.
  • ADK (Agent Development Kit): A framework for building agents that handles state management, tool routing, and retry logic.
  • A2A (Agent-to-Agent Communication): A collaborative architecture where specialized agents (CI, CD, Remediation) communicate to solve complex problems.
  • Day Two Operations: The phase of software development focused on monitoring, debugging, scaling, and maintaining production systems.
  • Vibe Coding: A colloquial term for rapid, intuitive prototyping using AI tools.
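
The tool routing and retry logic attributed to ADK above can be illustrated with a small, framework-free sketch. This is not ADK's API: the `TOOLS` table, the `tool` decorator, `read_logs`, and `call_tool` are all hypothetical names chosen for illustration, and the retry policy (three attempts with exponential backoff) is an assumption.

```python
import time
from typing import Callable, Dict

# Hypothetical tool table; an agent framework would maintain something similar.
TOOLS: Dict[str, Callable[[str], str]] = {}


def tool(name: str):
    """Register a function under a tool name so requests can be routed to it."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[name] = fn
        return fn
    return register


@tool("read_logs")
def read_logs(service: str) -> str:
    # Placeholder body; a real tool would call a logging backend.
    return f"logs for {service}"


def call_tool(name: str, arg: str, max_attempts: int = 3,
              base_delay: float = 0.0) -> str:
    """Route a request to a named tool, retrying transient failures."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    for attempt in range(1, max_attempts + 1):
        try:
            return TOOLS[name](arg)
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")
```

Calling `call_tool("read_logs", "leaderboard")` routes the request by name, and a tool that raises `ConnectionError` is retried up to `max_attempts` times before the error propagates.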

1. Agent-First Debugging

The presenters demonstrated how to move beyond manual log analysis by using an agentic tool (Antigravity) connected via Google-managed MCP servers.

  • Process: Instead of manually searching logs, the developer uses voice input to ask the agent to diagnose a "503 Service Unavailable" error.
  • Technical Advantage: The agent accesses the Developer Knowledge MCP, which provides a snapshot of the latest Google documentation (updated every 8–12 hours), ensuring the agent uses current best practices.
  • Outcome: The agent identifies the root cause in the leaderboard code, proposes a fix, and updates the main.py file automatically.

2. Data-Driven Optimization (0-ETL Analytics)

To optimize the demo game without building complex ETL (extract, transform, load) pipelines, the team used the Data Agent Kit.

  • Methodology: The agent automates a "0-ETL" pipeline by creating a Log Router sink that streams live logs directly into BigQuery.
  • Execution: Using the BigQuery MCP, the agent runs parallel queries to analyze user behavior (e.g., win rates by dinosaur type).
  • Result: The agent generates an interactive dashboard and allows the developer to query data using natural language, eliminating the need for manual SQL writing or context switching between the IDE and the Google Cloud Console.
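
The kind of analysis described above (win rates by dinosaur type) can be sketched as follows. This is only an illustration under stated assumptions: the record fields `dinosaur` and `won`, the sample data, and the table name in the SQL string are all hypothetical, and the aggregation runs locally in Python rather than through the BigQuery MCP.

```python
from collections import defaultdict

# A query an agent might plausibly issue; the table name is hypothetical
# and the string is shown for comparison only, it is never executed here.
EXAMPLE_SQL = """
SELECT dinosaur, COUNTIF(won) / COUNT(*) AS win_rate
FROM `my_project.game_logs.matches`
GROUP BY dinosaur
"""

# Hypothetical log records standing in for rows streamed into BigQuery.
logs = [
    {"dinosaur": "t-rex", "won": True},
    {"dinosaur": "t-rex", "won": False},
    {"dinosaur": "raptor", "won": True},
    {"dinosaur": "raptor", "won": True},
]


def win_rates(records):
    """Return the fraction of games won for each dinosaur type."""
    games = defaultdict(int)
    wins = defaultdict(int)
    for record in records:
        games[record["dinosaur"]] += 1
        wins[record["dinosaur"]] += int(record["won"])
    return {dino: wins[dino] / games[dino] for dino in games}


rates = win_rates(logs)
```

The Python function and the SQL string compute the same quantity: wins divided by games played, grouped by dinosaur type.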

3. Autonomous Remediation and CI/CD

The most advanced workflow was a "self-healing" system built from three specialized agents: Remediation, CI, and CD.

  • Architecture: Hosted on Cloud Run, these agents remain dormant until an error triggers an event via Eventarc and Pub/Sub.
  • Self-Healing Loop:
    1. Remediation Agent: Detects the crash, investigates, fixes the code, opens a GitHub PR, and notifies the team via Slack.
    2. CI Agent: Performs security scans and testing.
    3. CD Agent: Acts as a release manager, assessing risk scores to determine whether a canary release is safe.
  • A2A Collaboration: The agents communicate directly to resolve conflicts (e.g., if the CI agent hands off an image that does not exist, the CD agent queries it for clarification).
  • Security: Each agent is granted only the IAM permissions its role requires, so the Remediation agent cannot perform unauthorized actions such as building images.
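
The loop and the permission boundaries above can be modeled in a few lines. This is a sketch, not the presenters' implementation: the `Agent` class, the permission names, the assumption that the CI agent is the one allowed to build images, and the `max_risk` threshold of 0.3 are all hypothetical. Real IAM roles, Eventarc triggers, and A2A messages are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    permissions: frozenset  # hypothetical action names, not real IAM roles

    def act(self, action: str) -> str:
        """Perform an action only if this agent is permitted to."""
        if action not in self.permissions:
            raise PermissionError(f"{self.name} may not {action}")
        return f"{self.name}: {action}"


remediation = Agent("remediation",
                    frozenset({"read_logs", "open_pr", "notify_slack"}))
ci = Agent("ci", frozenset({"scan", "run_tests", "build_image"}))
cd = Agent("cd", frozenset({"deploy_canary"}))


def handle_crash(risk_score: float, max_risk: float = 0.3) -> list:
    """Run the three agents in order and return the steps taken."""
    steps = [
        remediation.act("read_logs"),
        remediation.act("open_pr"),
        remediation.act("notify_slack"),
        ci.act("scan"),
        ci.act("run_tests"),
        ci.act("build_image"),
    ]
    # The CD agent releases only when the assessed risk is low enough.
    if risk_score <= max_risk:
        steps.append(cd.act("deploy_canary"))
    else:
        steps.append("cd: hold release")
    return steps
```

With these definitions, `handle_crash(0.1)` ends with a canary deployment, `handle_crash(0.9)` ends with the release on hold, and `remediation.act("build_image")` raises `PermissionError`, which mirrors the separation of duties described in the Security bullet.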

4. Tools and Frameworks

  • Agent CLI: A tool used to scaffold agent projects, providing boilerplate code for instructions, tools, testing, and containerization.
  • Spend Caps (Private Preview): A new feature for Google Cloud services (Cloud Run, Gemini API) that pauses traffic once a budget threshold is reached, preventing runaway costs.
  • Agent Registry: A central directory that tracks available agents, their specific "personalities" (descriptions of their logic), and the tools they have access to.
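
The spend-cap behavior described above (pause traffic once a budget threshold is reached) can be modeled as a simple guard. The summary does not describe how the Private Preview feature is configured or enforced, so the `SpendCap` class, its fields, and the per-request cost accounting below are assumptions made for illustration only.

```python
class SpendCap:
    """Toy model of a budget threshold that pauses traffic when reached."""

    def __init__(self, budget: float):
        self.budget = budget  # hypothetical budget in arbitrary currency units
        self.spent = 0.0

    @property
    def paused(self) -> bool:
        # Traffic is paused once accumulated spend reaches the budget.
        return self.spent >= self.budget

    def handle(self, cost: float) -> bool:
        """Serve a request if traffic is not paused; return True if served."""
        if self.paused:
            return False
        self.spent += cost
        return True


cap = SpendCap(budget=1.0)
served = [cap.handle(0.6), cap.handle(0.6), cap.handle(0.1)]
```

In this model the check happens before each request, so the request that crosses the threshold is still served and only later requests are rejected. Whether the real feature behaves this way at the boundary is not stated in the summary.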

Synthesis and Conclusion

The presentation highlights a shift in engineering practice: Day Two operations are becoming agentic. With frameworks like ADK and protocols like MCP, developers can move from manual, repetitive tasks to designing high-level, autonomous systems. "Vibe coding" makes the initial prototype fast, but the larger payoff comes from building observable, secure, and collaborative agent swarms that monitor, fix, and optimize production environments around the clock with minimal human intervention.

"The problem-solving that we love as engineers hasn't gone away. It just shifted around. We're doing design work, we're planning work, we're interacting with the systems." — Richard Seroter
