Stop building slow AI: Optimizing multi-agent systems for production
By Google Cloud Tech
Key Concepts
- Multi-Agent Architecture: A system design where multiple specialized AI agents collaborate to perform complex tasks.
- Agentic Workflow: The process of defining agent roles (Planner, Simulator, Evaluator) and their communication pathways.
- Real-time Evaluation: The practice of using a secondary, independent model to validate a primary model's outputs, improving accuracy and reducing bias.
- Skills vs. Tools: A distinction where "Skills" are prompt-based instructions/capabilities, and "Tools" are functional utilities (e.g., validation scripts, API calls).
- Agent Registry: A centralized repository providing a 360-degree overview of deployed agents, their capabilities, and governance metadata.
- Event-Driven Architecture: Using Pub/Sub and WebSockets to handle high-concurrency communication between agents and front-end interfaces.
1. Multi-Agent System Architecture
The developers implemented a three-agent architecture for their keynote demo:
- Planner Agent: Orchestrates the high-level strategy (e.g., planning a marathon).
- Simulator Agent: Executes specific tasks (e.g., simulating 1,000 concurrent runner sessions).
- Evaluator Agent: Acts as a "judge" to verify the quality and accuracy of the Planner’s output in real-time.
Key Insight: The team initially considered a larger number of agents but reduced the count to three to mitigate latency and network overhead, which are critical constraints in live keynote presentations.
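The three roles can be pictured as a short, fixed pipeline in which every agent adds exactly one hop. The following is a minimal sketch under stated assumptions: the agent functions are hypothetical stubs rather than model calls, and the order (plan, then evaluate, then simulate) is an assumed ordering, not one confirmed by the demo.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    goal: str
    steps: list[str]


def planner_agent(goal: str) -> Plan:
    """Hypothetical stub: a real planner would call a model here."""
    return Plan(goal=goal, steps=["choose route", "set water stations", "schedule start waves"])


def evaluator_agent(plan: Plan) -> dict:
    """Hypothetical judge stub with a placeholder rule: a plan needs at least one step."""
    passed = len(plan.steps) > 0
    return {"passed": passed, "reason": "ok" if passed else "empty plan"}


def simulator_agent(plan: Plan, runners: int) -> dict:
    """Hypothetical stub standing in for the concurrent runner simulation."""
    return {"runners": runners, "steps_executed": len(plan.steps)}


def run_pipeline(goal: str, runners: int = 1000) -> dict:
    # Three agents means three hops; every extra agent would add another round trip.
    plan = planner_agent(goal)
    verdict = evaluator_agent(plan)
    if not verdict["passed"]:
        # Skip the expensive simulation when the judge rejects the plan.
        return {"plan": plan, "verdict": verdict, "simulation": None}
    return {"plan": plan, "verdict": verdict, "simulation": simulator_agent(plan, runners)}
```

Keeping the agent count fixed keeps the number of sequential round trips predictable, which is the latency argument behind the reduction to three agents.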
2. The Evaluation Framework
To ensure reliability, the team employed a "Judge" pattern:
- Cross-Model Validation: They used Gemini 3 Flash for plan generation and Gemini 3.1 Pro for evaluation. This prevents the "same-model bias" where a model might fail to recognize its own hallucinations.
- Optimization: Initially, they made seven separate model calls, one per evaluation criterion. To reduce latency, they consolidated these into a single, focused call to the Pro model.
- Managed Services: They integrated with the Gemini App Enterprise Agent platform, which allows for tracking custom and pre-built metrics over time, moving evaluation from a one-time check to a continuous monitoring process.
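The consolidated, cross-model judge can be sketched as one prompt that asks for a score on every criterion, sent to a model other than the one that produced the plan. This is a minimal sketch under stated assumptions: the model labels are placeholders rather than verified API model IDs, the seven criterion names are hypothetical (the source gives only the count), and `call_model` is an injected, hypothetical client function.

```python
import json
from typing import Callable

# Placeholder labels, not verified API model IDs.
GENERATOR_MODEL = "flash-model"
JUDGE_MODEL = "pro-model"

# Hypothetical criterion names; the source states only that there were seven.
CRITERIA = ["feasibility", "safety", "route_quality", "hydration_plan",
            "timing", "accessibility", "clarity"]


def build_judge_prompt(plan_text: str, criteria: list[str]) -> str:
    """One prompt covering every criterion instead of one call per criterion."""
    bullet_list = "\n".join(f"- {c}" for c in criteria)
    return (
        "Score the plan below from 1 to 5 on each criterion. "
        "Reply with a single JSON object mapping each criterion to its score.\n"
        f"Criteria:\n{bullet_list}\n\nPlan:\n{plan_text}"
    )


def evaluate(plan_text: str,
             call_model: Callable[[str, str], str],
             generator_model: str = GENERATOR_MODEL,
             judge_model: str = JUDGE_MODEL) -> dict[str, int]:
    # Cross-model validation: refuse to let a model grade its own output.
    if judge_model == generator_model:
        raise ValueError("judge must differ from generator to avoid same-model bias")
    raw = call_model(judge_model, build_judge_prompt(plan_text, CRITERIA))  # single round trip
    scores = json.loads(raw)
    if not isinstance(scores, dict):
        raise ValueError("judge reply must be a JSON object")
    missing = set(CRITERIA) - set(scores)
    if missing:
        raise ValueError(f"judge omitted criteria: {sorted(missing)}")
    return {c: int(scores[c]) for c in CRITERIA}
```

Storing each returned score dictionary with a timestamp is one simple way to turn the check into the continuous, over-time tracking described above.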
3. UI Integration and "Agent-to-UI"
The system allows agents to generate user interfaces dynamically:
- Methodology: Agents emit a JSON payload describing the UI.
- Default Catalog: The UI is built using a set of 18 standard components (forms, modals, media).
- Validation Tool: A specific tool was bundled with the agent's "skill" to validate the JSON payload before it reaches the front-end, ensuring the generated UI is renderable and error-free.
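The validation tool can be sketched as a function that parses the agent's JSON and walks the component tree, rejecting any type outside the catalog before anything reaches the front-end. This is a minimal sketch under stated assumptions: the payload shape (a `root` component with nested `children`) and the component names are hypothetical, since the source says only that the default catalog holds 18 components such as forms, modals, and media.

```python
import json

# Hypothetical subset of the catalog; the real 18 component names are not given.
CATALOG = {"form", "text_input", "button", "modal", "image", "video", "text"}


def validate_ui_payload(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the payload looks renderable."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(payload, dict) or not isinstance(payload.get("root"), dict):
        return ["payload must be an object with a 'root' component"]

    errors: list[str] = []

    def check(node, path: str) -> None:
        if not isinstance(node, dict):
            errors.append(f"{path}: component must be an object")
            return
        ctype = node.get("type")
        if not isinstance(ctype, str) or ctype not in CATALOG:
            errors.append(f"{path}: unknown component type {ctype!r}")
        children = node.get("children", [])
        if not isinstance(children, list):
            errors.append(f"{path}: 'children' must be a list")
            return
        for i, child in enumerate(children):
            check(child, f"{path}.children[{i}]")  # recurse into nested components

    check(payload["root"], "root")
    return errors
```

Returning a list of readable errors, rather than a bare boolean, lets the agent feed the problems back into its next generation attempt.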
4. Scaling and Communication Protocols
To handle 1,000 concurrent runner sessions, the team moved beyond standard HTTP:
- Pub/Sub & Memorystore: Used for asynchronous messaging to handle high-volume data (velocity, hydration levels) without blocking the main thread.
- Extensible Protocol: The agents were designed to be protocol-agnostic; they could be called via standard HTTP or via Pub/Sub messages, depending on the specific agent's role.
- WebSocket Streaming: The front-end listens via WebSockets to receive real-time events from the agent lifecycle, providing a transparent view of the "inner workings" of the agents.
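The protocol-agnostic design can be sketched as one transport-independent handler with two thin entry points: a request/response path (standing in for HTTP) and a queue consumer (standing in for a Pub/Sub subscriber). This is a minimal sketch under stated assumptions: it uses an in-process `asyncio.Queue` instead of the real Pub/Sub client, the hydration threshold is hypothetical, and the `emit` callback stands in for pushing events to WebSocket clients.

```python
import asyncio
import json
import random
from typing import Callable


async def handle_telemetry(event: dict) -> dict:
    """Transport-independent agent logic (hypothetical rule: flag low hydration)."""
    return {"runner": event["runner"], "alert": event["hydration"] < 0.3}


async def http_entrypoint(body: bytes) -> bytes:
    """Request/response path: the caller waits for the result."""
    return json.dumps(await handle_telemetry(json.loads(body))).encode()


async def queue_worker(queue: asyncio.Queue, emit: Callable[[dict], None]) -> None:
    """Message path: an in-process stand-in for a Pub/Sub subscriber."""
    while True:
        data = await queue.get()
        try:
            emit(await handle_telemetry(json.loads(data)))  # emit stands in for a WebSocket push
        finally:
            queue.task_done()


async def simulate(num_runners: int = 1000, num_workers: int = 8) -> list[dict]:
    queue: asyncio.Queue = asyncio.Queue()
    events: list[dict] = []
    workers = [asyncio.create_task(queue_worker(queue, events.append))
               for _ in range(num_workers)]
    rng = random.Random(0)
    for runner in range(num_runners):
        # Publishing does not wait for processing, so the producer never blocks.
        queue.put_nowait(json.dumps({"runner": runner,
                                     "velocity": rng.uniform(2.0, 6.0),
                                     "hydration": rng.random()}))
    await queue.join()  # wait until every message has been handled
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return events
```

Because both entry points call the same `handle_telemetry`, an agent can be moved between the synchronous and asynchronous paths without changing its logic.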
5. Governance and Registry
- Agent Registry: Used to provide a centralized view of all deployed agents. This is essential for enterprise environments where developers need to understand the capabilities, descriptions, and access controls of agents they did not personally build.
- Gateway: Implemented to enforce strict controls over data access and service interaction, ensuring that only authorized components can trigger specific agent actions.
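A registry and gateway can be sketched together: the registry stores each agent's description, capabilities, and access-control metadata, and the gateway consults that metadata before forwarding any call. This is a minimal sketch under stated assumptions: the record fields, class names, and caller-allowlist model are hypothetical and do not reflect the actual Agent Registry or gateway products.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class AgentRecord:
    name: str
    description: str
    capabilities: tuple[str, ...]
    allowed_callers: frozenset[str]  # hypothetical access-control metadata


class AgentRegistry:
    """Central catalog so developers can discover agents they did not build."""

    def __init__(self) -> None:
        self._agents: dict[str, AgentRecord] = {}

    def register(self, record: AgentRecord) -> None:
        if record.name in self._agents:
            raise ValueError(f"agent {record.name!r} already registered")
        self._agents[record.name] = record

    def get(self, name: str) -> AgentRecord:
        return self._agents[name]

    def find_by_capability(self, capability: str) -> list[AgentRecord]:
        return [a for a in self._agents.values() if capability in a.capabilities]


class Gateway:
    """Single entry point that checks registry metadata before forwarding a call."""

    def __init__(self, registry: AgentRegistry,
                 handlers: dict[str, Callable[[dict], Any]]) -> None:
        self._registry = registry
        self._handlers = handlers

    def call(self, caller: str, agent_name: str, payload: dict) -> Any:
        try:
            record = self._registry.get(agent_name)
        except KeyError:
            raise PermissionError(f"unknown agent {agent_name!r}") from None
        if caller not in record.allowed_callers:
            raise PermissionError(f"{caller!r} may not call {agent_name!r}")
        return self._handlers[agent_name](payload)
```

Routing every call through one gateway keeps the authorization rule in a single place instead of duplicating it inside each agent.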
Synthesis and Conclusion
The demo highlights the transition from "prototyping" to "production-grade" multi-agent systems. The primary takeaways are:
- Performance is a Design Constraint: Latency must be managed through architectural choices (e.g., consolidating evaluation calls, using efficient messaging protocols like Pub/Sub).
- Evaluation is Continuous: It is not a final step but a core component of the agent lifecycle, requiring independent models to ensure trust.
- Extensibility Matters: By using a plugin-based architecture (similar to open-source patterns), developers can hook into the agent lifecycle to expose data, validate outputs, and maintain system transparency.
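A plugin-based lifecycle can be sketched as a runner that invokes `before_call` and `after_call` hooks around each agent call, so plugins can expose events or validate outputs without modifying the agents. This is a minimal sketch under stated assumptions: the hook names, plugin classes, and runner are hypothetical and are not the API of any particular agent framework.

```python
from typing import Any, Callable


class Plugin:
    """Base class with no-op lifecycle hooks; subclasses override what they need."""

    def before_call(self, agent: str, request: Any) -> None:
        pass

    def after_call(self, agent: str, request: Any, response: Any) -> Any:
        return response


class EventLogPlugin(Plugin):
    """Records lifecycle events, e.g. for forwarding to a front-end stream."""

    def __init__(self) -> None:
        self.events: list[tuple[str, str]] = []

    def before_call(self, agent: str, request: Any) -> None:
        self.events.append(("start", agent))

    def after_call(self, agent: str, request: Any, response: Any) -> Any:
        self.events.append(("end", agent))
        return response


class NonEmptyOutputPlugin(Plugin):
    """Validates outputs by rejecting empty responses."""

    def after_call(self, agent: str, request: Any, response: Any) -> Any:
        if not response:
            raise ValueError(f"{agent} returned an empty response")
        return response


class AgentRunner:
    def __init__(self, plugins: list[Plugin]) -> None:
        self.plugins = plugins

    def run(self, agent: str, fn: Callable[[Any], Any], request: Any) -> Any:
        for p in self.plugins:
            p.before_call(agent, request)
        response = fn(request)
        for p in self.plugins:
            # Each plugin may inspect, replace, or reject the response in turn.
            response = p.after_call(agent, request, response)
        return response
```

Because the runner owns the hooks, transparency (event logging) and validation are added by registering plugins rather than by editing each agent.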
Quote: "Evaluation is not just a one-time thing, but it's something that is recurrent while you are building the entire agent system." — Ivan Nardini