Why 99% of AI agents fail in production (and how to fix it) | The Agent Factory
By Google Cloud Tech
Key Concepts
- Agentic AI: Systems where AI models act autonomously to perform tasks, manage workflows, and interact with tools.
- ADK (Agent Development Kit) 2.0: A framework for building, evaluating, and deploying production-grade AI agents.
- Agent CLI: A command-line interface tool that packages skills for scaffolding, testing, and deploying agents on Google Cloud.
- Agent Engine: The deployment runtime for hosting AI agents in a production environment.
- Graph-based Workflows: A design pattern in ADK 2.0 that separates deterministic logic (compliance/fixed steps) from reasoning nodes (model-driven decisions).
- Ambient Agents: Agents that operate proactively based on triggers (e.g., file uploads, cron jobs) rather than reactive user prompts.
- Resume Agents: A feature allowing agents to recover from interruptions (network drops, service failures) by tracking state and skipping completed tasks.
- Soul File: A configuration file defining an agent’s personality and specific operational context.
1. Evolution of AI Development
Shubam Sabu highlights the shift from the "GPT-3 era," where success relied on manual prompt engineering (hammering models to get structured output), to the current era where models are "table stakes." Today, the primary challenge is problem shaping—understanding user needs and defining the logic around the model. He emphasizes that while models are universal functions, the "harness" (the agentic framework) is what determines the success of an application.
2. Agent CLI: Streamlining the Development Lifecycle
Agent CLI simplifies the transition from local prototyping to cloud deployment.
- Process: Developers install the CLI via a single `uvx` command. It integrates with coding agents (e.g., Gemini CLI, Claude Code) to provide context on the ADK codebase.
- Capabilities:
- Scaffolding: Automatically generates agent files and directory structures.
- Testing: Provides a local web UI for real-time interaction and log inspection (states, artifacts).
- Evaluation: Uses agents to generate and run evaluation criteria against the code.
- Deployment: Deploys directly to the Agent Engine with explicit approval steps, ensuring production-ready features like observability and identity management are included.
3. Multi-Agent Systems and Real-World Applications
Sabu demonstrates a "PR Roaster" application to illustrate multi-agent orchestration:
- Workflow: A code analyst agent reviews a GitHub Pull Request, which then feeds data to a "Roast Master" agent. The Roast Master generates a critique and a prompt for an image model (Gemini Flash) to create a meme.
- Key Insight: "Prompt history is the new code." Complex systems can be built entirely through natural language prompts within the terminal, bypassing traditional IDEs.
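The two-stage pipeline described above can be sketched in plain Python. This is a conceptual illustration only: the model calls are stubbed with deterministic functions, and in a real build each agent would wrap an LLM (e.g., via ADK) and the diff would come from the GitHub API. The names `code_analyst`, `roast_master`, and `pipeline` are hypothetical.

```python
# Conceptual sketch of the "PR Roaster" two-agent pipeline: a code-analyst
# stage feeds its output into a roast stage, which also produces a prompt
# for an image model. The LLM calls are stubbed out for illustration.

from dataclasses import dataclass


@dataclass
class Critique:
    summary: str       # analyst's summary of the PR
    roast: str         # the Roast Master's critique
    image_prompt: str  # prompt handed to an image model for the meme


def code_analyst(pr_diff: str) -> str:
    """Stand-in for the code-analyst agent: summarizes the PR diff."""
    lines = pr_diff.strip().splitlines()
    return f"PR touches {len(lines)} line(s); first change: {lines[0]}"


def roast_master(analysis: str) -> Critique:
    """Stand-in for the Roast Master agent: turns the analysis into a
    roast plus an image-model prompt."""
    return Critique(
        summary=analysis,
        roast=f"Bold move shipping this: {analysis}",
        image_prompt=f"A meme about a pull request where {analysis}",
    )


def pipeline(pr_diff: str) -> Critique:
    # Deterministic orchestration: analyst output feeds the roaster.
    return roast_master(code_analyst(pr_diff))


if __name__ == "__main__":
    print(pipeline("+ def add(a, b): return a - b").roast)
```

The orchestration itself is ordinary function composition; what the agentic framework adds on top is model invocation, state, and observability around each stage.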
4. ADK 2.0: Production-Grade Framework
ADK 2.0 addresses the gap between 30-second demos and long-running production agents:
- Reliability: By using graph-based workflows, developers can ensure that critical business logic remains deterministic while allowing the model to handle reasoning tasks.
- Resilience: The `is_resumable` flag allows agents to survive infrastructure failures by tracking progress and resuming from the last successful state.
- Proactivity: Ambient agents allow for autonomous execution based on external events, with built-in concurrency limits and retry logic.
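The resume behavior boils down to checkpointing: record each completed step, and on a rerun skip anything already done. The following is a generic sketch of that pattern, not ADK's actual implementation; the function and file names are hypothetical.

```python
# Minimal illustration of resume-after-failure: completed steps are
# persisted to a checkpoint file after each step, so a rerun (e.g., after
# a network drop) skips work that already finished. This mimics the idea
# behind ADK's `is_resumable` flag, not its real internals.

import json
import os


def run_with_resume(steps, checkpoint_path):
    """Run (name, fn) steps in order, skipping names already checkpointed.

    Returns the list of step names executed on this run.
    """
    done = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = set(json.load(f))

    executed = []
    for name, fn in steps:
        if name in done:
            continue  # completed in a previous run; skip it
        fn()
        executed.append(name)
        done.add(name)
        # Persist progress immediately so a crash here loses at most
        # the step currently in flight.
        with open(checkpoint_path, "w") as f:
            json.dump(sorted(done), f)
    return executed
```

Running the same step list twice against the same checkpoint file executes everything the first time and nothing the second, which is exactly the "skip completed tasks" behavior the summary describes.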
5. Philosophy: Managing AI as an Employee
Sabu argues that developers often fail because they treat agents as "magic" rather than employees.
- The Context Trap: Dumping massive amounts of data into an agent causes it to "drown."
- The Solution: Onboard agents like interns. Start simple, allow the agent to interview the user to learn preferences, and store this in a "Soul File" (personality) and "User MD" (context).
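The episode does not show the actual format of a "Soul File" or "User MD," so the following is a purely hypothetical sketch of what such a personality file might contain, based on the onboarding analogy above:

```markdown
# soul.md — agent personality (hypothetical example)

## Role
You are a code-review assistant for a small platform team.

## Tone
Direct and concise; humor is welcome, sarcasm toward people is not.

## Preferences learned during onboarding
- Summaries first, details on request.
- Flag breaking API changes before style nits.
- Never push changes without explicit approval.
```

The point of the pattern is that these preferences are gathered incrementally, by letting the agent interview the user, rather than dumped in as one giant context blob.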
- Communication: Soft skills, such as user empathy and clear communication, are becoming more critical than raw coding ability as the technical barriers to building agents lower.
6. Rapid Fire Insights
- RAG vs. Long Context: The debate is shifting; long context windows may eventually replace RAG for many use cases.
- Architecture: The simplest architecture that solves the problem is always the best.
- Evaluation: The biggest bottleneck in AI today is the lack of standardized evaluation. "Vibe-based" testing is insufficient for production.
- Open Source vs. Proprietary: Open source will likely handle 80% of use cases, while proprietary models will remain necessary for highly complex, state-of-the-art reasoning.
Conclusion
The main takeaway is that the barrier to entry for building sophisticated, production-ready AI agents has been significantly lowered by tools like Agent CLI and ADK 2.0. Developers should focus on problem definition, clear communication with agents, and rigorous evaluation rather than getting lost in the complexity of the underlying models. As Sabu notes, the future of development involves agents building agents, making the ability to design clear, reliable agentic workflows the most valuable skill for the next generation of engineers.