12-Factor Agents: Patterns of reliable LLM applications — Dex Horthy, HumanLayer

By AI Engineer


12 Factors of AI Agents: Rethinking Agent Architecture

Key Concepts:

  • JSON Transformation: LLMs' ability to convert natural language into structured JSON.
  • "Tool Use" Is Harmful: The over-abstraction of tool use as a magical process, rather than as deterministic code execution based on LLM output.
  • Directed Acyclic Graphs (DAGs): Representing workflows as a series of nodes with dependencies.
  • Context Window: The information passed to the LLM to inform its decisions.
  • Micro Agents: Small, focused agent loops within a larger, mostly deterministic workflow.
  • Statelessness: Agents should not maintain internal state; state should be managed externally.
  • Context Engineering: The art of crafting the right input tokens to elicit desired outputs from the LLM.

Introduction

The speaker discusses the common experience of building AI agents, reaching a certain level of functionality (70-80%), and then struggling to improve further due to complexity and abstraction. He emphasizes that not every problem is suitable for an agent-based solution, citing the example of a DevOps agent for building projects. Through conversations with over 100 founders, builders, and engineers, he identified common patterns in successful LLM-based applications, leading to the formulation of the "12 Factors of AI Agents." This is not an anti-framework stance but rather a wish list for improving existing frameworks to better serve builders needing high reliability and speed.

Factor 1: JSON Transformation

The most valuable capability of LLMs is their ability to transform natural language into JSON. The subsequent factors then dictate how to utilize this JSON.
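A minimal sketch of this idea: the model's output is just a JSON string, and deterministic code turns it into a typed value the rest of the program can trust. The model call itself is mocked here, and the schema is invented for illustration.

```python
import json
from dataclasses import dataclass

# A hypothetical structured output the LLM might return for the request
# "Deploy backend v1.2.3 to production" (the model call itself is mocked).
llm_output = '{"intent": "deploy", "service": "backend", "version": "1.2.3", "env": "production"}'

@dataclass
class DeployCommand:
    intent: str
    service: str
    version: str
    env: str

def parse_llm_json(raw: str) -> DeployCommand:
    """Turn the LLM's JSON string into a typed object downstream code can rely on."""
    return DeployCommand(**json.loads(raw))

cmd = parse_llm_json(llm_output)
```

Everything after this point is ordinary software: the factors below are about what your code does with that JSON.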

Factor 4: "Tool Use" Is Harmful

The speaker argues that the abstraction of "tool use" as a magical process is harmful. Instead, tool use should be viewed as the LLM outputting JSON, which is then processed by deterministic code. This code performs actions, and the results may be fed back to the LLM. The key is to recognize that tools are simply JSON and code, enabling the use of loops and switch statements for control flow.
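Demystified this way, a "tool call" is a dictionary dispatch: parse the JSON, switch on the tool name, run a plain function. The tool names and stub handlers below are invented for illustration.

```python
import json

# Deterministic handlers: "tools" are just functions keyed by name.
def list_issues(repo: str) -> list[str]:
    return [f"{repo}#1", f"{repo}#2"]  # stub for illustration

def add_label(issue: str, label: str) -> str:
    return f"labeled {issue} with {label}"  # stub

HANDLERS = {"list_issues": list_issues, "add_label": add_label}

def dispatch(llm_json: str):
    """Nothing magical: parse the LLM's JSON, switch on the name, run code."""
    call = json.loads(llm_json)
    handler = HANDLERS[call["tool"]]
    return handler(**call["args"])

result = dispatch('{"tool": "add_label", "args": {"issue": "repo#1", "label": "bug"}}')
```

Because the dispatch is ordinary code, you can wrap it in loops, retries, and switch statements like any other control flow.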

Factor 8: Owning Your Control Flow (and Execution State)

Traditional agent architecture often involves a simple loop: the LLM determines the next step, context is built up, and the loop continues until the LLM indicates completion. This approach, while seemingly intuitive, often fails in longer workflows due to context window limitations and other factors.

The Agent Loop Breakdown:

  1. Prompt: Provides instructions for selecting the next step.
  2. Switch Statement: Processes the LLM's JSON output.
  3. Context Window: Stores relevant information.
  4. Loop: Manages the execution flow, with options to break out early, switch models, summarize the context, or use an LLM as judge.
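The loop above can be sketched in a few lines. The model call is a stand-in stub, and the step cap is one example of owning the control flow rather than trusting the model to terminate.

```python
import json

def fake_llm(context: list[dict]) -> str:
    """Stand-in for a model call: decides the next step from the context."""
    if len(context) < 3:
        return json.dumps({"tool": "step", "args": {"n": len(context)}})
    return json.dumps({"tool": "done", "args": {}})

def run_agent(max_steps: int = 10) -> list[dict]:
    context: list[dict] = [{"role": "user", "content": "do the thing"}]
    for _ in range(max_steps):              # hard cap: never loop forever
        decision = json.loads(fake_llm(context))  # 1. prompt picks next step
        if decision["tool"] == "done":            # 2. switch on the JSON
            break
        result = {"step": decision["args"]["n"]}  # deterministic execution
        context.append({"role": "tool", "content": result})  # 3. build context
    return context                                # 4. loop until done

history = run_agent()
```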

Managing Execution and Business State:

Agents should be treated as software, with the ability to launch, pause, and resume workflows. This requires managing both execution state (current step, next step, retry counts) and business state (messages, data displayed to the user, pending approvals).

Implementation:

  • Wrap the agent behind a REST API, message queue, or MCP server.
  • Serialize the context window into a database.
  • Upon resuming, load the context window from the database and append the result.
  • The agent remains unaware of the background processes.
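The steps above hinge on the context window being plain data. A minimal pause/resume sketch, using an in-memory SQLite table with invented names:

```python
import json
import sqlite3

# The context window is just data, so it can be serialized to a database
# and reloaded later. Table and column names here are illustrative.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE threads (id TEXT PRIMARY KEY, context TEXT)")

def pause(thread_id: str, context: list[dict]) -> None:
    """Serialize the context window so the workflow can be suspended."""
    db.execute("INSERT OR REPLACE INTO threads VALUES (?, ?)",
               (thread_id, json.dumps(context)))

def resume(thread_id: str, new_event: dict) -> list[dict]:
    """Reload the context, append the new result, and continue the loop."""
    row = db.execute("SELECT context FROM threads WHERE id = ?",
                     (thread_id,)).fetchone()
    context = json.loads(row[0])
    context.append(new_event)
    return context

pause("t1", [{"role": "user", "content": "deploy backend"}])
ctx = resume("t1", {"role": "human", "content": "approved"})
```

From the agent's point of view, nothing happened in between: it just sees one more event in its context.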

Factor 2: Owning Your Prompts

While abstractions can provide a good starting point, achieving optimal quality requires hand-crafting prompts, token by token. LLMs are pure functions (tokens in, tokens out), and the quality of the output depends on the quality of the input tokens.
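Owning the prompt can be as simple as keeping it in your codebase as plain, testable text rather than behind a framework abstraction. The wording and tool names below are illustrative, not a fixed schema.

```python
# A hand-owned prompt builder: every token is visible and versioned with the code.
def build_prompt(task: str, tools: list[str]) -> str:
    tool_list = "\n".join(f"- {t}" for t in tools)
    return (
        "You are a deployment assistant. Decide the single next step.\n"
        f"Available tools:\n{tool_list}\n"
        f"Task: {task}\n"
        'Respond with JSON: {"tool": ..., "args": ...}'
    )

prompt = build_prompt("deploy backend", ["deploy_backend", "deploy_frontend"])
```

Because it is a plain function, the prompt can be unit-tested, diffed in code review, and tuned token by token.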

Owning Your Context Building

Similar to prompts, context building should be carefully managed. The speaker suggests experimenting with different formats (e.g., OpenAI messages format, single user message) and stringifying the event state and thread model as needed. Optimizing the density and clarity of information passed to the LLM is crucial for achieving better results.
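One concrete way to own context building, sketched below: collapse the event thread into a single dense string instead of the standard messages array. The tag format is invented for illustration.

```python
# Render an event thread as one compact string for the model, rather than
# passing the stock messages format. The XML-ish tags are one possible format.
def render_context(events: list[dict]) -> str:
    lines = []
    for e in events:
        lines.append(f"<{e['type']}>{e['data']}</{e['type']}>")
    return "\n".join(lines)

context = render_context([
    {"type": "user_request", "data": "deploy backend"},
    {"type": "tool_result", "data": "tests passed"},
])
```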

Error Handling and Context Management

When the model makes errors (e.g., incorrect API calls, API downtime), the error information should be carefully managed before being added to the context window. Blindly adding errors can lead to the agent spinning out of control. Instead, consider summarizing errors or clearing pending errors after a valid tool call.
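A sketch of that advice, with an invented helper class: keep a bounded list of pending errors, summarize when they pile up, and clear them once a tool call succeeds.

```python
class ErrorTracker:
    """Bound and summarize errors before they reach the context window."""

    def __init__(self, limit: int = 3):
        self.pending: list[str] = []
        self.limit = limit

    def record(self, err: str) -> str:
        self.pending.append(err)
        if len(self.pending) >= self.limit:
            # Too many consecutive failures: summarize instead of
            # appending raw traces that let the agent spin out.
            return f"{len(self.pending)} consecutive errors, latest: {err}"
        return err

    def on_success(self) -> None:
        self.pending.clear()  # a valid tool call wipes the pending errors

tracker = ErrorTracker()
tracker.record("API timeout")
tracker.record("connection reset")
summary = tracker.record("500 from server")
tracker.on_success()
```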

Contacting Humans with Tools

The speaker emphasizes the importance of allowing the model to communicate with humans using natural language tokens. This enables the model to request clarification, escalate to a manager, or indicate completion. This approach pushes the intent onto the initial token generation, leveraging the model's understanding of natural language.
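In this view, "contact a human" is just another structured output: the model emits JSON with an intent like request_clarification, and deterministic code routes it to a channel. The intent names and routing strings below are illustrative, not a fixed schema.

```python
import json

# Route the model's JSON to a human channel or back to the tool dispatcher.
def route(llm_json: str) -> str:
    call = json.loads(llm_json)
    if call["intent"] == "request_clarification":
        return f"ASK HUMAN: {call['question']}"   # e.g. send to Slack or email
    if call["intent"] == "done_for_now":
        return f"NOTIFY: {call['message']}"
    return "RUN TOOL: " + call["intent"]          # fall through to tool dispatch

out = route('{"intent": "request_clarification", "question": "Which env?"}')
```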

Trigger Things from Anywhere and Meet Users Where They Are

Agents should be accessible through various channels (email, Slack, Discord, SMS) to avoid forcing users to use dedicated agent interfaces.

Small Focused Agents

Instead of large, monolithic agents, the speaker advocates for micro agents: small agent loops (3-10 steps) within a larger, mostly deterministic DAG.

Example: Human Layer Deployment Bot:

  1. Deterministic CI/CD pipeline.
  2. Upon merging a GitHub PR and passing tests, the task is handed to a model.
  3. The model proposes deploying the front end.
  4. A human can override the decision (e.g., deploy the back end first).
  5. The back end is deployed, followed by the front end.
  6. Deterministic code runs end-to-end tests.
  7. If tests fail, a rollback agent is invoked.

This approach allows for manageable context, clear responsibilities, and easy integration with existing systems.
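The deploy flow above can be sketched as deterministic steps that hand off to a small agent loop only for the fuzzy middle. All function names are invented, and the micro agent is reduced to a stub for illustration.

```python
def run_ci_tests() -> bool:
    return True  # deterministic CI stub

def micro_deploy_agent(approved_by_human: bool) -> list[str]:
    """Stand-in for a small (3-10 step) agent loop deciding deploy order."""
    if approved_by_human:
        return ["deploy_backend", "deploy_frontend"]
    return ["propose_frontend_first"]  # await human override

def pipeline() -> list[str]:
    steps: list[str] = []
    if run_ci_tests():                        # deterministic
        steps += micro_deploy_agent(True)     # small agent loop in the middle
        steps.append("end_to_end_tests")      # deterministic again
    return steps

plan = pipeline()
```

The DAG stays mostly deterministic; the agent's context stays small because it only ever sees its own narrow slice of the workflow.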

Statelessness

Agents should be stateless, with state managed externally. This provides flexibility and control over the agent's behavior.

Conclusion

Agents are software, and building effective agents requires applying sound software engineering principles. LLMs are stateless functions, so focus on crafting the right context to achieve the best results. Own your state and control flow for maximum flexibility. Find the bleeding edge by curating the model's inputs and outputs. Integrate agents with human collaboration. The speaker encourages developers to focus on the hard AI parts of the problem (prompts, flow, tokens) and leverage tools that simplify the other aspects of agent development.
