
Cut AI token usage by 96%? Here’s how AWS Strands Agents does it.

By The New Stack

AI Agent Frameworks, LLM Token Optimization, Cloud AI Infrastructure


Key Concepts

Strands Agents SDK: An agentic framework for building custom AI agents using Python or TypeScript.
Agentic Loop: A model-driven approach where an agent is provided with a task and tools, then autonomously determines the steps to solve the task.
Intent-Based Tool Design: Designing tools around specific outcomes or tasks rather than mapping them one-to-one to granular API endpoints.
MCP (Model Context Protocol): An open standard for connecting AI assistants and agents to external tools, systems, and data sources.
AgentCore Gateway: An AWS service (part of Amazon Bedrock AgentCore) that hosts MCP servers, allowing for secure, scalable tool management and semantic search.
Semantic Search for Tools: Using embeddings to dynamically filter and provide only the most relevant tools to an agent based on the user's prompt.
Context Bloat: The inefficiency caused by providing an agent with too many irrelevant tools, leading to higher token consumption and potential performance degradation.

1. Main Topics and Key Points

The discussion focuses on optimizing AI agent performance through better tool design and efficient context management.

Model-Driven vs. Workflow-Driven: Unlike traditional, brittle workflows where steps are hard-coded, Strands uses a model-driven approach where the agent reasons through the task.
Token Efficiency: The speaker emphasizes that token consumption is directly tied to how tools are defined and how many are exposed to the agent.
Steering: A feature in Strands that uses a "steering model" to keep agents aligned with instructions and prevent them from going "rogue."
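
The link between token consumption and the number of exposed tools can be made concrete with a rough estimate. The sketch below is illustrative only: the 4-characters-per-token heuristic, the schema shape, the placeholder endpoint names, and the model-call counts are all assumptions, and it assumes tool definitions are included in every request of the agentic loop.

```python
import json

def approx_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)

def tool_schema(name: str, description: str, params: dict[str, str]) -> dict:
    """Build a JSON-schema-like tool definition of the kind sent to a model."""
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": {p: {"type": t} for p, t in params.items()},
        },
    }

def context_overhead(schemas: list[dict], model_calls: int) -> int:
    """Estimated tokens spent on tool definitions across all model calls."""
    per_call = sum(approx_tokens(json.dumps(s)) for s in schemas)
    return per_call * model_calls

# Hypothetical catalogs: 12 granular endpoints vs. one intent-based tool.
granular = [
    tool_schema(f"endpoint_{i}", "Hypothetical granular endpoint", {"id": "string"})
    for i in range(12)
]
intent = [
    tool_schema(
        "get_latest_invoice_status",
        "Return the latest invoice status for a customer",
        {"customer_name": "string"},
    )
]

# Assumed call counts: more tool calls means more round trips through the model.
print("granular:", context_overhead(granular, model_calls=6))
print("intent:  ", context_overhead(intent, model_calls=2))
```

The estimate covers only the tool definitions; in a real run, the raw results returned by each intermediate tool call also accumulate in the context.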

2. Real-World Application: Accounting API

The demo uses an accounting API to illustrate the transition from granular to intent-based design:

Granular Approach: Mapping 12+ individual API endpoints (e.g., get_customer, list_invoices) directly as tools. This resulted in 5 tool calls and ~52,000 tokens consumed for a single query.
Intent-Based Approach: Rolling multiple API calls into single, outcome-oriented tools (e.g., get_latest_invoice_status). This reduced the same query to 1 tool call and ~2,000 tokens, a reduction of roughly 96% from the ~52,000 tokens of the granular approach.
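
The difference between the two approaches can be sketched in plain Python. This is a minimal illustration, not the code from the demo: the in-memory data and the function bodies are hypothetical stand-ins for the accounting API, and only the tool names come from the summary above.

```python
# Hypothetical in-memory stand-ins for the accounting API.
CUSTOMERS = {"acme": {"id": 1, "name": "Acme Corp"}}
INVOICES = {
    1: [
        {"id": 101, "date": "2024-01-15", "status": "paid"},
        {"id": 102, "date": "2024-03-02", "status": "overdue"},
    ]
}

def get_customer(name: str) -> dict:
    """Granular endpoint: look up one customer by name."""
    return CUSTOMERS[name.lower()]

def list_invoices(customer_id: int) -> list[dict]:
    """Granular endpoint: list all invoices for a customer."""
    return INVOICES[customer_id]

def get_latest_invoice_status(customer_name: str) -> str:
    """Intent-based tool: one call that answers the user's actual question.

    The granular endpoints are chained here in ordinary code, so the model
    does not have to plan the intermediate steps or read their raw output.
    """
    customer = get_customer(customer_name)
    invoices = list_invoices(customer["id"])
    latest = max(invoices, key=lambda inv: inv["date"])  # ISO dates sort lexically
    return f"Invoice {latest['id']} for {customer['name']} is {latest['status']}."

def reduction_pct(before: int, after: int) -> float:
    """Percentage saved when going from `before` tokens to `after` tokens."""
    return (before - after) / before * 100

print(get_latest_invoice_status("acme"))
# Token figures quoted in the summary: ~52,000 before, ~2,000 after.
print(f"{reduction_pct(52_000, 2_000):.1f}% fewer tokens")
```

In the Strands Python SDK, a function like get_latest_invoice_status would typically be registered as a tool (for example with the SDK's tool decorator) while the granular helpers stay internal; that wiring is omitted here so the sketch runs without dependencies.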

3. Methodologies and Frameworks

Tool Design Strategy: Move away from exposing raw, granular APIs. Instead, create an abstraction layer that maps APIs to specific user intents.
Dynamic Tool Mapping: Host tools behind AgentCore Gateway and apply semantic search so that only the tools relevant to the current prompt are injected into the agent's context at runtime.
Agent Design: The speaker suggests that "narrowly defined" agents (specialized agents) perform better than general-purpose agents, as they allow for more precise tool mapping and lower context load.
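
The idea behind semantic tool selection can be sketched without any cloud service. The example below is a toy under stated assumptions: it uses a bag-of-words vector in place of a real embedding model, the tool catalog is hypothetical, and it does not reflect how AgentCore Gateway implements its search.

```python
import math
import re
from collections import Counter

# Hypothetical tool catalog (name -> description); a real one could hold hundreds.
TOOLS = {
    "get_latest_invoice_status": "Return the status of the latest invoice for a customer",
    "create_expense_report": "Create an expense report for an employee",
    "reset_user_password": "Reset the password for a user account",
    "get_customer_balance": "Return the outstanding invoice balance for a customer",
}

def embed(text: str) -> Counter:
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def select_tools(prompt: str, k: int = 2) -> list[str]:
    """Return the names of the k tools whose descriptions best match the prompt."""
    query = embed(prompt)
    ranked = sorted(
        TOOLS, key=lambda name: cosine(query, embed(TOOLS[name])), reverse=True
    )
    return ranked[:k]

# Only the selected tools would be passed to the agent for this request.
print(select_tools("What is the status of the latest invoice for customer Acme?"))
```

With this catalog, the two invoice-related tools rank highest and the unrelated ones are left out of the context. A production setup would replace the word counts with dense embeddings, which match on meaning rather than shared words.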

4. Key Arguments

Trust in Models: The core philosophy of Strands is that modern models are capable enough to reason through tasks without needing every step explicitly defined by the developer.
Observability: Because agents are non-deterministic, developers must implement observability to monitor where agents get "hung up" or fail to follow instructions.
Context Optimization: Reducing the number of tools exposed to an agent is critical for both cost (token usage) and accuracy (reducing the chance of the agent selecting the wrong tool).
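
One lightweight way to see where an agent gets hung up is to record every tool call. The sketch below is a generic illustration and not Strands' built-in observability: the traced decorator, the TRACE list, and the stub tool are all hypothetical.

```python
import functools
import time

TRACE: list[dict] = []  # hypothetical in-memory trace; a real system would export these

def traced(fn):
    """Record the name, arguments, outcome, and duration of every tool call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        record = {"tool": fn.__name__, "args": args, "kwargs": kwargs, "ok": True}
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            record["ok"] = False
            record["error"] = repr(exc)
            raise  # the caller still sees the failure
        finally:
            record["seconds"] = time.perf_counter() - start
            TRACE.append(record)
    return wrapper

@traced
def get_latest_invoice_status(customer_name: str) -> str:
    # Hypothetical stub standing in for a real tool.
    if not customer_name:
        raise ValueError("customer_name is required")
    return f"Latest invoice for {customer_name}: paid"

get_latest_invoice_status("Acme")
try:
    get_latest_invoice_status("")  # simulate a bad call made by the agent
except ValueError:
    pass

for rec in TRACE:
    print(rec["tool"], "ok" if rec["ok"] else rec["error"])
```

Counting calls per tool and per request in such a trace also makes it easy to spot queries that take many tool calls and may be candidates for an intent-based tool.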

5. Notable Quotes

"The big bet of Strands is that the models are getting better, that we don't need to tell the model everything." — Frederick Claudelnoir
"Agents like intent-based tool design. They like outcome-oriented tool design." — Morgan Willis

6. Logical Connections

The presentation follows a logical progression:

Introduction: Explaining the Strands framework and its model-driven philosophy.
Problem Identification: Demonstrating how granular API mapping leads to high token costs and inefficiency.
Optimization (Design): Showing how intent-based tools drastically reduce token usage.
Optimization (Infrastructure): Introducing AgentCore Gateway and semantic search to solve the problem of "context bloat" in enterprise environments where hundreds of tools might exist.

7. Synthesis and Conclusion

The main takeaway is that building efficient AI agents requires a shift in mindset: developers should stop treating agents as simple API wrappers and start designing "intent-based" tool layers. By combining this design strategy with dynamic, semantic tool discovery (via platforms like AgentCore Gateway), developers can significantly reduce token costs, improve agent accuracy, and create more scalable, maintainable AI systems. Because agents are non-deterministic, narrowing their scope and trimming their context are presented as the most effective ways to get reliable performance.
