Anthropic killed Tool calling

Key Concepts

Tool Calling (Function Calling): The mechanism allowing LLMs to output structured JSON to invoke external APIs or functions.
Programmatic Tool Calling: A paradigm shift where the LLM writes code (e.g., TypeScript) to execute multiple functions, rather than outputting JSON for each individual step.
MCP (Model Context Protocol): A standard for connecting AI agents to external data sources and tools.
Context Window Optimization: Techniques to reduce token consumption by filtering noise and deferring tool loading.
Dynamic Filtering: A method to extract only relevant data from large outputs (like HTML) before feeding it back to the LLM.
Tool Search: A mechanism to dynamically retrieve tool schemas only when needed, rather than loading all available tools into the context window.
Input Examples: Providing few-shot examples within tool definitions to improve the accuracy of complex parameter generation.

1. The Evolution of Tool Calling

Traditional tool calling, established two years ago, relies on a "ping-pong" round-trip mechanism: the LLM outputs JSON, the server executes the function, and the result is sent back to the LLM.

Limitations: This process is non-deterministic, inefficient for complex tasks, and consumes excessive tokens by forcing the LLM to handle large, unnecessary metadata from every tool response.
The Context Problem: Even with 1M token windows, effective context is often limited to 120k–200k tokens. Redundant data from tool outputs (e.g., raw HTML from web fetches) quickly exhausts this space.

2. Programmatic Tool Calling

Anthropic’s new approach moves away from the LLM acting as a "glue" for JSON outputs and instead treats it as a code generator.

Methodology: The LLM is provided with a "code execution" environment. Instead of outputting JSON for one step, it writes a script (e.g., using for loops or conditionals) to invoke multiple tools.
Benefits:
- Efficiency: Reduces token consumption by 30%–50%.
- Determinism: Allows for complex logic (filtering, batching) within the code rather than relying on the LLM to "reason" through every step.
Implementation: Add a code_execution tool to the agent and set the allowed_caller parameter to the code execution tool for your existing functions.

3. Dynamic Filtering for Web Fetch

This feature addresses the "noise" problem when fetching web content.

Process: Instead of dumping full raw HTML into the context window, a middle layer runs code to filter and extract only the relevant information.
Impact: Testing shows an average 24% reduction in token consumption by stripping irrelevant HTML tags and metadata before the LLM processes the content.

4. Tool Search and Deferred Loading

To solve the scalability issue of loading hundreds of tool schemas into the context window, Anthropic introduced "Tool Search."

Mechanism: By setting defer_loading: true in the MCP configuration, tools remain hidden from the agent by default.
Efficiency: The agent uses a single "Tool Search" tool (costing ~500 tokens) to retrieve only the necessary tool definitions dynamically. This can lead to up to 80% optimization of the context window for agents with large toolsets.

5. Tool Use Examples

This feature addresses the difficulty LLMs face when dealing with complex, nested, or optional parameters.

Application: Developers can now provide an array of input_examples within the tool definition.
Evidence: Anthropic’s internal testing showed that providing these examples improved accuracy in complex parameter handling from 72% to 90%. It is particularly effective for ensuring correct date formats or handling dependencies between fields (e.g., SLA hours based on ticket priority).

Synthesis and Conclusion

The updates to Anthropic’s tool-calling capabilities represent a shift toward deterministic, code-first agent orchestration. By moving from simple JSON-based "ping-pong" interactions to programmatic code execution, dynamic filtering, and deferred tool loading, developers can build agents that are significantly more token-efficient and accurate. These improvements allow agents to handle complex, multi-step workflows without hitting the limitations of the context window, ultimately enabling more robust and scalable AI applications.