The BIG Problem with MCP Servers (and the Solution!)
By Cole Medin
Key Concepts
- MCP (Model Context Protocol): A protocol for connecting AI agents to tools and data.
- Token Consumption: The amount of "tokens" (words or sub-word units) an LLM uses to process information.
- Context Rot: The degradation of an LLM's performance or ability to recall information due to an overloaded context window.
- Tool Definitions: Descriptions of available tools and their parameters that are loaded into an agent's context.
- API Wrappers: Software that allows agents to interact with existing APIs.
- Real-time Discovery: The ability for an agent to find and load capabilities only when they are needed.
- Code Execution: The ability for an agent to generate and run code to interact with APIs or perform tasks.
- Claude Skills: A feature developed by Anthropic that allows agents to generate scripts and instructions for on-demand execution, significantly reducing initial token consumption.
- Code Sandbox Environment: A secure, isolated environment for executing code to prevent security risks.
- Control vs. Flexibility Trade-off: The balance between having predictable and controlled agent behavior (MCP) and allowing for more dynamic and adaptable capabilities (Skills/Code Execution).
The Problem with MCP: Token Inefficiency and Context Rot
The video highlights a significant flaw in the Model Context Protocol (MCP), which has become a popular method for connecting AI agents to tools and data. The core issue lies in its token inefficiency and the resulting context rot.
1. Token Inefficiency:
- Every tool definition, including its purpose (when to use it) and its parameters, consumes tokens.
- Each instance of a tool being leveraged also adds to token consumption.
- This leads to a bloated context window, especially as agents are given more capabilities.
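To make the cost concrete, here is a minimal sketch of an MCP-style tool definition. The tool name, description, and schema are hypothetical, and the token estimate uses a crude characters-per-token heuristic rather than a real tokenizer, but it shows why a server exposing dozens of such definitions bloats the context before the conversation even begins.

```python
import json

# A hypothetical MCP-style tool definition. A real server exposes many of
# these, and every one is serialized into the agent's context at startup.
search_docs_tool = {
    "name": "search_documents",
    "description": "Search the knowledge base for documents matching a query. "
                   "Returns the top results ranked by relevance.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Free-text search query"},
            "limit": {"type": "integer", "description": "Max results to return"},
        },
        "required": ["query"],
    },
}

def rough_token_count(obj) -> int:
    # Crude heuristic: roughly 1 token per 4 characters of serialized JSON.
    return len(json.dumps(obj)) // 4

print(f"~{rough_token_count(search_docs_tool)} tokens for one tool definition")
```

Multiply that by dozens of tools per server, and five servers, and the thousands-of-tokens figure below follows directly.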
2. Context Rot:
- While modern Large Language Models (LLMs) can handle a large number of tokens, simply fitting them into the context window doesn't guarantee graceful handling.
- Overwhelming an agent with too many tool definitions upfront can lead to performance degradation, similar to earlier LLMs like GPT-3.5 in 2022, which could effectively manage only a handful of tools.
Example of Token Consumption:
- The speaker provides a stark example using five standard MCP servers for AI coding.
- These servers, even before any user interaction, consume thousands of tokens each to describe their capabilities.
- In a live demonstration within Claude Code, connecting these MCP tools resulted in 97,000 tokens being consumed, representing 48% of the context available for Claude Sonnet 4.5.
- This level of token usage, even with LLMs capable of millions of tokens, can overwhelm the agent.
The Proposed Solution: Real-time Discovery and Code Execution
The solution proposed, and championed by Anthropic, focuses on providing capabilities to agents only when they are needed, rather than loading everything upfront. This involves real-time discovery and code execution.
1. Real-time Discovery:
- Instead of pre-loading all tool definitions, the agent should be able to discover capabilities dynamically.
- This means loading the content and context of specific capabilities only at the moment they are required for use.
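The two steps above can be sketched as a tiny registry pattern (all names and descriptions here are hypothetical, not a real MCP or Skills API): only short capability summaries live in the agent's context, and the full definition is loaded the moment a query calls for it.

```python
# Lightweight index that is always in context: one short line per capability.
CAPABILITY_INDEX = {
    "google_drive": "Read and search files in Google Drive.",
    "archon_tasks": "Create and query tasks in Archon.",
}

# Stand-in for full definitions that would live on disk or behind an endpoint.
FULL_DEFINITIONS = {
    "google_drive": {"instructions": "long instruction set, loaded on demand",
                     "tools": ["list_files", "read_file"]},
    "archon_tasks": {"instructions": "long instruction set, loaded on demand",
                     "tools": ["create_task", "list_tasks"]},
}

loaded = {}  # only capabilities actually used end up here

def discover(query: str) -> list[str]:
    """Return capability names whose summary matches words in the query."""
    words = query.lower().split()
    return [name for name, desc in CAPABILITY_INDEX.items()
            if any(word in desc.lower() for word in words)]

def load_capability(name: str) -> dict:
    """Load the full definition only at the moment it is needed."""
    if name not in loaded:
        loaded[name] = FULL_DEFINITIONS[name]
    return loaded[name]
```

A query like "search my google drive" would surface only the `google_drive` capability, leaving every other full definition out of the context window.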
2. Code Execution:
- Many MCP servers function as API wrappers. The speaker cites their open-source project, Archon, which exposes API endpoints for knowledge, projects, and tasks.
- The core idea is to allow the agent to write code directly to interact with these API endpoints, bypassing the MCP server as a middleman that loads dozens of tools upfront.
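As a hedged sketch of the idea: instead of invoking a predefined MCP tool, the agent writes ordinary HTTP client code against the API the server wraps. The base URL and the `/tasks` endpoint below are illustrative, not Archon's actual API; the request is built but deliberately not sent.

```python
import json
import urllib.request

# Hypothetical base URL for an Archon-like API; not the real endpoint.
BASE_URL = "http://localhost:8080/api"

def create_task(title: str, description: str = "") -> urllib.request.Request:
    """Build (but do not send) a POST request to a hypothetical tasks endpoint."""
    payload = json.dumps({"title": title, "description": description}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/tasks",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = create_task("Refactor auth module")
print(req.full_url, req.method)
```

Because the agent authors this code itself, it is not limited to the parameters a tool definition happened to expose; it can batch calls, filter responses, or compose endpoints however the task requires.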
Benefits of Code Execution:
- Reduced Token Usage: Significantly less token consumption at the start of a conversation.
- Increased Flexibility: The agent can decide how to interact with APIs, rather than being constrained by predefined MCP tools.
- Code Reusability: The agent can generate and save code for later use, creating its own instruction sets and scripts.
Mechanism:
- The AI agent generates scripts on demand.
- When the agent needs to leverage a capability, it loads the instruction set plus the relevant code context and executes it.
- This process is triggered by the user's query, for example, if a query relates to Google Drive, the agent would then load the specific capabilities for interacting with Google Drive.
Claude Skills: A Practical Implementation of the Solution
Anthropic's Claude Skills are presented as a direct embodiment of the solution to the MCP problem.
How Claude Skills Work:
- Skills enable AI agents to generate scripts for interacting with API endpoints and create instructions for using these scripts.
- At the start of a conversation, only a brief description of the skill (a couple of sentences) is provided to the AI agent, drastically reducing initial token usage compared to MCP's thousands of tokens for tool definitions.
- When the agent decides to use a skill, the full instruction set and relevant code are loaded and executed. The agent can even load just the function descriptions and how to call them, further optimizing token usage.
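The two-stage loading described above can be sketched as follows. The skill content here is a made-up stand-in, but it mirrors the shape of a skill file: a short frontmatter summary (the "couple of sentences" that enters the context at startup) followed by a full body that is only read when the skill is invoked.

```python
# A toy skill file: frontmatter summary up top, full instructions below.
SKILL_MD = """\
---
name: archon
description: Interact with the Archon API to manage knowledge, projects, and tasks.
---
Full instructions, code snippets, and endpoint details go here.
They are only loaded when the skill is actually used.
"""

def split_skill(text: str) -> tuple[str, str]:
    """Split a skill file into (frontmatter summary, full body)."""
    _, frontmatter, body = text.split("---\n", 2)
    return frontmatter.strip(), body.strip()

summary, full_body = split_skill(SKILL_MD)
print("Loaded at startup:", summary)                  # a couple of sentences
print("Loaded on demand:", len(full_body), "chars")   # the expensive part
```

Only `summary` costs tokens in every conversation; `full_body` costs tokens only in conversations that actually use the skill.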
Token Efficiency of Skills:
- Claude Skills are reported to be 2-3% of the token usage at the start of a conversation compared to MCP.
Example with Archon:
- The speaker demonstrates how their Archon MCP server was transformed into a Claude Skill.
- This transformation maintains full functionality while being significantly more token-efficient and flexible, allowing the coding assistant to generate its own code to interact with the Archon API.
- This approach allows for dozens of skills to be given to agents simultaneously without overwhelming the context window.
- The Archon skill example is available in the description and was demonstrated in the Dynamus community, proving its effectiveness and low upfront token cost (a few hundred tokens).
Is This the End of MCP?
The video addresses the question of whether Claude Skills and code execution signal the end of MCP. The short answer is no.
Arguments for MCP's Continued Relevance:
- Predictability and Control: MCP offers a "what you see is what you get" approach. Developers define the tools users can leverage, providing more control and predictability.
- Security: Code execution, while flexible, introduces security risks. MCP, by not generating arbitrary code, offers a more controlled environment.
- Credential Management: Managing environment variables and credentials in skills is still an area that needs further development.
- Reduced Risk of Missed Capabilities: Because everything is provided upfront in MCP, the agent cannot miss a capability by failing to find it during search and discovery.
Arguments for the Rise of Skills/Code Execution:
- Flexibility: Agents can have dozens of capabilities available without context overload.
- On-Demand Loading: Instructions and code are loaded only when needed.
- Workflow Definition: Agents can define custom workflows by combining API calls.
- LLM Advancement: As LLMs become more powerful and trustworthy, they can handle more complex tasks like code execution and dynamic capability discovery.
- Autonomy: The shift towards skills and code execution enhances agent autonomy, aligning with the fundamental purpose of AI agents.
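The "workflow definition" point above can be made concrete with a small sketch. The functions stand in for API calls (their names and behavior are invented for illustration); the workflow itself is the kind of composition an agent could write on the fly rather than select from a fixed tool list.

```python
def search_knowledge(query: str) -> list[str]:
    # Stand-in for a GET request against a hypothetical knowledge endpoint.
    return [f"doc about {query}"]

def create_task(title: str) -> dict:
    # Stand-in for a POST request against a hypothetical tasks endpoint.
    return {"title": title, "status": "todo"}

def plan_from_research(topic: str) -> list[dict]:
    """A custom workflow: research a topic, then file one task per finding."""
    return [create_task(f"Review: {doc}") for doc in search_knowledge(topic)]

tasks = plan_from_research("token efficiency")
```

No predefined MCP tool needs to encode "search, then create tasks" as a single operation; the agent composes it from primitives when the situation calls for it.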
Conclusion: The speaker believes that flexibility will increasingly win over control as LLMs continue to evolve. The trend is towards making AI agents more autonomous by increasing their flexibility through mechanisms like Claude Skills and code execution. While MCP will likely persist for scenarios requiring high control and predictability, the future points towards more dynamic and efficient capability management.