Why Agents Are Ignoring Your Skills (Literally)

Skills vs. Agents.md: A Deep Dive into Agent Context Management

Key Concepts:

Skills: A progressive disclosure mechanism for providing context to agents, introduced by Anthropic, allowing agents to access detailed information only when needed.
Agents.md (or CLAUDE.md): A grounding document containing all relevant information for an agent, loaded directly into the context window.
MCP (Model Context Protocol): A protocol whose servers expose tools to an agent. The tool definitions are loaded into the context window up front, which can bloat it.
Progressive Disclosure: The practice of revealing information to an agent incrementally, only when it’s required.
Context Window: The limited amount of text a language model can process at once.
RL (Reinforcement Learning): A training method used to optimize agent behavior through rewards and penalties.
Post-Training: Additional training applied to a model after its initial training phase, often to improve specific capabilities.
Context Rot: The decline in output quality as the context window fills with long, outdated, or irrelevant information.

1. The Problem with Skills: Low Invocation Rates

The video centers on a recent observation: despite the elegant design of “skills” as a progressive disclosure method for agent context, agents frequently ignore them. A Vercel blog post reported that agents.md consistently outperforms skills in their evaluations (evals). Specifically, in 56% of cases in Vercel’s testing, agents never invoked the available skill at all, rendering the feature ineffective. This defeats the intended purpose of skills: to provide focused information only when an agent needs it.

2. Skills vs. MCPs & the Rise of Progressive Disclosure

Skills were conceived as a solution to the problems associated with MCP servers. An MCP server loads all of its tool definitions into the context window upfront, even if the agent doesn’t immediately need them, leading to wasted tokens and potential performance degradation. Skills, in contrast, aim for progressive disclosure: the agent initially sees only short skill descriptions, and retrieves the detailed information only when it explicitly invokes a skill. However, the core issue is that agents aren’t reliably invoking these skills in the first place.
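The difference between upfront loading and progressive disclosure can be sketched in a few lines of Python. This is a minimal illustration, not Anthropic's or any MCP implementation: the Skill class, the function names, and the example skills are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str  # short summary, always visible to the agent
    body: str         # detailed instructions, loaded only on invocation


def upfront_context(skills):
    """Load everything at once, descriptions and full bodies alike."""
    return "\n".join(f"{s.name}: {s.description}\n{s.body}" for s in skills)


def disclosed_context(skills):
    """Progressive disclosure: expose only names and descriptions at first."""
    return "\n".join(f"{s.name}: {s.description}" for s in skills)


def invoke(skills, name):
    """Return the full body of one skill when the agent asks for it."""
    for s in skills:
        if s.name == name:
            return s.body
    raise KeyError(name)


# Hypothetical skills with placeholder bodies standing in for long docs.
skills = [
    Skill("nextjs-docs", "Notes on the Next.js 16 APIs", "x" * 4000),
    Skill("testing", "How to run the test suite", "y" * 3000),
]

initial = disclosed_context(skills)  # small: descriptions only
full = upfront_context(skills)       # large: every body included
print(len(initial), len(full))
```

The saving only materializes if the agent actually calls invoke when a task needs the detail, which is exactly the step the evaluations found unreliable.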

3. Real-World Application: Next.js 16 Updates & Documentation Management

The video uses a new version of Next.js (version 16), whose behavior differs from earlier versions, as a practical use case for skills. Skills are ideally suited for managing documentation updates or conflicting information between libraries. They offer separation of concerns and prevent context window bloat by making documentation available on demand. However, even in this scenario, Vercel found that agents weren’t utilizing the skills effectively.

4. The Root Cause: Lack of Model Training

The primary reason for the low skill invocation rate is that most frontier language models haven’t been adequately trained to utilize skills. While Anthropic has employed reinforcement learning (RL) to train their agents on skill usage, this post-training hasn’t been widely adopted by other leading model developers, such as those behind Gemini 3 or GPT-5.2. The video draws a parallel to Kimi’s Agent Swarm, which required “parallel agent reinforcement learning” to effectively manage multiple sub-agents, demonstrating the significant training effort required for complex agent orchestration. The speaker emphasizes that simply adopting the “skills” standard isn’t enough; substantial investment in training is crucial.

5. Unexpected Results: Skills Can Worsen Performance

Interestingly, Vercel’s research revealed that adding skills didn’t just fail to improve performance; in certain situations it actually decreased performance compared to a baseline without skills. This highlights the potential for poorly implemented skills to actively hinder agent effectiveness.

6. The agents.md Approach: Simplicity and Effectiveness

The research indicated that the simplest approach, including all relevant information in a single agents.md file (or CLAUDE.md), consistently outperformed skills. agents.md functions as a grounding document, providing the agent with all necessary context. Because this approach consumes context window space, Vercel compressed a 40KB skills context down to 8KB (an 80% reduction) through summarization without significant performance loss. CLAUDE.md is presented as Claude Code’s equivalent of agents.md, the format the video describes as supported by Codex, Gemini, and Antigravity.

7. Prompt Engineering & Skill Invocation

The video notes that how an agent is instructed to use skills significantly impacts invocation rates. Explicitly telling the agent to “use skill [skill name]” increases the likelihood of invocation, but the effectiveness is heavily dependent on the specific prompt used. Therefore, substantial prompt engineering is required to successfully leverage skills.
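As a rough illustration of the explicit-instruction approach, the sketch below prepends a directive naming the skill to the task prompt. The function name, the directive wording, and the skill name are assumptions for illustration only; the video does not give an exact phrasing, and it stresses that results vary with the wording chosen.

```python
def build_prompt(task, skill_name=None):
    """Optionally prepend an explicit instruction to use a named skill."""
    if skill_name is None:
        # Baseline: the agent must decide on its own whether to invoke a skill.
        return task
    # Explicit variant: name the skill so the agent is more likely to invoke it.
    return f"Use skill {skill_name} before you start.\n\n{task}"


baseline = build_prompt("Migrate this page to the new router.")
explicit = build_prompt("Migrate this page to the new router.", "nextjs-docs")
print(explicit)
```

Comparing invocation rates between the baseline and explicit variants across several phrasings is one way to measure how sensitive a given model is to the instruction.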

8. Recommendations & Future Directions

The speaker recommends the following:

Model Selection: Consider the underlying model. Opus 4.5, due to potential RL post-training, might be more capable of utilizing skills.
agents.md as a Default: Prioritize agents.md (or CLAUDE.md) containing an index: a summarized overview of the available skills with links to the detailed documentation. This aligns with Vercel’s recommended pattern.
Context Compression: Summarize skill information to minimize context window usage.
Community Feedback: Encourage sharing experiences and solutions related to skill implementation.
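The index-plus-summary recommendation above can be sketched as a small generator that writes one line per skill into agents.md, pointing at the detailed file instead of inlining it. This is a hypothetical sketch, not Vercel's actual tooling: the SkillDoc class, the index layout, and the file paths are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class SkillDoc:
    name: str
    summary: str  # one-line compressed description kept in agents.md
    path: str     # where the detailed documentation lives
    full_text: str


def render_index(docs):
    """Build the index section: one summary line and link per skill."""
    lines = ["Skill index (read the linked file before working on a matching task):"]
    for d in docs:
        lines.append(f"{d.name}: {d.summary} -> {d.path}")
    return "\n".join(lines)


def size_in_bytes(text):
    """Size of the text as it would be stored, in UTF-8 bytes."""
    return len(text.encode("utf-8"))


# Hypothetical documents; the bodies are placeholders for long docs.
docs = [
    SkillDoc("nextjs-docs", "Notes on the Next.js 16 APIs", "docs/nextjs-16.md", "x" * 4000),
    SkillDoc("testing", "How to run the test suite", "docs/testing.md", "y" * 3000),
]

index = render_index(docs)
inlined = "\n".join(d.full_text for d in docs)
print(size_in_bytes(index), size_in_bytes(inlined))
```

Because the index is always in context, the agent does not have to decide to invoke anything before it learns that the documentation exists; it only has to follow a link.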

9. Notable Quote:

“Although almost every frontier lab is adopting skills, most of these models are not trained to actually use these. It's a completely different abstraction compared to tool usage or function calling.” – The speaker, highlighting the critical need for model training.

10. Synthesis & Conclusion

The video presents a compelling argument that, currently, the “skills” abstraction for agent context management is often ineffective due to a lack of adequate model training. While the concept of progressive disclosure is sound, agents frequently fail to invoke available skills, leading to diminished performance. The simpler approach of utilizing a comprehensive agents.md (or CLAUDE.md) document, potentially with indexing and summarization, appears to be more reliable. The key takeaway is that successful agent context management requires careful consideration of the underlying model’s capabilities and a willingness to experiment with prompt engineering and data compression techniques. Further investment in model training, specifically focused on skill utilization, is crucial for realizing the full potential of this approach.
