How Anthropic Skills Actually Work (Live Stream Deep Dive)
By Eduards Ruzga
Key Concepts
- Skills: A new feature by Anthropic that allows LLMs to dynamically load prompts and use code more effectively. They are hierarchical, with a main skill file linking to other skill files.
- MCPs (Model Context Protocol servers): The earlier approach, in which the full definitions and prompts of every connected tool are loaded into the context up front, potentially consuming many tokens. Skills offer a more token-efficient alternative.
- Code Execution: A capability within Claude Desktop that enables the AI to edit and create files, including office documents.
- Internet Access for Skills: A recently released feature allowing skills to access the internet, though with limitations.
- Virtual Machine (VM) / Container: An isolated environment provided for each chat session in Claude, equipped with resources like RAM and storage, and capable of executing code.
- Persistence: The ability for a chat session's environment and data to persist over time, though not shared between different chat sessions.
- Desktop Commander: A tool developed by the speakers that allows LLMs to use a user's local computer as a tool, offering similar functionalities to Claude's skills but running locally.
- Skill Development: The process of creating and improving skills, which can involve writing prompts, scripts, and even modifying existing skills.
- Test-Driven Development (TDD) for Skills: The concept of creating tests for skills to ensure they function correctly and to prevent regressions when modifications are made.
Claude Skills: A Deep Dive
This video explores Anthropic's new "Skills" feature, a significant step towards enabling AI to interact with software and code more dynamically. The speakers, Eduards and Mitri, compare it to their own tool, Desktop Commander, and discuss the architecture, capabilities, and limitations of Claude's skills.
Architecture and Functionality of Skills
Skills are presented as a more efficient and flexible way to provide LLMs with access to functionality compared to previous MCPs.
- Hierarchical Structure: Skills are organized hierarchically. A main skill file acts as an entry point, linking to other skill files that contain specific prompts and code. This allows for modularity and efficient loading of only necessary prompts.
- Token Efficiency: Unlike MCPs, which load entire prompts into context, skills initially expose only a short description of each skill. The AI then loads the relevant prompts dynamically when needed, saving tokens.
- Code Integration: Skills can include scripts (e.g., Python code) that the LLM can execute. This is contrasted with tools like ChatGPT's Code Interpreter, which generates code from scratch. Skills leverage pre-written, tested code, making execution faster, cheaper, and more deterministic.
- Dynamic Loading: The AI decides which skill to use and loads its associated prompts only when required. This is described as a "fractal" or "context on demand" approach.
- Self-Improvement: A key aspect of skills is the AI's ability to write, modify, and improve them, leading to a self-improving system.
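The "context on demand" loading described above can be sketched in a few lines of Python. This is a minimal sketch, not Anthropic's implementation: it assumes each skill is a folder whose entry file `SKILL.md` starts with a `---`-delimited frontmatter block holding `name` and `description` (the layout used by Anthropic's published skills), parses that block with a deliberately naive `key: value` reader instead of a real YAML parser, and estimates tokens with the rough four-characters-per-token heuristic. All function and class names are hypothetical.

```python
from __future__ import annotations

import tempfile
from dataclasses import dataclass
from pathlib import Path


def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a '---'-delimited header from the body (naive 'key: value' lines, not full YAML)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing '---': treat the whole file as body


@dataclass
class Skill:
    name: str
    description: str
    path: Path  # SKILL.md on disk; the body is read only when requested

    def load_body(self) -> str:
        return parse_frontmatter(self.path.read_text(encoding="utf-8"))[1]


def approx_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic, not a real tokenizer


def discover(root: Path) -> list[Skill]:
    """Read only the frontmatter of every <root>/<skill>/SKILL.md."""
    skills = []
    for skill_md in sorted(root.glob("*/SKILL.md")):
        meta, _ = parse_frontmatter(skill_md.read_text(encoding="utf-8"))
        skills.append(Skill(meta.get("name", skill_md.parent.name),
                            meta.get("description", ""), skill_md))
    return skills


def startup_index(skills: list[Skill]) -> str:
    """What sits in context up front: one short line per skill."""
    return "\n".join(f"- {s.name}: {s.description}" for s in skills)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        skill_dir = Path(tmp) / "pptx"
        skill_dir.mkdir()
        (skill_dir / "SKILL.md").write_text(
            "---\nname: pptx\ndescription: Create and edit PowerPoint files\n---\n"
            + "Detailed instructions...\n" * 500, encoding="utf-8")
        skills = discover(Path(tmp))
        index = startup_index(skills)
        body = skills[0].load_body()  # loaded only once the model picks this skill
        print(approx_tokens(index), "tokens up front vs", approx_tokens(body), "on demand")
```

The design point is that `discover` never keeps the body in memory: only the one-line index is paid for in every conversation, and the long instructions cost tokens only in the chats that actually use the skill.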
Enabling and Using Skills in Claude Desktop
The speakers demonstrate how to enable and interact with skills within Claude Desktop.
- Enabling Skills: Skills are not enabled by default. Users must go to Settings > Capabilities, enable Code Execution, and then turn on Skills.
- Skill Creator: A specific skill, "Skill Creator," allows the AI to create new skills collaboratively with the user.
- Office File Skills: Claude has released skills for working with PDF, Word, Excel, and PowerPoint files. Some of them, such as the PowerPoint skill, generate HTML first and then convert it to the target format.
- Internet Access: A recent update allows skills to access the internet. Users can grant access to all skills, only package managers, or none. This was previously limited to package managers for installing dependencies.
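The three internet-access options can be thought of as an egress policy applied to every outgoing request. The sketch below is purely illustrative: the `NetworkAccess` enum, the `is_allowed` function, and the set of package-registry hosts are assumptions, not Claude's actual configuration or enforcement mechanism.

```python
from enum import Enum
from urllib.parse import urlparse


class NetworkAccess(Enum):
    NONE = "none"
    PACKAGE_MANAGERS = "package_managers"
    ALL = "all"


# Assumed registry hosts for illustration; the real allowlist was not shown on stream.
PACKAGE_HOSTS = {"pypi.org", "files.pythonhosted.org", "registry.npmjs.org"}


def is_allowed(url: str, mode: NetworkAccess) -> bool:
    """Decide whether a request from the sandbox may leave it under the given mode."""
    host = (urlparse(url).hostname or "").lower()
    if mode is NetworkAccess.ALL:
        return True
    if mode is NetworkAccess.PACKAGE_MANAGERS:
        return host in PACKAGE_HOSTS  # enough to `pip install` / `npm install`, nothing else
    return False  # NetworkAccess.NONE blocks everything
```

Note that even under `ALL`, a request being allowed out of the sandbox does not mean the destination will answer it: as discussed later in the stream, sites can still detect and block the container as a bot.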
Demonstration: PowerPoint Skill
The speakers test a skill for creating PowerPoint presentations.
- Process: The AI reads documentation on working with PowerPoint, identifies the need to use HTML-to-PowerPoint conversion, and generates HTML files. It then writes code to convert these HTML files into a PPTX format.
- Code Execution: The AI uses a script to check if the content fits within the slides, demonstrating that LLMs, while not inherently good at geometry, can leverage scripts within skills to perform such checks and make adjustments.
- Output: The skill produces a PowerPoint (PPTX) presentation built from the generated HTML slides. While the initial output is described as "okayish," it's considered a significant improvement over previous methods.
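The fit check from the demo boils down to a small piece of geometry that is trivial for a script and unreliable for an LLM. This is a minimal sketch, not the script bundled with Anthropic's PowerPoint skill: it assumes element positions have already been measured (for example, from the rendered HTML) in points, assumes a 16:9 slide of 720 × 405 pt (10 × 5.625 in), and only checks the right and bottom edges, where overflowing text usually spills. `Box` and `find_overflows` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass

SLIDE_W, SLIDE_H = 720.0, 405.0  # assumed 16:9 slide size in points


@dataclass
class Box:
    label: str
    x: float  # left edge, points from the slide's left
    y: float  # top edge, points from the slide's top
    w: float
    h: float


def find_overflows(boxes: list[Box], margin: float = 0.0) -> list[tuple[str, float, float]]:
    """Return (label, overflow_right, overflow_bottom) for boxes crossing the safe area."""
    problems = []
    for b in boxes:
        over_x = b.x + b.w - (SLIDE_W - margin)
        over_y = b.y + b.h - (SLIDE_H - margin)
        if over_x > 0 or over_y > 0:
            problems.append((b.label, max(over_x, 0.0), max(over_y, 0.0)))
    return problems


if __name__ == "__main__":
    slide = [Box("title", 36, 24, 648, 60), Box("bullets", 36, 100, 648, 330)]
    for label, dx, dy in find_overflows(slide):
        print(f"{label}: {dx:.0f} pt too wide, {dy:.0f} pt too tall")
```

For every label this returns, the agent can shrink the font, trim text, or split the slide and then re-run the check, which is the measure-and-adjust loop the speakers observed.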
The Claude Environment and Persistence
The underlying environment for skills in Claude is explored.
- Isolated Containers: Each chat session runs in a separate, persistent virtual machine (container) on the same machine. These containers are well-resourced (e.g., 9 GB RAM, 5 GB space).
- Persistence: Data and the environment within a chat session persist. Files created in one chat are not accessible in another, but the container itself remains active for a period. This is contrasted with some other platforms where instances might be shared and reset more frequently.
- Limitations: While containers have internet access, they are identified as bots, limiting their ability to scrape websites like YouTube.
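A short script is one way to reproduce the kind of resource inspection done on stream. This sketch assumes a Linux container; the 9 GB RAM and 5 GB storage figures above were observed during the stream and may differ between sessions, so the script simply reports what the current environment exposes. It is hedged in one more way: `os.sysconf` reports memory as seen by the kernel, which inside a container may describe the host or sandbox rather than the container's own limit, so the cgroup v2 limit is read separately when available.

```python
from __future__ import annotations

import os
import shutil
from pathlib import Path

GIB = 1024 ** 3


def sysconf_ram_bytes() -> int | None:
    """Physical memory as seen by the kernel (POSIX only); may reflect the host, not the container."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def cgroup_memory_limit() -> int | None:
    """cgroup v2 memory limit if the container exposes one; 'max' means no limit is set."""
    try:
        raw = Path("/sys/fs/cgroup/memory.max").read_text().strip()
    except OSError:
        return None
    return None if raw == "max" else int(raw)


def environment_report(path: str = "/") -> dict:
    """Summarize RAM, disk and CPU for the environment this script runs in."""
    usage = shutil.disk_usage(path)
    ram = sysconf_ram_bytes()
    limit = cgroup_memory_limit()
    return {
        "ram_gib": round(ram / GIB, 1) if ram is not None else None,
        "cgroup_limit_gib": round(limit / GIB, 1) if limit is not None else None,
        "disk_total_gib": round(usage.total / GIB, 1),
        "disk_free_gib": round(usage.free / GIB, 1),
        "cpu_count": os.cpu_count(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running the same script in two different chats, and again later in the same chat, is a cheap way to check the isolation and persistence behavior described above.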
Exploring Skills and Repositories
The speakers delve into the available skills and their sources.
- Public Skills Repository: Anthropic maintains a public repository of skills, which are mirrored within the Claude environment.
- Example Skills: Some skills are provided as examples and can be used as a basis for creating custom skills.
- Licensing: Public skills provided by Anthropic have a restrictive license, stating they are for use "as is" and cannot be reproduced or modified. Example skills, however, are more permissive.
Desktop Commander vs. Claude Skills
A comparison is drawn between Claude's skills and the Desktop Commander tool.
- Local vs. Remote: Desktop Commander leverages the user's local machine, offering more power and flexibility, while Claude's skills run in a remote, controlled environment.
- Internet Access: While Claude skills now have internet access, it's limited. Desktop Commander, running locally, has full internet access through the user's machine.
- File Upload Limits: Claude limits file uploads (31 MB was cited), whereas Desktop Commander can process larger files locally, for example when compressing a video.
- Universality: Skills are presented as potentially universal, usable by any AI tool with file system and terminal access. Desktop Commander is an example of such a tool.
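Because a skill is just files plus instructions, any agent that can read files and run commands can use one. The sketch below illustrates the terminal half of that claim: a local tool runs a script bundled in a skill folder and hands the output back to the model. It assumes the script is Python and lives in a `scripts/` subfolder; `run_skill_script` and that layout are illustrative assumptions, not Desktop Commander's code or a documented API.

```python
import subprocess
import sys
from pathlib import Path


def run_skill_script(skill_dir: Path, script: str, *args: str, timeout: float = 60.0) -> str:
    """Run scripts/<script> from a skill folder with the current Python and return its stdout."""
    scripts_root = (skill_dir / "scripts").resolve()
    path = (scripts_root / script).resolve()
    if scripts_root not in path.parents:
        raise ValueError(f"{script!r} is outside the skill's scripts folder")
    result = subprocess.run(
        [sys.executable, str(path), *args],
        capture_output=True, text=True, timeout=timeout, cwd=skill_dir,
    )
    if result.returncode != 0:
        raise RuntimeError(f"{script} failed:\n{result.stderr}")
    return result.stdout  # this text is what gets passed back into the model's context
```

Running pre-written scripts this way is what makes skills cheaper and more deterministic than generating code from scratch: the model only chooses the script and its arguments.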
Limitations and Future Directions
Several limitations and potential future developments are discussed.
- Bot Detection: The isolated environments are easily detected as bots, preventing activities like web scraping.
- Lack of Background Processes: Claude's tasks do not appear to run in the background, and leaving a chat can disrupt ongoing processes.
- Context Persistence: While skills persist, the AI's "knowledge" or context from previous interactions within a chat session is not explicitly stored in the skill environment itself. Context needs to be provided through the chat or other means.
- Testability: A key missing feature for skills is the ability to integrate tests, enabling a form of test-driven development for AI.
- UI/UX Evolution: The speakers believe that while voice and text interfaces are important, visual and gestural communication will remain crucial for human-AI interaction, leading to new UI/UX challenges.
- Skill Publishing: A desire is expressed for a mechanism to easily publish created skills.
Creating and Publishing a New Skill
The speakers demonstrate the creation and publishing of a new HTML-to-PDF skill.
- Process: They ask Claude to create a skill for generating beautiful PDFs from HTML, similar to the HTML-to-PPTX skill.
- Iteration: The AI attempts to create the skill, and the speakers provide feedback, asking for improvements.
- Output: A functional HTML-to-PDF skill is created, including a README, a sample PDF, and the skill files.
- Publishing: Using Desktop Commander and its GitHub CLI integration, the newly created skill is then published to a GitHub repository without leaving the AI interface.
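The kind of folder the demo ended up with (instructions, helper scripts, a README) can also be scaffolded with a small script before iterating on it with the model. This sketch assumes the `SKILL.md` frontmatter layout used by Anthropic's published skills and a lowercase-and-hyphens naming rule; the `scaffold_skill` function, the folder names, and the placeholder text are hypothetical and are not what Skill Creator produces.

```python
from pathlib import Path

SKILL_TEMPLATE = """---
name: {name}
description: {description}
---

# {title}

## Workflow
1. TODO: describe when to use this skill and which script in scripts/ to run.
"""


def scaffold_skill(root: Path, name: str, description: str) -> Path:
    """Create <root>/<name>/ with SKILL.md, README.md and an empty scripts/ folder."""
    # Assumed naming rule: lowercase letters, digits and hyphens only.
    if not name.replace("-", "").isalnum() or name != name.lower():
        raise ValueError("use lowercase letters, digits and hyphens for the skill name")
    skill_dir = root / name
    (skill_dir / "scripts").mkdir(parents=True, exist_ok=False)  # refuse to overwrite
    title = name.replace("-", " ").title()
    (skill_dir / "SKILL.md").write_text(
        SKILL_TEMPLATE.format(name=name, description=description, title=title),
        encoding="utf-8")
    (skill_dir / "README.md").write_text(f"# {title}\n\n{description}\n", encoding="utf-8")
    return skill_dir
```

Once the folder is a git repository with at least one commit, the GitHub CLI can create and push the remote in one step, for example `gh repo create html-to-pdf --public --source=. --push` (the repository name here is illustrative); in the stream this step was run through Desktop Commander without leaving the chat.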
Conclusion
Anthropic's Skills feature represents a significant advancement in enabling LLMs to interact with external tools and code. The ability to dynamically load prompts, execute scripts, and potentially self-improve skills offers a more efficient and powerful way for AI to solve problems. However, limitations in areas like internet access, bot detection, and background processing remain. The speakers highlight the power of local execution with tools like Desktop Commander, offering greater flexibility and control. The future of AI interaction is seen as increasingly tool-centric, with users building and customizing their own AI capabilities.