Why Anthropic is keeping Claude Mythos Preview unreleased
By GitHub
Key Concepts
- Project Glasswing: A collaborative initiative led by Anthropic to secure critical software infrastructure using advanced AI.
- Frontier Model: A highly capable AI model at the leading edge of current capabilities; here, the unreleased Claude Mythos Preview, which demonstrates advanced reasoning and coding skills.
- Vulnerability Chaining: The process of linking multiple minor security flaws to achieve full system compromise.
- CyberGym: A benchmark platform used to evaluate the cybersecurity capabilities of AI models.
- Defensive Security: Proactive measures taken to identify and patch software vulnerabilities before they are exploited by malicious actors.
Overview of Project Glasswing
Anthropic has launched Project Glasswing, a strategic initiative that uses the advanced capabilities of its unreleased Claude Mythos Preview model to fortify global software infrastructure. The project involves a coalition of major technology companies working to identify and remediate security flaws in critical open-source and proprietary software.
Technical Capabilities of Claude Mythos Preview
Claude Mythos Preview has demonstrated proficiency in cybersecurity that significantly exceeds existing models. Key performance indicators include:
- Benchmark Performance: The model scored 83.1% on CyberGym, 16.5 percentage points above the 66.6% scored by Claude Opus 4.6.
- Vulnerability Discovery: The model successfully identified a 27-year-old vulnerability in OpenBSD and a 16-year-old vulnerability in FFmpeg.
- Autonomous Exploitation: The model demonstrated the ability to autonomously chain together multiple Linux kernel vulnerabilities, resulting in complete system control.
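To put the benchmark figures above in perspective, the short sketch below computes the absolute gap (in percentage points) and the relative gain between the two reported CyberGym scores. The function name `score_gap` and the constant names are illustrative choices, not part of any benchmark tooling; the only inputs are the two scores quoted in this summary.

```python
# Scores as reported in this summary (percent on CyberGym).
MYTHOS_PREVIEW_SCORE = 83.1
OPUS_4_6_SCORE = 66.6


def score_gap(new: float, old: float) -> tuple[float, float]:
    """Return (absolute gap in percentage points, relative gain in percent)."""
    absolute = new - old
    relative = absolute / old * 100
    # Round to one decimal place to match the precision of the reported scores.
    return round(absolute, 1), round(relative, 1)


absolute_pts, relative_pct = score_gap(MYTHOS_PREVIEW_SCORE, OPUS_4_6_SCORE)
print(f"Absolute gap: {absolute_pts} percentage points")  # 16.5
print(f"Relative gain: {relative_pct}%")                  # 24.8
```

The two numbers answer different questions: the absolute gap says how many more benchmark points the newer model earns, while the relative gain expresses that improvement as a fraction of the older model's score.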
Strategic Investment and Resource Allocation
Anthropic is backing this initiative with significant financial and technical resources to ensure the technology is used for defensive purposes:
- Usage Credits: A commitment of $100 million in usage credits for Claude Mythos Preview, earmarked for defensive security research and infrastructure hardening.
- Direct Funding: A $4 million donation to open-source security organizations to support the maintenance and auditing of critical software libraries.
The "Defensive Race" Framework
The core argument behind Project Glasswing is the existence of a "defensive race." As AI models reach a threshold where they can identify and exploit complex software vulnerabilities, there is an urgent need to use these same capabilities to patch systems before malicious actors develop or acquire similar AI-driven offensive tools.
Deployment and Safety Protocols
Because Claude Mythos Preview's capabilities carry high risk, Anthropic has established strict access controls:
- Restricted Availability: Claude Mythos Preview will not be released for general public use.
- Safeguards: Anthropic plans to introduce new safeguards alongside an upcoming Claude Opus model. These are intended to provide safety guidelines and guardrails, enabling the eventual deployment of Mythos-class capabilities in a controlled and secure manner.
Conclusion
Project Glasswing represents a proactive shift in AI safety, moving from theoretical risk assessment to the active application of frontier AI in cybersecurity. By providing massive financial support and restricted access to high-capability models, Anthropic aims to stay ahead of the "offensive" potential of AI, ensuring that the same technology capable of breaking systems is primarily utilized to secure them.