Luisa Crawford
Jul 03, 2026 00:21
Anthropic shares cybersecurity measures for Fable 5 and unveils a new AI jailbreak severity framework, aiming for industry-wide collaboration.
Anthropic, the AI research powerhouse valued at $380 billion, has unveiled detailed cybersecurity safeguards for its Fable 5 model and proposed a framework to assess the severity of AI jailbreaks. Fable 5, part of Anthropic’s Claude family of AI models, was recently re-deployed globally following the lifting of U.S. export controls on advanced AI systems.
Key to Anthropic’s announcement is the introduction of safety classifiers designed to block or monitor potentially harmful use cases of Fable 5. These classifiers categorize activities into four distinct groups: prohibited use, high-risk dual use, low-risk dual use, and benign use. For example, prohibited activities include ransomware development and command-and-control operations, while benign uses involve secure coding and malware reverse engineering. The company has also expanded its “safety margin,” blocking certain low-risk activities as an extra precaution to prevent misuse.
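As a rough illustration of how a four-tier scheme like this could drive a block-or-monitor decision, here is a minimal Python sketch. The four category names come from the announcement; everything else is an assumption made for illustration, including the `Action` values, the `DEFAULT_POLICY` mapping, and the `in_safety_margin` flag. It does not describe Anthropic's actual classifiers.

```python
from enum import Enum


class UseCategory(Enum):
    """The four groups named in the announcement."""
    PROHIBITED = "prohibited use"
    HIGH_RISK_DUAL_USE = "high-risk dual use"
    LOW_RISK_DUAL_USE = "low-risk dual use"
    BENIGN = "benign use"


class Action(Enum):
    """Hypothetical responses; the article only mentions blocking and monitoring."""
    BLOCK = "block"
    MONITOR = "monitor"
    ALLOW = "allow"


# Assumed mapping, for illustration only. The article does not say which
# category triggers which response.
DEFAULT_POLICY = {
    UseCategory.PROHIBITED: Action.BLOCK,
    UseCategory.HIGH_RISK_DUAL_USE: Action.MONITOR,
    UseCategory.LOW_RISK_DUAL_USE: Action.MONITOR,
    UseCategory.BENIGN: Action.ALLOW,
}


def decide(category: UseCategory, in_safety_margin: bool = False) -> Action:
    """Return the assumed action for a classified request.

    `in_safety_margin` is a hypothetical flag modelling the extra precaution
    under which certain low-risk activities are blocked anyway.
    """
    if category is UseCategory.LOW_RISK_DUAL_USE and in_safety_margin:
        return Action.BLOCK
    return DEFAULT_POLICY[category]
```

Under these assumptions, `decide(UseCategory.LOW_RISK_DUAL_USE, in_safety_margin=True)` returns `Action.BLOCK`, while the same category outside the margin is only monitored.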
Dual-use challenges are central to Anthropic’s approach. Cybersecurity tools often serve both defenders and attackers, making it critical to distinguish between legitimate defensive applications and malicious exploitation. By training safety classifiers, Anthropic aims to support defensive applications like vulnerability scanning while mitigating risks of abuse.
Alongside the safeguards, Anthropic introduced an early draft of its Cyber Jailbreak Severity (CJS) framework. Jailbreaks are methods that bypass an AI model’s safeguards and enable potentially harmful outputs. The CJS framework grades jailbreak severity on a logarithmic scale from 0 (informational) to 4 (critical), based on factors such as capability gain, breadth of harmful potential, ease of weaponization, and discoverability. For example, a “turnkey” jailbreak that enables domain-expert-level attacks across multiple offensive categories would score at the highest level, CJS-4.
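To make the idea of a multi-factor severity grade concrete, the following Python sketch combines the four factors into a 0 to 4 level. The factor names and the CJS-0 to CJS-4 range come from the announcement; the per-factor 0 to 4 ratings, the equal weighting, and the rounding rule are assumptions invented for this example. The draft framework's real scoring method is not described in the article, and this sketch does not attempt to model the logarithmic nature of the scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JailbreakAssessment:
    """Hypothetical per-factor ratings, each an integer from 0 to 4."""
    capability_gain: int
    breadth: int
    ease_of_weaponization: int
    discoverability: int

    def factors(self) -> tuple[int, int, int, int]:
        return (
            self.capability_gain,
            self.breadth,
            self.ease_of_weaponization,
            self.discoverability,
        )


def cjs_level(assessment: JailbreakAssessment) -> int:
    """Combine the factor ratings into a single level from 0 to 4.

    Assumption: an equally weighted mean, rounded half up. The actual
    framework may weight or combine factors very differently.
    """
    ratings = assessment.factors()
    for rating in ratings:
        if not 0 <= rating <= 4:
            raise ValueError(f"rating out of range 0-4: {rating}")
    # (total + 2) // 4 is the integer mean of four values, rounded half up.
    return (sum(ratings) + 2) // 4


def cjs_label(assessment: JailbreakAssessment) -> str:
    """Format the level in the 'CJS-n' style used in the article."""
    return f"CJS-{cjs_level(assessment)}"


# A "turnkey" jailbreak rated at the maximum on every factor.
turnkey = JailbreakAssessment(4, 4, 4, 4)
print(cjs_label(turnkey))  # prints "CJS-4"
```

A simple mean is used here only because it is easy to follow; a real scheme might instead let a single severe factor dominate the grade.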
The framework is intended to provide a common language for AI developers and policymakers to assess risks. Anthropic has partnered with Glasswing, a cybersecurity firm, to refine the framework and is inviting input from industry, academia, and government. Additionally, a new HackerOne program allows security researchers to report potential jailbreaks for review.
This announcement follows a period of rapid growth for Anthropic. The company raised $30 billion in a Series G round earlier this year, cementing a $380 billion valuation. Secondary trades in April and May 2026 have reportedly implied valuations nearing $1 trillion. Annualized revenue exceeded $30 billion as of April, underscoring the commercial significance of its Claude models.
Anthropic’s emphasis on AI safety reflects both market and regulatory pressures. Anthropic President Daniela Amodei recently noted that advanced AI models hold “great promise but also great risks.” By sharing safeguards and frameworks like the CJS, Anthropic aims to establish itself as a leader in responsible AI governance. The company points to its public invitation for feedback and its engagement with the security community as evidence of its commitment to transparency.
Industry observers will be watching closely as Anthropic’s frameworks evolve. The company’s efforts to standardize AI safety protocols could influence not only its own operations but also broader industry norms, particularly as governments worldwide grapple with the dual-use nature of advanced AI technologies.