UK AISI: GPT-5.5 Matches Claude Mythos on Cybersecurity Benchmarks

When Anthropic restricted access to Claude Mythos Preview to approximately 40 organizations under its Project Glasswing initiative, the justification was specific: Mythos represented an unprecedented cybersecurity capability — one too dangerous for general release. That rationale has now been substantially challenged. The UK’s AI Security Institute (AISI) ran OpenAI’s newly public GPT-5.5 through the same 95-challenge cybersecurity gauntlet used to evaluate Mythos — and GPT-5.5 matched it at every level. On the hardest Expert challenges, GPT-5.5 scored 71.4% vs Mythos’s 68.6% — within the statistical margin of error.
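To make the "margin of error" point concrete, here is a minimal sketch of a two-proportion comparison using the reported Expert-tier scores. The sample size of 35 attempts per model is an assumption chosen purely for illustration: the article does not say how many of the 95 challenges are Expert-level or how many runs were averaged. The function name is hypothetical, and the normal approximation is only a rough guide at small sample sizes.

```python
import math


def diff_confidence_interval(p1, p2, n1, n2, z=1.96):
    """Normal-approximation (Wald) confidence interval for p1 - p2.

    z=1.96 corresponds to a two-sided 95% interval.
    """
    diff = p1 - p2
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff - z * se, diff + z * se


# Expert-tier scores as reported in the article.
gpt_score = 0.714
mythos_score = 0.686

# ASSUMPTION: 35 scored attempts per model. This figure is not in the article.
N_ASSUMED = 35

low, high = diff_confidence_interval(gpt_score, mythos_score, N_ASSUMED, N_ASSUMED)
overlaps_zero = low < 0 < high

print(f"difference: {gpt_score - mythos_score:+.3f}")
print(f"95% CI: [{low:+.3f}, {high:+.3f}]")
print(f"interval contains zero: {overlaps_zero}")
```

Under that assumed sample size, the interval around the 2.8-point gap comfortably includes zero, which is what "within the margin of error" means here. A much larger number of scored attempts would narrow the interval, so the conclusion depends on the real evaluation design.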

What the AISI Found

The AISI’s cybersecurity evaluation used 95 challenges designed to test a model’s ability to identify, analyze, and exploit software vulnerabilities — spanning from reconnaissance tasks to building complete working exploits. The suite was calibrated against Mythos’s capabilities specifically, making it the most directly comparable test available.

GPT-5.5 did not merely come close to Mythos on the benchmark; it matched it. In one challenge, GPT-5.5 built a disassembler for a Rust binary fully autonomously in 10 minutes and 22 seconds, at a cost of $1.73. The AISI concluded that the cybersecurity risk is not a breakthrough unique to Anthropic’s model: it is “a byproduct of more general improvements in long-horizon autonomy, reasoning, and coding” across frontier models as a category.

That conclusion matters because Anthropic’s justification for restricting Mythos rested on the premise that the model represented a qualitative leap in cybersecurity capability, something sufficiently different from existing public models to warrant unprecedented access controls. If GPT-5.5, which is publicly available to any Plus, Pro, Business, or Enterprise user, matches Mythos on the same benchmark, the “unprecedented” framing is difficult to sustain.

The Implication for Restricted Access Models

The AISI finding raises a question that the AI safety community has not fully resolved: does restricting access to one particularly capable model achieve meaningful security benefit if comparable capability is available through other public channels?

The argument for Mythos’s restrictions was essentially a first-mover one: give defenders (the roughly 40 Project Glasswing organizations) a head start in hardening their systems before Mythos-class capabilities became widely accessible. If GPT-5.5 is already at Mythos’s level and is available to any ChatGPT subscriber, the head start was shorter than anticipated. The window between Mythos’s restricted deployment and the availability of equivalent public capability has apparently closed in weeks, not months.

This doesn’t invalidate Anthropic’s decision to restrict Mythos — the restrictions made sense given what was known at the time of deployment, and the Project Glasswing defensive work is producing real security improvements in critical software. But it does suggest that model-level access controls may not be a durable cybersecurity strategy when frontier capability is diffusing across multiple public models simultaneously.

OpenAI’s Response: Filters Over Access Controls

OpenAI’s approach to the same capability question is instructive. Rather than restricting GPT-5.5 to a small set of vetted organizations, OpenAI deployed stricter cybersecurity classifiers that refuse requests the model judges likely to enable attack development or exploit creation outside authorized contexts. The tradeoff is different: broad access with behavioral guardrails versus narrow access with fewer guardrails.

Neither approach is clearly superior. Classifier-based filtering can be probed, jailbroken, or circumvented in ways that a closed-access model cannot. But access restrictions don’t prevent the capability from being replicated by competing labs — as GPT-5.5 demonstrates — and they create friction for the defensive security researchers who are precisely the users most likely to benefit from the capability legitimately.

What the UK and US Regulators Are Watching

UK and US financial regulators who spent April urgently briefing banks on the risks of Claude Mythos Preview now face an updated picture: the capability they were warned about is no longer exclusive to a single restricted model. The conversations that the Bank of England, the FCA, and the US Treasury are holding with financial institutions about AI cybersecurity risk need to expand from “one restricted model” to “any frontier model available commercially,” a substantially broader and less tractable threat surface.

Conclusion

The AISI’s GPT-5.5 benchmark results are one of the most important cybersecurity findings of 2026, not because they reveal new capability, but because they show that the industry’s assumption that frontier cybersecurity capability was uniquely concentrated in one restricted model was incorrect. For every organization that has built its AI security posture around monitoring Mythos access, the implication is clear: the threat model needs to be updated.

Written by

Daily-Hub-Admin
