Large language models that once churned out mostly bogus vulnerability reports are increasingly effective at finding real security flaws in widely used software, a development that could both strengthen defenses and make attacks easier.
Anthropic recently introduced Mythos Preview, a model the company says could reshape cybersecurity after it reportedly identified high-severity vulnerabilities in major operating systems and web browsers and suggested exploitation paths. Anthropic is limiting access to Mythos Preview to roughly 50 vetted companies and organizations through a program called Project Glasswing, arguing the model is too risky for a broad public release. The company plans to release other models over time and aims to enable safe deployment of Mythos-class tools at scale.
Security professionals say the immediate concern is for security teams and infrastructure maintainers rather than ordinary users. Daniel Blackford, VP of Threat Research at Proofpoint, notes that everyday people remain most exposed to routine threats like credential theft. Still, better automated bug finding cuts both ways: it helps developers locate and fix defects faster, but it also lowers the barrier for malicious actors to discover exploitable bugs.
Open-source maintainers are already experimenting with these models. Jim Zemlin, CEO of the Linux Foundation, said kernel maintainers participating in Project Glasswing have begun testing Mythos to learn how it can assist their work. He argues such tools could significantly reduce the workload on maintainers who are often stretched thin.
The transition from noisy, low-quality AI reports to useful findings became visible in early 2026. Daniel Stenberg, lead developer of cURL, recounted how 2025 brought a flood of low-quality, likely AI-generated vulnerability reports: he received 185 submissions that year, found few real issues, and stopped paying bounties because triaging the mostly bogus reports consumed too much of his team’s time. Those early AI reports were often verbose and elaborate where a human reporter would have been concise.
By 2026 the quality mix shifted: submission volume stayed high, but the share of legitimate findings rose. A 2025 HackerOne survey had already shown wide interest in auditing AI systems; by the following year, many reports were surfacing real bugs. Stenberg estimates that about one in ten reports now identifies a genuine security vulnerability, and his team fixed more issues in the first three months of 2026 than in either of the two previous years. He also uses AI himself: a single automated pass flagged over 100 issues that had previously survived human review and static analysis.
Linux kernel maintainers and independent researchers have reported similar gains. Security researchers such as Nicholas Carlini have used earlier Anthropic models to find kernel vulnerabilities, and Alex Stamos, chief security officer at Corridor, has said that large language models have surpassed human capability for some types of bug finding. The release of stronger commercial models in late 2025 and into 2026 appears to have driven the recent jump in effectiveness.
Experts caution that finding a bug is only the first stage of an attack. Turning a vulnerability into a reliable exploit requires additional expertise, judgment, and testing, and many leading models from labs like Anthropic, OpenAI, and DeepMind include guardrails designed to limit generation of exploit code. The greater worry is open-weight models, publicly available models whose parameters can be inspected and modified: they could be altered to strip out safeguards and used both to discover and to weaponize flaws. Some reports suggest the most advanced open-weight models trail the top closed models by only months, raising concerns that defenders’ current advantage may erode if powerful capabilities become widely available.
There is also debate around AI’s role in fixing vulnerabilities. Stenberg and others say AI excels at detection but still struggles with the contextual judgment needed to implement safe, maintainable patches. Deciding how to remediate a bug in the context of overall design, coordinating changes across teams, and validating fixes remain largely human responsibilities. Meanwhile, companies such as HackerOne are building more agentic tools that aim to both find and propose patches, blurring the line between detection and remediation.
The emergence of these capabilities has political and procurement implications. The U.S. Department of Defense labeled Anthropic a supply-chain risk after the company urged governments not to use its technology for autonomous weapons or mass surveillance; the designation, which restricts government and contractor use of Anthropic products, is currently being contested. Some security experts argue that Anthropic’s decision to limit Mythos Preview’s distribution gives defenders and national security teams time to prepare.
For now, the cybersecurity community faces mixed outcomes: AI is surfacing higher-quality reports and uncovering issues missed by conventional tooling, but often-understaffed maintainers must triage a higher volume of findings and make careful remediation choices. The coming months and years will test whether the defensive benefits of AI can outpace the risks of misuse as more powerful models proliferate and open-weight alternatives narrow the gap with closed systems.