
AI Cyber Offense Capabilities Are Doubling Every Six Months

AI offensive cyber capabilities are not progressing linearly; they are accelerating exponentially.


Key Points

  • Quantifying the Escalation of AI Threat Vectors
  • The Operational Gap Between Theory and Practice
  • The Open-Source Arms Race and Defensive Lag

Overview

AI offensive cyber capabilities are not progressing linearly; they are accelerating exponentially. A recent study by Lyptus Research quantified this alarming trend, finding that the ability of large language models (LLMs) to execute complex cyber tasks has doubled every 9.8 months since 2019. More critically, the rate of progress has sharply accelerated since 2024, now doubling every 5.7 months. This pace suggests that defensive measures and current security protocols are rapidly falling behind the technological curve.
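
To make these rates concrete, the sketch below simply compounds the two reported doubling times. It is a minimal illustration, not the study's model: the 60-month crossover assumes the regime change occurred exactly five years after the 2019 baseline, and the output is a relative multiplier rather than any absolute capability measure.

```python
# Minimal sketch of the study's headline numbers: capability doubling
# every 9.8 months from 2019, then every 5.7 months from 2024 onward.
# The 60-month crossover point is our assumption about the regime change.

def capability_multiplier(months_since_2019: float) -> float:
    """Relative capability growth under the two reported doubling rates."""
    months_at_slow_rate = min(months_since_2019, 60.0)          # 2019-2024
    months_at_fast_rate = max(months_since_2019 - 60.0, 0.0)    # post-2024
    return 2 ** (months_at_slow_rate / 9.8) * 2 ** (months_at_fast_rate / 5.7)

for months in (12, 36, 60, 72, 84):
    print(f"{months:3d} months after 2019: ~{capability_multiplier(months):7.1f}x baseline")
```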

The research, which utilized the METR time-horizon method and tested models against 291 distinct tasks, provides concrete metrics of this escalation. It demonstrates that advanced closed-source models, such as GPT-5.3 Codex and Opus 4.6, can now solve highly complex tasks with a 50 percent success rate using a modest two-million-token budget. These tasks, which would require an estimated three hours of effort from human security experts, are now within the grasp of AI systems.
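
The METR time-horizon method referenced here works, roughly, by fitting a success-probability curve against task length and reading off the duration at which the model succeeds half the time. The sketch below shows the general shape of that estimate; the logistic form and the task data are illustrative assumptions, not the study's actual 291-task dataset.

```python
# A rough sketch of a METR-style time-horizon estimate: fit success
# probability against log task duration, then report the duration at
# which the fitted curve crosses 50 percent. The task data below is
# invented for illustration; the study used 291 real tasks.
import numpy as np
from scipy.optimize import curve_fit

def logistic(log_minutes, midpoint, slope):
    # Success probability declines as tasks get longer; hits 0.5 at midpoint.
    return 1.0 / (1.0 + np.exp(slope * (log_minutes - midpoint)))

# (human-expert minutes per task, observed model success 0/1) -- illustrative
durations = np.array([5, 15, 30, 60, 120, 180, 300, 480], dtype=float)
successes = np.array([1, 1, 1, 1, 1, 0, 0, 0], dtype=float)

params, _ = curve_fit(logistic, np.log(durations), successes, p0=[np.log(120), 1.0])
horizon_minutes = np.exp(params[0])  # the 50% point is the logistic midpoint
print(f"Estimated 50% time horizon: ~{horizon_minutes:.0f} human-expert minutes")
```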

The implications of this acceleration are profound: prevailing assessments of AI threat vectors severely underestimate the pace of change. The data points to a systemic failure in defensive readiness, requiring immediate attention from both government and private-sector security entities.

Quantifying the Escalation of AI Threat Vectors

The study’s most striking finding is the sheer magnitude of the performance jump when scaling resource budgets. When the token budget increases from two million to ten million, the capabilities of GPT-5.3 Codex expand dramatically. The time horizon for task completion jumps from an estimated three hours to over ten and a half hours. This scaling ability suggests that the current benchmark tasks are merely the tip of the iceberg, and the true rate of progress is likely even faster.
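
Taken at face value, the two reported data points imply a specific scaling relationship. The sketch below fits them to a power law (time horizon proportional to token budget raised to an exponent); the power-law form is our assumption, used only to show what the reported jump would imply if it held at larger budgets.

```python
# Back-of-the-envelope reading of the reported budget scaling, assuming
# a power law: horizon ~ budget ** alpha. Only the two (budget, horizon)
# points from the article are used; the functional form is an assumption.
import math

budgets_tokens = (2_000_000, 10_000_000)
horizons_hours = (3.0, 10.5)

alpha = math.log(horizons_hours[1] / horizons_hours[0]) / math.log(
    budgets_tokens[1] / budgets_tokens[0]
)
print(f"Implied scaling exponent: alpha ~ {alpha:.2f}")

# Extrapolation under the same assumed power law -- illustrative only.
for budget in (20_000_000, 50_000_000):
    horizon = horizons_hours[0] * (budget / budgets_tokens[0]) ** alpha
    print(f"{budget:>12,} tokens -> ~{horizon:.1f} hours")
```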

This quantitative jump highlights that the problem is not just the existence of capable models, but the speed at which their operational scope expands. The ability to work productively through a ten-million-token budget allows the AI to maintain coherence and complexity over extended, multi-stage cyber operations. This moves the threat profile from simple exploit generation to sophisticated, multi-vector campaign planning.

Furthermore, the research established a clear performance gap between proprietary and open-source models: open-source alternatives lag their closed-source counterparts by approximately 5.7 months. While this lag provides a temporary window of opportunity for defenders, the accelerating pace of the closed-source leaders means the gap is unlikely to narrow, making open-source models increasingly inadequate for high-level threat modeling.
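
Because the reported lag happens to equal the current doubling time, the arithmetic is stark: a constant 5.7-month lag corresponds to roughly a factor-of-two capability deficit at any given moment, as the sketch below illustrates. Treating the lag as constant is an assumption that mirrors the article's framing.

```python
# Quick arithmetic: if closed-source capability doubles every 5.7 months
# and open-source models trail by 5.7 months, the lag equals exactly one
# doubling -- roughly a 2x capability deficit at any moment. A constant
# lag is our assumption, mirroring the article's framing.

DOUBLING_MONTHS = 5.7

def capability_ratio(lag_months: float) -> float:
    """Closed-source / open-source capability ratio implied by a time lag."""
    return 2 ** (lag_months / DOUBLING_MONTHS)

print(f"5.7-month lag  -> ~{capability_ratio(5.7):.1f}x deficit")
print(f"12-month lag   -> ~{capability_ratio(12.0):.1f}x deficit")
```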


The Operational Gap Between Theory and Practice

The successful execution of these tasks requires more than just brute-force token processing; it demands a deep understanding of exploit chaining, vulnerability mapping, and lateral movement—the hallmarks of professional human penetration testing. The fact that models like GPT-5.3 Codex are achieving a 50 percent success rate on these tasks indicates a significant shift in the threat landscape.

Historically, AI was viewed as a powerful tool for code generation or data analysis. The current data reframes it as a fully integrated, autonomous offensive platform. The models are not merely suggesting code; they are solving multi-step, resource-intensive problems that mimic the cognitive process of a highly skilled, dedicated adversary.

This capability fundamentally changes the defensive calculus. Traditional perimeter defenses, which rely on detecting known signatures or patterns of attack, are insufficient. The threat is now intellectual and systemic, requiring defenses that model the intent and process of the attack, rather than just the payload itself.


The Open-Source Arms Race and Defensive Lag

The differential performance between closed-source and open-source models introduces a critical vulnerability into the global security ecosystem. While open-source models promote transparency and decentralized research, their current performance gap of 5.7 months relative to proprietary systems means that critical infrastructure relying solely on open-source tooling is operating with a built-in, quantifiable risk deficit.

This gap is not merely a technical inconvenience; it is a strategic vulnerability. Organizations that cannot afford or implement the most advanced closed-source AI tools will face a significantly higher risk profile when confronted by state-level or well-funded criminal actors leveraging the latest proprietary models.

The data suggests that the industry is currently reacting to the threat rather than predicting it. The acceleration rate—doubling every 5.7 months—demands a proactive shift in defensive investment, moving away from reactive patching cycles toward predictive, AI-driven threat modeling and resilience engineering.