Autonomous Offense IRL: What Anthropic’s GTG‑1002 Exposes, and How We Scale the Fight Back
AI-enabled attackers are no longer theoretical. Anthropic’s GTG-1002 report shows an LLM executing the majority of a real attack lifecycle, from recon to lateral movement. This shift makes clear that traditional pentesting cannot keep up. Security teams now need a way to validate their defenses against AI-level offense.
When we founded XBOW, we set out to understand how LLMs could transform offensive security. It quickly became obvious that with every new model release, capabilities were accelerating at an unprecedented pace. Skills that once took years of pentesting experience to build were emerging inside LLMs in a matter of months. And attackers were developing the same capabilities in parallel.
I’ve spent 20 years in this industry, starting in zero-day discovery and exploitation, and seeing AI demonstrate real offensive capabilities was a pivotal moment for me. It was immediately clear that this would change security forever. And with that realization came a responsibility: to build a product that lets customers safely replicate and emulate the attacks that AI-enabled adversaries will soon be using. That’s what we do at XBOW.
“Coming Soon” becomes “Now”
Yesterday, Anthropic documented the first large-scale cyberattack executed mostly without human intervention. This is exactly the early stage of what we described when we published The Chaos Phase.
The report details how the Chinese state-sponsored group GTG-1002 ran an entire attack lifecycle powered by LLMs. Analysis shows that GTG-1002 completed roughly 80–90% of its tactical work (recon, phishing-kit generation, privilege-escalation attempts, lateral-movement experiments, and exfiltration prep) through an LLM, leaving humans in a supervisory role. Operators “role-played” a benign security engineer, then decomposed tasks into small chunks that slipped past safety rails. Critically, the agent scaled, firing thousands of requests per second, and it targeted roughly 30 organizations across tech, finance, government, and chemical manufacturing (Anthropic).
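To make that task-shaping pattern concrete, here is a minimal, purely illustrative sketch. The prompts, keyword table, and scoring below are my own assumptions, not Anthropic’s detection logic; the point is to show why per-request safety checks pass each subtask while session-level correlation reveals the kill chain:

```python
# Illustrative only: why per-request filtering misses "task-shaped" attacks.
# Each subtask below looks benign in isolation; only the *sequence* maps onto
# a kill chain. Prompts, keyword table, and threshold are assumptions.

SESSION = [
    "list open ports on 10.0.0.0/24 for an asset inventory",
    "write a script that tries common credentials against an SSH service",
    "summarize how to dump tables from a PostgreSQL database",
    "compress /var/backups and upload it to an external bucket",
]

# Crude keyword -> ATT&CK-style tactic mapping (hypothetical).
TACTIC_KEYWORDS = {
    "recon": ["open ports", "scan", "inventory"],
    "credential_access": ["credentials", "password spray", "ssh"],
    "collection": ["dump tables", "database"],
    "exfiltration": ["upload", "external bucket"],
}

def tactics_in(prompt: str) -> set[str]:
    """Return the tactics a single prompt weakly hints at."""
    lowered = prompt.lower()
    return {
        tactic
        for tactic, words in TACTIC_KEYWORDS.items()
        if any(w in lowered for w in words)
    }

# Per-request view: each prompt touches at most one or two tactics.
per_request = [tactics_in(p) for p in SESSION]

# Session-level view: correlate across the whole conversation. Covering
# most of a kill chain in one session is the signal per-request checks miss.
session_tactics = set().union(*per_request)
coverage = len(session_tactics) / len(TACTIC_KEYWORDS)

for prompt, hits in zip(SESSION, per_request):
    print(f"{sorted(hits) or 'benign'} <- {prompt[:45]}")
print(f"session covers {coverage:.0%} of tracked tactics -> escalate review")
```

The design point: intent lives in the sequence, not in any single request. Defenses that score prompts in isolation will keep missing it.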
This is clear evidence that the game has changed. With AI-level scale and speed, adversaries can be everywhere, all at once.
The Monday playbook: Why this matters for us as CISOs and our Blue Teams
The role of the CISO has never involved more complexity or change management. In the early 2000s, when the role was still emerging, we had to explain why penetration testing even mattered. Today, the emergence of wide-scale AI attacks is forcing another shift in the enterprise defense paradigm. Understanding how our systems stand up against AI-driven attacks is increasingly urgent. One thing is clear: traditional pentesting is no longer enough.
More specifically, three lessons from the AI-orchestrated attack detailed by Anthropic stand out for our Blue Teams:
- Task‑shaping: breaking malicious work into innocuous subtasks to bypass controls.
- Speed + concurrency: autonomous retry loops and branching probes outrun human triage.
- Breadth: the same playbook swept multiple sectors, implying reusable kill chains.
In short, automation turns yesterday’s “sophisticated APT” into tomorrow’s baseline. Defenders have to meet automation with automation, and verify every “defensive” agent is actually aligned.
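What does meeting automation with automation look like in practice? Here is a minimal sketch of a sliding-window detector that flags sources operating at machine speed. The event format, thresholds, and field names are assumptions for illustration; a real deployment would feed this from your SIEM and tune it per environment:

```python
# Minimal sketch: flag request sources whose rate or fan-out exceeds what a
# human operator could plausibly produce. Thresholds are illustrative.

from collections import defaultdict, deque

WINDOW_SECONDS = 10
MAX_EVENTS_PER_WINDOW = 50      # beyond hands-on-keyboard speed
MAX_DISTINCT_TARGETS = 15       # branching probes fan out across hosts

class BurstDetector:
    def __init__(self):
        # source -> deque of (timestamp, target) within the current window
        self.events = defaultdict(deque)

    def observe(self, ts: float, source: str, target: str) -> bool:
        """Record one event; return True if `source` now looks automated."""
        window = self.events[source]
        window.append((ts, target))
        # Evict events older than the window.
        while window and ts - window[0][0] > WINDOW_SECONDS:
            window.popleft()
        rate_hit = len(window) > MAX_EVENTS_PER_WINDOW
        fanout_hit = len({t for _, t in window}) > MAX_DISTINCT_TARGETS
        return rate_hit or fanout_hit

# Toy usage: 200 probes from one source in ~2 seconds trips the detector.
detector = BurstDetector()
alerts = [
    detector.observe(ts=i * 0.01, source="10.1.2.3", target=f"host-{i % 40}")
    for i in range(200)
]
print(f"flagged after {alerts.index(True) + 1} events")
```

A rule this simple would never catch a patient human intruder; that is exactly the point. It catches the thing humans can’t do, which is the new baseline the report describes.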
And Anthropic’s report shows that this is only the beginning. Research such as “From CVE Entries to Verifiable Exploits: An Automated Multi-Agent Framework for Reproducing CVEs” demonstrates how fast and inexpensive it has become to turn a CVE report into a working exploit (https://arxiv.org/abs/2509.01835). AI isn’t just powering operations; it’s accelerating exploit creation itself.
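For a sense of what such a framework looks like structurally, here is a high-level skeleton of a multi-agent CVE-reproduction loop. The stage names and agent interface are my own assumptions, not the paper’s actual architecture, and the agents are deliberately inert stubs; what matters is the orchestration shape, which is what makes the process fast and cheap:

```python
# High-level skeleton of a multi-agent CVE-reproduction pipeline. Stage names
# and interfaces are illustrative assumptions; all agents are inert stubs.

from dataclasses import dataclass, field

@dataclass
class CVECase:
    cve_id: str
    advisory: str                      # raw CVE/NVD advisory text
    artifacts: dict = field(default_factory=dict)

def parse_advisory(case: CVECase) -> None:
    """Agent 1: extract affected component, version range, vuln class."""
    case.artifacts["analysis"] = f"structured summary of {case.cve_id}"

def build_environment(case: CVECase) -> None:
    """Agent 2: pin the vulnerable version inside an isolated container."""
    case.artifacts["env"] = "container image with affected version"

def generate_candidate(case: CVECase) -> None:
    """Agent 3: draft a proof-of-concept trigger for the vuln class."""
    case.artifacts["poc"] = "candidate trigger (inert stub in this sketch)"

def verify(case: CVECase) -> bool:
    """Agent 4: run the candidate in the sandbox, check a crash oracle."""
    return "poc" in case.artifacts      # stand-in for a real oracle

def reproduce(case: CVECase) -> bool:
    # The loop is what makes this cheap: agents run, retry, and refine
    # without a human in the loop until the verifier is satisfied.
    for stage in (parse_advisory, build_environment, generate_candidate):
        stage(case)
    return verify(case)

print(reproduce(CVECase("CVE-0000-00000", "example advisory text")))
```

Each stage used to be a specialist’s day of work; wired together like this, the marginal cost of attempting one more CVE approaches zero.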
That said, Anthropic’s report feels somewhat blunt and leaves more questions open than it answers. I find it unlikely that a real Chinese threat actor would rely on a standard Anthropic model, not because these models aren’t effective (we’ve demonstrated their offensive capabilities repeatedly), but because such groups know Anthropic actively monitors for this type of activity and learns from the patterns, behaviors, and objectives they observe.
It would be naïve to assume that sophisticated nation-states haven’t already built in-house AI systems with offensive capabilities, leaving no audit trail behind.
I strongly believe in secure-by-default design and the power of great security engineers. But the reality is clear: security teams can’t keep up with the velocity of modern engineering and AI.
Our Monday standups should be about the steps we need to take this week to rethink the tool stack and the defensive fabric that secures us.
AI-driven offense is already here, and it’s not going to wait for anyone.
