XBOW on HackerOne: What’s Next for AI Pen Testing

XBOW reached #1 on HackerOne’s leaderboard, proving AI can match human security researchers. Now the focus shifts to integrating XBOW into pre-production workflows.

As some of you might have heard, XBOW, our autonomous AI pen-tester, recently hit the #1 spot on the HackerOne leaderboard. For a team with deep roots in the bug bounty world, this feels like coming full circle. It's a moment of great pride, but it also marks a new chapter for us.

When we started building XBOW, we had a simple, foundational question: could an autonomous hacker really match a human one? Back then, the answer was a mystery. There was no playbook for an AI to find vulnerabilities like a human researcher. We knew our answer couldn't come from a lab. It had to be proven in the wild.

Our journey began with a relentless pursuit of objective proof. We started with CTFs, then quickly moved on to building our own novel benchmarks: 104 realistic scenarios designed to test both offensive tools and human experts.

These benchmarks were an excellent starting point, but they were still artificial exercises. Next, we decided to let the system loose on real software, hunting vulnerabilities in open source web applications from DockerHub. XBOW discovered numerous zero-day vulnerabilities, proving the technology could find new, original bugs in the wild. However, open source web apps are typically smaller and less complex than large production systems.

The logical next step was to find bugs in real, black-box production environments. We chose to compete in one of the largest hacker arenas: HackerOne. It offered something we couldn't replicate: thousands of real, hardened targets at a scale that forced us to evolve at an incredible pace.

HackerOne was our live-fire range, and every time we developed a new capability, we set it loose on the platform. The feedback loop was immediate and unfiltered, forcing us to relentlessly sharpen XBOW's accuracy and reduce false positives, while helping secure participating organizations.

The leaderboard was never the goal in itself, but it became the ultimate benchmark for our founding question. Last week, right before Black Hat, we reached #1 globally in Q2. It was the culmination of countless hours of work by a small, dedicated team, whose brilliance and persistence in building XBOW from the ground up made this possible. It proved that an AI can indeed perform at the highest level of security research.

With that question decisively answered, our primary mission on the platform has reached its conclusion.

We believe the greatest benefit of XBOW's capabilities comes from running it pre-production, before any change is exposed to the outside world. We are therefore now focused on working with customers to help them realize that vision.

XBOW isn’t here to replace pentesters or researchers; it augments teams. By removing routine burdens from penetration testers, it frees them to explore frontier vulnerability classes and the application-specific bugs that matter most.

While we are no longer focused on climbing the leaderboard, this doesn't mean we're leaving HackerOne. We will continue to use it as the place where we work with the community to gather feedback and to battle-test our most experimental capabilities before they touch a customer environment.

HackerOne was where we proved what was possible. Now, it’s time to take that capability to everyone who needs it.
