May 28, 2026
Product

Alvaro Muñoz

Getting to “Should I?”, Instead of “Can I?”: How XBOW Finds IDORs With High Accuracy in Ambiguous Contexts

By understanding expected access patterns before testing them, XBOW brings context-aware reasoning to complex authorization issues.

A decade ago, I watched Dr. Gary McGraw’s “Bug Parade” presentation at RSA, and it fundamentally changed how I thought about security tools. McGraw drew a sharp line that stuck with me: software defects come in two varieties—bugs and flaws, split roughly 50/50 across every codebase. Code-level bugs are implementation problems: buffer overflows, SQL injections, the kind of thing that code review and automated scanning can find just by looking into the code. Design flaws are different. They live in architecture and design. They emerge from threat models, security requirements, and the gap between what the application should enforce and what it actually does. You can’t grep for them.

The security industry got the bug parade issue exactly backward, directing all its resources at only half the problem. Both SAST and DAST vendors built increasingly sophisticated tools to chase implementation bugs—and they got very good at it. But the harder half of the problem, design-level flaws, was left to manual threat modeling, requirements analysis, and the tired ritual of putting “a bunch of smart guys in a room.” For flaws, there was no automation because the technology simply wasn’t there.

When I worked across multiple SAST vendors, I saw this blind spot clearly. But the DAST side was no better. Authorization flaws like IDORs (insecure direct object references) are textbook design problems. They don’t live in malformed input or missing validation. They live in the logic, in the gap between authentication (who are you?) and authorization (what are you allowed to access?). A traditional DAST scanner sees a valid session token, a structurally correct request, a 200 OK response, and marks it as passing. It has no concept of ownership. It doesn’t know which data belongs to which user. It can’t correlate behavior across multiple authenticated sessions. So it misses what an attacker wouldn’t: that User A just pulled User B’s invoice by changing one number in the URL.
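To make the authentication/authorization gap concrete, here is a minimal sketch of the invoice scenario. Everything in it is hypothetical (the `Invoice` type, the in-memory `INVOICES` store, both handler names); it is an illustration of the class of flaw, not code from any real application.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Invoice:
    invoice_id: int
    owner: str
    total: float


# Hypothetical in-memory store standing in for a real database.
INVOICES = {
    1001: Invoice(1001, "alice", 120.0),
    1002: Invoice(1002, "bob", 75.5),
}


def get_invoice_vulnerable(session_user: Optional[str], invoice_id: int) -> Tuple[int, Optional[Invoice]]:
    """Checks authentication only: any logged-in user can read any invoice."""
    if session_user is None:
        return 401, None
    invoice = INVOICES.get(invoice_id)
    if invoice is None:
        return 404, None
    return 200, invoice


def get_invoice_checked(session_user: Optional[str], invoice_id: int) -> Tuple[int, Optional[Invoice]]:
    """Also checks authorization: the invoice must belong to the caller."""
    if session_user is None:
        return 401, None
    invoice = INVOICES.get(invoice_id)
    # Return 404 for other users' invoices so the response does not reveal that they exist.
    if invoice is None or invoice.owner != session_user:
        return 404, None
    return 200, invoice
```

Both handlers accept the same valid session and the same well-formed request. A scanner that only looks at status codes sees a 200 from the first handler and has no reason to object, which is exactly the blind spot described above.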

That limitation has persisted for over a decade. Most of the industry’s approach to catching IDORs remained what it always was: replay requests with different credentials, hope something breaks. Effective? In narrow cases. Comprehensive? Not remotely.

One of the reasons I joined XBOW was to solve this exact problem. Not by building another vulnerability scanner that applies smarter pattern-matching to code, but by tackling what DAST couldn’t: complex reasoning about authorization logic, multi-step attack chains where early decisions cascade into later vulnerabilities, and the kind of contextual judgment that separates an actual security flaw from normal application behavior. XBOW was built from the ground up as an autonomous pentester, not a vulnerability scanner, which means it can finally bring the kind of logic-driven reasoning to design flaws that we’ve always applied to implementation bugs.

The Problem Isn’t Finding IDORs. It’s Understanding Them.

Every good tester starts with the question: What is supposed to happen here? Everything else builds from that.

Most tools stop at “Can I access this?” XBOW had to answer “Should I be able to?” Those are very different questions, and closing the gap between them is what makes accurate IDOR testing possible at scale.

There are plenty of bugs you can point to immediately. A user-controlled SQL query, missing validation, unsafe deserialization—you can tie those to a specific line of code and say: this is wrong. That’s why scanners are so good at finding them.

IDORs don’t work like that. The exact same API response might be totally fine in one application and a serious vulnerability in another. That’s what makes this hard.

I ran into this during a project not long ago. We found what appeared to be an IDOR in an API, where it was pulling back more data than expected. It felt wrong, but we couldn’t prove it. There was no UI, no clear documentation, and no spec telling us what “correct” looked like. We ended up digging through public docs and asking around just to get reasonably confident. It took about an hour for a single finding.

That’s the reality with business logic issues. You’re constantly trying to answer a question without having the full picture.

Building Interpretation, Not Pattern Matching

If you just tell AI “go find IDORs,” you’ll get plenty of results and a lot of noise: false positives, inconsistent reasoning, and decisions based on guesses rather than anything grounded. We saw that early on.

Our initial approach was straightforward. We had an agent to find potential IDORs and a validator to decide if they were real. Since you can’t deterministically validate IDORs, the validator had to reason about them. So we had it do something simple: list arguments for why something is a vulnerability, list arguments for why it isn’t, then decide. Basically, the model is debating with itself.
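As a rough illustration of that debate structure, the sketch below assembles a validator prompt with the three steps in order. The function name, the wording, and the `VULNERABLE` / `NOT_VULNERABLE` labels are all assumptions made for this example; this is not XBOW’s actual prompt.

```python
def build_validator_prompt(request_summary: str, response_summary: str) -> str:
    """Assemble a debate-style prompt: arguments for, arguments against, then a decision."""
    sections = [
        "You are validating a suspected IDOR.",
        f"Request: {request_summary}",
        f"Response: {response_summary}",
        "1. Arguments FOR: list reasons this is a vulnerability.",
        "2. Arguments AGAINST: list reasons this is expected behavior.",
        "3. Decision: weigh both lists, then answer VULNERABLE or NOT_VULNERABLE.",
    ]
    return "\n".join(sections)


# Hypothetical finding, used only to show the shape of the prompt.
prompt = build_validator_prompt(
    "GET /api/invoices/1002 as user alice",
    "200 OK, body contains an invoice owned by bob",
)
print(prompt)
```

Nothing in this prompt tells the model what the application is supposed to allow, so both argument lists have to be built from general intuition.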

And honestly, it worked better than expected. But it had a major flaw: it didn’t actually know how the application was supposed to behave. All of that reasoning was happening in a vacuum.

The Shift That Mattered

The turning point was when we stopped trying to make the system better at attacking and focused on making it better at understanding. Instead of saying “find vulnerabilities,” we said: “Explore the application and tell me what normal looks like.”

We introduced agents that log in as specific roles, browse the application normally, and document what they can and cannot do. No attacking, no exploiting—just observing and learning.

By the end of this step, you have a set of roles (e.g., admin, manager, user), each with its own partial view of what’s allowed. When you combine those, you get a clear picture of who can access what, where boundaries exist, and what actions are expected.
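One way to picture the combined view is as a table keyed by action, with one entry per role. The sketch below is a simplified assumption about how such observations could be merged; the tuple layout, the endpoints, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical observations recorded during exploration:
# (role, HTTP method, resource, whether the role was allowed).
OBSERVATIONS = [
    ("admin", "GET", "/users", True),
    ("admin", "DELETE", "/users/{id}", True),
    ("manager", "GET", "/users", True),
    ("manager", "DELETE", "/users/{id}", False),
    ("user", "GET", "/users", False),
    ("user", "GET", "/invoices/{own_id}", True),
]

AccessModel = Dict[Tuple[str, str], Dict[str, bool]]


def build_access_model(observations) -> AccessModel:
    """Merge per-role observations into {(method, resource): {role: allowed}}."""
    model = defaultdict(dict)
    for role, method, resource, allowed in observations:
        model[(method, resource)][role] = allowed
    return dict(model)


def roles_allowed(model: AccessModel, method: str, resource: str) -> List[str]:
    """Return the roles that were observed performing this action successfully."""
    observed = model.get((method, resource), {})
    return sorted(role for role, ok in observed.items() if ok)


model = build_access_model(OBSERVATIONS)
print(roles_allowed(model, "GET", "/users"))          # ['admin', 'manager']
print(roles_allowed(model, "DELETE", "/users/{id}"))  # ['admin']
```

Each role contributes only what it saw, and the boundaries appear once the rows are merged: here, deleting users is an admin-only action.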

Why That Changes Everything

Once you have that context, validation becomes a completely different problem. Now, when the solver (the agent that hunts for potential IDORs) finds something suspicious, the validator doesn’t have to guess. It can ask:

  • Does this align with what we observed for this role?
  • Does this break a boundary between roles?
  • Is this something no role was supposed to do?

When it builds arguments for or against a finding, those arguments are grounded in actual observations, not assumptions. That’s the difference. We didn’t just improve the reasoning, we gave it context.
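The three questions above can be sketched as a lookup against the observed behavior. This is a deliberately simplified assumption: the `Verdict` names, the `ACCESS_MODEL` contents, and the `classify` function are hypothetical, and a real validator would reason over far richer evidence than a boolean table.

```python
from enum import Enum
from typing import Dict, Tuple


class Verdict(Enum):
    EXPECTED = "matches what was observed for this role"
    ROLE_BOUNDARY_BROKEN = "this role was observed being denied, but another role is allowed"
    NO_ROLE_ALLOWED = "no observed role was allowed to do this"
    UNKNOWN = "never observed for this role; needs more exploration"


# Hypothetical observed behavior: {(method, resource): {role: allowed}}.
ACCESS_MODEL: Dict[Tuple[str, str], Dict[str, bool]] = {
    ("GET", "/users"): {"admin": True, "manager": True, "user": False},
    ("GET", "/audit-log"): {"admin": False, "manager": False, "user": False},
}


def classify(model, role: str, method: str, resource: str) -> Verdict:
    """Classify a successful access by `role` against the observed model."""
    observed = model.get((method, resource))
    if observed is None or role not in observed:
        return Verdict.UNKNOWN
    if observed[role]:
        return Verdict.EXPECTED
    if any(observed.values()):
        return Verdict.ROLE_BOUNDARY_BROKEN
    return Verdict.NO_ROLE_ALLOWED


print(classify(ACCESS_MODEL, "manager", "GET", "/users").name)
print(classify(ACCESS_MODEL, "user", "GET", "/users").name)
print(classify(ACCESS_MODEL, "admin", "GET", "/audit-log").name)
```

The `UNKNOWN` case matters as much as the others: when nothing was observed, the honest answer is to explore further rather than guess.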

Results improved immediately. There was less noise. When validation is grounded in observed behavior, false positives drop significantly. The system isn’t inventing what “should” happen; it’s comparing against what it has already seen.

Multi-Authentication Adds a New Dimension

IDORs are especially challenging because they involve crossing boundaries: one user accessing another user’s data, a lower-privileged role seeing something it shouldn’t, or lateral movement between accounts.

To understand that, you need multiple perspectives. So we built multi-account testing into XBOW: it logs in as different users, explores what each one can actually reach, and checks that access against the expectations recorded during the initial exploration.
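A minimal sketch of the cross-account idea follows, under stated assumptions: `OWNED` maps each test account to object IDs it is known to own, and `fake_fetch` is a stand-in for an authenticated request that returns an HTTP status. Both are hypothetical, as is the function name.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical test accounts and the object IDs each one owns.
OWNED: Dict[str, List[int]] = {"alice": [1001], "bob": [1002]}


def fake_fetch(user: str, object_id: int) -> int:
    """Simulated target: object 1002 is served to any logged-in user (the flaw)."""
    if object_id == 1002:
        return 200
    return 200 if object_id in OWNED[user] else 403


def find_cross_account_access(
    owned: Dict[str, List[int]],
    fetch: Callable[[str, int], int],
) -> List[Tuple[str, str, int]]:
    """Return (requester, owner, object_id) for every object a non-owner could read."""
    hits = []
    for requester in owned:
        for owner, object_ids in owned.items():
            if owner == requester:
                continue
            for object_id in object_ids:
                if fetch(requester, object_id) == 200:
                    hits.append((requester, owner, object_id))
    return hits


candidates = find_cross_account_access(OWNED, fake_fetch)
print(candidates)  # [('alice', 'bob', 1002)]
```

Each hit is only a candidate. Whether it is a vulnerability still depends on what the exploration step observed, since some objects are legitimately shared between accounts.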

Scaling Complex Reasoning

Even the most well-resourced security teams are struggling to keep up. Development is faster and more distributed than ever, and the attack surface keeps expanding. XBOW is built for that reality.

That earlier example that took an hour? Now imagine doing that across hundreds of endpoints, across multiple roles, repeatedly. That’s not something a human can realistically keep up with. 

This is where the approach helps: it extends a pentester’s reach without losing the nuance. The same level of reasoning is applied over and over, so teams can cover more ground while staying focused on the highest-impact and most complex work.

Closing Thoughts

What makes IDORs hard is not the exploit itself, but the judgment behind it. That judgment has traditionally lived in the heads of skilled pentesters—people who can look at an application, understand its intent, and decide whether an access path is expected or unacceptable. XBOW’s approach is about bringing that same kind of judgment into an autonomous system. 

By learning what normal looks like before trying to break it, XBOW moves beyond “Can I?” and gets closer to the question that actually matters: “Should I?” That shift is what makes high-accuracy IDOR testing possible in ambiguous, real-world applications. 

So what’s next? We’re onto the next business logic challenge.

https://xbow-website-b1b.pages.dev/traces/