Why LLMs Hallucinate Vulnerabilities
LLMs are good at spotting patterns and proposing plausible vulnerabilities, but their confidence is not proof. In this post, I explain why raw LLM output can’t be trusted as a finding, and why validation must exist outside the model to separate real vulnerabilities from convincing noise.
Large Language Models (LLMs) are increasingly used in security tools to analyze applications, suggest exploits, and identify potential vulnerabilities. In many cases, they surface useful leads faster than traditional approaches.
But teams trying to operationalize LLM-driven security quickly hit a familiar problem: The model sounds confident, but the vulnerability isn’t real.
This isn’t a flaw in a specific vendor’s model, nor something solved with better prompts or more context. It’s a fundamental mismatch between how LLMs reason and how vulnerabilities actually exist in the real world.
To understand why, it helps to separate thinking about vulnerabilities from proving they exist.
What “hallucination” means in security
In AI, hallucination refers to generating information that isn’t grounded in reality. In security, it shows up in subtler (and more expensive) ways. For example:
- Inferring SQL injection from a response pattern that only resembles prior exploits
- Suggesting an endpoint is exploitable because it matches a known vulnerability class
- Asserting impact without ever demonstrating it
These outputs aren’t random. Instead, they’re the result of pattern recognition across vulnerability data, exploit writeups, and source code. The model is doing exactly what it’s designed to do: generate the most plausible explanation based on prior examples.
The problem is simple: plausibility is not proof. In security, a vulnerability only exists if it can be exercised against a real system. Everything else is a hypothesis.
Raw LLM output can’t be trusted
LLMs don’t observe reality. They don’t measure timing differences, verify side effects, or confirm that a payload actually changed application behavior. They reason abstractly about what should happen, what usually happens, and what has happened elsewhere, and they interpret what might have happened here. But they never confirm what did happen.
This is why treating raw LLM output as a finding creates noise. A model can correctly identify an interesting attack surface and still be wrong about exploitability.
This limitation isn’t fixable by making the model smarter. Even a perfect reasoning engine still lacks direct access to ground truth, which is why systems that rely on LLMs alone tend to over-report and push verification back onto humans.
A concrete example: when a confident hypothesis isn’t enough
In testing an open-source ActiveSync server (Z-Push), an AI agent flagged a suspected SQL injection after noticing unusual authentication behavior. The input path and response patterns closely resembled known injection techniques.
Rather than accepting that inference at face value, the system (not the LLM!) attempted to prove it using controlled timing-based requests and comparisons against known-safe inputs. The issue was only reported once the timing behavior proved consistent and reproducible. Without validation, the signal would have been indistinguishable from a false positive.
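To make that concrete, here is a minimal sketch of the kind of timing-based check a validation phase might run. The endpoint, parameter names, and payload are hypothetical placeholders rather than the actual Z-Push test, and a production harness would control for network jitter far more carefully.

```python
import statistics
import requests

TARGET = "https://example.test/Microsoft-Server-ActiveSync"  # hypothetical endpoint
SAFE_VALUE = "user@example.test"                             # known-safe baseline input
DELAY_PAYLOAD = "user@example.test' AND SLEEP(5)-- -"        # time-based probe
ATTEMPTS = 5
THRESHOLD = 4.0  # seconds; the injected delay minus an allowance for jitter

def median_response_time(value: str) -> float:
    """Send the same request several times and return the median response time."""
    timings = []
    for _ in range(ATTEMPTS):
        resp = requests.post(TARGET, data={"username": value, "password": "x"}, timeout=30)
        timings.append(resp.elapsed.total_seconds())
    return statistics.median(timings)

baseline = median_response_time(SAFE_VALUE)
delayed = median_response_time(DELAY_PAYLOAD)

# Only treat the hypothesis as confirmed if the delay is large and consistent
# relative to the known-safe baseline.
confirmed = (delayed - baseline) > THRESHOLD
print(f"baseline={baseline:.2f}s  delayed={delayed:.2f}s  confirmed={confirmed}")
```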
Hypotheses are useful, but conclusions are dangerous
Used correctly, LLMs are extremely valuable in security workflows. They’re good at:
- Generating creative exploit hypotheses
- Recognizing subtle vulnerability patterns
- Connecting behavior to historical flaws
- Exploring unusual inputs and edge cases
This mirrors how human pentesters work. They start with suspicion, not proof. The difference is what happens next.
A hypothesis is a starting point. A conclusion requires evidence. When systems collapse those two steps, they blur the line between exploration and verification—and that’s where trust breaks down.
Validation must exist outside the model
Because LLMs can’t observe reality directly, validation can’t live inside the model.
When an AI-driven system suspects a vulnerability, that suspicion should trigger a second phase that attempts to prove or disprove the idea using deterministic checks, such as the following (one is sketched after the list):
- Measuring timing differences to confirm blind injection
- Verifying whether a server makes an outbound request
- Checking for access to data that should be unreachable
- Observing real browser behavior for client-side impact
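As an illustration of the second check, here is a minimal out-of-band sketch: hand the target a URL carrying a unique token, and treat the suspicion as real only if the target actually calls back. The listener address, target URL, and parameter name are assumptions for illustration; a real setup would use a callback host the target can reach.

```python
import threading
import time
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

CALLBACK_HOST, CALLBACK_PORT = "127.0.0.1", 8808  # assumed reachable by the target
TOKEN = uuid.uuid4().hex                          # unique marker for this single test
hits: list[str] = []

class CallbackHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if TOKEN in self.path:        # record only requests that carry our token
            hits.append(self.path)
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):     # silence default per-request logging
        pass

server = HTTPServer((CALLBACK_HOST, CALLBACK_PORT), CallbackHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Hand the target a URL that points back at our listener via the suspected parameter.
callback_url = f"http://{CALLBACK_HOST}:{CALLBACK_PORT}/{TOKEN}"
requests.post("https://example.test/fetch", data={"url": callback_url}, timeout=30)

time.sleep(5)                         # give the server a moment to make the request
server.shutdown()

# The finding is only real if the outbound request was actually observed.
print(f"outbound request observed: {bool(hits)}")
```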
This verification isn’t based on language or inference. It’s based on observable effects. In practice, effective systems treat AI-generated signals as tentative by default. The model’s role is to explore aggressively and surface hypotheses, not to declare findings. Only issues that survive concrete testing are elevated.
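One way to picture that division of labor is a thin triage layer that refuses to promote anything a deterministic validator can’t confirm. The sketch below uses assumed names (Hypothesis, triage, and the validator registry are illustrative, not any real framework’s API):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Hypothesis:
    kind: str        # e.g. "sqli-blind", "ssrf"
    target: str      # endpoint or parameter under suspicion
    rationale: str   # the model's reasoning, kept for context

# A validator performs one concrete, deterministic check (timing delta,
# out-of-band callback, data access) and returns True only on an observed effect.
Validator = Callable[[Hypothesis], bool]

def triage(hypotheses: list[Hypothesis], validators: dict[str, Validator]) -> list[Hypothesis]:
    """Elevate only hypotheses that survive concrete testing; drop the rest."""
    findings = []
    for h in hypotheses:
        check = validators.get(h.kind)
        if check is not None and check(h):
            findings.append(h)       # proven: becomes a reportable finding
        # unproven suspicions never reach the report
    return findings

# Usage with a placeholder validator; a real one would run checks like the
# timing and callback sketches above.
if __name__ == "__main__":
    demo = [Hypothesis("sqli-blind", "/login", "response pattern resembled prior exploits")]
    print(triage(demo, {"sqli-blind": lambda h: False}))  # nothing proven, nothing reported
```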
This distinction matters because creative reasoning and factual verification are different problems, and they require different tools.
The takeaway for security teams
LLMs are powerful amplifiers for exploration. They help systems think more like attackers and cover more ground than manual testing ever could. But they can’t be the final authority on whether a vulnerability exists.
Security teams don’t need AI that sounds certain. They need systems that are skeptical by default, assume the model might be wrong, and demand proof before surfacing results.
The future of AI in security isn’t about eliminating hallucinations at the model level. It’s about designing systems that expect them and are built to catch them before they reach a human inbox.
