Machine Learning for Vulnerability Discovery
Machine learning accelerates vulnerability discovery but lacks runtime context, making AI-driven validation essential to reduce false positives and confirm real exploitability.
Machine learning is often used in vulnerability discovery for its fast pattern matching and its ability to generate hypotheses about exploits. It can surpass both manual analysis and traditional SAST (static application security testing) scanners in speed and in accuracy at spotting known patterns. However, machine learning is only as good as the data it is trained on and the source code it is given, so its limited view of the wider context tends to produce false positives. It’s strong at pointing out potential vulnerabilities, but weak at verifying them.
Vulnerability discovery that keeps pace with today’s software developers and cyberattackers needs to leverage AI and ML, and to pair the identification of vulnerabilities with validation based on runtime context.
Key takeaways
- ML vulnerability detection is a powerful tool due to its speed and pattern-matching abilities.
- Because it learns only from data collected in past security events and lacks runtime context, ML-based security scanning tends to produce false positives.
- Pairing ML-based vulnerability detection with AI-based validation of exploitability in runtime context both finds and verifies vulnerabilities at a rapid pace.
- XBOW is leading the evolution to AI-driven offensive security that both finds and validates vulnerabilities.
What is vulnerability discovery?
The very first discovery activities in application security answer “what exists?” The subsequent vulnerability discovery phase answers “what’s wrong or risky about it?” Using both automated and manual techniques, these phases build upon one another. Successful vulnerability discovery is highly reliant on a strong initial discovery phase.
Ultimately, moving from “discovery” to “vulnerability discovery” is the shift from understanding a system to understanding where that system might offer attackers an entry point. Machine learning is often used for its pattern-matching capabilities at the start of the vulnerability discovery phase.
The role of machine learning in vulnerability discovery
Machine learning excels at identifying possible vulnerabilities based on learned patterns, such as highlighting vulnerable code based on syntax patterns. ML security research also returns results far faster than a human being could, allowing teams to quickly scale vulnerability identification across large environments. AI is also rapidly improving (and will continue to improve) its software vulnerability detection capabilities, as evidenced by Anthropic’s recent announcement of Claude Code Security.
Albert Zeigler, XBOW Head of AI, says in a recent blog post,
“Used correctly, LLMs are extremely valuable in security workflows. They’re good at:
- Generating creative exploit hypotheses
- Recognizing subtle vulnerability patterns
- Connecting behavior to historical flaws
- Exploring unusual inputs and edge cases.”
Examples of machine learning vulnerability discovery
- Supervised learning for vulnerability discovery: This type of model, which is trained on a labeled dataset, excels at identifying known vulnerabilities. After training on a database of known vulnerabilities, a supervised ML model can quickly identify similar patterns in a new system. This method is especially good at finding vulnerabilities like SQL injection, hardcoded secrets, and known insecure library use.
- Semi-supervised learning for vulnerability discovery: By combining insights from labeled and unlabeled data, semi-supervised learning is applicable in many areas of vulnerability discovery. For example, it can analyze unlabeled data to map an attack surface and then use labeled data to make hypotheses about potential vulnerabilities.
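As a rough illustration of the supervised case above, the following Python sketch learns one weight per token from a handful of labeled snippets and then scores unseen code. It is a toy, not a production detector: the six labeled snippets, the tokenizer, and the names LABELED_SNIPPETS, train, and score are all assumptions made for this example, and the scoring is a simplified naive-Bayes-style log-odds sum over the tokens present in a snippet.

```python
import math
import re

# Hypothetical hand-labeled snippets: 1 = vulnerable, 0 = safe.
LABELED_SNIPPETS = [
    ('cursor.execute("SELECT * FROM users WHERE id = " + user_id)', 1),
    ('cursor.execute("DELETE FROM logs WHERE day = " + day)', 1),
    ('db.query("SELECT name FROM items WHERE sku = " + sku)', 1),
    ('cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))', 0),
    ('cursor.execute("DELETE FROM logs WHERE day = ?", (day,))', 0),
    ('db.query("SELECT name FROM items WHERE sku = ?", (sku,))', 0),
]


def tokenize(code):
    """Return the distinct identifiers in a snippet, plus the '+' and '?' symbols."""
    return set(re.findall(r"[a-z_]+|[+?]", code.lower()))


def train(samples):
    """Learn one log-odds weight per token from labeled snippets."""
    doc_counts = {0: 0, 1: 0}
    token_counts = {0: {}, 1: {}}
    for code, label in samples:
        doc_counts[label] += 1
        for token in tokenize(code):
            token_counts[label][token] = token_counts[label].get(token, 0) + 1

    weights = {}
    for token in set(token_counts[0]) | set(token_counts[1]):
        # Laplace-smoothed probability that the token appears in each class.
        p_vuln = (token_counts[1].get(token, 0) + 1) / (doc_counts[1] + 2)
        p_safe = (token_counts[0].get(token, 0) + 1) / (doc_counts[0] + 2)
        weights[token] = math.log(p_vuln / p_safe)
    return weights


def score(weights, code):
    """Sum the learned weights of known tokens; a positive score means 'flag it'."""
    return sum(weights.get(token, 0.0) for token in tokenize(code))


weights = train(LABELED_SNIPPETS)
candidates = [
    'cursor.execute("UPDATE accounts SET owner = " + owner)',
    'cursor.execute("UPDATE accounts SET owner = ?", (owner,))',
    'total = price + tax',
]
for code in candidates:
    verdict = "FLAG" if score(weights, code) > 0 else "ok"
    print(f"{verdict:4} {code}")
```

With this training data the model flags the concatenated query and passes the parameterized one, but it also flags the harmless arithmetic in the third candidate, because all it has learned is that "+" tends to appear in vulnerable snippets. That is the false-positive behavior described elsewhere in this article, in miniature.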
Shortcomings of machine learning in vulnerability discovery
Machine learning will highlight plausible vulnerabilities, but it can’t validate whether they are truly exploitable at runtime. It doesn’t have the context or details on how a system works in practice to verify whether a suspected flaw could lead to a breach in the real world. It can also be hard to understand why a model flagged something as a vulnerability, which makes its conclusions hard to justify.
Machine learning models are designed to identify the most plausible scenario based on prior examples, not to prove a theory against real-world, real-time data. For instance, they cannot verify that a payload actually changed application behavior.
As XBOW AI Lead Albert Zeigler notes, machine learning models “reason abstractly about what should happen, what usually happens, what has happened elsewhere, and interpret what might have happened here. But they don’t confirm what did happen.”
Vulnerability discovery with machine learning plus validation from runtime
Machine learning for vulnerability discovery is one powerful AppSec tool, not the complete solution.
Simply passing the results of machine learning vulnerability discovery to developers would leave them with a huge backlog of issues to investigate, many of which are likely false positives. Pairing machine learning vulnerability discovery with runtime validation, on the other hand, filters out findings that cannot be exploited, so the results that reach developers are fewer and more actionable.
In XBOW’s AI-driven pentesting platform, for example, any suspected vulnerabilities trigger a second AI-driven step that attempts to confirm or deny the suspicions. This second step could include checking for access to data that should be unreachable, or observing real browser behavior.
Only issues that survive this testing are elevated. This division of labor lets people and technologies each do what they do best, and it identifies risk fast enough to keep up with AI-led software development and cyberattacks.
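The find-then-validate flow can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not XBOW’s implementation: the in-memory SQLite database, the two handlers notes_concat and notes_validated, the injection payload, and the confirm_exploitable check are all hypothetical. Both handlers build SQL by string concatenation, so a pattern-based scanner would plausibly flag both, but only one of them actually leaks data when the payload is run against it.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with one note per user."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE notes (owner TEXT, body TEXT)")
    db.executemany(
        "INSERT INTO notes VALUES (?, ?)",
        [("alice", "alice's note"), ("bob", "bob's secret")],
    )
    return db


def notes_concat(db, owner):
    """Hypothetical handler: splices user input straight into the SQL text."""
    sql = "SELECT body FROM notes WHERE owner = '" + owner + "'"
    return [row[0] for row in db.execute(sql)]


def notes_validated(db, owner):
    """Hypothetical handler: same concatenation, but behind an allow-list check."""
    if not owner.isalnum():
        return []
    sql = "SELECT body FROM notes WHERE owner = '" + owner + "'"
    return [row[0] for row in db.execute(sql)]


def confirm_exploitable(handler):
    """Run a benign request and an injection payload, then compare what came back."""
    db = make_db()
    baseline = handler(db, "alice")
    attacked = handler(db, "alice' OR '1'='1")
    # Evidence of exploitability: the payload exposed data the baseline could not reach.
    return "bob's secret" in attacked and "bob's secret" not in baseline


# Findings a pattern-based discovery step might hand over (both concatenate SQL).
suspected = {"notes_concat": notes_concat, "notes_validated": notes_validated}

# Only findings that survive the runtime check are elevated.
elevated = [name for name, handler in suspected.items() if confirm_exploitable(handler)]
print(elevated)  # ['notes_concat']
```

The check here is "access to data that should be unreachable", one of the two validation signals mentioned above. A real system would need many such checks and a realistic target environment; the point of the sketch is only that the decision to elevate rests on observed behavior rather than on how the code looks.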
XBOW is leading the AI-driven pentesting evolution
XBOW has created an AI-driven pentesting solution that matches the results of a human researcher, as proven on HackerOne.
Sign up on XBOW today to get human-level pentesting results at machine speed.