how ai agents found 7 CVEs in popular npm packages
three weeks of pointing Claude Opus at npm packages produced 73 findings, 7 published CVEs, and 40M+ weekly downloads affected. here's how the workflow actually works.
in early march 2026, a side project that was supposed to last a weekend turned into something bigger: using an AI agent — specifically Claude Opus — to systematically audit popular npm packages for security vulnerabilities. not just running a linter. actually reading source code, tracing data flows, identifying trust boundary violations, and producing working proof-of-concept exploits.
three weeks later: 73 security findings across dozens of packages, 7 published CVEs, and a workflow that had found vulnerabilities in packages with a combined download count exceeding 40 million per week.
this post is about how that workflow operates, what it found, and why it led to pwnkit.
the workflow
the process is not complicated. it is, however, extremely methodical — which is exactly where AI agents excel. the pipeline for each target:
- pick a package based on download count, attack surface (does it parse untrusted input? handle crypto? process URLs?), and history of prior vulnerabilities. high downloads plus complex parsing logic is the sweet spot.
- the agent reads the source code front to back. not skimming: reading. it maps entry points, traces how user input flows through the system, identifies trust boundaries, and flags patterns that historically lead to vulnerabilities: unvalidated input, missing bounds checks, string concatenation in security-sensitive contexts.
- every finding gets a working proof of concept. if the agent can't write a PoC that demonstrates the vulnerability, the finding is discarded. no maybes. no theoretical risks. working exploits or nothing.
- responsible disclosure through GitHub Security Advisories or direct maintainer contact. full writeup, PoC code, suggested fix, 90-day timeline. then wait.
that’s the entire system. no proprietary scanning engine. no signature database. just an AI agent that reads code the way a security researcher reads code — except it doesn’t get tired, doesn’t skip the boring parts, and can process an entire codebase in minutes.
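the target-selection step can be sketched as a small ranking function. this is a minimal illustration, not the script used for the audits: the `Candidate` fields, the weights, and the function names are all assumptions made up for this example.

```typescript
// Hypothetical shape for a candidate package; none of these field names
// come from a real registry API.
interface Candidate {
  name: string;
  weeklyDownloads: number;
  parsesUntrustedInput: boolean;
  handlesCrypto: boolean;
  processesUrls: boolean;
  priorAdvisories: number;
}

// Illustrative scoring: log-scaled downloads plus points per attack-surface
// signal. The weights are assumptions, not values from the original audits.
function scoreCandidate(c: Candidate): number {
  const reach = Math.log10(Math.max(c.weeklyDownloads, 1));
  const surface =
    (c.parsesUntrustedInput ? 2 : 0) +
    (c.handlesCrypto ? 2 : 0) +
    (c.processesUrls ? 1 : 0);
  const history = Math.min(c.priorAdvisories, 3) * 0.5;
  return reach + surface + history;
}

// Highest score first. Copies the input so the caller's array is untouched.
function rankCandidates(candidates: Candidate[]): Candidate[] {
  return [...candidates].sort((a, b) => scoreCandidate(b) - scoreCandidate(a));
}
```

under these assumed weights, a heavily downloaded package that parses untrusted input outranks a small utility with no parsing logic, which is the "sweet spot" described above.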
what it found
highlights below. each of these has a full writeup on doruk.ch with technical details, PoCs, and disclosure timelines.
node-forge — certificate forgery
CVE-2026-33896 · 32 million weekly downloads
the core certificate chain verification logic had a conditional check that only validated basicConstraints when the extension was present. when absent, which is normal for end-entity certificates, any certificate could act as a CA. one conditional. a billion yearly downloads. certificate forgery for any domain.
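the shape of that bug is easier to see in code. the sketch below is not node-forge's source: `Cert`, `mayIssueFlawed`, `mayIssueStrict`, and `chainIsValid` are hypothetical names, signature verification is left out, and real chain validation (RFC 5280) has more rules than this.

```typescript
// Hypothetical, heavily simplified certificate model. This is NOT
// node-forge's data structure; it only mirrors the pattern described above.
interface BasicConstraints {
  cA: boolean;
}

interface Cert {
  subject: string;
  issuer: string;
  basicConstraints?: BasicConstraints; // commonly absent on end-entity certs
}

// Flawed pattern: the CA check only runs when the extension is present,
// so a certificate without basicConstraints is accepted as an issuer.
function mayIssueFlawed(issuer: Cert): boolean {
  if (issuer.basicConstraints !== undefined) {
    return issuer.basicConstraints.cA;
  }
  return true; // no extension, no check
}

// Stricter pattern: an issuer must carry basicConstraints with cA set.
function mayIssueStrict(issuer: Cert): boolean {
  return issuer.basicConstraints !== undefined && issuer.basicConstraints.cA;
}

// Walk a chain ordered leaf-first. Every certificate after the leaf acts as
// an issuer and must pass the supplied check. Signatures are not verified.
function chainIsValid(chain: Cert[], mayIssue: (c: Cert) => boolean): boolean {
  for (let i = 1; i < chain.length; i++) {
    if (chain[i - 1].issuer !== chain[i].subject) return false;
    if (!mayIssue(chain[i])) return false;
  }
  return chain.length > 0;
}
```

with the flawed check, a chain of victim leaf, then an attacker's ordinary end-entity certificate, then a real CA passes, because the middle certificate has no basicConstraints to inspect. the strict check rejects the same chain.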
mysql2 — connection override + 3 more
4 findings · 5 million weekly downloads
URL query parameters could override the host, disable TLS, and enable multi-statement queries. plus prototype pollution, geometry parsing DoS, and an out-of-bounds read in packet framing. four vulnerabilities that chain together: redirect the connection, then crash the client. the maintainer shipped all four fixes in 24 hours.
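a rough sketch of the connection-override pattern, using the standard `URL` class. this is not mysql2's parsing code: the option names, `DEFAULTS`, `ALLOWED_PARAMS`, and both functions are assumptions chosen to show the difference between copying every query parameter and allowlisting a few.

```typescript
// Illustrative defaults; not mysql2's real option set.
const DEFAULTS = { port: 3306, ssl: true, multipleStatements: false };

// Risky pattern: every query parameter is copied over the parsed options,
// so "?host=evil.example&ssl=false" silently rewrites the connection target.
function parseNaive(uri: string): Record<string, unknown> {
  const u = new URL(uri);
  const opts: Record<string, unknown> = { ...DEFAULTS, host: u.hostname };
  u.searchParams.forEach((value, key) => {
    opts[key] = value === "true" ? true : value === "false" ? false : value;
  });
  return opts;
}

// Safer pattern: the host comes only from the authority part of the URL,
// and only allowlisted, non-security-sensitive parameters are honored.
const ALLOWED_PARAMS = new Set(["charset", "timezone"]);

function parseAllowlisted(uri: string): Record<string, unknown> {
  const u = new URL(uri);
  const opts: Record<string, unknown> = { ...DEFAULTS, host: u.hostname };
  u.searchParams.forEach((value, key) => {
    if (ALLOWED_PARAMS.has(key)) opts[key] = value;
  });
  return opts;
}
```

the point of the allowlist is that anything able to influence the query string (a config template, a user-supplied database name) can no longer move the connection or turn off TLS.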
uptime kuma / liquidjs — SSTI bypass
CVE-2026-33130
a previously "patched" SSTI vulnerability was still exploitable. the entire security boundary (three separate mitigations) was bypassed by removing two quote characters from the payload. the root cause was in LiquidJS's require.resolve() fallback, which had no path containment checks. four independent researchers found the same bug through different vectors.
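for reference, a path containment check of the kind that fallback lacked can be sketched with Node's `path` module. this is a generic illustration, not the fix that shipped in LiquidJS: `isContained` is a hypothetical helper, and it does not resolve symlinks.

```typescript
import * as path from "node:path";

// Returns true only when `requested`, resolved against `root`, stays inside
// `root` (the root itself also counts as contained). Symlinks are not
// resolved here; a hardened version would compare real paths.
function isContained(root: string, requested: string): boolean {
  const rootAbs = path.resolve(root);
  const target = path.resolve(rootAbs, requested);
  const rel = path.relative(rootAbs, target);
  if (path.isAbsolute(rel)) return false; // e.g. a different drive on Windows
  return rel !== ".." && !rel.startsWith(".." + path.sep);
}
```

a template loader would call this before handing any resolved name to the filesystem or to a module resolver, and refuse the lookup when it returns false.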
jsPDF — PDF injection + XSS
CVE-2026-31898 / CVE-2026-31938
arbitrary PDF object injection via unsanitized annotation color parameters. plus HTML injection through document.write() in output methods, rated CVSS 9.6 Critical. another researcher reported first; an independent rediscovery contributed defense-in-depth hardening to the fixes.
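the injection pattern can be illustrated with a tiny serializer for a PDF annotation color entry (`/C [r g b]`). this is not jsPDF's code: both functions are hypothetical, and the checked version is just one plausible way to keep caller input out of PDF syntax.

```typescript
// Hypothetical serializer; it only illustrates the injection pattern.
function colorEntryUnsafe(color: string): string {
  // Caller-controlled text lands inside PDF syntax unchanged, so a value
  // containing "]" can close the array and append new dictionary entries.
  return "/C [" + color + "]";
}

// Checked variant: accept numbers only, validate the range, and format
// them ourselves so no caller-supplied characters reach the output.
function colorEntryChecked(r: number, g: number, b: number): string {
  for (const v of [r, g, b]) {
    if (!Number.isFinite(v) || v < 0 || v > 1) {
      throw new RangeError("color components must be numbers in [0, 1]");
    }
  }
  return `/C [${r.toFixed(3)} ${g.toFixed(3)} ${b.toFixed(3)}]`;
}
```

the same idea applies to the HTML side: build output from validated values rather than splicing raw strings into markup.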
why ai agents are good at this
the common thread across all of these findings is that they’re not sophisticated. a missing conditional check. an unfiltered URL parameter. a fallback code path with no validation. a string concatenation where there should be DOM construction. these aren’t zero-days requiring months of reverse engineering. they’re the kind of bugs that exist because nobody sat down and read the code carefully enough.
that’s precisely what AI agents are good at. the tedious, methodical work of reading every function, tracing every input, checking every assumption. a human researcher gets fatigued after a few hours of source review. an AI agent processes the entire codebase with the same level of attention on the last file as the first.
the key insight: the agent doesn’t need to be creative. it needs to be thorough. creativity helps for novel attack classes, but the vast majority of real-world vulnerabilities are variants of known patterns — missing validation, improper access control, trust boundary violations. an agent that systematically checks for those patterns across an entire codebase will find things that humans miss through fatigue or oversight.
73 findings, 7 CVEs — the numbers
after three weeks of running this workflow across popular npm packages, the totals:
- 73 total findings across dozens of packages
- 7 published CVEs in node-forge, mysql2, Uptime Kuma, LiquidJS, jsPDF, and picomatch
- 40M+ weekly downloads affected across the vulnerable packages
- every finding verified with a working proof of concept
not every finding became a CVE. some were lower severity, some were in packages with smaller install bases, some were reported but not yet disclosed. but every single one was verified with a working exploit before it was reported. no theoretical risks. no “this might be a problem.” working code or it didn’t count.
from manual workflow to pwnkit
the workflow worked. but it was manual. each audit required manual setup, agent configuration, output management, finding tracking, report writing. repeatable, but operator-bound.
the obvious next step: automate the workflow so anyone can run it.
that’s what pwnkit is. the same agentic pipeline — discover, attack, verify, report — packaged as an open-source CLI tool. point it at an npm package, an LLM API, an MCP server, or a source code repository. it runs autonomous AI agents in sequence, each specialized for a phase of the security assessment. the verification agent independently re-exploits every finding. if it can’t reproduce, the finding is killed.
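the verification gate is the part worth spelling out. the sketch below is an assumption about how such a gate could look, not pwnkit's actual internals: `Finding`, `reproduce`, and `verifyFindings` are hypothetical names, and the PoC is modeled as a plain synchronous callback to keep the example short.

```typescript
// Hypothetical types; pwnkit's real data model is not shown in this post.
interface Finding {
  id: string;
  title: string;
  // Re-runs the proof of concept; true means the exploit demonstrably worked.
  reproduce: () => boolean;
}

// Keep only findings whose PoC reproduces. A PoC that throws is treated the
// same as one that fails: the finding is dropped.
function verifyFindings(findings: Finding[]): Finding[] {
  const confirmed: Finding[] = [];
  for (const finding of findings) {
    let ok = false;
    try {
      ok = finding.reproduce();
    } catch {
      ok = false;
    }
    if (ok) confirmed.push(finding);
  }
  return confirmed;
}
```

only the confirmed list moves on to the report phase, which is how the "working code or it didn't count" rule from the manual workflow carries over.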
the 7 CVEs were the proof that this approach works. pwnkit is the tool that makes it accessible.
npx pwnkit-cli audit --package node-forge
if you ship software that depends on open-source packages — and you almost certainly do — the question isn’t whether these vulnerabilities exist in your dependency tree. they do. the question is whether you find them before someone else does.