why i built blind verification
every security scanner drowns you in false positives. it took three approaches before one of them actually worked.
a security scan finds 200 “possible vulnerabilities.” four hours of triage later, 190 are noise and the other 10 are maybes. the only way to confirm any of them is to write a manual PoC.
this is the state of security tooling in 2026. fixing it took three attempts.
attempt 1: template-based scanning
the first version of pwnkit was simple. YAML templates. regex patterns. send a payload, check if the response matches a known-bad pattern. this is how most scanners work — nuclei, nikto, the whole ecosystem.
```yaml
# template-v1.yaml
id: ssrf-check
payloads:
  - "http://169.254.169.254/latest/meta-data/"
  - "http://localhost:6379"
matchers:
  - type: regex
    pattern: "(ami-id|instance-id|ERR wrong)"
```
it worked for the obvious stuff. the false positive rate was brutal. a response containing the word “instance-id” in an error message? flagged. an API that returns user input in the response body? flagged. regex can’t understand context. it sees patterns, not meaning.
triage time exceeded the time it would have taken to pentest the target manually.
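to make the failure mode concrete, here's a toy version of that matcher in TypeScript — the pattern is copied from the v1 template above, the responses are made up for illustration:

```typescript
// the v1 matcher, reduced to a single regex test
const matcher = /(ami-id|instance-id|ERR wrong)/;

// a genuine SSRF hit: cloud metadata leaked into the response
const realHit = "ami-id: ami-0abc12345";

// a harmless validation error that happens to echo a field name
const harmlessError = 'unknown field "instance-id" in request body';

const hitFlagged = matcher.test(realHit);         // true
const errorFlagged = matcher.test(harmlessError); // true — same verdict
```

same verdict for both. the matcher has no way to ask "did the metadata service actually answer, or did the app just echo my input back?"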
attempt 2: agentic scanning
if regex can’t understand context, the next step is a scanner that can think. the template engine got replaced with an AI agent that read the code, crafted payloads based on what it saw, and reasoned about responses.
this was better. way better. the agent could look at a function, understand the data flow, and craft a targeted attack. it could tell the difference between user input being reflected in an error message versus user input being passed to exec().
it had a new problem: hallucination.
the agent would find something that looked suspicious, then reason itself into a vulnerability that didn’t exist. “this function could be vulnerable if the input isn’t sanitized upstream…” then it would check upstream, find no sanitization, and report a critical finding — without noticing the WAF sitting in front of the whole thing, or the type coercion that made the payload harmless.
“could be vulnerable” is not the same as “is vulnerable.” the agent couldn’t always tell the difference.
attempt 3: single agent with proof-of-concept
the next iteration forced the agent to prove it. don’t just report a finding — write a concrete PoC that demonstrates the exploit. no working PoC, no finding.
this killed a lot of the hallucinations. no more “could be vulnerable” — either the PoC works or it doesn’t.
there was a subtler problem: confirmation bias.
the same agent that decided something was vulnerable was also writing the PoC. and if it already believed the vulnerability was real, it would write a PoC that looked convincing but didn’t actually prove anything. it tested the happy path. it assumed its payload got through. it wrote assertions that passed because they were testing the wrong thing.
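here's a sketch of what a biased PoC looks like — everything below is hypothetical, but the shape of the mistake is real: the assertion "passes" on mere reflection, without proving execution:

```typescript
// a hypothetical biased PoC for command injection. `send` stands in
// for whatever transport hits the target (name is illustrative).
async function biasedPoc(
  send: (input: string) => Promise<string>
): Promise<boolean> {
  const payload = "; id";
  const response = await send(`pkg${payload}`);
  // happy-path assertion: this passes even if the server merely
  // echoed the input back in an error message — it proves
  // reflection, not execution
  return response.includes("id");
}
```

an honest PoC would assert on evidence only the injected command could produce — command output, a side effect, a timing delta — not on the payload showing up in the response.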
this is the same problem that happens with human pentesters. the person who found the bug is the worst person to verify it. they already believe it’s real.
the insight: double-blind peer review
in academia, when a paper goes out for peer review, the reviewer doesn’t know who wrote it or what the author was thinking. they get the paper and nothing else. they have to independently evaluate whether the conclusions follow from the evidence.
the same idea applies to vulnerability verification.
the research agent does its thing — discovers attack surfaces, crafts payloads, launches multi-turn attacks, writes PoC code. one long agent session. then only the PoC code and the file path are extracted, all reasoning and context stripped, and handed to a completely separate verify agent.
the verify agent has no idea why the researcher thought this was vulnerable. it doesn’t know the attack narrative. it gets a PoC script and a file to look at. its job: independently trace the data flow, run the PoC, and confirm whether the exploit actually works.
if it can’t confirm — the finding is killed. no negotiation.
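as a sketch, the hand-off can be as small as a function that throws everything else away — the `Finding` field names here are illustrative, not pwnkit's actual types:

```typescript
// a research-agent finding, including everything that must NOT
// reach the reviewer (field names are assumptions)
interface Finding {
  file: string;
  vulnerability: string;
  poc: string;
  reasoning: string;
}

// strip a finding down to the blind packet: PoC code + file path,
// nothing else. the deletion is the whole point.
function toBlindPacket(f: Finding): { poc: string; filePath: string } {
  return { poc: f.poc, filePath: f.file };
}
```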
```ts
// the pipeline
// 1. research agent: one multi-turn session
//    discovers + attacks + writes PoC
const findings = await researchAgent.run({
  target: packageDir,
  mode: "audit",
});
// returns: [{ file, vulnerability, poc, reasoning }]

// 2. verify agents: parallel, independent, blind
//    each gets ONLY poc + file path
const verified = await Promise.all(
  findings.map((f) =>
    verifyAgent.run({
      poc: f.poc,        // just the PoC code
      filePath: f.file,  // just the file path
      // NO reasoning, NO context, NO attack narrative
    })
  )
);

// 3. only confirmed findings make the report
const confirmed = verified.filter((v) => v.status === "confirmed");
```
pwnkit scanned itself
the best way to test a security tool is to point it at itself.
the research agent went through the pwnkit codebase and found six potential vulnerabilities:
- command injection via unsanitized package names passed to shell
- SSRF through target URL parameter in scan mode
- arbitrary file read via path traversal in review command
- prompt injection in LLM-powered analysis pipeline
- two more related to input validation edge cases
six findings. the old pwnkit would have reported all six as vulnerabilities.
the blind verify agents independently rejected all six as false positives.
every rejection was correct. the code had proper mitigations in place — input sanitization, URL validation, path normalization, sandboxed execution — that the research agent missed or underestimated during its analysis. the verify agents, starting from scratch with only the PoC and file path, traced the actual data flow and found that none of the PoCs would succeed against the real code.
why blind matters
an obvious objection: why not just have the same agent verify its own findings? or pass the reasoning along so the verify agent has more context?
because context is exactly how bias propagates. if the verify agent reads “this is a command injection because the package name flows into a shell command,” it’s going to look for ways to confirm that narrative. it’s going to focus on the shell command and miss the sanitization step three functions up the call stack.
making it blind forces the verify agent to build its own understanding from the ground up. it has to:
- read the PoC code and understand what it’s trying to exploit
- open the target file and trace the data flow independently
- determine if the PoC would actually succeed against the real code
- return a structured verdict: confirmed or rejected, with evidence
if the research agent missed a sanitization function, the verify agent will find it. if the PoC makes assumptions about the runtime environment, the verify agent will catch that. two independent analyses are exponentially harder to fool than one.
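a back-of-the-envelope way to see why, assuming the two reviews really are independent — which is exactly what blinding is trying to approximate:

```typescript
// if a single agent wrongly confirms a bogus finding with
// probability p, a false positive now needs two independent
// confirmations to survive — roughly p squared
const p = 0.2;          // illustrative single-agent error rate
const survived = p * p; // 0.04 — a 5x reduction at this p
```

the independence assumption is doing the work here: feed the verifier the researcher's narrative and the two errors correlate, and the p² advantage evaporates.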
parallel, cheap, fast
the verify agents run in parallel — one per finding. if the research agent reports 8 vulnerabilities, 8 verify agents spin up simultaneously. each one is a short, focused session. they don’t need multi-turn conversations or tool access. they read code, trace data flow, and output a verdict.
```ts
// structured output via --json-schema (Claude Code)
// or --output-schema (Codex)
interface VerifyResult {
  finding_id: string;
  status: "confirmed" | "rejected";
  confidence: number;        // 0-100
  evidence: string;          // what the agent found
  data_flow_trace: string;   // source -> sink analysis
  rejection_reason?: string; // why it's a false positive
}
```
the structured output schema makes every verify agent return machine-parseable results. no regex parsing of natural language. no “let me summarize my findings” prose that might miss details. just a typed verdict that pipes straight into the report.
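consuming a verdict is then just a parse and a gate — here's a minimal, self-contained sketch (the JSON payload is fabricated for illustration):

```typescript
// the verdict shape, repeated here so the sketch stands alone
interface VerifyResult {
  finding_id: string;
  status: "confirmed" | "rejected";
  confidence: number;
  evidence: string;
  data_flow_trace: string;
  rejection_reason?: string;
}

// a fabricated verify-agent response for illustration
const raw = `{
  "finding_id": "f-003",
  "status": "rejected",
  "confidence": 94,
  "evidence": "package name is escaped before reaching exec",
  "data_flow_trace": "cli arg -> escapeShellArg() -> execFile()",
  "rejection_reason": "sanitization three frames up the call stack"
}`;

const verdict: VerifyResult = JSON.parse(raw);
const reportable = verdict.status === "confirmed"; // false here
```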
and because pwnkit is runtime-agnostic, this works with whatever you’re running:
- Claude Code — `--runtime claude` with `--json-schema`
- Codex — `--runtime codex` with `--output-schema`
- Gemini, OpenCode, or any API — same pipeline, different backend
the pipeline, end to end
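putting the pieces together — a hedged sketch of the whole flow, where `research` and `verify` stand in for pwnkit's internals and every name is illustrative:

```typescript
interface Finding {
  file: string;
  vulnerability: string;
  poc: string;
  reasoning: string;
}
interface Verdict {
  status: "confirmed" | "rejected";
  evidence: string;
}

async function audit(
  packageDir: string,
  research: (dir: string) => Promise<Finding[]>,
  verify: (blind: { poc: string; filePath: string }) => Promise<Verdict>
): Promise<Finding[]> {
  // 1. one long research session: discover, attack, write PoCs
  const findings = await research(packageDir);

  // 2. fan out blind verifiers in parallel — each sees only
  //    the PoC and the file path, never the reasoning
  const verdicts = await Promise.all(
    findings.map((f) => verify({ poc: f.poc, filePath: f.file }))
  );

  // 3. a finding survives only if its blind reviewer confirms it
  return findings.filter((_, i) => verdicts[i].status === "confirmed");
}
```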
why this matters
false positives aren’t just annoying. they’re actively harmful.
every false positive erodes trust in the tool. after the third “critical” finding that turns out to be nothing, developers stop looking at the reports. the real vulnerability that comes next gets ignored because the signal-to-noise ratio trained them to ignore it.
blind verification doesn’t just reduce false positives. it makes every confirmed finding trustworthy. when pwnkit reports a vulnerability, it means two independent AI agents — one attacking, one verifying — both agree it’s real. the verify agent has traced the data flow from source to sink and confirmed the PoC works. that’s a finding worth acting on.
it’s the same principle that makes peer review work in science. the same principle behind adversarial testing. the same principle behind separation of duties in security. the person who writes the check doesn’t approve the check.
try it
blind verification is built into every pwnkit command. no configuration — it runs automatically. audit a package:
npx pwnkit-cli audit your-package
the research agent finds what it finds. the verify agents kill what doesn’t hold up. only the real stuff survives.