why i built blind verification
every security scanner drowns you in false positives. it took three approaches before one of them actually worked.
a security scan finds 200 “possible vulnerabilities.” four hours of triage later, 190 are noise and the other 10 are maybes. the only way to confirm any of them is to write a manual PoC.
this is the state of security tooling in 2026. fixing it took three attempts.
attempt 1: template-based scanning
the first version of pwnkit was simple. YAML templates. regex patterns. send a payload, check if the response matches a known-bad pattern. this is how most scanners work — nuclei, nikto, the whole ecosystem.
```yaml
# template-v1.yaml
id: ssrf-check
payloads:
  - "http://169.254.169.254/latest/meta-data/"
  - "http://localhost:6379"
matchers:
  - type: regex
    pattern: "(ami-id|instance-id|ERR wrong)"
```
it worked for the obvious stuff. the false positive rate was brutal. a response containing the word “instance-id” in an error message? flagged. an API that returns user input in the response body? flagged. regex can’t understand context. it sees patterns, not meaning.
triage time exceeded the time it would have taken to pentest the target manually.
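to make the failure mode concrete, here's a toy version of that matcher in TypeScript — the pattern is copied from the v1 template above, the responses are made up for illustration:

```typescript
// the v1 matcher, reduced to a single regex test
const matcher = /(ami-id|instance-id|ERR wrong)/;

// a genuine SSRF hit: cloud metadata leaked into the response
const realHit = "ami-id: ami-0abc12345";

// a harmless validation error that happens to echo a field name
const harmlessError = 'unknown field "instance-id" in request body';

const hitFlagged = matcher.test(realHit);         // true
const errorFlagged = matcher.test(harmlessError); // true — same verdict
```

same verdict for both. the matcher has no way to ask "did the metadata service actually answer, or did the app just echo my input back?"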
attempt 2: agentic scanning
if regex can’t understand context, the next step is a scanner that can think. the template engine got replaced with an AI agent that read the code, crafted payloads based on what it saw, and reasoned about responses.
this was better. way better. the agent could look at a function, understand the data flow, and craft a targeted attack. it could tell the difference between user input being reflected in an error message versus user input being passed to exec().
it had a new problem: hallucination.
the agent would find something that looked suspicious, then reason itself into a vulnerability that didn’t exist. “this function could be vulnerable if the input isn’t sanitized upstream…” then it would check upstream, find no sanitization, and report a critical finding — without noticing the WAF sitting in front of the whole thing, or the type coercion that made the payload harmless.
“could be vulnerable” is not the same as “is vulnerable.” the agent couldn’t always tell the difference.
attempt 3: single agent with proof-of-concept
the next iteration forced the agent to prove it. don’t just report a finding — write a concrete PoC that demonstrates the exploit. no working PoC, no finding.
this killed a lot of the hallucinations. no more “could be vulnerable” — either the PoC works or it doesn’t.
there was a subtler problem: confirmation bias.
the same agent that decided something was vulnerable was also writing the PoC. and if it already believed the vulnerability was real, it would write a PoC that looked convincing but didn’t actually prove anything. it tested the happy path. it assumed its payload got through. it wrote assertions that passed because they were testing the wrong thing.
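here's a sketch of what a biased PoC looks like — everything below is hypothetical, but the shape of the mistake is real: the assertion "passes" on mere reflection, without proving execution:

```typescript
// a hypothetical biased PoC for command injection. `send` stands in
// for whatever transport hits the target (name is illustrative).
async function biasedPoc(
  send: (input: string) => Promise<string>
): Promise<boolean> {
  const payload = "; id";
  const response = await send(`pkg${payload}`);
  // happy-path assertion: this passes even if the server merely
  // echoed the input back in an error message — it proves
  // reflection, not execution
  return response.includes("id");
}
```

an honest PoC would assert on evidence only the injected command could produce — command output, a side effect, a timing delta — not on the payload showing up in the response.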
this is the same problem that happens with human pentesters. the person who found the bug is the worst person to verify it. they already believe it’s real.
the insight: double-blind peer review
in academia, when a paper goes out for peer review, the reviewer doesn’t know who wrote it or what the author was thinking. they get the paper and nothing else. they have to independently evaluate whether the conclusions follow from the evidence.
the same idea applies to vulnerability verification.
the research agent does its thing — discovers attack surfaces, crafts payloads, launches multi-turn attacks, writes PoC code. one long agent session. then only the PoC code and the file path are extracted, all reasoning and context stripped, and handed to a completely separate verify agent.
the verify agent has no idea why the researcher thought this was vulnerable. it doesn’t know the attack narrative. it gets a PoC script and a file to look at. its job: independently trace the data flow, run the PoC, and confirm whether the exploit actually works.
if it can’t confirm — the finding is killed. no negotiation.
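as a sketch, the hand-off can be as small as a function that throws everything else away — the `Finding` field names here are illustrative, not pwnkit's actual types:

```typescript
// a research-agent finding, including everything that must NOT
// reach the reviewer (field names are assumptions)
interface Finding {
  file: string;
  vulnerability: string;
  poc: string;
  reasoning: string;
}

// strip a finding down to the blind packet: PoC code + file path,
// nothing else. the deletion is the whole point.
function toBlindPacket(f: Finding): { poc: string; filePath: string } {
  return { poc: f.poc, filePath: f.file };
}
```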
```ts
// the pipeline
// 1. research agent: one multi-turn session
//    discovers + attacks + writes PoC
const findings = await researchAgent.run({
  target: packageDir,
  mode: "audit",
});
// returns: [{ file, vulnerability, poc, reasoning }]

// 2. verify agents: parallel, independent, blind
//    each gets ONLY poc + file path
const verified = await Promise.all(
  findings.map((f) =>
    verifyAgent.run({
      poc: f.poc,        // just the PoC code
      filePath: f.file,  // just the file path
      // NO reasoning, NO context, NO attack narrative
    })
  )
);

// 3. only confirmed findings make the report
const confirmed = verified.filter((v) => v.status === "confirmed");
```
pwnkit scanned itself
the best way to test a security tool is to point it at itself.
the research agent went through the pwnkit codebase and found six potential vulnerabilities:
- command injection via unsanitized package names passed to shell
- SSRF through target URL parameter in scan mode
- arbitrary file read via path traversal in review command
- prompt injection in LLM-powered analysis pipeline
- two more related to input validation edge cases
six findings. the old pwnkit would have reported all six as vulnerabilities.
the blind verify agents independently rejected all six as false positives.
every rejection was correct. the code had proper mitigations in place — input sanitization, URL validation, path normalization, sandboxed execution — that the research agent missed or underestimated during its analysis. the verify agents, starting from scratch with only the PoC and file path, traced the actual data flow and found that none of the PoCs would succeed against the real code.
why blind matters
an obvious objection: why not just have the same agent verify its own findings? or pass the reasoning along so the verify agent has more context?
because context is exactly how bias propagates. if the verify agent reads “this is a command injection because the package name flows into a shell command,” it’s going to look for ways to confirm that narrative. it’s going to focus on the shell command and miss the sanitization step three functions up the call stack.
making it blind forces the verify agent to build its own understanding from the ground up. it has to:
- read the PoC code and understand what it’s trying to exploit
- open the target file and trace the data flow independently
- determine if the PoC would actually succeed against the real code
- return a structured verdict: confirmed or rejected, with evidence
if the research agent missed a sanitization function, the verify agent will find it. if the PoC makes assumptions about the runtime environment, the verify agent will catch that. two independent analyses are exponentially harder to fool than one.
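a back-of-the-envelope way to see why, assuming the two reviews really are independent — which is exactly what blinding is trying to approximate:

```typescript
// if a single agent wrongly confirms a bogus finding with
// probability p, a false positive now needs two independent
// confirmations to survive — roughly p squared
const p = 0.2;          // illustrative single-agent error rate
const survived = p * p; // 0.04 — a 5x reduction at this p
```

the independence assumption is doing the work here: feed the verifier the researcher's narrative and the two errors correlate, and the p² advantage evaporates.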
parallel, cheap, fast
the verify agents run in parallel — one per finding. if the research agent reports 8 vulnerabilities, 8 verify agents spin up simultaneously. each one is a short, focused session. they don’t need multi-turn conversations or tool access. they read code, trace data flow, and output a verdict.
```ts
// structured output via --json-schema (Claude Code)
// or --output-schema (Codex)
interface VerifyResult {
  finding_id: string;
  status: "confirmed" | "rejected";
  confidence: number;        // 0-100
  evidence: string;          // what the agent found
  data_flow_trace: string;   // source -> sink analysis
  rejection_reason?: string; // why it's a false positive
}
```
the structured output schema makes every verify agent return machine-parseable results. no regex parsing of natural language. no “let me summarize my findings” prose that might miss details. just a typed verdict that pipes straight into the report.
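consuming a verdict is then just a parse and a gate — here's a minimal, self-contained sketch (the JSON payload is fabricated for illustration):

```typescript
// the verdict shape, repeated here so the sketch stands alone
interface VerifyResult {
  finding_id: string;
  status: "confirmed" | "rejected";
  confidence: number;
  evidence: string;
  data_flow_trace: string;
  rejection_reason?: string;
}

// a fabricated verify-agent response for illustration
const raw = `{
  "finding_id": "f-003",
  "status": "rejected",
  "confidence": 94,
  "evidence": "package name is escaped before reaching exec",
  "data_flow_trace": "cli arg -> escapeShellArg() -> execFile()",
  "rejection_reason": "sanitization three frames up the call stack"
}`;

const verdict: VerifyResult = JSON.parse(raw);
const reportable = verdict.status === "confirmed"; // false here
```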
and because pwnkit is runtime-agnostic, this works with whatever you’re running:
- Claude Code — `--runtime claude` with `--json-schema`
- Codex — `--runtime codex` with `--output-schema`
- Gemini, OpenCode, or any API — same pipeline, different backend
the pipeline, end to end
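putting the pieces together — a hedged sketch of the whole flow, where `research` and `verify` stand in for pwnkit's internals and every name is illustrative:

```typescript
interface Finding {
  file: string;
  vulnerability: string;
  poc: string;
  reasoning: string;
}
interface Verdict {
  status: "confirmed" | "rejected";
  evidence: string;
}

async function audit(
  packageDir: string,
  research: (dir: string) => Promise<Finding[]>,
  verify: (blind: { poc: string; filePath: string }) => Promise<Verdict>
): Promise<Finding[]> {
  // 1. one long research session: discover, attack, write PoCs
  const findings = await research(packageDir);

  // 2. fan out blind verifiers in parallel — each sees only
  //    the PoC and the file path, never the reasoning
  const verdicts = await Promise.all(
    findings.map((f) => verify({ poc: f.poc, filePath: f.file }))
  );

  // 3. a finding survives only if its blind reviewer confirms it
  return findings.filter((_, i) => verdicts[i].status === "confirmed");
}
```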
why this matters
false positives aren’t just annoying. they’re actively harmful.
every false positive erodes trust in the tool. after the third “critical” finding that turns out to be nothing, developers stop looking at the reports. the real vulnerability that comes next gets ignored because the signal-to-noise ratio trained them to ignore it.
blind verification doesn’t just reduce false positives. it makes every confirmed finding trustworthy. when pwnkit reports a vulnerability, it means two independent AI agents — one attacking, one verifying — both agree it’s real. the verify agent has traced the data flow from source to sink and confirmed the PoC works. that’s a finding worth acting on.
it’s the same principle that makes peer review work in science. the same principle behind adversarial testing. the same principle behind separation of duties in security. the person who writes the check doesn’t approve the check.
try it
blind verification is built into every pwnkit command. no configuration — it runs automatically. audit a package:
npx pwnkit-cli audit your-package
the research agent finds what it finds. the verify agents kill what doesn’t hold up. only the real stuff survives.