Raptor exploitability-validation

Multi-stage pipeline for validating that vulnerability findings are real, reachable, and exploitable, preventing wasted effort on hallucinated findings, dead code paths, or findings with unrealistic preconditions.

install
source · Clone the upstream repo
git clone https://github.com/gadievron/raptor
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/gadievron/raptor "$T" && mkdir -p ~/.claude/skills && cp -r "$T/.claude/skills/exploitability-validation" ~/.claude/skills/gadievron-raptor-exploitability-validation && rm -rf "$T"
manifest: .claude/skills/exploitability-validation/SKILL.md

Exploitability Validation Skill

A multi-stage pipeline for validating that vulnerability findings are real, reachable, and exploitable.

Purpose

Prevents wasted effort on:

  • Hallucinated findings (file doesn't exist, code doesn't match)
  • Unreachable code paths (dead code, test-only)
  • Findings with unrealistic preconditions

When to Use

After scanning produces findings, BEFORE exploit development:

  1. Scanner finds potential vulnerability
  2. This skill validates it's real and reachable
  3. Exploit Feasibility checks binary constraints
  4. Exploit development proceeds

[CONFIG] Configuration

models:
  native: true
  additional: false  # Set true to also run GPT, Gemini

output_when_additional:
  display: "agreement: 2/3"
  threshold: "1/3 is enough to proceed"
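
The agreement display and threshold above can be illustrated with a short sketch. It assumes each model returns a boolean verdict per finding; `agreement` is a hypothetical helper name, not part of Raptor.

```python
def agreement(verdicts):
    """Return (display string, proceed flag) for per-model verdicts.

    verdicts: list of booleans, one per model (True = the model considers
    the finding valid). Hypothetical helper, not part of Raptor.
    """
    if not verdicts:
        raise ValueError("at least one model verdict is required")
    positive = sum(1 for v in verdicts if v)
    display = f"agreement: {positive}/{len(verdicts)}"
    # Per the config, a single agreeing model is enough to proceed.
    proceed = positive >= 1
    return display, proceed


print(agreement([True, True, False]))
```

With three models, one positive verdict keeps the finding in the pipeline; only a unanimous negative stops it.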

[EXEC] Execution Rules

  1. Run the full pipeline end-to-end.

  2. Solve and fix any issues you encounter, unless you have failed five times in a row or need clarification.

  3. Run on the latest thinking/reasoning model available (verify the model name).

  4. The pipeline must be deterministic: if run again, results should be the same.

  5. Validate after writing. Run

    libexec/raptor-validate-schema <type> <file>

    after each Write. Match the type to what you wrote:

    • stage, for any stage-*.json file (e.g., stage-a.json, stage-c.json, stage-f.json)
    • attack-tree, attack-paths, attack-surface, hypotheses, or disproven, for the matching working doc

    Fix any errors before proceeding to the next stage.

  6. No finding may reach Stage D without passing through Stages B and C, even if Stage A produced a successful PoC.

  7. Do not narrate gate compliance ("GATE-8 satisfied"), schema validation passes ("findings.json: OK"), or stage transitions ("Stage C complete") to the user. Do show substantive work: PoC test output, tool investigations (objdump, checksec), binary protections, hypothesis results, and evidence discovered. Document gate compliance in validation-report.md only. Report schema or pipeline failures immediately.

  8. Python imports: All python3 -c snippets must start with

    import sys, os; sys.path.insert(0, os.environ["RAPTOR_DIR"])

    before importing from packages.* or core.*.

  9. Build directory: Stage 0 creates $OUTPUT_DIR/build/. Compile PoCs and test binaries there, not in the target repo.

  10. libexec scripts: Run libexec/ scripts exactly as shown in the prompts: do not prepend export commands, do not use absolute paths, do not wrap in additional shell logic. The permission system auto-approves libexec/raptor-* commands only when run in this exact form.

  11. Per-stage JSON files. Write your stage's output to stage-X.json (e.g., stage-a.json, stage-b.json), not to findings.json. The prep script merges stage files into findings.json automatically. Do not read or write findings.json directly. Do not use python3 -c scripts for JSON; use the Write tool.
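
The filename-to-type matching in rule 5 can be sketched as below. This is only an illustration: `schema_type_for` is a hypothetical helper, and it assumes each working doc is named `<type>.json`, as in the Working Documents list.

```python
import fnmatch
import os

# Working-doc schema types named in rule 5; each is assumed to map to <type>.json.
WORKING_DOC_TYPES = {"attack-tree", "attack-paths", "attack-surface",
                     "hypotheses", "disproven"}


def schema_type_for(path):
    """Return the <type> argument for libexec/raptor-validate-schema.

    Hypothetical helper: only the filename-to-type mapping comes from
    rule 5; the function itself is illustrative.
    """
    name = os.path.basename(path)
    if fnmatch.fnmatchcase(name, "stage-*.json"):
        return "stage"
    stem, ext = os.path.splitext(name)
    if ext == ".json" and stem in WORKING_DOC_TYPES:
        return stem
    # findings.json lands here on purpose: rule 11 says not to write it directly.
    raise ValueError(f"no schema type known for {name!r}")
```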


[GATES] MUST-GATEs

Rationale: Without these gates, models sample instead of checking all code, hedge with "if" and "maybe" instead of verifying, and miss exploitable findings.

GATE-1 [ASSUME-EXPLOIT]: Your goal is to discover real exploitable vulnerabilities. If you think a finding is not exploitable, do not assume so. First, investigate under the assumption that it is.

GATE-2 [STRICT-SEQUENCE]: Strictly follow instructions. If you think or try something else, or a new idea comes up, present the results of that analysis separately at the end. Always display the results of the strict criteria first, and only then display the results of the additional methods, if any.

GATE-3 [CHECKLIST]: Check the pipeline, update the checklist, and collect evidence of compliance to present at the end, showing that you executed every action these gates require.

GATE-4 [NO-HEDGING]: If your Chain-of-Thought or results include "if", "maybe", "uncertain", "unclear", "could potentially", "may be possible", "depending on", "in theory", "in certain circumstances", or similar - immediately verify the claim. Do not leave unverified.

GATE-5 [FULL-COVERAGE]: Test the entire code provided (file(s)/code base) against checklist.json, ensuring you checked all functions and lines of code. Do not sample, estimate, or guess.

GATE-6 [PROOF]: Always provide proof and show the vulnerable code.

GATE-7 [CONSISTENCY]: Before finalizing each finding, verify that vuln_type, severity, and status are consistent with the description and proof text. A description that explains why a bug is benign must not carry high severity.

GATE-8 [POC-EVIDENCE]: A PoC requires observable evidence: a crash, changed output, callback, file read, error message, or measurable state change. "Ran without error" is not evidence. If the expected effect is not observed, either the PoC is wrong or the bug is not triggered — investigate which.
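
As an illustration of GATE-4, a draft result can be scanned for the listed hedge phrases before it is finalized. This is a minimal sketch: `find_hedges` is a hypothetical helper, and because the gate says "or similar", the phrase list is a starting point and not an exhaustive filter.

```python
import re

# Phrases listed in GATE-4. The gate also says "or similar", so this
# list is a starting point only.
HEDGE_PHRASES = [
    "if", "maybe", "uncertain", "unclear", "could potentially",
    "may be possible", "depending on", "in theory",
    "in certain circumstances",
]

# Word boundaries keep "if" from matching inside words such as "Verified".
_HEDGE_RE = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in HEDGE_PHRASES) + r")\b",
    re.IGNORECASE,
)


def find_hedges(text):
    """Return the hedge phrases found in text, lowercased, in order."""
    return [m.group(1).lower() for m in _HEDGE_RE.finditer(text)]


print(find_hedges("The overflow may be possible if len > 64."))
```

Each hit marks a claim that still needs verification; an empty list does not prove the text is free of hedging.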


[STYLE] Output Formatting

Status values in JSON must be lowercase snake_case:

  • exploitable, not EXPLOITABLE or Exploitable
  • confirmed, not CONFIRMED or Confirmed
  • ruled_out, not RULED_OUT or Ruled Out
  • disproven, not DISPROVEN or Disproven

RULE: Any text shown to the user (chat, tables, summaries, stage progress) MUST use Title Case, never snake_case. This applies at every stage, not just the final report. Convert on output:

  • poc_success → "PoC Success"
  • not_disproven → "Not Disproven"
  • buffer_overflow → "Buffer Overflow"
  • command_injection → "Command Injection"
  • confirmed_constrained → "Confirmed (Constrained)"

BAD (snake_case leaked into chat):

- FIND-001 (buffer_overflow): poc_success

GOOD:

- FIND-001 (Buffer Overflow): PoC Success
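
The conversion rule can be sketched as a small function. `display_name` and the two lookup tables are hypothetical names; the special cases cover only the examples listed above, and other irregular values would need their own entries.

```python
# Whole values whose display form is not a word-by-word conversion.
SPECIAL_VALUES = {"confirmed_constrained": "Confirmed (Constrained)"}
# Single words whose display form is not plain capitalization.
SPECIAL_TOKENS = {"poc": "PoC"}


def display_name(value):
    """Convert a snake_case JSON value to the Title Case shown to users."""
    if value in SPECIAL_VALUES:
        return SPECIAL_VALUES[value]
    words = value.split("_")
    return " ".join(SPECIAL_TOKENS.get(w, w.capitalize()) for w in words)


print(f"- FIND-001 ({display_name('buffer_overflow')}): {display_name('poc_success')}")
```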

No colored circles or emojis:

  • Do not use 🔴/🟡/🟢 - they are perspective-dependent (red = bad for defenders, good for researchers)
  • Use plain text headers: "### Exploitable (7 findings)", not "### 🔴 EXPLOITABLE"

Hypothesis status:

  • Proven - hypothesis confirmed by evidence
  • Disproven - hypothesis refuted by evidence
  • Partial - some predictions confirmed, others refuted
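
One way to read the three hypothesis statuses is as a function of the tested predictions. The sketch below assumes each prediction result is recorded as a boolean; that representation and the name `hypothesis_status` are illustrative, not taken from the pipeline.

```python
def hypothesis_status(prediction_results):
    """Derive a hypothesis status from its tested predictions.

    prediction_results: list of booleans, True when evidence confirmed
    the prediction and False when evidence refuted it. The list shape is
    an assumption made for illustration.
    """
    if not prediction_results:
        raise ValueError("no predictions tested; status cannot be derived")
    if all(prediction_results):
        return "Proven"
    if not any(prediction_results):
        return "Disproven"
    return "Partial"
```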

[REMIND] Critical Reminders

  • Do not skip, sample, or guess - check all code against checklist.json.
  • Provide proof for every claim.
  • Actually read files - do not rely on memory.
  • Update docs after every action.

Stages

All stages execute in sequence. No stage may be skipped. The only exception is Stage E, which only applies to memory corruption vulnerabilities.

Each stage has up to three phases: X0 (mechanical prep), X (LLM reasoning), X1 (mechanical validation). Run X0 and X1 via Python snippets in the stage prompt. The X phases are your reasoning work.

| Stage | X0 (prep) | X (reasoning) | X1 (validation) |
|-------|-----------|---------------|-----------------|
| 0 | - | Build inventory | - |
| A | Load checklist + existing findings | Vuln assessment + PoC | Dedup flag + schema check |
| B | Load findings + attack surface | Hypotheses, attack trees | Schema check all 5 docs |
| C | Checklist lookup (file+line) | Code verification | Schema check + pass/fail count |
| D | Test/mock pre-filter + evidence card | Ruling + CVSS vectors | Schema check + counts |
| E | Group by binary | Per-binary analysis + mapping | Verdict → status mapping |
| F | CVSS scoring + consistency checks | Self-review + corrections | - |
| 1 | - | - | Recompute CVSS, report |

Notes:

  • Stage E only applies to memory corruption vulnerabilities. Web/injection vulns skip E.
  • Stage 1 recomputes CVSS scores from final vectors (after any Stage F corrections), validates schemas, and generates the report. It never changes verdicts.
  • Each stage writes a stage_X_summary onto each finding (carry-forward). Later stages read the finding object instead of cross-referencing multiple files.

See stage-specific files for detailed instructions.
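
The sequencing rule (all stages in order, Stage E only for memory corruption) can be sketched as follows. The set of memory-corruption `vuln_type` values is an assumption made for illustration, and `stages_for` is a hypothetical helper; the real classification lives in the stage prompts.

```python
# Assumption: these vuln_type values count as memory corruption. The
# real classification lives in the stage prompts, not in this sketch.
MEMORY_CORRUPTION_TYPES = {"buffer_overflow", "format_string"}

STAGE_ORDER = ["0", "A", "B", "C", "D", "E", "F", "1"]


def stages_for(vuln_type):
    """Return the ordered stages a finding of this type passes through."""
    if vuln_type in MEMORY_CORRUPTION_TYPES:
        return list(STAGE_ORDER)
    # Web/injection findings skip Stage E; every other stage still runs.
    return [s for s in STAGE_ORDER if s != "E"]


print(stages_for("command_injection"))
```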


Working Documents (Stage B)

| Doc | Purpose |
|-----|---------|
| attack-tree.json | Knowledge graph. Source of truth. |
| hypotheses.json | Active hypotheses. Status: testing, confirmed, disproven. |
| disproven.json | Failed hypotheses. What was tried, why it failed. |
| attack-paths.json | Paths attempted. PoC results. PROXIMITY. Blockers. |
| attack-surface.json | Sources, sinks, trust boundaries. |

Flow

STAGE 0:  Inventory
          │
          ▼ checklist.json
          │
STAGE A:  A0 load checklist ─► A assess+PoC ─► A1 dedup+validate
          │
          ▼ findings.json (+ origin, stage_a_summary)
          │
STAGE B:  B0 load findings ─► B hypotheses+trees ─► B1 validate 5 docs
          │
          ▼ findings.json (+ stage_b_summary), working docs
          │
STAGE C:  C0 checklist lookup ─► C verify code ─► C1 validate
          │
          ▼ findings.json (+ sanity_check, stage_c_summary)
          │
STAGE D:  D0 test filter+evidence card ─► D ruling+CVSS ─► D1 validate
          │
          ▼ findings.json (+ ruling, cvss_vector, stage_d_summary)
          │
     ┌────┴────┐
     │         │
  Memory    Web/Injection
  Corruption    │
     │          │
STAGE E:        │
  E0 group ─► E analyze ─► E1 verdict map
     │          │
     └────┬─────┘
          │
          ▼ findings.json (+ feasibility, stage_e_summary, final_status)
          │
STAGE F:  F0 CVSS scores+checks ─► F review+correct ─► findings.json
          │
          ▼ findings.json (+ stage_f_summary)
          │
STAGE 1:  Recompute CVSS, validate, report
          │
          ▼ validation-report.md

Integration with Exploit Feasibility

Stage E automatically bridges to the exploit_feasibility package for memory corruption vulnerabilities.

Automatic (via Stage E):

# Stage E handles this automatically for applicable vuln types
# See stage-e-feasibility.md for details

Manual (if needed):

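import sys, os; sys.path.insert(0, os.environ["RAPTOR_DIR"])  # required before packages.* imports
from packages.exploit_feasibility import analyze_binary, format_analysis_summary

binary_path = "path/to/target-binary"  # placeholder: set to the binary under analysis
result = analyze_binary(binary_path, vuln_type='format_string')
print(format_analysis_summary(result, verbose=True))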

Final Status After Stage E:

| Source Status | Feasibility | Final Status |
|---------------|-------------|--------------|
| Confirmed | Likely | Exploitable |
| Confirmed | Difficult | Confirmed (Constrained) |
| Confirmed | Unlikely | Confirmed (Blocked) |
| Confirmed | N/A (web vuln) | Confirmed |

This ensures findings are:

  1. Real and reachable (Stages A-D)
  2. Actually exploitable (Stage E + exploit_feasibility)
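
The final-status table can be expressed as a lookup. This is a sketch using the display-form labels from the table; `final_status` is a hypothetical helper, and the actual verdict mapping is done by Stage E (phase E1).

```python
# (source status, feasibility) -> final status, copied from the table.
# "N/A" stands for web vulnerabilities, which skip Stage E.
FINAL_STATUS = {
    ("Confirmed", "Likely"): "Exploitable",
    ("Confirmed", "Difficult"): "Confirmed (Constrained)",
    ("Confirmed", "Unlikely"): "Confirmed (Blocked)",
    ("Confirmed", "N/A"): "Confirmed",
}


def final_status(source_status, feasibility):
    """Look up the final status; reject combinations the table omits."""
    try:
        return FINAL_STATUS[(source_status, feasibility)]
    except KeyError:
        raise ValueError(
            f"no final status defined for {source_status!r} / {feasibility!r}"
        ) from None


print(final_status("Confirmed", "Difficult"))
```

Raising on unknown combinations, instead of returning a default, keeps an unmapped verdict from silently becoming a final status.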

Notice

This analysis is performed for defensive purposes, in a lab environment. Full permission has been provided.