Agentic-qe pentest-validation

Use when validating security findings from SAST/DAST scans, proving exploitability of reported vulnerabilities, eliminating false positives, or running the 4-phase pentest pipeline (recon, analysis, validation, report).

install
source · Clone the upstream repo
git clone https://github.com/proffesor-for-testing/agentic-qe
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/proffesor-for-testing/agentic-qe "$T" && mkdir -p ~/.claude/skills && cp -r "$T/assets/skills/pentest-validation" ~/.claude/skills/proffesor-for-testing-agentic-qe-pentest-validation-17f4ce && rm -rf "$T"
manifest: assets/skills/pentest-validation/SKILL.md

Pentest Validation

<default_to_action> When validating security findings:

  1. REQUIRE explicit authorization for target URL
  2. SCAN with qe-security-scanner (SAST + dependency + secrets)
  3. ANALYZE with qe-security-reviewer + qe-security-auditor (parallel)
  4. VALIDATE with qe-pentest-validator (graduated exploitation, parallel per vuln type)
  5. REPORT only confirmed findings with PoC evidence ("No Exploit, No Report")
  6. UPDATE exploit playbook with new patterns

Quality Gates:

  • Authorization confirmed before ANY exploitation
  • Target URL is staging/dev (NOT production)
  • Budget cap enforced ($15 default)
  • Time cap enforced (30 min default)
  • All exploitation attempts logged </default_to_action>
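
The budget and time caps above can be pictured as a small guard that every exploitation attempt passes through. The sketch below is illustrative only: `createRunGuard`, its fields, and the injectable clock are hypothetical names rather than part of agentic-qe. Only the $15 and 30-minute defaults come from the list above.

```javascript
// Hypothetical run guard: names and shape are assumptions, not the skill's API.
function createRunGuard({ maxCostUsd = 15, timeoutMinutes = 30, now = Date.now } = {}) {
  const startedAt = now();
  let spentUsd = 0;
  const log = []; // every attempt is recorded, whether allowed or not

  return {
    // Call before each exploitation attempt with its estimated cost.
    tryAttempt(label, estimatedCostUsd) {
      const elapsedMin = (now() - startedAt) / 60000;
      let allowed = true;
      let reason = "ok";
      if (elapsedMin >= timeoutMinutes) {
        allowed = false;
        reason = "time cap reached";
      } else if (spentUsd + estimatedCostUsd > maxCostUsd) {
        allowed = false;
        reason = "budget cap reached";
      }
      if (allowed) spentUsd += estimatedCostUsd;
      log.push({ label, estimatedCostUsd, allowed, reason, at: now() });
      return allowed;
    },
    get spentUsd() { return spentUsd; },
    get log() { return log.slice(); },
  };
}
```

Recording refused attempts alongside allowed ones keeps the "all exploitation attempts logged" gate true even when a cap stops the run.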

Quick Reference Card

The 4-Phase Pipeline

| Phase | Agent(s) | Purpose | Parallelism |
|---|---|---|---|
| 1. Recon | qe-security-scanner | SAST, DAST, dependency scan, secrets | Internal parallel |
| 2. Analysis | qe-security-reviewer + qe-security-auditor | Code review + compliance check | Both in parallel |
| 3. Validation | qe-pentest-validator | Graduated exploit validation | Per-vuln-type parallel |
| 4. Report | qe-quality-gate | "No Exploit, No Report" filter | Sequential |

Graduated Exploitation Tiers

| Tier | Handler | Cost | Latency | Use When |
|---|---|---|---|---|
| 1 | Agent Booster (WASM) | $0 | <1ms | Code pattern is conclusive (eval, innerHTML, hardcoded creds) |
| 2 | Haiku | $0.0002 | ~500ms | Need payload test against live target |
| 3 | Sonnet/Opus | $0.003-$0.015 | 2-5s | Full exploit chain with data proof |
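
One way to read "graduated" is that a finding starts at the cheapest tier and escalates only while the result stays inconclusive. The sketch below illustrates that idea under stated assumptions: `validateGraduated`, the handler signature, and the verdict strings are hypothetical, and the stub handlers stand in for the real tier handlers.

```javascript
// Hypothetical escalation loop: handler signature and verdict strings are assumptions.
// handlers[0] is Tier 1 (pattern), handlers[1] Tier 2 (payload), handlers[2] Tier 3 (full).
function validateGraduated(finding, handlers, maxTier = 2) {
  const attempts = [];
  const lastTier = Math.min(maxTier, handlers.length);
  for (let tier = 1; tier <= lastTier; tier++) {
    const verdict = handlers[tier - 1](finding); // "conclusive" or "inconclusive"
    attempts.push({ tier, verdict });
    // Stop at the cheapest tier that settles the question.
    if (verdict === "conclusive") {
      return { finding, settledAtTier: tier, attempts };
    }
  }
  return { finding, settledAtTier: null, attempts };
}

// Usage with stub handlers standing in for the real tier handlers.
const stubHandlers = [
  (f) => (f.pattern === "eval" ? "conclusive" : "inconclusive"),
  () => "conclusive",
  () => "conclusive",
];
console.log(validateGraduated({ id: "F-1", pattern: "eval" }, stubHandlers).settledAtTier); // 1
console.log(validateGraduated({ id: "F-2", pattern: "concat" }, stubHandlers).settledAtTier); // 2
```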

When to Use This Skill

| Scenario | Tier | Estimated Cost |
|---|---|---|
| PR security review (source only) | 1 | $0 |
| Pre-release validation (staging) | 1-2 | $1-5 |
| Full pentest validation | 1-3 | $5-15 |
| Compliance audit evidence | 1-3 | $5-15 |

Configuration

pentest:
  target_url: https://staging.app.com    # REQUIRED for Tier 2-3
  source_repo: ./src                      # REQUIRED for Tier 1+
  exploitation_tier: 2                    # 1=pattern-only, 2=payload-test, 3=full-exploit
  vuln_types:                             # Which pipelines to run
    - injection                           # SQL, NoSQL, command injection
    - xss                                 # Reflected, stored, DOM XSS
    - auth                                # Auth bypass, session, JWT
    - ssrf                                # URL scheme abuse, metadata
  max_cost_usd: 15                        # Budget cap per run
  timeout_minutes: 30                     # Time cap per run
  require_authorization: true             # MUST confirm target ownership
  no_production: true                     # Block production URLs
  production_patterns:                    # URL patterns to block
    - "*.prod.*"
    - "api.*"
    - "www.*"
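
The configuration does not say how `production_patterns` are evaluated. A plausible reading is that each entry is a `*` glob matched against the target hostname, and the sketch below assumes exactly that. `globToRegExp`, `isProductionUrl`, and `checkPentestConfig` are hypothetical helpers, not part of the skill.

```javascript
// Sketch only: assumes production_patterns are "*" globs matched against the hostname.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&"); // escape everything except "*"
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$", "i");
}

function isProductionUrl(targetUrl, patterns) {
  const host = new URL(targetUrl).hostname;
  return patterns.some((p) => globToRegExp(p).test(host));
}

// Returns a list of human-readable problems; an empty list means the config passes.
function checkPentestConfig(cfg) {
  const errors = [];
  if (!cfg.source_repo) errors.push("source_repo is required");
  if (cfg.exploitation_tier >= 2 && !cfg.target_url) {
    errors.push("target_url is required for Tier 2-3");
  }
  if (cfg.no_production && cfg.target_url &&
      isProductionUrl(cfg.target_url, cfg.production_patterns || [])) {
    errors.push("target_url matches a production pattern");
  }
  return errors;
}
```

Under this reading `https://staging.app.com` passes, while `https://api.app.com` is blocked by the `api.*` pattern.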

Safeguards (Mandatory)

Authorization Gate

Every pentest validation run MUST:

  1. Display target URL and exploitation tier to user
  2. Require explicit confirmation: "I own/authorized testing of this target"
  3. Log authorization with timestamp
  4. Block if target URL matches production patterns
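
The four steps above could be combined into a single gate function, sketched below. Everything here is an assumption made for illustration: the `authorizeRun` name, the record shape, the injected `isProduction` predicate, and the choice to require the confirmation phrase verbatim.

```javascript
// Hypothetical gate: function name, record shape, and injected callbacks are assumptions.
const REQUIRED_PHRASE = "I own/authorized testing of this target";

function authorizeRun({ targetUrl, tier, confirmation, isProduction, now = () => new Date() }) {
  // Step 1: what the user is shown before confirming.
  const prompt = `Target: ${targetUrl} | Exploitation tier: ${tier}`;
  // Step 4: production targets are blocked regardless of confirmation.
  if (isProduction(targetUrl)) {
    return { authorized: false, reason: "target matches production patterns", prompt };
  }
  // Step 2: require the explicit confirmation phrase (exact match is an assumption).
  if (confirmation !== REQUIRED_PHRASE) {
    return { authorized: false, reason: "explicit confirmation missing", prompt };
  }
  // Step 3: authorization record with timestamp, ready to be written to the run log.
  return {
    authorized: true,
    prompt,
    record: { targetUrl, tier, confirmedAt: now().toISOString() },
  };
}
```

Checking the production block before the confirmation means a user cannot confirm their way past it.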

What This Skill Does NOT Do

  • Full autonomous reconnaissance (Nmap, Subfinder)
  • Zero-day exploit development
  • Attack targets without explicit authorization
  • Test production systems
  • Store actual exfiltrated data (only proof of access)
  • Social engineering or phishing simulation
  • Port scanning or service discovery

Validation Pipelines

Injection Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| SQL injection | String concat in query | `' OR '1'='1` response diff | UNION SELECT data extraction |
| NoSQL injection | `$where`, `$gt` in query | Operator injection test | Collection enumeration |
| Command injection | `exec()`, `system()` calls | Command delimiter test | Reverse shell proof |
| LDAP injection | String concat in filter | Wildcard injection | Directory enumeration |
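
The Tier 1 column describes source-level patterns, which can be approximated with a line-by-line scan. The sketch below is a simplified illustration, not the skill's detector: `INJECTION_RULES` and `scanSource` are hypothetical, and the regexes are deliberately naive, so hits should be read as candidates for review rather than confirmed findings.

```javascript
// Illustrative Tier 1 rules: these regexes are simplified assumptions, not the skill's detectors.
const INJECTION_RULES = [
  // A SQL keyword followed, on the same line, by a closing quote and a "+" (string concat).
  { attack: "SQL injection", re: /\b(SELECT|INSERT|UPDATE|DELETE)\b[^;\n]*["'`]\s*\+/i },
  // MongoDB-style operators appearing in source.
  { attack: "NoSQL injection", re: /\$where|\$gt/ },
  // Calls to exec(...) or system(...).
  { attack: "Command injection", re: /\b(exec|system)\s*\(/ },
];

function scanSource(source) {
  const hits = [];
  source.split("\n").forEach((text, i) => {
    for (const { attack, re } of INJECTION_RULES) {
      if (re.test(text)) hits.push({ attack, line: i + 1, text: text.trim() });
    }
  });
  return hits;
}
```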

XSS Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| Reflected XSS | No output encoding | `<img onerror>` reflection | Browser JS execution via Playwright |
| Stored XSS | `innerHTML` assignment | Payload stored + retrieved | Cookie theft PoC |
| DOM XSS | `document.write(location)` | Fragment injection | DOM manipulation proof |
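
For the Tier 2 reflection check, the key question is whether a probe string comes back raw or HTML-encoded. A minimal sketch of that comparison follows, assuming the response body is already available as a string. `htmlEncode`, `classifyReflection`, and the inert probe marker are hypothetical.

```javascript
// Sketch: classify how a response body echoes a test marker. Names are hypothetical.
function htmlEncode(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// The marker should contain at least one character that encoding changes (e.g. "<").
function classifyReflection(body, marker) {
  if (body.includes(marker)) return "reflected-raw"; // output encoding is missing
  if (body.includes(htmlEncode(marker))) return "reflected-encoded"; // defense in place
  return "not-reflected";
}

const marker = "<aqe-probe-123>"; // inert probe, assumed for illustration
console.log(classifyReflection("<p>Hello <aqe-probe-123></p>", marker));
console.log(classifyReflection("<p>Hello &lt;aqe-probe-123&gt;</p>", marker));
```

Only the raw case would justify escalating to a browser-based check; an encoded reflection suggests the output path is already defended.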

Auth Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| JWT none | No algorithm validation | Modified JWT accepted | Admin access with forged token |
| Session fixation | No session rotation | Pre-set session reused | Cross-user session hijack |
| Credential stuffing | No rate limiting | 100 attempts unblocked | Valid credential discovery |
| IDOR | No authorization check | Access other user data | Full CRUD on foreign resources |
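
The "No algorithm validation" pattern in the JWT row means the server never checks the token's `alg` header against an allowlist. The sketch below shows the kind of check whose absence Tier 1 would flag. It assumes a Node.js runtime (for `Buffer`), and the helper names and default allowlist are hypothetical. It inspects the header only and does not verify signatures.

```javascript
// Sketch, assuming Node.js: reads the JWT header and checks "alg" against an allowlist.
// It does NOT verify the signature; helper names and the default allowlist are assumptions.
function readJwtHeader(token) {
  const [headerSeg] = token.split(".");
  // Convert base64url to standard base64 before decoding.
  const b64 = headerSeg.replace(/-/g, "+").replace(/_/g, "/");
  return JSON.parse(Buffer.from(b64, "base64").toString("utf8"));
}

function algIsAllowed(token, allowedAlgs = ["RS256", "ES256"]) {
  const { alg } = readJwtHeader(token);
  // Case-sensitive comparison, so "none", "None", and "NONE" are all rejected.
  return typeof alg === "string" && allowedAlgs.includes(alg);
}
```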

SSRF Pipeline

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| Internal URL | User-controlled URL fetch | `http://169.254.169.254` | Cloud metadata extraction |
| DNS rebinding | URL validation bypass | Rebind to internal IP | Internal service access |
| Protocol smuggling | URL scheme not restricted | `file:///etc/passwd` | File content in response |
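
The Tier 1 patterns in this table amount to a missing check on the scheme and destination of a user-supplied URL. The sketch below shows such a check as a string-level classifier. `isInternalIPv4` and `classifyFetchTarget` are hypothetical names. Because the check never resolves DNS, it cannot detect the rebinding row, which needs validation at connection time.

```javascript
// Sketch: string-level classification of a user-supplied fetch target. Names are hypothetical.
function isInternalIPv4(host) {
  const m = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 10 ||                          // 10.0.0.0/8
    a === 127 ||                         // loopback
    (a === 169 && b === 254) ||          // link-local, includes cloud metadata address
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168)             // 192.168.0.0/16
  );
}

function classifyFetchTarget(rawUrl) {
  const url = new URL(rawUrl);
  if (url.protocol !== "http:" && url.protocol !== "https:") return "blocked-scheme";
  if (url.hostname === "localhost" || isInternalIPv4(url.hostname)) return "internal-address";
  return "external";
}
```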

Agent Coordination

Orchestration Pattern

// Phase 1: Recon (parallel scans)
// Each Task(...) call is assumed to resolve to an array of findings.
const phase1Results = await Task("Security Scan", {
  target: "./src",
  layers: { sast: true, dast: true, dependencies: true, secrets: true }
}, "qe-security-scanner");

// Phase 2: Analysis (parallel review)
const [reviewFindings, auditFindings] = await Promise.all([
  Task("Code Security Review", {
    findings: phase1Results,
    depth: "comprehensive"
  }, "qe-security-reviewer"),

  Task("Compliance Audit", {
    findings: phase1Results,
    frameworks: ["owasp-top-10"]
  }, "qe-security-auditor")
]);
const phase2Results = [...reviewFindings, ...auditFindings];

// Phase 3: Validation (graduated exploitation)
const phase3Results = await Task("Exploit Validation", {
  findings: [...phase1Results, ...phase2Results],
  target_url: "https://staging.app.com",
  exploitation_tier: 2,
  vuln_types: ["injection", "xss", "auth", "ssrf"],
  max_cost_usd: 15,
  timeout_minutes: 30
}, "qe-pentest-validator");

// Phase 4: Report ("No Exploit, No Report" gate)
await Task("Security Quality Gate", {
  findings: phase3Results.confirmedFindings,
  gate: "no-exploit-no-report",
  require_poc: true
}, "qe-quality-gate");

Finding Classification

| Status | Meaning | Action |
|---|---|---|
| `confirmed-exploitable` | Exploitation succeeded with PoC | Report with evidence |
| `likely-exploitable` | Partial exploitation, defenses detected | Report with caveats |
| `not-exploitable` | All exploitation attempts failed | Filter from report |
| `inconclusive` | WAF/defense blocked, unclear if vulnerable | Report for manual review |
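
The status-to-action mapping above translates directly into a partitioning step. The sketch below shows one way to apply it. `applyReportGate`, the bucket names, and the `status`/`poc` fields on a finding are assumptions, as is the choice to route a confirmed finding that lacks a PoC to manual review instead of rejecting it.

```javascript
// Sketch of the classification table as a partition. Field and bucket names are assumptions.
const ACTIONS = {
  "confirmed-exploitable": "report",
  "likely-exploitable": "report-with-caveats",
  "not-exploitable": "filter",
  "inconclusive": "manual-review",
};

function applyReportGate(findings) {
  const out = { "report": [], "report-with-caveats": [], "filter": [], "manual-review": [] };
  for (const f of findings) {
    let action = ACTIONS[f.status];
    if (!action) throw new Error(`Unknown status: ${f.status}`);
    // "No Exploit, No Report": a confirmed finding without PoC evidence is not reported
    // as confirmed. Sending it to manual review is an assumption of this sketch.
    if (action === "report" && !f.poc) action = "manual-review";
    out[action].push(f);
  }
  return out;
}
```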

Exploit Playbook Memory

Namespace Structure

aqe/pentest/
  playbook/
    exploit/{vuln_type}/{tech_stack}/{technique}
    bypass/{defense_type}/{technique}
    payload/{vuln_type}/{variant}
  results/
    validation-{timestamp}
  poc/
    {finding_id}-poc
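
To make the namespace layout concrete, the sketch below builds keys for each branch. The `keys` helper and the `slug` normalization (lowercasing and hyphenating free-text segments) are assumptions; the source only specifies the path templates.

```javascript
// Sketch: key builders for the namespace templates above. slug() normalization is an assumption.
function slug(part) {
  return String(part)
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric into one hyphen
    .replace(/^-+|-+$/g, "");    // trim leading and trailing hyphens
}

const ROOT = "aqe/pentest";

const keys = {
  exploit: (vulnType, techStack, technique) =>
    `${ROOT}/playbook/exploit/${slug(vulnType)}/${slug(techStack)}/${slug(technique)}`,
  bypass: (defenseType, technique) =>
    `${ROOT}/playbook/bypass/${slug(defenseType)}/${slug(technique)}`,
  payload: (vulnType, variant) =>
    `${ROOT}/playbook/payload/${slug(vulnType)}/${slug(variant)}`,
  result: (timestamp) => `${ROOT}/results/validation-${timestamp}`,
  poc: (findingId) => `${ROOT}/poc/${findingId}-poc`,
};

console.log(keys.exploit("injection", "Node.js + Postgres", "UNION select"));
// aqe/pentest/playbook/exploit/injection/node-js-postgres/union-select
```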

Learning Loop

  1. Before validation: Query playbook for known patterns matching findings
  2. During validation: Try known payloads first (higher success rate)
  3. After validation: Store new successful patterns with confidence scores
  4. Over time: Agent converges on most effective payloads per tech stack
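
The loop above can be sketched with an in-memory stand-in for the playbook store. The source does not define how confidence scores are computed, so this example assumes the simplest option, the observed success rate. The `Playbook` class and its methods are hypothetical.

```javascript
// Sketch: in-memory stand-in for the playbook. Confidence = successes / attempts is an assumption.
class Playbook {
  constructor() {
    this.entries = new Map();
  }

  // After validation: record the outcome of trying a pattern.
  record(key, succeeded) {
    const e = this.entries.get(key) || { key, attempts: 0, successes: 0, confidence: 0 };
    e.attempts += 1;
    if (succeeded) e.successes += 1;
    e.confidence = e.successes / e.attempts;
    this.entries.set(key, e);
    return e;
  }

  // Before validation: known patterns under a namespace prefix, most reliable first.
  query(prefix) {
    return [...this.entries.values()]
      .filter((e) => e.key.startsWith(prefix))
      .sort((a, b) => b.confidence - a.confidence);
  }
}
```

Ordering query results by confidence is what lets the "try known payloads first" step favor patterns that have worked before on the same tech stack.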

Cost Optimization

Estimated Cost by Scenario

| Scenario | Tier Mix | Findings | Est. Cost | Est. Time |
|---|---|---|---|---|
| PR check (source only) | 100% Tier 1 | 5 | $0 | <5s |
| Sprint validation | 70% T1, 30% T2 | 15 | $2-5 | 5-10 min |
| Release validation | 40% T1, 40% T2, 20% T3 | 25 | $8-15 | 15-30 min |
| Full pentest | 20% T1, 30% T2, 50% T3 | 40 | $15-30 | 30-60 min |

Cost vs Shannon Comparison

| Metric | Shannon | AQE Pentest Validation |
|---|---|---|
| Cost per run | ~$50 | $5-15 (graduated tiers) |
| Runtime | 60-90 min | 15-30 min (parallel pipelines) |
| False positive rate | Low (exploit-proven) | Low (same principle) |
| Learning | None (static prompts) | ReasoningBank playbook |

Success Metrics

| Metric | Target | Measurement |
|---|---|---|
| False positive reduction | >60% of findings eliminated | Pre/post validator comparison |
| Exploit confirmation rate | >80% of confirmed findings truly exploitable | Manual PoC verification |
| Cost per run | <$15 USD | Token tracking per pipeline |
| Time per run | <30 minutes | Execution time metrics |
| Playbook growth | 100+ patterns after 6 months | Memory namespace count |
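
The first four targets reduce to simple ratios and thresholds. The sketch below reads "false positive reduction" as the fraction of pre-validation findings removed by the validator. That reading, like the function and field names, is an assumption.

```javascript
// Sketch: target checks from the table above. Field names and formulas are assumptions.
function falsePositiveReduction(findingsBefore, findingsAfter) {
  if (findingsBefore <= 0) throw new RangeError("findingsBefore must be positive");
  return (findingsBefore - findingsAfter) / findingsBefore;
}

function meetsTargets(m) {
  return {
    falsePositiveReduction: falsePositiveReduction(m.findingsBefore, m.findingsAfter) > 0.6,
    confirmationRate: m.confirmed > 0 && m.manuallyVerified / m.confirmed > 0.8,
    cost: m.costUsd < 15,
    time: m.minutes < 30,
  };
}

// Example run: 40 scanner findings reduced to 12, 9 of 10 confirmed findings verified by hand.
console.log(meetsTargets({
  findingsBefore: 40, findingsAfter: 12,
  confirmed: 10, manuallyVerified: 9,
  costUsd: 11.5, minutes: 24,
}));
```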

Remember

"No Exploit, No Report." A vulnerability scanner that can't prove exploitation delivers uncertain value. This skill transforms security findings from theoretical risks into proven vulnerabilities with evidence. Every confirmed finding comes with a reproducible proof-of-concept. Every false positive is eliminated before it reaches the report.

Think proof, not prediction. Don't report what MIGHT be vulnerable. Prove what IS vulnerable.