Agentic-qe pentest-validation

Use when validating security findings from SAST/DAST scans, proving exploitability of reported vulnerabilities, eliminating false positives, or running the 4-phase pentest pipeline (recon, analysis, validation, report).

Install

source · Clone the upstream repo

git clone https://github.com/proffesor-for-testing/agentic-qe

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/proffesor-for-testing/agentic-qe "$T" && mkdir -p ~/.claude/skills && cp -r "$T/assets/skills/pentest-validation" ~/.claude/skills/proffesor-for-testing-agentic-qe-pentest-validation-17f4ce && rm -rf "$T"

manifest: assets/skills/pentest-validation/SKILL.md

Pentest Validation

<default_to_action> When validating security findings:

REQUIRE explicit authorization for target URL
SCAN with qe-security-scanner (SAST + dependency + secrets)
ANALYZE with qe-security-reviewer + qe-security-auditor (parallel)
VALIDATE with qe-pentest-validator (graduated exploitation, parallel per vuln type)
REPORT only confirmed findings with PoC evidence ("No Exploit, No Report")
UPDATE exploit playbook with new patterns

Quality Gates:

Authorization confirmed before ANY exploitation
Target URL is staging/dev (NOT production)
Budget cap enforced ($15 default)
Time cap enforced (30 min default)
All exploitation attempts logged </default_to_action>

Quick Reference Card

The 4-Phase Pipeline

Phase	Agent(s)	Purpose	Parallelism
1. Recon	qe-security-scanner	SAST, DAST, dependency scan, secrets	Internal parallel
2. Analysis	qe-security-reviewer + qe-security-auditor	Code review + compliance check	Both in parallel
3. Validation	qe-pentest-validator	Graduated exploit validation	Per-vuln-type parallel
4. Report	qe-quality-gate	"No Exploit, No Report" filter	Sequential

Graduated Exploitation Tiers

Tier	Handler	Cost	Latency	Use When
1	Agent Booster (WASM)	$0	<1ms	Code pattern is conclusive (eval, innerHTML, hardcoded creds)
2	Haiku	$0.0002	~500ms	Need payload test against live target
3	Sonnet/Opus	$0.003-$0.015	2-5s	Full exploit chain with data proof
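
The tier table reads as an escalation rule: settle each finding at the cheapest tier that can decide it, and never go above the configured `exploitation_tier`. A minimal sketch of that rule follows. The `Finding` fields and the `chooseTier` helper are hypothetical names used only for illustration; they are not part of the agentic-qe API.

```typescript
// Hypothetical types for illustration; not the agentic-qe API.
type Tier = 1 | 2 | 3;

interface Finding {
  id: string;
  patternConclusive: boolean; // e.g. eval, innerHTML, hardcoded creds
  needsDataProof: boolean;    // a full exploit chain with data proof is wanted
}

// Pick the cheapest tier that can settle the finding, capped at maxTier
// (the configured exploitation_tier).
function chooseTier(finding: Finding, maxTier: Tier): Tier {
  if (finding.patternConclusive) return 1;
  const wanted: Tier = finding.needsDataProof ? 3 : 2;
  return Math.min(wanted, maxTier) as Tier;
}
```

With `maxTier` set to 2, a finding that would otherwise need Tier 3 is validated at Tier 2 only, which keeps a pre-release run inside its budget.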

When to Use This Skill

Scenario	Tier	Estimated Cost
PR security review (source only)	1	$0
Pre-release validation (staging)	1-2	$1-5
Full pentest validation	1-3	$5-15
Compliance audit evidence	1-3	$5-15

Configuration

pentest:
  target_url: https://staging.app.com    # REQUIRED for Tier 2-3
  source_repo: ./src                      # REQUIRED for Tier 1+
  exploitation_tier: 2                    # 1=pattern-only, 2=payload-test, 3=full-exploit
  vuln_types:                             # Which pipelines to run
    - injection                           # SQL, NoSQL, command injection
    - xss                                 # Reflected, stored, DOM XSS
    - auth                                # Auth bypass, session, JWT
    - ssrf                                # URL scheme abuse, metadata
  max_cost_usd: 15                        # Budget cap per run
  timeout_minutes: 30                     # Time cap per run
  require_authorization: true             # MUST confirm target ownership
  no_production: true                     # Block production URLs
  production_patterns:                    # URL patterns to block
    - "*.prod.*"
    - "api.*"
    - "www.*"
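
The source does not say how `production_patterns` are evaluated. One plausible reading, sketched below, treats each entry as a glob matched against the hostname of `target_url`. The helpers `globToRegExp` and `isProductionUrl` are hypothetical and exist only in this example.

```typescript
// Convert a glob such as "*.prod.*" into an anchored, case-insensitive RegExp.
function globToRegExp(glob: string): RegExp {
  // Escape every regex metacharacter except "*", then expand "*" to ".*".
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$", "i");
}

// Assumption: patterns are matched against the hostname only, not the full URL.
function isProductionUrl(targetUrl: string, patterns: string[]): boolean {
  const host = new URL(targetUrl).hostname;
  return patterns.some((pattern) => globToRegExp(pattern).test(host));
}

const productionPatterns = ["*.prod.*", "api.*", "www.*"];
const stagingBlocked = isProductionUrl("https://staging.app.com", productionPatterns);
const apiBlocked = isProductionUrl("https://api.example.com/v1", productionPatterns);
```

Under this reading `https://staging.app.com` passes and `https://api.example.com/v1` is blocked. A host such as `staging-api.example.com` also passes, because `api.*` is anchored at the start of the hostname.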

Safeguards (Mandatory)

Authorization Gate

Every pentest validation run MUST:

Display target URL and exploitation tier to user
Require explicit confirmation: "I own/authorized testing of this target"
Log authorization with timestamp
Block if target URL matches production patterns
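
The four requirements above can be sketched as a single gate that runs before any exploitation attempt. Everything here is an assumption made for illustration: the `AuthorizationRecord` shape, the `describeRun` and `authorizeRun` helpers, and the choice to pass the production check in as a predicate.

```typescript
interface AuthorizationRecord {
  targetUrl: string;
  tier: number;
  confirmedBy: string;
  timestamp: string; // ISO 8601
}

const REQUIRED_PHRASE = "I own/authorized testing of this target";

// Step 1: text shown to the user before anything runs.
function describeRun(targetUrl: string, tier: number): string {
  return `Target: ${targetUrl} | Exploitation tier: ${tier}\n` +
    `Type "${REQUIRED_PHRASE}" to continue.`;
}

// Steps 2-4: block production targets, require the exact confirmation,
// then append a timestamped record to the audit log.
function authorizeRun(
  request: { targetUrl: string; tier: number; confirmation: string; confirmedBy: string },
  isProduction: (url: string) => boolean,
  auditLog: AuthorizationRecord[],
  now: Date = new Date(),
): AuthorizationRecord {
  if (isProduction(request.targetUrl)) {
    throw new Error(`Blocked: ${request.targetUrl} matches a production pattern`);
  }
  if (request.confirmation.trim() !== REQUIRED_PHRASE) {
    throw new Error("Blocked: explicit authorization was not confirmed");
  }
  const record: AuthorizationRecord = {
    targetUrl: request.targetUrl,
    tier: request.tier,
    confirmedBy: request.confirmedBy,
    timestamp: now.toISOString(),
  };
  auditLog.push(record);
  return record;
}
```

The gate throws instead of returning `false` so that a caller cannot ignore a failed check and continue into Phase 3 by accident.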

What This Skill Does NOT Do

Full autonomous reconnaissance (Nmap, Subfinder)
Zero-day exploit development
Attack targets without explicit authorization
Test production systems
Store actual exfiltrated data (only proof of access)
Social engineering or phishing simulation
Port scanning or service discovery

Validation Pipelines

Injection Pipeline

Attack	Tier 1 (Pattern)	Tier 2 (Payload)	Tier 3 (Full)
SQL injection	String concat in query	`' OR '1'='1` response diff	UNION SELECT data extraction
NoSQL injection	`$where` , `$gt` in query	Operator injection test	Collection enumeration
Command injection	`exec()` , `system()` calls	Command delimiter test	Reverse shell proof
LDAP injection	String concat in filter	Wildcard injection	Directory enumeration

XSS Pipeline

Attack	Tier 1 (Pattern)	Tier 2 (Payload)	Tier 3 (Full)
Reflected XSS	No output encoding	`<img onerror>` reflection	Browser JS execution via Playwright
Stored XSS	`innerHTML` assignment	Payload stored + retrieved	Cookie theft PoC
DOM XSS	`document.write(location)`	Fragment injection	DOM manipulation proof

Auth Pipeline

Attack	Tier 1 (Pattern)	Tier 2 (Payload)	Tier 3 (Full)
JWT none	No algorithm validation	Modified JWT accepted	Admin access with forged token
Session fixation	No session rotation	Pre-set session reused	Cross-user session hijack
Credential stuffing	No rate limiting	100 attempts unblocked	Valid credential discovery
IDOR	No authorization check	Access other user data	Full CRUD on foreign resources

SSRF Pipeline

Attack	Tier 1 (Pattern)	Tier 2 (Payload)	Tier 3 (Full)
Internal URL	User-controlled URL fetch	`http://169.254.169.254`	Cloud metadata extraction
DNS rebinding	URL validation bypass	Rebind to internal IP	Internal service access
Protocol smuggling	URL scheme not restricted	`file:///etc/passwd`	File content in response
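
The Tier 1 column in the pipelines above is source-only pattern detection. A rough sketch of that idea, using line-based regular expressions for a few of the listed patterns, is shown below. The rule set, the `scanSource` helper, and the regexes are illustrative assumptions; a real scanner would more likely work on an AST with data-flow analysis, and regex hits such as `exec(` can be false positives.

```typescript
interface PatternRule {
  vulnType: "injection" | "xss";
  name: string;
  pattern: RegExp; // no "g" flag, so test() carries no state between calls
}

// A few Tier 1 patterns taken from the pipeline tables.
const TIER1_RULES: PatternRule[] = [
  { vulnType: "injection", name: "NoSQL $where operator", pattern: /\$where/ },
  { vulnType: "injection", name: "exec()/system() call", pattern: /\b(?:exec|system)\s*\(/ },
  { vulnType: "xss", name: "innerHTML assignment", pattern: /\.innerHTML\s*=[^=]/ },
  { vulnType: "xss", name: "document.write(location)", pattern: /document\.write\s*\(\s*location/ },
];

interface PatternHit {
  rule: string;
  vulnType: string;
  line: number; // 1-based line number in the scanned source
  text: string;
}

// Report every line of the source that matches at least one rule.
function scanSource(source: string): PatternHit[] {
  const hits: PatternHit[] = [];
  source.split(/\r?\n/).forEach((text, index) => {
    for (const rule of TIER1_RULES) {
      if (rule.pattern.test(text)) {
        hits.push({ rule: rule.name, vulnType: rule.vulnType, line: index + 1, text: text.trim() });
      }
    }
  });
  return hits;
}
```

The `innerHTML` rule requires a single `=` so that comparisons such as `el.innerHTML === ""` are not flagged as assignments.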

Agent Coordination

Orchestration Pattern

// Task(...) is supplied by the agent runtime. This sketch assumes each Task
// resolves to an array of findings, except Phase 3, which resolves to an
// object exposing confirmedFindings.

// Phase 1: Recon (parallel scans)
const phase1Results = await Task("Security Scan", {
  target: "./src",
  layers: { sast: true, dast: true, dependencies: true, secrets: true }
}, "qe-security-scanner");

// Phase 2: Analysis (parallel review)
const [reviewResults, auditResults] = await Promise.all([
  Task("Code Security Review", {
    findings: phase1Results,
    depth: "comprehensive"
  }, "qe-security-reviewer"),

  Task("Compliance Audit", {
    findings: phase1Results,
    frameworks: ["owasp-top-10"]
  }, "qe-security-auditor")
]);
const phase2Results = [...reviewResults, ...auditResults];

// Phase 3: Validation (graduated exploitation)
const phase3Results = await Task("Exploit Validation", {
  findings: [...phase1Results, ...phase2Results],
  target_url: "https://staging.app.com",
  exploitation_tier: 2,
  vuln_types: ["injection", "xss", "auth", "ssrf"],
  max_cost_usd: 15,
  timeout_minutes: 30
}, "qe-pentest-validator");

// Phase 4: Report ("No Exploit, No Report" gate)
const report = await Task("Security Quality Gate", {
  findings: phase3Results.confirmedFindings,
  gate: "no-exploit-no-report",
  require_poc: true
}, "qe-quality-gate");

Finding Classification

Status	Meaning	Action
`confirmed-exploitable`	Exploitation succeeded with PoC	Report with evidence
`likely-exploitable`	Partial exploitation, defenses detected	Report with caveats
`not-exploitable`	All exploitation attempts failed	Filter from report
`inconclusive`	WAF/defense blocked, unclear if vulnerable	Report for manual review
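
The status-to-action mapping above can be sketched as a small report gate. The `ValidatedFinding` and `ReportEntry` shapes and the `applyReportGate` helper are hypothetical. The handling of a confirmed finding that has no PoC attached is also an assumption, since the source does not cover that case.

```typescript
type FindingStatus =
  | "confirmed-exploitable"
  | "likely-exploitable"
  | "not-exploitable"
  | "inconclusive";

interface ValidatedFinding {
  id: string;
  status: FindingStatus;
  poc?: string; // reproducible proof-of-concept, if one was captured
}

type ReportAction = "report-with-evidence" | "report-with-caveats" | "manual-review";

interface ReportEntry {
  id: string;
  action: ReportAction;
}

// "No Exploit, No Report": findings that failed exploitation never reach the report.
function applyReportGate(findings: ValidatedFinding[]): ReportEntry[] {
  const entries: ReportEntry[] = [];
  for (const finding of findings) {
    switch (finding.status) {
      case "confirmed-exploitable":
        // Assumption: a confirmed finding without a PoC is sent to manual
        // review rather than reported as proven.
        entries.push({ id: finding.id, action: finding.poc ? "report-with-evidence" : "manual-review" });
        break;
      case "likely-exploitable":
        entries.push({ id: finding.id, action: "report-with-caveats" });
        break;
      case "inconclusive":
        entries.push({ id: finding.id, action: "manual-review" });
        break;
      case "not-exploitable":
        break; // filtered from the report
    }
  }
  return entries;
}
```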

Exploit Playbook Memory

Namespace Structure

aqe/pentest/
 playbook/
  exploit/{vuln_type}/{tech_stack}/{technique}
  bypass/{defense_type}/{technique}
  payload/{vuln_type}/{variant}
 results/
  validation-{timestamp}
 poc/
  {finding_id}-poc
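
Keys in this namespace can be built with a few small helpers so that every agent spells them the same way. The helper names and the slug normalization below are assumptions. The source defines only the path layout and does not specify a timestamp format, so the timestamp is passed through unchanged.

```typescript
const ROOT = "aqe/pentest";

// Assumption: segments are normalized to lowercase, dash-separated slugs.
function slug(segment: string): string {
  return segment.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

function exploitKey(vulnType: string, techStack: string, technique: string): string {
  return `${ROOT}/playbook/exploit/${slug(vulnType)}/${slug(techStack)}/${slug(technique)}`;
}

function bypassKey(defenseType: string, technique: string): string {
  return `${ROOT}/playbook/bypass/${slug(defenseType)}/${slug(technique)}`;
}

function payloadKey(vulnType: string, variant: string): string {
  return `${ROOT}/playbook/payload/${slug(vulnType)}/${slug(variant)}`;
}

// The timestamp format is not specified in the source; it is used as given.
function resultKey(timestamp: string): string {
  return `${ROOT}/results/validation-${timestamp}`;
}

function pocKey(findingId: string): string {
  return `${ROOT}/poc/${findingId}-poc`;
}
```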

Learning Loop

Before validation: Query playbook for known patterns matching findings
During validation: Try known payloads first (higher success rate)
After validation: Store new successful patterns with confidence scores
Over time: Agent converges on most effective payloads per tech stack
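
The loop above can be sketched with an in-memory playbook. This is an illustration only: the `Playbook` class is hypothetical, it is not the ReasoningBank API, and the confidence score (a Laplace-smoothed success rate) is an assumed formula, since the source does not say how confidence is computed.

```typescript
interface PlaybookEntry {
  technique: string;
  attempts: number;
  successes: number;
}

// Assumed scoring: Laplace-smoothed success rate, so that a technique with
// one lucky success does not outrank one with a long track record.
function confidence(entry: PlaybookEntry): number {
  return (entry.successes + 1) / (entry.attempts + 2);
}

class Playbook {
  private entries = new Map<string, PlaybookEntry[]>();

  // Before and during validation: known techniques for a key, best first.
  query(key: string): PlaybookEntry[] {
    const known = this.entries.get(key) ?? [];
    return [...known].sort((a, b) => confidence(b) - confidence(a));
  }

  // After validation: record the outcome of one attempt.
  record(key: string, technique: string, succeeded: boolean): void {
    const known = this.entries.get(key) ?? [];
    let entry = known.find((e) => e.technique === technique);
    if (!entry) {
      entry = { technique, attempts: 0, successes: 0 };
      known.push(entry);
    }
    entry.attempts += 1;
    if (succeeded) entry.successes += 1;
    this.entries.set(key, known);
  }
}
```

Because `query` returns entries sorted by confidence, repeated runs against the same tech stack try the historically most effective techniques first, which is the convergence described in the last step.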

Cost Optimization

Estimated Cost by Scenario

Scenario	Tier Mix	Findings	Est. Cost	Est. Time
PR check (source only)	100% Tier 1	5	$0	<5s
Sprint validation	70% T1, 30% T2	15	$2-5	5-10 min
Release validation	40% T1, 40% T2, 20% T3	25	$8-15	15-30 min
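Full pentest	20% T1, 30% T2, 50% T3	40	$15-30	30-60 min

Note: the full pentest estimate exceeds the default caps (max_cost_usd: 15, timeout_minutes: 30), so both limits must be raised for a run of that size.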
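
The budget and time caps (`max_cost_usd`, `timeout_minutes`) can be enforced with a small guard that is consulted before each exploitation attempt. The `RunBudget` class below is a hypothetical sketch; how the validator actually tracks token cost is not described in the source.

```typescript
class RunBudget {
  private spentUsd = 0;
  private readonly maxCostUsd: number;
  private readonly timeoutMinutes: number;
  private readonly startedAtMs: number;

  // Defaults mirror the documented caps: $15 and 30 minutes.
  constructor(maxCostUsd: number = 15, timeoutMinutes: number = 30, startedAtMs: number = Date.now()) {
    this.maxCostUsd = maxCostUsd;
    this.timeoutMinutes = timeoutMinutes;
    this.startedAtMs = startedAtMs;
  }

  // Throws if the attempt would break either cap; otherwise records the spend.
  charge(costUsd: number, nowMs: number = Date.now()): void {
    const elapsedMinutes = (nowMs - this.startedAtMs) / 60000;
    if (elapsedMinutes >= this.timeoutMinutes) {
      throw new Error("Time cap reached");
    }
    if (this.spentUsd + costUsd > this.maxCostUsd) {
      throw new Error("Budget cap reached");
    }
    this.spentUsd += costUsd;
  }

  remainingUsd(): number {
    return this.maxCostUsd - this.spentUsd;
  }
}
```

Checking the cap before recording the spend means a rejected attempt costs nothing, so the run can still fall back to a cheaper tier with whatever budget remains.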

Cost vs Shannon Comparison

Metric	Shannon	AQE Pentest Validation
Cost per run	~$50	$5-15 (graduated tiers)
Runtime	60-90 min	15-30 min (parallel pipelines)
False positive rate	Low (exploit-proven)	Low (same principle)
Learning	None (static prompts)	ReasoningBank playbook

Success Metrics

Metric	Target	Measurement
False positive reduction	>60% of findings eliminated	Pre/post validator comparison
Exploit confirmation rate	>80% of confirmed findings truly exploitable	Manual PoC verification
Cost per run	<$15 USD	Token tracking per pipeline
Time per run	<30 minutes	Execution time metrics
Playbook growth	100+ patterns after 6 months	Memory namespace count

Related Skills

security-testing - OWASP vulnerability scanning, SAST/DAST automation
compliance-testing - Regulatory compliance
api-testing-patterns - API security testing
chaos-engineering-resilience - Security under chaos

Remember

"No Exploit, No Report." A vulnerability scanner that can't prove exploitation delivers uncertain value. This skill transforms security findings from theoretical risks into proven vulnerabilities with evidence. Every confirmed finding comes with a reproducible proof-of-concept. Every false positive is eliminated before it reaches the report.

Think proof, not prediction. Don't report what MIGHT be vulnerable. Prove what IS vulnerable.