# cc-skills: multi-agent-e2e-validation

Multi-agent parallel E2E validation for database refactors. TRIGGERS: E2E validation, schema migration testing, database refactor validation.
```bash
git clone https://github.com/terrylica/cc-skills

T=$(mktemp -d) && git clone --depth=1 https://github.com/terrylica/cc-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/plugins/quality-tools/skills/multi-agent-e2e-validation" ~/.claude/skills/terrylica-cc-skills-multi-agent-e2e-validation && rm -rf "$T"
```
`plugins/quality-tools/skills/multi-agent-e2e-validation/SKILL.md`

# Multi-Agent E2E Validation
**Self-Evolving Skill**: This skill improves through use. If instructions are wrong, parameters drifted, or a workaround was needed — fix this file immediately, don't defer. Only update for real, reproducible issues.
## Overview
Prescriptive workflow for spawning parallel validation agents to comprehensively test database refactors. This workflow successfully identified 5 critical bugs (a 100% system failure rate) in a QuestDB migration that would otherwise have shipped to production.
## When to Use This Skill

Use this skill for:
- Database refactors (e.g., v3.x file-based → v4.x QuestDB)
- Schema migrations requiring validation
- Bulk data ingestion pipeline testing
- System migrations with multiple validation layers
- Pre-release validation for database-centric systems
Key outcomes:
- Parallel agent execution for comprehensive coverage
- Structured validation reporting (VALIDATION_FINDINGS.md)
- Bug discovery with severity classification (Critical/Medium/Low)
- Release readiness assessment
## Core Methodology

### 1. Validation Architecture (3-Layer Model)
**Layer 1: Environment Setup**
- Container orchestration (Colima/Docker)
- Database deployment and schema application
- Connectivity validation (ILP, PostgreSQL, HTTP ports)
- Configuration file creation and validation
**Layer 2: Data Flow Validation**
- Bulk ingestion testing (CloudFront → QuestDB)
- Performance benchmarking against SLOs
- Multi-month data ingestion
- Deduplication testing (re-ingestion scenarios)
- Type conversion validation (FLOAT→LONG casts)
**Layer 3: Query Interface Validation**
- High-level query methods (get_latest, get_range, execute_sql)
- Edge cases (limit=1, cross-month boundaries)
- Error handling (invalid symbols, dates, parameters)
- Gap detection SQL compatibility
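Where it helps, the Layer 2 performance check can be expressed as a small throughput probe. The sketch below is illustrative only: `ingest_rows` stands in for the project-specific bulk-loader entry point, and the 100K rows/sec figure is the SLO target used in the Agent 2 plan later in this document.

```python
import time

SLO_ROWS_PER_SEC = 100_000  # SLO target from the Agent 2 plan; yours may differ

def benchmark_ingestion(ingest_rows, row_count: int) -> dict:
    """Time one bulk-ingestion run and compare its throughput to the SLO."""
    start = time.perf_counter()
    ingest_rows(row_count)  # hypothetical project-specific ingestion call
    elapsed = time.perf_counter() - start
    throughput = row_count / elapsed
    return {
        "rows": row_count,
        "seconds": round(elapsed, 2),
        "rows_per_sec": int(throughput),
        "meets_slo": throughput >= SLO_ROWS_PER_SEC,
    }
```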
### 2. Agent Orchestration Pattern

**Sequential vs Parallel Execution:**

```
Agent 1 (Environment)      [SEQUENTIAL - prerequisite]
        ↓
Agent 2 (Bulk Loader)      [PARALLEL with Agent 3]
Agent 3 (Query Interface)  [PARALLEL with Agent 2]
```
**Dependency Rule**: Environment validation must pass before data flow/query validation.
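If you drive the agents from a single script rather than separate terminals, the dependency rule maps naturally onto sequential-then-parallel execution. A minimal sketch, assuming the `tmp/e2e-validation/` layout from Steps 2-3:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_agent(directory: str, script: str) -> int:
    """Run one agent's test script in its own directory; return the exit code."""
    return subprocess.run(["uv", "run", "python", script], cwd=directory).returncode

# Agent 1 is a prerequisite: run it first and gate on success.
if run_agent("tmp/e2e-validation/agent-1-env", "test_environment_setup.py") != 0:
    raise SystemExit("Environment validation failed; not starting Agents 2-3")

# Agents 2 and 3 have no mutual dependency: run them in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    exit_codes = list(pool.map(
        lambda args: run_agent(*args),
        [
            ("tmp/e2e-validation/agent-2-bulk", "test_bulk_loader.py"),
            ("tmp/e2e-validation/agent-3-query", "test_query_interface.py"),
        ],
    ))
```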
**Dynamic Todo Management:**
- Start with high-level plan (ADR-defined phases)
- Prune completed agents from todo list
- Grow todo list when bugs discovered (e.g., Bug #5 found by Agent 3)
- Update VALIDATION_FINDINGS.md incrementally
### 3. Validation Script Structure
Each agent produces:
- **Test script** (e.g., `test_bulk_loader.py`)
  - 5+ test functions with clear pass/fail criteria
  - Structured output (test name, result, details)
  - Summary report at end
- **Artifacts** (logs, config files, evidence)
- **Findings report** (bugs, severity, fix proposals)
**Example Test Structure:**

```python
def test_feature(conn):
    """Test 1: Feature description"""
    print("=" * 80)
    print("TEST 1: Feature description")
    print("=" * 80)

    results = {}

    # Test 1a: Subtest name
    print("\n1a. Testing subtest:")
    result_1a = perform_test()
    print(f"  Result: {result_1a}")
    results["subtest_1a"] = result_1a == expected_1a

    # Summary
    print("\n" + "-" * 80)
    all_passed = all(results.values())
    print(f"Test 1 Results: {'✓ PASS' if all_passed else '✗ FAIL'}")
    for test_name, passed in results.items():
        print(f"  - {test_name}: {'✓' if passed else '✗'}")

    return {"success": all_passed, "details": results}
```
### 4. Bug Classification and Tracking

**Severity Levels:**
- 🔴 Critical: 100% system failure (e.g., API mismatch, timestamp corruption)
- 🟡 Medium: Degraded functionality (e.g., below SLO performance)
- 🟢 Low: Minor issues, edge cases
**Bug Report Format:**

```markdown
#### Bug N: Descriptive Name (**SEVERITY** - Status)

**Location**: `file/path.py:line`
**Issue**: One-sentence description
**Impact**: Quantified impact (e.g., "100% ingestion failure")
**Root Cause**: Technical explanation
**Fix Applied**: Code changes with before/after
**Verification**: Test results proving fix
**Status**: ✅ FIXED / ⚠️ PARTIAL / ❌ OPEN
```
### 5. Release Readiness Decision Framework

**Go/No-Go Criteria:**

```
BLOCKER = Any Critical bug unfixed
SHIP    = All Critical bugs fixed + (Medium bugs acceptable OR fixed)
DEFER   = >3 Medium bugs unfixed OR any High-severity bug
```
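These criteria reduce to a small decision function over open (unfixed) bug counts. A sketch; note it omits the "High-severity" clause, since the three-level model above defines only Critical/Medium/Low:

```python
def release_decision(critical_open: int, medium_open: int) -> str:
    """Apply the go/no-go criteria to counts of unfixed bugs."""
    if critical_open > 0:
        return "BLOCKED"  # any unfixed Critical bug is a blocker
    if medium_open > 3:
        return "DEFER"    # too many unfixed Medium bugs
    return "SHIP"         # all Criticals fixed; Mediums acceptable or fixed

# Example from the QuestDB validation below: 5 Criticals all fixed, 1 Medium open.
assert release_decision(critical_open=0, medium_open=1) == "SHIP"
```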
**Example Decision:**
- 5 Critical bugs found → all fixed ✅
- 1 Medium bug (performance 55% below SLO) → acceptable ✅
- Verdict: RELEASE READY
## Workflow: Step-by-Step

### Step 1: Create Validation Plan (ADR-Driven)
**Input**: ADR document (e.g., ADR-0002 QuestDB Refactor)
**Output**: Validation plan with 3-7 agents
**Plan Structure:**

```markdown
## Validation Agents

### Agent 1: Environment Setup
- Deploy QuestDB via Docker
- Apply schema.sql
- Validate connectivity (ILP, PG, HTTP)
- Create .env configuration

### Agent 2: Bulk Loader Validation
- Test CloudFront → QuestDB ingestion
- Benchmark performance (target: >100K rows/sec)
- Validate deduplication (re-ingestion test)
- Multi-month ingestion test

### Agent 3: Query Interface Validation
- Test get_latest() with various limits
- Test get_range() with date boundaries
- Test execute_sql() with parameterized queries
- Test detect_gaps() SQL compatibility
- Test error handling (invalid inputs)
```
### Step 2: Execute Agent 1 (Environment)

**Directory Structure:**

```
tmp/e2e-validation/
  agent-1-env/
    test_environment_setup.py
    questdb.log
    config.env
    schema-check.txt
```

**Validation Checklist:**
- ✅ Container running
- ✅ Ports accessible (9009 ILP, 8812 PG, 9000 HTTP)
- ✅ Schema applied without errors
- ✅ .env file created
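A minimal sketch of the port checks (plain TCP connects to the three checklist ports; this verifies reachability, not protocol health):

```python
import socket

# Ports from the checklist above.
PORTS = {"ILP": 9009, "PostgreSQL": 8812, "HTTP": 9000}

def check_ports(host: str = "localhost", timeout: float = 2.0) -> dict:
    """Return {name: reachable} for each port the QuestDB container must expose."""
    status = {}
    for name, port in PORTS.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                status[name] = True
        except OSError:
            status[name] = False
    return status

print(check_ports())  # e.g. {'ILP': True, 'PostgreSQL': True, 'HTTP': True}
```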
### Step 3: Execute Agents 2-3 in Parallel

**Agent 2: Bulk Loader**

```
tmp/e2e-validation/
  agent-2-bulk/
    test_bulk_loader.py
    ingestion_benchmark.txt
    deduplication_test.txt
```

**Agent 3: Query Interface**

```
tmp/e2e-validation/
  agent-3-query/
    test_query_interface.py
    gap_detection_test.txt
```

**Execution:**

```bash
# Terminal 1
cd tmp/e2e-validation/agent-2-bulk
uv run python test_bulk_loader.py

# Terminal 2
cd tmp/e2e-validation/agent-3-query
uv run python test_query_interface.py
```
### Step 4: Document Findings in VALIDATION_FINDINGS.md

**Template:**

```markdown
# E2E Validation Findings Report

**Validation ID**: ADR-XXXX
**Branch**: feat/database-refactor
**Date**: YYYY-MM-DD
**Target Release**: vX.Y.Z
**Status**: [BLOCKED / READY / IN_PROGRESS]

## Executive Summary

E2E validation discovered **N critical bugs** that would have caused [impact]:

| Finding | Severity | Status | Impact       | Agent   |
| ------- | -------- | ------ | ------------ | ------- |
| Bug 1   | Critical | Fixed  | 100% failure | Agent 2 |

**Recommendation**: [RELEASE READY / BLOCKED / DEFER]

## Agent 1: Environment Setup - [STATUS]
...

## Agent 2: [Name] - [STATUS]
...
```
### Step 5: Iterate on Fixes
For each bug:
- Document in VALIDATION_FINDINGS.md with 🔴/🟡/🟢 severity
- Apply fix to source code
- Re-run failing test
- Update bug status to ✅ FIXED
- Commit with a semantic message (e.g., `fix: correct timestamp parsing in CSV ingestion`)
**Example Fix Commit:**

```bash
git add src/gapless_crypto_clickhouse/collectors/questdb_bulk_loader.py
git commit -m "fix: prevent pandas from treating first CSV column as index

BREAKING CHANGE: All timestamps were defaulting to epoch 0 (1970-01)
due to pandas read_csv() auto-indexing. Added index_col=False to
preserve first column as data.

Fixes #ABC-123"
```
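The code change behind that commit is a one-parameter fix. A sketch (the file name is hypothetical; `index_col=False` is a real pandas `read_csv` parameter):

```python
import pandas as pd

# Bug: pandas inferred the first CSV column as the index, dropping the
# timestamp column from the data and leaving timestamps at epoch 0
# downstream. index_col=False forces column 0 to stay ordinary data.
df = pd.read_csv("candles.csv", index_col=False)  # hypothetical file name
```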
### Step 6: Final Validation and Release Decision

Run all tests:

```bash
/usr/bin/env bash << 'SKILL_SCRIPT_EOF'
cd tmp/e2e-validation
for agent in agent-*; do
  echo "=== Running $agent ==="
  cd "$agent"
  uv run python test_*.py
  cd ..
done
SKILL_SCRIPT_EOF
```
Update VALIDATION_FINDINGS.md status:
- Count Critical bugs: X fixed, Y open
- Count Medium bugs: X fixed, Y open
- Apply decision framework
- Update Status field to ✅ RELEASE READY or ❌ BLOCKED
## Real-World Example: QuestDB Refactor Validation

**Context**: Migrating from file-based storage (v3.x) to QuestDB (v4.0.0)
**Bugs Found:**
- 🔴 **Sender API mismatch** - Used the non-existent `Sender.from_uri()` instead of `Sender.from_conf()` (fix sketched below)
- 🔴 **Type conversion** - `number_of_trades` sent as FLOAT, schema expects LONG
- 🔴 **Timestamp parsing** - pandas treated the first column as an index → epoch 0 timestamps
- 🔴 **Deduplication** - WAL mode doesn't provide UPSERT semantics (needed `DEDUP ENABLE UPSERT KEYS`; fix sketched below)
- 🔴 **SQL incompatibility** - `detect_gaps()` used nested window functions (unsupported by QuestDB)
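For illustration, sketches of the fixes for the first and fourth bugs, assuming QuestDB's default PG-wire credentials and a hypothetical `candles` table; `Sender.from_conf()` is the real questdb-client entry point:

```python
from questdb.ingress import Sender  # official QuestDB Python client
import psycopg2                     # DDL over the PostgreSQL wire port

# Bug 1 fix: the client exposes Sender.from_conf(), not Sender.from_uri().
with Sender.from_conf("tcp::addr=localhost:9009;") as sender:
    ...  # ILP ingestion goes here

# Bug 4 fix: enable native deduplication (table/column names hypothetical).
conn = psycopg2.connect(host="localhost", port=8812,
                        user="admin", password="quest", dbname="qdb")
with conn.cursor() as cur:
    cur.execute("ALTER TABLE candles DEDUP ENABLE UPSERT KEYS(timestamp, symbol)")
conn.commit()
```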
**Impact**: Without this validation, v4.0.0 would have shipped with 100% data corruption and 100% ingestion failure.

**Outcome**: All 5 bugs fixed, system validated, v4.0.0 released successfully.
## Common Pitfalls

### 1. Skipping Environment Validation

❌ **Bad**: Assume Docker/database is working and jump straight to data ingestion tests.

✅ **Good**: Agent 1 validates the environment first, catching port conflicts and schema errors early.
### 2. Serial Agent Execution

❌ **Bad**: Run Agent 2, wait for completion, then run Agent 3.

✅ **Good**: Run Agents 2 and 3 in parallel (there is no dependency between them).
### 3. Manual Test Reporting

❌ **Bad**: Copy/paste test output into Slack/email.

✅ **Good**: Structured VALIDATION_FINDINGS.md with severity, status, and fix tracking.
### 4. Ignoring Medium Bugs

❌ **Bad**: "Performance is 55% below SLO, but we'll fix it later."

✅ **Good**: Document it in VALIDATION_FINDINGS.md and make an explicit go/no-go decision.
### 5. No Re-validation After Fixes

❌ **Bad**: Apply the fix, assume it works, move on.

✅ **Good**: Re-run the failing test and update the status in VALIDATION_FINDINGS.md.
## Resources

### scripts/

Not applicable - validation scripts are project-specific (stored in `tmp/e2e-validation/`).

### references/

- `example_validation_findings.md` - Complete VALIDATION_FINDINGS.md template
- `agent_test_template.py` - Template for creating validation test scripts
- `bug_severity_classification.md` - Detailed severity criteria and examples

### assets/

Not applicable - validation artifacts are project-specific.
## Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| Container not starting | Colima/Docker not running | Start Colima/Docker before Agent 1 |
| Port conflicts | Ports already in use | Stop conflicting containers or use different ports |
| Schema application fails | Invalid SQL syntax | Check schema.sql for database-specific compatibility |
| Agent 2/3 fail without Agent 1 | Environment not validated | Ensure Agent 1 completes before starting Agent 2/3 |
| Test script import errors | Missing dependencies | Run scripts via `uv run` from the agent directory |
| Bug status not updating | VALIDATION_FINDINGS.md stale | Manually refresh status after each fix |
| Parallel agents interference | Shared resources conflict | Ensure agents use isolated directories |
| Decision unclear | Severity mixed Critical/Medium | Apply Go/No-Go criteria strictly per documentation |
## Post-Execution Reflection
After this skill completes, reflect before closing the task:
- **Locate yourself** — Find this SKILL.md's canonical path before editing.
- **What failed?** — Fix the instruction that caused it.
- **What worked better than expected?** — Promote it to recommended practice.
- **What drifted?** — Fix any script, reference, or dependency that no longer matches reality.
- **Log it** — Add an evolution-log entry with trigger, fix, and evidence.
Do NOT defer. The next invocation inherits whatever you leave behind.