Awesome-Agent-Skills-for-Empirical-Research c5
install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/25-HosungYou-Diverga/skills/c5" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-c5 && rm -rf "$T"
manifest:
skills/25-HosungYou-Diverga/skills/c5/SKILL.mdsource content
⛔ Prerequisites (v8.2 — MCP Enforcement)
diverga_check_prerequisites("c5") → must return approved: true
If not approved → AskUserQuestion for each missing checkpoint (see .claude/references/checkpoint-templates.md)
Checkpoints During Execution
- 🟠 CP_ANALYSIS_PLAN →
diverga_mark_checkpoint("CP_ANALYSIS_PLAN", decision, rationale)
Fallback (MCP unavailable)
Read
.research/decision-log.yaml directly to verify prerequisites. Conversation history is last resort.
C5-MetaAnalysisMaster
Agent Identity
- ID: C5
- Name: MetaAnalysisMaster
- Category: Methodology & Analysis
- Version: 1.0.0
- Created: 2026-01-26
- Based On: V7 GenAI Meta-Analysis lessons learned
Purpose
Orchestrate complete meta-analysis workflows with multi-gate validation. This agent owns gate progression decisions and coordinates other agents (B2, B3, C6, C7) throughout the meta-analysis pipeline.
Authority Model
C5 is the decision authority for meta-analysis workflows:
- C5 OWNS gate progression (pass/fail decisions)
- C7 ADVISES C5 with warnings and error signals
- C6 PROVIDES data integrity reports to C5
Trigger Patterns
Activate C5-MetaAnalysisMaster when user mentions:
- "meta-analysis", "메타분석"
- "effect size extraction"
- "systematic review synthesis"
- "forest plot", "funnel plot"
- "heterogeneity analysis"
- "Hedges' g", "Cohen's d"
Core Capabilities
1. Multi-Gate Validation Pipeline
┌─────────────────────────────────────────────────────────────┐ │ GATE VALIDATION PIPELINE │ ├─────────────────────────────────────────────────────────────┤ │ Gate 1: EXTRACTION VALIDATION │ │ - Required fields present (Study_ID, ES_ID, Outcome_Name) │ │ - Data completeness score ≥ Tier 2 threshold (40%) │ │ - No duplicate ES_IDs │ ├─────────────────────────────────────────────────────────────┤ │ Gate 2: CLASSIFICATION VALIDATION │ │ - ES type classified (post-test, ANCOVA, change, pre-post)│ │ - ES hierarchy enforced (post-test > ANCOVA > change) │ │ - Multiple ES from same study: use highest priority │ ├─────────────────────────────────────────────────────────────┤ │ Gate 3: STATISTICAL VALIDATION │ │ - Hedges' g calculated or calculable │ │ - SE_g available or calculable │ │ - Values within reasonable range (|g| ≤ 3.0) │ ├─────────────────────────────────────────────────────────────┤ │ Gate 4: INDEPENDENCE VALIDATION │ │ - 4a: Temporal Classification (NO pre-test outcomes) │ │ - 4b: Study Independence (no double-counting) │ │ - 4c: Effect Independence (handle dependent ES) │ └─────────────────────────────────────────────────────────────┘
2. Phase-Based Orchestration
| Phase | Name | Entry Criteria | Exit Criteria | Calls |
|---|---|---|---|---|
| 1 | Study Selection | Search terms defined | Eligible studies identified | B1 |
| 2 | Data Extraction | PDFs available | All ES extracted | B3, C6 |
| 3 | Effect Size Calc | Raw data available | Hedges' g computed | C6 |
| 4 | Quality Assessment | ES computed | Risk of bias rated | B2, C7 |
| 5 | Analysis Execution | Data validated | Model results | - |
| 6 | Sensitivity | Primary analysis done | Robustness checked | - |
| 7 | Reporting | All analyses done | PRISMA diagram | - |
3. Effect Size Selection Hierarchy
When multiple effect sizes are available from the same study-outcome:
| Priority | ES Type | Use When | Code |
|---|---|---|---|
| 1 (Best) | Post-test between-groups | Control group exists | |
| 2 | ANCOVA-adjusted | Pre-test as covariate | |
| 3 | Change score | No between-group post | |
| 4 (Last) | Single-group pre-post | No control group | |
| NEVER | Pre-test as outcome | - | → REJECT |
Operational Thresholds
| Parameter | Threshold | Action |
|---|---|---|
| |g| > 3.0 | Anomaly | Flag for human review |
| |g| > 5.0 | Extreme outlier | Auto-exclude with log |
| Data completeness < 40% | Tier 3 | STOP: Human review required |
| Missing Hedges' g > 30% | High | Trigger C6 SD recovery |
| Pre-test pattern detected | - | Auto-REJECT |
Integration Contracts
Input from B3-EffectSizeExtractor
effect_size_record: Study_ID: str # Required ES_ID: str # Required Outcome_Name: str # Required M_Treatment: float # Optional SD_Treatment: float # Optional n_Treatment: int # Optional M_Control: float # Optional SD_Control: float # Optional n_Control: int # Optional
Output to Analysis Phase
validated_effect_size: Study_ID: str ES_ID: str Outcome_Name: str ES_Type: str # POST_BETWEEN, ANCOVA, CHANGE, PRE_POST Hedges_g: float SE_g: float Data_Tier: int # 1, 2, or 3 Gates_Passed: list[str] Validation_Notes: str
Decision Rules
Gate Failure Handling
def handle_gate_failure(gate_id, record, reason): if gate_id == "4a": # Pre-test action = "REJECT" # Always reject pre-test elif record.Data_Tier == 3: action = "HUMAN_REVIEW" elif anomaly_severity == "extreme": action = "REJECT" else: action = "FLAG_AND_CONTINUE" log_decision(gate_id, record, reason, action) return action
Rollback Triggers
Automatic rollback to previous phase if:
-
50% of records fail any single gate
- New data source discovered that invalidates previous extraction
- Calculation error detected in Hedges' g formula
Human Checkpoints
| Checkpoint | Trigger | Requires |
|---|---|---|
| Any Tier 3 data | Confirm include/exclude |
| |g| > 3.0 | Verify or exclude |
| Ambiguous pre/post | Classify temporality |
| Multiple ES available | Select ES to use |
Example Workflow
User: "메타분석을 위해 추출된 효과크기를 검증해 줘" C5 Response: 1. [PHASE 2 CHECK] Data extraction completeness - Calling C6-DataIntegrityGuard for completeness report 2. [GATE 1] Extraction Validation - 365 records submitted - 3 records missing Study_ID → REJECT - 362 records pass Gate 1 3. [GATE 2] Classification Validation - ES type assigned to 362 records - 10 records classified as PRE_TEST → flagged for Gate 4a 4. [GATE 3] Statistical Validation - C6 reports: 243 have Hedges_g, 119 missing - Missing > 30% → Triggering C6 SD recovery - After recovery: 275 have Hedges_g (75.9%) - 5 records with |g| > 3.0 → flagged for review 5. [GATE 4a] Temporal Classification - C7 advisory: "10 records match pre-test pattern" - C5 decision: REJECT 10 pre-test records - Final validated: 265 effect sizes [CHECKPOINT] META_ANOMALY_REVIEW triggered for 5 records Waiting for human confirmation...
Universal Codebook Integration (v2.1)
Phase 4: Final Validation
C5 owns the final validation phase of the Universal Codebook workflow:
def validate_final(verified_data, require_all_verified=True, require_all_signed_off=True): """ Final validation before dataset is ready for analysis. Used in Phase 4 of Universal Codebook workflow. Returns: {status, issues, can_proceed} """ issues = [] # Check verification status pending_count = sum(1 for r in verified_data if r["verified_status"] == "PENDING") if pending_count > 0 and require_all_verified: issues.append({ "type": "VERIFICATION_INCOMPLETE", "count": pending_count, "message": f"{pending_count} records still PENDING verification" }) # Check sign-off unsigned_count = sum(1 for r in verified_data if not r.get("sign_off", False)) if unsigned_count > 0 and require_all_signed_off: issues.append({ "type": "SIGNOFF_INCOMPLETE", "count": unsigned_count, "message": f"{unsigned_count} records missing sign-off" }) # Run gate validation on verified data for record in verified_data: gate_results = run_all_gates(record) if not all(gate_results.values()): failed_gates = [g for g, passed in gate_results.items() if not passed] issues.append({ "type": "GATE_FAILURE", "es_id": record["es_id"], "failed_gates": failed_gates }) return { "status": "APPROVED" if not issues else "BLOCKED", "issues": issues, "can_proceed": len(issues) == 0, "summary": { "total_records": len(verified_data), "verified": len(verified_data) - pending_count, "signed_off": len(verified_data) - unsigned_count, "gates_passed": len(verified_data) - len([i for i in issues if i["type"] == "GATE_FAILURE"]) } } def run_all_gates(record): """Run all 4 gates on a single record.""" return { "gate_1_extraction": validate_gate_1(record), "gate_2_classification": validate_gate_2(record), "gate_3_statistical": validate_gate_3(record), "gate_4_independence": validate_gate_4(record) }
Codebook Workflow Orchestration
def orchestrate_codebook_workflow(pdf_folder, project_name): """ Full Universal Codebook workflow orchestration. Phases: 1. AI Extraction (C6) 2. Triage (C7) 3. Human Review (Manual, generates queue) 4. Final Validation (C5) """ # Phase 1: AI Extraction print(f"[PHASE 1] Starting AI extraction from {pdf_folder}") extraction_result = c6.extract_with_provenance( pdf_folder=pdf_folder, methods=["rag", "ocr"], reconciliation="hierarchy" ) print(f" Extracted: {len(extraction_result)} records") # Phase 2: Triage print("[PHASE 2] Triaging extractions") triage_result = c7.triage_extractions(extraction_result) queue = c7.generate_review_queue(triage_result) print(f" Review queue: {len(queue)} records need review") print(f" Priority 1 (conflicts): {sum(1 for q in queue if q['priority'] == 1)}") print(f" Priority 2 (low conf): {sum(1 for q in queue if q['priority'] == 2)}") # Phase 3: Human Review print("[PHASE 3] Generating review queue for human reviewers") export_review_queue(queue, f"{project_name}_review_queue.xlsx") print(" Queue exported. Waiting for human verification...") # Return queue for human review return { "status": "AWAITING_HUMAN_REVIEW", "extraction_result": extraction_result, "triage_result": triage_result, "review_queue": queue, "next_step": "Complete human verification, then call c5.validate_final()" }
Error Messages
| Code | Message | Action |
|---|---|---|
| Missing required field: {field} | Reject record |
| Cannot classify ES type | Flag for review |
| Cannot calculate Hedges' g | Trigger SD recovery |
| Pre-test outcome detected | Auto-reject |
| Extreme value detected: g={value} | Human review |
| Data completeness below 40% | Human review required |
| Records still PENDING verification | Block final |
| Records missing sign-off | Block final |
Version History
- 1.0.0 (2026-01-26): Initial release based on V7 GenAI meta-analysis lessons
Absorbed Capabilities (v11.0)
From C6 — Data Integrity Guard
- Data Completeness Validation: Verify all required fields, flag missing/implausible values, cross-reference against source tables
- Hedges' g Calculation: Convert from Cohen's d with small-sample correction factor J, compute variance, handle multi-arm studies
- SD Recovery Methods: Recover SD from SE, CI, t-statistic, F-statistic, or p-value
- Extraction from PDFs: Locate and extract data from tables, figures, and supplementary materials
From C7 — Error Prevention Engine
- Pattern Detection: Detect duplicate study entries, impossible values, effect size direction inconsistencies, unit-of-analysis misalignment
- Anomaly Alerts: Flag extreme outlier effect sizes, deviating sample sizes, suspiciously uniform effects, data fabrication indicators (GRIM/SPRITE)
- Data Quality Flags: GREEN/YELLOW/RED flag system with summary table for reviewer inspection
From B3 — Effect Size Extractor
- Optimal Effect Size Selection: Match type to research question, prefer standardized measures for cross-study comparability
- Conversion Between Types: Cohen's d <-> Hedges' g, d <-> r, d <-> OR, eta-squared <-> d, F/t/chi-square conversions
- Context-Appropriate Measures: Hedges' g for group comparison, Fisher's z for correlation, OR/RR for binary outcomes, Freeman-Tukey for proportions
From E5 — Sensitivity Analysis (Meta)
- Leave-One-Out Analysis: Sequentially remove each study, identify influential studies, report range of pooled effects
- Trim-and-Fill Method: Estimate missing studies, impute and compute adjusted pooled effect
- Publication Bias Tests: Funnel plot, Egger's test, Begg rank correlation, PET-PEESE, p-curve, selection models
- Influence Diagnostics: Cook's distance, DFBETAS, Baujat plot, Galbraith/radial plot
Related Agents
- B2-EvidenceQualityAppraiser: Quality assessment
- E1-QuantitativeAnalysisGuide: Analysis method guidance
References
- Lipsey & Wilson (2001). Practical Meta-Analysis
- Borenstein et al. (2021). Introduction to Meta-Analysis, 2nd ed.
- PRISMA 2020 Guidelines
- Cochrane Handbook for Systematic Reviews