Awesome-Agent-Skills-for-Empirical-Research universal-ma-codebook

install

source · Clone the upstream repo

git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/25-HosungYou-Diverga/skills/universal-ma-codebook" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-universal-ma-code && rm -rf "$T"

manifest: skills/25-HosungYou-Diverga/skills/universal-ma-codebook/SKILL.md

source content

Universal Meta-Analysis Codebook

Version: 2.2
Status: Production
Codex Review: APPROVE WITH MINOR CHANGES (2026-01-26)
Update: Context-specific extensions (2026-01-26)

Purpose

A universal, AI-Human collaboration codebook for meta-analysis that enables:

AI extraction from PDFs (RAG/OCR) with confidence tracking
Human verification of AI-extracted values
100% human-verified data through a structured workflow
Integration with Diverga C5/C6/C7 agents and Category I pipeline
Context-specific extensions for domain-specific moderator variables

Context-Specific Extensions

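The Universal Codebook supports project-specific moderator layers that extend the base 4-layer structure. An extension inserts its moderators as Layer 3, which shifts AI extraction provenance and human verification to Layers 4 and 5. Each meta-analysis context may define its own moderator variables.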

Extension Architecture

┌─────────────────────────────────────────────────────────────────────┐
│              UNIVERSAL CODEBOOK WITH CONTEXT EXTENSION              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  LAYER 1: IDENTIFIERS + METADATA (10 fields) ← Universal           │
│  LAYER 2: CORE STATISTICAL VALUES (18 fields) ← Universal          │
│  LAYER 3: CONTEXT-SPECIFIC MODERATORS ← Project Extension          │
│  LAYER 4: AI EXTRACTION PROVENANCE ← Universal                     │
│  LAYER 5: HUMAN VERIFICATION (8 fields) ← Universal                │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Available Context Extensions

Context	Extension File	Moderator Count
GenAI-HE	`GENAI_HE_CODEBOOK.md`	15 moderators
Clinical Trials	`CLINICAL_CODEBOOK.md`	TBD
Educational Tech	`EDTECH_CODEBOOK.md`	TBD

Creating a Context Extension

Define moderator variables specific to your research domain
Create classification rules for categorical moderators
Write AI extraction prompts for each moderator
Configure C6 agent with the extension schema

# Example: Configure C6 for GenAI-HE context
c6.configure_extension(
    context="genai_he",
    moderators=[
        {"name": "genai_tool", "type": "categorical", "values": ["ChatGPT", "Claude", ...]},
        {"name": "blooms_level", "type": "ordinal", "values": ["remember", "understand", ...]},
        {"name": "study_design", "type": "categorical", "values": ["RCT", "quasi", ...]},
    ],
    extraction_prompts=GENAI_HE_PROMPTS
)

GenAI-HE Extension (Example)

Layer 3: GenAI-HE Moderator Variables (15 fields)

Category	Fields
GenAI Tool	genai_tool, genai_tool_version, genai_access_type
Educational Outcome	blooms_level, outcome_dimension, learning_domain
Study Design	study_design, intervention_duration, intervention_type, control_condition
Context	education_level, discipline, country, sample_size_total, publication_type

See GenAI-HE-Review-AIMC/docs/GENAI_HE_CODEBOOK.md for the full specification.

Architecture: Four-Layer Design

┌─────────────────────────────────────────────────────────────────────┐
│              UNIVERSAL META-ANALYSIS CODEBOOK v2.2                  │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  LAYER 1: IDENTIFIERS + METADATA (10 fields)                        │
│  study_id, es_id, citation, doi, year, design_type,                │
│  timepoint, arm_label_treat, arm_label_control, unit_of_analysis   │
│                                                                     │
│  LAYER 2: CORE STATISTICAL VALUES (18 fields)                       │
│  Primary: outcome_name → se_g (12)                                  │
│  Change-score: pre_mean_treat, pre_sd_treat, pre_post_corr (3)     │
│  Cluster: cluster_size, icc, n_clusters (3)                        │
│                                                                     │
│  LAYER 3: AI EXTRACTION PROVENANCE                                  │
│  Per-value: ai_value, source, method, confidence, derived_from     │
│  Stored as: ai_extraction_json                                     │
│                                                                     │
│  LAYER 4: HUMAN VERIFICATION (8 fields)                             │
│  verified_status, verified_by, verified_date, corrections_json,    │
│  disagreement_resolved, final_values_json, verification_notes,     │
│  sign_off                                                          │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Workflow: AI-Human Collaboration

Phase 1: AI Extraction (Automated)

Triggered by: I3 RAG building completion or manual PDF upload

Agent: C6-DataIntegrityGuard

# C6 extracts statistical values from PDFs
extraction_result = c6.extract_with_provenance(
    pdf_folder="./pdfs",
    methods=["rag", "ocr"],
    reconciliation="hierarchy",
    log_all_candidates=True
)

Actions:

I3 builds RAG from PDFs
C6 queries for statistical values (M, SD, n)
Multiple extraction methods run in parallel
Conflict resolution applied (hierarchy + tolerance)
Provenance recorded for all extractions
Hedges' g calculated where inputs complete

Output: All rows → verified_status = PENDING

Phase 2: Triage (Automated)

Agent: C7-ErrorPreventionEngine

# C7 categorizes by effective confidence
triage_result = c7.triage_extractions(
    data=extraction_result,
    thresholds=CONFIGURABLE_THRESHOLDS
)

Categories:

Confidence	Status	Action
HIGH (≥90%)	PROVISIONAL	Awaits sign-off
MEDIUM (70-89%)	PENDING	Recommended review
LOW (<70%)	PENDING	Required review (priority)
CONFLICT	PENDING	Required review (top priority)
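The triage table can be read as a mapping from effective confidence to a status and an action. The following is a minimal sketch of that mapping, not the C7 implementation: the `triage` function, the `Triage` tuple, and the `has_conflict` flag are hypothetical names, and the default cut-offs simply mirror the table.

```python
from typing import NamedTuple


class Triage(NamedTuple):
    category: str
    status: str
    action: str


def triage(effective_confidence: float, has_conflict: bool = False,
           high: float = 90.0, medium: float = 70.0) -> Triage:
    """Map an effective confidence (0-100) to a triage category.

    Thresholds are parameters because the codebook describes them
    as configurable; the defaults follow the table above.
    """
    if has_conflict:
        # A detected conflict overrides any confidence score.
        return Triage("CONFLICT", "PENDING", "Required review (top priority)")
    if effective_confidence >= high:
        return Triage("HIGH", "PROVISIONAL", "Awaits sign-off")
    if effective_confidence >= medium:
        return Triage("MEDIUM", "PENDING", "Recommended review")
    return Triage("LOW", "PENDING", "Required review (priority)")


print(triage(92))
print(triage(75))
print(triage(95, has_conflict=True))
```

Note that even a HIGH row only becomes PROVISIONAL; no category skips human verification.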

Phase 3: Human Review (Mandatory)

Interface: Excel Review Queue or Web UI

Critical Rule: ALL rows require human verification

Priority Queue:

Conflicts detected (highest)
LOW confidence
MEDIUM confidence
HIGH confidence (spot check)
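One way to produce this ordering is a sort keyed on the triage category. This sketch assumes each row is a dict with hypothetical `category` and `confidence` keys; breaking ties by ascending confidence is an assumption, not something the codebook specifies.

```python
# Lower number = reviewed earlier, following the priority list above.
PRIORITY = {"CONFLICT": 0, "LOW": 1, "MEDIUM": 2, "HIGH": 3}


def build_review_queue(rows):
    """Return rows ordered for review; within a category, least confident first."""
    return sorted(rows, key=lambda r: (PRIORITY[r["category"]], r["confidence"]))


# Illustrative rows only; the IDs and scores are made up.
rows = [
    {"es_id": "A_01", "category": "HIGH", "confidence": 95},
    {"es_id": "B_01", "category": "LOW", "confidence": 55},
    {"es_id": "C_01", "category": "CONFLICT", "confidence": 88},
    {"es_id": "D_01", "category": "MEDIUM", "confidence": 74},
]
queue = build_review_queue(rows)
print([r["es_id"] for r in queue])
```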

Human Actions:

Verify AI extraction against PDF
Correct errors, record reason
Mark as VERIFIED or REJECTED
Resolve conflicts

Phase 4: Final Sign-Off

Agent: C5-MetaAnalysisMaster

# C5 validates all gates pass
validation = c5.validate_final(
    data=verified_data,
    require_all_verified=True,
    require_all_signed_off=True
)

Requirements:

All rows: verified_status = VERIFIED
All rows: sign_off = True
All gates pass (C5 validation)

Result: 100% Human-Verified Dataset
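The two row-level requirements are simple enough to check directly before handing the data to C5. A hedged sketch follows, assuming rows are dicts keyed by the Layer 4 field names; `blocking_rows` is a hypothetical helper and does not reproduce whatever additional gates `c5.validate_final` applies.

```python
def blocking_rows(rows):
    """Return the es_id of every row that would block final sign-off."""
    return [
        r["es_id"]
        for r in rows
        if r.get("verified_status") != "VERIFIED" or r.get("sign_off") is not True
    ]


# Illustrative rows only.
rows = [
    {"es_id": "CHEN_2024_01", "verified_status": "VERIFIED", "sign_off": True},
    {"es_id": "CHEN_2024_02", "verified_status": "PROVISIONAL", "sign_off": False},
    {"es_id": "CHEN_2024_03", "verified_status": "VERIFIED", "sign_off": False},
]
print(blocking_rows(rows))
```

An empty list means both row-level requirements hold for the whole dataset.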

Field Specifications

Layer 1: Identifiers + Metadata

Field	Type	Description	Example
`study_id`	str	Unique study identifier	"CHEN_2024"
`es_id`	str	Effect size ID	"CHEN_2024_01"
`citation`	str	Full APA citation	"Chen et al. (2024)..."
`doi`	str	DOI	"10.1000/xyz"
`year`	int	Publication year	2024
`design_type`	str	RCT\|QUASI\|PRE_POST	"RCT"
`timepoint`	str	Measurement timing	"post"
`arm_label_treat`	str	Treatment label	"ChatGPT group"
`arm_label_control`	str	Control label	"Traditional"
`unit_of_analysis`	str	individual\|cluster	"individual"

Layer 2: Core Statistical Values

Primary Statistics

Field	Type	Required
`outcome_name`	str	Yes
`outcome_unit`	str	No
`es_type`	str	Yes
`analysis_type`	str	No
`n_treatment`	int	Yes
`n_control`	int	Yes
`m_treatment`	float	Conditional
`sd_treatment`	float	Conditional
`m_control`	float	Conditional
`sd_control`	float	Conditional
`hedges_g`	float	Derived
`se_g`	float	Derived

Change-Score Fields (when es_type = CHANGE)

Field	Type	Description
`pre_mean_treat`	float	Pre-test mean
`pre_sd_treat`	float	Pre-test SD
`pre_post_corr`	float	Pre-post correlation (default 0.5)
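The pre-post correlation matters because it determines the standard deviation of a change score. The sketch below uses the standard formula for the SD of a difference between two correlated measurements; whether C6 applies exactly this formula is not stated in the codebook, and `sd_change` is a hypothetical helper name.

```python
import math


def sd_change(sd_pre: float, sd_post: float, pre_post_corr: float = 0.5) -> float:
    """SD of (post - pre) given both SDs and their correlation.

    The default of 0.5 mirrors the documented default for pre_post_corr.
    """
    variance = sd_pre**2 + sd_post**2 - 2 * pre_post_corr * sd_pre * sd_post
    return math.sqrt(variance)


print(sd_change(10.0, 10.0))        # default correlation
print(sd_change(10.0, 10.0, 0.8))   # higher correlation gives a smaller SD
```

Because the result is sensitive to the correlation, a defaulted value should be documented as such during verification.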

Cluster Fields (when unit_of_analysis = cluster)

Field	Type	Description
`cluster_size`	float	Average cluster size
`icc`	float	Intra-class correlation
`n_clusters`	int	Number of clusters
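These fields are typically what is needed to adjust for clustering through a design effect. The following sketch shows the usual design-effect correction, 1 + (m − 1) × ICC, where m is the average cluster size; it is an assumption that the pipeline uses the fields this way, and both function names are hypothetical.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def effective_n(n: int, cluster_size: float, icc: float) -> float:
    """Sample size after dividing by the design effect."""
    return n / design_effect(cluster_size, icc)


# Illustrative values: 200 participants in clusters of about 21, ICC = 0.05.
print(design_effect(21, 0.05))
print(effective_n(200, 21, 0.05))
```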

Layer 3: AI Extraction Provenance

Stored in ai_extraction_json:

{
  "n_treatment": {
    "ai_value": 43,
    "source": "Table 2, p.8",
    "method": "OCR",
    "confidence": 85,
    "derived_from": null
  },
  "sd_treatment": {
    "ai_value": 12.5,
    "source": "Text p.11, 95% CI",
    "method": "CALCULATED",
    "confidence": 92,
    "derived_from": "CI_95: SE = (14.8-10.2)/3.92"
  }
}
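A reviewer-side script can load this JSON and check that every entry carries the five per-value keys. This is a minimal sketch using only the standard library; `load_provenance` and `derived_fields` are hypothetical helpers, and treating a missing key as an error is an assumption.

```python
import json

# Same payload as the example above, held as a string for illustration.
ai_extraction_json = """
{
  "n_treatment": {"ai_value": 43, "source": "Table 2, p.8", "method": "OCR",
                  "confidence": 85, "derived_from": null},
  "sd_treatment": {"ai_value": 12.5, "source": "Text p.11, 95% CI",
                   "method": "CALCULATED", "confidence": 92,
                   "derived_from": "CI_95: SE = (14.8-10.2)/3.92"}
}
"""

REQUIRED_KEYS = {"ai_value", "source", "method", "confidence", "derived_from"}


def load_provenance(raw: str) -> dict:
    """Parse ai_extraction_json and check each entry has all five keys."""
    data = json.loads(raw)
    for field, entry in data.items():
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise ValueError(f"{field}: missing keys {sorted(missing)}")
    return data


def derived_fields(provenance: dict) -> list:
    """Fields whose value was calculated rather than read directly."""
    return [f for f, e in provenance.items() if e["derived_from"] is not None]


provenance = load_provenance(ai_extraction_json)
print(derived_fields(provenance))
```

Listing derived fields separately is useful because their verification depends on the source values they were calculated from.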

Layer 4: Human Verification

Field	Type	Values
`verified_status`	str	PENDING\|PROVISIONAL\|VERIFIED\|REJECTED
`verified_by`	str	Reviewer initials
`verified_date`	date	Review date
`corrections_json`	json	{field: {ai_value, final_value, reason}}
`disagreement_resolved`	bool	Conflict resolved?
`final_values_json`	json	Human-confirmed values
`verification_notes`	str	Free text notes
`sign_off`	bool	Final approval
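The relationship between `corrections_json` and `final_values_json` can be sketched as an overlay: start from the AI values and replace any field the reviewer corrected. This is an illustrative reading of the two fields, not a documented algorithm, and `apply_corrections` is a hypothetical helper.

```python
def apply_corrections(ai_values: dict, corrections: dict) -> dict:
    """Build final values by overlaying reviewer corrections on AI values.

    corrections follows the documented shape:
    {field: {"ai_value": ..., "final_value": ..., "reason": ...}}
    """
    final = dict(ai_values)  # copy so the AI values stay untouched
    for field, change in corrections.items():
        final[field] = change["final_value"]
    return final


# Illustrative data; the correction and its reason are made up.
ai_values = {"n_treatment": 43, "sd_treatment": 12.5}
corrections = {
    "n_treatment": {"ai_value": 43, "final_value": 48, "reason": "OCR misread digit"}
}
final_values = apply_corrections(ai_values, corrections)
print(final_values)
```

Keeping the original AI value inside each correction preserves the audit trail even after the final value changes.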

Confidence Thresholds (Configurable)

Per-Field Thresholds

Field	HIGH	MEDIUM	LOW
n (sample size)	≥95%	80-94%	<80%
M (mean)	≥90%	70-89%	<70%
SD	≥85%	65-84%	<65%
hedges_g (derived)	≥92%	75-91%	<75%
se_g (derived)	≥92%	75-91%	<75%
pre_post_corr	≥85%	65-84%	<65%
icc	≥80%	60-79%	<60%
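Per-field thresholds amount to a lookup table of two cut-offs per field. The sketch below encodes the table as written; the short keys `n`, `m`, and `sd` and the function name `confidence_band` are assumptions for illustration, and values between the printed integer bands (for example 94.5) are assigned to the lower band.

```python
# (HIGH cut-off, MEDIUM cut-off) per field, in percent.
FIELD_THRESHOLDS = {
    "n": (95, 80),
    "m": (90, 70),
    "sd": (85, 65),
    "hedges_g": (92, 75),
    "se_g": (92, 75),
    "pre_post_corr": (85, 65),
    "icc": (80, 60),
}


def confidence_band(field: str, confidence: float) -> str:
    """Classify a confidence score using that field's own thresholds."""
    high, medium = FIELD_THRESHOLDS[field]
    if confidence >= high:
        return "HIGH"
    if confidence >= medium:
        return "MEDIUM"
    return "LOW"


print(confidence_band("n", 94))    # sample sizes have the strictest HIGH bar
print(confidence_band("sd", 85))
```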

Per-Source Modifiers

Source	Modifier
Structured table	+10%
Semi-structured figure	+5%
Unstructured text	0%
Abstract only	-15%
OCR with artifacts	-20%

Formula:

effective_confidence = base_confidence + source_modifier
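As a worked example of the formula, the sketch below treats the modifiers as percentage points added to the base score. The source keys are hypothetical labels for the table rows, and clamping the result to the 0-100 range is an assumption the codebook does not state.

```python
# Modifiers in percentage points, taken from the Per-Source Modifiers table.
SOURCE_MODIFIERS = {
    "structured_table": 10,
    "semi_structured_figure": 5,
    "unstructured_text": 0,
    "abstract_only": -15,
    "ocr_with_artifacts": -20,
}


def effective_confidence(base_confidence: float, source: str) -> float:
    """base_confidence + source_modifier, clamped to [0, 100] (assumed)."""
    raw = base_confidence + SOURCE_MODIFIERS[source]
    return max(0.0, min(100.0, raw))


print(effective_confidence(85, "structured_table"))
print(effective_confidence(85, "ocr_with_artifacts"))
```

With these inputs, the same base score of 85 lands at 95 from a structured table and at 65 from OCR with artifacts.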

Conflict Resolution

Extraction Hierarchy

Priority	Source	Weight
1	Table cell	1.0
2	Figure data	0.9
3	In-text stats	0.8
4	Abstract	0.5
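When several methods return a candidate for the same value, the hierarchy suggests preferring the highest-priority source. The following sketch selects on priority alone; how the Weight column enters the real reconciliation is not specified here, so it is left out. The `source_type` key and its labels are hypothetical.

```python
# Priority ranks from the Extraction Hierarchy table (1 = most trusted).
SOURCE_PRIORITY = {
    "table_cell": 1,
    "figure_data": 2,
    "in_text_stats": 3,
    "abstract": 4,
}


def pick_by_hierarchy(candidates):
    """Return the candidate from the highest-priority source."""
    return min(candidates, key=lambda c: SOURCE_PRIORITY[c["source_type"]])


# Illustrative candidates for one field.
candidates = [
    {"value": 44, "source_type": "abstract"},
    {"value": 43, "source_type": "table_cell"},
    {"value": 43, "source_type": "in_text_stats"},
]
print(pick_by_hierarchy(candidates))
```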

Tolerance Thresholds

Value Type	Relative	Absolute
n (sample size)	5%	2
M (mean)	10%	0.5
SD	15%	0.5

Rule: If disagreement exceeds EITHER threshold → Human review required
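The EITHER rule can be expressed as a logical OR over the two checks. In this sketch the relative difference is measured against the larger of the two magnitudes, which is an assumption since the codebook does not define the denominator; `needs_human_review` and the short keys are hypothetical.

```python
# (relative tolerance, absolute tolerance) per value type, from the table above.
TOLERANCES = {
    "n": (0.05, 2),
    "m": (0.10, 0.5),
    "sd": (0.15, 0.5),
}


def needs_human_review(value_type: str, a: float, b: float) -> bool:
    """True if two extracted values disagree beyond EITHER threshold."""
    rel_tol, abs_tol = TOLERANCES[value_type]
    abs_diff = abs(a - b)
    denom = max(abs(a), abs(b))
    rel_diff = abs_diff / denom if denom else 0.0
    return abs_diff > abs_tol or rel_diff > rel_tol


print(needs_human_review("n", 43, 44))    # within both thresholds
print(needs_human_review("m", 2.0, 2.3))  # small absolute gap, large relative gap
```

Using OR rather than AND keeps the check strict for small values, where a modest absolute gap can still be a large relative disagreement.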

Derived Value Verification

For calculated values (hedges_g, se_g), human verification means:

All source values (M, SD, n) verified
Formula appropriate for es_type
If change-score: pre_post_corr verified or documented default
If cluster: ICC and cluster_size verified
Final values recalculated from verified inputs
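Recalculating from verified inputs can be sketched with the standard formulas for a standardized mean difference between two independent groups, with the small-sample correction J = 1 − 3/(4·df − 1). This covers only the basic independent-groups case; it is not confirmed to be the exact routine C6 uses, and the change-score and cluster adjustments are not included.

```python
import math


def hedges_g(m_t, sd_t, n_t, m_c, sd_c, n_c):
    """Return (hedges_g, se_g) for two independent groups."""
    df = n_t + n_c - 2
    # Pooled within-group standard deviation.
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (m_t - m_c) / sd_pooled
    # Small-sample correction factor.
    j = 1 - 3 / (4 * df - 1)
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return j * d, j * math.sqrt(var_d)


# Illustrative verified inputs, not taken from any study.
g, se = hedges_g(m_t=80, sd_t=10, n_t=50, m_c=75, sd_c=10, n_c=50)
print(round(g, 4), round(se, 4))
```

Running the function on human-verified inputs and comparing the result with the stored `hedges_g` and `se_g` gives a direct check on the derived values.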

Integration Points

Systematic Review Pipeline Integration

# After Stage 5 (RAG building)
# RAG query integration

rag = RAGQuery(project_path)
values = c6.extract_from_rag(
    rag=rag,
    fields=["n_treatment", "n_control", "m_treatment", "sd_treatment",
            "m_control", "sd_control"],
    fallback_to_ocr=True
)

C5/C6/C7 Agent Roles

Agent	Role in Codebook
C5-MetaAnalysisMaster	Final validation, gate enforcement
C6-DataIntegrityGuard	Extraction, Hedges' g calculation
C7-ErrorPreventionEngine	Triage, conflict detection, warnings

Excel Template Structure

Sheet 1: Codebook

One-page reference with field definitions

Sheet 2: Data (Main)

41 columns (39 visible + 2 JSON)

Sheet 3: Review Queue

Priority-ordered list of rows needing human review

Sheet 4: Extraction Log

Audit trail of all AI extractions

Success Metrics

Metric	Target
AI extraction rate	≥85%
AI confidence accuracy	≥90%
Conflict detection rate	≥95%
Human review completeness	100%
Final sign-off rate	100%
Data completeness (Hedges' g)	≥95%

Commands

# Initialize codebook for a project
diverga codebook init --project genai-he

# Import AI extractions
diverga codebook import --source rag --project genai-he

# Generate review queue
diverga codebook queue --project genai-he

# Validate final dataset
diverga codebook validate --project genai-he

References

Plan document: docs/plans/META_ANALYSIS_CODEBOOK_PLAN_V2.md
C5 agent: .claude/skills/C5-meta-analysis-master/SKILL.md
C6 agent: .claude/skills/C6-data-integrity-guard/SKILL.md
C7 agent: .claude/skills/C7-error-prevention-engine/SKILL.md

Borenstein et al. (2021). Introduction to Meta-Analysis
Cochrane Handbook Chapter 6: Extracting Data
PRISMA 2020 Statement

Created: 2026-01-26
Codex Review: APPROVE WITH MINOR CHANGES
Author: Claude Code