Claude-skill-registry data-extraction
Use when extracting structured data from medical research PDFs, parsing study characteristics, patient demographics, outcomes, and results. Invoke for systematic review data collection from papers.
install
source · Clone the upstream repo
git clone https://github.com/majiayu000/claude-skill-registry
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/data-extraction" ~/.claude/skills/majiayu000-claude-skill-registry-data-extraction && rm -rf "$T"
manifest: skills/data/data-extraction/SKILL.md
Data Extraction Skill
This skill guides structured data extraction from research papers for systematic reviews.
When to Use
Invoke this skill when the user:
- Asks to extract data from a PDF
- Needs study characteristics pulled
- Wants patient demographics collected
- Requests outcome data extraction
- Mentions "data extraction" or "data collection"
Data Elements to Extract
1. Study Identification
| Field | Description | Example |
|---|---|---|
| study_id | FirstAuthorYear format | "Smith2023" |
| pmid | PubMed ID | "37654321" |
| doi | Digital Object Identifier | "10.1001/jamasurg.2023.1234" |
| title | Full article title | "..." |
2. Study Characteristics
| Field | Description | Values |
|---|---|---|
| year | Publication year | 2020 |
| country | Study location | "USA", "Japan" |
| study_design | Design type | "RCT", "Retrospective cohort" |
| multicenter | Single/multi | true/false |
| study_period | Enrollment dates | "2015-2020" |
3. Patient Demographics
| Field | Format | Notes |
|---|---|---|
| sample_size | Integer | Total N |
| age_mean | Number | Mean age |
| age_sd | Number | Standard deviation |
| age_median | Number | Record when no mean is reported |
| age_iqr | [Q1, Q3] | Interquartile range |
| male_percent | 0-100 | Percentage male |
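The formats in the table above can be checked mechanically. The following is a minimal sketch, assuming the demographics are held in a plain Python dict keyed by the field names above; the function name `check_demographics` and the wording of the messages are illustrative and not part of the skill.

```python
def check_demographics(demo: dict) -> list:
    """Return a list of problems found in a patient-demographics record."""
    problems = []

    n = demo.get("sample_size")
    # bool is a subclass of int, so it is excluded explicitly
    if not isinstance(n, int) or isinstance(n, bool) or n <= 0:
        problems.append("sample_size must be a positive integer")

    pct = demo.get("male_percent")
    if pct is not None and not 0 <= pct <= 100:
        problems.append("male_percent must be between 0 and 100")

    sd = demo.get("age_sd")
    if sd is not None and sd < 0:
        problems.append("age_sd must be non-negative")

    iqr = demo.get("age_iqr")
    if iqr is not None and (len(iqr) != 2 or iqr[0] > iqr[1]):
        problems.append("age_iqr must be [Q1, Q3] with Q1 <= Q3")

    return problems


# Hypothetical usage with values in the style of this document's examples
print(check_demographics({"sample_size": 150, "age_mean": 58.3,
                          "age_sd": 12.4, "male_percent": 62}))
```

An empty list means none of these format checks failed; it does not confirm that the values match the paper.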
4. Clinical Characteristics (Neurosurgery)
Common scales and measures:
- GCS (Glasgow Coma Scale): 3-15
- GOS (Glasgow Outcome Scale): 1-5
- mRS (modified Rankin Scale): 0-6
- NIHSS (NIH Stroke Scale): 0-42
- Hunt-Hess: I-V
- Fisher Grade: 1-4
- WHO Grade: I-IV (tumors)
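A range check against the scales listed above can catch transcription errors early. This is a sketch only: the names `SCALE_RANGES`, `ROMAN`, and `score_in_range` are hypothetical, and mapping Roman-numeral grades to integers is an assumption made here for convenience.

```python
# Ranges copied from the list above; Roman-numeral grades are mapped to integers.
SCALE_RANGES = {
    "GCS": (3, 15),
    "GOS": (1, 5),
    "mRS": (0, 6),
    "NIHSS": (0, 42),
    "Hunt-Hess": (1, 5),
    "Fisher": (1, 4),
    "WHO": (1, 4),
}

ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5}


def score_in_range(scale: str, score) -> bool:
    """Return True if `score` lies within the listed range for `scale`."""
    low, high = SCALE_RANGES[scale]
    if isinstance(score, str):
        # Convert "IV" to 4; leave other strings for float() to handle
        score = ROMAN.get(score.strip().upper(), score)
    try:
        value = float(score)
    except (TypeError, ValueError):
        return False
    return low <= value <= high


print(score_in_range("GCS", 2))          # below the 3-15 range
print(score_in_range("Hunt-Hess", "IV"))
```

Non-integer values are accepted on purpose, since papers often report a group mean or median for a scale rather than individual scores.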
5. Intervention Details
    intervention:
      name: "Decompressive craniectomy"
      type: "Surgical"
      technique: "Unilateral frontotemporoparietal"
      timing: "Within 48 hours"
      details: "Bone flap ≥12cm diameter"
6. Outcome Data
Binary Outcomes (events/total)
    outcomes:
      - name: "Mortality"
        type: "binary"
        timepoint: "30 days"
        intervention:
          events: 12
          total: 50
        control:
          events: 25
          total: 52
Continuous Outcomes (mean ± SD)
    outcomes:
      - name: "Length of stay"
        type: "continuous"
        timepoint: "discharge"
        intervention:
          mean: 14.5
          sd: 6.2
          n: 50
        control:
          mean: 18.3
          sd: 7.1
          n: 52
Effect Estimates
    effect_estimate:
      measure: "OR"  # OR, RR, HR, MD, SMD
      value: 0.65
      ci_lower: 0.42
      ci_upper: 0.98
      p_value: 0.038
Extraction Principles
DO:
- Extract only explicitly stated data
- Record the exact numbers from the paper
- Note units (mg, mm, days, months)
- Specify timepoints for each outcome
- Flag unclear or ambiguous values with "?"
- Document page numbers for key data
DON'T:
- Calculate or derive values unless the analysis requires it (e.g., converting median/IQR to mean/SD); note any derived value in comments
- Assume missing data
- Interpret unclear statements
- Mix timepoints within outcomes
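Values flagged with "?" are easier to follow up on if they can be listed automatically. The sketch below walks a nested record and returns the path of every flagged value. It assumes the record is held as nested Python dicts and lists, and that a flag is any string value containing "?"; both the helper name `find_flagged` and that reading of the convention are assumptions.

```python
def find_flagged(record, path=""):
    """Return the paths of all string values that contain a "?" flag."""
    flagged = []
    if isinstance(record, dict):
        for key, value in record.items():
            child = f"{path}.{key}" if path else str(key)
            flagged.extend(find_flagged(value, child))
    elif isinstance(record, list):
        for index, value in enumerate(record):
            flagged.extend(find_flagged(value, f"{path}[{index}]"))
    elif isinstance(record, str) and "?" in record:
        flagged.append(path)
    return flagged


# Hypothetical record with two uncertain values
record = {
    "study_id": "Smith2023",
    "patient_demographics": {"age_mean": "58.3?"},
    "outcomes": [{"name": "Mortality", "timepoint": "?"}],
}
print(find_flagged(record))
# → ['patient_demographics.age_mean', 'outcomes[0].timepoint']
```

Free-text fields such as an article title can legitimately contain a question mark, so the result is a list of candidates to review rather than a definitive list of flags.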
Quality Checks
After extraction, verify:
- Sample sizes sum correctly across groups
- Event counts ≤ total participants
- Percentages add to ~100%
- CIs contain the point estimate
- P-values agree with CIs (for OR/RR, p < 0.05 corresponds to a 95% CI that excludes 1)
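Several of the checks above are simple enough to automate. The following is a minimal sketch, assuming outcome arms and effect estimates are dicts with the field names used in this document, that the interval is a 95% CI, and that significance is judged at 0.05. The function names and the `NULL_VALUE` table are illustrative.

```python
# Null value for each effect measure: 1 for ratios, 0 for differences
NULL_VALUE = {"OR": 1.0, "RR": 1.0, "HR": 1.0, "MD": 0.0, "SMD": 0.0}


def check_arm_totals(sample_size: int, arms: list) -> list:
    """Check that the arm totals add up to the overall sample size."""
    total = sum(arm["total"] for arm in arms)
    if total != sample_size:
        return [f"arm totals sum to {total}, expected {sample_size}"]
    return []


def check_binary_arm(arm: dict) -> list:
    """Check that the event count does not exceed the number of participants."""
    if not 0 <= arm["events"] <= arm["total"]:
        return ["events must be between 0 and total"]
    return []


def check_effect_estimate(est: dict, alpha: float = 0.05) -> list:
    """Check that the CI contains the estimate and agrees with the p-value."""
    problems = []
    low, high, value = est["ci_lower"], est["ci_upper"], est["value"]
    if not low <= value <= high:
        problems.append("CI does not contain the point estimate")
    null = NULL_VALUE.get(est["measure"])
    p = est.get("p_value")
    if null is not None and p is not None:
        ci_excludes_null = high < null or low > null
        if (p < alpha) != ci_excludes_null:
            problems.append("p-value and CI disagree about significance")
    return problems


estimate = {"measure": "OR", "value": 0.65, "ci_lower": 0.42,
            "ci_upper": 0.98, "p_value": 0.038}
print(check_effect_estimate(estimate))  # → []
```

A reported problem is a prompt to re-read the paper, not proof of an extraction error: analysed arm totals can differ from the enrolled sample size because of exclusions or loss to follow-up, and a paper may report a CI level other than 95%.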
Common Issues
Converting Median/IQR to Mean/SD
When only the median and IQR are reported:
    Mean ≈ Median (for symmetric distributions)
    SD ≈ IQR / 1.35 (for normal distributions)
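As a worked illustration of these two approximations, the sketch below converts a median and IQR bounds into an approximate mean and SD. The function name `approx_mean_sd` is hypothetical, and the result is only as good as the symmetry and normality assumptions stated above.

```python
def approx_mean_sd(median: float, q1: float, q3: float) -> tuple:
    """Approximate (mean, SD) from a median and the IQR bounds [q1, q3]."""
    if q3 < q1:
        raise ValueError("q3 must be >= q1")
    mean = median          # reasonable only for roughly symmetric data
    sd = (q3 - q1) / 1.35  # the IQR of a normal distribution is about 1.35 SD
    return mean, sd


# Hypothetical input: median age 60, IQR [50, 70]
mean, sd = approx_mean_sd(60, 50, 70)
print(mean, round(sd, 1))  # → 60 14.8
```

Because the output is derived rather than extracted, it should be marked as such in the record's notes.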
Extracting from Figures
- Use WebPlotDigitizer for graph data
- Note "extracted from figure" in comments
- Estimate and record the uncertainty of the digitized values
Missing Control Group (Single-Arm)
For case series without controls:
    outcomes:
      - name: "Mortality"
        type: "binary"
        timepoint: "in-hospital"
        single_arm:
          events: 15
          total: 100
Output Format
Use YAML format for structured extraction:
    study_id: "Smith2023"
    pmid: "37654321"
    doi: "10.1001/jamasurg.2023.1234"
    year: 2023
    country: "USA"
    study_design: "Retrospective cohort"
    sample_size: 150
    patient_demographics:
      age_mean: 58.3
      age_sd: 12.4
      male_percent: 62
    intervention:
      name: "Decompressive craniectomy"
      type: "Surgical"
    outcomes:
      - name: "Mortality"
        type: "binary"
        timepoint: "30 days"
        intervention:
          events: 12
          total: 75
        control:
          events: 18
          total: 75
    notes: "Single-center study. High crossover rate (15%)."
Validation
After extraction, use the validate_extraction tool to check the data against the schema:
    mcp__neuroresearch__validate_extraction(data, schema_type="study")