Medsci-skills define-variables

Install
source · Clone the upstream repo
git clone https://github.com/Aperivue/medsci-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/Aperivue/medsci-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/define-variables" ~/.claude/skills/aperivue-medsci-skills-define-variables && rm -rf "$T"
manifest: skills/define-variables/SKILL.md

Define-Variables Skill

Purpose

Every observational study operationalizes abstract constructs (MASLD, CKD, emphysema, obesity, incidentaloma) into concrete rules against the available data dictionary. When that operationalization is invented ad-hoc from the dictionary alone, reviewers reject on construct validity regardless of downstream statistics.

This skill forces a literature-first pass: each variable is mapped to a canonical guideline/consensus definition, cross-checked against prior operationalizations in comparable cohorts, then mapped to available DB variables. Ad-hoc deviations are flagged explicitly and justified, not hidden.

Use it when:

  • a study question is known and variables are being selected
  • inclusion/exclusion criteria or phenotype definitions need citation backing
  • a data dictionary has ambiguous or derived variables (eGFR formula, BMI class, liver steatosis criteria, etc.)
  • a reviewer asked "why this cutoff?"
  • a retrospective audit reveals drifted definitions across projects in the same cohort

Call after /design-study, before /write-protocol.

Communication Rules

  • Communicate in the user's preferred language.
  • All variable names, guideline names, cutoffs in English.
  • Produce one artifact: variable_operationalization.md in the project root (or path the user specifies).

Inputs

  1. Research question (one sentence)
  2. Candidate variables — exposure, outcome, key covariates, eligibility filters
  3. Data dictionary path (xlsx / csv / markdown) OR explicit list of available DB columns
  4. Cohort type (e.g., health-screening, NHANES-like, claims, registry) — informs which prior-art cohort to compare against

Missing inputs → ask once, then proceed.
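When the dictionary arrives as a CSV, the set of available columns can be extracted with the standard library. A minimal sketch; the three columns shown (variable, label, type) are hypothetical, and real dictionaries vary in layout:

```python
import csv
import io

# Hypothetical 3-column data dictionary; inspect the real file before
# assuming a layout -- column names here are illustrative only.
raw = io.StringIO(
    "variable,label,type\n"
    "b_tg,Serum triglycerides (mg/dL),numeric\n"
    "b_hdl,HDL cholesterol (mg/dL),numeric\n"
    "bmi,Body mass index (kg/m2),derived\n"
)
rows = list(csv.DictReader(raw))
available = {r["variable"] for r in rows}  # columns the study can actually map to
```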

3-Tier Pipeline (token-efficient + scientifically rigorous)

Tier 1 — Canonical index lookup (no API calls)

Check references/common_definitions.md (shipped with the skill) for the variable. It covers high-frequency constructs:

  • Liver: MASLD (AASLD 2023), MetALD (AASLD 2023), MAFLD (2020), NAFLD (legacy), ALD, viral hepatitis (AASLD 2022/2024 HBV, AASLD-IDSA HCV)
  • Metabolic: T2DM (ADA 2024), prediabetes (ADA 2024), metabolic syndrome (IDF 2009 / NCEP ATP III / K-NCEP), obesity/BMI (WHO Asian 2004 + WHO global), HTN (ACC/AHA 2017 + JNC-8), dyslipidemia (NCEP ATP III, 2023 AHA/ACC)
  • Renal: CKD (KDIGO 2024), eGFR formulas (CKD-EPI 2021 race-free, MDRD legacy), incidental renal mass (ACR 2018 white paper, Bosniak 2019)
  • Pulmonary: COPD (GOLD 2024), emphysema imaging (Fleischner 2015)
  • CV: CAC scoring (Agatston 1990, MESA percentiles), CAD risk (2018 ACC/AHA cholesterol, PREVENT 2023)
  • Cancer: gastric cancer H. pylori (Maastricht VI 2022), thyroid nodule (ACR TI-RADS 2017), gallbladder polyp (European 2022 joint guideline)
  • Imaging incidentalomas: adrenal (ACR 2023), pancreas (ACR 2017), renal (ACR 2018), thyroid (ACR 2017)

If the variable hits Tier 1, record: guideline, year, canonical cutoff, BibTeX key. Done — no /search-lit call.
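A Tier 1 lookup can be as simple as a static index keyed by construct name. The sketch below is illustrative: the CANONICAL_INDEX structure and field names are assumptions, not the skill's actual schema, though the two entries mirror guidelines named above.

```python
# Hypothetical canonical index: construct -> record of guideline, year,
# cutoff, and BibTeX key. Field names are assumed, not the skill's schema.
CANONICAL_INDEX = {
    "masld": {
        "guideline": "AASLD practice guidance on MASLD",
        "year": 2023,
        "cutoff": "hepatic steatosis + >=1 cardiometabolic criterion",
        "bibtex_key": "rinella2023_aasld_masld",
    },
    "ckd": {
        "guideline": "KDIGO CKD guideline",
        "year": 2024,
        "cutoff": "eGFR <60 mL/min/1.73m2 or markers of damage for >3 months",
        "bibtex_key": "kdigo2024_ckd",
    },
}

def tier1_lookup(construct: str):
    """Return the canonical record if the construct is indexed, else None."""
    return CANONICAL_INDEX.get(construct.strip().lower())
```

A miss (None) is the signal to fall through to Tier 2.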

Tier 2 — Targeted /search-lit (focused queries only)

For variables NOT in Tier 1, OR when subgroup justification is needed (Asian-specific cutoff, pediatric, young-adult, pregnancy, etc.), call /search-lit with one query per variable — not a general sweep. Query pattern:

"{construct} definition {cohort type} {subgroup qualifier}"
e.g., "obstructive sleep apnea definition Korean health screening cohort"

Cap: 5 queries per session. Stop early if the first one or two papers converge on the same definition.
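The query pattern and session cap can be enforced mechanically. A minimal sketch, where build_query and plan_queries are hypothetical helper names:

```python
MAX_QUERIES_PER_SESSION = 5  # hard cap stated by the skill

def build_query(construct: str, cohort_type: str, subgroup: str = "") -> str:
    """Compose one focused query following the
    '{construct} definition {cohort type} {subgroup qualifier}' pattern."""
    parts = [construct, "definition", cohort_type, subgroup]
    return " ".join(p for p in parts if p)

def plan_queries(variables, cohort_type):
    """One query per un-indexed variable, truncated at the session cap."""
    return [build_query(v, cohort_type) for v in variables][:MAX_QUERIES_PER_SESSION]
```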

Tier 3 — Verification

Before finalizing, run /verify-refs on the accumulated BibTeX to confirm every citation exists in PubMed/CrossRef. Ad-hoc choices (no canonical source found) must be flagged Ad-hoc: yes and justified with 1-2 sentences — never hidden.
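Before handing the BibTeX block to /verify-refs, the citation keys have to be pulled out of it. A minimal sketch using a regex over standard @type{key, ...} entries (the two keys shown are illustrative):

```python
import re

def bibtex_keys(bib_text: str) -> list[str]:
    """Extract citation keys from a BibTeX block so each one can be
    checked against PubMed/CrossRef before the table is finalized."""
    return re.findall(r"@\w+\{([^,\s]+),", bib_text)

# Illustrative BibTeX fragment; entry bodies elided.
bib = """
@article{rinella2023_aasld_masld, title={...}, year={2023}}
@article{kdigo2024_ckd, title={...}, year={2024}}
"""
```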

Output Template

Write to {project_root}/variable_operationalization.md using templates/variable_operationalization.md. Required structure:

  1. Header: research question, cohort type, date, author

  2. Operationalization table — one row per variable:

    | Variable | Role | Canonical source | Definition | Cutoff | DB vars | Implementation | Ad-hoc? |

    • Role: exposure / outcome / covariate / eligibility
    • Canonical source: BibTeX key (e.g., @rinella2023_aasld_masld)
    • Definition: one line, verbatim from guideline where possible
    • Cutoff: numeric + units
    • DB vars: exact dictionary column names used
    • Implementation: SQL/pandas-style pseudocode (e.g., bmi>=25 & (b_tg>=150 | b_hdl<40))
    • Ad-hoc?: yes/no. If yes, justification below table
  3. Ad-hoc justifications — for each yes row

  4. Mapping gaps — variables in the protocol with no DB equivalent; list proxy / omit / request decisions

  5. References — BibTeX block
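The pandas-style pseudocode in the Implementation column can be executed directly as a boolean mask. A sketch assuming pandas is installed and using the hypothetical column names from the example above (bmi, b_tg, b_hdl); the toy values are invented for illustration:

```python
import pandas as pd

# Toy rows under hypothetical dictionary column names; real names
# come from the project's data dictionary.
df = pd.DataFrame({
    "bmi":   [26.1, 22.0, 24.5],  # kg/m2
    "b_tg":  [180,  120,  200],   # triglycerides, mg/dL
    "b_hdl": [35,   55,   45],    # HDL cholesterol, mg/dL
})

# Implementation column 'bmi>=25 & (b_tg>=150 | b_hdl<40)' as a pandas mask.
mask = (df["bmi"] >= 25) & ((df["b_tg"] >= 150) | (df["b_hdl"] < 40))
df["flag"] = mask
```

Keeping the Implementation column in this executable form means the table row and the analysis code cannot silently diverge.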

Non-Goals

  • Statistical analysis → /analyze-stats
  • Manuscript drafting → /write-paper
  • Data cleaning / missingness → /clean-data
  • Sample size → /calc-sample-size

Pipeline Position

intake-project → design-study → search-lit → define-variables → write-protocol → analyze-stats → write-paper
                                              ^^^^^^^^^^^^^^^

/orchestrate should insert this skill between /search-lit and /write-protocol for any observational cohort or registry study.

Anti-Hallucination

Every variable definition, cutoff, and era anchor must be grounded in a verified source — a clinical guideline, a peer-reviewed paper with DOI, or an established registry data dictionary. Never invent a phenotype threshold from the model's prior; if the source is unknown, mark the row Ad-hoc: yes and require user confirmation before it propagates into /write-protocol or /analyze-stats. When citing papers to justify a cutoff, verify the citation via /search-lit or /verify-refs — do not carry references from memory alone. The output table must carry explicit source, year, and guideline_version columns so downstream skills can re-verify.

Failure Modes to Avoid

  1. Dictionary-first framing — starting from what columns exist, then picking a definition that matches. Always flip: definition first, then map.
  2. Cutoff drift — using a different cutoff than the cited guideline without justification (e.g., BMI≥23 cited as WHO Asian while text says ≥25).
  3. Mixing eras — 2020 MAFLD criteria with 2023 MASLD criteria in the same analysis. Pick one and note why.
  4. Silent ad-hoc — introducing a novel cutoff without the Ad-hoc: yes flag.
  5. Sweep-style /search-lit — running a generic lit search instead of one focused query per gap variable. Wastes tokens and buries the signal.