Medsci-skills design-study

install
source · Clone the upstream repo
git clone https://github.com/Aperivue/medsci-skills
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/Aperivue/medsci-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/design-study" ~/.claude/skills/aperivue-medsci-skills-design-study && rm -rf "$T"
manifest: skills/design-study/SKILL.md
source content

Design-Study Skill

Purpose

This skill pressure-tests whether a study is answerable, interpretable, and defensible before large amounts of drafting or analysis work accumulate.

Use it when:

  • a study question is known but the analysis plan is still fluid
  • the user wants a methods sanity check
  • a manuscript feels vulnerable to reviewer criticism
  • a peer review requires explicit methodological diagnosis

Communication Rules

  • Communicate with the user in their preferred language.
  • Use English for statistical, radiologic, and reporting-guideline terminology.
  • Be direct about validity risks, but always propose the smallest feasible fix first.

Core Review Questions

Always inspect these dimensions:

  1. What is the exact research question?
  2. What is the analysis unit: patient, lesion, exam, study, phase, report?
  3. What is the index date or decision point?
  4. How are inclusion and exclusion criteria applied?
  5. Is there any information leakage?
  6. What is the reference standard or endpoint definition?
  7. What comparator is clinically meaningful?
  8. What validation strategy is used?
  9. What uncertainty reporting is required?
  10. Which reporting guideline best fits?
  11. Are exposure/outcome/covariate definitions literature-grounded, or invented ad hoc from the data dictionary? If ad hoc, defer to /define-variables before drafting Methods.

Standard Output

## Study Design Review
Question: ...
Study type: ...
Analysis unit: ...
Index date / prediction timepoint: ...

### Strengths
- ...

### Major validity risks
1. ...
2. ...

### Minimal fixes
- ...

### Reporting fit
- Recommended guideline: ...

### Decision
- Ready for analysis / Needs redesign / Drafting can proceed with limitations

Workflow

Phase 1: Reconstruct the study

Extract from protocol, draft, slides, tables, or notes:

  • clinical problem
  • intended use case
  • population
  • inputs
  • outputs
  • outcome definition
  • timing of variable availability

Gate: Present the reconstructed study summary (question, analysis unit, intended use) to the user. Confirm before proceeding — if the reconstruction is wrong, the entire validity review will be misdirected.

Phase 2: Check structural validity

A. Analysis unit

Look for mismatches such as:

  • patient-level claim from lesion-level analysis
  • exam-level split with patient overlap
  • phase-level samples treated as independent
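
The second mismatch (an exam-level split that lets the same patient land in both train and test) is the easiest to verify mechanically. A minimal sketch, assuming scikit-learn and hypothetical column names, that builds a patient-level split and asserts no patient crosses it:

```python
# Minimal sketch: enforce a patient-level split when the analysis unit
# is the exam. Data are simulated; `patient_id` is a hypothetical name.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_exams = 200
patient_id = rng.integers(0, 60, size=n_exams)  # ~60 patients, repeated exams
X = rng.normal(size=(n_exams, 5))
y = rng.integers(0, 2, size=n_exams)

splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=patient_id))

# The check a reviewer would run: no patient may appear on both sides.
overlap = set(patient_id[train_idx]) & set(patient_id[test_idx])
assert not overlap, f"patient leakage across split: {overlap}"
```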

B. Leakage

Look for:

  • postoperative features used for preoperative prediction
  • normalization or thresholding performed before data split
  • repeated exams across train/test
  • reader annotations derived from outcome information
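
The normalization-before-split leak in the list above has a standard repair: fit all preprocessing inside each training fold rather than on the pooled data. A minimal sketch of the pattern, assuming scikit-learn and simulated data:

```python
# Minimal sketch of the normalization-before-split leak and its fix.
# Data are simulated, so the leak here is small; the point is the pattern.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = rng.integers(0, 2, size=300)

# LEAKY: the scaler sees the full dataset, including future test folds.
X_leaky = StandardScaler().fit_transform(X)
leaky = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5)

# CORRECT: the scaler is refit inside each training fold via a pipeline.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clean = cross_val_score(pipe, X, y, cv=5)
```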

C. Reference standard

Check:

  • who established ground truth
  • when it was established
  • whether blinding was possible
  • whether only a subset had gold standard verification

D. Validation

Classify:

  • apparent only
  • internal split
  • cross-validation
  • temporal validation
  • external validation
  • multi-center external validation
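
The operational difference between an internal random split and a temporal split is where the boundary is drawn. A minimal sketch, assuming pandas and a hypothetical exam_date column:

```python
# Minimal sketch: a temporal split trains on earlier exams and tests on
# later ones, so evaluation mimics prospective deployment. Toy data.
import pandas as pd

df = pd.DataFrame({
    "exam_date": pd.date_range("2018-01-01", periods=10, freq="90D"),
    "y": [0, 1, 0, 0, 1, 1, 0, 1, 0, 1],
})

cutoff = pd.Timestamp("2019-07-01")  # chosen before looking at outcomes
train = df[df["exam_date"] < cutoff]
test = df[df["exam_date"] >= cutoff]
```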

Phase 3: Clinical framing

Ask whether the comparator and endpoint support the stated claim:

  • is the model better than current practice or just another model?
  • is the endpoint clinically meaningful?
  • does performance translate to action?

Phase 4: Reporting fit

Recommend one primary guideline:

  • TRIPOD-AI
  • CLAIM
  • STARD
  • STROBE
  • PRISMA
  • CARE
  • ARRIVE
  • journal-specific additions if needed

Frequent Failure Modes

Diagnostic AI

  • no clinically relevant comparator
  • exam-level split instead of patient-level split
  • unclear reference standard
  • AUROC-only reporting without threshold metrics
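
The AUROC-only failure is fixed by reporting at least one explicit operating point alongside the curve. A minimal sketch, assuming scikit-learn; the scores are simulated, not real results:

```python
# Minimal sketch: threshold metrics reported next to AUROC.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)
y_score = np.clip(y_true * 0.3 + rng.normal(0.35, 0.25, size=500), 0, 1)

auroc = roc_auc_score(y_true, y_score)

# Pick one clinically motivated operating point and report it explicitly.
threshold = 0.5
y_pred = (y_score >= threshold).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
print(f"AUROC {auroc:.2f} | sens {sensitivity:.2f} | "
      f"spec {specificity:.2f} | PPV {ppv:.2f}")
```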

Prognostic modeling

  • unclear time zero
  • immortal time bias
  • feature timing mismatch
  • no calibration
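
A first-pass calibration check compares grouped predicted probabilities against observed event rates; a well-calibrated model tracks the diagonal. A minimal sketch, assuming scikit-learn and simulated probabilities (this does not replace a full calibration analysis):

```python
# Minimal sketch: binned observed vs predicted event rates.
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
p_true = rng.uniform(0.05, 0.9, size=2000)
y = rng.binomial(1, p_true)
p_model = np.clip(p_true * 1.4 - 0.1, 0.01, 0.99)  # deliberately miscalibrated

obs, pred = calibration_curve(y, p_model, n_bins=10)
for o, p in zip(obs, pred):
    print(f"predicted {p:.2f} -> observed {o:.2f}")
```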

Retrospective cohort / screening database

  • time zero misalignment: cohort entry ≠ follow-up start → immortal time bias
  • interval-censored outcomes treated as exact → underestimation of event times
  • healthy volunteer bias unacknowledged → inflated external validity claims
  • surveillance bias from unequal follow-up frequency between groups
  • three-way bias classification (Hernán/Robins): selection bias (who enters), information bias (how measured), confounding (what else differs); explicitly map each threat to one class
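
The time-zero misalignment in the first item can be made concrete. A minimal sketch, assuming pandas and hypothetical column names:

```python
# Minimal sketch of a time-zero check for immortal time bias. Toy data.
import pandas as pd

df = pd.DataFrame({
    "cohort_entry": pd.to_datetime(["2019-01-10", "2019-02-01"]),
    "treatment_start": pd.to_datetime(["2019-03-15", None]),
    "event_date": pd.to_datetime(["2020-01-01", "2019-08-01"]),
})

# BIASED: starting the treated group's clock at treatment_start credits
# them with the "immortal" interval between entry and treatment.
# ALIGNED: every subject's clock starts at cohort entry (time zero).
df["followup_days"] = (df["event_date"] - df["cohort_entry"]).dt.days
# The treated subject must also count as untreated during the interval
# entry -> treatment_start (e.g., time-varying exposure or a landmark).
```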

Multimodal LLM / report generation

  • no clear rubric for clinical correctness
  • benchmark labels derived from noisy reports without adjudication
  • unsupported claims about safety or workflow benefit

Imaging meta-analysis

  • overlapping cohorts
  • paired modalities analyzed as independent
  • heterogeneity metrics missing
  • zero-cell handling unspecified
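
Zero-cell handling only needs to be explicit, not elaborate. A minimal sketch of one common policy, the 0.5 continuity correction, using only the standard library; exact methods and Peto odds ratios are alternatives, and the review point is simply that the chosen policy is stated:

```python
# Minimal sketch: log odds ratio for a 2x2 table with an explicit
# zero-cell policy (0.5 added to every cell when any cell is zero).
import math

def log_odds_ratio(tp, fp, fn, tn, cc=0.5):
    """Log OR with continuity correction applied when any cell is zero."""
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + cc for x in (tp, fp, fn, tn))
    log_or = math.log((tp * tn) / (fp * fn))
    se = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    return log_or, se

print(log_odds_ratio(12, 0, 3, 30))  # zero cell handled explicitly
```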

Minimal-Fix Principle

Whenever possible, recommend the smallest feasible repair first:

  • clarify the claim
  • narrow the target population
  • add a limitation statement
  • add a clinically relevant baseline
  • re-run one key sensitivity analysis
  • redefine the endpoint more explicitly

Escalate to redesign only when the central claim is not defensible otherwise.


Handoff Rules

  • route to analyze-stats when the design is basically sound but analysis details need refinement
  • route to check-reporting after the design is locked
  • route to self-review when the user wants a pre-submission quality check on their own manuscript
  • route back to write-paper only after the main validity risks are documented

What This Skill Does NOT Do

  • It does not compute statistics directly
  • It does not draft full manuscript prose
  • It does not resolve raw data engineering issues
  • It does not replace a full peer review when journal-facing tone is required

Anti-Hallucination

  • Never fabricate references. All citations must be verified via /search-lit with confirmed DOI or PMID. Mark unverified references as [UNVERIFIED - NEEDS MANUAL CHECK].
  • Never invent clinical definitions, diagnostic criteria, or guideline recommendations. If uncertain, flag with [VERIFY] and ask the user.
  • Never fabricate numerical results — compliance percentages, scores, effect sizes, or sample sizes must come from actual data or analysis output.
  • If a reporting guideline item, journal policy, or clinical standard is uncertain, state the uncertainty rather than guessing.