Medsci-skills self-review
Pre-submission self-review for the user's own manuscripts, applying a reviewer perspective. Systematic check across 10 categories with research-type branching. Outputs Anticipated Major/Minor Comments with severity framing and optional R0 numbering for /revise pipeline integration.
git clone https://github.com/Aperivue/medsci-skills
T=$(mktemp -d) && git clone --depth=1 https://github.com/Aperivue/medsci-skills "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/self-review" ~/.claude/skills/aperivue-medsci-skills-self-review && rm -rf "$T"
`skills/self-review/SKILL.md`

Self-Review Skill
You are helping a medical researcher check their own manuscript before journal submission. The goal is to anticipate reviewer comments by applying the same critical lens used in peer review across medical journals.
This is NOT about writing a review. It's about producing an actionable list of anticipated reviewer comments with specific fix suggestions, so the manuscript can be strengthened before reviewers ever see it.
Optional Flags
- `--fix`: After generating the review report, automatically apply fixes for all issues where `fixable_by_ai` is true. Edits the manuscript in place, then reports a diff summary. Does NOT fix issues marked `fixable_by_ai: false` (e.g., missing data, design flaws). Maximum 2 fix-and-re-review iterations.
- `--json`: Output the structured JSON block (see Phase 3c below) in addition to the markdown report. Default when called from `/write-paper` Phase 7.
Severity Framing
When flagging issues, classify severity:
- Fatal: Fundamental design flaw that cannot be fixed with existing data (e.g., data leakage that invalidates all results, absence of any reference standard, label-feature circularity). The manuscript likely needs redesign. Submission would likely result in Reject.
- Fixable: Significant but addressable with existing data (e.g., missing calibration analysis, unclear exclusion criteria, absent CIs, incomplete reporting). These are the most actionable findings.
Most issues are Fixable. Reserve Fatal for true design-level problems.
Workflow
Phase 1: Intake
- Get the manuscript -- PDF, Word doc, or pasted text.
- Ask the user:
- Target journal? (affects reporting standards and scope expectations)
- Manuscript type? (original research / review / technical note / letter / meta-analysis / case report)
- Anything they're already worried about?
- Read the full manuscript.
Phase 2: Systematic Check
Run the manuscript through each applicable category below. For each item, assess whether a reviewer would raise it as a Major or Minor comment.
Use the Research-Type Adaptation table (below) to determine which categories apply fully, partially, or not at all for the given manuscript type.
A. Study Design & Data Integrity
| Check | What to look for |
|---|---|
| Patient-level splitting | Are train/val/test splits at the patient level? Is this explicitly stated? |
| Leakage risk | Any postoperative variable used in a preoperative model? Cohort-wide preprocessing before split? |
| Temporal independence | Random split within same institution = no temporal independence. Acknowledged? |
| Analysis unit clarity | Patient vs exam vs lesion vs image -- is the unit consistent throughout? |
| Sample size per class | For the test set specifically -- are there enough cases per class for stable metrics? |
B. Reference Standard & Ground Truth
| Check | What to look for |
|---|---|
| Definition specificity | Is the reference standard precisely defined? (e.g., "pathological T stage" vs vague "staging") |
| Timing | Interval between index test and reference standard reported? |
| Independence | Were ground truth annotators independent from the comparator readers? |
| Annotation protocol | Number of readers, consensus method, blinding, inter-reader agreement reported? |
C. Validation & Statistical Reporting
| Check | What to look for |
|---|---|
| Confidence intervals | All primary metrics have 95% CIs? |
| Calibration [CRITICAL] | Prediction models: calibration plot + Brier score or slope/intercept MUST be present. AUC alone is insufficient -- mark as Major if absent |
| Clinical comparator | Is there a clinical-only baseline to show incremental value? |
| DCA / net benefit | For clinical decision tools: decision curve analysis present? |
| Multiple comparisons | If many tests: acknowledged as exploratory, or correction applied? |
| Paired statistics | If same patients compared across modalities: paired tests used (McNemar, DeLong)? |
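As a concrete instance of the paired-statistics check, a minimal exact McNemar test can be sketched from the discordant-pair counts alone. This is a stdlib sketch with an illustrative function name; for a manuscript, prefer a vetted implementation (e.g., the one in statsmodels):

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: pairs correct only under modality 1
    c: pairs correct only under modality 2
    Concordant pairs do not enter the statistic.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    # exact binomial tail under H0: discordant pairs split 50/50
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, 1 vs 9 discordant pairs gives p ≈ 0.021. A reviewer will expect exactly this kind of paired test when the same patients are compared across modalities, not two independent proportions.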
D. Clinical Framing & Importance
| Check | What to look for |
|---|---|
| Intended use | Is the clinical decision point clearly stated? (triage vs diagnosis vs prognosis vs monitoring) |
| Overclaiming | Does language match evidence? ("will improve" -> "may potentially"; "superior" with overlapping CIs?) |
| Terminology precision | Key terms defined? (e.g., "perioperative" = when exactly?) |
| Title-content alignment | Does the title accurately reflect what was actually done? |
| Novelty statement | What does this study add beyond existing literature? Is this explicitly stated? |
| Clinical importance | Would the findings change clinical practice or research direction? Is this articulated? |
E. Reproducibility
| Check | What to look for |
|---|---|
| Preprocessing details | All steps listed in order? Normalization, augmentation, resampling specified? |
| Model details | Architecture, optimizer, LR, batch size, epochs, early stopping reported? |
| Segmentation protocol | ROI definition, reader experience, blinding, tool used? |
| Hardware/software | Inference environment, software versions, code availability? |
| Scanner/protocol info | For imaging studies: scanner model, sequence parameters, contrast protocol? |
| Data/code availability | Is a data availability statement included? Code shared or reason for not sharing stated? |
F. Reporting Completeness
| Check | What to look for |
|---|---|
| Abstract-body consistency | Numbers in Abstract match Tables/Results? |
| Table/Figure accuracy | Cross-check key values between tables, figures, and text |
| Follow-up duration | For survival/prognosis: median follow-up with IQR reported? |
| Ethics | All participating institutions' IRB approval documented? Patient consent described? |
| Missing data | Handling of incomplete cases described? |
| CONSORT/STARD/TRIPOD flow | Appropriate flow diagram present with patient counts at each step? |
| Funding & COI | Funding sources and competing interests disclosed? |
G. Reporting Guideline Compliance
Match the manuscript type to the appropriate checklist and verify key items:
| Manuscript type | Checklist | Critical items to verify |
|---|---|---|
| Diagnostic accuracy | STARD / STARD-AI | Flow diagram, reference standard, spectrum |
| Prediction model (non-AI) | TRIPOD 2015 | Model development vs validation, calibration, missing data |
| Prediction model (AI/ML) | TRIPOD+AI 2024 | Model development vs validation, calibration, leakage, fairness |
| AI / Radiomics | CLAIM 2024 / CLEAR | Feature selection transparency, external validation |
| RCT | CONSORT / CONSORT-AI | Randomization, blinding, ITT |
| Systematic review (interventions) | PRISMA 2020 | Search strategy, screening, risk of bias |
| Meta-analysis (observational) | MOOSE + PRISMA 2020 | Confounding assessment, heterogeneity, publication bias |
| Observational | STROBE | Confounding, selection bias, missing data |
| Reliability / agreement | GRRAS | ICC model/type, rater description, measurement protocol |
| Educational | SQUIRE 2.0 | Intervention description, outcome measures, context |
| Case report | CARE | Timeline, diagnostic reasoning, informed consent |
| Surgical | STROBE-Surgery | Surgeon experience, technique details, complications |
For a full item-by-item audit, run `/check-reporting` on this manuscript. If it has already been run, reference its results and flag any MISSING items as Anticipated Major/Minor Comments. If not yet run, flag: "Full reporting guideline compliance not yet audited -- run `/check-reporting` before submission for item-level assessment."
H. Circularity
| Check | What to look for |
|---|---|
| Label-feature overlap | Is the prediction label derived from the same data source as any input features? (e.g., NLP-extracted label + text-derived features from same reports) |
| Tautological prediction | Does the model predict something that is already encoded in its inputs? |
| Circular validation | Is the validation set constructed using information from the training process? |
I. Protocol Heterogeneity
| Check | What to look for |
|---|---|
| Multi-site acquisition | If multi-site: are scanner models, protocols, and acquisition parameters reported per site? |
| Harmonization | For imaging or lab features: was harmonization applied (ComBat, z-scoring)? If not, acknowledged? |
| Temporal protocol drift | For longitudinal data: did acquisition protocols change over the study period? |
J. Method Transparency
| Check | What to look for |
|---|---|
| Model provenance | Is it clear where the model came from? (in-house vs vendor-provided vs open-source) |
| Training vs fine-tuning | If pre-trained: was the model fine-tuned on study data? If vendor-provided: any access to training data composition? |
| Proprietary limitations | For commercial AI or tools: are known limitations acknowledged? Can results be independently reproduced? |
Research-Type Adaptation
Not all categories apply equally to every study type. Use this routing table:
| Category | AI/ML | Observational | Educational | Meta-Analysis | Case Report | Surgical |
|---|---|---|---|---|---|---|
| A. Study Design | Full | Full | Partial | N/A | N/A | Full |
| B. Reference Standard | Full | Full | N/A | Per-study | Partial | Full |
| C. Validation & Stats | Full | Full | Full | Special* | Partial | Full |
| D. Clinical Framing | Full | Full | Full | Full | Full | Full |
| E. Reproducibility | Full | Partial | Partial | Partial | N/A | Full |
| F. Reporting | Full | Full | Full | Full | Full | Full |
| G. Guideline Compliance | Full | Full | Full | Full | Full | Full |
| H. Circularity | Full | Partial | N/A | N/A | N/A | Partial |
| I. Protocol Heterogeneity | Full | Full | N/A | Per-study | N/A | Full |
| J. Method Transparency | Full | Partial | Partial | N/A | N/A | Partial |
*Meta-analysis: Replace C with heterogeneity assessment (I-squared, prediction intervals), publication bias (funnel plot, Egger), and sensitivity/subgroup analyses.
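For the meta-analysis substitution above, the I² statistic is simple enough to sanity-check by hand (Higgins I² from Cochran's Q; the function name is illustrative):

```python
def i_squared(Q: float, k: int) -> float:
    """Higgins I-squared (%) from Cochran's Q across k studies."""
    df = k - 1
    if df <= 0 or Q <= df:
        return 0.0  # I-squared is truncated at 0 by convention
    return 100.0 * (Q - df) / Q
```

A reported I² that cannot be reproduced from the reported Q and study count is a Minor Comment under category F.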
Type-Specific Additional Checks:
- Observational studies: Confounding assessment (DAG or adjustment strategy), selection bias, exposure measurement validity
- Educational studies: Learning outcome measurement validity, Kirkpatrick level, control group adequacy, curriculum fidelity
- Meta-analyses: Search comprehensiveness (2+ databases), screening reproducibility (2 reviewers), RoB assessment per study, GRADE certainty
- Case reports: Diagnostic reasoning transparency, timeline completeness, informed consent, generalizability disclaimer
- Surgical studies: Learning curve consideration, surgeon volume/experience, complication grading (Clavien-Dindo), operative detail completeness
Phase 2.5: Numerical Cross-Verification (Internal)
Before generating the report, verify internal consistency:
- Abstract vs Body: Do all numbers in the Abstract match the Results section and Tables?
- Table vs Text: Cross-check key metrics (sample sizes, primary outcomes, p-values) between tables and narrative text.
- Figure vs Text: Do figure legends match the data described in Results?
- Percentage arithmetic: Verify that n/N percentages are calculated correctly (e.g., 23/150 = 15.3%, not 15.0%).
- CI plausibility: Do confidence intervals seem reasonable given sample sizes?
Flag any discrepancies as Anticipated Minor Comments (category: F. Reporting Completeness).
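The percentage-arithmetic item lends itself to mechanical checking. A minimal sketch, assuming a simple "n/N (x%)" pattern and a 0.05-point tolerance (both assumptions; real manuscripts need richer patterns and per-journal rounding conventions):

```python
import re

def check_percentages(text: str, tol: float = 0.05) -> list[str]:
    """Flag n/N (x%) claims whose stated percentage does not match n/N."""
    flags = []
    # matches e.g. "23/150 (15.3%)" or "23/150 (15.0 %)"
    for m in re.finditer(r"(\d+)/(\d+)\s*\((\d+(?:\.\d+)?)\s*%\)", text):
        n, total, pct = int(m.group(1)), int(m.group(2)), float(m.group(3))
        true_pct = 100 * n / total
        if abs(true_pct - pct) > tol:
            flags.append(f"{m.group(0)}: recomputed {true_pct:.1f}%")
    return flags
```

Anything this flags maps directly onto an Anticipated Minor Comment in category F.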
Phase 2.5a: Numerical Source-Fidelity Audit (External)
Internal consistency (Phase 2.5) is necessary but not sufficient. Numbers can be fully self-consistent across Abstract / Table / Text and still be wrong at the source -- a single transcription error propagates cleanly through every downstream stage.
Precedent failure pattern:
A revision-era comparative meta-analysis reported a safety-outcome 2x2 with the arm-level events direction-reversed relative to the primary-source Table. Internal consistency passed because Abstract, Discussion, Table, and the R script all echoed the same wrong values. The reversal was caught only by an explicit second-pass audit that randomly sampled claims and traced each back to the primary paper.
When to run: MA revisions, submissions, or any review where the user mentions "check against the source," "verify extraction," or "random sample."
Inputs the reviewer should expect:
- `manuscript.md` (or .docx converted to .md)
- `extraction_final.csv` (or equivalent data-extraction spreadsheet)
- A directory of primary-source PDFs (or equivalent accessible text)
Procedure:
- Inventory numerical claims in Abstract, Results, and Discussion (patterns: `\d+/\d+`, `\d+\.\d+%`, `(95% CI:`, `p\s*=\s*0\.`, `I\^2`, `n\s*=`, etc.).
- Stratified random sample -- draw 5 claims across: (a) pooled estimates, (b) subgroup / sensitivity results, (c) comparative-arm specific values, (d) study-level numbers (first-cited in narrative), (e) a claim introduced during revision if the draft is post-v1. Comparative-arm specific values and revision-introduced numbers are the two highest-yield strata -- always include one of each.
- For each sampled claim, traverse 3 layers:
- Layer 1 (Manuscript → CSV): Find the row / column in the extraction CSV.
- Layer 2 (CSV → Primary source): Locate the exact Table, Figure, or paragraph in the original paper. Record page number.
- Layer 3 (Analysis script → CSV): If the claim came from an analysis script, read the script and confirm its input value matches the CSV cell.
- Record results in a table and append to the report:

| Claim (manuscript location) | CSV row/col | Primary source (paper, Table/Fig, page) | Script input | Match? |
|---|---|---|---|---|

- Any mismatch is a Major Comment (M-level), not Minor. Mismatches that reverse a direction or change a significance boundary are P0 blockers for submission.
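The claim-inventory step can be sketched as a table of regexes. These exact patterns and names are illustrative assumptions -- extend them per manuscript:

```python
import re

# Hypothetical claim patterns mirroring the inventory step above
CLAIM_PATTERNS = {
    "ratio": r"\d+/\d+",
    "percentage": r"\d+\.\d+%",
    "ci": r"95% CI[:,]?\s*[\d.]+[-\u2013]\s*[\d.]+",
    "p_value": r"p\s*=\s*0\.\d+",
    "sample_size": r"n\s*=\s*\d+",
}

def inventory_claims(section_text: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs for every numerical claim found."""
    claims = []
    for kind, pattern in CLAIM_PATTERNS.items():
        for m in re.finditer(pattern, section_text):
            claims.append((kind, m.group(0)))
    return claims
```

The resulting list is the sampling frame for the stratified draw; each sampled claim then goes through the 3-layer traversal.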
Revision-specific rule: If the manuscript contains `[VERIFY-CSV]` tags, treat each as a mandatory audit item regardless of the sampling size. The tag exists precisely because that number was introduced after the initial extraction pass and has not yet been independently checked.
Hand-entered analysis-script inputs are a code smell. When Layer 3 reveals a `matrix(...)`, `c(1, 2, 3)`, or `data.frame(...)` line with numerical data and no CSV-coordinate comment, escalate to a Major Comment even if the audited values happen to match -- the next revision will re-introduce the same risk.
Phase 2.5b: Screening-Count Reconciliation from ID Sets (SR/MA-only)
Internal consistency across Abstract/Methods/Results (Phase 2.5) + source fidelity of 2×2 and effect-size numbers (Phase 2.5a) do not cover study-count arithmetic. The latter is a separate failure mode: a prior-draft prose total ("30 → 32 after FLAG consensus") can survive every downstream pass because Abstract, Methods, Results, Discussion, Figure 1 caption, and even the supplementary consensus file all cite the same wrong number back to each other.
Precedent failure pattern (CBCT Biopsy MA-1, 2026-04-20):
v11 manuscript reported k_qualitative = 32, k_narrative-only = 10, k_FT-excluded = 46. Screening TSV (28 INCLUDE) ∩ consensus sheet (non-Exclude) + 2 FLAG additions yields k_qualitative = 24 with only 2 narrative-only studies (k_FT-excluded = 54). The 32/10/46 figures came from a v7-draft assumption that was never reconciled against the ID-level artifacts; `v8_edit_plan.md`, `screening_consensus_final.md`, and `Supplementary_Material_5` all propagated the same wrong total. Caught only by an explicit ID-set recount against `fulltext_screening_final.tsv` + `MA1_Consensus_Sheet.xlsx`, independently verified by Codex adversarial audit.
When to run: any SR/MA manuscript revision, regardless of stage. Run before Phase 3.
Inputs:
- Screening TSV with one row per full-text-reviewed record and an include/exclude column
- Consensus spreadsheet (Excel/CSV) with one row per record requiring adjudication and a `Consensus` column (typical values: `Exclude`, `Include-qualitative`, `Include-bivariate`)
- Any FLAG-adjudicated inclusion log documenting records added to the qualitative pool outside the primary screening TSV
- The manuscript's Table 1 (or equivalent): the definitive list of studies contributing to the primary quantitative synthesis
Procedure:
- Enumerate the ID sets:
- A = set of IDs marked INCLUDE in the screening TSV
- B = set of IDs marked Exclude in the consensus spreadsheet
- C = set of IDs marked Include-qualitative in the consensus spreadsheet
- T = set of IDs represented in Table 1 (via author/year cross-match)
- Derive canonical totals:
- k_qualitative = |A \ B| + |C|
- k_bivariate = |T|
- k_narrative-only = k_qualitative − k_bivariate = |(A ∪ C) \ B \ T|
- k_FT-excluded = |screening TSV rows| − |A| + |B ∩ A| + |(B \ A) encountered at FT stage|
- List the narrative-only IDs explicitly -- this is the highest-yield cross-check. A manuscript claiming "10 narrative-only studies" while the (A ∪ C) \ B \ T set contains only 2 IDs is an immediate P0 finding.
- Compare each derived total against the manuscript's prose claim in Abstract, Methods §Study Selection, Results §Study Selection, Figure 1 caption, Discussion §Limitations, and any References §Narrative-Only heading. Any mismatch between derived total and manuscript prose = P0 Major Comment, blocking submission.
- Record results in a short reconciliation block and append to the report:

| Quantity | Manuscript claim | ID-derived value | Status |
|---|---|---|---|
| k_full-text | 78 | 78 | ✓ |
| k_qualitative | 32 | 24 | ✗ P0 |
| k_bivariate | 22 | 22 | ✓ |
| k_narrative-only | 10 | 2 (IDs 120, 474) | ✗ P0 |
| k_FT-excluded | 46 | 54 | ✗ P0 |
Any "N → M" transition claim in a consensus summary (e.g., "30 → 32 after FLAG consensus") that is not backed by an enumerable ID addition/subtraction set is itself a Major Comment, because the transition is unverifiable by downstream audit. Require conversion of every such claim to explicit ID lists before closing the report.
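The set arithmetic above is a few lines of Python. A sketch -- `reconcile_counts` and its dict keys are hypothetical names, and it assumes the consensus sheet's Exclude set takes precedence over FLAG additions, matching the (A ∪ C) \ B \ T derivation:

```python
def reconcile_counts(A: set, B: set, C: set, T: set) -> dict:
    """Derive canonical screening totals from ID sets.

    A: INCLUDE IDs in the screening TSV
    B: Exclude IDs in the consensus spreadsheet
    C: Include-qualitative IDs in the consensus spreadsheet
    T: IDs represented in Table 1 (bivariate synthesis)
    """
    qualitative = (A | C) - B          # equals |A \ B| + |C| when C is disjoint from A and B
    narrative_only = qualitative - T   # (A ∪ C) \ B \ T
    return {
        "k_qualitative": len(qualitative),
        "k_bivariate": len(T),
        "k_narrative_only": len(narrative_only),
        "narrative_only_ids": sorted(narrative_only),
    }
```

Listing `narrative_only_ids` explicitly is the point: a prose claim of "10 narrative-only studies" collapses immediately if the derived set holds 2 IDs.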
Phase 2.5c: Reference Hallucination Scan
Numerical audits (2.5/2.5a/2.5b) cover in-text numbers; they do not cover reference-list integrity. LLM-drafted or co-author-handed-in bibliographies frequently contain fabricated DOIs, wrong author/year combinations for a real DOI, or plausible-looking references that never existed. These slip past human proofreading because the surface form looks canonical.
When to run: every manuscript at self-review, regardless of stage. Mandatory before submission and before any revision circulation to co-authors or the editor.
Procedure:
- Locate the bibliography. From `SSOT.yaml` → `truth.refs_bib` (fallback `manuscript/_src/refs.bib` for legacy projects). If `SSOT.yaml` is absent, scan `references/library.bib` as a last resort.
- Invoke `/verify-refs` on the resolved bib. The skill writes `qc/reference_audit.json` with a per-entry verdict (`VERIFIED` / `FABRICATED` / `UNVERIFIED`) and a top-level `submission_safe` boolean.

```shell
# equivalent CLI form (same result as invoking the skill)
python3 skills/verify-refs/scripts/verify_refs.py \
  --bib "$(python3 -c "import yaml,sys; print(yaml.safe_load(open('SSOT.yaml'))['truth']['refs_bib'])")" \
  --out qc/reference_audit.json --strict
```
- Read `qc/reference_audit.json`. For each entry not marked `VERIFIED`, add a row to the reconciliation block below. `FABRICATED` entries are P0 Major Comments (block submission). `UNVERIFIED` entries are Minor Comments unless the manuscript is at a circulation/submission gate, in which case they escalate to Major.
- Cross-check placeholder drift: `grep -n '\[@NEW:' manuscript/` -- any remaining `[@NEW:topic]` placeholder at self-review stage is a P0: the citation was queued but never resolved. Include in the reconciliation block.
- Record results in a short reconciliation block and append to the Phase 3 report:

| Citekey | Verdict | Source check | Status |
|---|---|---|---|
| Kim_2024_Validation | VERIFIED | DOI + PubMed match | ✓ |
| Park_2023_Radiomics | FABRICATED | DOI resolves to unrelated paper | ✗ P0 |
| Lee_2022_DeepLearning | UNVERIFIED | No DOI/PMID, title not found | △ Major before submission |
| [@NEW:segmentation_review] | PLACEHOLDER | unresolved citation queue | ✗ P0 |
Short-circuit rule: if `qc/reference_audit.json` already exists with a bib-hash match within 60s (P9 cache TTL, pending), the scan MAY reuse it; otherwise re-run. Never consume a stale audit from a prior manuscript revision.
Do NOT fabricate replacement references if any entry fails. Fix-forward belongs to `/search-lit` and `/lit-sync`, not to this skill. Self-review only reports the failure and blocks submission.
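Reading the audit output reduces to a small classifier. The JSON schema assumed below (an `entries` list with `citekey`/`verdict` fields) is a sketch -- adapt it to the real `qc/reference_audit.json` layout:

```python
import json
from pathlib import Path

def audit_findings(audit_path: str, at_submission_gate: bool) -> list[dict]:
    """Turn per-entry verify-refs verdicts into severity-tagged findings."""
    audit = json.loads(Path(audit_path).read_text())
    findings = []
    for entry in audit["entries"]:
        verdict = entry["verdict"]
        if verdict == "VERIFIED":
            continue
        if verdict == "FABRICATED":
            severity = "P0 Major"        # always blocks submission
        else:                            # UNVERIFIED
            severity = "Major" if at_submission_gate else "Minor"
        findings.append({"citekey": entry["citekey"], "severity": severity})
    return findings
```

The `at_submission_gate` flag implements the escalation rule: UNVERIFIED is tolerable mid-draft but becomes a Major Comment at any circulation or submission gate.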
Phase 3: Report
Generate a concise report with this structure:
```markdown
# Self-Review Report: {manuscript title}

**Target journal**: {journal}
**Manuscript type**: {type}
**Date**: {date}
**Overall assessment**: {1-2 sentences: key vulnerability and overall readiness}

## Anticipated Major Comments (fix before submission)

M1. **{Issue title}** [{Category letter}]
{1-2 sentences: what a reviewer would likely say, with specific manuscript location}
**Severity**: {Fatal | Fixable}
**Suggested fix**: {specific, actionable fix using existing data}

M2. ...

## Anticipated Minor Comments (address proactively)

m1. **{Issue}** [{Category}]: {1 sentence with location + fix}
m2. ...

## Strengths (emphasize in cover letter)

- {Specific strength 1}
- {Specific strength 2}
- ...
```
Conciseness targets:
- Anticipated Major Comments: 3-7 items, each 3-5 lines
- Anticipated Minor Comments: 3-6 items, each 1-2 sentences
- Strengths: 3-5 items, each 1 sentence
- Total report: 400-800 words (excluding optional R0 section)
Phase 3b: R0 Numbering (Optional)
If the user plans to use `/revise` after receiving actual reviews, offer to append R0-numbered output for pipeline compatibility:

```markdown
## R0 Pre-Submission Findings (for /revise cross-reference)

R0-1 [MAJ] {mapped from M1}: {issue title}
R0-2 [MAJ] {mapped from M2}: {issue title}
R0-3 [MIN] {mapped from m1}: {issue title}
...
```
When actual reviewer comments arrive as R1-N, the user can cross-reference which issues were anticipated (R0) vs. novel (R1-only).
Phase 3c: Structured JSON Output
When `--json` is passed, or when invoked by `/write-paper` Phase 7, append a machine-readable JSON block after the markdown report. Fence it with triple backticks and the json language tag so downstream parsers can extract it.

```json
{
  "self_review_version": "1.0",
  "manuscript_title": "...",
  "date": "YYYY-MM-DD",
  "overall_score": 72,
  "verdict": "REVISE",
  "fatal_count": 0,
  "major_count": 3,
  "minor_count": 4,
  "issues": [
    {
      "id": "M1",
      "severity": "major",
      "category": "C",
      "category_name": "Validation & Stats",
      "location": "Methods, paragraph 5",
      "description": "Calibration plot and Brier score absent for prediction model",
      "fixable_by_ai": true,
      "suggested_fix": "Add calibration analysis paragraph after discrimination results. Generate calibration plot via /make-figures."
    },
    {
      "id": "m1",
      "severity": "minor",
      "category": "F",
      "category_name": "Reporting Completeness",
      "location": "Abstract, line 3",
      "description": "Abstract reports AUC 0.91 but Table 2 shows 0.912 -- rounding inconsistency",
      "fixable_by_ai": true,
      "suggested_fix": "Change abstract to match table: AUC 0.91 (95% CI: 0.87-0.95)"
    }
  ]
}
```
Field definitions:
- `overall_score`: Integer 0-100 reflecting manuscript submission readiness
- `verdict`: `"PASS"` (score >= 85, no fatal issues) or `"REVISE"`
- `severity`: `"fatal"`, `"major"`, or `"minor"`
- `category`: Letter code from the 10-category system (A-J)
- `fixable_by_ai`: `true` if the issue can be resolved by editing manuscript text with existing data; `false` if it requires new data, analyses, or human judgment (e.g., design changes, IRB decisions, missing experiments)
- `suggested_fix`: Specific, actionable instruction. If `fixable_by_ai` is true, this must be concrete enough for the fixer to execute without ambiguity.
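Downstream consumers (e.g., an auto-fix pass) can pull the block with a short parser. A sketch -- the function names are illustrative:

```python
import json
import re

def extract_review_json(report_md: str) -> dict:
    """Extract the fenced json block appended after the markdown report."""
    m = re.search(r"```json\s*\n(.*?)\n```", report_md, re.DOTALL)
    if m is None:
        raise ValueError("no fenced JSON block found in report")
    return json.loads(m.group(1))

def fixable_issues(review: dict) -> list[dict]:
    """Issues eligible for --fix: fixable_by_ai and not design-level."""
    return [i for i in review["issues"]
            if i["fixable_by_ai"] and i["severity"] != "fatal"]
```

Filtering out `"fatal"` issues mirrors the auto-fix rule below: design-level problems are never edited automatically.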
Phase 4: Fix Support
Standard mode (no --fix flag)
After presenting the report, offer to help fix specific issues:
- Rewrite overclaiming sentences
- Draft missing limitation statements
- Suggest statistical additions (e.g., calibration analysis code via `/analyze-stats`)
- Check specific tables/figures for consistency
- Generate missing flow diagrams via `/make-figures`
Auto-fix mode (--fix flag)
When `--fix` is passed:
- Filter fixable issues: Select all issues where `fixable_by_ai` is true.
- Apply fixes sequentially: For each fixable issue, edit the manuscript file directly:
- Text rewrites (overclaiming, missing sentences, terminology) → Edit in place
- Missing reporting items (ethics statement, data availability) → Insert at suggested location
- Numerical inconsistencies (abstract-table mismatch) → Correct to match tables
- Do NOT attempt: new statistical analyses, new figures, design changes, IRB-dependent items
- Do NOT invoke other skills (`/make-figures`, `/analyze-stats`) during fix -- text edits only
- Report changes: After all fixes, output a summary:
```markdown
## Auto-Fix Summary
- Fixed: {N} issues
- Skipped (requires human): {M} issues
- Changes: {list of id + one-line description of what was changed}
- Re-review: Run Phase 2 (systematic check) again on the modified manuscript.
```
- Iterate: If new fixable issues emerge, apply one more round (maximum 2 total fix iterations).
- Final output: Regenerate the Phase 3 report and Phase 3c JSON with updated scores.
Iteration limit: Maximum 2 fix-and-re-review cycles. If the score has not reached "PASS" after 2 iterations, output the final report with remaining issues and flag: "Auto-fix limit reached. Remaining issues require human review."
What This Skill Does NOT Do
- Does not write the paper or rewrite entire sections
- Does not generate fake data or fabricate results
- Does not guarantee acceptance -- it reduces preventable reviewer criticism
- Does not replace formal peer review by an external reviewer
Tone
Be direct and practical. The user is the author -- they need honest feedback, not diplomatic hedging. Frame issues as what a reviewer would likely flag, helping the user see their paper through a reviewer's eyes.
For Fatal issues, be unambiguous: "A reviewer would likely flag this as a fundamental design concern. Submitting without addressing this risks Reject."
For Fixable issues, be constructive: "A reviewer would likely raise this as a Major Comment. Here is how to address it with your existing data."
Anti-Hallucination
- Never fabricate references. All citations must be verified via `/search-lit` with confirmed DOI or PMID. Mark unverified references as `[UNVERIFIED - NEEDS MANUAL CHECK]`. Self-review enforces this through Phase 2.5c: Reference Hallucination Scan (runs `/verify-refs` against the SSOT bib); any `FABRICATED` verdict blocks submission as a P0 Major Comment.
- Never invent clinical definitions, diagnostic criteria, or guideline recommendations. If uncertain, flag with `[VERIFY]` and ask the user.
- Never fabricate numerical results -- compliance percentages, scores, effect sizes, or sample sizes must come from actual data or analysis output.
- If a reporting guideline item, journal policy, or clinical standard is uncertain, state the uncertainty rather than guessing.