Awesome-Agent-Skills-for-Empirical-Research paper-critique-framework

Structured framework for writing peer review reports and paper critiques

install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/43-wentorai-research-plugins/skills/research/paper-review/paper-critique-framework" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-paper-critique-fr && rm -rf "$T"
manifest: skills/43-wentorai-research-plugins/skills/research/paper-review/paper-critique-framework/SKILL.md

Paper Critique and Peer Review Framework

Overview

Writing constructive peer reviews is a core academic skill. This framework provides a systematic approach to evaluating manuscripts, from the initial read-through to the final referee report. It covers what reviewers should assess, how to structure feedback, and how to calibrate a recommendation on the scale from strong accept to strong reject. It applies to conference papers, journal articles, and internal lab reviews.

The Three-Pass Review Method

Pass 1: Orientation (15-20 minutes)

Read only these elements:

  • Title, abstract, and keywords
  • Introduction (first and last paragraphs)
  • Section headings and figure captions
  • Conclusion

After Pass 1, answer:

□ What is the main claim?
□ What type of contribution? (empirical, theoretical, system, survey)
□ Is it within the venue's scope?
□ Does the abstract accurately represent the content?
□ Initial impression: novel or incremental?

Pass 2: Detailed Read (60-90 minutes)

Read the full paper. Annotate as you go:

Annotation symbols:
  ? = I don't understand this
  ! = This is interesting / strong point
  X = I disagree / see a problem
  → = This needs more evidence or justification
  ≈ = This is similar to [existing work] — check novelty

Focus on:

  • Claims vs. evidence: Is every major claim supported by data?
  • Methodology: Are the methods appropriate for the research question?
  • Experimental design: Are baselines fair? Are ablations sufficient?
  • Figures and tables: Do they support the narrative? Are they readable?
  • Writing quality: Is it clear, concise, and well-organized?

Pass 3: Verification (30-60 minutes)

For papers you're seriously evaluating:

  • Check key references — do they say what the authors claim?
  • Verify mathematical derivations (spot-check, not exhaustive)
  • Examine statistical claims (p-values, confidence intervals, effect sizes)
  • Check for cherry-picking in results (only best runs? selected metrics?)
  • Look for missing baselines that should have been compared
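
One way to examine a reported effect during Pass 3 is to recompute it from the summary statistics in the paper's tables. The following is a minimal Python sketch, assuming the paper reports per-group means, standard deviations, and sample sizes. The function names and the example numbers are hypothetical, and the interval uses a normal approximation rather than an exact t-based interval, so treat it as a rough plausibility check only.

```python
import math
from statistics import NormalDist


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


def mean_diff_ci(mean_a, sd_a, n_a, mean_b, sd_b, n_b, level=0.95):
    """Approximate CI for the difference in means (normal approximation)."""
    se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = mean_a - mean_b
    return diff - z * se, diff + z * se


# Hypothetical numbers, as a reviewer might copy them from a results table.
d = cohens_d(0.82, 0.05, 30, 0.79, 0.06, 30)
low, high = mean_diff_ci(0.82, 0.05, 30, 0.79, 0.06, 30)
print(f"Cohen's d = {d:.2f}, 95% CI for the difference = [{low:.3f}, {high:.3f}]")
```

If the recomputed interval clearly contradicts the paper's claim (for example, it comfortably includes zero while the text reports a significant gain), that is worth raising as a question for the authors rather than as an accusation, since the paper may have used a different test.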

Review Report Structure

## Summary (3-5 sentences)
[Describe what the paper does, the approach, and the main finding.
 Demonstrate that you understood the paper.]

## Strengths (bulleted list)
- S1: [Specific strength with evidence from the paper]
- S2: [Another strength]
- S3: [Another strength]

## Weaknesses (bulleted list, ordered by severity)
- W1 (Major): [Specific weakness + why it matters + suggestion to fix]
- W2 (Major): [Another major weakness]
- W3 (Minor): [A less critical issue]
- W4 (Minor): [Another minor issue]

## Questions for Authors
- Q1: [Something you'd like clarified]
- Q2: [A concern that the authors might be able to address]

## Detailed Comments
[Page/line-specific comments, typos, suggestions]

## Overall Assessment
Recommendation: [Strong Accept / Accept / Weak Accept / Borderline /
                  Weak Reject / Reject / Strong Reject]
Confidence: [High / Medium / Low]
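
The report structure above can also be kept as structured data and rendered to text, which makes it easy to enforce that every weakness carries a suggestion. This is a minimal sketch, assuming Python 3.9 or later; the `Review` and `Weakness` classes, the `render` function, and the sample review are hypothetical illustrations and not part of the framework itself.

```python
from dataclasses import dataclass, field


@dataclass
class Weakness:
    severity: str    # "Major" or "Minor"
    issue: str
    suggestion: str  # every weakness should carry a suggested fix


@dataclass
class Review:
    summary: str
    strengths: list[str]
    weaknesses: list[Weakness]
    questions: list[str]
    recommendation: str
    confidence: str
    detailed_comments: list[str] = field(default_factory=list)


def render(review: Review) -> str:
    """Render the review as text following the report structure."""
    for w in review.weaknesses:
        if not w.suggestion.strip():
            raise ValueError(f"Weakness lacks a suggestion: {w.issue!r}")
    # Major weaknesses first; sorted() is stable, so ties keep their order.
    ordered = sorted(review.weaknesses, key=lambda w: w.severity != "Major")
    lines = ["## Summary", review.summary, "", "## Strengths"]
    lines += [f"- S{i}: {s}" for i, s in enumerate(review.strengths, 1)]
    lines += ["", "## Weaknesses"]
    lines += [
        f"- W{i} ({w.severity}): {w.issue} Suggestion: {w.suggestion}"
        for i, w in enumerate(ordered, 1)
    ]
    lines += ["", "## Questions for Authors"]
    lines += [f"- Q{i}: {q}" for i, q in enumerate(review.questions, 1)]
    lines += ["", "## Detailed Comments", *review.detailed_comments]
    lines += ["", "## Overall Assessment",
              f"Recommendation: {review.recommendation}",
              f"Confidence: {review.confidence}"]
    return "\n".join(lines)


# Hypothetical review content, for illustration only.
review = Review(
    summary="Hypothetical summary of the submission.",
    strengths=["Clear problem statement."],
    weaknesses=[
        Weakness("Minor", "Figure 2 lacks axis labels.", "Label both axes."),
        Weakness("Major", "No ablation study.", "Remove one component at a time."),
    ],
    questions=["How were hyperparameters chosen?"],
    recommendation="Weak Accept",
    confidence="Medium",
)
text = render(review)
print(text)
```

The renderer reorders weaknesses so that major issues come first, matching the "ordered by severity" convention, and it refuses to render a weakness that has no suggestion attached.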

Assessment Criteria by Dimension

| Dimension | Questions to Ask | Weight |
|---|---|---|
| Novelty | Is the idea new? Is the contribution beyond incremental? | High |
| Significance | Would this matter to the community? Does it advance the field? | High |
| Soundness | Are the methods correct? Are conclusions supported? | High |
| Clarity | Is it well-written? Can it be understood and reproduced? | Medium |
| Completeness | Are related works covered? Are experiments thorough? | Medium |
| Reproducibility | Could someone replicate this? Code/data available? | Medium |
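
As a rough aid, the dimensions and weights above can be combined into a single weighted score. The sketch below is only illustrative: the framework states weights as "High" or "Medium", so the numeric values (3 and 2) and the 1-5 rating scale are assumptions chosen for this example, and the sample ratings are hypothetical.

```python
# Assumed numeric values for the qualitative weights; the framework itself
# only says "High" or "Medium".
WEIGHT_VALUE = {"High": 3, "Medium": 2}

DIMENSION_WEIGHT = {
    "Novelty": "High",
    "Significance": "High",
    "Soundness": "High",
    "Clarity": "Medium",
    "Completeness": "Medium",
    "Reproducibility": "Medium",
}


def weighted_score(ratings: dict) -> float:
    """Weighted mean of per-dimension ratings (hypothetical 1-5 scale)."""
    missing = set(DIMENSION_WEIGHT) - set(ratings)
    if missing:
        raise ValueError(f"Missing ratings for: {sorted(missing)}")
    total = sum(
        WEIGHT_VALUE[DIMENSION_WEIGHT[dim]] * ratings[dim]
        for dim in DIMENSION_WEIGHT
    )
    return total / sum(WEIGHT_VALUE[w] for w in DIMENSION_WEIGHT.values())


# Hypothetical ratings for one submission.
ratings = {"Novelty": 4, "Significance": 3, "Soundness": 4,
           "Clarity": 2, "Completeness": 3, "Reproducibility": 2}
print(round(weighted_score(ratings), 2))
```

A number like this can help compare impressions across a batch of papers, but it should not replace the written assessment: a serious soundness problem can justify rejection regardless of the average.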

Calibration Guide

Strong Accept: Significant contribution, technically sound, well-written.
  Would be a highlight of the venue.

Accept: Solid contribution with minor issues. Advances the field.
  Worth publishing as-is or with minor revisions.

Weak Accept: Has merit but notable weaknesses. Contribution is real but modest.
  Marginally above the bar for this venue; would likely be accepted at a less selective venue.

Borderline: Equal arguments for and against. Significant weaknesses offset
  by some novelty. Depends on other reviews.

Weak Reject: Interesting direction but fundamental issues not addressed.
  Major revisions needed that likely require a new submission cycle.

Reject: Significant problems in novelty, soundness, or relevance.
  Not suitable for this venue even with revisions.

Strong Reject: Fundamental flaws. Clearly below threshold.

Common Review Pitfalls to Avoid

| Pitfall | Better Approach |
|---|---|
| "The writing needs improvement" (vague) | Give 2-3 specific examples with suggested fixes |
| Rejecting for not solving YOUR problem | Evaluate the paper on its own stated goals |
| Demanding impossible experiments | Suggest feasible improvements within scope |
| Ignoring supplementary material | Check the appendix; authors may have addressed your concern |
| Being harsh without being constructive | Every weakness should include a suggestion for improvement |
| Reviewing too quickly | Block dedicated time; a rushed review harms both authors and science |
| Citing only your own work as "missing" | Only cite if genuinely relevant, not self-promotion |

Reviewing Different Paper Types

Empirical Papers

  • Are datasets described completely? (Size, source, splits, preprocessing)
  • Are baselines appropriate and fairly tuned?
  • Statistical significance: are error bars, multiple runs, and significance tests reported?
  • Ablation studies: which components contribute to the gain?
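
When a paper reports scores for several runs, a reviewer can check whether the claimed gain is distinguishable from run-to-run noise. The following sketch uses an exact paired sign-flip permutation test, assuming the proposed method and the baseline were run with matched seeds so that the scores can be paired. The function name and the scores are hypothetical, and full enumeration is only practical for a small number of runs.

```python
from itertools import product


def sign_flip_p_value(method_scores, baseline_scores):
    """Exact two-sided p-value for the mean paired difference.

    Under the null hypothesis that the two methods are interchangeable,
    each per-seed difference is equally likely to have either sign, so we
    enumerate all 2**n sign assignments.
    """
    diffs = [m - b for m, b in zip(method_scores, baseline_scores)]
    observed = abs(sum(diffs))
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point rounding.
        if flipped >= observed - 1e-12:
            count += 1
    return count / total


# Hypothetical per-seed accuracies copied from a results table.
method = [0.81, 0.83, 0.80, 0.84, 0.82]
baseline = [0.79, 0.80, 0.79, 0.80, 0.80]
p_value = sign_flip_p_value(method, baseline)
print(p_value)
```

Even when the method wins on every seed, as in this example, five paired runs cannot yield a two-sided p-value below 2/32 = 0.0625 under this test. Asking the authors for more runs is therefore often a more useful request than asking for a different test.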

Systems Papers

  • Is the system actually built and evaluated (not just designed)?
  • Performance at scale: are latency, throughput, and resource usage measured?
  • Is the system compared with existing systems (not just algorithms)?
  • Is there deployment experience or a case study?

Survey/Position Papers

  • Coverage: are major relevant works included?
  • Taxonomy: is the organizational framework useful and complete?
  • Analysis: does the survey add value beyond listing papers?
  • Recency: are the most recent advances covered?

Ethics Considerations

During review, check:
□ Are human subjects involved? Was IRB/ethics approval obtained?
□ Are there potential harms from the technology described?
□ Is the data collection ethical? (Consent, privacy, bias)
□ Are dual-use concerns addressed? (Misuse potential)
□ Are limitations and societal implications discussed?
