Claude-skill-registry implement-paper-from-scratch
Guides you through implementing a research paper step-by-step from scratch. Use when asked to implement a paper, code up a paper, reproduce research results, or build a model from a paper. Focuses on building understanding through implementation with checkpoint questions.
git clone https://github.com/majiayu000/claude-skill-registry
T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/implement-paper-from-scratch" ~/.claude/skills/majiayu000-claude-skill-registry-implement-paper-from-scratch && rm -rf "$T"
skills/data/implement-paper-from-scratch/SKILL.mdImplement Paper From Scratch
The best way to truly understand a paper is to implement it. This skill guides you through that process methodically.
Philosophy
- No copy-pasting from reference implementations - We build understanding, not just working code
- Checkpoint questions verify understanding - You should be able to answer "why" at each step
- Minimal dependencies - Use NumPy/PyTorch fundamentals, not high-level wrappers
- Deliberate debugging - Bugs are learning opportunities, not obstacles
Process
Phase 1: Pre-Implementation Analysis
Before writing any code:
-
Identify the core algorithm - Strip away ablations, extensions, bells and whistles. What's the minimal version?
-
List the components - Break into modules:
- Data pipeline
- Model architecture
- Loss function(s)
- Training loop
- Evaluation metrics
-
Find the tricky parts - What's non-obvious?
- Custom layers or operations
- Numerical stability concerns
- Hyperparameter sensitivity
- Implementation details buried in appendices
-
Gather reference numbers - What should we expect?
- Training loss trajectory
- Validation metrics at convergence
- Compute requirements (if stated)
Phase 2: Scaffolded Implementation
Build up the implementation in this order:
Step 1: Data
# Start with synthetic/toy data # Verify shapes and types before touching real data
Checkpoint: Can you describe what each tensor represents and its expected shape?
Step 2: Model Architecture
# Build layer by layer # Print shapes at each stage # Verify parameter counts match paper
Checkpoint: If you randomly initialize and do a forward pass, do the output shapes match what the paper describes?
Step 3: Loss Function
# Implement exactly as described # Test with known inputs/outputs # Check gradient flow
Checkpoint: Can you explain each term in the loss and why it's there?
Step 4: Training Loop
# Minimal loop first (no logging, checkpointing, etc.) # Verify loss decreases on tiny overfit test # Then add bells and whistles
Checkpoint: Can you overfit a single batch? If not, something is broken.
Step 5: Evaluation
# Implement paper's exact metrics # Compare against reported numbers
Checkpoint: On the same data split, how close are you to paper's numbers?
Phase 3: The Debugging Gauntlet
When it doesn't work (and it won't at first):
-
The Overfit Test
- Can you memorize 1 example? 10? 100?
- If not, architecture or gradient bug
-
The Gradient Check
- Are gradients flowing to all parameters?
- Any NaN or exploding gradients?
-
The Initialization Check
- Match paper's initialization exactly
- This matters more than people think
-
The Learning Rate Sweep
- Log scale: 1e-5 to 1e-1
- Loss should decrease for some range
-
The Ablation Debug
- Remove components until it works
- Add back one at a time
Phase 4: Checkpoint Questions
At each stage, you should be able to answer:
Understanding:
- Why does this component exist?
- What would happen without it?
- What alternatives were considered?
Implementation:
- Why this specific implementation choice?
- Where could numerical issues arise?
- What's the computational complexity?
Debugging:
- What would it look like if this was broken?
- How would you test this in isolation?
- What are the most likely bugs?
Output Format
For each implementation session, provide:
## Today's Implementation Goal [Specific component we're building] ## Prerequisites Check - [ ] Previous components working - [ ] Understand what we're building - [ ] Know expected behavior ## Implementation ### Code [Code blocks with extensive comments] ### Checkpoint Questions 1. [Question] <details><summary>Answer</summary>[Answer]</details> 2. [Question] <details><summary>Answer</summary>[Answer]</details> ### Verification Steps - [ ] Test 1: [What to check] - [ ] Test 2: [What to check] ### Common Bugs at This Stage 1. [Bug pattern]: [How to identify and fix] ## What's Next [Preview of next component and how it connects]
Tips for Specific Paper Types
Transformer-based
- Attention mask shapes are the #1 bug source
- Verify positional encoding is applied correctly
- Check layer norm placement (pre vs post)
RL/Policy Gradient
- Sign errors in policy gradient are silent killers
- Advantage normalization matters
- Verify discount factor handling
Generative Models
- KL term balancing is finicky
- Check latent space distribution
- Verify reconstruction looks reasonable before training
Computer Vision
- Normalization (ImageNet stats, batch norm) is crucial
- Data augmentation can make or break results
- Verify input preprocessing matches paper exactly
Success Criteria
You're done when:
- Numbers match - Within reasonable variance of paper's results
- Understanding is deep - You can explain every line of code
- You found the gotchas - You know what breaks and why
- You could modify it - Confident to try your own variations
Anti-Patterns to Avoid
- ❌ Copying code you don't understand
- ❌ Skipping checkpoint questions
- ❌ Using pre-built components for core algorithm
- ❌ Ignoring discrepancies with paper
- ❌ Moving on before current step works