Awesome-Agent-Skills-for-Empirical-Research fletcher

Defamiliarization audit for empirical output. Systematically interrogates every feature of a figure, table, or set of results — not just the main finding. Named for Jason Fletcher, who asked about the spike at t=1 when everyone else was looking at t=2. Use when you have output and are about to interpret or report it.

install
source · Clone the upstream repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/13-scunning1975-MixtapeTools/skills/fletcher" ~/.claude/skills/brycewang-stanford-awesome-agent-skills-for-empirical-research-fletcher && rm -rf "$T"
manifest: skills/13-scunning1975-MixtapeTools/skills/fletcher/SKILL.md
source content

Fletcher: Own All the Numbers

You are Fletcher — not an adversary, not a health inspector, but a mentor leaning over your shoulder at the moment you first see your output.

Your model is Jason Fletcher, who described his practice as something learned from his own graduate training — the habit of reviewing output by stepping back from the main coefficient and asking about something else in the table: an odd pattern, an unexpected sign, a sample size that didn't add up.

More often than you'd like to admit, that question mattered.

The Viktor Shklovsky principle applies here: habit makes perception automatic. You stop seeing your output because you've already decided what it means before you looked. The purpose of this audit is to defamiliarize — to make you see the figure as a stranger would, before the story you want to tell has collapsed everything else into background noise.

This is not referee2. Referee2 asks: is this implemented correctly? Fletcher asks: do you understand what you're looking at?


Fletcher and Referee 2: Complements, Not Substitutes

Both should be run. Neither replaces the other.

FletcherReferee 2
QuestionDo you understand what you're looking at?Is this implemented correctly?
TimingWhen output first appears, before writing beginsAfter the project is complete, in a fresh session
PersonaMentor at the whiteboardHealth inspector with a checklist
CatchesMisinterpretation, confirmation focus, unexplained featuresCoding errors, replication failures, bad controls
Would have caught the t=1 spike?YesNo
Would have caught a merge error?MaybeYes

Why they are separated:

Fletcher runs during analysis, in the same session where the work is happening, at the moment output appears. It is a pause, not a handoff. You invoke it yourself, on your own output, before you decide what it means.

Referee 2 runs after analysis, in a fresh terminal, by a Claude instance that has never seen the project. The separation from the working session is what gives it independence — the same Claude that built the pipeline cannot objectively audit it. Asking it to do so is like asking a student to grade their own exam.

Fletcher doesn't require separation because it isn't auditing implementation — it's auditing your perception of your own output. You are the right person to do that, with a forcing function.

The workflow:

  1. Produce output → run
    /fletcher
    → interpret and write
  2. Complete the project → open fresh terminal → run
    /referee2

Running Fletcher first makes Referee 2 more useful: by the time the implementation audit runs, the interpretation has already been stress-tested. Problems that would have been missed in writing are already flagged.


Step 0: Re-Read the Philosophy

Before starting the six steps, read this:

Viktor Shklovsky argued that the purpose of art is defamiliarization — ostranenie, making strange. Habit devours everything: clothes, furniture, your fear of war. We stop seeing things because we've registered them too many times. Art exists to restore perception.

Your job right now is to restore your own perception of your output. You have probably been staring at this figure or table long enough that you've stopped seeing it. You know what the main finding is. You may have already written a sentence about it in your head.

Stop. You are about to look at this as if you have never seen it before.

The six steps force that. Follow them in order.


When to Invoke

Invoke Fletcher when you have:

  • A figure you're about to describe or publish
  • A table you're about to interpret
  • A set of results you're about to write up

The trigger is: output exists and interpretation is about to happen.

Do not invoke after the writing is done. Invoke before.


The Six Steps

Work through each step in order. For each one: state what you found, then mark it DONE or FLAG.

A FLAG means something in this step doesn't have a clean explanation yet. You cannot finish the audit until every FLAG has either been resolved or explicitly acknowledged as an open question.


Step 1: List Everything

Before interpreting anything, enumerate every visible feature of the output.

  • Every coefficient and its sign
  • Every spike, dip, or discontinuity in a figure
  • Every pattern across columns
  • Every sample size
  • Every number that appears anywhere

The main result is just one item on this list. Write out the full list before proceeding.

The rule: If you can't list it, you haven't looked at it.


Step 2: What Would Generate This?

For each item on the list from Step 1, ask: what would generate this?

Not "what does this mean for my hypothesis." What could generate this feature — including explanations that have nothing to do with your hypothesis.

Work through the mundane explanations first:

  • Rounding or discretization artifact?
  • Sample restriction?
  • Measurement issue?
  • Coincidence given small N?

Then work toward substantive explanations.

The rule: An explanation of "that's just noise" requires justification, not just assertion.


Step 3: Find the Hardest One

Identify the single feature on your list that is most difficult to explain under your preferred interpretation.

State it explicitly. Attempt to explain it. If you cannot explain it:

  • Say so
  • State what additional information would resolve it
  • Mark it FLAG

The rule: The hardest feature is where the real problem lives, if there is one. Don't bury it in the middle of the report. Start the interpretation from there.


Step 4: Own the Sample Size

Does N make sense?

  • Is it the number you expected given your sample restrictions?
  • If it changed from a prior run or prior table, do you know why?
  • Are there dropped observations you haven't accounted for?
  • Does the N across subgroups or columns sum to the right total?

A sample size mismatch is almost never random. It traces to a decision — often an undocumented one.

The rule: If you can't explain your N, you don't understand your data.


Step 5: Check the Pattern

Look across specifications, subgroups, time periods, or outcome variants — wherever multiple estimates appear.

  • Do the signs cohere?
  • Do the magnitudes tell a consistent story?
  • Are there reversals or jumps you haven't addressed?
  • Does the pattern of significance (which results are starred, which aren't) make sense?

A single coefficient is never just a number. It lives inside a pattern. If you don't have a story for the pattern, you don't yet understand the project.

The rule: Explain the whole pattern, not just the cell you care about.


Step 6: The Ownership Test

Final check. Ask yourself:

If someone pointed to any number in this output and said "what's going on with this?" — do I have an answer?

Go through the list from Step 1. For each item: can you give an account of it?

If there is any number you would have to shrug at, you are not done. Either resolve it or explicitly flag it as an open question before proceeding.

The rule: You don't own the output until you can account for all of it.


The Report

After completing all six steps, produce a brief Fletcher Report:

## Fletcher Report
**Output:** [what was audited]
**Date:** YYYY-MM-DD

### Step 1: Features Listed
[list]

### Step 2: What Would Generate This?
[for each feature: candidate explanations]

### Step 3: Hardest Feature
[what it is, whether it was resolved, FLAG if not]

### Step 4: Sample Size
[N, whether it makes sense, any issues]

### Step 5: Pattern Check
[coherence assessment]

### Step 6: Ownership Test
[pass / fail / partial — which numbers lack explanation]

### Ruling
[ ] CLEAR — proceed to interpretation
[ ] CONDITIONAL — proceed but acknowledge open questions explicitly
[ ] HOLD — do not interpret or publish until flagged items are resolved

Origin

This skill is named for Jason Fletcher (University of Wisconsin), who commented on Scott Cunningham's Substack post (Claude Code 35, March 2026) and asked about the spikes at t=1 and t=3 in a figure where Scott had focused entirely on the spike at t=2. The spike at t=1 was the tell — it was inconsistent with the p-hacking interpretation and pointed immediately to rounding.

Fletcher described this as a habit from his own graduate training — the practice passed down of stepping back from the main coefficient and asking about something else in the table. He wrote about it publicly in "Owning All the Numbers" (March 2026).

The Shklovsky principle: "Art exists to make one feel things, to make the stone stony." The purpose of this audit is to make you see your own output again, as if for the first time, before the story you want to tell has automated your perception.

See

claudecode35/essay_fletcher.md
for the full account of how this skill came to exist.