skilllibrary · skeptic-pass

Run a deliberately doubtful review that challenges every claim in a proposal, demands evidence for assertions, and flags optimism bias. Use when the user says "skeptic pass", "challenge these claims", "what's hand-wavy here", "where's the proof", "poke holes in the reasoning", or when a proposal reads too confidently for the evidence behind it. Do not use for adversarial exploitation (use red-team-challenge), hidden dependency discovery (use assumptions-audit), or structured option comparison (use tradeoff-analysis).

install
source · Clone the upstream repo
git clone https://github.com/merceralex397-collab/skilllibrary
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/merceralex397-collab/skilllibrary "$T" && mkdir -p ~/.claude/skills && cp -r "$T/04-planning-review-and-critique/skeptic-pass" ~/.claude/skills/merceralex397-collab-skilllibrary-skeptic-pass && rm -rf "$T"
manifest: 04-planning-review-and-critique/skeptic-pass/SKILL.md
source content

Purpose

Read a proposal as a demanding peer reviewer who needs convincing. Every claim is guilty until proven innocent. The goal is not to attack the plan (that is red-teaming) or to find hidden dependencies (that is assumptions audit)—the goal is to grade the evidence quality behind each assertion and flag where confidence outpaces proof.

This differs from related skills:

  • red-team-challenge adopts an adversary's mindset and tries to exploit weaknesses. Skeptic-pass stays in reviewer mode and demands evidence.
  • assumptions-audit extracts hidden dependencies. Skeptic-pass evaluates whether stated claims are actually supported.
  • steelman rebuilds arguments in their strongest form. Skeptic-pass tests whether arguments hold up under scrutiny.

When to use this skill

Use when:

  • the user says "skeptic pass", "challenge these claims", "what's hand-wavy", "where's the proof"
  • a proposal reads confidently but cites no evidence, benchmarks, or prior art
  • a plan uses quantitative claims (latency targets, cost savings, adoption rates) without sourcing them
  • a design document needs intellectual honesty review before stakeholder presentation
  • after a steelman pass, to test whether even the strongest version holds up

Do NOT use when:

  • the task is adversarial exploitation of a system (use red-team-challenge)
  • the task is finding unstated dependencies (use assumptions-audit)
  • the task is judging execution order and safety of a plan (use plan-review)
  • the evidence is already known to be absent and the goal is just to list what's missing (use gap-analysis)

Operating procedure

  1. Extract every claim. Read the document and list every assertion that could be true or false. Include:

    • Factual claims ("Our API handles 10k req/s")
    • Causal claims ("This will reduce churn by 20%")
    • Feasibility claims ("This can be done in two sprints")
    • Comparative claims ("Option A is faster than Option B")
    • Consensus claims ("The team agrees that…", "Industry best practice is…")
    • Absence claims ("There is no risk of…", "This won't affect…")
  2. Grade the evidence behind each claim (see the data-model sketch after this list). For each claim, assign one of:

    • Proven: Supported by cited data, benchmark, test result, or authoritative source
    • Plausible: Reasonable but unsupported—no evidence cited, just sounds right
    • Asserted: Stated as fact with no supporting argument at all
    • Contradicted: Available evidence suggests the opposite
    • Unfalsifiable: Stated in a way that cannot be tested or disproven ("this will be robust")
  3. Identify optimism bias patterns. Flag these common patterns when present:

    • Planning fallacy: Timeline assumes everything goes right, no buffer for unknowns
    • Anchoring: A specific number is used without justification, and all reasoning anchors to it
    • Survivorship bias: "Company X did this successfully" without noting failures
    • Appeal to novelty: Preferring the new approach because it's new, not because evidence supports it
    • Sunk cost: Continuing a direction because of past investment, not future value
    • Consensus illusion: "Everyone agrees" when no explicit poll or decision was recorded
    • Precision theater: False precision (e.g., "this will take 14.5 days") masking deep uncertainty
  4. Stress-test the causal claims. For every "X will cause Y" or "doing X will result in Y":

    • Is the mechanism explained or just asserted?
    • Are there confounding factors?
    • Has this causal chain been observed before, or is it hoped for?
    • What would disprove this claim?
  5. Challenge the numbers. For every quantitative claim:

    • Where does this number come from?
    • Is it a measurement, estimate, or guess?
    • What's the confidence interval?
    • Is it cited or derived?
    • Does it survive a sanity check against known baselines?
  6. Produce a credibility assessment. Grade the overall document:

    • Well-supported: Most claims are proven or plausible with clear reasoning
    • Mixed: Some claims are supported, others are pure assertion
    • Weakly-supported: Most claims are asserted without evidence
    • Overconfident: Confidence level far exceeds evidence quality
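
If you want to track claims mechanically rather than in prose, steps 1, 2, and 5 map onto a small data model. A minimal Python sketch with purely illustrative names (the skill itself prescribes no code, and the 10x tolerance below is an assumed default, not part of the method):

from dataclasses import dataclass, field
from enum import Enum

class Grade(Enum):
    PROVEN = "proven"                # cited data, benchmark, test, or authoritative source
    PLAUSIBLE = "plausible"          # reasonable but unsupported
    ASSERTED = "asserted"            # stated as fact with no supporting argument
    CONTRADICTED = "contradicted"    # available evidence suggests the opposite
    UNFALSIFIABLE = "unfalsifiable"  # cannot be tested or disproven

@dataclass
class Claim:
    text: str    # the assertion, quoted from the document
    kind: str    # factual / causal / feasibility / comparative / consensus / absence
    grade: Grade
    evidence_cited: list[str] = field(default_factory=list)
    evidence_needed: list[str] = field(default_factory=list)

def survives_sanity_check(claimed: float, baseline: float, tolerance: float = 10.0) -> bool:
    """Step 5 helper: returns False when a quantitative claim is more than
    `tolerance` times above or below a known baseline."""
    return baseline / tolerance <= claimed <= baseline * tolerance

For example, the causal claim quoted in step 1 would be recorded as:

churn_claim = Claim(
    text="This will reduce churn by 20%",
    kind="causal",
    grade=Grade.ASSERTED,
    evidence_needed=["cohort or A/B data linking the change to churn"],
)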

Output contract

Return a Skeptic Report (a schema sketch follows the list) with:

  1. Claims Inventory — numbered list of extracted claims
  2. Evidence Grades — table: Claim | Grade (Proven/Plausible/Asserted/Contradicted/Unfalsifiable) | Evidence Cited | Evidence Needed
  3. Optimism Bias Flags — specific patterns identified with the triggering text
  4. Weakest Claims — the 3-5 claims with the largest gap between confidence and evidence
  5. Overall Credibility — one of the four grades above, with a one-paragraph justification
  6. Evidence Collection Plan — for the weakest claims, what specific evidence would resolve the doubt
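
Continuing the hypothetical data model from the operating-procedure sketch (it reuses Claim and Grade defined there; field names are illustrative, and only the six sections above are the actual contract):

from dataclasses import dataclass

@dataclass
class SkepticReport:
    claims_inventory: list[Claim]      # 1. numbered by position in this list
    # 2. evidence grades live on each Claim (grade, evidence_cited, evidence_needed)
    bias_flags: list[tuple[str, str]]  # 3. (pattern name, triggering text)
    weakest_claims: list[Claim]        # 4. the 3-5 largest confidence/evidence gaps
    overall_credibility: str           # 5. well-supported / mixed / weakly-supported / overconfident
    credibility_justification: str     # 5. one-paragraph justification
    evidence_plan: dict[str, str]      # 6. weakest claim text -> evidence that would resolve the doubt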

Named failure modes of this method

  • Cynicism spiral: Doubting everything equally instead of prioritizing claims that matter. Fix: focus skepticism on claims that drive decisions, not background context.
  • Demanding impossible proof: Requiring RCT-level evidence for reasonable engineering judgment. Fix: calibrate evidence expectations to the claim's impact—high-stakes claims need strong evidence, low-stakes claims need plausibility.
  • Missing the forest: Grading individual claims without assessing whether the overall argument holds even if some claims are weak. Fix: always include the overall credibility assessment.
  • Style policing: Flagging vague language when the underlying claim is actually well-supported elsewhere. Fix: check whether evidence exists before flagging weak phrasing.
  • Confirmation bias in review: Only challenging claims you disagree with while accepting others at face value. Fix: grade every claim, including ones that feel obviously true.

References

  • Kahneman, D. (2011). Thinking, Fast and Slow — cognitive biases in judgment
  • RFC 7282 (https://www.rfc-editor.org/rfc/rfc7282.html) — on consensus and objection handling in technical review
  • Tetlock, P. (2015). Superforecasting — calibrating confidence to evidence
  • Sagan, C. "Extraordinary claims require extraordinary evidence" — evidence proportionality principle

Failure handling

  • If the document is too short to contain meaningful claims, say so rather than manufacturing doubt.
  • If all claims are well-supported, report that the skeptic pass found high credibility—do not force criticism.
  • If the document is a brainstorm or early draft, adjust evidence expectations downward and note that a full skeptic pass requires a more developed proposal.