Claude-skill-registry judgment-postmortem-calibration

Build VC judgment faster through structured postmortems with quantified calibration: log initial takes, track prediction accuracy with Brier scores, and measure learning rate over time. Use after decisions, passes, and major diligence sprints.

install

source · Clone the upstream repo

git clone https://github.com/majiayu000/claude-skill-registry

Claude Code · Install into ~/.claude/skills/

T=$(mktemp -d) && git clone --depth=1 https://github.com/majiayu000/claude-skill-registry "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data/judgment-postmortem-calibration" ~/.claude/skills/majiayu000-claude-skill-registry-judgment-postmortem-calibration && rm -rf "$T"

manifest: skills/data/judgment-postmortem-calibration/SKILL.md

Judgment postmortem calibration

When to use

Use this skill when you want to:

Improve selection judgment (faster learning, fewer repeated mistakes)
Capture why you said yes/no and how evidence changed your view
Measure your prediction accuracy and learning rate over time
Build an internal "decision log" that compounds
Review investments or passes after outcomes are known

Trigger points:

After every IC decision (invest or pass)
After every competitive loss
Quarterly: review passes that raised from others + calculate calibration metrics
Annually: review portfolio outcomes vs initial thesis + measure learning rate

Inputs you should request (only if missing)

Deal name + date of first meeting
Your initial take (reconstruct honestly if not documented)
Outcome to date (funded by others? traction? pivot? shut down?)
Original memo or notes (if available)
Your probability estimates at decision time (if recorded)

Outputs you must produce

Decision log entry (structured, one page)
Calibration scorecard (predictions vs reality with scores)
Brier score calculation (for probabilistic predictions)
Updated heuristics (2-5 actionable bullets)
Pattern library update (what archetype was this?)
Learning rate metrics (are you getting better?)
Follow-up list (who to ping, what to track)

Templates:

assets/decision-log.md
assets/calibration-tracker.csv (for longitudinal tracking)

Core principle: Measure what you believe, then score it

The value of postmortems comes from:

Honest recording of what you believed at decision time
Quantified predictions (probabilities, not just "I thought X")
Systematic scoring against outcomes
Tracking improvement over time

Procedure

1) Capture the timeline with predictions

Date	Event	Your belief	Confidence (%)
First meeting	"This will be a $1B+ outcome"	20%
First meeting	"Product-market fit within 12 months"	60%
Diligence	"They'll close 3 enterprise deals in 6 months"	40%
Decision	"Worth investing"	70%
+12 months			Actual outcome

2) Record the initial thesis with probabilities

At first meeting, I believed:

What would make the company win:
P(success | investment) estimate: ___%
P(this raises next round) estimate: ___%
P(achieves stated 12-month milestones) estimate: ___%
Top risk and P(risk materializes): ___%
My recommendation:

At decision point, I believed:

P(success | investment): ___%
P(raises next round): ___%
Top risks with probabilities:
1. Risk: ___ | P(materializes): ___%
2. Risk: ___ | P(materializes): ___%
Final recommendation:

3) Document the decision

Decision: Invest / Pass / Lost competitive
Stated rationale (at the time):
Unstated factors (be honest):
Confidence in decision: ___%

4) Score predictions against reality

Prediction scorecard:

Prediction	Your P(true)	Actual (1/0)	Brier contribution
"This will raise Series A"	70%	1 (yes)	(0.7-1)² = 0.09
"Product-market fit in 12mo"	60%	0 (no)	(0.6-0)² = 0.36
"3 enterprise deals in 6mo"	40%	0 (no)	(0.4-0)² = 0.16
"Top risk materializes"	30%	1 (yes)	(0.3-1)² = 0.49

Brier score for this deal: (sum of contributions) / n = ___

Perfect = 0.0, Random = 0.25, Always wrong = 1.0
Good forecaster: < 0.20
Reasonable forecaster: 0.20 - 0.25

5) Calculate calibration (are your probabilities accurate?)

Group your historical predictions by confidence level:

Confidence bucket	Predictions	Outcomes (% true)	Calibration gap
10-20%	15	18% true	+3% (slightly under-confident)
30-40%	22	28% true	-7% (slightly over-confident)
50-60%	18	52% true	-3% (well calibrated)
70-80%	12	58% true	-17% (over-confident)
90%+	5	80% true	-12% (over-confident)

Calibration insight: "I tend to be over-confident in the 70-80% range. When I say 75%, things happen ~60% of the time."

6) Identify what you underweighted or overweighted

For this deal:

Underweighted: ___
Overweighted: ___
Surprise factor: ___

Pattern across deals (update quarterly):

Factor	Times underweighted	Times overweighted
Team learning rate
Distribution advantages
Timing/market readiness
Technical moat
Competition
Founder-market fit

7) Extract heuristics (portable rules)

Good heuristics are:

Specific enough to act on
Falsifiable
Tied to observed evidence
Attached to a base rate

Heuristic format: "When [specific condition], [outcome] happens [X%] of the time in my experience."

Examples:

"When a seed-stage founder can't name a specific buyer trigger event, they fail to hit enterprise sales targets 80% of the time."
"When we invest in a second-time founder with prior distribution success, they raise Series A 90% of the time."

Write 2-5 heuristics from this postmortem: 1. 2. 3.

8) Update your pattern library with base rates

Archetype performance tracking:

Archetype	Deals	Success rate	Avg Brier	Notes
First-time founder, crowded market	8	25%	0.28	Over-confident on differentiation
Second-time founder, distribution edge	5	80%	0.15	Under-confident on execution
Technical founder, no GTM	6	33%	0.32	Over-weight technical moat

9) Measure learning rate (quarterly)

Rolling Brier score by quarter:

Quarter	Deals scored	Avg Brier	Calibration gap	Trend
Q1 2025	12	0.28	15% over-confident	Baseline
Q2 2025	15	0.24	10% over-confident	Improving
Q3 2025	14	0.21	8% over-confident	Improving
Q4 2025	16	0.19	5% over-confident	Good

Learning rate = (Brier_t - Brier_t-1) / Brier_t-1

Target: 5-10% improvement per quarter until Brier < 0.20

10) Create follow-up list

What to track	Signal	Recheck date	Prediction to score
			P(___) = ___%
			P(___) = ___%

Who to keep warm	Why	Next touch

Quarterly calibration review

Every quarter:

Score all predictions that reached outcome date
Calculate Brier scores by deal and overall
Update calibration table (predictions vs outcomes by confidence bucket)
Identify systematic biases (over/under-confidence patterns)
Review passes that raised: were pass reasons validated?
Update heuristics with new base rates
Calculate learning rate vs previous quarter

Quarterly output:

Brier score trend chart
Calibration curve (predicted % vs actual %)
Top 3 biases to correct
Updated heuristic base rates

Annual review: Portfolio outcomes vs initial thesis

For each portfolio company:

What did we believe at investment?
What's the current reality?
Where were we right/wrong?
Score the original predictions
Update archetype base rates

Annual output:

Portfolio Brier score
Best/worst calibrated predictions
Archetype performance update
Learning rate over 4 quarters
Heuristics validated or invalidated

Salesforce logging (optional)

If Salesforce is your system of record:

Add predictions as structured fields on Opportunity (or in Notes)
Record confidence levels at each stage
Link postmortem Note titled "Postmortem (YYYY-MM-DD)"
Update outcome fields when known
Tag with heuristics extracted

Edge cases

If you have no outcome yet: run a "process postmortem" focused on what you learned and what evidence was missing. Record predictions for future scoring.
If the outcome is ambiguous: define binary success criteria now, score later.
If you can't remember your initial thesis: reconstruct as honestly as possible, and start recording predictions with probabilities now.
If you have few deals: even 10-15 scored predictions start to show calibration patterns.