# Awesome-omni-skill prediction-tracking
Track and evaluate AI predictions over time to assess accuracy. Use when reviewing past predictions to determine if they came true, failed, or remain uncertain.
To install, clone the repository:

```sh
git clone https://github.com/diegosouzapw/awesome-omni-skill
```

Or copy just this skill into `~/.claude/skills` with a one-liner:

```sh
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/development/prediction-tracking-rickoslyder" ~/.claude/skills/diegosouzapw-awesome-omni-skill-prediction-tracking-1d8fdd && rm -rf "$T"
```
Source: `skills/development/prediction-tracking-rickoslyder/SKILL.md`

## Prediction Tracking Skill
Track predictions made by AI researchers and critics, and evaluate their accuracy over time.
### Prediction Recording
When recording a new prediction, capture:
#### Required Fields
- text: The prediction as stated
- author: Who made it
- madeAt: When it was made
- timeframe: When they expect it to happen
- topic: What area of AI
- confidence: How confident they seemed
#### Optional Fields
- sourceUrl: Where the prediction was made
- targetDate: Specific date if mentioned
- conditions: Any caveats or conditions
- metrics: How to measure success
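For concreteness, here is a minimal sketch of a recorded prediction as a Python dict, using the field names above. The `id` key and all the example values are illustrative assumptions, not part of the skill definition:

```python
# Hypothetical prediction record combining the required and optional fields.
prediction = {
    "id": "pred-001",  # assumed identifier for cross-referencing evaluations
    # Required fields
    "text": "Model X will pass benchmark Y within two years",
    "author": "Example Author",
    "madeAt": "2023-01-15",
    "timeframe": "within 2 years",
    "topic": "reasoning",
    "confidence": "high",
    # Optional fields
    "sourceUrl": "https://example.com/interview",
    "targetDate": "2025-01-15",
    "conditions": "Assumes no major regulatory pause",
    "metrics": "Official benchmark Y leaderboard score",
}
```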
### Evaluation Status
When evaluating predictions, assign one of:
#### verified

Clearly came true as stated.
- The predicted capability/event occurred
- Within the stated timeframe
- Substantially as described
#### falsified

Clearly did not come true.
- Timeframe passed without occurrence
- Contradictory evidence emerged
- Author retracted or modified claim
#### partially-verified

Partially accurate.
- Some aspects came true, others didn't
- Capability exists but weaker than claimed
- Timeframe was off but direction correct
#### too-early

Not enough time has passed.
- Still within stated timeframe
- No definitive evidence either way
#### unfalsifiable

Cannot be objectively assessed.
- Too vague to measure
- No clear success criteria
- Moved goalposts
#### ambiguous

Prediction was too vague to evaluate.
- Multiple interpretations possible
- Success criteria unclear
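Since the evaluation output later references these statuses by name, a small sketch of the vocabulary as a lookup table can help catch typos. The names are copied verbatim from the list above; the validation helper itself is an illustrative assumption:

```python
# The six evaluation statuses defined above, keyed by identifier.
STATUSES = {
    "verified": "Clearly came true as stated",
    "falsified": "Clearly did not come true",
    "partially-verified": "Partially accurate",
    "too-early": "Not enough time has passed",
    "unfalsifiable": "Cannot be objectively assessed",
    "ambiguous": "Prediction was too vague to evaluate",
}

def validate_status(status: str) -> str:
    """Reject statuses outside the vocabulary above (illustrative helper)."""
    if status not in STATUSES:
        raise ValueError(f"Unknown evaluation status: {status!r}")
    return status
```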
### Evaluation Process
For each prediction being evaluated:
1. Restate the prediction. What exactly was claimed?
2. Identify timeframe. Has enough time passed to evaluate?
3. Gather evidence. What has happened since?
   - Relevant releases or announcements
   - Benchmark results
   - Real-world deployments
   - Counter-evidence
4. Assess status. Which evaluation status applies?
5. Score accuracy. If verifiable, rate 0.0-1.0:
   - 1.0: Exactly as predicted
   - 0.7-0.9: Substantially correct
   - 0.4-0.6: Partially correct
   - 0.1-0.3: Mostly wrong
   - 0.0: Completely wrong
6. Note lessons. What does this tell us about:
   - The author's forecasting ability
   - The topic's predictability
   - Common prediction pitfalls
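As a rough sketch, the timeframe check (step 2), status assignment (step 4), and timestamping could be mechanized as below. The `came_true` verdict stands in for the human judgment produced in step 3, and every name here is illustrative rather than part of the skill:

```python
from datetime import date

def evaluate(prediction: dict, evidence_summary: str,
             came_true: bool | None, today: date | None = None) -> dict:
    """Illustrative status assignment from a timeframe check and a verdict."""
    today = today or date.today()
    target = date.fromisoformat(prediction["targetDate"])
    if came_true is None:
        # No definitive evidence either way.
        status = "too-early" if today <= target else "ambiguous"
    elif came_true:
        # Direction correct; downgrade if the stated timeframe was missed.
        status = "verified" if today <= target else "partially-verified"
    else:
        status = "falsified"
    return {
        "predictionId": prediction["id"],
        "status": status,
        "evidence": evidence_summary,
        "evaluatedAt": today.isoformat(),
    }
```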
### Output Format
For evaluation:
{ "evaluations": [ { "predictionId": "id", "status": "verified", "accuracyScore": 0.85, "evidence": "Description of evidence", "notes": "Additional context", "evaluatedAt": "timestamp" } ] }
For accuracy statistics:
{ "author": "Author name", "totalPredictions": 15, "verified": 5, "falsified": 3, "partiallyVerified": 2, "pending": 4, "unfalsifiable": 1, "averageAccuracy": 0.62, "topicBreakdown": { "reasoning": { "predictions": 5, "accuracy": 0.7 }, "agents": { "predictions": 3, "accuracy": 0.4 } }, "calibration": "Assessment of how well-calibrated they are" }
### Calibration Assessment
Evaluate whether predictors are well-calibrated:
#### Well-Calibrated
- High-confidence predictions usually come true
- Low-confidence predictions have mixed results
- Acknowledges uncertainty appropriately
#### Overconfident
- High-confidence predictions often fail
- Rarely expresses uncertainty
- Doesn't update on evidence
#### Underconfident
- Low-confidence predictions often come true
- Hedges even on likely outcomes
- Too conservative
#### Inconsistent
- Confidence doesn't correlate with accuracy
- Random relationship between stated confidence and actual accuracy
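One way to operationalize these categories, assuming confidence was recorded as "high"/"medium"/"low" and each resolved prediction carries a boolean hit flag. The thresholds are illustrative, not part of the skill:

```python
def classify_calibration(resolved: list[dict]) -> str:
    """Rough calibration label from {"confidence": ..., "hit": ...} records."""
    def hit_rate(level: str) -> float | None:
        hits = [p["hit"] for p in resolved if p["confidence"] == level]
        return sum(hits) / len(hits) if hits else None

    high, low = hit_rate("high"), hit_rate("low")
    if high is None or low is None:
        return "Inconsistent"      # not enough data to compare confidence levels
    if high < 0.5:
        return "Overconfident"     # high-confidence predictions often fail
    if high >= 0.7 and low < high:
        return "Well-Calibrated"   # confidence tracks accuracy
    if low >= 0.7:
        return "Underconfident"    # hedged predictions usually come true
    return "Inconsistent"          # confidence does not correlate with accuracy
```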
### Tracking Notable Predictors
Keep running assessments of key voices:
| Predictor | Predictions | Accuracy | Calibration | Notes |
|---|---|---|---|---|
| Sam Altman | 20 | 55% | Overconfident | Timeline optimism |
| Gary Marcus | 15 | 70% | Well-calibrated | Conservative |
| Dario Amodei | 12 | 65% | Slightly overconfident | Safety-focused |
### Red Flags
Watch for prediction patterns that suggest bias:
- Always bullish regardless of topic
- Never acknowledges failed predictions
- Moves goalposts when wrong
- Predictions align suspiciously with financial interests
- Vague enough to claim credit for anything