Oraclaw / oraclaw-calibrate
Prediction quality scoring for AI agents. Brier score, log score, and multi-source convergence analysis. Know if your forecasts are accurate and if your data sources agree.
install
source · Clone the upstream repo
git clone https://github.com/Whatsonyourmind/oraclaw
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/Whatsonyourmind/oraclaw "$T" && mkdir -p ~/.claude/skills && cp -r "$T/mission-control/packages/clawhub-skills/oraclaw-calibrate" ~/.claude/skills/whatsonyourmind-oraclaw-oraclaw-calibrate && rm -rf "$T"
manifest:
mission-control/packages/clawhub-skills/oraclaw-calibrate/SKILL.md
OraClaw Calibrate — Prediction Quality for Agents
You are a calibration agent that scores prediction accuracy and detects when information sources disagree.
When to Use This Skill
Use this when you need to:
- Score how accurate past predictions were (Brier score, log score)
- Check if multiple data sources, models, or forecasters agree
- Find the outlier source that disagrees with consensus
- Compare forecast quality across different models or approaches
- Evaluate prediction market positions
Tools
score_calibration — Accuracy Scoring
Input: arrays of predictions (0-1) and outcomes (0 or 1). Output: Brier score (0 = perfect, 1 = worst) and log score.
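The SKILL.md does not spell out the scoring formulas, but both metrics are standard. A minimal Python sketch of the textbook definitions (function names are illustrative, not the service's API):

```python
import math

def brier_score(predictions, outcomes):
    # Mean squared error between forecast probabilities and binary outcomes.
    # 0 = perfect, 1 = worst possible.
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)

def log_score(predictions, outcomes, eps=1e-12):
    # Mean negative log-likelihood; lower is better.
    # Probabilities are clamped to avoid log(0) on overconfident forecasts.
    total = 0.0
    for p, o in zip(predictions, outcomes):
        p = min(max(p, eps), 1 - eps)
        total += -math.log(p if o == 1 else 1 - p)
    return total / len(predictions)

print(brier_score([0.9, 0.2], [1, 0]))  # ≈ 0.025
```

Note the asymmetry in failure modes: a confidently wrong forecast (e.g. 0.99 on an outcome of 0) is penalized far more heavily by the log score than by the Brier score.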
score_convergence — Multi-Source Agreement
Input: array of prediction sources with probabilities. Output: convergence score (0-1), outlier detection, consensus probability, spread.
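A rough sketch of what convergence analysis over such sources might look like. The `probability` field name, the `1 - spread` convergence formula, and the farthest-from-consensus outlier rule are illustrative assumptions, not the tool's documented behavior:

```python
def score_convergence(sources):
    # sources: list of {"name": str, "probability": float}
    # Returns consensus (unweighted mean), spread (max - min), a simple
    # convergence score, and the source farthest from consensus as the
    # candidate outlier.
    probs = [s["probability"] for s in sources]
    consensus = sum(probs) / len(probs)
    spread = max(probs) - min(probs)
    convergence = 1.0 - spread  # 1.0 = perfect agreement
    outlier = max(sources, key=lambda s: abs(s["probability"] - consensus))
    return {"consensus": consensus, "spread": spread,
            "convergence": convergence, "outlier": outlier["name"]}
```

For example, three sources at 0.70, 0.72, and 0.40 would yield a spread of 0.32 and flag the 0.40 source as the outlier.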
Example: Model Comparison
{ "predictions": [0.80, 0.65, 0.30, 0.90, 0.55], "outcomes": [1, 1, 0, 1, 0] }
Response:
brier_score: 0.113 — good calibration.
Rules
- Brier score < 0.1 = excellent, < 0.2 = good, < 0.3 = fair, > 0.3 = poor
- Convergence score > 0.7 = strong agreement, < 0.5 = significant disagreement
- Outlier sources are flagged automatically when their Hellinger distance exceeds a threshold
- Volume-weighted consensus gives more weight to high-liquidity sources
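The Hellinger distance between two binary forecasts and a volume-weighted consensus can be sketched as follows. The flagging threshold and the exact weighting scheme are not documented here, so this is only illustrative:

```python
import math

def hellinger(p, q):
    # Hellinger distance between Bernoulli(p) and Bernoulli(q).
    # 0 = identical forecasts; approaches 1 for maximally opposed ones.
    bc = math.sqrt(p * q) + math.sqrt((1 - p) * (1 - q))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))

def weighted_consensus(probs, volumes):
    # Volume-weighted mean probability: high-liquidity sources count more.
    total = sum(volumes)
    return sum(p * v for p, v in zip(probs, volumes)) / total
```

So a thin market at 0.80 moves the consensus less than a deep market at 0.60: `weighted_consensus([0.6, 0.8], [3, 1])` gives 0.65 rather than the unweighted 0.70.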
Pricing
$0.02 per scoring call (USDC on Base via x402). Free tier: 3,000 calls/month with API key.