The gold standard for probabilistic forecasting — how it works, what good looks like, and why calibration matters more than confidence.
Brier Score = (prediction − outcome)². That's it. If you predict 80% and the event happens (outcome = 1), your score is (0.80 − 1.00)² = 0.04. If the event doesn't happen (outcome = 0), your score is (0.80 − 0.00)² = 0.64.
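That arithmetic is simple enough to sketch in a few lines of plain Python. The function names below are made up for illustration, not part of any platform API:

```python
def brier_score(prediction: float, outcome: int) -> float:
    """Squared error between a probability forecast and the 0/1 outcome."""
    return (prediction - outcome) ** 2

def mean_brier(predictions, outcomes) -> float:
    """Average Brier score across a set of resolved predictions (lower is better)."""
    pairs = list(zip(predictions, outcomes))
    return sum(brier_score(p, o) for p, o in pairs) / len(pairs)

# The worked example above: an 80% forecast scored against each possible outcome.
print(round(brier_score(0.80, 1), 2))  # 0.04 -- the event happened
print(round(brier_score(0.80, 0), 2))  # 0.64 -- the event did not happen
```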
0.00: Always predicting 100% when the event happens and 0% when it doesn't. A perfect score.
0.25: Always predicting 50%. No information, no skill, the baseline to beat (checked in the snippet below).
1.00: Always predicting 100% when the event doesn't happen. Perfectly wrong.
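The 0.25 baseline is easy to verify: a constant 50% forecast carries a squared error of exactly 0.25 whichever way an event resolves, so it averages to 0.25 over any mix of outcomes. A quick check, with an arbitrary outcome list:

```python
# A constant 50% forecast scores 0.25 on every question, regardless of outcome.
outcomes = [1, 0, 1, 1, 0, 0, 1]
scores = [(0.50 - o) ** 2 for o in outcomes]
print(sum(scores) / len(scores))  # 0.25
```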
A good Brier Score requires two things. Calibration: when you say 70%, the event should happen about 70% of the time. Resolution: your predictions should move decisively away from 50% whenever the evidence supports it, sharp rather than wishy-washy. The best forecasters are both well-calibrated and decisive.
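One common way to check calibration is to bucket forecasts by stated probability and compare each bucket's average forecast with how often the event actually occurred. A minimal sketch with toy data (none of it from the platform):

```python
from collections import defaultdict

def calibration_table(predictions, outcomes, bins=10):
    """Group forecasts into probability bins and compare each bin's average
    forecast with the observed frequency of the event. A well-calibrated
    forecaster's two columns track each other closely."""
    grouped = defaultdict(list)
    for p, o in zip(predictions, outcomes):
        b = min(int(p * bins), bins - 1)   # e.g. 0.73 -> bin 7 when bins=10
        grouped[b].append((p, o))
    rows = []
    for b in sorted(grouped):
        pairs = grouped[b]
        avg_forecast = sum(p for p, _ in pairs) / len(pairs)
        hit_rate = sum(o for _, o in pairs) / len(pairs)
        rows.append((avg_forecast, hit_rate, len(pairs)))
    return rows

# Toy example: forecasts around 70% where the event happens about 70% of the time.
preds = [0.70, 0.72, 0.68, 0.71, 0.69, 0.70, 0.73, 0.70, 0.71, 0.70]
outs  = [1,    1,    0,    1,    1,    0,    1,    1,    0,    1]
for avg_p, rate, n in calibration_table(preds, outs):
    print(f"forecast {avg_p:.2f}  observed {rate:.2f}  n={n}")
```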
Every forecaster on Altus Alpha — Community, AI, and Expert — is scored via Brier Score on every prediction. The score is aggregated per track, per vertical, and overall. It's the primary metric for the promotion path: consistently beating the AI baseline (Haiku Oracle, typically around 0.20) is the gateway to Expert status.
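Aggregation is just the mean of per-prediction scores over whatever slice you care about. A hedged sketch of that grouping logic; the record shape and field names here are invented for illustration and are not Altus Alpha's actual data model:

```python
from collections import defaultdict
from statistics import mean

# Illustrative resolved predictions only; "track" and the forecaster names are assumptions.
resolved = [
    {"forecaster": "alice", "track": "Community", "prediction": 0.80, "outcome": 1},
    {"forecaster": "alice", "track": "Community", "prediction": 0.30, "outcome": 0},
    {"forecaster": "haiku", "track": "AI",        "prediction": 0.60, "outcome": 1},
    {"forecaster": "haiku", "track": "AI",        "prediction": 0.55, "outcome": 0},
]

def brier(p, o):
    return (p - o) ** 2

# Mean Brier score per (forecaster, track); the same pattern extends to verticals or overall.
by_key = defaultdict(list)
for r in resolved:
    by_key[(r["forecaster"], r["track"])].append(brier(r["prediction"], r["outcome"]))

for key, scores in sorted(by_key):
    pass  # placeholder removed below

for key, scores in by_key.items():
    print(key, round(mean(scores), 3))
```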
Professional superforecasters (as studied by Philip Tetlock) achieve Brier Scores around 0.15-0.18 on geopolitical questions. Our Haiku Oracle baseline typically scores 0.18-0.22 depending on the domain. Consistently scoring below 0.15 is world-class.