What Is the Brier Score and What It Measures
What Brier score is
The Brier score measures how accurate your probability forecasts are for a binary event (yes or no). It does not ask whether you were “right”; it asks how close your probability was to the outcome.
The core idea
Each forecast is scored by its squared error (a short code sketch follows the definitions):
(p - o)^2
Where:
• p is your predicted probability for YES (between 0 and 1)
• o is the outcome (1 if YES happened, 0 if NO happened)
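In code, this is a one-liner. A minimal sketch in Python (the name brier_term is our own label for a single forecast's squared error, not standard terminology):

    def brier_term(p: float, o: int) -> float:
        # p: predicted probability of YES, between 0 and 1
        # o: observed outcome, 1 if YES happened, 0 if NO
        return (p - o) ** 2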
How it is calculated across many forecasts
If you have N forecasts, the Brier score is the average of the squared errors:
BS = (1/N) * sum((p_i - o_i)^2)
Lower is better. A perfect score is 0.
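A minimal sketch of the full calculation, assuming a list of probabilities and a matching list of 0/1 outcomes (the names are illustrative):

    def brier_score(probs, outcomes):
        # Mean squared error between predicted probabilities and 0/1 outcomes.
        if len(probs) != len(outcomes):
            raise ValueError("probs and outcomes must be the same length")
        return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

For example, brier_score([0.7, 0.7], [1, 0]) averages the two squared errors 0.09 and 0.49, giving 0.29.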
Worked examples
Example A: You predict p = 0.70 and the event happens (o = 1).
• squared error = (0.70 - 1)^2 = 0.09
Example B: You predict p = 0.70 and the event does not happen (o = 0).
• squared error = (0.70 - 0)^2 = 0.49
Example C: You predict p = 0.52 and the event happens (o = 1).
• squared error = (0.52 - 1)^2 = 0.2304
Notice the pattern: because the error is squared, a confident miss (Example B) is punished far more than a mild one.
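You can verify all three examples with the brier_term sketch from above:

    print(round(brier_term(0.70, 1), 4))  # 0.09   (Example A)
    print(round(brier_term(0.70, 0), 4))  # 0.49   (Example B)
    print(round(brier_term(0.52, 1), 4))  # 0.2304 (Example C)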
How to interpret the number
Brier score lives between 0 and 1 for binary events.
• 0.00 is perfect
• 0.25 is what you get if you always predict 50%, regardless of how the outcomes fall (the coin-flip baseline; a quick check follows this list)
• Values closer to 1 indicate very poor probability forecasts (usually due to extreme wrong calls)
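The 0.25 baseline is easy to confirm with the brier_score sketch from earlier: when every prediction is 0.5, every squared error is (0.5 - o)^2 = 0.25 whatever the outcome, so the average is 0.25 on any outcome mix:

    outcomes = [1, 0, 0, 1, 1, 0, 1, 0]                  # any mix of 0s and 1s
    print(brier_score([0.5] * len(outcomes), outcomes))  # 0.25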
Why Brier score is useful
• It is a strictly proper scoring rule: reporting your true belief gives the best expected score, so honest probabilities are rewarded.
• It is easy to compute and explain.
• It decomposes into interpretable parts (see decomposition guide later).
What Brier score does not tell you by itself
It does not tell you whether you are better than a baseline. Raw BS depends on the question set. If your dataset is easy or outcomes are imbalanced, your BS may look great without real skill.
That is why many scorecards also report Brier skill score versus a benchmark.
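The skill score has its own guide (linked below), but the core formula is short: BSS = 1 - BS / BS_ref, where BS_ref is the benchmark's Brier score. A minimal sketch; using the question set's base rate as the benchmark is one common choice, not the only one:

    def brier_skill_score(bs: float, bs_ref: float) -> float:
        # Positive: better than the benchmark. Zero: no improvement. Negative: worse.
        return 1.0 - bs / bs_ref

    def base_rate_brier(outcomes):
        # Benchmark forecaster: always predict the base rate of YES in this set.
        rate = sum(outcomes) / len(outcomes)
        return sum((rate - o) ** 2 for o in outcomes) / len(outcomes)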
Common mistakes
Using accuracy instead of probabilities: If you only record yes or no, you lose information. Brier score needs the probability.
Ignoring calibration: A strong BS may still hide systematic miscalibration. Check calibration curves and tables (a minimal table sketch follows this list).
Small samples: With small N, your score is noisy. Always report sample size.
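A rough calibration table can be built by binning forecasts and comparing the average forecast in each bin with the observed YES frequency. A minimal sketch, with equal-width bins as an illustrative choice:

    def calibration_table(probs, outcomes, n_bins=5):
        # For each probability bin: count, mean forecast, observed YES frequency.
        bins = [[] for _ in range(n_bins)]
        for p, o in zip(probs, outcomes):
            i = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into the top bin
            bins[i].append((p, o))
        rows = []
        for cell in bins:
            if cell:
                mean_p = sum(p for p, _ in cell) / len(cell)
                freq = sum(o for _, o in cell) / len(cell)
                rows.append((len(cell), mean_p, freq))
        return rows

In a well-calibrated forecast set, the mean forecast and the observed frequency are close in every bin, subject to sample-size noise, which is exactly the third mistake above.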
Takeaway
Brier score is the average squared error of your probability forecasts. It rewards honest probabilities and heavily penalizes confident mistakes. On its own, it is not enough for fair comparisons, so pair it with Brier skill score and calibration diagnostics.
Related
• Brier Score vs Brier Skill Score: When to Use Which
• Calibration Explained: Why 70 Percent Should Mean 70 Percent