Choosing a Baseline: 50/50 vs Base Rate vs Market Consensus
Why baseline choice matters
The Brier score (BS) is a raw error metric, so its value depends on how hard the question set is. For fair comparisons across question sets, use the Brier skill score (BSS), which measures your BS against a reference forecast on the same questions.
BSS is:
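BSS = 1 - (BS / BS_ref)
Here BS_ref is the Brier score of the baseline forecast on the same questions. BSS > 0 means you beat the baseline, BSS = 0 means you match it, and BSS < 0 means you do worse.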
So the baseline you choose directly determines what it means to have “skill”.
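The formula can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `brier_score` and `brier_skill_score` and the sample numbers are assumptions made for the example.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(bs: float, bs_ref: float) -> float:
    """BSS = 1 - BS / BS_ref. Positive means better than the reference."""
    if bs_ref == 0:
        raise ValueError("reference Brier score is zero, so BSS is undefined")
    return 1.0 - bs / bs_ref


# Illustrative data only (hypothetical forecasts and outcomes).
outcomes = [1, 0, 0, 1, 0]
forecast = [0.8, 0.2, 0.1, 0.6, 0.3]
reference = [0.5] * len(outcomes)  # the 50/50 baseline

bs = brier_score(forecast, outcomes)
bs_ref = brier_score(reference, outcomes)
bss = brier_skill_score(bs, bs_ref)
print(f"BS={bs:.3f}  BS_ref={bs_ref:.3f}  BSS={bss:.3f}")
```

Swapping `reference` for a different baseline changes `bs_ref`, and with it the BSS, while the forecaster's own BS stays the same.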
Baseline option 1: 50/50
What it is: predict 0.50 for every binary event.
When it is useful
• teaching and onboarding
• sanity checks (do you beat pure ignorance?)
• balanced question sets where outcomes are close to 50/50
When it fails
• imbalanced outcomes (for example, question sets where most events resolve NO)
• scorecards where people pick easy, near-certain questions
In those cases, beating 50/50 can look impressive but does not prove real forecasting skill.
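A small worked example makes the failure concrete. The numbers below are invented for illustration: 20 questions of which 2 resolve YES, and a "lazy" forecaster who predicts 0.10 on every question. The helper name `brier` is an assumption, and the base rate is computed in-sample only to keep the sketch short.

```python
def brier(probs, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


# Hypothetical imbalanced question set: 2 YES out of 20 (10% YES).
outcomes = [1] * 2 + [0] * 18
n = len(outcomes)

# A forecaster with no question-level insight who always says 10%.
lazy = [0.10] * n
bs_lazy = brier(lazy, outcomes)

# Baseline 1: 50/50 on every question.
bs_5050 = brier([0.5] * n, outcomes)
bss_vs_5050 = 1 - bs_lazy / bs_5050

# Baseline 2: the base rate of YES (in-sample here, for simplicity).
base_rate = sum(outcomes) / n
bs_base = brier([base_rate] * n, outcomes)
bss_vs_base = 1 - bs_lazy / bs_base

print(f"BSS vs 50/50:     {bss_vs_5050:.2f}")
print(f"BSS vs base rate: {bss_vs_base:.2f}")
```

Against 50/50 the lazy forecaster shows a large positive BSS, while against the base rate the BSS is zero, which is the more honest reading of a forecast that carries no information beyond the prior.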
Baseline option 2: base rate
What it is: predict the base rate of YES for the relevant group.
Example: if a class of events resolves YES 30% of the time, the base rate forecast is p = 0.30 for all events in that group.
Why base rate is a strong default
• stable and hard to game
• works even when markets are illiquid or unavailable
• immediately punishes forecasters who ignore priors and go too extreme
How to define base rate in practice
• overall base rate across your dataset
• base rate by category (often better)
• base rate by horizon or regime, if the world changes
If the base rate changes over time, you can get calibration drift. Track it with a rolling window.
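The two ideas above, segmenting by category and tracking a rolling window, can be sketched as follows. The function names, the record layout `(category, outcome)`, and the sample data are assumptions for illustration. The rolling version uses only previously resolved outcomes, so the base rate for a question never includes that question's own result.

```python
from collections import defaultdict, deque


def base_rate_by_category(records):
    """records: iterable of (category, outcome) pairs, outcome in {0, 1}."""
    totals = defaultdict(lambda: [0, 0])  # category -> [yes_count, n]
    for category, outcome in records:
        totals[category][0] += outcome
        totals[category][1] += 1
    return {c: yes / n for c, (yes, n) in totals.items()}


def rolling_base_rate(outcomes, window):
    """Base rate over the previous `window` outcomes, in resolution order.

    Entry i uses only outcomes resolved before i. It is None when no
    prior outcome exists yet.
    """
    recent = deque(maxlen=window)
    rates = []
    for outcome in outcomes:
        rates.append(sum(recent) / len(recent) if recent else None)
        recent.append(outcome)
    return rates


# Hypothetical resolved questions.
records = [
    ("politics", 1), ("politics", 0),
    ("sports", 0), ("sports", 0), ("sports", 1), ("sports", 0),
]
rates_by_cat = base_rate_by_category(records)
rolling = rolling_base_rate([1, 0, 0, 1], window=2)

print(rates_by_cat)  # {'politics': 0.5, 'sports': 0.25}
print(rolling)       # [None, 1.0, 0.5, 0.0]
```

A short window reacts quickly to regime changes but is noisy; a long window is stable but slow. The right length depends on how many questions resolve per period.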
Baseline option 3: market consensus
What it is: use market-implied probabilities as the baseline.
This is popular because it answers a strong question: do you beat the crowd?
How to define market consensus cleanly
Do not blindly use the last trade price. In thin markets it can be noisy or stale.
Common definitions:
• mid price at a fixed checkpoint
• VWAP over a defined consensus window
• a consensus snapshot at market close
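The first two definitions can be sketched as below. This assumes prices are quoted as probabilities in [0, 1], timestamps are plain numbers, and trades arrive as `(timestamp, price, volume)` tuples. The function names and data layout are hypothetical and do not correspond to any particular exchange API.

```python
def mid_price(best_bid, best_ask):
    """Midpoint of the best bid and best ask at the checkpoint."""
    if not (0 <= best_bid <= best_ask <= 1):
        raise ValueError("expected 0 <= bid <= ask <= 1")
    return (best_bid + best_ask) / 2


def vwap(trades, start, end):
    """Volume-weighted average price over the window [start, end).

    trades: iterable of (timestamp, price, volume).
    Returns None if no volume traded in the window.
    """
    in_window = [(p, v) for t, p, v in trades if start <= t < end and v > 0]
    total_volume = sum(v for _, v in in_window)
    if total_volume == 0:
        return None
    return sum(p * v for p, v in in_window) / total_volume


# Hypothetical trade tape: one small late trade at an outlier price.
trades = [(100, 0.60, 10), (110, 0.62, 30), (130, 0.90, 1)]

consensus_mid = mid_price(0.58, 0.62)
consensus_vwap = vwap(trades, start=100, end=120)

print(f"mid={consensus_mid:.3f}  vwap={consensus_vwap:.3f}")
```

In this example the last trade price is 0.90, but it comes from a single unit of volume and falls outside the consensus window, so neither definition is pulled toward it.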
When market consensus is appropriate
• there is real liquidity
• you can align timestamps using an evaluation checkpoint
• your consensus definition is documented in methodology
When market consensus misleads
• thin market conditions
• short-lived price spikes (news lag, one small trade)
• potential manipulation in low volume markets
• misaligned timing that introduces look-ahead bias
Sportsbook odds as a baseline: do not forget the margin
If you benchmark against sportsbook odds, you must account for the bookmaker's margin (the overround, or vig).
Convert the odds to implied probabilities, then remove the vig, for example by normalizing so the probabilities sum to 1. Otherwise you are benchmarking against inflated probabilities.
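A minimal sketch of that conversion, assuming decimal odds and simple proportional normalization. Proportional scaling is only one of several ways to remove the margin, and the function names here are illustrative.

```python
def implied_probabilities(decimal_odds):
    """Raw implied probability of each outcome: 1 / decimal odds."""
    if any(o <= 1.0 for o in decimal_odds):
        raise ValueError("decimal odds must be greater than 1.0")
    return [1.0 / o for o in decimal_odds]


def remove_vig(decimal_odds):
    """Scale implied probabilities proportionally so they sum to 1.

    Returns (fair_probabilities, overround), where overround is the
    amount by which the raw implied probabilities exceed 1.
    """
    raw = implied_probabilities(decimal_odds)
    total = sum(raw)
    return [p / total for p in raw], total - 1.0


# Hypothetical two-way market quoted at 1.80 and 2.10.
fair, overround = remove_vig([1.80, 2.10])

print(f"overround={overround:.4f}")        # about 0.0317
print([round(p, 4) for p in fair])         # [0.5385, 0.4615]
```

Here the raw implied probabilities sum to about 1.032. Using them directly as a baseline would overstate both outcomes at once.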
Best practice: report multiple baselines
One simple scorecard pattern:
• headline BS
• BSS vs base rate (default fairness baseline)
• BSS vs market consensus (when market data is reliable)
That gives you both: “do I beat priors” and “do I beat the crowd”.
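One way this pattern could look in code is sketched below. The `scorecard` function, its dictionary keys, and the sample numbers are all assumptions for illustration. The base rate is passed in as a parameter on the assumption that it was estimated from earlier data, and market probabilities are optional so the market line can be omitted when the data is unreliable.

```python
def brier(probs, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


def skill(bs, bs_ref):
    """BSS = 1 - BS / BS_ref, or None when the reference score is zero."""
    return 1 - bs / bs_ref if bs_ref > 0 else None


def scorecard(forecasts, outcomes, base_rate, market=None):
    """Headline BS plus BSS against each available baseline."""
    bs = brier(forecasts, outcomes)
    card = {
        "BS": bs,
        "BSS_vs_base_rate": skill(bs, brier([base_rate] * len(outcomes), outcomes)),
    }
    if market is not None:
        card["BSS_vs_market"] = skill(bs, brier(market, outcomes))
    return card


# Hypothetical data; base_rate is assumed to come from earlier history.
card = scorecard(
    forecasts=[0.9, 0.2, 0.1, 0.7],
    outcomes=[1, 0, 0, 1],
    base_rate=0.4,
    market=[0.8, 0.3, 0.2, 0.6],
)
for name, value in card.items():
    print(f"{name}: {value:.4f}")
```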
Common mistakes
Changing baseline definitions: your BSS series becomes non-comparable over time. Document your settings and keep them stable.
Using market prices without liquidity context: always report liquidity or at least flag thin markets.
Mixing categories: different categories have different base rates. Segment when needed.
Takeaway
50/50 is good for education, base rate is the best default baseline, and market consensus is the strongest baseline when the market is liquid and timestamps align. If you want trustworthy scorecards, define baselines explicitly and keep them stable.