Choosing a Baseline: 50/50 vs Base Rate vs Market Consensus

Published on January 1, 2026

Why baseline choice matters

The Brier score (BS) is a raw error metric, so its value depends on how hard the question set is. For fair comparisons across question sets, use the Brier skill score (BSS), which measures your score relative to a benchmark.

BSS is:

BSS = 1 - (BS / BS_ref)

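Here BS is your Brier score and BS_ref is the baseline's Brier score on the same questions. A BSS above 0 means you beat the baseline, 0 means you matched it, and a negative value means you did worse. So the baseline you choose directly determines what it means to have “skill”.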
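The formula can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the sample forecasts are made up for this example.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, ref_probs, outcomes):
    """BSS = 1 - BS / BS_ref, with both scores computed on the same questions."""
    bs = brier_score(probs, outcomes)
    bs_ref = brier_score(ref_probs, outcomes)
    if bs_ref == 0:
        raise ValueError("reference Brier score is zero; BSS is undefined")
    return 1 - bs / bs_ref


# Illustrative data only: four binary questions.
outcomes = [1, 0, 0, 1]
forecast = [0.8, 0.2, 0.3, 0.6]
coin_flip = [0.5] * len(outcomes)  # 50/50 reference forecast

bs = brier_score(forecast, outcomes)                     # about 0.0825
bss = brier_skill_score(forecast, coin_flip, outcomes)   # about 0.67
print(f"BS={bs:.4f}  BSS vs 50/50={bss:.2f}")
```

Swapping `coin_flip` for a different reference list changes the BSS without changing the forecasts, which is the point of the sections that follow.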

Baseline option 1: 50/50

What it is: predict 0.50 for every binary event.

When it is useful

• teaching and onboarding

• sanity checks (do you beat pure ignorance?)

• balanced question sets where outcomes are close to 50/50

When it fails

• imbalanced outcomes (for example, when most events resolve NO)

• scorecards where people pick easy, near-certain questions

In those cases, beating 50/50 can look impressive but does not prove real forecasting skill.
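A small numeric sketch of this failure mode, using invented data: ten questions where only one resolves YES, and a forecaster who simply says 0.10 every time.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


# Illustrative, imbalanced question set: 1 YES out of 10.
outcomes = [1] + [0] * 9
lazy_forecast = [0.1] * 10          # no question-specific insight at all

bs_lazy = brier_score(lazy_forecast, outcomes)
bs_5050 = brier_score([0.5] * 10, outcomes)
bs_base = brier_score([0.1] * 10, outcomes)   # base rate reference (10% YES)

bss_vs_5050 = 1 - bs_lazy / bs_5050   # looks like strong skill
bss_vs_base = 1 - bs_lazy / bs_base   # no skill beyond the prior
print(f"BSS vs 50/50: {bss_vs_5050:.2f}, BSS vs base rate: {bss_vs_base:.2f}")
```

The same constant forecast scores a BSS of about 0.64 against 50/50 and exactly 0 against the base rate.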

Baseline option 2: base rate

What it is: predict the base rate of YES for the relevant group.

Example: if a class of events resolves YES 30% of the time, the base rate forecast is p = 0.30 for all events in that group.

Why base rate is a strong default

• stable and hard to game

• works even when markets are illiquid or unavailable

• immediately punishes forecasters who ignore priors and go too extreme

How to define base rate in practice

• overall base rate across your dataset

• base rate by category (often better)

• base rate by horizon or regime, if the world changes
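As a rough sketch of the first two definitions, the snippet below computes an overall base rate and a per-category base rate from resolved questions. The `(category, outcome)` record layout and the category names are assumptions made for this example.

```python
from collections import defaultdict


def base_rate_by_category(records):
    """records: iterable of (category, outcome) pairs, outcome is 0 or 1."""
    counts = defaultdict(lambda: [0, 0])  # category -> [yes_count, total]
    for category, outcome in records:
        counts[category][0] += outcome
        counts[category][1] += 1
    return {cat: yes / total for cat, (yes, total) in counts.items()}


# Illustrative resolved questions.
records = [
    ("politics", 1), ("politics", 0), ("politics", 0), ("politics", 0),
    ("sports", 1), ("sports", 1), ("sports", 0), ("sports", 0),
]

overall = sum(outcome for _, outcome in records) / len(records)
per_category = base_rate_by_category(records)
print(overall)        # 0.375
print(per_category)   # {'politics': 0.25, 'sports': 0.5}
```

Here the overall rate of 0.375 fits neither category well, which is why a per-category baseline is often the fairer reference.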

If the base rate shifts over time, a fixed baseline goes stale and calibration drift can follow. Track the base rate with a rolling window.
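One possible way to track this is a trailing window over resolved outcomes in time order. The function name and the window size of 3 are arbitrary choices for illustration.

```python
from collections import deque


def rolling_base_rate(outcomes, window):
    """YES rate over the last `window` resolved outcomes, one value per outcome."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque(maxlen=window)  # drops the oldest outcome automatically
    rates = []
    for outcome in outcomes:
        recent.append(outcome)
        rates.append(sum(recent) / len(recent))
    return rates


# Illustrative outcomes in resolution order: YES-heavy early, NO-heavy later.
outcomes = [1, 1, 0, 0, 0, 0]
rates = rolling_base_rate(outcomes, window=3)
print([round(r, 2) for r in rates])
```

Each value includes the outcome at that position, so when using this as a baseline for a new question, take the rate as it stood before that question resolved to avoid look-ahead.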

Baseline option 3: market consensus

What it is: use market-implied probabilities as the baseline.

This is popular because it answers a strong question: do you beat the crowd?

How to define market consensus cleanly

Do not blindly use the last trade price. In thin markets it can be noisy or stale.

Common definitions:

• mid price at a fixed checkpoint

• volume-weighted average price (VWAP) over a defined consensus window

• a consensus snapshot at market close
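The first two definitions could be sketched as follows. The `(timestamp, price, size)` trade layout, the function names, and the sample numbers are assumptions for this example, with prices on a 0 to 1 probability scale.

```python
def mid_price(best_bid, best_ask):
    """Midpoint of the best bid and ask at the checkpoint."""
    if not 0 <= best_bid <= best_ask <= 1:
        raise ValueError("expected 0 <= bid <= ask <= 1")
    return (best_bid + best_ask) / 2


def vwap(trades, start, end):
    """Volume-weighted average price of trades with start <= timestamp <= end."""
    in_window = [(price, size) for ts, price, size in trades if start <= ts <= end]
    total_size = sum(size for _, size in in_window)
    if total_size == 0:
        return None  # no trades in the window: no consensus to report
    return sum(price * size for price, size in in_window) / total_size


# Illustrative trades; the 0.90 print is one small trade in a thin moment.
trades = [(100, 0.60, 10), (110, 0.62, 30), (125, 0.90, 1), (200, 0.70, 50)]

consensus = vwap(trades, start=100, end=130)
last_trade_in_window = 0.90
print(f"VWAP={consensus:.3f} vs last trade={last_trade_in_window:.2f}")
```

In this toy window the last trade sits at 0.90 while the VWAP stays near 0.62, because the outlier carries very little volume.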

When market consensus is appropriate

• there is real liquidity

• you can align timestamps using an evaluation checkpoint

• your consensus definition is documented in methodology

When market consensus misleads

• thin market conditions

• short-lived price spikes (news lag, one small trade)

• potential manipulation in low-volume markets

• misaligned timing that introduces look-ahead bias

Sportsbook odds as a baseline: do not forget the margin

If you benchmark against sportsbook odds, you must account for the bookmaker's margin (the overround, or vig).

Convert the odds to implied probabilities, then remove the vig so the probabilities sum to 1. Otherwise you are benchmarking against inflated probabilities.
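A minimal sketch of that conversion for decimal odds, using proportional normalization. This is only one way to remove the margin (other methods distribute it unevenly across outcomes), and the odds below are invented.

```python
def remove_vig(decimal_odds):
    """Return (fair_probs, overround) for a full set of mutually exclusive outcomes."""
    if any(odds <= 1 for odds in decimal_odds):
        raise ValueError("decimal odds must be greater than 1")
    raw = [1 / odds for odds in decimal_odds]    # implied probabilities, vig included
    overround = sum(raw)                         # exceeds 1 when a margin is present
    fair = [p / overround for p in raw]          # proportional normalization
    return fair, overround


# Illustrative two-way market quoted at 1.80 and 2.10.
fair, overround = remove_vig([1.80, 2.10])
print(f"overround={overround:.4f}")
print([round(p, 4) for p in fair])
```

The raw implied probabilities here sum to about 1.032; after normalization they sum to 1 and can be used as a reference forecast in the BSS formula.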

Best practice: report multiple baselines

One simple scorecard pattern:

• headline BS

• BSS vs base rate (default fairness baseline)

• BSS vs market consensus (when market data is reliable)

That gives you both: “do I beat priors” and “do I beat the crowd”.
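One way this scorecard pattern might look in code. The dictionary keys, the `scorecard` function, and the sample numbers are hypothetical; the base rate is passed in by the caller, and the market column is left empty when no reliable market data exists.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


def skill(bs, bs_ref):
    """BSS = 1 - BS / BS_ref; None when the reference score is zero."""
    return None if bs_ref == 0 else 1 - bs / bs_ref


def scorecard(probs, outcomes, base_rate, market_probs=None):
    bs = brier_score(probs, outcomes)
    bs_base = brier_score([base_rate] * len(outcomes), outcomes)
    card = {"brier": bs, "bss_vs_base_rate": skill(bs, bs_base), "bss_vs_market": None}
    if market_probs is not None:  # only report when market data is reliable
        card["bss_vs_market"] = skill(bs, brier_score(market_probs, outcomes))
    return card


# Illustrative data only.
outcomes = [1, 0, 0, 0]
forecast = [0.7, 0.2, 0.1, 0.1]
market = [0.6, 0.3, 0.2, 0.1]

card = scorecard(forecast, outcomes, base_rate=0.25, market_probs=market)
print({k: round(v, 4) for k, v in card.items()})
```

With these numbers the forecaster shows a BSS of about 0.8 against the base rate but only about 0.5 against the market, so the two columns answer different questions.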

Common mistakes

Changing baseline definitions: your BSS series becomes non-comparable. Document your settings and keep them stable.

Using market prices without liquidity context: always report liquidity or at least flag thin markets.

Mixing categories: different categories have different base rates. Segment when needed.

Takeaway

50/50 is good for education, base rate is the best default baseline, and market consensus is the strongest baseline when the market is liquid and timestamps align. If you want trustworthy scorecards, define baselines explicitly and keep them stable.

