Designing a Fair Leaderboard: Rules That Stop Gaming
Why leaderboards get gamed
Forecasting leaderboards are easy to break because people can choose:
• which questions they forecast
• when they forecast
• how often they update
Without constraints, rankings mostly measure selection bias and timing, not forecasting skill.
The five rule pillars of a fair leaderboard
A fair leaderboard needs these pillars:
• a shared eligibility pool
• coverage and volume requirements
• checkpoint-based scoring
• clean benchmarks and liquidity rules
• transparent methodology
Pillar 1: define a shared eligibility pool
The first fix is to define what counts.
Examples:
• only binary markets
• only markets that were open at least 24 hours
• only markets in selected categories
• exclude markets with ambiguous rules or disputes
Then compute coverage as forecasts made divided by eligible markets.
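A minimal sketch of how eligibility and coverage might be computed, assuming simple dictionary records with hypothetical field names (is_binary, open_hours, category, disputed, market_id):

    # Hypothetical eligibility filter; field names and categories are placeholders.
    ALLOWED_CATEGORIES = {"politics", "economics", "science"}

    def is_eligible(market):
        return (
            market["is_binary"]
            and market["open_hours"] >= 24
            and market["category"] in ALLOWED_CATEGORIES
            and not market["disputed"]
        )

    def coverage(user_forecasts, markets):
        eligible = {m["id"] for m in markets if is_eligible(m)}
        forecasted = {f["market_id"] for f in user_forecasts if f["market_id"] in eligible}
        return len(forecasted) / len(eligible) if eligible else 0.0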
Pillar 2: require minimum volume and coverage
Low-volume performance is noisy and easy to fake.
Set minimums like:
• minimum sample size (N)
• minimum active days
• minimum coverage percentage
This prevents “one great week” from carrying a whole season.
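A sketch of a qualification gate built on these minimums; the threshold values are placeholders, not recommendations:

    # Hypothetical qualification thresholds; tune them to your season length and pool size.
    MIN_N = 30            # minimum scored forecasts
    MIN_ACTIVE_DAYS = 14  # minimum distinct days with at least one forecast
    MIN_COVERAGE = 0.25   # minimum share of the eligible pool forecasted

    def qualifies(n_scored, active_days, coverage_pct):
        return (
            n_scored >= MIN_N
            and active_days >= MIN_ACTIVE_DAYS
            and coverage_pct >= MIN_COVERAGE
        )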
Pillar 3: score with evaluation checkpoints
If you score only each user's final update before settlement, you reward waiting.
Use an evaluation checkpoint so everyone is evaluated at the same horizon.
Common choices:
• T-24h before settlement
• a fixed daily snapshot time (for long-running questions)
Checkpoints are the single biggest upgrade for fairness.
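One way to implement a T-24h checkpoint is to score, for each user and market, the latest update made at or before the checkpoint; a sketch assuming timestamped forecast updates:

    from datetime import timedelta

    # Sketch: return the forecast in force at the checkpoint (latest update at or before T-24h).
    def forecast_at_checkpoint(updates, settlement_time, horizon=timedelta(hours=24)):
        checkpoint = settlement_time - horizon
        standing = [u for u in updates if u["time"] <= checkpoint]
        if not standing:
            return None  # user had no forecast standing at the checkpoint
        return max(standing, key=lambda u: u["time"])["prob"]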
Pillar 4: use clean benchmarks
Raw Brier scores are hard to compare across question sets. Use the Brier skill score relative to a benchmark.
Default benchmark: base rate
The base rate is always available and hard to game, so it should be your default baseline.
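A minimal sketch of the Brier skill score against the base-rate benchmark, assuming aligned lists of probabilities and 0/1 outcomes:

    def brier(probs, outcomes):
        return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

    def brier_skill_score(probs, outcomes):
        base_rate = sum(outcomes) / len(outcomes)
        bs_user = brier(probs, outcomes)
        bs_ref = brier([base_rate] * len(outcomes), outcomes)
        if bs_ref == 0:
            return 0.0  # degenerate pool: every outcome identical, skill is undefined
        return 1.0 - bs_user / bs_ref  # > 0 beats the base rate, < 0 loses to it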
Market benchmark: only with liquidity rules
If you benchmark against market consensus, define:
• mid price or VWAP (volume-weighted average price), not last trade
• consensus timestamp or window
• liquidity filters (spread, volume, depth)
Otherwise thin markets will inject noise or invite manipulation.
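A sketch of such a liquidity gate; every threshold here is a placeholder to be tuned per platform:

    # Hypothetical liquidity gate: use market consensus as a benchmark only when it passes.
    def consensus_usable(bid, ask, volume_24h, depth_at_mid):
        spread = ask - bid
        return (
            spread <= 0.05          # maximum allowed spread (placeholder)
            and volume_24h >= 500   # minimum 24h volume (placeholder units)
            and depth_at_mid >= 100  # minimum resting depth near the mid (placeholder)
        )

    def consensus_price(bid, ask):
        return (bid + ask) / 2  # mid price, not last trade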
Pillar 5: publish methodology
A leaderboard should have a visible methodology box, including:
• eligibility definition
• checkpoint rule
• metric and benchmark definition
• N and coverage requirements
• how voids and disputes are handled
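One way to keep this auditable is to store the methodology as a single config that both the scoring code and the leaderboard page read; a hypothetical example:

    # Hypothetical methodology config; the leaderboard page can render this verbatim.
    METHODOLOGY = {
        "eligibility": "binary markets, open >= 24h, selected categories, no disputes",
        "checkpoint": "latest forecast at or before T-24h",
        "metric": "Brier skill score vs. base rate",
        "min_n": 30,
        "min_coverage": 0.25,
        "voids": "voided or disputed markets are excluded from scoring",
    }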
Additional anti-gaming rules (optional)
Locking the forecast at entry
In tournament mode, you can allow only one forecast per market per user. This stops micro-timing games.
Limit late entries
Require that a forecast be made at least X hours before settlement to count.
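A sketch of that cutoff, with the X-hour threshold left as a placeholder parameter:

    from datetime import timedelta

    # Hypothetical cutoff: forecasts made inside the final X hours do not count.
    def counts_for_scoring(forecast_time, settlement_time, cutoff_hours=12):
        return forecast_time <= settlement_time - timedelta(hours=cutoff_hours)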
Show distributions
Publish how often users forecast in each probability range. It is hard to hide extreme behavior when distributions are visible.
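A sketch of the published distribution, bucketing forecast probabilities into ten ranges:

    from collections import Counter

    # Bucket forecast probabilities into 10% bins (1.0 falls in the "90-100%" bin).
    def probability_distribution(probs):
        bins = Counter(min(int(p * 10), 9) for p in probs)
        return {f"{b * 10}-{b * 10 + 10}%": bins.get(b, 0) for b in range(10)}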
Use rolling windows for stability
Show rolling performance so users see whether someone is stable or spiky. See Rolling Windows.
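A sketch of a rolling-window score, assuming a chronological list of per-forecast Brier scores:

    # Rolling mean Brier score over a fixed window of the most recent scored forecasts.
    def rolling_brier(scores, window=50):
        out = []
        for i in range(window, len(scores) + 1):
            chunk = scores[i - window:i]
            out.append(sum(chunk) / window)
        return out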
Common mistakes
Letting users choose everything
Without a pool and coverage rules, leaderboards are mostly selection bias.
No checkpoint rule
Scoring only the final pre-settlement update rewards waiting and late copying.
Using market prices without liquidity checks
Thin markets break the baseline.
Hiding N
No N means no trust.
Takeaway
A fair leaderboard is rules plus transparency. Define a shared eligibility pool, require coverage and a minimum N, score at checkpoints, benchmark against base rates, and use market consensus only when liquidity is real. Then publish the methodology so rankings are interpretable and defensible.