Guides & Tutorials

Designing a Fair Leaderboard: Rules That Stop Gaming

Most forecasting leaderboards reward timing, cherry-picking, or low-volume luck. This guide gives practical rules that make a leaderboard fair: eligibility pools, coverage requirements, checkpoints, liquidity flags, and transparent methodology.
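
To make the coverage rule concrete, here is a minimal Python sketch of an eligibility filter; the record fields and the 80% threshold are illustrative assumptions, not rules prescribed by the guide.

```python
# Minimal sketch of a coverage-based eligibility rule: a forecaster only
# appears on the leaderboard after answering enough of the eligible pool.
def eligible_forecasters(forecasts, question_pool, min_coverage=0.8):
    """Return forecasters who covered at least min_coverage of the pool."""
    answered = {}
    for f in forecasts:  # each forecast: {"forecaster": ..., "question_id": ...}
        answered.setdefault(f["forecaster"], set()).add(f["question_id"])
    return {
        name for name, questions in answered.items()
        if len(questions & question_pool) / len(question_pool) >= min_coverage
    }
```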

Backtesting Forecasters: A Minimal, Repeatable Template

Backtesting forecasters is not about fancy models. It is about clean rules: what gets scored, when it gets scored, and what baseline you compare to. This guide gives a minimal template you can repeat every month without changing the goalposts.
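
As a sketch of what "clean rules" can look like in practice, the snippet below groups resolved forecasts by calendar month and averages their Brier scores; the record schema ('month', 'p', 'outcome') is assumed for illustration.

```python
from collections import defaultdict

def monthly_brier(forecasts):
    """Average Brier score per calendar month, oldest month first."""
    buckets = defaultdict(list)
    for f in forecasts:  # assumed schema: month "2024-03", p in [0, 1], outcome 0/1
        buckets[f["month"]].append((f["p"] - f["outcome"]) ** 2)
    return {month: sum(errs) / len(errs) for month, errs in sorted(buckets.items())}
```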

Herding vs Independent Forecasting: When Consensus Hurts

Following the crowd can improve accuracy when you have no edge, but it can also destroy learning, hide weak calibration, and create fake confidence. This guide explains herding, when consensus is useful, and when you should stay independent.

The Base Rate Trap: Why Priors Beat Vibes

Most forecast mistakes start by ignoring base rates. This guide explains what base rates are, why humans neglect them, and a simple workflow that anchors priors before you update with evidence.
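
A worked example of the prior-first workflow, in odds form; the 10% base rate and the likelihood ratio of 3 are made-up numbers.

```python
# Anchor on the base rate, then update on evidence via Bayes' rule in odds form.
prior = 0.10              # base rate for events of this type (illustrative)
likelihood_ratio = 3.0    # evidence judged 3x likelier if the event happens

prior_odds = prior / (1 - prior)                # ~0.111
posterior_odds = prior_odds * likelihood_ratio  # ~0.333
posterior = posterior_odds / (1 + posterior_odds)
print(round(posterior, 2))                      # 0.25 -- not the 0.9 "vibes" answer
```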

Multi-Class Forecasts: Extending Brier Beyond Binary

The Brier score is most commonly used for binary events, but it also works for multi-class outcomes like A vs B vs Draw. This guide shows the multi-class formula, a worked example, normalization choices, and how to benchmark fairly.
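
A minimal implementation of the multi-class formula, with the A vs B vs Draw example worked through; whether to divide by the number of classes is a normalization choice you should disclose.

```python
def multiclass_brier(probs, outcome_index):
    """Multi-class Brier: sum over classes of (p_k - o_k)^2.

    Ranges from 0 (perfect) to 2 (worst); some sources divide by the
    number of classes, which is a normalization choice to disclose.
    """
    return sum(
        (p - (1.0 if k == outcome_index else 0.0)) ** 2
        for k, p in enumerate(probs)
    )

# A vs B vs Draw with forecast (0.5, 0.3, 0.2), and A happens:
print(multiclass_brier([0.5, 0.3, 0.2], 0))  # 0.25 + 0.09 + 0.04 = 0.38
```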

Clipping Probabilities: When It Helps and When It Lies

Probability clipping replaces 0 and 1 with bounds such as 0.01 and 0.99 to avoid infinite log loss and numerical issues. This guide explains when clipping is reasonable, what it hides, and how to disclose it in your methodology.
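
A minimal sketch of symmetric clipping; eps = 0.01 is one common choice, not a standard, and whatever value you pick belongs in your methodology.

```python
def clip_probability(p, eps=0.01):
    """Clip p into [eps, 1 - eps] so log loss stays finite."""
    return min(max(p, eps), 1.0 - eps)

print(clip_probability(0.0))   # 0.01 instead of an infinite log loss
print(clip_probability(0.97))  # 0.97, unchanged
```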

Confidence Intervals for Calibration Buckets

Calibration bucket hit rates are noisy when sample size is small. Confidence intervals show how uncertain the realized frequency is, so you do not overreact to random swings. This guide explains a practical way to add intervals to your scorecard.
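
One practical choice is the Wilson score interval, sketched below; it behaves better than the normal approximation at the small sample sizes typical of calibration buckets.

```python
import math

def wilson_interval(hits, n, z=1.96):
    """95% Wilson score interval for a bucket's realized hit rate."""
    if n == 0:
        return (0.0, 1.0)
    phat = hits / n
    denom = 1 + z ** 2 / n
    center = (phat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(phat * (1 - phat) / n + z ** 2 / (4 * n ** 2)) / denom
    return (center - half, center + half)

# 7 hits out of 10 forecasts in the "70%" bucket:
print(wilson_interval(7, 10))  # roughly (0.40, 0.89) -- too wide to panic over
```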

Probability Buckets: How Many and How Wide

Calibration tables need probability buckets, but bucket choices can mislead. This guide explains how many buckets to use, how wide they should be, and how to handle small samples without fake precision.

Scorecard Methodology: What You Must Disclose

A scorecard without methodology is just a marketing number. This guide lists the minimum disclosures that make Brier score and Brier skill score interpretable, comparable, and hard to game.

How to Build a Simple Forecasting Journal

A forecasting journal is the fastest way to improve calibration and reduce repeat mistakes. This guide gives a minimal template, what to log, and how to turn entries into a scorecard feedback loop.
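
A minimal entry might look like the dict below; the field names are one reasonable choice for illustration, not a required schema.

```python
# One journal entry: log it when you forecast, score it when it resolves.
entry = {
    "date": "2024-03-01",
    "question": "Will X happen by June 30?",
    "probability": 0.65,
    "reasoning": "Base rate ~40%; updated up on the March announcement.",
    "resolve_by": "2024-06-30",
    "outcome": None,  # fill in 0 or 1 at resolution, then score (p - outcome)**2
}
```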

Brier Score Calculator: Step-by-Step With Examples

This step-by-step guide shows how to calculate the Brier score for binary events, how to average it across forecasts, and how to turn the result into a practical scorecard with baselines and checkpoints.
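
The whole calculation fits in a few lines; the three forecasts below are made-up numbers.

```python
# Worked example: three binary forecasts and their outcomes.
forecasts = [0.9, 0.7, 0.2]   # stated probabilities of YES
outcomes  = [1,   0,   0]     # what actually happened

# Step 1: squared error per forecast. Step 2: average across forecasts.
per_forecast = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
print([round(e, 2) for e in per_forecast])              # [0.01, 0.49, 0.04]
print(round(sum(per_forecast) / len(per_forecast), 2))  # 0.18
```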

Benchmarking Against the Market: A Clean Methodology

If you want to say “I beat the market”, you must define your timestamps, your consensus measure, and your liquidity rules. This guide gives a simple methodology you can publish so market benchmarking is fair, repeatable, and hard to game.

Proper Scoring Rules: Why Honest Probabilities Win

A proper scoring rule rewards you for stating your true belief as a probability. This guide explains what “proper” means, why Brier and log loss are proper, and how proper scoring reduces gaming and overconfidence.
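
The "honesty wins" claim can be checked directly for the Brier score: if your true belief is q, your expected score for reporting p is q(p - 1)^2 + (1 - q)p^2, which is minimized at p = q.

```python
def expected_brier(report, belief):
    """Expected Brier score of stating `report` when your true belief is `belief`."""
    return belief * (report - 1) ** 2 + (1 - belief) * report ** 2

belief = 0.7
for report in (0.5, 0.7, 0.9):
    print(report, round(expected_brier(report, belief), 3))
# 0.5 -> 0.25, 0.7 -> 0.21 (best), 0.9 -> 0.25: hedging and exaggerating both cost you
```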

Evaluation Checkpoints: How to Score Forecasts Fairly

If you score only the last update before settlement, you mostly reward late forecasting. Evaluation checkpoints fix that by scoring everyone at the same horizon. This guide explains checkpoint rules, edge cases, and how to avoid look-ahead bias.
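
A minimal checkpoint rule, assuming updates arrive as (timestamp, probability) pairs: score the latest update made at or before the checkpoint, never anything after it.

```python
def forecast_at_checkpoint(updates, checkpoint):
    """Latest probability stated at or before the checkpoint (avoids look-ahead)."""
    eligible = [(t, p) for t, p in updates if t <= checkpoint]
    if not eligible:
        return None  # no forecast by the checkpoint; your rules must say what happens
    return max(eligible)[1]  # probability from the latest eligible update

updates = [(1, 0.40), (5, 0.55), (9, 0.90)]  # (day, probability)
print(forecast_at_checkpoint(updates, checkpoint=7))  # 0.55, not the late 0.90
```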

Liquidity and Thin Markets: When Market Benchmarks Break

Market consensus is a strong benchmark only when the market is liquid. In thin markets, the last trade and even the mid-price can be noisy or manipulable. This guide explains what to watch for and how to benchmark safely.

Market Consensus: Mid-Price vs Last Trade vs VWAP

If you benchmark forecasters against the market, you must define what “the market probability” is. This guide compares last trade, mid-price, and VWAP, and explains when each is appropriate.
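
The three definitions side by side, assuming prices quoted in [0, 1] as on a binary prediction market.

```python
def consensus_probabilities(trades, best_bid, best_ask):
    """Last trade, mid-price, and VWAP from the same market state.

    trades is a list of (price, size) tuples, oldest first.
    """
    last_trade = trades[-1][0]
    mid_price = (best_bid + best_ask) / 2
    vwap = sum(p * s for p, s in trades) / sum(s for _, s in trades)
    return last_trade, mid_price, vwap

trades = [(0.60, 100), (0.62, 300), (0.70, 10)]  # note the tiny late trade
print(consensus_probabilities(trades, best_bid=0.61, best_ask=0.63))
# (0.70, 0.62, ~0.617): last trade is the easiest of the three to push around
```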

Implied Probability: From Market Prices and Odds to a Forecast

Implied probability is the probability embedded in a market price or in bookmaker odds. This guide shows how to convert prices and odds into probabilities, how to remove the vig, and how to use implied probability as a benchmark in scorecards.
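
A sketch of the simplest de-vig method, proportional normalization; other methods exist and can differ materially at long odds.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds to vig-free probabilities by normalization.

    Raw implied probability is 1/odds; a bookmaker's margin makes the raw
    values sum to more than 1, so dividing by that sum removes the vig.
    """
    raw = [1.0 / o for o in decimal_odds]
    total = sum(raw)  # e.g. 1.05 means a 5% overround
    return [r / total for r in raw]

# A two-way market quoted at 1.90 / 1.90:
print(implied_probabilities([1.90, 1.90]))  # [0.5, 0.5] once the vig is removed
```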

Out-of-Sample Testing for Forecasters

Out-of-sample testing is how you check whether forecasting skill generalizes. This guide explains simple time splits, how to avoid leakage, and how to build a scorecard that is hard to game.
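
A chronological split takes a few lines; the 'date' key is an assumed schema, and the important part is that nothing gets shuffled before splitting.

```python
def time_split(records, cutoff):
    """Fit and tune on the past, evaluate on the future.

    Shuffling before splitting would leak future information into the
    'training' side, which is the most common form of leakage.
    """
    in_sample = [r for r in records if r["date"] < cutoff]
    out_of_sample = [r for r in records if r["date"] >= cutoff]
    return in_sample, out_of_sample
```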

Selection Bias and Coverage: How People Accidentally Fake Skill

Most forecasting leaderboards lie because people do not forecast the same questions. This guide explains selection bias, why coverage matters, and how to design scorecards and rules that measure skill instead of cherry-picking.

Rolling Windows: Tracking Improvement Over Time

Rolling windows show how your forecasting performance changes over time without overreacting to one good or bad week. This guide explains window size choices, how to read trends, and how to separate real improvement from noise.
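
A trailing-window average in plain Python; the 30-forecast window is an arbitrary illustrative choice.

```python
def rolling_brier(scores, window=30):
    """Mean Brier score over a trailing window of chronological per-forecast scores."""
    out = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```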

Forecast Horizon: Why Early Predictions Are Harder

Forecast horizon is how far ahead you are predicting. Earlier forecasts face more uncertainty and usually score worse, so fair evaluation needs horizon splits or fixed checkpoints.

Log Loss vs Brier: Which One Punishes You More and Why

Brier score and log loss are both proper scoring rules, but they punish mistakes differently. This guide explains the intuition, shows worked examples, and tells you which metric fits your use case.
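
The difference in punishment is easiest to see on a single confident miss.

```python
import math

def brier(p, outcome):
    return (p - outcome) ** 2

def log_loss(p, outcome):
    return -math.log(p if outcome == 1 else 1 - p)

# A confident miss: 0.99 on an event that did not happen.
print(brier(0.99, 0))     # 0.9801 -- bounded above by 1
print(log_loss(0.99, 0))  # ~4.6 -- and it grows without bound as p -> 1
```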

Brier Score Decomposition: Reliability, Resolution, Uncertainty

Brier score can be broken into three parts that explain why your score is good or bad. This guide explains reliability (calibration), resolution (discrimination), and uncertainty (base rate), and how to use the decomposition to improve.
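
A compact sketch of the decomposition, grouping by exact stated probability for simplicity; a real scorecard would group into probability buckets instead.

```python
from collections import defaultdict

def brier_decomposition(probs, outcomes):
    """Murphy decomposition: BS = reliability - resolution + uncertainty."""
    n = len(probs)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)
    for p, o in zip(probs, outcomes):
        groups[p].append(o)
    reliability = sum(len(os) * (p - sum(os) / len(os)) ** 2
                      for p, os in groups.items()) / n
    resolution = sum(len(os) * (sum(os) / len(os) - base_rate) ** 2
                     for os in groups.values()) / n
    uncertainty = base_rate * (1 - base_rate)
    return reliability, resolution, uncertainty
```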

Choosing a Baseline: 50/50 vs Base Rate vs Market Consensus

A baseline (benchmark) is the reference forecast you compare against when you calculate Brier skill score. This guide explains the three most common baselines, when each is appropriate, and the traps that can create fake skill.

Overconfidence and Underconfidence: How to Diagnose and Fix

Overconfidence means your high probabilities happen less often than you claim. Underconfidence means they happen more often. This guide shows how to diagnose both from calibration tables and how to fix them with simple probability mapping.
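
One simple mapping is a linear shrink toward 0.5, sketched below; the shrink factor is illustrative and should be fitted on past forecasts, never the ones being scored.

```python
def shrink_toward_half(p, k=0.8):
    """Pull p toward 0.5: k < 1 tames overconfidence, k > 1 stretches
    underconfident forecasts outward."""
    return 0.5 + k * (p - 0.5)

print(round(shrink_toward_half(0.95), 2))  # 0.86: the same call, stated less boldly
```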

Sharpness vs Calibration: Being Bold Without Being Wrong

Sharpness is how far your forecasts move away from 50%. Calibration is whether those probabilities match reality. This guide explains the difference, why both matter, and how to improve sharpness without becoming overconfident.

How to Read a Calibration Curve and Table

Calibration curves and tables show whether your probabilities match reality. This guide explains probability buckets, how to interpret deviations, and how to avoid being fooled by small sample sizes.
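
A calibration table like the ones discussed here can be built in a few lines; ten equal-width buckets is a common default, not a rule.

```python
def calibration_table(probs, outcomes, n_buckets=10):
    """Rows of (bucket midpoint, mean forecast, realized hit rate, count)."""
    rows = []
    for b in range(n_buckets):
        lo, hi = b / n_buckets, (b + 1) / n_buckets
        picked = [(p, o) for p, o in zip(probs, outcomes)
                  if lo <= p < hi or (b == n_buckets - 1 and p == 1.0)]
        if picked:
            ps, os = zip(*picked)
            rows.append(((lo + hi) / 2, sum(ps) / len(ps),
                         sum(os) / len(os), len(picked)))
    return rows
```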

Calibration Explained: Why 70 Percent Should Mean 70 Percent

Calibration means your probabilities match reality over time. If you say 70% often, those events should happen about 70% of the time. This guide explains calibration, how to measure it, and how to fix common patterns.

How to Read a Forecast Scorecard

A good scorecard is more than one number. This guide explains BS, BSS, sample size, coverage, calibration diagnostics, horizon splits, and the methodology checks that determine whether results are trustworthy.

Brier Score vs Brier Skill Score: When to Use Which

Brier score measures raw probability error. Brier skill score (BSS) measures performance relative to a benchmark like base rate or market consensus. This guide shows the difference, the math, and how to pick the right baseline.
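
The skill-score math is one line; the two Brier scores below are made-up numbers.

```python
def brier_skill_score(bs, bs_reference):
    """BSS = 1 - BS / BS_ref: positive beats the benchmark, zero matches it,
    negative loses to it."""
    return 1.0 - bs / bs_reference

# Your Brier score 0.18 vs a benchmark that always states the base rate, scoring 0.24:
print(round(brier_skill_score(0.18, 0.24), 2))  # 0.25 -> a quarter of the possible improvement
```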

What Is the Brier Score and What It Measures

Brier score is the standard way to measure the accuracy of probabilistic forecasts for binary events. This guide explains what it measures, how it is calculated, and how to interpret it in practice.
