Guides & Tutorials
Designing a Fair Leaderboard: Rules That Stop Gaming
Most forecasting leaderboards reward timing, cherry-picking, or low-volume luck. This guide gives practical rules that make a leaderboard fair: eligibility pools, coverage requirements, checkpoints, liquidity flags, and transparent methodology.
Read Guide →
Backtesting Forecasters: A Minimal, Repeatable Template
Backtesting forecasters is not about fancy models. It is about clean rules: what gets scored, when it gets scored, and what baseline you compare against. This guide gives a minimal template you can repeat every month without moving the goalposts.
Read Guide →
Herding vs Independent Forecasting: When Consensus Hurts
Following the crowd can improve accuracy when you have no edge, but it can also destroy learning, hide weak calibration, and create fake confidence. This guide explains herding, when consensus is useful, and when you should stay independent.
Read Guide →
The Base Rate Trap: Why Priors Beat Vibes
Most forecast mistakes start by ignoring base rates. This guide explains what base rates are, why humans neglect them, and a simple workflow that anchors priors before you update with evidence.
Read Guide →
Multi-Class Forecasts: Extending Brier Beyond Binary
Brier score is most common for binary events, but it also works for multi-class outcomes like A vs B vs Draw. This guide shows the multi-class formula, a worked example, normalization choices, and how to benchmark fairly.
Read Guide →
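For the multi-class case above, a minimal sketch of the summed-squared-error convention (one common choice; a perfect forecast scores 0 and the worst possible score is 2). The A vs B vs Draw numbers and the function name are made up for illustration.

```python
# Multi-class Brier score: sum of squared errors over all classes.
# With this convention a perfect forecast scores 0 and the worst possible score is 2.

def multiclass_brier(probs: dict[str, float], outcome: str) -> float:
    """probs maps each class to its forecast probability; outcome is the class that happened."""
    return sum((p - (1.0 if c == outcome else 0.0)) ** 2 for c, p in probs.items())

# Worked example: A vs B vs Draw forecast at 55% / 25% / 20%, and A happens.
forecast = {"A": 0.55, "B": 0.25, "Draw": 0.20}
print(round(multiclass_brier(forecast, "A"), 3))  # 0.45**2 + 0.25**2 + 0.20**2 = 0.305
```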
Clipping Probabilities: When It Helps and When It Lies
Probability clipping replaces 0 and 1 with small bounds like 0.01 and 0.99 to avoid infinite log loss and numeric issues. This guide explains when clipping is reasonable, what it hides, and how to disclose it in methodology.
Read Guide →
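A minimal sketch of clipping before log loss, assuming the symmetric 0.01 / 0.99 bounds mentioned above; the bound value itself is a methodology choice that should be disclosed.

```python
import math

def clip(p: float, lo: float = 0.01, hi: float = 0.99) -> float:
    """Clamp a probability into [lo, hi] so log loss stays finite."""
    return min(max(p, lo), hi)

def log_loss(p: float, outcome: int) -> float:
    """Binary log loss for a forecast p of the event; outcome is 1 if it happened, else 0."""
    return -math.log(p if outcome == 1 else 1.0 - p)

# A forecast of exactly 0 that resolves YES has infinite log loss without clipping;
# clipping to 0.01 caps the penalty at -log(0.01), roughly 4.61.
print(round(log_loss(clip(0.0), 1), 3))   # 4.605
print(round(log_loss(clip(0.70), 1), 3))  # 0.357, untouched by the clip
```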
Confidence Intervals for Calibration Buckets
Calibration bucket hit rates are noisy when sample size is small. Confidence intervals show how uncertain the realized frequency is, so you do not overreact to random swings. This guide explains a practical way to add intervals to your scorecard.
Read Guide →
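One practical choice for the intervals described above is a Wilson score interval per bucket. A minimal sketch, assuming a 95% interval (z ≈ 1.96); the bucket counts are hypothetical.

```python
import math

def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a bucket's realized hit rate (z = 1.96 gives ~95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = hits / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return (center - half, center + half)

# Hypothetical bucket: 14 hits out of 20 forecasts placed in the 70% bucket.
lo, hi = wilson_interval(14, 20)
print(f"hit rate 0.70, 95% interval ({lo:.2f}, {hi:.2f})")  # roughly (0.48, 0.85)
```

With only 20 forecasts the interval is wide enough to contain 70%, which is exactly why a single noisy bucket should not trigger a calibration overhaul.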
Probability Buckets: How Many and How Wide
Calibration tables need probability buckets, but bucket choices can mislead. This guide explains how many buckets to use, how wide they should be, and how to handle small samples without fake precision.
Read Guide →
Scorecard Methodology: What You Must Disclose
A scorecard without methodology is just a marketing number. This guide lists the minimum disclosures that make Brier score and Brier skill score interpretable, comparable, and hard to game.
Read Guide →
How to Build a Simple Forecasting Journal
A forecasting journal is the fastest way to improve calibration and reduce repeat mistakes. This guide gives a minimal template, what to log, and how to turn entries into a scorecard feedback loop.
Read Guide →
Brier Score Calculator: Step-by-Step With Examples
This step-by-step guide shows how to calculate Brier score for binary events, how to average across forecasts, and how to turn the result into a practical scorecard with baselines and checkpoints.
Read Guide →
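A minimal sketch of that calculation, with made-up forecasts: score each forecast as squared error, then average.

```python
# Brier score for a binary event: squared error between the forecast probability
# and the outcome (1 if it happened, 0 if not). The scorecard number is the mean.

def brier(p: float, outcome: int) -> float:
    return (p - outcome) ** 2

# Hypothetical forecasts: (probability given, what actually happened)
forecasts = [(0.80, 1), (0.60, 0), (0.30, 0), (0.90, 1)]
scores = [brier(p, o) for p, o in forecasts]
print([round(s, 2) for s in scores])          # [0.04, 0.36, 0.09, 0.01]
print(round(sum(scores) / len(scores), 3))    # mean Brier score = 0.125
```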
Benchmarking Against the Market: A Clean Methodology
If you want to say “I beat the market”, you must define timestamps, consensus, and liquidity rules. This guide gives a simple methodology you can publish so market benchmarking is fair, repeatable, and hard to game.
Read Guide →
Proper Scoring Rules: Why Honest Probabilities Win
A proper scoring rule rewards you for stating your true belief as a probability. This guide explains what “proper” means, why Brier and log loss are proper, and how proper scoring reduces gaming and overconfidence.
Read Guide →
Evaluation Checkpoints: How to Score Forecasts Fairly
If you score the last update before settlement, you mostly reward late forecasting. Evaluation checkpoints fix that by scoring everyone at the same horizon. This guide explains checkpoint rules, edge cases, and how to avoid look-ahead bias.
Read Guide →
Liquidity and Thin Markets: When Market Benchmarks Break
Market consensus is a strong benchmark only when the market is liquid. In thin markets, last trade and even mid price can be noisy or manipulable. This guide explains what to watch and how to benchmark safely.
Read Guide →
Market Consensus: Mid Price vs Last Trade vs VWAP
If you benchmark forecasters against the market, you must define what “the market probability” is. This guide compares last trade, mid price, and VWAP, and explains when each one is appropriate.
Read Guide →
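A minimal sketch of the three definitions on a hypothetical quote and trade tape (prices quoted as probabilities); which one you read off at the benchmark timestamp is exactly the methodology decision at stake.

```python
# Three common definitions of "the market probability" at the benchmark timestamp.
best_bid, best_ask = 0.62, 0.66                   # hypothetical top of book
trades = [(0.63, 400), (0.65, 100), (0.64, 250)]  # hypothetical (price, size) tape, oldest first

mid_price = (best_bid + best_ask) / 2                              # midpoint of best bid and ask
last_trade = trades[-1][0]                                         # most recent traded price
vwap = sum(p * q for p, q in trades) / sum(q for _, q in trades)   # volume-weighted average price

print(round(mid_price, 3), last_trade, round(vwap, 3))  # 0.64 0.64 0.636
```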
Implied Probability: From Market Prices and Odds to a Forecast
Implied probability is the probability embedded in a market price or in bookmaker odds. This guide shows how to convert prices and odds into probabilities, how to remove the vig, and how to use implied probability as a benchmark in scorecards.
Read Guide →
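A minimal sketch of the conversion for decimal odds, using simple proportional normalization to remove the vig (one common method among several); the quoted odds are made up.

```python
# Convert decimal odds to implied probabilities, then remove the vig by
# normalizing so the probabilities sum to 1 (simple proportional method).

def implied_probs(decimal_odds: list[float]) -> list[float]:
    raw = [1.0 / o for o in decimal_odds]   # raw implied probabilities; they sum to more than 1
    overround = sum(raw)                    # the bookmaker's margin is hidden in this excess
    return [p / overround for p in raw]

# Hypothetical two-way market quoted at decimal odds of 1.80 and 2.10.
print([round(p, 3) for p in implied_probs([1.80, 2.10])])  # [0.538, 0.462]
```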
Out-of-Sample Testing for Forecasters
Out of sample testing is how you check whether forecasting skill generalizes. This guide explains simple time splits, how to avoid leakage, and how to build a scorecard that is hard to game.
Read Guide →
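A minimal sketch of a single time split on hypothetical records: earlier questions are used for tuning, later questions are scored and never touched while tuning. Splitting on resolution date is one simple rule for keeping post-cutoff information out of the in-sample set.

```python
from datetime import date

# Hypothetical records: (question_id, resolution_date, forecast, outcome)
records = [
    ("q1", date(2024, 1, 15), 0.70, 1),
    ("q2", date(2024, 3, 2), 0.40, 0),
    ("q3", date(2024, 6, 20), 0.85, 1),
    ("q4", date(2024, 9, 5), 0.20, 1),
]

cutoff = date(2024, 6, 1)
in_sample = [r for r in records if r[1] < cutoff]        # used for tuning / calibration fixes
out_of_sample = [r for r in records if r[1] >= cutoff]   # scored, never used for tuning

oos_brier = sum((p - o) ** 2 for _, _, p, o in out_of_sample) / len(out_of_sample)
print(len(in_sample), len(out_of_sample), round(oos_brier, 3))  # 2 2 0.331
```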
Selection Bias and Coverage: How People Accidentally Fake Skill
Most forecasting leaderboards lie because people do not forecast the same questions. This guide explains selection bias, why coverage matters, and how to design scorecards and rules that measure skill instead of cherry picking.
Read Guide →
Rolling Windows: Tracking Improvement Over Time
Rolling windows show how your forecasting performance changes over time without overreacting to one good or bad week. This guide explains window size choices, how to read trends, and how to separate real improvement from noise.
Read Guide →
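A minimal sketch of a trailing window over per-forecast Brier scores (hypothetical numbers); the window length is the main choice to justify.

```python
# Rolling mean Brier score over a trailing window of the most recent N resolved forecasts.

def rolling_brier(scores: list[float], window: int = 5) -> list[float]:
    """One mean per position once the window is full, oldest window first."""
    return [sum(scores[i - window:i]) / window for i in range(window, len(scores) + 1)]

# Hypothetical per-forecast Brier scores, oldest first.
scores = [0.04, 0.36, 0.09, 0.01, 0.25, 0.16, 0.02, 0.49, 0.10, 0.05]
print([round(x, 3) for x in rolling_brier(scores, window=5)])
# A falling sequence suggests improvement; a single spike is usually just noise.
```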
Forecast Horizon: Why Early Predictions Are Harder
Forecast horizon is how far ahead you are predicting. Earlier forecasts face more uncertainty and usually score worse, so fair evaluation needs horizon splits or fixed checkpoints.
Read Guide →
Log Loss vs Brier: Which One Punishes You More and Why
Brier score and log loss are both proper scoring rules, but they punish mistakes differently. This guide explains the intuition, shows worked examples, and tells you which metric fits your use case.
Read Guide →
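A minimal sketch of how the two penalties diverge on a confidently wrong forecast, using the standard binary formulas; the probabilities are made up.

```python
import math

def brier(p: float, outcome: int) -> float:
    return (p - outcome) ** 2

def log_loss(p: float, outcome: int) -> float:
    return -math.log(p if outcome == 1 else 1.0 - p)

# A confidently wrong forecast: 95% on an event that did not happen.
print(round(brier(0.95, 0), 3), round(log_loss(0.95, 0), 3))   # 0.903 2.996 (Brier is bounded at 1)
# A mildly wrong forecast: 60% on the same non-event.
print(round(brier(0.60, 0), 3), round(log_loss(0.60, 0), 3))   # 0.36 0.916
# Log loss is unbounded as the forecast approaches certainty on a miss; Brier is not.
```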
Brier Score Decomposition: Reliability, Resolution, Uncertainty
Brier score can be broken into three parts that explain why your score is good or bad. This guide explains reliability (calibration), resolution (discrimination), and uncertainty (base rate), and how to use the decomposition to improve.
Read Guide →
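A minimal sketch of the Murphy decomposition (BS = reliability - resolution + uncertainty) on made-up data, grouping by distinct forecast values so the identity holds exactly.

```python
# Murphy decomposition of the mean Brier score:
#   BS = reliability - resolution + uncertainty
# Reliability is calibration error, resolution is discrimination, uncertainty is the base-rate term.
from collections import defaultdict

# Hypothetical (forecast, outcome) pairs, with forecasts already at their bucket values.
data = [(0.2, 0), (0.2, 0), (0.2, 1), (0.7, 1), (0.7, 1), (0.7, 0), (0.9, 1), (0.9, 1)]

n = len(data)
base_rate = sum(o for _, o in data) / n

buckets = defaultdict(list)
for p, o in data:
    buckets[p].append(o)

reliability = sum(len(v) * (p - sum(v) / len(v)) ** 2 for p, v in buckets.items()) / n
resolution = sum(len(v) * (sum(v) / len(v) - base_rate) ** 2 for v in buckets.values()) / n
uncertainty = base_rate * (1 - base_rate)

brier = sum((p - o) ** 2 for p, o in data) / n
print(round(brier, 3), round(reliability - resolution + uncertainty, 3))  # both 0.176
```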
Choosing a Baseline: 50/50 vs Base Rate vs Market Consensus
A baseline (benchmark) is the reference forecast you compare against when you calculate Brier skill score. This guide explains the three most common baselines, when each is appropriate, and the traps that can create fake skill.
Read Guide →
Overconfidence and Underconfidence: How to Diagnose and Fix
Overconfidence means your high probabilities happen less often than you claim. Underconfidence means they happen more often. This guide shows how to diagnose both from calibration tables and how to fix them with simple probability mapping.
Read Guide →
Sharpness vs Calibration: Being Bold Without Being Wrong
Sharpness is how far your forecasts move away from 50%. Calibration is whether those probabilities match reality. This guide explains the difference, why both matter, and how to improve sharpness without becoming overconfident.
Read Guide →
How to Read a Calibration Curve and Table
Calibration curves and tables show whether your probabilities match reality. This guide explains probability buckets, how to interpret deviations, and how to avoid being fooled by small sample sizes.
Read Guide →
Calibration Explained: Why 70 Percent Should Mean 70 Percent
Calibration means your probabilities match reality over time. If you say 70% often, those events should happen about 70% of the time. This guide explains calibration, how to measure it, and how to fix common patterns.
Read Guide →
How to Read a Forecast Scorecard
A good scorecard is more than one number. This guide explains Brier score, Brier skill score, sample size, coverage, calibration diagnostics, horizon splits, and the methodology checks that determine whether results are trustworthy.
Read Guide →
Brier Score vs Brier Skill Score: When to Use Which
Brier score measures raw probability error. Brier skill score (BSS) measures performance relative to a benchmark like base rate or market consensus. This guide shows the difference, the math, and how to pick the right baseline.
Read Guide →
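A minimal sketch of the skill score on made-up data, using the in-sample base rate as the baseline (one option; which baseline is appropriate is what the guide covers).

```python
# Brier skill score: BSS = 1 - BS_forecaster / BS_baseline.
# Positive beats the baseline, 0 matches it, negative is worse than it.

def mean_brier(probs: list[float], outcomes: list[int]) -> float:
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)

outcomes = [1, 0, 1, 1, 0]
forecaster = [0.80, 0.30, 0.70, 0.60, 0.20]
base_rate = sum(outcomes) / len(outcomes)      # 0.60; one possible baseline forecast
baseline = [base_rate] * len(outcomes)

bs_f = mean_brier(forecaster, outcomes)
bs_b = mean_brier(baseline, outcomes)
print(round(bs_f, 3), round(bs_b, 3), round(1 - bs_f / bs_b, 3))  # 0.084 0.24 0.65
```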
What Is the Brier Score and What It Measures
Brier score is the standard way to measure the accuracy of probabilistic forecasts for binary events. This guide explains what it measures, how it is calculated, and how to interpret it in practice.
Read Guide →