Glossary of Terms
Key terms and definitions for prediction markets, forecasting, and probability.
A
Audit Trail
An audit trail is a tamper-resistant record of forecasts, timestamps, and edits. It supports trust and prevents cherry-picking.
B
Backtest
A backtest evaluates a forecasting method on historical data. Good backtests avoid look-ahead bias and report performance out of sample.
Base Rate
Base rate is the underlying frequency with which an event occurs in a reference set. It is a natural baseline forecast and is often used to compute Brier skill score.
Base Rate Shift
Base rate shift is a change in how often an event occurs over time. It can break old baselines and cause calibration to drift.
Baseline Forecast
A baseline forecast is a simple reference model used for comparison, such as 50/50, base rate, or market consensus. Its Brier score is the denominator in Brier skill score.
Benchmark
A benchmark is a reference forecast used for comparison, such as 50/50, base rate, or market consensus. It is required to compute Brier skill score.
Binary Event
A binary event has exactly two outcomes, usually yes or no. Brier score and log loss are commonly used to score probabilistic forecasts for binary events.
Brier Score
Brier score measures the accuracy of probabilistic forecasts for binary events by averaging the squared error between predicted probability and the actual outcome. Lower is better.
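As a quick illustration, a minimal sketch of the computation in Python (the function name and sample forecasts are illustrative):

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# 0.80 on a YES, 0.30 on a NO, and an overconfident 0.90 on a NO:
print(brier_score([0.80, 0.30, 0.90], [1, 0, 0]))  # (0.04 + 0.09 + 0.81) / 3 ≈ 0.313
```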
Brier Skill Score
Brier skill score (BSS) measures how much better (or worse) your Brier score is versus a baseline forecast. Higher is better: 1 is perfect, 0 matches the baseline, and negative is worse than baseline.
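In symbols, BSS = 1 - BS / BS_baseline. A minimal sketch with illustrative numbers:

```python
def brier_skill_score(bs, bs_baseline):
    """1 is perfect, 0 matches the baseline, negative is worse than it."""
    return 1.0 - bs / bs_baseline

# A Brier score of 0.18 against a climatology baseline scoring 0.24:
print(brier_skill_score(0.18, 0.24))  # 0.25, i.e. a quarter of the baseline error removed
```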
C
Calibration
Calibration describes whether predicted probabilities match observed frequencies. A well-calibrated forecaster’s 70% predictions come true about 70% of the time.
Calibration Curve
A calibration curve plots predicted probabilities against observed frequencies across probability buckets. It shows whether probabilities mean what they claim.
Calibration Drift
Calibration drift is a change in calibration over time, where probability buckets no longer match observed frequencies. It often follows base-rate shifts or regime changes.
Calibration Table
A calibration table summarizes bucket counts, average predicted probability, and realized frequency. It is the numeric backbone of calibration curves.
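A sketch of how such a table can be built; equal-width buckets are one common choice, and the names and sample data are illustrative:

```python
def calibration_table(probs, outcomes, n_bins=10):
    """Bucket forecasts, then compare average predicted probability
    with realized frequency inside each bucket."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bucket
        bins[i].append((p, o))
    rows = []
    for i, b in enumerate(bins):
        if not b:
            continue
        count = len(b)
        avg_p = sum(p for p, _ in b) / count
        freq = sum(o for _, o in b) / count
        rows.append((f"{i / n_bins:.1f}-{(i + 1) / n_bins:.1f}", count, avg_p, freq))
    return rows

# Each row: bucket, count, average predicted probability, realized frequency.
for row in calibration_table([0.65, 0.62, 0.68, 0.10], [1, 0, 1, 0]):
    print(row)
```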
Class Imbalance
Class imbalance means outcomes are heavily skewed toward one side (mostly yes or mostly no). It affects baselines, calibration, and score interpretation.
Climatology
Climatology is a baseline forecast that predicts the historical base rate for every event. It is a standard benchmark for Brier skill score.
Confidence Interval
A confidence interval is a range that reflects uncertainty in an estimated metric, such as the realized frequency in a calibration bucket. Wider intervals usually mean smaller samples.
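The Wilson score interval is one common way to put an interval on a bucket's realized frequency; this choice is an assumption here, not something the glossary prescribes:

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion, such as the realized
    frequency in a calibration bucket; z = 1.96 gives ~95% coverage."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (center - half, center + half)

# 14 of 20 forecasts in the 70% bucket came true:
print(wilson_interval(14, 20))  # ≈ (0.481, 0.854): wide, because the sample is small
```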
Consensus Snapshot
A consensus snapshot is the benchmark probability taken at a specific time, such as the mid price at market close. It is used for consistent comparisons.
Consensus Window
A consensus window is the time period used to compute a market-consensus probability, such as VWAP over the last 6 hours or the mid price snapshot at close.
Coverage
Coverage is the share of eligible questions you actually forecast. Low coverage can hide selection bias and makes performance comparisons less reliable.
Coverage Bias
Coverage bias is the distortion that occurs when evaluation reflects only the subset of questions a forecaster chose to answer, rather than the full eligible set.
Crowd Wisdom
Crowd wisdom is the idea that aggregated forecasts from many independent participants can be more accurate than most individuals, especially when incentives and information diversity are strong.
D
Data Leakage
Data leakage occurs when training or evaluation accidentally uses information that would not be available at prediction time. It makes measured performance unrealistically good.
Decision Threshold
A decision threshold is a cutoff probability used to convert probabilities into yes/no decisions, such as acting only when p is above 0.60.
Decomposition
Brier score decomposition breaks overall error into components that reflect calibration (reliability), discrimination (resolution), and the inherent uncertainty of the task.
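A sketch of the classic (Murphy) decomposition, BS ≈ reliability - resolution + uncertainty. The identity is exact when all forecasts in a bucket are identical and approximate with finite-width buckets; names and data are illustrative:

```python
def murphy_decomposition(probs, outcomes, n_bins=10):
    """Split the Brier score into reliability (calibration error),
    resolution (discrimination), and uncertainty (base-rate variance)."""
    n = len(probs)
    base_rate = sum(outcomes) / n
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, o))
    reliability = resolution = 0.0
    for b in bins:
        if not b:
            continue
        k = len(b)
        f_k = sum(p for p, _ in b) / k  # average forecast in this bucket
        o_k = sum(o for _, o in b) / k  # realized frequency in this bucket
        reliability += k * (f_k - o_k) ** 2 / n
        resolution += k * (o_k - base_rate) ** 2 / n
    uncertainty = base_rate * (1 - base_rate)
    return reliability, resolution, uncertainty

rel, res, unc = murphy_decomposition([0.8, 0.7, 0.3, 0.2], [1, 1, 0, 0])
print(rel - res + unc)  # 0.065, matching the Brier score computed directly
```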
E
Edge
Edge is the difference between your estimated probability and the market’s implied probability. Positive edge suggests expected value if the price is fair and execution is possible.
Evaluation Checkpoint
An evaluation checkpoint is a fixed time at which forecasts are scored (for example T-24h or market close). It prevents look-ahead bias and makes comparisons fair.
Event Prevalence
Event prevalence is the proportion of events that resolve YES in a dataset. It is essentially the base rate of outcome = 1 and drives uncertainty and baseline scores.
Expected Value
Expected value (EV) is the average payoff you would expect from a decision under your probability estimate. It connects forecasts to action.
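For a binary contract that pays $1 on YES, the EV of buying at a given price reduces to your probability minus the price. A minimal sketch (the function name and fee parameter are illustrative):

```python
def ev_buy_yes(p_est, price, fee=0.0):
    """Expected value per $1-payout contract of buying YES at `price`,
    given estimated probability `p_est`; equals p_est - price - fee."""
    return p_est * (1 - price) - (1 - p_est) * price - fee

# You estimate 65% on a contract asking 55 cents:
print(ev_buy_yes(0.65, 0.55))  # ≈ 0.10 expected profit per contract
```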
Extreme Probabilities
Extreme probabilities are forecasts near 0% or 100%. They can be valuable when justified, but they create large penalties when wrong.
F
Forecast Dispersion
Forecast dispersion measures how spread out probabilities are across questions or forecasters. Low dispersion can indicate herding or low sharpness.
Forecast Distribution
Forecast distribution is the histogram or density of predicted probabilities. It is a simple way to visualize sharpness and detect overuse of 50%.
Forecast Drift
Forecast drift is a gradual change in forecast behavior or calibration over time, often due to changing environments or shifting question mixes.
Forecast Error
Forecast error is the difference between your predicted probability and the actual outcome. Scoring rules summarize forecast error across many events.
Forecast Horizon
Forecast horizon is the time between when a probability forecast is made and when the event resolves. Longer horizons are typically harder and should be compared separately.
Forecast Update
A forecast update is a revision to a probability estimate as new information arrives. Good forecasting is iterative and records updates over time.
H
Herding
Herding occurs when forecasters copy the crowd or market consensus instead of reasoning independently. It can reduce crowd wisdom and hide true skill differences.
I
Implied Probability
Implied probability is the probability suggested by a market price or betting odds. In prediction markets, a contract price often approximates the market’s implied chance of the outcome.
L
Last Traded Price
Last traded price is the most recent transaction price. It can be misleading in thin markets because it may be stale or moved by a small trade.
Liquidity
Liquidity is how easily you can trade without moving the price. Low liquidity makes prices noisy and weakens market-consensus benchmarks.
Log Loss
Log loss (cross entropy) measures probabilistic forecast error by penalizing confident wrong predictions very strongly. Lower is better.
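A minimal sketch that also shows probability clipping in action; the epsilon value is an arbitrary illustrative choice:

```python
import math

def log_loss(probs, outcomes, eps=1e-6):
    """Mean negative log-likelihood. Probabilities are clipped into
    [eps, 1 - eps] so a confident miss is penalized heavily but finitely."""
    total = 0.0
    for p, o in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # probability clipping
        total += -(o * math.log(p) + (1 - o) * math.log(1 - p))
    return total / len(probs)

# A 90% hit costs ~0.105; a 90% miss costs ~2.303 and dominates the average:
print(log_loss([0.9, 0.9], [1, 0]))  # ≈ 1.204
```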
Look-ahead Bias
Look-ahead bias is using information that was not available at the time the forecast was made. It can dramatically inflate measured performance.
M
Manipulation
Manipulation is an attempt to move prices or perceived consensus away from fair value. It is easier in thin markets and can distort benchmark comparisons.
Market Close
Market close is the time when trading stops for a question. A close snapshot is often used as a benchmark probability for evaluation.
Market Consensus
Market consensus is a reference probability derived from market prices, such as the mid price, last trade, or a volume-weighted average. It is often used as a benchmark forecast.
Methodology
Methodology describes the rules and settings used to compute scores and diagnostics, such as baselines, bins, clipping, and evaluation checkpoints.
Mid Price
Mid price is the average of the best bid and best ask. It is often a better proxy for consensus than last traded price in thin markets.
Miscalibration
Miscalibration is the gap between predicted probabilities and observed frequencies. It is the core failure mode behind unreliable probabilities.
O
Out-of-sample
Out-of-sample evaluation measures performance on data that was not used to form or tune the forecasts. It helps detect overfitting and ensures results generalize.
Overconfidence
Overconfidence is a systematic tendency to assign probabilities that are too extreme compared to reality. It shows up as high-confidence predictions that do not come true often enough.
Overround
Overround is the sportsbook margin embedded in odds, causing implied probabilities to sum to more than 1. It is also called the vigorish or “vig” in many contexts.
P
Participation Rate
Participation rate is how often a user submits forecasts over time or across eligible questions. It complements score metrics by capturing consistency and engagement.
Price as Probability
Price as probability is the convention of interpreting a prediction market price as the market’s implied chance of the outcome. The approximation is usually close but not exact, especially in thin markets.
Probability Bucket
A probability bucket is a range of predicted probabilities used to group forecasts for calibration analysis, such as 0.60 to 0.70.
Probability Clipping
Probability clipping limits extreme probabilities (near 0 or 1) to avoid infinite penalties in log loss and to reduce the impact of extreme overconfidence.
Proper Scoring Rule
A proper scoring rule rewards probabilistic forecasts in a way that makes reporting your true beliefs the best strategy. It discourages hedging or “gaming” the score.
Q
Question Difficulty
Question difficulty reflects how hard an event is to forecast given available information. It affects score interpretation and fair comparisons across datasets.
R
Regime Change
A regime change is a structural shift in the environment that alters relationships and base rates, making past patterns less predictive.
Reliability
Reliability is another name for calibration. It describes whether forecast probabilities match observed frequencies within probability buckets.
Reliability Diagram
A reliability diagram is another name for a calibration curve: it shows predicted probability versus observed frequency by buckets to assess calibration.
Removing the Vig
Removing the vig normalizes sportsbook implied probabilities so they sum to 1, producing a “fair” probability estimate for comparison and evaluation.
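A sketch using proportional normalization, the simplest of several devigging methods (the odds are illustrative):

```python
def remove_vig(decimal_odds):
    """Convert decimal odds to implied probabilities, then scale them
    to sum to 1, removing the overround."""
    implied = [1 / o for o in decimal_odds]
    overround = sum(implied)  # > 1 whenever the book takes a margin
    return [p / overround for p in implied]

# A two-way market at 1.83 / 1.95 carries roughly a 5.9% overround:
print(remove_vig([1.83, 1.95]))  # ≈ [0.516, 0.484]
```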
Resolution
Resolution is the degree to which forecasts separate cases with different outcome frequencies, distinguishing situations where an event is more likely from those where it is less likely. Higher resolution means your probabilities vary meaningfully across cases. (For the official resolution of a market or question, see Settlement.)
Rolling Window
A rolling window evaluates performance over a moving time period (for example the last 30 days). It helps track improvement and detect drift over time.
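A sketch of a trailing-window Brier score; the record shape (resolution date, probability, outcome) is an assumption for illustration:

```python
from datetime import date, timedelta

def rolling_brier(records, window_days=30, as_of=None):
    """Brier score over forecasts that resolved within the trailing window.
    `records` holds (resolved_on, prob, outcome) tuples."""
    as_of = as_of or date.today()
    cutoff = as_of - timedelta(days=window_days)
    recent = [(p, o) for d, p, o in records if d > cutoff]
    if not recent:
        return None  # nothing resolved in the window
    return sum((p - o) ** 2 for p, o in recent) / len(recent)

records = [(date(2024, 5, 1), 0.7, 1), (date(2024, 5, 20), 0.4, 0)]
print(rolling_brier(records, as_of=date(2024, 5, 25)))  # (0.09 + 0.16) / 2 = 0.125
```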
S
Sample Size
Sample size (N) is the number of scored forecasts. Larger samples make scores more stable and reduce noise in calibration buckets and comparisons.
Scorecard
A scorecard is a shareable report that summarizes forecasting performance, including Brier score, skill score, calibration tables, and breakdowns by category or time.
Selection Bias
Selection bias occurs when the set of questions you score is not representative, such as only forecasting “easy” questions or only counting the ones you feel confident about.
Settlement
Settlement is the official resolution of a market or question, determining whether the outcome is YES or NO and triggering payout or scoring.
Sharpe-like Metric
A Sharpe-like metric is a stability-adjusted performance measure (mean over variability). For forecasts, it can summarize average skill relative to volatility across windows.
Sharpness
Sharpness describes how concentrated your forecasts are away from 50%. Higher sharpness means you often make confident calls, but it is only good if you stay well calibrated.
Squared Error
Squared error is the squared difference between predicted probability and outcome. It is the per-forecast loss used in the Brier score.
T
Thin Market
A thin market has low liquidity and sparse trading. Prices can be stale or moved by small trades, making consensus estimates less reliable.
Time-weighted Scoring
Time-weighted scoring gives different weights to forecasts based on when they were made, for example weighting earlier forecasts more to reward early insight.
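A sketch of one way to do it, weighting each forecast's squared error; the specific weights are illustrative, not a standard:

```python
def time_weighted_brier(probs, outcomes, weights):
    """Brier score with per-forecast weights, normalized by total weight."""
    total_w = sum(weights)
    return sum(w * (p - o) ** 2
               for p, o, w in zip(probs, outcomes, weights)) / total_w

# One question forecast at T-72h, T-24h, and close; earlier calls weigh more:
print(time_weighted_brier([0.60, 0.75, 0.90], [1, 1, 1], [3, 2, 1]))  # ≈ 0.1025
```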
Timestamp Integrity
Timestamp integrity means forecast timestamps are accurate and cannot be manipulated. It is required to prevent look-ahead bias and to evaluate by horizon.
U
Underconfidence
Underconfidence is a systematic tendency to assign probabilities that are too close to 50% compared to reality. It appears when your 60% calls come true far more often than 60% of the time.
V
Vig
Vig (vigorish) is the sportsbook margin built into odds. It makes implied probabilities sum to more than 1 and reduces expected value for bettors.
VWAP
VWAP (volume-weighted average price) is the average traded price weighted by volume over a time window. It smooths noise and can represent consensus when trading is active.
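A minimal sketch of the computation (the trade data is illustrative):

```python
def vwap(trades):
    """Volume-weighted average price over (price, volume) pairs."""
    total_volume = sum(v for _, v in trades)
    return sum(p * v for p, v in trades) / total_volume

# Three trades in the consensus window; the large trade at 0.62 dominates:
print(vwap([(0.60, 100), (0.62, 500), (0.58, 50)]))  # ≈ 0.614
```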