Glossary of Terms

Key terms and definitions for prediction markets, forecasting, and probability.

A

Audit Trail

An audit trail is a tamper-resistant record of forecasts, timestamps, and edits. It supports trust and prevents cherry-picking.

B

Backtest

A backtest evaluates a forecasting method on historical data. Good backtests avoid look-ahead bias and report performance out of sample.

Base Rate

Base rate is the underlying frequency with which an event occurs in a reference set. It is a natural baseline forecast and is often used to compute the Brier skill score.

Base Rate Shift

Base rate shift is a change in how often an event occurs over time. It can break old baselines and cause calibration to drift.

Baseline Forecast

A baseline forecast is a simple reference model used for comparison, such as 50/50, the base rate, or market consensus. Its score serves as the denominator in skill scoring.

Benchmark

A benchmark is a reference forecast used for comparison, such as 50/50, the base rate, or market consensus. It is required to compute the Brier skill score.

Binary Event

A binary event has exactly two outcomes, usually yes or no. Brier score and log loss are commonly used to score probabilistic forecasts for binary events.

Brier Score

Brier score measures the accuracy of probabilistic forecasts for binary events by averaging the squared error between predicted probability and the actual outcome. Lower is better.
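
A minimal sketch in Python (the probabilities and outcomes below are hypothetical):

```python
# Brier score: mean squared error between predicted probabilities
# and binary outcomes (1 = YES, 0 = NO). Lower is better.
def brier_score(probs, outcomes):
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

print(brier_score([0.9, 0.7, 0.2, 0.5], [1, 1, 0, 1]))  # 0.0975
```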

Brier Skill Score

Brier skill score (BSS) measures how much better (or worse) your Brier score is versus a baseline forecast. Higher is better: 1 is perfect, 0 matches the baseline, and negative is worse than baseline.
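
A sketch of the computation, assuming you already have your Brier score and a baseline’s (both numbers are made up):

```python
# Brier skill score: 1 is perfect, 0 matches the baseline,
# negative is worse than the baseline.
def brier_skill_score(bs_forecast, bs_baseline):
    return 1.0 - bs_forecast / bs_baseline

print(brier_skill_score(0.18, 0.24))  # 0.25: 25% better than the baseline
```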

C

Calibration

Calibration describes whether predicted probabilities match observed frequencies. A well-calibrated forecaster’s 70% predictions come true about 70% of the time.

Calibration Curve

A calibration curve plots predicted probabilities against observed frequencies across probability buckets. It shows whether probabilities mean what they claim.

Calibration Drift

Calibration drift is a change in calibration over time, where probability buckets no longer match observed frequencies. It often follows base-rate shifts or regime changes.

Calibration Table

A calibration table summarizes bucket counts, average predicted probability, and realized frequency. It is the numeric backbone of calibration curves.
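
One way such a table could be built, sketched with hypothetical forecasts and ten equal-width buckets:

```python
# Calibration table: per bucket, report count, mean predicted
# probability, and realized YES frequency.
def calibration_table(probs, outcomes, n_buckets=10):
    rows = []
    for b in range(n_buckets):
        lo, hi = b / n_buckets, (b + 1) / n_buckets
        bucket = [(p, o) for p, o in zip(probs, outcomes)
                  if lo <= p < hi or (b == n_buckets - 1 and p == 1.0)]
        if not bucket:
            continue
        n = len(bucket)
        avg_pred = sum(p for p, _ in bucket) / n
        freq = sum(o for _, o in bucket) / n
        rows.append((f"{lo:.1f}-{hi:.1f}", n, round(avg_pred, 3), round(freq, 3)))
    return rows

probs = [0.65, 0.68, 0.62, 0.15, 0.12, 0.90]
outcomes = [1, 1, 0, 0, 0, 1]
for row in calibration_table(probs, outcomes):
    print(row)
```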

Class Imbalance

Class imbalance means outcomes are heavily skewed toward one side (mostly yes or mostly no). It affects baselines, calibration, and score interpretation.

Climatology

Climatology is a baseline forecast that predicts the historical base rate for every event. It is a standard benchmark for Brier skill score.

Confidence Interval

A confidence interval is a range that reflects uncertainty in an estimated metric, such as the realized frequency in a calibration bucket. Wider intervals usually mean smaller samples.
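
One common choice is the Wilson score interval; a sketch for a hypothetical bucket in which 14 of 20 forecasts resolved YES:

```python
import math

# Wilson score interval for a realized frequency (z = 1.96 for ~95%).
def wilson_interval(successes, n, z=1.96):
    phat = successes / n
    denom = 1 + z ** 2 / n
    center = (phat + z ** 2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(phat * (1 - phat) / n + z ** 2 / (4 * n ** 2))
    return center - half, center + half

print(wilson_interval(14, 20))  # roughly (0.48, 0.85)
```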

Consensus Snapshot

A consensus snapshot is the benchmark probability taken at a specific time, such as the mid price at market close. It is used for consistent comparisons.

Consensus Window

A consensus window is the time period used to compute a market-consensus probability, such as VWAP over the last 6 hours or the mid price snapshot at close.

Coverage

Coverage is the share of eligible questions you actually forecast. Low coverage can hide selection bias and makes performance comparisons less reliable.

Coverage Bias

Coverage bias is the distortion that occurs when evaluation reflects only the subset of questions a forecaster chose to answer, rather than the full eligible set.

Crowd Wisdom

Crowd wisdom is the idea that aggregated forecasts from many independent participants can be more accurate than most individuals, especially when incentives and information diversity are strong.

D

Data Leakage

Data leakage occurs when training or evaluation accidentally uses information that would not be available at prediction time. It produces unrealistically optimistic performance.

Decision Threshold

A decision threshold is a cutoff probability used to convert probabilities into yes/no decisions, such as acting only when p is above 0.60.

Decomposition

Brier score decomposition breaks overall error into components that reflect calibration (reliability), discrimination (resolution), and the inherent uncertainty of the task.
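
A sketch of this decomposition (often called the Murphy decomposition) over probability buckets; the identity Brier = reliability - resolution + uncertainty is exact when each forecast is replaced by its bucket mean, and the data below is hypothetical:

```python
# Murphy decomposition of the Brier score over probability buckets.
def murphy_decomposition(probs, outcomes, n_buckets=10):
    n = len(probs)
    obar = sum(outcomes) / n
    reliability = resolution = 0.0
    for b in range(n_buckets):
        lo, hi = b / n_buckets, (b + 1) / n_buckets
        bucket = [(p, o) for p, o in zip(probs, outcomes)
                  if lo <= p < hi or (b == n_buckets - 1 and p == 1.0)]
        if not bucket:
            continue
        nk = len(bucket)
        pbar = sum(p for p, _ in bucket) / nk      # mean forecast in bucket
        ok = sum(o for _, o in bucket) / nk        # realized frequency in bucket
        reliability += nk * (pbar - ok) ** 2 / n   # calibration term
        resolution += nk * (ok - obar) ** 2 / n    # discrimination term
    uncertainty = obar * (1 - obar)                # task difficulty term
    return reliability, resolution, uncertainty

rel, res, unc = murphy_decomposition([0.8, 0.7, 0.3, 0.2], [1, 1, 0, 0])
print(rel, res, unc)  # Brier ≈ rel - res + unc
```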

E

Edge

Edge is the difference between your estimated probability and the market’s implied probability. Positive edge suggests expected value if the price is fair and execution is possible.

Evaluation Checkpoint

An evaluation checkpoint is a fixed time at which forecasts are scored (for example T-24h or market close). It prevents look-ahead bias and makes comparisons fair.

Event Prevalence

Event prevalence is the proportion of events that resolve YES in a dataset. It is essentially the base rate of outcome = 1 and drives uncertainty and baseline scores.

Expected Value

Expected value (EV) is the average payoff you would expect from a decision under your probability estimate. It connects forecasts to action.
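
A sketch for a binary contract that pays 1 on YES, ignoring fees and slippage (the price and probability are hypothetical):

```python
# EV of buying a YES contract at `price` given your probability `p`.
def expected_value(p, price):
    return p * (1 - price) + (1 - p) * (0 - price)  # simplifies to p - price

print(expected_value(0.65, 0.55))  # 0.10 per contract
```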

Extreme Probabilities

Extreme probabilities are forecasts near 0% or 100%. They can be valuable when justified, but they create large penalties when wrong.

F

Forecast Dispersion

Forecast dispersion measures how spread out probabilities are across questions or forecasters. Low dispersion can indicate herding or low sharpness.

Forecast Distribution

Forecast distribution is the histogram or density of predicted probabilities. It is a simple way to visualize sharpness and detect overuse of 50%.

Forecast Drift

Forecast drift is a gradual change in forecast behavior or calibration over time, often due to changing environments or shifting question mixes.

Forecast Error

Forecast error is the difference between your predicted probability and the actual outcome. Scoring rules summarize forecast error across many events.

Forecast Horizon

Forecast horizon is the time between when a probability forecast is made and when the event resolves. Longer horizons are typically harder and should be compared separately.

Forecast Update

A forecast update is a revision to a probability estimate as new information arrives. Good forecasting is iterative and records updates over time.

H

Herding

Herding occurs when forecasters copy the crowd or market consensus instead of reasoning independently. It can reduce crowd wisdom and hide true skill differences.

I

Implied Probability

Implied probability is the probability suggested by a market price or betting odds. In prediction markets, a contract price often approximates the market’s implied chance of the outcome.

L

Last Traded Price

Last traded price is the most recent transaction price. It can be misleading in thin markets because it may be stale or moved by a small trade.

Liquidity

Liquidity is how easily you can trade without moving the price. Low liquidity makes prices noisy and weakens market-consensus benchmarks.

Log Loss

Log loss (cross-entropy) measures probabilistic forecast error by penalizing confident wrong predictions very strongly. Lower is better.
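
A sketch with clipping applied (see Probability Clipping); the epsilon here is an arbitrary illustrative value:

```python
import math

# Log loss with probability clipping to avoid log(0).
def log_loss(probs, outcomes, eps=1e-6):
    total = 0.0
    for p, o in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # clip extreme probabilities
        total += -(o * math.log(p) + (1 - o) * math.log(1 - p))
    return total / len(probs)

print(log_loss([0.9, 0.7, 0.2], [1, 1, 0]))  # ≈ 0.2284
```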

Look-ahead Bias

Look-ahead bias is using information that was not available at the time the forecast was made. It can dramatically inflate measured performance.

M

Manipulation

Manipulation is an attempt to move prices or perceived consensus away from fair value. It is easier in thin markets and can distort benchmark comparisons.

Market Close

Market close is the time when trading stops for a question. A close snapshot is often used as a benchmark probability for evaluation.

Market Consensus

Market consensus is a reference probability derived from market prices, such as the mid price, last trade, or a volume-weighted average. It is often used as a benchmark forecast.

Methodology

Methodology describes the rules and settings used to compute scores and diagnostics, such as baselines, bins, clipping, and evaluation checkpoints.

Mid Price

Mid price is the average of the best bid and best ask. It is often a better proxy for consensus than last traded price in thin markets.

Miscalibration

Miscalibration is the gap between predicted probabilities and observed frequencies. It is the core failure mode behind unreliable probabilities.

O

Out-of-sample

Out-of-sample evaluation measures performance on data that was not used to form or tune the forecasts. It helps detect overfitting and ensures results generalize.

Overconfidence

Overconfidence is a systematic tendency to assign probabilities that are too extreme compared to reality. It shows up as high-confidence predictions that do not come true often enough.

Overround

Overround is the sportsbook margin embedded in odds, causing implied probabilities to sum to more than 1. It is also called the vigorish or “vig” in many contexts.

P

Participation Rate

Participation rate is how often a user submits forecasts over time or across eligible questions. It complements score metrics by capturing consistency and engagement.

Price as Probability

Price as probability is the convention of interpreting a prediction market price as the market’s implied chance of the outcome. The correspondence is approximate rather than exact.

Probability Bucket

A probability bucket is a range of predicted probabilities used to group forecasts for calibration analysis, such as 0.60 to 0.70.

Probability Clipping

Probability clipping limits extreme probabilities (near 0 or 1) to avoid infinite penalties in log loss and to reduce the impact of extreme overconfidence.

Proper Scoring Rule

A proper scoring rule rewards probabilistic forecasts in a way that makes reporting your true beliefs the best strategy. It discourages hedging or “gaming” the score.

Q

Question Difficulty

Question difficulty reflects how hard an event is to forecast given available information. It affects score interpretation and fair comparisons across datasets.

R

Regime Change

A regime change is a structural shift in the environment that alters relationships and base rates, making past patterns less predictive.

Reliability

Reliability is another name for calibration. It describes whether forecast probabilities match observed frequencies within probability buckets.

Reliability Diagram

A reliability diagram is another name for a calibration curve: it shows predicted probability versus observed frequency by buckets to assess calibration.

Removing the Vig

Removing the vig normalizes sportsbook implied probabilities so they sum to 1, producing a “fair” probability estimate for comparison and evaluation.
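
A sketch of the simple proportional method (other de-vig approaches exist); the decimal odds are hypothetical:

```python
# De-vig: convert decimal odds to implied probabilities, then
# normalize so they sum to 1.
def remove_vig(decimal_odds):
    implied = [1 / o for o in decimal_odds]
    overround = sum(implied)            # > 1 when a margin is present
    return [p / overround for p in implied]

print(remove_vig([1.83, 2.05]))         # ≈ [0.528, 0.472]
```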

Resolution

Resolution is the degree to which forecasts separate cases with different outcome frequencies, assigning higher probabilities where an event is more likely and lower probabilities where it is less likely. Higher resolution means your probabilities vary meaningfully across situations.

Rolling Window

A rolling window evaluates performance over a moving time period (for example the last 30 days). It helps track improvement and detect drift over time.
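
A sketch that scores only forecasts resolving inside the window; the record fields and dates are hypothetical:

```python
from datetime import datetime, timedelta

# Brier score over a rolling window ending at `end`.
def rolling_brier(records, end, days=30):
    start = end - timedelta(days=days)
    window = [r for r in records if start <= r["resolved_at"] <= end]
    if not window:
        return None
    return sum((r["prob"] - r["outcome"]) ** 2 for r in window) / len(window)

records = [
    {"prob": 0.8, "outcome": 1, "resolved_at": datetime(2024, 5, 20)},
    {"prob": 0.3, "outcome": 0, "resolved_at": datetime(2024, 6, 2)},
]
print(rolling_brier(records, end=datetime(2024, 6, 10)))  # 0.065
```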

S

Sample Size

Sample size (N) is the number of scored forecasts. Larger samples make scores more stable and reduce noise in calibration buckets and comparisons.

Scorecard

A scorecard is a shareable report that summarizes forecasting performance, including Brier score, skill score, calibration tables, and breakdowns by category or time.

Selection Bias

Selection bias occurs when the set of questions you score is not representative, such as only forecasting “easy” questions or only counting the ones you feel confident about.

Settlement

Settlement is the official resolution of a market or question, determining whether the outcome is YES or NO and triggering payout or scoring.

Sharpe-like Metric

A Sharpe-like metric is a stability-adjusted performance measure (mean over variability). For forecasts, it can summarize average skill relative to volatility across windows.
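
A sketch, assuming one skill score per rolling window (the values are made up):

```python
import statistics

# Mean skill divided by its standard deviation across windows.
def sharpe_like(window_scores):
    if len(window_scores) < 2:
        return None
    sd = statistics.stdev(window_scores)
    return statistics.mean(window_scores) / sd if sd > 0 else None

bss_by_window = [0.12, 0.18, 0.05, 0.15]  # hypothetical BSS per window
print(round(sharpe_like(bss_by_window), 2))  # ≈ 2.25
```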

Sharpness

Sharpness describes how concentrated your forecasts are away from 50%. Higher sharpness means you often make confident calls, but it is only good if you stay well calibrated.

Squared Error

Squared error is the squared difference between predicted probability and outcome. It is the per-forecast loss used in the Brier score.

T

Thin Market

A thin market has low liquidity and sparse trading. Prices can be stale or moved by small trades, making consensus estimates less reliable.

Time-weighted Scoring

Time-weighted scoring gives different weights to forecasts based on when they were made, for example weighting earlier forecasts more to reward early insight.
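
A sketch using a linear early-bird weighting, which is an arbitrary illustrative choice:

```python
# Time-weighted Brier score: earlier forecasts get larger weights.
def time_weighted_brier(forecasts):
    # forecasts: (prob, outcome, hours_before_resolution) tuples
    num = den = 0.0
    for p, o, hours in forecasts:
        w = hours  # linear weight; any monotone scheme could be used
        num += w * (p - o) ** 2
        den += w
    return num / den if den else None

print(time_weighted_brier([(0.7, 1, 48), (0.9, 1, 2)]))  # 0.0868
```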

Timestamp Integrity

Timestamp integrity means forecast timestamps are accurate and cannot be manipulated. It is required to prevent look-ahead bias and to evaluate by horizon.

U

Underconfidence

Underconfidence is a systematic tendency to assign probabilities that are too close to 50% compared to reality. It shows up when, for example, your 60% calls come true far more often than 60% of the time.

V

Vig

Vig (vigorish) is the sportsbook margin built into odds. It makes implied probabilities sum to more than 1 and reduces expected value for bettors.

VWAP

VWAP (volume-weighted average price) is the average traded price weighted by volume over a time window. It smooths noise and can represent consensus when trading is active.
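
A sketch with hypothetical trades inside the chosen window:

```python
# VWAP: average traded price weighted by volume.
def vwap(trades):
    # trades: (price, volume) pairs within the window
    total_volume = sum(v for _, v in trades)
    if total_volume == 0:
        return None
    return sum(p * v for p, v in trades) / total_volume

print(vwap([(0.62, 100), (0.60, 300), (0.65, 50)]))  # ≈ 0.610
```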