Log Loss vs Brier: Which One Punishes You More and Why
Two proper scoring rules, two different personalities
Both Brier score and log loss reward honest probabilities. Both are proper scoring rules.
The difference is how they punish mistakes, especially extreme mistakes.
Brier score in one line
Brier score is squared error:
BS = (p - o)^2
• p is your predicted probability of YES
• o is outcome (1 for YES, 0 for NO)
It punishes mistakes smoothly. The penalty grows with distance, but not explosively.
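As a minimal sketch (plain Python, with an illustrative forecast rather than anything from a real platform), the formula maps directly to code:

```python
# Brier score for a single binary forecast: squared distance between
# the predicted probability and the 0/1 outcome.
p, outcome = 0.7, 1            # hypothetical forecast: 70% YES, and YES happened
brier = (p - outcome) ** 2
print(brier)                   # ~0.09
```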
Log loss in one line
Log loss penalizes you based on how much probability you assigned to what actually happened.
For a YES outcome:
LL = -log(p)
For a NO outcome:
LL = -log(1 - p)
It punishes extreme wrong calls extremely hard.
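A matching sketch, using the natural log (the log base is a convention choice; some write-ups use base 2 instead):

```python
import math

# Log loss for a single binary forecast: negative log of the probability
# assigned to the outcome that actually happened.
p, outcome = 0.7, 1            # same hypothetical forecast as above
log_loss = -math.log(p) if outcome == 1 else -math.log(1 - p)
print(log_loss)                # ~0.36
```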
Worked examples: the same forecast, two scores
Case 1: you predict 0.90 and YES happens
• Brier: (0.90 - 1)^2 = 0.01
• Log loss: -log(0.90) ≈ 0.11 (natural log), which is small
Both reward you for being confidently right.
Case 2: you predict 0.90 and NO happens
• Brier: (0.90 - 0)^2 = 0.81
• Log loss: -log(0.10) ≈ 2.30, which is very large
This is the key difference. Log loss treats this as a disaster, because you assigned only 10% probability to what actually occurred, and the penalty gets steeper the closer that probability sits to zero.
Case 3: you predict 0.60 and NO happens
• Brier: (0.60 - 0)^2 = 0.36
• Log loss: -log(0.40) ≈ 0.92, which is moderate
Both penalize the mistake, but log loss is not as explosive here because you did not go extreme.
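The three cases can be checked in a few lines (helper functions are repeated here so the snippet stands alone; natural log assumed):

```python
import math

def brier_score(p, outcome):
    return (p - outcome) ** 2

def log_loss(p, outcome):
    return -math.log(p) if outcome == 1 else -math.log(1 - p)

# (predicted probability of YES, actual outcome)
for p, outcome in [(0.90, 1), (0.90, 0), (0.60, 0)]:
    print(f"p={p:.2f}, outcome={outcome}: "
          f"Brier={brier_score(p, outcome):.2f}, "
          f"log loss={log_loss(p, outcome):.2f}")

# p=0.90, outcome=1: Brier=0.01, log loss=0.11
# p=0.90, outcome=0: Brier=0.81, log loss=2.30
# p=0.60, outcome=0: Brier=0.36, log loss=0.92
```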
Intuition: why log loss is harsher
Log loss cares about the probability assigned to the realized outcome. If you claim an outcome is almost impossible and it happens, log loss is designed to crush you.
Brier score still penalizes you heavily, but it is bounded between 0 and 1 for binary events. Log loss is unbounded as p approaches 0 or 1.
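You can see the boundedness difference by scoring increasingly extreme NO forecasts against a YES outcome (a throwaway sketch, not platform code):

```python
import math

# A YES outcome scored against ever more extreme "this won't happen" forecasts.
for p in [0.1, 0.01, 0.001, 0.0001]:
    brier = (p - 1) ** 2          # approaches 1, never exceeds it
    log_loss = -math.log(p)       # grows without bound as p -> 0
    print(f"p={p}: Brier={brier:.4f}, log loss={log_loss:.2f}")

# p=0.1:    Brier=0.8100, log loss=2.30
# p=0.01:   Brier=0.9801, log loss=4.61
# p=0.001:  Brier=0.9980, log loss=6.91
# p=0.0001: Brier=0.9998, log loss=9.21
```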
When Brier score is a good fit
• you want a stable, interpretable metric for scorecards
• you want to pair it with Brier skill score vs a benchmark
• you care about decomposing performance into reliability and resolution
When log loss is a good fit
• you want to strongly discourage extreme probabilities unless you are truly sure
• your system is sensitive to rare tail events and you want to penalize missing them
• you want a metric aligned with probabilistic likelihood methods
Why many platforms use both
A practical pattern is:
• headline score: Brier score plus BSS
• risk control score: log loss as an additional diagnostic
This gives users a stable main metric and a clear warning when they go too extreme.
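A rough sketch of that scorecard pattern (the function name, the sample forecasts, and the constant 0.5 benchmark are all illustrative assumptions; platforms choose their own benchmark):

```python
import math

def scorecard(probs, outcomes, benchmark=0.5):
    """Headline Brier score and BSS vs a constant benchmark, plus mean log loss."""
    n = len(probs)
    brier = sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / n
    brier_ref = sum((benchmark - o) ** 2 for o in outcomes) / n
    bss = 1 - brier / brier_ref
    mean_ll = sum(-math.log(p if o == 1 else 1 - p)
                  for p, o in zip(probs, outcomes)) / n
    return brier, bss, mean_ll

# Four hypothetical forecasts and their outcomes.
print(scorecard([0.8, 0.7, 0.2, 0.4], [1, 1, 0, 0]))
# ≈ (0.08, 0.67, 0.33)
```

Which benchmark the BSS is computed against (base rate, market price, a constant) is itself a methodology choice worth documenting.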
Probability clipping: a common implementation detail
Because log loss explodes at 0 and 1, many implementations apply probability clipping, for example forcing probabilities into a range like 0.01 to 0.99.
Clipping is not cheating, but it is a methodology choice. If you clip, document it.
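A minimal sketch of that idea, using the same 0.01 to 0.99 bounds mentioned above (exact bounds vary by implementation):

```python
import math

def clipped_log_loss(p, outcome, low=0.01, high=0.99):
    # Clamp the forecast before scoring so log loss stays finite.
    p = min(max(p, low), high)
    return -math.log(p) if outcome == 1 else -math.log(1 - p)

print(clipped_log_loss(1.0, 0))   # scored as 0.99 -> -log(0.01) ~ 4.61, not infinity
```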
Common mistakes
Comparing raw numbers across metrics: a Brier score of 0.18 is not directly comparable to a log loss of 0.52. They are on different scales.
Ignoring sample size: with a small sample, both metrics are noisy, and log loss in particular can be dominated by a single extreme miss (see the sketch after this list).
Reading log loss without calibration context: pair it with calibration diagnostics.
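As a small illustration of the sample-size point (the numbers are made up): twenty forecasts where nineteen are confidently right and one is extremely wrong.

```python
import math

# 19 correct calls at p = 0.90 plus one extreme miss: p = 0.999 on a NO outcome.
losses = [-math.log(0.90)] * 19 + [-math.log(1 - 0.999)]

print(f"mean log loss: {sum(losses) / len(losses):.3f}")              # ~0.45
print(f"share from the single miss: {losses[-1] / sum(losses):.0%}")  # ~78%
```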
Takeaway
Brier score is a stable squared error metric. Log loss is harsher and punishes extreme wrong calls much more. Use Brier (and BSS) for scorecards and comparisons, and use log loss as a second metric when you want to discourage reckless certainty.
Related
• Log Loss
• Brier Score vs Brier Skill Score: When to Use Which
• Brier Score Decomposition: Reliability, Resolution, Uncertainty