Overconfidence and Underconfidence: How to Diagnose and Fix Them
Two common calibration failure modes
If you forecast probabilities long enough, you will see one of these patterns:
• overconfidence: your probabilities are more extreme than the outcomes justify; in your high buckets, events resolve YES less often than you predicted
• underconfidence: your probabilities are too timid; in your high buckets, events resolve YES more often than you predicted
Both are forms of miscalibration and both show up clearly in a calibration table.
How to diagnose overconfidence
You are overconfident when the realized frequency is below your predicted probability in the higher buckets.
Example pattern:
• your 0.80 bucket resolves YES only 0.62 of the time
• your 0.70 bucket resolves YES only 0.55 of the time
This means your “80%” behaves more like “62%”.
How to diagnose underconfidence
You are underconfident when the realized frequency is above your predicted probability in the higher buckets.
Example pattern:
• your 0.60 bucket resolves YES 0.75 of the time
• your 0.70 bucket resolves YES 0.82 of the time
This means you are too timid: you should push probabilities farther from 0.50 when the evidence supports it.
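Both patterns can be read straight off a calibration table. Below is a minimal sketch of building such a table from a list of (probability, outcome) pairs; the forecast data and the ten-bucket split are illustrative assumptions, not a prescription.

```python
# A minimal sketch: build a calibration table from (probability, outcome) pairs
# and compare each bucket's average forecast with its realized frequency.
# The forecasts below are illustrative, and the ten-bucket split is one common choice.

def calibration_table(forecasts, n_buckets=10):
    """forecasts: list of (p, outcome) pairs, where outcome is 1 for YES and 0 for NO."""
    buckets = [[] for _ in range(n_buckets)]
    for p, outcome in forecasts:
        i = min(int(p * n_buckets), n_buckets - 1)   # e.g. 0.80 lands in bucket 8
        buckets[i].append((p, outcome))
    rows = []
    for items in buckets:
        if items:
            avg_p = sum(p for p, _ in items) / len(items)
            freq = sum(o for _, o in items) / len(items)
            rows.append((avg_p, freq, len(items)))
    return rows

forecasts = [(0.82, 1), (0.80, 0), (0.78, 1), (0.80, 0), (0.71, 1), (0.68, 0)]
for avg_p, freq, n in calibration_table(forecasts):
    gap = freq - avg_p
    # In buckets above 0.50, a negative gap points toward overconfidence and a
    # positive gap toward underconfidence; below 0.50 the direction flips.
    print(f"avg forecast {avg_p:.2f} -> realized {freq:.2f} (n={n}, gap {gap:+.2f})")
```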
First check: is it real or just noise?
Before you “fix” anything, check:
• bucket counts (sample size)
• whether deviations are consistent across multiple buckets
• whether you are mixing unlike categories or horizons
If the buckets are tiny, your pattern may be random. Use fewer buckets or evaluate over a longer window.
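One rough way to judge whether a bucket's deviation is more than noise is to compare the gap with a couple of binomial standard errors. The sketch below assumes roughly independent outcomes, which is often only approximately true, so treat it as a sanity check rather than a formal test.

```python
import math

def gap_exceeds_noise(avg_p, realized_freq, n, k=2):
    """Rough noise check for one bucket: is the gap larger than k standard errors?
    Assumes independent outcomes; a sanity check, not a formal test."""
    if n == 0:
        return False
    se = math.sqrt(avg_p * (1 - avg_p) / n)   # binomial standard error at the forecast
    return abs(realized_freq - avg_p) > k * se

# A 0.80 bucket resolving at 0.62 is plausibly noise with 10 forecasts,
# but starts to look like real overconfidence with 100.
print(gap_exceeds_noise(0.80, 0.62, 10))    # False
print(gap_exceeds_noise(0.80, 0.62, 100))   # True
```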
The simplest fix: probability mapping
The fastest practical calibration fix is to apply a consistent mapping from your raw probabilities to calibrated ones.
Two common approaches:
1) Shrink toward 0.50
This helps with overconfidence: you compress extremes toward the middle.
Example idea:
• map 0.90 to 0.80
• map 0.80 to 0.70
• map 0.70 to 0.62
Then re-score your forecasts and check whether calibration improves.
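One simple family of shrink maps is linear shrinkage toward 0.50, sketched below. The factor of 0.7 is purely illustrative; in practice you would tune it against your own calibration table and re-score.

```python
def shrink_toward_half(p, factor=0.7):
    """Compress a probability toward 0.50; a factor below 1 reduces overconfident extremes.
    The default factor is illustrative, not a recommendation."""
    return 0.5 + factor * (p - 0.5)

for p in (0.90, 0.80, 0.70):
    print(f"{p:.2f} -> {shrink_toward_half(p):.2f}")   # 0.78, 0.71, 0.64
```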
2) Stretch away from 0.50
This helps with underconfidence: you increase sharpness by moving probabilities farther from the middle.
Example idea:
• map 0.55 to 0.60
• map 0.60 to 0.70
• map 0.70 to 0.80
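One common way to stretch probabilities is to scale them in log-odds space, sometimes called extremizing. The sketch below uses an exponent of 1.5, which is illustrative; fit it to your own history and re-score to confirm the gain.

```python
import math

def extremize(p, k=1.5):
    """Push a probability away from 0.50 by scaling its log-odds by k > 1.
    k = 1.5 is illustrative; fit it to your own data."""
    p = min(max(p, 1e-6), 1 - 1e-6)              # keep the log-odds finite
    logit = math.log(p / (1 - p))
    return 1 / (1 + math.exp(-k * logit))

for p in (0.55, 0.60, 0.70):
    print(f"{p:.2f} -> {extremize(p):.2f}")      # roughly 0.57, 0.65, 0.78
```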
Bucket-based mapping: a practical method
You can build a mapping directly from your calibration table:
• take each bucket
• map its average predicted probability to its realized frequency
Example:
• your 0.78 average bucket resolves at 0.63
• so your map sends 0.78 to 0.63
This is a simple, data-driven correction.
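A minimal sketch of that idea: interpolate between the bucket points of a hypothetical calibration table. The table values below are made up for illustration; with real data you may also want to enforce monotonicity (isotonic regression is the standard tool for that), but interpolation captures the core idea.

```python
# Hypothetical calibration table: (average predicted probability, realized frequency)
# per bucket. The numbers are illustrative, not real data.
table = [(0.22, 0.30), (0.45, 0.48), (0.61, 0.55), (0.78, 0.63)]

def bucket_map(p, table):
    """Map a raw probability to a calibrated one by linear interpolation
    between bucket points, clamping outside the observed range."""
    pts = sorted(table)
    if p <= pts[0][0]:
        return pts[0][1]
    if p >= pts[-1][0]:
        return pts[-1][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= p <= x1:
            t = (p - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)

print(bucket_map(0.78, table))   # 0.63, the realized frequency of that bucket
print(bucket_map(0.70, table))   # interpolated between the 0.61 and 0.78 buckets
```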
Common causes and what to do
Cause: base rate neglect
If you regularly ignore the base rate, you will tend to become overconfident. Fix by anchoring on base rates first and moving away only with evidence.
Cause: mixing horizons
Forecasts made close to resolution are easier than early ones. If you mix early and late forecasts in the same buckets, you can create spurious patterns. Fix with evaluation checkpoints or horizon splits, as sketched below.
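A sketch of a horizon split, assuming each forecast record carries its days-to-resolution; the 30-day cut and the records themselves are illustrative.

```python
from collections import defaultdict

# Each record: (days_to_resolution, probability, outcome). Values are illustrative.
records = [(90, 0.70, 1), (60, 0.60, 0), (45, 0.65, 1), (5, 0.90, 1), (2, 0.95, 1)]

groups = defaultdict(list)
for days, p, outcome in records:
    horizon = "early" if days > 30 else "late"   # the 30-day cut is an arbitrary example
    groups[horizon].append((p, outcome))

for horizon, items in groups.items():
    avg_p = sum(p for p, _ in items) / len(items)
    freq = sum(o for _, o in items) / len(items)
    print(f"{horizon}: avg forecast {avg_p:.2f}, realized {freq:.2f}, n={len(items)}")
```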
Cause: herding then reversing
If you follow market consensus and then swing away emotionally, you can produce unstable calibration. Fix by setting update rules and keeping an audit trail.
How to know your fix worked
Look for:
• a calibration curve that sits closer to the diagonal
• more stable buckets across time windows
• an improved headline Brier score and Brier skill score
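A minimal sketch of those two headline scores, assuming the same (probability, outcome) format as above. The reference forecast for the skill score here is the in-sample base rate, which is one common choice but not the only one.

```python
def brier(forecasts):
    """Mean squared error between probabilities and 0/1 outcomes; lower is better."""
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)

def brier_skill_score(forecasts):
    """Improvement over always forecasting the base rate; higher is better, 1 is perfect.
    Uses the in-sample base rate as the reference, one common but not the only choice."""
    base_rate = sum(o for _, o in forecasts) / len(forecasts)
    reference = brier([(base_rate, o) for _, o in forecasts])
    return 1 - brier(forecasts) / reference

# Illustrative comparison: score the raw forecasts, then score them again
# after applying your calibration map to see whether the fix helped.
raw = [(0.90, 1), (0.80, 0), (0.80, 1), (0.30, 0)]
print(brier(raw), brier_skill_score(raw))
```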
Takeaway
Overconfidence and underconfidence are calibration problems you can diagnose directly from your calibration buckets. Start by verifying sample size. Then apply a simple probability mapping: shrink toward 0.50 for overconfidence, stretch away from it for underconfidence, and re-score to validate the improvement.