Consider some binary categorization of people (for example, whether or not a person recidivates). For a model
- Group calibration: scores are calibrated across groups and bins, so
- Positive balance: average scores for positive people across groups should be the same,
where
Intuitively, the first notion requires scores to mean what they're intended to mean (probabilities) for each subgroup; this means that different people with the same score should be treated comparably. The second and third similarly enforce fairness of predictions across groups: scores for true positives and true negatives should be comparable across groups.
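For reference, the three notions can be written out as follows. This is a sketch under assumed notation that does not come from the text above: $S \in [0, 1]$ is the model's score, $Y \in \{0, 1\}$ is the true outcome, and $G$ is the group, with $a$ and $b$ standing for any two groups. Negative balance is taken here to be the mirror image of positive balance.

```latex
\begin{align*}
\text{Group calibration:}\quad & \Pr[Y = 1 \mid S = s,\ G = g] = s
    \quad \text{for every group } g \text{ and score } s, \\
\text{Positive balance:}\quad & \mathbb{E}[S \mid Y = 1,\ G = a] = \mathbb{E}[S \mid Y = 1,\ G = b], \\
\text{Negative balance:}\quad & \mathbb{E}[S \mid Y = 0,\ G = a] = \mathbb{E}[S \mid Y = 0,\ G = b].
\end{align*}
```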
Unfortunately, it's generally impossible to satisfy all three properties at once. A proof sketch is as follows.
Let
Assume for contradiction that all fairness definitions hold. Then, for a group
Across two groups
which forces the base rates across the two groups to be equal. This condition is rarely satisfied in the real world. Alternatively, if the lines aren't equal, they intersect at
Thus, these three constraints can't be simultaneously satisfied in most cases. More details are discussed in Kleinberg et al.
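The argument can be checked numerically with a small sketch. The notation is an assumption, not taken from the text above: `x` is the average score of true negatives and `y` the average score of true positives (shared across groups by the two balance conditions), and `p` is a group's base rate. Calibration makes a group's average score equal its base rate, so each group contributes the line `(1 - p) * x + p * y = p`. The hypothetical helper `balance_intersection` solves the two-group system exactly with Cramer's rule.

```python
from fractions import Fraction


def balance_intersection(p1, p2):
    """Intersect the lines (1 - p) * x + p * y = p for base rates p1 and p2.

    x: assumed common average score of true negatives.
    y: assumed common average score of true positives.
    Returns (x, y), or None when the two lines coincide (equal base rates).
    """
    p1, p2 = Fraction(p1), Fraction(p2)

    # Coefficients of a_i * x + b_i * y = c_i for each group.
    a1, b1, c1 = 1 - p1, p1, p1
    a2, b2, c2 = 1 - p2, p2, p2

    det = a1 * b2 - b1 * a2  # simplifies to p2 - p1
    if det == 0:
        return None  # same line: the base rates are equal

    # Cramer's rule for a 2x2 linear system.
    x = (c1 * b2 - b1 * c2) / det
    y = (a1 * c2 - c1 * a2) / det
    return x, y


if __name__ == "__main__":
    print(balance_intersection(Fraction(3, 10), Fraction(1, 2)))
    print(balance_intersection(Fraction(2, 5), Fraction(2, 5)))
```

Under these assumptions, any two distinct base rates give `x = 0` and `y = 1`: every true negative would need a score of 0 and every true positive a score of 1, which is perfect prediction. Equal base rates make the two lines coincide, so the helper returns `None`.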