Group calibration is a byproduct of unconstrained learning
Much recent work on fairness in machine learning has focused on how well a score function is calibrated in different groups within a given population, where each group is defined by restricting one or more sensitive attributes. We investigate to what extent group calibration follows from unconstrained empirical risk minimization on its own, without the need for any explicit intervention. We show that under reasonable conditions, the deviation from satisfying group calibration is bounded by the excess loss of the empirical risk minimizer relative to the Bayes optimal score function. As a corollary, empirical risk minimization can simultaneously achieve calibration for many groups, a task that prior work deferred to highly complex algorithms. We complement our results with a lower bound and a range of experimental findings. Our results challenge the view that group calibration necessitates an active intervention, suggesting that often we ought to think of it as a byproduct of unconstrained machine learning.
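To make the quantity under discussion concrete, the following minimal sketch shows one common way to estimate how far a score function is from being calibrated within each group. It is an illustration under stated assumptions, not the measure used in the paper: the helper name `group_calibration_gaps`, the equal-width binning of scores in [0, 1], and the toy data are all hypothetical choices made for this example.

```python
from collections import defaultdict


def group_calibration_gaps(scores, labels, groups, n_bins=10):
    """Estimate a per-group calibration gap (hypothetical helper).

    For each group, scores are bucketed into equal-width bins on [0, 1].
    The gap is the bin-size-weighted average of
    |mean label in bin - mean score in bin|.
    """
    # cells[group][bin] = [count, sum of scores, sum of labels]
    cells = defaultdict(lambda: defaultdict(lambda: [0, 0.0, 0.0]))
    for s, y, g in zip(scores, labels, groups):
        b = min(int(s * n_bins), n_bins - 1)  # put s == 1.0 in the last bin
        cell = cells[g][b]
        cell[0] += 1
        cell[1] += s
        cell[2] += y

    gaps = {}
    for g, bins in cells.items():
        total = sum(c[0] for c in bins.values())
        # Weighted mean of |mean_y - mean_s| equals sum |sum_y - sum_s| / total.
        gaps[g] = sum(abs(c[2] - c[1]) for c in bins.values()) / total
    return gaps


# Toy data (assumed for illustration): group "a" is perfectly calibrated,
# group "b" receives scores that are too low for its observed outcomes.
scores = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
labels = [0, 0, 0, 1, 1, 1, 1, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gaps = group_calibration_gaps(scores, labels, groups)
print(gaps)
```

In an experiment along the lines the abstract describes, one would compute such gaps for the scores of an unconstrained empirical risk minimizer and compare them against its excess loss; the binning resolution is a free parameter of this sketch and affects the estimate.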