Revisiting Distributionally Robust Supervised Learning in Classification
Distributionally Robust Supervised Learning (DRSL) is necessary for building reliable machine learning systems. When machine learning is deployed in the real world, its performance can degrade significantly because the test data may follow a distribution different from that of the training data. Previous DRSL minimizes the loss for the worst-case test distribution. However, our theoretical analyses show that the previous DRSL essentially reduces to ordinary empirical risk minimization in the classification setting. This implies that, although it is designed to be robust to distribution shift, the previous DRSL ends up learning classifiers fitted exactly to the given training distribution. To learn practically useful robust classifiers, our theoretical analyses motivate us to structurally constrain the distribution shift considered by DRSL. To this end, we propose a novel DRSL that can incorporate structural assumptions on distribution shift and learn useful robust decision boundaries based on them. We derive efficient gradient-based optimization algorithms and establish the convergence rate of the model parameters as well as the order of the estimation error for our DRSL. The effectiveness of our DRSL is demonstrated through experiments.
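To make the "worst-case test distribution" idea concrete, the sketch below illustrates the generic adversarial-reweighting form of DRSL that the abstract refers to, not the authors' structurally constrained method. It assumes (as an illustrative choice, not stated in the abstract) a KL-divergence ball around the empirical distribution, for which the worst-case example weights take an exponential-tilting form, and trains a linear logistic classifier by gradient descent on the reweighted loss.

```python
# Minimal sketch of adversarially reweighted empirical risk (generic DRSL).
# Assumption: KL-ball uncertainty set in penalized form, so the worst-case
# weights are w_i ∝ exp(loss_i / temperature). Names below are illustrative.

import numpy as np

def logistic_loss(w, X, y):
    # Per-example logistic loss for labels y in {-1, +1}.
    margins = y * (X @ w)
    return np.log1p(np.exp(-margins))

def worst_case_weights(losses, temperature=1.0):
    # Adversarial reweighting: up-weight high-loss examples.
    z = (losses - losses.max()) / temperature   # stabilize the exponent
    w = np.exp(z)
    return w / w.sum()

def drsl_gradient_step(w, X, y, lr=0.1, temperature=1.0):
    losses = logistic_loss(w, X, y)
    weights = worst_case_weights(losses, temperature)
    # Gradient of the reweighted empirical risk w.r.t. the model parameters.
    margins = y * (X @ w)
    sigma = 1.0 / (1.0 + np.exp(margins))        # -d(loss)/d(margin)
    grad = -(X.T @ (weights * sigma * y))
    return w - lr * grad

# Toy usage on synthetic, roughly linearly separable data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=200))
w = np.zeros(2)
for _ in range(500):
    w = drsl_gradient_step(w, X, y, lr=0.5, temperature=0.5)
print("learned weights:", w)
```

As the paper's analysis suggests, this unconstrained reweighting concentrates weight on the hardest training examples and, in classification, behaves much like ordinary empirical risk minimization; the proposed method instead restricts the adversary to structurally plausible shifts.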