Robust Online Classification: From Estimation to Denoising
We study online classification in the presence of noisy labels. The noise mechanism is modeled by a general kernel that specifies, for any feature-label pair, a (known) set of distributions over noisy labels. At each time step, an adversary selects an unknown distribution from the set specified by the kernel for the actual feature-label pair, and generates the noisy label from the selected distribution. The learner then makes a prediction based on the actual features and the noisy labels observed thus far, and incurs loss 1 if the prediction differs from the underlying true label (and 0 otherwise). Prediction quality is quantified through the minimax risk, which measures the cumulative loss over a finite horizon T. We show that for a wide range of natural noise kernels, adversarially selected features, and a finite class of labeling functions, the minimax risk can be upper bounded independently of the time horizon and logarithmically in the size of the labeling function class. We then extend these results to infinite classes and stochastically generated features via the concept of stochastic sequential covering. Our results substantially generalize and encompass the findings of Ben-David et al. (2009), and provide intuitive understanding through a novel reduction to online conditional distribution estimation.
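To make the interaction protocol concrete, here is a minimal Python sketch under several stated assumptions: binary labels, a hypothetical finite class of threshold functions on {0, ..., n-1}, and a bounded-flip noise kernel in which the adversary may choose any flip probability in [0, eta_max] (in this sketch it always picks eta_max). Features are drawn at random rather than adversarially. The learner is plain empirical risk minimization on the noisy labels, a simple stand-in for illustration only, not the estimation-to-denoising method of the paper. Loss is counted against the true label, as in the protocol above.

```python
import random
from typing import Callable, List, Sequence

Hypothesis = Callable[[int], int]


def make_threshold_class(n: int) -> List[Hypothesis]:
    # Hypothetical finite class: h_k(x) = 1 if x >= k, for k = 0..n.
    return [lambda x, k=k: int(x >= k) for k in range(n + 1)]


def noisy_label(y: int, eta_max: float, rng: random.Random) -> int:
    # Assumed noise kernel: the label is flipped with probability eta for some
    # eta in [0, eta_max]; this adversary simply always chooses eta = eta_max.
    eta = eta_max
    return 1 - y if rng.random() < eta else y


def run_protocol(
    hclass: List[Hypothesis],
    target: Hypothesis,
    features: Sequence[int],
    eta_max: float,
    seed: int = 0,
) -> int:
    """Return the cumulative 0-1 loss (w.r.t. true labels) over the horizon."""
    rng = random.Random(seed)
    noisy_errors = [0] * len(hclass)  # disagreements with noisy labels so far
    loss = 0
    for x in features:
        # Learner (simple ERM stand-in): predict with the hypothesis that best
        # fits the noisy labels observed so far; ties go to the lowest index.
        best = min(range(len(hclass)), key=lambda i: noisy_errors[i])
        y_hat = hclass[best](x)
        y = target(x)               # true label, never shown to the learner
        loss += int(y_hat != y)     # loss is measured against the true label
        y_tilde = noisy_label(y, eta_max, rng)  # learner only sees this
        for i, h in enumerate(hclass):
            noisy_errors[i] += int(h(x) != y_tilde)
    return loss


if __name__ == "__main__":
    H = make_threshold_class(10)
    data_rng = random.Random(1)
    xs = [data_rng.randrange(10) for _ in range(500)]
    print("noiseless loss:", run_protocol(H, H[7], xs, eta_max=0.0))
    print("noisy loss:", run_protocol(H, H[7], xs, eta_max=0.2))
```

With eta_max = 0 the run is noiseless, and this ERM learner errs at most |H| - 1 times: the target always has zero disagreements, so the chosen hypothesis has zero too, and each mistake permanently disqualifies it. With eta_max > 0 that argument no longer holds, which is where noise-aware learners such as those studied in the paper become relevant.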