Online Active Learning: Label Complexity vs. Classification Errors
We study online active learning for classifying streaming instances. At each time step, the decision maker decides whether to query the label of the current instance and, if it does not query, self-labels the instance. The objective is to minimize the number of queries while constraining the number of classification errors over a horizon of length T. We consider a general concept space with a finite VC dimension d and adopt the agnostic setting, where the instance distribution is unknown and labels are noisy, following an unknown conditional distribution. We propose a disagreement-based online learning algorithm and establish its O(d log^2 T) label complexity and Θ(1) (i.e., bounded) classification errors in excess of those of the best classifier in the concept space under the Massart bounded noise condition. This represents the first study of online active learning under a general concept space. The proposed algorithm is shown to outperform extensions of representative offline algorithms developed under the PAC setting, as well as online algorithms specialized for learning homogeneous linear separators.
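To make the query-or-self-label decision concrete, the following is a minimal sketch of a disagreement-based rule, not the algorithm proposed in the paper. It rests on simplifying assumptions that the abstract does not make: labels are noiseless (the realizable case, rather than Massart noise), instances are uniform on [0, 1], and the concept space is a hypothetical finite grid of threshold classifiers. The names `run_stream`, `true_threshold`, and `grid` are illustrative only.

```python
import random


def run_stream(T=2000, true_threshold=0.37, grid=101, seed=0):
    """Simulate a stream of T instances under a disagreement-based query rule.

    Assumptions (illustrative, not from the paper): noiseless labels,
    uniform instances on [0, 1], and a finite class of threshold
    classifiers h_c(x) = 1 if x >= c else 0.
    """
    rng = random.Random(seed)
    # Hypothetical finite concept class: thresholds 0.00, 0.01, ..., 1.00.
    version_space = [i / (grid - 1) for i in range(grid)]
    queries = 0
    errors = 0
    for _ in range(T):
        x = rng.random()
        y_true = int(x >= true_threshold)
        # Labels predicted by the hypotheses still consistent with past queries.
        predictions = {int(x >= c) for c in version_space}
        if len(predictions) > 1:
            # x lies in the disagreement region: query the label and
            # discard every hypothesis that contradicts it.
            queries += 1
            version_space = [c for c in version_space if int(x >= c) == y_true]
        else:
            # All surviving hypotheses agree: self-label without querying.
            y_hat = predictions.pop()
            errors += int(y_hat != y_true)
    return queries, errors, version_space


if __name__ == "__main__":
    queries, errors, version_space = run_stream()
    print("queries:", queries)
    print("self-labeling errors:", errors)
    print("surviving hypotheses:", len(version_space))
```

In this noiseless sketch the true threshold is never eliminated, so every self-assigned label is correct and queries are spent only on instances the surviving hypotheses disagree about. Handling noisy labels, as in the agnostic setting studied here, requires a more careful elimination rule than the exact-consistency filter used above.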