Hardness of Learning Halfspaces with Massart Noise
We study the complexity of PAC learning halfspaces in the presence of Massart (bounded) noise. Specifically, we are given labeled examples (x, y) from a distribution D on ℝ^n × {±1} whose marginal distribution on x is arbitrary and whose labels are generated by an unknown halfspace and then corrupted with Massart noise at rate η < 1/2, and the goal is to compute a hypothesis with small misclassification error. Characterizing the efficient learnability of halfspaces in the Massart model has remained a longstanding open problem in learning theory. Recent work gave a polynomial-time learning algorithm for this problem with error η + ϵ. This error upper bound can be far from the information-theoretically optimal bound of OPT + ϵ. More recent work showed that exact learning, i.e., achieving error OPT + ϵ, is hard in the Statistical Query (SQ) model. In this work, we show that there is an exponential gap between the information-theoretically optimal error and the best error achievable by a polynomial-time SQ algorithm. In particular, our lower bound implies that no efficient SQ algorithm can approximate the optimal error within any polynomial factor.
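For concreteness, one standard formalization of the Massart condition (using w* for the unknown target halfspace, η(x) for the pointwise flip rate, and D_x for the marginal on x; this notation is not fixed in the abstract itself) is:

\[
\Pr\big[\, y \neq \mathrm{sign}(\langle w^{*}, x \rangle) \mid x \,\big] \;=\; \eta(x) \;\le\; \eta \;<\; \tfrac{1}{2},
\qquad
\mathrm{OPT} \;=\; \mathbb{E}_{x \sim D_x}\!\left[\eta(x)\right].
\]

Since η(x) < 1/2 pointwise, the target halfspace is itself the Bayes-optimal classifier, so OPT is exactly its misclassification error; this is why OPT + ϵ is the information-theoretically optimal guarantee against which the η + ϵ bound is compared.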