Provable Approximations for Constrained ℓ_p Regression
The ℓ_p linear regression problem is to minimize f(x)=||Ax-b||_p over x∈R^d, where A∈R^n× d, b∈R^n, and p>0. To avoid overfitting and bound ||x||_2, the constrained ℓ_p regression minimizes f(x) over every unit vector x∈R^d. This makes the problem non-convex even for the simplest case d=p=2. Instead, ridge regression is used to minimize the Lagrange form f(x)+λ ||x||_2 over x∈R^d, which yields a convex problem in the price of calibrating the regularization parameter λ>0. We provide the first provable constant factor approximation algorithm that solves the constrained ℓ_p regression directly, for every constant p,d≥ 1. Using core-sets, its running time is O(n n) including extensions for streaming and distributed (big) data. In polynomial time, it can handle outliers, p∈ (0,1) and minimize f(x) over every x and permutation of rows in A. Experimental results are also provided, including open source and comparison to existing software.
READ FULL TEXT