Average Case Column Subset Selection for Entrywise ℓ_1-Norm Loss
We study the column subset selection problem with respect to the entrywise ℓ_1-norm loss. It is known that in the worst case, to obtain a good rank-k approximation to a matrix, one needs an arbitrarily large n^Ω(1) number of columns to obtain a (1+ϵ)-approximation to the best entrywise ℓ_1-norm low rank approximation of an n × n matrix. Nevertheless, we show that under certain minimal and realistic distributional settings, it is possible to obtain a (1+ϵ)-approximation with a nearly linear running time and poly(k/ϵ)+O(klog n) columns. Namely, we show that if the input matrix A has the form A = B + E, where B is an arbitrary rank-k matrix, and E is a matrix with i.i.d. entries drawn from any distribution μ for which the (1+γ)-th moment exists, for an arbitrarily small constant γ > 0, then it is possible to obtain a (1+ϵ)-approximate column subset selection to the entrywise ℓ_1-norm in nearly linear time. Conversely we show that if the first moment does not exist, then it is not possible to obtain a (1+ϵ)-approximate subset selection algorithm even if one chooses any n^o(1) columns. This is the first algorithm of any kind for achieving a (1+ϵ)-approximation for entrywise ℓ_1-norm loss low rank approximation.
READ FULL TEXT