Smooth minimization of nonsmooth functions with parallel coordinate descent methods
We study the performance of a family of randomized parallel coordinate descent methods for minimizing the sum of a nonsmooth convex function and a separable convex function. The problem class includes as special cases L1-regularized L1 regression and the minimization of the exponential loss (the "AdaBoost problem"). We assume the input data defining the loss function is contained in a sparse m × n matrix A with at most ω nonzeros in each row. Our methods need O(n β/τ) iterations to find an approximate solution with high probability, where τ is the number of processors and β = 1 + (ω-1)(τ-1)/(n-1) for the fastest variant. The O(·) notation hides dependence on quantities such as the required accuracy, the confidence level, and the distance of the starting iterate from an optimal point. Since β/τ is a decreasing function of τ, the method needs fewer iterations when more processors are used. Certain variants of our algorithms perform on average only O(nnz(A)/n) arithmetic operations per processor during a single iteration and, because β decreases when ω does, fewer iterations are needed for sparser problems.
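A minimal sketch (not from the paper, all function names illustrative) of the quantities quoted in the abstract: the factor β = 1 + (ω-1)(τ-1)/(n-1) for the fastest variant and the leading term n β/τ of the iteration bound, showing numerically that β/τ decreases as the number of processors τ grows.

```python
def beta(n: int, omega: int, tau: int) -> float:
    """Separability factor beta = 1 + (omega-1)(tau-1)/(n-1) for the fastest variant."""
    return 1.0 + (omega - 1) * (tau - 1) / (n - 1)


def iteration_bound(n: int, omega: int, tau: int, constant: float = 1.0) -> float:
    """Leading term n * beta / tau of the O(n beta / tau) iteration complexity.

    The hidden constant depends on the required accuracy, the confidence
    level and the distance of the starting iterate from an optimal point;
    here it is folded into `constant` as a placeholder.
    """
    return constant * n * beta(n, omega, tau) / tau


if __name__ == "__main__":
    n, omega = 10**6, 10  # n coordinates, at most omega nonzeros per row of A
    for tau in (1, 8, 64, 512):
        b = beta(n, omega, tau)
        print(f"tau={tau:4d}  beta={b:.4f}  n*beta/tau={iteration_bound(n, omega, tau):12.1f}")
    # beta/tau shrinks as tau grows, so the bound predicts fewer iterations
    # with more processors, with near-linear speedup when omega << n.
```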