Asynchronous Accelerated Proximal Stochastic Gradient for Strongly Convex Distributed Finite Sums
In this work, we study the problem of minimizing a sum of strongly convex functions split over a network of n nodes. We propose ADFS, a decentralized and asynchronous algorithm that tackles the case in which the local functions are themselves finite sums with m components. ADFS converges linearly when the local functions are smooth, and matches the rates of the best known finite-sum algorithms when executed on a single machine. On several machines, ADFS enjoys an O(√n) or O(n) speed-up, depending on the leading complexity term, as long as the diameter of the network is not too large with respect to m. This also yields a √m speed-up over state-of-the-art distributed batch methods, which is the speed-up expected from finite-sum algorithms. In terms of communication times and network parameters, ADFS scales as well as optimal distributed batch algorithms. As a side contribution, we give a generalized version of the accelerated proximal coordinate gradient algorithm with arbitrary sampling, which we apply to a well-chosen dual problem to derive ADFS. Nevertheless, ADFS uses primal proximal updates that only require solving one-dimensional problems for many standard machine learning applications. Finally, ADFS can be formulated for non-smooth objectives with equally good scaling properties. We illustrate the improvement of ADFS over state-of-the-art approaches with simulations.
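For reference, the finite-sum setting described above can be written as follows (a minimal sketch; the symbols x, d, and f_{i,j} are our notation and are not taken verbatim from the abstract):

\min_{x \in \mathbb{R}^d} \ \sum_{i=1}^{n} f_i(x), \qquad f_i(x) = \sum_{j=1}^{m} f_{i,j}(x),

where each f_i is strongly convex and held locally by node i of the network, and each f_{i,j} is one of its m local components.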