Distributed Non-Convex First-Order Optimization and Information Processing: Lower Complexity Bounds and Rate Optimal Algorithms
We consider a class of distributed non-convex optimization problems that often arises in modern distributed signal and information processing, in which a number of agents connected by a network G collectively optimize a sum of smooth (possibly non-convex) local objective functions. We address the following fundamental question: for a class of unconstrained non-convex problems with Lipschitz continuous gradients, using only local gradient information, what is the fastest rate that distributed algorithms can achieve, and how can those rates be attained? We develop a lower bound analysis that identifies difficult problem instances for any first-order method. We show that in the worst case it takes any first-order algorithm O(DL/ϵ) iterations to achieve a certain ϵ-solution, where D is the network diameter and L is the Lipschitz constant of the gradient. Further, for a general problem class and a number of network classes, we propose optimal primal-dual gradient methods whose rates precisely match the lower bounds (up to a polylog factor). To the best of our knowledge, this is the first time that lower rate bounds and rate-optimal methods have been developed for distributed non-convex problems. Our results provide guidelines for the future design of distributed optimization algorithms, both convex and non-convex.
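To make the setting concrete, the following is a minimal sketch of a generic distributed primal-dual-style (gradient-tracking) update over a network, in the spirit of the algorithm class discussed above. It is an illustration under assumptions, not the authors' rate-optimal method: the ring topology, Metropolis mixing weights, quadratic toy objectives f_i(x) = 0.5‖x − a_i‖², and the step size are all chosen here for demonstration.

```python
import numpy as np

# Sketch of a distributed gradient-tracking (primal-dual-type) iteration.
# NOT the paper's exact rate-optimal algorithm; problem data, topology,
# and step size below are illustrative assumptions.

n, d = 5, 2                       # number of agents, variable dimension
rng = np.random.default_rng(0)
A = rng.standard_normal((n, d))   # local data; f_i(x) = 0.5 * ||x - A[i]||^2

def local_grad(i, x):
    """Gradient of the toy local objective f_i at x."""
    return x - A[i]

# Doubly stochastic mixing matrix for a ring graph (Metropolis-style weights).
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 1.0 / 3.0
    W[i, (i + 1) % n] = 1.0 / 3.0
    W[i, i] = 1.0 / 3.0

alpha = 0.1                       # step size (assumed; on the order of 1/L)
x = np.zeros((n, d))              # local copies x_i, one row per agent
g = np.array([local_grad(i, x[i]) for i in range(n)])  # tracked gradients

for k in range(200):
    # Primal step: average with neighbors, then move along tracked gradient.
    x_new = W @ x - alpha * g
    # Dual/tracking step: update the running estimate of the average gradient.
    g = W @ g + np.array([local_grad(i, x_new[i]) for i in range(n)]) \
              - np.array([local_grad(i, x[i]) for i in range(n)])
    x = x_new

print("consensus average:", x.mean(axis=0))             # ≈ mean of rows of A
print("disagreement norm:", np.linalg.norm(x - x.mean(axis=0)))
```

Each agent only exchanges its current iterate and tracked gradient with immediate neighbors, which is why the network diameter D enters the worst-case iteration complexity: information must propagate across the graph before agents can agree on a stationary point.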