The Minimax Complexity of Distributed Optimization

by Blake Woodworth, et al.

In this thesis, I study the minimax oracle complexity of distributed stochastic optimization. First, I present the "graph oracle model", an extension of the classic oracle complexity framework that can be applied to study distributed optimization algorithms. Next, I describe a general approach to proving optimization lower bounds for arbitrary randomized algorithms (as opposed to more restricted classes of algorithms, e.g., deterministic or "zero-respecting" algorithms), which is used extensively throughout the thesis. For the remainder of the thesis, I focus on the specific case of the "intermittent communication setting", where multiple computing devices work in parallel with limited communication amongst themselves. In this setting, I analyze the theoretical properties of the popular Local Stochastic Gradient Descent (SGD) algorithm in the convex setting, for both homogeneous and heterogeneous objectives. I provide the first guarantees for Local SGD that improve over simple baseline methods, but show that Local SGD is not optimal in general. In pursuit of optimal methods, I then show matching upper and lower bounds for the intermittent communication setting with homogeneous convex, heterogeneous convex, and homogeneous non-convex objectives. These upper bounds are attained by simple variants of SGD, which are therefore optimal. Finally, I discuss several additional assumptions about the objective, and several more powerful oracles, that might be exploited to develop intermittent communication algorithms with better guarantees than the lower bounds allow.
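To make the intermittent communication setting concrete, here is a minimal sketch of Local SGD simulated in a single process. It is an illustration only, not the thesis's algorithm statement or analysis: the function names (`local_sgd`, `noisy_quadratic_grad`), the toy objective F(x) = 0.5 (x - 3)^2, the step size, and the values of M, K, and R are all assumptions chosen for the example. Each of M machines takes K stochastic gradient steps on its own, and the machines average their iterates once per round, for R rounds.

```python
import random

def local_sgd(grad_oracle, x0, M, K, R, lr, seed=0):
    """Simulate Local SGD: M machines, K local steps per round, R rounds."""
    rng = random.Random(seed)
    x = x0
    for _ in range(R):
        local_iterates = []
        for _ in range(M):
            y = x  # every machine starts the round from the shared iterate
            for _ in range(K):
                y -= lr * grad_oracle(y, rng)  # local step, no communication
            local_iterates.append(y)
        # The only communication in a round: average the local iterates.
        x = sum(local_iterates) / M
    return x

def noisy_quadratic_grad(x, rng):
    # Hypothetical stochastic gradient oracle for F(x) = 0.5 * (x - 3)^2,
    # with unit-variance Gaussian noise added to the exact gradient.
    return (x - 3.0) + rng.gauss(0.0, 1.0)

x_final = local_sgd(noisy_quadratic_grad, x0=0.0, M=8, K=10, R=20, lr=0.1)
print(x_final)  # expected to land near the minimizer x = 3
```

Setting K = 1 in this sketch makes the machines average after every step, which gives a simple minibatch-style baseline to compare against under the same number of communication rounds.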





Is Local SGD Better than Minibatch SGD?

We study local SGD (also known as parallel SGD and federated averaging),...

Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization

We suggest a general oracle-based framework that captures different para...

Improved Communication Lower Bounds for Distributed Optimisation

Motivated by the interest in communication-efficient methods for distrib...

Distributed and Stochastic Optimization Methods with Gradient Compression and Local Steps

In this thesis, we propose new theoretical frameworks for the analysis o...

Convex Set Disjointness, Distributed Learning of Halfspaces, and LP Feasibility

We study the Convex Set Disjointness (CSD) problem, where two players ha...

Distributed Zero-Order Optimization under Adversarial Noise

We study the problem of distributed zero-order optimization for a class ...

AIDE: Fast and Communication Efficient Distributed Optimization

In this paper, we present two new communication-efficient methods for di...