A Distributed Frank-Wolfe Algorithm for Communication-Efficient Sparse Learning

by Aurélien Bellet, et al.
University of Southern California
Télécom ParisTech
Princeton University

Learning sparse combinations is a frequent theme in machine learning. In this paper, we study its associated optimization problem in the distributed setting where the elements to be combined are not centrally located but spread over a network. We address the key challenge of balancing communication costs and optimization errors. To this end, we propose a distributed Frank-Wolfe (dFW) algorithm. We obtain theoretical guarantees on the optimization error ϵ and communication cost that do not depend on the total number of combining elements. We further show that the communication cost of dFW is optimal by deriving a lower bound on the communication cost required to construct an ϵ-approximate solution. We validate our theoretical analysis with empirical studies on synthetic and real-world data, which demonstrate that dFW outperforms both baselines and competing methods. We also study the performance of dFW when the conditions of our analysis are relaxed, and show that dFW is fairly robust.
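
The abstract does not spell out the procedure, so the following is only a minimal single-process sketch of how a distributed Frank-Wolfe iteration might look, not the authors' implementation. It assumes a least-squares objective over an ℓ1 ball, atoms stored as matrix columns split across simulated nodes, the standard step size 2/(k+2), and a simple exchange in which each node proposes one scalar and the winning node broadcasts one atom. The function name `dfw_lasso_sketch` and the `floats_sent` counter are hypothetical, introduced for illustration.

```python
import numpy as np

def dfw_lasso_sketch(node_atoms, y, beta, n_iters):
    """Simulate Frank-Wolfe for min ||A a - y||^2 s.t. ||a||_1 <= beta,
    where the columns (atoms) of A are split across nodes.

    node_atoms: list of (d, n_i) arrays, one per simulated node (assumption).
    Returns the coefficient vector and a rough count of floats exchanged.
    """
    sizes = [A.shape[1] for A in node_atoms]
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    alpha = np.zeros(int(offsets[-1]))
    prediction = np.zeros(y.shape[0])  # equals A @ alpha, kept incrementally
    floats_sent = 0

    for k in range(n_iters):
        residual = prediction - y

        # Local step: each node finds its atom with the largest |gradient|
        # and shares only that scalar.
        proposals = []
        for i, A in enumerate(node_atoms):
            grad = 2.0 * (A.T @ residual)
            j = int(np.argmax(np.abs(grad)))
            proposals.append((abs(grad[j]), grad[j], i, j))
            floats_sent += 1

        # Global step: the node holding the best atom broadcasts it.
        _, g, i, j = max(proposals, key=lambda p: p[0])
        atom = node_atoms[i][:, j]
        floats_sent += atom.size

        # Frank-Wolfe update toward the l1-ball vertex -sign(g) * beta * e_j.
        gamma = 2.0 / (k + 2.0)
        coef = -np.sign(g) * beta
        alpha *= 1.0 - gamma
        alpha[offsets[i] + j] += gamma * coef
        prediction = (1.0 - gamma) * prediction + gamma * coef * atom

    return alpha, floats_sent

# Toy usage: six orthonormal atoms spread over three nodes.
eye = np.eye(6)
nodes = [eye[:, 0:2], eye[:, 2:4], eye[:, 4:6]]
target = 0.5 * eye[:, 0] - 0.5 * eye[:, 5]
alpha, floats_sent = dfw_lasso_sketch(nodes, target, beta=1.0, n_iters=100)
```

In this sketch each round exchanges one scalar per node plus one atom, so the counted traffic grows with the number of nodes, the atom dimension, and the number of iterations, but not with the number of atoms held by each node. That is in the spirit of the communication guarantee the abstract describes, under the assumptions stated above.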


