Efficient Distributed Machine Learning via Combinatorial Multi-Armed Bandits

02/16/2022
by   Maximilian Egger, et al.

We consider the distributed stochastic gradient descent problem, where a main node distributes gradient calculations among n workers, of which at most b ≤ n can be utilized in parallel. By assigning tasks to all workers and waiting only for the k fastest to respond, the main node can trade off the algorithm's error with its runtime by gradually increasing k as the algorithm evolves. However, this strategy, referred to as adaptive k-sync, can incur additional costs since it ignores the computational effort of slow workers. We propose a cost-efficient scheme that assigns tasks to only k workers and gradually increases k. As the response times of the available workers are unknown to the main node a priori, we utilize a combinatorial multi-armed bandit model to learn which workers are the fastest while assigning gradient calculations, and to minimize the effect of slow workers. Assuming that the workers' response times are independent and exponentially distributed with distinct means, we give empirical and theoretical guarantees on the regret of our strategy, i.e., the extra time spent learning the mean response times of the workers. Compared to adaptive k-sync, our scheme achieves significantly lower errors with the same computational effort, albeit at the cost of a slower runtime.
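The core idea of the abstract, selecting the k fastest of n workers when mean response times are unknown, can be illustrated with a generic combinatorial UCB-style bandit. This is a minimal sketch under the paper's distributional assumption (independent, exponentially distributed response times), not a reproduction of the authors' exact algorithm: the function name, confidence-bound form, and regret accounting here are illustrative choices.

```python
import math
import random

def cmab_fastest_k(true_rates, k, rounds, seed=0):
    """Combinatorial bandit sketch for straggler avoidance.

    true_rates[i] is the rate of worker i's exponential response time
    (unknown to the learner; mean response time is 1/rate). Each round,
    the task is assigned to the k workers with the smallest lower
    confidence bound on the estimated mean response time (optimism
    for a minimization problem).
    """
    rng = random.Random(seed)
    n = len(true_rates)
    counts = [0] * n       # times each worker has been assigned a task
    mean_est = [0.0] * n   # running mean of observed response times
    # Oracle round cost: sum of the k smallest true mean response times.
    oracle = sum(sorted(1.0 / r for r in true_rates)[:k])
    pseudo_regret = 0.0
    for t in range(1, rounds + 1):
        untried = [i for i in range(n) if counts[i] == 0]
        if untried:
            # Exploration phase: sample every worker at least once.
            chosen = (untried + [i for i in range(n) if counts[i] > 0])[:k]
        else:
            # Subtract the confidence radius: optimistic for minimization.
            lcb = [mean_est[i] - math.sqrt(2.0 * math.log(t) / counts[i])
                   for i in range(n)]
            chosen = sorted(range(n), key=lambda i: lcb[i])[:k]
        for i in chosen:
            x = rng.expovariate(true_rates[i])  # observed response time
            counts[i] += 1
            mean_est[i] += (x - mean_est[i]) / counts[i]
        # Expected (pseudo-)regret of this round's assignment.
        pseudo_regret += sum(1.0 / true_rates[i] for i in chosen) - oracle
    return counts, pseudo_regret
```

With three fast workers (mean 0.1) and two slow ones (mean 1.0), the slow workers are quickly abandoned: their lower confidence bound only drops below the fast workers' bound while their sample count is small, so the regret grows logarithmically rather than linearly in the number of rounds.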

