DISTREAL: Distributed Resource-Aware Learning in Heterogeneous Systems

12/16/2021
by Martin Rapp, et al.

We study the problem of distributed training of neural networks (NNs) on devices with heterogeneous, limited, and time-varying availability of computational resources. We present DISTREAL, an adaptive, resource-aware, on-device learning mechanism that fully and efficiently utilizes the resources available on each device in a distributed manner, increasing the convergence speed. This is achieved with a dropout mechanism that dynamically adjusts the computational complexity of training an NN by randomly dropping filters of the convolutional layers of the model. Our main contribution is a design space exploration (DSE) technique that finds per-layer dropout vectors that are Pareto-optimal with respect to the resource requirements and convergence speed of training. Given these vectors, each device can dynamically select the dropout vector that fits its currently available resources, without requiring any assistance from the server. We implement our solution in a federated learning (FL) system, where the availability of computational resources varies both between devices and over time, and show through extensive evaluation that we significantly increase the convergence speed over the state of the art without compromising the final accuracy.
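To make the filter-dropout mechanism concrete, here is a minimal PyTorch sketch. The module name FilterDropout, the toy three-layer CNN, and all rates are illustrative assumptions rather than the authors' implementation, and masking only emulates the compute savings that structurally skipping filters would provide.

import torch
import torch.nn as nn

class FilterDropout(nn.Module):
    # Zeroes whole filters (output channels) of the preceding conv layer
    # with probability p during training, rescaling the survivors.
    # Mask-based emulation: actual resource savings require structurally
    # skipping the dropped filters rather than multiplying by zero.
    def __init__(self, p: float):
        super().__init__()
        self.p = p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training or self.p == 0.0:
            return x
        keep = 1.0 - self.p
        # One Bernoulli draw per filter, shared across the whole batch and
        # spatial dimensions, so a dropped filter is dropped for the pass.
        mask = torch.bernoulli(
            torch.full((1, x.size(1), 1, 1), keep, device=x.device))
        return x * mask / keep

def make_cnn(dropout_vector):
    # Toy CNN for 3x32x32 inputs; the i-th conv layer is followed by
    # filter dropout at rate dropout_vector[i].
    channels = [3, 16, 32, 64]
    layers = []
    for i, p in enumerate(dropout_vector):
        layers += [nn.Conv2d(channels[i], channels[i + 1], 3, padding=1),
                   FilterDropout(p), nn.ReLU(), nn.MaxPool2d(2)]
    layers += [nn.Flatten(), nn.Linear(64 * 4 * 4, 10)]
    return nn.Sequential(*layers)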

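Per-device selection can then reduce to a lookup into the Pareto front produced by the DSE: each device picks the most expensive, and hence fastest-converging, dropout vector whose cost still fits its current budget. A sketch, with made-up (cost, vector) placeholders:

# Precomputed Pareto front of (relative training cost, per-layer dropout
# vector) pairs. The numbers below are illustrative placeholders only.
pareto_front = [
    (1.00, (0.0, 0.0, 0.0)),
    (0.70, (0.1, 0.2, 0.3)),
    (0.45, (0.3, 0.4, 0.5)),
    (0.25, (0.5, 0.6, 0.7)),
]

def select_vector(budget: float):
    # Highest-cost feasible vector; fall back to the cheapest one if even
    # that exceeds the current budget.
    feasible = [(c, v) for c, v in pareto_front if c <= budget]
    return max(feasible)[1] if feasible else pareto_front[-1][1]

# Example: with half the nominal resources available, the device picks
# (0.3, 0.4, 0.5) and builds its model via make_cnn from the sketch above.
net = make_cnn(select_vector(budget=0.5))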

