Efficient On-device Training via Gradient Filtering

01/01/2023
by   Yuedong Yang, et al.

Despite its importance for federated learning, continual learning, and many other applications, on-device training remains an open problem for EdgeAI. The problem stems from the large number of operations (e.g., floating-point multiplications and additions) and the memory consumption required by the back-propagation algorithm during training. Consequently, in this paper we propose a new gradient filtering approach that enables on-device DNN model training. More precisely, our approach creates a special structure with fewer unique elements in the gradient map, thus significantly reducing the computational complexity and memory consumption of back-propagation during training. Extensive experiments on image classification and semantic segmentation with multiple DNN models (e.g., MobileNet, DeepLabV3, UPerNet) and devices (e.g., Raspberry Pi and Jetson Nano) demonstrate the effectiveness and wide applicability of our approach. For example, compared to SOTA, we achieve up to 19× speedup and 77.1% memory savings with only 0.1% accuracy loss; over 20× speedup and 90% energy savings are observed compared to highly optimized baselines in MKLDNN and CUDNN on NVIDIA Jetson Nano. Consequently, our approach opens up a new direction of research with a huge potential for on-device training.
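The core idea — a gradient map with fewer unique elements — can be illustrated by replacing each small spatial patch of the gradient with its mean, so back-propagation only needs to handle one value per patch. The following NumPy sketch is illustrative only: the function name, patch size, and tensor layout are assumptions for demonstration, not the paper's actual implementation.

```python
import numpy as np

def gradient_filter(grad, r=2):
    """Illustrative sketch: approximate a (C, H, W) gradient map by
    replacing each r x r spatial patch with its mean, leaving at most
    one unique value per patch. Assumes H and W are divisible by r."""
    c, h, w = grad.shape
    # Split the spatial dims into (h//r, r) x (w//r, r) blocks.
    blocks = grad.reshape(c, h // r, r, w // r, r)
    # Average within each r x r patch.
    patch_mean = blocks.mean(axis=(2, 4), keepdims=True)
    # Broadcast the patch mean back to the original resolution.
    return np.broadcast_to(patch_mean, blocks.shape).reshape(c, h, w)

g = np.random.rand(1, 4, 4).astype(np.float32)
fg = gradient_filter(g, r=2)
# fg has the same shape as g, but each 2x2 patch shares a single value,
# so a 4x4 map carries at most 4 unique gradient elements instead of 16.
```

With only one distinct gradient value per patch, the per-patch computations in the backward pass collapse into a single shared computation, which is where the claimed reduction in multiplications and memory traffic comes from.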


