Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading

06/15/2022
by Daliang Xu, et al.

This paper proposes Mandheling, the first system that enables highly resource-efficient on-device training by orchestrating mixed-precision training with on-chip Digital Signal Processing (DSP) offloading. Mandheling fully exploits the advantages of the DSP in integer-based numerical calculation through four novel techniques: (1) a CPU-DSP co-scheduling scheme to mitigate the overhead from DSP-unfriendly operators; (2) a self-adaptive rescaling algorithm to reduce the overhead of dynamic rescaling in backward propagation; (3) a batch-splitting algorithm to improve the DSP cache efficiency; (4) a DSP-compute subgraph reusing mechanism to eliminate the preparation overhead on the DSP. We have fully implemented Mandheling and demonstrate its effectiveness through extensive experiments. The results show that, compared to the state-of-the-art DNN engines from TFLite and MNN, Mandheling reduces the per-batch training time by 5.5× and the energy consumption by 8.9× on average. In end-to-end training tasks, Mandheling reduces convergence time by up to 10.7× and energy consumption by up to 13.1×, with only 1.9


