ALERT: Accurate Anytime Learning for Energy and Timeliness

10/31/2019
by Chengcheng Wan, et al.

An increasing number of software applications incorporate runtime Deep Neural Network (DNN) inference because of its high accuracy in many problem domains. While much prior work has separately tackled the problems of improving DNN-inference accuracy and improving DNN-inference efficiency, an important problem remains under-explored: disciplined methods for dynamically managing application-specific latency, accuracy, and energy tradeoffs and constraints at run time. To address this need, we propose ALERT, a co-designed combination of a runtime system and a DNN nesting technique. The runtime takes latency, accuracy, and energy constraints and uses dynamic feedback to predict the best DNN-model and system power-limit setting. The DNN nesting creates a type of flexible network that efficiently delivers a series of results with increasing accuracy as time goes on. These two parts complement each other well: the runtime is aware of the tradeoffs among different DNN settings, and the nested DNNs' flexibility allows the runtime's predictions to satisfy application requirements even in unpredictable, changing environments. On real systems, for both image and speech tasks, ALERT achieves close-to-optimal results. Compared with the optimal static DNN-model and power-limit setting, which is impractical to predict, ALERT achieves a harmonic mean 33% energy reduction while satisfying constraints, and reduces image-classification error rate by 58% and sentence-prediction perplexity by 52% under a fixed energy budget.
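The abstract describes a runtime that uses dynamic feedback to pick, among nested DNN settings, the one that best meets latency and accuracy constraints. The following is a minimal illustrative sketch of that idea, not the paper's actual algorithm: the setting names, error and latency numbers, and the exponential-moving-average feedback rule are all assumptions made for the example.

```python
# Hypothetical candidate settings: three nested DNN stages, each with an
# assumed expected error and a running latency estimate in milliseconds.
SETTINGS = {
    "stage1": {"error": 0.30, "latency_est": 5.0},
    "stage2": {"error": 0.15, "latency_est": 12.0},
    "stage3": {"error": 0.08, "latency_est": 30.0},
}

ALPHA = 0.5  # weight on new observations in the moving-average update


def pick_setting(deadline_ms):
    """Choose the most accurate setting predicted to meet the deadline.

    Falls back to the fastest setting when no candidate is predicted
    to finish in time (a best-effort result is better than none).
    """
    feasible = [s for s, v in SETTINGS.items()
                if v["latency_est"] <= deadline_ms]
    if not feasible:
        return min(SETTINGS, key=lambda s: SETTINGS[s]["latency_est"])
    return min(feasible, key=lambda s: SETTINGS[s]["error"])


def report_latency(setting, observed_ms):
    """Feedback step: fold an observed latency into the running estimate."""
    est = SETTINGS[setting]["latency_est"]
    SETTINGS[setting]["latency_est"] = ALPHA * observed_ms + (1 - ALPHA) * est
```

For example, `pick_setting(15.0)` returns `"stage2"` (the most accurate stage whose estimated latency fits a 15 ms deadline), and repeated `report_latency` calls shift the estimates as the environment changes, so later selections adapt automatically.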

