Understanding and Optimizing Deep Learning Cold-Start Latency on Edge Devices

06/15/2022
by Rongjie Yi, et al.

DNNs are ubiquitous on edge devices nowadays. With their growing number and range of use cases, it is unlikely that all DNNs can be packed into device memory with every inference kept warm. Therefore, cold inference, the process of reading, initializing, and executing a DNN model, is becoming commonplace, and its performance urgently needs to be optimized. To this end, we present NNV12, the first on-device inference engine that optimizes for cold inference. NNV12 is built atop 3 novel optimization knobs: selecting a proper kernel (implementation) for each DNN operator, bypassing the weight-transformation process by caching the post-transformed weights on disk, and pipelining the execution of many kernels across asymmetric processors. To tackle the huge search space, NNV12 employs a heuristic-based scheme to obtain a near-optimal kernel scheduling plan. We fully implement a prototype of NNV12 and evaluate its performance through extensive experiments. The results show that NNV12 achieves up to 15.2x and 401.5x speedups over state-of-the-art DNN engines on edge CPUs and GPUs, respectively.
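To make the weight-caching knob concrete, below is a minimal Python sketch of the idea, not NNV12's actual implementation: post-transformed weights are persisted to disk on the first run, so later cold starts read them back directly and skip the expensive per-kernel transformation. All names here (CACHE_DIR, winograd_transform, load_transformed_weights) are hypothetical illustrations.

```python
import os
import numpy as np

CACHE_DIR = "/tmp/nnv12_weight_cache"  # hypothetical on-disk cache location


def winograd_transform(weights: np.ndarray) -> np.ndarray:
    """Stand-in for an expensive, kernel-specific weight transformation
    (e.g., the Winograd filter transform applied before convolution)."""
    # Placeholder computation: the real transform depends on the kernel chosen.
    return weights * 1.0


def load_transformed_weights(layer_name: str, raw_weights: np.ndarray) -> np.ndarray:
    """Return post-transformed weights, reading from the disk cache when
    available so cold inference can bypass the transformation step."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    cache_path = os.path.join(CACHE_DIR, f"{layer_name}.npy")
    if os.path.exists(cache_path):
        return np.load(cache_path)          # cache hit: a single disk read
    transformed = winograd_transform(raw_weights)
    np.save(cache_path, transformed)        # first run pays the transform cost once
    return transformed


if __name__ == "__main__":
    w = np.random.rand(64, 3, 3, 3).astype(np.float32)  # a conv layer's raw weights
    _ = load_transformed_weights("conv1", w)  # first call: transform + cache
    _ = load_transformed_weights("conv1", w)  # later cold start: disk read only
```

With a warm cache, the cold-start cost of this step reduces to one disk read per layer, which is the effect the paper's second knob targets; the remaining knobs, kernel selection and pipelined execution on asymmetric processors, address the rest of the cold-inference latency.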


Related research:

09/01/2020 · Scaling Up Deep Neural Network Optimization for Edge Inference
Deep neural networks (DNNs) have been increasingly deployed on and integ...

05/05/2021 · ScissionLite: Accelerating Distributed Deep Neural Networks Using Transfer Layer
Industrial Internet of Things (IIoT) applications can benefit from lever...

05/01/2023 · BCEdge: SLO-Aware DNN Inference Services with Adaptive Batching on Edge Platforms
As deep neural networks (DNNs) are being applied to a wide range of edge...

07/10/2023 · Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU
Many applications, such as autonomous driving and augmented reality, requ...

09/03/2019 · Guardians of the Deep Fog: Failure-Resilient DNN Inference from Edge to Cloud
Partitioning and distributing deep neural networks (DNNs) over physical ...

02/29/2020 · A Note on Latency Variability of Deep Neural Networks for Mobile Inference
Running deep neural network (DNN) inference on mobile devices, i.e., mob...

01/21/2023 · SuperScaler: Supporting Flexible DNN Parallelization via a Unified Abstraction
With the growing model size, deep neural networks (DNN) are increasingly...
