26ms Inference Time for ResNet-50: Towards Real-Time Execution of all DNNs on Smartphone

05/02/2019
by   Wei Niu, et al.

With the rapid emergence of a spectrum of high-end mobile devices, many applications that formerly required desktop-level computation capability can now run on these devices without any problem. However, without careful optimization, executing Deep Neural Networks (a key building block of the real-time video stream processing that underlies many popular applications) is still challenging, especially when extremely low-latency or high-accuracy inference is needed. This work presents CADNN, a programming framework that efficiently executes DNNs on mobile devices with the help of advanced model compression (sparsity) and a set of thorough architecture-aware optimizations. The evaluation demonstrates that CADNN outperforms state-of-the-art dense DNN execution frameworks such as TensorFlow Lite and TVM.
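For context on the "model compression (sparsity)" mentioned above, the snippet below is a minimal sketch of magnitude-based weight pruning in NumPy. It is not CADNN's actual compression pipeline; the `prune_by_magnitude` helper and the 70% sparsity target are illustrative assumptions, intended only to show how zeroing out small weights produces the sparse kernels that an architecture-aware mobile runtime can then exploit.

```python
# Minimal sketch of magnitude-based weight pruning (illustrative, not CADNN's algorithm).
import numpy as np

def prune_by_magnitude(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights so that `sparsity` fraction becomes zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

# Example: prune a random bank of 3x3 convolution kernels to 70% sparsity.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64, 3, 3)).astype(np.float32)
w_sparse = prune_by_magnitude(w, sparsity=0.7)
print(f"non-zeros before: {np.count_nonzero(w)}, after: {np.count_nonzero(w_sparse)}")
```

In practice, frameworks in this space pair such pruning with retraining to recover accuracy and with compiler-level code generation that skips the zeroed weights at inference time.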


Related research

01/01/2020 - PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning
With the emergence of a spectrum of high-end mobile devices, many applic...

04/22/2020 - Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning and Compiler Optimization
High-end mobile platforms rapidly serve as primary computing devices for...

12/18/2021 - LegoDNN: Block-grained Scaling of Deep Neural Networks for Mobile Vision
Deep neural networks (DNNs) have become ubiquitous techniques in mobile ...

07/20/2020 - Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices
Mobile devices are becoming an important carrier for deep learning tasks...

01/20/2020 - An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices
Weight pruning has been widely acknowledged as a straightforward and eff...

07/04/2022 - CPrune: Compiler-Informed Model Pruning for Efficient Target-Aware DNN Execution
Mobile devices run deep learning models for various purposes, such as im...

06/22/2023 - MultiTASC: A Multi-Tenancy-Aware Scheduler for Cascaded DNN Inference at the Consumer Edge
Cascade systems comprise a two-model sequence, with a lightweight model ...
