CPrune: Compiler-Informed Model Pruning for Efficient Target-Aware DNN Execution

07/04/2022
by Taeho Kim, et al.

Mobile devices run deep learning models for various purposes, such as image classification and speech recognition. Because mobile devices are resource-constrained, researchers have focused on either building lightweight deep neural network (DNN) models through model pruning or generating efficient code through compiler optimization. Surprisingly, we found that straightforwardly combining model compression with compiler auto-tuning often does not produce the most efficient model for a target device. We propose CPrune, a compiler-informed model pruning scheme for efficient target-aware DNN execution that supports applications with a required target accuracy. CPrune produces a lightweight DNN model through informed pruning based on the structural information of subgraphs built during the compiler tuning process. Our experimental results show that CPrune increases DNN execution speed by up to 2.73x compared to the state-of-the-art TVM auto-tune while satisfying the accuracy requirement.

research
04/22/2020

Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning and Compiler Optimization

High-end mobile platforms rapidly serve as primary computing devices for...
research
01/01/2020

PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning

With the emergence of a spectrum of high-end mobile devices, many applic...
research
11/30/2022

Pex: Memory-efficient Microcontroller Deep Learning through Partial Execution

Embedded and IoT devices, largely powered by microcontroller units (MCUs...
research
01/20/2020

An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices

Weight pruning has been widely acknowledged as a straightforward and eff...
research
05/02/2019

26ms Inference Time for ResNet-50: Towards Real-Time Execution of all DNNs on Smartphone

With the rapid emergence of a spectrum of high-end mobile devices, many ...
research
01/23/2020

BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method

Accelerating DNN execution on various resource-limited computing platfor...
research
04/27/2023

Compiler Auto-tuning through Multiple Phase Learning

Widely used compilers like GCC and LLVM usually have hundreds of optimiz...