I Introduction
Most recent breakthroughs in artificial intelligence rely on deep neural networks (DNNs) as the fundamental building blocks, for tasks such as image classification [1, 41, 42, 43, 20, 23] and object detection [12, 15, 39, 33, 38]. With the emergence of high-end mobile devices in recent years, there is an urgent need to migrate deep learning applications from cloud servers and desktops to these edge devices because of their cost advantages. However, this is challenging due to the high computation intensity of deep learning models and the limited computing power of these devices [52, 51, 47, 54]. In this sense, designing CNN models under a specific latency budget is essential for their deployment on resource-constrained mobile devices.

There is a considerable body of work on compression and acceleration of deep neural networks to overcome these challenges, such as network pruning [18, 19, 22, 16, 3], quantization [50, 6, 25, 29], low-rank approximation [8, 26, 27], and efficient model design [23, 44, 40, 24, 4, 17, 45]. Among these, pruning [18, 31, 16] has been a predominant approach for accelerating deep neural networks. Early endeavours in network pruning often aimed at reducing the model size (e.g., the number of parameters) or the number of Floating Point OPerations (FLOPs) of networks. However, it has recently been recognized that reducing the number of nonzero parameters or arithmetic operations does not necessarily lead to acceleration [53, 54], which is one of the main concerns for model deployment on resource-constrained devices. Resource-constrained compression, which aims to directly reduce the latency [22, 55, 3] or energy consumption [53, 51] of networks, has therefore emerged and soon drawn great attention.
While existing methods achieve a good trade-off between accuracy and latency/energy, there is still room to further push the frontier of resource-constrained compression in two respects.

First, modern mobile computation architectures should be taken advantage of. Existing pruning patterns are mostly not designed for mobile devices: pruning is conducted either channel-wise [53, 51, 55, 3], which is too inflexible to attain a high compression rate, or randomly [18, 19], which is inconvenient for acceleration because of irregular memory access. It is necessary to take the computing architecture into account and design specialized pruning patterns to further push the frontier between network accuracy and latency.

Second, solving the constrained problem requires efficient and accurate estimation of latency/energy. Latency/energy models in previous works [37, 53, 52] are tied to specific hardware platforms, and the estimation requires deep knowledge of the hardware. Other platform-independent methods approximate the latency/energy with a lookup table [54, 47] or an estimation model [51, 3]. However, constructing the lookup table or training the estimation model requires a large number of sparsity-latency/energy pairs, which are laborious to collect.
In this paper, we propose Architecture-aware Latency Constrained Sparse neural networks (ALCS) towards a better Pareto frontier between network accuracy and latency. Specifically, considering that most modern mobile devices utilize the Single Instruction Multiple Data (SIMD) technique to improve computation capacity, we propose SIMD-structured pruning, which groups the parameters according to the bit-length of SIMD registers. Parameters are then pruned in a group-wise manner. Along with it, we also propose an efficient computation algorithm for accelerating SIMD-structured sparse neural networks. Our method does not suffer from the strong structure constraint of channel pruning and is therefore able to achieve a relatively high compression/acceleration ratio on mobile devices.
For efficient latency estimation, we approximate the latency with piecewise linear interpolation. Constructing the latency estimator does not require any architecture-specific knowledge. Compared to other platform-independent estimation methods [54, 51, 3], which require tens of thousands of sparsity-latency pairs, our piecewise linear interpolation estimator is much easier to establish: only a small set of sparsity-latency data pairs (11 in our experiments) is required.
The whole latency-constrained sparsification task is formulated as a constrained optimization problem, which can be efficiently solved with the Alternating Direction Method of Multipliers (ADMM). Extensive experiments show that ALCS achieves a better Pareto frontier between network accuracy and latency, as shown in Figure 1. With ALCS, the execution time of MobileNet is reduced without accuracy drop, and the latency of ResNet50 is reduced even with an accuracy improvement.
In summary, our contributions are threefold:

We propose ALCS, an end-to-end system-algorithm co-design framework which utilizes SIMD-structured pruning to exploit modern mobile computation architectures for agile and accurate models.

We propose an efficient piecewise linear interpolation method to estimate network inference latency, which is sample-efficient and accurate.

Extensive experiments and ablation studies demonstrate the advantages of architecture-aware pruning, as well as the superiority of our proposed method over a set of competitive compression and acceleration methods.
II Related Work
Network Pruning. Network pruning is a key technique for the compression and acceleration of neural networks. Pioneering approaches prune weights randomly, meaning that each individual element of the parameters can be removed or retained without any constraint. This category of pruning method dates back to Optimal Brain Damage (OBD) [28], which prunes weights based on the Hessian matrix of the loss function; the Hessian is difficult to obtain when the number of parameters becomes large. More recently, Han et al. presented the 'Deep Compression' pipeline [18], which prunes parameters with relatively small magnitude. Ding et al. [10] utilize the momentum term of the SGD step to force parameters to converge to zero. Besides, there are many other works focusing on training a pruned network from scratch [34, 2, 9, 35, 32]. These methods can remove a large portion of the parameters with negligible accuracy loss, but they are not convenient for inference acceleration because of their irregular memory access [49].

The limitations of random weight pruning described above motivate recent works [49, 21, 16, 31, 30, 46] to focus more on channel pruning, which prunes the parameters in a channel-wise manner. Channel pruning is able to accelerate the computation of networks, but it requires pruning a whole channel simultaneously, which is too inflexible to achieve high compression and acceleration ratios. Moreover, these methods often aim to reduce the model size of networks, while it is now well acknowledged that network latency, one of the main concerns when deploying CNNs on resource-constrained mobile devices, does not decrease monotonically with model size [54].
Resource Constrained Compression. Recognizing that model size is not a sufficient surrogate for network latency/energy consumption, researchers have recently started investigating resource-constrained compression, which compresses a network to meet some budget (e.g., latency or energy). Given explicit resource constraints, these methods search for the optimal network structure with reinforcement learning [22], greedy search [53, 54, 55], Bayesian optimization [5], or dynamic programming [3], or optimize the network structure and the weight values simultaneously with optimization algorithms [52, 51, 36]. Compared to previous works, our work further takes into account the computing architecture of mobile devices and proposes mobile-oriented SIMD-structured pruning for CNNs. Moreover, we employ linear interpolation to estimate network latency, which is efficient and accurate and needs neither deep architecture knowledge nor a large number of collected sparsity-latency data pairs.

Efficient Sparse Computation. Recently, [11] proposed an efficient computing algorithm for sparse matrix multiplication on mobile devices. The common ground between our works is that channel pruning is not necessary for network acceleration on mobile devices. Our method differs from theirs in that it supports not only matrix multiplication but also general convolution, and we further argue that SIMD-structured pruning is necessary to achieve a better trade-off between network accuracy and latency.
III Methodology
III-A Problem formulation
Our goal is to accelerate a network to meet a given latency budget while minimizing the target loss:

$$\min_{\mathcal{W}} \ \mathcal{L}(\mathcal{W}) \quad \mathrm{s.t.} \quad T(\mathcal{W}) \le T_{bud} \qquad (1)$$

where $\mathcal{W} = \{W_1, \ldots, W_L\}$ denotes the set of parameters of each layer, $\mathcal{L}$ is the task-specified loss function, for example the cross-entropy loss for classification, and $T(\mathcal{W})$ and $T_{bud}$ denote the latency of the network and the target latency budget, respectively. Three important challenges obstruct solving the above problem: 1) how to utilize modern computation architectures to get a higher compression and acceleration rate on mobile devices, 2) how to efficiently estimate the latency of the network, and 3) how to solve the constrained optimization problem. In this work, we propose SIMD-structured pruning along with an efficient SIMD-structured sparse convolution algorithm for CNN acceleration. The network latency is estimated with piecewise linear interpolation, and the constrained problem is finally solved with ADMM. We introduce more details in the following sections.
III-B SIMD-structured pruning for fast inference
It is necessary to take the computing architecture of the target platform into account for fast inference of CNNs. To this end, considering that most mobile CPUs utilize the Single Instruction Multiple Data (SIMD) technique to improve computation efficiency, we propose SIMD-structured pruning along with an efficient SIMD-structured sparse convolution algorithm for CNN acceleration. We describe them in detail in the following sections.
III-B1 SIMD-structured pruning
In this section, we introduce the proposed SIMD-structured pruning. Before going into details, it is worthwhile to give a brief introduction to Single Instruction Multiple Data (SIMD). As a data-level parallelism scheme, SIMD is widely used in modern mobile CPU architectures. It allows a CPU to operate on a set of data items at the same time with a single instruction: a vector of data can be loaded into vector registers and processed simultaneously by one instruction.
We start with the grouping of parameters. Consider a convolutional layer with filters $\mathcal{W} \in \mathbb{R}^{n \times c \times h \times w}$, where $n$, $c$ and $h \times w$ denote the number of output channels, the number of input channels and the kernel size, respectively. The elements are first grouped along the output channel dimension. The size of each group depends on the length of the SIMD vector registers. For example, on the widely used ARM v7/v8 architectures, the length of each vector register for SIMD instructions is 128 bits, so for 32-bit single-precision parameters the group size should be 4. In other words, parameters at the same location of each 4 adjacent channels are grouped together. The parameters are then pruned in a group-wise manner. The right of Figure 3(a) shows a simple example of the proposed SIMD-structured pruning with a group size of 2. Note that the only constraint of SIMD-structured pruning is that the locations of zeros within each group of filters should be the same; the locations of zeros across different groups of filters can be irregular.
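To make the grouping concrete, the following is a minimal Python sketch of group-wise magnitude pruning. The function name, the nested-list weight layout, and the use of the group L2 norm as the pruning criterion are illustrative assumptions, not the paper's exact implementation:

```python
import math

def simd_structured_prune(weights, group_size=4, sparsity=0.5):
    """Group-wise pruning sketch: weights[o][k][u][v] is a 4-D kernel
    (output channels x input channels x height x width).  Parameters at
    the same (k, u, v) location of group_size adjacent output channels
    form one group; the groups with the smallest L2 norms are zeroed,
    so zeros share locations within a group but not across groups."""
    n = len(weights)
    c, h, w = len(weights[0]), len(weights[0][0]), len(weights[0][0][0])
    assert n % group_size == 0

    # Collect the L2 norm of every group of group_size aligned weights.
    groups = []
    for go in range(0, n, group_size):
        for k in range(c):
            for u in range(h):
                for v in range(w):
                    norm = math.sqrt(sum(
                        weights[go + i][k][u][v] ** 2
                        for i in range(group_size)))
                    groups.append((norm, go, k, u, v))

    # Zero the fraction `sparsity` of groups with the smallest norms.
    groups.sort(key=lambda t: t[0])
    for _, go, k, u, v in groups[:int(len(groups) * sparsity)]:
        for i in range(group_size):
            weights[go + i][k][u][v] = 0.0
    return weights
```

With a group size of 4, each surviving group maps directly onto one 128-bit vector register at inference time.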
III-B2 SIMD-structured sparse convolution
Having introduced SIMD-structured pruning for deep neural networks, in this section we describe an efficient algorithm for the computation of sparse convolutions.
OVERVIEW. We show in Figure 4 an overview of the proposed computation algorithm. In this example, the group size of SIMD-structured pruning is 2. We denote the input and output of the convolution as $X \in \mathbb{R}^{c \times H \times W}$ and $Y \in \mathbb{R}^{n \times H' \times W'}$, respectively. The output element at spatial location $(i, j)$ of the $o$-th channel can be computed by

$$Y_{o,i,j} = \sum_{k=1}^{c} \sum_{u=1}^{h} \sum_{v=1}^{w} \mathcal{W}_{o,k,u,v} \, X_{k,\,i+u-1,\,j+v-1},$$

which can be treated as the inner product of the stretched $o$-th kernel of $\mathcal{W}$ and a subtensor of $X$ related to the spatial location $(i, j)$. For instance, the element at the top-left corner of the first channel of $Y$ can be computed by the inner product of the stretched first filter of $\mathcal{W}$ and the subtensor colored in orange at the top-left corner of $X$. It is easy to see that the output values related to multiple output channels and spatial locations can be computed collectively. For instance, the elements related to the first 2 channels and the first 2 spatial locations of the output can be computed as follows: first, flatten and stack the first 2 filters of $\mathcal{W}$ into rows of a matrix, say $A$; then vectorize and stack the subtensors of $X$ related to the first 2 output spatial locations (e.g., the subtensors colored in orange and yellow, respectively, located at the top-left corner of $X$) into columns of a matrix, say $B$; finally, the 4 output elements can be computed by the multiplication of $A$ and $B$. When $A$ is SIMD-structured sparse, this data parallelism can be easily achieved with the help of SIMD instructions. Before going into more details, we first introduce the storage format of the SIMD-structured sparse tensor.

STORAGE FORMAT OF SIMD-STRUCTURED SPARSE TENSORS. As shown in the middle of Figure 4, a group of filters is stored in memory in a grouped version of the Compressed Sparse Row (CSR) format, which consists of the number of nonzero columns (the orange value), the column offsets (the gray elements), and the nonzero values (the light blue elements). For instance, we can see in the middle of Figure 4 that there are 3 nonzero columns in the original kernel data, so the orange value is 3. For the first nonzero column, there is 1 column before it in the original kernel data, so its column offset is 1. For the second nonzero column, there are 3 columns before it in the original kernel data, so its column offset is 3, and so on.
Note that after training, the values of parameters are fixed during inference, so this reorganization of parameters can be done in advance without any extra time overhead.
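The encoding step can be sketched in Python as follows; this is an illustrative rendering of the grouped-CSR layout described above (the function name and list-based layout are assumptions), producing the nonzero-column count, the column offsets, and the packed values:

```python
def encode_grouped_csr(group_rows):
    """Encode one group of flattened filters that share a sparsity
    pattern into the grouped-CSR-like layout described above: the
    number of nonzero columns, the column offsets (the number of dense
    columns preceding each kept column), and the packed nonzero values
    (one value per filter in the group, contiguous for vector loads)."""
    num_cols = len(group_rows[0])
    offsets, values = [], []
    for col in range(num_cols):
        column = [row[col] for row in group_rows]
        if any(v != 0.0 for v in column):
            offsets.append(col)      # position in the dense kernel
            values.append(column)    # one value per row of the group
    return len(offsets), offsets, values
```

Since the packed values of one column are contiguous, a single vector load can fetch the whole group at inference time.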
EFFICIENT MULTIPLICATION COMPUTATION. As described above, the core computation component is the multiplication between the SIMD-structured sparse kernel matrix and the dense input matrix. In this section we describe how this multiplication can be computed efficiently when the kernel is stored in the format described in the previous section; see the middle of Figure 4 for a simple example. We first allocate several SIMD registers and initialize them to zero for the storage of intermediate results. Then we load from memory the number of nonzero columns of the kernel data, which determines the number of iterations. In each iteration, we load a column of nonzero kernel data into a SIMD register, and then load the input data into other SIMD registers according to the corresponding column offset of the nonzero kernel data. After that, the loaded kernel and input data are multiplied and accumulated into the intermediate results simultaneously with SIMD instructions.
In practice, this procedure is implemented in highly optimized assembly. The group size and the number of collectively computed output elements are determined by the bit-length and the number of SIMD registers.
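Scalarized for clarity, the iteration structure of this multiplication might look as follows, with each inner loop over the group standing in for one SIMD multiply-accumulate (names and layout are illustrative; the real implementation is vectorized assembly):

```python
def grouped_sparse_matmul(nnz, offsets, values, dense_cols):
    """Multiply a grouped-sparse kernel (nnz nonzero columns, their
    offsets in the dense kernel, and packed values with one entry per
    filter in the group) with im2col-style dense input columns.
    Returns out[i][j] = <filter i of the group, input column j>."""
    group_size = len(values[0]) if nnz else 0
    out = [[0.0] * len(dense_cols) for _ in range(group_size)]
    for it in range(nnz):              # one iteration per nonzero column
        off, kcol = offsets[it], values[it]
        for j, col in enumerate(dense_cols):
            x = col[off]               # gather input via the column offset
            for i in range(group_size):
                # multiply-accumulate; vectorized over i by SIMD in practice
                out[i][j] += kcol[i] * x
    return out
```

Note that the loop count depends only on the number of nonzero columns, so the work scales with the density of the kernel rather than its dense size.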
III-C Latency estimation model
Having introduced the method for CNN acceleration, the next problem is how to efficiently and accurately estimate the network latency given the network parameters.
The latency of the whole network can be written as the sum of the latencies of its layers:

$$T = \sum_{l=1}^{L} T_l + T_{other} \qquad (2)$$

where $T_l$ is the latency of the $l$-th CONV/FC layer and $T_{other}$ denotes the latency of the other layers (e.g., ReLU layers, pooling layers, etc.), which can be regarded as a constant. In this work, the network is accelerated with sparsification, so the latency of the $l$-th layer can be formulated as a function of the number of nonzero weights of its kernel tensor $W_l$:

$$T_l = t_l(\|W_l\|_0) \qquad (3)$$
Note that the above equation does not mean that the model size is used as a proxy for latency: we model the latency layer-wise, and the number of nonzero parameters in different layers may influence the latency of the whole model differently.
We propose to approximate $t_l(\cdot)$ with linear interpolation. This is based on the observation that the latency of each layer is locally linear with respect to the density of its parameters, as shown in Figure 5. Taking the $l$-th layer as an example, we measure the run time of the layer on the device when the number of nonzero parameters is $0, 0.1N_l, 0.2N_l, \ldots, N_l$, respectively, where $N_l$ is the number of elements of $W_l$. For a given tensor $W_l$ with $\|W_l\|_0$ nonzero parameters, the run time can then be approximated by linear interpolation:

$$t_l(\|W_l\|_0) \approx \hat{t}_l^{(0)} + \sum_{k=0}^{9} s_k \, v_{k+1} \min\!\left( \|W_l\|_0 - 0.1kN_l,\; 0.1N_l \right) \qquad (4)$$

where $s_k$ is a variable which indicates whether the number of nonzero parameters is larger than $0.1kN_l$:

$$s_k = \begin{cases} 1, & \|W_l\|_0 > 0.1kN_l \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$

and $\hat{t}_l^{(k)}$ is the run time of the layer when the number of nonzero parameters is $0.1kN_l$. $v_{k+1}$ is the ascending speed of the latency when the number of nonzero parameters increases from $0.1kN_l$ to $0.1(k+1)N_l$:

$$v_{k+1} = \frac{\hat{t}_l^{(k+1)} - \hat{t}_l^{(k)}}{0.1N_l} \qquad (6)$$
In practice, we set the measurement points to multiples of $0.1N_l$, so that only 11 sparsity-latency data pairs are required per layer to approximate the network latency. We find that this linear interpolation approximation is rather accurate, as shown in Figure 6. Moreover, no platform knowledge is needed, because we approximate the network latency with direct measurements and treat the hardware as a black box. In contrast, previous works rely on either deep architecture knowledge [53, 52] or a large collection (usually over 10000) of sparsity-latency/energy data pairs for the construction of a lookup table [54] or the training of an estimation model [51].
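A sketch of the per-layer estimator, assuming 11 measured latencies at densities 0, 0.1, ..., 1.0 (function names are illustrative):

```python
import bisect

def make_latency_estimator(measured, total):
    """Piecewise linear latency model for one layer: measured[k] is the
    run time observed with k*total/10 nonzero parameters (k = 0..10)."""
    step = total / 10                       # width of each linear segment
    knots = [k * step for k in range(11)]   # measured sparsity levels

    def estimate(nnz):
        if nnz >= total:
            return measured[-1]
        k = bisect.bisect_right(knots, nnz) - 1   # segment containing nnz
        slope = (measured[k + 1] - measured[k]) / step
        return measured[k] + slope * (nnz - knots[k])

    return estimate
```

Summing one such estimator per CONV/FC layer, plus a constant for the remaining layers, yields the whole-network estimate of Equation (2).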
III-D The optimization algorithm
Now that we can efficiently approximate the latency of a CNN model given its parameters, we are ready to solve the constrained optimization problem in Equation (1). Many optimization algorithms can be applied to problem (1), such as the Alternating Direction Method of Multipliers (ADMM) and Projected Gradient Descent (PGD). In this paper, we apply ADMM, which has recently been shown to be sufficient for solving nonconvex, nonsmooth problems [48]. One may choose other optimization algorithms for problem (1), and the proposed SIMD-structured pruning and interpolation-based resource estimation are also applicable as plug-ins for other network compression methods to further improve their latency-accuracy trade-off, but this is out of the scope of this paper. The original problem can be reformulated as:
$$\min_{\mathcal{W}, Z} \ \mathcal{L}(\mathcal{W}) + g(Z) \quad \mathrm{s.t.} \quad \mathcal{W} = Z \qquad (7)$$

where $g(\cdot)$ is the indicator function of the constraint set $S = \{Z : T(Z) \le T_{bud}\}$, i.e., $g(Z) = 0$ if $Z \in S$ and $g(Z) = +\infty$ otherwise, and $T$ is defined by equations (2)-(6). By applying the augmented Lagrangian (in scaled form), the above problem is equivalent to:

$$\min_{\mathcal{W}, Z} \max_{U} \ \mathcal{L}(\mathcal{W}) + g(Z) + \frac{\rho}{2} \|\mathcal{W} - Z + U\|_2^2 - \frac{\rho}{2} \|U\|_2^2 \qquad (8)$$

where $\rho$ is a hyperparameter. The main idea of ADMM is to update the original parameters $\mathcal{W}$, the auxiliary variable $Z$ and the dual variable $U$ in an alternating manner:

$$\mathcal{W}^{t+1} = \arg\min_{\mathcal{W}} \ \mathcal{L}(\mathcal{W}) + \frac{\rho}{2} \|\mathcal{W} - Z^{t} + U^{t}\|_2^2 \qquad \mathrm{(9a)}$$

$$Z^{t+1} = \arg\min_{Z} \ g(Z) + \frac{\rho}{2} \|\mathcal{W}^{t+1} - Z + U^{t}\|_2^2 \qquad \mathrm{(9b)}$$

$$U^{t+1} = U^{t} + \mathcal{W}^{t+1} - Z^{t+1} \qquad \mathrm{(9c)}$$
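A minimal scalar sketch of one iteration under updates (9a)-(9c), with a single gradient step standing in for the one-epoch SGD solve of (9a) and grouping ignored for brevity (all names and the single-layer simplification are illustrative assumptions, not the paper's implementation):

```python
def project_to_budget(z, latency_of, budget):
    """Sketch of the Z-update (9b) for a single layer: keep the k
    largest-magnitude entries of z, where k is the largest count whose
    estimated latency latency_of(k) still meets the budget.  Since
    latency_of is nondecreasing in k, the largest feasible k is found
    by bisection instead of adding entries one by one."""
    order = sorted(range(len(z)), key=lambda i: abs(z[i]), reverse=True)
    lo, hi = 0, len(z)
    while lo < hi:                        # bisect on the number kept
        mid = (lo + hi + 1) // 2
        if latency_of(mid) <= budget:
            lo = mid
        else:
            hi = mid - 1
    keep = set(order[:lo])
    return [zi if i in keep else 0.0 for i, zi in enumerate(z)]

def admm_step(w, z, u, grad_loss, latency_of, budget, rho=0.01, lr=0.001):
    """One scaled-form ADMM iteration on flat lists of floats."""
    # (9a): gradient step on L(w) + rho/2 * ||w - z + u||^2
    w = [wi - lr * (g + rho * (wi - zi + ui))
         for wi, zi, ui, g in zip(w, z, u, grad_loss(w))]
    # (9b): z = projection of (w + u) onto {z : T(z) <= budget}
    z = project_to_budget([wi + ui for wi, ui in zip(w, u)],
                          latency_of, budget)
    # (9c): dual ascent u = u + w - z
    u = [ui + wi - zi for wi, zi, ui in zip(w, z, u)]
    return w, z, u
```

The penalty term pulls $\mathcal{W}$ towards the sparse iterate $Z$, so after enough iterations the dense parameters nearly satisfy the latency constraint themselves.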
The updates of the original parameters $\mathcal{W}$ and the dual variable $U$ are relatively straightforward. For the update of $\mathcal{W}$, we apply SGD on the training dataset for one epoch. The main difficulty lies in the update of the auxiliary variable $Z$, which is the projection of $\mathcal{W} + U$ onto the constraint set. We solve this problem with a greedy algorithm: we first sort the groups of elements by their norms, and pick them one by one until the latency reaches the target budget. A direct implementation of this algorithm is inefficient, as it may need a large number of iterations; however, it can be implemented efficiently with a bisection method, as shown in Algorithm 1. After the ADMM optimization finishes, we set $\mathcal{W} = Z$ and finetune the generated model on the training set for a few epochs. We summarize the final optimization algorithm in Algorithm 2; more details are given in Section IV-A.

IV Experimental Results
IV-A Experimental setup
We evaluate our method on compact models such as MobileNet [23] as well as relatively heavy networks like ResNet18 and ResNet50 [20] for 1000-class image classification on ImageNet [7]. We do not conduct experiments on CIFAR because it is more practical and challenging to accelerate CNN models on large-scale vision tasks. We use the standard data preprocessing pipeline provided by the official PyTorch examples. The batch size is set to 256, and the group size for SIMD-structured pruning is set to 4 to match the bit-length of SIMD registers¹. The hyperparameter $\rho$ is set to 0.01 for all the experiments. In each ADMM iteration, for the update of the original parameters as in Equation (9a), we apply the momentum SGD optimizer for 1 epoch with the learning rate fixed to 0.001 and weight decay set to for ResNet and for MobileNet. We apply 100 ADMM iterations for MobileNet and ResNet18, and 60 ADMM iterations for ResNet50. After the ADMM iterations, the generated compressed model is finetuned for 60 epochs with the learning rate annealed from to with a cosine learning rate scheduler. The weight decay is set to . During this procedure, only nonzero parameters are updated.

The latency of all dense models (including the models compressed with channel pruning methods) is measured with TensorFlow Lite [13], one of the most popular mobile-oriented inference frameworks for DNNs, and the latency of all SIMD-structured sparse models is measured with our proposed SIMD-structured sparse convolution algorithm, which is implemented in C++ with SIMD instructions. The latency averaged over 50 runs on a single ARM Cortex-A72 CPU is reported.

¹In most mobile devices, the length of each vector register for SIMD instructions is 128 bits, so for SIMD-structured pruning of 32-bit single-precision parameters, the group size should be 4.

IV-B Ablation study
Precision of latency estimation: We first study the precision of the proposed latency estimation with linear interpolation. To this end, we uniformly sample 100 ResNet18 models with different sparsity and plot the real and estimated latencies in Figure 6. From the figure we can see that the proposed linear interpolation approximation of the latency is rather accurate.
Influence of $\rho$:

TABLE I: Influence of the hyperparameter $\rho$ when compressing MobileNet to 185M FLOPs and 62ms latency (top-1 accuracy for each $\rho$).
To study the influence of the hyperparameter $\rho$ on our algorithm, we compress MobileNet on ImageNet with the target latency set to 62ms. Results are shown in Figure 7 and Table I. From Figure 7(a) we can see that the loss converges to a lower value with a small $\rho$, since with a smaller $\rho$ the algorithm focuses more on optimizing the original parameters $\mathcal{W}$. However, we can further see from Figure 7(b) that if $\rho$ is too small, it is not sufficient to constrain the sparse structure of the original parameters: with the smallest $\rho$, the difference between $\mathcal{W}$ and $Z$ is rather large even after 100 ADMM iterations, which means that $\mathcal{W}$ fails to converge to a sparse solution during the ADMM optimization. Therefore, during finetuning, the smallest $\rho$ does not lead to the lowest loss (see Figure 7(c)). From Table I, we can see that $\rho = 0.01$ achieves slightly better accuracy than the other two cases, so we set $\rho = 0.01$ in all the following experiments.
Impact of different components:

TABLE II: Impact of different components of ALCS when compressing MobileNet on ImageNet.

  Method  FLOPs   Latency  ADMM  FT  Acc@1
  WP      71.1M   61.55ms  ✓     ✓   68.36%
  FP      186M    64ms     ✓     ✓   67.76%
  SIMD    185M    62ms     ✓     ✓   70.53%
  SIMD    185M    62ms     ✓         70.08%
  SIMD    185M    62ms           ✓   69.96%
In this section, we study the impact of different components of ALCS, namely the SIMD-structured pruning, the ADMM optimization, and the post-finetuning. To this end, we compare several variants of ALCS for compressing MobileNet on ImageNet. Results are summarized in Table II, Figure 8 and Figure 9. Here WP denotes weight pruning, in which each individual element of the parameters can be removed or retained without any constraint. To measure the latency of models compressed by weight pruning, we use the code of XNNPACK [14], the state-of-the-art implementation of sparse matrix multiplication [11]. Note that this implementation supports only matrix multiplication, which is equivalent to a convolution layer with a kernel size of $1 \times 1$, so we do not prune the first convolution layer in this variant, following the same setting as [11]. FP denotes filter pruning, which prunes parameters in a channel-wise manner. SIMD denotes the proposed SIMD-structured pruning. ADMM and FT denote the ADMM optimization and the post-finetuning process, respectively. For a fair comparison, we set the latency budgets to be the same and train all variants for an equal number of total epochs. Specifically, for the ADMM+FT variants, we apply the same hyperparameters as described in Section IV-A. For the ADMM-only variants, we apply 160 ADMM iterations, and for the FT-only variants, we prune the model with the norm-based method and finetune for 160 epochs with the learning rate fixed to 0.001 for the first 100 epochs and annealed from 0.001 to 0 with a cosine schedule for the last 60 epochs.
As shown in Table II, the proposed training pipeline outperforms all the other variants. By comparing the first three variants, we can conclude that SIMD-structured pruning achieves a better trade-off between network accuracy and latency than weight pruning and filter pruning: the accuracy of ALCS with SIMD-structured pruning is higher than that of weight pruning and filter pruning under similar latency budgets. This is mainly because: (1) compared to weight pruning, SIMD-structured pruning is more friendly to the SIMD architecture of mobile devices, and thus achieves similar latency at a higher density, which helps retain network accuracy; (2) compared to filter pruning, SIMD-structured pruning does not suffer from such a strong constraint on the data structure, which improves flexibility and reduces accuracy loss.
By comparing the last three variants, we can see that both the ADMM optimization and the post-finetuning are necessary for improving network accuracy. In particular, applying only the ADMM optimization leads to a lower final accuracy than applying both the ADMM optimization and the post-finetuning, and there is also an accuracy gap without the ADMM optimization. In Figure 9 we further draw the training curves of these three variants; there is a sharp decline in validation loss when the post-finetuning begins. We conjecture that the ADMM optimization helps find a better initialization for the post-finetuning process.
IV-C Comparison with state-of-the-art methods
TABLE III: Comparison with state-of-the-art compression/acceleration methods on ImageNet.

  Model      Method       FLOPs  Latency  Acc@1
  ResNet18   baseline     1.8G   537ms    69.76%
             DMCP         1.04G  341ms    69.70%
             ALCS (ours)  548M   200ms    69.88%
  ResNet50   baseline     4.1G   1053ms   76.15%
             DMCP         2.2G   659ms    76.2%
             DMCP         1.1G   371ms    74.1%
             HRank        2.26G  695ms    75.56%
             AutoSlim     3.0G   792ms    76.0%
             AutoSlim     2.0G   609ms    75.6%
             AutoSlim     1.0G   312ms    74.0%
             ALCS (ours)  2.2G   630ms    76.26%
             ALCS (ours)  985M   370ms    75.05%
  MobileNet  Uniform      569M   167ms    71.8%
             Uniform      325M   102ms    68.4%
             Uniform      150M   53ms     64.4%
             AMC          285M   94ms     70.7%
             Fast         71.1M  61ms     68.4%
             AutoSlim     325M   99ms     71.5%
             AutoSlim     150M   55ms     67.9%
             USNet        325M   102ms    69.5%
             USNet        150M   53ms     64.2%
             ALCS (ours)  283M   82ms     72.04%
             ALCS (ours)  185M   62ms     70.53%
             ALCS (ours)  140M   52ms     69.16%
             ALCS (ours)  91M    40ms     65.96%
To further demonstrate the efficacy of our method, in this section we compare ALCS with various state-of-the-art network compression/acceleration methods. All experiments are conducted with ResNet18, ResNet50 and MobileNet on ImageNet. For a fair comparison, we set the latency budget to be the same for all approaches. The results are given in Table III and Figure 10.
From Table III, we see that ALCS achieves a better trade-off between network accuracy and latency than all the other methods. For example, our method accelerates the inference of ResNet18 without any accuracy loss; compared to DMCP [16], ALCS is faster with better accuracy. As for ResNet50, our method is faster with better accuracy, and can be made even faster with only a small accuracy drop relative to the original model. On MobileNet, our method also achieves higher accuracy than the other methods under similar or smaller latency budgets. For instance, under an 82ms latency budget, ALCS achieves 72.04% top-1 accuracy, which is higher than AutoSlim at 99ms and AMC at 94ms. Compared to the original model, ALCS is faster without any accuracy loss. The same trend holds under other latency budgets. Overall, the advantage of ALCS is more obvious on compact models under low latency budgets, which implies that specialized pruning structures are most necessary for accelerating compact models under tight latency budgets.
From Table III and Figure 10, we see that ALCS does not achieve a better FLOPs-accuracy trade-off than Fast [11]. This is because Fast accelerates networks with random weight pruning, in which each individual element of the parameters can be pruned without constraint, whereas ALCS uses the proposed SIMD-structured pruning, in which a group of parameters (4 in our experiments) must be pruned or retained simultaneously; thus Fast can achieve higher accuracy than ALCS under the same FLOPs. However, the goal of this paper is not to reduce the model size or the number of arithmetic operations but to accelerate the true inference speed, because when deploying deep models in practical applications it is usually the true run time, rather than the FLOPs, that matters. Compared to random weight pruning, the proposed SIMD-structured pruning fully utilizes the SIMD architecture of the target platform, which helps achieve high computation efficiency. Thus, to reach a given latency budget, more parameters must be pruned with random weight pruning: for example, to reduce the latency of MobileNet to about 61ms, Fast needs to prune down to 71.1M FLOPs, while ALCS reaches a similar latency at 185M FLOPs, so far fewer parameters need to be pruned, which helps preserve model accuracy. As a result, ALCS achieves a better trade-off between accuracy and latency than Fast, as shown in Table III and Figure 10.
V Conclusion
In this paper, we propose ALCS (Architecture-aware Latency Constrained Sparse neural networks) for model acceleration on mobile devices. Considering that most modern mobile devices utilize the Single Instruction Multiple Data (SIMD) technique to improve computation capacity, we propose a novel SIMD-structured pruning method along with an efficient SIMD-structured sparse convolution algorithm for the acceleration of sparse models. Moreover, we propose to estimate the latency of compressed models with piecewise linear interpolation, which is accurate and efficient and, in contrast to existing budget approximation methods, does not need a large collection of architecture-latency data pairs. The whole latency-constrained problem is finally solved with ADMM. Extensive experimental results on various network architectures indicate that ALCS achieves a better latency-accuracy trade-off thanks to the proposed SIMD-structured pruning and the efficient SIMD-structured sparse convolution algorithm.
The main purpose of this paper is to investigate the design space between traditional random weight pruning and structured filter-level pruning. The results show that it is possible to further push the latency-accuracy frontier with the help of SIMD instructions in modern CPUs. One limitation of SIMD-structured pruning is that it is not applicable to GPUs, whose computing architectures are very different; extending the idea to GPUs is an interesting direction for future work.
References

[1]
(2012)
ImageNet classification with deep convolutional neural networks
. In Proceedings of the 26th International Conference on Neural Information Processing Systems, (NIPS), Cited by: §I.  [2] (2018) Deep rewiring: training very sparse deep networks. In International Conference on Learning Representation, (ICLR), Cited by: §II.

 [3] (2020) AOWS: adaptive and optimal network width search with latency constraints. In 2020 IEEE Conference on Computer Vision and Pattern Recognition, (CVPR), Cited by: §I, §I, §I, §I, §II.
 [4] (2020) Once-for-all: train one network and specialize it for efficient deployment. In International Conference on Learning Representations, (ICLR), Cited by: §I.
 [5] (2018) Constraint-aware deep neural network compression. In European Conference on Computer Vision, (ECCV), Cited by: §II.
 [6] (2016) Binarized neural networks: training deep neural networks with weights and activations constrained to +1 or -1. In Advances in Neural Information Processing Systems, (NIPS), Cited by: §I.
 [7] (2009) ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, (CVPR), Cited by: §IV-A.
 [8] (2014) Exploiting linear structure within convolutional networks for efficient evaluation. In Advances in Neural Information Processing Systems, (NIPS), Cited by: §I.
 [9] (2019) Sparse networks from scratch: faster training without losing performance. CoRR abs/1907.04840. Cited by: §II.
 [10] (2019) Global sparse momentum SGD for pruning very deep neural networks. In Conference on Neural Information Processing Systems, (NeurIPS), Cited by: §II.
 [11] (2020) Fast sparse convnets. In 2020 IEEE Conference on Computer Vision and Pattern Recognition, (CVPR), Cited by: Fig. 1, §II, §IV-B, §IV-C, TABLE III.
 [12] (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, (CVPR), Cited by: §I.
 [13] (2020) Machine learning for mobile devices: TensorFlow Lite. Note: https://www.tensorflow.org/lite Cited by: §IV-A, TABLE III.
 [14] (2020) XNNPACK. Note: https://github.com/google/XNNPACK Cited by: §IV-B, TABLE III.
 [15] (2015) Fast R-CNN: fast region-based convolutional networks for object detection. In International Conference on Computer Vision, (ICCV), Cited by: §I.
 [16] (2020) DMCP: differentiable Markov channel pruning for neural networks. In 2020 IEEE Conference on Computer Vision and Pattern Recognition, (CVPR), Cited by: §I, §II, §IV-C, TABLE III.
 [17] (2020) Model rubik’s cube: twisting resolution, depth and width for tinynets. In The 34th Conference on Neural Information Processing System, (NeurIPS), Cited by: §I.
 [18] (2015) Deep compression: compressing deep neural networks with pruning, trained quantization and huffman coding. In International Conference on Learning Representations, (ICLR), Cited by: §I, §I, §II.
 [19] (2015) Learning both weights and connections for efficient neural networks. In Advances in Neural Information Processing Systems, (NIPS), Cited by: §I, §I.
 [20] (2016) Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, (CVPR), Cited by: §I, §IV-A.
 [21] (2019) Asymptotic soft filter pruning for deep neural networks. IEEE Transactions on Cybernetics 50. Cited by: §II.
 [22] (2018) AMC: AutoML for model compression and acceleration on mobile devices. In European Conference on Computer Vision, (ECCV), Cited by: Fig. 1, §I, §II, TABLE III.
 [23] (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. CoRR abs/1704.04861. Cited by: Fig. 1, §I, §I, §IV-A.
 [24] (2019) Searching for mobilenetv3. In International Conference on Computer Vision, (ICCV), Cited by: §I.
 [25] (2018) Quantization and training of neural networks for efficient integer-arithmetic-only inference. In IEEE Conference on Computer Vision and Pattern Recognition, (CVPR), Cited by: §I.
 [26] (2014) Speeding up convolutional neural networks with low rank expansions. CoRR abs/1405.3866. Cited by: §I.
 [27] (2015) Speeding up convolutional neural networks using fine-tuned CP-decomposition. In 3rd International Conference on Learning Representations, (ICLR), Cited by: §I.
 [28] (1989) Optimal brain damage. In Proceedings of Neural Information Processing Systems, (NIPS), Cited by: §II.
 [29] (2018) Extremely low bit neural network: squeeze the last bit out with admm. In The 32nd AAAI Conference on Artificial Intelligence, (AAAI), Cited by: §I.
 [30] (2020) DHP: differentiable meta pruning via hypernetworks. In European Conference on Computer Vision, (ECCV), Cited by: §II.
 [31] (2020) HRank: filter pruning using high-rank feature map. In 2020 IEEE Conference on Computer Vision and Pattern Recognition, (CVPR), Cited by: §I, §II, TABLE III.
 [32] (2020) Dynamic model pruning with feedback. In International Conference on Learning Representations, (ICLR), Cited by: §II.
 [33] (2016) SSD: single shot multibox detector. In European Conference on Computer Vision, (ECCV), Cited by: §I.
 [34] (2018) Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science. Nature Communications 9. Cited by: §II.
 [35] (2019) Parameter efficient training of deep convolutional neural networks by dynamic sparse reparameterization. In International Conference on Machine Learning, (ICML), Cited by: §II.
 [36] (2020) DSA: more efficient budgeted pruning via differentiable sparsity allocation. In European Conference on Computer Vision, (ECCV), Cited by: §II.
 [37] (2017) Faster CNNs with direct sparse convolutions and guided pruning. In The 5th International Conference on Learning Representations, (ICLR), Cited by: §I.
 [38] (2016) You only look once: unified, real-time object detection. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, (CVPR), Cited by: §I.
 [39] (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, (NIPS), Cited by: §I.
 [40] (2018) MobileNetV2: inverted residuals and linear bottlenecks. In The IEEE Conference on Computer Vision and Pattern Recognition, (CVPR), Cited by: §I.
 [41] (2015) Very deep convolutional networks for large-scale image recognition. In 3rd International Conference on Learning Representations, (ICLR), Cited by: §I.
 [42] (2015) Going deeper with convolutions. In 2015 IEEE Conference on Computer Vision and Pattern Recognition, (CVPR), Cited by: §I.
 [43] (2016) Rethinking the inception architecture for computer vision. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, (CVPR), Cited by: §I.
 [44] (2019) EfficientNet: rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning, (ICML), Cited by: §I.
 [45] (2021) EfficientNetV2: smaller models and faster training. In The 38th International Conference on Machine Learning, (ICML), Cited by: §I.
 [46] (2020) Revisiting parameter sharing for automatic neural channel number search. In Proceedings of Neural Information Processing Systems, (NeurIPS), Cited by: §II.
 [47] (2019) HAQ: hardware-aware automated quantization with mixed precision. In 2019 IEEE Conference on Computer Vision and Pattern Recognition, (CVPR), Cited by: §I, §I.
 [48] (2018) Global convergence of ADMM in nonconvex nonsmooth optimization. Journal of Scientific Computing 78, pp. 29–63. Cited by: §III-D.
 [49] (2016) Learning structured sparsity in deep neural networks. In Proceedings of the 30th International Conference on Neural Information Processing Systems, (NIPS), Cited by: §II, §II.
 [50] (2016) Quantized convolutional neural networks for mobile devices. In IEEE Conference on Computer Vision and Pattern Recognition, (CVPR), Cited by: §I.
 [51] (2019) ECC: platform-independent energy-constrained deep neural network compression via bilinear regression model. In 2019 IEEE Conference on Computer Vision and Pattern Recognition, (CVPR), Cited by: §I, §I, §I, §I, §I, §II, §III-C.
 [52] (2019) Energy-constrained compression for deep neural networks via weighted sparse projection and layer input masking. In 7th International Conference on Learning Representations, (ICLR), Cited by: §I, §I, §II, §III-C.
 [53] (2017) Designing energy-efficient convolutional neural networks using energy-aware pruning. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, (CVPR), Cited by: §I, §I, §I, §II, §III-C.
 [54] (2018) NetAdapt: platform-aware neural network adaptation for mobile applications. In European Conference on Computer Vision, (ECCV), Cited by: §I, §I, §I, §I, §II, §II, §III-C.
 [55] (2019) Network slimming by slimmable networks: towards one-shot architecture search for channel numbers. CoRR abs/1903.11728. Cited by: §I, §I, §II, TABLE III.
 [56] (2019) Universally slimmable networks and improved training techniques. In 2019 IEEE International Conference on Computer Vision, (ICCV), Cited by: TABLE III.