Dynamic ConvNets on Tiny Devices via Nested Sparsity

03/07/2022
by   Matteo Grimaldi, et al.

This work introduces a new training and compression pipeline to build Nested Sparse ConvNets, a class of dynamic Convolutional Neural Networks (ConvNets) suited for inference tasks deployed on resource-constrained devices at the edge of the Internet of Things. A Nested Sparse ConvNet consists of a single ConvNet architecture containing N sparse sub-networks with nested weight subsets, like a Matryoshka doll, and can trade accuracy for latency at run time, using the model sparsity as a dynamic knob. To attain high accuracy at training time, we propose a gradient-masking technique that optimally routes the learning signals across the nested weight subsets. To minimize the storage footprint and efficiently process the obtained models at inference time, we introduce a new sparse-matrix compression format with dedicated compute kernels that exploit the characteristics of the nested weight subsets. Tested on image classification and object detection tasks on an off-the-shelf ARM-M7 Micro Controller Unit (MCU), Nested Sparse ConvNets outperform variable-latency solutions naively built by assembling single sparse models trained as stand-alone instances, achieving (i) comparable accuracy, (ii) remarkable storage savings, and (iii) high performance. Moreover, compared to state-of-the-art dynamic strategies such as dynamic pruning and layer-width scaling, Nested Sparse ConvNets turn out to be Pareto-optimal in the accuracy vs. latency space.
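The core idea of nested weight subsets can be illustrated with a minimal sketch. The snippet below is not the paper's actual pipeline; it only shows one plausible way (assumed here) to build masks at several sparsity levels from a single magnitude ranking, so that the weights surviving at a higher sparsity are by construction a subset of those surviving at a lower one:

```python
import numpy as np

def nested_masks(weights, sparsities):
    """Build one binary mask per sparsity level from a shared magnitude
    ranking. Because all masks prune from the same ordering, the mask at
    a higher sparsity is nested inside every mask at a lower sparsity
    (the 'Matryoshka' property)."""
    flat = np.abs(weights).ravel()
    order = np.argsort(flat)              # indices, ascending magnitude
    masks = []
    for s in sorted(sparsities):
        mask = np.ones(flat.size, dtype=bool)
        k = int(s * flat.size)            # number of weights to prune
        mask[order[:k]] = False           # drop the k smallest magnitudes
        masks.append(mask.reshape(weights.shape))
    return masks

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
m50, m75, m90 = nested_masks(w, [0.5, 0.75, 0.9])

# Nesting holds: every weight alive at 90% sparsity is also alive at 75%,
# and every weight alive at 75% is alive at 50%.
assert np.all(m90 <= m75) and np.all(m75 <= m50)

# During training, the gradient for sub-network i could be masked the same
# way (e.g. grad_i = grad * masks[i]), so updates for a given sub-network
# never touch its pruned positions; the paper's gradient-masking rule for
# routing learning signals across subsets is more elaborate than this.
```

At inference time, switching sub-networks then amounts to selecting which mask (and the corresponding compressed weight segment) to use, which is what makes sparsity usable as a run-time accuracy/latency knob.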

Related research

- 06/12/2020, Dynamic Model Pruning with Feedback: Deep neural networks often have millions of parameters. This can hinder ...
- 12/11/2017, Learning Nested Sparse Structures in Deep Neural Networks: Recently, there have been increasing demands to construct compact deep a...
- 09/01/2021, Architecture Aware Latency Constrained Sparse Neural Networks: Acceleration of deep neural networks to meet a specific latency constrai...
- 06/10/2023, RAMAN: A Re-configurable and Sparse tinyML Accelerator for Inference on Edge: Deep Neural Network (DNN) based inference at the edge is challenging as ...
- 12/21/2021, Compact Multi-level Sparse Neural Networks with Input Independent Dynamic Rerouting: Deep neural networks (DNNs) have shown to provide superb performance in ...
- 05/20/2019, DARC: Differentiable ARchitecture Compression: In many learning situations, resources at inference time are significant...
- 06/20/2018, Doubly Nested Network for Resource-Efficient Inference: We propose doubly nested network (DNNet) where all neurons represent thei...
