Gated Compression Layers for Efficient Always-On Models

03/15/2023
by Haiguang Li, et al.

Mobile and embedded machine learning developers frequently have to compromise between two inferior on-device deployment strategies: sacrifice accuracy and aggressively shrink their models to run on dedicated low-power cores, or sacrifice battery life by running larger models on more powerful compute cores such as neural processing units or the main application processor. In this paper, we propose a novel Gated Compression layer that can be applied to transform existing neural network architectures into Gated Neural Networks. Gated Neural Networks have multiple properties well suited to on-device use cases: they significantly reduce power, boost accuracy, and take advantage of heterogeneous compute cores. We provide results across five public image and audio datasets that demonstrate the proposed Gated Compression layer effectively stops up to 96% of negative samples while maintaining or improving model accuracy.
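The abstract describes a gate that stops negative samples early and compresses the activations it forwards. As a rough illustration only (the function name, gate parameters, threshold, and top-k compression scheme below are all hypothetical assumptions, not taken from the paper), the idea can be sketched as:

```python
import numpy as np

def gated_compression(x, gate_w, gate_b, threshold=0.5):
    """Hypothetical sketch of a gated compression step.

    A small gate scores the input; if the score falls below `threshold`,
    the sample is "stopped" (nothing is forwarded to the larger
    downstream model). Otherwise the activations are sparsified
    (compressed) before being forwarded.
    """
    # Gate score in (0, 1): logistic over a linear projection of x.
    score = 1.0 / (1.0 + np.exp(-(x @ gate_w + gate_b)))
    if score < threshold:
        return None, score  # negative sample stopped at the gate

    # "Compress" by keeping only the largest-magnitude activations
    # (top 25% here, an arbitrary choice for the sketch).
    k = max(1, len(x) // 4)
    keep = np.argsort(np.abs(x))[-k:]
    compressed = np.zeros_like(x)
    compressed[keep] = x[keep]
    return compressed, score  # positive sample forwarded, sparsified
```

In an always-on deployment, the gate would run on the low-power core; only samples that pass it wake the larger model on the more powerful core, which is where the power savings claimed above would come from.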

