Binary Early-Exit Network for Adaptive Inference on Low-Resource Devices

06/17/2022
by   Aaqib Saeed, et al.

Deep neural networks have significantly improved performance on a range of tasks, but at the cost of growing computational demands, making deployment on low-resource devices (with limited memory and battery power) infeasible. Binary neural networks (BNNs) tackle the issue to an extent, offering extreme compression and speed-up gains compared to real-valued models. We propose a simple but effective method to accelerate inference by unifying BNNs with an early-exiting strategy. Our approach attaches output layers to intermediate layers of the network and allows simple instances to exit early based on a decision threshold, avoiding execution of the entire binary model. We extensively evaluate our method on three audio classification tasks and across four BNN architectures. Our method demonstrates favorable quality-efficiency trade-offs and is controllable via an entropy-based threshold specified by the system user. It achieves better speed-ups (latency less than 6 ms) with a single model based on existing BNN architectures, without retraining for different efficiency levels. Finally, it provides a straightforward way to estimate sample difficulty and a better understanding of uncertainty around certain classes within the dataset.
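The abstract does not spell out the exit rule, but the described mechanism (compute the softmax entropy at each intermediate output layer and stop as soon as it drops below a user-chosen threshold) can be sketched as follows. Function names and the two-exit example are illustrative, not the paper's implementation:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def early_exit_predict(exit_logits, threshold):
    """Scan intermediate classifiers in order; return (predicted_class,
    exit_index) at the first exit whose prediction entropy is below the
    threshold. The final exit is always taken if no earlier one fires."""
    last = len(exit_logits) - 1
    for i, logits in enumerate(exit_logits):
        probs = softmax(logits)
        if entropy(probs) < threshold or i == last:
            return probs.index(max(probs)), i

# Confident sample: low entropy at the first exit, so inference stops early.
confident_exits = [[4.0, 0.1, 0.1], [5.0, 0.0, 0.0]]
pred, stage = early_exit_predict(confident_exits, threshold=0.5)
```

A lower threshold forces more samples through deeper exits (higher quality, higher latency); a higher threshold lets more samples leave early, which is the quality-efficiency knob the abstract refers to.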

Related research

09/27/2021
Consistency Training of Multi-exit Architectures for Sensor Data
Deep neural networks have become larger over the years with increasing d...

08/31/2023
Dynamic nsNet2: Efficient Deep Noise Suppression with Early Exiting
Although deep learning has made strides in the field of deep noise suppr...

07/12/2022
Distilled Non-Semantic Speech Embeddings with Binary Neural Networks for Low-Resource Devices
This work introduces BRILLsson, a novel binary neural network-based repr...

04/16/2020
The Right Tool for the Job: Matching Model and Instance Complexities
As NLP models become larger, executing a trained model requires signific...

06/04/2023
Finding the SWEET Spot: Analysis and Improvement of Adaptive Inference in Low Resource Settings
Adaptive inference is a simple method for reducing inference costs. The ...

10/08/2021
LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time
When deploying deep learning models to a device, it is traditionally ass...

02/04/2020
Lightweight Convolutional Representations for On-Device Natural Language Processing
The increasing computational and memory complexities of deep neural netw...
