Compressing Neural Networks using the Variational Information Bottleneck

02/28/2018
by Bin Dai, et al.

Neural networks can be compressed to reduce memory and computational requirements, or to increase accuracy by facilitating the use of a larger base architecture. In this paper we focus on pruning individual neurons, which can simultaneously trim model size, FLOPs, and run-time memory. To improve upon the performance of existing compression algorithms we utilize the information bottleneck principle instantiated via a tractable variational bound. Minimization of this information theoretic bound reduces the redundancy between adjacent layers by aggregating useful information into a subset of neurons that can be preserved. In contrast, the activations of disposable neurons are shut off via an attractive form of sparse regularization that emerges naturally from this framework, providing tangible advantages over traditional sparsity penalties without contributing additional tuning parameters to the energy landscape. We demonstrate state-of-the-art compression rates across an array of datasets and network architectures.
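
As a rough illustration of the gating idea described above, the sketch below shows one way a variational information-bottleneck gate on a hidden layer might be implemented: each neuron's activation is multiplied by a Gaussian gate z ~ N(mu, sigma^2) sampled with the reparameterization trick, and a KL-derived penalty of the form log(1 + mu^2/sigma^2) drives disposable neurons toward zero so they can be pruned. This is a minimal sketch, not the authors' code; the class name VIBGate, the initialization values, and the pruning threshold are illustrative assumptions, and PyTorch is used only for concreteness.

```python
import torch
import torch.nn as nn

class VIBGate(nn.Module):
    """Per-neuron multiplicative Gaussian gate with a VIB-style sparsity penalty (sketch)."""

    def __init__(self, dim):
        super().__init__()
        self.mu = nn.Parameter(torch.ones(dim))                # gate means
        self.log_sigma = nn.Parameter(-3.0 * torch.ones(dim))  # gate log-std

    def forward(self, h):
        # h: (batch, dim) activations of the layer being gated.
        if self.training:
            sigma = self.log_sigma.exp()
            eps = torch.randn_like(h)
            z = self.mu + sigma * eps   # reparameterized sample z ~ N(mu, sigma^2)
        else:
            z = self.mu                 # deterministic gate at evaluation time
        return h * z

    def penalty(self):
        # Sparsity term from the variational bound: sum_j log(1 + mu_j^2 / sigma_j^2).
        alpha = self.mu.pow(2) / self.log_sigma.exp().pow(2)
        return torch.log1p(alpha).sum()

    def keep_mask(self, threshold=1e-2):
        # Neurons with a small signal-to-noise ratio mu^2 / sigma^2 can be pruned.
        alpha = self.mu.pow(2) / self.log_sigma.exp().pow(2)
        return alpha > threshold
```

In training, the total objective would be the task loss plus a coefficient times the summed penalty() over all gated layers, trading accuracy against compression; after convergence, keep_mask() indicates which neurons (together with their fan-in and fan-out weights) survive.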

Related research

11/18/2016
NoiseOut: A Simple Way to Prune Neural Networks
Neural networks are usually over-parameterized with significant redundan...

07/12/2016
Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures
State-of-the-art neural networks are getting deeper and wider. While the...

06/11/2022
A Theoretical Understanding of Neural Network Compression from Sparse Linear Approximation
The goal of model compression is to reduce the size of a large neural ne...

03/13/2020
What Information Does a ResNet Compress?
The information bottleneck principle (Shwartz-Ziv & Tishby, 2017) sugg...

11/16/2015
Diversity Networks: Neural Network Compression Using Determinantal Point Processes
We introduce Divnet, a flexible technique for learning networks with div...

03/23/2022
Efficient Hardware Acceleration of Sparsely Active Convolutional Spiking Neural Networks
Spiking Neural Networks (SNNs) compute in an event-based manner to achie...

05/28/2020
Exploiting Non-Linear Redundancy for Neural Model Compression
Deploying deep learning models, comprising non-linear combination of ...
