Towards Hardware-Specific Automatic Compression of Neural Networks

12/15/2022
by Torben Krieger, et al.

Compressing neural network architectures is important to allow the deployment of models to embedded or mobile devices, and pruning and quantization are the major approaches used to compress neural networks today. Both methods benefit when compression parameters are selected specifically for each layer. Finding good combinations of compression parameters, so-called compression policies, is hard because the problem spans an exponentially large search space. Effective compression policies consider the influence of the specific hardware architecture on the compression methods used. We propose an algorithmic framework called Galen that searches for such policies using reinforcement learning, combining pruning and quantization and thus providing automatic compression for neural networks. Contrary to other approaches, we use inference latency measured on the target hardware device as an optimization goal. The framework thereby supports the compression of models specific to a given hardware target. We validate our approach using three different reinforcement learning agents for pruning, quantization, and joint pruning and quantization. Besides proving the functionality of our approach, we were able to compress a ResNet18 for CIFAR-10, on an embedded ARM processor, down to 20% of the original latency without significant loss of accuracy. Moreover, we demonstrate that a joint search and compression using pruning and quantization is superior to an individual search for policies using a single compression method.
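The search loop the abstract describes can be read as a propose-compress-measure-score cycle. Below is a minimal, hypothetical Python sketch of that cycle, not the authors' implementation: Galen's reinforcement-learning agents are replaced by a random-search stand-in, and the action space (PRUNING_RATIOS, BIT_WIDTHS), the reward shape, and the evaluate callback (assumed to apply a policy and time inference on the target device) are all illustrative assumptions.

```python
import random

# Hypothetical per-layer action space: a pruning ratio and a weight bit-width.
# Galen's actual action space and agents are not reproduced here.
PRUNING_RATIOS = [0.0, 0.25, 0.5, 0.75]
BIT_WIDTHS = [2, 4, 8]


def propose_policy(num_layers):
    """Sample one (pruning ratio, bit-width) pair per layer.

    A real RL agent would condition these choices on layer features and past
    rewards; random sampling stands in for that here.
    """
    return [(random.choice(PRUNING_RATIOS), random.choice(BIT_WIDTHS))
            for _ in range(num_layers)]


def reward(accuracy, latency_ms, target_ms):
    """Trade accuracy against latency measured on the target device.

    Policies slower than the latency target are penalized proportionally.
    """
    overshoot = max(0.0, latency_ms - target_ms) / target_ms
    return accuracy - overshoot


def search_policy(model, evaluate, num_layers, episodes=100, target_ms=10.0):
    """Propose-compress-measure-score loop over candidate policies.

    `evaluate(model, policy)` is an assumed callback that applies the policy
    (prune + quantize), optionally fine-tunes, and returns (accuracy,
    latency in milliseconds timed on the target hardware).
    """
    best_policy, best_reward = None, float("-inf")
    for _ in range(episodes):
        policy = propose_policy(num_layers)
        accuracy, latency_ms = evaluate(model, policy)
        r = reward(accuracy, latency_ms, target_ms)
        if r > best_reward:
            best_policy, best_reward = policy, r
    return best_policy
```

The design point mirrored here is the one the abstract emphasizes: the score consumes inference latency measured on the target hardware rather than a proxy such as FLOPs or parameter count, so the resulting policy is specific to that device.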

