Sensitivity-Aware Mixed-Precision Quantization and Width Optimization of Deep Neural Networks Through Cluster-Based Tree-Structured Parzen Estimation

08/12/2023
by   Seyedarmin Azizi, et al.

As the complexity and computational demands of deep learning models rise, the need for effective optimization methods for neural network design becomes paramount. This work introduces a search mechanism for automatically selecting the best bit-width and layer-width for individual neural network layers, leading to a marked enhancement in deep neural network efficiency. The search domain is strategically reduced by leveraging Hessian-based pruning, ensuring the removal of non-crucial parameters. We then build surrogate models for favorable and unfavorable outcomes using a cluster-based tree-structured Parzen estimator. This strategy allows for a streamlined exploration of architectural possibilities and swift pinpointing of top-performing designs. Through rigorous testing on well-known datasets, our method proves its distinct advantage over existing methods. Compared to leading compression strategies, our approach records an impressive 20… Additionally, our method boasts a 12x reduction in search time relative to the best search-focused strategies currently available. As a result, our proposed method represents a leap forward in neural network design optimization, paving the way for quick model design and implementation in settings with limited resources, thereby propelling the potential of scalable deep learning solutions.
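To make the search mechanism more concrete, the sketch below illustrates the two ingredients named in the abstract, under stated assumptions: a Hessian-based sensitivity score per layer is used to prune each layer's candidate bit-widths before the search, and a tree-structured Parzen estimator then searches the remaining space of per-layer bit-widths and width multipliers. This is a minimal illustration, not the authors' implementation: the layer names, toy sensitivity values, pruning threshold, and proxy objective are hypothetical, and the cluster-based grouping of trials described in the paper is not reproduced here; standard TPE from the hyperopt library stands in for it.

# Minimal sketch, assuming toy layer sensitivities and a placeholder objective.
import numpy as np
from hyperopt import fmin, tpe, hp, Trials

# --- 1) Hessian-based pruning of the search space --------------------------
# Assume a Hutchinson-style estimate of the Hessian trace per layer has
# already been computed (toy values below). Highly sensitive layers keep only
# the high-precision option, which shrinks the search domain.
hessian_trace = {"conv1": 9.1, "conv2": 0.7, "conv3": 0.3, "fc": 4.2}  # hypothetical

def candidate_bits(sensitivity, threshold=1.0):
    # Sensitive layers are restricted to 8-bit; insensitive layers may go lower.
    return [8] if sensitivity > threshold else [2, 4, 8]

search_space = {
    f"bits_{name}": hp.choice(f"bits_{name}", candidate_bits(s))
    for name, s in hessian_trace.items()
}
search_space.update({
    f"width_{name}": hp.choice(f"width_{name}", [0.5, 0.75, 1.0])
    for name in hessian_trace
})

# --- 2) TPE search over the pruned space ------------------------------------
def objective(cfg):
    # Placeholder objective: in practice this would quantize and width-scale
    # the network, evaluate it, and return (error + lambda * hardware cost).
    acc_proxy = sum(np.log2(cfg[f"bits_{n}"]) * cfg[f"width_{n}"] for n in hessian_trace)
    cost_proxy = sum(cfg[f"bits_{n}"] * cfg[f"width_{n}"] for n in hessian_trace)
    return -acc_proxy + 0.1 * cost_proxy  # lower is better

trials = Trials()
best = fmin(fn=objective, space=search_space, algo=tpe.suggest,
            max_evals=50, trials=trials)
print("best configuration (as hp.choice indices):", best)

The design intent this sketch tries to capture is that pruning insensitive layers' options before the search shrinks the Cartesian product of per-layer choices, which is what makes a sample-efficient surrogate search over the remaining configurations tractable.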


