Loss-aware Binarization of Deep Networks

11/05/2016
by Lu Hou, et al.

Deep neural network models, though very powerful and highly successful, are computationally expensive in terms of space and time. Recently, there have been a number of attempts at binarizing the network weights and activations. This greatly reduces the network size and replaces the underlying multiplications with additions or even XNOR bit operations. However, existing binarization schemes are based on simple matrix approximation and ignore the effect of binarization on the loss. In this paper, we propose a proximal Newton algorithm with diagonal Hessian approximation that directly minimizes the loss w.r.t. the binarized weights. The underlying proximal step has an efficient closed-form solution, and the second-order information can be obtained efficiently from the second moments already computed by the Adam optimizer. Experiments on both feedforward and recurrent networks show that the proposed loss-aware binarization algorithm outperforms existing binarization schemes, and is also more robust for wide and deep networks.
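The closed-form proximal step described in the abstract can be pictured as follows: for each layer, the binarized weights take the form alpha * sign(w), where the scaling alpha is a curvature-weighted average of the weight magnitudes and the diagonal curvature estimate comes from Adam's second moments. The snippet below is a minimal NumPy sketch of this idea; the function name, the use of sqrt(v) as the diagonal Hessian proxy, and the epsilon term are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def loss_aware_binarize(w, v, eps=1e-8):
    """Sketch of a loss-aware binarization step for one layer.

    w : full-precision weights (any shape)
    v : Adam second-moment estimate for the same weights (same shape),
        used here as a diagonal curvature proxy d = sqrt(v) + eps
        (an assumption for illustration, not the paper's exact update)
    Returns binarized weights alpha * sign(w).
    """
    d = np.sqrt(v) + eps                       # diagonal curvature estimate
    alpha = np.sum(d * np.abs(w)) / np.sum(d)  # curvature-weighted scaling factor
    b = np.where(w >= 0, 1.0, -1.0)            # binary weights, sign(w)
    return alpha * b

# Example: binarize a small weight matrix with a dummy second-moment estimate
w = np.random.randn(4, 3) * 0.1
v = np.random.rand(4, 3) * 1e-3
w_bin = loss_aware_binarize(w, v)
```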


Related research

02/23/2018 · Loss-aware Weight Quantization of Deep Networks
The huge size of deep networks hinders their use in small computing devi...

02/05/2019 · A Modular Approach to Block-diagonal Hessian Approximations for Second-order Optimization Methods
We propose a modular extension of the backpropagation algorithm for comp...

12/18/2019 · Adaptive Loss-aware Quantization for Multi-bit Networks
We investigate the compression of deep neural networks by quantizing the...

02/06/2019 · Negative eigenvalues of the Hessian in deep neural networks
The loss function of deep networks is known to be non-convex but the pre...

06/04/2021 · ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure
Curvature in form of the Hessian or its generalized Gauss-Newton (GGN) a...

05/23/2023 · Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
Given the massive cost of language model pre-training, a non-trivial imp...

10/08/2019 · Bregman Proximal Framework for Deep Linear Neural Networks
A typical assumption for the analysis of first order optimization method...
