BAMSProd: A Step towards Generalizing the Adaptive Optimization Methods to Deep Binary Model

09/29/2020
by Junjie Liu, et al.

Recent methods have significantly reduced the performance degradation of Binary Neural Networks (BNNs), but guaranteeing effective and efficient training of BNNs remains an unsolved problem. The main reason is that the estimated gradients produced by the Straight-Through Estimator (STE) mismatch the gradients of the real derivatives. In this paper, we provide an explicit convex optimization example in which training BNNs with traditional adaptive optimization methods still faces the risk of non-convergence, and we identify that constraining the range of gradients is critical for optimizing a deep binary model and avoiding highly suboptimal solutions. To solve the above issues, we propose the BAMSProd algorithm, built on the key observation that the convergence behavior of optimizing a deep binary model is strongly related to its quantization errors. In brief, it employs an adaptive range constraint, driven by an error measurement, to smooth the gradient transition, while following the exponential moving-average strategy from AMSGrad to avoid error accumulation during optimization. The experiments verify the corollary of the theoretical convergence analysis, and further demonstrate that our optimization method can speed up convergence by about 1.2x and boost the performance of BNNs by about 3.7% over a specific binary optimizer, even on a highly non-convex optimization problem.
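Since the abstract describes the optimizer's two ingredients (a quantization-error-driven range constraint on the gradients, plus AMSGrad-style moment estimation with a running max on the second moment), a minimal PyTorch-style sketch of such an update may help. The specific clipping rule, the hyperparameter names (e.g. clip_scale), and the helper bamsprod_step are assumptions for illustration, not the paper's exact formulation.

```python
# A minimal sketch of the kind of update the abstract describes: AMSGrad-style
# moment estimates with a max-normalized second moment, plus a gradient range
# constraint scaled by a quantization-error measurement. The clipping rule and
# hyperparameter names are assumptions, not the paper's exact algorithm.
import torch

def bamsprod_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
                  eps=1e-8, clip_scale=1.0):
    # Quantization error between the latent real-valued weights and their
    # binarized (sign) values; used to set an adaptive range constraint.
    q_err = (param - param.sign()).abs().mean().item()

    # Assumed range constraint: clip gradients to a band proportional to the
    # quantization error, smoothing the gradient transition as weights settle.
    bound = clip_scale * q_err
    grad = grad.clamp(-bound, bound)

    # AMSGrad-style exponential moving averages, with a running max on the
    # second moment so the effective step size is non-increasing and
    # errors do not accumulate during optimization.
    state['m'] = beta1 * state['m'] + (1 - beta1) * grad
    state['v'] = beta2 * state['v'] + (1 - beta2) * grad * grad
    state['v_hat'] = torch.maximum(state['v_hat'], state['v'])

    param.data -= lr * state['m'] / (state['v_hat'].sqrt() + eps)

# Usage on a toy latent weight vector; in a real BNN the forward pass would
# binarize these latent weights with sign() and backpropagate through STE.
w = torch.randn(10)
g = torch.randn(10)
state = {'m': torch.zeros_like(w), 'v': torch.zeros_like(w),
         'v_hat': torch.zeros_like(w)}
bamsprod_step(w, g, state)
```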

