Straightening Out the Straight-Through Estimator: Overcoming Optimization Challenges in Vector Quantized Networks

05/15/2023
by   Minyoung Huh, et al.

This work examines the challenges of training neural networks that use vector quantization with straight-through estimation. We find that a primary cause of training instability is the discrepancy between the model embedding and the code-vector distribution. We identify the factors that contribute to this issue, including codebook gradient sparsity and the asymmetric nature of the commitment loss, which leads to misaligned code-vector assignments. To address this issue, we propose an affine re-parameterization of the code vectors. Additionally, we introduce an alternating optimization scheme to reduce the gradient error introduced by straight-through estimation. Moreover, we propose an improvement to the commitment loss that encourages better alignment between the codebook representation and the model embedding. These optimization methods improve the mathematical approximation of straight-through estimation and, ultimately, model performance. We demonstrate the effectiveness of our methods on several common model architectures, such as AlexNet, ResNet, and ViT, across tasks including image classification and generative modeling.
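To make the setting concrete, the following is a minimal NumPy sketch of the vector-quantization step the abstract refers to: an encoder output is snapped to its nearest code vector, the straight-through estimator copies gradients past the non-differentiable lookup, and a commitment loss ties the embedding and codebook together. The function name `nearest_code`, the codebook values, and the commitment weight `beta` are illustrative assumptions, not details from the paper.

```python
import numpy as np

def nearest_code(z, codebook):
    # Hypothetical helper for illustration: return the index and value of
    # the code vector closest (in squared L2 distance) to embedding z.
    # z: shape (d,); codebook: shape (K, d).
    dists = np.sum((codebook - z) ** 2, axis=1)
    k = int(np.argmin(dists))
    return k, codebook[k]

# Straight-through estimator (STE): in an autograd framework one writes
#     z_q = z + stop_gradient(quantize(z) - z)
# so the forward pass uses the quantized vector z_q, while the backward
# pass copies the gradient of z_q to z unchanged, skipping the argmin.

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])  # toy 2-entry codebook
z = np.array([0.9, 0.8])                       # toy encoder output
k, z_q = nearest_code(z, codebook)

beta = 0.25  # commitment weight (assumed value for illustration)
# Commitment term: pulls the encoder output toward its assigned code
# (z_q treated as constant under stop-gradient).
commit_loss = beta * np.sum((z - z_q) ** 2)
# Codebook term: pulls the code vector toward the encoder output
# (z treated as constant under stop-gradient).
codebook_loss = np.sum((z_q - z) ** 2)
```

Because the argmin gives each code vector a gradient only when it is selected, the codebook gradients are sparse, and the two asymmetric loss terms can drift apart; these are the failure modes the paper's affine re-parameterization and alternating optimization are designed to mitigate.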
