Noisy Softmax: Improving the Generalization Ability of DCNN via Postponing the Early Softmax Saturation

08/12/2017
by Binghui Chen, et al.

Over the past few years, softmax and SGD have become, respectively, a commonly used component and the default training strategy in CNN frameworks. However, when CNNs are optimized with SGD, the saturation behavior of softmax gives an illusion of training well and is therefore often overlooked. In this paper, we first emphasize that the early saturation of softmax impedes the exploration of SGD, which is sometimes a reason for the model converging to a bad local minimum. We then propose Noisy Softmax, which mitigates this early saturation by injecting annealed noise into softmax during each iteration. This noise injection postpones the early saturation and keeps gradients propagating continuously, which encourages the SGD solver to be more exploratory and helps it find a better local minimum. We empirically verify the benefit of early softmax desaturation and show that our method improves the generalization ability of CNN models by acting as a regularizer. We find experimentally that this early desaturation helps optimization in many tasks, yielding state-of-the-art or competitive results on several popular benchmark datasets.
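The abstract does not give the noise formula, so the following is only a minimal NumPy sketch of the general idea, not the authors' formulation. It assumes that a non-negative noise term is subtracted from the ground-truth logit before softmax, and that a hypothetical `noise_scale` parameter stands in for whatever annealing schedule the paper uses. It illustrates why lowering the true-class logit keeps the cross-entropy gradient from vanishing on already-confident samples.

```python
import numpy as np


def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def noisy_logits(logits, labels, noise_scale, rng):
    # Assumption (not from the source): the noise is non-negative and is
    # subtracted from the true-class logit only. `noise_scale` is a
    # hypothetical knob standing in for the paper's annealing.
    noise = noise_scale * np.abs(rng.standard_normal(len(labels)))
    out = logits.copy()
    out[np.arange(len(labels)), labels] -= noise
    return out


def true_class_grad(logits, labels):
    # For cross-entropy over softmax, d(loss)/d(true-class logit) = p_y - 1.
    probs = softmax(logits)
    return probs[np.arange(len(labels)), labels] - 1.0


rng = np.random.default_rng(0)
# Two samples whose true class already dominates, i.e. softmax is saturated.
logits = np.array([[8.0, 1.0, 0.5],
                   [0.2, 6.0, 0.1]])
labels = np.array([0, 1])

clean_grad = true_class_grad(logits, labels)
perturbed = noisy_logits(logits, labels, noise_scale=3.0, rng=rng)
noisy_grad = true_class_grad(perturbed, labels)

print("clean |grad|:", np.abs(clean_grad))
print("noisy |grad|:", np.abs(noisy_grad))
```

Under these assumptions the noisy gradient magnitude is never smaller than the clean one, because the true-class probability can only decrease. Shrinking `noise_scale` toward zero over training would recover the ordinary softmax.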


research
06/24/2020

Dynamic of Stochastic Gradient Descent with State-Dependent Noise

Stochastic gradient descent (SGD) and its variants are mainstream method...
research
02/17/2018

An Alternative View: When Does SGD Escape Local Minima?

Stochastic gradient descent (SGD) is widely used in machine learning. Al...
research
02/10/2021

On Minibatch Noise: Discrete-Time SGD, Overparametrization, and Bayes

The noise in stochastic gradient descent (SGD), caused by minibatch samp...
research
05/10/2018

Ensemble Soft-Margin Softmax Loss for Image Classification

Softmax loss is arguably one of the most popular losses to train CNN mod...
research
12/23/2020

Vehicle Re-identification Based on Dual Distance Center Loss

Recently, deep learning has been widely used in the field of vehicle re-...
research
06/06/2019

Bad Global Minima Exist and SGD Can Reach Them

Several recent works have aimed to explain why severely overparameterize...
research
04/10/2020

Efficient Sampled Softmax for Tensorflow

This short paper discusses an efficient implementation of sampled softma...
