Probability-Dependent Gradient Decay in Large Margin Softmax

10/31/2022
by Siyuan Zhang, et al.

In the past few years, Softmax has become a standard component of neural network classifiers. In this paper, a gradient decay hyperparameter is introduced into Softmax to control the probability-dependent gradient decay rate during training. Through theoretical analysis and empirical results for a variety of model architectures trained on MNIST, CIFAR-10/100 and SVHN, we find that generalization performance depends significantly on how the gradient decays as the confidence probability rises, i.e., whether the gradient decreases convexly or concavely as the sample probability increases. Moreover, optimization with a small gradient decay follows a curriculum-learning-like sequence in which hard samples come into focus only after easy samples have been fitted with sufficient confidence, while well-separated samples still receive a relatively large gradient that continues to reduce intra-class distance. Based on this analysis, we provide evidence that large margin Softmax affects the local Lipschitz constraint of the loss function by regulating the probability-dependent gradient decay rate. This paper thus offers a new perspective on the relationship among large margin Softmax, the local Lipschitz constraint and curriculum learning through the lens of the gradient decay rate. In addition, we propose a warm-up strategy that dynamically adjusts the Softmax loss during training, increasing the gradient decay rate from an initially small value to speed up convergence.
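To make the probability-dependent gradient decay concrete: for plain Softmax cross-entropy, the gradient of the loss with respect to the target logit has magnitude 1 - p_y, so it decays linearly as the confidence probability p_y rises. The sketch below only illustrates this idea under an assumed power-law family (1 - p_y)^alpha; the hyperparameter alpha, the function names and the warm-up schedule are hypothetical stand-ins, not the paper's exact large-margin formulation.

```python
# Minimal sketch of probability-dependent gradient decay, assuming an
# illustrative power-law family |dL/dz_target| = (1 - p_target) ** alpha.
# alpha, target_grad_magnitude and alpha_schedule are hypothetical names,
# not taken from the paper.
import numpy as np

def softmax(z):
    z = z - z.max()              # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def target_grad_magnitude(logits, target, alpha=1.0):
    """Gradient magnitude w.r.t. the target logit; alpha reshapes how fast
    it decays as the confidence probability rises (alpha=1 is plain
    Softmax cross-entropy, alpha>1 decays convexly, alpha<1 concavely)."""
    p = softmax(logits)[target]
    return (1.0 - p) ** alpha

def alpha_schedule(epoch, total_epochs, alpha_min=0.5, alpha_max=2.0):
    """Warm-up sketch: start from a small decay rate so well-separated
    samples keep a large gradient, then increase it over training."""
    return alpha_min + (alpha_max - alpha_min) * epoch / max(total_epochs - 1, 1)

# A fairly confident (easy) sample: the gradient it receives at the same
# confidence level differs strongly with the decay rate.
logits = np.array([4.0, 1.0, 0.5])
for epoch in range(0, 10, 3):
    a = alpha_schedule(epoch, total_epochs=10)
    print(f"epoch {epoch}: alpha={a:.2f}, "
          f"|grad|={target_grad_magnitude(logits, target=0, alpha=a):.4f}")
```

Under this illustrative family, a small alpha (slow decay) keeps pulling already well-separated samples toward their class center, while a large alpha quickly suppresses the gradient of confident samples and leaves hard samples in the spotlight, matching the curriculum-learning reading given in the abstract.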
