Why does CTC result in peaky behavior?

05/31/2021
by Albert Zeyer, et al.

The peaky behavior of CTC models is well known experimentally. However, it is not understood why peaky behavior occurs, nor whether it is a good property. We provide a formal analysis of the peaky behavior and of the gradient descent convergence properties of the CTC loss and related training criteria. Our analysis gives a deep understanding of why peaky behavior occurs and when it is suboptimal. On a simple example which should be trivial to learn for any model, we prove that a feed-forward neural network trained with CTC from uniform initialization converges towards peaky behavior with a 100% error rate. Our analysis further explains why CTC only works well together with the blank label. We further demonstrate that peaky behavior does not occur on other related losses, including a label prior model, and that this improves convergence.
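
To make the CTC training criterion concrete, here is a minimal sketch of the standard CTC forward-backward computation in plain Python. It is not the authors' code. It assumes blank is label index 0, that each frame's output distribution comes from its own softmax (the logit gradient y - γ relies on this), and it works in probability space rather than log space, so it only suits short toy sequences. The names `ctc_forward_backward` and `logit_grad` are hypothetical.

```python
BLANK = 0  # assumption: label index 0 is the CTC blank


def extend_with_blanks(labels):
    """Interleave blanks: [a, b] -> [blank, a, blank, b, blank]."""
    ext = [BLANK]
    for lab in labels:
        ext += [lab, BLANK]
    return ext


def ctc_forward_backward(probs, labels):
    """probs: T x K per-frame label probabilities (each row sums to 1).
    labels: target sequence without blanks.
    Returns p(labels | x) and per-frame label posteriors gamma[t][k]."""
    T, K = len(probs), len(probs[0])
    ext = extend_with_blanks(labels)
    S = len(ext)

    def can_skip(s):
        # Jumping s-2 -> s is allowed only onto a non-blank label that
        # differs from the label two states back (repeats need a blank).
        return s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]

    # alpha[t][s]: total prob of path prefixes in state s at frame t (emission at t included)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][ext[0]]
    if S > 1:
        alpha[0][1] = probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]
            if s >= 1:
                a += alpha[t - 1][s - 1]
            if can_skip(s):
                a += alpha[t - 1][s - 2]
            alpha[t][s] = a * probs[t][ext[s]]

    # beta[t][s]: total prob of path suffixes after frame t, given state s at t (emission at t excluded)
    beta = [[0.0] * S for _ in range(T)]
    beta[T - 1][S - 1] = 1.0
    if S > 1:
        beta[T - 1][S - 2] = 1.0
    for t in range(T - 2, -1, -1):
        for s in range(S):
            b = beta[t + 1][s] * probs[t + 1][ext[s]]
            if s + 1 < S:
                b += beta[t + 1][s + 1] * probs[t + 1][ext[s + 1]]
            if s + 2 < S and can_skip(s + 2):
                b += beta[t + 1][s + 2] * probs[t + 1][ext[s + 2]]
            beta[t][s] = b

    # Valid paths end in the last label or the trailing blank
    total = alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)
    gamma = [[0.0] * K for _ in range(T)]
    for t in range(T):
        for s in range(S):
            gamma[t][ext[s]] += alpha[t][s] * beta[t][s] / total
    return total, gamma


def logit_grad(probs, gamma):
    """Gradient of -log p(labels | x) w.r.t. per-frame softmax logits: y - gamma."""
    return [[y - g for y, g in zip(py, pg)] for py, pg in zip(probs, gamma)]


# Toy case: target "a" (index 1), T = 5 frames, uniform outputs over {blank, a}
T, K = 5, 2
uniform = [[1.0 / K] * K for _ in range(T)]
p, gamma = ctc_forward_backward(uniform, [1])
print(p)  # 15 of the 32 frame-wise paths collapse to "a"
print(logit_grad(uniform, gamma)[0])  # blank entry is negative at the first frame
```

In this toy case the first and last frames have a blank posterior of 2/3, so the blank entry of `logit_grad` is negative there and a gradient-descent step raises the blank probability on those frames. This only illustrates the direction of the gradient at uniform initialization; the paper's convergence result is a separate, formal argument.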
