End-to-End Training Approaches for Discriminative Segmental Models

10/21/2016
by Hao Tang, et al.

Recent work on discriminative segmental models has shown that they can achieve competitive speech recognition performance, using features based on deep neural frame classifiers. However, segmental models can be more challenging to train than standard frame-based approaches. While some segmental models have been successfully trained end to end, there is a lack of understanding of their training under different settings and with different losses. We investigate a model class based on recent successful approaches, consisting of a linear model that combines segmental features based on an LSTM frame classifier. Similarly to hybrid HMM-neural network models, segmental models of this class can be trained in two stages (frame classifier training followed by linear segmental model weight training), end to end (joint training of both frame classifier and linear weights), or with end-to-end fine-tuning after two-stage training. We study segmental models trained end to end with hinge loss, log loss, latent hinge loss, and marginal log loss. We consider several losses for the case where training alignments are available as well as where they are not. We find that in general, marginal log loss provides the most consistent strong performance without requiring ground-truth alignments. We also find that training with dropout is very important in obtaining good performance with end-to-end training. Finally, the best results are typically obtained by a combination of two-stage training and fine-tuning.
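To make the alignment-free case concrete, the following is a minimal sketch of how a marginal log loss could be computed for a segmental model with dynamic programming. It is an illustration under stated assumptions, not the authors' implementation: the `score(label, start, end)` callable, the `max_len` limit on segment length, and all function names are hypothetical. In the model class described above, the segment score would come from the linear combination of segmental features built on the LSTM frame classifier; here it is simply any function returning a real number.

```python
import math


def logsumexp(values):
    """Stable log(sum(exp(v))) over a list; returns -inf if nothing is finite."""
    finite = [v for v in values if v != -math.inf]
    if not finite:
        return -math.inf
    m = max(finite)
    return m + math.log(sum(math.exp(v - m) for v in finite))


def log_partition(score, num_frames, labels, max_len):
    """Log of the summed exp-scores over all labeled segmentations.

    score(label, start, end) is a hypothetical segment scorer for the
    frames [start, end); a path's score is the sum of its segment scores.
    """
    # alpha[t]: log-sum over all labeled segmentations of frames [0, t)
    alpha = [-math.inf] * (num_frames + 1)
    alpha[0] = 0.0
    for t in range(1, num_frames + 1):
        terms = []
        for s in range(max(0, t - max_len), t):
            for y in labels:
                terms.append(alpha[s] + score(y, s, t))
        alpha[t] = logsumexp(terms)
    return alpha[num_frames]


def log_constrained(score, num_frames, target, max_len):
    """Same sum, restricted to segmentations whose labels spell `target`."""
    k_max = len(target)
    # beta[k][t]: log-sum over ways to cover frames [0, t) with target[:k]
    beta = [[-math.inf] * (num_frames + 1) for _ in range(k_max + 1)]
    beta[0][0] = 0.0
    for k in range(1, k_max + 1):
        for t in range(1, num_frames + 1):
            terms = [
                beta[k - 1][s] + score(target[k - 1], s, t)
                for s in range(max(0, t - max_len), t)
            ]
            beta[k][t] = logsumexp(terms)
    return beta[k_max][num_frames]


def marginal_log_loss(score, num_frames, labels, target, max_len):
    """Negative log-probability of `target`, marginalized over alignments."""
    return (log_partition(score, num_frames, labels, max_len)
            - log_constrained(score, num_frames, target, max_len))


def uniform_score(label, start, end):
    """Toy scorer (assumption for the demo): every segment scores zero."""
    return 0.0


demo_loss = marginal_log_loss(uniform_score, 3, ["a", "b"], ["a", "b"], 3)
```

Because the loss sums over every segmentation consistent with the label sequence, it needs only the labels and no ground-truth segment boundaries. In end-to-end training, the gradient of this quantity would be propagated through the segment scores into the frame classifier; in two-stage training, only the linear weights inside the scorer would be updated.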

