Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing

03/29/2022
by Xiaodong Cui, et al.

We introduce two techniques, length perturbation and n-best based label smoothing, to improve the generalization of deep neural network (DNN) acoustic models for automatic speech recognition (ASR). Length perturbation is a data augmentation algorithm that randomly drops and inserts frames of an utterance to alter the length of the speech feature sequence. N-best based label smoothing randomly injects noise into the ground truth labels during training to avoid overfitting, where the noisy labels are generated from n-best hypotheses. We evaluate both techniques extensively on the 300-hour Switchboard (SWB300) dataset and an in-house 500-hour Japanese (JPN500) dataset using recurrent neural network transducer (RNNT) acoustic models for ASR. We show that each technique individually improves the generalization of RNNT models and that the two can also be complementary. In particular, they yield good improvements over a strong SWB300 baseline and give state-of-the-art performance on SWB300 using RNNT models.
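
To make the two ideas concrete, here is a minimal Python sketch of what they might look like. It is an illustration under stated assumptions, not the authors' implementation: the function names, the parameters `drop_prob`, `insert_prob`, and `smooth_prob`, the choice to insert a duplicate of the preceding frame, and the uniform sampling over the n-best list are all hypothetical. The abstract only says that frames are randomly dropped and inserted and that noisy labels come from n-best hypotheses.

```python
import random


def length_perturb(frames, drop_prob=0.1, insert_prob=0.1, rng=None):
    """Randomly drop and insert frames to alter the sequence length.

    Assumptions (not from the abstract): each frame is dropped
    independently with probability drop_prob, and a copy of each kept
    frame is inserted after it with probability insert_prob.
    """
    if rng is None:
        rng = random.Random()
    out = []
    for frame in frames:
        if rng.random() < drop_prob:
            continue  # drop this frame
        out.append(frame)
        if rng.random() < insert_prob:
            out.append(list(frame))  # insert a duplicate (assumed filler)
    if not out and frames:
        out.append(frames[0])  # keep at least one frame per utterance
    return out


def nbest_label_smooth(reference, nbest, smooth_prob=0.1, rng=None):
    """Occasionally replace the ground-truth labels with an n-best hypothesis.

    Assumption (not from the abstract): with probability smooth_prob the
    whole reference is swapped for a hypothesis drawn uniformly from nbest.
    """
    if rng is None:
        rng = random.Random()
    if nbest and rng.random() < smooth_prob:
        return list(rng.choice(nbest))
    return list(reference)


if __name__ == "__main__":
    rng = random.Random(0)
    feats = [[float(t), float(t) + 0.5] for t in range(10)]  # 10 toy frames
    perturbed = length_perturb(feats, rng=rng)
    print(len(feats), "->", len(perturbed), "frames")

    ref = ["hello", "world"]
    hyps = [["hello", "word"], ["hollow", "world"]]
    print(nbest_label_smooth(ref, hyps, smooth_prob=0.5, rng=rng))
```

In a training loop, both functions would be applied per utterance and per epoch, so the model sees a different length and, occasionally, a different label sequence each time it encounters the same recording.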

