Overcoming Domain Mismatch in Low Resource Sequence-to-Sequence ASR Models using Hybrid Generated Pseudotranscripts

06/14/2021
by Chak-Fai Li, et al.

Sequence-to-sequence (seq2seq) models are competitive with hybrid models for automatic speech recognition (ASR) tasks when large amounts of training data are available. However, data sparsity and domain adaptation are more problematic for seq2seq models than for their hybrid counterparts. We examine corpora of five languages from the IARPA MATERIAL program, where the transcribed data is conversational telephone speech (CTS) and the evaluation data is broadcast news (BN). We show that there is a sizable initial gap between hybrid and seq2seq models under this data condition, and that the hybrid model is able to improve further through the use of additional language model (LM) data. We use an additional set of untranscribed data, primarily in the BN domain, for semi-supervised training: a seed model trained on the transcribed data generates hypothesized transcripts (pseudotranscripts) for the unlabeled, domain-matched data, which are then used for further training. By using a hybrid model with an expanded language model for pseudotranscription, we improve our seq2seq model from an average word error rate (WER) of 66.7% across the five languages to 29.0%. Even at this improved operating point, hybrid models are still able to use additional LM data to maintain an advantage.
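The semi-supervised training loop described in the abstract can be summarized in a few steps. The sketch below is a minimal illustration of that pseudotranscription procedure, not the authors' actual pipeline; the interface names (ASRModel, train, transcribe) and the function semisupervised_train are hypothetical placeholders.

```python
from typing import Iterable, List, Protocol, Tuple

class ASRModel(Protocol):
    """Hypothetical interface standing in for both the hybrid and seq2seq systems."""
    def train(self, data: Iterable[Tuple[str, str]]) -> None: ...
    def transcribe(self, audio_path: str) -> str: ...

def semisupervised_train(
    hybrid_seed: ASRModel,             # hybrid seed model with an expanded LM
    seq2seq: ASRModel,                 # target seq2seq model
    transcribed: List[Tuple[str, str]],  # (audio, transcript) pairs, CTS domain
    untranscribed_bn: List[str],       # unlabeled audio paths, mostly BN domain
) -> ASRModel:
    # 1. Train the seed model on the transcribed CTS data.
    hybrid_seed.train(transcribed)

    # 2. Pseudotranscription: the seed model hypothesizes transcripts
    #    for the unlabeled, domain-matched BN audio.
    pseudo = [(audio, hybrid_seed.transcribe(audio)) for audio in untranscribed_bn]

    # 3. Further training: the seq2seq model learns from the real
    #    transcripts plus the pseudotranscripts, narrowing the
    #    CTS-to-BN domain gap.
    seq2seq.train(list(transcribed) + pseudo)
    return seq2seq
```

The key design point, per the abstract, is step 2: using the hybrid model (rather than a seq2seq seed) to generate the pseudotranscripts, since its LM can be expanded with additional BN-domain text.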

