On the Effectiveness of Neural Text Generation based Data Augmentation for Recognition of Morphologically Rich Speech

06/09/2020
by Balázs Tarján, et al.

Advanced neural network models have penetrated Automatic Speech Recognition (ASR) in recent years; in language modeling, however, many systems still rely partly or entirely on traditional Back-off N-gram Language Models (BNLM). The reasons for this are the high cost and complexity of training and using neural language models, which are usually feasible only by adding a second decoding pass (rescoring). In our recent work we significantly improved the online performance of a conversational speech transcription system by transferring knowledge from a Recurrent Neural Network Language Model (RNNLM) to the single-pass BNLM with text generation based data augmentation. In the present paper we analyze the amount of transferable knowledge and demonstrate that the neural augmented LM (RNN-BNLM) can help to capture almost 50% of the knowledge of the RNNLM while dropping the second decoding pass and making the system real-time capable. We also systematically compare word and subword LMs and show that subword-based neural text augmentation can be especially beneficial in under-resourced conditions. In addition, we show that by using the RNN-BNLM in the first pass followed by a neural second pass, offline ASR results can be further improved significantly.
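The core of the augmentation idea described above is to sample synthetic text from a trained RNNLM and add it to the corpus used to build the back-off n-gram model. The sketch below is illustrative only and is not the paper's implementation: the toy corpus, vocabulary, model size, training loop, and sampling settings are all placeholder assumptions, and the augmented sentences would subsequently be fed to an n-gram toolkit of choice.

```python
# Minimal sketch of RNNLM text generation for n-gram data augmentation.
# All data and hyperparameters below are illustrative placeholders.
import torch
import torch.nn as nn

corpus = [
    "jó napot kívánok",
    "köszönöm szépen a segítséget",
    "viszont hallásra",
]  # stand-in for the conversational training text

BOS, EOS = "<s>", "</s>"
sents = [[BOS] + line.split() + [EOS] for line in corpus]
vocab = sorted({w for s in sents for w in s})
stoi = {w: i for i, w in enumerate(vocab)}
itos = {i: w for w, i in stoi.items()}

class RNNLM(nn.Module):
    def __init__(self, vocab_size, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, vocab_size)

    def forward(self, x, state=None):
        out, state = self.lstm(self.embed(x), state)
        return self.proj(out), state

model = RNNLM(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Train the RNNLM on the original corpus (next-word prediction).
for epoch in range(200):
    for s in sents:
        ids = torch.tensor([[stoi[w] for w in s]])
        logits, _ = model(ids[:, :-1])
        loss = loss_fn(logits.squeeze(0), ids[0, 1:])
        opt.zero_grad()
        loss.backward()
        opt.step()

# Sample synthetic sentences; appended to the original text, they form the
# augmented corpus from which the back-off n-gram LM (BNLM) is then built.
@torch.no_grad()
def sample_sentence(max_len=20, temperature=1.0):
    model.eval()
    ids = [stoi[BOS]]
    state = None
    for _ in range(max_len):
        x = torch.tensor([[ids[-1]]])
        logits, state = model(x, state)
        probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
        nxt = torch.multinomial(probs, 1).item()
        if nxt == stoi[EOS]:
            break
        ids.append(nxt)
    return " ".join(itos[i] for i in ids[1:])

augmented = [sample_sentence() for _ in range(5)]
print("\n".join(augmented))  # lines to add to the BNLM training text
```

For a subword variant, the same procedure would be applied to a corpus segmented into subword units (e.g., with a statistical segmenter) before RNNLM training and sampling.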


