Improving N-gram Language Models with Pre-trained Deep Transformer

11/22/2019
by Yiren Wang, et al.

Although n-gram language models (LMs) have been outperformed by state-of-the-art neural LMs, they are still widely used in speech recognition because of their high inference efficiency. In this paper, we demonstrate that n-gram LMs can be improved by neural LMs through a text-generation-based data augmentation method. In contrast to previous approaches, we employ large-scale general-domain pre-training followed by an in-domain fine-tuning strategy to construct deep Transformer-based neural LMs. A large amount of in-domain text is generated with the well-trained deep Transformer to construct new n-gram LMs, which are then interpolated with the baseline n-gram systems. Empirical studies on different speech recognition tasks show that the proposed approach effectively improves recognition accuracy. In particular, it brings a significant relative word error rate reduction of up to 6.0%.
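The final step of the pipeline described above is linear interpolation of the augmented n-gram LM with the baseline one. Below is a toy sketch of that step, assuming simple maximum-likelihood bigram LMs; the corpora, `lam` value, and helper names are illustrative stand-ins, not the paper's actual setup (which would use full n-gram toolkits and real ASR training data):

```python
from collections import Counter, defaultdict

def bigram_lm(corpus):
    """Estimate a maximum-likelihood bigram LM P(w | prev) from tokenized sentences."""
    counts = defaultdict(Counter)
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, w in zip(tokens, tokens[1:]):
            counts[prev][w] += 1
    return {prev: {w: c / sum(ctr.values()) for w, c in ctr.items()}
            for prev, ctr in counts.items()}

def interpolate(p_base, p_aug, lam):
    """Linear interpolation: P(w | h) = lam * P_base(w | h) + (1 - lam) * P_aug(w | h)."""
    def prob(prev, w):
        return (lam * p_base.get(prev, {}).get(w, 0.0)
                + (1 - lam) * p_aug.get(prev, {}).get(w, 0.0))
    return prob

# Toy stand-ins: baseline in-domain text vs. text sampled from the neural LM.
base_corpus = [["turn", "on", "the", "lights"], ["turn", "off", "the", "lights"]]
generated_corpus = [["turn", "on", "the", "radio"], ["turn", "up", "the", "volume"]]

p = interpolate(bigram_lm(base_corpus), bigram_lm(generated_corpus), lam=0.5)
```

The generated text contributes probability mass to word sequences the baseline corpus never contained (e.g. "the radio"), which is the mechanism by which the neural LM's knowledge reaches the n-gram model; in practice the interpolation weight is tuned on held-out data.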


