An Empirical Study of Efficient ASR Rescoring with Transformers

10/24/2019
by Hongzhao Huang, et al.

Neural language models (LMs) have been shown to significantly outperform classical n-gram LMs for language modeling, thanks to their superior ability to model long-range dependencies in text and to handle data sparsity. Recently, well-configured deep Transformers have also outperformed shallow stacks of recurrent neural network layers for language modeling. However, these state-of-the-art deep Transformer models are mostly engineered to be deep and high-capacity, which makes them computationally inefficient and challenging to deploy in large-scale real-world applications. It is therefore important to develop Transformer LMs with relatively small model sizes that still retain most of the performance of their much larger counterparts. In this paper, we conduct an empirical study of training Transformers with small parameter sizes in the context of ASR rescoring. By combining subword units, adaptive softmax, large-scale model pre-training, and knowledge distillation, we show that we can successfully train small Transformer LMs that yield significant relative word error rate reductions (WERR) through n-best rescoring. In particular, our experiments on a video speech recognition dataset show that we achieve WERRs ranging from 6.46% to 7.17% with only 5.5% to 11.9% of the parameter size of the well-known large GPT model [1], whose WERR with rescoring on the same dataset is 7.58%.
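As background on where such an LM sits in the pipeline, the sketch below shows plain n-best rescoring: each hypothesis's first-pass score is interpolated with the LM's log-probability, and the best-scoring hypothesis is returned. It is a minimal sketch assuming a HuggingFace-style causal LM; the public gpt2 checkpoint and the interpolation weight lm_weight are illustrative stand-ins, not the paper's small Transformer LM or its tuned values.

```python
# Minimal n-best rescoring sketch (assumptions: HuggingFace transformers
# is installed; "gpt2" stands in for the small Transformer LM).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def lm_log_prob(text: str) -> float:
    """Total log-probability of `text` under the LM (summed over tokens)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids the model shifts internally;
        # `loss` is the mean negative log-likelihood per predicted token.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)

def rescore(nbest: list[tuple[str, float]], lm_weight: float = 0.5) -> str:
    """Pick the hypothesis maximizing first-pass score + lm_weight * LM score.

    `nbest` holds (hypothesis, first_pass_log_score) pairs from the decoder.
    """
    return max(nbest, key=lambda h: h[1] + lm_weight * lm_log_prob(h[0]))[0]
```

In practice the interpolation weight is tuned on a development set, and the LM score is often length-normalized to avoid penalizing longer hypotheses.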

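The abstract names the techniques but not an implementation. As one illustration of the knowledge-distillation component, the student LM's objective is commonly written as a mix of hard cross-entropy against the corpus and a temperature-softened KL term toward the teacher. The function below is a generic sketch of that objective, not the paper's exact loss; temperature and alpha are illustrative hyperparameters.

```python
# Generic LM distillation loss sketch (assumption: teacher and student
# expose per-position logits over the same subword vocabulary).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      targets: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Mix hard cross-entropy on gold tokens with soft KL toward the teacher.

    student_logits, teacher_logits: (batch, seq_len, vocab)
    targets: (batch, seq_len) gold next-token ids
    """
    vocab = student_logits.size(-1)
    # Hard term: standard next-token cross-entropy against the corpus.
    hard = F.cross_entropy(student_logits.reshape(-1, vocab),
                           targets.reshape(-1))
    # Soft term: KL between temperature-smoothed student and teacher
    # distributions, rescaled by T^2 so its gradients stay comparable.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * hard + (1.0 - alpha) * soft
```

For the adaptive softmax component, PyTorch ships torch.nn.AdaptiveLogSoftmaxWithLoss, which replaces the full-vocabulary output projection with frequency-banded clusters; combined with subword units, this keeps the output layer, typically the largest part of a small LM, compact.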

Related research

- Knowledge Distillation For Recurrent Neural Network Language Modeling With Trust Regularization (04/08/2019): Recurrent Neural Networks (RNNs) have dominated language modeling becaus...
- Improving N-gram Language Models with Pre-trained Deep Transformer (11/22/2019): Although n-gram language models (LMs) have been outperformed by the stat...
- Conformer LLMs – Convolution Augmented Large Language Models (07/02/2023): This work builds together two popular blocks of neural architecture, nam...
- Language Modeling with Deep Transformers (05/10/2019): We explore multi-layer autoregressive Transformer models in language mod...
- Exploring Transformers for Large-Scale Speech Recognition (05/19/2020): While recurrent neural networks still largely define state-of-the-art sp...
- Transformers with convolutional context for ASR (04/26/2019): The recent success of transformer networks for neural machine translatio...
- Bayesian Transformer Language Models for Speech Recognition (02/09/2021): State-of-the-art neural language models (LMs) represented by Transformer...
