Bayesian Transformer Language Models for Speech Recognition

by Boyang Xue, et al.

State-of-the-art neural language models (LMs), typified by Transformers, are highly complex. Their use of fixed, deterministic parameter estimates fails to account for model uncertainty and leads to over-fitting and poor generalization when training data is limited. To address these issues, this paper proposes a full Bayesian learning framework for Transformer LM estimation. Efficient variational inference based approaches are used to estimate the latent parameter posterior distributions associated with different parts of the Transformer architecture, including the multi-head self-attention, feed-forward and embedding layers. Statistically significant word error rate (WER) reductions of up to 0.5% absolute (3.18% relative) and consistent perplexity gains were obtained over baseline Transformer LMs on state-of-the-art LF-MMI-trained factored TDNN systems built on the Switchboard corpus with i-Vector speaker adaptation. Performance improvements were also obtained on a cross-domain LM adaptation task requiring porting a Transformer LM trained on the Switchboard and Fisher data to the low-resource DementiaBank elderly speech corpus.
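The core idea of variational inference over network parameters can be illustrated in miniature. The sketch below (a simplification, not the paper's implementation; all function names are illustrative) treats a single scalar weight of, say, a feed-forward layer as having a Gaussian variational posterior N(mu, sigma^2): the weight is sampled via the reparameterization trick during the forward pass, and a closed-form KL term against a standard-normal prior would be added to the training loss.

```python
import math
import random

def sample_gaussian_weight(mu, log_sigma):
    # Reparameterisation trick: w = mu + sigma * eps, with eps ~ N(0, 1),
    # so gradients can flow through mu and log_sigma.
    eps = random.gauss(0.0, 1.0)
    return mu + math.exp(log_sigma) * eps

def kl_gaussian_vs_standard_normal(mu, log_sigma):
    # Closed-form KL( N(mu, sigma^2) || N(0, 1) ) for one scalar weight:
    # 0.5 * (sigma^2 + mu^2 - 1) - log(sigma)
    sigma2 = math.exp(2.0 * log_sigma)
    return 0.5 * (sigma2 + mu * mu - 1.0) - log_sigma

# Hypothetical variational parameters for one weight; a Bayesian layer
# would hold one (mu, log_sigma) pair per weight and add the summed KL
# (suitably scaled) to the usual cross-entropy LM loss.
mu, log_sigma = 0.3, -1.0
w = sample_gaussian_weight(mu, log_sigma)
kl = kl_gaussian_vs_standard_normal(mu, log_sigma)
```

In a full Bayesian Transformer LM, this per-weight treatment is applied to the attention, feed-forward, or embedding parameters, and the expected likelihood is approximated with a small number of posterior samples per minibatch.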




