Lookup-Table Recurrent Language Models for Long Tail Speech Recognition

by W. Ronny Huang, et al.

We introduce Lookup-Table Language Models (LookupLM), a method for scaling up the size of RNN language models with only a constant increase in floating point operations, by increasing the expressivity of the embedding table. In particular, we instantiate an (additional) embedding table that embeds the previous n-gram token sequence rather than a single token. This allows the embedding table to be scaled up arbitrarily, with a commensurate increase in performance, without changing the token vocabulary. Since embeddings are sparsely retrieved from the table via a lookup, increasing the size of the table adds neither extra operations to each forward pass nor extra parameters that need to be stored in limited GPU/TPU memory. We explore scaling n-gram embedding tables up to nearly a billion parameters. When trained on a 3-billion-sentence corpus, we find that LookupLM improves long tail log perplexity by 2.44 and long tail WER by 23.4% over a standard RNN language model baseline, an improvement comparable to scaling up the baseline by 6.2x the number of floating point operations.
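To make the "constant cost regardless of table size" argument concrete, here is a minimal sketch in plain Python. It assumes a rolling hash that maps the previous n tokens to a row of the n-gram table, and it assumes the n-gram embedding is concatenated with the ordinary token embedding to form the RNN input. Both choices, along with the names `NgramEmbeddingTable` and `rnn_input`, are hypothetical illustrations rather than the paper's implementation. Rows are created lazily on first access, standing in for sparse retrieval from a large table kept outside accelerator memory.

```python
import random
from typing import Dict, List, Sequence


class NgramEmbeddingTable:
    """Hypothetical n-gram embedding table (illustrative sketch only).

    Rows are created lazily on first access, so a very large nominal
    table (num_rows) can be simulated without allocating it up front.
    """

    def __init__(self, num_rows: int, dim: int, n: int, seed: int = 0) -> None:
        self.num_rows = num_rows
        self.dim = dim
        self.n = n
        self.seed = seed
        self._rows: Dict[int, List[float]] = {}

    def bucket(self, context: Sequence[int]) -> int:
        """Hash the last n token ids into a row index (assumed hashing scheme)."""
        ngram = list(context[-self.n:])
        ngram = [-1] * (self.n - len(ngram)) + ngram  # left-pad short contexts with -1
        h = 0
        for tok in ngram:
            h = (h * 1_000_003 + tok + 1) % self.num_rows
        return h

    def lookup(self, context: Sequence[int]) -> List[float]:
        """Retrieve a single row: O(n + dim) work, independent of num_rows."""
        idx = self.bucket(context)
        if idx not in self._rows:
            # Deterministic per-row initialization, standing in for trained weights.
            rng = random.Random(self.seed * 1_000_000_007 + idx)
            self._rows[idx] = [rng.gauss(0.0, 0.02) for _ in range(self.dim)]
        return self._rows[idx]


def rnn_input(token_table: List[List[float]],
              ngram_table: NgramEmbeddingTable,
              context: Sequence[int]) -> List[float]:
    """Concatenate the previous token's embedding with its n-gram embedding (assumed combination)."""
    return token_table[context[-1]] + ngram_table.lookup(context)


vocab, d_tok, d_ngram = 10, 4, 4
token_table = [[0.01 * (i + j) for j in range(d_tok)] for i in range(vocab)]
small = NgramEmbeddingTable(num_rows=1_000, dim=d_ngram, n=3)
huge = NgramEmbeddingTable(num_rows=1_000_000_000, dim=d_ngram, n=3)

context = [3, 7, 2, 5]
x_small = rnn_input(token_table, small, context)
x_huge = rnn_input(token_table, huge, context)
print(len(x_small), len(x_huge))  # 8 8
```

Growing `num_rows` from a thousand to a billion leaves the RNN input width, and therefore the per-step compute of the recurrent layers, unchanged; only the number of distinct n-grams that can receive their own row increases.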




NetFC: enabling accurate floating-point arithmetic on programmable switches

In-network computation has been widely used to accelerate data-intensive...

Investigation on N-gram Approximated RNNLMs for Recognition of Morphologically Rich Speech

Recognition of Hungarian conversational telephone speech is challenging ...

Unsupervised Attention-based Sentence-Level Meta-Embeddings from Contextualised Language Models

A variety of contextualised language models have been proposed in the NL...

Subword RNNLM Approximations for Out-Of-Vocabulary Keyword Search

In spoken Keyword Search, the query may contain out-of-vocabulary (OOV) ...

SIMD-Optimized Search Over Sorted Data

Applications often require a fast, single-threaded search algorithm over...

Deep Shallow Fusion for RNN-T Personalization

End-to-end models in general, and Recurrent Neural Network Transducer (R...

Generalizations of Laver tables

We shall generalize the notion of a Laver table to algebras which may ha...