Self-Normalized Importance Sampling for Neural Language Modeling

11/11/2021
by Zijian Yang, et al.

To mitigate the problem of having to traverse the full vocabulary in the softmax normalization of a neural language model, sampling-based training criteria have been proposed and investigated in the context of large-vocabulary word-based neural language models. These training criteria typically offer faster training and testing, at the cost of slightly degraded perplexity and virtually no visible drop in word error rate. While noise contrastive estimation is one of the most popular choices, we recently showed that other sampling-based criteria can also perform well, as long as an extra correction step is performed, in which the intended class posterior probability is recovered from the raw model outputs. In this work, we propose self-normalized importance sampling. Compared to our previous work, the criteria considered here are self-normalized, and no further correction step is needed. Compared to noise contrastive estimation, our method is directly comparable in terms of application complexity. Through self-normalized language model training as well as lattice rescoring experiments, we show that our proposed self-normalized importance sampling is competitive on both research-oriented and production-oriented automatic speech recognition tasks.
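To give a concrete picture of the family of sampling-based criteria the abstract refers to, the sketch below approximates the softmax normalizer of a word-based language model by importance sampling over a small set of sampled words. This is a minimal, generic importance-sampled softmax loss written in PyTorch, not the paper's exact self-normalized criterion; the function name sampled_softmax_nll and the parameters W, b, noise_dist and num_samples are illustrative assumptions.

    import torch

    def sampled_softmax_nll(hidden, targets, W, b, noise_dist, num_samples=100):
        # hidden:     (batch, dim)  final hidden states of the language model
        # targets:    (batch,)      indices of the target words
        # W, b:       output projection, shapes (vocab, dim) and (vocab,)
        # noise_dist: (vocab,)      normalized proposal distribution q, e.g. unigram

        # Score of each target word: s(w_t | h) = h . W[w_t] + b[w_t]
        target_scores = (hidden * W[targets]).sum(dim=1) + b[targets]          # (batch,)

        # Draw K samples from q, shared across the batch
        samples = torch.multinomial(noise_dist, num_samples, replacement=True)  # (K,)
        sample_scores = hidden @ W[samples].t() + b[samples]                    # (batch, K)

        # Importance-sampling estimate of the softmax normalizer:
        #   Z(h) = sum_w exp(s(w|h)) ~= (1/K) * sum_k exp(s(w_k|h)) / q(w_k)
        log_q = noise_dist[samples].log()                                       # (K,)
        log_Z_hat = torch.logsumexp(sample_scores - log_q, dim=1) \
                    - torch.log(torch.tensor(float(num_samples)))               # (batch,)

        # Negative log-likelihood with the estimated normalizer
        return (log_Z_hat - target_scores).mean()

In this sketch the samples are shared across the whole batch for efficiency, and q would typically be a (smoothed) unigram distribution, as is also common for noise contrastive estimation; the self-normalized criterion proposed in the paper removes the need for the additional correction step discussed above.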


