Strategies for Training Large Vocabulary Neural Language Models

12/15/2015
by Welin Chen, et al.

Training neural network language models over large vocabularies is still computationally very costly compared to count-based models such as Kneser-Ney. At the same time, neural language models are gaining popularity for many applications, such as speech recognition and machine translation, whose success depends on scalability. We present a systematic comparison of strategies to represent and train large vocabularies, including softmax, hierarchical softmax, target sampling, noise-contrastive estimation and self-normalization. We further extend self-normalization to be a proper estimator of likelihood and introduce an efficient variant of softmax. We evaluate each method on three popular benchmarks, examining performance on rare words, the speed/accuracy trade-off and complementarity to Kneser-Ney.
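As a rough illustration of why the normalization over the vocabulary dominates cost, and of what self-normalization buys, here is a small sketch contrasting exact softmax scoring with a self-normalized variant. It assumes the commonly used penalty alpha * (log Z)^2 added to the cross-entropy loss; the function names and the default alpha = 0.1 are hypothetical, and the paper's own formulation (including its extension of self-normalization to a proper likelihood estimator) may differ.

```python
import math


def log_softmax_nll(scores, target):
    """Full softmax: negative log-likelihood of `target` given unnormalized
    scores over the whole vocabulary. Returns (nll, log_z). Computing the
    partition function Z touches every word, so the cost is O(V) per token."""
    m = max(scores)  # subtract the max score for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[target], log_z


def self_normalized_loss(scores, target, alpha=0.1):
    """Training loss with a self-normalization penalty (assumed form:
    alpha * (log Z)^2; alpha=0.1 is a hypothetical default). Driving log Z
    toward 0 during training lets the raw score be read as a log-probability
    at test time."""
    nll, log_z = log_softmax_nll(scores, target)
    return nll + alpha * log_z ** 2


def self_normalized_logprob(scores, target):
    """Test-time score under self-normalization: the unnormalized score is
    used directly, skipping the O(V) sum over the vocabulary."""
    return scores[target]


if __name__ == "__main__":
    scores = [1.0, 2.0, 3.0]  # toy unnormalized scores over a 3-word vocabulary
    nll, log_z = log_softmax_nll(scores, target=2)
    print(f"full-softmax NLL={nll:.4f}, log Z={log_z:.4f}")
    print(f"self-normalized loss={self_normalized_loss(scores, target=2):.4f}")
```

When the exponentiated scores already sum to one, log Z is 0 and the self-normalized score matches the exact log-probability; for arbitrary scores the penalty grows with (log Z)^2, which is what pushes the model toward cheap, approximately normalized outputs.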

Related research

A Batch Noise Contrastive Estimation Approach for Training Large Vocabulary Language Models (08/20/2017)
Training large vocabulary Neural Network Language Models (NNLMs) is a di...

Improving Language Modelling with Noise-contrastive estimation (09/22/2017)
Neural language models do not scale well when the vocabulary is large. N...

Self-organized Hierarchical Softmax (07/26/2017)
We propose a new self-organizing hierarchical softmax formulation for ne...

Self-Normalized Importance Sampling for Neural Language Modeling (11/11/2021)
To mitigate the problem of having to traverse over the full vocabulary i...

Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models (06/11/2018)
Neural language models (NLMs) have recently gained a renewed interest by...

Pointing the Unknown Words (03/26/2016)
The problem of rare and unknown words is an important issue that can pot...

Generalizing and Hybridizing Count-based and Neural Language Models (06/01/2016)
Language models (LMs) are statistical models that calculate probabilitie...