Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models

06/11/2018
by Minjia Zhang, et al.

Neural language models (NLMs) have recently gained renewed interest by achieving state-of-the-art performance across many natural language processing (NLP) tasks. However, NLMs are computationally demanding, largely because of the cost of the softmax layer over a large vocabulary. We observe that, when decoding in many NLP tasks, only the probabilities of the top-K hypotheses need to be calculated precisely, and K is often much smaller than the vocabulary size. This paper proposes a novel softmax layer approximation algorithm, called Fast Graph Decoder (FGD), which quickly identifies, for a given context, a set of K words that are most likely to occur according to an NLM. We demonstrate that FGD reduces decoding time by an order of magnitude while attaining accuracy close to the full softmax baseline on neural machine translation and language modeling tasks. We also prove a theoretical guarantee on the softmax approximation quality.
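
The core observation, that decoding only needs the K most likely words rather than a softmax over the whole vocabulary, can be illustrated with a small pure-Python sketch. This is not the authors' FGD implementation. The helper names (`augment`, `build_knn_graph`, `graph_topk`, `topk_softmax`), the plain k-nearest-neighbour graph, the greedy best-first search with beam width `ef`, and logits of the form h·w_i with no bias term are all assumptions made for illustration. The hypothetical `augment` step uses the known trick of appending sqrt(U^2 - ||w||^2) to each word vector, so that for a query [h, 0] a smaller Euclidean distance means a larger inner product. Renormalizing over only the K candidates approximates the full softmax; it does not reproduce it exactly.

```python
import heapq
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def full_topk(h, W, k):
    """Exact baseline: score all |V| output vectors and keep the k highest logits."""
    return heapq.nlargest(k, ((dot(h, w), i) for i, w in enumerate(W)))


def topk_softmax(scored):
    """Softmax restricted to the candidates (partition function over k words only)."""
    m = max(s for s, _ in scored)
    exps = [(math.exp(s - m), i) for s, i in scored]
    z = sum(e for e, _ in exps)
    return {i: e / z for e, i in exps}


def augment(W):
    """Hypothetical transform: append sqrt(U^2 - ||w||^2) so that, for query [h, 0],
    ||q - w'||^2 = ||h||^2 + U^2 - 2 h.w, i.e. L2 nearest = largest inner product."""
    U = max(math.sqrt(dot(w, w)) for w in W)
    return [w + [math.sqrt(max(U * U - dot(w, w), 0.0))] for w in W]


def build_knn_graph(V, m):
    """Offline step: link each vector to its m nearest neighbours (L2), symmetrized."""
    n = len(V)
    graph = [set() for _ in range(n)]
    for i in range(n):
        dists = ((sum((a - b) ** 2 for a, b in zip(V[i], V[j])), j)
                 for j in range(n) if j != i)
        for _, j in heapq.nsmallest(m, dists):
            graph[i].add(j)
            graph[j].add(i)  # reverse edge so the search can move both ways
    return [sorted(nbrs) for nbrs in graph]


def graph_topk(h, W, graph, k, ef, entry=0):
    """Greedy best-first search: only words reached through the graph are scored."""
    visited = {entry}
    s0 = dot(h, W[entry])
    candidates = [(-s0, entry)]  # frontier, highest score popped first
    best = [(s0, entry)]         # min-heap holding the ef best words found so far
    while candidates:
        neg_s, i = heapq.heappop(candidates)
        if len(best) >= ef and -neg_s < best[0][0]:
            break  # no frontier node can improve the kept set
        for j in graph[i]:
            if j in visited:
                continue
            visited.add(j)
            s = dot(h, W[j])
            if len(best) < ef or s > best[0][0]:
                heapq.heappush(candidates, (-s, j))
                heapq.heappush(best, (s, j))
                if len(best) > ef:
                    heapq.heappop(best)
    return heapq.nlargest(k, best)


if __name__ == "__main__":
    random.seed(0)
    W = [[random.gauss(0, 1) for _ in range(8)] for _ in range(300)]  # toy vocabulary
    graph = build_knn_graph(augment(W), m=8)
    h = [random.gauss(0, 1) for _ in range(8)]  # toy context vector
    exact = {i for _, i in full_topk(h, W, 10)}
    approx = graph_topk(h, W, graph, k=10, ef=40)
    print("recall@10:", len(exact & {i for _, i in approx}) / 10)
    print(topk_softmax(approx))
```

Raising `ef` scores more words and moves the result toward the exact `full_topk` answer; lowering it trades accuracy for fewer inner products, which is the speed/quality knob a graph-based top-K search of this kind exposes.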
