Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

11/10/2017
by Zhilin Yang, et al.

We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. Given that natural language is highly context-dependent, this further implies that, in practice, Softmax with distributed word embeddings does not have enough capacity to model natural language. We propose a simple and effective method to address this issue, and improve the state-of-the-art perplexities on Penn Treebank and WikiText-2 to 47.69 and 40.68, respectively.
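To make the bottleneck argument concrete, here is the standard matrix-factorization view, sketched with notation that does not appear in the abstract itself (take the symbols as assumptions). A Softmax language model with context vector $h_c \in \mathbb{R}^d$ and word embedding $w_x \in \mathbb{R}^d$ defines

$$P(x \mid c) = \frac{\exp(h_c^\top w_x)}{\sum_{x'} \exp(h_c^\top w_{x'})}.$$

Collecting the log-probabilities of all contexts and all words into a matrix $A$ with $A_{c,x} = \log P(x \mid c)$, the model can only realize matrices that equal $HW^\top$ up to a per-row shift, so their rank is bounded by the embedding dimension $d$. If the true log-probability matrix of natural language has much higher rank, which the context-dependence of language suggests, no Softmax model with $d$-dimensional embeddings can match it.

The fix proposed in this paper is a mixture of Softmaxes (MoS): several Softmax components mixed with context-dependent weights, whose log is no longer a low-rank function of the context. The following PyTorch sketch illustrates the idea; it is a minimal illustration under assumed shapes and names (MixtureOfSoftmaxes, n_components, and so on), not the authors' implementation.

```python
import torch
import torch.nn.functional as F


class MixtureOfSoftmaxes(torch.nn.Module):
    """Mixture-of-softmaxes output layer (illustrative sketch)."""

    def __init__(self, hidden_size: int, vocab_size: int, n_components: int = 5):
        super().__init__()
        self.n_components = n_components
        # Project one context vector into K component-specific vectors.
        self.projection = torch.nn.Linear(hidden_size, n_components * hidden_size)
        # Context-dependent mixture weights over the K components.
        self.prior = torch.nn.Linear(hidden_size, n_components)
        # Output word embeddings, shared by all components.
        self.decoder = torch.nn.Linear(hidden_size, vocab_size)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, hidden_size) -> log P(x | c): (batch, vocab_size)
        batch = h.size(0)
        hs = torch.tanh(self.projection(h)).view(batch, self.n_components, -1)
        # One Softmax distribution per component: (batch, K, vocab_size).
        log_probs = F.log_softmax(self.decoder(hs), dim=-1)
        # Mixture weights: (batch, K, 1).
        log_prior = F.log_softmax(self.prior(h), dim=-1).unsqueeze(-1)
        # Mix in probability space; logsumexp keeps it numerically stable.
        return torch.logsumexp(log_prior + log_probs, dim=1)
```

Called on a (batch, hidden_size) tensor of RNN states, the layer returns (batch, vocab_size) log-probabilities that can be fed directly to a negative log-likelihood loss. Because the mixing happens in probability space rather than logit space, the resulting log-probability matrix is not constrained to low rank.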


