Revisiting Simple Neural Probabilistic Language Models

04/08/2021
by Simeng Sun, et al.

Recent progress in language modeling has been driven not only by advances in neural architectures, but also by improvements in hardware and optimization. In this paper, we revisit the neural probabilistic language model (NPLM) of <cit.>, which simply concatenates word embeddings within a fixed window and passes the result through a feed-forward network to predict the next word. When scaled up to modern hardware, this model (despite its many limitations) performs much better than expected on word-level language modeling benchmarks. Our analysis reveals that the NPLM achieves lower perplexity than a baseline Transformer with short input contexts but struggles to handle long-term dependencies. Inspired by this result, we modify the Transformer by replacing its first self-attention layer with the NPLM's local concatenation layer, which results in small but consistent perplexity decreases across three word-level language modeling datasets.
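To make the architecture described above concrete, here is a minimal PyTorch sketch of the classic NPLM design: embed a fixed window of preceding tokens, concatenate the embeddings, and pass the result through a feed-forward network to produce next-word logits. This is an illustrative reconstruction, not the paper's code; hyperparameter names such as window_size, embed_dim, and hidden_dim are assumptions chosen for readability.

```python
import torch
import torch.nn as nn

class NPLM(nn.Module):
    """Sketch of a concatenation-based neural probabilistic language model."""

    def __init__(self, vocab_size, window_size=8, embed_dim=128, hidden_dim=512):
        super().__init__()
        self.window_size = window_size
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Concatenated window of embeddings -> feed-forward -> next-word logits
        self.ff = nn.Sequential(
            nn.Linear(window_size * embed_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, vocab_size),
        )

    def forward(self, context_ids):
        # context_ids: (batch, window_size) token ids of the preceding words
        emb = self.embed(context_ids)          # (batch, window_size, embed_dim)
        concat = emb.reshape(emb.size(0), -1)  # (batch, window_size * embed_dim)
        return self.ff(concat)                 # (batch, vocab_size) logits
```

The paper's hybrid model replaces the first self-attention layer of a Transformer with a local concatenation layer of this kind; the sketch above only illustrates the standalone NPLM building block.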


