Dynamic Evaluation of Transformer Language Models

04/17/2019
by   Ben Krause, et al.
20

This research note combines two methods that have recently improved the state of the art in language modeling: Transformers and dynamic evaluation. Transformers use stacked layers of self-attention that allow them to capture long range dependencies in sequential data. Dynamic evaluation fits models to the recent sequence history, allowing them to assign higher probabilities to re-occurring sequential patterns. By applying dynamic evaluation to Transformer-XL models, we improve the state of the art on enwik8 from 0.99 to 0.94 bits/char, text8 from 1.08 to 1.04 bits/char, and WikiText-103 from 18.3 to 16.4 perplexity points.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/21/2017

Dynamic Evaluation of Neural Sequence Models

We present methodology for using dynamic evaluation to improve neural se...
research
04/04/2021

TransfoRNN: Capturing the Sequential Information in Self-Attention Representations for Language Modeling

In this paper, we describe the use of recurrent neural networks to captu...
research
10/26/2021

Hierarchical Transformers Are More Efficient Language Models

Transformer models yield impressive results on many NLP and sequence mod...
research
10/14/2021

Causal Transformers Perform Below Chance on Recursive Nested Constructions, Unlike Humans

Recursive processing is considered a hallmark of human linguistic abilit...
research
06/01/2023

Exposing Attention Glitches with Flip-Flop Language Modeling

Why do large language models sometimes output factual inaccuracies and e...
research
12/05/2022

Meta-Learning Fast Weight Language Models

Dynamic evaluation of language models (LMs) adapts model parameters at t...
research
06/08/2023

Decision S4: Efficient Sequence-Based RL via State Spaces Layers

Recently, sequence learning methods have been applied to the problem of ...

Please sign up or login with your details

Forgot password? Click here to reset