When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute

02/24/2021
by Tao Lei, et al.

Large language models have become increasingly difficult to train because of the computation time and cost required. In this work, we present SRU++, a recurrent unit with optional built-in attention that exhibits state-of-the-art modeling capacity and training efficiency. On standard language modeling benchmarks such as Enwik8 and Wiki-103, our model obtains better perplexity and bits-per-character (bpc) while using 2.5x-10x less training time and cost than top-performing Transformer models. Our results reaffirm that attention is not all we need: it can be complementary to other sequential modeling modules, and fast recurrence with little attention can be a leading model architecture.
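To make the "fast recurrence" idea concrete, below is a minimal NumPy sketch of the SRU recurrence that SRU++ builds on (without the optional attention component). The weight names (`Wx`, `Wf`, `Wr`, `vf`, `vr`, `bf`, `br`) are illustrative choices, not identifiers from the authors' implementation; the key property is that all matrix multiplications are batched across timesteps, leaving only cheap elementwise operations in the sequential loop.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sru_cell(x, Wx, Wf, Wr, vf, vr, bf, br):
    """Run a simple recurrent unit (SRU) over one sequence.

    x: (T, d) inputs; Wx, Wf, Wr: (d, d) weights;
    vf, vr, bf, br: (d,) per-dimension gate vectors and biases.
    Returns hidden states of shape (T, d).
    """
    T, d = x.shape
    # All matrix multiplications happen up front, for every timestep
    # at once; the loop below is purely elementwise.
    U = x @ Wx    # candidate values
    Fx = x @ Wf   # forget-gate input term
    Rx = x @ Wr   # reset-gate input term

    c = np.zeros(d)          # internal recurrent state
    h = np.empty((T, d))
    for t in range(T):
        f = sigmoid(Fx[t] + vf * c + bf)   # forget gate
        c = f * c + (1.0 - f) * U[t]       # state update
        r = sigmoid(Rx[t] + vr * c + br)   # reset/highway gate
        h[t] = r * c + (1.0 - r) * x[t]    # highway output
    return h
```

In SRU++, as described in the abstract, an attention sub-layer can optionally feed into this recurrence; the sketch above shows only the recurrent core.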


Related research

- Meta-Learning Fast Weight Language Models (12/05/2022): Dynamic evaluation of language models (LMs) adapts model parameters at t...
- Retentive Network: A Successor to Transformer for Large Language Models (07/17/2023): In this work, we propose Retentive Network (RetNet) as a foundation arch...
- DIRECTOR: Generator-Classifiers For Supervised Language Modeling (06/15/2022): Current language models achieve low perplexity but their resulting gener...
- Exposing Attention Glitches with Flip-Flop Language Modeling (06/01/2023): Why do large language models sometimes output factual inaccuracies and e...
- SHAQ: Single Headed Attention with Quasi-Recurrence (08/18/2021): Natural Language Processing research has recently been dominated by larg...
- Bridging the Gap for Tokenizer-Free Language Models (08/27/2019): Purely character-based language models (LMs) have been lagging in qualit...
- Is It Worth the (Environmental) Cost? Limited Evidence for the Benefits of Diachronic Continuous Training (10/13/2022): Language is constantly changing and evolving, leaving language models to...
