Multi-scale Transformer Language Models

05/01/2020
by Sandeep Subramanian et al.

We investigate multi-scale transformer language models that learn representations of text at multiple scales, and present three different architectures that have an inductive bias to handle the hierarchical nature of language. Experiments on large-scale language modeling benchmarks empirically demonstrate favorable likelihood vs. memory footprint trade-offs, e.g., we show that it is possible to train a hierarchical variant with 30 layers that has a 23% smaller memory footprint and better perplexity than a vanilla transformer with less than half the number of layers, on the Toronto BookCorpus. We analyze the advantages of learned representations at multiple scales in terms of memory footprint, compute time, and perplexity, which are particularly appealing given the quadratic scaling of transformers' run time and memory usage with respect to sequence length.
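The paper presents three specific architectures; as a rough illustration of the general idea only (not the authors' exact models), the sketch below combines a stack of token-level transformer layers with a coarser stack that operates on average-pooled blocks of tokens, so the coarse attention runs over a sequence shortened by the pooling factor. The class name, the pooling scheme, and every hyperparameter here are illustrative assumptions, written in PyTorch.

import torch
import torch.nn as nn


def causal_mask(size, device):
    # Standard causal mask: -inf above the diagonal so position t cannot
    # attend to positions > t.
    return torch.triu(torch.full((size, size), float("-inf"), device=device), diagonal=1)


class TwoScaleTransformerLM(nn.Module):
    # Token-level ("fine") layers plus a coarser stack over pooled blocks.
    def __init__(self, vocab_size, d_model=256, n_heads=4,
                 n_fine_layers=2, n_coarse_layers=2, pool=4, max_len=1024):
        super().__init__()
        self.pool = pool
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        make_layer = lambda: nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.fine = nn.TransformerEncoder(make_layer(), n_fine_layers)
        self.coarse = nn.TransformerEncoder(make_layer(), n_coarse_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len); seq_len must be a multiple of self.pool.
        b, t = tokens.shape
        x = self.embed(tokens) + self.pos(torch.arange(t, device=tokens.device))
        h = self.fine(x, mask=causal_mask(t, x.device))          # token scale

        # Downsample: average non-overlapping blocks of `pool` tokens, then run
        # the coarse layers over a sequence of length t / pool, so attention
        # there costs (t / pool)^2 rather than t^2.
        c = h.reshape(b, t // self.pool, self.pool, -1).mean(dim=2)
        c = self.coarse(c, mask=causal_mask(t // self.pool, x.device))

        # Upsample causally: shift right by one block so each token only sees
        # summaries of earlier blocks, keeping next-token prediction valid.
        c = torch.cat([torch.zeros_like(c[:, :1]), c[:, :-1]], dim=1)
        h = h + c.repeat_interleave(self.pool, dim=1)
        return self.out(h)                                        # (batch, seq, vocab)

With pool=4 and a 1024-token context, the coarse stack attends over 256 positions, a 16x reduction in that stack's attention cost; this is only meant to make the memory/compute motivation concrete, not to reproduce the paper's reported trade-offs.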


Related research

10/26/2021 - Hierarchical Transformers Are More Efficient Language Models
Transformer models yield impressive results on many NLP and sequence mod...

03/01/2022 - Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale
Transformer language models that are trained on vast amounts of data hav...

03/15/2023 - Attention-likelihood relationship in transformers
We analyze how large language models (LLMs) represent out-of-context wor...

02/06/2023 - Computation vs. Communication Scaling for Future Transformers on Future Hardware
Scaling neural network models has delivered dramatic quality gains acros...

11/18/2021 - Quality and Cost Trade-offs in Passage Re-ranking Task
Deep learning models named transformers achieved state-of-the-art result...

06/13/2021 - Memory-efficient Transformers via Top-k Attention
Following the success of dot-product attention in Transformers, numerous...
