Transformers are Universal Predictors

07/15/2023
by Sourya Basu et al.

We find limits to the Transformer architecture for language modeling and show it has a universal prediction property in an information-theoretic sense. We further analyze performance in non-asymptotic data regimes to understand the role of various components of the Transformer architecture, especially in the context of data-efficient training. We validate our theoretical analysis with experiments on both synthetic and real datasets.
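As rough intuition for what "universal prediction in an information-theoretic sense" means, the sketch below is a standard textbook-style example, not the paper's Transformer construction: a sequential Laplace (add-one) predictor whose cumulative log-loss on any binary sequence exceeds that of the best fixed Bernoulli predictor chosen in hindsight by at most log(n+1) nats.

# Illustration only (classic Laplace/add-one example), not the paper's
# Transformer analysis: regret in cumulative log-loss against the best
# fixed Bernoulli predictor in hindsight is bounded by log(n+1) nats.
import math

def laplace_log_loss(seq):
    """Cumulative log-loss (nats) of the sequential add-one estimator."""
    ones = zeros = 0
    loss = 0.0
    for x in seq:
        p_one = (ones + 1) / (ones + zeros + 2)  # predicted P(next = 1)
        loss += -math.log(p_one if x == 1 else 1 - p_one)
        ones += x
        zeros += 1 - x
    return loss

def best_bernoulli_log_loss(seq):
    """Log-loss of the best fixed Bernoulli predictor chosen in hindsight."""
    n, k = len(seq), sum(seq)
    if k in (0, n):
        return 0.0
    p = k / n
    return -(k * math.log(p) + (n - k) * math.log(1 - p))

seq = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0] * 50  # any binary sequence works
n = len(seq)
regret = laplace_log_loss(seq) - best_bernoulli_log_loss(seq)
print(f"regret = {regret:.3f} nats <= log(n+1) = {math.log(n + 1):.3f}")

The paper's result concerns a far richer model class than Bernoulli sources; the sketch only makes the regret-style notion of universality concrete.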

Related research

Trees in transformers: a theoretical analysis of the Transformer's ability to represent trees (12/16/2021)
Transformer networks are the de facto standard architecture in natural l...

Efficient Language Modeling with Sparse all-MLP (03/14/2022)
All-MLP architectures have attracted increasing interest as an alternati...

N-Grammer: Augmenting Transformers with latent n-grams (07/13/2022)
Transformer models have recently emerged as one of the foundational mode...

Birth of a Transformer: A Memory Viewpoint (06/01/2023)
Large language models based on transformers have achieved great empirica...

Primer: Searching for Efficient Transformers for Language Modeling (09/17/2021)
Large Transformer models have been central to recent advances in natural...

CausalLM is not optimal for in-context learning (08/14/2023)
Recent empirical evidence indicates that transformer based in-context le...

NU HLT at CMCL 2022 Shared Task: Multilingual and Crosslingual Prediction of Human Reading Behavior in Universal Language Space (02/22/2022)
In this paper, we present a unified model that works for both multilingu...
