I-BERT: Inductive Generalization of Transformer to Arbitrary Context Lengths

06/18/2020
by Hyoungwook Nam, et al.

Self-attention has emerged as a vital component of state-of-the-art sequence-to-sequence models for natural language processing in recent years, brought to the forefront by pre-trained bi-directional Transformer models. Its effectiveness is partly due to its non-sequential architecture, which promotes scalability and parallelism but limits the model to inputs of a bounded length. In particular, such architectures perform poorly on algorithmic tasks, where the model must learn a procedure that generalizes to input lengths unseen in training, a capability we refer to as inductive generalization. Identifying the computational limits of existing self-attention mechanisms, we propose I-BERT, a bi-directional Transformer that replaces positional encodings with a recurrent layer. The model inductively generalizes on a variety of algorithmic tasks where state-of-the-art Transformer models fail to do so. We also test our method on masked language modeling tasks where the training and validation sets are partitioned to verify inductive generalization. Of the three algorithmic and two natural language inductive generalization tasks, I-BERT achieves state-of-the-art results on four.
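The architectural change described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the module names, layer sizes, and the choice of a bidirectional LSTM as the recurrent layer are assumptions. The point it shows is that when token order comes from a recurrent layer instead of additive positional encodings, no maximum context length is baked into the model, so the same weights can be applied to sequences longer than any seen in training.

```python
import torch
import torch.nn as nn

class IBERTStyleEncoder(nn.Module):
    """Sketch of a bi-directional Transformer encoder that obtains positional
    information from a recurrent layer rather than positional encodings.
    Hypothetical names and hyperparameters; not the authors' code."""

    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # A bidirectional LSTM injects order information; it runs over the
        # whole sequence, so there is no fixed maximum input length.
        self.recurrent = nn.LSTM(d_model, d_model // 2,
                                 batch_first=True, bidirectional=True)
        layer = nn.TransformerEncoderLayer(d_model, nhead,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids):
        x = self.embed(token_ids)      # (batch, seq, d_model)
        x, _ = self.recurrent(x)       # order-aware representations
        x = self.encoder(x)            # note: no positional encoding added
        return self.lm_head(x)         # per-token logits (e.g. for MLM)


# The same model handles sequences of different lengths, which is what
# "inductive generalization" refers to here.
model = IBERTStyleEncoder(vocab_size=1000)
short_batch = torch.randint(0, 1000, (2, 32))
long_batch = torch.randint(0, 1000, (2, 512))
print(model(short_batch).shape, model(long_batch).shape)
```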


