The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers

08/26/2021
by Róbert Csordás, et al.

Recently, many datasets have been proposed to test the systematic generalization ability of neural networks. The companion baseline Transformers, typically trained with default hyper-parameters from standard tasks, are shown to fail dramatically. Here we demonstrate that by revisiting model configurations as basic as scaling of embeddings, early stopping, relative positional embedding, and Universal Transformer variants, we can drastically improve the performance of Transformers on systematic generalization. We report improvements on five popular datasets: SCAN, CFQ, PCFG, COGS, and the Mathematics dataset. Our models improve accuracy from 50% to 85% on the PCFG productivity split, and from 35% to 81% on COGS. On SCAN, relative positional embedding largely mitigates the EOS decision problem (Newman et al., 2020), yielding 100% accuracy on the length split with a cutoff at 26. Importantly, performance differences between these models are typically invisible on the IID data split. This calls for proper generalization validation sets for developing neural networks that generalize systematically. We publicly release the code to reproduce our results.
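
To make the "scaling of embeddings" trick concrete, below is a minimal PyTorch sketch: token embeddings are multiplied by sqrt(d_model), as in the original Transformer, so that any positional encoding added afterwards does not dominate them. The class and parameter names are illustrative assumptions, not the authors' released code.

    import math
    import torch
    import torch.nn as nn

    class ScaledEmbedding(nn.Module):
        """Token embeddings multiplied by sqrt(d_model).

        A sketch of the 'scaling of embeddings' trick discussed in the
        abstract; names here are hypothetical, not the paper's code.
        """

        def __init__(self, vocab_size: int, d_model: int):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            self.scale = math.sqrt(d_model)

        def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
            # Without this scaling, the embedding magnitude can be dwarfed
            # by the positional encoding added on top of it, one of the
            # configuration details the paper revisits.
            return self.embed(token_ids) * self.scale

    # Hypothetical usage:
    # emb = ScaledEmbedding(vocab_size=800, d_model=512)
    # x = emb(torch.randint(0, 800, (2, 16)))  # -> shape (2, 16, 512)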


Related research

05/31/2023: The Impact of Positional Encoding on Length Generalization in Transformers
Length generalization, the ability to generalize from small training con...

09/30/2021: Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks
Systematic compositionality is an essential mechanism in human language,...

09/24/2021: Transformers Generalize Linearly
Natural language exhibits patterns of hierarchically governed dependenci...

12/01/2021: Systematic Generalization with Edge Transformers
Recent research suggests that systematic generalization in natural langu...

10/14/2021: The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization
Despite successes across a broad range of applications, Transformers hav...

07/15/2022: A Systematic Review and Replicability Study of BERT4Rec for Sequential Recommendation
BERT4Rec is an effective model for sequential recommendation based on th...

10/04/2021: Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics
Much of recent progress in NLU was shown to be due to models' learning d...
