Pretrained Transformers Improve Out-of-Distribution Robustness

04/13/2020
by Dan Hendrycks, et al.

Although pretrained Transformers such as BERT achieve high accuracy on in-distribution examples, do they generalize to new distributions? We systematically measure out-of-distribution (OOD) generalization for various NLP tasks by constructing a new robustness benchmark with realistic distribution shifts. We measure the generalization of previous models including bag-of-words models, ConvNets, and LSTMs, and we show that pretrained Transformers' performance declines are substantially smaller. Pretrained Transformers are also more effective at detecting anomalous or OOD examples, while many previous models are frequently worse than chance. We examine which factors affect robustness, finding that larger models are not necessarily more robust, distillation can be harmful, and more diverse pretraining data can enhance robustness. Finally, we show where future work can improve OOD robustness.
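As a rough illustration of the OOD detection comparison described above, the sketch below scores each input with the maximum softmax probability of a fine-tuned Transformer classifier and reports AUROC for separating in-distribution from shifted examples (0.5 is chance; below 0.5 is worse than chance). The checkpoint name and the example texts are placeholders, not the paper's actual models or datasets.

```python
# Minimal sketch of maximum-softmax-probability OOD detection with a Transformer classifier.
# The checkpoint name and text lists below are placeholders, not from the paper.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from sklearn.metrics import roc_auc_score

checkpoint = "bert-base-uncased"  # assumed to be fine-tuned on the in-distribution task
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint).eval()

def max_softmax_prob(texts, batch_size=32):
    """Return the maximum softmax probability for each text.

    Lower values are treated as more anomalous (more likely OOD).
    """
    scores = []
    with torch.no_grad():
        for i in range(0, len(texts), batch_size):
            batch = tokenizer(texts[i:i + batch_size], padding=True,
                              truncation=True, return_tensors="pt")
            probs = torch.softmax(model(**batch).logits, dim=-1)
            scores.extend(probs.max(dim=-1).values.tolist())
    return scores

# in_dist_texts: held-out examples from the training distribution
# ood_texts: examples from a realistically shifted distribution (e.g., a new review domain)
in_dist_texts = ["placeholder in-distribution sentence"]
ood_texts = ["placeholder out-of-distribution sentence"]

scores_in = max_softmax_prob(in_dist_texts)
scores_ood = max_softmax_prob(ood_texts)

# AUROC for separating in-distribution (label 1, higher score) from OOD (label 0).
labels = [1] * len(scores_in) + [0] * len(scores_ood)
auroc = roc_auc_score(labels, scores_in + scores_ood)
print(f"OOD detection AUROC: {auroc:.3f}")
```

The same scoring function can be reused to compare architectures: a model whose in-distribution scores are not reliably higher than its OOD scores yields an AUROC near or below 0.5, the worse-than-chance behavior the abstract attributes to several pre-Transformer models.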


Related research

10/14/2022  Pretrained Transformers Do not Always Improve Robustness
            Pretrained Transformers (PT) have been shown to improve Out of Distribut...

10/05/2020  How Effective is Task-Agnostic Data Augmentation for Pretrained Transformers?
            Task-agnostic forms of data augmentation have proven widely effective in...

04/18/2021  Contrastive Out-of-Distribution Detection for Pretrained Transformers
            Pretrained transformers achieve remarkable performance when the test dat...

12/03/2021  Linear algebra with transformers
            Most applications of transformers to mathematics, from integration to th...

10/23/2020  Learning to Recognize Dialect Features
            Linguists characterize dialects by the presence, absence, and frequency ...

03/03/2023  Data-Efficient Training of CNNs and Transformers with Coresets: A Stability Perspective
            Coreset selection is among the most effective ways to reduce the trainin...

08/30/2023  Learning Diverse Features in Vision Transformers for Improved Generalization
            Deep learning models often rely only on a small set of features even whe...
