
Pretrained Transformers Improve Out-of-Distribution Robustness

by Dan Hendrycks, et al.

Although pretrained Transformers such as BERT achieve high accuracy on in-distribution examples, do they generalize to new distributions? We systematically measure out-of-distribution (OOD) generalization for various NLP tasks by constructing a new robustness benchmark with realistic distribution shifts. We measure the generalization of previous models, including bag-of-words models, ConvNets, and LSTMs, and we show that pretrained Transformers' performance declines are substantially smaller. Pretrained Transformers are also more effective at detecting anomalous or OOD examples, while many previous models are frequently worse than chance. We examine which factors affect robustness, finding that larger models are not necessarily more robust, distillation can be harmful, and more diverse pretraining data can enhance robustness. Finally, we show where future work can improve OOD robustness.
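
To make "worse than chance" concrete, here is a minimal sketch of how OOD detection is often scored. The abstract does not say which detector or metric is used, so everything here is an assumption for illustration: the anomaly score is the negative maximum softmax probability (a common baseline), the metric is AUROC, and the logits are made-up toy values. An AUROC of 0.5 corresponds to chance, and a value below 0.5 means the detector ranks OOD examples as less anomalous than in-distribution ones.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def anomaly_score(logits):
    """Assumed detector: negative maximum softmax probability.

    A less confident prediction yields a higher (more anomalous) score.
    """
    return -max(softmax(logits))


def auroc(in_scores, out_scores):
    """Probability that a random OOD example scores higher than a random
    in-distribution example, counting ties as one half."""
    wins = 0.0
    for o in out_scores:
        for i in in_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(in_scores) * len(out_scores))


# Hypothetical classifier logits, not taken from the paper.
in_dist_logits = [[4.0, 0.0], [0.0, 3.0], [2.5, -1.0]]  # confident predictions
ood_logits = [[0.2, 0.0], [0.1, 0.3], [1.0, 0.5]]       # uncertain predictions

in_scores = [anomaly_score(l) for l in in_dist_logits]
out_scores = [anomaly_score(l) for l in ood_logits]

print(auroc(in_scores, out_scores))
```

In this toy setup the model is less confident on every OOD input than on any in-distribution input, so the detector separates the two sets perfectly. A model that is more confident on OOD inputs than on in-distribution ones would land below 0.5, which is the sense in which a detector can be worse than chance.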




Pretrained Transformers Do not Always Improve Robustness

Pretrained Transformers (PT) have been shown to improve Out of Distribut...

How Effective is Task-Agnostic Data Augmentation for Pretrained Transformers?

Task-agnostic forms of data augmentation have proven widely effective in...

Contrastive Out-of-Distribution Detection for Pretrained Transformers

Pretrained transformers achieve remarkable performance when the test dat...

Linear algebra with transformers

Most applications of transformers to mathematics, from integration to th...

Learning to Recognize Dialect Features

Linguists characterize dialects by the presence, absence, and frequency ...

Data-Efficient Training of CNNs and Transformers with Coresets: A Stability Perspective

Coreset selection is among the most effective ways to reduce the trainin...

Learning Diverse Features in Vision Transformers for Improved Generalization

Deep learning models often rely only on a small set of features even whe...