Pretrained Transformers Do not Always Improve Robustness

10/14/2022
by   Swaroop Mishra, et al.

Pretrained Transformers (PT) have been shown to provide better Out-of-Distribution (OOD) robustness than traditional models such as Bag of Words (BOW), LSTMs, and Convolutional Neural Networks (CNNs) powered by Word2Vec and GloVe embeddings. How does this robustness comparison hold in a real-world setting, where part of the dataset can be noisy? Do PT also provide more robust representations than traditional models when exposed to noisy data? We perform a comparative study on 10 models and find empirical evidence that PT provide less robust representations than traditional models on exposure to noisy data. We investigate further and augment PT with an adversarial filtering (AF) mechanism that has been shown to improve OOD generalization. However, an increase in generalization does not necessarily increase robustness: we find that noisy data fools the AF method powered by PT.
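To make the adversarial filtering idea concrete, here is a minimal, hypothetical sketch in the spirit of AFLite-style filtering: repeatedly split the data, fit a weak probe on one part, and record how often each held-out example is predicted correctly; examples that are predicted correctly too often are assumed to carry easy (possibly spurious) cues and are filtered out. The probe, threshold, and function names below are illustrative assumptions, not the paper's exact procedure.

```python
import random
from collections import Counter

def majority_probe(train):
    # Trivial illustrative probe: always predict the most common training label.
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def aflite_sketch(dataset, probe, n_rounds=10, train_frac=0.8, threshold=0.75):
    """Hypothetical adversarial-filtering sketch.

    dataset: list of (x, y) pairs; probe: callable that fits on a train
    split and returns a predictor. Examples whose held-out predictability
    reaches `threshold` are dropped as 'too easy'.
    """
    correct = [0] * len(dataset)
    seen = [0] * len(dataset)
    for _ in range(n_rounds):
        idx = list(range(len(dataset)))
        random.shuffle(idx)
        cut = int(train_frac * len(idx))
        train_idx, held_idx = idx[:cut], idx[cut:]
        model = probe([dataset[i] for i in train_idx])
        for i in held_idx:
            x, y = dataset[i]
            seen[i] += 1
            correct[i] += int(model(x) == y)
    # Keep examples that were never held out or remained hard to predict.
    return [dataset[i] for i in range(len(dataset))
            if seen[i] == 0 or correct[i] / seen[i] < threshold]
```

With a majority-class probe, every majority-label example the probe ever sees held out is predicted correctly and gets filtered, while minority-label examples survive; the paper's point is that label noise can game exactly this kind of predictability signal when the probe runs on PT representations.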
