The Effect of Model Size on Worst-Group Generalization

12/08/2021
by Alan Pham, et al.

Overparameterization is shown to result in poor test accuracy on rare subgroups under a variety of settings where subgroup information is known. To gain a more complete picture, we consider the case where subgroup information is unknown. We investigate the effect of model size on worst-group generalization under empirical risk minimization (ERM) across a wide range of settings, varying: 1) architectures (ResNet, VGG, or BERT), 2) domains (vision or natural language processing), 3) model size (width or depth), and 4) initialization (with pre-trained or random weights). Our systematic evaluation reveals that increasing model size does not hurt, and may help, worst-group test performance under ERM across all setups. In particular, increasing pre-trained model size consistently improves performance on Waterbirds and MultiNLI. We advise practitioners to use larger pre-trained models when subgroup labels are unknown.
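The headline metric in the abstract is worst-group test accuracy under ERM. As a rough illustration only (not code from the paper), the sketch below computes per-group and worst-group accuracy from a model's predictions, assuming group labels (e.g. class x spurious attribute, as in Waterbirds or MultiNLI) are available for the test set even though ERM training ignores them; the function name and toy data are hypothetical.

    # Minimal sketch of the worst-group accuracy metric (illustrative, not from the paper).
    import numpy as np

    def worst_group_accuracy(preds, labels, groups):
        """Return average accuracy, worst-group accuracy, and per-group accuracies."""
        preds, labels, groups = map(np.asarray, (preds, labels, groups))
        correct = preds == labels
        # Accuracy within each group (group = class x spurious attribute in Waterbirds).
        group_accs = {int(g): float(correct[groups == g].mean()) for g in np.unique(groups)}
        return float(correct.mean()), min(group_accs.values()), group_accs

    # Toy example with 4 groups (2 classes x 2 backgrounds, Waterbirds-style).
    preds  = np.array([0, 0, 1, 1, 1, 0, 1, 0])
    labels = np.array([0, 0, 1, 1, 0, 0, 1, 1])
    groups = np.array([0, 0, 1, 1, 2, 2, 3, 3])
    avg, worst, per_group = worst_group_accuracy(preds, labels, groups)
    print(f"average acc {avg:.2f}, worst-group acc {worst:.2f}, per group {per_group}")

A model can score well on average accuracy while the worst group lags far behind, which is why the paper reports the minimum over groups rather than the mean.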

Related research

Exploiting Redundancy in Pre-trained Language Models for Efficient Transfer Learning (04/08/2020)
Large pre-trained contextual word representations have transformed the f...

You Only Need a Good Embeddings Extractor to Fix Spurious Correlations (12/12/2022)
Spurious correlations in training data often lead to robustness issues s...

An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models (07/14/2020)
Recent work has shown that pre-trained language models such as BERT impr...

An Empirical Investigation of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration (07/17/2023)
In the realm of out-of-distribution generalization tasks, finetuning has...

Poor Man's BERT: Smaller and Faster Transformer Models (04/08/2020)
The ongoing neural revolution in Natural Language Processing has recentl...

Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization (10/21/2021)
In Domain Generalization (DG) settings, models trained on a given set of...

Improved Worst-Group Robustness via Classifier Retraining on Independent Splits (04/20/2022)
High-capacity deep neural networks (DNNs) trained with Empirical Risk Mi...
