Are Larger Pretrained Language Models Uniformly Better? Comparing Performance at the Instance Level

05/13/2021
by Ruiqi Zhong, et al.

Larger language models have higher accuracy on average, but are they better on every single instance (datapoint)? Some work suggests larger models have higher out-of-distribution robustness, while other work suggests they have lower accuracy on rare subgroups. To understand these differences, we investigate these models at the level of individual instances. However, one major challenge is that individual predictions are highly sensitive to randomness in training. We develop statistically rigorous methods to address this, and after accounting for pretraining and finetuning noise, we find that our BERT-Large is worse than BERT-Mini on at least 1-4% of instances across MNLI, SST-2, and QQP, compared to the overall accuracy improvement of 2-10%. We also find that finetuning noise increases with model size, and that instance-level accuracy has momentum: improvement from BERT-Mini to BERT-Medium correlates with improvement from BERT-Medium to BERT-Large. Our findings suggest that instance-level predictions provide a rich source of information; we therefore recommend that researchers supplement model weights with model predictions.
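To make the instance-level comparison concrete, here is a minimal Python sketch of the kinds of quantities the abstract refers to: per-instance accuracy averaged over finetuning seeds, the fraction of instances on which a smaller model beats a larger one, and the "momentum" correlation between successive size jumps. The function names and the naive point estimates below are illustrative assumptions, not the paper's statistically rigorous estimators, which additionally account for pretraining noise.

```python
import numpy as np

def instance_accuracy(correct):
    """Mean per-instance accuracy over finetuning seeds.

    correct: bool array of shape (n_seeds, n_instances), where
    correct[s, i] is True iff the model finetuned with seed s
    predicts instance i correctly.
    """
    return correct.mean(axis=0)

def frac_regressed(correct_small, correct_large, margin=0.0):
    """Fraction of instances where the smaller model's per-instance
    accuracy exceeds the larger model's by more than `margin`.

    A naive point estimate only; it does not separate true
    instance-level regressions from seed noise as the paper does.
    """
    acc_small = instance_accuracy(correct_small)
    acc_large = instance_accuracy(correct_large)
    return float(np.mean(acc_small > acc_large + margin))

def momentum_correlation(correct_a, correct_b, correct_c):
    """Correlate per-instance improvement from model a -> b with
    improvement from b -> c (the paper's "momentum" observation)."""
    gain_ab = instance_accuracy(correct_b) - instance_accuracy(correct_a)
    gain_bc = instance_accuracy(correct_c) - instance_accuracy(correct_b)
    return float(np.corrcoef(gain_ab, gain_bc)[0, 1])

# Toy usage with simulated, independent predictions (10 seeds,
# 1000 instances); this only demonstrates the computation, so the
# momentum correlation here will be near zero by construction.
rng = np.random.default_rng(0)
correct_mini = rng.random((10, 1000)) < 0.75    # ~75% accurate
correct_medium = rng.random((10, 1000)) < 0.80  # ~80% accurate
correct_large = rng.random((10, 1000)) < 0.85   # ~85% accurate
print(frac_regressed(correct_mini, correct_large))
print(momentum_correlation(correct_mini, correct_medium, correct_large))
```

On real data, `correct` would be built by finetuning each model with several random seeds and recording per-instance correctness on a fixed evaluation set; the paper's recommendation to release model predictions alongside weights is what makes this kind of analysis possible without rerunning training.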


research
12/29/2020

CMV-BERT: Contrastive multi-vocab pretraining of BERT

In this work, we present CMV-BERT, which improves the pretraining of a...
research
01/20/2023

Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

Large-scale pre-trained language models have been shown to be helpful in...
research
08/03/2023

Improving Requirements Completeness: Automated Assistance through Large Language Models

Natural language (NL) is arguably the most prevalent medium for expressi...
research
02/09/2023

Using Language Models for Enhancing the Completeness of Natural-language Requirements

[Context and motivation] Incompleteness in natural-language requirements...
research
06/30/2021

The MultiBERTs: BERT Reproductions for Robustness Analysis

Experiments with pretrained models such as BERT are often based on a sin...
research
07/26/2023

Developing and Evaluating Tiny to Medium-Sized Turkish BERT Models

This study introduces and evaluates tiny, mini, small, and medium-sized ...
research
08/23/2023

Simple is Better and Large is Not Enough: Towards Ensembling of Foundational Language Models

Foundational Language Models (FLMs) have advanced natural language proce...
