On "Scientific Debt" in NLP: A Case for More Rigour in Language Model Pre-Training Research

06/05/2023
by Made Nindyatama Nityasya, et al.

This evidence-based position paper critiques current research practices within the language model pre-training literature. Despite rapid recent progress afforded by increasingly better pre-trained language models (PLMs), current PLM research practices often conflate different possible sources of model improvement, without conducting proper ablation studies and principled comparisons between different models under comparable conditions. These practices (i) leave us ill-equipped to understand which pre-training approaches should be used under which circumstances; (ii) impede reproducibility and credit assignment; and (iii) make it difficult to answer the question: "How exactly does each factor contribute to the progress that we have today?" We provide a case in point by revisiting the success of BERT over its baselines, ELMo and GPT-1, and demonstrate how, under comparable conditions where the baselines are tuned to a similar extent, these baselines (and even simpler variants thereof) can in fact achieve competitive or better performance than BERT. These findings demonstrate how disentangling different factors of model improvement can lead to valuable new insights. We conclude with recommendations for how to encourage and incentivize this line of work, and accelerate progress towards a better and more systematic understanding of what factors drive the progress of our foundation models today.
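The comparability argument above, tuning each baseline to a similar extent before comparing it with the newer model, can be made concrete with a small sketch. The snippet below is a minimal, hypothetical illustration, not the paper's actual experimental setup: every model family receives the same data splits, the same random-search budget over a shared hyperparameter space, and the same seeds, so that differences in tuning effort cannot be mistaken for modelling improvements. The model identifiers, the search space, and the train_and_evaluate helper are placeholders to be filled in with whatever fine-tuning pipeline is actually used.

```python
# Minimal sketch of a "comparable conditions" comparison protocol.
# Every model family gets the same hyperparameter search space, the same
# random-search budget, and the same seeds; only the model identity varies.

import random
import statistics

# Shared search space (illustrative values, not prescribed by the paper).
SEARCH_SPACE = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],
    "batch_size": [16, 32],
    "num_epochs": [3, 4, 5],
}


def sample_config(rng):
    """Draw one configuration uniformly from the shared search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def train_and_evaluate(model_name, config, seed):
    """Hypothetical placeholder: fine-tune `model_name` with `config` on the
    shared train/dev split and return the dev-set metric (e.g., accuracy)."""
    raise NotImplementedError("plug in the actual fine-tuning pipeline here")


def tune(model_name, budget=20, seeds=(1, 2, 3)):
    """Random search with an identical budget and identical seeds per model,
    scoring each configuration by its mean dev metric over the seeds."""
    rng = random.Random(0)  # same search trajectory for every model
    best_config, best_score = None, float("-inf")
    for _ in range(budget):
        config = sample_config(rng)
        score = statistics.mean(
            train_and_evaluate(model_name, config, seed) for seed in seeds
        )
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


if __name__ == "__main__":
    # Identical protocol for the PLM and its baselines
    # (implement train_and_evaluate before running).
    for model in ["bert-base", "elmo-style-bilstm", "gpt1-style-decoder"]:
        config, dev_score = tune(model)
        print(model, config, round(dev_score, 4))
```

Reporting the best configuration found under this fixed budget, rather than a single hand-picked run per model, is one way to make the "tuned to a similar extent" comparison the abstract argues for.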

Related research

PERT: Pre-training BERT with Permuted Language Model (03/14/2022)
Pre-trained Language Models (PLMs) have been widely used in various natu...

BERTIN: Efficient Pre-Training of a Spanish Language Model using Perplexity Sampling (07/14/2022)
The pre-training of large language models usually requires massive amoun...

Spread Love Not Hate: Undermining the Importance of Hateful Pre-training for Hate Speech Detection (10/09/2022)
Pre-training large neural language models, such as BERT, has led to impr...

LICHEE: Improving Language Model Pre-training with Multi-grained Tokenization (08/02/2021)
Language model pre-training based on large corpora has achieved tremendo...

MC-BERT: Efficient Language Pre-Training via a Meta Controller (06/10/2020)
Pre-trained contextual representations (e.g., BERT) have become the foun...

JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding (06/13/2022)
This paper aims to advance the mathematical intelligence of machines by ...

Mediators in Determining what Processing BERT Performs First (04/13/2021)
Probing neural models for the ability to perform downstream tasks using ...
