ScholarBERT: Bigger is Not Always Better

05/23/2022
by Zhi Hong, et al.

Transformer-based masked language models trained on general corpora, such as BERT and RoBERTa, have shown impressive performance on various downstream tasks. Increasingly, researchers are "finetuning" these models to improve performance on domain-specific tasks. Here, we report a broad study in which we applied 14 transformer-based models to 11 scientific tasks in order to evaluate how downstream performance is affected by changes along various dimensions (e.g., training data, model size, pretraining time, finetuning length). In this process, we created the largest and most diverse scientific language model to date, ScholarBERT, by training a 770M-parameter BERT model on a 221B-token scientific literature dataset spanning many disciplines. Counterintuitively, our evaluation of the 14 BERT-based models (seven versions of ScholarBERT, five science-specific large language models from the literature, BERT-Base, and BERT-Large) reveals little difference in performance across the 11 science-focused tasks, despite major differences in model size and training data. We argue that our results establish an upper bound for the performance achievable with BERT-based architectures on tasks from the scientific domain.
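The evaluation setup described above amounts to attaching a task-specific classification head to each pretrained BERT-style encoder and finetuning it on each downstream task. The following is a minimal sketch of that recipe using Hugging Face Transformers; the checkpoint name, hyperparameters, and toy dataset are illustrative assumptions, not the paper's exact configuration.

```python
# Illustrative finetuning sketch (assumed setup, not the authors' exact code).
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)
from datasets import Dataset

# Placeholder checkpoint; a released ScholarBERT checkpoint would be substituted here.
MODEL_NAME = "bert-large-cased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Tiny hypothetical "scientific task": binary sentence classification.
train_data = Dataset.from_dict({
    "text": [
        "The catalyst increased the reaction yield by 40 percent.",
        "We thank the reviewers for their helpful comments.",
    ],
    "label": [1, 0],
})

def tokenize(batch):
    # Tokenize to fixed-length inputs so the default collator can batch them.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_data = train_data.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="scholarbert-finetune",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=2e-5,  # small learning rate, typical for BERT finetuning
)

trainer = Trainer(model=model, args=args, train_dataset=train_data)
trainer.train()
```

Repeating this per-task finetuning loop for each pretrained model and each of the 11 tasks yields the kind of model-by-task comparison the study reports; only the encoder weights change between runs, while the head and finetuning recipe stay fixed.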


Related research

03/29/2021
Retraining DistilBERT for a Voice Shopping Assistant by Using Universal Dependencies
In this work, we retrained the distilled BERT language model for Walmart...

01/24/2021
WangchanBERTa: Pretraining transformer-based Thai Language Models
Transformer-based language models, more specifically BERT-based architec...

05/24/2021
Neural Language Models for Nineteenth-Century English
We present four types of neural language models trained on a large histo...

08/11/2023
Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT
We hypothesize that large language models (LLMs) based on the transforme...

08/31/2019
Quantity doesn't buy quality syntax with neural language models
Recurrent neural networks can learn to predict upcoming words remarkably...

04/13/2021
Mediators in Determining what Processing BERT Performs First
Probing neural models for the ability to perform downstream tasks using ...

01/27/2023
Context Matters: A Strategy to Pre-train Language Model for Science Education
This study aims at improving the performance of scoring student response...
