On the comparability of Pre-trained Language Models

01/03/2020
by   Matthias Aßenmacher, et al.

Recent developments in unsupervised representation learning have successfully established the concept of transfer learning in NLP. Mainly three forces are driving the improvements in this area of research: More elaborate architectures make better use of contextual information; instead of simply plugging in static pre-trained representations, these representations are learned from the surrounding context in end-to-end trainable models with more intelligently designed language modelling objectives. Along with this, larger corpora are used as resources for pre-training large language models in a self-supervised fashion, which are afterwards fine-tuned on supervised tasks. Advances in parallel and cloud computing have made it possible to train models of growing capacity in the same or even shorter time than previously established models. These three developments agglomerate in new state-of-the-art (SOTA) results being revealed at an ever higher frequency. It is not always obvious where these improvements originate from, as it is not possible to completely disentangle the contributions of the three driving forces. We aim to provide a clear and concise overview of several large pre-trained language models that achieved SOTA results within the last two years, with respect to their use of new architectures and resources. We want to clarify for the reader where the differences between the models lie, and we furthermore attempt to gain some insight into the individual contributions of lexical/computational improvements as well as of architectural changes. We explicitly do not intend to quantify these contributions, but rather see our work as an overview that identifies potential starting points for benchmark comparisons. Furthermore, we tentatively point at potential possibilities for improvement in the field of open-sourcing and reproducible research.
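
The pre-train/fine-tune paradigm described above can be illustrated with a minimal sketch. The following example assumes the Hugging Face transformers library with PyTorch and the illustrative checkpoint "bert-base-uncased"; neither the library nor the checkpoint is prescribed by the paper. It loads a model pre-trained in a self-supervised fashion on large corpora and performs a single supervised fine-tuning step on a downstream classification task.

    # Minimal sketch of self-supervised pre-training followed by supervised fine-tuning.
    # Assumes the Hugging Face `transformers` library and PyTorch; checkpoint name,
    # example sentence, and hyperparameters are illustrative, not taken from the paper.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Load a checkpoint pre-trained self-supervised on large unlabelled corpora.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    # A single labelled example stands in for a full supervised fine-tuning dataset.
    inputs = tokenizer("The movie was surprisingly good.", return_tensors="pt")
    labels = torch.tensor([1])

    # One end-to-end fine-tuning step: the pre-trained weights are updated on the downstream objective.
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    outputs = model(**inputs, labels=labels)
    outputs.loss.backward()
    optimizer.step()

Pre-training such a model from scratch on a large corpus is the costly step that the abstract attributes to advances in parallel and cloud computing; fine-tuning on a supervised task, as sketched here, is comparatively cheap.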
