Are Pre-trained Convolutions Better than Pre-trained Transformers?

05/07/2021
by Yi Tay, et al.

In the era of pre-trained language models, Transformers are the de facto choice of model architecture. While recent research has shown promise for entirely convolutional (CNN) architectures, they have not been explored under the pre-train-fine-tune paradigm. In the context of language models, are convolutional models competitive with Transformers when pre-trained? This paper investigates that question and presents several interesting findings. Across an extensive set of experiments on 8 datasets/tasks, we find that CNN-based pre-trained models are competitive and outperform their Transformer counterparts in certain scenarios, albeit with caveats. Overall, the findings outlined in this paper suggest that conflating pre-training and architectural advances is misguided, and that the two should be considered independently. We believe our research paves the way for a healthy amount of optimism about alternative architectures.
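The abstract does not spell out the architecture; the paper itself pre-trains convolutional Seq2Seq models built on lightweight, dynamic, and dilated convolutions. As a minimal, illustrative sketch (PyTorch assumed; the depthwise-separable design and all names and hyperparameters below are our own, not the authors' released code), an encoder block that swaps self-attention for local convolution might look like this:

```python
# Minimal sketch (assumption: PyTorch) of a convolutional encoder block of
# the kind the paper pits against Transformers. This plain depthwise-separable
# block is illustrative only; the paper's models use lightweight, dynamic,
# and dilated convolutions.
import torch
import torch.nn as nn

class ConvEncoderBlock(nn.Module):
    """Pre-norm block: local token mixing via convolution, then a position-wise FFN."""

    def __init__(self, d_model: int = 512, kernel_size: int = 7):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # Depthwise conv mixes each channel along the sequence axis
        # (the stand-in for self-attention's token mixing).
        self.depthwise = nn.Conv1d(
            d_model, d_model, kernel_size,
            padding=kernel_size // 2, groups=d_model,
        )
        # Pointwise (1x1) conv mixes information across channels.
        self.pointwise = nn.Conv1d(d_model, d_model, kernel_size=1)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        h = self.norm1(x).transpose(1, 2)        # -> (batch, d_model, seq_len)
        h = self.pointwise(self.depthwise(h))    # local mixing, linear in seq_len
        x = x + h.transpose(1, 2)                # residual connection
        return x + self.ffn(self.norm2(x))       # position-wise feed-forward

# Usage: embeddings in, contextualized states out; pre-train with a denoising
# objective and fine-tune, exactly as one would a Transformer encoder.
states = ConvEncoderBlock()(torch.randn(2, 128, 512))  # -> (2, 128, 512)
```

Note the design contrast the paper's question hinges on: convolutional mixing costs grow linearly with sequence length and use a fixed local receptive field, whereas self-attention is quadratic but globally connected.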

Related research

01/14/2022 · The Dark Side of the Language: Pre-trained Transformers in the DarkNet
Pre-trained Transformers are challenging human performances in many natu...

09/06/2023 · Combining pre-trained Vision Transformers and CIDER for Out Of Domain Detection
Out-of-domain (OOD) detection is a crucial component in industrial appli...

05/14/2021 · Classifying Long Clinical Documents with Pre-trained Transformers
Automatic phenotyping is a task of identifying cohorts of patients that ...

01/03/2020 · On the comparability of Pre-trained Language Models
Recent developments in unsupervised representation learning have success...

06/21/2023 · Opening the Black Box: Analyzing Attention Weights and Hidden States in Pre-trained Language Models for Non-language Tasks
Investigating deep learning language models has always been a significan...

05/02/2022 · OPT: Open Pre-trained Transformer Language Models
Large language models, which are often trained for hundreds of thousands...

05/23/2023 · Physics of Language Models: Part 1, Context-Free Grammar
We design experiments to study how generative language models, like GPT,...
