Don't Sweep your Learning Rate under the Rug: A Closer Look at Cross-modal Transfer of Pretrained Transformers

07/26/2021
by Danielle Rothermel et al.

Self-supervised pretraining of large-scale transformer models on text corpora, followed by finetuning, has achieved state-of-the-art results on a number of natural language processing tasks. Recently, Lu et al. (2021, arXiv:2103.05247) claimed that frozen pretrained transformers (FPTs) match or outperform both training from scratch and unfrozen (finetuned) pretrained transformers on a set of transfer tasks to other modalities. In our work, we find that this result is, in fact, an artifact of not tuning the learning rates. After carefully redesigning the empirical setup, we find that with properly tuned learning rates, pretrained transformers do outperform or match training from scratch on all of our tasks, but only when the entire model is finetuned. Thus, while transfer from pretrained language models to other modalities does indeed provide gains and hints at exciting possibilities for future work, properly tuning hyperparameters is important for arriving at robust findings.
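To make the comparison concrete, below is a minimal sketch (not the authors' code) of the two regimes the abstract contrasts, using GPT-2 from Hugging Face Transformers. The linear task head, the ten-class output size, and the learning-rate grid are illustrative assumptions; the frozen regime follows the FPT recipe of Lu et al. (2021), which keeps layer norms and embeddings trainable while freezing the transformer blocks.

```python
import torch
from torch import nn
from transformers import GPT2Model

def build_model(freeze_body: bool) -> nn.ModuleDict:
    # Pretrained transformer body; weights come from language-model pretraining.
    body = GPT2Model.from_pretrained("gpt2")
    if freeze_body:
        # FPT regime (Lu et al., 2021): freeze the self-attention and MLP
        # blocks; layer norms and embeddings stay trainable. The name filters
        # below match GPT-2's parameter naming ("ln_*", "wte", "wpe").
        for name, param in body.named_parameters():
            if not any(key in name for key in ("ln", "wte", "wpe")):
                param.requires_grad = False
    # Hypothetical linear head for a 10-class downstream task.
    head = nn.Linear(body.config.n_embd, 10)
    return nn.ModuleDict({"body": body, "head": head})

# Sweep the learning rate separately for each regime: reusing one learning
# rate across regimes is exactly the confound the paper identifies.
for freeze_body in (True, False):
    for lr in (3e-5, 1e-4, 3e-4, 1e-3):  # illustrative grid
        model = build_model(freeze_body)
        trainable = [p for p in model.parameters() if p.requires_grad]
        optimizer = torch.optim.Adam(trainable, lr=lr)
        # ... train on the transfer task, validate, and keep the best
        # learning rate per regime before comparing the regimes ...
```

The design point is that the best learning rate is selected independently for each regime; under a single shared learning rate, the frozen model can appear to win simply because that rate happens to suit it better.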
