Difference-Masking: Choosing What to Mask in Continued Pretraining

05/23/2023
by Alex Wilf, et al.

Self-supervised learning (SSL), and the masking-and-predicting objective in particular, has led to promising performance on a variety of downstream tasks. However, while most approaches mask tokens uniformly at random, there is strong intuition from the field of education that deciding what to mask can substantially improve learning outcomes. We introduce Difference-Masking, an approach that automatically chooses what to mask during continued pretraining by considering what makes an unlabelled target domain different from the pretraining domain. Empirically, we find that Difference-Masking outperforms baselines in continued pretraining settings across four diverse language and multimodal video tasks. The cross-task applicability of Difference-Masking supports the effectiveness of our framework for SSL pretraining in language, vision, and other domains.
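To make the core idea concrete, below is a minimal sketch of difference-based masking. It assumes whitespace tokenization and uses a smoothed word-frequency ratio between the target and pretraining corpora as the "difference" score; the function names (difference_scores, difference_mask), the scoring heuristic, and the deterministic top-k selection are illustrative assumptions, not the paper's exact anchor-selection method.

```python
from collections import Counter

MASK_TOKEN = "[MASK]"  # placeholder; the real mask token is model-specific


def difference_scores(target_docs, pretrain_docs):
    """Score each token by how much more frequent it is in the target
    domain than in the pretraining domain (a smoothed frequency ratio;
    an illustrative stand-in for the paper's anchor-scoring step)."""
    tgt = Counter(tok for doc in target_docs for tok in doc.split())
    pre = Counter(tok for doc in pretrain_docs for tok in doc.split())
    tgt_total = sum(tgt.values()) or 1
    pre_total = sum(pre.values()) or 1
    return {
        tok: (n / tgt_total) / ((pre.get(tok, 0) + 1) / (pre_total + 1))
        for tok, n in tgt.items()
    }


def difference_mask(tokens, scores, mask_rate=0.15):
    """Mask the mask_rate fraction of tokens with the highest difference
    scores, instead of sampling mask positions uniformly at random."""
    k = max(1, int(len(tokens) * mask_rate))
    ranked = sorted(range(len(tokens)),
                    key=lambda i: scores.get(tokens[i], 0.0),
                    reverse=True)
    masked = set(ranked[:k])
    return [MASK_TOKEN if i in masked else tok
            for i, tok in enumerate(tokens)]


# Usage: tokens distinctive to the (toy) target domain get masked first.
scores = difference_scores(
    ["the patient reported acute dyspnea", "dyspnea and chest pain noted"],
    ["the cat sat on the mat", "the dog chased the ball"],
)
print(difference_mask("the patient reported acute dyspnea".split(), scores))
```

In practice one would work at the subword level and mask spans similar to domain-distinctive anchors rather than exact word matches; the top-k selection above is simply the most direct instance of non-uniform, domain-aware masking.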

Related research

03/30/2022
MAE-AST: Masked Autoencoding Audio Spectrogram Transformer
In this paper, we propose a simple yet powerful improvement over the rec...

08/12/2020
Variance-reduced Language Pretraining via a Mask Proposal Network
Self-supervised learning, a.k.a., pretraining, is important in natural l...

09/16/2023
RMP: A Random Mask Pretrain Framework for Motion Prediction
As the pretraining technique is growing in popularity, little work has b...

04/11/2023
MRVM-NeRF: Mask-Based Pretraining for Neural Radiance Fields
Most Neural Radiance Fields (NeRFs) have poor generalization ability, li...

03/12/2023
Improving Masked Autoencoders by Learning Where to Mask
Masked image modeling is a promising self-supervised learning method for...

12/10/2022
Uniform Masking Prevails in Vision-Language Pretraining
Masked Language Modeling (MLM) has proven to be an essential component o...

05/18/2023
How does the task complexity of masked pretraining objectives affect downstream performance?
Masked language modeling (MLM) is a widely used self-supervised pretrain...
