Train No Evil: Selective Masking for Task-guided Pre-training

04/21/2020
by Yuxian Gu, et al.

Recently, pre-trained language models have mostly followed the pre-train-then-fine-tune paradigm and achieved strong performance on various downstream tasks. However, because general pre-training is not guided by any task and fine-tuning relies on only a small amount of in-domain supervised data, such two-stage models often fail to capture domain-specific and task-specific language patterns well. In this paper, we propose a task-guided pre-training stage with selective masking, inserted between general pre-training and fine-tuning. In this stage, the model is trained with masked language modeling on in-domain unsupervised data, which allows it to learn domain-specific language patterns effectively. To learn task-specific language patterns efficiently, we adopt a selective masking strategy instead of conventional random masking: only the tokens that are important to the downstream task are masked. Specifically, we define the importance of a token by its impact on the final classification result and use a neural model to learn these implicit selection rules. Experimental results on two sentiment analysis tasks show that our method achieves comparable or even better performance with less than 50% of the overall computation cost, indicating that it is both effective and efficient. The source code will be released in the future.
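
The core idea can be sketched in a few lines. Below is a minimal, illustrative Python sketch (not the authors' released code, which is not yet public) of the selective masking step: score each token by how much removing it lowers a classifier's confidence in the gold label, then mask only the top-scoring tokens for the task-guided MLM stage. The `classify` callable and all function names here are hypothetical placeholders; in the paper, a separate neural model is additionally trained on such importance signals so that the selection rule generalizes to unlabeled in-domain text.

```python
from typing import Callable, List

def token_importance(
    tokens: List[str],
    label: int,
    classify: Callable[[List[str]], List[float]],
) -> List[float]:
    """Score each token by the drop in gold-label probability when it is removed.

    `classify` is assumed to return a probability distribution over labels,
    e.g. from a classifier fine-tuned on the small supervised set.
    """
    base = classify(tokens)[label]
    scores = []
    for i in range(len(tokens)):
        reduced = tokens[:i] + tokens[i + 1:]
        scores.append(base - classify(reduced)[label])
    return scores

def select_mask_positions(scores: List[float], mask_ratio: float = 0.15) -> List[int]:
    """Pick the highest-scoring positions (instead of random ones) to be masked."""
    k = max(1, int(len(scores) * mask_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def apply_selective_mask(
    tokens: List[str], positions: List[int], mask_token: str = "[MASK]"
) -> List[str]:
    """Replace the selected positions with the mask token for MLM training."""
    return [mask_token if i in positions else tok for i, tok in enumerate(tokens)]
```

This sketch only covers scoring and masking on labeled examples; the efficiency of the full method comes from running the learned selection model over large unlabeled in-domain corpora rather than re-scoring every token with the classifier.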


research
11/24/2022

Using Selective Masking as a Bridge between Pre-training and Fine-tuning

Pre-training a language model and then fine-tuning it for downstream tas...
research
01/31/2023

ZhichunRoad at Amazon KDD Cup 2022: MultiTask Pre-Training for E-Commerce Product Search

In this paper, we propose a robust multilingual model to improve the qua...
research
07/14/2023

Do not Mask Randomly: Effective Domain-adaptive Pre-training by Masking In-domain Keywords

We propose a novel task-agnostic in-domain pre-training method that sits...
research
07/27/2022

Leveraging GAN Priors for Few-Shot Part Segmentation

Few-shot part segmentation aims to separate different parts of an object...
research
06/24/2021

Pre-training transformer-based framework on large-scale pediatric claims data for downstream population-specific tasks

The adoption of electronic health records (EHR) has become universal dur...
research
10/09/2020

Style Attuned Pre-training and Parameter Efficient Fine-tuning for Spoken Language Understanding

Neural models have yielded state-of-the-art results in deciphering spoke...
research
02/16/2022

Should You Mask 15% in Masked Language Modeling?

Masked language models conventionally use a masking rate of 15% due to the belief th...
