Bi-Granularity Contrastive Learning for Post-Training in Few-Shot Scene

06/04/2021
by Ruikun Luo, et al.

The major paradigm of applying a pre-trained language model to downstream tasks is to fine-tune it on labeled task data, which often suffers from instability and low performance when labeled examples are scarce. One way to alleviate this problem is to apply post-training on unlabeled task data before fine-tuning, adapting the pre-trained model to the target domain via contrastive learning that considers either token-level or sequence-level similarity. Inspired by the success of sequence masking, we argue that both token-level and sequence-level similarities can be captured with a pair of masked sequences. We therefore propose complementary random masking (CRM), which generates a pair of masked sequences from an input sequence for sequence-level contrastive learning, and then develop contrastive masked language modeling (CMLM) for post-training, integrating both token-level and sequence-level contrastive learning. Empirical results show that CMLM surpasses several recent post-training methods in few-shot settings without the need for data augmentation.
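To make the CRM idea concrete, here is a minimal sketch in plain Python. It assumes a BERT-style `[MASK]` token and reads "complementary" as: a set of positions is sampled and split between two views, so every chosen position is masked in exactly one view and each view keeps the tokens the other hides. The function name `complementary_random_mask`, the `mask_rate` value, and the even splitting rule are illustrative assumptions, not the paper's exact recipe.

```python
import random

MASK = "[MASK]"  # assumed BERT-style mask token


def complementary_random_mask(tokens, mask_rate=0.3, seed=None):
    """Sketch of complementary random masking (CRM).

    Samples a fraction of positions and splits them between two views,
    so every sampled position is masked in exactly one view. The two
    views can then serve as a positive pair for sequence-level
    contrastive learning, while the masked positions themselves can
    drive token-level objectives. Mask rate and splitting rule are
    assumptions for illustration.
    """
    rng = random.Random(seed)
    n = len(tokens)
    n_masked = max(2, int(n * mask_rate))
    positions = rng.sample(range(n), min(n_masked, n))
    rng.shuffle(positions)

    half = len(positions) // 2
    mask_a = set(positions[:half])   # masked only in view A
    mask_b = set(positions[half:])   # masked only in view B

    view_a = [MASK if i in mask_a else t for i, t in enumerate(tokens)]
    view_b = [MASK if i in mask_b else t for i, t in enumerate(tokens)]
    return view_a, view_b


if __name__ == "__main__":
    toks = "the model adapts to the target domain".split()
    a, b = complementary_random_mask(toks, mask_rate=0.5, seed=0)
    print(a)  # e.g. masks a subset of positions
    print(b)  # masks the complementary subset
```

Because the two masked sets are disjoint, the views are non-trivially different yet come from the same sentence, which is what makes them a natural positive pair for the sequence-level contrastive term.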

Related research

10/15/2020
Fine-Tuning Pre-trained Language Model with Weak Supervision: A Contrastive-Regularized Self-Training Approach
Fine-tuned pre-trained language models (LMs) achieve enormous success in...

10/29/2022
Differentiable Data Augmentation for Contrastive Sentence Representation Learning
Fine-tuning a pre-trained language model via the contrastive learning fr...

02/07/2021
CSS-LM: A Contrastive Framework for Semi-supervised Fine-tuning of Pre-trained Language Models
Fine-tuning pre-trained language models (PLMs) has demonstrated its effe...

09/20/2022
Automatic Label Sequence Generation for Prompting Sequence-to-sequence Models
Prompting, which casts downstream applications as language modeling task...

04/17/2023
VECO 2.0: Cross-lingual Language Model Pre-training with Multi-granularity Contrastive Learning
Recent studies have demonstrated the potential of cross-lingual transfer...

01/14/2022
Sequence-to-Sequence Models for Extracting Information from Registration and Legal Documents
A typical information extraction pipeline consists of token- or span-lev...

10/28/2022
Assessing Phrase Break of ESL speech with Pre-trained Language Models
This work introduces an approach to assessing phrase break in ESL learne...
