CAPT: Contrastive Pre-Training for LearningDenoised Sequence Representations

by   Fuli Luo, et al.

Pre-trained self-supervised models such as BERT have achieved striking success in learning sequence representations, especially for natural language processing. These models typically corrupt the given sequences with certain types of noise, such as masking, shuffling, or substitution, and then try to recover the original input. However, such pre-training approaches are prone to learning representations that are covariant with the noise, leading to the discrepancy between the pre-training and fine-tuning stage. To remedy this, we present ContrAstive Pre-Training (CAPT) to learn noise invariant sequence representations. The proposed CAPT encourages the consistency between representations of the original sequence and its corrupted version via unsupervised instance-wise training signals. In this way, it not only alleviates the pretrain-finetune discrepancy induced by the noise of pre-training, but also aids the pre-trained model in better capturing global semantics of the input via more effective sentence-level supervision. Different from most prior work that focuses on a particular modality, comprehensive empirical evidence on 11 natural language understanding and cross-modal tasks illustrates that CAPT is applicable for both language and vision-language tasks, and obtains surprisingly consistent improvement, including 0.6 gain on GLUE benchmarks and 0.8


page 1

page 2

page 3

page 4


MimCo: Masked Image Modeling Pre-training with Contrastive Teacher

Recent masked image modeling (MIM) has received much attention in self-s...

W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training

Motivated by the success of masked language modeling (MLM) in pre-traini...

Data Augmentation based Consistency Contrastive Pre-training for Automatic Speech Recognition

Self-supervised acoustic pre-training has achieved amazing results on th...

DictBERT: Dictionary Description Knowledge Enhanced Language Model Pre-training via Contrastive Learning

Although pre-trained language models (PLMs) have achieved state-of-the-a...

Whitening Sentence Representations for Better Semantics and Faster Retrieval

Pre-training models such as BERT have achieved great success in many nat...

PreTraM: Self-Supervised Pre-training via Connecting Trajectory and Map

Deep learning has recently achieved significant progress in trajectory f...

Unsupervised Pre-training for Natural Language Generation: A Literature Review

Recently, unsupervised pre-training is gaining increasing popularity in ...