Improving Contrastive Learning of Sentence Embeddings with Case-Augmented Positives and Retrieved Negatives

06/06/2022
by Wei Wang, et al.

Following SimCSE, contrastive learning-based methods have achieved state-of-the-art (SOTA) performance in learning sentence embeddings. However, unsupervised contrastive learning methods still lag far behind their supervised counterparts. We attribute this gap to the quality of positive and negative samples, and aim to improve both. Specifically, for positive samples, we propose switch-case augmentation, which flips the case of the first letter of randomly selected words in a sentence. This counteracts the intrinsic bias of pre-trained token embeddings toward frequency, word case, and subwords. For negative samples, we sample hard negatives from the whole dataset based on a pre-trained language model. Combining these two methods with SimCSE, our proposed Contrastive learning with Augmented and Retrieved Data for Sentence embedding (CARDS) method significantly surpasses the current SOTA on STS benchmarks in the unsupervised setting.
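
As a rough illustration of the two ideas in the abstract, the sketch below flips the first-letter case of randomly chosen words and picks the most similar other sentences as hard negatives. It is a minimal sketch under assumptions, not the CARDS implementation: the function names, the selection probability `p`, the use of cosine similarity, and the toy vectors standing in for pre-trained language model embeddings are all hypothetical, since the abstract does not specify them.

```python
import math
import random


def switch_case(sentence, p=0.05, rng=None):
    """Flip the case of the first letter of randomly selected words.

    `p` is an assumed per-word selection probability; the abstract
    does not state how words are chosen.
    """
    rng = rng or random.Random()
    augmented = []
    for word in sentence.split(" "):
        # Only touch words that start with a letter, so tokens such as
        # "(SOTA)" or "2022" pass through unchanged.
        if word and word[0].isalpha() and rng.random() < p:
            word = word[0].swapcase() + word[1:]
        augmented.append(word)
    return " ".join(augmented)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_hard_negatives(embeddings, query_idx, k=1):
    """Return indices of the k sentences most similar to the query.

    `embeddings` is assumed to hold one vector per sentence, produced
    beforehand by some pre-trained language model (not shown here).
    """
    scored = [
        (cosine(embeddings[query_idx], vec), idx)
        for idx, vec in enumerate(embeddings)
        if idx != query_idx
    ]
    # Highest similarity first; ties broken by index for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]


if __name__ == "__main__":
    print(switch_case("the quick brown Fox", p=0.5, rng=random.Random(0)))
    toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    print(retrieve_hard_negatives(toy_embeddings, query_idx=0, k=1))
```

In a SimCSE-style setup, the augmented sentence would presumably act as an additional positive view of the original, and the retrieved indices would supply harder in-batch negatives; how the two are weighted in the loss is not described in the abstract.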

research
10/20/2022

Apple of Sodom: Hidden Backdoors in Superior Sentence Embeddings via Contrastive Learning

This paper finds that contrastive learning can produce superior sentence...
research
01/28/2022

PCL: Peer-Contrastive Learning with Diverse Augmentations for Unsupervised Sentence Embeddings

Learning sentence embeddings in an unsupervised manner is fundamental in...
research
05/24/2023

Contrastive Learning of Sentence Embeddings from Scratch

Contrastive learning has been the dominant approach to train state-of-th...
research
05/09/2018

Adversarial Contrastive Estimation

Learning by contrasting positive and negative samples is a general strat...
research
02/28/2022

A Mutually Reinforced Framework for Pretrained Sentence Embeddings

The lack of labeled data is a major obstacle to learning high-quality se...
research
02/14/2022

Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings

Learning scientific document representations can be substantially improv...
research
05/22/2023

ImSimCSE: Improving Contrastive Learning for Sentence Embeddings from Two Perspectives

This paper aims to improve contrastive learning for sentence embeddings ...
