DebCSE: Rethinking Unsupervised Contrastive Sentence Embedding Learning in the Debiasing Perspective

09/14/2023
by   Pu Miao, et al.
0

Several prior studies have suggested that word frequency biases can cause the Bert model to learn indistinguishable sentence embeddings. Contrastive learning schemes such as SimCSE and ConSERT have already been adopted successfully in unsupervised sentence embedding to improve the quality of embeddings by reducing this bias. However, these methods still introduce new biases such as sentence length bias and false negative sample bias, that hinders model's ability to learn more fine-grained semantics. In this paper, we reexamine the challenges of contrastive sentence embedding learning from a debiasing perspective and argue that effectively eliminating the influence of various biases is crucial for learning high-quality sentence embeddings. We think all those biases are introduced by simple rules for constructing training data in contrastive learning and the key for contrastive learning sentence embedding is to mimic the distribution of training data in supervised machine learning in unsupervised way. We propose a novel contrastive framework for sentence embedding, termed DebCSE, which can eliminate the impact of these biases by an inverse propensity weighted sampling method to select high-quality positive and negative pairs according to both the surface and semantic similarity between sentences. Extensive experiments on semantic textual similarity (STS) benchmarks reveal that DebCSE significantly outperforms the latest state-of-the-art models with an average Spearman's correlation coefficient of 80.33

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/16/2022

SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples

Unsupervised sentence embedding aims to obtain the most appropriate embe...
research
03/11/2022

A Sentence is Worth 128 Pseudo Tokens: A Semantic-Aware Contrastive Learning Framework for Sentence Embeddings

Contrastive learning has shown great potential in unsupervised sentence ...
research
05/02/2022

Debiased Contrastive Learning of Unsupervised Sentence Representations

Recently, contrastive learning has been shown to be effective in improvi...
research
09/09/2021

Smoothed Contrastive Learning for Unsupervised Sentence Embedding

Contrastive learning has been gradually applied to learn high-quality un...
research
03/21/2021

An Unsupervised Sampling Approach for Image-Sentence Matching Using Document-Level Structural Information

In this paper, we focus on the problem of unsupervised image-sentence ma...
research
01/28/2022

PCL: Peer-Contrastive Learning with Diverse Augmentations for Unsupervised Sentence Embeddings

Learning sentence embeddings in an unsupervised manner is fundamental in...
research
12/09/2022

MED-SE: Medical Entity Definition-based Sentence Embedding

We propose Medical Entity Definition-based Sentence Embedding (MED-SE), ...

Please sign up or login with your details

Forgot password? Click here to reset