ESimCSE: Enhanced Sample Building Method for Contrastive Learning of Unsupervised Sentence Embedding

09/09/2021
by   Xing Wu, et al.

Contrastive learning has been attracting much attention for learning unsupervised sentence embeddings. The current state-of-the-art unsupervised method is unsupervised SimCSE (unsup-SimCSE). Unsup-SimCSE uses dropout as a minimal data augmentation method: it passes the same input sentence to a pre-trained Transformer encoder (with dropout turned on) twice to obtain two corresponding embeddings that form a positive pair. Because Transformer encoders use position embeddings, the length of a sentence is generally encoded into its embedding, so each positive pair in unsup-SimCSE carries the same length information. A model trained on such pairs is therefore likely to be biased, tending to judge sentences of the same or similar length as more semantically similar. Through statistical observations, we find that unsup-SimCSE does exhibit this problem. To alleviate it, we apply a simple word repetition operation to modify the input sentence, and then feed the input sentence and its modified counterpart to the pre-trained Transformer encoder separately to obtain the positive pair. In addition, drawing inspiration from the computer vision community, we introduce momentum contrast, which enlarges the number of negative pairs without additional computation. These two modifications are applied to positive and negative pairs, respectively, and together they constitute a new sentence embedding method, termed Enhanced Unsup-SimCSE (ESimCSE). We evaluate ESimCSE on several benchmark datasets for the semantic textual similarity (STS) task. Experimental results show that ESimCSE outperforms the state-of-the-art unsup-SimCSE by an average of 2.02% Spearman correlation on BERT-base.
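To make the word repetition idea concrete, here is a minimal Python sketch of one plausible form of the augmentation: randomly duplicating a few tokens so the positive pair differs in length but not in meaning. The duplication ratio dup_rate, the token-level granularity, and the function name are illustrative assumptions, not the authors' exact implementation.

    import random

    def word_repetition(tokens, dup_rate=0.32):
        """Randomly duplicate a subset of tokens so the augmented sentence
        has a different length while keeping the same semantics."""
        # Sample how many tokens to duplicate, up to roughly dup_rate * length.
        dup_len = random.randint(0, max(1, int(dup_rate * len(tokens))))
        dup_idx = set(random.sample(range(len(tokens)), dup_len))
        out = []
        for i, tok in enumerate(tokens):
            out.append(tok)
            if i in dup_idx:
                out.append(tok)  # repeat the chosen token in place
        return out

    # Example: the original and augmented sentences form the positive pair.
    sent = "contrastive learning of sentence embeddings".split()
    print(word_repetition(sent))

Because the two views of a sentence now have different lengths, the encoder can no longer rely on shared length information when pulling positive pairs together.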
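The momentum contrast component follows the MoCo-style recipe from computer vision: negatives are drawn from a FIFO queue of embeddings produced by a slowly updated momentum encoder, so enlarging the negative set requires no extra forward passes. Below is a minimal PyTorch sketch under that assumption; the temperature, momentum coefficient, and function names are illustrative rather than the paper's exact settings.

    import torch
    import torch.nn.functional as F

    def info_nce_with_queue(q_emb, k_emb, queue, temperature=0.05):
        """InfoNCE loss where negatives come from a queue of past
        momentum-encoder outputs (assumed already L2-normalized)."""
        q = F.normalize(q_emb, dim=-1)           # (batch, dim), online encoder
        k = F.normalize(k_emb, dim=-1).detach()  # (batch, dim), momentum encoder
        pos = (q * k).sum(-1, keepdim=True)      # positive similarities, (batch, 1)
        neg = q @ queue.t()                      # similarities to queued negatives
        logits = torch.cat([pos, neg], dim=1) / temperature
        labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
        return F.cross_entropy(logits, labels)   # positive sits at index 0

    @torch.no_grad()
    def momentum_update(online_enc, momentum_enc, m=0.995):
        """EMA update: the momentum encoder slowly tracks the online encoder."""
        for p_o, p_m in zip(online_enc.parameters(), momentum_enc.parameters()):
            p_m.data.mul_(m).add_(p_o.data, alpha=1.0 - m)

After each training step, the current batch of momentum-encoder embeddings is enqueued and the oldest entries are dequeued, keeping the negative pool large at a fixed memory cost.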

