Contrastive Learning of Sentence Embeddings from Scratch

05/24/2023
by Junlei Zhang, et al.

Contrastive learning has been the dominant approach for training state-of-the-art sentence embeddings. Previous studies have typically learned sentence embeddings either from human-annotated natural language inference (NLI) data or from large-scale unlabeled sentences in an unsupervised manner. However, even unlabeled data can be difficult to acquire in certain domains. To address these issues, we present SynCSE, a contrastive learning framework that trains sentence embeddings with synthesized data. Specifically, we explore using large language models to synthesize the data samples required for contrastive learning, either by (1) producing positive and negative annotations for given unlabeled sentences (SynCSE-partial), or by (2) generating the sentences along with their corresponding annotations from scratch (SynCSE-scratch). Experimental results on sentence similarity and reranking tasks indicate that both SynCSE-partial and SynCSE-scratch greatly outperform unsupervised baselines, and that SynCSE-partial even achieves performance comparable to supervised models in most settings.
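
To make the role of the positive and negative annotations concrete, the following is a minimal sketch of a contrastive objective over (sentence, positive, negative) triplets. The abstract does not state the exact loss, so this assumes a SimCSE-style InfoNCE formulation with in-batch negatives plus one hard negative per sentence; the function names, the temperature of 0.05, and the use of precomputed embedding matrices are all illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def contrastive_loss(anchors, positives, negatives, temperature=0.05):
    """InfoNCE-style loss over a batch of embedding triplets (assumed form).

    anchors, positives, negatives: arrays of shape (batch, dim). Row i of
    `positives` is the positive for row i of `anchors`; every other positive
    and every negative in the batch acts as a negative for anchor i.
    """
    a = l2_normalize(anchors)
    p = l2_normalize(positives)
    n = l2_normalize(negatives)

    sim_pos = a @ p.T / temperature  # (batch, batch) anchor-positive similarities
    sim_neg = a @ n.T / temperature  # (batch, batch) anchor-negative similarities
    logits = np.concatenate([sim_pos, sim_neg], axis=1)  # (batch, 2 * batch)

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The correct "class" for anchor i is column i (its own positive).
    idx = np.arange(a.shape[0])
    return float(-log_prob[idx, idx].mean())
```

In this sketch the embeddings would come from the sentence encoder being trained, and the positive and negative sentences would be the synthesized annotations. The loss is small when each sentence is closest to its own positive and far from its hard negative.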


research
06/06/2022

Improving Contrastive Learning of Sentence Embeddings with Case-Augmented Positives and Retrieved Negatives

Following SimCSE, contrastive learning based methods have achieved the s...
research
10/30/2022

Generate, Discriminate and Contrast: A Semi-Supervised Sentence Representation Learning Framework

Most sentence embedding techniques heavily rely on expensive human-annot...
research
05/17/2021

Sentence Similarity Based on Contexts

Existing methods to measure sentence similarity are faced with two chall...
research
10/20/2022

Apple of Sodom: Hidden Backdoors in Superior Sentence Embeddings via Contrastive Learning

This paper finds that contrastive learning can produce superior sentence...
research
08/29/2022

Reweighting Strategy based on Synthetic Data Identification for Sentence Similarity

Semantically meaningful sentence embeddings are important for numerous t...
research
05/09/2023

StrAE: Autoencoding for Pre-Trained Embeddings using Explicit Structure

This work explores the utility of explicit structure for representation ...
research
01/28/2022

PCL: Peer-Contrastive Learning with Diverse Augmentations for Unsupervised Sentence Embeddings

Learning sentence embeddings in an unsupervised manner is fundamental in...
