Generate, Discriminate and Contrast: A Semi-Supervised Sentence Representation Learning Framework

10/30/2022
by   Yiming Chen, et al.

Most sentence embedding techniques rely heavily on expensive human-annotated sentence pairs as supervision signals. Despite using large-scale unlabeled data, unsupervised methods typically lag far behind their supervised counterparts on most downstream tasks. In this work, we propose GenSE, a semi-supervised sentence embedding framework that effectively leverages large-scale unlabeled data. Our method includes three parts: 1) Generate: a generator/discriminator model is jointly trained to synthesize sentence pairs from an open-domain unlabeled corpus; 2) Discriminate: noisy sentence pairs are filtered out by the discriminator to obtain high-quality positive and negative sentence pairs; 3) Contrast: a prompt-based contrastive approach is presented for sentence representation learning with both annotated and synthesized data. Comprehensive experiments show that GenSE achieves an average correlation score of 85.19 on the STS datasets and consistent performance improvements on four domain adaptation tasks, significantly surpassing state-of-the-art methods and corroborating its effectiveness and generalization ability. Code, synthetic data, and models are available at https://github.com/MatthewCYM/GenSE.
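The Discriminate and Contrast steps can be illustrated with a minimal sketch. The abstract does not specify the filtering rule, the loss, or the prompt-based encoder, so everything below is an assumption made for illustration: `filter_pairs` keeps pairs above a hypothetical confidence threshold, and `contrastive_loss` is a generic InfoNCE-style objective over cosine similarities with hard negatives, not necessarily the loss GenSE uses. The names, the threshold of 0.9, and the temperature of 0.05 are all illustrative choices, and the embeddings are plain Python lists standing in for encoder outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def filter_pairs(pairs, min_confidence=0.9):
    """Keep synthesized pairs whose confidence clears a threshold.

    Each pair is (sentence_a, sentence_b, label, confidence). The confidence
    field is a stand-in for a discriminator score (assumed, not from the paper).
    """
    return [p for p in pairs if p[3] >= min_confidence]


def contrastive_loss(anchors, positives, negatives, temperature=0.05):
    """Generic InfoNCE-style loss with in-batch positives and hard negatives.

    For anchor i, positives[i] is the target; every other positive and every
    negative acts as a distractor.
    """
    candidates = positives + negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, c) / temperature for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denom - logits[i]
    return total / len(anchors)


if __name__ == "__main__":
    synthesized = [
        ("A man plays guitar.", "A person makes music.", "entailment", 0.97),
        ("A man plays guitar.", "A dog sleeps.", "entailment", 0.41),
    ]
    print(filter_pairs(synthesized))

    anchors = [[1.0, 0.0], [0.0, 1.0]]
    positives = [[1.0, 0.1], [0.1, 1.0]]
    negatives = [[-1.0, 0.0], [0.0, -1.0]]
    print(contrastive_loss(anchors, positives, negatives))
```

In this sketch the loss is small when each anchor is closest to its own positive and grows when positives are mismatched, which is the behavior a contrastive objective is meant to reward.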


