TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning

04/14/2021
by Kexin Wang, et al.

Learning sentence embeddings often requires a large amount of labeled data. However, for most tasks and domains, labeled data is seldom available and creating it is expensive. In this work, we present a new state-of-the-art unsupervised method based on pre-trained Transformers and Sequential Denoising Auto-Encoder (TSDAE) which outperforms previous approaches by up to 6.4 points. It can achieve up to 93.1% of the performance of in-domain supervised approaches. Further, we show that TSDAE is a strong pre-training method for learning sentence embeddings, significantly outperforming other approaches like Masked Language Model. A crucial shortcoming of previous studies is the narrow evaluation: most work mainly evaluates on the single task of Semantic Textual Similarity (STS), which does not require any domain knowledge. It is unclear if these proposed methods generalize to other domains and tasks. We fill this gap and evaluate TSDAE and other recent approaches on four different datasets from heterogeneous domains.
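To make the training setup concrete, the sketch below shows how a denoising auto-encoder objective of this kind can be trained on unlabeled, in-domain sentences. It is a minimal sketch, assuming the sentence-transformers library's DenoisingAutoEncoderDataset and DenoisingAutoEncoderLoss; the model name, example sentences, and hyperparameters are illustrative placeholders rather than values taken from the paper.

```python
# Minimal TSDAE-style unsupervised training sketch.
# Assumes the sentence-transformers library; corpus, model name, and
# hyperparameters below are placeholders, not the paper's settings.
from sentence_transformers import SentenceTransformer, models, datasets, losses
from torch.utils.data import DataLoader

# Unlabeled sentences from the target domain (placeholder data).
train_sentences = [
    "A sentence from the target domain.",
    "Another unlabeled sentence from the same corpus.",
]

# Encoder: a pre-trained Transformer with CLS pooling to get a fixed-size embedding.
word_embedding_model = models.Transformer("bert-base-uncased")
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(), "cls"
)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# The dataset pairs each original sentence with a noised (token-deleted) copy;
# the loss trains a decoder to reconstruct the original from the pooled embedding.
train_dataset = datasets.DenoisingAutoEncoderDataset(train_sentences)
train_dataloader = DataLoader(train_dataset, batch_size=8, shuffle=True)
train_loss = losses.DenoisingAutoEncoderLoss(
    model, decoder_name_or_path="bert-base-uncased", tie_encoder_decoder=True
)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    scheduler="constantlr",
    optimizer_params={"lr": 3e-5},
    show_progress_bar=True,
)
model.save("output/tsdae-model")
```

The key design point is that the decoder only sees the single pooled sentence vector, so reconstructing the uncorrupted input forces the encoder to compress sentence meaning into that embedding without any labeled pairs.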
