Virtual Augmentation Supported Contrastive Learning of Sentence Representations

10/16/2021
by Dejiao Zhang, et al.

Despite profound successes, contrastive representation learning relies on carefully designed data augmentations that use domain-specific knowledge. This challenge is magnified in natural language processing, where no general rules for data augmentation exist due to the discrete nature of natural language. We tackle this challenge by presenting Virtual augmentation Supported Contrastive Learning of sentence representations (VaSCL). Starting from the interpretation that data augmentation essentially constructs a neighborhood around each training instance, we in turn utilize the neighborhood to generate effective data augmentations. Leveraging the large training batch sizes typical of contrastive learning, we approximate the neighborhood of an instance by its K-nearest in-batch neighbors in the representation space. We then define an instance discrimination task within this neighborhood and generate the virtual augmentation in an adversarial training manner. We assess the performance of VaSCL on a wide range of downstream tasks and set a new state of the art for unsupervised sentence representation learning.
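The abstract describes a two-step mechanism: approximate each instance's neighborhood by its K nearest in-batch neighbors, then generate a "virtual augmentation" adversarially, i.e., a perturbation of the representation that most increases an instance discrimination loss defined over that neighborhood. Below is a minimal PyTorch sketch of this idea; the function names, the single-step FGSM-style perturbation, the epsilon/temperature values, and the exact loss form are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn.functional as F


def neighborhood_discrimination_loss(z_anchor, z_pos, k=4, tau=0.1):
    """InfoNCE-style instance discrimination restricted to each anchor's
    K nearest in-batch neighbors (an assumed loss form)."""
    z_anchor = F.normalize(z_anchor, dim=-1)
    z_pos = F.normalize(z_pos, dim=-1)
    sim = z_anchor @ z_anchor.t()                              # (B, B) cosine similarities
    eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    sim = sim.masked_fill(eye, float("-inf"))                  # exclude self-similarity
    knn_idx = sim.topk(k, dim=-1).indices                      # (B, K) nearest in-batch neighbors
    pos = (z_anchor * z_pos).sum(-1, keepdim=True)             # (B, 1) positive-pair similarity
    neg = (z_anchor.unsqueeze(1) * z_anchor[knn_idx]).sum(-1)  # (B, K) neighbor negatives
    logits = torch.cat([pos, neg], dim=-1) / tau
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)                     # positive sits at index 0


def virtual_augmentation(z, k=4, tau=0.1, eps=1e-2):
    """One gradient-ascent step in representation space: the virtual
    augmentation is the perturbation (within an eps-ball) that most
    increases the neighborhood discrimination loss (FGSM-style assumption)."""
    delta = torch.zeros_like(z, requires_grad=True)
    loss = neighborhood_discrimination_loss(z + delta, z.detach(), k=k, tau=tau)
    (grad,) = torch.autograd.grad(loss, delta)
    return (z + eps * F.normalize(grad, dim=-1)).detach()


# Usage sketch: encode a batch, build virtual augmentations, train contrastively.
B, D = 32, 768
z = torch.randn(B, D, requires_grad=True)  # stand-in for sentence encoder outputs
z_virtual = virtual_augmentation(z)
loss = neighborhood_discrimination_loss(z, z_virtual)
loss.backward()  # gradients would flow back into the encoder

Note that the perturbation is applied directly to the representation rather than to discrete tokens, which is what lets this approach sidestep the lack of general rules for textual data augmentation.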


Related research:

10/17/2020 · i-Mix: A Strategy for Regularizing Contrastive Representation Learning
Contrastive representation learning has shown to be an effective way of ...

05/25/2021 · ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer
Learning high-quality sentence representations benefits a wide range of ...

10/08/2022 · SDA: Simple Discrete Augmentation for Contrastive Sentence Representation Learning
Contrastive learning methods achieve state-of-the-art results in unsuper...

05/20/2022 · Data Augmentation for Compositional Data: Advancing Predictive Models of the Microbiome
Data augmentation plays a key role in modern machine learning pipelines....

05/21/2023 · Contrastive Learning with Logic-driven Data Augmentation for Logical Reasoning over Text
Pre-trained large language model (LLM) is under exploration to perform N...

03/18/2020 · Watching the World Go By: Representation Learning from Unlabeled Videos
Recent single image unsupervised representation learning techniques show...

09/12/2023 · Narrowing the Gap between Supervised and Unsupervised Sentence Representation Learning with Large Language Model
Sentence Representation Learning (SRL) is a fundamental task in Natural ...
