Sentence Similarity Based on Contexts

05/17/2021
by   Xiaofei Sun, et al.
0

Existing methods to measure sentence similarity are faced with two challenges: (1) labeled datasets are usually limited in size, making them insufficient to train supervised neural models; (2) there is a training-test gap for unsupervised language modeling (LM) based models to compute semantic scores between sentences, since sentence-level semantics are not explicitly modeled at training. This results in inferior performances in this task. In this work, we propose a new framework to address these two issues. The proposed framework is based on the core idea that the meaning of a sentence should be defined by its contexts, and that sentence similarity can be measured by comparing the probabilities of generating two sentences given the same context. The proposed framework is able to generate high-quality, large-scale dataset with semantic similarity scores between two sentences in an unsupervised manner, with which the train-test gap can be largely bridged. Extensive experiments show that the proposed framework achieves significant performance boosts over existing baselines under both the supervised and unsupervised settings across different datasets.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/24/2023

Contrastive Learning of Sentence Embeddings from Scratch

Contrastive learning has been the dominant approach to train state-of-th...
research
10/23/2017

Testing the limits of unsupervised learning for semantic similarity

Semantic Similarity between two sentences can be defined as a way to det...
research
09/01/2021

ConRPG: Paraphrase Generation using Contexts as Regularizer

A long-standing issue with paraphrase generation is how to obtain reliab...
research
03/01/2018

Matching Natural Language Sentences with Hierarchical Sentence Factorization

Semantic matching of natural language sentences or identifying the relat...
research
09/26/2017

Generating Sentences by Editing Prototypes

We propose a new generative model of sentences that first samples a prot...
research
09/12/2023

Narrowing the Gap between Supervised and Unsupervised Sentence Representation Learning with Large Language Model

Sentence Representation Learning (SRL) is a fundamental task in Natural ...
research
05/18/2019

Semantic flow in language networks

In this study we propose a framework to characterize documents based on ...

Please sign up or login with your details

Forgot password? Click here to reset