Latte-Mix: Measuring Sentence Semantic Similarity with Latent Categorical Mixtures

10/21/2020
by M. Li, et al.

Measuring sentence semantic similarity with pre-trained language models such as BERT generally yields unsatisfactory zero-shot performance, and one main reason is ineffective token aggregation methods such as mean pooling. In this paper, we demonstrate under a Bayesian framework that distances between primitive statistics such as the mean of word embeddings are fundamentally flawed for capturing sentence-level semantic similarity. To remedy this issue, we propose to learn a categorical variational autoencoder (VAE) on top of off-the-shelf pre-trained language models. We theoretically prove that measuring the distance between the latent categorical mixtures, namely Latte-Mix, better reflects true sentence semantic similarity. Our Bayesian framework also explains why models finetuned on labelled sentence pairs achieve better zero-shot performance, and we empirically demonstrate that these finetuned models can be further improved by Latte-Mix. Our method not only achieves state-of-the-art zero-shot performance on semantic similarity datasets such as STS, but also trains quickly and has a small memory footprint.
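As a rough illustration of the contrast the abstract draws, the sketch below compares mean pooling with a mixture-of-categoricals sentence representation. It is not the paper's implementation: the linear-softmax encoder `W` is a random stand-in for the trained categorical VAE encoder, total variation distance between mixtures is an assumed choice of metric, and the random arrays stand in for BERT token embeddings.

```python
import numpy as np

def mean_pool_similarity(E1, E2):
    """Baseline: cosine similarity of mean-pooled token embeddings."""
    m1, m2 = E1.mean(axis=0), E2.mean(axis=0)
    return float(m1 @ m2 / (np.linalg.norm(m1) * np.linalg.norm(m2)))

def categorical_mixture(E, W):
    """Map each token embedding to a categorical distribution over K
    latent classes (softmax of a linear projection), then average the
    per-token categoricals into a single sentence-level mixture."""
    logits = E @ W                                # (n_tokens, K)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    return p.mean(axis=0)                         # mixture over tokens

def latte_mix_similarity(E1, E2, W):
    """Similarity as 1 minus the total variation distance between the
    two sentence-level latent categorical mixtures (the distance choice
    here is an assumption, not necessarily the paper's metric)."""
    q1, q2 = categorical_mixture(E1, W), categorical_mixture(E2, W)
    return 1.0 - 0.5 * np.abs(q1 - q2).sum()

# Toy usage with random arrays standing in for BERT token embeddings.
rng = np.random.default_rng(0)
d, K = 768, 64
W = rng.normal(scale=0.02, size=(d, K))   # stand-in for the trained VAE encoder
E1 = rng.normal(size=(12, d))             # sentence 1: 12 tokens
E2 = rng.normal(size=(9, d))              # sentence 2: 9 tokens
print(mean_pool_similarity(E1, E2), latte_mix_similarity(E1, E2, W))
```

The point of the contrast: mean pooling collapses a sentence to a single vector before comparison, whereas the mixture keeps a distribution over latent classes, which is the object whose distance the paper argues better tracks sentence-level semantics.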


research · 11/02/2020
On the Sentence Embeddings from Pre-trained Language Models
Pre-trained contextual representations like BERT have achieved great suc...

research · 01/07/2021
Ask2Transformers: Zero-Shot Domain labelling with Pre-trained Language Models
In this paper we present a system that exploits different pre-trained La...

research · 06/04/2019
Toward Grammatical Error Detection from Sentence Labels: Zero-shot Sequence Labeling with CNNs and Contextualized Embeddings
Zero-shot grammatical error detection is the task of tagging token-level...

research · 02/10/2022
Distilling Hypernymy Relations from Language Models: On the Effectiveness of Zero-Shot Taxonomy Induction
In this paper, we analyze zero-shot taxonomy learning methods which are ...

research · 09/30/2022
What Makes Pre-trained Language Models Better Zero/Few-shot Learners?
In this paper, we propose a theoretical framework to explain the efficac...

research · 02/27/2015
Probabilistic Zero-shot Classification with Semantic Rankings
In this paper we propose a non-metric ranking-based representation of se...

research · 03/23/2023
A Novel Patent Similarity Measurement Methodology: Semantic Distance and Technological Distance
Measuring similarity between patents is an essential step to ensure nove...
