Are Classes Clusters?

04/16/2021
by   Kees Varekamp, et al.
0

Sentence embedding models aim to provide general purpose embeddings for sentences. Most of the models studied in this paper claim to perform well on STS tasks - but they do not report on their suitability for clustering. This paper looks at four recent sentence embedding models (Universal Sentence Encoder (Cer et al., 2018), Sentence-BERT (Reimers and Gurevych, 2019), LASER (Artetxe and Schwenk, 2019), and DeCLUTR (Giorgi et al., 2020)). It gives a brief overview of the ideas behind their implementations. It then investigates how well topic classes in two text classification datasets (Amazon Reviews (Ni et al., 2019) and News Category Dataset (Misra, 2018)) map to clusters in their corresponding sentence embedding space. While the performance of the resulting classification model is far from perfect, it is better than random. This is interesting because the classification model has been constructed in an unsupervised way. The topic classes in these real life topic classification datasets can be partly reconstructed by clustering the corresponding sentence embeddings.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/03/2019

COSTRA 1.0: A Dataset of Complex Sentence Transformations

We present COSTRA 1.0, a dataset of complex sentence transformations. Th...
research
09/23/2019

Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings

Contextualized word embeddings (CWE) such as provided by ELMo (Peters et...
research
06/09/2019

Encouraging Paragraph Embeddings to Remember Sentence Identity Improves Classification

While paragraph embedding models are remarkably effective for downstream...
research
06/14/2017

Idea density for predicting Alzheimer's disease from transcribed speech

Idea Density (ID) measures the rate at which ideas or elementary predica...
research
11/15/2017

Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations

We extend the work of Wieting et al. (2017), back-translating a large pa...
research
06/10/2020

A novel sentence embedding based topic detection method for micro-blog

Topic detection is a challenging task, especially without knowing the ex...
research
12/06/2021

Producing augmentation-invariant embeddings from real-life imagery

This article presents an efficient way to produce feature-rich, high-dim...

Please sign up or login with your details

Forgot password? Click here to reset