The Effect of Downstream Classification Tasks for Evaluating Sentence Embeddings

04/03/2019
by Peter Potash, et al.

One popular method for quantitatively evaluating sentence embeddings is to use them in downstream language processing tasks that take sentence representations as input. One simple such task is classification, where the representations are used to train and test models on several classification datasets. We argue that evaluating sentence representations in this manner turns the goal of the representations into learning a low-dimensional factorization of a sentence–task-label matrix. We show how characteristics of this matrix affect the ability of a low-dimensional factorization to serve as sentence representations across a suite of classification tasks. In particular, sentences that have more labels across all possible classification tasks incur a higher reconstruction loss, though this effect can be largely negated if the number of such sentences is small.
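The factorization view in the abstract can be made concrete with a small sketch. Below, a hypothetical binary sentence-by-label matrix (labels pooled across several imagined classification tasks) is approximated with a truncated SVD, so each row of the low-rank factor plays the role of a sentence representation; per-sentence reconstruction loss can then be compared between sentences with many labels and sentences with few. All names, sizes, and the label-assignment scheme here are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

# Hypothetical sentence-task label matrix M: rows are sentences, columns
# are class labels pooled across several downstream classification tasks.
# M[i, j] = 1 if sentence i carries label j in some task (assumed setup).
rng = np.random.default_rng(0)
n_sentences, n_labels, rank = 100, 20, 5

M = np.zeros((n_sentences, n_labels))
for i in range(n_sentences):
    # First 10 sentences get many labels (8), the rest only a few (2).
    k = 8 if i < 10 else 2
    cols = rng.choice(n_labels, size=k, replace=False)
    M[i, cols] = 1.0

# Low-dimensional factorization via truncated SVD: M ≈ U_r S_r V_r^T.
# The rows of U_r @ diag(S_r) act as rank-`rank` sentence representations.
U, S, Vt = np.linalg.svd(M, full_matrices=False)
M_hat = U[:, :rank] @ np.diag(S[:rank]) @ Vt[:rank, :]

# Per-sentence squared reconstruction loss.
loss = ((M - M_hat) ** 2).sum(axis=1)
print("mean loss, many-label sentences:", loss[:10].mean())
print("mean loss, few-label sentences: ", loss[10:].mean())
```

In this toy setting the many-label rows typically carry more of the total reconstruction error, echoing the abstract's claim; the total loss itself equals the energy in the discarded singular values, a standard property of the truncated SVD.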


Related research

05/03/2018 · What you can cram into a single vector: Probing sentence embeddings for linguistic properties
Although much effort has recently been devoted to training high-quality ...

11/01/2020 · Vec2Sent: Probing Sentence Embeddings with Natural Language Generation
We introspect black-box sentence embeddings by conditionally generating ...

02/07/2022 · Comparison and Combination of Sentence Embeddings Derived from Different Supervision Signals
We have recently seen many successful applications of sentence embedding...

07/27/2018 · Neural Sentence Embedding using Only In-domain Sentences for Out-of-domain Sentence Detection in Dialog Systems
To ensure satisfactory user experience, dialog systems must be able to d...

06/09/2019 · Encouraging Paragraph Embeddings to Remember Sentence Identity Improves Classification
While paragraph embedding models are remarkably effective for downstream...

03/14/2018 · SentEval: An Evaluation Toolkit for Universal Sentence Representations
We introduce SentEval, a toolkit for evaluating the quality of universal...

09/06/2019 · Efficient Sentence Embedding using Discrete Cosine Transform
Vector averaging remains one of the most popular sentence embedding meth...
