On the Information Content of Predictions in Word Analogy Tests

10/18/2022
by Jugurta Montalvão, et al.

An approach is proposed to quantify, in bits of information, the actual relevance of analogies in analogy tests. The main component of this approach is a soft-accuracy estimator that also yields entropy estimates with compensated biases. Experimental results obtained with pre-trained 300-D GloVe vectors and two public analogy test sets show that, from an information-content perspective, proximity hints are far more relevant than the analogies themselves. Accordingly, a simple word-embedding model is used to predict that analogies carry about one bit of information, a prediction that is experimentally corroborated.
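To make the abstract's comparison concrete, below is a minimal sketch, not the paper's soft-accuracy estimator, contrasting the classic 3CosAdd analogy prediction (nearest neighbour of b - a + c) with a proximity-only baseline (nearest neighbour of c alone) on pre-trained 300-D GloVe vectors. The gensim model name "glove-wiki-gigaword-300", the toy item list, and the final surprisal-gap line are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (illustrative, not the paper's estimator): compare
# 3CosAdd analogy predictions against a proximity-only baseline on
# pre-trained 300-D GloVe vectors loaded through gensim.
import math
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-300")  # lowercased vocab, 300-D

# Toy analogy items in the usual a:b :: c:d format (assumed examples).
items = [
    ("man", "king", "woman", "queen"),
    ("paris", "france", "rome", "italy"),
    ("walk", "walked", "swim", "swam"),
]

def analogy_hit(a, b, c, d):
    """Top-1 3CosAdd prediction for b - a + c (gensim excludes cue words)."""
    preds = [w for w, _ in model.most_similar(positive=[b, c],
                                              negative=[a], topn=4)]
    return preds[0] == d

def proximity_hit(c, d, exclude):
    """Proximity-only baseline: nearest neighbour of c alone,
    skipping the cue words so the protocol matches the analogy case."""
    preds = [w for w, _ in model.most_similar(positive=[c], topn=4)
             if w not in exclude]
    return preds[0] == d

n = len(items)
acc_analogy = sum(analogy_hit(*it) for it in items) / n
acc_prox = sum(proximity_hit(it[2], it[3], it[:3]) for it in items) / n

print(f"analogy accuracy:   {acc_analogy:.2f}")
print(f"proximity accuracy: {acc_prox:.2f}")

# Crude bits-style comparison (NOT the paper's bias-compensated entropy
# estimator): the surprisal gap between the two predictors' accuracies.
if acc_prox > 0 and acc_analogy > 0:
    print(f"surprisal gap: {math.log2(acc_analogy / acc_prox):+.2f} bits")
```

On full benchmark sets, the paper's bias-compensated entropy estimates make this kind of comparison precise; per the abstract, the proximity component accounts for most of the predictive information, with the analogy itself contributing roughly one bit.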
