Autoencoding Improves Pre-trained Word Embeddings

10/25/2020
by Masahiro Kaneko, et al.

Prior work investigating the geometry of pre-trained word embeddings has shown that word embeddings are distributed in a narrow cone, and that by centering them and projecting onto principal component vectors one can increase the accuracy of a given set of pre-trained word embeddings. We show that, theoretically, this post-processing step is equivalent to applying a linear autoencoder that minimises the squared ℓ2 reconstruction error. This result contradicts prior work (Mu and Viswanath, 2018) that proposed removing the top principal components from pre-trained embeddings. We experimentally verify our theoretical claims and show that retaining the top principal components is indeed useful for improving pre-trained word embeddings, without requiring access to additional linguistic resources or labelled data.
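
The post-processing the abstract describes can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' released code: it assumes the pre-trained embeddings are given as a NumPy matrix with one row per word, centers them, and reconstructs each vector from its top-k principal components. By the Eckart-Young theorem, this PCA reconstruction is exactly the optimum of a rank-k linear autoencoder trained under the squared ℓ2 reconstruction loss, which is the equivalence the abstract states. The function name, the choice k=100, and the random stand-in data are illustrative assumptions.

```python
# Minimal sketch of the post-processing described in the abstract:
# center the embeddings, then project onto the top-k principal
# components (the optimum of a linear autoencoder under squared
# L2 reconstruction error). Not the authors' implementation.
import numpy as np

def autoencode_embeddings(E, k=100):
    """Center the embedding matrix E (n_words x dim) and reconstruct it
    from its top-k principal components."""
    mu = E.mean(axis=0)
    X = E - mu                            # centering step
    # Principal directions via SVD of the centered matrix;
    # rows of Vt are the principal component vectors.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:k].T                          # top-k directions, shape (dim, k)
    # Projecting onto the top-k subspace RETAINS the top components;
    # adding the mean back returns the result to the original space.
    return X @ V @ V.T + mu

# Hypothetical usage: random vectors stand in for pre-trained embeddings.
E = np.random.randn(10000, 300).astype(np.float32)
E_post = autoencode_embeddings(E, k=100)
```

For contrast, the removal step of Mu and Viswanath (2018) corresponds to the complementary projection, roughly X - X @ V @ V.T (with a small k), i.e. discarding rather than retaining the top-component subspace; the paper's experiments amount to comparing these two operations.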

Related research

01/23/2021  Dictionary-based Debiasing of Pre-trained Word Embeddings
04/17/2021  Embodying Pre-Trained Word Embeddings Through Robot Actions
09/29/2020  Leader: Prefixing a Length for Faster Word Vector Serialization
08/11/2017  Simple and Effective Dimensionality Reduction for Word Embeddings
10/05/2019  On Dimensional Linguistic Properties of the Word Embedding Space
10/01/2019  Improved Word Sense Disambiguation Using Pre-Trained Contextualized Word Representations
03/15/2022  Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost
