Near-lossless Binarization of Word Embeddings

03/24/2018
by   Julien Tissier, et al.
0

Is it possible to learn binary word embeddings of arbitrary size from their real-value counterparts with (almost) no loss in task performance? If so, inferences performed in downstream NLP applications would benefit a massive speed-up brought by binary representations. In this paper, we derive an autoencoder architecture to learn semantic preserving binary embeddings from existing real-value ones. A binary encoder requires an element-wise function that outputs a bit given a real value - it is unfortunately highly non-differentiable. We propose to use the same parameters matrix for the encoder and the decoder so that we are able to learn weights using the decoder. We also show that it is possible and desirable to minimize the correlation between the different binary features at training time through ad hoc regularization. The learned binary codes can be of arbitrary sizes, and the binary representations yield (almost) the same performances than their real-value counterparts. Finally, we show that we are able to recon- struct semantic-preserving real-value embeddings from the binary embeddings - the cherry on top.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/03/2019

Black is to Criminal as Caucasian is to Police:Detecting and Removing Multiclass Bias in Word Embeddings

Online texts -- across genres, registers, domains, and styles -- are rid...
research
08/25/2019

On Measuring and Mitigating Biased Inferences of Word Embeddings

Word embeddings carry stereotypical connotations from the text they are ...
research
11/15/2017

Bridging Source and Target Word Embeddings for Neural Machine Translation

Neural machine translation systems encode a source sequence into a vecto...
research
01/18/2021

Can a Fruit Fly Learn Word Embeddings?

The mushroom body of the fruit fly brain is one of the best studied syst...
research
11/03/2017

Compressing Word Embeddings via Deep Compositional Code Learning

Natural language processing (NLP) models often require a massive number ...
research
08/15/2019

Hamming Sentence Embeddings for Information Retrieval

In retrieval applications, binary hashes are known to offer significant ...
research
02/24/2021

Abelian Neural Networks

We study the problem of modeling a binary operation that satisfies some ...

Please sign up or login with your details

Forgot password? Click here to reset