Learning Compressed Sentence Representations for On-Device Text Processing

06/19/2019
by   Dinghan Shen, et al.

Vector representations of sentences, trained on massive text corpora, are widely used as generic sentence embeddings across a variety of NLP problems. The learned representations are generally assumed to be continuous and real-valued, giving rise to a large memory footprint and slow retrieval speed, which hinders their applicability to low-resource (memory and computation) platforms, such as mobile devices. In this paper, we propose four different strategies to transform continuous and generic sentence embeddings into a binarized form, while preserving their rich semantic information. The introduced methods are evaluated across a wide range of downstream tasks, where the binarized sentence embeddings are demonstrated to degrade performance by only about 2%, while reducing the storage requirement by over 98%. Moreover, with the learned binary representations, the semantic relatedness of two sentences can be evaluated by simply calculating their Hamming distance, which is more computationally efficient than the inner-product operation between continuous embeddings. Detailed analysis and a case study further validate the effectiveness of the proposed methods.
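The efficiency argument in the abstract can be illustrated with a small sketch. The binarization rule below (hard thresholding at zero) is only an illustrative stand-in, not one of the paper's four specific strategies; the Hamming-distance computation via XOR and bit counting is the operation the abstract contrasts with a continuous inner product:

```python
import numpy as np

def binarize(embedding, threshold=0.0):
    # Hard-threshold binarization: a simple illustrative rule
    # (the paper proposes four strategies; this is not one of them verbatim).
    return (embedding > threshold).astype(np.uint8)

def hamming_distance(a_bits, b_bits):
    # XOR the bit-packed codes and count differing bits;
    # on packed bytes this replaces a floating-point inner product.
    return int(np.unpackbits(np.packbits(a_bits) ^ np.packbits(b_bits)).sum())

# Two toy 8-dimensional "sentence embeddings" (hypothetical values)
u = np.array([0.3, -1.2, 0.7, 0.1, -0.5, 0.9, -0.2, 0.4])
v = np.array([0.2, -0.8, 0.6, -0.3, -0.4, 1.1, 0.5, 0.3])

print(hamming_distance(binarize(u), binarize(v)))  # → 2
```

Because each dimension shrinks from a 32-bit float to a single bit and the distance reduces to XOR plus popcount, both storage and retrieval cost drop sharply, which is the gain the abstract quantifies.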


