Towards Lossless Encoding of Sentences

06/04/2019
by Gabriele Prato, et al.

Much work has been done on image compression via machine learning, but the compression of natural language has received little attention. Compressing text into lossless representations while keeping features easily retrievable is not a trivial task, yet it has substantial benefits. Most methods designed to produce feature-rich sentence embeddings focus solely on performing well on downstream tasks and cannot properly reconstruct the original sequence from the learned embedding. In this work, we propose a near-lossless method for encoding long sequences of text, as well as all of their sub-sequences, into feature-rich representations. We test our method on sentiment analysis and show good performance across all sub-sentence and sentence embeddings.

research
02/20/2018

SufiSent - Universal Sentence Representations Using Suffix Encodings

Computing universal distributed representations of sentences is a fundam...
research
10/02/2021

Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences

Sentence embedding methods offer a powerful approach for working with sh...
research
02/07/2022

Comparison and Combination of Sentence Embeddings Derived from Different Supervision Signals

We have recently seen many successful applications of sentence embedding...
research
07/01/2019

Representation, Exploration and Recommendation of Music Playlists

Playlists have become a significant part of our listening experience bec...
research
06/19/2019

Learning Compressed Sentence Representations for On-Device Text Processing

Vector representations of sentences, trained on massive text corpora, ar...
research
05/04/2023

Sentence Embedding Leaks More Information than You Expect: Generative Embedding Inversion Attack to Recover the Whole Sentence

Sentence-level representations are beneficial for various natural langua...
research
05/10/2022

Sentence-level Privacy for Document Embeddings

User language data can contain highly sensitive personal content. As suc...