Discrete Cosine Transform as Universal Sentence Encoder

06/02/2021
by Nada Almarwani, et al.

Modern sentence encoders generate dense vector representations that capture the underlying linguistic characteristics of a sequence of words, such as a phrase, sentence, or paragraph. These representations are well suited to training a classifier for an end task such as sentiment analysis, question answering, or text classification. Different models have been proposed to efficiently generate general-purpose sentence representations for use in pretraining protocols. While averaging is the most commonly used efficient sentence encoder, the Discrete Cosine Transform (DCT) was recently proposed as an alternative that captures the underlying syntactic characteristics of a given text without compromising practical efficiency compared to averaging. However, as with most other sentence encoders, the DCT sentence encoder was evaluated only on English. To address this, we use the DCT encoder to generate universal sentence representations for other languages, such as German, French, Spanish, and Russian. The experimental results show the effectiveness of DCT encoding, with consistent performance improvements over strong baselines on multiple standardized datasets.
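To make the contrast with averaging concrete, here is a minimal sketch of a DCT-style sentence encoder. It is not the authors' implementation. It assumes the common formulation in which a DCT-II is applied along the word axis of the stacked word vectors, separately for each embedding dimension, and the first few coefficients are concatenated. The function names, the orthonormal scaling, and the zero-padding for short sentences are illustrative assumptions.

```python
import math


def dct_ii(signal):
    """Orthonormal DCT-II of a 1-D list of floats (scaling is an assumption)."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        total = sum(
            x * math.cos(math.pi / n * (i + 0.5) * k)
            for i, x in enumerate(signal)
        )
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs


def dct_sentence_embedding(word_vectors, num_coeffs):
    """Concatenate the first `num_coeffs` DCT coefficients of each dimension.

    `word_vectors` is a list of equal-length lists, one per word, in
    sentence order. The result has length num_coeffs * embedding_dim.
    """
    if not word_vectors:
        raise ValueError("sentence must contain at least one word vector")
    num_words = len(word_vectors)
    dim = len(word_vectors[0])
    # Transform each embedding dimension along the word (time) axis.
    per_dim = [dct_ii([vec[d] for vec in word_vectors]) for d in range(dim)]
    embedding = []
    for k in range(num_coeffs):
        for d in range(dim):
            # Hypothetical choice: pad with zeros when the sentence has
            # fewer words than requested coefficients.
            embedding.append(per_dim[d][k] if k < num_words else 0.0)
    return embedding


if __name__ == "__main__":
    # Toy 2-dimensional "word embeddings" for a three-word sentence.
    sentence = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
    print(dct_sentence_embedding(sentence, num_coeffs=2))
```

Under this scaling the zeroth coefficient of each dimension equals the sum of the word vectors divided by the square root of the sentence length, so it carries the same information as the average. The higher coefficients change when the words are reordered, which is the property averaging lacks.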


research
09/06/2019

Efficient Sentence Embedding using Discrete Cosine Transform

Vector averaging remains one of the most popular sentence embedding meth...
research
08/11/2018

Fake Sentence Detection as a Training Task for Sentence Encoding

Sentence encoders are typically trained on language modeling tasks which...
research
06/12/2019

Probing Multilingual Sentence Representations With X-Probe

This paper extends the task of probing sentence representations for ling...
research
10/04/2021

Towards Theme Detection in Personal Finance Questions

Banking call centers receive millions of calls annually, with much of th...
research
09/26/2018

Semantic Sentence Embeddings for Paraphrasing and Text Summarization

This paper introduces a sentence to vector encoding framework suitable f...
research
05/25/2023

Extracting Text Representations for Terms and Phrases in Technical Domains

Extracting dense representations for terms and phrases is a task of grea...
research
02/07/2021

Unsupervised Sentence-embeddings by Manifold Approximation and Projection

The concept of unsupervised universal sentence encoders has gained tract...
