Efficient Sentence Embedding using Discrete Cosine Transform

09/06/2019
by Nada Almarwani, et al.

Vector averaging remains one of the most popular sentence embedding methods in spite of its obvious disregard for syntactic structure. While more complex sequential or convolutional networks can yield superior classification performance, their gains in accuracy are typically modest relative to simple vector averaging. As an efficient alternative, we propose using the discrete cosine transform (DCT) to compress word sequences in an order-preserving manner. The lower-order DCT coefficients represent the overall feature patterns in a sentence, which yields embeddings suited to tasks that benefit from syntactic features. Our results on probing tasks demonstrate that DCT embeddings indeed preserve more syntactic information than vector averaging. With practically equivalent complexity, the model yields better overall performance on downstream classification tasks that correlate with syntactic features, which illustrates the capacity of DCT to preserve word order information.
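The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: an orthonormal DCT-II applied along the word axis separately for each embedding dimension, keeping the first `k` coefficients, zero-filling coefficients a short sentence cannot supply, and concatenating the result coefficient by coefficient. The function names `dct_coefficients` and `dct_sentence_embedding` are hypothetical.

```python
import math


def dct_coefficients(signal, k):
    """Return the first k orthonormal DCT-II coefficients of a 1-D signal.

    Assumption: coefficients with index >= len(signal) are set to 0.0.
    """
    n = len(signal)
    coeffs = []
    for j in range(k):
        if j >= n:
            coeffs.append(0.0)
            continue
        # Orthonormal scaling: sqrt(1/n) for j == 0, sqrt(2/n) otherwise.
        scale = math.sqrt(1.0 / n) if j == 0 else math.sqrt(2.0 / n)
        total = sum(
            x * math.cos(math.pi / n * (i + 0.5) * j)
            for i, x in enumerate(signal)
        )
        coeffs.append(scale * total)
    return coeffs


def dct_sentence_embedding(word_vectors, k):
    """Compress a sequence of word vectors into a fixed-size vector.

    word_vectors: list of N vectors, each of the same dimension d.
    Returns a list of length k * d: all d values for coefficient 0,
    then all d values for coefficient 1, and so on.
    """
    if not word_vectors:
        raise ValueError("sentence must contain at least one word vector")
    dim = len(word_vectors[0])
    per_dim = [
        dct_coefficients([vec[d] for vec in word_vectors], k)
        for d in range(dim)
    ]
    return [per_dim[d][j] for j in range(k) for d in range(dim)]


# Toy 2-dimensional "word vectors" (made-up numbers, not real embeddings).
sentence = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
reordered = list(reversed(sentence))

emb_a = dct_sentence_embedding(sentence, k=2)
emb_b = dct_sentence_embedding(reordered, k=2)
```

In this sketch the zeroth coefficient is the per-dimension sum scaled by 1/sqrt(N), so it carries the same information as the vector average and is identical for `emb_a` and `emb_b`. The first-order coefficients differ between the two word orders, which is the order sensitivity that averaging lacks.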


