Improving Document Classification with Multi-Sense Embeddings

11/18/2019
by Vivek Gupta et al.

Efficient representation of text documents is an important building block in many NLP tasks. Research on long text categorization has shown that simple weighted averaging of word vectors for sentence representation often outperforms more sophisticated neural models. The recently proposed Sparse Composite Document Vector (SCDV) (Mekala et al., 2017) extends this approach from sentences to documents using soft clustering over word vectors. However, SCDV disregards the multi-sense nature of words and suffers from the curse of dimensionality. In this work, we address these shortcomings and propose SCDV-MS, which uses multi-sense word embeddings and learns a lower-dimensional manifold. Through extensive experiments on multiple real-world datasets, we show that SCDV-MS embeddings outperform previous state-of-the-art embeddings on multi-class and multi-label text categorization tasks. Furthermore, SCDV-MS embeddings are more efficient than SCDV in both time and space complexity on text classification tasks.
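To make the idea of "soft clustering over word vectors" concrete, here is a minimal sketch of an SCDV-style document vector. It is not the authors' implementation. It rests on several assumptions: soft assignments come from a softmax over negative squared distances to fixed centroids (a stand-in for the posterior of a fitted mixture model), the word vectors and idf weights are random toy values, and all function and parameter names are hypothetical. The multi-sense embeddings and the lower-dimensional manifold of SCDV-MS are not shown.

```python
import numpy as np


def soft_assignments(word_vecs, centroids, temperature=1.0):
    """Soft cluster memberships per word (assumption: softmax over negative
    squared distances, used here instead of mixture-model posteriors)."""
    # (V, K) squared Euclidean distance from every word to every centroid
    d2 = ((word_vecs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def word_topic_vectors(word_vecs, probs, idf):
    """Concatenate K membership-scaled copies of each word vector, then
    weight the result by the word's idf. Output shape: (V, K * d)."""
    vocab_size, dim = word_vecs.shape
    n_clusters = probs.shape[1]
    scaled = probs[:, :, None] * word_vecs[:, None, :]  # (V, K, d)
    return idf[:, None] * scaled.reshape(vocab_size, n_clusters * dim)


def document_vector(token_ids, wtv, sparsity_threshold=0.05):
    """Sum the word-topic vectors of a document's tokens, L2-normalize,
    and zero out small entries to make the vector sparse."""
    doc = wtv[token_ids].sum(axis=0)
    norm = np.linalg.norm(doc)
    if norm > 0:
        doc = doc / norm
    doc[np.abs(doc) < sparsity_threshold] = 0.0
    return doc


# Toy data (hypothetical): 6 words, 4-dimensional vectors, 3 clusters.
rng = np.random.default_rng(0)
word_vecs = rng.normal(size=(6, 4))
centroids = word_vecs[:3].copy()          # assumption: centroids given up front
idf = np.array([1.0, 1.5, 0.7, 2.0, 1.2, 0.9])

probs = soft_assignments(word_vecs, centroids)
wtv = word_topic_vectors(word_vecs, probs, idf)
doc_vec = document_vector(np.array([0, 2, 2, 5]), wtv)
print(doc_vec.shape)  # one vector of length K * d per document
```

The document vector has length K * d (clusters times embedding dimension), which illustrates why the representation grows quickly with the number of clusters and why a lower-dimensional variant is attractive.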


