Word Mover's Embedding: From Word2Vec to Document Embedding

10/30/2018
by Lingfei Wu, et al.

While the celebrated Word2Vec technique yields semantically rich representations for individual words, there has been relatively less success in extending it to generate unsupervised sentence or document embeddings. Recent work has demonstrated that Word Mover's Distance (WMD), a distance measure between documents that aligns semantically similar words, yields unprecedented KNN classification accuracy. However, WMD is expensive to compute, and it is hard to extend its use beyond a KNN classifier. In this paper, we propose the Word Mover's Embedding (WME), a novel approach to building an unsupervised document (sentence) embedding from pre-trained word embeddings. In our experiments on 9 benchmark text classification datasets and 22 textual similarity tasks, the proposed technique consistently matches or outperforms state-of-the-art techniques, with significantly higher accuracy on problems involving short texts.
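
To make the idea of a distance that "aligns semantically similar words" concrete, here is a minimal sketch. It is not the WME method from the paper, and it is not the exact WMD either: the exact distance solves an optimal transport problem, whereas this sketch computes the cheaper relaxed lower bound in which every word sends all of its weight to its nearest word in the other document. The 2-D word vectors, the token lists, and the function names `euclidean` and `relaxed_wmd` are hypothetical and chosen only for illustration; real use would start from pre-trained Word2Vec vectors.

```python
import math

# Hypothetical 2-D "word embeddings" for illustration only.
vectors = {
    "obama": (1.0, 0.0),
    "president": (1.0, 0.1),
    "speaks": (0.0, 1.0),
    "greets": (0.1, 1.0),
    "banana": (5.0, 5.0),
}


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def relaxed_wmd(doc_a, doc_b, vectors):
    """Relaxed lower bound on WMD between two tokenized documents.

    Each token carries weight 1/len(doc), so repeated tokens get
    weight proportional to their frequency. Every token moves all
    of its weight to the closest token of the other document; the
    larger of the two one-sided costs is returned.
    """
    def one_sided(src, dst):
        weight = 1.0 / len(src)
        return sum(
            weight * min(euclidean(vectors[w], vectors[t]) for t in dst)
            for w in src
        )

    return max(one_sided(doc_a, doc_b), one_sided(doc_b, doc_a))


doc_a = ["obama", "speaks"]
doc_b = ["president", "greets"]
doc_c = ["banana", "banana"]

near = relaxed_wmd(doc_a, doc_b, vectors)  # documents with aligned words
far = relaxed_wmd(doc_a, doc_c, vectors)   # unrelated document
print(near < far)
```

With these toy vectors, the two documents that share no words but use nearby words come out much closer than the unrelated pair, which is the behavior that motivates WMD-style distances. Even this relaxed form needs all pairwise word distances between two documents, which hints at why the exact distance is expensive.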


research
07/25/2017

From Image to Text Classification: A Novel Approach based on Clustering Word Embeddings

In this paper, we propose a novel approach for text classification based...
research
12/01/2019

Speeding up Word Mover's Distance and its variants via properties of distances between embeddings

The Word Mover's Distance (WMD) proposed in Kusner et al. [ICML,2015] is...
research
02/07/2021

Unsupervised Sentence-embeddings by Manifold Approximation and Projection

The concept of unsupervised universal sentence encoders has gained tract...
research
10/27/2020

Improving Word Recognition using Multiple Hypotheses and Deep Embeddings

We propose a novel scheme for improving the word recognition accuracy us...
research
11/30/2022

Generalised Spherical Text Embedding

This paper aims to provide an unsupervised modelling approach that allow...
research
09/16/2019

Short-Text Classification Using Unsupervised Keyword Expansion

Short-text classification, like all data science, struggles to achieve h...
research
04/05/2018

Few-Shot Text Classification with Pre-Trained Word Embeddings and a Human in the Loop

Most of the literature around text classification treats it as a supervi...
