Speeding up Word Mover's Distance and its variants via properties of distances between embeddings

12/01/2019
by   Matheus Werner, et al.
0

The Word Mover's Distance (WMD) proposed in Kusner et al. [ICML,2015] is a distance between documents that takes advantage of semantic relations among words that are captured by their embeddings. This distance proved to be quite effective, obtaining state-of-art error rates for classification tasks, but also impracticable for large collections/documents due to its computational complexity. For circumventing this problem, variants of WMD have been proposed. Among them, Relaxed Word Mover's Distance (RWMD) is one of the most successful due to its simplicity, effectiveness, and also because of its fast implementations. Relying on assumptions that are supported by empirical properties of the distances between embeddings, we propose an approach to speed up both WMD and RWMD. Experiments over 10 datasets suggest that our approach leads to a significant speed-up in document classification tasks while maintaining the same error rates.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/30/2018

Word Mover's Embedding: From Word2Vec to Document Embedding

While the celebrated Word2Vec technique yields semantically rich represe...
research
11/18/2019

Improving Document Classification with Multi-Sense Embeddings

Efficient representation of text documents is an important building bloc...
research
10/14/2021

WMDecompose: A Framework for Leveraging the Interpretable Properties of Word Mover's Distance in Sociocultural Analysis

Despite the increasing popularity of NLP in the humanities and social sc...
research
05/17/2017

Utility of general and specific word embeddings for classifying translational stages of research

Conventional text classification models make a bag-of-words assumption r...
research
04/23/2019

Wasserstein-Fisher-Rao Document Distance

As a fundamental problem of natural language processing, it is important...
research
07/01/2020

Fused Text Recogniser and Deep Embeddings Improve Word Recognition and Retrieval

Recognition and retrieval of textual content from the large document col...
research
11/20/2017

Linear-Complexity Relaxed Word Mover's Distance with GPU Acceleration

The amount of unstructured text-based data is growing every day. Queryin...

Please sign up or login with your details

Forgot password? Click here to reset