Document Similarity from Vector Space Densities

09/01/2020
by Ilia Rushkin, et al.

We propose the density similarity (DS) method, a computationally light method for estimating similarities between text documents. The method is based on a word embedding in a high-dimensional Euclidean space and on kernel regression, and it takes into account semantic relations among words. We find that its accuracy is virtually the same as that of a state-of-the-art method, while it is substantially faster. Additionally, we introduce generalized versions of the top-k accuracy metric and of the Jaccard metric of agreement between similarity models.
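
The abstract gives no formulas, so the following is only a rough illustration of the general idea of comparing documents through smoothed densities over an embedding space. It is not the authors' DS algorithm. The Gaussian kernel, the Nadaraya-Watson form of the kernel regression, the cosine comparison, the toy 2-D embeddings, and all function names are assumptions made for this sketch.

```python
import numpy as np


def gaussian_kernel(points, centers, bandwidth):
    """Gaussian kernel matrix between rows of `points` and rows of `centers`."""
    diff = points[:, None, :] - centers[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def document_density(tokens, vocab, embeddings, bandwidth=1.0):
    """Smooth a document's word frequencies over the embedding space.

    Assumption: the density is evaluated at the vocabulary's own embedding
    points, using a Nadaraya-Watson kernel-weighted average.
    """
    index = {word: i for i, word in enumerate(vocab)}
    freq = np.zeros(len(vocab))
    for token in tokens:
        if token in index:  # out-of-vocabulary tokens are ignored
            freq[index[token]] += 1.0
    if freq.sum() > 0:
        freq /= freq.sum()
    kernel = gaussian_kernel(embeddings, embeddings, bandwidth)
    return kernel @ freq / kernel.sum(axis=1)


def density_similarity(tokens_a, tokens_b, vocab, embeddings, bandwidth=1.0):
    """Cosine similarity between the two smoothed densities (an assumption)."""
    dens_a = document_density(tokens_a, vocab, embeddings, bandwidth)
    dens_b = document_density(tokens_b, vocab, embeddings, bandwidth)
    denom = np.linalg.norm(dens_a) * np.linalg.norm(dens_b)
    return float(dens_a @ dens_b / denom) if denom > 0 else 0.0


# Toy 2-D embeddings (hypothetical): two animal words close together,
# two vehicle words close together, the two groups far apart.
vocab = ["cat", "dog", "car", "truck"]
embeddings = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])

sim_animals = density_similarity(["cat", "cat"], ["dog"], vocab, embeddings)
sim_mixed = density_similarity(["cat", "cat"], ["truck"], vocab, embeddings)
print(sim_animals, sim_mixed)
```

In this toy setup the first pair of documents shares no words, yet their smoothed densities overlap because "cat" and "dog" lie close together in the embedding space. That is the sense in which a density-based comparison can reflect semantic relations among words rather than exact word overlap.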

research
06/24/2016

Issues in evaluating semantic spaces using word analogies

The offset method for solving word analogies has become a standard evalu...
research
11/29/2017

Embedding Words as Distributions with a Bayesian Skip-gram Model

We introduce a method for embedding words as probability densities in a ...
research
09/02/2016

Improving Correlation with Human Judgments by Integrating Semantic Similarity with Second--Order Vectors

Vector space methods that measure semantic similarity and relatedness of...
research
01/27/2018

Improving Word Vector with Prior Knowledge in Semantic Dictionary

Using low dimensional vector space to represent words has been very effe...
research
11/01/2019

Finding the most similar textual documents using Case-Based Reasoning

In recent years, huge amounts of unstructured textual data on the Intern...
research
02/07/2022

Moving Other Way: Exploring Word Mover Distance Extensions

The word mover's distance (WMD) is a popular semantic similarity metric ...
research
08/16/2021

IsoScore: Measuring the Uniformity of Vector Space Utilization

The recent success of distributed word representations has led to an inc...
