A Triangle Inequality for Cosine Similarity

07/08/2021
by Erich Schubert, et al.

Similarity search is a fundamental problem for many data analysis techniques. Many efficient search techniques rely on the triangle inequality of metrics, which allows pruning parts of the search space based on transitive bounds on distances. Recently, Cosine similarity has become a popular alternative to the standard Euclidean metric, in particular in the context of textual data and neural network embeddings. Unfortunately, Cosine similarity is not a metric and does not satisfy the standard triangle inequality. Instead, many search techniques for Cosine rely on approximation techniques such as locality sensitive hashing. In this paper, we derive a triangle inequality for Cosine similarity that is suitable for efficient similarity search with many standard search structures (such as the VP-tree, Cover-tree, and M-tree), show that this bound is tight, and discuss fast approximations for it. We hope that this spurs new research on accelerating exact similarity search for Cosine similarity, and possibly for other similarity measures, beyond the existing work for distance metrics.
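The abstract does not state the bound itself. The sketch below assumes one natural form of it: the angle arccos(sim) between two vectors is a metric on the unit sphere, so angle(x, z) <= angle(x, y) + angle(y, z), and expanding cos(a + b) gives sim(x, z) >= sim(x, y)·sim(y, z) - sqrt((1 - sim(x, y)²)(1 - sim(y, z)²)), with a matching upper bound using "+". It is an illustration under that assumption, not the paper's exact formulation or its fast approximations. All function names (`cosine`, `cosine_distance`, `cosine_bounds`, `bounds_hold`) are hypothetical. The script also shows that the common "1 - cos" dissimilarity breaks the ordinary triangle inequality.

```python
import math
import random


def cosine(x, y):
    """Cosine similarity of two non-zero vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def cosine_distance(x, y):
    """The common '1 - cosine' dissimilarity, which is NOT a metric."""
    return 1.0 - cosine(x, y)


def cosine_bounds(s_xy, s_yz):
    """Lower and upper bounds on sim(x, z) given sim(x, y) and sim(y, z).

    Assumed derivation: the angle arccos(sim) is a metric on the unit sphere,
    so |a - b| <= angle(x, z) <= a + b with a = angle(x, y), b = angle(y, z).
    cos(a + b) and cos(a - b) expand to the closed forms below, so no
    trigonometric functions are needed at query time.
    """
    # Clamp inputs that drifted slightly outside [-1, 1] through rounding.
    s_xy = max(-1.0, min(1.0, s_xy))
    s_yz = max(-1.0, min(1.0, s_yz))
    root = math.sqrt((1.0 - s_xy * s_xy) * (1.0 - s_yz * s_yz))
    return s_xy * s_yz - root, s_xy * s_yz + root


def bounds_hold(trials=10_000, dim=5, seed=0, eps=1e-9):
    """Empirically check the bounds on random Gaussian vectors."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, z = ([rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(3))
        lower, upper = cosine_bounds(cosine(x, y), cosine(y, z))
        if not lower - eps <= cosine(x, z) <= upper + eps:
            return False
    return True


if __name__ == "__main__":
    # 1 - cos violates the triangle inequality on three 2-D vectors.
    x, y, z = [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]
    print(f"{cosine_distance(x, y) + cosine_distance(y, z):.3f}")  # 0.586
    print(f"{cosine_distance(x, z):.3f}")                           # 1.000

    # The lower bound is attained when y lies between x and z on a great circle.
    x, y, z = ([math.cos(t), math.sin(t)] for t in (0.0, 0.5, 1.2))
    lower, _ = cosine_bounds(cosine(x, y), cosine(y, z))
    print(abs(lower - cosine(x, z)) < 1e-12)  # True

    print(bounds_hold())  # True
```

In a pivot-based index such as a VP-tree or M-tree, one plausible use is to take `s_xy` as the query-to-pivot similarity and `s_yz` as a precomputed pivot-to-object similarity: if the upper bound is already below the current similarity threshold, the object (or subtree) can be skipped without computing its similarity to the query.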

research
02/14/2020

An Inductive Bias for Distances: Neural Nets that Respect the Triangle Inequality

Distances are pervasive in machine learning. They serve as similarity me...
research
01/16/2013

The Anchors Hierarchy: Using the triangle inequality to survive high dimensional data

This paper is about metric data structures in high-dimensional or non-Eu...
research
08/19/2022

A Ptolemaic Partitioning Mechanism

For many years, exact metric search relied upon the property of triangle...
research
07/16/2014

In Defense of MinHash Over SimHash

MinHash and SimHash are the two widely adopted Locality Sensitive Hashin...
research
10/10/2018

A Similarity Measure for Weaving Patterns in Textiles

We propose a novel approach for measuring the similarity between weaving...
research
05/07/2020

Indexing Metric Spaces for Exact Similarity Search

With the continued digitalization of societal processes, we are seeing a...
research
06/30/2018

The Historical Significance of Textual Distances

Measuring similarity is a basic task in information retrieval, and now o...
