A Triangle Inequality for Cosine Similarity

07/08/2021
by   Erich Schubert, et al.

Similarity search is a fundamental problem for many data analysis techniques. Many efficient search techniques rely on the triangle inequality of metrics, which allows pruning parts of the search space based on transitive bounds on distances. Recently, cosine similarity has become a popular alternative to the standard Euclidean metric, in particular for textual data and neural network embeddings. Unfortunately, cosine similarity is not a metric and does not satisfy the standard triangle inequality. Instead, many search techniques for cosine similarity rely on approximation techniques such as locality-sensitive hashing. In this paper, we derive a triangle inequality for cosine similarity that is suitable for efficient similarity search with many standard search structures (such as the VP-tree, cover tree, and M-tree), show that this bound is tight, and discuss fast approximations for it. We hope that this spurs new research on accelerating exact similarity search for cosine similarity, and possibly for other similarity measures, beyond the existing work for distance metrics.
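
The abstract does not spell out the bound, so the following is only a rough sketch of the idea and not necessarily the form derived in the paper. It assumes the well-known fact that the angle between vectors obeys the ordinary triangle inequality, which by the cosine addition formula gives lower and upper bounds on sim(x, y) from sim(x, z) and sim(z, y). The function names `cosine_similarity`, `cosine_bounds`, and `check_random_triples` are hypothetical and chosen for illustration.

```python
import math
import random


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors given as sequences."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def cosine_bounds(sim_xz, sim_zy):
    """Bounds on sim(x, y) given sim(x, z) and sim(z, y).

    Assumption: derived from the angular triangle inequality, i.e.
    cos(a + b) <= sim(x, y) <= cos(a - b), with a and b the two known angles.
    """
    # Clamp to [-1, 1] to guard against floating-point rounding.
    s1 = max(-1.0, min(1.0, sim_xz))
    s2 = max(-1.0, min(1.0, sim_zy))
    # sin(a) * sin(b), expressed through the similarities only (no arccos).
    root = math.sqrt((1.0 - s1 * s1) * (1.0 - s2 * s2))
    return s1 * s2 - root, s1 * s2 + root


def check_random_triples(trials=1000, dim=5, seed=0):
    """Empirically check the bounds on random Gaussian vectors."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, z = ([rng.gauss(0, 1) for _ in range(dim)] for _ in range(3))
        lower, upper = cosine_bounds(cosine_similarity(x, z),
                                     cosine_similarity(z, y))
        sim_xy = cosine_similarity(x, y)
        if not (lower - 1e-9 <= sim_xy <= upper + 1e-9):
            return False
    return True
```

In a pivot-based index, a search could use such bounds to skip a candidate y without computing sim(x, y): if the upper bound obtained through a pivot z is already below the similarity threshold of the query, y cannot be a result.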

02/14/2020

An Inductive Bias for Distances: Neural Nets that Respect the Triangle Inequality

Distances are pervasive in machine learning. They serve as similarity me...
01/16/2013

The Anchors Hierachy: Using the triangle inequality to survive high dimensional data

This paper is about metric data structures in high-dimensional or non-Eu...
08/19/2022

A Ptolemaic Partitioning Mechanism

For many years, exact metric search relied upon the property of triangle...
07/16/2014

In Defense of MinHash Over SimHash

MinHash and SimHash are the two widely adopted Locality Sensitive Hashin...
10/10/2018

A Similarity Measure for Weaving Patterns in Textiles

We propose a novel approach for measuring the similarity between weaving...
05/07/2020

Indexing Metric Spaces for Exact Similarity Search

With the continued digitalization of societal processes, we are seeing a...
11/04/2019

Novel semi-metrics for multivariate change point analysis and anomaly detection

This paper proposes a new method for determining similarity and anomalie...