The similarity metric

11/20/2001
by Ming Li, et al.

We study a new class of distances appropriate for measuring similarity relations between sequences, with, say, one type of similarity per distance. We propose a new "normalized information distance", based on the noncomputable notion of Kolmogorov complexity, and show that it belongs to this class and that it minorizes every computable distance in the class (that is, it is universal in that it discovers all computable similarities). We demonstrate that it is a metric and call it the similarity metric. This theory forms the foundation for a new practical tool. To demonstrate its generality and robustness, we give two distinctive applications in widely divergent areas using standard compression programs such as gzip and GenCompress. First, we compare whole mitochondrial genomes and infer their evolutionary history, which yields the first completely automatically computed whole-mitochondrial-genome phylogeny tree. Second, we fully automatically compute the language tree of 52 different languages.
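
Because Kolmogorov complexity K is noncomputable, the practical tool replaces it with the length of a real compressor's output. The sketch below is a minimal illustration under stated assumptions, not the authors' exact pipeline. It assumes the normalized information distance has the form max{K(x|y), K(y|x)} / max{K(x), K(y)}, approximates the conditional complexity K(x|y) by C(yx) - C(y) (and treats C(xy) as roughly equal to C(yx)), and uses Python's zlib as a stand-in for gzip or GenCompress. Under those assumptions the distance reduces to (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). The helper names `compressed_size`, `ncd`, and `distance_matrix` are hypothetical.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """C(x): byte length of the zlib-compressed data (level 9).

    Assumption: zlib stands in for gzip/GenCompress; any lossless
    compressor could be substituted here.
    """
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Compression-based approximation of the normalized information distance.

    Approximates K(x|y) by C(yx) - C(y) and assumes C(xy) ~ C(yx), so
    max{K(x|y), K(y|x)} / max{K(x), K(y)} becomes
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    Values near 0 mean "very similar"; values near 1 mean "unrelated".
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(seqs: dict[str, bytes]) -> tuple[list[str], list[list[float]]]:
    """All pairwise distances, e.g. as input to a tree-building step."""
    names = list(seqs)
    return names, [[ncd(seqs[a], seqs[b]) for b in names] for a in names]


if __name__ == "__main__":
    # Hypothetical toy data: random DNA-like strings, not real genomes.
    rng = random.Random(1)
    x = "".join(rng.choice("ACGT") for _ in range(2000))
    # y: a close "relative" of x, with a point substitution every 400 positions.
    y_chars = list(x)
    for i in range(0, len(y_chars), 400):
        y_chars[i] = "C" if y_chars[i] == "A" else "A"
    y = "".join(y_chars)
    # z: an independent, unrelated sequence.
    z = "".join(rng.choice("ACGT") for _ in range(2000))

    names, d = distance_matrix({"x": x.encode(), "y": y.encode(), "z": z.encode()})
    for name, row in zip(names, d):
        print(name, " ".join(f"{v:.3f}" for v in row))
```

In this toy run, the mutated copy y should come out much closer to x than the independent sequence z does. Note that the compressed-length approximation is only roughly symmetric, because C(xy) and C(yx) can differ slightly. A pairwise matrix like this is the kind of input a clustering or tree-reconstruction method would consume to produce a phylogeny or language tree.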
