Normalized Compression Distance of Multisets with Applications

12/22/2012
by Andrew R. Cohen, et al.

Normalized compression distance (NCD) is a parameter-free, feature-free, alignment-free similarity measure between a pair of finite objects, based on compression. However, it is not sufficient for all applications. We propose an NCD of finite multisets (a.k.a. multiples) of finite objects that is also a metric. Previous attempts to obtain such an NCD failed. We cover the entire trajectory from theoretical underpinning to feasible practice. The new NCD for multisets is applied to retinal progenitor cell classification questions and to related synthetically generated data that were earlier treated with the pairwise NCD. With the new method we achieved significantly better results. The same holds for questions about axonal organelle transport. We also applied the new NCD to handwritten digit recognition and improved classification accuracy significantly over that of the pairwise NCD by combining the pairwise NCD with the NCD for multisets. In the analysis we use the incomputable Kolmogorov complexity, which for practical purposes is approximated from above by the length of the compressed version of the file involved, using a real-world compression program.

Index Terms: Normalized compression distance, multisets or multiples, pattern recognition, data mining, similarity, classification, Kolmogorov complexity, retinal progenitor cells, synthetic data, organelle transport, handwritten character recognition
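To make the compression-based approximation concrete, here is a minimal sketch of the pairwise NCD that the abstract takes as its starting point. It uses the commonly cited formula NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The choice of zlib as the real-world compressor and the function names `compressed_len` and `ncd` are assumptions made for illustration, not details taken from this page, and the sketch does not implement the multiset NCD proposed in the paper.

```python
import random
import zlib


def compressed_len(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data.

    This stands in for the (incomputable) Kolmogorov complexity,
    which a real compressor can only approximate from above.
    """
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Pairwise normalized compression distance between x and y."""
    cx = compressed_len(x)
    cy = compressed_len(y)
    cxy = compressed_len(x + y)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


# Illustrative inputs: a highly regular string and pseudo-random bytes.
rng = random.Random(0)
regular = b"abc" * 200
noise = bytes(rng.getrandbits(8) for _ in range(600))

d_same = ncd(regular, regular)   # an object compared with itself
d_diff = ncd(regular, noise)     # two unrelated objects
```

With a real compressor the distance of an object to itself is small but generally not exactly zero, and values for unrelated objects can slightly exceed 1; both effects come from compressor overhead rather than from the definition.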

research
06/16/2010

Normalized Information Distance is Not Semicomputable

Normalized information distance (NID) uses the theoretical notion of Kol...
research
11/26/2018

A Consolidated Approach to Convolutional Neural Networks and the Kolmogorov Complexity

The ability to precisely quantify similarity between various entities ha...
research
12/19/2003

Clustering by compression

We present a new method for clustering based on compression. The method ...
research
06/05/2018

RG Smoothing Algorithm Which Makes Data Compression

I describe a new method for smoothing a one-dimensional curve in Euclidi...
research
10/21/2014

Generalized Compression Dictionary Distance as Universal Similarity Measure

We present a new similarity measure based on information theoretic measu...
research
07/01/2023

Applications of Binary Similarity and Distance Measures

In the recent past, binary similarity measures have been applied in solv...