Distances between Data Sets Based on Summary Statistics

02/04/2019
by Nikolaj Tatti, et al.

The concepts of similarity and distance are crucial in data mining. We consider the problem of defining the distance between two data sets by comparing summary statistics computed from them. Our initial definition of the distance is based on geometrical notions of certain sets of distributions. We show that this distance can be computed in cubic time and that it has several intuitive properties. We further show that it is the unique Mahalanobis distance satisfying certain assumptions. For binary data sets, we demonstrate that the distance can be represented naturally by certain parity functions and evaluated in linear time. Our empirical tests with real-world data show that the distance works well.
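As a rough illustration of the binary case, the sketch below compares two 0/1 data sets through the average values of parity (XOR) functions over chosen column subsets. It is not taken from the paper: the function names (`parity_frequencies`, `parity_distance`, `all_itemsets`) are hypothetical, and the scaling rests on one stated assumption, namely that the Mahalanobis distance uses the covariance of the parity features under the uniform distribution over binary vectors. Under that assumption, parities of distinct non-empty column subsets are uncorrelated with variance 1/4, so the distance reduces to twice the Euclidean distance between the two frequency vectors.

```python
from itertools import combinations
from math import sqrt


def all_itemsets(num_cols, max_size):
    """All non-empty column subsets with at most max_size columns (illustrative helper)."""
    return [s for k in range(1, max_size + 1)
            for s in combinations(range(num_cols), k)]


def parity_frequencies(rows, itemsets):
    """For each column subset, the fraction of rows whose selected bits XOR to 1."""
    n = len(rows)
    return [sum(sum(row[i] for i in s) % 2 for row in rows) / n
            for s in itemsets]


def parity_distance(rows_a, rows_b, itemsets):
    """Mahalanobis-style distance between two binary data sets.

    Assumption: the covariance of the parity features is taken under the
    uniform distribution, where it equals (1/4) * I, so the inverse is 4 * I
    and the distance is 2 * the Euclidean norm of the frequency difference.
    """
    freq_a = parity_frequencies(rows_a, itemsets)
    freq_b = parity_frequencies(rows_b, itemsets)
    return 2.0 * sqrt(sum((x - y) ** 2 for x, y in zip(freq_a, freq_b)))


if __name__ == "__main__":
    d1 = [[0, 0], [1, 1]]  # columns always agree
    d2 = [[0, 1], [1, 0]]  # columns always disagree
    print(parity_distance(d1, d2, all_itemsets(2, 1)))  # single columns only
    print(parity_distance(d1, d2, all_itemsets(2, 2)))  # adds the pairwise parity
```

In this toy example the two data sets have identical column means, so single-column statistics cannot tell them apart; only the parity of the column pair does. For a fixed set of column subsets, each data set is scanned once, so the cost grows linearly with the number of rows.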


