A Novel Intrinsic Measure of Data Separability

09/11/2021
by   Shuyue Guan, et al.

In machine learning, the performance of a classifier depends on both the classifier model and the separability (or complexity) of the dataset. To measure dataset separability quantitatively, we create an intrinsic measure, the Distance-based Separability Index (DSI), which is independent of the classifier model. We consider the situation in which different classes of data are mixed in the same distribution to be the most difficult for classifiers to separate. We then formally show that the DSI can indicate whether the distributions of datasets are identical for any dimensionality. We also verify that the DSI is an effective separability measure by comparing it with several state-of-the-art separability/complexity measures on synthetic and real datasets. Having demonstrated the DSI's ability to compare distributions of samples, we also discuss some of its other promising applications, such as measuring the performance of generative adversarial networks (GANs) and evaluating the results of clustering methods.
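
The abstract does not spell out how the DSI is computed, so the following is only a minimal sketch under a stated assumption: for each class, the set of intra-class distances is compared with the set of distances from that class to all other points using the two-sample Kolmogorov-Smirnov statistic, and the per-class values are averaged. The function name `dsi_sketch`, the Euclidean metric, and the choice of the KS statistic are assumptions made for illustration, not details taken from this page.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist
from scipy.stats import ks_2samp


def dsi_sketch(X, y):
    """Illustrative distance-based separability score in [0, 1].

    Assumption (not stated in the abstract): per class, compare the
    intra-class distance set with the between-class distance set via
    the two-sample KS statistic, then average over classes.
    Each class needs at least two points.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    scores = []
    for label in np.unique(y):
        inside = X[y == label]
        outside = X[y != label]
        # Pairwise Euclidean distances among points of this class.
        intra = pdist(inside)
        # Distances from points of this class to all other points.
        between = cdist(inside, outside).ravel()
        # A KS statistic near 0 means the two distance sets look alike.
        scores.append(ks_2samp(intra, between).statistic)
    return float(np.mean(scores))
```

Under this assumption, two classes drawn from the same distribution should give a score near 0 (the hardest case described in the abstract), while two far-apart clusters should give a score near 1.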


research
05/27/2020

Data Separability for Neural Network Classifiers and the Development of a Separability Index

In machine learning, the performance of a classifier depends on both the...
research
05/02/2019

Quality Evaluation of GANs Using Cross Local Intrinsic Dimensionality

Generative Adversarial Networks (GANs) are an elegant mechanism for data...
research
06/09/2020

Towards an Intrinsic Definition of Robustness for a Classifier

The robustness of classifiers has become a question of paramount importa...
research
11/10/2022

A classification performance evaluation measure considering data separability

Machine learning and deep learning classification models are data-driven...
research
10/13/2018

Measuring Swampiness: Quantifying Chaos in Large Heterogeneous Data Repositories

As scientific data repositories and filesystems grow in size and complex...
research
07/15/2021

A multi-schematic classifier-independent oversampling approach for imbalanced datasets

Over 85 oversampling algorithms, mostly extensions of the SMOTE algorith...
research
12/27/2018

Evaluating Generative Adversarial Networks on Explicitly Parameterized Distributions

The true distribution parameterizations of commonly used image datasets ...