A comparison of Gap statistic definitions with and without logarithm function

by Mojgan Mohajer, et al.

The Gap statistic is a standard method for determining the number of clusters in a set of data. The Gap statistic standardizes the graph of log(W_k), where W_k is the within-cluster dispersion, by comparing it to its expectation under an appropriate null reference distribution of the data. We suggest using W_k instead of log(W_k), and comparing it to the expectation of W_k under a null reference distribution. In fact, whenever a number of clusters fulfills the original Gap statistic inequality, it also fulfills the inequality of the Gap statistic based on W_k, but not vice versa. The two definitions of the Gap function are evaluated on several simulated data sets and on a real data set of DCE-MR images.
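As a rough illustration of the two definitions, the sketch below computes both Gap curves for one data set: the original one, built from log(W_k), and the variant built from W_k directly. It is not the authors' implementation. Several choices are assumptions made only for this example: a plain Lloyd's k-means written in NumPy, a uniform reference distribution over the bounding box of the data, and the selection rule of Tibshirani et al. (smallest k with Gap(k) >= Gap(k+1) - s_{k+1}) reused unchanged for the variant without the logarithm. The function names `kmeans_wk`, `gap_curves`, and `choose_k` are hypothetical.

```python
import numpy as np

def kmeans_wk(X, k, rng, n_init=5, n_iter=50):
    """Return W_k: the smallest within-cluster sum of squares over n_init Lloyd runs."""
    best = np.inf
    for _ in range(n_init):
        # Start from k distinct data points chosen at random.
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            # Keep the old center if a cluster becomes empty.
            new = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new, centers):
                break
            centers = new
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        best = min(best, d.min(axis=1).sum())
    return best

def gap_curves(X, k_max=5, n_ref=10, seed=0):
    """Return ((gap_log, s_log), (gap_raw, s_raw)) for k = 1..k_max."""
    rng = np.random.default_rng(seed)
    ks = range(1, k_max + 1)
    w = np.array([kmeans_wk(X, k, rng) for k in ks])
    # Assumed null reference: uniform over the bounding box of X.
    lo, hi = X.min(axis=0), X.max(axis=0)
    w_ref = np.array([
        [kmeans_wk(rng.uniform(lo, hi, size=X.shape), k, rng) for k in ks]
        for _ in range(n_ref)
    ])
    factor = np.sqrt(1.0 + 1.0 / n_ref)  # simulation-error correction
    # Original definition: compare log(W_k) with its reference expectation.
    gap_log = np.log(w_ref).mean(axis=0) - np.log(w)
    s_log = np.log(w_ref).std(axis=0) * factor
    # Variant without the logarithm: compare W_k itself.
    gap_raw = w_ref.mean(axis=0) - w
    s_raw = w_ref.std(axis=0) * factor
    return (gap_log, s_log), (gap_raw, s_raw)

def choose_k(gap, s):
    """Smallest k with gap[k] >= gap[k+1] - s[k+1]; falls back to the largest k."""
    for i in range(len(gap) - 1):
        if gap[i] >= gap[i + 1] - s[i + 1]:
            return i + 1  # arrays are 0-indexed, k starts at 1
    return len(gap)
```

Calling `choose_k(*pair)` on each of the two pairs returned by `gap_curves(X)` gives the number of clusters suggested by each definition, so the two can be compared on the same data.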






