Minkowski distances and standardisation for clustering and classification of high dimensional data

11/29/2019
by Christian Hennig, et al.

There are many distance-based methods for classification and clustering, and for data with a high number of dimensions and a lower number of observations, working with distances is computationally advantageous compared with working with the raw data matrix. Euclidean distances are the default for continuous multivariate data, but there are alternatives. Here the so-called Minkowski distances, namely the L_1 (city block), L_2 (Euclidean), L_3, L_4, and maximum distances, are combined with different schemes for standardising the variables before aggregating them. The boxplot transformation is proposed, a new transformation method for a single variable that standardises the majority of observations but brings outliers closer to the main bulk of the data. Distances are compared in simulations for clustering by partitioning around medoids, complete and average linkage, and classification by nearest neighbours, on data with a low number of observations but high dimensionality. The L_1-distance and the boxplot transformation show good results.
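As a rough illustration of the idea of standardising variables before aggregating them into a Minkowski distance, the following Python sketch may help. It is not the authors' code: the function names `standardise` and `minkowski_matrix` are hypothetical, and the three scaling options shown (unit variance, median absolute deviation, range) are assumptions chosen as common examples, since the abstract does not list the schemes that were compared. The boxplot transformation is not implemented here because the abstract does not define it in enough detail.

```python
import numpy as np

def standardise(X, method="sd"):
    """Centre and scale each column (variable) of X.

    The available methods are illustrative assumptions, not
    necessarily the schemes studied in the paper.
    """
    X = np.asarray(X, dtype=float)
    if method == "sd":        # mean 0, unit sample variance
        centre = X.mean(axis=0)
        scale = X.std(axis=0, ddof=1)
    elif method == "mad":     # median 0, unit median absolute deviation
        centre = np.median(X, axis=0)
        scale = np.median(np.abs(X - centre), axis=0)
    elif method == "range":   # minimum 0, unit range
        centre = X.min(axis=0)
        scale = X.max(axis=0) - centre
    else:
        raise ValueError(f"unknown method: {method}")
    # Leave constant variables unscaled instead of dividing by zero.
    scale = np.where(scale == 0, 1.0, scale)
    return (X - centre) / scale

def minkowski_matrix(X, q):
    """Pairwise L_q distances between the rows of X.

    q = 1 is the city block distance, q = 2 the Euclidean distance,
    and q = np.inf the maximum distance. This builds an n x n x p
    array, which is acceptable when the number of observations n is
    small, as in the setting described above.
    """
    X = np.asarray(X, dtype=float)
    diff = np.abs(X[:, None, :] - X[None, :, :])
    if np.isinf(q):
        return diff.max(axis=2)
    return (diff ** q).sum(axis=2) ** (1.0 / q)
```

A call such as `minkowski_matrix(standardise(X, "mad"), 1)` would then give an L_1 distance matrix on MAD-standardised variables, which could be passed to any clustering or nearest-neighbour routine that accepts precomputed distances.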

research
04/06/2015

A Probabilistic ℓ_1 Method for Clustering High Dimensional Data

In general, the clustering problem is NP-hard, and global optimality can...
research
06/27/2023

A new classification framework for high-dimensional data

Classification is a classic problem but encounters lots of challenges wh...
research
08/29/2006

Neural Network Clustering Based on Distances Between Objects

We present an algorithm of clustering of many-dimensional objects, where...
research
01/04/2023

A general framework for implementing distances for categorical variables

The degree to which subjects differ from each other with respect to cert...
research
12/07/2022

Designing Feature Vector Representations: A case study from Chemistry

We present a case study investigating feature descriptors in the context...
research
08/20/2015

Review and Perspective for Distance Based Trajectory Clustering

In this paper we tackle the issue of clustering trajectories of geolocal...
research
08/13/2017

Mahalanonbis Distance Informed by Clustering

A fundamental question in data analysis, machine learning and signal pro...
