Skewed Distributions or Transformations? Modelling Skewness for a Cluster Analysis

11/18/2020
by   Michael P. B. Gallaugher, et al.

Because of its mathematical tractability, the Gaussian mixture model holds a special place in the literature for clustering and classification. For all its benefits, however, the Gaussian mixture model poses problems when the data are skewed or contain outliers. Methods for handling skewed data have therefore been developed over the years, and they fall into two general categories. The first is to consider a mixture of more flexible skewed distributions, and the second is to incorporate a transformation to near normality. Although these methods have been compared in their respective papers, there has yet to be a detailed comparison to determine when one method might be more suitable than the other. Herein, we provide a detailed comparison on many benchmarking datasets and describe a novel method to assess cluster separation.
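
As a rough illustration of the second category, the sketch below shows how a transformation can bring skewed data close to normality. It is not the method compared in the paper: it assumes a plain log transformation applied to hypothetical log-normal draws, and the helper `sample_skewness` is introduced here only for illustration.

```python
import math
import random
import statistics


def sample_skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (illustrative helper)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical right-skewed sample: log-normal draws, not data from the paper.
skewed = [rng.lognormvariate(0.0, 0.75) for _ in range(5000)]

# Assumed transformation: a plain log. The log of a log-normal variable is
# normally distributed, so the transformed sample should be roughly symmetric.
transformed = [math.log(v) for v in skewed]

skew_before = sample_skewness(skewed)
skew_after = sample_skewness(transformed)

print(f"skewness before transformation: {skew_before:.3f}")
print(f"skewness after transformation:  {skew_after:.3f}")
```

In a clustering setting the transformation would generally be estimated per cluster alongside the mixture parameters rather than fixed in advance; the fixed log here only shows the effect on skewness.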



01/05/2020

Cutoff for exact recovery of Gaussian mixture models

We determine the cutoff value on separation of cluster centers for exact...
06/26/2019

Unsupervised Methods for Identifying Pass Coverage Among Defensive Backs with NFL Player Tracking Data

Analysis of player tracking data for American football is in its infancy...
01/20/2019

Fitting A Mixture Distribution to Data: Tutorial

This paper is a step-by-step tutorial for fitting a mixture distribution...
12/28/2015

Outlier Detection In Large-scale Traffic Data By Naïve Bayes Method and Gaussian Mixture Model Method

It is meaningful to detect outliers in traffic data for traffic manageme...
06/01/2021

ClustRank: a Visual Quality Measure Trained on Perceptual Data for Sorting Scatterplots by Cluster Patterns

Visual quality measures (VQMs) are designed to support analysts by autom...
02/21/2020

Petrophysically and geologically guided multi-physics inversion using a dynamic Gaussian mixture model

In a previous paper, we introduced a framework for carrying out petrophy...
03/22/2017

A probabilistic approach to emission-line galaxy classification

We invoke a Gaussian mixture model (GMM) to jointly analyse two traditio...