Degrees of Freedom and Model Selection for k-means Clustering

06/06/2018
by David P. Hofmeyr, et al.

This paper investigates the problem of model selection for k-means clustering, based on conservative estimates of the model degrees of freedom. An extension of Stein's lemma, a tool used in unbiased risk estimation, yields an expression from which the degrees of freedom can be approximated, and this approximation is then estimated empirically. The resulting degrees-of-freedom estimates are used within the popular Bayesian Information Criterion (BIC) to perform model selection. The proposed estimation procedure is validated in a thorough simulation study, and its robustness is assessed by relaxing the modelling assumptions and by applying it to data from real applications. Comparisons with popular existing techniques suggest that this approach performs extremely well when the modelling assumptions…
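The model-selection loop described above (fit k-means for each candidate k, then score each fit with a BIC that includes a degrees-of-freedom term) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' estimator: it uses plain Lloyd iterations and a BIC of the form N·log(RSS/N) + log(N)·df with N = n·p scalar observations, which assumes isotropic Gaussian errors with an unknown common variance. The function names `kmeans`, `rss`, `bic`, `select_k` and the `df_fn` argument are hypothetical. By default the sketch uses the naive count of k·p centroid coordinates as the degrees of freedom; the paper's approach instead plugs in estimated degrees of freedom, which `df_fn` stands in for here.

```python
import math
import random


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random initial centroids; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sqdist(x, centroids[j])) for x in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(points, new_labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return centroids, labels


def rss(points, centroids, labels):
    """Within-cluster residual sum of squares."""
    return sum(sqdist(x, centroids[lab]) for x, lab in zip(points, labels))


def bic(points, centroids, labels, df=None):
    """Assumed Gaussian-error BIC: N*log(RSS/N) + log(N)*df, with N = n*p."""
    n, p = len(points), len(points[0])
    N = n * p
    if df is None:
        df = len(centroids) * p  # naive count: k centroids in p dimensions
    return N * math.log(rss(points, centroids, labels) / N) + math.log(N) * df


def select_k(points, k_max, df_fn=None):
    """Fit k = 1..k_max and return the k with the smallest BIC, plus all scores.

    df_fn(k, p) is a hypothetical hook for a degrees-of-freedom estimate;
    when omitted, bic() falls back to the naive k*p count.
    """
    p = len(points[0])
    scores = {}
    for k in range(1, k_max + 1):
        centroids, labels = kmeans(points, k)
        df = df_fn(k, p) if df_fn is not None else None
        scores[k] = bic(points, centroids, labels, df)
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    best, scores = select_k(pts, k_max=2)
    print(best)  # prints 2
```

On this toy input the choice of df visibly matters: with `k_max=3` the naive count lets k = 3 edge out k = 2 (it splits one two-point cluster), whereas a larger, purely illustrative `df_fn=lambda k, p: 3 * k * p` brings the selection back to k = 2. That sensitivity is why the degrees-of-freedom term, rather than the raw parameter count, is the quantity worth estimating carefully.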
