False clustering rate control in mixture models

03/04/2022
by Ariane Marandon, et al.

The clustering task consists of assigning labels to the members of a sample. In most data sets, some individuals are ambiguous and intrinsically difficult to assign to any one cluster. In practical applications, however, misclassifying individuals can be disastrous. To overcome this difficulty, the idea followed here is to classify only part of the sample so as to obtain a small misclassification rate. This approach is well known in the supervised setting, where it is referred to as classification with an abstention option. The purpose of this paper is to revisit this approach in an unsupervised mixture-model framework. The problem is formalized as controlling the false clustering rate (FCR) below a prescribed level α while maximizing the number of classified items. New procedures are introduced, and their behavior is shown to be close to optimal through theoretical results and numerical experiments. An application to breast cancer data illustrates the practical benefits of the new approach.
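To make the idea of clustering with abstention concrete, here is a minimal plug-in sketch in Python. It is not the procedure introduced in the paper; it only illustrates the general principle under stated assumptions. It assumes that posterior membership probabilities are already available for each item (for example, from a mixture model fitted by EM), treats one minus the largest posterior probability as an estimate of the item's misclassification probability, and labels the largest set of items whose average estimated error stays at or below α. The function name `select_with_abstention` and its signature are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def select_with_abstention(
    posteriors: Sequence[Sequence[float]], alpha: float
) -> Tuple[List[Optional[int]], float]:
    """Label as many items as possible while keeping the estimated
    false clustering rate at or below alpha.

    posteriors[i][k] is an (assumed given) posterior probability that
    item i belongs to cluster k. Returns (labels, estimated_fcr), where
    labels[i] is None when the procedure abstains on item i.
    """
    scored = []
    for i, probs in enumerate(posteriors):
        # Maximum a posteriori label for item i.
        label = max(range(len(probs)), key=lambda k: probs[k])
        # Estimated probability that this label is wrong.
        scored.append((1.0 - probs[label], i, label))

    # Most confident items first.
    scored.sort()

    labels: List[Optional[int]] = [None] * len(posteriors)
    running_error = 0.0
    selected = 0
    for error, i, label in scored:
        # Errors are sorted in increasing order, so the running average
        # never decreases: the first violation ends the selection.
        if (running_error + error) / (selected + 1) > alpha:
            break
        running_error += error
        selected += 1
        labels[i] = label

    estimated_fcr = running_error / selected if selected else 0.0
    return labels, estimated_fcr


if __name__ == "__main__":
    # Illustrative posteriors for four items and two clusters.
    example = [[0.99, 0.01], [0.55, 0.45], [0.02, 0.98], [0.60, 0.40]]
    print(select_with_abstention(example, alpha=0.05))
```

In this sketch the estimated FCR is only as reliable as the posterior probabilities fed into it, so a poorly fitted mixture model would give a misleading guarantee.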


research
07/15/2020

Mixture Complexity and Its Application to Gradual Clustering Change Detection

In model-based clustering using finite mixture models, it is a significa...
research
12/05/2019

A sparse negative binomial mixture model for clustering RNA-seq count data

Clustering with variable selection is a challenging but critical task fo...
research
03/30/2017

The Informativeness of k-Means and Dimensionality Reduction for Learning Mixture Models

The learning of mixture models can be viewed as a clustering problem. In...
research
06/30/2020

Autoregressive Mixture Models for Serial Correlation Clustering of Time Series Data

Clustering individuals into similar groups in longitudinal studies can i...
research
07/29/2017

A generalized multivariate Student-t mixture model for Bayesian classification and clustering of radar waveforms

In this paper, a generalized multivariate Student-t mixture model is dev...
research
03/06/2017

Classification and clustering for samples of event time data using non-homogeneous Poisson process models

Data of the form of event times arise in various applications. A simple ...
