A Bayesian Finite Mixture Model with Variable Selection for Data with Mixed-type Variables

05/09/2019
by Shu Wang, et al.

Finite mixture models are an important class of clustering methods and can be applied to data sets with mixed types of variables. However, their application poses several challenges. First, they typically rely on the EM algorithm, which can be sensitive to the choice of initial values. Second, biomarkers subject to limits of detection (LOD) are common in clinical data, and they introduce censored variables into the finite mixture model. Additionally, researchers have become increasingly interested in variable importance as the number of variables available for clustering grows. To address these challenges, we propose a Bayesian finite mixture model that simultaneously performs variable selection, accounts for biomarker LODs and produces clustering results. We take a Bayesian approach to obtain parameter estimates and cluster memberships, bypassing the limitations of the EM algorithm. To account for LODs, we add a step to the Gibbs sampler that iteratively imputes biomarker values below or above the LODs. In addition, we place a spike-and-slab-type prior on each variable to obtain variable importance. Simulations across various scenarios were conducted to examine the performance of the method, and the method was also applied to real electronic health record data.
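The abstract does not give the model's full specification, so the following is only a toy sketch of the LOD idea: a Gibbs sampler in which, after cluster memberships and component parameters are updated, each censored value is re-imputed from its current component's distribution truncated to lie below the lower LOD or above the upper LOD. It assumes univariate data, Gaussian components with a known common variance, a Dirichlet prior on the weights and normal priors on the means, and inverse-CDF truncated-normal sampling. Mixed-type variables and the spike-and-slab variable-selection prior are not modeled. The function names and parameters (`gibbs_mixture_with_lod`, `truncated_normal`, `censor`, `lod`) are hypothetical, and ordering the components by mean is a convention chosen here to limit label switching, not necessarily the authors' choice.

```python
import math
import random
from statistics import NormalDist


def truncated_normal(mu, sigma, lower, upper, rng):
    """Inverse-CDF draw from N(mu, sigma^2) truncated to [lower, upper]; None = unbounded."""
    dist = NormalDist(mu, sigma)
    a = dist.cdf(lower) if lower is not None else 0.0
    b = dist.cdf(upper) if upper is not None else 1.0
    # Keep u strictly inside (0, 1) so inv_cdf is defined (crude in extreme tails).
    u = min(max(rng.uniform(a, b), 1e-12), 1.0 - 1e-12)
    x = dist.inv_cdf(u)
    # Clip to the bounds in case of floating-point error in the tails.
    if lower is not None:
        x = max(x, lower)
    if upper is not None:
        x = min(x, upper)
    return x


def gibbs_mixture_with_lod(y, censor, lod, k=2, sigma=1.0, n_iter=500, burn_in=100,
                           prior_mean=0.0, prior_sd=10.0, alpha=1.0, seed=0):
    """Toy Gibbs sampler for a univariate K-component Gaussian mixture (known sigma).

    censor[i] is -1 (below lower LOD), +1 (above upper LOD) or 0 (observed);
    y[i] is ignored when the entry is censored.
    """
    rng = random.Random(seed)
    lower, upper = lod
    # Start each censored entry at its detection limit.
    x = [lower if c < 0 else upper if c > 0 else v for v, c in zip(y, censor)]
    mu = sorted(rng.sample(x, k))
    w = [1.0 / k] * k
    z = [0] * len(x)
    mu_sum = [0.0] * k
    for it in range(n_iter):
        # 1) Memberships: p(z_i = j) proportional to w_j * N(x_i | mu_j, sigma^2), in log space.
        for i, xi in enumerate(x):
            logp = [math.log(w[j]) - 0.5 * ((xi - mu[j]) / sigma) ** 2 for j in range(k)]
            m = max(logp)
            z[i] = rng.choices(range(k), weights=[math.exp(lp - m) for lp in logp])[0]
        counts = [z.count(j) for j in range(k)]
        # 2) Mixing weights: Dirichlet(alpha + counts) via normalized Gamma draws.
        g = [rng.gammavariate(alpha + counts[j], 1.0) for j in range(k)]
        total = sum(g)
        w = [gj / total for gj in g]
        # 3) Component means: conjugate normal-normal update.
        for j in range(k):
            s = sum(xi for xi, zi in zip(x, z) if zi == j)
            prec = 1.0 / prior_sd ** 2 + counts[j] / sigma ** 2
            mean = (prior_mean / prior_sd ** 2 + s / sigma ** 2) / prec
            mu[j] = rng.gauss(mean, 1.0 / math.sqrt(prec))
        # Relabel components in increasing order of their means.
        order = sorted(range(k), key=lambda j: mu[j])
        new_label = {old: new for new, old in enumerate(order)}
        mu = [mu[j] for j in order]
        w = [w[j] for j in order]
        z = [new_label[zi] for zi in z]
        # 4) LOD step: re-impute censored values from their current component,
        #    truncated below the lower LOD or above the upper LOD.
        for i, c in enumerate(censor):
            if c < 0:
                x[i] = truncated_normal(mu[z[i]], sigma, None, lower, rng)
            elif c > 0:
                x[i] = truncated_normal(mu[z[i]], sigma, upper, None, rng)
        if it >= burn_in:
            mu_sum = [acc + m_j for acc, m_j in zip(mu_sum, mu)]
    n_kept = n_iter - burn_in
    return {"mu": [acc / n_kept for acc in mu_sum], "z": z, "x": x}
```

For example, simulating two well-separated clusters, censoring values outside hypothetical limits of -4 and 4, and calling `gibbs_mixture_with_lod(y, censor, (-4.0, 4.0))` should return posterior mean estimates near the true cluster centers, with every imputed censored value on the correct side of its LOD. Imputing inside the sampler, rather than substituting a fixed value such as LOD/2, lets the uncertainty about censored values propagate into the cluster assignments.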
