Clustering of variables for enhanced interpretability of predictive models

08/18/2020
by Evelyne Vigneau et al.

A new strategy is proposed for building easy-to-interpret predictive models from high-dimensional datasets with a large number of highly correlated explanatory variables. The strategy begins with a variable-clustering step using the CLustering of Variables around Latent Variables (CLV) method. The resulting hierarchical clustering dendrogram is then explored to select the explanatory variables sequentially, in a group-wise fashion. To fit the model, the dendrogram is used as the base-learner in an L2-boosting procedure. The proposed approach, named lmCLV, is illustrated on two examples. The first is a simulated toy example in which the clusters and the predictive equation are known in advance. The second is a real case study on the authentication of orange juices using ¹H-NMR spectroscopic analysis. In both examples, the procedure achieved predictive efficiency similar to that of other methods and offered better interpretability. It is available in the R package ClustVarLV.
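A minimal Python sketch can illustrate the group-wise boosting idea. It is not the lmCLV algorithm or the ClustVarLV API. The candidate groups are passed in directly as hypothetical column-index lists, as if read off a few levels of a variable dendrogram. Each group's latent component is assumed to be a one-component PLS score of the current residuals. The function name `l2_boost_groups` and its parameters are illustrative only.

```python
import numpy as np


def l2_boost_groups(X, y, groups, n_steps=100, nu=0.1):
    """Group-wise L2-boosting sketch (NOT the lmCLV implementation).

    At each step, every candidate group proposes one latent component, and the
    group whose component best fits the current residuals (smallest residual
    sum of squares) is selected; its fit, shrunk by `nu`, is added to the model.

    Assumption (illustrative): a group's component is t = X_g w with w
    proportional to X_g^T r, i.e. a one-component PLS score of residuals r.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                  # centre the explanatory variables
    r = y - y.mean()                 # residuals of the intercept-only model
    beta = np.zeros(X.shape[1])      # coefficients on the original variables
    path = []                        # index of the group chosen at each step
    for _ in range(n_steps):
        best = None
        for k, g in enumerate(groups):
            Xg = Xc[:, g]
            w = Xg.T @ r
            norm_w = np.linalg.norm(w)
            if norm_w == 0.0:
                continue
            w /= norm_w
            t = Xg @ w               # latent component of group g
            tt = t @ t
            if tt == 0.0:
                continue
            c = (t @ r) / tt         # least-squares coefficient of r on t
            rss = np.sum((r - c * t) ** 2)
            if best is None or rss < best[0]:
                best = (rss, k, w, c, t)
        if best is None:             # residuals orthogonal to every group
            break
        _, k, w, c, t = best
        r = r - nu * c * t           # shrunken update of the residuals
        beta[groups[k]] += nu * c * w
        path.append(k)
    intercept = y.mean() - x_mean @ beta
    return intercept, beta, path


# Toy data: two clusters of five highly correlated variables each;
# only the first cluster drives the response.
rng = np.random.default_rng(0)
n = 200
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)
X = np.column_stack(
    [z1 + 0.1 * rng.normal(size=n) for _ in range(5)]
    + [z2 + 0.1 * rng.normal(size=n) for _ in range(5)]
)
y = 2.0 * z1 + 0.1 * rng.normal(size=n)

# Hypothetical candidate groups, as if taken from two levels of a dendrogram:
# the two clusters plus the root node containing all ten variables.
groups = [list(range(5)), list(range(5, 10)), list(range(10))]
b0, beta, path = l2_boost_groups(X, y, groups, n_steps=200, nu=0.1)
print(np.round(beta, 2))
print(sorted(set(path)))
```

Each step adds a multiple of a single group's component. As a result, the fitted coefficients can be read group by group, and `path` records the order in which groups entered the model. The shrinkage factor `nu` trades the number of steps against the risk of overfitting.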
