Cadre Modeling: Simultaneously Discovering Subpopulations and Predictive Models

02/07/2018
by Alexander New, et al.

We consider the regression problem of identifying subpopulations that exhibit different patterns of response, where each subpopulation requires a different underlying model. Unlike statistical cohorts, these subpopulations are not known a priori; thus, we refer to them as cadres. When the cadres and their associated models are interpretable, modeling yields insight into the subpopulations and their associations with the regression target. We introduce a discriminative model that simultaneously learns cadre-assignment and target-prediction rules. Sparsity-inducing priors are placed on the model parameters, so that feature selection is performed independently for the cadre-assignment and target-prediction processes. We learn models using adaptive step size stochastic gradient descent, and we assess cadre quality with bootstrapped sample analysis. We present simulated results showing that, when the true clustering rule does not depend on the entire set of features, our method significantly outperforms methods that learn subpopulation-discovery and target-prediction rules separately. In a materials-by-design case study, our model provides state-of-the-art prediction of polymer glass transition temperature. Importantly, the method identifies chemically meaningful cadres of polymers, each with an interpretable model, that respond differently to structural perturbations, thus providing design insight for targeting or avoiding specific transition temperature ranges. Further experimental results show that cadre methods generalize competitively with linear and nonlinear regression models and can identify robust subpopulations.
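
To make the idea of jointly learned cadre-assignment and target-prediction rules concrete, the following is a minimal illustrative sketch, not the authors' implementation. The abstract does not give the model's functional form, so everything here is an assumption: soft cadre membership is taken to be a softmax over negative feature-weighted squared distances to hypothetical cadre centers, each cadre is given its own linear model, and an L1 penalty stands in for the sparsity-inducing priors. The names `cadre_membership`, `cadre_predict`, `penalized_loss`, `gamma`, `lam_d`, and `lam_w` are all hypothetical.

```python
import numpy as np


def cadre_membership(X, centers, feature_weights, gamma=1.0):
    """Soft assignment of each row of X to each cadre (assumed form).

    X: (n, p) features; centers: (M, p) cadre centers;
    feature_weights: (p,) weights on features used for assignment.
    Returns an (n, M) matrix whose rows sum to 1.
    """
    diff = X[:, None, :] - centers[None, :, :]            # (n, M, p)
    # Feature-weighted squared distance; a zero weight drops that feature
    # from the assignment rule.
    dist = np.einsum("nmp,p->nm", diff ** 2, np.abs(feature_weights))
    logits = -gamma * dist
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    expd = np.exp(logits)
    return expd / expd.sum(axis=1, keepdims=True)


def cadre_predict(X, centers, feature_weights, W, b, gamma=1.0):
    """Membership-weighted mixture of per-cadre linear predictions.

    W: (M, p) per-cadre regression weights; b: (M,) per-cadre intercepts.
    """
    G = cadre_membership(X, centers, feature_weights, gamma)  # (n, M)
    per_cadre = X @ W.T + b                                   # (n, M)
    return (G * per_cadre).sum(axis=1)


def penalized_loss(X, y, centers, feature_weights, W, b,
                   lam_d=0.1, lam_w=0.1, gamma=1.0):
    """Mean squared error plus separate L1 penalties (assumed stand-in for
    the sparsity-inducing priors) on assignment and prediction weights."""
    resid = y - cadre_predict(X, centers, feature_weights, W, b, gamma)
    mse = np.mean(resid ** 2)
    return (mse
            + lam_d * np.abs(feature_weights).sum()
            + lam_w * np.abs(W).sum())
```

Because the assignment weights and the prediction weights are penalized separately in this sketch, a feature can be dropped from the assignment rule while remaining in a cadre's regression model, or vice versa. A loss of this kind could then be minimized with a stochastic gradient method, as the abstract describes, though the optimizer itself is not shown here.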
