metboost: Exploratory regression analysis with hierarchically clustered data

02/13/2017
by   Patrick J. Miller, et al.

As data collections become larger, exploratory regression analysis becomes more important but more challenging. When observations are hierarchically clustered, the problem is even more challenging because model selection with mixed effect models can produce misleading results when nonlinear effects are not included in the model (Bauer and Cai, 2009). A machine learning method called boosted decision trees (Friedman, 2001) is a good approach for exploratory regression analysis in real data sets because it can detect predictors with nonlinear and interaction effects while also accounting for missing data. We propose an extension to boosted decision trees called metboost for hierarchically clustered data. It works by constraining the structure of each tree to be the same across groups, but allowing the terminal node means to differ. This allows predictors and split points to lead to different predictions within each group, and approximates nonlinear group-specific effects. Importantly, metboost remains computationally feasible for thousands of observations and hundreds of predictors that may contain missing values. We apply the method to predict math performance for 15,240 students from 751 schools in data collected in the Educational Longitudinal Study 2002 (Ingels et al., 2007), allowing 76 predictors to have unique effects for each school. Compared to boosted decision trees, metboost has 15% improved prediction performance. Results of a large simulation study show that metboost has up to 70% improved prediction performance compared to boosted decision trees when group sizes are small.
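The core idea described above can be illustrated with a minimal sketch: fit a single regression tree on the pooled data (the shared structure), then replace each terminal node's pooled mean with group-specific means, falling back to the pooled mean when a group is absent from a node. This is a conceptual illustration, not the authors' implementation; the simulated data, function names, and use of scikit-learn are assumptions for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Simulated clustered data: 3 groups with different intercepts.
n, n_groups = 300, 3
X = rng.normal(size=(n, 2))
group = rng.integers(0, n_groups, size=n)
y = np.sin(X[:, 0]) + 2.0 * group + rng.normal(scale=0.1, size=n)

# Shared tree structure fit on the pooled data (group is NOT a predictor).
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
leaf = tree.apply(X)  # terminal node index for each observation

# Pooled and group-specific terminal node means.
pooled_mean = {l: y[leaf == l].mean() for l in np.unique(leaf)}
group_mean = {}
for l in np.unique(leaf):
    for g in range(n_groups):
        mask = (leaf == l) & (group == g)
        # Fall back to the pooled mean when a group never reaches this node.
        group_mean[(l, g)] = y[mask].mean() if mask.any() else pooled_mean[l]

def predict_grouped(X_new, group_new):
    """Predict using the shared tree structure but group-specific leaf means."""
    leaves = tree.apply(X_new)
    return np.array([group_mean.get((l, g), pooled_mean[l])
                     for l, g in zip(leaves, group_new)])

mse_grouped = np.mean((predict_grouped(X, group) - y) ** 2)
mse_pooled = np.mean((tree.predict(X) - y) ** 2)
```

Because the group offsets are invisible to the pooled tree (group membership is not a split variable), letting the terminal node means differ by group captures variation the shared structure alone cannot, which is the intuition behind metboost; the full method applies this within gradient boosting over many trees.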


