Finding structure in data using multivariate tree boosting

11/06/2015
by Patrick J. Miller, et al.

Technology and collaboration enable dramatic increases in the size of psychological and psychiatric data collections, but finding structure in these large data sets with many collected variables is challenging. Decision tree ensembles like random forests (Strobl, Malley, and Tutz, 2009) are a useful tool for finding structure, but are difficult to interpret with multiple outcome variables, which are often of interest in psychology. To find and interpret structure in data sets with multiple outcomes and many predictors (possibly exceeding the sample size), we introduce a multivariate extension to a decision tree ensemble method called Gradient Boosted Regression Trees (Friedman, 2001). Our method, multivariate tree boosting, can be used to identify important predictors, to detect predictors with non-linear effects and interactions without specifying such effects in advance, and to identify predictors that cause two or more outcome variables to covary without parametric assumptions. We provide the R package 'mvtboost' to estimate, tune, and interpret the resulting model; it extends the implementation of univariate boosting in the R package 'gbm' (Ridgeway, 2013) to continuous, multivariate outcomes. To illustrate the approach, we analyze predictors of psychological well-being (Ryff and Keyes, 1995). Simulations verify that our approach identifies predictors with non-linear effects and achieves high prediction accuracy, exceeding or matching the performance of (penalized) multivariate multiple regression and multivariate decision trees over a wide range of conditions.
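The core idea sketched in the abstract can be illustrated with a minimal toy implementation. The sketch below is an assumption-laden simplification, not the 'mvtboost' implementation: it uses squared-error loss, depth-1 trees ("stumps"), a fixed shrinkage of 0.1, and at each iteration fits a candidate stump to every outcome's residuals, keeping only the stump that most reduces the residual sum of squares. All function names (`fit_stump`, `predict_stump`, `mv_tree_boost`) are hypothetical and exist only for this illustration.

```python
import numpy as np

def fit_stump(X, r):
    """Best single split (feature, threshold, left/right means) by SSE."""
    n, p = X.shape
    best_sse = np.sum((r - r.mean()) ** 2)
    best = (None, None, r.mean(), r.mean())  # fall back to a constant fit
    for j in range(p):
        order = np.argsort(X[:, j])
        xs, rs = X[order, j], r[order]
        cs, css = np.cumsum(rs), np.cumsum(rs ** 2)
        i = np.arange(1, n)  # left-node sizes for each candidate split
        sse_left = css[:-1] - cs[:-1] ** 2 / i
        sse_right = (css[-1] - css[:-1]) - (cs[-1] - cs[:-1]) ** 2 / (n - i)
        sse = sse_left + sse_right
        sse[xs[1:] == xs[:-1]] = np.inf  # no split between tied values
        k = int(np.argmin(sse))
        if sse[k] < best_sse:
            best_sse = sse[k]
            best = (j, (xs[k] + xs[k + 1]) / 2,
                    rs[:k + 1].mean(), rs[k + 1:].mean())
    return best

def predict_stump(stump, X):
    j, t, left_mean, right_mean = stump
    if j is None:
        return np.full(len(X), left_mean)
    return np.where(X[:, j] <= t, left_mean, right_mean)

def mv_tree_boost(X, Y, n_iter=150, shrinkage=0.1):
    """Boost stumps across multiple outcomes; returns fitted values.

    Each iteration fits one candidate stump per outcome's residuals and
    keeps only the stump with the largest SSE reduction, so iterations
    are allocated across outcomes adaptively.
    """
    n, q = Y.shape
    pred = np.tile(Y.mean(axis=0), (n, 1))  # start from outcome means
    for _ in range(n_iter):
        best_drop, best_k, best_stump = -np.inf, None, None
        for k in range(q):
            resid = Y[:, k] - pred[:, k]
            stump = fit_stump(X, resid)
            drop = np.sum(resid ** 2) - np.sum(
                (resid - predict_stump(stump, X)) ** 2)
            if drop > best_drop:
                best_drop, best_k, best_stump = drop, k, stump
        pred[:, best_k] += shrinkage * predict_stump(best_stump, X)
    return pred

# Toy demo: two outcomes share a nonlinear effect of the first predictor,
# so they covary through it -- the kind of structure the method targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = np.column_stack([
    np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=200),
    np.sin(2 * X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200),
])
pred = mv_tree_boost(X, Y)
```

In practice one would use the authors' 'mvtboost' R package, which also handles tuning and interpretation; the sketch above only illustrates how boosting iterations can be shared across several continuous outcomes.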


