FREEtree: A Tree-based Approach for High Dimensional Longitudinal Data With Correlated Features

06/17/2020
by   Yuancheng Xu, et al.

This paper proposes FREEtree, a tree-based method for high dimensional longitudinal data with correlated features. Popular machine learning approaches commonly used for variable selection, such as Random Forests, do not perform well when features are correlated and do not account for data observed over time. FREEtree handles longitudinal data by using a piecewise random effects model. It also exploits the network structure of the features by first clustering them using weighted correlation network analysis (WGCNA). It then conducts a screening step within each cluster of features, followed by a selection step among the surviving features, which provides a relatively unbiased way to select features. By using dominant principal components as regression variables at each leaf and the original features as splitting variables at splitting nodes, FREEtree maintains its interpretability and improves its computational efficiency. The simulation results show that FREEtree outperforms other tree-based methods in prediction accuracy, feature selection accuracy, and the ability to recover the underlying structure.
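
To make the cluster, screen, then select idea concrete, here is a minimal sketch in Python. It is not the authors' implementation and rests on several simplifying assumptions: a greedy correlation threshold stands in for WGCNA, absolute marginal correlation with the response stands in for tree-based importance, and the piecewise random effects model and the principal component regression at the leaves are omitted. All function names, parameters, and the simulated data are hypothetical.

```python
import random
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def cluster_features(columns, threshold=0.7):
    """Assumed stand-in for WGCNA: a feature joins the first cluster whose
    seed (first member) it correlates with at or above the threshold."""
    clusters = []
    for j, col in enumerate(columns):
        for members in clusters:
            if abs(pearson(col, columns[members[0]])) >= threshold:
                members.append(j)
                break
        else:
            clusters.append([j])  # no match: start a new cluster
    return clusters

def screen_then_select(columns, y, clusters, keep_per_cluster=1, final_k=2):
    """Screen inside each cluster, then select among the survivors.
    The score is |corr(feature, y)|, an assumed proxy for importance."""
    def score(j):
        return abs(pearson(columns[j], y))
    survivors = []
    for members in clusters:
        survivors += sorted(members, key=score, reverse=True)[:keep_per_cluster]
    return sorted(sorted(survivors, key=score, reverse=True)[:final_k])

# Simulated data: two correlated feature groups plus two pure-noise features.
rng = random.Random(0)
n = 200
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rng.gauss(0, 1) for _ in range(n)]
columns = [[z + rng.gauss(0, 0.1) for z in z1] for _ in range(3)]   # features 0-2
columns += [[z + rng.gauss(0, 0.1) for z in z2] for _ in range(3)]  # features 3-5
columns += [[rng.gauss(0, 1) for _ in range(n)] for _ in range(2)]  # features 6-7
y = [2 * a - b + rng.gauss(0, 0.5) for a, b in zip(z1, z2)]

clusters = cluster_features(columns)
selected = screen_then_select(columns, y, clusters)
print("clusters:", clusters)
print("selected:", selected)
```

Screening within clusters first means that correlated features compete only with each other, so a strong group cannot crowd out a weaker but still relevant group before the final selection step.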


research
07/01/2021

ControlBurn: Feature Selection by Sparse Forests

Tree ensembles distribute feature importance evenly amongst groups of co...
research
03/24/2021

A Two-Stage Variable Selection Approach for Correlated High Dimensional Predictors

When fitting statistical models, some predictors are often found to be c...
research
09/21/2012

Regression trees for longitudinal and multiresponse data

Previous algorithms for constructing regression tree models for longitud...
research
11/11/2019

LMLFM: Longitudinal Multi-Level Factorization Machines

Selecting important variables and learning predictive models from high-d...
research
12/10/2015

Cross-Validated Variable Selection in Tree-Based Methods Improves Predictive Performance

Recursive partitioning approaches producing tree-like models are a long ...
research
04/02/2022

Structural randomised selection

An important problem in the analysis of high-dimensional omics data is t...
research
01/31/2019

Random forests for high-dimensional longitudinal data

Random forests is a state-of-the-art supervised machine learning method ...
