Energy Trees: Regression and Classification With Structured and Mixed-Type Covariates

07/10/2022
by Riccardo Giubilei, et al.

The continuous growth of data complexity requires methods and models that adequately account for non-trivial structures, as any simplification may induce loss of information. Many analytical tools have been introduced to work with complex data objects in their original form, but such tools can typically deal with single-type variables only. In this work, we propose Energy Trees as a model for regression and classification tasks where covariates are potentially both structured and of different types. Energy Trees incorporate Energy Statistics to generalize Conditional Trees, from which they inherit statistically sound foundations, interpretability, scale invariance, and lack of distributional assumptions. We focus on functions and graphs as structured covariates and we show how the model can be easily adapted to work with almost any other type of variable. Through an extensive simulation study, we highlight the good performance of our proposal in terms of variable selection and robustness to overfitting. Finally, we validate the model's predictive ability through two empirical analyses with human biological data.
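
As a rough illustration of how an energy statistic can drive variable selection when covariates are of different types, the sketch below computes the sample distance covariance and distance correlation from pairwise distance matrices and attaches a permutation p-value to the test of independence. It is not the authors' implementation. The function names (`double_center`, `dcov2`, `dcor`, `perm_pvalue`, `abs_dist`), the number of permutations, and the use of absolute differences as the distance are all assumptions made for this example. The point is that only pairwise distances are needed, so a function-valued or graph-valued covariate could in principle be handled by supplying a suitable distance matrix for it.

```python
import numpy as np


def double_center(d):
    """Subtract row and column means from a distance matrix, add back the grand mean."""
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def dcov2(dx, dy):
    """Squared sample distance covariance (V-statistic) from two n x n distance matrices."""
    return float((double_center(dx) * double_center(dy)).mean())


def dcor(dx, dy):
    """Sample distance correlation, a value in [0, 1]."""
    denom = np.sqrt(dcov2(dx, dx) * dcov2(dy, dy))
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2(dx, dy), 0.0) / denom))


def perm_pvalue(dx, dy, n_perm=199, seed=0):
    """Permutation p-value for independence, using squared distance covariance."""
    rng = np.random.default_rng(seed)
    a = double_center(dx)
    b = double_center(dy)
    observed = (a * b).mean()
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(b.shape[0])
        # Permuting observations of y permutes rows and columns of its matrix together.
        if (a * b[np.ix_(p, p)]).mean() >= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)


def abs_dist(v):
    """Pairwise absolute differences; a hypothetical distance choice for scalar data."""
    v = np.asarray(v, dtype=float)
    return np.abs(v[:, None] - v[None, :])


# Toy data: the response is an exact linear function of the covariate.
x = np.linspace(0.0, 1.0, 30)
y = 2.0 * x + 1.0
dx, dy = abs_dist(x), abs_dist(y)
print(dcor(dx, dy), perm_pvalue(dx, dy))
```

In a tree-growing setting, a test of this kind could be run between the response and each candidate covariate at a node, with the covariate showing the strongest evidence of association chosen for splitting. That selection rule is stated here only as a plausible reading of the abstract, not as a description of the paper's exact procedure.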

