Structural randomised selection

04/02/2022
by Fan Wang, et al.

An important problem in the analysis of high-dimensional omics data is to identify subsets of molecular variables that are associated with a phenotype of interest. This requires addressing the challenges of high dimensionality, strong multicollinearity and model uncertainty. We propose a new ensemble learning approach for improving the performance of sparse penalised regression methods, called STructural RANDomised Selection (STRANDS). The approach, which builds and improves upon the Random Lasso method, consists of two steps. In both steps, we reduce dimensionality by repeatedly subsampling variables, apply a penalised regression method to each subsampled dataset, and average the results. In the first step, subsampling is informed by the correlation structure of the variables; in the second step, it is informed by the variable importance measures obtained in the first step. STRANDS can be used with any sparse penalised regression approach as the "base learner". Using synthetic data and real biological datasets, we demonstrate that STRANDS typically improves upon its base learner, and that taking account of the correlation structure in the first step can help to explore the model space more efficiently.
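The two-step scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the correlation structure informs the first-step subsampling, so the sketch assumes a simple greedy grouping of variables by absolute correlation and gives each group equal total sampling weight. The base learner here is a plain coordinate-descent lasso, and all names and parameters (`strands_sketch`, `correlation_groups`, `q`, `B`, `lam`, `threshold`) are hypothetical.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent lasso for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.array(y, dtype=float)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            resid += X[:, j] * beta[j]          # add back variable j's fit
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]          # remove the updated fit
    return beta

def correlation_groups(X, threshold=0.7):
    """ASSUMPTION: greedy grouping of variables whose |correlation| exceeds threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    labels = -np.ones(corr.shape[0], dtype=int)
    group = 0
    for j in range(corr.shape[0]):
        if labels[j] < 0:
            members = (labels < 0) & (corr[j] > threshold)
            members[j] = True
            labels[members] = group
            group += 1
    return labels

def averaged_fits(X, y, probs, q, B, lam, rng):
    """Fit the base learner on B random subsets of q variables and average."""
    p = X.shape[1]
    total = np.zeros(p)
    for _ in range(B):
        idx = rng.choice(p, size=q, replace=False, p=probs)
        total[idx] += lasso_cd(X[:, idx], y, lam)   # unsampled variables count as 0
    return total / B

def strands_sketch(X, y, q=5, B=50, lam=0.1, threshold=0.7, seed=0):
    rng = np.random.default_rng(seed)
    X = (X - X.mean(axis=0)) / X.std(axis=0)    # assumes no constant columns
    y = y - y.mean()
    # Step 1: subsampling informed by correlation structure (assumed scheme:
    # every correlation group receives the same total sampling probability).
    labels = correlation_groups(X, threshold)
    sizes = np.bincount(labels)[labels]
    probs1 = 1.0 / sizes
    probs1 /= probs1.sum()
    importance = np.abs(averaged_fits(X, y, probs1, q, B, lam, rng))
    # Step 2: subsampling informed by the step-1 importance measures.
    probs2 = importance + 1e-8                  # keep every variable drawable
    probs2 /= probs2.sum()
    return averaged_fits(X, y, probs2, q, B, lam, rng)
```

Calling `strands_sketch(X, y)` on an `n x p` design matrix and a response vector returns a length-`p` vector of averaged coefficients, which can be thresholded to obtain a selected subset. Any other sparse penalised regression routine could be substituted for `lasso_cd`, in line with the abstract's remark that the base learner is interchangeable.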


