Controlling false discoveries in high-dimensional situations: Boosting with stability selection

11/05/2014
by Benjamin Hofner, et al.

Modern biotechnologies often produce high-dimensional data sets with many more variables than observations (n ≪ p). Such data sets pose new challenges for statistical analysis, and variable selection becomes one of the most important tasks in this setting. We assess stability selection, a recently proposed, flexible framework for variable selection. Using resampling procedures, stability selection adds finite-sample error control to high-dimensional variable selection procedures such as the Lasso or boosting. We consider the combination of boosting and stability selection and present results from a detailed simulation study that provides insight into the usefulness of this combination. We discuss limitations and give guidance on the specification and tuning of stability selection. We also elaborate on the interpretation of the error bounds used and give insights for practical data analysis. The results will be used to detect differentially expressed phenotype measurements in patients with autism spectrum disorders. All methods are implemented in the freely available R package stabs.
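To make the combination of boosting and stability selection concrete, here is a minimal Python sketch. It is not the stabs API, and all function names are hypothetical. Componentwise L2 boosting serves as the base procedure: on each of B random subsamples of size ⌊n/2⌋, boosting runs until q distinct variables have entered the model. A variable is called stable if its selection frequency reaches the threshold π_thr. Under the original Meinshausen and Bühlmann subsampling scheme and their exchangeability assumptions, the expected number of falsely selected variables (the per-family error rate, PFER) is bounded by E(V) ≤ q² / ((2π_thr − 1) p) for π_thr in (0.5, 1]. The sketch only reports this bound and does not verify its assumptions. The step length nu = 0.1, B = 50 and the stopping rule are illustrative choices, not values taken from the paper.

```python
import numpy as np

# Hypothetical helper names; a sketch, not the stabs implementation.


def componentwise_l2boost(X, y, q, nu=0.1, max_iter=1000):
    """Componentwise L2 boosting, stopped once q distinct variables are selected."""
    X = X - X.mean(axis=0)              # center predictors
    norms = (X ** 2).sum(axis=0)        # squared column norms for least-squares fits
    resid = y - y.mean()                # start from the intercept-only model
    selected = []
    for _ in range(max_iter):
        # least-squares coefficient of each single predictor on the current residuals
        coefs = X.T @ resid / norms
        # the best base learner gives the largest reduction in residual sum of squares
        j = int(np.argmax(coefs ** 2 * norms))
        if j not in selected:
            if len(selected) == q:      # a (q+1)-th variable would enter: stop
                break
            selected.append(j)
        resid = resid - nu * coefs[j] * X[:, j]   # shrunken update of the fit
    return set(selected)


def stability_selection(X, y, q, cutoff, B=50, seed=0):
    """Selection frequencies over B subsamples of size n // 2, plus the PFER bound."""
    if not 0.5 < cutoff <= 1:
        raise ValueError("cutoff (pi_thr) must lie in (0.5, 1]")
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(B):
        idx = rng.choice(n, size=n // 2, replace=False)  # subsample without replacement
        for j in componentwise_l2boost(X[idx], y[idx], q):
            counts[j] += 1
    freqs = counts / B
    stable = np.flatnonzero(freqs >= cutoff)
    # Meinshausen-Buehlmann bound on E(V), the expected number of false selections
    pfer_bound = q ** 2 / ((2 * cutoff - 1) * p)
    return stable, freqs, pfer_bound


# Usage: three informative variables among p = 200 (simulated, illustrative data)
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 200))
beta = np.zeros(200)
beta[:3] = [4.0, -4.0, 4.0]
y = X @ beta + 0.5 * rng.standard_normal(100)
stable, freqs, bound = stability_selection(X, y, q=10, cutoff=0.75)
print(stable, bound)
```

In this example p = 200, q = 10 and π_thr = 0.75, so the bound is 10² / (0.5 · 200) = 1: at most one falsely selected variable is expected. The bound grows with q and shrinks as π_thr increases, which is why the two parameters have to be specified together.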


research
07/03/2018

Controlling the False Discovery Rate via Knockoff for High Dimensional Ising Model Variable Selection

In high dimensional data analysis, it is important to effectively contro...
research
12/13/2017

Stability Selection for Structured Variable Selection

In variable or graph selection problems, finding a right-sized model or ...
research
02/10/2022

Loss-guided Stability Selection

In modern data analysis, sparse model selection becomes inevitable once ...
research
05/30/2012

Finding Important Genes from High-Dimensional Data: An Appraisal of Statistical Tests and Machine-Learning Approaches

Over the past decades, statisticians and machine-learning researchers ha...
research
10/05/2017

A Universal Simulation Platform for Flexible Systems

This article proposes a universal simulation platform for simulating sys...
research
08/02/2018

High-dimensional regression in practice: an empirical study of finite-sample prediction, variable selection and ranking

Penalized likelihood methods are widely used for high-dimensional regres...
research
02/10/2017

L_2Boosting for Economic Applications

In the recent years more and more high-dimensional data sets, where the ...
