Ultra-high Dimensional Multiple Output Learning With Simultaneous Orthogonal Matching Pursuit: A Sure Screening Approach

12/17/2010
by Mladen Kolar, et al.

We propose a novel application of the Simultaneous Orthogonal Matching Pursuit (S-OMP) procedure for sparsistent variable selection in ultra-high dimensional multi-task regression problems. Screening of variables, as introduced by Fan and Lv (2008), is an efficient and highly scalable way to remove many irrelevant variables from the set of all variables while retaining all the relevant ones. S-OMP can be applied to problems with hundreds of thousands of variables; once the number of variables has been reduced to a manageable size, a more computationally demanding procedure can be used to identify the relevant variables for each of the regression outputs. To our knowledge, this is the first attempt to use the relatedness of multiple outputs to perform fast screening of relevant variables. As our main theoretical contribution, we prove that, asymptotically, S-OMP is guaranteed to reduce an ultra-high number of variables to below the sample size without losing any truly relevant variables. We also provide formal evidence that a modified Bayesian information criterion (BIC) can be used to efficiently determine the number of iterations in S-OMP. We further provide empirical evidence of the benefit of selecting variables jointly across multiple regression outputs, as opposed to performing variable selection for each output separately. The finite-sample performance of S-OMP is demonstrated in extensive simulation studies and on a genetic association mapping problem.

Keywords: Adaptive Lasso; Greedy forward regression; Orthogonal matching pursuit; Multi-output regression; Multi-task learning; Simultaneous orthogonal matching pursuit; Sure screening; Variable selection
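The screening idea described in the abstract can be illustrated with a minimal NumPy sketch. It assumes a design matrix X (n samples by p variables) and an output matrix Y (n by T outputs), both roughly centered. At each step it adds the single variable that most reduces the residual sum of squares summed over all T outputs, which is how we read "simultaneous" greedy selection; treat that exact criterion as an assumption rather than the paper's specification. The function names `somp_screen` and `choose_steps_bic` are hypothetical. The stopping rule in `choose_steps_bic` is a generic BIC-style penalty chosen for illustration and is NOT the modified BIC analyzed in the paper.

```python
import numpy as np


def somp_screen(X, Y, n_steps):
    """Greedy S-OMP-style screening (illustrative sketch, not the paper's code).

    X: (n, p) design matrix; Y: (n, T) matrix holding T regression outputs.
    Returns the selected column indices in selection order and the total
    residual sum of squares (summed over outputs) after 0, 1, ..., n_steps steps.
    """
    n, p = X.shape
    R = Y.astype(float).copy()      # current residuals, one column per output
    Q = np.zeros((n, 0))            # orthonormal basis of the selected columns
    selected = []
    rss_path = [float(np.sum(R ** 2))]
    for _ in range(n_steps):
        # Component of each candidate column orthogonal to the selected ones.
        Z = X - Q @ (Q.T @ X)
        norms = np.sum(Z ** 2, axis=0)
        # RSS reduction from adding column j, summed over all T outputs:
        # sum_t (z_j^T r_t)^2 / ||z_j||^2
        gains = np.sum((Z.T @ R) ** 2, axis=1) / np.maximum(norms, 1e-12)
        gains[selected] = -np.inf       # never re-select a column
        gains[norms < 1e-10] = -np.inf  # skip columns already in the span
        j = int(np.argmax(gains))
        q = Z[:, j] / np.sqrt(norms[j])  # Gram-Schmidt orthonormalization step
        Q = np.column_stack([Q, q])
        R = R - np.outer(q, q @ R)      # remove the new direction from every residual
        selected.append(j)
        rss_path.append(float(np.sum(R ** 2)))
    return selected, rss_path


def choose_steps_bic(rss_path, n, p, T):
    """Hypothetical BIC-style stopping rule (NOT the paper's modified BIC)."""
    ks = np.arange(len(rss_path))
    rss = np.maximum(np.asarray(rss_path), 1e-300)
    bic = np.log(rss / (n * T)) + ks * (np.log(n) + 2.0 * np.log(p)) / n
    return int(np.argmin(bic))


# Usage on synthetic data: variables 0, 1 and 2 are relevant for every output.
rng = np.random.default_rng(0)
n, p, T = 200, 500, 5
X = rng.standard_normal((n, p))
B = np.zeros((p, T))
B[:3] = rng.uniform(2.0, 4.0, size=(3, T))
Y = X @ B + 0.5 * rng.standard_normal((n, T))

selected, rss_path = somp_screen(X, Y, n_steps=10)
k = choose_steps_bic(rss_path, n, p, T)
print("screened set:", sorted(selected[:k]))
```

Because the gain is pooled over outputs, a variable that is weakly relevant to each output individually but shared by all of them can still be selected early. This is the intuition behind screening jointly rather than output by output. After screening, the retained columns would be passed to a more expensive per-output procedure, which this sketch does not implement.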
