Doubly Robust Feature Selection with Mean and Variance Outlier Detection and Oracle Properties

06/22/2021
by Luca Insolia, et al.

We propose a general approach to handle data contamination that might disrupt the performance of feature selection and estimation procedures for high-dimensional linear models. Specifically, we consider the co-occurrence of mean-shift and variance-inflation outliers, which can be modeled as additional fixed and random components, respectively, and evaluated independently. Our proposal performs feature selection while detecting and down-weighting variance-inflation outliers, detecting and excluding mean-shift outliers, and retaining non-outlying cases with full weights. Feature selection and mean-shift outlier detection are performed through a robust class of nonconcave penalization methods. Variance-inflation outlier detection is based on the penalization of the restricted posterior mode. The resulting approach satisfies a robust oracle property for feature selection in the presence of data contamination (which allows the number of features to increase exponentially with the sample size) and detects truly outlying cases of each type with asymptotic probability one. This provides an optimal trade-off between a high breakdown point and efficiency. Computationally efficient heuristic procedures are also presented. We illustrate the finite-sample performance of our proposal through an extensive simulation study and a real-world application.
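The two contamination types described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' procedure: it assumes a linear model in which mean-shift outliers receive an added constant and variance-inflation outliers have their error variance multiplied by a factor, and it builds "oracle" weights as if the outlier labels were known (zero weight for mean-shift cases, inverse inflation factor for variance-inflation cases, full weight otherwise). All function names, parameter names, and default values are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_contaminated(n=200, p=20, k=5, n_shift=10, n_inflate=20,
                          shift=10.0, inflation=25.0, sigma=1.0, seed=0):
    """Simulate y = X beta + phi + eps with two outlier types (assumed setup)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:k] = 2.0  # only the first k features are active

    # labels: 0 = clean, 1 = mean-shift outlier, 2 = variance-inflation outlier
    labels = np.zeros(n, dtype=int)
    idx = rng.permutation(n)
    labels[idx[:n_shift]] = 1
    labels[idx[n_shift:n_shift + n_inflate]] = 2

    phi = np.where(labels == 1, shift, 0.0)   # fixed shift component
    v = np.where(labels == 2, inflation, 1.0)  # variance inflation factor
    eps = rng.standard_normal(n) * sigma * np.sqrt(v)
    y = X @ beta + phi + eps
    return X, y, beta, labels, v


def oracle_weights(labels, v):
    """Weights that assume the outlier labels are known (illustration only)."""
    w = 1.0 / v            # down-weight variance-inflation cases
    w[labels == 1] = 0.0   # exclude mean-shift cases
    return w


def weighted_least_squares(X, y, w):
    """Solve the weighted least-squares problem by rescaling rows."""
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef


X, y, beta, labels, v = simulate_contaminated()
w = oracle_weights(labels, v)
# Fit on the true active features only, as an oracle would.
coef = weighted_least_squares(X[:, :5], y, w)
```

In practice neither the labels nor the active features are known, which is the problem the abstract addresses; the sketch only shows what the three weighting regimes mean once they have been identified.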


research
07/12/2020

Simultaneous Feature Selection and Outlier Detection with Optimality Guarantees

Sparse estimation methods capable of tolerating outliers have been broad...
research
08/08/2017

An Effective Feature Selection Method Based on Pair-Wise Feature Proximity for High Dimensional Low Sample Size Data

Feature selection has been studied widely in the literature. However, th...
research
03/21/2021

Homophily Outlier Detection in Non-IID Categorical Data

Most existing outlier detection methods assume that the outlier facto...
research
01/15/2020

Outlier Detection Ensemble with Embedded Feature Selection

Feature selection plays an important role in improving the performance ...
research
10/12/2020

Robust Finite Mixture Regression for Heterogeneous Targets

Finite Mixture Regression (FMR) refers to the mixture modeling scheme wh...
research
05/21/2012

Variance function estimation in high-dimensions

We consider the high-dimensional heteroscedastic regression model, where...
research
01/06/2022

Predictive Criteria for Prior Selection Using Shrinkage in Linear Models

Choosing a shrinkage method can be done by selecting a penalty from a li...
