Simultaneous Feature Selection and Outlier Detection with Optimality Guarantees

07/12/2020
by   Luca Insolia, et al.

Sparse estimation methods capable of tolerating outliers have been broadly investigated in the last decade. We contribute to this research by considering high-dimensional regression problems contaminated by multiple mean-shift outliers, which affect both the response and the design matrix. We develop a general framework for this class of problems and propose the use of mixed-integer programming to simultaneously perform feature selection and outlier detection with provable optimality guarantees. We characterize the theoretical properties of our approach: a necessary and sufficient condition for the robustly strong oracle property, which allows the number of features to increase exponentially with the sample size; optimal estimation of the parameters; and the breakdown point of the resulting estimates. Moreover, we provide computationally efficient procedures to tune the integer constraints and to warm-start the algorithm. Through numerical simulations and an application investigating the relationships between the human microbiome and childhood obesity, we show that our proposal outperforms existing heuristic methods.
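
The idea of jointly choosing features and flagging outliers can be illustrated on a toy problem. The sketch below is not the authors' implementation: it replaces the mixed-integer program with exhaustive enumeration, which is only feasible for very small data, and it relies on the standard observation that giving a case its own mean-shift parameter fits that case exactly, so flagging it is equivalent to dropping it from the least-squares fit. The function names (`solve_linear`, `ols_rss`, `best_subset_trimmed`), the budgets `k_features` and `k_outliers`, and the data are all hypothetical.

```python
from itertools import combinations

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ols_rss(X, y, rows, cols):
    """Least squares (no intercept) on the given rows and columns.

    Returns the coefficients and the residual sum of squares.
    """
    A = [[sum(X[i][a] * X[i][b] for i in rows) for b in cols] for a in cols]
    g = [sum(X[i][a] * y[i] for i in rows) for a in cols]
    beta = solve_linear(A, g)
    rss = sum(
        (y[i] - sum(bj * X[i][c] for bj, c in zip(beta, cols))) ** 2
        for i in rows
    )
    return beta, rss

def best_subset_trimmed(X, y, k_features, k_outliers):
    """Enumerate every (feature subset, outlier subset) pair of the given
    sizes and keep the pair with the smallest RSS on the retained cases."""
    n, p = len(X), len(X[0])
    best = None
    for cols in combinations(range(p), k_features):
        for out in combinations(range(n), k_outliers):
            rows = [i for i in range(n) if i not in out]
            try:
                beta, rss = ols_rss(X, y, rows, cols)
            except ValueError:
                continue  # skip rank-deficient candidate fits
            if best is None or rss < best[0]:
                best = (rss, cols, out, beta)
    return best

# Hypothetical toy data: y = 2 * x0 on the clean cases; case 5 is shifted
# in both the design (x0 = 10) and the response (y = -15).
X = [
    [1.0, 2.0, -1.0], [2.0, -1.0, 0.5], [3.0, 0.5, 2.0], [-1.0, 1.0, 1.0],
    [-2.0, 3.0, -0.5], [10.0, 0.0, 0.0], [0.5, -2.0, 1.5], [1.5, 1.0, -2.0],
]
y = [2.0, 4.0, 6.0, -2.0, -4.0, -15.0, 1.0, 3.0]

rss, features, outliers, beta = best_subset_trimmed(X, y, k_features=1, k_outliers=1)
print("selected features:", features)
print("flagged outliers:", outliers)
print("coefficients:", beta)
```

The enumeration visits every combination, so its cost grows combinatorially with the number of features and cases. This is the reason a mixed-integer solver, tuned integer constraints, and warm starts matter at realistic problem sizes.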

research
06/22/2021

Doubly Robust Feature Selection with Mean and Variance Outlier Detection and Oracle Properties

We propose a general approach to handle data contaminations that might d...
research
06/24/2023

High-dimensional outlier detection and variable selection via adaptive weighted mean regression

This paper proposes an adaptive penalized weighted mean regression for o...
research
03/30/2021

A General Framework of Nonparametric Feature Selection in High-Dimensional Data

Nonparametric feature selection in high-dimensional data is an important...
research
02/16/2015

Random Subspace Learning Approach to High-Dimensional Outliers Detection

We introduce and develop a novel approach to outlier detection based on ...
research
02/20/2023

Model-based feature selection for neural networks: A mixed-integer programming approach

In this work, we develop a novel input feature selection framework for R...
research
02/12/2019

Sparse Feature Selection in Kernel Discriminant Analysis via Optimal Scoring

We consider the two-group classification problem and propose a kernel cl...
research
09/12/2022

Bilevel Optimization for Feature Selection in the Data-Driven Newsvendor Problem

We study the feature-based newsvendor problem, in which a decision-maker...
