Post-selection inference with HSIC-Lasso

10/29/2020
by Tobias Freidling, et al.

Detecting influential features in complex (non-linear and/or high-dimensional) datasets is key for extracting the relevant information. Most popular selection procedures, however, require assumptions on the underlying data, such as distributional ones, that rarely agree with empirical observations. Feature selection based on non-linear methods, such as the model-free HSIC-Lasso, is therefore a more suitable approach. To ensure valid inference among the chosen features, the selection procedure must be accounted for. In this paper, we propose selective inference with HSIC-Lasso using the framework of truncated Gaussians together with the polyhedral lemma. Based on these theoretical foundations, we develop an algorithm that keeps computational costs low and addresses the selection of hyper-parameters. The relevance of our method is illustrated using artificial and real-world datasets. In particular, our empirical findings emphasise that type-I error control at the considered level can be achieved.
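To make the "truncated Gaussians together with the polyhedral lemma" idea concrete, the following is a minimal sketch of the generic construction from the selective inference literature, not the algorithm developed in the paper. It assumes a Gaussian response y ~ N(mu, Sigma) and a selection event that can be written as {A y <= b}. The function names, the toy selection event and the numbers are all hypothetical. The HSIC-Lasso specific steps (how the statistic and the constraints A, b arise) are not shown.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncation_interval(A, b, y, eta, Sigma):
    """Polyhedral lemma: bounds [lo, hi] on eta'y implied by {A y <= b}."""
    n = len(y)
    Sigma_eta = [sum(Sigma[i][j] * eta[j] for j in range(n)) for i in range(n)]
    var = sum(eta[i] * Sigma_eta[i] for i in range(n))  # eta' Sigma eta
    c = [s / var for s in Sigma_eta]
    t = sum(eta[i] * y[i] for i in range(n))            # observed eta'y
    z = [y[i] - c[i] * t for i in range(n)]             # part independent of eta'y
    lo, hi = -math.inf, math.inf
    for row, b_j in zip(A, b):
        Ac = sum(row[i] * c[i] for i in range(n))
        Az = sum(row[i] * z[i] for i in range(n))
        if Ac > 0:
            hi = min(hi, (b_j - Az) / Ac)
        elif Ac < 0:
            lo = max(lo, (b_j - Az) / Ac)
    return t, var, lo, hi


def selective_p_value(t, var, lo, hi, mean0=0.0):
    """One-sided p-value P(T >= t | lo <= T <= hi) for T ~ N(mean0, var)."""
    sd = math.sqrt(var)

    def cdf(x):
        return norm_cdf((x - mean0) / sd)

    return (cdf(hi) - cdf(t)) / (cdf(hi) - cdf(lo))


# Toy example (hypothetical): coordinate 1 is "selected" because
# y1 >= y2 and y1 >= 0, i.e. -y1 + y2 <= 0 and -y1 <= 0.
y = [2.0, 0.5]
Sigma = [[1.0, 0.0], [0.0, 1.0]]
A = [[-1.0, 1.0], [-1.0, 0.0]]
b = [0.0, 0.0]
eta = [1.0, 0.0]  # test the mean of the selected coordinate

t, var, lo, hi = truncation_interval(A, b, y, eta, Sigma)
p_selective = selective_p_value(t, var, lo, hi)
p_naive = 1.0 - norm_cdf(t / math.sqrt(var))
print(lo, hi, p_selective, p_naive)
```

In this toy case the selective p-value is larger than the naive one, because conditioning on the selection event removes the part of the evidence that was already used to pick the coordinate. The CDF differences used here lose precision far in the tails, so a practical implementation would likely need a more careful tail computation.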


