PUlasso: High-dimensional variable selection with presence-only data

11/22/2017
by   Hyebin Song, et al.
In many real-world problems, we are presented with positive and unlabelled data, referred to as presence-only responses, and the number of covariates p is large. The combination of presence-only responses and high dimensionality presents both statistical and computational challenges. In this paper, we develop the PUlasso algorithm for variable selection and classification with positive and unlabelled responses. Our algorithm uses the majorization-minimization (MM) framework, which is a generalization of the well-known expectation-maximization (EM) algorithm. In particular, to make our algorithm scalable, we provide two computational speed-ups to the standard EM algorithm. We provide a theoretical guarantee: we first show that our algorithm converges to a stationary point, and then prove that any stationary point achieves the minimax optimal mean-squared error of s log p / n, where s is the sparsity of the true parameter. We also demonstrate through simulations that our algorithm outperforms state-of-the-art algorithms in moderate-p settings in terms of classification performance. Finally, we demonstrate that our PUlasso algorithm performs well on a biochemistry example.
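
To make the MM idea mentioned in the abstract concrete, here is a minimal sketch of majorization-minimization applied to ordinary l1-penalized logistic regression. It is not the PUlasso algorithm: it ignores the positive/unlabelled structure and the speed-ups the paper describes, and the function names (`soft_threshold`, `mm_l1_logistic`, `objective`) are hypothetical. Each iteration minimizes a quadratic upper bound of the smooth loss plus the l1 penalty, so the objective never increases.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal map of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def objective(X, y, beta, lam):
    """Average logistic loss plus l1 penalty (no intercept, y in {0, 1})."""
    z = X @ beta
    return np.mean(np.logaddexp(0.0, z) - y * z) + lam * np.abs(beta).sum()


def mm_l1_logistic(X, y, lam, n_iter=500):
    """Illustrative MM iterations for l1-penalized logistic regression.

    The Hessian of the average logistic loss is bounded by X^T X / (4n),
    so a quadratic with curvature L = sigma_max(X)^2 / (4n) majorizes the
    loss at the current iterate. Minimizing that majorizer plus the l1
    penalty has the closed-form soft-thresholding update used below.
    """
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / (4.0 * n)  # spectral norm squared / (4n)
    beta = np.zeros(p)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))  # fitted probabilities
        grad = X.T @ (prob - y) / n               # gradient of the smooth part
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta
```

A larger `lam` drives more coefficients exactly to zero, which is how the l1 penalty performs variable selection; a sufficiently large `lam` leaves the all-zero start unchanged.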
