Fast Projection onto the Capped Simplex with Applications to Sparse Regression in Bioinformatics

10/16/2021
by   Andersen Ang, et al.

We consider the problem of projecting a vector onto the so-called k-capped simplex, which is a hypercube cut by a hyperplane. For an n-dimensional input vector with bounded elements, we find that a simple algorithm based on Newton's method solves the projection problem to high precision with a complexity of roughly O(n), a much lower computational cost than that of the existing sorting-based methods proposed in the literature. We provide a theory that partially explains and justifies the method. We demonstrate that the proposed algorithm produces high-precision solutions of the projection problem on large-scale datasets and significantly outperforms the state-of-the-art methods in runtime (about 6-8 times faster than a commercial software package in CPU time for input vectors with 1 million variables or more). We further illustrate the effectiveness of the proposed algorithm for sparse regression in a bioinformatics problem. Empirical results on the GWAS dataset (with 1,500,000 single-nucleotide polymorphisms) show that, when the proposed method is used to accelerate the Projected Quasi-Newton (PQN) method, the accelerated PQN algorithm can handle huge-scale regression problems and is more efficient (about 3-6 times faster) than the current state-of-the-art methods.
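The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way a Newton-type projection could look, not the authors' implementation. It assumes the k-capped simplex is the set {x : 0 <= x_i <= 1, sum(x) = k}, in which case the projection of y has the form x_i = clip(y_i - gamma, 0, 1) and gamma is a root of the piecewise-linear function omega(gamma) = sum_i clip(y_i - gamma, 0, 1) - k. The function name `project_capped_simplex`, the tolerance, the iteration cap, and the bisection safeguard are illustrative choices that do not come from the paper.

```python
def project_capped_simplex(y, k, tol=1e-12, max_iter=100):
    """Sketch: project y onto {x : 0 <= x_i <= 1, sum(x) = k}.

    Finds the shift gamma with sum(clip(y_i - gamma, 0, 1)) = k using
    Newton steps, falling back to bisection when a step is unusable.
    Each iteration is a single O(n) pass over y; no sorting is needed.
    """
    n = len(y)
    if n == 0 or not 0 <= k <= n:
        raise ValueError("need a non-empty y and 0 <= k <= len(y)")

    # Bracket for gamma: omega >= 0 at lo and omega <= 0 at hi.
    # It is widened by 1 on each side so boundary roots lie strictly inside.
    lo, hi = min(y) - 2.0, max(y) + 1.0
    gamma = 0.5 * (lo + hi)

    for _ in range(max_iter):
        total = 0.0   # sum of clip(y_i - gamma, 0, 1)
        active = 0    # number of entries strictly between the caps
        for v in y:
            t = v - gamma
            if t >= 1.0:
                total += 1.0
            elif t > 0.0:
                total += t
                active += 1

        resid = total - k  # omega(gamma), non-increasing in gamma
        if abs(resid) <= tol:
            break

        # Shrink the bracket around the root.
        if resid > 0.0:
            lo = gamma
        else:
            hi = gamma

        # Newton step: omega'(gamma) = -active, so gamma += resid / active.
        step_ok = False
        if active > 0:
            candidate = gamma + resid / active
            step_ok = lo < candidate < hi
        # Safeguard (an assumption of this sketch): bisect when the
        # derivative is zero or the Newton step leaves the bracket.
        gamma = candidate if step_ok else 0.5 * (lo + hi)

    return [min(1.0, max(0.0, v - gamma)) for v in y]
```

For example, `project_capped_simplex([0.9, 0.1, 0.5, 1.4], 2)` returns a vector whose entries lie in [0, 1] and sum to 2 up to the tolerance. The bracket is what keeps the iteration from stalling on flat pieces of omega, where no entry is strictly between the caps and the Newton step is undefined.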


