Fast computation of the principal components of genotype matrices in Julia

08/09/2018
by Jiahao Chen, et al.

Finding the few largest principal components of a matrix of genetic data is a common task in genome-wide association studies (GWASs), both for dimensionality reduction and for identifying unwanted factors of variation. We describe a simple random matrix model for matrices that arise in GWASs, showing that the bulk of the singular values follows a Marchenko-Pastur distribution, with a handful of large outliers. We also implement Golub-Kahan-Lanczos (GKL) bidiagonalization in the Julia programming language, providing thick restarting and a choice between full and partial reorthogonalization strategies to control numerical roundoff. Our implementation of GKL bidiagonalization is up to 36 times faster than software tools commonly used in genomics data analysis for computing principal components, such as EIGENSOFT and FlashPCA, which use dense LAPACK routines and randomized subspace iteration, respectively.
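For readers who want to experiment, the following is a minimal Julia sketch of the two points above, not the authors' implementation: it builds a simulated standardized genotype-style matrix whose singular values form a Marchenko-Pastur-type bulk with a few planted outliers, and extracts only the leading singular values with an off-the-shelf iterative Lanczos-type solver (Arpack.svds, standing in for the GKL bidiagonalization routine described in the paper). The matrix sizes and signal strengths are arbitrary choices for illustration.

# Minimal sketch (assumed setup, not the paper's code): Marchenko-Pastur bulk plus
# planted outliers, with only the top few singular values computed iteratively.
using LinearAlgebra, Random, Arpack

Random.seed!(1)
m, n, k = 2_000, 500, 5            # samples x markers, number of planted components

# iid unit-variance noise: bulk of singular values lies below roughly sqrt(m) + sqrt(n)
X = randn(m, n)
bulk_edge = sqrt(m) + sqrt(n)

# Plant k rank-one "population structure" factors as outliers at ~3x the bulk edge
for _ in 1:k
    u, v = randn(m), randn(n)
    X .+= (3 * bulk_edge / (norm(u) * norm(v))) .* (u * v')
end

# Top-(k+3) singular values via an iterative Lanczos-type solver; no dense SVD needed
S, = svds(X; nsv = k + 3)
println("bulk edge ≈ ", round(bulk_edge, digits = 1))
println("leading singular values: ", round.(S.S, digits = 1))
# Expect ~k values well above the bulk edge (the planted factors), the rest near or below it.

Because only the leading singular triplets are requested, the solver touches X only through matrix-vector products, which is what makes this approach attractive for the tall, sparse-ish genotype matrices that arise in GWASs.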


