pylspack: Parallel algorithms and data structures for sketching, column subset selection, regression and leverage scores

03/05/2022
by Aleksandros Sobczyk, et al.

We present parallel algorithms and data structures for three fundamental operations in Numerical Linear Algebra: (i) Gaussian and CountSketch random projections and their combination, (ii) computation of the Gram matrix and (iii) computation of the squared row norms of the product of two matrices, with a special focus on "tall-and-skinny" matrices, which arise in many applications. We provide a detailed analysis of the ubiquitous CountSketch transform and its combination with Gaussian random projections, accounting for memory requirements, computational complexity and workload balancing. We also demonstrate how these results can be applied to column subset selection, least squares regression and leverage scores computation. These tools have been implemented in pylspack, a publicly available Python package (https://github.com/IBM/pylspack) whose core is written in C++ and parallelized with OpenMP, and which is compatible with standard matrix data structures of SciPy and NumPy. Extensive numerical experiments indicate that the proposed algorithms scale well and significantly outperform existing libraries for tall-and-skinny matrices.
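The three operations named in the abstract can be illustrated with a minimal serial sketch in plain NumPy and SciPy. This is not the pylspack API and not the parallel algorithms of the paper; the function names (`countsketch`, `gaussian_sketch`, `gram`, `squared_row_norms_of_product`) and the matrix sizes are hypothetical, chosen only to show what each operation computes for a tall-and-skinny matrix.

```python
import numpy as np
import scipy.sparse as sp


def countsketch(r, n, rng):
    """Hypothetical helper: r x n CountSketch matrix.

    Each column has exactly one nonzero entry, a random sign (+1 or -1)
    placed in a uniformly random row.
    """
    rows = rng.integers(0, r, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    cols = np.arange(n)
    return sp.csr_matrix((signs, (rows, cols)), shape=(r, n))


def gaussian_sketch(m, r, rng):
    """Hypothetical helper: m x r Gaussian projection, scaled by 1/sqrt(m)."""
    return rng.standard_normal((m, r)) / np.sqrt(m)


def gram(A):
    """Gram matrix A^T A of a dense matrix A."""
    return A.T @ A


def squared_row_norms_of_product(A, B):
    """Squared Euclidean norm of every row of A @ B."""
    C = A @ B
    return np.einsum("ij,ij->i", C, C)


rng = np.random.default_rng(0)
n, d = 10_000, 8                     # tall-and-skinny: n >> d (sizes are illustrative)
A = rng.standard_normal((n, d))

S = countsketch(256, n, rng)         # sparse first-stage sketch
G = gaussian_sketch(32, 256, rng)    # dense second-stage sketch
A_sketched = G @ (S @ A)             # combined transform, shape (32, d)

K = gram(A)                          # d x d Gram matrix
B = rng.standard_normal((d, d))
row_norms = squared_row_norms_of_product(A, B)  # length-n vector
```

Applying the CountSketch first reduces the row count at a cost proportional to the number of nonzeros of `A`, so the dense Gaussian projection only touches the much smaller intermediate matrix. The squared row norms of a product are the quantity needed for leverage scores, where `B` would be the inverse of a triangular factor rather than the random matrix used here.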


research
05/09/2022

Robust Parameter Identifiability Analysis via Column Subset Selection

We advocate a numerically reliable and accurate approach for practical p...
research
11/09/2020

Quantum-Inspired Algorithms from Randomized Numerical Linear Algebra

We create classical (non-quantum) dynamic data structures supporting que...
research
09/13/2022

Fast Algorithms for Monotone Lower Subsets of Kronecker Least Squares Problems

Approximate solutions to large least squares problems can be computed ef...
research
03/15/2019

Subset Selection for Matrices with Fixed Blocks

Subset selection for matrices is the task of extracting a column sub-mat...
research
05/23/2021

Estimating leverage scores via rank revealing methods and randomization

We study algorithms for estimating the statistical leverage scores of re...
research
11/18/2021

Parallel Algorithms for Masked Sparse Matrix-Matrix Products

Computing the product of two sparse matrices (SpGEMM) is a fundamental o...
research
11/20/2018

Analytic Network Learning

Based on the property that solving the system of linear matrix equations...
