Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments

02/10/2015
by Jiyan Yang, et al.

In this era of large-scale data, distributed systems built on top of clusters of commodity hardware provide cheap and reliable storage and scalable processing of massive data. Here, we review recent work on developing and implementing randomized matrix algorithms in large-scale parallel and distributed environments. Randomized algorithms for matrix problems have received a great deal of attention in recent years, but thus far typically either in theory, in machine learning applications, or with implementations on a single machine. Our main focus is on the underlying theory and practical implementation of random projection and random sampling algorithms for very large, very overdetermined (i.e., overconstrained) ℓ_1 and ℓ_2 regression problems. Randomization can be used in one of two related ways: either to construct sub-sampled problems that can be solved, exactly or approximately, with traditional numerical methods; or to construct preconditioned versions of the original full problem that are easier to solve with traditional iterative algorithms. Theoretical results demonstrate that in near input-sparsity time and with only a few passes through the data one can obtain very strong relative-error approximate solutions, with high probability. Empirical results highlight the importance of various trade-offs (e.g., between the time to construct an embedding and the conditioning quality of the embedding, or between the relative importance of computation versus communication) and demonstrate that ℓ_1 and ℓ_2 regression problems can be solved to low, medium, or high precision in existing distributed systems on up to terabyte-sized data.
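The two uses of randomization described above can be illustrated with a small, single-machine NumPy sketch of overdetermined ℓ_2 regression. This is a minimal illustration, not the paper's implementation: it uses a dense Gaussian projection (the work surveyed uses faster embeddings and distributed systems), and all problem sizes, the sketch dimension `r`, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# An overdetermined least-squares problem: A is n x d with n >> d.
n, d = 20000, 50
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)

# --- Use 1: sketch-and-solve ---
# Compress the rows with a Gaussian random projection S (r x n), then
# solve the much smaller r x d sub-problem with a traditional method.
r = 500  # sketch size: a small multiple of d
S = rng.standard_normal((r, n)) / np.sqrt(r)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

# Compare against the exact solution of the full problem.
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
rel_err = np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact)

# --- Use 2: sketch-to-precondition ---
# A QR factorization of the sketch S @ A yields R such that A @ inv(R)
# is well conditioned, so iterative solvers on the full problem
# (e.g., LSQR) converge in few iterations.
R = np.linalg.qr(S @ A, mode="r")
precond_cond = np.linalg.cond(A @ np.linalg.inv(R))
```

With a Gaussian embedding of this size, the sketched solution is close to the exact one and the preconditioned matrix `A @ inv(R)` has a condition number near 1, which is the trade-off the abstract refers to: a more expensive embedding buys a better-conditioned (or more accurate) sub-problem.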


