Distributed-Memory Randomized Algorithms for Sparse Tensor CP Decomposition

10/11/2022
by Vivek Bharadwaj, et al.

Low-rank CANDECOMP/PARAFAC (CP) decomposition is a powerful tool for the analysis of sparse tensors, which can represent diverse datasets involving discrete-valued variables. Given a sparse tensor, producing a low-rank CP decomposition is computation- and memory-intensive, typically involving several large, structured linear least-squares problems. Several recent works have provided randomized sketching methods that reduce the cost of these least-squares problems, along with shared-memory prototypes of their algorithms. Unfortunately, these prototypes are slow compared to optimized non-randomized tensor decomposition, and they do not scale to tensors that exceed the memory capacity of a single shared-memory device. We extend randomized algorithms for CP decomposition to the distributed-memory setting and provide high-performance implementations competitive with state-of-the-art non-randomized libraries. These algorithms sample from a distribution of statistical leverage scores to reduce the cost of the repeated least-squares solves required in the tensor decomposition. We show how to efficiently sample from an approximate leverage-score distribution of the left-hand side of each linear system when the CP factor matrices are distributed by block rows among processors. In contrast to earlier works that only communicate dense factor matrices in a Cartesian topology between processors, we use sampling to avoid expensive reduce-scatter collectives by communicating selected nonzeros from the sparse tensor and a small subset of factor matrix rows. On the CPU partition of the NERSC Cray EX supercomputer Perlmutter, our high-performance implementations require just seconds to compute low-rank approximations of real-world sparse tensors with billions of nonzeros.
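The core primitive the abstract describes is leverage-score sampling for an overdetermined least-squares solve: instead of solving with all rows of the left-hand side, sample a small number of rows with probability proportional to their statistical leverage, rescale, and solve the much smaller sketched system. The sketch below illustrates that idea on a dense matrix using NumPy; it is a minimal, generic illustration (random data, exact leverage scores via QR), not the paper's distributed-memory algorithm, which samples from an approximate leverage distribution of a structured Khatri-Rao left-hand side.

```python
# Minimal sketch of leverage-score sampling for least squares, assuming a
# generic tall dense system A x ~= b. Sizes and names are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 10_000, 16, 500           # tall system; keep only s sampled rows

A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Exact leverage scores: squared row norms of an orthonormal basis Q of A.
# (The paper instead samples from an *approximate* distribution, since the
# left-hand side is a Khatri-Rao product too large to orthogonalize.)
Q, _ = np.linalg.qr(A)
lev = np.einsum('ij,ij->i', Q, Q)   # l_i = ||Q_i||^2; these sum to n
p = lev / lev.sum()

# Sample s rows i.i.d. with probability p_i, rescale by 1/sqrt(s * p_i)
# so the sketched normal equations are unbiased.
idx = rng.choice(m, size=s, p=p)
w = 1.0 / np.sqrt(s * p[idx])
x_sketch, *_ = np.linalg.lstsq(A[idx] * w[:, None], b[idx] * w, rcond=None)

# Compare against the full solve: the sketched residual is near-optimal
# with high probability, at a fraction of the cost.
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
r_sketch = np.linalg.norm(A @ x_sketch - b)
r_exact = np.linalg.norm(A @ x_exact - b)
print(r_sketch / r_exact)
```

In a CP decomposition, each alternating-least-squares step solves such a system whose left-hand side is a Khatri-Rao product of the other factor matrices; sampling rows by (approximate) leverage is what lets the paper shrink both the computation and the communication per solve.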


