On Optimizing Distributed Tucker Decomposition for Sparse Tensors

The Tucker decomposition generalizes the notion of Singular Value Decomposition (SVD) to tensors, the higher-dimensional analogues of matrices. We study the problem of constructing the Tucker decomposition of sparse tensors on distributed-memory systems via HOOI (Higher-Order Orthogonal Iteration), a popular iterative method. The scheme used for distributing the input tensor among the processors (MPI ranks) critically influences the HOOI execution time. Prior work has proposed different distribution schemes: an offline scheme based on a sophisticated hypergraph partitioning method, and simple, lightweight alternatives that can be used in real time. While the hypergraph-based scheme typically yields faster HOOI execution, its complexity makes the time taken to determine the distribution an order of magnitude higher than the execution time of a single HOOI iteration. Our main contribution is a lightweight distribution scheme that achieves the best of both worlds. We show that the scheme is near-optimal on certain fundamental metrics associated with the HOOI procedure and, as a result, near-optimal on the computational load (FLOPs). Though the scheme may incur higher communication volume, computation time is the dominant factor, so the scheme achieves better overall HOOI execution time. Our experimental evaluation on large real-life tensors (having up to 4 billion elements) shows that the scheme outperforms the prior schemes on HOOI execution time by a factor of up to 3x. At the same time, its distribution time is comparable to that of the prior lightweight schemes and is typically less than the execution time of a single HOOI iteration.
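To make the distribution trade-off concrete, here is a minimal sketch, assuming the sparse tensor is given only as a list of nonzero coordinates (values omitted). It implements a hypothetical lightweight coarse-grained scheme that gives each rank a contiguous, nonzero-balanced block of mode-0 slices; it is not the scheme proposed in the paper. It then reports two per-rank proxies commonly used when reasoning about distributed sparse HOOI: the number of nonzeros a rank owns (roughly its tensor-times-matrix work) and, per mode, the number of distinct indices it touches (roughly the factor-matrix rows it needs and the partial-result rows it must exchange). The function names `slice_block_partition` and `partition_metrics` are illustrative.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple

Coord = Tuple[int, ...]


def slice_block_partition(nonzeros: Sequence[Coord], num_ranks: int, mode: int = 0) -> List[int]:
    """Hypothetical coarse-grained scheme: give each rank a contiguous block of
    mode-`mode` slices, closing a block once the running nonzero count reaches
    that rank's cumulative share. Returns the owning rank of each nonzero."""
    if num_ranks < 1:
        raise ValueError("num_ranks must be positive")
    slice_nnz = defaultdict(int)
    for c in nonzeros:
        slice_nnz[c[mode]] += 1
    target = len(nonzeros) / num_ranks  # ideal number of nonzeros per rank
    owner = {}
    rank, running = 0, 0
    for idx in sorted(slice_nnz):
        owner[idx] = rank
        running += slice_nnz[idx]
        # Advance to the next rank once this rank's cumulative share is filled.
        if rank < num_ranks - 1 and running >= target * (rank + 1):
            rank += 1
    return [owner[c[mode]] for c in nonzeros]


def partition_metrics(nonzeros: Sequence[Coord], assignment: Sequence[int], num_ranks: int):
    """Per-rank nonzero counts and, for every mode, the number of distinct
    indices each rank touches."""
    order = len(nonzeros[0]) if nonzeros else 0
    nnz = [0] * num_ranks
    touched = [[set() for _ in range(order)] for _ in range(num_ranks)]
    for c, r in zip(nonzeros, assignment):
        nnz[r] += 1
        for m, i in enumerate(c):
            touched[r][m].add(i)
    distinct = [[len(s) for s in per_mode] for per_mode in touched]
    return nnz, distinct


if __name__ == "__main__":
    # Toy 4 x 4 x 4 sparse tensor with 8 nonzeros (coordinates only).
    nz = [(0, 0, 0), (0, 1, 2), (1, 1, 1), (1, 3, 0),
          (2, 0, 3), (2, 2, 2), (3, 1, 0), (3, 3, 3)]
    assign = slice_block_partition(nz, num_ranks=2)
    nnz, distinct = partition_metrics(nz, assign, num_ranks=2)
    print(assign)     # [0, 0, 0, 0, 1, 1, 1, 1]
    print(nnz)        # [4, 4]
    print(distinct)   # [[2, 3, 3], [2, 4, 3]]
    imbalance = max(nnz) / (sum(nnz) / len(nnz))
    print(imbalance)  # 1.0
```

On this toy input both ranks own four nonzeros, yet rank 1 touches four distinct mode-1 indices versus three for rank 0. This illustrates how a simple scheme can balance one metric (here, nonzeros per rank along the partitioned mode) while leaving the others uneven.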

research
09/17/2023

Dynasor: A Dynamic Memory Layout for Accelerating Sparse MTTKRP for Tensor Decomposition on Multi-core CPU

Sparse Matricized Tensor Times Khatri-Rao Product (spMTTKRP) is the most...
research
12/14/2020

A Krylov-Schur like method for computing the best rank-(r_1,r_2,r_3) approximation of large and sparse tensors

The paper is concerned with methods for computing the best low multiline...
research
06/26/2018

Improving tasks throughput on accelerators using OpenCL command concurrency

A heterogeneous architecture composed by a host and an accelerator must ...
research
07/17/2020

Provable Near-Optimal Low-Multilinear-Rank Tensor Recovery

We consider the problem of recovering a low-multilinear-rank tensor from...
research
09/09/2022

Machine Learning-based Selection of Graph Partitioning Strategy Using the Characteristics of Graph Data and Algorithm

Analyzing large graph data is an essential part of many modern applicati...
research
07/30/2020

New approach to MPI program execution time prediction

The problem of MPI programs execution time prediction on a certain set o...
research
11/28/2018

A Study of the Complexity and Accuracy of Direction of Arrival Estimation Methods Based on GCC-PHAT for a Pair of Close Microphones

This paper investigates the accuracy of various Generalized Cross-Correl...
