Communication Lower Bounds and Optimal Algorithms for Multiple Tensor-Times-Matrix Computation

07/21/2022
by Hussam Al Daas, et al.

Multiple Tensor-Times-Matrix (Multi-TTM) is a key computation in algorithms for computing and operating with the Tucker tensor decomposition, which is frequently used in multidimensional data analysis. We establish communication lower bounds that determine how much data movement is required to perform the Multi-TTM computation in parallel. The crux of the proof relies on analytically solving a constrained, nonlinear optimization problem. We also present a parallel algorithm to perform this computation that organizes the processors into a logical grid with twice as many modes as the input tensor. We show that with correct choices of grid dimensions, the communication cost of the algorithm attains the lower bounds and is therefore communication optimal. Finally, we show that our algorithm can significantly reduce communication compared to the straightforward approach of expressing the computation as a sequence of tensor-times-matrix operations.
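
To make the comparison in the abstract concrete, the following is a minimal sequential sketch of the Multi-TTM operation for a small 3-way tensor. It computes the result once as a sequence of single tensor-times-matrix (TTM) products and once as a single all-at-once contraction, then checks that the two agree. The function names, the tensor sizes, and the convention that each matrix has shape (output size, input size) are assumptions made for illustration; the paper may use a transposed convention, and nothing here models the parallel processor grid or the communication costs the paper analyzes.

```python
import numpy as np


def ttm(tensor, matrix, mode):
    """Single TTM: contract axis `mode` of `tensor` with the columns of `matrix`.

    Assumed convention: `matrix` has shape (r, n) with n == tensor.shape[mode],
    so the result has size r along axis `mode`.
    """
    # tensordot places the new axis of size r first; move it back to `mode`.
    out = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(out, 0, mode)


def multi_ttm_sequence(tensor, matrices):
    """Multi-TTM expressed as a sequence of TTMs, one per mode."""
    result = tensor
    for mode, matrix in enumerate(matrices):
        result = ttm(result, matrix, mode)
    return result


def multi_ttm_3way(tensor, a, b, c):
    """Multi-TTM for a 3-way tensor written as one all-at-once contraction."""
    return np.einsum("ijk,pi,qj,rk->pqr", tensor, a, b, c)


# Hypothetical sizes: a 4 x 5 x 6 input tensor reduced to 2 x 3 x 2.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 6))
mats = [
    rng.standard_normal((2, 4)),
    rng.standard_normal((3, 5)),
    rng.standard_normal((2, 6)),
]

y_seq = multi_ttm_sequence(x, mats)
y_all = multi_ttm_3way(x, *mats)
```

Both routes produce the same tensor in exact arithmetic; they differ in which intermediate tensors are formed, which is what matters for data movement in a parallel setting.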

research
07/14/2017

Communication Lower Bounds of Bilinear Algorithms for Symmetric Tensor Contractions

Accurate numerical calculations of electronic structure are often domina...
research
09/29/2020

Communication Lower-Bounds for Distributed-Memory Computations for Mass Spectrometry based Omics Data

Mass spectrometry based omics data analysis require significant time and...
research
05/10/2022

The spatial computer: A model for energy-efficient parallel computation

We present a new parallel model of computation suitable for spatial arch...
research
11/15/2019

Automated Derivation of Parametric Data Movement Lower Bounds for Affine Programs

For most relevant computation, the energy and time needed for data movem...
research
02/19/2018

Communication-Optimal Convolutional Neural Nets

Efficiently executing convolutional neural nets (CNNs) is important in m...
research
08/20/2021

On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations

Matrix factorizations are among the most important building blocks of sc...
research
05/26/2022

Cost-efficient Gaussian Tensor Network Embeddings for Tensor-structured Inputs

This work discusses tensor network embeddings, which are random matrices...
