Strassen's Algorithm for Tensor Contraction

04/11/2017
by Jianyu Huang, et al.

Tensor contraction (TC) is an important computational kernel widely used in numerous applications. It is a multi-dimensional generalization of matrix multiplication (GEMM). While Strassen's algorithm for GEMM is well studied in theory and practice, extending it to accelerate TC has not been previously pursued. We therefore believe this to be the first paper to demonstrate how Strassen's algorithm can speed up tensor contraction in practice. By adopting a Block-Scatter-Matrix format, a novel matrix-centric tensor layout, we can conceptually view TC as GEMM for a general-stride storage, with an implicit tensor-to-matrix transformation. This insight enables us to tailor a recent state-of-the-art implementation of Strassen's algorithm to TC, avoiding explicit transpositions (permutations) and extra workspace, and reducing the overhead of memory movement. Performance benefits are demonstrated with a performance model as well as in practice on modern single-core, multicore, and distributed-memory parallel architectures, achieving up to 1.3x speedup. The resulting implementations can serve as a drop-in replacement in various applications, with significant speedup.
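To make the idea of viewing TC as GEMM over general-stride storage concrete, the following is a minimal sketch in plain Python. It is not the paper's implementation and does not reproduce the Block-Scatter-Matrix format: it uses simple per-row and per-column offset lists ("scatter vectors") as a stand-in, and applies one level of the standard Strassen recurrence to the resulting matrix views. The example contraction C[c,a,b] = sum_k A[a,k,b] * B[k,c], the names `scatter`, `MatView`, `fused_mult`, `strassen_one_level`, `contract_strassen`, and `contract_naive`, and the requirement that all matrix dimensions be even are assumptions made for illustration.

```python
from itertools import product

def scatter(dims, strides):
    """Memory offset of each combined index built from the given tensor modes."""
    return [sum(i * s for i, s in zip(idx, strides))
            for idx in product(*(range(d) for d in dims))]

class MatView:
    """Matrix view of a flat buffer: element (i, j) lives at rows[i] + cols[j]."""
    def __init__(self, buf, rows, cols):
        self.buf, self.rows, self.cols = buf, rows, cols

    def get(self, i, j):
        return self.buf[self.rows[i] + self.cols[j]]

    def add(self, i, j, v):
        self.buf[self.rows[i] + self.cols[j]] += v

    def block(self, r, c):
        # Quadrant (r, c) of the view; both dimensions are assumed even.
        hm, hn = len(self.rows) // 2, len(self.cols) // 2
        return MatView(self.buf, self.rows[r * hm:(r + 1) * hm],
                       self.cols[c * hn:(c + 1) * hn])

def fused_mult(a_terms, b_terms, c_terms):
    """For each (e, Cblk): Cblk += e * (sum g*Ablk) @ (sum d*Bblk).

    Operand sums are formed on the fly and the product is accumulated
    directly into the output blocks, so no temporary matrices are stored.
    """
    a0, b0 = a_terms[0][1], b_terms[0][1]
    m, k, n = len(a0.rows), len(a0.cols), len(b0.cols)
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(k):
                av = sum(g * a.get(i, p) for g, a in a_terms)
                bv = sum(d * b.get(p, j) for d, b in b_terms)
                acc += av * bv
            for e, c in c_terms:
                c.add(i, j, e * acc)

def strassen_one_level(A, B, C):
    """C += A @ B using the seven block products of Strassen's algorithm."""
    a = [[A.block(r, c) for c in (0, 1)] for r in (0, 1)]
    b = [[B.block(r, c) for c in (0, 1)] for r in (0, 1)]
    c = [[C.block(r, s) for s in (0, 1)] for r in (0, 1)]
    steps = [
        ([(1, a[0][0]), (1, a[1][1])], [(1, b[0][0]), (1, b[1][1])],
         [(1, c[0][0]), (1, c[1][1])]),
        ([(1, a[1][0]), (1, a[1][1])], [(1, b[0][0])],
         [(1, c[1][0]), (-1, c[1][1])]),
        ([(1, a[0][0])], [(1, b[0][1]), (-1, b[1][1])],
         [(1, c[0][1]), (1, c[1][1])]),
        ([(1, a[1][1])], [(1, b[1][0]), (-1, b[0][0])],
         [(1, c[0][0]), (1, c[1][0])]),
        ([(1, a[0][0]), (1, a[0][1])], [(1, b[1][1])],
         [(1, c[0][1]), (-1, c[0][0])]),
        ([(1, a[1][0]), (-1, a[0][0])], [(1, b[0][0]), (1, b[0][1])],
         [(1, c[1][1])]),
        ([(1, a[0][1]), (-1, a[1][1])], [(1, b[1][0]), (1, b[1][1])],
         [(1, c[0][0])]),
    ]
    for a_terms, b_terms, c_terms in steps:
        fused_mult(a_terms, b_terms, c_terms)

def contract_strassen(A, B, na, nb, nk, nc):
    """C[c,a,b] = sum_k A[a,k,b] * B[k,c]; all tensors are flat, row-major lists."""
    C = [0] * (nc * na * nb)
    # Rows of the matrix views combine modes (a, b); no data is permuted or copied.
    Am = MatView(A, scatter((na, nb), (nk * nb, 1)), scatter((nk,), (nb,)))
    Bm = MatView(B, scatter((nk,), (nc,)), scatter((nc,), (1,)))
    Cm = MatView(C, scatter((na, nb), (nb, 1)), scatter((nc,), (na * nb,)))
    strassen_one_level(Am, Bm, Cm)
    return C

def contract_naive(A, B, na, nb, nk, nc):
    """Reference contraction with explicit loops over every index."""
    C = [0] * (nc * na * nb)
    for a, b, c, k in product(range(na), range(nb), range(nc), range(nk)):
        C[c * na * nb + a * nb + b] += A[a * nk * nb + k * nb + b] * B[k * nc + c]
    return C
```

Calling `contract_strassen(list(range(16)), list(range(1, 9)), 2, 2, 4, 2)` gives the same result as `contract_naive` with the same arguments, while performing seven half-size block products instead of eight. The tensors are never transposed into matrix form; the offset lists alone define the matrix view, which is the property the abstract attributes to the implicit tensor-to-matrix transformation.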


