Design of a high-performance GEMM-like Tensor-Tensor Multiplication

07/01/2016
by Paul Springer, et al.

We present "GEMM-like Tensor-Tensor multiplication" (GETT), a novel approach to tensor contractions that mirrors the design of a high-performance general matrix-matrix multiplication (GEMM). The critical insight behind GETT is the identification of three index sets, involved in the tensor contraction, which enable us to systematically reduce an arbitrary tensor contraction to loops around a highly tuned "macro-kernel". This macro-kernel operates on suitably prepared ("packed") sub-tensors that reside in a specified level of the cache hierarchy. In contrast to previous approaches to tensor contractions, GETT exhibits desirable features such as unit-stride memory accesses, cache-awareness, as well as full vectorization, without requiring auxiliary memory. To compare our technique with other modern approaches, we integrate GETT alongside the so-called Transpose-Transpose-GEMM-Transpose and Loops-over-GEMM approaches into an open source "Tensor Contraction Code Generator" (TCCG). The performance results for a wide range of tensor contractions suggest that GETT has the potential to become the method of choice: While GETT exhibits excellent performance across the board, its effectiveness for bandwidth-bound tensor contractions is especially impressive, outperforming existing approaches by up to 12.4×. More precisely, GETT achieves speedups of up to 1.41× over an equivalent-sized GEMM for bandwidth-bound tensor contractions while attaining up to 91.3% of peak floating-point performance for compute-bound tensor contractions.
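The core idea of identifying three index sets can be illustrated with a small sketch. The free indices shared by A and C form one set, the free indices shared by B and C form a second, and the contracted indices form a third; grouping each set collapses an arbitrary contraction into a single matrix-matrix product. The sketch below (indices and sizes are hypothetical, chosen for illustration) realizes this grouping with explicit transposes and reshapes, which corresponds to the Transpose-Transpose-GEMM-Transpose strategy; GETT itself instead fuses this reordering into the packing step of the GEMM, avoiding the auxiliary memory used here.

```python
import numpy as np

# Example contraction: C[a, b, i, j] = sum_k A[i, k, a] * B[k, j, b]
# GETT-style index sets:
#   I_m = {a, i}  (free indices shared by A and C)
#   I_n = {b, j}  (free indices shared by B and C)
#   I_k = {k}     (contracted indices)

rng = np.random.default_rng(0)
a_, b_, i_, j_, k_ = 4, 5, 6, 7, 8
A = rng.standard_normal((i_, k_, a_))
B = rng.standard_normal((k_, j_, b_))

# Reference result via einsum.
C_ref = np.einsum('ika,kjb->abij', A, B)

# Group each index set into one mode so the contraction becomes a GEMM:
A_mat = A.transpose(2, 0, 1).reshape(a_ * i_, k_)  # rows = I_m, cols = I_k
B_mat = B.transpose(0, 2, 1).reshape(k_, b_ * j_)  # rows = I_k, cols = I_n
C_mat = A_mat @ B_mat                              # plain GEMM

# Unfold the grouped indices back into the output layout of C.
C = C_mat.reshape(a_, i_, b_, j_).transpose(0, 2, 1, 3)

assert np.allclose(C, C_ref)
```

Because every tensor contraction admits such a grouping, a tuned GEMM macro-kernel can serve as the computational core; the difference between the approaches lies in where and when the index reordering happens.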

