SparseLNR: Accelerating Sparse Tensor Computations Using Loop Nest Restructuring

by Adhitha Dias, et al.

Sparse tensor algebra computations have become important in many real-world applications such as machine learning, scientific simulations, and data mining. Hence, automated code generation and performance optimization for tensor algebra kernels are paramount. Recent advancements such as the Tensor Algebra Compiler (TACO) greatly generalize and automate code generation for tensor algebra expressions. However, the code generated by TACO for many important tensor computations remains suboptimal because TACO lacks a scheduling directive to support transformations such as distribution/fusion. This paper extends TACO's scheduling space to support kernel distribution/loop fusion in order to reduce the asymptotic time complexity and improve the locality of complex tensor algebra computations. We develop an intermediate representation (IR) for tensor operations, called the branched iteration graph, that specifies how a computation is broken down into smaller ones (kernel distribution) and how the outermost dimensions of the resulting loop nests are then fused (loop fusion) while the innermost dimensions remain distributed, to increase data locality. We describe the exchange of intermediate results between iteration spaces, the transformation in the IR, and its programmatic invocation. Finally, we show that the transformation can be used to optimize sparse tensor kernels. Our results show that this new transformation significantly improves the performance of several real-world tensor algebra computations compared to TACO-generated code.
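To make the idea of distributing a kernel and fusing only the outermost loop concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not TACO-generated code and not the paper's implementation: the expression A(i,l) = sum over j,k of B(i,j) * C(j,k) * D(k,l), the CSR layout for the sparse operand B, and the function names `single_nest` and `restructured` are all chosen here for the example.

```python
def single_nest(B_pos, B_crd, B_val, C, D):
    """One loop nest over i, the nonzeros of row i (j), k, and l.

    B is sparse in CSR form (B_pos, B_crd, B_val); C and D are dense
    lists of rows. Work is proportional to nnz(B) * K * L.
    """
    n_i, n_k, n_l = len(B_pos) - 1, len(D), len(D[0])
    A = [[0.0] * n_l for _ in range(n_i)]
    for i in range(n_i):
        for p in range(B_pos[i], B_pos[i + 1]):
            j = B_crd[p]
            for k in range(n_k):
                for l in range(n_l):
                    A[i][l] += B_val[p] * C[j][k] * D[k][l]
    return A


def restructured(B_pos, B_crd, B_val, C, D):
    """The same computation split into two smaller kernels.

    The outer i loop is shared (fused); the inner loops are distributed
    into two separate nests that communicate through a small per-row
    temporary t. Work is proportional to nnz(B) * K + I * K * L.
    """
    n_i, n_k, n_l = len(B_pos) - 1, len(D), len(D[0])
    A = [[0.0] * n_l for _ in range(n_i)]
    for i in range(n_i):
        # First inner nest: t(k) = sum_j B(i,j) * C(j,k)
        t = [0.0] * n_k
        for p in range(B_pos[i], B_pos[i + 1]):
            j = B_crd[p]
            for k in range(n_k):
                t[k] += B_val[p] * C[j][k]
        # Second inner nest: A(i,l) = sum_k t(k) * D(k,l)
        for k in range(n_k):
            for l in range(n_l):
                A[i][l] += t[k] * D[k][l]
    return A


# Small example: B = [[1, 0, 2], [0, 0, 3]] stored in CSR form.
B_pos, B_crd, B_val = [0, 2, 3], [0, 2, 2], [1.0, 2.0, 3.0]
C = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
D = [[1.0, 0.0], [2.0, 1.0]]
result = restructured(B_pos, B_crd, B_val, C, D)
```

Because the temporary holds only one row of the intermediate product, it stays small and is consumed right after it is produced, which is the locality benefit the abstract refers to; the two variants return the same values on this input.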
