SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning

07/11/2022
by Zihao Ye, et al.

Sparse tensors are rapidly becoming critical components of modern deep learning workloads. However, developing high-performance sparse operators can be difficult and tedious, and existing vendor libraries cannot satisfy the escalating demands of new operators. Sparse tensor compilers simplify operator development, but efficient sparse compilation for deep learning remains challenging because a single sparse format cannot maximize hardware efficiency, and single-shot compilers cannot keep up with the latest hardware and system advances. We show that the key to addressing both challenges is two forms of composability. In this paper, we propose SparseTIR, a sparse tensor compilation abstraction that offers composable formats and composable transformations for deep learning workloads. SparseTIR constructs a search space over these composable components for performance tuning. With these improvements, SparseTIR obtains consistent performance speedups over vendor libraries on GPUs for single operators: 1.1-3.3x for GNN operators, 1.1-3.3x for sparse attention operators, and 0.6-2.2x for sparse convolution operators. SparseTIR also accelerates end-to-end GNNs by 1.1-2.2x for GraphSAGE training and 4.2-16.8x for RGCN inference.
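To make the composable-formats idea concrete, the sketch below decomposes a sparse matrix into a fixed-width ELL bucket plus a COO residual and computes SpMM piece by piece. This is a plain NumPy illustration of the general hybrid-format technique described in the abstract, not the SparseTIR API; the function names (split_ell_coo, hybrid_spmm) and the ell_width parameter are hypothetical.

    # Hybrid-format SpMM sketch (illustrative only, not SparseTIR code).
    import numpy as np

    def split_ell_coo(A, ell_width):
        """Partition a dense-with-zeros matrix A into an ELL bucket of fixed
        width plus a COO residual holding the overflow non-zeros."""
        n_rows, _ = A.shape
        ell_cols = np.zeros((n_rows, ell_width), dtype=np.int64)
        ell_vals = np.zeros((n_rows, ell_width), dtype=A.dtype)
        coo = []  # (row, col, val) triples that do not fit in the ELL width
        for i in range(n_rows):
            nz = np.flatnonzero(A[i])
            keep, rest = nz[:ell_width], nz[ell_width:]
            ell_cols[i, :len(keep)] = keep
            ell_vals[i, :len(keep)] = A[i, keep]
            coo.extend((i, j, A[i, j]) for j in rest)
        return (ell_cols, ell_vals), coo

    def hybrid_spmm(ell, coo, B):
        """Compute Y = A @ B by summing the contribution of each format piece."""
        ell_cols, ell_vals = ell
        Y = np.einsum("rk,rkc->rc", ell_vals, B[ell_cols])  # regular ELL part
        for i, j, v in coo:                                  # irregular COO residual
            Y[i] += v * B[j]
        return Y

    rng = np.random.default_rng(0)
    A = (rng.random((8, 8)) < 0.3) * rng.random((8, 8))
    B = rng.random((8, 4))
    ell, coo = split_ell_coo(A, ell_width=2)
    assert np.allclose(hybrid_spmm(ell, coo, B), A @ B)

The regular ELL bucket maps well to vectorized hardware, while the small COO residual absorbs the irregular rows; choosing how to split (e.g., the ELL width) is the kind of decision SparseTIR exposes as a tunable part of its search space.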


Related research

04/12/2021 · Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning Workloads
05/21/2018 · Learning to Optimize Tensor Programs
01/20/2021 · SparseDNN: Fast Sparse Deep Learning Inference on CPUs
04/15/2023 · STen: Productive and Efficient Sparsity in PyTorch
04/25/2023 · Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures
02/13/2018 · Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions
06/10/2020 · OpEvo: An Evolutionary Method for Tensor Operator Optimization
