Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning Workloads

04/12/2021
by Evangelos Georganas, et al.

During the past decade, novel Deep Learning (DL) algorithms, workloads and hardware have been developed to tackle a wide range of problems. Despite the advances in workload and hardware ecosystems, the programming methodology of DL systems is stagnant. DL workloads leverage either highly-optimized, yet platform-specific and inflexible kernels from DL libraries, or, in the case of novel operators, reference implementations built via DL framework primitives with underwhelming performance. This work introduces the Tensor Processing Primitives (TPP), a programming abstraction striving for efficient, portable implementation of DL workloads with high productivity. TPPs define a compact, yet versatile set of 2D-tensor operators (or a virtual Tensor ISA), which subsequently can be utilized as building blocks to construct complex operators on high-dimensional tensors. The TPP specification is platform-agnostic, thus code expressed via TPPs is portable, whereas the TPP implementation is highly optimized and platform-specific. We demonstrate the efficacy of our approach using standalone kernels and end-to-end DL workloads expressed entirely via TPPs that outperform state-of-the-art implementations on multiple platforms.
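To make the abstraction concrete, below is a minimal C sketch of the core idea: a handful of small 2D-tensor primitives (a block GEMM, a binary broadcast-add, a unary ReLU) composed into a fused fully-connected layer over blocked, higher-dimensional tensors. All function names, block sizes, and layouts here are hypothetical illustrations of the concept, not the paper's actual TPP API.

#include <stdio.h>
#include <string.h>

/* Illustrative block sizes; a real TPP backend would specialize these
   per platform rather than fixing them at compile time. */
#define BM 4  /* block rows    */
#define BN 4  /* block cols    */
#define BK 4  /* block reduce  */
#define MB 2  /* # row blocks    */
#define NB 2  /* # column blocks */
#define KB 2  /* # reduce blocks */

/* GEMM primitive (sketch): C_blk += A_blk * B_blk on BMxBK / BKxBN blocks. */
static void tpp_gemm(const float *A, const float *B, float *C) {
  for (int i = 0; i < BM; ++i)
    for (int j = 0; j < BN; ++j) {
      float acc = C[i * BN + j];
      for (int k = 0; k < BK; ++k)
        acc += A[i * BK + k] * B[k * BN + j];
      C[i * BN + j] = acc;
    }
}

/* Binary primitive (sketch): broadcast-add a bias row into a BMxBN block. */
static void tpp_binary_add(const float *bias, float *C) {
  for (int i = 0; i < BM; ++i)
    for (int j = 0; j < BN; ++j)
      C[i * BN + j] += bias[j];
}

/* Unary primitive (sketch): elementwise ReLU over a BMxBN block. */
static void tpp_unary_relu(float *C) {
  for (int i = 0; i < BM * BN; ++i)
    C[i] = (C[i] > 0.0f) ? C[i] : 0.0f;
}

/* A fused fully-connected layer, out = relu(in * W + bias), built entirely
   from the three 2D primitives above over block-layout tensors:
   in[MB][KB][BM][BK], W[KB][NB][BK][BN], out[MB][NB][BM][BN]. */
static void fc_relu(const float *in, const float *W,
                    const float *bias, float *out) {
  memset(out, 0, sizeof(float) * MB * NB * BM * BN);
  for (int mb = 0; mb < MB; ++mb)
    for (int nb = 0; nb < NB; ++nb) {
      float *c = out + ((mb * NB + nb) * BM * BN);
      for (int kb = 0; kb < KB; ++kb)           /* reduction over K blocks */
        tpp_gemm(in + ((mb * KB + kb) * BM * BK),
                 W  + ((kb * NB + nb) * BK * BN), c);
      tpp_binary_add(bias + nb * BN, c);        /* fused bias add */
      tpp_unary_relu(c);                        /* fused activation */
    }
}

int main(void) {
  float in[MB * KB * BM * BK], W[KB * NB * BK * BN];
  float bias[NB * BN], out[MB * NB * BM * BN];
  for (int i = 0; i < MB * KB * BM * BK; ++i) in[i] = (i % 7) - 3.0f;
  for (int i = 0; i < KB * NB * BK * BN; ++i) W[i]  = (i % 5) - 2.0f;
  for (int i = 0; i < NB * BN; ++i) bias[i] = 0.5f;
  fc_relu(in, W, bias, out);
  printf("out[0] = %f\n", out[0]);
  return 0;
}

The scalar loops inside each primitive stand in for what a real TPP implementation would replace with highly optimized, platform-specific kernels, while the composing loop nest stays unchanged; that split is what the abstract means by a platform-agnostic specification with a platform-specific implementation.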


Related Research

04/25/2023 · Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures
During the past decade, Deep Learning (DL) algorithms, programming syste...

07/11/2022 · SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning
Sparse tensors are rapidly becoming critical components of modern deep l...

06/15/2019 · High-Performance Deep Learning via a Single Building Block
Deep learning (DL) is one of the most prominent branches of machine lear...

07/09/2022 · TensorIR: An Abstraction for Automatic Tensorized Program Optimization
Deploying deep learning models on various devices has become an importan...

11/18/2016 · GaDei: On Scale-up Training As A Service For Deep Learning
Deep learning (DL) training-as-a-service (TaaS) is an important emerging...

06/02/2020 · PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives
Deep Neural Networks (DNNs) have revolutionized many aspects of our live...

04/17/2023 · Effective implementation of the High Performance Conjugate Gradient benchmark on GraphBLAS
Applications in High-Performance Computing (HPC) environments face chall...
