A Learned Performance Model for Tensor Processing Units

08/03/2020
by Samuel J. Kaufman, et al.

Accurate hardware performance models are critical to efficient code generation. They can be used by compilers to make heuristic decisions, by superoptimizers as a minimization objective, or by autotuners to find an optimal configuration for a specific program. However, they are difficult to develop because contemporary processors are complex, and the recent proliferation of deep learning accelerators has increased the development burden. We demonstrate a method of learning performance models from a corpus of tensor computation graph programs for Tensor Processing Unit (TPU) instances. We show that our learned model outperforms a heavily-optimized analytical performance model on two tasks – tile-size selection and operator fusion – and that it helps an autotuner discover faster programs in a setting where access to TPUs is limited or expensive.

research
10/18/2022

Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs

As deep learning models nowadays are widely adopted by both cloud servic...
research
05/03/2017

cuTT: A High-Performance Tensor Transpose Library for CUDA Compatible GPUs

We introduce the CUDA Tensor Transpose (cuTT) library that implements hi...
research
01/27/2023

Matching Linear Algebra and Tensor Code to Specialized Hardware Accelerators

Dedicated tensor accelerators demonstrate the importance of linear algeb...
research
06/01/2019

A Technique for Finding Optimal Program Launch Parameters Targeting Manycore Accelerators

In this paper, we present a new technique to dynamically determine the v...
research
10/29/2022

Enabling Data Movement and Computation Pipelining in Deep Learning Compiler

Pipelining between data loading and computation is a critical tensor pro...
research
01/12/2018

Comprehensive Optimization of Parametric Kernels for Graphics Processing Units

This work deals with the optimization of computer programs targeting Gra...
research
04/29/2022

Analytical Performance Estimation during Code Generation on Modern GPUs

Automatic code generation is frequently used to create implementations o...
