ISA Mapper: A Compute and Hardware Agnostic Deep Learning Compiler

10/12/2018
by Matthew Sotoudeh, et al.

Domain-specific accelerators present new challenges and opportunities for code generation onto novel instruction sets, communication fabrics, and memory architectures. In this paper we introduce an intermediate representation (IR) that enables both deep learning computational kernels and hardware capabilities to be described in the same IR. We then formulate and apply instruction mapping to determine the possible ways a computation can be performed on a hardware system. Next, our scheduler chooses a specific mapping and determines the data movement and computation order. To manage the large search space of mappings and schedules, we developed a flexible framework that allows heuristics, cost models, and potentially machine learning to facilitate this search. With this system, we demonstrate the automated extraction of matrix multiplication kernels from recent deep learning kernels such as depthwise-separable convolution. In addition, we demonstrate two to five times better performance on DeepBench-sized GEMMs and GRU RNN execution compared to state-of-the-art (SOTA) implementations on new hardware, and up to 85% of SOTA performance on existing hardware.
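To make the mapping-and-scheduling idea concrete, here is a minimal, hypothetical sketch of the workflow the abstract describes: kernels and hardware capabilities live in one IR, candidate mappings are enumerated, and a cost model selects one. All names and structures (`IRNode`, `Capability`, the cost formula) are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of instruction mapping + cost-model-driven scheduling.
# None of these names come from the ISA Mapper codebase; they only
# illustrate the search problem the abstract outlines.

from dataclasses import dataclass

@dataclass(frozen=True)
class IRNode:
    op: str        # computation described in the IR, e.g. "matmul"
    shape: tuple   # iteration-space sizes, e.g. (M, N, K)

@dataclass(frozen=True)
class Capability:
    name: str      # hardware instruction, e.g. a tiled matrix unit
    op: str        # IR operation it can implement
    tile: tuple    # tile sizes it natively handles
    cost: float    # cost-model estimate per invocation

def enumerate_mappings(node, capabilities):
    """Instruction mapping: all capabilities that can implement this node."""
    return [c for c in capabilities if c.op == node.op]

def invocations(node, cap):
    """Tiled invocations needed to cover the node's iteration space."""
    count = 1
    for dim, t in zip(node.shape, cap.tile):
        count *= -(-dim // t)  # ceiling division
    return count

def schedule(node, capabilities):
    """Scheduling: pick the lowest total-cost mapping (a stand-in for the
    heuristic / cost-model search described in the abstract)."""
    options = enumerate_mappings(node, capabilities)
    return min(options, key=lambda c: c.cost * invocations(node, c))

# Toy hardware: a 16x16x16 matrix unit vs. a scalar fused multiply-add.
caps = [
    Capability("mma_16x16x16", "matmul", (16, 16, 16), cost=1.0),
    Capability("scalar_fma",   "matmul", (1, 1, 1),    cost=0.01),
]
gemm = IRNode("matmul", (64, 64, 64))
best = schedule(gemm, caps)  # matrix unit wins: 64 invocations vs. 262144
```

In a real system the cost model would account for data movement and memory hierarchy, and the search space would cover schedules (loop orders, tilings) as well as mappings; this sketch only shows why a framework for pruning that space is needed.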
