Compiling Halide Programs to Push-Memory Accelerators

05/26/2021
by Qiaoyi Liu, et al.
Stanford University

Image processing and machine learning applications benefit tremendously from hardware acceleration, but existing compilers target either FPGAs, which sacrifice power and performance for flexible hardware, or ASICs, which rapidly become obsolete as applications change. Programmable domain-specific accelerators have emerged as a promising middle ground between these two extremes, but such architectures have traditionally been difficult compiler targets. The main obstacle is that these accelerators often use a different memory abstraction than CPUs and GPUs: push memories, which send a data stream from one computation kernel to other kernels, possibly reordered. To address the compilation challenges caused by push memories, we propose that the representation of memory in the middle and backend of the compiler be altered to combine storage with address generation and control logic in a single structure: a unified buffer. We show that this compiler abstraction can be implemented efficiently on a programmable accelerator, and we design a memory mapping algorithm that combines polyhedral analysis and software vectorization techniques to target our accelerator. Our evaluation shows that the compiler supports programmability while maintaining high performance. It can compile a wide range of image processing and machine learning applications to our accelerator with 4.7x better runtime and 4.3x better energy efficiency than an FPGA.
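
As a rough illustration of what "storage combined with address generation and control logic" can mean, the following Python sketch models a toy push memory. It is not the paper's implementation: the names `AffineAddressGenerator` and `ToyUnifiedBuffer` are hypothetical, the address generators are assumed to be simple affine loop nests, and the control logic is reduced to "finish all writes, then push all reads", whereas real hardware would presumably overlap the two under a cycle-level schedule.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, Iterator, List, Sequence


@dataclass
class AffineAddressGenerator:
    """Hypothetical address generator: walks a loop nest and emits
    offset + sum(stride_i * index_i) for every iteration point."""
    extents: Sequence[int]  # loop bounds, outermost loop first
    strides: Sequence[int]  # address stride for each loop level
    offset: int = 0

    def addresses(self) -> Iterator[int]:
        for idx in product(*(range(e) for e in self.extents)):
            yield self.offset + sum(s * i for s, i in zip(self.strides, idx))


class ToyUnifiedBuffer:
    """Toy push memory: storage plus one write-side and one read-side
    address generator. The consumer never supplies an address; the
    buffer decides the order in which data is pushed out."""

    def __init__(self, capacity: int,
                 write_gen: AffineAddressGenerator,
                 read_gen: AffineAddressGenerator) -> None:
        self.storage: List[object] = [None] * capacity
        self.write_gen = write_gen
        self.read_gen = read_gen

    def run(self, stream: Iterable[object]) -> List[object]:
        # Simplified control: store the whole input stream first...
        for addr, value in zip(self.write_gen.addresses(), stream):
            self.storage[addr] = value
        # ...then push the output stream in the read generator's order.
        return [self.storage[addr] for addr in self.read_gen.addresses()]


# A 2x3 tile arrives in row-major order and leaves in column-major order.
row_major_write = AffineAddressGenerator(extents=(2, 3), strides=(3, 1))
column_major_read = AffineAddressGenerator(extents=(3, 2), strides=(1, 3))
buffer = ToyUnifiedBuffer(6, row_major_write, column_major_read)
reordered = buffer.run(range(6))
print(reordered)  # [0, 3, 1, 4, 2, 5]
```

The point of the sketch is only that the reordering lives inside the buffer's configuration (extents and strides) rather than in the kernels on either side, which is the property the abstract attributes to push memories.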
