Compiling Halide Programs to Push-Memory Accelerators

05/26/2021
by   Qiaoyi Liu, et al.
0

Image processing and machine learning applications benefit tremendously from hardware acceleration, but existing compilers target either FPGAs, which sacrifice power and performance for flexible hardware, or ASICs, which rapidly become obsolete as applications change. Programmable domain-specific accelerators have emerged as a promising middle-ground between these two extremes, but such architectures have traditionally been difficult compiler targets. The main obstacle is that these accelerators often use a different memory abstraction than CPUs and GPUs: push memories that send a data stream from one computation kernel to other kernels, possibly reordered. To address the compilation challenges caused by push memories, we propose that the representation of memory in the middle and backend of the compiler be altered to combine storage with address generation and control logic in a single structure – a unified buffer. We show that this compiler abstraction can be implemented efficiently on a programmable accelerator, and design a memory mapping algorithm that combines polyhedral analysis and software vectorization techniques to target our accelerator. Our evaluation shows that the compiler supports programmability while maintaining high performance. It can compile a wide range of image processing and machine learning applications to our accelerator with 4.7x better runtime and 4.3x better energy-efficiency as compared to an FPGA.

READ FULL TEXT

page 1

page 9

research
04/03/2021

Compiler Infrastructure for Specializing Domain-Specific Memory Templates

Specialized hardware accelerators are becoming important for more and mo...
research
03/06/2023

Domain-Specific Computational Storage for Serverless Computing

While (1) serverless computing is emerging as a popular form of cloud ex...
research
12/22/2022

A Domain-Extensible Compiler with Controllable Automation of Optimisations

In high performance domains like image processing, physics simulation or...
research
04/06/2023

ImaGen: A General Framework for Generating Memory- and Power-Efficient Image Processing Accelerators

Image processing algorithms are prime targets for hardware acceleration ...
research
09/15/2021

Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators

To meet the extreme compute demands for deep learning across commercial ...
research
01/03/2018

Instruction-Level Abstraction (ILA): A Uniform Specification for System-on-Chip (SoC) Verification

Modern Systems-on-Chip (SoC) designs are increasingly heterogeneous and ...
research
07/14/2020

SESAME: Software defined Enclaves to Secure Inference Accelerators with Multi-tenant Execution

Hardware-enclaves that target complex CPU designs compromise both securi...

Please sign up or login with your details

Forgot password? Click here to reset