Programming Bare-Metal Accelerators with Heterogeneous Threading Models: A Case Study of Matrix-3000

10/21/2022
by Jianbin Fang, et al.

As the hardware industry moves towards specialized heterogeneous many-core processors to mitigate the effects of the power wall, software developers are struggling with the complexity of these systems. This article shares our experience in developing a programming model, along with its supporting compiler and libraries, for Matrix-3000, which is designed for next-generation exascale supercomputers but has a complex memory hierarchy and processor organization. To support software development on Matrix-3000, we built a software stack from scratch that includes a low-level programming interface and a high-level OpenCL compiler. The low-level programming model offers native support for programming the bare-metal accelerators of Matrix-3000, while the high-level model lets programmers use the OpenCL programming standard. We detail our design choices and highlight the lessons learned from developing systems software that enables the programming of bare-metal accelerators. Our programming models have been deployed in the production environment of an exascale prototype system.
