HaoCL: Harnessing Large-scale Heterogeneous Processors Made Easy

05/18/2020 · by Yao Chen, et al.

The pervasive adoption of Deep Learning (DL) and Graph Processing (GP) makes it a de facto requirement to build large-scale clusters of heterogeneous accelerators including GPUs and FPGAs. The OpenCL programming framework can be used on the individual nodes of such clusters but is not intended for deployment in a distributed manner. Fortunately, the original OpenCL semantics naturally fit into the programming environment of heterogeneous clusters. In this paper, we propose a heterogeneity-aware OpenCL-like (HaoCL) programming framework to facilitate the programming of a wide range of scientific applications including DL and GP workloads on large-scale heterogeneous clusters. With HaoCL, existing applications can be directly deployed on heterogeneous clusters without any modifications to the original OpenCL source code and without awareness of the underlying hardware topologies and configurations. Our experiments show that HaoCL imposes a negligible overhead in a distributed environment, and provides near-linear speedups on standard benchmarks when computation or data size exceeds the capacity of a single node. The system design and the evaluations are presented in this demo paper.
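The claim that OpenCL's semantics fit a cluster naturally can be illustrated with a small conceptual sketch: an OpenCL host program already works against a flat list of devices, so a distributed runtime only has to present the devices of many nodes as one such list. Everything below (the node names, device labels, and the `aggregate_devices` helper) is hypothetical and for illustration only — HaoCL's actual interface is unmodified OpenCL, and this is not its API.

```python
# Conceptual sketch only; HaoCL's real interface is standard OpenCL.
# It illustrates how per-node device inventories could be flattened into
# the single device list an unmodified OpenCL host program expects, so
# the application never sees node boundaries. All names are hypothetical.

def aggregate_devices(cluster):
    """Flatten {node: [devices]} into one OpenCL-style device list."""
    unified = []
    for node, devices in sorted(cluster.items()):
        for dev in devices:
            # Tag each device with its origin so the runtime (not the
            # application) can route commands back to the right node.
            unified.append({"node": node, "device": dev})
    return unified

# A hypothetical 3-node cluster mixing GPUs and FPGAs.
cluster = {
    "node0": ["GPU0", "GPU1"],
    "node1": ["FPGA0"],
    "node2": ["GPU0", "FPGA0"],
}

devices = aggregate_devices(cluster)
# The host program sees five devices, much as if a single machine's
# clGetDeviceIDs call had returned all of them.
print(len(devices))  # 5
```

The point of the sketch is the transparency property stated in the abstract: the application iterates over one device list, while the mapping from a device back to its owning node stays inside the runtime.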


Related research

- Towards Performance Portable Programming for Distributed Heterogeneous Systems (10/03/2022): Hardware heterogeneity is here to stay for high-performance computing…
- Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads (08/20/2020): Specialized accelerators such as GPUs, TPUs, FPGAs, and custom ASICs hav…
- Runtime Support for Performance Portability on Heterogeneous Distributed Platforms (03/05/2023): Hardware heterogeneity is here to stay for high-performance computing…
- A PGAS Communication Library for Heterogeneous Clusters (04/26/2021): This work presents a heterogeneous communication library for clusters of…
- Heterogeneous Active Messages (HAM) – Implementing Lightweight Remote Procedure Calls in C++ (06/24/2019): We present HAM (Heterogeneous Active Messages), a C++-only active messag…
- FanStore: Enabling Efficient and Scalable I/O for Distributed Deep Learning (09/27/2018): Emerging Deep Learning (DL) applications introduce heavy I/O workloads o…
- High Performance I/O For Large Scale Deep Learning (01/07/2020): Training deep learning (DL) models on petascale datasets is essential fo…
