Concurrent CPU-GPU Task Programming using Modern C++

03/16/2022
by Tsung-Wei Huang, et al.

In this paper, we introduce Heteroflow, a new C++ library that helps developers quickly write parallel CPU-GPU programs using task dependency graphs. Heteroflow leverages modern C++ and task-based approaches to enable efficient implementations of heterogeneous decomposition strategies. Our new CPU-GPU programming model lets users express a problem in a way that supports effective separation of concerns and expertise encapsulation. Compared with existing libraries, Heteroflow is more cost-efficient in performance scaling, programming productivity, and solution generality. We have evaluated Heteroflow on two real applications in VLSI design automation and demonstrated its performance scalability across different numbers of CPUs and GPUs and different problem sizes. In one example of VLSI timing analysis with million-scale tasking, Heteroflow achieved a 7.7x runtime speed-up (99 vs. 13 minutes) over a baseline on a machine with 40 CPU cores and 4 GPUs.
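To make the idea of a task dependency graph concrete, the following is a minimal sketch in standard C++. It is not Heteroflow's API: the `TaskGraph` class, its `emplace`, `precede`, and `run` members, and the `run_demo` helper are all hypothetical names chosen for illustration. The "device" steps are plain CPU lambdas standing in for a host-to-device copy and a GPU kernel, and tasks run sequentially in a dependency-respecting order (Kahn's algorithm) rather than concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal task graph for illustration; NOT Heteroflow's API.
class TaskGraph {
 public:
  // Adds a named task and returns its id.
  std::size_t emplace(std::string name, std::function<void()> work) {
    nodes_.push_back(Node{std::move(name), std::move(work), {}, 0});
    return nodes_.size() - 1;
  }

  // Declares that task `before` must finish before task `after` starts.
  void precede(std::size_t before, std::size_t after) {
    assert(before != after);
    nodes_.at(before).successors.push_back(after);
    ++nodes_.at(after).num_predecessors;
  }

  // Runs every task once, sequentially, in a dependency-respecting order
  // (Kahn's algorithm). Returns the task names in execution order.
  std::vector<std::string> run() {
    std::vector<std::size_t> pending(nodes_.size());
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < nodes_.size(); ++i) {
      pending[i] = nodes_[i].num_predecessors;
      if (pending[i] == 0) ready.push(i);
    }
    std::vector<std::string> order;
    while (!ready.empty()) {
      const std::size_t id = ready.front();
      ready.pop();
      nodes_[id].work();
      order.push_back(nodes_[id].name);
      for (std::size_t next : nodes_[id].successors) {
        if (--pending[next] == 0) ready.push(next);
      }
    }
    if (order.size() != nodes_.size()) {
      throw std::logic_error("task graph contains a cycle");
    }
    return order;
  }

 private:
  struct Node {
    std::string name;
    std::function<void()> work;
    std::vector<std::size_t> successors;
    std::size_t num_predecessors;
  };
  std::vector<Node> nodes_;
};

struct DemoResult {
  std::vector<std::string> order;
  float sum;
};

// Builds a four-step pipeline: fill on the host, "copy" to a device buffer,
// run a "kernel", then reduce on the host. The device steps are simulated.
DemoResult run_demo() {
  std::vector<float> host(4, 1.0f);
  std::vector<float> device;  // stand-in for GPU memory
  float sum = 0.0f;

  TaskGraph graph;
  // Tasks are registered out of order on purpose; the edges fix the order.
  auto kernel = graph.emplace("kernel", [&] { for (float& x : device) x *= 3.0f; });
  auto reduce = graph.emplace("host_reduce", [&] { for (float x : device) sum += x; });
  auto load = graph.emplace("host_load", [&] { for (float& x : host) x = 2.0f; });
  auto copy = graph.emplace("copy_to_device", [&] { device = host; });

  graph.precede(load, copy);
  graph.precede(copy, kernel);
  graph.precede(kernel, reduce);

  std::vector<std::string> order = graph.run();
  return DemoResult{std::move(order), sum};
}
```

Calling `run_demo()` from a `main` function runs the four tasks in the order the edges dictate, regardless of the order in which they were registered. A real CPU-GPU runtime would additionally dispatch independent ready tasks to worker threads and GPU streams; that scheduling is outside the scope of this sketch.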


