Rocket: Efficient and Scalable All-Pairs Computations on Heterogeneous Platforms

09/10/2020
by Stijn Heldens, et al.

All-pairs compute problems apply a user-defined function to each combination of two items of a given data set. Although these problems present an abundance of parallelism, data reuse must be exploited to achieve good performance. Several researchers have considered this problem, resorting either to partial replication with static work distribution or to dynamic scheduling with full replication. In contrast, we present a solution that relies on hierarchical multi-level software-based caches to maximize data reuse at each level in the distributed memory hierarchy, combined with a divide-and-conquer approach to exploit data locality, hierarchical work-stealing to dynamically balance the workload, and asynchronous processing to maximize resource utilization. We evaluate our solution using three real-world applications (from digital forensics, localization microscopy, and bioinformatics) on different platforms (from a desktop machine to a supercomputer). Results show excellent efficiency and scalability when scaling to 96 GPUs, even obtaining super-linear speedups due to a distributed cache.
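
To make the idea of an all-pairs computation with data reuse concrete, here is a minimal single-process sketch in Python. It is not Rocket's API or implementation: the names `LRUCache`, `tiled_pairs`, and `all_pairs` are hypothetical, and it assumes unordered pairs (i < j), a single small software cache, and no GPUs, work-stealing, or asynchronous processing. It only illustrates how recursively splitting the pair space into quadrants visits nearby pairs together, so that recently loaded items can be served from the cache.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny software cache that counts how often an item must be loaded."""

    def __init__(self, capacity, load):
        self.capacity = capacity
        self.load = load          # function: index -> item (assumed expensive)
        self.items = OrderedDict()
        self.misses = 0

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)   # mark as most recently used
            return self.items[key]
        self.misses += 1
        value = self.load(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used
        return value


def tiled_pairs(rows, cols, leaf=2):
    """Yield every (i, j) with i < j inside the block rows x cols,
    recursively splitting the block into quadrants (divide and conquer)."""
    r0, r1 = rows
    c0, c1 = cols
    # Skip empty blocks and blocks that contain no pair with i < j.
    if r0 >= r1 or c0 >= c1 or r0 >= c1 - 1:
        return
    if r1 - r0 <= leaf and c1 - c0 <= leaf:
        for i in range(r0, r1):
            for j in range(max(c0, i + 1), c1):
                yield i, j
        return
    rm = (r0 + r1) // 2
    cm = (c0 + c1) // 2
    for row_half in ((r0, rm), (rm, r1)):
        for col_half in ((c0, cm), (cm, c1)):
            yield from tiled_pairs(row_half, col_half, leaf)


def all_pairs(n, load, compare, cache_size=4, leaf=2):
    """Apply compare to every unordered pair of n items, loading through the cache."""
    cache = LRUCache(cache_size, load)
    results = {}
    for i, j in tiled_pairs((0, n), (0, n), leaf):
        results[(i, j)] = compare(cache.get(i), cache.get(j))
    return results, cache.misses


if __name__ == "__main__":
    # Toy stand-ins for an expensive load and a user-defined pair function.
    results, misses = all_pairs(8, load=lambda k: k * 10,
                                compare=lambda a, b: abs(a - b))
    print(len(results), "pairs computed with", misses, "loads")
```

In this sketch the cache miss counter is the quantity of interest: each miss stands in for an expensive load, and the traversal order determines how many loads a cache of a given size can avoid.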

research
01/26/2023

Odyssey: A Journey in the Land of Distributed Data Series Similarity Search

This paper presents Odyssey, a novel distributed data-series processing ...
research
07/29/2016

An Asynchronous Task-based Fan-Both Sparse Cholesky Solver

Systems of linear equations arise at the heart of many scientific and en...
research
07/29/2019

Modeling Shared Cache Performance of OpenMP Programs using Reuse Distance

Performance modeling of parallel applications on multicore computers rem...
research
03/09/2023

GPU-enabled Function-as-a-Service for Machine Learning Inference

Function-as-a-Service (FaaS) is emerging as an important cloud computing...
research
07/26/2019

The demise of the filesystem and multi level service architecture

Many astronomy data centres still work on filesystems. Industry has move...
research
03/18/2021

Interpretation-enabled Software Reuse Detection Based on a Multi-Level Birthmark Model

Software reuse, especially partial reuse, poses legal and security threa...
research
11/28/2022

Distributed Parallelization of xPU Stencil Computations in Julia

We present a straightforward approach for distributed parallelization of...
