Accelerating Graph Analytics on a Reconfigurable Architecture with a Data-Indirect Prefetcher

01/29/2023
by   Yichen Yang, et al.

The irregular memory accesses of graph workloads make them perform poorly on modern computing platforms. On manycore reconfigurable architectures (MRAs), in particular, even state-of-the-art graph prefetchers do not work well (only 3 ...). This is because caches in MRAs are typically not large enough to host a large quantity of prefetched data, and many MRAs employ shared caches that such prefetchers simply do not support. This paper studies the design of a data prefetcher for an MRA called Transmuter. The prefetcher is built on top of Prodigy, the current best-performing data prefetcher for CPUs. The key design elements that adapt the prefetcher to the MRA include fused prefetcher status handling registers and a prefetch handshake protocol to support run-time reconfiguration, in addition to a redesign of the cache structure in Transmuter. An evaluation on popular graph workloads shows that the synergistic integration of these architectures outperforms a baseline without a prefetcher by 1.27x on average and by as much as 2.72x on some workloads.
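To make the "irregular memory accesses" concrete, the following is a minimal sketch of breadth-first search over a graph stored in compressed sparse row (CSR) form. It is not code from the paper; the function name `bfs_csr`, the array names, and the tiny example graph are assumptions chosen for illustration. The point is that the address read in `visited` depends on a value loaded from `edges`, which is the kind of data-dependent access a data-indirect prefetcher tries to anticipate.

```python
def bfs_csr(offsets, edges, source):
    """BFS over a CSR graph.

    Returns the visit order and the sequence of indices used to read
    `visited`, so the data-dependent access pattern can be inspected.
    """
    n = len(offsets) - 1
    visited = [False] * n
    visited[source] = True
    frontier = [source]
    order = [source]
    visited_reads = []  # trace of indirect indices (illustration only)

    while frontier:
        next_frontier = []
        for u in frontier:
            # offsets[u] and offsets[u + 1] bound u's slice of `edges`.
            for e in range(offsets[u], offsets[u + 1]):
                v = edges[e]             # sequential read within the slice
                visited_reads.append(v)  # visited[v] is indexed by loaded data
                if not visited[v]:
                    visited[v] = True
                    next_frontier.append(v)
                    order.append(v)
        frontier = next_frontier
    return order, visited_reads


# Hypothetical 4-vertex graph: 0 -> {3, 1}, 1 -> {2}, 2 -> {}, 3 -> {2}
offsets = [0, 2, 3, 3, 4]
edges = [3, 1, 2, 2]
order, reads = bfs_csr(offsets, edges, 0)
print(order)  # vertices in the order BFS discovers them
print(reads)  # indices used for the data-dependent reads of `visited`
```

The reads of `edges` are sequential within each vertex's slice, but the reads of `visited` jump around according to the graph's structure, so a conventional stride-based prefetcher has little to work with on them.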


research
10/13/2019

LiveGraph: A Transactional Graph Storage System with Purely Sequential Adjacency List Scans

The specific characteristics of graph workloads make it hard to design a...
research
05/11/2023

Characterizing the impact of last-level cache replacement policies on big-data workloads

In recent years, graph-processing has become an essential class of workl...
research
07/21/2022

Templating Shuffles

Cloud data centers are rapidly evolving. At the same time, large-scale d...
research
06/30/2022

Exploiting Inherent Elasticity of Serverless in Irregular Algorithms

Serverless computing, in particular the Function-as-a-Service (FaaS) exe...
research
08/03/2020

Efficient Orchestration of Host and Remote Shared Memory for Memory Intensive Workloads

Since very few contributions to the development of an unified memory orc...
research
11/18/2022

AXI-Pack: Near-Memory Bus Packing for Bandwidth-Efficient Irregular Workloads

Data-intensive applications involving irregular memory streams are ineff...
research
10/19/2020

High-Performance Distributed RMA Locks

We propose a topology-aware distributed Reader-Writer lock that accelera...
