AXI-Pack: Near-Memory Bus Packing for Bandwidth-Efficient Irregular Workloads

11/18/2022
by Chi Zhang, et al.

Data-intensive applications involving irregular memory streams are handled inefficiently by modern processors and memory systems, which are highly optimized for regular, contiguous data. Recent work tackles these inefficiencies in hardware through core-side stream extensions or memory-side prefetchers and accelerators, but fails to provide end-to-end solutions that also achieve high efficiency in on-chip interconnects. We propose AXI-Pack, an extension to ARM's AXI4 protocol introducing bandwidth-efficient strided and indirect bursts to enable end-to-end irregular streams. AXI-Pack adds irregular stream semantics to memory requests and avoids inefficient narrow-bus transfers by packing multiple narrow data elements onto a wide bus. It retains full compatibility with AXI4 and does not require modifications to non-burst-reshaping interconnect IPs. To demonstrate our approach end-to-end, we extend an open-source RISC-V vector processor to leverage AXI-Pack at its memory interface for strided and indexed accesses. On the memory side, we design a banked memory controller that efficiently handles AXI-Pack requests. On a system with a 256-bit-wide interconnect running FP32 workloads, AXI-Pack achieves near-ideal peak on-chip bus utilizations of 87% [...] 2.4x, and energy efficiency improvements of 5.3x and 2.1x over a baseline using an AXI4 bus on strided and indirect benchmarks, respectively.
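To make the packing idea concrete, the following Python sketch models bus beats for a strided FP32 stream on a 256-bit bus. It is an illustrative back-of-the-envelope model, not the AXI-Pack hardware or protocol encoding: the function names (`beats_unpacked`, `beats_packed`, `utilization`, `pack_strided`) are hypothetical, and the baseline assumes, as a simplification, that every non-contiguous element occupies one bus beat of its own.

```python
def beats_unpacked(num_elements: int) -> int:
    # Simplifying assumption: each narrow element travels in its own beat.
    return num_elements


def beats_packed(num_elements: int, elem_bits: int = 32, bus_bits: int = 256) -> int:
    # Packed transfer: fill each beat with as many elements as fit (ceiling division).
    total_bits = num_elements * elem_bits
    return -(-total_bits // bus_bits)


def utilization(num_elements: int, beats: int, elem_bits: int = 32, bus_bits: int = 256) -> float:
    # Fraction of the transferred bus bits that carry useful data.
    return (num_elements * elem_bits) / (beats * bus_bits)


def pack_strided(memory, base: int, stride: int, count: int, per_beat: int):
    # Gather `count` elements at indices base, base+stride, ... and group them into beats.
    elems = [memory[base + i * stride] for i in range(count)]
    return [elems[i:i + per_beat] for i in range(0, count, per_beat)]


if __name__ == "__main__":
    n = 64  # number of FP32 elements in the stream
    print(utilization(n, beats_unpacked(n)))  # 0.125: 32 useful bits per 256-bit beat
    print(utilization(n, beats_packed(n)))    # 1.0: eight FP32 elements per beat
    print(pack_strided(list(range(100)), base=0, stride=3, count=10, per_beat=8))
```

Under these assumptions, an unpacked FP32 stream can use at most 32/256 = 12.5% of a 256-bit bus, while packing eight elements per beat raises the ideal bound to 100%. An indirect stream could be modeled the same way by replacing the strided index computation with a lookup in an index array.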


