MARS: Memory Aware Reordered Source

08/01/2018
by Ishwar Bhati, et al.

Memory bandwidth is critical in today's high-performance computing systems. It is especially important for GPU workloads such as 3D gaming, imaging and perceptual computing, and GPGPU, which are data-intensive. As the number of threads and data streams in GPUs increases with each generation, high available memory bandwidth alone is not enough: memory efficiency is also crucial to achieving the desired performance. In the presence of multiple concurrent data streams, the locality inherent in a single data stream is often lost as the streams are interleaved while moving through multiple levels of the memory system. In DRAM-based main memory, poor request locality reduces row-buffer reuse, leaving memory bandwidth underutilized. In this paper we propose the Memory-Aware Reordered Source (MARS) architecture to address the memory inefficiency arising from highly interleaved data streams. The key idea of MARS is that, with a sufficiently large lookahead before the main memory, data streams can be reordered based on their row-buffer address to regain the lost locality and improve memory efficiency. We show that MARS improves achieved memory bandwidth by 11% for a set of synthetic microbenchmarks. Moreover, MARS does so without any specific knowledge of the memory configuration.
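The reordering idea in the abstract can be illustrated with a small software model. The sketch below is not the authors' hardware design: it is a toy, single-bank simulation in which the row size (`ROW_BYTES`), the window size, and every function name are assumptions chosen for illustration. It interleaves several sequential streams, groups the requests inside each lookahead window by an assumed row index, and counts how many accesses would hit an already open row before and after reordering.

```python
from collections import OrderedDict

# Assumed row-buffer granularity in bytes (hypothetical; not taken from the paper).
ROW_BYTES = 2048


def row_of(addr):
    """Map an address to an assumed row index."""
    return addr // ROW_BYTES


def row_hits(addrs):
    """Count accesses that target the currently open row (toy single-bank model)."""
    hits = 0
    open_row = None
    for addr in addrs:
        row = row_of(addr)
        if row == open_row:
            hits += 1
        open_row = row
    return hits


def interleave(streams):
    """Round-robin interleave equally long streams, mimicking mixed traffic."""
    mixed = []
    for requests in zip(*streams):
        mixed.extend(requests)
    return mixed


def reorder(addrs, window):
    """Group requests by row inside each lookahead window.

    Rows are emitted in first-seen order and requests keep their arrival
    order within a row, so only the interleaving between rows changes.
    """
    out = []
    for start in range(0, len(addrs), window):
        groups = OrderedDict()
        for addr in addrs[start:start + window]:
            groups.setdefault(row_of(addr), []).append(addr)
        for group in groups.values():
            out.extend(group)
    return out


# Four sequential streams of eight 64-byte requests, each inside its own row.
streams = [[base + 64 * i for i in range(8)]
           for base in (0x00000, 0x10000, 0x20000, 0x30000)]
mixed = interleave(streams)

print("row hits, interleaved:", row_hits(mixed))
print("row hits, window=16:  ", row_hits(reorder(mixed, 16)))
print("row hits, window=32:  ", row_hits(reorder(mixed, 32)))
```

In this toy setup, the interleaved order never touches the same row twice in a row, while a larger lookahead window lets more same-row requests be issued back to back. The model needs a row size only to measure hits; how the actual architecture groups requests without knowledge of the memory configuration is described in the full paper, not here.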


research
04/30/2018

A Memory Controller with Row Buffer Locality Awareness for Hybrid Memory Systems

Non-volatile memory (NVM) is a class of promising scalable memory techno...
research
11/18/2022

AXI-Pack: Near-Memory Bus Packing for Bandwidth-Efficient Irregular Workloads

Data-intensive applications involving irregular memory streams are ineff...
research
12/16/2018

Evaluating Row Buffer Locality in Future Non-Volatile Main Memories

DRAM-based main memories have read operations that destroy the read data...
research
07/17/2019

CADS: Core-Aware Dynamic Scheduler for Multicore Memory Controllers

Memory controller scheduling is crucial in multicore processors, where D...
research
10/07/2019

DSPatch: Dual Spatial Pattern Prefetcher

High main memory latency continues to limit performance of modern high-p...
research
06/30/2023

HashMem: PIM-based Hashmap Accelerator

Hashmaps are widely utilized data structures in many applications to per...
research
01/13/2021

EXMA: A Genomics Accelerator for Exact-Matching

Genomics is the foundation of precision medicine, global food security a...
