MARS: Memory Aware Reordered Source
Memory bandwidth is critical in today's high-performance computing systems. It is particularly important for GPU workloads such as 3D gaming, imaging and perceptual computing, and GPGPU, owing to their data-intensive nature. As the number of threads and data streams in GPUs increases with each generation, high available memory bandwidth alone is not enough; memory efficiency is also crucial to achieving the desired performance. In the presence of multiple concurrent data streams, the inherent locality of a single data stream is often lost as the streams are interleaved while moving through multiple levels of the memory system. In DRAM-based main memory, poor request locality reduces row-buffer reuse, leaving memory bandwidth underutilized and inefficiently used. In this paper we propose the Memory-Aware Reordered Source (MARS) architecture to address the memory inefficiency arising from highly interleaved data streams. The key idea of MARS is that, with a sufficiently large lookahead before the main memory, data streams can be reordered based on their row-buffer address to regain the lost locality and improve memory efficiency. We show that MARS improves achieved memory bandwidth by 11% for a set of synthetic microbenchmarks. Moreover, MARS does so without any specific knowledge of the memory configuration.
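The key idea can be illustrated with a toy model. The sketch below is not the MARS hardware design. It is a minimal software illustration, assuming a single DRAM bank with an open-row policy and a hypothetical fixed row size (`ROW_SIZE`), in which a bounded lookahead buffer groups pending requests by row before releasing them. All names (`row_of`, `count_row_hits`, `reorder_with_lookahead`, `interleave`) and the stream parameters are made up for illustration.

```python
from collections import OrderedDict

ROW_SIZE = 2048  # assumed bytes per DRAM row (hypothetical, for this model only)


def row_of(addr):
    """Map a byte address to its (assumed) DRAM row index."""
    return addr // ROW_SIZE


def count_row_hits(addresses):
    """Count row-buffer hits for one bank with an open-row policy.

    A request is a hit when it targets the same row as the previous request.
    """
    hits = 0
    open_row = None
    for addr in addresses:
        row = row_of(addr)
        if row == open_row:
            hits += 1
        open_row = row
    return hits


def reorder_with_lookahead(addresses, window):
    """Reorder requests using a lookahead buffer of at most `window` entries.

    Pending requests are grouped by row. Whenever the buffer is full, the
    group whose row has been waiting longest is released as one burst.
    Arrival order is preserved within each row group.
    """
    out = []
    pending_by_row = OrderedDict()  # row -> addresses, oldest row first
    pending = 0
    for addr in addresses:
        pending_by_row.setdefault(row_of(addr), []).append(addr)
        pending += 1
        if pending == window:
            _, group = pending_by_row.popitem(last=False)
            out.extend(group)
            pending -= len(group)
    # Drain whatever is left once the input ends.
    while pending_by_row:
        _, group = pending_by_row.popitem(last=False)
        out.extend(group)
    return out


def interleave(streams):
    """Round-robin interleave equally long streams, one request at a time."""
    return [addr for step in zip(*streams) for addr in step]


# Four sequential streams of 64-byte accesses, each confined to its own row.
streams = [[base + 64 * i for i in range(32)]
           for base in (0x00000, 0x10000, 0x20000, 0x30000)]
interleaved = interleave(streams)
reordered = reorder_with_lookahead(interleaved, window=16)

print("row hits, interleaved:", count_row_hits(interleaved))
print("row hits, reordered:  ", count_row_hits(reordered))
```

In this model, a larger `window` lets more same-row requests accumulate before a group is released, which is the intuition behind needing a sufficiently large lookahead. The fixed `ROW_SIZE` is only a modeling convenience; the abstract states that MARS itself needs no specific knowledge of the memory configuration.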