Improving Multi-Application Concurrency Support Within the GPU Memory System

08/16/2017
by Rachata Ausavarungnirun, et al.

GPUs exploit a high degree of thread-level parallelism to hide long-latency stalls. Due to the heterogeneous compute requirements of different applications, there is a growing need to share the GPU across multiple applications in large-scale computing environments. However, while CPUs offer relatively seamless multi-application concurrency, and are an excellent fit for multitasking and for virtualized environments, GPUs currently offer only primitive support for multi-application concurrency. Much of the problem in a contemporary GPU lies within the memory system, where multi-application execution requires virtual memory support to manage the address spaces of each application and to provide memory protection. In this work, we perform a detailed analysis of the major problems in state-of-the-art GPU virtual memory management that hinder multi-application execution. Existing GPUs are designed to share memory between the CPU and GPU, but do not handle multi-application support within the GPU well. We find that when multiple applications spatially share the GPU, there is a significant amount of inter-core thrashing on the shared TLB within the GPU. The TLB contention is high enough to prevent the GPU from successfully hiding stall latencies, thus becoming a first-order performance concern. We introduce MASK, a memory hierarchy design that provides low-overhead virtual memory support for the concurrent execution of multiple applications. MASK extends the GPU memory hierarchy to efficiently support address translation through the use of multi-level TLBs, and uses translation-aware memory and cache management to maximize throughput in the presence of inter-application contention.
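
The shared-TLB thrashing described in the abstract can be illustrated with a toy model. The sketch below is not MASK and not the paper's evaluation methodology. It assumes a single shared, fully associative TLB with LRU replacement, entries tagged by a hypothetical address-space ID (`asid`), and applications that sweep cyclically over their pages. The capacity and working-set sizes are arbitrary illustrative values.

```python
from collections import OrderedDict


class LruTlb:
    """Toy fully associative TLB with LRU replacement (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # keys: (asid, virtual page number)
        self.hits = 0
        self.misses = 0

    def access(self, asid, vpn):
        key = (asid, vpn)
        if key in self.entries:
            # Hit: mark the entry as most recently used.
            self.entries.move_to_end(key)
            self.hits += 1
        else:
            # Miss: evict the least recently used entry if the TLB is full.
            self.misses += 1
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)
            self.entries[key] = True


def miss_rate(num_apps, capacity=64, pages_per_app=48, rounds=100):
    """Interleave accesses from num_apps applications on one shared TLB.

    All parameter values are assumptions chosen for illustration.
    """
    tlb = LruTlb(capacity)
    for _ in range(rounds):
        for vpn in range(pages_per_app):
            for asid in range(num_apps):
                tlb.access(asid, vpn)
    return tlb.misses / (tlb.hits + tlb.misses)


if __name__ == "__main__":
    print("one application: ", miss_rate(1))
    print("two applications:", miss_rate(2))
```

In this configuration, one application's 48 pages fit in the 64-entry TLB, so it misses only on its first sweep. With two applications the combined 96 pages exceed the capacity, and under LRU with a cyclic access pattern every access misses. Real GPU TLB hierarchies and access patterns are far more complex, so this shows only the qualitative effect of inter-application contention.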

