AMOEBA: A Coarse Grained Reconfigurable Architecture for Dynamic GPU Scaling

11/08/2019
by Xianwei Cheng, et al.

Different GPU applications exhibit different scalability patterns, depending on their network-on-chip (NoC), coalescing, memory and control divergence, and L1 cache behavior. A GPU consists of several Streaming Multiprocessors (SMs) that collectively determine how shared resources are partitioned and accessed. In recent years, SM scaling has diverged along two paths: scale-up (fewer, larger SMs) and scale-out (more, smaller SMs). However, neither scaling up nor scaling out can meet the scalability requirements of all applications running on a given GPU system, which inevitably results in performance degradation and resource under-utilization for some applications. In this work, we investigate the major design parameters that influence GPU scaling. We then propose AMOEBA, a solution to GPU scaling through reconfigurable SM cores. AMOEBA monitors and predicts application scalability at run-time and adjusts the SM configuration to meet program requirements. AMOEBA also enables dynamic creation of heterogeneous SMs through independent fusing or splitting. AMOEBA is a microarchitecture-based solution and requires no additional programming effort or custom compiler support. Our experimental evaluations with application programs from various benchmark suites indicate that AMOEBA achieves a maximum performance gain of 4.3x, and generates an average performance improvement of 47

research
02/11/2022

Lightning: Scaling the GPU Programming Model Beyond a Single GPU

The GPU programming model is primarily aimed at the development of appli...
research
07/05/2019

RegDem: Increasing GPU Performance via Shared Memory Register Spilling

GPU utilization, measured as occupancy, is limited by the parallel threa...
research
01/02/2023

Hardware Abstractions and Hardware Mechanisms to Support Multi-Task Execution on Coarse-Grained Reconfigurable Arrays

Domain-specific accelerators are used in various computing systems rangi...
research
03/16/2022

Concurrent CPU-GPU Task Programming using Modern C++

In this paper, we introduce Heteroflow, a new C++ library to help develo...
research
02/23/2022

Improving Scalability with GPU-Aware Asynchronous Tasks

Asynchronous tasks, when created with over-decomposition, enable automat...
research
04/09/2020

A Survey on Coarse-Grained Reconfigurable Architectures from a Performance Perspective

With the end of both Dennard's scaling and Moore's law, computer users a...
research
04/22/2020

Proactive Aging Mitigation in CGRAs through Utilization-Aware Allocation

Resource balancing has been effectively used to mitigate the long-term a...
