Compiler-Guided Throughput Scheduling for Many-core Machines

03/11/2021
by   Girish Mururu, et al.
0

Modern ARM-based servers such as ThunderX and ThunderX2 offer a tremendous amount of parallelism by providing dozens or even hundreds of processors. However, exploiting these computing resources for reuse-heavy, data dependent workloads is a big challenge because of shared cache resources. In particular, schedulers have to conservatively co-locate processes to avoid cache conflicts since miss penalties are detrimental and conservative co-location decisions lead to lower resource utilization. To address these challenges, in this paper we explore the utility of predictive analysis of applications' execution to dynamically forecast resource-heavy workload regions, and to improve the efficiency of resource management through the use of new proactive methods. Our approach relies on the compiler to insert "beacons" in the application at strategic program points to periodically produce and/or update the attributes of anticipated resource-intense program region(s). The compiler classifies loops in programs based on predictability of their execution time and inserts different types of beacons at their entry/exit points. The precision of the information carried by beacons varies as per the analyzability of the loops, and the scheduler uses performance counters to fine tune co-location decisions. The information produced by beacons in multiple processes is aggregated and analyzed by the proactive scheduler to respond to the anticipated workload requirements. For throughput environments, we develop a framework that demonstrates high-quality predictions and improvements in throughput over CFS by 1.4x on an average and up to 4.7x on ThunderX and 1.9x on an average and up to 5.2x on ThunderX2 servers on consolidated workloads across 45 benchmarks.

READ FULL TEXT

page 9

page 10

page 11

research
02/18/2021

Effective Cache Apportioning for Performance Isolation Under Compiler Guidance

With a growing number of cores per socket in modern data-centers where m...
research
01/02/2023

Hardware Abstractions and Hardware Mechanisms to Support Multi-Task Execution on Coarse-Grained Reconfigurable Arrays

Domain-specific accelerators are used in various computing systems rangi...
research
10/29/2020

Prediction-Based Power Oversubscription in Cloud Platforms

Datacenter designers rely on conservative estimates of IT equipment powe...
research
09/13/2018

Do Your Cores Play Nicely? A Portable Framework for Multi-core Interference Tuning and Analysis

Multi-core architectures can be leveraged to allow independent processes...
research
01/08/2018

Towards General Distributed Resource Selection

The advantages of distributing workloads and utilizing multiple distribu...
research
05/12/2014

Resource-Aware Programming for Robotic Vision

Humanoid robots are designed to operate in human centered environments. ...
research
12/30/2017

A Loop-Based Methodology for Reducing Computational Redundancy in Workload Sets

The design of general purpose processors relies heavily on a workload ga...

Please sign up or login with your details

Forgot password? Click here to reset