Prediction-Based Power Oversubscription in Cloud Platforms

10/29/2020
by Alok Kumbhare, et al.

Datacenter designers rely on conservative estimates of IT equipment power draw to provision resources. This leaves resources underutilized and requires more datacenters to be built. Prior work has used power capping to shave the rare power peaks and add more servers to the datacenter, thereby oversubscribing its resources and lowering capital costs. This works well when the workloads and their server placements are known. Unfortunately, these factors are unknown in public clouds, forcing providers to limit oversubscription so that performance is never impacted. In this paper, we argue that providers can use predictions of workload performance criticality and virtual machine (VM) resource utilization to increase oversubscription. This poses many challenges, such as identifying performance-critical workloads among black-box VMs, creating support for criticality-aware power management, and increasing oversubscription while limiting the impact of capping. We address these challenges for the hardware and software infrastructures of Microsoft Azure. The results show that we enable a 2x increase in oversubscription with minimal impact on critical workloads.
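
To make the idea of criticality-aware capping concrete, here is a minimal sketch, not the mechanism described in the paper. It assumes each VM carries a predicted criticality flag, an estimated power draw, and a floor below which it cannot be throttled; the names `VM`, `apply_caps`, `power_w`, `min_power_w`, and `critical` are all hypothetical. When total draw exceeds the budget, non-critical VMs are throttled first, and critical VMs are touched only if that is not enough.

```python
from dataclasses import dataclass


@dataclass
class VM:
    # All fields are illustrative assumptions, not taken from the paper.
    name: str
    power_w: float      # estimated current power draw, in watts
    min_power_w: float  # lowest draw reachable under a cap, in watts
    critical: bool      # predicted performance criticality


def apply_caps(vms, budget_w):
    """Return a {name: watts} allocation that tries to fit within budget_w.

    Non-critical VMs are throttled first (down to their floor); critical VMs
    are throttled only if excess power remains after that.
    """
    alloc = {vm.name: vm.power_w for vm in vms}
    excess = sum(alloc.values()) - budget_w
    # False sorts before True, so non-critical VMs come first.
    for vm in sorted(vms, key=lambda v: v.critical):
        if excess <= 0:
            break
        cut = min(excess, vm.power_w - vm.min_power_w)
        alloc[vm.name] -= cut
        excess -= cut
    return alloc


vms = [
    VM("web", 200.0, 120.0, critical=True),
    VM("batch", 200.0, 100.0, critical=False),
    VM("etl", 150.0, 90.0, critical=False),
]
caps = apply_caps(vms, budget_w=450.0)
```

In this example the 100 W of excess is absorbed entirely by the non-critical `batch` VM, so the critical `web` VM keeps its full draw. A real system would cap at the server or chassis level and would need to handle prediction errors, which this sketch ignores.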

