POLCA: Power Oversubscription in LLM Cloud Providers

08/24/2023
by Pratyush Patel, et al.

Recent innovation in large language models (LLMs) and their myriad use cases have rapidly driven up the demand for datacenter GPU compute capacity. Several cloud providers and other enterprises have made substantial plans to grow their datacenters to support these new workloads. Power is one of the key bottleneck resources in datacenters, and as model sizes grow, LLMs are becoming increasingly power intensive. In this paper, we show that there is a significant opportunity to oversubscribe power in LLM clusters. Power oversubscription improves the power efficiency of these datacenters, allows more servers to be deployed per datacenter, and reduces deployment time, since building new datacenters is slow. We extensively characterize the power consumption patterns of a variety of LLMs and their configurations, and we identify the differences between inference and training power consumption patterns. Based on our analysis of these LLMs, we claim that the average and peak power utilization in LLM clusters for inference should not be very high. Our deductions align with data from production LLM clusters, which reveals that inference workloads offer substantial headroom for power oversubscription. However, the stringent set of telemetry and controls that GPUs offer in a virtualized environment makes it challenging to build a reliable and robust power oversubscription mechanism. We propose POLCA, our framework for power oversubscription that is robust, reliable, and readily deployable for GPU clusters. Using open-source models to replicate the power patterns observed in production, we simulate POLCA and demonstrate that we can deploy 30% more servers in the same GPU cluster for inference, with minimal performance loss.
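
The arithmetic behind power oversubscription can be illustrated with a minimal sketch. It is not POLCA itself: the function name, the wattage figures, and the peak-utilization value are hypothetical and chosen only for illustration. The sketch assumes a fixed power budget and compares provisioning by rated server power against provisioning by the observed peak draw.

```python
def servers_within_budget(budget_watts, rated_watts, peak_utilization):
    """Return (baseline, oversubscribed) server counts for a fixed power budget.

    baseline: servers that fit when each is provisioned at its rated power.
    oversubscribed: servers that fit when each is provisioned at its
    observed peak draw (rated_watts * peak_utilization).
    All inputs are illustrative assumptions, not figures from the paper.
    """
    if not 0 < peak_utilization <= 1:
        raise ValueError("peak_utilization must be in (0, 1]")
    if budget_watts <= 0 or rated_watts <= 0:
        raise ValueError("budget_watts and rated_watts must be positive")

    baseline = int(budget_watts // rated_watts)
    # Provision by observed peak draw instead of the rated (nameplate) power.
    oversubscribed = int(budget_watts // (rated_watts * peak_utilization))
    return baseline, oversubscribed


if __name__ == "__main__":
    # Hypothetical cluster: 650 kW budget, 6.5 kW rated servers,
    # observed peak at 75% of rated power.
    base, over = servers_within_budget(650_000, 6_500, 0.75)
    print(f"baseline={base}, oversubscribed={over}")
```

A real deployment would also need a capping mechanism for the rare moments when aggregate draw approaches the budget, which is the part of the problem the abstract attributes to POLCA.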


