A Holistic Analysis of Datacenter Operations: Resource Usage, Energy, and Workload Characterization – Extended Technical Report

by Laurens Versluis, et al.

Improving datacenter operations is vital for the digital society. We posit that doing so requires our community to shift from analyzing operational aspects in isolation to holistic analysis of datacenter resources, energy, and workloads. In turn, this shift will require new analysis methods and open-access, FAIR datasets with fine temporal and spatial granularity. In this work, we leverage one of the rare public datasets providing fine-grained information on datacenter operations. Using it, we show strong evidence that fine-grained information reveals new operational aspects. We then propose a method for holistic analysis of datacenter operations, providing statistical characterization of node, energy, and workload aspects. We demonstrate the benefits of our holistic analysis method by applying it to the operations of a datacenter infrastructure with over 300 nodes. Our analysis reveals both generic and ML-specific aspects, and further details how the operational behavior of the datacenter changed during the 2020 COVID-19 pandemic. We make over 30 main observations, providing holistic insight into the long-term operation of a large-scale, public scientific infrastructure. We suggest such observations can help in the short term with performance engineering tasks, such as predicting future datacenter load, and in the long term with the design of datacenter infrastructure.




Holistic Fine-grained GGS Characterization: From Detection to Unbalanced Classification

Recent studies have demonstrated the diagnostic and prognostic values of...

A Holistic Framework for Analyzing the COVID-19 Vaccine Debate

The Covid-19 pandemic has led to infodemic of low quality information le...

Operational Characterization of a Public Scientific Datacenter During and Beyond the COVID-19 Period

Datacenters are imperative for the digital society. They offer services ...

DCDB Wintermute: Enabling Online and Holistic Operational Data Analytics on HPC Systems

The complexity of today's HPC systems increases as we move closer to the...

A Holistic View on Resource Management in Serverless Computing Environments: Taxonomy and Future Directions

Serverless computing has emerged as an attractive deployment option for ...

A System-Level Voltage/Frequency Scaling Characterization Framework for Multicore CPUs

Supply voltage scaling is one of the most effective techniques to reduce...

Glo-In-One: Holistic Glomerular Detection, Segmentation, and Lesion Characterization with Large-scale Web Image Mining

The quantitative detection, segmentation, and characterization of glomer...
