BayesPerf: Minimizing Performance Monitoring Errors Using Bayesian Statistics

02/22/2021
by Subho S. Banerjee, et al.

Hardware performance counters (HPCs) that measure low-level architectural and microarchitectural events provide dynamic contextual information about the state of the system. However, HPC measurements are error-prone because of nondeterminism (e.g., undercounting due to event multiplexing, or OS interrupt-handling behaviors). In this paper, we present BayesPerf, a system for quantifying uncertainty in HPC measurements by using a domain-driven Bayesian model that captures microarchitectural relationships between HPCs to jointly infer their values as probability distributions. We provide the design and implementation of an accelerator that allows for low-latency and low-power inference of the BayesPerf model for x86 and ppc64 CPUs. BayesPerf reduces the average error in HPC measurements from a baseline of 40.1% when events are multiplexed. The value of BayesPerf in real-time decision-making is illustrated with a simple example of scheduling of PCIe transfers.
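To make the abstract's two ideas concrete, the following is a minimal sketch and not the BayesPerf model itself. It assumes (1) a multiplexed counter is extrapolated by the usual linear scaling of its raw count by enabled time over running time, and (2) a known relationship between counters yields a second, independent Gaussian estimate of the same event count, which is combined with the first by precision weighting. The function names, the invariant `total = hits + misses`, and all numbers are hypothetical and chosen only for illustration.

```python
def scale_multiplexed(raw_count, time_enabled, time_running):
    """Extrapolate a multiplexed counter to the full window by linear scaling."""
    if time_running <= 0:
        raise ValueError("counter was never scheduled on the hardware")
    return raw_count * time_enabled / time_running


def fuse_gaussian(mean_a, var_a, mean_b, var_b):
    """Combine two independent Gaussian estimates of the same quantity.

    Returns the posterior mean and variance (precision-weighted average).
    """
    precision = 1.0 / var_a + 1.0 / var_b
    mean = (mean_a / var_a + mean_b / var_b) / precision
    return mean, 1.0 / precision


# Hypothetical scenario: the "total accesses" counter was scheduled for only
# 25% of the sampling window, so its extrapolated value is very uncertain.
direct = scale_multiplexed(2_500, time_enabled=1.0, time_running=0.25)
direct_var = 2_000.0 ** 2  # assumed variance of the extrapolated value

# Hypothetical invariant: total = hits + misses, with both measured precisely.
derived = 6_000 + 3_000
derived_var = 500.0 ** 2  # assumed variance of the derived value

mean, var = fuse_gaussian(direct, direct_var, derived, derived_var)
print(f"direct={direct:.0f}, fused mean={mean:.0f}, fused std={var ** 0.5:.0f}")
```

The fused estimate lands between the two inputs, closer to the more precise one, and its variance is smaller than either input's. That is the basic reason a model relating counters to one another can tighten an uncertain multiplexed reading.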

