Validation of hardware events for successful performance pattern identification in High Performance Computing

10/11/2017
by Thomas Röhl, et al.

Hardware performance monitoring (HPM) is a crucial ingredient of performance analysis tools. While interfaces such as LIKWID, PAPI, or the kernel interface perf_event provide HPM access with some additional features, many higher-level tools combine event counts with results retrieved from other sources, such as function call traces, to derive (semi-)automatic performance advice. However, although HPM has been available on x86 systems since the early 1990s, only a small subset of the HPM features is used in practice. Performance patterns provide a more comprehensive approach, enabling the identification of various performance-limiting effects. Patterns address issues such as bandwidth saturation, load imbalance, non-local data access in ccNUMA systems, or false sharing of cache lines. This work defines the HPM event sets that are best suited to identify a selection of performance patterns on the Intel Haswell processor. We validate the chosen event sets for accuracy in order to arrive at a reliable pattern detection mechanism, and we point out shortcomings that cannot be easily circumvented due to bugs or limitations in the hardware.
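To make the idea of validating event counts and detecting a pattern more concrete, the following is a minimal sketch, not the method used in the paper. It assumes that a microbenchmark has an analytically known event count against which a measured count can be compared, and that load imbalance can be estimated from per-thread counts of some work-related event. The function names, the 5% tolerance, and the max/mean imbalance metric are all hypothetical choices made for illustration.

```python
from statistics import mean


def relative_error(measured: float, expected: float) -> float:
    """Signed relative deviation of a measured event count from the expected one."""
    if expected == 0:
        raise ValueError("expected count must be non-zero")
    return (measured - expected) / expected


def is_accurate(measured: float, expected: float, tolerance: float = 0.05) -> bool:
    """True if the measured count is within the tolerance (assumed: 5%)."""
    return abs(relative_error(measured, expected)) <= tolerance


def load_imbalance(per_thread_counts: list[float]) -> float:
    """Hypothetical imbalance metric: max/mean - 1, so 0.0 means perfectly balanced."""
    avg = mean(per_thread_counts)
    if avg == 0:
        raise ValueError("at least one thread must have a non-zero count")
    return max(per_thread_counts) / avg - 1.0


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from any real system.
    print(is_accurate(measured=1_020_000, expected=1_000_000))
    print(load_imbalance([200, 100, 100, 0]))
```

An event whose measured count stays within the tolerance across several microbenchmarks could then be considered usable for pattern detection under these assumptions; events that fail the check would need a workaround or would have to be excluded.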


