NUMAscope: Capturing and Visualizing Hardware Metrics on Large ccNUMA Systems

11/23/2021
by Daniel J. Blueman, et al.

Cache-coherent non-uniform memory access (ccNUMA) systems enable parallel applications to scale up to thousands of cores and many terabytes of main memory. However, because remote accesses cost more than local ones, extra measures are needed to scale applications to high core counts and to process far more data than a typical server can hold. Just as applications are optimized to improve cache utilization, they also need to be optimized for data locality on ccNUMA systems in order to use larger topologies effectively. The first step in optimizing an application is to understand what slows it down, which requires profiling tools or manual instrumentation. On large ccNUMA systems, however, there are few mechanisms to capture and present actionable telemetry. This is partly due to the proprietary nature of such interconnects, but also to the lack of a common and accessible (read: open-source) framework that developers or vendors can leverage. In this paper, we present an open-source, extensible framework that captures high-rate on-chip events with low overhead (<10 ...). The presented framework can operate in live or record mode, allowing either real-time monitoring or capture for later post-workload or offline analysis. High-resolution visualization is available either through a standards-based (web) interactive graphical interface or through a convenient textual interface for quick-look analysis.


