ScalAna: Automating Scaling Loss Detection with Graph Analysis

09/03/2020
by Yuyang Jin, et al.

Scaling a parallel program to modern supercomputers is challenging due to inter-process communication, Amdahl's law, and resource contention. Performance analysis tools for finding such scaling bottlenecks are based on either profiling or tracing. Profiling incurs low overhead but does not capture the detailed dependencies needed for root-cause analysis. Tracing collects all of this information, but at prohibitive overhead. In this work, we design ScalAna, which uses static analysis techniques to achieve the best of both worlds: it enables the analyzability of traces at a cost similar to profiling. ScalAna first leverages static compiler techniques to build a Program Structure Graph, which records the main computation and communication patterns as well as the program's control structures. At runtime, we adopt lightweight techniques to collect performance data according to the graph structure and generate a Program Performance Graph. With this graph, we propose a novel approach, called backtracking root cause detection, which can automatically and efficiently detect the root cause of scaling loss. We evaluate ScalAna with real applications. Results show that our approach can effectively locate the root cause of scaling loss for real applications and incurs 1.73% overhead on average for up to 2,048 processes. We achieve up to 11.11% performance improvement by fixing the root causes detected by ScalAna on 2,048 processes.
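To make the idea of backtracking over a performance graph more concrete, here is a minimal toy sketch in Python. It is not ScalAna's implementation: the `Vertex` class, the `scaling_loss` metric, the threshold, and the sample timings are all assumptions chosen for illustration. Each vertex stands for a code region with its measured time at two process counts, and `deps` lists the regions it waits on. Starting from a vertex that scales poorly, the sketch repeatedly follows the dependency that itself scales worst, and stops when no dependency exceeds the threshold.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Vertex:
    """Hypothetical performance-graph vertex (not ScalAna's data structure)."""
    name: str
    time: Dict[int, float]                          # process count -> seconds
    deps: List[str] = field(default_factory=list)   # vertices this one waits on


def scaling_loss(v: Vertex, small: int, large: int) -> float:
    """Measured time at `large` divided by the ideal strong-scaling time.

    A value of 1.0 means perfect scaling; larger values mean more loss.
    This metric is an assumption made for the sketch.
    """
    ideal = v.time[small] * small / large
    return v.time[large] / ideal


def backtrack(graph: Dict[str, Vertex], start: str,
              small: int, large: int, threshold: float = 1.5) -> List[str]:
    """Walk dependency edges backwards from `start`, always moving to the
    unvisited dependency with the worst scaling loss above `threshold`.
    The last vertex on the returned path is the suspected root cause."""
    path = [start]
    seen = {start}
    current = graph[start]
    while True:
        candidates = [
            graph[d] for d in current.deps
            if d not in seen and scaling_loss(graph[d], small, large) > threshold
        ]
        if not candidates:
            return path
        current = max(candidates, key=lambda v: scaling_loss(v, small, large))
        seen.add(current.name)
        path.append(current.name)


# Made-up timings: a collective that waits on an imbalanced loop.
graph = {
    "MPI_Allreduce": Vertex("MPI_Allreduce", {4: 1.0, 16: 2.0}, deps=["loop_1"]),
    "loop_1": Vertex("loop_1", {4: 8.0, 16: 6.0}, deps=["init"]),
    "init": Vertex("init", {4: 4.0, 16: 1.0}),
}

path = backtrack(graph, "MPI_Allreduce", small=4, large=16)
print(path)  # ['MPI_Allreduce', 'loop_1']
```

In this toy input, the slow collective is only a symptom: the walk ends at `loop_1`, because `init` scales ideally and is therefore not followed.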

research
09/27/2019

Automatically Tracing Imprecision Causes in JavaScript Static Analysis

Researchers have developed various techniques for static analysis of Jav...
research
05/13/2022

Automatic Root Cause Quantification for Missing Edges in JavaScript Call Graphs (Extended Version)

Building sound and precise static call graphs for real-world JavaScript ...
research
03/09/2023

RCABench: Open Benchmarking Platform for Root Cause Analysis

Fuzzing has contributed to automatically identifying bugs and vulnerabil...
research
08/25/2022

Apptainer Without Setuid

Apptainer (formerly known as Singularity) since its beginning implemente...
research
05/06/2019

Multi-threaded Output in CMS using ROOT

CMS has worked aggressively to make use of multi-core architectures, rou...
research
12/20/2019

Online Analysis of Distributed Dataflows with Timely Dataflow

We present ST2, an end-to-end solution to analyze distributed dataflows ...
research
10/31/2018

Making root cause analysis feasible for large code bases: a solution approach for a climate model

Applications that simulate complex physical processes can be composed of...
