Pipit: Enabling programmatic analysis of parallel execution traces

06/19/2023
by Abhinav Bhatele, et al.

Performance analysis is an important part of the iterative process of performance tuning during the development of parallel programs. Per-process, per-thread traces (detailed logs of timestamped events) enable in-depth analysis of parallel program execution to identify various kinds of performance issues. Trace collection tools often provide a graphical tool to analyze the trace output. However, these GUI-based tools support only specific file formats, are difficult to scale to large data, limit data exploration to the implemented graphical views, and do not support automated comparisons of two or more datasets. In this paper, we present a programmatic approach to analyzing parallel execution traces by leveraging pandas, a powerful Python-based data analysis library. We have developed a Python library, Pipit, on top of pandas that can read traces in different file formats (OTF2, HPCToolkit, Projections, Nsight, etc.) and provide a uniform data structure in the form of a pandas DataFrame. Pipit provides operations to aggregate, filter, and transform the events in a trace to present the data in different ways. We also provide several functions to quickly identify performance issues in parallel executions.
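To make the idea of filtering, transforming, and aggregating trace events in a DataFrame concrete, here is a minimal sketch that uses plain pandas only. It does not use Pipit's API, and the column names (`timestamp_ns`, `event_type`, `name`, `process`), the toy events, and the helper `inclusive_times` are all hypothetical assumptions made for illustration; Pipit's actual schema and functions may differ.

```python
import pandas as pd

# Hypothetical toy trace: one row per event. The column names are
# illustrative assumptions, not Pipit's actual schema.
events = pd.DataFrame(
    {
        "timestamp_ns": [0, 10, 50, 90, 0, 20, 80, 100],
        "event_type": ["Enter", "Enter", "Leave", "Leave"] * 2,
        "name": [
            "main", "MPI_Send", "MPI_Send", "main",
            "main", "MPI_Recv", "MPI_Recv", "main",
        ],
        "process": [0, 0, 0, 0, 1, 1, 1, 1],
    }
)

# Filter: keep only events whose function name starts with "MPI_".
mpi_events = events[events["name"].str.startswith("MPI_")]


def inclusive_times(df: pd.DataFrame) -> pd.DataFrame:
    """Pair each Enter with its Leave and compute the time between them.

    Simplifying assumption: every function is called exactly once per
    process, so (process, name) identifies a unique Enter/Leave pair.
    A real trace would need stack-based matching instead.
    """
    wide = (
        df.set_index(["process", "name", "event_type"])["timestamp_ns"]
        .unstack("event_type")
    )
    wide["inclusive_ns"] = wide["Leave"] - wide["Enter"]
    return wide["inclusive_ns"].reset_index()


# Transform: one row per (process, function) with its inclusive time.
times = inclusive_times(events)

# Aggregate: total inclusive time per function across all processes.
totals = times.groupby("name")["inclusive_ns"].sum()

print(mpi_events)
print(times)
print(totals)
```

Because the result is an ordinary DataFrame, the same steps can be rerun unchanged on a second trace and the two `totals` Series subtracted, which is the kind of automated comparison that is hard to do in a GUI.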


research
07/30/2022

Traveler: Navigating Task Parallel Traces for Performance Analysis

Understanding the behavior of software in execution is a key step in ide...
research
03/18/2021

Tools and Algorithms for SoC Communication Traces

In this paper, we study seven well-known trace analysis techniques both ...
research
02/08/2021

Learning from Shader Program Traces

Deep networks for image processing typically learn from RGB pixels. This...
research
05/27/2019

Detecting Missing Dependencies and Notifiers in Puppet Programs

Puppet is a popular computer system configuration management tool. It pr...
research
05/14/2022

ACT now: Aggregate Comparison of Traces for Incident Localization

Incidents in production systems are common and downtime is expensive. Ap...
research
04/02/2021

Daisen: A Framework for Visualizing Detailed GPU Execution

Graphics Processing Units (GPUs) have been widely used to accelerate art...
research
01/08/2020

Learning to Encode and Classify Test Executions

The challenge of automatically determining the correctness of test execu...
