LASSi: Metric based I/O analytics for HPC

06/10/2019
by Karthee Sivalingam, et al.

LASSi is a tool for analyzing application usage and contention caused by the use of shared resources (filesystem or network) in an HPC system. LASSi was initially developed to support the ARCHER system, where application requirements vary widely and users occasionally complain about filesystem performance, manifested as variation in job runtimes or poor interactive response. LASSi defines derivative risk and ops metrics that relate to unusually high application I/O behaviour. The metrics are shown to correlate with applications that can experience variable performance or that may impact the performance of other applications. LASSi uses I/O statistics over time to provide application I/O profiles and has been automated to generate daily reports for ARCHER. We demonstrate how LASSi provides holistic I/O analysis by monitoring filesystem I/O, generating coarse profiles of filesystems and application runs, and automating analysis of application slowdown using metrics.
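As a rough illustration of what a metric tied to "unusually high application I/O behaviour" could look like, the sketch below scores each job by how far its I/O rate exceeds a multiple of the filesystem-wide average. The formula, the scaling factor `alpha`, the function names, and the sample numbers are all assumptions made for illustration; the abstract does not give LASSi's actual definitions, which are in the full text and may differ.

```python
from statistics import mean


def risk(app_rate: float, fs_avg_rate: float, alpha: float = 2.0) -> float:
    """Hypothetical risk score (not LASSi's published formula).

    Returns the relative excess of a job's I/O rate over
    alpha * filesystem average, floored at zero so that jobs
    below the threshold carry no risk.
    """
    threshold = alpha * fs_avg_rate
    if threshold <= 0:
        return 0.0
    return max(0.0, (app_rate - threshold) / threshold)


def score_jobs(job_rates: dict[str, float], alpha: float = 2.0) -> dict[str, float]:
    """Score every job against the average rate across all jobs."""
    avg = mean(job_rates.values())
    return {job: risk(rate, avg, alpha) for job, rate in job_rates.items()}


# Made-up I/O rates (operations per second) for three jobs.
job_rates = {"job_a": 100.0, "job_b": 120.0, "job_c": 980.0}
scores = score_jobs(job_rates)

for job, score in scores.items():
    print(f"{job}: risk={score:.3f}")
```

With these invented numbers, only `job_c` exceeds twice the average rate and receives a non-zero score. A report built on such a metric could then list the jobs most likely to experience, or cause, slowdown.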


