A parallel workload has extreme variability in a production environment

01/11/2018
by R. Henwood et al.

Writing data in parallel is a common operation in some computing environments and a good proxy for a number of other parallel processing patterns. The time taken to write data in large-scale compute environments can vary considerably. This variation comes from a number of sources, both systematic and transient. The result is highly complex behavior that is difficult to characterize. This paper extends the model for parallel task variability proposed in "A parallel workload has extreme variability" (Henwood et al. 2016), namely the Generalized Extreme Value (GEV) distribution. It develops the systematic analysis that leads to the GEV model by adding a traffic congestion term. Observations of a parallel workload are presented from a High Performance Computing environment under typical production conditions, which include traffic congestion. An analysis of the workload shows that the variability tends towards GEV as the order of parallelism is increased. The results are presented in the context of Amdahl's law, and the predictive properties of a GEV model are discussed. An optimization for certain machine designs is also suggested.
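The link between parallelism and extreme-value behavior can be illustrated with a minimal sketch. It is not the paper's analysis: it assumes that a parallel write completes only when its slowest writer finishes, and that individual write times are independent and exponentially distributed with a mean of 1.0 (a hypothetical choice, with no congestion term). Under those assumptions the job duration is the maximum of the per-writer durations, whose limiting distribution is Gumbel, the zero-shape member of the GEV family. The function names and parameters below are illustrative only.

```python
import random
import statistics


def job_duration(n_writers, rng, mean_write=1.0):
    """Duration of one parallel write under the stated assumption:
    the job ends when the slowest of n_writers independent writers ends."""
    # Per-writer durations are drawn from an exponential distribution
    # (an assumption made for illustration, not taken from the paper).
    return max(rng.expovariate(1.0 / mean_write) for _ in range(n_writers))


def sample_job_durations(n_writers, n_jobs, seed=0):
    """Simulate n_jobs parallel writes, each using n_writers writers."""
    rng = random.Random(seed)
    return [job_duration(n_writers, rng) for _ in range(n_jobs)]


def summarize(n_jobs=2000):
    """Return {n_writers: (mean, stdev)} of job duration for several
    orders of parallelism."""
    results = {}
    for n_writers in (1, 16, 256):
        durations = sample_job_durations(n_writers, n_jobs)
        results[n_writers] = (
            statistics.mean(durations),
            statistics.stdev(durations),
        )
    return results


if __name__ == "__main__":
    for n, (mean, sd) in summarize().items():
        print(f"writers={n:4d}  mean={mean:.2f}  stdev={sd:.2f}")
```

In this toy model the mean job duration grows with the number of writers even though each writer's mean stays fixed, which is the kind of tail-dominated cost that motivates modeling parallel task time with an extreme value distribution.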

