Cilkmem: Algorithms for Analyzing the Memory High-Water Mark of Fork-Join Parallel Programs

10/27/2019
by   Tim Kaler, et al.

Software engineers designing recursive fork-join programs destined to run on massively parallel computing systems must be cognizant of how their program's memory requirements scale in a many-processor execution. Although tools exist for measuring memory usage during one particular execution of a parallel program, such tools cannot bound the worst-case memory usage over all possible parallel executions. This paper introduces Cilkmem, a tool that analyzes the execution of a deterministic Cilk program to determine its p-processor memory high-water mark (MHWM), which is the worst-case memory usage of the program over all possible p-processor executions. Cilkmem employs two new algorithms for computing the p-processor MHWM. The first algorithm calculates the exact p-processor MHWM in O(T_1 · p) time, where T_1 is the total work of the program. The second algorithm solves, in O(T_1) time, the approximate threshold problem, which asks, for a given memory threshold M, whether the p-processor MHWM exceeds M/2 or whether it is guaranteed to be less than M. Both algorithms are memory efficient, requiring O(p · D) and O(D) space, respectively, where D is the maximum call-stack depth of the program's execution on a single thread. Our empirical studies show that Cilkmem generally exhibits low overheads. Across ten application benchmarks from the Cilkbench suite, the exact algorithm incurs a geometric-mean multiplicative overhead of 1.54 for p=128, whereas the approximate-threshold algorithm incurs an overhead of 1.36 independent of p. In addition, we use Cilkmem to reveal and diagnose a previously unknown issue in a large image-alignment program contributing to unexpectedly high memory usage under parallel executions.
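To make the notion of a p-processor high-water mark concrete, here is a minimal Python sketch on a toy fork-join model. It is not Cilkmem's algorithm and does not follow the paper's formal definitions. The `Leaf`, `Series`, and `Parallel` types, the function names, and the rule that each component is either not started, finished, or in progress on a worker are all assumptions made for illustration. The sketch tracks, for each k up to p, the most memory a component can hold while at most k of its leaves are in progress. It costs O(p^2) per parallel node, not the O(T_1 · p) total the paper reports.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    """A serial strand: memory deltas (+alloc, -free) applied in order."""
    deltas: List[int]


@dataclass
class Series:
    """left runs to completion before right starts."""
    left: "Node"
    right: "Node"


@dataclass
class Parallel:
    """left and right may run concurrently (fork, then join)."""
    left: "Node"
    right: "Node"


Node = Union[Leaf, Series, Parallel]


def analyze(node: Node, p: int) -> Tuple[int, List[int]]:
    """Return (total, peak) for a component of the toy model.

    total   -- net memory held once the component has finished.
    peak[k] -- most memory the component can hold in any state where
               at most k of its leaves are in progress (0 <= k <= p).
    """
    if isinstance(node, Leaf):
        running, best = 0, 0
        for d in node.deltas:
            running += d
            best = max(best, running)
        # With no worker inside, the leaf is either unstarted or finished.
        idle = max(0, running)
        return running, [idle] + [best] * p

    total_l, peak_l = analyze(node.left, p)
    total_r, peak_r = analyze(node.right, p)
    if isinstance(node, Series):
        # Either still inside left, or left is done and we are inside right.
        peak = [max(peak_l[k], total_l + peak_r[k]) for k in range(p + 1)]
    else:
        # Split the k available workers between the two branches.
        peak = [
            max(peak_l[i] + peak_r[k - i] for i in range(k + 1))
            for k in range(p + 1)
        ]
    return total_l + total_r, peak


def toy_mhwm(node: Node, p: int) -> int:
    """Worst-case memory of the toy program with at most p workers."""
    return analyze(node, p)[1][p]


if __name__ == "__main__":
    # Two parallel branches, each allocating 10 units and then freeing them.
    prog = Parallel(Leaf([10, -10]), Leaf([10, -10]))
    for workers in (1, 2):
        print(workers, toy_mhwm(prog, workers))
```

In this toy program, one worker can hold at most 10 units at a time, while two workers can hold 20 if both branches are in flight together. That is the kind of growth with p that a single measured execution can miss.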


