PaSh: Light-touch Data-Parallel Shell Processing

07/18/2020
by Nikos Vasilakis, et al.

This paper presents PaSh, a system for parallelizing POSIX shell scripts. Given a script, PaSh converts it to a dataflow graph, performs a series of semantics-preserving program transformations that expose parallelism, and then converts the dataflow graph back into a script. The resulting script adds POSIX constructs that explicitly guide parallelism, coupled with PaSh-provided Unix-aware runtime primitives that address performance- and correctness-related issues. A lightweight annotation language allows command developers to express key parallelizability properties of their commands. An accompanying parallelizability study of POSIX and GNU commands, two large and commonly used groups, guides the annotation language and the optimized aggregator library that PaSh uses. Finally, PaSh's extensive evaluation over 44 unmodified Unix scripts shows significant speedups (0.89–61.1×, avg: 6.7×) stemming from the combination of its program transformations and runtime primitives.
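To make the idea of data-parallel shell processing concrete, the following is a minimal hand-written sketch of the kind of rewrite the abstract describes. It is not output produced by PaSh and does not use PaSh's annotation language or runtime primitives. The function names (`stage`, `split_in_two`, `run_sequential`, `run_parallel`, `count_parallel`), the fixed two-way split, and the use of temporary files are all assumptions made for illustration. It also relies on `mktemp`, which is widely available but not specified by POSIX.

```shell
#!/bin/sh
# Hand-written illustration only; not a script emitted by PaSh.
# All function names below are hypothetical.

# A per-line pipeline stage: keep lines containing "a", then uppercase them.
# Each output line depends on one input line only, so chunks can be
# processed independently and concatenated in order.
stage() {
    grep 'a' | tr 'a-z' 'A-Z'
}

# Split file $1 into two contiguous halves, $2/in1 and $2/in2.
split_in_two() {
    total=$(wc -l < "$1")
    half=$(( (total + 1) / 2 ))
    head -n "$half" "$1" > "$2/in1"
    tail -n "+$((half + 1))" "$1" > "$2/in2"
}

# Baseline: a single pipeline over the whole input.
run_sequential() {
    stage < "$1"
}

# Data-parallel variant: split, run both halves concurrently with `&`,
# wait for both, then merge with `cat` so the original order is kept.
run_parallel() {
    tmp=$(mktemp -d) || return 1
    split_in_two "$1" "$tmp"
    stage < "$tmp/in1" > "$tmp/out1" &
    stage < "$tmp/in2" > "$tmp/out2" &
    wait
    cat "$tmp/out1" "$tmp/out2"
    rm -rf "$tmp"
}

# `wc -l` cannot be merged by concatenation: its partial results need an
# aggregator, here an awk one-liner that sums the per-chunk counts.
count_parallel() {
    tmp=$(mktemp -d) || return 1
    split_in_two "$1" "$tmp"
    wc -l < "$tmp/in1" > "$tmp/out1" &
    wc -l < "$tmp/in2" > "$tmp/out2" &
    wait
    cat "$tmp/out1" "$tmp/out2" | awk '{ sum += $1 } END { print sum + 0 }'
    rm -rf "$tmp"
}
```

For any input file, `run_parallel file` should print the same lines as `run_sequential file`, and `count_parallel file` should agree with `wc -l < file`. The sketch uses `&` and `wait` as the POSIX constructs that express parallelism, and it shows why some commands need a dedicated aggregator rather than plain concatenation. A real system would also have to handle streaming input, more than two chunks, and commands whose parallelizability depends on their flags.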


