PaPaS: A Portable, Lightweight, and Generic Framework for Parallel Parameter Studies

The current landscape of scientific research relies heavily on modeling and simulation, typically involving complexity in the simulation's flow of execution and its parameterization. Execution flows are not necessarily straightforward, since they may require multiple processing tasks and iterations. Furthermore, parameter and performance studies are common approaches used to characterize a simulation, often requiring traversal of a large parameter space. High-performance computers offer practical resources at the expense of users handling the setup, submission, and management of jobs. This work presents the design of PaPaS, a portable, lightweight, and generic workflow framework for conducting parallel parameter and performance studies. Workflows are defined using parameter files based on a keyword-value pair syntax, thus relieving the user of the overhead of creating complex scripts to manage the workflow. A parameter set consists of any combination of environment variables, files, partial file contents, and command line arguments. PaPaS is being developed in Python 3 with support for distributed parallelization using SSH, batch systems, and C++ MPI. The PaPaS framework runs as user processes and can be used on single-node, multi-node, and multi-tenant computing systems. An example simulation using the BehaviorSpace tool from NetLogo and a matrix multiply using OpenMP are presented as parameter and performance studies, respectively. The results demonstrate that the PaPaS framework offers a simple method for defining and managing parameter studies, while increasing resource utilization.

1. Introduction

Computational approaches, such as modeling and simulation, are widely used to find information and patterns otherwise not readily available. These applications are often complex due to the sheer size of the parameter space and long run times (Walker and Guiang, 2007). Moreover, applications may combine basic workflow structures (e.g., process, pipeline, data distribution, data aggregation, and data redistribution (Bharathi et al., 2008)). As a consequence, testing, monitoring, and validating workflows are not trivial, because parameters may come from disparate sources (e.g., command line arguments, environment variables, files, or a combination of these).

High-performance clusters and grid systems are practical for performing parameter studies due to their large collection of processors and storage resources (Król et al., 2016; Deelman et al., [n. d.]; Tilson et al., 2008), although local computers can also be used thanks to advances in graphics processors and other accelerators (Ino et al., 2014). The setup, submission, and orchestration of such jobs on computing clusters can be a challenge, particularly for non-programmers or novice users conducting parameter studies in a parallel or distributed fashion (DeVivo et al., 2001; Prodan and Fahringer, 2004). Previous work has evaluated static and dynamic scheduling algorithms for managing workflow structures efficiently in cluster and grid systems (Yu et al., 2008; Smanchat et al., 2009; Buyya et al., 2002). Clearly, scientific research has benefited from systematic parameter studies as a means to find an optimal or reasonable set of parameters (Sedlmair et al., 2014; Fey et al., 2004; Tan et al., 2005).

This work presents the ongoing effort of designing PaPaS, an easy-to-use Python 3 framework for describing and executing parameter and performance studies on local and cluster computers. PaPaS serves as a lightweight workflow manager configured via a simple keyword-value workflow language. Section 2 presents examples of existing tools for parameter studies. A representative scenario of job execution on cluster systems motivates the work in Section 3, which also lists the contributions of PaPaS. Section 4 gives a general overview of the PaPaS architectural design, followed by a description of its workflow language in Section 5. As case studies, a parameter study of a multi-agent NetLogo model (Wilensky and Rand, 2015) is presented in Section 6, and a performance study of a matrix multiply using the OpenMP threading library is shown in Section 7. The remainder of the manuscript summarizes the work and describes several enhancements that will allow PaPaS to interact well with existing workflow management tools.

2. Related Work

There are numerous tools and frameworks available for running parameter studies on cluster and grid-enabled systems. Commonly, these workflow management systems need to be installed by administrators as system-wide software and include web interfaces for user interaction (Ries and Schröder, 2010; Wolstencroft et al., 2013; Volkov and Sukhoroslov, 2015; Smirnov et al., 2016). Also, some of these tools have a large number of modules interacting with one another, which makes them less suitable for novice computer users or simple jobs. It is worth noting that workflow management systems have been studied since the advent of the Internet and are widely used. Several mature projects are available, for example, Taverna (Missier et al., 2010), Pegasus (Deelman et al., 2015), and Nimrod/K (Abramson et al., 2008).

In the area of parameter studies, OACIS (Murase et al., 2017) is a management framework for exploring parameter spaces. It provides a web interface for submitting and monitoring jobs sent to remote systems. A limitation is that the simulation can only take inputs from the command line or a file. OpenMOLE (Reuillon et al., 2013) is a framework that supports distributed NetLogo (Wilensky, 2008) runs, at the expense of requiring the user to write a configuration file in a domain-specific language. This configuration file includes parameters and tasks, and controls the distribution of jobs. A simpler workflow application, Snakemake, runs as a single user's program and is written in Python (Koster and Rahmann, 2012). Snakemake is based on GNU Make syntax and infers task dependencies from file dependencies.

3. PaPaS Motivation

High-performance clusters are expected to maintain high utilization in order to demonstrate a positive return on investment (Fulton et al., 2017; Simakov et al., 2015), since these are costly systems to build and maintain. Software monitoring tools, e.g., XDMoD (Furlani et al., 2013; Palmer et al., 2015), are valuable for gathering vast amounts of performance metrics used to improve large-scale multi-user systems. The execution behavior of jobs is affected by the submission order, scheduling heuristics (Zaharia et al., 2010), and utilization rate. Figure 1 presents cases representative of the start and stop times of 25 jobs. For every task the scheduler has to handle the start and stop actions; this overhead can be reduced if multiple user jobs are batched together into a single cluster job. Parameter studies require the execution of many application instances, which is a combinatorial optimization problem (Blum and Roli, 2003).

Figure 1. Representation of the execution behavior of 25 jobs running on a managed multi-user cluster under different forms of submission, scheduling, and cluster activity. For each submission form, all jobs are submitted simultaneously. The optimal scenario corresponds to submitting 25 jobs to a cluster with at least 25 available compute nodes; every job starts and ends at the same time. The serial case occurs when the scheduler decides to run one job at a time, without delays between the end and start of consecutive tasks. If cluster activity is high or the scheduler is not sufficiently fair, consecutive tasks will experience varying delays between them, and the common scenario takes place.

The motivation of this work is to provide a simple methodology for performing parameter studies for general classes of applications, while improving a system's utilization and reducing overall completion time. PaPaS is a versatile framework for describing parameter and performance studies using flexible configuration files. Section 5.1 describes the combinatorial approach used in PaPaS to enumerate all possible unique workflow instances. The primary contributions of the PaPaS framework are:

  • Deploying a user-space tool for expressing workflows targeted at parameter and performance studies, with no administrator or system-wide installations

  • Expressiveness of parameter study workflows using common text formats (i.e., YAML, JSON, INI), sparing users from writing complex scripts

  • Combinations of parameters can be a mix of command line arguments, environment variables, files, and simple regular expressions for file contents, and

  • Support for batching user jobs as a single cluster job using the MPI library

Another motivation for the PaPaS framework is its applicability to evaluating machine learning and natural language processing algorithms (Mayer et al., 2018). Due to the large number of tunable hyperparameters (Witt and Seifert, 2017) and the breadth of machine learning toolkits available, both parameter and performance studies are labor-intensive. The PaPaS framework can provide immediate benefits in such scenarios.

4. PaPaS Framework Design

Previous works have shown that parameter studies require several operations for effectively managing the workflows: value propagation via dependency graphs, orchestration, I/O management, monitoring, provenance, visualization, on-line feedback, and others (Shi and Dongarra, 2006; DeVivo et al., 2001; Walker and Guiang, 2007). The PaPaS framework is a collection of modular systems, each with unique functionality and independent interfaces. Figure 2 presents the overall architectural design of PaPaS. The primary system components are the parameter study, workflow, cluster, and visualization engines.

4.1. Parameter Study Engine

A parameter study represents a set of workflows to be executed, where a workflow corresponds to an instance having a unique parameter combination. Users write a parameter file using a keyword-based workflow description language, which is described in Section 5. A workflow's description can be divided across multiple parameter files; this allows composition and re-usability of task configurations. Parameter files follow either YAML, JSON, or INI-like data serialization formats with minor constraints. The processing of these files consists of a parsing and syntax validation step, followed by string interpolation for parameters that were specified with multiple values. The operation of interpolation identifies all the possible unique parameter combinations and forwards this information to a workflow generator which in turn spawns a workflow engine instance per combination. Parameter study configurations are stored in a file database as part of the monitoring activity. PaPaS provides checkpoint-restart functionality in case of fault or a deliberate pause/stop operation. A parameter study's state can be saved in a workflow file and reloaded at a later time. Another method of defining a parameter study is through the workflow generator's Python 3 interface. This mechanism adds the hooks to embed PaPaS as a task of a larger user-defined workflow.
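
As a rough illustration of this embedding mechanism, the following sketch defines a small study programmatically; the module, class, and method names (papas, ParameterStudy, add_task, run) are hypothetical placeholders and do not reflect the actual PaPaS interface.

    # Hypothetical sketch of a programmatic parameter study definition; all
    # names below are illustrative placeholders, not the actual PaPaS API.
    import papas  # hypothetical module name

    study = papas.ParameterStudy(name="example-study")
    study.add_task(
        name="simulate",
        command="mysim --size ${size}",
        environ={"OMP_NUM_THREADS": [1, 2, 4]},
        size=[256, 512, 1024],
    )
    study.run(parallel="ssh")  # 3 thread counts x 3 sizes = 9 workflow instances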

4.2. Workflow Engine

Workflow engines are a core component as they orchestrate the execution of workflow instances. The task generator takes a workflow description and constructs a directed acyclic graph (DAG) where nodes correspond to indivisible tasks. A task manager controls the scheduling and monitoring of tasks. PaPaS runs easily on a local laptop or workstation. For cluster systems, workflow tasks are delegated to the cluster engine component. Several factors affect scheduling heuristics such as task dependencies, availability and capability of computing resources, and the application(s) behavior. A task profiler measures each task’s runtime, but currently this only serves as performance feedback to the user. Workflow engine actions, task/workflow statistics, and logs are stored in a per-workflow file storage database; this information is later used to include provenance details at either workflow completion or a checkpoint. A visualization engine enables access to a view of the workflow’s DAG. The workflow engine communicates the progression of states to the visualization engine.
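
As a minimal sketch (not the actual PaPaS task generator), the DAG construction and dependency-ordered execution described above could look as follows in Python 3, assuming each task carries the after list defined in Section 5:

    # Build a DAG from each task's "after" list and emit a valid execution order.
    # Minimal sketch only; PaPaS additionally schedules, monitors, and profiles tasks.
    from graphlib import TopologicalSorter  # Python 3.9+

    tasks = {
        "prepare":  {"command": "gen_input.sh",    "after": []},
        "simulate": {"command": "mysim input.dat", "after": ["prepare"]},
        "report":   {"command": "summarize.py",    "after": ["simulate"]},
    }

    dag = TopologicalSorter({name: set(spec["after"]) for name, spec in tasks.items()})
    for name in dag.static_order():   # a task becomes ready once its parents finish
        print("run:", tasks[name]["command"])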

4.3. Cluster Engine

The cluster engine is a component that serves as an interface to both managed and unmanaged computer clusters. A managed cluster is assumed to be used concurrently by multiple users and makes use of a batch system (e.g., PBS, SGE), while an unmanaged cluster is mostly single-user and has an SSH setup. For managed clusters, the common approach is to submit a single task per batch job. Single-task submissions are mainly applicable to applications that achieve high utilization of computing resources or have long execution times, where adding concurrent task executions hinders performance. For single-node and single-core applications, submitting a large number of jobs to a multi-tenant system may not necessarily be the best approach. The PaPaS workflow and cluster engines enable grouping intra/inter-workflow tasks into a single batch job. The main mechanism for grouping tasks into single jobs is a C++ MPI task dispatcher. In some cases, task grouping increases the cluster's utilization efficiency, reduces batch/scheduling operations, and improves the turnaround time of jobs. Section 6 presents a case study portraying these effects.
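
The actual dispatcher is written in C++ with MPI; the mpi4py sketch below only illustrates the general master-worker pattern of batching many independent tasks into one cluster job and is not the PaPaS code.

    # Master-worker task dispatcher sketch (mpi4py): rank 0 hands out shell
    # commands, the remaining ranks execute them until no work is left.
    import subprocess
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    tasks = [f"./matmul {n} out_{n}.dat" for n in (256, 512, 1024, 2048)]

    if rank == 0:
        next_task, done = 0, 0
        while done < size - 1:
            worker = comm.recv(source=MPI.ANY_SOURCE)    # a worker asks for work
            if next_task < len(tasks):
                comm.send(tasks[next_task], dest=worker)
                next_task += 1
            else:
                comm.send(None, dest=worker)             # tell the worker to stop
                done += 1
    else:
        while True:
            comm.send(rank, dest=0)                      # request a task
            cmd = comm.recv(source=0)
            if cmd is None:
                break
            subprocess.run(cmd, shell=True, check=False)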

4.4. Visualization Engine

The DAGs generated by the workflow engine are used to construct visual graphs of the overall workflow as well as the current state of the processing. PaPaS utilizes a wrapper over PyGraphviz (Hagberg et al., [n. d.]) to build and update graphs on-demand. A workflow visualization can be viewed and exported in text or common image formats. This capability can also be enabled as a validation method of the parameter study configuration prior to any execution taking place.
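
For instance, a minimal use of PyGraphviz along the following lines (a sketch, not the PaPaS wrapper itself) renders a small task DAG to text or to an image file:

    # Render a small task DAG with PyGraphviz; sketch only.
    import pygraphviz as pgv

    g = pgv.AGraph(directed=True)
    g.add_edge("prepare", "simulate")   # simulate depends on prepare
    g.add_edge("simulate", "report")
    g.node_attr["shape"] = "box"

    print(g.string())                   # DOT text representation
    g.layout(prog="dot")
    g.draw("workflow.png")              # common image formats are also supported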

Figure 2. The PaPaS architecture consists of four principal and modular engines: (1) parameter study, (2) workflow, (3) cluster, and (4) visualization. The user interacts with the parameter study engine using parameter files or the Python 3 API. A workflow engine manages the execution of workflows as well as the gathering of profiling and provenance information. The visualization engine serves as a visual aid for validating a parameter study and for visual monitoring.

5. PaPaS Workflow Description Language

This section describes the workflow description language (WDL) specification used by the PaPaS framework. The PaPaS WDL consists of a set of keywords that can describe individual tasks, task dependencies, parameter sets, and general configurations. This is in contrast to the common description methods for workflows and parameter studies: parametric modeling languages (Abramson et al., 2000), DAG languages (Farkas and Kacsuk, 2011), XML (Deelman et al., 2015), task data flow (Wozniak et al., 2013), declarative languages (Reuillon et al., 2010), libraries extending existing programming languages (Bergstra et al., 2013), template systems (Lorca et al., 2011; Casanova et al., 2000), graphical languages (Yarrow et al., 2000), UML diagrams (Dumas and Ter Hofstede, 2001), GNU Make-based systems (Koster and Rahmann, 2012), test systems (Yu et al., 2013), and others (Van Der Aalst and Ter Hofstede, 2005). An advantage of using a keyword-value WDL is that it can impose stricter constraints to reduce the complex and convoluted expressions allowed in other WDLs, as a driving philosophy of PaPaS is to be simple and accessible to non-programmers.

PaPaS’s WDL is based on a mix of lists and associative structures. As a consequence, it is serializable and can be converted to common human-readable formats such as YAML, JSON, and INI. Workflow descriptions are transformed into a common internal format. The following is the general specification of rules for configuring parameter studies using YAML format.

  • A parameter study consists of tasks (or sections), each identified by a task (or section) name as the only key, followed by up to two levels of keyword-value entries. That is, the first-level values can themselves be keyword-value entries.

  • The delimiter for keyword-value entries is the colon character.

  • Indentation, using tabs or whitespace, is used to associate a value with a particular keyword.

  • A single-line comment is a line that starts with a pound or hash symbol (#).

  • A keyword can be specified using any alphanumeric characters.

  • All keywords are parsed as strings, and value types are inferred from their written format.

  • Keywords that are not predefined are considered user-defined keywords and can be used in value interpolations.

  • Ranges with a step size are supported for numerical values using the notation start:step:end.

  • A task is identified by the command keyword.

  • Value interpolation uses a flat associative array syntax.

  • Intra-task interpolation using ${} syntax is allowed using values from both entry levels (e.g., ${keyword} and ${keyword:value}).

  • Inter-task interpolation using ${} syntax is allowed using values from both entry levels (e.g., ${task:keyword} and ${task:keyword:value}).

The list below presents common keywords of the PaPaS WDL; an illustrative configuration sketch follows the list:

  • command – string representing the command line to run

  • name – string describing the task

  • environ – dictionary of environment variables where keywords are the actual names of the environment variables.

  • after – list of task dependencies (prerequisites)

  • infiles – dictionary of input files, keywords are arbitrary

  • outfiles – dictionary of output files, keywords are arbitrary

  • substitute – used for interpolation of partial file contents. Expects a keyword/value pair where keyword is a Python 3 regular expression and value is a list of strings to be used instead.

  • parallel – mode to use for parallelism (e.g., SSH, MPI)

  • batch – batch system of cluster (e.g., PBS)

  • nnodes – number of nodes to use for a cluster job

  • ppnode – number of task processes to run per node

  • hosts – hostnames or IP addresses of compute nodes

  • fixed – list of parameters to be fixed. All of these parameters need to have the same number of values to allow ordered one-to-one mappings.

  • sampling – samples a subset of the parameter space based on a given distribution (uniform, random).
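
As an illustration of these keywords, a small parameter file in YAML might look as follows; the task names, commands, and paths are hypothetical and only meant to exercise the syntax described above:

    # Hypothetical PaPaS parameter file (YAML); 3 sizes x 4 thread counts = 12 instances.
    simulate:
      command: mysim --size ${size} --out ${outfiles:result}
      environ:
        OMP_NUM_THREADS: 1:1:4          # range notation start:step:end
      size: [256, 512, 1024]            # user-defined keyword used in interpolation
      outfiles:
        result: result_${size}.dat

    report:
      command: summarize.py ${simulate:outfiles:result}
      after: [simulate]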

5.1. Parameter combinatorial approach

A key aspect of the PaPaS framework is its approach for expressing parameter combinations easily while remaining general enough for most parameter and performance studies. Every parameter and its values are implicitly used to generate the Cartesian product of parametric combinations. Each unique combination of parameters represents a workflow to be executed.

Parameters have a unique name and are allowed to be multi-valued. Consider a set of parameters $P = \{p_1, p_2, \ldots, p_n\}$, where $p_i$ is a parameter with $m_i$ possible values and $v_{i,j}$ corresponds to the $j$-th value of $p_i$. A total of $\prod_{i=1}^{n} m_i$ workflows are generated automatically by the PaPaS workflow engine. Then, this workflow set, $W$, is defined as

$$W = p_1 \times p_2 \times \cdots \times p_n = \{ (v_{1,j_1}, v_{2,j_2}, \ldots, v_{n,j_n}) \mid 1 \le j_i \le m_i \}.$$

A PaPaS workflow is an instance of $W$. Programmatically, this can be implemented as nested loop structures. In some cases, the Cartesian product of parameters is either not desired or too large to run all workflow combinations in a reasonable amount of time. PaPaS utilizes the keywords fixed and sampling to control the combinatorial set of workflows, $W$. Parameters listed in the fixed set need to have the same number of values to allow one-to-one mappings between each other. Workflows are generated from the Cartesian product of the parameters not listed as fixed, combined with a single set of values made from the ordered values of the fixed parameters. Multiple fixed statements are allowed in a PaPaS parameter file; this further generalizes combinations and can be used to specify constant single-valued parameters. Programmatically, this can be implemented by moving all the fixed parameters into the outermost loop structures (grouped by same fixed clauses). For example, consider the total number of workflows generated for $n$ parameters with $p_1$ and $p_2$ listed in the same fixed clause (so $m_1 = m_2$). Then,

$$|W| = m_1 \cdot |W'| = m_1 \prod_{i=3}^{n} m_i,$$

where $W' = p_3 \times \cdots \times p_n$ represents an incomplete subset of workflows ($p_1$ and $p_2$ are missing).
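
A minimal Python sketch of this expansion (not the actual PaPaS code), assuming parameters are given as a dictionary of value lists and fixed parameters are zipped one-to-one, is:

    # Fixed parameters map one-to-one; the remaining parameters form a Cartesian product.
    from itertools import product

    def expand(params, fixed=()):
        fixed = list(fixed)
        free = [k for k in params if k not in fixed]
        # One value tuple per position of the fixed group; a single empty tuple otherwise.
        fixed_sets = list(zip(*(params[k] for k in fixed))) if fixed else [()]
        for fvals in fixed_sets:
            for cvals in product(*(params[k] for k in free)):
                yield dict(zip(fixed + free, fvals + cvals))

    params = {"size": [16, 32], "threads": [1, 2], "mode": ["a", "b"]}
    print(len(list(expand(params))))                              # 2*2*2 = 8 workflows
    print(len(list(expand(params, fixed=["size", "threads"]))))   # 2*2   = 4 workflows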

6. Parameter sweep: Distributed NetLogo model

In this example, we used UTK's Advanced Computing Facility (ACF) cluster to implement a parameter sweep of a NetLogo behavioral model. A Lustre parallel file system and the PBS batch system are available. The model simulates the transmission of Clostridium difficile in a healthcare setting and explicitly incorporates healthcare workers as vectors of transmission, while tracking individual patient antibiotic histories and the contamination levels of ward rooms due to C. difficile. We used PaPaS to deploy multiple instances of NetLogo by varying some XML elements of the original input file. Input files that were identical across workflow instances were placed in an NFS directory, so only a single copy of each was needed. Figures 3 and 4 show the scheduling and runtime results of 25 models with a varying number of compute nodes per job (N) and number of MPI processes per job (P). The best results correspond to the schemes that grouped jobs concurrently across multiple nodes (i.e., 2N-1P and 2N-2P), since both the overall completion time and the number of scheduler interactions were lowest. On the other hand, the worst scheme resulted from submitting jobs independently and letting the cluster scheduler manage all the jobs.

Figure 3. Initial execution behavior of 25 NetLogo simulations using different grouping schemes in terms of compute nodes (N) and number of MPI processes per node (P). Time begins as soon as a job starts execution. Note that the scheduler start times have the greatest variability.
Figure 4. Final execution behavior of the NetLogo simulations from Figure 3. Each simulation's total execution time was approximately 30 minutes, and the cluster's utilization was always above 70%. The PaPaS technique of grouping jobs on MPI-supported clusters is closer to the optimal case (see Figure 1), while the scheduler operates in the normal regime.
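
To illustrate how such XML variation can be expressed, a task entry along the following lines could use the substitute keyword from Section 5; the model name, command line flags, regular expression, and replacement values are hypothetical and do not reproduce the study's actual configuration:

    # Hypothetical sketch: vary one XML element of a NetLogo setup file per instance.
    netlogo:
      command: netlogo-headless.sh --model cdiff.nlogo --setup-file ${infiles:setup}
      infiles:
        setup: experiment.xml
      substitute:
        # the keyword is a Python 3 regular expression; each value yields one instance
        '<value value="[0-9.]+"/>':
          - '<value value="0.1"/>'
          - '<value value="0.2"/>'
          - '<value value="0.4"/>'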

7. Performance study: Local matrix multiply application using OpenMP

Software developers often need to profile different algorithms, environment configurations, and parameters to make decisions in terms of algorithms, hardware, etc. One such example of a performance study is performing weak and strong scaling studies at the processor level. OpenMP is a common library for enabling multiple threads in compute-intensive code regions. Scaling studies run programs while varying the number of OpenMP threads and the input size. It is common to control the threading configuration via OpenMP environment variables. PaPaS is suitable for such scenarios as it allows such experiments to be expressed fairly concisely. For example, consider an OpenMP-based matrix multiply application called matmul that multiplies a pair of randomly generated square matrices and takes two positional command line options: (1) the matrix size and (2) the file for the resulting matrix. Let us configure a PaPaS scaling study that runs matmul for input sizes from 16 to 16384 in multiples of 2, while varying the number of OpenMP threads from 1 to 8 in steps of 1. This study corresponds to 88 independent executions of matmul. Since PaPaS measures the runtime of each task, the application is not required to have an internal timer (unless higher precision is needed). Additional profiling statistics are up to the user's applications. Figure 5 shows a PaPaS parameter file adhering to the specifications of the scaling study. Figure 6 shows all the workflow instances generated for the matmul application.

Figure 5. Example of a PaPaS workflow configuration using YAML for an OpenMP-based matrix multiply application. The study performs tasks for both weak and strong scaling. Matrix size is varied by doubling and number of threads is varied in steps of 1. PaPaS keywords are shown in boldface.
Figure 6. Set of workflow instances generated by PaPaS matmul parameter file from Figure 5.
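
A quick standalone check of the study size described above (not part of PaPaS) confirms the 88 runs:

    # Matrix sizes double from 16 to 16384 (11 values); threads run from 1 to 8 (8 values).
    sizes = [16 * 2**k for k in range(11)]   # 16, 32, ..., 16384
    threads = list(range(1, 9))              # 1, 2, ..., 8
    print(len(sizes) * len(threads))         # -> 88 independent matmul executions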

8. Conclusions

Parameter sweep and benchmarking applications are common scenarios in which scientists seek to identify suitable sets of input parameters and to characterize application execution. This work presents PaPaS, a generic Python 3 framework for creating parameter studies on local, distributed (MPI or SSH), and shared (PBS) computer systems. A study can consist of a single task, multiple independent tasks, or multiple dependent tasks (i.e., workflows). A parameter study can be described using one or multiple simple, but powerful, parameter files, thus allowing greater flexibility than most existing tools. Moreover, PaPaS supports parameters consisting of environment variables, command line options, whole files, partial file contents, or a combination of these. PaPaS provides mechanisms to express ranges, Cartesian products, bijections, and constant parameters, thus enabling a wide range of possibilities for the user. Each unique combination of parameters triggers a workflow instance which is executed independently of other workflow instances. PaPaS orchestrates the execution of workflow instances, measures runtimes, and provides the user with provenance information. By providing these capabilities, PaPaS promises to enhance computational and data science productivity.

9. Future Work

The PaPaS framework provides exciting support for computational and data science users to achieve higher productivity. Despite its capabilities, numerous extensions to PaPaS are under consideration to provide even more usability, flexibility, and productivity. Future efforts will integrate PaPaS workflows into grid workflow systems, such as Taverna and Pegasus, to readily extend the potential PaPaS user community. One potential approach is to allow the exchange of PaPaS task description files with Pegasus and similar actively developed workflow management systems. A PaPaS task's internal representation can be converted to define a Pegasus workflow via the Pegasus Python libraries for writing directed acyclic graphs in XML (DAX). In this scheme, PaPaS would serve as a front-end tool for defining parameter studies while leveraging the wide array of features provided by the Pegasus framework.

Currently, the PaPaS design does not support nor provide a mechanism to express automatic aggregation of files, even if tasks use the same names for output files. Some difficulties that arise with automatic aggregation of files are content ordering and correctly attributing content to tasks (replicated file names). In order to support automatic aggregation, additional keywords will need to be added to the PaPaS workflow language.

An additional feature to aid in workflow creation is to use a graphical interface from which the user can define parameter studies. This extension can be designed with capabilities to create, modify, and/or remove tasks from workflows, as well as for viewing workflow graphs.

Although there are tools that support inline Python code as the commands to be executed (Koster and Rahmann, 2012), this ability is deliberately excluded from PaPaS, as workflow configuration files are limited by design.

The PaPaS framework will be extended to support tools for measuring application performance, in addition to the current runtime measurements. One popular example of such a tool is PAPI (Terpstra et al., 2010). The current design only measures the runtime of each parameter study workflow, workflow instance, and task. More detailed profiling metrics could be useful for: (1) providing the user with additional profiling information, mainly for benchmarking studies, and (2) providing feedback for improving workflow planning and scheduling decisions.

There is still work to be done in managing and scheduling parameter workflows. For example, consider a parameter workflow containing tasks with the same parameters and tasks with multi-valued parameters. In such a case, the user may wish to dictate whether the set of workflows follows a depth-first or breadth-first execution.

These kinds of additional features could significantly broaden the usefulness and resultant productivity improvements provided by PaPaS.

Acknowledgements.
This research is based upon work supported in part by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725, a joint program with the U.S. Department of Veterans Affairs under the Million Veteran Project Computational Health Analytics for Medical Precision to Improve Outcomes Now (MVP-CHAMPION), and by the joint DMS/NIGMS Mathematical Biology Program through NIH award No. R01GM113239.

References

  • Abramson et al. (2008) David Abramson, Colin Enticott, and Ilkay Altinas. 2008. Nimrod/K: Towards massively parallel dynamic grid workflows. In Proceedings of the 2008 ACM/IEEE Conference on Supercomputing. IEEE Press, 24.
  • Abramson et al. (2000) David Abramson, Jonathan Giddy, and Lew Kotler. 2000. High performance parametric modeling with Nimrod/G: Killer application for the global grid?. In Parallel and Distributed Processing Symposium, 2000. IPDPS 2000. Proceedings. 14th International. IEEE, 520–528.
  • Bergstra et al. (2013) James Bergstra, Dan Yamins, and David D Cox. 2013. Hyperopt: A Python library for optimizing the hyperparameters of machine learning algorithms. In Proceedings of the 12th Python in Science Conference. 13–20.
  • Bharathi et al. (2008) Shishir Bharathi, Ann Chervenak, Ewa Deelman, Gaurang Mehta, Mei-Hui Su, and Karan Vahi. 2008. Characterization of scientific workflows. In Workflows in Support of Large-Scale Science, 2008. WORKS 2008. 3rd Workshop on. IEEE, 1–10.
  • Blum and Roli (2003) Christian Blum and Andrea Roli. 2003. Metaheuristics in combinatorial optimization: Overview and conceptual comparison. ACM Computing Surveys (CSUR) 35, 3 (2003), 268–308.
  • Buyya et al. (2002) Rajkumar Buyya, David Abramson, Jonathan Giddy, and Heinz Stockinger. 2002. Economic models for resource management and scheduling in grid computing. Concurrency and Computation: Practice and Experience 14, 13-15 (2002), 1507–1542.
  • Casanova et al. (2000) Henri Casanova, Francine Berman, Graziano Obertelli, and Richard Wolski. 2000. The AppLeS parameter sweep template: User-level middleware for the grid. In Supercomputing, ACM/IEEE 2000 Conference. IEEE, 60–60.
  • Deelman et al. ([n. d.]) Ewa Deelman, Tom Peterka, Ilkay Altintas, Christopher D Carothers, Kerstin Kleese van Dam, Kenneth Moreland, Manish Parashar, Lavanya Ramakrishnan, Michela Taufer, and Jeffrey Vetter. [n. d.]. The future of scientific workflows. The International Journal of High Performance Computing Applications ([n. d.]), 1094342017704893.
  • Deelman et al. (2015) Ewa Deelman, Karan Vahi, Gideon Juve, Mats Rynge, Scott Callaghan, Philip J Maechling, Rajiv Mayani, Weiwei Chen, Rafael Ferreira da Silva, Miron Livny, and Kent Wenger. 2015. Pegasus: A workflow management system for science automation. Future Generation Computer Systems 46 (2015), 17–35. https://doi.org/10.1016/j.future.2014.10.008 Funding Acknowledgements: NSF ACI SDCI 0722019, NSF ACI SI2-SSI 1148515 and NSF OCI-1053575.
  • DeVivo et al. (2001) Adrian DeVivo, Maurice Yarrow, Karen M McCann, and Bryan Biegel. 2001. A comparison of parameter study creation and job submission tools. (2001).
  • Dumas and Ter Hofstede (2001) Marlon Dumas and Arthur HM Ter Hofstede. 2001. UML activity diagrams as a workflow specification language. In International Conference on the Unified Modeling Language. Springer, 76–90.
  • Farkas and Kacsuk (2011) Zoltan Farkas and Peter Kacsuk. 2011. P-GRADE portal: A generic workflow system to support user communities. Future Generation Computer Systems 27, 5 (2011), 454–465.
  • Fey et al. (2004) Dietmar Fey, Marcus Komann, and Christian Kauhaus. 2004. A framework for optimizing parameter studies on a cluster computer by the example of micro-system design. In European Parallel Virtual Machine/Message Passing Interface Users’ Group Meeting. Springer, 436–441.
  • Fulton et al. (2017) Ben Fulton, Steven Gallo, Robert Henschel, Tom Yearke, Katy Börner, Robert L DeLeon, Thomas Furlani, Craig A Stewart, and Matt Link. 2017. XDMoD value analytics: A tool for measuring the financial and intellectual ROI of your campus cyberinfrastructure facilities. In Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact. ACM, 49.
  • Furlani et al. (2013) Thomas R Furlani, Barry L Schneider, Matthew D Jones, John Towns, David L Hart, Steven M Gallo, Robert L DeLeon, Charng-Da Lu, Amin Ghadersohi, Ryan J Gentner, et al. 2013. Using XDMoD to facilitate XSEDE operations, planning and analysis. In Proceedings of the Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery. ACM, 46.
  • Hagberg et al. ([n. d.]) AA Hagberg, DA Schult, and M Renieris. [n. d.]. PyGraphviz–A Python interface to the Graphviz graph layout and visualization package.
  • Ino et al. (2014) Fumihiko Ino, Kentaro Shigeoka, Tomohiro Okuyama, Masaya Motokubota, and Kenichi Hagihara. 2014. A parallel scheme for accelerating parameter sweep applications on a GPU. Concurrency and Computation: Practice and Experience 26 (2014), 516–531.
  • Koster and Rahmann (2012) Johannes Koster and Sven Rahmann. 2012. Snakemake: A scalable bioinformatics workflow engine. Bioinformatics 28, 19 (2012), 2520–2522.
  • Król et al. (2016) Dariusz Król, Renata Słota, and Jacek Kitowski. 2016. Parameter studies on heterogeneous computing infrastructures with the Scalarm platform. In High Performance Computing and Simulation (HPCS), 2016 International Conference on. IEEE, 9–17.
  • Lorca et al. (2011) Alejandro Lorca, Eduardo Huedo, and Ignacio M Llorente. 2011. The Grid [Way] Job Template Manager, a tool for parameter sweeping. Computer Physics Communications 4, 182 (2011), 1047–1060.
  • Mayer et al. (2018) Benjamin Mayer, Josh Arnold, Edmon Begoli, Everett Rush, Michael Drewry, Kris Brown, Eduardo Ponce, and Sudarshan Srinivasan. 2018. Evaluating text analytic frameworks for mental health surveillance. In Proceedings of the 1st Workshop on Emerging Data Engineering Methods and Approaches for Precision Medicine (DEPM’18). IEEE Xplore. in press.
  • Missier et al. (2010) Paolo Missier, Stian Soiland-Reyes, Stuart Owen, Wei Tan, Alexandra Nenadic, Ian Dunlop, Alan Williams, Tom Oinn, and Carole Goble. 2010. Taverna, reloaded. In International Conference on Scientific and Statistical Database Management. Springer, 471–481.
  • Murase et al. (2017) Y Murase, T Uchitane, and N Ito. 2017. An open-source job management framework for parameter-space exploration: OACIS. In Journal of Physics: Conference Series, Vol. 921. IOP Publishing, 012001.
  • Palmer et al. (2015) Jeffrey T Palmer, Steven M Gallo, Thomas R Furlani, Matthew D Jones, Robert L DeLeon, Joseph P White, Nikolay Simakov, Abani K Patra, Jeanette Sperhac, Thomas Yearke, et al. 2015. Open XDMoD: A tool for the comprehensive management of high-performance computing resources. Computing in Science & Engineering 17, 4 (2015), 52–62.
  • Prodan and Fahringer (2004) Radu Prodan and Thomas Fahringer. 2004. ZENTURIO: A grid middleware-based tool for experiment management of parallel and distributed applications. J. Parallel and Distrib. Comput. 64, 6 (2004), 693–707.
  • Reuillon et al. (2010) Romain Reuillon, Florent Chuffart, Mathieu Leclaire, Thierry Faure, Nicolas Dumoulin, and David Hill. 2010. Declarative task delegation in OpenMOLE. In High Performance Computing and Simulation (HPCS), 2010 International Conference on. IEEE, 55–62.
  • Reuillon et al. (2013) Romain Reuillon, Mathieu Leclaire, and Sebastien Rey-Coyrehourcq. 2013. OpenMOLE, a workflow engine specifically tailored for the distributed exploration of simulation models. Future Generation Computer Systems 29, 8 (2013), 1981–1990.
  • Ries and Schröder (2010) Christian Benjamin Ries and Christian Schröder. 2010. ComsolGrid–A framework for performing large-scale parameter studies using COMSOL Multiphysics and the Berkeley Open Infrastructure for Network Computing (BOINC). Applied Sciences (2010), 8.
  • Sedlmair et al. (2014) Michael Sedlmair, Christoph Heinzl, Stefan Bruckner, Harald Piringer, and Torsten Möller. 2014. Visual parameter space analysis: A conceptual framework. IEEE Transactions on Visualization and Computer Graphics 20, 12 (2014), 2161–2170.
  • Shi and Dongarra (2006) Zhiao Shi and Jack J Dongarra. 2006. Scheduling workflow applications on processors with different capabilities. Future Generation Computer Systems 22, 6 (2006), 665–675.
  • Simakov et al. (2015) Nikolay A Simakov, Joseph P White, Robert L DeLeon, Amin Ghadersohi, Thomas R Furlani, Matthew D Jones, Steven M Gallo, and Abani K Patra. 2015. Application kernels: HPC resources performance monitoring and variance analysis. Concurrency and Computation: Practice and Experience 27, 17 (2015), 5238–5260.
  • Smanchat et al. (2009) Sucha Smanchat, Maria Indrawan, Sea Ling, Colin Enticott, and David Abramson. 2009. Scheduling multiple parameter sweep workflow instances on the grid. In e-Science, 2009. e-Science’09. 5th IEEE International Conference on. IEEE, 300–306.
  • Smirnov et al. (2016) Sergey Smirnov, Oleg Sukhoroslov, and Sergey Volkov. 2016. Integration and combined use of distributed computing resources with Everest. Procedia Computer Science 101 (2016), 359–368.
  • Tan et al. (2005) SH Tan, Ryuji Inai, Masaya Kotaki, and Seeram Ramakrishna. 2005. Systematic parameter study for ultra-fine fiber fabrication via electrospinning process. Polymer 46, 16 (2005), 6128–6134.
  • Terpstra et al. (2010) Dan Terpstra, Heike Jagode, Haihang You, and Jack Dongarra. 2010. Collecting Performance Data with PAPI-C. In Tools for High Performance Computing 2009. Springer Berlin Heidelberg, Berlin, Heidelberg, 157–173.
  • Tilson et al. (2008) Jeffrey L Tilson, Mark SC Reed, and Robert J Fowler. 2008. Workflows for performance evaluation and tuning. In Cluster Computing, 2008 IEEE International Conference on. IEEE, 79–88.
  • Van Der Aalst and Ter Hofstede (2005) Wil MP Van Der Aalst and Arthur HM Ter Hofstede. 2005. YAWL: Yet another workflow language. Information Systems 30, 4 (2005), 245–275.
  • Volkov and Sukhoroslov (2015) Sergey Volkov and Oleg Sukhoroslov. 2015. A generic web service for running parameter sweep experiments in distributed computing environment. Procedia Computer Science 66 (2015), 477–486.
  • Walker and Guiang (2007) Edward Walker and Chona Guiang. 2007. Challenges in executing large parameter sweep studies across widely distributed computing environments. In Proceedings of the 5th IEEE Workshop on Challenges of Large Applications in Distributed Environments. ACM, 11–18.
  • Wilensky (2008) Uri Wilensky. 2008. NetLogo 4.0. 4. (2008).
  • Wilensky and Rand (2015) Uri Wilensky and William Rand. 2015. An introduction to agent-based modeling: modeling natural, social, and engineered complex systems with NetLogo. MIT Press.
  • Witt and Seifert (2017) Nils Witt and Christin Seifert. 2017. Understanding the influence of hyperparameters on text embeddings for text classification tasks. In International Conference on Theory and Practice of Digital Libraries. Springer, 193–204.
  • Wolstencroft et al. (2013) Katherine Wolstencroft, Robert Haines, Donal Fellows, Alan Williams, David Withers, Stuart Owen, Stian Soiland-Reyes, Ian Dunlop, Aleksandra Nenadic, Paul Fisher, et al. 2013. The Taverna workflow suite: Designing and executing workflows of web services on the desktop, web or in the cloud. Nucleic Acids Research 41, W1 (2013), W557–W561.
  • Wozniak et al. (2013) Justin M Wozniak, Timothy G Armstrong, Michael Wilde, Daniel S Katz, Ewing Lusk, and Ian T Foster. 2013. Swift/T: Scalable data flow programming for many-task applications. In ACM SIGPLAN Notices, Vol. 48. ACM, 309–310.
  • Yarrow et al. (2000) Maurice Yarrow, Karen M McCann, Rupak Biswas, and Rob F Van der Wijngaart. 2000. An advanced user interface approach for complex parameter study process specification on the information power grid. In International Workshop on Grid Computing. Springer, 146–157.
  • Yu et al. (2008) Jia Yu, Rajkumar Buyya, and Kotagiri Ramamohanarao. 2008. Workflow scheduling algorithms for grid computing. In Metaheuristics for Scheduling in Distributed Computing Environments. Springer, 173–214.
  • Yu et al. (2013) Linbin Yu, Yu Lei, Raghu N Kacker, and D Richard Kuhn. 2013. Acts: A combinatorial test generation tool. In Software Testing, Verification and Validation (ICST), 2013 IEEE Sixth International Conference on. IEEE, 370–375.
  • Zaharia et al. (2010) Matei Zaharia, Dhruba Borthakur, Joydeep Sen Sarma, Khaled Elmeleegy, Scott Shenker, and Ion Stoica. 2010. Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling. In Proceedings of the 5th European Conference on Computer Systems. ACM, 265–278.