Toward Interlanguage Parallel Scripting for Distributed-Memory Scientific Computing

07/06/2021
by   Justin M. Wozniak, et al.

Scripting languages such as Python and R have been widely adopted as tools for the productive development of scientific software because of the power and expressiveness of the languages and available libraries. However, deploying scripted applications on large-scale parallel computer systems such as the IBM Blue Gene/Q or Cray XE6 is a challenge because of operating system limitations, interoperability challenges, and parallel filesystem overheads caused by the small filesystem accesses common in scripted approaches, among other issues. We present here a new approach to these problems in which the Swift scripting system is used to integrate high-level scripts written in Python, R, and Tcl with native code developed in C, C++, and Fortran, by linking Swift to the library interfaces of the script interpreters. In this approach, Swift handles data management, movement, and marshaling among distributed-memory processes without direct user manipulation of low-level communication libraries such as MPI. We present a technique to efficiently launch scripted applications on large-scale supercomputers using a hierarchical programming model.


I Introduction

An increasing number of modern scientific applications and tools are built by using a variety of languages and libraries. These complex software products combine performance-critical libraries implemented in native code (C, C++, Fortran) with high-level functionality expressed in rapidly developed and modified scripts. Additional specialized language-specific features may be used for concurrency, I/O, the use of accelerators, and so on. These development techniques have been used in a wide range of application domains, from materials science and protein analysis to power grid simulation.

Such applications and tools are commonly developed with the following software development pattern. First, a native code library is built or repurposed for the core processing. Second, a collection of scripts is built up around the core library or program to express the complex, often dynamic, but less performance-critical coordination logic. Such “wrapper scripts” may be developed with shell, Python, Tcl, or other tools. Third, when additional scalability is required, native code or additional scripts are developed to deploy the application in some distributed computing model such as MPI, Swift, some other grid workflow system, or with custom wrapper scripts that submit jobs to a scheduler such as PBS.

Swift [1] is a programming language and runtime designed to ease the software development methodology described above. Swift has a well-defined concept of wrapper scripts, the ability to coordinate calls to tools through its programming model, and built-in support for many schedulers and data movement protocols. The latest implementation, Swift/T [2], generates an MPI program from the Swift script and provides tools to run that program on various scheduled resources. This approach has allowed Swift/T to scale the execution of scripted applications to hundreds of thousands of cores [3].

The Swift/T framework supports direct calls to native code through library loading and access. As described above, however, modern scientific applications are built not only with native code, but also with scripts and scripting interfaces to core libraries. Thus, to ease the coordination of calls to tools in the Swift programming model, we wish to support direct calls to script code without calling external programs or forcing the user to master complex linking techniques.

In this work, we report on new features in Swift that support direct calls to Python, R, and Tcl. These features, which could easily be extended to other scripting languages, allow Swift scripts to orchestrate distributed execution of code written in a wide variety of languages, currently including C, C++, Fortran, Python, R, Tcl, and the shell. Indeed, any external program may be called through the shell-based technique.

The method presented here is a more approachable software development technique for distributed-memory computing than traditional techniques are. Using MPI, the developer could write MPI code in C and call an application component script (say, in Python). In this technique, the user would have to manage the call to the script, possibly using an internal API specific to that language. Application data would have to be marshaled to and from the component and among processes in a laborious manner. The developer would have to build significant infrastructure to manage load balancing and other distributed computing challenges. Alternatively, the developer could try a scripting language-specific MPI implementation, which might ease some but not all of the described challenges. Additionally, that approach would limit the number of languages that could be used; it is unlikely that communicating among MPI processes in multiple languages would work as desired.

In our method, the developer starts with a Swift script that describes the calls to application components in a convenient syntax. Swift data is passed among language-specific components implicitly as the script progresses; no user data marshaling is required. (MPI messaging is used internally by the Swift/T runtime.) Multiple components written in different languages may be brought together. Progress and load balancing are managed by the Swift runtime. Overall, the approach provides a coherent programming model, allows for compatibility among multiple languages, provides high scalability, and is compatible with advanced architectures such as the Cray XE6 and Blue Gene/Q.

The rest of this paper is organized as follows. In Section II we describe the architecture of Swift/T, and in Section III we describe the interlanguage features that are the focus of this paper. In Section IV we offer concluding remarks and a glimpse of future work.

II Architecture

We next provide some background on the Swift language, describe the Swift/T architecture, and discuss how Swift/T calls application components.

II-A Swift language

Swift is a scripting language with C-like syntax, with pervasive, automatic concurrency built into the language. Concurrency is achieved through dataflow processing, in which progress depends on the availability of input data, not statement ordering. For example, in the code fragment

int x;
x = f(3);
int y1 = g(x,1);
int y2 = g(x,2);

the declaration int x; creates a future x. Subsequent function calls to g() block until a value is stored in x. When f() completes, both calls to g() are eligible to run concurrently on different processors.
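As a rough analogy only, the same dependency pattern can be sketched in Python with the standard concurrent.futures module. The bodies of f() and g() below are hypothetical stand-ins (the text does not define them), and a thread pool is used purely for illustration; it is not how Swift evaluates futures.

```python
from concurrent.futures import ThreadPoolExecutor

def f(n):
    # Hypothetical stand-in for the producer f() in the fragment above.
    return n * 10

def g(x, k):
    # Hypothetical stand-in for the consumer g() in the fragment above.
    return x + k

def run_fragment():
    # Three workers so that both g() tasks can wait on x while f() runs.
    with ThreadPoolExecutor(max_workers=3) as pool:
        x = pool.submit(f, 3)                       # x is a future
        # Each g() task blocks inside x.result() until f() stores a value,
        # after which the two tasks are free to run concurrently.
        y1 = pool.submit(lambda: g(x.result(), 1))
        y2 = pool.submit(lambda: g(x.result(), 2))
        return y1.result(), y2.result()
```

Unlike this sketch, the Swift fragment needs no explicit result() calls: the wait on x is implied by the data dependency.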

Massive concurrency can be achieved in Swift with relatively little code. For example, in the code fragment

foreach i in [0:9] {
  int t = f(i);
  if (g(t) == 0) { printf("g(%i)==0", t); }
}

the foreach loop executes the loop body concurrently for each value of i from 0 to 9. Each execution of f() may run concurrently, but each g(t) is blocked on the corresponding f(i). The code implies the dataflow dependencies shown in Fig. 1, where several parallel pipelines of tasks are present. Swift will construct and execute these pipelines in parallel on any available resources.

Fig. 1: Diagram of implicit dataflow of Swift loop.
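The ten independent pipelines can be approximated in Python as follows. This is a simplified sketch: f() and g() are hypothetical placeholders, and each pipeline is fused into one Python task, whereas the text describes f() and g() as separate tasks linked by a data dependency.

```python
from concurrent.futures import ThreadPoolExecutor

def f(i):
    # Hypothetical placeholder for f(); any function of i would do.
    return i % 3

def g(t):
    # Hypothetical placeholder for g(); returns 0 only when t is 0.
    return t

def pipeline(i):
    t = f(i)                      # g(t) cannot start before f(i) finishes
    if g(t) == 0:
        return "g(%i)==0" % t     # same message as the printf in the loop
    return None

def run_loop():
    with ThreadPoolExecutor(max_workers=4) as pool:
        # One pipeline per value of i in 0..9 (the Swift range is inclusive).
        # pool.map returns results in index order, whatever the run order.
        return [m for m in pool.map(pipeline, range(10)) if m is not None]
```

With these placeholder definitions, only i = 0, 3, 6, and 9 produce a message.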

In the Swift model, bulk user computation is performed in leaf tasks: user code outside of Swift, such as libraries or external programs. These are load-balanced between available processors by dispatching tasks on demand. If f() and g() are compute-intensive functions with varying runtimes, the asynchronous, load-balanced Swift model is an excellent fit.
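Dispatching tasks on demand can be illustrated with a shared queue from which idle workers pull their next task. The sketch below uses Python threads and a hypothetical run_workers() helper; it is meant only to show why on-demand dispatch balances tasks of varying runtime, not to describe the Swift runtime.

```python
import queue
import threading

def run_workers(tasks, n_workers):
    """Run (name, func, arg) tasks; each worker takes a task only when idle."""
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                name, func, arg = pending.get_nowait()
            except queue.Empty:
                return                    # all tasks were queued up front
            value = func(arg)             # the "leaf task": user code
            with lock:
                results[name] = value

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results
```

A worker that draws a long task simply takes fewer tasks overall, so no static assignment of tasks to workers is needed.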

II-B Swift/T runtime

Swift/T [2] is a reimplementation of the Swift/K [1] framework for high-performance computing.

Swift/K excels at distributed, grid, and cloud computing, and offers wide-ranging support for schedulers (PBS, LSF, SLURM, SGE, Condor, Cobalt, SSH), data transfer, fault tolerance, and other features useful in those environments. The K indicates that the language is implemented atop the Karajan workflow engine.

Swift/T is designed for high-performance computing at the largest scale. T indicates that the key features are implemented by the Turbine dataflow engine [4]. In this implementation of Swift, the Swift script is translated into a runtime framework based on the MPI-based Asynchronous Dynamic Load Balancer (ADLB) [5] and Turbine libraries, which evaluate Swift semantics in a distributed manner (no bottleneck). Thus, at run time, Swift/T programs are MPI programs.

The Swift/T architecture is diagrammed in Fig. 2. Each MPI process operates as an engine, ADLB server, or worker. Engines carry out Swift logic, creating leaf tasks for execution. ADLB servers, shown as an opaque subsystem, distribute tasks to workers, which execute user work (such as f() and g() in our example above). Typically the vast majority of processes (99%+) are designated as workers. The engine and server processes are called control processes and collectively orchestrate script execution.

Fig. 2: Swift/T runtime architecture.

III Swift interfaces to various languages

Swift/T has multiple new methods, not reported previously, for calling user code. In this section, we consider these in detail.

III-A Calling Tcl

The Swift/T compiler (STC) translates user Swift code to a representation (Turbine code) that uses the Turbine, ADLB, MPI, and user libraries, all of which are written in C. While STC could generate C code, we desired a compiler target with the following properties: 1) A straightforward way to ship code fragments through ADLB for load balancing and evaluation elsewhere, 2) A textual, easily readable format, and 3) A runtime that did not require the user to run the C compiler in order to avoid complexities on advanced systems. Thus, we chose Tcl to represent Turbine code, and made use of the ease of calling C from Tcl in order to bind the system together.

Since Swift/T runs on Tcl, calling from Swift to Tcl is the most advanced interlanguage feature in Swift/T. Consider the Swift code fragment

1 (int o) f(int i, int j)
2 "my_package" "1.0"
3 [ "set <<o>> [ f <<i>> <<j>> ]" ];
4 ...
5 int x = f(2, 3);

In this code, Tcl procedure f is made available to Swift with the given signature. When inputs i and j are available, the Tcl code (line 3) is executed. The Tcl package my_package 1.0 is loaded on the assumption that f will be found in that package. The Swift/T runtime supports user additions to TCLLIBPATH so that arbitrary Tcl code may be attached to a Swift/T run.

Interlanguage operation is supported by 1) inserting dataflow semantics into the interface between Swift/T and Tcl and 2) automatic type conversion. The Tcl code on line 3 cannot execute until inputs i and j are set and transmitted to the worker on which the code will be executed, and storage for output o has been allocated. The code that enforces these conditions is automatically inserted into the compiler output by STC and is hidden from the user (by default). The programmer provides a template for the Tcl code. Double angle brackets <<>> indicate that a variable should appear in that location. Swift/T variables are automatically converted to the appropriate Tcl types, which are oriented toward string representations.
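The template mechanism can be pictured with a small Python sketch. The helper fill_template() is hypothetical, and the choice to substitute a Tcl variable name for the output and string-rendered values for the inputs is an assumption made for illustration; the text does not show the code that STC actually generates.

```python
import re

def fill_template(template, bindings):
    """Replace each <<name>> marker with the string form of bindings[name]."""
    def substitute(match):
        name = match.group(1)
        # A missing binding raises KeyError rather than emitting broken Tcl.
        return str(bindings[name])
    return re.sub(r"<<(\w+)>>", substitute, template)

# Assumed bindings: "o" maps to a Tcl variable name, "i" and "j" to values.
example = fill_template("set <<o>> [ f <<i>> <<j>> ]",
                        {"o": "result", "i": 2, "j": 3})
```

Rendering every value with str() mirrors the string-oriented conversion mentioned above, in a very reduced form.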

The ease of interlanguage operation here offers multiple beneficial features to Swift/T development and application users. First, the ease of exposing simple Tcl snippets to Swift allowed for the rapid development of Swift builtins such as printf(), strcat(), etc. Many Tcl features can easily be brought into Swift this way. Second, Swift users often express a desire to mix dataflow programming with short fragments of imperative code. This is easily done by extending the Tcl fragment on line 3 to a multiline script snippet, using the Swift multiline string syntax. Certain arithmetical or string expressions may be easier to perform in Tcl than in Swift, especially for experienced Tcl or shell programmers. Third, existing components built in Tcl can easily be brought into Swift by using Swift support for Tcl packages. Fourth, the strength of Tcl support for calling native code is easily brought into Swift as well, as described in the following subsection.

III-B Calling native code

A primary goal of Swift/T is to speed the development process for scaling existing codes in compiled languages (C, C++, Fortran) to high-performance systems. Thus, good support for calling these languages is paramount. Tcl provides good support for calling native code, and good tools such as SWIG are available. This approach has demonstrated the ability to successfully call native code in many applications, including applications that may be expressed as MPI libraries [6].

In order to call into an existing native code program from Swift, the following steps must be followed. First, the user identifies the key functions to be called. Simple types (numbers, strings) must be used to ensure compatibility with Swift. Second, the program is compiled as a loadable library; any use of main() must be removed through conditional compilation. Third, the library headers are processed by SWIG to generate Tcl bindings for the C/C++ functions; in the case of Fortran, a C++-formatted header is first created with FortWrap [7], then processed by SWIG. Fourth, the user writes Swift bindings for the generated Tcl bindings as described in the previous subsection. Fifth, a Tcl package is constructed containing the native code library and any additional Tcl scripts that the user desires to include. Figure 3 illustrates the process of binding C code with Tcl using SWIG. The functions in object afunc.o become callable from within Swift/T code.

Fig. 3: SWIG providing Tcl bindings for C functions callable from Swift/T.

The interlanguage complications here are more challenging than those in the Tcl case because more language considerations must be taken into account. Our approach has been to delegate complexities and conventions to SWIG, since it is a general-purpose tool (i.e., learning SWIG has broader utility than learning a Swift-specific tool). Thus, type conversion conventions are delegated to SWIG conventions.

In addition to simple types, scientific users of native code languages often desire to operate on bulk data in arrays. The Swift approach to these is to handle pointers to byte arrays as a novel type: blob (binary large object). The Swift/T runtime handles blobs in a similar manner to strings, but with appropriate handling for binary data. This approach allows users to write dataflow scripts that operate on C-formatted strings and arrays, contiguous binary data structures, and even multidimensional Fortran arrays.

SWIG supports functions that consume and produce pointers as represented by Tcl variables. Thus, Swift/T provides a small library called blobutils to handle transmission of the Swift/T blob type to raw pointers compatible with SWIG. Type conversion routines are provided to handle many common cases. For example, SWIG will not automatically convert void* to double*; blobutils provides tools to handle the simple but myriad interlanguage complexities found when operating on binary data.
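The kind of conversion involved can be illustrated with Python's ctypes module, which makes the untyped-to-typed pointer cast explicit. The function names below are hypothetical and are not the blobutils API; the sketch only shows a byte-oriented blob (pointer plus length) being reinterpreted as an array of doubles.

```python
import ctypes

def doubles_to_blob(values):
    """Pack floats into a contiguous C array and expose it as (void*, nbytes).

    The array object is returned as well so that the memory stays alive
    for as long as the caller holds on to it.
    """
    arr = (ctypes.c_double * len(values))(*values)
    ptr = ctypes.cast(arr, ctypes.c_void_p)      # typed array -> void*
    return ptr, ctypes.sizeof(arr), arr

def blob_to_doubles(ptr, nbytes):
    """Reinterpret an untyped pointer and byte length as a list of doubles."""
    count = nbytes // ctypes.sizeof(ctypes.c_double)
    typed = ctypes.cast(ptr, ctypes.POINTER(ctypes.c_double))  # void* -> double*
    return [typed[i] for i in range(count)]
```

The byte length has to travel with the pointer because a void* carries no element type or count, which is one reason a dedicated blob type is convenient.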

III-C Calling Python or R

As described above, many modern scientific applications have key components or interfaces built in Python, R, or other high-level languages. Previous workflow programming systems call external languages by executing the external interpreter executables. This strategy is undesirable for Swift/T, however, because at large scale the filesystem overheads are unacceptable. Additionally, on specialized supercomputers such as the Blue Gene/Q, launching external programs is not possible at all.

Our approach, based in Swift/T, treats the external interpreters for Python and R as native code libraries. Thus, the complexity of calling them is reduced to the complexity of calling a C library from Swift/T, which was addressed in the previous section. First, a Tcl extension for each language was constructed. (These could conceivably be reused by non-Swift developers who simply desire to call Python or R from Tcl.) Then, a Swift/T leaf function was written that evaluates fragments of code. Users interact only with the high-level Swift/T leaf function, greatly reducing complexity.
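A leaf function of this kind can be pictured as taking a code fragment plus an expression and returning the expression's value in string form. The sketch below is written in Python itself, with a hypothetical eval_fragment() helper standing in for the embedded interpreter; the actual signature of the Swift/T leaf function is not given in the text, so the two-argument form is an assumption.

```python
def eval_fragment(code, expr):
    """Execute `code`, then evaluate `expr` and return its value as a string.

    A fresh namespace is used for every call, so the fragment sees only
    what it defines itself.
    """
    namespace = {}
    exec(code, namespace)                 # run the statements of the fragment
    return str(eval(expr, namespace))     # string result, easy to pass onward

# Usage: the fragment computes r, and the expression selects what to return.
answer = eval_fragment("import math\nr = math.sqrt(16)", "int(r) + 1")
```

Returning a string keeps the interface uniform across languages, at the cost of leaving richer types to be converted by the caller.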

In the Swift model, each task is started without state; only the well-defined Swift inputs are available. When calling into an external interpreter, however, old state from the previous task could be available and cause confusion or debugging issues (this is not a security issue, since all of this state is inside the Swift/T MPI run). One approach is to finalize the interpreter at the end of each task and reinitialize it when the next task is started, thus clearing any state. This approach raises concerns about performance and possible resource leaks. Thus, we provide options to either retain the interpreter or reinitialize it. (Old interpreter state can also be used to store useful data if the programmer is careful.)
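The two options can be contrasted with a small Python sketch. The EmbeddedInterpreter class and its persist flag are hypothetical names, and clearing a dictionary is only a stand-in for finalizing and reinitializing a real interpreter.

```python
class EmbeddedInterpreter:
    """Toy model of an interpreter that may or may not keep state between tasks."""

    def __init__(self, persist):
        self.persist = persist
        self.namespace = {}

    def run_task(self, code, expr):
        if not self.persist:
            # Stand-in for finalize + reinitialize: drop all old state.
            self.namespace = {}
        exec(code, self.namespace)
        return str(eval(expr, self.namespace))

# Retained state: the second task sees the counter left by the first.
keep = EmbeddedInterpreter(persist=True)
first = keep.run_task("counter = 1", "counter")
second = keep.run_task("counter = counter + 1", "counter")

# Reinitialized state: the same second task fails because counter is gone.
fresh = EmbeddedInterpreter(persist=False)
fresh.run_task("counter = 1", "counter")
try:
    fresh.run_task("counter = counter + 1", "counter")
    leaked = True
except NameError:
    leaked = False
```

The retained case shows both the hazard (a task silently depending on an earlier one) and the opportunity (caching data across tasks) described above.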

IV Conclusion

Modern scientific application development is trending toward greater software complexity and more demanding performance requirements. These applications blend structured and unstructured computing patterns, features for distributed and parallel computing, and the use of specialized libraries for everything from numerics to I/O. For continued progress in scientific computing, tools must be developed and adopted that enable rapid prototyping and development of complex, large-scale applications.

In this work, we provided a broad overview of relevant scientific computing applications that combine computing patterns and use multiple languages. We described the Swift/T system for high-performance computing, highlighted its new features to support scripting languages such as Python and R, and showed how these can be combined to solve numerical problems. We also described the rich shell interface retained and extended from Swift/K. We described our use of embedded script interpreters, making interlanguage programming relatively easy while remaining compatible with systems having restricted OS functionality such as the IBM Blue Gene/Q. Additionally, we showed how the many small file problem common in scripted solutions can be addressed with our static packages.

In future work, we intend to improve support for external languages by automatically translating more complex data types. Future applications are sure to challenge the current performance envelope, and we will improve and apply our techniques to solve bigger problems with more advanced tools on the largest-scale machines.

Acknowledgments

This material was based upon work supported by the U.S. Dept. of Energy, Office of Science, Office of Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357. Work by Katz was supported by the National Science Foundation while working at the Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

References

  • [1] M. Wilde, M. Hategan, J. M. Wozniak, B. Clifford, D. S. Katz, and I. Foster, “Swift: A language for distributed parallel scripting,” Parallel Computing, vol. 37, pp. 633–652, 2011.
  • [2] J. M. Wozniak, T. G. Armstrong, M. Wilde, D. S. Katz, E. Lusk, and I. T. Foster, “Swift/T: Scalable data flow programming for many-task applications,” in Proc. CCGrid, 2013.
  • [3] T. G. Armstrong, J. M. Wozniak, M. Wilde, and I. T. Foster, “Compiler techniques for massively scalable implicit task parallelism,” in Proc. SC, 2014.
  • [4] J. M. Wozniak, T. G. Armstrong, K. Maheshwari, E. L. Lusk, D. S. Katz, M. Wilde, and I. T. Foster, “Turbine: A distributed-memory dataflow engine for high performance many-task applications,” Fundamenta Informaticae, vol. 128, no. 3, 2013.
  • [5] E. L. Lusk, S. C. Pieper, and R. M. Butler, “More scalability, less pain: A simple programming model and its implementation for extreme computing,” SciDAC Review, vol. 17, pp. 30–37, January 2010.
  • [6] J. M. Wozniak, T. Peterka, T. G. Armstrong, J. Dinan, E. Lusk, M. Wilde, and I. Foster, “Dataflow coordination of data-parallel tasks via MPI 3.0,” in Proc. EuroMPI, 2013.
  • [7] J. McFarland, “FortWrap web site,” http://fortwrap.sourceforge.net.