High software quality is one of the most important goals of software development, and software testing is the most widely used approach to ensure that software quality meets expectations. A good way to test software is to include automated tests in the build process. With the rise of Extreme Programming (XP) and Test Driven Development (TDD), self-testing development processes have become popular and are widely adopted by software development projects. As software becomes structurally more complicated, the number of developers involved in the development process increases. As each developer makes progress, they periodically (every several hours or days) commit their work to the central code repository (e.g., Git, SVN). Not only does each developer's work require testing; the integration of work between developers also requires testing. For this reason, Continuous Integration (CI) [12] is widely adopted in software development projects. A dedicated CI server is used for testing. Each time a developer commits work to the central code repository, the CI server automatically clones the project and runs pre-designed tests, so that it can constantly monitor the correctness of the software and report potential problems in a timely fashion, helping developers fix bugs more efficiently.
For HPC applications, performance and scalability are two further important factors of software quality besides correctness, since these applications are usually designed to deliver high performance on given platforms. In addition, applications that aim to solve complex, time-consuming problems are expected to obtain good speedup when deployed on multi-node clusters, many-core architectures, or large-scale supercomputers. The scalability of an HPC application is usually interpreted as how much speedup can be obtained given more computing resources. Better scalability means that the application uses the underlying computing resources more efficiently and consistently delivers good performance across varying amounts of computing resources.
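To make the notions of speedup and efficient resource use concrete, the following minimal sketch computes both from measured execution times. It uses the standard definitions (speedup as the ratio of baseline time to scaled time, efficiency as speedup divided by the growth in resources); the function and parameter names are illustrative assumptions and are not part of BeeSwarm.

```python
def speedup(t_base: float, t_scaled: float) -> float:
    """Speedup of a scaled run relative to a baseline run (times in seconds)."""
    if t_base <= 0 or t_scaled <= 0:
        raise ValueError("execution times must be positive")
    return t_base / t_scaled


def efficiency(t_base: float, t_scaled: float,
               base_units: int, scaled_units: int) -> float:
    """Parallel efficiency: speedup divided by the factor of added resources.

    base_units and scaled_units are hypothetical resource counts
    (e.g., processes or nodes) used for the two runs.
    """
    if base_units <= 0 or scaled_units <= 0:
        raise ValueError("resource counts must be positive")
    return speedup(t_base, t_scaled) / (scaled_units / base_units)


if __name__ == "__main__":
    # Illustrative numbers only: 100 s on 1 process, 25 s on 8 processes.
    print(speedup(100.0, 25.0))           # speedup relative to the baseline
    print(efficiency(100.0, 25.0, 1, 8))  # fraction of ideal scaling achieved
```

An efficiency close to 1 indicates near-ideal scaling, while lower values indicate that added resources are not fully used.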
During HPC application development, as developers commit their work to the central code repository, the scalability of the application can change. Such changes can be caused by, for instance, changes in algorithm design, tunable parameters, or the hardware architectures of target production systems. For example, Fig. 1 shows how the performance of Legion [1], a data-centric parallel programming system, changes across source code commits. The performance is obtained by running a benchmark, PENNANT [11], on the Legion system. As we can see, the execution time can change significantly as developers make progress. Receiving performance or scalability results like this in a timely manner can greatly help developers make better decisions about their code design and deliver HPC software with the expected quality. However, current CI services commonly focus on monitoring software quality in terms of correctness (e.g., detecting software bugs). To the best of our knowledge, no existing work can easily enable automatic performance or scalability tests in CI, since the CI test environment is usually deployed on a single machine that is incapable of conducting large-scale scalability tests.
In this work, we propose BeeSwarm, a performance and scalability test system for CI. BeeSwarm can be used as a plug-in for any current CI service. It takes the widely used Docker container as input, and the performance and scalability tests can run in both HPC cluster environments and cloud computing environments. Just like the original correctness tests in CI, the performance and scalability tests are automated. Users only need to provide a simple specification of the test environment they want to use and the tests they need. Every time developers commit a change to the central code repository, they can choose to schedule a scalability test after the original correctness test succeeds. The performance and scalability test results are automatically pushed back to the central code repository. Although we deploy BeeSwarm on Travis CI and GitLab CI in this work, it can also be deployed on other CI test environments. Deploying on another CI platform requires only minimal modifications to the BeeSwarm configuration scripts, which makes BeeSwarm highly portable across CI platforms. In addition, although we only show the use of Chameleon Cloud, the scalability tests can also be executed on any other BEE-supported platform (HPC clusters, AWS, OpenStack, etc.). This gives developers the flexibility to choose the platform they want their applications to run on.
The rest of this paper is organized as follows. We motivate our work in Section II. In Section III, we give the background necessary to understand this work. We provide design details of BeeSwarm in Section IV, followed by an experimental evaluation in Section V. Section VI discusses recent work related to ours. Finally, Section VII concludes our work.
| Commit hash | Commit message |
|---|---|
| 725e549dc | legion: fixing a potential hang with old-style bounds checking |
| 3edff3290 | regent: small bug fix to openmp code generation for regent |
| d0b157755 | tools: small bug fix for legion prof ascii deserializer |
| 1162649ea | legion: small bug fix for dependence analysis of close operations involving different children in different modes for the same field |
| 2818b5fe9 | legion: small bug fix for remote advances of version numbers |
| 824d6c77d | legion: fixing a bug where we were not properly paging in version states for remote virtual mappings |
In this section, we use an example to motivate our work by showing the necessity of automatic scalability tests in CI. Fig. 1 shows how the performance of Legion changes as developers make progress. However, it is hard to determine exactly which commit(s) cause a performance change. For example, the performance of Legion improved significantly from commit 1e96 to commit 4400. Commit 4400 is a merge of two branches that contains about 61,300 lines of code changes across hundreds of commits, so it is hard to tell which commit(s) caused the performance improvement. By searching the commit tree of Legion, we found several bug-fix commits that may affect performance; we list several of them in Table I. If scalability tests had been available in the CI for Legion, we could easily have found the root cause of the performance change by searching the scalability test results for each commit, and could have kept track of the changes that benefit or hurt scalability.
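As an illustration of how per-commit results could be searched, the sketch below flags commits whose execution time differs from the previous commit by more than a relative threshold. The data layout (an ordered list of commit identifier and execution time pairs), the function name, and the 10% default threshold are all assumptions made for this example; they are not part of BeeSwarm or Legion.

```python
def flag_performance_changes(history, threshold=0.10):
    """Return (commit, relative_change) for each commit whose execution time
    changed by at least `threshold` relative to the preceding commit.

    `history` is a chronologically ordered list of (commit_id, seconds).
    A negative relative change means the commit made the run faster.
    """
    flagged = []
    for (_, previous), (commit, current) in zip(history, history[1:]):
        change = (current - previous) / previous
        if abs(change) >= threshold:
            flagged.append((commit, change))
    return flagged


if __name__ == "__main__":
    # Placeholder commit names and timings, for illustration only.
    sample = [("c1", 100.0), ("c2", 98.0), ("c3", 60.0), ("c4", 75.0)]
    for commit, change in flag_performance_changes(sample):
        print(f"{commit}: {change:+.1%}")
```

With one result per commit, a large merge no longer hides which individual change moved the execution time.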
III-A Build and Execution Environments (BEE)
BEE [6, 7, 5] is a containerization environment that enables HPC applications to run on both HPC and cloud computing platforms. BEE provides a unified user interface for automatic job launching and monitoring. BEE users only need to wrap their applications in a standard Docker image and provide a simple BeeFile (job execution environment description) to run on BEE. Since the same Docker image is used across platforms, no source code modification is necessary. In this work, we build BeeSwarm based on BEE, so it naturally inherits all benefits of BEE. This allows us to build a unified scalability test system across multiple platforms.
III-B Continuous Integration (CI)
CI was first named and proposed by Grady Booch in 1991, with the aim of greatly reducing integration problems. CI was initially combined with automated unit tests that ran on the developer's local machine before committing to the central code repository. However, as software becomes more complicated and more people are involved in its development, local testing becomes inefficient, and the code base on each developer's machine can easily become outdated, so integration can still be problematic. The longer a branch of code remains checked out, the greater the risk of multiple integration conflicts and failures when the developer branch is reintegrated into the main line. For this reason, centralized build servers are used for CI. The build servers can perform more frequent test runs (e.g., on every commit) and report back to the developers. Driven by these benefits, many HPC application development projects now use CI. For example, almost all projects in the Next-Generation Code Project at Los Alamos National Laboratory use CI [10]. Many CI tools are currently available to developers, such as Travis CI, GitLab CI, Circle CI, and Codeship. Many computing platforms, such as AWS and Azure, also provide CI as a feature of their services. However, current CI services only focus on detecting bugs in HPC software. To the best of our knowledge, no existing work can easily enable automatic scalability tests in CI. In this work, we therefore propose to enable easy scalability tests for HPC developers.
To fulfill the goals of BeeSwarm, the software architecture must both leverage industry standards and add new functionality to BEE. BeeSwarm is a general solution that can be deployed on any git repository, any CI service, and any BEE-supported computing platform. As an example, we use Travis CI and GitLab CI as the two CI platforms and Chameleon Cloud [13] as the scalability test platform. Fig. 2 shows the architecture of BeeSwarm. BEE is at the core of the architecture and serves a number of vital roles. As part of the continuous integration process, BEE is deployed in the CI test environment. From there, it is responsible for managing the workflow associated with creating a scalable test environment, copying the required test scripts, initiating the target application, and finally parsing the output. Fig. 3 shows the workflow of Travis CI/GitLab CI with BeeSwarm. Once developers commit to the central code repository, the original CI correctness test is triggered. If the test finishes without failure, BeeSwarm deploys the scalability test on a BEE-supported computing platform, gathers the results, and pushes them back to the code repository. Using BeeSwarm to conduct the scalability test is crucial, since the CI test environment is usually deployed on a single machine that is incapable of large-scale scalability tests. We discuss the major design tasks in BeeSwarm in the following subsections.
IV-A Integrate BEE in CI Test Environment
Each time a developer commits to the central code repository, a new CI test job is triggered in the CI test environment. This means that, in order to launch BEE inside that test environment, we need to install BEE every time before the scalability test. To minimize the overhead caused by the installation, we designed a more efficient, customized BEE installer for the CI environment. Since BEE does not run any test locally in the CI environment, we remove the image building process from the original BEE installer, which is only required when BEE runs jobs in a virtual machine on a system. We also design a simplified BEE launcher (discussed in the next subsection) that requires fewer dependent packages and libraries, which further simplifies the BEE installer. Finally, to enable remote control of compute platforms through SSH, we add SSH key generation to the new BEE installer. This step was not present in the original BEE installer, since it could use the current user's key. With these optimizations, we keep the BEE installation time under two minutes, which is only a slight overhead compared with the minutes to hours taken by the CI tests and scalability tests.
IV-B Customize BEE Launcher for CI Test Environment
BEE was designed to handle multiple tasks simultaneously, so it adopted a server-client structure. The server is a centralized controller (the BEE Orchestration Controller) that stores the global information of all running BEE jobs, and the clients are a series of BEE launchers, each targeting a computing platform. This structure facilitates normal use, but it makes launching BEE jobs in a CI test environment cumbersome: the BEE Orchestration Controller must first be started in the background, and only then can the job be launched using a BEE launcher. Since we only run one BEE job for each CI job, there is no need for a centralized controller that keeps the information of multiple jobs. In this work, we therefore design a simplified BEE launcher that integrates the input parser and the job launching process. It allows the CI job (e.g., on Travis CI) to launch the BEE job with a single command.
IV-C Customized beefile
The beefile is a simple JSON-format task description file used by BEE as user input. It contains the information needed to launch a task using BEE, including the Docker image tag, platform-specific settings, and run scripts for both sequential and parallel runs. Here, we extend the run script configuration for parallel runs. In the original design, users need to specify each parallel run command one by one, including the script to invoke, the number of nodes to use, and the number of processes per node. Since users usually only need to run a few parallel run commands, this design is clear and simple to use. However, for scalability tests, users expect to run their application with a series of configurations (e.g., an increasing number of nodes or processes), and filling in each configuration one by one can be cumbersome. We therefore extend the beefile so that, instead of specifying each configuration one by one, users can specify a range of configurations, for example a range of nodes and a range of processes per node. In addition, we allow users to specify whether the number of nodes or processes should increase linearly with a fixed step size or on a logarithmic scale with base two. An example beefile is shown in Listing 1.
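The range-based configuration described above can be illustrated with a short sketch that expands a node range and a processes-per-node range into individual run configurations. The field names (`start`, `stop`, `mode`, `step`) and the mode labels `"linear"` and `"log2"` are hypothetical and chosen only for this example; the actual beefile schema may differ.

```python
def expand_range(start, stop, mode="linear", step=1):
    """Expand an inclusive range either linearly (fixed step) or by doubling
    (a base-two logarithmic scale). All parameter names are illustrative."""
    if mode not in ("linear", "log2"):
        raise ValueError(f"unknown mode: {mode!r}")
    if start < 1 or stop < start:
        raise ValueError("require 1 <= start <= stop")
    if mode == "linear" and step < 1:
        raise ValueError("step must be at least 1")
    values = []
    current = start
    while current <= stop:
        values.append(current)
        current = current + step if mode == "linear" else current * 2
    return values


def expand_configs(node_range, proc_range):
    """Return every (nodes, processes_per_node) pair covered by the two ranges."""
    return [(nodes, procs)
            for nodes in expand_range(**node_range)
            for procs in expand_range(**proc_range)]


if __name__ == "__main__":
    nodes = {"start": 1, "stop": 2, "mode": "log2"}
    procs = {"start": 1, "stop": 16, "mode": "log2"}
    for n, p in expand_configs(nodes, procs):
        print(f"{n} node(s) x {p} process(es) per node")
```

Two short range entries replace what would otherwise be ten hand-written parallel run commands in this example.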
IV-D Test Scalability on BEE-supported Platform
Since CI services usually allocate only one computing node (e.g., a virtual machine) for each job, it is impractical to conduct a scalability test beyond one node inside the CI environment. In this work, we therefore use BEE as the computing back-end for the scalability test. BEE supports launching any kind of computing task on a variety of computing platforms, ranging from HPC systems to cloud computing systems (e.g., Amazon EC2, OpenStack). It can launch each job on as many nodes as each computing platform allows. BEE takes a job description file, the Beefile, as input. The Beefile specifies all job-related information, including the target platform, the name tag of the Docker container for the application, and the run scripts that the user wants to run when the application is deployed on the target platform. To launch a BEE job for the scalability test, we keep using the same Beefile as the job description. To specify test configurations for the scalability test, users only need to add multiple entries to the "mpirun" section inside the Beefile. Deploying the execution environment on the target system can take several minutes. To avoid setting up the environment repeatedly for each test, the BEE-CI launcher first scans through the Beefile and then sets up the environment with the maximum number of nodes needed to conduct all tests.
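The scan for the maximum node count can be sketched as follows. This is a minimal illustration that assumes a JSON Beefile whose "mpirun" section is a list of entries, each with a hypothetical `num_nodes` field; the real Beefile layout and the BEE-CI launcher's implementation are not shown in this paper and may differ.

```python
import json


def max_nodes_needed(beefile_text: str) -> int:
    """Return the largest node count requested by any entry in the
    (assumed) "mpirun" list, or 0 if there are no parallel runs."""
    beefile = json.loads(beefile_text)
    entries = beefile.get("mpirun", [])
    return max((int(entry["num_nodes"]) for entry in entries), default=0)


if __name__ == "__main__":
    # Hypothetical Beefile fragment used only for this example.
    sample = json.dumps({
        "mpirun": [
            {"script": "run.sh", "num_nodes": 1, "procs_per_node": 16},
            {"script": "run.sh", "num_nodes": 4, "procs_per_node": 16},
            {"script": "run.sh", "num_nodes": 2, "procs_per_node": 16},
        ]
    })
    # The environment would be deployed once with this many nodes,
    # and every test configuration would then reuse it.
    print(max_nodes_needed(sample))
```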
IV-E Collect and Store Scalability Test Results
Unlike common CI tests, which only report results to developers in the form of "pass" or "no pass", a scalability test reports a variety of information generated at different execution scales. Since the information that developers care about differs from application to application, it is hard to develop a universal monitoring strategy that gathers information suiting everyone's needs. Instead, we leave this part to the developers. Developers program their applications so that each run outputs the relevant information. BEE gathers the outputs from the different runs as separate files, which are transferred to and saved in the CI test environment. Next, we require developers to provide an output parser that extracts all relevant information from the output files and generates one final result file. Finally, BEE renames the final result file using the build number, to distinguish result files generated from different commits, and pushes it back to the central code repository.
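Because the output parser is supplied by the developers, its form depends on the application. The sketch below shows one possible parser under stated assumptions: each run writes a file named `run_n<nodes>_p<procs>.out` that contains a line `elapsed_seconds=<value>`, and the merged result is a CSV file whose name includes the build number. The file naming, the output line format, and the result file name are all hypothetical.

```python
import csv
import re
from pathlib import Path

# Assumed conventions for this example only.
FILE_NAME = re.compile(r"^run_n(\d+)_p(\d+)\.out$")
ELAPSED = re.compile(r"^elapsed_seconds=(\d+(?:\.\d+)?)\s*$", re.MULTILINE)


def parse_outputs(output_dir, build_num):
    """Merge per-run output files into one CSV result file and return its path."""
    rows = []
    for path in Path(output_dir).glob("run_n*_p*.out"):
        name = FILE_NAME.match(path.name)
        elapsed = ELAPSED.search(path.read_text())
        if name is None or elapsed is None:
            continue  # skip files that do not follow the assumed conventions
        rows.append((int(name.group(1)), int(name.group(2)),
                     float(elapsed.group(1))))
    rows.sort()  # order by nodes, then by processes per node
    result = Path(output_dir) / f"scalability_{build_num}.csv"
    with result.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["nodes", "procs_per_node", "elapsed_seconds"])
        writer.writerows(rows)
    return result
```

Including the build number in the file name keeps results from different commits side by side in the repository, which matches the renaming step described above.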
In this section, we conduct experiments to show the performance and scalability of BeeSwarm. We use a Department of Energy (DOE) code, FleCSALE [4], as an example software development project. FleCSALE is a software package developed for studying problems that can be characterized using continuum dynamics, such as fluid flow. It is specifically developed for existing and emerging large distributed-memory system architectures. We deploy BeeSwarm on both Travis CI and GitLab CI. For Travis CI, we use the default virtual-machine-based execution environment to run the original correctness test and BeeSwarm. For GitLab CI, we use the Docker-in-Docker (dind) runner to run the original correctness test and BeeSwarm. We found that the Docker-in-Docker runner provides an environment that is easier to configure for BeeSwarm than other runner types. We use Chameleon Cloud [13] as the computing back-end for the scalability tests. Chameleon Cloud is an OpenStack-based cloud computing platform that offers bare-metal access to all computing nodes. It is currently deployed at the University of Chicago and the Texas Advanced Computing Center, with a total of 650 multi-core nodes. We conduct our tests on the nodes located at the University of Chicago.
V-A Modified CI script
In this section, we show a sample modified Travis CI script for FleCSALE (the GitLab CI script is similar) that has the BeeSwarm scalability test enabled (Listing 2). Lines 1 - 13 are the original FleCSALE test code on Travis CI. To enable the BeeSwarm scalability test, we only need to add fewer than 10 lines of simple code (lines 14 - 23). The original CI script builds a Docker image (line 9), runs the Docker image (line 10) to execute the correctness test scripts, and pushes the image to DockerHub if the test was successful (line 14). We add the BeeSwarm configuration and launching scripts after the image is successfully pushed to DockerHub. We obtain and install BeeSwarm in lines 14 - 16 and add the necessary environment variables (for OpenStack and BeeSwarm) in line 17. The scalability test is launched using a single command in line 18. We add a 120-minute timeout here, since by default Travis CI kills a job when a command produces no log output for 10 minutes, and a scalability test usually needs more time than that. The actual timeout length can be set based on the needs of a specific application. Finally, we run the output parser in line 19 and push the scalability test result to the original code repository in lines 20 - 23. With these minimal modifications, current CI scripts can easily enable scalability tests through BeeSwarm, and the scalability test code is highly portable across CI service platforms.
V-B Required environment variables
| Environment variable | Description |
|---|---|
| DOCKER_USERNAME | Username for Docker image registry. |
| DOCKER_PASSWORD | Password for Docker image registry. |
| REPO_TOKEN | Access token used for pushing scalability test results back to the code repository. |
| REPO_URL | The URL to the code repository. |
| REPO_BRANCH | The current branch of the code repository. |
| BUILD_NUM | Current build number. |
| OS_USERNAME | Username for accessing OpenStack platform. |
| OS_PASSWORD | Password for accessing OpenStack platform. |
| OS_RESERVATION_ID | Reservation ID used for current scalability test on OpenStack platform. |
Table II lists the variables that BeeSwarm requires in the CI test environment. DOCKER_USERNAME and DOCKER_PASSWORD are used to access (e.g., pull and push) Docker images in the image registry. REPO_TOKEN lets BeeSwarm push the scalability test results back to the original code repository. REPO_BRANCH and BUILD_NUM ensure that BeeSwarm pushes the scalability test results back to the corresponding branch, with the build number marked in the commit message. OS_USERNAME and OS_PASSWORD are used to access OpenStack platforms (e.g., Chameleon Cloud), and OS_RESERVATION_ID specifies the list of nodes used for the scalability test.
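Since a missing variable would otherwise only surface after the test environment has started deploying, a CI job could check the variables of Table II up front. The following sketch is an illustrative pre-flight check and is not part of BeeSwarm; the function name is an assumption, while the variable names are taken from Table II.

```python
import os
import sys

# Variable names as listed in Table II.
REQUIRED_VARS = (
    "DOCKER_USERNAME", "DOCKER_PASSWORD",
    "REPO_TOKEN", "REPO_URL", "REPO_BRANCH", "BUILD_NUM",
    "OS_USERNAME", "OS_PASSWORD", "OS_RESERVATION_ID",
)


def missing_variables(env=None):
    """Return the required variables that are unset or empty in `env`
    (defaults to the current process environment)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        # Report only the names, never the values, to keep secrets out of logs.
        sys.exit("missing environment variables: " + ", ".join(missing))
    print("all required environment variables are set")
```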
V-C Performance of BeeSwarm
To evaluate the performance of BeeSwarm, we discuss the overhead of launching BeeSwarm and the scalability of BeeSwarm for large-scale tests.
V-C1 Overhead of BeeSwarm
Fig. 4 and Fig. 5 show the time breakdown of CI for FleCSALE with the BeeSwarm scalability test, including the original correctness test on Travis CI and one set of multi-node scalability tests using BeeSwarm. The scalability test involves execution configurations that range from 1 process to 128 processes. We can see that the major overhead of BeeSwarm comes from deploying the scalability test environment, which is mainly caused by the long instance launching time on Chameleon Cloud. However, since CI tests are usually not on the critical path of an application's development process, the extra time has negligible impact on developers.
V-C2 Scalability of BeeSwarm
Since BeeSwarm is designed for launching large-scale parallel applications, the scalability of BeeSwarm itself is also very important. As mentioned above, the main overhead of BeeSwarm comes from deploying the scalability test environment. Fig. 6 shows the time needed to deploy the scalability test environment for BeeSwarm. We test it with an increasing number of processes, ranging from 1 to 1024, on 16 Chameleon Cloud instances with 64 cores each. From Fig. 6, we can see that the time cost is nearly constant (less than 900 seconds) as we increase the number of processes. This indicates that the scalability of BeeSwarm itself is sufficient for large-scale tests.
V-D Scalability Test Showcase
We use FleCSALE to showcase a sample scalability test using BeeSwarm. We configure it to run using 2 to 32 processes on one or two nodes. When using two nodes, we evenly divide the total number of processes between them (each has 1 to 16 processes). The file generated by BeeSwarm is in comma-separated values (CSV) format, and we plot the result data in Fig. 7. Even with this simple test using BeeSwarm, we can observe some interesting behavior of FleCSALE. FleCSALE gains better speedup in a single-node environment (1.73x - 4.01x) than on two nodes (1.05x - 1.40x) given the same total number of processes. This may suggest that inter-node communication could be a performance bottleneck for FleCSALE running on systems similar to Chameleon.
These results give developers scalability data for the application they are developing, so that they can adjust their application in a more timely manner. Not only can developers observe the behavior of different processing schemes, but BeeSwarm also helps them see the performance improvement or degradation of their application as they push changes.
VI Related Work
Scalability is one of the most important metrics when evaluating the quality of HPC applications, and much work has been done to build scalability test tools that facilitate HPC application development. For example, [16] proposed a lightweight profiling library for MPI applications, which relies only on statistical information about MPI functions and incurs little performance overhead. [8] proposed an effective scalability testing and analysis system, STAS. [9] proposed a configurable MPI scalability analysis tool for the Blue Gene/L supercomputer. [3] proposed a performance tool, Vampir, that can be used to detect hot spots in HPC applications, which helps HPC developers make their applications more scalable. [14] proposed JACE (Job Auto-creator and Executor), a tool that automates the creation and execution of complex performance and scalability regression tests. It can help developers tune an application on a given platform to maximize performance given different optimization flags and tunable variables. [15] presented an HPC performance and scalability test tool, Hawk-i, that uses cloud computing platforms to test HPC applications in order to reduce the effort of accessing relatively scarce, on-demand high-performance resources. [2] proposed ParaProf, a portable, extensible, and scalable tool for parallel performance profile analysis. It gathers a rich set of hardware counters and traceable information in order to offer much more detailed profiling results, similar to state-of-the-art single-process profiling tools. [17] proposed a scalability test tool, PATHA, that uses system logs to extract key performance measures and applies statistical tools and data mining methods to the performance data to identify bottlenecks or debug performance issues in HPC applications.
Although these works are useful for scalability testing of HPC applications, their tools or systems cannot be easily adopted by current HPC application development projects, since they require either modification of the HPC application or a complicated installation or configuration process to make the tools work properly on a given HPC platform.
In this work, we first discuss the benefit of CI in the software development process. We then propose to bring scalability tests to CI so that developers can get feedback about the scalability of their applications in addition to functionality. We design BeeSwarm as a scalability test system for most CI environments. It is easy to use and can be integrated into any software development workflow, and a variety of computing platforms can serve as the computing back-end for scalability tests. We conducted experiments on Travis CI and GitLab CI with Chameleon Cloud as the computing back-end. The experimental results show that BeeSwarm offers good performance and scalability on large-scale tests.
[1] (2012) Legion: expressing locality and independence with logical regions. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 66.
[2] (2003) ParaProf: a portable, extensible, and scalable tool for parallel performance profile analysis. In European Conference on Parallel Processing, pp. 17–26.
[3] (2013) Custom hot spot analysis of HPC software with the Vampir performance tool suite. In Tools for High Performance Computing 2012, pp. 95–114.
[4] (2017) Flexible computational science infrastructure (FleCSI): overview & application progress. Technical report, Los Alamos National Lab. (LANL), Los Alamos, NM (United States).
[5] (2018) Build and execution environment (BEE): an encapsulated environment enabling HPC applications running everywhere. In 2018 IEEE International Conference on Big Data (Big Data), pp. 1737–1746.
[6] (2017) (Website).
[7] (2018) BeeFlow: a workflow management system for in situ processing across HPC and cloud systems. In 38th IEEE International Conference on Distributed Computing Systems (ICDCS).
[8] (2006) STAS: a scalability testing and analysis system. In Cluster Computing, 2006 IEEE International Conference on, pp. 1–10.
[9] (2006) MPI performance analysis tools on Blue Gene/L. In SC 2006 Conference, Proceedings of the ACM/IEEE, pp. 16–16.
[10] (2016) LANL ASC advanced technology development and mitigation: next-generation code project (NGC). Technical report, Los Alamos National Lab. (LANL), Los Alamos, NM (United States).
[11] (2015) PENNANT: an unstructured mesh mini-app for advanced architecture research. Concurrency and Computation: Practice and Experience 27 (17), pp. 4555–4572.
[12] (2006) Continuous integration. Thought-Works) http://www. thoughtworks. com/Continuous Integration. pdf 122, pp. 14.
[13] (2015) Next generation clouds, the Chameleon cloud testbed, and software defined networking (SDN). In Cloud Computing Research and Innovation (ICCCRI), 2015 International Conference on, pp. 73–79.
[14] (2012) Tool for performance tuning and regression analyses of HPC systems and applications. In High Performance Computing (HiPC), 2012 19th International Conference on, pp. 1–6.
[15] (2012) Hawk-i HPC cloud benchmark tool. MSc in High Performance Computing, University of Edinburgh, Edinburgh.
[16] (2005) mpiP: lightweight, scalable MPI profiling.
[17] (2015) PATHA: performance analysis tool for HPC applications. In Computing and Communications Conference (IPCCC), 2015 IEEE 34th International Performance, pp. 1–8.