Continuous evaluation of the performance of cloud infrastructure for scientific applications

12/13/2018 ∙ by Mohammad Mohammadi, et al. ∙ 0

Cloud computing has recently developed into a viable alternative to on-premises systems for executing high-performance computing (HPC) applications. With the emergence of new vendors and hardware options, there is now a growing need to continuously evaluate the performance of the infrastructure with respect to the most commonly used simulation workflows. We present an online ecosystem and the corresponding tools aimed at providing a collaborative and repeatable way to assess the performance of the underlying hardware for multiple real-world application-specific benchmark cases. The ecosystem allows the benchmark results to be stored and shared online in a centrally accessible database in order to facilitate their comparison, traceability, and curation. We include current example results for multiple cloud vendors and explain how to contribute new results and benchmark cases.




1 Introduction

Cloud computing now represents a viable alternative to on-premises hardware for high-performance scientific applications. Although the feasibility and advantages of cloud computing for high-performance computing (HPC) workloads were debated for over a decade [1, 2, 3, 4], recent advancements in the field have made it a competitive and cost-effective solution for running compute-intensive parallel workloads across a large variety of models and application areas [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]. In our previous studies [18, 19, 20, 21, 22] we demonstrated that HPC in the cloud is ready for widespread adoption and can provide a viable and cost-efficient alternative to capital-intensive on-premises hardware deployments for large-scale high-throughput and distributed-memory calculations.

With the ever-increasing adoption of cloud computing and the emergence of new vendors and computing hardware, there is a growing need for continuous evaluation of infrastructure performance for real-world workloads. However, the lack of intuitive and collaborative tools has made such an assessment challenging. Multiple previous works exist in the field; however, they usually focus on generic computing, leave out real-world application use cases, and lack a collaborative and systematic ecosystem to share and manage the results [23, 24, 25, 26, 27, 28].

We hereby present the concept and the associated software tools for an online benchmarking ecosystem able to continuously and collaboratively assess the performance of computing hardware for real-world scientific applications. The aim of the ecosystem is threefold: to assist the community in choosing the best hardware for HPC applications, to allow cloud vendors to identify bottlenecks and improve their services, and to help HPC application developers identify and address implementation-related challenges.

The ecosystem includes the ExaBench suite [29], an open-source, modular, and extensible software tool that facilitates the performance assessment of computing systems, and a centrally accessible collaborative online repository to store and manage the results. In the following, we outline the ecosystem and its operations, provide example results, and explain how to contribute to its further development.

Figure 1: Schematic representation of the online ecosystem presented in this manuscript. The three main components (open-source codebase, database of results, and their online visual representation) are outlined in the middle. “EB” denotes the ExaBench tool. “Cluster” refers to on-premises computing clusters. “Cloud” denotes the public/private cloud systems. Two types of contributors are shown: contributors to the codebase (orange) and contributors to the results (green). Codebase contributors help extend the test cases. Results contributors run the ExaBench tool and publish the results to the centrally accessible database. The results are available to the wider community.

2 Ecosystem

We view the ecosystem as an online platform allowing multiple people to collaboratively evaluate the performance of computing hardware for compute-intensive applications.

2.1 Components

We identify the following components:

  • ExaBench, an open-source modular software tool to facilitate the performance assessment of computing systems. The tool supports multiple benchmark cases to evaluate the performance of scientific applications.

  • Results database, a centrally accessible repository to store the results in order to facilitate their comparison, traceability and curation.

  • Results page, an online resource presenting the results in a visual manner.

  • Sites, or physical locations with unique identifiers, where the benchmarks are executed.

  • Contributors, who use the ExaBench tool to submit the benchmark results to the central database and/or contribute to the ExaBench source code by adding support for new benchmark cases and metrics.

  • Community, the broader set of users and interested parties.

A schematic representation of the ecosystem is available in Fig. 1.

2.2 Operations

We envision the following process for the ecosystem. Site administrators install the ExaBench tool from the source code, which is available online and is developed and maintained continuously by the codebase contributors (originally, the authors of this manuscript). Benchmarks are executed on the underlying hardware and their results are stored automatically in the database in a predefined format. To evaluate the performance, the benchmark cases are executed with identical setups on every site, with the number of nodes and processors per node configurable in the tool. The execution time is used to evaluate the performance [30]. The cases are designed to be compact so that they can be executed within a reasonable timeframe on sites with different hardware configurations. The results are stored in a centrally accessible database so that the community can analyze the efficiency of the computing systems.

3 ExaBench

The Exabyte benchmarks suite (ExaBench) is an open-source, modular, and extensible software tool written in Python that aims to help the scientific and engineering communities assess the performance of cloud and on-premises systems. The suite consists of three main components: benchmarks, metrics, and results, outlined in the following sections.

3.1 Benchmarks

The “Benchmarks” package implements benchmark cases for real-world scientific applications. Each application is introduced as a sub-package containing multiple benchmark cases, templates, and input files to cover different application execution patterns. Each case is implemented in an object-oriented, modular way as a class with methods to prepare and execute the simulation(s) and extract the results. The currently supported benchmark types are:

  • High-Performance Linpack (HPL)

  • Vienna Ab-initio Simulation Package (VASP [31])

  • GRoningen MAchine for Chemical Simulations (GROMACS [32])
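The object-oriented layout of a benchmark case described above might look like the following minimal sketch. All names here (`BenchmarkCase`, `prepare`, `run`, `execute`, `extract_results`, `ToyCase`) are hypothetical and are not taken from the ExaBench source code; the toy case performs a small in-process computation where a real case would launch HPL, VASP, or GROMACS.

```python
import tempfile
import time
from abc import ABC, abstractmethod
from pathlib import Path


class BenchmarkCase(ABC):
    """Hypothetical base class: prepare inputs, run, extract results."""

    def __init__(self, name, nodes, ppn):
        self.name = name
        self.nodes = nodes  # number of nodes requested
        self.ppn = ppn      # processors per node
        self.work_dir = Path(tempfile.mkdtemp(prefix=name + "-"))
        self.runtime = None  # wall-clock seconds, set by execute()

    @abstractmethod
    def prepare(self):
        """Write the input files for this case into self.work_dir."""

    @abstractmethod
    def run(self):
        """Launch the simulation (a real case would call the application)."""

    def execute(self):
        """Run the case and record its wall-clock time in seconds."""
        start = time.perf_counter()
        self.run()
        self.runtime = time.perf_counter() - start

    def extract_results(self):
        """Return the quantities that would be sent to the results store."""
        return {
            "case": self.name,
            "nodes": self.nodes,
            "ppn": self.ppn,
            "runtime": self.runtime,
        }


class ToyCase(BenchmarkCase):
    """Stand-in workload: sums the integers below a size read from a file."""

    def prepare(self):
        (self.work_dir / "input.txt").write_text("100000\n")

    def run(self):
        size = int((self.work_dir / "input.txt").read_text())
        self.total = sum(range(size))


def run_case(case):
    """Drive one case through the prepare / execute / extract sequence."""
    case.prepare()
    case.execute()
    return case.extract_results()
```

Calling `run_case(ToyCase("toy", nodes=1, ppn=2))` returns a dictionary holding the case name, the configuration, and the measured runtime. Keeping the timing in the base class means every case is measured the same way, regardless of the application it wraps.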

3.2 Metrics

The “Metrics” package implements the metrics used to analyze the results. The list of currently supported metrics is given below. The term “performance gain” refers to the ratio of the performance in giga-FLOPS (GFLOPS) for a given number of nodes to the performance for a single node.

  • Speedup ratio: The ratio of the performance gain defined above for a given number of nodes to the ideal speedup. This metric is used for HPL benchmark cases to estimate the extent to which HPL calculations can be efficiently scaled.

  • Speedup: The inverse of total runtime in seconds for a benchmark case. The metric is used to understand how quickly application-specific benchmark cases (e.g. VASP) can be executed.

  • Performance per core: The ratio of performance in giga-FLOPS (GFLOPS) to the total number of cores used by the benchmark case.
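The metric definitions above can be written out as a few small functions. This is a sketch rather than the ExaBench implementation: the function names are hypothetical, and the ideal speedup on a given number of nodes is assumed to equal that number of nodes (linear scaling), which the text does not state explicitly.

```python
def performance_gain(gflops_n, gflops_1):
    """Performance on N nodes divided by performance on a single node."""
    return gflops_n / gflops_1


def speedup_ratio(gflops_n, gflops_1, nodes):
    """Performance gain relative to the ideal speedup.

    Assumption: the ideal speedup on `nodes` nodes is `nodes` itself.
    """
    return performance_gain(gflops_n, gflops_1) / nodes


def speedup(runtime_seconds):
    """Inverse of the total runtime in seconds."""
    return 1.0 / runtime_seconds


def performance_per_core(gflops, nodes, cores_per_node):
    """Performance in GFLOPS divided by the total number of cores used."""
    return gflops / (nodes * cores_per_node)
```

As a purely illustrative example with made-up numbers (not measured results): if one node delivers 500 GFLOPS and four nodes deliver 1800 GFLOPS, the performance gain is 3.6 and the speedup ratio is 3.6 / 4 = 0.9. A speedup ratio of 1 would indicate perfect scaling under the linear-scaling assumption.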

3.3 Results

The “Results” package implements the handlers needed to store the results. When the benchmarks are executed, their results are stored and shared online in a centrally accessible and collaborative repository [33]. Readers should note that the data stored there is preliminary and raw, and so might not be accurate. Nevertheless, storing the results automatically minimizes human error. We automatically generate charts for the metrics explained above to compare the sites. Because the benchmarks may be executed multiple times on a site, each point in the charts is the average of the existing results for the specific site and configuration.
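The averaging step described above, in which repeated runs for the same site and configuration collapse into a single chart point, can be sketched as follows. The record fields (`site`, `nodes`, `runtime`) and the function name are assumptions made for illustration; the actual layout of the stored results may differ.

```python
from collections import defaultdict
from statistics import mean


def average_by_site_and_config(records, metric):
    """Average `metric` over all records sharing a (site, nodes) pair.

    `records` is a list of dictionaries with hypothetical keys
    "site", "nodes", and the metric name.
    """
    groups = defaultdict(list)
    for record in records:
        key = (record["site"], record["nodes"])
        groups[key].append(record[metric])
    return {key: mean(values) for key, values in groups.items()}


# Illustrative records with made-up site names and runtimes.
example_records = [
    {"site": "site-a", "nodes": 1, "runtime": 100.0},
    {"site": "site-a", "nodes": 1, "runtime": 110.0},
    {"site": "site-a", "nodes": 2, "runtime": 60.0},
    {"site": "site-b", "nodes": 1, "runtime": 90.0},
]
averaged = average_by_site_and_config(example_records, "runtime")
```

Here the two single-node runs on `site-a` are merged into one point, while the other configurations, each with a single run, keep their values unchanged.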

4 Example Results

Below we present example results of benchmarks performed using the ExaBench tool and available online as part of the ecosystem. For more details, readers are referred to the full explanation available in [34].

4.1 High-performance Linpack

We present a comparison of the speedup ratios for the High-performance Linpack benchmark in Fig. 2. We consider four cloud computing vendors: Amazon Web Services (AWS) with c4 (default) and c5 instance types, Microsoft Azure (AZ), Oracle Cloud (OL), and Google Compute Engine (GCE). The speedup ratio metric is obtained from the results of the HPL benchmark running on 1, 2, 4, and 8 nodes with hyper-threading disabled. As can be seen, Oracle and Microsoft Azure exhibit better scaling because of their low-latency interconnect networks.

Figure 2: Speedup Ratio vs Number of Nodes. Speedup ratios for 1, 2, 4, and 8 nodes are investigated and given by the points. Lines are drawn to guide the eye. The legend is as follows: AWS-NHT - Amazon Web Services with hyper-threading disabled; AWS-NHT-C5 - same as AWS-NHT with C5 instances; AZ-IB-H - Microsoft Azure Infiniband-interconnected H16r VMs; OL-NHT - Oracle Cloud BM.HPC2.36 instances with hyper-threading disabled; GCE-NHT-H - Google Compute Engine n1-highcpu-64 machines with hyper-threading disabled, Haswell platform.

4.2 Vienna ab-initio simulation package

A comparison of the speedups for the Vienna ab-initio simulation package (VASP) is given in Fig. 3. We show results for the test case involving parallelization over the electronic bands for a large-unit-cell material (referred to as “VASP-ELB”). We use version 5.3.5 with the corresponding set of atomic pseudopotentials. The goal of this benchmark is to estimate the extent to which a VASP calculation can be efficiently scaled in a distributed-memory execution scenario. As can be seen, the vendors with higher CPU clock speeds and low-latency interconnect networks perform better.

Figure 3: Speedup vs Number of Nodes for the Vienna ab-initio simulation package, parallelization over electronic bands. Speedups for 1, 2, 4, and 8 nodes are investigated and given by the points. Lines are drawn to guide the eye. The legend is as follows: AWS-NHT - Amazon Web Services with hyper-threading disabled; AZ-IB-H - Microsoft Azure Infiniband-interconnected H16r VMs; OL-NHT - Oracle Cloud BM.HPC2.36 instances with hyper-threading disabled; GCE-NHT-H - Google Compute Engine n1-highcpu-64 machines with hyper-threading disabled, Haswell platform.

5 Contribution

We embrace the open-source online character of the ecosystem presented here and encourage collaborative contributions. The ecosystem can be further extended in two ways: by contributing to the results or by extending the codebase.

5.1 Contributing to the results

In order to contribute to the results, one should configure and execute the benchmarks and send the results to the central database, all with the help of the ExaBench tool. Readers are referred to the comprehensive explanation of the installation, configuration, and operation of the tool available online in the corresponding GitHub repository [29].

5.2 Extending ExaBench

In order to extend the source code with new cases and metrics, it is recommended to “fork” the repository and introduce the adjustments there. The changes in the fork can then be considered for merging into the repository, as is common practice on GitHub. This process is explained in more detail elsewhere online [35] and inside the ExaBench repository itself [29].

6 Perspectives and Outlook

High-performance and parallel computing is more important today than ever due to the end of Moore’s law in conventional semiconductor technology scaling. HPC is no longer a domain of highly specialized applications only. The latter still exist and are needed, but are gradually becoming a minority. For that reason, and in order to facilitate timely and objective insights, continuous collaborative performance assessment is important today and will become more so in the future. Beyond the limited set of applications we have incorporated into the ecosystem so far, many more use cases in computational fluid dynamics, electronic design automation, drug discovery, computational chemistry, etc. can be introduced by extending the source code and contributing the results.

We envision that the ecosystem will help the community choose the optimal setup for running resource-intensive workloads and let cloud vendors improve their services in a competitive and transparent environment. Such an environment can lead to further democratization of HPC and its proliferation in industrial research and development, which in turn will accelerate progress in the corresponding industries.

7 Conclusion

In this manuscript we present an online ecosystem to study the efficiency and suitability of computing hardware for a variety of real-world high-performance computing applications. The ecosystem provides a collaborative, continuous, and transparent evaluation of cloud as well as on-premises infrastructure by automating the benchmarking process and storing the results in a centrally accessible repository available to the community. We showcase example results and explain how to contribute performance assessments for new sites and how to extend the package further by adding new benchmark cases and metrics.


  • [1] J. Napper and P. Bientinesi, “Can cloud computing reach the top500?” Proceedings of the combined workshops on UnConventional high performance computing workshop plus memory access workshop. ACM, vol. 2008, pp. 17–20, 2009.
  • [2] A. Iosup, S. Ostermann, N. Yigitbasi, R. Prodan, T. Fahringer, and D. Epema, “Performance analysis of cloud computing services for many-tasks scientific computing,” IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 6, pp. 931–945, 2011.
  • [3] K. Jackson, L. Ramakrishnan, K. Muriki, S. Canon, S. Cholia, J. Shalf, H. Wasserman, and N. Wright, “Performance analysis of high performance computing applications on the amazon web services cloud,” Proceedings of the IEEE Second International Conference on Cloud Computing Technology and Science (CloudCom 2010), pp. 159–168, 2010. [Online]. Available:
  • [4] K. Yelick et al., “The magellan report on cloud computing for science,” U.S. Department of Energy Office of Science, 2011. [Online]. Available:
  • [5] E. Deelman, G. Singh, M. Livny, B. Berriman, and J. Good, “The cost of doing science on the cloud: the montage example,” Proceedings of the 2008 ACM/IEEE conference on Supercomputing. IEEE Press, pp. 1–12, 2008.
  • [6] C. Evangelinos and C. Hill, “Cloud computing for parallel scientific hpc applications: Feasibility of running coupled atmosphere-ocean climate models on amazons ec2,” ratio, vol. 2, no. 2.40, pp. 2–34, 2008.
  • [7] K. Keahey, T. Freeman, J. Lauret, and D. Olson, “Virtual workspaces for scientific applications,” Journal of Physics: Conference Series, vol. 78, 2007.
  • [8] K. Keahey, R. Figueiredo, J. Fortes, T. Freeman, and M. Tsugawa, “Science clouds: Early experiences in cloud computing for scientific applications,” Cloud Computing and Applications, vol. 2008, 2008.
  • [9] K. Keahey, “Cloud computing for science,” Proceedings of the 21st International Conference on Scientific and Statistical Database Management. Springer-Verlag, vol. 2008, p. 478, 2009.
  • [10] J. Li, D. Agarwal, M. Humphrey, C. van Ingen, K. Jackson, and Y. Ryu, “escience in the cloud: A modis satellite data reprojection and reduction pipeline in the windows azure platform,” Proceedings of the 24th IEEE International Parallel and Distributed Processing Symposium, 2010.
  • [11] L. Ramakrishnan, K. R. Jackson, S. Canon, S. Cholia, and J. Shalf, “Defining future platform requirements for e-science clouds,” Proceedings of the ACM Symposium on Cloud Computing, 2010.
  • [12] J. Rehr, F. Vila, J. Gardner, L. Svec, and M. Prange, “Scientific computing in the cloud,” Computing in Science and Engineering, vol. 99, 2010.
  • [13] S. P. Ahuja and S. Mani, “The state of high performance computing in the cloud,” Journal of Emerging Trends in Computing and Information Sciences, vol. 3, no. 2, 2012.
  • [14] I. Sadooghi, J. Hernandez Martin, T. Li, K. Brandstatter, Y. Zhao, K. Maheshwari, T. Pais Pitta Lacerda Ruivo, S. Timm, G. Garzoglio, and I. Raicu, “Understanding the performance and potential of cloud computing for scientific applications,” IEEE Transactions on Cloud Computing, vol. 5, pp. 1–1, 01 2015.
  • [15] M. Connor, H. M. Deeks, E. Dawn, O. Metatla, A. Roudaut, M. Sutton, L. M. Thomas, B. R. Glowacki, R. Sage, P. Tew, M. Wonnacott, P. Bates, A. J. Mulholland, and D. R. Glowacki, “Sampling molecular conformations and dynamics in a multiuser virtual reality framework,” Science Advances, vol. 4, no. 6, 2018. [Online]. Available:
  • [16] J. L. Hellerstein, K. J. Kohlhoff, and D. E. Konerding, “Science in the cloud: Accelerating discovery in the 21st century,” IEEE Internet Computing, vol. 16, no. 4, pp. 64–68, July 2012.
  • [17] K. Kohlhoff, D. Shukla, M. Lawrenz, G. R Bowman, D. Konerding, D. Belov, R. Altman, and V. Pande, “Corrigendum: Cloud-based simulations on google exacycle reveal ligand modulation of gpcr activation pathways,” Nature chemistry, vol. 6, pp. 15–21, 02 2014.
  • [18] T. Bazhirov, M. Mohammadi, K. Ding, and S. Barabash, “Large-scale high-throughput computer-aided discovery of advanced materials using cloud computing,” Proceedings of the American Physical Society March Meeting 2017, 2017. [Online]. Available:
  • [19] P. Das, M. Mohammadi, and T. Bazhirov, “Accessible computational materials design with high fidelity and high throughput,” 2018.
  • [20] P. Das and T. Bazhirov, “Electronic properties of binary compounds with high throughput and high fidelity,” 2018.
  • [21] M. Mohammadi and T. Bazhirov, “Comparative benchmarking of cloud computing vendors with high performance linpack,” in Proceedings of the 2nd International Conference on High Performance Compilation, Computing and Communications, ser. HP3C. New York, NY, USA: ACM, 2018, pp. 1–5. [Online]. Available:
  • [22] T. Bazhirov, “Fast and accessible first-principles calculations of vibrational properties of materials,” 2018.
  • [23] H. Kasture and D. Sanchez, “Tailbench: a benchmark suite and evaluation methodology for latency-critical applications,” in 2016 IEEE International Symposium on Workload Characterization (IISWC), Sept 2016, pp. 1–10.
  • [24] K. Hwang, X. Bai, Y. Shi, M. Li, W. Chen, and Y. Wu, “Cloud performance modeling with benchmark evaluation of elastic scaling strategies,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 1, pp. 130–143, Jan 2016.
  • [25] J. Scheuner and P. Leitner, “A cloud benchmark suite combining micro and applications benchmarks,” in Companion of the 2018 ACM/SPEC International Conference on Performance Engineering, ser. ICPE ’18.   New York, NY, USA: ACM, 2018, pp. 161–166. [Online]. Available:
  • [26] N. R. Herbst, S. Kounev, A. Weber, and H. Groenda, “Bungee: An elasticity benchmark for self-adaptive iaas cloud environments,” in Proceedings of the 10th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, ser. SEAMS ’15.   Piscataway, NJ, USA: IEEE Press, 2015, pp. 46–56. [Online]. Available:
  • [27] “Cloudsuite, a benchmark suite for cloud services.” [Online]. Available:
  • [28] “Spec benchmarks.” [Online]. Available:
  • [29] Exabyte Benchmarks, GitHub repository. [Online]. Available:
  • [30] J. L. Hennessy and D. A. Patterson, “Computer architecture: A quantitative approach,” Morgan Kaufmann, 2003.
  • [31] “Vienna ab-initio simulation package, the official website.” [Online]. Available:
  • [32] “Groningen machine for chemical simulations (gromacs), the official website.” [Online]. Available:
  • [33] “Exabyte benchmarks suite results, google spreadsheet.” [Online]. Available:
  • [34] “Cloud-based materials modeling benchmarks.” [Online]. Available:
  • [35] GitHub Standard Fork and Pull Request Workflow. [Online]. Available: