Comparative benchmarking of cloud computing vendors with High Performance Linpack

02/09/2017 · by Mohammad Mohammadi, et al.

We present a comparative analysis of the maximum performance achieved by the Linpack benchmark on compute intensive hardware publicly available from multiple cloud providers. We study both performance within a single compute node, and speedup for distributed memory calculations with up to 32 nodes or at least 512 computing cores. We distinguish between hyper-threaded and non-hyper-threaded scenarios and estimate the performance per single computing core. We also compare results with a traditional supercomputing system for reference. Our findings provide a way to rank the cloud providers and demonstrate the viability of the cloud for high performance computing applications.


1 Introduction

During the last decade, cloud computing has established itself as a viable alternative to on-premises hardware for mission-critical applications in multiple areas [2, 3, 4]. For high performance computing (HPC) workloads, which traditionally required large and cost-intensive hardware procurement, the feasibility and advantages of cloud computing are still debated. In particular, it is often questioned whether software applications that require distributed memory can be run efficiently on "commodity" compute infrastructure publicly available from cloud computing vendors [5].

Several studies reported on the poor applicability of cloud-based environments for scientific computing. Multiple research groups ran both standard benchmark suites such as Linpack and NAS [6, 7, 8], and network performance tests [9]. In [10] the cost of solving a system of linear equations was found to increase exponentially with the problem size, suggesting that the cloud was not yet mature enough for such workloads. A study of the impact of virtualization on network performance reported significant throughput instability and abnormal delay variations [9]. An empirical performance evaluation attempted in [11] found that while cloud computing services are insufficient for scientific applications at large, they may still be a good solution for scientists who need resources instantly and temporarily. The performance of a set of typical scientific supercomputing workloads on Amazon EC2 was also found to be lower than for traditional HPC systems in [5, 12].

Some prior studies took a positive view of cloud computing. The performance of selected bio-informatics and astronomy software was examined, and the cloud was found to provide a feasible, cost-effective model in [13, 14]. In [15] it was found that the cloud is capable of supporting responsive, on-demand, small-sized HPC applications. The evaluation of micro-benchmarks, kernels, and e-Science workloads in [7] found low performance and reliability, but noted potential applicability for scientists who need resources immediately and temporarily. The costs and challenges associated with running a diverse set of science applications on the cloud were studied and found to hold promise in [16, 17, 18, 19, 20]. In [21] it was shown that Amazon Elastic Compute Cloud is a feasible platform for applications that do not require advanced network performance. A general review of the field of HPC applications and their state in cloud computing was also conducted in [22].

Recent advancements have made it possible to access large-scale computational resources completely on-demand in a rapid and efficient manner. When combined with high fidelity simulations, they can serve as an alternative pathway to enable computational discovery and design of new materials through high-throughput screening. At Exabyte Inc. we have previously demonstrated this with a case study involving high-throughput screening of structural alloys using modeling tools rooted in first-principles quantum mechanical techniques [23]. During an example run we were able to scale to 10,656 computing cores within 7 minutes from the start. This motivated the need for further benchmarking. In order to address the concerns about the specificity of the materials simulation techniques employed during the aforementioned case study we decided to use a more general tool for the purpose of our current analysis.

In this work we benchmark the performance of publicly available cloud computing hardware with High Performance Linpack [24, 1], the benchmark that has been used over the last two decades to rank the top supercomputing systems on the global scale [24]. We compare 4 cloud computing vendors, and include results for a traditional supercomputer (number 60 on the top500.org list at the moment of this writing [25]). Our findings demonstrate that the best-in-class cloud computing options can already deliver similar scaling patterns and match, if not exceed, the performance per core of more traditional high performance computing systems.

2 Methodology

The benchmarking presented in this article is done with High Performance Linpack (HPL). The program solves a random system of linear equations, represented by a dense matrix, in double precision (64-bit) arithmetic on distributed-memory computers. It does so through a two-dimensional block-cyclic data distribution and a right-looking variant of the LU factorization with row partial pivoting. It is a portable and freely available software package. HPL provides testing and timing means to quantify the accuracy of the obtained solution as well as the time to completion. The best achievable performance depends on a variety of factors, and the algorithm is scalable in the sense that its parallel efficiency is kept constant with respect to per-processor memory usage. Readers may consult the following references for more information: [1, 26, 24, 27].

Below we present the content of an example input file for the HPL benchmark suite. Ns is the matrix size for the underlying system of linear equations; Ps and Qs are the process grid dimensions. These parameters are changed for each reported case based on the number of cores and the amount of memory used. To achieve optimal performance, the largest problem size that fits in memory should be selected; the amount of memory used by HPL depends on the size of the coefficient matrix. The logic behind choosing the input parameters can be illustrated as follows. With 4 nodes of 256 MB of memory each, there is 1 GB in total, enough to hold about 134 million double-precision (8-byte) elements; the square root of this number is roughly 11585. Since memory must be left for the operating system and other system processes, a problem size of 10000 would be a good fit. Ps and Qs depend on the physical interconnection network; as a rule of thumb, P and Q are taken to be approximately equal, with Q slightly larger than P.

    HPL.out      output file name
    6            device out
    1            # of problems
    456768       Ns
    1            # of NBs
    192          NBs
    1            PMAP process mapping
    1            # of process grids (P x Q)
    32           Ps
    36           Qs
    16.0         threshold
    1            # of panel fact
    1            PFACTs
    1            # of recursive stopping criterion
    4            NBMINs
    1            # of panels in recursion
    2            NDIVs
    1            # of recursive panel facts
    1            RFACTs
    1            # of broadcast
    6            BCASTs
    1            # of lookahead depths
    0            DEPTHs
    0            SWAP
    1            swapping threshold
    1            L1
    1            U
    0            Equilibration
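
The sizing rule described above can be sketched in a few lines of Python. The 20% memory reserve and the choice of NB = 192 (matching the NBs value in the input file above) are illustrative assumptions, not values prescribed by HPL:

```python
import math

def hpl_problem_size(nodes, mem_per_node_gb, reserve_fraction=0.2):
    """Estimate the largest HPL problem size N that fits in aggregate memory.

    The coefficient matrix holds N*N double-precision (8-byte) elements,
    so N is roughly sqrt(total_bytes / 8), reduced by a safety margin for
    the operating system and other system processes. The 20% default
    margin is an illustrative assumption.
    """
    total_bytes = nodes * mem_per_node_gb * 2**30
    n_max = math.sqrt(total_bytes / 8)
    n = int(n_max * (1 - reserve_fraction))
    # HPL performs best when N is a multiple of the block size NB.
    nb = 192
    return (n // nb) * nb

# The example from the text: 4 nodes with 256 MB (0.25 GB) each.
print(hpl_problem_size(4, 0.25))  # -> 9216, in line with the ~10000 above
```
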

3 Results

We present the cloud server instance types and hardware specifications for all studied cases in Table I. We chose the highest-performing servers available in an on-demand fashion. Most of the compute servers have 16 physical cores, and all have at least 2 GB of random access memory per core. The network options differ considerably, ranging from 1 to 54 gigabit per second in bandwidth. We also provide metrics for the traditional supercomputing system used as reference [25].

    Provider     Instance        Cores  Freq.  RAM   Net
    AWS-*        c4.8xlarge      18     2.9    60    10
    Azure-AZ     Standard_F16s   16     2.4    32    10
    Azure-IB-A   A9              16     2.6    112   32
    Azure-IB-H   H16             16     3.2    112   54
    SoftLayer    Virtual         16     2.0    64    1
    Rackspace    Compute1-60     16     2.8    60    5
    NERSC        Edison          24     2.4    64    64
Table I: Hardware specification for the compute nodes used during benchmarking. Core count for physical computing cores and processor frequency, in GHz, are given together with Memory (RAM) size, in gigabytes, and network bandwidth in gigabit-per-second [28, 29, 30, 31].
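
The Rpeak figures reported in the tables below can be approximated from the specifications in Table I as cores × clock × FLOPs per cycle. The following sketch assumes 16 double-precision FLOPs per cycle (an AVX2 core performing two 4-wide fused multiply-adds per cycle); this is an assumption about the hardware, not a figure taken from this work:

```python
def rpeak_tflops(cores, freq_ghz, flops_per_cycle=16):
    """Theoretical peak performance in TFLOPS.

    flops_per_cycle=16 assumes an AVX2 core doing two 4-wide
    double-precision FMAs per cycle (an assumption, not a value
    from the benchmark tables).
    """
    return cores * freq_ghz * flops_per_cycle / 1000.0

# One c4.8xlarge node: 18 physical cores at 2.9 GHz.
print(round(rpeak_tflops(18, 2.9), 2))  # -> 0.84, close to the 0.82 below
```
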

3.1 Amazon Web Services

For Amazon Web Services (AWS) we study 3 different scenarios: the default hyper-threaded mode, the non-hyper-threaded mode, and the non-hyper-threaded mode with the placement group option enabled. The c4.8xlarge instance type is used in all cases.

3.1.1 Hyper-threaded regime

Table II shows the results for AWS instances with hyper-threading enabled (default regime). It can be seen that the ratio of absolute speedup to the number of nodes rapidly decreases as the node count increases.

Nodes Cores Rmax (TFLOPS) Rpeak (TFLOPS) Speedup
1 36 0.53 1.63 1.00
2 72 0.98 3.26 1.85
4 144 1.51 6.53 2.87
8 288 2.90 13.05 5.50
16 576 5.23 26.10 9.92
32 1152 8.65 52.20 16.41
Table II: [AWS] Results for Amazon Web Services c4.8xlarge instances with hyperthreading enabled (default scenario). Core count is given for virtual (hyper-threaded) computing cores. Numbers of computing nodes (Nodes) and total computing cores (Cores) are given together with the maximum achieved (Rmax) and peak (Rpeak) performance indicators, and the absolute achieved speedup (Speedup). It can be seen that the ratio of absolute speedup to the number of nodes falls rapidly as the number of nodes is increased.
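
The speedup and efficiency columns discussed throughout this section can be reproduced from the Rmax values alone, as in the following sketch (the published speedup figures may differ in the last digit, since they were presumably computed before rounding Rmax):

```python
def scaling_metrics(nodes, rmax_tflops):
    """Compute absolute speedup and parallel efficiency from measured Rmax.

    Speedup is relative to the single-node run; efficiency is the
    speedup divided by the node count.
    """
    base = rmax_tflops[0]
    rows = []
    for n, r in zip(nodes, rmax_tflops):
        speedup = r / base
        rows.append((n, round(speedup, 2), round(speedup / n, 2)))
    return rows

# Rmax values from Table II (AWS, hyper-threaded regime).
for n, s, e in scaling_metrics([1, 2, 4, 8, 16, 32],
                               [0.53, 0.98, 1.51, 2.90, 5.23, 8.65]):
    print(n, s, e)
```

The last column makes the degradation quantitative: the 32-node efficiency is about 0.51, i.e. half of the ideal linear scaling.
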

3.1.2 Non-hyper-threaded regime

Table III shows the results for AWS with hyper-threading disabled. In this regime only the 18 physical cores (out of 36 virtual) were used to run the benchmark, and each core was able to boost to its turbo frequency [32]. It can be seen that the ratio of absolute speedup to the number of nodes still degrades rapidly as the node count increases.

Nodes Cores Rmax (TFLOPS) Rpeak (TFLOPS) Speedup
1 18 0.64 0.82 1.00
2 36 1.14 1.63 1.77
4 72 1.94 3.26 3.02
8 144 3.51 6.53 5.47
16 288 5.59 13.05 8.71
32 576 10.68 26.10 16.65
Table III: [AWS-NHT] Results for Amazon Web Services c4.8xlarge instances with hyper-threading disabled. Core count is given for physical (non-hyper-threaded) computing cores. Numbers of computing nodes (Nodes) and total computing cores (Cores) are given together with the maximum achieved (Rmax) and peak (Rpeak) performance indicators, and the absolute achieved speedup (Speedup).

3.1.3 Non-hyper-threaded regime with placement groups

Table IV shows the HPL benchmark results with hyper-threading disabled and placement group option enabled. A placement group is a logical grouping of instances within a single availability zone, recommended for applications that benefit from low network latency, high network throughput, or both [33]. It can be seen, however, that the ratio of absolute speedup to the number of nodes shows marginal differences with respect to the previous scenario, where placement group option was not used.

Nodes Cores Rmax (TFLOPS) Rpeak (TFLOPS) Speedup
1 18 0.62 0.82 1.00
2 36 1.14 1.63 1.82
4 72 1.97 3.26 3.15
8 144 3.51 6.53 5.61
16 288 5.70 13.05 9.12
32 576 10.74 26.10 17.18
Table IV: [AWS-NHT-PG] Results for Amazon Web Services c4.8xlarge instances with hyper-threading disabled and with placement group option enabled at provision time. Core count is given for physical (non-hyper-threaded) computing cores. Numbers of computing nodes (Nodes) and total computing cores (Cores) are given together with the maximum achieved (Rmax) and peak (Rpeak) performance indicators, and the absolute achieved speedup (Speedup).

3.2 Microsoft Azure

3.2.1 F-series

Table V shows the HPL benchmark results for Azure Standard_F16 instances. Although the overall performance degradation with increased node count is evident, it is less severe than for AWS. The absolute performance, however, is lower.

Nodes Cores Rmax (TFLOPS) Rpeak (TFLOPS) Speedup
1 16 0.48 0.6 1.00
2 32 0.87 1.2 1.82
4 64 1.49 2.4 3.14
8 128 3.04 4.8 6.38
16 256 5.33 9.6 11.18
32 512 10.53 19.2 22.11
Table V: [AZ-F] Results for Azure F-series instances. Core count is given for physical (non-hyper-threaded) computing cores. Numbers of computing nodes (Nodes) and total computing cores (Cores) are given together with the maximum achieved (Rmax) and peak (Rpeak) performance indicators, and the absolute achieved speedup (Speedup).

3.2.2 A-series

Table VI shows the HPL benchmark results for Azure Standard_A9 instances using the InfiniBand interconnect network. The low-latency interconnect clearly improves the scaling, increasing the speedup ratio from about 0.5 to about 0.9 at 32 compute nodes. The absolute performance figures, however, are still better for AWS due to the higher processor clock speed.

Nodes Cores Rmax (TFLOPS) Rpeak (TFLOPS) Speedup
1 16 0.30 0.65 1.00
2 32 0.58 1.3 1.95
4 64 1.16 2.6 3.91
8 128 2.25 5.2 7.56
16 256 4.42 10.4 14.88
32 512 8.59 20.8 28.94
Table VI: [AZ-A] Results for Azure A-series instances with Infiniband [34] interconnect network. Core count is given for physical (non-hyper-threaded) computing cores. Numbers of computing nodes (Nodes) and total computing cores (Cores) are given together with the maximum achieved (Rmax) and peak (Rpeak) performance indicators, and the absolute achieved speedup (Speedup).

3.2.3 H-series

Table VII shows the HPL benchmark results for Azure Standard_H16r instances using the InfiniBand interconnect network. The low-latency interconnect enables the best scaling pattern, with a sustained ratio above 0.9 in the 1-32 node (16-512 computing core) range. The absolute performance figures are the best of all cases studied, even when compared with the traditional supercomputing system used as reference.

Nodes Cores Rmax (TFLOPS) Rpeak (TFLOPS) Speedup
1 16 0.61 0.8 1.00
2 32 1.22 1.6 2.01
4 64 2.40 3.2 3.93
8 128 4.69 6.4 7.69
16 256 9.09 12.8 14.91
32 512 17.26 25.6 28.33
Table VII: [AZ-H] Results for Azure H-series instances with Infiniband [34] interconnect network. Core count is given for physical (non-hyper-threaded) computing cores. Numbers of computing nodes (Nodes) and total computing cores (Cores) are given together with the maximum achieved (Rmax) and peak (Rpeak) performance indicators, and the absolute achieved speedup (Speedup).

3.3 Rackspace

Table VIII shows the HPL benchmark results running on Rackspace Compute1-60 instances. Overall, the results are similar to AWS and Azure. A slight variation (spike) in the speedup ratio for 16 nodes can be associated with the underlying network topology of the cloud datacenter.

Nodes Cores Rmax (TFLOPS) Rpeak (TFLOPS) Speedup
1 32 0.16 0.7 1.00
2 64 0.28 1.4 1.68
4 128 0.57 2.8 3.46
8 256 0.98 5.6 5.97
16 512 2.14 11.2 13.07
32 1024 3.04 22.4 18.55
Table VIII: [RS] Results for Rackspace Compute1-60 instances. Core count is given for physical (non-hyper-threaded) computing cores. Numbers of computing nodes (Nodes) and total computing cores (Cores) are given together with the maximum achieved (Rmax) and peak (Rpeak) performance indicators, and the absolute achieved speedup (Speedup).

3.4 IBM SoftLayer

Table IX shows the HPL benchmark results running on SoftLayer virtual servers. The network quickly saturates at scale, demonstrating the worst performance out of all cases studied. The processor clock speeds are also inferior when compared to other cloud options.

Nodes Cores Rmax (TFLOPS) Rpeak (TFLOPS) Speedup
1 32 0.57 0.525 1.00
2 64 0.66 1.05 1.16
4 128 0.44 2.1 0.77
8 256 0.67 4.2 1.17
16 512 1.46 8.4 2.58
32 1024 2.46 16.8 4.33
Table IX: [SL] Results for SoftLayer virtual servers with 32 cores, 64 GB RAM and 1Gb/s bandwidth. Core count is given for physical (non-hyper-threaded) computing cores. Numbers of computing nodes (Nodes) and total computing cores (Cores) are given together with the maximum achieved (Rmax) and peak (Rpeak) performance indicators, and the absolute achieved speedup (Speedup).

3.5 NERSC

Table X shows the HPL benchmark results for the NERSC Edison supercomputer with hyper-threading enabled. Edison is a Cray XC30 with a peak performance of 2.57 PFLOPS, 133,824 compute cores, 357 terabytes of memory, and 7.56 petabytes of disk; it holds rank 60 on the top500 list of the best supercomputers at the moment of this writing [25].

Nodes Cores Rmax (TFLOPS) Rpeak (TFLOPS) Speedup
1 48 0.38 0.9 1.00
2 96 0.73 1.8 1.91
4 192 1.34 3.6 3.48
8 384 2.79 7.2 7.27
16 768 5.40 14.4 14.06
32 1536 10.44 28.8 27.17
Table X: [NERSC-E] Results for NERSC Edison supercomputer with hyper-threading enabled. Core count is given for virtual (hyper-threaded) computing cores. Numbers of computing nodes (Nodes) and total computing cores (Cores) are given together with the maximum achieved (Rmax) and peak (Rpeak) performance indicators, and the absolute achieved speedup (Speedup).

4 Discussion

In Fig. 1 we present a comparison of the speedup ratios for the scenarios described in the previous section. As can be seen, Microsoft Azure outperforms the other cloud providers because its low-latency interconnect network facilitates efficient scaling. SoftLayer has the least favorable speedup ratio at scale, again likely because of the interconnect network. AWS and Rackspace show a significant degree of parallel performance degradation, such that at 32 nodes the measured performance is about one-half of the peak value.

Figure 1: Speedup ratios (the ratio of the absolute achieved speedup to the number of nodes) against the number of nodes for all benchmarked cases. Speedup ratios for 1, 2, 4, 8, 16 and 32 nodes are shown as points; lines are drawn to guide the eye. The legend is as follows: AWS - Amazon Web Services in the default hyper-threaded regime; AWS-NHT - same, with hyper-threading disabled; AWS-NHT-PG - same, with placement group option enabled; AZ - Microsoft Azure standard F16 instances; AZ-IB-A - same provider, A9 instances; AZ-IB-H - same provider, H16 instances; RS - Rackspace compute1-60 instances; SL - IBM/SoftLayer virtual servers; NERSC - Edison computing facility of the National Energy Research Scientific Computing Center.

Fig. 2 shows a comparative plot of the performance per core in giga-FLOPS for the previously described scenarios. Microsoft Azure H-instances are the highest-performing option in this view as well (AZ-IB-H). Interestingly, although Microsoft Azure A-instances (AZ-IB-A) show better overall scaling in Fig. 1, AWS c4.8xlarge instances deliver better performance per core for up to 16 nodes, likely because of the faster processor clock speeds. The NERSC Edison supercomputer delivers a rather low performance-per-core metric, likely due to the type of processors used.

Our results demonstrate that the current generation of publicly available cloud computing systems are capable of delivering comparable, if not better, performance than the top-tier traditional high performance computing systems. This fact confirms that cloud computing is already a viable and cost-effective alternative to traditional cost-intensive supercomputing procurement. We believe that with further advancements in virtualization, such as low-overhead container technology, and future improvements in cloud datacenter hardware we may experience a large-scale migration from on-premises to cloud-based usage for high performance applications, similar to what happened with less compute-intensive workloads.

Figure 2: Performance per core in giga-FLOPS against the number of nodes for all benchmarked cases. Performance per core is obtained by dividing the maximum performance by the total number of computing cores. The legend is the same as in Fig. 1. Lines are given to guide the eye.
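
The per-core metric of Fig. 2 follows directly from the result tables, as in this short sketch using the 32-node rows of Tables VII, III, and X (labels as in the Fig. 1 legend):

```python
def gflops_per_core(rmax_tflops, total_cores):
    """Performance per core in GFLOPS: maximum achieved performance
    divided by the total number of computing cores used."""
    return rmax_tflops * 1000.0 / total_cores

# 32-node runs: (Rmax in TFLOPS, total cores). Note that the NERSC
# core count is for virtual (hyper-threaded) cores, which lowers
# its per-core figure.
runs = {
    "AZ-IB-H": (17.26, 512),
    "AWS-NHT": (10.68, 576),
    "NERSC-E": (10.44, 1536),
}
for label, (rmax, cores) in runs.items():
    print(label, round(gflops_per_core(rmax, cores), 1))
```
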

5 Conclusion

We benchmarked the performance of the best available computing hardware from public cloud providers with High Performance Linpack. We optimized the benchmark for each computing environment and evaluated the relative performance for distributed memory calculations. We found Microsoft Azure to deliver the best results, and demonstrated that the performance per single computing core on the public cloud is comparable to that of modern traditional supercomputing systems. Based on our findings, we suggest that the concept of high performance computing in the cloud is ready for widespread adoption and can provide a viable and cost-efficient alternative to capital-intensive on-premises hardware deployments.

6 Acknowledgement

The authors would like to thank Michael G. Haverty for reading the manuscript, and acknowledge support from the National Energy Research Scientific Computing Center in the form of a startup allocation.

References

  • [1] J. Dongarra, P. Luszczek, and A. Petitet, “The LINPACK benchmark: past, present and future,” Concurrency and Computation: Practice and Experience, vol. 15, no. 9, pp. 803–820, 2003. [Online]. Available: http://dx.doi.org/10.1002/cpe.728
  • [2] P. Mell and T. Grance, “The NIST definition of cloud computing,” National Institute of Standards and Technology, vol. 53, no. 6, p. 50, 2009.
  • [3] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, “A view of cloud computing,” Journal of Emerging Trends in Computing and Information Sciences, vol. 53, no. 4, pp. 50–58, 2010. [Online]. Available: http://cacm.acm.org/magazines/2010/4/81493-a-view-of-cloud-computing/fulltext
  • [4] J. W. Rittinghouse and J. F. Ransome, Cloud computing: implementation, management, and security.   CRC press, 2016.
  • [5] K. Jackson, L. Ramakrishnan, K. Muriki, S. Canon, S. Cholia, J. Shalf, H. Wasserman, and N. Wright, “Performance analysis of high performance computing applications on the amazon web services cloud,” Proceedings of the IEEE Second International Conference on Cloud Computing Technology and Science (CloudCom 2010), pp. 159–168, 2010. [Online]. Available: http://ieeexplore.ieee.org/document/5708447/
  • [6] R. Masud, “High performance computing with clouds,” Technical Report, University of Oregon, 2010.
  • [7] S. Ostermann, A. Iosup, N. Yigitbasi, R. Prodan, T. Fahringer, and D. Epema, “An early performance analysis of cloud computing services for scientific computing,” Delft University of Technology, Tech. Rep, 2008.
  • [8] E. Walker, “Benchmarking amazon ec2 for high-performance scientific computing,” USENIX Login, vol. 33, no. 5, pp. 18–23, 2008.
  • [9] G. Wang and T. E. Ng, “The impact of virtualization on network performance of amazon ec2 data center,” Proceedings of IEEE INFOCOM, 2010.
  • [10] J. Napper and P. Bientinesi, “Can cloud computing reach the top500?” Proceedings of the combined workshops on UnConventional high performance computing workshop plus memory access workshop. ACM, vol. 2008, pp. 17––20, 2009.
  • [11] A. Iosup, S. Ostermann, N. Yigitbasi, R. Prodan, T. Fahringer, and D. Epema, “Performance analysis of cloud computing services for many-tasks scientific computing,” IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 6, pp. 931–945, 2011.
  • [12] K. Yelick et al., “The magellan report on cloud computing for science,” U.S. Department of Energy Office of Science, 2011. [Online]. Available: https://science.energy.gov/~/media/ascr/pdf/program-documents/docs/Magellan_Final_Report.pdf
  • [13] S. Hazelhurst, “Scientific computing using virtual high-performance computing: a case study using the amazon elastic computing cloud,” Proceedings of the 2008 annual research conference of the South African Institute of Computer Scientists and Information Technologists on IT research in developing countries: riding the wave of technology. ACM, pp. 94–103, 2008.
  • [14] E. Deelman, G. Singh, M. Livny, B. Berriman, and J. Good, “The cost of doing science on the cloud: the montage example,” Proceedings of the 2008 ACM/IEEE conference on Supercomputing. IEEE Press, pp. 1–12, 2008.
  • [15] C. Evangelinos and C. Hill, “Cloud computing for parallel scientific hpc applications: Feasibility of running coupled atmosphere-ocean climate models on amazons ec2,” ratio, vol. 2, no. 2.40, pp. 2–34, 2008.
  • [16] K. Keahey, T. Freeman, J. Lauret, and D. Olson, “Virtual workspaces for scientific applications,” Journal of Physics: Conference Series, vol. 78, 2007.
  • [17] K. Keahey, R. Figueiredo, J. Fortes, T. Freeman, and M. Tsugawa, “Science clouds: Early experiences in cloud computing for scientific applications,” Cloud Computing and Applications, vol. 2008, 2008.
  • [18] K. Keahey, “Cloud computing for science,” Proceedings of the 21st International Conference on Scientific and Statistical Database Management. Springer-Verlag, vol. 2008, p. 478, 2009.
  • [19] J. Li, D. Agarwal, M. Humphrey, C. van Ingen, K. Jackson, and Y. Ryu, “escience in the cloud: A modis satellite data reprojection and reduction pipeline in the windows azure platform,” Proceedings of the 24th IEEE International Parallel and Distributed Processing Symposium, 2010.
  • [20] L. Ramakrishnan, K. R. Jackson, S. Canon, S. Cholia, and J. Shalf, “Defining future platform requirements for e-science clouds,” Proceedings of the ACM Symposium on Cloud Computing, 2010.
  • [21] J. Rehr, F. Vila, J. Gardner, L. Svec, and M. Prange, “Scientific computing in the cloud,” Computing in Science and Engineering, vol. 99, 2010.
  • [22] S. P. Ahuja and S. Mani, “The state of high performance computing in the cloud,” Journal of Emerging Trends in Computing and Information Sciences, vol. 3, no. 2, 2012.
  • [23] T. Bazhirov, M. Mohammadi, K. Ding, and S. Barabash, “Large-scale high-throughput computer-aided discovery of advanced materials using cloud computing,” Proceedings of the American Physical Society March Meeting 2017, 2017. [Online]. Available: http://meetings.aps.org/Meeting/MAR17/Session/C1.7
  • [24] “Linpack, top500.org webpage.” [Online]. Available: https://www.top500.org/project/linpack
  • [25] “Edison supercomputer, top500.org ranking.” [Online]. Available: https://www.top500.org/system/178443
  • [26] [Online]. Available: http://www.netlib.org/benchmark/hpl
  • [27] J. Dongarra, J. Bunch, C. Moler, and G. W. Stewart, “Linpack users guide,” 1979.
  • [28] “Amazon ec2 instance types.” [Online]. Available: https://aws.amazon.com/ec2/instance-types
  • [29] “Sizes for linux virtual machines in azure.” [Online]. Available: https://docs.microsoft.com/en-us/azure/virtual-machines/virtual-machines-linux-sizes
  • [30] “Rackspace virtual cloud server flavors.” [Online]. Available: https://developer.rackspace.com/docs/cloud-servers/v2/general-api-info/flavors/
  • [31] “Softlayer virtual servers.” [Online]. Available: http://www.softlayer.com/virtual-servers
  • [32] “Turbo boost technology.” [Online]. Available: http://www.intel.com/content/www/us/en/architecture-and-technology/turbo-boost/turbo-boost-technology.html
  • [33] “Amazon elastic compute cloud: Placement group.” [Online]. Available: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html
  • [34] G. F. Pfister, “An introduction to the infiniband architecture,” High Performance Mass Storage and Parallel I/O, vol. 42, pp. 617–632, 2001.