Performance Measurements of Supercomputing and Cloud Storage Solutions

08/01/2017
by Michael Jones, et al.

Increasing amounts of data from varied sources, particularly in the fields of machine learning and graph analytics, are causing storage requirements to grow rapidly. A variety of technologies exist for storing and sharing these data, ranging from parallel file systems used by supercomputers to distributed block storage systems found in clouds. Relatively few comparative measurements exist to inform decisions about which storage systems are best suited for particular tasks. This work provides these measurements for two of the most popular storage technologies: Lustre and Amazon S3. Lustre is an open-source, high-performance, parallel file system used by many of the largest supercomputers in the world. Amazon's Simple Storage Service, or S3, is part of Amazon Web Services and provides a scalable, distributed option for storing and retrieving data from anywhere on the Internet. Parallel processing is essential for achieving high performance on modern storage systems. The performance tests span a range of parallel I/O scenarios, from single-client, single-node Amazon S3 and Lustre performance to a large-scale, multi-client test designed to demonstrate the capabilities of a modern storage appliance under heavy load. The results show that, when parallel I/O is used correctly (i.e., many simultaneous read or write processes), full network bandwidth is achievable, with measured rates ranging from 10 gigabits/s over a 10 GigE S3 connection to 0.35 terabits/s using Lustre on a 1200-port 10 GigE switch. These results demonstrate that S3 is well-suited to sharing vast quantities of data over the Internet, while Lustre is well-suited to processing large quantities of data locally.
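The abstract's central point, that aggregate throughput depends on issuing many simultaneous reads or writes, can be illustrated with a minimal sketch. This is not the benchmark used in the paper: the function names (`write_file`, `parallel_write_throughput`) are hypothetical, a local temporary directory stands in for a Lustre mount or S3 bucket, and threads are used in place of the separate processes the abstract mentions, purely to keep the example short.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def write_file(path, total_bytes, block_size):
    """Write total_bytes of zeros to path in block_size chunks; return bytes written."""
    block = b"\0" * block_size
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            chunk = block[: min(block_size, total_bytes - written)]
            f.write(chunk)
            written += len(chunk)
        # Force data to the storage device so timing is not just page-cache speed.
        f.flush()
        os.fsync(f.fileno())
    return written


def parallel_write_throughput(directory, n_workers, bytes_per_worker, block_size=1 << 20):
    """Run n_workers concurrent writers; return (total_bytes, gigabits_per_second)."""
    paths = [os.path.join(directory, f"worker_{i}.bin") for i in range(n_workers)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        sizes = list(pool.map(lambda p: write_file(p, bytes_per_worker, block_size), paths))
    elapsed = time.perf_counter() - start
    total = sum(sizes)
    # Aggregate rate across all writers, in gigabits per second.
    return total, total * 8 / elapsed / 1e9


if __name__ == "__main__":
    # Assumption: a temporary local directory stands in for the storage target.
    with tempfile.TemporaryDirectory() as d:
        for n in (1, 2, 4, 8):
            total, gbps = parallel_write_throughput(d, n, 16 << 20)
            print(f"{n} writers: {total} bytes, {gbps:.2f} Gbit/s")
```

On a single local disk the aggregate rate may not grow with the number of writers; the scaling described in the abstract applies to storage systems that can serve many clients in parallel.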


