The Petascale DTN Project: High Performance Data Transfer for HPC Facilities

05/26/2021
by Eli Dart, et al.

The movement of large-scale data sets (tens of terabytes and larger) between high performance computing (HPC) facilities is an increasingly critical capability. A growing number of scientific collaborations rely on HPC facilities for tasks that either require large-scale data sets as input or produce them as output. To enable the transfer of these data sets as needed by the scientific community, HPC facilities must design and deploy data transfer capabilities that allow users to perform data placement at scale. This paper describes the Petascale DTN Project, an effort undertaken by four HPC facilities, which succeeded in achieving routine data transfer rates of over 1 PB/week between the facilities. We describe the design and configuration of the Data Transfer Node (DTN) clusters used for large-scale data transfers at these facilities, the software tools used, and the performance tuning that enabled this capability.
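
To put the 1 PB/week figure in network terms, the short sketch below converts a weekly data volume into the average sustained rate it implies. It is only an illustration: the function name `sustained_gbps` is hypothetical, and it assumes decimal units (1 PB = 10^15 bytes), which the abstract does not specify.

```python
# Assumption: 1 PB = 10**15 bytes (decimal convention); the abstract does not
# state which convention it uses.
SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def sustained_gbps(petabytes_per_week: float) -> float:
    """Average rate, in gigabits per second, needed to move the given volume in one week."""
    bits = petabytes_per_week * 10**15 * 8
    return bits / SECONDS_PER_WEEK / 10**9


if __name__ == "__main__":
    # 1 PB/week works out to roughly 13.2 Gbps, averaged over the whole week.
    print(f"{sustained_gbps(1.0):.1f} Gbps")
```

Under that assumption, 1 PB/week corresponds to an average of about 13.2 Gbps sustained around the clock, so any idle time during the week raises the rate needed while transfers are running.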

research
08/08/2023

NSF RESUME HPC Workshop: High-Performance Computing and Large-Scale Data Management in Service of Epidemiological Modeling

The NSF-funded Robust Epidemic Surveillance and Modeling (RESUME) projec...
research
03/30/2020

Deep-learning enhancement of large scale numerical simulations

Traditional simulations on High-Performance Computing (HPC) systems typi...
research
08/26/2017

An Assessment of Data Transfer Performance for Large-Scale Climate Data Analysis and Recommendations for the Data Infrastructure for CMIP6

We document the data transfer workflow, data transfer performance, and o...
research
10/06/2020

Conceptual and Technical Challenges for High Performance Computing

High Performance Computing (HPC) aims at providing reasonably fast compu...
research
10/13/2020

mdspan in C++: A Case Study in the Integration of Performance Portable Features into International Language Standards

Multi-dimensional arrays are ubiquitous in high-performance computing (H...
research
05/22/2019

FQL: An Extensible Feature Query Language and Toolkit on Searching Software Characteristics for HPC Applications

The amount of large-scale scientific computing software is dramatically ...
research
10/03/2022

HPC Storage Service Autotuning Using Variational-Autoencoder-Guided Asynchronous Bayesian Optimization

Distributed data storage services tailored to specific applications have...