HTCondor data movement at 100 Gbps

07/08/2021
by Igor Sfiligoi, et al.

HTCondor is a major workload management system used in distributed high throughput computing (dHTC) environments, e.g., the Open Science Grid. One of its distinguishing features is native support for data movement, which allows it to operate without a shared filesystem. Coupling data handling and compute scheduling is convenient for users and allows for significant infrastructure flexibility, but it does introduce some limitations. The default HTCondor data transfer mechanism routes both input and output data through the submission node, making that node a potential bottleneck. In this document we show that, using a node equipped with a 100 Gbps network interface card (NIC), HTCondor can serve data at up to 90 Gbps, which is sufficient for most current use cases, as it would saturate the border network links of most research universities at the time of writing.
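
To put the 90 Gbps figure in more familiar terms, the following back-of-the-envelope sketch converts it to gigabytes per second and estimates how long a submission node sustaining that rate would need to serve a given volume of data. The function names and the 1 TB example volume are illustrative assumptions, not values from the paper, and the calculation uses decimal units and ignores protocol overhead.

```python
def gbps_to_gbytes_per_s(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (decimal units)."""
    return gbps / 8.0


def transfer_seconds(size_gbytes: float, gbps: float) -> float:
    """Idealized time to move size_gbytes at a sustained rate of gbps."""
    return size_gbytes / gbps_to_gbytes_per_s(gbps)


# 90 Gbps is the peak rate quoted in the abstract.
rate = gbps_to_gbytes_per_s(90.0)
# 1000 GB (1 TB) is a hypothetical data volume chosen for illustration.
one_tb = transfer_seconds(1000.0, 90.0)

print(f"{rate:.2f} GB/s")  # 11.25 GB/s
print(f"{one_tb:.1f} s")   # 88.9 s
```

Under these assumptions, a single submission node at the quoted peak rate would move roughly a terabyte of job input or output every minute and a half.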

