Sea: A lightweight data-placement library for Big Data scientific computing

07/04/2022
by   Valérie Hayot-Sasson, et al.

The recent influx of open scientific data has contributed to the transition of scientific computing from compute-intensive to data-intensive. Although many Big Data frameworks minimize the cost of data transfers, few scientific applications integrate these frameworks or adopt data-placement strategies to mitigate these costs. Scientific applications commonly rely on well-established command-line tools that would require complete reinstrumentation to incorporate existing frameworks. We developed Sea to enable data-placement strategies for scientific applications executing on HPC clusters without the need to reinstrument workflows. Sea uses GNU C library (glibc) interception to capture POSIX-compliant file system calls made by the applications. We designed a performance model and evaluated the performance of Sea on a synthetic data-intensive application processing a representative neuroimaging dataset (the Big Brain). Our results show that Sea improves performance by up to a factor of 3×.
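To make the idea of transparent data placement concrete, the following is a minimal sketch of the path-redirection step that an interception layer could perform once it has captured a file system call. It is not Sea's implementation: the function name `redirect`, the notion of a single mount point, and the ordered list of storage tiers are all assumptions introduced here for illustration, and the actual library works at the glibc level rather than in Python.

```python
import os


def redirect(path, mount, tiers):
    """Map a path under `mount` to a concrete location in one of `tiers`.

    Hypothetical policy (an assumption, not taken from the paper):
    `tiers` is ordered fastest first. An existing file is served from the
    fastest tier that already holds it; a new file is placed on the
    fastest tier. Paths outside the mount point are returned unchanged.
    """
    path = os.path.abspath(path)
    mount = os.path.abspath(mount)

    # Leave calls that do not target the managed mount point untouched.
    if path != mount and not path.startswith(mount + os.sep):
        return path

    # Rebuild the same relative path inside every storage tier.
    rel = os.path.relpath(path, mount)
    candidates = [os.path.normpath(os.path.join(t, rel)) for t in tiers]

    # Prefer the fastest tier that already contains the file.
    for candidate in candidates:
        if os.path.exists(candidate):
            return candidate

    # Otherwise the file is new: place it on the fastest tier.
    return candidates[0]
```

An application would keep opening files under the mount point as usual, and the wrapper around each intercepted call would substitute the path returned by `redirect` before handing the call to the real file system function. This is why no change to the application itself is needed.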

research
05/29/2019

Evaluation of pilot jobs for Apache Spark applications on HPC clusters

Big Data has become prominent throughout many scientific fields and, as ...
research
12/30/2020

SDN helps Big Data to optimize access to data

This chapter introduces the state-of-the-art in the emerging area of com...
research
08/07/2018

MaRe: Container-Based Parallel Computing with Data Locality

Application containers are emerging as key components in scientific proc...
research
07/25/2018

Big Data: the End of the Scientific Method?

We argue that the boldest claims of Big Data are in need of revision and...
research
07/06/2023

Applying Process Mining on Scientific Workflows: a Case Study

Computer-based scientific experiments are becoming increasingly data-int...
research
05/10/2021

Skew-Oblivious Data Routing for Data-Intensive Applications on FPGAs with HLS

FPGAs have become emerging computing infrastructures for accelerating ap...
research
05/01/2018

SAGE: Percipient Storage for Exascale Data Centric Computing

We aim to implement a Big Data/Extreme Computing (BDEC) capable system i...
