The benefits of prefetching for large-scale cloud-based neuroimaging analysis workflows

08/24/2021
by Valérie Hayot-Sasson, et al.

To support the growing demands of neuroscience applications, researchers are transitioning to cloud computing for its scalable, robust and elastic infrastructure. Nevertheless, large datasets residing in object stores may incur significant data transfer overheads during workflow execution. Prefetching, a method to mitigate the cost of reading in mixed workloads, masks data transfer costs within the processing time of prior tasks. We present "Rolling Prefetch", a Python library that implements a particular form of prefetching from the AWS S3 object store, and we quantify its benefits. Rolling Prefetch extends S3Fs, a Python library exposing AWS S3 functionality via a file object, to add prefetch capabilities. In measurements of analysis performance on a 500 GB brain connectivity dataset stored on S3, we found that prefetching provides significant speed-ups of up to 1.86x, even in applications consisting entirely of data loading. The observed speed-up values are consistent with our theoretical analysis. Our results demonstrate the usefulness of prefetching for scientific data processing on cloud infrastructures and provide an implementation applicable to various application domains.
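
To illustrate the general idea of masking transfer time behind processing time, the following is a minimal sketch of a generic prefetching reader. It is not the Rolling Prefetch or S3Fs API: `prefetching_reader`, `fetch_block`, `n_blocks` and `depth` are hypothetical names chosen for this example, and the simulated fetch uses `time.sleep` in place of a real object-store read.

```python
import queue
import threading
import time


def prefetching_reader(fetch_block, n_blocks, depth=2):
    """Yield blocks 0..n_blocks-1 in order while a thread fetches ahead.

    fetch_block(i) is a hypothetical callable returning block i.
    depth bounds how many fetched blocks may wait unconsumed.
    """
    buf = queue.Queue(maxsize=depth)
    done = object()  # marker placed in the queue after the last block

    def worker():
        try:
            for i in range(n_blocks):
                buf.put((None, fetch_block(i)))  # blocks when buf is full
        except Exception as exc:  # hand the error to the consumer
            buf.put((exc, None))
            return
        buf.put((None, done))

    # Daemon thread: it will not keep the process alive if the consumer
    # stops iterating before all blocks are read.
    threading.Thread(target=worker, daemon=True).start()

    while True:
        exc, item = buf.get()
        if exc is not None:
            raise exc
        if item is done:
            return
        yield item


if __name__ == "__main__":
    def slow_fetch(i):
        time.sleep(0.05)  # stands in for a remote read
        return i

    def process(block):
        time.sleep(0.05)  # stands in for computation on the block

    start = time.perf_counter()
    for i in range(10):
        process(slow_fetch(i))
    sequential = time.perf_counter() - start

    start = time.perf_counter()
    for block in prefetching_reader(slow_fetch, 10):
        process(block)
    overlapped = time.perf_counter() - start

    print(f"sequential: {sequential:.2f}s, with prefetch: {overlapped:.2f}s")
```

In this sketch the fetch of block i+1 proceeds while block i is being processed, so the benefit depends on how fetch time and processing time compare; the bounded queue limits how much prefetched data is held in memory at once.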
