Block size estimation for data partitioning in HPC applications using machine learning techniques

11/19/2022
by   Riccardo Cantini, et al.

The extensive use of HPC infrastructures and frameworks for running data-intensive applications has led to a growing interest in data partitioning techniques and strategies. Finding an effective partitioning, i.e., a suitable size for data blocks, is key to speeding up parallel data-intensive applications and increasing their scalability. This paper describes a methodology for data block size estimation in HPC applications that relies on supervised machine learning techniques. The implementation of the proposed methodology was evaluated using dislib as a testbed, a distributed computing library built on top of the PyCOMPSs framework and focused on machine learning algorithms. We assessed the effectiveness of our solution through an extensive experimental evaluation considering different algorithms, datasets, and infrastructures, including the MareNostrum 4 supercomputer. The results show that the methodology can efficiently determine a suitable way to split a given dataset, thus enabling the efficient execution of data-parallel applications in high-performance environments.
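To make the notion of block size concrete, the following is a minimal sketch in plain Python of what choosing a block size means for a dataset. It is not dislib's API and not the estimator described in the paper; the function names `block_grid` and `partition_rows` and the toy dataset are hypothetical, introduced only for illustration.

```python
from math import ceil


def block_grid(n_rows, n_cols, block_rows, block_cols):
    """Return how many blocks are needed along each axis to cover the dataset."""
    if min(n_rows, n_cols, block_rows, block_cols) <= 0:
        raise ValueError("all dimensions must be positive")
    return ceil(n_rows / block_rows), ceil(n_cols / block_cols)


def partition_rows(data, block_rows):
    """Split a list of rows into consecutive blocks of at most block_rows rows."""
    if block_rows <= 0:
        raise ValueError("block_rows must be positive")
    return [data[i:i + block_rows] for i in range(0, len(data), block_rows)]


# Hypothetical toy dataset: 10 rows, 2 columns.
rows = [[i, i * 2] for i in range(10)]

# A block size of 4 rows yields three blocks; the last one is smaller.
blocks = partition_rows(rows, 4)
grid = block_grid(n_rows=10, n_cols=2, block_rows=4, block_cols=2)
```

Smaller blocks produce more units of work that can run in parallel, while larger blocks reduce the number of units to schedule and move; the methodology summarized above aims to pick this size automatically rather than by hand.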


