DaphneSched: A Scheduler for Integrated Data Analysis Pipelines
DAPHNE is a new open-source software infrastructure designed to address the increasing demands of integrated data analysis (IDA) pipelines, which comprise data management (DM), high-performance computing (HPC), and machine learning (ML) systems. Efficiently executing IDA pipelines is challenging because of their diverse computing characteristics and demands. IDA pipelines executed with the DAPHNE infrastructure therefore require an efficient and versatile scheduler. This work introduces DaphneSched, the task-based scheduler at the core of DAPHNE. DaphneSched achieves its versatility by incorporating eleven task partitioning and three task assignment techniques, bringing the state of the art closer to the state of practice in task scheduling. To showcase DaphneSched's effectiveness in scheduling IDA pipelines, we evaluate its performance on two applications: a product recommendation system and the training of a linear regression model. We conduct performance experiments on multicore platforms with 20 and 56 cores. The results show that the versatility of DaphneSched enables combinations of scheduling strategies that outperform commonly used scheduling techniques by up to 13%, supporting the efficient execution of applications with IDA pipelines.