Optimal Multi-Level Interval-based Checkpointing for Exascale Stream Processing Systems

12/16/2019
by Sachini Jayasekara, et al.

State-of-the-art stream processing platforms use checkpointing to support fault tolerance: a "checkpoint tuple" flows through the topology to all operators, marking a checkpoint and triggering a checkpoint operation. The checkpoint enables recovery from any kind of failure, be it as localized as a process fault or as widespread as power supply loss to an entire rack of machines. As we move towards Exascale computing, it is becoming clear that this kind of "single-level" checkpointing is too inefficient to scale. Some HPC researchers are now investigating multi-level checkpointing, where the checkpoint operation at each level is tailored to specific kinds of failure, to address the inefficiencies of single-level checkpointing. Multi-level checkpointing has been shown in practice to operate more efficiently than single-level checkpointing. However, to date there is no theoretical basis that provides optimal parameter settings for an interval-based coordinated multi-level checkpointing approach. This paper presents a theoretical framework for determining optimal parameter settings in an interval-based multi-level periodic checkpointing system that is applicable to stream processing. Our approach is stochastic: at each checkpoint interval, a level is selected for checkpointing with some probability. We derive the optimal checkpoint interval and the associated optimal checkpoint probabilities for a multi-level checkpointing system, taking into account failure rates, checkpoint costs, restart costs, and possible failure during restart at every level. We confirm our results with stochastic simulation and practical experimentation.
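The stochastic level selection described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the fixed interval, and the example probabilities are hypothetical, and the sketch assumes exactly one level is chosen at each interval boundary. It only shows the scheduling idea; deriving the optimal interval and probabilities is what the paper contributes.

```python
import random


def pick_checkpoint_level(probabilities, rng):
    """Pick one checkpoint level index according to per-level probabilities.

    `probabilities` is a hypothetical list such as [0.7, 0.2, 0.1] whose
    entries must sum to 1 (assumption: one level is taken per interval).
    """
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("level probabilities must sum to 1")
    levels = range(len(probabilities))
    return rng.choices(levels, weights=probabilities, k=1)[0]


def checkpoint_schedule(interval, probabilities, horizon, seed=0):
    """Return (time, level) pairs for checkpoints taken every `interval`
    time units, up to and including `horizon`."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    count = int(horizon // interval)
    return [
        ((k + 1) * interval, pick_checkpoint_level(probabilities, rng))
        for k in range(count)
    ]


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    for time, level in checkpoint_schedule(10.0, [0.7, 0.2, 0.1], 50.0):
        print(f"t={time:5.1f}  checkpoint at level {level}")
```

In a scheme like this, a cheaper level would typically be given a higher probability and a costlier level, covering rarer and wider failures, a lower one.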


