Enhancing MapReduce Fault Recovery Through Binocular Speculation

01/23/2019
by Huansong Fu, et al.

MapReduce speculation plays an important role in finding potential task stragglers and failures. But a tacit dichotomy exists in MapReduce due to its inherent two-phase (map and reduce) management scheme: map tasks and reduce tasks have distinctly different execution behaviors, yet reduce tasks depend on the results of map tasks. We reveal that speculation policies for fault handling in MapReduce do not recognize this dichotomy between map and reduce tasks, which leads to speculation myopia in MapReduce fault recovery. This myopia causes significant performance degradation upon network and node failures. To address the speculation myopia caused by the MapReduce dichotomy, we introduce a new scheme called binocular speculation, which helps MapReduce widen its assessment scope for speculation. As part of the scheme, we also design three component techniques: neighborhood glance, collective speculation, and speculative rollback. Our evaluation shows that, with these techniques, binocular speculation improves the coordination of the map and reduce phases and enhances the efficiency of MapReduce fault recovery.


research
04/01/2019

Smart Routing: Towards Proactive Fault-Handling in Software-Defined Networks

Software-defined networking offers numerous benefits against the legacy ...
research
09/19/2022

Rapid Recovery of Program Execution Under Power Failures for Embedded Systems with NVM

After power is switched on, recovering the interrupted program from the ...
research
05/02/2023

Multi-Task Multi-Behavior MAP-Elites

We propose Multi-Task Multi-Behavior MAP-Elites, a variant of MAP-Elites...
research
01/20/2021

Representation Evaluation Block-based Teacher-Student Network for the Industrial Quality-relevant Performance Modeling and Monitoring

Quality-relevant fault detection plays an important role in industrial p...
research
10/06/2020

WoLFRaM: Enhancing Wear-Leveling and Fault Tolerance in Resistive Memories using Programmable Address Decoders

Resistive memories have limited lifetime caused by limited write enduran...
research
03/04/2020

QED: using Quality-Environment-Diversity to evolve resilient robot swarms

In swarm robotics, any of the robots in a swarm may be affected by diffe...
research
12/21/2017

Fault Localization in Large-Scale Network Policy Deployment

The recent advances in network management automation and Software-Define...
