Practically-Self-Stabilizing Vector Clocks in the Absence of Execution Fairness

12/21/2017
by Iosif Salem, et al.

Vector clock algorithms are basic wait-free building blocks that facilitate causal ordering of events. As wait-free algorithms, they are guaranteed to complete their operations within a finite number of steps. Stabilizing algorithms allow the system to recover after the occurrence of transient faults, such as soft errors and arbitrary violations of the assumptions according to which the system was designed to behave. We present what is, to the best of our knowledge, the first stabilizing vector clock algorithm for asynchronous crash-prone message-passing systems that can recover in a wait-free manner after the occurrence of transient faults. In these settings, it is challenging to demonstrate a finite and wait-free recovery from transient faults (in addition to communication and crash failures), to bound the message and storage sizes, to remove all stale information without blocking, and to deal with counter overflow events, which occur at different network nodes concurrently. We present an algorithm that never violates safety in the absence of transient faults and provides bounded-time recovery during fair executions that follow the last transient fault. The novelty is that, in the absence of execution fairness, the algorithm guarantees a bound on the number of times the system might violate safety, whereas existing algorithms might block forever due to the presence of both transient faults and crash failures. Since vector clocks facilitate a number of elementary synchronization building blocks (without requiring remote replica synchronization) in asynchronous systems, we believe that our analytical insights are useful for the design of other systems that cannot guarantee execution fairness.
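
For readers unfamiliar with the underlying primitive, the following is a minimal sketch of a classic vector clock and the causal-order test it supports. It is not the stabilizing algorithm of the paper: it assumes unbounded integer counters, a fixed number of processes, and no transient faults, so it handles neither counter overflow nor stale information. The names `VectorClock`, `tick`, `merge`, `happened_before`, and `concurrent` are illustrative choices, not taken from the paper.

```python
from typing import List


class VectorClock:
    """Classic (non-stabilizing) vector clock for n processes; illustrative only."""

    def __init__(self, n: int, pid: int) -> None:
        self.pid = pid          # index of the owning process (hypothetical naming)
        self.v = [0] * n        # one unbounded counter per process (an assumption)

    def tick(self) -> List[int]:
        """Record a local or send event; return a copy usable as a timestamp."""
        self.v[self.pid] += 1
        return list(self.v)

    def merge(self, received: List[int]) -> List[int]:
        """Record a receive event: entry-wise maximum, then a local increment."""
        self.v = [max(a, b) for a, b in zip(self.v, received)]
        return self.tick()


def happened_before(a: List[int], b: List[int]) -> bool:
    """True if timestamp a causally precedes timestamp b."""
    return a != b and all(x <= y for x, y in zip(a, b))


def concurrent(a: List[int], b: List[int]) -> bool:
    """True if neither timestamp causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


# Three processes; p0 sends a message to p1, then p0 performs another event.
p0, p1 = VectorClock(3, 0), VectorClock(3, 1)
e1 = p0.tick()        # send event at p0
e2 = p1.merge(e1)     # receive event at p1
e3 = p0.tick()        # later local event at p0, unrelated to e2
```

Here `e1` causally precedes `e2`, while `e2` and `e3` are concurrent. Each operation completes in a bounded number of local steps without waiting on other processes, which is the wait-free property the abstract refers to; the paper's contribution concerns keeping such guarantees with bounded counters and after transient faults.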


research
06/09/2018

Self-Stabilizing and Private Distributed Shared Atomic Memory in Seldomly Fair Message Passing Networks

We study the problem of privately emulating shared memory in message-pas...
research
07/16/2020

Soft Errors Detection and Automatic Recovery based on Replication combined with different Levels of Checkpointing

Handling faults is a growing concern in HPC. In future exascale systems,...
research
04/07/2021

Self-stabilizing Multivalued Consensus in Asynchronous Crash-prone Systems

The problem of multivalued consensus is fundamental in the area of fault...
research
07/20/2018

Self-stabilization Overhead: an Experimental Case Study on Coded Atomic Storage

We study the problem of privately emulating shared memory in message-pas...
research
10/12/2020

Self-Stabilizing Indulgent Zero-degrading Binary Consensus

Guerraoui proposed an indulgent solution for the binary consensus proble...
research
04/29/2022

FRANCIS: Fast Reaction Algorithms for Network Coordination In Switches

Distributed protocols are widely used to support network functions such ...
research
03/24/2023

On the Susceptibility of QDI Circuits to Transient Faults

By design, quasi delay-insensitive (QDI) circuits exhibit higher resilie...
