Reliable Actors with Retry Orchestration

11/22/2021
by Olivier Tardieu, et al.

Enterprise cloud developers must build applications that are resilient to failures and interruptions. We advocate for, formalize, implement, and evaluate a simple yet effective fault-tolerant programming model for the cloud based on actors, reliable message delivery, and retry orchestration. Our model simultaneously guarantees that (1) failed actor invocations are retried until success and (2) a strict happens-before relationship is preserved across failures within each distributed chain of invocations and retries. These guarantees make it possible to productively develop fault-tolerant distributed applications leveraging cloud services, ranging from classic problems of concurrency theory to enterprise applications. Built as a service mesh, our runtime can compose application components written in any programming language and scale with the application. We measure overhead relative to reliable message queues. Using an application inspired by a typical enterprise scenario, we assess fault tolerance and the impact of fault recovery on performance.
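
The two guarantees can be illustrated with a minimal Python sketch. It is not the paper's runtime: it runs in a single process and keeps everything in memory, whereas the model described above relies on reliable message delivery. The names `TransientFailure`, `flaky`, and `run_chain` are hypothetical and chosen only for this example. Each invocation in a chain is retried until it succeeds, and the next invocation starts only after the previous one has completed, so ordering within the chain survives failures.

```python
from collections import deque
from typing import Callable, List, Tuple


class TransientFailure(Exception):
    """Hypothetical stand-in for an interrupted actor invocation."""


def flaky(name: str, failures: int) -> Callable[[], str]:
    """Build an invocation that fails `failures` times, then returns `name`."""
    remaining = {"n": failures}

    def invoke() -> str:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise TransientFailure(name)
        return name

    return invoke


def run_chain(steps: List[Tuple[str, Callable[[], str]]]) -> List[str]:
    """Run a chain of invocations in order, retrying each until success."""
    log: List[str] = []
    pending = deque(steps)
    while pending:
        name, invoke = pending[0]  # stays at the head until it succeeds
        try:
            result = invoke()
        except TransientFailure:
            log.append(f"retry {name}")
            continue  # retry the same invocation; later steps wait
        log.append(f"done {result}")
        pending.popleft()  # only now may the next step begin
    return log


if __name__ == "__main__":
    chain = [("debit", flaky("debit", 2)), ("credit", flaky("credit", 0))]
    for entry in run_chain(chain):
        print(entry)
```

In this sketch the `credit` step never runs before `debit` has succeeded, even though `debit` fails twice along the way. A real system would also need the pending request to survive process crashes, which this in-memory queue does not attempt.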

research
02/25/2019

Reliable State Machines: A Framework for Programming Reliable Cloud Services

Building reliable applications for the cloud is challenging because of u...
research
01/10/2020

Fault Tolerance for Service Function Chains

Traffic in enterprise networks typically traverses a sequence of middleb...
research
10/31/2021

RRFT: A Rank-Based Resource Aware Fault Tolerant Strategy for Cloud Platforms

The applications that are deployed in the cloud to provide services to t...
research
12/28/2017

Reliable Messaging to Millions of Users with MigratoryData

Web-based notification services are used by a large range of businesses ...
research
10/21/2021

Model-based Reinforcement Learning for Service Mesh Fault Resiliency in a Web Application-level

Microservice-based architectures enable different aspects of web applica...
research
06/23/2023

Geometric Fault-Tolerant Control of Quadrotors in Case of Rotor Failures: An Attitude Based Comparative Study

The ability of aerial robots to operate in the presence of failures is c...
research
02/07/2018

Partisan: Enabling Cloud-Scale Erlang Applications

In this work, we present an alternative distribution layer for Erlang, n...
