DXRAM's Fault-Tolerance Mechanisms Meet High Speed I/O Devices

07/10/2018
by   Kevin Beineke, et al.

In-memory key-value stores provide consistent low-latency access to all objects, which is important for interactive large-scale applications such as social media networks or online graph analytics, and which also opens up new application areas. However, when data is stored in RAM on thousands of servers, server failures must be taken into account. Only a few in-memory key-value stores provide automatic online recovery of failed servers. The most prominent example of these systems is RAMCloud. Another system with sophisticated fault-tolerance mechanisms is DXRAM, which is optimized for small data objects. In this report, we detail the log-based remote replication process, investigate selection strategies for the reorganization of these logs, and evaluate the reorganization performance for sequential, random, Zipf, and hot-and-cold distributions in DXRAM. This is also the first time DXRAM's backup system is evaluated with high-speed I/O devices, specifically with a 56 GBit/s InfiniBand interconnect and PCI-e SSDs. Furthermore, we discuss the copyset replica distribution, which reduces the probability of data loss, and the adaptations made to the original approach for DXRAM.
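To illustrate why a copyset replica distribution can lower the probability of data loss, the following is a minimal sketch of the general copyset idea compared with independent random replica placement. It is not DXRAM's adapted scheme, which the report itself details. All function names, the node count, the replication factor, and the trial counts are assumptions chosen for illustration.

```python
import random


def build_copysets(nodes, replication_factor, permutations, rng):
    """Shuffle the nodes and cut each permutation into fixed groups of size R.

    Replicas of an object are later placed only within one such group,
    so the number of distinct replica sets stays small.
    """
    copysets = set()
    for _ in range(permutations):
        order = list(nodes)
        rng.shuffle(order)
        # Leftover nodes are skipped if the count is not a multiple of R.
        last_start = len(order) - replication_factor
        for i in range(0, last_start + 1, replication_factor):
            copysets.add(frozenset(order[i:i + replication_factor]))
    return copysets


def random_replica_sets(nodes, replication_factor, num_chunks, rng):
    """Baseline: every chunk picks its replica nodes independently."""
    return {
        frozenset(rng.sample(nodes, replication_factor))
        for _ in range(num_chunks)
    }


def loss_probability(replica_sets, nodes, failed_count, trials, rng):
    """Monte Carlo estimate of P(some replica set fails completely)."""
    losses = 0
    for _ in range(trials):
        failed = set(rng.sample(nodes, failed_count))
        # Data is lost if every node of at least one replica set failed.
        if any(replica_set <= failed for replica_set in replica_sets):
            losses += 1
    return losses / trials


if __name__ == "__main__":
    rng = random.Random(42)          # fixed seed, illustrative only
    nodes = list(range(60))          # hypothetical cluster of 60 servers
    copysets = build_copysets(nodes, 3, 1, rng)
    scattered = random_replica_sets(nodes, 3, 2000, rng)
    print("distinct copysets:", len(copysets))
    print("distinct random replica sets:", len(scattered))
    print("loss estimate, copysets:",
          loss_probability(copysets, nodes, 3, 1000, rng))
    print("loss estimate, random:",
          loss_probability(scattered, nodes, 3, 1000, rng))
```

With fewer distinct replica sets, a simultaneous failure of several servers is less likely to cover any complete set. The trade-off of the general approach is that a loss event, when it does occur, affects more data at once.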

research
10/31/2022

uBFT: Microsecond-scale BFT using Disaggregated Memory [Extended Version]

We propose uBFT, the first State Machine Replication (SMR) system to ach...
research
10/18/2020

Fault Tolerance for Remote Memory Access Programming Models

Remote Memory Access (RMA) is an emerging mechanism for programming high...
research
12/04/2021

Invalidation-Based Protocols for Replicated Datastores

Distributed in-memory datastores underpin cloud applications that run wi...
research
02/05/2020

Observations on Porting In-memory KV stores to Persistent Memory

Systems that require high-throughput and fault tolerance, such as key-va...
research
08/20/2018

Loss Data Analytics

Loss Data Analytics is an interactive, online, freely available text. Th...
research
01/23/2021

HyCoR: Fault-Tolerant Replicated Containers Based on Checkpoint and Replay

HyCoR is a fully-operational fault tolerance mechanism for multiprocesso...
research
07/04/2017

Sequential Checking: Reallocation-Free Data-Distribution Algorithm for Scale-out Storage

Using tape or optical devices for scale-out storage is one option for st...
