Mitigating the Performance-Efficiency Tradeoff in Resilient Memory Disaggregation

10/22/2019
by   Youngmoon Lee, et al.

Memory disaggregation has received attention in recent years as a promising way to reduce the total cost of ownership (TCO) of memory in modern datacenters. However, relying on remote memory expands an application's failure domain and makes it susceptible to tail latency variations. In attempting to make disaggregated memory resilient, state-of-the-art solutions face the classic tradeoff between performance and efficiency: some double the memory overhead of disaggregation by replicating to remote memory, while many others limit performance by replicating to the local disk. We present Hydra, a configurable, erasure-coded resilience mechanism for common memory disaggregation solutions. It transparently handles uncertainties arising from remote failures, evictions, memory corruptions, and stragglers caused by network imbalance, with a significantly better performance-efficiency tradeoff than the state of the art. We design a fine-tuned data path to achieve single µs read/write latency to remote memory, develop decentralized algorithms for cluster-wide memory management, and analyze how to select parameters to mitigate independent and correlated uncertainties. Our integration of Hydra with two major memory disaggregation systems and our evaluation on a 50-machine RDMA cluster demonstrate that it achieves the best of both worlds: it improves the latency and throughput of memory-intensive applications by up to 64.78X and 20.61X, respectively, over the state-of-the-art disk backup-based solution. At the same time, it provides performance similar to that of in-memory replication with 1.6X lower memory overhead.
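
To make the efficiency claim concrete, the sketch below works through the arithmetic behind erasure-coded overhead and shows a toy recovery from a lost split. It is an illustration under stated assumptions, not Hydra's implementation: the abstract does not give the coding parameters, so the (k=8, r=2) configuration is chosen only because its 1.25X overhead is 1.6X lower than the 2X overhead of replication, and the single XOR parity split stands in for a real erasure code that tolerates more than one loss. All function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def memory_overhead(k: int, r: int) -> float:
    """Storage overhead factor when a page is kept as k data + r parity splits."""
    return (k + r) / k


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(page: bytes, k: int) -> List[bytes]:
    """Split a page into k data splits and append one XOR parity split.

    Assumption: one parity split (r = 1) is used here for simplicity.
    """
    if len(page) % k != 0:
        raise ValueError("page length must be divisible by k")
    size = len(page) // k
    splits = [page[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, splits)
    return splits + [parity]


def recover(splits: List[Optional[bytes]]) -> bytes:
    """Rebuild the page when at most one of the k + 1 splits is missing (None)."""
    missing = [i for i, s in enumerate(splits) if s is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can only repair a single missing split")
    repaired = list(splits)
    if missing:
        present = [s for s in splits if s is not None]
        # XOR of all surviving splits equals the missing one.
        repaired[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity split and concatenate the data splits.
    return b"".join(repaired[:-1])


if __name__ == "__main__":
    replication = 2.0                      # one extra full copy in remote memory
    coded = memory_overhead(8, 2)          # assumed illustrative parameters
    print(f"erasure-coded overhead: {coded}x, "
          f"savings vs replication: {replication / coded}x")

    page = bytes(range(256)) * 16          # a 4 KB page
    stored = encode(page, k=8)
    stored[3] = None                       # simulate a failed remote machine
    assert recover(stored) == page
```

The design point this illustrates is that a coded page spreads small splits across many machines, so losing one machine costs only a fraction of a page to rebuild, while the extra memory is r/k of the page rather than a full copy.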


research
11/22/2019

Effectively Prefetching Remote Memory with Leap

Memory disaggregation over RDMA can improve the performance of memory-co...
research
03/19/2022

No Provisioned Concurrency: Fast RDMA-codesigned Remote Fork for Serverless Computing

Serverless platforms essentially face a tradeoff between container start...
research
12/24/2021

Redy: Remote Dynamic Memory Cache

Redy is a cloud service that provides high performance caches using RDMA...
research
08/03/2020

Efficient Orchestration of Host and Remote Shared Memory for Memory Intensive Workloads

Since very few contributions to the development of an unified memory orc...
research
01/27/2020

Achieving Multi-Port Memory Performance on Single-Port Memory with Coding Techniques

Many performance critical systems today must rely on performance enhance...
research
04/25/2021

RDMAbox : Optimizing RDMA for Memory Intensive Workloads

We present RDMAbox, a set of low level RDMA optimizations that provide ...
research
07/15/2022

3PO: Programmed Far-Memory Prefetching for Oblivious Applications

Using memory located on remote machines, or far memory, as a swap space ...
