Low-Latency, High-Throughput Garbage Collection (Extended Version)

10/31/2022
by Wenyu Zhao et al.

Production garbage collectors make substantial compromises in pursuit of reduced pause times, requiring far more CPU cycles and memory than prior, simpler collectors. Concurrent copying collectors such as C4, ZGC, and Shenandoah suffer from three design limitations. 1) Concurrent copying. They reclaim memory only by copying, which is inherently expensive and places high demands on memory bandwidth; concurrent copying also requires expensive read and write barriers. 2) Scalability. They depend on tracing, which in the limit and in practice does not scale. 3) Immediacy. They do not reclaim older objects promptly, incurring high memory overheads.

We present LXR, which takes a very different approach to optimizing responsiveness and throughput by minimizing concurrent collection work and its overheads. 1) LXR reclaims most memory without any copying by using the Immix heap structure, and it combats fragmentation with limited, judicious stop-the-world copying. 2) LXR uses reference counting to achieve both scalability and immediacy, promptly reclaiming young and old objects, and it uses concurrent tracing as needed to identify cyclic garbage. 3) To minimize pause times while allowing judicious copying of mature objects, LXR introduces remembered sets for reference counting and concurrent decrement processing. 4) LXR introduces a novel low-overhead write barrier that combines coalescing reference counting, concurrent tracing, and remembered-set maintenance.

The result is a collector with excellent responsiveness and throughput. On the widely used Lucene search engine with a generously sized heap, LXR delivers 6x higher throughput and 30x lower 99.9th percentile tail latency than the popular Shenandoah production collector in its default configuration.
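To make the write-barrier idea concrete, below is a minimal, hypothetical sketch of the coalescing (object-remembering) reference-counting barrier style that LXR's barrier builds on: the first write to an object in an epoch takes a slow path that logs the object and snapshots its outgoing references, so the collector can later issue decrements for the old targets and increments for the current ones; later writes take only the fast path. This is not LXR's actual barrier (which additionally folds in concurrent-tracing and remembered-set work), and the type and function names here (ObjRef, MutatorLog, write_barrier) are illustrative, not MMTk or LXR APIs.

    // Illustrative sketch only: coalescing RC object-remembering write barrier.
    type ObjRef = usize; // stand-in for an object reference / address

    #[derive(Default)]
    struct MutatorLog {
        // Objects mutated since the last collection, each paired with a
        // snapshot of its outgoing references taken when first logged.
        modified: Vec<(ObjRef, Vec<ObjRef>)>,
    }

    struct Object {
        logged: bool,        // cleared at each collection ("unlogged" state)
        fields: Vec<ObjRef>, // outgoing references
    }

    fn write_barrier(log: &mut MutatorLog, src: &mut Object, src_ref: ObjRef,
                     slot: usize, new_target: ObjRef) {
        // Slow path: only on the first write to this object in the epoch,
        // snapshot all of its outgoing references for later RC processing.
        if !src.logged {
            log.modified.push((src_ref, src.fields.clone()));
            src.logged = true;
        }
        // Fast path: a plain store.
        src.fields[slot] = new_target;
    }

    fn main() {
        let mut log = MutatorLog::default();
        let mut obj = Object { logged: false, fields: vec![0, 0] };
        write_barrier(&mut log, &mut obj, 42, 0, 7); // slow path: logs obj
        write_barrier(&mut log, &mut obj, 42, 1, 9); // fast path only
        assert_eq!(log.modified.len(), 1);           // obj logged exactly once
    }

The appeal of this design is that barrier work is proportional to the number of distinct objects mutated per epoch rather than the number of pointer stores, which keeps the common-case overhead low.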
