Sibyl: Adaptive and Extensible Data Placement in Hybrid Storage Systems Using Online Reinforcement Learning

05/15/2022
by Gagandeep Singh, et al.

Hybrid storage systems (HSS) use multiple different storage devices to provide high and scalable storage capacity at high performance. Recent research proposes various techniques that aim to accurately identify performance-critical data and place it in a "best-fit" storage device. Unfortunately, most of these techniques are rigid, which (1) limits their ability to adapt and perform well across a wide range of workloads and storage device configurations, and (2) makes it difficult for designers to extend these techniques to storage system configurations other than the one they were designed for (e.g., with a different number or different types of storage devices). We introduce Sibyl, the first technique that uses reinforcement learning for data placement in hybrid storage systems. Sibyl observes different features of the running workload as well as the storage devices to make system-aware data placement decisions. For every decision it makes, Sibyl receives a reward from the system, which it uses to evaluate the long-term performance impact of its decision and to continuously optimize its data placement policy online. We implement Sibyl on real systems with various HSS configurations. Our results show that Sibyl provides 21.6%/19.9% performance improvement in a performance-oriented/cost-oriented HSS configuration compared to the best previous data placement technique. Our evaluation using an HSS configuration with three different storage devices shows that Sibyl outperforms the state-of-the-art data placement policy by 23.9%-48.2%, while significantly reducing the system architect's burden in designing a data placement mechanism that can simultaneously incorporate three storage devices. We show that Sibyl achieves 80% of the performance of an oracle policy that has complete knowledge of future access patterns while incurring a very modest storage overhead of only 124.4 KiB.
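The observe, decide, reward, update loop described in the abstract can be illustrated with a small sketch. This is not Sibyl's implementation: the abstract does not specify its features, reward function, or learning algorithm, so the code below uses plain tabular Q-learning as a stand-in. The two-feature state (whether a page is hot, whether the fast device is full), the latency values, the eviction penalty, and the function names are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical per-access latencies in microseconds (illustrative only).
LATENCY_US = {"fast": 10.0, "slow": 100.0}
EVICTION_PENALTY_US = 500.0  # assumed cost of making room on a full fast device
ACTIONS = list(LATENCY_US)   # an action is the device that receives the page


def reward(state, action):
    """Negative service time; hot pages are assumed to be re-accessed more often."""
    is_hot, fast_full = state
    accesses = 8 if is_hot else 2
    total_us = LATENCY_US[action] * accesses
    if action == "fast" and fast_full:
        total_us += EVICTION_PENALTY_US
    return -total_us / 100.0


def next_state(rng):
    """Draw the next request's features at random (a deliberate simplification)."""
    return (rng.random() < 0.3, rng.random() < 0.5)


def train_policy(steps=20000, alpha=0.05, gamma=0.5, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> estimated long-term value
    state = next_state(rng)
    for _ in range(steps):
        # Epsilon-greedy: mostly exploit the current estimates, sometimes explore.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        r = reward(state, action)
        new_state = next_state(rng)
        # Q-learning update toward reward plus discounted best future value.
        best_next = max(q[(new_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
        state = new_state
    states = [(hot, full) for hot in (True, False) for full in (True, False)]
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in states}


if __name__ == "__main__":
    for (is_hot, fast_full), device in train_policy().items():
        print(f"hot={is_hot} fast_full={fast_full} -> {device}")
```

Under these assumed costs, the learned policy should send cold pages to the slow device only when the fast device is full, which is the kind of system-aware decision the abstract describes. Because the toy environment draws each next state independently of the action taken, the discounted term does not change which device is preferred here; in a real HSS, placement decisions affect future device state, which is where the long-term reward matters.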

05/05/2018

AutoTiering: Automatic Data Placement Manager in Multi-Tier All-Flash Datacenter

In the year of 2017, the capital expenditure of Flash-based Solid State ...

04/26/2021

In Search of Optimal Data Placement for Eliminating Write Amplification in Log-Structured Storage

Log-structured storage has been widely deployed in various domains of st...

09/24/2021

Pythia: A Customizable Hardware Prefetching Framework Using Online Reinforcement Learning

Past research has proposed numerous hardware prefetching techniques, mos...

10/17/2021

A Learning-based Approach Towards Automated Tuning of SSD Configurations

Thanks to the mature manufacturing techniques, solid-state drives (SSDs)...

06/07/2021

Balancing Garbage Collection vs I/O Amplification using hybrid Key-Value Placement in LSM-based Key-Value Stores

Key-value (KV) separation is a technique that introduces randomness in t...

06/03/2022

Understanding NVMe Zoned Namespace (ZNS) Flash SSD Storage Devices

The standardization of NVMe Zoned Namespaces (ZNS) in the NVMe 2.0 speci...

01/22/2019

Investigating 3D Printer Residual Data

The continued adoption of Additive Manufacturing technologies is raising...