RevDedup: A Reverse Deduplication Storage System Optimized for Reads to Latest Backups

02/04/2013
by Chun-Ho Ng, et al.

Scaling up backup storage for an ever-increasing volume of virtual machine (VM) images is a critical issue in virtualization environments. While deduplication is known to effectively eliminate duplicates for VM image storage, it also introduces fragmentation that degrades read performance. We propose RevDedup, a deduplication system that optimizes reads to the latest VM image backups using an idea called reverse deduplication. In contrast with conventional deduplication, which removes duplicates from new data, RevDedup removes duplicates from old data, thereby shifting fragmentation to old data while keeping the layout of new data as sequential as possible. We evaluate our RevDedup prototype using microbenchmarks and real-world workloads. For a 12-week span of real-world VM images from 160 users, RevDedup achieves high deduplication efficiency with around 97% saving, and high backup and read throughput on the order of 1 GB/s. RevDedup also incurs small metadata overhead in backup/read operations.
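
The idea of removing duplicates from old data rather than new data can be illustrated with a toy sketch. This is not the RevDedup implementation: the class `ToyReverseDedupStore`, its methods, the 4-byte block size, and the in-memory layout are all hypothetical choices made for illustration, assuming fixed-size blocks and comparison only against the immediately preceding backup.

```python
import hashlib

BLOCK_SIZE = 4  # toy value (assumption); real systems use far larger blocks


def split_blocks(data: bytes):
    """Cut data into fixed-size blocks."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


class ToyReverseDedupStore:
    """Hypothetical in-memory store illustrating reverse deduplication."""

    def __init__(self):
        # versions[v] is a list of entries, each either
        #   ("data", block_bytes)            -> block physically stored here
        #   ("ref", newer_version, index)    -> block lives in a newer backup
        self.versions = []

    def backup(self, data: bytes) -> int:
        blocks = split_blocks(data)
        # The new backup is always written whole, so its layout stays sequential.
        self.versions.append([("data", b) for b in blocks])
        new_id = len(self.versions) - 1
        if new_id > 0:
            # Fingerprint index over the NEW backup's blocks.
            index = {}
            for i, b in enumerate(blocks):
                index.setdefault(hashlib.sha1(b).digest(), i)
            # Remove duplicates from the OLD backup by pointing them forward.
            old = self.versions[new_id - 1]
            for j, entry in enumerate(old):
                if entry[0] == "data":
                    fp = hashlib.sha1(entry[1]).digest()
                    if fp in index:
                        old[j] = ("ref", new_id, index[fp])
        return new_id

    def read(self, version: int) -> bytes:
        out = []
        for entry in self.versions[version]:
            # Follow forward references until a stored block is reached.
            while entry[0] == "ref":
                _, v, i = entry
                entry = self.versions[v][i]
            out.append(entry[1])
        return b"".join(out)

    def stored_bytes(self, version: int) -> int:
        """Bytes physically held by one version (references cost nothing here)."""
        return sum(len(e[1]) for e in self.versions[version] if e[0] == "data")
```

In this sketch the latest backup never contains references, so reading it is a single sequential pass, while older backups accumulate forward references and therefore absorb the fragmentation.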


