Automating Distributed Tiered Storage Management in Cluster Computing

07/04/2019
by Herodotos Herodotou, et al.

Data-intensive platforms such as Hadoop and Spark are routinely used to process massive amounts of data residing on distributed file systems like HDFS. Increasing memory sizes and new hardware technologies (e.g., NVRAM, SSDs) have recently led to the introduction of storage tiering in such settings. However, users are now burdened with the additional complexity of managing the multiple storage tiers and the data residing on them while trying to optimize their workloads. In this paper, we develop a general framework for automatically moving data across the available storage tiers in distributed file systems. Moreover, we employ machine learning to track and predict file access patterns, and we use these predictions to decide which data to move up or down the storage tiers, and when, in order to increase system performance. Our approach uses incremental learning to refine the models dynamically as new file accesses arrive, allowing them to adapt naturally to workload changes over time. Our extensive evaluation, using realistic workloads derived from Facebook and CMU traces, compares our approach with several other policies and shows significant benefits in both workload performance and cluster efficiency.
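The abstract describes two coupled mechanisms: a model that learns file access patterns incrementally, and a policy that uses its predictions to move data up or down the tiers. The Python sketch below illustrates that loop under clearly stated assumptions; it is not the authors' implementation. It swaps in online logistic regression trained by stochastic gradient descent as the incremental learner, assumes only two tiers (a capacity-limited memory tier and a disk tier), and uses access recency and frequency as features. All names (`FileStats`, `features`, `OnlineAccessPredictor`, `process_window`, `place_in_memory`) and the toy trace are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical sketch of ML-driven tiering; not the paper's actual models or policies.

@dataclass
class FileStats:
    """Assumed per-file access history tracked by the tiering manager."""
    size_mb: float
    last_access: float = 0.0
    access_count: int = 0

def features(s: FileStats, now: float) -> List[float]:
    """Assumed features: recency of the last access and log-scaled access frequency."""
    recency = 0.0 if s.access_count == 0 else 1.0 / (1.0 + now - s.last_access)
    return [recency, math.log1p(s.access_count)]

class OnlineAccessPredictor:
    """Logistic regression refined one example at a time with SGD (incremental learning)."""

    def __init__(self, n_features: int, lr: float = 0.1) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x: List[float]) -> float:
        """Estimated probability that the file is accessed in the next window."""
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-50.0, min(50.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x: List[float], label: float) -> None:
        """One SGD step on the log loss; its gradient w.r.t. z is (p - label)."""
        err = self.predict(x) - label
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err

def process_window(files: Dict[str, FileStats], model: OnlineAccessPredictor,
                   now: float, accessed: Set[str]) -> None:
    """Refine the model with one window of observed accesses, then record them."""
    for name, s in files.items():
        # Features use the history before this window; the label is whether it was accessed.
        model.update(features(s, now), 1.0 if name in accessed else 0.0)
    for name in accessed:
        files[name].access_count += 1
        files[name].last_access = now

def place_in_memory(files: Dict[str, FileStats], model: OnlineAccessPredictor,
                    now: float, capacity_mb: float) -> Set[str]:
    """Greedily keep the files most likely to be accessed next in the memory tier."""
    ranked = sorted(files, key=lambda n: model.predict(features(files[n], now)), reverse=True)
    chosen: Set[str] = set()
    used = 0.0
    for name in ranked:
        if used + files[name].size_mb <= capacity_mb:
            chosen.add(name)  # move up to (or stay in) the memory tier
            used += files[name].size_mb
    return chosen  # every other file moves down to (or stays on) the disk tier

# Toy usage: one file is read in every window, one occasionally, one never.
files = {"hot.log": FileStats(64), "warm.csv": FileStats(128), "cold.bak": FileStats(256)}
model = OnlineAccessPredictor(n_features=2)
trace = [{"hot.log"}, {"hot.log", "warm.csv"}, {"hot.log"},
         {"hot.log"}, {"hot.log", "warm.csv"}, {"hot.log"}]
for t, accessed in enumerate(trace):
    process_window(files, model, float(t), accessed)
print(sorted(place_in_memory(files, model, now=6.0, capacity_mb=200)))  # ['hot.log', 'warm.csv']
```

Because the model is updated after every window rather than retrained from scratch, files whose accesses stop tend to see their predicted probability fall as recency decays, so hotter files can displace them from the memory tier. This is a small-scale analogue of the adaptation to workload changes that the abstract attributes to incremental learning.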

research
08/27/2019

Performance modeling of a distributed file-system

Data centers have become center of big data processing. Most programs ru...
research
01/21/2023

Auditing Lustre file system

With the increasing time, we are facing massive demand for the increasin...
research
05/28/2010

Simulation de traces réelles d'E/S disque de PC (Simulation of real PC disk I/O traces)

Under Windows operating system, existing I/O benchmarking tools does not...
research
09/16/2018

I/O Workload Management for All-Flash Datacenter Storage Systems Based on Total Cost of Ownership

Recently, the capital expenditure of flash-based Solid State Driver (SSD...
research
03/03/2023

Study on the Data Storage Technology of Mini-Airborne Radar Based on Machine Learning

The data rate of airborne radar is much higher than the wireless data tr...
research
08/20/2019

On the Diversity of Memory and Storage Technologies

The last decade has seen tremendous developments in memory and storage t...
research
12/06/2013

Towards the Framework of the File Systems Performance Evaluation Techniques and the Taxonomy of Replay Traces

This is the era of High Performance Computing (HPC). There is a great de...