SMURF: Efficient and Scalable Metadata Access for Distributed Applications

05/29/2021
by Bing Zhang, et al.

As big data processing and analysis come to dominate the use of distributed and cloud infrastructures, the demand for distributed metadata access and transfer has grown. In many application domains, the volume of generated data exceeds petabytes, while the corresponding metadata amounts to terabytes or more. This paper proposes SMURF, a novel solution for efficient and scalable metadata access for distributed applications across wide-area networks. SMURF combines novel pipelining and concurrent transfer mechanisms with reliability, provides distributed continuum caching and prefetching strategies to mask fetch latency, and delivers scalable, high-performance metadata fetch/prefetch services in the cloud. We also study semantic locality in real trace logs, a phenomenon that metadata access prediction has so far not exploited well. Based on this observation, we implement a novel prefetch predictor and compare it with three existing state-of-the-art prefetch schemes on Yahoo! Hadoop audit traces. By caching and prefetching metadata according to observed access patterns, our continuum caching and prefetching mechanism significantly improves the local cache hit rate and reduces the average fetch latency. In a replay of approximately 20 million metadata access operations from real audit traces, our system achieved 90% accuracy in prefetch prediction and reduced the average fetch latency by 50%.
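To make the caching-and-prefetching idea concrete, here is a minimal Python sketch. It is not SMURF's predictor: the abstract does not describe how semantic locality is modeled, so this sketch simply assumes, for illustration only, that semantically related files differ only in a trailing sequence number (as with Hadoop output files such as part-00000, part-00001). The names `predict_next`, `MetadataCache`, and `replay` are hypothetical; a remote fetch is simulated by inserting a path into a local LRU cache, and the hit rate is measured over a replayed trace.

```python
import re
from collections import OrderedDict

# Assumption (illustrative only): semantically related paths differ only in a
# trailing sequence number, e.g. /logs/job1/part-00007 -> /logs/job1/part-00008.
_TRAILING_NUMBER = re.compile(r"^(.*?)(\d+)$")


def predict_next(path):
    """Hypothetical predictor: increment a trailing number, keeping zero padding.

    Returns None when the path does not end in digits.
    """
    match = _TRAILING_NUMBER.match(path)
    if match is None:
        return None
    prefix, digits = match.groups()
    return f"{prefix}{int(digits) + 1:0{len(digits)}d}"


class MetadataCache:
    """Local LRU cache of metadata entries, keyed by path."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def lookup(self, path):
        """Return True on a hit and mark the entry as most recently used."""
        if path in self._entries:
            self._entries.move_to_end(path)
            return True
        return False

    def insert(self, path):
        """Add (or refresh) an entry, evicting the least recently used one if full."""
        self._entries[path] = True
        self._entries.move_to_end(path)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)


def replay(trace, capacity, prefetch):
    """Replay a trace of metadata lookups and return the local cache hit rate."""
    cache = MetadataCache(capacity)
    hits = 0
    for path in trace:
        if cache.lookup(path):
            hits += 1
        else:
            cache.insert(path)  # stands in for a remote fetch on a miss
        if prefetch:
            guess = predict_next(path)
            if guess is not None:
                cache.insert(guess)  # speculative fetch; the path may not exist
    return hits / len(trace)


if __name__ == "__main__":
    # Toy trace: a job scanning its output files in order.
    trace = [f"/logs/job1/part-{i:05d}" for i in range(100)]
    print("no prefetch:  ", replay(trace, capacity=8, prefetch=False))
    print("with prefetch:", replay(trace, capacity=8, prefetch=True))
```

On this toy sequential scan of 100 files with a cache of 8 entries, every lookup misses without prefetching (hit rate 0.0), while the predictor turns all but the first lookup into hits (0.99). These numbers come only from the synthetic trace above, not from the paper's evaluation; real traces mix access patterns, and a useful predictor must also limit wasted prefetches for paths that never get accessed.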

12/05/2020

Optimal Caching for Low Latency in Distributed Coded Storage Systems

Erasure codes have been widely considered a promising solution to enhanc...
11/20/2022

Metadata Caching in Presto: Towards Fast Data Processing

Presto is an open-source distributed SQL query engine for OLAP, aiming f...
04/27/2018

Intermediate Data Caching Optimization for Multi-Stage and Parallel Big Data Frameworks

In the era of big data and cloud computing, large amounts of data are ge...
06/20/2023

λFS: A Scalable and Elastic Distributed File System Metadata Service using Serverless Functions

The metadata service (MDS) sits on the critical path for distributed fil...
05/03/2021

Analyzing scientific data sharing patterns for in-network data caching

The volume of data moving through a network increases with new scientifi...
01/18/2023

An NDN-Enabled Fog Radio Access Network Architecture With Distributed In-Network Caching

To meet the increasing demands of next-generation cellular networks (e.g...
03/21/2018

A Robust Fault-Tolerant and Scalable Cluster-wide Deduplication for Shared-Nothing Storage Systems

Deduplication has been largely employed in distributed storage systems t...
