Access Trends of In-network Cache for Scientific Data

05/11/2022
by Ruize Han, et al.
Scientific collaborations increasingly rely on large volumes of data for their work, and many of them employ tiered systems to replicate the data to their worldwide user communities. Each user in the community often selects a different subset of data for their analysis tasks; however, members of a research group often work on related research topics that require similar data objects. Thus, a significant amount of data sharing is possible. In this work, we study the access traces of a federated storage cache known as the Southern California Petabyte Scale Cache. By studying the access patterns of this caching system and its potential for network traffic reduction, we aim to explore the predictability of cache usage and the potential for more general in-network data caching. Our study shows that this distributed storage cache is able to reduce the network traffic volume by a factor of 2.35 during a part of the study period. We further show that machine learning models could predict cache utilization with an accuracy of 0.88. This demonstrates that such cache usage is predictable, which could be useful for managing complex networking resources such as in-network caching.
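
To make the "factor of 2.35" concrete, here is a minimal sketch of how a traffic reduction factor could be computed from an access trace. It assumes the factor is the total bytes requested divided by the bytes that missed the cache and had to cross the wide-area network; the abstract does not state the formula, so this definition, the `Access` record, and the toy trace are all illustrative assumptions rather than the authors' method or data.

```python
from dataclasses import dataclass


@dataclass
class Access:
    """One hypothetical trace record: bytes requested and whether the cache served it."""
    size_bytes: int
    cache_hit: bool


def traffic_reduction_factor(accesses):
    """Ratio of all requested bytes to bytes fetched over the network (misses).

    Assumed definition: (hit bytes + miss bytes) / miss bytes.
    Returns infinity when no bytes missed the cache.
    """
    total = sum(a.size_bytes for a in accesses)
    missed = sum(a.size_bytes for a in accesses if not a.cache_hit)
    if missed == 0:
        return float("inf")
    return total / missed


# Toy trace (made-up numbers): 940 bytes requested, 400 of them missed the cache.
trace = [Access(400, False), Access(400, True), Access(140, True)]
factor = traffic_reduction_factor(trace)
print(f"traffic reduction factor: {factor:.2f}")
```

Under this assumed definition, a factor of 2.35 means that for every byte pulled across the network into the cache, users read about 2.35 bytes in total.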
