Cold Storage Data Archives: More Than Just a Bunch of Tapes

04/09/2019
by Bunjamin Memishi, et al.

The volume of sensor and derived data from large scientific experiments, such as earth observation programs, radio astronomy sky surveys, and high-energy physics, already exceeds the capacity of the storage hardware fabricated globally each year. Cold storage data archives are therefore the often overlooked spearheads of modern big data analytics in scientific, data-intensive application domains. While high-performance data analytics has received much attention from the research community, the growing number of problems in designing and deploying cold storage archives has received very little. In this paper, we take the first step towards bridging this gap in knowledge by presenting an analysis of four real-world cold storage archives from three different application domains. In doing so, we highlight (i) workload characteristics that differentiate these archives from traditional, performance-sensitive data analytics, (ii) design trade-offs involved in building cold storage systems for these archives, and (iii) deployment trade-offs with respect to migration to the public cloud. Based on our analysis, we discuss several other important research challenges that need to be addressed by the data management community.

research
07/26/2019

ServerMix: Tradeoffs and Challenges of Serverless Data Analytics

Serverless computing has become very popular today since it largely simp...
research
04/11/2023

An Empirical Evaluation of Columnar Storage Formats

Columnar storage is one of the core components of a modern data analytic...
research
11/04/2017

Workflow-Based Big Data Analytics in The Cloud Environment Present Research Status and Future Prospects

Workflow is a common term used to describe a systematic breakdown of tas...
research
06/17/2020

Wide-Area Data Analytics

We increasingly live in a data-driven world, with diverse kinds of data ...
research
02/21/2018

Managing and Querying Multi-versioned Documents using a Distributed Key-Value Store

We address the problem of compactly storing a large number of versions (...
research
02/21/2018

RStore: A Distributed Multi-version Document Store

We address the problem of compactly storing a large number of versions (...
research
01/23/2022

SToN: A New Fundamental Trade-off for Distributed Data Storage Systems

Locating data efficiently is a key process in every distributed data sto...
