Cybercosm: New Foundations for a Converged Science Data Ecosystem

05/22/2021
by Mark Asch, et al.
Scientific communities naturally tend to organize around data ecosystems created by the combination of their observational devices, their data repositories, and the workflows essential to carrying their research from observation to discovery. However, these legacy data ecosystems are now breaking down under the pressure of the exponential growth in the volume and velocity of these workflows, which are further complicated by the need to integrate the highly data-intensive methods of the Artificial Intelligence revolution. Enabling groundbreaking science that makes full use of this new, data-saturated research environment will require distributed systems that support dramatically improved resource sharing, workflow portability and composability, and data ecosystem convergence. The Cybercosm vision presented in this white paper describes a radically different approach to the architecture of distributed systems for data-intensive science and its application workflows. As opposed to traditional models that restrict interoperability by hiving off storage, networking, and computing resources into separate technology silos, Cybercosm defines a minimally sufficient hypervisor as a spanning layer for its data plane that virtualizes and converges the local resources of the system's nodes in a fully interoperable manner. By building on a common, universal interface into which the problems that afflict today's data-intensive workflows can be decomposed and attacked, Cybercosm aims to support scalable, portable, and composable workflows that span and merge the distributed data ecosystems that characterize leading-edge research communities today.


