Workflow-Driven Distributed Machine Learning in CHASE-CI: A Cognitive Hardware and Software Ecosystem Community Infrastructure

02/26/2019
by   Ilkay Altintas, et al.

Advances in data, computing, and networking over the last two decades have led to a shift in many application domains, which now include machine learning on big data as part of the scientific process. This shift requires new capabilities for integrated and distributed hardware and software infrastructure. This paper contributes a workflow-driven approach for dynamic data-driven application development on top of a new kind of networked cyberinfrastructure called CHASE-CI. In particular, we present: 1) the architecture for CHASE-CI, a network of distributed fast GPU appliances for machine learning and storage, managed through Kubernetes on the high-speed (10-100 Gbps) Pacific Research Platform (PRP); 2) a machine learning software containerization approach and the libraries required for turning such a network into a distributed computer for big data analysis; 3) an atmospheric science case study that can only be made scalable with an infrastructure like CHASE-CI; 4) capabilities for virtual cluster management for data communication and analysis in a dynamically scalable fashion, and for visualization across the network in specialized visualization facilities in near real-time; and 5) a step-by-step workflow and performance measurement approach that takes advantage of the dynamic architecture of the CHASE-CI network and container management infrastructure.

