Methods and Experiences for Developing Abstractions for Data-intensive, Scientific Applications

02/20/2020
by Andre Luckow, et al.
Developing software for scientific applications that require the integration of diverse types of computing, instruments, and data presents challenges that are distinct from those of commercial software, due to scale, heterogeneity, and the need to integrate various programming and computational models with evolving and heterogeneous infrastructure. Pervasive and effective abstractions are thus critical. Yet the process of developing abstractions for scientific applications and infrastructures is not well understood. While theory-based approaches are suited for well-defined, closed environments, they have severe limitations for designing abstractions for complex, real-world systems. The design science research (DSR) method provides the basis for designing effective systems that can handle real-world complexities. DSR consists of two complementary phases: design and evaluation. This paper applies the DSR method to the development of abstractions for scientific applications. Specifically, we address the critical problem of distributed resource management on heterogeneous infrastructure, a challenge that currently limits many scientific applications. We use the pilot-abstraction, a widely used resource management abstraction for high-performance, high-throughput, big data, and streaming applications, as a case study. We evaluate the activities of the process and extensively evaluate the artifacts using different methods, including conceptual modeling, performance characterizations, and modeling. We demonstrate the applicability of the DSR method for holistically handling the complexity of parallel and distributed computing environments, addressing important application, system, and infrastructure challenges of scientific applications. Finally, we capture our experiences and formulate lessons learned.

