Pilot-Streaming: A Stream Processing Framework for High-Performance Computing

01/26/2018
by Andre Luckow, et al.

An increasing number of scientific applications rely on stream processing to generate timely insights from the data feeds of scientific instruments, simulations, and Internet-of-Things (IoT) sensors. Developing streaming applications is a complex task that requires integrating heterogeneous, distributed infrastructure, frameworks, middleware, and application components. Different application components are often written in different languages using different abstractions and frameworks. Additional components, such as a message broker (e.g., Kafka), are often required to decouple data production from consumption and to avoid issues such as back-pressure. Streaming applications can be extremely dynamic: data rates vary with the data source, adaptive sampling techniques, or network congestion, and processing loads vary with the machine learning algorithms in use. As a result, application-level resource management that can respond to changes in any of these factors is critical. We propose Pilot-Streaming, a framework for supporting streaming frameworks, applications, and their resource management needs on HPC infrastructure. Pilot-Streaming is based on the Pilot-Job concept and enables developers to manage distributed computing and data resources for complex streaming applications. It enables applications to respond dynamically to changing resource requirements by adding or removing resources at runtime. This capability is critical for balancing complex streaming pipelines. To address the complexity of developing and characterizing streaming applications, we present the Streaming Mini-App framework, which supports different pluggable algorithms for data generation and processing, e.g., for reconstructing light source images using different techniques. We utilize the Mini-App framework to conduct an evaluation of Pilot-Streaming.
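
To make the idea of application-level resource management concrete, the following is a minimal sketch of the kind of decision an application might take when the incoming data rate changes. It is not the Pilot-Streaming API: the function names, the `headroom` parameter, and the assumption that throughput scales linearly with node count are all hypothetical and chosen only for illustration.

```python
import math


def nodes_needed(incoming_rate, per_node_rate, headroom=0.25):
    """Return the smallest node count that covers the incoming rate plus headroom.

    Hypothetical model: every node processes `per_node_rate` messages/s and
    throughput scales linearly with the number of nodes.
    """
    if per_node_rate <= 0:
        raise ValueError("per_node_rate must be positive")
    target = incoming_rate * (1 + headroom)
    # Always keep at least one node so the pipeline stays alive.
    return max(1, math.ceil(target / per_node_rate))


def scaling_delta(current_nodes, incoming_rate, per_node_rate, headroom=0.25):
    """Positive result: nodes to add. Negative result: nodes to remove."""
    return nodes_needed(incoming_rate, per_node_rate, headroom) - current_nodes
```

An application could evaluate `scaling_delta` periodically against the observed message rate and request or release resources accordingly; how those resources are actually acquired on an HPC system is exactly the part a framework such as Pilot-Streaming is meant to handle.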


