Cost models for geo-distributed massively parallel streaming analytics

This report is part of the DataflowOpt project on the optimization of modern dataflows. It aims to introduce a data quality-aware cost model that jointly covers the following aspects: (1) heterogeneity in compute nodes, (2) geo-distribution, (3) massive parallelism, (4) complex DAGs, and (5) streaming applications. Such a cost model can then be leveraged to devise cost-based optimization solutions for task placement and operator configuration.
