K-Pg: Shared State in Differential Dataflows

12/06/2018
by Frank McSherry, et al.

Many of the most popular scalable data-processing frameworks are fundamentally limited in the generality of computations they can express and efficiently execute. In particular, we observe that systems' abstractions limit their ability to share and reuse indexed state within and across computations. These limitations result in an inability to express and efficiently implement algorithms in domains where the scales of data call for them most. In this paper, we present the design and implementation of K-Pg, a data-processing framework that provides high-throughput, low-latency incremental view maintenance for a general class of iterative data-parallel computations. This class includes SQL, stratified Datalog with negation and non-monotonic aggregates, and much of graph processing. Our evaluation indicates that K-Pg's performance is either comparable to, or exceeds, that of specialized systems in multiple domains, while at the same time significantly generalizing their capabilities.


