Koji: Automating pipelines with mixed-semantics data sources

12/02/2018
by Petar Maymounkov, et al.

We propose a new result-oriented semantic for defining data-processing workflows that manipulate data in different semantic forms (files or services) in a unified manner. This approach lets users define workflows for a wide variety of reproducible data-processing tasks in a simple declarative manner that focuses on application-level results, while automating all control-plane considerations (such as failure recovery without loss of progress, and computation reuse) behind the scenes. The uniform treatment of files and services as data enables easy integration with existing data sources (e.g. data acquisition APIs) and data sinks (e.g. database services), whereas the focus on containers as transformations enables reuse of existing data-processing systems. We describe a declarative configuration mechanism, which can be viewed as an intermediate representation (IR) of reproducible data-processing pipelines, in the same spirit as, for instance, TensorFlow and ONNX use IRs to define tensor-processing pipelines.
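
As a rough illustration of what "result-oriented" and "computation reuse" can mean in practice, the following is a minimal sketch, not Koji's actual configuration format or API. All names here (`Step`, `Pipeline`, `result`, `build_pipeline`) are hypothetical. It assumes each step is a pure function of its inputs, uses an in-memory dictionary in place of durable storage, and uses plain Python functions in place of containers.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Step:
    """Hypothetical declared transformation: a name, its dependencies, a function."""
    name: str
    inputs: Tuple[str, ...]
    run: Callable[..., str]


@dataclass
class Pipeline:
    """Hypothetical pipeline: the user asks for a result, not for an execution order."""
    steps: Dict[str, Step]
    cache: Dict[str, str] = field(default_factory=dict)  # stand-in for durable storage
    executions: List[str] = field(default_factory=list)  # steps that actually ran

    def _key(self, name: str, input_values: List[str]) -> str:
        # Assumption: a result is identified by the step name and its input values.
        payload = json.dumps({"step": name, "inputs": input_values})
        return hashlib.sha256(payload.encode()).hexdigest()

    def result(self, name: str) -> str:
        step = self.steps[name]
        # Resolve dependencies first; each one is itself served from the cache if possible.
        input_values = [self.result(dep) for dep in step.inputs]
        key = self._key(name, input_values)
        if key not in self.cache:
            self.executions.append(name)
            self.cache[key] = step.run(*input_values)
        return self.cache[key]


def build_pipeline() -> Pipeline:
    steps = [
        Step("fetch", (), lambda: "a,b,c"),
        Step("clean", ("fetch",), lambda raw: raw.upper()),
        Step("report", ("clean",), lambda rows: f"{len(rows.split(','))} rows"),
    ]
    return Pipeline({s.name: s for s in steps})


if __name__ == "__main__":
    pipeline = build_pipeline()
    print(pipeline.result("report"))
    print(pipeline.executions)
```

Requesting `report` a second time runs no step again, because every intermediate result is already in the cache. In the same way, if a later step raised an exception, the results of the steps that had already completed would remain cached, which is one simple reading of "failure recovery without loss of progress".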

