Data accounting and error counting

01/29/2023
by Michał J. Gajda, et al.

Can we infer the sources of errors from the outputs of complex data analytics software? Bidirectional programming promises that we can reverse the flow of software and translate corrections of the output into corrections of either the input or the data analysis. This would allow us to achieve the holy grail of automated approaches to debugging, risk reporting, and large-scale distributed error tracking. Since the processing of risk reports and data analysis pipelines can frequently be expressed as a sequence of relational algebra operations, we propose replacing this traditional approach with a data summarization algebra that helps to determine the impact of errors. It works by defining data analysis as a necessarily complete summarization of a dataset, possibly in multiple ways along multiple dimensions. We also present a description to better communicate how complete summarizations of the input data may facilitate easier debugging and more efficient development of analysis pipelines. This approach can also be described as a generalization of axiomatic theories of accounting to data analytics, and is thus dubbed data accounting. We also propose formal properties that allow for transparent assertions about the impact of individual records on the aggregated data, and that ease debugging by making it possible to find minimal changes that alter the behaviour of the data analysis on a per-record basis.
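
To make the idea of a complete summarization concrete, here is a minimal Python sketch. It is not the algebra proposed in the paper: the record fields, the `<unclassified>` bucket, and the helper names `summarize`, `is_complete`, and `impact` are all assumptions made for illustration. It only shows the accounting-style invariant that every record lands in exactly one bucket, so the bucket totals always add up to the grand total, and how the effect of a single record on the aggregates can then be read off.

```python
from collections import defaultdict

# Hypothetical records; field names are illustrative, not taken from the paper.
RECORDS = [
    {"id": 1, "region": "EU", "kind": "loan", "amount": 100},
    {"id": 2, "region": "EU", "kind": "bond", "amount": 250},
    {"id": 3, "region": "US", "kind": "loan", "amount": 75},
    {"id": 4, "region": None, "kind": "loan", "amount": 40},  # missing key
]


def summarize(records, dimension):
    """Assign every record to exactly one bucket along `dimension`.

    Records with a missing key are not dropped; they go to an explicit
    '<unclassified>' bucket (an assumption of this sketch).
    """
    totals = defaultdict(int)
    members = defaultdict(list)
    for rec in records:
        key = rec.get(dimension)
        if key is None:
            key = "<unclassified>"
        totals[key] += rec["amount"]
        members[key].append(rec["id"])
    return dict(totals), dict(members)


def is_complete(records, totals):
    """Balance check: bucket totals must add up to the grand total."""
    return sum(totals.values()) == sum(r["amount"] for r in records)


def impact(records, dimension, record_id):
    """Change in each bucket total when one record is removed."""
    before, _ = summarize(records, dimension)
    remaining = [r for r in records if r["id"] != record_id]
    after, _ = summarize(remaining, dimension)
    return {
        key: before[key] - after.get(key, 0)
        for key in before
        if before[key] != after.get(key, 0)
    }


# The same dataset summarized along two dimensions.
by_region, region_members = summarize(RECORDS, "region")
by_kind, _ = summarize(RECORDS, "kind")

print(by_region)
print(by_kind)
print(is_complete(RECORDS, by_region), is_complete(RECORDS, by_kind))
print(impact(RECORDS, "region", 4))
```

Because both summaries account for every record, they can be cross-checked against the same grand total, and a suspicious bucket can be traced back to the record identifiers kept in `region_members`.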
