Contextuality from missing and versioned data

08/10/2017
by Jason Morton, et al.

Traditionally, categorical data analysis (e.g., generalized linear models) works with simple, flat datasets akin to a single table in a database, with no notion of missing data or conflicting versions. In contrast, modern data analysis must deal with distributed databases made up of many partial local tables that need not always agree. The computational agents tabulating these tables are spatially separated and bound by speed-of-light constraints, and data arrives too rapidly for these distributed views ever to be fully informed and globally consistent. Contextuality is a mathematical property describing a kind of inconsistency that arises in quantum mechanics (e.g., in Bell's theorem). In this paper we show how contextuality can arise in common data collection scenarios, including missing data and versioning (as in low-latency distributed databases employing snapshot isolation). In the companion paper, we develop statistical models adapted to this regime.
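To make the notion of partial local tables that "need not always agree" concrete, here is a minimal sketch, not taken from the paper. It assumes three hypothetical binary variables `A`, `B`, `C` and three hypothetical two-column tables, each recording only a pair of them (as if the third column were missing). The tables agree on every single-variable marginal, yet no single flat table over all three columns can reproduce them, which is the kind of inconsistency the abstract calls contextuality. All names and numbers are illustrative assumptions.

```python
from itertools import product

# Hypothetical local tables (assumed for illustration, not from the paper).
# Each maps an observed pair of binary values to its relative frequency.
tables = {
    ("A", "B"): {(0, 0): 0.5, (1, 1): 0.5},  # A and B always agree
    ("B", "C"): {(0, 0): 0.5, (1, 1): 0.5},  # B and C always agree
    ("A", "C"): {(0, 1): 0.5, (1, 0): 0.5},  # A and C always disagree
}


def marginal(table, columns, variable):
    """Single-variable marginal of one local table."""
    idx = columns.index(variable)
    out = {0: 0.0, 1: 0.0}
    for row, p in table.items():
        out[row[idx]] += p
    return out


def locally_consistent(tables):
    """True if every variable has the same marginal in each table containing it."""
    seen = {}
    for columns, table in tables.items():
        for variable in columns:
            m = marginal(table, columns, variable)
            if variable in seen and seen[variable] != m:
                return False
            seen.setdefault(variable, m)
    return True


def compatible_global_rows(tables):
    """Full rows whose projection onto every local table has positive frequency.

    Any joint distribution reproducing the local tables must be supported on
    these rows, so an empty result means no global flat table exists.
    """
    variables = sorted({v for columns in tables for v in columns})
    rows = []
    for values in product((0, 1), repeat=len(variables)):
        row = dict(zip(variables, values))
        if all(
            table.get(tuple(row[v] for v in columns), 0) > 0
            for columns, table in tables.items()
        ):
            rows.append(row)
    return rows


print(locally_consistent(tables))      # the pairwise views agree on shared columns
print(compatible_global_rows(tables))  # candidate rows for a global table
```

The support check is a deliberately simple design choice: it only detects the strongest form of inconsistency, where no full row is compatible with all local views. Weaker cases, where compatible rows exist but no assignment of probabilities matches the local frequencies, would need a linear feasibility check instead.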

research
04/26/2007

Rough Sets Computations to Impute Missing Data

Many techniques for handling missing data have been proposed in the lite...
research
11/10/2021

variable selection and missing data imputation in categorical genomic data analysis by integrated ridge regression and random forest

Genomic data arising from a genome-wide association study (GWAS) are oft...
research
04/04/2023

Graphical Models of Entangled Missingness

Despite the growing interest in causal and statistical inference for set...
research
10/14/2022

Foundations for statistical inference in the analysis of human mobility data

In this paper, we provide a rigorous formulation of the so-called flight...
research
07/18/2022

Deeply-Learned Generalized Linear Models with Missing Data

Deep Learning (DL) methods have dramatically increased in popularity in ...
research
05/26/2021

ReStore – Neural Data Completion for Relational Databases

Classical approaches for OLAP assume that the data of all tables is comp...
research
10/23/2020

DBLog: A Watermark Based Change-Data-Capture Framework

It is a commonly observed pattern for applications to utilize multiple h...
