Merging joint distributions via causal model classes with low VC dimension

04/09/2018
by Dominik Janzing, et al.

If X, Y, Z denote sets of random variables, two different data sources may contain samples from P_{X,Y} and P_{Y,Z}, respectively. We argue that causal inference can help infer properties of the 'unobserved joint distributions' P_{X,Y,Z} or P_{X,Z}. These properties may be conditional independences or quantitative statements about dependences. More generally, we define a learning scenario where the input is a subset of variables and the label is some statistical property of that subset. Sets of jointly observed variables define the training points, while unobserved sets are possible test points. To solve this learning task, we infer, as an intermediate step, a causal model from the observations that then entails properties of unobserved sets. Accordingly, we can define the VC dimension of a class of causal models and derive generalization bounds for the predictions. Here, causal inference becomes more modest and more accessible to empirical tests than usual: rather than trying to find a causal hypothesis that is 'true' (a problematic term when it is unclear how to define interventions), a causal hypothesis is useful whenever it correctly predicts statistical properties of unobserved joint distributions. Within such a 'pragmatic' application of causal inference, some popular heuristic approaches become justified in retrospect. It is, for instance, legitimate to infer DAGs from partial correlations instead of conditional independences if the DAGs are only used to predict partial correlations. We hypothesize that this pragmatic view of causality may even cover the usual meaning in terms of interventions, and we sketch why predicting the impact of interventions can sometimes also be phrased as a task of the above type.
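To make the merging idea concrete, here is a minimal sketch (an illustration under stated assumptions, not code from the paper): suppose the causal hypothesis is a linear-Gaussian chain X -> Y -> Z. Pairwise correlations estimated from two separate data sources, one observing (X, Y) and one observing (Y, Z), are then merged into predictions about the never-jointly-observed pair (X, Z), namely the entailed conditional independence X ⊥ Z | Y and the correlation corr(X, Z) = corr(X, Y) * corr(Y, Z). The data-generating coefficients below are arbitrary choices made only to simulate the two sources.

```python
# Sketch: merge two separately observed pairwise distributions P_{X,Y} and
# P_{Y,Z} via an assumed causal chain X -> Y -> Z, and predict a statistical
# property (the correlation) of the unobserved pair (X, Z).
# All numbers and variable names here are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Ground truth used only to simulate the two data sources.
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(size=n)   # Y depends on X
z = 0.6 * y + rng.normal(size=n)   # Z depends on Y

# Source 1 observes (X, Y); source 2 observes (Y, Z); (X, Z) is never
# observed jointly.
x1, y1 = x[: n // 2], y[: n // 2]
y2, z2 = y[n // 2 :], z[n // 2 :]

rho_xy = np.corrcoef(x1, y1)[0, 1]
rho_yz = np.corrcoef(y2, z2)[0, 1]

# Under the hypothesised chain X -> Y -> Z, the model entails
#   (i)  the partial correlation rho_{XZ.Y} = 0 (X independent of Z given Y),
#   (ii) corr(X, Z) = corr(X, Y) * corr(Y, Z).
predicted_rho_xz = rho_xy * rho_yz

# Sanity check against the joint data (available here only because we
# simulated it; in the paper's scenario it would be a test point).
actual_rho_xz = np.corrcoef(x, z)[0, 1]
print(f"predicted corr(X,Z): {predicted_rho_xz:.3f}")
print(f"actual    corr(X,Z): {actual_rho_xz:.3f}")
```

In this toy setting the prediction matches the held-out joint statistic, which is exactly the sense in which a causal hypothesis is 'useful' in the abstract's pragmatic reading: it is judged by the unobserved joint statistics it predicts, not by whether it is 'true' under interventions.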


