Do-search – a tool for causal inference and study design with multiple data sources

by Juha Karvanen, et al.

Epidemiological evidence is based on multiple data sources including clinical trials, cohort studies, surveys, registries and expert opinions. Merging information from different sources opens up new possibilities for the estimation of causal effects. We show how causal effects can be identified and estimated by combining experiments and observations in real and realistic scenarios. As a new tool, we present do-search, a recently developed algorithmic approach that can determine the identifiability of a causal effect. The approach is based on do-calculus, and it can utilize data with non-trivial missing data and selection bias mechanisms. When the effect is identifiable, do-search outputs an identifying formula on which numerical estimation can be based. When the effect is not identifiable, we can use do-search to recognize additional data sources and assumptions that would make the effect identifiable. As a running example throughout the paper, we consider the effect of salt-adding behavior on blood pressure, mediated by salt intake. The identifiability of this effect is resolved in various scenarios with different assumptions on confounding. There are scenarios where the causal effect is identifiable from a chain of experiments but not from survey data, as well as scenarios where the opposite is true. As an illustration, we use survey data from NHANES 2013–2016 and the results from a meta-analysis of randomized controlled trials and estimate the reduction in average systolic blood pressure under an intervention where the use of table salt is discontinued.
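To make the idea of an "identifying formula" concrete, the following minimal Python sketch evaluates one well-known formula of this kind, the front-door adjustment, on a toy model with a mediator. It is not the do-search algorithm and not necessarily one of the scenarios analyzed in the paper: the graph (X -> Z -> Y with an unobserved confounder U of X and Y), the binary variables, and all probabilities are illustrative assumptions. Read X as salt-adding behavior, Z as salt intake, and Y as elevated blood pressure.

```python
from itertools import product

# Hypothetical structural model; every number here is an illustrative assumption.
# U is an unobserved confounder of X and Y; Z mediates the whole effect of X on Y.
P_U = {0: 0.5, 1: 0.5}

def p_x_given_u(x, u):
    p1 = 0.8 if u == 1 else 0.2
    return p1 if x == 1 else 1 - p1

def p_z_given_x(z, x):
    p1 = 0.9 if x == 1 else 0.3
    return p1 if z == 1 else 1 - p1

def p_y_given_zu(y, z, u):
    p1 = 0.1 + 0.5 * z + 0.3 * u
    return p1 if y == 1 else 1 - p1

def observational_joint():
    """Joint p(x, z, y) with U summed out: what a survey could observe."""
    joint = {}
    for x, z, y in product((0, 1), repeat=3):
        joint[(x, z, y)] = sum(
            P_U[u] * p_x_given_u(x, u) * p_z_given_x(z, x) * p_y_given_zu(y, z, u)
            for u in (0, 1)
        )
    return joint

def prob(joint, x=None, z=None, y=None):
    """Marginal probability of the event fixed by the non-None arguments."""
    return sum(
        p for (xi, zi, yi), p in joint.items()
        if (x is None or xi == x) and (z is None or zi == z) and (y is None or yi == y)
    )

def front_door(joint, x, y=1):
    """p(y | do(x)) = sum_z p(z | x) * sum_x' p(x') * p(y | x', z)."""
    total = 0.0
    for z in (0, 1):
        p_z_x = prob(joint, x=x, z=z) / prob(joint, x=x)
        inner = sum(
            prob(joint, x=x2) * prob(joint, x=x2, z=z, y=y) / prob(joint, x=x2, z=z)
            for x2 in (0, 1)
        )
        total += p_z_x * inner
    return total

def true_effect(x, y=1):
    """Ground truth p(y | do(x)), computed from the structural model itself."""
    return sum(
        P_U[u] * p_z_given_x(z, x) * p_y_given_zu(y, z, u)
        for u in (0, 1) for z in (0, 1)
    )

def naive_conditional(joint, x, y=1):
    """Observed p(y | x), which is biased by the confounder U."""
    return prob(joint, x=x, y=y) / prob(joint, x=x)

joint = observational_joint()
for x in (0, 1):
    print(x, front_door(joint, x), true_effect(x), naive_conditional(joint, x))
```

In this toy model the front-door formula recovers the interventional probability from observational quantities alone, whereas the plain conditional probability does not. A tool such as do-search automates the derivation of formulas of this kind for a given graph and set of data sources.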




Borrowing from Supplemental Sources to Estimate Causal Effects from a Primary Data Source

The increasing multiplicity of data sources offers exciting possibilitie...

Causal Effect Identification from Multiple Incomplete Data Sources: A General Search-based Approach

Causal effect identification considers whether an interventional probabi...

Collaborative causal inference with a distributed data-sharing management

Data sharing barriers are paramount challenges arising from multicenter ...

Efficient Online Estimation of Causal Effects by Deciding What to Observe

Researchers often face data fusion problems, where multiple data sources...

An Adaptive Kernel Approach to Federated Learning of Heterogeneous Causal Effects

We propose a new causal inference framework to learn causal effects from...

Combining Data from Surveys and Related Sources

To improve the precision of inferences and reduce costs there is conside...

Adaptive Multi-Source Causal Inference

Data scarcity is a tremendous challenge in causal effect estimation. In ...
