Heterogeneity assessment in causal data fusion problems

08/10/2022
by Tat-Thang Vo, et al.

Previous work has formalized the conditions under which findings from a source population can reasonably be extrapolated to a different target population, the so-called "transportability" problem. Most of this work considers a setting with two populations, although several recent papers also establish the identifiability of a causal parameter when multiple data sources are available, under certain homogeneity assumptions. However, we know of little work examining transportability when data sources are possibly heterogeneous, e.g. in the distribution of mediators of the exposure-outcome relation. Such heterogeneity generally invalidates the transportability assumption required in most of the literature. In this paper, we propose a general approach for heterogeneity assessment when estimating the average exposure effect in a target population, with mediator and outcome data obtained from multiple external sources. To account for heterogeneity, we define distinct effect estimands for settings in which the mediator and outcome information is transported from different sources. We discuss the causal assumptions needed to identify these estimands, then propose efficient semi-parametric estimation strategies that allow flexible, data-adaptive machine learning methods to be used for estimating the nuisance parameters. We also propose two new methods to investigate sources of heterogeneity in the transported estimates. These methods inform users about how much of the observed statistical heterogeneity in the transported effects is due to differences across data sources in: 1) the conditional distribution of the mediator variables, and/or 2) the conditional distribution of the outcome. We illustrate the proposed methods using four sites from the Moving to Opportunity Study, an experiment that randomized housing voucher receipt to participating families living in public housing.
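To make the idea of transporting mediator and outcome information from different sources concrete, the sketch below computes a simple plug-in quantity on simulated data. It is not the estimator proposed in the paper, which is semi-parametric and accommodates machine learning for the nuisance parameters. Everything here is an illustrative assumption: the binary covariate `W`, exposure `A` and mediator `M`, the data-generating model, the function name `transported_effect`, and the particular estimand form, which averages over the target covariate distribution the contrast between `A = 1` and `A = 0` of the outcome mean from one site weighted by the mediator distribution from another site.

```python
import numpy as np

def cell_mean(values, mask):
    """Mean of values over rows where mask is True (NaN if the cell is empty)."""
    return values[mask].mean() if mask.any() else np.nan

def transported_effect(S, W, A, M, Y, target_W, med_site, out_site):
    """Plug-in effect using the mediator distribution from med_site and the
    outcome regression from out_site (hypothetical illustration; W, A, M binary)."""
    total = 0.0
    for w in (0, 1):
        p_w = np.mean(target_W == w)  # covariate distribution in the target
        if p_w == 0:
            continue
        for a, sign in ((1, 1.0), (0, -1.0)):
            # P(M = 1 | A = a, W = w), estimated in the mediator source
            p_m1 = cell_mean(M.astype(float), (S == med_site) & (A == a) & (W == w))
            for m, p_m in ((1, p_m1), (0, 1.0 - p_m1)):
                # E[Y | A = a, M = m, W = w], estimated in the outcome source
                mu = cell_mean(Y, (S == out_site) & (A == a) & (M == m) & (W == w))
                total += sign * p_w * p_m * mu
    return total

# Simulated data (assumed model): two sites that differ only in how strongly
# the exposure shifts the mediator; the outcome mechanism is shared.
rng = np.random.default_rng(0)
n = 40000
S = rng.integers(0, 2, n)
W = rng.integers(0, 2, n)
A = rng.integers(0, 2, n)
p_m = 0.2 + 0.1 * W + np.where(S == 1, 0.5, 0.2) * A
M = rng.binomial(1, p_m)
Y = 1.0 * M + 0.5 * A + 0.3 * W + rng.normal(0.0, 1.0, n)
target_W = rng.binomial(1, 0.7, 5000)  # covariates observed in the target

# One estimate per (mediator source, outcome source) pair
effects = {
    (j, k): transported_effect(S, W, A, M, Y, target_W, med_site=j, out_site=k)
    for j in (0, 1)
    for k in (0, 1)
}
for (j, k), est in effects.items():
    print(f"mediator from site {j}, outcome from site {k}: {est:.3f}")
```

Under this assumed model, changing the mediator source while holding the outcome source fixed should move the estimate, whereas changing the outcome source alone should not. Contrasts of this kind give a rough intuition for attributing heterogeneity to the mediator distribution versus the outcome distribution; the paper's own methods for that attribution are not reproduced here.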
