Inference for Individual Mediation Effects and Interventional Effects in Sparse High-Dimensional Causal Graphical Models

by Abhishek Chakrabortty, et al.
University of Pennsylvania

We consider the problem of identifying intermediate variables (or mediators) that regulate the effect of a treatment on a response variable. While there has been significant research on this topic, little work has been done for the setting where the set of potential mediators is high-dimensional and the mediators are interrelated. In particular, we assume that the causal structure of the treatment, the potential mediators, and the response is a directed acyclic graph (DAG). High-dimensional DAG models have previously been used for the estimation of causal effects from observational data, and methods called IDA and joint-IDA have been developed for estimating the effects of single interventions and multiple simultaneous interventions, respectively. In this paper, we propose an IDA-type method called MIDA for estimating mediation effects from high-dimensional observational data. Although IDA and joint-IDA estimators have been shown to be consistent in certain sparse high-dimensional settings, their asymptotic properties, such as convergence in distribution, and inferential tools in such settings have remained unknown. We prove high-dimensional consistency of MIDA for linear structural equation models with sub-Gaussian errors. More importantly, we derive distributional convergence results for MIDA in similar high-dimensional settings, which are applicable to IDA and joint-IDA estimators as well. To the best of our knowledge, these are the first distributional convergence results facilitating inference for IDA-type estimators. These results build on our novel theoretical results regarding uniform bounds for linear regression estimators over varying subsets of high-dimensional covariates, which may be of independent interest. Finally, we empirically validate our asymptotic theory and demonstrate the usefulness of MIDA in identifying large mediation effects via simulations and an application to real data in genomics.
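
To make the notion of a mediation effect concrete, the sketch below simulates the simplest possible linear structural equation model, with one treatment, one mediator, and one response, and estimates the effect transmitted through the mediator as a product of two regression coefficients. This is the classical single-mediator calculation, not the MIDA procedure itself, which handles many interrelated mediators in a DAG. The function names (`ols`, `simulate`, `mediation_effect`), the coefficient values, and the Gaussian errors are illustrative assumptions, not taken from the paper.

```python
import random


def ols(columns, y):
    """Least-squares coefficients (no intercept) from the normal equations.

    `columns` is a list of covariate columns; the system X'X beta = X'y is
    solved by Gauss-Jordan elimination with partial pivoting.
    """
    k = len(columns)
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
        + [sum(u * v for u, v in zip(columns[i], y))]
        for i in range(k)
    ]
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(aug[r][i]))
        aug[i], aug[pivot] = aug[pivot], aug[i]
        for r in range(k):
            if r != i:
                factor = aug[r][i] / aug[i][i]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[i])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def simulate(n, a, b, c, seed=0):
    """Draw n samples from the toy model X -> M -> Y with a direct edge X -> Y.

    All variables have mean zero, so no intercept is needed downstream.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    m = [a * xi + rng.gauss(0, 1) for xi in x]
    y = [c * xi + b * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]
    return x, m, y


def mediation_effect(x, m, y):
    """Product-of-coefficients estimate of the effect of X on Y through M."""
    (a_hat,) = ols([x], m)              # effect of treatment on mediator
    b_hat, _direct = ols([m, x], y)     # effect of mediator on response, adjusting for treatment
    return a_hat * b_hat


# Assumed true coefficients: the effect through the mediator is a * b = 0.4.
x, m, y = simulate(n=20000, a=0.8, b=0.5, c=0.3)
estimate = mediation_effect(x, m, y)
print(f"estimated mediation effect: {estimate:.3f}")
```

In this toy model the adjustment set for the mediator-to-response regression is known in advance. In the high-dimensional setting described above the graph is not known, so the adjustment sets themselves have to be estimated from data.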




