The Projected Covariance Measure for assumption-lean variable significance testing

11/03/2022
by Anton Rask Lundborg, et al.

Testing the significance of a variable or group of variables X for predicting a response Y, given additional covariates Z, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model and then test whether the regression coefficient for X is non-zero. However, when the model is misspecified, the test may have poor power, for example when X is involved in complex interactions, or may lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of Y given X and Z does not depend on X. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure uses these methods to perform regressions: first to estimate a form of projection of Y on X and Z using one half of the data, and then to estimate the expected conditional covariance between this projection and Y on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves the minimax optimal rate, which we also establish, in this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach, in terms of both Type I error control and power, compared to several existing approaches.
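
The two-stage, sample-splitting procedure described in the abstract can be illustrated with a heavily simplified sketch. This is not the authors' implementation: the function names (`pcm_sketch`, `basis_xz`, `basis_z`, `fit`), the quadratic least-squares regressions standing in for flexible learners, the studentised statistic, and the one-sided normal p-value are all assumptions made for illustration, and any refinements in the full paper are omitted.

```python
import numpy as np
from math import erfc, sqrt


def basis_xz(x, z):
    # Hypothetical quadratic basis in (x, z), including an interaction term.
    return np.column_stack([np.ones_like(x), x, z, x * z, x**2, z**2])


def basis_z(z):
    # Hypothetical quadratic basis in z alone.
    return np.column_stack([np.ones_like(z), z, z**2])


def fit(design, target):
    # Ordinary least squares; a stand-in for any regression method
    # (additive models, random forests, splines, ...).
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def pcm_sketch(x, y, z, seed=0):
    """Simplified sample-splitting test of E[Y | X, Z] = E[Y | Z]."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    a, b = idx[: len(y) // 2], idx[len(y) // 2:]

    # Half A: estimate a direction f(x, z) ~ E[Y | X, Z] - E[Y | Z].
    g = fit(basis_xz(x[a], z[a]), y[a])
    m = fit(basis_z(z[a]), basis_xz(x[a], z[a]) @ g)

    # Half B: evaluate f, then residualise both f and Y on Z.
    f_b = basis_xz(x[b], z[b]) @ g - basis_z(z[b]) @ m
    zb = basis_z(z[b])
    res_y = y[b] - zb @ fit(zb, y[b])
    res_f = f_b - zb @ fit(zb, f_b)

    # Studentised average of residual products, estimating the
    # expected conditional covariance between f and Y given Z.
    prod = res_y * res_f
    stat = sqrt(len(b)) * prod.mean() / prod.std(ddof=1)
    p_value = 0.5 * erfc(stat / sqrt(2))  # one-sided (assumed)
    return stat, p_value


# Illustrative data in which Y depends on X given Z.
rng = np.random.default_rng(1)
x, z = rng.normal(size=1000), rng.normal(size=1000)
y = z + 2 * x + rng.normal(size=1000)
stat, p = pcm_sketch(x, y, z)
```

Splitting the data keeps the estimated direction fixed when the covariance is computed on the second half, which is what allows a simple normal approximation for the statistic in this sketch.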
