Testing for Regression Heteroskedasticity with High-Dimensional Random Forests

12/05/2022
by Chi Chien-Ming, et al.

Statistical inference for heteroskedasticity in high-dimensional regression is an important but under-explored problem. This paper aims to fill this gap by proposing two tests for assessing high-dimensional regression heteroskedasticity: the variance difference test and the variance difference Breusch-Pagan test. The former tests whether an explanatory feature of interest is associated with the conditional variance of a response variable, while the latter tests for heteroskedasticity in the regression, which is known as the Breusch-Pagan test problem. To formally establish the tests, we derive rigorous P-values and test sizes, and analyze the test power under a nonparametric heteroskedastic data-generating model with high-dimensional input features. This model setting accommodates high-dimensional applications with flexible heteroskedasticity structures and features that have interaction effects on the mean of the response; such applications are common in many fields, such as biology. Our methods leverage machine learning mean prediction methods such as random forests and use knockoff variables as negative controls. In particular, the definition of knockoffs for our test statistics is more flexible than the original definition of knockoffs; we give a detailed comparison of the two definitions and discuss the advantages of our knockoffs. The satisfactory empirical performance of the proposed tests is illustrated with simulation results and an HIV (human immunodeficiency virus) case study.
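For readers unfamiliar with the Breusch-Pagan test problem mentioned above, the following is a minimal sketch of the classical low-dimensional test in its studentized (Koenker) form with a single regressor. It is not the variance difference test or the variance difference Breusch-Pagan test proposed in the paper, which rely on random forests and knockoff variables; it only illustrates the null hypothesis being tested. The function names (`simple_ols`, `breusch_pagan_single`, `simulate`) and the simulated data-generating model are assumptions made for illustration.

```python
import math
import random


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y ~ a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y):
    """Coefficient of determination of the simple regression of y on x."""
    a, b = simple_ols(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def breusch_pagan_single(x, y):
    """Classical studentized Breusch-Pagan test with one regressor.

    Step 1: fit y ~ a + b*x and take the squared residuals.
    Step 2: regress the squared residuals on x.
    Step 3: LM = n * R^2 is approximately chi-square with 1 degree of
            freedom under the null hypothesis of homoskedasticity.
    Returns (lm_statistic, p_value).
    """
    a, b = simple_ols(x, y)
    sq_resid = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    lm = max(len(x) * r_squared(x, sq_resid), 0.0)
    # Chi-square survival function with 1 degree of freedom:
    # P(Z^2 > lm) = erfc(sqrt(lm / 2)) for a standard normal Z.
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


def simulate(n, heteroskedastic, seed):
    """Hypothetical toy data: y = 1 + 2x + noise, with x uniform on [0, 1].

    If heteroskedastic is True, the noise standard deviation is 0.2 + 3x;
    otherwise it is constant and equal to 1.
    """
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    y = []
    for xi in x:
        sd = 0.2 + 3.0 * xi if heteroskedastic else 1.0
        y.append(1.0 + 2.0 * xi + rng.gauss(0.0, sd))
    return x, y


if __name__ == "__main__":
    for flag in (False, True):
        x, y = simulate(500, heteroskedastic=flag, seed=0)
        lm, p = breusch_pagan_single(x, y)
        print(f"heteroskedastic={flag}: LM={lm:.2f}, p-value={p:.4g}")
```

In this toy setting the mean is linear and there is one feature, so ordinary least squares is enough to estimate the mean. The setting described in the abstract is high-dimensional and nonparametric, which is why the paper turns to machine learning mean predictors and knockoff negative controls instead.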


research
12/27/2018

Power Comparison between High Dimensional t-Test, Sign, and Signed Rank Tests

In this paper, we propose a power comparison between high dimensional t-...
research
07/04/2022

FACT: High-Dimensional Random Forests Inference

Random forests is one of the most widely used machine learning methods o...
research
08/09/2019

Goodness-of-fit testing in high-dimensional generalized linear models

We propose a family of tests to assess the goodness-of-fit of a high-dim...
research
08/04/2015

Adaptivity and Computation-Statistics Tradeoffs for Kernel and Distance based High Dimensional Two Sample Testing

Nonparametric two sample testing is a decision theoretic problem that in...
research
01/31/2018

A Distribution-Free Test of Independence and Its Application to Variable Selection

Motivated by the importance of measuring the association between the res...
research
03/05/2022

Fuzzy Forests For Feature Selection in High-Dimensional Survey Data: An Application to the 2020 U.S. Presidential Election

An increasingly common methodological issue in the field of social scien...
research
12/21/2018

Global and Local Two-Sample Tests via Regression

Two-sample testing is a fundamental problem in statistics. Despite its l...
