Resampling-Based Multisplit Inference for High-Dimensional Regression
We propose a novel resampling-based method to construct an asymptotically exact test for any subset of hypotheses on coefficients in high-dimensional linear regression. It can be embedded into any multiple testing procedure to make confidence statements on relevant predictor variables. The method constructs permutation test statistics for each individual hypothesis by means of repeated splits of the data and a variable selection technique; it then defines a test for any subset by suitably aggregating the test statistics of the variables in that subset. The resulting procedure is flexible, as it accommodates different selection techniques and several combining functions. We present it in two versions: an exact method, and an approximate one that requires less memory and computation time and can be scaled up to higher dimensions. We illustrate the performance of the method with simulations and an analysis of real gene expression data.
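The pipeline described above (repeated data splits, variable selection, resampled per-variable statistics, aggregation over a subset) can be illustrated with a rough sketch. This is not the authors' algorithm: the abstract does not specify the selection technique, the form of the statistic, the resampling scheme, or the combining function, so every such choice below is an assumption made for illustration. Here selection is marginal-correlation screening keeping `k` variables, the statistic is a sum of residual products, resampling uses random sign flips shared across splits, and the combining function is a sum of squares. The names `residualize` and `multisplit_subset_pvalue` are hypothetical.

```python
import numpy as np

def residualize(v, Z):
    """Residuals of v after least-squares projection on an intercept and the columns of Z."""
    Z1 = np.column_stack([np.ones(len(v)), Z])
    coef, *_ = np.linalg.lstsq(Z1, v, rcond=None)
    return v - Z1 @ coef

def multisplit_subset_pvalue(X, y, subset, n_splits=10, n_flips=200, k=5, seed=0):
    """Illustrative p-value for H0: all coefficients in `subset` are zero (assumed scheme)."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    subset = list(subset)
    # contrib[i, m]: contribution of observation i to the statistic of
    # variable subset[m], accumulated over all splits.
    contrib = np.zeros((n, len(subset)))
    for _ in range(n_splits):
        perm = rng.permutation(n)
        sel_idx, inf_idx = perm[: n // 2], perm[n // 2:]
        # Selection half (assumption: keep the k largest absolute marginal correlations).
        Xs = X[sel_idx] - X[sel_idx].mean(axis=0)
        ys = y[sel_idx] - y[sel_idx].mean()
        score = np.abs(Xs.T @ ys) / (np.linalg.norm(Xs, axis=0) + 1e-12)
        selected = np.argsort(score)[-k:]
        # Inference half: adjust for the other selected variables.
        for m, j in enumerate(subset):
            if j not in selected:
                continue  # an unselected variable contributes zero in this split
            others = [h for h in selected if h != j]
            Z = X[np.ix_(inf_idx, others)]
            rx = residualize(X[inf_idx, j], Z)
            ry = residualize(y[inf_idx], Z)
            contrib[inf_idx, m] += rx * ry
    # Resampling (assumption: one random sign per observation, shared by all
    # splits and variables). Row 0 is the identity, i.e. the observed statistic.
    signs = rng.choice([-1.0, 1.0], size=(n_flips, n))
    signs[0] = 1.0
    per_variable = signs @ contrib              # shape (n_flips, len(subset))
    combined = (per_variable ** 2).sum(axis=1)  # assumed combining function
    return float(np.mean(combined >= combined[0]))

# Usage on simulated data: only variable 0 has a nonzero coefficient.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 50))
y = 3.0 * X[:, 0] + rng.standard_normal(100)
p_signal = multisplit_subset_pvalue(X, y, [0, 1])
p_null = multisplit_subset_pvalue(X, y, [10, 11])
```

Because the same sign vectors are reused for every variable and every split, the per-variable statistics keep their joint dependence under resampling, which is what allows them to be aggregated into a single subset-level test. Swapping in another selector or combining function only requires changing the two lines marked as assumptions.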