Statistical inference with F-statistics when fitting simple models to high-dimensional data

by Hannes Leeb, et al.

We study linear subset regression in the context of the high-dimensional overall model y = ϑ + θ'z + ϵ with univariate response y and a d-vector of random regressors z, independent of ϵ. Here, "high-dimensional" means that the number d of available explanatory variables is much larger than the number n of observations. We consider simple linear sub-models where y is regressed on a set of p regressors given by x = M'z, for some d × p matrix M of full rank p < n. The corresponding simple model, i.e., y = α + β'x + e, can be justified by imposing appropriate restrictions on the unknown parameter θ in the overall model; otherwise, this simple model can be grossly misspecified. In this paper, we establish asymptotic validity of the standard F-test on the surrogate parameter β, in an appropriate sense, even when the simple model is misspecified.
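To make the setup concrete, the following is a minimal sketch of computing the standard F-statistic for the hypothesis β = 0 in a sub-model fitted to data from a high-dimensional overall model. It is an illustration under stated assumptions, not the paper's procedure: the Gaussian regressors and errors, the scaling of θ, the choice of M as a selection of the first p coordinates of z, and the function names `solve_linear_system`, `f_statistic`, and `simulate` are all hypothetical choices made for this example.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def f_statistic(xs, ys):
    """F-statistic for H0: beta = 0 in the sub-model y = alpha + beta'x + e."""
    n, p = len(xs), len(xs[0])
    design = [[1.0] + list(row) for row in xs]  # prepend intercept column
    k = p + 1
    # Normal equations: (X'X) coef = X'y
    gram = [[sum(design[i][a] * design[i][b] for i in range(n))
             for b in range(k)] for a in range(k)]
    moment = [sum(design[i][a] * ys[i] for i in range(n)) for a in range(k)]
    coef = solve_linear_system(gram, moment)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in design]
    rss_full = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    y_bar = sum(ys) / n
    rss_null = sum((y - y_bar) ** 2 for y in ys)  # intercept-only fit
    return ((rss_null - rss_full) / p) / (rss_full / (n - p - 1))


def simulate(n=40, d=200, p=3, seed=0):
    """Draw data from the overall model with d > n and test the sub-model."""
    rng = random.Random(seed)
    # Assumption: dense theta, so the p-regressor sub-model is misspecified.
    theta = [rng.gauss(0, 1) / d ** 0.5 for _ in range(d)]
    zs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    ys = [1.0 + sum(t * z for t, z in zip(theta, row)) + rng.gauss(0, 1)
          for row in zs]
    # Assumption: M selects the first p coordinates, so x = M'z = z[:p].
    xs = [row[:p] for row in zs]
    return f_statistic(xs, ys)


if __name__ == "__main__":
    print(simulate())
```

Under the null hypothesis in a correctly specified Gaussian model, the returned value would be compared with quantiles of the F distribution with p and n − p − 1 degrees of freedom; the abstract's claim concerns the asymptotic validity of that comparison when the sub-model is misspecified.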


