A model-free approach to linear least squares regression with exact probabilities
In a regression setting with an observation vector y ∈ R^n and a given finite collection (x_ν)_{ν ∈ N} of regressor vectors x_ν ∈ R^n, a typical question is whether a given subset of these regressors is sufficient to approximate y. A classical method for addressing this question is the F test, which assumes that y is a linear combination of the regressor vectors plus Gaussian white noise. In this note we show that the corresponding p-value also has a clear data-scientific interpretation that does not require assuming the data to be random. We then show that such a dual interpretation is possible for a rather large family of tests, the underlying tool being normalized Haar measure on orthogonal groups.
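The dual interpretation can be illustrated numerically. The following is a minimal sketch of one possible reading of the claim, not necessarily the formulation used in the full text: the residual of y after fitting the sub-model is replaced by a uniformly random direction in the orthogonal complement of the sub-model (which is what a Haar-distributed orthogonal transformation fixing the sub-model produces), and the p-value is read as the probability that this random direction is explained at least as well by the extra regressors as the observed one. All function names are hypothetical, the regressor matrices are assumed to have full column rank, and the sketch is restricted to exactly two extra regressors so that the F(2, d) tail probability has the closed form (1 + 2F/d)^(-d/2) and only NumPy is needed.

```python
import numpy as np


def _rss(y, X):
    """Residual sum of squares of the least squares fit of y on the columns of X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    return float(r @ r)


def f_test_p_value(y, X0, X1):
    """Classical F-test p-value for adding the two columns of X1 to the model X0.

    Assumption of this sketch: X1 has exactly two columns, so that the
    survival function of F(2, d) is (1 + 2F/d)^(-d/2) in closed form.
    """
    assert X1.shape[1] == 2, "sketch only covers two extra regressors"
    X = np.hstack([X0, X1])
    d = y.shape[0] - X.shape[1]          # residual degrees of freedom
    rss0, rss1 = _rss(y, X0), _rss(y, X)
    F = ((rss0 - rss1) / 2) / (rss1 / d)
    return (1.0 + 2.0 * F / d) ** (-d / 2.0)


def haar_p_value(y, X0, X1, n_draws=200_000, seed=0):
    """Monte Carlo estimate of the model-free probability (y is treated as fixed).

    Draws directions uniformly on the unit sphere of the orthogonal complement
    of col(X0) and counts how often the extra regressors explain at least as
    large a fraction of the squared length as they do for the observed residual.
    """
    rng = np.random.default_rng(seed)
    # Orthonormal basis C of the orthogonal complement of col(X0).
    q, _ = np.linalg.qr(X0, mode="complete")
    C = q[:, X0.shape[1]:]
    r = C.T @ y                           # observed residual, in complement coordinates
    B, _ = np.linalg.qr(C.T @ X1)         # orthonormal basis of the projected extra regressors
    observed = np.sum((B.T @ r) ** 2) / np.sum(r ** 2)
    # Normalized Gaussian vectors are uniformly distributed on the sphere.
    U = rng.standard_normal((n_draws, C.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    explained = np.sum((U @ B) ** 2, axis=1)
    return float(np.mean(explained >= observed))


# Usage with arbitrary illustrative data; no randomness of y is assumed
# by haar_p_value, the seed below only fixes the example.
rng = np.random.default_rng(1)
n = 12
X0 = np.column_stack([np.ones(n), rng.standard_normal(n)])
X1 = rng.standard_normal((n, 2))
y = rng.standard_normal(n)
print(f_test_p_value(y, X0, X1), haar_p_value(y, X0, X1))
```

Up to Monte Carlo error, the two printed numbers should agree, even though the first is derived under the Gaussian noise model and the second involves no distributional assumption on y.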