Suppose that the covariate vector $\boldsymbol{x} = (x_1, \ldots, x_p)'$ consists of $p$ components, and that it is required to select a subset of the components that best explains the response variable $y$.
Let $\boldsymbol{s}$ denote any subset of the indices $\{1, \ldots, p\}$. We denote by $\boldsymbol{x}_{\boldsymbol{s}}$ the co-ordinates of $\boldsymbol{x}$ associated with $\boldsymbol{s}$. To relate $y$ to $\boldsymbol{x}_{\boldsymbol{s}}$ we consider the following nonparametric regression setup:
$$y = f_{\boldsymbol{s}}(\boldsymbol{x}_{\boldsymbol{s}}) + \epsilon,$$
where $\epsilon$ is the random error and the function $f_{\boldsymbol{s}}$ is considered unknown. We assume that $\epsilon \sim N(0, \sigma^2)$, where $\sigma^2 > 0$.
By assuming this framework we include the possibility that $f_{\boldsymbol{s}}$ can be a one-dimensional, two-dimensional, and so on up to a $p$-dimensional function. We further assume that there exists a set of regressors which truly influences the dependent variable $y$, and that the associated function is therefore the true function. Our problem is to identify this set of truly active regressors. Note that we have not assumed any specific form of the function; irrespective of its form, we are only interested in identifying the active regressors.
2 Notations and concepts
By a model, here we mean a particular subset $\boldsymbol{s} \subseteq \{1, \ldots, p\}$. Our aim is to find the true model among all possible candidate models.
For any $\boldsymbol{s}$, where $|\boldsymbol{s}|$ is the number of components in $\boldsymbol{s}$, we represent $f_{\boldsymbol{s}}$ using basis functions as follows:
$$f_{\boldsymbol{s}}(\boldsymbol{x}_{\boldsymbol{s}}) = \sum_{k=1}^{\infty} \beta_k \phi_k(\boldsymbol{x}_{\boldsymbol{s}}).$$
We assume that $\phi_k(\boldsymbol{x}_{\boldsymbol{s}})$, henceforth abbreviated as $\phi_k$, denotes the $k$-th basis function spanning the relevant Hilbert space equipped with some appropriate inner product $\langle \cdot, \cdot \rangle$. $\phi_k$ can be expressed as the product of individual basis functions as
$$\phi_k(\boldsymbol{x}_{\boldsymbol{s}}) = \prod_{j \in \boldsymbol{s}} \phi_{kj}(x_j).$$
Here $\phi_{kj}(x_j)$, henceforth $\phi_{kj}$, stands for the $k$-th basis function for the $j$-th component of the vector $\boldsymbol{x}_{\boldsymbol{s}}$. We make the following assumption regarding $\phi_{kj}$:
(A1) For every $k \geq 1$ and $j \in \{1, \ldots, p\}$, $\phi_{kj}$ is uniformly bounded.
Note that assumption (A1) implies that the $\phi_k$ are uniformly bounded for all $\boldsymbol{s}$. Using assumption (A1), we have
$$\left|f_{\boldsymbol{s}}(\boldsymbol{x}_{\boldsymbol{s}})\right| \lesssim \sum_{k=1}^{\infty} |\beta_k|,$$
where $\lesssim$ stands for inequality up to some positive multiplicative constant.
To ensure that $\sum_{k=1}^{\infty} |\beta_k| < \infty$ almost surely, we need to choose the prior on the coefficients $\{\beta_k; k = 1, 2, \ldots\}$ carefully. In this regard we assume the following:
(A2) $\beta_k \sim N(\mu_k, \sigma_k^2)$ independently for $k = 1, 2, \ldots$, where the $\mu_k$'s and $\sigma_k$'s satisfy
$$\sum_{k=1}^{\infty} |\mu_k| < \infty \quad \text{and} \quad \sum_{k=1}^{\infty} \sigma_k < \infty.$$
The above two convergence assumptions ensure, by a simple application of Kolmogorov's three-series theorem characterizing series convergence (see Chow and Teicher (1988)), that
$$\sum_{k=1}^{\infty} |\beta_k| < \infty \quad \text{almost surely}.$$
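The role of the two summability conditions can be illustrated numerically. The sketch below uses the hypothetical choices $\mu_k = k^{-2}$ and $\sigma_k = k^{-2}$ (both summable; these particular sequences are illustrative, not the paper's) and checks that the partial sums of $\sum_k |\beta_k|$ stabilize:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 200_000
k = np.arange(1, K + 1)
mu_k = k**-2.0        # summable prior means: sum |mu_k| < infinity
sigma_k = k**-2.0     # summable prior sds:   sum sigma_k < infinity

beta = rng.normal(mu_k, sigma_k)      # independent beta_k ~ N(mu_k, sigma_k^2)
partial = np.cumsum(np.abs(beta))     # partial sums of sum_k |beta_k|

# the tail beyond k = K/2 contributes essentially nothing
print(partial[999], partial[-1])
assert partial[-1] - partial[K // 2] < 1e-4
```

With non-summable choices (e.g. $\sigma_k = k^{-1/2}$) the same partial sums keep drifting upward, which is exactly what the three-series theorem rules out here.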
Hence, the prior on $f_{\boldsymbol{s}}$ is a Gaussian process with mean
$$\mu_{\boldsymbol{s}}(\boldsymbol{x}_{\boldsymbol{s}}) = E\left[f_{\boldsymbol{s}}(\boldsymbol{x}_{\boldsymbol{s}})\right] = \sum_{k=1}^{\infty} \mu_k \phi_k(\boldsymbol{x}_{\boldsymbol{s}}).$$
The covariance between $f_{\boldsymbol{s}}(\boldsymbol{x}_{1\boldsymbol{s}})$ and $f_{\boldsymbol{s}}(\boldsymbol{x}_{2\boldsymbol{s}})$ is given by
$$c_{\boldsymbol{s}}(\boldsymbol{x}_{1\boldsymbol{s}}, \boldsymbol{x}_{2\boldsymbol{s}}) = \sum_{k=1}^{\infty} \sigma_k^2\, \phi_k(\boldsymbol{x}_{1\boldsymbol{s}})\, \phi_k(\boldsymbol{x}_{2\boldsymbol{s}}).$$
The marginal distribution of $\boldsymbol{y} = (y_1, \ldots, y_n)'$ is then the $n$-variate normal, given by
$$\boldsymbol{y} \sim N_n\left(\boldsymbol{\mu}_{\boldsymbol{s}},\, \boldsymbol{\Sigma}_{\boldsymbol{s}} + \sigma^2 \boldsymbol{I}_n\right),$$
where $\boldsymbol{\mu}_{\boldsymbol{s}} = \left(\mu_{\boldsymbol{s}}(\boldsymbol{x}_{1\boldsymbol{s}}), \ldots, \mu_{\boldsymbol{s}}(\boldsymbol{x}_{n\boldsymbol{s}})\right)'$, $\boldsymbol{\Sigma}_{\boldsymbol{s}}$ has $(i,j)$-th element $c_{\boldsymbol{s}}(\boldsymbol{x}_{i\boldsymbol{s}}, \boldsymbol{x}_{j\boldsymbol{s}})$, and $\boldsymbol{I}_n$ is the identity matrix of order $n$. We denote this marginal model by $M_{\boldsymbol{s}}$.
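As a concrete sketch of this marginal, the code below builds the prior mean vector and covariance matrix from a truncated basis and evaluates the resulting $n$-variate normal log-density. The cosine basis, the truncation level, and the prior sequences are all hypothetical illustrative choices, not prescribed by the text:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
n, K, sigma = 30, 25, 0.5                  # sample size, truncation level, error sd

x = rng.uniform(size=n)                    # one selected covariate (|s| = 1)
k = np.arange(1, K + 1)
Phi = np.cos(np.outer(x, k) * np.pi)       # uniformly bounded basis evaluated at the design points
mu_k, sd_k = k**-2.0, k**-2.0              # summable prior means and sds (illustrative)

mu_s = Phi @ mu_k                          # prior mean vector of f at the design points
Sigma_s = (Phi * sd_k**2) @ Phi.T          # (i,j) entry: sum_k sd_k^2 phi_k(x_i) phi_k(x_j)

y = rng.normal(size=n)                     # placeholder responses
log_m = multivariate_normal(mu_s, Sigma_s + sigma**2 * np.eye(n)).logpdf(y)
print(log_m)
```

Adding $\sigma^2 \boldsymbol{I}_n$ to the (possibly rank-deficient) prior covariance is what keeps the marginal covariance positive definite even when $K < n$.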
The true model:
We assume that there exists exactly one particular subset of $\{1, \ldots, p\}$ which is actually associated with the data-generating process of $y$. We term this subset the true subset. The evaluation of the proposed model selection procedure rests on its ability to identify this true subset, irrespective of the form of the function. Once such a set is identified, a considerable amount of time and money can be saved by discarding the other regressors in future research, and this does not depend on the functional form of the relation between the response and the regressors.
Let us denote the true subset of covariate indices by $\boldsymbol{s}_0$, and the true set of uniformly bounded basis functions by $\left\{\phi_{0k}(\boldsymbol{x}_{\boldsymbol{s}_0});\, k = 1, 2, \ldots\right\}$.
To distinguish the true model from the rest, we add a $0$ index to the coefficients of the true model. The true function is then given by
$$f_0(\boldsymbol{x}_{\boldsymbol{s}_0}) = \sum_{k=1}^{\infty} \beta_{0k}\, \phi_{0k}(\boldsymbol{x}_{\boldsymbol{s}_0}), \qquad (12)$$
where $\beta_{0k} \sim N(\mu_{0k}, \sigma_{0k}^2)$ independently, with $\sum_{k=1}^{\infty} |\mu_{0k}| < \infty$ and $\sum_{k=1}^{\infty} \sigma_{0k} < \infty$. We denote the mean vector and the covariance matrix of the Gaussian process prior associated with (12) by $\boldsymbol{\mu}_0$ and $\boldsymbol{\Sigma}_0$, respectively. We denote the corresponding marginal distribution of $\boldsymbol{y}$ as $M_0$.
The Bayes factor of any model $M_{\boldsymbol{s}}$ to the true model $M_0$ associated with the data $\boldsymbol{y}$, given the uniform prior distribution on the model space, is given by
$$BF_{\boldsymbol{s},0} = \frac{m_{\boldsymbol{s}}(\boldsymbol{y})}{m_0(\boldsymbol{y})},$$
where $m_{\boldsymbol{s}}$ and $m_0$ denote the marginal densities of $\boldsymbol{y}$ under $M_{\boldsymbol{s}}$ and $M_0$, respectively.
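Computationally, the log Bayes factor is simply the difference of two Gaussian log marginal densities. The sketch below compares two candidate subsets under a hypothetical product cosine basis and illustrative prior sequences (none of these choices are prescribed by the text):

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_marginal(x_cols, y, sigma=0.5, K=25):
    """Log marginal density of y under the GP model built from the given covariate columns."""
    n = len(y)
    k = np.arange(1, K + 1)
    # product basis over the selected coordinates (hypothetical choice)
    Phi = np.ones((n, K))
    for col in x_cols.T:
        Phi *= np.cos(np.outer(col, k) * np.pi)
    mu_k, sd_k = k**-2.0, k**-2.0          # illustrative summable prior sequences
    mu = Phi @ mu_k
    Sigma = (Phi * sd_k**2) @ Phi.T
    return multivariate_normal(mu, Sigma + sigma**2 * np.eye(n)).logpdf(y)

rng = np.random.default_rng(2)
X = rng.uniform(size=(40, 3))              # p = 3 covariates, n = 40
y = rng.normal(size=40)                    # placeholder responses

# log Bayes factor of the model with s = {1} against the model with s = {1, 2}
log_bf = log_marginal(X[:, [0]], y) - log_marginal(X[:, [0, 1]], y)
print(log_bf)
```

Consistency results of the kind developed below concern the behaviour of exactly this quantity, computed against the true model, as $n \to \infty$.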
Consider the following lemma stating the expressions for the expectation and variance of the logarithm of $BF_{\boldsymbol{s},0}$. The proof is in the supplementary file.
Under the given setup, the expectation and variance of the logarithm of the Bayes factor of any subset of regressors to the true subset, under the true model, are given as follows:
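Since both marginals are Gaussian, these two moments have standard closed forms. Writing $\boldsymbol{A}_{\boldsymbol{s}} = \boldsymbol{\Sigma}_{\boldsymbol{s}} + \sigma^2\boldsymbol{I}_n$ and $\boldsymbol{A}_0 = \boldsymbol{\Sigma}_0 + \sigma^2\boldsymbol{I}_n$, a direct computation with the two $n$-variate normal densities gives the following sketch (a reconstruction under the stated setup, to be checked against the display in the supplementary file):

```latex
E_0\left[\log BF_{\boldsymbol{s},0}\right]
 = \frac{1}{2}\log\frac{|\boldsymbol{A}_0|}{|\boldsymbol{A}_{\boldsymbol{s}}|}
 - \frac{1}{2}\mathrm{tr}\left(\boldsymbol{A}_{\boldsymbol{s}}^{-1}\boldsymbol{A}_0\right)
 + \frac{n}{2}
 - \frac{1}{2}\left(\boldsymbol{\mu}_{\boldsymbol{s}}-\boldsymbol{\mu}_0\right)'
   \boldsymbol{A}_{\boldsymbol{s}}^{-1}
   \left(\boldsymbol{\mu}_{\boldsymbol{s}}-\boldsymbol{\mu}_0\right),

Var_0\left[\log BF_{\boldsymbol{s},0}\right]
 = \frac{1}{2}\mathrm{tr}\left[\left(\boldsymbol{A}_{\boldsymbol{s}}^{-1}\boldsymbol{A}_0
   - \boldsymbol{I}_n\right)^2\right]
 + \left(\boldsymbol{\mu}_{\boldsymbol{s}}-\boldsymbol{\mu}_0\right)'
   \boldsymbol{A}_{\boldsymbol{s}}^{-1}\boldsymbol{A}_0\boldsymbol{A}_{\boldsymbol{s}}^{-1}
   \left(\boldsymbol{\mu}_{\boldsymbol{s}}-\boldsymbol{\mu}_0\right).
```

The expectation is the negative Kullback–Leibler divergence from the true marginal to the candidate marginal, so it is non-positive and vanishes only when the two marginals coincide.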
For any square matrix $\boldsymbol{A}$, let $\lambda_i(\boldsymbol{A})$ denote its $i$-th eigenvalue. For our purpose, we let the eigenvalues be arranged in decreasing order, so that $\lambda_1(\boldsymbol{A}) \geq \lambda_2(\boldsymbol{A}) \geq \cdots$.
3 Weak consistency / probability convergence
In this section we modify the assumptions as follows:
Let $\boldsymbol{A}_{\boldsymbol{s}} = \boldsymbol{\Sigma}_{\boldsymbol{s}} + \sigma^2 \boldsymbol{I}_n$. We assume that for all $\boldsymbol{s}$, as $n \rightarrow \infty$,
To proceed, recall that under the true model, $\boldsymbol{y} = \boldsymbol{\mu}_0 + \boldsymbol{L}\boldsymbol{z}$, where $\boldsymbol{L}$ denotes the lower triangular matrix associated with the Cholesky factorization $\boldsymbol{L}\boldsymbol{L}' = \boldsymbol{\Sigma}_0 + \sigma^2 \boldsymbol{I}_n$, and $\boldsymbol{z} = (z_1, \ldots, z_n)'$, with $z_i \stackrel{iid}{\sim} N(0, 1)$. Then
It also follows that,
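The Cholesky representation used above can be checked numerically. The sketch below factors an arbitrary positive definite matrix (illustrative values only, standing in for $\boldsymbol{\Sigma}_0 + \sigma^2\boldsymbol{I}_n$) and verifies that $\boldsymbol{\mu}_0 + \boldsymbol{L}\boldsymbol{z}$ with standard normal $\boldsymbol{z}$ reproduces the intended covariance:

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 4, 0.5
B = rng.normal(size=(n, n))
Sigma0 = B @ B.T                           # an arbitrary covariance standing in for Sigma_0
A0 = Sigma0 + sigma**2 * np.eye(n)         # Sigma_0 + sigma^2 I_n, positive definite

L = np.linalg.cholesky(A0)                 # lower triangular, with L L' = A0
assert np.allclose(L @ L.T, A0)
assert np.allclose(L, np.tril(L))

# y = mu_0 + L z with z ~ N_n(0, I_n) has covariance A0:
mu0 = np.zeros(n)
Z = rng.normal(size=(n, 200_000))          # many iid standard normal vectors z
Y = mu0[:, None] + L @ Z
print(np.round(np.cov(Y), 1))              # empirical covariance, close to A0
```

Writing $\boldsymbol{y}$ this way turns quadratic forms in $\boldsymbol{y}$ into quadratic forms in the iid standard normals $z_i$, which is what makes the variance and almost-sure calculations below tractable.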
Let , and let us make the following additional assumptions
, where .
as , where .
Assume () – (). Then
From (13) we find that the expectation of the logarithm of the Bayes factor is given by
To evaluate the first part in the above equation, note that
Note that , due to (A5).
Our next theorem shows that the corresponding variance converges to $0$ as $n \rightarrow \infty$.
Under assumptions () – (),
Under assumptions () – (),
4 Almost sure convergence
Now, let us replace assumption () with the slightly stronger assumption
, for .
Assume (), (), (), (), () and (). Then
For convenience, we shall work with
where . Observe that
Now note that
where $C$ is a positive constant. The above result follows by repeated application of the inequality $(a + b)^r \leq 2^{r-1}\left(a^r + b^r\right)$, for non-negative $a$, $b$, where $r \geq 1$.
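This is the familiar $c_r$-type inequality, which follows from convexity of $t \mapsto t^r$ for $r \geq 1$; a quick grid check (illustrative values only):

```python
import numpy as np

# grid check of (a + b)^r <= 2^(r-1) (a^r + b^r) for a, b >= 0 and r >= 1
# (equality holds at a = b)
a = np.linspace(0.0, 10.0, 101)
b = np.linspace(0.0, 10.0, 101)[:, None]
for r in (1.0, 1.5, 2.0, 3.0, 7.0):
    lhs = (a + b)**r
    rhs = 2**(r - 1) * (a**r + b**r)
    assert np.all(lhs <= rhs * (1 + 1e-12) + 1e-12), r
print("inequality verified on the grid")
```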
Let us first obtain the asymptotic order of . Note that
Since is positive definite for any , it follows from Lemma 2 of the Appendix that for any ,
Let us now obtain the asymptotic order of . Note that is univariate normal with mean zero and variance
Now () holds if and only if there exists such that is non-negative definite, for large enough . Hence, further using () and () to see that is uniformly bounded for all , we obtain . It then follows that
Finally, we deal with , which is the same as . Since , where, for , , by Lemma B of Serfling (1980, page 68), it follows that