1 Introduction
Let, for , and denote the th response variable and the associated vector of covariates. We assume that the covariate consists of components, and that it is required to select a subset of the components that best explains the response variable . Let denote any subset of the indices . We denote by the coordinates of associated with . To relate to we consider the following nonparametric regression setup:
(1) 
where is the random error and the function is considered unknown. We assume that , where .
By assuming this framework we include the possibility that can be a one-dimensional, two-dimensional, and so on, up to a dimensional function. We further assume that there exists a set of regressors which truly influences the dependent variable , and therefore the function is the true function. Our problem is to identify , i.e., the set of truly active regressors. Note that we have not assumed any specific form for the function. Irrespective of its form, we are only interested in identifying the active regressors.
2 Notations and concepts
By a model, here we mean a particular subset . Our aim is to find the true model among all possible candidate models.
For any where is the number of components in , we represent using basis functions as follows:
(2) 
We assume that , henceforth abbreviated as , denotes the th basis function spanning the relevant Hilbert space equipped with some appropriate inner product . can be expressed as the product of individual basis functions as
Here , henceforth , stands for the th basis function for the th component of the vector . We make the following assumption regarding :

(A1) For , is uniformly bounded.
Note that assumption (A1) implies that are uniformly bounded for all . Using assumption (A1), we have
(3) 
where stands for up to some positive multiplicative constant.
To ensure that almost surely we need to choose the prior on carefully. In this regard we assume the following:

(A2) , where ’s and ’s satisfy
(4) (5)
The above two convergence assumptions ensure, by virtue of simple application of Kolmogorov’s three series theorem characterizing series convergence (see Chow and Teicher (1988)), that
(6) 
The proof of (6) is provided in Appendix A. Now (6) guarantees that almost surely, via (3). Therefore, almost surely belongs to the Hilbert space spanned by the basis functions.
Hence, the prior on is a Gaussian process with mean
(7) 
The covariance between and is given by
(8) 
Since is uniformly bounded by assumption (A1), conditions (4) and (5) guarantee that both (7) and (8) are well-defined.
Now, for the dataset , where denotes the available th covariate vector associated with the indices , (7) and (8) yield the component mean vector and the dimensional covariance matrix, given by
(9)  
(10) 
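As an illustration of how a mean vector and covariance matrix of the kind in (9) and (10) arise from a basis expansion with independent Gaussian coefficients, consider the following minimal Python sketch. Several ingredients are assumptions made only for illustration and are not taken from the text: the series is truncated at a finite number of terms, the basis is a cosine basis on the unit interval (chosen because it is uniformly bounded), the multivariate basis function is taken as a product of the same univariate function over the components, and the names `cosine_basis` and `gp_prior_moments` are hypothetical.

```python
import numpy as np

def cosine_basis(x, j):
    """Hypothetical uniformly bounded basis function cos(j*pi*x), with |value| <= 1."""
    return np.cos(j * np.pi * x)

def gp_prior_moments(X, coef_means, coef_sds, basis=cosine_basis):
    """Prior mean vector and covariance matrix of f at the rows of X.

    Illustrative assumptions: f(x) = sum_j beta_j * phi_j(x), truncated at
    J = len(coef_means) terms, with independent beta_j ~ N(coef_means[j], coef_sds[j]**2)
    and phi_j(x) = prod_k basis(x_k, j).
    X must be a 2-D array of shape (n, number of selected covariates).
    """
    X = np.asarray(X, dtype=float)
    coef_means = np.asarray(coef_means, dtype=float)
    coef_sds = np.asarray(coef_sds, dtype=float)
    J = coef_means.shape[0]
    # Phi[i, j] holds the (j+1)-th product basis function evaluated at observation i.
    Phi = np.stack([np.prod(basis(X, j + 1), axis=1) for j in range(J)], axis=1)
    mu = Phi @ coef_means                    # n-component mean vector
    Sigma = (Phi * coef_sds ** 2) @ Phi.T    # n x n covariance matrix
    return mu, Sigma

# Example with decaying (hence summable) coefficient means and variances.
rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(6, 2))
j_index = np.arange(1, 21)
mu_demo, Sigma_demo = gp_prior_moments(X_demo, 1.0 / j_index ** 2, 1.0 / j_index)
```

The covariance matrix produced this way is symmetric and nonnegative definite by construction, since it has the form of a matrix times a nonnegative diagonal matrix times its transpose.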
The marginal distribution of is then the variate normal, given by
(11) 
where is the identity matrix of order . We denote this marginal model by .
The true model:
We assume that there exists exactly one particular subset of which is actually associated with the data generating process of . We term this subset the true subset. The proposed model selection procedure is evaluated by its ability to identify this true subset, irrespective of the form of the function . Once such a set is identified, a considerable amount of time and money can be saved by discarding the other regressors in future research, and this saving does not depend on the functional form of the relation between the response and the regressors.
Let us denote the true subset of covariate indices by , and the true set of uniformly bounded basis functions by
To distinguish the true model from the rest we add an index to the coefficients of the true model. The true function is then given by
(12) 
where , with and . We denote the mean vector and the covariance matrix of the Gaussian process prior associated with (12) by and , respectively. We denote the corresponding marginal distribution of as .
Given a uniform prior distribution on the model space, the Bayes factor of any model against the true model associated with the data is given by
(13)  
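Since both marginal models are multivariate normal, the logarithm of a Bayes factor of this type is a difference of two Gaussian log-densities evaluated at the same data. The Python sketch below illustrates the computation under an assumption that is not confirmed by the text: each marginal covariance is taken to be the prior covariance matrix plus a noise variance times the identity matrix. The function names and the argument `noise_var` are hypothetical.

```python
import numpy as np

def gaussian_log_marginal(y, mu, Sigma, noise_var):
    """Log-density of N(mu, Sigma + noise_var * I) at y, computed via a Cholesky factor."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    # Lower triangular L with L @ L.T equal to the marginal covariance.
    L = np.linalg.cholesky(np.asarray(Sigma, dtype=float) + noise_var * np.eye(n))
    z = np.linalg.solve(L, y - np.asarray(mu, dtype=float))   # z = L^{-1} (y - mu)
    log_det_half = np.sum(np.log(np.diag(L)))                 # equals 0.5 * log det(covariance)
    return -0.5 * n * np.log(2.0 * np.pi) - log_det_half - 0.5 * z @ z

def log_bayes_factor(y, mu_s, Sigma_s, mu_0, Sigma_0, noise_var):
    """Log Bayes factor of a candidate model (s) against a reference model (0)."""
    return (gaussian_log_marginal(y, mu_s, Sigma_s, noise_var)
            - gaussian_log_marginal(y, mu_0, Sigma_0, noise_var))
```

Working on the log scale through the Cholesky factor avoids forming determinants and inverses explicitly, which matters numerically as the sample size grows.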
Consider the following lemma stating the expressions for the expectation and variance of the logarithm of . The proof is in the supplementary file.
Lemma 1.
Under the given setup, the expectation and variance of the logarithm of the Bayes factor of any subset of regressors against the true subset, under the true subset, are given as follows:
(14)  
(15) 
For any square matrix , let denote its th eigenvalue, i.e., . For our purpose, let the eigenvalues be arranged in decreasing order.
3 Weak consistency / probability convergence
In this section we modify the assumptions as follows:

(A3) Let . We assume that for all , as ,
where .
To proceed, recall that , where denotes the appropriate lower triangular matrix associated with the Cholesky factorization, and , with . Then
It also follows that,
(16) 
Let , and let us make the following additional assumptions

(A4) , where .

(A5) (17) (18)
as , where .
Theorem 1.
Assume () – (). Then
(19) 
where .
Proof.
From (13) we find that the expectation of the logarithm of the Bayes factor is given by
(20) 
To evaluate the first part in the above equation, note that
(21)  
Note that , due to (A5).
Our next theorem shows that , as .
Theorem 2.
Under assumptions () – (),
(25) 
as .
Instead of proving Theorem 2 we shall prove a stronger version of the theorem in Section 4 in the context of almost sure convergence.
Theorem 3.
Under assumptions () – (),
(26) 
4 Almost sure convergence
Now, let us replace assumption () with the slightly stronger assumption

, for .
Theorem 4.
Assume (), (), (), (), () and (). Then
(27) 
Proof.
For convenience, we shall work with
where . Observe that
(28) 
Now note that
(29) 
where is a positive constant. The above result follows by repeated application of the inequality , for nonnegative , , where .
Let us first obtain the asymptotic order of . Note that
(30) 
The following results (see, for example, Magnus (1978), Kendall and Stuart (1947)) will be useful for our purpose.
(31)  
(32)  
(33)  
(34) 
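Moment results of this kind concern quadratic forms in normal variables. As a hedged illustration only, the Python sketch below checks by simulation the two best-known identities of this type for a standard normal vector z and a square matrix A with symmetric part S: the expectation of z'Az equals tr(S) and its variance equals 2 tr(S^2). These two identities are standard facts and are not claimed to coincide with the exact displays (31) to (34); the function names are hypothetical.

```python
import numpy as np

def quadratic_form_moments(A):
    """Exact mean and variance of z' A z for z ~ N(0, I)."""
    S = 0.5 * (A + A.T)   # z' A z depends on A only through its symmetric part
    return np.trace(S), 2.0 * np.trace(S @ S)

def simulate_quadratic_form(A, num_draws, seed=0):
    """Monte Carlo estimates of the mean and variance of z' A z."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((num_draws, A.shape[0]))
    q = np.einsum("ij,jk,ik->i", z, A, z)   # q[i] = z_i' A z_i
    return q.mean(), q.var()

A_demo = np.diag([1.0, 2.0, 3.0])
exact_mean, exact_var = quadratic_form_moments(A_demo)
sim_mean, sim_var = simulate_quadratic_form(A_demo, num_draws=200_000)
```

With enough draws the simulated values should be close to the exact ones, up to Monte Carlo error.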
Substituting (31), (32), (33) and (34) in (30) we obtain
(35) 
Since is positive definite for any , it follows from Lemma 2 of the Appendix that for any ,
(36) 
Now, , as by (17) of (A5). Hence, it follows using (36) and (A4), that for ,
(37) 
Substituting (37) in (35) we see that
(38) 
Let us now obtain the asymptotic order of . Note that is univariate normal with mean zero and variance
Now () holds if and only if there exists such that is nonnegative definite, for large enough . Hence, further using () and () to see that is uniformly bounded for all , we obtain . Hence it follows that
(39) 
Finally, we deal with which is the same as . Since , where, for , , by Lemma B of Serfling (1980) (page 68), it follows that