1.1 Cross-validation Bayes Factors
The cross-validation Bayes factor (CVBF) proposed by Hart and Malloure (2019) is a direct way to apply cross-validation to Bayes factors. Assume that are independent and identically distributed variables from density . Let and be parametric models for where and belong to Euclidean spaces of possibly different dimensions. Hence, the likelihood functions are and The first step in computing the Bayes factor is to split the data matrix into two disjoint parts, which initially and for convenience we take to be
and refers to the particular data split, where , usually These two subsets of the data are the training set and the validation set in cross-validation. Now, let and be the maximum likelihood estimators of and respectively, computed from the data set In the last step, we evaluate the likelihood functions on the validation set, that is, . At this point and are two simple models for the underlying distribution of and therefore we have the cross-validation Bayes factor
However, such a cross-validation statistic depends on the particular training sample employed. If we take a geometric mean of repeats of , then it no longer depends on the particular training sample. CVBF becomes
where used above represents .
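The procedure described above (fit both models on the training set, evaluate them on the validation set, then average over splits) can be illustrated with a minimal sketch. The function names, the number of random splits, and the example models are assumptions made for illustration only; averaging the log CVBF over splits corresponds to taking the logarithm of the geometric mean.

```python
import numpy as np

def log_cvbf(x, train_idx, fit1, loglik1, fit2, loglik2):
    """Log CVBF of model 1 against model 2 for one training/validation split."""
    mask = np.zeros(len(x), dtype=bool)
    mask[train_idx] = True
    train, valid = x[mask], x[~mask]
    # Fit each model by maximum likelihood on the training set only.
    theta_hat = fit1(train)
    lam_hat = fit2(train)
    # Evaluate the two fitted (now simple) models on the validation set.
    return loglik1(valid, theta_hat) - loglik2(valid, lam_hat)

def mean_log_cvbf(x, m, n_splits, rng, fit1, loglik1, fit2, loglik2):
    """Average log CVBF over random training sets of size m.

    The mean on the log scale is the log of the geometric mean of the CVBFs.
    """
    logs = []
    for _ in range(n_splits):
        train_idx = rng.choice(len(x), size=m, replace=False)
        logs.append(log_cvbf(x, train_idx, fit1, loglik1, fit2, loglik2))
    return float(np.mean(logs))

def normal_loglik(v, mean):
    """Log-likelihood of N(mean, 1) on the points v."""
    return float(np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (v - mean) ** 2))

# Hypothetical example: N(theta, 1) with unknown mean against the fixed model N(0, 1).
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=1.0, size=100)
result = mean_log_cvbf(
    x, m=20, n_splits=200, rng=rng,
    fit1=lambda t: t.mean(), loglik1=normal_loglik,
    fit2=lambda t: 0.0, loglik2=normal_loglik,
)
print(result)  # positive values favour the unknown-mean model
```

Random splits are used here instead of enumerating all possible training sets, which is only a convenience of the sketch.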
and respectively. The maximum likelihood estimator for the mean is just the sample mean. Computing the estimator from the training set and evaluating the likelihood functions on the validation set, we take the ratio of the likelihood functions of the two models being compared. Finally, the geometric average over all possible training samples of size is calculated as
1.2 Geometric Intrinsic Bayes Factors
For intrinsic Bayes factors (IBFs), we compute a posterior from the prior distribution using the training sample, and then evaluate the marginal likelihoods of both models on the validation set. Taking a geometric mean of the IBFs gives the geometric intrinsic Bayes factor (GIBF), which is expressed as below (here we use the same notation as in Section 1.1),
where used above represents .
1.3 Corrected Intrinsic Bayes Factors
A proper prior integrates to 1. For example, a normal distribution integrates to one, while a uniform distribution on the whole real line does not, whatever the choice of the constant c, because $\int_{-\infty}^{\infty} c \, d\theta = \infty$ for any $c > 0$. Distributions that cannot be normalized to integrate to 1, such as this uniform distribution, are therefore called improper priors, and they are often regarded as uninformative priors. Uninformative priors are sensible in hypothesis testing problems since they should take into account the null considered. The priors of arithmetic intrinsic Bayes factors integrate to exactly 1. However, geometric intrinsic priors usually integrate to a finite positive constant, so they need a correction. As Berger and Pericchi (1996b) proposed, the geometric intrinsic prior is
where , are in the set , is the parameter under analysis, is a constant value for under the simpler model and stands for non-informative.
The integral of the geometric prior is
where is a positive constant.
Therefore, the Corrected Geometric prior is
which integrates to one.
After this correction to the geometric prior, the GIBF must be modified accordingly so that it becomes a corrected GIBF.
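The correction step (divide a prior of finite mass by its integral) can be checked numerically. The sketch below uses a hypothetical unnormalized density, exp(-theta^2), chosen only because its integral, sqrt(pi), is known; it is not the geometric intrinsic prior of the text, and the grid limits are likewise assumptions.

```python
import numpy as np

def normalise_on_grid(unnormalised, grid):
    """Return (integral, normalised values) using a simple Riemann sum."""
    dx = grid[1] - grid[0]
    values = unnormalised(grid)
    constant = float(np.sum(values) * dx)
    return constant, values / constant

# Hypothetical finite-mass prior: exp(-theta^2) integrates to sqrt(pi), not to 1.
grid = np.linspace(-10.0, 10.0, 200_001)
constant, corrected = normalise_on_grid(lambda t: np.exp(-t ** 2), grid)

# After dividing by the constant, the corrected prior integrates to one.
total = float(np.sum(corrected) * (grid[1] - grid[0]))
print(constant, total)
```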
2 Problem Statement
The popular intrinsic Bayes factor requires a training sample only as large as the number of parameters, which is a small number for an extensive collection of problems. However, it cannot avoid integration over the parameter space, even when an intrinsic prior distribution is available.
On the other hand, the cross-validation Bayes factor is quite simple, because it requires neither the choice of a prior distribution nor any integration. In this regard, if we can adapt CVBF to be a Bayesian method and establish a bridge between CVBF and IBF (more precisely, GIBF), then we obtain a double cure. The first is to circumvent the computational difficulties related to GIBF, and the second is to make CVBF a truly Bayesian statistic.
In short, with the help of GIBF, CVBF may become a useful approximation if one finds a hidden prior distribution and a reasonable training sample size.
Another crucial question is whether CVBF can be used in model selection without further concerns. Stability and consistency are therefore also considered in this article.
3 Normal Means Problem
Let us begin with the simplest problem to gain insight into the interplay between GIBF and CVBF. We consider a hypothesis test with a null hypothesis , in which is a normal distribution with mean and variance where is known. The alternative hypothesis is a normal distribution with mean different from and the same variance .
3.1 CVBF in normal means problem
Applying the expression in Section 1.1, CVBF becomes
where is the mean of the training samples while is the mean of the validation set.
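As an illustration of how the CVBF depends only on the training mean and the validation mean in this problem, the following sketch compares a closed form with a direct evaluation of the two log-likelihoods. The closed form is derived here from the normal likelihood with known variance and is an assumption of the sketch, as are the symbol names `theta0`, `sigma` and the direction of the ratio (alternative over null); it may be written differently in the text.

```python
import numpy as np

def log_cvbf_normal(x, train_idx, theta0, sigma):
    """Closed-form log CVBF of H1 (mean unknown) against H0 (mean theta0)."""
    mask = np.zeros(len(x), dtype=bool)
    mask[train_idx] = True
    train_mean = x[mask].mean()      # MLE of the mean from the training set
    valid = x[~mask]
    valid_mean = valid.mean()
    k = len(valid)                   # validation set size
    return float(
        k / (2 * sigma ** 2)
        * ((valid_mean - theta0) ** 2 - (valid_mean - train_mean) ** 2)
    )

def log_cvbf_normal_direct(x, train_idx, theta0, sigma):
    """Same quantity, computed as a difference of validation log-likelihoods."""
    mask = np.zeros(len(x), dtype=bool)
    mask[train_idx] = True
    train_mean = x[mask].mean()
    valid = x[~mask]
    loglik_alt = -np.sum((valid - train_mean) ** 2) / (2 * sigma ** 2)
    loglik_null = -np.sum((valid - theta0) ** 2) / (2 * sigma ** 2)
    # The normalising constants of the two normal densities cancel.
    return float(loglik_alt - loglik_null)

rng = np.random.default_rng(3)
x = rng.normal(loc=0.3, scale=2.0, size=50)   # hypothetical data
train_idx = np.arange(10)                      # hypothetical split
closed = log_cvbf_normal(x, train_idx, theta0=0.0, sigma=2.0)
direct = log_cvbf_normal_direct(x, train_idx, theta0=0.0, sigma=2.0)
print(closed, direct)
```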
In general, CVBF can be expressed as
The geometric average of CVBFs is
where is the number of simulations, in this case , which includes all possible training sample sets, is a Chi-square distribution with degrees of freedom, and is a non-central Chi-square distribution with K degrees of freedom and non-centrality .
Hence, under the alternative model, we have
and the variance of the logarithm of CVBF is
3.2 Corrected GIBF in normal means problem
We also apply the formula in Section 1.2 to express GIBF in the normal means problem. For simplicity, we use a uniform distribution as the non-informative prior distribution, which we denote . The IBF is given by
where is the mean of training samples and is the mean of the data for split .
A training sample of size 1 is minimal because it coincides with the number of parameters in the model. The correction constant we computed for GIBF is . After this correction, the corrected IBF is
Therefore, corrected GIBF can be expressed by
Then, under model 1, the expectation of the logarithm of the corrected IBF is
and the variance is
Here we denote the training sample size by . Even with a minimal training sample size, GIBF performs quite well in most scenarios, some of which are presented in the following section.
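For readers who wish to experiment, the uncorrected GIBF with a flat prior and training samples of size 1 can be sketched as follows. The closed forms in the comments are derived here for the normal model with known variance and are assumptions of the sketch; the correction constant discussed above is deliberately not applied, and the direction of the ratio (alternative over null) is also an assumption.

```python
import numpy as np

def log_gibf_normal(x, theta0, sigma):
    """Uncorrected log GIBF of H1 against H0 with pi(theta) = 1, training size 1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Full-data Bayes factor under the flat prior:
    # B(x) = sqrt(2*pi*sigma^2/n) * exp(n*(xbar - theta0)^2 / (2*sigma^2)).
    log_b_full = (
        0.5 * np.log(2 * np.pi * sigma ** 2 / n)
        + n * (x.mean() - theta0) ** 2 / (2 * sigma ** 2)
    )
    # Bayes factor based on each single training point x_l:
    # B(x_l) = sqrt(2*pi*sigma^2) * exp((x_l - theta0)^2 / (2*sigma^2)).
    log_b_train = 0.5 * np.log(2 * np.pi * sigma ** 2) + (x - theta0) ** 2 / (2 * sigma ** 2)
    # IBF_l = B(x) / B(x_l); the geometric mean averages the logs over l.
    return float(log_b_full - np.mean(log_b_train))

rng = np.random.default_rng(5)
x_null = rng.normal(0.0, 1.0, size=200)   # hypothetical data under the null
x_alt = rng.normal(1.0, 1.0, size=200)    # hypothetical data under the alternative
print(log_gibf_normal(x_null, 0.0, 1.0), log_gibf_normal(x_alt, 0.0, 1.0))
```

Because the correction is a fixed constant, adding its logarithm to the returned value would shift every result by the same amount.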
3.3 Bridge Rule and consistency analysis
For GIBF and CVBF to be approximately equivalent when the null hypothesis is true, the training sample size of CVBF must be chosen appropriately. We propose that the training sample size for CVBF should be
where is the sample size. Under this Rule, the consistency of GIBF under the null model carries over to CVBF. It is worth mentioning that the Bridge Rule , at least on the domain , can be approximated by linear regression. In Figure 1, is fitted by a linear equation . Therefore, the bridge rule is approximately a straight line with a slope of 0.152.
On the other hand, this raises a question: do we obtain consistency for CVBF and GIBF under the alternative?
By equation , under the rule, the rate for CVBF could be changed. Under ,
where has a standard Chi-square distribution.
at a rate , where is a positive constant.
When it comes to corrected GIBF,
at a rate .
On the other hand, we analyze the scenario under ,
at a rate n.
Since the term dominates the inequality, at a rate n.
If we ignore the constant terms, such as , we can express the expectations of CVBF and the corrected IBF simply as below,
respectively, as n goes to infinity. It is easy to see that at while also at when goes to infinity. This means that, when is chosen as the training sample size of the cross-validation Bayes factor, CVBF and the corrected geometric intrinsic Bayes factor are consistent under both the null hypothesis and the alternative.
From the above, we can conclude that under the Rule, GIBF and CVBF are consistent under both the null model and the alternative model. Furthermore, they tend to the correct model at the same rate of convergence, which is quite a promising result.
It can be argued that the constant that corrects the GIBF may be difficult to compute in complex problems. However, even if we do not calculate the constant exactly for each problem and instead use the correction for normal problems, both CVBF and GIBF are still expected to be asymptotically consistent at the same rate, since the correction factor is just a fixed constant, bounded away from both zero and infinity.
3.4 Performances and simulations
Under the null hypothesis, we generate data of size 100 from a normal distribution of . Under the alternative model, we generate data of the same size from a normal distribution of . We analyze type I errors and type II errors under different training sample sizes (from 5 to 95 with spacing 5). We then use the Receiver Operating Characteristic (ROC) curve to evaluate the scores via the Area Under the Curve (AUC).
Figure 2 and Figure 3 suggest that the area under the curve of CVBF is 0.7125, while the AUC of GIBF is 0.9960. An area of 1 represents a perfect test, while an area of 0.5 represents a worthless test. This indicates that GIBF is an excellent method and better than CVBF, and that it does not require specifying a training sample size. However, CVBF remains an attractive, simple method, although it needs a careful assessment of its training sets. Our proposal, which we call the bridge rule, seems sensible.
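An AUC of this kind can be computed from simulated scores as the probability that a score generated under the alternative exceeds one generated under the null. The sketch below is only an illustration: the means (0 and 0.5), the unit variance, the training size 15, the number of replications, and the use of a single split per data set are all hypothetical choices, so its output is not expected to reproduce the figures quoted above.

```python
import numpy as np

def auc(null_scores, alt_scores):
    """Area under the ROC curve: P(alt > null) + 0.5 * P(alt == null)."""
    null_scores = np.asarray(null_scores, dtype=float)
    alt_scores = np.asarray(alt_scores, dtype=float)
    diff = alt_scores[:, None] - null_scores[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

def log_cvbf_normal(x, m, theta0=0.0, sigma=1.0):
    """Log CVBF (alternative over null), first m points used as training set."""
    train_mean = x[:m].mean()
    valid = x[m:]
    k = len(valid)
    return float(
        k / (2 * sigma ** 2)
        * ((valid.mean() - theta0) ** 2 - (valid.mean() - train_mean) ** 2)
    )

rng = np.random.default_rng(1)
n, m, reps = 100, 15, 500   # hypothetical simulation settings
null_scores = [log_cvbf_normal(rng.normal(0.0, 1.0, n), m) for _ in range(reps)]
alt_scores = [log_cvbf_normal(rng.normal(0.5, 1.0, n), m) for _ in range(reps)]
sim_auc = auc(null_scores, alt_scores)
print(sim_auc)
```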
Now we vary the sample size but fix the training sample size for GIBF as 1. According to our Rule, the training sample size for CVBF would depend on the training sample size of GIBF. Under the null hypothesis, the data points are generated from a normal distribution of . In another scenario, under the alternative model, the data points are drawn from a normal distribution of . The sample size varies from 5 to 500, with spacing 5.
From Figure 4 and Figure 5, we can observe that CVBF and GIBF are consistent under the null hypothesis, and the same happens under the alternative model. Furthermore, the expectations coincide with the simulations. We use GIBF as a guide for choosing the training sets for CVBF and take advantage of the simplicity of CVBF for computing Bayes factors. The only shortcoming is that we pay a price in variability: the variance of CVBF is larger than that of GIBF, which can also be seen from Section 3.1 and Section 3.2 if we vary , and take , , , and when we compute the variances. Fortunately, as Tukey and McLaughlin (1963) suggested, we can reduce this large variability by trimming the two ends of the ordered sequence of Bayes factor values in our simulations, which substantially reduces the variance of CVBF.
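The trimming idea can be sketched as a symmetric trimmed mean applied to the ordered values, for instance to the log Bayes factors obtained over repeated splits. The trimming proportion and the example values below are hypothetical; the text does not state which proportion was used.

```python
import numpy as np

def trimmed_mean(values, proportion):
    """Mean after dropping the given proportion of values from each end."""
    ordered = np.sort(np.asarray(values, dtype=float))
    cut = int(np.floor(proportion * len(ordered)))
    if 2 * cut >= len(ordered):
        raise ValueError("trimming proportion removes all values")
    return float(ordered[cut:len(ordered) - cut].mean())

# Hypothetical log Bayes factors with one extreme value at each end.
log_bfs = [-50.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 90.0]
print(np.mean(log_bfs), trimmed_mean(log_bfs, 0.25))
```

With a proportion of 0.25 and eight values, the two smallest and the two largest values are discarded before averaging.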
4 Exponential case
The probability density function of an exponential distribution is , where . The null hypothesis is and the alternative is .
4.1 IBF in exponential
For the IBF we use the Jeffreys prior . Moreover, for simplicity, the training sample size of the IBF equals one, the number of parameters.
Then the intrinsic Bayes factor is
where is one data point, is the mean of the remaining data points and is the Gamma function.
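A minimal sketch of the uncorrected GIBF for this problem follows. It assumes the rate parametrization of the exponential density, rate * exp(-rate * x), a prior proportional to 1/rate, and a ratio of the alternative over the null; these are assumptions of the sketch and may differ from the parametrization used in the text. The correction factor discussed next is not applied.

```python
import math
import numpy as np

def log_gibf_exponential(x, rate0):
    """Uncorrected log GIBF of H1 (rate unknown) against H0 (rate = rate0)."""
    x = np.asarray(x, dtype=float)
    n, total = len(x), float(x.sum())
    # Full data: m1(x) = Gamma(n) / total^n under pi(rate) = 1/rate,
    # and m0(x) = rate0^n * exp(-rate0 * total).
    log_b_full = math.lgamma(n) - n * math.log(rate0 * total) + rate0 * total
    # Single training point x_l: B(x_l) = 1 / (rate0 * x_l * exp(-rate0 * x_l)).
    log_b_train = -np.log(rate0 * x) + rate0 * x
    # IBF_l = B(x) / B(x_l); the geometric mean averages the logs over l.
    return float(log_b_full - np.mean(log_b_train))

rng = np.random.default_rng(7)
x_null = rng.exponential(scale=1.0, size=100)   # hypothetical data, rate 1
x_alt = rng.exponential(scale=0.25, size=100)   # hypothetical data, rate 4
print(log_gibf_exponential(x_null, 1.0), log_gibf_exponential(x_alt, 1.0))
```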
The correction factor for GIBF is , where is the digamma function evaluated at 1. Hence, the log of the corrected IBF is
where is the Gamma function, and , , are Gamma distributions.
After taking the expectations of each term, the expectation of the log of GIBF is
where is the digamma function evaluated at n.
4.2 CVBF in exponential
The cross-validation Bayes factor in the exponential case is
and the log of the CVBF is
where is a Gamma distribution and is a Beta distribution of the second kind.
We then obtain the expectation of the log of CVBF.
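For a single split, the log CVBF in the exponential case can be sketched as below. The rate parametrization, rate * exp(-rate * x), and the direction of the ratio (alternative over null) are assumptions of this sketch; under a mean parametrization the maximum likelihood estimator would be the training mean itself rather than its reciprocal.

```python
import numpy as np

def log_cvbf_exponential(x, train_idx, rate0):
    """Log CVBF of H1 (rate estimated by MLE) against H0 (rate = rate0)."""
    x = np.asarray(x, dtype=float)
    mask = np.zeros(len(x), dtype=bool)
    mask[train_idx] = True
    rate_hat = 1.0 / x[mask].mean()   # MLE of the rate from the training set
    valid = x[~mask]
    k, s = len(valid), float(valid.sum())
    # Difference of validation log-likelihoods: sum of log(rate) - rate * x.
    return float(k * np.log(rate_hat / rate0) - (rate_hat - rate0) * s)

rng = np.random.default_rng(11)
x = rng.exponential(scale=0.5, size=100)   # hypothetical data, rate 2
value = log_cvbf_exponential(x, np.arange(20), rate0=1.0)
print(value)
```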
4.3 Approximations and Bridge Rule
We need to make some approximations. One property of the digamma function is
where is the Euler-Mascheroni constant. As in Johnson et al. (1970), we have the approximation
And also, for ,
With these approximations, we can work out the Bridge Rule for the training sets of CVBF in the exponential case. Unsurprisingly, under the null hypothesis, CVBF approximates GIBF when the bridge rule is used with a large sample size.
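The quality of digamma approximations of this kind can be checked numerically. The sketch below uses two standard facts, stated here as assumptions because the exact formulas used in the text may differ: for a positive integer n, the digamma function equals minus the Euler-Mascheroni constant plus the sum of 1/k for k from 1 to n-1, and for large n it is approximately log(n) - 1/(2n).

```python
import numpy as np

def digamma_int(n):
    """Digamma at a positive integer n: -gamma + sum_{k=1}^{n-1} 1/k."""
    return -np.euler_gamma + float(np.sum(1.0 / np.arange(1, n)))

def digamma_approx(n):
    """Leading terms of the asymptotic expansion: log(n) - 1/(2n)."""
    return float(np.log(n) - 1.0 / (2.0 * n))

# The approximation error shrinks quickly as n grows.
for n in (5, 10, 100, 1000):
    print(n, digamma_int(n), digamma_approx(n), digamma_int(n) - digamma_approx(n))
```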
Under the null hypothesis, the expectations of the log of IBF and the log of CVBF are
They both go to at a rate of .
Under the alternative, the expectations of the log-IBF and log-CVBF both go to at a rate of . Based on the two situations above, we can conclude that CVBF and GIBF converge for a sufficiently large under the Bridge Rule .
4.5 Simulations in exponential
To set up a scenario, we consider a hypothesis test of a null model against an alternative . Fixing the size of the data at 100, we vary the to see the tendencies of CVBF and GIBF. We observe from Figure 6 that CVBF nearly coincides with GIBF, with only tiny gaps, under the Rule. Note that CVBF is slightly more sensitive to the parameter: roughly, CVBF favors the null hypothesis on the domain of while GIBF favors the null model on . The interval for selecting the null from CVBF is slightly narrower than the one from GIBF.
5 Two-parameter in normal means problem
Having worked out the one-parameter case, we now look into a two-parameter normal means problem to check consistency.
Suppose we have a hypothesis test in which the first half of the data points have mean and the second half of the data points have mean . The system can be expressed by
where , and is known.
The hypotheses are
The priors for and are identical uniform distributions. In this regard, the corrected IBF is
where , ,…, are non-central Chi-squared distributions with 1 degree of freedom and non-centrality parameters and , respectively. is the training sample size, here .
The CVBF is
where and are standard Chi-squared distributions and , , are non-central Chi-squared distributions with 1 degree of freedom and non-centrality parameters , and , respectively.
The expectation of the log of the corrected IBF (with m set to 2) is
The expectation of the log of CVBF is
When we apply the bridge rule, it becomes
We introduce an updated rule for the training sample sizes of CVBF, which is
where is the number of parameters. In this case, the rule is , which forces IBF and CVBF to be equivalent in expectation when the null model is valid. IBF and CVBF both go to at a rate of when goes to under the null model, and they go to at a rate of when goes to under the alternative. Hence, we have verified that they are consistent in this setting.
6 Unknown-variance in normal means
Here we analyze a hypothesis test with a null hypothesis following a normal distribution and the alternative following a normal distribution , where is unknown and different from , and is unknown. Notice that this test is quite different from the one in Section 3.1. For convenience and simplicity, we use as the prior distribution for the null model and , that is, a modified Jeffreys prior in Berger and Pericchi (1996b), as the prior distribution for the alternative model when computing GIBF.
6.1 Expressions of GIBF and CVBF
In this case, the training sample size equals the number of parameters, which is 2. Hence, the IBF is
where and are two data points, which are training samples.
Furthermore, after taking the geometric mean, the GIBF is
The log of the IBF is
where , , and are Chi-squared distributions with degree of freedom and non-centrality , degree of freedom and non-centrality , degree of freedom and non-centrality , and degree of freedom and non-centrality , respectively.
The expectation can only be evaluated as an infinite series (see Berger and Pericchi (1996a)), but numerical solutions are straightforward. As Berger and Pericchi (1996b) claimed, one can simulate the expectation with parameters using the MLE of the original data.
Here we do not correct the IBF since the expectation is an infinite series. Fortunately, the correction factor is just a constant, so we can analyze consistency with or without the correction, since the resulting error is minimal.
Similarly, CVBF encounters difficulties. As in the other cases, one uses the MLE to estimate the parameters. In this case, there are two parameters, the mean and the variance. It is simple to compute the MLEs, and .
After partitioning data into a training set and a validation set, CVBF becomes
where is the mean of the training set.
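A single-split CVBF for this unknown-variance problem can be sketched as follows. The null mean `theta0` (set to 0 by default), the split, and the simulated data are hypothetical; under the null the variance MLE is taken about `theta0`, and under the alternative both the mean and the variance are estimated from the training set. The direction of the ratio (alternative over null) is also an assumption.

```python
import numpy as np

def normal_loglik(v, mean, var):
    """Log-likelihood of N(mean, var) on the points v."""
    return float(np.sum(-0.5 * np.log(2 * np.pi * var) - (v - mean) ** 2 / (2 * var)))

def log_cvbf_unknown_variance(x, train_idx, theta0=0.0):
    """Log CVBF of H1 (mean and variance unknown) against H0 (mean = theta0)."""
    x = np.asarray(x, dtype=float)
    mask = np.zeros(len(x), dtype=bool)
    mask[train_idx] = True
    train, valid = x[mask], x[~mask]
    # H0: mean fixed at theta0, variance estimated by its MLE.
    var0 = float(np.mean((train - theta0) ** 2))
    # H1: mean and variance both estimated by their MLEs.
    mean1 = float(train.mean())
    var1 = float(np.mean((train - mean1) ** 2))
    return normal_loglik(valid, mean1, var1) - normal_loglik(valid, theta0, var0)

rng = np.random.default_rng(13)
x = rng.normal(loc=2.0, scale=1.0, size=100)   # hypothetical data
value = log_cvbf_unknown_variance(x, np.arange(30))
print(value)  # positive values favour the alternative
```

The training set needs at least two distinct points, otherwise the variance estimate under the alternative is zero.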
Then the log of CVBF is
where is a random variable with a Chi-squared distribution.
However, the expectation will be an infinite series, which includes a confluent hypergeometric function of the first kind.
Luckily, we can still work out the expectations of IBF and CVBF under the null model. When the null is true, the expectation of the log of IBF becomes
and the expectation of the log of CVBF becomes
in which we use the rule so that they are consistent when is large. In the next section, we will analyze consistency under the alternative by simulations.
Since the n