A Bridge between Cross-validation Bayes Factors and Geometric Intrinsic Bayes Factors

06/11/2020 ∙ by Yekun Wang, et al.

Model selection in Bayesian statistics is primarily carried out with statistics known as Bayes factors, which are directly related to the posterior probabilities of models. Bayes factors require a careful assessment of prior distributions, as in the Intrinsic Priors of Berger and Pericchi (1996a), and integration over the parameter space, which may be high dimensional. Recently, researchers have been proposing alternatives to Bayes factors that require neither integration nor the specification of priors. These developments are still at a very early stage and are known as Prior-free Bayes Factors, Cross-Validation Bayes Factors (CVBF), and Bayesian "Stacking." Methods of this kind and the Intrinsic Bayes Factor (IBF) both avoid the specification of a prior. However, Prior-free Bayes factors may need a careful choice of training sample size. In this article, a way of choosing training sample sizes for Prior-free Bayes factors, based on Geometric Intrinsic Bayes Factors (GIBFs), is proposed and studied. We present essential examples with different numbers of parameters and study their statistical behavior both numerically and theoretically to explain the ideas behind choosing a feasible training sample size for Prior-free Bayes Factors. We put forward the "Bridge Rule" as an assignment of a training sample size for CVBFs that makes them close to Geometric IBFs. We conclude that, even though tractable Geometric IBFs are preferable, CVBFs using the Bridge Rule are useful and economical approximations to Bayes Factors.


1 Background

1.1 Cross-validation Bayes Factors

The Cross-validation Bayes Factor (CVBF), proposed by Hart and Malloure (2019), is a direct way to apply cross-validation to Bayes factors. Assume that $X_1,\dots,X_n$ are independent and identically distributed variables from a density $f$. Let

$$M_1: f(\cdot)=m_1(\cdot\mid\theta) \qquad \text{and} \qquad M_2: f(\cdot)=m_2(\cdot\mid\psi)$$

be parametric models for $f$, where $\theta$ and $\psi$ belong to Euclidean spaces of possibly different dimensions. Hence, the likelihood functions are $L_1(\theta\mid\mathbf{X})=\prod_{i=1}^{n}m_1(X_i\mid\theta)$ and $L_2(\psi\mid\mathbf{X})=\prod_{i=1}^{n}m_2(X_i\mid\psi)$. The first step in computing the Bayes factor is to split the data into two disjoint parts, which initially and for convenience we take to be

$$\mathbf{X}^{t}(l)=(X_1,\dots,X_{n_t}) \qquad \text{and} \qquad \mathbf{X}^{v}(l)=(X_{n_t+1},\dots,X_n),$$

where $l$ refers to the particular data split and $n_t$ is the training sample size. These two subsets of the data are the training set and the validation set of cross-validation. Now, let $\hat{\theta}(l)$ and $\hat{\psi}(l)$ be the maximum likelihood estimators of $\theta$ and $\psi$, respectively, computed from the training set $\mathbf{X}^{t}(l)$. At the last step, we evaluate the likelihood functions on the validation set, that is, $L_1(\hat{\theta}(l)\mid\mathbf{X}^{v}(l))$ and $L_2(\hat{\psi}(l)\mid\mathbf{X}^{v}(l))$. At this point $m_1(\cdot\mid\hat{\theta}(l))$ and $m_2(\cdot\mid\hat{\psi}(l))$ are two simple models for the underlying distribution of the data, and therefore we have the Cross-validation Bayes factor of $M_2$ relative to $M_1$,

$$\mathrm{CVBF}(l)=\frac{L_2(\hat{\psi}(l)\mid\mathbf{X}^{v}(l))}{L_1(\hat{\theta}(l)\mid\mathbf{X}^{v}(l))}.$$

However, such a cross-validation statistic depends on the particular training sample employed. If we take a geometric mean over $N$ repeats of $\mathrm{CVBF}(l)$, then it no longer depends on the particular training sample, and the CVBF becomes

$$\mathrm{CVBF}=\left(\prod_{l=1}^{N}\mathrm{CVBF}(l)\right)^{1/N},$$

where $N$ used above represents the number of data splits.

For example, consider a model that follows a normal distribution with an unknown mean and a known variance, denoted by $\mu$ and $\sigma^2$, respectively. The maximum likelihood estimator of the mean is just the sample mean. We compute the estimator from the training set, evaluate the likelihood functions on the validation set, and take the ratio of the likelihood functions of the models being compared. Finally, the geometric average over all possible training samples of size $n_t$ is calculated as in the general expression above.
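To make the computation concrete, the following sketch evaluates the log CVBF for this normal-mean example. The null mean of 0, the unit variance, the true mean of 0.3, the training size of 15, and the use of 100 random splits (rather than all possible splits) are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
SIGMA = 1.0                                  # known standard deviation (assumed)
MU0 = 0.0                                    # mean fixed by the null model (assumed)

def log_cvbf_split(x, train_idx):
    """Log CVBF of 'mu free' versus 'mu = MU0' for one training/validation split."""
    tr = x[train_idx]
    va = np.delete(x, train_idx)
    mu_hat = tr.mean()                       # MLE of the mean from the training set
    # log-likelihood ratio evaluated on the validation set only
    return (((va - MU0) ** 2).sum() - ((va - mu_hat) ** 2).sum()) / (2 * SIGMA ** 2)

def log_cvbf(x, n_t, n_splits=100):
    """Geometric average over random splits, computed on the log scale."""
    n = len(x)
    vals = [log_cvbf_split(x, rng.choice(n, size=n_t, replace=False))
            for _ in range(n_splits)]
    return np.mean(vals)

x = rng.normal(0.3, SIGMA, size=100)         # data generated away from the null
print(log_cvbf(x, n_t=15))                   # positive values favor the free-mean model
```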

1.2 Geometric Intrinsic Bayes Factors

For Intrinsic Bayes Factors, we compute a posterior from the non-informative prior distribution and the training sample, and then evaluate the marginal likelihood functions of both models on the validation set,

$$\mathrm{IBF}(l)=\frac{\int L_2(\psi\mid\mathbf{X}^{v}(l))\,\pi_2^{N}(\psi\mid\mathbf{X}^{t}(l))\,d\psi}{\int L_1(\theta\mid\mathbf{X}^{v}(l))\,\pi_1^{N}(\theta\mid\mathbf{X}^{t}(l))\,d\theta}.$$

If we take a geometric mean of the IBFs, we obtain the Geometric Intrinsic Bayes Factor, expressed as below (here we use the same notation as Section 1.1),

$$\mathrm{GIBF}=\left(\prod_{l=1}^{N}\mathrm{IBF}(l)\right)^{1/N},$$

where $N$ represents the number of data splits and $\pi_i^{N}(\cdot\mid\mathbf{X}^{t}(l))$ denotes the posterior of the non-informative prior given the training sample.
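As an illustration of this recipe, the sketch below computes a log GIBF numerically for the normal-mean model with known variance, using a flat non-informative prior and minimal training samples of size one. The unit variance, the point null at zero, and the quadrature settings are illustrative assumptions.

```python
import numpy as np
from scipy import integrate, stats

def log_marginal_m2(x_val, x_train, sigma):
    """Log marginal likelihood of the validation data under the free-mean model,
    using the posterior N(x_train, sigma^2), obtained from a flat prior and a
    single training observation, as the conditional prior."""
    def log_integrand(mu):
        return (stats.norm.logpdf(x_val, loc=mu, scale=sigma).sum()
                + stats.norm.logpdf(mu, loc=x_train, scale=sigma))
    # shift by an approximate maximum to keep the numerical integral stable
    grid = np.linspace(x_train - 6 * sigma, x_train + 6 * sigma, 201)
    shift = max(log_integrand(m) for m in grid)
    val, _ = integrate.quad(lambda m: np.exp(log_integrand(m) - shift),
                            x_train - 8 * sigma, x_train + 8 * sigma)
    return shift + np.log(val)

def log_gibf(x, sigma=1.0):
    """Geometric IBF (on the log scale) of 'mu free' versus 'mu = 0',
    averaging over every minimal training sample of size one."""
    logs = []
    for l in range(len(x)):
        x_train, x_val = x[l], np.delete(x, l)
        log_m1 = stats.norm.logpdf(x_val, loc=0.0, scale=sigma).sum()
        logs.append(log_marginal_m2(x_val, x_train, sigma) - log_m1)
    return np.mean(logs)          # geometric mean taken on the log scale

rng = np.random.default_rng(1)
print(log_gibf(rng.normal(0.0, 1.0, size=50)))   # data generated under the null
```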

1.3 Corrected Intrinsic Bayes Factors

If a prior is proper, it integrates to 1. For example, a normal density integrates to one, while a uniform distribution $\pi^{N}(\theta)=c$ on the whole real line does not integrate to one for any choice of the constant $c$, because $\int_{-\infty}^{\infty}c\,d\theta=\infty$.

Therefore, distributions that do not integrate to 1, such as a uniform distribution on an unbounded range, are called improper priors, and are often considered uninformative priors. Uninformative priors are sensible in hypothesis testing problems provided they take into account the null under consideration. The intrinsic priors of Arithmetic Intrinsic Bayes Factors integrate exactly to 1. However, geometric intrinsic priors usually integrate to a finite positive constant, so they need a correction. As Berger and Pericchi (1996b) proposed, the geometric intrinsic prior is

$$\pi_2^{GI}(\psi)=\pi_2^{N}(\psi)\,\exp\Big\{E^{M_2}_{\psi}\big[\log B^{N}_{12}(X(l))\big]\Big\},\qquad B^{N}_{12}(x(l))=\frac{f_1(x(l)\mid\theta_0)}{\int f_2(x(l)\mid\psi)\,\pi_2^{N}(\psi)\,d\psi},$$

where the points $x(l)$ are the observations in a minimal training sample, $\psi$ is the parameter under analysis, $\theta_0$ is the fixed value of the parameter under the simpler model, and the superscript $N$ stands for non-informative.

The integral of the geometric intrinsic prior is

$$\int\pi_2^{GI}(\psi)\,d\psi=c_0,$$

where $c_0$ is a positive constant. Therefore, the Corrected Geometric prior is

$$\pi_2^{CGI}(\psi)=\frac{\pi_2^{GI}(\psi)}{c_0},$$

which integrates to one.

After this correction of the Geometric prior, we modify the GIBF accordingly so that it becomes a Corrected GIBF.

2 Problem Statement

The popular Intrinsic Bayes Factor only requires a training sample size equal to the number of parameters, which is a small number for an extensive collection of problems. However, it cannot avoid integration over the parameter space, even when an intrinsic prior distribution is available.

On the other hand, the Cross-validation Bayes Factor appears quite simple because it requires neither the choice of a prior distribution nor integration. In this regard, if we can adapt the CVBF so that it behaves as a Bayesian method and establish a bridge between CVBF and IBF (more precisely, GIBF), then we find a double cure: the first is to circumvent the computational difficulties related to the GIBF, and the second is to make the CVBF a truly Bayesian statistic.

In a word, with the help of the GIBF, the CVBF may become a useful approximation if one finds a hidden prior distribution and a reasonable training sample size.

Another crucial question is whether we can use the CVBF for model selection without other concerns. Stability and consistency are also considered in this article.

3 Normal Means Problem

Let us begin with the simplest problem to gain insight into the interplay between GIBF and CVBF. We analyze a hypothesis test with null hypothesis

$H_0$: the data follow a normal distribution with mean $\mu_0$ and variance $\sigma^2$, where $\sigma^2$ is known, against the alternative hypothesis $H_1$: the data follow a normal distribution with mean $\mu\neq\mu_0$ and the same variance $\sigma^2$.

3.1 CVBF in normal means problem

We apply the expression in Section 1.1; for a single split, the log of the CVBF becomes

$$\log\mathrm{CVBF}(l)=\frac{n_v}{2\sigma^2}\Big[(\bar{X}_v-\mu_0)^2-(\bar{X}_v-\bar{X}_t)^2\Big],$$

where $\bar{X}_t$ is the mean of the training sample, $\bar{X}_v$ is the mean of the validation set, and $n_v=n-n_t$ is the validation sample size.
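For completeness, here is the short algebra behind an expression of this form, written in the generic notation above (the sign convention depends on which model is placed in the numerator):

$$
\begin{aligned}
\log \mathrm{CVBF}(l)
&= \sum_{i\in v}\Big[\log N(X_i\mid \bar{X}_t,\sigma^2)-\log N(X_i\mid \mu_0,\sigma^2)\Big]\\
&= \frac{1}{2\sigma^2}\sum_{i\in v}\Big[(X_i-\mu_0)^2-(X_i-\bar{X}_t)^2\Big]
= \frac{n_v}{2\sigma^2}\Big[(\bar{X}_v-\mu_0)^2-(\bar{X}_v-\bar{X}_t)^2\Big],
\end{aligned}
$$

using the identity $\sum_{i\in v}(X_i-a)^2=\sum_{i\in v}(X_i-\bar{X}_v)^2+n_v(\bar{X}_v-a)^2$ for any constant $a$.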

In general, the log of the CVBF can be expressed in distribution as a combination of a central and a non-central Chi-squared random variable,

(1)

where $\chi^2_1$ stands for a Chi-squared distribution with 1 degree of freedom and $\chi'^2_1(\lambda)$ is a non-central Chi-squared distribution with 1 degree of freedom and non-centrality parameter $\lambda$.

The geometric average of the CVBFs is taken on the log scale,

$$\log\mathrm{CVBF}=\frac{1}{N}\sum_{l=1}^{N}\log\mathrm{CVBF}(l),$$

where $N$ is the number of splits, which in this case includes all possible training sets of size $n_t$; the resulting sum can be written in terms of $\chi^2_K$, a Chi-squared distribution with $K$ degrees of freedom, and $\chi'^2_K(\lambda)$, a non-central Chi-squared distribution with $K$ degrees of freedom and non-centrality $\lambda$.

Hence, under the alternative model, we have

and the variance of the logarithm of the CVBF is

3.2 Corrected GIBF in normal means problem

We also apply the formula in Section 1.2 to express the GIBF for the normal means problem. For simplicity, we use a uniform distribution as the non-informative prior distribution, denoted $\pi^{N}(\mu)\propto 1$. The IBF can be expressed as

where $\bar{X}_t$ is the mean of the training sample and $\bar{X}(l)$ is the mean of the data for split $l$.

A training sample size of 1 is the minimal size because it coincides with the number of parameters in the model. The correction constant for the GIBF that we computed is . After this correction, we have the corrected IBF

where .

Therefore, the corrected GIBF can be expressed as

Then, under model 1, the expectation of the logarithm of the corrected IBF is

and the variance is

We denote the training sample size here by $m$; a GIBF with a minimal training sample size performs quite well in most scenarios, some of which will be presented in the following section.

3.3 Bridge Rule and consistency analysis

When the null hypothesis is correct, in order for the GIBF and the CVBF to be approximately equivalent, a training sample size for the CVBF must be assigned. We propose that the training sample size for the CVBF should be

where $n$ is the sample size. Under this rule, we pass the consistency of the GIBF under the null model on to the CVBF. It is worth mentioning that the Bridge Rule, at least on the domain

considered, can be approximated by linear regression. In Figure 1, the rule

is fitted by a linear equation. Therefore, the bridge rule is approximately a straight line with a slope of 0.152.

Figure 1: Linear Regression

The black line is our bridge rule on the domain , while the red line shows the linear equation that approximates the bridge rule function.

On the other hand, this raises a question: do we obtain consistency for CVBF and GIBF under the alternative?

By equation (1), under the rule, the rate for the CVBF changes. Under ,

where has a standard Chi-square distribution.

at a rate , where is a positive constant.

When it comes to the corrected GIBF,

at a rate .

On the other hand, we analyze the scenario under ,

at a rate n.

In contrast,

Since that term dominates the inequality, the rate is again n.

If we ignore constant terms, such as , we can express the asymptotic expectations of the CVBF and of the corrected IBF as

respectively, as $n$ goes to infinity. It is easy to see that both expectations diverge to the correct limits, under the null as well as under the alternative, as $n$ goes to infinity. This means that, when the Bridge Rule is used to choose the training sample size of the Cross-Validation Bayes Factor, the CVBF and the corrected Geometric Intrinsic Bayes Factor achieve consistency under both the null hypothesis and the alternative.

Given the above, we conclude that, under the Rule, the GIBF and the CVBF are consistent under both the null model and the alternative model. Furthermore, they tend to the correct model at the same rate of convergence, quite a promising result.

It can be argued that the constant needed to correct the GIBF may be difficult to compute in complex problems. However, even if we do not calculate the constant exactly for each problem and instead use the correction derived for Normal problems, both methods, CVBF and GIBF, are still expected to be asymptotically consistent at the same rate, since the correction factor is just a fixed constant, bounded away from both zero and infinity.

3.4 Performances and simulations

Under the null hypothesis, we generate data of size 100 from a normal distribution . Under the alternative model, we generate data of the same size from a normal distribution

. We analyze Type I and Type II errors under different training sample sizes (from 5 to 95 in steps of 5). We then use the Receiver Operating Characteristic (ROC) curve to evaluate the methods via the Area Under the Curve (AUC).

Figure 2: ROC Curve for CVBF

The blue area is the area under the curve (AUC) of the CVBF; the thresholds here are the training sample sizes, which vary from 5 to 95 in steps of 5 when the sample size is 100.

Figure 3: ROC curve for GIBF

The blue area is the area under the curve (AUC) of the GIBF; the thresholds here are the training sample sizes, which vary from 5 to 95 in steps of 5 when the sample size is 100.

Figure 2 and Figure 3 show that the area under the curve for the CVBF is 0.7125, while the AUC for the GIBF is 0.9960. An area of 1 represents a perfect test, while an area of 0.5 represents a worthless test. This indicates that the GIBF is an excellent method, better than the CVBF, and one for which no training sample size needs to be specified. However, the CVBF is still an attractive, simple method, although it needs a careful assessment of its training sets. Our proposal, which we call the bridge rule, seems to be sensible.
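A minimal Monte Carlo sketch of how such an ROC curve can be traced for the CVBF is given below, treating the training sample size as the threshold as in Figures 2 and 3. The alternative mean of 0.5, the decision rule "log CVBF > 0", the 200 replicas, and the 30 random splits per evaluation are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
SIGMA, N, REPS = 1.0, 100, 200     # known sd, sample size, Monte Carlo replicas

def log_cvbf(x, n_t, n_splits=30):
    """Log geometric-average CVBF for 'mu free' vs 'mu = 0', normal data with
    known variance, averaging over random training/validation splits."""
    logs = []
    for _ in range(n_splits):
        idx = rng.permutation(len(x))
        tr, va = x[idx[:n_t]], x[idx[n_t:]]
        mu_hat = tr.mean()
        logs.append(((va - 0.0) ** 2 - (va - mu_hat) ** 2).sum() / (2 * SIGMA ** 2))
    return np.mean(logs)

fpr, tpr = [], []
for n_t in range(5, 100, 5):                     # threshold = training sample size
    null_dec = [log_cvbf(rng.normal(0.0, SIGMA, N), n_t) > 0 for _ in range(REPS)]
    alt_dec = [log_cvbf(rng.normal(0.5, SIGMA, N), n_t) > 0 for _ in range(REPS)]
    fpr.append(np.mean(null_dec))                # type I error rate
    tpr.append(np.mean(alt_dec))                 # power = 1 - type II error rate

order = np.argsort(fpr)
# rough trapezoidal AUC over the observed range of false positive rates
auc = np.trapz(np.array(tpr)[order], np.array(fpr)[order])
print(f"approximate AUC for CVBF: {auc:.3f}")
```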

Now we vary the sample size but fix the training sample size of the GIBF at 1. According to our Rule, the training sample size for the CVBF then depends on the training sample size of the GIBF. Under the null hypothesis, the data points are generated from a normal distribution . In the other scenario, under the alternative model, the data points are drawn from a normal distribution . The sample size varies from 5 to 500, in steps of 5.

Figure 4: Consistency under the Null in One-parameter Normal Case

In this figure, the red area is the range from the first quartile to the third quartile over 1000 simulations of the log of the GIBF, while the blue area is the range from the first quartile to the third quartile over 1000 simulations of the log of the CVBF. Moreover, the white line and the grey line are the means over the 1000 simulations of the log of the CVBF and of the GIBF, respectively. The yellow line in the left panel shows the theoretical expectation for the CVBF; the green line in the right panel shows the expectation of the GIBF.

Figure 5: Consistency under the Alternative in One-parameter Normal Case

In this figure, which relates to the alternative hypothesis model, the red area is the range from the first quartile to the third quartile over 1000 simulations of the log of the GIBF, while the blue area is the range from the first quartile to the third quartile over 1000 simulations of the log of the CVBF. Moreover, the white line and the grey line are the means over the 1000 simulations of the log of the CVBF and of the GIBF, respectively. The yellow line shows the theoretical expectation for the CVBF; the green line shows the expectation of the GIBF.

From Figure 4 and Figure 5, we observe that the CVBF and the GIBF are consistent under the null hypothesis, and the same happens under the alternative model. Furthermore, the theoretical expectations coincide with the simulations. We use the GIBF as a guide for choosing the training sets of the CVBF and take advantage of the simplicity of the CVBF for computing Bayes factors. The only shortcoming is that we sacrifice variability: the variance of the CVBF is larger than that of the GIBF, which we can also conclude from Section 3.1 and Section 3.2 if we vary , and take , , , and when we compute the expectations and variances. Fortunately, as Tukey and McLaughlin (1963) suggested, we can overcome this large variability by trimming the two ends of the ordered sequence of Bayes factor values in our simulations, which reduces the spread of the CVBF considerably.
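A minimal sketch of this trimming idea is below, assuming the simulated log-CVBF values are stored in an array; the heavy-tailed stand-in data and the 10% trimming proportion are illustrative choices, not the paper's.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# stand-in for 1000 simulated values of the log CVBF (heavy-tailed on purpose)
log_cvbf_reps = rng.standard_t(df=3, size=1000)

plain_mean = log_cvbf_reps.mean()
trimmed = stats.trim_mean(log_cvbf_reps, proportiontocut=0.10)  # drop 10% in each tail
print(f"plain mean: {plain_mean:.3f}, 10%-trimmed mean: {trimmed:.3f}")
```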

4 Exponential case

The probability density function of an exponential distribution is

$f(x\mid\lambda)=\lambda e^{-\lambda x}$, $x>0$, where $\lambda>0$. The null hypothesis is $H_0:\lambda=\lambda_0$ and the alternative is $H_1:\lambda\neq\lambda_0$.

4.1 IBF in exponential

The prior we use for the IBF here is the Jeffreys prior, $\pi^{N}(\lambda)\propto 1/\lambda$. Moreover, for simplicity, the training sample size of the IBF equals one, the number of parameters.

Then the Intrinsic Bayes Factor is going to be

$$\mathrm{IBF}(l)=\frac{x_l\,\Gamma(n)}{\big[(n-1)\bar{x}_{(l)}+x_l\big]^{\,n}\;\lambda_0^{\,n-1}\,e^{-\lambda_0(n-1)\bar{x}_{(l)}}},$$

where $x_l$ is one data point (the training sample), $\bar{x}_{(l)}$ is the mean of the remaining data points, and $\Gamma$ is the Gamma function.

The correction factor for the GIBF is , where $\psi(1)$ denotes the digamma function evaluated at 1. Hence, the log of the corrected IBF is

where $\Gamma$ is the Gamma function and , ,

are Gamma distributions.

After taking the expectation of each term, the expectation of the log of the GIBF is

where $\psi(n)$ is the digamma function evaluated at $n$.

4.2 CVBF in exponential

The Cross-validation Bayes Factor in the exponential case, for one split, is

$$\mathrm{CVBF}(l)=\Big(\frac{\hat{\lambda}}{\lambda_0}\Big)^{n_v}\exp\big\{-(\hat{\lambda}-\lambda_0)\,n_v\bar{X}_v\big\},\qquad \hat{\lambda}=\frac{1}{\bar{X}_t},$$

and the log of the CVBF is

$$\log\mathrm{CVBF}(l)=n_v\log\frac{\hat{\lambda}}{\lambda_0}-(\hat{\lambda}-\lambda_0)\,n_v\bar{X}_v,$$

where, in the distributional analysis, $\lambda\sum_{i\in t}X_i$ follows a Gamma distribution and the ratio $\bar{X}_v/\bar{X}_t$ follows, up to a scale factor, a Beta distribution of the second kind.

We then obtain the expectation of the log of the CVBF.
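The sketch below evaluates this log CVBF numerically for the exponential test; the null rate of 1, the true rate of 1.5, the training size of 15, and the 50 random splits are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
LAMBDA0 = 1.0                           # rate fixed by the null hypothesis (assumed)

def log_cvbf_exp(x, n_t, n_splits=50):
    """Log geometric-average CVBF for Exp(lambda) free vs Exp(LAMBDA0), with the
    rate estimated on the training part of each random split."""
    logs = []
    for _ in range(n_splits):
        idx = rng.permutation(len(x))
        tr, va = x[idx[:n_t]], x[idx[n_t:]]
        lam_hat = 1.0 / tr.mean()                       # MLE of the rate
        logs.append(len(va) * np.log(lam_hat / LAMBDA0)
                    - (lam_hat - LAMBDA0) * va.sum())
    return np.mean(logs)

x = rng.exponential(scale=1.0 / 1.5, size=100)          # data with true rate 1.5
print(log_cvbf_exp(x, n_t=15))                          # positive favors the free-rate model
```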

4.3 Approximations and Bridge Rule

We need some approximations. One property of the digamma function is

$$\psi(n)=-\gamma+\sum_{k=1}^{n-1}\frac{1}{k},$$

where $\gamma$ is the Euler–Mascheroni constant; following Johnson et al. (1970), we have the approximation $\psi(n)\approx\log n-\frac{1}{2n}$.

And also, for ,

After obtaining these approximations, we can work out the Bridge Rule for the training sample size of the CVBF in the exponential case. Not surprisingly, under the null hypothesis the CVBF approximates the GIBF when the bridge rule is used with a large sample size.

4.4 Consistency

Under the null hypothesis, the expectations of the log of the IBF and of the log of the CVBF are

and

respectively.

They both go to at a rate of .

Under the alternative, the expectations of the log-IBF and the log-CVBF both go to at a rate of . Based on the two situations above, we can conclude that the CVBF and the GIBF converge for considerably large $n$ under the Bridge Rule.

4.5 Simulations in exponential

To set up a scenario, we consider a hypothesis test of a null model against an alternative . Fixing the size of the data at 100, we vary the parameter to see the tendencies of the CVBF and the GIBF. We observe from Figure 6 that, under the Rule, the CVBF coincides with the GIBF up to tiny gaps. Note that the CVBF is only slightly sensitive to the parameter, in the sense that, roughly, the CVBF favors the null hypothesis on the domain while the GIBF favors the null model on . The interval over which the CVBF selects the null is slightly narrower than that of the GIBF.

Figure 6: The Bridge Rule Works under Different Values of the Parameter

In this figure, we generate data of size 100 and calculate the log of the CVBF and the log of the GIBF with 100 replicas while varying the parameter values. The red line and the black line refer to the means of all computed log-CVBFs and log-GIBFs, respectively.

5 Two-parameter in normal means problem

Having worked out the one-parameter case, we now look into a two-parameter normal means problem to check consistency.

Suppose we have a hypothesis test in which the first half of the data points have mean and the second half of the data points have mean . The system can be expressed as

where , and the variance is known.

The hypothesis test is

The priors for the two means are both identical uniform distributions. In this regard, the corrected IBF is going to be

where , ,…, are non-central Chi-squared distributions with 1 degree of freedom and non-centrality parameters and , respectively, and $m$ is the training sample size, here $m=2$.

The CVBF is going to be

where and are standard Chi-squared distributions and , , are non-central Chi-squared distributions with 1 degree of freedom and non-centrality parameters , and , respectively.

The expectation of the log of the corrected IBF (with $m$ set to 2) is going to be

The expectation of the log of the CVBF is

and when we apply the bridge rule, it becomes

We introduce an updated rule for the training sample size of the CVBF, namely

where is the number of parameters. In this case, the rule is , which forces the IBF and the CVBF to have equal expectations when the null hypothesis model is valid. The IBF and the CVBF both go to at a rate of as goes to under the null model; they go to at a rate of as goes to under the alternative. Hence, we have verified that they are consistent in this setting.
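As a rough illustration of the CVBF in this two-parameter setting, the sketch below assumes, purely for concreteness, that the null fixes both group means at zero and that the common variance is known and equal to one; the training size per group and the number of random splits are also arbitrary, so this is not a reproduction of the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(4)
SIGMA = 1.0                               # known common standard deviation (assumed)

def log_cvbf_two_means(x1, x2, n_t, n_splits=50):
    """Illustrative log CVBF for two group means estimated freely versus both
    means fixed at zero (an assumed null). Each group contributes n_t training
    points and the remainder forms the validation set."""
    logs = []
    for _ in range(n_splits):
        i1, i2 = rng.permutation(len(x1)), rng.permutation(len(x2))
        tr1, va1 = x1[i1[:n_t]], x1[i1[n_t:]]
        tr2, va2 = x2[i2[:n_t]], x2[i2[n_t:]]
        free = -(((va1 - tr1.mean()) ** 2).sum() + ((va2 - tr2.mean()) ** 2).sum())
        null = -((va1 ** 2).sum() + (va2 ** 2).sum())
        logs.append((free - null) / (2 * SIGMA ** 2))
    return np.mean(logs)

x1 = rng.normal(0.0, SIGMA, 50)           # first half of the data
x2 = rng.normal(0.0, SIGMA, 50)           # second half of the data
print(log_cvbf_two_means(x1, x2, n_t=10))
```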

6 Unknown-variance in normal means

We now analyze a hypothesis test with a null hypothesis under which the data follow a normal distribution and an alternative under which they follow a normal distribution , where the mean under the alternative is unknown and different from the null mean, and the variance is unknown. Notice that this test is quite different from the one in Section 3.1. For convenience and simplicity, when computing the GIBF we use as the prior distribution for the null model and , that is, the modified Jeffreys prior of Berger and Pericchi (1996b), as the prior distribution for the alternative model.

6.1 Expressions of GIBF and CVBF

In this case, we use the number of parameters, which is 2, as the training sample size. Hence, the IBF is

where and are the two data points that form the training sample.

Furthermore, after taking the geometric mean, the GIBF is going to be

The expression for the log of the IBF is

where , , and are Chi-squared distributions with degrees of freedom and non-centrality , degrees of freedom and non-centrality , degrees of freedom and non-centrality , and degrees of freedom and non-centrality , respectively.

The expectation can only be evaluated as an infinite series (see Berger and Pericchi (1996a)), but numerical solutions are straightforward. As Berger and Pericchi (1996b) noted, one can simulate the expectation using the MLEs from the original data as the parameter values.

Here we do not correct the IBF, since the expectation is an infinite series. Fortunately, the correction factor is just a constant; in this regard, we can analyze consistency with or without the correction, because the resulting error is minimal.

Similarly, the CVBF encounters difficulties. As in the other cases, one uses MLEs to estimate the parameters; in this case there are two, the mean and the variance. The MLEs are simple to compute from the training set: $\hat{\mu}=\bar{X}_t$ and $\hat{\sigma}^2=\frac{1}{n_t}\sum_{i\in t}(X_i-\bar{X}_t)^2$.

After partitioning the data into a training set and a validation set, the CVBF becomes

where $\bar{X}_t$ is the mean of the training set.

Then the log of CVBF is

where

is a random variable with a Chi-squared distribution.

However, the expectation will be an infinite series, which includes a confluent hypergeometric function of the first kind.
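A numerical sketch of this CVBF is below; the null mean of zero, the training size of 20, the 50 random splits, and the data-generating values are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
MU0 = 0.0                                  # assumed null mean

def log_cvbf_unknown_var(x, n_t, n_splits=50):
    """Illustrative log CVBF for N(mu, sigma^2), both unknown, versus
    N(MU0, sigma0^2) with only the variance unknown. MLEs of the mean and
    variance come from the training part of each random split."""
    logs = []
    for _ in range(n_splits):
        idx = rng.permutation(len(x))
        tr, va = x[idx[:n_t]], x[idx[n_t:]]
        mu_hat, var1_hat = tr.mean(), ((tr - tr.mean()) ** 2).mean()
        var0_hat = ((tr - MU0) ** 2).mean()            # null-model MLE of the variance
        l1 = stats.norm.logpdf(va, loc=mu_hat, scale=np.sqrt(var1_hat)).sum()
        l0 = stats.norm.logpdf(va, loc=MU0, scale=np.sqrt(var0_hat)).sum()
        logs.append(l1 - l0)
    return np.mean(logs)

x = rng.normal(0.8, 2.0, size=100)          # data generated away from the null mean
print(log_cvbf_unknown_var(x, n_t=20))      # positive favors the alternative
```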

Luckily, we can still work out the expectations of IBF and CVBF under the null hypothesis model. When the null is true, the expectation of log of IBF becomes

and the expectation of log of CVBF becomes

in which we use the rule so that the two are consistent when $n$ is large. In the next section, we analyze consistency under the alternative by simulation.

6.2 Simulations

Since the n