1 Introduction
One-sample hypothesis testing is a primary topic in any introductory statistics course. It involves the selection of a reference value for the (unknown) population mean $\mu$. More specifically, let $x = (x_1, \ldots, x_n)$ be an independent random sample taken from $N(\mu, \sigma^2)$, where $\sigma^2$ is the population variance. The interest is to test the hypothesis $H_0: \mu = \mu_0$, where $\mu_0$ is a given real number. Within the classical frequentist framework, if $\sigma^2$ is known, then the $z$ test is commonly used for testing $H_0$ against the two-sided alternative $H_1: \mu \neq \mu_0$. The test statistic in this case is
$$z = \frac{\sqrt{n}(\bar{x} - \mu_0)}{\sigma},$$
where $\bar{x} = n^{-1}\sum_{i=1}^{n} x_i$ is the sample mean. For a significance level $\alpha$, the critical value is defined to be the $1 - \alpha/2$ quantile $z_{1-\alpha/2}$ of the standard normal distribution. Also, the $p$-value is equal to $P(|Z| \geq |z|)$, where $Z$ has the standard normal distribution. Then, $H_0$ is rejected if $|z| > z_{1-\alpha/2}$ or the $p$-value is less than $\alpha$. On the other hand, if $\sigma^2$ is unknown, then the test statistic is
$$t = \frac{\sqrt{n}(\bar{x} - \mu_0)}{s},$$
where
$$s = \left(\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^{1/2}$$
is the sample standard deviation. For a test with significance level
$\alpha$, let $t_{1-\alpha/2, n-1}$ be the $1-\alpha/2$ quantile of the $t$ distribution with $n-1$ degrees of freedom. The two-sided $p$-value is equal to $P(|T| \geq |t|)$, where $T$ has the $t$ distribution with $n-1$ degrees of freedom. Similar to the $z$ test, $H_0$ is rejected if $|t| > t_{1-\alpha/2, n-1}$ or the $p$-value is less than $\alpha$.

While the above approach to hypothesis testing is well known and stable, it is difficult to find an alternative Bayesian counterpart in the literature. An exception is the work of Rouder, Speckman, Sun, and Morey (2009), who proposed a Bayesian $t$ test, where $\sigma^2$ is unknown, using the Bayes factor (the ratio of the marginal densities of the two models; Kass and Raftery, 1995). They placed the Jeffreys prior on $\sigma^2$ and a Cauchy prior on the standardized effect size $\mu/\sigma$. They provided a web-based program (cf. pcl.missouri.edu) to facilitate the use of their test. Remarkably, the authors mentioned detailed criticisms of using $p$-values in hypothesis testing. For example, they indicated that $p$-values do not allow researchers to state evidence for the null hypothesis, and that they overstate the evidence against the null hypothesis. Although the $p$-value converges to zero as the sample size increases when the null hypothesis is false, which is a desirable feature, $p$-values are uniformly distributed between 0 and 1 when the null is true. This distribution holds regardless of the sample size, which means that increasing the sample size in this case will not help in gaining evidence for the null hypothesis. In fact, this reflects Fisher's insight that the null hypothesis can only be rejected and never accepted. Other relevant work, but in the two-sample problem setup, includes Gönen, Johnson, Lu and Westfall (2005) and Wang and Lui (2016). For more recent articles about the limitations of using
$p$-values in hypothesis testing, we refer the reader to Evans (2015), Wasserstein and Lazar (2016), and references therein.

Unlike the previous work, the hyperparameters of the prior in the proposed Bayesian approach are elicited and then checked for prior-data conflict and for bias. For this, two elicitation algorithms developed by Evans (2015, 2018) are considered. In fact, the success of any Bayesian approach depends significantly on a proper selection of the hyperparameters of the prior. Part of the elicitation process involves checking the elicited prior for prior-data conflict and bias (see Section 2). Then the concentration of the distribution of the Kullback-Leibler (KL) divergence between the prior and the model of interest is compared to that between the posterior and the model. If the posterior is more concentrated about the hypothesized distribution than the prior, then this is evidence in favor of the null hypothesis, and if the posterior is less concentrated, then this is evidence against the null hypothesis. This comparison is made via a relative belief ratio, which measures the evidence in the observed data for or against the null. A measure of the strength of this evidence is also provided. So, the methodology is based on a direct measure of statistical evidence. We point out that relative belief ratios have recently been used in problems that involve goodness-of-fit testing and model checking. See, for example, Al-Labadi (2018), Al-Labadi and Evans (2018), Al-Labadi, Zeynep and Evans (2017, 2018) and Evans and Tomal (2018).
The proposed method brings many advantages to the problem of hypothesis testing. Besides its simplicity, and unlike the classical approach, the new approach possesses attractive and desirable features, such as giving evidence in favor of the null hypothesis. Also, checking the prior for bias and prior-data conflict permits avoiding several undesirable paradoxes, such as Lindley's paradox, that may be encountered by standard Bayesian methods based, for instance, on the Bayes factor (Evans, 2015).
The remainder of this paper is organized as follows. A general discussion about the relative belief ratio is given in Section 2. In Section 3, a Bayesian alternative to the one-sample $z$ test is developed, including the elicitation of the prior, checks for prior-data conflict and bias, and a computational algorithm to calculate the relative belief ratio for the implementation of the proposed test. In Section 4, the approach is extended to a Bayesian alternative to the one-sample $t$ test, where the variance is unknown. The performance of the proposed tests is then illustrated via simulated and real data examples, and some concluding remarks close the paper. All technical proofs are included in the supplementary material.
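Before turning to the Bayesian development, the classical $z$ and $t$ statistics reviewed above can be sketched in a few lines; the data and the reference value below are hypothetical, and the two-sided $p$-value for the $z$ test uses the standard normal cdf:

```python
import math
from statistics import NormalDist, mean, stdev

def z_test(xs, mu0, sigma):
    """Two-sided one-sample z test with known sigma; returns (z, p_value)."""
    z = math.sqrt(len(xs)) * (mean(xs) - mu0) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

def t_statistic(xs, mu0):
    """One-sample t statistic, with s the sample standard deviation."""
    return math.sqrt(len(xs)) * (mean(xs) - mu0) / stdev(xs)

data = [4.1, 5.2, 4.8, 5.5, 4.9]         # hypothetical measurements
z, p = z_test(data, mu0=5.0, sigma=0.5)  # z is about -0.447
t = t_statistic(data, mu0=5.0)
```

The $t$ $p$-value would be obtained from the $t$ distribution with $n-1$ degrees of freedom, available in standard statistical libraries.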
2 Inferences Using Relative Belief
Suppose we have a statistical model $\{f_\theta : \theta \in \Theta\}$ given by density functions $f_\theta$ (with respect to some measure), where $\theta$ is an unknown parameter that belongs to the parameter space $\Theta$. Let $\pi$ be the prior distribution of $\theta$. After observing the data $x$, by Bayes' theorem, the posterior distribution of $\theta$ is given by the density
$$\pi(\theta \,|\, x) = \frac{f_\theta(x)\,\pi(\theta)}{m(x)},$$
where
$$m(x) = \int_{\Theta} f_\theta(x)\,\pi(\theta)\,d\theta$$
is the prior predictive density of the data.
Suppose that the interest is to make inference about an arbitrary parameter $\psi = \Psi(\theta)$. Let $\Pi_\Psi$ denote the prior measure of $\psi$ with density $\pi_\Psi$, and let the corresponding posterior measure and density of $\psi$ be $\Pi_\Psi(\cdot \,|\, x)$ and $\pi_\Psi(\cdot \,|\, x)$, respectively. The relative belief ratio for a hypothesized value $\psi_0$ of $\psi$ is defined by $RB_\Psi(\psi_0 \,|\, x) = \lim_{\delta \to 0} \Pi_\Psi(N_\delta(\psi_0) \,|\, x)/\Pi_\Psi(N_\delta(\psi_0))$, where $N_\delta(\psi_0)$ is a sequence of neighbourhoods of $\psi_0$ converging nicely (see, for example, Rudin (1974)) to $\{\psi_0\}$ as $\delta \to 0$. When $\pi_\Psi$ and $\pi_\Psi(\cdot \,|\, x)$ are continuous at $\psi_0$,
$$RB_\Psi(\psi_0 \,|\, x) = \frac{\pi_\Psi(\psi_0 \,|\, x)}{\pi_\Psi(\psi_0)},$$
the ratio of the posterior density to the prior density at $\psi_0$. That is, $RB_\Psi(\psi_0 \,|\, x)$ measures how beliefs that $\psi_0$ is the true value have changed from a priori to a posteriori. Baskurt and Evans (2013) proved that
$$RB_\Psi(\psi_0 \,|\, x) = \frac{m_T(T(x) \,|\, \psi_0)}{m_T(T(x))}, \qquad (1)$$
where $T$ is a minimal sufficient statistic of the model, $m_T$ is the prior predictive density of $T$, and $m_T(\cdot \,|\, \psi_0)$ is this density conditional on $\Psi(\theta) = \psi_0$. The previous authors referred to (1) as the Savage-Dickey ratio. It is to be noted that a relative belief ratio is similar to a Bayes factor (Kass and Raftery, 1995), as both are measures of evidence, but the latter measures it via the change in an odds ratio. A discussion about the relationship between relative belief ratios and Bayes factors is detailed in Baskurt and Evans (2013). More specifically, when a Bayes factor is defined via a limit in the continuous case, the limiting value is the corresponding relative belief ratio.
By a basic principle of evidence, $RB_\Psi(\psi_0 \,|\, x) > 1$ means that the data led to an increase in the probability that $\psi_0$ is correct, and so there is evidence in favour of $\psi_0$, while $RB_\Psi(\psi_0 \,|\, x) < 1$ means that the data led to a decrease in the probability that $\psi_0$ is correct, and so there is evidence against $\psi_0$. Clearly, when $RB_\Psi(\psi_0 \,|\, x) = 1$, there is no evidence either way. It is also important to calibrate whether this is strong or weak evidence for or against $\psi_0$. As suggested in Evans (2015), a useful calibration of $RB_\Psi(\psi_0 \,|\, x)$ is obtained by computing the tail probability
$$\Pi_\Psi\left( RB_\Psi(\psi \,|\, x) \leq RB_\Psi(\psi_0 \,|\, x) \,\middle|\, x \right). \qquad (2)$$
One way to view (2) is as the posterior probability that the true value of $\psi$ has a relative belief ratio no greater than that of the hypothesized value $\psi_0$. When there is evidence against $\psi_0$, a small value for (2) indicates a large posterior probability that the true value has a relative belief ratio greater than $RB_\Psi(\psi_0 \,|\, x)$, and so there is strong evidence against $\psi_0$. When there is evidence in favour of $\psi_0$, a large value for (2) indicates a small posterior probability that the true value has a relative belief ratio greater than $RB_\Psi(\psi_0 \,|\, x)$; therefore, there is strong evidence in favour of $\psi_0$, while a small value of (2) only indicates weak evidence in favour of $\psi_0$.

One of the key concerns with Bayesian inference methods is that the prior can bias the analysis. Following Evans (2015), let $M(\cdot \,|\, \psi_0)$ denote the conditional prior predictive distribution of the data given that $\Psi(\theta) = \psi_0$, so $M(A \,|\, \psi_0)$ is the conditional prior probability that the data lie in the set $A$. The bias against $H_0: \Psi(\theta) = \psi_0$ can be measured by computing
$$M\left( RB_\Psi(\psi_0 \,|\, X) \leq 1 \,\middle|\, \psi_0 \right), \qquad (3)$$
and this is the prior probability that evidence will be obtained against $\psi_0$ when it is true. If the bias against $\psi_0$ is large, then subsequently reporting, after seeing the data, that there is evidence against $\psi_0$ is not convincing. On the other hand, the bias in favor of $\psi_0$ is given by
$$M\left( RB_\Psi(\psi_0 \,|\, X) \geq 1 \,\middle|\, \psi_* \right) \qquad (4)$$
for values $\psi_* \neq \psi_0$ such that the difference between $\psi_*$ and $\psi_0$ represents the smallest difference of practical importance; note that this tends to decrease as $\psi_*$ moves farther away from $\psi_0$. When the bias in favor is large, then subsequently reporting, after seeing the data, that there is evidence in favor of $\psi_0$ is not convincing.
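As a small numerical illustration of the relative belief ratio and its strength (the prior, posterior and hypothesized value below are hypothetical, not taken from the paper), the ratio at $\psi_0$ is the posterior density over the prior density, and the strength (2) can be estimated by Monte Carlo from posterior draws:

```python
import random
from statistics import NormalDist

# Hypothetical prior and posterior for a parameter psi (illustrative numbers).
prior = NormalDist(mu=0.0, sigma=1.0)
post = NormalDist(mu=0.5, sigma=0.5)

psi0 = 0.0
rb = post.pdf(psi0) / prior.pdf(psi0)  # relative belief ratio at psi0

# Strength (2): posterior probability that RB(psi|x) <= RB(psi0|x).
random.seed(1)
draws = [random.gauss(0.5, 0.5) for _ in range(20000)]
strength = sum(post.pdf(v) / prior.pdf(v) <= rb for v in draws) / len(draws)
# Here rb > 1, so the data provide (moderate) evidence in favour of psi0.
```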
Another concern regarding priors is to measure the compatibility between the prior and the data. A chosen prior may be incorrect by being strongly contradicted by the data (Evans, 2015). A possible contradiction between the data and the prior is referred to as a prior-data conflict. In principle, if the prior primarily places its mass in a region of the parameter space where the data suggest the true value does not lie, then there is a prior-data conflict (Evans and Moshonov, 2006). That is, prior-data conflict will occur whenever there is only a tiny overlap between the effective support regions of the model and the prior. In such a situation, we must be concerned about what the effect of the prior is on the analysis (Evans, 2015). Methods for checking the prior in this sense are developed in Evans and Moshonov (2006). See also Nott, Xueou, Evans, and Engler (2016) and Nott, Seah, Al-Labadi, Evans, Ng and Englert (2019). The basic method for checking the prior involves computing the probability
$$M_T\left( m_T(t) \leq m_T(T(x)) \right), \qquad (5)$$
where $T$ is a minimal sufficient statistic of the model and $M_T$ is the prior predictive probability measure of $T$ with density $m_T$. The value of (5) simply serves to locate the observed value $T(x)$ in its prior distribution. If (5) is small, then $T(x)$ lies in a region of low prior probability, such as a tail or antimode, which indicates a conflict. The consistency of this check follows from Evans and Jang (2011), where it is proven that, under quite general conditions, (5) converges to
$$\Pi\left( \pi(\theta) \leq \pi(\theta_{true}) \right) \qquad (6)$$
as the amount of data increases, where $\theta_{true}$ is the true value of the parameter. If (6) is small, then $\theta_{true}$ lies in a region of low prior probability, which implies that the prior is not appropriate.
3 A Bayesian Alternative to the One-Sample z-Test
3.1 The Approach
Let $x = (x_1, \ldots, x_n)$ be an independent random sample from $N(\mu, \sigma^2)$, where $\sigma^2$ is known. The goal is to test the hypothesis $H_0: \mu = \mu_0$, where $\mu_0$ is a given real number. The approach here is Bayesian. First we construct a prior on $\mu$. Let the prior of $\mu$ be $N(\mu_0, \tau_0^2)$, centered at the hypothesized value, where the hyperparameter $\tau_0$ is selected through the elicitation algorithm covered in Section 3.2. Thus, the posterior distribution of $\mu$ given $x$ is $N(\mu_x, \sigma_x^2)$, where
$$\mu_x = \left(\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}\right)^{-1}\left(\frac{\mu_0}{\tau_0^2} + \frac{n\bar{x}}{\sigma^2}\right), \qquad \sigma_x^2 = \left(\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}\right)^{-1}. \qquad (7)$$
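The conjugate update in (7) is easy to compute; the sketch below uses illustrative numbers (the data summary and hyperparameters are hypothetical):

```python
def normal_posterior(xbar, n, sigma, mu0, tau0):
    """Conjugate update (7): N(mu0, tau0^2) prior with N(mu, sigma^2) data,
    sigma known; returns the posterior mean and variance of mu."""
    prec = 1 / tau0**2 + n / sigma**2   # posterior precision
    var = 1 / prec
    mean = var * (mu0 / tau0**2 + n * xbar / sigma**2)
    return mean, var

mu_x, sig_x2 = normal_posterior(xbar=4.9, n=5, sigma=0.5, mu0=5.0, tau0=1.0)
```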
To proceed with the test using the relative belief ratio, there are two possible approaches. The first one is based on a direct computation of the relative belief ratio and its strength. This approach was initiated in Baskurt and Evans (2013), with $\psi = \mu$ and $\psi_0 = \mu_0$, when discussing the Jeffreys-Lindley paradox. To find $RB(\mu_0 \,|\, x)$, notice that the minimal sufficient statistic for $\mu$ is $\bar{x}$. Since $\bar{x} \,|\, \mu \sim N(\mu, \sigma^2/n)$, with $\mu \sim N(\mu_0, \tau_0^2)$, it follows that the prior predictive distribution of $\bar{x}$ is $N(\mu_0, \tau_0^2 + \sigma^2/n)$. Thus,
$$RB(\mu_0 \,|\, x) = \frac{m(\bar{x} \,|\, \mu_0)}{m(\bar{x})} = \left(1 + \frac{n\tau_0^2}{\sigma^2}\right)^{1/2} \exp\left\{ -\frac{(\bar{x} - \mu_0)^2}{2}\left(\frac{n}{\sigma^2} - \frac{1}{\tau_0^2 + \sigma^2/n}\right) \right\}. \qquad (8)$$
For the strength, we have
$$\Pi\left( RB(\mu \,|\, x) \leq RB(\mu_0 \,|\, x) \,\middle|\, x \right) = \Pi\left( |\bar{x} - \mu| \geq |\bar{x} - \mu_0| \,\middle|\, x \right),$$
where the posterior of $\mu$ is $N(\mu_x, \sigma_x^2)$, with $\mu_x$ and $\sigma_x^2$ defined in (7). After minor simplification we have
$$1 - \Phi\left(\frac{\bar{x} - \mu_x + |\bar{x} - \mu_0|}{\sigma_x}\right) + \Phi\left(\frac{\bar{x} - \mu_x - |\bar{x} - \mu_0|}{\sigma_x}\right). \qquad (9)$$
Similar to the conclusion in Baskurt and Evans (2013), as $n \to \infty$ in (9), $\sqrt{n}(\bar{x} - \mu)/\sigma$ converges in distribution to $Z$ when $\mu = \mu_0$, by the central limit theorem and the continuous mapping theorem, where $Z$ is a standard normal random variable. Hence, when $H_0$ is true (i.e., $H_0$ is not rejected), the strength has an asymptotically uniform distribution on $(0, 1)$. On the other hand, the strength converges to 0 almost surely (a.s.) when $\mu \neq \mu_0$, since $|\bar{x} - \mu_0| \to |\mu - \mu_0| > 0$ almost surely.

As for the second approach, we compute the KL distance between the hypothesized distribution and the prior/posterior distributions. The change of the distance from a priori to a posteriori is compared through the relative belief ratio. We first give a brief summary of the KL distance. In general, the KL distance (sometimes called the entropy distance
) between two continuous cumulative distribution functions (cdf's) $F$ and $G$, with corresponding probability density functions (pdf's) $f$ and $g$ (with respect to Lebesgue measure), is defined by
$$d_{KL}(F, G) = \int_{-\infty}^{\infty} f(x) \log\frac{f(x)}{g(x)}\, dx.$$
It is well known that $d_{KL}(F, G) \geq 0$, and the equality holds if and only if $F = G$. However, $d_{KL}$ is not symmetric and does not satisfy the triangle inequality (Cover and Thomas, 1991). In particular, the KL divergence between the two normal distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ is given by (Duchi, 2014)
$$d_{KL}\left(N(\mu_1, \sigma_1^2), N(\mu_2, \sigma_2^2)\right) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}. \qquad (10)$$
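The closed form (10) is straightforward to code; a minimal sketch with a sanity check:

```python
import math

def kl_normal(mu1, s1, mu2, s2):
    """KL divergence between N(mu1, s1^2) and N(mu2, s2^2), via the closed form (10)."""
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

same = kl_normal(0.0, 1.0, 0.0, 1.0)     # identical normals: distance 0
shifted = kl_normal(1.0, 1.0, 0.0, 1.0)  # mean shift of 1, unit variance: 1/2
```

Note that swapping the arguments generally changes the value, reflecting the asymmetry of $d_{KL}$.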
Set $F = N(\mu_0, \sigma^2)$ and $G = N(\mu, \sigma^2)$. It follows from (10) that
$$d_{KL}\left(N(\mu_0, \sigma^2), N(\mu, \sigma^2)\right) = \frac{(\mu - \mu_0)^2}{2\sigma^2}.$$
If $\mu \sim N(\mu_0, \tau_0^2)$ (the prior), let
$$D_{prior} = \frac{(\mu - \mu_0)^2}{2\sigma^2}, \quad \mu \sim N(\mu_0, \tau_0^2). \qquad (11)$$
On the other hand, if $\mu \sim N(\mu_x, \sigma_x^2)$ as defined in (7) (the posterior), let
$$D_{post} = \frac{(\mu - \mu_0)^2}{2\sigma^2}, \quad \mu \sim N(\mu_x, \sigma_x^2). \qquad (12)$$
Note that, as $n \to \infty$, by the strong law of large numbers, $\bar{x} \to \mu_{true}$ a.s., where $\mu_{true}$ is the true value of $\mu$. Thus, since $\mu_x \to \mu_{true}$ and $\sigma_x^2 \to 0$, by (12), if $H_0$ is true we have $D_{post} \to 0$ a.s. On the other hand, if $H_0$ is not true, then
$$D_{post} \to \frac{(\mu_{true} - \mu_0)^2}{2\sigma^2} > 0 \quad \text{a.s.} \qquad (13)$$
What follows is that, if $H_0$ is true, then the distribution of $D_{post}$ should be more concentrated about 0 than that of $D_{prior}$. So, the proposed test includes a comparison of the concentrations of the prior and posterior distributions of the KL divergence via a relative belief ratio, based on the interpretation as discussed in Section 2.
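This limiting behaviour can be checked numerically. The sketch below (illustrative hyperparameters, with $\bar{x}$ idealized to equal $\mu_{true}$) computes the posterior expectation of $D_{post}$, which equals $((\mu_x - \mu_0)^2 + \sigma_x^2)/(2\sigma^2)$ for $\mu \sim N(\mu_x, \sigma_x^2)$:

```python
def post_mean_kl(mu_true, n, mu0=0.0, sigma=1.0, tau0=1.0):
    """E[D_post] under the conjugate posterior (7), with xbar idealized to mu_true."""
    prec = 1 / tau0**2 + n / sigma**2
    sig_x2 = 1 / prec
    mu_x = sig_x2 * (mu0 / tau0**2 + n * mu_true / sigma**2)
    # E[(mu - mu0)^2] = (mu_x - mu0)^2 + sig_x2 for mu ~ N(mu_x, sig_x2)
    return ((mu_x - mu0)**2 + sig_x2) / (2 * sigma**2)

under_h0 = post_mean_kl(0.0, n=10**6)  # H0 true: tends to 0
under_h1 = post_mean_kl(1.0, n=10**6)  # H0 false: tends to (1 - 0)^2 / 2 = 0.5
```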
3.2 Elicitation of the Prior
The success of the methodology is influenced significantly by the choice of the hyperparameter $\tau_0$. Inappropriate values of the hyperparameters can lead to a failure in computing $RB(\mu_0 \,|\, x)$. To elicit proper values of the hyperparameters, we consider the method developed in Evans and Tomal (2018). Suppose that it is known with virtual certainty, based on the knowledge of the basic measurement being taken, that $\mu$ will lie in the interval $(m_1, m_2)$ for specified values $m_1 < m_2$. Here, virtual certainty is interpreted as $P(m_1 \leq \mu \leq m_2) \geq \gamma$, where $\gamma$ is a large probability like 0.999. If the prior $N(\mu_0, \tau_0^2)$ is centered at the midpoint of this interval, then after some simple algebra, $\tau_0 = (m_2 - m_1)/(2 z_{(1+\gamma)/2})$, where $z_{(1+\gamma)/2}$ is the $(1+\gamma)/2$ quantile of the standard normal distribution.
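A sketch of this elicitation step (the interval endpoints below are hypothetical):

```python
from statistics import NormalDist

def elicit_tau0(m1, m2, gamma=0.999):
    """Solve P(m1 <= mu <= m2) = gamma for a N(mu0, tau0^2) prior centered
    at the midpoint mu0 = (m1 + m2)/2; returns (mu0, tau0)."""
    mu0 = (m1 + m2) / 2
    z = NormalDist().inv_cdf((1 + gamma) / 2)
    tau0 = (m2 - m1) / (2 * z)
    return mu0, tau0

mu0, tau0 = elicit_tau0(0.0, 10.0)
# Check that the elicited prior puts probability gamma on (m1, m2).
coverage = NormalDist(mu0, tau0).cdf(10.0) - NormalDist(mu0, tau0).cdf(0.0)
```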
3.3 Checking for Prior-Data Conflict
As pointed out in Section 3.1, the minimal sufficient statistic for $\mu$ is $\bar{x}$, with prior predictive distribution $N(\mu_0, \tau_0^2 + \sigma^2/n)$. Thus,
$$M_T\left( m(\bar{X}) \leq m(\bar{x}) \right) = 2\left(1 - \Phi\left(\frac{|\bar{x} - \mu_0|}{\sqrt{\tau_0^2 + \sigma^2/n}}\right)\right), \qquad (14)$$
where $m$ is defined as in (5). Recall that, if (14) is small, then this indicates a prior-data conflict, and no prior-data conflict otherwise. It is true that prior-data conflict can be avoided by increasing $\tau_0$ (i.e., making the prior more diffuse); however, as pointed out in Evans (2018), this is not an appropriate approach, as it will induce bias into the analysis. Thus, by (14), when $\bar{x}$ lies in the tail of its prior predictive distribution, we have a prior-data conflict.
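A sketch of the check (14), with hypothetical data summaries (a large tail probability signals no conflict):

```python
import math
from statistics import NormalDist

def prior_data_conflict(xbar, n, sigma, mu0, tau0):
    """Tail probability (14) of xbar under its prior predictive
    N(mu0, tau0^2 + sigma^2/n); small values indicate prior-data conflict."""
    scale = math.sqrt(tau0**2 + sigma**2 / n)
    return 2 * (1 - NormalDist().cdf(abs(xbar - mu0) / scale))

p = prior_data_conflict(xbar=4.9, n=5, sigma=0.5, mu0=5.0, tau0=1.0)  # no conflict expected
```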
3.4 Checking for Bias
The bias against the hypothesis $H_0: \mu = \mu_0$ is measured by computing (3) with $RB(\mu_0 \,|\, x)$ as in (8). Note that, since the prior is centered at $\mu_0$, there is never a strong bias against $H_0$. On the other hand, the bias in favor of the hypothesis is measured by computing (4) with $RB(\mu_0 \,|\, x)$ as defined in (8). The interpretation of the bias was covered in Section 2.
3.5 The Algorithm
The approach will involve a comparison between the concentrations of the prior and posterior distributions of the KL divergence via a relative belief ratio, with the interpretation as discussed in Section 2. Since explicit forms of the densities of the distance are not available, the relative belief ratios need to be estimated via simulation. The following summarizes a computational algorithm for testing $H_0: \mu = \mu_0$.

Algorithm A (New Test)

(i) Elicit the hyperparameter $\tau_0$ as described in Section 3.2.

(ii) Generate $\mu$ from the prior $N(\mu_0, \tau_0^2)$.

(iii) Compute the KL distance between $N(\mu_0, \sigma^2)$ and $N(\mu, \sigma^2)$ as described in (11). Denote this distance by $D_{prior}$.

(iv) Repeat steps (ii) and (iii) to obtain a sample of $r$ values of $D_{prior}$.

(v) Generate $\mu$ from the posterior $N(\mu_x, \sigma_x^2)$, where $\mu_x$ and $\sigma_x^2$ are defined in (7).

(vi) Compute the KL distance between $N(\mu_0, \sigma^2)$ and $N(\mu, \sigma^2)$ as described in (12). Denote this distance by $D_{post}$.

(vii) Repeat steps (v) and (vi) to obtain a sample of $r$ values of $D_{post}$.

(viii) Compute the relative belief ratio and the strength as follows. Closed forms of the densities of $D_{prior}$ and $D_{post}$ are not available, so the relative belief ratio and the strength need to be estimated via discretization. Let $M$ be a positive integer. Let $\hat{F}_D$ denote the empirical cdf of $D_{prior}$ based on the prior sample in (iv), and for $0 < p < 1$ let $\hat{d}_p$ be the estimate of the $p$-th prior quantile of $D$. Here $\hat{d}_0 = 0$, and $\hat{d}_1$ is the largest value of $D_{prior}$. Let $\hat{F}_D(\cdot \,|\, x)$ denote the empirical cdf of $D_{post}$ based on the posterior sample in (vii). For $d \in [\hat{d}_{i/M}, \hat{d}_{(i+1)/M})$, estimate $RB_D(d \,|\, x)$ by
$$\widehat{RB}_D(d \,|\, x) = M\left\{ \hat{F}_D(\hat{d}_{(i+1)/M} \,|\, x) - \hat{F}_D(\hat{d}_{i/M} \,|\, x) \right\}, \qquad (15)$$
the ratio of the estimates of the posterior and prior contents of $[\hat{d}_{i/M}, \hat{d}_{(i+1)/M})$. Thus, we estimate $RB_D(0 \,|\, x)$ by $(M/i_0)\, \hat{F}_D(\hat{d}_{i_0/M} \,|\, x)$, where $i_0$ is chosen so that $i_0/M$ is not too small (typically $i_0/M = 0.05$).

(ix) Estimate the strength by the finite sum
$$\sum_{\left\{ i \,:\, \widehat{RB}_D(\hat{d}_{i/M} \,|\, x) \leq \widehat{RB}_D(0 \,|\, x) \right\}} \left\{ \hat{F}_D(\hat{d}_{(i+1)/M} \,|\, x) - \hat{F}_D(\hat{d}_{i/M} \,|\, x) \right\}. \qquad (16)$$
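The core of Algorithm A can be sketched end to end; the data summary and hyperparameters below are hypothetical, and $RB_D(0 \,|\, x)$ is estimated from the posterior content of a small prior-quantile bin at 0:

```python
import random

random.seed(0)

# Hypothetical setting: n observations with known sigma, prior N(mu0, tau0^2).
mu0, sigma, tau0 = 0.0, 1.0, 1.0
n, xbar = 10, 0.1

# Posterior N(mu_x, sig_x2), as in the conjugate update (7).
prec = 1 / tau0**2 + n / sigma**2
sig_x2 = 1 / prec
mu_x = sig_x2 * (mu0 / tau0**2 + n * xbar / sigma**2)

# KL distance d_KL( N(mu0, sigma^2), N(mu, sigma^2) ) = (mu - mu0)^2 / (2 sigma^2).
kl = lambda mu: (mu - mu0) ** 2 / (2 * sigma**2)

r = 50000
d_prior = sorted(kl(random.gauss(mu0, tau0)) for _ in range(r))
d_post = [kl(random.gauss(mu_x, sig_x2**0.5)) for _ in range(r)]

# Estimate RB_D(0|x): posterior content of [0, q) over its prior content i0/M = 0.05.
q = d_prior[r // 20]  # 5% prior quantile of D
rb0 = (sum(d < q for d in d_post) / r) / 0.05
# Here xbar is close to mu0, so rb0 > 1: evidence in favour of H0.
```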

The following proposition establishes the consistency of the approach as the sample size increases; that is, the procedure performs correctly as the sample size increases, whether or not $H_0$ is true. The proof follows immediately from Evans (2015), Section 4.7.1. See also Al-Labadi and Evans (2018) for a similar result.

Proposition 1
Consider the discretization $\{[\hat{d}_{i/M}, \hat{d}_{(i+1)/M}),\, i = 0, \ldots, M-1\}$. As $n \to \infty$, (i) if $H_0$ is true, then $RB_D(0 \,|\, x) \to M$ and the strength converges to 1, and (ii) if $H_0$ is false, then $RB_D(0 \,|\, x) \to 0$ and the strength converges to 0.
4 A Bayesian Alternative to the One-Sample t-Test
4.1 The Approach
In this section, we assume that $x = (x_1, \ldots, x_n)$ is an independent random sample from $N(\mu, \sigma^2)$, where $\sigma^2$ is unknown. The goal is to test $H_0: \mu = \mu_0$, where $\mu_0$ is a given real number. The first step in the approach is to construct priors on $\mu$ and $\sigma^2$. We will consider the following hierarchical but conjugate prior (Evans, 2015, p. 171):
$$\mu \,|\, \sigma^2 \sim N(\mu_0, \tau_0^2 \sigma^2), \qquad (17)$$
$$1/\sigma^2 \sim \text{Gamma}(\alpha_1, \alpha_2), \qquad (18)$$
where $\tau_0^2$, $\alpha_1$ and $\alpha_2$ are hyperparameters to be specified via elicitation, as will be described in Section 4.2. The posterior distribution of $(\mu, \sigma^2)$ is given by:
$$\mu \,|\, \sigma^2, x \sim N\left(\mu_x,\; \sigma^2\left(n + 1/\tau_0^2\right)^{-1}\right), \qquad (19)$$
$$1/\sigma^2 \,|\, x \sim \text{Gamma}\left(\alpha_1 + n/2,\; \beta_x\right), \qquad (20)$$
where
$$\mu_x = \left(n + 1/\tau_0^2\right)^{-1}\left(\mu_0/\tau_0^2 + n\bar{x}\right), \qquad \beta_x = \alpha_2 + \frac{1}{2}\left[(n-1)s^2 + \frac{n(\bar{x} - \mu_0)^2}{1 + n\tau_0^2}\right], \qquad (21)$$
with $s^2 = (n-1)^{-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$. To find $RB(\mu_0 \,|\, x)$, notice that the minimal sufficient statistic for $(\mu, \sigma^2)$ is $(\bar{x}, s^2)$, with $\bar{x}$ independent of $s^2$. The joint prior predictive of $(\bar{x}, s^2)$ is given by (Evans, 2015):
$$m(\bar{x}, s^2) \propto (s^2)^{(n-3)/2}\left(1 + n\tau_0^2\right)^{-1/2}\, \beta_x^{-(\alpha_1 + n/2)}, \qquad (22)$$
where $\beta_x$ is defined in (21) and the proportionality constant does not depend on $(\bar{x}, s^2)$. On the other hand, it can be shown that the conditional prior predictive of $(\bar{x}, s^2)$ given $\mu = \mu_0$ has the same form with $\tau_0^2$ replaced by 0, that is, with $\beta_x$ replaced by $\beta_{x,0} = \alpha_2 + \frac{1}{2}\left[(n-1)s^2 + n(\bar{x} - \mu_0)^2\right]$. Thus,
$$RB(\mu_0 \,|\, x) = \frac{m(\bar{x}, s^2 \,|\, \mu_0)}{m(\bar{x}, s^2)} = \left(1 + n\tau_0^2\right)^{1/2}\left(\frac{\beta_x}{\beta_{x,0}}\right)^{\alpha_1 + n/2}. \qquad (23)$$
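Based on the conjugate forms above (a reconstruction of the standard normal-gamma analysis, so it should be checked against (21)-(23)), the relative belief ratio can be computed directly; the data summary and hyperparameters below are hypothetical:

```python
import math

def rb_mu0(xbar, s2, n, mu0, tau0, a1, a2):
    """RB(mu0|x) as the ratio of the conditional to the marginal prior
    predictive of (xbar, s2) under the normal-gamma prior; sketch of (23)."""
    b_marg = a2 + 0.5 * ((n - 1) * s2 + n * (xbar - mu0) ** 2 / (1 + n * tau0**2))
    b_cond = a2 + 0.5 * ((n - 1) * s2 + n * (xbar - mu0) ** 2)
    return math.sqrt(1 + n * tau0**2) * (b_marg / b_cond) ** (a1 + n / 2)

# When xbar equals mu0, the ratio reduces to sqrt(1 + n * tau0^2) > 1.
rb_at_null = rb_mu0(0.0, 1.0, 10, mu0=0.0, tau0=1.0, a1=1.0, a2=1.0)
rb_far = rb_mu0(2.0, 1.0, 10, mu0=0.0, tau0=1.0, a1=1.0, a2=1.0)
```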
For the strength we have $\Pi\left( RB(\mu \,|\, x) \leq RB(\mu_0 \,|\, x) \,\middle|\, x \right)$, where $\mu_x$ and $\beta_x$ are defined in (19) and (20), respectively. After some algebra, we reach the conclusion that the strength coincides with (9), but here $\sigma^2$ is random, as defined through (17) and (18).
4.2 Elicitation of the Prior
To elicit the prior, we consider the approach developed by Evans (2015, p. 171). Suppose that it is known with virtual certainty (probability $\gamma = 0.999$, say) that $\mu \in (m_1, m_2)$ for specified values $m_1 < m_2$. This interval is chosen to be as short as possible, based on the knowledge of the basic measurements being taken, and without being unrealistic. We set $\mu_0 = (m_1 + m_2)/2$ (i.e., the midpoint). With this choice, one hyperparameter has been specified. It follows that
$$P\left(m_1 \leq \mu \leq m_2 \,\middle|\, \sigma^2\right) = 2\Phi\left(\frac{m_2 - m_1}{2\tau_0\sigma}\right) - 1 \geq \gamma.$$
This implies that
$$\tau_0 \leq \frac{m_2 - m_1}{2\sigma\, z_{(1+\gamma)/2}}. \qquad (25)$$
An interval that contains virtually all the actual data measurements is given by $\mu_0 \pm z_{(1+\gamma)/2}\,\sigma$. Since this interval cannot be unrealistically too short or too long, we let $s_1$ and $s_2$ be the lower and upper bounds on the half-length of the interval, so that
$$s_1 \leq z_{(1+\gamma)/2}\,\sigma \leq s_2.$$
That is,
$$\frac{s_1}{z_{(1+\gamma)/2}} \leq \sigma \leq \frac{s_2}{z_{(1+\gamma)/2}}. \qquad (26)$$
Now, from (25) and (26), the hyperparameters $\alpha_1$ and $\alpha_2$ are chosen so that the $\text{Gamma}(\alpha_1, \alpha_2)$ prior on $1/\sigma^2$ places virtual certainty on the interval for $\sigma$ in (26), which determines the conditional prior for $\mu$ given $\sigma^2$ through (25). Note that the interval for $\sigma$ can be made bigger by choosing a bigger value of $s_2$.
4.3 Checking for Prior-Data Conflict
To assess whether there is a prior-data conflict, we compute:
$$M\left( m(\bar{X}, S^2) \leq m(\bar{x}, s^2) \right), \qquad (30)$$
where $m$ is the joint prior predictive density in (22) and $(\bar{x}, s^2)$ is the observed value of the minimal sufficient statistic. Clearly, computing (30) should be done by simulation. Thus, for specified values of the hyperparameters, we generate $(\mu, \sigma^2)$ as given in (17) and (18). Then we generate $(\bar{X}, S^2)$ from their joint distribution given $(\mu, \sigma^2)$ and evaluate $m(\bar{X}, S^2)$ using (22). Repeating this many times and recording the proportion of values of $m(\bar{X}, S^2)$ that are less than or equal to $m(\bar{x}, s^2)$ gives a Monte Carlo estimate of (30).

4.4 Checking for Bias

The bias against and in favor of $H_0$ can be measured by computing (3) and (4) with $RB(\mu_0 \,|\, x)$ as in (23), similar to Section 3.4.
4.5 The Algorithm
The following algorithm outlines the KL approach described in Section 4 to test $H_0: \mu = \mu_0$.