One-sample hypothesis testing is a primary topic in any introductory statistics course. It involves the selection of a reference value for the (unknown) population mean $\mu$. More specifically, let $x = (x_1, \ldots, x_n)$ be an independent random sample taken from $N(\mu, \sigma^2)$, where $\sigma^2$ is the population variance. The interest is to test the hypothesis $H_0: \mu = \mu_0$, where $\mu_0$ is a given real number. Within the classical frequentist framework, if $\sigma^2$ is known, then the $z$-test is commonly used for testing $H_0$ against the two-sided alternative $H_1: \mu \neq \mu_0$. The test statistic in this case is
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}},$$
where $\bar{x}$ is the sample mean. For a significance level $\alpha$, the critical value $z_{1-\alpha/2}$ is defined to be the $(1-\alpha/2)$ quantile of the standard normal distribution. Also, the $p$-value is equal to $2P(Z > |z|)$, where $Z$ has the standard normal distribution. Then $H_0$ is rejected if $|z| > z_{1-\alpha/2}$ or, equivalently, if the $p$-value is less than $\alpha$. On the other hand, if $\sigma^2$ is unknown, then the test statistic is
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}},$$
where $s$ is the sample standard deviation. For a test with significance level $\alpha$, let $t_{1-\alpha/2}$ be the $(1-\alpha/2)$ quantile of the $t$ distribution with $n-1$ degrees of freedom. The two-sided $p$-value is equal to $2P(T_{n-1} > |t|)$, where $T_{n-1}$ has the $t$-distribution with $n-1$ degrees of freedom. Similar to the $z$-test, $H_0$ is rejected if $|t| > t_{1-\alpha/2}$ or the $p$-value is less than $\alpha$.
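As a small illustration of the classical $z$-test just described, the following sketch computes the test statistic, the critical value, and the two-sided $p$-value using only the Python standard library. The function name `z_test` and the sample values are hypothetical and chosen purely for illustration; they do not come from the paper.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test with known population standard deviation."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()  # standard normal N(0, 1)
    critical = std_normal.inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    reject = abs(z) > critical
    return z, critical, p_value, reject


# Illustrative data only (not from the paper).
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4, 5.0, 5.5]
z, critical, p_value, reject = z_test(sample, mu0=5.0, sigma=0.4)
```

Rejecting when `abs(z) > critical` and rejecting when `p_value < alpha` are the same decision rule, which is why the function reports both.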
While the above approach to hypothesis testing is well known and well established, it is difficult to find a Bayesian counterpart in the literature. An exception is the work of Rouder, Speckman, Sun, and Morey (2009), who proposed a Bayesian test, where $\sigma^2$ is unknown, using the Bayes factor (the ratio of the marginal densities of the two models; Kass and Raftery, 1995). They placed the Jeffreys prior on $\sigma^2$ and the Cauchy prior on the standardized effect size $\mu/\sigma$. They provided a web-based program (c.f. pcl.missouri.edu) in order to facilitate the use of their test. Notably, the authors gave detailed criticisms of the use of $p$-values in hypothesis testing. For example, they indicated that $p$-values do not allow researchers to state evidence for the null hypothesis, and that they overstate the evidence against the null hypothesis. Although the $p$-value converges to zero as the sample size increases when the null hypothesis is false, which is a desirable feature, the $p$-values are all equally likely and uniformly distributed between 0 and 1 when the null is true. This distribution holds regardless of the sample size, which means that increasing the sample size in this case will not help in gaining evidence for the null hypothesis. In fact, this reflects Fisher's view that the null hypothesis can only be rejected and never accepted. Other relevant work, but in the two-sample problem setup, includes Gönen, Johnson, Lu and Westfall (2005) and Wang and Lui (2016). For more recent articles about the limitations of using $p$-values in hypothesis testing, we refer the reader to Evans (2015), Wasserstein and Lazar (2016), and references therein.
Unlike the previous work, the hyperparameters of the prior in the new Bayesian approach are elicited and tested against prior-data conflict and against being biased. For this, two elicitation algorithms developed by Evans (2015, 2018) are considered. In fact, the success of any Bayesian approach depends significantly on a proper selection of the hyperparameters of the prior. Part of the elicitation process involves checking the elicited prior for prior-data conflict and for bias (see Section 2). Then the concentration of the distribution of the Kullback-Leibler divergence between the prior and the model of interest is compared to that between the posterior and the model. If the posterior is more concentrated about the hypothesized distribution than the prior, then this is evidence in favor of the null hypothesis, and if the posterior is less concentrated, then this is evidence against the null hypothesis. This comparison is made via a relative belief ratio, which measures the evidence in the observed data for or against the null. A measure of the strength of this evidence is also provided. So, the methodology is based on a direct measure of statistical evidence. We point out that relative belief ratios have recently been used in problems that involve goodness-of-fit tests and model checking. See, for example, Al-Labadi (2018), Al-Labadi and Evans (2018), Al-Labadi, Zeynep and Evans (2017, 2018), and Evans and Tomal (2018).
The proposed method brings many advantages to the problem of hypothesis testing. Besides its simplicity, and unlike the classical approach, the new approach possesses attractive and desirable features, such as giving evidence in favor of the null hypothesis. Also, checking the prior for bias and prior-data conflict makes it possible to avoid several undesirable paradoxes, such as Lindley's paradox, that may be encountered by standard Bayesian methods based, for instance, on the Bayes factor (Evans, 2015).
The remainder of this paper is organized as follows. A general discussion about the relative belief ratio is given in Section 2. The definition and some fundamental properties of the Dirichlet process are presented in Section 3. In Section 4, an explicit expression to compute Anderson-Darling distance between the Dirichlet process and its base measure is derived. In Section 5, a Bayesian nonparametric test for assessing multivariate normality is discussed and some of its relevant properties are developed. A computational algorithm to calculate the relative belief ratio for the implementation of the proposed test is developed in Section 6. In Section 7, the performance of the proposed test is established via four simulated examples and two real data sets. Finally, some concluding remarks are given in Section 8. All technical proofs are included in the supplementary material.
2 Inferences Using Relative Belief
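Suppose we have a statistical model that is given by the density function $f_\theta(x)$ (with respect to some measure), where $\theta$ is an unknown parameter that belongs to the parameter space $\Theta$. Let $\pi(\theta)$ be the prior distribution of $\theta$. After observing the data $x$, by Bayes' theorem, the posterior distribution of $\theta$ is given by the density
$$\pi(\theta \mid x) = \frac{f_\theta(x)\,\pi(\theta)}{m(x)},$$
where $m(x) = \int_\Theta f_\theta(x)\,\pi(\theta)\,d\theta$ is the prior predictive density of the data.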
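Suppose that the interest is to make inference about an arbitrary parameter $\psi = \Psi(\theta)$. Let $\Pi_\Psi$ denote the prior measure of $\psi$ with density $\pi_\Psi$. Let the corresponding posterior measure and density of $\psi$ be $\Pi_\Psi(\cdot \mid x)$ and $\pi_\Psi(\cdot \mid x)$, respectively. The relative belief ratio for a hypothesized value $\psi_0$ of $\psi$ is defined by $RB_\Psi(\psi_0 \mid x) = \lim_{\delta \to 0} \Pi_\Psi(N_\delta(\psi_0) \mid x)/\Pi_\Psi(N_\delta(\psi_0))$, where $N_\delta(\psi_0)$ is a sequence of neighbourhoods of $\psi_0$ converging nicely (see, for example, Rudin (1974)) to $\psi_0$ as $\delta \to 0$. When $\pi_\Psi$ and $\pi_\Psi(\cdot \mid x)$ are continuous at $\psi_0$,
$$RB_\Psi(\psi_0 \mid x) = \frac{\pi_\Psi(\psi_0 \mid x)}{\pi_\Psi(\psi_0)}$$
is the ratio of the posterior density to the prior density at $\psi_0$. That is, $RB_\Psi(\psi_0 \mid x)$ measures how beliefs that $\psi_0$ is the true value have changed from a priori to a posteriori. Baskurt and Evans (2013) proved that
$$RB_\Psi(\psi_0 \mid x) = \frac{m_T(T(x) \mid \psi_0)}{m_T(T(x))}, \qquad (1)$$
where $T$ is a minimal sufficient statistic of the model, $m_T$ is the prior predictive density of $T$, and $m_T(\cdot \mid \psi_0)$ is the conditional prior predictive density of $T$ given $\Psi(\theta) = \psi_0$. The previous authors referred to (1) as the Savage-Dickey ratio. It is to be noted that a relative belief ratio is similar to a Bayes factor (Kass and Raftery, 1995), as both are measures of evidence, but the latter measures it via the change in an odds ratio. The relationship between relative belief ratios and Bayes factors is discussed in detail in Baskurt and Evans (2013). More specifically, when a Bayes factor is defined via a limit in the continuous case, the limiting value is the corresponding relative belief ratio.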
By a basic principle of evidence, $RB_\Psi(\psi_0 \mid x) > 1$ means that the data led to an increase in the probability that $\psi_0$ is correct, and so there is evidence in favour of $\psi_0$, while $RB_\Psi(\psi_0 \mid x) < 1$ means that the data led to a decrease in the probability that $\psi_0$ is correct, and so there is evidence against $\psi_0$. Clearly, when $RB_\Psi(\psi_0 \mid x) = 1$, there is no evidence either way.
It is also important to calibrate whether this is strong or weak evidence for or against $\psi_0$. As suggested in Evans (2015), a useful calibration of $RB_\Psi(\psi_0 \mid x)$ is obtained by computing the tail probability
$$\Pi_\Psi\big(RB_\Psi(\psi \mid x) \le RB_\Psi(\psi_0 \mid x) \,\big|\, x\big). \qquad (2)$$
One way to view (2) is as the posterior probability that the true value of $\psi$ has a relative belief ratio no greater than that of the hypothesized value $\psi_0$. When $RB_\Psi(\psi_0 \mid x) < 1$, so that there is evidence against $\psi_0$, a small value for (2) indicates a large posterior probability that the true value has a relative belief ratio greater than $RB_\Psi(\psi_0 \mid x)$, and so there is strong evidence against $\psi_0$. When $RB_\Psi(\psi_0 \mid x) > 1$, so that there is evidence in favour of $\psi_0$, a large value for (2) indicates a small posterior probability that the true value has a relative belief ratio greater than $RB_\Psi(\psi_0 \mid x)$. Therefore, there is strong evidence in favour of $\psi_0$, while a small value of (2) only indicates weak evidence in favour of $\psi_0$.
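To make the relative belief ratio and its strength concrete, the following minimal sketch works through a toy problem with a finite parameter space, where both quantities reduce to simple sums. The prior probabilities and likelihood values are hypothetical numbers chosen for illustration; the function names are likewise assumptions and not part of the paper.

```python
def relative_belief(prior, likelihood):
    """Return posterior probabilities and relative belief ratios
    (posterior / prior) for a finite set of parameter values."""
    evidence = sum(p * l for p, l in zip(prior, likelihood))  # prior predictive m(x)
    posterior = [p * l / evidence for p, l in zip(prior, likelihood)]
    rb = [post / p for post, p in zip(posterior, prior)]
    return posterior, rb


def strength(posterior, rb, index):
    """Posterior probability that the true value has a relative belief
    ratio no greater than that of the hypothesized value at `index`."""
    return sum(post for post, r in zip(posterior, rb) if r <= rb[index])


# Hypothetical prior and likelihood values for three candidate parameter values.
prior = [0.5, 0.3, 0.2]
likelihood = [0.2, 0.5, 0.1]
posterior, rb = relative_belief(prior, likelihood)
strength_first = strength(posterior, rb, index=0)   # hypothesized value with RB < 1
strength_second = strength(posterior, rb, index=1)  # hypothesized value with RB > 1
```

In this toy case the second value has a relative belief ratio above 1 with strength equal to 1, which is read as strong evidence in its favour, while the first value has a ratio below 1.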
One of the key concerns with Bayesian inference methods is that the prior can bias the analysis. Following Evans (2015), let $M(\cdot \mid \psi_0)$ denote the conditional prior predictive distribution of the data given that $\Psi(\theta) = \psi_0$, so $M(A \mid \psi_0)$ is the conditional prior probability that the data is in the set $A$. The bias against $H_0: \Psi(\theta) = \psi_0$ can be measured by computing
$$M\big(RB_\Psi(\psi_0 \mid X) \le 1 \,\big|\, \psi_0\big), \qquad (3)$$
and this is the prior probability that evidence will be obtained against $H_0$ when it is true. If the bias against $H_0$ is large, then subsequently reporting, after seeing the data, that there is evidence against $H_0$ is not convincing. On the other hand, the bias in favor of $H_0$ is given by
$$M\big(RB_\Psi(\psi_0 \mid X) \ge 1 \,\big|\, \psi_*\big) \qquad (4)$$
for values $\psi_* \neq \psi_0$ such that the difference between $\psi_*$ and $\psi_0$ represents the smallest difference of practical importance; note that (4) tends to decrease as $\psi_*$ moves farther away from $\psi_0$. When the bias in favor is large, subsequently reporting, after seeing the data, that there is evidence in favor of $H_0$ is not convincing.
Another concern regarding priors is to measure the compatibility between the prior and the data. A chosen prior may be incorrect by being strongly contradicted by the data (Evans, 2015). A possible contradiction between the data and the prior is referred to as a prior-data conflict. In principle, if the prior primarily places its mass in a region of the parameter space where the data suggest the true value does not lie, then there is a prior-data conflict (Evans and Moshonov, 2006). That is, prior-data conflict will occur whenever there is only a tiny overlap between the effective support regions of the model and the prior. In such a situation, we must be concerned about the effect of the prior on the analysis (Evans, 2015). Methods for checking the prior in this sense are developed in Evans and Moshonov (2006). See also Nott, Xueou, Evans, and Engler (2016) and Nott, Seah, AL-Labadi, Evans, Ng and Englert (2019). The basic method for checking the prior involves computing the probability
$$M_T\big(m_T(t) \le m_T(T(x))\big), \qquad (5)$$
where $T$ is a minimal sufficient statistic of the model and $M_T$ is the prior predictive probability measure of $T$ with density $m_T$. The value of (5) simply serves to locate the observed value $T(x)$ in its prior distribution. If (5) is small, then $T(x)$ lies in a region of low prior probability, such as a tail or anti-mode, which indicates a conflict. The consistency of this check follows from Evans and Jang (2011), where it is proven that, under quite general conditions, (5) converges to
$$\Pi\big(\pi(\theta) \le \pi(\theta_{true})\big) \qquad (6)$$
as the amount of data increases, where $\theta_{true}$ is the true value of the parameter. If (6) is small, then $\theta_{true}$ lies in a region of low prior probability, which implies that the prior is not appropriate.
3 A Bayesian Alternative to the One-Sample z-Test
3.1 The Approach
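Let $x = (x_1, \ldots, x_n)$ be an independent random sample from $N(\mu, \sigma^2)$, where $\sigma^2$ is known. The goal is to test the hypothesis $H_0: \mu = \mu_0$, where $\mu_0$ is a given real number. The approach here is Bayesian. First we construct a prior on $\mu$. Let the prior $\pi(\mu)$ be $N(\mu_0, \sigma_0^2)$, where $\mu_0$ and $\sigma_0^2$ are known hyperparameters selected through the elicitation algorithm covered in Section 3.2. Thus, the posterior distribution of $\mu$ given $x$ is $N(\mu_x, \sigma_x^2)$, where
$$\mu_x = \sigma_x^2\left(\frac{\mu_0}{\sigma_0^2} + \frac{n\bar{x}}{\sigma^2}\right), \qquad \sigma_x^2 = \left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}\right)^{-1}. \qquad (7)$$
To proceed with the test using the relative belief ratio, there are two possible approaches. The first is based on a direct computation of the relative belief ratio and its strength. This approach was initiated in Baskurt and Evans (2013) when discussing the Jeffreys-Lindley paradox. To find $RB(\mu_0 \mid x)$, notice that, by (1),
$$RB(\mu_0 \mid x) = \frac{m_T(\bar{x} \mid \mu_0)}{m_T(\bar{x})}.$$
The minimal sufficient statistic for $\mu$ is $T(x) = \bar{x}$. Since $\bar{x} = \mu + (\sigma/\sqrt{n})\,z$, where $z \sim N(0,1)$ is independent of $\mu$, it follows that the prior predictive distribution of $\bar{x}$ is $N(\mu_0, \sigma_0^2 + \sigma^2/n)$. That is,
$$RB(\mu_0 \mid x) = \left(1 + \frac{n\sigma_0^2}{\sigma^2}\right)^{1/2} \exp\left\{-\frac{1}{2}\left(1 + \frac{\sigma^2}{n\sigma_0^2}\right)^{-1} \frac{n(\bar{x} - \mu_0)^2}{\sigma^2}\right\}. \qquad (8)$$
For the strength, we have
$$\Pi\big(RB(\mu \mid x) \le RB(\mu_0 \mid x) \,\big|\, x\big) = \Pi\big(|\mu - \bar{x}| \ge |\mu_0 - \bar{x}| \,\big|\, x\big),$$
where $\mu \mid x \sim N(\mu_x, \sigma_x^2)$ with $\mu_x$ and $\sigma_x^2$ defined in (7). After minor simplification we have
$$\Pi\big(RB(\mu \mid x) \le RB(\mu_0 \mid x) \,\big|\, x\big) = 1 - \Phi\left(\frac{\bar{x} + |\bar{x} - \mu_0| - \mu_x}{\sigma_x}\right) + \Phi\left(\frac{\bar{x} - |\bar{x} - \mu_0| - \mu_x}{\sigma_x}\right). \qquad (9)$$
Similar to the conclusion in Baskurt and Evans (2013), as $\sigma_0 \to \infty$ in (9), the strength converges to $2\big(1 - \Phi(\sqrt{n}\,|\bar{x} - \mu_0|/\sigma)\big)$, which converges in distribution to $2(1 - \Phi(|Z|))$ as $n \to \infty$ when $H_0$ is true, by the central limit theorem and the continuous mapping theorem, where $Z$ is the standard normal random variable. Hence, when $H_0$ is true, the strength has an asymptotically uniform distribution on $(0,1)$. On the other hand, the strength converges to 0 almost surely (a.s.) when $H_0$ is false, since $\sqrt{n}\,|\bar{x} - \mu_0|/\sigma \to \infty$ almost surely.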
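The direct computation of the relative belief ratio and its strength can be sketched as follows. This is only an illustration under stated assumptions: the prior on the mean is taken to be normal and centered at the hypothesized value, the names `mu0`, `sigma0`, `normal_posterior`, `relative_belief_ratio`, and `strength` are notation introduced here, and the strength uses the fact that, for a fixed sample, the relative belief ratio as a function of the mean is proportional to the likelihood and therefore decreases with the distance from the sample mean.

```python
from math import sqrt
from statistics import NormalDist


def normal_posterior(xbar, n, sigma, mu0, sigma0):
    """Posterior mean and standard deviation of mu for N(mu, sigma^2) data
    with known sigma and an assumed N(mu0, sigma0^2) prior on mu."""
    var_x = 1.0 / (1.0 / sigma0**2 + n / sigma**2)
    mu_x = var_x * (mu0 / sigma0**2 + n * xbar / sigma**2)
    return mu_x, sqrt(var_x)


def relative_belief_ratio(xbar, n, sigma, mu0, sigma0):
    """Posterior density divided by prior density, both evaluated at mu0."""
    mu_x, sd_x = normal_posterior(xbar, n, sigma, mu0, sigma0)
    return NormalDist(mu_x, sd_x).pdf(mu0) / NormalDist(mu0, sigma0).pdf(mu0)


def strength(xbar, n, sigma, mu0, sigma0):
    """Posterior probability of {mu : RB(mu | x) <= RB(mu0 | x)}, i.e. of
    the set of values at least as far from xbar as mu0 is."""
    mu_x, sd_x = normal_posterior(xbar, n, sigma, mu0, sigma0)
    posterior = NormalDist(mu_x, sd_x)
    half_width = abs(xbar - mu0)
    return posterior.cdf(xbar - half_width) + 1.0 - posterior.cdf(xbar + half_width)


# Illustrative inputs only (not from the paper).
rb = relative_belief_ratio(xbar=0.1, n=25, sigma=1.0, mu0=0.0, sigma0=2.0)
st = strength(xbar=0.1, n=25, sigma=1.0, mu0=0.0, sigma0=2.0)
```

With a very diffuse prior (large `sigma0`), `strength` approaches the classical two-sided $p$-value, which matches the limiting behaviour discussed above.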
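As for the second approach, we compute the KL distance between the hypothesized distribution and the prior/posterior distributions. The change of the distance from a priori to a posteriori is compared through the relative belief ratio. We first give a brief summary of the KL distance. In general, the KL distance (sometimes called the entropy distance) between two continuous cumulative distribution functions (cdf's) $F$ and $G$ with corresponding probability density functions (pdf's) $f$ and $g$ (with respect to Lebesgue measure) is defined by
$$d_{KL}(F, G) = \int_{-\infty}^{\infty} f(x) \log\frac{f(x)}{g(x)}\,dx.$$
It is well known that $d_{KL}(F, G) \ge 0$ and the equality holds if and only if $F = G$. However, it is not symmetric and does not satisfy the triangle inequality (Cover and Thomas, 1991). In particular, the KL divergence between the two normal distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ is given by (Duchi, 2014)
$$d_{KL}\big(N(\mu_1, \sigma_1^2), N(\mu_2, \sigma_2^2)\big) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}. \qquad (10)$$
Set $F = N(\mu, \sigma^2)$ and $G = N(\mu_0, \sigma^2)$. It follows from (10) that
$$d_{KL}(F, G) = \frac{(\mu - \mu_0)^2}{2\sigma^2}.$$
If $\mu \sim N(\mu_0, \sigma_0^2)$, let
$$D = d_{KL}\big(N(\mu, \sigma^2), N(\mu_0, \sigma^2)\big) = \frac{(\mu - \mu_0)^2}{2\sigma^2}. \qquad (11)$$
On the other hand, if $\mu \mid x \sim N(\mu_x, \sigma_x^2)$ as defined in (7), let
$$D_x = d_{KL}\big(N(\mu, \sigma^2), N(\mu_0, \sigma^2)\big) = \frac{(\mu - \mu_0)^2}{2\sigma^2}. \qquad (12)$$
Note that, as $n \to \infty$, by the strong law of large numbers, $\mu_x \to \mu_{true}$ almost surely, where $\mu_{true}$ is the true value of $\mu$. Thus, by (12), if $H_0$ is true, we have $D_x \to 0$ almost surely. On the other hand, if $H_0$ is not true, then $D_x \to (\mu_{true} - \mu_0)^2/(2\sigma^2) > 0$ almost surely.
It follows that, if $H_0$ is true, then the distribution of $D_x$ should be more concentrated about 0 than that of $D$. So, the proposed test consists of a comparison of the concentrations of the prior and posterior distributions of the KL divergence via a relative belief ratio, with the interpretation discussed in Section 2.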
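The closed-form KL divergence between two univariate normal distributions, and its reduction when the two variances agree, can be checked numerically with a short sketch. The function name `kl_normal` is an assumption introduced for this illustration.

```python
from math import log


def kl_normal(mu1, sigma1, mu2, sigma2):
    """KL divergence d_KL(N(mu1, sigma1^2), N(mu2, sigma2^2))."""
    return (log(sigma2 / sigma1)
            + (sigma1**2 + (mu1 - mu2) ** 2) / (2 * sigma2**2)
            - 0.5)


# With a common variance the divergence reduces to (mu - mu0)^2 / (2 sigma^2).
equal_variance = kl_normal(1.0, 2.0, 0.0, 2.0)

# The divergence is not symmetric in its two arguments.
forward = kl_normal(0.0, 1.0, 0.0, 2.0)
backward = kl_normal(0.0, 2.0, 0.0, 1.0)
```

The equal-variance case is the one used by the test: the distance depends on the mean only through its squared standardized distance from the hypothesized value.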
3.2 Elicitation of the Prior
The success of the methodology is influenced significantly by the choice of the hyperparameters $\mu_0$ and $\sigma_0$. Inappropriate values of the hyperparameters can lead to a failure in computing the relative belief ratio. To elicit proper values of the hyperparameters, we consider the method developed in Evans and Tomal (2018). Suppose that it is known with virtual certainty, based on knowledge of the basic measurement being taken, that $\mu$ will lie in the interval $(a, b)$ for specified values $a < b$. Here, virtual certainty is interpreted as $\Pi(a \le \mu \le b) = \gamma$, where $\gamma$ is a large probability like 0.999. If $\mu_0 = (a + b)/2$, then after some simple algebra, $\sigma_0 = (b - a)/(2 z_{(1+\gamma)/2})$, where $z_{(1+\gamma)/2}$ is the $(1+\gamma)/2$ quantile of the standard normal distribution.
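The elicitation step amounts to solving one equation for the prior standard deviation. A minimal sketch follows, assuming a normal prior centered at the mid-point of the elicited interval and virtual certainty read as the prior assigning probability exactly `gamma` to the interval; the function name `elicit_normal_prior` and the interval endpoints are hypothetical.

```python
from statistics import NormalDist


def elicit_normal_prior(a, b, gamma=0.999):
    """Return (prior mean, prior sd) of a normal prior that puts
    probability gamma on the interval (a, b), centered at its mid-point."""
    mu0 = (a + b) / 2
    z = NormalDist().inv_cdf((1 + gamma) / 2)
    sigma0 = (b - a) / (2 * z)
    return mu0, sigma0


# Illustrative interval only (not from the paper).
mu0, sigma0 = elicit_normal_prior(a=-5.0, b=5.0)
prior = NormalDist(mu0, sigma0)
coverage = prior.cdf(5.0) - prior.cdf(-5.0)
```

A wider elicited interval gives a larger prior standard deviation, so the choice of the interval directly controls how diffuse the prior is.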
3.3 Checking for Prior-Data Conflict
As pointed out in Section 3.1, the minimal sufficient statistic for $\mu$ is $\bar{x}$, and the prior predictive distribution of $\bar{x}$ is $N(\mu_0, \sigma_0^2 + \sigma^2/n)$. Thus,
$$M_T\big(m_T(t) \le m_T(\bar{x})\big) = 2\left(1 - \Phi\left(\frac{|\bar{x} - \mu_0|}{\sqrt{\sigma_0^2 + \sigma^2/n}}\right)\right), \qquad (14)$$
where $M_T$ is defined as in (5). Recall that, if (14) is small, then this indicates a prior-data conflict, and no prior-data conflict otherwise. It is true that prior-data conflict can be avoided by increasing $\sigma_0$ (i.e. making the prior diffuse); however, as pointed out in Evans (2018), this is not an appropriate approach as it will induce bias into the analysis. Thus, by (14), when $\bar{x}$ lies in the tail of its prior predictive distribution, we have a prior-data conflict.
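The prior-data conflict check for this model is a two-sided tail probability of the sample mean under its prior predictive distribution. The sketch below assumes a normal prior with mean `mu0` and standard deviation `sigma0` (notation introduced here) and known data standard deviation `sigma`; the function name and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prior_data_conflict(xbar, n, sigma, mu0, sigma0):
    """Prior predictive probability of a sample mean whose prior predictive
    density is no larger than that of the observed sample mean."""
    predictive_sd = sqrt(sigma0**2 + sigma**2 / n)
    return 2 * (1 - NormalDist().cdf(abs(xbar - mu0) / predictive_sd))


# Illustrative inputs only (not from the paper).
no_conflict = prior_data_conflict(xbar=0.5, n=25, sigma=1.0, mu0=0.0, sigma0=2.0)
conflict = prior_data_conflict(xbar=9.0, n=25, sigma=1.0, mu0=0.0, sigma0=2.0)
```

A small returned value, as in the second call, would be read as an indication of prior-data conflict.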
3.4 Checking for Bias
The bias against the hypothesis $H_0$ is measured by computing (3) with $\psi_0 = \mu_0$ and $RB(\mu_0 \mid x)$ as in (8). Note that, since the prior is centered at $\mu_0$, there is never a strong bias against $H_0$. On the other hand, the bias in favor of the hypothesis $H_0$ is measured by computing (4) with $\psi_0 = \mu_0$ and $RB(\mu_0 \mid x)$ as defined in (8). The interpretation of the bias was covered in Section 2.
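For the known-variance normal model with a normal prior centered at the hypothesized mean, both bias measures can be written in closed form, because the relative belief ratio at the hypothesized value is at most 1 exactly when the standardized sample mean exceeds a fixed cutoff in absolute value. The sketch below relies on that assumption; `mu0`, `sigma0`, `mu_star`, and the function names are notation introduced here rather than taken from the paper.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def rb_cutoff(n, sigma, sigma0):
    """Cutoff c with RB(mu0 | x) <= 1 exactly when |z| >= c,
    where z = sqrt(n) * (xbar - mu0) / sigma."""
    k = n * sigma0**2 / sigma**2
    return sqrt((1 + 1 / k) * log(1 + k))


def bias_against(n, sigma, sigma0):
    """Probability of obtaining evidence against H0 when mu = mu0."""
    c = rb_cutoff(n, sigma, sigma0)
    return 2 * (1 - STD_NORMAL.cdf(c))


def bias_in_favor(mu_star, mu0, n, sigma, sigma0):
    """Probability of obtaining evidence in favor of H0 when mu = mu_star."""
    c = rb_cutoff(n, sigma, sigma0)
    shift = sqrt(n) * (mu_star - mu0) / sigma  # mean of z when mu = mu_star
    return STD_NORMAL.cdf(c - shift) - STD_NORMAL.cdf(-c - shift)


# Illustrative inputs only (not from the paper).
against = bias_against(n=25, sigma=1.0, sigma0=2.0)
favor_near = bias_in_favor(mu_star=0.2, mu0=0.0, n=25, sigma=1.0, sigma0=2.0)
favor_far = bias_in_favor(mu_star=1.0, mu0=0.0, n=25, sigma=1.0, sigma0=2.0)
```

The bias in favor shrinks as `mu_star` moves away from `mu0`, which is the behaviour described for (4) in Section 2.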
3.5 The Algorithm
The approach involves a comparison between the concentrations of the prior and posterior distributions of the KL divergence via a relative belief ratio, with the interpretation as discussed in Section 2. Since explicit forms of the densities of the distance are not available, the relative belief ratios need to be estimated via simulation. The following summarizes a computational algorithm for testing $H_0$.
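Algorithm A (New z-Test)
(i) Elicit the hyperparameters $\mu_0$ and $\sigma_0$ as described in Section 3.2.
(ii) Generate $\mu$ from $N(\mu_0, \sigma_0^2)$.
(iii) Compute the KL distance between $N(\mu, \sigma^2)$ and $N(\mu_0, \sigma^2)$ as described in (11). Denote this distance by $D$.
(iv) Repeat steps (ii) and (iii) to obtain a sample of values of $D$.
(v) Generate $\mu$ from $N(\mu_x, \sigma_x^2)$, where $\mu_x$ and $\sigma_x^2$ are defined in (7).
(vi) Compute the KL distance between $N(\mu, \sigma^2)$ and $N(\mu_0, \sigma^2)$ as described in (12). Denote this distance by $D_x$.
(vii) Repeat steps (v) and (vi) to obtain a sample of values of $D_x$.
(viii) Compute the relative belief ratio and the strength as follows:
(a) Closed forms of $RB_D(0 \mid x)$ and its strength are not available. Thus, the relative belief ratio and the strength need to be estimated via approximation. Let $M$ be a positive number. Let $\hat{F}_D$ denote the empirical cdf of $D$ based on the prior sample in (iv), and for $i = 0, \ldots, M$ let $\hat{d}_{i/M}$ be the estimate of the $(i/M)$-th prior quantile of $D$. Here $\hat{d}_0 = 0$, and $\hat{d}_1$ is the largest value of $D$. Let $\hat{F}_D(\cdot \mid x)$ denote the empirical cdf of $D_x$ based on the posterior sample in (vii). For $d \in [\hat{d}_{i/M}, \hat{d}_{(i+1)/M})$, estimate $RB_D(d \mid x)$ by
$$\widehat{RB}_D(d \mid x) = M\left\{\hat{F}_D(\hat{d}_{(i+1)/M} \mid x) - \hat{F}_D(\hat{d}_{i/M} \mid x)\right\},$$
the ratio of the estimates of the posterior and prior contents of $[\hat{d}_{i/M}, \hat{d}_{(i+1)/M})$. Thus, we estimate $RB_D(0 \mid x)$ by $\widehat{RB}_D(0 \mid x) = M \hat{F}_D(\hat{d}_{i_0/M} \mid x)/i_0$, where $i_0$ and $M$ are chosen so that $i_0/M$ is not too small (typically $i_0/M \approx 0.05$).
(b) Estimate the strength by the finite sum
$$\sum_{\{i \,:\, \widehat{RB}_D(\hat{d}_{i/M} \mid x) \le \widehat{RB}_D(0 \mid x)\}} \left(\hat{F}_D(\hat{d}_{(i+1)/M} \mid x) - \hat{F}_D(\hat{d}_{i/M} \mid x)\right).$$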
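The steps of the simulation-based test can be sketched in a few lines. This is an illustration under stated assumptions and not the authors' implementation: the prior is taken as normal with mean `mu0` and standard deviation `sigma0`, the KL distance between two normals with a common variance is used, and the strength is taken as the summed posterior content of all quantile cells whose estimated relative belief ratio does not exceed the estimate at zero. The function name `algorithm_a`, the defaults `num_draws`, `M`, `i0`, and `seed`, and the data summaries are hypothetical.

```python
import random
from bisect import bisect_right
from math import sqrt


def algorithm_a(xbar, n, sigma, mu0, sigma0, num_draws=20000, M=20, i0=1, seed=0):
    """Estimate RB_D(0 | x) and its strength by simulation."""
    rng = random.Random(seed)
    var_x = 1.0 / (1.0 / sigma0**2 + n / sigma**2)
    mu_x = var_x * (mu0 / sigma0**2 + n * xbar / sigma**2)
    sd_x = sqrt(var_x)

    def kl(mu):
        # KL distance between N(mu, sigma^2) and N(mu0, sigma^2).
        return (mu - mu0) ** 2 / (2 * sigma**2)

    prior_d = sorted(kl(rng.gauss(mu0, sigma0)) for _ in range(num_draws))
    post_d = sorted(kl(rng.gauss(mu_x, sd_x)) for _ in range(num_draws))

    # Estimated prior quantiles d_{i/M}: d_0 = 0 and d_1 is the largest prior draw.
    quantiles = [0.0] + [prior_d[(i * num_draws) // M - 1] for i in range(1, M + 1)]

    def post_cdf(d):
        # Empirical cdf of the posterior sample of distances.
        return bisect_right(post_d, d) / num_draws

    # Posterior content of each cell; each cell has prior content 1 / M.
    contents = [post_cdf(quantiles[i + 1]) - post_cdf(quantiles[i]) for i in range(M)]
    rb_cells = [M * c for c in contents]

    rb0 = M * post_cdf(quantiles[i0]) / i0
    strength = sum(c for c, rb in zip(contents, rb_cells) if rb <= rb0)
    return rb0, strength


# Illustrative summaries only: one sample mean near mu0, one far from it.
rb_null, strength_null = algorithm_a(xbar=0.0, n=100, sigma=1.0, mu0=0.0, sigma0=2.0)
rb_alt, strength_alt = algorithm_a(xbar=1.0, n=100, sigma=1.0, mu0=0.0, sigma0=2.0)
```

With `M = 20` and `i0 = 1` the estimate of the relative belief ratio is bounded above by 20, so values close to that bound correspond to the posterior distance being almost entirely inside the smallest prior quantile cell.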
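The following proposition establishes the consistency of the approach as the sample size increases. So, the procedure performs correctly as the sample size increases when $H_0$ is true. The proof follows immediately from Evans (2015), Section 4.7.1. See also AL-Labadi and Evans (2018) for a similar result.
Proposition. Consider the discretization $\{[0, d_{i_0/M}), [d_{i_0/M}, \infty)\}$. As $n \to \infty$, (i) if $H_0$ is true, then $RB_D([0, d_{i_0/M}) \mid x) \to M/i_0$ almost surely and the strength converges to 1, and (ii) if $H_0$ is false and $(\mu_{true} - \mu_0)^2/(2\sigma^2) > d_{i_0/M}$, then $RB_D([0, d_{i_0/M}) \mid x) \to 0$ almost surely and the strength converges to 0.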
4 A Bayesian Alternative to the One-Sample t-Test
4.1 The Approach
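In this section, we assume that $x = (x_1, \ldots, x_n)$ is an independent random sample from $N(\mu, \sigma^2)$, where $\sigma^2$ is unknown. The goal is to test $H_0: \mu = \mu_0$, where $\mu_0$ is a given real number. The first step in the approach is to construct priors on $\mu$ and $\sigma^2$. We will consider the following hierarchical but conjugate prior (Evans 2015, p.171):
$$\mu \mid \sigma^2 \sim N(\mu_0, \lambda_0^2\sigma^2), \qquad \frac{1}{\sigma^2} \sim \text{gamma}(\alpha_0, \beta_0),$$
where $\mu_0$, $\lambda_0$ and $(\alpha_0, \beta_0)$ are hyperparameters to be specified via elicitation, as described in Section 4.2, and $\beta_0$ is a rate parameter. The posterior distribution of $(\mu, \sigma^2)$ is given by:
$$\mu \mid \sigma^2, x \sim N\left(\mu_x, \left(n + 1/\lambda_0^2\right)^{-1}\sigma^2\right), \qquad \frac{1}{\sigma^2} \,\Big|\, x \sim \text{gamma}(\alpha_0 + n/2, \beta_x),$$
with
$$\mu_x = \left(n + \frac{1}{\lambda_0^2}\right)^{-1}\left(\frac{\mu_0}{\lambda_0^2} + n\bar{x}\right), \qquad \beta_x = \beta_0 + \frac{(n-1)s^2}{2} + \frac{n(\bar{x} - \mu_0)^2}{2(n\lambda_0^2 + 1)}. \qquad (21)$$
To find $RB(\mu_0 \mid x)$, notice that the minimal sufficient statistic for $(\mu, \sigma^2)$ is $T(x) = (\bar{x}, s^2)$, with $\bar{x} \sim N(\mu, \sigma^2/n)$ independent of $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$. The joint prior predictive density of $(\bar{x}, s^2)$ is given by (Evans, 2015):
$$m_T(\bar{x}, s^2) = \frac{\Gamma(\alpha_0 + n/2)}{\Gamma(\alpha_0)\,\Gamma((n-1)/2)} \, \frac{((n-1)/2)^{(n-1)/2}}{\sqrt{2\pi(\lambda_0^2 + 1/n)}} \, (s^2)^{(n-3)/2} \, \beta_0^{\alpha_0} \, \beta_x^{-(\alpha_0 + n/2)}, \qquad (22)$$
where $\beta_x$ is defined in (21). On the other hand, it can be shown that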
For the strength we have,
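The conjugate update behind the posterior can be sketched as below. The sketch assumes the hierarchical prior has the normal-gamma form, with the conditional prior variance of the mean equal to `lambda0**2` times the data variance and a gamma prior with shape `alpha0` and rate `beta0` on the precision; these names, the function name, and the sample are assumptions made for illustration.

```python
from statistics import mean, variance


def posterior_hyperparameters(sample, mu0, lambda0, alpha0, beta0):
    """Return (mu_x, variance factor, alpha_x, beta_x) such that
    mu | sigma^2, x ~ N(mu_x, factor * sigma^2) and
    1 / sigma^2 | x ~ gamma(shape=alpha_x, rate=beta_x)."""
    n = len(sample)
    xbar = mean(sample)
    s2 = variance(sample)  # sample variance with divisor n - 1
    factor = 1.0 / (n + 1.0 / lambda0**2)
    mu_x = factor * (mu0 / lambda0**2 + n * xbar)
    alpha_x = alpha0 + n / 2
    beta_x = (beta0 + (n - 1) * s2 / 2
              + n * (xbar - mu0) ** 2 / (2 * (n * lambda0**2 + 1)))
    return mu_x, factor, alpha_x, beta_x


# Illustrative sample and hyperparameters only (not from the paper).
mu_x, factor, alpha_x, beta_x = posterior_hyperparameters(
    [1.0, 2.0, 3.0], mu0=0.0, lambda0=1.0, alpha0=2.0, beta0=1.0
)
```

The posterior mean `mu_x` is a weighted average of the prior mean and the sample mean, with the weight on the data growing with the sample size.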
4.2 Elicitation of the Prior
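To elicit the prior, we consider the approach developed by Evans (2015, p.171). Suppose that it is known with virtual certainty (probability = 0.999) that $\mu \in (m_1, m_2)$ for specified values $m_1 \le m_2$. This interval is chosen to be as short as possible, based on the knowledge of the basic measurements being taken and without being unrealistic. We set $\mu_0 = (m_1 + m_2)/2$ (i.e., the mid-point). With this choice, one hyperparameter has been specified. It follows that
$$\Phi\left(\frac{m_2 - \mu_0}{\lambda_0\sigma}\right) - \Phi\left(\frac{m_1 - \mu_0}{\lambda_0\sigma}\right) \ge 0.999.$$
This implies that
$$\sigma \le \frac{m_2 - m_1}{2\lambda_0 z_{0.9995}},$$
where $z_{0.9995}$ is the 0.9995 quantile of the standard normal distribution.
An interval that contains virtually all the actual data measurements is given by $\mu \pm \sigma z_{0.9995}$. Since this interval cannot be unrealistically too short or too long, we let $s_2$ and $s_1$ be the upper and lower bounds on the half-length of the interval so that
$$s_1 \le \sigma z_{0.9995} \le s_2, \qquad (26)$$
and combining the upper bound with the previous inequality gives $\lambda_0 = (m_2 - m_1)/(2 s_2)$,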
which determine the conditional prior for . Note that can be made bigger by choosing a bigger value of .
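Lastly, to obtain relevant values of $\alpha_0$ and $\beta_0$, let $G(\alpha_0, \beta_0, \cdot)$ denote the cdf of the gamma$(\alpha_0, \beta_0)$ distribution. From (26), with $s_1$ and $s_2$ the bounds on the half-length and $z_{0.9995}$ the 0.9995 quantile of the standard normal distribution,
$$\frac{z_{0.9995}^2}{s_2^2} \le \frac{1}{\sigma^2} \le \frac{z_{0.9995}^2}{s_1^2}. \qquad (27)$$
Now, suppose we want the interval determined by the lower and upper bounds in (27) to contain $1/\sigma^2$ with virtual certainty. Thus,
$$G\left(\alpha_0, \beta_0, \frac{z_{0.9995}^2}{s_1^2}\right) = 0.9995, \qquad G\left(\alpha_0, \beta_0, \frac{z_{0.9995}^2}{s_2^2}\right) = 0.0005,$$
and these two equations are solved numerically for $(\alpha_0, \beta_0)$.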
4.3 Checking for Prior-Data Conflict
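To assess whether the observed value $(\bar{x}, s^2)$ is a reasonable value, we compute:
$$M_T\big(m_T(\bar{X}, S^2) \le m_T(\bar{x}, s^2)\big). \qquad (30)$$
To estimate (30), we generate $(\mu, \sigma^2)$ from the prior, generate $(\bar{X}, S^2)$ from the joint distribution given $(\mu, \sigma^2)$, and evaluate $m_T(\bar{X}, S^2)$ using (22). Repeating this many times and recording the proportion of values of $m_T(\bar{X}, S^2)$ that are less than or equal to $m_T(\bar{x}, s^2)$ gives a Monte Carlo estimate of (30).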
4.4 Checking for Bias
4.5 The Algorithm
The following algorithm outlines the KL approach described in Section 4 to test $H_0: \mu = \mu_0$.