# On one-sample Bayesian tests for the mean

This paper deals with a new Bayesian approach to the standard one-sample z- and t-tests. More specifically, let x_1,...,x_n be an independent random sample from a normal distribution with mean μ and variance σ^2. The goal is to test the null hypothesis H_0: μ=μ_1 against all possible alternatives. The approach is based on the well-known formula for the Kullback-Leibler divergence between two normal distributions (sampling and hypothesized distributions selected in an appropriate way). The change of the distance from a priori to a posteriori is compared through the relative belief ratio (a measure of evidence). Eliciting the prior and checking for prior-data conflict and bias are also considered. Many theoretical properties of the procedure have been developed. Besides its simplicity, and unlike the classical approach, the new approach possesses attractive and distinctive features, such as giving evidence in favor of the null hypothesis. It also avoids several undesirable paradoxes, such as Lindley's paradox, that may be encountered by some existing Bayesian methods. The use of the approach has been illustrated through several examples.


## 1 Introduction

The one-sample hypothesis test is a primary topic in any introductory statistics course. It involves the selection of a reference value for the (unknown) population mean $\mu$. More specifically, let $x_1,\ldots,x_n$ be an independent random sample taken from $N(\mu,\sigma^2)$, where $\sigma^2$ is the population variance. The interest is to test the hypothesis $H_0:\mu=\mu_1$, where $\mu_1$ is a given real number. Within the classical frequentist framework, if $\sigma^2$ is known, then the $z$-test is commonly used for testing $H_0$ against the two-sided alternative $H_1:\mu\neq\mu_1$. The test statistic in this case is

$$z=\frac{\bar{x}-\mu_1}{\sigma/\sqrt{n}},$$

where $\bar{x}$ is the sample mean. For a significance level $\alpha$, the critical value is defined to be the $1-\alpha/2$ quantile $z_{1-\alpha/2}$ of the standard normal distribution. Also, the $p$-value is equal to $2P(Z\ge|z|)$, where $Z$ has the standard normal distribution. Then, $H_0$ is rejected if $|z|>z_{1-\alpha/2}$ or the $p$-value is less than $\alpha$. On the other hand, if $\sigma^2$ is unknown, then the test statistic is

$$t=\frac{\bar{x}-\mu_1}{s/\sqrt{n}},$$

where

$$s=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}$$

is the sample standard deviation. For a test with significance level $\alpha$, let $t_{n-1,1-\alpha/2}$ be the $1-\alpha/2$ quantile of the $t$ distribution with $n-1$ degrees of freedom. The two-sided $p$-value is equal to $2P(T\ge|t|)$, where $T$ has the $t$-distribution with $n-1$ degrees of freedom. Similar to the $z$-test, $H_0$ is rejected if $|t|>t_{n-1,1-\alpha/2}$ or the $p$-value is less than $\alpha$.
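As a quick numerical illustration of the two classical procedures, the following Python sketch computes $z$, $t$ and their two-sided $p$-values with SciPy. The data, $\mu_1=5$, $\sigma=0.5$ and $\alpha=0.05$ are hypothetical values chosen only for illustration, and `z_test` and `t_test` are helper names introduced here.

```python
import math

from scipy import stats


def z_test(x, mu1, sigma):
    """Two-sided one-sample z-test of H0: mu = mu1 with known sigma."""
    n = len(x)
    xbar = sum(x) / n
    z = (xbar - mu1) / (sigma / math.sqrt(n))
    p_value = 2 * stats.norm.sf(abs(z))  # 2 P(Z >= |z|)
    return z, p_value


def t_test(x, mu1):
    """Two-sided one-sample t-test of H0: mu = mu1 with unknown sigma."""
    n = len(x)
    xbar = sum(x) / n
    s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))  # sample standard deviation
    t = (xbar - mu1) / (s / math.sqrt(n))
    p_value = 2 * stats.t.sf(abs(t), df=n - 1)  # 2 P(T >= |t|), T ~ t_{n-1}
    return t, p_value


x = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 5.7]  # hypothetical data
alpha = 0.05
z, pz = z_test(x, mu1=5.0, sigma=0.5)
t, pt = t_test(x, mu1=5.0)
print(f"z = {z:.3f}, p = {pz:.4f}, reject: {abs(z) > stats.norm.ppf(1 - alpha / 2)}")
print(f"t = {t:.3f}, p = {pt:.4f}, reject: {abs(t) > stats.t.ppf(1 - alpha / 2, df=len(x) - 1)}")
```

For the unknown-variance case the result agrees with SciPy's `stats.ttest_1samp`, which is a convenient cross-check.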

While the above approach to hypothesis testing is well known and stable, it is difficult to find a Bayesian counterpart to it in the literature. An exception is the work of Rouder, Speckman, Sun, and Morey (2009), who proposed a Bayesian test, where $\sigma^2$ is unknown, using the Bayes factor (the ratio of the marginal densities of the two models; Kass and Raftery, 1995). They placed the Jeffreys prior on $\sigma^2$ and the Cauchy prior on the effect size. They provided a web-based program (cf. pcl.missouri.edu) to facilitate the use of their test. Notably, the authors raised detailed criticisms of using $p$-values in hypothesis testing. For example, they indicated that $p$-values do not allow researchers to state evidence for the null hypothesis, and that they overstate the evidence against it. Although the $p$-value converges to zero as the sample size increases when the null hypothesis is false, which is a desirable feature, $p$-values are uniformly distributed between 0 and 1 when the null hypothesis is true. This distribution holds regardless of the sample size, which means that increasing the sample size in this case does not help in gaining evidence for the null hypothesis. In fact, this reflects Fisher's view that the null hypothesis can only be rejected and never accepted. Other relevant work, in the two-sample setup, includes Gönen, Johnson, Lu and Westfall (2005) and Wang and Lui (2016). For more recent articles on the limitations of using $p$-values in hypothesis testing, we refer the reader to Evans (2015), Wasserstein and Lazar (2016), and references therein.
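The uniformity of $p$-values under $H_0$ is easy to see by simulation. The sketch below (hypothetical settings $\mu_1=0$, $\sigma=1$, and an alternative $\mu=0.2$) draws $\bar x$ directly from its sampling distribution $N(\mu,\sigma^2/n)$ and computes two-sided $z$-test $p$-values. Under $H_0$ the proportion of $p$-values below 0.05 stays near 0.05 for both $n=10$ and $n=1000$, whereas under the alternative it grows with $n$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)  # fixed seed for reproducibility


def simulated_p_values(n, mu=0.0, mu1=0.0, sigma=1.0, reps=20_000):
    """Two-sided z-test p-values for `reps` simulated samples of size n from N(mu, sigma^2)."""
    xbar = rng.normal(mu, sigma / np.sqrt(n), size=reps)  # sampling distribution of the mean
    z = (xbar - mu1) / (sigma / np.sqrt(n))
    return 2 * stats.norm.sf(np.abs(z))


for n in (10, 1000):
    p_null = simulated_p_values(n)          # H0 true: mu = mu1
    p_alt = simulated_p_values(n, mu=0.2)   # H0 false: mu = 0.2
    print(n, np.mean(p_null < 0.05), np.mean(p_alt < 0.05))
```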

Unlike previous work, in the new Bayesian approach the hyperparameters of the prior are elicited, and the prior is checked for prior-data conflict and for bias. For this, two elicitation algorithms developed by Evans (2015, 2018) are considered. Indeed, the success of any Bayesian approach depends significantly on a proper selection of the hyperparameters of the prior. Part of the elicitation process involves checking the elicited prior for prior-data conflict and bias (see Section 2). Then the concentration of the distribution of the Kullback-Leibler divergence between the prior and the model of interest is compared to that between the posterior and the model. If the posterior is more concentrated about the hypothesized distribution than the prior, then this is evidence in favor of the null hypothesis, and if the posterior is less concentrated, then this is evidence against the null hypothesis. This comparison is made via a relative belief ratio, which measures the evidence in the observed data for or against the null. A measure of the strength of this evidence is also provided. So, the methodology is based on a direct measure of statistical evidence. We point out that relative belief ratios have recently been used in problems involving goodness-of-fit testing and model checking. See, for example, Al-Labadi (2018), Al-Labadi and Evans (2018), Al-Labadi, Zeynep and Evans (2017, 2018), and Evans and Tomal (2018).

The proposed method brings many advantages to the problem of hypothesis testing. Besides its simplicity, and unlike the classical approach, the new approach possesses attractive and desirable features, such as giving evidence in favor of the null hypothesis. Also, checking the prior for bias and prior-data conflict makes it possible to avoid several undesirable paradoxes, such as Lindley's paradox, that may be encountered by standard Bayesian methods based, for instance, on the Bayes factor (Evans, 2015).

The remainder of this paper is organized as follows. A general discussion of the relative belief ratio, together with checks for bias and prior-data conflict, is given in Section 2. In Section 3, a Bayesian alternative to the one-sample z-test is developed, covering the approach, elicitation of the prior, checks for prior-data conflict and bias, and a computational algorithm. Section 4 develops the corresponding alternative to the one-sample t-test. The use of the approach is then illustrated through several examples, followed by some concluding remarks. All technical proofs are included in the supplementary material.

## 2 Inferences Using Relative Belief

Suppose we have a statistical model given by the density function $f_\theta(x)$ (with respect to some measure), where $\theta$ is an unknown parameter that belongs to the parameter space $\Theta$. Let $\pi(\theta)$ be the prior distribution of $\theta$. After observing the data $x$, by Bayes' theorem, the posterior distribution of $\theta$ is given by the density

$$\pi(\theta\mid x)=\frac{f_\theta(x)\pi(\theta)}{m(x)},$$

where

$$m(x)=\int_\Theta f_\theta(x)\pi(\theta)\,d\theta$$

is the prior predictive density of the data.

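Suppose that the interest is to make inference about an arbitrary parameter $\psi=\Psi(\theta)$. Let $\Pi_\Psi$ denote the prior measure of $\psi$ with density $\pi_\Psi$, and let the corresponding posterior measure and density of $\psi$ be $\Pi_\Psi(\cdot\mid x)$ and $\pi_\Psi(\cdot\mid x)$, respectively. The relative belief ratio for a hypothesized value $\psi_0$ of $\psi$ is defined by $RB_\Psi(\psi_0\mid x)=\lim_{\delta\to0}\Pi_\Psi(N_\delta(\psi_0)\mid x)/\Pi_\Psi(N_\delta(\psi_0))$, where $N_\delta(\psi_0)$ is a sequence of neighbourhoods of $\psi_0$ converging nicely (see, for example, Rudin (1974)) to $\psi_0$ as $\delta\to0$. When $\pi_\Psi$ and $\pi_\Psi(\cdot\mid x)$ are continuous at $\psi_0$,

$$RB_\Psi(\psi_0\mid x)=\frac{\pi_\Psi(\psi_0\mid x)}{\pi_\Psi(\psi_0)},$$

the ratio of the posterior density to the prior density at $\psi_0$. That is, $RB_\Psi(\psi_0\mid x)$ measures how beliefs that $\psi_0$ is the true value have changed from a priori to a posteriori. Baskurt and Evans (2013) proved that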

$$RB_\Psi(\psi_0\mid x)=\frac{m_T(T(x)\mid\psi_0)}{m_T(T(x))}, \tag{1}$$

where $T$ is a minimal sufficient statistic of the model, $m_T$ is the prior predictive density of $T$, and $m_T(\cdot\mid\psi_0)$ is its conditional prior predictive density given $\Psi(\theta)=\psi_0$. The previous authors referred to (1) as the Savage-Dickey ratio. A relative belief ratio is similar to a Bayes factor (Kass and Raftery, 1995), as both are measures of evidence, but the latter measures evidence via the change in an odds ratio. The relationship between relative belief ratios and Bayes factors is discussed in detail in Baskurt and Evans (2013). More specifically, when a Bayes factor is defined via a limit in the continuous case, the limiting value is the corresponding relative belief ratio.
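Identity (1) can be checked numerically in a simple conjugate model. The Python sketch below uses a beta-binomial setting that does not appear in this paper and is chosen only for illustration: $x$ successes in $m$ Bernoulli trials, $\theta\sim\text{Beta}(a,b)$ and $\psi=\theta$, so that $T(x)=x$, $m_T(\cdot\mid\theta_0)$ is binomial and $m_T$ is beta-binomial. It assumes SciPy 1.4 or later, which provides `scipy.stats.betabinom`; the function names are introduced here.

```python
from scipy import stats


def rb_beta_density_ratio(theta0, x, m, a, b):
    """RB(theta0 | x) as posterior density over prior density (Beta prior, binomial data)."""
    posterior = stats.beta(a + x, b + m - x)  # conjugate Beta posterior
    prior = stats.beta(a, b)
    return posterior.pdf(theta0) / prior.pdf(theta0)


def rb_beta_savage_dickey(theta0, x, m, a, b):
    """RB(theta0 | x) = m_T(T(x) | theta0) / m_T(T(x)) as in (1), with T(x) = x."""
    conditional = stats.binom.pmf(x, m, theta0)  # m_T(T(x) | theta0): Binomial(m, theta0)
    marginal = stats.betabinom.pmf(x, m, a, b)   # m_T(T(x)): Beta-binomial(m, a, b)
    return conditional / marginal


# Hypothetical data: 7 successes in 20 trials, uniform Beta(1, 1) prior, theta0 = 0.5.
print(rb_beta_density_ratio(0.5, 7, 20, 1, 1), rb_beta_savage_dickey(0.5, 7, 20, 1, 1))
```

Both functions return $1627920/2^{20}\approx1.55$ for these hypothetical data, so here the data provide evidence in favour of $\theta_0=0.5$.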

By a basic principle of evidence, $RB_\Psi(\psi_0\mid x)>1$ means that the data led to an increase in the probability that $\psi_0$ is correct, and so there is evidence in favour of $\psi_0$, while $RB_\Psi(\psi_0\mid x)<1$ means that the data led to a decrease in the probability that $\psi_0$ is correct, and so there is evidence against $\psi_0$. Clearly, when $RB_\Psi(\psi_0\mid x)=1$, there is no evidence either way.

It is also important to calibrate whether this is strong or weak evidence for or against $\psi_0$. As suggested in Evans (2015), a useful calibration of $RB_\Psi(\psi_0\mid x)$ is obtained by computing the tail probability

$$\Pi_\Psi\left(RB_\Psi(\psi\mid x)\le RB_\Psi(\psi_0\mid x)\mid x\right). \tag{2}$$

One way to view (2) is as the posterior probability that the true value of $\psi$ has a relative belief ratio no greater than that of the hypothesized value $\psi_0$. When $RB_\Psi(\psi_0\mid x)<1$, there is evidence against $\psi_0$, and a small value of (2) indicates a large posterior probability that the true value has a relative belief ratio greater than $RB_\Psi(\psi_0\mid x)$, so there is strong evidence against $\psi_0$. When $RB_\Psi(\psi_0\mid x)>1$, there is evidence in favour of $\psi_0$, and a large value of (2) indicates a small posterior probability that the true value has a relative belief ratio greater than $RB_\Psi(\psi_0\mid x)$, so there is strong evidence in favour of $\psi_0$; a small value of (2) only indicates weak evidence in favour of $\psi_0$.
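In practice, (2) is often estimated by simulation. Continuing the illustrative beta-binomial model (again an assumption, not part of the paper), the sketch below draws $\psi$ from the posterior and reports the proportion of draws whose relative belief ratio does not exceed $RB_\Psi(\psi_0\mid x)$. The function name `beta_strength` is hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)


def beta_strength(theta0, x, m, a, b, draws=100_000):
    """Monte Carlo estimate of (2) for the Beta(a, b) prior / binomial model."""
    prior = stats.beta(a, b)
    posterior = stats.beta(a + x, b + m - x)

    def rb(theta):
        return posterior.pdf(theta) / prior.pdf(theta)  # RB(theta | x)

    theta = posterior.rvs(size=draws, random_state=rng)  # draws from the posterior
    return np.mean(rb(theta) <= rb(theta0))


# Hypothetical data: 7 successes in 20 trials, Beta(1, 1) prior, theta0 = 0.5.
print(beta_strength(0.5, 7, 20, 1, 1))
```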

One of the key concerns with Bayesian inference methods is that the prior can bias the analysis. Following Evans (2015), let $M(\cdot\mid\psi_0)$ denote the conditional prior predictive distribution of the data given that $\Psi(\theta)=\psi_0$, so $M(A\mid\psi_0)$ is the conditional prior probability that the data lie in the set $A$. The bias against $H_0:\Psi(\theta)=\psi_0$ can be measured by computing

$$M\left(RB_\Psi(\psi_0\mid x)\le1\mid\psi_0\right), \tag{3}$$

which is the prior probability that evidence against $H_0$ will be obtained when $H_0$ is true. If the bias against $H_0$ is large, then subsequently reporting, after seeing the data, that there is evidence against $H_0$ is not convincing. On the other hand, the bias in favor of $H_0$ is given by

$$M\left(RB_\Psi(\psi_0\mid x)\ge1\mid\psi_0'\right) \tag{4}$$

for values $\psi_0'\neq\psi_0$ such that the difference between $\psi_0'$ and $\psi_0$ represents the smallest difference of practical importance; note that this tends to decrease as $\psi_0'$ moves farther away from $\psi_0$. When the bias in favor of $H_0$ is large, then subsequently reporting, after seeing the data, that there is evidence in favor of $H_0$ is not convincing.

Another concern regarding priors is the compatibility between the prior and the data. A chosen prior may be incorrect by being strongly contradicted by the data (Evans, 2015). A possible contradiction between the data and the prior is referred to as a prior-data conflict. In principle, if the prior places most of its mass in a region of the parameter space where the data suggest the true value does not lie, then there is a prior-data conflict (Evans and Moshonov, 2006). That is, prior-data conflict occurs whenever there is only a tiny overlap between the effective support regions of the model and the prior. In such a situation, we must be concerned about the effect of the prior on the analysis (Evans, 2015). Methods for checking the prior in this sense are developed in Evans and Moshonov (2006). See also Nott, Xueou, Evans, and Engler (2016) and Nott, Seah, Al-Labadi, Evans, Ng and Englert (2019). The basic method for checking the prior involves computing the probability

$$M_T\left(m_T(t)\le m_T(T(x))\right), \tag{5}$$

where $T$ is a minimal sufficient statistic of the model and $M_T$ is the prior predictive probability measure of $T$ with density $m_T$. The value of (5) simply serves to locate the observed value $T(x)$ in its prior distribution. If (5) is small, then $T(x)$ lies in a region of low prior probability, such as a tail or anti-mode, which indicates a conflict. The consistency of this check follows from Evans and Jang (2011), where it is proven that, under quite general conditions, (5) converges to

$$\Pi\left(\pi(\theta)\le\pi(\theta_{\text{true}})\right), \tag{6}$$

as the amount of data increases, where $\theta_{\text{true}}$ is the true value of the parameter. If (6) is small, then $\theta_{\text{true}}$ lies in a region of low prior probability, which implies that the prior is not appropriate.

## 3 A Bayesian Alternative to the One-Sample z-Test

### 3.1 The Approach

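Let $x_1,\ldots,x_n$ be an independent random sample from $N(\mu,\sigma^2)$, where $\sigma^2$ is known. The goal is to test the hypothesis $H_0:\mu=\mu_1$, where $\mu_1$ is a given real number. The approach here is Bayesian. First, we construct a prior on $\mu$. Let $\pi(\mu)$ be $N(\mu_0,\lambda_0^2\sigma^2)$, where $\mu_0$ and $\lambda_0$ are known hyperparameters selected through the elicitation algorithm covered in Section 3.2. Thus, the posterior distribution of $\mu$ given $x_1,\ldots,x_n$ is $N(\mu_x,\sigma_x^2)$, where

$$\mu_x=\frac{n\lambda_0^2}{n\lambda_0^2+1}\bar{x}+\frac{1}{n\lambda_0^2+1}\mu_0\quad\text{and}\quad\sigma_x^2=\frac{\lambda_0^2\sigma^2}{n\lambda_0^2+1}. \tag{7}$$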

To carry out the test using the relative belief ratio, there are two possible approaches. The first one is based on a direct computation of the relative belief ratio and its strength. This approach was initiated by Baskurt and Evans (2013) in their discussion of the Jeffreys-Lindley paradox. To find $RB(\mu\mid x)$, notice that

$$RB(\mu\mid x)=\frac{\pi(\mu\mid T(x))}{\pi(\mu)}=\frac{\pi(\mu)f(T(x)\mid\mu)/m_T(T(x))}{\pi(\mu)}=\frac{f(T(x)\mid\mu)}{m_T(T(x))}.$$

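The minimal sufficient statistic for $\mu$ is $T(x)=\bar{x}$. Since $\bar{x}=\mu+\varepsilon$, where $\varepsilon\sim N(0,\sigma^2/n)$ is independent of $\mu\sim N(\mu_0,\lambda_0^2\sigma^2)$, it follows that the prior predictive distribution of $\bar{x}$ is $N(\mu_0,\sigma^2(\lambda_0^2+1/n))$. That is,

$$m_T(T(x))=\sqrt{\frac{n}{2\pi\sigma^2(1+n\lambda_0^2)}}\exp\left(-\frac{n(\bar{x}-\mu_0)^2}{2\sigma^2(n\lambda_0^2+1)}\right).$$

Thus,

$$RB(\mu\mid x)=\sqrt{1+n\lambda_0^2}\,\exp\left(-\frac{n}{2\sigma^2}\left[(\bar{x}-\mu)^2-\frac{(\bar{x}-\mu_0)^2}{n\lambda_0^2+1}\right]\right). \tag{8}$$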
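A minimal Python sketch can cross-check (8) against the definition of the relative belief ratio as a posterior-to-prior density ratio, with the posterior given by (7). The function names and the numerical inputs are hypothetical.

```python
import numpy as np
from scipy import stats


def posterior_params(xbar, n, sigma, mu0, lam0):
    """Posterior mean and variance of mu from (7), prior mu ~ N(mu0, lam0^2 sigma^2)."""
    k = n * lam0**2
    mu_x = k / (k + 1) * xbar + mu0 / (k + 1)
    var_x = lam0**2 * sigma**2 / (k + 1)
    return mu_x, var_x


def rb_closed_form(mu, xbar, n, sigma, mu0, lam0):
    """Relative belief ratio RB(mu | x) from (8)."""
    k = n * lam0**2
    expo = -n / (2 * sigma**2) * ((xbar - mu) ** 2 - (xbar - mu0) ** 2 / (k + 1))
    return np.sqrt(1 + k) * np.exp(expo)


def rb_density_ratio(mu, xbar, n, sigma, mu0, lam0):
    """RB(mu | x) computed directly as posterior density / prior density."""
    mu_x, var_x = posterior_params(xbar, n, sigma, mu0, lam0)
    return stats.norm.pdf(mu, mu_x, np.sqrt(var_x)) / stats.norm.pdf(mu, mu0, lam0 * sigma)


# Hypothetical inputs: xbar = 0.3, n = 25, sigma = 1, mu0 = 0, lambda0 = 1.
mus = np.linspace(-1.0, 1.0, 11)
print(rb_closed_form(mus, 0.3, 25, 1.0, 0.0, 1.0))
```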

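For the strength, since by (8) $RB(\mu\mid x)$ is a decreasing function of $(\bar{x}-\mu)^2$, we have

$$\begin{aligned}
\Pi\left(RB(\mu\mid x)\le RB(\mu_1\mid x)\mid x\right)
&=\Pi\left((\mu-\bar{x})^2\ge(\bar{x}-\mu_1)^2\mid x\right)\\
&=\Pi\left(|\mu-\bar{x}|\ge|\bar{x}-\mu_1|\mid x\right)\\
&=\Pi\left(\mu\ge\bar{x}+|\bar{x}-\mu_1|\mid x\right)+\Pi\left(\mu\le\bar{x}-|\bar{x}-\mu_1|\mid x\right)\\
&=1-\Phi\left(\frac{\bar{x}+|\bar{x}-\mu_1|-\mu_x}{\sigma_x}\right)+\Phi\left(\frac{\bar{x}-|\bar{x}-\mu_1|-\mu_x}{\sigma_x}\right),
\end{aligned}$$

where $\mu_x$ and $\sigma_x$ are defined in (7). After minor simplification, we have

$$\begin{aligned}
\Pi\left(RB(\mu\mid x)\le RB(\mu_1\mid x)\mid x\right)
={}&1-\Phi\left(\left(\frac{1}{\sigma^2}+\frac{1}{n\lambda_0^2\sigma^2}\right)^{1/2}\left(\sqrt{n}\,|\bar{x}-\mu_1|+\frac{\sqrt{n}\,\bar{x}}{n\lambda_0^2+1}-\frac{\sqrt{n}\,\mu_0}{n\lambda_0^2+1}\right)\right)\\
&+\Phi\left(\left(\frac{1}{\sigma^2}+\frac{1}{n\lambda_0^2\sigma^2}\right)^{1/2}\left(-\sqrt{n}\,|\bar{x}-\mu_1|+\frac{\sqrt{n}\,\bar{x}}{n\lambda_0^2+1}-\frac{\sqrt{n}\,\mu_0}{n\lambda_0^2+1}\right)\right).
\end{aligned}\tag{9}$$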

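Similar to the conclusion in Baskurt and Evans (2013), as $n\to\infty$ the right-hand side of (9) behaves like $2\left[1-\Phi\left(\sqrt{n}\,|\bar{x}-\mu_1|/\sigma\right)\right]$, which converges in distribution to $2\left[1-\Phi(|Z|)\right]$ when $\mu=\mu_1$, by the central limit theorem and the continuous mapping theorem, where $Z$ is a standard normal random variable. Hence, when $H_0$ is true, the strength has an asymptotically uniform distribution on $(0,1)$. On the other hand, the strength converges to 0 almost surely (a.s.) when $\mu\neq\mu_1$, since $\bar{x}\to\mu$ almost surely, so that $\sqrt{n}\,|\bar{x}-\mu_1|\to\infty$.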
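The closed form (9) can likewise be checked by simulating $\mu$ from the posterior (7), since by (8) the event $RB(\mu\mid x)\le RB(\mu_1\mid x)$ is the same as $|\mu-\bar x|\ge|\bar x-\mu_1|$. The sketch below also evaluates the large-$n$ approximation $2[1-\Phi(\sqrt{n}\,|\bar x-\mu_1|/\sigma)]$; all inputs and function names are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)


def strength_closed_form(xbar, n, sigma, mu0, lam0, mu1):
    """Strength Pi(RB(mu|x) <= RB(mu1|x) | x) from (9)."""
    k = n * lam0**2
    scale = np.sqrt(1 / sigma**2 + 1 / (k * sigma**2))
    shift = np.sqrt(n) * (xbar - mu0) / (k + 1)
    d = np.sqrt(n) * abs(xbar - mu1)
    return 1 - stats.norm.cdf(scale * (d + shift)) + stats.norm.cdf(scale * (-d + shift))


def strength_monte_carlo(xbar, n, sigma, mu0, lam0, mu1, draws=200_000):
    """Same probability, estimated by sampling mu from the posterior N(mu_x, sigma_x^2) in (7)."""
    k = n * lam0**2
    mu_x = (k * xbar + mu0) / (k + 1)
    sd_x = lam0 * sigma / np.sqrt(k + 1)
    mu = rng.normal(mu_x, sd_x, size=draws)
    # By (8), RB(mu|x) <= RB(mu1|x) exactly when |mu - xbar| >= |xbar - mu1|.
    return np.mean(np.abs(mu - xbar) >= abs(xbar - mu1))


def strength_large_n(xbar, n, sigma, mu1):
    """Large-n approximation 2[1 - Phi(sqrt(n)|xbar - mu1|/sigma)]."""
    return 2 * stats.norm.sf(np.sqrt(n) * abs(xbar - mu1) / sigma)


# Hypothetical inputs: xbar = 0.3, n = 25, sigma = 1, mu0 = mu1 = 0, lambda0 = 1.
print(strength_closed_form(0.3, 25, 1.0, 0.0, 1.0, 0.0), strength_monte_carlo(0.3, 25, 1.0, 0.0, 1.0, 0.0))
```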

For the second approach, we compute the Kullback-Leibler (KL) distance between the hypothesized distribution and the distributions induced by the prior and the posterior. The change of this distance from a priori to a posteriori is assessed through the relative belief ratio. We first give a brief summary of the KL distance. In general, the KL distance (sometimes called the entropy distance) between two continuous cumulative distribution functions (cdf's) $P$ and $Q$ with corresponding probability density functions (pdf's) $p$ and $q$ (with respect to Lebesgue measure) is defined by

$$d(P,Q)=\int p(x)\log\left(\frac{p(x)}{q(x)}\right)dx.$$

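It is well known that $d(P,Q)\ge0$, with equality if and only if $P=Q$. However, $d$ is not symmetric and does not satisfy the triangle inequality (Cover and Thomas, 1991). In particular, the KL divergence between the two normal distributions $P=N(\mu_1,\sigma_1^2)$ and $Q=N(\mu_2,\sigma_2^2)$ is given by (Duchi, 2014)

$$d(P,Q)=\log\left(\frac{\sigma_2}{\sigma_1}\right)+\frac{\sigma_1^2+(\mu_1-\mu_2)^2}{2\sigma_2^2}-\frac{1}{2}. \tag{10}$$

Set $P=N(\mu,\sigma^2)$ and $Q=N(\mu_1,\sigma^2)$. It follows from (10) that

$$d(P,Q)=\frac{(\mu-\mu_1)^2}{2\sigma^2}.$$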
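The closed form for the KL divergence between two normals is easy to verify numerically. The Python sketch below implements the standard expression $\log(\sigma_2/\sigma_1)+[\sigma_1^2+(\mu_1-\mu_2)^2]/(2\sigma_2^2)-1/2$ for $P=N(\mu_1,\sigma_1^2)$ and $Q=N(\mu_2,\sigma_2^2)$, compares it with direct numerical integration, and confirms that it reduces to $(\mu-\mu_1)^2/(2\sigma^2)$ when the variances are equal. The parameter values are hypothetical.

```python
import numpy as np
from scipy import integrate, stats


def kl_normal(m1, s1, m2, s2):
    """Closed-form KL divergence d(P, Q) for P = N(m1, s1^2), Q = N(m2, s2^2)."""
    return np.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2 * s2**2) - 0.5


def kl_numeric(m1, s1, m2, s2):
    """Numerical integral of p(x) log(p(x)/q(x)) over +/- 12 standard deviations of P."""
    def integrand(x):
        return stats.norm.pdf(x, m1, s1) * (stats.norm.logpdf(x, m1, s1) - stats.norm.logpdf(x, m2, s2))

    value, _ = integrate.quad(integrand, m1 - 12 * s1, m1 + 12 * s1)
    return value


print(kl_normal(0, 1, 1, 2), kl_numeric(0, 1, 1, 2))        # unequal variances
print(kl_normal(2, 1.5, 0.5, 1.5), 1.5**2 / (2 * 1.5**2))   # equal variances
```

Swapping the arguments changes the value, which illustrates the asymmetry of $d$.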

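If $\mu\sim\pi(\mu)=N(\mu_0,\lambda_0^2\sigma^2)$, let

$$d(P,Q)=d(P_{\text{prior}},Q)=\frac{(\mu-\mu_1)^2}{2\sigma^2}. \tag{11}$$

On the other hand, if $\mu\sim\pi(\mu\mid x)=N(\mu_x,\sigma_x^2)$ as defined in (7), let

$$d(P,Q)=d(P_{\text{post}},Q)=\frac{(\mu-\mu_1)^2}{2\sigma^2}. \tag{12}$$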

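Note that, as $n\to\infty$, by the strong law of large numbers, $\bar{x}\xrightarrow{a.s.}\mu_{\text{true}}$, where $\mu_{\text{true}}$ is the true value of $\mu$; hence, by (7), $\mu_x\xrightarrow{a.s.}\mu_{\text{true}}$ and $\sigma_x^2\to0$, so the posterior concentrates at $\mu_{\text{true}}$. Thus, by (12), if $H_0$ is true, we have $d(P_{\text{post}},Q)\xrightarrow{a.s.}0$. On the other hand, if $H_0$ is not true, then

$$d(P_{\text{post}},Q)\xrightarrow{a.s.}c>0, \tag{13}$$

where $c=(\mu_{\text{true}}-\mu_1)^2/(2\sigma^2)$. It follows that, if $H_0$ is true, then the distribution of $d(P_{\text{post}},Q)$ should be more concentrated about 0 than that of $d(P_{\text{prior}},Q)$. So, the proposed test compares the concentrations of the prior and posterior distributions of the KL divergence via a relative belief ratio, with the interpretation discussed in Section 2.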

### 3.2 Elicitation of the Prior

The success of the methodology is significantly influenced by the choice of the hyperparameters $\mu_0$ and $\lambda_0$. Inappropriate values of the hyperparameters can lead to misleading values of the relative belief ratio. To elicit proper values of the hyperparameters, we consider the method developed in Evans and Tomal (2018). Suppose that it is known with virtual certainty, based on knowledge of the basic measurement being taken, that $\mu$ will lie in the interval $(a,b)$ for specified values $a<b$. Here, virtual certainty is interpreted as $\Pi(a<\mu<b)\ge\gamma$, where $\gamma$ is a large probability such as 0.999. If $\mu_0=(a+b)/2$, then after some simple algebra, $\lambda_0=(b-a)/\left(2\sigma\Phi^{-1}\left((1+\gamma)/2\right)\right)$.
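The elicitation step amounts to two lines of arithmetic. The sketch below assumes the prior $N(\mu_0,\lambda_0^2\sigma^2)$ of Section 3.1, sets $\mu_0$ to the mid-point of $(a,b)$, and takes $\lambda_0=(b-a)/(2\sigma\Phi^{-1}((1+\gamma)/2))$, which makes $\Pi(a<\mu<b)=\gamma$ exactly. The interval, $\sigma$, $\gamma$ and the function names are hypothetical.

```python
from scipy import stats


def elicit_known_sigma(a, b, sigma, gamma=0.999):
    """Hyperparameters (mu0, lambda0) for the N(mu0, lambda0^2 sigma^2) prior of Section 3.2.

    mu0 is the mid-point of (a, b); lambda0 makes Pi(a < mu < b) = gamma exactly.
    """
    mu0 = (a + b) / 2
    lam0 = (b - a) / (2 * sigma * stats.norm.ppf((1 + gamma) / 2))
    return mu0, lam0


def prior_mass(a, b, sigma, mu0, lam0):
    """Prior probability of (a, b) under N(mu0, lambda0^2 sigma^2)."""
    sd = lam0 * sigma
    return stats.norm.cdf(b, mu0, sd) - stats.norm.cdf(a, mu0, sd)


mu0, lam0 = elicit_known_sigma(a=-2.0, b=2.0, sigma=1.0)  # hypothetical interval and sigma
print(mu0, lam0, prior_mass(-2.0, 2.0, 1.0, mu0, lam0))
```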

### 3.3 Checking for Prior-Data Conflict

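As pointed out in Section 3.1, the minimal sufficient statistic for $\mu$ is $T(x)=\bar{x}$, and its prior predictive distribution is $N(\mu_0,\sigma^2(\lambda_0^2+1/n))$. Since this density decreases in $|t-\mu_0|$, (5) becomes

$$M_T\left(m_T(t)\le m_T(\bar{x})\right)=2\left[1-\Phi\left(\frac{|\bar{x}-\mu_0|}{\sigma\sqrt{\lambda_0^2+1/n}}\right)\right], \tag{14}$$

where $M_T$ is defined as in (5). Recall that a small value of (14) indicates a prior-data conflict, whereas a large value indicates no prior-data conflict. It is true that prior-data conflict can be avoided by increasing $\lambda_0$ (i.e., making the prior diffuse); however, as pointed out in Evans (2018), this is not an appropriate approach, as it induces bias into the analysis. Thus, by (14), we have a prior-data conflict when $\bar{x}$ lies in the tail of its prior predictive distribution. Note that (14) converges to 1 as $\lambda_0\to\infty$.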
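For the known-variance model, the check (5) has a closed form: because the prior predictive density of $\bar x$ is $N(\mu_0,\sigma^2(\lambda_0^2+1/n))$ and decreases in $|t-\mu_0|$, (5) equals $2[1-\Phi(|\bar x-\mu_0|/(\sigma\sqrt{\lambda_0^2+1/n}))]$. The sketch below computes this value and a Monte Carlo estimate obtained by simulating $\mu$ from the prior and then $\bar x$ given $\mu$; the inputs and function names are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)


def conflict_closed_form(xbar, n, sigma, mu0, lam0):
    """(5) for T(x) = xbar, whose prior predictive is N(mu0, sigma^2 (lam0^2 + 1/n))."""
    tau = sigma * np.sqrt(lam0**2 + 1 / n)
    # m_T(t) <= m_T(xbar) exactly when |t - mu0| >= |xbar - mu0|
    return 2 * stats.norm.sf(abs(xbar - mu0) / tau)


def conflict_monte_carlo(xbar, n, sigma, mu0, lam0, draws=200_000):
    """Same probability by simulating mu from the prior and then xbar given mu."""
    mu = rng.normal(mu0, lam0 * sigma, size=draws)
    xbar_sim = rng.normal(mu, sigma / np.sqrt(n))
    tau = sigma * np.sqrt(lam0**2 + 1 / n)
    return np.mean(stats.norm.pdf(xbar_sim, mu0, tau) <= stats.norm.pdf(xbar, mu0, tau))


# Hypothetical inputs: xbar = 1.2, n = 20, sigma = 1, mu0 = 0, lambda0 = 0.5.
print(conflict_closed_form(1.2, 20, 1.0, 0.0, 0.5), conflict_monte_carlo(1.2, 20, 1.0, 0.0, 0.5))
```

Increasing $\lambda_0$ pushes the closed-form value towards 1, which is the diffuse-prior behaviour described above.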

### 3.4 Checking for Bias

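The bias against the hypothesis $H_0:\mu=\mu_1$ is measured by computing (3) with $\psi_0=\mu_1$ and $RB$ as in (8). Note that, when the prior is centered at the hypothesized value (that is, $\mu_0=\mu_1$), there is never a strong bias against $H_0$. On the other hand, the bias in favor of $H_0$ is measured by computing (4) with $RB$ as defined in (8) and $\psi_0'$ a value of $\mu$ that differs from $\mu_1$ by the smallest difference of practical importance. The interpretation of the bias was covered in Section 2.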
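Both bias measures are easy to estimate by simulating $\bar x$ at a fixed value of $\mu$ and evaluating (8) at $\mu_1$. The sketch below does this. In addition, under the assumption $\mu_0=\mu_1$, (3) reduces to $P(\chi^2_1\ge(k+1)\log(1+k)/k)$ with $k=n\lambda_0^2$, which is always below $P(\chi^2_1\ge1)\approx0.317$ because $\log(1+k)>k/(1+k)$. The argument `mu_gen` (the value of $\mu$ used to generate data), the function names and all numerical settings are hypothetical.

```python
import math

import numpy as np

rng = np.random.default_rng(5)


def rb_mu1(mu1, xbar, n, sigma, mu0, lam0):
    """RB(mu1 | x) from (8), vectorized over simulated values of xbar."""
    k = n * lam0**2
    expo = -n / (2 * sigma**2) * ((xbar - mu1) ** 2 - (xbar - mu0) ** 2 / (k + 1))
    return np.sqrt(1 + k) * np.exp(expo)


def bias(mu_gen, mu1, n, sigma, mu0, lam0, against=True, draws=200_000):
    """Monte Carlo estimate of (3) (against=True, data generated at mu_gen = mu1)
    or of (4) (against=False, data generated at mu_gen = psi_0')."""
    xbar = rng.normal(mu_gen, sigma / np.sqrt(n), size=draws)  # T(x) = xbar given mu = mu_gen
    rb = rb_mu1(mu1, xbar, n, sigma, mu0, lam0)
    return np.mean(rb <= 1) if against else np.mean(rb >= 1)


def bias_against_centered(n, lam0):
    """Exact (3) when mu0 = mu1: P(chi2_1 >= (k + 1) log(1 + k) / k) with k = n lam0^2."""
    k = n * lam0**2
    c = (k + 1) / k * math.log1p(k)
    return math.erfc(math.sqrt(c / 2))  # P(|Z| >= sqrt(c))


# Hypothetical settings: n = 20, sigma = 1, mu0 = mu1 = 0, lambda0 = 1, psi_0' = 0.3.
print(bias(0.0, 0.0, 20, 1.0, 0.0, 1.0), bias_against_centered(20, 1.0))
print(bias(0.3, 0.0, 20, 1.0, 0.0, 1.0, against=False))
```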

### 3.5 The Algorithm

The approach involves a comparison between the concentrations of the prior and posterior distributions of the KL divergence via a relative belief ratio, with the interpretation discussed in Section 2. Since explicit forms of the densities of the distance are not available, the relative belief ratios need to be estimated via simulation. The following summarizes a computational algorithm for testing $H_0:\mu=\mu_1$.

Algorithm A (New Test)

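1. Elicit the hyperparameters $\mu_0$ and $\lambda_0$ as described in Section 3.2.

2. Generate $\mu$ from $N(\mu_0,\lambda_0^2\sigma^2)$.

3. Compute the KL distance between $N(\mu,\sigma^2)$ and $N(\mu_1,\sigma^2)$ as described in (11). Denote this distance by $D$.

4. Repeat steps 2 and 3 to obtain a large sample of values of $D$ from its prior distribution.

5. Generate $\mu$ from $N(\mu_x,\sigma_x^2)$, where $\mu_x$ and $\sigma_x^2$ are defined in (7).

6. Compute the KL distance between $N(\mu,\sigma^2)$ and $N(\mu_1,\sigma^2)$ as described in (12). This is a draw of $D$ from its posterior distribution.

7. Repeat steps 5 and 6 to obtain a large sample of values of $D$ from its posterior distribution.

8. Compute the relative belief ratio and the strength as follows:

   (a) Closed forms of the prior and posterior densities of $D$ are not available. Thus, the relative belief ratio and the strength need to be estimated via approximation. Let $M$ be a positive integer. Let $\hat F_D$ denote the empirical cdf of $D$ based on the prior sample in step 4 and, for $i=0,\ldots,M$, let $\hat d_{i/M}$ be the estimate of the $(i/M)$-th prior quantile of $D$. Here $\hat d_0=0$, and $\hat d_1$ is the largest value of $D$ in the prior sample. Let $\hat F_D(\cdot\mid x)$ denote the empirical cdf of $D$ based on the posterior sample in step 7. For $d\in[\hat d_{i/M},\hat d_{(i+1)/M})$, estimate $RB_D(d\mid x)=\pi_D(d\mid x)/\pi_D(d)$ by

   $$\widehat{RB}_D(d\mid x)=M\left\{\hat F_D(\hat d_{(i+1)/M}\mid x)-\hat F_D(\hat d_{i/M}\mid x)\right\}, \tag{15}$$

   the ratio of the estimates of the posterior and prior contents of $[\hat d_{i/M},\hat d_{(i+1)/M})$. Thus, we estimate $RB_D(0\mid x)=\pi_D(0\mid x)/\pi_D(0)$ by $\widehat{RB}_D(0\mid x)=M\hat F_D(\hat d_{i_0/M}\mid x)/i_0$, where $i_0$ and $M$ are chosen so that $i_0/M$ is not too small (typically $i_0/M\approx0.05$).

   (b) Estimate the strength $\Pi_D\left(RB_D(d\mid x)\le RB_D(0\mid x)\mid x\right)$ by the finite sum

   $$\sum_{\{i\ge i_0:\ \widehat{RB}_D(\hat d_{i/M}\mid x)\le\widehat{RB}_D(0\mid x)\}}\left(\hat F_D(\hat d_{(i+1)/M}\mid x)-\hat F_D(\hat d_{i/M}\mid x)\right). \tag{16}$$

   For fixed $M$, as the Monte Carlo sample sizes increase, $\hat d_{i/M}$ converges almost surely to the prior quantile $d_{i/M}$, and (15) and (16) converge almost surely to the relative belief ratio and strength computed for the intervals $[d_{i/M},d_{(i+1)/M})$.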
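A compact Python sketch of Algorithm A is given below, assuming $\mu_0$ and $\lambda_0$ have already been elicited. It follows (7), (11), (12), (15) and (16) literally, with hypothetical defaults $M=20$ and $i_0=1$ (so $i_0/M=0.05$) and Monte Carlo sample sizes of $10^5$; the function name `algorithm_a` is introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)


def algorithm_a(x, sigma, mu1, mu0, lam0, n_prior=100_000, n_post=100_000, M=20, i0=1):
    """Sketch of Algorithm A: returns estimates of RB_D(0 | x) and of its strength (16).

    The hyperparameters mu0 and lam0 are assumed to have been elicited already.
    """
    x = np.asarray(x, dtype=float)
    n, xbar = x.size, x.mean()
    # Prior sample of D = (mu - mu1)^2 / (2 sigma^2) with mu ~ N(mu0, lam0^2 sigma^2), as in (11)
    d_prior = (rng.normal(mu0, lam0 * sigma, size=n_prior) - mu1) ** 2 / (2 * sigma**2)
    # Posterior sample of D with mu ~ N(mu_x, sigma_x^2) from (7), as in (12)
    k = n * lam0**2
    mu_x, sd_x = (k * xbar + mu0) / (k + 1), lam0 * sigma / np.sqrt(k + 1)
    d_post = (rng.normal(mu_x, sd_x, size=n_post) - mu1) ** 2 / (2 * sigma**2)
    # Estimated prior quantiles d_{i/M}, i = 0..M, with d_0 = 0 and d_1 = largest prior value
    q = np.quantile(d_prior, np.arange(M + 1) / M)
    q[0] = 0.0
    # Empirical posterior cdf evaluated at each d_{i/M}
    F_post = np.searchsorted(np.sort(d_post), q, side="right") / n_post
    post_content = np.diff(F_post)   # posterior content of [d_{i/M}, d_{(i+1)/M})
    rb = M * post_content            # (15): each interval has prior content 1/M
    rb0 = M * F_post[i0] / i0        # RB_D(0 | x) estimated on [0, d_{i0/M})
    keep = rb[i0:] <= rb0
    strength = post_content[i0:][keep].sum()  # finite sum (16)
    return rb0, strength


# Hypothetical data with mean 0; H0: mu = 0, sigma = 1, mu0 = 0, lambda0 = 1.
print(algorithm_a(np.linspace(-1, 1, 50), sigma=1.0, mu1=0.0, mu0=0.0, lam0=1.0))
```

On these hypothetical data (sample mean 0 and $\mu_1=0$) the estimate of $RB_D(0\mid x)$ exceeds 1; shifting every observation by 1 drives it towards 0.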

The following proposition establishes the consistency of the approach as the sample size increases: the procedure performs correctly both when $H_0$ is true and when it is false. The proof follows immediately from Evans (2015), Section 4.7.1. See also Al-Labadi and Evans (2018) for a similar result.

###### Proposition 1

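Consider the discretization $\{[0,d_{i_0/M}),[d_{i_0/M},d_{(i_0+1)/M}),\ldots,[d_{(M-1)/M},\infty)\}$. As $n\to\infty$, (i) if $H_0$ is true, then

$$RB_D([0,d_{i_0/M})\mid x)\xrightarrow{a.s.}\frac{1}{\Pi_D([0,d_{i_0/M}))},\qquad RB_D([d_{i/M},d_{(i+1)/M})\mid x)\xrightarrow{a.s.}0\ \text{ whenever } i\ge i_0,\qquad \Pi_D\left(RB_D(d\mid x)\le RB_D(0\mid x)\mid x\right)\xrightarrow{a.s.}1,$$

and (ii) if $H_0$ is false and $c\ge d_{i_0/M}$, where $c$ is as in (13), then $RB_D([0,d_{i_0/M})\mid x)\xrightarrow{a.s.}0$ and $\Pi_D\left(RB_D(d\mid x)\le RB_D(0\mid x)\mid x\right)\xrightarrow{a.s.}0$.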

## 4 A Bayesian Alternative to the One-Sample t-Test

### 4.1 The Approach

In this section, we assume that $x_1,\ldots,x_n$ is an independent random sample from $N(\mu,\sigma^2)$, where $\sigma^2$ is unknown. The goal is to test $H_0:\mu=\mu_1$, where $\mu_1$ is a given real number. The first step in the approach is to construct priors on $\mu$ and $\sigma^2$. We will consider the following hierarchical conjugate prior (Evans 2015, p.171):

$$\frac{1}{\sigma^2}\sim\text{gamma}_{\text{rate}}(\alpha_0,\beta_0), \tag{17}$$

$$\mu\mid\sigma^2\sim N(\mu_0,\lambda_0^2\sigma^2), \tag{18}$$

where $\mu_0$, $\lambda_0$, $\alpha_0$ and $\beta_0$ are hyperparameters to be specified via elicitation, as described in Section 4.2. The posterior distribution of $(\mu,\sigma^2)$ is given by

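$$\frac{1}{\sigma^2}\,\Big|\,x_1,\ldots,x_n\sim\text{gamma}_{\text{rate}}\left(\alpha_0+\frac{n}{2},\beta_x\right), \tag{19}$$

$$\mu\mid\sigma^2,x_1,\ldots,x_n\sim N\left(\mu_x,\left(n+\frac{1}{\lambda_0^2}\right)^{-1}\sigma^2\right), \tag{20}$$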

where

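$$\mu_x=\left(n+\frac{1}{\lambda_0^2}\right)^{-1}\left(\frac{\mu_0}{\lambda_0^2}+n\bar{x}\right)\quad\text{and}\quad\beta_x=\beta_0+\frac{(n-1)s^2}{2}+\frac{n(\bar{x}-\mu_0)^2}{2(n\lambda_0^2+1)}, \tag{21}$$

with $s^2=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$. To find $RB(\mu\mid x)$, notice that the minimal sufficient statistic for $(\mu,\sigma^2)$ is $T(x)=(\bar{x},s^2)$, with $\bar{x}$ independent of $s^2$. The joint prior predictive density of $T(x)$ is given by (Evans, 2015):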

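$$m_T(T(x))=\frac{\Gamma\left(\frac{n}{2}+\alpha_0\right)}{\Gamma(\alpha_0)}\left(n+\frac{1}{\lambda_0^2}\right)^{-1/2}(2\pi)^{-n/2}\,\frac{\beta_0^{\alpha_0}}{\lambda_0}\,\beta_x^{-\frac{n}{2}-\alpha_0}, \tag{22}$$

where $\beta_x$ is defined in (21). On the other hand, it can be shown that

$$m_T(T(x)\mid\mu)=\frac{\Gamma\left(\frac{n}{2}+\alpha_0\right)}{\Gamma(\alpha_0)}(2\pi)^{-n/2}\beta_0^{\alpha_0}\left(\beta_0+\frac{n-1}{2}s^2+\frac{n}{2}(\bar{x}-\mu)^2\right)^{-\frac{n}{2}-\alpha_0}.$$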

Thus,

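$$\begin{aligned}
RB(\mu\mid x)&=\frac{m_T(T(x)\mid\mu)}{m_T(T(x))}\\
&=\lambda_0\left(n+\frac{1}{\lambda_0^2}\right)^{1/2}\frac{\left[\beta_0+\frac{n-1}{2}s^2+\frac{n}{2}(\bar{x}-\mu)^2\right]^{-\frac{n}{2}-\alpha_0}}{\beta_x^{-\frac{n}{2}-\alpha_0}}\\
&=\lambda_0\left(n+\frac{1}{\lambda_0^2}\right)^{1/2}\left[\frac{\beta_0+\frac{n-1}{2}s^2+\frac{n}{2}(\bar{x}-\mu)^2}{\beta_0+\frac{n-1}{2}s^2+\frac{n(\bar{x}-\mu_0)^2}{2(n\lambda_0^2+1)}}\right]^{-\frac{n}{2}-\alpha_0}.
\end{aligned}\tag{23}$$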

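For the strength, we have

$$\Pi\left(RB(\mu\mid x)\le RB(\mu_1\mid x)\mid x\right)=\Pi\left(\left[\frac{\beta_0+\frac{n-1}{2}s^2+\frac{n}{2}(\bar{x}-\mu)^2}{\beta_x}\right]^{-\frac{n}{2}-\alpha_0}\le\left[\frac{\beta_0+\frac{n-1}{2}s^2+\frac{n}{2}(\bar{x}-\mu_1)^2}{\beta_x}\right]^{-\frac{n}{2}-\alpha_0}\,\middle|\,x\right)=\Pi\left(|\mu-\bar{x}|\ge|\bar{x}-\mu_1|\,\middle|\,x\right),$$

where, under the posterior, $\sigma^2$ and $\mu$ are distributed as in (19) and (20), respectively. Since (20) coincides with the posterior (7) conditional on $\sigma^2$, after some algebra this probability equals (9), averaged over $\sigma^2$ drawn from the posterior (19).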

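As for the KL approach, we compute $d(P_{\text{prior}},Q)$ and $d(P_{\text{post}},Q)$ as given in (11) and (12), respectively, now with $(\mu,\sigma^2)$ drawn from the prior (17)-(18) and from the posterior (19)-(20), respectively. The approach compares the concentrations of the prior and posterior distributions of the KL divergence via the relative belief ratio.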
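The same KL comparison can be sketched for the unknown-variance case. Below, the hypothetical helper `posterior_hyper` evaluates (19)-(21), and `sample_kl` draws $(\mu,\sigma^2)$ from a normal-gamma distribution in the rate parametrization and returns $D=(\mu-\mu_1)^2/(2\sigma^2)$. The data and hyperparameters are hypothetical. The resulting prior and posterior samples of $D$ can then be fed into the discretization (15)-(16) exactly as in the known-variance case.

```python
import numpy as np

rng = np.random.default_rng(9)


def posterior_hyper(x, mu0, lam0, alpha0, beta0):
    """(mu_x, variance factor, shape, rate) of the posterior (19)-(21)."""
    x = np.asarray(x, dtype=float)
    n, xbar, s2 = x.size, x.mean(), x.var(ddof=1)
    mu_x = (mu0 / lam0**2 + n * xbar) / (n + 1 / lam0**2)
    beta_x = beta0 + (n - 1) * s2 / 2 + n * (xbar - mu0) ** 2 / (2 * (n * lam0**2 + 1))
    return mu_x, 1 / (n + 1 / lam0**2), alpha0 + n / 2, beta_x


def sample_kl(mu1, center, var_factor, alpha, beta, size):
    """Draw 1/sigma^2 ~ gamma_rate(alpha, beta) and mu | sigma^2 ~ N(center, var_factor sigma^2);
    return D = (mu - mu1)^2 / (2 sigma^2), the KL distance in (11)/(12)."""
    sigma2 = 1 / rng.gamma(shape=alpha, scale=1 / beta, size=size)  # scale = 1/rate
    mu = rng.normal(center, np.sqrt(var_factor * sigma2))
    return (mu - mu1) ** 2 / (2 * sigma2)


x = np.linspace(-1, 1, 40)                                # hypothetical data, mean 0
mu1, mu0, lam0, alpha0, beta0 = 0.0, 0.0, 1.0, 3.0, 3.0   # hypothetical hyperparameters
d_prior = sample_kl(mu1, mu0, lam0**2, alpha0, beta0, 100_000)                   # prior (17)-(18)
d_post = sample_kl(mu1, *posterior_hyper(x, mu0, lam0, alpha0, beta0), 100_000)  # posterior (19)-(20)
print(np.median(d_prior), np.median(d_post))
```

With data centred at $\mu_1$, the posterior sample of $D$ is much more concentrated near 0 than the prior sample.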

### 4.2 Elicitation of the prior

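To elicit the prior, we consider the approach developed by Evans (2015, p.171). Suppose that it is known with virtual certainty (probability 0.999) that $\mu\in(a,b)$ for specified values $a<b$. This interval is chosen to be as short as possible, based on knowledge of the basic measurements being taken, without being unrealistic. We set $\mu_0=(a+b)/2$ (i.e., the mid-point). With this choice, one hyperparameter has been specified. Since $\mu\mid\sigma^2\sim N(\mu_0,\lambda_0^2\sigma^2)$, it follows that

$$P(\mu\in(a,b))\ge0.999\implies P\left(\frac{a-\mu_0}{\lambda_0\sigma}<Z<\frac{b-\mu_0}{\lambda_0\sigma}\right)\ge0.999\implies 2\Phi\left(\frac{b-a}{2\lambda_0\sigma}\right)-1\ge0.999, \tag{24}$$

where $Z$ is a standard normal random variable.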

This implies that

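$$\Phi\left(\frac{b-a}{2\lambda_0\sigma}\right)\ge\frac{1.999}{2}=0.9995\implies\frac{b-a}{2\lambda_0\sigma}\ge\Phi^{-1}(0.9995)\implies\sigma\le\frac{b-a}{2\lambda_0\Phi^{-1}(0.9995)}\implies\sigma^2\le\left(\frac{b-a}{2}\right)^2\left[\Phi^{-1}(0.9995)\right]^{-2}\lambda_0^{-2}. \tag{25}$$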

An interval that contains virtually all the actual data measurements is given by $\mu\pm\sigma\Phi^{-1}(0.9995)$. Since this interval cannot be unrealistically short or long, we let $s_1$ and $s_2$ be lower and upper bounds, respectively, on the half-length of the interval, so that

$$s_1\le\sigma\Phi^{-1}(0.9995)\le s_2.$$

That is,

$$\frac{s_1}{\Phi^{-1}(0.9995)}\le\sigma\le\frac{s_2}{\Phi^{-1}(0.9995)}. \tag{26}$$

Now, from (25) and (26), we have:

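$$\left(\frac{b-a}{2}\right)^2\left[\Phi^{-1}(0.9995)\right]^{-2}\lambda_0^{-2}=\left(\frac{s_2}{\Phi^{-1}(0.9995)}\right)^2\implies\lambda_0^2=\left(\frac{b-a}{2}\right)^2s_2^{-2},$$

which determines the conditional prior for $\mu$. Note that $\lambda_0$ can be made bigger by choosing a wider interval $(a,b)$.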

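Lastly, to obtain relevant values of $\alpha_0$ and $\beta_0$, let $G(\alpha_0,\beta_0,\cdot)$ denote the cdf of the $\text{gamma}_{\text{rate}}(\alpha_0,\beta_0)$ distribution. From (26),

$$\frac{\left[\Phi^{-1}(0.9995)\right]^2}{s_2^2}\le\frac{1}{\sigma^2}\le\frac{\left[\Phi^{-1}(0.9995)\right]^2}{s_1^2}. \tag{27}$$

Now, suppose we want to choose $(\alpha_0,\beta_0)$ so that the interval in (27) contains $1/\sigma^2$ with virtual certainty. Thus,

$$G^{-1}(\alpha_0,\beta_0,0.9995)=s_1^{-2}\left[\Phi^{-1}(0.9995)\right]^2, \tag{28}$$

$$G^{-1}(\alpha_0,\beta_0,0.0005)=s_2^{-2}\left[\Phi^{-1}(0.9995)\right]^2. \tag{29}$$

Then we numerically solve (28) and (29) for $(\alpha_0,\beta_0)$.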
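Equations (28) and (29) can be solved with a one-dimensional root search. Assuming the rate parametrization, $G^{-1}(\alpha_0,\beta_0,q)$ equals the standard gamma quantile with shape $\alpha_0$ divided by $\beta_0$, so the ratio of (28) to (29) involves $\alpha_0$ only; $\beta_0$ then follows from (28). The sketch below uses SciPy's `brentq`; the inputs $a,b,s_1,s_2$, the search bracket and the function name are hypothetical choices.

```python
import numpy as np
from scipy import optimize, stats


def elicit_t_prior(a, b, s1, s2, p=0.9995):
    """Return (mu0, lambda0, alpha0, beta0) following the steps of Section 4.2 (requires s1 < s2)."""
    z = stats.norm.ppf(p)                      # Phi^{-1}(0.9995)
    mu0 = (a + b) / 2                          # mid-point of (a, b)
    lam0 = (b - a) / (2 * s2)                  # lambda0^2 = ((b - a)/2)^2 s2^{-2}
    upper, lower = z**2 / s1**2, z**2 / s2**2  # right-hand sides of (28) and (29)

    # With the rate parametrization, G^{-1}(alpha, beta, q) = gamma.ppf(q, alpha) / beta,
    # so the ratio of the two quantiles depends on alpha only.
    def log_ratio_gap(alpha):
        return (np.log(stats.gamma.ppf(p, alpha)) - np.log(stats.gamma.ppf(1 - p, alpha))
                - np.log(upper / lower))

    alpha0 = optimize.brentq(log_ratio_gap, 0.5, 1e5)  # bracket assumes s2/s1 is moderate
    beta0 = stats.gamma.ppf(p, alpha0) / upper
    return mu0, lam0, alpha0, beta0


mu0, lam0, alpha0, beta0 = elicit_t_prior(a=-10.0, b=10.0, s1=1.0, s2=3.0)  # hypothetical inputs
print(mu0, lam0, alpha0, beta0)
```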

### 4.3 Checking for Prior-data Conflict

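To check the prior for prior-data conflict, we compute

$$M_T\left(m_T(\bar{x},s^2)\le m_T(\bar{x}_0,s_0^2)\right), \tag{30}$$

where $M_T$ and $m_T$ are as defined in Section 4.1, and $(\bar{x}_0,s_0^2)$ denotes the observed value of $(\bar{x},s^2)$. In general, (30) is computed by simulation. For specified values of $(\mu_0,\lambda_0,\alpha_0,\beta_0)$, we generate $(\mu,\sigma^2)$ as given in (17) and (18). Then we generate $(\bar{x},s^2)$ from its joint distribution given $(\mu,\sigma^2)$ and evaluate $m_T(\bar{x},s^2)$ using (22). Repeating this many times and recording the proportion of values of $m_T(\bar{x},s^2)$ that are less than or equal to $m_T(\bar{x}_0,s_0^2)$ gives a Monte Carlo estimate of (30).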
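The simulation just described can be sketched as follows. It uses the standard facts that, given $(\mu,\sigma^2)$, $\bar x\sim N(\mu,\sigma^2/n)$ independently of $(n-1)s^2/\sigma^2\sim\chi^2_{n-1}$, and it compares log-densities from (22) instead of densities, which is equivalent because the logarithm is increasing. Constant factors in (22) do not affect the comparison, since the data enter only through $\beta_x$. All numerical settings and function names are hypothetical.

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(4)


def log_mT(xbar, s2, n, mu0, lam0, alpha0, beta0):
    """log of the prior predictive density (22); the data enter only through beta_x in (21)."""
    beta_x = beta0 + (n - 1) * s2 / 2 + n * (xbar - mu0) ** 2 / (2 * (n * lam0**2 + 1))
    return (gammaln(n / 2 + alpha0) - gammaln(alpha0) - 0.5 * np.log(n + 1 / lam0**2)
            - n / 2 * np.log(2 * np.pi) + alpha0 * np.log(beta0) - np.log(lam0)
            - (n / 2 + alpha0) * np.log(beta_x))


def prior_data_conflict(xbar0, s20, n, mu0, lam0, alpha0, beta0, draws=100_000):
    """Monte Carlo estimate of (30) for the observed (xbar0, s20)."""
    sigma2 = 1 / rng.gamma(alpha0, 1 / beta0, size=draws)    # (17): 1/sigma^2 ~ gamma_rate
    mu = rng.normal(mu0, lam0 * np.sqrt(sigma2))              # (18)
    xbar = rng.normal(mu, np.sqrt(sigma2 / n))                # xbar | mu, sigma^2
    s2 = sigma2 * rng.chisquare(n - 1, size=draws) / (n - 1)  # (n - 1) s^2 / sigma^2 ~ chi^2_{n-1}
    observed = log_mT(xbar0, s20, n, mu0, lam0, alpha0, beta0)
    return np.mean(log_mT(xbar, s2, n, mu0, lam0, alpha0, beta0) <= observed)


# Hypothetical settings: n = 20, mu0 = 0, lambda0 = 1, alpha0 = beta0 = 3.
print(prior_data_conflict(0.0, 1.0, 20, 0.0, 1.0, 3.0, 3.0))
print(prior_data_conflict(50.0, 1.0, 20, 0.0, 1.0, 3.0, 3.0))
```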

### 4.4 Checking for Bias

As in Section 3.4, the bias against the hypothesis $H_0:\mu=\mu_1$ is measured by computing (3) with $\psi_0=\mu_1$ and $RB$ as given in (23). On the other hand, the bias in favor of $H_0$ is measured by computing (4) with $RB$ as defined in (23) and $\psi_0'$ a value of $\mu$ that differs from $\mu_1$ by the smallest difference of practical importance. The interpretation of the bias was given in Section 2.

### 4.5 The Algorithm

The following algorithm outlines the KL approach described in Section 4 to test