# The Two-Sample Problem Via Relative Belief Ratio

This paper deals with a new Bayesian approach to the two-sample problem. More specifically, let x = (x_1,…,x_{n_1}) and y = (y_1,…,y_{n_2}) be two independent samples coming from unknown distributions F and G, respectively. The goal is to test the null hypothesis H_0: F = G against all possible alternatives. First, a Dirichlet process prior for F and G is considered. Then the change in their Cramér-von Mises distance from a priori to a posteriori is assessed through the relative belief ratio. Several theoretical properties of the procedure are developed and several examples are discussed, in which the proposed approach shows excellent performance.


## 1 Introduction

For two independent samples, the two-sample problem is concerned with determining whether the samples are generated from the same population. Although it is considered an old problem in statistics, it continues to attract the attention of researchers due to its applications in different fields. For instance, in medical studies, one may want to assess the efficacy of a new drug administered to two groups of patients.

The two-sample problem can be stated formally as follows. Given two independent samples x = (x_1,…,x_{n_1}) and y = (y_1,…,y_{n_2}), with F and G being unknown continuous cumulative distribution functions (cdf's), the aim is to test the null hypothesis

H_0: F = G

against all other alternatives.

The methodology developed in this paper is Bayesian and is inspired by the recent work of Al-Labadi and Evans (2018) on model checking. First, two Dirichlet processes DP(a_1, H_1) and DP(a_2, H_2) are considered as priors for F and G, respectively. Then the concentration of the posterior distribution of the distance between the two processes is compared to the concentration of its prior distribution. If the posterior is more concentrated about 0 than the prior, this is evidence in favour of H_0, and if the posterior is less concentrated, this is evidence against H_0. This comparison is made through a particular measure of evidence known as the relative belief ratio, which indicates whether there is evidence for or against H_0. Moreover, a calibration of this evidence is provided, indicating whether the evidence for or against the hypothesis is strong or weak. The proposed methodology is simple, general and does not require a closed form of the relative belief ratio. More details about relative belief ratios are given in Section 2 of this paper.

Developing procedures for hypothesis testing has recently received considerable attention in the Bayesian nonparametric literature. A main stream of these procedures has focused on embedding the suggested model as a null hypothesis in a larger family of distributions. Then priors are placed on the null and the alternative and a Bayes factor is computed. For instance, Florens, Richard, and Rolin (1996) used a Dirichlet process for the prior on the alternative. Carota and Parmigiani (1996), Verdinelli and Wasserman (1998), Berger and Guglielmi (2001) and McVinish, Rousseau, and Mengersen (2009) considered a mixture of Dirichlet processes, a mixture of Gaussian processes, a mixture of Pólya trees and a mixture of triangular distributions, respectively, for the prior on the alternative. Another approach to model testing is based on placing a prior on the true distribution generating the data and measuring the distance between the posterior distribution and the proposed one. Swartz (1999) and Al-Labadi and Zarepour (2013, 2014a) considered the Dirichlet process prior and used the Kolmogorov distance to derive goodness-of-fit tests for continuous models. Viele (2000) used the Dirichlet process and the Kullback-Leibler distance to test discrete models. Hsieh (2011) used the Pólya tree prior and the Kullback-Leibler distance to test continuous distributions. The work described above focuses only on goodness-of-fit tests and model checking. With regard to the two-sample problem, the literature is scarce and scattered. Notable exceptions include the work of Holmes, Caron, Griffin, and Stephens (2015), who developed a way to compute the Bayes factor for testing the null hypothesis through the marginal likelihood of the data with Pólya tree priors centered either subjectively or using an empirical procedure. Under the null hypothesis, they modeled the two samples as coming from a single random measure distributed as a Pólya tree, whereas under the alternative hypothesis the two samples come from two separate Pólya tree random measures. Ma and Wong (2011) allowed the two distributions to be generated jointly through optional coupling of a Pólya tree prior. Borgwardt and Ghahramani (2009) discussed two-sample tests based on Dirichlet process mixture models and derived a formula to compute the Bayes factor in this case. An extension of the Bayes factor approach based on Pólya tree priors to cover censored and multivariate data was proposed by Chen and Hanson (2014). Huang and Ghosh (2014) considered two-sample hypothesis testing problems under Pólya tree priors and Lehmann alternatives. Shang and Reilly (2017) introduced a class of tests that use the connection between the Dirichlet process prior and the Wilcoxon rank sum test. They also extended their idea using the Dirichlet process mixture prior and developed a Bayesian counterpart to the Wilcoxon rank sum statistic and the weighted log rank statistic for right- and interval-censored data. In recent work, Al-Labadi and Zarepour (2017) proposed a method based on the Kolmogorov distance and samples from the Dirichlet process to assess the equality of two unknown distributions, where the distance between two posterior Dirichlet processes is compared with a reference distance. The parameters of the two Dirichlet processes are chosen so that any discrepancy between the posterior distance and the reference distance is attributed only to the difference between the two samples.

In Section 3, the Dirichlet process prior is briefly reviewed. In Section 4, the Cramér-von Mises distance between two Dirichlet processes is considered and several of its theoretical properties are developed. Section 5 addresses the choice of the parameters of the two Dirichlet processes. In Section 6, a computational algorithm for the approach is developed. Section 7 presents several examples in which the behaviour of the approach is inspected. Finally, some concluding remarks are made in Section 8. The proofs are placed in the Appendix.

## 2 Relative Belief Ratios

In this section, for the reader's convenience, some background on relative belief ratios is provided. For more details about this topic consult, for example, Evans (2015). Let {f_θ : θ ∈ Θ} be a collection of densities on a sample space 𝒳 and let π be a prior on Θ. The posterior distribution of θ given the data x is denoted by π(·|x). For an arbitrary parameter of interest ψ = Ψ(θ), the prior and posterior densities of ψ are denoted by π_Ψ and π_Ψ(·|x), respectively. The relative belief ratio for a value ψ is then defined by RB_Ψ(ψ|x) = lim_{δ→0} Π_Ψ(N_δ(ψ)|x)/Π_Ψ(N_δ(ψ)), where N_δ(ψ) is a sequence of neighbourhoods of ψ converging nicely (see, for example, Rudin (1974)) to {ψ} as δ → 0. Quite generally,

 RB_Ψ(ψ|x) = π_Ψ(ψ|x)/π_Ψ(ψ), (1)

the ratio of the posterior density to the prior density at ψ. That is, RB_Ψ(ψ|x) measures how beliefs that ψ is the true value have changed from a priori to a posteriori. Note that a relative belief ratio is similar to a Bayes factor, as both are measures of evidence, but the latter measures evidence via the change in an odds ratio. The relationship between relative belief ratios and Bayes factors is detailed in Baskurt and Evans (2013). In particular, when a Bayes factor is defined via a limit in the continuous case, the limiting value is the corresponding relative belief ratio.

By a basic principle of evidence, RB_Ψ(ψ|x) > 1 means that the data led to an increase in the probability that ψ is the true value, and so there is evidence in favour of ψ, while RB_Ψ(ψ|x) < 1 means that the data led to a decrease in the probability that ψ is the true value, and so there is evidence against ψ. Clearly, when RB_Ψ(ψ|x) = 1, there is no evidence either way.

Thus, the value RB_Ψ(ψ_0|x) measures the evidence for the hypothesis H_0 = {ψ_0}. It is also important to calibrate whether this is strong or weak evidence for or against H_0. As suggested in Evans (2015), a useful calibration of RB_Ψ(ψ_0|x) is obtained by computing the tail probability

 Π_Ψ(RB_Ψ(ψ|x) ≤ RB_Ψ(ψ_0|x) | x). (2)

One way to view (2) is as the posterior probability that the true value of ψ has a relative belief ratio no greater than that of the hypothesized value ψ_0. When RB_Ψ(ψ_0|x) < 1, so there is evidence against ψ_0, a small value for (2) indicates a large posterior probability that the true value has a relative belief ratio greater than RB_Ψ(ψ_0|x), and so there is strong evidence against ψ_0. When RB_Ψ(ψ_0|x) > 1, so there is evidence in favour of ψ_0, a large value for (2) indicates a small posterior probability that the true value has a relative belief ratio greater than RB_Ψ(ψ_0|x), and so there is strong evidence in favour of ψ_0, while a small value of (2) indicates only weak evidence in favour of ψ_0.
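To make the mechanics concrete, the following toy sketch (not from the paper; the model, numbers and names are purely illustrative) estimates RB_Ψ(ψ_0|x) and the tail probability (2) by simulation for a normal-mean parameter, using the same discretization idea that reappears in Section 6.

```python
import numpy as np

# Toy illustration: N(mu, 1) data, N(0, 1) prior on mu, hypothesis mu = 0.
rng = np.random.default_rng(0)
x = rng.normal(0.3, 1.0, 25)                    # observed data
n, psi0 = len(x), 0.0

prior = rng.normal(0.0, 1.0, 200_000)           # prior draws of mu
v = 1.0 / (1.0 + n)                             # conjugate posterior variance
post = rng.normal(v * x.sum(), np.sqrt(v), 200_000)

bins = np.linspace(-3.0, 3.0, 61)               # discretize the range of mu
p_prior = np.histogram(prior, bins)[0] / prior.size
p_post = np.histogram(post, bins)[0] / post.size
rb = np.divide(p_post, p_prior,                 # RB per bin, cf. (1)
               out=np.zeros_like(p_post, dtype=float), where=p_prior > 0)

i0 = np.searchsorted(bins, psi0, side="right") - 1   # bin containing psi0
rb0 = rb[i0]                                    # evidence concerning psi0
strength = p_post[rb <= rb0].sum()              # tail probability (2)
print(rb0, strength)
```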

## 3 The Dirichlet Process

In this section, a concise summary of the Dirichlet process is given. Because of its attractive features, the Dirichlet process, formally introduced in Ferguson (1973), is the most well-known and widely used prior in Bayesian nonparametric inference. Consider a space 𝔛 with a σ-algebra 𝒜 of subsets of 𝔛. Let H be a fixed probability measure on (𝔛, 𝒜), called the base measure, and let a be a positive number, called the concentration parameter. Following Ferguson (1973), a random probability measure P is called a Dirichlet process on (𝔛, 𝒜) with parameters a and H, denoted by P ~ DP(a, H), if for any finite measurable partition {A_1,…,A_k} of 𝔛 with k ≥ 2, (P(A_1),…,P(A_k)) ~ Dirichlet(aH(A_1),…,aH(A_k)). It is assumed that if H(A_j) = 0, then P(A_j) = 0 with probability one. Note that, for any A ∈ 𝒜, E(P(A)) = H(A) and Var(P(A)) = H(A)(1 − H(A))/(1 + a). Thus, H can be viewed as the center of the process. On the other hand, a controls the concentration: the larger the value of a, the more likely it is that P will be close to H. We refer the reader to Al-Labadi and Abdelrazeq (2017) for additional interesting asymptotic properties of the Dirichlet process and other nonparametric priors.

A distinctive feature of the Dirichlet process, among many other nonparametric priors, is its conjugacy property. Specifically, if x = (x_1,…,x_n) is a sample from P ~ DP(a, H), then the posterior distribution of P given x is DP(a + n, H_x), where

 H_x = a(a+n)^{-1} H + n(a+n)^{-1} F_n, (3)

with F_n = n^{-1} ∑_{i=1}^n δ_{x_i} and δ_{x_i} the Dirac measure at x_i. Notice that H_x is a convex combination of the prior base distribution H and the empirical distribution F_n. Clearly, H_x → H as a → ∞, while H_x → F_n as a → 0.
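Since H_x in (3) is a two-component mixture, a draw from it can be simulated by drawing from H with probability a/(a+n) and otherwise resampling a data point uniformly. A minimal Python sketch (function names are hypothetical):

```python
import numpy as np

def sample_Hx(m, data, a, base_sampler, rng):
    """Draw m values from H_x = a/(a+n) * H + n/(a+n) * F_n in (3)."""
    n = len(data)
    from_H = rng.random(m) < a / (a + n)        # choose a component per draw
    return np.where(from_H, base_sampler(m, rng), rng.choice(data, m))

rng = np.random.default_rng(0)
x = rng.normal(size=20)                          # toy data
draws = sample_Hx(1000, x, a=1.0,
                  base_sampler=lambda m, r: r.normal(size=m), rng=rng)
```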

Following Ferguson (1973), P ~ DP(a, H) has the following series representation

 P = ∑_{i=1}^∞ J_i δ_{Y_i}, (4)

where J_i = L^{-1}(Γ_i)/∑_{j=1}^∞ L^{-1}(Γ_j) with L(x) = a ∫_x^∞ t^{-1} e^{-t} dt, Γ_i = E_1 + ⋯ + E_i with (E_i)_{i≥1} a sequence of i.i.d. exponential(1) random variables independent of (Y_i)_{i≥1}, and Y_i i.i.d. ~ H. It follows clearly from (4) that a realization of the Dirichlet process is a discrete probability measure. This is true even when the base measure is absolutely continuous. The discreteness of P is comparable to the discreteness of the empirical distribution F_n. Note that, since data are always measured to finite accuracy, the true distribution being sampled from is discrete, so the discreteness of P is no practically significant limitation. Indeed, under the weak topology, the support of the Dirichlet process is quite large. Specifically, the support of the Dirichlet process is the set of all probability measures whose support is contained in the support of the base measure. This means that if the support of the base measure is 𝔛, then the space of all probability measures on 𝔛 is the support of the Dirichlet process. In particular, with a normal base measure, the Dirichlet process can choose any probability measure on ℝ.

Zarepour and Al-Labadi (2012) derived the following finite series approximation with monotonically decreasing weights for the Dirichlet process:

 P_N = ∑_{i=1}^N J_{i,N} δ_{Y_i}, (5)

where J_{i,N} = G_N^{-1}(Γ_i/Γ_{N+1}) / ∑_{j=1}^N G_N^{-1}(Γ_j/Γ_{N+1}), the Γ_i and Y_i are as defined in (4), and G_N is the co-cdf (survival function) of the gamma(a/N, 1) distribution. They proved that, as N → ∞, (5) converges almost surely to (4). Note that G_N^{-1}(Γ_i/Γ_{N+1}) is the (1 − Γ_i/Γ_{N+1})-th quantile of the gamma(a/N, 1) distribution. This provides the following algorithm.

Algorithm A: Approximately generating a value from DP(a, H)

1. Fix a relatively large positive integer N.

2. For i = 1,…,N, generate Y_i i.i.d. from H.

3. Independent of (Y_i), for i = 1,…,N+1, generate E_i i.i.d. exponential(1) and put Γ_i = E_1 + ⋯ + E_i.

4. For i = 1,…,N, compute J_{i,N} = G_N^{-1}(Γ_i/Γ_{N+1}) / ∑_{j=1}^N G_N^{-1}(Γ_j/Γ_{N+1}).

5. Use P_N in (5) to obtain an approximate value from DP(a, H).
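A direct transcription of Algorithm A in Python might look as follows (a sketch with hypothetical names; SciPy's inverse survival function plays the role of G_N^{-1}):

```python
import numpy as np
from scipy.stats import gamma

def sample_dp(a, base_sampler, N=500, rng=None):
    """Approximate one draw from DP(a, H) via Algorithm A; returns the
    atoms Y_1,...,Y_N and the (monotonically decreasing) weights J_{i,N}."""
    rng = np.random.default_rng(rng)
    Y = base_sampler(N, rng)                      # step 2: Y_i i.i.d. from H
    Gam = np.cumsum(rng.exponential(1.0, N + 1))  # step 3: Gamma_i
    J = gamma.isf(Gam[:N] / Gam[N], a / N)        # step 4: G_N^{-1}(.)
    return Y, J / J.sum()                         # normalized weights for (5)

# usage: one approximate draw from DP(1, N(0, 1))
Y, J = sample_dp(1.0, lambda n, rng: rng.normal(size=n), N=500, rng=1)
```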

For other simulation methods for the Dirichlet process, see, for instance, Bondesson (1982), Sethuraman (1994), Wolpert and Ickstadt (1998) and Al-Labadi and Zarepour (2014b).

Throughout the paper, notation such as P may refer to either a probability measure or its corresponding cdf, with the context determining the appropriate interpretation. That is, P(t) = P((−∞, t]) for all t ∈ ℝ.

## 4 Cramér-von Mises Distance

A well-known and widely used distance between two distributions is the Cramér-von Mises distance. For cdf's F and G, it is defined as

 d_CvM(F, G) = ∫_{−∞}^{∞} (F(x) − G(x))² G(dx).
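Since draws from (5) are discrete, d_CvM(P, Q) between two such draws reduces to a weighted sum over the atoms of Q. A sketch in Python (names hypothetical), compatible with sample_dp above:

```python
import numpy as np

def discrete_cdf(atoms, weights, t):
    """P((-inf, t]) evaluated at each point of t for a discrete measure."""
    order = np.argsort(atoms)
    cum = np.cumsum(weights[order])
    idx = np.searchsorted(atoms[order], t, side="right")
    return np.where(idx > 0, cum[np.clip(idx - 1, 0, None)], 0.0)

def cvm_distance(Y1, J1, Y2, J2):
    """d_CvM(P, Q) = sum over atoms t of Q of (P(t) - Q(t))^2 * Q({t})."""
    t = Y2                                        # atoms of Q carry its mass
    return np.sum((discrete_cdf(Y1, J1, t) - discrete_cdf(Y2, J2, t)) ** 2 * J2)
```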

The next lemma demonstrates that, as the sample sizes get large, the Cramér-von Mises distance between posterior Dirichlet processes converges to the Cramér-von Mises distance between the true distributions generating the data.

###### Lemma 1

Let x = (x_1,…,x_{n_1}) and y = (y_1,…,y_{n_2}) be two independent samples from F and G, with F and G being continuous cdf's. Let P ~ DP(a_1, H_1) and Q ~ DP(a_2, H_2), and let P_x ~ DP(a_1 + n_1, H_x) and Q_y ~ DP(a_2 + n_2, H_y) denote the corresponding posterior processes, with H_x and H_y as in (3). Then, as n_1, n_2 → ∞, d_CvM(P_x, Q_y) → d_CvM(F, G) almost surely.

The next lemma shows that the posterior distribution of d_CvM(P_x, Q_y) becomes concentrated around 0 as the sample sizes increase if and only if H_0 holds. The proof follows straightforwardly from Lemma 1.

###### Lemma 2

Let x, y, P_x and Q_y be as in Lemma 1, with F and G being continuous cdf's. As n_1, n_2 → ∞, (i) if H_0 is true, then d_CvM(P_x, Q_y) → 0 almost surely, and (ii) if H_0 is false, then d_CvM(P_x, Q_y) → d_CvM(F, G) > 0 almost surely.

The following result allows the use of the approximation (5) when considering the prior and posterior distributions of the Cramér-von Mises distance.

###### Lemma 3

Let P ~ DP(a_1, H_1) and Q ~ DP(a_2, H_2). Let P_N and Q_N be approximations of P and Q, respectively, as defined in (5). Then, as N → ∞, d_CvM(P_N, Q_N) → d_CvM(P, Q) almost surely.

The next lemma demonstrates that the distribution of the distance between two Dirichlet processes sharing a common continuous base measure does not depend on that base measure. This result will play a key role in the proposed approach.

###### Lemma 4

Let P ~ DP(a_1, H_1) and Q ~ DP(a_2, H_2), where H_1 and H_2 are continuous. If H_1 = H_2, then the distribution of d_CvM(P, Q) does not depend on H_1 and H_2.
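Lemma 4 can be checked empirically: with a_1 = a_2 fixed and a common continuous base measure, the simulated prior distribution of d_CvM(P, Q) should look the same whether the base measure is normal or exponential. A sketch reusing sample_dp and cvm_distance from above (all settings illustrative):

```python
import numpy as np

def prior_distances(base_sampler, a=1.0, reps=500, N=300, seed=0):
    """Simulate prior draws of d_CvM(P, Q) with P, Q ~ DP(a, H)."""
    rng = np.random.default_rng(seed)
    out = np.empty(reps)
    for k in range(reps):
        YP, JP = sample_dp(a, base_sampler, N=N, rng=rng)
        YQ, JQ = sample_dp(a, base_sampler, N=N, rng=rng)
        out[k] = cvm_distance(YP, JP, YQ, JQ)
    return out

d_norm = prior_distances(lambda n, r: r.normal(size=n))
d_expo = prior_distances(lambda n, r: r.exponential(size=n))
print(np.quantile(d_norm, [0.25, 0.5, 0.75]))   # the two quantile vectors
print(np.quantile(d_expo, [0.25, 0.5, 0.75]))   # should nearly coincide
```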

## 5 The Approach

Let x = (x_1,…,x_{n_1}) and y = (y_1,…,y_{n_2}) be independent samples with F and G being unknown continuous cdf's. The goal is to test the null hypothesis H_0: F = G. To this end, we use the priors P ~ DP(a_1, H_1) and Q ~ DP(a_2, H_2), so that, by (3), P_x ~ DP(a_1 + n_1, H_x) and Q_y ~ DP(a_2 + n_2, H_y). From Lemma 1, d_CvM(P_x, Q_y) almost surely approximates d_CvM(F, G). Thus, if H_0 is true, the posterior distribution of the distance D = d_CvM(P, Q) should be more concentrated about 0 than the prior distribution of D. For example, in Figure 1-a (see Example 1), since H_0 is true, the plot of the posterior density of D is much more concentrated about 0 than the plot of the prior density of D. So, the proposed test amounts to a comparison of the concentrations of the prior and posterior distributions of D via a relative belief ratio based at D = 0, with the interpretation as discussed in Section 2.

The success of the approach depends significantly on a suitable selection of the parameters of DP(a_1, H_1) and DP(a_2, H_2). As illustrated below, inappropriate values of the parameters can lead to a failure in computing the relative belief ratio. We first discuss setting the values of H_1 and H_2. By Lemma 4, the distribution of d_CvM(P, Q) does not depend on the choice of the base measures when H_1 = H_2, where both need to be continuous. Thus, we suggest setting H_1 = H_2, although other choices of continuous distributions are certainly possible. An additional and important reason supporting this choice is to avoid prior-data conflict (Evans and Moshonov, 2006; Al-Labadi and Evans, 2017). Prior-data conflict means that there is only a tiny overlap between the effective support regions of the priors and the data. In this context, the existence of prior-data conflict can lead to a failure in computing the distribution of D about 0. To avoid prior-data conflict, it is necessary that H_1 and H_2 share the effective support of the data (note that H_x and H_y carry the support of x and y, respectively), which can be secured by basing H_1 = H_2 on the data, for instance a normal distribution whose mean and standard deviation are estimated from the samples. The effect of prior-data conflict is demonstrated in Section 7, Table 2.

The selection of a_1 and a_2 is also important, and it is possible to consider several values of them. In general, suitable values of a_1 and a_2 depend on n_1 and n_2, respectively. As indicated in Al-Labadi and Zarepour (2017), a_i should be chosen small relative to n_i, as otherwise the prior may become too influential. Holmes et al. (2015) recommend using values between 1 and 10 and checking the sensitivity of the results to the chosen values. The following algorithm outlines a procedure for selecting the concentration parameters.

Algorithm B: Selection of concentration parameters

1. Start by setting a_1 and a_2 to small values and compute the relative belief ratio and its strength. Algorithm C in the next section addresses these computations.

2. Consider more concentrated priors by setting larger values of a_1 and a_2.

3. Compute the corresponding relative belief ratio. There are two scenarios:

1. If the value of the relative belief ratio in step 1 is less (greater) than 1 and the new value is less (greater) than 1, then there is evidence against (in favour of) H_0.

2. If the value of the relative belief ratio in step 1 is greater than 1 and the new value is greater (less) than 1, then this is evidence against (in favour of) H_0.

Algorithm B is further explored in Table 1 of Section 7. In most cases, small values of a_1 and a_2 are found to be adequate.

## 6 Computations

Closed forms of the prior and posterior densities of D = d_CvM(P, Q) are typically not available. Thus, the relative belief ratio needs to be approximated via simulation. The following algorithm, a revised version of Algorithm B of Al-Labadi and Evans (2018), gives a computational procedure for testing H_0.

Algorithm C: Relative belief algorithm for the two-sample problem

1. Use Algorithm A to (approximately) generate a value of P from DP(a_1, H_1) and a value of Q from DP(a_2, H_2).

2. Compute d_CvM(P, Q).

3. Repeat steps (1)-(2) to obtain a (large) sample of values from the prior distribution of D.

4. Use Algorithm A to (approximately) generate a value of P from DP(a_1 + n_1, H_x) and a value of Q from DP(a_2 + n_2, H_y).

5. Compute d_CvM(P, Q).

6. Repeat steps (4)-(5) to obtain a (large) sample of values from the posterior distribution of D.

7. Let M be a positive number. Let F̂_D denote the empirical cdf of D based on the prior sample in step 3 and, for i = 0,…,M, let d̂_{i/M} be the estimate of d_{i/M}, the (i/M)-th prior quantile of D. Here d̂_0 = 0 and d̂_1 is the largest value of D in the prior sample. Let F̂_D(·|x, y) denote the empirical cdf of D based on the posterior sample in step 6. For d ∈ [d̂_{i/M}, d̂_{(i+1)/M}), estimate RB_D(d|x, y) by

 RB̂_D(d|x, y) = M{F̂_D(d̂_{(i+1)/M}|x, y) − F̂_D(d̂_{i/M}|x, y)}, (6)

the ratio of the estimates of the posterior and prior contents of [d̂_{i/M}, d̂_{(i+1)/M}). It follows that we estimate RB_D(0|x, y) by RB̂_D(d̂_{i_0/M}|x, y), where i_0 is chosen so that i_0/M is not too small (typically i_0/M ≈ 0.05).

8. Estimate the strength DP_D(RB_D(d|x, y) ≤ RB_D(0|x, y) | x, y) by the finite sum

 ∑_{i ≥ i_0: RB̂_D(d̂_{i/M}|x,y) ≤ RB̂_D(0|x,y)} (F̂_D(d̂_{(i+1)/M}|x, y) − F̂_D(d̂_{i/M}|x, y)). (7)

For fixed M, as the sizes of the prior and posterior samples increase, d̂_{i/M} converges almost surely to d_{i/M}, and (6) and (7) converge almost surely to RB_D(d|x, y) and the strength, respectively.

9. As detailed in Algorithm B, repeat steps (1)-(8) for larger values of a_1 and a_2.
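Assembled from the sketches above (sample_dp, sample_Hx, cvm_distance), the following Python sketch mirrors steps 1-8. The data-based normal choice of H_1 = H_2 follows the suggestion in Section 5; all names and default settings are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def rb_test(x, y, a1=1.0, a2=1.0, N=500, reps=500, M=20, i0=1, seed=None):
    """Estimate RB_D(0|x, y) via (6) and the strength via (7)."""
    rng = np.random.default_rng(seed)
    n1, n2 = len(x), len(y)
    z = np.concatenate([x, y])
    H = lambda m, r: r.normal(z.mean(), z.std(ddof=1), m)   # H1 = H2 (Section 5)

    def distances(aP, HP, aQ, HQ):
        out = np.empty(reps)
        for k in range(reps):
            YP, JP = sample_dp(aP, HP, N=N, rng=rng)
            YQ, JQ = sample_dp(aQ, HQ, N=N, rng=rng)
            out[k] = cvm_distance(YP, JP, YQ, JQ)
        return out

    Hx = lambda m, r: sample_Hx(m, x, a1, H, r)             # H_x in (3)
    Hy = lambda m, r: sample_Hx(m, y, a2, H, r)             # H_y in (3)

    prior = distances(a1, H, a2, H)                         # steps 1-3
    post = distances(a1 + n1, Hx, a2 + n2, Hy)              # steps 4-6

    edges = np.quantile(prior, np.linspace(0.0, 1.0, M + 1))  # step 7
    edges[0], edges[-1] = 0.0, max(edges[-1], post.max())
    post_c = np.histogram(post, bins=edges)[0] / reps
    rb = M * post_c                             # (6): each prior bin has mass 1/M
    rb0 = rb[i0]                                # estimate of RB_D(0|x, y)
    keep = (np.arange(M) >= i0) & (rb <= rb0)   # step 8
    return rb0, post_c[keep].sum()              # (7): the strength

# usage on simulated data where H_0 is true
rng = np.random.default_rng(3)
print(rb_test(rng.normal(0, 1, 50), rng.normal(0, 1, 50), seed=4))
```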

The following proposition establishes the consistency of the approach to the two-sample problem as the sample sizes increase; that is, the procedure reaches the correct conclusion whether H_0 is true or false. The proof follows immediately from Evans (2015), Section 4.7.1.

###### Proposition 5

Consider the discretization {[0, d_{i_0/M}), [d_{i_0/M}, d_{(i_0+1)/M}), …} of the range of D. As n_1, n_2 → ∞, (i) if H_0 is true, then

 RB_D([0, d_{i_0/M})|x, y) →_{a.s.} 1/DP_D([0, d_{i_0/M})), RB_D([d_{i/M}, d_{(i+1)/M})|x, y) →_{a.s.} 0 whenever i ≥ i_0, DP_D(RB_D(d|x, y) ≤ RB_D(0|x, y)|x, y) →_{a.s.} 1,

and (ii) if H_0 is false and d_CvM(F, G) > d_{i_0/M}, then RB_D([0, d_{i_0/M})|x, y) →_{a.s.} 0 and DP_D(RB_D(d|x, y) ≤ RB_D(0|x, y)|x, y) →_{a.s.} 0.

## 7 Examples

In this section, the approach is illustrated through three examples. In Examples 1 and 2, the methodology is assessed using simulated samples from a variety of distributions and in Example 3 an application to a real data set is presented.

The following notation is used for the distributions in the tables: N(μ, σ²) is the normal distribution with mean μ and variance σ², t_ν is the t distribution with ν degrees of freedom, exp(θ) is the exponential distribution with mean θ, and U(a, b) is the uniform distribution over (a, b). For all cases, the value of N in Algorithm A and the starting values of a_1 and a_2 in Algorithm B were fixed across simulations. The results are also compared with the frequentist Cramér-von Mises (CvM) test; to calculate p-values of the CvM test, the R function "cramer.test" is used. We also compared our results with the Bayesian nonparametric tests of Holmes et al. (2015) and Al-Labadi and Zarepour (2017). Since the results obtained from these tests are similar, only the results of the new approach are reported.
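As an illustration only (not the paper's exact simulation settings), a Table 1-style case can be mimicked with the rb_test sketch from Section 6:

```python
import numpy as np

rng = np.random.default_rng(10)
x = rng.normal(0, 1, 50)          # F = N(0, 1)
y = rng.normal(1, 1, 50)          # G = N(1, 1): H_0 is false
rb0, strength = rb_test(x, y, a1=1.0, a2=1.0, seed=11)
print(rb0, strength)              # expect RB < 1 with strength near 0
```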

Example 1. Consider samples generated from the distributions in Table 1, where each sample is of size 50 (Cases 1–9). These distributions are also considered in Holmes et al. (2015) and Al-Labadi and Zarepour (2017). To study the sensitivity of the approach to the choice of the concentration parameters, various values of a_1 and a_2 are considered. The results are reported in Table 1. Recall that we want RB_D(0|x, y) > 1 with strength close to 1 when H_0 is true, and RB_D(0|x, y) < 1 with strength close to 0 when H_0 is false. It follows that the methodology performs well in all cases. For example, in Case 1, since RB_D(0|x, y) > 1 and the strength is close to 1, there is no reason to doubt that the two sampling distributions are identical. On the other hand, in Case 2, since RB_D(0|x, y) < 1 and the strength is close to 0, the two samples are drawn from two different distributions. We point out that the standard Cramér-von Mises test failed to recognize the difference in Case 6. Notice that, in all cases, the appropriate conclusion is attained with the smallest values of a_1 and a_2 considered, and the other values of a_1 and a_2 in Table 1 support the conclusions reached.

Figure 1 provides plots of the prior and posterior densities of the distance for some cases in Example 1. It follows, for instance, from Figure 1 that the posterior density of the distance is much more concentrated about 0 than the prior density when the two distributions are equal, but not to the same degree otherwise.

It is also interesting to consider the effect of prior-data conflict on the methodology. As discussed in Section 5, prior-data conflict occurs whenever there is only a tiny overlap between the effective supports of the base measures and the data. Table 2 gives the outcomes for a particular pair of samples with various choices of H_1 and H_2. Clearly, the correct conclusion is obtained only when the base measures match the effective support of the data. This illustrates the importance of the choice of H_1 = H_2 in the priors DP(a_1, H_1) and DP(a_2, H_2).

Figure 2 also provides plots of the prior and posterior densities of the distance for the cases in Table 2. It shows that the correct conclusion is obtained only when the base measures avoid prior-data conflict.

Example 2. In this example, we explore the performance of the proposed test as the sample sizes increase. We consider samples of increasing sizes from two pairs of distributions: one with F = G (Case 1) and one with F ≠ G (Case 2). The results are summarized in Table 3. The null hypothesis is not rejected in Case 1 but is rejected in Case 2. Clearly, the proposed approach works well even with small sample sizes.

Example 3. The proposed approach to the two-sample problem is illustrated on the chickwts data in R, where weights in grams are recorded for six groups of newly hatched chicks fed different supplements. The goal of this experiment was to measure and compare the effectiveness of various feed supplements on the growth rate of chickens. The first hypothesis of interest is whether the distributions of the weights of chicks fed soybean and linseed supplements differ. The second is whether the distributions of the weights for the sunflower and linseed groups differ. The ordered chick weights for the three samples are:

soybean: 158 171 193 199 230 243 248 248 250 267 271 316 327 329

linseed: 141 148 169 181 203 213 229 244 257 260 271 309

sunflower: 226 295 297 318 320 322 334 339 340 341 392 423

The values recorded in Table 4 provide no evidence that the distributions of the weights of chicks fed soybean and linseed supplements differ. On the other hand, they indicate that the sunflower and linseed distributions do differ.

## 8 Concluding Remarks

A Bayesian approach to the two-sample problem based on the Dirichlet process and the relative belief ratio has been developed. Implementing the approach is fairly simple and does not require a closed form of the relative belief ratio. Through several examples, it has been shown that the approach performs extremely well. While the Cramér-von Mises distance has been used in this paper, other distance measures, such as the Anderson-Darling and Kullback-Leibler distances, are possible. It is also possible to extend the approach to cover the case of censored data.

## References

• [1] Al-Labadi, L., and Abdelrazeq, I. (2017). On functional central limit theorems of Bayesian nonparametric priors. Statistical Methods & Applications, 26, 215–229.
• [2] Al-Labadi, L., and Evans, M. (2018). Prior based model checking. To appear in Canadian Journal of Statistics.
• [3] Al-Labadi, L., and Evans, M. (2017). Optimal robustness results for relative belief inferences and the relationship to prior-data conflict. Bayesian Analysis, 12, 705–728.
• [4] Al-Labadi, L., and Zarepour, M. (2014a). Goodness of fit tests based on the distance between the Dirichlet process and its base measure. Journal of Nonparametric Statistics, 26, 341–357.
• [5] Al-Labadi, L., and Zarepour, M. (2014b). On simulations from the two-parameter Poisson-Dirichlet process and the normalized inverse-Gaussian process. Sankhyā A, 76, 158–176.
• [6] Al-Labadi, L., and Zarepour, M. (2013). A Bayesian nonparametric goodness of fit test for right censored data based on approximate samples from the beta-Stacy process. Canadian Journal of Statistics, 41, 3, 466–487.
• [7] Al-Labadi, L., and Zarepour, M. (2017). Two-sample Kolmogorov-Smirnov test using a Bayesian nonparametric approach. Mathematical Methods of Statistics, 26, 212–225.
• [8] Baskurt, Z. , and Evans, M. (2013). Hypothesis assessment and inequalities for Bayes factors and relative belief ratios. Bayesian Analysis, 8, 3, 569-590.
• [9] Berger, J. O., and Guglielmi, A. (2001). Bayesian testing of a parametric model versus nonparametric alternatives. Journal of the American Statistical Association, 96, 174–184.
• [10] Bondesson, L. (1982). On simulation from infinitely divisible distributions. Advances in Applied Probability, 14, 855–869.
• [11] Borgwardt, K. M., and Ghahramani, Z. (2009). Bayesian two-sample tests. http://arxiv.org/abs/0906.4032.
• [12] Carota, C., and Parmigiani, G. (1996). On Bayes factors for nonparametric alternatives. In Bayesian Statistics 5 (J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, eds.), Oxford University Press, London.
• [13] Chen, Y., and Hanson, T. (2014). Bayesian nonparametric k-sample tests for censored and uncensored data. Computational Statistics and Data Analysis, 71, 335–346.
• [14] Evans, M. (2015). Measuring Statistical Evidence Using Relative Belief. Monographs on Statistics and Applied Probability 144, CRC Press, Taylor & Francis Group.
• [15] Evans, M. and Moshonov, H. (2006). Checking for prior-data conflict. Bayesian Analysis, 1, 893–914.
• [16] Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Annals of Statistics, 1, 209-230.
• [17] Florens, J. P., Richard, J. F., and Rolin, J. M. (1996). Bayesian encompassing specification tests of a parametric model against a nonparametric alternative. Technical Report 9608, Université Catholique de Louvain, Institut de Statistique.
• [18] Hsieh, P. (2011). A nonparametric assessment of model adequacy based on Kullback-Leibler divergence. Statistics and Computing, 23, 149–162.
• [19] Holmes, C. C., Caron, F., Griffin, J. E., and Stephens, D. A. (2015). Two-sample Bayesian nonparametric hypothesis testing. Bayesian Analysis, 10, 2, 297–320.
• [20] James, L. F. (2008). Large sample asymptotics for the two-parameter Poisson-Dirichlet process. In Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh, ed. B. Clarke and S. Ghosal, Ohio: Institute of Mathematical Statistics, 187–199.
• [21] Lavine, M. (1992). Some aspects of Pólya tree distributions for statistical modelling. Annals of Statistics, 20, 1222–1235.
• [22] Ma, L., and Wong, W. H. (2011). Coupling optional pólya trees and the two sample problem. Journal of the American Statistical Association, 106, 1553–1565.
• [23] McVinish, R., Rousseau, J., and Mengersen, K. (2009). Bayesian goodness of fit testing with mixtures of triangular distributions. Scandinavian Journal of Statistics, 36, 337–354.
• [24] Rudin, W. (1974). Real And Complex Analysis, Second Edition. McGraw-Hill, New York.
• [26] Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statistica Sinica, 4, 639–650.
• [27] Swartz, T. B. (1999). Nonparametric goodness-of-fit. Communications in Statistics: Theory and Methods, 28, 2821-2841.
• [28] Verdinelli, I., and Wasserman, L. (1998). Bayesian goodness-of-fit testing using finite-dimensional exponential families. Annals of Statistics, 26, 1215–1241.
• [29] Viele, K., (2000). Evaluating fit using Dirichlet processes. Technical Report 384, University of Kentucky, Dept. of Statistics.
• [30] Wolpert, R. L., and Ickstadt, K. (1998). Simulation of Lévy random fields. In Practical Nonparametric and Semiparametric Bayesian Statistics, ed. D. Dey, P. Müller, and D. Sinha, Springer, 227–242.
• [31] Zarepour, M., and Al-Labadi, L. (2012). On a rapid simulation of the Dirichlet process. Statistics & Probability Letters, 82, 5, 916–924.

## Appendix A Proofs

Proof of Lemma 1. For any cdf's F and G, we have (F(x) − G(x))² ≤ 1. Since P_x →_{a.s.} F and Q_y →_{a.s.} G as n_1, n_2 → ∞ (James, 2008; Al-Labadi and Abdelrazeq, 2017), the dominated convergence theorem completes the proof.

Proof of Lemma 3. The proof is similar to that of Lemma 1; it is included for the sake of completeness. For cdf's F and G, we have (F(x) − G(x))² ≤ 1. Since P_N →_{a.s.} P and Q_N →_{a.s.} Q as N → ∞, the result follows from the dominated convergence theorem.

Proof of Lemma 4. Let (θ_i)_{i≥1} denote the atoms of P in (4). Since H_1 is nondecreasing, we have

 θ_i ≤ t ⟺ H_1(θ_i) ≤ H_1(t).

It follows from (4) that

 P(t) = P((−∞, t]) = ∑_{i=1}^∞ J_i δ_{θ_i}((−∞, t]) = ∑_{i=1}^∞ J_i δ_{H_1(θ_i)}((0, H_1(t)]).

Observe that, since (θ_i)_{i≥1} is a sequence of i.i.d. random variables with continuous distribution H_1, we have H_1(θ_i) =_d U_i, where (U_i)_{i≥1} is a sequence of i.i.d. random variables with a uniform distribution on [0, 1]. Hence, P(t) =_d P_λ(H_1(t)), where P_λ = ∑_{i=1}^∞ J_i δ_{U_i} ~ DP(a_1, λ) and λ is the Lebesgue measure on [0, 1]. Similarly, Q(t) =_d Q_λ(H_2(t)), where Q_λ ~ DP(a_2, λ). Thus,

 d_CvM(P, Q) = ∫_{−∞}^{∞} (P(t) − Q(t))² Q(dt) = ∫_{−∞}^{∞} (P_λ(H_1(t)) − Q_λ(H_2(t)))² Q_λ(H_2(dt)).

If H_1 = H_2 = H, then since H is continuous, we have

 d_CvM(P, Q) = ∫_{−∞}^{∞} (P_λ(H(t)) − Q_λ(H(t)))² Q_λ(H(dt)) = ∫_0^1 (P_λ(z) − Q_λ(z))² Q_λ(dz).

This shows that the distribution of d_CvM(P, Q) does not depend on the base measures H_1 and H_2 whenever H_1 = H_2.