1 Introduction
At a high level, statistical analysis aims to separate signal from noise, and one of the more challenging problems is deconvolution or, more generally, estimation of a mixing distribution based on samples from the mixture. Suppose that data Y_1, ..., Y_n are independent and identically distributed from a density f with respect to Lebesgue measure, which we model as a mixture
(1)   f(y) = ∫ k(y | x) g(x) μ(dx),
where k is a known kernel, i.e., y ↦ k(y | x) is a density for each x, and g is an unknown mixing density with respect to a known finite measure μ. Alternatively, one can view this model hierarchically by assuming that X_1, ..., X_n are independent and identically distributed from g, and that Y_i, given X_i, is distributed as k(· | X_i), i = 1, ..., n, independently. So if we think of the X_i as “signals” with distribution g, and Y_i as the version of X_i corrupted by noise, then our goal, inference about g based on data from model (1), can be viewed as separation of signal from noise.
Often, g is assumed to be discrete with finitely many support points. For these so-called finite mixture models, standard modes of inference can be applied. For example, when the number of support points of g is known, there is a likelihood function with relatively simple form that can be optimized, often using the EM algorithm (Dempster et al. 1977; Teel et al. 2015), to produce the corresponding maximum likelihood estimator, to which the classical asymptotic distribution theory applies (e.g., Redner and Walker 1984); for a comprehensive treatment, see McLachlan and Peel (2000). Similarly, with this same likelihood and a corresponding prior distribution for the mixture parameters, one can apply an EM-like data-augmentation strategy (e.g., van Dyk and Meng 2001) to carry out a Bayesian analysis. In the more realistic scenario where the number of components in the finite mixture model is unknown, the methods described above can be modified by introducing a penalty term or a prior distribution on the number of mixture components, as in Leroux (1992) and Richardson and Green (1997).
For the case considered here, where g is a smooth mixing density, a number of methods for nonparametric estimation have appeared in the literature. When k is a location-shift kernel, so that the mixture density is just a convolution, estimation of g is referred to as deconvolution, a case that has been studied in Fan (1991), Stefanski and Carroll (1990), and Zhang (1990). For general mixtures, there are a variety of different approaches. Maximizing the likelihood will almost surely produce a discrete estimate of the mixing distribution (Lindsay 1995), which is not a satisfactory estimate of a smooth mixing density. Various approaches aim to smooth the discrete nonparametric maximum likelihood estimator, either directly (e.g., Eggermont and LaRiccia 1995) or by introducing a smoothness penalty (e.g., Liu et al. 2009). Chae et al. (2018) investigate an iterative algorithm that generates a sequence of smooth mixing density estimates converging to the nonparametric maximum likelihood estimator. Another interesting and related method, which is the focus of the present paper, is based on the predictive recursion algorithm first described in Newton et al. (1998), Newton and Zhang (1999), and Newton (2002), with extensions and theoretical properties developed in Ghosh and Tokdar (2006), Martin and Ghosh (2008), Tokdar et al. (2009), and Martin and Tokdar (2009, 2011, 2012); for a recent review of these developments, see Martin (2018).
Beyond estimation, a further goal is to quantify uncertainty about the mixing density and, for this, the literature is scarce. Bissantz et al. (2007) discuss asymptotic and bootstrap confidence bands for deconvolution problems, building on ideas first presented in Bickel and Rosenblatt (1973). Link and Sauer (1995) discuss empirical Bayes estimation of an empirical mixing distribution with emphasis on construction of interval estimates, using the nonparametric maximum likelihood estimator. Fortini and Petrone (2019) have developed asymptotically approximate credible intervals for the cumulative distribution function based on a quasi-Bayesian interpretation of the predictive recursion algorithm.
A seemingly undesirable feature of the predictive recursion estimator is that it depends on the order in which the data are processed. In particular, this means that the estimator is not a function of the sufficient statistic (which, in this setting, is the empirical distribution) and, hence, cannot be Bayesian. Previous literature on predictive recursion has focused on reducing this dependence on the order. For example, Newton (2002) suggested eliminating the order-dependence by averaging the estimators over a number of randomly chosen permutations; see Tokdar et al. (2009) for details. The idea in the present paper is instead to leverage predictive recursion’s order-dependence for the purpose of uncertainty quantification. Specifically, we propose to generate multiple copies of the predictive recursion estimator by permuting the data sequence, and then use this permutation-based distribution as an approximation of the estimator’s sampling distribution. After a review of the predictive recursion estimator and its properties in Section 2, we show in Section 3 that, for a given feature of the mixing distribution, the variance of the estimator’s sampling distribution can be estimated, approximately without bias, by the variance of the corresponding permutation-based distribution. Being able to accurately estimate the spread of the relevant sampling distribution immediately suggests using quantiles of the same permutation distribution as an approximate confidence interval for that feature of the mixing distribution. Numerical results presented in Section 4 reveal that this permutation-based approach gives approximately valid confidence intervals in finite samples across a range of mixture models and for various features of the mixing distribution, including the density function at a point.

2 Predictive recursion
Recall that we observe independent data Y_1, ..., Y_n from the mixture f in (1), and the goal is to estimate the mixing density g. The following predictive recursion algorithm returns a computationally efficient nonparametric estimator of g.
Predictive Recursion Algorithm.
Start with an initial estimate g_0 of the mixing density and a sequence of weights w_1, w_2, ... in (0, 1). Using the observations Y_1, ..., Y_n from the mixture model, in that order, compute
(2)   g_i(x) = (1 - w_i) g_{i-1}(x) + w_i k(Y_i | x) g_{i-1}(x) / f_{i-1}(Y_i),   i = 1, ..., n,
where f_{i-1}(y) = ∫ k(y | x) g_{i-1}(x) μ(dx) is the mixture corresponding to g_{i-1}. Finally, return g_n and f_n as the estimates of g and f, respectively.
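The recursion above can be sketched numerically on a grid. The following is a minimal illustration, not the authors' implementation: the normal location kernel, the uniform initial guess, the grid, and the weight exponent 0.67 are all our own illustrative assumptions.

```python
# A minimal grid-based sketch of the predictive recursion algorithm,
# assuming a normal location kernel k(y | x) = N(y; x, 1).
import numpy as np

def normal_kernel(y, x, sd=1.0):
    """k(y | x): normal density in y with mean x."""
    return np.exp(-0.5 * ((y - x) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def predictive_recursion(y, grid, g0=None, gamma=0.67):
    """Run predictive recursion over the observations y, in the given order.

    grid  : equally spaced points approximating the mixing space
    g0    : initial mixing density on the grid (default: uniform)
    gamma : weights w_i = (i + 1)^(-gamma), an illustrative choice
    Returns the final mixing density estimate g_n on the grid.
    """
    dx = grid[1] - grid[0]                      # grid spacing for quadrature
    g = np.full(grid.shape, 1.0 / (dx * grid.size)) if g0 is None else g0.copy()
    for i, yi in enumerate(y, start=1):
        w = (i + 1.0) ** (-gamma)               # weight sequence
        num = normal_kernel(yi, grid) * g       # k(Y_i | x) g_{i-1}(x)
        f = np.sum(num) * dx                    # f_{i-1}(Y_i) via Riemann sum
        g = (1 - w) * g + w * num / f           # the update (2)
    return g

rng = np.random.default_rng(1)
x_true = rng.gamma(shape=2.0, scale=1.0, size=500)   # signals from gamma(2, 1)
y = x_true + rng.normal(size=500)                    # noisy observations
grid = np.linspace(0.01, 10, 400)
g_n = predictive_recursion(y, grid)
```

Note that each update in (2) is a convex combination of the current estimate and a properly normalized term, so the output integrates to one (up to quadrature error) whenever the initial guess does.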
An interesting observation is that, if g_0 is a smooth density with respect to the measure μ, then the output, g_n, of the predictive recursion algorithm will also be a smooth density. Compare this to the nonparametric MLE, which is almost surely discrete, regardless of the smoothness of g in (1). Therefore, there is no need for post hoc smoothing of the predictive recursion estimator. And the ability to specify the dominating measure μ in the predictive recursion algorithm proved to be a useful property in the multiple-testing application considered in Martin and Tokdar (2012).
Aside from estimating the mixing density itself, one can readily estimate various features of the mixing distribution. That is, if h is a suitable function, then ψ(g) = ∫ h(x) g(x) μ(dx) can be estimated by ψ(g_n) = ∫ h(x) g_n(x) μ(dx). For example, we can estimate the mixing distribution function at a point t by taking h to be the indicator function of the interval (-∞, t].
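As a toy illustration of estimating such features from any grid-based mixing density estimate: the exponential stand-in for the estimate and the helper name `feature` below are hypothetical choices of ours, not part of the paper.

```python
# Sketch of estimating psi(g) = ∫ h(x) g(x) dx by a Riemann sum on a grid.
import numpy as np

grid = np.linspace(0, 10, 1000)
dx = grid[1] - grid[0]
g_hat = np.exp(-grid)                 # stand-in estimate: Exp(1)-shaped density
g_hat /= np.sum(g_hat) * dx           # normalize on the grid

def feature(g, grid, h):
    """psi(g) = ∫ h(x) g(x) dx, approximated by a Riemann sum."""
    dx = grid[1] - grid[0]
    return np.sum(h(grid) * g) * dx

t = 1.0
G_t = feature(g_hat, grid, lambda x: (x <= t).astype(float))  # CDF at t
mean = feature(g_hat, grid, lambda x: x)                      # mean of g_hat
```

With the Exp(1) stand-in, G_t is close to 1 - e^{-1} and the mean is close to 1, up to quadrature error.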
Asymptotic convergence properties of the predictive recursion estimator were investigated in Tokdar et al. (2009) and Martin and Tokdar (2009). To summarize, under suitable tail conditions on the kernel, if g is identifiable from the mixture model (1) and if the weights in the predictive recursion algorithm satisfy

∑_i w_i = ∞ and ∑_i w_i² < ∞,

then g_n converges to g almost surely in the weak topology; that is, if h is a bounded and continuous function, then ∫ h g_n dμ → ∫ h g dμ almost surely.
3 Leveraging order-dependence
It is clear from (2) that g_n depends on the order in which the data are processed. However, since the data are assumed to be independent and identically distributed, the ordering should be irrelevant. To alleviate predictive recursion’s seemingly undesirable order-dependence, Newton (2002) and others have suggested averaging the final estimate, g_n, over a number of randomly chosen permutations of the data sequence. Tokdar et al. (2009) describe this as a sort of Rao–Blackwellization, replacing g_n by a Monte Carlo approximation of its conditional expectation given the order statistics. Here, instead of trying to remove predictive recursion’s order-dependence, we propose to leverage it for the purpose of uncertainty quantification.
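The permutation-averaging strategy just described can be sketched as follows; the normal kernel, the weight exponent, and the choice of 50 permutations are illustrative assumptions of ours, not prescriptions from the papers cited.

```python
# Monte Carlo approximation of E_sigma[g_n^sigma | data]: a Rao-Blackwell-style
# average of predictive recursion estimates over random orderings.
import numpy as np

def predictive_recursion(y, grid, gamma=0.67):
    dx = grid[1] - grid[0]
    g = np.full(grid.shape, 1.0 / (dx * grid.size))  # uniform initial guess
    for i, yi in enumerate(y, start=1):
        w = (i + 1.0) ** (-gamma)
        num = np.exp(-0.5 * (yi - grid) ** 2) / np.sqrt(2 * np.pi) * g
        g = (1 - w) * g + w * num / (np.sum(num) * dx)
    return g

rng = np.random.default_rng(3)
y = rng.gamma(2.0, 1.0, 200) + rng.normal(size=200)  # simulated mixture data
grid = np.linspace(0.01, 8, 200)

# Average the final estimates over 50 randomly chosen permutations.
g_avg = np.mean(
    [predictive_recursion(y[rng.permutation(len(y))], grid) for _ in range(50)],
    axis=0,
)
```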
Let S_n denote the permutation group on the integers {1, ..., n}, i.e., the set of all bijections from {1, ..., n} to itself. If g_n is the predictive recursion estimator based on data Y = (Y_1, ..., Y_n), in its given order, write g_n^σ for the corresponding estimator based on the data (Y_{σ(1)}, ..., Y_{σ(n)}), permuted according to σ ∈ S_n. If h is a suitable function, write ψ_n = ∫ h g_n dμ and ψ_n^σ = ∫ h g_n^σ dμ for the estimators of ψ(g) based on the original and permuted data, respectively.
Our proposal is to approximate the sampling distribution of ψ_n, as a function of Y sampled from the mixture in (1), by the distribution of ψ_n^σ, for fixed Y, as a function of σ drawn uniformly from S_n. The justification for our claim that this provides an accurate approximation, at least asymptotically, is the following simple identity,
(3)   V_Y(ψ_n) = V_{Y,σ}(ψ_n^σ),
where is the variance with respect to the distribution of from the mixture model (1), and
is the variance with respect to the joint distribution of
and , assumed to be independent. The identity (3) holds because, when data are independent and identically distributed, the extra layer of permutations changes nothing. In other words, if we imagine enumerating all possible realizations of to evaluate the lefthand side of (3), then we would get no new realizations if we also enumerated permutations for evaluating the righthand side.The practical value of the identity (3) is that the righthand side suggests a familiar totalvariance decomposition:
(4)   V_{Y,σ}(ψ_n^σ) = E_Y{ V_σ(ψ_n^σ | Y) } + V_Y{ E_σ(ψ_n^σ | Y) }.
By the asymptotic consistency property of predictive recursion discussed in Section 2, if h is bounded and continuous, then ψ_n and, hence, E_σ(ψ_n^σ | Y) are consistent estimators of ψ(g), so the variance of the latter, the second term on the right-hand side of (4), should be near 0 when n is large. Therefore,
(5)   V_Y(ψ_n) ≈ E_Y{ V_σ(ψ_n^σ | Y) };
in other words, V_σ(ψ_n^σ | Y) is an approximately unbiased estimator of V_Y(ψ_n). The key point, of course, is that the left-hand side of (3), namely V_Y(ψ_n), the variance of the sampling distribution of ψ_n, is relevant for uncertainty quantification but is not readily available. However, the quantity on the right-hand side of (3) can be readily computed by repeatedly sampling σ, permuting the data according to σ, and re-evaluating the predictive recursion estimator. This provides a relatively simple and fast approach to construct a data-dependent distribution for ψ_n^σ from which valid uncertainty quantification can be achieved. And beyond variance estimation, to get an approximate 100(1 − α)% confidence interval for ψ(g), we can easily extract the α/2 and 1 − α/2 quantiles of ψ_n^σ based on repeated sampling of σ with the data Y fixed.
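Putting the pieces together, a hedged end-to-end sketch of the proposal might look like the following; the normal kernel, the grid, the feature (mixing distribution function at t = 2), and the 200 permutations are our own illustrative choices.

```python
# Permutation-based uncertainty quantification: rerun predictive recursion on
# random permutations of the same data and read off the variance and quantiles
# of psi_n^sigma.
import numpy as np

def predictive_recursion(y, grid, gamma=0.67):
    dx = grid[1] - grid[0]
    g = np.full(grid.shape, 1.0 / (dx * grid.size))   # uniform initial guess
    for i, yi in enumerate(y, start=1):
        w = (i + 1.0) ** (-gamma)
        num = np.exp(-0.5 * (yi - grid) ** 2) / np.sqrt(2 * np.pi) * g
        g = (1 - w) * g + w * num / (np.sum(num) * dx)
    return g

rng = np.random.default_rng(7)
x = rng.gamma(2.0, 1.0, size=200)          # latent signals
y = x + rng.normal(size=200)               # observed mixture data
grid = np.linspace(0.01, 8, 200)
dx = grid[1] - grid[0]

t = 2.0                                    # feature: mixing CDF at t
psi = []
for _ in range(200):                       # 200 random permutations, data fixed
    sigma = rng.permutation(len(y))
    g_sigma = predictive_recursion(y[sigma], grid)
    psi.append(np.sum(g_sigma[grid <= t]) * dx)
psi = np.asarray(psi)

var_hat = psi.var()                            # estimates V_Y(psi_n), per (5)
lo, hi = np.quantile(psi, [0.025, 0.975])      # approximate 95% interval
```

The interval (lo, hi) is the nominal 95% permutation-based confidence interval for the mixing distribution function at t.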
4 Numerical results
Here we carry out an empirical investigation into the performance of our proposed permutation-based approach to uncertainty quantification. We begin with pointwise evaluation of the mixing distribution function. The specific scenario we consider here is one where the true mixing density g in (1) is a gamma with shape 2 and rate 1, and the kernel is also a gamma with shape and rate 20; this is Example 3-3 below. We generate samples of size n from this mixture model and consider estimating the mixing distribution function at several fixed points. For predictive recursion, we take a default initial guess g_0 and the weight sequence w_i = (i + 1)^{-0.67} suggested by Martin and Tokdar (2009).
Before presenting the results, we have to address a relevant and practically important question, namely, how many permutations? In our examples, we are using 200 randomly generated permutations on which to evaluate the predictive recursion estimator. Of course, it does not hurt to do more than 200, and this is still computationally feasible thanks to the algorithm’s efficiency; however, at least in the examples we tried, there were no substantial differences in the results based on more than 200 random permutations.
A plot of distribution function estimates for this gamma mixture example, based on 200 random permutations, is shown in Figure 1. Note that the band of these estimates over permutations hugs the true distribution function relatively closely across the entire range. The vertical bars correspond to the central 95% interval of the sampling distribution of the predictive recursion estimator, based on repeated sampling from the gamma mixture. The goal of uncertainty quantification is to match this interval as closely as possible, so it is notable that, as predicted by the arguments leading up to (5), our permutation-based intervals are comparable to this “gold standard” at each of the fixed points. Moreover, computing the permutation distribution takes only about 10 seconds in R on an ordinary laptop. Fortini and Petrone (2019) present asymptotically approximate credible intervals for the same cumulative distribution function based on the predictive recursion algorithm. But their analysis is based on a dependent and non-stationary model for the data, one that makes the predictive recursion estimator “quasi-Bayes”, and, since their model and perspective on uncertainty quantification differ from ours, a direct comparison is not appropriate.
Next we consider pointwise estimation of the mixing density. This case is not strictly covered by the theoretical arguments above because the functional g ↦ g(x), for fixed x, cannot be expressed as ∫ h g dμ for a bounded and continuous h. However, intuition and prior experience suggest that the predictive recursion density estimate ought to satisfy a pointwise consistency property, i.e., g_n(x) → g(x) for fixed x; see, also, Section 5. Therefore, we can apply the same total-variance decomposition as in (4) and reason that

V_Y{ g_n(x) } ≈ E_Y[ V_σ{ g_n^σ(x) | Y } ],

and, moreover, that suitable quantiles from the permutation distribution can be used in the obvious way to construct approximate confidence intervals for g(x).
As in Chae et al. (2018), we consider nine examples corresponding to the different combinations of three kernels (Kernels 1–3) and three mixing densities (Mixing Densities 1–3). In what follows, Example k-j will refer to the case with kernel k and mixing density j, where k, j ∈ {1, 2, 3}.
As a first visualization, we simulate data from each of the above mixture models. For each data set, we run the predictive recursion algorithm for 200 randomly sampled permutations. Each panel in Figure 2 shows these 200 density estimates, along with the true mixing density. As expected, the cluster of permutation-based densities hugs the true mixing density rather closely throughout the range, with more variability in regions where the true density has more curvature. Also displayed in these panels is a central 95% interval from the sampling distribution of g_n(x), at three fixed values of x, based on 500 samples from the mixture model. As suggested by (5), the spread of the permutation distribution matches that of the sampling distribution relatively accurately at all three x values and across all nine examples.
To assess our claim that the permutation-based uncertainty quantification is approximately valid, we repeat the above experiment 500 times, extract the nominal 95% confidence interval for g(x) based on the permutation distribution, and check its coverage probability. Table 1 shows the estimated coverage probabilities for all nine examples, at each of the three x values, and for two different sample sizes. Note, first, that the coverage probability generally increases with the sample size. Major departures from the targeted 95% level occur at x values around which g has considerable curvature; in regions where g is smoother, the coverage probability estimate tends to be closer to the 95% level. The overall message is that the permutation-based distribution gives approximately valid uncertainty quantification across a variety of mixing densities and kernels.

Example | coverage at the three x values (smaller n) | coverage at the three x values (larger n)
1-1  0.914  1.000  0.904  0.964  1.000  0.976
2-1  0.882  0.990  0.882  0.956  1.000  0.952
3-1  0.880  1.000  0.888  0.950  1.000  0.962
1-2  0.994  0.476  0.948  1.000  0.488  0.982
2-2  0.986  0.914  0.938  0.998  0.968  0.970
3-2  0.972  0.910  0.918  0.998  0.966  0.972
1-3  0.998  0.930  0.550  1.000  0.982  0.710
2-3  0.994  0.862  0.378  1.000  0.938  0.540
3-3  0.996  0.906  0.554  1.000  0.984  0.644
5 Conclusion
This paper describes a simple permutation-based approach to uncertainty quantification about a mixing distribution, leveraging the built-in dependence of the predictive recursion estimator on the data ordering. The development and numerical results presented here suggest that the uncertainty quantification achieved by this approach, e.g., a confidence interval for the mixing distribution or density function, is valid in the sense that the frequentist coverage probability is approximately equal to the interval’s nominal level.
We end with two concluding remarks. First, as noted in Section 4, we currently lack results on the pointwise consistency of the predictive recursion estimator of the mixing density. At present, we have only results on almost sure convergence of the mixing measure estimator to the truth in the weak topology. One idea is to prove the pointwise convergence directly using the structure of the predictive recursion estimator. Another is to verify the available sufficient conditions, namely, equicontinuity (e.g., Boos 1985), for converting weak convergence of measures into uniform convergence of densities. Unfortunately, we were unable to push through either of these approaches, but this does not shake our confidence in the convergence conjecture.
Second, the idea of leveraging data-ordering dependence employed herein is not specific to the predictive recursion estimator. That is, when data are independent and identically distributed or, more generally, exchangeable, and the density estimator under consideration depends on the data ordering, then the total-variance argument in (5) can be applied and approximately valid uncertainty quantification can be achieved. A natural question is: are there any other density estimators that depend on the data ordering? Interestingly, while off-the-shelf estimators tend to be permutation invariant, one can consider Cesàro averages of these order-independent estimators, which retain the estimator’s good asymptotic properties while simultaneously creating order-dependence that can be leveraged, using the techniques presented here, for valid uncertainty quantification.
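To illustrate this closing point, here is a minimal sketch, assuming a Gaussian kernel density estimator with a fixed bandwidth (our choice): the plain KDE is permutation invariant, while its Cesàro average over the partial-sample estimates is not.

```python
# An ordinary KDE is a symmetric function of the data, but the Cesaro average
# of the KDEs based on y_1..y_i for i = 1..n depends on the data ordering,
# so the permutation trick from Section 3 could be applied to it.
import numpy as np

def kde(y, grid, h=0.4):
    """Ordinary Gaussian kernel density estimate: permutation invariant."""
    u = (grid[:, None] - y[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(y) * h * np.sqrt(2 * np.pi))

def cesaro_kde(y, grid, h=0.4):
    """Average of the partial-sample KDEs: order dependent."""
    out = np.zeros_like(grid)
    for i in range(1, len(y) + 1):
        out += kde(y[:i], grid, h)
    return out / len(y)

rng = np.random.default_rng(0)
y = rng.normal(size=100)
grid = np.linspace(-4, 4, 200)

f1 = cesaro_kde(y, grid)
f2 = cesaro_kde(y[::-1], grid)     # same data, reversed order
invariant = np.allclose(kde(y, grid), kde(y[::-1], grid))  # True: KDE invariant
order_dep = not np.allclose(f1, f2)                        # True: Cesaro average differs
```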
Acknowledgments
This work is partially supported by the U.S. National Science Foundation, under grants DMS–1737929 and DMS–1811802.
References
 Bissantz et al. (2007) Bissantz, N., Dümbgen, L., Holzmann, H., and Munk, A. (2007). Non-parametric confidence bands in deconvolution density estimation. J. Roy. Statist. Soc. Ser. B, 69(3):483–506.
 Bickel and Rosenblatt (1973) Bickel P. J., Rosenblatt M. (1973) On some global measures of the deviations of density function estimates. Ann. Statist., 1(6):1071–1095.
 Boos (1985) Boos D. (1985) A converse to Scheffe’s theorem. Ann. Statist., 13(1):423–427.
 Chae et al. (2018) Chae, M., Martin, R., and Walker, S. G. (2018). Convergence of an iterative algorithm to the nonparametric MLE of a mixing distribution. Statist. Probab. Lett., 140:142–146.
 Dempster et al. (1977) Dempster, A., Laird, N., and Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy. Statist. Soc. Ser. B, 39(1):1–38.
 Eggermont and LaRiccia (1995) Eggermont, P. and LaRiccia, V. (1995) Maximum smoothed likelihood density estimation for inverse problems. Ann. Statist., 23(1):199–220.
 Fan (1991) Fan, J. (1991) On the optimal rates of convergence for nonparametric deconvolution problems. Ann. Statist., 19(3):1257–1272.
 Fortini and Petrone (2019) Fortini, S. and Petrone, S. (2019). Quasi-Bayes properties of a recursive procedure for mixtures. Unpublished manuscript, arXiv:1902.10708.
 Ghosh and Tokdar (2006) Ghosh, J. K. and Tokdar, S. T. (2006). Convergence and consistency of Newton’s algorithm for estimating mixing distribution. In Fan, J. and Koul, H., editors, Frontiers in Statistics, pages 429–443. Imp. Coll. Press, London.
 Henna (2005) Henna, J. (2005) Estimation of the number of components of finite mixtures of multivariate distributions. Ann. Instit. Statist. Math., 57(4):655–664.
 Leroux (1992) Leroux, B. G. (1992). Consistent estimation of a mixing distribution. Ann. Statist., 20(3):1350–1360.
 Lindsay (1995) Lindsay, B. G. (1995). Mixture Models: Theory, Geometry and Applications. IMS, Hayward, CA.
 Link and Sauer (1995) Link, W. and Sauer, J. (1995). Estimation and confidence intervals for empirical mixing distributions. Biometrics, 51(3):810–821.
 Liu et al. (2009) Liu, L., Levine, M., and Zhu, Y. (2009). A functional EM algorithm for mixing density estimation via nonparametric penalized likelihood maximization. J. Comput. Graph. Statist., 18(2):481–504.
 Martin (2018) Martin, R. (2018). On nonparametric estimation of a mixing density via the predictive recursion algorithm. Unpublished manuscript, arXiv:1812.02149.
 Martin and Ghosh (2008) Martin, R. and Ghosh, J. K. (2008). Stochastic approximation and Newton’s estimate of a mixing distribution. Statist. Sci., 23(3):365–382.
 Martin and Tokdar (2009) Martin, R. and Tokdar, S. T. (2009). Asymptotic properties of predictive recursion: robustness and rate of convergence. Electron. J. Stat., 3:1455–1472.
 Martin and Tokdar (2011) Martin, R. and Tokdar, S. T. (2011). Semiparametric inference in mixture models with predictive recursion marginal likelihood. Biometrika, 98(3):567–582.
 Martin and Tokdar (2012) Martin, R. and Tokdar, S. T. (2012). A nonparametric empirical Bayes framework for largescale multiple testing. Biostatistics, 13(3):427–439.
 McLachlan and Peel (2000) McLachlan, G. and Peel, D. (2000) Finite Mixture Models. Wiley Series in Probability and Statistics.
 Newton (2002) Newton, M. A. (2002). On a nonparametric recursive estimator of the mixing distribution. Sankhyā Ser. A, 64(2):306–322.
 Newton et al. (1998) Newton, M. A., Quintana, F. A., and Zhang, Y. (1998). Nonparametric Bayes methods using predictive updating. In Dey, D., Müller, P., and Sinha, D., editors, Practical Nonparametric and Semiparametric Bayesian Statistics, volume 133 of Lecture Notes in Statist., pages 45–61. Springer, New York.
 Newton and Zhang (1999) Newton, M. A. and Zhang, Y. (1999). A recursive algorithm for nonparametric analysis with missing data. Biometrika, 86(1):15–26.
 Redner and Walker (1984) Redner, R. and Walker, H. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26(2):195–239.
 Richardson and Green (1997) Richardson, S. and Green, P. (1997). On Bayesian analysis of mixtures with an unknown number of components. J. Roy. Statist. Soc. Ser. B, 59(4):731–792.
 Stefanski and Carroll (1990) Stefanski, L. and Carroll, R. J. (1990). Deconvoluting kernel density estimators. Statistics, 21(2):169–184.
 Teel et al. (2015) Teel, C., Park, T., and Sampson, A. (2015). EM estimation for finite mixture models with known mixture component size. Comm. Statist. Simulation Comput., 44(6):1545–1556.
 Tokdar et al. (2009) Tokdar, S. T., Martin, R., and Ghosh, J. K. (2009). Consistency of a recursive estimate of mixing distributions. Ann. Statist., 37(5A):2502–2522.
 van Dyk and Meng (2001) van Dyk, D. A. and Meng, X.-L. (2001). The art of data augmentation. J. Comput. Graph. Statist., 10(1):1–111.
 Zhang (1990) Zhang, C.-H. (1990). Fourier methods for estimating mixing densities and distributions. Ann. Statist., 18(2):806–831.