1 Introduction
Sums of random variables are fundamental to modeling stochastic phenomena. In finance, risk managers need to predict the distribution of a portfolio's future value, which is the sum of the values of multiple assets; similarly, the distribution of the sum of an individual asset's returns over time is needed for the valuation of some exotic (e.g. Asian) options McNeil et al. (2015); Rüschendorf (2013). In insurance, the probability of ruin (i.e. bankruptcy) is determined by the distribution of aggregate losses, that is, sums of individual claims of random size Klugman et al. (2012); Asmussen and Albrecher (2010). Lastly, wireless system engineers model the total interference in a wireless communications network as the sum of all interfering signals, which are often lognormally distributed Fischione et al. (2007).

In this article, we consider estimating the probability density function (pdf) of sums of random variables (rvs). That is, we wish to estimate the pdf of
$S = X_1 + \cdots + X_d$, where $\mathbf{X} = (X_1,\dots,X_d)^\top$ is simulated according to the joint pdf $f$. A major motivation for obtaining accurate pdf estimates of a rv is to produce confidence intervals for quantiles. For example, the US Nuclear Regulatory Commission specifies regulations in terms of the "95/95" rule, i.e. the upper 95% confidence limit for the 95th percentile Commission (2011). Suppose we estimate the quantile $q_p = F_S^{-1}(p)$ by the empirical quantile $\hat{q}_p$ computed from $n$ iid replications. In the obvious notation, we then have the convergence in distribution Serfling (1980):
$$\sqrt{n}\,(\hat{q}_p - q_p) \xrightarrow{d} \mathsf{N}\!\left(0, \frac{p(1-p)}{f_S(q_p)^2}\right)$$
as $n \to \infty$, where the limiting variance depends on the unknown density $f_S$. Thus, any confidence interval for $q_p$ requires estimation of the density $f_S$, which is a highly nontrivial problem.
In general, the pdf of a sum of $d$ rvs is only available via a $(d-1)$-dimensional convolution. The convolution usually cannot be computed analytically (except in some special cases, e.g., iid gammas or normals) or numerically via quadrature (unless $d$ is very small). Approximations have long been applied to this problem in the iid case for large $d$. These include central limit theorems Petrov (1975), Edgeworth expansions Barndorff-Nielsen and Cox (1989), and the inversion of integral transforms.
An Edgeworth expansion is a generalization of the CLT, which constructs a nonnormal approximation based on the first $k$ moments (equivalently, cumulants) of $S$ (the first term of the Edgeworth expansion is the CLT approximation, which may not be accurate for small $d$). Moreover, when the summands of $S$ are dependent, the moment sequence of $S$ is unknown and needs to be estimated (e.g. by Monte Carlo), and small errors in the approximation of the higher moments can lead to large errors in the approximation. Hence, the method is not fully deterministic (as it may at first appear), and requires careful calibration of the value of $k$ to avoid numerical instabilities.
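As a quick illustration of the iid special case mentioned above, the following Python sketch (not taken from the paper's source code; all names are ours) compares the exact density of a sum of iid gamma variables, which is again a gamma density, with the CLT approximation evaluated at the mean of the sum:

```python
import math

def gamma_pdf(x, shape, scale=1.0):
    """Density of the Gamma(shape, scale) distribution, computed in log-space."""
    if x <= 0:
        return 0.0
    logp = ((shape - 1) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(logp)

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# The sum of d iid Gamma(a, 1) variables is exactly Gamma(d*a, 1), so the
# exact density of the sum can be compared with its CLT (normal) approximation.
d, a = 30, 2.0
mu, var = d * a, d * a        # mean and variance of the sum
s = mu                        # evaluate both densities at the mean of the sum
exact = gamma_pdf(s, d * a)
clt = normal_pdf(s, mu, math.sqrt(var))
rel_err = abs(clt - exact) / exact
```

Near the mean the two densities agree closely for moderate $d$; the higher-order Edgeworth correction terms matter mainly in the tails and for small $d$.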
Another common method is to construct the Laplace transform (or characteristic function) of $S$ and numerically invert it, using a method such as those described in Abate and Whitt (2006). However, when the summands are dependent, the Laplace transform of the sum is unknown, so one has to first estimate it (e.g. by Monte Carlo), and then numerically invert this approximation. Specialized methods have been developed for certain marginals and dependence structures (for example, the sum-of-lognormals case is considered by Laub et al. (2016)), but an approach for general distributions remains out of reach. Finally, Monte Carlo estimators such as Conditional Monte Carlo Asmussen (2017) and the Asmussen–Kroese estimator Asmussen and Kroese (2006) utilize details of $\mathbf{X}$'s distribution to produce unbiased estimates with a dimension-independent rate of convergence of $\mathcal{O}(n^{-1/2})$.
The purpose of this work is to explore an unbiased Monte Carlo estimator for the problem, with a focus on dependent summands.
The estimator is based on treating the pdf estimation problem as a derivative (of the cumulative distribution function) estimation problem. There are several advantages to the proposed estimator. First, we show that in certain settings it enjoys smaller variance than estimators based on the Conditional Monte Carlo approach. Second, the estimator only requires evaluation of the joint pdf up to a (typically unknown) normalizing constant, a situation similar to the application of Markov chain Monte Carlo. As a result, the sensitivity-based approach is useful for estimating posterior marginal densities in Bayesian inference (Section 5). The source code used in this paper is available online Laub et al. (2018).

In our notation, we use lowercase boldface letters like $\mathbf{x}$, $\mathbf{y}$ for nonrandom vectors and uppercase boldface letters like $\mathbf{X}$ for random vectors, and $\mathbf{1}$ for the vector of 1's. If $\mathbf{x}$ is of length $d$, we write $\mathbf{x} = (x_1,\dots,x_d)^\top$. The inner product is denoted $\langle \mathbf{x}, \mathbf{y}\rangle = \mathbf{x}^\top \mathbf{y}$. For a differentiable function $f$, we write $\nabla f$ for its gradient and use $\partial_k f$ to denote the $k$'th component of $\nabla f$.
2 Sensitivity Estimator
The estimator is derived from a simple application of the Likelihood Ratio method Glynn (1990); Reiman and Weiss (1989), also known as the Score Function method Rubinstein (1986), that is typically used for derivative estimation of performance measures in Discrete Event Systems. We thus tackle the pdf estimation problem by viewing it as a special type of sensitivity analysis. The basic idea appears in (Asmussen and Glynn, 2007, Chapter VII, Example 5.7), and our contribution is to weaken a technical condition and use a control variate to reduce variance. Furthermore, for the Gaussian copula case we introduce a novel conditional Monte Carlo estimator that is infinitely smooth (which may be useful in quasi-Monte Carlo Asmussen and Glynn (2007)).
Assumption 1.
The random vector $\mathbf{X}$ has a density $f$, each $X_k$ is supported either on the entire real line or a half real line, the gradient $\nabla f$ is a continuous function on the support of $f$, and we have the integrability condition $\mathbb{E}\,|\langle \mathbf{X}, \nabla \ln f(\mathbf{X})\rangle| < \infty$ (here $\nabla \ln f = \nabla f / f$).

This assumption is slightly weaker than the one in (Asmussen and Glynn, 2007, Prop. 3.5 on page 222), which requires that the derivative of the integrand is uniformly bounded by an integrable function of $\mathbf{x}$. The proposed estimator is based on the following simple formula, proved in the appendix.
Proposition 1.
Under Assumption 1, for $s \neq 0$ the density of $S = \mathbf{1}^\top \mathbf{X}$ satisfies
$$f_S(s) = \frac{1}{s}\,\mathbb{E}\left[\,\mathbb{1}\{S \le s\}\,\big(d + \langle \mathbf{X}, \nabla \ln f(\mathbf{X})\rangle\big)\right]. \qquad (1)$$
It is straightforward to show that (1) still holds if the indicator $\mathbb{1}\{S \le s\}$ is replaced by $-\mathbb{1}\{S > s\}$. This suggests the pair of (unbiased) estimators (with $\mathbf{X}_1,\dots,\mathbf{X}_n \overset{\mathrm{iid}}{\sim} f$ and $S_i = \mathbf{1}^\top \mathbf{X}_i$):
$$\hat{f}^{\,\le}(s) = \frac{1}{ns}\sum_{i=1}^n \mathbb{1}\{S_i \le s\}\big(d + \langle \mathbf{X}_i, \nabla \ln f(\mathbf{X}_i)\rangle\big), \qquad \hat{f}^{\,>}(s) = -\frac{1}{ns}\sum_{i=1}^n \mathbb{1}\{S_i > s\}\big(d + \langle \mathbf{X}_i, \nabla \ln f(\mathbf{X}_i)\rangle\big).$$
We make use of both of these estimators by using one as a base estimator and the difference of the two as a control variate (the difference has a known expectation, namely zero) Asmussen and Glynn (2007). To ensure unbiasedness, we may, for example, obtain the control variate coefficient from a pilot (independent) sample, as explained in Section 4.
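To make the construction concrete, the following Python sketch (illustrative only, not the paper's published code) applies the representation $f_S(s) = \frac{1}{s}\mathbb{E}[\mathbb{1}\{S \le s\}(d + \langle \mathbf{X}, \nabla \ln f(\mathbf{X})\rangle)]$ to iid $\mathsf{Exp}(1)$ summands, for which $\langle \mathbf{x}, \nabla \ln f(\mathbf{x})\rangle = -\mathbf{1}^\top\mathbf{x}$ and the true density of $S$ is the Erlang pdf:

```python
import math
import random

def sensitivity_density_estimate(s, samples, d):
    """Sensitivity estimator of f_S(s), specialized to iid Exp(1) summands,
    where d + <x, grad log f(x)> = d - sum(x).  The difference between the
    1{S<=s} and -1{S>s} variants (which has mean zero) is used as a control
    variate, with its coefficient fitted on a pilot sample (first 5%)."""
    base, ctrl = [], []
    for x in samples:
        S = sum(x)
        score = d - S                 # d + <x, grad log f(x)> for Exp(1) margins
        base.append((S <= s) * score / s)
        ctrl.append(score / s)        # difference of the two estimators; mean 0
    m = max(2, len(samples) // 20)    # pilot sample for the coefficient
    pb, pc = base[:m], ctrl[:m]
    mb, mc = sum(pb) / m, sum(pc) / m
    cov = sum((b - mb) * (c - mc) for b, c in zip(pb, pc)) / (m - 1)
    var = sum((c - mc) ** 2 for c in pc) / (m - 1)
    beta = cov / var if var > 0 else 0.0
    rest_b, rest_c = base[m:], ctrl[m:]   # unbiased: beta is independent of these
    return sum(b - beta * c for b, c in zip(rest_b, rest_c)) / len(rest_b)

random.seed(1)
d, s, n = 5, 4.0, 200_000
samples = [[random.expovariate(1.0) for _ in range(d)] for _ in range(n)]
est = sensitivity_density_estimate(s, samples, d)
erlang_pdf = s ** (d - 1) * math.exp(-s) / math.factorial(d - 1)  # true f_S(s)
```

Estimating the coefficient on a pilot sample and applying it to the remaining replicates keeps the combined estimator unbiased, matching the scheme described above.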
3 Conditional Monte Carlo Methods
In the following Sections 3.1 and 3.2 we describe the Conditional Monte Carlo approach Asmussen (2017), as well as an extension of the Asmussen–Kroese estimator. We then use these methods as benchmarks to illustrate the performance of the proposed estimator in various settings.
3.1 Conditional Monte Carlo estimator
The Conditional Monte Carlo estimator Asmussen (2017) takes the form
$$\hat{f}_{\mathrm{cond}}(s) = \frac{1}{n}\sum_{i=1}^n f_{X_d \mid \mathbf{X}_{-d}}\big(s - S_{-d,i} \mid \mathbf{X}_{-d,i}\big),$$
where the notation $\mathbf{X}_{-d}$ denotes the vector $\mathbf{X}$ with the $d$th component removed and $S_{-d} = \mathbf{1}^\top \mathbf{X}_{-d}$. This is particularly simple in the independent case, as then $f_{X_d \mid \mathbf{X}_{-d}} = f_{X_d}$.
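The independent case can be sketched in a few lines of Python (a toy example with $\mathsf{Exp}(1)$ summands, for which the Erlang density of the sum is known; names are ours):

```python
import math
import random

def cond_mc_density_estimate(s, samples):
    """Conditional MC for independent summands: f_S(s) = E[ f_{X_d}(s - S_{-d}) ],
    here with Exp(1) margins so f_{X_d}(t) = exp(-t) for t >= 0."""
    total = 0.0
    for x in samples:
        rest = sum(x[:-1])          # S_{-d}: the sum without the last component
        t = s - rest
        if t >= 0:
            total += math.exp(-t)   # density of X_d at the value needed for S = s
    return total / len(samples)

random.seed(2)
d, s, n = 5, 4.0, 100_000
samples = [[random.expovariate(1.0) for _ in range(d)] for _ in range(n)]
est = cond_mc_density_estimate(s, samples)
erlang_pdf = s ** (d - 1) * math.exp(-s) / math.factorial(d - 1)  # true f_S(s)
```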
We now examine the dependent case where $\mathbf{X}$'s dependence structure is given by an Archimedean copula with generator $\psi$; i.e., the cdf of $\mathbf{X}$ is
$$F(\mathbf{x}) = \psi^{-1}\big(\psi(F_1(x_1)) + \cdots + \psi(F_d(x_d))\big),$$
where $\psi^{-1}$ is the functional inverse of $\psi$ and $F_1,\dots,F_d$ are the marginal cdfs. The conditional densities of $\mathbf{X}$ can be calculated from the formula ($g^{(k)}$ denotes the $k$th derivative of a function $g$)
(2) 
Some Archimedean copulas, such as the Clayton and Gumbel–Hougaard copulas, have what is called a Marshall–Olkin representation. An Archimedean copula is in the Marshall–Olkin class if $\psi^{-1}$ is the Laplace transform of some positive rv $\Lambda$, that is, $\psi^{-1}(t) = \mathbb{E}\,\mathrm{e}^{-t\Lambda}$, where $\Lambda$ has cdf $F_\Lambda$. Then an $\mathbf{X}$ with this dependence structure can be simulated via
$$X_k = F_k^{-1}\big(\psi^{-1}(E_k/\Lambda)\big), \qquad k = 1,\dots,d, \qquad (3)$$
where $E_1,\dots,E_d \overset{\mathrm{iid}}{\sim} \mathsf{Exp}(1)$ independently of $\Lambda$.
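The Marshall–Olkin algorithm can be sketched for the Clayton copula with parameter $\theta > 0$, for which (a standard fact we assume here) the generator can be chosen so that $\psi^{-1}(t) = (1+t)^{-1/\theta}$ is the Laplace transform of $\Lambda \sim \mathsf{Gamma}(1/\theta, 1)$; the Python below is an illustrative sketch, not the paper's code:

```python
import random

def clayton_sample(d, theta, rng):
    """One draw of (U_1,...,U_d) from a Clayton copula via the Marshall-Olkin
    algorithm: Lambda ~ Gamma(1/theta, 1) has Laplace transform
    psi^{-1}(t) = (1 + t)^{-1/theta}, and U_k = psi^{-1}(E_k / Lambda)."""
    lam = rng.gammavariate(1.0 / theta, 1.0)
    return [(1.0 + rng.expovariate(1.0) / lam) ** (-1.0 / theta) for _ in range(d)]

rng = random.Random(3)
d, theta, n = 3, 2.0, 20_000
U = [clayton_sample(d, theta, rng) for _ in range(n)]
first_margin_mean = sum(u[0] for u in U) / n   # each margin is exactly uniform
```

Arbitrary margins are then obtained by setting $X_k = F_k^{-1}(U_k)$, as in (3).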
For this case, Asmussen (Asmussen, 2017, Proposition 8.3) conditions upon $\Lambda$ as well as $\mathbf{X}_{-d}$ to obtain what we call the extended Conditional Monte Carlo estimator
(4) 
where $S_{-d} = \mathbf{1}^\top \mathbf{X}_{-d}$ and $\mathbf{X}$ is given by (3).
We will use this estimator as a benchmark in our comparisons later on.
3.2 Asmussen–Kroese estimator
The Asmussen–Kroese estimator Asmussen and Kroese (2006) (typically used for tail probabilities) is based, for exchangeable summands, on the identity
$$\mathbb{P}(S > s) = d\,\mathbb{E}\left[\,\overline{F}_{X_d \mid \mathbf{X}_{-d}}\big(\max(M_{d-1},\, s - S_{-d})\big)\right],$$
where $M_{d-1} = \max(X_1,\dots,X_{d-1})$ and $\overline{F}_{X_d \mid \mathbf{X}_{-d}}$ denotes the conditional tail function of $X_d$. Each term inside the expectation, viewed as a function of $s$, is differentiable whenever $s - S_{-d} \neq M_{d-1}$. Thus, we can take the derivative of this piecewise estimator to obtain
$$\hat{f}_{\mathrm{AK}}(s) = \frac{d}{n}\sum_{i=1}^n f_{X_d \mid \mathbf{X}_{-d,i}}\big(s - S_{-d,i}\big)\,\mathbb{1}\{s - S_{-d,i} > M_{d-1,i}\},$$
which can be viewed as an alternative conditional estimator. When it is applicable, we use the "extended" form of this estimator where the conditioning on $\mathbf{X}_{-d}$ is replaced with conditioning on $(\mathbf{X}_{-d}, \Lambda)$ as in Section 3.1. Notice that the additional term present in (4) does not appear here. We remark that (to the best of our knowledge) this variant of the AK estimator for estimation of a density has not been previously considered.
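A minimal sketch of the iid special case, obtained by formally differentiating the AK tail-probability identity in $s$ (so $f_S(s) = d\,\mathbb{E}[f(s - S_{-d})\,\mathbb{1}\{s - S_{-d} > M_{d-1}\}]$ with $f$ the common marginal density; Python names are ours):

```python
import math
import random

def ak_density_estimate(s, samples, pdf):
    """AK-style density estimator for iid summands:
    f_S(s) = d * E[ pdf(s - S_{-d}) * 1{ s - S_{-d} > max(X_1,...,X_{d-1}) } ]."""
    d = len(samples[0])
    total = 0.0
    for x in samples:
        rest = x[:-1]
        t = s - sum(rest)
        if t > max(rest):        # the event that X_d would be the maximum
            total += pdf(t)
    return d * total / len(samples)

random.seed(4)
d, s, n = 5, 4.0, 100_000
samples = [[random.expovariate(1.0) for _ in range(d)] for _ in range(n)]
est = ak_density_estimate(s, samples, lambda t: math.exp(-t))  # Exp(1) margins
erlang_pdf = s ** (d - 1) * math.exp(-s) / math.factorial(d - 1)  # true f_S(s)
```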
4 Numerical Comparisons
In this section, for various distributions of $\mathbf{X}$, we compare: i) our proposed method, ii) the conditional MC estimator, and iii) the Asmussen–Kroese (AK) estimator.

We conduct 3 experiments, each one depicted in Figures 1 to 3 below. Each experiment uses $n$ iid replicates of $\mathbf{X}$ which are common to all estimators (our estimator uses the first 5% of the replicates as the pilot sample for choosing the control variate coefficient).

For each experiment we display a subplot of the estimated density function, as well as the estimated standard deviation and (square root of the) work-normalized relative variance: $\mathrm{WNRV}(s) = \mathrm{CPU\_Time} \times \widehat{\mathrm{Var}}(\hat{f}(s)) / \hat{f}(s)^2$. Here, CPU_Time is the (wall-clock) time taken by the method to produce the estimates for the grid of 50 points. These examples show sums with dependent summands. When the copula has a Marshall–Olkin representation (3) we use it to simulate $\mathbf{X}$ and give results for the extended version (4) of the conditional MC estimator.
Figure 1 considers the sum of dependent identically-distributed heavy-tailed variables. The estimates plot shows that the estimators essentially agree with each other, as is to be expected when all methods perform well. In terms of WNRV and standard deviation, the sensitivity estimator outperforms the others.

Figure 2 considers a sum of dependent light-tailed variables. The results here are similar to those of Figure 1. Again, the sensitivity estimator outperforms the others on WNRV and standard deviation.

Figure 3 shows the sum of dependent heavy-tailed variables. Instead of the standard multivariate lognormal distribution, which has a Gaussian copula, we take the Frank copula. The Frank copula is unique among these tests in that it is an Archimedean copula which lacks a Marshall–Olkin representation.
5 Extension to Estimation of Marginal Densities
One extension of the sensitivity estimator is to the estimation of marginal densities, which has multiple applications in Bayesian statistics. For an $\mathbf{X}$ which satisfies Assumption 1, a similar derivation to the one in Proposition 1 gives the following representation of the marginal densities:
$$f_{X_k}(x) = -\,\mathbb{E}\left[\,\mathbb{1}\{X_k > x\}\;\partial_k \ln f(\mathbf{X})\right] \qquad (5)$$
for $k = 1,\dots,d$ and $x \in \mathbb{R}$. We use the estimator with associated control variate that is based on (5). A nice feature of the corresponding estimator is that, due to the presence of the $\partial_k \ln f$ term, the normalizing constant of $f$ need not be known. As an example, we use Markov chain Monte Carlo to obtain samples from the posterior density of a Bayesian model, and use these to estimate the posterior marginal pdfs with our sensitivity estimator.
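A marginal-density estimator of this form needs only the gradient of the log-density, which is available for unnormalized posteriors. The Python sketch below (an illustration with our own names, using independent standard normals so that the true marginal is known) estimates $f_{X_1}(0)$ from iid samples via $f_{X_k}(x) = -\mathbb{E}[\mathbb{1}\{X_k > x\}\,\partial_k \ln f(\mathbf{X})]$:

```python
import math
import random

def marginal_density_estimate(x, samples, dlogf_k, k=0):
    """Estimate the k-th marginal density via
    f_k(x) = -E[ 1{X_k > x} * d/dx_k log f(X) ];
    dlogf_k need only be the gradient of an *unnormalized* log-density."""
    n = len(samples)
    return -sum(dlogf_k(z) for z in samples if z[k] > x) / n

random.seed(5)
n = 100_000
samples = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(n)]
# For a bivariate standard normal, d/dx_1 log f(x) = -x_1.
est = marginal_density_estimate(0.0, samples, lambda z: -z[0], k=0)
true_pdf = 1.0 / math.sqrt(2 * math.pi)   # standard normal density at 0
```

In the Bayesian application, the iid samples are simply replaced by (correlated) MCMC samples from the posterior.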
We consider the well-known "Pima Indians" dataset (standardized), which records a binary response variable (the incidence of diabetes) for a group of women, along with seven possible predictors. We specify a Logistic Regression model with five predictors: Number of Pregnancies, Plasma Glucose Concentration, Body Mass Index, Diabetes Pedigree Function, and Age (see Friel and Wyse (2012) for justification). The prior is the same as in Friel and Wyse (2012).

To obtain samples from the posterior density, we implement an isotropic Random Walk sampler, using a radially symmetric Gaussian proposal density whose scale was chosen so that trace plots indicate good mixing for the model. We ran the Random Walk sampler for an initial burn-in period, then used the subsequent samples (without any thinning) to obtain a KDE, as well as density estimates using our sensitivity estimator (with control variate). As a benchmark, we compare the accuracy with a KDE constructed using a thinned subsample of a considerably longer MCMC chain. The result of this comparison is depicted in Figure 4.
As expected, using the same set of samples, the sensitivity estimator yields a more accurate estimate than the KDE. The reason for the lower accuracy of the KDE in this context is well-known: a mean square error convergence rate of $\mathcal{O}(n^{-4/5})$, instead of the canonical Monte Carlo rate of $\mathcal{O}(n^{-1})$, due to the presence of nonnegligible bias in the KDE estimator (see Botev et al. (2010), for example).
It is important to note that, due to the $1/s$ term, the sensitivity estimator can have large variance for very small $s$, even when $f_S(s)$ is not close to zero. This problem can be resolved with a simple linear shift, as follows. If one element, say $X_1$, is supported on the entire real line, then $f_S(s) = f_{\tilde{S}}(s + c)$ for any constant $c$, where $\tilde{S} = (X_1 + c) + X_2 + \cdots + X_d$. We can then use the original estimator (with shifted values of $s$ and $X_1$) to obtain estimates of the density of $S$ near or at zero.
6 Extensions for a Gaussian Copula with Positive Random Variables
Recall that we wish to estimate the derivative of the cdf $F_S(s) = \mathbb{P}(S \le s)$, where $S = \mathbf{1}^\top \mathbf{X}$. Here we consider random variables that are positive and thus consider $s > 0$. We can then write $\mathbf{X} \overset{d}{=} \big(F_1^{-1}(\Phi(Z_1)),\dots,F_d^{-1}(\Phi(Z_d))\big)$, where $\mathbf{Z} \sim \mathsf{N}(\mathbf{0},\Sigma)$ (that is, the left-hand side has the same distribution as the right-hand side). Since the $X_k$ are positive, the nested sequence of events
$$\{X_1 \le s\} \supseteq \{X_1 + X_2 \le s\} \supseteq \cdots \supseteq \{X_1 + \cdots + X_d \le s\}$$
suggests the possibility of sequentially simulating the components of the vector $\mathbf{X}$. In other words, for $k = 1,\dots,d$, each $X_k$ is drawn from the conditional density:
(6) 
where $c_1,\dots,c_d$ are normalizing constants that depend on $s$ (and on the previously drawn components). If we then simulate $\mathbf{X}$ sequentially in this way, we obtain the smooth unbiased estimator of $F_S(s)$: $\prod_{k=1}^d c_k$. In addition, an application of the likelihood ratio method Asmussen and Kroese (2006); Glynn (1990); Reiman and Weiss (1989); Rubinstein (1986) gives us an estimator similar to (1), but without the indicator function:
(7) 
Note that the density of $\mathbf{X}$ simulated in this way depends on the specific $s$ at which we estimate $f_S$. We can use the inverse transform method to remove the dependence on $s$. The inverse transform method allows us to write $\mathbf{X} = h(s, \mathbf{U})$ for some smooth function $h$ (that depends on $s$) and a uniformly distributed vector $\mathbf{U}$ (which does not depend on $s$).

In summary, in order to use estimator (7), we must be able to: a) easily simulate from the truncated (or conditional) densities in (6) via the inverse transform method; and b) easily evaluate the normalizing constants of the conditional densities in (6).
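Requirements a) and b) can be checked in a toy setting with independent $\mathsf{Exp}(1)$ summands (not a Gaussian copula, but the truncated conditionals and their normalizing constants are then available in closed form). The Python sketch below (names are ours) simulates each component from its truncated conditional by inverse transform and accumulates the product of the normalizing constants, giving a smooth estimator of $F_S(s)$:

```python
import math
import random

def smooth_cdf_estimate(s, d, n, rng):
    """Smooth estimator of F_S(s) = P(X_1 + ... + X_d <= s) for independent
    Exp(1) summands: draw each X_k from Exp(1) truncated to [0, remaining budget]
    by inverse transform, and multiply the normalizing constants c_k."""
    total = 0.0
    for _ in range(n):
        remaining, prod = s, 1.0
        for _ in range(d):
            c = 1.0 - math.exp(-remaining)   # c_k = P(Exp(1) <= remaining)
            u = rng.random()
            x = -math.log(1.0 - u * c)       # inverse transform on [0, remaining]
            prod *= c
            remaining -= x
        total += prod                        # replicate of prod_k c_k
    return total / n

rng = random.Random(6)
d, s, n = 5, 4.0, 50_000
est = smooth_cdf_estimate(s, d, n, rng)
# Exact Erlang(d) cdf for comparison: 1 - exp(-s) * sum_{k<d} s^k / k!
exact = 1.0 - math.exp(-s) * sum(s ** k / math.factorial(k) for k in range(d))
```

Because each replicate is a smooth function of the underlying uniforms, the estimator can in principle be differentiated in $s$, which is the idea behind (7).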
Our finding is that satisfying requirement a) is too difficult for the family of Archimedean copulas. This is because no simple formulas exist for the inverse cdfs of the densities in (2), and thus numerical root-finding methods are required. Nevertheless, estimator (7) is viable for the Gaussian copula, under which the vector $\mathbf{Z} \sim \mathsf{N}(\mathbf{0},\Sigma)$ for some correlation matrix $\Sigma$. This is because the conditional densities of the multivariate normal density are also normal, and their evaluation requires only standard linear algebra manipulations of $\Sigma$.
As an example, Figure 5 shows the density of the sum of standard lognormal variates (that is, marginally each $X_k \sim \mathsf{LogNormal}(0,1)$), whose dependence is induced by a Gaussian copula with a given correlation matrix $\Sigma$.
7 Conclusion
In this paper we derived a sensitivity-based estimator of the pdf of the sum of (dependent) random variables and performed a short numerical comparison. To achieve this we used standard techniques from sensitivity analysis: we constructed an estimator of the cdf of the sum of random variables and then differentiated this cdf estimator in order to estimate the density. The cdf estimator was constructed using either a change of measure (giving us the likelihood ratio method), or conditional Monte Carlo (as with the Asmussen–Kroese estimator), or both (as with the Gaussian copula example).

Overall, the numerical comparison indicates that there is no single best estimator in all settings. Nevertheless, the proposed sensitivity estimator will likely be preferable in settings where $\nabla \ln f$ can be computed very quickly, and most useful when the conditional Monte Carlo approach is difficult to apply.

As future research we suggest exploring the use of quasi-random numbers in order to further reduce the variance of smooth density estimators such as (7).
References
 Abate and Whitt (2006) J. Abate, W. Whitt, A unified framework for numerically inverting Laplace transforms, INFORMS Journal on Computing 18 (2006) 408–421.
 Asmussen (2017) S. Asmussen, Conditional Monte Carlo for sums, with applications to insurance and finance, Annals of Actuarial Science (2017) 1–24.
 Asmussen and Albrecher (2010) S. Asmussen, H. Albrecher, Ruin probabilities, World Scientific Publishing Co Pte Ltd, 2010.
 Asmussen et al. (2011) S. Asmussen, J. Blanchet, S. Juneja, L. Rojas-Nandayapa, Efficient simulation of tail probabilities of sums of correlated lognormals, Annals of Operations Research 189 (2011) 5–23.
 Asmussen and Glynn (2007) S. Asmussen, P.W. Glynn, Stochastic Simulation: Algorithms and Analysis, volume 57, Springer, 2007.
 Asmussen and Kroese (2006) S. Asmussen, D.P. Kroese, Improved algorithms for rare event simulation with heavy tails, Adv. in Appl. Probab. 38 (2006) 545–558.
 Barndorff-Nielsen and Cox (1989) O.E. Barndorff-Nielsen, D.R. Cox, Asymptotic techniques for use in statistics, Monographs on Statistics and Applied Probability, Springer Science & Business Media, 1989.
 Botev et al. (2010) Z.I. Botev, J.F. Grotowski, D.P. Kroese, Kernel density estimation via diffusion, The Annals of Statistics 38 (2010) 2916–2957.
 Commission (2011) U.S. Nuclear Regulatory Commission, Applying Statistics. U.S. Nuclear Regulatory Commission Report NUREG-1475, Rev. 1, U.S. Nuclear Regulatory Commission, Washington, DC, 2011.
 Fischione et al. (2007) C. Fischione, F. Graziosi, F. Santucci, Approximation for a sum of on-off lognormal processes with wireless applications, IEEE Transactions on Communications 55 (2007) 1984–1993.
 Friel and Wyse (2012) N. Friel, J. Wyse, Estimating the evidence – a review, Statistica Neerlandica 66 (2012) 288–308.
 Glynn (1990) P.W. Glynn, Likelihood ratio gradient estimation for stochastic systems, Commun. ACM 33 (1990) 75–84.
 Klugman et al. (2012) S.A. Klugman, H.H. Panjer, G.E. Willmot, Loss models: from data to decisions, volume 715, John Wiley & Sons, 2012.
 Laub et al. (2016) P.J. Laub, S. Asmussen, J.L. Jensen, L. Rojas-Nandayapa, Approximating the Laplace transform of the sum of dependent lognormals, Advances in Applied Probability (2016).
 Laub et al. (2018) P.J. Laub, R. Salomone, Z.I. Botev, Source code and supplementary material for “Monte Carlo Estimation of the Density of the Sum of Dependent Random Variables”, 2018. Available at https://github.com/PatLaub/PushoutDensityEstimation.
 L’Ecuyer (1990) P. L’Ecuyer, A unified view of the IPA, SF, and LR gradient estimation techniques, Management Science 36 (1990) 1364–1383.
 McNeil et al. (2015) A.J. McNeil, R. Frey, P. Embrechts, Quantitative Risk Management: Concepts, Techniques and Tools, Princeton University Press, 2nd edition, 2015.
 Petrov (1975) V.V. Petrov, Sums of independent random variables, Springer Berlin Heidelberg, Berlin, Heidelberg, 1975.
 Reiman and Weiss (1989) M.I. Reiman, A. Weiss, Sensitivity analysis for simulations via likelihood ratios, Operations Research 37 (1989) 830–844.
 Rosenthal (2006) J.S. Rosenthal, A first look at rigorous probability theory, World Scientific Publishing Co Inc, 2006.
 Rubinstein (1986) R.Y. Rubinstein, The score function approach for sensitivity analysis of computer simulation models, Mathematics and Computers in Simulation 28 (1986) 351–379.
 Rüschendorf (2013) L. Rüschendorf, Mathematical risk analysis, Springer, 2013.
 Serfling (1980) R.J. Serfling, Approximation theorems of mathematical statistics, Wiley series in probability and mathematical statistics, Wiley, New York, 1980.
Appendix A Proof of Proposition 1
There are many ways to derive this formula. One of the simplest is to use the likelihood ratio method ((Asmussen and Glynn, 2007, Ch. VII, (4.1)); Glynn (1990); Reiman and Weiss (1989); Rubinstein (1986)), which requires the interchange of differentiation and integration. A general sufficient condition for this interchange to be valid is given in (L'Ecuyer, 1990, Theorem 1). The proof in this reference uses the dominated convergence theorem, which requires that the derivative of the integrand is uniformly bounded by an integrable function of $\mathbf{x}$. In our derivation below, we instead use the Fubini–Tonelli theorem, which only requires the integrability of $\langle \mathbf{x}, \nabla f(\mathbf{x})\rangle$ with respect to $\mathbf{x}$.
Define the cdf $F(s) = \mathbb{P}(S \le s)$, so that the pdf is $f_S = F'$. The change of variables $\mathbf{x} = s\mathbf{z}$ yields:
$$F(s) = |s|^d \int \mathbb{1}\{\mathbf{1}^\top \mathbf{z} \le_s 1\}\, f(s\mathbf{z})\, \mathrm{d}\mathbf{z},$$
where the notation $\le_s$ means $\le$ if $s > 0$, and $\ge$ for $s < 0$.
Let $g(s)$ denote the right-hand side of (1). We will use the fact that $F' = g$ almost everywhere (i.e. except possibly on sets of zero Lebesgue measure) on $[\varepsilon, \infty)$ for an arbitrarily small $\varepsilon > 0$.
In order to justify the identity $F' = g$ (almost everywhere) in the case $s > 0$ (similar arguments apply for $s < 0$), we use the Fubini–Tonelli theorem for exchanging the order of integration. This exchange holds under the integrability condition
(8) 
and the existence of a continuous $\nabla f$, both of which follow from Assumption 1 (verified at the end of this proof). Using the Fubini–Tonelli theorem Rosenthal (2006) we then write:
Hence, by the fundamental theorem of calculus, $g$ equals the derivative of $F$ up to a set of measure zero. In other words, $F' = g$ almost everywhere. To proceed, we write
so that, after a change of variables and using $\nabla \ln f = \nabla f / f$, we obtain
To verify (8), note that after using the change of variable above, it can be upper bounded by
which is bounded by assumption.