 # Monte Carlo Estimation of the Density of the Sum of Dependent Random Variables

We introduce a novel unbiased estimator for the density of a sum of random variables. Our estimator possesses several advantages over the conditional Monte Carlo approach. Specifically, it applies to the case of dependent random variables, allows for transformations of random variables, is computationally faster to run, and is simpler to implement. We provide several numerical examples that illustrate these advantages.


## 1 Introduction

Sums of random variables are fundamental to modeling stochastic phenomena. In finance, risk managers need to predict the distribution of a portfolio's future value, which is the sum of multiple assets; similarly, the distribution of the sum of an individual asset's returns over time is needed for the valuation of some exotic (e.g., Asian) options McNeil et al. (2015); Rüschendorf (2013). In insurance, the probability of ruin (i.e., bankruptcy) is determined by the distribution of aggregate losses (sums of individual claims of random size) Klugman et al. (2012); Asmussen and Albrecher (2010). Lastly, wireless system engineers model the total interference in a wireless communications network as the sum of all interfering signals (often lognormally distributed) Fischione et al. (2007).

In this article, we consider estimating the probability density function (pdf) of sums of random variables (rvs). That is, we wish to estimate the pdf of $S = X_1 + \cdots + X_n$, where $X = (X_1,\dots,X_n)^\top$ is simulated according to the joint pdf $f_X$. A major motivation for obtaining accurate pdf estimates of a rv is to produce confidence intervals for quantiles. For example, the US Nuclear Regulatory Commission specifies regulations in terms of the "95/95" rule, i.e., an upper confidence limit that covers the 95th percentile with 95% confidence Commission (2011). A standard approach is to compute the empirical cdf

$$\hat F_X(x) = \frac{1}{R}\sum_{r=1}^R I\{X^{[r]} \le x\} \qquad \text{for } X^{[1]},\dots,X^{[R]} \overset{\mathrm{iid}}{\sim} F_X,$$

and then the corresponding empirical quantile $\hat q_\alpha$. In the obvious notation, we then have the convergence in distribution:

$$\sqrt{R}\,(\hat q_\alpha - q_\alpha) \overset{\mathcal D}{\longrightarrow} \mathsf N\!\big(0,\ \alpha(1-\alpha)/[f_X(q_\alpha)]^2\big)$$

as $R \to \infty$, where the limiting variance depends on the unknown density $f_X$. Thus, any confidence interval for $q_\alpha$ requires estimation of the density $f_X$, which is a highly nontrivial problem.
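To make the role of the unknown density concrete, the following sketch (ours, not from the paper; the Gamma test case and all names are illustrative) builds such an asymptotic confidence interval for a quantile, with a plain KDE standing in for the density estimate:

```python
# Sketch: 95% confidence interval for the alpha-quantile of S, using the
# asymptotic variance alpha*(1 - alpha)/f_S(q_alpha)^2 with a plug-in
# density estimate.  Test case: S = sum of 10 iid Exp(1), so S ~ Gamma(10, 1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
R, n, alpha = 100_000, 10, 0.95

S = rng.exponential(size=(R, n)).sum(axis=1)
q_hat = np.quantile(S, alpha)                 # empirical alpha-quantile

f_hat = stats.gaussian_kde(S)(q_hat)[0]       # plug-in estimate of f_S(q_alpha)
se = np.sqrt(alpha * (1 - alpha) / f_hat**2 / R)
ci = (q_hat - 1.96 * se, q_hat + 1.96 * se)
```

Any density estimator can be plugged in here; the point of the paper is that an accurate, unbiased estimate of the density directly tightens such intervals.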

In general, the pdf of a sum of rvs is only available via an $(n-1)$-dimensional convolution integral. The convolution usually cannot be computed analytically (except in some special cases, e.g., iid gammas or normals) or numerically via quadrature (unless $n$ is very small). Approximations have long been applied to this problem in the iid case for large $n$. These include central limit theorems (CLTs) Petrov (1975), Edgeworth expansions Barndorff-Nielsen and Cox (1989), and inversion of integral transforms.

An Edgeworth expansion is a generalization of the CLT, which constructs a non-normal approximation based on the first few moments (equivalently, cumulants) of $S$ (the first term of the Edgeworth expansion is the CLT approximation, which may not be accurate for small $n$). Moreover, when the summands of $S$ are dependent, the moment sequence of $S$ is unknown and needs to be estimated (e.g., by Monte Carlo), and small errors in the approximation of the higher moments can lead to large errors in the density approximation. Hence, the method is not fully deterministic (as it may at first appear), and requires careful calibration of the number of terms used to avoid numerical instabilities.

Another common method is to construct the Laplace transform (or characteristic function) of $S$ and numerically invert it, using a method such as those described in Abate and Whitt (2006). However, when the summands are dependent, the Laplace transform of the sum is unknown, so one has to first estimate it (e.g., by Monte Carlo), and then numerically invert this approximation. Specialized methods have been developed for certain marginals and dependence structures (for example, the sum-of-lognormals case is considered by Laub et al. (2016)), but the general case remains too difficult.

Finally, Monte Carlo estimators such as Conditional Monte Carlo Asmussen (2017) and the Asmussen–Kroese estimator Asmussen and Kroese (2006) utilize details of $X$'s distribution to produce unbiased estimates with a dimension-independent rate of convergence of $\mathcal O(R^{-1/2})$.

The purpose of this work is to explore an unbiased Monte Carlo estimator for the problem, with a focus on dependent summands.

The estimator is based on treating the pdf estimation problem as a derivative (of the cumulative distribution function) estimation problem. There are several advantages to the proposed estimator. First, we show that in certain settings it enjoys smaller variance than estimators based on the Conditional Monte Carlo approach. Second, the estimator only requires evaluation of the joint pdf up to a (typically unknown) normalizing constant, a situation similar to the application of Markov chain Monte Carlo. As a result, the sensitivity-based approach is useful for estimating posterior marginal densities in Bayesian inference (Section 5). The source code used in this paper is available online Laub et al. (2018).

In our notation, we use lowercase boldface letters such as $\mathbf x$, $\mathbf y$ for non-random vectors, uppercase boldface letters such as $\mathbf X$ for random vectors, and $\mathbf 1$ for the vector of 1's. If $\mathbf x$ is of length $n$, we write $\mathbf x = (x_1,\dots,x_n)^\top$. The inner product is denoted $\mathbf x\cdot\mathbf y$. For a differentiable function $f:\mathbb R^n\to\mathbb R$, we write

$$\nabla f(\mathbf z) = \big(\partial f(\mathbf x)/\partial x_1,\dots,\partial f(\mathbf x)/\partial x_n\big)^\top\Big|_{\mathbf x = \mathbf z},$$

and use $[\nabla f]_i$ to denote the $i$-th component of $\nabla f$.

## 2 Sensitivity Estimator

The estimator is derived from a simple application of the Likelihood Ratio method Glynn (1990); Reiman and Weiss (1989), also known as the Score Function method Rubinstein (1986), that is typically used for derivative estimation of performance measures in Discrete Event Systems. We thus tackle the pdf estimation problem by viewing it as a special type of sensitivity analysis. The basic idea appears in (Asmussen and Glynn, 2007, Chapter VII, Example 5.7), and our contribution is to weaken a technical condition and use a control variate to reduce variance. Furthermore, for the Gaussian copula case we introduce a novel conditional Monte Carlo estimator that is infinitely smooth (which may be useful in quasi-Monte Carlo Asmussen and Glynn (2007)).

###### Assumption 1.

The random vector $X$ has a density $f_X$, each $X_i$ is supported either on the entire real line or a half-line, the gradient $\nabla f_X$ is a continuous function on the support of $f_X$, and we have the integrability condition $\mathbb E\,|X\cdot\nabla\log f_X(X)| < \infty$ (here $\nabla\log f_X = \nabla f_X/f_X$).

This assumption is slightly weaker than the one in (Asmussen and Glynn, 2007, Prop. 3.5 on page 222), which requires that the integrand appearing in the proof is uniformly bounded by an integrable function of $y$. The proposed estimator is based on the following simple formula, proved in the appendix.

###### Proposition 1.

For the rv $S = X_1 + \cdots + X_n = \mathbf 1\cdot X$, where $X$ satisfies Assumption 1,

$$f_S(s) = \frac{1}{s}\,\mathbb E\Big\{I\{\mathbf 1\cdot X \le s\}\,\big[X\cdot\nabla\log f_X(X) + n\big]\Big\} \qquad (1)$$

for any $s \neq 0$.

It is straightforward to show that (1) still holds if the indicator $I\{\mathbf 1\cdot X \le s\}$ is replaced by $-I\{\mathbf 1\cdot X > s\}$ (a consequence of $\mathbb E[X\cdot\nabla\log f_X(X) + n] = 0$). This suggests a pair of unbiased estimators of $f_S(s)$, each obtained by averaging the corresponding expression over iid replicates of $X$.

We make use of both of these estimators by using one as a base estimator and the difference of the two as a control variate (the difference has a known expectation, namely zero) Asmussen and Glynn (2007). In order to ensure unbiasedness, we may, for example, obtain the control variate coefficient from a pilot (independent) sample, as explained in Section 4.

## 3 Conditional Monte Carlo Methods

In the following Sections 3.1 and 3.2 we describe the Conditional Monte Carlo approach Asmussen (2017), as well as an extension of the Asmussen–Kroese estimator. We then use these methods as benchmarks to illustrate the performance of the proposed estimator in various settings.

### 3.1 Conditional Monte Carlo estimator

The Conditional Monte Carlo estimator Asmussen (2017) takes the form

$$\hat f_{\mathrm{Cond}}(s) = \frac{1}{n}\sum_{i=1}^n f_{X_i\mid X_{-i}}(s - S_{-i}), \qquad X \sim F_X,$$

where $X_{-i}$ denotes the vector $X$ with the $i$-th component removed and $S_{-i} := \sum_{j\neq i} X_j$. This is particularly simple in the independent case, as $f_{X_i\mid X_{-i}} = f_{X_i}$.
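For instance, in the independent case the estimator can be sketched as follows (our illustration, with iid Exp(1) summands so that the answer can be checked against the Gamma density):

```python
# Sketch (ours): Conditional Monte Carlo for independent summands, where
# f_{X_i | X_{-i}} = f_{X_i}.  With n iid Exp(1) summands the target density
# is Gamma(n, 1), which makes the output easy to verify.
import numpy as np

rng = np.random.default_rng(2)
n, R, s = 10, 50_000, 8.0

X = rng.exponential(size=(R, n))
S_minus = X.sum(axis=1, keepdims=True) - X                       # S_{-i}, one column per i
arg = s - S_minus
dens = np.where(arg >= 0, np.exp(-np.clip(arg, 0, None)), 0.0)   # f_{X_i}(s - S_{-i})
f_hat = dens.mean()                                              # average over i and replicates
```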

We now examine the dependent case where $X$'s dependence structure is given by an Archimedean copula with generator $\psi$; i.e., the cdf satisfies

$$\mathbb P\big(X_1 \le F_{X_1}^{-1}(u_1),\dots,X_n \le F_{X_n}^{-1}(u_n)\big) = \phi\Big(\sum_{i=1}^n \psi(u_i)\Big), \qquad u \in [0,1]^n,$$

where $\phi$ is the functional inverse of $\psi$. The conditional densities of $X$ can be calculated from the formula ($\phi^{(k)}$ denotes the $k$-th derivative)

$$f_{X_i\mid X_{-i}}(x_i\mid x_{-i}) = f_{X_i}(x_i)\,\psi^{(1)}\big(F_{X_i}(x_i)\big)\,\frac{\phi^{(n)}\big(\sum_{j=1}^n \psi(F_{X_j}(x_j))\big)}{\phi^{(n-1)}\big(\sum_{j\neq i}\psi(F_{X_j}(x_j))\big)}. \qquad (2)$$

Some Archimedean copulas, such as the Clayton and Gumbel–Hougaard copulas, have what is called a Marshall–Olkin representation. An Archimedean copula is in the Marshall–Olkin representation class if $\phi(t) = \mathbb E\,\mathrm e^{-tZ}$ for some positive rv $Z$ with cdf $F_Z$ (that is, $\phi$ is the Laplace transform of $Z$). Then an $X$ with this dependence structure can be simulated via

$$X = \Big(F_{X_1}^{-1}\big(\phi(E_1/Z)\big),\dots,F_{X_n}^{-1}\big(\phi(E_n/Z)\big)\Big), \qquad E_i \overset{\mathrm{iid}}{\sim} \mathsf{Exp}(1),\quad Z \sim F_Z. \qquad (3)$$
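As an illustration, here is a sketch of (3) for the Clayton copula, under the convention (ours) that $\phi(t) = (1+t)^{-1/\theta}$, which is the Laplace transform of $Z \sim \mathsf{Gamma}(1/\theta, 1)$; all function names are ours:

```python
# Sketch (ours) of the Marshall-Olkin algorithm (3) for a Clayton(theta)
# copula: with the convention phi(t) = (1 + t)^(-1/theta), phi is the Laplace
# transform of Z ~ Gamma(1/theta, 1), and U_i = phi(E_i / Z) are dependent
# Uniform(0, 1) variables; marginal quantile functions then give X.
import numpy as np

rng = np.random.default_rng(3)

def clayton_sample(R, n, theta, quantile_fn):
    Z = rng.gamma(1.0 / theta, size=(R, 1))      # mixing variable, one per row
    E = rng.exponential(size=(R, n))
    U = (1.0 + E / Z) ** (-1.0 / theta)          # Clayton-dependent uniforms
    return quantile_fn(U)

# Exp(1) marginals via the inverse cdf u -> -log(1 - u)
X = clayton_sample(20_000, 5, theta=2.0, quantile_fn=lambda u: -np.log1p(-u))
```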

For this case, Asmussen (Asmussen, 2017, Proposition 8.3) conditions upon the $X_{-i}$ as well as $Z$ to obtain what we call the extended Conditional Monte Carlo estimator

$$\hat f_{\mathrm{ExtCond}}(s) = \frac{1}{n}\sum_{i=1}^n f_{X_i\mid X_{-i},Z}(s - S_{-i}), \qquad (4)$$

where $S_{-i} := \sum_{j\neq i} X_j$ and $X$ is given by (3).

We will use this estimator as a benchmark in our comparisons later on.

### 3.2 Asmussen–Kroese estimator

The Asmussen–Kroese estimator Asmussen and Kroese (2006) (typically used for tail probabilities) is defined as

$$\hat F_{\mathrm{AK}}(s) = 1 - \sum_{i=1}^n \bar F_{X_i\mid X_{-i}}\big(\max\{M_{-i},\, s - S_{-i}\}\big),$$

where $M_{-i} := \max_{j\neq i} X_j$ and $\bar F := 1 - F$. Each $\max\{M_{-i}, s - S_{-i}\}$ equals $s - S_{-i}$ whenever $M_{-i} + S_{-i} \le s$. Thus, we can take the derivative of this piecewise estimator to obtain

$$\hat f_{\mathrm{AK}}(s) = \sum_{i=1}^n f_{X_i\mid X_{-i}}(s - S_{-i})\, I\{M_{-i} + S_{-i} \le s\},$$

which can be viewed as an alternative conditional estimator. When it is applicable, we use the "extended" form of this estimator where $X_{-i}$ is replaced with $(X_{-i}, Z)$ as in Section 3.1. Notice that the $1/n$ term in (4) does not appear here. We remark that (to the best of our knowledge) this variant of the AK estimator for estimation of a density has not been previously considered.
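A sketch (ours) of $\hat f_{\mathrm{AK}}$ in the independent iid Exp(1) case, where the conditional density reduces to the marginal:

```python
# Sketch (ours) of the Asmussen-Kroese density estimator in the independent
# iid case (the conditional density reduces to the marginal, here Exp(1)).
import numpy as np

rng = np.random.default_rng(4)
n, R, s = 10, 50_000, 8.0

X = rng.exponential(size=(R, n))
S_minus = X.sum(axis=1, keepdims=True) - X
srt = np.sort(X, axis=1)
M_minus = np.where(X == srt[:, [-1]], srt[:, [-2]], srt[:, [-1]])  # max of the others

arg = s - S_minus
dens = np.where(arg >= 0, np.exp(-np.clip(arg, 0, None)), 0.0)
f_hat = (dens * (M_minus + S_minus <= s)).sum(axis=1).mean()       # note: no 1/n factor
```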

## 4 Numerical Comparisons

In this section, for various distributions of $X$ we compare: i) our proposed method, ii) the Conditional MC estimator, and iii) the Asmussen–Kroese (AK) estimator.

We conduct three experiments, each depicted in Figures 1 to 3 below. Each experiment uses $R$ iid replicates of $X$ which are common to all estimators (our estimator uses the first 5% of the replicates as the pilot sample for the control variate coefficient).

For each experiment we display a subplot of the estimated density function, as well as the estimated standard deviation and (square root of the) work-normalized relative variance (WNRV), that is, the estimator's relative variance multiplied by the CPU time, where the CPU time is the (wall-clock) time taken by the method to produce the estimates over the grid of 50 points.

These examples show sums with dependent summands. When the copula has a Marshall–Olkin representation (3) we use it to simulate and give results for the extended version (4) of the conditional MC estimator.

Figure 1 considers the sum of dependent identically-distributed heavy-tailed variables. The density-estimate panel shows that the estimators essentially agree with one another, as is to be expected when all methods perform well. In terms of WNRV and standard deviation, the sensitivity estimator outperforms the others.

Figure 1: Sum of n=10 Weibull(0.3,1) random variables with a Clayton(1/5) copula.

Figure 2 considers a sum of dependent light-tailed variables. The results here are similar to those of Figure 1. Again, the sensitivity estimator outperforms the others on WNRV and standard deviation.

Figure 2: Sum of n=15 Exp(1) random variables with a Gumbel–Hougaard(5) copula.

Figure 3: Sum of n=10 random variables where Xi ∼ Lognormal(i−10, √i) with a Frank(1/1000) copula. The choice of marginals mimics the challenging (and somewhat pathological) example considered in Asmussen et al. (2011).

Figure 3 shows the sum of dependent heavy-tailed variables. Instead of the standard multivariate lognormal distribution, which has a Gaussian copula, we take the Frank copula. The Frank copula is unique among these tests in that it is an Archimedean copula which lacks a Marshall–Olkin representation.

## 5 Extension to Estimation of Marginal Densities

One extension of the sensitivity estimator is to the estimation of marginal densities, which has multiple applications in Bayesian statistics. For an $X$ which satisfies Assumption 1, a derivation similar to the one in Proposition 1 gives the following representation of the marginal densities:

$$f_{X_k}(x) = \frac{1}{x}\,\mathbb E\Big\{I\{X_k \le x\}\Big[X_k\,\frac{\partial \log f_X(X)}{\partial x_k} + 1\Big]\Big\} \qquad (5)$$

for $x \neq 0$ and $k = 1,\dots,n$. We use the estimator with associated control variate that is based on (5). A nice feature of the corresponding estimator is that, due to the presence of the $\nabla\log f_X$ term, the normalizing constant of $f_X$ need not be known. As an example, we use Markov chain Monte Carlo to obtain samples from the posterior density of a Bayesian model, and use these to estimate the posterior marginal pdfs with our sensitivity estimator.
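A sketch (ours) of the marginal estimator under the representation above, assuming the form $f_{X_k}(x) = \frac1x\mathbb E\{I\{X_k\le x\}[X_k\,\partial_k\log f_X(X)+1]\}$: only $\partial\log f_X/\partial x_k$ is needed, so any unknown normalizing constant drops out. Exact bivariate-normal samples stand in for MCMC draws:

```python
# Sketch (ours) of the marginal-density estimator: only the partial
# derivative of log f_X is needed, so any unknown normalizing constant drops
# out.  Test case: first marginal of a correlated bivariate normal at x = 1
# (exact samples stand in for MCMC draws); the truth is the N(0,1) pdf at 1.
import numpy as np

rng = np.random.default_rng(5)
R, rho, x = 200_000, 0.5, 1.0

cov = np.array([[1.0, rho], [rho, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], cov, size=R)

# d log f_X / d x_1 for the bivariate normal (any additive constant vanishes)
dlogf_1 = -(X[:, 0] - rho * X[:, 1]) / (1.0 - rho**2)
f_hat = np.mean((X[:, 0] <= x) * (X[:, 0] * dlogf_1 + 1.0)) / x
```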

We consider the well-known "Pima Indians" dataset (standardized), which records a binary response variable (the incidence of diabetes) for a sample of women, along with seven possible predictors. We specify a logistic regression model with predictors: Number of Pregnancies, Plasma Glucose Concentration, Body Mass Index, Diabetes Pedigree Function, and Age (see Friel and Wyse (2012) for justification). The prior is the same as that of Friel and Wyse (2012).

To obtain samples from the posterior density, we implement an isotropic random walk sampler, using a radially symmetric Gaussian proposal density with a step size for which trace plots indicate good mixing for this model.

We ran the random walk sampler for an initial burn-in period, then used the subsequent samples (without any thinning) to obtain a kernel density estimate (KDE), as well as density estimates using our sensitivity estimator (with control variate). As a benchmark, we compare the accuracy with a KDE constructed using a thinned subsample of a much longer MCMC chain. The result of this comparison is depicted in Figure 4.

Figure 4: Density estimation of the posterior marginal corresponding to the coefficient of the Body Mass Index predictor variable (results from two runs are shown).

As expected, using the same set of samples, the sensitivity estimator yields a more accurate estimate than the KDE. The reason for the lower accuracy of the KDE in this context is well known: a mean square error convergence rate of $\mathcal O(R^{-4/5})$, instead of the canonical Monte Carlo rate of $\mathcal O(R^{-1})$, due to the presence of non-negligible bias in the KDE estimator (see Botev et al. (2010), for example).

It is important to note that, due to the $1/s$ term, the sensitivity estimator can have large variance for very small $|s|$, even when $f_S(s)$ itself is not close to zero. This problem can be resolved with a simple linear shift, as follows. If one summand, say $X_1$, is supported on $[\gamma,\infty)$ for some $\gamma > 0$, then $f_S(s) = f_{\tilde S}(s-\gamma)$, where $\tilde S := (X_1-\gamma) + X_2 + \cdots + X_n$. We can then use the original estimator (with shifted values of $X_1$ and $s$) to obtain estimates of the density of $S$ near or at zero.

## 6 Extensions for a Gaussian Copula with Positive Random Variables

Recall that we wish to estimate the derivative of the cdf $F_S(s) = \mathbb P(S \le s)$, where $S = X_1 + \cdots + X_n$. Here we consider random variables that are positive and thus consider $s > 0$. We can then write $S = s(Y_1 + \cdots + Y_n)$, where $Y \overset{\mathcal D}{=} X/s$ (that is, $Y$ has the same distribution as $X/s$). When $S \le s$, the nested sequence of events:

$$\{Y_1\le 1\} \supseteq \{Y_2 \le 1-Y_1\} \supseteq \{Y_3\le 1-Y_1-Y_2\} \supseteq \cdots \supseteq \{Y_n \le 1-(Y_1+\cdots+Y_{n-1})\}$$

suggests the possibility of sequentially simulating the components of the vector $Y$:

$$Y_1 \to Y_2\mid Y_1 \to Y_3\mid (Y_1,Y_2) \to \cdots \to Y_n\mid(Y_1,\dots,Y_{n-1}).$$

In other words, for $k = 1,\dots,n$, each $Y_k$ is drawn from the conditional density:

$$g_k(y_k; s\mid y_1,\dots,y_{k-1}) := \frac{f_{Y_k\mid Y_1,\dots,Y_{k-1}}(y_k; s\mid y_1,\dots,y_{k-1})\; I\big\{y_k \le 1-\sum_{j<k} y_j\big\}}{\alpha_k(y_1,\dots,y_{k-1}; s)}, \qquad (6)$$

where the $\alpha_k(y_1,\dots,y_{k-1};s)$ are normalizing constants that depend on $s$.

If we then simulate

$$Y \sim g(y; s) := \prod_k g_k(y_k; s\mid y_1,\dots,y_{k-1}),$$

we obtain the smooth unbiased estimator of the cdf, $\hat F_S(s) = \prod_{k=1}^n \alpha_k(Y_1,\dots,Y_{k-1}; s)$. In addition, an application of the likelihood ratio method Asmussen and Kroese (2006); Glynn (1990); Reiman and Weiss (1989); Rubinstein (1986) gives us an estimator similar to (1), but without the indicator function:

$$\hat f_S(s) = \frac{Y\cdot\nabla\log f_X(sY) + n}{s}\;\prod_{k=1}^n \alpha_k(Y_1,\dots,Y_{k-1}; s). \qquad (7)$$

Note that the density of $Y$ depends on the specific $s$ at which we estimate $f_S$. We can use the inverse transform method to remove this dependence on $s$: it allows us to write $Y = h(U; s)$ for some smooth function $h$ (that depends on $s$) and a uniformly distributed vector $U$ (which does not depend on $s$).

In summary, in order to use estimator (7), we must be able to: a) easily simulate from the truncated (or conditional) densities in (6) via the inverse transform method; and b) easily evaluate the normalizing constants of the conditional densities in (6).

Our finding is that satisfying requirement a) is too difficult for the family of Archimedean copulas. This is because no simple formulas exist for the inverse cdfs of the densities in (2), and thus numerical root-finding methods are required. Nevertheless, estimator (7) is viable for the Gaussian copula, under which the vector $X$ is a componentwise monotone transformation of a multivariate normal vector $Z \sim \mathsf N(0,\Sigma)$ for some correlation matrix $\Sigma$. This is because the conditional densities of the multivariate normal are also normal, and their evaluation requires only standard linear algebra manipulations of $\Sigma$.
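To illustrate why the Gaussian case is tractable, the following sketch (ours, not the paper's code) uses the sequential conditional-normal decomposition with truncation to estimate a multivariate normal orthant probability as $\mathbb E\prod_k \alpha_k$, the cdf-level analogue of the product appearing in (7); each $\alpha_k$ is a univariate normal cdf value:

```python
# Sketch (ours) of why the Gaussian case is tractable: the conditional
# distributions of Z ~ N(0, Sigma) are normal, so each truncated conditional
# in the sequential scheme has a normalizing constant alpha_k given by a
# univariate normal cdf.  The product of the alpha_k is an unbiased, smooth
# estimator of P(Z_1 <= c_1, ..., Z_n <= c_n).
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

def truncated_mvn_prob(Sigma, c, R=20_000):
    n = len(c)
    probs = np.ones(R)
    Z = np.zeros((R, n))
    for k in range(n):
        # conditional N(mu, s2) of Z_k given Z_1, ..., Z_{k-1}
        s12 = Sigma[:k, k]
        w = np.linalg.solve(Sigma[:k, :k], s12) if k else np.zeros(0)
        mu = Z[:, :k] @ w
        s2 = Sigma[k, k] - s12 @ w
        alpha = stats.norm.cdf(c[k], loc=mu, scale=np.sqrt(s2))
        probs *= alpha                        # accumulate the alpha_k product
        # inverse-transform draw from the conditional, truncated to (-inf, c_k]
        U = rng.uniform(size=R)
        Z[:, k] = stats.norm.ppf(U * alpha, loc=mu, scale=np.sqrt(s2))
    return probs.mean()
```

For the bivariate case with correlation $\rho$ and $c = (0,0)$, the exact orthant probability is $1/4 + \arcsin(\rho)/(2\pi)$, which can be used to check the sketch.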

As an example, Figure 5 shows the density of the sum of $n = 32$ standard lognormal variates (that is, marginally each $X_i \sim \mathsf{Lognormal}(0,1)$), whose dependence is induced by a Gaussian copula with correlation matrix satisfying $\Sigma_{ij} = \rho$ for $i \neq j$.

Figure 5: The density of the sum of 32 dependent lognormal random variables for three correlation values ρ. The estimate is based on the average of R=10^5 iid replications of (7).

## 7 Conclusion

In this paper we derived a sensitivity-based estimator of the pdf of the sum of (dependent) random variables and performed a short numerical comparison. To achieve this we used standard techniques from sensitivity analysis — we constructed an estimator of the cdf of the sum of random variables and then differentiated this cdf estimator in order to estimate the density. The cdf estimator was constructed using either a change of measure (giving us the likelihood ratio method), or conditional Monte Carlo (as with the Asmussen–Kroese estimator), or both (as with the Gaussian copula example).

Overall, the numerical comparison indicates that there is no single best estimator in all settings. Nevertheless, the proposed sensitivity estimator will likely be preferable in settings where the score $\nabla\log f_X$ can be computed very quickly, and most useful when the conditional Monte Carlo approach is difficult to apply.

As future research we suggest exploring the use of quasirandom numbers in order to further reduce the variance of any smooth density estimators such as (7).

## References

• Abate and Whitt (2006) J. Abate, W. Whitt, A unified framework for numerically inverting Laplace transforms, INFORMS Journal on Computing 18 (2006) 408–421.
• Asmussen (2017) S. Asmussen, Conditional Monte Carlo for sums, with applications to insurance and finance, Annals of Actuarial Science (2017) 1–24.
• Asmussen and Albrecher (2010) S. Asmussen, H. Albrecher, Ruin probabilities, World Scientific Publishing Co Pte Ltd, 2010.
• Asmussen et al. (2011) S. Asmussen, J. Blanchet, S. Juneja, L. Rojas-Nandayapa, Efficient simulation of tail probabilities of sums of correlated lognormals, Annals of Operations Research 189 (2011) 5–23.
• Asmussen and Glynn (2007) S. Asmussen, P.W. Glynn, Stochastic Simulation: Algorithms and Analysis, volume 57, Springer, 2007.
• Asmussen and Kroese (2006) S. Asmussen, D.P. Kroese, Improved algorithms for rare event simulation with heavy tails, Adv. in Appl. Probab. 38 (2006) 545–558.
• Barndorff-Nielsen and Cox (1989) O.E. Barndorff-Nielsen, D.R. Cox, Asymptotic techniques for use in statistics, Monographs on Statistics and Applied Probability, Springer Science & Business Media, 1989.
• Botev et al. (2010) Z.I. Botev, J.F. Grotowski, D.P. Kroese, Kernel density estimation via diffusion, The Annals of Statistics 38 (2010) 2916–2957.
• Commission (2011) U.S. Nuclear Regulatory Commission, Applying Statistics. Report NUREG-1475, Rev. 1, U.S. Nuclear Regulatory Commission, Washington, DC, 2011.
• Fischione et al. (2007) C. Fischione, F. Graziosi, F. Santucci, Approximation for a sum of on-off lognormal processes with wireless applications, IEEE Transactions on Communications 55 (2007) 1984–1993.
• Friel and Wyse (2012) N. Friel, J. Wyse, Estimating the evidence – a review, Statistica Neerlandica 66 (2012) 288–308.
• Glynn (1990) P.W. Glynn, Likelihood ratio gradient estimation for stochastic systems, Commun. ACM 33 (1990) 75–84.
• Klugman et al. (2012) S.A. Klugman, H.H. Panjer, G.E. Willmot, Loss models: from data to decisions, volume 715, John Wiley & Sons, 2012.
• Laub et al. (2016) P.J. Laub, S. Asmussen, J.L. Jensen, L. Rojas-Nandayapa, Approximating the Laplace transform of the sum of dependent lognormals, Advances in Applied Probability. (2016).
• Laub et al. (2018) P.J. Laub, R. Salomone, Z.I. Botev, Source code and supplementary material for “Monte Carlo Estimation of the Density of the Sum of Dependent Random Variables”, 2018. Available at https://github.com/Pat-Laub/PushoutDensityEstimation.
• L’Ecuyer (1990) P. L’Ecuyer, A unified view of the IPA, SF, and LR gradient estimation techniques, Management Science 36 (1990) 1364–1383.
• McNeil et al. (2015) A.J. McNeil, R. Frey, P. Embrechts, Quantitative Risk Management: Concepts, Techniques and Tools, Princeton University Press, 2nd edition, 2015.
• Petrov (1975) V.V. Petrov, Sums of independent random variables, Springer Berlin Heidelberg, Berlin, Heidelberg, 1975.
• Reiman and Weiss (1989) M.I. Reiman, A. Weiss, Sensitivity analysis for simulations via likelihood ratios, Operations Research 37 (1989) 830–844.
• Rosenthal (2006) J.S. Rosenthal, A first look at rigorous probability theory, World Scientific Publishing Co Inc, 2006.

• Rubinstein (1986) R.Y. Rubinstein, The score function approach for sensitivity analysis of computer simulation models, Mathematics and Computers in Simulation 28 (1986) 351–379.
• Rüschendorf (2013) L. Rüschendorf, Mathematical risk analysis, Springer, 2013.
• Serfling (1980) R.J. Serfling, Approximation theorems of mathematical statistics, Wiley series in probability and mathematical statistics, Wiley, New York, 1980.

## Appendix A Proof of Proposition 1

There are many ways to derive this formula. One of the simplest is to use the likelihood ratio method ((Asmussen and Glynn, 2007, Ch. VII, (4.1)); Glynn (1990); Reiman and Weiss (1989); Rubinstein (1986)), which requires an interchange of differentiation and integration. A general sufficient condition for this interchange to be valid is given in (L'Ecuyer, 1990, Theorem 1). The proof in this reference uses the dominated convergence theorem, which requires that the integrand $\frac{\mathrm d}{\mathrm dt}\big(f_X(ty)\,t^n\big)$ is uniformly bounded by an integrable function of $y$. In our derivation below, we instead use the Fubini–Tonelli theorem, which only requires the integrability of this derivative with respect to $(t,y)$.

Define the cdf $F_S(s) := \mathbb P(\mathbf 1\cdot X \le s)$, so that the pdf is $f_S = F_S'$. The change of variables $x = sy$ yields:

$$F_S(s) = \int_{R_s} f_X(sy)\,|s|^n\,\mathrm dy, \qquad s \neq 0,$$

where the notation $R_s$ means $\{y : \mathbf 1\cdot y \le 1\}$ if $s > 0$, else $\{y : \mathbf 1\cdot y \ge 1\}$ for $s < 0$.

Let $\varphi(t) := \int_{\mathbf 1\cdot y \le 1} \frac{\mathrm d}{\mathrm dt}\big(f_X(ty)\,t^n\big)\,\mathrm dy$. We will use the fact that $F_S' = \varphi$ almost everywhere (i.e., except possibly on sets of zero Lebesgue measure) on $[\epsilon, s]$ for an arbitrarily small $\epsilon > 0$.

In order to justify the identity $F_S' = \varphi$ (almost everywhere) in the case of $s > 0$ (similar arguments apply for $s < 0$), we use the Fubini–Tonelli theorem for exchanging the order of integration. This exchange holds under the integrability condition

$$\int_\epsilon^s \int_{\mathbf 1\cdot y \le 1} \Big|\frac{\mathrm d}{\mathrm dt}\big(f_X(ty)\,t^n\big)\Big|\,\mathrm dy\,\mathrm dt < \infty \qquad (8)$$

and the existence of a continuous $\nabla f_X$, both of which follow from Assumption 1 (verified at the end of this proof). Using the Fubini–Tonelli theorem Rosenthal (2006), we then write:

$$\int_\epsilon^s \varphi(t)\,\mathrm dt = \int_\epsilon^s \int_{\mathbf 1\cdot y \le 1} \frac{\mathrm d}{\mathrm dt}\big(f_X(ty)\,t^n\big)\,\mathrm dy\,\mathrm dt = \int_{\mathbf 1\cdot y \le 1} \int_\epsilon^s \frac{\mathrm d}{\mathrm dt}\big(f_X(ty)\,t^n\big)\,\mathrm dt\,\mathrm dy = \int_{\mathbf 1\cdot y \le 1} \big(f_X(sy)\,s^n - f_X(\epsilon y)\,\epsilon^n\big)\,\mathrm dy = F_S(s) - F_S(\epsilon).$$

Hence, by the fundamental theorem of calculus, $\varphi$ equals the derivative of $F_S$ up to a set of measure zero. In other words, $f_S = \varphi$ almost everywhere. To proceed, we write

$$f_S(s) = \varphi(s) = \int_{\mathbf 1\cdot y \le 1} \Big[y\cdot\nabla f_X(sy)\,s^n + n\,s^{n-1} f_X(sy)\Big]\,\mathrm dy,$$

so after the change of variables $x = sy$ and using $\nabla f_X = f_X\,\nabla\log f_X$, we obtain

$$f_S(s) = \int_{\mathbf 1\cdot x \le s} \Big[\frac{x}{s}\cdot\nabla\log f_X(x) + \frac{n}{s}\Big]\,f_X(x)\,\mathrm dx = \frac{1}{s}\,\mathbb E\Big\{I\{\mathbf 1\cdot X \le s\}\big[X\cdot\nabla\log f_X(X) + n\big]\Big\}.$$
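The $n = 1$, $X \sim \mathsf{Exp}(1)$ case of this identity can be verified symbolically (a quick check of ours, not part of the original proof):

```python
# A quick symbolic check (ours) of the final identity for n = 1 and
# X ~ Exp(1): the right-hand side should reduce to f_S(s) = exp(-s), s > 0.
import sympy as sp

x, s = sp.symbols("x s", positive=True)
f = sp.exp(-x)                                   # density of X
score = x * sp.diff(sp.log(f), x) + 1            # X * (log f)'(X) + n with n = 1
rhs = sp.integrate(score * f, (x, 0, s)) / s     # (1/s) E[ I{X <= s} (...) ]
assert sp.simplify(rhs - sp.exp(-s)) == 0
```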

To verify (8), note that after the same change of variables it can be upper bounded via

$$\int_\epsilon^s \frac{1}{t}\,\mathbb E\Big\{I\{\mathbf 1\cdot X \le t\}\,\big|X\cdot\nabla\log f_X(X) + n\big|\Big\}\,\mathrm dt \le \Big(\mathbb E\big|X\cdot\nabla\log f_X(X)\big| + n\Big)\int_\epsilon^s \frac{\mathrm dt}{t} < \infty,$$

where the right-hand side is finite by Assumption 1.