1 Introduction
We address in this article the problem of constructing a numerical approximation of the expectation $\mathbb{E}[g(X)]$, where $X$ is a real random variable with known distribution $P_X$, and $g$ is a real function that is bounded and monotone. Such a problem occurs naturally in applications where one is interested in computing a risk using a model that provides an increasing conditional risk with respect to some random variable $X$. This situation occurs for instance in the field of food safety, with $X$ a dose of pathogen and $g(X)$ the corresponding probability of food-borne illness
(see, e.g., Perrin et al., 2014). Assuming that the cumulative distribution function of $X$ is continuous, the problem reduces, after a change of variable and scaling, to the computation of

$$I(f) \;=\; \int_0^1 f(u)\,\mathrm{d}u \;=\; \mathbb{E}\bigl[f(U)\bigr],$$

where $U$ is uniformly distributed on $[0,1]$ and $f$ belongs to the class $\mathcal{F}$ of all non-decreasing functions defined on $[0,1]$ and taking values in $[0,1]$. We work in this article in a fixed sample-size setting, where the number $n$ of evaluations of $f$ to be performed is chosen beforehand. This problem was first studied by Kiefer (1957), who proved that considering regularly-spaced evaluations at $x_i = i/(n+1)$, $1 \le i \le n$, and then using the trapezoidal integration rule assuming $f(0) = 0$ and $f(1) = 1$, is optimal in the worst-case sense among all deterministic (possibly sequential) methods. The corresponding value of the maximal error is $\frac{1}{2(n+1)}$. Novak (1992) later studied Monte Carlo (a.k.a. randomized) methods, and established that sequential methods are better in this setting than nonsequential ones, with a minimax rate of $n^{-3/2}$ over $\mathcal{F}$ for the error.
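For concreteness, Kiefer's rule can be sketched in a few lines of Python (the function name and interface are ours, not from the articles cited):

```python
def kiefer_trapezoid(f, n):
    """Trapezoidal estimate of the integral of a non-decreasing f on [0, 1],
    from n evaluations at the regularly spaced points i/(n+1), with the
    boundary values replaced by the assumed f(0) = 0 and f(1) = 1."""
    h = 1.0 / (n + 1)
    xs = [i * h for i in range(n + 2)]             # grid 0, h, 2h, ..., 1
    ys = [0.0] + [f(x) for x in xs[1:-1]] + [1.0]  # n interior evaluations
    return sum(h * (ys[i] + ys[i + 1]) / 2 for i in range(n + 1))
```

For a unit step function jumping at $\frac{1}{2}$ and $n = 9$, the estimate is $0.55$, i.e., an error of $0.05 = \frac{1}{2(n+1)}$, the worst-case value quoted above.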
This article revisits the nonsequential setting with a focus on unbiased Monte Carlo methods, which are a key building block for the construction of good (rate-optimal) sequential methods, as can be learned from the proof of Theorem 3 in Novak's article. Section 2 derives a lower bound for the maximal $L^p$ error of nonsequential methods, for any $p \in [1, \infty)$, which generalizes a result of Novak (1992) concerning the $L^1$ error. Sections 3 and 4 then study the maximal $L^2$ error (variance) of two simple unbiased methods, based respectively on the control variate technique and on stratification. Section 5 concludes the article with a discussion.

2 A lower bound for the maximal error
A nonsequential (also called non-adaptive) Monte Carlo method first evaluates the function $f$ at random points $X_1$, …, $X_n$ in $[0,1]$, and then approximates the integral $I(f)$ using an estimator

$$\hat{I}_n \;=\; T\bigl(X_1, f(X_1), \ldots, X_n, f(X_n)\bigr), \tag{1}$$

where $T$ is a measurable function. A nonsequential method is thus defined by two ingredients: the distribution of $(X_1, \ldots, X_n)$ and the function $T$. The worst-case $L^p$ error of such a method over the class $\mathcal{F}$ is

$$e_p \;=\; \sup_{f \in \mathcal{F}}\; \mathbb{E}\Bigl[\bigl|\hat{I}_n - I(f)\bigr|^p\Bigr]^{1/p}. \tag{2}$$
Remark 1.
The class of nonsequential Monte Carlo methods as usually defined in the literature also allows the estimator to be randomized (i.e., allows $T$ to be a random function). We have not considered randomized estimators in our definition, however, since the Rao–Blackwell theorem implies that they do not help in this setting, for any convex loss function.
Novak (1992) proved that for any nonsequential Monte Carlo method with sample size $n$, the maximal $L^1$ error is greater than or equal to $\frac{1}{8n}$. We generalize this result to the case of the $L^p$ error.
Theorem 2.1.
For any nonsequential Monte Carlo method with sample size $n$, and for all $p \in [1, \infty)$,

$$e_p \;\ge\; \frac{2^{-1/p}}{4n}.$$

Observe that Novak's lower bound is recovered for $p = 1$. Using Theorem 2.1 with $p = 2$, we can deduce a lower bound for the variance of unbiased nonsequential methods.
Corollary 2.2.
For any unbiased nonsequential Monte Carlo method with sample size $n$,

$$\sup_{f \in \mathcal{F}} \operatorname{Var}\bigl(\hat{I}_n\bigr) \;\ge\; \frac{1}{32\, n^2}.$$
Proof of Theorem 2.1.
Consider a nonsequential Monte Carlo method with evaluation points $X_1$, …, $X_n$ and estimator $\hat{I}_n$. Divide the interval $[0,1]$ into $2n$ equal subintervals of length $\frac{1}{2n}$: then at least one of the subintervals, call it $J$, will contain no evaluation point with probability at least $\frac{1}{2}$. (Indeed, at least $n$ of the $2n$ subintervals are always empty, so the probabilities of the $2n$ events "the $k$-th subinterval is empty" sum to at least $n$, and the largest of them is therefore at least $\frac{1}{2}$.) Now construct two functions $f_0, f_1 \in \mathcal{F}$ that are both equal to zero on the left of $J$, equal to one on the right, and such that $f_0 = 0$ and $f_1 = 1$ on $J$. Then $I(f_1) - I(f_0) = \frac{1}{2n}$, and $\hat{I}_n$ takes the same value for $f_0$ and $f_1$ on the event $A$ that $J$ contains no evaluation point, since $f_0$ and $f_1$ coincide outside of $J$. It follows that

$$\max_{f \in \{f_0, f_1\}}\; \mathbb{E}\Bigl[\bigl|\hat{I}_n - I(f)\bigr|^p\Bigr] \;\ge\; \frac{1}{2}\, \mathbb{E}\Bigl[\mathbb{1}_A \Bigl(\bigl|I(f_0) - S\bigr|^p + \bigl|I(f_1) - S\bigr|^p\Bigr)\Bigr],$$

where $S$ denotes the common value of $\hat{I}_n(f_0)$ and $\hat{I}_n(f_1)$ on $A$. We conclude that

$$e_p^{\,p} \;\ge\; \frac{1}{2}\,\mathbb{P}(A)\; 2^{1-p} \Bigl(\frac{1}{2n}\Bigr)^{p} \;\ge\; 2^{-1-p}\,(2n)^{-p} \;=\; \Bigl(\frac{2^{-1/p}}{4n}\Bigr)^{p},$$

using the fact that, for any $a, b, m \in \mathbb{R}$, $|a - m|^p + |b - m|^p \ge 2^{1-p}\, |a - b|^p$. ∎
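For completeness, the elementary inequality used in the last step follows in one line from the convexity of $t \mapsto t^p$ on $[0, \infty)$ for $p \ge 1$, together with the triangle inequality:

```latex
|a - m|^p + |b - m|^p
  \;\ge\; 2 \left( \frac{|a - m| + |b - m|}{2} \right)^{\!p}
  % convexity of t -> t^p: (s^p + t^p)/2 >= ((s + t)/2)^p
  \;\ge\; 2 \left( \frac{|a - b|}{2} \right)^{\!p}
  % triangle inequality: |a - m| + |b - m| >= |a - b|
  \;=\; 2^{1-p}\, |a - b|^p .
```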
3 Uniform i.i.d. sampling
The simple Monte Carlo method is the most common example of a nonsequential method: the evaluation points $U_1$, …, $U_n$ are drawn independently, uniformly in $[0,1]$, and then the integral is estimated by $\hat{I}_n = \frac{1}{n} \sum_{i=1}^{n} f(U_i)$. The estimator is clearly unbiased, and it follows from Popoviciu's inequality (i.e., $\operatorname{Var}(Z) \le (b-a)^2/4$ for any random variable $Z$ taking values in $[a, b]$) that

$$\sup_{f \in \mathcal{F}} \operatorname{Var}\bigl(\hat{I}_n\bigr) \;\le\; \frac{1}{4n}.$$

The maximal error is attained when $f$ is a unit step function jumping at $\frac{1}{2}$. It turns out that a smaller error can be achieved, for the same (uniform i.i.d.) sampling scheme, using the control variate technique. More specifically, we consider the control variate $U$ and set

$$\hat{I}_n^{\mathrm{cv}} \;=\; \frac{1}{n} \sum_{i=1}^{n} \bigl(f(U_i) - U_i\bigr) \;+\; \frac{1}{2}.$$
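Both estimators are straightforward to implement; the following Python sketch (function names and interfaces are ours) makes the comparison concrete:

```python
import random

def simple_mc(f, n, rng):
    """Simple Monte Carlo estimate of the integral of f on [0, 1]:
    average of f at n i.i.d. uniform points."""
    return sum(f(rng.random()) for _ in range(n)) / n

def control_variate_mc(f, n, rng):
    """Same sampling scheme, with U as a control variate:
    average of f(U_i) - U_i, corrected by E[U] = 1/2."""
    us = [rng.random() for _ in range(n)]
    return sum(f(u) - u for u in us) / n + 0.5
```

For a unit step function jumping at $\frac{1}{2}$, the empirical variance of `control_variate_mc` over repeated runs should be close to $\frac{1}{12n}$, versus $\frac{1}{4n}$ for `simple_mc`.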
Theorem 3.1.
The estimator $\hat{I}_n^{\mathrm{cv}}$ is unbiased, and satisfies

$$\sup_{f \in \mathcal{F}} \operatorname{Var}\bigl(\hat{I}_n^{\mathrm{cv}}\bigr) \;=\; \frac{1}{12\, n}.$$

The maximal error is attained for any unit step function.
Proof.
The estimator is unbiased since $\mathbb{E}[U_i] = \frac{1}{2}$, and therefore the mean-squared error is equal to $\frac{1}{n} \operatorname{Var}\bigl(f(U) - U\bigr)$, with $U$ uniformly distributed on $[0,1]$. For a unit step function with a jump at $u \in [0,1]$, the random variable $f(U) - U$ is uniformly distributed over $(-u,\, 1-u]$, which yields $\operatorname{Var}\bigl(f(U) - U\bigr) = \frac{1}{12}$ as claimed. It remains to show that $\operatorname{Var}\bigl(f(U) - U\bigr) \le \frac{1}{12}$ for all $f \in \mathcal{F}$.
Let $\mathcal{F}_k$ denote the class of all non-decreasing staircase functions of the form $f = \sum_{i=1}^{k} c_i\, \mathbb{1}_{[(i-1)/k,\, i/k)}$, with $0 \le c_1 \le \cdots \le c_k \le 1$. For any $f \in \mathcal{F}$, consider the piecewise-constant approximation $f_k \in \mathcal{F}_k$ defined by averaging $f$ over each subinterval of length $\frac{1}{k}$. Then $\|f - f_k\|_{L^2([0,1])} \to 0$, and $\operatorname{Var}\bigl(f_k(U) - U\bigr) \to \operatorname{Var}\bigl(f(U) - U\bigr)$ as $k \to \infty$. Thus,

$$\sup_{f \in \mathcal{F}} \operatorname{Var}\bigl(f(U) - U\bigr) \;=\; \sup_{k \ge 1}\; \sup_{f \in \mathcal{F}_k} \operatorname{Var}\bigl(f(U) - U\bigr). \tag{3}$$

Let us now show that $\operatorname{Var}\bigl(f(U) - U\bigr)$ is maximized over $\mathcal{F}_k$ when $f$ is a unit step function. Pick any $f = \sum_{i=1}^{k} c_i\, \mathbb{1}_{[(i-1)/k,\, i/k)} \in \mathcal{F}_k$. Set $c_0 = 0$ and $c_{k+1} = 1$. If $f$ is not a unit step function, then there exist $1 \le i \le j \le k$ such that $0 < c_i < 1$ and $c_{i-1} < c_i = \cdots = c_j < c_{j+1}$. Denote by $f_t$ the function obtained by changing the common value of $c_i$, …, $c_j$ in $f$ to $t \in [c_{i-1}, c_{j+1}]$. The variance $\operatorname{Var}\bigl(f_t(U) - U\bigr)$ is a convex function of $t$, since it can be expanded as $\alpha t^2 + \beta t + \gamma$ with $\alpha = \frac{j-i+1}{k}\bigl(1 - \frac{j-i+1}{k}\bigr) \ge 0$. Consequently, we have $\operatorname{Var}\bigl(f_t(U) - U\bigr) \ge \operatorname{Var}\bigl(f(U) - U\bigr)$ at one of the two endpoints of $[c_{i-1}, c_{j+1}]$. Note that the corresponding staircase function has one fewer step than $f$. Iterating as necessary, we conclude that for any $f \in \mathcal{F}_k$ there exists a unit step function $g$ such that $\operatorname{Var}\bigl(g(U) - U\bigr) \ge \operatorname{Var}\bigl(f(U) - U\bigr)$. Therefore $\sup_{f \in \mathcal{F}_k} \operatorname{Var}\bigl(f(U) - U\bigr) = \frac{1}{12}$ for all $k$, which completes the proof. ∎
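The claim that unit step functions maximize $\operatorname{Var}(f(U) - U)$ can also be checked numerically on staircase functions; the helper below (ours, with a hypothetical name) computes the variance in closed form from the exact integral $\int_a^b (c - u)^2\,\mathrm{d}u = \frac{(c-a)^3 - (c-b)^3}{3}$:

```python
def var_staircase_minus_u(c):
    """Exact Var(f(U) - U), U uniform on [0, 1], for the staircase
    f = sum_i c[i] * 1_{[i/k, (i+1)/k)} with k = len(c) equal pieces."""
    k = len(c)
    mean = sum(c) / k - 0.5                        # E[f(U) - U]
    m2 = 0.0
    for i, ci in enumerate(c):
        a, b = i / k, (i + 1) / k
        m2 += ((ci - a) ** 3 - (ci - b) ** 3) / 3  # integral of (c_i - u)^2 over [a, b)
    return m2 - mean ** 2
```

For instance, any unit step staircase (e.g. `[0.0, 1.0]`) gives exactly $\frac{1}{12}$, while an intermediate staircase such as `[0.25, 0.75]` gives a strictly smaller variance.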
4 Stratified sampling
Consider now a stratified sampling estimator with $m$ strata:

$$\hat{I}_n^{\mathrm{strat}} \;=\; \sum_{i=1}^{m} \frac{w_i}{n_i} \sum_{j=1}^{n_i} f\bigl(U_{i,j}\bigr), \tag{4}$$

where the $i$-th stratum is $S_i = [s_{i-1}, s_i)$, $0 = s_0 < s_1 < \cdots < s_m = 1$, the weight $w_i = s_i - s_{i-1}$ is the length of the $i$-th stratum, the allocation scheme $(n_1, \ldots, n_m)$ is such that $n_i \ge 1$ for all $i$ and $\sum_{i=1}^{m} n_i = n$, and the random variables $U_{i,j}$ are independent, with the $U_{i,j}$'s uniformly distributed in $S_i$. Note that the sampling points are no longer identically distributed here. The estimator is unbiased, with variance

$$\operatorname{Var}\bigl(\hat{I}_n^{\mathrm{strat}}\bigr) \;=\; \sum_{i=1}^{m} \frac{w_i^2}{n_i}\, \sigma_i^2, \qquad \text{where } \sigma_i^2 = \operatorname{Var}\bigl(f(U_{i,1})\bigr). \tag{5}$$
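A minimal Python sketch of the stratified estimator (4), together with its equal-strata special case (function names and interfaces are ours):

```python
import random

def stratified_mc(f, breakpoints, alloc, rng):
    """Stratified estimator: strata [s_{i-1}, s_i) given by `breakpoints`
    (0 = s_0 < ... < s_m = 1), with alloc[i] uniform draws in stratum i."""
    est = 0.0
    for i, n_i in enumerate(alloc):
        a, b = breakpoints[i], breakpoints[i + 1]
        w = b - a  # stratum weight = its length
        est += w * sum(f(a + w * rng.random()) for _ in range(n_i)) / n_i
    return est

def lhs_1d(f, n, rng):
    """Equal-length strata with one point each (one-dimensional LHS)."""
    return sum(f((i + rng.random()) / n) for i in range(n)) / n
```

The second function is the optimal configuration identified in Theorem 4.1 below ($m = n$ strata of length $\frac{1}{n}$, one point per stratum).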
Theorem 4.1.
For any $m \le n$, any choice of strata and any allocation scheme, the stratified sampling estimator (4) satisfies

$$\sup_{f \in \mathcal{F}} \operatorname{Var}\bigl(\hat{I}_n^{\mathrm{strat}}\bigr) \;=\; \frac{1}{4}\, \max_{1 \le i \le m} \frac{w_i^2}{n_i}. \tag{6}$$

The maximal error is attained for a unit step function with a jump at the middle of $S_k$, where $k \in \operatorname{argmax}_i\, w_i^2 / n_i$. The minimal value of the maximal error (6) is $\frac{1}{4 n^2}$, and is obtained with strata of equal lengths ($m = n$, $w_i = \frac{1}{n}$ and $n_i = 1$ for all $i$).
The optimal stratified sampling method can be seen as a one-dimensional special case of the Latin Hypercube Sampling (LHS) method (McKay et al., 1979). Novak (1992) relies on this method as a building block for the construction of a rate-optimal sequential method. (On a related note, McKay et al. (1979) prove that, in any dimension, the LHS method is preferable to the simple Monte Carlo method if the function is monotone in each of its arguments.)
Proof.
For all $i$, let $\sigma_i^2 = \operatorname{Var}\bigl(f(U_{i,1})\bigr)$, as in (5). For a given stratified sampling method with $m$ strata, for all $f \in \mathcal{F}$, define

$$d_i \;=\; \sup_{S_i} f \;-\; \inf_{S_i} f, \qquad 1 \le i \le m.$$

Then it follows from (5) and Popoviciu's inequality that

$$\operatorname{Var}\bigl(\hat{I}_n^{\mathrm{strat}}\bigr) \;\le\; \frac{1}{4} \sum_{i=1}^{m} \frac{w_i^2}{n_i}\, d_i^2, \tag{7}$$

where the maximum is attained for a non-decreasing staircase function with jumps of height $d_i$ at the middle of the strata. Note that $\sum_{i=1}^{m} d_i \le 1$, and therefore $\sum_{i=1}^{m} d_i^2 \le 1$ as well. Therefore, the right-hand side of (7) is upper-bounded by $\frac{1}{4} \max_{1 \le i \le m} w_i^2 / n_i$, which is indeed the value of the variance (5) when $f$ is a unit step function with a jump at the middle of the stratum where $w_i^2 / n_i$ is the largest.
In order to prove the second part of the claim, observe first that any stratum such that $n_i > 1$ can be further divided into $n_i$ sub-strata of equal lengths without increasing the upper bound, since $(w_i / n_i)^2 \le w_i^2 / n_i$. Considering then the case where $m = n$ and $n_i = 1$ for all $i$, the upper bound reduces to $\frac{1}{4} \max_{1 \le i \le n} w_i^2$, which is minimal when $w_i = \frac{1}{n}$ for all $i$, since $\sum_{i=1}^{n} w_i = 1$. ∎
5 Discussion
The stratified sampling (LHS) method provides the best-known variance upper bound over the class $\mathcal{F}$ for an unbiased nonsequential method, namely $\frac{1}{4n^2}$, as soon as $n \ge 3$, but is outperformed by the control variate method of Section 3, with its bound of $\frac{1}{12n}$, when $n \le 2$. We do not know at the moment if these results are optimal in the class of unbiased nonsequential methods. (The ratio between the best variance upper bound and the lower bound of Corollary 2.2 is $\frac{8}{3}$ for $n = 1$, $\frac{16}{3}$ for $n = 2$ and $8$ for $n \ge 3$.)

Relaxing the unbiasedness requirement, it turns out that both methods are outperformed for all $n$ by the (deterministic) trapezoidal method discussed in the introduction, which has a worst-case squared error of $\frac{1}{4(n+1)^2}$. The ratio of worst-case mean-squared errors, however, is never very large (at most $\frac{16}{9}$, attained at $n = 3$) and goes to $1$ when $n$ goes to infinity.
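The bounds compared in this section can be tabulated exactly with a few lines of Python (the function name and labels are ours), using rational arithmetic to avoid rounding:

```python
from fractions import Fraction

def bounds(n):
    """Worst-case bounds discussed in the text, as exact fractions."""
    lower = Fraction(1, 32 * n ** 2)      # lower bound of Corollary 2.2
    cv = Fraction(1, 12 * n)              # control variate, Theorem 3.1
    strat = Fraction(1, 4 * n ** 2)       # equal-strata LHS, Theorem 4.1
    trap = Fraction(1, 4 * (n + 1) ** 2)  # trapezoidal rule, squared error
    return lower, cv, strat, trap
```

For instance, `min(cv, strat) / lower` evaluates to $\frac{8}{3}$, $\frac{16}{3}$ and $8$ for $n = 1$, $2$ and $n \ge 3$ respectively, and `min(cv, strat) / trap` peaks at $\frac{16}{9}$ for $n = 3$.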
References
- Perrin et al. (2014) F. Perrin, F. Tenenhaus-Aziza, V. Michel, S. Miszczycha, N. Bel, and M. Sanaa. Quantitative risk assessment of haemolytic and uremic syndrome linked to O157:H7 and non-O157:H7 Shiga-toxin producing Escherichia coli strains in raw milk soft cheeses. Risk Analysis, 35(1):109–128, 2014.
- Kiefer (1957) J. Kiefer. Optimum sequential search and approximation methods under minimum regularity assumptions. Journal of the Society for Industrial and Applied Mathematics, 5(3):105–136, 1957.
- Novak (1992) E. Novak. Quadrature formulas for monotone functions. Proceedings of the American Mathematical Society, 115(1):59–68, 1992.
- McKay et al. (1979) M. D. McKay, R. J. Beckman, and W. J. Conover. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2):239–245, 1979.