We investigate the question of efficiently estimating nonlinear expectations on constrained sets, that is, quantities that can be expressed as
. An important special case is the context of estimating the probabilityassigned to the set by a Gaussian copula, obtained by setting in (1). Since belongs to the NORTA family, we assume that random variates from the specific distribution of can be generated rapidly (Cario and Nelson 1997, 1998) on a digital computer. Also, when we say the function and the set are “known," we mean that they are both expressed in analytical form and that their structural properties, such as the curvature at a given point, can be deduced with some effort. Our particular interest is an estimator for that is effective when the set is assigned a small measure, as might happen in the context of studying the occurrence of rare events and calculating associated expectations in physical systems modeled using Monte Carlo simulation.
The question of estimating an expectation on a “small" constrained set is well motivated (Kroese et al. 2011, Asmussen and Glynn 2007), with examples arising in diverse fields such as production systems (Glasserman and Liu 1996), epidemics modeling (Eubank et al. 2004, Dimitrov and Meyers 2010), reliability settings (Barlow and Proschan 1987, Leemis 2009), financial applications (McNeil et al. 2005, Glasserman 2004), and confidence set construction within statistics (DasGupta 2008, 2011). The problem setting we consider in this paper is specific in that in (1
) is assumed to be a NORTA random vector. We believe this special case is worthy of investigation since NORTA random vectors have recently become an important modeling paradigm(McNeil et al. 2005, Chapter 5) and, as we shall see, the knowledge that has a Gaussian copula can lead to highly efficient estimators of . Efficiency, as is usual, is considered here in a certain asymptotic sense, as the measure assigned to the set tends to zero.
1.1 Two Natural Estimators
An obvious consideration for estimating in (1) is the acceptance-rejection estimator, where independent and identically distributed (iid) copies of are generated and an estimator of is constructed using those random variates that fall within the set . To see why this estimator may not be efficient, consider estimating obtained by setting and in (1). (In what follows, we treat the quantity in (1) as a function of the parameter for reasons that will become clear.) The acceptance-rejection estimator is then given by . With some algebra, one can show that and that , giving the relative error
We see from (2) that the relative error as , and particularly that . (For two positive sequences converging to zero, we say to mean that .) Moreover, ifat an exponential rate (as ) suggesting that is a poor estimator of , especially for large values of .
A more sophisticated way of estimating is through exponential tilting or twisting (Glasserman 2004), where an estimator is obtained through importance sampling with a “shifted joint-normal" followed by an acceptance-rejection step. For the example considered above where , the exponential-twisting estimator
where has Gaussian density with mean and variance , and is the standard Gaussian density having mean zero and unit variance.
The estimator , like the estimator , is unbiased with respect to . Theorem 1 formally characterizes the asymptotic variances of and through a relative error calculation. A proof of Theorem 1 can be found in Appendix 6.
Let be the standard Gaussian random variable,
be the standard Gaussian random variable,, and . Then the following hold as .
with the minimum squared relative error
attained for the choice .
It is evident from Theorem 1 that if is chosen carefully, the estimator satisfies . While this suggests that is a much better estimator than , the fact remains that goes to as even in the one-dimensional context. By contrast, the estimator that we propose enjoys as ; in fact, we show that achieves bounded relative error for more general problems in an arbitrary (but finite) number of dimensions under certain conditions.
1.2 Summary and Key Insight
The central question underlying our investigation is whether there exist (Monte Carlo) estimators of whose relative error remains bounded as the set becomes rare in a certain sense. We answer in the affirmative but with some qualifications. We argue that highly efficient estimators of , particularly those with bounded relative error, can be constructed through the use of an appropriately shifted and scaled exponential importance sampling measure. The extent of such shifting and scaling, however, depends crucially on the following three structural properties of the set : (i) location of certain dominating points in defined (loosely) as the set of points that contribute maximally to the calculation of ; (ii) the local curvature of the set
at the dominating points; and (iii) the existence (or lack) of a supporting hyperplane to the setat the dominating points. Considering (i) – (iii), we propose three alternate estimators , , that become applicable depending on the setting. The applicability of the estimators , , is summarized in Table 1 and illustrated through Figure 1.
Which amongst the estimators , , is most appropriate for a given setting will be dictated by how much information we have about (i), (ii), and (iii). For instance, the estimator is the “full information" estimator designed for use in contexts where is Gaussian, and the set is well-behaved with known structural properties, that is, has a unique dominating point with identifiable local curvature that is encoded by certain structural constants, and has a supporting hyperplane at the dominating point. We argue later that the conditions on under which the full-information estimator becomes applicable are not onerous. Particularly, we show in Section 3.2.1 that for a large class of sets , the structural constants of
are identifiable through a linear program that is easily solved. We provide sharp asymptotics ofin Section 3.2, demonstrating that it enjoys bounded relative error leading to the remarkable numerical performance illustrated in Section 5.1.
The other two estimators and that we propose are “partial-information" estimators in that they are applicable in Gaussian settings where we have knowledge of the dominating point of but only limited to no curvature information. As we show in Section 3.3 where we provide the sharp asymptotics for and , such lack of full information hinders the optimal choice of importance-sampling parameters resulting in a loss of the bounded relative error property. The estimators and still achieve a weaker form of efficiency that we call polynomial efficiency.
For contexts where no information about (i) – (iii) is available, we first propose and analyze an (unimplementable) estimator that is obtained as a multinomial mixture of the partial-information estimator . The ecoNORTA algorithm that we propose then constructs a sequential form of by adaptively estimating the dominating points. ecoNORTA starts with an initial crude guess of the dominating points, and as random variates are generated during the estimation process, progressively updates the location of the dominating points within the estimator .
1.3 Paper Organization
The paper is organized into two main parts that appear in Section 3 and Section 4 respectively. Section 3 treats the Gaussian context in its entirety; we present the full-information estimator in Section 3.2 and the partial-information estimators in Section 3.3. Much of the theoretical machinery introduced in Section 3 is then co-opted into Section 4 where we treat the NORTA context. Section 5 provides numerical illustrations in the Gaussian and the NORTA contexts. In the ensuing section, we first introduce some important notions of asymptotic efficiency that will be used throughout the paper.
2 Asymptotic Efficiency: Notation and Definitions
As is usual in rare-event literature, we use the notion of relative error in assessing the efficiency of the estimators of . The relative error of the estimator (with respect to the quantity ) is given by
Since much of our analyses is in a “rare-event regime," we will assess by studying the behavior of as , where the latter limit is usually accomplished by sending a “rarity parameter" . Accordingly, we will be compelled to use the notation and to make explicit the dependence of and on the rarity parameter .
The following notions that quantify the asymptotic behavior of the relative error will be useful for assessing estimators.
(Bounded Relative Error) The estimator is said to exhibit bounded relative error (BRE) if
That an estimator has BRE means that its root mean squared error tends to zero at a rate that is commensurate with the rate at which the quantity it estimates tends to zero. Estimators exhibiting BRE are generally difficult to find in the Monte Carlo context; those with are especially difficult to find.
Considering the difficulty of finding estimators exhibiting BRE, a weaker form of efficiency called logarithmic efficiency has become popular.
(Logarithmic Efficiency) The estimator is said to exhibit logarithmic efficiency if Equivalently, logarithmic efficiency holds if
In this paper, we use a slightly more specific form of efficiency called polynomial efficiency to characterize the behavior of estimators that do not exhibit BRE.
An estimator is said to exhibit Polynomial(), efficiency if
It is clear that if is Polynomial efficient, then it is Polynomial efficient for all . Hence, we will generally seek the smallest such that a given estimator is Polynomial() efficient. Also, it can be shown with some algebra that estimators that exhibit Polynomial() efficiency are also logarithmically efficient as long as converges to zero “faster" than , that is, as for any . See Asmussen and Glynn (2007) for more on measures of efficiency in the rare event simulation context.
3 The Constrained Gaussian Context
In this section, we treat the special context of estimation on low-probability sets driven by the Gaussian measure, that is, the question of estimating when has a Gaussian distribution. As we shall see, the constrained Gaussian context is special in that knowledge of the local structure of the set at the so-called dominating point can be used fruitfully in constructing highly efficient estimators of . Accordingly, in this section, we propose and analyze three different estimators depending on the extent of such available information. We first reformulate the problem statment for ease of exposition.
3.1 Problem Reformulation
For clarity, and since we are in the constrained Gaussian setting, we specialize the notation introduced earlier to write
where the feasible set , and is a polynomial in . The first subscript refers to the “rarity parameter" and will be explained in greater detail in Section 3.1.2. It has been introduced into notation to make explicit the dependence of the feasible set on a parameter that will be sent to infinity in our asymptotic analyses.
3.1.1 Key Assumptions
is distributed as the standard Gaussian density
The “dominating point" exists, is unique, and known.
The “dominating point" is such that .
Assumption1(a) does not threaten generality — settings where has a multivariate Gaussian distribution with mean and positive-definite covariance matrix can be recast in the “standard Gaussian space" as the problem of estimating , where the lower-triangular matrix is such that , and the set .
Assumption1(b) holds often. For example, when the set is expressible through known convex constraints, that is, where the functions are convex, then is the solution to a convex optimization problem and can usually be identified simply. Even if one or more of the functions are not convex, could probably be identified, albeit with some effort, by solving a non-convex optimization problem. If is not expressed through the constraint functions but is instead expressed through a membership oracle, identifying could become a challenging proposition.
Assumption1(c), which stipulates that the second through the th coordinates of are zero, has been imposed for convenience and can be ensured through an appropriate rotation of the set . Specifically, suppose we wish to estimate where is a standard Gaussian random vector and suppose exists and is unique. Then, since the standard Gaussian distribution is spherically symmetric, we can transform the problem using an appropriate “rotation matrix" calculated such that satisfies . Such a rotation matrix always exists. The corresponding constraint set after such rotation becomes yielding the reformulated problem of needing to estimate . Furthermore, since the function is a polynomial, is also a polynomial and no generality is lost.
Considering the above discussion, reformulating the general Gaussian setting to satisfy Assumption1(b) and Assumption1(c) involves two steps in succession: (i) standardize the set to , and (ii) perform the rotation .
We emphasize that Assumption1 is a standing assumption in the Gaussian context, that is, all three estimators , , rely on it. By contrast, the following structural assumption is needed for constructing only and , and not .
The hyperplane is a supporting hyperplane to the set , that is, every point satisfies .
Assumption2 states that the hyperplane passing through the dominating point and normal to the line joining the origin and the point is such that the set is on one side of it. The spirit of Assumption2 is that the region of integration governing the calculation of is a “tail region" that is a subset of an appropriate half-space. To aid reader’s intuition, we note that Assumption1 and Assumption2 together imply that the sets that we consider in Gaussian context have a dominating point and a “vertical" supporting hyperplane to at .
3.1.2 Asymptotic Regimes and a Word on Notation
We will consider two types of asymptotic regimes when analyzing the effectiveness of a proposed estimator of . The first of these regimes, called the translation regime, refers to the sequence (in ) of sets obtained by “translating" a fixed set along the -axis. Formally, for a fixed set , we obtain the translation regime by defining and then considering the sequence of sets as .
The second asymptotic regime, called the scaling regime, refers to the sequence of sets obtained by “scaling" points in some fixed set using the scalar . Formally, for a fixed set , we obtain the scaling regime by defining and then considering the sequence of sets as .
Considering the reformulation due to Assumption1, under each regime the probability assigned to vanishes by simply sending . The translation and scaling regimes are depicted in Figure 2 where a fixed set is either translated or scaled to obtain the needed asymptotic regime.
The following comments are aimed at further clarifying notational issues.
The fixed set in the discussion above was introduced expressly for explaining the translation and scaling asymptotic regimes. We find no reason to refer to the set anywhere in the rest of the paper.
Throughout the paper the scalar will serve as the rarity parameter that will be sent to infinity. In the translation regime, turns out to be the first-coordinate of the unique dominating point. In the scaling regime, has no such physical meaning.
Unless mentioned explicitly, all analysis of the estimators we propose are performed in the translation regime. Particularly, all analysis in Section 3.2 and Section 3.3 is in the translation regime. We are forced to undertake analyses in the scaling regime in Section 4 in order to contend with the possibility of multiple dominating points.
3.2 The Full-Information Estimator
We now propose the full-information estimator for the constrained Gaussian context and in general high-dimensional space. As we shall see, knowledge of the local structure of the set at the dominating point is crucial for constructing efficient estimators of . Accordingly, our proposed estimator is a function of the local structure of the set around the point . Such local structure is encoded through the function in the following assumption, where quantifies the “cost-scaled content" of the set close to the point and “about" the line joining to the origin. When the cost function is identity, for instance, the function simply connotes the volume of the “cross-section" of the set when .
Let the cost function be a polynomial in , conveniently expressed as . Thus, is the term corresponding to the largest power of . Then, denoting and the cross-section set , the (d-1)-dimensional cost-scaled volume
satisfies the expansion as , and the constants are known. The argument in the constants and are often suppressed for convenience.
The assumption about the existence of a local polynomial expansion of is arguably mild; for example, it only precludes sets that are not “too sharp" around the point . The constants and appearing in the expansion of may or may not be easy to deduce depending on how the set is expressed. For example, when is expressed using constraint functions as and the functions are expressed analytically, the constants and can be identified easily. Such contexts seem typical and we provide a systematic way of identifying the structural constants for a large class of sets in Section 3.2.1.
Under Assusmption1, the quantity of interest takes the form
where are constants appearing in Assumption3, is the univariate standard Gaussian density, and the cross-section set . (In (6) and throughout the paper, we have chosen to omit the tedious repitition of the elemental -dimensional volume in the integral.) The above form of inspires our proposed estimator which takes the following simple form:
where is the shifted-exponential density function, is a random variable having the density , and the constants are from Assumption 3. (Theorem 3 will establish that the choice is optimal in the sense of minimizing the relative error of .)
as by Assumption3, and the inner integral in (6) is estimated using a shifted-exponential importance sampling measure along the first dimension. There appears to be no strong physical justification for the use of the shifted exponential family but an algebraic explanation is evident by observing the respective exponents of the Gaussian and the exponential densities. The choice of support for the importance sampling measure is dictated by information contained in Assumption 2.
Towards establishing the bounded relative error property of , we first present a result on the asymptotic expansion of a certain class of integrals that we will repeatedly encounter.
Let as . Then, for and as , the following hold.
Proof. Notice that
where the last line is obtained after the variable substitution . Now, since Lebesgue’s dominated convergence theorem (Billingsley 1995) assures us that
we see from (3.2) that This concludes the proof of part (i) of the theorem. The proof of part (ii) follows similarly.
As is usually done when analyzing estimators of the type , we next present a result that characterizes the rate at which the quantity of interest tends to zero in the asymptotic regime . This will be followed by a result that characterizes the behavior of .