 # Efficient Estimation in the Tails of Gaussian Copulas

We consider the question of efficient estimation in the tails of Gaussian copulas. Our special focus is estimating expectations over multi-dimensional constrained sets that have a small implied measure under the Gaussian copula. We propose three estimators, all of which rely on a simple idea: identify certain dominating point(s) of the feasible set, and appropriately shift and scale an exponential distribution for subsequent use within an importance sampling measure. As we show, the efficiency of such estimators depends crucially on the local structure of the feasible set around the dominating points. The first of our proposed estimators is the "full-information" estimator that actively exploits such local structure to achieve bounded relative error in Gaussian settings. The second and third estimators , are "partial-information" estimators, for use when complete information about the constraint set is not available, they do not exhibit bounded relative error but are shown to achieve polynomial efficiency. We provide sharp asymptotics for all three estimators. For the NORTA setting where no ready information about the dominating points or the feasible set structure is assumed, we construct a multinomial mixture of the partial-information estimator resulting in a fourth estimator with polynomial efficiency, and implementable through the ecoNORTA algorithm. Numerical results on various example problems are remarkable, and consistent with theory.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

We investigate the question of efficiently estimating nonlinear expectations on constrained sets, that is, quantities that can be expressed as

 α:=E[h(X)I{X∈Sx}], (1)

where is a known polynomial, is a known constraint set, and has the NORTA (Gaussian copula) distribution (Nelson 2013, McNeil et al. 2005, Nelsen 2007)

. An important special case is the context of estimating the probability

assigned to the set by a Gaussian copula, obtained by setting in (1). Since belongs to the NORTA family, we assume that random variates from the specific distribution of can be generated rapidly (Cario and Nelson 1997, 1998) on a digital computer. Also, when we say the function and the set are “known," we mean that they are both expressed in analytical form and that their structural properties, such as the curvature at a given point, can be deduced with some effort. Our particular interest is an estimator for that is effective when the set is assigned a small measure, as might happen in the context of studying the occurrence of rare events and calculating associated expectations in physical systems modeled using Monte Carlo simulation.

The question of estimating an expectation on a “small" constrained set is well motivated (Kroese et al. 2011, Asmussen and Glynn 2007), with examples arising in diverse fields such as production systems (Glasserman and Liu 1996), epidemics modeling (Eubank et al. 2004, Dimitrov and Meyers 2010), reliability settings (Barlow and Proschan 1987, Leemis 2009), financial applications (McNeil et al. 2005, Glasserman 2004), and confidence set construction within statistics (DasGupta 2008, 2011). The problem setting we consider in this paper is specific in that in (1

) is assumed to be a NORTA random vector. We believe this special case is worthy of investigation since NORTA random vectors have recently become an important modeling paradigm

(McNeil et al. 2005, Chapter 5) and, as we shall see, the knowledge that has a Gaussian copula can lead to highly efficient estimators of . Efficiency, as is usual, is considered here in a certain asymptotic sense, as the measure assigned to the set tends to zero.

### 1.1 Two Natural Estimators

An obvious consideration for estimating in (1) is the acceptance-rejection estimator, where independent and identically distributed (iid) copies of are generated and an estimator of is constructed using those random variates that fall within the set . To see why this estimator may not be efficient, consider estimating obtained by setting and in (1). (In what follows, we treat the quantity in (1) as a function of the parameter for reasons that will become clear.) The acceptance-rejection estimator is then given by . With some algebra, one can show that and that , giving the relative error

 RE(^αAR(z∗1))=√Var(^αAR)E[^αAR]=√1−α(z∗1)α(z∗1). (2)

We see from (2) that the relative error as , and particularly that . (For two positive sequences converging to zero, we say to mean that .) Moreover, if

has the Gaussian distribution with zero mean and unit variance,

at an exponential rate (as ) suggesting that is a poor estimator of , especially for large values of .

A more sophisticated way of estimating is through exponential tilting or twisting (Glasserman 2004), where an estimator is obtained through importance sampling with a “shifted joint-normal" followed by an acceptance-rejection step. For the example considered above where , the exponential-twisting estimator

 ^αN(z∗1)=h(~X)I{~X∈Sx}(ϕ(~X)ϕ(~X;μ,σ2))=I{~X∈(z∗1,∞)}(ϕ(~X)ϕ(~X;μ,σ2)), (3)

where has Gaussian density with mean and variance , and is the standard Gaussian density having mean zero and unit variance.

The estimator , like the estimator , is unbiased with respect to . Theorem 1 formally characterizes the asymptotic variances of and through a relative error calculation. A proof of Theorem 1 can be found in Appendix 6.

###### Theorem 1.

Let

be the standard Gaussian random variable,

, and . Then the following hold as .

• with the minimum squared relative error

 infμα−2(z∗1)E[^α2N(z∗1)]∼z∗12

attained for the choice .

It is evident from Theorem 1 that if is chosen carefully, the estimator satisfies . While this suggests that is a much better estimator than , the fact remains that goes to as even in the one-dimensional context. By contrast, the estimator that we propose enjoys as ; in fact, we show that achieves bounded relative error for more general problems in an arbitrary (but finite) number of dimensions under certain conditions.

### 1.2 Summary and Key Insight

The central question underlying our investigation is whether there exist (Monte Carlo) estimators of whose relative error remains bounded as the set becomes rare in a certain sense. We answer in the affirmative but with some qualifications. We argue that highly efficient estimators of , particularly those with bounded relative error, can be constructed through the use of an appropriately shifted and scaled exponential importance sampling measure. The extent of such shifting and scaling, however, depends crucially on the following three structural properties of the set : (i) location of certain dominating points in defined (loosely) as the set of points that contribute maximally to the calculation of ; (ii) the local curvature of the set

at the dominating points; and (iii) the existence (or lack) of a supporting hyperplane to the set

at the dominating points. Considering (i) – (iii), we propose three alternate estimators , , that become applicable depending on the setting. The applicability of the estimators , , is summarized in Table 1 and illustrated through Figure 1. Figure 1: The figure illustrates three settings where the proposed estimators ^αO,^αE,^αL are applicable, respectively. The leftmost panel depicts a setting where the unique dominating point x∗ of SX, the local structure of SX around x∗, and a supporting hyperplane H at x∗ are known, thereby allowing the use of the full-information estimator ^αO. The center panel depicts a setting where the boundary (and the curvature) of the set SX is unknown but a dominating point x∗ and a hypeplane H at the dominating point x∗ are known. The rightmost panel illustrates a setting with multiple unknown dominating points and where there exists no supporting hyperplane to the set SX.

Which amongst the estimators , , is most appropriate for a given setting will be dictated by how much information we have about (i), (ii), and (iii). For instance, the estimator is the “full information" estimator designed for use in contexts where is Gaussian, and the set is well-behaved with known structural properties, that is, has a unique dominating point with identifiable local curvature that is encoded by certain structural constants, and has a supporting hyperplane at the dominating point. We argue later that the conditions on under which the full-information estimator becomes applicable are not onerous. Particularly, we show in Section 3.2.1 that for a large class of sets , the structural constants of

are identifiable through a linear program that is easily solved. We provide sharp asymptotics of

in Section 3.2, demonstrating that it enjoys bounded relative error leading to the remarkable numerical performance illustrated in Section 5.1. Table 1: We propose the three estimators ^αO, ^αE, and ^αL for use depending on available structural information. The estimator ^αO achieves bounded relative error, but relies on the explicit use of local structure of the feasible set. The estimators ^αE, and ^αL are more general but exhibit polynomial efficiency.

The other two estimators and that we propose are “partial-information" estimators in that they are applicable in Gaussian settings where we have knowledge of the dominating point of but only limited to no curvature information. As we show in Section 3.3 where we provide the sharp asymptotics for and , such lack of full information hinders the optimal choice of importance-sampling parameters resulting in a loss of the bounded relative error property. The estimators and still achieve a weaker form of efficiency that we call polynomial efficiency.

For contexts where no information about (i) – (iii) is available, we first propose and analyze an (unimplementable) estimator that is obtained as a multinomial mixture of the partial-information estimator . The ecoNORTA algorithm that we propose then constructs a sequential form of by adaptively estimating the dominating points. ecoNORTA starts with an initial crude guess of the dominating points, and as random variates are generated during the estimation process, progressively updates the location of the dominating points within the estimator .

### 1.3 Paper Organization

The paper is organized into two main parts that appear in Section 3 and Section 4 respectively. Section 3 treats the Gaussian context in its entirety; we present the full-information estimator in Section 3.2 and the partial-information estimators in Section 3.3. Much of the theoretical machinery introduced in Section 3 is then co-opted into Section 4 where we treat the NORTA context. Section 5 provides numerical illustrations in the Gaussian and the NORTA contexts. In the ensuing section, we first introduce some important notions of asymptotic efficiency that will be used throughout the paper.

## 2 Asymptotic Efficiency: Notation and Definitions

As is usual in rare-event literature, we use the notion of relative error in assessing the efficiency of the estimators of . The relative error of the estimator (with respect to the quantity ) is given by

 RE(^α):=√MSE(^α,α)α2;MSE(^α,α):=E[(^α−α)2]. (4)

Since much of our analyses is in a “rare-event regime," we will assess by studying the behavior of as , where the latter limit is usually accomplished by sending a “rarity parameter" . Accordingly, we will be compelled to use the notation and to make explicit the dependence of and on the rarity parameter .

The following notions that quantify the asymptotic behavior of the relative error will be useful for assessing estimators.

###### Definition 1.

(Bounded Relative Error) The estimator is said to exhibit bounded relative error (BRE) if

That an estimator has BRE means that its root mean squared error tends to zero at a rate that is commensurate with the rate at which the quantity it estimates tends to zero. Estimators exhibiting BRE are generally difficult to find in the Monte Carlo context; those with are especially difficult to find.

Considering the difficulty of finding estimators exhibiting BRE, a weaker form of efficiency called logarithmic efficiency has become popular.

###### Definition 2.

(Logarithmic Efficiency) The estimator is said to exhibit logarithmic efficiency if Equivalently, logarithmic efficiency holds if

 liminfz∗1→∞(logα2(z∗1))−1logMSE(^α(z∗1))≥1.

In this paper, we use a slightly more specific form of efficiency called polynomial efficiency to characterize the behavior of estimators that do not exhibit BRE.

###### Definition 3.

An estimator is said to exhibit Polynomial(), efficiency if

 limsupz∗1→∞z∗1−sRE(^α(z∗1))<∞.

It is clear that if is Polynomial efficient, then it is Polynomial efficient for all . Hence, we will generally seek the smallest such that a given estimator is Polynomial() efficient. Also, it can be shown with some algebra that estimators that exhibit Polynomial() efficiency are also logarithmically efficient as long as converges to zero “faster" than , that is, as for any . See Asmussen and Glynn (2007) for more on measures of efficiency in the rare event simulation context.

## 3 The Constrained Gaussian Context

In this section, we treat the special context of estimation on low-probability sets driven by the Gaussian measure, that is, the question of estimating when has a Gaussian distribution. As we shall see, the constrained Gaussian context is special in that knowledge of the local structure of the set at the so-called dominating point can be used fruitfully in constructing highly efficient estimators of . Accordingly, in this section, we propose and analyze three different estimators depending on the extent of such available information. We first reformulate the problem statment for ease of exposition.

### 3.1 Problem Reformulation

For clarity, and since we are in the constrained Gaussian setting, we specialize the notation introduced earlier to write

 α=E[h(Z)I{Z∈Sz∗1,z}], (5)

where the feasible set , and is a polynomial in . The first subscript refers to the “rarity parameter" and will be explained in greater detail in Section 3.1.2. It has been introduced into notation to make explicit the dependence of the feasible set on a parameter that will be sent to infinity in our asymptotic analyses.

#### 3.1.1 Key Assumptions

For the purposes of Section 3 alone, we assume that the random vector and in (5) are expressed in such a way that the following assumption holds.

###### Assumption 1.
1. is distributed as the standard Gaussian density

 ϕ(z)=(2π)−d/2exp{−12zTz},z∈IRd.
2. The “dominating point" exists, is unique, and known.

3. The “dominating point" is such that .

Assumption1(a) does not threaten generality — settings where has a multivariate Gaussian distribution with mean and positive-definite covariance matrix can be recast in the “standard Gaussian space" as the problem of estimating , where the lower-triangular matrix is such that , and the set .

Assumption1(b) holds often. For example, when the set is expressible through known convex constraints, that is, where the functions are convex, then is the solution to a convex optimization problem and can usually be identified simply. Even if one or more of the functions are not convex, could probably be identified, albeit with some effort, by solving a non-convex optimization problem. If is not expressed through the constraint functions but is instead expressed through a membership oracle, identifying could become a challenging proposition.

Assumption1(c), which stipulates that the second through the th coordinates of are zero, has been imposed for convenience and can be ensured through an appropriate rotation of the set . Specifically, suppose we wish to estimate where is a standard Gaussian random vector and suppose exists and is unique. Then, since the standard Gaussian distribution is spherically symmetric, we can transform the problem using an appropriate “rotation matrix" calculated such that satisfies . Such a rotation matrix always exists. The corresponding constraint set after such rotation becomes yielding the reformulated problem of needing to estimate . Furthermore, since the function is a polynomial, is also a polynomial and no generality is lost.

Considering the above discussion, reformulating the general Gaussian setting to satisfy Assumption1(b) and Assumption1(c) involves two steps in succession: (i) standardize the set to , and (ii) perform the rotation .

We emphasize that Assumption1 is a standing assumption in the Gaussian context, that is, all three estimators , , rely on it. By contrast, the following structural assumption is needed for constructing only and , and not .

###### Assumption 2.

The hyperplane is a supporting hyperplane to the set , that is, every point satisfies .

Assumption2 states that the hyperplane passing through the dominating point and normal to the line joining the origin and the point is such that the set is on one side of it. The spirit of Assumption2 is that the region of integration governing the calculation of is a “tail region" that is a subset of an appropriate half-space. To aid reader’s intuition, we note that Assumption1 and Assumption2 together imply that the sets that we consider in Gaussian context have a dominating point and a “vertical" supporting hyperplane to at .

#### 3.1.2 Asymptotic Regimes and a Word on Notation

We will consider two types of asymptotic regimes when analyzing the effectiveness of a proposed estimator of . The first of these regimes, called the translation regime, refers to the sequence (in ) of sets obtained by “translating" a fixed set along the -axis. Formally, for a fixed set , we obtain the translation regime by defining and then considering the sequence of sets as .

The second asymptotic regime, called the scaling regime, refers to the sequence of sets obtained by “scaling" points in some fixed set using the scalar . Formally, for a fixed set , we obtain the scaling regime by defining and then considering the sequence of sets as .

Considering the reformulation due to Assumption1, under each regime the probability assigned to vanishes by simply sending . The translation and scaling regimes are depicted in Figure 2 where a fixed set is either translated or scaled to obtain the needed asymptotic regime.

The following comments are aimed at further clarifying notational issues.

1. The fixed set in the discussion above was introduced expressly for explaining the translation and scaling asymptotic regimes. We find no reason to refer to the set anywhere in the rest of the paper.

2. Throughout the paper the scalar will serve as the rarity parameter that will be sent to infinity. In the translation regime, turns out to be the first-coordinate of the unique dominating point. In the scaling regime, has no such physical meaning.

3. Unless mentioned explicitly, all analysis of the estimators we propose are performed in the translation regime. Particularly, all analysis in Section 3.2 and Section 3.3 is in the translation regime. We are forced to undertake analyses in the scaling regime in Section 4 in order to contend with the possibility of multiple dominating points.

### 3.2 The Full-Information Estimator ^αO

We now propose the full-information estimator for the constrained Gaussian context and in general high-dimensional space. As we shall see, knowledge of the local structure of the set at the dominating point is crucial for constructing efficient estimators of . Accordingly, our proposed estimator is a function of the local structure of the set around the point . Such local structure is encoded through the function in the following assumption, where quantifies the “cost-scaled content" of the set close to the point and “about" the line joining to the origin. When the cost function is identity, for instance, the function simply connotes the volume of the “cross-section" of the set when .

###### Assumption 3.

Let the cost function be a polynomial in , conveniently expressed as . Thus, is the term corresponding to the largest power of . Then, denoting and the cross-section set , the (d-1)-dimensional cost-scaled volume

 vp(t)=∫Sz∗1,z(t)γpd∏j=2zcj(p)jdz2⋯dzd

satisfies the expansion as , and the constants are known. The argument in the constants and are often suppressed for convenience.

The assumption about the existence of a local polynomial expansion of is arguably mild; for example, it only precludes sets that are not “too sharp" around the point . The constants and appearing in the expansion of may or may not be easy to deduce depending on how the set is expressed. For example, when is expressed using constraint functions as and the functions are expressed analytically, the constants and can be identified easily. Such contexts seem typical and we provide a systematic way of identifying the structural constants for a large class of sets in Section 3.2.1.

Under Assusmption1, the quantity of interest takes the form

 α=∫∞z∗1ϕ(z1)∫Sz∗1,z(z1−z∗1)h(z)d∏i=2ϕ(zi), (6)

where are constants appearing in Assumption3, is the univariate standard Gaussian density, and the cross-section set . (In (6) and throughout the paper, we have chosen to omit the tedious repitition of the elemental -dimensional volume in the integral.) The above form of inspires our proposed estimator which takes the following simple form:

 ^αO=(12π)d−12ϕ(W)fλ(z∗1)(W)ηz∗1p(W−z∗1)s, (7)

where is the shifted-exponential density function, is a random variable having the density , and the constants are from Assumption 3. (Theorem 3 will establish that the choice is optimal in the sense of minimizing the relative error of .)

The basis of should be evident from the structure of in (6); the outer integral in (6) is approximated by the term after noting that

 ∫Sz∗1,z(z1−z∗1)d∏i=2ϕ(zi)=(2π)−(d−1)/2(η(z1−z∗1)s+o((z1−z∗1)s))

as by Assumption3, and the inner integral in (6) is estimated using a shifted-exponential importance sampling measure along the first dimension. There appears to be no strong physical justification for the use of the shifted exponential family but an algebraic explanation is evident by observing the respective exponents of the Gaussian and the exponential densities. The choice of support for the importance sampling measure is dictated by information contained in Assumption 2.

Towards establishing the bounded relative error property of , we first present a result on the asymptotic expansion of a certain class of integrals that we will repeatedly encounter.

###### Lemma 1.

Let as . Then, for and as , the following hold.

###### Proof.

Proof. Notice that

 I1(x∗) :=exp{−12x∗2}∫∞x∗xqexp{−12(x2−x∗2)}g(x−x∗)dx =exp{−12x∗2}∫∞0(x∗+t)qexp{−x∗t}exp{−12t2}g(t)dt =β0x∗q−1−βexp{−12x∗2}∫∞0(1+ux∗2)qexp{−u}exp{−12(ux∗)2}(uβ1+o(uβ1))du, (8)

where the last line is obtained after the variable substitution . Now, since Lebesgue’s dominated convergence theorem (Billingsley 1995) assures us that

 limx∗→∞∫∞0(1+ux∗2)qexp{−u}exp{−12(ux∗)2}(uβ1+o(uβ1))du=∫∞0exp{−u}uβ1du=Γ(β1+1),

we see from (3.2) that This concludes the proof of part (i) of the theorem. The proof of part (ii) follows similarly.

As is usually done when analyzing estimators of the type , we next present a result that characterizes the rate at which the quantity of interest tends to zero in the asymptotic regime . This will be followed by a result that characterizes the behavior of .

###### Theorem 2.

Let Assumptions13 hold. Recalling that , as ,

 α=∫∞z∗1ϕ(z1)∫Sz∗1,z(z1−z∗1)h(z)d∏i=2ϕ(zi)∼(12π)d/2Γ(s+1)ηz∗1p−1−sexp{−12z∗12}.
###### Proof.

Proof. Write

 α =∫∞z∗1ϕ(z1)∫Sz∗1,z(z1−z∗1)h(z)d∏i=2ϕ(zi), =(12π)d/2p∑i=1γi∫∞z∗1zi1exp{−12z21}∫Sz∗1,z(z1−z∗1)d∏j=2zcj(i)jexp{−12d∑j=2z2j} ∼(12π)d/2∫∞z∗1zp1exp{−12z21}∫Sz∗1,z(z1−z∗1)γpd∏j=2zcj(p)jexp{−12d∑j=2z2j} =(12π)d/2∫∞z∗1zp1exp{−12z21}~vp(z1−z∗1), (9)

where . The right-hand side of (3.2) is thus in a form that allows invoking part (i) of Lemma 1, if we can identify the expansion of around .

Towards identifying the expansion of around , we notice that

 ~vp(t