    # Is a Transformed Low Discrepancy Design Also Low Discrepancy?

Experimental designs intended to match arbitrary target distributions are typically constructed via a variable transformation of a uniform experimental design. The inverse distribution function is one such transformation. The discrepancy is a measure of how well the empirical distribution of any design matches its target distribution. This chapter addresses the question of whether a variable transformation of a low discrepancy uniform design yields a low discrepancy design for the desired target distribution. The answer depends on the two kernel functions used to define the respective discrepancies. If these kernels satisfy certain conditions, then the answer is yes. However, these conditions may be undesirable for practical reasons. In such a case, the transformation of a low discrepancy uniform design may yield a design with a large discrepancy. We illustrate how this may occur. We also suggest some remedies. One remedy is to ensure that the original uniform design has optimal one-dimensional projections, but this remedy works best if the design is dense, that is, if the ratio of the sample size to the dimension of the random variable is relatively large. Another remedy is to use the transformed design as the input to a coordinate-exchange algorithm that optimizes the desired discrepancy, and this works for both dense and sparse designs. The effectiveness of these two remedies is illustrated via simulation.

## 1 Introduction

Professor Kai-Tai Fang and his collaborators have demonstrated the effectiveness of low discrepancy points as space filling designs FangHic07a; FangEtal19a; FanLiSud06; FanWan94. They have promoted discrepancy as a quality measure for statistical experimental designs to the statistics, science, and engineering communities FanMa01b; FanMaWin02; FanMuk00; FanMa01a.

Low discrepancy uniform designs, $U = \{u_i\}_{i=1}^N$, are typically constructed so that their empirical distributions, $F_U$, approximate $F_{\text{unif}}$, the uniform distribution on the unit cube, $(0,1)^d$. The discrepancy measures the magnitude of $F_{\text{unif}} - F_U$. The uniform design is a commonly used space filling design for computer experiments FanLiSud06 and can be constructed using JMP® sall2012jmp.

When the target probability distribution for the design, $F$, defined over the experimental domain $\Omega$, is not the uniform distribution on the unit cube, then the desired design, $X$, is typically constructed by transforming a low discrepancy uniform design, i.e.,

$$X = \{x_i\}_{i=1}^N = \{\Psi(u_i)\}_{i=1}^N = \Psi(U), \qquad \Psi : (0,1)^d \to \Omega. \tag{1}$$

Note that $F_X$ may differ from $F_U$ because $\Omega \neq (0,1)^d$ and/or $F$ is non-uniform. A natural transformation, $\Psi$, when $F$ has independent marginals, is the inverse distribution transformation:

$$\Psi_j(u_j) = F_j^{-1}(u_j), \quad j = 1, \ldots, d, \qquad \text{where } F(x) = F_1(x_1) \cdots F_d(x_d). \tag{2}$$

A number of transformation methods for different distributions can be found in DEVROYE200683 and FanWan94, Chapter 1.
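For the standard normal target used throughout this chapter, the inverse distribution transformation (2) can be sketched with the Python standard library; the function name and nested-list design layout are illustrative, not from the original text:

```python
from statistics import NormalDist

def inverse_normal_transform(U):
    """Map a design U on (0,1)^d to R^d coordinatewise via the
    inverse standard normal distribution function, a special case
    of the inverse distribution transformation (2)."""
    nd = NormalDist()  # standard normal: mean 0, sd 1
    return [[nd.inv_cdf(u_j) for u_j in u] for u in U]

# The center of the unit cube maps to the mode of the normal density.
print(inverse_normal_transform([[0.5, 0.5]]))  # [[0.0, 0.0]]
```

Any design on $(0,1)^d$, low discrepancy or not, can be pushed through this map; the chapter's question is whether the low discrepancy property survives the push.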

This chapter addresses the question of whether the design resulting from transformation (1) of a low discrepancy design, $U$, is itself low discrepancy with respect to the target distribution $F$. In other words,

$$\text{does small } F_{\text{unif}} - F_U \text{ imply small } F - F_X\,? \tag{Q}$$

We show that the answer may be yes or no, depending on how the question is understood. We discuss both cases. For illustrative purposes, we consider the situation where $F$ is the standard multivariate normal distribution, $\mathcal{N}(0, I_d)$.

In the next section, we define the discrepancy and motivate it from three perspectives. In Section 3 we give a simple condition under which the answer to (Q) is yes. However, in Section 4 we show that under more practical assumptions the answer to (Q) is no. An example illustrates what can go wrong. Section 5 provides a coordinate-exchange algorithm that improves the discrepancy of a candidate design. Simulation results illustrate the performance of this algorithm. We conclude with a brief discussion.

## 2 The Discrepancy

Experimental design theory based on discrepancy assumes an experimental region, $\Omega$, and a target probability distribution, $F$, which is known a priori. We assume that $F$ has a probability density, $\varrho$. It is convenient to also work with measures, $\nu$, defined on $\Omega$. If $\nu$ is a probability measure, then the associated probability distribution is given by $F(x) = \nu((-\infty, x])$. The Dirac measure, $\delta_t$, assigns unit measure to the set $\{t\}$ and zero measure to sets not containing $t$. A design, $X = \{x_i\}_{i=1}^N$, is a finite set of points with empirical distribution $F_X$ and empirical measure $\nu_X = \frac{1}{N} \sum_{i=1}^N \delta_{x_i}$.

Our notation for discrepancy takes the form of

$$D(F_X, F, K), \quad D(X, F, K), \quad D(X, \varrho, K), \quad D(X, \nu, K), \quad D(\nu_X, \nu, K), \quad \text{etc.},$$

all of which mean the same thing. The first argument always refers to the design, the second argument always refers to the target, and the third argument is a symmetric, positive definite kernel, which is explained below. We abuse the discrepancy notation because sometimes it is convenient to refer to the design as a set, $X$, other times by its empirical distribution, $F_X$, and other times by its empirical measure, $\nu_X$. Likewise, sometimes it is convenient to refer to the target as a probability measure, $\nu$, other times by its distribution function, $F$, and other times by its density function, $\varrho$.

In the remainder of this section we provide three interpretations of the discrepancy, summarized in Table 1. These results are presented in various places, including Hic99a; Hic17a. One interpretation of the discrepancy is the norm of $\nu - \nu_X$. The second and third interpretations consider the problem of evaluating the mean of a random variable $Y$, or equivalently a multidimensional integral

$$\mu = E(Y) = E[f(X)] = \int_\Omega f(x)\, \varrho(x)\, dx, \tag{3}$$

where $X$ is a random vector with density $\varrho$. The second interpretation of the discrepancy is the worst-case cubature error for integrands, $f$, in the unit ball of a Hilbert space. The third interpretation is the root mean squared cubature error for integrands, $f$, which are realizations of a stochastic process.

### 2.1 Definition in Terms of a Norm on a Hilbert Space of Measures

Let $\mathcal{M}$ be a Hilbert space of measures defined on the experimental region, $\Omega$. Assume that $\mathcal{M}$ includes all Dirac measures. Define the kernel function in terms of inner products of Dirac measures:

$$K(t, x) := \langle \delta_t, \delta_x \rangle_{\mathcal{M}}, \qquad \forall t, x \in \Omega. \tag{4}$$

The squared distance between two Dirac measures in $\mathcal{M}$ is then

$$\|\delta_x - \delta_t\|_{\mathcal{M}}^2 = K(t, t) - 2K(t, x) + K(x, x), \qquad \forall t, x \in \Omega. \tag{5}$$

It is straightforward to show that $K$ is symmetric in its arguments and positive definite, namely:

$$K(x, t) = K(t, x), \qquad \forall t, x \in \Omega, \tag{6a}$$

$$\sum_{i,k=1}^N c_i c_k K(x_i, x_k) > 0, \qquad \forall N \in \mathbb{N},\ c \in \mathbb{R}^N \setminus \{0\},\ X \subset \Omega. \tag{6b}$$

The inner product of arbitrary measures can be expressed in terms of a double integral of the kernel, $K$:

$$\langle \lambda, \nu \rangle_{\mathcal{M}} = \int_{\Omega \times \Omega} K(t, x)\, \lambda(dt)\, \nu(dx). \tag{7}$$

This can be established directly from (4) for $\mathcal{M}_0$, the vector space spanned by all Dirac measures. Letting $\mathcal{M}$ be the closure of the pre-Hilbert space $\mathcal{M}_0$ then yields (7).

The discrepancy of the design with respect to the target probability measure using the kernel can be defined as the norm of the difference between the target probability measure, $\nu$, and the empirical probability measure for $X$:

$$\begin{aligned} D^2(X, \nu, K) &:= \|\nu - \nu_X\|_{\mathcal{M}}^2 = \int_{\Omega \times \Omega} K(t, x)\, (\nu - \nu_X)(dt)\, (\nu - \nu_X)(dx) \\ &= \int_{\Omega \times \Omega} K(t, x)\, \nu(dt)\, \nu(dx) - \frac{2}{N} \sum_{i=1}^N \int_\Omega K(t, x_i)\, \nu(dt) + \frac{1}{N^2} \sum_{i,k=1}^N K(x_i, x_k). \end{aligned} \tag{8a}$$

The formula for the discrepancy may be written equivalently in terms of the probability distribution, $F$, or the probability density, $\varrho$, corresponding to the target probability measure, $\nu$:

$$D^2(X, F, K) = \int_{\Omega \times \Omega} K(t, x)\, dF(t)\, dF(x) - \frac{2}{N} \sum_{i=1}^N \int_\Omega K(t, x_i)\, dF(t) + \frac{1}{N^2} \sum_{i,k=1}^N K(x_i, x_k) \tag{8b}$$

$$\phantom{D^2(X, F, K)} = \int_{\Omega \times \Omega} K(t, x)\, \varrho(t)\, \varrho(x)\, dt\, dx - \frac{2}{N} \sum_{i=1}^N \int_\Omega K(t, x_i)\, \varrho(t)\, dt + \frac{1}{N^2} \sum_{i,k=1}^N K(x_i, x_k). \tag{8c}$$

Typically the computational cost of evaluating $K(t, x)$ for any pair of $d$-vectors $t, x$ is $O(d)$. Assuming that the integrals above can also be evaluated at a cost of $O(d)$ each, the computational cost of evaluating $D^2(X, F, K)$ is $O(dN^2)$.

The formulas for the discrepancy in (8) depend inherently on the choice of the kernel $K$. That choice is key to answering question (Q). An often used kernel is

$$K(t, x) = \prod_{j=1}^d \left[1 + \frac{1}{2}\big(|t_j| + |x_j| - |x_j - t_j|\big)\right]. \tag{9}$$

This kernel is plotted in Figure 1. The distance between two Dirac measures by (5) for this kernel in one dimension is

$$\|\delta_x - \delta_t\|_{\mathcal{M}} = \sqrt{|x - t|}.$$

The discrepancy for the uniform distribution on the unit cube defined in terms of the above kernel is expressed as

$$\begin{aligned} D^2(U, F_{\text{unif}}, K) &= \int_{(0,1)^d \times (0,1)^d} K(t, x)\, dt\, dx - \frac{2}{N} \sum_{i=1}^N \int_{(0,1)^d} K(t, u_i)\, dt + \frac{1}{N^2} \sum_{i,k=1}^N K(u_i, u_k) \\ &= \left(\frac{4}{3}\right)^d - \frac{2}{N} \sum_{i=1}^N \prod_{j=1}^d \left[1 + u_{ij} - \frac{u_{ij}^2}{2}\right] + \frac{1}{N^2} \sum_{i,k=1}^N \prod_{j=1}^d \left[1 + \min(u_{ij}, u_{ik})\right]. \end{aligned}$$
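This closed form can be evaluated directly. A minimal Python sketch (the function name and nested-list design layout are illustrative):

```python
from math import prod

def disc2_unif(U):
    """Squared discrepancy of a design U on (0,1)^d for the kernel
    K(t,x) = prod_j [1 + (|t_j| + |x_j| - |x_j - t_j|)/2],
    using the closed form: (4/3)^d minus a cross term plus a
    double sum over kernel evaluations at pairs of design points."""
    N, d = len(U), len(U[0])
    term1 = (4 / 3) ** d
    term2 = sum(prod(1 + u_j - u_j**2 / 2 for u_j in u) for u in U)
    term3 = sum(
        prod(1 + min(ui_j, uk_j) for ui_j, uk_j in zip(ui, uk))
        for ui in U for uk in U
    )
    return term1 - 2 * term2 / N + term3 / N**2
```

The double sum makes the cost $O(dN^2)$, consistent with the operation count given above.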

### 2.2 Definition in Terms of a Deterministic Cubature Error Bound

Now let $\mathcal{H}$ be a reproducing kernel Hilbert space (RKHS) of functions Aro50, $f : \Omega \to \mathbb{R}$, which appear as the integrand in (3). By definition, the reproducing kernel, $K$, is the unique function defined on $\Omega \times \Omega$ with the properties that $K(\cdot, x) \in \mathcal{H}$ for any $x \in \Omega$ and $f(x) = \langle K(\cdot, x), f \rangle_{\mathcal{H}}$. This second property implies that $K$ reproduces function values via the inner product. It can be verified that $K$ is symmetric in its arguments and positive definite as in (6).

The integral $\int_\Omega f(x)\varrho(x)\,dx$, which was identified as $\mu$ in (3), can be approximated by a sample mean:

$$\hat{\mu} = \frac{1}{N} \sum_{i=1}^N f(x_i). \tag{10}$$

The quality of this approximation to the integral, i.e., this cubature, depends in part on how well the empirical distribution of the design, $F_X$, matches the target distribution $F$ associated with the density function $\varrho$.

Define the cubature error as

$$\operatorname{err}(f, X) = \mu - \hat{\mu} = \int_\Omega f(x)\, \varrho(x)\, dx - \frac{1}{N} \sum_{i=1}^N f(x_i) = \int_\Omega f(x)\, d[F(x) - F_X(x)]. \tag{11}$$

Under modest assumptions on the reproducing kernel, $\operatorname{err}(\cdot, X)$ is a bounded, linear functional. By the Riesz representation theorem, there exists a unique representer, $\xi \in \mathcal{H}$, such that

$$\operatorname{err}(f, X) = \langle \xi, f \rangle_{\mathcal{H}}, \qquad \forall f \in \mathcal{H}.$$

The reproducing kernel allows us to write down an explicit formula for that representer, namely, $\xi(x) = \operatorname{err}(K(\cdot, x), X)$. By the Cauchy-Schwarz inequality, there is a tight bound on the squared cubature error, namely

$$|\operatorname{err}(f, X)|^2 = \langle \xi, f \rangle_{\mathcal{H}}^2 \le \|\xi\|_{\mathcal{H}}^2\, \|f\|_{\mathcal{H}}^2. \tag{12}$$

The first term on the right describes the contribution made by the quality of the cubature rule, while the second term describes the contribution to the cubature error made by the nature of the integrand.

The squared norm of the representer of the error functional is

$$\begin{aligned} \|\xi\|_{\mathcal{H}}^2 &= \langle \xi, \xi \rangle_{\mathcal{H}} = \operatorname{err}(\xi, X) && \text{since } \xi \text{ represents the error functional} \\ &= \operatorname{err}\big(\operatorname{err}(K(\cdot, {\cdot\cdot}), X), X\big) && \text{since } \xi(x) = \operatorname{err}(K(\cdot, x), X) \\ &= \int_{\Omega \times \Omega} K(t, x)\, d[F(t) - F_X(t)]\, d[F(x) - F_X(x)]. \end{aligned}$$

We can equate this formula for $\|\xi\|_{\mathcal{H}}^2$ with the formula for $D^2(X, F, K)$ in (8). Thus, the tight, worst-case cubature error bound in (12) can be written in terms of the discrepancy as

$$|\operatorname{err}(f, X)| \le \|f\|_{\mathcal{H}}\, D(X, F, K).$$

This implies our second interpretation of the discrepancy in Table 1.

We now identify the RKHS for the kernel defined in (9). Let $\Omega = (a, b) = (a_1, b_1) \times \cdots \times (a_d, b_d)$ be some $d$-dimensional box containing the origin in the interior or on the boundary. For any $u \subseteq \{1, \ldots, d\}$, define $\partial_u f$, the mixed first-order partial derivative of $f$ with respect to the $x_j$ for $j \in u$, while setting $x_j = 0$ for all $j \notin u$. Here, $x_u := (x_j)_{j \in u}$, and $|u|$ denotes the cardinality of $u$. By convention, $\partial_\emptyset f = f(0)$. The inner product for the reproducing kernel defined in (9) is defined as

$$\begin{aligned} \langle f, g \rangle_{\mathcal{H}} &:= \sum_{u \subseteq \{1, \ldots, d\}} \int_{(a_u, b_u)} \partial_u f(x_u)\, \partial_u g(x_u)\, dx_u \\ &= f(0) g(0) + \int_{a_1}^{b_1} \partial_{\{1\}} f(x_1)\, \partial_{\{1\}} g(x_1)\, dx_1 + \int_{a_2}^{b_2} \partial_{\{2\}} f(x_2)\, \partial_{\{2\}} g(x_2)\, dx_2 + \cdots \\ &\quad + \int_{a_2}^{b_2} \int_{a_1}^{b_1} \partial_{\{1,2\}} f(x_1, x_2)\, \partial_{\{1,2\}} g(x_1, x_2)\, dx_1\, dx_2 + \cdots + \int_{(a,b)} \partial_{\{1,\ldots,d\}} f(x)\, \partial_{\{1,\ldots,d\}} g(x)\, dx. \end{aligned} \tag{13}$$

To establish that the inner product defined above corresponds to the reproducing kernel defined in (9), we note that

$$\partial_u K((x_u, 0), t) = \prod_{j \in u} \frac{1}{2}\big[\operatorname{sign}(x_j) - \operatorname{sign}(x_j - t_j)\big] = \prod_{j \in u} \operatorname{sign}(t_j)\, \mathbb{1}_{(\min(0, t_j),\, \max(0, t_j))}(x_j).$$

Thus, $K(\cdot, t)$ possesses sufficient regularity to have finite $\mathcal{H}$-norm. Furthermore, $K$ exhibits the reproducing property for the above inner product because

$$\begin{aligned} \langle K(\cdot, t), f \rangle_{\mathcal{H}} &= \sum_{u \subseteq \{1, \ldots, d\}} \int_{(a_u, b_u)} \partial_u K((x_u, 0), t)\, \partial_u f(x_u, 0)\, dx_u \\ &= \sum_{u \subseteq \{1, \ldots, d\}} \int_{(a_u, b_u)} \prod_{j \in u} \operatorname{sign}(t_j)\, \mathbb{1}_{(\min(0, t_j),\, \max(0, t_j))}(x_j)\, \partial_u f(x_u, 0)\, dx_u \\ &= \sum_{u \subseteq \{1, \ldots, d\}} \sum_{v \subseteq u} (-1)^{|u| - |v|} f(t_v, 0) = f(t). \end{aligned}$$

### 2.3 Definition in Terms of the Root Mean Squared Cubature Error

Assume $\Omega$ is a measurable subset of $\mathbb{R}^d$ and $F$ is the target probability distribution defined on $\Omega$ as defined earlier. Now, let $f$ be a stochastic process with a constant pointwise mean, i.e.,

$$E_{f \in \mathcal{A}}[f(x)] = m, \qquad \forall x \in \Omega,$$

where $\mathcal{A}$ is the sample space for this stochastic process. Now we interpret $K$ as the covariance kernel for the stochastic process:

$$K(t, x) := E_{f \in \mathcal{A}}\big([f(t) - m][f(x) - m]\big) = \operatorname{cov}\big(f(t), f(x)\big), \qquad \forall t, x \in \Omega.$$

It is straightforward to show that the kernel function is symmetric and positive definite.

Define the error functional in the same way as in (11). Now, the mean squared error is

$$\begin{aligned} E_{f \in \mathcal{A}}\big[\operatorname{err}(f, X)^2\big] &= E_{f \in \mathcal{A}}\left\{\int_\Omega f(x)\, dF(x) - \frac{1}{N} \sum_{i=1}^N f(x_i)\right\}^2 \\ &= E_{f \in \mathcal{A}}\left\{\int_\Omega \big(f(x) - m\big)\, dF(x) - \frac{1}{N} \sum_{i=1}^N \big(f(x_i) - m\big)\right\}^2 \\ &= \int_{\Omega^2} E_{f \in \mathcal{A}}\big[(f(t) - m)(f(x) - m)\big]\, dF(t)\, dF(x) - \frac{2}{N} \sum_{i=1}^N \int_\Omega E_{f \in \mathcal{A}}\big[(f(x) - m)(f(x_i) - m)\big]\, dF(x) \\ &\quad + \frac{1}{N^2} \sum_{i,k=1}^N E_{f \in \mathcal{A}}\big[(f(x_i) - m)(f(x_k) - m)\big] \\ &= \int_{\Omega^2} K(t, x)\, dF(t)\, dF(x) - \frac{2}{N} \sum_{i=1}^N \int_\Omega K(x, x_i)\, dF(x) + \frac{1}{N^2} \sum_{i,k=1}^N K(x_i, x_k). \end{aligned}$$

Therefore, we can equate the discrepancy defined in (8) with the root mean squared error:

$$D(X, F, K) = \sqrt{E_{f \in \mathcal{A}}\big[\operatorname{err}(f, X)^2\big]} = \sqrt{E\left|\int_\Omega f(x)\, \varrho(x)\, dx - \frac{1}{N} \sum_{i=1}^N f(x_i)\right|^2}.$$

## 3 When a Transformed Low Discrepancy Design Also Has Low Discrepancy

Having motivated the definition of discrepancy in (8) from three perspectives, we now turn our attention to question (Q), namely, does a transformation of low discrepancy points with respect to the uniform distribution yield low discrepancy points with respect to the new target distribution? In this section, we show a positive result, yet recognize some qualifications.

Consider some symmetric, positive definite kernel, $K_{\text{unif}}$, some uniform design $U \subset (0,1)^d$, some other domain, $\Omega$, some other target distribution, $F$, and some transformation $\Psi$ as defined in (1). Then the squared discrepancy of the uniform design can be expressed according to (8) as follows:

$$\begin{aligned} D^2(U, F_{\text{unif}}, K_{\text{unif}}) &= \int_{(0,1)^d \times (0,1)^d} K_{\text{unif}}(u, v)\, du\, dv - \frac{2}{N} \sum_{i=1}^N \int_{(0,1)^d} K_{\text{unif}}(u, u_i)\, du + \frac{1}{N^2} \sum_{i,k=1}^N K_{\text{unif}}(u_i, u_k) \\ &= \int_{\Omega \times \Omega} K_{\text{unif}}\big(\Psi^{-1}(t), \Psi^{-1}(x)\big) \left|\frac{\partial \Psi^{-1}(t)}{\partial t}\right| \left|\frac{\partial \Psi^{-1}(x)}{\partial x}\right| dt\, dx \\ &\quad - \frac{2}{N} \sum_{i=1}^N \int_\Omega K_{\text{unif}}\big(\Psi^{-1}(t), \Psi^{-1}(x_i)\big) \left|\frac{\partial \Psi^{-1}(t)}{\partial t}\right| dt + \frac{1}{N^2} \sum_{i,k=1}^N K_{\text{unif}}\big(\Psi^{-1}(x_i), \Psi^{-1}(x_k)\big) \\ &= D^2(X, F, K), \end{aligned}$$

where the kernel $K$ is defined as

$$K(t, x) = K_{\text{unif}}\big(\Psi^{-1}(t), \Psi^{-1}(x)\big), \tag{14a}$$

and provided that the density, $\varrho$, corresponding to the target distribution, $F$, satisfies

$$\varrho(x) = \left|\frac{\partial \Psi^{-1}(x)}{\partial x}\right|. \tag{14b}$$

The above argument is summarized in the following theorem.

###### Theorem 3.1

Suppose that the design $X$ is constructed by transforming the design $U$ according to the transformation (1). Also suppose that conditions (14) are satisfied. Then $X$ has the same discrepancy with respect to the target distribution, $F$, defined by the kernel $K$ as the original design $U$ has with respect to the uniform distribution, $F_{\text{unif}}$, defined by the kernel $K_{\text{unif}}$. That is,

$$D(X, F, K) = D(U, F_{\text{unif}}, K_{\text{unif}}).$$

As a consequence, under conditions (14), question (Q) has a positive answer.

Condition (14b) may be easily satisfied. For example, it is automatically satisfied by the inverse cumulative distribution transform (2). Condition (14a) is simply a matter of definition of the kernel, $K$, but this definition has consequences. From the perspective of Section 2.1, changing the kernel from $K_{\text{unif}}$ to $K$ means changing the definition of the distance between two Dirac measures. From the perspective of Section 2.2, it means changing the definition of the Hilbert space of integrands, $f$, in (3). From the perspective of Section 2.3, it means changing the definition of the covariance kernel for the integrands, $f$, in (3).

To illustrate this point, consider a cousin of the kernel in (9), which places the reference point at $(1/2, \ldots, 1/2)$, the center of the unit cube $(0,1)^d$:

$$K_{\text{unif}}(u, v) = \prod_{j=1}^d \left[1 + \frac{1}{2}\big(|u_j - 1/2| + |v_j - 1/2| - |u_j - v_j|\big)\right] = K(u - 0.5, v - 0.5) \quad \text{for } K \text{ defined in (9)}. \tag{15}$$

This kernel defines the centered $L^2$-discrepancy Hic97a. Consider the standard multivariate normal distribution, $F = F_{\text{normal}}$, and choose $\Psi$ to be the inverse normal distribution transformation,

$$\Psi(u) = \big(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_d)\big), \tag{16}$$

where $\Phi$ denotes the standard normal distribution function. Then condition (14b) is automatically satisfied, and condition (14a) is satisfied by defining

$$K(t, x) = K_{\text{unif}}\big(\Psi^{-1}(t), \Psi^{-1}(x)\big) = \prod_{j=1}^d \left[1 + \frac{1}{2}\big(|\Phi(t_j) - 1/2| + |\Phi(x_j) - 1/2| - |\Phi(t_j) - \Phi(x_j)|\big)\right].$$

In one dimension, the distance between two Dirac measures defined using the kernel above is $\sqrt{|\Phi(x) - \Phi(t)|}$, whereas the distance defined using the kernel in (9) is $\sqrt{|x - t|}$. Under the kernel above, the distance between two Dirac measures is bounded, even though the domain of the distribution is unbounded. Such an assumption may be unpalatable.

## 4 Do Transformed Low Discrepancy Points Have Low Discrepancy More Generally?

The discussion above indicates that condition (14a) can be too restrictive. We would like to compare the discrepancies of designs under kernels that do not satisfy that restriction. In particular, we consider the centered $L^2$-discrepancy for uniform designs on $(0,1)^d$ defined by the kernel in (15):

$$\begin{aligned} D^2(U, F_{\text{unif}}, K_{\text{unif}}) &= \left(\frac{13}{12}\right)^d - \frac{2}{N} \sum_{i=1}^N \prod_{j=1}^d \left[1 + \frac{1}{2}\big(|u_{ij} - 1/2| - |u_{ij} - 1/2|^2\big)\right] \\ &\quad + \frac{1}{N^2} \sum_{i,k=1}^N \prod_{j=1}^d \left[1 + \frac{1}{2}\big(|u_{ij} - 1/2| + |u_{kj} - 1/2| - |u_{ij} - u_{kj}|\big)\right], \end{aligned}$$

where again, $F_{\text{unif}}$ denotes the uniform distribution on $(0,1)^d$, and $U$ denotes a design on $(0,1)^d$.

Changing perspectives slightly, if $F'_{\text{unif}}$ denotes the uniform distribution on the cube of volume one centered at the origin, $(-1/2, 1/2)^d$, and the design $U'$ is constructed by subtracting $0.5$ from each coordinate of each point in the design $U$:

$$U' = \{u - 0.5 : u \in U\}, \tag{17}$$

then

$$D(U', F'_{\text{unif}}, K) = D(U, F_{\text{unif}}, K_{\text{unif}}),$$

where $K$ is the kernel defined in (9).

Recall that the origin is a special point in the definition of the inner product (13) for the Hilbert space with the kernel in (9) as its reproducing kernel. Therefore, this $K$ from (9) is appropriate for defining the discrepancy for target distributions centered at the origin, such as the standard normal distribution, $F_{\text{normal}}$. Such a discrepancy is

$$\begin{aligned} D^2(X, F_{\text{normal}}, K) &= \left(1 + \sqrt{\frac{2}{\pi}} - \frac{1}{\sqrt{\pi}}\right)^d - \frac{2}{N} \sum_{i=1}^N \prod_{j=1}^d \left[1 + \frac{1}{\sqrt{2\pi}} + \frac{1}{2}|x_{ij}| - x_{ij}\Big(\Phi(x_{ij}) - \frac{1}{2}\Big) - \phi(x_{ij})\right] \\ &\quad + \frac{1}{N^2} \sum_{i,k=1}^N \prod_{j=1}^d \left[1 + \frac{1}{2}\big(|x_{ij}| + |x_{kj}| - |x_{ij} - x_{kj}|\big)\right]. \end{aligned} \tag{18}$$

Here, $\phi$ is the standard normal probability density function. The derivation of (18) is given in the Appendix.
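The closed form (18) can be sketched in Python using only the standard library; the function names are illustrative, and the leading constant follows from integrating the kernel (9) against the normal density:

```python
from math import erf, exp, pi, sqrt, prod

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def phi(x):
    """Standard normal density function."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def disc2_normal(X):
    """Squared discrepancy (18) of a design X in R^d with respect to
    the standard normal distribution, for the kernel in (9)."""
    N, d = len(X), len(X[0])
    term1 = (1 + sqrt(2 / pi) - 1 / sqrt(pi)) ** d
    term2 = sum(
        prod(1 + 1 / sqrt(2 * pi) + abs(x_j) / 2
             - x_j * (Phi(x_j) - 0.5) - phi(x_j) for x_j in x)
        for x in X
    )
    term3 = sum(
        prod(1 + (abs(xi_j) + abs(xk_j) - abs(xi_j - xk_j)) / 2
             for xi_j, xk_j in zip(xi, xk))
        for xi in X for xk in X
    )
    return term1 - 2 * term2 / N + term3 / N**2
```

As with the uniform case, the cost is dominated by the $O(dN^2)$ double sum.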

We numerically compare the discrepancy of a uniform design, $U'$ given by (17), and the discrepancy of a design constructed by the inverse normal transformation, i.e., $X = \Psi(U)$ for $\Psi$ in (16), where the $U$ leading to both $U'$ and $X$ is identical. We do not expect the magnitudes of the discrepancies to be the same, but we ask

$$\text{Does } D(U_1', F'_{\text{unif}}, K) \le D(U_2', F'_{\text{unif}}, K) \text{ imply } D(\Psi(U_1), F_{\text{normal}}, K) \le D(\Psi(U_2), F_{\text{normal}}, K)? \tag{19}$$

Again, $K$ is given by (9). So we are actually comparing discrepancies defined by the same kernels, but not kernels that satisfy (14a).

We generate independent and identically distributed (IID) uniform designs with $N$ points on $(0,1)^d$ and then use the inverse distribution transformation to obtain IID random designs, $\Psi(U)$. Figure 2 plots the discrepancies for the normal designs, $D(\Psi(U), F_{\text{normal}}, K)$, against the discrepancies for the uniform designs, $D(U', F'_{\text{unif}}, K)$, for each of the designs. Question (19) has a positive answer if and only if the lines passing through any two points on this plot all have non-negative slopes. However, that is not the case. Thus (19) has a negative answer.

We further investigate the relationship between the discrepancy of a uniform design and the discrepancy of the same design after inverse normal transformation. Varying the dimension $d$, we calculate the sample correlation between $D(U', F'_{\text{unif}}, K)$ and $D(\Psi(U), F_{\text{normal}}, K)$ for IID designs of size $N$. Figure 3 displays the correlation as a function of $d$. Although the correlation is positive, it degrades with increasing $d$.

Figure 3: Correlation between the uniform and normal discrepancies for different dimensions.
###### Example 1

A simple cubature example illustrates that an inverse transformed low discrepancy design, $X = \Psi(U)$, may yield a large $D(X, F_{\text{normal}}, K)$ and also a large cubature error. Consider the integration problem in (3) with

$$X \sim \mathcal{N}(0, I_d), \qquad f(x) = \frac{x_1^2 + \cdots + x_d^2}{1 + 10^{-8}(x_1^2 + \cdots + x_d^2)}, \qquad Y = f(X), \tag{20a}$$

$$\mu = E(Y) = \int_{\mathbb{R}^d} \frac{x_1^2 + \cdots + x_d^2}{1 + 10^{-8}(x_1^2 + \cdots + x_d^2)}\, \phi(x)\, dx, \tag{20b}$$

where $\phi$ is the probability density function for the standard multivariate normal distribution. The function $f$ is constructed to asymptote to a constant as $\|x\|$ tends to infinity to ensure that $f$ lies inside the Hilbert space $\mathcal{H}$ corresponding to the kernel $K$ defined in (9). Since the integrand in (20) is a function of $\|x\|^2$, $\mu$ can be written as a one-dimensional integral and evaluated to many significant digits using quadrature.

We can also approximate the integral in (20) using the cubature (10). We compare cubatures using two designs. The design $X_1 = \Psi(U_1)$ is the inverse normal transformation of a scrambled Sobol' sequence, $U_1$, which has a low discrepancy with respect to the uniform distribution on the $d$-dimensional unit cube. The design $U_2$ takes the point in $U_1$ that is closest to the origin and moves it to a point very close to the origin. As seen in Table 2, the two uniform designs have quite similar, small discrepancies. However, the transformed designs, $X_i = \Psi(U_i)$ for $i = 1, 2$, have much different discrepancies with respect to the normal distribution. This is due to the point in $X_2$ that has large negative coordinates. Furthermore, the cubatures, $\hat{\mu}$, based on these two designs have significantly different errors. The first design has both a smaller discrepancy and a smaller cubature error than the second. This could not have been inferred by looking at the discrepancies of the original uniform designs.
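The cubature (10) for this example can be sketched as follows. To keep the sketch dependency-free, IID standard normal points stand in for the transformed Sobol' design of the example (so the error behaves like plain Monte Carlo rather than quasi-Monte Carlo); the function names are illustrative:

```python
import random

def f(x, eps=1e-8):
    """Integrand in (20): asymptotes to a constant as ||x|| grows."""
    r2 = sum(x_j**2 for x_j in x)
    return r2 / (1 + eps * r2)

def cubature(d, N, seed=0):
    """Sample-mean cubature (10) for mu = E[f(X)], X ~ N(0, I_d),
    using IID normal points in place of a transformed design."""
    rng = random.Random(seed)
    return sum(f([rng.gauss(0, 1) for _ in range(d)])
               for _ in range(N)) / N

# Since f(x) is close to ||x||^2 and E||X||^2 = d, the estimate is near d.
print(cubature(d=3, N=10000))
```

Replacing the IID points with an inverse-transformed low discrepancy design, as in the example, typically reduces the error, unless the transformation produces an extreme point of the kind discussed above.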

## 5 Improvement by the Coordinate-Exchange Method

In this section, we propose an efficient algorithm that improves a design’s quality in terms of the discrepancy for the target distribution. We start with a low discrepancy uniform design, such as a Sobol’ sequence, and transform it into a design that approximates the target distribution. Following the optimal design approach, we then apply a coordinate-exchange algorithm to further improve the discrepancy of the design.

The coordinate-exchange algorithm was introduced in meyer1995coordinate, and has since been applied widely to construct various kinds of optimal designs sambo2014coordinate; overstall2017bayesian; kang2018stochastic. The coordinate-exchange algorithm is an iterative method. It finds the "worst" coordinate of the current design and replaces it to decrease the loss function, in this case, the discrepancy. The most appealing advantage of the coordinate-exchange algorithm is that at each step one need only solve a univariate optimization problem.

First, we define the point deletion function, $d_p$, as the change in squared discrepancy resulting from removing a point from the design:

$$d_p(i) = D^2(X) - \left(\frac{N-1}{N}\right)^2 D^2(X \setminus \{x_i\}). \tag{21}$$

Here, the design $X \setminus \{x_i\}$ is the $(N-1)$-point design with the point $x_i$ removed. We suppress the choice of target distribution and kernel in the above discrepancy notation for simplicity. We then choose

$$i^* = \operatorname*{argmax}_{i = 1, \ldots, N} d_p(i).$$

The definition of $i^*$ means that removing $x_{i^*}$ from the design results in the smallest discrepancy among all possible deletions. Thus, $x_{i^*}$ is helping the least, which makes it a prime candidate for modification.
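The point deletion step (21) and the choice of $i^*$ can be sketched generically; `disc2` stands for any squared-discrepancy routine (e.g. the closed forms earlier in the chapter), and the function name is illustrative:

```python
def point_deletion(disc2, X):
    """Compute dp(i) from (21) for every point of the design X and
    return the index i* of the least helpful point, where disc2 is
    a user-supplied squared-discrepancy function."""
    N = len(X)
    D2 = disc2(X)
    dp = [D2 - ((N - 1) / N) ** 2 * disc2(X[:i] + X[i + 1:])
          for i in range(N)]
    i_star = max(range(N), key=lambda i: dp[i])  # prime candidate
    return dp, i_star
```

Recomputing the discrepancy from scratch for each deletion costs $O(dN^3)$ overall; in practice one would update the cross and double-sum terms incrementally instead.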

Next, we define a coordinate deletion function, $d_c$, as the change in the squared discrepancy resulting from removing a coordinate in our calculation of the discrepancy:

$$d_c(j) = D^2(X) - D^2(X_{-j}). \tag{22}$$

Here, the design $X_{-j}$ still has $N$ points but now only $d - 1$ dimensions, the $j$th coordinate having been removed. For this calculation to be feasible, the target distribution must have independent marginals. Also, the kernel must be of product form. To simplify the derivation, we assume a somewhat stronger condition, namely that the marginals are identical and that each term in the product defining the kernel is the same for every coordinate:

$$\Omega = \tilde{\Omega} \times \cdots \times \tilde{\Omega}, \qquad K(t, x) = \prod_{j=1}^d \big[1 + \tilde{K}(t_j, x_j)\big], \qquad \tilde{K} : \tilde{\Omega} \times \tilde{\Omega} \to \mathbb{R}. \tag{23}$$

We then choose

$$j^* = \operatorname*{argmax}_{j = 1, \ldots, d} d_c(j).$$

For reasons analogous to those given above, the