    # Monte Carlo integration of non-differentiable functions on [0,1]^ι, ι=1,...,d, using a single determinantal point pattern defined on [0,1]^d

This paper concerns the use of a particular class of determinantal point processes (DPPs), a class of repulsive spatial point processes, for Monte Carlo integration. Let d ≥ 1 and I ⊆ {1,…,d} with ι = |I|. Using a single set of N quadrature points {u_1,…,u_N} defined, once and for all, in dimension d from the realization of the DPP model, we investigate "minimal" assumptions on the integrand in order to obtain unbiased Monte Carlo estimates of μ(f_I) = ∫_{[0,1]^ι} f_I(u) du for any known ι-dimensional integrable function on [0,1]^ι. In particular, we show that the resulting estimator has variance of order N^{-1-(2s∧1)/d} when the integrand belongs to some Sobolev space with regularity s > 0. When s > 1/2 (which includes a large class of non-differentiable functions), the variance is asymptotically explicit and the estimator is shown to satisfy a Central Limit Theorem.


## Introduction

In the context of computer experiments (see for example [Santneretal13, Chapter 5]), complex phenomena are simulated using a mathematical model to replace the process which generates the data. Usually, the model depends on a large number of parameters (inputs). An objective of the experiments is to quantify the influence of the variability of the inputs on the variable of interest. An experiment consists in running simulations, where each simulation represents a possible combination of the inputs. It is impossible in practice to consider all possible configurations, the number of simulations being limited. Therefore, the design of experiments, i.e. the choice of combinations of inputs, is of great importance. Under a lack of information on how inputs are linked to outputs, a strategy is to spread the chosen inputs so as to cover the input space as much as possible. This technique is called space-filling design. It can be summarized as generating points in a given space which cover this space regularly. Latin hypercubes [McKayetal79, Owen92] and low-discrepancy sequences (see e.g. [Halton64, Sobol67]) are standard methods to generate such designs. The goal of computer experiments is not only to examine the influence of all the inputs on an output of interest, but also the influence of a subset of these inputs, or of particular combinations of subsets of these inputs. Since computer experiments may be very expensive in terms of computational load and/or storage capacity, the regularity of the coverage should be preserved when the initial configuration is projected onto lower-dimensional spaces. This would make it possible to use the initial configuration to study, for example, the influence of subsets of inputs with the same efficiency.

We investigate in this paper a similar problem in the context of Monte Carlo integration. Let d ≥ 1 and I ⊆ {1,…,d} with ι = |I|; we aim to estimate, under "minimal" assumptions, the integral μ(f_I) = ∫_{[0,1]^ι} f_I(u) du for any known ι-dimensional integrable function f_I on [0,1]^ι, using a single set of N quadrature points defined once and for all in dimension d. We insist on the fact that the same set of nodes is used to estimate different integrals. Hence, this set does not exploit the form of f_I, the locations of its possible singularities, etc.

Monte Carlo integration has a long history and it is not the aim of this paper to give a detailed bibliography. We refer the interested reader to the electronic book [owen:book:13] for an extensive treatment and bibliography. Let us however cite a few methods, keeping in mind what we mean by "minimal" assumptions in the situation I = {1,…,d}. Ordinary Monte Carlo methods and importance sampling methods (see e.g. [RobertCasella04]) consist in using N i.i.d. nodes drawn from a so-called proposal density (the uniform density in the usual situation). Under a square-integrability type assumption on the integrand f, the resulting estimator, denoted by ˆμ_N(f), has a variance proportional to N^{-1} and satisfies a Central Limit Theorem. When d is large, Markov chain Monte Carlo (MCMC) methods, where the set of quadrature points is the realization of a particular Markov chain, are usually preferred. In that case, the variance of ˆμ_N(f) is still of order N^{-1} and a CLT holds (see e.g. [douc:moulines:stoffer:14]). To improve the rate of convergence, the price to pay is to require some regularity assumptions on f. Many methods exist in the literature: grid-based stratified methods [haber:66], possibly combined with antithetic sampling (see [owen:book:13, Chapter 10]), Quasi-Monte Carlo and randomized versions, scrambled nets [dick:etal:13, owen:97, owen:08, owen:book:13, basu:mukherjee:17], etc. For example, a version of scrambled nets with antithetic sampling can lead to an estimator with faster-decaying variance if, to simplify Owen's assumption [owen:08], f is sufficiently smooth on [0,1]^d. Additional assumptions on the scrambled net are required to obtain a CLT. Grid-based stratified methods, which are maybe the first simple alternative to ordinary Monte Carlo methods, require that f is continuously differentiable on [0,1]^d and yield an estimator satisfying a CLT with variance asymptotically proportional to N^{-1-2/d}. Let us also mention that [owen:97] showed that a version of scrambled nets has a variance o(N^{-1}) under the sole assumption that f ∈ L²([0,1]^d), but the rate is not explicit unless strong regularity assumptions are made on f.
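For reference, the N^{-1} behavior of ordinary Monte Carlo is easy to reproduce numerically. The following minimal sketch (our own illustration, with an arbitrary test integrand on [0,1]²) checks that the empirical variance across replications matches Var[f(U)]/N.

```python
import numpy as np

# Ordinary Monte Carlo with N i.i.d. uniform nodes on [0,1]^d:
# Var[mu_hat_N(f)] = Var[f(U)]/N, i.e. proportional to N^{-1}.
rng = np.random.default_rng(0)

def mc_estimate(f, N, d):
    u = rng.random((N, d))
    return np.mean(f(u))

f = lambda u: np.prod(u, axis=1)   # mu(f) = 1/4 on [0,1]^2
reps = [mc_estimate(f, 1000, 2) for _ in range(500)]

# Empirical variance is close to Var[f(U)]/N = (1/9 - 1/16)/1000
assert abs(np.mean(reps) - 0.25) < 0.01
assert np.var(reps) < 2 * (1 / 9 - 1 / 16) / 1000
```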

In a recent work, another alternative, defining the nodes as the realization of a repulsive point pattern, and in particular of a determinantal point pattern, has been proposed in [BardenetHardy20]. The class of determinantal point processes (DPPs for short) has received growing attention in the last decades (see e.g. [Soshnikov00, ShiraiTakahashi03, Houghetal09, Lavancieretal15_1, Decreusefondetal16]), thanks to its very appealing properties, in particular in terms of tractability and exact simulation. [BardenetHardy20] have defined an Orthogonal Polynomial Ensemble, which is a particular inhomogeneous DPP whose kernel is defined through orthonormal polynomials. Under the assumption that the integrand is continuously differentiable and compactly supported, the authors obtained an estimator with variance equivalent to an explicit constant times N^{-1-1/d}.

In this paper, we investigate a different DPP model. To be more explicit, we consider the most natural kernel, called the Dirichlet kernel in this paper, which is based on the Fourier basis indexed by a rectangular subset E_N of Z^d. It is a projection DPP which produces almost surely N points. It has the advantage of leading to a homogeneous DPP pattern, an interesting characteristic as we want the pattern to be used to estimate any integral. A second advantage is that the marginals are fully characterized and explicit, which means that they can efficiently be used to estimate μ(f_I) for any I ⊆ {1,…,d}. Last but not least, our main result, Theorem 3.1, shows that the resulting estimator has variance of order N^{-1-(2s∧1)/d} for any f ∈ H^s, where H^s is some Sobolev space with regularity s > 0, see (3.4) for more details. We recall that if f is periodic and continuously differentiable then f ∈ H^1. In particular, our result states that when s > 1/2 (thus potentially for non-differentiable functions), the variance is asymptotically equivalent to an explicit constant times N^{-1-1/d}. In this case, we also obtain a central limit theorem for the estimator (assuming in addition that the integrand is bounded). As a summary, the proposed estimator has the characteristic of exhibiting a variance that decreases faster than that of ordinary Monte Carlo as soon as s > 0. The decay is slower than for methods such as grid-based methods, scrambled nets, etc., but much less regularity is required and the method can be applied to any ι-dimensional function, ι = 1,…,d.

The paper is organized as follows. Section 1 contains a background on spatial point processes, generalities on the projection of spatial point processes and DPPs. We also outline the interest of repulsive point processes and in particular DPPs for Monte Carlo integration. Section 2 introduces the Dirichlet DPP and exposes some of its properties. Our main result, Theorem 3.1, is presented in Section 3. It details convergence results for Monte Carlo integration based on the realization of a Dirichlet DPP. A multivariate version of this CLT is also proposed. Section 4 discusses computational aspects for the simulation of Dirichlet DPPs and contains a simulation study which illustrates our results. Finally, proofs of the results are postponed to Appendix B.

## 1 Background and notation

### 1.1 Spatial point processes

A spatial point process X defined on a Borel set B ⊆ R^d is a locally finite random measure on B, see for example [MollerWaagepeterson04] and references therein for measure-theoretical details, whose realization is of the form {x^(1),…,x^(n)}, where n is the realization of a random variable and the x^(i)'s represent the events. We assume that X is simple, meaning that two events cannot occur at the same location. Thus, X is viewed as a locally finite random set.

In most cases, the distribution of a point process can be described by its intensity functions ρ_X^(k), k ≥ 1. By Campbell's theorem, see e.g. [MollerWaagepeterson04], ρ_X^(k) is characterized by the following integral representation: for any non-negative measurable function h : B^k → R_+,

$$
\mathbb{E}\Big[\sum^{\neq}_{x^{(1)},\dots,x^{(k)}\in X} h(x^{(1)},\dots,x^{(k)})\Big] = \int_{B^k} \rho_X^{(k)}(x^{(1)},\dots,x^{(k)})\, h(x^{(1)},\dots,x^{(k)})\,\mathrm{d}x^{(1)}\cdots\mathrm{d}x^{(k)} \tag{1.1}
$$

where ≠ over the summation means that x^(1),…,x^(k) are pairwise distinct points. Intuitively, for any pairwise distinct points x^(1),…,x^(k), the quantity ρ_X^(k)(x^(1),…,x^(k)) dx^(1) ⋯ dx^(k) is the probability that X has a point in each of the infinitesimally small sets around x^(1),…,x^(k) with volumes dx^(1),…,dx^(k), respectively. When k = 1, this yields the intensity function, simply denoted by ρ_X. The second-order intensity ρ_X^(2) is used to define the pair correlation function

$$
g_X(x^{(1)},x^{(2)}) = \frac{\rho_X^{(2)}(x^{(1)},x^{(2)})}{\rho_X(x^{(1)})\,\rho_X(x^{(2)})} \tag{1.2}
$$

for pairwise distinct x^(1), x^(2) ∈ B. By convention, g_X(x^(1), x^(2)) is set to 0 if ρ_X(x^(1)) or ρ_X(x^(2)) is zero. The pair correlation function (pcf for short) can be used to describe the local interaction between points of X located at x^(1) and x^(2): g_X(x^(1), x^(2)) > 1 characterizes positive correlation between the points; g_X(x^(1), x^(2)) = 1 means there is no interaction (typically a Poisson point process); g_X(x^(1), x^(2)) < 1 characterizes negative correlation. A point process is often referred to as a repulsive point process if g_X(x^(1), x^(2)) ≤ 1 for any x^(1), x^(2) ∈ B (see e.g. [Illianetal08, Section 6.5]). Finally, a point process with constant intensity function on B is said to be homogeneous.

### 1.2 Projection of a spatial point process

In this work, we sometimes consider projections of spatial point processes. By projection, we mean that we keep a given number of coordinates from the original spatial point process. Such a framework requires that the original point process be defined on a compact set: otherwise, the points of the projected point process may not form a locally finite configuration, as also noticed in the two-dimensional case in [Baddeley07, p. 17].

This section presents the notation used in this context. Let I ⊆ {1,…,d} with cardinality ι = |I|. Given compact sets B_1,…,B_d of R and B = B_1 × ⋯ × B_d, we denote by B_I the orthogonal projection of B onto the coordinates indexed by I. For x ∈ B, we denote by x_I the orthogonal projection of x onto B_I. To ease the reading, we let B_i = [0,1] for i = 1,…,d and, to fix ideas, let I = {1,…,ι}. Thus, B = [0,1]^d and B_I = [0,1]^ι. For any x ∈ B, we let x_I = (x_1,…,x_ι) and, for a point process X defined on B, the projected point process X_I = {x_I : x ∈ X} is then defined on B_I. Intensity functions and Laplace functionals for X_I can be derived from the corresponding functions and functionals of X, see [Mazoyeretal19] for more details and Section 2.2 for the particular model considered in this paper.
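Concretely, with the points of a pattern stored as rows of an array, the projection X_I simply keeps the columns indexed by I. A small sketch (our own illustration, with 0-based indices):

```python
import numpy as np

# Projection of a point pattern X on [0,1]^3 onto the coordinates in I.
# Here I = {1, 2} in the paper's notation, i.e. columns 0 and 1 (0-based).
X = np.array([[0.2, 0.7, 0.4],
              [0.9, 0.1, 0.5]])
I = [0, 1]
X_I = X[:, I]                      # each row x becomes x_I
assert X_I.shape == (2, 2)
assert np.allclose(X_I[0], [0.2, 0.7])
```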

### 1.3 Determinantal point processes

In this section, the class of continuous DPPs is introduced. Again, we restrict our attention to DPPs defined on a compact set B ⊂ R^d. A point process X on B is said to be a DPP on B with kernel K : B × B → C if, for any k ≥ 1, its kth order intensity function is given by

$$
\rho_X^{(k)}(x^{(1)},\dots,x^{(k)}) = \det\big[K(x^{(i)},x^{(j)})\big]_{i,j=1}^{k} \tag{1.3}
$$

and we simply denote X ∼ DPP(K). Note that K needs to be non-negative definite to ensure ρ_X^(k) ≥ 0 for all k. Our results rely on the spectral decomposition of K, see (1.5). Therefore, we assume that K is a continuous covariance function. The intensity of X is given by ρ_X(x) = K(x,x) and its pcf by

$$
g_X(x,y) = 1 - \frac{|K(x,y)|^2}{K(x,x)\,K(y,y)}. \tag{1.4}
$$

The popularity of DPPs relies mainly upon (1.3)-(1.4): all moments of X are explicit and, since K is Hermitian, g_X(x,y) ≤ 1 for any x, y ∈ B. The kernel K defines an integral operator 𝒦 (see e.g. [DebnathMikusinski05]) defined for any f ∈ L²(B) by

$$
\mathcal{K}(f)(x) = \int_B K(x,y)\,f(y)\,\mathrm{d}y, \qquad x\in B.
$$

From Mercer's Theorem [RieszSznagy90, Sec. 98], K admits the following spectral decomposition: for any x, y ∈ B,

$$
K(x,y) = \sum_{j\in\mathbb{N}} \lambda_j\,\phi_j(x)\,\overline{\phi_j(y)} \tag{1.5}
$$

where the φ_j's are the eigenfunctions of 𝒦 and form an orthonormal basis of L²(B), and where the λ_j's are the eigenvalues of 𝒦, satisfying λ_j ≥ 0 for any j ∈ N. We abuse notation in the sequel and refer to the λ_j's as the eigenvalues of K.

The existence of a DPP on B with kernel K is ensured if its eigenvalues satisfy λ_j ∈ [0,1] for any j ∈ N, see e.g. [Houghetal09, Theorem 4.5.5.]. Eigenvalues and eigenfunctions are indexed by N in (1.5), but other countable sets could be considered. In particular, the d-dimensional Fourier basis is indexed by Z^d. A DPP is said to be homogeneous if K is the restriction to B × B of a kernel defined on R^d × R^d which satisfies K(x,y) = K(x − y, 0) for any x, y, where 0 is the origin of R^d. In that case, we will refer to K as a stationary kernel and will use the abusive notation K(x,y) = K(x − y).

A kernel such that λ_j ∈ {0,1} for all j ∈ N is called a "projection kernel" and the corresponding DPP a "projection DPP". The number of points in B of such a model is almost surely constant and equal to the number of non-zero eigenvalues of K (see e.g. [Lavancieretal15_1]).

### 1.4 Why are DPPs interesting for Monte Carlo integration?

The repulsive nature of DPPs can be exploited to generate quadrature points that explore the input space nicely. To see this, let B ⊂ R^d be a bounded set, and let Y be a homogeneous point process on B with intensity parameter ρ_Y and pair correlation function g_Y (a similar result would hold in the inhomogeneous case). Campbell's Theorem (1.1) ensures that the estimator

$$
\hat\mu(f) = \frac{1}{\rho_Y} \sum_{u\in Y} f(u) \tag{1.6}
$$

is an unbiased estimator of μ(f) = ∫_B f(u) du, with variance

$$
\mathrm{Var}[\hat\mu(f)] = \frac{1}{\rho_Y}\int_B f(u)^2\,\mathrm{d}u + \int_{B^2} \big(g_Y(u,v)-1\big)\, f(u)\, f(v)\,\mathrm{d}u\,\mathrm{d}v. \tag{1.7}
$$

If f is non-negative (or non-positive), Equation (1.7) suggests that using a point process satisfying g_Y ≤ 1 makes the variance smaller than the first term, which turns out to be the variance in the Poisson case. It is worth noting that the use of a DPP for this task does not require any sign assumption on f. Indeed, given the form (1.4) of the pcf of a DPP and Mercer's decomposition (1.5), we obtain

$$
\mathrm{Var}[\hat\mu(f)] = \frac{1}{\rho_Y}\int_B f(u)^2\,\mathrm{d}u - \frac{1}{\rho_Y^2}\sum_{j,k\in\mathbb{N}} \lambda_j\lambda_k \Big|\int_B f(u)\,\phi_j(u)\,\overline{\phi_k(u)}\,\mathrm{d}u\Big|^2 \tag{1.8}
$$

and so the second term is always non-positive.
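A quick numerical illustration of (1.7) in dimension one: for any repulsive pcf (g ≤ 1), the correction term is non-positive, so the variance stays below the Poisson benchmark. The pcf below is an arbitrary toy choice of ours, not derived from a DPP.

```python
import numpy as np

rho = 50.0                                            # intensity of Y
f = lambda u: np.sin(2 * np.pi * u)
g = lambda u, v: 1.0 - np.exp(-(u - v) ** 2 / 0.01)   # toy repulsive pcf, g <= 1

grid = np.linspace(0.0, 1.0, 401)
h = grid[1] - grid[0]
poisson_term = np.trapz(f(grid) ** 2, grid) / rho     # first term of (1.7)
U, V = np.meshgrid(grid, grid)
second_term = np.sum((g(U, V) - 1.0) * f(U) * f(V)) * h * h

# g <= 1 makes the correction term non-positive: repulsion reduces the variance
assert second_term <= 0.0
assert poisson_term + second_term <= poisson_term
```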

The use of a general DPP does not seem to be of great interest. First, the number of points is random and, thinking of the expected number of points as playing the role of N, we claim that the rate of convergence remains the same as in the independent case.

Therefore it seems natural to focus on the subclass of projection DPPs, i.e. the class for which the number of points is almost surely constant. An approach using an ad-hoc Orthogonal Polynomial Ensemble with N points has been proposed in [BardenetHardy20]. This class is a particular inhomogeneous projection DPP. As already outlined in the introduction, under the assumption that the integrand is continuously differentiable and compactly supported on some bounded B, it is proved that a central limit theorem holds for the integral estimator, with variance decreasing as N^{-1-1/d}. We propose a similar approach, based on a realization of an (N,d)-Dirichlet DPP. The (N,d)-Dirichlet DPP, detailed in the next section, is a projection DPP based on the Fourier basis. Unlike the model proposed by [BardenetHardy20], this DPP has the advantage of being homogeneous, and its projections X_I for I ⊆ {1,…,d} are fully characterized (see Section 2.2). Finally, an advantage of our approach is that we do not require the integrand to be continuously differentiable. We only assume that it belongs to some Sobolev space with low regularity parameter, see Section 3 and in particular Theorem 3.1 for more details.

## 2 The (N,d)-Dirichlet DPP and its projections

### 2.1 The (N,d)-Dirichlet DPP

Let us consider the Fourier basis of L²([0,1]^d), defined for any j ∈ Z^d and x ∈ [0,1]^d by φ_j^(d)(x) = e^{2iπ j^⊤ x}. Given a vector of d positive integers (n_1,…,n_d), we construct the following kernel

$$
K(x,y) = \sum_{j\in E_N} \phi^{(d)}_j(x)\,\overline{\phi^{(d)}_j(y)} = \sum_{j\in E_N} e^{2i\pi j^{\top}(x-y)} \tag{2.1}
$$

where

$$
E_N = E_1\times\cdots\times E_d \quad\text{and, for } i=1,\dots,d,\quad E_i = \{0,1,\dots,n_i-1\}. \tag{2.2}
$$

Thus, E_N is the rectangular subset of Z^d with cardinality N = n_1 ⋯ n_d which identifies the eigenvalues that are all equal to 1. Due to the translation invariance of the Fourier basis, the kernel (2.1) is a homogeneous kernel. This construction implies that K(x − y) factorizes as a product of one-dimensional stationary kernels K_i, defined for any t ∈ [0,1] by K_i(t) = Σ_{j=0}^{n_i−1} e^{2iπ j t}. We point out that defining each E_i as a block of n_i successive frequencies (e.g. frequencies centered around 0) would lead to the same DPP [Houghetal09, Remark 4, p.48]. In particular, if n_i is odd, we could consider E_i = {−(n_i−1)/2, …, (n_i−1)/2}, which leads to the standard Dirichlet kernel, see e.g. [Zygmund02]. This justifies the name Dirichlet DPP for this model.

Such a DPP, which produces almost surely N points in [0,1]^d, will be referred to as an (N,d)-Dirichlet DPP. One could wonder why we impose E_N to be a rectangular subset of Z^d instead of, for instance, the graded lexicographic order used in [BardenetHardy20]. As seen in the next section, the rectangular nature of E_N allows us to characterize the distribution of X_I for any I ⊆ {1,…,d}.
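For concreteness, the kernel (2.1) can be evaluated either as a direct sum over E_N or, coordinate by coordinate, as a geometric sum in closed form. The sketch below (function names are ours) checks that both agree and that K(x,x) = N, the number of points of the DPP.

```python
import numpy as np
from itertools import product

def dirichlet_kernel(x, y, n):
    """Direct evaluation of (2.1): sum over j in E_N = prod_i {0,...,n_i-1}."""
    t = np.asarray(x) - np.asarray(y)
    return sum(np.exp(2j * np.pi * np.dot(j_idx, t))
               for j_idx in product(*[range(ni) for ni in n]))

def dirichlet_kernel_closed(x, y, n):
    """Per-coordinate geometric sum: e^{i*pi*(n-1)t} * sin(pi*n*t)/sin(pi*t)."""
    t = np.asarray(x) - np.asarray(y)
    out = 1.0 + 0.0j
    for ni, ti in zip(n, t):
        if np.isclose(ti % 1.0, 0.0):
            out *= ni
        else:
            out *= np.exp(1j * np.pi * (ni - 1) * ti) * \
                   np.sin(np.pi * ni * ti) / np.sin(np.pi * ti)
    return out

x, y, n = [0.3, 0.7], [0.1, 0.3], [3, 4]
assert np.isclose(dirichlet_kernel(x, y, n), dirichlet_kernel_closed(x, y, n))
assert np.isclose(dirichlet_kernel(x, x, n).real, np.prod(n))  # K(x,x) = N
```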

Let us add that, when d = 1, the kernel corresponds to the Fourier approximation [Lavancieretal15_1] of the one-dimensional sine-kernel, which takes its origins in the joint distribution of the eigenvalues (called the Weyl measure) of a unitary matrix. Asymptotic results involving one-dimensional linear functionals of the sine-kernel appear in several papers, see e.g. [soshnikov:00]. The present paper therefore provides an extension to the d-dimensional case and a more thorough treatment of the statistical application to Monte Carlo integration.

### 2.2 Projections of an (N,d)-Dirichlet kernel

An (N,d)-Dirichlet kernel (2.1) can be written as the product of one-dimensional kernels. More precisely, for any I ⊂ {1,…,d}, by denoting N_I = Π_{i∈I} n_i and N_{I^c} = Π_{i∉I} n_i (so that N = N_I N_{I^c}), the (N,d)-Dirichlet kernel can always be written as

$$
K(x-y) = K_I(x_I-y_I)\,K_{I^c}(x_{I^c}-y_{I^c}) \tag{2.3}
$$

where K_I (resp. K_{I^c}) is the (N_I, ι)-Dirichlet kernel (resp. the (N_{I^c}, d−ι)-Dirichlet kernel). Projected point processes from models with kernels satisfying (2.3) have been studied and characterized in [Mazoyeretal19]. In particular, for an (N,d)-Dirichlet DPP we have the following result.

###### Proposition 2.1.

Let X be an (N,d)-Dirichlet DPP on [0,1]^d and let I ⊂ {1,…,d}. Then X_I is an α-DPP on [0,1]^ι with kernel N_{I^c} K_I and α = −1/N_{I^c}. In particular, X_I has (obviously) N points and kth order intensity

$$
\rho_{X_I}^{(k)}(x^{(1)},\dots,x^{(k)}) = {\det}_{-1/N_{I^c}}\big[N_{I^c}\, K_I(x^{(i)},x^{(j)})\big]_{i,j=1}^{k}
$$

for any pairwise distinct x^(1),…,x^(k) ∈ [0,1]^ι. Its pcf is therefore given, for any pairwise distinct x, y ∈ [0,1]^ι, by

$$
g_{X_I}(x,y) = 1 - \frac{|K_I(x,y)|^2}{N\,N_I}. \tag{2.4}
$$

In the above result, the notation det_α stands for the α-determinant, see e.g. [ShiraiTakahashi03] for details on such quantities and for general properties of α-DPPs. Proposition 2.1 therefore provides a full characterization of the distribution of X_I. Its main consequence (and interest for the present paper), as seen from (2.4), is that the pcf of X_I is bounded by 1: for any I ⊆ {1,…,d}, X_I remains in the class of repulsive point patterns.
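As a sanity check, (2.4) can be verified numerically in a simple case. In the sketch below (illustrative sizes, our own code), K_I is a one-dimensional Dirichlet kernel; since |K_I| ≤ N_I, the pcf stays within [1 − N_I/N, 1].

```python
import numpy as np

n_I, n_Ic = 8, 16                   # N_I = n_I, N_{I^c} = n_Ic, N = N_I * N_{I^c}
N = n_I * n_Ic
t = np.linspace(0.001, 0.999, 500)  # lags x - y, avoiding the degenerate lag 0
K_I = np.array([np.sum(np.exp(2j * np.pi * np.arange(n_I) * ti)) for ti in t])
g = 1.0 - np.abs(K_I) ** 2 / (N * n_I)

# |K_I| <= N_I, hence 1 - N_I/N <= g <= 1: X_I is repulsive, but less so than X
assert np.all(g <= 1.0)
assert np.all(g >= 1.0 - n_I / N - 1e-9)
```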

## 3 Numerical integration with Dirichlet kernel

### 3.1 Objective

In this section, we study the use of specific DPPs for Monte Carlo integration. To this end, we use the notation introduced in Section 1.2. Our objective is to estimate any ι-dimensional integral, for ι = 1,…,d, using a Monte Carlo approach and using the same quadrature points. More precisely, let I ⊆ {1,…,d} with cardinality ι, and let f_I be an integrable measurable function on B_I = [0,1]^ι. More assumptions on f_I will be given later. We intend to estimate

$$
\mu(f_I) = \int_{B_I} f_I(u)\,\mathrm{d}u
$$

using the projection onto B_I of an (N,d)-Dirichlet DPP on [0,1]^d. In particular, we estimate μ(f_I) by

$$
\hat\mu_N(f_I) = \frac1N \sum_{u\in X_I} f_I(u) = \frac1N \sum_{j=1}^{N} f_I\big((u_j)_I\big) \tag{3.1}
$$

where X = {u_1,…,u_N} is an (N,d)-Dirichlet DPP on [0,1]^d and where we recall that (u_j)_I denotes the projection of u_j onto B_I. In the following, we study asymptotic properties of ˆμ_N(f_I).
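In code, (3.1) is a plain average of f_I over the projected nodes. The sketch below (our own helper; uniform nodes are only a placeholder for an actual Dirichlet DPP sample) illustrates the mechanics.

```python
import numpy as np

def mu_hat(points, f_I, I):
    """Estimator (3.1): average f_I over the coordinates indexed by I (0-based)."""
    return np.mean([f_I(u[list(I)]) for u in points])

rng = np.random.default_rng(0)
pts = rng.random((1000, 3))                      # placeholder nodes on [0,1]^3
est = mu_hat(pts, lambda u: u.sum(), I=(0, 1))   # mu(f_I) = E[u_1 + u_2] = 1
assert abs(est - 1.0) < 0.1
```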

In this paper, we have chosen to focus on integrals on [0,1]^ι for simplicity. The approach can straightforwardly be extended to rectangles. Indeed, let a = (a_i)_{i∈I} and b = (b_i)_{i∈I} be such that a_i < b_i for any i ∈ I, and let B̃_I = Π_{i∈I} [a_i, b_i]. An estimate of μ(f_I), where now f_I : B̃_I → R, is simply given by

$$
\hat\mu_N(f_I) = \prod_{i\in I}(b_i-a_i)\,\Big\{\frac1N\sum_{j=1}^{N} f_I\big(a_I+(b_I-a_I)\odot (u_j)_I\big)\Big\} \tag{3.2}
$$
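The rescaling in (3.2) is a componentwise change of variables with Jacobian Π(b_i − a_i). A minimal sketch (uniform placeholder nodes, arbitrary rectangle and integrand of our choosing):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = np.array([0.0, 1.0]), np.array([2.0, 3.0])   # rectangle [0,2] x [1,3]
f_I = lambda x: x[0]                                # integral over the rectangle: 4
u = rng.random((2000, 2))                           # placeholder nodes on [0,1]^2
est = np.prod(b - a) * np.mean([f_I(a + (b - a) * ui) for ui in u])
assert abs(est - 4.0) < 0.2
```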

We introduce two additional fundamental pieces of notation, induced by the choice of the Fourier basis and used in our results. Let j ∈ Z^ι. The notation ˆf_I(j) stands for the jth Fourier coefficient of f_I, i.e.

$$
\hat f_I(j) = \int_{[0,1]^\iota} f_I(u)\, e^{-2i\pi j^{\top} u}\,\mathrm{d}u. \tag{3.3}
$$

Finally, we define the space H^s(B_I) as the isotropic (with respect to the sup norm) Sobolev space with index s > 0 of square integrable periodic functions, given by

$$
H^s(B_I) = \Big\{\, f_I \in L^2_{\mathrm{per}}(B_I) : \sum_{j\in\mathbb{Z}^\iota} (1+\|j\|_\infty)^{2s}\, |\hat f_I(j)|^2 < \infty \,\Big\} \tag{3.4}
$$

where L²_per(B_I) is the set of square integrable periodic functions. By periodic, we mean that all one-dimensional marginals of f_I, say g : [0,1] → R, are such that g(0) = g(1). To recall some known facts about H^s(B_I), let us define the following spaces

$$
C^1_{\mathrm{per}}(B_I) = \{\, f_I \ \text{periodic},\ f_I \in C^1(B_I) \,\}, \qquad \tilde H^{3/2+\varepsilon}(B_I) = \{\, f_I \in L^2_{\mathrm{per}}(B_I) : \hat f_I(j) = O(\|j\|_\infty^{-2-\varepsilon}) \,\}, \quad \varepsilon > 0.
$$

Of course, H̃^{3/2+ε}(B_I) and H^{3/2+ε}(B_I) are very similar. Now, we recall that for any ε > 0

$$
\tilde H^{3/2+\varepsilon}(B_I) \subset C^1_{\mathrm{per}}(B_I) \subset H^1(B_I). \tag{3.5}
$$

Our main results, expressed by Theorem 3.1 and Corollary 3.1, have the interest of focusing on functions with small regularity parameter s, thus, according to (3.5), potentially on non-differentiable functions. In particular, we show that a CLT can be expected as soon as s > 1/2.

Finally, let us justify why we focus on periodic functions on B_I. For a non-periodic function which is at least twice continuously differentiable, it is a known fact that its Fourier coefficients decay only as ˆf_I(j) = O(‖j‖_∞^{-1}). So, the summability condition in (3.4) for such a smooth function would be fulfilled only for s < 1/2, which constitutes a situation of no interest.
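This decay is easy to observe numerically. For the smooth but non-periodic f(t) = t on [0,1], integration by parts gives ˆf(j) = i/(2πj) for j ≠ 0, so |ˆf(j)| decays exactly like 1/|j|:

```python
import numpy as np

# Fourier coefficients of f(t) = t on [0,1]: |fhat(j)| = 1/(2*pi*|j|) for j != 0,
# despite f being infinitely differentiable (it is not periodic).
t = np.linspace(0.0, 1.0, 200001)
for k in (1, 5, 20):
    fhat = np.trapz(t * np.exp(-2j * np.pi * k * t), t)
    assert np.isclose(abs(fhat), 1.0 / (2 * np.pi * k), rtol=1e-3)
```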

### 3.2 Case ι=d

We first consider the d-dimensional case, i.e. I = {1,…,d}, and simply write f for the integrand. The first result shows how the variance given in (1.8) relates to the Fourier coefficients of f in the case of an (N,d)-Dirichlet DPP.

###### Proposition 3.1.

Let X be an (N,d)-Dirichlet DPP and f ∈ L²([0,1]^d). For any j ∈ Z^d, we let ˆf(j) denote the jth Fourier coefficient of f. Then, ˆμ_N(f) given by (3.1) is an unbiased estimator of μ(f) with variance given by

$$
\mathrm{Var}[\hat\mu_N(f)] = \frac1N \sum_{j\in\mathbb{Z}^d} |\hat f(j)|^2 - \frac1{N^2}\sum_{j,k\in E_N} |\hat f(j-k)|^2 \tag{3.6}
$$
$$
\phantom{\mathrm{Var}[\hat\mu_N(f)]} = \frac1N \sum_{j\in\mathbb{Z}^d} |\hat f(j)|^2 - \frac1{N^2}\sum_{j\in F_N} \Big[\prod_{i=1}^d (n_i-|j_i|)\Big]\, |\hat f(j)|^2 \tag{3.7}
$$

where F_N = {j ∈ Z^d : |j_i| ≤ n_i − 1, i = 1,…,d}.
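The passage from (3.6) to (3.7) is purely combinatorial: each m ∈ F_N arises as a difference j − k with j, k ∈ E_N exactly Π_i (n_i − |m_i|) times. This can be checked by brute force:

```python
import numpy as np
from itertools import product
from collections import Counter

n = (3, 4)                                           # E_N = {0,1,2} x {0,1,2,3}
E = list(product(*[range(ni) for ni in n]))
counts = Counter(tuple(np.subtract(j, k)) for j in E for k in E)

# Each difference m in F_N occurs prod_i (n_i - |m_i|) times, which turns the
# double sum over E_N in (3.6) into the single weighted sum over F_N in (3.7).
for m, c in counts.items():
    assert c == np.prod([ni - abs(mi) for ni, mi in zip(n, m)])
```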

This simple form of the variance of ˆμ_N(f) invites us to study its asymptotic behavior as N → ∞. Given the fact that N = n_1 ⋯ n_d, we require a specific asymptotic framework. For i = 1,…,d, we assume that n_i = n_{i,ν} and N = N_ν are integer sequences indexed by some ν ∈ N. We assume that each sequence tends to infinity as ν → ∞ and that there exist κ_i > 0, for i = 1,…,d, such that

$$
\lim_{\nu\to\infty} n_{i,\nu}\, N_\nu^{-1/d} = \kappa_i. \tag{3.8}
$$

For the sake of conciseness, we skip the dependence in ν. Similarly, when we write N → ∞, we implicitly assume (3.8).

We can now obtain the following asymptotic behavior of the variance of ˆμ_N(f) and, for some values of s, a central limit theorem. Regarding this asymptotic normality, when d = 1, as mentioned in Section 2, the Dirichlet kernel corresponds, up to a normalization, to the joint distribution of the eigenvalues of a unitary matrix. Linear statistics for such a DPP have been deeply studied in the literature, see e.g. [soshnikov:00] and the references therein. When d ≥ 2, Theorem 3.1 (iii) is therefore original, and relies upon [soshnikov:02, Theorem 1].

###### Theorem 3.1.

Consider the asymptotic framework (3.8) and assume that f ∈ L²_per([0,1]^d). Then, we have the following statements.

(i) If f ∈ H^s([0,1]^d) for some 0 < s ≤ 1/2, then as N → ∞

$$
\mathrm{Var}[\hat\mu_N(f)] = O\big(N^{-1-2s/d}\big).
$$

(ii) If f ∈ H^s([0,1]^d) for some s > 1/2, then

$$
\lim_{N\to\infty} N^{1+1/d}\,\mathrm{Var}[\hat\mu_N(f)] = \sigma^2(f) = \sum_{j\in\mathbb{Z}^d} \Big(\sum_{i=1}^d \frac{|j_i|}{\kappa_i}\Big)\, |\hat f(j)|^2 \tag{3.9}
$$

where ˆf(j) is given by (3.3).

(iii) If, in addition, f is bounded, then as N → ∞,

$$
\sqrt{N^{1+1/d}}\,\big(\hat\mu_N(f)-\mu(f)\big) \to \mathcal{N}\big(0,\sigma^2(f)\big) \tag{3.10}
$$

in distribution.

Let us rephrase Theorem 3.1 (i): if f ∈ H^s([0,1]^d) for some s > 0, then necessarily Var[ˆμ_N(f)] = o(N^{-1}). That is, the variance of the proposed estimator decreases to 0 faster than that of the standard Monte Carlo estimator as soon as f ∈ H^s([0,1]^d) for some s > 0, which is a very weak assumption.

Theorem 3.1 (ii) shows that the variance can be consistently estimated by ˆσ²_N(f), given by

$$
\hat\sigma_N^2(f) = N^{1/d} \sum_{j\in F_N} \Big(\sum_{i=1}^d \frac{|j_i|}{n_i}\Big)\, |\hat f(j)|^2 \tag{3.11}
$$

which has the interest of avoiding the constants κ_i. Hence, using Slutsky's lemma, we also have the more practical central limit theorem

$$
\sqrt{N^{1+1/d}}\;\frac{\hat\mu_N(f)-\mu(f)}{\hat\sigma_N(f)} \to \mathcal{N}(0,1) \tag{3.12}
$$

in distribution, as N → ∞.
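As an illustration, ˆσ²_N(f) in (3.11) can be computed directly from Fourier coefficients. The sketch below (function name is ours) uses f(u) = √2 cos(2πu) in d = 1, for which ˆf(±1) = √2/2 and, taking κ_1 = 1, σ²(f) = Σ_j |j| |ˆf(j)|² = 1; the estimate matches that value exactly here.

```python
import numpy as np
from itertools import product

def sigma2_hat(fhat, n):
    """Estimate (3.11): N^{1/d} * sum_{j in F_N} (sum_i |j_i|/n_i) |fhat(j)|^2."""
    N, d = int(np.prod(n)), len(n)
    total = 0.0
    for j in product(*[range(-(ni - 1), ni) for ni in n]):   # j in F_N
        total += sum(abs(ji) / ni for ji, ni in zip(j, n)) * abs(fhat(j)) ** 2
    return N ** (1.0 / d) * total

# f(u) = sqrt(2)*cos(2*pi*u): fhat(1) = fhat(-1) = sqrt(2)/2, all others 0
fhat = lambda j: np.sqrt(2) / 2 if abs(j[0]) == 1 else 0.0
assert np.isclose(sigma2_hat(fhat, n=(64,)), 1.0)
```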

### 3.3 Case I ⊂ {1,…,d}

We now consider the situation where we estimate μ(f_I) (on B_I = [0,1]^ι) based on X_I, the projection onto B_I of an (N,d)-Dirichlet DPP on [0,1]^d. In this section, we naturally assume that ι < d. The interest of the construction of our model is revealed by Corollary 3.1 which, briefly, states that Theorem 3.1 can be applied to functions of the form f_I, I ⊂ {1,…,d}.

###### Corollary 3.1.

Let ι < d and I ⊂ {1,…,d} with cardinality ι. Consider the asymptotic framework (3.8) and assume that f_I ∈ L²_per(B_I); then the following statements hold.

(i) If f_I ∈ H^s(B_I) for some 0 < s ≤ 1/2, then as N → ∞

$$
\mathrm{Var}[\hat\mu_N(f_I)] = O\big(N^{-1-2s/d}\big).
$$

(ii) If f_I ∈ H^s(B_I) for some s > 1/2, then

$$
\lim_{N\to\infty} N^{1+1/d}\,\mathrm{Var}[\hat\mu_N(f_I)] = \sigma^2(f_I) \tag{3.13}
$$

where

$$
\sigma^2(f_I) = \sum_{j\in\mathbb{Z}^\iota} \Big(\sum_{i\in I} \frac{|j_i|}{\kappa_i}\Big)\, |\hat f_I(j)|^2 \tag{3.14}
$$

and where ˆf_I(j) is given by (3.3).

(iii) If, in addition, f_I is bounded, then as N → ∞

$$
\sqrt{N^{1+1/d}}\,\big(\hat\mu_N(f_I)-\mu(f_I)\big) \to \mathcal{N}\big(0,\sigma^2(f_I)\big) \tag{3.15}
$$

in distribution.

The asymptotic constant σ²(f_I) can still be consistently estimated by (3.11). The proof of this result is a straightforward consequence of Theorem 3.1. Another approach, using the fact that X_I is distributed as an α-DPP, is proposed in Appendix B.

### 3.4 Multivariate central limit theorem

We can combine Theorem 3.1 and Corollary 3.1 to obtain a multivariate version of the central limit theorem.

###### Corollary 3.2.

Let p ≥ 1. For any ℓ = 1,…,p, let I_ℓ ⊆ {1,…,d} with ι_ℓ = |I_ℓ|, and assume that f_{I_ℓ} is bounded and belongs to H^s(B_{I_ℓ}) for some s > 1/2. Let ˆμ_{N,p} = (ˆμ_N(f_{I_1}),…,ˆμ_N(f_{I_p}))^⊤ and μ_p = (μ(f_{I_1}),…,μ(f_{I_p}))^⊤. Then, under the asymptotic framework (3.8), ˆμ_{N,p} is an unbiased estimator of μ_p and, as N → ∞,

$$
\sqrt{N^{1+1/d}}\,\big(\hat\mu_{N,p}-\mu_p\big) \to \mathcal{N}(0,\Sigma_p)
$$

in distribution, where Σ_p is the p × p Hermitian matrix with entries

$$
(\Sigma_p)_{\ell\ell'} = \sum_{j\in\mathbb{Z}^d} \Big(\sum_{i=1}^d \frac{|j_i|}{\kappa_i}\Big)\, \hat f^{\uparrow}_{I_\ell}(j)\, \overline{\hat f^{\uparrow}_{I_{\ell'}}(j)}
$$

where f^↑_{I_ℓ} denotes the natural extension of f_{I_ℓ} to [0,1]^d, i.e. f^↑_{I_ℓ}(x) = f_{I_ℓ}(x_{I_ℓ}) for x ∈ [0,1]^d, ℓ = 1,…,p.

## 4 Simulation study

We now propose a simulation study to illustrate the results. We first consider the setting of Theorem 3.1 (ii)-(iii) and Corollary 3.1 (ii)-(iii), i.e. the case s > 1/2 (when ι = d), and second the situation ι < d.

### 4.1 Case s>1/2, illustration of Theorem 3.1 (ii)-(iii)

We consider three different functions with different regularity properties:

• Bump function

$$
f_{\mathrm{bump}}(x) = \prod_{i=1}^d \frac{\varphi_{\mathrm{bump}}(x_i)}{\int_0^1 \varphi_{\mathrm{bump}}(t)\,\mathrm{d}t}, \qquad \varphi_{\mathrm{bump}}(t) = \exp\Big(\frac{-0.4}{1/4-(t-1/2)^2}\Big). \tag{4.1}
$$
• Mixture of cosines

$$
f_{\mathrm{mixcos}}(x) = \frac1d \sum_{i=1}^d \frac{\varphi_{\mathrm{mc}}(x_i)}{\int_0^1 \varphi_{\mathrm{mc}}(t)\,\mathrm{d}t}, \qquad \varphi_{\mathrm{mc}}(t) = 0.1\,|\cos(5\pi(t-1/2))| + (t-1/2)^2. \tag{4.2}
$$
• Normalized γ-norm: let γ > 0.

$$
f_{\gamma}(x) = \frac1d \sum_{i=1}^d \frac{\varphi_{\gamma}(x_i)}{\int_0^1 \varphi_{\gamma}(t)\,\mathrm{d}t}, \qquad \varphi_{\gamma}(t) = |t-1/2|^{\gamma}. \tag{4.3}
$$

It is worth mentioning that f_bump is infinitely continuously differentiable, so f_bump ∈ H^s for any s > 0. The function f_mixcos is non-differentiable (with singularity points); its Fourier coefficients satisfy ˆf_mixcos(j) = O(‖j‖_∞^{-2}), so f_mixcos ∈ H^s for any s < 3/2. Finally, f_γ is also a non-differentiable square integrable periodic function, whose Fourier coefficients satisfy ˆf_γ(j) = O(‖j‖_∞^{-γ-1}). Hence,
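The three test integrands can be coded directly. In the sketch below, the bump denominator is reconstructed under the assumption that it reads 1/4 − (t − 1/2)², so that φ_bump vanishes smoothly at t = 0 and t = 1 (consistent with f_bump being infinitely differentiable and periodic); the value γ = 0.6 is an arbitrary illustrative choice.

```python
import numpy as np

def phi_bump(t):
    # Assumed form: exp(-0.4 / (1/4 - (t - 1/2)^2)) on (0,1), 0 at the endpoints
    den = 0.25 - (t - 0.5) ** 2
    safe = np.where(den > 0, den, 1.0)
    return np.where(den > 0, np.exp(-0.4 / safe), 0.0)

phi_mc = lambda t: 0.1 * np.abs(np.cos(5 * np.pi * (t - 0.5))) + (t - 0.5) ** 2
phi_gamma = lambda t, g=0.6: np.abs(t - 0.5) ** g     # gamma = 0.6, illustrative

t = np.linspace(0.0, 1.0, 20001)

def f_mixcos(x):                   # mixture form (4.2), normalized so mu(f) = 1
    c = np.trapz(phi_mc(t), t)
    return np.mean(phi_mc(np.asarray(x)) / c)

vals = [f_mixcos([u]) for u in t]
assert np.isclose(np.trapz(vals, t), 1.0, atol=1e-3)
```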