Sampling Sup-Normalized Spectral Functions for Brown-Resnick Processes

Sup-normalized spectral functions form building blocks of max-stable and Pareto processes and therefore play an important role in modeling spatial extremes. For one of the most popular examples, the Brown-Resnick process, simulation is not straightforward. In this paper, we generalize two approaches for simulation via Markov Chain Monte Carlo methods and rejection sampling by introducing new classes of proposal densities. In both cases, we provide an optimal choice of the proposal density with respect to sampling efficiency. The performance of the procedures is demonstrated in an example.

10/20/2019


1 Introduction

Spatial and spatio-temporal extreme value analysis aims at investigating extremes of quantities described by stochastic processes. In the classical setting, the real-valued process of interest, X = {X(t)}_{t∈K}, is sample-continuous on a compact domain K ⊂ ℝ^d. The analysis of its extremes is often based on results on the limiting behavior of maxima of independent copies X_1, X_2, … of X. Provided that there exist continuous normalizing functions a_n > 0 and b_n such that the process of normalized maxima max_{i=1,…,n} (X_i − b_n)/a_n converges in distribution to some sample-continuous process Z with nondegenerate margins as n → ∞, the limit process Z is necessarily max-stable and we say that X is in the max-domain of attraction of Z.

From univariate extreme value theory, it follows that the marginal distributions of Z are necessarily generalized extreme value (GEV) distributions, cf. [4] for instance. As max-stability is preserved under marginal transformations between different GEV distributions, it can be assumed without loss of generality that Z has standard Fréchet margins, i.e. P(Z(t) ≤ z) = exp(−1/z), z > 0, for all t ∈ K. By de Haan [3], any sample-continuous max-stable process Z with standard Fréchet margins can be represented as

Z(t) \overset{d}{=} \max_{i \in \mathbb{N}} \{ U_i \cdot V_i(t) \}, \quad t \in K, \qquad (1)

where the so-called spectral processes V_i, i ∈ ℕ, are independent copies of a nonnegative sample-continuous stochastic process V on K satisfying E{V(t)} = 1 for all t ∈ K, and {U_i}_{i∈ℕ} is a Poisson point process on (0, ∞) which is independent of the V_i and has intensity measure Λ given by Λ((u, ∞)) = u^{−1} for all u > 0.

Due to its complex structure, many characteristics of the max-stable process Z in (1) cannot be calculated analytically, but need to be assessed via simulations. In order to simulate Z efficiently, Oesting et al. [15] suggest making use of the sup-normalized spectral representation

Z(t) \overset{d}{=} \max_{i \in \mathbb{N}} \left\{ U_i \cdot c_\infty \cdot \frac{V^{\max}_i(t)}{\|V^{\max}_i\|_\infty} \right\}, \quad t \in K, \qquad (2)

where the U_i are the same as above and the processes V^{max}_i are independently and identically distributed, independently of the U_i, with distribution given by

\mathbb{P}(V^{\max} \in B) = c_\infty^{-1} \cdot \int_B \|v\|_\infty \, \mathbb{P}(V \in \mathrm{d}v), \quad B \in \mathcal{C}(K), \qquad (3)

and c_∞ = E‖V‖_∞, where C(K) denotes the set of all real-valued continuous functions on K equipped with the supremum norm ‖f‖_∞ = sup_{t∈K} |f(t)| and the corresponding Borel σ-algebra. The normalizing constant c_∞ is the so-called extremal coefficient of the max-stable process Z over the domain K. In a simulation study, Oesting et al. [15] demonstrate that simulation based on the sup-normalized spectral representation is competitive with other state-of-the-art algorithms such as simulation based on extremal functions [7], provided that the normalized spectral process V^{max}/‖V^{max}‖_∞ can be simulated efficiently.

The law of the processes V^{max}/‖V^{max}‖_∞ also occurs when analyzing the extremes of a stochastic process in an alternative way, focusing on exceedances over a high threshold: if X is in the max-domain of attraction of the max-stable process Z in (1), we have

\mathcal{L}\left( x^{-1} X(\cdot) \,\middle|\, \|X\|_\infty > x \right) \overset{w}{\longrightarrow} \mathcal{L}\left( P \cdot \frac{V^{\max}(\cdot)}{\|V^{\max}\|_\infty} \right)

as x → ∞, where P is a standard Pareto random variable and V^{max} is an independent process with law given in (3). The limit process is called a Pareto process, cf. [9], [8].

Arising as the sup-normalized spectral process for both max-stable and Pareto processes, the process V^{max}/‖V^{max}‖_∞ plays an important role in modelling and analyzing spatial extremes. As a crucial building block of spatial and spatio-temporal models, this process needs to be simulated in an efficient way. Due to the measure transformation in (3), however, sampling of V^{max} is not straightforward even in cases where the underlying spectral process V can be simulated easily.

In the present paper, we focus on the simulation of V^{max} for the very popular class of log-Gaussian spectral processes, i.e. V(t) = exp(W(t)) for some Gaussian process W such that E{exp(W(t))} = 1 for all t ∈ K. The resulting subclass of max-stable processes in (1) comprises the only possible nontrivial limits of normalized maxima of rescaled Gaussian processes, the class of Brown–Resnick processes [13], [12]. In order to obtain Brown–Resnick processes that can be extended to stationary processes on ℝ^d, [13] consider W(t) = G(t) − Var(G(t))/2, t ∈ ℝ^d, with G being a centered Gaussian process on ℝ^d with stationary increments and variogram

\gamma(h) = \frac{1}{2} \mathbb{E}\left\{ (G(t+h) - G(t))^2 \right\}, \quad t, h \in \mathbb{R}^d.

It is important to note that the law of the resulting max-stable process Z does not depend on the variance of G, but only on the variogram γ. Therefore, Z is called the Brown–Resnick process associated to the variogram γ.

Recently, Ho and Dombry [11] introduced a two-step procedure to simulate the corresponding sup-normalized process

\frac{V^{\max}(\cdot)}{\|V^{\max}\|_\infty} = \exp\left( W^{\max}(\cdot) - \|W^{\max}\|_\infty \right)

efficiently if the finite domain K = {t_1, …, t_N} is of small or moderate size:

1. Sample the index i^{max} of the component where the vector (W^{max}(t_1), …, W^{max}(t_N)) assumes its maximum, i.e. select one of the events S_i = {W^{max}(t_i) = max_{j=1,…,N} W^{max}(t_j)}, i = 1, …, N. Provided that the covariance matrix C of the Gaussian vector (W(t_1), …, W(t_N)) is nonsingular, this index is a.s. unique and the probabilities of the corresponding events can be calculated in terms of the matrix Q and the vector m given by

Q = C^{-1} - \frac{C^{-1} 1_N 1_N^\top C^{-1}}{1_N^\top C^{-1} 1_N} \quad \text{and} \quad m = -\left( \frac{1}{2}\sigma^2 + \frac{1 - \frac{1}{2}(\sigma^2)^\top C^{-1} 1_N}{1_N^\top C^{-1} 1_N} \, 1_N \right)^{\!\top} C^{-1},

where σ² is the variance vector of (W(t_1), …, W(t_N)) and 1_N = (1, …, 1)^⊤ ∈ ℝ^N. More precisely, by Ho and Dombry [11],

\mathbb{P}(V^{\max} \in S_i) = \frac{\det(Q_{-i})^{-1/2} \exp\left\{ \frac{1}{2} m_{-i}^\top Q_{-i}^{-1} m_{-i} \right\} \Phi_{N-1}\left( 0_{N-1}; Q_{-i}^{-1} m_{-i}, Q_{-i}^{-1} \right)}{\sum_{j=1}^N \det(Q_{-j})^{-1/2} \exp\left\{ \frac{1}{2} m_{-j}^\top Q_{-j}^{-1} m_{-j} \right\} \Phi_{N-1}\left( 0_{N-1}; Q_{-j}^{-1} m_{-j}, Q_{-j}^{-1} \right)},

where m_{−i} denotes the vector m after removing the i-th component, Q_{−i} denotes the matrix Q after removing the i-th row and i-th column, and Φ_{N−1}(x; μ, Σ) is the distribution function of an (N−1)-dimensional Gaussian distribution with mean vector μ and covariance matrix Σ, evaluated at x.

2. Conditional on {i^{max} = i}, we have W^{max}(t_i) = ‖W^{max}‖_∞, and the distribution of the vector (W^{max}(t_j) − W^{max}(t_i))_{j≠i} is an (N−1)-dimensional Gaussian distribution with mean vector Q_{−i}^{−1} m_{−i} and covariance matrix Q_{−i}^{−1}, conditional on all components being nonpositive.

However, the first step includes computationally expensive operations such as the evaluation of N distinct (N−1)-dimensional Gaussian distribution functions and the inversion of matrices of sizes N×N and (N−1)×(N−1). Furthermore, an efficient implementation of the second step is not straightforward. Thus, the procedure is feasible for small or moderate N only.

In this paper, we introduce alternative procedures for the simulation of V^{max}, or, equivalently, W^{max} = log V^{max}, that are supposed to work for larger N as well. To this end, we will modify a Markov Chain Monte Carlo (MCMC) algorithm proposed by Oesting et al. [15] and a rejection sampling approach based on ideas of de Fondeville and Davison [2]. Both procedures have originally been designed to sample sup-normalized spectral functions in general. Here, we adapt them to the specific case of Brown–Resnick processes.

2 Simulating Wmax via MCMC algorithms

Based on the Brown–Resnick process as our main example, we consider a max-stable process with spectral process V = exp(W) for some sample-continuous process W. Henceforth, we will always assume that the simulation domain K = {t_1, …, t_N} is finite and that the corresponding log-spectral vector W = (W(t_1), …, W(t_N)) possesses a density f w.r.t. some measure on ℝ^N. Then, by (3), the transformed spectral vector W^{max} = log V^{max}, where the logarithm is applied componentwise, has the multivariate density

f_{\max}(w) = c_\infty^{-1} \max_{i=1,\dots,N} \exp(w_i) \, f(w), \quad w \in \mathbb{R}^N,

which obviously has the same support as the density f of W.

As direct sampling from the density f_max is rather sophisticated and the normalizing constant c_∞ is not readily available, it is quite appealing to choose an MCMC approach for simulation. In the present paper, we focus on Metropolis–Hastings algorithms with independence sampler, cf. [17] for example. Denoting the strictly positive proposal density on ℝ^N by f_prop, the algorithm is of the following form:

Algorithm 1.

MCMC APPROACH (METROPOLIS–HASTINGS)
Input: proposal density f_prop and number of iterations n_max
Simulate w^(0) according to the density f_prop.
for n = 1, …, n_max {
    Sample w* from f_prop and set w^(n) = w* with probability α(w^(n−1), w*) and w^(n) = w^(n−1) otherwise, where the acceptance probability α is given by (4).
}
Output: Markov chain w^(0), w^(1), …, w^(n_max).

Here, the acceptance ratio for a new proposal w given a current state w̃ is

\alpha(\tilde{w}, w) = \min\left\{ \frac{f_{\max}(w)/f_{\mathrm{prop}}(w)}{f_{\max}(\tilde{w})/f_{\mathrm{prop}}(\tilde{w})}, \, 1 \right\}, \qquad (4)

using the convention that a ratio is interpreted as 1 if both the numerator and the denominator are equal to 0. This choice of α ensures reversibility of the resulting Markov chain with respect to the distribution of W^{max}. Further, it allows for a direct transition from any state w̃ to any other state w. Consequently, the chain is irreducible and aperiodic and, thus, its distribution converges to the desired stationary distribution, that is, for a.e. initial state w^(0), we have that

\left\| P^n(w^{(0)}, \cdot) - \mathbb{P}(W^{\max} \in \cdot) \right\|_{\mathrm{TV}} \overset{n \to \infty}{\longrightarrow} 0, \qquad (5)

where P^n(w^(0), ·) denotes the distribution of the n-th state of a Markov chain with initial state w^(0) and ‖·‖_TV is the total variation norm.

As a general approach for the simulation of sup-normalized spectral processes of arbitrary max-stable processes, Oesting et al. [15] propose to use Algorithm 1 with the density f of the original log-spectral vector W as proposal density (Algorithm 1A). In this case, the Metropolis–Hastings acceptance ratio in (4) simplifies to

\alpha(\tilde{w}, w) = \min\left\{ \frac{\max_{i=1,\dots,N} e^{w_i}}{\max_{i=1,\dots,N} e^{\tilde{w}_i}}, \, 1 \right\}, \quad w, \tilde{w} \in \mathbb{R}^N. \qquad (6)

As the proposal density f is strictly positive on ℝ^N, convergence of the distribution of the Markov chain to the distribution of W^{max} as in (5) is ensured. If the support of f is unbounded, however, there is no uniform geometric rate of convergence of the chain in (5), as we have

\operatorname*{ess\,inf}_{w \in \mathbb{R}^N} \frac{f(w)}{f_{\max}(w)} = \operatorname*{ess\,inf}_{w \in \mathbb{R}^N} \left( c_\infty \cdot \min_{i=1,\dots,N} e^{-w_i} \right) = 0,

cf. [14]. In particular, this holds true for the case of a Brown–Resnick process, where W is a Gaussian vector.

Furthermore, due to the structure of the acceptance ratio in (6), the Markov chain may get stuck once a state with a large maximum is reached. This may lead to rather poor mixing properties of the chain. Even though independent realizations could still be obtained by starting new independent Markov chains, cf. [15], such behavior is undesirable with high-dimensional chains and potentially long burn-in periods in mind.
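To make the procedure concrete, the following minimal Python sketch implements the independence sampler of Algorithm 1A for a finite domain: proposals are draws of the Gaussian vector W with mean −diag(C)/2 (so that E exp(W(t_i)) = 1) and covariance C, and the acceptance ratio is (6). The function name `sample_wmax_mcmc` and the choice of initializing the chain from a proposal draw are our own; this is an illustrative sketch, not a reference implementation.

```python
import numpy as np

def sample_wmax_mcmc(C, n_iter, seed=0):
    """Independence sampler targeting f_max (Algorithm 1A sketch).

    Target: f_max(w) proportional to max_i e^{w_i} f(w), where f is the
    density of W ~ N(-diag(C)/2, C).  Proposal: f itself, so the
    acceptance ratio reduces to (6): max_i e^{w*_i} / max_i e^{w_i}.
    """
    rng = np.random.default_rng(seed)
    N = C.shape[0]
    mean = -np.diag(C) / 2.0
    L = np.linalg.cholesky(C)
    draw = lambda: mean + L @ rng.standard_normal(N)

    w = draw()                       # initial state from the proposal
    chain = np.empty((n_iter, N))
    for n in range(n_iter):
        w_prop = draw()
        # log of the acceptance ratio (6); working on log scale avoids overflow
        log_alpha = min(w_prop.max() - w.max(), 0.0)
        if np.log(rng.random()) < log_alpha:
            w = w_prop
        chain[n] = w
    return chain
```

The pathology discussed above is visible here: once `w.max()` is large, `log_alpha` is strongly negative for typical proposals and the chain rarely moves.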

While the algorithm in Oesting et al. [15] is designed to be applicable in a general framework, we will use a specific transformation to construct a Markov chain with stronger mixing and faster convergence to the target distribution. For many popular models such as Brown–Resnick processes, this transformation is easily applicable. More precisely, we consider the related densities f_i(w) = exp(w_i) f(w), i = 1, …, N, each of which integrates to 1 since E{exp(W(t_i))} = 1. These densities are closely related to the distributions that have been studied in Dombry et al. [7].

Hence, we propose to approach the target distribution with density by Algorithm 1 using a mixture

f_{\mathrm{prop}} = \sum_{i=1}^N p_i f_i \qquad (7)

as proposal density, where the weights p_i ≥ 0, i = 1, …, N, are such that Σ_{i=1}^N p_i = 1. The corresponding acceptance probability in (4) is then

\tilde{\alpha}(\tilde{w}, w) = \min\left\{ \frac{\max_{i=1,\dots,N} e^{w_i} / \sum_{i=1}^N p_i e^{w_i}}{\max_{i=1,\dots,N} e^{\tilde{w}_i} / \sum_{i=1}^N p_i e^{\tilde{w}_i}}, \, 1 \right\}. \qquad (8)

With the proposal density f_prop being strictly positive on ℝ^N, we can see that the distribution of the Markov chain again converges to its stationary distribution with density f_max. As we further have

\inf_{w \in \mathbb{R}^N} \frac{f_{\mathrm{prop}}(w)}{f_{\max}(w)} = \inf_{w \in \mathbb{R}^N} \frac{\sum_{i=1}^N p_i e^{w_i}}{c_\infty^{-1} \max_{i=1,\dots,N} e^{w_i}} = c_\infty \cdot \min_{i=1,\dots,N} p_i > 0, \qquad (9)

provided that p_i > 0 for all i = 1, …, N, the results of Mengersen and Tweedie [14] even ensure a uniform geometric rate of convergence for any starting value, in contrast to the case f_prop = f.

In order to obtain a chain with good mixing properties, we choose the weights p_1, …, p_N such that the acceptance rate in Algorithm 1 is high provided that the current state is (approximately) distributed according to the stationary distribution. To this end, we minimize the relative deviation between f_prop and f_max under f_max, i.e. we minimize

D(p_1, \dots, p_N) = \int_{\mathbb{R}^N} \left( \frac{f_{\mathrm{prop}}(w)}{f_{\max}(w)} - 1 \right)^{\!2} f_{\max}(w) \, \mathrm{d}w,

under the constraint Σ_{i=1}^N p_i = 1. Introducing a Lagrange multiplier λ, minimizing

D(p_1, \dots, p_N) = \mathbb{E}\left\{ \left( \frac{\sum_{i=1}^N p_i e^{W(t_i)}}{c_\infty^{-1} e^{\max_{j=1,\dots,N} W(t_j)}} - 1 \right)^{\!2} c_\infty^{-1} e^{\max_{j=1,\dots,N} W(t_j)} \right\} = c_\infty \sum_{i=1}^N \sum_{k=1}^N p_i p_k \, \mathbb{E}\left\{ e^{W(t_i) + W(t_k) - \max_{j=1,\dots,N} W(t_j)} \right\} - 1

results in solving the linear system

\begin{pmatrix} \Sigma & 1_N \\ 1_N^\top & 0 \end{pmatrix} \begin{pmatrix} p \\ \lambda \end{pmatrix} = \begin{pmatrix} 0_N \\ 1 \end{pmatrix}, \qquad (10)

where p = (p_1, …, p_N)^⊤ and Σ = (σ_ik)_{i,k=1,…,N} with

\sigma_{ik} = \mathbb{E}\left\{ e^{W(t_i) + W(t_k) - \max_{j=1,\dots,N} W(t_j)} \right\}. \qquad (11)

Provided that the matrix Σ is nonsingular, the solution of (10) is given by

p = \frac{\Sigma^{-1} 1_N}{1_N^\top \Sigma^{-1} 1_N}, \qquad (12)

cf. [1] for instance. This solution does not necessarily satisfy the additional restriction p_i ≥ 0 for all i = 1, …, N. In case that Σ is singular or the vector in (12) has at least one negative entry, the full optimization problem

\begin{aligned} \min \quad & p^\top \Sigma p \\ \text{s.t.} \quad & 1_N^\top p = 1 \qquad \text{(QP)} \\ & p_i \geq 0 \quad \forall i = 1, \dots, N, \end{aligned}

has to be solved. Using the Karush–Kuhn–Tucker optimality conditions, the quadratic program (QP) can be transformed into a linear program with additional (nonlinear) complementary slackness conditions, which can be solved by modified simplex methods. Alternatively, the problem (QP) can be solved by the dual method of Goldfarb and Idnani [10].
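As an illustration, the candidate solution (12) takes only a few lines of linear algebra. The helper below (`optimal_weights`, a hypothetical name of ours) returns (12) when it is feasible and signals that the full quadratic program (QP) must be solved otherwise.

```python
import numpy as np

def optimal_weights(Sigma):
    """Candidate solution (12) of the quadratic program (QP):
    p = Sigma^{-1} 1_N / (1_N' Sigma^{-1} 1_N).

    Valid whenever Sigma is nonsingular and the result is componentwise
    nonnegative; otherwise the constrained program (QP) has to be solved
    by a dedicated QP solver (e.g. the Goldfarb-Idnani dual method).
    """
    ones = np.ones(Sigma.shape[0])
    x = np.linalg.solve(Sigma, ones)     # Sigma^{-1} 1_N without explicit inverse
    p = x / (ones @ x)                   # normalize so that 1_N' p = 1
    if (p < 0).any():
        raise ValueError("negative entries: solve the full QP instead")
    return p
```

For a symmetric 2x2 matrix with equal diagonal entries, symmetry forces equal weights p = (1/2, 1/2), which the helper reproduces.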

In order to ensure a geometric rate of convergence of the distribution of the Markov chain, we might replace the condition p_i ≥ 0 for all i = 1, …, N in (QP) by p_i ≥ ε for some given ε > 0. Then, a geometric rate of convergence follows from (9) as described above.

In the case of Brown–Resnick processes, for simplicity, we consider the case that W possesses a full Lebesgue density. In particular, the covariance matrix C of W = (W(t_1), …, W(t_N)) is assumed to be nonsingular. Then, the target density is

f_{\max}(w) = c_\infty^{-1} \max_{i=1,\dots,N} \exp(w_i) f(w) = \frac{c_\infty^{-1} \max_{i=1,\dots,N} \exp(w_i)}{(2\pi)^{N/2} \det(C)^{1/2}} \exp\left\{ -\frac{1}{2} \left( w + \frac{\sigma^2}{2} \right)^{\!\top} C^{-1} \left( w + \frac{\sigma^2}{2} \right) \right\}, \quad w \in \mathbb{R}^N,

where σ² is again the variance vector of W. Now, the densities f_i which form the proposal density f_prop are just shifted Gaussian densities:

f_i(w) = \frac{\exp(w_i)}{(2\pi)^{N/2} \det(C)^{1/2}} \exp\left\{ -\frac{1}{2} \left( w + \frac{\sigma^2}{2} \right)^{\!\top} C^{-1} \left( w + \frac{\sigma^2}{2} \right) \right\} = \frac{1}{(2\pi)^{N/2} \det(C)^{1/2}} \exp\left\{ -\frac{1}{2} \left( w - C_{\cdot i} + \frac{\sigma^2}{2} \right)^{\!\top} C^{-1} \left( w - C_{\cdot i} + \frac{\sigma^2}{2} \right) \right\},

cf. Lemma 1 in the Supplementary Material of Dombry et al. [7], i.e. we have

\mathcal{L}(W^{(i)}) = \mathcal{L}(W + C_{\cdot i}), \qquad (13)

where the Gaussian vectors W and W^{(i)} possess the densities f and f_i, respectively. The calculation of the optimal weights is based on the expectation in (11), which typically cannot be calculated analytically, but needs to be assessed numerically via simulations. Such a numerical evaluation, however, is challenging, as the random variable exp(W(t_i) + W(t_k) − max_{j=1,…,N} W(t_j)) is unbounded. To circumvent these computational difficulties, we make use of the identity

\sigma_{ik} = \mathbb{E}\left\{ e^{W(t_i) + W(t_k) - \max_{j=1,\dots,N} W(t_j)} \right\} = \int_{\mathbb{R}^N} \frac{e^{w_i + w_k - \max_{j} w_j}}{(2\pi)^{N/2} \det(C)^{1/2}} \exp\left\{ -\frac{1}{2} \left( w + \frac{\sigma^2}{2} \right)^{\!\top} C^{-1} \left( w + \frac{\sigma^2}{2} \right) \right\} \mathrm{d}w = \int_{\mathbb{R}^N} \frac{e^{w_k - \max_{j} w_j}}{(2\pi)^{N/2} \det(C)^{1/2}} \exp\left\{ -\frac{1}{2} \left( w - C_{\cdot i} + \frac{\sigma^2}{2} \right)^{\!\top} C^{-1} \left( w - C_{\cdot i} + \frac{\sigma^2}{2} \right) \right\} \mathrm{d}w = \int_{\mathbb{R}^N} e^{w_k - \max_{j} w_j} f_i(w) \, \mathrm{d}w = \mathbb{E}\left\{ \exp\left( W^{(i)}(t_k) - \max_{j=1,\dots,N} W^{(i)}(t_j) \right) \right\}. \qquad (14)

This expression can be conveniently assessed numerically, as the random variable exp(W^{(i)}(t_k) − max_{j=1,…,N} W^{(i)}(t_j)) is bounded by 1.
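A minimal Monte Carlo sketch of this estimator follows, using the shift identity (13) so that the integrand stays in (0, 1]. The function name `sigma_matrix_mc` is ours, and the Gaussian mean −diag(C)/2 assumes the standardization E exp(W(t)) = 1.

```python
import numpy as np

def sigma_matrix_mc(C, n_sim=10000, seed=0):
    """Monte Carlo estimate of the matrix (sigma_ik) via (14).

    Draws W ~ N(-diag(C)/2, C) and uses the shift identity (13),
    W^{(i)} = W + C_{.i}, so that the Monte Carlo integrand
    exp(W^{(i)}(t_k) - max_j W^{(i)}(t_j)) is bounded by 1.
    """
    rng = np.random.default_rng(seed)
    N = C.shape[0]
    L = np.linalg.cholesky(C)
    # rows are i.i.d. samples of W
    W = -np.diag(C) / 2.0 + rng.standard_normal((n_sim, N)) @ L.T
    Sigma = np.empty((N, N))
    for i in range(N):
        Wi = W + C[:, i]                          # samples of W^{(i)}
        vals = np.exp(Wi - Wi.max(axis=1, keepdims=True))
        Sigma[i] = vals.mean(axis=0)              # row i holds sigma_{ik}
    return Sigma
```

Since the true matrix Σ is symmetric, the discrepancy between the estimate and its transpose gives a quick check on the Monte Carlo error.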

Note that both (13) and the final result in (14) still hold true if W does not possess a full Lebesgue density, but exactly one component of W is degenerate and the covariance matrix of the remaining components is nonsingular. This situation appears in several examples, such as G being a fractional Brownian motion, where G(0) = 0 a.s.

In summary, we propose the procedure below to simulate the normalized spectral vector W^{max} for the Brown–Resnick process (Algorithm 1B):

1. Calculate the weight vector p by solving the quadratic program (QP), where the entries of the matrix Σ are given by (14). Provided that all its components are nonnegative, the solution has the form (12).

2. Run Algorithm 1 with proposal density f_prop = Σ_{i=1}^N p_i f_i and acceptance probability given by (8). The output of the algorithm is a Markov chain whose stationary distribution is the distribution of W^{max}.

3 Exact Simulation via Rejection Sampling

In this section, we present an alternative procedure to generate samples of W^{max} with probability density f_max. In contrast to Section 2, where we generated a Markov chain consisting of dependent samples with the desired distribution as stationary distribution, here we aim to produce independent realizations from the exact target distribution. To this end, we make use of a rejection sampling approach, cf. [5] for instance, based on a proposal density f̃_prop satisfying

f_{\max}(w) \leq (c_\infty \cdot C)^{-1} \, \tilde{f}_{\mathrm{prop}}(w) \quad \text{for all } w \in \mathbb{R}^N, \qquad (15)

for some constant C > 0.

Algorithm 2.

REJECTION SAMPLING APPROACH
Input: proposal density f̃_prop and constant C satisfying (15)
repeat {
    Simulate w* according to the density f̃_prop.
    Generate a uniform random number u in (0, 1).
} until u ≤ C · c_∞ · f_max(w*) / f̃_prop(w*)
Output: exact sample w* from the distribution with density f_max.

Thus, on average, (c_∞ · C)^{-1} simulations from the proposal distribution are needed to obtain an exact sample from the target distribution. Of course, to minimize the computational burden, for a given proposal density f̃_prop, the constant C should be chosen maximal subject to (15), i.e.

C = \inf_{w \in \mathbb{R}^N} \frac{\tilde{f}_{\mathrm{prop}}(w)}{c_\infty f_{\max}(w)}.

Recently, de Fondeville and Davison [2] followed a similar idea and suggested basing the simulation of a general sup-normalized spectral process on the relation

\mathbb{P}\left( \frac{V^{\max}}{\|V^{\max}\|_\infty} \in \mathrm{d}v \right) = \frac{\|\tilde{V}\|_\infty}{\mathbb{E}\|\tilde{V}\|_\infty} \, \mathbb{P}\left( \frac{\tilde{V}}{\|\tilde{V}\|_\infty} \in \mathrm{d}v \right), \qquad (16)

where Ṽ is a spectral process normalized with respect to another homogeneous functional instead of the supremum norm, i.e. ‖Ṽ‖ = 1 a.s. If ‖Ṽ‖_∞ is a.s. bounded from above by some constant, the relation (16) yields an inequality of the same type as (15) for the densities of Ṽ/‖Ṽ‖_∞ and V^{max}/‖V^{max}‖_∞ instead of f̃_prop and f_max, respectively. Thus, samples of Ṽ/‖Ṽ‖_∞ can be used as proposals for an exact rejection sampling procedure. For instance, the sum-normalized spectral vector, i.e. the vector which is normalized w.r.t. the functional ‖v‖_1 = Σ_{i=1}^N |v_i|, can be chosen, as it is easy to simulate in many cases, cf. [7], and satisfies ‖Ṽ‖_∞ ≤ 1 almost surely.

For a Brown–Resnick process, it is well-known that the sum-normalized process has the same distribution as exp(W^{sum})/‖exp(W^{sum})‖_1, where W^{sum} has the density f_prop from (7) in Section 2 with equal weights p_i = 1/N, see also [6]. Thus, in this case, the procedure proposed by de Fondeville and Davison [2] is equivalent to performing rejection sampling for W^{max} with this f_prop as proposal distribution (Algorithm 2A). From Equation (9), it follows that rejection sampling can also be performed with f̃_prop = f_prop and arbitrary positive weights summing up to 1, since we have (15) with C = min_{i=1,…,N} p_i. Thus, accepting a proposal w* in the rejection sampling procedure with probability

\min_{i=1,\dots,N} p_i \cdot \frac{c_\infty f_{\max}(w^*)}{f_{\mathrm{prop}}(w^*)} = \frac{\min_{i=1,\dots,N} p_i \cdot \max_{i=1,\dots,N} e^{w^*_i}}{\sum_{i=1}^N p_i e^{w^*_i}},

we obtain a sample of independent realizations from the exact target distribution f_max. The rejection rate, however, is pretty high: in order to obtain one realization of W^{max}, on average (c_∞ · min_{i=1,…,N} p_i)^{-1} simulations from f_prop are required. It can easily be seen that the computational costs are minimal for the choice p_i = 1/N, i = 1, …, N, i.e. the choice in the approach based on the sum-normalized representation. In this case, one realization of W^{max} on average requires sampling N/c_∞ times from f_prop. Therefore, this approach becomes rather inefficient if we have a large number N of points on a dense grid.
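For illustration, rejection sampling with equal weights (Algorithm 2A) can be sketched as follows: a proposal is drawn from the mixture with p_i = 1/N by first picking a component i uniformly and then shifting a draw of W by the i-th column of C as in (13), and it is accepted with probability max_i e^{w*_i} / Σ_i e^{w*_i}. The function name and the log-scale stabilization are our own choices.

```python
import numpy as np

def sample_wmax_rejection(C, n_samples, seed=0):
    """Exact sampling of W_max by rejection (Algorithm 2A sketch).

    Proposals come from the equal-weight mixture f_prop = (1/N) sum_i f_i
    (the sum-normalized representation); a proposal w* is accepted with
    probability max_i e^{w*_i} / sum_i e^{w*_i}, which lies in [1/N, 1].
    """
    rng = np.random.default_rng(seed)
    N = C.shape[0]
    mean = -np.diag(C) / 2.0
    L = np.linalg.cholesky(C)
    out = []
    while len(out) < n_samples:
        i = rng.integers(N)                         # uniform mixture component
        w = mean + C[:, i] + L @ rng.standard_normal(N)
        # acceptance probability max_j e^{w_j} / sum_j e^{w_j},
        # computed stably as 1 / sum_j exp(w_j - max_j w_j)
        if rng.random() < 1.0 / np.exp(w - w.max()).sum():
            out.append(w)
    return np.array(out)
```

The average number of proposals per accepted sample grows like N/c_∞, which makes the cost of this loop visible already for moderately dense grids.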

In order to reduce the large computational costs of rejection sampling, which are mainly due to the fact that the bound c_∞ · min_{i=1,…,N} p_i gets small as N → ∞, we replace each density f_i by a modified multivariate Gaussian density g_{i,ε} whose variance is increased by the factor (1−ε)^{-1} for some ε ∈ (0, 1):

g_{i,\varepsilon}(w) = \frac{(1-\varepsilon)^{N/2}}{(2\pi)^{N/2} \det(C)^{1/2}} \exp\left( -\frac{1-\varepsilon}{2} \left( w - C_{\cdot i} + \frac{\sigma^2}{2} \right)^{\!\top} C^{-1} \left( w - C_{\cdot i} + \frac{\sigma^2}{2} \right) \right).

Analogously to f_prop for the MCMC approach in Section 2, we propose a mixture

\tilde{f}_{\mathrm{prop}} = \sum_{i=1}^N p_i g_{i,\varepsilon}

with p_i ≥ 0 and Σ_{i=1}^N p_i = 1 as proposal density for the rejection sampling algorithm. A proposal w* is then accepted with probability

C(p, \varepsilon) \cdot \frac{c_\infty f_{\max}(w^*)}{\sum_{i=1}^N p_i g_{i,\varepsilon}(w^*)} = C(p, \varepsilon) \cdot \frac{\exp\left( -\frac{\varepsilon}{2} \left( w^* + \frac{\sigma^2}{2} \right)^{\!\top} C^{-1} \left( w^* + \frac{\sigma^2}{2} \right) \right)}{(1-\varepsilon)^{N/2} \cdot \sum_{i=1}^N p_i \exp\left( (1-\varepsilon) w^*_i - \max_{j=1,\dots,N} w^*_j \right)}, \qquad (17)

where

C(p, \varepsilon) = \inf_{w \in \mathbb{R}^N} \frac{\sum_{i=1}^N p_i g_{i,\varepsilon}(w)}{c_\infty f_{\max}(w)} = \inf_{w \in \mathbb{R}^N} \min_{j=1,\dots,N} \frac{\sum_{i=1}^N p_i g_{i,\varepsilon}(w)}{f_j(w)} = \inf_{w \in \mathbb{R}^N} (1-\varepsilon)^{N/2} \sum_{i=1}^N p_i \exp\left( (1-\varepsilon) w_i - \max_{j=1,\dots,N} w_j + \frac{\varepsilon}{2} \left( w + \frac{\sigma^2}{2} \right)^{\!\top} C^{-1} \left( w + \frac{\sigma^2}{2} \right) \right). \qquad (18)

Thus, to summarize, for appropriately chosen p and ε such that C(p, ε) > 0, we propose to run Algorithm 2 with proposal density f̃_prop = Σ_{i=1}^N p_i g_{i,ε} and C = C(p, ε) according to (18).
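Since the constant C(p, ε) in (18) is an infimum with no closed form, in practice it has to be approximated numerically. The crude sketch below (`approx_C`, our own helper, not part of the paper) evaluates the expression inside the infimum in (18) on a random search cloud and returns the smallest value found; this yields only an upper bound on the true constant, so a cautious implementation would shrink the result further before using it in Algorithm 2.

```python
import numpy as np

def approx_C(p, eps, C, sigma2, n_eval=20000, seed=0):
    """Crude numerical approximation of C(p, eps) in (18).

    The infimum over w is approximated by evaluating the objective on a
    wide Gaussian search cloud centered at -sigma2/2, so the returned
    value is an upper bound on the true constant (a sketch, not a solver).
    """
    rng = np.random.default_rng(seed)
    Cinv = np.linalg.inv(C)
    N = len(p)
    W = rng.standard_normal((n_eval, N)) * 3.0 - sigma2 / 2.0
    shift = W + sigma2 / 2.0
    # quadratic term (eps/2) (w + sigma2/2)' C^{-1} (w + sigma2/2), per row
    quad = 0.5 * eps * np.einsum('ni,ij,nj->n', shift, Cinv, shift)
    expo = (1.0 - eps) * W - W.max(axis=1, keepdims=True) + quad[:, None]
    vals = (1.0 - eps) ** (N / 2) * (np.exp(expo) @ p)
    return float(vals.min())
```

Because ∫ f̃_prop = 1 and ∫ c_∞ f_max = c_∞ ≥ 1, the true constant always lies in (0, 1], which gives a basic sanity check on the output.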

To further reduce the computational costs of the simulation, we might even choose a more flexible approach. For instance, instead of using a mixture of a finite number of functions g_{i,ε}, one could consider arbitrary mixtures

\tilde{f}_{\mathrm{prop}}(w) = \int_{\mathbb{R}^d} g_{t,\varepsilon}(w) \, \nu(\mathrm{d}t), \quad w \in \mathbb{R}^N,

where g_{t,ε} is defined analogously to g_{i,ε} = g_{t_i,ε} on the enlarged domain ℝ^d and ν is a probability measure on ℝ^d. Furthermore, depending on t, different values for ε might be chosen. However, due to the complexity of the optimization problems involved, we restrict ourselves to the situation above, where ν is a probability measure on {t_1, …, t_N} and ε is constant in space.

Using the procedure described above, on average, (c_∞ · C(p, ε))^{-1} simulations from the proposal distribution are needed to obtain one exact sample from the target distribution, i.e. the computational complexity of the algorithm depends on the choices of p and ε. The remainder of this section is devoted to this question.

Choice of p and ε

For a given ε, the computational costs of the algorithm can be minimized by choosing p such that the constant C(p, ε) given in (18) is maximal, i.e. by choosing p as the solution of the nonlinear optimization problem

\begin{aligned} \max_{p \in \mathbb{R}^N} \quad & C(p, \varepsilon) \\ \text{s.t.} \quad & \|p\|_1 = 1 \qquad \text{(NP)} \\ & p_i \geq 0 \quad \forall i = 1, \dots, N. \end{aligned}

Optimizing further w.r.t. ε, we obtain the optimal choice of ε as the maximizer of ε ↦ C(p(ε), ε), where p(ε) denotes the solution of (NP) for given ε.

As the above optimization problem includes optimization steps w.r.t. w, p and ε, none of which can be solved analytically, its solution is quite involved. In order to reduce the computational burden, we simplify the problem by maximizing an analytically simpler lower bound. To this end, we decompose the convex combination Σ_{i=1}^N p_i g_{i,ε} into sums over disjoint subsets I = {i_1, …, i_m} ⊂ {1, …, N}. For a convex combination of g_{i_1,ε}, …, g_{i_m,ε} with weight vector λ = (λ_1, …, λ_m), we obtain the lower bound

\inf_{w \in \mathbb{R}^N} \frac{\sum_{k=1}^m \lambda_k g_{i_k,\varepsilon}(w)}{f_j(w)} \geq \inf_{w \in \mathbb{R}^N} (1-\varepsilon)^{N/2} \cdot \exp\left( (1-\varepsilon) \sum_{k=1}^m \lambda_k w_{i_k} - w_j \right) \cdot \exp\left( \frac{\varepsilon}{2} \left( w + \frac{\sigma^2}{2} \right)^{\!\top} C^{-1} \left( w + \frac{\sigma^2}{2} \right) \right) =: c^{(j)}_I(\varepsilon, \lambda),

where we made use of the convexity of the exponential function. Setting κ^{(j)}_I(ε, λ) = (1−ε) Σ_{k=1}^m λ_k C_{·i_k} − C_{·j}, this bound can be calculated explicitly:

c^{(j)}_I(\varepsilon, \lambda) = \inf_{w \in \mathbb{R}^N} (1-\varepsilon)^{N/2} \cdot \exp\left\{ \frac{\varepsilon}{2} \left( w + \frac{1}{\varepsilon} \kappa^{(j)}_I(\varepsilon, \lambda) + \frac{\sigma^2}{2} \right)^{\!\top} C^{-1} \left( w + \frac{1}{\varepsilon} \kappa^{(j)}_I(\varepsilon, \lambda) + \frac{\sigma^2}{2} \right) - \frac{1}{2\varepsilon} \kappa^{(j)}_I(\varepsilon, \lambda)^\top C^{-1} \kappa^{(j)}_I(\varepsilon, \lambda) - \frac{1}{2} \left( (1-\varepsilon) \sum_{k=1}^m \lambda_k \sigma^2_{i_k} - \sigma^2_j \right) \right\}