1 Introduction
Spatial and spatiotemporal extreme value analysis aims at investigating extremes of quantities described by stochastic processes. In the classical setting, the real-valued process of interest $\{X(t)\}_{t \in K}$ is sample-continuous on a compact domain $K \subset \mathbb{R}^d$. Analysis of its extremes is often based on results on the limiting behavior of maxima of independent copies $X_1, X_2, \ldots$ of $X$. Provided that there exist continuous normalizing functions $a_n > 0$ and $b_n$ such that the process of normalized maxima $\{a_n(t)^{-1} (\max_{i=1,\ldots,n} X_i(t) - b_n(t))\}_{t \in K}$ converges in distribution to some sample-continuous process $Z$ with nondegenerate margins as $n \to \infty$, the limit process $Z$ is necessarily max-stable and we say that $X$ is in the max-domain of attraction of $Z$.
From univariate extreme value theory, it follows that the marginal distributions of $Z$ are necessarily generalized extreme value (GEV) distributions, cf. ^{4} for instance. As max-stability is preserved under marginal transformations between different GEV distributions, without loss of generality, it can be assumed that $Z$ has standard Fréchet margins, i.e. $\mathbb{P}(Z(t) \le z) = \exp(-1/z)$, $z > 0$, for all $t \in K$. By de Haan ^{3}, any sample-continuous max-stable process $Z$ with standard Fréchet margins can be represented as
(1) $Z(t) = \max_{i \ge 1} U_i\, W_i(t), \qquad t \in K,$
where the so-called spectral processes $W_i$, $i \in \mathbb{N}$, are independent copies of a nonnegative sample-continuous stochastic process $W$ on $K$ satisfying $\mathbb{E}[W(t)] = 1$ for all $t \in K$, and $\{U_i\}_{i \in \mathbb{N}}$ is a Poisson point process on $(0,\infty)$ which is independent of the $W_i$ and has intensity measure $\Lambda$ given by $\Lambda((u,\infty)) = u^{-1}$ for all $u > 0$.
Due to its complex structure, many characteristics of the max-stable process $Z$ in (1) cannot be calculated analytically, but need to be assessed via simulations. In order to simulate $Z$ efficiently, Oesting et al. ^{15} suggest to make use of the sup-normalized spectral representation
(2) $Z(t) \stackrel{d}{=} \theta_K \max_{i \ge 1} U_i\, W_i^{\sup}(t), \qquad t \in K,$
where the $U_i$ are the same as above, and the processes $W_i^{\sup}$ are independently and identically distributed, independently of the $U_i$, with distribution given by
(3) $\mathbb{P}\left(W^{\sup} \in A\right) = \frac{\mathbb{E}\left[\sup_{t \in K} W(t) \cdot \mathbf{1}\left\{W / \sup_{t \in K} W(t) \in A\right\}\right]}{\mathbb{E}\left[\sup_{t \in K} W(t)\right]}, \qquad A \in \mathcal{C},$
and $\sup_{t \in K} W^{\sup}(t) = 1$ almost surely for every $W^{\sup}$, where $C(K)$ denotes the set of all real-valued continuous functions on $K$ equipped with the supremum norm and corresponding $\sigma$-algebra $\mathcal{C}$. Here, the normalizing constant $\theta_K = \mathbb{E}[\sup_{t \in K} W(t)]$ is the so-called extremal coefficient of the max-stable process $Z$ over the domain $K$. In a simulation study, Oesting et al. ^{15} demonstrate that simulation based on the sup-normalized spectral representation is competitive with other state-of-the-art algorithms such as simulation based on extremal functions ^{7}, provided that the normalized spectral process $W^{\sup}$ can be simulated efficiently.
The law of the processes $W_i^{\sup}$ also occurs when analyzing the extremes of a stochastic process in an alternative way focusing on exceedances over a high threshold: If $X$ is in the max-domain of attraction of the max-stable process $Z$ in (1), we have
$$\mathcal{L}\left(u^{-1} X \,\middle|\, \sup_{t \in K} X(t) > u\right) \longrightarrow \mathcal{L}\left(P \cdot W^{\sup}\right)$$
as $u \to \infty$, where $P$ with $\mathbb{P}(P > x) = x^{-1}$, $x \ge 1$, is a standard Pareto random variable and $W^{\sup}$ is an independent process with law given in (3). The limit process $P \cdot W^{\sup}$ is called a Pareto process, cf. ^{9}, ^{8}. Arising as sup-normalized spectral process for both max-stable and Pareto processes, the process $W^{\sup}$ plays an important role in modelling and analyzing spatial extremes. As a crucial building block of spatial and spatiotemporal models, this process needs to be simulated in an efficient way. Due to the measure transformation in (3), however, sampling of $W^{\sup}$ is not straightforward even in cases where the underlying spectral process $W$ can be simulated easily.
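The threshold-exceedance representation above suggests a simple two-step sampling scheme: draw a standard Pareto variable by inverse transform and multiply it with an independently drawn sup-normalized spectral vector. The sketch below illustrates this, assuming a sampler for $W^{\sup}$ is available; the `toy_wsup` stand-in is purely illustrative and does not implement the tilted law (3).

```python
import numpy as np

def sample_pareto_process(sample_wsup, rng):
    """One realization of the Pareto process P * W_sup: P is a standard
    Pareto variable (P(P > x) = 1/x for x >= 1), drawn independently of
    the sup-normalized spectral vector returned by sample_wsup."""
    P = 1.0 / (1.0 - rng.uniform())   # inverse-transform sampling, P >= 1
    return P * sample_wsup()

rng = np.random.default_rng(0)

def toy_wsup():
    # illustrative stand-in only: a sup-normalized log-Gaussian vector
    # WITHOUT the measure change of (3), used just to exercise the code
    w = np.exp(rng.normal(size=5))
    return w / w.max()

y = sample_pareto_process(toy_wsup, rng)
```

Since the sup-normalized vector has maximum exactly $1$, the maximum of the resulting realization equals $P \ge 1$.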
In the present paper, we focus on the simulation of $W^{\sup}$ for the very popular class of log-Gaussian spectral processes, i.e. $W(t) = \exp(G(t))$ for some Gaussian process $G$ such that $\mathbb{E}[\exp(G(t))] = 1$ for all $t \in K$. The resulting subclass of max-stable processes in (1) comprises the only possible nontrivial limits of normalized maxima of rescaled Gaussian processes, the class of Brown–Resnick processes ^{13}, ^{12}. In order to obtain Brown–Resnick processes that can be extended to stationary processes on $\mathbb{R}^d$, the authors of ^{13} consider $W(t) = \exp(G(t) - \sigma^2(t)/2)$, $t \in K$, with $G$ being a centered Gaussian process on $\mathbb{R}^d$ with stationary increments, variance $\sigma^2(t) = \operatorname{Var}(G(t))$ and variogram
$$\gamma(h) = \mathbb{E}\left[\left(G(t+h) - G(t)\right)^2\right].$$
It is important to note that the law of the resulting max-stable process
$$Z(t) = \max_{i \ge 1} U_i \exp\left(G_i(t) - \sigma^2(t)/2\right), \qquad t \in K,$$
does not depend on the variance $\sigma^2$ of $G$, but only on the variogram $\gamma$. Therefore, $Z$ is called the Brown–Resnick process associated to the variogram $\gamma$. Recently, Ho and Dombry ^{11} introduced a two-step procedure to simulate the corresponding sup-normalized process $W^{\sup}$ efficiently if the finite domain $K = \{t_1, \ldots, t_N\}$ is of small or moderate size $N$:

1. Sample the index $k$ of the component where the vector $(W^{\sup}(t_1), \ldots, W^{\sup}(t_N))$ assumes its maximum, i.e. select one of the events $\{W^{\sup}(t_k) = 1\}$, $k = 1, \ldots, N$. Provided that the covariance matrix $\Sigma$ of the Gaussian vector $G = (G(t_1), \ldots, G(t_N))^\top$ is nonsingular, we have that this index is a.s. unique and that the probabilities of the corresponding events can be calculated in terms of the matrices $\Sigma^{(k)}$ and the vectors $b^{(k)}$ given by
$$\Sigma^{(k)} = \Sigma - \Sigma e_k \mathbf{1}^\top - \mathbf{1} e_k^\top \Sigma + \Sigma_{kk}\, \mathbf{1}\mathbf{1}^\top, \qquad b^{(k)} = \sigma^2 - \Sigma e_k,$$
where $\sigma^2$ is the variance vector of $G$ and $e_k$ is the $k$th unit vector. More precisely, by Ho and Dombry ^{11},
$$\mathbb{P}\left(W^{\sup}(t_k) = 1\right) = \frac{\Phi_{N-1}\left(b^{(k)}_{-k};\, 0,\, \Sigma^{(k)}_{-k}\right)}{\sum_{l=1}^{N} \Phi_{N-1}\left(b^{(l)}_{-l};\, 0,\, \Sigma^{(l)}_{-l}\right)},$$
where $x_{-k}$ denotes the vector $x$ after removing the $k$th component, $\Sigma^{(k)}_{-k}$ denotes the matrix $\Sigma^{(k)}$ after removing the $k$th row and $k$th column and $\Phi_{N-1}(\,\cdot\,; \mu, \Sigma)$ is the distribution function of an $(N-1)$-dimensional Gaussian distribution with mean vector $\mu$ and covariance matrix $\Sigma$ evaluated at the given vector.
2. Conditional on $\{W^{\sup}(t_k) = 1\}$, we have $\log W^{\sup}(t_k) = 0$, and the distribution of the vector $(\log W^{\sup}(t_j))_{j \ne k}$ is an $(N-1)$-dimensional Gaussian distribution with mean vector $-b^{(k)}_{-k}$ and covariance matrix $\Sigma^{(k)}_{-k}$, conditional on all components being nonpositive.
However, the first step includes computationally expensive operations such as the evaluation of $(N-1)$-dimensional Gaussian distribution functions and the inversion of matrices of sizes $N \times N$ and $(N-1) \times (N-1)$. Furthermore, an efficient implementation of the second step is not straightforward. Thus, the procedure is feasible for small or moderate $N$ only.
In this paper, we will introduce alternative procedures for the simulation of $W^{\sup}$, or, equivalently, $\log W^{\sup}$, that are supposed to work for larger $N$ as well. To this end, we will modify a Markov Chain Monte Carlo (MCMC) algorithm proposed by Oesting et al. ^{15} and a rejection sampling approach based on ideas of de Fondeville and Davison ^{2}. Both procedures have originally been designed to sample sup-normalized spectral functions in general. Here, we will adapt them to the specific case of Brown–Resnick processes.
2 Simulating $W^{\sup}$ via MCMC algorithms
Based on the Brown–Resnick process as our main example, we consider a max-stable process with spectral process $W = \exp(G)$ for some sample-continuous process $G$. Henceforth, we will always assume that the simulation domain $K = \{t_1, \ldots, t_N\}$ is finite and that the corresponding spectral vector $G = (G(t_1), \ldots, G(t_N))^\top$ possesses a density $f$ w.r.t. some measure $\nu$ on $\mathbb{R}^N$. Then, by (3), the transformed spectral vector $X = \log \widetilde{W}$, where $\widetilde{W}$ follows the law of the spectral vector tilted by its maximum, i.e. $\mathbb{P}(\widetilde{W} \in A) = \theta_K^{-1}\, \mathbb{E}[\max_{j} W(t_j) \cdot \mathbf{1}\{W \in A\}]$, and the logarithm is applied componentwise, has the multivariate density
$$\pi(x) = \theta_K^{-1} \exp\left(\max_{j=1,\ldots,N} x_j\right) f(x),$$
which obviously has the same support as $f$. The sup-normalized vector is then obtained by normalizing, i.e. $\log W^{\sup} = X - \max_{j} X_j$.
As direct sampling from the density $\pi$ is rather sophisticated and the normalizing constant $\theta_K$ is not readily available, it is quite appealing to choose an MCMC approach for simulation. In the present paper, we focus on Metropolis–Hastings algorithms with independence sampler, cf. ^{17} for example. Denoting the strictly positive proposal density on $\mathbb{R}^N$ by $q$, the algorithm is of the following form:
Algorithm 1.
MCMC APPROACH (METROPOLIS–HASTINGS)
Input: proposal density $q$
Simulate $X^{(0)}$ according to the density $q$.
for $m = 1, \ldots, M$ {
Sample $Y$ from $q$ and set
$$X^{(m)} = \begin{cases} Y & \text{with probability } \alpha(X^{(m-1)}, Y), \\ X^{(m-1)} & \text{otherwise,} \end{cases}$$
where the acceptance probability $\alpha$ is given
by (4).
}
Output: Markov chain $X^{(0)}, X^{(1)}, \ldots, X^{(M)}$.
Here, the acceptance ratio for a new proposal $y$ given a current state $x$ is
(4) $\alpha(x, y) = \min\left\{1,\; \frac{\pi(y)\, q(x)}{\pi(x)\, q(y)}\right\},$
using the convention that the ratio is interpreted as $1$ if both the numerator and the denominator are equal to $0$. This choice of $\alpha$ ensures reversibility of the resulting Markov chain with respect to the distribution with density $\pi$. Further, it allows for a direct transition from any state $x$ to any other state $y$. Consequently, the chain is irreducible and aperiodic and, thus, its distribution converges to the desired stationary distribution, that is, for a.e. initial state $x$, we have that
(5) $\lim_{m \to \infty} \left\| P^{(m)}_x - \pi \right\|_{\mathrm{TV}} = 0,$
where $P^{(m)}_x$ denotes the distribution of the $m$th state of a Markov chain with initial state $x$ and $\|\cdot\|_{\mathrm{TV}}$ is the total variation norm.
As a general approach for the simulation of sup-normalized spectral processes of arbitrary max-stable processes, Oesting et al. ^{15} propose to use Algorithm 1 with the density $f$ of the original spectral vector as proposal density (Algorithm 1A); the Metropolis–Hastings acceptance ratio in (4) then simplifies to
(6) $\alpha(x, y) = \min\left\{1,\; \exp\left(\max_{j=1,\ldots,N} y_j - \max_{j=1,\ldots,N} x_j\right)\right\}.$
As the proposal density $f$ is strictly positive on the support of $\pi$, convergence of the distribution of the Markov chain to the target distribution as in (5) is ensured. If the support of $\max_{j} X_j$ is unbounded, however, there is no uniform geometric rate of convergence of the chain in (5), as we have
$$\sup_{x} \frac{\pi(x)}{f(x)} = \theta_K^{-1} \sup_{x} \exp\left(\max_{j=1,\ldots,N} x_j\right) = \infty$$
^{14}. In particular, this holds true for the case of a Brown–Resnick process where $G$ is a Gaussian vector.
Furthermore, due to the structure of the acceptance ratio in (6), the Markov chain may get stuck once a state with a large maximum is reached. This might lead to rather poor mixing properties of the chain. Even though independent realizations could still be obtained by starting new independent Markov chains, cf. ^{15}, such a behavior is undesirable having chains in high dimension with potentially long burn-in periods in mind.
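For concreteness, here is a minimal sketch of Algorithm 1A in the Brown–Resnick case: the proposal is the Gaussian vector $G$ itself (with mean $-\sigma^2/2$ so that $\mathbb{E}[\exp(G_j)] = 1$), and the acceptance ratio only involves the componentwise maxima as in (6). The function name and parameter choices are illustrative only.

```python
import numpy as np

def mcmc_algorithm_1a(Sigma, n_steps, rng):
    """Metropolis-Hastings independence sampler (Algorithm 1A sketch):
    proposal  = Gaussian vector G ~ N(-diag(Sigma)/2, Sigma),
    target    density proportional to exp(max_j x_j) * f(x),
    so the acceptance ratio reduces to exp(max_j y_j - max_j x_j)."""
    N = Sigma.shape[0]
    mean = -np.diag(Sigma) / 2.0
    L = np.linalg.cholesky(Sigma)
    def propose():
        return mean + L @ rng.normal(size=N)
    x = propose()
    chain = [x]
    for _ in range(n_steps):
        y = propose()
        # accept with probability min(1, exp(max(y) - max(x)))
        if np.log(rng.uniform()) < y.max() - x.max():
            x = y
        chain.append(x)
    return np.array(chain)

rng = np.random.default_rng(1)
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
chain = mcmc_algorithm_1a(Sigma, 200, rng)
```

The sticking behavior described above is visible here: once `x.max()` is large, the log-acceptance `y.max() - x.max()` is strongly negative for typical proposals.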
While the algorithm in Oesting et al. ^{15} is designed to be applicable in a general framework, we will use a specific transformation to construct a Markov chain with stronger mixing and faster convergence to the target distribution. For many popular models such as Brown–Resnick processes, this transformation is easily applicable. More precisely, we consider the related densities $f_k$, $k = 1, \ldots, N$, with $f_k(x) = e^{x_k} f(x)$; note that each $f_k$ integrates to $1$ since $\mathbb{E}[W(t_k)] = 1$. These densities are closely related to the distributions of the extremal functions that have been studied in Dombry et al. ^{7}.
Hence, we propose to approach the target distribution with density $\pi$ by Algorithm 1 using a mixture
(7) $q(x) = \sum_{k=1}^{N} p_k\, f_k(x) = \left(\sum_{k=1}^{N} p_k\, e^{x_k}\right) f(x)$
as proposal density, where the weights $p_k \ge 0$, $k = 1, \ldots, N$, are such that $\sum_{k=1}^{N} p_k = 1$. The corresponding acceptance probability in (4) is then
(8) $\alpha(x, y) = \min\left\{1,\; \frac{\exp\left(\max_{j} y_j\right) \sum_{k=1}^{N} p_k\, e^{x_k}}{\exp\left(\max_{j} x_j\right) \sum_{k=1}^{N} p_k\, e^{y_k}}\right\}.$
With the proposal density $q$ being strictly positive on the support of $\pi$, we can see that the distribution of the Markov chain again converges to its stationary distribution with density $\pi$. As we further have
(9) $\sup_{x} \frac{\pi(x)}{q(x)} = \theta_K^{-1} \sup_{x} \frac{\exp\left(\max_{j} x_j\right)}{\sum_{k=1}^{N} p_k\, e^{x_k}} \le \frac{1}{\theta_K \min_{k=1,\ldots,N} p_k} < \infty,$
provided that $p_k > 0$ for all $k = 1, \ldots, N$, the results found by Mengersen and Tweedie ^{14} even ensure a uniform geometric rate of convergence for any starting value, in contrast to the case where the proposal density is $f$.
In order to obtain a chain with good mixing properties, we choose the weights $p_1, \ldots, p_N$ such that the acceptance rate in Algorithm 1 is high provided that the current state is (approximately) distributed according to the stationary distribution. To this end, we minimize the relative deviation between $q$ and $\pi$ under $\pi$, i.e. we minimize
$$\mathbb{E}_\pi\left[\left(\frac{q(X)}{\pi(X)} - 1\right)^2\right]$$
under the constraint $\sum_{k=1}^{N} p_k = 1$. Introducing a Lagrange multiplier $\lambda$, minimizing
$$\mathbb{E}_\pi\left[\left(\frac{q(X)}{\pi(X)} - 1\right)^2\right] + \lambda \left(\sum_{k=1}^{N} p_k - 1\right)$$
results in solving the linear system
(10) $A p = \lambda\, \mathbf{1}, \qquad \mathbf{1}^\top p = 1,$
where $\mathbf{1} = (1, \ldots, 1)^\top \in \mathbb{R}^N$ and $A = (a_{kl})_{k,l=1,\ldots,N}$ with
(11) $a_{kl} = \mathbb{E}\left[\exp\left(X_k + X_l - \max_{j=1,\ldots,N} X_j\right)\right], \qquad X \sim f.$
Provided that the matrix $A$ is nonsingular, the solution of (10) is given by
(12) $p = \frac{A^{-1} \mathbf{1}}{\mathbf{1}^\top A^{-1} \mathbf{1}},$
cf. ^{1} for instance. This solution does not necessarily satisfy the additional restriction $p_k \ge 0$ for all $k$. In case that $A$ is singular or the vector in (12) has at least one negative entry, the full optimization problem
(QP) $\min_{p}\; p^\top A p \qquad \text{s.t.} \quad \mathbf{1}^\top p = 1,\; p_k \ge 0,\; k = 1, \ldots, N,$
has to be solved. Using the Karush–Kuhn–Tucker optimality conditions, the quadratic program (QP) can be transformed into a linear program with additional (nonlinear) complementary slackness conditions. It can be solved by modified simplex methods. Alternatively, the problem (QP) can be solved by the dual method by Goldfarb and Idnani ^{10}. In order to ensure a geometric rate of convergence of the distribution of the Markov chain, we might replace the condition $p_k \ge 0$ for all $k$ in (QP) by $p_k \ge \varepsilon$ for some given $\varepsilon > 0$. Then, a geometric rate of convergence follows from (9) as described above.
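A minimal numerical sketch of the weight computation: the closed form (12) is attempted first and, whenever it is infeasible, uniform weights are used here as a crude stand-in for solving the full quadratic program (QP); the helper name is hypothetical.

```python
import numpy as np

def mixture_weights(A):
    """Solve min p^T A p subject to 1^T p = 1 via the closed form (12):
    p = A^{-1} 1 / (1^T A^{-1} 1).  Falls back to uniform weights if A
    is singular or the solution has negative entries (where the full
    quadratic program (QP) would have to be solved instead)."""
    N = A.shape[0]
    one = np.ones(N)
    try:
        z = np.linalg.solve(A, one)
    except np.linalg.LinAlgError:
        return one / N
    p = z / (one @ z)
    if np.any(p < 0):
        return one / N   # crude fallback; a proper QP solver is the fix
    return p

p = mixture_weights(np.eye(3))   # identity matrix yields uniform weights
```

For a serious implementation, the fallback branch would call a dedicated quadratic programming routine implementing, e.g., the Goldfarb–Idnani dual method mentioned above.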
In the case of Brown–Resnick processes, for simplicity, we consider the case that $G = (G(t_1), \ldots, G(t_N))^\top$ possesses a full Lebesgue density. In particular, the covariance matrix $\Sigma$ of $G$ is assumed to be nonsingular. Then, the target density is
$$\pi(x) = \theta_K^{-1} \exp\left(\max_{j=1,\ldots,N} x_j\right) \varphi_{-\sigma^2/2,\,\Sigma}(x), \qquad x \in \mathbb{R}^N,$$
where $\sigma^2$ is again the variance vector of $G$ and $\varphi_{\mu,\Sigma}$ denotes the $N$-dimensional Gaussian density with mean vector $\mu$ and covariance matrix $\Sigma$. Now, the densities $f_k$ which form the proposal density $q$ are just shifted Gaussian densities:
$$f_k(x) = e^{x_k}\, \varphi_{-\sigma^2/2,\,\Sigma}(x) = \varphi_{\Sigma e_k - \sigma^2/2,\,\Sigma}(x),$$
cf. Lemma 1 in the Supplementary Material of Dombry et al. ^{7}, i.e. we have
(13) $X^{(k)} \stackrel{d}{=} G + \Sigma e_k,$
where the Gaussian vectors $X^{(k)}$ and $G$ possess the densities $f_k$ and $f = \varphi_{-\sigma^2/2,\,\Sigma}$, respectively.
where the Gaussian vectors and possess densities and , respectively. The calculation of the optimal weights is based on the expectation in (11) which typically cannot be calculated analytically, but needs to be assessed numerically via simulations. Such a numerical evaluation, however, is challenging as the random variable is unbounded. To circumvent these computational difficulties, we make use of the identity
(14) 
This expression can be conveniently assessed numerically as the random variable is bounded by .
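The identity (14) translates directly into a simple Monte Carlo scheme: sample the Gaussian vector, shift it by the $k$th column of the covariance matrix as in (13), and average the bounded exponentials. The following sketch assumes the mean vector $-\sigma^2/2$ from the Brown–Resnick setting; names are illustrative.

```python
import numpy as np

def estimate_A(Sigma, n_samples, rng):
    """Monte Carlo estimate of the matrix A from (11) via (13)/(14):
    under f_k, the Gaussian vector equals the original one shifted by
    the k-th column of Sigma, and the averaged integrand
    exp(X_l - max_j X_j) is bounded by 1."""
    N = Sigma.shape[0]
    mean = -np.diag(Sigma) / 2.0
    L = np.linalg.cholesky(Sigma)
    G = mean + rng.normal(size=(n_samples, N)) @ L.T   # samples from f
    A = np.empty((N, N))
    for k in range(N):
        X = G + Sigma[:, k]                            # samples from f_k, cf. (13)
        R = np.exp(X - X.max(axis=1, keepdims=True))   # bounded by 1
        A[k, :] = R.mean(axis=0)
    return A

rng = np.random.default_rng(2)
A_hat = estimate_A(np.array([[1.0, 0.5], [0.5, 1.0]]), 10_000, rng)
```

Since $A$ is symmetric, the two Monte Carlo estimates of $a_{kl}$ and $a_{lk}$ should agree up to simulation error, which provides a cheap sanity check.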
Note that both (13) and the final result in (14) still hold true if $G$ does not possess a full Lebesgue density, but exactly one component of $G$ is degenerate and the reduced covariance matrix of the remaining components is nonsingular. This situation appears in several examples such as $G$ being a fractional Brownian motion, where $G(0) = 0$ a.s.
In summary, we propose the following procedure to simulate the normalized spectral vector for the Brown–Resnick process (Algorithm 1B): estimate the matrix $A$ via the representation (14), compute the weights $p_1, \ldots, p_N$ according to (12) or, if necessary, by solving (QP), and run Algorithm 1 with the resulting mixture (7) as proposal density.
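Algorithm 1B can be sketched along these lines: mixture of shifted Gaussians (13) as proposal, acceptance as in (8), with computations on the log scale for numerical stability. The sketch assumes the Gaussian setting above; function names and parameter choices are illustrative.

```python
import numpy as np

def mcmc_algorithm_1b(Sigma, p, n_steps, rng):
    """Metropolis-Hastings independence sampler (Algorithm 1B sketch):
    proposal q(x) = sum_k p_k f_k(x), a mixture of shifted Gaussians
    (13); the acceptance probability follows (8)."""
    N = Sigma.shape[0]
    mean = -np.diag(Sigma) / 2.0
    L = np.linalg.cholesky(Sigma)
    def propose():
        k = rng.choice(N, p=p)                  # pick a mixture component
        return mean + Sigma[:, k] + L @ rng.normal(size=N)
    def log_mix(x):
        # log sum_k p_k exp(x_k), computed stably
        m = x.max()
        return m + np.log(np.sum(p * np.exp(x - m)))
    x = propose()
    chain = [x]
    for _ in range(n_steps):
        y = propose()
        log_alpha = (y.max() - x.max()) + log_mix(x) - log_mix(y)
        if np.log(rng.uniform()) < log_alpha:
            x = y
        chain.append(x)
    return np.array(chain)

rng = np.random.default_rng(3)
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
chain = mcmc_algorithm_1b(Sigma, np.array([0.5, 0.5]), 200, rng)
```

In practice, the weight vector passed to `mcmc_algorithm_1b` would be the one obtained from (12) or (QP) rather than the uniform weights used here for illustration.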
3 Exact Simulation via Rejection Sampling
In this section, we present an alternative procedure to generate samples from the distribution with probability density $\pi$. In contrast to Section 2, where we generated a Markov chain consisting of dependent samples with the desired distribution as its stationary distribution, here we aim to produce independent realizations from the exact target distribution. To this end, we make use of a rejection sampling approach, cf. ^{5} for instance, based on a proposal density $q$ satisfying
(15) $q(x) \ge C\, \pi(x) \qquad \text{for all } x \in \mathbb{R}^N$
for some $C \in (0, 1]$.
Algorithm 2.
REJECTION SAMPLING APPROACH
Input: proposal density $q$ and constant $C$
satisfying (15)
repeat {
Simulate $X$ according to the density $q$.
Generate a uniform random number $U$ in $(0, 1)$.
} until $U \le C\, \pi(X) / q(X)$
Output: exact sample $X$ from distribution with density $\pi$
Thus, on average, $C^{-1}$ simulations from the proposal distribution are needed to obtain an exact sample from the target distribution. Of course, to minimize the computational burden, for a given proposal density $q$, the constant $C$ should be chosen maximal subject to (15), i.e.
$$C = \operatorname*{ess\,inf}_{x \in \mathbb{R}^N} \frac{q(x)}{\pi(x)}.$$
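Algorithm 2 can be sketched generically as follows; the toy target and proposal in the usage example are illustrative only (target density $2x$ on $(0,1)$, uniform proposal, $C = 1/2$).

```python
import numpy as np

def rejection_sample(sample_q, log_pi, log_q, C, rng, max_iter=100_000):
    """Algorithm 2: repeatedly draw X ~ q and U ~ Unif(0,1) and accept
    once U <= C * pi(X) / q(X); requires q(x) >= C * pi(x) everywhere,
    cf. (15).  On average 1/C proposals are needed per sample."""
    for _ in range(max_iter):
        x = sample_q()
        if np.log(rng.uniform()) <= np.log(C) + log_pi(x) - log_q(x):
            return x
    raise RuntimeError("no acceptance within max_iter proposals")

# toy usage: target density 2x on (0,1), uniform proposal, C = 1/2
rng = np.random.default_rng(4)
x = rejection_sample(lambda: rng.uniform(),
                     lambda x: np.log(2.0 * x),
                     lambda x: 0.0,
                     0.5, rng)
```

Working on the log scale avoids overflow when the densities involve exponentials of Gaussian vectors, as in the Brown–Resnick case below.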
Recently, de Fondeville and Davison ^{2} followed a similar idea and suggested to base the simulation of a general sup-normalized spectral process on the relation
(16) $\mathbb{P}\left(W^{\sup} \in A\right) = \frac{\mathbb{E}\left[\sup_{t \in K} W^{\ell}(t) \cdot \mathbf{1}\left\{W^{\ell} / \sup_{t \in K} W^{\ell}(t) \in A\right\}\right]}{\mathbb{E}\left[\sup_{t \in K} W^{\ell}(t)\right]},$
where $W^{\ell}$ is a spectral process normalized with respect to another homogeneous functional $\ell$ instead of the supremum norm, i.e. $\ell(W^{\ell}) = 1$ a.s. If $\sup_{t \in K} W^{\ell}(t)$ is a.s. bounded from above by some constant, from the relation (16), we obtain an inequality of the same type as (15) for the densities of $W^{\ell}$ and $W^{\sup}$ instead of $q$ and $\pi$, respectively. Thus, samples of $W^{\ell}$ can be used as proposals for an exact rejection sampling procedure. For instance, the sum-normalized spectral vector, i.e. the vector which is normalized w.r.t. the functional $\ell(w) = \sum_{j=1}^{N} w(t_j)$, can be chosen, as it is easy to simulate in many cases, cf. ^{7}, and satisfies $\sup_{j=1,\ldots,N} W^{\ell}(t_j) \le 1$ almost surely.
For a Brown–Resnick process, it is well-known that the sum-normalized spectral vector has the same distribution as $e^{X} / \sum_{k=1}^{N} e^{X_k}$, where $X$ has the mixture density $q$ from (7) in Section 2 with equal weights $p_k = 1/N$, see also ^{6}. Thus, in this case, the procedure proposed by de Fondeville and Davison ^{2} with $\ell(w) = \sum_{j=1}^{N} w(t_j)$ is equivalent to performing rejection sampling with the equally weighted mixture as proposal distribution (Algorithm 2A). From Equation (9), it follows that rejection sampling can also be performed with the mixture $q$ from (7) and arbitrary positive weights $p_1, \ldots, p_N$ summing up to $1$, since we have (15) with $C = \theta_K \min_{k=1,\ldots,N} p_k$. Thus, accepting a proposal $Y$ from $q$ in the rejection sampling procedure with probability
$$C\, \frac{\pi(Y)}{q(Y)} = \frac{\min_{k=1,\ldots,N} p_k \cdot \exp\left(\max_{j=1,\ldots,N} Y_j\right)}{\sum_{k=1}^{N} p_k\, e^{Y_k}},$$
we will obtain a sample of independent realizations from the exact target distribution with density $\pi$. The rejection rate, however, is pretty high: in order to obtain one realization, on average $(\theta_K \min_{k} p_k)^{-1}$ simulations from $q$ are required. It can easily be seen that the computational costs are indeed minimal for the choice $p_k = 1/N$, $k = 1, \ldots, N$, i.e. the choice in the approach based on the sum-normalized representation. In this case, one realization on average requires sampling $N / \theta_K$ times from $q$. Therefore, this approach becomes rather inefficient if we have a large number $N$ of points, e.g. on a dense grid.
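The resulting rejection sampler for the Brown–Resnick case can be sketched as follows: proposals come from the mixture of shifted Gaussians (13), and a proposal $Y$ is accepted with probability $\min_k p_k \exp(\max_j Y_j) / \sum_k p_k e^{Y_k}$, evaluated stably. The returned vector is shifted so that its maximum is $0$, corresponding to $\log W^{\sup}$; names and parameter choices are illustrative.

```python
import numpy as np

def rejection_algorithm_2(Sigma, p, rng, max_iter=100_000):
    """Rejection sampler for the Brown-Resnick target (Algorithm 2A
    for equal weights p_k = 1/N): proposals from the mixture of shifted
    Gaussians (13); acceptance probability
    min_k p_k * exp(max_j Y_j) / sum_k p_k exp(Y_k),
    which corresponds to (15) with C = theta_K * min_k p_k."""
    N = Sigma.shape[0]
    mean = -np.diag(Sigma) / 2.0     # so that E[exp(G_j)] = 1
    L = np.linalg.cholesky(Sigma)
    c_min = p.min()
    for _ in range(max_iter):
        k = rng.choice(N, p=p)
        y = mean + Sigma[:, k] + L @ rng.normal(size=N)
        m = y.max()
        accept = c_min / np.sum(p * np.exp(y - m))   # <= 1 by construction
        if rng.uniform() <= accept:
            return y - m   # shift so the maximum is 0, i.e. log W_sup
    raise RuntimeError("no acceptance within max_iter proposals")

rng = np.random.default_rng(5)
out = rejection_algorithm_2(np.array([[1.0, 0.3], [0.3, 1.0]]),
                            np.array([0.5, 0.5]), rng)
```

For small $N$, as here, the expected number of proposals $(\theta_K \min_k p_k)^{-1}$ is modest; the inefficiency discussed above only materializes on dense grids.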
In order to reduce the large computational costs of rejection sampling, which are mainly due to the fact that $C = \theta_K \min_{k} p_k$ gets small as $N \to \infty$, we replace each density $f_k$ by the modified multivariate Gaussian density $f_k^{(\beta)}$ whose variance is increased by the factor $\beta$ for some $\beta > 1$:
$$f_k^{(\beta)} = \varphi_{\Sigma e_k - \sigma^2/2,\; \beta \Sigma}.$$
Analogously to the mixture proposal for the MCMC approach in Section 2, we propose a mixture
$$q_{p,\beta}(x) = \sum_{k=1}^{N} p_k\, f_k^{(\beta)}(x)$$
with $p_k \ge 0$ and $\sum_{k=1}^{N} p_k = 1$ as proposal density for the rejection sampling algorithm. A proposal $Y$ is then accepted with probability
(17) $\alpha(Y) = C\, \frac{\pi(Y)}{q_{p,\beta}(Y)},$
where
(18) $C = \operatorname*{ess\,inf}_{x \in \mathbb{R}^N} \frac{q_{p,\beta}(x)}{\pi(x)}.$
Thus, to summarize, for appropriately chosen $\beta > 1$ and weights $p_1, \ldots, p_N$ such that $C > 0$, we propose to run Algorithm 2 with proposal density $q_{p,\beta}$ and $C$ according to (18).
To further reduce the computational costs in the simulation, we might even choose a more flexible approach. For instance, instead of using a mixture of a finite number of functions $f_k^{(\beta)}$, one could consider arbitrary mixtures
$$q(x) = \int f_s^{(\beta(s))}(x)\, \mu(\mathrm{d}s),$$
where the densities $f_s^{(\beta(s))}$ are defined analogously on an enlarged domain and $\mu$ is a probability measure on that domain. Furthermore, depending on $s$, different values for $\beta(s)$ might be chosen. However, due to the complexity of the optimization problems involved, we restrict ourselves to the situation above where $\mu$ is a probability measure on $\{t_1, \ldots, t_N\}$ and $\beta$ is constant in space.
Using the procedure described above, on average, $C^{-1}$ simulations from the proposal distribution are needed to obtain one exact sample from the target distribution, i.e. the computational complexity of the algorithm depends on the choices of $p$ and $\beta$. The remainder of this section will be devoted to this question.
Choice of $p$ and $\beta$
For a given $\beta$, the computational costs of the algorithm can be minimized by choosing $p$ such that the constant $C$ given in (18) is maximal, i.e. by choosing $p$ as the solution of the nonlinear optimization problem
(NP) $\max_{p}\; \operatorname*{ess\,inf}_{x \in \mathbb{R}^N} \frac{q_{p,\beta}(x)}{\pi(x)} \qquad \text{s.t.} \quad \sum_{k=1}^{N} p_k = 1,\; p_k \ge 0,\; k = 1, \ldots, N.$
Optimizing further w.r.t. $\beta$, we obtain the overall optimal choice of the proposal density, where $p^*(\beta)$ denotes the solution of (NP) for fixed $\beta$ and $\beta^*$ maximizes the resulting constant $C$.
As the above optimization problem includes optimization steps w.r.t. $p$, $\beta$ and $x$, none of which can be solved analytically, its solution is quite involved. In order to reduce the computational burden, we simplify the problem by maximizing an analytically simpler lower bound. To this end, we decompose the convex combination $q_{p,\beta}$ into sums over disjoint subsets of the index set $\{1, \ldots, N\}$. For a convex combination of the $f_k^{(\beta)}$ with a given weight vector, we obtain a lower bound by making use of the convexity of the exponential function; this bound can then be calculated explicitly: