Bayesian spatial clustering of extremal behaviour for hydrological variables

by   Christian Rohrbeck, et al.

To address the need for efficient inference for a range of hydrological extreme value problems, spatial pooling of information is the standard approach for marginal tail estimation. We propose the first extreme value spatial clustering methods which account for both the similarity of the marginal tails and the spatial dependence structure of the data to determine the appropriate level of pooling. Spatial dependence is incorporated in two ways: to determine the cluster selection and to account for dependence of the data over sites within a cluster when making the marginal inference. We introduce a statistical model for the pairwise extremal dependence which incorporates distance between sites, and accommodates our belief that sites within the same cluster tend to exhibit a higher degree of dependence than sites in different clusters. We use a Bayesian framework which learns about both the number of clusters and their spatial structure, and that enables the inference of site-specific marginal distributions of extremes to incorporate uncertainty in the clustering allocation. The approach is illustrated using simulations, the analysis of daily precipitation levels in Norway and daily river flow levels in the UK.



page 24

page 26

page 27

page 30


A hierarchical Bayesian non-asymptotic extreme value model for spatial data

Spatial maps of extreme precipitation are crucial in flood protection. W...

Modelling Extremes of Spatial Aggregates of Precipitation using Conditional Methods

Inference on the extremal behaviour of spatial aggregates of precipitati...

Flexible Marginal Models for Dependent Data

Models for dependent data are distinguished by their targets of inferenc...

Modelling the spatial extent and severity of extreme European windstorms

Windstorms are a primary natural hazard affecting Europe that are common...

A Bayesian latent allocation model for clustering compositional data with application to the Great Barrier Reef

Relative abundance is a common metric to estimate the composition of spe...

Multivariate Functional Data Modeling with Time-varying Clustering

We consider the situation where multivariate functional data has been co...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Statistical models for estimating the frequency and size of flood and severe rainfall events are required by decision makers to construct effective protection measures and by risk analysts to set insurance premiums. Since such extreme events occur rarely at a site of interest, model-based estimates for the behaviour of extremes at a site are usually derived using a small number of observations, inducing high uncertainty. With a view to obtaining more reliable estimates, pooling of information from other sites can be used; this is the basis of regional methods which are widely used for environmental, meteorological or hydrological hazards (Hosking and Wallis, 1993; Casson and Coles, 1999; Sang and Gelfand, 2009; Asadi et al., 2018). Natural questions that arise in this context are what criteria should be used to group sites, and how should the uncertainty in the clustering allocation, given this selected criterion, be reflected in the uncertainty in the estimated site-specific marginal distributions, and what do the estimated clusters look like?

We propose a novel approach for spatial clustering of extremes and the subsequent inference which coherently addresses these questions. Our methodology is motivated by two applications: modelling weather-related insurance claims and flood risk analysis described in Section 2. In both applications, we are interested in the extremal behaviour of a hydrological variable across multiple spatial locations. Each of these problems requires the marginal analysis of extreme values at different locations, whilst accounting for spatial structure both in the marginal distributions and in the dependence of the data from across sites.

Our approach for modelling both marginal and dependence structures is to use statistical models that have been asymptotically justified by extreme value theory (Coles, 2001; Beirlant et al., 2004)

. When considering the extremes of a univariate random variable

, we adopt the peaks-over threshold approach. Exceedances by of a high threshold are then modelled using the generalized Pareto distribution GPD with


where , and and are scale and shape parameters. The value of , interpreted as the limit of (1) as

, gives the exponential distribution, whilst

corresponds to a short-tailed distribution with finite upper end point , and

gives a power-law tail decay, heavier than that of a normal or gamma distribution. This family of distributions arises as the only possible non-degenerate limit for scaled excesses of a continuous random variable as the threshold tends to the upper end point of the distribution

(Pickands, 1975). In practice, we assume that the limit has been sufficiently achieved above a large enough choice for . Other than in hydrology, these models are applied in a range of areas, e.g., climatology (Blanchet and Davison, 2011; Reich et al., 2014) and finance (Chavez-Demoulin et al., 2014; Hilal et al., 2014). We focus on modelling threshold exceedances, but we could have worked with an extremal mixture model (Behrens et al., 2004; MacDonald et al., 2011) comprising separate distributions for observations below and above if the full distribution of was of interest.

When fitting extreme value models over a range of variables (e.g., the same physical variable measured at different sites, or different variables measured at the same time), it is natural to model the tails of the marginal distributions as GPD, with the parameters

potentially changing over variables. A range of methods have been used to cluster the variables. Two broad approaches have emerged: methods that aim purely to find clusters of similarly distributed variables for purposes of interpretation; and methods that aim to pool information over similarly distributed variables to enhance inference efficiency. The first category of methods tend to evaluate extreme value theory summary statistics (e.g., the GPD shape parameter from each site) and apply widely used generic clustering techniques (e.g., k-means or k-medoids

(Kaufman and Rousseeuw, 2005)) to form clusters. Examples of such approaches include Rubio et al. (2018) for clustering different stocks based on extreme losses, and Bernard et al. (2013) and Bador et al. (2015) who explore different levels of pairwise dependence via clustering.

The latest approaches in the second category use hierarchical modelling. First versions of these methods go back to the pooling methods used by the Flood Studies Report (1975) (Institute of Hydrology (Great Britain), 1975) which selected a hydrologically coherent region and assumed a common GPD shape parameter for all sites within the region. The methods evolved to also account for the dependence structure (Coles and Tawn, 1990, 1996). These methods do not account for the uncertainty in the process of identifying the regions/clusters of similar variables. The first hierarchical method which accounted for the cluster uncertainty in the inference for extremes was by Smith and Goodman (2000), later extended to a Bayesian mixture model by Bottolo et al. (2003)

, who define a Bayesian framework to borrow information across multiple types of independent insurance claims in order to reduce uncertainty in the estimates of the parameters of the type-wise GPD. For spatial extreme value problems hierarchical clustering models for the extremes have been proposed by

Carreau et al. (2017) and Reich and Shaby (2019), with both approaches having limitations. The former does not account for dependence over space. Although this feature is somewhat addressed by Reich and Shaby (2019), they model dependence over variables in the same cluster with a restrictive exchangeable parametric extreme value copula, which is likely to be too simplistic and as a consequence bias the marginal inference (Dupuis and Tawn, 2001). Furthermore, the approach has a high computational cost and it fixes the number of clusters.

The existing methods for spatial clustering of extremes focus solely on criteria based on the similarity of the marginal distributions, whereas others only consider dependence features. No methods look at both aspects yet, from a physical perspective, spatial dependence of a process is the key determinant of the marginal distributions being similar. Furthermore, the vast majority of approaches both ignore the clustering uncertainty (both numbers of clusters and the allocation of variables to clusters) and ignore the spatial dependence between variables within a cluster in their marginal inference.

Our proposed approach is the first to address all of these features in a single Bayesian framework and it is applicable to both areal and geostatistical data. It learns about both the number of clusters and their spatial structure with respect to the site-wise distribution of the peaks-over threshold and spatial dependence in the extremes. Unlike Reich and Shaby (2019) we do not attempt to estimate a full model for the spatial dependence structure, instead, similar to Bernard et al. (2013), we account for dependence through a widely used pairwise measure of extremal dependence (Coles et al., 1999). We introduce a statistical model for the extremal dependence measure which incorporates both distance between sites and our belief that sites within the same cluster tend to exhibit a higher degree of dependence than sites in different clusters. Our approach to impose spatial structure on the parameters of the site-wise GPD is similar to Bottolo et al. (2003), but the additional consideration of spatio-temporal dependence makes it a more general and harder problem. Posterior samples are obtained using a reversible jump MCMC algorithm (Green, 1995) which allows: the number of spatial clusters to vary, the analysis of the site-wise marginal tail behaviour, and the derivation of a point estimate for the cluster structure using Bayesian decision theory.

The paper is organized as follows: Section 2 describes the motivating applications and introduces the data; Sections 3 and 4 detail the statistical modelling framework and the inference procedure; there is a simulation study in Section 5 and Section 6 presents a data analysis for the applications; and we conclude with a discussion in Section 7.

2 Motivating examples and data

2.1 Weather-related property insurance claims in Norway

Insurance companies are interested in the distribution of the number of claims they are likely to receive over different areas as a consequence of rainfall and/or snow-melt. Scheel et al. (2013) showed that the upper tail of this distribution is exceptionally difficult to model for individual administrative areas (termed municipalities) in Norway. Rohrbeck et al. (2018) develop an extremal regression model that shows how the largest numbers of insurance claims per weather event in a municipality can be clearly linked to extreme precipitation in that municipality, and that from such a model, marginalizing over the precipitation gives a good model for the marginal distribution of the number of claims.

While their approach achieves good results for three highly populated Norwegian municipalities, its complexity requires a high number of insurance claims in order to obtain reliable estimates. As this criterion is not satisfied by most Norwegian municipalities, there is a need to exploit spatial structure to borrow information across municipalities and instead model the total claims over clusters of municipalities. The question then is what precipitation value to use and how to model that. Although the probability of a claim given an extreme precipitation is likely to change very slowly over space, the distribution of extreme precipitation varies more rapidly due to geographical reasons. Therefore we need to identify clusters of municipalities which have both the same distribution of extreme precipitation and have similar actual values in each extreme event (i.e., strong extremal dependence), so that the average precipitation over the cluster can be used as a covariate for the aggregated claims across the municipalities within the cluster.

We consider precipitation across the 343 municipalities in South Norway. These areal units, shown in Figure 1 left panel, differ substantially in size, ranging from a few through to several hundred square kilometres. The data were produced by the Norwegian Meteorological Institute (

) and provide the daily amount of precipitation (in mm), including both rain and snowfall, between 1997 and 2006. Data for each municipality are obtained by a two-step process. First, point observations from more than 200 measurement stations across Norway are spatially interpolated to a regular grid of 1km

. Then, for each municipality, the precipitation is obtained by a weighted averaging over the grid within the municipality’s boundary, with weights proportional to the population density (Haug et al., 2011). The data exhibit a small number of missing values; 326 of the 343 municipalities have a full record and only six have more than 20 observations missing.

Figure 1: Map of South Norway showing the boundaries of the municipalities (left) and the average daily amount of precipitation (mm) on log-scale between 1997 and 2006 (right).

Figure 1 right panel shows substantial spatial variation in the average daily precipitation across municipalities. The highest average values are recorded along the west coast and the averages decrease typically with increasing easterly coordinates and distance from the coast. All municipalities exhibit seasonality, with the largest average daily precipitation typically in September and October, but with the west coast having higher average precipitation levels in January to March, while these are the driest months, on average, for the municipalities in the south-east.

To focus on the larger events, we explore the exceedances of a threshold, corresponding to the annual 90% quantile for each municipality. For each municipality, the average number of events exceeding the threshold varies across the year, however, the excess values themselves are found not to exhibit seasonality. Similar results are found for all higher thresholds. Thus site-wise peaks over threshold can be considered as identically distributed over time but they do exhibit spatial variation.

2.2 Flood risk analysis for the UK

Figure 2: Locations of the 45 river flow gauges considered in Section 2.2 (left) and the daily average river flow for the Bywell gauge on the River Tyne (right). The solid line in the right panel shows the two-week moving average over the year for Bywell.

Improving river flood risk analysis in the UK is of high priority given that it has experienced several severe and widespread flood events over the past years. For instance, the floods related to Storm Desmond, Storm Eva and Storm Frank in 2015/2016 caused an estimated economic damage of between £1.3–1.9 billion (Environment Agency, 2018). Practitioners are interested in the potential size of future flood events, as well as their spatial extent. Detecting groups of sites which have similar dynamics in terms of extreme river flow levels helps to address this question as we can combine them in the inference to improve statistical efficiency and hence produce estimates that give more reliable extrapolations to rarer events. However, extreme rainfall events in different seasons have different spatial characteristics due to frontal rainfall in summer and convective rainfall in winter; so clusterings may differ between seasons.

Daily mean river flow levels (in ms) for the years 1980 to 2011 for 45 gauges are obtained from the UK’s National River Flow Archive ( and cover northern England and southern Scotland, with the majority being in North West England; see Figure 2 left panel. Observations for several years are missing for two gauges and other stations also exhibit some missing values. The data include the river flow levels for the floods in Cumbria in November 2009. Hydrological distances which account for catchment closeness are available; these provide a better emulation of the spatial dependence than the geographical distances (Asadi et al., 2015).

Figure 2 right panel indicates strong seasonality for one of the gauges, a feature typical of all gauges in the region; the highest average river flow levels are observed for November through to March, while June through to August record the lowest averages. As such, a river flow level which is considered very high in summer may be rather standard in winter. Moreover, the data exhibit strong autocorrelation. Therefore, an extreme weather event may cause extreme river flow levels over consecutive days. The data values vary substantially across gauges; the site-wise daily average ranges from 0.2ms up to 54.9ms. All these aspects, that is seasonality, spatio-temporal dependence and difference in scale, are considered in our analysis in Section 6.

3 Cluster model

3.1 Introduction

Consider sites with spatial locations . For the Norwegian municipalities in Section 2.1,  () refers to the centroid of the -th areal unit (municipality). In a geostatistical setting, is the point location of the -th site; for instance, the latitude and longitude of the -th gauge in Section 2.2. Spatial proximity of any pair of sites is measured via a suitable metric which provides a distance based on the locations and . For each of the sites, we have data for the variable we wish to draw inference on the distribution of its largest values.

The exploratory analysis in Section 2 shows that hydrological processes usually exhibit seasonality and spatio-temporal dependencies. As such, observations may neither be identically distributed nor independent. Firstly, we account for temporal dependence. We use declustering prior to any model fitting, with each of the time series being split into subperiods such that extremes occurring in different subperiods can be assumed to be independent (Ferro and Segers, 2003). We select the same set of subperiods for each site and only use the highest observation per subperiod and site in our analysis. Our methodology can be adapted to accommodate temporal lags between the same event at different sites but, in what follows, for notational simplicity, we present our method under the assumption that extreme events affecting multiple sites jointly have no time lag.

Let denote the time series for site  () after declustering, i.e., and are assumed to be independent for any and all and including ; the record length is taken to be equal across sites for simplicity. If and are close together, corresponding to being small, it is reasonable to assume that the marginal distributions for and should be the same, or very similar, therefore they should have similar GPD parameter values in (1). Spatial dependence leads to the same extreme event being present at different sites, again with the occurrence rate for such events likely to increase if is small, i.e., if is large then the chances of being large increases if and are close, relative to them being further apart. In Section 3.3 we introduce a pairwise extremal dependence measure, , between sites and that captures this feature.

Our approach is to group sites into clusters such that sites in the same cluster have both similar marginal distributions and the spatial dependence is greater between sites in the same cluster than between sites in different clusters. To represent the cluster structure, we introduce latent random variables . Let denote the number of clusters. Then, for each , with corresponding to the -th site being allocated to the -th cluster. Conditional on , we propose separate models for the marginal distributions of and the dependence measures in Sections 3.2 and 3.3 respectively. We later combine these models in Section 4.

3.2 Model for Marginal Clustering

We model the site-wise tail behaviour using the GPD model (1) for threshold exceedances. Our data in Section 2 indicates that are not identically distributed due to seasonality. There are a range of established approaches to handle this: to split the time series into shorter time periods for which the assumption of stationarity seems reasonable; model the threshold and GPD parameters in terms of a set of temporal predictors (Davison and Smith, 1990; Chavez-Demoulin and Davison, 2005); to preprocess the data to remove non-stationary to the overall series, via the Box-Cox transformation (Eastoe and Tawn, 2009). We will discuss the combination of these methods for our approach in more detail in Sections 6 and 7. However, for simplicity, when presenting the methods we assume that are identically distributed.

For site , given a high enough threshold , we model as following a GPD. The selection of the threshold in (1) is highly important. If is too high, estimates for and are based on a small number of data points. Conversely, the distribution of may not be well approximated by a GPD if is too low. In many applications, is selected using graphical diagnostic tools with the mean residual life and threshold stability plots being the most common ones (Coles, 2001). More recent techniques are described by Wadsworth (2016) and Northrop et al. (2017).

For clustering we group sites that have a very similar marginal tail behaviour. Therefore, the peaks-over threshold for all the sites within a cluster are modelled by a GPD with the same scale and shape parameters. The distribution of the peaks-over threshold for the -th site (), conditional on , is thus given by


where and  () denote the cluster-specific scale and shape parameters, therefore if . We denote the parameters of the model given by , where and .

3.3 Model for Spatial Dependence Clustering

There are a number of ways for modelling dependence in multivariate and spatial extremes. Multivariate approaches have included fitting parametric models for multivariate extreme value copula

(Tawn, 1988) and various threshold methods (Ledford and Tawn, 1997; Ferreira and de Haan, 2014; Rootzén et al., 2018). In a spatial context, max-stable processes are the most widely used (Davison et al., 2012; Reich and Shaby, 2012; Asadi et al., 2015). Such processes arise as the limit of component-wise maxima of suitably normalized replications of a spatial process (de Haan, 1984) and multiple parametric representations have been proposed (Davison et al., 2012). However, they have a number of inference and model limitations. Inference issues are mostly overcome by instead using Pareto processes (Ferreira and de Haan, 2014; Dombry and Ribatet, 2015); however these models give either a strong form of extremal dependence (corresponding to in (3)) across all sites or give independence. Alternative methods that allow for spatial dependence that weakens with extremal level also exist (Wadsworth and Tawn, 2012, 2018).

None of these methods has both the sufficient flexibility and ease of implementation to be applied reliably to problems of the dimension of those in Section 2

. Therefore, instead of attempting to model the full joint distribution over the extreme events at the sites, we model the core summary statistics for pairwise extremal dependence. The most widely used extremal dependence measure is the coefficient of asymptotic dependence

(Coles et al., 1999). Formally, for the random variables and  (), where


with and

denoting the cumulative distribution functions of

and , respectively. Thus gives the limit probability of site observing an extreme event conditional on site recording one. There are strong parallels between and the extremogram (Davis and Mikosch, 2009) at distance , with the key difference being that inference for the extremogram pools together all the data from pairs of sites with the same separation under the assumption of stationarity. When , and are termed asymptotically dependent, with increasing values corresponding to stronger extremal dependence, whilst we say that and are asymptotically independent if . Asymptotic dependence can, for instance, arise when the conditions for bivariate regular variation hold (Resnick, 2013), while random variables with a Gaussian copula are asymptotically independent unless they are perfectly dependent.

Before attempting to specify the model for the distribution of , conditional on , consider what properties the expected value of this conditional distribution should possess. Since sites within the same cluster are expected to have a higher probability of joint extreme events, should be larger for than when . We express this assumption via the constraint


Furthermore, the pairwise extremal dependence between the sites and , irrespective of whether the sites are in the same cluster or not, should decrease with increasing distance . We assume that the expectation of decays exponentially with , but with a varying rate across each cluster and with a common, but faster, decay rate between clusters. These properties are reflected in the formulation


where  () is a cluster-specific parameter and . This approach is consistent with  ( and constraint (4), and allows for non-stationarity of the process as and , , can differ. We ensure that by defining parameters such that


Now consider the distribution of . As may differ between two pairs of sites in the same cluster with the same

, due to factors such as topology, we choose a beta distribution model with


which has expectation given by (5) and where

controls the variance of

; higher values of correspond to being less variable. Consequently, the spatial variation of the set of coefficients of asymptotic dependence, conditional on , is defined via the parameters , and , and we denote the dependence parameters given by .

The distribution (7) could be applied to model for all pairs of sites . However, this may not be optimal. For , may vary strongly, depending on whether sites and belong to adjacent clusters or whether there are multiple clusters along the path between the two sites. Such differences can probably not be captured by the single parameter . We thus consider for adjacent pairs of sites only. In case of the sites being point locations, we first derive the Voronoi partition of the study area and then define sites as being adjacent if their corresponding Voronoi cells are adjacent.

4 Bayesian Inference

4.1 Introduction

We use Bayesian inference for the number of clusters,

, the latent variables , and the marginal and dependence structure parameters and using the declustered data . The posterior, and the algorithm to sample from it, are developed in this section. The derivation of the marginal and dependence structure likelihood contributions and , given and , are provided in Section 4.2. Critically, the data that are used to analyze the marginal and dependence structures are different, in that for marginal distributions we use all marginal exceedances of a threshold at all sites, whereas for the dependence model, only ranks of the variables are used. These two forms of the data are only weakly dependent. Furthermore, Genest et al. (1995) and Genest and Segers (2009) have shown that inference for dependence parameters is largely unaffected by marginal parameter estimation. Therefore we use the following approximation to the joint likelihood, the independent likelihood


Section 4.3 presents our priors, including a spatial prior for given and a prior for . Section 4.4 details our algorithm to sample from the posterior distribution, and Section 4.5 outlines the analysis of the posterior samples and the estimation of the underlying cluster structure.

4.2 Likelihood Components

4.2.1 Marginal component

If the peaks-over threshold data were independent over all sites the likelihood function for , conditional on , the data and thresholds ) would be


However, spatial independence is not a valid assumption for our motivating problems since severe weather events usually affect a number of sites (municipalities/gauges). Therefore, the likelihood function in expression (9) corresponds to a misspecified model. Under suitable regularity conditions, Kent (1982) shows that a general theory for the asymptotic distribution of the maximum likelihood estimate based on the misspecified likelihood (9) is


where are the true model parameters, denotes the Fisher information, and refers to the -th derivative. The limiting variance in (10) is different from the classic Fisher information if the model was misspecified, but that if we have no spatial dependence, and then the classic asymptotic result is obtained. Inference under the misspecified model, when there is spatial dependence would underestimate the variance of the estimator

. With respect to our Bayesian framework, this would correspond to credible intervals of the parameters being too narrow.

We follow an approach of Ribatet et al. (2012), who propose an adjustment of the curvature of the likelihood (9) around its mode using the asymptotic behaviour in expression (10). The adjusted likelihood is


where the matrix depends on and is


with denoting the matrix square root. To compute , the evaluation of and matrices uses the observed information matrix and an estimate for the covariance matrix of the score function respectively, both evaluated with . Note, is a block diagonal matrix consisting of lots of blocks, each block corresponds to one cluster (i.e. for the -th block this corresponds to the terms for and ); this feature enables efficient computation of .

Using this adjusted likelihood has a number of key properties: not changing the maximum likelihood estimate relative to using likelihood , suitably inflating the variances to be consistent with (10) when there is spatial dependence, and leaving the inference unchanged from using likelihood in the absence of spatial dependence. We therefore take our likelihood contribution for the marginal parameters to be .

4.2.2 Dependence component

With defined as in expression (3), we assume that there exists a value such that , i.e., it is equal to its limit form, for all and for all pairs of sites . Techniques for the selection of , for a pair , are available (de Haan and de Ronde, 1998; Wan and Davis, 2019). Our data for estimating are derived from exceedances of the 100% quantiles at sites and , as determined by their respective empirical distribution estimates and . First let , be the set of times when there is both an exceedance of the quantile threshold at site and the data for site are available, and let denote the cardinality of this set. Further let


Then it follows that is distributed as


In the absence of knowledge of our model in Section 3.3 for over sites, the estimate for is given by


which, if there were no missing data, is simply the proportion of the exceedances of the 100% for site that also exceed this quantile threshold for site . So estimator (15) is an extension of the standard estimator for . Note that due to the missing data this estimator has the property that does not necessarily equal even though .

By combining (14) and the information about , from our cluster model (7), and then integrating over , we obtain that

follows a beta-binomial distribution of the form


Following the discussion in Section 3.3 this model is assumed to hold only for pairs of sites which are adjacent. We let denote the distinct pairs of sites which are adjacent. Denote the density function of the beta-binomial distribution in (16) by , then for a pair of adjacent sites the likelihood contribution to is

as critically we have two estimates for which contain almost exactly the same information, each of equal value, and so we weight both observations by .

Under an assumption of independence of distinct pairs, the likelihood function for the spatial dependence model is then


This likelihood is misspecified since are not independently distributed; for instance, when there is no missing data, if and are very large, then cannot be small. We could use the Ribatet et al. (2012) adjustment again but we opted against it due to the following reasons. Firstly, in our data examples, the values for are not so close to as that the effect is very strong. Secondly, we are not directly interested in the marginal posterior distributions of and as for our posterior, given and , and are independent. Finally, the observed information matrix which we would require for the curvature adjustment is dense due to the parameter being present in all terms of the likelihood function and of dimension . Therefore, we would have to invert a potentially high-dimensional matrix, which leads to a substantial computational cost, in particular, since we have to compute this matrix many times.

4.3 Priors

To perform Bayesian inference, we specify priors for the number of clusters , the cluster labels , and the model parameters and . Since , we define and set a weakly informative Gamma prior for , .

We wish to impose that clusters are contiguous; otherwise, two sites which are far apart may be grouped together, despite the probability of them jointly facing an extreme event being close to zero. Our prior is similar to Knorr-Held and Raßer (2000) and only gives positive mass to contiguous clusters. The idea is to represent the spatial structure of the clusters via a set of centres , if ; corresponds to being the centre of the -th cluster. Each site is assigned to the closest cluster centre in terms of  (), i.e., we take


To ensure that is well-defined, the site is allocated to the cluster with lowest index if multiple cluster centres have minimum distance to the site. Relationship (18) implies that we can assign a prior for via one for . We choose a uniform prior with

We conclude by assigning priors for the parameters and . For the GPD parameters in (2), a log-normal prior is defined for the scale parameter, , while a normal prior is set for the shape parameter,  (). The priors for the parameters describing the spatial dependence are defined as exponentially distributed. Specifically, and . The exponential prior for

represents a prior preference to small spatial differences in the extremal dependence. To complete the model setup, we specify independent conjugate priors for the hyperparameters:

, , and .

4.4 Implementation

We wish to sample from the posterior distribution defined by the likelihood function (8) and the prior distributions in Sections 4.3. Since the dimension of the parameter space changes with the number of clusters, we use a reversible jump MCMC algorithm (RJMCMC) (Green, 1995). Given a current sample with clusters, we propose one of the following seven moves:


Introduce a new cluster centre with parameters , and .


Remove one of the existing cluster centres . Sites previously allocated to the removed cluster are assigned to the other clusters according to (18).


Move one of the cluster centres to a nearby site which is not a cluster centre and update the cluster labels according to (18).


Update the scale parameters of the GPD in (2).


Update the shape parameters of the GPD in (2).


Update , and in (16).


Update the hyperparameters , and .

The birth and death move are comparable to the split and merge moves defined in other Bayesian clustering approaches, see Bottolo et al. (2003) for instance, but they potentially affect more than one cluster. For the examples in Sections 5 and 6, birth, death and shift are proposed with probability 0.2 each while the remaining four moves are each proposed with probability 0.1.

We briefly describe some features of our implementation and more details are provided in Appendix A. For a birth move, the new cluster centre is uniformly sampled from one of the sites which are not currently cluster centres. In addition to , the index at which to insert

in the vector

is sampled with equal probability. The proposal distributions for the cluster parameters , and are close to the priors in Section 4.3 but some spatial information is incorporated; in particular, the mean of the proposal distribution is set to a average of the current parameter values of the sites allocated to the new cluster, while the variance of the proposal distribution is set to the prior variance. A death move ensures reversibility and one of the existing cluster centres is removed with equal probability. If , the death move is rejected immediately. A shift move reallocates an existing cluster centre to one of the adjacent sites which are not currently cluster centres; this changes neither the cluster parameters nor the indexing of the cluster centres. Note, the matrix in expression (11) has to be updated in the case of a birth, death or shift move; due to being a block diagonal matrix, we only have to update the blocks of the clusters which are affected by the proposal. The model parameters in and are updated via a Metropolis-within-Gibbs algorithm, while we sample from the corresponding full conditional distributions when updating the hyperparameters.

4.5 Analysis of the posterior samples

Interest lies in the marginal distributions of the peaks-over threshold and the underlying spatial cluster structure. Suppose that we generated posterior samples as described in Section 4.4 and let be the cluster label for site in the -th sample (

). To obtain a point estimate for the cluster structure, we first derive the posterior probability of sites

and  () being in the same cluster. This gives with entry


We estimate the cluster structure using the Bayesian decision theoretical approach by Wade and Ghahramani (2018), provided in the mcclust.ext R package (Wade, 2015). They derive a point estimate for the cluster structure based on the variation of information criterion (Meilǎ, 2007) which was originally introduced for comparing clusterings.

The marginal distributions can be estimated by Monte Carlo integration. For instance, the posterior mean estimate for is


where is the cluster-specific scale parameter for cluster

. Similarly, we obtain credible intervals for the GPD scale and shape parameters. We refer to this approach as site-wise Monte Carlo integration (SWMC); this site-wise estimation is usually done in Bayesian cluster analysis.

For illustration and comparison purposes, we consider a second approach to estimate the marginal distributions based on the posterior samples; we later apply it in Section 6. The estimate first imposes the spatial clusters and then estimates the GPD cluster parameters using the likelihood in (11). We fix the clusters as the point estimate above and assume sites within a cluster to have the same scale and shape parameter (as it is done in many regional hazard models). Let denote the estimated number of clusters and let be such that contains the sites allocated to the -th cluster in the point estimate. We then consider the posterior , where represents . Posterior samples are generated using our implementation in Section 4.4 with birth, death and shift being proposed with zero probability. The cluster-specific parameters for cluster  () are then estimated by Monte Carlo integration with

where is the sampled number of parameter values for . Then, and for each site in cluster . We refer to this approach as cluster-wise Monte Carlo integration (CWMC).

Return levels are an important quantity to characterize extremal behaviour. The -year return level is the value which is exceeded on average once every years. For , the -year return level of is


where is the expected number of times exceeds per year. We obtain posterior median return levels similarly to the GPD parameters in (20).

5 Simulation Examples

Figure 3: Map of areal units in Section 4.

We consider the areal units in Figure 3 which correspond to a set of municipalities in south-east Norway. The distance  () is computed using the coordinates of the municipalities’ centroids and by accounting for the earth’s curvature. Distances are standardized such that . The marginal distributions are set to  (); we set and thus obtain 100 threshold excesses per site for analysis. Data for the extremal dependence are simulated independently from the threshold excesses. We fix  () and sample from the beta-binomial distribution (14). Posteriors for the spatial clusters and model parameters herein are based on running the RJMCMC algorithm in Section 4.4 for iterations with the initial number of clusters set to ; the first iterations are discarded as burn-in and then every 100-th sample is stored for analysis.

Our methodology aims at identifying the spatial cluster structure and estimating the marginal distributions. The simulation study illustrates the performance of our approach with respect to these key aspects. Studies 1 and 2 consider the case of all sites having the same marginal distribution and common pairwise extremal dependence structure, i.e. . The studies differ in that the data are independent (strongly correlated) in Study 1 (Study 2). By investigating the case , we check whether our approach tends to the parsimonious estimate or the extreme case , i.e., each site forms its own cluster. Study 3 then considers the case of clusters with the cluster formulation shown in Figure 3, with spatially independent and dependent data in each cluster.

Study 1: The marginal GPD parameters are and  (), and the spatial dependence parameters are and ; we do not specify since . The posterior has for 98% of the sample and the point estimate for the cluster structure allocates all sites to the same cluster. The 90% credible intervals for the site-wise GPD parameters and are and respectively for all sites. To illustrate the benefits of pooling information spatially, we estimate the GPD parameters for the municipality with index (Fredrikstad) only using the simulated threshold excesses for this site via a random walk Metropolis algorithm; this gives 90% credible intervals of and for and respectively. Consequently, our approach correctly identifies the underlying spatial cluster structure and improves estimates for the marginal distributions in this case.

Study 2: For the data as generated in Study 1, we induce strong spatial correlation by matching ranks across sites, i.e., the -th highest observation () occurs simultaneously at all sites. The posterior probability of is 0.92 and we again recover the underlying cluster structure. The credible intervals for and  () are and , and thus wider than in Study 1, as expected given there is less information in the pooled data than in Study 1 due to the strong correlation. Furthermore, the width of the credible intervals is very similar to the ones when estimating and solely based on the observations for site in Study 1; so we obtain almost no additional information due to the strong spatial dependence, apart from the centre of the credible interval being closer to the true value. The latter feature arises as our method pools information across the cluster while only data from site 4 is used to estimate .

Figure 4: Empirical estimate for the coefficients of asymptotic dependence (left) and the posterior probability of the areal units being in the same cluster (right).

Study 3: We simulate threshold excesses independently across sites with the cluster-specific GPD parameters and . The dependence parameters are , and ; Figure 4 left panel shows the empirical estimates  () for . The matrix , given by expression (19), shown in Figure 4 right panel indicates that clusters are present; has the form of a block diagonal matrix with each block only containing entries of high probability. One may conclude that, for instance, the sites 1-5 form a cluster; this would be correct given Figure 3. Indeed, the point estimate for the spatial clusters is identical to the true cluster formulation in Figure 3. The 90% central credible interval for the number of clusters is (2,7), and the true marginal GPD parameters lie within the estimated 90% central credible interval for all sites. We then generate dependent data using a Gaussian copula and the GPD marginals above. Again, the spatial cluster structure is correctly identified and the spatial dependence leads to a higher uncertainty on the site-wise GPD parameters (results not shown).

6 Data analysis

We now apply our methodology to analyze the data described in Section 2. After an appropriate burn-in period, we perform iterations of the RJMCMC sampling scheme in Section 4.4, with every 1000-th sample being stored. Initially, 10% of the sites are cluster centres. The acceptance probabilities for birth (and death) were 0.02 and 0.05 for the data in Sections 6.1 and 6.2 respectively. Our C++ implementation took about 40 minutes per iterations on a standard laptop for the larger data set in Section 6.1.

6.1 Daily precipitation in South Norway

Since an extreme event may be split across two days, we consider the aggregated amount of precipitation over a 48-hour period. Further, we decluster by using only the highest observation per week because consecutive observations are strongly correlated; this gives about 500 observations per municipality. We then select a threshold  () for which approximately follows a GPD. Analysis of the threshold stability plots suggests to set to the empirical 92.5% quantile; this gives about 40 peaks-over threshold per municipality. The empirical measures and  () are derived based on the dependence threshold .

Figure 5: Trace plot (left) and posterior mass function (right) of the number of clusters for the precipitation data in Section 6.1.
Figure 6: Point estimate of the spatial cluster structure for the Norwegian precipitation data in Section 6.1: when using peaks-over threshold and extremal dependence (left) and when only using peaks-over threshold (right). The municipalities Fredrikstad (F), Lillehammer (L), Stavanger (S) and Trondheim (T) are highlighted in the left panel.

Figure 5 left panel shows appropriate convergence and mixing of the sampled Markov chain for the number of clusters. The posterior mean number of clusters is and the posterior mass function of is shown in Figure 5 right panel; the central 80% credible interval is . The point-estimate for the spatial cluster structure comprises 30 clusters which are illustrated in Figure 6 left panel. Most clusters contain between 7 and 16 municipalities; two clusters located in western and southern Norway contain only two municipalities while the cluster in the south-east includes more than 40, mostly small, areal units. The derived spatial clusters agree with known climatology. Clusters along the west coast regularly observe very high amounts of precipitation which are often related to the Gulf Stream. We also find that the drier municipalities in central Norway are grouped together.

To investigate the effect of exploiting information on both marginal tail behaviour and extremal dependence, we consider a second spatial clustering algorithm with likelihood function , i.e., clusters are solely based on threshold exceedances while spatial dependence is ignored, except via the adjustment (11). The sampled number of clusters is higher than for our approach; the posterior mean for is 39 with 80% credible interval (33,45). Figure 6 right panel shows the point-estimate for the spatial clusters; seven clusters contain five or less municipalities while three clusters include more than 20 municipalities. The highest observations per municipality for the most northern cluster in Figure 6 ranges from 57.3mm through to 174.2mm in comparison to (94.5mm, 174.2mm) for our approach. As such, our proposal to incorporate both marginal distributions and extremal dependence seems to produce more realistic clusters than the considered alternative.

Method Fredrikstad Lillehammer Stavanger Trondheim
25 (i) 72 (60, 99) 74 (64, 105) 96  (81, 133) 94 (78, 135)
(ii) 72 (65, 82) 82 (68, 99) 103 (87, 127) 90 (79, 106)
(iii) 69 (63, 75) 81 (68, 97) 99 (83, 130) 91 (80, 108)
100 (i) 87 (68, 143) 81 (66, 130) 125 (90, 191) 112 (87, 188)
(ii) 82 (73, 101) 98 (78, 126) 131 (101, 172) 112 (94, 143)
(iii) 80 (71, 86) 95 (78, 120) 123 (95, 191) 114 (95, 149)
Table 1: Posterior median (central 90% credible interval) of the -year return level in mm for four municipalities with . The considered methods are (i) Individual estimation for each municipality (ii) SWMC and (iii) CWMC.

We conclude the analysis by estimating return levels for the four municipalities highlighted in Figure 6: Fredrikstad (F), Lillehammer (L), Stavanger (S) and Trondheim (T). The selected municipalities differ in both climate and clustering; Fredikstad and Trondheim lie in large clusters while Stavanger and Lillehammer are members of relatively small clusters. Since is the 92.5% quantile, the average number of exceedances per year in (21) is . We consider three approaches to estimate return levels. For the first estimate, we obtain return level estimates individually using only the observed peaks-over threshold of the municipality, i.e., estimates are based on the 40 observed threshold excesses for the municipality. Our other estimates are based on SWMC and CWMC as described in Section 4.5.

Table 1 shows that SWMC and CWMC provide shorter credible intervals for the considered municipalities than method (i); the central 90% credible interval for both method (ii) and (iii) always lies within the credible interval of method (i). The credible intervals obtained by SWMC and CWMC are very similar for Lillehammer and Trondheim; this indicates that our point estimate for the cluster structure groups municipalities which have indeed very similar marginal distributions. We further see that CWMC produces shorter credible intervals than SWMC for Fredrikstad; this arises as the SWMC approach accounts for cluster uncertainty. In the CWMC method, the cluster containing Fredrikstad contains many municipalities and spatial variation in the GPD parameters is oversmoothed due to the cluster structure being fixed; this can be seen in Table 1 since the credible interval for method (iii) does not contain the posterior median derived by method (i). The SWMC approach, on the other hand, performs better since the municipalities within this cluster are regularly split across smaller clusters; this allows for a good exploration of the parameter space. Finally, the CWMC credible interval is similar to method (i) for Stavanger while SWMC gives a narrower credible interval. The cluster containing Stavanger is relatively small in size and an extreme event usually affects most municipalities within the cluster. Due to this strong spatial correlation, we obtain little additional information by pooling information across the fixed cluster. Conversely, our SWMC approach efficiently pools information from a larger set of municipalities because the cluster structure is allowed to change.

6.2 Daily river flow in the UK

The data described in Section 2.2 exhibit a strong seasonal pattern for all gauges. We consider separately the maximum weekly river flow for November-March and May-September for which in each case the assumption of stationarity seems reasonable. The observations in each season are then standardized site-wise to mean 0 and variance 1; while this affects the scale parameter of the GPD in (1), it is well known that this leaves the shape parameter unchanged.

A common threshold across all gauges, , is selected individually for the two seasons, giving between 25 and 40 peaks-over threshold per season for most sites. The threshold in (15) is set to for November-March and for May-September. This gives (25) for most pairs of sites  () over the summer (winter) seasons. Here, the distance between sites is set to their hydrological distance (Section 2) which we scale such that .

Figure 7: Posterior mass functions of the number of clusters for November-March (left) and May-September (right).

The posterior mass functions for the number of clusters for the two seasons in Figure 7 indicate that the number of clusters is higher for May-September than for November-March; the 80% credible intervals are (5,9) and (3,8). This result agrees with known climatology. Extreme river flows in winter in the UK are often caused by extratropical cyclones which affect larger areas; we thus expect larger clusters for November-March than for May-September.

Figure 8: Point estimates for the underlying spatial cluster structure for November-March (top) and May-September (bottom). The left plots show the cluster allocation on the space of hydrological coordinates. A line between two sites indicates that they are considered adjacent and the line width corresponds to the posterior probability of them being in the same cluster. The right plots illustrate the derived clusters with respect to their latitude and longitude coordinates. The solid black lines are the boundaries of the metropolitan and non-metropolitan counties, and the highlighted gauges are Kirkby Stephen (K) and Marple Bridge (M).

Figure 8 left panels show the imposed adjacency structure and the derived seasonal clusters with respect to the hydrological coordinates of the gauges. The width of the lines in the plots further illustrates that some gauges, which are allocated to different clusters in the point estimate, are also often grouped in the same cluster; see, for instance, the central and southern cluster for November-March. We see an interesting match between the point estimates of the underlying cluster structure for the two seasons. The southern cluster for November-March splits into two equally-sized clusters in May-September. Similarly, the other large cluster (green) for the winter season splits into two clusters in summer.

We then consider the geographical locations in Figure 8 right panels for further analysis. Beyond the seasonal pattern of the clusters, it is also interesting to investigate how these compare to the underlying river networks; the estimated clusters are in fact not identical to the river networks. For instance, we have five gauges for the River Tyne in North-East England but only four of these are allocated to the same cluster (the red cluster for the summer season). Finally, we note that some clusters are split along administrative boundaries, in particular, in winter.

Model Kirkby Stephen Marple Bridge
Winter Summer Winter Summer
100 (i) 107 (85, 164) 45 (36, 70) 70 (55, 111) 34 (26, 52)
(ii) 98 (84, 131) 45 (37, 57) 65 (55, 92) 36 (32, 43)
(iii) 110 (86, 153) 49 (43, 60) 60 (55, 69) 36 (32, 43)
500 (i) 138 (99, 269) 56 (41, 108) 95 (67, 196) 45 (31, 90)
(ii) 128 (103, 200) 57 (43, 80) 84 (67, 148) 47 (39, 64)
(iii) 157 (110, 266) 63 (52, 85) 77 (67, 94) 47 (37, 66)
Table 2: Posterior median (central 90% credible interval) of the -year return level in m/s for two gauges with . The considered methods are (i) Individual estimation for each municipality (ii) SWMC and (iii) CWMC.

To conclude, we estimate return levels for for the two gauges highlighted in Figure 8 top-right panel: Kirkby Stephen (K) and Marple Bridge (M). Kirkby Stephen was selected because multiple of its adjacent sites are allocated to different clusters, while Maple Bridge lies centrally in the southern cluster in winter and at the edge of the most southern cluster in the summer season. We consider the same inference approaches as in Section 6.1. Table 2 shows that SWMC again produces narrower credible intervals than model (i). Furthermore, the weaknesses of the CWMC approach described in Section 6.1 affect the estimates strongly, in particular, for Marple Bridge and winter.

7 Discussion

We introduced a Bayesian clustering approach which groups geographical sites based on their marginal tail behaviour and their extremal dependence. The likelihood for the peaks-over threshold accounts for the spatial dependence usually found in hydrological applications. Our model for the extremal dependence postulates that sites within the same cluster exhibit higher pairwise extremal dependence than sites in different clusters, and that the degree of dependence decreases with an increasing distance between sites. Clusters are represented by their centre, which imposes them to be contiguous and leads to a good computational performance. Samples from the posterior distribution were obtained using a reversible jump MCMC algorithm. A point-estimate for the spatial cluster structure was derived from the pairwise posterior probabilities of sites being in the same cluster using Bayesian decision theory.

We applied our approach to analyze precipitation levels across South Norway and the derived clusters agree with climatology; these clusters will be used to analyze the association between weather events and property insurance claims. The cluster approach was further applied to river flow data in the UK. We found that clusters are not identical to the river networks and that the spatial extent of extreme river flow levels is larger over winter than summer. The results also showed that our approach efficiently pools spatial information to impoove return level estimates.

There are various ways to extend the model presented in this paper. Firstly, we model extremal dependence of a pair of adjacent sites in different clusters via the single parameter . Instead of being constant in (5), may be defined as a function of the cluster-specific parameters. Consider a pair of adjacent sites with and , . High values of and may then imply a high value for . Another possible extension is the consideration of temporal variations in the distribution of the peaks-over threshold and/or the extremal dependence. Such an extension should then also allow for a potential change of the spatial cluster structure across seasons; the results for the UK river flow data indicate the presence of such a temporal variation in the number of clusters.


Christian Rohrbeck is beneficiary of an AXA Research Fund postdoctoral grant. We gratefully acknowledge funding from the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) project "Statistical Estimation and Detection of Extreme Hot Spots with Environmental and Ecological Applications" (Award No. OSR-2017-CRG6-3434.02). We thank Rob Lamb, Raphël Huser and Daniel Cooley for helpful comments and suggestions, and Ida Scheel and Ross Towe for providing access to the Norwegian rainfall data and UK river flow data respectively.

Appendix A Details of the reversible jump MCMC algorithm

Birth and death

Suppose we currently have clusters with centres and parameters . In a birth, we first uniformly sample a new cluster centre together with the index at which to insert it in the set ; the probabilities are and  (). For the new set of cluster centres, , we then derive the cluster labels according to (18).

To complete the proposal, we sample the parameters of the new cluster. The mean of each proposal distribution is set to the current average value of that parameter across the sites allocated to the new cluster, while the variance of the proposal is set to the one of the corresponding prior. Let denote the sites allocated to the new cluster. The proposal

is sampled from a normal distribution with

The parameter

is sampled from a log-normal distribution with

The proposal varies with respect to the current number of clusters. If , is sampled with