## 1 Introduction

Large georeferenced datasets, often based on remote sensing or reanalysis of smaller-sized observation data, have become abundant in the domains of climate, environment, and ecology. However, the spatial and temporal resolution in such datasets may not be fine enough for certain purposes. Moreover, gaps may arise in such datasets when sensors are defective or cannot provide useful data, for instance, due to cloud occlusion or orbital characteristics when using satellite-based instruments.
In this setting of gap filling and prediction of partially observed data in the Earth and atmospheric sciences (*e.g.*, Yuan2011; Henn2013; Mariethoz2012; Wang2012; C2017; Xing2017; Yin2017; Padhee2019), we aim to provide probabilistic predictions of summary statistics of the data process for a space-time domain, and in particular to focus on episodes with simultaneous extreme values of a continuous variable such as temperature. By taking into account the spatial and temporal dependence, we seek inference for summary functionals of space-time hot-spots, such as the minimum value of the process over a spatial region during a certain period of time.

Our work was motivated by the 2019 Extreme Value Analysis (EVA) data challenge, in which we participated with our team named *BeatTheHeat*. Its goal was to predict the minimum of anomalies of sea surface temperature (SST) of the Red Sea over regions where data have been artificially masked; predictions had to be provided for space-time cylinders with a fixed spatial radius (in km) and a duration of seven days.
The Red Sea is home to a large number of coral and fish species, some of them endemic (spalding2001world; www.fishbase.in).
This rich ecosystem is being threatened by global warming.
The 2014 Intergovernmental Panel on Climate Change report indicates that the average SST of the Indian Ocean has increased over the period 1950–2009 (HoeghGuldberg2014climate). Summertime SSTs for the past decade have remained above the historical mean on average (cantin2010ocean).
Coral reefs are particularly sensitive to even modest increases in background seawater temperature; a continued increase above the historical maximum SST can result in substantial coral damage (cantin2010ocean).

There is a need to understand the spatio-temporal dynamics governing SST, and in particular its extremes, in the Red Sea. In this work, we consider temperature anomalies constructed by subtracting a spatio-temporal mean from gridded daily SST observations provided by the Operational SST and Sea Ice Analysis (OSTIA; donlon2012operational), a satellite-based data system designed for numerical weather prediction. The data encompass 31 years (1985–2015) of daily values on a fine grid of locations over the Red Sea.

Geostatistics provides a wide range of models to describe and predict spatio-temporal phenomena. Historical and current developments are usually centered around Gaussian processes, which have appealing theoretical and computational properties (cressie2015statistics). Extreme-value analysis provides more specific tools to model extremes in space and time, although they usually pose a greater computational challenge. In this work, we combine both disciplines to predict high-temperature anomalies over space and time in the Red Sea.

Due to the large size of the dataset, care is needed to develop inference procedures that scale well. We propose to model the marginal effects and the dependence structure separately following a two-step approach. First, we fit a model for marginal distributions and use it to transform data to a common standard Gaussian scale. Then, stationary Gaussian process models are used locally in time to capture spatio-temporal dependence.

To describe non-stationary marginal anomalies at each site in the first step, we start by removing long-term temporal trends at a yearly scale, assumed to be stationary over space, by correcting data using empirical means and variances. Subsequently, we assume temporal stationarity, and we seek to improve the upper tail representation through a model for data above a stationary threshold close to a high global quantile. We estimate a tail model with space-varying parameters based on a Poisson regression for the rate of threshold exceedances, and on a generalized Pareto (GP) model for the excesses above the threshold. For these marginal models, we compare three approaches that allow generating relatively smooth parameter surfaces. First, we consider the data without marginal transformation as a benchmark to assess the added value of correcting the tail. Second, we rely on the generalized additive modeling (GAM) framework (Wood2003; Wood2006), where we use the latitude, longitude, and distance to the coastline as covariates to describe the spatial pattern in the tails. This framework allows for flexible modeling of covariate effects and has previously been used in univariate and multivariate extreme-value settings to capture non-stationarity in environmental data (Pauli2001; chavez_davison; Jonathan2014; Mhalla_DC_CD). Third, we implement spatial pooling of data using a nearest-neighbor (NN) approach to infer the tail behavior at each location.

SSTs in the Red Sea are known to change smoothly in space and time (chaidez2017decadal). Therefore, in the second step of our approach, we describe the space-time dependence between Gaussian-transformed SST anomalies by a stationary spatio-temporal Gaussian model characterized by a stationary Matérn covariance structure in space and first-order autoregression in time. For fast likelihood-based inference and probabilistic prediction with large gridded datasets, we use the stochastic partial differential equations (SPDE) approach. It provides a numerically convenient Gauss–Markov approximation to the Matérn covariance (lindgren.al.2011), and we combine it with the computationally efficient integrated nested Laplace approximation (INLA; rue2009approximate; Rue.al.2017; Opitz.2017b; Opitz.al.2018; castro2019spliced) for Bayesian inference.

A numerical comparison of the performances of our three models with different marginal pretransformations (none, GAM, NN) is made using the tail-weighted continuous ranked probability score (twCRPS; Gneiting2011). Our models perform better than all other contributions to the EVA challenge, and the tail correction provides a significant improvement over models without marginal transformation.

The remainder of the paper is organized as follows. Section 2 presents the data through exploratory analyses and graphical illustrations. Section 3 describes our modeling approaches, for which results and prediction performances are discussed in Section 4. Conclusions and an outlook towards further improvements of our model are given in Section 5.

## 2 Red Sea surface temperature data

We describe the main features of the SST anomalies and provide exploratory graphical support for our modeling choices.

### 2.1 Description of data and space-time hotspots

The Operational SST and Sea Ice Analysis (OSTIA) system produces daily SST data in the Red Sea at a spatial resolution of 1/20°. We consider the period between January 1985 and December 2015. We denote by $Y(s,t)$ the Red Sea surface temperature at location $s$ and time $t$, where $s \in \mathcal{S}$ (the set of grid locations) and $t = 1, \ldots, T$ (days). Our analysis focuses on anomalies

$$A(s,t) = Y(s,t) - \mu(s,t),$$

where $\mu(s,t)$ is a space-time mean surface; see huser2020eva for details on its estimation based on averages for each pixel-month combination. Anomalies remove recurrent local climatic trends and allow for easier interpretation of short-term variability and large-scale anomalous behavior with respect to what we observe on average at a given site and time of the year. Large-scale extreme temperatures are likely to negatively affect the environmental and ecological systems in the Red Sea, and we are interested in measuring the impact of extreme anomalies occurring jointly in a specific space-time region.
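As an illustration, anomaly construction from pixel-month averages can be sketched as follows. This is a simplified version: the array layout and the use of plain within-sample monthly means are assumptions here, and the actual mean surface of huser2020eva is more elaborate.

```python
import numpy as np

def anomalies(sst, months):
    """Subtract a pixel-month climatology from daily SST fields.

    sst    : array of shape (T, n_sites) of daily SST values
    months : array of shape (T,) with the month (1-12) of each day
    Returns A(s, t) = Y(s, t) - mu(s, t), where mu is the per-pixel
    average over all days falling in the same calendar month.
    """
    anom = np.empty_like(sst, dtype=float)
    for m in range(1, 13):
        idx = months == m
        if idx.any():
            # per-pixel mean over all days of month m, broadcast over time
            anom[idx] = sst[idx] - sst[idx].mean(axis=0)
    return anom
```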

While the original dataset does not contain any gaps with missing data, such gaps have been inserted artificially in the dataset to compare predictive approaches; see huser2020eva. During each month, the data for a randomly selected subdomain of the Red Sea were removed. Using the remaining observations, our goal is to predict a summary functional of space-time subregions contained in these gaps, *i.e.*, where no observations are available.
Specifically, for each pair $(s,t)$, we consider the following cylindrical neighborhood,

$$\mathcal{C}(s,t) = B(s) \times \{t-3, \ldots, t+3\},$$

where $B(s)$ is a ball centered at $s$ with a fixed radius (in km), so that each cylinder covers seven days. We are interested in predicting the distribution of summary functionals describing *extreme space-time hot-spots*, by using the minimum summary:

$$X(s,t) = \min\{A(s',t') : (s',t') \in \mathcal{C}(s,t)\}.$$

For the EVA competition, the validation dataset of summaries to be predicted consists of a set of distinct time values and, for each of these time points, a set of spatial locations. Our aim is to successfully predict these summaries with a focus on extreme quantiles of $X(s,t)$, which characterize extreme hot-spots of SST anomalies.
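A minimal sketch of the cylinder summary $X(s,t)$, assuming a planar coordinate system in km; the 50 km default radius below is a placeholder for illustration, not a value stated here:

```python
import numpy as np

def min_over_cylinder(anom, coords_km, s_idx, t, radius_km=50.0, half_window=3):
    """Minimum of the anomaly field over a space-time cylinder.

    anom      : (T, n_sites) anomaly array
    coords_km : (n_sites, 2) site coordinates in km
    s_idx     : index of the cylinder's central site
    t         : central day index
    radius_km : spatial radius of the ball B(s) (placeholder default)
    The +/- half_window days give the cylinder B(s) x {t-3, ..., t+3}.
    """
    d = np.linalg.norm(coords_km - coords_km[s_idx], axis=1)
    in_ball = d <= radius_km
    t0, t1 = max(t - half_window, 0), min(t + half_window, anom.shape[0] - 1)
    return anom[t0:t1 + 1, in_ball].min()
```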

The Red Sea spans a relatively large region on the globe, such that it is not straightforward to choose a sound 2D coordinate system for models that take spatial distances into account. While 1 degree of latitude corresponds to approximately 111 km everywhere, 1 degree of longitude corresponds to fewer kilometers at the northern limit of the Red Sea than at its southern limit, owing to the cosine of the latitude. We therefore switch to a metric system with km unit to avoid artificial anisotropy effects as best as possible, by multiplying latitude degree values by the latitude conversion factor and longitude degree values by the average of the longitude conversion factors at the two limits.
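The conversion can be sketched as follows; 111 km per degree of latitude is a standard approximation, and taking the domain limits as the minimum and maximum observed latitudes is an assumption of this sketch:

```python
import numpy as np

def to_km_coords(lat_deg, lon_deg):
    """Convert lat/lon degrees to planar km coordinates.

    One degree of latitude is ~111 km everywhere; one degree of longitude
    is ~111 * cos(latitude) km.  To keep a single planar system we scale
    longitudes by the average of the conversion factors at the southern
    and northern limits of the domain.
    """
    km_per_lat = 111.0
    south, north = np.min(lat_deg), np.max(lat_deg)
    km_per_lon = km_per_lat * 0.5 * (np.cos(np.radians(south))
                                     + np.cos(np.radians(north)))
    return np.column_stack((np.asarray(lon_deg) * km_per_lon,
                            np.asarray(lat_deg) * km_per_lat))
```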

### 2.2 Exploratory analysis

Despite the preliminary removal of mean trends according to location and month, strong nonstationarity remains in the data , particularly in the upper tail. To illustrate this, Figure 1 shows maps of annual average anomalies in the Red Sea for three different years using the same colour scale, highlighting the fact that average anomalies vary strongly between years. We can also see a mild spatial effect in the 2010 map, but overall, the temporal effect seems to drive the nonstationarity observed in the average anomalies.

To further illustrate the behavior of SST anomalies through years, we compute the annual average anomalies along with their standard deviations for every location in each year. These location and scale statistics can be treated as functional data over the years (one function per location), and displayed using the functional boxplots (sun2011functional) in Figure 2. To highlight the effect of the years at each location, we also plot the overall statistics by year (*i.e.*, average and standard deviation over all locations for every fixed year). We can see that there is an upward annual trend over the averages, and a mild downward annual trend affecting the location-wise standard deviations of the SST anomalies.

To unveil spatial nonstationarity in the tail of the SST anomalies, we characterize the distribution of threshold exceedances, defined as those SST anomalies exceeding a fixed high threshold (corresponding to a small overall exceedance probability) that is also used later for tail modeling.
Figure 3 shows the location-wise empirical range of exceedances (*i.e.*, the difference between the maximum and minimum values) as well as the mean exceedance value. We can see a clear North-to-South pattern in both maps, consistent with a strong spatial effect in the upper tail of the SST anomalies.

We conclude this section by studying bivariate summaries of extremal dependence. Specifically, we analyze two well-known pairwise measures of tail dependence based on conditional exceedance probabilities, namely

$$\chi_u(s_1, s_2) = \Pr\{F_{s_1}(A(s_1, t)) > u \mid F_{s_2}(A(s_2, t)) > u\}, \qquad
\bar{\chi}_u(s_1, s_2) = \frac{2 \log \Pr\{F_{s_1}(A(s_1, t)) > u\}}{\log \Pr\{F_{s_1}(A(s_1, t)) > u,\, F_{s_2}(A(s_2, t)) > u\}} - 1,$$

where $F_s$ is the distribution function at location $s$ and $A(s, \cdot)$ is the time series at location $s$.
Together, $\chi_u$ and $\bar{\chi}_u$ summarize the strength of the spatial tail dependence: a positive limit $\chi = \lim_{u \to 1} \chi_u > 0$ characterizes *asymptotic dependence*, while $\chi = 0$ may arise with $\bar{\chi} = \lim_{u \to 1} \bar{\chi}_u < 1$ in the case of *asymptotic independence*. Figure 4 displays bootstrap estimates and confidence bands for $\chi_u$ and $\bar{\chi}_u$, where we fix $u$ at three high probabilities and consider pairs of locations separated by different spatial lags. For computational reasons and due to the strong spatial dependence in the data, the estimation is conducted on a subset of the available locations, obtained by regular sampling of 1558 locations.
The bootstrap procedure relies on samples drawn by resampling from the set of pairs of locations. The resulting estimates of $\chi_u$ point towards asymptotic independence, as they strongly decrease with larger spatial lags and higher probabilities $u$. The estimates of $\bar{\chi}_u$ remain clearly below 1 for all positive distances, which confirms asymptotic independence in the data. They show significantly lower values for higher probabilities, which indicates particularly fast joint tail decay rates. Moreover, this analysis has been conducted without removing temporal trends in the margins, such that even stronger evidence of asymptotic independence can be expected after removing such trends.
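Empirical versions of $\chi_u$ and $\bar{\chi}_u$ can be sketched with rank transforms as follows; this is a simplified estimator for one pair of series, and the paper's bootstrap confidence bands are not reproduced:

```python
import numpy as np

def _ranks01(x):
    """Rank transform to (0, 1), a proxy for the unknown margin F."""
    n = len(x)
    return (np.argsort(np.argsort(x)) + 1) / (n + 1)

def chi_u(x1, x2, u):
    """Empirical chi_u = P(F1(X1) > u | F2(X2) > u)."""
    u1, u2 = _ranks01(x1), _ranks01(x2)
    exc2 = u2 > u
    if not exc2.any():
        return np.nan
    return np.mean(u1[exc2] > u)

def chibar_u(x1, x2, u):
    """Empirical chibar_u = 2 log P(U > u) / log P(U1 > u, U2 > u) - 1.

    After the rank transform the margins are (approximately) uniform,
    so P(U > u) = 1 - u.  Limits below 1 as u -> 1 indicate asymptotic
    independence.
    """
    u1, u2 = _ranks01(x1), _ranks01(x2)
    p_joint = np.mean((u1 > u) & (u2 > u))
    if p_joint == 0:
        return np.nan
    return 2 * np.log(1 - u) / np.log(p_joint) - 1
```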

## 3 Space-time predictive modeling

In this section, we describe our two-step approach to model trends in marginal distributions and local space-time dependence.

### 3.1 Marginal modeling

We consider three approaches to pretransforming data before fitting local Gaussian models.

#### 3.1.1 Model 1: no marginal transformations

As a reference model allowing us to assess the utility of marginal pretransformations, we implement an approach where the local Gaussian dependence model is fitted to raw anomalies without preliminary marginal transformations.

#### 3.1.2 Model 2: generalized additive regression

As discussed in Section 2.2, the SST anomalies exhibit a residual temporal trend in the mean and a spatially non-stationary pattern in the tails. We first remove the long-term time trends observed in Figure 2 by fitting a Gaussian location-scale regression model. Recall that we denote by $A(s,t)$ the SST anomaly at location $s$ and time $t$. Then, we assume the model structure

$$A(s,t) = \mu_{y(t)} + \sigma_{y(t)}\,\varepsilon(s,t), \qquad \varepsilon(s,t) \sim \mathcal{N}(0,1), \tag{1}$$

where $y(t)$ denotes the year of day $t$, and $\mu_y$ and $\sigma_y$ are the average mean and standard deviation of the year $y$. Owing to the large number of observations in the dataset as well as the spatial pooling of the data, this simple model specification should be able to capture the observed temporal non-stationarity while maintaining the inherent spatio-temporal structure of the large anomalies that we aim to predict.
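A moment-based sketch of the yearly standardization implied by (1): the paper fits a Gaussian location-scale regression, whereas plain empirical yearly means and standard deviations (pooled over space) are used here for illustration.

```python
import numpy as np

def standardize_by_year(anom, years):
    """Standardize anomalies by empirical yearly location and scale.

    anom  : (T, n_sites) anomaly array
    years : (T,) array with the year of each day
    Returns the standardized anomalies (A - mu_year) / sigma_year and a
    dict of fitted (mu, sigma) per year.
    """
    out = np.empty_like(anom, dtype=float)
    params = {}
    for y in np.unique(years):
        idx = years == y
        mu, sd = anom[idx].mean(), anom[idx].std()
        params[int(y)] = (mu, sd)
        out[idx] = (anom[idx] - mu) / sd
    return out, params
```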

To reduce the computational cost of fitting the model in (1), we use a sub-sampling strategy by randomly selecting one site out of 50 and keeping all the observations over time. Our approach is sensible since the original SST anomalies were obtained using a spatial multi-scale optimal interpolation scheme (donlon2012operational).

As the interest of this work lies in the prediction of jointly large anomalies, we propose to improve the tail modeling of the standardized anomalies. More precisely, the SST anomalies are normalized, in a first step, to a standard Gaussian scale using the fitted model (1), *i.e.*,

$$\widetilde{A}(s,t) = \frac{A(s,t) - \hat{\mu}_{y(t)}}{\hat{\sigma}_{y(t)}}, \tag{2}$$

where the $\widetilde{A}(s,t)$ are now supposed (approximately) stationary over time. Then, in a second step, we refine the model in the tails by considering a spatial generalized Pareto (GP) model for the exceedances of a high threshold $u$. Specifically, we assume a generalized additive model (GAM) where the GP scale and shape parameters are described by

$$\log \sigma(s) = \beta_0^{(\sigma)} + f^{(\sigma)}(\mathrm{lat}_s, \mathrm{lon}_s) + h^{(\sigma)}(d_s), \qquad
\xi(s) = \beta_0^{(\xi)} + f^{(\xi)}(\mathrm{lat}_s, \mathrm{lon}_s) + h^{(\xi)}(d_s), \tag{3}$$

where $(\mathrm{lat}_s, \mathrm{lon}_s)$ are the latitude and longitude of the site $s$, while $d_s$ is its distance to the coast (in meters). Within the GAM framework, the unknown functions $f$ and $h$ are represented using reduced rank smoothing splines, and more specifically the reduced rank isotropic thin plate splines (Wood_book2017, Chapter 4). The smooth interaction terms $f^{(\sigma)}$ and $f^{(\xi)}$ are constructed using the tensor product construction, where the main effects of lat and lon are excluded (Wood2006). Using their respective fixed basis expansions, estimation of the unknown smooth functions boils down to inference on the coefficients of their model matrices. In other words, each smooth term can be written as $f = \mathbf{X}\boldsymbol{\beta}$, where $\mathbf{X}$ is the model matrix constructed by evaluating the (known) basis functions at the observed values of the covariates, and $\boldsymbol{\beta}$ the vector of its corresponding coefficients to be estimated. To avoid overfitting issues, each smooth function is associated with a smoothing penalty term reflecting its curvature and defined as $\boldsymbol{\beta}^\top \mathbf{S} \boldsymbol{\beta}$, where $\mathbf{S}$ is a known positive semi-definite matrix that depends on the basis functions. In the special setting of tensor products, the wiggliness of the smooth surface is controlled by two smoothing parameters associated with each marginal direction. Then, given a set of smoothing parameters $\boldsymbol{\lambda} = (\lambda_1, \lambda_2, \ldots)$, the estimated model coefficients are obtained by maximum penalized likelihood,

$$\hat{\boldsymbol{\beta}} = \arg\max_{\boldsymbol{\beta}} \Big\{ \ell(\boldsymbol{\beta}) - \frac{1}{2} \sum_k \lambda_k\, \boldsymbol{\beta}^\top \mathbf{S}_k \boldsymbol{\beta} \Big\}, \tag{4}$$

where $\ell$ is the generalized Pareto (GP) log-likelihood. The smoothing parameters are estimated iteratively using the restricted marginal likelihood approach (Wood2010; Wood2016). Precisely, for an intermediate estimate $\hat{\boldsymbol{\beta}}$ obtained through a Newton step based on (4) and for a fixed $\boldsymbol{\lambda}$, Wood2016 propose to adopt a Bayesian view in which $\hat{\boldsymbol{\beta}}$ is a posterior mode, and the estimate of the smoothing parameters is updated by maximizing a Laplace approximation of the log marginal likelihood of the GP model. Uncertainty measures for the model coefficient estimates thus include the uncertainty associated with the smoothing parameter estimation; see Wood2016 for details. This fitting procedure is implemented in the mgcv package (Wood_book2017) available in the R statistical software, and its application in an extreme-value setting is illustrated in Youngman2019.

Using an *ad hoc* trial-and-error procedure, we choose a fixed threshold $u$ for all sites and all time points; this value corresponds roughly to the 78.3% empirical quantile of the transformed data in (2). To cope with the computational demands of fitting the GAM model in (3) to all threshold exceedances, we keep all the sites but randomly select one excess out of ten.
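Fitting the GP model to the excesses of a single sample can be sketched with scipy; this is a plain maximum-likelihood fit without the GAM covariate structure of (3):

```python
import numpy as np
from scipy.stats import genpareto

def fit_gp_tail(z, u):
    """Fit a generalized Pareto distribution to the excesses z - u, z > u.

    Returns (xi, sigma): shape and scale of the GP model for threshold
    excesses, via maximum likelihood (location fixed at 0).
    """
    excesses = z[z > u] - u
    xi, _, sigma = genpareto.fit(excesses, floc=0.0)
    return xi, sigma
```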

Finally, to compensate for the choice of a stationary threshold, we characterize the spatial behavior of the frequency of extreme SST anomalies by modeling the exceedance probability $p(s) = \Pr\{\widetilde{A}(s,t) > u\}$. More specifically, we model the number of exceedances at each site using a Poisson regression model with a spatially varying intensity following the same specification as in (3).

#### 3.1.3 Model 3: Spatial nearest-neighbour modeling

To compare with the modeling of the tail behavior through smooth covariate effects described in Section 3.1.2, we propose an alternative local modeling approach. It exploits the large number of observations and captures the spatial pattern in the large anomalies of the normalized observations by modeling tail non-stationarity separately for each pixel. Therefore, we pool together the values of the nearest-neighbor pixels and of the pixel itself. Based on this sample, we then calculate the pixel-specific likelihood estimates of the exceedance probability $p(s)$ and of the generalized Pareto parameters $\sigma(s)$ and $\xi(s)$.
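A sketch of the nearest-neighbor pooling; only the exceedance probability is estimated here (the GP parameters would be fitted on the same pooled excesses), and the number of neighbors `k` is a free choice of this illustration:

```python
import numpy as np

def nn_tail_estimates(z, coords_km, k, u):
    """Nearest-neighbour pooling of tail estimates (Model 3 sketch).

    For each pixel, pool its own time series with those of its k nearest
    pixels and estimate the exceedance probability above threshold u from
    the pooled sample.
    """
    n_sites = coords_km.shape[0]
    p_hat = np.empty(n_sites)
    for i in range(n_sites):
        d = np.linalg.norm(coords_km - coords_km[i], axis=1)
        nbrs = np.argsort(d)[:k + 1]          # pixel itself + k neighbours
        pooled = z[:, nbrs].ravel()
        p_hat[i] = np.mean(pooled > u)
    return p_hat
```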

### 3.2 Local dependence modeling

For predicting nonlinear cluster functionals such as minima or maxima, it is important to provide an appropriate model for all multivariate distributions of the space-time process. In particular, relatively fast and robust geostatistical kriging approaches using only the mean and the variogram (cressie2015statistics) are not suitable. Moreover, the estimation of a dependence model for the full dataset may be impeded by its sheer size, as in the case of the Red Sea data with many millions of observations. Instead, we propose to use a local dependence model for subperiods, including the space-time sets over which we aim to predict cluster functionals. We propose using a stationary Gaussian space-time dependence model for such subperiods, applied either directly to the anomalies $A(s,t)$, or after using a marginal model to transform margins to the standard Gaussian distribution. We then combine a latent variable approach with Gauss–Markov dependence structures in a Bayesian framework to cope with the high space-time resolution and to provide probabilistic predictions for cluster functionals.

As shown by Figure 4, there is strong evidence in favor of asymptotic independence for the Red Sea SST data. Therefore, classical limit models of dependence from extreme-value theory (max-stable processes or generalized Pareto processes, see Davison.Padoan.Ribatet.2012; Ferreira.deHaan.2014; Thibaud.Opitz.2016) would be inappropriate, in addition to being not easily tractable for prediction with large datasets. Gaussian dependence has already proved to be flexible for tails in some cases (Bortot.Tawn.2000).

A useful choice for modeling spatio-temporal dependence is to link spatial fields through temporal auto-correlation,
and in particular first-order autoregression, which leads to sparse spatio-temporal precision matrices (*i.e.*, inverse covariance matrices) in case of sparse spatial precision matrices.
We consider the following stationary space-time process, defined for discrete time with regular time steps:

$$X(s,t) = \mu + Z(s,t), \qquad Z(s,t) = a\,Z(s,t-1) + \sqrt{1-a^2}\;W_t(s), \qquad |a| < 1, \tag{5}$$

where the $W_t(s)$, $t = 1, 2, \ldots$, are independent centered Gaussian random fields with Matérn covariance and $\mu$ is the mean parameter. The space-time precision matrix for the Cartesian product of a collection of sites and times corresponds to the Kronecker product of the corresponding purely spatial (Matérn) and purely temporal (AR(1)) precision matrices. In our local models, we use nine-day windows, where we have added one day before and after the periods of seven days' length over which we seek to predict cluster functionals such as minima. Using larger temporal buffers around the prediction period would be possible and could further improve the prediction performance in cases where the conditional independence of the prediction period and more distant time points, given the adjacent days, as implied by first-order autoregression, does not hold. For reasons of computational cost, we do not explore this possibility in our data application.
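The separable precision structure can be sketched as follows; `Q_space` stands for any sparse spatial precision matrix, e.g. from an SPDE discretization, and the AR(1) innovations are scaled so that the marginal variance is 1:

```python
import numpy as np
import scipy.sparse as sp

def ar1_precision(n, a):
    """Sparse precision matrix of a stationary AR(1) process with
    coefficient a and unit marginal variance (tridiagonal structure)."""
    main = np.full(n, 1.0 + a**2)
    main[0] = main[-1] = 1.0
    Q = sp.diags([np.full(n - 1, -a), main, np.full(n - 1, -a)],
                 offsets=[-1, 0, 1]) / (1.0 - a**2)
    return Q.tocsc()

def space_time_precision(Q_space, n_time, a):
    """Separable space-time precision: Kronecker product of the temporal
    AR(1) precision and the spatial precision matrix."""
    return sp.kron(ar1_precision(n_time, a), Q_space, format="csc")
```

The Kronecker product keeps the joint precision sparse whenever the spatial precision is sparse, which is the key computational benefit mentioned above.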

Using the stochastic partial differential equation (SPDE) approach developed by lindgren.al.2011, we can work with numerically convenient sparse precision matrices that approximate the Matérn covariance function. We shortly recall this framework and refer to Krainski.al.2018 for details. The SPDE is given by

$$(\kappa^2 - \Delta)^{\alpha/2}\, \tau Z(s) = W(s), \tag{6}$$

with the Laplace operator $\Delta$ and a standard Gaussian white noise process $W(s)$. Its unique stationary solution is a zero-mean Gaussian process with Matérn covariance function with shape $\nu = \alpha - d/2$, fixed to $\nu = 1$ (*i.e.*, $\alpha = 2$ in dimension $d = 2$) in the following. The marginal variance is $\sigma^2 = \Gamma(\nu)/\{\Gamma(\alpha)(4\pi)^{d/2}\kappa^{2\nu}\tau^2\}$, and the *empirical range* parameter is $r = \sqrt{8\nu}/\kappa$, indicating the distance at which we observe a correlation of approximately $0.1$. With a finite study region, appropriate boundary conditions ensure a unique solution, and we can set the boundary relatively far from the actual study region to render boundary effects negligible within the study area.

The approximate Gauss–Markov solution to (6) is constructed by discretizing space through a triangulation mesh, and by using the representation $Z(s) = \sum_k \psi_k(s) Z_k$ with "pyramid-shaped" basis functions $\psi_k$, one for each mesh node. By solving the SPDE in the subspace spanned by these basis functions, we obtain a Gaussian vector of basis weights with explicitly known sparse precision matrix. The mesh used for the Red Sea in our data application is shown in Figure 5. We choose a fine resolution in the study area (within the inner blue boundary), while a coarser mesh is used in the extension area, which is sufficient to avoid boundary effects without strongly increasing the number of nodes, for the sake of keeping the computational cost as low as possible. With nine time steps and the intercept, each local model counts $9K + 1$ latent Gaussian variables, where $K$ denotes the number of mesh nodes.

### 3.3 Bayesian inference using the integrated nested Laplace approximation

We represent our local dependence model, detailed in Section 3.2, as a latent Gaussian model with Gaussian likelihood as follows. We denote by $\widetilde{Z}(s,t)$ the standard Gaussian observations obtained after transforming the raw anomalies using one of the marginal approaches described in Section 3.1. Precisely, using the probability integral transform and the fitted tail parameters, we proceed with the transformation

$$\widetilde{Z}(s,t) = \Phi^{-1}\{\widehat{F}_s(\widetilde{A}(s,t))\},$$

where $\Phi$ is the standard Gaussian distribution function and $\widehat{F}_s$ combines the location-wise empirical distribution below the threshold with the fitted generalized Pareto tail above it, to achieve an improved normalization of the very large anomalies in the dataset. Then, for each $(s,t)$, we define the space-time linear predictor

$$\eta(s,t) = \beta_0 + Z(s,t)$$

according to (5) with intercept $\beta_0$. We assume a centered Gaussian prior for the intercept, such that the joint distribution of $\boldsymbol{\eta} = \{\eta(s,t)\}$ is also Gaussian. By assuming that observations are conditionally independent given the latent Gaussian random vector $\boldsymbol{\eta}$ and a vector of hyperparameters $\boldsymbol{\theta}$, we use the following hierarchical structure:

$$\widetilde{Z}(s,t) \mid \boldsymbol{\eta}, \boldsymbol{\theta} \;\sim\; \mathcal{N}\{\eta(s,t), \sigma_\varepsilon^2\} \text{ (independently)}, \qquad
\boldsymbol{\eta} \mid \boldsymbol{\theta} \;\sim\; \mathcal{N}(\boldsymbol{\mu}, \mathbf{Q}^{-1}), \tag{7}$$

where $\mathcal{N}(\mu, \sigma^2)$ denotes the Gaussian distribution with mean $\mu$ and variance $\sigma^2$, and $\boldsymbol{\mu}$ and $\mathbf{Q}$ are the mean vector and the precision matrix of $\boldsymbol{\eta}$, respectively.
The representation in (7) corresponds to a latent Gaussian model (LGM; see, *e.g.*, Rue.al.2017) where observations follow a Gaussian distribution with mean process $\boldsymbol{\eta}$ and a measurement error of variance $\sigma_\varepsilon^2$. The measurement error may not be necessary from a modeling stance when data are very smooth, but its presence in the model is required for implementing LGM-based inference. For smooth data, it is typically estimated to be very small, such that it becomes negligible in practice. Approximate Bayesian inference for LGMs is available through the integrated nested Laplace approximation approach (INLA; rue2009approximate) and its implementation, R-INLA (bivand2015spatial), in the R statistical software. With INLA, posterior quantities of interest (that usually involve the evaluation of high-dimensional integrals) are numerically approximated using variants of the classical Laplace approximation (Tierney.Kadane.1986). We use a recent version of INLA calling the PARDISO library for fast numerical matrix computations (pardiso-6.0a; pardiso-6.0b; pardiso-6.0c; van2019new). INLA leverages modern numerical techniques for sparse matrices, providing fast and accurate inference for LGMs with many observations and latent variables, such as in our case.
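The marginal transformation to the Gaussian scale used above can be sketched site-wise as follows; the blend of empirical cdf and GP tail is simplified, and continuity adjustments at the threshold are ignored in this illustration:

```python
import numpy as np
from scipy.stats import genpareto, norm

def to_gaussian_scale(z, u, p_exc, xi, sigma):
    """Semi-parametric probability integral transform for one site.

    Below the threshold u, the distribution is the empirical cdf of the
    site's observations; above u, it is 1 - p_exc * (1 - GP cdf of the
    excess).  Composing with the standard Gaussian quantile function
    yields approximately standard Gaussian observations.
    """
    n = len(z)
    ranks = (np.argsort(np.argsort(z)) + 1) / (n + 1)   # empirical cdf values
    F = np.where(
        z <= u,
        ranks,
        1.0 - p_exc * genpareto.sf(z - u, xi, scale=sigma),
    )
    return norm.ppf(F)
```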

Regarding prior distributions, we choose a moderately informative Gaussian prior for the mean parameter $\mu$ in (5). The remaining hyperparameters control the Gaussian likelihood and the latent Gaussian vector, respectively. Although non-informative priors are a common choice when little expert knowledge is available, we here select moderately informative prior distributions using the penalized complexity (PC) prior approach (Simpson.al.2017). This procedure penalizes excessively complex models at a constant rate by putting an exponential prior on a distance to a simpler baseline model (*e.g.*, using the baseline $\sigma_\varepsilon^2 = 0$ for the measurement error variance), which helps to stabilize the estimation algorithm by facilitating the convergence of the Laplace approximations.
Priors then shrink model components toward their base models, thus preventing overfitting.
For the precision of observations, we set an informative prior putting only a small probability on standard deviations larger than 0.1. For the Matérn covariance, we set a prior distribution putting a small probability both on ranges smaller than 500 km and on large variances. Finally, an informative PC prior is chosen for the temporal auto-correlation coefficient $a$ of the AR(1) process in (5), reflecting the strong temporal dependence: a priori, a large probability is put on high values of $a$.

### 3.4 Generating predictive samples

In contrast to simulation-based estimation of posterior distributions through techniques such as Markov chain Monte Carlo, INLA does not automatically provide a representative sample of latent variables and predictors to calculate posterior quantities of interest. Nevertheless, it is possible to obtain samples from the approximate posterior distribution efficiently: first, we sample a set of hyperparameters according to the approximation of their posterior distribution conducted by INLA; second, conditional on the hyperparameter values, we sample from the Gaussian distributions arising from the Laplace approximation (Krainski.al.2018).

For each local Gaussian model, we have implemented this simulation procedure to obtain samples of the latent variables of the fitted model. The simulated predictions for unobserved locations and times are part of these samples, and we first backtransform them to the original marginal scale (if marginal transformations were applied preliminarily). Finally, the value of the cluster summary is calculated for each backtransformed posterior sample. This leads to a sample of predicted values for each cluster summary to be predicted, and we represent predictive cdfs through the empirical cdfs of these samples.
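Turning posterior samples into the predictive cdf of the minimum can be sketched as follows (a generic post-processing step, independent of how the samples were generated):

```python
import numpy as np

def predictive_min_cdf(posterior_samples, back_transform=None):
    """Empirical predictive cdf of the minimum over a space-time block.

    posterior_samples : (n_samples, n_points) draws of the latent field at
        the unobserved points of one cylinder (Gaussian scale)
    back_transform    : optional map back to the anomaly scale, applied
        elementwise before taking minima
    Returns a function evaluating the empirical predictive cdf of the min.
    """
    x = posterior_samples if back_transform is None else back_transform(posterior_samples)
    mins = x.min(axis=1)                      # one minimum per posterior draw
    def cdf(q):
        return np.mean(mins <= q)
    return cdf
```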

## 4 Results

We now discuss fitted models and assess the adequacy of our models to predict local space-time hot-spots in the anomalies by comparing goodness-of-fit diagnostics and predictive performance for the cluster summary given by the minimum. Improvements owing to explicit modeling of marginal tails are highlighted.

### 4.1 Marginal modeling

The first marginal modeling step consists in capturing the temporal non-stationarity in the anomalies, *i.e.*, fitting the parametric regression model in (1). The fitted model parameters $\hat{\mu}_y$ and $\hat{\sigma}_y$ are displayed in Figure 1. The second marginal modeling step is to correct the Gaussian tail using a spatial GP distribution, with either the GAM structure in (3) or the nearest-neighbors (NN) approach detailed in Section 3.1.3. Figure 6 shows the fitted GP scale, shape, and exceedance probability across the entire region for both approaches. The fitted scale parameters display a North-to-South effect that is consistent with the spatial pattern in Figure 3. The shape is consistently negative, which implies a finite upper bound for the distribution of SST anomalies.

As a graphical goodness-of-fit test for the GP model, we display in Figure 7 the QQ-plots of the normalized data with no tail correction and with a tail correction using both the GAM and the NN approaches. Though the QQ-plots match for non-exceedances, we observe a distinct improvement in the tail region when we resort to extreme-value methods and properly model the spatial pattern in the large normalized anomalies.

### 4.2 Local Gaussian dependence models

We have fitted the local Gaussian models, each spanning nine days and centered at one of the distinct validation days. We do not report exhaustive results, but to summarize some characteristics of the fits, we show the time series of the following estimated hyperparameters in Figure 8: Matérn range, Matérn standard deviation, and temporal auto-correlation coefficient. Interestingly, some obvious trends arise over time. The three rows of the figure correspond to the three approaches to handling marginal distributions. Estimated values for the Matérn range and the temporal auto-correlation are very similar between the three models. Regarding the Matérn standard deviation, we obtain smaller values for Model 1, which does not pretransform the margins to unit variance, and one strongly outlying value arises with Model 2. All three models show the same patterns of temporal trends: ranges tend to increase towards the end of the validation period (years 2007–2015) shown in the plot, which implies that spatial dependence has increased. For the standard deviation, a pattern indicating stronger variability at the beginning and the end of the validation period arises. Finally, the temporal dependence captured by the autoregression coefficient also increases towards the end of the validation period, by analogy with the spatial dependence.

Figure 9 illustrates the prediction on the Gaussian scale for a validation day in December. The map on the left shows the available data, pretransformed according to the nearest-neighbor approach for marginal distributions, and the validation locations. The map in the middle shows posterior means of predictions obtained through INLA. Finally, the right-hand map shows the corresponding posterior standard deviations. As expected, prediction uncertainty is lower when locations are close to observed locations but can be relatively high in the case of large gaps, such as the one in the southern part of the Red Sea.

### 4.3 Score evaluation

We compare Models 1–3 using the tail-weighted continuous ranked probability score (twCRPS) employed to rank predictions in the EVA challenge (huser2020eva). Given an estimated predictive distribution $F$ of the realized cluster summary $y$, here defined as the minimum over space-time cylinders, the twCRPS is defined through a weight function $w$ as
$$\mathrm{twCRPS}(F, y) = \int_{-\infty}^{\infty} w(x)\,\{F(x) - \mathbb{1}(x \geq y)\}^2 \,\mathrm{d}x.$$

We consider the weight function of the EVA challenge with , corresponding to a Gaussian cdf that puts little weight on values below . Moreover, we assess our models by evaluating the twCRPS for weight functions that put stronger weight on more or less extreme quantiles by choosing , , and , the latter corresponding to the classical unweighted CRPS. Table 1 summarizes the scores of Models 1–3 for the different tail weight functions. Overall, the scores of the three models are relatively close, which confirms that the data are locally rather smooth in space and time, so that a Gaussian dependence model already performs very well. The twCRPS scores of the GAM and nearest-neighbor (NN) marginal models systematically improve on those of the Gaussian model without marginal transformation, except for the classical CRPS. The performances of GAM and NN are relatively similar, but the NN model achieves better twCRPS when weight is put on predictions at high quantiles. We conjecture that this is due to local effects that are more easily captured through local pooling than through covariate modeling, and that the resulting discrepancy is amplified when predicting very high quantiles of the cluster summary. We finally point out that our three models outperform all models developed by other teams for the data of the EVA 2019 challenge (huser2020eva), whose best score for was , while a simple benchmark score for was calculated to be .
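As a minimal sketch of how such scores can be computed, the function below evaluates the twCRPS of an ensemble of predictive draws against a realized minimum, using a Gaussian-cdf weight $w(x) = \Phi((x-\mu)/\sigma)$. The weight's location and scale and the simulated draws are hypothetical, not the challenge's actual settings.

```python
import numpy as np
from scipy.stats import norm


def tw_crps(samples, y, mu, sigma, n_grid=2000):
    """Tail-weighted CRPS of an empirical predictive cdf against observation y,
    using the Gaussian-cdf weight w(x) = Phi((x - mu) / sigma)."""
    samples = np.sort(np.asarray(samples, dtype=float))
    lo = min(samples[0], y) - 1.0
    hi = max(samples[-1], y) + 1.0
    grid = np.linspace(lo, hi, n_grid)
    F = np.searchsorted(samples, grid, side="right") / samples.size  # empirical cdf
    w = norm.cdf(grid, loc=mu, scale=sigma)                          # tail weight
    integrand = w * (F - (grid >= y)) ** 2
    # trapezoidal integration over the grid
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid)))


# illustration on simulated predictive draws (hypothetical, not challenge data)
rng = np.random.default_rng(1)
draws = rng.normal(size=5000)
print(tw_crps(draws, y=0.5, mu=1.0, sigma=0.4))
```

Placing $\mu$ far below the data range makes $w \approx 1$ everywhere, recovering the classical unweighted CRPS as a special case.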

Table 1: twCRPS scores of Models 1–3 for different tail weight functions.

| Weight function | Gaussian | GAM | NN |
|---|---|---|---|
| | 0.828 | 0.825 | 0.770 |
| | 3.27 | 3.19 | 3.07 |
| | 21.7 | 21.1 | 20.9 |
| | 702 | 717 | 717 |

## 5 Discussion

We have developed a probabilistic, likelihood-based approach to fill gaps in space-time datasets, with particular attention to extreme values and multivariate distributions. By estimating a generative model in a Bayesian framework, we have simulated from the posterior distribution of unobserved data points to obtain representative samples of cluster functionals such as the minimum over space-time cylinders. Due to the large size of the dataset, joint estimation of margins and dependence using all data seems out of reach. We therefore adopted a two-step approach: we first remove trends to make the data more “Gaussian-like”, especially at high quantiles, and then fit a Gaussian model locally in space and time to the transformed data. Although pixel-wise observations appeared to be approximately Gaussian-distributed, the first step of the procedure still improved predictive performance. We believe that far more substantial benefits can be expected when applying the two-step procedure to data whose features deviate more strongly from a stationary Gaussian distribution.
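As a sketch of the marginal pretransformation step, the pair of functions below maps data to standard Gaussian margins via empirical ranks and back via the empirical quantile function. Pooling ranks over a single full vector is a deliberate simplification of the paper's GAM and nearest-neighbor schemes, and the synthetic data are purely illustrative.

```python
import numpy as np
from scipy.stats import norm, rankdata


def to_gaussian_scale(x):
    """Empirical probability integral transform to standard Gaussian margins."""
    x = np.asarray(x, dtype=float)
    u = rankdata(x) / (x.size + 1)   # uniform scores in (0, 1)
    return norm.ppf(u)


def from_gaussian_scale(z, x_ref):
    """Back-transform Gaussian-scale values via the empirical quantiles of x_ref."""
    u = norm.cdf(np.asarray(z, dtype=float))
    return np.quantile(np.asarray(x_ref, dtype=float), u)


# illustration with skewed synthetic data (hypothetical, not SST anomalies)
rng = np.random.default_rng(2)
x = rng.gamma(2.0, size=1000)
z = to_gaussian_scale(x)
print(z.mean(), z.std())
```

The transform is rank-preserving, so spatial and temporal orderings of the data survive the change of marginal scale, which is what allows a Gaussian dependence model to be fitted to the transformed values.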

Our approach shows that an asymptotic model from extreme-value theory is not needed to capture the dependence during extreme episodes. Based on our preliminary analysis showing asymptotic independence, we conjecture that our local Gaussian models are sufficiently flexible to capture spatial and temporal variations during extreme episodes. Moreover, the asymptotic independence in the data suggests that non-asymptotic models such as Gaussian random fields provide a more appropriate framework than the asymptotic dependence structure imposed by extreme-value models such as max-stable processes and generalized Pareto processes. For modeling spatio-temporal dependence, Gauss–Markov processes offer good tractability and scalability for prediction with large datasets, so our choice is also of a pragmatic nature. The Red Sea SST data of OSTIA are produced using geostatistical interpolation techniques similar to kriging applied to the original data, and they are relatively smooth in space and time due to strong dependence; this is a further reason why the Gaussian dependence models predict so well here.
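The asymptotic-independence diagnostic underlying this reasoning can be sketched with the empirical tail-dependence coefficient $\chi(u) = P(F_2(X_2) > u \mid F_1(X_1) > u)$: values decaying towards zero as $u \to 1$ support an asymptotically independent (e.g., Gaussian) dependence model. The bivariate Gaussian sample below is purely illustrative, not the Red Sea data.

```python
import numpy as np


def chi_hat(x1, x2, u):
    """Empirical chi(u) = P(F2(X2) > u | F1(X1) > u) from paired observations."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    u1 = np.argsort(np.argsort(x1)) / (x1.size + 1.0)  # empirical uniform scores
    u2 = np.argsort(np.argsort(x2)) / (x2.size + 1.0)
    exc = np.mean(u1 > u)
    return np.mean((u1 > u) & (u2 > u)) / exc if exc > 0 else np.nan


# illustrative bivariate Gaussian pair: chi_hat decays as u increases,
# the signature of asymptotic independence
rng = np.random.default_rng(0)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=20000)
for u in (0.90, 0.95, 0.99):
    print(u, chi_hat(z[:, 0], z[:, 1], u))
```

For an asymptotically dependent model such as a max-stable process, the same estimate would instead level off at a positive constant as $u$ increases.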

Due to the artificial generation of gaps in the Red Sea data, we could validate and compare modeling approaches using the true values to be predicted. When this is not the case, external validation could be implemented through cross-validation schemes by predicting cluster functionals for space-time subsets where all observations are available.

The code written for the implementation of the approach presented in this paper is freely available from https://github.com/dcastrocamilo/EVAChallenge2019.

## Acknowledgements

This work started when Daniela Castro-Camilo was a postdoctoral fellow at King Abdullah University of Science and Technology (KAUST). Support from the KAUST Supercomputing Laboratory and access to Shaheen is therefore gratefully acknowledged. Linda Mhalla acknowledges the financial support of the Swiss National Science Foundation.
