1 Introduction
Most of the current approaches to mapping landslide hazard exploit auxiliary information from geomorphological covariates and focus on one of its components—known as the landslide susceptibility—through the modeling of presenceabsence information [4, 5, 11, 17, 37]; see Reichenbach et al. [36]
for a recent review. Such approaches are predominantly based on machine learning techniques using binary classification over fine pixel grids, where subsampling of zero observations is often inevitable to cope with high spatial resolution and highly imbalanced designs. Recently,
Lombardo et al. [29, 30] and Lombardo et al. [26, 28] introduced the “intensity” concept for spatial landslide prediction by focusing on event counts and not only presenceabsence data. Specifically, they proposed a novel probabilistic approach based on a Bayesian hierarchical models, where landslides are viewed as spatial or spatiotemporal point processes of logGaussian Cox type. Using the integrated nested Laplace approximation [21, 40], they developed accurate statistical inference with high grid resolution and with sophisticated latent structures for capturing intensity variations not explained by observed covariates. In this paper, we use one of their models as a highly competitive baseline, and explore various more complex model extensions described below. While observed covariates are available at high pixel resolution, spatial random effects can be resolved at multiple scales. Here, we use slope units [SUs, 2, 10], which allow fast computations by being at lower spatial resolution than pixels, while delimiting physicallymotivated zones that are relevant to the landsliding process, which is known to show a relatively homogeneous response to slope instabilities within each SU. SUs are commonly used in landslide science (and more generally geomorphology), because of the empirical evidence that landslides occur on slopes [18].We study nontrivial model extensions with respect to two important aspects: first, we include spatially unstructured effects at pixel or SU scale to capture residual spatial clustering at small scales; second, we construct complex models with nonlinear or spacevarying covariate effects, in order to improve the baseline model’s predictive performance and allow for new insights and interpretations from an applied perspective. Spacevarying regression has been established as a useful concept when the response to a covariate is not stationary in space [13, 14]. In our context, we consider it as a natural—yet difficult to implement—solution to account for the spatiallyvarying influence of the landslide trigger (such as a heavy precipitation event), which is usually not observed at good spatial resolution in the study area, or not observed at all. In this paper, we conduct a thorough and rigorous structural analysis based on models that explore how the unobserved trigger interacts with observed covariate effects, here highlighted with the example of the slope steepness. Throughout our statistical analysis, our aim is to integrate physical understanding of the landsliding process into the model structure, e.g., by assuming that the slope steepness should become irrelevant as a predictor in places where there is only a weak—or no—trigger influence. Such improvements have never been considered in the landslide modeling literature, despite being of high practical relevance. More generally, we also discuss diagnostics to comprehensively compare the goodnessoffit and spatial predictive performance of a range of models of varying complexity, in order to study improvements with respect to the baseline. We argue that decisions should not be based on a single criterion, but rather on a combined assessment of several criteria to more objectively appreciate the relative strengths and weaknesses of models. Different models may in fact give complementary insights into the statistical or physical behavior of landslide activations.
In the remainder of the paper, Section 2 describes the landslide data and geomorphological covariates that we use in our analysis. Then, Section 3 provides the necessary background theory on logGaussian Cox processes, and presents our new models, while Section 4 discusses implementation and model comparison using the integrated nested Laplace approximation. In Section 5, we compare and interpret the fitted models, which we exploit to map the landslide intensity and sloperelated risk factors. We conclude with some discussion in Section 6.
2 Landslides data and predictor variables
2.1 Precipitationtriggered landslide occurrences
On October 1, 2009, a major rainfall discharge occurred in an area of around km on the island of Sicily (Southern Italy), with approximately mm of rain measured at nearby weather stations. This weather event followed two relatively smaller precipitation events just one and two weeks before, with about mm and mm of rain, respectively [8, 27]. Within just a few hours, this extreme precipitation event triggered several thousands of rapid shallow landslides and led to the death of people, and economic infrastructure damage of around half a billion Euro. Using remote sensing images before and after the event, the identification of separate debrisflow landslide [20] was made possible. Socalled landslide identification points [LIPs, 24] were then extracted from remotely sensed images (at m resolution) for each mass movement. Precisely, the triggering location was set to the point of highest altitude in the area affected by the movement. The left panel of Figure 1 shows a digital elevation model of the study area and the LIP inventory at m pixel resolution. Most LIPs were recorded in distinct pixels, but with a few exceptions: 353 pixels contained 2 landslides, 44 pixels contained 3 landslides, and 2 pixels contained 4 landslides.
2.2 Geomorphological covariate information
We use covariate information that has been aggregated to a m m grid based on a digital elevation model (DEM) with m resolution. This grid resolution has been shown to be sufficiently fine to avoid degrading the predictive performance of models in this study area [3, 9, 25], while allowing for reasonably fast inference. Lombardo et al. [29] provide a more detailed description of the calculation and meaning of covariates. Aggregation from m to m was done by averaging values for continuous covariates, or by selecting the prevailing category for categorical covariates. Covariates are as follows, where we use uppercase notation throughout to refer to the names of these covariates: Elevation (or Digital Elevation Model, abbreviated DEM); Aspect, i.e., the angle in describing the exposition of the area with respect to the North [46]; Slope Steepness [46]; Planar Curvature [19], which is measured perpendicular to the steepest slope angle and characterizes the convergence and divergence of flow across the surface; Profile Curvature [19], which indicates the direction of maximum slope; Topographic Wetness Index (TWI) [7], which quantifies topographic properties related to hydrological processes using slope and upstream contributing area as input; Stream Power Index (SPI) [32], which takes similar input as TWI and measures more specifically the erosive power of flowing water; Landform (with categories; see Wilson and Gallant [45]); the distance of each pixel to the closest tectonic fault line (Dist2Fault); Normalized Difference Vegetation Index (NDVI) [38], which measures the “greenness” of a landscape and serves as a proxy for vegetation; Lithology, i.e., soil type with categories, where rare soil types with less than occurrence pixels have been summarized in a single class “other”; Land Use (with categories). The choice of a m m grid yields a representation of the study area through pixels. When using continuous covariates for modeling purposes, we scale them to have empirical mean
and empirical variance
. Additional information about the covariates can be found in Section 1 of the Supplementary Material.Figure 1 shows the spatial distribution of the Elevation (left panel) and the Slope Steepness (right panel) on their original scale. In this work, we stress the importance of accurately capturing the influence of the Slope Steepness, which has a major effect on landslide activations. Landslides are very unlikely to be triggered in flat areas; they are much more likely on steeper slopes; and they become unlikely again on very steep slopes, since movable material has already gone through erosion and during previous mass movements. Below, we seek to construct models that can capture this nonmonotonic and highly nonlinear influence of the Slope Steepness.
In our Bayesian hierarchical models described in Section 3, we additionally exploit modeling units at an intermediate resolution (between mpixels and the full study area) known as slope units (SUs) [1, 29]. Precisely, the SU partition defines a phyicallymotivated and moderatelysized spatial discretization, here used for capturing latent random effects such as the influence of the spatiallyvarying precipitation event. These SUs can be viewed as relatively homogeneous mapping units with respect to geomorphological and geophysical features that are relevant to landslide activations. In our study area, we have SUs, which are displayed on the left panel of Figure 2. Some landslides triggered within the same SU may be due to a joint triggering mechanism, which may potentially lead to some residual stochastic dependence in the landslide occurrence process (conditional to the geomorphological structure and the precipitation trigger), but such spatial dependence is likely to be very weak when considering events arising in separate SUs. In other words, landslide data in different SUs can be safely assumed to be conditionally independent given fixed and random effects. In the next section, we describe our proposed logGaussian Cox process models.
3 Bayesian hierarchical modeling of landslide point patterns
3.1 LogGaussian Cox processes
Spatial point processes are stochastic models for the occurrence of events in space, when the event positions are random but obey certain density patterns and smallscale clustering or inhibition behavior. We refer to the bounded study area as , shown in Figure 1
. Point processes characterize the joint probability distribution of the number of points
in subareas , i.e., the marginal distribution and dependence structure of the random count variables . An important characteristic of point processes is the intensity function , which determines the expected number of points over any area , i.e., . For Poisson processes,has the Poisson distribution for any set
, and the occurrence of points is independent given the (deterministic) intensity function .LogGaussian Cox processes [LGCPs, 31] are Poisson processes with a stochastic intensity function
given by a logGaussian process. Their doublystochastic structure allows capturing spatial clustering of points due to unobserved or unavailable predictor variables. LGCPs are convenient for Bayesian hierarchical modeling, where the latent Gaussian process may encompass both fixed effects of observed covariates
, and random effects.Throughout, we use the notation and for random effects by adding contextspecific subscripts and superscripts. Precisely, we follow the convention that corresponds to the value of a random effect evaluated at the location , and we write
for the vector of the finite number of latent variables (following a multivariate Gaussian distribution) that are used to represent this random effect. For example, a random effect
described at SU resolution corresponds to a multivariate Gaussian random vector with components, one for each SU, while is the value that corresponds to the SU containing the location .The logintensities of our models are structured as
The random effects may directly depend on location , or only indirectly through a covariate observed at , for instance if
is used to capture the potentially nonlinear influence of a covariate. The probability density function of an observed finite point pattern
, composed of a random but finite number of points in the observation window , corresponds to the expectationusing the convention that if
. Closedform expressions of this expectation are not available in general, but Bayesian inference techniques, such as implementations based on the integrated nested Laplace approximation
[40] as used here, have been developed to approximate it numerically. In our Bayesian framework, the Gaussian processes used to construct the logGaussian intensity function in LGCPs can also be viewed as prior distributions for deterministic components of the intensity function of a Poisson process.3.2 Models
We now present the baseline model and our proposed model extensions. Throughout, we make systematic use of the concept of penalized complexity (PC) priors [42]
to define priors for hyperparameters (e.g., for standard deviations of random effects). For each latent component of the model, a simple baseline model is defined, typically corresponding to its absence (e.g., a standard deviation of
) such that all the latent variables pertaining to the component are set to. PC priors are then defined through a constantrate penalty for the distance to the baseline, expressed in terms of a transformation of the KullbackLeibler divergence. In particular, the PC prior of precision parameters corresponds to an exponential prior on the corresponding standard deviation
[42].3.2.1 M0 (Baseline): Fixed effects and spatial random effect
Our baseline model, called M0, has structure similar to the best model found by [29], although here with relevant improvements through the choice of prior distributions penalizing model complexity. In the model extensions presented subsequently, we ensure easy comparison and consistency with M0 by keeping the same prior distributions for components that are in common. Before giving details on components of the logintensity of M0, we provide its full formula:
(M0) 
The intercept and the continuous covariates, , including the Slope Steepness but with the exception of the circular Aspect variable, appear as fixed effects with coefficients ,
. To guide the estimation algorithm for faster convergence and to stabilize the fitted model, we fix moderately informative Gaussian prior distributions with precision
and mean for fixed effects, except for the intercept where the mean is .PC priors are used for the precision of the independent and identically distributed (i.i.d.) effects of the three categorical, nonordinal covariates (Lithology, Landform, Land Use), , which possess a substantial number of categories (, and , respectively). Priors for their coefficients , , , are centered at , and we impose a sumtozero constraint on the coefficients of each of the three factors to ensure identifiability. The priors for the three precision parameters, denoted by , and , respectively, are relatively informative, and are determined by the apriori specification of . Therefore, we let the data drive the posterior distribution away from if a clear signal is present in the data. This allows us to shrink the model towards a parsimonious formulation and to avoid unstable inference.
The Aspect covariate, , reports an angle within the interval , which we here discretize into equidistant bins, each spanning for a nearcontinuous treatment. The prior model of this random effect is a cyclic firstorder random walk (CRW1) over the bins, with a sumtozero constraint for identifiability. Writing , we characterize its multivariate Gaussian prior through the following conditional specification (where ):
where the constraint is imposed to ensure identifiability, and where is a fixed scaling constant such that corresponds to the marginal variance of the variables , . The CRW1structure makes sure that the estimated piecewise constant curve is “smooth” by borrowing strength between neighboring classes. We set an informative PC prior distribution for by specifying that, a priori, .
For the latent spatial effect structured at the SU level, we implement Besag’s classical conditional autoregressive (CAR) model [6, 39]. Writing , this model links the value in SU to adjacent SUs, described by the index set of size , as:
where the constraint is imposed for identifiability, and where is a fixed scaling constant such that corresponds to the (generalized) marginal variance [43] of the variables ,
, where the generalized marginal variance is the square of the geometric mean of the (nonstationary) standard deviations of the variables. We set a moderately informative PC prior distribution for
by specifying that, a priori, .The number of adjacent SUs, , varies moderately in our dataset, with of values between and and between and , while the minimum is and the maximum is ; see Figure 2. In general, this spatial random effect captures SUresolved effects that cannot be explained by other model components, in particular by observed covariates. With the landslides data, the spatial effect will absorb the local intensity variation of the precipitation trigger, which is at most weakly correlated with some of the other, geomorphological covariates.
3.2.2 Model 1: Spatially unstructured effects
Overdispersion in count data refers to the situation where the variance is larger than the mean, which stands in contrast to the Poisson distribution whose mean and variance are equal. Conceptually, our LGCP models are defined over continuous space and exclude multiple events at the same location , such that (theoretically) the notion of overdispersion does not apply. However, overdispersion in the counts for spatial units such as pixels or SUs can still arise if our intensity model is misspecified and fails to pick up all sources of spatial variation in the data. Our models assume constant intensity within pixels, and the spatial random effect has coarser resolution at the SU level. Pixel resolution is very high in our case and approximates continuous space, with only a very small proportion of pixels counting multiple events, such that we will not explore nonstationary behavior of the point process intensity within pixels. However, we propose to explore models with spatially unstructured effects at the pixel level or the SU level, which are capable of capturing sharp differences in intensity between neighboring pixels or SUs, respectively. In our first model extension, we therefore include i.i.d. Gaussian variables in the latent linear predictor, either by adding one variable to each pixel (Model 1a), or by adding one variable to each SU (Model 1b). We estimate the precision parameters, and , of these pixelwise and SUwise unstructured effects, respectively. Writing and to denote the pixel and slope unit containing location , respectively, these models are given as
(M1a)  
(M1b) 
where sumtozero constraints are imposed on and , denotes the zero vector, is the byidentity matrix, and , . For both effects, the precision parameter ( or ) is endowed with an informative PC prior determined by .
3.2.3 Model 2: Nonlinear effect
In this second model extension, we replace the linear Slope Steepness effect of the form “” (with a known covariate) by a nonlinear random effect , defined through a firstorder random walk prior using equidistant classes to partition Slope Steepness values. Denote the logintensity of the baseline model without the linear Slope Steepness effect by . Here, we consider the modified model
(M2) 
which can capture nonlinear, and in particular nonmonotonic influence of the Slope Steepness covariate. We set the prior distribution of the precision parameter of by analogy with the Aspect effect in M0, i.e., .
3.2.4 Model 3: Spacevarying regression (SVR)
Another extension of our baseline model is possible by keeping a linear coefficient for Slope but allowing it to vary over space. This allows the model to capture local variations of the strength of the Slope Steepness effect due to the precipitation trigger. We keep the global linear Slope Steepness coefficient and add a spatiallyvarying correction, defined at the SU level, in the following model:
(M3) 
By analogy with the latent spatial effect , the prior on corresponds to a Gaussian process with CAR structure, and with its own precision parameter , for which we set an informative prior distribution according to .
3.2.5 Model 4: Nonlinear effect and spacevarying regression
In this model, we combine both the nonlinear Slope Steepness effect of M2 and the SVRcoefficient component of M3 into a single model, leading to the following structure:
(M4) 
where hyperparameter priors are fixed as above.
3.2.6 Model 5: Parsimonious spacevarying regression (PSVR)
Finally, we construct a model similar to M4 but which links the latent spatial effect and the SVR component. If the latent spatial effect acts as a proxy for the precipitation trigger, then its low values indicate a weak or absent trigger effect, and then the Slope Steepness value becomes irrelevant since no landslides occur, whatever the geomorphological conditions. In this case, the SVR may locally neutralize (i.e., counteract) the globally estimated Slope Steepness effect. We here consider the following parsimonious model:
(M5) 
with the interaction coefficient to be estimated. Unlike the more complex model M4, this model features only one single CAR effect, , instead of the two a priori independent effects, and , in model M4, such that we consider it as a parsimonious variant of spacevarying regression. The prior for the parameter is set to be moderately informative; it is Gaussian with mean and precision .
4 Approximate Bayesian inference
4.1 The integrated nested Laplace approximation (INLA)
INLA [40] has found widespread interest in a wide range of applications [22, 29, 33, 34], thanks to its ability to provide fast and accurate posterior inference for the general class of latent Gaussian models including logGaussian Cox processes [44]. The RINLA package (see http://www.rinla.org/
), in which the core statistical methodology is efficiently and conveniently implemented, privileges sparse matrix calculations in large dimensions through systematic use of Gauss–Markov conditional independence structures. With hierarchically structured models including several components with different structures at the latent layer, INLA is typically faster and simpler to tune than simulationbased Markov chain Monte Carlo. (MCMC) methods
[21, 35, 39, 40, 41].In our context of logGaussian Cox processes, we write for the stochastic point process intensity at pixel with , where denotes the vector of the pixelbased latent Gaussian logintensities for the whole study area. We further use the notation for a vector with components, where “” refers to the variables of the additive random effects included in the logintensity model, e.g., , and so forth. The vector of hyperparameters (i.e., precisions of CAR, RW1 and i.i.d. components) is denoted by and has components. The distribution of pixelbased landslide counts , collected into a vector , is assumed to be conditionally independent given the (random) intensity values, i.e.,
where is a scaling factor corresponding to the area of one pixel. The principal inference goal is the calculation of the posterior densities of hyperparameters and of the components of , the latent vector with multivariate Gaussian prior distribution, i.e.,
(1)  
(2) 
However, calculation of (1) and (2) is hampered by the highdimensional numerical integration over the space spanned by the Gaussian vector . Instead, INLA uses the Laplace approximation, which corresponds to replacing integrand functions by suitable Gaussian density approximations. On the other hand, the integration with respect to the components of is done through numerical integration schemes, such that only a small number of hyperparameters can be estimated.
4.2 Model comparison and selection
First, we propose to compare models through the classical information criteria DIC and WAIC. These goodnessoffit criteria take the effective dimension of the latent model into account, thus penalizing model complexity. Their close relationship to the predictive performance measured through leaveoneout crossvalidation has been established, and WAIC is known to better take the stochasticity of the posterior predictive distributions into account
[15]. With INLA, these quantities are calculated through sensible approximation techniques [40].To focus more directly on criteria evaluating the spatial predictive performance, we also devise a fold crossvalidation scheme. Specifically, we randomly partition the SUs into 10 folds, each containing (approximately) the same number of SUs. We calculate predictive scores for two mapping units: pixels and SUs, for the latter by aggregating observed and predicted counts over the pixels of each SU. At the pixel level, we consider predictive scores that use either the predicted counts , the predicted probabilities of withinpixel landslide occurrences , or the full posterior predictive distribution of . Similarly, at the SU level, we consider the predicted counts estimated as , the predicted probabilities of withinSU landslide occurrences , and the posterior predictive distributions of . An alternative approach for predictive diagnostics, studied by Leininger et al. [23], would be to construct holdout sets by removing points at random from the point pattern; this is known as thinning. Here, we prefer the more challenging task of predicting entire spatiallycontiguous areas where all data within SUs have been removed, which is also more suited to assessing slopewise landslide hazard.
The posterior predictive distributions are obtained by generating a large number of posterior samples of counts for each crossvalidation fit. While INLA does not directly provide posterior samples because of its use of analytical and not simulationbased approximations, these can be generated conveniently [41]. Using RINLA’s internal, discrete approximations for posterior distributions of hyperparameters and latent Gaussian fields, the simulation algorithm first generates a realization of the hyperparameter vector; next, conditional on these hyperparameters, a latent Gaussian field is sampled; finally, counts are simulated from the pixelbased Poisson distributions with intensities defined according to the simulated latent Gaussian field. In what follows, crossvalidation results using simulations of the posterior predictive distributions are based on samples of the full posterior model.
We consider four types of crossvalidated predictive scores: the areaunderthecurve (AUC) [12] to measure prediction quality for the presence or absence of landslides within mapping units; the residual sum of squared errors (RSS) and the residual sum of absolute errors (RSA), both using predicted and observed counts; and the continuous ranked probability score [CRPS, 16] using the predictive distribution functions and observed counts. The formulas for pixelbased RSS and RSA are as follows:
where with the posterior density of . The general CRPS formula for a single observation and a corresponding (posterior) predictive distribution from a model may be expressed as . For the pixelbased CRPS in our case, we add up the CRPS values over all pixels and therefore use
Analogous formulas are used for SUbased criteria, where pixelbased observed counts and intensities must be aggregated over the pixels for each SU. This requires resorting to the joint posterior distributions of all corresponding to the pixels in a given SU. Since such CRPS formulas are difficult to calculate analytically, we use posterior sampling as implemented in RINLA and compute a MonteCarlo approximation of CRPS values based on a large number of posterior samples ( as above).
The AUC considers only presenceabsence data, which is a strong simplification for assessing the prediction of landslide counts, especially at the SU level where counts larger than one are frequent. By contrast, the other three measures rely on counts: while RSS and RSA focus on point predictions defined through the posterior mean of intensities at pixel level, CRPS also accounts for the uncertainty of the predictive distributions and yields good scores for models that provide predictions that are both calibrated (i.e., correct on average) and sharp (i.e., having little prediction uncertainty). We calculate these four predictive scores for each of the 10 folds, both at pixel and SU levels, and finally we average the predictive scores of the 10 folds together. Lower final values correspond to better predictive performances.
5 Results and discussion
5.1 Comparison of models using basic diagnostics
Table 1 reports results for our models fitted to the full dataset, including the precision parameter (i.e., inverse variance) of the estimated latent spatial effect (LSE) included in all models, the effective number of parameters (i.e., the effective dimension of the linear predictor when accounting for the dependence between the latent variables), and the two information criteria DIC and WAIC. The similar LSE precisions indicate that the variability of the LSE is relatively stable over different models, even with the most complex models. The effective number of parameters is relatively similar for all models except the model with an i.i.d. effect resolved at the pixel scale; recall that there is a large number of pixels. DIC and WAIC values are relatively similar overall, although both information criteria give a clear preference to models with spacevarying regression components. While DIC ranks first the most complex model M4 with independently specified RW1 and SVRcomponents for the Slope Steepness, WAIC prefers model M3, which includes only a fixed (i.e., global and linear) Slope Steepness effect and the SVR component.
We now also report and discuss estimated precision parameters for the specific components added in the models extending the baseline. In M1a with pixelresolved i.i.d. effect, we estimate a posterior precision of about for the i.i.d component, which indicates the presence of a rather strong independent effect at the pixel scale, not explained by the aggregated view based on SUs, and without dependence spanning over neighboring pixels. By contrast, the precision of the SUresolved i.i.d. effect in M1b is very high (180), indicating a relatively small contribution of this effect to the model. In M2, the precision of the nonlinear RW1effect of Slope Steepness is . In M3 with an SVR component, the precision of the spacevarying coefficient is . By jointly including the RW1 and SVReffects of Slope Steepness in M4, we get RW1precision of and SVR precision of . The former is higher than without the SVR component (M2), indicating that the influence of Slope Steepness is now partially captured by the additional SVR component, whose precision is relatively low. We conjecture that this low precision shows that the SVR component captures the influence of Slope Steepness more easily than the global RW1effect of Slope Steepness; moreover, the noncontinuous specification of the RW1curve may require stronger variation of the SVRcomponent’s contribution at relatively small spatial scales to smooth the RW1effect. Finally, model M5 with a parsimonious SVR component has a RW1precision estimated of , and the posterior mean of the coefficient is given by
with credible interval
. Therefore, a significant transfer of predictive information has taken place from the LSE (whose precision is similar to the other models; recall Table 1) to the spacevarying Slope Steepness influence, while the RW1contribution has been reduced compared to the other models with RW1component. The parsimonious constraint linking the SVR to the latent spatial effect in M5 leads to an improved goodnessoffit compared to the baseline M0 and the models with additional i.i.d. components, but based on its DIC and WAIC values we conclude that it cannot fully attain the high flexibility of models M3 and M4.Model  LSE precision  DIC  WAIC  

M0 (baseline)  0.35 (0.06)  1006  40142  39869 
M1a (pixeliid)  0.3 (0.02)  4166  41925  40834 
M1b (SUiid)  0.32 (0.02)  987  40120  39867 
M2 (RW1)  0.31 (0.02)  1002  40132  39864 
M3 (SVR)  0.31 (0.02)  1144  40005  39690 
M4 (RW1SVR)  0.31 (0.01)  1203  39949  39745 
M5 (PSVR)  0.34 (0.02)  1016  40087  39817 
5.2 Influence of fixed effects
In Figure 3, we compare the estimated coefficients for the 8 predictor variables included as fixed effects in our different models. The contribution of these covariates, and the associated uncertainty of their coefficients, are estimated to be very similar across the 7 models studied here, except for the Slope Steepness due to the major differences in model structure with respect to the contribution of this covariate. Interestingly, the parsimonious SVR structure seems to have fully absorbed the influence of DEM into the SlopeSVR part of the model, . We do not show results for the categorical covariates, but the conclusions remain qualitatively similar.
For the models M2, M4 and M5 with a nonlinear Slope Steepness effect modeled through a RW1 component, Figure 4
shows the resulting estimated curves—for better readability of the plots, piecewise constant curves are replaced by piecewise continuous interpolations. Nonlinear influence is obvious from these plots and displays a similar bell shape in all three models, with intermediately steep slopes between 30 and 60 degrees presenting high relative risk of landslide occurrence. Models M4 and M5 include additional SVRcomponents to capture the slopespecific influence at SU resolution, such that the estimated RW1 curves appear to be flatter, which implies smaller variations in relative risk.
5.3 Spatial predictive performance comparison
Table 2 reports spatial predictive scores based on a 10fold stratified crossvalidation, where folds are composed of a random selection of entire SUs. For all models, pixel and SUbased AUC values are very close to each other and reach around , indicating a very good performance for predicting the presence of landslides, especially at the SU level. When considering sums of absolute and squared errors (RSA and RSS, respectively), stronger differences between models arise, with model M2 with nonlinear Slope influence obtaining the best scores, and M1a with pixelbased i.i.d. effects performing much worse than the other models. Throughout, model M2 has a very good performance and has the best score except for the SUbased AUC value, although differences are rather small. Therefore, the inclusion of a nonlinear effect of the Slope Steepness, here implemented through a random effect with RW1 prior, is important for good prediction. The baseline model shows stable and good performance throughout and does not suffer from some relatively bad countbased scores arising for some of the extended models (except M2). Overall, the ranking of models based on their predictive performance looks quite different from the one based on goodnessoffit measured through information criteria in Table 1. A possible reason is that very high stochasticity and complexity of prior models may lead to more unstable, noisy predictions. We recommend a careful inspection of the fitted models based on several criteria, for goodnessoffit and for outofsample prediction. The “best” model M2 is more complex than the baseline model M0, but we add only a relatively small number of latent components to achieve a nonlinear contribution of Slope Steepness. We also stress that, for our landslides data, the inclusion of i.i.d. effects (pixel or SUbased) could not provide substantial improvements of goodnessoffit or predictive performance. Finally, the models M3, M4 and M5, which possess extra flexibility thanks to a spacevarying regression component, are relatively competitive overall, despite their relatively worse performance on RSS measures. Such models can still be useful by offering insights into the “physical” interaction of Slope Steepness with the unobserved precipitation trigger, as further explained in Section 5.5 below.
Model  AUC  AUC  RSA  RSA  RSS  RSS  CRPS  CRPS 

M0 (baseline)  0.8958  0.9308  420.0  420.5  2614  2608  466.3  240.6 
M1a (pixeliid)  0.8960  0.9308  590.1  590.0  5628  5602  474.7  279.7 
M1b (SUiid)  0.8958  0.9309  420.6  421.2  2615  2635  466.3  240.5 
M2 (RW1)  0.8956  0.9310  411.3  411.5  2481  2483  464.7  238.5 
M3 (SVR)  0.8963  0.9305  441.1  441.7  2922  2936  467.9  242.8 
M4 (RW1SVR)  0.8964  0.9302  436.6  437  2770  2801  465.9  241.1 
M5 (PSVR)  0.8966  0.9311  430.1  429.8  2833  2819  466.4  241.3 
5.4 Landslide intensity mapping
In Figure 5, we show the posterior mean of the estimated logintensity (at pixel scale) of model M0, and the difference in logintensity between the most complex model Model M4 (RW1SVR) and M0. In the logintensity of M0, the influence of geological structures such as river valleys (very low intensity) and mountain ridges comes out clearly. The spatial structure of the predicted values is dominated by the spatial effect capturing the influence of the spatial variation of the precipitation trigger. In this model, the amplitude of logpredicted intensities is close to that of the posterior mean of the spatial effect, which is evidence that the spatial effect is crucially necessary to account for unobserved covariate effects and to locally counteract the influence of observed covariates. The latter may be locally misestimated depending on the force of the precipitation trigger. Differences between models M0 and M4 are usually relatively minor, but in some small subareas, especially river valleys, model M4 has substantially higher logintensity values.
5.5 Interpretation of the Slope Steepness contribution
The maps in Figure 6 show how the observed Slope Steepness variable contributes to the linear predictor according to the different model structures (M0, M2, M3, M5). The SVR models M3 and M5 lead to an overall contribution of Slope Steepness that is strongly conditioned on the precipitation trigger; weakly impacted areas such as the northwestern part of the study area have a very weak contribution close to 0. In contrast, the simpler models M0 and M2 including a fixed effect or a RW1effect show spatial variation that is closer to the one of the Slope Steepness.
From a physical perspective, we expect to observe at most a few landslides in areas with low or absent precipitation trigger, and, intuitively, the effect of covariates may therefore become irrelevant in such areas. While basic models such as M0 and M2 are unable to capture this behavior, more complex models with spacevarying covariate coefficients can adequately reflect such natural physical constraints, and help to better highlight areas where a covariate substantially increases landslide hazard. Our model M5 (parsimonious SVR) is even designed to explicitly integrate this interaction between the precipitation trigger (represented here through the latent spatial effect (LSE), ) and the spacevarying coefficient. The estimated coefficient in model M5 measures the strength of this interaction, which turns out to be highly significant in our model.
At sites where the precipitation trigger is present, a strong response to Slope Steepness is expected for intermediate angles. In our models, the LSE acts as a proxy for the influence of the precipitation trigger, and we therefore expect a strong interaction between the precipitation trigger and the Slope Steepness effect in the logintensity of the point process. In Model M4, the LSE and the spacevarying Slope Steepness effect are components without any prior dependence, but we can investigate the posterior model structure, and in particular posterior correlation between these components. In Model M5, the interaction structure is fixed to a linear rescaling determined by the hyperparameter. The plots on the lefthand side of Figure 6 show a smoothed image of the spacevarying Slope Steepness coefficient, plotted with respect to a twodimensional coordinate system given by the estimated posterior mean of the LSE in each SU, and the Slope Steepness value averaged over the pixels of each SU. The original, nonsmoothed values were obtained at the points shown as small grey dots. A clear interaction pattern between the LSE and the SVR coefficient arises in M4, with lower LSE corresponding to lower SVR coefficient. We underline that the parsimonious model M5 is able to reproduce a very similar structure. The strong similarity of the results for the two models M4 and M5, despite M5 offering much less flexibility due to its rigid link between SVRcoefficient and LSE, persists in the plots on the righthand side of Figure 6. They show the actual spacevarying Slope Steepness effect (i.e., ), which corrects the global RW1Slope Steepness effect. As expected, the plots show a correction towards lower relative landslide hazard for the combinations of high LSE/flat slopes and small LSE/steep slopes. A relatively strong correction towards higher relative landslide hazard is necessary for high LSE/intermediate slopes, and relatively low LSE but Slope Steepness close to the pivotal value of 20 degrees (or slightly above). We conclude that the parsimonious model M5 offers a correction of the triggerindependent RW1Slope Steepness effect that reflects known physical behavior, and the typical landsliding response behavior to Slope Steepness according to the (approximate) intervals – degrees, – degrees and degrees. Field and empirical evidence suggests that the first interval – degrees corresponds to slopes that are too flat for landslides to occur, even in the presence of a strong trigger event, while the last interval of degrees corresponds to slopes where, typically, material that is susceptible to sliding has already gone in the past. Therefore, the trigger primarily acts on slopes in the interval of – degrees where landslides occur most easily.
6 Conclusion
For georeferenced landslide events, we have proposed a Bayesian hierarchical point processbased modeling framework to assess landslide hazard at high spatial resolution. Although substantial geomorphological covariate information is available in our case, it remains crucial for spatial prediction and for the interpretation of covariate effects to capture the latent activation pattern induced by the unobserved precipitation trigger. The framework of spacevarying regression is appealing and useful, but care is needed to avoid overly complex models, for which model components may be difficult to identify from the data and to interpret intuitively. When a spatial random effect captures the trigger intensity, it is natural to include it as a complement of spacevarying regression, such that local covariate effects can be locally removed when the trigger is absent or weak.
The inclusion of additional latent random effects without spatial dependence in the logintensity allows us to model smallscale variability, either at the pixel level (here available at a high m resolution), or at the coarser slope unit (SU) level. This enables capturing overdispersion that would otherwise lead to underestimated local variance of counts arising in the Poisson regression model used to spatially discretize the logGaussian Cox process. With our dataset, the inclusion of such i.i.d. effects did not substantially improve the baseline model. Rather, information criteria and some of the predictive performance measures indicated that model M1a with pixelbased i.i.d. effect was much worse, which may be explained by the large number of additional independent latent variables. This leads to a strong increase in computational complexity of estimation, and may hamper the identification of intermediatescale spatially structured effects.
The Bayesian approach is useful to incorporate expert knowledge and wellknown physical behavior about the shape of the functional response of processes to predictor variables into prior models. A large majority of landslides are usually triggered by a specific physical or weather event, such as extreme precipitation in the case of our dataset, and we need spatial random effects too capture the trigger if it has not been observed at high spatial resolution. Moreover, our formulation of the parsimonious spacevarying regression is a natural mechanism that allows the model to push the regression coefficient of Slope Steepness towards when the trigger is weak. More generally, including such mechanisms into models seems particularly promising for improving interpretability and predictions in cases where a large fraction of the study area experiences only a weak trigger intensity.
Identifiability problems may arise from the use of several latent components, whose spatial resolution and prior specification allows capturing similar types of variability, and which are not identifiable in a frequentist framework without prior distributions (e.g., spatial random effects, spacevarying regression coefficients, and i.i.d. effects, all resolved at the SU level). Typically, the Bayesian paradigm explains spatial variation through the component that is most easily “pushed away” from its prior towards the posterior shape of the point process intensity. Nevertheless, some confounding between such effects is common. For instance, estimated coefficients of fixed effects for covariates whose values are relatively smooth in space often tend be slightly smaller in absolute value in models with spatially resolved random effects. If the ultimate goal is spatial prediction for unobserved areas (e.g., prediction of landslide intensity for unobserved SUs adjacent to the observed ones), we must be careful to avoid a transfer of information from spatially dependent components (such as the latent spatial effect in our model) towards spatially independent effects (such as the i.i.d. effect). Our approach relies on penalized complexity priors where we penalize components rather strongly if they lead to very complex models, which is an appealing solution to cope with sophisticated latent models that include a moderate number of different random effect components. Moreover, we underscore that the INLA method, combined with modern computing power, provides a convenient toolbox that allows for relatively simple implementation and estimation of very highdimensional and sophisticated models in the Bayesian framework using latent Gaussian models.
We conclude that model choice is not easy when observed spatial data are discrete, only available through a relatively moderate sample size, not replicated in time, but depend on predictors that may change rapidly even at small spatial scales—in this setting, there usually is not a single “best” model, and data cannot inform us about the true, complex structure of the intensity function of the point process with high certainty. Therefore, careful construction of a moderate number of candidate models using prior expert knowledge is important. We recommend that several model diagnostics related to both goodnessoffit and predictive performance be compared and carefully studied in practice.
References
 [1] (2016) Automatic delineation of geomorphological slope units with r.slopeunits v1.0 and their optimization for landslide susceptibility modeling. Geoscientific Model Development 9 (11), pp. 3975–3991. Cited by: §2.2.
 [2] (2019) Accounting for covariate distributions in slopeunitbased landslide susceptibility models. a case study in the alpine environment. Engineering Geology 260, pp. In print. External Links: Document Cited by: §1.
 [3] (2016) Effect of raster resolution and polygonconversion algorithm on landslide susceptibility mapping. Environmental Modelling & Software 84, pp. 467–481. Cited by: §2.2.
 [4] (1998) Generalised linear modelling of susceptibility to landsliding in the central apennines, italy. Computers & Geosciences 24 (4), pp. 373–385. Cited by: §1.

[5]
(2005)
The application of gisbased logistic regression for landslide susceptibility mapping in the kakudayahiko mountains, central japan
. Geomorphology 65 (1), pp. 15–31. Cited by: §1.  [6] (1975) Statistical analysis of nonlattice data. The statistician, pp. 179–195. Cited by: §3.2.1.
 [7] (1979) A physically based, variable contributing area model of basin hydrology. Hydrological Sciences Journal 24 (1), pp. 43–69. Cited by: §2.2.
 [8] (2015) Predicting stormtriggered debris flow events: application to the 2009 ionian peloritan disaster (sicily, italy). Natural Hazards and Earth System Sciences 15 (8), pp. 1785–1806. Cited by: §2.1.
 [9] (2016) Exploring relationships between grid cell size and accuracy for debrisflow susceptibility models: a test in the giampilieri catchment (sicily, italy). Environmental Earth Sciences 75 (3), pp. 1–21. Cited by: §2.2.
 [10] (1995) GIS technology in mapping landslide hazard. In Geographical information systems in assessing natural hazards, pp. 135–175. Cited by: §1.
 [11] (2017) Handling high predictor dimensionality in slopeunitbased landslide susceptibility models through LASSOpenalized Generalized Linear Model. Environmental Modelling and Software 97, pp. 145–156. Cited by: §1.
 [12] (2006) An introduction to ROC analysis. Pattern recognition letters 27 (8), pp. 861–874. Cited by: §4.2.
 [13] (2003) Spacevarying regression models: specifications and simulation. Computational Statistics & Data Analysis 42 (3), pp. 513–533. Cited by: §1.
 [14] (2003) Spatial modeling with spatially varying coefficient processes. Journal of the American Statistical Association 98 (462), pp. 387–396. Cited by: §1.
 [15] (2014) Understanding predictive information criteria for bayesian models. Statistics and Computing 24 (6), pp. 997–1016. Cited by: §4.2.
 [16] (2014) Probabilistic forecasting. Annual Review of Statistics and Its Application 1, pp. 125–151. Cited by: §4.2.
 [17] (2015) Evaluating machine learning and statistical prediction techniques for landslide susceptibility modeling. Computers & geosciences 81, pp. 1–11. Cited by: §1.
 [18] (1994) Towards a definition of topographic divisions for italy. Geomorphology 11 (1), pp. 57–74. Cited by: §1.
 [19] (1982) Quantifying source areas through land surface curvature and shape. Journal of Hydrology 57 (34), pp. 359–373. Cited by: §2.2.
 [20] (2014) The varnes classification of landslide types, an update. Landslides 11 (2), pp. 167–194. Cited by: §2.1.
 [21] (2012) A toolbox for fitting complex spatial point process models using integrated nested Laplace approximation (INLA). The Annals of Applied Statistics 6 (4), pp. 1499–1530. Cited by: §1, §4.1.

[22]
(2018)
Advanced spatial modeling with stochastic partial differential equations using r and inla
. Chapman and Hall/CRC, New York. Cited by: §4.1.  [23] (2017) Bayesian inference and model assessment for spatial point patterns using posterior predictive samples. Bayesian Analysis 12 (1), pp. 1–30. Cited by: §4.2.
 [24] (2014) A test of transferability for landslides susceptibility models under extreme climatic events: application to the messina 2009 disaster. Natural hazards 74 (3), pp. 1951–1989. Cited by: §2.1.
 [25] (2016) Exploiting maximum entropy method and aster data for assessing debris flow and debris slide susceptibility for the giampilieri catchment (northeastern sicily, italy). Earth Surface Processes and Landforms 41 (12), pp. 1776–1789. Cited by: §2.2.
 [26] (2019) Geostatistical modeling to capture seismicshaking patterns from earthquakeinduced landslides. Journal of Geophysical Research: Earth Surface 124 (7), pp. 1958–1980. External Links: Document Cited by: §1.
 [27] (2016) Presenceonly approach to assess landslide triggeringthickness susceptibility: a test for the mili catchment (northeastern sicily, italy). Natural Hazards 84 (1), pp. 565–588. Cited by: §2.1.
 [28] (2019) Spacetime landslide predictive modelling. arXiv preprint arXiv:1912.01233. Cited by: §1.
 [29] (2018) Point processbased modeling of multiple debris flow landslides using INLA: an application to the 2009 Messina disaster. Stochastic Environmental Research and Risk Assessment 32 (7), pp. 2179–2198. External Links: Document Cited by: §1, §2.2, §2.2, §3.2.1, §4.1.
 [30] (2019) Numerical Recipes for Landslide Spatial Prediction Using RINLA: A StepbyStep Tutorial. In Spatial Modeling in GIS and R for Earth and Environmental Sciences, H. R. Pourghasemi and C. Gokceoglu (Eds.), pp. 55–83. External Links: Document, ISBN 9780128152263 Cited by: §1.
 [31] (1998) Log Gaussian Cox processes. Scandinavian Journal of Statistics 25 (3), pp. 451–482. Cited by: §3.1.
 [32] (1991) Digital terrain modelling: a review of hydrological, geomorphological, and biological applications. Hydrological processes 5 (1), pp. 3–30. Cited by: §2.2.
 [33] (2019) Geospatial health data: modeling and visualization with rinla and shiny. Chapman & Hall/CRC Biostatistics Series, Boca Raton, FL. Cited by: §4.1.

[34]
(2018)
INLA goes extreme: Bayesian tail regression for the estimation of high spatiotemporal quantiles
. Extremes 21 (3), pp. 441–462. Cited by: §4.1.  [35] (2017) Latent Gaussian modeling and INLA: A review with focus on spacetime applications. 158 (3). Note: Journal of the French Statistical Society (Special Issue on SpaceTime Statistics) Cited by: §4.1.
 [36] (2018) A review of statisticallybased landslide susceptibility models. EarthScience Reviews 180, pp. 60–91. Cited by: §1.
 [37] (2010) Optimal landslide susceptibility zonation based on multiple forecasts. Geomorphology 114 (3), pp. 129–142. Cited by: §1.
 [38] (1974) Monitoring vegetation systems in the great plains with erts. Cited by: §2.2.
 [39] (2005) Gaussian Markov random fields: theory and applications. CRC Press. Cited by: §3.2.1, §4.1.
 [40] (2009) Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B 71 (2), pp. 319–392. Cited by: §1, §3.1, §4.1, §4.2.
 [41] (2016) Bayesian computing with inla: a review. Annual Review of Statistics and Its Application 1. Cited by: §4.1, §4.2.
 [42] (2017) Penalising model component complexity: a principled, practical approach to constructing priors. Statistical Science 32 (1), pp. 1–28. Cited by: §3.2.
 [43] (2014) Scaling intrinsic Gaussian Markov random field priors in spatial modelling. Spatial Statistics 8, pp. 39–51. Cited by: §3.2.1.

[44]
(1986)
Accurate approximations for posterior moments and marginal densities
. Journal of the American Statistical Association 81 (393), pp. 82–86. Cited by: §4.1.  [45] (2000) Digital terrain analysis. Terrain analysis: Principles and applications 6 (12), pp. 1–27. Cited by: §2.2.
 [46] (1987) Quantitative analysis of land surface topography. Earth Surface Processes and Landforms 12 (1), pp. 47–56. Cited by: §2.2.
Comments
There are no comments yet.