Landscape allocation: stochastic generators and statistical inference

by   Patrizia Zamberletti, et al.

In agricultural landscapes, the composition and spatial configuration of cultivated and semi-natural elements strongly impact species dynamics, their interactions and habitat connectivity. To allow for landscape structural analysis and scenario generation, we here develop statistical tools for real landscapes composed of geometric elements including 2D patches but also 1D linear elements such as hedges. We design generative stochastic models that combine a multiplex network representation and Gibbs energy terms to characterize the distributional behavior of landscape descriptors for land-use categories. We implement Metropolis-Hastings for this new class of models to sample agricultural scenarios featuring parameter-controlled spatial and temporal patterns (e.g., geometry, connectivity, crop-rotation). Pseudolikelihood-based inference allows studying the relevance of model components in real landscapes through statistical and functional validation, the latter achieved by comparing commonly used landscape metrics between observed and simulated landscapes. Models fitted to subregions of the Lower Durance Valley (France) indicate strong deviation from random allocation, and they realistically capture small-scale landscape patterns. In summary, our approach of statistical modeling improves the understanding of structural and functional aspects of agro-ecosystems, and it enables simulation-based theoretical analysis of how landscape patterns shape biological and ecological processes.



There are no comments yet.



Gibbsian T-tessellation model for agricultural landscape characterization

A new class of planar tessellations, named T-tessellations, was introduc...

A simple geometric method for navigating the energy landscape of centroidal Voronoi tessellations

Finding optimal centroidal Voronoi tessellations (CVTs) of a 2D domain p...

Topological descriptors of spatial coherence in a convective boundary layer

The interaction between a turbulent convective boundary layer (CBL) and ...

Critical elements for connectivity analysis of brain networks

In recent years, new and important perspectives were introduced in the f...

An analysis of NK and generalized NK landscapes

Simulated landscapes have been used for decades to evaluate search strat...

Sonifying stochastic walks on biomolecular energy landscapes

Translating the complex, multi-dimensional data from simulations of biom...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Agroecosystems are the basis for food production and other ecosystem services such as biodiversity, pollination and pest control (Power2010; Foresight2011). Landscape heterogeneity plays an important role for many agroecological processes and can be expressed through landscape configuration, referring to the size, shape, and spatial-temporal arrangement of land-use patches, and through landscape composition, referring to the number and proportion of land-use types (Martin2019). Generative models are widely applied in landscape ecology for simulating virtual landscapes (i.e., a mosaic of fields having shapes and properties that vary in space and time including or not biotic and abiotic relationships), differing in configuration and composition, to systematically study the effects and impacts of landscape heterogeneity on ecosystem processes; see the reviews of Poggi2018; Langhammer2019. Such models should allow generating a high number of virtual but structurally realistic maps of land cover (Gardner1999; Saura2000; Gardner2007; Sciaini2018), and often parameters related to landscape features such as the percentage of land-cover, the habitat fragmentation, or spatial autocorrelation (Langhammer2019) can be controlled. In this paper, we focus on agricultural landscapes, and we consider neutral landscape models where the model does not directly interact with biotic or abiotic processes (Gardner1987; With1997).

Models use either a vector-based or a raster-based representation, with the majority of models in the literature being of raster type. The raster approach is particularly useful for modelling gradual landscape dynamics and continuous processes. However, agricultural landscapes are strongly characterized by polygon-shaped patches and piecewise linear corridors following polygon boundaries, such that vector approaches seem preferable

(Gaucherel2006a; Gaucherel2006b; LeBer2009; Papaix2014; Inkoom2017; Langhammer2019). In particular, fringe structures such as hedgerows, roads or ditches aligned along polygon boundaries have an important impact on many agroecological processes. In a vector-based framework, Gaucherel2006a; Gaucherel2006b use models based on Gibbs energy terms to control certain pairwise interactions between landscape elements with the aim of simulating patches and certain fringe structures. Papaix2014 develop a landscape generator without fringe structures that generates the landscape mosaic with two types of fields based on the T-tessellation algorithm of Kieu2013. However, existing modeling frameworks lack tools for parameter inference and model validation. Validation procedures are usually solely based on testing whether simulated landscapes are able to reproduce realistic landscape features by comparing landscape metrics (e.g., from the FRAGSTAT library Mcgarigal1995). Sometimes, such metrics are directly used within simulation algorithms to enforce convergence towards target values (Langhammer2019).

In this paper, we advocate to turn away from the raster paradigm when modeling agricultural landscapes. Instead, vector-based approaches are independent of the grid resolution and give better control over small-surface elements, and they provide a sparser and more functional representation of patchy geometric structures without continuous gradients. The approach that we develop is geared towards flexible and realistic parametric stochastic modeling of fringe structures, such as hedgerow networks. Based on a network representation of interactions among landscape elements, we construct Gibbs energies pertaining to exponential family models, which provides a natural distributional framework for controlling landscape descriptors.

We assume that the polygon structure of patches in a subset of planar space is given, i.e., a tessellation of space serves as fixed support of the model. It can be obtained by preprocessing a real landscape, or we can use simulations of a parametric tessellation model to generate realistic features. We model the stochastic land-use allocation mechanism of patches and linear elements by assigning categories to the polygons and their edges, where dynamic structures such as crop rotation are possible. The composition of landscape is expressed through the proportions of categories (e.g., numbers, relative lengths or surface areas of objects), while configuration relates to the spatial-temporal arrangement of categories, such as clustering or repulsiveness.

An overarching goal is to generate visually realistic landscapes, and we develop the following methodological novelties: i) mathematical representation of landscape composition and configuration through a multilayer network; ii) generative stochastic parametric models coupling land-use allocation of patches and linear elements; iii) simulation of such models using Markov-Chain Monte-Carlo (MCMC); iv) statistical inference using real landscape data; v) validation over real landscape data by comparing metrics for vector and raster representations between real and simulated landscapes. Our approach can handle relatively large landscapes by capitalizing on low computational requirements thanks to vector-based representations and to sparse-matrix structures for interactions.

In the remainder of the paper, Section 2 presents real landscape data and preprocessing steps for an agricultural region in southeastern France, for which previous studies have highlighted a key role of agricultural practices and hedgerow configuration for biodiversity and pest control (Ricci2009; Maalouly2013; Lefebvre2016). In Section 3, we propose the mathematical representation, modeling and simulation of landscapes. Tools for statistical inference and validation are developed in Section 4. In Section 5, we apply the developed framework to the above data, and we discuss how the goodness-of-fit and the generation of realistic landscape metrics is influenced by the choice of the descriptors in the model. A discussion in Section 6

concludes the paper. Supplementary material contains details on the simulation algorithm and additional estimation results.

2 Landscape data

Real data for agricultural landscapes are based on remote sensing images, digital land registers, land cover data bases such as CORINE (Buttner2006), and field data. Often, manual annotation steps are necessary to complete and clean data. We study the Lower Durance Valley in southeastern France depicted in Figure 1a, stretching over and mainly characterized by agricultural activity ( %) and urbanized areas, with main cultures of open area (46%) and apple/pear orchards (24%).

Figure 1: Lower Durance Valley study area. a) Full area with subdomains. b) Subdomains: 1) small region D1; 2) small region D2; and, 3) large region D3.

Data are based on manual digitalization with the ArcView software using an official French database of aerial photographs (BD ORTHO, IGN, 2004, 0.5m resolution, updated with field monitoring in 2009).

The region totals of hedgerows, which we will represent as linear segments, with average length of . We distinguish East-West oriented wind-break hedges to break the strong Mistral winds (83%) from roadside hedges.

For the data application in this paper, we select three subdomains with contrasting properties and dimensions: region 1 in Figure 1b1 is relatively small and we refer to it as D1; region 2 in Figure 1b2 has the same surface area and is called D2; and region 3 in Figure 1b3 delimits a much larger domain including D1 and D2, which we denote by D3. Table 1 summarizes details.

D1 D2 D3
Area () 3.37 2.3 41.13
of Semi-natural 73 50 76
of Crop 27 50 24
Hedgerows () 44.64 33.61 386.36
No. of patches 368 468 4379
No. of linear segments 1105 1405 12517
Table 1: Summary of selected subregions of Lower Durance Valley study area; see Figure 1.

We use a simplified representation of the landscape as a tessellation of space with polygon-shaped cells. Linear segments (e.g., hedgerows) correspond to polygon edges. To achieve polygon shape for patches defining a continuous cover of space, and to align hedgerows with polygon edges, we preprocess the landscape towards a polygon tessellation of 2D space (Okabe1992)

. For this purpose, we minimize a heuristic loss criterion measuring the distance between original and transformed landscape. Figure

2 illustrates that landscape modifications during preprocessing for domain D2 are mostly minor. In our models, the tessellation will be considered as a fixed support for linear element attribution and crop rotation. Tessellation simulation algorithms for agricultural landscapes (Kieu2013; Papaix2014; Poggi2018) also enable the generation of new synthetic but realistic supports for our models.

Figure 2: Side-by-side representation of the original digitalized shapefile (a) and the landscape tessellation after preprocessing (b) for the small region D2.

3 Stochastic modeling and simulation of landscape allocation

3.1 Mathematical landscape representation

We propose to represent a landscape as a collection of geometric objects as follows,


where each element is composed of two sets of data and . The information in is considered as being fixed, while the data contains information on the th geometric element that we aim to model. The vector represents categories that we allocate to the geometric elements in the landscape, such as the land-use types. We suppose that with a finite space of possible categories for the th element. The objects could represent different types, such as polygons (i.e., habitat patches) or linear segments (i.e., linear landscape elements), see Figure 3a. For polygon objects, the dataset could contain this information, and in addition the geographical coordinates of its center point and of its vertices, its surface area, and potentially other exogenous covariate data. For instance, we could allocate each polygon with a category among the following three options: crop (), natural habitat () or other (). For linear segment objects, the data could contain the geographical coordinates of its endpoints, and potentially other exogenous covariate data. A linear segment could be allocated with a category among single-species hedgerow (), mixed-species hedgerow () or no hedgerow (). In the case with only a single category no choice of allocation has to be made. The space of all possible combinations of allocations is . This finite collection contains possible landscape allocations. If the geometric structure in can be supposed to be invariant through time, we describe temporal dynamics (if present) by the sequence , of categories over time.

3.2 Network model of interactions

We use a graphical representation of landscape to capture spatial or functional adjacency of landscape elements. Interaction between objects is modeled through a multilayer or multiplex network, i.e., a set of interacting single-network layers (Boccaletti2014; Kivela2014). Each layer corresponds to an object type, and each single-network layer represents the interaction among objects of same type. Interactions between different network layers represent interactions between objects of different type.

The focus of our models is on patches and linear landscape elements, such that we define a collection of objects with two types, (Figure 3a), where represents an object of patch type (layer ), and represents an object of linear segment type (layer ), see Figure 3b. The multilayer network is given as , where is the set of graphs defined over layers and , respectively (Figure 3b). Layer is represented by the graph and describes the interaction among patches: nodes represent patches, and links represent the intra-layer interaction corresponding to the patch interactions. For the models in this paper, we assume that two patches interact if they are adjacent, that is, if they share part of their boundary. The layer has similar structure and describes the interaction among linear landscape elements: nodes stand for linear landscape elements , and links stand for linear landscape element interaction . In this paper, we assume that two linear elements interact if they intersect or have a vertex in common. The interaction between the single network layer and is described by the set of interconnections where . In our framework, we assume a direct interaction among a node of and a node of when a linear element (i.e. ) is located on the boundary of a patch (i.e. ).

The full interaction structure is encoded in the adjacency matrix :


where and represent the adjacency matrices of intra-layer interactions of and , respectively, and encodes inter-layer interactions among and . For simplicity, we assume symmetry of interaction such that , but the extension to asymmetric and directed interactions is straightforward. Entries of could be binary to represent presence () or absence () of an edge, or may carry a weight value different from unity in case of presence of an edge. Non-binary weights could be based on distance between elements or on volumes/sizes of interacting elements.

Based on this landscape representation, we develop parametric models of probability distributions over the allocations

, conditional on the information in and . We assume conditional independence of the value of object given the information from all adjacent elements: the landscape structure of non adjacent elements does not provide any information on object if we know all the adjacent objects. This framework allows for flexible interaction structures represented by the adjacency matrix

with sparse structure, such that we store only the positions and values of the relatively small number of non-zero entries. We formalize the conditional independence assumption through the following property of equality in distribution of conditional probability distributions:


where . We adopt notations such as to refer to the set of all objects with removed.

Figure 3: Landscape representation. a) Polygon objects (patches, in grey) and linear segment objects (in red). b) Multi-layer network of interactions. Layer : single network of interactions between patches; layer : single network of interactions between linear elements; links between and represent interactions of patches and linear elements.

3.3 Exponential family models for landscape descriptors

We set up a general probabilistic modeling framework based on Gibbs energies to define exponential family models. For simplicity of notation, we will not distinguish between the full object information and the allocated value , and we use henceforth. For defining a model, we use functions , , measuring the value of summary statistics for allocations . We refer to these functions as landscape descriptors. We now define the probability of observing the allocation as follows:


The normalizing constant is defined as


such that probabilities in (4) sum up to . Since the number of possible configurations is finite, the sum in Equation (5) is finite and the model is well-defined. In practice, the number of possible configurations in is usually very large, such that it is impossible to calculate the value of the constant .

Equation (4) defines a global specification of the model where we consider the full allocation . In contrast, we also consider the local allocation probability of a landscape category conditional to the other landscape elements. Therefore, we determine the probability of observing the category given the allocation of all the other elements , where we use the notation . Then, the conditional probability is given as


and the unknown normalizing constant cancels out in expression (6).

3.4 Examples of parametric models

Landscape descriptors are used to capture important landscape characteristics and features. In composition terms, such functions are the sum of contributions of individual objects; in spatial interaction terms, we add up contributions that measure the interaction between two adjacent objects, and in time dependency terms we compare configurations over two consecutive time steps. An example specification is as follows, with spatial landscape descriptors given by


and a descriptor for temporal dynamics:


Then, is a generic composition term, is a generic interaction term for objects of type , is a generic interaction term for objects of different type, and describes time dynamics of objects of type over the time horizon . Table 2 illustrates concrete choices to specify the contributions to landscape descriptors involving two object types and , here taken as patches and linear elements. For each object type, we allow for allocation categories , such as crop () or natural habitat () and hedgerow () or no hedgerow () for patches and linear elements, respectively.

Activity terms are specific composition terms that provide direct control over the number of objects of a category by setting equal to the number of objects in specific category. To ensure identifiability, we fix a reference category (e.g., for objects of type ) and specify the activity term and its coefficient only for categories , such that it is expressed relative to . Implicitly, we have . A positive coefficient would give relative preference to category over category , such that landscapes tend to have more objects of category than of category for type , provided that the energy terms of the other landscape descriptors do not influence the proportion of categories.

Composition Activity term

Patch area

Long segments
Horizontal segments

Interaction (Adjacency)



Time dependency
Patch rotation

Table 2: Examples of landscape descriptor terms. Notations: refers to the (empirical)

percent quantile;

is the (empirical) expected value.

3.5 Iterative landscape simulation algorithm

We implement a Metropolis-Hastings (MH) algorithm to iteratively simulate a discrete-time Markov chain whose stationary distribution corresponds to the target model (e.g., Descombes2013) where the configuration of the allocated land-use categories is the state variable of the system. The four main steps of MH-MCMC are as follows: i) define an initial state ; and then iteratively ii) propose a new state given the current state , based on a proposal distribution , and iii) decide on acceptance () vs. rejection () based on checking where , and the acceptance ratio is given by


finally, iv) Return the final configuration after iterations. A more detailed schematic overview is given in the Supplementary Material. If we need more than one realization of the landscape, we can either run chains in parallel, or we may run a single chain but return a sample given by the states indexed by , , with the burn-in period and intermediate states left out to avoid autocorrelation in the final sample. Since the parameter vector of the model is fixed during each MCMC run, the intractable normalizing constant in (4) cancels out in the acceptance ratio. However, we have to update the calculation of the set of landscape descriptors for each new configuration during the iterations.

With respect to the choice of the initial state of the system, we have to ensure that , and that valid paths to any configuration with are proposed with strictly positive probability. The models presented in Section 3.4 and implemented in the remainder of the paper satisfy for all possible configurations , such that any initial state is valid. In more general cases, hereditary properties when moving between configurations must be checked. We initialize the system state either at random (draw the category of an object among all available categories), or by attributing only one category for each object type, or by using an observed configuration from real data. For the proposal of a new state, we use the following default proposal distribution of random walk type. We first select an object at random. In case of temporal dynamics, the time step is also selected at random. Then, we propose a new category for this object, different from the current one and drawn at random. This proposal distribution leads to symmetry of proposals to move forward and backward, such that the ratio cancels out from (11).

The landscape descriptors are sufficient statistics in an exponential family model – they contain all the information on that we can draw from . Therefore, we can monitor the convergence of Markov chains to their stationary distribution by checking the simulated series through trace plots, which further allows us to determine the burn-in period, and to analyze the mixing behavior of the Markov chains to fix the number of intermediate states to be left out (see, e.g. Kieu2013). In practice, we have found that the number of iterations needed for burn-in strongly depends on the combination of size of the landscape and complexity of the model, and especially on the type of landscape descriptors involved. The running time necessary to simulate one landscape in one Markov chain for the examples discussed in this paper ranges from several seconds to several minutes.

3.6 Simulation examples

We illustrate the simulation algorithm for the subdomain D1 presented in Section 2, and we visually explore how landscape simulation output changes when varying parameters in (4). We focus on object interactions: crop-hedge adjacency (), hedge-hedge adjacency (), and crop-hedge adjacency ()), as defined in Table 2. All other descriptors have coefficient , i.e., are not present in the model.

We simulate three landscape models, each one having one of the three -coefficients different from , with values . Each simulation is run in a separate Markov chain and takes about 10 seconds for the small regions D1 and D2 and it takes about 30 minutes for the large region D3. Figure 3 in the supplementary material shows some traceplots of the landscape descriptors and confirms fast mixing of the chains. We have finally fixed relatively large numbers of burn-in steps of for the small domains D1 and D2, and of for the large domain D3, to ensure that the chains have reached the stationary distribution.

Figure 4 shows one simulation for each configuration of interaction type and coefficient value. A negative coefficient for the interaction among elements of the same type (Figure 4a and 4b) yields a fragmented allocation of crops (Figure 4a) and hedges (Figure 4b), respectively, while a positive coefficient results in clustered configurations. As for the multiplex interaction of crop-hedge adjacency in Figure 4c, a negative coefficient leads to hedges being located away from crop-patch boundaries, while they concentrate on such boundaries for a positive parameter.

Figure 4: Landscape simulations on D1. Panel a): varying crop-crop adjacency; Panel b): varying hedge-hedge adjacency; Panel c): varying crop-hedge adjacency. Columns from left to right: coefficient .

4 Statistical inference and model validation

4.1 Parameter inference

To infer the allocation mechanism of a real landscape, we estimate the parameter vector . The likelihood function is not tractable in practice due to the normalizing constant in the probability mass function (4). Instead, we construct a pseudo-likelihood based on conditional distributions (see Besag1974; Baddeley2000). Given objects with their allocation categories, we define the pseudo-likelihood as the product of the conditional probability of the category given all the other variables ; i.e., it is the composite likelihood of conditional distributions (Besag1974; Varin2011) given as


where the conditional probability is defined in Equation (6). Therefore, the unknown normalizing constant cancels out. We assume that direct interaction of landscape objects occurs only along the edges of the multiplex-network graph, such that we only need the information from adjacent objects in owing to conditional independence.

If has only two possible allocation categories , then we can write for with replaced by the alternative level, and (6

) is equivalent to the logistic regression equation


We exploit the closed-form conditional probabilities (6) and (13) for parameter estimation of using standard software implementations of logistic regression (if ), or using the more general pseudo-likelihood framework (if ).

We implement parametric bootstrapping for statistical inference and model selection. The maximum pseudo-likelihood estimator is known to be asymptotically consistent and normal (Jensen1991; Varin2011)

, but inference is more involved. Specifically, asymptotic variance-covariance parameters are more difficult to obtain from the model fit as compared to full likelihood, and estimation bias is possible with finite-sample data. Instead, we use our simulation algorithm for parametric bootstrapping to validate good estimation performance of the pseudo-likelihood, specifically unbiasedness, and for obtaining parameter confidence intervals.

In practice, we generate landscapes from the fitted model using to obtain a sample

from the pseudo-likelihood estimator, and then check for estimation bias and derive Monte-Carlo confidence intervals. For a test of the null hypothesis

for fixed , i.e. to check if the landscape descriptor is significant, we implement a Monte-Carlo test where we sample from the fitted model, but with the modification . The null is rejected if the value does not lie within the one-sided Monte-Carlo confidence interval of , i.e., if less than (e.g., ) of the -values estimated for the simulations have the same sign and higher absolute value than the value estimated for the data (see, e.g., Davison1997).

4.2 Validation metrics based on landscape structure

For statistical validation of models, it is important to check if the distribution of landscape descriptors – as produced by the fitted model – is in line with the observed value of this descriptor. We develop such checks in the data application in Section 5.

Moreover, various metrics exist to assess if simulated landscape patterns succeed in measuring the landscape functionality and different ecological relevancy in diverse applications (Kupfer2012; Frazier2017). Such metrics are often strongly correlated. It is important to assess if a model such as the one we propose in (4), usually characterized through a small number of landscape descriptors, is capable to generate metric values similar to those in the observed data. Some of commonly used landscape metrics focus on a patch-mosaic model as in our work (i.e., the landscape is simplified into a mosaic of discrete habitat patches). Many metrics have been developed for landscapes conceptualized as an environmental gradient (i.e., for raster representations, see Mcgarigal1995; Cushman2010). Here, we assess how diverse spatial patterns in data are reproduced by our model through metrics based on graph theory (network metrics), often applied to patch-mosaic model (Urban2001; Minor2008; Urban2009; Lu2016), or through metrics based on gradient theory raster metrics (Mcgarigal1995), where we transform the patch-mosaic output in our simulation into a raster. We compare the metrics calculated for the real landscape to the empirical distribution of metrics from the simulated landscapes, the latter obtained through the parametric bootstrap. The metrics are presented in Table 3. We focus on standard network metrics, which are intuitive and useful in different application contexts (Urban2001; Minor2008). Network metrics are evaluated either at the node scale (with one value per node) or at network scale. Node scale helps to identify vital nodes associated with structural or functional objectives (Lu2016), while network scale summarizes the whole topology (Urban2001; Calabrese2004). From gradient theory, we compute metrics reviewed by Mcgarigal1995, and we follow Cushman2008 by choosing the metrics identified as “highly universal and consistent class-level landscape structure components”.

For the raster analysis, we use the R package raster (Hijmans2015) to transform the spatial objects (i.e., polygons and linear segments) into pixels with coordinates x, y and a categorical value and the package landscapemetrics (landscapemetrics) to evaluate the raster metrics. In our example with two polygon types (crops and hedges) and two edge types (hedge or not), we obtain 3 categories (also called habitats): crop, semi-natural habitats, and hedge. Here, absence of hedges is not a class in itself but rather leads to a lack of barriers.

Name Description Support Range Reference
Degree Number of directly connected node neighbors node [0, 1] [1],[2]
Coreness K-shell decomposition for node’s spreading influence node [0,) [1],[2]
Degree grade 2 Number of nodes at most away from given node node [0, 1] [1],[2]
Eccentricity Maximum shortest path to nodes node [0, 1] [1],[2]
Closeness Reciprocal of total length of shortest paths to other nodes node [0,) [1],[2]
Betweenness Potential power to control information flow node [0, 1] [1],[2]
Diameter Longest path network [0,) [1]
Efficiency Efficiency of information exchange network [0,) [2],[3]
Cluster avg Proportion of interconnected adjacent nodes of a vertex network [0,1] [2],[3]
PLAND [%] Percentages of habitats in the landscape raster [0,100] [4],[5]
PD [# / ha 100] Patch density raster [0,) [4],[5]
ENN [m] Mean nearest neighbor distance raster [0,) [4],[5]
PARA [/] Perimeter-area ratio raster [0,) [4],[5]
IJI [%] Interspersion/juxtaposition index measuring spatial intermixing of different habitats raster [0,100] [4],[5]
CLUMPY [/] Clumpiness index measuring deviation from randomness raster [-1,1] [4],[5]
Table 3: Landscape metrics. A star indicates metrics that we normalize when comparing networks with different node numbers. Support is either “node” for node-scale network metrics, “network” for global network metrics, or “raster” for grid-based metrics. References: [1] Urban2009, [2] Lu2016, [3] Latora2001, [4] Mcgarigal1995, [5] Cushman2008

5 Application to the Lower Durance Valley in southern France

5.1 Landscape model structure

We fit parametric stochastic models in the study area, analyze them to select the most appropriate descriptors, and run a simulation experiment to study how well landscape metrics not directly controlled through the Gibbs energy terms are reproduced. We define a full model based on a moderate number of landscape descriptors we deem potentially useful to appropriately characterize the land-use allocation mechanism, and we estimate its parameters using the composite likelihood of Section 4.1. We then run parametric bootstrap to obtain confidence intervals and test the relevance of individual landscape descriptor. For patches and linear elements, we consider categories in each case: crop, semi-natural area for patches; presence or absence of a hedgerow allocated on a patch border for linear elements. We apply the full model described in Equation (4) using the following landscape descriptors. For patch objects: activity parameter (), patch area () using a -quantile, crop-hedge adjacency (), crop-crop adjacency (), with coefficients , , , respectively; for linear elements: activity parameter (), long segments (), hedge-hedge adjacency (), allocation of horizontal segments (), with coefficients , , , respectively, with formulas given in Table 2

. We select a condition on the average length of hedges to test if there is allocation preference towards hedges shorter or longer than the average. For areas, we instead use a condition on the first quartile since the patch area distribution shows high variance and small field sizes may benefit biodiversity through easier access to adjacent fields with complementary resources

(Sirami2019). In our landscape, crops tend to be allocated on moderately sized patches while big patches are dedicated to open area, and so we focus on small patch allocation. However, we can extend the model with another descriptor to control large patch allocation through a condition using the -quantile of patch size ()) to compare the performance of the simple model () and the extended model () for domain D1. We point out some salient results of the comparison of and , while detailed numerical results are reported in the Supplemental Material. Time dynamics such as crop rotation cannot be estimated here due to lack of data.

5.2 Parameter inference

We use the logistic regression (13) proposed in Section 4 to estimate the category allocation mechanism of patches and segments for the three regions D1, D2, D3. Table 4 reports parameter estimates of the coefficients

where standard errors, confidence intervals, significance (with respect to the null

) based on a parametric bootstrap with

simulations. Figure 8 in the Supplemental Material reports the boxplots for the parameter estimates. Throughout, the estimators are unbiased. Estimated parameters are all significant with one exception for D1 concerning hedge attribution to relatively long segments and hedge and crop interaction for D1 and D2. The signs of all the estimates are the same across the three subdomains, implying the same trends, such that the domains are structurally similar.

Patch area allocation to small patches has negative coefficient, i.e. crop is usually not cultivated on very small areas. By contrast, we find a positive coefficient for crop-crop adjacency, meaning that crops are preferably located in spatial clusters. Crop-hedge adjacency is not significant for the small domains D1 and D2 but for D3, and estimated coefficients are negative, meaning that hedges tend not to be directly allocated around crop patches. Hedge-hedge adjacency and the allocation of hedges to horizontal segments have significant positive coefficients throughout, such that allocation of hedges tends to be clustered (i.e. continuous in space) with horizontal orientation for the purpose of breaking winds. In the extended model , the coefficient controlling crop allocation to large patches is significantly negative, and results discussed below show major improvements in the model’s capability to realistically reproduce patch-related landscape metrics leading to a satisfactory performance of the extended model; see the Supplemental Material for detailed results. Overall, estimated parameters have comparable magnitudes between the three domains. In D2, we discern a strong signal indicating many short, strongly horizontally oriented hedges, as compared to D1 and the larger domain D3.

Crop Hedge
Activity Small area C-H C-C Activity Long H H-H Horizontal H
D1 Estimate -1.08 -1.24 -0.05 0.37 -2.38 -0.10 0.77 1.58
Mean -1.09 -1.25 -0.05 0.37 -2.37 -0.09 0.77 1.58
SD 0.37 0.34 0.09 0.10 0.19 0.19 0.07 0.19
D1+ Estimate -1.01 -1.52 0.03 0.40 -2.38 -0.10 0.77 1.58
Mean -1.06 -1.55 0.05 0.39 -2.37 -0.09 0.77 1.58
SD 0.34 0.35 0.07 0.10 0.19 0.19 0.07 0.19
D2 Estimate -1.65 -0.96 -0.08 0.64 -3.38 -0.58 0.77 3.65
Mean -1.67 -0.95 -0.08 0.64 -3.40 -0.58 0.77 3.69
SD 0.33 0.28 0.08 0.08 0.25 0.19 0.09 0.22
D3 Estimate -1.87 -1.37 -0.15 0.66 -3.01 -0.16 0.95 1.97
Mean -1.88 -1.39 -0.15 0.65 -3.02 -0.17 0.95 1.98
SD 0.10 0.14 0.02 0.03 0.06 0.05 0.02 0.06
Table 4: Parameter estimates for the subdomains (first column) of the study area. C stands for “crop” and H for “hedge”. “Mean” and “SD” values were calculated through a parametric bootstrap with simulations from each of the fitted models. Bold face indicates significance of the descriptor at the

level using Monte-Carlo simulation for the test statistics under the null hypothesis. As for the model D1+, we estimate also the parameter related to Big Area:

-1.13, -1.11, 0.29 for Estimated, Mean and SD, respectively.

5.3 Comparison of observed and simulated landscape metrics

We discuss graph- and raster-based metrics for crop allocation in the small region D1 by comparing observed values with the simulated distribution given the fitted model, using MCMC samples. This further allows testing the null hypothesis that the observed metric could have been generated by the model by using two-sided Monte-Carlo confidence intervals. Results for hedges and for other domains (D2, D3) are structurally similar and can be found in the Supplemental Material.

5.3.1 Network-related metrics

For models D1 and D1+, Figure 5 shows real and simulated landscape descriptor values through the red dot and the boxplot, respectively. The models tend to appropriately reproduce the real landscape descriptor values, especially in D1+ where the control over large patch allocation leads improves behavior for all patch-related descriptors.

Figure 6 shows real and simulated network metrics for patches. For node-scale metrics (two top rows), we observe good overlap of boxplots for observed and simulated values, with the exception of Betweenness where we tend to simulate larger values. Betweenness evaluates the number of the shortest paths going through a node when connecting any two other nodes and is very heavy-tailed since it shows high variability among different networks. Differences in the boxplots (real vs. simulated) may in part be due to the

times larger sample size in the case of simulations, while strong right-skewness and heavy tails are present in both cases. This skewness highlights that few crop patches have the bridging role of connecting different crop clusters, which is fundamental for maintaining the connectivity of the landscape

(Estrada2008; Urban2009; Belgrano2015), and this property is preserved in the fitted model.

For network-scale metrics (last row of Figure 6) we show the real landscape value within the boxplot of simulated values. The observed Diameter is within the interquartile range of the simulated ones, while observed values of Efficiency and Cluster Average are located towards the lower extremes of simulations. The numbers in Table 5 report the proportion of the simulated metric values that are “more excentric” than the observed one; e.g., if the observed value is below the median and among the simulated values are even lower, we report . These pseudo--values imply that observed metrics for the crop network still appear realistic under the model. Overall, network-scale results indicate slightly stronger clustering of crop in the model as compared to reality, but still with similar order of magnitude for metric values. We also report pseudo--values for hedge network-scale metrics in Table 5, which show stronger discrepancy between observed and simulated values. Node-scale metrics for hedges, more directly controlled through the network descriptors included in our model, remain satisfying. For the sake of parsimony and simplicity, the model studied here does not directly control descriptors evaluating the number or dimension of clusters, but only local relationships among patches. It is not surprising to obtain better performance of neighborhood-based centralities in comparison to path-based centralities and metrics, the latter based on the whole landscape.

Figure 5: Landscape descriptors for the small region D1 for the basic model (D1) and the extended model (D1+). Panel a) crop network; Panel b) hedgerow network. Boxplots represent simulated landscapes. The red dot represents the real landscape value.
Figure 6: Validation metrics for crop network. Panel a) metrics at node scale. Red dots represent the mean values of the node metrics of the real and simulated networks, respectively. Panel b) metrics at network scale. Boxplots represent simulations, red dots represent mean values of simulations, green dots represent the observed value.

5.3.2 Raster metrics

Figure 7 shows the raster-based landscape metrics of FRAGSTAT; see Table 3 for their description. The model has some difficulty to capture certain observed properties in this grid representation. In particular, the model directly controls only the number of elements according to certain criteria, but not proportions of spatial area. The pseudo--values listed in Table 5 numerically confirm this mismatch between model and real landscape for the proportion of landscape categories (PLAND), and for Patch Density (PD) of semi-natural patches. Other very small pseudo--values also indicate a mismatch, but the boxplots in Figure 7 show that the order of magnitude of metric values is still well captured by the model, even though the variability of simulations is not large enough. Moreover, in the raster representation two adjacent objects of the same type and category are considered as a single habitat, which is different from the vector-based representation in the model.

5.3.3 Correlation analysis of landscape descriptors and metrics

Different landscape descriptors and metrics may comprise similar information, and then non negligible correlation arises among such variables. If we seek a realistic representation of a specific metric through the model, then the landscape descriptors included in the model (or combinations of them) should be correlated with this metric. To assess this relationship, we use linear regression with the landscape descriptors as predictors and landscape metrics as dependent variables, and then consider the part of the standard deviation of the response not explained by the predictors. Including additional descriptors can then improve the performance with respect to network- or grid-based metrics of interest. For illustration, we analyse the difference among the models D1 and D1+ through the correlation analysis in Figure

8a. The descriptor formulated with respect to the -quantile of patch sizes is relatively uncorrelated to other descriptors and metrics for . However, the -quantile descriptor additionally included in is often highly correlated with other metrics for crop patches. In cases with strong negative or positive correlation, this tends to improve the validation performance in network-scale and raster metrics, see the diagram of Figure 8. It shows the evolution in the residual standard deviations not explained by the descriptors of the models. Moreover, Monte-Carlo pseudo--values in Table 5 improve strongly from to . Detailed results for the three study domains provided in the Supplementary Material indicate generally good performance for the large region D3.

Semi-natural Crop Hedge
D1 D1+ D1 D1+ D1 D1+
Diameter - - 0.26 0.57 0 0.19
Efficiency - - 0.06 0.56 0 0.23
Cluster average - - 0.13 0.16 0.01 0
PLAND 0 0.08 0 0.10 0.03 0
PD 0 0 0.26 0.44 0.41 0.37
PARA_MN 0.20 0.20 0.33 0.12 0.04 0.6
ENN_MN 0.11 0.49 0.24 0.30 0.01 0.01
IJI 0.02 0.45 0.19 0.47 0.08 0.47
CLUMPY 0.11 0.19 0.02 0.24 0 0
Table 5: Pseudo--values of network-scale metrics and raster-based metrics for the small region D1 with the basic model (D1) and the extended one (D1+).
Figure 7: Raster-based landscape metrics for model D1. Boxplots: simulated values. Red dots: mean of simulated values. Green dots: observed value.
Figure 8: Correlation analysis. Panel a) Correlations among landscape descriptors and metrics in models D1 and D1+. For network metrics, letters stand for the network layer: = patch, = linear element, = patch-linear element interaction. In raster metrics, SNH stands for Semi-natural habitat, C stands for Crop and H stands for hedges. Panel b) Evolution of patch-related metrics from D1 do D1+ based on standard deviation not explained by the model’s landscape descriptors.

6 Conclusion

Simulation of variable virtual landscapes opens up new ways of exploring various environmental issues (Langhammer2019). We have developed stochastic agricultural landscape models and statistical inference with a focus on the land-use allocation mechanism of patches and linear elements, using network models as an intuitive and flexible tool allowing for direct control and interpretation with respect to local landscape behavior. We have focused on descriptors based on single objects or pairs of such, leading to a certain robustness of modeling, estimation and simulation procedures. The sufficient summary statistics in the exponential family models in the data application to the Lower Durance Valley were satisfactorily reproduced by simulations from fitted models, especially after inserting a new descriptor in the extended model D1. We conclude that the fit is good, such that the model succeeds in capturing the key patterns of configuration and composition in real landscapes.

From a functional point of view, vector-based models such as ours are more parsimonious and meaningful (Gaucherel2012; Bonhomme2017), and they enable handling different spatial and temporal scales. In pixel-based approaches, an appropriate representation of small-surface elements such as hedges would require a very high raster resolution, while a homogeneous large-surface patch would be made up of a very large number of pixels, instead of a single geometric object in our model. Our multiplex network structure assures low computational cost and memory requirements.

We have checked the generality of calibrated models by studying how simulated landscape metrics are correlated to observed ones, using metrics that are not explicitly encoded into the model structure. Certain data patterns calculated through metrics on raster scale are not correctly reproduced by our models, but this is partly explained by instabilities in treating small-scale small-area patterns that are inherent to raster discretizations. Linear element allocation also shows some discrepancy between model and data for large-scale clustering properties. To remedy the issue of appropriately reproducing an important landscape descriptor that is not directly controlled by the model, we can include related descriptors into the model (Kleijnen1995), or add additional constraints during simulation using techniques such as Simulated Annealing.

We point out the potential of Approximate Bayesian Computation (ABC, e.g. Grelaud2009) for parameter estimation and for the inverse problem of finding the model parameters that allow generating specific target values of landscape descriptors. Estimation through ABC is possible by setting target values equal to observed landscape descriptors, and is asymptotically consistent under mild conditions if the set of target descriptors corresponds to the descriptors in the Gibbs energy (4

). ABC can be useful for likelihood-free model selection using Bayes factors

(Grelaud2009). However, unreported preliminary results show that rather long computation times arise with this method.

We do not directly model human action in the temporal dynamics of agricultural environments (Bonhomme2017; Poggi2018). For this, we would have to couple our model with a decision tool. Future developments also comprise the integration of our land-use allocation model with (existing) generative parametric tessellation model for the geometrical support.


We are thankful to Claire Lavigne and Katarzyna Adamczyk for their help and wise suggestions for landscape data processing and for the discussion part.