Tessellations are mathematical representations of space division into cells. This paper focuses on a class of tessellations with internal vertices with three incident edges, two of them subtending a straight angle. Such tessellations are referred to as T-tessellations and may represent various spatial patterns, like those illustrated in Figure (1): cracked soil, agricultural landscapes, burnt wood, patterns on a reptile’s skin. Random T-tessellation models allow for spatial variability of such patterns, generating cells with different sizes and shapes. One of the first random T-tessellation models was proposed by Gilbert () for representing needle-shaped crystals. A variant of the model providing rectangular cells was studied by Miles and Mackisack (). The stable with respect to iteration (STIT) tessellation model was proposed by Nagel and Weiss () for modeling random crack networks. Another example of a random T-tessellation was proposed by Arak, Clifford and Surgailis () in their paper on a random graph model. For the particular choice of the parameters, the model yields T-tessellation patterns.
More recently, a Completely Random T-Tessellation (CRTT) model was introduced by Kiêu et al. ( ).
The model, based on Poisson line process, was considered as a reference measure on a set of T-tessellations.
Gibbs T-tessellations models can be constructed by means of a probability density w.r.t. the CRTT reference measure.
The density involves a set of tessellation statistics the model should account for. The tessellations
with large (resp. small) values of the targeted statistics are favored by an appropriate tuning
of model parameters. The possibility of the model to control various tessellation features makes it
attractive for representing a broad scope of the observed T-tessellation patterns.
A Metropolis-Hasting-Green algorithm for model simulation was derived in (
). The model, based on Poisson line process, was considered as a reference measure on a set of T-tessellations. Gibbs T-tessellations models can be constructed by means of a probability density w.r.t. the CRTT reference measure. The density involves a set of tessellation statistics the model should account for. The tessellations with large (resp. small) values of the targeted statistics are favored by an appropriate tuning of model parameters. The possibility of the model to control various tessellation features makes it attractive for representing a broad scope of the observed T-tessellation patterns. A Metropolis-Hasting-Green algorithm for model simulation was derived in () and implemented in the C++ library ().
This paper focuses on statistical inference for Gibbsian T-tessellation models. The models parameters were estimated by Monte Carlo Maximum Likelihood (MCML). The estimation method was tested on the example of three agricultural landscapes, approximated by T-tessellations. The models fitted to the landscapes aimed at finding the features responsible for the observed contrasts of landscape structures. The goodness-of-fit of the models was assessed by a global envelope test based on the empty-space function .
The paper is structured as follows. Section (2) describes landscape datasets and presents the algorithm of landscape approximation by a T-tessellation. Section (3) defines the Gibbsian T-tessellation model. The principle of MCML is outlined and the properties of the estimates are investigated by simulations. Section (4) presents the results of model fit to the three datasets and points out the main differences between the landscapes. The goodness-of-fit test is carried out for each dataset. Finally, Section (5) contains the concluding remarks and highlights further perspectives.
2 From landscape data to a T-tessellation
Agricultural landscapes exhibit patterns that can be reasonably approximated by T-tessellations. Indeed, as illustrated in Figure (2), most of the vertices of polygons representing agricultural fields are of the T-type. Although the T-vertices are dominant, actual agricultural landscapes are not T-tessellations and must be modified to match the definition of T-tessellation. In the following, an algorithm for transforming a landscape into a T-tessellation is proposed and tested on three datasets.
Three French agricultural landscapes were selected for the study: the landscape of Selommes (Centre-Val de Loire), an area of intensive cereal crop cultivation; the landscape of Kervidy (Brittany), a region characterized by mixed crop-livestock farming systems; and the landscape of the Basse Valée de la Durance (BVD) in Provence-Alpes-Côte d’Azur region, specialized in apple orchards. Different types of agricultural production characterizing the three regions are in part responsible for the variability of landscape patterns observed in Figure (2). A more detailed presentation of the datasets and the related agroecological research problems can be found in () for the Selommes landscape, in () for the Kervidy landscape and in () for the BVD landscape. The landscapes were restricted to the domains encompassing a comparable number of fields: for Selommes, for BVD and for Kervidy.
2.2 Landscape approximation algorithm
A polygonal tessellation of a squared domain is illustrated in Figure (3). The main components of the tessellation are highlighted: the cells, the edges (non-empty intersections of two cells), the vertices (non-empty intersections of three cells) and the segments (maximal unions of the aligned and contiguous edges).
A polygonal tessellation of a bounded domain is called a T-tessellation if the following conditions are simultaneously fulfilled:
- Condition 1
Each internal vertex of a tessellation has exactly three adjacent edges, two of them subtending the angle of .
- Condition 2
There is no pair of distinct and aligned segments.
The purpose of the approximation algorithm is to construct a T-tessellation that closely matches the input set of polygons. The algorithm requires the coordinates of polygons and the coordinates of the tessellated domain, composed from an outer polygon and polygons-holes. The consecutive steps of the algorithm are described below.
Grouping polygons sides (Fig. (4a))
The first step of the algorithm consists of grouping the sides of polygons by hierarchical clustering. Two sides belong to the same group if they are spatially close and if their orientation is similar. The dissimilarity measure that takes these two aspects into account is defined as the sum of two terms: the
area of the convex envelope of two sides, denoted by
The first step of the algorithm consists of grouping the sides of polygons by hierarchical clustering. Two sides belong to the same group if they are spatially close and if their orientation is similar. The dissimilarity measure that takes these two aspects into account is defined as the sum of two terms: the area of the convex envelope of two sides, denoted by, and the smallest Euclidean distance between the sides, denoted by :
The first term becomes large when the sides and are far apart and/or perpendicularly oriented; the second one accounts for the distant sides belonging to the same straight line. To make the two components of the criterion independent of the metric, they are normalized by the product of the lengths of the sides and by their average length, respectively. The classification of the sides is based on the single linkage method. In this method, the distance between two groups of sides is defined as the minimal dissimilarity between the pairs of sides from each group. At each step of the algorithm, the two closest groups are merged, starting from the groups containing single sides.
Choice of representative segments (Fig. (4b))
Each group of sides is then replaced by a representative segment: a regression line is fitted to the endpoints of the grouped sides. The projection of each endpoint on the regression line is calculated. A representative segment is the smallest segment on the regression line, including the projections of sides endpoints.
Transformation into T-tessellation (Fig. (4c))
The representative segments are inserted into a tessellation domain, defining a tessellation that is not yet a T-tessellation. It contains the I-vertices with one incident edge, L-vertices with two incident edges, and X-vertices formed by the crossing segments. To remove an I-vertex, the edge can be removed or lengthened until it meets another edge. In order to choose between these two modifications, a variation of total internal length is calculated: the modification that induces the smallest length variation is selected. The same rule also governs the order of removing the I-vertices: at each iteration, the I-vertex whose suppression induces the smallest variation in length is selected. The approach for removing the L-vertices is quite similar: one of two incident edges is lengthened until it meets another edge. The edge that induces the smallest increase in total internal length is selected. The approach for fixing the order of removal of L-vertices is the same as for I-vertices.
In order to remove an X-vertex from a tessellation, one of two segments passing through the vertex is broken into two halves at the vertex location. The end of a segment half lying at the vertex location is moved along the other incident segment by a small distance. The procedure is repeated until there is no X-vertex left in the tessellation.
Figure (2) presents the three landscapes approximated by T-tessellations.
The distributions of three basic polygon characteristics in the Selommes landscape and its approximation are compared in Figure (5). The distributions of perimeter and area are similar before and after transformation. As for the number of vertices, its median value is comparable but the quantile and quantile are clearly higher in the original landscape. It is the effect of replacing closely aligned polygons sides in the original landscape by the representative segments in its approximation.
3 Gibbsian T-tessellation model
Let denote a set of all T-tessellations of a bounded polygonal domain . Each tessellation can be defined by the union of the boundaries of its cells , a closed set with an empty interior. Thus, we can define on a standard hitting -algebra for the closed sets, generated by the events :
for , a set of all compact subsets of (see ). The corresponding measurable space will be denoted by . It should be noted that we can assign to each tessellation a set of the lines supporting the segments of . Conversely, we can assign to each line pattern intersecting the set of all T-tessellations supported by the lines of . This observation leads us to a definition of a counting measure on , corresponding to a line pattern .
Let be a line pattern intersecting a domain and let be a set of all T-tessellations supported by the lines of . A measure on is defined as:
where stands for an indicator function of a set . The measure counts the tessellations in supported by the lines of . When is a realization of a Poisson line process, the following extension of the previous definition holds:
Let be a Poisson line process with unit intensity, restricted to . A measure on is defined as follows:
where is a normalizing constant.
The model defined by the probability distribution (
The model defined by the probability distribution (3) was proposed in  and referred to as the Completely Random T-Tessellation Model (CRTT). The finiteness of measures (2) and (3) was proven in  and .
The CRTT model can be considered as a reference measure on .
Let be a -dimensional vector of tessellation statistics and let
-dimensional vector of tessellation statistics and letbe a corresponding parameter vector belonging to a parametric space . Let us assume that each component of can be written as the sum of local contributions of tessellation vertices, edges, cells or segments:
where denotes the set of tessellation elements and varies from to . The model that favors the tessellations with high (resp. low) values of statistics is given by the following density function w.r.t. the measure :
where is an unknown normalizing constant. The scalar product defining the probability density 5 is denoted by :
Model (5) is referred as a Gibbs model for T-tessellations and the function is the energy function of the model. Depending on the chosen statistics, the model makes it possible to control various T-tessellation features: cell area variability, angles between edges, etc.
Let us consider a Gibbs model with the following statistic :
The first sum in (7) runs through a tessellation cells, denoted by . For each cell , the set of its vertices is considered in a second sum. The vertex contributes to the energy function only if the angle between the sides incident to is acute (see Figure 6). The contribution of a vertex is close to if the sides are almost perpendicular. Thus the statistic measures a deviation of tessellation cells from rectangles.
A Metropolis-Hastings-Green algorithm for the simulation of the Gibbs model (5) was proposed and its convergence properties were discussed in (). It is based on three local updates of a T-tessellation: a split of tessellation cell, a merge of two cells and a removal of an edge at the end of a blocking segment, followed by an insertion of an edge incident to a new endpoint of the shortened segment. The algorithm was implemented in the C++ library ().
3.1 MCML inference
Model parameters were estimated by Monte Carlo Maximum Likelihood method, proposed in  for the families of distributions specified by the unnormalized densities. The method relies on the possibility of sampling from the targeted distribution. For the Gibbs model, the log-likelihood ratio against a fixed point is defined as:
This ratio depends on the ratio of unknown normalizing constants, which can be expressed as:
The estimate of the unknown log-likelihood ratio (8) is obtained by substituting the ratio of normalizing constants with its approximation (10). It is called a Monte Carlo Likelihood (MCL) function and denoted by . The maximizer of the MCL function approximates the unknown value of the Maximum Likelihood estimate . The asymptotic properties of and were derived in  and .
The likelihood of exponential family models is convex. Hence the optimization of its Monte Carlo approximation is justified. The method holds for every value of , but in practice the choice of this parameter has an impact on the quality of the estimation. If is far from the exact estimate, MCL approximation is poor and will be far from as well. One possible solution to this problem is to iterate the optimization algorithm. The trust region method applied here for calculating (see ) starts from an arbitrary parameter and maximizes quadratic approximation of over a region around , yielding a new parameter value. The size of the region is adjusted in order to obtain a correct approximation of by a quadratic model.
Figure 8 illustrates the algorithm for calculating the estimates of a two-parameter model with an energy function:
where is the number of internal segments of and is the angle statistic already defined in example (1). The model was fitted to the tessellation simulated with parameter in the unit square (see Fig. 8a). The algorithm started from and reached the maximum after iterations. The calculation of a current value of the estimate was based on a Monte Carlo sample of size . The distribution of two statistics calculated over runs of the fitted model appears to be correctly centered on the observed values (see Fig. 8c).
The standard error of the Maximum Likelihood estimate
The standard error of the Maximum Likelihood estimatefor model (11), calculated from the Monte Carlo approximation of the Fisher Information Matrix with , was equal to . The asymptotic results given by Geyer in () make it possible to estimate the standard error of the approximation , referred to as the Monte Carlo Standard Error (MCSE). Figure (9) shows the graphs of MCSE() and MCSE() as the sample size grows from to . The highest value of MCSE() was equal to for the sample size . The MCSE() decreased rapidly as the sample size varied from to . Beyond this threshold no significant improvement of the accuracy of was observed.
4 Gibbs model for landscapes comparison and simulation
The epidemic models of plant diseases are studied at the landscape scale. Landscape features are likely to play a role in the epidemic spread, interacting with other variables determining the dynamics of the host-pathogen system. The impact of variables is evaluated through numerical experiments (see ). Landscape effect is measured by comparing the model’s outputs calculated for the landscapes contrasted with respect to the targeted features, like in factorial experiments. The design of such experiments requires a set of landscapes that differ with respect to the variables of interest. If the access to the landscape databases is somehow restricted, simulations of simplified landscape representations can be an alternative approach.
The Gibbs model can be a useful tool for yielding contrasted tessellation patterns that inherit the observed landscapes features. Let us illustrate this approach on the example of three landscapes in (2). The first step is to find out what the characteristics are that differ between the landscapes, fitting a guess model to landscape approximations and comparing the estimates of model parameters.
The guideline for choosing the model statistics is therefore their ability to capture the contrasts between the observed patterns. Based on these considerations, the candidate model with the following statistics was proposed first:
the number of tessellation cells , accounting for the differences in tessellation scales;
the sum of squared cell areas , measuring the departure from the patterns with cells with equal areas, minimizing the value of the statistic;
the angle statistics , measuring the departure from orthogonal patterns. This statistic, already introduced in example (7), was renamed here so that all of the statistics would have a common notation;
the number of long cells . A cell is said to be long if the length-to-width ratio of its smallest enclosing rectangle (see Fig. 10) is greater than a threshold . In this case the value of was fixed at , the value close to the median length-to-width ratio in the BVD landscape.
Let , and be the observed tessellations, approximating the landscapes represented in Figure (2). Let a tessellation follow a Gibbsian model given by the energy:
for . Table (1) gives Monte Carlo Maximum Likelihood estimates of model parameters together with their confidence sets, based on the approximation of the Fisher Information Matrix inverse. The size of the Monte Carlo sample was fixed at .
The model fitted to the BVD landscape tends to generate the tessellations close to the patterns with equally sized cells (high positive value of ) and orthogonal segments (positive value of ).The model favors the patterns with a number of long cells greater than in other landscapes and greater than in the reference model (negative value of ). The model fitted to the Kervidy landscape is closest to completely random T-tessellation. The estimates of parameters and are non-significantly different from zero. The penalty for the acute angles between the segments () is positive, but its value is smaller than the estimates calculated for the Selommes and BVD landscapes. The model fitted to Selommes data generates the most rectangular patterns with the smallest number of long cells.
Figure (11) gives the examples of model simulations for the three landscapes. According to the model properties, the mean values of the statistics calculated over a sample of tessellations simulated from (12) are centered on the values observed in the landscapes. In this sense, the model is able to generate simplified landscape representations that inherit selected landscape features.
4.1 Empty-space function for testing goodness-of-fit
The empty-space function is a summary statistic commonly applied in spatial point pattern analysis to characterize
the observed patterns and to check point process model validity (see ). The function definition can be extended
to the other random sets, e.g. line segments and linear networks (see , ). For a random stationary tessellation , the empty-space
function is defined as the cumulative distribution function of the shortest distance
, the empty-space function is defined as the cumulative distribution function of the shortest distancefrom an arbitrary point in the tessellation domain to :
Since is stationary, the definition (13) does not depend on a particular choice of . The function can be interpreted as the average fraction of the area of the region obtained by the dilation of tessellation by a disc with a radius . Consequently, for tessellations with cells that have regular shapes and sizes, such as grid patterns, will tend to grow more rapidly than, for example, in the CRTT model. The empty-space function is thus a useful tool for comparing the observed patterns between themselves or with the patterns generated by a candidate model. The border-corrected estimate of is calculated over the points sampled in the tessellation domain and separated from the border by a distance greater than or equal to :
where denotes the border of the tessellation domain (see ).
The global envelope test for model checking compares a functional statistic of the observed pattern with its counterparts
calculated for the simulations of the null model (see ). The test is applied for checking point process models, but its extension to
random tessellation models is straightforward. The principle of the test procedure is to reduce a tessellation pattern to a random variable . A null hypothesis states that:
to a random variableand to compare the distribution of for the observed pattern with its distribution simulated from a null model. Let denote the value of for the observed tessellation and let be the values of calculated for a sample of tessellations generated by a null model. The tessellations are assumed to be independent of each other and independent of the observed pattern. Let denote the distribution of the i.i.d. sample
. A null hypothesis states that:
Let be the rank of in a pulled sample . Under the null hypothesis this is an i.i.d sample and the following equality holds:
Thus, if is greater than a maximal value of than the null hypothesis (15) is rejected at a significance level .
The choice of the variable is subject to the constraint that its large positive values are unlikely under a null hypothesis. Different variants of envelope test were proposed depending on the definition of (see ). In the Maximum Absolute Deviation test based on the empty-space function, the variable is defined as the maximal absolute difference between the estimate of and its expected value under the null hypothesis:
The maximum in (16) is taken over a range of distances from to units. If in formula (16) cannot be calculated explicitly, it can be estimated from the simulations of a null model. Testing the null hypothesis (15) with the MAD test is equivalent to checking if calculated for the observed tessellation falls inside the interval:
for every in . If this is the case, there is no evidence for rejecting .
The test was applied to check the goodness-of-fit of model (12). Figure 12 shows the envelopes (17) based on model simulations. The estimated -function for the observed patterns falls entirely inside the envelopes for each landscape. At the significance level there is no evidence against the null hypothesis that landscape patterns follow the Gibbs model (12) with the parameters values given by Table (1). The corresponding p-values of the test are: for the Selommes, for the BVD and for the Kervidy landscape.
Random T-tessellations following the Gibbs model proposed in () may represent a broad scope of spatial patterns. This paper completes the model with a statistical inference tested on the example of agricultural landscapes, approximated by T-tessellations. The proposed landscape model highlights the differences between the observed patterns. Model parameters are estimated by the Monte Carlo Maximum Likelihood method. Confidence sets calculated from MC approximation of the Fisher Information Matrix enable parameters comparison. The goodness-of-fit of the model is assessed by the global envelope test for the empty-space function.
The framework of Monte Carlo Likelihood inference may give rise to the development of variable selection methods for the Gibbs model. Starting from a set of candidate variables, a stepwise procedure can be considered for variables inclusion and elimination, based on Monte Carlo approximation of Likelihood tests. From a practical view point it should be accompanied by the development of more efficient computational methods for finding . Indeed, the calculation of the estimate becomes cumbersome as the number of variables grows.
Strong correlations between the model statistics may be at the origin of the difficulties in model fit. The example of the model fitted to the BVD landscape illustrates this problem. The model accounts for a high number of long cells and, at the same time, favors the patterns with equally distributed cell areas. In order to satisfy both criteria the model tends to generate the cells with extremely elongated shapes, as illustrated in Figure (11). Consequently, the model alters the mean elongation of tessellation cells, making it difficult to fit the model while controlling the values of this statistic at the same time. The question of how to select a set of tessellation statistics to avoid correlation issues is of major interest for statistical inference.
Finally, the generalization of the model to polygonal tessellations with no restrictions on the vertices type would be of great practical interest as well. The source of bias due to the approximation of the original pattern by a T-tessellation could be avoided in this way.
We are grateful to Claire Lavigne and Sylvain Poggi for access to the landscape datasets and for their precious advice.
This work was supported partly by the French PIA project “Lorraine Université d’Excellence” (ANR-15-IDEX-04-LUE) and by the Metaprogram SMACH (Sustainable Management of Crop Health, http://www.smach.inra.fr/) of the French National Research Institute for Agriculture, Food and Environment (INRAE).
-  K. Adamczyk-Chauvat and K. Kiêu. LiTe. http://kien-kieu.github.io/lite, 2015.
-  T. Arak, P. Clifford, and D. Surgailis. Point-based polygonal models for random graphs. Advances in Applied Probability, 25:348–372, 1993.
-  A. Baddeley, E. Rubak, and R. Turner. Spatial Point Patterns: Methodology and Applications with R. Chapman and Hall/CRC, 2016.
-  R. Fletcher. Practical Methods of Optimization, Second Edition. Wiley, 2000.
-  R. Foxall and A. Baddeley. Nonparametric measures of association between a spatial point process and a random set, with geological applications. Journal of the Royal Statistical Society, Series C, 51:165–182, 2002.
-  C.J. Geyer. On the convergence of Monte Carlo maximum likelihood calculations. Journal of the Royal Statistical Society, Series B, 56:261–274, 1994.
-  C.J. Geyer. Likelihood inference for spatial point processes. In O.E. Barndorff-Nielsen, W.S. Kendall, and M.N.M. van Lieshout, editors, Stochastic Geometry - Likelihood and Computation, volume 80 of Monographs on Statistics and Applied Probability, pages 79–140. Chapman and Hall/CRC, 1999.
-  E.N. Gilbert. Applications of Undergraduate Mathematics in Engineering, chapter Random plane networks and needle-shaped crystals. Noble, B., 1967.
-  J. Kahn. How many T-tessellations on k lines? existence of associated Gibbs measures on bounded convex domains. Random Structures & Algorithms, 47:561–587, 2015.
-  K. Kiêu, K. Adamczyk-Chauvat, H. Monod, and R.S. Stoica. A completely random T-tessellation model and Gibbsian extensions. Spatial Statistics, 6:118–138, 2013.
-  F.K. Klienschroth, J.R. Healey, F. Mortier, and R.S. Stoica. Effects of logging on roadless space in intact forest landscapes of the Congo Basin. Conservation Biology, 31:469–480, 2016.
-  M. Mackisack and R. Miles. Homogeneous rectangular tessellations. Advances in Applied Probability, 28:993–1013, 1996.
-  B. Miasojedow, W. Niemiro, J. Palczewski, and W. Rejchel. Asymptotics of Monte Carlo likelihood estimators. Probability and Mathematical Statistics, 36:295–310, 2016.
-  M. Myllymaki, T. Mrkvicka, P. Grabarnik, H. Seijo, and U. Hahn. Global envelope tests for spatial processes. Journal of the Royal Statistical Society, Series B, 79:381–404, 2017.
-  W. Nagel and V. Weiß. Crack STIT tessellations: Characterization of stationary random tessellations stable with respect to iteration. Advances in Applied Probability, 37:859–883, 2005.
-  J. Papaïx, K. Adamczyk-Chauvat, A. Bouvier, K. Kiêu, S. Touzeau, and C Lannou. Pathogen population dynamics in agricultural landscapes: the Ddal modelling framework. Infection, Genetics and Evolution, 27:509–520, 2014.
-  S. Pivard, K. Adamczyk, J. Lecomte, C. Lavigne, A. Bouvier, A. Deville, P.H. Gouyon, and S. Huet. Where do the ferail oilseed populations come from? A large-scale study of their possible origin in a farmland area. Journal of Applied Ecology, 45:476–485, 2008.
-  S. Poggi, A. Delaplace, P. Pichelin, and R. Le Cointe. History of land uses over the period 2013-2019 in the Kervidy-Naizin watershed (Brittany, France). https://doi.org/10.15454/ATYOBO, 2020.
-  B. Ricci, P. Franck, J.-F. Toubon, J.-C. Bouvier, B. Sauphanor, and C. Lavigne. The influence of landscape on insect pest dynamics: a case study in southeastern France. Landscape Ecology, 24:337–349, 2009.
-  D. Stoyan, W.S. Kendall, and J. Mecke. Stochastic geometry and its applications, Second edition. Wiley, 1996.