I Introduction
Environments that contain obstacles are of interest in a wide variety of fields Dietrich and Köster (2014); Gabella (1990); Grima (2007); Helbing et al. (2005); Höfling and Franosch (2013); Lerner et al. (2007); Moussaïd et al. (2011); Rehder and Kloeden (2015); RoosenRunge et al. (2011); Rühl (2005); Simpson and Plank (2017); Varas et al. (2007). In biology, it is well known that the motion of macromolecules and proteins in the cytosol is restricted by the densely crowded nature of the interior of cells Höfling and Franosch (2013); RoosenRunge et al. (2011). The meshlike structure of the enteric nervous system, highlighted in Figure 1(a), contains clusters of glial cells connected by nerve strands, as well as regions that are inaccessible to the enteric glial cells Gabella (1990); Rühl (2005). Hence the location and movement of glial cells is constricted by these inaccessible regions Gabella (1990). In the context of pedestrian dynamics, successful navigation around an obstacle without jamming is a key criteria for the design of safe egress routes Alizadeh (2011); Dietrich and Köster (2014); Helbing et al. (2005); Moussaïd et al. (2011); Varas et al. (2007)
. Similarly, predicting how pedestrians will react to pathblocking obstacles is a key question in computer vision, as developing algorithms for robots to reliably avoid collisions with pedestrians is crucial
Rehder and Kloeden (2015); Ziebart et al. (2009).Within such environments individuals can undergo selforganization and form highly spatially structured populations, such as the aforementioned clusters of glial cells Gabella (1990); Rühl (2005) or pedestrian lanes Chraibi et al. (2010); Moussaïd et al. (2011)
. Quantifying the amount of spatial structure present within an environment provides insight into the mechanisms by which the individuals are governed. Therefore, measures that can be applied to experimental data to obtain estimates of the spatial structure within the data are critical
Perry et al. (2002). Various methods for quantifying spatial structure have been proposed previously (for example, see the review by Perry et al. Perry et al. (2002), and references therein). Here we focus on the use of pair correlation functions (PCFs), which are a powerful and versatile tool for analysing spatial structure and spatial correlation Agnew et al. (2014); Bahcall and Soneira (1983); Binder and Simpson (2013, 2015); Binny et al. (2016); Dini et al. (2018); Holzmann and Castin (1999); Gavagnin et al. (2018); Johnston et al. (2014, 2016); Law et al. (2009). Pair correlation functions have been successfully employed in astrophysics Bahcall and Soneira (1983), particle physics Holzmann and Castin (1999), ecology Flügge et al. (2012); Law et al. (2009); Rajala et al. (2018) and cell biology Binder and Simpson (2013); Johnston et al. (2014), amongst others. Briefly, the pair correlation for a given distance , , can be defined aswhere is the number of pairs of individuals separated by a distance observed in the data, and the normalization term, , is the expected number of individuals separated by distance if the individuals are located randomly throughout the experimental domain. If there are more individuals separated by a particular distance than expected for randomly located individuals then , and hence there is spatial structure corresponding to aggregation at that distance. Similarly, if fewer individuals are separated by a particular distance than expected then , which suggests that there is spatial structure corresponding to segregation present at distance Binder and Simpson (2013).
As it is unlikely that any two pairs of individuals are separated by exactly the same distance, is typically divided into bins Binder and Simpson (2015). This can take the form of considering the environment as continuous space and binning the measured distance between pairs of individuals, or by mapping individuals onto a discrete domain such that there is a finite number of possible distances between pairs of individuals Binder and Simpson (2013, 2015); Johnston et al. (2014); Gavagnin et al. (2018). There has been significant recent focus on PCFs for discrete, or latticebased, domains Agnew et al. (2014); Binder and Simpson (2013, 2015); Gavagnin et al. (2018); Johnston et al. (2014, 2016), and deriving analytic expressions for the normalization term under various distance metrics Binder and Simpson (2013); Gavagnin et al. (2018). In particular, Binder and Simpson Binder and Simpson (2013) present a normalization term for rectilinear distance in and , illustrated in Figures 2(a)(b), which corresponds to the distance separating two lattice sites in and , respectively. More recently, Gavagnin et al. Gavagnin et al. (2018) derive a normalization term under the taxicab and square uniform distance metrics, illustrated schematically in Figures 2(c)(d), respectively. Under the taxicab distance metric and the square uniform distance metric, the distance between two lattice sites can be thought of as the number of “jumps” between the two sites under movement occurring in a von Neumann neighborhood (four nearestneighbors) and a Moore neighborhood (eight nearestneighbors), respectively Gavagnin et al. (2018).
However, while these PCFs have proven useful in a range of applications, they are unsuitable for analyzing environments that contain inaccessible regions, due to either “holes” in the domain or the presence of obstacles. In these environments, Cartesian distance measures do not adequately describe the distance between two individuals. For example, for the glial cells presented in Figure 1(a), certain cells are separated by a path distance that is significantly longer than the Cartesian distance between the cells. Therefore, naïvely calibrating a standard PCF to this data may result in a lack of identification of spatial correlation between cells, or may result in spurious correlations being reported. Here we propose an analytic method for calculating a corrected PCF (cPCF) for lattice domains containing obstacles or inaccessible regions. In Section II, we construct the cPCF in a systematic manner, first considering a single inaccessible site, and subsequently increasing the number of sites within an inaccessible region, as well as increasing the number of inaccessible regions. Through comparison with pathfinding algorithms, we show that the derived normalization term is exact. In Section III, we demonstrate that this cPCF is required to isolate the correlation associated with the mechanisms governing the behavior of individuals from correlation associated with the structure of the environment. Further, we show that analysis with the exact normalization term is significantly less computationally intensive to perform, compared to using a pathfinding algorithm, and we discuss environments where an approximation to the normalisation term can be used effectively. Finally, in Section IV, we discuss our results and suggest potential avenues for future research.
Ii Derivation
First we illustrate domains where a different distance metric is required. In Figure 3(a)(d) we introduce a single inaccessible lattice site and examine how the distance metrics change, compared to the domain in Figure 2. We note that neither of the rectilinear distances change, and hence all sites remain the same distance from the center site, excluding the inaccessible sites. As such, the counts of pair distances are only reduced by the reduction in the number of lattice sites. The taxicab distance is more significantly impacted by the introduction of the inaccessible site, as to travel from the center of the domain to the rightmost side now requires that the inaccessible site is avoided. Hence the taxicab distance between the center site and sites on the opposite side of the inaccessible site, with respect to the center site, increases by two (Figure 3(c)). The square uniform distance is also unaffected by the inaccessible site, as diagonal “jumps” count the same as either horizontal or vertical “jumps”. Therefore the inaccessible site can be avoided by two diagonal “jumps”, rather than the two horizontal “jumps”, and the distance does not change (Figure 3(d)). If we introduce a larger inaccessible region, as presented in Figures 3(e)(h), we again see that both rectilinear differences are not influenced. As before, the taxicab distance is influenced as the inaccessible region must be avoided to travel between the center site and sites on the right boundary (Figure 3(g)). In contrast to the small inaccessible region, the square uniform distance is now affected by the presence of the larger inaccessible region, as the larger region cannot be avoided through diagonal movement (Figure 3(h)). We note that any inaccessible region aside from a single site will influence the square uniform distance. As the size of a lattice site typically corresponds to the size of an individual, a distance metric that is not impacted by the presence of obstacles of that size is not appropriate. Furthermore, the majority of models that are implemented on a lattice with obstacles typically only allow movement to one of four nearestneighbors Alizadeh (2011); Burstedde et al. (2001); Ellery et al. (2015, 2016a, 2016b); Nicolau Jr et al. (2007); Wedemeier et al. (2009), and hence the taxicab distance metric is implicitly applied. As such, in the remainder of this work, we consider only the taxicab distance metric.
ii.1 Standard pair correlation functions
To obtain the cPCF for an environment with obstacles under the taxicab distance metric, we first introduce the counts of pair distances for an environment without obstacles. For a domain containing sites in the direction and sites in the direction with noflux boundary conditions, the maximum pair distance is . As experimental images are typically captured such that the influence of boundary effects are minimized, noflux boundary conditions are perhaps the most relevant boundary conditions Johnston et al. (2014). The alternative choice is periodic boundary conditions; however, we do not expect an individual passing through one boundary to arrive at the opposing boundary. Recently, Gavagnin et al. Gavagnin et al. (2018) derived the counts of pair distances for a domain without obstacles, , for
Introducing inaccessible sites into the domain increases the distances between pairs of sites (Figure 3), and hence we require an expression for the counts of pair distances for . For an arbitrary by domain with no obstacles, the counts of pair distances are (see Appendix A for the derivation)
(1) 
The PCF is calculated by evaluating the counts of pair distances between occupied sites, , and normalizing by the expected number of pair distances obtained from and the average occupancy of the domain. If there are occupied sites and accessible sites, where is the number of inaccessible sites, then the expected counts of pair distances is Binder and Simpson (2013)
(2) 
Picking two accessible sites at random,
is the probability that the first selected site is occupied, and
is the probability that the second site is occupied, given that an occupied site has been selected previously.There are two counts that must be obtained from the data: the counts of pair distances between occupied sites, , and the counts of pair distances between accessible sites, . While may have to be obtained via a pathfinding algorithm, the number of occupied sites is typically small compared to the total number of sites. As such, calculating will require significantly fewer iterations of the pathfinding algorithm, which scales with the square of occupied sites for or the square of accessible sites for Gavagnin et al. (2018). Hence, even if calculating via a pathfinding algorithm is prohibitively computationally intensive, should be able to be calculated rapidly.
ii.2 Corrected pair correlation functions
Next, we focus on obtaining an expression for for domains containing inaccessible sites, by adjusting the counts of pair distances for a domain with no obstacles, , to account for inaccessible sites. This takes the form of several additional terms:

Accessibleinaccessible pairs, denoted , which are pairs of sites in the domain that consist of an accessible site and an inaccessible site. As these pairs are counted in , we require that is accounted for via the removal of these pairs.

Inaccessibleinaccessible pairs, denoted , which are pairs of inaccessible sites. Again, these sites are counted in and must be removed to obtain .

Shifted pairs, denoted , which are pairs of accessible sites where the path distance between the sites is different to the Cartesian distance due to the presence of inaccessible sites. Shifted pairs consist of lost pairs, which are pairs of sites in that are no longer present due to inaccessible sites altering the distance (Figures 5(b),(d)), and gained pairs, which are pairs of sites that are not in but are now present due to the introduction of inaccessible sites (Figures 5(c),(e)).
ii.3 Single inaccessible site
We will derive an expression for each of the adjustment terms by systematically considering different configurations of inaccessible sites. We first consider a single inaccessible site, such as presented in Figure 4, and use the subscript to denote the special case of a single inaccessible site. The coloring on each site in Figure 4 highlights the distance between that site and the inaccessible site. For a single inaccessible site, the number of sites with a specific color therefore corresponds to the accessibleinaccessible pairs, . In the absence of boundaries, there are 4 pairs for a distance .
Intuitively, the boundaries reduce the number of pairs of sites separated by larger values of . To calculate which of the 4 pairs lie outside the boundary, we introduce eight values, which represent the distance between the inaccessible site and the boundaries and corners of the domain. The distance to the boundary, , where for the boundary in the left, right, down and up directions, respectively, is defined as
where and correspond to the and location of the inaccessible site. Similarly, the distance to the corner, where and for the corner in the downleft, downright, upleft, upright directions, respectively, is defined as
These values allow us to define a function for the number of pairs containing a site that is located outside of the boundaries, , referred to as outofdomain pairs, and hence
(3) 
The accessible site belonging to an accessibleinaccessible pair can either be located in the same row or column as the inaccessible site (Figures 4(a)(c)), or not located in the same row and not in the same column as the inaccessible site (Figures 4(d)(f)). For sites in either the same row or column as the inaccessible site, there is at most one site at a distance in each direction (Figures 4(a)(c)). Further, the sites will only be in the domain for distances less than the distance to the boundary. We therefore introduce the function
which represents the number of outofdomain pairs of distance with respect to the boundary . For the number of outofdomain pairs of distance in diagonal directions, intuitively there will be no pairs if the distance is less than the minimum distance to a boundary of interest. For all distances greater than the minimum distance to a boundary, but less than the distance to the other boundary of interest, we observe that there is the same number of sites in the domain. For example, in Figure 4(e), in the downleft direction, we observe only two sites highlighted in cyan for . In comparison, in the downright direction, where is less than the minimum distance to a boundary, we observe two, three, four and five sites highlighted in cyan. Hence, the number of outofdomain pairs increases exactly with distance for distances greater than the minimum distance to a boundary, but less than the distance to the other boundary of interest. For distances greater than the maximum distance to a boundary of interest, we observe that the number of sites highlighted in cyan decreases exactly with distance. Hence the number of outofdomain pairs increases by two for an increase in distance of one. Finally, for distances greater than the distance to the corner site, there will be no pairs inside the domain, and the number of outofdomain pairs must be . We note that in all cases the sum of number of the pairs outside and inside the domain is for a distance in a particular diagonal direction. Combining these observations, we introduce the function
which represents the number of outofdomain pairs in each diagonal direction. Note that both and do not need to be evaluated for pair distances greater than the largest distance between the inaccessible site and the boundary or corner, respectively. The number of outofdomain pairs is therefore
For a single inaccessible site there are no inaccessibleinaccessible pairs. Therefore, we next consider the shifted pairs for a single inaccessible site. In this case, pairs of sites that are located on either side of the inaccessible site are shifted pairs, as highlighted in Figure 5. The relevant row and column are presented in Figures 5(b)(e), with an example site highlighted to demonstrate the difference between the pair distances in the domain if the inaccessible site is included (Figures 5(c) and 5(e)) or not (Figures 5(b) and 5(d)). Intuitively, we observe that if both sites in the pair are on the same side of the inaccessible site then the presence of the inaccessible site is irrelevant. If the sites in the pair are on opposite sides of the inaccessible site, then the inaccessible site increases the pair distance by two, which accounts for a path that avoids the inaccessible site. As we initially consider the counts of pair distances for the domain without inaccessible sites, we can incorporate the shifted pairs by removing the pairs present in Figures 5(b) and 5(d) and including the pairs present in Figures 5(c) and 5(e). We note that these are the lost pairs and gained pairs, respectively.
We therefore require an expression for the counts of pair distances within the subdomains in Figures 5(b)(e) for pairs with an inaccessible site on both sides of the inaccessible site. Two observations are useful here; the total number of pairs is conserved, and avoiding the inaccessible site is equivalent to extending the subdomain by two sites, adding two more inaccessible sites, and calculating the pair distances as if the inaccessible site can be passed through. The minimum pair distance for pairs that are located on separate sides of an inaccessible site is one greater than the number of inaccessible sites between them. Hence is defined on and , respectively for the subdomains in Figures 5(b) and 5(d). The maximum number of pairs separated by a given distance is restricted by the requirement that sites are located on both sides of the inaccessible site, and hence has an upper bound of in the horizontal direction and in the vertical direction. These values represent the minimum of the number of sites either side of the inaccessible site. Further, there is only one possible pair of sites separated by a distance of 2 and (or ), two possible pairs of sites for a distance of 3 and (or ), and so forth. Combining this with the aforementioned upper bound we obtain an expression for for a single inaccessible site,
(4) 
As noted previously, the number of shifted pairs is conserved. As the pair distance increases by two in the presence of a single inaccessible site, here
(5) 
This corresponds to an increase of two in both the number of inaccessible sites and the subdomain length. We have now considered all the adjustment terms for a single inaccessible site and therefore by combining (1), (3) and (4)(5), we obtain the expression for the count of pair distances for a domain with a single inaccessible site, noting for a single inaccessible site,
(6) 
ii.4 Clusters of inaccessible sites
We now consider a domain with several inaccessible sites that form a by 1 cluster of inaccessible sites, as presented in Figure 6. Introducing the two additional sites in this example results in an increase in the number of accessibleinaccessible pairs. The total number of accessibleinaccessible pairs, , is given by
(7) 
where is the set of inaccessible sites in the domain and is defined in (3). Including the additional inaccessible sites introduces inaccessibleinaccessible pairs. The calculation of the number of inaccessibleinaccessible pairs, , is straightforward, as the distance between such pairs is simply the taxicab distance between the two relevant sites:
(8) 
where is the indicator function, one when and zero otherwise, is the number of inaccessible sites, is the taxicab distance between two sites located at sites and . We observe that for , as discussed previously.
To calculate lost and gained pairs for a cluster of inaccessible sites, we consider the example domain in Figure 6. The lost pairs and gained pairs associated with the horizontal direction are relatively straightforward, and can be calculated from the subdomains presented in Figures 6(b)(c). We introduce the general counts of pair distances for a onedimensional subdomain , where is the total number of sites in the subdomain, is the number of inaccessible sites in the subdomain and is the relevant or value, as defined previously. The function is
(9) 
We note that this generalizes Equations (4)(5) by allowing for an arbitrary number of inaccessible sites. By using this generalized definition, we can consider transformations of the cluster of inaccessible sites to multiple onedimensional subdomains with varying and , and calculating the value for each. For example, for the horizontal subdomains in Figure 6(b), the lost pair subdomain uses the , and values obtained from the original domain, whereas the gained pair subdomain in Figure 6(c) uses , and . This is consistent with the previous observation of a longer pair distance corresponding to the path around the inaccessible sites. We reiterate that this implies that the sites around the inaccessible sites are accessible. The vertical subdomains in Figure 6(a) are more complicated due to the increase in the number of columns where pairs of sites can be located such that the distance between them is influenced by the inaccessible sites. These columns are highlighted in pink in Figure 6(a). As the accessible sites must be located on either side of the inaccessible sites, there are a total of nine combinations of columns. Hence, the transformation of the vertical subdomain around the cluster of inaccessible sites results in nine onedimensional subdomains. For the lost pairs, there can be one, two or three inaccessible sites between the pair of accessible sites. The number of each of these possibilities is highlighted in Figure 6(d). Note that an increase in the number of inaccessible sites corresponds to an increase in both and . For the gained pairs, there can again be one, two or three inaccessible sites between the pair of accessible sites. However, the magnitude of the increase in pair distance due to the inaccessible sites depends on which columns the pair belongs to. If either one of the accessible sites is in an outermost column, then the increase in pair distance is two. However, if both accessible sites are in the middle column, then the increase in pair distance is four, as the path from one site to the other must avoid all of the inaccessible sites. In general, the increase in distance is two times the minimum distance between the columns (rows) that the accessible sites belong to, and the columns (rows) on either side of the inaccessible sites. We illustrate the transformation of the vertical subdomain into onedimensional subdomains for gained pairs in Figure 6(e), as well as the relevant number of each subdomain. In general, an by 1 cluster of inaccessible sites results in
(10) 
and
(11) 
The first term in both equations corresponds to the horizontal subdomain, and the second and third terms correspond to the vertical subdomains. A similar function for the lost and gained pairs would describe a 1 by cluster of inaccessible sites, as this is simply a rotation of the by 1 cluster of inaccessible sites.
It is now relatively straightforward to extend the lost pairs and gained pairs functions, (10)(11), to apply to an by cluster of inaccessible sites, as presented in Figure 7. We note that the vertical subdomain transformation is similar to the previous by 1 cluster, albeit with an increase in . Now that the cluster has , the horizontal subdomain transformation also results in multiple onedimensional subdomains. We note that the transformation is the same as for the vertical subdomain except for rotation and hence it is straightforward to obtain the lost pairs function for an arbitrary by cluster of inaccessible sites,
(12) 
where
is the horizontal contribution to the lost pair distances and
is the vertical contribution to the lost pair distances. Similarly, the gained pairs function for an arbitrary by cluster of inaccessible sites can be obtained, and is given by
(13) 
where
is the horizontal contribution to the gained pair distances and
is the vertical contribution to the gained pair distances.
Multiple clusters of inaccessible sites
Thus far we have considered only a single cluster of inaccessible sites. We now consider the generalization to multiple clusters of inaccessible sites, as illustrated in Figure 8(a). In this example, we consider a domain with 1 by 1 clusters of inaccessible sites. However, we note that the approach generalizes to rectangular clusters of any size provided that the clusters are arranged such that entirely accessible rows and columns exist on either side of all clusters. We also note that the previous definitions of the accessibleinaccessible pairs and inaccessibleinaccessible pairs, (7)(8), are valid here. Similar to the approach to obtain the lost and gained pairs for a by cluster, a subdomain is isolated and transformed into onedimensional subdomains, where is the number of clusters within a subdomain. Consider the column subdomain presented in Figure 8(b). The transformation isolates all possible combinations of inaccessible sites, and the lost pairs and gained pairs functions are calculated for each onedimensional subdomain. The transformation of the subdomain in Figure 8(b) results in three subdomains. The first two subdomains contain a single inaccessible site and total sites, where corresponds to the maximum number of sites from the original subdomain that are contiguous and still contain only that single inaccessible site. The third subdomain contains both inaccessible sites, and renders all of the intervening sites inaccessible. Here corresponds to the length of the original subdomain. As we are able to calculate the lost and gained pair functions on a subdomain with a single cluster of inaccessible sites, transforming the subdomain with multiple separate inaccessible sites provides a straightforward method for this calculation. We note that the single inaccessible site subdomains provide pairs that contain sites in the uppermost (or bottommost) region of the original subdomain as well as in the middle of the original subdomain. The subdomain with additional inaccessible sites provides pairs that are located in both the uppermost and bottommost region of the original subdomain. In Figure 8(c), we consider the row with three single inaccessible sites, and the associated transformation. As before, we consider all possible combinations of inaccessible sites: three subdomains with a single inaccessible site, two subdomains with inaccessible regions bounded by two inaccessible sites, and finally a subdomain containing all three inaccessible sites, and the sites between them treated as inaccessible.
We now generalize this approach to distinct clusters within a row (column) of clusters of inaccessible sites, where () is the set of coordinates of the leftmost (uppermost) site in inaccessible clusters and () is the set of coordinates of the rightmost (bottommost) site in inaccessible clusters.
(15) 
The first summation represents the number of clusters of inaccessible sites in the domain, and the second summation represents the number of combinations of neighboring clusters. The value corresponds to the number of sites in the cluster in the horizontal direction, and the value corresponds to the length of the subdomain. Repeating this processes over all distinct rows and columns of clusters of inaccessible sites, we obtain
(16) 
for the lost pair distances and, following similar arguments,
(17) 
for the gained pair distances, where () is the number of rows (columns) that contain distinct clusters of inaccessible sites. A cluster that has would only contribute once to rather than times, and we note that the domain in Figure 8(a) has and .
Therefore, the expression for the counts of pair distances for a domain with obstacles is
(18) 
where , , , and are defined in (1), (7)(8) and (16)(17), respectively. Again, we note that this expression is exact provided that the obstacles are arranged such that each obstacle has an entirely vacant row or column on all sides of the obstacle.
Iii Results
We first verify that the analytic expression (18) exactly calculates the counts of pair distances for a domain with obstacles. In Figures 9(a),(c),(e),(g) we present four different domains with inaccessible sites highlighted in black. For each domain we calculate the counts of pair distances numerically using Matlab’s , which calculates the shortest distance between any two points on a graph, given the adjacency matrix of the graph. This is the approach suggested by Gavagnin et al. Gavagnin et al. (2018) for calculating general PCFs, who note that computational cost is for a square domain with sites in each direction, which can become computationally infeasible even for modest values . We then evaluate the analytic expression (18) and present the count of pair distances in Figures 9(b),(d),(f),(h). We observe that the analytic and numerical counts are the same in each case.
We next examine the differences between the PCF and the cPCF for a range of domains that contain inaccessible sites. In Figure 10 we present four domains, containing 0, 1, 25 and 576 clusters of inaccessible sites. The remaining sites are populated with agents (red) at random such that the average occupancy of accessible sites is 20%. We calculate both the PCF and the cPCF for one hundred identicallyprepared realizations and present the functions in Figures 10(b),(e),(h),(k) and 10(c),(f),(i),(l) for the PCF and the cPCF, respectively. As the accessible sites are populated randomly, there should be no pair correlation present and hence for all . For the domain with 0 inaccessible sites (Figure 10(a)), we expect that the PCF and cPCF will be identical, as the cPCF reduces to the PCF in this case. As expected, we observe that the correlation functions in Figures 10(b) and 10(c) are indistinguishable. For all three domains with inaccessible sites, the PCF incorrectly suggests that pair correlation is present within the images for a range of pair distances. For the domain in Figure 10(d), the PCF implies that there is a mechanism that results in both short and longrange clustering, as the correlation value is greater than one for and . Further, there appears to be a mechanism which inhibits clustering at intermediary distances. A similar trend is observed for the third domain (Figure 10(g)), as well as regular oscillations in correlation for . For the fourth domain (Figure 10(j)), these oscillations dominate the PCF. In contrast, the cPCF is approximately one for all pair distances and correctly indicates that there is no mechanism influencing clustering present in the locations of the agents. Hence, the correlation observed in Figures 10(e),(h),(k) is an artifact associated with the inaccessible sites.
We next compare the PCF and the cPCF for the same domains as analyzed previously, but for agents that follow a birthmovement exclusionbased random walk process. Such processes are discussed in detail elsewhere and have been used to mimic the behavior of a cell population Johnston et al. (2014); Simpson et al. (2010). Briefly, we populate accessible sites with agents at random such that the average occupancy is 1%. Agents undergo birth and movement events with probabilities and per time step, respectively. During a time step, agents are selected randomly with replacement and undergo birth events, where a daughter agent is placed at one of four randomly selected nearestneighbor sites Simpson et al. (2010). This birth event is successful if the selected nearestneighbor site does not contain an agent, and the selected site is not inaccessible. After the birth events have been attempted, agents are selected randomly with replacement to undergo movement events. During a movement event, one of four nearestneighbor sites is selected at random, and an agent attempts to move to that lattice site Simpson et al. (2010). Similar to the birth events, this event is successful if the selected site does not contain an agent and the selected site is not inaccessible. This random walk process is used as it is known to result in shortrange correlation between agents due to the birth mechanism Johnston et al. (2014), as birth events inherently cause agents to be located at neighboring sites. This clustering tendency is countered by the movement mechanism, which acts in a diffusive manner. Hence, for higher ratios of to , we expect to see more shortrange correlation, and for , we expect to see no correlation.
Four representative snapshots of domain occupancy for agents following the birthmovement random walk process are shown in Figures 11(a),(d),(g),(j). In each simulation we use a final time that is weighted by the chance of successfully undergoing movement or birth, as the location and number of the inaccessible sites can influence this chance, and hence comparisons between simulations may have an effectively different final time. For the domain in Figure 11(j), for example, there are many sites that have only two accessible neighbor sites. This can be compensated for by scaling the final time by the ratio of the number of accessible neighbor sites if all neighbor sites are accessible to the actual number of accessible neighbor sites in the domain. Hence, for Figure 11(j), we scale the final time by approximately 1.48, as there are 5196 out of a possible 7696 accessible neighbor sites. For all simulations and , before scaling. Compared to the randomly occupied domains presented in Figures 10(a),(d),(g),(j) we immediately observe that the agents are located in clusters. When we calculate the PCFs for these domains, we therefore expect to observe pair correlation values greater than one for short distances. We present the PCF for all four domains in Figures 11(b),(e),(h),(k). While we observe that the pair correlation is greater than one at short distances in all four cases, the correlation at longer pair distances varies. For the domain with no inaccessible sites (Figure 11(a)), the pair correlation is below one for and above one for , as expected. In the second domain, the correlation is below one for intermediate pair distances and above one for . The correlation for the third domain is approximately one for intermediary and large
values. For the fourth domain, the pair correlation is only above one for short distances, and below one otherwise. However, there is a distinct oscillatory pattern between odd and even pair distances. As the mechanisms in the random walk process only result in correlation at a pair distance of one, it is unlikely that these oscillations of this magnitude arise from the random walk process. To examine whether these correlations are indeed present due to the random walk mechanisms, we present the cPCF in Figures
11(c),(f),(i),(l) for the four domains. Again, for the domain with no inaccessible sites, the PCF and the cPCF are the same. In all cases, we observe the expected high correlation at short distance associated with birth events. For the domain in Figure 11(d) the correlation is approximately one for the remainder of the pair distances. For the domains in Figures 11,(g),(j), however, it appears that the correlation decreases with pair distance. This suggests that the restricted geometry of the domain may influence the spreading of the agents, even while scaling the final time. Interestingly, this decrease is also present in the PCF, for the domain in Figure 11(j), albeit in the presence of oscillations. For all four domains the cPCF is relatively consistent, compared to the PCF, which is strongly domain dependent. As such, the cPCF provides a meaningful measure of the correlation present in the domain, as it is able to isolate the agentagent correlation from spurious correlations arising from domain geometry.Domain  50x50  50x50  100x100  100x100  150x150  150x150 

Number of clusters  25  100  25  100  25  100 
Analytic time (s)  2.18  2.36  55.61  30.35  211.64  194.83 
Numerical time (s)  6.48  6.41  310.85  75.58  738.05  573.98 
Domain  50x50  50x50  100x100  100x100  150x150  150x150 

Number of clusters  25  100  25  100  25  100 
Analytic time (s)  3.81  3.16  62.24  56.23  345.20  352.57 
Numerical time (s)  19.02  14.41  348.06  304.75  2053.61  2013.19 
Finally, we compare the time taken to evaluate the cPCF both using the pathfinding algorithm described previously and the analytic expression (18). We consider domains of different sizes randomly occupied by agents such that 20% of the accessible sites contain an agent for different numbers of clusters of inaccessible sites. In Table 1 we present the time required to evaluate the cPCF if the domain is randomly generated with no restriction on the number of inaccessible sites, and in Table 2 we present the time required if the maximum number of inaccessible sites is restricted to 25% of the total number of sites. In both cases, we observe that the numerical approach is always slower, and requires between three and six times the computation time of the analytic approach. Interestingly, the numerical approach requires additional computation time for fewer inaccessible sites. Furthermore, as the numerical algorithm requires individual calculation of the distance from one site to all other sites for each site (approximately algorithm realizations), memory issues become a considerable problem as the domain increases in size.
Approximation
The cPCF relies on a normalization term, (18), that is only exact for certain configurations of inaccessible sites. As it is computationally intensive to determine the normalization term through many realizations of a pathfinding algorithm Gavagnin et al. (2018), which is required for the exact counts of pair distances for such configurations, it is of interest to examine whether an approximation of the analytic normalization term provides a sufficiently accurate alternative. As discussed previously, the corrected normalization term is composed of the standard pairs, accessibleinaccessible pairs, inaccessibleinaccessible pairs and shifted pairs terms. The restriction to certain configurations of inaccessible sites is solely due to the shifted pairs, as the number of other pairs are calculated using standard distance metrics rather than path distance. Hence, if we do not consider the shifted pair distances, the restriction on inaccessible site configurations can be relaxed. We first examine the contributions of shifted pairs to the overall pair distances to determine the size of this contribution. As the shifted pairs consist of both negative and positive terms, corresponding to lost and gained pairs, respectively, the combined terms may only provide a small contribution to the total pairs. We note that for each lost pair there is a corresponding gained pair and hence the total number of pairs is constant independent of whether the lost and gained pairs are considered. In Figure 12 we present the contribution of the shifted pairs term to the overall count of pair distances for the three domains in Figure 11(d),(g),(j). The blue bars correspond to the standard pair distance counts, , the green bars correspond to standard pair distance counts corrected by accessibleinaccessible and inaccessibleinaccessible pairs, and the orange bars correspond to the correction associated with the shifted pairs. As such, the red line corresponds to the approximation of the pair distance count and the black dashed line corresponds to the exact corrected pair distance count. We observe that the shifted pairs provide a small contribution, except in the case of a single large cluster of inaccessible sites (Figure 12(a)). As such, an approximation of the corrected normalization term may result in a valid approximation of the cPCF, provided that the domain is not dominated by a single large cluster of inaccessible sites. We note that for these domains we are able to compare the analytic pair distances with the approximation as the configuration of inaccessible sites means that the analytic function is exact. The approximation of the counts of the pair distances is
(19) 
For the domains considered previously we present both the cPCF and the corresponding approximation in Figures 12(d)(f). As expected, we see that for the domain with a large cluster of inaccessible sites, the approximation is poor for pair distances similar in size to the cluster. For the other two domains, the approximate cPCF performs well. Finally, we populate domains with inaccessible sites at random and calculate the approximate cPCF. All accessible sites on the domain are populated by agents such that the expected pair correlation is one for all pair distances. In Figure 13, we present the average approximate cPCF for 50 random identicallyprepared domains for a range of numbers of inaccessible sites, as well as the mean error associated with each approximate cPCF. Intuitively, we observe that an increase in inaccessible sites corresponds to an increase in the distance between the expected pair correlation and the approximation. Increasing the number of inaccessible sites while populating the domain with inaccessible sites at random introduces accessible sites that are not connected to the remainder of the domain, and hence we do not consider the cPCF approximation for higher numbers of inaccessible sites. As such, the approximation may prove useful for calculating pair correlations for domains where the inaccessible sites do not satisfy the conditions required for the exact pair distance calculation, but have a modest proportion of inaccessible sites compared to accessible sites.
Iv Discussion and conclusions
Analysis of the spatial structure present in experimental images provides valuable insight into the mechanisms governing behavior within the experiment Johnston et al. (2014); Law et al. (2009); Perry et al. (2002). Pair correlation functions have been employed in a variety of fields, including ecology Law et al. (2009), biology Dini et al. (2018) and physics Bahcall and Soneira (1983), and have proven useful for elucidating the presence and impact of spatial structure. Many experimental environments contain immobile obstacles that influence the transport and location of individuals within that environment Rehder and Kloeden (2015); RoosenRunge et al. (2011); Moussaïd et al. (2011). Isolating the spatial structure associated with the mechanisms that govern transport, rather than the heterogeneous nature of the environment is therefore of interest. Naively applying standard PCFs does not account for distances between pairs of individuals that must avoid obstacles, and may result in the incorrect suggestion of spatial correlations.
Here we have presented an exact analytic expression for the normalization term of a corrected PCF that incorporates a physical path distance between individuals, and hence can be applied to environments with obstacles. We demonstrate that this cPCF is necessary for isolating the spatial correlation associated with the locations of individuals from the spatial correlation associated with the environment itself. Further, we highlight that the analytic expression allows for the cPCF to be calculated significantly faster than relying on a pathfinding algorithm. We apply the cPCF to images arising from a latticebased movementbirth random walk, which mimics cell motility and cell proliferation, where shortrange correlation is known to exist, and demonstrate the cPCF recovers this correlation. Standard PCFs can introduce spurious correlations as well as oscillations in the correlation. Finally, we present an approximation to the cPCF that relaxes assumptions on the locations of the inaccessible sites within the domain and show that for modest numbers of inaccessible sites, the approximation is accurate.
The work and analysis presented here could be extended in a number of directions. One obvious application is to calculate the PCF for experimental data obtained from an environment that contains obstacles. Here we have focused on data arriving from simulations that mimic processes such as cell migration and proliferation Johnston et al. (2014) rather than explicitly using experimental data. As previous investigations involving the application of PCFs to experimental data have proved fruitful Agnew et al. (2014); Johnston et al. (2014); Dini et al. (2018), the application of the cPCF to appropriate data may prove to be insightful. Another promising approach would be to examine how the pair correlation changes between two experiments on different domains. As the cPCF is able to isolate the correlation associated with the behavior of individuals, it would be instructive to consider whether the behavior is dependent on environment and, if so, quantify which mechanisms are are responsible for this change in behavior.
Acknowledgements.
This research was in part conducted and funded by the Australian Research Council Centre of Excellence in Convergent BioNano Science and Technology (project no. CE140100036). We thank Pradeep Rajasekhar and Daniel Poole for providing the image of the nervous system used in Figure 1(a).Appendix A Standard counts of pair distances derivation
Consider an arbitrary domain with no obstacles and by lattice sites. For pair distances that satisfy , the number of pairs of sites separated by a distance was presented by Gavagnin et al. Gavagnin et al. (2018). However, the maximum pair distance on such a domain with noflux boundary conditions is , and we therefore must derive counts of pair distances for .
We first consider pair distances where , that is, pairs of sites that cannot be connected via a path containing only “jumps” in the shorter of the horizontal or vertical directions. For example, if , the path connecting pairs of sites separated by distance must contain either entirely horizontal “jumps”, or a combination of horizontal and vertical “jumps”. For a particular , the number of pairs of sites separated by horizontal “jumps” is . This term is the product of the number of pairs of sites separated by horizontal “jumps” within a single row, , and the number of rows, . The number of sites separated by horizontal “jumps” and 1 vertical “jump” is . Compared to the horizontal “jumps” case, there is an additional number of pairs of sites separated by horizontal “jumps”, providing the component. However, as each distance contains a vertical “jump”, the rows containing the sites must be offset by one and hence there is one less row where this can occur, which results in the term. Finally, as the vertical “jump” can be in either the positive or negative vertical direction, this introduces the factor of two. Introducing additional vertical “jumps” increases the number of pairs of sites separated by the resulting smaller number of horizontal “jumps”, while decreasing the effective number of rows. Noting that and are interchangeable, we therefore obtain
for .
We next consider pair distances , where distances between sites must include both horizontal and vertical “jumps”. Again, without loss of generality, we assume that . The minimum number of vertical “jumps” for a distance is , and the maximum number is . The corresponding number of pairs of rows that are separated by this vertical distance is , and the number of pairs of sites separated by this vertical distance and the requisite horizontal distance is . Following the same process as above, taking a summation over the possible rows and columns, and noting that the vertical separation can be either in the positive or negative direction, we obtain
for .
References
 Lerner et al. (2007) A. Lerner, Y. Chrysanthou, and D. Lischinski, in Computer Graphics Forum (Wiley Online Library, 2007), vol. 26, pp. 655–664.
 Dietrich and Köster (2014) F. Dietrich and G. Köster, Phys. Rev. E 89, 062801 (2014).
 Gabella (1990) G. Gabella, J. Auton. Nerv. Syst. 30, S59 (1990).
 Grima (2007) R. Grima, Theor. Biol. Med. Model. 4, 2 (2007).
 Helbing et al. (2005) D. Helbing, L. Buzna, A. Johansson, and T. Werner, Transp. Sci. 39, 1 (2005).
 Höfling and Franosch (2013) F. Höfling and T. Franosch, Rep. Prog. Phys. 76, 046602 (2013).
 Moussaïd et al. (2011) M. Moussaïd, D. Helbing, and G. Theraulaz, Proc. Natl. Acad. Sci. 108, 6884 (2011).
 Rehder and Kloeden (2015) E. Rehder and H. Kloeden, in Proceedings of the IEEE International Conference on Computer Vision Workshops (2015), pp. 50–58.
 RoosenRunge et al. (2011) F. RoosenRunge, M. Hennig, F. Zhang, R. M. Jacobs, M. Sztucki, H. Schober, T. Seydel, and F. Schreiber, Proc. Natl. Acad. Sci. 108, 11815 (2011).
 Rühl (2005) A. Rühl, Neurogastroenterol. Motil. 17, 777 (2005).
 Simpson and Plank (2017) M. J. Simpson and M. J. Plank, Results Phys. 7, 3346 (2017).
 Varas et al. (2007) A. Varas, M. D. Cornejo, D. Mainemer, B. Toledo, J. Rogan, V. Munoz, and J. A. Valdivia, Physica A 382, 631 (2007).
 Alizadeh (2011) R. Alizadeh, Saf. Sci. 49, 315 (2011).
 Ziebart et al. (2009) B. D. Ziebart, N. Ratliff, G. Gallagher, C. Mertz, K. Peterson, J. A. Bagnell, M. Hebert, A. K. Dey, and S. Srinivasa, in IEEE/RSJ International Conference on Intelligent Robots and Systems (IEEE, 2009), pp. 3931–3936.
 Chraibi et al. (2010) M. Chraibi, A. Seyfried, and A. Schadschneider, Phys. Rev. E 82, 046111 (2010).
 Perry et al. (2002) J. N. Perry, A. M. Liebhold, M. S. Rosenberg, J. Dungan, M. Miriti, A. Jakomulska, and S. CitronPousty, Ecography 25, 578 (2002).
 Agnew et al. (2014) D. J. G. Agnew, J. E. F. Green, T. M. Brown, M. J. Simpson, and B. J. Binder, J. Theor. Biol. 352, 16 (2014).
 Bahcall and Soneira (1983) N. A. Bahcall and R. M. Soneira, Astrophys. J. 270, 20 (1983).
 Binder and Simpson (2013) B. J. Binder and M. J. Simpson, Phys. Rev. E 88, 022705 (2013).
 Binder and Simpson (2015) B. J. Binder and M. J. Simpson, R. Soc. Open Sci. 2, 140494 (2015).
 Binny et al. (2016) R. N. Binny, P. Haridas, A. James, R. Law, M. J. Simpson, and M. J. Plank, PeerJ 4, e1689 (2016).
 Dini et al. (2018) S. Dini, B. J. Binder, and J. E. F. Green, J. Theor. Biol. 439, 50 (2018).
 Holzmann and Castin (1999) M. Holzmann and Y. Castin, Eur. Phys. J. D 7, 425 (1999).
 Gavagnin et al. (2018) E. Gavagnin, J. P. Owen, and C. A. Yates, Phys. Rev. E 97, 062104 (2018).
 Johnston et al. (2014) S. T. Johnston, M. J. Simpson, D. S. McElwain, B. J. Binder, and J. V. Ross, Open Biol. 4, 140097 (2014).
 Johnston et al. (2016) S. T. Johnston, J. V. Ross, B. J. Binder, D. S. McElwain, P. Haridas, and M. J. Simpson, J. Theor. Biol. 400, 19 (2016).
 Law et al. (2009) R. Law, J. Illian, D. F. Burslem, G. Gratzer, C. Gunatilleke, and I. Gunatilleke, J. Ecol. 97, 616 (2009).
 Flügge et al. (2012) A. J. Flügge, S. C. Olhede, and D. J. Murrell, Ecology 93, 1540 (2012).
 Rajala et al. (2018) T. Rajala, S. Olhede, and D. J. Murrell, J. Ecol. (2018).
 Burstedde et al. (2001) C. Burstedde, K. Klauck, A. Schadschneider, and J. Zittartz, Physica A 295, 507 (2001).
 Ellery et al. (2015) A. J. Ellery, R. E. Baker, and M. J. Simpson, Phys. Biol. 12, 066010 (2015).
 Ellery et al. (2016a) A. J. Ellery, R. E. Baker, and M. J. Simpson, J. Chem. Phys. 144, 171104 (2016a).
 Ellery et al. (2016b) A. J. Ellery, R. E. Baker, and M. J. Simpson, Phys. Biol. 13, 05LT02 (2016b).
 Nicolau Jr et al. (2007) D. V. Nicolau Jr, J. F. Hancock, and K. Burrage, Biophys. J. 92, 1975 (2007).
 Wedemeier et al. (2009) A. Wedemeier, H. Merlitz, C.X. Wu, and J. Langowski, J. Chem. Phys. 131, 064905 (2009).
 Simpson et al. (2010) M. J. Simpson, K. A. Landman, and B. D. Hughes, Physica A 389, 3779 (2010).
Comments
There are no comments yet.