1 Introduction
Many systems are spatial in nature. When working with spatial data sets, it is important to study the role of underlying spatial relationships [11]. To illustrate this importance, consider the spatiotemporal dynamics of coronavirus disease 2019 (COVID-19) case rates, which are one of the key motivations for our work. The spatial adjacencies between the neighborhoods of a city affect these dynamics, and it is important to account for them. Researchers have studied a wide variety of spatial data sets, such as gross domestic product (GDP) and life expectancy by country [40, 3] and voting in elections across different regions of a state [17]. Such data sets often also include temporal information (e.g., daily COVID-19 case rates), and it is also important to take this information into account.
We develop new methods for using topological data analysis (TDA) to analyze geospatial (i.e., geographical) and geospatiotemporal data sets in a way that directly incorporates spatial information. TDA is a way of studying the “shape” of a data set [6]. Using persistent homology (PH), a tool from algebraic topology, allows one to find geometric voids of different dimensions in a data set and to quantify the “persistence” of these voids [30]. Zero-dimensional (0D) voids are connected components, and one-dimensional (1D) voids are holes; such voids are particularly important in two-dimensional (2D) spatial data. To quantify the persistence of holes and other voids, one constructs a simplicial complex (a combinatorial description of a topological space) and a filtration function (see Section 2.1). In our work, we treat the geographical data as 2D data and construct a 2D filtered simplicial complex to represent it. PH has yielded insights into a wide variety of areas, such as dynamical systems [42, 45, 24, 38, 1], neuroscience [33, 18], materials science [5], and chemistry [25]. Spatial applications that have been examined as 2D data sets include sensor networks [14], percolation [35], and city-street networks and other complex systems [16].
In our consideration of time-dependent data, we use vineyards, which were introduced in [13] as a way of representing time-varying PH, to incorporate temporal information. One can visualize a vineyard as a continuous stack of persistence diagrams (PDs), with one PD for each time point. The PH features trace out curves, which are called vines, in $\mathbb{R}^3$. See Section 2.2 for the definition of a vineyard.
In our approach, the voids that PH identifies correspond to local extrema of real-valued geospatial data. Our approach captures both local information (specifically, the locations and values of the local extrema) and global information about the relationships between the extrema, such as the extent to which extrema are spatially separated. The vineyards allow us to measure the persistence of the extrema over time and to track how the locations of the extrema change over time. We are thus able to track how the spatial structure of the data changes over time.
One of the contributions of our paper is a method for constructing an efficient simplicial complex that is homeomorphic to a geographical space (the set of regions, as we will explain shortly). It is important to attempt to minimize the number of simplices in a simplicial complex because PH and vineyard computation times are very sensitive to the number of simplices. More precisely, we construct a 2D simplicial complex that comes with a natural mapping from the set of 2D simplices to the geographical regions. In our construction, the union of any subset of geographical regions is homeomorphic to the subsimplicial complex (see Section 3 for the definition of a subsimplicial complex) that is induced by the union of the corresponding simplices. When every geographical region is simply connected, our construction uses the minimal number of simplices that a simplicial complex with the property above can have. We believe that our construction is also minimal in more general cases, but we do not prove this. Other methods of producing simplicial complexes from geospatial data, such as rasterizing a shapefile or treating the regions as a point cloud, require a trade-off between the number of simplices and the accuracy of the representation of the geographical regions. For example, the level-set-based PH method of [17] uses orders of magnitude more simplices to achieve sufficient resolution of the smallest geographical regions (e.g., densely populated urban centers that are important to analyze). See Section 7 for further discussion.
Our method addresses several limitations of previous efforts to combine TDA with geospatial analysis. In [37], Stolz et al. studied the percentage of United Kingdom voters by district that voted to leave the European Union in the so-called “Brexit” referendum. The holes that they identified using PH corresponded to districts that voted differently from the surrounding districts. However, the approach in [37] did not distinguish between PH classes that were merely noise and PH classes that corresponded to small geographical districts. In [17], Feng and Porter developed an approach for studying PH in which they constructed filtered simplicial complexes using the level-set method of front propagation [29]. Using their level-set complexes, they examined the percentage of voters in each precinct of California counties that voted for a given candidate (e.g., Hillary Clinton) in the 2016 presidential election. The PH features represented precincts that voted more heavily for Clinton than the surrounding precincts (e.g., “islands of blue in a sea of red”). The level-set complexes in [17] have two key limitations. The first is that they cannot handle time-dependent data, as they are built to study data at a single point in time or data that has been aggregated over some time window to yield time-independent data. The second limitation is that they reduce real-valued data (e.g., the percentage of voters who voted for Clinton) to binary data (e.g., whether or not the majority voted for Clinton). Consequently, in this example, the level-set-based PH does not capture the extent to which a blue “political island” voted more heavily for Clinton. By contrast, our approach is designed specifically to capture such information. As a trade-off, we no longer capture the geographical sizes of the political islands. See Feng, Hickok, and Porter [15], who applied the level-set filtration to study one of the COVID-19 data sets that we also study in the present paper, for further discussion.
Our new approach for computing PH is also able to resolve some other technical issues in [17]. In particular, some of the PH features in the level-set approach from [17] are geographical artifacts that are indistinguishable from true features of a data set. In our method, by contrast, the finite 1D PH features have either a one-to-one correspondence with the local maxima of the real-valued geospatial function or a one-to-one correspondence with the local minima, depending on the choices that one makes. Additionally, unlike the level-set approach in [17], we are able to detect extrema that are adjacent to the boundary of the geographical space.
As case studies, we apply our method to two data sets. The first data set is a geospatial data set of per capita vaccination rates in New York City (NYC) by zip code. The PH features identify zip codes in which the vaccination rate is either lower or higher (depending on choices that one can make in our approach) than in the surrounding zip codes. The second data set consists of 14-day mean per capita COVID-19 case rates in neighborhoods in the city of Los Angeles (LA) in the time period 25 April 2020–25 April 2021. Modeling the spatiotemporal spread of COVID-19 is a complex task [2, 44]. In this geospatiotemporal data set, the PH features of our approach identify COVID-19 anomalies, which are regions whose case rates are higher than in the surrounding regions. (Footnote 1: We examine local maxima in the case-rate data. This contrasts with COVID-19 “hotspots,” which the CDC has defined using an absolute threshold for the number of cases and criteria that are related to the temporal increase in the number of cases [10].) It is important to examine such anomalies, as COVID-19 spreads with significant spatial heterogeneity and thus has heterogeneous effects on different areas. (Footnote 2: Other scholars have studied contagions using TDA in ways that do not yield topological features with geographical meaning. For example, recent work has used TDA to study the spatiotemporal spread of COVID-19 [32] and Zika [34]. These papers studied topological features in atmospheric data, which were then used to forecast case rates. TDA was also used in [39] to analyze the Watts threshold model of a social contagion on noisy geometric networks.) Many factors (such as mobility, population density, socioeconomic differences, and racial demographics) play a role in how COVID-19 affects regions differently [19, 20, 9].
In our case study of COVID-19 case rates in LA, we construct a vineyard that (1) identifies which anomalies are most persistent in time and (2) reveals how the anomalies move geographically over time.
Our paper proceeds as follows. In Section 2, we review relevant topological background. In Section 3, we formulate how we construct simplicial complexes. In Section 4, we define several filtration functions and discuss how to interpret the resulting PDs and vineyards. In Section 5, we apply our method to the LA and NYC data sets. In Section 6, we discuss our choices in our methodology. In Section 7, we summarize our work and discuss some of its implications. In Appendix A, we discuss technical details of the simplicial-complex construction. In Appendix B, we discuss alternative topological approaches for studying PH in geospatiotemporal data. Our code is available at https://bitbucket.org/ahickok/vineyard/src/main/.
2 Background
2.1 Persistent Homology (PH)
We briefly review persistent homology (PH). See [30] for a more thorough introduction. To start, let $K$ be a simplicial complex. A filtration function (or simply a filtration) is a function $f: K \to \mathbb{R}$ such that if the simplex $\sigma$ is a face of $\tau$, then $f(\sigma) \leq f(\tau)$. The pair $(K, f)$ is a filtered simplicial complex (FSC). Let $K_r := \{\sigma \in K \mid f(\sigma) \leq r\}$ be the $r$-sublevel simplicial complex, and let $r_0 < r_1 < \cdots < r_n$ be the image of $f$. The sequence $K_{r_0} \subseteq K_{r_1} \subseteq \cdots \subseteq K_{r_n}$ is a nested sequence of simplicial complexes. See Figure 7 for an example of an FSC.
We compute the homology of each over a field , which we set to in the present paper. Homology classes represent connected components, holes, and higherdimensional voids in a simplicial complex. The inclusion induces a map from the homology of over to the homology of over . The persistent homology (PH) of the filtered simplicial complex is the module , where the action of is given by the maps ; that is, if is a homology class in , then . We say that a PH class is born at filtration level if is the earliest filtration level at which exists. More precisely, is born at if is not equivalent to for all and . We say that the PH class dies at filtration level if is the minimum index such that . Not every class dies; we refer to classes that do die as finite and classes that do not die as infinite.
The Fundamental Theorem of Persistent Homology yields a set of generators for a given persistence module. According to it, the persistence module is isomorphic to

$$\left(\bigoplus_{i} x^{a_i} \cdot \mathbb{F}[x]\right) \oplus \left(\bigoplus_{j} x^{b_j} \cdot \left(\mathbb{F}[x] / (x^{c_j} \cdot \mathbb{F}[x])\right)\right) \qquad (1)$$

for some $\{a_i\}$, $\{b_j\}$, $\{c_j\}$, where $\mathbb{F}$ is the coefficient field and we write $r_k$ for the $k$th filtration level. An $x^{a_i} \cdot \mathbb{F}[x]$ summand corresponds to a PH class that is born at filtration level $r_{a_i}$ and never dies. An $x^{b_j} \cdot (\mathbb{F}[x] / (x^{c_j} \cdot \mathbb{F}[x]))$ summand corresponds to a PH class that is born at filtration level $r_{b_j}$ and dies at filtration level $r_{b_j + c_j}$. Each generator has a birth simplex that creates the homological class and (if finite) a death simplex that destroys the homological class. For example, in Figure 7, there is one 1D PH generator. Its birth simplex is the edge that completes the loop that encircles the hole, and its death simplex is the triangle that fills in the hole. The birth filtration level of the PH class is the filtration value of its birth simplex, and the death filtration level (if finite) is the filtration value of its death simplex.
A persistence diagram (PD) is a way of representing PH as a multiset of points in the extended plane $\overline{\mathbb{R}}^2$. Each point represents a PH class; the point’s coordinates are the class’s birth and death filtration levels. Given a decomposition of the persistence module of the form Eq. 1, the PD includes the points $(r_{a_i}, \infty)$ for all $i$, the points $(r_{b_j}, r_{b_j + c_j})$ for all $j$, and all points on the diagonal. One includes the points on the diagonal for technical reasons; one can think of them as PH classes that die instantaneously upon birth. See Figure 7 for an example of a PD.
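To make the birth and death simplices concrete, the following minimal sketch computes persistence pairs with the standard boundary-matrix reduction. It is an illustration only and is not the code from the paper's repository: the function name `persistence_pairs`, the toy complex, and the choice of $\mathbb{Z}/2\mathbb{Z}$ coefficients are assumptions made for this example.

```python
from itertools import combinations


def persistence_pairs(filtration):
    """Reduce the Z/2 boundary matrix of a filtered simplicial complex.

    `filtration` maps each simplex (a sorted tuple of vertex ids) to its
    filtration value; faces must not have larger values than their cofaces.
    Returns (finite, infinite), where finite holds (dim, birth, death)
    triples and infinite holds (dim, birth) pairs.
    """
    # Sort by value, then dimension, so every face precedes its cofaces.
    order = sorted(filtration, key=lambda s: (filtration[s], len(s), s))
    index = {s: i for i, s in enumerate(order)}

    # Column j holds the row indices of the codimension-1 faces of simplex j.
    columns = [
        set() if len(s) == 1
        else {index[face] for face in combinations(s, len(s) - 1)}
        for s in order
    ]

    pivot_owner = {}  # pivot row (largest index in a reduced column) -> column
    finite, paired = [], set()
    for j, col in enumerate(columns):
        # Add earlier columns (mod 2) until the pivot is new or the column is empty.
        while col and max(col) in pivot_owner:
            col ^= columns[pivot_owner[max(col)]]
        if col:
            i = max(col)  # simplex i is the birth simplex, simplex j the death simplex
            pivot_owner[i] = j
            paired.update((i, j))
            finite.append((len(order[i]) - 1, filtration[order[i]], filtration[order[j]]))

    infinite = [(len(s) - 1, filtration[s]) for k, s in enumerate(order) if k not in paired]
    return finite, infinite


# A hollow triangle whose hole is filled in at filtration level 5.
toy = {(0,): 0, (1,): 0, (2,): 0, (0, 1): 1, (1, 2): 2, (0, 2): 3, (0, 1, 2): 5}
finite, infinite = persistence_pairs(toy)
```

In this toy complex, the edge `(0, 2)` completes the loop and the triangle `(0, 1, 2)` fills it in, so the single 1D class appears in `finite` with the filtration values of those two simplices as its birth and death levels.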
2.2 Vineyards
Vineyards are a tool for computing time-varying PH [22]. A time-dependent filtration function on a simplicial complex is a function such that is a filtration for all and is a filtered simplicial complex for all . We compute the PH of for all times . We visualize the vineyard in $\mathbb{R}^3$ as a continuous stack of PDs (see Figure 8). The points in the PDs trace out curves over time; these curves are the vines. Each vine corresponds to a PH class; a vine is the graph of the birth and death filtration levels of a particular PH class over time. The PH class that is represented by a vine has a time-dependent birth simplex and (if finite) a time-dependent death simplex . At time , the homology class is created by the simplex at filtration level and destroyed by the simplex at filtration level (if finite). The functions and are piecewise constant. We measure the overall persistence of a vine by calculating .
Cohen-Steiner et al. [13] developed an algorithm for computing vineyards when they first introduced the concept. One computes the PH at the initial time, and one then updates the pairings of birth and death simplices as the order of the simplices (as induced by the time-dependent filtration function) changes over time. Each change in the order of the simplices occurs one transposition at a time. One can make these updates in $O(n)$ time (where $n$ is the number of simplices) per transposition of simplices.
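The transpositions that drive these updates can be listed directly when the filtration values change linearly between two times. The sketch below only finds the times at which pairs of simplices swap order; it does not perform the pairing updates of the vineyard algorithm. The function name `crossing_times` and the toy values are hypothetical.

```python
from itertools import combinations


def crossing_times(f0, f1, t0=0.0, t1=1.0):
    """List the order swaps between simplices under linear interpolation.

    `f0` and `f1` map each simplex to its filtration value at times t0 and t1.
    Returns sorted (time, simplex_a, simplex_b) triples, one per pair of
    simplices whose relative order changes strictly inside (t0, t1).
    """
    events = []
    for a, b in combinations(sorted(f0), 2):
        d0 = f0[a] - f0[b]  # signed gap at time t0
        d1 = f1[a] - f1[b]  # signed gap at time t1
        if d0 * d1 < 0:  # strict sign change, so the two values cross once
            s = d0 / (d0 - d1)  # fraction of the interval at which the gap is 0
            events.append((t0 + s * (t1 - t0), a, b))
    return sorted(events)


f_start = {"a": 0.0, "b": 1.0, "c": 3.0}
f_end = {"a": 2.0, "b": 1.0, "c": 0.0}
events = crossing_times(f_start, f_end)
```

Each returned event is a candidate transposition at which a vineyard implementation would apply its per-transposition update.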
3 Constructing a Simplicial Complex
We now show how we construct a simplicial complex from geographical data (e.g., a shapefile that specifies geographical boundaries). We partition the geographical space into regions. In Section 5.1, the regions are zip codes in NYC; in Section 5.2, the regions are neighborhoods in the city of LA. Let be the set of regions. We refer to the complement of as the exterior region. Given the geographical boundaries of such a set , we construct a 2D simplicial complex with the following property:

There is an assignment of 2D simplices to regions such that the union of any subset of regions is homeomorphic to the subsimplicial complex that is induced by the union of the corresponding 2D simplices. The subsimplicial complex that is induced by a set is the smallest simplicial complex that contains the set of simplices. That is, if is a simplicial complex that contains , then . When is 1D, a subsimplicial complex is equivalent to an induced subgraph.
In Figure 11, we present an example of our construction.
When every geographical region is simply connected, our simplicial complex has the minimal number of simplices that is possible for a simplicial complex with the property above. We believe that our construction is minimal in more general cases (specifically, under the geographical assumptions that we state shortly). Constructing an efficient simplicial complex is important because the run time of TDA computations is very sensitive to the number of simplices.
We make the following (mild) assumptions about geographical regions:

There are a finite number of regions, and each region has a finite number of connected components.

Each region is a compact subset of $\mathbb{R}^2$.

The boundary of each component of a region is a finite collection of curves that are homeomorphic to $S^1$. (This ensures that region boundaries are not self-intersecting. Each component of a region has an outer boundary component and some number (which can be $0$) of inner boundary components.)

The intersection between any two regions has a finite number of components. Each component of the intersection is homeomorphic to a point, a closed interval in $\mathbb{R}$, or $S^1$.

The intersection between three or more regions is either a point or $\emptyset$.
These conditions are very reasonable for human-made geographical boundaries. We do not even require the regions to be simply connected or the region intersections to be connected. In Figure (a)a, we illustrate the most typical situation that we encounter. In this example, the LA neighborhood Granada Hills is homeomorphic to a disc and its intersection with each of its neighbors is homeomorphic to a closed interval in $\mathbb{R}$. In Figures 14 and (a)a, we illustrate a few of the other possible configurations that arise in geospatial applications. In our case studies, the geographical data take the form of shapefiles. In a shapefile, each region is represented by a polygon (or by multiple polygons, if the region is disconnected). If the interiors of the polygons do not intersect, then the conditions above are satisfied. In practice, the polygon boundaries are not always aligned perfectly and thus may overlap slightly, but we can still approximate a given set of regions by a set of regions that do satisfy these conditions. In our data, the only assumption that does not always hold is the third one (on the boundaries of region components); for example, see Figure (a)a for a violation of it in the NYC data set. However, by making a few modifications, we are still able to construct a simplicial complex with the desired property for this data set. One can make analogous modifications for similar data sets. We discuss this in more detail in Section 5.1.
To build a simplicial complex from our geographical data, we proceed as follows. First, we construct a 2D simplicial complex for each region. We then glue their boundaries together in a way that respects the geographical region boundaries. In Figure 11, we show an example of this procedure. For each of the five regions in the example, we construct a simplicial complex that consists of a few triangles. We then glue the five simplicial complexes together along their boundaries to obtain a simplicial complex with the desired property. We assign each 2D simplex to the region whose simplicial complex originally contained it. In the remainder of this section, we discuss the details of this process.
Under the geographical assumptions 3–3, the intersections of a region with its neighbors are such that for each component of the region’s boundary, one can order the neighbors in clockwise (or counterclockwise) fashion, possibly with repetition^{3}^{3}3Theoretically, several 0D intersections can be adjacent to each other, although this scenario does not occur in our data sets. That is, in principle, there can be a sequence of neighbors such that is the same point for all . The order of this sequence is not determined uniquely by the intersections of the neighbors with . Instead, we order them in the order in which they appear clockwise (or counterclockwise) around the point . This sequence must be finite because there are a finite number of regions and 3 implies that if .. We list intersections with the exterior region in the same manner as for any other neighboring region. We also record whether each intersection is 1D or 0D. For example, in Figure (a)a, the clockwise sequence of neighbors around the boundary of Valley Glen is {Van Nuys, North Hollywood, Valley Village, Sherman Oaks}. The intersection with Valley Village is 0D and the other intersections are 1D. For regions such as West Vernon in Figure (b)b, we obtain a sequence for each boundary component. Each sequence is unique up to the choice of starting neighbor.
Given a sequence of neighbors for each boundary component of each region (which, if necessary, we adjust as in Appendix A.1), we construct a 2D simplicial complex for each region using Algorithm A.2. In Figure 21, we illustrate examples of the resulting simplicial complexes. Without loss of generality, we assume that each region is connected; if not, we treat each component of a region as if it were its own unique region. To region , we assign a simplicial complex such that the th boundary component of is a cycle that has one edge for each neighbor such that is 1D. For example, Granada Hills (see Figure 11) is assigned the simplicial complex in Figure (a)a. We annotate each edge of the boundary with the neighbor that corresponds to it. We also annotate each vertex with the sequence of its adjacent regions, which we list in clockwise order starting with .
We then glue the simplicial complexes along their edges according to their edge and vertex annotations. More precisely, if has disjoint edges with the annotation (which is the typical situation when has components that are 1D), then has exactly disjoint edges with the annotation . Let , with and in clockwise order, be the vertices of an edge in with annotation . Because the edges are disjoint, and must have at least neighbors (including ). We seek an edge (with and in clockwise order) in with the annotation such that (1) and are annotated with the same sequences and (2) and are annotated with the same sequences. We know that there must be at least one such edge because represents a component of and there is some edge in that represents the same component (and thus its vertices have the same sequences of adjacent regions as and ). In Appendix A.3, we prove that there is a unique such edge. In Figure 17, we show an example of this case. If there are consecutive edges on the boundary of with annotation , then there are consecutive edges on the boundary of with annotation . This situation arises precisely because of the adjustments we discuss in Appendix A.1. We glue to for all . If is homeomorphic to , then the choice of as the first edge in is not unique, but all choices result in topologically equivalent spaces. The result of this gluing process is a topological space with property 3.
Code for our simplicial-complex algorithm is available at https://bitbucket.org/ahickok/vineyard/src/main/. This code has one limitation that the algorithm in the present paper does not: It requires that no interior region (i.e., a region that is contained within the outer boundary of another region) intersects any other interior region. This does not occur in our data, and we believe that it does not occur in most geographical spaces.
4 Our Filtration Functions
We define various filtrations that one can use with the simplicial complex that we constructed in Section 3, and we discuss how to interpret the resulting PDs and vineyards. Let be the set of geographical regions that the simplicial complex represents, and let be a real-valued function on . For example, in Section 5.1, is the per capita full vaccination rate (i.e., having received all required doses of some vaccine) for COVID-19 in NYC zip code . In Sections 4.1 and 4.2, we define two filtration functions that are induced by . Given a time-dependent and real-valued function , we define time-dependent filtration functions in Section 4.3. For example, in Section 5.2, is the 14-day mean per capita COVID-19 case rate in neighborhood on day . From a time-dependent filtration function, we compute a vineyard.
4.1 The Sublevel Filtration
In this subsection, we define a sublevel filtration. In our applications, we use the 1D PH of the sublevel filtration to analyze local maxima in our data sets. We illustrate the idea of a sublevel filtration in Figure 32.
Definition 1 (Sublevel Filtration)
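Let $K$ be a simplicial complex from the construction in Section 3 for a set $\mathcal{R}$ of regions, and let $\rho$ be the assignment of 2D simplices to regions. Let $g: \mathcal{R} \to \mathbb{R}$. We define the sublevel filtration function $f$ by considering the sublevel sets of $g$. On the 2D simplices, we define the filtration function by
$$f(\sigma) = g(\rho(\sigma)).$$
We extend the filtration function to the remaining simplices by setting
$$f(\sigma) = \min_{R \in \mathcal{R}} g(R)$$
if $\sigma$ is a vertex or edge on the boundary of $K$ and by setting
$$f(\sigma) = \min \{ f(\tau) \mid \tau \text{ is a 2D simplex such that } \sigma \subseteq \partial\tau \} \qquad (2)$$
otherwise, where $\partial\tau$ denotes the boundary of $\tau$.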
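As an illustration of this definition, the sketch below assigns sublevel filtration values on a small triangulated fan. It rests on an assumption about the definition's stripped formulas: exterior-adjacent vertices and edges receive the global minimum of the region function, and every other vertex or edge receives the minimum value over the triangles that contain it. The function `sublevel_filtration`, its arguments, and the toy data are hypothetical.

```python
from itertools import combinations


def sublevel_filtration(triangles, region_of, values, exterior):
    """Assign sublevel filtration values to all simplices of a 2D complex.

    triangles: sorted 3-tuples of vertex ids
    region_of: dict mapping each triangle to its region
    values:    dict mapping each region to its function value
    exterior:  set of exterior-adjacent vertices and edges (sorted tuples)
    """
    filt = {tri: values[region_of[tri]] for tri in triangles}
    floor = min(values.values())  # assumed value on exterior-adjacent simplices
    for tri in triangles:
        for k in (1, 2):  # vertices, then edges
            for face in combinations(tri, k):
                if face in exterior:
                    filt[face] = floor
                else:
                    # Minimum over all triangles that have `face` on their boundary.
                    filt[face] = min(filt.get(face, float("inf")), filt[tri])
    return filt


# Four triangular regions around a central vertex 0.
triangles = [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 1, 4)]
region_of = dict(zip(triangles, "ABCD"))
values = {"A": 0.5, "B": 0.9, "C": 0.1, "D": 0.4}
exterior = {(1,), (2,), (3,), (4,), (1, 2), (2, 3), (3, 4), (1, 4)}
filt = sublevel_filtration(triangles, region_of, values, exterior)
```

Under this assumption, the outer edge `(2, 3)` of the high-valued region B enters at the global minimum rather than at B's own value, which is what lets a loop form around B before its triangle appears.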
At filtration level , the simplicial complex is the subsimplicial complex of that is induced by the union of the set of 2D simplices such that and the set of vertices and edges that are on the boundary of (that is, the set of vertices and edges that represent intersections of the regions in with the exterior region). (Henceforth, we say that such simplices are “exterior-adjacent”.) By construction, is homeomorphic to the union of regions such that along with the exterior boundary. We set for exterior-adjacent vertices and edges for technical reasons that we will explain in a few paragraphs. In Appendix B.2, we explore an alternative definition in which we set the filtration values of exterior-adjacent vertices and edges to , where is the connected component that contains .
The 1D PH of the sublevel filtration encodes information about the structure of the local maxima of . A region is a local maximum if the value of is larger than the value of for all neighboring regions of for which is 1D. If is a local maximum, there is a 1D PH class whose death simplex is one of the simplices in the preimage . The class dies at filtration level . For example, if is the COVID-19 case rate in region , then 1D PH classes correspond to COVID-19 anomalies and the death simplex of a 1D PH class is the epicenter of that anomaly. The larger the value of in comparison to the surrounding regions (including regions that are nearby but not necessarily immediate neighbors), the more persistent the PH class is. If the union of all regions (excluding the exterior region) is not simply connected, then there is at least one 1D PH class with an infinite death time. See Figure (b)b for an example. The infinite 1D PH classes correspond to the holes in the geographical space, rather than to local maxima. The local maxima of are in one-to-one correspondence with the set of 1D PH classes with finite death times. There is a canonical mapping from finite 1D PH classes to regions. A class that is represented by simplex pair is mapped to the region that contains . The region is the location of the local maximum of that corresponds to the PH class, and the death simplex’s filtration value is the value of the local maximum. The death simplices of the finite 1D PH classes and their filtration values give the local-maximum locations and their function values .
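The notion of a local maximum used here can be checked directly on a region adjacency structure. The sketch below is a plain adjacency scan rather than a PH computation; `local_maxima`, `edge_neighbors` (neighbors that share a 1D boundary), and the toy values are hypothetical names and data.

```python
def local_maxima(values, edge_neighbors):
    """Return the regions whose value strictly exceeds every edge neighbor's.

    values:         dict mapping each region to its function value
    edge_neighbors: dict mapping each region to the regions whose
                    intersection with it is 1D (0D contacts are omitted)
    """
    return sorted(
        region
        for region, value in values.items()
        if all(value > values[nbr] for nbr in edge_neighbors.get(region, ()))
    )


# Four regions in a ring (A-B-C-D-A) plus an isolated island E.
values = {"A": 0.5, "B": 0.9, "C": 0.1, "D": 0.4, "E": 0.3}
edge_neighbors = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"A", "C"},
}
maxima = local_maxima(values, edge_neighbors)
```

Note that a region with no edge neighbors, such as the island `E`, satisfies the condition vacuously and is therefore reported as a local maximum.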
The 1D PH does more than simply identify local maxima and their locations; it also reveals information about their relationships to each other. If the local maxima are well-separated from one another, then the corresponding PH classes all have early birth times. In the NYC data set, for example, there are several connected components and one can think of the global maxima of the connected components as “totally separated” from each other because they are on different connected components. The corresponding 1D PH classes are all born at the earliest possible filtration time, which is (see Figure (a)a). We show an example of well-separated local maxima in Figure (e)e. By contrast, the two local maxima in Figure (j)j are not well-separated, so the PH class that corresponds to the lower peak in Figure (j)j is born at a higher filtration value than the PH class in Figure (e)e. The birth times of the 1D PH classes reflect structural information about the local maxima.
We set the filtration value of exterior-adjacent vertices and edges to so that 1D PH can detect local maxima on the boundary of a geographical space. This is important for the LA data set of COVID-19 case rates. In Figure 51, we observe that many of the most persistent COVID-19 anomalies are on the boundary of the geographical space, and it is crucial that we are able to detect them. If we had not made this adjustment, the filtration value of exterior-adjacent vertices and edges would be the value of , where is the unique region that is adjacent to . If is a local maximum, its corresponding 1D PH class would be born and die at filtration level . In the PD, it would then appear as a point on the diagonal. Therefore, for 1D PH to detect local maxima on the boundary of a geographical space, we must adjust the filtration values of exterior-adjacent vertices and edges.
The 0D PH classes correspond to local minima of . However, unlike for the 1D PH classes, there is not a natural mapping from 0D PH classes to the locations of the minima. In Appendix B.1, we discuss the interpretation and computation of 0D PH classes in more detail.
4.2 The Superlevel Filtration
An alternative to using the sublevel filtration from Section 4.1 is to instead consider superlevel sets of and use them to construct a superlevel filtration. In our case studies, we use the superlevel filtration to analyze local minima in our data sets. We illustrate the idea of the superlevel filtration in Figure 33.
Definition 2 (Superlevel Filtration)
Let $g: \mathcal{R} \to \mathbb{R}$ for a set $\mathcal{R}$ of regions. The superlevel filtration function is the sublevel filtration function that is induced by $-g$.
At filtration level , the simplicial complex is the subsimplicial complex of that is induced by the union of the set of 2D simplices for which . By construction, is homeomorphic to the union of regions for which . Local maxima of now correspond to 0D PH classes, and local minima of now correspond to 1D PH classes; this is the opposite situation from the sublevel filtration. Our discussion of local maxima for the sublevel filtration in Section 4.1 applies to local minima for the superlevel filtration, and our discussion of local minima for the sublevel filtration in Section 4.1 applies to local maxima for the superlevel filtration. The only difference is that the filtration values in the superlevel filtration are the additive inverses of the function values of . This implies, for example, that the death filtration value of a 1D PH class that corresponds to a local minimum at region is , rather than .
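Because the superlevel filtration is the sublevel filtration of the negated function, local minima can be found by negating the values and reusing a local-maximum test. The sketch below illustrates this sign flip on toy data; the helper names and values are hypothetical.

```python
def negate(values):
    """Return the additive inverse of every region value."""
    return {region: -value for region, value in values.items()}


def local_maxima(values, edge_neighbors):
    """Regions whose value strictly exceeds that of every edge neighbor."""
    return sorted(
        region
        for region, value in values.items()
        if all(value > values[nbr] for nbr in edge_neighbors.get(region, ()))
    )


values = {"A": 0.5, "B": 0.9, "C": 0.1, "D": 0.4}
edge_neighbors = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"A", "C"},
}

negated = negate(values)
# Local minima of the original values are local maxima of the negated values.
minima = local_maxima(negated, edge_neighbors)
# Filtration level associated with each local minimum in the superlevel filtration.
superlevel_levels = {region: negated[region] for region in minima}
```

Here the region `C` is the only local minimum, and its associated filtration level is the additive inverse of its function value, in line with the discussion above.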
4.3 A Time-Dependent Filtration
Suppose that we have a time-dependent, real-valued function whose domain is , where is the initial time and is the final time. For example, in Section 5.2, the value of is the 14-day mean per capita COVID-19 case rate in Los Angeles on day . We seek to analyze the structure of local extrema as they change over time.
Definition 3 (Time-Dependent Sublevel Filtration)
Let be a time-dependent function on a set of regions. At each time , we define the time-dependent filtration function to be the sublevel filtration that is induced by . To extend this filtration function to the entire time interval, we linearly interpolate.
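The interpolation step can be sketched as follows, assuming that the filtration values are stored as one dictionary per sample time. The function `interpolate_filtration` and the toy snapshots are hypothetical. A convex combination of two filtrations is again a filtration, because the face inequality is preserved by nonnegative weights.

```python
from bisect import bisect_right


def interpolate_filtration(times, snapshots, t):
    """Linearly interpolate simplex filtration values at time t.

    times:     sorted list of at least two distinct sample times
    snapshots: list of dicts (simplex -> value), one per sample time
    """
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the sampled time interval")
    hi = min(bisect_right(times, t), len(times) - 1)  # index of the right endpoint
    lo = hi - 1
    weight = (t - times[lo]) / (times[hi] - times[lo])
    return {
        simplex: (1 - weight) * snapshots[lo][simplex] + weight * snapshots[hi][simplex]
        for simplex in snapshots[lo]
    }


times = [0.0, 1.0, 2.0]
snapshots = [
    {"s": 0.0, "e": 1.0},
    {"s": 2.0, "e": 3.0},
    {"s": 1.0, "e": 5.0},
]
halfway = interpolate_filtration(times, snapshots, 0.5)
late = interpolate_filtration(times, snapshots, 1.5)
```

Evaluating this function on a fine grid of times gives the sequence of filtrations from which a vineyard implementation can track the changes in simplex order.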
In the present paper, we only use the time-dependent sublevel filtration, but one can analogously define a time-dependent superlevel filtration. We have implemented both of these filtrations in our code.
We use a time-dependent sublevel filtration to construct a vineyard. This allows us to track how the extrema move in both space and time. As in Section 4.1, each finite vine corresponds to a local maximum whose location at time is given by the region that contains the vine’s time-dependent death simplex . The length of a vine corresponds to its persistence in time.
5 Case Studies
We now apply our methods to two data sets, which we visualize in Figure 36.
5.1 COVID-19 Vaccination Rates in New York City
We examine vaccination rates in (modified) zip codes of NYC. (Footnote 4: Modified zip-code tabulation areas (MODZCTA) are used by the NYC Department of Health & Mental Hygiene for COVID-19 data [27]. In these modified zip codes, some zip codes with small populations are combined [28]. We henceforth refer to modified zip codes as simply “zip codes”.) We demonstrate the effects of the two filtrations that we defined in Section 4. The geographical boundaries of the zip codes are given by a shapefile [27]. New York City zip codes do not satisfy the third assumption in Section 3 (on the boundaries of region components) because some of the zip-code boundaries have a component that is homeomorphic to two circles that are glued at a point (i.e., a figure-8). For an example, see Figure (a)a. We construct a simplicial complex in a way that is similar to the construction in Section 3; everything is the same except for some minor modifications to the way that we construct the simplicial complexes for zip codes with figure-8 boundaries. In Figure (b)b, we show how one constructs a simplicial complex for such a region. Our construction still has the desired homeomorphism property from Section 3.
The data set, which we obtained from the NYC Department of Health & Mental Hygiene website [12], consists of the number of fully vaccinated people in each zip code on 23 February 2021. (Footnote 5: The NYC Department of Health & Mental Hygiene defines “fully vaccinated” people to be individuals who have received either both doses of the Pfizer or Moderna vaccine or one dose of the Johnson & Johnson vaccine. This differs from common parlance, in which people are sometimes labeled as “fully vaccinated” only after two weeks have passed after their final dose of a vaccine.) For each zip code, we divide this number by its population estimate in [12] to obtain a per capita vaccination rate. The value of our function on a zip code is the per capita full vaccination rate in that zip code on 23 February 2021. We do not possess the daily vaccination-rate data that is necessary to compute a vineyard, so instead we calculate the PH of this function with the sublevel and superlevel filtrations from Sections 4.1 and 4.2. We show the resulting PDs for the 1D PH in Figure 42. As we described in Section 4.1, the points in the sublevel-filtration PD correspond to zip codes in which vaccination rates are higher than in the surrounding zip codes. The death filtration level of a PH class is the vaccination rate in that zip code, and the birth filtration level of a PH class reflects the extent of spatial isolation of that zip code from other local maxima; an earlier birth filtration level implies more spatial isolation. Similarly, the points in the superlevel-filtration PD correspond to zip codes in which the vaccination rate is lower than in surrounding areas. As we discussed in Section 4.1, we obtain the zip code of a PH class from its death simplex. We color all points in the PDs by the borough of the corresponding zip code.
An issue arises from the fact that several of the NYC zip codes are isolated islands. These islands are trivial extrema because they are not adjacent to any other zip codes. One may want to exclude these trivial extrema from the PD. In Appendix B.2, we propose alternative methods for handling disconnected geographical spaces such as NYC.
One can use the PDs in Figure 42 to study inequities in vaccine access. For example, it seems potentially desirable to discern patterns in demographic data that correspond to the most persistent points in the PDs.
5.2 COVID-19 Infections in the City of Los Angeles
We now examine COVID-19 case rates in neighborhoods of the city of Los Angeles (LA). (Footnote 6: We exclude Angeles National Forest because it only has 20 inhabitants.) The geographical boundaries of the neighborhoods are given by a shapefile [23]. From this, we construct a simplicial complex in the manner that we described in Section 3. We also know the number of cases in each neighborhood from 25 April 2020 through 25 April 2021. For each neighborhood, we divide the case count by the neighborhood population to obtain per capita case rates, and we calculate a running 14-day mean on each day to smooth the data. (Footnote 7: On each day, we take the mean of the case rates over a window of 14 days. Some outlets (e.g., [36]) report running 14-day means of COVID-19 case counts, and other outlets (e.g., [41]) report 14-day trends.) The value of our function on a neighborhood on a given day (with days counted from April 2020) is the 14-day mean per capita case rate in that neighborhood on that day. We compute the vineyard for the simplicial complex using the time-dependent sublevel filtration that is induced by this function. We show our vineyard in Figure 45, and we show subsets of this vineyard in Figures 49 and 54.
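The smoothing step can be sketched in a few lines of Python. The sketch below assumes a trailing window (the mean on a day uses that day and the 13 preceding days); the alignment of the window is an assumption, and the function names `per_capita` and `running_mean` are hypothetical.

```python
def per_capita(case_counts, population):
    """Divide each daily case count by the neighborhood population."""
    return [count / population for count in case_counts]


def running_mean(rates, window=14):
    """Trailing running mean: output entry k averages `window` consecutive days.

    The first output corresponds to day window - 1 (0-indexed), which is the
    first day with a full window of data. The trailing alignment is an
    assumption of this sketch.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    means = []
    total = 0.0
    for day, rate in enumerate(rates):
        total += rate
        if day >= window:
            total -= rates[day - window]  # drop the day that left the window
        if day >= window - 1:
            means.append(total / window)
    return means
```

With daily data, `running_mean(per_capita(counts, population))` gives one smoothed per capita rate for each day that has a full 14-day window.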
The vines in the vineyard correspond to COVID-19 anomalies, which we define to be neighborhoods that have a higher running 14-day mean COVID-19 case rate than the surrounding neighborhoods for at least one day. Anomalies that are more spatially isolated yield vines with early birth-filtration levels, and anomalies with high case rates yield vines with late death-filtration levels. See Section 4.1 for a more detailed discussion. We color each vine according to the geographical location(s) of its anomaly. As we discussed in Section 4.3, we obtain the anomaly location(s) from the time-dependent death simplex of a vine. The death simplex is a piecewise-constant function of time; as it changes, so does the location of the associated anomaly. Therefore, the color of a vine can change over time. For example, consider Figure 49, where we show the five most-persistent vines. The global maximum of the data set is initially in Little Armenia, but it later moves to Vermont Square. In the vineyard, we see this from the vine that is initially blue (for Little Armenia) and subsequently orange (for Vermont Square). There are also other vines whose locations change over time. The successive locations of such a vine do not need to be adjacent, but they are often near each other. In Figure 51, we highlight these anomalies on a map.
A vineyard encodes the temporal persistence of anomalies. The length of time that a vine is not on the diagonal plane of a vineyard, which we henceforth call the “length” of a vine, is the amount of time that an anomaly exists. At the beginning of the COVID-19 pandemic, all neighborhoods had low per capita case rates. We expect emerging anomalies to have a low case rate for a long time and then for the case rate to grow rapidly starting at some later time. An emerging anomaly in the “low case rate” phase yields a vine that is close to the diagonal for a long time. By examining the lengths of vines, we hypothesize that one can distinguish between very concerning emerging anomalies (i.e., those that may become major COVID-19 anomalies in the future) and anomalies of lesser concern, even when the anomalies have similar case rates.
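As a rough illustration of the “length” of a vine, the following Python sketch measures the time that a sampled vine spends off the diagonal. It assumes that a vine is stored as a time-sorted list of (time, birth, death) triples, and it counts a time step only when the vine is off the diagonal at both ends of the step; the representation, the function name `vine_length`, and this discretization are assumptions for illustration.

```python
def vine_length(vine, tol=1e-12):
    """Approximate the amount of time that a vine is off the diagonal.

    vine : list of (time, birth, death) triples, sorted by time
    tol  : a point counts as off the diagonal if death - birth > tol
    """
    length = 0.0
    for (t0, b0, d0), (t1, b1, d1) in zip(vine, vine[1:]):
        # Count the step only if both of its endpoints are off the diagonal.
        if d0 - b0 > tol and d1 - b1 > tol:
            length += t1 - t0
    return length
```

Under these assumptions, a vine that hugs the diagonal but has a large `vine_length` is a candidate for the emerging anomalies that we describe above.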
In Figure 54, we show case rates early in the time period that we track by computing the vineyard for the period 25 April 2020–25 May 2020. The COVID-19 pandemic was declared a national emergency on 13 March 2020 [43], and the city of LA closed its public schools and ordered the closure of restaurants, bars, and gyms on 16 March 2020 [21]. In our vineyard, we exclude the twenty most-persistent vines to more easily visualize the vines that are close to the diagonal plane. Many of these latter vines are short, so their associated anomalies are short-lived. The longer vines correspond to anomalies that are longer-lived and thus of greater concern in the long run, even though they are close to the diagonal during the period 25 April 2020–25 May 2020. For example, there is an anomaly at Wilmington that we show with the light-blue vine. This vine is close to the diagonal plane, but it has high temporal persistence during the period 25 April 2020–25 May 2020. In Figure 49, we see that Wilmington eventually becomes one of the worst hotspots of COVID-19 case rates in LA.
6 Discussion
In our approach, we needed to make a variety of choices. There are other ways to construct a simplicial complex to represent a geographical space. There are also other choices in topological tools for analyzing timevarying data. We briefly discuss some of these possibilities in the next several paragraphs.
Rasterization is an alternative method for constructing a simplicial complex from shapefile data. When one rasterizes a shapefile, one can transform the resulting image into a simplicial complex by imposing the pixels of the image onto a triangulation of the plane. However, our approach has several key advantages over rasterization. First, the number of simplices in the simplicial complex that one obtains by rasterizing a shapefile is orders of magnitude larger than the number of simplices in our construction. Computing the PH of a simplicial complex with fewer simplices is significantly faster. Second, the simplicial complex that one obtains by rasterization has no guarantee of “topological correctness”, as property 3 may not hold. The extent to which the resulting simplicial complex is topologically correct depends on the resolution of the rasterization, and using a higher resolution requires more simplices. Third, our construction of simplicial complexes yields a natural way to map a 2D simplex to the geographical region that contains it. We use this preservation of geographical information to find the locations of the local extrema. Lastly, our construction allows us to detect anomalies on the boundary of a geographical space.
Our construction uses direct geographical adjacencies, but one may instead wish to employ “effective” distances between regions. One can calculate effective distances using mobility and transportation data. Two regions that are closely connected via transportation are effectively closer than they are based on direct geographical considerations; this affects phenomena such as the dynamics of infectious diseases [4, 31].
We used only 1D PH to study extrema, but one can alternatively use 0D PH if one is not interested in the geographic locations of the extrema; we discuss this in Appendix B.1. In Appendix B.2, we discuss alternative filtrations that one can apply to geographical spaces (such as NYC) that are disconnected. We used a time-dependent function on a geographical space to compute vineyards, but an alternative is to use an approach that is based on multiparameter zigzag PH. We discuss this in Appendix B.4. When the time-dependent function is monotonic in time for every region, one can also use an approach that is based on multiparameter PH (i.e., without needing to invoke zigzag PH); we discuss this in Appendix B.3. However, both multiparameter PH and multiparameter zigzag PH are difficult to visualize, and they both suffer from a lack of easily interpretable invariants. Consequently, we only computed vineyards in our applications.
7 Conclusions
We developed methods to directly incorporate spatial structure into applications of topological data analysis (and, specifically, of persistent homology) to geospatiotemporal and geospatial data. We defined a way to construct a simplicial complex that efficiently and accurately represents a geographical space. Given a function on a geographical space, we defined filtration functions on a simplicial complex such that the PH classes are in one-to-one correspondence with either local minima or local maxima. By constructing a vineyard, one can track how the local extrema move and change over time.
We conducted case studies using COVID-19 vaccination and case-rate data. In one case study, we examined the geospatial vaccination structure in New York City on one day. In our other case study, in which we examined geospatiotemporal data, we constructed a vineyard to analyze COVID-19 case-rate anomalies in the city of Los Angeles over the course of one year. From the vineyard, we identified the locations of these anomalies and measured the severity of the disease outbreaks. The vineyard also captured information about the relationships between anomalies, such as the extent to which they are isolated from each other. We calculated the temporal persistence of an anomaly based on the length of its corresponding vine.
There are several ways to build on our research. It is desirable to discover how to use a vineyard to produce systematic forecasts of how a disease (or something else) will spread in space and time. We hypothesized in Section 5.2 that one can identify “emerging anomalies” in the COVID-19 data set as vines that are long but close to the diagonal plane. In other applications, one may want to predict the locations of the local extrema with the largest data values and/or highest temporal persistences. One may also want to forecast how the extrema will move in space. It will be valuable to investigate how to use the output of our approach as an input to forecasting models.
Our approach is useful for a wide variety of applications, and it seems possible to generalize it for many others. For example, given spatiotemporal voting data, one can identify regions that vote differently from the surrounding regions. This would allow one to generalize the work of [17] to track the intensity of voting differences and study spatial relationships between different political islands. Our methodology is not restricted to geographical data. It is applicable whenever one has a surface that is partitioned into a finite number of regions and a real-valued function (or a sequence of real-valued functions) on those regions. For example, it may be possible to apply our approach to grayscale image data by partitioning an image into regions in which pixel values are close to each other. It also seems possible to extend our approach to higher dimensions; this would require constructing a higher-dimensional simplicial complex given boundary intersection data on the higher-dimensional regions. For example, in three dimensions, one could use such an extension of our approach to study atmospheric, oceanic, and video dynamics.
Appendix A Details of the Simplicial-Complex Construction
A.1 Boundary-Sequence Adjustment
Before constructing the simplicial complexes for each region , we adjust the boundary sequences as follows. Let denote the sequence of neighbors around the outer boundary of region , and let , …, (where is the number of inner boundary components of ) denote the sequences of neighbors around the inner boundary components of . First, we adjust the sequences so that for each region and each boundary component , the first element of has a 1D intersection with . To do this, let be the elements of , where is the number of neighbors. If is not 1D, let be the smallest index such that is 1D. We then set to be equal to . Following this, we adjust the sequences such that for all and . If , there are two cases:

(Case 1) If , let be the unique element of . This situation occurs if is an island, and it can also occur if lies inside or if lies inside . We adjust to be the sequence . If is not the exterior region, let be the index of the boundary component of that intersects . Adjust to be the sequence to compensate for the adjustment that we made to .

(Case 2) If , let and be the two elements of , where (without loss of generality) is the exterior region if is adjacent to the exterior along its th boundary component. For example, in Figure (a)a, . We adjust to be the sequence . If is not the exterior region, which occurs if is not adjacent to the exterior, then we also adjust to compensate, where is the index of the boundary component of that intersects . In that case, we adjust by repeating an additional time.
Finally, we adjust the outer boundary sequences so that for all . If for some region (which does not occur in either of our geographical data sets), let be an element of , where (without loss of generality) is the exterior region if is adjacent to the exterior along its outer boundary component. We adjust so that repeats an additional times. If is not the exterior, let be the index of the boundary component of that intersects . We adjust to compensate by repeating neighbor an additional times.
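The first adjustment in this appendix (making the first element of each boundary sequence a neighbor with a 1D intersection) can be sketched in Python as follows. The sketch assumes that the adjustment is a cyclic rotation of the sequence that starts at the first neighbor with a 1D intersection, and that each boundary sequence is stored as a list of neighbor labels alongside a parallel list of intersection dimensions; the function name `rotate_to_1d_start` and this data layout are hypothetical.

```python
def rotate_to_1d_start(neighbors, dims):
    """Cyclically rotate a boundary sequence so that it starts at a 1D intersection.

    neighbors : neighbor labels in cyclic order around one boundary component
    dims      : dims[k] is the dimension (0 or 1) of the intersection of the
                region with neighbors[k]
    Returns the rotated (neighbors, dims) pair.
    """
    for j, dim in enumerate(dims):
        if dim == 1:
            # j is the smallest index with a 1D intersection; start there.
            return neighbors[j:] + neighbors[:j], dims[j:] + dims[:j]
    raise ValueError("no neighbor has a 1D intersection with the region")
```

A rotation preserves the cyclic order of the neighbors, so it does not change which neighbors are consecutive around the boundary component.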
A.2 Construction of the Annotated Simplicial Complex for a Region
We assume that we have already adjusted the boundary sequences as in Appendix A.1 whenever necessary.
Algorithm 1: Construct the annotated simplicial complex for a region
Input:
(1) The sequence of neighbors in clockwise order around the outer boundary component of the region.
(2) For each inner boundary component of the region, the sequence of neighbors in clockwise order around that inner boundary component.
(3) For each neighbor in these sequences, the dimension of its intersection with the region.
Output: An annotated simplicial complex for the region.
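A.3 Construction of the Full Simplicial Complex from the Collection of Region Complexes
We present two lemmas that were used in Section 3 to construct the full simplicial complex by gluing together the collection of simplicial complexes for the individual regions.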
Lemma 1
Let be the boundary components of a region . If a region is connected and has a nonempty intersection with the boundary component , then does not intersect any other boundary component .
Proof
By the Jordan Curve Theorem, every boundary component of divides the plane into an “inside” and an “outside”. Therefore, because is connected, it either lies outside the outer boundary component of or inside one of the inner boundary components of . If lies inside an inner boundary component , then can intersect but cannot intersect any other for . If lies outside the outer boundary component, then can intersect the outer boundary component of , but it cannot intersect any other boundary components.
For example, suppose that is West Vernon in Figure (b)b. Vermont Square intersects its inner boundary component but not its outer boundary component.
Lemma 2
Let be the annotated simplicial complex for region , let be one of its boundary components, and let be a vertex in . Let be the sequence of region adjacencies of . If , then the boundary component has at most one other vertex with the same set of region adjacencies. Additionally, if exists, its sequence of region adjacencies must be , which is the mirror of the orientation of neighbors around .
Proof
Suppose that is a vertex in with the same set of region adjacencies as . Let be the boundary component of that corresponds to the boundary component of . Let and be the points on that correspond, respectively, to and in . Either the interior of is contained in the region that is bounded by , or it is contained in the complement of the region that is bounded by . Without loss of generality, we suppose the former case. Let be the permutation of such that the sequence of region adjacencies around is . Let , with , be a pair of indices. By Lemma 1, is adjacent only to one boundary component of and to one boundary component of . Let be the boundary component of to which is adjacent, and let be the boundary component of to which is adjacent. We have .
Because is homeomorphic to , there exist paths , from to such that . Because the interior of does not intersect , it follows that and are both in the complement of the region that is bounded by . There are two paths from to on . Let be the unique choice of path such that is not contained in the region that is bounded by the closed curve . Either is in the region that is bounded by the closed curve or is in the region that is bounded by the closed curve . Without loss of generality, we suppose that the latter is true.
Analogously to our argument above, there exist paths , from to such that and and are in the complement of the region that is bounded by . Because is homeomorphic to , and are either both contained in the region that is bounded by or both contained in the complement of the region that is bounded by . Because , it must be the former case. Therefore, . It follows that is orderreversing. If there were another vertex in that is adjacent to the same set of regions, then the orientation of those regions around would be the mirror of both the orientation of regions around and the orientation of regions around , which gives a contradiction when .
For example, let be the region Koreatown in Figure (a)a. The two vertices that are shared by Koreatown and Little Bangladesh have the same region adjacencies, but they have mirrored orientations.
Appendix B Alternative Topological Approaches
B.1 0D Persistent Homology
Consider a real-valued function on a set of geographical regions. In Sections 4.1 and 4.2, we described how one can analyze the local maxima (respectively, minima) of this function by computing the 1D PH of the sublevel (respectively, superlevel) filtration. In this section, we discuss how the 0D PH of the sublevel (respectively, superlevel) filtration yields information about local minima (respectively, maxima) of the function.
The 0D PH of the sublevel filtration encodes information about the structure of local minima of the function in a way that is similar to how the 1D PH encodes information about the structure of local maxima. One can imagine taking sublevel sets of the function in Figure 33 (where we display superlevel sets) to see why this is true. A region is a local minimum if its function value is less than the function values of all neighboring regions with which it has a 1D intersection. If a region is a local minimum, there is a 0D PH class whose birth simplex is one of the vertices in one of the triangles that are assigned to that region. The class is born at the filtration level that equals the function value of the region. For the LA data set of COVID-19 case rates, 0D PH classes correspond to regions that have a lower case rate than surrounding regions. The smaller the function value of a region in comparison to the surrounding regions, the more persistent the PH class is. There is also one infinite 0D PH class for each connected component. One can think of these classes as corresponding to the “local minimum” at the exterior region. However, unlike for 1D PH classes, there is no canonical map from 0D PH classes to regions because the birth simplex of a 0D class is a vertex that belongs to several regions. The 0D PH of the superlevel filtration analogously encodes information about the structure of local maxima of the function. However, as with the sublevel filtration, there is no canonical map from 0D PH classes to regions. Therefore, one cannot easily use the 0D PH of the sublevel (respectively, superlevel) filtration to identify the geographical locations of the local minima (respectively, maxima), so we did not study 0D PH in our case studies.
Although we did not compute 0D PH in the present paper, using 0D PH to study the structure of local extrema is appropriate when one is not interested in their locations. One can compute the 0D PH of the sublevel (respectively, superlevel) filtration more efficiently than the 1D PH of the superlevel (respectively, sublevel) filtration using the following approach. Given any filtration function on the simplicial complex that we constructed in Section 3, the 0D PH of the filtration is isomorphic to the 0D PH of its restriction to the 1-skeleton of the complex (i.e., the vertices and edges of the complex). In particular, if the filtration is the sublevel or superlevel filtration that is induced by a function on the regions, then one can construct an alternative 1D simplicial complex with even fewer simplices such that the 0D PH on the alternative complex is isomorphic to the 0D PH of the original filtration. Because the alternative complex has fewer simplices, one can do TDA computations more efficiently. To build it, we start with the collection of simplicial complexes for the individual regions. For each of these complexes, we remove all 2D simplices. We then remove all nonboundary edges, except that for each inner boundary component, we leave one edge that connects that inner boundary component to the outer boundary component. (The latter step is so that the complex for each region is still connected after we remove the 2D simplices.) We then glue together the collection of simplicial complexes according to their edge and vertex annotations. This yields the desired 1D simplicial complex. However, we are not interested in 0D PH in our case studies, so we do not use this construction.
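For readers who do want 0D PH, the computation on a 1D simplicial complex reduces to a standard union-find pass over the edges in order of filtration value. The following Python sketch shows this standard technique (with the “elder rule” for merging components); it is not the implementation in the accompanying code, and the function name `zero_dim_persistence` and the input format are hypothetical.

```python
import math


def zero_dim_persistence(vertex_values, edges):
    """Compute 0D persistence pairs of a filtered graph with union-find.

    vertex_values : dict that maps each vertex to its filtration value
    edges         : list of (u, v, value) triples, where value is at least
                    the filtration values of both endpoints
    Returns a sorted list of (birth, death) pairs; classes that never die
    have death equal to math.inf. Pairs with birth == death are included.
    """
    parent = {v: v for v in vertex_values}
    birth = dict(vertex_values)  # birth value of the component at each root

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    pairs = []
    for u, v, value in sorted(edges, key=lambda edge: edge[2]):
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            continue  # the edge closes a cycle, so no 0D class dies
        # Elder rule: the component that was born later dies at this edge.
        if birth[root_u] > birth[root_v]:
            root_u, root_v = root_v, root_u
        pairs.append((birth[root_v], value))
        parent[root_v] = root_u
    roots = {find(v) for v in vertex_values}
    pairs.extend((birth[root], math.inf) for root in roots)
    return sorted(pairs)
```

The output has one infinite class for each connected component of the graph, which matches the description of the infinite 0D PH classes above.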
B.2 Alternative Filtrations for Disconnected Geographical Spaces
In Section 4.1 (respectively, Section 4.2), we defined a sublevel (respectively, superlevel) filtration in which we set the filtration values of all exterior-adjacent vertices and edges to the global minimum (respectively, negative global maximum) of the function on the regions. In applications in which the union of all regions is not connected, such as for the NYC zip codes in Section 5.1, an alternative definition is to consider extrema on each connected component separately, rather than on the entire geographical space at once. This solves the problem that an isolated region (a geographical island) is trivially both a local maximum and a local minimum because it is not adjacent to any other regions. (Footnote 8: These are literal islands, rather than “islands” from a PH computation.) In Definitions 1 and 2, such isolated regions appear as 1D PH classes that are born at the earliest filtration time, which may falsely emphasize the persistence of these trivial extrema.
Definition 4 (Alternative Sublevel Filtration)
Consider the simplicial complex from Section 3 for a set of regions, the assignment of 2D simplices to regions, and a real-valued function on the regions. If a vertex or edge is on the boundary of the simplicial complex, consider the 2D simplex that has this vertex or edge on its boundary. On the boundary of the simplicial complex, we define the alternative sublevel filtration function of such a vertex or edge to be the minimum value of the function over the regions in the connected component that contains the region to which this 2D simplex is assigned. On all other simplices, the filtration function is equal to the sublevel filtration function.
Definition 5 (Alternative Superlevel Filtration)
Consider a real-valued function on a set of regions. The alternative superlevel filtration function is the alternative sublevel filtration function that is induced by the negative of this function.
Definitions 4 and 5 are appropriate options if one seeks to treat each connected component independently. In these alternative definitions, each connected component uses only information about other regions in the same component. One then compares region values to global extremum values on their connected components. One consequence of using these definitions is that one ignores isolated regions, which are trivial extrema. In Definitions 4 and 5, these isolated extrema appear as points on the diagonal of a PD. This is often an appropriate way to handle isolated regions. However, when an isolated region is a global extremum of a data set, this may be undesirable. This situation never occurs in our data.
For example, NYC has 14 connected components; several of these are zip codes that correspond to isolated islands. The alternative sublevel and superlevel filtrations effectively treat each connected component of NYC separately. In Figures (a)a and (b)b, we show the PDs that we compute using the alternative sublevel and superlevel filtrations that are induced by the vaccination-rate function that we defined in Section 5.1. In these PDs, we compare a zip code’s per capita vaccination rate to the global minimum or maximum rate on its connected component, rather than the global extremum of the entire data set. More precisely, the birth time of a connected component’s global extremum is either the lowest per capita vaccination rate of that component (for the alternative sublevel filtration) or the additive inverse of the highest per capita vaccination rate of that component (for the alternative superlevel filtration). Consequently, the trivial island extrema are represented by PH classes on the diagonal.
The alternative sublevel filtration and the alternative superlevel filtration, along with their time-dependent versions, are implemented in our code that is available at https://bitbucket.org/ahickok/vineyard/src/main/.
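The per-component comparison can be sketched in Python as follows. The sketch computes, for each region, the minimum function value on its connected component; under the description above, this is the value that the alternative sublevel filtration compares against. It assumes a symmetric region-adjacency dictionary, and the function name `component_minima` is hypothetical; it is an illustration rather than the implementation in the repository.

```python
from collections import deque


def component_minima(values, adjacency):
    """Map each region to the minimum value on its connected component.

    values    : dict that maps each region to its function value
    adjacency : dict that maps each region to an iterable of adjacent regions
                (assumed to be symmetric)
    """
    result = {}
    for start in values:
        if start in result:
            continue
        # Breadth-first search to collect the component that contains `start`.
        component, seen, queue = [start], {start}, deque([start])
        while queue:
            region = queue.popleft()
            for neighbor in adjacency.get(region, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.append(neighbor)
                    queue.append(neighbor)
        lowest = min(values[region] for region in component)
        for region in component:
            result[region] = lowest
    return result
```

For an isolated island, the component minimum equals the island’s own value, which is consistent with its PH class lying on the diagonal. For the superlevel analogue, one can apply the same function to the negated values.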
B.3 Multiparameter Persistent Homology
One can use multiparameter persistent homology to study how the topology of a data set changes as one varies multiple parameters [8]. In applying multiparameter PH to our COVID-19 case-rate data, two feasible parameters are (1) time and (2) the cumulative COVID-19 case rate. To compute multiparameter PH, one starts with a multiparameter filtration $\{K_u\}_{u \in \mathbb{N}^n}$, where $K_u \subseteq K_v$ for all $u \leq v$ and $n$ is the number of parameters. When $n = 1$, this is a filtered simplicial complex (i.e., an ordinary filtration); when $n = 2$, it is a bifiltration. The multiparameter PH in dimension $k$ over a field $\mathbb{F}$ is the graded $\mathbb{F}[x_1, \ldots, x_n]$-module $\bigoplus_{u} H_k(K_u; \mathbb{F})$. For $i \in \{1, \ldots, n\}$, the action of $x_i$ is the map $H_k(K_u; \mathbb{F}) \to H_k(K_{u + e_i}; \mathbb{F})$ that is induced by the inclusion $K_u \hookrightarrow K_{u + e_i}$, where $e_i$ is the $i$th standard basis vector. When $n = 1$, this definition reduces to PH. One can use multiparameter PH to study local extrema of functions that are nondecreasing over time.
Definition 6
Let be the simplicial complex from the construction in Section 3 for a set of regions. Let be a function for which for all . Define the function to be the sublevel filtration that is induced by . Let be the image of , where is the number of elements in the image. We define the bifiltration
One can use Definition 6 to study cumulative COVID-19 case rates over time.
B.4 Multiparameter Zigzag Persistent Homology
One can use zigzag persistent homology to study how the topology of a data set changes as one varies a parameter nonmonotonically [7]. In singleparameter zigzag PH, one starts with a sequence of simplicial complexes such that, for all , either or . (By contrast, in ordinary PH, for all .) An inclusion induces a map , and an inclusion induces a map . Analogously to PH, one can decompose the resulting zigzag module into “interval modules” .
One can use multiparameter zigzag PH when there are multiple parameters that vary nonmonotonically. See Section 2.1 of [7] for a short discussion. In applying multiparameter zigzag PH to our COVID-19 case-rate data, two feasible parameters are (1) time and (2) the current COVID-19 case rate. Given a diagram of simplicial complexes, such as in Equation 3, one can construct a diagram of homology groups that is induced by the maps between the simplicial complexes. This is a representation of a quiver. However, in contrast to single-parameter zigzag PH, there are no well-behaved statistical summaries of multiparameter zigzag PH.
Definition 7
Let be the simplicial complex from the construction in Section 3 for a set of regions, and suppose that . Define half steps for , and let . Define the function as follows:
We define the function to be the sublevel filtration that is induced by . Let be the image of . We define
This yields the following diagram:
(3) 
The inclusion maps induce a corresponding diagram of homology groups.
One can use Definition 7 to study noncumulative COVID-19 case rates over time.
Acknowledgements
We thank Henry Adams, Heather Zinn Brooks, Michelle Feng, Lara Kassab, and Nina Otter for helpful discussions. Additionally, we are grateful to Michelle Feng for teaching us how to work with geospatial data. We thank the Los Angeles County Department of Public Health for providing the LA city data on COVID-19 and the LA neighborhood population estimates.
References

 [1] H. Adams, T. Emerson, M. Kirby, R. Neville, C. Peterson, P. Shipman, S. Chepushtanova, E. Hanson, F. Motta, and L. Ziegelmeier, Persistence images: A stable vector representation of persistent homology, Journal of Machine Learning Research, 18 (2017), pp. 1–35.
 [2] J. Arino, Describing, modelling and forecasting the spatial and temporal spread of COVID19 — A short review, arXiv preprint arXiv: 2102.02457, (2021).
 [3] A. Banman and L. Ziegelmeier, Mind the Gap: A Study in Global Development Through Persistent Homology, Springer International Publishing, Cham, Switzerland, 2018, pp. 125–144.
 [4] D. Brockmann and D. Helbing, The hidden geometry of complex, networkdriven contagion phenomena, Science, 342 (2013), pp. 1337–1342.
 [5] M. Buchet, Y. Hiraoka, and I. Obayashi, Persistent Homology and Materials Informatics, SpringerVerlag, Heidelberg, Germany, 2018, pp. 75–95.
 [6] G. Carlsson, Topological methods for data modelling, Nature Reviews Physics, 2 (2020), pp. 697–707.
 [7] G. Carlsson and V. de Silva, Zigzag persistence, Foundations of Computational Mathematics, 10 (2010), pp. 367–405.
 [8] G. Carlsson and A. Zomorodian, The theory of multidimensional persistence, Discrete and Computational Geometry, 42 (2009), pp. 71–93.
 [9] Centers for Disease Control and Prevention, Risk for COVID19 infection, hospitalization, and death by race/ethnicity. https://www.cdc.gov/coronavirus/2019ncov/coviddata/investigationsdiscovery/hospitalizationdeathbyraceethnicity.html.
 [10] Centers for Disease Control and Prevention, Trends in number and distribution of COVID19 hotspot counties — United States, March 8–July 15, 2020. https://www.cdc.gov/mmwr/volumes/69/wr/mm6933e2.htm (21 August 2020).
 [11] Y. Chun and D. A. Griffith, Spatial Statistics and Geostatistics: Theory and Applications for Geographic Information Science and Technology, Sage Publishing, 2013.
 [12] City of New York, COVID19: Data on Vaccines — NYC Health. https://www1.nyc.gov/site/doh/covid/covid19datavaccines.page (23 February 2021).
 [13] D. CohenSteiner, H. Edelsbrunner, and D. Morozov, Vines and vineyards by updating persistence in linear time, in Proceedings of the Annual ACM Symposium on Computational Geometry, Association for Computing Machinery, 2006, pp. 119–126.
 [14] V. de Silva and R. Ghrist, Coverage in sensor networks via persistent homology, Algebraic & Geometric Topology, 7 (2007), pp. 339–358.
 [15] M. Feng, A. Hickok, and M. A. Porter, Topological data analysis of spatial systems, arXiv:2104.00720, (2021).
 [16] M. Feng and M. A. Porter, Spatial applications of topological data analysis: Cities, snowflakes, random structures, and spiders spinning under the influence, Physical Review Research, 2 (2020), p. 033426.
 [17] M. Feng and M. A. Porter, Persistent homology of geospatial data: A case study with voting, SIAM Review, 63 (2021), pp. 67–99.
 [18] C. Giusti, R. Ghrist, and D. S. Bassett, Two’s company, three (or more) is a simplex, Journal of Computational Neuroscience, 41 (2016), pp. 1–14.
 [19] S. Hazarie, D. Soriano-Paños, A. Arenas, J. Gómez-Gardeñes, and G. Ghoshal, Interplay between intra-urban population density and mobility in determining the spread of epidemics, arXiv:2102.00671, (2021).
 [20] X. Hou, S. Gao, Q. Li, Y. Kang, N. Chen, K. Chen, J. Rao, J. S. Ellenberg, and J. A. Patz, Intracounty modeling of COVID-19 infection with human mobility: Assessing spatial heterogeneity with business traffic, age, and race, Proceedings of the National Academy of Sciences of the United States of America, 118 (2021).
 [21] J. Kandel, Timeline: A look at key coronavirus pandemic events and milestones in California. https://www.nbclosangeles.com/news/coronavirus/20202021californiacoronaviruspandemictimelinekeyevents/2334100/ (15 June 2021).

 [22] Y. Li, D. Wang, G. A. Ascoli, P. Mitra, and Y. Wang, Metrics for comparing neuronal tree shapes based on persistent homology, PLoS ONE, 12 (2017), p. e0182184.
 [23] Los Angeles GeoHub, COVID-19 by neighborhood. https://geohub.lacity.org/datasets/covid19byneighborhood/about (3 June 2020).
 [24] S. Maletić, Y. Zhao, and M. Rajković, Persistent topological features of dynamical systems, Chaos, 26 (2016), p. 053105.
 [25] S. Martin, A. Thompson, E. A. Coutsias, and J.-P. Watson, Topology of cyclo-octane energy landscape, Journal of Chemical Physics, 132 (2010), p. 234115.
 [26] NYC By Natives, New York City Zip Codes. https://www.nycbynatives.com/nyc_info/new_york_city_zip_codes.php (30 March 2021).
 [27] NYC Open Data, Modified Zip Code Tabulation Areas (MODZCTA). https://data.cityofnewyork.us/Health/ModifiedZipCodeTabulationAreasMODZCTA/pri4ifjk/data (23 February 2021).
 [28] NYC Department of Health and Mental Hygiene, ZCTA vs MODZCTA. https://github.com/nychealth/coronavirusdata/issues/64 (28 May 2021).
 [29] S. J. Osher and R. Fedkiw, Level Set Methods and Dynamic Implicit Surfaces, vol. 153, Springer-Verlag, Heidelberg, Germany, 2003.

 [30] N. Otter, M. A. Porter, U. Tillmann, P. Grindrod, and H. A. Harrington, A roadmap for the computation of persistent homology, EPJ Data Science, 6 (2017), p. 17.
 [31]