Effects of Aggregation Methodology on Uncertain Spatiotemporal Data

by Zachary T. Hornberger, et al.

Large spatiotemporal demand datasets can prove intractable for location optimization problems, motivating the need to aggregate such data. However, demand aggregation introduces error which impacts the results of the location study. We introduce and apply a framework for comparing both deterministic and stochastic aggregation methods using distance-based and volume-based aggregation error metrics. In addition, we introduce and apply weighted versions of these metrics to account for the reality that demand events are non-homogeneous. These metrics are applied to a large, highly variable, spatiotemporal demand dataset of search and rescue events in the Pacific Ocean. Using these metrics, we compare six quadrat aggregations of varying scales and two zonal distribution models that use hierarchical clustering. We show that as quadrat fidelity increases the distance-based aggregation error decreases, while the two deliberate zonal approaches further reduce this error while utilizing fewer zones. However, the higher fidelity aggregations have a detrimental effect on volume error. In addition, by splitting the search and rescue dataset into a training and test set, we show that stochastic aggregation of this highly variable spatiotemporal demand appears to be effective at simulating actual future demands.


1 Introduction

Location modeling is a branch of operations research with vast real-world applicability and thus has been studied for a number of decades. Location modeling typically considers the location and time of demand signals over a network and optimizes the corresponding location of a servicing asset, such as a factory or vehicle. The underlying spatiotemporal demand signal data points are thus instrumental to the quality of the resulting model.

When considering large spatiotemporal datasets, there is frequently a need to aggregate demand points to make the problem more tractable for the solver, clearer for the analyst, and comprehensible for the end-user. Aggregation, while of practical use, is not a lossless compression, and introduces aggregation error into the model. When location data is aggregated, the resulting grouping’s location is traditionally represented by an aggregated data point. The distances between the actual demand points and the aggregated data points depend on the size of the aggregated region and the manner of aggregation. Similarly, the magnitude of uncertainty in the aggregated demand volumes is influenced by the nature of the aggregation. Therefore, great consideration must be given to the aggregation technique used when solving location problems.

The impact of aggregation becomes more pronounced when the geographic region expands in size and there is high variability in demand density across the region; this challenge is evident in studying the United States Coast Guard (USCG) District 14’s search and rescue (SAR) mission. The international community recognizes the need for global cooperation in responding to emerging crises around the world. Nations have entered into SAR agreements, dividing the globe into respective search and rescue regions (SSRs). Per the United States National Search and Rescue Supplement to the International Aeronautical and Maritime Search and Rescue Manual (U.S. Coast Guard, 2013), the USCG is the federal SAR coordinator for SAR missions within the United States’ maritime SSRs and the aeronautical SSRs that do not overlay the continental United States or Alaska.

USCG District 14 is headquartered in Honolulu, Hawaii and is responsible for USCG statutory missions across the Pacific region. In particular, the district’s SSR spans more than 12 million square nautical miles, though the preponderance of SAR emergencies occur in the vicinity of Guam and the Hawaiian Islands. Additionally, District 14 has among the fewest assets in the USCG fleet, increasing the necessity to optimally posture those assets across the Pacific. Given the time-sensitive nature of rescue operations, it is imperative the USCG be optimally postured to ensure rapid response. Over the past decade, researchers have partnered with Coast Guard units - USCG and international - to solve these variations of the traditional facility location problem. These studies typically use historic SAR event data as the foundation of either a deterministic or simulation-based location model.

This study quantifies the effects of the aggregation trade-off for spatiotemporal data over a large region, using District 14 SAR emergency data as a practical basis for consideration. Section 2 of this paper reviews previous works related to the aggregation of data for location models in general and coast guard SAR missions in particular. In section 3, we outline the methodology for implementing various aggregation techniques, both deterministic and stochastic, using a training data set. In section 4, we evaluate the effectiveness of these techniques by quantifying the aggregation errors between the modelled demand and actual demand over a two-year period. In section 5, we review our findings and provide recommendations for future research.

2 Related Works

Researchers have long been cognizant of a relationship between the methods used to aggregate location data and the resulting solutions generated by location models using this data. Gehlke and Biehl (1934) were among the first to note this problem, observing that the smoothing of census data inherent in aggregation resulted in a loss of valuable information and impacted the corresponding correlation coefficients of their models. Hillsman and Rhoda (1978) laid a foundation for aggregation theory when they classified three sources of error (type A, B, and C) associated with representing individual demand points using aggregated demand points for solving factory location problems. Source A refers to the difference between the distance from the aggregated demand points to the placed factory and the sum of distances from individual demand points to the factory. Source B is similar to Source A, but arises when the factory is required to be collocated with an aggregated demand point. Source C refers to the phenomenon where individual demand points are erroneously assigned to inefficient factories due to the zone in which they are aggregated.

Several research teams have subsequently sought to quantify and minimize these aggregation errors. Papadimitriou (1981) presents two heuristics for aggregating data points in a manner that reduces the worst-case aggregation error, and Zemel (1984) produced a theorem for the worst-case bounds on Papadimitriou’s honeycomb approach. Qi and Shen (2010) note the underlying assumption to Zemel’s work of uniformly distributed demand points, and propose a multi-pattern tiling approach for considering arbitrarily distributed demand. Works by Current and Schilling (1987, 1990) outline methods for eliminating Source A and B error when solving P-Median, set covering, and maximal covering location problems. Lowe (2014) present a metric for measuring the error bounds for a P-Median problem, and Tamir (2004) discuss formulations for minimizing the aggregation error using a penalty function approach.

In the fields of geography and ecology, aggregation error of spatial data points is dubbed the modifiable areal unit problem (MAUP) (Openshaw, 1984; Dark and Bram, 2007) or the zone definition problem (Curtis, 1995; Curtis and MacPherson, 1996). Research into MAUP typically decomposes the problem into two main effects: the scale effect and the zone effect. The scale effect refers to the impact on the spatial analysis results caused by the fidelity of the aggregation; for example, the impact of aggregating demand in a city using 200m x 200m grids versus 1km x 1km grids. Conversely, the zone effect refers to the impact caused by the way in which aggregation zones are bounded; for example, the impact of aggregating demand in a state using county lines versus city limits versus a grid overlay (Openshaw, 1984; Dark and Bram, 2007). Wu (1996) created a seminal contrived demonstration of these effects, which we replicate for completeness in Figure 1.

Figure 1: (a-c) show the scale effect: as the scale of aggregation increases, the mean does not change but the variance declines. (d-e) show the zone effect: keeping the scale equal but changing the method of aggregation changes the variance. (c,e,f) show that even when the number of zones is constant (4), the mean and variance can change.
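The scale effect can be illustrated with a small numerical sketch. The 4 x 4 demand surface below is hypothetical (it is not the data behind Figure 1), and the helper names `aggregate` and `flatten` are illustrative only. Averaging equal-sized blocks leaves the overall mean unchanged while the variance between zones shrinks.

```python
from statistics import mean, pvariance

def aggregate(grid, block):
    """Average non-overlapping block x block cells of a square grid."""
    n = len(grid)
    out = []
    for r in range(0, n, block):
        row = []
        for c in range(0, n, block):
            cells = [grid[i][j]
                     for i in range(r, r + block)
                     for j in range(c, c + block)]
            row.append(mean(cells))
        out.append(row)
    return out

def flatten(grid):
    """Return all cell values of a grid as one flat list."""
    return [v for row in grid for v in row]

# Hypothetical 4 x 4 demand surface (assumed values, for illustration only).
fine = [
    [2, 4, 6, 1],
    [3, 5, 2, 3],
    [1, 2, 5, 4],
    [4, 3, 3, 6],
]
medium = aggregate(fine, 2)  # four 2 x 2 zones
coarse = aggregate(fine, 4)  # a single zone

for name, grid in [("fine", fine), ("medium", medium), ("coarse", coarse)]:
    values = flatten(grid)
    print(name, round(mean(values), 3), round(pvariance(values), 3))
```

The zone effect could be explored the same way by keeping the number of zones fixed and changing which cells are grouped together.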

Previous research on MAUP has cautioned against arbitrary aggregation of spatial data and stressed its threat to the reliability of the resulting location analysis. Openshaw (1984) was foundational in the study of MAUP and called for developing better methods for aggregating spatial data due to MAUP’s impact on the reliability of geographic studies. Curtis and MacPherson (1996) studied data for New York and concluded that researchers can bias the results of their analysis based on the means of aggregation, even if there appears to be a logical basis for the employed method of aggregation. Curtis (1995) go so far as to question the accuracy of any location-based analysis conducted using aggregated data because of the effects of MAUP.

In studies of MAUP, and aggregation theory in general, trends have emerged. Increases in the number of aggregated zones typically correspond to decreases in distance-based aggregation error; distance-based aggregation error disappears when each distinct demand point is assigned to a unique zone (i.e., the number of aggregation zones equals the number of demand points). As any grouping introduces an associated level of distance-based error, it follows that reducing the amount of aggregation would subsequently reduce this error. Emir-Farinas (2004) notes the law of diminishing returns applies in this context, however, suggesting that iterative reductions in the number of aggregate groups show diminishing improvements to error reduction. Lowe (1992) discuss the paradox of aggregation, noting that solving formulations to minimize error can be more cumbersome than the original location problem being solved, which is counter-intuitive as aggregation is employed to simplify the resolution of these original location problems. Dark and Bram (2007) consider the trends corresponding to both the scale effect and the zone effect. A known benefit of aggregation is tied to the scale effect; predictions of aggregated demand levels tend to be more accurate with fewer, larger aggregate zones. This is because when there are more demand points consolidated in each zone, the demand variance between zones decreases. The impact of the zone effect is less understood and tends to differ from problem to problem.

The importance of careful aggregation has been thoroughly studied and is synthesized by Tamir (2008). In their survey of previous literature regarding aggregation error associated with location problems, Francis et al. note that there is an inherent tradeoff when aggregating data points; although aggregation has a tendency to decrease computational requirements and statistical uncertainty within the grouped data, it increases the error within the model by introducing aggregation error. Thus there does not exist a singular “best” level of aggregation and the tradeoffs inherent in aggregation must be considered.

In addition to the theoretical work on this problem, there has been applied work specifically relating to Coast Guard SAR missions. Although some research into this area was conducted in the late 1970s (Cook, 1979), the preponderance of studies relating to Coast Guard posturing has emerged in the past decade. Studies researching the allocation of SAR assets, or facilities, typically adopt a quadrat modeling technique for aggregating location data (Pelot, 2018; Eiselt, 2018; Gunal, 2017; Mehrotra, 2009). This technique consists of decomposing the region in question into square cells using a grid overlay. Notably, the quadrat method is frequently adopted in crime data analyses, which typically seek to quantify spatial trends in criminal activity across a city or state (Tita, 2000; Dando, 2005).

Cook (1979) constructed a goal-programming model for assigning SAR aircraft, incorporating probabilistic consideration for the time required by the aircraft to locate distress events in different areas of the corresponding region, using a grid overlay to create a collection of square zones. These zones were then assigned deterministic values, representing the average number of distress events per month. Similarly, Gunal (2017) utilized a quadrat model for simulating the location and volume of distress calls for the Turkish Coast Guard in the Aegean Sea. They first determined the optimal resource allocation strategy using individual events as separate demand nodes, and then evaluated the effectiveness of this strategy using simulated demand.

The incorporation of kernel density estimation with the quadrat model, popular in crime data analysis (Tita, 2000), has been previously implemented in SAR location problems. The kernel density estimation method decomposes the region into grid cells and assigns a density function to each data point. Points that are within proximity to each other, relative to a specified bandwidth, are grouped into a kernel and their density functions are combined. The resulting image is a smooth heat map with greater densities illustrated over areas that have the most activity clustered closely together (Tita, 2000; Dando, 2005). Flanigan (2008) utilized kernel density estimation when considering the problem of locating aeromedical bases across the state of New Mexico. Similarly, Eiselt (2018) implemented a kernel density estimation approach to approximate the intensity of distress calls received by the Canadian Coast Guard. They varied the size of the grid overlay based upon the proximity to shoreline. This decision was based upon the assumption that since most distress events occurred closer to shore, the analysis would benefit from greater fidelity in aggregation along the coastline.
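As a rough illustration of kernel density estimation over a grid, the following sketch places one Gaussian kernel on each event and evaluates the combined density at grid cell centres. The Gaussian kernel, the planar coordinates, the bandwidth value, and the name `gaussian_kde_2d` are all assumptions made for this example; the cited studies may use different kernels and projections.

```python
import math

def gaussian_kde_2d(points, bandwidth):
    """Return a density function built from one Gaussian kernel per point."""
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            d2 = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        return norm * total

    return density

# Hypothetical planar event coordinates: three clustered events, one isolated.
events = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (4.0, 4.0)]
density = gaussian_kde_2d(events, bandwidth=0.5)

# Evaluate the density at the centre of each cell of a 5 x 5 grid overlay.
heat_map = [[density(x + 0.5, y + 0.5) for x in range(5)] for y in range(5)]
for row in heat_map:
    print(" ".join(f"{v:.3f}" for v in row))
```

A larger bandwidth smooths the heat map and merges nearby clusters, while a smaller one isolates individual events.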

Though not specifically kernel density estimation, Mehrotra (2009) implemented an intensity function-based approach for solving the Coast Guard SAR location problem. They first constructed a non-parametric statistical simulation of distress calls within USCG District 7 (headquartered in Miami, Florida) and then utilized their simulation to model demand for a facility location problem. This simulation was constructed by overlaying the region with a relatively fine grid and estimating the intensity of distress calls for each cell.

While most work regarding SAR posturing has incorporated quadrat techniques, Achutegui (2007) introduced an intuitive method that has been applied to maritime research. Instead of defaulting to grids, Azofra’s zonal distribution model allows for flexibility in the definition of emergency zones, such as zones based upon subject matter expertise. Once the zones are determined, the centroids of distress calls, dubbed superaccidents, are computed for each zone. The zonal distribution model is a gravitational model, with the determination of the optimal SAR operational response based upon the distance to the superaccidents and their associated weight. They demonstrate the implementation of this model using a notional example involving three superaccidents and three ports.

Since the introduction of the zonal distribution model, some researchers have opted to expand upon it by applying it to real-world problems. Zhang (2015) utilize this model for locating supply bases and positioning vessels for maritime emergencies for a portion of the coastline of China along the Yellow Sea. While not adhering to the strict grid cells of previous studies, their zones remained rectangular in shape and varied in size across the region. Razi and Karatas (2016) improved upon the zonal distribution model by utilizing a k-means clustering algorithm for defining the zones and implementing a weighted approach for locating the superaccidents. By adopting this approach, Razi et al. define the aggregated zones and corresponding representative demand nodes based upon historical trends in distress calls in the Aegean Sea rather than arbitrary cells. Lunday (2019) propose an extension to the work of Razi and Karatas, which they dub the stochastic zonal distribution model. Their model implements a hierarchical k-means clustering algorithm to define the aggregation zones, fits probability distributions to model the SAR demand for each zone, and then uses empirically constructed discrete distributions to model the corresponding rescue response for each emergency.

A review of the existing literature regarding SAR asset posturing models finds a lack of explicit consideration regarding the impact of aggregation. Additionally, as SAR research expands to larger regions of consideration (e.g., oceans vs. seas or shorelines), it is necessary to more thoroughly consider the effects of various aggregation methods. Outside of SAR, and more generally emergency response asset modeling (e.g., Araz et al. (2007)), other transportation resource posturing problems which utilize massive demand datasets assume or require demand aggregation (e.g., taxi service areas; Li and Szeto (2019), Rajendran and Zack (2019)), and should also be concerned with how such aggregation affects the associated location modeling. To provide such consideration, our study utilizes historic SAR data from across the Pacific Ocean to compare the effectiveness of a zonal aggregation technique against quadrats of varying fidelity. Additionally, we evaluate these tradeoffs in the aggregation as applied to deterministic and stochastic implementations.

3 Methodology

In this section, we consider two key characteristics that define a zonal aggregation of demand signals: dividing the region into zones, and modeling the demand level. Using these two characteristics as the framework, we model and compare the following methodologies: deterministic quadrat approaches of various fidelities, the Razi and Karatas (2016) zonal distribution model, and the Lunday (2019) stochastic zonal distribution model.

These methodologies are compared using the District 14 SAR region, an interesting test case due to its large area and highly variable demand levels; Figure 2 depicts the Honolulu Maritime Search and Rescue Region (U.S. Coast Guard, 2014). Historic search and rescue demand data was obtained from the Marine Information for Safety and Law Enforcement (MISLE) database to form both a training set and a test set. The training set is comprised of SAR events from a 5 year span (January 2011 - December 2015) and is utilized to construct the models of spatiotemporal SAR demand. The accuracy of the aggregated demand methodologies is then evaluated using historic SAR data for the same region from January 2016 - December 2017.

Figure 2: Honolulu Maritime SAR Region

The training and test data is scoped to only consider events that occurred within the District 14 area of responsibility (AOR). Additionally, demand points missing GPS coordinates were removed as were data points classified as medical consultations since these consultations only require a discussion with a medical professional over the phone and resources are not dispatched. The final training set contains 2629 demand points and the test set contains 1080 demand points.

3.1 Modeling Spatiotemporal Demand

The quadrat aggregation approach was implemented with six different quadrat scales to test the impact of the scale effect. These six grid-based decompositions of the region are labelled Aggregations A - F. Aggregation A considered the region of study as a singular zone, consolidating all demand points; see Figure 3. Aggregation B divided the region into two zones along the antimeridian; see Figure 4. Aggregations C, D, and E are iterative increases in fidelity, decomposing the region into eight, fifteen, and forty-three zones, respectively; see Figures 5, 6, and 7. Aggregation F adopts the approach employed by Eiselt (2018) and allows for smaller grid cells in sections of higher demand. Specifically, the two zones from Aggregation E with the greatest proportion of Guam and Hawaiian Island workloads are further decomposed into x cells; Aggregation F results in 212 zones. Aggregation F is depicted in Figure 8.

Figure 3: Aggregation A (1 Zone)
Figure 4: Aggregation B (2 Zones)
Figure 5: Aggregation C (8 Zones)
Figure 6: Aggregation D (15 Zones)
Figure 7: Aggregation E (43 Zones)
Figure 8: Aggregation F (212 Zones)
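A quadrat aggregation of this kind can be sketched as a grid lookup followed by a weighted average per cell. The example below is a minimal sketch under stated assumptions: square cells of `cell_deg` degrees, a grid origin at (`lat_min`, `lon_min`), and longitudes measured as an offset east of `lon_min` so that a region straddling the antimeridian stays contiguous. The function names and sample events are hypothetical and do not reproduce the zone boundaries in Figures 3 - 8.

```python
import math
from collections import defaultdict

def quadrat_id(lat, lon, cell_deg, lat_min, lon_min):
    """Map a point to a (row, col) cell of a square grid overlay."""
    row = int(math.floor((lat - lat_min) / cell_deg))
    # Offset east of lon_min, wrapped to [0, 360) to cross the antimeridian.
    col = int(math.floor(((lon - lon_min) % 360.0) / cell_deg))
    return row, col

def weighted_centroids(events, cell_deg, lat_min, lon_min):
    """events: iterable of (lat, lon, weight). Returns {cell: (lat, lon)}."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    for lat, lon, w in events:
        cell = quadrat_id(lat, lon, cell_deg, lat_min, lon_min)
        acc = sums[cell]
        acc[0] += w * lat
        acc[1] += w * ((lon - lon_min) % 360.0)
        acc[2] += w
    centroids = {}
    for cell, (slat, slon, sw) in sums.items():
        # Convert the averaged offset back to a longitude in [-180, 180).
        lon = (slon / sw + lon_min + 180.0) % 360.0 - 180.0
        centroids[cell] = (slat / sw, lon)
    return centroids

# Two hypothetical events on either side of the antimeridian, weighted 1 and 3.
events = [(10.0, 179.0, 1.0), (12.0, -179.0, 3.0)]
centroids = weighted_centroids(events, cell_deg=20.0, lat_min=0.0, lon_min=130.0)
print(centroids)
```

Averaging latitudes and longitudes directly is a planar approximation; it is adequate for small cells but distorts centroids for very large zones or at high latitudes.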

Aggregation ZDM was constructed utilizing Razi and Karatas’s (2016) general implementation of the zonal distribution model and divided the AOR using a weighted k-means clustering algorithm; see Figure 9. Razi and Karatas defined the weight of each SAR event using an analytical hierarchy process based upon the level of fatality, material damage, response arduousness, and environmental impact. Their weighting scheme was not viable for this study based on the available information in MISLE, so this implementation of Razi and Karatas’s procedure utilizes total activities as a weighting. The metric of total activities represents the number of resources assigned to a rescue operation, in addition to the instances when a significant change occurred in the course of the rescue operation; this metric of total activities serves as a proxy for the complexity of a SAR event. Razi and Karatas determine the number of zones to cluster demand points into based upon a rule of thumb method proposed by Makwana (2013). This method suggests that the number of zones $Z$ is based upon the total number of events $K$, such that $Z \approx \sqrt{K/2}$. For the 2629 training events, this yields 36 zones.

Figure 9: Aggregation ZDM (36 Zones)
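The zone count and the weighted clustering step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the common square-root-of-half rule of thumb for the number of clusters, planar Euclidean distances, and a plain Lloyd-style iteration. The names `rule_of_thumb_zones` and `weighted_kmeans`, and the sample points, are hypothetical.

```python
import math
import random

def rule_of_thumb_zones(n_events):
    """Number of zones as sqrt(K / 2), rounded to the nearest integer."""
    return round(math.sqrt(n_events / 2))

def weighted_kmeans(points, weights, k, iters=50, seed=0):
    """Lloyd-style k-means whose centroid update is a weighted mean."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: (p[0] - centroids[j][0]) ** 2
                                        + (p[1] - centroids[j][1]) ** 2)
            for p in points
        ]
        # Update step: weighted mean of each cluster's members.
        new_centroids = []
        for j in range(k):
            members = [(p, w) for p, w, lab in zip(points, weights, labels)
                       if lab == j]
            if not members:  # keep the centroid of an empty cluster
                new_centroids.append(centroids[j])
                continue
            total = sum(w for _, w in members)
            new_centroids.append((
                sum(p[0] * w for p, w in members) / total,
                sum(p[1] * w for p, w in members) / total,
            ))
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

print(rule_of_thumb_zones(2629))

# Hypothetical events: two well-separated groups, one heavily weighted event.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
weights = [1.0, 3.0, 1.0, 1.0]
centroids, labels = weighted_kmeans(points, weights, k=2)
print(sorted(centroids))
```

The heavier event pulls its zone's centroid toward itself, which is the behaviour the weighted superaccident location is meant to capture.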

Aggregation SZDM was developed by implementing the stochastic zonal distribution model approach proposed by Lunday (2019); see Figure 10. Hornberger et al. utilized a hierarchical k-means clustering algorithm to aggregate demand points into zones. All demand points are sorted into mutually exclusive groups based upon the unit that coordinated the response and the types of assets utilized in the response. District 14 is divided into Sector Guam and Sector Honolulu, which split the coverage of the AOR around longitude E. Current policy dictates that the mission range for USCG boats is 50 nautical miles from the shoreline of an island on which there exists a USCG boat station; District 14 has boat stations located on the islands of Guam, O’ahu, Kaua’i, and Maui. Hornberger et al. note that a reasonable approximation of asset utilization would be a combination of boats and helicopters responding to SAR events within the 50 nautical mile boundary of these islands, while a combination of cutters and airplanes responds to SAR events beyond these boundaries. Therefore, all demand points were sorted into the following mutually exclusive groups: Guam Boat/Helicopter Events, Guam Cutter/Airplane Events, Hawaii Boat/Helicopter Events, and Hawaii Cutter/Airplane Events. These groups are further decomposed into clusters based upon the geographic proximity of the data points by employing a k-means clustering algorithm. The number of zones was determined by considering the relationship between the number of zones and the corresponding within-cluster variance. A plot of this relationship forms an elbow curve, whose name is tied to the phenomenon that initial groupings account for a greater reduction in variance compared to subsequent groupings; the ‘elbow’ of the curve occurs at the suggested number of zones for the data set.

Figure 10: Aggregation SZDM (15 Zones)
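Locating the elbow is often done by eye, but it can also be approximated numerically. The sketch below picks the cluster count where the curve bends most sharply, measured by the largest second difference of the within-cluster variance. This criterion is an assumed heuristic chosen for illustration, not the rule stated by the authors, and the variance values are hypothetical.

```python
def elbow_index(wcss):
    """Return the cluster count at the sharpest bend of an elbow curve.

    wcss[i] is the within-cluster variance obtained with i + 1 clusters.
    The bend is taken where the second difference is largest (an assumed
    heuristic; other elbow criteria exist).
    """
    if len(wcss) < 3:
        raise ValueError("need at least three values to locate an elbow")
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(wcss) - 1):
        bend = wcss[i - 1] - 2 * wcss[i] + wcss[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k

# Hypothetical within-cluster variances for k = 1..6 clusters.
wcss = [1000.0, 700.0, 400.0, 380.0, 365.0, 355.0]
print(elbow_index(wcss))
```

In a hierarchical setting such as the one described above, the same check would be run separately within each of the four event groups.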

3.2 Methods of Comparative Analysis

This study evaluates the effectiveness of various methods of aggregation when conducting spatiotemporal forecasting. Specifically, we seek to assess the merit of the Razi and Karatas (2016) deterministic zonal distribution model and the Lunday (2019) stochastic zonal distribution model, comparing their effectiveness against traditional quadrat methods of varying fidelities. To conduct these comparisons, two metrics are considered: distance-based aggregation error and volume-based aggregation error.

The distance-based aggregation error ($\epsilon_d$) represents the total distance between where events were modelled as occurring ($\hat{p}_z$) and the actual location of their occurrence ($p_i$), for each event ($i$) in the zone ($z$). The anticipated event locations for all zones are the weighted centroids of each zone. In the quadrat models, the centroids are computed as an average of the latitudes/longitudes, weighted by the events’ corresponding total activities, for all events in the zone. In the zonal and stochastic zonal distribution models, the clustering algorithm yields a weighted centroid. The distance-based aggregation error metric is:

$$\epsilon_d = \sum_{z \in Z} \sum_{i \in I_z} d(\hat{p}_z, p_i),$$

where $I_z$ is the set of events in zone $z$ and each individual distance $d$ is calculated using the Haversine formula,

$$d = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right),$$

which, given latitudes $\varphi_1, \varphi_2$ and longitudes $\lambda_1, \lambda_2$, calculates the great-circle distance between two points on a sphere of radius $r$.

The weighted distance-based aggregation error ($\epsilon_d^w$) is the sum of the differences in distance between where individual assets are modelled as being deployed to ($\hat{p}_z$) and the actual location assets are dispatched to. The weighting ($w_i$) is the number of assets assigned to the rescue operation. The difference between $\epsilon_d$ and $\epsilon_d^w$ is that the former treats individual SAR events as being equal in magnitude, whereas the latter incorporates the number of deployed assets. As with $\epsilon_d$, the individual distances in $\epsilon_d^w$ are calculated using the Haversine formula.

$$\epsilon_d^w = \sum_{z \in Z} \sum_{i \in I_z} w_i \, d(\hat{p}_z, p_i)$$

The distance-based aggregation error ($\epsilon_d$) and the weighted distance-based aggregation error ($\epsilon_d^w$) are both computed for all aggregations A-F, as well as for the ZDM and the SZDM.
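The two distance metrics can be sketched in a few lines. The example below is illustrative only: it assumes a mean Earth radius of roughly 3440.065 nautical miles, and the function names, event tuple layout, and sample events are hypothetical rather than drawn from the MISLE data.

```python
import math

EARTH_RADIUS_NMI = 3440.065  # assumed mean Earth radius in nautical miles

def haversine_nmi(lat1, lon1, lat2, lon2):
    """Great-circle distance in nautical miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to guard against rounding slightly above 1 for antipodal points.
    return 2 * EARTH_RADIUS_NMI * math.asin(min(1.0, math.sqrt(a)))

def distance_errors(events, centroids):
    """Return (unweighted, weighted) total distance error.

    events: iterable of (zone, lat, lon, n_assets)
    centroids: {zone: (lat, lon)} modelled location for each zone
    """
    unweighted = 0.0
    weighted = 0.0
    for zone, lat, lon, n_assets in events:
        clat, clon = centroids[zone]
        d = haversine_nmi(clat, clon, lat, lon)
        unweighted += d
        weighted += n_assets * d
    return unweighted, weighted

# Hypothetical zone with two events one degree either side of its centroid.
centroids = {"Z1": (0.0, 0.0)}
events = [("Z1", 0.0, 1.0, 1), ("Z1", 0.0, -1.0, 3)]
unweighted, weighted = distance_errors(events, centroids)
print(round(unweighted, 2), round(weighted, 2))
```

Because the Haversine formula depends only on the sine of half the longitude difference, it handles pairs of points on opposite sides of the antimeridian without special treatment.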

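The volume-based aggregation error ($\epsilon_v$) represents the total difference between the predicted level of monthly demand ($\hat{D}_{z,t}$) and the actual level of monthly demand ($D_{z,t}$), for each month in the considered time frame ($t \in T$). The metric is computed as:

$$\epsilon_v = \sum_{z \in Z} \sum_{t \in T} \left| \hat{D}_{z,t} - D_{z,t} \right|$$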
Given that a primary difference between ZDM and SZDM is the integration of stochastic elements in the modeling of the demand, both deterministic and stochastic demand comparisons for volume-based aggregation error are conducted. For purposes of consistency, all frequency considerations are made on a per month basis.

Aggregations A-F are compared to the ZDM using a deterministic demand signal. This requires a singular, static value which represents the typical demand volume for each zone. Two methods are frequently used to identify these deterministic values: averages and medians. The average value is a common metric and is familiar to an end-user decision maker, but can be easily skewed by the presence of outliers. Median values tend to be more stable in the presence of outliers and thus more representative of the typical demand volume. As such, median values are implemented as the metric for deterministic demand volume in this study.
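A deterministic volume comparison along these lines can be sketched as follows. The sketch assumes the error is the sum of absolute monthly differences over every zone, which is consistent with the totals growing as zones are added but is an interpretation rather than a quoted definition. The function names and monthly counts are hypothetical.

```python
from statistics import mean, median

def median_forecast(training_counts):
    """training_counts: {zone: [monthly counts]} -> {zone: median count}."""
    return {zone: median(counts) for zone, counts in training_counts.items()}

def volume_error(forecast, test_counts):
    """Sum of |predicted - actual| over every zone and every test month."""
    return sum(
        abs(forecast[zone] - actual)
        for zone, counts in test_counts.items()
        for actual in counts
    )

# Hypothetical monthly event counts; zone "A" has one outlier month (20).
training = {"A": [3, 4, 20, 5, 4], "B": [0, 1, 0, 2, 1]}
test = {"A": [5, 3, 4], "B": [1, 0, 3]}

forecast = median_forecast(training)
print(forecast, mean(training["A"]))
print(volume_error(forecast, test))
```

In this toy data the outlier month pulls the mean of zone "A" well above its typical level, while the median stays at the typical value, which is the motivation given above for preferring medians.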

The stochastic modelling approach utilized in SZDM considers the inherent uncertainty present in SAR events by fitting probability distributions to demand volumes in each zone. As noted by Mehrotra (2009) and Eiselt (2018), SAR events can often be viewed as Poisson processes. In particular, Lunday (2019) found the emergence of SAR events in District 14’s AOR could be modelled using Poisson and gamma-Poisson distributions. This study implements stochastic demand modeling in SZDM, and compares this to aggregations C and D to compare the impact of aggregation method on the simulation of future SAR demand. (Aggregations A and B were deemed too trivial to be of real interest, and stochastic models of Aggregations E and F proved intractable on the authors’ hardware.)
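The difference between the two distribution families can be seen in a short simulation. The sketch below draws monthly counts from a Poisson distribution and from a gamma-Poisson mixture with the same mean; the parameter values, sample size, and function names are assumptions for illustration, not fitted values from the study.

```python
import math
import random
from statistics import mean, pvariance

def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for modest rates."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def gamma_poisson_draw(rng, shape, scale):
    """Draw a rate from Gamma(shape, scale), then a Poisson count from it."""
    return poisson_draw(rng, rng.gammavariate(shape, scale))

rng = random.Random(42)
n = 5000

# Both samples have an expected monthly count of 4 (assumed value).
poisson_sample = [poisson_draw(rng, 4.0) for _ in range(n)]
mixed_sample = [gamma_poisson_draw(rng, shape=2.0, scale=2.0) for _ in range(n)]

print(round(mean(poisson_sample), 2), round(pvariance(poisson_sample), 2))
print(round(mean(mixed_sample), 2), round(pvariance(mixed_sample), 2))
```

The gamma-Poisson sample has a noticeably larger variance than the Poisson sample at the same mean, which makes it a natural candidate for zones whose monthly counts are overdispersed.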

A modification of the volume-based aggregation error that distinguishes between over- and under-forecasting events is also considered. Stochastic models are compared graphically, plotting the simulated output for each month of the 24-month test period against the actual demand volume observed.
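One way to separate over- from under-forecasting is to keep the sign of each monthly difference instead of taking its absolute value. The sketch below assumes the convention that positive values mean over-forecasting (predicted minus actual); the convention, function names, and monthly totals are hypothetical.

```python
def signed_volume_error(predicted, actual):
    """Per-month predicted minus actual; positive means over-forecast."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must cover the same months")
    return [p - a for p, a in zip(predicted, actual)]

def split_over_under(signed_errors):
    """Return (total over-forecast, total under-forecast) as magnitudes."""
    over = sum(e for e in signed_errors if e > 0)
    under = sum(-e for e in signed_errors if e < 0)
    return over, under

# Hypothetical monthly totals across all zones for a four-month window.
predicted = [40, 40, 40, 40]
actual = [35, 44, 40, 52]

signed = signed_volume_error(predicted, actual)
print(signed)
print(split_over_under(signed))
```

Summing the two magnitudes recovers the unsigned total for the same months, so the signed view adds information without changing the underlying metric.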

4 Analysis

4.1 Distance-Based Aggregation Error

The distances, in nautical miles, between the aggregated demand point and the subsequent demand nodes during 2016 - 2017 are shown in Table 1. The resulting distance-based aggregation error for the quadrat models reflects the law of diminishing returns, as described by Emir-Farinas (2004). The first division of the region of study, from Aggregation A to Aggregation B, results in an 82.9% reduction to the locational aggregation error. This error was continuously diminished with additional divisions. These results support the trend of location error generally reducing with additional zones.

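| Aggregation | Number of Zones | Distance-Based Error (nmi) | Weighted Distance-Based Error (nmi) |
|---|---|---|---|
| A | 1 | 1,471,479 | 2,195,276 |
| B | 2 | 251,042.3 | 312,118.6 |
| C | 8 | 171,531.3 | 225,615 |
| D | 15 | 158,119.1 | 208,812.7 |
| E | 43 | 86,745.88 | 119,741.5 |
| F | 212 | 51,553.33 | 66,668.67 |
| ZDM | 36 | 80,165.06 | 92,669.37 |
| SZDM | 15 | 92,067.72 | 97,425.77 |

Table 1: Distance-Based Aggregation Error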

Aggregations ZDM and SZDM perform very well compared to the quadrat models. The zonal distribution model has a lower associated location error than Aggregation E, despite only having 36 zones compared to Aggregation E’s 43 zones. This runs counter to the general claim that more zones always improve the accuracy of the location model, suggesting instead that deliberate steps can be implemented to aggregate spatial demand points in fewer clusters while still achieving competitively low levels of location error. The stochastic zonal distribution model’s results support this observation, achieving a 41.7% reduction in distance-based aggregation error compared to Aggregation D despite using the same number of zones.

Similar trends are observed when the attention is shifted from the error in SAR event distances to the error in resource dispatch distances. There is a steady improvement in accuracy as the number of zones is increased, with the exception of Aggregations ZDM and SZDM. Additionally, the differences between the unweighted and weighted distance-based aggregation errors are notably larger for the quadrat models compared to Aggregations ZDM and SZDM; the stochastic zonal distribution model had the smallest increase in location error when weighting by the number of resources dispatched. These observations suggest that deliberate zoning of demand points can enhance the robustness of aggregate zones to weighted events, particularly when the zones are developed with consideration to both geographic proximity and the operational characteristics that are tied to the event weights.
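The comparisons in this subsection can be recomputed directly from the Table 1 values. The short sketch below copies those values, assuming the first error column is the unweighted metric and the second is the weighted metric, and derives the percentage reductions and the ratio of weighted to unweighted error for each aggregation. The helper name `pct_reduction` is illustrative.

```python
# (zones, unweighted error, weighted error) from Table 1, in nautical miles.
TABLE_1 = {
    "A": (1, 1471479.0, 2195276.0),
    "B": (2, 251042.3, 312118.6),
    "C": (8, 171531.3, 225615.0),
    "D": (15, 158119.1, 208812.7),
    "E": (43, 86745.88, 119741.5),
    "F": (212, 51553.33, 66668.67),
    "ZDM": (36, 80165.06, 92669.37),
    "SZDM": (15, 92067.72, 97425.77),
}

def pct_reduction(before, after):
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (before - after) / before

a_to_b = pct_reduction(TABLE_1["A"][1], TABLE_1["B"][1])
d_to_szdm = pct_reduction(TABLE_1["D"][1], TABLE_1["SZDM"][1])
print(f"A -> B reduction: {a_to_b:.1f}%")
print(f"D -> SZDM reduction: {d_to_szdm:.1f}%")

# Ratio of weighted to unweighted error: closer to 1 means less sensitivity
# to the number of assets dispatched.
ratios = {name: w / u for name, (_, u, w) in TABLE_1.items()}
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.3f}")
```

Sorting by the ratio places the two deliberately zoned aggregations ahead of every quadrat model, matching the robustness observation above.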

4.2 Deterministic Volume-Based Aggregation Error

The total error in volume, based upon the median monthly demand for each zone compared to the actual demand volumes, is depicted in Table 2. The phenomenon described by Tamir (2008) and Dark and Bram (2007) is observed; there is a general increase in total volume-based aggregation error as the number of zones increases.

Aggregation Number of Zones Volume-Based Aggregation Error
A 1 139
B 2 189
C 8 288
D 15 306
E 44 372
F 212 584
ZDM 36 458
Table 2: Volume-Based Aggregation Error for Deterministic Demand Modeling
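The tendency for volume error to grow with the number of zones can be illustrated with a small sketch. It assumes the deterministic forecast for each zone is the median of its training-period monthly counts and that the total error is the sum of absolute deviations over zones and test months; the function names and the toy counts are hypothetical.

```python
from statistics import median

def volume_error(train, test):
    """Total absolute error of a per-zone median forecast.

    train and test map each zone name to a list of monthly event counts.
    """
    total = 0.0
    for zone, history in train.items():
        forecast = median(history)
        total += sum(abs(actual - forecast) for actual in test[zone])
    return total

def merge_zones(counts):
    """Collapse all zones into one region by summing counts month by month."""
    return {"all": [sum(month) for month in zip(*counts.values())]}

# Hypothetical monthly counts for two zones.
train = {"north": [4, 6, 5, 7], "south": [10, 8, 9, 11]}
test = {"north": [8, 3], "south": [6, 12]}

fine = volume_error(train, test)                              # two zones
coarse = volume_error(merge_zones(train), merge_zones(test))  # one zone
print(fine, coarse)
```

In this toy case the zone-level errors partly offset one another once the zones are merged, so the single-zone aggregation reports a much smaller total error than the two-zone aggregation.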

Interestingly, implementing the zonal distribution model corresponds to a large volume-based aggregation error, second only to Aggregation F; see Figure 11. This suggests that deliberate clustering based on geographic proximity does not correspond to improvements in deterministic demand volume modeling.

Figure 11: Comparison of the Total Volume-Based Aggregation Error for Deterministic Demand Modeling

Additional analysis compared the tendency for different aggregation models to overpredict versus underpredict demand volume. A plot of this analysis is shown in Figure 12, color-coding the region of overprediction as red and underprediction as blue. For each month, Aggregations A and B perform equally well; the lines overlap in the plot. With the exception of Aggregation F, all methods adhere to similar trends in spikes and drops throughout the test timeframe. The general trend is for models to underpredict more consistently as they incorporate more aggregated zones. The exception to this trend is the zonal distribution model, which continues to have greater volume-based aggregation error compared to Aggregation E.

Figure 12: Comparison of Over- and Under-predictions for Deterministic Demand Modeling
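A signed version of the volume error separates overprediction from underprediction. The sketch below assumes the regional prediction is the sum of per-zone median forecasts and that the signed error is the prediction minus the observed regional total. The toy data are hypothetical and show one possible mechanism for the underprediction trend, which the analysis above does not itself assert: sparse zones often have a median of zero, so summing zone medians can fall below the regional median.

```python
from statistics import median

def signed_monthly_error(train, test):
    """Predicted minus observed regional volume for each test month.

    Positive values indicate overprediction, negative values underprediction.
    """
    predicted = sum(median(history) for history in train.values())
    observed = [sum(month) for month in zip(*test.values())]
    return [predicted - o for o in observed]

# Hypothetical sparse zones: most months record no events in any given zone.
fine_train = {"a": [0, 0, 3, 0], "b": [0, 2, 0, 0], "c": [1, 0, 0, 0]}
fine_test = {"a": [0, 2], "b": [1, 0], "c": [0, 0]}

# The same counts aggregated into a single zone.
coarse_train = {"all": [1, 2, 3, 0]}
coarse_test = {"all": [1, 2]}

fine_errors = signed_monthly_error(fine_train, fine_test)
coarse_errors = signed_monthly_error(coarse_train, coarse_test)
print(fine_errors, coarse_errors)
```

Here the three-zone model underpredicts in both test months, while the single-zone model overpredicts in one month and underpredicts in the other.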

4.3 Stochastic Volume-Based Aggregation Error

A comparison of stochastic demand models was conducted using probability distributions fit to each zone in Aggregations C, D, and SZDM. The results from these simulations are compared to the actual observed demand levels for the two-year test period; see Figure 13. Note that since the demand distributions were observed to be relatively stationary at large, each month’s simulated volume from each model is determined by random draws from static probability distributions assigned to each zone (i.e., Poisson and gamma-Poisson distributions).

Figure 13: Comparison of Stochastic Demand Models and Observed Demand Levels
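The simulation procedure described above can be sketched as follows, assuming each zone carries either a Poisson rate or gamma-Poisson parameters and that a month's regional volume is the sum of independent zone draws. The zone parameters below are hypothetical rather than fitted values, and the Poisson sampler uses Knuth's multiplication method only to keep the example within the Python standard library.

```python
import math
import random

def poisson_draw(rng, lam):
    """Poisson sample via Knuth's multiplication method (suited to small rates)."""
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def gamma_poisson_draw(rng, shape, scale):
    """Poisson sample whose rate is itself drawn from a gamma distribution."""
    return poisson_draw(rng, rng.gammavariate(shape, scale))

def simulate_month(rng, zones):
    """Regional volume for one month: the sum of independent zone draws."""
    total = 0
    for zone in zones:
        if zone["dist"] == "poisson":
            total += poisson_draw(rng, zone["rate"])
        else:
            total += gamma_poisson_draw(rng, zone["shape"], zone["scale"])
    return total

# Hypothetical zone parameters (not the fitted values from the study).
zones = [
    {"dist": "poisson", "rate": 2.5},
    {"dist": "poisson", "rate": 4.0},
    {"dist": "gamma-poisson", "shape": 2.0, "scale": 1.5},
]

rng = random.Random(42)
simulated = [simulate_month(rng, zones) for _ in range(24)]  # two-year test period
print(simulated)
```

Because the zone distributions are static, each of the 24 simulated months is an independent draw; no seasonal or trend component is modeled in this sketch.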

Since the results from Figure 13 are randomly generated, the emphasis is less on the specific results from month to month and more on whether the overall trend appears similar to the observed trend. This analysis shows similar trends for the three stochastic demand models, suggesting that they all could be used to effectively simulate the stochastic demand of the AOR. Aggregation C does produce a notable spike in simulated SAR activity at the end of the test period, caused by the coincidence of multiple zones within the model simulating larger-than-normal demand volume. This phenomenon was investigated further.

While the observed demand volume fluctuates from month to month, it stays within the bounds of 30 and 60 events per month. Using these levels as thresholds, a Monte Carlo simulation of 10,000 2-year models was constructed. For each of the 240,000 simulated months, Table 3 shows the number that were beyond the thresholds of 30 and 60 events per month. All models appear relatively stable compared to these bounds; Aggregation C, with the greatest number of ‘extreme months’, had only approximately 4.7% of the 240,000 months classified as ‘extreme’. The stochastic zonal distribution model appeared to be the most stable of the three considered models, having the fewest months classified as ‘extreme’ on either side of the bounds. These findings suggest that while extreme months are not likely to be a significant occurrence in a simulation of SAR demand, the stochastic zonal distribution model minimizes the likelihood that they will occur.

Aggregation Below 30 Events Above 60 Events
C 6175 5056
D 5402 4660
SZDM 4727 3854
Table 3: Comparison of Extreme Months over 10,000 2-Year Simulations
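The counting procedure behind Table 3 can be sketched as below. The function takes any callable that returns one simulated month of regional demand and tallies how many months fall below the lower threshold or above the upper one. The 15-zone Poisson region and the reduced run count are hypothetical, chosen only to keep the example fast; the final line recomputes the share of extreme months for Aggregation C directly from the Table 3 counts.

```python
import math
import random

def poisson_draw(rng, lam):
    """Poisson sample via Knuth's multiplication method (suited to small rates)."""
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def count_extreme_months(draw_month, n_runs, months_per_run,
                         low=30, high=60, seed=0):
    """Count simulated months with volume below `low` or above `high`."""
    rng = random.Random(seed)
    below = above = 0
    for _ in range(n_runs * months_per_run):
        volume = draw_month(rng)
        below += volume < low
        above += volume > high
    return below, above

# Hypothetical region: 15 zones, each Poisson with rate 3 (mean of 45 per month).
rates = [3.0] * 15
draw = lambda rng: sum(poisson_draw(rng, r) for r in rates)

# 100 two-year runs here; the study used 10,000 runs (240,000 months).
below, above = count_extreme_months(draw, n_runs=100, months_per_run=24)
share = (below + above) / (100 * 24)
print(below, above, f"{share:.2%}")

# Share of extreme months for Aggregation C, from the Table 3 counts.
share_c = (6175 + 5056) / (10_000 * 24)
print(f"{share_c:.2%}")
```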

5 Conclusion

The method used to aggregate spatiotemporal demands affects the outcome of location models built using the aggregated data; thus, an understanding of the impacts of aggregation methods is fundamental. We have presented a framework for comparing both deterministic and stochastic spatiotemporal aggregation models, utilizing a distance-based aggregation error metric, an event-magnitude-weighted distance-based aggregation error metric, and a volume-based aggregation error metric. We further applied this framework to test six quadrat aggregation models of varying fidelities and two zonal models, using historical search and rescue data from a massive-scale region possessing highly variable demands. As expected, aggregations with greater fidelity tend to reduce the distance-based aggregation error. In addition, implementation of a deliberate zoning approach (e.g., ZDM and SZDM) further reduces this error while utilizing fewer zones. However, higher-fidelity aggregations with an increased number of zones have a detrimental effect on the modeling of demand volumes. Finally, stochastic representations of SAR demand appear to be effective at simulating actual SAR demand.

Based on the results of our aggregation analysis, we propose the following as potential exploratory efforts. Zonal techniques based on hierarchies and clustering seem very promising; additional research on the impacts of clustering techniques could be fruitful. Additionally, combining these zonal techniques, with their associated reduced location errors, with a lower-fidelity aggregation model to project region-level demands may be useful. Finally, a study examining possible nonlinear dynamic effects of changes in aggregation method on the output of location models may be informative.


References

  • J.J. Achutegui (2007) Optimum placement of sea rescue resources. Safety Science 45, pp. 941–951.
  • C. Araz, H. Selim, and I. Ozkarahan (2007) A fuzzy multi-objective covering-based vehicle location model for emergency services. Computers & Operations Research 34 (3), pp. 705–726.
  • W.D. Cook (1979) Goal programming models for assigning search and rescue aircraft to bases. The Journal of the Operational Research Society 30, pp. 555–561.
  • J. Current and D. Schilling (1987) Elimination of source A and B errors in p-median location problems. Geographical Analysis 19.
  • J. Current and D. Schilling (1990) Analysis of errors due to demand data aggregation in the set covering and maximal covering location problems. Geographical Analysis 22, pp. 116–126.
  • A. Curtis and A. MacPherson (1996) The zone definition problem in survey research: an empirical example from New York State. Professional Geographer 48, pp. 310–323.
  • A. Curtis (1995) The zone definition problem in location-allocation modeling. Geographical Analysis 27, pp. 60–77.
  • J. Dando (2005) Chapter 2. Methods and techniques for understanding crime hot spots. Mapping Crime: Understanding Hot Spots, pp. 15–34.
  • S. Dark and D. Bram (2007) The modifiable areal unit problem (MAUP) in physical geography. Progress in Physical Geography 31, pp. 471–479.
  • H.A. Eiselt (2018) A modular capacitated multi-objective model for locating maritime search and rescue vessels. Annals of Operations Research 267, pp. 3–28.
  • H. Emir-Farinas (2004) Aggregation decomposition and aggregation guidelines for a class of minimax and covering location models. Geographical Analysis 36 (4), pp. 332–349.
  • M. Flanigan (2008) Optimization of aeromedical base locations in New Mexico using a model that considers crash nodes and paths. Accident Analysis and Prevention 40, pp. 1105–1114.
  • C. Gehlke and K. Biehl (1934) Certain effects of grouping upon the size and correlation coefficient in census tract material. Journal of the American Statistical Association 29.
  • M.M. Gunal (2017) An ILP and simulation model to optimize search and rescue helicopter operations. Journal of the Operational Research Society 68, pp. 1335–1351.
  • E. Hillsman and R. Rhoda (1978) Errors in measuring distances from populations to service centers. Annals of Regional Science 1, pp. 74–88.
  • B. Li and W. Szeto (2019) Taxi service area design: formulation and analysis. Transportation Research Part E: Logistics and Transportation Review 125, pp. 308–333.
  • T. Lowe (2014) Comparative error bound theory for three location models: continuous demand versus discrete demand. TOP 1, pp. 144–169.
  • T.J. Lowe (1992) On worst-case aggregation analysis for network location problems. Annals of Operations Research 40, pp. 229–246.
  • B. Lunday (2019) Optimal heterogeneous asset location modeling for expected spatiotemporal search and rescue demands using historic event data. Manuscript submitted for publication.
  • P. Makwana (2013) Review on determining number of cluster in k-means clustering. International Journal of Advance Research in Computer Science and Management Studies 1 (6), pp. 90–95.
  • A. Mehrotra (2009) US Coast Guard air station location with respect to distress calls: a spatial statistics and optimization based methodology. European Journal of Operational Research 196, pp. 1086–1096.
  • S. Openshaw (1984) The modifiable areal unit problem. Geo Books, Norwich.
  • C. Papadimitriou (1981) Worst case and probabilistic analysis of a geometric location problem. SIAM Journal on Computing 1.
  • R. Pelot (2018) A maritime search and rescue location analysis considering multiple criteria, with simulated demand. INFOR: Information Systems and Operations Research 56 (1), pp. 92–114.
  • L. Qi and Z.J.M. Shen (2010) Worst-case analysis of demand point aggregation for the Euclidean p-median problem. European Journal of Operational Research 202, pp. 434–443.
  • S. Rajendran and J. Zack (2019) Insights on strategic air taxi network infrastructure locations using an iterative constrained clustering approach. Transportation Research Part E: Logistics and Transportation Review 128, pp. 470–505.
  • N. Razi and M. Karatas (2016) A multi-objective model for locating search and rescue boats. European Journal of Operational Research 254, pp. 279–293.
  • A. Tamir (2004) Demand point aggregation analysis for a class of constrained location models: a penalty function approach. IIE Transactions 36, pp. 601–609.
  • A. Tamir (2008) Aggregation error for location models: survey and analysis. Annals of Operations Research 167, pp. 171–208.
  • G. Tita (2000) Spatial analysis of crime. Measurement and Analysis of Crime and Justice 4, pp. 213–262.
  • U.S. Coast Guard (2013) The U.S. Coast Guard addendum to the United States National Search and Rescue Supplement to the International Aeronautical and Maritime Search and Rescue Manual. COMDTINST M16130.2F.
  • U.S. Coast Guard (2014) Fourteenth Coast Guard District search and rescue plan. CGD14INST M16130.1A.
  • J. Wu (1996) The modifiable areal unit problem and implications for landscape ecology. Landscape Ecology 11, pp. 129–140.
  • E. Zemel (1984) Probabilistic analysis of geometric location problems. Annals of Operations Research 1, pp. 215–238.
  • L. Zhang (2015) The optimization model for the location of maritime emergency supplies reserve bases and the configuration of salvage vessels. Transportation Research Part E: Logistics and Transportation Review 83, pp. 170–188.