1 Introduction
Many cities around the world are now collecting large amounts of spatial data from a wide range of sources. Governments and other organizations are releasing data on items such as poverty rate, air pollution, traffic flow, energy consumption and crime [Shadbolt et al.2012, Goldstein and Dyson2013, Barlacchi et al.2015]. Analyzing such spatial data is of critical importance in improving citizens' quality of life in many fields such as socioeconomics [Rupasinghaa and Goetz2007, Smith, Mashhadi, and Capra2014], public health [Jerrett et al.2013], public security [Bogomolov et al.2014, Wang et al.2016] and urban planning [Yuan, Zheng, and Xie2012]. For example, knowing the spatial distribution of poverty enables us to optimize the allocation of resources for remedial action. Likewise, the spatial distribution of air pollution is useful in creating policies that can control air quality and thus protect human health.
Naturally, information at fine spatial granularity is preferred because it allows us to identify key regions that require intervention to improve city environments efficiently. As an example, Figures 1(a) and 1(b) visualize the distributions of poverty rates in New York City by community district and by borough, respectively; darker hues represent poorer regions. Clearly, Figure 1(a) is more useful than Figure 1(b) for understanding socioeconomic problems. In practice, however, such information is often aggregated into coarse granularities as in Figure 1(b). Conducting a census over the whole population of a city is usually considered too time-consuming and costly, so a sample survey is conducted instead. Accordingly, the number of samples associated with each fine-grained region may not be large enough to provide a statistically significant estimate of the value associated with this region; the typical response is to aggregate samples over larger regions [Smith, Mashhadi, and Capra2014].
With the recent increase in data availability, utilizing auxiliary spatial data sets on the same region is an effective way of refining coarse-grained target data [Bogomolov et al.2014, Park2013, Smith and Capra2016, Smith, Mashhadi, and Capra2014, Wotling et al.2000]. In these works, regression models are used for estimating the relationships between target data (e.g., poverty rate) and auxiliary data sets (e.g., unemployment rate). These existing methods, however, require that the spatial granularities of all the auxiliary data sets be the same as the desired granularity of the target data. This requirement prevents us from making full use of auxiliary data sets with various granularities, because in practice the auxiliary data sets are associated with various geographical partitions. For example, New York City has released various spatial data sets partitioned into boroughs, community districts, zip codes, police precincts and so on.
We propose a probabilistic model for refining coarse-grained target data through the effective utilization of auxiliary data sets with various granularities. An important characteristic of the model is that it discerns the usefulness of each auxiliary data set, which depends not only on the strength of its relationship with the target data but also on its level of spatial granularity. For example, consider the case of two auxiliary data sets that have the same strength of relationship with the target data, but different spatial granularities. In that case, the finer-grained one is seen as more helpful for refining the coarse-grained target data.
With the proposed model, the fine-grained target data are assumed to follow a Gaussian process (GP) [Rasmussen and Williams2006] whose mean function is modeled by a linear regression of the auxiliary data sets. This GP-based modeling allows us to consider the spatial correlation in the target data and the auxiliary data sets simultaneously. Since the target data are observed not at fine granularity but at coarse granularity, we model a spatial aggregation process that transforms the fine-grained target data into the coarse-grained target data. Furthermore, to handle auxiliary data sets with various granularities, we apply GP regression to each auxiliary data set to derive a predictive distribution defined on the continuous space; this conceptually corresponds to spatial interpolation. A key idea is that the model hierarchically incorporates the predictive distributions themselves rather than point estimates. This enables us to consider uncertainty in the prediction of auxiliary data sets. The uncertainty is governed by several factors, one of which is sample density, i.e., the spatial granularity of the auxiliary data; the finer the granularity, the lower the uncertainty. Incorporating the uncertainty lets the model learn the usefulness of the auxiliary data while accounting for their levels of spatial granularity, which in turn allows it to accurately refine the coarse-grained target data.
We predict the fine-grained target data via a Bayesian inference procedure. The proposed model is designed so that the model parameters can be estimated from the exact marginal likelihood: by analytically integrating out the variables of the fine-grained target and auxiliary data, we can estimate the parameters without explicitly obtaining these variables. We then construct the predictive distribution of the fine-grained target data by using the estimated parameters.
2 Related Work
The problem of refining coarse-grained spatial data has been studied in various fields such as socioeconomics [Smith and Capra2016, Smith, Mashhadi, and Capra2014], agricultural economics [Howitt and Reynaud2003, Xavier et al.2016], epidemiology [Sturrock et al.2014], meteorology [Wilby et al.2004, Zorita and von Storch1999] and geographical information systems (GIS) [Boucher and Kyriakidis2006, Goovaerts2010]. This problem is also called statistical downscaling, spatial disaggregation, or areal interpolation. Previous works can be categorized into two cases in terms of target data availability.
In the first case, in which a large amount of both coarse- and fine-grained target data is available, we can predict the fine-grained target data by using a mapping function from coarse- to fine-grained data. The mapping function can be learnt by using various machine learning methods including linear regression models [Hessami et al.2008] [Cannon2011, Misra, Sarkar, and Mitra2017] [Ghosh2010]. Recently, super-resolution techniques based on deep neural networks have been applied to refining coarse-grained spatial data [Vandal et al.2017, Vandal et al.2018]. The super-resolution techniques aim to learn a mapping function from low- to high-resolution images [Dong et al.2014]. The method by [Vandal et al.2017] is based on the analogy between gridded spatial data and images; values at grid cells are regarded as values at pixels. The large amount of fine-grained data needed for training is, however, not available in many cases (e.g., poverty surveys); often only coarse-grained data are available, and these methods are not applicable in such situations.
In the second case, in which only coarse-grained target data are available, many regression-based methods have been proposed that use auxiliary spatial data sets to refine coarse-grained target data [Flaxman, Wang, and Smola2015, Smith, Mashhadi, and Capra2014, Wang et al.2016, Zheng, Liu, and Hsieh2013, Zheng et al.2015]. Regression models (linear and nonlinear) are used for estimating the relationships between target data and auxiliary data sets. A few methods can construct the regression models under spatial aggregation constraints [Murakami and Tsutsumi2011, Park2013]. The constraints state that a value associated with a coarse-grained region is a linear average of its constituent values in a fine-grained partition. In order to satisfy the spatial aggregation constraints, the regression residuals at the coarse-grained regions are allocated to the fine-grained regions by using a spatial interpolation method, i.e., kriging [Stein1999]. These methods, however, assume that the auxiliary data sets have spatial granularities equivalent to that of the fine-grained target data to be estimated. This assumption makes it difficult to utilize multiple auxiliary data sets with various granularities.
Several regression methods have been developed for estimating relationships between multiscale spatial data sets [Miller et al.2015, Diodato et al.2010, Xu2017, Xu et al.2018]. These methods predict the target data with the same granularity as that of the training data by utilizing multiscale auxiliary data sets. They do not, however, consider the spatial aggregation constraint, which is a critical factor in predicting fine-scale target data from coarse-scale target data.
There have been several hierarchical Bayesian models that predict fine-grained target data using fine-grained auxiliary data sets [Taylor, AndradePacheco, and Sturrock2018, Wilson and Wakefield2017, Keil et al.2013]. Although they introduce a fully Bayesian treatment for model parameters, the uncertainty in the prediction of auxiliary data sets is ignored; they therefore cannot discern the usefulness of each auxiliary data set in light of its level of spatial granularity.
Different from prior works, the proposed model can effectively make use of auxiliary data sets with various granularities by hierarchically incorporating Gaussian processes. This hierarchical modeling allows us to effectively learn the usefulness of each auxiliary data set while considering its level of spatial granularity. Our model also accounts for the spatial aggregation constraints by integrating the Gaussian processes with a spatial aggregation process that transforms the fine-grained target data into the coarse-grained target data.
3 Problem Formulation
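Table 1: Notation.
Symbol  Description
$\mathcal{S}$  set of indices of auxiliary spatial data sets
$s$  index of auxiliary spatial data set, $s \in \mathcal{S}$
$\mathcal{X}$  total region of a city
$\mathbf{x}$  location point represented by latitude and longitude coordinates, $\mathbf{x} \in \mathcal{X}$
$\mathcal{P}^{\mathrm{c}}$  coarse-grained partition of $\mathcal{X}$ of target data
$\mathcal{R}^{\mathrm{c}}_i$  region in the coarse-grained partition of target data, $\mathcal{R}^{\mathrm{c}}_i \in \mathcal{P}^{\mathrm{c}}$
$\mathcal{P}^{\mathrm{f}}$  fine-grained partition of $\mathcal{X}$ of target data
$\mathcal{R}^{\mathrm{f}}_j$  region in the fine-grained partition of target data, $\mathcal{R}^{\mathrm{f}}_j \in \mathcal{P}^{\mathrm{f}}$
$\mathcal{P}_s$  partition of $\mathcal{X}$ of $s$th auxiliary data set
$\mathcal{R}_{s,k}$  region in the partition of $s$th auxiliary data set, $\mathcal{R}_{s,k} \in \mathcal{P}_s$
$y^{\mathrm{c}}_i$  value associated with region $\mathcal{R}^{\mathrm{c}}_i$ in coarse-grained target data, $y^{\mathrm{c}}_i \in \mathbb{R}$
$y^{\mathrm{f}}_j$  value associated with region $\mathcal{R}^{\mathrm{f}}_j$ in fine-grained target data, $y^{\mathrm{f}}_j \in \mathbb{R}$
$y_{s,k}$  value associated with region $\mathcal{R}_{s,k}$ in $s$th auxiliary data set, $y_{s,k} \in \mathbb{R}$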
In this section, we describe the spatial data this study focuses on, and define our problem of refining coarse-grained spatial data by using auxiliary spatial data sets with various granularities on the same region. Assume that we have a target spatial data set with coarse granularity, and that we would like to obtain a fine-grained version. Let $\mathcal{S}$ be the collection of indices of auxiliary data sets. The notation used in this paper is listed in Table 1.
Partition: Let $\mathcal{X}$ be the total region of a city, and let $\mathbf{x} \in \mathcal{X}$ be a location point represented by its coordinates (e.g., latitude and longitude). A partition $\mathcal{P}$ of $\mathcal{X}$ is a collection of disjoint subsets of $\mathcal{X}$, called regions, whose union is equal to $\mathcal{X}$. Let $|\mathcal{P}|$ denote the number of regions in $\mathcal{P}$. We consider several partitions of $\mathcal{X}$. Let $\mathcal{P}^{\mathrm{c}}$ be the coarse-grained partition, i.e., that of the coarse-grained target data. Let $\mathcal{P}^{\mathrm{f}}$ be the fine-grained partition, i.e., that of the desired fine-grained target data. For $s \in \mathcal{S}$, let $\mathcal{P}_s$ be the partition of the $s$th auxiliary data set.
Spatial data: Let $\mathbf{y}^{\mathrm{c}} = (y^{\mathrm{c}}_i)$ be a $|\mathcal{P}^{\mathrm{c}}|$-dimensional vector consisting of the coarse-grained target values, where $y^{\mathrm{c}}_i$ is the value associated with region $\mathcal{R}^{\mathrm{c}}_i \in \mathcal{P}^{\mathrm{c}}$. For $s \in \mathcal{S}$, let $\mathbf{y}_s = (y_{s,k})$ be a $|\mathcal{P}_s|$-dimensional vector consisting of the $s$th auxiliary data values, where $y_{s,k}$ is the value associated with region $\mathcal{R}_{s,k} \in \mathcal{P}_s$ of the $s$th auxiliary data set.
Problem: Given coarse-grained target data $\mathbf{y}^{\mathrm{c}}$ with partition $\mathcal{P}^{\mathrm{c}}$, auxiliary data sets $\{\mathbf{y}_s\}_{s \in \mathcal{S}}$ with the respective partitions $\{\mathcal{P}_s\}_{s \in \mathcal{S}}$, and the desired fine-grained partition $\mathcal{P}^{\mathrm{f}}$, we wish to estimate a $|\mathcal{P}^{\mathrm{f}}|$-dimensional vector $\mathbf{y}^{\mathrm{f}} = (y^{\mathrm{f}}_j)$ consisting of the fine-grained target values, where $y^{\mathrm{f}}_j$ is the value associated with region $\mathcal{R}^{\mathrm{f}}_j \in \mathcal{P}^{\mathrm{f}}$. Here, the values $y^{\mathrm{c}}_i$, $y^{\mathrm{f}}_j$ and $y_{s,k}$ are assumed to be intensive quantities such as ratios; that is, they are independent of the area of the respective regions. When the values are extensive quantities such as population, they can be transformed into intensive quantities by dividing them by the areas of the regions.
4 Proposed Model
We propose a probabilistic model that allows auxiliary spatial data sets with various granularities to be used in refining coarse-grained spatial data. Our model is based on the Gaussian process (GP) [Rasmussen and Williams2006], which is a flexible nonparametric model for nonlinear functions in a continuous domain. We model the generative process for the coarse-grained target data $\mathbf{y}^{\mathrm{c}}$, given the auxiliary data sets $\{\mathbf{y}_s\}_{s \in \mathcal{S}}$ with known partitions $\{\mathcal{P}_s\}_{s \in \mathcal{S}}$, the coarse-grained partition $\mathcal{P}^{\mathrm{c}}$, and the fine-grained partition $\mathcal{P}^{\mathrm{f}}$. In other words, we model the conditional probability $p(\mathbf{y}^{\mathrm{c}} \mid \{\mathbf{y}_s\}_{s \in \mathcal{S}})$ instead of the joint probability of $\mathbf{y}^{\mathrm{c}}$ and $\{\mathbf{y}_s\}_{s \in \mathcal{S}}$. This enables us to adopt the two-step inference approach described in Section 5, which reduces the computational cost of learning the model parameters.
The generative process (given three auxiliary data sets) is illustrated schematically in Figure 2, where darker hues represent regions with higher values. This process consists of the following three steps: (a) deriving the predictive distribution over continuous space for each auxiliary data set via GP regression, which corresponds to spatial interpolation; (b) generating the fine-grained target data via a GP whose mean function is modeled as the linear regression of the continuous predictive distributions of the auxiliary data sets; (c) generating the coarse-grained target data by spatially aggregating the constituent values in a fine-grained partition.
In our problem, each value is associated with a region in a partition rather than a single location point in $\mathcal{X}$; this prevents us from directly applying a GP. We thus associate each region in a partition with its centroid, and regard each value as being associated with the centroid of that region. This assumption significantly simplifies the computations involved but might worsen the fit of the GP to the data set; this is taken into account in the following steps as increased uncertainty of the GPs for both the respective auxiliary data sets (described in (5)) and the target data (described in (6)). For $s \in \mathcal{S}$, let $\mathbf{X}_s = \{\mathbf{x}_{s,k}\}_{k=1}^{|\mathcal{P}_s|}$ be the set of the centroids in partition $\mathcal{P}_s$, where $\mathbf{x}_{s,k}$ is the centroid of region $\mathcal{R}_{s,k}$. Similarly, for the fine-grained partition $\mathcal{P}^{\mathrm{f}}$, let $\mathbf{X}^{\mathrm{f}} = \{\mathbf{x}^{\mathrm{f}}_j\}_{j=1}^{|\mathcal{P}^{\mathrm{f}}|}$ be the set of centroids in $\mathcal{P}^{\mathrm{f}}$. Thus, our problem is now reformulated as estimating $\mathbf{y}^{\mathrm{f}} = (y^{\mathrm{f}}_j)$, where $y^{\mathrm{f}}_j$ is the target value at the centroid $\mathbf{x}^{\mathrm{f}}_j$ of region $\mathcal{R}^{\mathrm{f}}_j$, given the auxiliary spatial data sets $\{\mathbf{y}_s\}_{s \in \mathcal{S}}$.
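The centroid step can be illustrated with a minimal sketch. It assumes that each region is available as a simple (non-self-intersecting) polygon given by its boundary vertices and that coordinates are treated as planar, which is only an approximation for latitude and longitude; the function name `polygon_centroid` is hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_centroid(vertices: List[Point]) -> Point:
    """Area-weighted centroid of a simple polygon via the shoelace formula.

    Assumption: vertices are listed in order around the boundary and the
    coordinates are treated as planar.
    """
    area2 = 0.0  # twice the signed area of the polygon
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0.0:
        raise ValueError("degenerate polygon with zero area")
    # The centroid formula divides by 6 * area, i.e. 3 * (twice the area).
    return cx / (3.0 * area2), cy / (3.0 * area2)


# Example: a 2 x 2 square has its centroid at its center.
square_centroid = polygon_centroid([(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)])
```

For strongly elongated or non-convex regions the centroid can be a poor representative point, which is the limitation the text acknowledges.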
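(a) Deriving predictive distributions of auxiliary spatial data sets: In order to handle auxiliary spatial data sets with various granularities, we use GP regression to derive a posterior Gaussian process for a latent continuous random function on $\mathcal{X}$; this conceptually corresponds to spatial interpolation of each auxiliary spatial data set. We then evaluate the predictive distribution on the basis of the posterior Gaussian process. Let $g_s(\mathbf{x})$ be a noise-free latent function for the $s$th auxiliary data set at location $\mathbf{x}$. We assume that $g_s(\mathbf{x})$ follows a Gaussian process, $g_s(\mathbf{x}) \sim \mathcal{GP}(0, k_s(\mathbf{x}, \mathbf{x}'))$, with mean zero and covariance function $k_s(\mathbf{x}, \mathbf{x}')$. Though our model does not depend on any particular choice of the covariance function, for simplicity we use the well-known squared-exponential kernel, which is widely used for measuring the similarity between function values in spatial coordinates [Rasmussen and Williams2006]. The squared-exponential kernel is defined as
$$k_s(\mathbf{x}, \mathbf{x}') = \alpha_s^2 \exp\left(-\frac{1}{2\beta_s^2}\|\mathbf{x} - \mathbf{x}'\|^2\right), \tag{1}$$
where $\beta_s$ is the scale parameter, $\alpha_s^2$ is a signal variance that controls the magnitude of the covariance, and $\|\cdot\|$ is the Euclidean norm. We assume that the $s$th auxiliary data set is generated with additive Gaussian noise with noise variance $\sigma_s^2$. If $\mathbf{g}_s^*$ represents the prediction of the $s$th auxiliary data set at the centroids $\mathbf{X}^{\mathrm{f}}$ of the fine-grained partition, the predictive distribution of $\mathbf{g}_s^*$ is as follows:
$$p(\mathbf{g}_s^* \mid \mathbf{y}_s) = \mathcal{N}(\mathbf{g}_s^* \mid \boldsymbol{\mu}_s^*, \boldsymbol{\Sigma}_s^*), \tag{2}$$
where $\boldsymbol{\mu}_s^* = \mathbf{K}_{s*}^\top (\mathbf{K}_s + \sigma_s^2 \mathbf{I})^{-1} \mathbf{y}_s$ is the vector of predictive means, and $\boldsymbol{\Sigma}_s^* = \mathbf{K}_{s**} - \mathbf{K}_{s*}^\top (\mathbf{K}_s + \sigma_s^2 \mathbf{I})^{-1} \mathbf{K}_{s*}$ is the covariance matrix, whose diagonal elements represent the uncertainties in the prediction at the test points $\mathbf{X}^{\mathrm{f}}$. Incorporating the predictive distributions (2) is expected to allow the usefulness of auxiliary data to be learnt effectively, as it allows the uncertainty in the prediction to be taken into account. Details are given in (7) in Section 5. Here, $\mathbf{K}_s$ is a $|\mathcal{P}_s| \times |\mathcal{P}_s|$ covariance matrix whose entries are covariances between training points $\mathbf{X}_s$; $\mathbf{K}_{s*}$ is a $|\mathcal{P}_s| \times |\mathcal{P}^{\mathrm{f}}|$ covariance matrix whose entries are covariances between training points $\mathbf{X}_s$ and test points $\mathbf{X}^{\mathrm{f}}$; and $\mathbf{K}_{s**}$ is a $|\mathcal{P}^{\mathrm{f}}| \times |\mathcal{P}^{\mathrm{f}}|$ covariance matrix whose entries are covariances between test points $\mathbf{X}^{\mathrm{f}}$.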
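Step (a) is standard GP regression, so it can be sketched in a few lines of NumPy. The sketch below is illustrative rather than the authors' implementation: the function names, the argument names (`alpha` for the signal standard deviation, `beta` for the scale parameter, `noise_var` for the noise variance) and the toy values are all assumptions, and the predictive covariance is taken for the noise-free latent function.

```python
import numpy as np


def se_kernel(xa, xb, alpha, beta):
    """Squared-exponential kernel: alpha^2 * exp(-||x - x'||^2 / (2 * beta^2))."""
    sq_dist = np.sum((xa[:, None, :] - xb[None, :, :]) ** 2, axis=-1)
    return alpha ** 2 * np.exp(-0.5 * sq_dist / beta ** 2)


def gp_predict(x_train, y_train, x_test, alpha, beta, noise_var):
    """Posterior mean and covariance of the latent function at x_test."""
    k_tt = se_kernel(x_train, x_train, alpha, beta) + noise_var * np.eye(len(x_train))
    k_ts = se_kernel(x_train, x_test, alpha, beta)
    k_ss = se_kernel(x_test, x_test, alpha, beta)
    chol = np.linalg.cholesky(k_tt)
    # Solve with the Cholesky factor instead of forming an explicit inverse.
    weights = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    v = np.linalg.solve(chol, k_ts)
    mean = k_ts.T @ weights
    cov = k_ss - v.T @ v
    return mean, cov


# Toy data (made up): four auxiliary-region centroids, two fine-grained centroids.
x_aux = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_aux = np.array([0.2, 0.4, 0.1, 0.5])
x_fine = np.array([[0.5, 0.5], [3.0, 3.0]])
mean, cov = gp_predict(x_aux, y_aux, x_fine, alpha=1.0, beta=0.7, noise_var=0.01)
```

In this toy setup the second test point lies far from every training centroid, so its predictive variance stays close to the prior variance. This is the mechanism by which coarser auxiliary data sets, whose centroids are sparser, yield higher uncertainty.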
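(b) Generative process of fine-grained target data: We model a generative process for the fine-grained target data $\mathbf{y}^{\mathrm{f}}$. Let $f(\mathbf{x})$ be a noise-free latent function for the fine-grained target data at location $\mathbf{x}$. We assume that $f(\mathbf{x})$ follows a Gaussian process, $f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}'))$, with mean function $m(\mathbf{x}) = \sum_{s \in \mathcal{S}} w_s g_s(\mathbf{x}) + w_0$, where $w_s$ and $w_0$ are the regression coefficient of the $s$th auxiliary data set and the bias parameter, respectively. The covariance function $k(\mathbf{x}, \mathbf{x}')$ is a squared-exponential kernel with scale parameter $\beta$ and signal variance $\alpha^2$. Given the predictive values $\{\mathbf{g}_s^*\}_{s \in \mathcal{S}}$ for the auxiliary data sets from (2), the conditional distribution of $\mathbf{y}^{\mathrm{f}}$ at the centroids $\mathbf{X}^{\mathrm{f}}$ is given by
$$p(\mathbf{y}^{\mathrm{f}} \mid \{\mathbf{g}_s^*\}_{s \in \mathcal{S}}) = \mathcal{N}(\mathbf{y}^{\mathrm{f}} \mid \tilde{\mathbf{G}} \mathbf{w}, \mathbf{K}), \tag{3}$$
where $\mathbf{w} = (w_0, w_1, \ldots, w_S)^\top$ and $\mathbf{K}$ is a $|\mathcal{P}^{\mathrm{f}}| \times |\mathcal{P}^{\mathrm{f}}|$ covariance matrix defined by $[\mathbf{K}]_{jj'} = k(\mathbf{x}^{\mathrm{f}}_j, \mathbf{x}^{\mathrm{f}}_{j'})$. Here, we let $S = |\mathcal{S}|$ be the number of auxiliary data sets. We define the augmented matrix $\tilde{\mathbf{G}}$ as the $|\mathcal{P}^{\mathrm{f}}| \times (S+1)$ matrix $[\mathbf{1}, \mathbf{g}_1^*, \ldots, \mathbf{g}_S^*]$, in which $\mathbf{1}$ is a column vector of 1's. This GP-based modeling enables us to consider the spatial correlation in the target data and the auxiliary data sets simultaneously.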
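(c) Generative process of coarse-grained target data: We design a spatial aggregation process that transforms the fine-grained target data $\mathbf{y}^{\mathrm{f}}$ into the coarse-grained target data $\mathbf{y}^{\mathrm{c}}$, in order to encourage consistency between $\mathbf{y}^{\mathrm{f}}$, which is to be estimated, and the available coarse-grained target data $\mathbf{y}^{\mathrm{c}}$. In the spatial aggregation process, a value associated with one region in the coarse-grained partition is obtained by aggregating the values in the fine-grained regions contained in the coarse-grained region (see the upper part of Figure 2). Then, $\mathbf{y}^{\mathrm{c}}$ is generated from the following conditional distribution given $\mathbf{y}^{\mathrm{f}}$,
$$p(\mathbf{y}^{\mathrm{c}} \mid \mathbf{y}^{\mathrm{f}}) = \mathcal{N}(\mathbf{y}^{\mathrm{c}} \mid \mathbf{H} \mathbf{y}^{\mathrm{f}}, \sigma^2 \mathbf{I}), \tag{4}$$
where $\sigma^2$ is the noise variance for the coarse-grained target data, and $\mathbf{H}$ is a $|\mathcal{P}^{\mathrm{c}}| \times |\mathcal{P}^{\mathrm{f}}|$ aggregation matrix whose entries are non-negative weighting coefficients; each row of $\mathbf{H}$ should sum to 1. We set the coefficients in accordance with the properties of the target data. For example, when the target data are incidences of disease, the $(i, j)$ entry of $\mathbf{H}$ would be proportional to the population in the intersection of the coarse-grained region $\mathcal{R}^{\mathrm{c}}_i$ and the fine-grained region $\mathcal{R}^{\mathrm{f}}_j$. In the following, for simplicity, we consider a simple aggregation matrix, in which the $(i, j)$ entry is $1/|\mathcal{J}_i|$ if the fine-grained region $\mathcal{R}^{\mathrm{f}}_j$ is contained in the coarse-grained region $\mathcal{R}^{\mathrm{c}}_i$, and zero otherwise. Here, $\mathcal{J}_i$ is the subset of $\{1, \ldots, |\mathcal{P}^{\mathrm{f}}|\}$ that indexes the fine-grained regions contained in the coarse-grained region $\mathcal{R}^{\mathrm{c}}_i$.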
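The simple aggregation matrix described above can be built directly from a lookup that maps each fine-grained region to the coarse-grained region containing it. The following sketch assumes such a lookup is available as an integer array; the function name and the toy values are illustrative only.

```python
import numpy as np


def aggregation_matrix(fine_to_coarse, n_coarse):
    """Row-normalized aggregation matrix.

    Entry (i, j) is 1 / (number of fine regions in coarse region i) if fine
    region j lies in coarse region i, and 0 otherwise.
    """
    fine_to_coarse = np.asarray(fine_to_coarse)
    n_fine = len(fine_to_coarse)
    h = np.zeros((n_coarse, n_fine))
    h[fine_to_coarse, np.arange(n_fine)] = 1.0
    counts = h.sum(axis=1, keepdims=True)
    if np.any(counts == 0):
        raise ValueError("every coarse region must contain at least one fine region")
    return h / counts


# Toy example: fine regions 0-1 lie in coarse region 0, fine regions 2-4 in coarse region 1.
h = aggregation_matrix([0, 0, 1, 1, 1], n_coarse=2)
y_fine = np.array([0.10, 0.30, 0.20, 0.40, 0.60])
y_coarse = h @ y_fine  # noise-free aggregate: the per-region averages
```

A population-weighted variant, as mentioned for disease incidence, would replace the ones by population counts before the row normalization.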
5 Inference
Given the coarse-grained target data $\mathbf{y}^{\mathrm{c}}$, the auxiliary spatial data sets $\{\mathbf{y}_s\}_{s \in \mathcal{S}}$ with centroids $\{\mathbf{X}_s\}_{s \in \mathcal{S}}$, the centroids $\mathbf{X}^{\mathrm{f}}$ of the fine-grained partition and the aggregation matrix $\mathbf{H}$, we aim to predict the fine-grained target data $\mathbf{y}^{\mathrm{f}}$ via a Bayesian inference procedure. In order to calculate the predictive distribution of $\mathbf{y}^{\mathrm{f}}$, we need to estimate the model parameters. The problem of estimating the model parameters can be divided into two steps: 1) estimate the hyperparameters $\alpha_s$, $\beta_s$, $\sigma_s$ for each auxiliary data set and 2) estimate the regression coefficients $\mathbf{w}$ and the hyperparameters $\alpha$, $\beta$, $\sigma$ for the target data. Although one could also estimate all the model parameters simultaneously (i.e., one-step inference), doing so would increase the computational cost of inference drastically; we therefore adopt the efficient two-step inference described in the following paragraphs. We finally construct the predictive distribution of $\mathbf{y}^{\mathrm{f}}$ by using the estimated parameters. Details of the inference procedure are shown in Algorithm 1.
The first inference step: Given the $s$th auxiliary spatial data set $\mathbf{y}_s$ with centroids $\mathbf{X}_s$, the marginal likelihood of $\mathbf{y}_s$ is given by
$$p(\mathbf{y}_s \mid \mathbf{X}_s) = \mathcal{N}(\mathbf{y}_s \mid \mathbf{0}, \mathbf{K}_s + \sigma_s^2 \mathbf{I}). \tag{5}$$
The hyperparameters $\alpha_s$, $\beta_s$, $\sigma_s$ are estimated by maximizing the logarithm of (5). We solve the optimization problem by using the BFGS method [Liu and Nocedal1989]. By solving the optimization problem for each auxiliary data set independently, we obtain the set of estimated hyperparameters for all auxiliary data sets. The predictive distribution of $\mathbf{g}_s^*$ corresponding to (2) is obtained using the estimated hyperparameters.
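The objective of the first inference step is the log marginal likelihood of a zero-mean GP with a squared-exponential kernel and Gaussian noise. The sketch below evaluates that objective with NumPy. To stay dependency-free it picks hyperparameters by a plain grid search, which is a stand-in for the BFGS optimization used in the paper and not the authors' procedure; all names are illustrative assumptions.

```python
import numpy as np


def se_kernel(xa, xb, alpha, beta):
    """Squared-exponential kernel: alpha^2 * exp(-||x - x'||^2 / (2 * beta^2))."""
    sq_dist = np.sum((xa[:, None, :] - xb[None, :, :]) ** 2, axis=-1)
    return alpha ** 2 * np.exp(-0.5 * sq_dist / beta ** 2)


def log_marginal_likelihood(x, y, alpha, beta, noise_var):
    """log N(y | 0, K + noise_var * I) for a zero-mean GP."""
    n = len(y)
    cov = se_kernel(x, x, alpha, beta) + noise_var * np.eye(n)
    chol = np.linalg.cholesky(cov)
    weights = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    # log det(cov) = 2 * sum(log(diag(chol))), hence the factor-free sum below.
    return (-0.5 * y @ weights
            - np.sum(np.log(np.diag(chol)))
            - 0.5 * n * np.log(2.0 * np.pi))


def fit_by_grid(x, y, alphas, betas, noise_vars):
    """Return the (alpha, beta, noise_var) grid point with the highest objective.

    Grid search is used here only for illustration; a gradient-based
    optimizer would normally replace it.
    """
    grid = [(a, b, v) for a in alphas for b in betas for v in noise_vars]
    return max(grid, key=lambda p: log_marginal_likelihood(x, y, *p))
```

Because each auxiliary data set has its own objective, the fits are independent and can be run one data set at a time or in parallel.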
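The second inference step: Given the coarse-grained target data $\mathbf{y}^{\mathrm{c}}$ and the centroids $\mathbf{X}^{\mathrm{f}}$ of the fine-grained partition, the marginal likelihood of $\mathbf{y}^{\mathrm{c}}$ is given by
$$p(\mathbf{y}^{\mathrm{c}} \mid \{\mathbf{y}_s\}_{s \in \mathcal{S}}) = \mathcal{N}(\mathbf{y}^{\mathrm{c}} \mid \mathbf{H} \mathbf{M} \mathbf{w}, \mathbf{C}), \tag{6}$$
where $\mathbf{M} = [\mathbf{1}, \boldsymbol{\mu}_1^*, \ldots, \boldsymbol{\mu}_S^*]$ is a $|\mathcal{P}^{\mathrm{f}}| \times (S+1)$ matrix, and we analytically integrate out the latent variables $\mathbf{y}^{\mathrm{f}}$ and $\{\mathbf{g}_s^*\}_{s \in \mathcal{S}}$ with the help of the conjugacy of the distributions (2), (3), and (4). $\mathbf{C}$ is a $|\mathcal{P}^{\mathrm{c}}| \times |\mathcal{P}^{\mathrm{c}}|$ covariance matrix represented by $\mathbf{C} = \sigma^2 \mathbf{I} + \mathbf{H} \boldsymbol{\Lambda} \mathbf{H}^\top$, where $\boldsymbol{\Lambda} = \mathbf{K} + \sum_{s \in \mathcal{S}} w_s^2 \boldsymbol{\Sigma}_s^*$. For the simple aggregation matrix, the $(i, i')$ entry of $\mathbf{C}$ is shown in (7).
$$[\mathbf{C}]_{ii'} = \underbrace{\delta_{ii'} \left( \sigma^2 + \frac{1}{|\mathcal{J}_i|^2} \sum_{j \in \mathcal{J}_i} \Big( \alpha^2 + \sum_{s \in \mathcal{S}} w_s^2 [\boldsymbol{\Sigma}_s^*]_{jj} \Big) \right)}_{\text{residual variance term}} + \underbrace{\frac{1}{|\mathcal{J}_i| |\mathcal{J}_{i'}|} \sum_{j \in \mathcal{J}_i} \sum_{j' \in \mathcal{J}_{i'},\, j' \neq j} \Big( k(\mathbf{x}^{\mathrm{f}}_j, \mathbf{x}^{\mathrm{f}}_{j'}) + \sum_{s \in \mathcal{S}} w_s^2 [\boldsymbol{\Sigma}_s^*]_{jj'} \Big)}_{\text{spatial correlation term}} \tag{7}$$
Here, $\mathcal{J}_i$ indexes the fine-grained regions contained in the $i$th coarse-grained region, and $\delta_{ii'}$ in (7) represents the Kronecker delta; $\delta_{ii'} = 1$ if $i = i'$, and $\delta_{ii'} = 0$ otherwise. The residual variance term in (7) represents the residual variance in the regression of $y^{\mathrm{c}}_i$. This term contains the uncertainty in the prediction of $g_s^*(\mathbf{x}^{\mathrm{f}}_j)$, i.e., $[\boldsymbol{\Sigma}_s^*]_{jj}$, which is weighted by $w_s^2$. The spatial correlation term in (7) represents the strength of spatial correlation between the $i$th and $i'$th coarse-grained regions. This term contains the covariance between $g_s^*(\mathbf{x}^{\mathrm{f}}_j)$ and $g_s^*(\mathbf{x}^{\mathrm{f}}_{j'})$, i.e., $[\boldsymbol{\Sigma}_s^*]_{jj'}$, which is weighted by $w_s^2$. On the basis of the marginal likelihood (6) with this covariance matrix $\mathbf{C}$, our model can effectively learn the regression coefficients $\mathbf{w}$ while simultaneously taking into consideration the prediction uncertainties and the spatial correlations from the auxiliary data sets with various granularities. The parameter $\mathbf{w}$ and the hyperparameters $\alpha$, $\beta$, $\sigma$ are estimated by maximizing the logarithm of (6). We solve the optimization problem by using the BFGS method [Liu and Nocedal1989]. The derivatives of the logarithm of (6) with respect to $\mathbf{w}$, $\alpha$, $\beta$, $\sigma$ are described in Appendix A.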
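The structure of the marginal covariance follows from two successive Gaussian marginalizations. The derivation below is a sketch in assumed notation, not the paper's own: $\mathbf{g}_s^*$ denotes the auxiliary predictions at the fine-grained centroids with predictive mean $\boldsymbol{\mu}_s$ and covariance $\boldsymbol{\Sigma}_s$, $w_0$ and $w_s$ are the bias and regression coefficients, $\mathbf{K}$ is the fine-grained kernel matrix, $\mathbf{H}$ is the aggregation matrix and $\sigma^2$ is the coarse-level noise variance. It also assumes that the predictions of different auxiliary data sets are independent, which holds when each GP is fitted separately.

```latex
% Step 1: fine-grained data given the auxiliary predictions (linear-Gaussian).
% Step 2: integrate out each g_s^* ~ N(mu_s, Sigma_s); a coefficient w_s scales
%         the covariance by w_s^2.
% Step 3: apply the aggregation matrix H and add the observation noise.
\begin{align*}
\mathbf{y}^{\mathrm{f}} \mid \{\mathbf{g}_s^*\}
  &\sim \mathcal{N}\Big( w_0 \mathbf{1} + \sum_{s} w_s \mathbf{g}_s^*,\; \mathbf{K} \Big) \\
\Rightarrow\quad \mathbf{y}^{\mathrm{f}}
  &\sim \mathcal{N}\Big( w_0 \mathbf{1} + \sum_{s} w_s \boldsymbol{\mu}_s,\;
        \mathbf{K} + \sum_{s} w_s^2 \boldsymbol{\Sigma}_s \Big) \\
\Rightarrow\quad \mathbf{y}^{\mathrm{c}}
  &\sim \mathcal{N}\Big( \mathbf{H}\big( w_0 \mathbf{1} + \sum_{s} w_s \boldsymbol{\mu}_s \big),\;
        \sigma^2 \mathbf{I} + \mathbf{H}\big( \mathbf{K} + \sum_{s} w_s^2 \boldsymbol{\Sigma}_s \big)\mathbf{H}^\top \Big)
\end{align*}
```

The second line makes the role of granularity explicit: an auxiliary data set with large predictive variance can only receive a large coefficient at the cost of inflating the covariance, which the marginal likelihood penalizes.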
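Predictive distribution of fine-grained target data: Using the estimated model parameters, the predictive distribution of the fine-grained target data $\mathbf{y}^{\mathrm{f}}$ is given by
$$p(\mathbf{y}^{\mathrm{f}} \mid \mathbf{y}^{\mathrm{c}}) = \mathcal{N}(\mathbf{y}^{\mathrm{f}} \mid \boldsymbol{\mu}^{\mathrm{f}}, \boldsymbol{\Sigma}^{\mathrm{f}}), \tag{8}$$
where $\boldsymbol{\mu}^{\mathrm{f}} = \mathbf{M} \mathbf{w} + \boldsymbol{\Lambda} \mathbf{H}^\top \mathbf{C}^{-1} (\mathbf{y}^{\mathrm{c}} - \mathbf{H} \mathbf{M} \mathbf{w})$ is the vector of predictive means, and $\boldsymbol{\Sigma}^{\mathrm{f}} = \boldsymbol{\Lambda} - \boldsymbol{\Lambda} \mathbf{H}^\top \mathbf{C}^{-1} \mathbf{H} \boldsymbol{\Lambda}$ is the covariance matrix. We obtain the refinement results, i.e., the estimated fine-grained target data, from the predictive means $\boldsymbol{\mu}^{\mathrm{f}}$. By analyzing the covariance matrix $\boldsymbol{\Sigma}^{\mathrm{f}}$, we can also evaluate the confidence of the refinement results.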
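The prediction step is ordinary Gaussian conditioning on the coarse-grained observations. The following NumPy sketch assumes that the auxiliary predictive means and covariances, the fine-grained kernel matrix, the aggregation matrix and the fitted coefficients are already available; the function name `refine`, the argument names and the toy values are illustrative assumptions rather than the authors' code.

```python
import numpy as np


def refine(y_coarse, h, mu_aux, cov_aux, w, k_fine, noise_var):
    """Posterior mean and covariance of fine-grained values given coarse data.

    mu_aux:  (n_fine, S) predictive means of the auxiliary data sets
    cov_aux: (S, n_fine, n_fine) predictive covariances of the auxiliary data sets
    w:       (S + 1,) bias followed by one regression coefficient per data set
    """
    n_fine = k_fine.shape[0]
    design = np.hstack([np.ones((n_fine, 1)), mu_aux])
    prior_mean = design @ w
    # Fine-grained covariance inflated by the weighted auxiliary uncertainty.
    lam = k_fine + np.einsum("s,sij->ij", w[1:] ** 2, cov_aux)
    c = noise_var * np.eye(h.shape[0]) + h @ lam @ h.T
    # gain equals lam @ h.T @ inv(c), computed via a linear solve.
    gain = np.linalg.solve(c, h @ lam).T
    mean = prior_mean + gain @ (y_coarse - h @ prior_mean)
    cov = lam - gain @ h @ lam
    return mean, cov


# Toy example (all values made up): 2 coarse regions, 5 fine regions, 2 auxiliary sets.
h = np.array([[0.5, 0.5, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]])
mu_aux = np.array([[0.1, 0.9], [0.3, 0.7], [0.2, 0.6], [0.5, 0.4], [0.6, 0.2]])
cov_aux = np.stack([0.05 * np.eye(5), 0.20 * np.eye(5)])
k_fine = 0.5 * np.eye(5) + 0.1
w = np.array([0.1, 0.8, -0.3])
y_coarse = np.array([0.25, 0.45])
mean, cov = refine(y_coarse, h, mu_aux, cov_aux, w, k_fine, noise_var=1e-10)
```

With a near-zero noise variance, aggregating the predictive mean reproduces the coarse-grained observations, which is the spatial aggregation constraint discussed in Section 2.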
6 Experiments
Data description: We evaluated the proposed model using real-world spatial data sets from NYC Open Data (https://opendata.cityofnewyork.us). There are 44 data sets covering a variety of categories such as social indicators, land use, air quality and taxi traffic. Each data set is associated with one of six geographical partitions, i.e., school district (32), UHF42 (42), community district (59), police precinct (77), zip code (186) and taxi zone (249), where each number in parentheses denotes the number of regions in the corresponding partition. In our experiments, we refine the poverty rate data set and the five air pollution data sets (i.e., PM2.5, ozone, formaldehyde, benzene, elemental carbon). The experimental setting is as follows: 1) given the poverty rate data set with the borough partition (five regions), we refine the data into the community district partition (59 regions), and 2) given each air pollution data set with the borough partition (five regions), we refine the data into the UHF42 partition (42 regions). Appendix B details the data sets and the settings.
Baselines: The existing methods can be applied to auxiliary data sets with various granularities if preprocessing, i.e., spatial interpolation, is applied so that the granularities of the auxiliary data sets match that of the fine-grained target data. Accordingly, we first performed spatial interpolation of each auxiliary data set by using GP regression; we then obtained the predictive values at the centroids of the target fine-grained partition so that the spatial granularities of all auxiliary data sets equaled that of the fine-grained target data. We compared the proposed model with three baselines: GP regression (GPR) [Rasmussen and Williams2006], the linear regression-based method (LR-based method) [Smith, Mashhadi, and Capra2014] and the two-stage statistical downscaling method (2-stage SD) [Park2013]. Here, GPR is a simple spatial interpolation; namely, it predicts the fine-grained target data by using only the coarse-grained target data. Details of these baselines are given in Appendix C.
Table 2: MAPE and standard errors for the predictions of the fine-grained target data.
Method  PM2.5  Ozone  Formaldehyde  Benzene  Elemental carbon  Poverty rate
Proposed model
2-stage SD
LR-based method
GPR
Fine-grained target data prediction: We evaluated our model in terms of its performance in predicting the fine-grained target data. The evaluation metric is the mean absolute percentage error (MAPE) in the fine-grained target values: $\frac{1}{|\mathcal{P}^{\mathrm{f}}|} \sum_{j} \left| (y^{\mathrm{true}}_j - y^{\mathrm{pred}}_j) / y^{\mathrm{true}}_j \right|$, where $y^{\mathrm{true}}_j$ is the true value associated with the $j$th region in the target fine-grained partition and $y^{\mathrm{pred}}_j$ is its predicted value. Table 2 shows the MAPE and the standard error of the absolute percentage error for the proposed model, 2-stage SD, LR-based method and GPR. For all data sets, our model performed better than the baselines, and the differences between our model and the baselines are statistically significant (Student's t-test). In Table 2, the single star and the double star indicate significant differences at two different significance levels. We found similar results using other evaluation metrics (e.g., MAE, RMSE, RMSPE). These results show that our model made good use of the auxiliary data sets with various granularities to accurately predict the fine-grained target data.
Figures 3 and 4 visualize the predicted fine-grained target data for the PM2.5 data set and for the poverty rate data set, respectively. We illustrate the true fine-grained data on the left in Figures 3 and 4, and the predictions made by the proposed model, 2-stage SD and LR-based method on the right. Here, the predictive values of each method were normalized to a common range, and darker hues represent regions with higher values. As shown in these figures, our model refined the coarse-grained data more precisely than the other methods. In particular, in both data sets, our model achieved a marked improvement in the northern part of the map (i.e., Manhattan). Such visualization results are useful for finding key regions, e.g., the poorest regions of a city.
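The MAPE metric used in the evaluation above, together with the standard error of the absolute percentage errors, can be computed as in the following sketch. It returns the error as a fraction rather than a percentage and uses the sample standard deviation; both conventions, like the function name, are assumptions and not details confirmed by the paper.

```python
import numpy as np


def mape_with_standard_error(y_true, y_pred):
    """Mean absolute percentage error and its standard error across regions.

    Returned as fractions (multiply by 100 for percent). The metric is
    undefined for regions whose true value is zero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ape = np.abs((y_true - y_pred) / y_true)
    standard_error = ape.std(ddof=1) / np.sqrt(len(ape))
    return ape.mean(), standard_error
```

Because each region contributes one absolute percentage error, a paired comparison between two methods can be run on these per-region errors.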
Table 3: Top five relevant auxiliary data sets for the PM2.5 data set, ranked by the absolute value of the estimated regression coefficient $|w_s|$.
Rank  Proposed model: auxiliary data  $|w_s|$  2-stage SD: auxiliary data  $|w_s|$
1.  Fire incident (Zip code)  0.173  1-2 fam. bldg (Comm.)  0.088
2.  Taxi dropoff (Taxi zone)  0.139  Hospital (Comm.)  0.069
3.  311 call (Zip code)  0.135  Public school (Comm.)  0.069
4.  Public telephone (Zip code)  0.114  Lots of vacant (Comm.)  0.067
5.  Natural gas (Zip code)  0.109  Crime (Police precinct)  0.064
Evaluation of auxiliary spatial data sets: Table 3 shows the top five relevant auxiliary data sets as determined by our model and 2-stage SD for the PM2.5 data set. These auxiliary data sets are arranged in descending order of the absolute values of the estimated regression coefficients $w_s$, each of which is listed in the "$|w_s|$" columns of Table 3. By comparing the sorted list of the auxiliary data sets created by the proposed model with that yielded by 2-stage SD, we can confirm that the proposed model assigned relatively large regression coefficients to the auxiliary data sets with finer-grained partitions (i.e., Zip code and Taxi zone).
Figures 5 and 6 visualize the top two relevant auxiliary data sets as estimated by our model and 2-stage SD for the PM2.5 data set, respectively. Comparing these visualizations with that of the true target data in Figure 3(a) shows that our model emphasized the most useful auxiliary data sets, i.e., those that are both strongly related to the target data and have fine granularities; 2-stage SD evaluated the usefulness of auxiliary data sets only in terms of the strength of their relationships with the target data.
Figure 7 shows the relation between the regression coefficient and the uncertainty in the prediction of auxiliary data sets estimated by the proposed model for the PM2.5 data set. In this figure, each auxiliary data set is depicted by a dot whose color indicates its partition. The horizontal axis shows the average of the variances in the predicted values of each auxiliary data set; for the $s$th auxiliary data set, this average was calculated as $\frac{1}{|\mathcal{P}^{\mathrm{f}}|} \mathrm{tr}(\boldsymbol{\Sigma}_s^*)$, i.e., the mean of the diagonal elements of the predictive covariance matrix in (2), which is the degree of uncertainty in predicting the $s$th auxiliary data set. The vertical axis shows the absolute values of the estimated coefficients. As shown, the absolute coefficient values estimated by our model tended to be higher for the auxiliary data sets that had lower degrees of uncertainty. These results indicate that our model can effectively learn the usefulness of each auxiliary data set by considering the uncertainty in the prediction of auxiliary data sets. Consequently, the proposed model can precisely refine the coarse-grained target data by effectively utilizing auxiliary data sets with various granularities.
7 Conclusion
This paper has proposed a probabilistic model for refining coarse-grained spatial data by utilizing auxiliary spatial data sets with various granularities on the same region. Our model can effectively make use of auxiliary data sets with various granularities by hierarchically incorporating Gaussian processes. Our model also has the advantage of allowing the inference of model parameters based on the exact marginal likelihood, in which the variables of the fine-grained target and auxiliary data are analytically integrated out. Using multiple real-world spatial data sets from New York City, we confirmed that our model can predict the fine-grained target data more precisely than the baselines.
One direction for future work is to consider the shapes of regions, as in a previous study [Rathbun1998]: representing each region by its centroid allows for GP-based formulations and significantly simplifies the computations involved, but it might worsen the fit of the GP for irregularly shaped regions (e.g., extremely elongated ones). Another direction is to incorporate a fully Bayesian treatment of the model parameters, which can be expected to provide better results.
References
 [Barlacchi et al.2015] Barlacchi, G.; Nadai, M. D.; Larcher, R.; Casella, A.; Chitic, C.; Torrisi, G.; et al. 2015. A multi-source dataset of urban life in the city of Milan and the province of Trentino. Scientific Data 2.
 [Bogomolov et al.2014] Bogomolov, A.; Lepri, B.; Staiano, J.; Oliver, N.; Pianesi, F.; and Pentland, A. 2014. Once upon a crime: Towards crime prediction from demographics and mobile data. In ICMI, 427–434. ACM.
 [Boucher and Kyriakidis2006] Boucher, A., and Kyriakidis, P. C. 2006. Super-resolution land cover mapping with indicator geostatistics. Remote Sensing of Environment 104:264–282.
 [Cannon2011] Cannon, A. J. 2011. Quantile regression neural networks: Implementation in R and application to precipitation downscaling. Computers & Geosciences 37(9):1277–1284.
 [Diodato et al.2010] Diodato, N.; Bellocchi, G.; Bertolin, C.; and Camuffo, D. 2010. Multiscale regression model to infer historical temperatures in a central mediterranean subregional area. Climate of the Past Discussions 6:2625–2649.
 [Dong et al.2014] Dong, C.; Loy, C. C.; He, K.; and Tang, X. 2014. Learning a deep convolutional network for image super-resolution. In ECCV, 184–199. Springer.
 [Flaxman, Wang, and Smola2015] Flaxman, S. R.; Wang, Y. X.; and Smola, A. J. 2015. Who supported Obama in 2012?: Ecological inference through distribution regression. In KDD, 289–298. ACM.
 [Ghosh2010] Ghosh, S. 2010. SVMPGSL coupled approach for statistical downscaling to predict rainfall from GCM output. Journal of Geophysical Research: Atmospheres 115(D22).
 [Goldstein and Dyson2013] Goldstein, B., and Dyson, L. 2013. Beyond transparency: Open data and the future of civic innovation.
 [Goovaerts2010] Goovaerts, P. 2010. Combining areal and point data in geostatistical interpolation: Applications to soil science and medical geography. Mathematical Geosciences 42(5):535–554.
 [Hessami et al.2008] Hessami, M.; Gachon, P.; Ouarda, T. B.; and St-Hilaire, A. 2008. Automated regression-based statistical downscaling tool. Environmental Modelling & Software 23(6):813–834.
 [Howitt and Reynaud2003] Howitt, R., and Reynaud, A. 2003. Spatial disaggregation of agricultural production data using maximum entropy. European Review of Agricultural Economics 30(2):359–387.
 [Jerrett et al.2013] Jerrett, M.; Burnett, R. T.; Beckerman, B. S.; Turner, M. C.; Krewski, D.; Thurston, G.; et al. 2013. Spatial analysis of air pollution and mortality in California. American Journal of Respiratory and Critical Care Medicine 188(5):593–599.
 [Keil et al.2013] Keil, P.; Belmaker, J.; Wilson, A. M.; Unitt, P.; and Jetz, W. 2013. Downscaling of species distribution models: a hierarchical approach. Methods in Ecology and Evolution 4(1):82–94.
 [Kyriakidis2004] Kyriakidis, P. C. 2004. A geostatistical framework for area-to-point spatial interpolation. Geographical Analysis 36(3):259–289.
 [Liu and Nocedal1989] Liu, D. C., and Nocedal, J. 1989. On the limited memory BFGS method for large scale optimization. Mathematical Programming 45(1–3):503–528.
 [Miller et al.2015] Miller, B. A.; Koszinski, S.; Wehrhan, M.; and Sommer, M. 2015. Impact of multiscale predictor selection for modeling soil properties. Geoderma 239–240:97–106.

 [Misra, Sarkar, and Mitra2017] Misra, S.; Sarkar, S.; and Mitra, P. 2017. Statistical downscaling of precipitation using long short-term memory recurrent neural networks. Theor. Appl. Climatol.
 [Murakami and Tsutsumi2011] Murakami, D., and Tsutsumi, M. 2011. A new areal interpolation technique based on spatial econometrics. Procedia - Social and Behavioral Sciences 21:230–239.
 [Park2013] Park, N. W. 2013. Spatial downscaling of TRMM precipitation using geostatistics and fine scale environmental variables. Advances in Meteorology 2013.
 [Rasmussen and Williams2006] Rasmussen, C. E., and Williams, C. K. I. 2006. Gaussian processes for machine learning.
 [Rathbun1998] Rathbun, S. L. 1998. Spatial modelling in irregularly shaped regions: Kriging estuaries. Environmetrics 9:109–129.
 [Rupasinghaa and Goetz2007] Rupasingha, A., and Goetz, S. J. 2007. Social and political forces as determinants of poverty: A spatial analysis. The Journal of Socio-Economics 36(4):650–671.
 [Shadbolt et al.2012] Shadbolt, N.; O’Hara, K.; Berners-Lee, T.; Gibbins, N.; Glaser, H.; Hall, W.; and Schraefel, M. C. 2012. Linked open government data: Lessons from data.gov.uk. IEEE Intelligent Systems 27(3):16–24.
 [Smith and Capra2016] Smith, C. C., and Capra, L. 2016. Beyond the baseline: Establishing the value in mobile phone based poverty estimates. In WWW, 425–434. ACM.
 [Smith, Mashhadi, and Capra2014] Smith, C. C.; Mashhadi, A.; and Capra, L. 2014. Poverty on the cheap: Estimating poverty maps using aggregated mobile communication networks. In CHI, 511–520. ACM.
 [Stein1999] Stein, M. L. 1999. Interpolation of spatial data: Some theory for kriging.
 [Sturrock et al.2014] Sturrock, H. J. W.; Cohen, J. M.; Keil, P.; Tatem, A. J.; Menach, A. L.; Ntshalintshali, N. E.; Hsiang, M. S.; and Gosling, R. D. 2014. Fine-scale malaria risk mapping from routine aggregated case data. Malaria Journal 13:421.
 [Taylor, AndradePacheco, and Sturrock2018] Taylor, B. M.; Andrade-Pacheco, R.; and Sturrock, H. J. W. 2018. Continuous inference for aggregated point process data. Journal of the Royal Statistical Society: Series A (Statistics in Society) 12347.
 [Vandal et al.2017] Vandal, T.; Kodra, E.; Ganguly, S.; Michaelis, A.; Nemani, R.; and Ganguly, A. R. 2017. DeepSD: Generating high resolution climate change projections through single image super-resolution. In KDD, 1663–1672. ACM.
 [Vandal et al.2018] Vandal, T.; Kodra, E.; Ganguly, S.; Michaelis, A.; Nemani, R.; and Ganguly, A. R. 2018. Generating high resolution climate change projections through single image super-resolution: An abridged version. In IJCAI, 5389–5393.
 [Wang et al.2016] Wang, H.; Kifer, D.; Graif, C.; and Li, Z. 2016. Crime rate inference with big data. In KDD, 635–644. ACM.
 [Wilby et al.2004] Wilby, R. L.; Charles, S. P.; Zorita, E.; Timbal, B.; Whetton, P.; and Mearns, L. O. 2004. Guidelines for Use of Climate Scenarios Developed from Statistical Downscaling Methods.
 [Wilson and Wakefield2017] Wilson, K., and Wakefield, J. 2017. Pointless continuous spatial surface reconstruction. [online]. Available: https://arxiv.org/abs/1709.09659.
 [Wotling et al.2000] Wotling, G.; Bouvier, C.; Danloux, J.; and Fritsch, J. M. 2000. Regionalization of extreme precipitation distribution using the principal components of the topographical environment. Journal of Hydrology 233(1–4):86–101.
 [Xavier et al.2016] Xavier, A.; Freitas, M. B. C.; Rosário, M. D. S.; and Fragoso, R. 2016. Disaggregating statistical data at the field level: An entropy approach. Spatial Statistics 23:91–103.
 [Xu et al.2018] Xu, J.; Liu, X.; Wilson, T.; Tan, P. N.; Hatami, P.; and Luo, L. 2018. MUSCAT: Multi-scale spatio-temporal learning with application to climate modeling. In IJCAI, 2912–2918.
 [Xu2017] Xu, J. 2017. Multi-task learning and its application to geospatio-temporal data. ProQuest Dissertations Publishing.
 [Yuan, Zheng, and Xie2012] Yuan, J.; Zheng, Y.; and Xie, X. 2012. Discovering regions of different functions in a city using human mobility and POIs. In KDD, 186–194. ACM.
 [Zheng et al.2015] Zheng, Y.; Yi, X.; Li, M.; Li, R.; Shan, Z.; Chang, E.; and Li, T. 2015. Forecasting fine-grained air quality based on big data. In KDD, 2267–2276. ACM.
 [Zheng, Liu, and Hsieh2013] Zheng, Y.; Liu, F.; and Hsieh, H. P. 2013. U-Air: When urban air quality inference meets big data. In KDD, 1436–1444. ACM.
 [Zorita and von Storch1999] Zorita, E., and von Storch, H. 1999. The analog method as a simple statistical downscaling technique: Comparison with more complicated methods. Journal of Climate 12:2474–2489.
Appendix A Derivatives of model parameters
The log marginal likelihood is given by
(9) 
We describe the first derivatives of (9) with respect to the model parameters, which are required for estimating the parameters with the BFGS method. The derivative of (9) with respect to is given by
(10) 
where and is a matrix of elementwise derivatives. The derivative of the element (7) is obtained by
(11) 
Denoting , the derivative of (9) with respect to is given by
(12) 
The matrix of elementwise derivatives is trivial. The derivative of the element (7) with respect to each hyperparameter is as follows:
(13) 
(14) 
(15) 
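For orientation, gradients of this kind usually follow the standard identity for a Gaussian log marginal likelihood [Rasmussen and Williams2006]. The sketch below states that generic identity only; the symbols (an observation vector y of length n, a covariance matrix C depending on a hyperparameter θ, and α) are placeholders assumed for illustration and are not necessarily the notation of (9)–(15).

```latex
\begin{align}
\log p(\mathbf{y} \mid \theta)
  &= -\tfrac{1}{2}\,\mathbf{y}^{\top}\mathbf{C}^{-1}\mathbf{y}
     - \tfrac{1}{2}\log\lvert\mathbf{C}\rvert
     - \tfrac{n}{2}\log 2\pi, \\
\frac{\partial}{\partial\theta}\log p(\mathbf{y} \mid \theta)
  &= \tfrac{1}{2}\,\mathbf{y}^{\top}\mathbf{C}^{-1}
     \frac{\partial\mathbf{C}}{\partial\theta}\,\mathbf{C}^{-1}\mathbf{y}
     - \tfrac{1}{2}\operatorname{tr}\!\left(\mathbf{C}^{-1}
     \frac{\partial\mathbf{C}}{\partial\theta}\right) \\
  &= \tfrac{1}{2}\operatorname{tr}\!\left(
     \left(\boldsymbol{\alpha}\boldsymbol{\alpha}^{\top} - \mathbf{C}^{-1}\right)
     \frac{\partial\mathbf{C}}{\partial\theta}\right),
  \qquad \boldsymbol{\alpha} = \mathbf{C}^{-1}\mathbf{y}.
\end{align}
```

Here ∂C/∂θ denotes the matrix of element-wise derivatives of C, so only the per-element kernel derivatives need to be supplied for each hyperparameter.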
Appendix B Description of real-world spatial data sets
We used real-world spatial data sets from NYC Open Data (https://opendata.cityofnewyork.us) to evaluate the proposed model. The data sets were collected and released to improve the urban environment in New York City, and cover a variety of categories such as social indicators, land use, air quality and taxi traffic. Details of the data sets are listed in Table 4. Most categories contain multiple data sets, and the total number of data sets is 44. Each data set is associated with one of six geographical partitions, i.e., school district, UHF42, community district, police precinct, zip code and taxi zone. These partitions have various spatial granularities; the number of regions in each partition is shown in Table 4. The data sets are gathered once a year over the time ranges shown in Table 4; the data values are divided by the number of observation times. When the data values are extensive quantities (i.e., proportional to the scale of areas, e.g., population), they are divided by the areas of the respective regions; the resulting values are intensive quantities (i.e., independent of area scale, e.g., population density).
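The normalization described above can be sketched as follows. This is a minimal illustration, not the preprocessing code used in the experiments: the function name `normalize`, the units and the example numbers are all hypothetical.

```python
def normalize(total, n_observations, area_km2=None):
    """Average a raw value over its observation times.

    If `area_km2` is given, the quantity is treated as extensive and is
    additionally divided by the region's area, yielding an intensive quantity.
    """
    per_observation = total / n_observations
    if area_km2 is None:
        # Already an intensive quantity (e.g., a rate): no area division.
        return per_observation
    return per_observation / area_km2


# Hypothetical example: 700 crimes recorded over 7 yearly observations
# in a precinct covering 2.5 km^2.
crime_density = normalize(700, 7, area_km2=2.5)

# Hypothetical intensive quantity: yearly poverty rates summed over 5 years.
mean_poverty_rate = normalize(1.0, 5)
```

Dividing extensive quantities by area makes values comparable across partitions whose regions differ widely in size.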
In our experiments, we refine the poverty rate data set in the social indicator category and the five air pollution data sets in the air quality category. The poverty rate data set contains the poverty rate of each region in the community district partition, as visualized in Figure 1(a). The air pollution data sets contain the average concentrations of pollutants (i.e., PM2.5, ozone, formaldehyde, benzene, elemental carbon) in each region of the UHF42 partition. To evaluate the performance in refining coarse-grained data, we used data that were aggregated into a coarser-grained partition, i.e., the borough partition, via spatial averaging; the borough partition has five regions, as illustrated in Figure 1(b). The experimental setting is as follows: 1) given the poverty rate data set with the borough partition, we refine the data into the community district partition, and 2) given each air pollution data set with the borough partition, we refine the data into the UHF42 partition. In the setting for the poverty rate data set, we used all data sets other than the target data as auxiliary data sets, so the number of auxiliary data sets was 43. In the setting for the air pollution data sets, we used all data sets not contained in the air quality category, so the number of auxiliary data sets was 36.
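As an illustration of how fine-grained values can be aggregated into a coarser partition, the sketch below computes an area-weighted mean per coarse region. It assumes that spatial averaging weights each fine-grained region by its area; the function name `aggregate` and the toy values are hypothetical.

```python
def aggregate(values, areas, parent):
    """Area-weighted average of fine-region values within each coarse region.

    values[i], areas[i] and parent[i] describe fine-grained region i, where
    parent[i] is the label of the coarse-grained region that contains it.
    """
    weighted_sum, total_area = {}, {}
    for value, area, label in zip(values, areas, parent):
        weighted_sum[label] = weighted_sum.get(label, 0.0) + area * value
        total_area[label] = total_area.get(label, 0.0) + area
    return {label: weighted_sum[label] / total_area[label] for label in weighted_sum}


# Hypothetical toy data: four districts nested in two boroughs.
rates = [0.30, 0.10, 0.20, 0.40]
areas = [1.0, 3.0, 2.0, 2.0]
parent = ["A", "A", "B", "B"]
coarse = aggregate(rates, areas, parent)
```

In this toy case borough A gets (0.30·1 + 0.10·3)/4 and borough B gets (0.20·2 + 0.40·2)/4, so the large low-rate district dominates borough A's average.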
Appendix C Baselines description
For GPR, we predict the fine-grained target data based only on the coarse-grained target data. For the LR-based method and 2-stage SD, given the coarse-grained target data and the predictive values of all auxiliary data sets, we predict the fine-grained target data. Details of these baselines are given below.
Gaussian process regression (GPR): We compared our proposed model with a simple spatial interpolation (i.e., GPR) of the coarse-grained target data. This baseline assumes that the target data are explained by spatial correlation alone. Given the coarse-grained target data and the set of centroids of the coarse-grained partition, we predicted the fine-grained target data by using the predictive distribution. Note that this baseline does not use the auxiliary spatial data sets.
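A centroid-based interpolation of this kind can be sketched in a few lines. The code below is a minimal illustration rather than the baseline implementation: the squared-exponential kernel, the fixed hyperparameters, the constant mean and the toy coordinates are all assumptions, and only the predictive mean is computed.

```python
import math


def rbf(p, q, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two points (assumed kernel choice)."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return variance * math.exp(-0.5 * d2 / length_scale ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def gpr_mean(centroids, y, queries, noise=1e-6):
    """Predictive mean of a GP with constant mean, fitted to coarse centroids."""
    mean_y = sum(y) / len(y)
    K = [[rbf(p, q) + (noise if i == j else 0.0)
          for j, q in enumerate(centroids)]
         for i, p in enumerate(centroids)]
    alpha = solve(K, [v - mean_y for v in y])
    return [mean_y + sum(rbf(x, c) * a for c, a in zip(centroids, alpha))
            for x in queries]


# Hypothetical coarse-grained centroids and values, and two query locations.
coarse_centroids = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
coarse_values = [0.1, 0.3, 0.2]
fine_pred = gpr_mean(coarse_centroids, coarse_values, [(0.0, 0.0), (1.0, 1.0)])
```

With a small noise term, the prediction at a coarse centroid reproduces the observed coarse value almost exactly, which shows that this baseline can only smooth between centroids and cannot recover variation inside a coarse region.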
Linear regression-based method (LR-based method): We used a linear regression-based method that has been applied in various studies (e.g., [Bogomolov et al.2014, Smith, Mashhadi, and Capra2014]). The linear regression model is used for estimating the relationships between the coarse-grained target data and the auxiliary data sets. The procedure in the training phase is as follows: 1) aggregate all auxiliary data sets into the coarse-grained partition of the target data via spatial averaging; 2) estimate the regression coefficients of the respective auxiliary data sets by using the coarse-grained target data and the aggregated auxiliary data sets. In the prediction phase, we generate the unknown values for the target fine-grained partition by applying the estimated relationships to the predictive values of the auxiliary data sets, i.e., as a linear combination of the predictive auxiliary values weighted by the estimated regression coefficients.
Two-stage statistical downscaling method (2-stage SD): We used the statistical downscaling method proposed in [Park2013]. This method assumes that coarse-grained target data can be decomposed into linear regression terms and residual terms. The downscaling procedure is divided into two stages. In the first stage, we obtain the regression coefficients in a manner similar to the training phase of the LR-based method. In the second stage, given the estimated coefficients, the fine-grained target data are estimated to be those that satisfy the following relation:
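The training and prediction phases of the LR-based method can be sketched as below. This is a simplified illustration under stated assumptions: a single auxiliary data set, an intercept term, equal-area fine-grained regions (so spatial averaging reduces to a plain mean), and hypothetical toy values; the helper names are not from the original work.

```python
def group_means(values, parent):
    """Plain mean of fine-region values per coarse region (equal areas assumed)."""
    sums, counts = {}, {}
    for value, label in zip(values, parent):
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return [sums[label] / counts[label] for label in sorted(sums)]


def fit_simple_ols(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: one auxiliary variable on six fine regions in three coarse regions.
aux_fine = [0.05, 0.07, 0.10, 0.14, 0.09, 0.11]
parent = [0, 0, 1, 1, 2, 2]
y_coarse = [0.13, 0.25, 0.21]

# Training phase: 1) aggregate the auxiliary data, 2) estimate the coefficients.
aux_coarse = group_means(aux_fine, parent)
slope, intercept = fit_simple_ols(aux_coarse, y_coarse)

# Prediction phase: apply the coarse-level relationship at the fine level.
y_fine = [slope * x + intercept for x in aux_fine]
```

The sketch makes the method's key assumption visible: the relationship estimated between coarse-grained quantities is taken to hold unchanged at the fine-grained level.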
(16) 
This relation expresses the spatial aggregation constraint, i.e., the assumption that the value associated with a coarse-grained region is the linear average of the constituent values in the fine-grained partition. The relation involves the residuals in both the coarse-grained and the fine-grained partitions. To obtain the fine-grained target data, the residual values in the fine-grained partition must be determined. Since the linear regression terms have already been fixed in the first stage, the coarse-grained residuals are obtained from (16); the residuals in the fine-grained partition are then predicted by applying a spatial interpolation method, i.e., area-to-point simple kriging [Kyriakidis2004], to the residuals in the coarse-grained partition.
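The second stage can be illustrated with the following sketch. It is not the method of [Park2013]: in place of area-to-point simple kriging, each fine-grained region simply inherits the residual of its parent coarse-grained region, which is the simplest interpolation that still satisfies the aggregation constraint. Equal-area regions, a single auxiliary variable, and the coefficient values are further assumptions made for illustration.

```python
def two_stage_downscale(y_coarse, x_fine, parent, slope, intercept):
    """Regression term plus residual correction for each fine-grained region.

    The coarse residual is the gap between the observed coarse value and the
    regression term evaluated on the averaged auxiliary data. It is passed
    unchanged to every constituent fine region (a stand-in for kriging).
    """
    groups = {}
    for i, label in enumerate(parent):
        groups.setdefault(label, []).append(i)

    y_fine = [0.0] * len(x_fine)
    for label, members in groups.items():
        x_bar = sum(x_fine[i] for i in members) / len(members)
        residual = y_coarse[label] - (slope * x_bar + intercept)
        for i in members:
            y_fine[i] = slope * x_fine[i] + intercept + residual
    return y_fine


# Hypothetical inputs: coefficients as if estimated in the first stage.
x_fine = [0.05, 0.07, 0.10, 0.14]
parent = [0, 0, 1, 1]
y_coarse = [0.12, 0.20]
y_fine = two_stage_downscale(y_coarse, x_fine, parent, slope=1.5, intercept=0.02)

# Averaging the fine estimates within each coarse region recovers the coarse data.
recovered = [(y_fine[0] + y_fine[1]) / 2, (y_fine[2] + y_fine[3]) / 2]
```

Because the regression term is linear and the residual is constant within a coarse region, the fine-grained estimates average back to the observed coarse-grained values; a kriging-based interpolation would instead let the residual vary smoothly in space while keeping the same constraint.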
Table 4: Details of the data sets.
Category | #data sets | Partition | #regions | Time range | Description
Education | 3 | School district | 32 | 2010 | Class size, ratio of #pupils to #teachers, SAT score
Air quality | 8 | UHF42 | 42 | 2009–2010 | Average concentration of pollutants
Social indicator | 13 | Community district | 59 | 2009–2013 | Poverty rate, population, mean commute time, etc.
Land use | 11 | Community district | 59 | 2009–2013 | Area percentage for commercial office, parking, etc.
Crime | 1 | Police precinct | 77 | 2010–2016 | Number of crimes
Incident | 2 | Zip code | 186 | 2010–2016 | #311 calls, #fire incidents
Telecommunication | 2 | Zip code | 186 | 2016 | #public telephones, #free WiFi hotspots
Consumption | 2 | Zip code | 186 | 2010–2014 | Greenhouse gas (GHG) emission, natural gas consumption
Taxi traffic | 2 | Taxi zone | 249 | 2014–2016 | #taxi pickup and dropoff events