Spatially Aggregated Gaussian Processes with Multivariate Areal Outputs

07/19/2019 ∙ by Yusuke Tanaka, et al. ∙ Kyoto University 0

We propose a probabilistic model for inferring the multivariate function from multiple areal data sets with various granularities. Here, the areal data are observed not at location points but at regions. Existing regression-based models require the fine-grained auxiliary data sets on the same domain. With the proposed model, the functions for respective areal data sets are assumed to be a multivariate dependent Gaussian process (GP) that is modeled as a linear mixing of independent latent GPs. Sharing of latent GPs across multiple areal data sets allows us to effectively estimate spatial correlation for each areal data set; moreover it can easily be extended to transfer learning across multiple domains. To handle the multivariate areal data, we design its observation model with a spatial aggregation process for each areal data set, which is an integral of the mixed GP over the corresponding region. By deriving the posterior GP, we can predict the data value at any location point by considering the spatial correlations and the dependences between areal data sets simultaneously. Our experiments on real-world data sets demonstrate that our model can 1) accurately refine the coarse-grained areal data, and 2) offer performance improvements by using the areal data sets from multiple domains.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Governments and other organizations are now collecting data from cities on items such as poverty rate, air pollution, crime, energy consumption, and traffic flow. These data play a crucial role in improving the life quality of citizens in many aspects including socio-economics rupasinghaa:social ; Smith:poverty , public security bogomolov:once ; wang:crime , public health jerrett:spatial , and urban planning yuan:discovering . For instance, the spatial distribution of poverty is helpful in identifying key regions that require intervention in a city; it makes it easier to optimize resource allocation for remedial action.

Figure 1: Areal data

In practice, these data collected from cities are often spatially aggregated, e.g., averaged over a region; thus only areal data are available; observations are not associated with location points but with regions. Figure 1 shows an example of areal data, which is the distribution of poverty in New York City, where darker hues represent regions with higher rates. The poverty rate data are actually obtained via household surveys taken over the whole city. The survey results are aggregated over the predefined regions Smith:poverty . The problem addressed herein is to infer the function from the areal data. Solving this problem allows us to obtain spatially-specific information about cities; it is useful for finding key pin-point regions efficiently.

One promising approach to address this problem is to utilize a wide variety of data sets from the same city. Existing regression-based models learn relationships between target data and auxiliary data sets Law:variational ; murakami:new ; park:spatial ; Smith:poverty ; tanaka:refining . These models assume that we have the auxiliary data at fine spatial granularity (e.g., 1 km 1 km grid cells); however many areal data sets are actually associated with larger regions (e.g., zip code and police precinct). These models cannot then make full use of the coarse-grained auxiliary data sets. Another important drawback of all the prior works is that their performance in refining the areal data is suspect if we have only a few data sets available for the city.

In this paper, we propose a probabilistic model, called Spatially Aggregated Gaussian Processes (SAGP) herein after, that can infer the multivariate function from multiple areal data sets with various granularities. In SAGP, the functions for respective areal data sets are assumed to be a multivariate dependent Gaussian process (GP) that is modeled as a linear mixing of independent latent GPs. The latent GPs are shared among all areal data sets in the target city, which is expected to effectively learn the spatial correlation for each data set even if the number of observations in a data set is small; that is, its data set is associated with the coarse-grained regions. Since the areal data are identified by regions, not location points, we introduce an observation model with the spatial aggregation process, in which areal observations are assumed to be calculated by integrating the mixed GP over the corresponding region; then the covariance between regions is given by the double integral over the corresponding regions. It allows us to accurately evaluate the covariance between regions considering their shapes. Thus it is more helpful in such a situation that there be non-regular shaped regions (e.g., extremely elongated) in the input data.

The mechanism adopted in SAGP for sharing latent processes is also advantageous in that it makes it straightforward to utilize data sets from multiple cities. This allows our model to learn the spatial correlation for each data set by sharing the latent GPs among all areal data sets from multiple cities; SAGP remains applicable even if we have only a few data sets available for a single city.

The inference of SAGP is based on a Bayesian inference procedure. The model parameters can be estimated by maximizing the marginal likelihood, in which all the GPs are analytically integrated out. By deriving the posterior GP, we can predict the data value at any location point considering the spatial correlations and the dependences between areal data sets simultaneously.

2 Related Work

Related works can be roughly categorized into two approaches: 1) regression-based model and 2) multivariate model. The major difference between them is as follows: Denoting and as target data and auxiliary data, respectively, the aim of the first approach is to design a conditional distribution

; the second approach designs a joint distribution

instead.

Regression-based models. A related problem has been addressed in the spatial statistics community under the name of statistical downscaling, spatial disaggregation,

areal interpolation

, or fine-scale modeling gotway:combining , and this has attracted great interest in many disciplines such as socio-economics bogomolov:once ; Smith:poverty , agricultural economics howitt:spatial ; xavier:disaggregating , epidemiology sturrock:fine , meteorology wilby:guidelines ; wotling:regionalization , and geographical information system (GIS) goovaerts:combining . Regression-based models have been developed for refining coarse-grained target data via the use of multiple auxiliary data sets that have fine granularity (e.g., 1 km 1 km grid cells) murakami:new ; park:spatial . These models learn the regression coefficients for the auxiliary data sets under the spatial aggregation constraints that encourage consistency between fine- and coarse-grained target data. There have been a number of advanced models that allow a fully Bayesian inference keil:downscaling ; taylor:continuous ; wilson:pointless or a variational inference Law:variational for model parameters. The task addressed in these works is to refine the coarse-grained target data on the assumption that the fine-grained auxiliary data are available; however the areal data available on a city are actually associated with various geographical partitions (e.g., police precinct), then one might not be able to obtain the fine-grained auxiliary data. In that case, these models cannot make full use of the auxiliary data sets with various granularities, which contain the coarse-grained auxiliary data.

A Gaussian process-based model was recently proposed for refining coarse-grained areal data by utilizing auxiliary data sets with various granularities tanaka:refining . In this model, a Gaussian process regression is first applied to each auxiliary data set for deriving a predictive distribution defined on the continuous space; this conceptually corresponds to spatial interpolation. By hierarchically incorporating the predictive distributions into the model, the regression coefficients can be learned on the basis of not only the strength of relationship with the target data but also the level of spatial granurality. A major disadvantage of this model is that the spatial interpolation is separately conducted for each auxiliary data set, which makes it difficult to accurately interpolate the coarse-grained auxiliary data due to the data sparsity issue; this model cannot fully use coarse-grained data.

In addition, these regression-based models (e.g., Law:variational ; park:spatial ; tanaka:refining ) do not consider the spatial aggregation constraints for the auxiliary data sets; meanwhile it is a critical factor in estimating the multivariate function from multiple areal data sets, which is the problem focused in this paper. Different from the regression-based models, we design a joint distribution that incorporates the spatial aggregation process for all areal data sets (i.e., for both target and auxiliary data sets). The proposed model infers the multivariate function while considering the spatial aggregation constraints for respective areal data sets. This allows us to effectively utilize all areal data sets with various granularities for the data refinement even if some auxiliary data sets have coarse granularity.

Multivariate models. The proposed model builds closely upon recent studies in multivariate spatial modeling, which model the joint distribution of multiple outputs. Geostatistics studies widely use the classical method of co-kriging for predicting multivariate spatial data myers:co-kriging ; this method is, however, problematic in that it is unclear how to define cross-covariance functions that determine the dependences between data sets gibbs:efficient

. In the machine learning community, there has been growing interest in multivariate Gaussian processes 

carl:gaussian , in which dependences between data sets are introduced via methodologies such as process convolution boyle:dependent ; higdon:space , latent factor modeling luttinen:variational ; teh:semiparametric , and multi-task learning bonilla:multi-task ; micchelli:kernels . Since these works assume that the data samples are observed at location points, they cannot be straightforwardly used for modeling the areal data we focus on. The proposed model is based on the semiparametric latent factor model (SLFM) teh:semiparametric . In SLFM, the dependent Gaussian process with multivariate outputs is obtained by a linear mixing of independent latent Gaussian processes, each of which is shared among all areal data sets. To handle the multivariate areal data, we newly introduce to SLFM the observation model with the spatial aggregation process for all areal data sets; this is represented by the integral of the mixed Gaussian process over the corresponding region. This enables us to infer the multivariate function from the observed areal data sets. Furthermore, the sharing of key information (i.e., covariance function) can be used for transfer learning across a wide variety of areal data sets; this allows our model to robustly estimate the spatial correlations for respective data sets and to support areal data sets from multiple cities.

3 Proposed Model

We propose SAGP (Spatially Aggregated Gaussian Processes), a probabilistic model for inferring the multivariate function from areal data sets with various granularities. We first consider a formulation in the case of a single domain, then we mention an extension to the case of multiple domains.

Areal data. We start by describing the areal data this study focuses on. For simplicity, let us consider the case of a single domain. Assume that we have a wide variety of areal data sets from the same domain and each data set is associated with one of the geographical partitions that have various granularities. Let be the number of kinds of areal data sets. Let denote a domain (e.g., a city), and denote an input variable, represented by its coordinates (e.g., latitude and longitude). For , partition of is a collection of disjoint subsets, called regions, of . Let be the number of regions in . For , let denote the -th region in . Each areal observation is represented by the pair , where is a value associated with the -th region . Suppose that we have the set of areal data sets .

Formulation for the case of a single domain. In the proposed model, the functions for respective areal data sets on the continuous space are assumed to be the dependent Gaussian process (GP) with multivariate outputs. We first construct the multivariate dependent GP by linearly mixing some independent latent GPs. Consider independent GPs,

(1)

where and are a mean function and a covariance function, respectively, for the -th latent GP , both of which are assumed integrable. Defining as the -th GP, the -dimensional dependent GP is assumed to be modeled as a linear mixing of the independent latent GPs, then is given by

(2)

where , is an weight matrix whose -entry is the weight of the -th latent GP in the -th data set, and is an -dimensional zero-mean Gaussian noise process. Here,

is a column vector of 0’s and

with being a covariance function for the -th Gaussian noise process. By integrating out , the multivariate GP is given by

(3)

where the mean function is given by . The covariance function is given by . Here, and . The derivation of (3) is described in Appendix A. The -entry of is given by

(4)

where represents Kronecker’s delta; if and otherwise. The covariance function (4) for the multivariate GP is represented by the linear combination of the covariance functions for the latent GPs. The covariance functions for latent GPs are shared among all areal data sets, which allows us to effectively learn the spatial correlation for each data set by considering the dependences between data sets; this is advantageous in the case where the number of observations is small, that is, the spatial granularity of the areal data is coarse. In this paper we focus on the case , with the aim to reduce the number of free parameters as it helps to avoid overfitting teh:semiparametric .

(a) The case of a single domain.
(b) The case of two domains.
Figure 2: Graphical model representation of SAGP.

The areal data are not associated with location points but with regions, and their observations are obtained by spatially aggregating the original data. To handle the multivariate areal data, we design the observation model with a spatial aggregation process for each of the areal data sets. Let be a -dimensional vector consisting of the areal observations for the -th areal data set. Let denote an -dimensional vector consisting of the observations for all areal data sets, where is the total number of areal observations. Each areal observation is assumed to be obtained by integrating the mixed GP over the corresponding region; then

is generated from the following Gaussian distribution

111We here assume that the integral appearing in (5) is well-defined and well-behaved.,

(5)

where is represented by

(6)

in which , whose entry is a nonnegative weight function for spatial aggregation over region . This formulation does not depend on the particular choice of , provided that they are integrable. If one takes, for region ,

(7)

where is the indicator function; if is true and otherwise, then is the average of over . We may also consider other aggregation processes to suit the property of the areal observations, including a simple summation and a population-weighted averaging over . in (5) is an block diagonal matrix, where

is a noise variance for the

-th GP, and

is the identity matrix. Figure 

2(a) shows a graphical model representation of SAGP, where shaded and unshaded nodes indicate observed and latent variables, respectively.

Extension to the case of multiple domains. It is possible to apply SAGP to areal data sets from multiple domains. The graphical model representation of SAGP shown in Figure 2(b) is for the case of two domains. The superscript in Figure 2(b) is the domain index, and is the union of the input spaces for both domains. The key point is that the latent GPs are shared among the domains, which enables us to robustly estimate the spatial correlations of respective data sets by utilizing the areal data sets from both domains. SAGP can be extended to the case of more domains in a similar fashion.

4 Inference

Given the areal data sets, we aim to derive the posterior GP on the basis of a Bayesian inference procedure. The posterior GP can be used for predicting data values at any location point in the continuous space. The model parameters, , , , , , are estimated by maximizing the marginal likelihood, in which multivariate GP is analytically integrated out; we then construct the posterior GP by using the estimated parameters.

Marginal likelihood. Consider the case of a single domain. Given the areal data , the marginal likelihood is given by

(8)

where we analytically integrate out the GP prior . Here, is an -dimensional mean vector represented by

(9)

which is the integral of the mean function over the respective regions for all areal data sets. is an covariance matrix represented by

(10)

It is an block matrix whose -th block is a matrix represented by

(11)

Equation (11) is the double integral of the covariance function over the respective pairs of regions in ; this conceptually corresponds to aggregation of the covariance function values that are calculated at the infinite pairs of location points in the corresponding regions. Since the integrals over regions cannot be calculated analytically, in practice we use a numerical approximation of these integrals. Details are provided at the end of this section. This formulation allows for accurately evaluating the covariance between regions considering their shapes; it is more helpful if input data is obtained for irregular shaped regions (e.g., extremely elongated)222The related concept has been used in the methods (e.g., block kriging burgess:optimal ) of geostatistics; these methods, however, suffer from the difficulty in determining the dependences between data sets in multivariate settings.. By maximizing the logarithm of the marginal likelihood (8), we can estimate the parameters of SAGP.

Transfer learning across multiple domains. Consider the case of domains. Let denote the collection of data sets for the domains. In SAGP, the observations for different domains are assumed to be conditionally independent given the shared latent GPs ; the marginal likelihood for domains is thus given by the product of those for the domains:

(12)

where and are the mean vector and the covariance matrix for the -th domain, respectively. Estimation of model parameters based on (12) allows for transfer learning across the areal data sets from multiple domains via the shared covariance functions.

Posterior GP. We have only to consider the case of a single domain, because the derivation of the posterior GP can be conducted independently for each domain. Given the areal data and the estimated parameters, the posterior GP is given by

(13)

where and are the mean function and the covariance function for , respectively. Defining as

(14)

which consists of the covariances between any location point and the respective regions in all areal data sets, the mean function and the covariance function are given by

(15)
(16)

respectively. We can predict the data value at any location point by using the mean function (15). The second term in (15) shows that the predictions are calculated by considering the spatial correlations and the dependences between areal data sets simultaneously. By using the covariance function (16), we can also evaluate the prediction uncertainty. Derivation of the posterior GP is detailed in Appendix B.

Approximation of the integral over regions. The integrals over regions in (9), (11), and (14) cannot be performed analytically; thus we approximate these integrals by using sufficiently fine-grained square grid cells. We divide a total region into the square grid cells, and let be a set of grid points that are contained in region . Let us consider the approximation of the integral in the covariance matrix (11). The -entry of is approximated as follows:

(17)
(18)

where we use the formulation of the region-average-observation model (7). The integrals in (9) and (14) can be approximated in a similar way. Letting denote the number of all the grid points, the computational complexity of  (11) is ; meanwhile, assuming the constant weight (e.g., region average), its computational complexity can be reduced to , where is the cardinality of the set of distinct distance values between grid points. Here, we use the property that in (18) depends only on the distance between and . This is useful for saving the computation time and the memory requirement.

5 Experiments

Data. We evaluated SAGP using 10 and 3 real-world areal data sets from two cities: New York City and Chicago, respectively. They were obtained from NYC Open Data NYCopen and Chicago Data Portal CHIdata . We used a variety of areal data sets including poverty rate, air pollution rate, and crime rate. Each data set is associated with one of the predefined geographical partitions with various granuralities: UHF42 (42), community district (59), police precinct (77), and zip code (186) in New York City; police precinct (25) and community district (77) in Chicago, where each number in parentheses denotes the number of regions in the corresponding partition. Details about the real-world data sets are provided in Appendix C.

Refinement task. We considered the task of refining coarse-grained areal data by using multiple areal data sets with various granularities. To evaluate the performance in predicting the fine-grained areal data, we first picked up one target data set and used its coarser version for learning model parameters; then we predicted the original fine-grained target data by using the learned model. Note that the fine-grained target data was used only for evaluating the refinement performance; we did not use them in the inference process. The target data sets were poverty rate (5, 59), PM2.5 (5, 42), crime (5, 77) in New York City and poverty rate (9, 77) in Chicago, where each pair of numbers in parentheses denotes the numbers of regions in the coarse- and the fine-grained partitions, respectively. Defining

as the index of the target data set, the evaluation metric is the mean absolute percentage error (MAPE) of the fine-grained target values,

, where is the true value associated with the -th region in the target fine-grained partition; is its predicted value, obtained by integrating the -th function of the posterior GP  (13) over the corresponding target fine-grained region.

New York City Chicago
Poverty rate PM2.5 Crime Poverty rate
GPR 0.344 0.046 (–) 0.072 0.010 (–) 0.860 0.102 (–) 0.599 0.099 (–)
2-stage GP 0.210 0.022 (–) 0.042 0.005 (–) 0.454 0.075 (–) 0.380 0.060 (–)
SLFM 0.207 0.025 (4) 0.036 0.005 (6) 0.401 0.053 (2) 0.335 0.052 (2)
SAGP 0.177 0.019 (3) 0.030 0.005 (5) 0.379 0.055 (3) 0.278 0.032 (2)
Table 1:

MAPE and standard errors for the prediction of fine-grained areal data (a single city).

(a) SAGP (b) SLFM
(c) Visualization of the estimated parameters and .
Figure 3: (a,b) Refined poverty rate data in NYC, and (c) Visualization of the estimated parameters when predicting the poverty rate data in NYC. The radii of green and blue circles equal to the values of estimated by SAGP and SLFM, respectively. The edge widths are proportional to the absolute weights estimated by the respective models. Here, we omitted those edges whose absolute weights were lower than a threshold.

Setup of the proposed model. In our experiments, we used as the latent GPs zero-mean Gaussian processes, i.e., for . We used the following squared-exponential kernel as the covariance function for the latent GPs, , where is a signal variance that controls the magnitude of the covariance, is a scale parameter that determines the degrees of spatial correlation, and is the Euclidean norm. Here, we set because the variance can already be represented by scaling the columns of . For simplicity, the covariance function for the Gaussian noise process is set to , where is Dirac’s delta function. The model parameters, , , , , were learned by maximizing the logarithm of the marginal likelihood (8) or (12) using the L-BFGS method liu:on implemented in SciPy (https://www.scipy.org/). For approximating the integral over regions (see (18)), we divided a total region of each city into sufficiently fine-grained square grid cells, the size of which was 300 m 300 m for both cities; the resulting sets of grid points for New York City and Chicago consisted of 9,352 and 7,400 grid points, respectively. The number of the latent GPs was chosen from via leave-one-out cross-validation bishop:pattern ; the validation error was obtained using each held-out coarse-grained data value. Here, the validation was conducted on the basis of the coarse-grained target areal data; namely we did not use the fine-grained target data in the validation process.

Baselines. We compared the proposed model, SAGP, with naive Gaussian process regression (GPR) carl:gaussian , two-stage GP-based model (2-stage GP) tanaka:refining , and semiparametric latent factor model (SLFM) teh:semiparametric . GPR predicts the fine-grained target data simply by using only the coarse-grained target data. 2-stage GP is one of the latest regression-based models. GPR and SLFM assume that data samples are observed at location points. We thus associate each areal observation with the centroid of the region. This simplification is also used for modeling the auxiliary data sets in tanaka:refining .

Results for the case of a single city. Table 1 shows MAPE and standard errors for GPR, 2-stage GP, SLFM, and SAGP, where the numbers in parentheses denote the number

of the latent GPs for SLFM and SAGP, as determined by the validation procedure described above. For all data sets, SAGP achieved better performance in refining coarse-grained areal data; the differences between SAGP and the baselines were statistically significant (Student’s t-test). In Table 

1, the single star () and the double star () indicate significant difference at the levels of values of and , respectively. These results show that SAGP can utilize the areal data sets with various granuralities from the same city to accurately predict the refined data.

Figures 3(a) and 3(b) show the refinement results of SAGP and SLFM for the poverty rate data in New York City. Here, the predictive values of each model were normalized to the range , and darker hues represent regions with higher values. Compared with the true data in Figure 1, SAGP yielded more accurate fine-grained data than SLFM. Figure 3(c) visualizes the mixing weights and the scale parameters estimated by SAGP and SLFM when predicting the fine-grained poverty rate data in New York City, where we picked up 4 areal data sets: Poverty rate, PM2.5, unemployment rate, and the number of 311 calls; their observations were also shown. One observes that the scale parameters estimated by SAGP are relatively small compared with those estimated by SLFM, presumably because the spatial aggregation process incorporated in SAGP effectively separates intrinsic spatial correlations and apparent smoothing effects due to spatial aggregation to yield areal observations. A comparison of the estimated weights in Figure 3(c) shows that SAGP emphasized the useful dependences between data sets, e.g., the strong correlation between the poverty rate data and the unemployment rate data.

Results for the case of two cities. SLFM and SAGP can be used for transfer learning across multiple cities, which is more advantageous in such a situation that we have only a few data sets available on a single city. We here show the results of refining the poverty rate data in Chicago with simultaneously utilizing the data sets from New York City. Table 4 shows MAPE and standard errors for SLFM (trans) and SAGP (trans). Comparing Tables 1 and 4, one observes that SAGP (trans) attained improved refinement performance compared with SLFM (trans) and models trained with only the data in a single city (i.e., Chicago). the differences between SAGP (trans) and the other models were statistically significant (Student’s t-test, value of ). This result shows that SAGP (trans) transferred knowledge across the cities, and yielded better refinement results even if there are only a few data sets available on the target city. Figure 4 shows the refinement results for the poverty rate data in Chicago. We illustrate the true data on the left in Figure 4, and the predictions attained by SAGP (trans) and SLFM (trans) on the right. As shown, SAGP (trans) better identified the key regions compared with SLFM (trans).

Chicago
Poverty rate
SLFM (trans) 0.328 0.050 (6)
SAGP (trans) 0.219 0.023 (4)
(a) True (b) SAGP (trans) (c) SLFM (trans)
Figure 4: Refined poverty rate data in Chicago.
Table 2: MAPE and standard errors for the prediction of the fine-grained data (two cities).

6 Conclusion

This paper has proposed the Spatially Aggregated Gaussian Processes for inferring the multivariate function from multiple areal data sets with various granularities. To handle the multivariate areal data, we design its observation model with the spatial aggregation process for each areal data set, which is the integral of the Gaussian process over the corresponding region. We have confirmed that our model can accurately refine the coarse-grained areal data, and improve the refinement performance by using the areal data sets from multiple cities.

References

  • (1) Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
  • (2) A. Bogomolov, B. Lepri, J. Staiano, N. Oliver, F. Pianesi, and A. Pentland. Once upon a crime: Towards crime prediction from demographics and mobile data. In ICMI, pages 427–434, 2014.
  • (3) E. Bonilla, K. M. Chai, and C. Williams. Multi-task Gaussian process prediction. In NeurIPS, pages 153–160, 2008.
  • (4) P. Boyle and M. Frean. Dependent Gaussian processes. In NeurIPS, pages 217–224, 2005.
  • (5) T. M. Burgess and R. Webster. Optimal interpolation and isarithmic mapping of soil properties. Journal of Soil Science, 31(2), 1980.
  • (6) City of Chicago. Chicago Data Portal. https://data.cityofchicago.org/.
  • (7) City of New York. NYC open data. https://opendata.cityofnewyork.us/.
  • (8) M. Gibbs and D.J.C. MacKay. Efficient implementation of Gaussian processes. Technical Report, 1997.
  • (9) P. Goovaerts. Combining areal and point data in geostatistical interpolation: Applications to soil science and medical geography. Mathematical Geosciences, 42(5):535–554, 2010.
  • (10) C. A. Gotway and L. J. Young. Combining incompatible spatial data. Journal of the American Statistical Association, 97(458):632–648, 2002.
  • (11) D. Higdon. Space and space-time modelling using process convolutions. Quantitative Methods for Current Environmental Issues, pages 37–56, 2002.
  • (12) R. Howitt and A. Reynaud. Spatial disaggregation of agricultural production data using maximum entropy. European Review of Agricultural Economics, 30(2):359–387, 2003.
  • (13) M. Jerrett, R. T. Burnett, B. S. Beckerman, M. C. Turner, D. Krewski, G. Thurston, and et al. Spatial analysis of air pollution and mortality in California. American Journal of Respiratory and Critical Care Medicine, 188(5):593–599, 2013.
  • (14) P. Keil, J. Belmaker, A. M. Wilson, P. Unitt, and W. Jetz. Downscaling of species distribution models: a hierarchical approach. Methods in Ecology and Evolution, 4(1):82–94, 2013.
  • (15) H. C. L. Law, D. Sejdinovic, E. Cameron, T. C. D. Lucas, S. Flaxman, K. Battle, and K. Fukumizu. Variational learning on aggregate outputs with Gaussian processes. In NeurIPS, pages 6084–6094, 2018.
  • (16) D. C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1–3):503–528, 1989.
  • (17) J. Luttinen and A. Ilin. Variational Gaussian-process factor analysis for modeling spatio-temporal data. In NeurIPS, pages 1177–1185, 2009.
  • (18) C. A. Micchelli and M. Pontil. Kernels for multi-task learning. In NeurIPS, pages 921–928, 2004.
  • (19) D. Murakami and M. Tsutsumi. A new areal interpolation technique based on spatial econometrics. Procedia-Social and Behavioral Sciences, 21:230–239, 2011.
  • (20) D. E. Myers. Co-kriging – new developments. In G. Verly, M. David, A. G. Journel, and A. Marechal, editors, Geostatistics for Natural Resources Characterization: Part 1, volume 122 of NATO ASI Series C: Mathematical and Physical Sciences, pages 295–305. D. Reidel Publishing, Dordrecht, 1984.
  • (21) N. -W. Park. Spatial downscaling of TRMM precipitation using geostatistics and fine scale environmental variables. Advances in Meteorology, 2013:1–9, 2013.
  • (22) C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
  • (23) A. Rupasinghaa and S. J. Goetz. Social and political forces as determinants of poverty: A spatial analysis. The Journal of Socio-Economics, 36(4):650–671, 2007.
  • (24) C. -C. Smith, A. Mashhadi, and L. Capra. Poverty on the cheap: Estimating poverty maps using aggregated mobile communication networks. In CHI, pages 511–520, 2014.
  • (25) H. J. W. Sturrock, J. M. Cohen, P. Keil, A. J. Tatem, A. L. Menach, N. E. Ntshalintshali, M. S. Hsiang, and Roland D Gosling. Fine-scale malaria risk mapping from routine aggregated case data. Malaria Journal, 13:421, 2014.
  • (26) Y. Tanaka, T. Iwata, T. Tanaka, T. Kurashima, M. Okawa, and H. Toda. Refining coarse-grained spatial data using auxiliary spatial data sets with various granularities. In AAAI, 2019 to appear.
  • (27) B. M. Taylor, R. Andrade-Pacheco, and H. J. W. Sturrock. Continuous inference for aggregated point process data. Journal of the Royal Statistical Society: Series A (Statistics in Society), page 12347, 2018.
  • (28) Y. W. Teh, M. Seeger, and M. I. Jordan. Semiparametric latent factor models. In AISTATS, pages 333–340, 2005.
  • (29) H. Wang, D. Kifer, C. Graif, and Z. Li. Crime rate inference with big data. In KDD, pages 635–644, 2016.
  • (30) R. L. Wilby, S. P. Zorita, E. Timbal, B. Whetton, and L. O. Mearns. Guidelines for Use of Climate Scenarios Developed from Statistical Downscaling Methods, 2004.
  • (31) K. Wilson and J. Wakefield. Pointless spatial modeling. Biostatistics, 2018.
  • (32) G. Wotling, C. Bouvier, J. Danloux, and J. -M. Fritsch. Regionalization of extreme precipitation distribution using the principal components of the topographical environment. Journal of Hydrology, 233(1-4):86–101, 2000.
  • (33) A. Xavier, M. B. C. Freitas, M. D. S. Rosrio, and R. Fragoso. Disaggregating statistical data at the field level: An entropy approach. Spatial Statistics, 23:91–103, 2016.
  • (34) J. Yuan, Y. Zheng, and X. Xie. Discovering regions of different functions in a city using human mobility and POIs. In KDD, pages 186–194, 2012.

Appendix A Derivation of the multivariate GP

In this appendix, we show that the process defined via (2) is itself a multivariate GP with mean function and covariance function . To prove that is indeed a multivariate GP, one has only to show that, for an arbitrary and an arbitrary set of points ,

is a multivariate Gaussian random variable. By the definition (2) of

, one has

(19)

where we let and , and where denotes the Kronecker product. By the definition of Gaussian processes, since and are Gaussian processes, and are multivariate Gaussian random variables. Since (19) shows that is a linear combination of the multivariate Gaussian random variables and , it is itself multivariate Gaussian, irrespective of the choice of . This in turn shows that is again a multivariate Gaussian process.

Mean of is given by

(20)

Covariance of and is given by

(21)

These show that the mean function and the covariance function of the multivariate Gaussian process are given by and , respectively.

Appendix B Derivation of the posterior GP

In this appendix, we derive the posterior Gaussian process shown in Section 4. It should be noted that we here assume, without concrete justification, that the relevant integrals are all well-defined and well-behaved. Let be a multivariate GP defined on taking values in . For an arbitrary and an arbitrary set of points , let

(22)

By the definition of GP, is a -dimensional Gaussian vector. Let

(23)

and

(24)

be the mean vector and the covariance matrix of . In the following, we specifically assume that are taken to be grid points of a regular grid covering and with the grid cell volume , and consider Riemann sums to approximate those integrals on appearing in the formulation of SAGP. We then take the limit to derive the posterior GP given areal observations on .

Consider the observation process yielding observations , defined by

(25)

where

(26)

and where is an -dimensional Gaussian noise vector with mean zero and covariance . One has

(27)

and

(28)

respectively. The posterior of given is known to be a multivariate Gaussian with mean

(29)

and covariance

(30)

respectively, where .

By regarding sums over the terms as Riemann sums approximating the corresponding integrals over , in the limit , one can replace those sums over terms with the corresponding integrals over . Specifically, one has

(31)
(32)
(33)

showing that the mean vector and the covariance matrix of are reduced in this limit to the vector and the matrix defined in (9) and (10), respectively. One also has

(34)

where is defined in (14). The above calculation shows that in the limit the posterior process is a multivariate GP with mean function and covariance function given by (15) and (16), respectively.

Appendix C Description of real-world areal data sets

We used the real-world areal data sets from NYC Open Data [7