1 Introduction
The empirical analysis of geospatial patterns has a long tradition, with applications ranging from estimating rainfall patterns [AzimiZonooz et al., 1989] to predicting housing prices [Basu and Thibodeau, 1998]. Recently, machine learning methods have become increasingly popular for these tasks. Traditional techniques for modelling spatial dependencies include clustering [Huang et al., 2013] and kernel methods such as Gaussian Processes (GPs) [Datta et al., 2016]. Recent years have seen efforts to scale GP models to high-dimensional data [Gardner et al., 2018] and the emergence of convolutional neural networks (CNNs) for learning spatial representations [Shi et al., 2015]. But while deep learning methods like CNNs improve upon GP models by enabling learning on non-Euclidean, graph-structured data [Henaff et al., 2015], they appear to struggle with long-range spatial dependencies [Linsley et al., 2018]. A recent review by Reichstein et al. [Reichstein et al., 2019] highlights further problems of deep learning applications with spatial data, setting a research agenda that aims to improve the representation of spatial structures, particularly in deep learning methods.

Furthering this agenda, we explore how generative adversarial nets (GANs) [Goodfellow et al., 2014] can capture spatially dependent data and how we can leverage them to learn observed spatial patterns. As they perform well on visual data, GANs have been used in the geospatial context for generating satellite imagery [Lin et al., 2017]. However, geospatial point patterns, i.e. data points distributed across continuous or discrete space with one or more feature dimensions, remain unexplored in that regard. While previous studies have examined GAN performance in the presence of one-dimensional autocorrelation, such as temporal point processes [Xiao et al., 2017] or financial time series [Koshiyama et al., 2019], the multidimensional correlation structures in geospatial point patterns pose a more complex challenge. We tackle this issue by introducing SpaceGAN: borrowing well-established techniques from geographic information science, we use spatial neighbourhoods as context to train a conditional GAN (cGAN) and optimize cGAN selection for the best representation of the input's local spatial autocorrelation structures. GANs are difficult to train, often failing to converge to a stable solution. Our novel stopping criterion explicitly measures the quality of the representation of observed spatial patterns. Furthermore, this approach enables us to work with data distributed in discrete and continuous space.
Representations learned by SpaceGAN can be used for downstream tasks, even at out-of-sample geospatial locations. We show how this can be used for prediction via an ensemble learning framework. We test our approach on synthetic and real-world geospatial prediction tasks and evaluate the results using spatial cross-validation.
The main contributions of this study are as follows: First, we introduce a novel cGAN approach for geospatial data domains, focusing on capturing spatial dependencies. Second, we introduce a novel ensemble learning method tailored to spatial prediction tasks that uses SpaceGAN samples as training data for a set of base learners. Across different experimental settings, we show that SpaceGAN-generated samples can substantially improve the performance of predictive models. As such, the results also have practical implications: our proposed framework can be used to inflate low-dimensional spatial data. This allows for enhanced model training and reduced bias by compensating for a lack of training data. We thus improve generalization performance, even when compared to existing methods for data augmentation. The remainder of this paper is structured as follows: Section 2 introduces the SpaceGAN framework and elaborates on the technical details with respect to the cGAN architecture and spatial autocorrelation representation. In Section 3, we evaluate SpaceGAN empirically using synthetic and real-world data, comparing it to existing methods for spatial data augmentation and ensemble learning. Section 4 reviews existing literature related to our study.
2 SpaceGAN
2.1 Spatial Correlation Structures
The so-called "First Law of Geography", made famous by Waldo Tobler, states that "everything is related to everything else, but near things are more related than distant things" [Tobler, 1970]. Following this premise, when working with geospatial data, inherent local interdependencies represent an additional information layer that can be exploited. A brief example illustrates this concept: In a typical city, when we want to estimate the price of a house, we might want to check house prices at nearby locations. If, for instance, the house is located in a rich, spatially contained neighbourhood, just knowing the price of a nearby property, without any further knowledge about the features of the house (e.g. size, age), can provide us with an informed guess. Let us formalize this intuition by first defining the $i$-th data point as a tuple $(x_i, y_i, c_i)$, where $x_i$ describes a set of features, $y_i$ describes the target vector and $c_i$ describes the point coordinates in space. While a supervised learning setting with target $y$ is not needed, we introduce it here for simplicity since we apply this setting in the experiments in Section 3. The features can be distributed across space randomly, or follow a global or local spatial process. This can be examined by measuring the correlation of a feature with its local neighbourhood, the so-called local spatial autocorrelation, which is given by the Moran's I metric [Moran, 1950]. While originally theorized for phenomena distributed in two-dimensional space, the concept was widely popularized in geostatistics by Luc Anselin [Anselin, 1995]. His formalization gives a local autocorrelation coefficient for a vector distributed across space. While this can be applied to any vector in the feature set $x$, we will explain the concept using the target vector $y$ here. We assume $y$ to follow some spatial process. As such, $y$ consists of real-valued observations $y_i$ referenced by an index set indicating the spatial unit corresponding to the coordinate $c_i$. Let the neighbourhood of the spatial unit $i$ be $\mathcal{N}(i)$. In accordance with our conceptualization above, we can then compute its local spatial autocorrelation as:

$$I_i = \frac{y_i - \bar{y}}{\frac{1}{n}\sum_{k=1}^{n}(y_k - \bar{y})^2} \sum_{j \in \mathcal{N}(i)} w_{ij}\,(y_j - \bar{y}) \qquad (1)$$

where $\bar{y}$ represents the mean of the $y_i$'s and $w_{ij}$ are components of a weight matrix $W$ indicating membership of the local neighbourhood set between the observations $i$ and $j$. For $y$ distributed in continuous space, the weight matrix can, for example, correspond to a $k$-nearest-neighbour scheme with $w_{ij} = 1$ if $j \in \mathcal{N}(i)$ and $w_{ij} = 0$ otherwise. For $y$ distributed in discrete space (e.g. non-overlapping, bordering polygons), the weight matrix could for example correspond to a queen neighbourhood (see Figure 1). The Moran's I metric hence takes in a vector distributed in space and its corresponding neighbourhood structure to calculate how strongly (positively or negatively) the vector is autocorrelated with its spatial neighbourhood at any given location. Intuitively, this makes the selection of the weight matrix $W$, i.e. the definition of "neighbourhood", an important design choice which we have to account for when trying to augment spatial data imitating the spatial autocorrelation structures of the input. For this augmentation process, we turn towards a popular family of generative models: GANs.
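To make Eq. (1) concrete, the following minimal numpy sketch computes the local Moran's I under a $k$-nearest-neighbour weight matrix. The function names and the toy gradient data are our own illustration, not the paper's implementation:

```python
import numpy as np

def knn_weights(coords, k):
    """Binary k-nearest-neighbour spatial weight matrix:
    w_ij = 1 if j is one of i's k nearest neighbours, 0 otherwise."""
    n = len(coords)
    W = np.zeros((n, n))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        d[i] = np.inf                      # a point is not its own neighbour
        W[i, np.argsort(d)[:k]] = 1.0
    return W

def local_morans_i(y, W):
    """Local Moran's I for each observation, following Anselin (1995)."""
    z = y - y.mean()
    m2 = (z ** 2).sum() / len(y)           # variance normaliser
    return (z / m2) * (W @ z)

# Sanity check: a smooth north-south gradient is positively autocorrelated.
rng = np.random.default_rng(0)
coords = rng.uniform(size=(100, 2))
y = coords[:, 1] + 0.05 * rng.normal(size=100)
I = local_morans_i(y, knn_weights(coords, k=8))
```

For discrete data, `knn_weights` would simply be replaced by a queen-contiguity weight matrix over the polygon borders.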
2.2 Spatiallyconditioned GANs
GANs are a class of models employing two neural networks: a Generator ($G$) and a Discriminator ($D$). The Generator is responsible for producing a latent representation of the input, attempting to replicate a given data generation process. It is defined as a neural network with parameters $\theta_G$, mapping noise $z \sim p_z(z)$ to the feature space of the data. The Discriminator, a neural network with parameters $\theta_D$, aims to probabilistically distinguish the synthetic input created by the Generator from real data. Both networks compete in a minimax game, improving their performance until the real and synthetic data are indistinguishable from one another. But while GANs have been successfully applied in many areas, training them is highly non-trivial Salimans et al. [2016], Gulrajani et al. [2017] and remains an area of intense study Arjovsky et al. [2017], Gulrajani et al. [2017], Wang et al. [2018], Mao et al. [2017]. This is further complicated by the non-i.i.d. nature of geospatial data, in which learning an unconditional model would ignore inherent local dependencies. To overcome this, a sampling process taking spatial structure into account is needed, thus preserving statistical properties such as local spatial autocorrelation.
Therefore, conditional GANs (cGANs) Mirza and Osindero [2014] are better suited to handle context-dependent data generation, such as geospatial data. In cGANs, the inputs to both the generator and discriminator are augmented by a context vector $c$. Typically, $c$ represents a class label that we want the cGAN to generate an input for, but it can be any form of contextualization. Formally, we can define a cGAN by including the conditional variable $c$ in the original formulation, so that the generator becomes $G(z|c)$ and the discriminator $D(x|c)$. The minimax game between $G$ and $D$ is then given as:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x|c)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z|c))\right)\right] \qquad (2)$$
cGANs have previously been used for spatial conditioning of image data, using pixel coordinates. In our formulation, this would translate to setting the context to the point coordinates $c_i$ [Lin et al., 2019, Hu et al., 2017]. However, this approach is not sufficient for our problem, since mere conditioning on the point coordinate alone would omit valuable information about the local neighbourhood of each point. Instead, for each point $i$ we are interested in capturing how its features relate to those of neighbouring points $j \in \mathcal{N}(i)$. As such, we define the SpaceGAN context of point $i$ as the features of its spatial neighbours, $\{x_j : j \in \mathcal{N}(i)\}$.
Similarly to our intuition of spatial autocorrelation outlined above, we assume that the features of nearby data points may offer valuable information on the point of interest. By conditioning each data point on its neighbouring points, we allow for the learning of local patterns across the feature space. Beyond this, the versatility of constructing spatial weights enables experimentation with, and optimization of, different spatial neighbourhood definitions. This offers a flexibility that is not provided by point-coordinate conditioning.
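As an illustration of this conditioning scheme, the sketch below assembles, for each point, a context vector from the features of its $k$ nearest spatial neighbours. The helper `spacegan_context` is hypothetical; the actual SpaceGAN implementation may organize the context differently:

```python
import numpy as np

def spacegan_context(features, coords, k):
    """For each point i, stack the feature vectors of its k nearest
    spatial neighbours into a single cGAN conditioning vector."""
    n = len(coords)
    context = np.empty((n, k * features.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        d[i] = np.inf                        # a point is not its own neighbour
        nbrs = np.argsort(d)[:k]             # k nearest neighbours of i
        context[i] = features[nbrs].ravel()  # concatenate neighbour features
    return context

coords = np.random.default_rng(1).uniform(size=(50, 2))
features = np.sin(4 * coords)                  # spatially smooth toy features
ctx = spacegan_context(features, coords, k=5)  # shape (50, 5 * 2)
```

Both $G$ and $D$ would then receive `ctx[i]` alongside their usual inputs for point $i$, in place of a class label.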
2.3 Training and Selecting Generators for Spatial Data
One problem concerning GANs is that they typically fail to converge to a stable solution. To overcome this, we seek to tie training convergence to some measure of the quality of the synthesized data. Accordingly, we propose to evaluate generator performance by the faithfulness of its produced spatial patterns in relation to the true patterns observed in the input. For this, we introduce a new metric, the Mean Moran's I Error (MIE). It is defined as the mean absolute difference between the local spatial autocorrelation of the input $y$ and that of the generated samples $\hat{y}$:
$$MIE = \frac{1}{n} \sum_{i=1}^{n} \left| I_i(y) - I_i(\hat{y}) \right| \qquad (3)$$
We apply this metric for model selection by choosing the model that minimizes MIE, i.e. the loss of local spatial autocorrelation between real and generated data. In our supervised learning setting, we are particularly interested in a faithful representation of the target vector and hence use $y$ to calculate MIE. Of course, MIE can also be calculated using any other feature vector from $x$. An implementation for multidimensional input is also formalized by Anselin [Anselin, 2019], or can be achieved by averaging over multiple features. To train SpaceGAN, we proceed as when training a normal cGAN, but include the MIE stopping criterion. Algorithm 1 details our training procedure.
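A sketch of the MIE criterion in Eq. (3) and of the snapshot-selection step: among generator checkpoints, keep the one whose samples best preserve the local autocorrelation of the input. The weight matrix and "checkpoints" below are synthetic stand-ins, not SpaceGAN output:

```python
import numpy as np

def mie(y_real, y_fake, W):
    """Mean Moran's I Error: mean absolute difference between the local
    spatial autocorrelation of real and generated target vectors."""
    def local_i(y):
        z = y - y.mean()
        return (z / ((z ** 2).sum() / len(y))) * (W @ z)
    return np.abs(local_i(y_real) - local_i(y_fake)).mean()

# Stand-in for periodic generator snapshots with varying sample quality.
rng = np.random.default_rng(2)
W = (rng.uniform(size=(30, 30)) < 0.2).astype(float)  # random binary weights
np.fill_diagonal(W, 0)
y = rng.normal(size=30)
snapshots = [y + rng.normal(scale=s, size=30) for s in (1.0, 0.1, 0.5)]

# Model selection: keep the snapshot that minimizes MIE.
best = min(range(len(snapshots)), key=lambda t: mie(y, snapshots[t], W))
```

In actual training, each entry of `snapshots` would hold generated targets drawn from a saved generator state at the configured snapshot frequency.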
The set of user-defined hyperparameters for running SpaceGAN training and selection mainly encompasses: the $G$ and $D$ architectures, the number of lags (neighbours), the noise vector size and prior distribution, the mini-batch size, the number of epochs, the snapshot frequency, the number of samples, as well as parameters associated with the stochastic gradient optimizer. For a precise description of the architecture and specific settings, see the experiments in Section 3. Notably, our proposed stopping criterion can be seen as choosing the best member from a population of GANs acquired during training. In this way, our approach resembles "snapshot ensembling", introduced by Huang et al. [Huang et al., 2017].

2.4 Ganning: GAN augmentation for ensemble learning
A common use case of geospatial data is spatial prediction. We approach this from an ensemble learning perspective. In ensemble learning, individually "weak" base learners (e.g. Regression Trees) can be aggregated to outperform "strong" learners (e.g. Support Vector Machines). Traditionally, this idea includes models like Random Forests, Gradient Boosting Trees and other implementations that make use of Bagging, Boosting or Stacking principles Friedman et al. [2001], Efron and Hastie [2016]. Here, we follow Koshiyama et al. [Koshiyama et al., 2019] and utilize SpaceGAN-generated samples as training data for the ensemble learners. This approach has not been applied to spatial data before, and since it is analogous to Bagging, we will refer to it as "Ganning" from here on. Algorithm 2 outlines this approach. Assuming a fully trained and parametrized SpaceGAN, we repeatedly draw SpaceGAN samples and train a base learner on each. After repeating this for $B$ samples, we return the whole set of base models as an ensemble. The benefits of ensemble learning schemes can be best explained using the variance reduction lemma Friedman et al. [2001]. Intuitively, we can reduce the variance of the ensemble by averaging many weakly correlated predictors. Following the concept of the bias-variance trade-off Friedman et al. [2001], Efron and Hastie [2016], the ensemble Mean Squared Error (MSE) decreases, particularly when low-bias, high-variance base learners such as deep Decision Trees are used. Nevertheless, there is a potential risk factor to this approach. Should SpaceGAN fail to replicate the true data generation process truthfully, SpaceGAN samples might not only be more diverse, but also more "biased". Consequently, this could lead to base learners missing obvious patterns, or finding new patterns that do not exist in the real data.

3 Experiments
[Figures: the observed and SpaceGAN-generated data and their local spatial autocorrelation (bottom panels) for Toy 1 and Toy 2.]
We evaluate our proposed methods in two experiments. First, we assess SpaceGAN’s ability to generate spatial data, including realistic representations of its internal spatial autocorrelation structure. Second, we analyze the use of SpaceGAN samples in an ensemble learning approach for spatial predictive modeling. For this, we use three different datasets:
Toy 1: The data points are a rectangular grid of regularly distributed, synthetic point coordinates $c$, a random Gaussian noise vector $x$ and an outcome variable $y$, a simple quadratic function of the spatial coordinates and the random vector $x$.
Toy 2: The data points are a rectangular grid of regularly distributed, synthetic point coordinates $c$, a random Gaussian noise vector $x$ and an outcome variable $y$. Here, $y$ is a more complex combination of nonlinear functions and a linear global pattern of $c$ and $x$.
California Housing: This real-world dataset describes the prices of California houses, taken from the 1990 census. The house prices come with point coordinates $c$ and some further predictor variables $x$, such as house age or number of bedrooms. The dataset was introduced by Pace and Barry [Kelley Pace and Barry, 2003] and is a standard example of continuous, spatially autocorrelated data.

All our experiments are conducted using 10-fold spatial cross-validation [Pohjankukka et al., 2017]. Here, points spatially close to the test set are removed from the training set. This is done to prevent overfitting in spatial prediction tasks, as including data spatially close, and, assuming spatial dependencies, hence similar, to the test set during training can lead to overconfident predictions. For a further elaboration of this scheme, see the Appendix. Note that for the real-world dataset, we refer to California Housing 15 as a 15-nearest-neighbour implementation of the spatial cross-validation, and California Housing 50 as a 50-nearest-neighbour implementation. For both toy datasets, we use a simple queen neighbourhood (see Figure 1). For a description of the specific neural network architectures for SpaceGAN used in the different experiments, see the Appendix.
3.1 Experiment 1: Reproducing spatial correlation patterns
Table 1: MIE (and its standard error) between real and augmented data for SpaceGAN and GP implementations. Output and prediction were normalized before calculation.

Dataset  GP  SpaceGAN
Toy 1  1.9495 (0.1750)  0.3173 (0.1791)
Toy 2  0.2195 (0.0175)  0.2141 (0.0157)
California Housing 15  1.9932 (0.0826)  1.1468 (0.0416)
California Housing 50  3.8183 (0.2072)  0.9333 (0.0288)

Our first experiment aims to investigate SpaceGAN's ability not only to generate data, but also to reproduce observed spatial patterns. We train SpaceGAN on the three experimental datasets and at each spatial location return samples from the generator, as shown in Figure 3. Note that these results show out-of-sample extrapolations. For the dataset Toy 1, SpaceGAN is able to capture both the target vector and its spatial autocorrelation almost perfectly. In Toy 2, which represents a substantially more complicated pattern, we capture parts of the observed pattern seamlessly; however, the spatial areas characterized by more subtle patterns are not captured fully. Nevertheless, this result shows that SpaceGAN also works when the spatial correlation structure is homogeneous. Lastly, we assess the real-world dataset California Housing. Again, SpaceGAN is able to capture both the target and the spatial dependencies in the data. In the real-world setting we also compare SpaceGAN to a Gaussian Process (GP) smooth for data augmentation (implemented as a vanilla GP with RBF kernel in sklearn [Pedregosa et al., 2012]). We can see that the GP struggles with capturing both the target vector and its local spatial autocorrelation. Table 1 provides the MIE metric for SpaceGAN and the GP smooth, showing that SpaceGAN is best capable of capturing the spatial interdependencies in the input. Higher-resolution figures and GP comparisons for Toy 1 and Toy 2 can be found in the appendix.
3.2 Experiment 2: Data augmentation for predictive modeling
Our second experiment focuses on predictive modelling in a spatial setting. As outlined in Section 2.4, we seek to use SpaceGAN-generated samples in an ensemble learning setting, so-called "Ganning". More specifically, we test two SpaceGAN configurations: first, a SpaceGAN using MIE as convergence criterion; second, a SpaceGAN using RMSE for convergence. These are compared to two comparable ensemble baselines: first, a GP-Bagging approach, where we draw samples from a fully trained Gaussian Process posterior and use these to train base models for ensembling (GP); second, a traditional Bagging approach using spatial bootstrapping (Spatial Boot). Table 2 provides the out-of-sample prediction errors for the four approaches. Figure 4 highlights the average error (with confidence interval bars) across different ensemble sizes. We can observe that SpaceGAN (with MIE convergence) outperforms the competitors by a substantial margin on all three datasets.

Table 2: Out-of-sample prediction errors (standard errors in parentheses) for the four models, B = 100.

Dataset  SpaceGAN-MIE  SpaceGAN-RMSE  GP  Spatial Boot
Toy 1  0.9921 (0.0995)  1.1993 (0.1494)  1.2388 (0.1490)  1.2013 (0.1366)
Toy 2  1.0097 (0.1092)  1.2065 (0.1496)  1.3135 (0.1443)  1.2962 (0.1413)
California Housing 15  139534 (12026)  143983 (10341)  159340 (8550)  148830 (8660)
California Housing 50  128756 (7463)  145612 (7152)  156814 (8718)  148546 (8611)
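The Ganning procedure (Algorithm 2) evaluated in this experiment can be sketched as follows, with a stand-in sampler playing the role of a trained SpaceGAN and a deliberately weak 1-nearest-neighbour base learner. All names and the data are our own illustration:

```python
import numpy as np

def ganning_ensemble(sample_fn, fit_fn, B):
    """Draw B synthetic training sets from a trained generator and fit one
    base learner per draw; predictions are averaged over the ensemble."""
    models = [fit_fn(*sample_fn()) for _ in range(B)]
    return lambda X: np.mean([m(X) for m in models], axis=0)

# Stand-in generator: noisy resamples of a quadratic spatial surface.
rng = np.random.default_rng(5)
coords = rng.uniform(size=(300, 2))
y = (coords ** 2).sum(axis=1)

def sample_fn():                      # plays the role of SpaceGAN draws
    return coords, y + rng.normal(scale=0.3, size=len(y))

def fit_fn(C, t):                     # weak base learner: 1-nearest-neighbour
    return lambda X: t[np.argmin(
        np.linalg.norm(C[None, :, :] - X[:, None, :], axis=2), axis=1)]

predict = ganning_ensemble(sample_fn, fit_fn, B=25)
preds = predict(coords)
```

Averaging the weakly correlated base learners shrinks the noise each one inherits from its draw, which is the variance-reduction effect discussed in Section 2.4.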
4 Related work
We now want to contextualize our findings in relation to existing work in the field. As the academic field of machine learning advances, increasingly sophisticated techniques are being developed to capture the complexity of the real-world processes they model. This is particularly true for spatial methods, where assumptions like distributional independence or Euclidean distances restrict the performance of the most common algorithms. The motivation for this study originates from recent approaches to more explicit modelling of spatial context within machine learning techniques. Among these are the emergence of vector embeddings for spatially distributed image data [Jean et al., 2019], the opportunities to model non-Euclidean spatial graphs using graph convolutional networks (GCNs) Defferrard et al. [2016] and the modelling of spatial point processes using matrix factorization [Miller et al., 2014]. We see SpaceGAN as an addition to the family of spatially explicit machine learning methods.
GAN models have already been applied to data autocorrelated in one-dimensional space, e.g. time series [Xiao et al., 2017, Koshiyama et al., 2019], two-dimensional space, e.g. remote sensing imagery [Lin et al., 2017, Zhu et al., 2019], and even three-dimensional space, e.g. point clouds [Li et al., 2018, Fan et al., 2017]. However, none of this previous work used measures of local autocorrelation to improve the representation of spatial patterns. In the context of data augmentation, GANs have become a popular tool for inflating training data and increasing model robustness [Xiao et al., 2017, Taylor and Nitschke, 2019, Frid-Adar et al., 2018, Bowles et al., 2018]. However, such a method does not yet exist for multivariate point data, where techniques such as the spatial bootstrap [Brenning, 2012] or synthetic point generators [Li et al., 2016, Quick et al., 2015] are most commonly used. Spatial image data and point clouds, on the other hand, are often augmented using random perturbations, rotations or cropping [Gerke et al., 2016, Zhou and Tuzel, 2018]. Lastly, ensemble learning is increasingly popular for spatial modeling [Davies and Van Der Laan, 2016], with applications ranging from forest fire susceptibility prediction [Tehrany et al., 2018] to class ambiguity correction in spatial classifiers [Jiang et al., 2017]. Nevertheless, to our knowledge, no research has yet combined GAN augmentation and ensemble learning within a spatial data environment, highlighting the novelty of this study.

5 Conclusion
In this paper we introduce SpaceGAN, a novel data augmentation method for spatial data, reproducing both the data structure and its spatial dependencies through two key innovations: First, we provide a novel approach to spatially condition the GAN. Instead of conditioning on raw spatial features like coordinates, we use the feature vectors of spatially near data points for conditioning. Second, we introduce a novel convergence criterion for GAN training, the MIE. This metric measures how well the generator is able to imitate the observed spatial patterns. We show that this architecture succeeds at generating faithful samples in experiments using synthetic and real-world data. Turning towards predictive modeling, we propose an ensemble learning approach for spatial prediction tasks, utilizing augmented SpaceGAN samples as training data for an ensemble of base models. We show that this approach outperforms existing methods in ensemble learning and spatial data augmentation.
In developing SpaceGAN, we seek to further the agenda of spatial representations in deep learning. As many real-world applications of deep learning algorithms deal with geospatial data, tools tailored to these tasks are required [Reichstein et al., 2019]. Nevertheless, the potential applications of SpaceGAN go beyond the geospatial data domain. While the use of neighbourhood structures as well as the Moran's I metric should allow for the handling of data distributed in higher-dimensional space, we seek to confirm this applicability in future studies. Further potentially fruitful research directions include experiments with different GAN architectures, e.g. Wasserstein loss functions, and application studies with sensitive spatial data, which could be obfuscated using SpaceGAN without losing desirable statistical properties.

Acknowledgments
The authors gratefully acknowledge funding from the UK Engineering and Physical Sciences Research Council, the EPSRC Centre for Doctoral Training in Urban Science (EPSRC grant no. EP/L016400/1); The Alan Turing Institute (EPSRC grant no. EP/N510129/1).
References
 Anselin [1995] L. Anselin. Local Indicators of Spatial Association—LISA. Geographical Analysis, 27(2):93–115, sep 1995. ISSN 1538-4632. doi: 10.1111/j.1538-4632.1995.tb00338.x. URL http://doi.wiley.com/10.1111/j.1538-4632.1995.tb00338.x.
 Anselin [2019] L. Anselin. A Local Indicator of Multivariate Spatial Association: Extending Geary’s c. Geographical Analysis, 51(2):133–150, apr 2019. ISSN 15384632. doi: 10.1111/gean.12164. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/gean.12164.
 Arjovsky et al. [2017] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein generative adversarial networks. In International Conference on Machine Learning, pages 214–223, 2017.
 AzimiZonooz et al. [1989] A. AzimiZonooz, W. F. Krajewski, D. S. Bowles, and D. J. Seo. Spatial rainfall estimation by linear and nonlinear cokriging of radarrainfall and raingage data. Stochastic Hydrology and Hydraulics, 3(1):51–67, mar 1989. ISSN 09311955. doi: 10.1007/BF01543427. URL http://link.springer.com/10.1007/BF01543427.
 Basu and Thibodeau [1998] S. Basu and T. G. Thibodeau. Analysis of Spatial Autocorrelation in House Prices. Journal of Real Estate Finance and Economics, 17(1):61–85, 1998. ISSN 08955638. doi: 10.1023/A:1007703229507. URL http://link.springer.com/10.1023/A:1007703229507.
 Bowles et al. [2018] C. Bowles, L. Chen, R. Guerrero, P. Bentley, R. Gunn, A. Hammers, D. A. Dickie, M. V. Hernández, J. Wardlaw, and D. Rueckert. GAN Augmentation: Augmenting Training Data using Generative Adversarial Networks. arXiv Preprint, 2018. URL https://arxiv.org/abs/1810.10863http://arxiv.org/abs/1810.10863.
 Brenning [2012] A. Brenning. Spatial crossvalidation and bootstrap for the assessment of prediction rules in remote sensing: The R package sperrorest. In International Geoscience and Remote Sensing Symposium (IGARSS), pages 5372–5375. IEEE, jul 2012. ISBN 9781467311595. doi: 10.1109/IGARSS.2012.6352393. URL http://ieeexplore.ieee.org/document/6352393/.
 Datta et al. [2016] A. Datta, S. Banerjee, A. O. Finley, and A. E. Gelfand. Hierarchical NearestNeighbor Gaussian Process Models for Large Geostatistical Datasets. Journal of the American Statistical Association, 111(514):800–812, apr 2016. ISSN 1537274X. doi: 10.1080/01621459.2015.1044091. URL https://www.tandfonline.com/doi/full/10.1080/01621459.2015.1044091.
 Davies and Van Der Laan [2016] M. M. Davies and M. J. Van Der Laan. Optimal Spatial Prediction Using Ensemble Machine Learning. International Journal of Biostatistics, 12(1):179–201, may 2016. ISSN 15574679. doi: 10.1515/ijb20140060. URL http://www.degruyter.com/view/j/ijb.2016.12.issue1/ijb20140060/ijb20140060.xml.
 Defferrard et al. [2016] M. Defferrard, X. Bresson, and P. Vandergheynst. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. In Advances in Neural Information Processing Systems (NeurIPS), 2016. URL http://papers.nips.cc/paper/6081convolutionalneuralnetworksongraphswithfastlocalizedspectralfilteringhttp://arxiv.org/abs/1606.09375.
 Efron and Hastie [2016] B. Efron and T. Hastie. Computer age statistical inference, volume 5. Cambridge University Press, 2016.

 Fan et al. [2017] H. Fan, H. Su, and L. J. Guibas. A Point Set Generation Network for 3D Object Reconstruction From a Single Image. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 605–613, 2017. URL http://openaccess.thecvf.com/content_cvpr_2017/html/Fan_A_Point_Set_CVPR_2017_paper.html.
 Frid-Adar et al. [2018] M. Frid-Adar, E. Klang, M. Amitai, J. Goldberger, and H. Greenspan. Synthetic data augmentation using GAN for improved liver lesion classification. In Proceedings - International Symposium on Biomedical Imaging, pages 289–293. IEEE, apr 2018. ISBN 9781538636367. doi: 10.1109/ISBI.2018.8363576. URL https://ieeexplore.ieee.org/document/8363576/.
 Friedman et al. [2001] J. Friedman, T. Hastie, and R. Tibshirani. The elements of statistical learning, volume 1. Springer series in statistics New York, NY, USA:, 2001.
 Gardner et al. [2018] J. R. Gardner, G. Pleiss, D. Bindel, K. Q. Weinberger, and A. G. Wilson. GPyTorch: Blackbox MatrixMatrix Gaussian Process Inference with GPU Acceleration. In Advances in Neural Information Processing Systems (NeurIPS), 2018. URL http://papers.nips.cc/paper/7985gpytorchblackboxmatrixmatrixgaussianprocessinferencewithgpuaccelerationhttp://arxiv.org/abs/1809.11165.
 Gerke et al. [2016] S. Gerke, K. Müller, and R. Schäfer. Soccer Jersey Number Recognition Using Convolutional Neural Networks. In IEEE International Conference on Computer Vision (ICCV), volume 2016Febru, pages 734–741. IEEE, dec 2016. ISBN 9781467383905. doi: 10.1109/ICCVW.2015.100. URL http://ieeexplore.ieee.org/document/7406449/.
 Goodfellow et al. [2014] I. Goodfellow, J. PougetAbadie, M. Mirza, B. Xu, D. WardeFarley, S. Ozair, A. Courville, and Y. Bengio. Generative Adversarial Nets. In Advances in Neural Information Processing Systems (NeurIPS), pages 2672–2680, 2014. URL http://papers.nips.cc/paper/5423generativeadversarialnets.
 Gulrajani et al. [2017] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville. Improved training of wasserstein gans. In Advances in Neural Information Processing Systems, pages 5767–5777, 2017.
 Henaff et al. [2015] M. Henaff, J. Bruna, and Y. LeCun. Deep Convolutional Networks on GraphStructured Data. In Advances in Neural Information Processing Systems (NeurIPS), jun 2015. URL http://arxiv.org/abs/1506.05163.

 Hu et al. [2017] Y. Hu, E. Gibson, L. L. Lee, W. Xie, D. C. Barratt, T. Vercauteren, and J. A. Noble. Freehand ultrasound image simulation with spatially-conditioned generative adversarial networks. In Lecture Notes in Computer Science, volume 10555 LNCS, pages 105–115. Springer, Cham, 2017. ISBN 9783319675633. doi: 10.1007/978-3-319-67564-0_11. URL http://link.springer.com/10.1007/978-3-319-67564-0_11.
 Huang et al. [2017] G. Huang, Y. Li, G. Pleiss, Z. Liu, J. E. Hopcroft, and K. Q. Weinberger. Snapshot Ensembles: Train 1, get M for free. In International Conference on Learning Representations (ICLR), mar 2017. URL http://arxiv.org/abs/1704.00109.
 Huang et al. [2013] K. Huang, N. Kupp, J. M. Carulli, and Y. Makris. Handling Discontinuous Effects in Modeling Spatial Correlation of Waferlevel Analog/RF Tests. In Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013, pages 553–558, New Jersey, 2013. IEEE Conference Publications. ISBN 9781467350716. doi: 10.7873/date.2013.123. URL http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6513569.
 Jean et al. [2019] N. Jean, S. Wang, A. Samar, G. Azzari, D. Lobell, and S. Ermon. Tile2Vec: Unsupervised representation learning for spatially distributed data. In AAAI Conference on Artificial Intelligence, may 2019. URL http://arxiv.org/abs/1805.02855.
 Jiang et al. [2017] Z. Jiang, Y. Li, S. Shekhar, L. Rampi, and J. Knight. Spatial Ensemble Learning for Heterogeneous Geographic Data with Class Ambiguity. In ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 1–10, New York, New York, USA, 2017. ACM Press. ISBN 9781450354905. doi: 10.1145/3139958.3140044. URL http://dl.acm.org/citation.cfm?doid=3139958.3140044.

 Kelley Pace and Barry [2003] R. Kelley Pace and R. Barry. Sparse spatial autoregressions. Statistics & Probability Letters, 33(3):291–297, may 2003. ISSN 0167-7152. doi: 10.1016/s0167-7152(96)00140-x. URL https://www.sciencedirect.com/science/article/pii/S016771529600140X.
 Koshiyama et al. [2019] A. Koshiyama, N. Firoozye, and P. Treleaven. Generative Adversarial Networks for Financial Trading Strategies Fine-Tuning and Combination. arXiv Preprint, jan 2019. URL http://arxiv.org/abs/1901.01751.
Li et al. [2018] C.-L. Li, M. Zaheer, Y. Zhang, B. Poczos, and R. Salakhutdinov. Point Cloud GAN. arXiv Preprint, oct 2018. URL http://arxiv.org/abs/1810.05795.
Li et al. [2016] Y. Li, D. Min, M. N. Do, and J. Lu. Fast guided global interpolation for depth and motion. In European Conference on Computer Vision (ECCV), volume 9907 LNCS, pages 717–733. Springer, Cham, 2016. ISBN 978-3-319-46486-2. doi: 10.1007/978-3-319-46487-9_44. URL http://link.springer.com/10.1007/978-3-319-46487-9_44.
Lin et al. [2019] C. H. Lin, C.-C. Chang, Y.-S. Chen, D.-C. Juan, W. Wei, and H.-T. Chen. COCO-GAN: Generation by Parts via Conditional Coordinating. arXiv Preprint, mar 2019. URL http://arxiv.org/abs/1904.00284.
Lin et al. [2017] D. Lin, K. Fu, Y. Wang, G. Xu, and X. Sun. MARTA GANs: Unsupervised Representation Learning for Remote Sensing Image Classification. IEEE Geoscience and Remote Sensing Letters, 14(11):2092–2096, nov 2017. ISSN 1545-598X. doi: 10.1109/LGRS.2017.2752750. URL http://ieeexplore.ieee.org/document/8059820/.
Linsley et al. [2018] D. Linsley, J. Kim, V. Veerabadran, and T. Serre. Learning long-range spatial dependencies with horizontal gated recurrent units. In Advances in Neural Information Processing Systems (NeurIPS). Curran Associates Inc., 2018. URL http://arxiv.org/abs/1805.08315.
 Mao et al. [2017] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. P. Smolley. Least squares generative adversarial networks. In Computer Vision (ICCV), 2017 IEEE International Conference on, pages 2813–2821. IEEE, 2017.
Miller et al. [2014] A. Miller, L. Bornn, R. Adams, and K. Goldsberry. Factorized Point Process Intensities: A Spatial Analysis of Professional Basketball. In International Conference on Machine Learning (ICML), 2014. URL http://www.jmlr.org/proceedings/papers/v32/miller14.pdf.
Mirza and Osindero [2014] M. Mirza and S. Osindero. Conditional Generative Adversarial Nets. arXiv Preprint, 2014. URL http://arxiv.org/abs/1411.1784.
Moran [1950] P. A. Moran. Notes on continuous stochastic phenomena. Biometrika, 37(1-2):17–23, jun 1950. ISSN 0006-3444. doi: 10.1093/biomet/37.1-2.17. URL https://www.jstor.org/stable/2332142?origin=crossref.
Pedregosa et al. [2012] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, A. Müller, J. Nothman, G. Louppe, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and É. Duchesnay. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12(Oct):2825–2830, 2012. ISSN 1533-7928. URL http://www.jmlr.org/papers/v12/pedregosa11a.html.
Pohjankukka et al. [2017] J. Pohjankukka, T. Pahikkala, P. Nevalainen, and J. Heikkonen. Estimating the prediction performance of spatial models via spatial k-fold cross validation. International Journal of Geographical Information Science, 31(10):2001–2019, oct 2017. ISSN 1362-3087. doi: 10.1080/13658816.2017.1346255. URL https://www.tandfonline.com/doi/full/10.1080/13658816.2017.1346255.
Quick et al. [2015] H. Quick, S. H. Holan, C. K. Wikle, and J. P. Reiter. Bayesian marked point process modeling for generating fully synthetic public use data with point-referenced geography. Spatial Statistics, 14:439–451, nov 2015. ISSN 2211-6753. doi: 10.1016/j.spasta.2015.07.008. URL https://www.sciencedirect.com/science/article/pii/S2211675315000718.
Reichstein et al. [2019] M. Reichstein, G. Camps-Valls, B. Stevens, M. Jung, J. Denzler, N. Carvalhais, and Prabhat. Deep learning and process understanding for data-driven Earth system science. Nature, 566(7743):195–204, feb 2019. ISSN 1476-4687. doi: 10.1038/s41586-019-0912-1. URL http://www.nature.com/articles/s41586-019-0912-1.
 Salimans et al. [2016] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. Improved techniques for training gans. In Advances in Neural Information Processing Systems, pages 2234–2242, 2016.
Shi et al. [2015] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-C. Woo. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. In Advances in Neural Information Processing Systems (NeurIPS), 2015. URL http://arxiv.org/abs/1506.04214.
Taylor and Nitschke [2019] L. Taylor and G. Nitschke. Improving Deep Learning with Generic Data Augmentation. In Proceedings of the 2018 IEEE Symposium Series on Computational Intelligence, SSCI 2018, pages 1542–1547. IEEE, nov 2019. ISBN 978-1-5386-9276-9. doi: 10.1109/SSCI.2018.8628742. URL https://ieeexplore.ieee.org/document/8628742/.
Tehrany et al. [2018] M. S. Tehrany, S. Jones, F. Shabani, F. Martínez-Álvarez, and D. Tien Bui. A novel ensemble modeling approach for the spatial prediction of tropical forest fire susceptibility using LogitBoost machine learning classifier and multi-source geospatial data. Theoretical and Applied Climatology, pages 1–17, sep 2018. ISSN 1434-4483. doi: 10.1007/s00704-018-2628-9. URL http://link.springer.com/10.1007/s00704-018-2628-9.
Tobler [1970] W. R. Tobler. A Computer Movie Simulating Urban Growth in the Detroit Region. Economic Geography, 46:234, jun 1970. ISSN 0013-0095. doi: 10.2307/143141. URL https://www.jstor.org/stable/143141?origin=crossref.
 Wang et al. [2018] C. Wang, C. Xu, X. Yao, and D. Tao. Evolutionary generative adversarial networks. arXiv preprint arXiv:1803.00657, 2018.
Xiao et al. [2017] S. Xiao, M. Farajtabar, X. Ye, J. Yan, L. Song, and H. Zha. Wasserstein Learning of Deep Generative Point Process Models. In Advances in Neural Information Processing Systems (NeurIPS), pages 3247–3257, 2017. URL http://papers.nips.cc/paper/6917-wasserstein-learning-of-deep-generative-point-process-models.
Zhou and Tuzel [2018] Y. Zhou and O. Tuzel. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4490–4499. IEEE, jun 2018. ISBN 978-1-5386-6420-9. doi: 10.1109/CVPR.2018.00472. URL https://ieeexplore.ieee.org/document/8578570/.
Zhu et al. [2019] D. Zhu, X. Cheng, F. Zhang, X. Yao, Y. Gao, and Y. Liu. Spatial interpolation using conditional generative adversarial neural networks. International Journal of Geographical Information Science, pages 1–24, apr 2019. ISSN 1365-8816. doi: 10.1080/13658816.2019.1599122. URL https://www.tandfonline.com/doi/full/10.1080/13658816.2019.1599122.
Appendix
A. Experimental Data
Here we provide a more detailed description of the datasets used in Experiment 1 and Experiment 2.
Toy 1: We create a synthetic dataset of 400 observations. Following the notation introduced above, we first set the spatial resolution, i.e. the spatial coordinates:
(5) 
We then add an independent feature as a random draw from a Gaussian distribution:
(6) 
Now, we create the target variable as a function of spatial coordinates and the random noise as follows:
(7) 
The table below provides summary statistics of the synthetic dataset thus constructed.
Statistic  N  Mean  St. Dev.  Min  Pctl(25)  Pctl(75)  Max 
Coordinate 1  400  50.000  28.868  2.500  26.250  73.750  97.500  
Coordinate 2  400  50.000  28.868  2.500  26.250  73.750  97.500  
Feature  400  0.000  1.000  -0.846  -0.731  0.426  3.743  
Target  400  0.008  0.960  -2.993  -0.641  0.638  2.702 
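To make the construction concrete, the following sketch generates a dataset with the same layout. Since the exact functional forms of eqs. (5)–(7) are not reproduced here, the target function below is a hypothetical placeholder; the grid specification (coordinates running from 2.5 to 97.5 in steps of 5, giving the 400 observations) is inferred from the summary table.

```python
import numpy as np

rng = np.random.default_rng(0)

# 20 x 20 regular grid: coordinates run from 2.5 to 97.5 in steps of 5,
# matching the 400 observations in the summary table above.
axis = np.arange(2.5, 100.0, 5.0)
c1, c2 = np.meshgrid(axis, axis)
c1, c2 = c1.ravel(), c2.ravel()

# Independent feature: a Gaussian draw (standard normal used here for illustration).
x = rng.standard_normal(c1.shape[0])

# Hypothetical stand-in for eq. (7): the target mixes a smooth
# function of the spatial coordinates with the random feature.
def target(c1, c2, x):
    return np.sin(c1 / 20.0) + np.cos(c2 / 20.0) + 0.5 * x

y = target(c1, c2, x)
```

Toy 2 follows the same pattern on a finer 29 × 29 grid with a more complex target function.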
Toy 2: We create a synthetic dataset of 841 observations. We again start by setting the spatial resolution, i.e. the spatial coordinates:
(8) 
We again add an independent variable as a random draw from a Gaussian distribution with given mean and standard deviation:
(9) 
Lastly, we create the target variable as a more complex function of spatial coordinates and the random noise as follows:
(10) 
where . The table below again provides the summary statistics.
Statistic  N  Mean  St. Dev.  Min  Pctl(25)  Pctl(75)  Max 
Coordinate 1  841  50.750  29.301  1.750  26.250  75.250  99.750  
Coordinate 2  841  50.750  29.301  1.750  26.250  75.250  99.750  
Feature  841  0.000  1.000  -2.294  -0.606  0.631  2.488  
Target  841  0.032  1.021  -3.372  -0.700  0.660  3.495 
California Housing: This real-world dataset, introduced by Kelley Pace and Barry [2003], is widely used for analyzing spatial patterns. It is accessible via Kaggle (https://www.kaggle.com/camnugent/california-housing-prices) and is also integrated into scikit-learn (https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html). The table below provides an overview of the features and their statistical properties:
Statistic  N  Mean  St. Dev.  Min  Pctl(25)  Pctl(75)  Max 
longitude  20,640  -119.570  2.004  -124.350  -121.800  -118.010  -114.310 
latitude  20,640  35.632  2.136  32.540  33.930  37.710  41.950 
housing_median_age  20,640  28.639  12.586  1  18  37  52 
total_rooms  20,640  2,635.763  2,181.615  2  1,447.8  3,148  39,320 
total_bedrooms  20,433  537.871  421.385  1.000  296.000  647.000  6,445.000 
population  20,640  1,425.477  1,132.462  3  787  1,725  35,682 
households  20,640  499.540  382.330  1  280  605  6,082 
median_income  20,640  3.871  1.900  0.500  2.563  4.743  15.000 
median_house_value  20,640  206,855.800  115,395.600  14,999  119,600  264,725  500,001 
We can break the dataset down into the familiar notation as follows:
coordinates = (longitude, latitude) (11) 
features = (housing_median_age, total_rooms, total_bedrooms, population, households, median_income) (12) 
target = median_house_value (13) 
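As a minimal sketch of the mapping in eqs. (11)–(13), the following splits one observation into coordinates, features and target. The column groupings are assumptions read off the table above (coordinates from longitude/latitude, target from median_house_value), and the example row is illustrative only.

```python
# Column names follow the California Housing table above.
COORD_COLS = ["longitude", "latitude"]
TARGET_COL = "median_house_value"
FEATURE_COLS = [
    "housing_median_age", "total_rooms", "total_bedrooms",
    "population", "households", "median_income",
]

def split_record(record):
    """Split one observation into coordinates c, features x and target y."""
    c = [record[k] for k in COORD_COLS]
    x = [record[k] for k in FEATURE_COLS]
    y = record[TARGET_COL]
    return c, x, y

# Illustrative example row.
row = {"longitude": -122.23, "latitude": 37.88, "housing_median_age": 41,
       "total_rooms": 880, "total_bedrooms": 129, "population": 322,
       "households": 126, "median_income": 8.3252,
       "median_house_value": 452600}
c, x, y = split_record(row)
```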
B. Experimental Setting
The tables below provide details on the architecture and configuration of the neural networks used in SpaceGAN during our experiments. Note that the kernel size parameter for Toy 1 and Toy 2 corresponds to the queen neighbourhood (for discrete spatial data) outlined in 1 and is the same neighbourhood that is used for spatial conditioning and spatial cross-validation (see Appendix E). The kernel size for California Housing 15 and California Housing 50 corresponds to the same kNN neighbourhood (with k = 15 and k = 50, respectively) that is used for spatial conditioning and spatial cross-validation.

Parameter  Values 

Architecture  1D-CNN 
Number of hidden layers  1 
Training steps  20000 
Batch Size  100 
Optimizer  Stochastic Gradient Descent 
Optimizer Parameters  learning rate = 0.01 
Noise prior  
Snapshot frequency ()  500 
Number of samples for evaluation  500 
Input features scaling function  z-score (standardization) 
Target scaling function  z-score (standardization) 
Parameter  Toy 1  Toy 2  California 15  California 50 

(generator, discriminator) filters  (50, 50)  (100, 100)  (100, 100)  (200, 200) 
(generator, discriminator) kernel size  (8, 8)  (8, 8)  (15, 15)  (50, 50) 
(generator, discriminator) hidden layer function  (relu, tanh)  (relu, tanh)  (relu, tanh)  (relu, tanh) 
(generator, discriminator) output layer function  (linear, sigmoid)  (linear, sigmoid)  (linear, sigmoid)  (linear, sigmoid) 
Noise dimension  8  8  15  15 
D. Training convergence
During Experiment 2, we compare the convergence (and performance) of two SpaceGAN implementations, each using one of the two convergence criteria. For completeness, we define the RMSE (root mean squared error) as follows:
$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$ (14) 
where $y_i$ denotes the observed and $\hat{y}_i$ the generated target value.
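A direct implementation of the root mean squared error, assuming `y_true` holds the observed and `y_pred` the generated target values:

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between observed and generated targets."""
    n = len(y_true)
    # Mean of squared residuals, then square root.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
```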
The figure below shows SpaceGAN training under the different convergence criteria for Toy 1 and California Housing 50 over the training steps, during a typical training cycle. Interestingly, for Toy 1, both criteria are almost antithetic: a local minimum of one criterion approximately coincides with a local maximum of the other in the same training step. Moreover, one of the criteria struggles to indicate when a convergence point is reached, as it exhibits several local minima of approximately similar value. The same holds for the California Housing 50 dataset. The other criterion, however, appears to have a relatively stable minimum at the first local minimum point.
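As background on the spatial convergence criterion, which selects generators by how well they reproduce the observed spatial autocorrelation structure, the following sketches global Moran's I [Moran, 1950] for a value vector and a spatial weights matrix. This is illustrative background only, not necessarily the exact statistic used in SpaceGAN.

```python
import numpy as np

def morans_i(y, w):
    """Global Moran's I for values y and spatial weights matrix w.

    y: (n,) array of observations; w: (n, n) spatial weights with zero diagonal.
    Positive values indicate positive spatial autocorrelation.
    """
    y = np.asarray(y, dtype=float)
    z = y - y.mean()                      # deviations from the mean
    num = (w * np.outer(z, z)).sum()      # cross-products weighted by w_ij
    den = (z ** 2).sum()
    return len(y) / w.sum() * num / den
```

For example, on a chain of four locations with rook-contiguity weights and values [1, 1, 0, 0], the statistic is positive, reflecting the clustered pattern.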
E. Spatial CrossValidation
We use a variation of k-fold spatial cross-validation [Pohjankukka et al., 2017] to evaluate all our experiments. The goal of spatial cross-validation is to check the generalizability of spatial models and to avoid overfitting. In a naive cross-validation setting with spatial data, overfitting can occur when training and test points are spatially too close. Assuming some spatial dependency between nearby points, this roughly amounts to training on the test set. Hence, we need to create a so-called buffer area around the test set, within which we remove all data points from the training set. Given a set of data points, we first create spatially coherent test sets. In our case, we do this by slicing through each of the two dimensions of the coordinate space five times with equal binning, thus creating folds of the same width. This leaves us with a set of test sets. We now define the training set as all points which are not part of the test set and which are not neighbouring points of the test set points, thus creating a buffer area. As a quick example, for the California Housing 50 dataset, we would define the test set, then exclude all points which are not part of the test set but are among the nearest neighbours of one of the test set points. The remaining, non-excluded points form the training set. While we chose to define the buffer zone according to the neighbourhood-based spatial weights matrix, other methods, such as defining a dead-zone area using a radius around the test set, are also applicable. The spatial cross-validation process is outlined in Figure 6.
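A sketch of this buffered spatial cross-validation procedure, assuming five equal-width bins per coordinate dimension (hence up to 25 spatially coherent blocks) and a kNN-based buffer. The function name and parameter defaults are hypothetical, not taken from the SpaceGAN code.

```python
import numpy as np

def spatial_cv_folds(coords, n_slices=5, n_neighbours=50):
    """Yield (train_idx, test_idx) pairs for buffered spatial cross-validation.

    coords: (n, 2) array of spatial coordinates. Each dimension is sliced
    into `n_slices` equal-width bins, giving spatially coherent test blocks;
    training points among the `n_neighbours` nearest neighbours of any
    test point are removed, forming the buffer area.
    """
    n = coords.shape[0]
    bins = [np.linspace(coords[:, d].min(), coords[:, d].max(), n_slices + 1)
            for d in range(2)]
    # Assign each point to a 2-D block (clip so boundary values stay in range).
    block = tuple(np.clip(np.digitize(coords[:, d], bins[d]) - 1, 0, n_slices - 1)
                  for d in range(2))
    fold_id = block[0] * n_slices + block[1]
    # Pairwise squared distances for the kNN buffer (fine for small n).
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    knn = np.argsort(d2, axis=1)[:, 1:n_neighbours + 1]
    for f in np.unique(fold_id):
        test_idx = np.where(fold_id == f)[0]
        buffer = np.unique(knn[test_idx])
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        train_mask[buffer] = False   # drop the buffer zone around the test set
        yield np.where(train_mask)[0], test_idx
```

For a radius-based dead zone instead, the kNN buffer line would be replaced by a distance threshold on `d2`.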
F. Experimental Results
Here we provide some higher-resolution images of the SpaceGAN-augmented data across the three example datasets. Please note again that all synthetic samples are based on out-of-sample extrapolations from the respective generator (SpaceGAN or GP).