Modelling urban networks using Variational Autoencoders

by Kira Kempinska, et al.

A long-standing question for urban and regional planners pertains to the ability to describe urban patterns quantitatively. Cities' transport infrastructure, particularly street networks, provides an invaluable source of information about the urban patterns generated by people's movements and their interactions. With the increasing availability of street network datasets and the advancements in deep learning methods, we are presented with an unprecedented opportunity to push the frontiers of urban modelling towards more data-driven and accurate models of urban forms. In this study, we present our initial work on applying deep generative models to urban street network data to create spatially explicit urban models. We base our work on Variational Autoencoders (VAEs), which are deep generative models that have recently gained popularity owing to their ability to generate realistic images. Initial results show that VAEs are capable of capturing key high-level urban network metrics using low-dimensional vectors and generating new urban forms of complexity matching the cities captured in the street network data.




1 Introduction

Temporal and spatial patterns of human interactions shape our cities making them unique, but, at the same time, create universal processes that make urban structures comparable to each other. A long-standing effort of urban studies focuses on the creation of quantitative models of the spatial forms of cities that would capture their essential characteristics and enable data-driven comparisons. There have been several attempts at studying urban forms using quantitative methods, typically based on complexity theory or network science arcaute2016cities ; barthelemy2008modeling ; murcio2015multifrac ; buhl2006topological ; cardillo2006structural ; masucci2009random ; strano2013urban . The approaches create an abstract representation of an urban form to derive its key quantitative characteristics. Although theoretically robust, the abstractions might often be too simplistic to capture the full breadth and complexity of existing urban structures.

With the increasing availability of urban street network data and the advancements in deep learning methods, we are presented with an unprecedented opportunity to push the frontiers of urban modelling towards more data-driven and accurate urban models. In this study, we present our initial work on applying deep generative models to urban street network data to create spatially explicit models of urban networks. We base our work on Variational Autoencoders (VAEs) trained on images of street networks. VAEs are deep generative models that have recently gained popularity owing to their ability to generate realistic images. VAEs have two fundamental qualities that make them particularly suitable for urban modelling. Firstly, they can condense high-dimensional images of urban street networks to a low-dimensional representation, which enables quantitative comparisons between urban forms without any prior assumptions. Secondly, VAEs can generate new realistic urban forms that capture the diversity of existing cities.

In the following sections, we describe our experiments on urban street networks from OpenStreetMap (OSM). The results indicate that a VAE trained on the OSM data is capable of capturing critical high-level urban metrics using low-dimensional vectors. The model can also generate new urban forms whose structure matches the cities captured in the OSM dataset. All code and experiments for this study are available at

2 Methodology and dataset

2.1 Variational Autoencoder

Variational Autoencoders (VAEs) have emerged as one of the most popular deep learning techniques for unsupervised learning of complicated data distributions. VAEs are particularly appealing because they compress data into a lower-dimensional representation which can be used for quantitative comparisons and new data generation. VAEs are built on top of standard function approximators (neural networks) efficiently trained with stochastic gradient descent kingma2013auto . VAEs have already been used to generate many kinds of complex data, including handwritten digits, faces and house numbers, and to predict the future from static images. In this work, we apply VAEs to street network images to learn low-dimensional representations of street networks. We use the representations to make quantitative comparisons between urban forms without making any prior assumptions and to generate new realistic urban forms.

Figure 1: Variational Autoencoder takes as input an image of the street network (left), condenses the image to a lower-dimensional encoding (middle) and finally reconstructs the image given the encoding (right).

A variational autoencoder consists of an encoder, a decoder, and a loss function. The encoder is a neural network. Its input is a datapoint $x$, its output is a hidden representation $z$, and it has weights and biases $\theta$. The goal of the encoder is to 'encode' the data into a latent (hidden) representation space $z$, which has far fewer dimensions than the data. This is typically referred to as a 'bottleneck' because the encoder must learn an efficient compression of the data into this lower-dimensional space. The encoder is denoted by $q_\theta(z \mid x)$.

The decoder is another neural network. Its input is the representation $z$, it outputs a datapoint $x$, and it has weights and biases $\phi$. The decoder is denoted by $p_\phi(x \mid z)$. The decoder 'decodes' the low-dimensional latent representation $z$ into the datapoint $x$. Information is lost in the process because the decoder translates from a smaller to a larger dimensionality. How much information is lost? The information loss is measured using the reconstruction log-likelihood $\log p_\phi(x \mid z)$. The measure indicates how effectively the decoder has learned to reconstruct an input image $x$ given its latent representation $z$.

The loss function of the variational autoencoder is the sum of the reconstruction loss, given by the negative log-likelihood, and a regularizer. The total loss is the sum of the losses $\sum_{i=1}^{N} l_i$ for $N$ datapoints, where the loss function $l_i$ for datapoint $x_i$ is:

$$ l_i(\theta, \phi) = -\mathbb{E}_{z \sim q_\theta(z \mid x_i)}\left[\log p_\phi(x_i \mid z)\right] + \mathrm{KL}\left(q_\theta(z \mid x_i) \,\|\, p(z)\right) \qquad (1) $$

The first term is the reconstruction loss or expected negative log-likelihood of the $i$-th datapoint. This term encourages the decoder to learn to reconstruct the data. Poor reconstruction of the data from its latent representation will incur a large cost in this loss term. The second term is a regularizer that we introduce to ensure that the distribution of the latent values approaches the prior distribution $p(z)$, specified as a Normal distribution with mean zero and variance one. The regularizer is the Kullback-Leibler divergence between the encoder's distribution $q_\theta(z \mid x)$ and $p(z)$. It measures how close $q$ is to $p$. The regularizer ensures that the representations of each datapoint are sufficiently diverse and distributed approximately according to a normal distribution, from which we can easily sample.

The variational autoencoder is trained using gradient descent to optimize the loss with respect to the parameters of the encoder and decoder, $\theta$ and $\phi$.
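As a rough illustration of the two loss terms, the sketch below evaluates the per-datapoint loss in plain Python. It rests on assumptions that the text does not state: a Bernoulli likelihood over binary pixels, a diagonal Gaussian encoder described by `mu` and `log_var`, and a single decoded sample `x_hat` standing in for the expectation. All function names are hypothetical.

```python
import math


def bernoulli_nll(x, x_hat, eps=1e-7):
    """Negative log-likelihood of binary pixels x under decoder probabilities x_hat.

    Assumption: a Bernoulli likelihood per pixel (natural for binary images,
    but not stated in the text).
    """
    total = 0.0
    for xi, pi in zip(x, x_hat):
        pi = min(max(pi, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= xi * math.log(pi) + (1 - xi) * math.log(1.0 - pi)
    return total


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL divergence between N(mu, diag(exp(log_var))) and N(0, I)."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def vae_loss(x, x_hat, mu, log_var):
    """Reconstruction term plus regularizer for one datapoint.

    x_hat is the decoder output for one sampled z, i.e. a one-sample
    estimate of the expectation in the reconstruction term.
    """
    return bernoulli_nll(x, x_hat) + kl_to_standard_normal(mu, log_var)


# Toy usage with a four-pixel "image" and a two-dimensional latent code.
loss = vae_loss(x=[1, 0, 1, 1], x_hat=[0.9, 0.2, 0.8, 0.6], mu=[0.3, -0.1], log_var=[0.0, -0.5])
```

When the encoder output already equals the prior (`mu` all zero, `log_var` all zero), the regularizer vanishes and only the reconstruction term remains.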

In our work, we selected Convolutional Neural Networks (CNNs) fukushima1980neocognitron ; lecun1990handwritten as the encoder and decoder architectures. CNNs are deep learning architectures that are particularly well-suited to image data lecun1995convolutional ; krizhevsky2012imagenet as they consider the two-dimensional structure of images and scale well to high-dimensional images. We tested several CNN architectures and finally chose the network architecture in Figure 2, with the encoder and the decoder each consisting of four convolutional blocks, each with a convolutional layer and a rectified linear unit (ReLU) layer (which introduces non-linearity to the network). The architecture takes as input an image of size 64 x 64 pixels, convolves the image through the encoder network and then condenses it to a 32-dimensional latent representation. The decoder then reconstructs the original image from the condensed latent representation. We implemented the variational autoencoder using the PyTorch library for Python.
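To make the shrinking of the image through four convolutional blocks concrete, the sketch below applies the standard convolution output-size formula. The kernel size, stride and padding are hypothetical values chosen for illustration only; the actual layer dimensions are those shown in Figure 2 and are not reproduced in the text.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_sizes(size=64, blocks=4, kernel=4, stride=2, padding=1):
    """Spatial size after each block (hyperparameters are assumptions, not the paper's)."""
    sizes = [size]
    for _ in range(blocks):
        sizes.append(conv_out(sizes[-1], kernel, stride, padding))
    return sizes


print(encoder_sizes())  # [64, 32, 16, 8, 4]
```

Under these assumed settings each block halves the spatial resolution, so the final 4 x 4 feature map would then be flattened and mapped to the 32-dimensional latent vector.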

Figure 2: Variational autoencoder architecture. Yellow blocks represent convolutional blocks (convolutional layer followed by ReLU layer) with dimensions corresponding to their output dimensions. The purple block is the learnt embedding $z$.

2.2 Street Network Data

The street networks used for model training and testing were obtained from OpenStreetMap haklay2008openstreetmap by ranking world cities by 2017 population and then selecting those with more than 500,000 inhabitants, for a total of 1,059 cities (we compiled the list of cities from the UN data website, accessed December 2018). We saved the street networks as images and, because the variational autoencoder requires images of a fixed spatial scale, we extracted a 3 x 3 km sample from the centre of each city image and resized it to a binary image of 64 x 64 pixels. The final dataset contained 1,059 binary images of 64 x 64 pixels, which we split into training (80%) and testing (20%) datasets. During model training, we augmented the training dataset by randomly cropping the images and flipping them horizontally. Figure 3 shows images for randomly selected cities.
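The split and augmentation steps can be sketched in plain Python as follows. Images are represented as nested lists of 0/1 pixels; the function names, the random seed and the crop size are hypothetical, and the text does not say how cropped images were brought back to 64 x 64 pixels.

```python
import random


def flip_horizontal(image):
    """Mirror a 2-D image (list of rows) left to right."""
    return [row[::-1] for row in image]


def random_crop(image, crop, rng):
    """Cut a random crop x crop window out of a 2-D image."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop)   # randint is inclusive at both ends
    left = rng.randint(0, width - crop)
    return [row[left:left + crop] for row in image[top:top + crop]]


def train_test_split(items, train_fraction=0.8, seed=0):
    """Shuffle the items and split them into training and testing lists."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# With 1,059 images, an 80/20 split gives 847 training and 212 testing images.
train_ids, test_ids = train_test_split(range(1059))
print(len(train_ids), len(test_ids))  # 847 212
```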

Figure 3: Example images of the street network in randomly selected cities, shown as a square window of 3 x 3km centered on the city centre.

3 Results

3.1 Reconstruction quality

The variational autoencoder was trained to minimise the loss function defined in Equation (1). The training is equivalent to minimising the image reconstruction loss, subject to a regularizer. We can inspect the training quality by visually comparing reconstructed images to their original counterparts. Figure 4 shows several examples of reconstructed images of urban street networks. As observed in the examples, the trained autoencoder performs well at reconstructing the overall shape of road networks and their main roads. The quality of the reconstruction drops for very dense road networks, where only the overall network shape is captured by the autoencoder (see the leftmost image in Figure 4). The observation suggests that variational autoencoders are better suited to reconstructing images with wide patches of pixels with similar properties than images with narrow structures such as roads.

Figure 4: Street network reconstructed (bottom) from the original images (top) using the trained autoencoder.

3.2 Urban networks comparison

The trained autoencoder learnt a mapping from the space of street network images (64 x 64 pixels, or 4,096 dimensions) to a lower-dimensional latent space (32 dimensions). The latent representation stores the information that the decoder needs to reconstruct the original image of the street network, so it is effectively a condensed representation of the street network that preserves much of its connectivity and spatial information. In the absence of well-defined similarity metrics for urban networks, this paper uses the condensed representations as vectors of street network features. Hereafter, we call these vectors urban network vectors. Urban network vectors can be used to measure the similarity between different street network forms and to perform further similarity analysis, such as clustering.

Similarity analysis

Firstly, we demonstrated the use of urban network vectors for measuring similarity between urban street forms. We measured the similarity between pairs of vectors as the Euclidean distance. Given two urban network vectors $u = (u_1, \dots, u_n)$ and $v = (v_1, \dots, v_n)$, where $n$ is the size of the latent space ($n = 32$ in this study), the Euclidean distance between $u$ and $v$ is defined as:

$$ d(u, v) = \sqrt{\sum_{i=1}^{n} (u_i - v_i)^2} \qquad (2) $$

Figure 5 shows randomly chosen street networks (top row) and their most similar networks based on the Euclidean distance between their urban network vectors. As shown in the figure, the proposed methodology enables finding street networks with matching properties, such as network density, spatial structure and orientation, without explicitly including any of the properties in the similarity computation.
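A nearest-neighbour lookup of this kind can be sketched as below. The vectors are toy three-dimensional stand-ins for the 32-dimensional urban network vectors, and the city keys and function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def most_similar(query, vectors, k=3):
    """Return the k keys whose vectors lie closest to vectors[query]."""
    target = vectors[query]
    ranked = sorted(
        (euclidean(target, vec), name)
        for name, vec in vectors.items()
        if name != query
    )
    return [name for _, name in ranked[:k]]


# Toy latent vectors; real ones would come from the trained encoder.
toy_vectors = {
    "city_a": [0.0, 0.0, 0.0],
    "city_b": [0.1, 0.0, -0.1],
    "city_c": [2.0, 1.5, -1.0],
    "city_d": [0.3, 0.2, 0.0],
}
print(most_similar("city_a", toy_vectors, k=2))  # ['city_b', 'city_d']
```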

Figure 5: Street network images (top row) with most similar street networks (rows below) based on the Euclidean distance between their urban network vectors. The latent representations, obtained using the trained encoder, seem to capture well network properties such as density, orientation or road shape.

Secondly, we used the urban network vectors to detect clusters of similar urban street forms. We used the K-means clustering algorithm witten2016data . It is a popular clustering approach that assigns data points to clusters based on distances to cluster centroids. The algorithm requires specifying the number of clusters a priori. We identified $K = 3$ as the optimal number of clusters for the street image data using the elbow method dangeti2017statistics . As shown in Figure 6(a), the obtained clusters seem to separate street networks based on their density only, failing to reflect more subtle network differences, such as road connectivity or road shapes. When we increased the number of clusters to $K = 6$ in Figure 6(b), we could differentiate road networks based on more subtle network characteristics, such as disconnectedness of roads in the first cluster (top-left in Figure 6(b)) or large gaps in road provision in the second cluster (top-centre in Figure 6(b)). We visualised both cluster assignments in Figure 6 (right) by projecting the thirty-two-dimensional urban network vectors to a two-dimensional grid using the t-SNE algorithm maaten2008visualizing for dimensionality reduction. The visualisations show that street networks naturally cluster into the three groups that were detected by the K-means algorithm. The three clusters are further mapped in Figure 7 to investigate spatial patterns in urban form variation.
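The clustering step and the elbow heuristic can be sketched with a small, self-contained implementation of Lloyd's K-means algorithm. This is an illustrative stand-in rather than the code used in the study: the initialisation (random data points as starting centroids), the fixed iteration count and the toy data are all assumptions.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c])) for p in points]


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm; returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


# Toy data: two well-separated groups of 2-D points.
toy_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]

# Elbow method: compute the within-cluster sum of squares (inertia) for each k
# and look for the k after which it stops dropping sharply.
inertias = {k: kmeans(toy_points, k)[2] for k in range(1, 5)}
```

For the toy data the inertia falls steeply from one cluster to two and only marginally afterwards, which is the "elbow" used to pick the number of clusters.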

Figure 6: (a) Three or (b) six clusters of urban street forms obtained by applying the K-means algorithm to the condensed urban network vectors. Subfigures show example street networks in each cluster (top left), street network density in each cluster (bottom left) approximated using pixel intensity of street images, and a two-dimensional visualisation of all urban vectors with colour-coded cluster membership.

Figure 7: Distribution of urban street forms across the globe. Each dot represents a city and is colour-coded according to cluster memberships in Figure 6(a). Despite limited data size, spatial trends start to emerge, such as the concentration of high-density urban networks in California, USA (red cluster) and low-density urban networks in south-eastern Asia (black cluster).

3.3 Urban networks generation

In Section 3.2, we used the autoencoder to compress real street images to low-dimensional vectors which we then used to make quantitative comparisons. This employed one strength of variational autoencoders: the ability to encode high-dimensional observations as meaningful low-dimensional representations. The second strength pertains to the ability to generate realistic urban street forms that match the complexity of urban forms across the globe. The ability could potentially advance the current state-of-the-art in simulations of urban forms and socio-economic processes taking place on urban networks.

To generate a synthetic urban network, we first sample an embedding value $z$ from the prior distribution $p(z)$, specified as a standard Gaussian (see Section 2.1), and then pass the value through the decoder network to obtain a corresponding image. Images corresponding to several embedding samples are shown in Figure 8. As shown in the figure, the generated images lack the detail of the real street images in Figure 3. Although the samples follow the general structure of road networks with major roads and areas of mixed-density minor roads, the decoder fails to reconstruct details of dense road segments and instead renders them blurred. We attribute the problem to the small number of images used in the study. Although the proposed model is flexible enough to model urban street networks, which is confirmed by the high-quality reconstructions of real images in Figure 4, it does not see enough images to learn to interpolate between them and to sample new forms of street networks in sufficient detail.
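The sampling procedure can be sketched as follows. The decoder here is a hypothetical stand-in (a random linear map followed by a sigmoid), used only to show the data flow from a 32-dimensional latent sample to a 64 x 64 image; it is not the trained convolutional decoder from the study and produces no meaningful street patterns.

```python
import math
import random

LATENT_DIM = 32  # size of the latent space reported in the paper
IMAGE_SIDE = 64  # images are 64 x 64 pixels


def sample_prior(rng, dim=LATENT_DIM):
    """Draw z from the standard Gaussian prior N(0, I)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def make_toy_decoder(rng, dim=LATENT_DIM, side=IMAGE_SIDE):
    """Hypothetical stand-in for a trained decoder: random linear map + sigmoid."""
    weights = [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(side * side)]

    def decode(z):
        probs = [
            1.0 / (1.0 + math.exp(-sum(w * zi for w, zi in zip(row, z))))
            for row in weights
        ]
        # Reshape the flat list of pixel probabilities into rows of the image.
        return [probs[r * side:(r + 1) * side] for r in range(side)]

    return decode


rng = random.Random(0)
decode = make_toy_decoder(rng)
image = decode(sample_prior(rng))                                   # pixel probabilities
binary = [[1 if p > 0.5 else 0 for p in row] for row in image]      # thresholded image
```

The 0.5 threshold that turns probabilities into a binary street image is likewise an assumption made for this illustration.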

Figure 8: Examples of synthetic urban street forms generated by passing a randomly sampled latent code through the decoder network.

4 Discussion and conclusions

This study is an early exploration of how modern generative machine learning models such as variational autoencoders could augment our ability to model urban forms. With the ability to extract key urban features from high-dimensional urban imagery, variational autoencoders open new avenues to integrating high-dimensional data streams in urban modelling. The study considered images of street networks, but the proposed methodology could be equally applied to other image data, such as urban satellite imagery.

Variational autoencoders were selected among deep generative models moosavi2017urban ; albert2018modeling due to their two capabilities: firstly to condense images to low-dimensional representations, secondly to generate new previously unseen images that match the complexity of observed images. The first capability enabled us to extract key urban metrics from street network images, the second gave us the power to generate realistic images of previously unseen urban networks.

Our results, based on 1,059 city images across the globe, showed that VAEs successfully condensed urban images into low-dimensional urban network vectors. This enabled quantitative similarity analysis between urban forms, such as clustering. What is more, VAEs managed to generate new urban forms with complexity matching that of the observed data. Unfortunately, the resolution of the generated images was low, which we attribute to the small size of the dataset. Future work will repeat model training on a much larger corpus of images to improve the generative quality.

Despite the promising results, the study opens essential questions for future work. The first question pertains to the black-box nature of deep learning models that lack comprehensive human interpretability. This limitation is already receiving much attention in the deep learning literature lime ; shrikumar2017learning ; lundberg2017unified . In this study, the limitation manifests itself in our lack of understanding of how latent space representations of urban networks relate to established network metrics newman2010networks . A related question refers to the ability to evaluate the quality of model outputs, i.e. latent representations and synthetic images. Again, quality assessment of deep generative models is a hot topic in the broader deep learning research community (see for example wu2016quantitative ). Future work could address the problem from the perspective of urban network science.

5 Declarations

Availability of data and materials

All data and program source code described in this article is available to any interested parties. The source code and experiments are available at GitHub at the following URL: The raw data and datasets generated during this study are available upon request.

Competing interests

The authors declare that they have no competing interests.


Funding

No specific funding was received for this study.

Authors’ contributions

KK designed and implemented the methodology, executed the computer runs, and wrote the initial version of the article. RM prepared street network data and extensively revised the article. Both authors read and approved the final manuscript.


Acknowledgements

The authors would like to thank Szymon Zareba and Adam Gonczarek (Alphamoon Ltd) for advice on deep generative models during the course of the project.

Authors’ information

KK is a lecturer in geospatial machine learning at the Bartlett’s Centre for Advanced Spatial Analysis, University College London, UK and a machine learning researcher at Alphamoon, PL. She develops machine learning algorithms for urban modelling and sensor data mining. Her research interests include geospatial data mining, sensor data fusion and machine learning for sensor networks.

RM is a senior research fellow at the Bartlett’s Centre for Advanced Spatial Analysis, University College London, UK. His academic interests include urban complex networks, information transfer in social systems, spatial interaction models and pedestrian flows. One of his main research topics is the application of multifractal measures to different urban aspects, such as street networks and social inequality.


References

  • (1) Albert, A., Strano, E., Kaur, J., González, M.: Modeling urbanization patterns with generative adversarial networks. In: IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium, pp. 2095–2098. IEEE (2018)
  • (2) Arcaute, E., Molinero, C., Hatna, E., Murcio, R., Vargas-Ruiz, C., Masucci, A.P., Batty, M.: Cities and regions in britain through hierarchical percolation. Royal Society open science 3(4), 150691 (2016). DOI
  • (3) Barthélemy, M., Flammini, A.: Modeling urban street patterns. Physical review letters 100(13), 138702 (2008)
  • (4) Buhl, J., Gautrais, J., Reeves, N., Solé, R., Valverde, S., Kuntz, P., Theraulaz, G.: Topological patterns in street networks of self-organized urban settlements. The European Physical Journal B-Condensed Matter and Complex Systems 49(4), 513–522 (2006)
  • (5) Cardillo, A., Scellato, S., Latora, V., Porta, S.: Structural properties of planar graphs of urban street patterns. Physical Review E 73(6), 066107 (2006)
  • (6) Dangeti, P.: Statistics for machine learning. Packt Publishing Ltd (2017)
  • (7) Fukushima, K.: Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological cybernetics 36(4), 193–202 (1980)
  • (8) Haklay, M., Weber, P.: Openstreetmap: User-generated street maps. IEEE Pervasive Computing 7(4), 12–18 (2008)
  • (9) Kingma, D.P., Welling, M.: Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013)
  • (10) Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp. 1097–1105 (2012)
  • (11) LeCun, Y., Bengio, Y., et al.: Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks 3361(10), 1995 (1995)
  • (12) LeCun, Y., Boser, B.E., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W.E., Jackel, L.D.: Handwritten digit recognition with a back-propagation network. In: Advances in neural information processing systems, pp. 396–404 (1990)
  • (13) Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems, pp. 4765–4774 (2017)
  • (14) Maaten, L.v.d., Hinton, G.: Visualizing data using t-sne. Journal of machine learning research 9(Nov), 2579–2605 (2008)
  • (15) Masucci, A.P., Smith, D., Crooks, A., Batty, M.: Random planar graphs and the london street network. The European Physical Journal B 71(2), 259–271 (2009)
  • (16) Moosavi, V.: Urban morphology meets deep learning: Exploring urban forms in one million cities, town and villages across the planet. arXiv preprint arXiv:1709.02939 (2017)
  • (17) Newman, M.: Networks: an introduction. Oxford university press (2010)
  • (18) Murcio, R., Masucci, A.P., Arcaute, E., Batty, M.: Multifractal to monofractal evolution of the london street network. Phys Rev E 92(6), 062130 (2015). DOI
  • (19) Ribeiro, M.T., Singh, S., Guestrin, C.: "Why should I trust you?": Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, pp. 1135–1144 (2016)
  • (20) Shrikumar, A., Greenside, P., Kundaje, A.: Learning important features through propagating activation differences. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 3145–3153. JMLR. org (2017)
  • (21) Strano, E., Viana, M., da Fontoura Costa, L., Cardillo, A., Porta, S., Latora, V.: Urban street networks, a comparative analysis of ten european cities. Environment and Planning B: Planning and Design 40(6), 1071–1086 (2013)
  • (22) Witten, I.H., Frank, E., Hall, M.A., Pal, C.J.: Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann (2016)
  • (23) Wu, Y., Burda, Y., Salakhutdinov, R., Grosse, R.: On the quantitative analysis of decoder-based generative models. arXiv preprint arXiv:1611.04273 (2016)