We study the problem of machine learning for automated urban planning. Urban planning is an interdisciplinary and complex process that involves public policy, social science, engineering, architecture, landscape, and other related fields. In this paper, we use urban planning to refer to the effort of designing land-use configurations for a target region, which is a reduced yet essential task of urban planning (Van Assche et al., 2013). Effective urban planning can help mitigate the operational and social vulnerabilities of an urban system, such as high taxes, crime, traffic congestion and accidents, pollution, depression, and anxiety (Yiftachel, 1989).
This observation motivates us to rethink urban planning in the era of artificial intelligence: What role does deep learning play in urban planning? Can machines learn, at a human level of capability, to calculate land-use configurations automatically and quickly? If so, machines can serve as planning assistants, and urban planning professionals can then adjust machine-generated plans for specific needs.
Due to the high complexity and specificity of urban systems, urban planners need to consider and balance various planning requirements, such as proximity metrics (e.g., distances to important places), access indexes (e.g., accessibility to food, recreation, goods, services, entertainment, transit, municipal services, mobility indexes), mobility indices (e.g., sidewalks, bike lanes, speed limits, crash rates), and emergency responses (e.g., hospitals, fire departments). Planning therefore relies heavily on empirical experience and domain knowledge (Maheshwari et al., 2016). As a result, it is highly appealing to pursue a fast, automated, and machine-assisted planning strategy. Recent advances in deep learning, particularly deep adversarial and generative learning, offer great potential to teach a machine to design and generate city configurations at a human level of capability (Zhang et al., 2020, 2019; Ding et al., 2018; Liu et al., 2020). This motivates us to rethink urban planning through the lens of deep learning: can AI automate the calculation of land-use configurations and the balancing of various planning factors, so that professional planners only need to adjust machine-generated plans for specific needs?
All of the above evidence prompts us to develop a data-driven, AI-enabled automated urban planner. However, three unique challenges arise in achieving this goal: (1) How can we quantify a land-use configuration plan? (2) How can we develop a deep adversarial generative learning framework to learn the good and the bad of existing urban communities as data-driven knowledge, and, moreover, generate quality urban configurations? (3) How can we evaluate the quality of generated land-use configurations? Next, we introduce our research insights and solutions for the three challenges.
First, as the objective is to teach a machine to generate the land-use configuration of a target region, it is critical to define a machine-perceivable structure for the land-use configuration. In practice, the land-use configuration plan of a region can be geographically defined by a set of Points of Interest (POIs) with their corresponding locations (e.g., latitudes and longitudes) and urban functionality categories (e.g., shopping, banks, education, entertainment, residential). A close look reveals that a land-use configuration is a high-dimensional indicator that specifies what, where, and how many facilities we should build in a target region. After exploring large-scale land-use data, we observe that a land-use configuration exhibits not only location-location statistical autocorrelation but also location-functionality statistical autocorrelation. To preserve such statistical correlations, we propose to represent a land-use configuration as a latitude-longitude-channel tensor, where each channel is a specific category of POIs distributed across the target area, and the value of each entry is the number of POIs. In this way, the tensor describes not only the location-location interaction but also the location-function interaction.
Second, after we quantitatively define the land-use configuration, the next question is how to teach a machine to automatically generate one. We analyze large-scale urban residential community data and find that: (1) an urban community can be viewed as an attributed node in a socioeconomic network (a city as a community-community network), and this node proactively interacts with surrounding nodes (environments); (2) the coupling, interaction, and coordination between a community and its surrounding environments significantly influence the livability, vibrancy, and quality of the community. Based on these observations, we aim to develop a function that maps the surrounding contexts to a well-planned configuration tensor. Recent developments in deep generative and adversarial learning offer great potential for solving this problem. We reformulate the task into an adversarial learning paradigm, in which: (1) a neural generator acts as a machine planner that generates a land-use configuration; (2) the generator generates a configuration from the feature representation of the surrounding spatial contexts; (3) the surrounding context feature representation is learned collectively from spatial graphs via self-supervised representation learning; (4) a neural discriminator classifies whether a land-use configuration is well-planned (positive) or poorly-planned (negative); (5) a new mini-max loss function guides the generator to learn the configuration patterns of well-planned areas rather than those of poorly-planned areas.
Third, how can we evaluate the quality of a generated land-use configuration? This has been a long-standing challenging question. The most solid and sound validation is to collaborate with urban developers and city agencies to implement a machine-generated configuration into a target region to observe the development of the region in the following years. However, the validation method is not practical in reality. In this paper, we design and develop three strategies to assess the generated configurations: (1) We leverage different distance measurements to measure the similarity between generated configurations and well-planned configurations. If the distance is small, it indicates that our generated configurations preserve the overarching distribution characteristics of well-planned configurations. (2) We develop a scoring model to score the quality of the generated configurations. Specifically, since we have collected a set of existing land-use configurations and 0-1 labels (1: well-planned 0: poorly-planned) as training data, we train a regression model to predict the quality score ranging from 0 to 1. After that, given a machine-generated configuration as testing data, we use the regression model to predict its corresponding score. (3) We use a variety of visualization approaches to visualize the generated configurations, so domain experts can evaluate the generated quality and rationality.
Our preliminary work (Wang et al., 2020b) proposed a fundamental automated urban planning framework to automatically generate land-use configurations. The stability and efficiency of this preliminary framework can be further improved from a computational perspective. For this purpose, in this journal version, we add a new conditioning augmentation module to the preliminary framework to enhance its performance. Specifically, we first estimate the distribution of the embedding space of surrounding spatial contexts. Then, we sample embedding vectors from the estimated distribution to replace the embeddings of surrounding spatial contexts. The benefit is that embedding space distribution estimation augments the data and overcomes the sparsity of surrounding spatial context data. We then propose a new loss function that considers the embedding space regularization and standardization of surrounding spatial contexts. The new loss function can accelerate convergence and improve the efficiency of learning. In addition, aside from the prediction-based and visualization-based validation approaches (Wang et al., 2020b), in this paper we design a distance-based strategy to evaluate the quality of machine-generated configuration plans.
In summary, in both our preliminary work (Wang et al., 2020b) and this extended version, we develop an adversarial learning framework to generate effective land-use configurations by learning from urban geography, human mobility, and socioeconomic data. Specifically, our contributions are: 1) We develop a latitude-longitude-channel tensor to quantify a land-use configuration plan. 2) We propose a socioeconomic interaction perspective to understand urban planning as a process of optimizing the coupling between a community and surrounding environments. 3) We reformulate the automated urban planning problem into an adversarial learning framework that maps surrounding spatial contexts into a configuration tensor. 4) We computationally enhance the efficiency and stability of the proposed framework by devising a conditioning augmentation module via leveraging a new sampling technique and a new optimization loss function. 5) We develop multiple strategies (i.e., distance-based, prediction-based, and visualization-based) to validate the effectiveness of our framework on real-world data.
2. Problem Statement and Framework Overview
2.1.1. Target Area
refers to a square geographical area that is centered on a geographical location (described by latitude and longitude).
2.1.2. Surrounding Contexts
refer to the surrounding squares that wrap the target area from different directions. Each square in the surrounding contexts has the same shape as the target area. In our research setting, we assume that information about the surrounding contexts, such as demographic data, social activity, and traffic volume, is known. According to the geographical vicinity and the information of the surrounding contexts, we construct a spatial attributed graph, in which the vertices are the squares of the surrounding contexts and the attributes of each vertex are extracted from the information of each square. Figure 2 shows the geographical spatial relations between a target area and the surrounding contexts, in which different contexts have different urban utilities and characteristics. Our framework aims to generate the land-use configuration of the target area based on the surrounding contexts.
2.2. Problem Statement
As mentioned before, we aim to build an automated generation framework that generates the land-use configuration of the target area based on the surrounding contexts. Formally, we are given a target area, its surrounding contexts, and a spatial attributed graph constructed by extracting explicit features, such as traffic conditions and economic development, from the surrounding contexts. We aim to find a mapping function that takes the spatial attributed graph as input and outputs the land-use configuration of the target area, represented as a longitude-latitude-channel tensor. In this paper, because the target area is square, the surrounding contexts consist of eight squares.
2.3. Framework Overview
Figure 3 shows an overview of our proposed method (LUCGAN). The framework has two main phases: (i) the surrounding contexts embedding phase; (ii) the land-use configuration generation phase. In the surrounding contexts embedding phase, we first extract explicit features of the surrounding contexts from multiple aspects, such as value-added space and POI distribution. Then, we model the eight squares of the surrounding contexts as eight vertices and map the explicit features to the vertices as the corresponding attributes to construct a spatial attributed graph. Next, we employ a graph embedding model to preserve the information of the graph in an embedding vector, so that the final embedding vector represents the whole surrounding contexts. In the land-use configuration generation phase, we first input the embedding of the contexts into an extended generative adversarial network (GAN). Then, the GAN model gradually learns the distribution of the well-planned land-use configurations rather than that of the poorly-planned configurations. Finally, when the model converges, the extended GAN can produce suitable land-use configurations based on the embeddings of the surrounding contexts.
3. Automatic planner for land-use configuration
In this section, we first introduce the strategy to represent surrounding contexts. Then, we detail how to quantify and evaluate the quality of land-use configurations. Finally, we develop an automated urban planner based on deep generative adversarial paradigm.
3.1. Extraction of Explicit Features of Surrounding Contexts
The surrounding contexts affect the land-use configuration of a target area. For instance, if the surrounding contexts already contain many recreational facilities, we will not plan many recreational buildings in the target area, in order to avoid wasting resources. Instead, we prefer other kinds of buildings, such as commercial or educational buildings, so that the target area coexists in harmony with the surrounding contexts. Based on this observation, it is necessary to take the surrounding contexts into consideration during the land-use configuration generation process. In this paper, we extract the explicit features of the surrounding contexts from four aspects:
Value-added Space. Commonly, the variation of housing prices reflects the value-added space of an area. Thus, we calculate the changing trend of housing prices of each context over consecutive months. Taking one context as an example, we first obtain its list of monthly housing prices. Then, we calculate the changing trend by subtracting the previous month's price from the current month's price, which gives one trend value per month. Finally, we collect the housing price trends of all contexts together.
POI Ratio. Since different POIs provide different services for residents, the ratio of different kinds of POIs reflects the utility of an area. Therefore, we calculate the POI ratio of each context. Taking one context as an example, we first count the number of POIs in each POI category to form a vector. Then, we divide each item in the vector by the total number of POIs in the context to obtain the POI ratio vector, whose i-th item is the ratio of the i-th POI category and whose length is the number of POI categories. Finally, we collect the POI ratio vectors of all contexts together.
Public Transportation. Public transportation (e.g., bus, subway) is one of the most important travel modes due to its convenience and economy, so we consider the public transportation of each context. Taking one context as an example, we extract features from bus trajectory and bus station data from five perspectives: (1) the leaving volume of the context in one day; (2) the arriving volume in one day; (3) the transition volume in one day; (4) the density of bus stops; (5) the average smart card balance. These five values form the public transportation feature vector of the context. Finally, we collect the feature vectors of all contexts together.
Private Transportation. Private transportation (e.g., taxi, cab) is another important travel mode for individuals due to its flexibility. We extract the private transportation features of each context from taxi trajectory data from five perspectives. Taking one context as an example, the five features are defined as follows: (1) the leaving volume of the context in one day; (2) the arriving volume in one day; (3) the transition volume in one day; (4) the average driving velocity of taxis in the context in one hour; (5) the average commute distance of taxis in the context in one hour. These five values form the private transportation feature vector of the context. Finally, we collect the feature vectors of all contexts together.
After that, we obtain an explicit feature set from the contexts. The set contains the four kinds of features above (value-added space, POI ratio, public transportation, and private transportation), which describe the surrounding contexts from the aforementioned perspectives.
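The first two feature groups can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the toy values, and the choice to concatenate the four groups into one vector per context are assumptions made here.

```python
from collections import Counter


def price_trend(prices):
    """Month-over-month change: current price minus previous price."""
    return [curr - prev for prev, curr in zip(prices, prices[1:])]


def poi_ratio(poi_categories, num_categories):
    """Share of each POI category among all POIs in one context square."""
    counts = Counter(poi_categories)
    total = len(poi_categories)
    if total == 0:
        return [0.0] * num_categories
    return [counts.get(c, 0) / total for c in range(num_categories)]


def context_features(prices, poi_categories, num_categories, public, private):
    """Assumption: the four feature groups are concatenated into one vector."""
    return (price_trend(prices)
            + poi_ratio(poi_categories, num_categories)
            + list(public)
            + list(private))


# Hypothetical toy values for one context square.
prices = [100.0, 102.0, 101.0, 105.0]      # monthly housing prices
pois = [0, 0, 1, 2]                        # category index of each POI
public = [5, 4, 9, 0.2, 30.0]              # five public transportation features
private = [7, 6, 13, 25.0, 4.5]            # five private transportation features
features = context_features(prices, pois, 3, public, private)
```

Stacking one such vector per context square gives a feature matrix with one row per vertex of the spatial graph.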
3.2. Constructing Spatial Attributed Graphs with Explicit Features as Node Attributes
The surrounding contexts wrap the target area from different directions, resulting in spatial correlation among areas. To capture the spatial correlations among the areas, we construct a spatial attributed graph. Specifically, Figure 4 shows the graph structural relation between a target area and its surrounding contexts, where the blue vertices represent the surrounding contexts; the orange vertex indicates the target area; the edge between two vertices reflects the spatial connectivity between them.
Then, we map the explicit features to the spatial graph structure as the corresponding node attributes. Figure 5 expresses the mapping process. The final spatial attributed graph not only reflects the spatial correlation among different context squares but also depicts the utility characteristics of each square.
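As a rough illustration of the graph structure, the sketch below builds an adjacency matrix over the eight context squares around a target area. The edge rule (two context squares are connected when they share a side) is an assumption made here; the edge definition actually used is the one depicted in Figure 4.

```python
def build_context_graph():
    """Adjacency matrix over the eight squares surrounding a target area.

    Squares are indexed by their (row, col) position in a 3x3 grid whose
    centre (1, 1) is the target area. Assumption: two context squares are
    connected when they share a side.
    """
    cells = [(r, c) for r in range(3) for c in range(3) if (r, c) != (1, 1)]
    index = {cell: i for i, cell in enumerate(cells)}
    adjacency = [[0] * len(cells) for _ in cells]
    for (r, c), i in index.items():
        for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
            neighbour = (r + dr, c + dc)
            if neighbour in index:
                adjacency[i][index[neighbour]] = 1
    return cells, adjacency


cells, adjacency = build_context_graph()
```

Under this edge rule the eight squares form a ring around the target area, so every vertex has exactly two neighbours.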
3.3. Learning Representation of Spatial Attributed Graphs
Figure 6 shows the spatial representation learning framework, which preserves the explicit features and spatial relations of the spatial attributed graph in a low-dimensional vector. Formally, the spatial attributed graph consists of an adjacency matrix, which expresses the accessibility among different nodes, and a feature matrix, which holds the explicit features of the nodes. To obtain the latent graph embedding, we minimize the reconstruction loss between the original graph and the reconstructed graph through an encoding-decoding paradigm.
The encoding part has two Graph Convolutional Network (GCN) layers. The first GCN layer takes the adjacency matrix and the feature matrix as input and outputs a feature matrix in a low-dimensional space. The layer uses the diagonal degree matrix of the graph and a weight matrix whose size is set by the output dimension of the layer, and it is followed by an activation function. The second GCN layer takes the output of the first layer and the adjacency matrix as input and outputs the mean and the variance of a normal distribution, again using a weight matrix whose size is set by the output dimension of the layer. Next, we use the reparameterization trick to obtain the latent representation from the mean and the variance.
The decoding module takes the latent representation as input and outputs the reconstructed adjacency matrix through a decoding layer activated by the sigmoid function. The decoding layer is based on an inner product of the latent representation, which helps capture the spatial correlation among different contexts.
During the training phase, we minimize a joint loss function that includes two parts: the first part is the Kullback-Leibler divergence between the distribution of the latent representation and the standard prior distribution, and the second part is the squared error between the original adjacency matrix and the reconstructed adjacency matrix. The training process pushes the reconstructed adjacency matrix toward the original one and the distribution of the latent representation toward the prior. When the model converges, the latent representation preserves the feature and structural information of the surrounding contexts.
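The encoder, decoder, and loss described above follow the general pattern of a variational graph auto-encoder (VGAE; Kipf and Welling, 2016). One way to write that pattern down is sketched below. The symbols ($A$ for the adjacency matrix, $X$ for the feature matrix, $D$ for the diagonal degree matrix, $W_0$, $W_\mu$, $W_\sigma$ for the weights, $z$ for the latent representation, $\hat{A}$ for the reconstruction), the ReLU activation, and the log-variance parameterization are assumptions chosen here for illustration, and normalization constants are omitted.

```latex
\begin{align}
H &= \mathrm{ReLU}\left(D^{-1/2} A D^{-1/2} X W_0\right) \\
\mu &= D^{-1/2} A D^{-1/2} H W_\mu, \qquad
\log \sigma^2 = D^{-1/2} A D^{-1/2} H W_\sigma \\
z &= \mu + \sigma \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I) \\
\hat{A} &= \mathrm{sigmoid}\left(z z^{\top}\right) \\
\mathcal{L} &= \mathrm{KL}\left(q(z \mid X, A) \,\|\, p(z)\right)
  + \lVert A - \hat{A} \rVert_F^2, \qquad p(z) = \mathcal{N}(0, I)
\end{align}
```

The squared-error reconstruction term matches the description in the text; the original VGAE paper uses a cross-entropy reconstruction term instead.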
3.4. Land-use Configuration Quantification and Quality Measurement
A land-use configuration indicates the locations of different kinds of POIs in an area. To make a machine perceive and understand the configuration, we represent it as a longitude-latitude-channel tensor, where each channel denotes one POI category and the whole tensor represents the POI distribution in the area. Figure 7 shows the construction process of the longitude-latitude-channel configuration tensor. We first divide an unplanned target area into a grid of squares. Then, for each square, we count the number of POIs belonging to each POI category and fill the count into the corresponding entry. In this way, we obtain the land-use configuration tensor. If we pick one channel from the tensor, we obtain the POI distribution of the corresponding POI category over the whole area.
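The counting procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the bounding-box representation of the target area, the axis order, and the toy POIs are all hypothetical.

```python
def build_config_tensor(pois, bounds, grid_size, num_categories):
    """Count POIs per (row, col, category) cell of a square target area.

    pois: iterable of (latitude, longitude, category_index)
    bounds: (lat_min, lat_max, lon_min, lon_max) of the target area
    """
    lat_min, lat_max, lon_min, lon_max = bounds
    tensor = [[[0] * num_categories for _ in range(grid_size)]
              for _ in range(grid_size)]
    for lat, lon, category in pois:
        if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
            continue  # skip POIs outside the target area
        # Map coordinates to a grid cell; clamp the upper edge into the last cell.
        row = min(int((lat - lat_min) / (lat_max - lat_min) * grid_size),
                  grid_size - 1)
        col = min(int((lon - lon_min) / (lon_max - lon_min) * grid_size),
                  grid_size - 1)
        tensor[row][col][category] += 1
    return tensor


# Hypothetical POIs: (latitude, longitude, category index).
pois = [(0.1, 0.1, 0), (0.15, 0.12, 0), (0.9, 0.9, 1), (1.0, 1.0, 1), (2.0, 2.0, 0)]
tensor = build_config_tensor(pois, (0.0, 1.0, 0.0, 1.0), grid_size=2, num_categories=2)

# One channel is the spatial distribution of a single POI category.
channel_0 = [[cell[0] for cell in row] for row in tensor]
```

A larger `grid_size` yields smaller squares and a sparser tensor, which is the trade-off examined in the square-size experiments.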
Since we expect the generation framework to produce well-planned land-use configurations, the next question is how to evaluate the quality of a configuration. In the classical urban planning domain, there are no general evaluation standards because of the complexity of urban systems. To make our framework produce land-use configurations that people are satisfied with, we introduce a quality score to evaluate land-use configurations. In our experiments, the score combines POI diversity and check-in frequency. Formally, we first count the total number of mobile check-in events in an area, which reflects the intensity of social activity. Then, we count the number of distinct POI categories in the area as the POI diversity, which reflects the completeness of urban functions. Next, we combine the two indicators into a single score following the formula of Wang et al. (2018d). If the score exceeds a threshold, the configuration of the area is regarded as well-planned; otherwise, it is regarded as poorly-planned. The value of the threshold is determined by the given requirements.
3.5. Land-use Configuration Generative Adversarial Networks
Recently, Generative Adversarial Networks (GANs) have achieved remarkable results and shown strong generative abilities. This motivates us to formulate the land-use configuration generation task in the learning paradigm of GANs.
In our preliminary version (Wang et al., 2020b), we proposed a land-use configuration GAN (LUCGAN), whose network structure is illustrated in Figure 8. In LUCGAN, the generator generates a land-use configuration based on the embeddings of the surrounding contexts. The discriminator provides feedback to the generator so that it generates configurations close to well-planned configurations rather than poorly-planned configurations.
Algorithm 1 shows the training process of LUCGAN. Specifically, in one training iteration, we first update the parameters of the discriminator a given number of times, then update the parameters of the generator once based on the current discriminator. To update the discriminator, we sample well-planned configurations, surrounding context embeddings, and poorly-planned configurations, and use them to maximize the loss function in line 10 of Algorithm 1. Intuitively, we expect the discriminator to provide positive feedback for well-planned configurations and negative feedback for poorly-planned and generated configurations. In this way, the discriminator improves its ability to distinguish land-use configurations. To update the generator, we first sample surrounding context embeddings and then minimize the loss function in line 14 of Algorithm 1. Intuitively, we use the discriminator to improve the ability of the generator to produce data structures similar to well-planned configurations.
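The update order and the intent of the two objectives can be sketched as below. Algorithm 1 itself is not reproduced in this section, so the concrete loss form is an assumption: the sketch uses the standard log-likelihood GAN objective, extended with an extra term for poorly-planned samples. All function names are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def discriminator_objective(d_well, d_poor, d_generated):
    """Quantity the discriminator maximizes (assumed form).

    Each argument is a list of discriminator outputs in (0, 1), read as the
    probability that a configuration is well-planned. Well-planned samples
    are rewarded for high scores; poorly-planned and generated samples are
    rewarded for low scores.
    """
    return (_mean([math.log(p) for p in d_well])
            + _mean([math.log(1.0 - p) for p in d_poor])
            + _mean([math.log(1.0 - p) for p in d_generated]))


def generator_loss(d_generated):
    """Quantity the generator minimizes (assumed form): it decreases as the
    discriminator scores generated configurations as well-planned."""
    return _mean([math.log(1.0 - p) for p in d_generated])


def training_schedule(num_iterations, discriminator_steps):
    """Order of updates: several discriminator steps, then one generator step."""
    schedule = []
    for _ in range(num_iterations):
        schedule.extend(["D"] * discriminator_steps)
        schedule.append("G")
    return schedule
```

For example, `training_schedule(2, 3)` yields three discriminator updates followed by one generator update, repeated twice.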
However, the embeddings of the surrounding contexts come from a feature space constructed from spatial attributed graphs. Owing to the small number of graphs, their distribution in the feature space is sparse and discrete, which makes the learning process of the GAN model unstable. To overcome this limitation and improve model performance, we propose an enhanced version of LUCGAN, whose network structure is shown in Figure 9. Compared with Figure 8, we add a conditioning augmentation module (Zhang et al., 2017) to the framework. Specifically, we first assume that the prior distribution of the surrounding context embeddings is a normal distribution. Then, we estimate the mean and variance of the distribution based on the original embeddings. Next, we sample a vector from the distribution and combine it with a vector sampled from the standard normal distribution to form the input vector of the model. This process improves model performance because it mitigates the discreteness and sparsity of the original graphs in the feature space.
In addition, owing to the structural differences between the original LUCGAN and the enhanced LUCGAN, we customize a new training algorithm for the enhanced LUCGAN, shown in Algorithm 2. Compared with Algorithm 1, there are two improvements: (1) the conditioning augmentation module (lines 4 to 7 in Algorithm 2); (2) the loss function of the generator (line 19 in Algorithm 2). For the conditioning augmentation module, we calculate the mean and the variance from the original surrounding context embeddings. Then, we use the reparameterization trick to sample a vector from the normal distribution with that mean and variance, and concatenate it with a vector sampled from a normal distribution to form the surrogate context embeddings. For the learning process of the discriminator, the main logic is the same as in Algorithm 1; we only replace the surrounding context embeddings with the surrogate context embeddings. For the loss of the generator, besides improving the generative capability of the generator, we also minimize the Kullback-Leibler (KL) divergence between the estimated embedding distribution and a normal prior, which enhances the smoothness of the surrounding context embeddings in the feature space and avoids overfitting.
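The conditioning augmentation step can be sketched as follows. This is a simplified illustration under stated assumptions: the mean and variance are estimated empirically per dimension (in Zhang et al., 2017, they are produced by learned layers), the KL term is taken against a standard normal prior, and all names and toy values are hypothetical.

```python
import math
import random


def estimate_gaussian(embeddings):
    """Per-dimension mean and variance of the original context embeddings."""
    n, dim = len(embeddings), len(embeddings[0])
    mean = [sum(e[d] for e in embeddings) / n for d in range(dim)]
    var = [sum((e[d] - mean[d]) ** 2 for e in embeddings) / n for d in range(dim)]
    return mean, var


def sample_surrogate(mean, var, noise_dim, rng):
    """Reparameterized sample from N(mean, var), concatenated with N(0, I) noise."""
    conditioned = [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mean, var)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return conditioned + noise


def kl_to_standard_normal(mean, var):
    """KL(N(mean, diag(var)) || N(0, I)); assumes strictly positive variances."""
    return 0.5 * sum(v + m * m - 1.0 - math.log(v) for m, v in zip(mean, var))


rng = random.Random(0)
embeddings = [[0.0, 2.0], [2.0, 4.0]]  # hypothetical 2-dimensional embeddings
mean, var = estimate_gaussian(embeddings)
surrogate = sample_surrogate(mean, var, noise_dim=3, rng=rng)
regularizer = kl_to_standard_normal(mean, var)
```

Because each call to `sample_surrogate` draws a fresh vector, a small set of original embeddings is turned into a continuous neighbourhood of inputs, which is the smoothing effect the module is meant to provide.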
4. Experiment Results
In this section, we conduct extensive experiments and case studies to answer the following questions: Q1. Is our proposed automatic planner effective for generating land-use configurations? Q2. We split an area into squares for quantifying land-use configurations. What is the influence of the square size on generating configurations? Q3. What are the differences between the contexts of well-planned configurations and those of poorly-planned configurations? Q4. What are the differences among the land-use configurations generated by our framework under different planning goals? Q5. What does the generated result for each POI category look like in a generated land-use configuration?
| code | POI category | code | POI category |
| 1 | car service | 11 | real estate |
| 2 | car repair | 12 | government place |
| 6 | daily life service | 16 | company |
| 7 | recreation service | 17 | road furniture |
| 8 | medical service | 18 | specific address |
4.1. Data Description
We use the following datasets for evaluation: Residential Community: The residential community dataset contains 2990 residential communities in Beijing (http://www.soufun.com/). Each community is centered on a geographic point (described by latitude and longitude). POI: The POI dataset includes 328668 POIs in Beijing (https://www.openstreetmap.org/). Each POI item includes latitude, longitude, and the corresponding POI category. Table 1 shows the detailed information of the POI categories. Taxi Trajectories: The taxi trajectories are collected from a Beijing taxi company (https://www.microsoft.com/en-us/research/publication/t-drive-trajectory-data-sample/). Each trajectory contains trip ID, distance (m), travel time (s), average speed (km/h), pick-up and drop-off time, and pick-up and drop-off point. Public Transportation: The public transportation dataset includes bus transactions in Beijing from 2012 to 2013, covering 718 bus lines and 1734247 bus trips (https://www.beijingcitylab.com/data-released-1/data1-20/). Housing Price: The housing price dataset is collected from a Chinese real estate website (http://www.soufun.com/) and contains the housing prices of residential communities in Beijing from 2011 to 2012. Check-In: The check-in dataset contains the Weibo (https://open.weibo.com/wiki/2/place/pois/add_checkin) check-in records in Beijing from 2011 to 2013. The data format of one record is: longitude, latitude, check-in time, and check-in place.
4.2. Evaluation Metrics
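We aim to generate land-use configurations that are similar to well-planned configurations. To evaluate the generative performance, we calculate the difference between the distribution of well-planned configurations $P$ and the distribution of generated configurations $Q$. The smaller the difference between the distributions, the better the generative performance.
Kullback-Leibler (KL) Divergence: $KL(P \| Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$, where $x$ is a test sample.
Jensen-Shannon (JS) Divergence: $JS(P \| Q) = \frac{1}{2} KL(P \| M) + \frac{1}{2} KL(Q \| M)$, where $M = \frac{1}{2}(P + Q)$.
Hellinger Distance (HD): $HD(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{x} \left(\sqrt{P(x)} - \sqrt{Q(x)}\right)^2}$.
Wasserstein Distance (WD): $WD(P, Q) = \inf_{\gamma \in \Pi(P, Q)} \mathbb{E}_{(x, y) \sim \gamma}\left[\lVert x - y \rVert\right]$, where $\Pi(P, Q)$ is the set of joint distributions between $P$ and $Q$; $\gamma$ is a joint distribution in $\Pi(P, Q)$; $x, y$ are two samples drawn from $\gamma$; and $\mathbb{E}_{(x, y) \sim \gamma}\left[\lVert x - y \rVert\right]$ is the expected distance between the two samples.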
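The four measures can be computed for discrete distributions as sketched below. How configuration tensors are converted into distributions is not specified in this section, so the sketch assumes they have already been flattened and normalized into histograms `p` and `q` over the same bins; the Wasserstein function further assumes one-dimensional, equally spaced bins with unit spacing.

```python
import math


def kl_divergence(p, q):
    """KL(p || q); terms with p[i] == 0 contribute 0. Requires q[i] > 0 where p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence via the mixture m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def hellinger_distance(p, q):
    """Hellinger distance, bounded between 0 and 1."""
    total = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(total / 2)


def wasserstein_distance_1d(p, q):
    """1-Wasserstein distance between histograms on ordered bins (unit spacing),
    computed as the accumulated absolute gap between the two CDFs."""
    distance, cdf_gap = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        distance += abs(cdf_gap)
    return distance


# Hypothetical histograms over three bins.
p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
```

Note that `kl_divergence(p, q)` is undefined for these two histograms because `q` has zero mass where `p` does not, whereas the JS divergence, Hellinger distance, and Wasserstein distance remain finite.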
4.3. Baseline Methods
We compare the performance of our journal version framework (the enhanced LUCGAN) against the following baseline models:
DCGAN: an extension of the traditional GAN that uses transposed convolutional layers in the generator and convolutional layers in the discriminator (Radford et al., 2015).
WGAN: a GAN training framework that improves the stability of learning and provides meaningful learning curves for debugging and hyperparameter tuning (Arjovsky et al., 2017).
WGAN-GP: replaces the weight clipping of WGAN with a gradient penalty, which further improves the performance of WGAN (Gulrajani et al., 2017).
LUCGAN: the conference version of our land-use configuration GAN, which generates configurations based on the surrounding contexts (Wang et al., 2020b).
To further study the generated land-use configurations, we adopt two additional methods: a scoring model and visualization. For the scoring model, we train a machine learning model to learn scoring criteria that assign high scores to well-planned configurations and low scores to poorly-planned configurations. After we obtain all testing samples, the model is used to evaluate the quality of the generated results. For the visualization, we visualize the generated results as heat maps, pie charts, and 3D bar charts to check the POI distribution. We conduct all experiments on an x64 machine with an Intel i9-9920X 3.50GHz CPU, 128GB RAM, and Ubuntu 18.04.
4.4. Hyperparameters and Reproducibility
In our experiments, first, to obtain the embedding of surrounding contexts (Section 3.3), we employ a VGAE (Kipf and Welling, 2016) composed of an encoder and a decoder. The encoder contains three graph convolutional layers. The decoder has only one reconstruction layer. We use Adaptive Moment Estimation (Adam) to optimize the VGAE model with a learning rate of 0.005 for 300 epochs. The dimension of the surrounding contexts’ embedding is set to 100. Second, to quantify the quality of land-use configurations (Section 3.4), we set the value of the hyper-parameter to 0.5. Third, our planner LUCGAN consists of a generator and a discriminator (Section 3.5). We optimize the generator by Adam with a learning rate of 0.0001. We use Stochastic Gradient Descent (SGD) to optimize the discriminator with a learning rate of 0.0001 and a momentum of 0.95. The whole optimization process runs for 50 epochs. To make it easy for other researchers to reproduce our experiments, we release the code and data via Dropbox (https://www.dropbox.com/sh/16pk55efb9fzm2j/AACsosXxHtfQKXKjmL0NrOn1a?dl=0).
4.5. Overall Performance (Q1)
To validate the effectiveness of our model, we evaluate the gap between the distribution of well-planned configurations and the distribution of generated configurations in terms of KL Divergence (KL), JS Divergence (JS), Hellinger Distance (HD), and Wasserstein Distance (WD). As Figure 10 shows, compared with the best performance of the baseline models (WGAN, WGAN-GP, DCGAN), LUCGAN improves by 16.2, 0.25, 28.4, and 48.6 in terms of KL, JS, HD, and WD, respectively. This observation indicates that LUCGAN can capture more characteristics of the well-planned configurations than the other baseline models. Another interesting observation is that, compared with the conference-version LUCGAN, the enhanced LUCGAN improves by 8.92, 0.23, 8.43, and 4.32 in terms of KL, JS, HD, and WD, respectively. A potential interpretation is that the conditioning augmentation module and the new training approach of the enhanced LUCGAN make the learning process more stable and effective.
4.6. Study the influence of the square size for generating land-use configurations (Q2)
To quantify a land-use configuration, we divide an area into squares and collect the POI distribution within each square. To study the influence of the square size on generation, we repeat the experiments under several values of the division parameter; the smaller the value, the larger each square. Figure 11 shows the performance of all models under the different square sizes in terms of KL, JS, HD, and WD. We find that as the square size increases, the values of all metrics decrease. A possible explanation is that a larger square size yields a simpler distribution of land-use configurations, which the generative models can capture easily, so the values of all metrics become smaller. However, a large square size also discards much of the urban planning detail in the land-use configuration. Another interesting observation is that LUCGAN outperforms the other baseline models on all evaluation metrics at some settings, but for some smaller values, LUCGAN is slightly worse than LUCGAN. A potential reason is that LUCGAN is sufficient to capture the pattern of land-use configurations collected under smaller values. Although the conditional augmentation module of LUCGAN can improve robustness, in this situation the module may cause the model to underfit slightly. In practice, however, we should avoid collecting land-use configurations under small values, because such configurations lose many planning details, which harms the production of effective urban plans.
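The gridding step can be sketched as follows. This is a minimal illustration under our own assumptions: POIs are given as (longitude, latitude, category) triples, the area is an axis-aligned bounding box, and `n_splits` (a hypothetical name) is the number of squares per side, so a smaller value produces larger squares. The function is not taken from the released code.

```python
from typing import List, Sequence, Tuple

Poi = Tuple[float, float, int]  # (longitude, latitude, category index)


def build_configuration_tensor(
    pois: Sequence[Poi],
    lon_range: Tuple[float, float],
    lat_range: Tuple[float, float],
    n_splits: int,
    n_categories: int,
) -> List[List[List[int]]]:
    """Count POIs per (longitude cell, latitude cell, category channel)."""
    lon_min, lon_max = lon_range
    lat_min, lat_max = lat_range
    tensor = [
        [[0] * n_categories for _ in range(n_splits)] for _ in range(n_splits)
    ]
    for lon, lat, category in pois:
        # Skip POIs outside the target area or with an unknown category.
        if not (lon_min <= lon <= lon_max and lat_min <= lat <= lat_max):
            continue
        if not 0 <= category < n_categories:
            continue
        # Map coordinates to cell indices; clamp the upper boundary into the last cell.
        i = min(int((lon - lon_min) / (lon_max - lon_min) * n_splits), n_splits - 1)
        j = min(int((lat - lat_min) / (lat_max - lat_min) * n_splits), n_splits - 1)
        tensor[i][j][category] += 1
    return tensor


if __name__ == "__main__":
    toy_pois = [(0.1, 0.1, 0), (0.9, 0.9, 1), (0.9, 0.95, 1), (2.0, 2.0, 0)]
    coarse = build_configuration_tensor(toy_pois, (0.0, 1.0), (0.0, 1.0), 2, 2)
    fine = build_configuration_tensor(toy_pois, (0.0, 1.0), (0.0, 1.0), 10, 2)
    print(coarse)
    print(sum(count for plane in fine for cell in plane for count in cell))
```

With the coarse grid, the two nearby category-1 POIs fall into the same square; with the fine grid they are separated. This is the detail-versus-simplicity trade-off discussed above.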
4.7. Study the surrounding contexts of different configurations (Q3)
Our framework generates land-use configurations conditioned on the corresponding surrounding contexts, so the surrounding contexts strongly influence the generated configuration. To observe their distribution, we visualize the embeddings of the surrounding contexts in 2-dimensional space. Specifically, we first randomly choose 500 surrounding-context embeddings each for well-planned and poorly-planned configurations. Then, we use the t-SNE algorithm (Van der Maaten and Hinton, 2008) to reduce the embeddings to two dimensions and plot them, as illustrated in Figure 12. We find that the pattern of the contexts of well-planned configurations differs from that of poorly-planned configurations, which indicates that our research intuition, namely generating land-use configurations based on the surrounding contexts, is reasonable.
4.8. Scoring model evaluation for generated land-use configurations (Q1)
To further validate the effectiveness of LUCGAN, we build a scoring model. As illustrated in Figure 13, LUCGAN achieves the highest quality score among all compared models, which indicates its superiority. The result also suggests that the scoring model can serve as an evaluation method for generated land-use configurations.
4.9. Study the POI ratio of generated configurations under different hyperparameter settings (Q4)
In our framework, we leverage a hyperparameter to evaluate whether a land-use configuration is well-planned or poorly-planned. When individuals have different urban planning goals, the meaning of this hyperparameter differs. To validate its utility, we conduct two generative tasks: (1) the hyperparameter is used to determine whether a land-use configuration is vibrant; (2) the hyperparameter is used to determine whether a land-use configuration is convenient for living. We visualize the POI ratios of generated and original configurations under the two settings in Figure 14, in which the numbers denote the POI categories listed in Table 1, and the grey percentages indicate the proportions of the POI categories in the corresponding configuration. Comparing Figure 14(a) with Figure 14(c), we find that for the vibrant configuration, POI categories 4 (food service), 5 (shopping), 7 (recreation service), and 11 (real estate) cover a large portion. This is reasonable because a vibrant configuration contains many POIs related to economic and social activities. For the living-convenient configuration, POI categories 12 (government place), 17 (road furniture), and 19 (public service) occupy the majority. A reasonable explanation is that a configuration is convenient for living when it contains many POIs related to public services and traffic conditions. The two observations validate that LUCGAN can use the hyperparameter to produce customized land-use configurations according to people's requirements. In addition, comparing Figure 14(a) with Figure 14(b), and Figure 14(c) with Figure 14(d), another interesting observation is that the generated configurations cover a more complete set of POI categories than the original configurations. A potential interpretation is that LUCGAN not only captures the characteristics of the specific kind of land-use configuration but also introduces new design elements into the generated configuration.
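The POI ratios plotted in Figure 14 are category proportions within a configuration. A minimal sketch of that computation is given below, assuming the configuration is stored as a nested list indexed as `tensor[i][j][category]` of POI counts; the function names and the toy data are ours and purely illustrative.

```python
from typing import List, Sequence, Tuple


def poi_category_ratios(tensor: Sequence[Sequence[Sequence[int]]]) -> List[float]:
    """Share of each POI category among all POIs in one configuration."""
    n_categories = len(tensor[0][0])
    totals = [0] * n_categories
    for plane in tensor:
        for cell in plane:
            for category, count in enumerate(cell):
                totals[category] += count
    grand_total = sum(totals)
    if grand_total == 0:
        # An empty configuration has no meaningful ratios.
        return [0.0] * n_categories
    return [t / grand_total for t in totals]


def dominant_categories(ratios: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """The k categories with the largest share, as (category index, ratio)."""
    ranked = sorted(enumerate(ratios), key=lambda item: item[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy 2 x 2 grid with three categories (values are made up for illustration).
    toy = [
        [[3, 0, 1], [2, 1, 0]],
        [[0, 0, 1], [1, 1, 0]],
    ]
    ratios = poi_category_ratios(toy)
    print(ratios)
    print(dominant_categories(ratios, 2))
```

Comparing the dominant categories of a generated configuration with those of the original one is a quick numeric counterpart to the visual comparison in Figure 14.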
4.10. Study the POI distribution of generated configurations under different hyperparameter settings (Q4)
To further understand the utility of the hyperparameter and observe the differences between generated and original land-use configurations, we visualize the configurations in 3-dimensional space, as shown in Figure 15. In the figure, the left color bar indicates the mapping between POI categories and colors, the right part shows the POI distribution of the configuration, and the height of each bar indicates the number of POIs at the corresponding position. A careful inspection of Figure 15(a) and Figure 15(c) shows that the generated configurations are well organized and contain enough planning information for implementation in practice. Another interesting observation is that the generated configurations have a denser POI distribution than the original configurations. A potential interpretation is that LUCGAN prefers to produce dense POI distributions because the correlations among different POIs are then easier to capture.
4.11. Study the generated situation of each channel in generated configurations (Q5)
We quantify a land-use configuration as a longitude-latitude-channel tensor. What, then, does the generated result look like for each channel (POI category)? To check, we visualize the POI distribution of each channel. The results are shown in Figure 16, in which a darker block indicates a larger number of POIs in the corresponding block. One observation that stands out is that the POI distributions of different categories show unique patterns. For example, transportation spots are more concentrated, while food-service POIs are more dispersed across the area. The distribution of car service spots is very similar to that of recreation service spots; a possible reason is that recreation service spots may occupy many parking lots, which potentially attract car services. This observation shows that LUCGAN is capable of capturing the characteristics of the POI distributions of different categories at the same time. From another perspective, it also reflects that LUCGAN is able to capture the mutual interactions and constraints among different kinds of POIs. Thus, LUCGAN is effective for generating land-use configurations automatically.
5. Related Work
Spatio-temporal Data Mining. Spatio-temporal data mining refers to the process of discovering patterns and knowledge from data related to space and time (Atluri et al., 2018). Because spatio-temporal data are closely tied to daily life, many researchers have attempted to extract the patterns hidden in such data to improve the quality of urban life (Wang and Song, 2018; Wang et al., 2021b, d, c; Zhou et al., 2020; Liu et al., 2018). For instance, Wang et al. employed deep learning approaches to forecast individual travel demand based on travel order data collected by a car-hailing company (Wang et al., 2018a). Zhao et al. predicted the air quality index by considering spatio-temporal relatedness (Zhao et al., 2017). Yuan et al. utilized a topic model to discover urban functional zones based on POI data and taxi trajectory data (Yuan et al., 2014). Wang et al. used peer and temporal-aware representation learning to analyze driving behavior based on GPS trajectory data (Wang et al., 2018c). Liu et al. studied the mobility patterns of traffic flows for bus routing optimization (Liu et al., 2017). Du et al. provided a systematic study to capture the spatio-temporal dynamics of passenger transfers for crowdedness-aware route recommendations (Du et al., 2018). In this paper, to reduce the heavy workload of urban planners and accelerate the urban planning process, we utilize spatio-temporal data to discover urban planning patterns.
Representation Learning. The objective of representation learning is to preserve the information of the original data in a low-dimensional feature space. In general, there are three types of representation learning models: (1) probabilistic graphical models; (2) manifold learning models; (3) auto-encoder models. Probabilistic graphical models build a complex Bayesian network system to learn the representation of uncertain knowledge buried in the original data (Qiang et al., 2019). Manifold learning models infer the low-dimensional manifold of the original data from neighborhood information using non-parametric approaches (Zhu et al., 2018). Auto-encoder models learn the latent representation by minimizing the reconstruction loss between the original and reconstructed data (Otto and Rowley, 2019). In the spatio-temporal data mining domain, representation learning has achieved great success in capturing the characteristics of spatial entities (e.g., cities, geographical areas) (Fu et al., 2018; Wang et al., 2018b, 2020c; Chandra et al., 2019; Fu et al., 2019). For instance, to analyze individual driving behaviors, Wang et al. utilized representation learning to mine the spatio-temporal characteristics of GPS trajectory data (Wang et al., 2019). Du et al. proposed a new spatial representation learning framework to capture the static and dynamic characteristics among spatial entities for predicting housing prices (Du et al., 2019). Wang et al. employed a spatio-temporal representation learning module to extract the features of cyber attacks in a graph for cyber attack detection (Wang et al., 2020a). In this paper, to incorporate the surrounding context characteristics into our framework, we employ representation learning to encode the spatial attributed graphs constructed from the contexts into low-dimensional vectors.
Generative Adversarial Networks. GAN algorithms can be classified into three categories from a task-driven perspective. (1) Semi-supervised learning GANs. A completely labeled data set is usually difficult to obtain, and semi-supervised learning GANs can utilize unlabeled or partially labeled data to train a strong classifier (Ding et al., 2018; Liu et al., 2020). For instance, Akcay et al. designed a semi-supervised GAN anomaly detection framework that achieved good performance (Akcay et al., 2018). (2) Transfer learning GANs. Many researchers utilize transfer learning GANs to transfer knowledge among different domains (Hoffman et al., 2018; Tzeng et al., 2017). For instance, Choi et al. built a unified GAN to translate images among different style domains (Choi et al., 2018). (3) Reinforcement learning GANs. Reinforcement learning (RL) is incorporated into GANs to improve the generative performance (Sarmad et al., 2019). For instance, Ganin et al. combined reinforcement learning and GANs to synthesize high-resolution images (Ganin et al., 2018). The aforementioned works indicate that GANs are capable of capturing the characteristics of the original data distribution and generating new data samples based on that distribution. This observation motivates us to adopt the learning paradigm of GANs as the main framework of our automatic urban planner.
Urban Planning. Urban planning is a complex and interdisciplinary research domain (Adams, 1994). Urban experts need to consider many factors, such as government policy and environmental protection, when designing appropriate land-use configurations (Niemelä, 1999; Adams, 1994; Wang et al., 2021a; Oliveira and Pinho, 2010). Meanwhile, different areas have different planning goals. For example, Barton and Tsourou focused on constructing an urban planning solution for human health and well-being (Barton and Tsourou, 2013). Ratcliffe et al. discussed the relationship between urban planning and real estate development (Ratcliffe et al., 2004). Indeed, it is difficult to generate a good urban planning solution objectively. Recently, with the development of artificial intelligence (AI), many researchers have focused on making the urban planning process smart and automated. These methods typically build a GAN model to generate the layout of a space based on realistic architecture or design images (Li et al., 2018; Nauata et al., 2020; Noyman and Larson, 2020). For instance, Albert et al. utilized generative adversarial networks to generate the complex spatial organizations observed in global urban patterns based on footprint data (Albert et al., 2018). Bachl and Ferreira proposed a new conditional GAN framework that learns the architectural features of major cities to generate images of buildings that did not previously exist (Bachl and Ferreira, 2019). These works have been successful, but they share one drawback: they require expert layout data to train the AI models. In addition, some researchers use transfer learning to transfer spatial knowledge across cities in order to improve the generalization and learning efficiency of spatial AI models (Jiang et al., 2021; He et al., 2020). These works can perceive human mobility patterns, which provides a solid foundation for urban planning, but they cannot directly produce effective urban layouts.
Compared to these works, our framework LUCGAN imposes no strict conditions on data collection. We focus on producing customized land-use configurations based on geographical spatial data such as POI, traffic data, economic data, and demographic data. These data resources are usually publicly available, which gives our framework good flexibility and generalization.
6. Concluding Remarks
To generate suitable, high-quality land-use configuration solutions objectively and to reduce the heavy burden on urban planning specialists, we propose an automatic land-use configuration planner framework. This framework generates the land-use configuration based on the surrounding contexts. Specifically, we first collect a set of land-use configurations and the corresponding surrounding contexts. Then, we construct spatial attributed graphs that contain explicit features of the surrounding contexts, such as value-added space, POI distribution, and traffic conditions, and preserve the information of the graphs in the surrounding-context embeddings. Next, we employ our proposed automatic urban planner model to generate well-planned land-use configurations based on the embeddings. Finally, through extensive experiments, we find that LUCGAN is more effective and robust than the baseline models. In addition, different square sizes affect the generative ability of LUCGAN, so users should adopt a suitable segmentation scheme for land-use configurations based on their requirements. Moreover, LUCGAN is capable of customizing land-use configurations based on a hyperparameter. Furthermore, LUCGAN not only generates the POI distribution of the whole area but also provides the generated distribution of each POI category.
References
- Urban planning and the development process. Psychology Press. Cited by: §5.
- Ganomaly: semi-supervised anomaly detection via adversarial training. In Asian conference on computer vision, pp. 622–637. Cited by: §5.
- Modeling urbanization patterns with generative adversarial networks. In IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium, pp. 2095–2098. Cited by: §5.
- Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, D. Precup and Y. W. Teh (Eds.), Proceedings of Machine Learning Research, Vol. 70, International Convention Centre, Sydney, Australia, pp. 214–223. Cited by: item 2.
- Spatio-temporal data mining: a survey of problems and methods. ACM Computing Surveys (CSUR) 51 (4), pp. 1–41. Cited by: §5.
- City-gan: learning architectural styles using a custom conditional gan architecture. arXiv preprint arXiv:1907.05280. Cited by: §5.
- Healthy urban planning. Routledge. Cited by: §5.
- Collective representation learning on spatiotemporal heterogeneous information networks. In Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 319–328. Cited by: §5.
- Stargan: unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 8789–8797. Cited by: §5.
- Semi-supervised learning on graphs with generative adversarial nets. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 913–922. Cited by: §1, §5.
- SmartTransfer: modeling the spatiotemporal dynamics of passenger transfers for crowdedness-aware route recommendations. ACM Transactions on Intelligent Systems and Technology (TIST) 9 (6), pp. 1–26. Cited by: §5.
- Beyond geo-first law: learning spatial representations via integrated autocorrelations and complementarity. In 2019 IEEE International Conference on Data Mining (ICDM), pp. 160–169. Cited by: §5.
- Representing urban forms: a collective learning model with heterogeneous human mobility data. IEEE transactions on knowledge and data engineering 31 (3), pp. 535–548. Cited by: §5.
- Efficient region embedding with multi-view spatial networks: a perspective of locality-constrained spatial autocorrelations. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 906–913. Cited by: §5.
- Synthesizing programs for images using reinforced adversarial learning. In International Conference on Machine Learning, pp. 1666–1675. Cited by: §5.
- Improved training of wasserstein gans. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 5769–5779. Cited by: item 3.
- What is the human mobility in a new city: transfer mobility knowledge across cities. In Proceedings of The Web Conference 2020, WWW ’20, New York, NY, USA, pp. 1355–1365. Cited by: §5.
- CyCADA: cycle-consistent adversarial domain adaptation. In International Conference on Machine Learning, pp. 1989–1998. Cited by: §5.
- Transfer urban human mobility via poi embedding over multiple cities. ACM Transactions on Data Science 2 (1), pp. 1–26. Cited by: §5.
- Variational graph auto-encoders. arXiv preprint arXiv:1611.07308. Cited by: §4.4.
- LayoutGAN: generating graphic layouts with wireframe discriminators. In International Conference on Learning Representations, Cited by: §5.
- Modeling the interaction coupling of multi-view spatiotemporal contexts for destination prediction. In Proceedings of the 2018 SIAM International Conference on Data Mining, pp. 171–179. Cited by: §5.
- Intelligent bus routing with heterogeneous human mobility patterns. Knowledge and Information Systems 50 (2), pp. 383–415. Cited by: §5.
- CatGAN: category-aware generative adversarial networks with hierarchical evolutionary learning for category text generation. In AAAI, pp. 8425–8432. Cited by: §1, §5.
- Balanced urban development: is it a myth or reality? In Balanced urban development: Options and strategies for liveable cities, pp. 3–13. Cited by: §1.
- House-gan: relational generative adversarial networks for graph-constrained house layout generation. In European Conference on Computer Vision, pp. 162–177. Cited by: §5.
- Ecology and urban planning. Biodiversity & Conservation 8 (1), pp. 119–131. Cited by: §5.
- A deep image of the city: generative urban-design visualization. Challenge 7, pp. 30. Cited by: §5.
- Evaluation in urban planning: advances and prospects. Journal of Planning Literature 24 (4), pp. 343–361. Cited by: §5.
- Linearly recurrent autoencoder networks for learning dynamics. SIAM Journal on Applied Dynamical Systems 18 (1), pp. 558–593. Cited by: §5.
- Learning to generate posters of scientific papers by probabilistic graphical models. Journal of Computer Science and Technology 34 (1), pp. 155–169. Cited by: §5.
- Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434. Cited by: item 1.
- Urban planning and real estate development. Vol. 8, Taylor & Francis. Cited by: §5.
- Rl-gan-net: a reinforcement learning agent controlled gan network for real-time point cloud shape completion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5898–5907. Cited by: §5.
- Adversarial discriminative domain adaptation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7167–7176. Cited by: §5.
- Co-evolutions of planning and design: risks and benefits of design perspectives in planning systems. Planning Theory 12 (2), pp. 177–198. Cited by: §1.
- Visualizing data using t-sne. Journal of machine learning research 9 (11). Cited by: §4.7.
- Defending water treatment networks: exploiting spatio-temporal effects for cyber attack detection. In 2020 IEEE International Conference on Data Mining (ICDM), pp. 32–41. Cited by: §5.
- DeepSTCL: a deep spatio-temporal convlstm for travel demand prediction. In 2018 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. Cited by: §5.
- Reimagining city configuration: automated urban planning via adversarial learning. In Proceedings of the 28th International Conference on Advances in Geographic Information Systems, pp. 497–506. Cited by: §1, §1, §3.5, item 4.
- Deep human-guided conditional variational generative modeling for automated urban planning. arXiv preprint arXiv:2110.07717. Cited by: §5.
- Automated feature-topic pairing: aligning semantic and embedding spaces in spatial representation learning. In Proceedings of the 29th International Conference on Advances in Geographic Information Systems, pp. 450–453. Cited by: §5.
- Towards semantically-rich spatial network representation learning via automated feature topic pairing. Frontiers in big Data 4. Cited by: §5.
- Reinforced imitative graph representation learning for mobile user profiling: an adversarial training perspective. arXiv preprint arXiv:2101.02634. Cited by: §5.
- A deep spatial-temporal ensemble model for air quality prediction. Neurocomputing 314, pp. 198–206. Cited by: §5.
- Learning urban community structures: a collective embedding perspective with periodic spatial-temporal mobility graphs. ACM Transactions on Intelligent Systems and Technology (TIST) 9 (6), pp. 1–28. Cited by: §5.
- You are how you drive: peer and temporal-aware representation learning for driving behavior analysis. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2457–2466. Cited by: §5.
- Exploiting mutual information for substructure-aware graph representation learning. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, pp. 3415–3421. Cited by: §5.
- Spatiotemporal representation learning for driving behavior analysis: a joint perspective of peer and temporal dependencies. IEEE Transactions on Knowledge and Data Engineering. Cited by: §5.
- Incremental mobile user profiling: reinforcement learning with spatial knowledge graph for modeling event streams. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 853–861. Cited by: §5.
- Ensemble-spotting: ranking urban vibrancy via poi embedding with multi-view spatial graphs. In Proceedings of the 2018 SIAM International Conference on Data Mining, pp. 351–359. Cited by: §3.4.
- Towards a new typology of urban planning theories. Environment and Planning B: Planning and Design 16 (1), pp. 23–39. Cited by: §1.
- Discovering urban functional zones using latent activity trajectories. IEEE Transactions on Knowledge and Data Engineering 27 (3), pp. 712–725. Cited by: §5.
- Stackgan: text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE international conference on computer vision, pp. 5907–5915. Cited by: §3.5.
- TrafficGAN: off-deployment traffic estimation with traffic generative adversarial networks. In 2019 IEEE International Conference on Data Mining (ICDM), pp. 1474–1479. Cited by: §1, §5.
- Curb-gan: conditional urban traffic estimation through spatio-temporal generative adversarial networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 842–852. Cited by: §1, §5.
- Incorporating spatio-temporal smoothness for air quality inference. In 2017 IEEE International Conference on Data Mining (ICDM), pp. 1177–1182. Cited by: §5.
- Deep flexible structured spatial–temporal model for taxi capacity prediction. Knowledge-Based Systems 205, pp. 106286. Cited by: §5.
- Image reconstruction by domain-transform manifold learning. Nature 555 (7697), pp. 487–492. Cited by: §5.