City2City: Translating Place Representations across Cities

Large mobility datasets collected from various sources have allowed us to observe, analyze, predict and solve a wide range of important urban challenges. In particular, studies have generated place representations (or embeddings) from mobility patterns in a similar manner to word embeddings to better understand the functionality of different places within a city. However, studies have been limited to generating such representations for each city individually and have lacked an inter-city perspective, which has made it difficult to transfer the insights gained from the place representations across different cities. In this study, we attempt to bridge this research gap by treating cities and languages analogously. We apply methods developed for unsupervised machine language translation tasks to translate place representations across different cities. Real world mobility data collected from mobile phone users in two cities in Japan are used to test our place representation translation methods. Translated place representations are validated using landuse data, and results show that our methods were able to accurately translate place representations from one city to another.




This work was accepted as a 4-page short paper in the ACM SIGSPATIAL 2019 Conference. This is the extended (originally submitted) version.

 author = {Yabe, Takahiro and Tsubouchi, Kota and Shimizu, Toru and Sekimoto, Yoshihide
           and Ukkusuri, Satish V.},
 title = {City2City: Translating Place Representations Across Cities},
 booktitle = {Proceedings of the 27th ACM SIGSPATIAL International Conference on
              Advances in Geographic Information Systems},
 series = {SIGSPATIAL ’19},
 year = {2019},
 isbn = {978-1-4503-6909-1},
 location = {Chicago, IL, USA},
 pages = {412--415},
 numpages = {4},
 url = {},
 doi = {10.1145/3347146.3359063},
 acmid = {3359063},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {human mobility, machine translation, mobile phone data, place representations,
             urban functions},

1 Introduction

Large mobility datasets collected from mobile phones, social media, and various sensors have allowed us to observe the dynamics of cities at an unprecedented spatio-temporal resolution [batty2012smart, ratti2006mobile]. Data-driven methods have revolutionized the way we tackle various urban challenges [zheng2014urban] such as pollution [zheng2013u], traffic congestion [iqbal2014development], and disaster management [lu2012predictability].

In particular, recent studies have made significant progress on understanding the semantics of places from large mobility data [feng2017poi2vec, liu2016exploring, yan2017itdl, zhao2017geo, wang2017region]. Many of these studies produce place representations (or embeddings) in a similar manner to generating word representations in the natural language processing field (e.g. word2vec [mikolov2013distributed]), by treating places as analogous to words and trip sequences as analogous to sentences. These place representations have been shown to successfully characterize the functionality of places within cities, and have been applied in various tasks related to urban planning, such as identifying spatial clusters with respect to functionality [yao2018representing], choosing sites for opening new stores [xu2016], and predicting where users will go in future timesteps [chang2018content].

However, such studies have been limited to understanding the place representations of each city individually, and have lacked an inter-city perspective. Because the representations of different cities were not generated in a common vector space, it has been difficult to transfer insights based on place representations from one city to another, let alone to transfer knowledge of various phenomena (e.g. evacuation after disasters) across cities. If we could map place representations learned in one city to another, in the same way we translate words across different languages, we would be able to utilize knowledge accumulated in other cities to perform better analyses and predictions. For example, we may be able to predict locations that could become evacuation shelters in future disasters in one city, by translating into that city's vector space the representations of places that became evacuation shelters in a past disaster in another city.

In this study, we attempt to bridge these gaps by treating cities and languages analogously, which extends the analogies made by previous studies (“places and words” and “trip sequences and sentences”). More specifically, our goal is to develop methods that can map places from different cities with similar meanings closely on a common vector space via an operation analogous to translation. We propose models that extend the methods developed in the natural language processing field for unsupervised machine language translation tasks [zhang2017adversarial, conneau2017word, artetxe2018robust]. Figure 1 shows an illustration of our problem setting and approach. Given representations of places in two cities, directly overlaying one set of representations on the other would be uninformative, since the vector spaces are not aligned with each other. Using methods to translate representations, we obtain translated representations of the source city that are aligned to the space of the target city, allowing us to compare representations of places in different cities for further analyses and predictions.

The model performances are tested using real world data collected from mobile phones in 2 cities in Japan, and are validated using landuse data. Results show that our methods are able to accurately translate place representations from one city to another.

The main contributions of this paper are as follows:

  • We propose and test methods to translate place representations across cities, which can map places from different cities with similar functions closely together.

  • We verify that our method can successfully translate place representations, using real world mobility data from two cities.

  • We make the translated place representations publicly available for researchers and practitioners.

Figure 1: Illustration of our problem setting. Given representations of places in two cities generated from the observed mobility patterns, our problem is to translate the place representations of the source city to the vector space of the target city, so that similar places from the two cities become mapped closely in the common vector space, as shown in the bottom right panel.

2 Preliminaries

Definition 1 (Human Mobility Patterns)

Sequences of users’ staypoint locations with timestamps are extracted from mobility data using methods explained in Section 3.1. The usual human mobility patterns of a city are the set of all staypoint sequences of individuals whose home location belongs to that city.

Definition 2 (Place Representations)

A city is divided into disjoint grid cells of a fixed size. We call each cell a place and represent it by a fixed-dimensional vector. Place representations are learned from the human mobility patterns observed in the city, using methods explained in Section 3.2. The representations of all places in a city are stacked into a single matrix.

Problem Definition (Translation of Place Representations)

Place representations are learned for each city from the observed mobility patterns. Thus, for different cities, the vector spaces are not shared. Translating place representations from a source city to a target city is equivalent to finding a mapping function that aligns the two vector spaces, i.e., one that sends each source-city representation into the vector space of the target city. Methods used to translate place representations are explained in Section 3.3.

3 Methodology

3.1 Extracting Human Mobility Patterns

We first extract human mobility pattern datasets for each city from the location data observed from mobile phones. Each observation of the location data contains the user ID, timestamp, longitude and latitude. More details of the mobile phone data that we use in this study are explained in Section 4.1.1. Our goal is to extract users’ sequences of staypoint locations from the observations. We achieve this by setting two threshold parameters: one spatial threshold and one temporal threshold. To cope with noisy location observations (e.g. spatial errors in GPS data), we perform mean shift clustering to estimate the true location for each observation, as described in previous studies (e.g. [ashbrook2003using, kanasugi2013spatiotemporal]). For each user, we read their location data in time order, and search for locations where the user has stayed within the distance defined by the spatial threshold parameter for a duration longer than the time defined by the temporal threshold parameter. We use 1000 meters as the spatial threshold, and 30 minutes as the temporal threshold in this study. As a result, we obtain sequences of staypoint locations for each user, which are used to generate place representations using methods explained in the following section.
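The threshold-based search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: it omits the mean shift denoising step, measures distance from the first point of each candidate window using the haversine formula, and uses hypothetical names (`haversine_m`, `extract_staypoints`) and made-up coordinates.

```python
import math
from datetime import datetime, timedelta


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    radius = 6_371_000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def extract_staypoints(records, dist_m=1000.0, min_stay=timedelta(minutes=30)):
    """Extract staypoints for one user.

    records: list of (timestamp, lat, lon), sorted by timestamp.
    Returns a list of (arrival, departure, mean_lat, mean_lon).
    """
    staypoints = []
    i, n = 0, len(records)
    while i < n:
        j = i + 1
        # Grow the window while points remain within dist_m of the window's first point.
        while j < n and haversine_m(records[i][1], records[i][2],
                                    records[j][1], records[j][2]) <= dist_m:
            j += 1
        arrival, departure = records[i][0], records[j - 1][0]
        if departure - arrival > min_stay:  # "longer than" the temporal threshold
            window = records[i:j]
            mean_lat = sum(r[1] for r in window) / len(window)
            mean_lon = sum(r[2] for r in window) / len(window)
            staypoints.append((arrival, departure, mean_lat, mean_lon))
            i = j
        else:
            i += 1
    return staypoints


# Illustrative data: a 40-minute stay, then a 10-minute stop far away.
t0 = datetime(2016, 2, 1, 9, 0)
records = [
    (t0, 32.8000, 130.7000),
    (t0 + timedelta(minutes=20), 32.8001, 130.7001),
    (t0 + timedelta(minutes=40), 32.8002, 130.7000),
    (t0 + timedelta(minutes=50), 32.9000, 130.8000),
    (t0 + timedelta(minutes=60), 32.9001, 130.8001),
]
staypoints = extract_staypoints(records)  # only the first cluster qualifies
```

Anchoring the window at its first point keeps the sketch short; a production pipeline would more likely cluster the denoised points first, as the text describes.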

3.2 Generating Place Representations

To obtain the representations of places in a city, we solve a self-supervised task in which an LSTM RNN model is trained to predict the next staypoint of a user using mobility data, which is analogous to language models that are trained to predict the next word in a sentence. After training an LSTM RNN model using the staypoint sequences of a city, we extract and stack the embedding layer’s parameters, and define the result as the matrix of place representations of that city. We refer to this place representation learning architecture as “MobLSTM” in the following sections. Specific model hyperparameter settings are explained in Section 4.2.1.

3.3 Translating Place Representations

Three approaches for translating place representations across cities are tested. The first approach is to jointly learn the place representations of places in both cities using a common MobLSTM architecture (Section 3.3.1). The second and third approaches learn place representations using MobLSTM separately for different cities, and then attempt to align them using an optimization method (Section 3.3.2), or adversarial training (Section 3.3.3).

3.3.1 Joint Learning Approach

In the first approach, we apply the MobLSTM model to the two cities together on the self-supervised next staypoint prediction task. We merge the mobility datasets of the two cities into one, train the model over the merged data, and use the transposed embedding layer matrix as the representation matrix. The rationale behind this approach is that places with similar functions will be visited in a similar manner (e.g. time of day, day of week, after and before certain places) regardless of the city the places belong to. To let the model treat the places of the two cities as equally as possible, we mask the output candidates belonging to one city when the next staypoint belongs to the other city, and vice versa, releasing the model from the burden of distinguishing between the two cities. A previous study shows that this approach is effective in translating embeddings of one language to another in an unsupervised manner [wada2018unsupervised]. We refer to this translation method as “Joint-MobLSTM”.
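The output masking can be illustrated with a small sketch. It is an assumption-laden example rather than the model code of this study: the function name `masked_softmax`, the city labels, and the use of plain Python lists in place of framework tensors are all hypothetical.

```python
import math


def masked_softmax(logits, place_city, target_city):
    """Softmax over the merged place vocabulary, restricted to one city.

    logits: list of output scores, one per place in the merged vocabulary.
    place_city: list of city labels aligned with logits.
    target_city: city that the true next staypoint belongs to.
    Places of the other city receive probability exactly 0.
    """
    allowed = [city == target_city for city in place_city]
    if not any(allowed):
        raise ValueError("no candidate place belongs to the target city")
    # Subtract the maximum allowed logit for numerical stability.
    peak = max(score for score, ok in zip(logits, allowed) if ok)
    exps = [math.exp(score - peak) if ok else 0.0 for score, ok in zip(logits, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative scores for four places, two per city.
probs = masked_softmax([2.0, 1.0, 0.5, 3.0], ["A", "A", "B", "B"], target_city="A")
```

Because the masked candidates contribute nothing to the normalizing sum, the training loss never rewards the model for telling the two cities apart, which is the stated purpose of the mask.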

3.3.2 Procrustes Transformation Approach

The second approach applies the Procrustes transformation method, which was originally used in the supervised problem setting. Let $X_A$ and $X_B$ denote the place representations of the source and target cities, and let $X_A^{(k)}$ and $X_B^{(k)}$ denote the matrices whose rows are the representations of the $k$ paired places in a dictionary, where places are ranked by their popularity (indicated by the superscript $(k)$). Orthogonal Procrustes aligns the two sets of representations in a common vector space by solving the following optimization problem:

$$ W^{*} = \underset{W}{\arg\min} \; \lVert X_A^{(k)} W - X_B^{(k)} \rVert_F \quad \text{subject to} \quad W^{\top} W = I, $$

where $W$ is the orthogonal mapping matrix and $\lVert \cdot \rVert_F$ is the Frobenius norm. The solution gives the best rotational alignment of the two vector spaces. Although our original problem is in the unsupervised setting, we generate synthetic representation pairs by pairing up the top visited places from both cities. We fix a default number of place pairs $k$, and test its effect on translation performance in Section 4.3.4. We refer to this translation method as “MobLSTM-P”.
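The alignment step can be illustrated with a short NumPy sketch. It assumes that place representations are stored as rows of a matrix and that anchors are formed by pairing the most visited places of each city by rank. The function names `top_k_anchor_indices` and `orthogonal_procrustes` are hypothetical, and the SVD-based closed form is the standard textbook solution of the orthogonal Procrustes problem, not code taken from this study.

```python
import numpy as np


def top_k_anchor_indices(visit_counts, k):
    """Indices of the k most visited places, in decreasing order of visits."""
    return np.argsort(-np.asarray(visit_counts), kind="stable")[:k]


def orthogonal_procrustes(src_anchors, tgt_anchors):
    """Orthogonal W minimizing ||src_anchors @ W - tgt_anchors||_F.

    Row i of src_anchors is paired with row i of tgt_anchors.
    Closed form: W = U @ Vt, where U, S, Vt = SVD(src_anchors.T @ tgt_anchors).
    """
    u, _, vt = np.linalg.svd(src_anchors.T @ tgt_anchors)
    return u @ vt


# Synthetic check: the source space is an exact rotation of the target space.
rng = np.random.default_rng(0)
dim, num_places, k = 4, 20, 8
x_tgt = rng.normal(size=(num_places, dim))
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # random orthogonal matrix
x_src = x_tgt @ rotation.T

w = orthogonal_procrustes(x_src[:k], x_tgt[:k])  # fit on k anchor pairs only
x_src_translated = x_src @ w                     # apply to every source place
```

In this synthetic setting the anchors are true correspondences, so the fitted rotation translates all places exactly. With real mobility data the rank-based pairs are only approximate, which is why the number of anchors matters.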

3.3.3 Adversarial Training Approach

The Procrustes method requires a dictionary of pairs of places from the two cities that are expected to be mapped closely together. However, a fully unsupervised approach has been shown to work better in some settings [conneau2017word, artetxe2017unsupervised]. The third approach uses adversarial training to learn the transition matrix $W$, which is then used for translating the representations learned via MobLSTM between the two cities. We take an approach similar to that of Conneau et al. [conneau2017word], in which a discriminator is trained to discriminate between randomly selected translated representations of the source city and representations of the target city. $W$ is then trained to prevent the discriminator from making accurate predictions [ganin2016domain]. The standard training procedure of deep adversarial networks is used to train the adversarial model [goodfellow2014generative]. We refer to this translation method as “MobLSTM-Adv”.

4 Experimental Validation

In our experiments, we define the places as grid cells of a fixed size. Through the qualitative analysis of the place representations in Section 4.3.3, we confirm that this spatial scale is granular enough to identify specific places. Moreover, the evacuation shelter analysis in Section 5 shows that the scale is informative enough to assist disaster relief officers in practice. We generated and translated representations of places (grid cells) instead of specific points of interest (POIs) that are specified in maps, because there are cases where places with no particular POI could have significant meaning to people.

4.1 Data

4.1.1 Mobile Phone Data

Yahoo Japan Corporation collects location information of mobile phone app users in order to send relevant notifications and information to the users. The users in this study have consented to provide their location information. The data are anonymized so that individuals cannot be identified, and personal information such as gender, age and occupation is unknown. Each GPS record consists of a user’s unique ID (random character string), timestamp, longitude, and latitude. The data acquisition frequency of GPS locations changes according to the movement speed of the user to minimize the burden on the user’s smartphone battery. If it is determined that the user is staying in a certain place for a long time, data are acquired at a relatively low frequency, and if it is determined that the user is moving, data are acquired more frequently. The data have a sample rate of approximately 2% of the population, and past studies suggest that this sample rate is enough to grasp the macroscopic urban dynamics [yabe2016framework, nishi2014hourly]. Table 1 shows the statistics of the dataset collected for the two cities (Kumamoto and Okayama) that we focus on in this study. There are around 100,000 unique active users in each area, and their location data were analyzed to extract their home locations and staypoint locations using methods in Section 3.1.

                    Kumamoto            Okayama
# Users             94,053              119,349
# GPS staypoints    2,832,329           2,382,861
Data period         2016/2/1 to 2/29    2018/6/1 to 6/30
# places            2565                2163

Table 1: Data statistics for the two cities

4.1.2 Landuse Data

To validate whether the generated and translated place representations correctly reflect the functionality of the places, we use the Urban Area Land Use Mesh Data in the National Land Numerical Information Database provided by the Ministry of Land, Infrastructure, Transport and Tourism of Japan. The dataset divides all urban areas of the entire country into fine grid cells, and assigns one category to each grid cell out of 17 options. The 17 options include farmland, residential area, business district, parks, forests, factories, public facilities, water body, open spaces, roads, railways, golf courses, etc. We aggregate these data to the spatial scale of our places; thus, for each place, we have a 17-dimensional vector where each element shows how many pixels of a specific land type exist in that place.
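The aggregation into 17-dimensional count vectors can be sketched as below. The sketch assumes that each landuse pixel is given by projected coordinates in meters together with an integer category index from 0 to 16. The function name `aggregate_landuse`, the 500-meter place size, and the sample pixels are illustrative assumptions and do not reflect the actual grid sizes used in this study.

```python
from collections import defaultdict

NUM_CATEGORIES = 17  # number of landuse categories in the dataset


def aggregate_landuse(pixels, place_size_m):
    """Count landuse pixels per category within each place.

    pixels: iterable of (x_m, y_m, category), with 0 <= category < NUM_CATEGORIES.
    place_size_m: side length of a place grid cell in meters (hypothetical value).
    Returns {(col, row): [count for each of the 17 categories]}.
    """
    counts = defaultdict(lambda: [0] * NUM_CATEGORIES)
    for x_m, y_m, category in pixels:
        place = (int(x_m // place_size_m), int(y_m // place_size_m))
        counts[place][category] += 1
    return dict(counts)


# Illustrative pixels: three fall in place (0, 0), one in place (1, 0).
pixels = [(10.0, 10.0, 0), (120.0, 30.0, 0), (250.0, 400.0, 3), (600.0, 20.0, 5)]
landuse = aggregate_landuse(pixels, place_size_m=500.0)
```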

4.2 Experiment Settings

4.2.1 Model Hyperparameter Settings

To conduct the representation learning of places described in Section 3.2, we set up the model and input data as follows. The model consists of the embedding layer, LSTM RNN block, readout layer, and the output layer. While the main input of the model is a sequence of staypoints representing a user’s movement, we added two supplementary values, which are the timestamp of when the user entered that place and the duration of the stay, to incorporate the time-dependency of the users’ behavior. The embeddings of staypoints were set to 96-dimensional vectors. The timestamp and stay duration were converted to 8-dimensional and 4-dimensional vectors respectively, and the three vectors at each step were concatenated into a 108-dimensional vector. The LSTM RNN block scanning over the embedding sequence consists of two layers of size 128, and the hidden vectors of both layers were fed into the readout layer of size 96, whose output was then read by the output layer to produce the probability distribution over staypoints for the next place prediction. The parameter matrix of the staypoint embedding was reused as the output layer’s matrix to reduce the total number of parameters and make the training data usage more efficient. We applied dropout with a keep probability of 0.7 at three points of the model: the embedding layer, readout layer, and output layer. We continued the training for 20 epochs, evaluated performance on the validation data at the end of each epoch, and used the embedding matrix of the best model for subsequent processing.

4.2.2 Comparative Methods

We first assess the quality of place representations generated by MobLSTM and Joint-MobLSTM in Section 4.3.1. Then, after clarifying that the generated representations accurately embed the functions of places in each city individually, we validate the performances of translation models MobLSTM, Joint-MobLSTM, MobLSTM-Adv, and MobLSTM-P in Section 4.3.2.

4.2.3 Evaluation Metrics

Two metrics are used to evaluate the performances of the translation methods. Given two sets of places, we measure the average mutual norm distance and average mutual cosine similarity of the place representations of those places.

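Average Mutual Norm Distance (AMND). Given two place representations $x_i$ and $x_j$, the norm distance is defined as $\lVert x_i - x_j \rVert$. The average mutual norm distance between sets of places $P$ and $Q$ is defined by the following:

$$ \mathrm{AMND}(P, Q) = \frac{1}{N} \sum_{(i, j) \in \mathcal{C}(P, Q)} \lVert x_i - x_j \rVert, $$

where $\mathcal{C}(P, Q)$ is the set of unique combinations between the places in sets $P$ and $Q$, and $N = |\mathcal{C}(P, Q)|$ is the number of such combinations.

Average Mutual Cosine Similarity (AMCS). Cosine similarity is a commonly used metric to measure the similarity between two place representations $x_i$ and $x_j$, and is defined as:

$$ \cos(x_i, x_j) = \frac{x_i \cdot x_j}{\lVert x_i \rVert \, \lVert x_j \rVert}. $$

We use the average mutual cosine similarity to measure the similarity between representations of two sets of places. Similar to the average mutual norm distance, the average mutual cosine similarity between sets of places $P$ and $Q$ is defined by the following:

$$ \mathrm{AMCS}(P, Q) = \frac{1}{N} \sum_{(i, j) \in \mathcal{C}(P, Q)} \cos(x_i, x_j). $$
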
In both experiments (Sections 4.3.1 and 4.3.2), we show whether the results are statistically significant by comparing the performance metrics to those of random pairs of places generated by the same model. For example, to assess the quality of place representations of MobLSTM for business places in Kumamoto in Section 4.3.1, we compare the AMND between representations of pairs of business places in Kumamoto generated by MobLSTM, against the AMND between representations of pairs of business places and randomly selected non-business places in Kumamoto generated by MobLSTM. Similarly, to validate the translation performance of MobLSTM-P for business places in Section 4.3.2, we compare the AMND between representations of pairs of business places in Kumamoto and Okayama translated by MobLSTM-P, against the AMND between representations of pairs of business places in Kumamoto and randomly selected non-business places in Okayama translated by MobLSTM-P. Similarity between random pairs is shown in Figures 2 and 3 as gray crosses (vertical lines indicate error bars).
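The two metrics and the comparison against a reference set can be sketched in plain Python. The sketch assumes the Euclidean norm for the norm distance, which is an assumption made for the example, and all function names and sample vectors are hypothetical. Passing a single set computes the metric over unique pairs within that set (the intra-city case), while passing two sets pairs every place of the first with every place of the second (the inter-city or baseline case).

```python
import math
from itertools import combinations, product


def norm_distance(x, y):
    """Euclidean distance between two representations (assumed norm)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero representations."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def average_mutual(metric, set_p, set_q=None):
    """Average of `metric` over unique pairs of places.

    set_q is None: unique unordered pairs within set_p.
    otherwise:     every pair (p, q) with p in set_p and q in set_q.
    """
    if set_q is None:
        pairs = list(combinations(set_p, 2))
    else:
        pairs = list(product(set_p, set_q))
    return sum(metric(x, y) for x, y in pairs) / len(pairs)


# Illustrative 2-dimensional representations.
business = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]
non_business = [[0.0, 1.0], [-0.5, 0.5]]

amcs_same = average_mutual(cosine_similarity, business)                    # same label
amcs_baseline = average_mutual(cosine_similarity, business, non_business)  # baseline
amnd_same = average_mutual(norm_distance, business)
amnd_baseline = average_mutual(norm_distance, business, non_business)
```

In practice the baseline set would be resampled several times to obtain the error bars described in the text.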

4.3 Results

Figure 2: Intra-city validation results of business, shopping, residential and farmland areas in Kumamoto and Okayama with (A) AMND and (B) AMCS metrics. For both models, generated representations of all landuse types showed statistically significant intra-city similarity.

4.3.1 Intra-city Validation of Place Representations

To validate whether the place representations were correctly generated (i.e. representations of places with the same functionality are mapped closely), we measured the AMND and AMCS between places with the same landuse labels (business districts, shopping malls, residential areas, and farmland areas). We refer to this validation scheme as “intra-city validation”. Figure 2 shows the validation results for both cities. Note that for norm distance (A), lower is better, and for cosine similarity (B), higher is better. All error bars (vertical lines) show the standard deviation of the results of 10 iterations. Results show that both models were able to generate accurate representations, and embedded places with the same landuse labels closer to each other than randomly selected pairs of places.

MobLSTM and Joint-MobLSTM had comparable performances for generating place representations; however, MobLSTM performed slightly better (lower norm distances and higher cosine similarity) for many of the landuse types. This result agrees with our intuition, because MobLSTM is able to allocate more dimensions in the parameter space to encode information related to places in a single city, whereas Joint-MobLSTM shares the parameter space across different cities, leaving fewer dimensions to encode the representations of each city.

Figure 3: Inter-city translation results from Kumamoto to Okayama of business, shopping, residential and farmland areas with (A) AMND and (B) AMCS metrics. MobLSTM-P is able to translate place representations across Kumamoto and Okayama for business, shopping, and residential areas, but not for farmland areas where few human mobility patterns are observed.

Figure 4: Qualitative analysis and inspection of the translated place representations across cities using MobLSTM-P. The three panels show that for both directions (Kumamoto → Okayama and vice versa), places such as shopping malls, business districts, and public parks were successfully translated so that places with similar functions from different cities were mapped closely together in the common vector space.

4.3.2 Inter-city Translation of Place Representations

To quantitatively validate the performance of translating place representations across cities, we measured the AMND and AMCS between places with the same landuse types across different cities (e.g. similarity between representations of places with shopping malls in Kumamoto and representations of places with shopping malls in Okayama). We refer to this validation scheme as “inter-city translation”. Figure 3 shows the translation accuracy of all tested methods. Out of the four models, MobLSTM involves no translation operation, and for all landuse types its AMCS performance is worse than random. This confirms that the negative example illustrated in Figure 1 does occur, and that a translation operation is indeed needed to compare place representations from different cities.

The remaining models allow us to compare the performances of different translation methods. For business, shopping, and residential areas, the AMND is lower than random for all methods, which implies that all of these types of places are clustered together in the vector space, away from the farmland areas. The AMCS metric allows us to measure more specific differences in the place representations, by normalizing the vectors by their lengths. AMCS results show that MobLSTM-P is able to translate representations of business, shopping and residential places successfully (with statistical significance). Even though the Joint-MobLSTM and MobLSTM-Adv approaches have been shown to succeed in language translation tasks, they fail to do so in place translation tasks. The failure of Joint-MobLSTM implies that the model was complex enough to completely distinguish places in Kumamoto from those in Okayama, and to embed them separately in the common vector space, contrary to our intuition. For Joint-MobLSTM to perform better, a further search for an appropriate model architecture may be effective; however, such a search is not cost effective considering the vast model space. The failure of MobLSTM-Adv to align the two vector spaces indicates that the probability distributions of the place representations of the two cities are completely different, in contrast to word vector spaces. Even with MobLSTM-P, representations of farmlands were not successfully translated across cities. This is because few farmland areas are used as anchor points in our dictionary for solving the optimization problem (Section 3.3.2), due to the lack of observed human mobility patterns in such areas. Overall, results in Figure 3 confirm that MobLSTM-P is successful in translating representations of places visited by people (business, shopping, and residential) across cities.

4.3.3 Qualitative Inspection of Translated Representations

In addition to the quantitative evaluation, we inspected whether the place representations translated by MobLSTM-P were actually mapped close to similar places in the target city. Figure 4 shows successful cases where the place of the source city is mapped closely to similar locations in the target city. In each of the three panels, the original place in the source city is shown in the left black box (e.g. Shopping Mall “Aeon Mall Kurashiki” of Okayama), and the cosine similarity values between the representations of all places in the target city and the translated representation of the original place are shown in color on the map. Red and blue colors show high and low similarity, respectively. The color bar is adjusted so that only the places in the top 5 percent of cosine similarity are shown in red. For places with high similarity (red places), POIs within each place are annotated on the map.

The left panel shows how a shopping mall in Okayama, when translated to the Kumamoto vector space, becomes mapped close to major shopping malls in Kumamoto, including the central shopping district, Aeon Mall Kumamoto, and several other shopping facilities. The two panels on the right side show how translation of places in the opposite direction (Kumamoto → Okayama) also produced intuitive and accurate results. The top right panel shows that the business districts of Kumamoto (the Kami-tori and Shimo-tori area), when translated to the Okayama vector space, were similar to the two major city center districts (Okayama city center and Kurashiki city center), implying that urban functionality can also be translated successfully, reinforcing our results in Figure 3. The bottom right panel shows an instance where even a major public park in Kumamoto (“Suizenji-Ezuko Park”) was successfully translated so that it became mapped close to a major park in Okayama (“Okayamaken Kurashiki Sports Park”). Although public parks were not included in our quantitative evaluation, this result implies that MobLSTM-P can translate more specific places of interest to other cities. Overall, Figure 4 shows promising results that MobLSTM-P successfully maps similar places together onto the common vector space.

Figure 5: Translation performance of shopping malls from Kumamoto to Okayama using MobLSTM-P, with different numbers of anchor places for translation (x-axis), chosen by two different criteria (most frequently visited or random).

4.3.4 Sensitivity to Number of Place Pairs

The MobLSTM-P model requires a synthetic dictionary (a dataset with pairs of places) used to solve the optimization task. In the previous sections, we used a fixed default number of anchor places (i.e. pairs of places used for the optimization task). Here we show a sensitivity analysis of this parameter, by looking at the average mutual cosine similarity of shopping places in Kumamoto and Okayama using MobLSTM-P with different numbers of anchor places. We observe from Figure 5 that the performance initially increases as we increase the number of pairs. However, the performance then plateaus and eventually decreases as more pairs are added, implying that choosing too many places as anchors adds too much irrelevant information for choosing the optimal rotation. Even though our method beats random pairing (light green color) for all values, this result indicates that selecting the appropriate number of anchor places is important for the performance of our method. Further investigation using data from more cities is needed to determine whether there is a universal rule for determining the appropriate number of anchor places.

5 Discussions

In this paper, we proposed and tested methods to translate place representations across cities. Experimental results using real world mobility data from two cities in Japan showed that we are indeed able to translate representations of places across cities accurately using MobLSTM-P, which finds the best rotational alignment between vector spaces by optimizing over anchor places chosen by visit frequency. We showed both quantitatively and qualitatively that places from different cities with similar landuse types became mapped closely in the common vector space after translation. Moreover, although the task of translating place representations across cities is analogous to word translation across languages, we observed several differences in the problem setting through failures of methods that were successful in the language translation domain, namely the joint learning (Joint-MobLSTM) and adversarial learning (MobLSTM-Adv) approaches. In addition to evaluating the translation performances, we showcased a case study of an important urban challenge that may be better approached using our inter-city translation method, which was to use the representations of evacuation shelter locations from a past disaster to predict evacuation shelters in a future disaster in another city.

Now, we discuss future research opportunities that this study enables. The first direction of research is on improving the accuracy of the translation task. In this study, we tested several methods that extended the state of the art methods for unsupervised machine language translation developed in the natural language processing field. However, we believe that the accuracy can be improved by further integrating characteristics specific to geographical locations that differ from those of words and sentences. For example, words have stronger interchangeability characteristics, since words can often be interchanged with very little or even no cost at all (“I have a cat” → “I have a dog”). However, this is much less likely in sequences of places, and it is much rarer to have two or more places that are exactly interchangeable in people's routines. Integrating insights from the human behavioral sciences into building the representation learning and translation model would be an interesting topic for future studies.

The second direction of research is to increase the diversity of cities used for testing. Although the finding that we are able to translate place representations across different cities (Kumamoto and Okayama) was insightful and promising, we are motivated to further investigate whether this method works across a more diverse set of cities, such as Tokyo, Japan and Indianapolis, USA, where various aspects (e.g. social norms, people's mobility patterns, city structures) differ more than between Kumamoto and Okayama. Should the method fail for such diverse pairs of cities, developing new models that consider exogenous contexts via fusion with other data sources would be an important and interesting problem. We hope to utilize a larger mobility dataset to investigate this topic in future studies.

We would also like to look into potential problems to which this technique can be applied. Selecting appropriate locations to open new stores has been a popular problem in urban planning [xu2016]. Testing whether translating successful and unsuccessful locations across cities can predict the success or failure of new stores is of future research interest.

6 Related Works

6.1 Place Representation Learning

Learning the representations of places has been a popular research topic in the field of urban computing [zheng2014urban], often as a subproblem of larger tasks such as POI recommendation [chang2018content] and site selection [xu2016]. Recent developments in representation learning in the natural language processing field, such as word2vec [mikolov2013distributed], have inspired many studies on place representation learning. Models such as SkipGram and POI2vec have applied ideas similar to word2vec to social media check-in data, where sequences of POIs are treated as sentences in the word2vec model [liu2016exploring, feng2017poi2vec]. CAPE uses Instagram check-in data together with text data for embedding the POIs [chang2018content]. Geo-Teaser uses the users’ check-in data and also considers the geographical proximity of POIs when generating representations [zhao2017geo]. Place2vec does not use the user check-in sequences, but instead uses the physical proximity between POIs and the number of visit counts to increase training data so that such pairs of POIs would have similar embeddings [yan2017itdl].

With the availability of large scale mobility data such as taxi GPS data and mobile phone data, more recent studies such as DeepMove have extended the aforementioned methods to spatio-temporally richer datasets [zhou2018deepmove]. Wang et al. [wang2017region] have extended the idea to learn the representations from mobility flows modeled as flow graphs, to include multi-hop transitions in their embeddings. Using the New York Taxi GPS dataset, ZE-Mob proposes an origin-destination coupled embedding model, where the assumptions are that the origin and destination of the same trip should have similar representations, and that trips taken in similar timeframes are similar [yao2018representing]. Since the main focus of this paper is to show that we can translate the produced representations across cities, we do not replicate and compare the performances of all of the models listed here. Instead, we apply a method similar to ZE-Mob in this study to produce place representations from mobility data.
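To make the idea of learning place representations from origin-destination data concrete, the following is a minimal sketch of a generic co-occurrence factorization (PPMI followed by truncated SVD). It is an illustration only: it is neither ZE-Mob itself nor the model used in this paper, it ignores the temporal assumption entirely, and the function name `od_ppmi_embeddings`, the toy trips, and the zone indices are all hypothetical.

```python
import numpy as np

def od_ppmi_embeddings(trips, n_zones, dim):
    """Embed zones from (origin, destination) index pairs via PPMI + truncated SVD."""
    # Symmetric co-occurrence counts: each trip links its origin and destination.
    counts = np.zeros((n_zones, n_zones))
    for origin, dest in trips:
        counts[origin, dest] += 1.0
        counts[dest, origin] += 1.0
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    expected = row @ col / total  # counts expected if zones were paired independently
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts / expected)
    # Keep only positive, finite PMI values (zero counts yield -inf or nan).
    ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)
    u, s, _ = np.linalg.svd(ppmi)
    return u[:, :dim] * np.sqrt(s[:dim])

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy data: zones 0 and 1 exchange trips with zone 2,
# while zones 3 and 4 exchange trips with zone 5.
toy_trips = [(0, 2), (1, 2), (3, 5), (4, 5)]
emb = od_ppmi_embeddings(toy_trips, n_zones=6, dim=4)
sim_same_profile = cosine(emb[0], emb[1])
sim_diff_profile = cosine(emb[0], emb[3])
```

In this toy setting, zones with the same trip co-occurrence profile (0 and 1) receive more similar vectors than zones whose trips connect to a different part of the city (0 and 3).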

6.2 Unsupervised Machine Translation

Machine translation, especially in the unsupervised setting, has become a popular topic in the representation learning literature, due to the difficulty of collecting large scale cross-lingual training data [artetxe2017unsupervised, artetxe2018unsupervised]. This unsupervised setting applies to our problem setting, since we are not given any training data in the form of location pairs from different cities to learn which places should be closely embedded. Studies take different approaches to unsupervised word translation tasks: Zhang et al. [zhang2017adversarial] use an adversarial training approach, Conneau et al. [conneau2017word] learn a linear transformation matrix to map words in one language to another, Artetxe et al. obtain better initial estimates using probability density functions of distances to other word embeddings, and Wada et al. [wada2018unsupervised] apply a shared LSTM model to jointly embed two languages. Cross-comparative studies of these methods have shown that the LSTM models work best in low-resource settings (i.e., with limited training data). Hamilton et al. [hamilton2016diachronic] apply an idea similar to that of Conneau et al. [conneau2017word] to align word embeddings computed from corpora from different years, to understand the transition of the semantics of words over the years.
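As an illustration of how a linear map between two embedding spaces can be fitted, the following is a minimal sketch of orthogonal Procrustes alignment, a standard closed-form solution when paired anchor embeddings are available. The anchors here are synthetic and the function name `procrustes_rotation` is hypothetical; the sketch does not reproduce any of the cited systems, which differ in how anchors are obtained without supervision.

```python
import numpy as np

def procrustes_rotation(src_anchors, tgt_anchors):
    """Return the orthogonal W minimizing ||src_anchors @ W - tgt_anchors||_F."""
    # SVD of the cross-covariance between the paired anchor embeddings.
    u, _, vt = np.linalg.svd(src_anchors.T @ tgt_anchors)
    return u @ vt

# Synthetic check (assumption: anchor pairs are known and noise-free).
rng = np.random.default_rng(0)
dim, n_anchors = 8, 50
tgt = rng.normal(size=(n_anchors, dim))
# A random orthogonal matrix obtained from a QR decomposition.
true_w, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
# The source space is the target space seen through the inverse of true_w.
src = tgt @ true_w.T

w = procrustes_rotation(src, tgt)
aligned = src @ w
error = float(np.linalg.norm(aligned - tgt))
```

Constraining the map to be orthogonal preserves distances and angles within the source space, so neighborhood structure among the translated embeddings is unchanged by the alignment.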

In the context of urban mobility, Pang et al. [pang2018replicating] applied a reinforcement learning approach to replicate the usual urban dynamics. The study closest to ours is CityCoupling, which attempts to map human mobility trajectories from one city to another [fancoupling]. Although the framework replicates the mobility patterns on New Year's Day in another city well, the model does not focus on translating the functions of different locations. Moreover, the simulated human mobility trajectories for the scenario in which the Great East Japan Earthquake had happened in Osaka lack validation. In this paper, we apply and extend the methods developed for unsupervised machine translation to translate place representations across different cities and test their performances using real world data.

7 Conclusion

Despite the popularity of understanding urban functions through place representations generated from mobility patterns, no work so far has attempted to translate the representations across different cities. To bridge this gap, we tested methods inspired by machine language translation to translate place representations across cities. Experimental results show that our methods were able to translate place representations, so that functionally similar places from different cities were mapped closely together in the common vector space. The experimental results as well as the case study on disaster shelter prediction show that this avenue of research may offer a novel and effective approach to solving various urban challenges via knowledge sharing across different cities.