Spatial representation learning (SRL) refers to exploiting representation learning techniques to learn features of spatial network data; it has been successfully applied in many real-world scenarios, such as transportation networks, power networks, social networks, and water supply networks (Zhang et al., 2018). In reality, many practical applications need to understand not only which features are effective, but also what these effective features stand for. This issue relates to two tasks: 1) deep representation learning; and 2) label generation and matching for latent embedded features. Although there is a rich body of work in SRL with spatial data, including node embedding, autoencoder, random walk, adversarial learning, and generative learning based methods (Wang and Li, 2017; Wang et al., 2018b, a, 2020a), research on unifying the two tasks is still in its early stage.
In response, we formulate the problem as a task of feature-topic pairing (Figure 1), which is to align a latent embedding feature space, consisting of multiple latent features, with a textual semantic topic space, consisting of multiple topic labels, during SRL. The basic idea is to teach a machine to extract topic labels from texts, and then pair the labels with learned features. To that end, we propose a novel deep learning framework to unify feature learning, topic selection, and feature-topic pairing. There are three unique challenges in addressing the problem: (1) Label Generation Challenge: a textual semantic topic space is difficult to construct from unstructured spatial texts; (2) Measurement Challenge: a sound measurement is needed to evaluate the alignment, i.e., to quantify the matching score, between the topic label space and the embedding feature space; (3) Optimization Challenge: a deep optimization framework is needed to jointly and simultaneously unify the three tasks of feature learning, topic label selection, and feature-topic pairing.
To solve the three challenges, we develop a new PSO-based framework (named AutoFTP) that encloses the optimizations of feature learning, topic selection, and feature-topic pairing in a loop. Specifically, our contributions are: (1) formulating the feature-topic pairing problem to relieve the scarcity of semantic labels; (2) proposing a three-step method for generating candidate topic labels; (3) deriving a feature-topic alignment measurement, based on point-wise alignment between an embedding feature vector and a categorical topic distribution, and pair-wise alignment for the consistency of the feature-feature and topic-topic similarity matrices; (4) developing a Particle Swarm Optimization (PSO)-based algorithm for unified optimization.
2. Proposed Method
2.1. The Feature-Topic Pairing Problem
The feature-topic pairing problem aims to pair the latent features extracted by representation learning with the explicit topics of the texts of a spatial entity. Formally, given a set of $N$ spatial entities, the $n$-th entity is described by multiple graphs (e.g., a POI-POI distance graph $\mathcal{G}_d$ and a POI-POI mobility graph $\mathcal{G}_m$, defined below) and a topic distribution $\mathbf{t}_n$ extracted from its textual descriptions. Let $\mathbf{z}_n \in \mathbb{R}^K$ be the embedding vector of the $n$-th entity. The objective is to optimize a function that measures representation loss and feature-topic alignment:
$$\min_{\mathbf{Z}} \; \mathcal{L}_{rec}(\mathbf{Z}) + \mathcal{L}_{align}(\mathbf{Z}, \{\mathbf{t}_n\}),$$
where $\mathbf{Z} = [\mathbf{z}_1, \dots, \mathbf{z}_N]$ denotes the embeddings of all spatial entities, and $K$ is the number of features of an embedding vector.
Textual Topic Extraction. We generate topics in three steps. First, we collect the text descriptions of all entities and extract keywords from them using the TextRank algorithm (Mihalcea and Tarau, 2004). Second, we leverage a pre-trained language model (He, 2014) to learn the word embedding of each keyword. Third, we exploit a Gaussian Mixture Model (GMM) to cluster the keyword embeddings into topics; the clustering model provides a topic label for each keyword.
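The clustering step above can be sketched as follows. This is a minimal pure-Python EM loop for a two-component, one-dimensional Gaussian mixture with fixed variances and uniform priors; the actual method fits a full GMM over high-dimensional pre-trained word embeddings, and the `embeddings` values here are purely illustrative:

```python
import math

def gmm_1d(xs, means, iters=50, var=1.0):
    """Tiny EM loop for a 1-D Gaussian mixture with fixed unit
    variances: enough to show how keyword embeddings are
    soft-assigned to topics and then given hard topic labels."""
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            dens = [math.exp(-(x - m) ** 2 / (2 * var)) for m in means]
            s = sum(dens)
            resp.append([d / s for d in dens])
        # M-step: re-estimate component means
        for k in range(len(means)):
            w = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / w
    # hard topic label = most responsible component
    return [max(range(len(means)), key=lambda k: r[k]) for r in resp]

# toy 1-D "keyword embeddings": two well-separated groups
embeddings = [0.1, 0.2, 0.15, 5.0, 5.1, 4.9]
labels = gmm_1d(embeddings, means=[0.0, 4.0])
```

Each keyword inherits the label of the mixture component most responsible for its embedding, which is how the topic label space is populated.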
Embedding of Spatial Entities. We construct graphs to capture the spatial autocorrelation between spatial entities. Specifically, we describe a spatial entity in terms of its POIs by building two graphs: (i) a POI-POI distance graph $\mathcal{G}_d$, where POI categories are nodes and the average distances between POI categories are edge weights; and (ii) a POI-POI mobility graph $\mathcal{G}_m$, where nodes are POI categories and edge weights are human mobility connectivity, extracted by the method in (Wang et al., 2018a). We then apply a Graph Auto-Encoder (GAE) (Kipf and Welling, 2016) as the spatial representation learner to learn spatial embeddings over each of the two graphs. Finally, we average the embeddings from the two graphs to construct the unified spatial embedding of the entity, denoted by $\mathbf{z}_n$.
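The aggregation step can be sketched as below; `fuse_embeddings` is a hypothetical helper name, and the two input vectors stand in for the GAE outputs on the distance and mobility graphs:

```python
def fuse_embeddings(z_dist, z_mob):
    """Element-wise average of the distance-graph and mobility-graph
    embeddings of one spatial entity (both length-K vectors)."""
    assert len(z_dist) == len(z_mob)
    return [(a + b) / 2.0 for a, b in zip(z_dist, z_mob)]

# hypothetical GAE outputs for one entity (K = 4)
z = fuse_embeddings([0.2, 0.4, 0.0, 1.0], [0.0, 0.6, 0.2, 1.0])
```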
2.3. PSO-Based Feature-Topic Pairing
2.3.1 Measuring the Alignment of Embedding and Semantic Spaces. To pair features with topics, we conduct space alignment from the point-wise and pair-wise perspectives, considering the alignment of the coordinate systems and of the information contents, respectively. For convenience, we take the $n$-th entity as an example to explain the calculation.
1) Point-wise Alignment Loss $\mathcal{L}_{point}$. Intuitively, the embedding features of a spatial entity and its corresponding topics should reach a consensus on describing the entity, so the correlation between them should be maximized. Therefore, we first select $K$ values from the topic vector to form the vector $\tilde{\mathbf{t}}_n$, which contains the most representative semantics in the semantic space. Then, we maximize the correlation between $\tilde{\mathbf{t}}_n$ and the spatial embedding $\mathbf{z}_n$, which is equivalent to minimizing the negative correlation between the two vectors:
$$\mathcal{L}_{point} = -\frac{\mathrm{cov}(\mathbf{z}_n, \tilde{\mathbf{t}}_n)}{\sigma_{\mathbf{z}_n}\,\sigma_{\tilde{\mathbf{t}}_n}},$$
where $\mathrm{cov}(\cdot)$ denotes the covariance and $\sigma$ denotes the standard deviation.
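Using the covariance and standard deviations described above, the point-wise loss for one entity can be sketched in a few lines of plain Python (the vector values are illustrative):

```python
import math

def pointwise_alignment_loss(z, t):
    """Negative Pearson correlation between an entity's embedding z
    and its selected topic vector t (both length-K). Minimizing
    this loss maximizes their correlation."""
    k = len(z)
    mz, mt = sum(z) / k, sum(t) / k
    cov = sum((a - mz) * (b - mt) for a, b in zip(z, t)) / k
    sz = math.sqrt(sum((a - mz) ** 2 for a in z) / k)
    st = math.sqrt(sum((b - mt) ** 2 for b in t) / k)
    return -cov / (sz * st)

# a perfectly aligned (linearly related) pair reaches the minimum, -1
loss = pointwise_alignment_loss([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```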
2) Pair-wise Alignment Loss $\mathcal{L}_{pair}$. On the other hand, the embedding features and the corresponding topics should be consistent in terms of pair-wise similarity within each space. Therefore, we minimize the difference between the pair-wise similarities of the two spaces. Specifically, we first construct the topic-topic similarity matrix $\mathbf{S}^t$, whose entries are the similarities between any two topics, and the feature-feature similarity matrix $\mathbf{S}^f$, whose entries are the similarities between any two features of the spatial embeddings. We then keep the pair-wise consistency between $\mathbf{S}^t$ and $\mathbf{S}^f$ by minimizing the Frobenius norm of their difference:
$$\mathcal{L}_{pair} = \left\| \mathbf{S}^t - \mathbf{S}^f \right\|_F^2.$$
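The pair-wise consistency check reduces to the squared Frobenius norm of the difference of the two similarity matrices, sketched here with toy 2x2 matrices:

```python
def pairwise_alignment_loss(S_topic, S_feat):
    """Squared Frobenius norm of the difference between the
    topic-topic and feature-feature similarity matrices."""
    return sum((a - b) ** 2
               for row_t, row_f in zip(S_topic, S_feat)
               for a, b in zip(row_t, row_f))

S_t = [[1.0, 0.5], [0.5, 1.0]]  # toy topic-topic similarities
S_f = [[1.0, 0.1], [0.1, 1.0]]  # toy feature-feature similarities
loss = pairwise_alignment_loss(S_t, S_f)  # 2 * (0.4)^2 = 0.32
```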
2.3.2 Supervised PSO for Automatic Topic Selection.
To select the best $K$ topics for feature-topic alignment, we formulate the joint task of feature learning, topic selection, and feature-topic pairing as a PSO problem. Specifically, we first randomly initialize a number of particles, where each particle is a binary topic mask (a mask value of 1 indicates "select" and 0 indicates "deselect"); each particle thus selects a subset of topics. For each selected topic subset, a multi-objective deep learning model, whose objective function includes the losses of graph reconstruction, semantic alignment, and the regression estimator of the downstream task, is trained to learn the spatial representations. As an application, we use the embeddings of spatial entities (residential communities) to predict their real estate prices, and the loss of the regression model is:
$$\mathcal{L}_{reg} = \frac{1}{N} \sum_{n=1}^{N} (y_n - \hat{y}_n)^2,$$
where $y_n$ is the gold-standard real estate price and $\hat{y}_n$ is the predicted price. Next, we calculate the fitness of each particle according to the total loss of the deep model:
$$\mathrm{fitness} = \mathcal{L}_{rec} + \mathcal{L}_{point} + \mathcal{L}_{pair} + \mathcal{L}_{reg}.$$
Then, we utilize the fitness to inform all particles how far they are from the best solution; each particle then moves toward the solution based on not only its own status but also the other particles' movements. After the fitness value converges, PSO identifies the best topic subset. Finally, we obtain the semantically rich embeddings of the spatial entities from the model trained with the best topic subset.
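The selection loop can be sketched as a standard binary PSO, with the deep model's total loss replaced by a toy fitness function; `binary_pso`, `toy_fitness`, and all hyperparameters below are illustrative, not the paper's actual settings:

```python
import math
import random

def binary_pso(n_topics, fitness, n_particles=8, iters=40, seed=7):
    """Minimal binary PSO: each particle is a 0/1 topic mask, and the
    velocity is squashed through a sigmoid to give the probability of
    each bit being 1 (lower fitness = better particle)."""
    rng = random.Random(seed)
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    pos = [[rng.randint(0, 1) for _ in range(n_topics)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_topics for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                # personal bests
    gbest = min(pbest, key=fitness)[:]         # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_topics):
                r1, r2 = rng.random(), rng.random()
                # pull toward personal and global bests
                vel[i][d] += 2 * r1 * (pbest[i][d] - pos[i][d]) \
                           + 2 * r2 * (gbest[d] - pos[i][d])
                pos[i][d] = 1 if rng.random() < sig(vel[i][d]) else 0
            if fitness(pos[i]) < fitness(pbest[i]):
                pbest[i] = pos[i][:]
        gbest = min(pbest + [gbest], key=fitness)[:]
    return gbest

# toy fitness: pretend topics 0 and 2 are the useful ones
target = [1, 0, 1, 0, 0]
toy_fitness = lambda mask: sum(m != t for m, t in zip(mask, target))
best = binary_pso(5, toy_fitness)
```

In AutoFTP the fitness evaluation of one mask corresponds to training the multi-objective deep model on the selected topic subset, which is what makes the search "supervised" by the downstream task.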
3. Experimental Results
3.1. Evaluation Task
In this work, we apply the proposed AutoFTP to real estate price prediction as the evaluation task. Specifically, we first apply AutoFTP to learn representations of spatial entities based on their geographical structural information and related text descriptions. Then, we build a deep neural network (DNN) model that predicts the average real estate price of each spatial entity from its representation. We use RMSE, MAE, MAPE, and MSLE as the evaluation metrics.
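The four metrics can be computed as follows; `regression_metrics` is a hypothetical helper, MAPE is reported as a percentage, and MSLE is computed with `log1p` as is conventional:

```python
import math

def regression_metrics(y_true, y_pred):
    """RMSE, MAE, MAPE (%), and MSLE for the price-prediction task."""
    n = len(y_true)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mape = 100.0 * sum(abs(t - p) / t for t, p in zip(y_true, y_pred)) / n
    msle = sum((math.log1p(t) - math.log1p(p)) ** 2
               for t, p in zip(y_true, y_pred)) / n
    return rmse, mae, mape, msle

# toy prices: errors of -10 and +10 on two entities
rmse, mae, mape, msle = regression_metrics([100.0, 200.0], [110.0, 190.0])
```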
3.2. Data Description
Table 2 shows the statistics of the five data sources used in the experiments. Specifically, the taxi trace data describe the GPS trajectories of taxis in Beijing from April to August 2012; the residential regions, texts, and real estate prices are crawled from www.fang.com; and the POI information is extracted from www.dianping.com.
| Data source | Statistic | Value |
| --- | --- | --- |
| Taxi Traces | Number of taxis | 13,597 |
| | Time period | Apr. - Aug. 2012 |
| Residential Regions | Time period of transactions | 04/2011 - 09/2012 |
| POIs | Number of POIs | 328,668 |
| | Number of POI categories | 20 |
| Texts | Number of textual descriptions | 2,990 |
| | Time period | 04/2011 - 09/2012 |
| Real Estate Prices | Number of real estate prices | 41,753 |
| | Time period | 12/2011 - 06/2012 |
3.2.1. Baseline Algorithms.
We compared our proposed method with seven baseline algorithms: AttentionWalk (Abu-El-Haija et al., 2018), ProNE (Zhang et al., 2019), GatNE (Cen et al., 2019), GAE (Kipf and Welling, 2016), DeepWalk (Perozzi et al., 2014), Node2Vec (Grover and Leskovec, 2016), and Struc2Vec (Ribeiro et al., 2017). Besides, AutoFTP has four losses: the reconstruction loss $\mathcal{L}_{rec}$, the point-wise alignment loss $\mathcal{L}_{point}$, the pair-wise alignment loss $\mathcal{L}_{pair}$, and the regression loss $\mathcal{L}_{reg}$. Accordingly, we also derive variants of AutoFTP that each keep only a subset of these losses, to study the contributions of the alignment terms.
3.3. Overall Performance
Table 1 shows the comparison of all 11 models. As can be seen, AutoFTP overall outperforms the baseline algorithms in terms of RMSE, MAE, MAPE, and MSLE. A possible reason for this observation is that, compared with the baseline algorithms, AutoFTP not only captures geographical structural information but also preserves the rich semantics of spatial entities. Besides, the regression estimator (the downstream task) of AutoFTP provides a clear learning direction (accuracy) for spatial representation learning. Thus, in the downstream predictive task, the spatial embedding features learned by AutoFTP beat all baselines.
4. Related Work
Graph Representation Learning with Latent Semantics. Graph representation learning refers to techniques that preserve the structural information of a graph in a low-dimensional vector (Wang et al., 2020b, 2016). However, because traditional graph representation learning models are implemented with deep neural networks, the learned embeddings lack interpretability. Recently, to overcome this limitation, researchers have leveraged the texts related to graphs to learn semantically rich representations (Mai et al., 2018; Xiao et al., 2017).
Topic Models in the Spatio-temporal Domain. Topic models aim to automatically cluster word and expression patterns to characterize documents (Xun et al., 2017; Lee and Kang, 2018). Recently, to understand the hidden semantics of spatial entities, many researchers have applied topic models in the spatio-temporal data mining domain (Huang et al., 2020, 2019). In this paper, we employ a pre-trained language model to obtain keyword embeddings and utilize a Gaussian Mixture Model to extract topic distributions from those embeddings.
5. Conclusion
We presented AutoFTP, a novel spatial representation learning (SRL) framework. The spatial embeddings produced by traditional SRL models lack semantic meaning. To overcome this limitation, we formulated the feature-topic pairing problem and proposed a deep learning framework that unifies representation learning, topic label selection, and feature-topic pairing through a PSO-based optimization algorithm. Extensive experiments demonstrated the effectiveness of AutoFTP compared with baseline models. For future work, we plan to extend our approach from geospatial networks to other applications that involve both graphs and texts, such as social media and software code safety.
Acknowledgments. This research was partially supported by the National Science Foundation (NSF) via the grant numbers 1755946, 2040950, 2006889, 2045567, and 2141095.
References
- Watch your step: learning node embeddings via graph attention. Advances in Neural Information Processing Systems 31, pp. 9180–9190. Cited by: §3.2.1.
- Representation learning for attributed multiplex heterogeneous network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1358–1368. Cited by: §3.2.1.
- Node2vec: scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 855–864. Cited by: §3.2.1.
- HanLP: Han Language Processing. Cited by: §2.2.
- Mobility pattern analysis of ship trajectories based on semantic transformation and topic model. Ocean Engineering 201, pp. 107092. Cited by: §4.
- Adaptive resource prefetching with spatial–temporal and topic information for educational cloud storage systems. Knowledge-Based Systems 181, pp. 104791. Cited by: §4.
- Variational graph auto-encoders. arXiv preprint arXiv:1611.07308. Cited by: §2.2, §3.2.1.
- Identifying core topics in technology and innovation management studies: a topic model approach. The Journal of Technology Transfer 43 (5), pp. 1291–1317. Cited by: §4.
- Combining text embedding and knowledge graph embedding techniques for academic search engines. In Semdeep/NLIWoD@ISWC, pp. 77–88. Cited by: §4.
- TextRank: bringing order into text. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing. Cited by: §2.2.
- Deepwalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 701–710. Cited by: §3.2.1.
- Struc2vec: learning node representations from structural identity. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 385–394. Cited by: §3.2.1.
- Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1225–1234. Cited by: §4.
- Defending water treatment networks: exploiting spatio-temporal effects for cyber attack detection. In 2020 IEEE International Conference on Data Mining (ICDM), pp. 32–41. Cited by: §1.
- Region representation learning via mobility flow. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 237–246. Cited by: §1.
- Learning urban community structures: a collective embedding perspective with periodic spatial-temporal mobility graphs. ACM Transactions on Intelligent Systems and Technology (TIST) 9 (6), pp. 1–28. Cited by: §1, §2.2.
- You are how you drive: peer and temporal-aware representation learning for driving behavior analysis. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2457–2466. Cited by: §1.
- Exploiting mutual information for substructure-aware graph representation learning.. In IJCAI, pp. 3415–3421. Cited by: §4.
- SSP: semantic space projection for knowledge graph embedding with text descriptions. In Thirty-First AAAI Conference on Artificial Intelligence. Cited by: §4.
- A correlated topic model using word embeddings.. In IJCAI, pp. 4207–4213. Cited by: §4.
- Network representation learning: a survey. IEEE transactions on Big Data 6 (1), pp. 3–28. Cited by: §1.
- ProNE: fast and scalable network representation learning.. In IJCAI, Vol. 19, pp. 4278–4284. Cited by: §3.2.1.