
Automated Feature-Topic Pairing: Aligning Semantic and Embedding Spaces in Spatial Representation Learning

Automated characterization of spatial data is a critical form of geographical intelligence. As an emerging characterization technique, Spatial Representation Learning (SRL) uses deep neural networks (DNNs) to learn non-linear embedded features of spatial data. However, SRL extracts features through the internal layers of DNNs, and thus suffers from a lack of semantic labels. Texts of spatial entities, on the other hand, provide semantic understanding of latent feature labels, but are inaccessible to deep SRL models. How can we teach an SRL model to discover appropriate topic labels in texts and pair the learned features with those labels? This paper formulates a new problem, feature-topic pairing, and proposes a novel Particle Swarm Optimization (PSO) based deep learning framework. Specifically, we formulate feature-topic pairing as an automated alignment task between 1) a latent embedding feature space and 2) a textual semantic topic space. We decompose the alignment of the two spaces into: 1) point-wise alignment, denoting the correlation between a topic distribution and an embedding vector; and 2) pair-wise alignment, denoting the consistency between a feature-feature similarity matrix and a topic-topic similarity matrix. We design a PSO based solver to simultaneously select an optimal set of topics and learn the corresponding features based on the selected topics. We develop a closed-loop algorithm that iterates between 1) minimizing the losses of representation reconstruction and feature-topic alignment and 2) searching for the best topics. Finally, we present extensive experiments that demonstrate the enhanced performance of our method.





1. Introduction

Spatial representation learning (SRL) refers to exploiting representation learning techniques to learn features of spatial network data. It has been successfully applied in many real-world scenarios, such as transportation networks, power networks, social networks, and water supply networks (Zhang et al., 2018). In reality, many practical applications need to understand not just which features are effective, but also what these effective features stand for. This issue relates to two tasks: 1) deep representation learning; and 2) label generation and matching for latent embedded features. Although there is a rich body of work in SRL, including node embedding, autoencoder, random walk, adversarial learning, and generative learning based methods for spatial data (Wang and Li, 2017; Wang et al., 2018b, a, 2020a), research on unifying the two tasks is still in its early stage.

Figure 1. The motivation of the feature-topic pairing problem: bridging the gap between feature embedding space and topic semantic space in representation learning.

In response, we formulate the problem as a task of feature-topic pairing (Figure 1): during SRL, align a latent embedding feature space, consisting of multiple latent features, with a textual semantic topic space, consisting of multiple topic labels. The basic idea is to teach a machine to extract topic labels from texts, and then pair the labels with the learned features. To that end, we propose a novel deep learning framework to unify feature learning, topic selection, and feature-topic matching. There are three unique challenges in addressing the problem: (1) the Label Generation Challenge: a textual semantic topic space is difficult to construct from unstructured spatial texts; (2) the Measurement Challenge: a sound measurement is needed to evaluate the alignment, i.e., to quantify the matching score, between the topic label space and the embedding feature space; (3) the Optimization Challenge: a deep optimization framework is needed to jointly and simultaneously unify the three tasks of feature learning, topic label selection, and feature-topic pairing.

Figure 2. An overview of AutoFTP. In the framework, we first construct a topic semantic space based on the texts of spatial entities. Then, we initialize an embedding feature space based on the geographical structures of spatial entities. Finally, we employ a PSO-based framework to conduct feature-topic pairing by jointly optimizing representation learning, point-wise alignment, pair-wise alignment, and the downstream task over learning iterations.

To address the three challenges, we develop a new PSO-based framework (named AutoFTP) that encloses the optimization of feature learning, topic selection, and feature-topic pairing in a loop. Specifically, our contributions are: (1) formulating the feature-topic pairing problem to relieve the scarcity of semantic labels; (2) proposing a three-step method for generating candidate topic labels; (3) deriving a feature-topic alignment measurement composed of a point-wise alignment between an embedding feature vector and a categorical topic distribution, and a pair-wise alignment for the consistency of the feature-feature and topic-topic similarity matrices; (4) developing a Particle Swarm Optimization (PSO) based algorithm for unified optimization.

2. Proposed Method

2.1. The Feature-Topic Pairing Problem

The feature-topic pairing problem aims to pair the latent features extracted by representation learning with the explicit topics of the texts of a spatial entity. Formally, given a set of $N$ spatial entities, the $n$-th entity is described by multiple graphs (e.g., a POI-POI distance graph $\mathcal{G}^d$ and a POI-POI mobility graph $\mathcal{G}^m$, defined in Section 2.2) and a topic distribution $\mathbf{t}_n$ extracted from its textual descriptions. Let $\mathbf{e}_n \in \mathbb{R}^K$ be the embedding vector of the $n$-th entity. The objective is to optimize a function that measures the representation loss and the feature-topic alignment:

$$\min_{\mathbf{E}} \; \mathcal{L}(\mathbf{E}) = \mathcal{L}_{rec} + \mathcal{L}_{pt} + \mathcal{L}_{pr},$$

where $\mathbf{E} = \{\mathbf{e}_1, \dots, \mathbf{e}_N\}$ are the embeddings of all spatial entities, $K$ is the number of features of an embedding vector, and the loss terms are defined in Section 2.3.

2.2. Preprocessing

Textual Topic Extraction. We generate candidate topics in three steps. First, we collect the text descriptions of all entities and extract keywords from the texts using the TextRank algorithm (Mihalcea and Tarau, 2004). Second, we leverage a pre-trained word embedding model (He, 2014) to learn the embedding of each keyword. Third, we exploit a Gaussian Mixture Model (GMM) to cluster the keyword embeddings into topics; the clustering model provides a topic label for each keyword.
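The three-step extraction above can be sketched as follows. This is a minimal illustration: the random matrix stands in for the word embeddings of TextRank-extracted keywords, and the topic count is an arbitrary choice for demonstration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in for the embeddings of keywords extracted by TextRank
# (40 keywords, 16-dimensional pre-trained word embeddings).
keyword_embeddings = rng.normal(size=(40, 16))

n_topics = 5  # illustrative candidate-topic count
gmm = GaussianMixture(n_components=n_topics, random_state=0)
gmm.fit(keyword_embeddings)

topic_labels = gmm.predict(keyword_embeddings)            # one topic label per keyword
topic_posteriors = gmm.predict_proba(keyword_embeddings)  # soft topic distribution
```

The GMM's posterior probabilities give each keyword a soft distribution over topics, which is what later feeds the topic vectors of spatial entities.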

Embedding of Spatial Entities. We construct graphs to capture the spatial autocorrelation between spatial entities. Specifically, we describe a spatial entity in terms of its POIs by building two graphs: (i) a POI-POI distance graph, denoted by $\mathcal{G}^d$, where POI categories are nodes and the average distances between POI categories are edge weights; and (ii) a POI-POI mobility graph, denoted by $\mathcal{G}^m$, where nodes are POI categories and edge weights are human mobility connectivity, extracted by the method in (Wang et al., 2018a). We then apply a Graph Auto-Encoder (GAE) (Kipf and Welling, 2016) as the spatial representation learner to learn spatial embeddings over each of the two constructed graphs. Finally, we aggregate the embeddings of the two graphs by averaging, so as to construct the unified spatial embedding of the entity, denoted by $\mathbf{e}_n$.
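The encode-then-average step can be sketched as below. The random symmetric matrices stand in for the two POI graphs, and the spectral "encoder" is a hypothetical placeholder for the GAE the text describes; only the averaging of the two per-graph embeddings mirrors the method directly.

```python
import numpy as np

rng = np.random.default_rng(1)
n_poi_cats, dim = 20, 8  # POI categories as nodes; embedding size (illustrative)

# Random symmetric weighted adjacencies standing in for the
# POI-POI distance graph and the POI-POI mobility graph.
A_dist = rng.random((n_poi_cats, n_poi_cats))
A_dist = (A_dist + A_dist.T) / 2
A_mob = rng.random((n_poi_cats, n_poi_cats))
A_mob = (A_mob + A_mob.T) / 2

def encode(adj, dim):
    """Placeholder encoder: top-`dim` eigenvectors of the adjacency.
    The paper uses a Graph Auto-Encoder (GAE) here instead."""
    _, vecs = np.linalg.eigh(adj)
    return vecs[:, -dim:]

# Aggregate the two per-graph embeddings by averaging, as in the text.
entity_embedding = (encode(A_dist, dim) + encode(A_mob, dim)) / 2
```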

2.3. PSO Based Feature-Topic Pairing

2.3.1 Measuring the Alignment of the Embedding and Semantic Spaces. To pair features with topics, we conduct space alignment from the point-wise and pair-wise perspectives, which consider the alignment of the coordinate systems and of the information contents, respectively. For convenience, we take the $n$-th entity as an example to explain the calculation.

1) Point-wise Alignment Loss: $\mathcal{L}_{pt}$. Intuitively, the embedding features and the corresponding topics should reach a consensus on describing a spatial entity, so the correlation between them should be maximized. Therefore, we first select $K$ values from the topic vector $\mathbf{t}_n$ to form the vector $\hat{\mathbf{t}}_n$, which contains the most representative semantics in the semantic space. Then, we maximize the correlation between $\hat{\mathbf{t}}_n$ and the spatial embedding $\mathbf{e}_n$, which is equivalent to minimizing the negative correlation between the two vectors:

$$\mathcal{L}_{pt} = -\frac{\mathrm{cov}(\mathbf{e}_n, \hat{\mathbf{t}}_n)}{\sigma(\mathbf{e}_n)\,\sigma(\hat{\mathbf{t}}_n)},$$

where $\mathrm{cov}(\cdot)$ denotes the covariance and $\sigma(\cdot)$ denotes the standard deviation.
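Concretely, the point-wise loss is a negative Pearson correlation and can be computed as below; the toy vectors are illustrative stand-ins for a learned embedding and its selected topic values.

```python
import numpy as np

def pointwise_alignment_loss(e, t_hat):
    """Negative Pearson correlation between an embedding vector e and
    the selected topic vector t_hat (more negative = better aligned)."""
    cov = np.cov(e, t_hat)[0, 1]
    return -cov / (e.std(ddof=1) * t_hat.std(ddof=1))

e = np.array([0.2, 0.9, 0.4, 0.7])      # toy embedding vector
t_hat = np.array([0.1, 0.8, 0.5, 0.6])  # toy selected-topic vector
loss = pointwise_alignment_loss(e, t_hat)
```

When the two vectors are perfectly correlated the loss reaches its minimum of -1, matching the intuition that minimizing the negative correlation maximizes the consensus.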

2) Pair-wise Alignment Loss: $\mathcal{L}_{pr}$. On the other hand, the embedding features and the corresponding topics should be consistent in their pair-wise similarities within each space. Therefore, we minimize the difference between the pair-wise similarities of the two spaces. Specifically, we first construct the topic-topic similarity matrix $\mathbf{S}^t$, whose entries are the similarities between any two topics, and the feature-feature similarity matrix $\mathbf{S}^f$, whose entries are the similarities between any two features of the spatial embeddings. We keep the pair-wise consistency between $\mathbf{S}^t$ and $\mathbf{S}^f$ by minimizing the Frobenius norm of their difference:

$$\mathcal{L}_{pr} = \left\| \mathbf{S}^t - \mathbf{S}^f \right\|_F^2.$$
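A numpy sketch of the pair-wise loss follows; cosine similarity is an assumed choice here, since the similarity measure used to build the two matrices is not fixed above.

```python
import numpy as np

def cosine_sim_matrix(X):
    """Row-wise cosine similarity matrix of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

def pairwise_alignment_loss(topics, feats):
    """Squared Frobenius norm of the difference between the
    topic-topic and feature-feature similarity matrices."""
    diff = cosine_sim_matrix(topics) - cosine_sim_matrix(feats)
    return np.linalg.norm(diff, ord="fro") ** 2

rng = np.random.default_rng(2)
topics = rng.normal(size=(5, 12))  # K topics, toy 12-dim descriptions
feats = rng.normal(size=(5, 12))   # K features, toy 12-dim descriptions
loss = pairwise_alignment_loss(topics, feats)
```

The loss is zero exactly when the two similarity structures coincide, which is the pair-wise consistency the text asks for.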
2.3.2 Supervised PSO for Automatic Topic Selection.

To select the best $K$ topics for feature-topic alignment, we formulate the joint task of feature learning, topic selection, and feature-topic pairing as a PSO problem. Specifically, we first randomly initialize a number of particles, where a particle is a binary topic mask (a mask value of 1 indicates "select" and 0 indicates "deselect"); each particle thus selects a subset of topics. For each selected topic subset, a multi-objective deep learning model, whose objective function includes the losses of graph reconstruction, semantic alignment, and the regression estimator of the downstream task, is trained to learn spatial representations. As the downstream task, we use the embeddings of spatial entities (residential communities) to predict their real estate prices; the loss of the regression model is

$$\mathcal{L}_{reg} = \frac{1}{N} \sum_{n=1}^{N} (y_n - \hat{y}_n)^2,$$

where $y_n$ is the gold-standard real estate price and $\hat{y}_n$ is the predicted price. Next, we calculate the fitness of each particle according to the total loss of the deep model:

$$\mathrm{fitness} = \mathcal{L}_{rec} + \mathcal{L}_{pt} + \mathcal{L}_{pr} + \mathcal{L}_{reg}.$$

We then use the fitness to inform all particles how far they are from the best solution, and each particle moves toward the best solution based not only on its own status but also on all particles' movements. After the fitness value converges, PSO identifies the best topic subset, which yields the final, semantically rich embeddings of the spatial entities.
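The selection loop can be sketched as a standard binary PSO. The toy fitness below is a stand-in for retraining the deep model and summing its losses; the particle and velocity updates mirror the mask-based search the text describes.

```python
import numpy as np

rng = np.random.default_rng(3)
n_topics, n_particles, k_select, iters = 12, 10, 4, 30

# Toy "ideal" topic subset; the real fitness is the trained model's total loss.
target = (rng.permutation(n_topics) < k_select).astype(int)

def fitness(mask):
    """Lower is better: distance to the toy target plus a subset-size penalty."""
    return np.sum(mask != target) + abs(int(mask.sum()) - k_select)

pos = (rng.random((n_particles, n_topics)) < 0.5).astype(int)  # binary topic masks
vel = rng.normal(scale=0.1, size=(n_particles, n_topics))
pbest = pos.copy()
pbest_f = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_f.argmin()].copy()

for _ in range(iters):
    r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
    # Velocity update: inertia plus pulls toward personal and global bests.
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    # Binary position update via a sigmoid of the velocity.
    pos = (rng.random(vel.shape) < 1.0 / (1.0 + np.exp(-vel))).astype(int)
    f = np.array([fitness(p) for p in pos])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[pbest_f.argmin()].copy()

# gbest is the best binary topic mask found by the swarm.
```

In the full framework, each fitness evaluation retrains (or fine-tunes) the multi-objective model on the masked topics, so the swarm and the representation learner iterate in a closed loop.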

3. Experimental Results

3.1. Evaluation Task

In this work, we apply the proposed AutoFTP to real estate price prediction as the evaluation task. Specifically, we first apply AutoFTP to learn representations of spatial entities based on their geographical structural information and related text descriptions. Then, we build a deep neural network (DNN) model that predicts the average real estate price of each spatial entity from its representation. We use RMSE, MAE, MAPE, and MSLE as the evaluation metrics.
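The four metrics can be computed as follows (a sketch; the MSLE here uses log1p, assuming non-negative prices, and the toy arrays are illustrative):

```python
import numpy as np

def evaluate(y_true, y_pred):
    """RMSE, MAE, MAPE (percent), and MSLE for a regression task."""
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / y_true)) * 100.0
    msle = np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)
    return rmse, mae, mape, msle

y_true = np.array([100.0, 200.0, 150.0])  # toy prices
y_pred = np.array([110.0, 190.0, 160.0])
rmse, mae, mape, msle = evaluate(y_true, y_pred)
```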

3.2. Data Description

Table 2 shows the statistics of the five data sources used in the experiments. Specifically, the taxi trace data describe the GPS trajectories of taxis in Beijing over three months; the residential region, text, and real estate price data are crawled from external online sources; and the POI information is extracted from a separate source.

Model                  RMSE    MAE     MAPE    MSLE
AutoFTP                18.646  16.192  58.851  0.2267
AttentionWalk          21.418  19.712  68.590  0.2907
ProNE                  21.830  19.929  69.188  0.2949
GatNE                  21.229  19.288  67.043  0.2854
GAE                    21.338  19.676  68.579  0.2894
DeepWalk               23.561  21.987  76.038  0.3321
Node2Vec               22.688  21.084  73.135  0.3152
Struc2Vec              21.589  19.937  69.423  0.2942
AutoFTP (variant 1)    21.965  20.283  70.991  0.2928
AutoFTP (variant 2)    20.509  18.921  66.477  0.2681
AutoFTP (variant 3)    21.014  19.413  67.920  0.2773
AutoFTP (variant 4)    20.211  18.676  65.685  0.2636
Table 1. Overall performance with respect to RMSE, MAE, MAPE, and MSLE (smaller values indicate better performance). The last four rows are ablation variants of AutoFTP, each trained with a subset of the four losses (Section 3.2.1).
Data Source           Property                          Statistics
Taxi Traces           Number of taxis                   13,597
                      Time period                       Apr. - Aug. 2012
Residential Regions   Number of residential regions
                      Time period of transactions       04/2011 - 09/2012
POIs                  Number of POIs                    328,668
                      Number of POI categories          20
Texts                 Number of textual descriptions    2,990
                      Time period                       04/2011 - 09/2012
Real Estate Prices    Number of real estate prices      41,753
                      Time period                       12/2011 - 06/2012
Table 2. Statistics of the Experimental Data

3.2.1. Baseline Algorithms.

We compared our proposed method with seven baseline algorithms: AttentionWalk (Abu-El-Haija et al., 2018), ProNE (Zhang et al., 2019), GatNE (Cen et al., 2019), GAE (Kipf and Welling, 2016), DeepWalk (Perozzi et al., 2014), Node2Vec (Grover and Leskovec, 2016), and Struc2Vec (Ribeiro et al., 2017). In addition, since AutoFTP has four losses, namely the reconstruction loss $\mathcal{L}_{rec}$, the point-wise alignment loss $\mathcal{L}_{pt}$, the pair-wise alignment loss $\mathcal{L}_{pr}$, and the regression loss $\mathcal{L}_{reg}$, we also derive four ablation variants of AutoFTP (variants 1 to 4), each trained with a subset of these losses.

3.3. Overall Performance

Table 1 compares all 12 models. Overall, AutoFTP outperforms the baseline algorithms in terms of RMSE, MAE, MAPE, and MSLE. A possible reason is that, compared with the baselines, AutoFTP not only captures geographical structural information but also preserves the rich semantics of spatial entities. Besides, the regression estimator (the downstream task) of AutoFTP provides a clear learning direction (accuracy) for spatial representation learning. Thus, in the downstream predictive task, the spatial embedding features learned by AutoFTP beat all baselines.

4. Related Work

Graph Representation Learning with Latent Semantics. Graph representation learning refers to techniques that preserve the structural information of a graph in a low-dimensional vector (Wang et al., 2020b, 2016). However, because traditional graph representation learning models are implemented as deep neural networks, the learned embeddings lack interpretability. Recently, to overcome this limitation, researchers have leveraged the texts related to graphs to learn semantically rich representations (Mai et al., 2018; Xiao et al., 2017).

Topic Models in the Spatio-temporal Domain. Topic models aim to automatically cluster words and expression patterns to characterize documents (Xun et al., 2017; Lee and Kang, 2018). Recently, to understand the hidden semantics of spatial entities, many researchers have applied topic models in the spatio-temporal data mining domain (Huang et al., 2020, 2019). In this paper, we employ a pre-trained language model to obtain the embeddings of keywords and utilize a Gaussian Mixture Model to extract topic distributions from those embeddings.

5. Conclusion

We presented AutoFTP, a novel spatial representation learning (SRL) framework. The spatial embeddings produced by traditional SRL models lack semantic meaning. To overcome this limitation, we formulated the feature-topic pairing problem and proposed a novel deep learning framework that unifies representation learning, topic label selection, and feature-topic pairing through a PSO-based optimization algorithm. Extensive experiments demonstrated the effectiveness of AutoFTP in comparison with baseline models. For future work, we plan to extend our approach from geospatial networks to other applications that involve both graphs and texts, such as social media and software code safety.


This research was partially supported by the National Science Foundation (NSF) via the grant numbers: 1755946, 2040950, 2006889, 2045567, 2141095.


  • S. Abu-El-Haija, B. Perozzi, R. Al-Rfou, and A. A. Alemi (2018) Watch your step: learning node embeddings via graph attention. Advances in Neural Information Processing Systems 31, pp. 9180–9190. Cited by: §3.2.1.
  • Y. Cen, X. Zou, J. Zhang, H. Yang, J. Zhou, and J. Tang (2019) Representation learning for attributed multiplex heterogeneous network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1358–1368. Cited by: §3.2.1.
  • A. Grover and J. Leskovec (2016) Node2vec: scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 855–864. Cited by: §3.2.1.
  • H. He (2014) HanLP: Han Language Processing External Links: Link Cited by: §2.2.
  • L. Huang, Y. Wen, W. Guo, X. Zhu, C. Zhou, F. Zhang, and M. Zhu (2020) Mobility pattern analysis of ship trajectories based on semantic transformation and topic model. Ocean Engineering 201, pp. 107092. Cited by: §4.
  • Q. Huang, C. Huang, J. Huang, and H. Fujita (2019) Adaptive resource prefetching with spatial–temporal and topic information for educational cloud storage systems. Knowledge-Based Systems 181, pp. 104791. Cited by: §4.
  • T. N. Kipf and M. Welling (2016) Variational graph auto-encoders. arXiv preprint arXiv:1611.07308. Cited by: §2.2, §3.2.1.
  • H. Lee and P. Kang (2018) Identifying core topics in technology and innovation management studies: a topic model approach. The Journal of Technology Transfer 43 (5), pp. 1291–1317. Cited by: §4.
  • G. Mai, K. Janowicz, and B. Yan (2018) Combining text embedding and knowledge graph embedding techniques for academic search engines. In Semdeep/NLIWoD@ISWC, pp. 77–88. Cited by: §4.
  • R. Mihalcea and P. Tarau (2004) TextRank: bringing order into text. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing. Cited by: §2.2.
  • B. Perozzi, R. Al-Rfou, and S. Skiena (2014) Deepwalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 701–710. Cited by: §3.2.1.
  • L. F. Ribeiro, P. H. Saverese, and D. R. Figueiredo (2017) Struc2vec: learning node representations from structural identity. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 385–394. Cited by: §3.2.1.
  • D. Wang, P. Cui, and W. Zhu (2016) Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1225–1234. Cited by: §4.
  • D. Wang, P. Wang, J. Zhou, L. Sun, B. Du, and Y. Fu (2020a) Defending water treatment networks: exploiting spatio-temporal effects for cyber attack detection. In 2020 IEEE International Conference on Data Mining (ICDM), pp. 32–41. Cited by: §1.
  • H. Wang and Z. Li (2017) Region representation learning via mobility flow. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 237–246. Cited by: §1.
  • P. Wang, Y. Fu, J. Zhang, X. Li, and D. Lin (2018a) Learning urban community structures: a collective embedding perspective with periodic spatial-temporal mobility graphs. ACM Transactions on Intelligent Systems and Technology (TIST) 9 (6), pp. 1–28. Cited by: §1, §2.2.
  • P. Wang, Y. Fu, J. Zhang, P. Wang, Y. Zheng, and C. Aggarwal (2018b) You are how you drive: peer and temporal-aware representation learning for driving behavior analysis. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2457–2466. Cited by: §1.
  • P. Wang, Y. Fu, Y. Zhou, K. Liu, X. Li, and K. A. Hua (2020b) Exploiting mutual information for substructure-aware graph representation learning.. In IJCAI, pp. 3415–3421. Cited by: §4.
  • H. Xiao, M. Huang, L. Meng, and X. Zhu (2017) SSP: semantic space projection for knowledge graph embedding with text descriptions. In Thirty-First AAAI Conference on Artificial Intelligence. Cited by: §4.
  • G. Xun, Y. Li, W. X. Zhao, J. Gao, and A. Zhang (2017) A correlated topic model using word embeddings.. In IJCAI, pp. 4207–4213. Cited by: §4.
  • D. Zhang, J. Yin, X. Zhu, and C. Zhang (2018) Network representation learning: a survey. IEEE transactions on Big Data 6 (1), pp. 3–28. Cited by: §1.
  • J. Zhang, Y. Dong, Y. Wang, J. Tang, and M. Ding (2019) ProNE: fast and scalable network representation learning.. In IJCAI, Vol. 19, pp. 4278–4284. Cited by: §3.2.1.