Representation Learning for Spatial Graphs

12/17/2018 · Zheng Wang et al. · Nanyang Technological University; Baidu, Inc.

Recently, graph representation learning has received considerable attention. Existing approaches usually focus on structural properties alone, so they are insufficient for spatial graphs, whose nodes are associated with spatial information. In this paper, we present s2vec, the first deep learning approach for learning spatial graph representations, built on the denoising autoencoder framework (DAF). We evaluate the learned representations on real datasets, and the results verify the effectiveness of s2vec when used for spatial clustering.




I. Introduction

Spatial graphs are graphs in which each node is associated with a location [1]. One well-known instance of spatial graphs is location-based social networks (LBSNs) [2, 3, 4], where the users in a social network, corresponding to nodes, are associated with location information, e.g., online "check-ins" in Brightkite, geo-tagged tweets in Twitter, and geo-tagged photo albums in Foursquare. Other instances of spatial graphs are commonly used in ecology and evolution [1].

We consider the problem of spatial subgraph embedding based on deep representation learning. Once the embedding vectors of the graphs are computed, they can be used for a variety of downstream graph analysis tasks, including spatial clustering, outlier detection [4], and spatial classification. For instance, in social network analysis, community search or detection algorithms yield subgraph-structured communities. Clustering these communities is very beneficial for practical applications such as social marketing: as studied in [5], people with close social relationships tend to purchase in places that are also physically close. Advertisers can target limited resources at communities with similar spatial and structural characteristics to boost sales and maximize revenue.

Existing graph representation learning methods can be categorized into node representation [6, 7, 8, 9] and subgraph representation [10, 11, 12, 13]. However, almost all of these prior works focus on structural analysis and do not consider spatial features. Alternative approaches for spatial graph similarity search include graph kernels [14] and graph matching [15], which, however, cannot scale to large graphs due to their high computational cost. Additionally, these approaches do not consider the global structure of the graphs and are sensitive to noise.

In this paper, we propose s2vec, an unsupervised spatial-graph-to-vector approach for learning representations of spatial subgraphs based on an LSTM denoising autoencoder framework (DAF). Specifically, we first sample a set of paths on each subgraph via random walks. Then, inspired by [16], we propose a spatial-information-aware loss function based on the negative log likelihood, which captures similarity based on spatial proximity. Finally, we demonstrate the effectiveness of our approach by comparing our learned representations with several baselines on a spatial clustering task. To the best of our knowledge, s2vec is the first method that supports spatial subgraph representation, making it widely applicable to various downstream graph analysis tasks.

II. Related Work

Our work is closely related to the literature on representation learning for graphs. Traditional methods in this field represent a graph as an associated matrix (e.g., the adjacency matrix) or as a collection of nodes and edges. In recent years, inspired by the success of word2vec [17], modern learning methods such as DeepWalk [6], node2vec [7], LINE [8], and DeepCas [9] embed nodes into vectors in a continuous space so that nodes with similar representation vectors share similar structural properties.

Another line of related work concerns the representation of subgraph structures [10, 11, 12, 13]. Many of these approaches are inspired by the huge success of representation learning and deep neural networks in various domains. For example, subgraph2vec [11] borrows its key idea from document embedding methods. The biggest difference between our method and the above works is that their models are designed without considering spatial information and cannot be directly used for spatial graph embedding.

In addition, new graph kernels such as the Weisfeiler-Lehman subtree kernel (WL) [14] and the Deep Graph Kernel (DGK) [12] have been proposed to characterize the similarity of different network structures. Other works, motivated by representation learning for images, embed graphs using convolutional neural networks (CNNs) [18, 10].

III. Definitions and Preliminaries

In this section, we present the definitions and preliminaries necessary to understand the problem and the denoising autoencoder model used in our solution.

III-A. Definitions

Definition 1 (Spatial Graph)

Let G = (V, E) be a graph, where V denotes its vertex set and E denotes its edge set. G is called a spatial graph if each vertex v ∈ V corresponds to a tuple (id_v, loc_v), where id_v is the identification number of v and loc_v is the location of v.

Definition 2 (Gromov-Hausdorff Distance)

Given two spatial graphs G_1 = (V_1, E_1) and G_2 = (V_2, E_2), the one-sided graph distance from G_1 to G_2 is defined as:

d(G_1 → G_2) = max_{u ∈ V_1} min_{v ∈ V_2} dist(loc_u, loc_v),

where dist is the distance in Euclidean space. The bidirectional graph distance between G_1 and G_2 is defined as

d_GH(G_1, G_2) = max{ d(G_1 → G_2), d(G_2 → G_1) }.

d_GH is then called the Gromov-Hausdorff distance of spatial graphs.
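As an illustration, both distances in Definition 2 can be computed directly over the node location sets of the two graphs. The sketch below uses illustrative helper names, not code from the paper:

```python
# Minimal sketch of the one-sided and bidirectional (Hausdorff-style) graph
# distances from Definition 2, treating each spatial graph as the set of its
# node locations. Helper names are illustrative.
import math

def one_sided(locs_a, locs_b):
    """Max over nodes of A of the distance to the nearest node of B."""
    return max(min(math.dist(a, b) for b in locs_b) for a in locs_a)

def graph_distance(locs_a, locs_b):
    """Bidirectional distance: the larger of the two one-sided distances."""
    return max(one_sided(locs_a, locs_b), one_sided(locs_b, locs_a))

g1 = [(0.0, 0.0), (1.0, 0.0)]
g2 = [(0.0, 0.0), (1.0, 1.0)]
print(graph_distance(g1, g2))  # 1.0
```

Note that the one-sided distance is not symmetric, which is why the bidirectional version takes the maximum of both directions.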

III-B. Problem Statement

Given a set of spatial subgraphs G = {g_1, ..., g_n}, we aim to compute an embedding vector for each subgraph such that if two subgraphs are spatially similar w.r.t. the Gromov-Hausdorff distance, then they are also similar w.r.t. the Euclidean distance between their embedding vectors. Specifically, for a small enough ε > 0 there exists δ > 0 such that for any spatial subgraphs g_i and g_j,

d_GH(g_i, g_j) ≤ ε ⟹ ||f(g_i) − f(g_j)|| ≤ δ,

where d_GH is the Gromov-Hausdorff distance, ||·|| is the Euclidean distance, and f is our embedding algorithm. Suppose d is the embedding dimension; the matrix of representations of the subgraph set is denoted as V ∈ R^{n×d} in the following sections. Note that we do not aim to preserve the structures of these subgraphs and assume they are similar; for example, they may all be k-core subgraphs obtained by community search algorithms.

III-C. Preliminaries of the Denoising Autoencoder Framework

We briefly introduce the denoising autoencoder framework, proposed by Vincent et al. [19]. The principle behind it is to reconstruct data from a corrupted or noisy input. This forces the hidden layer to discover more robust features and prevents it from simply learning the identity mapping, so the output becomes a more refined version of the input data. Precisely, the denoising autoencoder framework has the following steps, as shown in Figure 1.

  • Corruption: Corrupt the initial input x by sampling from a conditional distribution q_D(x̃ | x), which stochastically maps the input to a corrupted version x̃.

  • Encoder: Map the corrupted input x̃ to a latent representation y = f_θ(x̃).

  • Decoder: Reconstruct the data z = g_θ'(y) from the latent representation y.

  • Loss: Compute the average reconstruction error L(x, z) over a training set.

The objective function of the denoising autoencoder framework is

θ*, θ'* = argmin_{θ, θ'} E_{q⁰(x, x̃)} [ L(x, g_θ'(f_θ(x̃))) ],

where L is the reconstruction loss and q⁰(x, x̃) denotes the empirical joint distribution associated with the training inputs. By minimizing this objective with stochastic gradient descent, the denoising autoencoder framework learns to reconstruct the uncorrupted data from the corrupted one.

Fig. 1: Denoising Autoencoders Model
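The four steps above can be sketched end to end with a tiny linear stand-in for the encoder and decoder (the paper itself uses LSTMs; the data, dimensions, and learning rate here are illustrative):

```python
# Minimal denoising-autoencoder sketch: corrupt, encode, decode, and
# minimize the reconstruction error by gradient descent. Linear maps stand
# in for f_theta and g_theta'; all sizes and constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                 # clean training inputs x
W_enc = rng.normal(scale=0.1, size=(8, 4))    # encoder parameters (f)
W_dec = rng.normal(scale=0.1, size=(4, 8))    # decoder parameters (g)

lr, sigma = 0.01, 0.1
for _ in range(2000):
    X_tilde = X + sigma * rng.normal(size=X.shape)  # corruption q_D(x~|x)
    Y = X_tilde @ W_enc                             # latent representation y
    Z = Y @ W_dec                                   # reconstruction z
    dZ = 2.0 * (Z - X) / len(X)                     # grad of squared error
    W_dec -= lr * Y.T @ dZ                          # SGD updates
    W_enc -= lr * X_tilde.T @ (dZ @ W_dec.T)

loss = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
print(f"reconstruction MSE: {loss:.3f}")
```

Because the latent space (4 dimensions) is narrower than the input (8 dimensions) and the input is corrupted, the model cannot learn the identity and must keep the most robust directions of the data.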

IV. Method

Our proposed method is inspired by the denoising autoencoder framework, sequence-to-sequence (Seq2Seq) techniques, and spatial proximity computation.

IV-A. Random Walk Sampling

In this subsection, we introduce the path sampling method by which we sample a group of paths on each subgraph g. It is simplified from a popular graph sampling method called Random Walk with Jump [9]; the simplification ensures that we sample a set of connected paths. We start the walk from a node picked uniformly at random from the whole subgraph. At each step, with a given probability p we walk to one of the adjacent nodes, selected uniformly at random, i.e., the transition probability from the current node u to an adjacent node v is

Pr(u → v) = p / (|N(u)| + α),

where N(u) denotes the set of u's outgoing neighbors and α is a smoothing term. Otherwise, we stay at the current node with probability 1 − p. The walk stops after enough nodes have been visited. By this construction, each path of the subgraph can be represented as a sequence of visited locations, which can also be regarded as a user's visiting order, i.e., a spatial sequence. In the following sections, for each subgraph g, P_g denotes the set containing all sampled paths on g.
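The sampling procedure above can be sketched as follows; the adjacency structure, stay/move rule, and parameter names are illustrative simplifications (the smoothing term is omitted for brevity):

```python
# Sketch of the simplified random walk: from the current node, move to a
# uniformly chosen neighbor with probability p, otherwise stay, until
# `length` nodes have been visited. Graph and parameters are illustrative.
import random

def sample_path(adj, start, length, p=0.8, seed=0):
    rng = random.Random(seed)
    path, current = [start], start
    while len(path) < length:
        if adj[current] and rng.random() < p:
            current = rng.choice(adj[current])  # uniform over neighbors
        path.append(current)                    # staying repeats the node
    return path

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(sample_path(adj, start=0, length=6))
```

Because moves only follow edges of the subgraph, every sampled path is connected, which is exactly what the simplification is for.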

IV-B. The Latent Representation of a Subgraph

For each subgraph g, we construct a corrupted subgraph g̃ by adding random noise ε·N(0, I) to each node of the initial subgraph g, where ε is the magnitude of the noise and N(0, I) is the standard normal distribution. Thus, the Gromov-Hausdorff distance between the initial subgraph g and its corresponding corrupted subgraph g̃ is bounded by the magnitude ε; this can be proved directly from the definition. We also construct the corrupted path set of g̃: each of its paths differs from the corresponding initial path only by the Gaussian noise displacement on each node.

In the sequence encoder-decoder model, we need to feed discrete tokens into the model, so we partition the spatial space into cells of equal size. Each node on a path is assigned the token of the cell it falls into. We only keep cells that are hit by sufficiently many sample points; these are referred to as hot cells. Each path is thus mapped to a token sequence, and the set of token sequences of a subgraph serves as the model input.
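The two preprocessing steps above can be sketched as follows; the cell size, grid width, and noise scale are illustrative choices, not values from the paper:

```python
# Sketch of corruption and tokenization: add Gaussian noise to each node
# location, then map locations to discrete cell tokens on a uniform grid.
import random

def corrupt(path, eps=0.05, seed=0):
    """Displace each (x, y) location by eps-scaled Gaussian noise."""
    rng = random.Random(seed)
    return [(x + eps * rng.gauss(0, 1), y + eps * rng.gauss(0, 1))
            for x, y in path]

def tokenize(path, cell=1.0, width=100):
    """Token = row-major index of the grid cell containing the point."""
    return [int(y // cell) * width + int(x // cell) for x, y in path]

path = [(0.2, 0.3), (1.4, 0.1), (2.7, 1.9)]
print(tokenize(path))           # [0, 1, 102]
print(tokenize(corrupt(path)))  # usually the same or adjacent cells
```

A real implementation would additionally filter the vocabulary down to hot cells, i.e., tokens that occur often enough in the training data.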

Following the denoising autoencoder framework, we pick an LSTM as the encoding function f. The time-varying hidden vector is then

h_t = LSTM_f(h_{t−1}, x̃_t),

where x̃_t is the corrupted token at timestamp t. We use another LSTM as the decoding function g. The objective function of our algorithm is inspired by the classic negative log likelihood loss [20]. The reconstruction error in our loss function is specifically defined as the spatial-proximity-aware loss of [16],

L_t = − Σ_u w_{uv} log ( exp(W_u^T h_t) / Σ_c exp(W_c^T h_t) ),

where the coefficient w_{uv} is the spatial proximity weight on cell u when decoding target cell v, W_u is the row of the projection matrix for cell u that projects from the hidden state space into the token space, and D(u, v) denotes the Euclidean distance between the centroid coordinates of the cells, on which w_{uv} depends.
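The loss above can be sketched numerically; here the proximity weights are taken as a normalized exp(−D(u, v)/θ), following the trajectory-similarity loss of [16], and all array shapes and names are illustrative:

```python
# Sketch of a spatial-proximity-aware negative log likelihood: instead of
# rewarding only the exact target cell, every cell u is weighted by its
# spatial closeness to the target v via exp(-D(u, v) / theta).
import numpy as np

def spatial_nll(h, W, centroids, target, theta=1.0):
    logits = W @ h                                   # one logit per hot cell
    # numerically stable log-softmax over cells
    m = logits.max()
    logp = logits - m - np.log(np.exp(logits - m).sum())
    D = np.linalg.norm(centroids - centroids[target], axis=1)
    w = np.exp(-D / theta)
    w /= w.sum()                                     # proximity weights over cells
    return float(-(w * logp).sum())

rng = np.random.default_rng(0)
h = rng.normal(size=4)                               # decoder hidden state
W = rng.normal(size=(3, 4))                          # projection matrix, 3 cells
centroids = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
print(spatial_nll(h, W, centroids, target=0))
```

Cells near the target centroid receive large weights, so predicting a spatially adjacent cell is penalized far less than predicting a distant one.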

Eventually, the graph embedding function f is defined as the average of the latent representations of the sampled paths of each subgraph g,

f(g) = (1 / |P_g|) Σ_{p ∈ P_g} y_p,

where P_g is the set of sampled paths of g and y_p is the latent representation of path p. Thus, our desired subgraph representation matrix is

V ∈ R^{n×d},

where d is the embedding dimension.
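The final aggregation step can be sketched as follows; path vectors here are random placeholders standing in for the LSTM latent states:

```python
# Sketch of the final step: each subgraph's embedding is the mean of the
# latent vectors of its sampled paths; stacking them gives the n x d
# representation matrix V. Data is an illustrative placeholder.
import numpy as np

def embed_subgraphs(path_vectors_per_subgraph):
    """Average each subgraph's path vectors, then stack into a matrix."""
    return np.stack([np.mean(vs, axis=0) for vs in path_vectors_per_subgraph])

rng = np.random.default_rng(0)
subgraphs = [rng.normal(size=(5, 3)), rng.normal(size=(7, 3))]  # paths x d
V = embed_subgraphs(subgraphs)
print(V.shape)  # (2, 3)
```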

IV-C. Algorithm Overview

Algorithm 1 presents the framework. The input includes a set of spatial subgraphs, the embedding size, a learning model with encoding function f and decoding function g together with its global parameters, and the learning rate (lines 1-4). During the iterative training process (lines 5-11), we first sample a set of vertex paths from the set of subgraphs (line 6). Similar to DeepWalk [6], the sampling process can be generalized as performing a random walk. Then, we stochastically map the original input to a noisy version by adding Gaussian noise according to the conditional distribution (line 7) and tokenize both versions (line 8). Next, we compute the reconstruction with the decoder component of the model, and an optimizer such as stochastic gradient descent (SGD) is used to update the parameters (lines 9-10). Finally, a vector representation of each subgraph is computed via the learned model (lines 12-15). Note that the ultimate output is a matrix of vector representations of the spatial subgraphs.

Algorithm 1: Algorithm framework
1: Input: a set of spatial graphs to be embedded
2: Input: the embedding size
3: Input: a learning model with encoder f and decoder g
4: Input: the learning rate
5: repeat
6:      sample a set of paths by random walks on the subgraphs
7:      get the noisy version of the sampled paths
8:      tokenize the original and noisy paths
9:      compute the reconstruction with the decoder
10:      update the model parameters with SGD
11: until no improvement on the validation set
12: for each subgraph do
13:      get its representation from the learned encoder
14: end for
15: Output: the matrix of vector representations of the spatial graphs

V. Experiments

In this section, we use our representations for the task of spatial graph clustering on four real datasets and show the effectiveness of our model s2vec compared against several baseline approaches.

V-A. Datasets

The experiments are conducted on four real datasets: Brightkite, Gowalla, Flickr, and Foursquare. In all datasets, each vertex represents a user and each link represents a friendship between two users. The statistics of each dataset are shown in Table I, including the average degree, the average betweenness centrality (BC), and the average clustering coefficient (CC). We randomly select 200 subgraphs from each dataset among those found by a community search algorithm [4].

Type  Name        Vertices    Edges      Avg. Deg.  BC       CC
Real  Brightkite  51,406      197,167    7.67       7.34E-5  0.1795
Real  Gowalla     107,092     456,830    8.53       3.40E-5  0.2487
Real  Flickr      214,698     2,096,306  19.5       1.65E-5  0.1113
Real  Foursquare  2,127,093   8,640,352  8.12       1.68E-6  0.1044
TABLE I: Dataset statistics.

V-B. Baselines

We compare s2vec with two baseline graph representations. The first baseline is based on Principal Component Analysis (PCA) [21]: we arrange the spatial features of the vertices of each subgraph in order of longitude and latitude and expand them into a high-dimensional vector; vectors of inconsistent length are padded with a special value such as zero. We then perform PCA dimensionality reduction on all subgraphs to obtain the vector representations. Note that this baseline treats the graph as a set of spatial vertices without structural features.

The second baseline is a kind of ablation of s2vec that adapts the sequence-to-sequence learning framework for autoencoding by using the same sequence for both the input and the output [13]. We denote these two baselines as PCA and Vanilla-s2vec (V-s2vec), respectively.

V-C. Spatial Graph Clustering

Given a set of spatial graphs, the goal of spatial graph clustering is to group graphs with similar structural and spatial features together. s2vec's representation vectors can be used for this purpose with conventional clustering algorithms, such as DBSCAN with the Euclidean distance.

The lack of ground truth makes it challenging to evaluate the effectiveness of spatial graph clustering. To overcome this, we measure the similarity of spatial graphs with the Gromov-Hausdorff distance and run the DBSCAN algorithm on these distances to produce the ground-truth clustering.

Evaluation Metric. To quantitatively measure clustering accuracy, we use a standard clustering evaluation metric, the Adjusted Rand Index (ARI) [22]. ARI values lie in the range [-1, 1]; a higher ARI means a higher correspondence to the ground-truth results. We convert it into a percentage value for easy understanding.
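The ARI can be computed directly from the contingency table of two labelings; the sketch below implements the standard adjusted formula from [22] with illustrative variable names:

```python
# Sketch of the Adjusted Rand Index: the Rand index corrected for chance
# via the expected index under random labelings with the same cluster sizes.
from math import comb
from collections import Counter

def ari(labels_true, labels_pred):
    pairs = Counter(zip(labels_true, labels_pred))  # contingency table
    a = Counter(labels_true)                        # row sums
    b = Counter(labels_pred)                        # column sums
    n = len(labels_true)
    idx = sum(comb(c, 2) for c in pairs.values())
    ra = sum(comb(c, 2) for c in a.values())
    cb = sum(comb(c, 2) for c in b.values())
    expected = ra * cb / comb(n, 2)
    max_idx = (ra + cb) / 2
    return (idx - expected) / (max_idx - expected)

print(ari([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: identical up to relabeling
```

ARI is invariant to label permutations, which is why the two reversed labelings above still score 1.0.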

The results of spatial graph clustering using s2vec and the baselines are presented in Table II. We observe that s2vec significantly outperforms all compared approaches; in particular, it outperforms the PCA baseline by at least 10%. This is because s2vec explicitly maintains similarity in the original space.

Dataset   Brightkite  Gowalla  Flickr  Foursquare
PCA       49.76       49.45    49.99   48.85
V-s2vec   57.14       59.81    69.59   54.60
s2vec     62.49       63.07    73.64   63.61
TABLE II: Spatial graph clustering results (ARI, %).

VI. Conclusion and Future Work

In this paper, we learned latent representations of spatial subgraphs within the denoising autoencoder framework and applied them to the spatial graph clustering task, achieving effective results. To the best of our knowledge, this is the first work to pose the problem of spatial graph representation, which has great potential application value. In the future, we will consider more complex graph structures with more information on their nodes.


  • [1] M. Dale and M.-J. Fortin, “From graphs to spatial graphs,” Annual Review of Ecology, Evolution, and Systematics, vol. 41, pp. 21–38, 2010.
  • [2] J. Shi, N. Mamoulis, D. Wu, and D. W. Cheung, “Density-based place clustering in geo-social networks,” in Proceedings of the 2014 ACM SIGMOD international conference on Management of data.   ACM, 2014, pp. 99–110.
  • [3] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3-5, pp. 75–174, 2010.
  • [4] Y. Fang, Z. Wang, R. Cheng, X. Li, S. Luo, J. Hu, and X. Chen, “On spatial-aware community search,” IEEE Transactions on Knowledge and Data Engineering, 2018.
  • [5] P. Manchanda, G. Packard, and A. Pattabhiramaiah, “Social dollars: The economic impact of customer participation in a firm-sponsored online customer community,” Marketing Science, vol. 34, no. 3, pp. 367–387, 2015.
  • [6] B. Perozzi, R. Al-Rfou, and S. Skiena, “Deepwalk: Online learning of social representations,” in Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining.   ACM, 2014, pp. 701–710.
  • [7] A. Grover and J. Leskovec, “node2vec: Scalable feature learning for networks,” in Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining.   ACM, 2016, pp. 855–864.
  • [8] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, “Line: Large-scale information network embedding,” in Proceedings of the 24th International Conference on World Wide Web.   International World Wide Web Conferences Steering Committee, 2015, pp. 1067–1077.
  • [9] C. Li, J. Ma, X. Guo, and Q. Mei, “Deepcas: An end-to-end predictor of information cascades,” in Proceedings of the 26th International Conference on World Wide Web.   International World Wide Web Conferences Steering Committee, 2017, pp. 577–586.
  • [10] M. Niepert, M. Ahmed, and K. Kutzkov, “Learning convolutional neural networks for graphs,” in International Conference on Machine Learning, 2016, pp. 2014–2023.
  • [11] A. Narayanan, M. Chandramohan, L. Chen, Y. Liu, and S. Saminathan, “subgraph2vec: Learning distributed representations of rooted sub-graphs from large graphs,” arXiv preprint arXiv:1606.08928, 2016.
  • [12] P. Yanardag and S. Vishwanathan, “Deep graph kernels,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.   ACM, 2015, pp. 1365–1374.
  • [13] A. Taheri, “Learning graph representations with recurrent neural network autoencoders,” 2018.

  • [14] C. Tan, L. Lee, and B. Pang, “The effect of wording on message propagation: Topic-and author-controlled natural experiments on twitter,” arXiv preprint arXiv:1405.1438, 2014.
  • [15] M. Aanjaneya, F. Chazal, D. Chen, M. Glisse, L. Guibas, and D. Morozov, “Metric graph reconstruction from noisy data,” International Journal of Computational Geometry & Applications, vol. 22, no. 04, pp. 305–325, 2012.
  • [16] X. Li, K. Zhao, G. Cong, C. S. Jensen, and W. Wei, “Deep representation learning for trajectory similarity computation,” in 2018 IEEE 34th International Conference on Data Engineering (ICDE).   IEEE, 2018, pp. 617–628.
  • [17] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.
  • [18] M. Zhang, Z. Cui, M. Neumann, and Y. Chen, “An end-to-end deep learning architecture for graph classification,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
  • [19] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting and composing robust features with denoising autoencoders,” in Proceedings of the 25th international conference on Machine learning.   ACM, 2008, pp. 1096–1103.
  • [20] J. Platt et al., “Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods,” Advances in Large Margin Classifiers, vol. 10, no. 3, pp. 61–74, 1999.
  • [21] S. Wold, K. Esbensen, and P. Geladi, “Principal component analysis,” Chemometrics and intelligent laboratory systems, vol. 2, no. 1-3, pp. 37–52, 1987.
  • [22] W. M. Rand, “Objective criteria for the evaluation of clustering methods,” Journal of the American Statistical association, vol. 66, no. 336, pp. 846–850, 1971.