I. Introduction
Spatial graphs are graphs where each node is associated with a location [1]. One well-known instance of spatial graphs is location-based social networks (LBSNs) [2, 3, 4], where the users in social networks, which correspond to nodes, are associated with location information, e.g., online "check-ins" in Brightkite, geo-tagged tweets in Twitter, and geo-tagged photo albums in Foursquare. Other instances of spatial graphs are commonly used in ecology and evolution [1].
We consider the problem of spatial subgraph embedding based on deep representation learning. Once the embedding vectors of the graphs are computed, they can be used for a variety of downstream graph analysis tasks, including spatial clustering [2, 3, 4] and spatial classification. For instance, in social network analysis, we often obtain subgraph-structured communities by running community search or detection algorithms. Clustering these communities is very beneficial for practical applications such as social marketing: as studied in [5], people with close social relationships tend to purchase in places that are also physically close, so advertisers can target limited resources to communities with similar spatial and structural characteristics to boost sales and maximize revenue.

Existing graph representation learning methods can be categorized into node representations [6, 7, 8, 9] and subgraph representations [10, 11, 12, 13]. However, almost all of these prior works focus on structural analysis and do not consider spatial features. Some alternative approaches for spatial graph similarity search include graph kernels [14] and graph matching [15], which, however, cannot scale up to large graphs due to their high computational cost. Additionally, these approaches do not consider the global structure of the graphs and are sensitive to noise.
In this paper, we propose an unsupervised spatial-graph-to-vector approach called s2vec for learning representations of spatial subgraphs based on the LSTM denoising autoencoder framework (DAF). Specifically, we first sample a set of paths on each subgraph via random walks. Then, inspired by [16], we propose a spatial-information-aware loss function based on the negative log-likelihood, which captures similarity based on spatial proximity. Finally, we demonstrate the effectiveness of our approach by comparing our learned representations with those of several baselines on a spatial clustering task. To the best of our knowledge, s2vec is the first method that supports spatial subgraph representation, making it widely applicable to various downstream graph analysis tasks.
II. Related Work
Our work is closely related to the literature on representation learning for graphs. Traditional methods in this field represent a graph as an associated matrix (e.g., the adjacency matrix) or a collection of nodes and edges. In recent years, inspired by the success of word2vec [17], modern learning methods such as DeepWalk [6], node2vec [7], LINE [8], and DeepCas [9] embed nodes into low-dimensional vectors in a continuous space, so that nodes with similar representation vectors share similar structural properties.
Another line of related work concerns the representation of subgraph structures [10, 11, 12, 13]. Many of these approaches are inspired by the huge success of representation learning and deep neural networks in various domains. For example, subgraph2vec [11] borrows its key idea from document embedding methods. The biggest difference between our method and the above works is that their model structures are designed without considering spatial information and cannot be directly used for spatial graph embedding. In addition, new graph kernels such as the Weisfeiler-Lehman subtree kernel (WL) [14] and the Deep Graph Kernel (DGK) [12] have been proposed to characterize the similarity of different network structures. Others are motivated by representation learning of images and embed graphs using convolutional neural networks (CNNs) [18, 10].

III. Definitions and Preliminaries
In this section, we present the definitions and preliminaries necessary to understand the problem we solve and the denoising autoencoder model used in our solution.
III-A. Definitions
Definition 1 (Spatial Graph)
Let $G = (V, E)$ be a graph, where $V$ denotes its vertex set and $E$ denotes its edge set. $G$ is called a spatial graph if each vertex $v \in V$ corresponds to a tuple $(v.id, v.loc)$, where $v.id$ is the identification number of $v$ and $v.loc$ is the location of $v$.
Definition 2 (GromovHausdorff Distance)
Given two spatial graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$, the one-sided graph distance from $G_1$ to $G_2$ is defined as:

$$\vec{d}(G_1, G_2) = \max_{u \in V_1} \min_{v \in V_2} \rho(u.loc, v.loc),$$

where $\rho$ is the distance on Euclidean space. The bidirectional graph distance between $G_1$ and $G_2$ is defined as

$$d_{GH}(G_1, G_2) = \max\{\vec{d}(G_1, G_2), \vec{d}(G_2, G_1)\}.$$

$d_{GH}$ is then called the Gromov-Hausdorff distance of spatial graphs.
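As a concrete illustration, both distances above can be computed directly from the node locations of two spatial graphs. The following is our own minimal sketch (the function names `one_sided` and `gromov_hausdorff` are ours, not from the paper), treating each graph as the set of its node locations:

```python
import math

def one_sided(locs_a, locs_b):
    """One-sided distance: for every node of A, find its nearest node of B,
    then take the worst (largest) of these nearest-neighbour distances."""
    return max(min(math.dist(u, v) for v in locs_b) for u in locs_a)

def gromov_hausdorff(locs_a, locs_b):
    """Bidirectional distance: the larger of the two one-sided distances."""
    return max(one_sided(locs_a, locs_b), one_sided(locs_b, locs_a))
```

Note that this variant compares node locations only and ignores edge structure, consistent with the later assumption that the compared subgraphs are structurally similar.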
III-B. Problem Statement
Given a set of spatial subgraphs $\mathcal{G} = \{g_1, \ldots, g_n\}$, we aim to compute an embedding vector of each subgraph such that if two subgraphs are spatially similar w.r.t. the Gromov-Hausdorff distance, then they are also similar w.r.t. the Euclidean distance between their embedding vectors. Specifically, given small enough $\epsilon > 0$ and $\delta > 0$, for any spatial subgraphs $g_i$ and $g_j$,

$$d_{GH}(g_i, g_j) \le \epsilon \implies \|f(g_i) - f(g_j)\|_2 \le \delta,$$

where $d_{GH}$ is the Gromov-Hausdorff distance, $\|\cdot\|_2$ is the Euclidean distance, and the function $f$ is our embedding algorithm. Suppose $d$ is the embedding dimension; the matrix of representations of the subgraph set is denoted as $X \in \mathbb{R}^{n \times d}$ in the following sections. Note that we do not preserve the structures of these subgraphs and assume they are similar; for example, they are all core subgraphs obtained by community search algorithms.
III-C. Preliminaries of the Denoising Autoencoder Framework
We briefly introduce the denoising autoencoder framework, which was proposed by Vincent et al. [19]. The principle behind it is to reconstruct data from a corrupted or noisy input, which forces the hidden layer to discover more robust features and prevents it from simply learning the identity; the output is thus a more refined version of the input data. Precisely, the denoising autoencoder framework has the following steps, as shown in Figure 1.

Distribution: Corrupt the initial input $x$ by sampling from a conditional distribution $q(\tilde{x} \mid x)$, which stochastically maps the input data $x$ to a corrupted version $\tilde{x}$.

Encoder: Map the corrupted input $\tilde{x}$ to a latent representation $y = f_\theta(\tilde{x})$.

Decoder: Reconstruct the data $z = g_{\theta'}(y)$ from the latent representation $y$.

Loss: Compute the average reconstruction error $L(x, z)$ over a training set.
The objective function of the denoising autoencoder framework is

$$\theta^*, \theta'^* = \arg\min_{\theta, \theta'} \mathbb{E}_{q^0(x, \tilde{x})}\left[ L\big(x, g_{\theta'}(f_\theta(\tilde{x}))\big) \right],$$

where $q^0(x, \tilde{x})$ denotes the empirical joint distribution associated with our training inputs. By minimizing the objective function with stochastic gradient descent, the denoising autoencoder framework reconstructs the uncorrupted data from the corrupted one.
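To make the four steps concrete, here is a minimal numerical sketch of a denoising autoencoder (our own toy example, not the paper's model): a one-layer linear autoencoder with tied weights, trained by gradient descent to reconstruct clean data from Gaussian-corrupted inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 points lying on a 2-D subspace of R^8.
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 8))

# One hidden layer with tied weights W: encode with W, decode with W.T.
W = rng.normal(scale=0.1, size=(8, 3))
lr, sigma = 0.005, 0.1

losses = []
for step in range(500):
    Xc = X + sigma * rng.normal(size=X.shape)   # step 1: corrupt the input
    H = Xc @ W                                  # step 2: encode
    Z = H @ W.T                                 # step 3: decode
    E = Z - X                                   # step 4: error vs. the CLEAN data
    losses.append(float((E ** 2).mean()))
    grad = (2 / len(X)) * (Xc.T @ E @ W + E.T @ Xc @ W)
    W -= lr * grad                              # gradient descent update
```

The key point the sketch shows is that the loss compares the reconstruction against the clean input $x$, not the corrupted $\tilde{x}$, which is what prevents the model from learning the identity.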
IV. Method
Our proposed method for solving this problem is inspired by the denoising autoencoder framework, Seq2Seq techniques, and spatial proximity computation.
IV-A. Random Walk Sampling
In this subsection, we introduce our path sampling method, by which we sample a group of paths on each subgraph $g$. The method is simplified from a popular graph sampling method called Random Walk with Jump [9]; the purpose of the simplification is to sample a set of connected paths. We walk from a starting node picked uniformly from the whole subgraph. At each step, with a given probability $p$ we walk to one of the adjacent nodes, selected uniformly at random: if $N(v)$ denotes the set of outgoing neighbors of the current node $v$, the transition probability from $v$ to an adjacent node $u \in N(v)$ is

$$P(u \mid v) = \frac{p}{|N(v)| + s},$$

where $s$ is a smoother. Otherwise, we stay at the current node with probability $1 - p$. The walk stops at the end node after a sufficient number of nodes have been visited. According to our construction, each path on subgraph $g$ can be represented as a node sequence $(v_1, v_2, \ldots, v_k)$, which can also be regarded as a user-visiting-order spatial sequence. In the following sections, for each subgraph $g$, $\mathcal{P}(g)$ denotes the set containing all sampled paths on $g$.
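A simple sketch of such a walk (our own illustration; the parameter name `p` for the move probability is ours, and the smoothing term is omitted for brevity):

```python
import random

def sample_path(adj, length, p=0.9, seed=None):
    """Walk on a subgraph given as {node: [neighbours]}: start at a uniformly
    chosen node; at each step move to a uniformly random neighbour with
    probability p, otherwise stay put, until `length` nodes are visited."""
    rng = random.Random(seed)
    v = rng.choice(sorted(adj))        # starting node picked uniformly
    path = [v]
    while len(path) < length:
        if adj[v] and rng.random() < p:
            v = rng.choice(sorted(adj[v]))
        path.append(v)
    return path
```

Because the walk only ever moves along edges (or stays), every sampled path is connected, which is the property the simplification is meant to guarantee.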
IV-B. The Latent Representation of Subgraphs
For each subgraph $g$, we construct a corrupted subgraph $\tilde{g}$ by adding random noise $\sigma \cdot \mathcal{N}(0, 1)$ to each node of the initial subgraph $g$, where $\sigma$ is the magnitude of the noise and $\mathcal{N}(0, 1)$ is the standard normal distribution. Thus, the Gromov-Hausdorff distance between the initial subgraph $g$ and its corresponding corrupted subgraph $\tilde{g}$ is bounded by the magnitude of the noise; this can be proved directly from the definition. We also construct the corrupted path set $\tilde{\mathcal{P}}(g)$ of $\mathcal{P}(g)$: each path of $\tilde{\mathcal{P}}(g)$ differs from the corresponding initial path of $\mathcal{P}(g)$ only by the Gaussian noisy displacement on each node.

In the sequence encoder-decoder model, we need to feed discrete tokens into the model, and thus we partition the spatial space into cells of equal size. Each node on a path is assigned the token of the cell it falls into. We only keep the cells that are sufficiently hit by sample points; these cells are referred to as hot cells. The token sequence of a path is denoted as $T$, and the set of token sequences is denoted as $\mathcal{T}(g)$.
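The corruption and hot-cell tokenization steps can be sketched as follows (a simplified version under our own assumptions, e.g., a square grid with side length `cell` and a hit threshold `min_hits`):

```python
import random

def corrupt(locs, sigma, rng):
    """Displace every (x, y) location by Gaussian noise of scale sigma."""
    return [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma)) for x, y in locs]

def tokenize(paths, cell, min_hits=2):
    """Assign each location the id of the grid cell it falls into, keeping
    only 'hot' cells hit at least min_hits times across all paths."""
    hits = {}
    grids = []
    for locs in paths:
        g = [(int(x // cell), int(y // cell)) for x, y in locs]
        grids.append(g)
        for c in g:
            hits[c] = hits.get(c, 0) + 1
    hot = {c: i for i, c in enumerate(sorted(c for c in hits if hits[c] >= min_hits))}
    return [[hot[c] for c in g if c in hot] for g in grids], hot
```

Nodes falling into cold cells are simply dropped in this sketch; the surviving token sequences are what the encoder-decoder consumes.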
According to the denoising autoencoder framework, we pick an LSTM as the encoding function $f_\theta$. The time-varying hidden vector is

$$h_t = f_\theta(h_{t-1}, \tilde{x}_t),$$

where $\tilde{x}_t$ denotes the corrupted token at timestamp $t$. We let another LSTM be the decoding function $g_{\theta'}$. The objective function of our algorithm is inspired by the classic negative log-likelihood loss [20]. The reconstruction error in our loss function is specifically defined as the spatial-proximity-aware loss [16],

$$L = -\sum_{t} \sum_{c \in C} w_{c, y_t} \log \frac{\exp(W_c^\top h_t)}{\sum_{c' \in C} \exp(W_{c'}^\top h_t)},$$

where $w_{c, y_t}$ is the spatial proximity weight on cell $c$ when decoding target $y_t$, $W_c$ is the projection vector for cell $c$ that projects from the hidden state space into the token space, and $D(\cdot, \cdot)$, used to define the weights $w_{c, y_t} \propto \exp(-D(c, y_t))$, denotes the Euclidean distance between the centroid coordinates of the cells.
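A minimal numerical sketch of such a loss (our own simplification of the idea, not the paper's exact formulation): the decoder's scores over hot cells go through a softmax, and instead of a one-hot target, the target is a proximity-weighted distribution over cells near the true one.

```python
import numpy as np

def proximity_nll(scores, centroids, target, lam=1.0):
    """scores: decoder scores over all hot cells, shape (C,);
    centroids: cell centroid coordinates, shape (C, 2);
    target: index of the true cell. Cells near the target share the weight."""
    d = np.linalg.norm(centroids - centroids[target], axis=1)
    w = np.exp(-d / lam)
    w /= w.sum()                                   # spatial proximity weights
    m = scores.max()                               # stable log-softmax
    log_softmax = scores - m - np.log(np.exp(scores - m).sum())
    return -float((w * log_softmax).sum())
```

Compared with a plain one-hot cross-entropy, predicting a cell adjacent to the true one is penalized only mildly, which is how spatial proximity enters the training signal.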
Eventually, the graph embedding function $f$ is defined as the average of the latent representations of the sampled paths of each subgraph $g$,

$$f(g) = \frac{1}{|\mathcal{P}(g)|} \sum_{P \in \mathcal{P}(g)} h(P),$$

where $h(P)$ is the encoder's latent representation of path $P$. Thus, our desired subgraph representation matrix is $X \in \mathbb{R}^{n \times d}$, where $d$ is the embedding dimension.
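The averaging step can be sketched as follows (assuming each path has already been encoded into a fixed-length vector):

```python
import numpy as np

def embed_subgraph(path_states):
    """Average the encoder's latent states of all sampled paths of one subgraph."""
    return np.mean(np.stack(path_states), axis=0)

def embed_all(per_subgraph_states):
    """Stack the per-subgraph embeddings into the n x d representation matrix X."""
    return np.stack([embed_subgraph(s) for s in per_subgraph_states])
```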
IV-C. Algorithm Overview
Algorithm 1 presents the framework. The input includes a set of spatial subgraphs $\mathcal{G}$, the embedding size $d$, a learning model with encoding function $f_\theta$ and decoding function $g_{\theta'}$, where $\theta, \theta'$ are the global parameters, and the learning rate (lines 1-4). During the iterative training process (lines 5-11), we first sample a set of paths of vertices from the set of subgraphs (line 6). Similar to DeepWalk [6], the sampling process can be generalized as performing a random walk. Then, we stochastically map the original input to a noisy version by adding Gaussian noise subject to the conditional distribution (line 7) and perform tokenization to obtain the clean and corrupted token sequences, respectively (line 8). Next, we compute the reconstruction via the decoder component of the model, and an optimizer such as stochastic gradient descent (SGD) is used to optimize the parameters (lines 9-10). Finally, a vector representation of each subgraph is computed via the learned model (lines 12-15). Note that the ultimate output is a matrix of vector representations of the spatial subgraphs.
V. Experiments
In this section, we use our representations for the task of spatial graph clustering on four real datasets and show the effectiveness of our model s2vec compared against several baseline approaches.
V-A. Datasets
The experiments are conducted on four real datasets: Brightkite and Gowalla (http://snap.stanford.edu/data/index.html), Flickr (https://www.flickr.com/), and Foursquare (https://archive.org/details/201309_foursquare_dataset_umn). For all the datasets, each vertex represents a user and each link represents the friendship between two users. The statistics of each dataset are shown in Table I, which includes the average degree, the average betweenness centrality (BC), and the average clustering coefficient (CC) of the vertices in each dataset. We randomly select 200 subgraphs of each dataset among those found by a community search algorithm [4].
Type | Name | Vertices | Edges | Avg. degree | BC | CC
Real | Brightkite | 51,406 | 197,167 | 7.67 | 7.34E-5 | 0.1795
     | Gowalla | 107,092 | 456,830 | 8.53 | 3.40E-5 | 0.2487
     | Flickr | 214,698 | 2,096,306 | 19.5 | 1.65E-5 | 0.1113
     | Foursquare | 2,127,093 | 8,640,352 | 8.12 | 1.68E-6 | 0.1044
V-B. Baselines
We compare s2vec with two baselines for graph representation. The first baseline is based on Principal Component Analysis (PCA) [21]: we arrange the spatial features of each vertex in the subgraph in the order of longitude and latitude and then expand them into a high-dimensional vector. Vectors of inconsistent length are padded with a special value such as zero. After that, we perform PCA dimensionality reduction on all subgraphs to obtain new vector representations. Note that this algorithm treats the graph as a set of spatial vertices without structural features.

The second baseline is a kind of ablation of s2vec, adapting the sequence-to-sequence learning framework for autoencoding by using the same sequence for both the input and the output [13]. We denote these two baselines as PCA and Vanilla-s2vec (V-s2vec), respectively.
V-C. Spatial Graph Clustering
Given a set of spatial graphs , the goal of graph clustering is to group graphs with similar structure and spatial features together. s2vec’s representation vectors could be used along with conventional clustering algorithms such as DBSCAN with Euclidean distance for this purpose.
The lack of ground truth makes it challenging to evaluate the effectiveness of spatial graph clustering. To overcome this, we use the Gromov-Hausdorff distance to measure the similarity of spatial graphs and perform the DBSCAN algorithm on these distances; the resulting clusters serve as the ground truth.
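For reference, DBSCAN only needs pairwise distances, so the same implementation can consume either Gromov-Hausdorff distances (for the ground truth) or Euclidean distances between embedding vectors. A compact sketch of the algorithm over a precomputed distance matrix (our own simplified implementation; `eps` and `min_pts` are the usual DBSCAN parameters):

```python
def dbscan(dist, eps, min_pts):
    """Cluster points given a precomputed distance matrix dist[i][j].
    Returns one label per point; -1 marks noise."""
    n = len(dist)
    labels = [None] * n
    cid = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neigh = [j for j in range(n) if dist[i][j] <= eps]
        if len(neigh) < min_pts:
            labels[i] = -1               # provisionally noise
            continue
        labels[i] = cid                  # i is a core point: start a cluster
        queue = [j for j in neigh if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid          # noise reached from a core: border point
            if labels[j] is not None:
                continue
            labels[j] = cid
            jn = [k for k in range(n) if dist[j][k] <= eps]
            if len(jn) >= min_pts:       # j is also a core point: keep expanding
                queue.extend(k for k in jn if labels[k] is None)
        cid += 1
    return labels
```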
Evaluation Metric.
In order to quantitatively measure the clustering accuracy, a standard clustering evaluation metric, the Adjusted Rand Index (ARI) [22], is used. ARI values lie in the range $[-1, 1]$; a higher ARI means a higher correspondence to the ground-truth results. We convert it into a percentage value for ease of understanding.

The results of spatial graph clustering using s2vec and the baselines are presented in Table II. We observe that s2vec significantly outperforms all the compared approaches. In particular, it outperforms the PCA technique by at least 10%. This is because s2vec fully considers maintaining similarity in the original space.
Dataset | Brightkite | Gowalla | Flickr | Foursquare
PCA | 49.76 | 49.45 | 49.99 | 48.85
V-s2vec | 57.14 | 59.81 | 69.59 | 54.60
s2vec | 62.49 | 63.07 | 73.64 | 63.61
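The ARI used above can be computed from pair counts over the contingency table of the two labelings; a minimal sketch (our own implementation of the standard formula):

```python
from collections import Counter
from math import comb

def ari(truth, pred):
    """Adjusted Rand Index of two labelings of the same items."""
    pairs = lambda counts: sum(comb(c, 2) for c in counts)
    index = pairs(Counter(zip(truth, pred)).values())   # agreeing pairs
    sum_a = pairs(Counter(truth).values())
    sum_b = pairs(Counter(pred).values())
    expected = sum_a * sum_b / comb(len(truth), 2)      # chance agreement
    max_index = (sum_a + sum_b) / 2
    return (index - expected) / (max_index - expected)
```

Because ARI is adjusted for chance, a random labeling scores near 0 and a perfect labeling (up to label permutation) scores 1.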
VI. Conclusion and Future Work
In this paper, we learn latent representations of spatial subgraphs in the denoising autoencoder framework and apply these representations to the spatial graph clustering task, achieving effective results. The problem of spatial graph representation is proposed here for the first time and has great potential for practical applications. In the future, we will consider more complex graph structures with more information on their nodes.
References
 [1] M. Dale and M.-J. Fortin, "From graphs to spatial graphs," Annual Review of Ecology, Evolution, and Systematics, vol. 41, pp. 21–38, 2010.
 [2] J. Shi, N. Mamoulis, D. Wu, and D. W. Cheung, "Density-based place clustering in geo-social networks," in Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. ACM, 2014, pp. 99–110.
 [3] S. Fortunato, "Community detection in graphs," Physics Reports, vol. 486, no. 3-5, pp. 75–174, 2010.
 [4] Y. Fang, Z. Wang, R. Cheng, X. Li, S. Luo, J. Hu, and X. Chen, “On spatialaware community search,” IEEE Transactions on Knowledge and Data Engineering, 2018.
 [5] P. Manchanda, G. Packard, and A. Pattabhiramaiah, “Social dollars: The economic impact of customer participation in a firmsponsored online customer community,” Marketing Science, vol. 34, no. 3, pp. 367–387, 2015.
 [6] B. Perozzi, R. Al-Rfou, and S. Skiena, "Deepwalk: Online learning of social representations," in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2014, pp. 701–710.
 [7] A. Grover and J. Leskovec, “node2vec: Scalable feature learning for networks,” in Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2016, pp. 855–864.
 [8] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, “Line: Largescale information network embedding,” in Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2015, pp. 1067–1077.
 [9] C. Li, J. Ma, X. Guo, and Q. Mei, “Deepcas: An endtoend predictor of information cascades,” in Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017, pp. 577–586.

 [10] M. Niepert, M. Ahmed, and K. Kutzkov, "Learning convolutional neural networks for graphs," in International Conference on Machine Learning, 2016, pp. 2014–2023.
 [11] A. Narayanan, M. Chandramohan, L. Chen, Y. Liu, and S. Saminathan, "subgraph2vec: Learning distributed representations of rooted subgraphs from large graphs," arXiv preprint arXiv:1606.08928, 2016.
 [12] P. Yanardag and S. Vishwanathan, “Deep graph kernels,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015, pp. 1365–1374.

 [13] A. Taheri, "Learning graph representations with recurrent neural network autoencoders," 2018.
 [14] C. Tan, L. Lee, and B. Pang, "The effect of wording on message propagation: Topic- and author-controlled natural experiments on Twitter," arXiv preprint arXiv:1405.1438, 2014.
 [15] M. Aanjaneya, F. Chazal, D. Chen, M. Glisse, L. Guibas, and D. Morozov, “Metric graph reconstruction from noisy data,” International Journal of Computational Geometry & Applications, vol. 22, no. 04, pp. 305–325, 2012.
 [16] X. Li, K. Zhao, G. Cong, C. S. Jensen, and W. Wei, “Deep representation learning for trajectory similarity computation,” in 2018 IEEE 34th International Conference on Data Engineering (ICDE). IEEE, 2018, pp. 617–628.
 [17] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.
 [18] M. Zhang, Z. Cui, M. Neumann, and Y. Chen, "An end-to-end deep learning architecture for graph classification," in Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
 [19] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and composing robust features with denoising autoencoders," in Proceedings of the 25th International Conference on Machine Learning. ACM, 2008, pp. 1096–1103.

 [20] J. Platt et al., "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods," Advances in Large Margin Classifiers, vol. 10, no. 3, pp. 61–74, 1999.
 [21] S. Wold, K. Esbensen, and P. Geladi, "Principal component analysis," Chemometrics and Intelligent Laboratory Systems, vol. 2, no. 1-3, pp. 37–52, 1987.
 [22] W. M. Rand, “Objective criteria for the evaluation of clustering methods,” Journal of the American Statistical association, vol. 66, no. 336, pp. 846–850, 1971.