1 Introduction
Network representation learning (NRL) is a powerful technique for learning representations of a graph. Such a representation gracefully lends itself to a wide variety of network analysis tasks, such as link prediction, node clustering, node classification, recommendation, and so forth.
In most studies, the learning is done in a context-free fashion. That is, the representation of a node characterizes just a single aspect of the node, for instance its local or global neighborhood. Recently, a complementary line of research has questioned the sufficiency of single representations and considered a context-sensitive approach. Given a node, this approach projects it to different points in a space depending on the contexts it is coupled with. A context node can be sampled from a neighborhood (Tu et al., 2017; Zhang et al., 2018), a random walk (Ying et al., 2018), and so on. In this study we sample from a node's neighborhood (nodes connected to it by an edge). Thus, in the learning process of our approach, a source node's representation changes depending on the target (context) node it is accompanied by. Studies have shown that context-sensitive approaches significantly outperform previous context-free SOTA methods on the link prediction task. A related notion (Peters et al., 2018; Devlin et al., 2018) in NLP has significantly improved SOTA across several NLP tasks.
In this paper we propose Gap (Graph neighborhood attentive pooling), which is inspired by attentive pooling networks (apn) (dos Santos et al., 2016), originally proposed for pair-ranking problems in NLP. For instance, given a question $q$ and a set of candidate answers $A$, an apn can be trained to rank the answers in $A$ with respect to $q$ using a two-way attention mechanism. apn is based on the prevalent deep learning formula for SOTA NLP, that is, embed, encode, attend, predict (Honnibal, 2018). Given a question-answer pair $(q, a)$, the apn model first projects the embeddings of the pair using two separate encoders; the encoder can be a cnn or an lstm. The projection helps to capture n-gram context information and/or long-term dependencies in the input sequence. Next, a soft-alignment matrix is obtained by a mutual-attention mechanism that transforms these projections using a parameter matrix. Attention vectors are then computed through column-wise and row-wise pooling operations on the alignment matrix. Finally, the weighted sum of each of the above projections by its respective attention vector is computed to obtain the representations of the question and the answer. Each candidate answer $a \in A$ is then ranked according to its similarity with the question, computed using the representations of $q$ and $a$. Recently, apn has been applied to context-sensitive NRL by Tu et al. (2017), where the inputs are textual documents attached to the pair of incident nodes of each edge in a graph. Such information, however, carries the added overhead of encoding text.
Though we adopt apn in Gap, we capitalize on the graph neighborhood of nodes to avoid the need for textual documents without compromising the quality of the learned representations. Our hypothesis is that one can learn high-quality context-sensitive node representations just by mutually attending to the graph neighborhoods of a node and its context node. To achieve this, we naturally assume that the order of nodes in the graph neighborhood of a node is arbitrary. Moreover, we exploit this assumption to simplify the apn model by removing the expensive encode phase.
Akin to the textual features in apn, Gap simply uses the graph neighborhood of nodes. That is, for every node in the graph we define a graph neighborhood function to build a fixed-size neighborhood sequence, which specifies the input of Gap. In the apn model, the encoder phase is usually required to capture high-level features such as n-grams, and long-term and order dependencies in textual inputs. As we have no textual features, and due to our assumption that there is no ordering in the graph neighborhood of nodes, we can effectively strip off the encoder. The encoder is the expensive part of apn, as it involves an rnn or cnn, and hence Gap can be trained faster than apn.
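As a concrete illustration, the input construction just described can be sketched as follows; the function name, the adjacency-list representation, and the use of 0 as a padding id are our own assumptions, not details from the paper:

```python
def neighborhood_sequence(adj, node, k, pad_id=0):
    """Build a fixed-size neighborhood sequence for `node`.

    `adj` maps a node id to the list of its first-order neighbors.
    Neighbor order is treated as arbitrary; sequences longer than `k`
    are truncated and shorter ones are padded with `pad_id`.
    """
    neighbors = list(adj.get(node, []))[:k]
    return neighbors + [pad_id] * (k - len(neighbors))


# Toy directed graph: 1 -> 2, 1 -> 3, 3 -> 4
adj = {1: [2, 3], 3: [4]}
print(neighborhood_sequence(adj, 1, k=4))  # [2, 3, 0, 0]
```

Padding to a common length $k$ lets all neighborhood sequences be batched as equal-size matrices.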
This simple yet empirically fruitful modification of the apn model enables Gap to achieve SOTA performance on link prediction and node clustering tasks using three real-world datasets. Furthermore, we empirically show that Gap is more than 2 times faster than an apn-like NRL algorithm based on text input. In addition, the simplification in Gap introduces no new hyperparameters beyond the usual ones, such as the learning rate and the sequence length, already present in apn.
2 apn Architecture
For the sake of being self-contained, here we briefly describe the original apn architecture. We are given a pair of natural language texts $(q, a)$ as input, where $q$ and $a$ are sequences of words of variable lengths $M$ and $L$, respectively, each word is drawn from a vocabulary $V$, and $M$ and $L$ could be different. The apn's forward execution is shown in Fig. 1(A), and in the following we describe each component.
Embed:
First, embedding matrices of $q$ and $a$ are constructed through a lookup operation on a word embedding matrix $\mathbf{E} \in \mathbb{R}^{d \times |V|}$, where $d$ is the embedding dimension. That is, for $q$ and $a$, respectively, embedding matrices $\mathbf{Q} \in \mathbb{R}^{d \times M}$ and $\mathbf{A} \in \mathbb{R}^{d \times L}$ are constructed by concatenating the embeddings of each word in $q$ and $a$ (the Embed box in Fig. 1(A)).
Encode:
Each embedding matrix is then projected using a cnn or bi-LSTM encoder to capture inherent high-level features (the Encode box in Fig. 1(A)). More formally, the embedded texts $\mathbf{Q}$, $\mathbf{A}$ are projected as $\mathbf{Q}' = f(\mathbf{Q}; \theta)$ and $\mathbf{A}' = f(\mathbf{A}; \theta)$, where $f$ is the encoder (cnn or bi-LSTM), $\theta$ is the set of parameters of the encoder, and $\mathbf{Q}' \in \mathbb{R}^{c \times M}$ and $\mathbf{A}' \in \mathbb{R}^{c \times L}$, where $c$ is the number of filters or hidden features of the cnn and bi-LSTM, respectively.
Attend:
In the third step, a parameter matrix $\mathbf{U} \in \mathbb{R}^{c \times c}$ is introduced so as to learn a similarity or soft-alignment matrix $\mathbf{G} \in \mathbb{R}^{M \times L}$ between the sequence projections $\mathbf{Q}'$ and $\mathbf{A}'$ as $\mathbf{G} = \tanh(\mathbf{Q}'^\top \mathbf{U} \mathbf{A}')$. Then, unnormalized attention weight vectors $\mathbf{g}_q \in \mathbb{R}^M$ and $\mathbf{g}_a \in \mathbb{R}^L$ are obtained through row-wise and column-wise max-pooling operations on $\mathbf{G}$, respectively, as $[\mathbf{g}_q]_i = \max_j \mathbf{G}_{i,j}$ and $[\mathbf{g}_a]_j = \max_i \mathbf{G}_{i,j}$, where $\mathbf{G}_{i,j}$ is the entry in the $i$-th row and $j$-th column of $\mathbf{G}$. Next, the attention vectors are normalized using softmax, $\sigma_q = \mathrm{softmax}(\mathbf{g}_q)$ and $\sigma_a = \mathrm{softmax}(\mathbf{g}_a)$. Finally, the normalized attention vectors are used to compute the final representations as $\mathbf{r}_q = \mathbf{Q}' \sigma_q$ and $\mathbf{r}_a = \mathbf{A}' \sigma_a$.
Predict:
In the last step, the representations $\mathbf{r}_q$ and $\mathbf{r}_a$ are used for ranking, depending on the task at hand. For instance, in a question answering setting, each candidate answer's representation $\mathbf{r}_a$ is ranked based on its similarity score with the question's representation $\mathbf{r}_q$.
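The attend and predict steps above can be sketched in a few lines of numpy; the encoders are abstracted away as pre-computed projections, and the variable names are ours:

```python
import numpy as np

def apn_attend(Q, A, U):
    """Two-way attention of apn.
    Q: (c, M) encoded question, A: (c, L) encoded answer,
    U: (c, c) trainable parameter matrix."""
    G = np.tanh(Q.T @ U @ A)                 # (M, L) soft-alignment matrix
    g_q, g_a = G.max(axis=1), G.max(axis=0)  # row-/column-wise max-pooling
    s_q = np.exp(g_q) / np.exp(g_q).sum()    # softmax-normalized attention
    s_a = np.exp(g_a) / np.exp(g_a).sum()
    return Q @ s_q, A @ s_a                  # (c,) final representations

rng = np.random.default_rng(0)
c, M, L = 8, 5, 7
r_q, r_a = apn_attend(rng.normal(size=(c, M)),
                      rng.normal(size=(c, L)),
                      rng.normal(size=(c, c)))
score = float(r_q @ r_a)  # similarity used to rank candidate answers
```

In a real apn, `Q` and `A` would come from the cnn/bi-LSTM encoders and `U` would be learned jointly with them.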
3 Gap
Gap adopts the apn model for learning the representations of node pairs in a graph $G = (V, E)$ with a set of nodes $V$ and edges $E$. $G$ can be a directed or undirected and a weighted or unweighted graph. Without loss of generality, we assume that $G$ is an unweighted directed graph.
We define a neighborhood function $N : V \rightarrow 2^V$, which maps each node $u \in V$ to a set of nodes $N(u) \subset V$. A simple way of materializing $N(u)$ is to consider the first-order neighbors of $u$, that is, $N(u) = \{v : (u, v) \in E\}$. An important assumption that we make on $N$ is that the ordering of the nodes in $N(u)$ is not important. Gap capitalizes on this assumption to simplify the apn model and achieve SOTA performance. Even though one can explore more sophisticated neighborhood functions, in this study we simply consider the first-order neighborhood.
Our goal is to learn node representations using the simplified apn based on the node neighborhood function $N$. Hence, akin to the input text pairs in apn, we consider a pair of neighborhood sequences $N(u)$ and $N(v)$ associated with a pair of nodes $u$ and $v$, where $u, v \in V$ and $(u, v) \in E$. Without loss of generality, we consider sequences of the same fixed length $k$. Recall that we assume the order of nodes in $N(u)$ and $N(v)$ is arbitrary.
Given a source node $u$, we seek to learn multiple context-sensitive embeddings of $u$ with respect to the target node $v$ it is paired with. In principle one can learn using all pairs of nodes in $V \times V$; however, that is not scalable, and hence we restrict learning to the pairs in $E$.
Gap’s forward execution model is shown in Fig. 1(B), and learning starts by embedding $N(u)$ and $N(v)$, respectively, as $\mathbf{N}_u \in \mathbb{R}^{d \times k}$ and $\mathbf{N}_v \in \mathbb{R}^{d \times k}$. Since there is no order dependency between the nodes in $N(u)$ or $N(v)$, besides being neighbors of the respective node, we leave out the cnn or bi-LSTM based projections of $\mathbf{N}_u$ and $\mathbf{N}_v$ that could capture such dependencies. No encoder!
Thus, the next step of Gap is mutually attending on the embeddings $\mathbf{N}_u$ and $\mathbf{N}_v$ of the graph neighborhoods of the node pair (the Attend box of Fig. 1(B)). That is, we employ a trainable parameter matrix $\mathbf{W} \in \mathbb{R}^{d \times d}$ and compute the soft-alignment matrix $\mathbf{G} \in \mathbb{R}^{k \times k}$ between the neighbors of $u$ and $v$ as

$$\mathbf{G} = \tanh(\mathbf{N}_u^\top \mathbf{W} \mathbf{N}_v) \qquad (1)$$
Here $\mathbf{G}$ is a soft-alignment matrix between every pair of nodes $(i, j) \in N(u) \times N(v)$. Therefore, for each axis of $\mathbf{G}$, we proceed by pooling the maximum alignment score to obtain the unnormalized attention vectors $\mathbf{g}_u = [\max_j \mathbf{G}_{1,j}, \ldots, \max_j \mathbf{G}_{k,j}]$ and $\mathbf{g}_v = [\max_i \mathbf{G}_{i,1}, \ldots, \max_i \mathbf{G}_{i,k}]$. As a result of the pooling operations, each neighbor of the source node $u$ selects a neighbor of the target node $v$ with the maximum alignment or similarity score, and a similar selection is done for each neighbor of $v$. This enables the source and target neighborhood sequences of the pair to influence each other in order to learn context-sensitive representations of $u$ and $v$. The normalized attention vectors are then obtained by applying softmax, $\sigma_u = \mathrm{softmax}(\mathbf{g}_u)$ and $\sigma_v = \mathrm{softmax}(\mathbf{g}_v)$. Ultimately, we compute the context-sensitive representations $\mathbf{r}_u$ and $\mathbf{r}_v$ of the source and target nodes $u$ and $v$, respectively, as $\mathbf{r}_u = \mathbf{N}_u \sigma_u$ and $\mathbf{r}_v = \mathbf{N}_v \sigma_v$.
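Putting the pieces together, Gap's encoder-free forward pass can be sketched in numpy as below; the embedding lookup, variable names, and toy dimensions are illustrative assumptions:

```python
import numpy as np

def gap_forward(E, seq_u, seq_v, W):
    """Context-sensitive representations for a node pair (u, v).
    E:     (d, n_nodes) node embedding matrix,
    seq_u: neighborhood sequence of the source node u,
    seq_v: neighborhood sequence of the target node v,
    W:     (d, d) trainable parameter matrix."""
    Nu, Nv = E[:, seq_u], E[:, seq_v]        # embed: (d, k) lookups, no encoder
    G = np.tanh(Nu.T @ W @ Nv)               # Eq. 1: (k, k) soft alignment
    g_u, g_v = G.max(axis=1), G.max(axis=0)  # row-/column-wise max-pooling
    a_u = np.exp(g_u) / np.exp(g_u).sum()    # softmax attention vectors
    a_v = np.exp(g_v) / np.exp(g_v).sum()
    return Nu @ a_u, Nv @ a_v                # (d,) representations r_u, r_v

rng = np.random.default_rng(1)
d, n, k = 16, 10, 4
E, W = rng.normal(size=(d, n)), rng.normal(size=(d, d))
r_u, r_v = gap_forward(E, [1, 2, 3, 0], [4, 5, 0, 0], W)
edge_score = float(r_u @ r_v)  # dot-product similarity used for prediction
```

The contrast with the apn sketch in Section 2 is that the embedding lookups feed the attention directly, with no cnn/bi-LSTM projection in between.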
Optimization:
The objective of Gap is to maximize the likelihood of the graph (edges) by predicting edge similarities using the dot product of the source and target representations, $s(u, v) = \mathbf{r}_u^\top \mathbf{r}_v$ (the Predict box of Fig. 1(B)). Hence, we employ the hard-margin loss given in Eq. 2:

$$\mathcal{L} = \max(0,\; 1 - \mathbf{r}_u^\top \mathbf{r}_v + \mathbf{r}_u^\top \mathbf{r}_{v'}) \qquad (2)$$
where $\mathbf{r}_{v'}$ is the representation of a negative target node $v'$, that is, $(u, v') \notin E$. The goal is to learn, in an unsupervised fashion, a context-sensitive embedding of nodes that enables us to rank the positive edges $(u, v) \in E$ higher than the negative pairs $(u, v')$.
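The hard-margin loss of Eq. 2 amounts to a standard margin ranking loss over a sampled negative node; a minimal sketch with dot-product similarity (the margin of 1 follows Eq. 2):

```python
import numpy as np

def hard_margin_loss(r_u, r_v, r_v_neg, margin=1.0):
    """max(0, margin - s(u, v) + s(u, v')) where v' is a sampled
    negative target node with (u, v') not in E."""
    return max(0.0, margin - float(r_u @ r_v) + float(r_u @ r_v_neg))

r_u = np.array([1.0, 0.0])
r_v = np.array([3.0, 0.0])       # strong positive similarity
r_v_neg = np.array([0.5, 0.0])   # weaker negative similarity
print(hard_margin_loss(r_u, r_v, r_v_neg))  # 0.0: positive wins by the margin
```

The loss is zero once the positive edge outscores the negative pair by at least the margin, so training concentrates on pairs that are still mis-ranked.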
Finally, a word on the computational complexity of Gap: it is proportional to the number of edges, as we consider each edge as an input pair.
4 Experimental Evaluation
Table 1: Summary of the datasets.

Dataset  #Nodes  #Edges  Features

Cora  2277  5214  Paper abstract
Zhihu  10000  43894  User post
Email  1005  25571  NA
In this section we provide an empirical evaluation of Gap. To this end, experiments are carried out using the following datasets, and a basic summary is given in Table 1.

Email (Leskovec et al., 2007): an email communication network from a large European research institution. A node represents a person, and an edge $(u, v)$ denotes that person $u$ has sent an email to person $v$.
The first two datasets have features (documents) associated with nodes: for Cora, abstracts of papers, and for Zhihu, user posts. Some of the baselines, discussed below, require textual information, and hence they consume the aforementioned features. The Email dataset has ground-truth community assignments for nodes based on a person's affiliation to one of the 42 departments.
We compare our method against the following 11 popular and SOTA baselines grouped as:

Structure-based context-sensitive method: splitter (Epasto and Perozzi, 2019)
We now report the experimental results, carried out on two tasks: link prediction and node clustering. All experiments are performed on an Ubuntu 18.04 machine with a 24-core CPU and 125 GB of RAM.
4.1 Link Prediction
Link prediction is an important task to which graph embedding algorithms are applied. Context-sensitive embedding techniques in particular have proved to be well suited for it. Similar to existing studies, we perform this experiment using a fraction of the edges as a training set. We hold out the remaining fraction of the edges from the training phase and only reveal them during the test phase; results are reported on this held-out set. All hyperparameter tuning is performed by taking a small fraction of the training set as a validation set.
Setup:
In line with existing techniques (Tu et al., 2017; Zhang et al., 2018), the percentage of training edges ranges from 15% to 95% in steps of 10. The hyperparameters of all algorithms are tuned using random search. For some of the baselines, our results are consistent with what is reported in previous studies, and hence for Cora and Zhihu we simply report those results.
Except for the “unavoidable” hyperparameters (e.g., learning rate, regularization/dropout rate) that are common to all the algorithms, our model has just one hyperparameter, the neighborhood sequence length (#Neighborhood); for nodes with a smaller neighborhood we use zero padding. As we shall verify later, Gap is not significantly affected by the choice of this parameter.
The quality of the prediction task is measured using the AUC score. AUC indicates the probability that a randomly selected edge $(u, v) \in E$ will have a higher similarity score than a randomly selected non-existing pair $(u, v') \notin E$. The similarity between a pair of nodes is computed as the dot product of their representations. For all the algorithms the representation size $d$ is 200, and Gap's configuration is shown in Table 2.

Table 2: Gap's configuration.

Dataset  #Neighborhood ($k$)  Dropout  Learning rate  Representation size

Cora  100  0.5  0.0001  200 
Zhihu  250  0.65  0.0001  200 
100  0.8  0.0001  200 
Results:
The results of the empirical evaluations on the Cora, Zhihu, and Email datasets are reported in Tables 3, 4, and 5, respectively. Gap outperforms the SOTA baselines in all cases for Zhihu and Email, and in almost all cases for Cora. One can see that as we increase the percentage of training edges, performance increases for all the algorithms. As indicated by the “Gain” row, Gap achieves up to a 9% improvement over SOTA context-sensitive techniques. Notably, the gain is most pronounced for smaller percentages of training edges. This shows that Gap is suitable both when many links are missing and when most of the links are present.
Table 3: AUC scores for link prediction on Cora.

Algorithm  % of training edges
15%  25%  35%  45%  55%  65%  75%  85%  95%  
DeepWalk  56.0  63.0  70.2  75.5  80.1  85.2  85.3  87.8  90.3 
Line  55.0  58.6  66.4  73.0  77.6  82.8  85.6  88.4  89.3 
Node2Vec  55.9  62.4  66.1  75.0  78.7  81.6  85.9  87.3  88.2 
WalkLets  69.8  77.3  82.8  85.0  86.6  90.4  90.9  92.0  93.3 
AttentiveWalk  64.2  76.7  81.0  83.0  87.1  88.2  91.4  92.4  93.0 
tadw  86.6  88.2  90.2  90.8  90.0  93.0  91.0  93.0  92.7 
TriDnr  85.9  88.6  90.5  91.2  91.3  92.4  93.0  93.6  93.7 
cene  72.1  86.5  84.6  88.1  89.4  89.2  93.9  95.0  95.9 
cane  86.8  91.5  92.2  93.9  94.6  94.9  95.6  96.6  97.7 
dmte  91.3  93.1  93.7  95.0  96.0  97.1  97.4  98.2  98.8 
splitter  65.4  69.4  73.7  77.3  80.1  81.5  83.9  85.7  87.2 
Gap  95.8  96.4  97.1  97.6  97.6  97.6  97.8  98.0  98.2 
GAIN%  4.5%  3.6%  3.4%  2.6%  1.6%  0.5%  0.4%  –  – 
Table 4: AUC scores for link prediction on Zhihu.

Algorithm  % of training edges
15%  25%  35%  45%  55%  65%  75%  85%  95%  
DeepWalk  56.6  58.1  60.1  60.0  61.8  61.9  63.3  63.7  67.8 
Line  52.3  55.9  59.9  60.9  64.3  66.0  67.7  69.3  71.1 
Node2Vec  54.2  57.1  57.3  58.3  58.7  62.5  66.2  67.6  68.5 
WalkLets  50.7  51.7  52.6  54.2  55.5  57.0  57.9  58.2  58.1 
AttentiveWalk  69.4  68.0  74.0  75.9  76.4  74.5  74.7  71.7  66.8 
tadw  52.3  54.2  55.6  57.3  60.8  62.4  65.2  63.8  69.0 
TriDnr  53.8  55.7  57.9  59.5  63.0  64.2  66.0  67.5  70.3 
cene  56.2  57.4  60.3  63.0  66.3  66.0  70.2  69.8  73.8 
cane  56.8  59.3  62.9  64.5  68.9  70.4  71.4  73.6  75.4 
dmte  58.4  63.2  67.5  71.6  74.0  76.7  78.7  80.3  82.2 
splitter  59.8  61.5  61.8  62.1  62.1  62.4  61.0  60.7  58.6 
Gap  72.6  77.9  81.2  80.8  81.4  81.8  82.0  83.7  86.3 
GAIN%  3.2%  9.9%  7.2%  5.1%  5.0%  5.1%  3.3%  3.4%  4.1% 
Table 5: AUC scores for link prediction on Email.

Algorithm  % of training edges
15%  25%  35%  45%  55%  65%  75%  85%  95%  
DeepWalk  69.2  71.4  74.1  74.7  76.6  76.1  78.7  75.7  79.0 
Line  65.6  71.5  73.8  76.0  76.7  77.8  78.5  77.9  78.8 
Node2Vec  66.4  68.6  71.2  71.7  72.7  74.0  74.5  74.4  76.1 
WalkLets  70.3  73.2  75.2  78.7  78.2  78.1  78.9  80.0  78.5 
AttentiveWalk  68.8  72.5  73.5  75.2  74.1  74.9  73.0  70.3  68.6 
splitter  69.2  70.4  69.1  69.2  70.6  72.8  73.3  74.8  75.2 
Gap  77.6  81.6  81.9  83.3  83.1  84.1  84.5  84.8  84.8 
GAIN%  7.3%  8.4%  6.7%  4.6%  4.9%  6.0%  5.6%  4.8%  5.8% 
4.2 Node Clustering
Nodes in a network have a tendency to form cohesive structures based on shared aspects. These structures are usually referred to as groups, clusters, or communities, and identifying them is an important task in network analysis. In this section we use the Email dataset, which has ground-truth communities, 42 of them. Since this dataset has only structural information, we exclude the baselines that require textual information.
Setup:
Since each node belongs to exactly one cluster, we employ the Spectral Clustering algorithm to identify clusters. The representations of nodes learned by a given algorithm are the input features of the clustering algorithm. In this experiment the percentage of training edges varies from 25% to 95% in steps of 20%; for the rest, we use the same configuration as in the previous experiment.
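A sketch of this clustering setup with scikit-learn; the learned representations are replaced here by synthetic well-separated features, so only the shape of the pipeline is meaningful:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
# Stand-in for learned node representations: three well-separated groups
# of 20 "nodes" each, in an 8-dimensional feature space.
X = np.vstack([rng.normal(loc=m, scale=0.1, size=(20, 8))
               for m in (0.0, 5.0, 10.0)])

# Cluster the representations; in the paper n_clusters would be 42,
# matching the Email ground-truth communities.
labels = SpectralClustering(n_clusters=3, random_state=0).fit_predict(X)
```

The predicted `labels` are then compared against the ground-truth assignment with the measures discussed next.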
Given the ground-truth community assignment $C$ of nodes and the predicted community assignment $\hat{C}$, the agreement between $C$ and $\hat{C}$ is usually measured using mutual information $MI(C, \hat{C})$. However, $MI$ is not bounded, which makes comparing methods difficult; hence we use two variants of $MI$ (Vinh et al., 2010): the normalized mutual information $NMI$, which simply normalizes $MI$, and the adjusted mutual information $AMI$, which also adjusts $MI$ for chance.
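Both scores are available in scikit-learn; the toy example below illustrates why the correction for chance matters:

```python
from sklearn.metrics import (adjusted_mutual_info_score,
                             normalized_mutual_info_score)

truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
perfect = [2, 2, 2, 0, 0, 0, 1, 1, 1]   # same partition, labels permuted

# Both scores are 1.0 for a perfect (label-permuted) match ...
nmi = normalized_mutual_info_score(truth, perfect)
ami = adjusted_mutual_info_score(truth, perfect)

# ... but for an uninformative partition (every point its own cluster)
# AMI is pulled to ~0 by the chance correction, while NMI stays high.
singletons = list(range(9))
nmi_chance = normalized_mutual_info_score(truth, singletons)
ami_chance = adjusted_mutual_info_score(truth, singletons)
```

This is the practical reason for reporting AMI alongside NMI in Table 6.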
Table 6: Node clustering results on Email.

Algorithm  % of training edges

25%  55%  75%  95%  
NMI  AMI  NMI  AMI  NMI  AMI  NMI  AMI  
DeepWalk  41.3  28.6  53.6  44.8  50.6  42.4  57.6  49.9 
Line  44.0  30.3  49.9  38.2  53.3  42.6  56.3  46.5 
Node2Vec  46.6  35.3  45.9  35.3  47.8  38.5  53.8  45.5 
WalkLets  47.5  39.9  55.3  47.4  54.0  45.4  50.1  41.6 
AttentiveWalk  42.9  30.0  45.7  36.5  44.3  35.7  47.4  38.5 
splitter  38.9  23.8  43.2  30.3  45.2  33.6  48.4  37.6 
Gap  67.8  58.8  64.7  55.7  65.6  57.6  65.4  58.7 
%Gain  20.3%  9.4%  11.0%  7.8% 
Results:
The results of this experiment are reported in Table 6: Gap significantly outperforms all the baselines, by up to 20% with respect to the NMI score. Consistent with our previous experiment, Gap performs well at both extremes of the percentage of training edges. Similar improvements are achieved for the AMI score.
4.3 Parameter Sensitivity Analysis
Here we first show the sensitivity of the main hyperparameter of Gap, the neighborhood size #Neighborhood. Figures 2 and 3(A) show the effect of this parameter on the link prediction and node clustering tasks. In both cases we notice that Gap is not significantly affected by changes in its value. We show the effect across different percentages of training edges for link prediction, and at a fixed percentage (55%) for node clustering. Regardless of the percentage of training edges, we only see small changes in AUC (Fig. 2), and in NMI and AMI (Fig. 3(A)), across different values of #Neighborhood.
Next, we analyze the run time of training Gap. Our goal is to show the benefit of removing the encoder of apn, and we do this by comparing Gap against cane, which employs the exact apn architecture. For this experiment we include two randomly generated graphs, using the Erdős–Rényi (ERG) and Barabási–Albert (BAG) models. ERG has 200K edges and BAG has 1.5M edges. Figure 3(B) clearly shows that Gap is at least 2 times faster than cane on all the graphs.
4.4 Ablation Study
Table 7: Ablation results (AUC) on Cora and Email.

Datasets  Cora  Email

Baselines  Training ratio  Training ratio  
25%  55%  75%  95%  25%  55%  75%  95%  
GapCn  60  60  61  61  74  74  78  79 
GapApn  59  60  60  65  74  77  78  78 
GapMlp  56  63  66  73  72  77  78  77 
Gap  96  97  97  98  81  83  84  84 
Here we present a brief ablation study to strengthen the case for Gap. Concretely, we compare Gap with different baselines to further motivate (i) the way we model nodes using their neighborhood, (ii) the assumption that order in this neighborhood is not important, and (iii) the choice of the apn algorithm without the cnn or bi-LSTM encoder. To this end, we introduce the following baselines; apart from the indicated difference, everything is the same as in Gap.

First, we assume order is important in the neighborhood of nodes and induce it by the neighbors' similarity with the current node, using a common topological feature: the number of common neighbors. This baseline is referred to as GapCn and uses the exact apn model to capture the “order”.

Second, we use the same input as in Gap, but each node's neighborhood is randomly permuted and fed multiple times (concretely, 5 times), and the exact apn model is employed; this baseline is referred to as GapApn.

Finally, we replace Gap's Attend component with a standard feed-forward neural network that consumes the same input (the embedding matrices $\mathbf{N}_u$ and $\mathbf{N}_v$) and has the same learning objective specified in Eq. 2; this baseline is referred to as GapMlp.
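For concreteness, the common-neighbor ordering used by the GapCn baseline can be sketched as below; the function name and the set-based adjacency representation are our assumptions:

```python
def ordered_neighborhood(adj, node, k, pad_id=0):
    """GapCn-style input: neighbors sorted by the number of common
    neighbors they share with `node` (most similar first)."""
    def common(n):
        return len(adj[node] & adj.get(n, set()))
    ordered = sorted(adj[node], key=common, reverse=True)[:k]
    return ordered + [pad_id] * (k - len(ordered))

adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
# Node 3 comes first: it shares two neighbors (2 and 4) with node 1.
print(ordered_neighborhood(adj, 1, k=3))
```

Gap itself deliberately skips this sorting step, since it assumes neighbor order is arbitrary.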
In Table 7 we report the results of the ablation experiment, which shed some light on the assumptions and design choices of Gap. For the reported results, GapCn and GapApn use a cnn encoder. In both cases, they quickly overfit the data, and we found that we had to employ aggressive regularization using a dropout rate of 0.95. In addition, for GapCn we observed that as we increase the kernel size to values greater than 1 (up to 5), the results keep getting worse; hence, what we report is the best result, obtained using a kernel size of 1. For example, with a kernel size of 3 and training ratios of 25, 55, 75, and 95 percent, the AUC scores respectively dropped to 54, 55, 56, and 58 percent on the Cora dataset, and 66, 69, 78, and 77 percent on the Email dataset. We conjecture that this is due to the model's attempt to enforce high-level neighborhood patterns (e.g., combinations of arbitrary neighbors) that do not intrinsically govern the underlying edge formation phenomena. Rather, what is important is to effectively pay attention to the presence of individual neighbors, both in $N(u)$ and $N(v)$, regardless of their order. Moreover, training this model is at least twice as slow as Gap, as also illustrated in Section 4.3.
In the case of GapApn, though the variations in AUC are marginal with respect to the change in kernel size, the training time of this model increases almost by an order of magnitude. Finally, by comparing the results of GapMlp and Gap, we see that the mutual-attention mechanism (the Attend component) plays an important role.
5 Related Work
NRL is usually carried out by exploring the structure of the graph and metadata, such as node attributes, attached to the graph (Perozzi et al., 2014; Grover and Leskovec, 2016; Tang et al., 2015; Perozzi et al., 2016; Wang et al., 2016; Yang et al., 2015; Pan et al., 2016; Kefato et al., 2017; Sheikh et al., 2019). Random walks are widely used to explore local/global neighborhood structures, which are then fed into a learning algorithm. The learning is carried out in an unsupervised manner by maximizing the likelihood of observing the neighbor nodes and/or attributes of a center node.
Recently, graph convolutional networks have also been proposed for semi-supervised network analysis tasks (Kipf and Welling, 2017; Hamilton et al., 2017; Wu et al., 2019; Velickovic et al., 2017; Abu-El-Haija et al., 2019). These algorithms work by aggregating neighborhood features, with a downstream objective based, for example, on partial labels of nodes. All these methods are essentially different from our approach because they are context-free.
Context-sensitive learning is another paradigm for NRL that challenges the sufficiency of a single representation of a node for applications such as link prediction, product recommendation, and ranking. While some of these methods (Tu et al., 2017; Zhang et al., 2018) rely on textual information, others have shown that a similar goal can be achieved using just the structure of the graph (Epasto and Perozzi, 2019). The latter, however, require an extra step of persona decomposition, based on microscopic-level community detection algorithms, to identify the multiple contexts of a node. Unlike the first group of approaches, our algorithm does not require extra textual information, and unlike the second, it does not require any community detection algorithm.
6 Conclusion
In this study we have presented a novel context-sensitive graph embedding algorithm called Gap. It consumes node neighborhoods as input features, which are constructed under the important assumption that their ordering is arbitrary. To learn representations of nodes, Gap employs attentive pooling networks (apn). By exploiting the above assumption, it makes an important simplification of apn and gains a more-than-2x speed-up over another SOTA method that employs the exact apn. Furthermore, Gap consistently outperforms all the baselines and achieves up to 9% and 20% improvement over the best-performing ones on the link prediction and node clustering tasks, respectively. In future work, we will investigate how node attributes can be incorporated, and provide a theoretical framework on the relation between neighborhood sampling and topological properties.
References
Watch your step: learning graph embeddings through attention. CoRR abs/1710.09599.
MixHop: higher-order graph convolutional architectures via sparsified neighborhood mixing. CoRR abs/1905.00067.
BERT: pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805.
Attentive pooling networks. CoRR abs/1602.03609.
Is a single embedding enough? Learning node representations that capture multiple social contexts. CoRR abs/1905.02138.
Node2Vec: scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, pp. 855–864.
Inductive representation learning on large graphs. CoRR abs/1706.02216.
Embed, encode, attend, predict: the new deep learning formula for state-of-the-art NLP models. Blog post.
Mineral: multi-modal network representation learning. In Proc. of the 3rd International Conference on Machine Learning, Optimization and Big Data, MOD '17.
Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR).
Graph evolution: densification and shrinking diameters. ACM Trans. Knowl. Discov. Data 1(1).
Tri-party deep network representation. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI '16, pp. 1895–1901.
DeepWalk: online learning of social representations. CoRR abs/1403.6652.
Walklets: multiscale graph embeddings for interpretable network classification. CoRR abs/1605.02115.
Deep contextualized word representations. CoRR abs/1802.05365.
Gat2vec: representation learning for attributed graphs. Journal of Computing.
A simple approach to attributed graph embedding via enhanced autoencoder. In Proceedings of the Eighth International Conference on Complex Networks and Their Applications (COMPLEX NETWORKS 2019), Studies in Computational Intelligence, Vol. 881, pp. 797–809.
A general framework for content-enhanced network representation learning. CoRR abs/1610.02906.
LINE: large-scale information network embedding. CoRR abs/1503.03578.
CANE: context-aware network embedding for relation modeling. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1722–1731.
Graph attention networks. ArXiv abs/1710.10903.
Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. J. Mach. Learn. Res. 11, pp. 2837–2854.
Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, pp. 1225–1234.
Simplifying graph convolutional networks. CoRR abs/1902.07153.
Network representation learning with rich text information. In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI '15, pp. 2111–2117.
Graph convolutional neural networks for web-scale recommender systems. CoRR abs/1806.01973.
Diffusion maps for textual network embedding. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS '18, pp. 7598–7608.