Inductive Link Prediction for Nodes Having Only Attribute Information

by   Yu Hao, et al.
The Chinese University of Hong Kong

Predicting the link between two nodes is a fundamental problem for graph data analytics. In attributed graphs, both the structure and attribute information can be utilized for link prediction. Most existing studies focus on transductive link prediction where both nodes are already in the graph. However, many real-world applications require inductive prediction for new nodes having only attribute information. It is more challenging since the new nodes do not have structure information and cannot be seen during the model training. To solve this problem, we propose a model called DEAL, which consists of three components: two node embedding encoders and one alignment mechanism. The two encoders aim to output the attribute-oriented node embedding and the structure-oriented node embedding, and the alignment mechanism aligns the two types of embeddings to build the connections between the attributes and links. Our model DEAL is versatile in the sense that it works for both inductive and transductive link prediction. Extensive experiments on several benchmark datasets show that our proposed model significantly outperforms existing inductive link prediction methods, and also outperforms the state-of-the-art methods on transductive link prediction.



1 Introduction

Link prediction is a fundamental task in graph data analytics [Liben-NowellK07]. Many real-world applications can benefit from link prediction, such as recommendation and knowledge graph completion [bhagavatula2018content, kazemi2018simple, xu2019link]. Graph embedding, which represents the nodes of a graph by low-dimensional vectors, has proved to be an effective approach for link prediction on attributed graphs (e.g., [kipf2017semi, bojchevski2018deep]).

Generally, there are two types of link prediction, i.e., transductive and inductive, as shown in Figure 1. Most existing (attributed) graph embedding approaches focus on transductive link prediction, where both nodes are already in the given graph and can be seen during the training process, such as GCN [kipf2017semi], GAE [kipf2016variational], and SEAL [zhang2018link]. However, many real-world applications need inductive link prediction, which requires embeddings to be quickly generated for new nodes with only attribute information (e.g., a new user in a recommender system). SDNE [wang2016structural] and GraphSAGE [hamilton2017inductive] can compute embeddings for new nodes, but they require the edges of those nodes. G2G [bojchevski2018deep] can perform inductive link prediction for unseen nodes without local structures. However, it cannot reliably distinguish nodes with similar attributes, because the model does not adequately capture structure information in the node representations.

Figure 1: Inductive and transductive link predictions.

In this work, we propose a novel graph embedding model called Dual-Encoder graph embedding with ALignment (Deal) for inductive link prediction of new nodes with only attribute information. We aim to learn the connections between the nodes’ attributes and the graph structure through this model. The model embeds the graph nodes into the vector space, and it can compute an embedding vector for a new query node from its attributes alone; this embedding is then compared with another node’s embedding for link prediction.

Figure 2: An illustration of the proposed model Deal.

As shown in Figure 2, Deal has three components: an attribute-oriented encoder $\mathcal{H}_a$, a structure-oriented encoder $\mathcal{H}_s$, and an alignment mechanism. The attribute-oriented encoder $\mathcal{H}_a$ maps the nodes’ attributes to embeddings in the vector space. Given two linked attributed nodes, the similarity of their embedding vectors computed by $\mathcal{H}_a$ is high. $\mathcal{H}_a$ is used for computing the embedding vectors of new nodes from their attributes. $\mathcal{H}_s$ computes node embeddings that preserve the structure information. Given two linked nodes, the similarity of their embedding vectors computed by $\mathcal{H}_s$ is high. The alignment mechanism aligns the two types of embeddings to build the connections between the attributes and graph structure. The two encoders are updated throughout training so that the embeddings they produce are aligned. In addition, we use a novel ranking-motivated loss that can be regularized by hyper-parameters, which is not considered in existing ranking loss-based graph embedding models. Although Deal focuses on inductive link prediction with only attribute information, it can perform transductive link prediction using both structural and attribute information as well. The main contributions of our approach are summarized as follows:

  • We design a model Deal for inductive link prediction of the new nodes with only attribute information given.

  • The proposed alignment mechanism builds the connections between attributes and graph structure, and improves the representation ability of node embeddings.

  • The experimental results on several real-world datasets show that our proposed model consistently outperforms the state-of-the-art models on both inductive link prediction (with at least 6% AP score improvement) and transductive link prediction.

2 Related Works

Early studies on link prediction usually make strong assumptions about when links may exist, and they are mainly based on similarity measures of node proximity such as the number of common neighbours, the Jaccard coefficient, and Adamic-Adar [Liben-NowellK07, zhou2009predicting, ZhaoAH16]. They assume that the probability of a link between two nodes increases with their similarity. However, such assumptions may not hold in some real-world networks such as protein-protein interaction (PPI) networks, because two proteins sharing many common neighbours are actually less likely to interact.
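To make the three similarity measures named above concrete, the following is a minimal Python sketch that computes them on a toy undirected graph. The function name `neighbour_scores`, the adjacency-dictionary layout, and the toy graph are illustrative assumptions, not part of the cited methods' implementations.

```python
import math

def neighbour_scores(adj, u, v):
    """Return (common-neighbour count, Jaccard coefficient, Adamic-Adar score).

    adj maps each node to the set of its neighbours (undirected graph).
    """
    nu, nv = adj[u], adj[v]
    common = nu & nv
    union = nu | nv
    jaccard = len(common) / len(union) if union else 0.0
    # Adamic-Adar down-weights common neighbours that have many links.
    adamic_adar = sum(1.0 / math.log(len(adj[w])) for w in common if len(adj[w]) > 1)
    return len(common), jaccard, adamic_adar

# Hypothetical toy graph: nodes a and b share the neighbours c and d.
adj = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b", "e"},
    "d": {"a", "b"},
    "e": {"c"},
}
cn, jac, aa = neighbour_scores(adj, "a", "b")
```

All three scores grow with the overlap of the two neighbourhoods, which is exactly the assumption that fails on PPI-like networks.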


Recently, researchers have shown an increasing interest in solving the link prediction problem via graph embedding methods [zhang2018link]. Graph embedding has been used widely [gao2018deep, gao2019progan, qiutemporal, 8970926], and it aims to map nodes to $d$-dimensional vectors $\mathbf{z} \in \mathbb{R}^d$. Some graph embedding methods only capture the structural information of the graph, such as the random walk-based approaches DeepWalk [perozzi2014deepwalk] and node2vec [grover2016node2vec], which adapt the Skip-Gram model and treat each generated path of nodes as a sequence of words, and SDNE [wang2016structural], which learns node embeddings preserving both local and global structures. They do not leverage the attribute information and thus cannot be used for link prediction in attributed graphs.

For link prediction in attributed graphs, most existing studies focus on transductive link prediction, where both nodes are already in the graph. For example, GCN [kipf2017semi] requires that the full graph Laplacian is known during training; GAE [kipf2016variational] learns the node embeddings with a GCN encoder and an inner product decoder. SEAL [zhang2018link] is another GCN-based framework that solves the link prediction problem using local sub-graphs. To the best of our knowledge, only G2G [bojchevski2018deep] is able to perform the task of inductive link prediction with only attribute information. It relies on a deep encoder that embeds each node as a Gaussian distribution. However, the output of its encoder is primarily based on the nodes’ attributes, and thus it cannot reliably distinguish nodes with similar attributes. G2G will embed these nodes nearby in the vector space, because their structure information is not fully utilized.

In our model Deal, the alignment mechanism aligns the attribute embedding and the structure embedding, so that the learned node representations capture both attribute and structure information and thus achieve better link prediction performance.

3 Problem Statement

We study the problem on an attributed graph $G = (V, E, X)$, with a node set $V$ ($|V| = N$), an edge set $E \subseteq V \times V$, and an attribute feature matrix $X \in \mathbb{R}^{N \times M}$, where $\mathbf{x}_i$ is the attribute feature vector of node $v_i$ ($M$ is the number of attributes in the graph).

Given a graph $G$ as well as a pair of nodes $(v_p, v_q)$, the link prediction task aims to predict the existence of the link between $v_p$ and $v_q$. In transductive link prediction, both nodes $v_p$ and $v_q$ are already in the graph (i.e., $v_p, v_q \in V$, and thus they can be seen during the training process). In this work, we focus on inductive link prediction, where either or both of $v_p$ and $v_q$ are not seen during the training process. During the prediction, only the attribute information of the new nodes is available, i.e., their local structures are unknown.

We propose to utilize the node embedding approach, which embeds the graph nodes into the vector space, to solve this problem. When performing prediction, for a new node $v_i$ with its attribute feature vector $\mathbf{x}_i$, the model outputs the node embedding for this node from $\mathbf{x}_i$ and then compares $v_i$’s embedding with another node’s embedding to predict their relationship.

4 Our Model Deal

4.1 Overview

Figure 2 illustrates our model Deal, which consists of three components: two node embedding encoders and one alignment mechanism. We aim to predict the links for new nodes with only attribute information. This requires us to build the connections between the attributes and the links between nodes, which means that we need to compute an embedding vector for a given set of attributes. Our attribute-oriented encoder $\mathcal{H}_a$ performs this task. It maps the nodes’ attributes to embeddings in the vector space. Given two linked nodes with their attributes, the similarity of their embedding vectors computed by $\mathcal{H}_a$ is high.

However, when the nodes’ attributes are uninformative or very similar, such a single encoder is not sufficient to output embeddings that distinguish the nodes, because they are all embedded nearby in the vector space. To remedy this issue, we use another, structure-oriented encoder $\mathcal{H}_s$ to compute node embeddings that preserve the graph structure information. As a result, given two linked nodes, the similarity of their embedding vectors computed by $\mathcal{H}_s$ is high. Next, we propose an alignment mechanism to align the two types of embeddings. During the training process, the two encoders are updated so that they produce node embeddings that are aligned in the vector space. Finally, the connections between the attributes and the links are captured by the two encoders, yielding better embedding vectors for link prediction.

4.2 Attribute-oriented Encoder

The attribute-oriented encoder $\mathcal{H}_a$ takes a node attribute vector $\mathbf{x}_i$ (the attributes of node $v_i$) as the input and outputs a node embedding, denoted as $\mathbf{z}_i^a$, i.e., $\mathbf{z}_i^a = \mathcal{H}_a(\mathbf{x}_i)$.

There are many neural network choices for learning $\mathcal{H}_a$, and we choose the multilayer perceptron (MLP) with a nonlinear activation layer [Goodfellow-et-al-2016] in this paper as follows:


$$\mathbf{z}_i^a = \mathcal{H}_a(\mathbf{x}_i) = \mathbf{W}_a^{(2)}\,\sigma\big(\mathbf{W}_a^{(1)}\mathbf{x}_i + \mathbf{b}\big) \qquad (1)$$

with trainable parameters $\mathbf{W}_a^{(1)}$, $\mathbf{W}_a^{(2)}$, and $\mathbf{b}$, and $\sigma$ the exponential linear unit [clevert2015fast]. Note that the number of layers of $\mathcal{H}_a$ varies across datasets. Here, we do not use GCN in $\mathcal{H}_a$, because we observe that aggregating too much information from a node’s neighbours may harm the representation ability of the node’s attributes for the link prediction task in attributed graphs. As shown in Section 5.5, we tried GCN-like encoders to obtain node embeddings from attributes, and the performance is not as good as using MLP. In addition, without aggregating the information from neighbours, the training process can be sped up significantly.
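As an illustration of an MLP encoder with ELU activation, a minimal pure-Python sketch follows. The class name `AttributeEncoder`, the two-layer shape, the layer sizes, and the random initialization are assumptions for illustration only; the text above states that the actual number of layers varies with the dataset.

```python
import math
import random

def elu(x, alpha=1.0):
    """Exponential linear unit."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

class AttributeEncoder:
    """Assumed two-layer MLP: z = W2 * elu(W1 * x + b)."""

    def __init__(self, n_attrs, hidden, dim, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.gauss(0, 0.1) for _ in range(n_attrs)] for _ in range(hidden)]
        self.b = [0.0] * hidden
        self.W2 = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(dim)]

    def __call__(self, x):
        h = [elu(a + b) for a, b in zip(matvec(self.W1, x), self.b)]
        return matvec(self.W2, h)

# A new node needs only its attribute vector to obtain an embedding.
encoder = AttributeEncoder(n_attrs=5, hidden=8, dim=4)
z_new = encoder([1.0, 0.0, 0.0, 1.0, 0.0])
```

Because the forward pass never touches the neighbours of a node, the same call works for nodes that were absent during training.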

4.3 Structure-oriented Encoder

The structure-oriented encoder $\mathcal{H}_s$ aims to generate node embeddings that preserve the structural information of the graph without considering the attributes. We expect the vectors of two linked nodes computed by $\mathcal{H}_s$ to have high similarity. To achieve this, we use the one-hot encoding of the nodes (which can be regarded as the nodes’ identifiers [you2019position]) as the input of $\mathcal{H}_s$. $\mathcal{H}_s$ can be seen as a function that maps node $v_i$ to its node embedding vector $\mathbf{z}_i^s$: $\mathbf{z}_i^s = \mathcal{H}_s(v_i)$. From the perspective of Occam’s razor, we adopt a linear model as the encoder $\mathcal{H}_s$:


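$$\mathbf{z}_i^s = \mathcal{H}_s(v_i) = \mathbf{W}_s\,\mathbf{e}_i \qquad (2)$$

where $\mathbf{e}_i$ is the one-hot encoding of node $v_i$ and weight normalization [salimans2016weight] is employed to reparameterize the parameter $\mathbf{W}_s$. In addition, weight normalization is able to accelerate the convergence of stochastic gradient descent optimization. By minimizing the ranking-motivated loss, which is presented in detail in Section 4.4, the final embedding $\mathbf{z}_i^s$ is able to capture the structural information.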
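A linear model applied to a one-hot input reduces to a table lookup, and weight normalization rescales each row of the weight matrix as $g \cdot \mathbf{v} / \lVert\mathbf{v}\rVert$. The sketch below illustrates both points; the helper names, the per-row normalization axis, and the toy matrix are assumptions rather than the authors' implementation.

```python
import math

def weight_normalise(V, g):
    """Return W whose rows are W[r] = g[r] * V[r] / ||V[r]||."""
    W = []
    for row, gain in zip(V, g):
        norm = math.sqrt(sum(x * x for x in row))
        W.append([gain * x / norm for x in row])
    return W

def structure_embedding(W, node_index):
    """Embedding of a node: column node_index of W (dim x N)."""
    return [row[node_index] for row in W]

def one_hot_product(W, node_index, n_nodes):
    """Same result computed explicitly as W times the one-hot vector."""
    e = [1.0 if j == node_index else 0.0 for j in range(n_nodes)]
    return [sum(w * x for w, x in zip(row, e)) for row in W]

# Hypothetical 2-dimensional embeddings for a 3-node graph.
V = [[3.0, 4.0, 0.0], [0.0, 5.0, 12.0]]
g = [1.0, 2.0]
W = weight_normalise(V, g)
z_1 = structure_embedding(W, 1)
```

The lookup form makes clear why this encoder cannot embed unseen nodes: a new node has no column in the table, so its embedding must come from the attribute-oriented encoder.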

Note that the encoder $\mathcal{H}_s$ can be another node embedding method in Deal that focuses on learning graph structure with corresponding input, such as GCN with the adjacency matrix as the input. However, we find that using GCN in $\mathcal{H}_s$ decreases the link prediction performance, which is shown in Tables 2 and 3 in Section 5.

4.4 Alignment Mechanism and Model Training

We propose an alignment mechanism to align the embeddings generated by the two types of encoders to learn the connections between the node attributes and the graph structure. During the model training, the two encoders are updated so that they produce embeddings that are aligned in the vector space. We first introduce the loss functions we use, then describe the proposed alignment mechanism, and finally present the training algorithm for Deal.


Loss Function.

Learning graph embedding via ranking is based on the ranking-motivated loss, which has been proved to be effective in many studies [bojchevski2018deep, bhagavatula2018content]. In this paper, we propose a novel mini-batch learning method with a personalized ranking-motivated loss to learn the node embeddings with comprehensive representation performance.

Optimizing a ranking-motivated loss can help to capture the relationships between each pair of training samples. Contrastive loss [hadsell2006dimensionality] is one kind of ranking-motivated loss, and it was originally proposed to solve the dimensionality reduction problem. Given pairs of samples $(\mathbf{x}_{p_i}, \mathbf{x}_{q_i})$, the loss is as follows:

$$\mathcal{L}_c = \sum_i \Big( y_i\, d(\mathbf{x}_{p_i}, \mathbf{x}_{q_i}) + (1 - y_i)\,\big[\tau - d(\mathbf{x}_{p_i}, \mathbf{x}_{q_i})\big]_+ \Big) \qquad (3)$$

where $d(\cdot,\cdot)$ is a distance function, $y_i = 1$ if the samples $\mathbf{x}_{p_i}$ and $\mathbf{x}_{q_i}$ are deemed similar and $y_i = 0$ otherwise, $[\cdot]_+$ is the hinge function, and $\tau$ is the margin. The learning objective of Eq. 3 is to map similar input samples to nearby points in the output vector space and map dissimilar samples to distant points.
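The behaviour of a contrastive loss can be sketched in a few lines of Python. This is a simplified, unsquared variant with Euclidean distance, chosen here for readability; the original formulation of [hadsell2006dimensionality] squares both terms, and the function names and toy pairs are assumptions.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(pairs, labels, margin=1.0):
    """Sum of per-pair losses.

    Similar pairs (label 1) are penalised by their distance; dissimilar
    pairs (label 0) are penalised only while closer than the margin.
    """
    total = 0.0
    for (a, b), y in zip(pairs, labels):
        d = euclidean(a, b)
        total += d if y == 1 else max(0.0, margin - d)
    return total

pairs = [
    ([0.0, 0.0], [3.0, 4.0]),   # similar, distance 5
    ([0.0, 0.0], [0.5, 0.0]),   # dissimilar, inside the margin
    ([0.0, 0.0], [2.0, 0.0]),   # dissimilar, already far enough
]
loss = contrastive_loss(pairs, labels=[1, 0, 0], margin=1.0)
```

The third pair contributes nothing, which shows the role of the fixed margin: every dissimilar pair is pushed to the same minimum distance regardless of how far apart the nodes are in the graph.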

We have a similar learning objective in our problem: to map linked nodes (positive samples) in the graph to points that are close in the output vector space and to map unlinked nodes (negative samples) to points that are far away from each other. However, there is a problem when directly adapting contrastive loss for link prediction. The negative pair-wise samples have different distances in the graph, so using a fixed margin for all the negative samples in Eq. 3 is not appropriate, yet it is difficult to set proper margins for negative samples with different distances. Moreover, neither the existing work [bojchevski2018deep, bhagavatula2018content] nor Eq. 3 considers the regularization in the loss function, which is important and can further improve the prediction performance. Motivated by the above observations, we propose the following loss to be optimized for a given mini-batch of node pair samples $B = \{(v_{p_i}, v_{q_i})\}$ with $v_{p_i}, v_{q_i} \in V$ (obtained by sampling pairs of nodes from the graph):

$$\mathcal{L}(Z) = \frac{1}{|B|}\sum_{(v_{p_i}, v_{q_i}) \in B} \Big[ y_i\,\phi_1\big(s(\mathbf{z}_{p_i}, \mathbf{z}_{q_i})\big) + (1 - y_i)\,\alpha(p_i, q_i)\,\phi_2\big(-s(\mathbf{z}_{p_i}, \mathbf{z}_{q_i})\big) \Big] \qquad (4)$$

where $s(\cdot,\cdot)$ is the function to measure the similarity between node embeddings $\mathbf{z}_{p_i}$ and $\mathbf{z}_{q_i}$, $y_i$ is the link relation label with $y_i = 1$ if $v_{p_i}$ and $v_{q_i}$ are connected and $y_i = 0$ otherwise, $\alpha(\cdot,\cdot)$ is a weight function, and $\phi_1$ and $\phi_2$ are derived from the function $\phi$ with different hyper-parameters. Specifically, there are many choices of $s(\cdot,\cdot)$, such as dot product, cosine similarity, etc. A high score of $s(\mathbf{z}_{p_i}, \mathbf{z}_{q_i})$ indicates that the nodes are similar, and vice versa. We find that using the cosine similarity gives good results in our model.

We use $\phi_1$ and $\phi_2$ because the regularization is not considered in Eq. 3. Inspired by the observation that the logistic loss can be seen as a “soft” version of the hinge loss with an infinite margin [lecun2006tutorial] and is differentiable, we adopt the generalized logistic loss function as follows:

$$\phi(x) = \frac{1}{\gamma}\log\big(1 + e^{-\gamma x + b}\big) \qquad (5)$$

where $\gamma$ and $b$ are loss margin parameters that can tune regularization [masnadi2015view].

$\alpha(p, q)$ is a weight function to measure the importance of negative samples with different distances. Specifically, we define $\alpha(p, q)$ as follows:

$$\alpha(p, q) = \exp\!\left(\frac{\beta}{d_{sp}(p, q)}\right) \qquad (6)$$

where $\beta$ is a hyper-parameter, and $d_{sp}(p, q)$ denotes the shortest path distance between a node pair. If node $v_p$ cannot reach node $v_q$, $d_{sp}(p, q) = \infty$. This weight aims to help the model pay more attention to the close negative neighbours during the training process.
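The pieces of the ranking-motivated loss described above can be put together in a short Python sketch. The concrete forms used here are assumptions: a generalized logistic loss $(1/\gamma)\log(1 + e^{-\gamma x + b})$ and a negative-sample weight $\exp(\beta / d_{sp})$, with cosine similarity as the similarity function. All function names and the toy embeddings are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def phi(x, gamma=1.0, b=0.0):
    """Assumed generalized logistic loss: (1/gamma) * log(1 + exp(-gamma*x + b))."""
    return math.log1p(math.exp(-gamma * x + b)) / gamma

def alpha(d_sp, beta=1.0):
    """Assumed weight exp(beta / d_sp); it equals 1 for unreachable pairs."""
    return math.exp(beta / d_sp)

def batch_loss(batch, emb, pos=(1.0, 0.0), neg=(1.0, 0.0), beta=1.0):
    """Mean loss over a mini-batch of (p, q, label, shortest-path distance)."""
    total = 0.0
    for p, q, y, d_sp in batch:
        s = cosine(emb[p], emb[q])
        if y == 1:
            total += phi(s, *pos)                          # pull linked nodes together
        else:
            total += alpha(d_sp, beta) * phi(-s, *neg)     # push unlinked nodes apart
    return total / len(batch)

emb = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [-1.0, 0.0]}
# Labels that agree with the embeddings give a lower loss than labels that contradict them.
good = batch_loss([(0, 1, 1, 1), (0, 2, 0, 2)], emb)
bad = batch_loss([(0, 2, 1, 1), (0, 1, 0, 2)], emb)
```

Under these assumptions, nearby negatives (small shortest-path distance) receive a larger weight than distant or unreachable ones, which matches the stated aim of focusing on close negative neighbours.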

Alignment Mechanism.

By minimizing $\mathcal{L}(Z^s)$ and $\mathcal{L}(Z^a)$ (Eq. 4 applied to the structure-oriented and the attribute-oriented embeddings), we can learn the structure-oriented node embedding $\mathbf{z}^s$ and the attribute-oriented node embedding $\mathbf{z}^a$, respectively. However, if we learn them separately, the two types of embeddings are isolated and cannot represent the connections between attributes and graph structure well. We propose to align the two types of embeddings during the training process and learn the two encoders simultaneously. We design two alignment methods:

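1. Tight Alignment (T-align) aims to maximize the similarity between $\mathbf{z}_i^a$ and $\mathbf{z}_i^s$ for each node $v_i$. Mathematically, the objective of the tight alignment is to minimize

$$\mathcal{L}_{T\text{-}align} = -\frac{1}{|V|}\sum_{v_i \in V} s\big(\mathbf{z}_i^a, \mathbf{z}_i^s\big) \qquad (7)$$

However, the tight method is sometimes too strict when aligning the two types of embeddings.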

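2. Loose Alignment (L-align) aims to maximize the similarity between $\mathbf{z}_p^a$ and $\mathbf{z}_q^s$ of two linked nodes $v_p$ and $v_q$, and it adopts the loss function in Eq. 4. Mathematically, the objective of the loose alignment is to minimize

$$\mathcal{L}_{L\text{-}align} = \frac{1}{|B|}\sum_{(v_{p_i}, v_{q_i}) \in B} \Big[ y_i\,\phi_1\big(s(\mathbf{z}_{p_i}^a, \mathbf{z}_{q_i}^s)\big) + (1 - y_i)\,\alpha(p_i, q_i)\,\phi_2\big(-s(\mathbf{z}_{p_i}^a, \mathbf{z}_{q_i}^s)\big) \Big] \qquad (8)$$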
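Putting everything together, the final objective of our model is as below:

$$\mathcal{L}_{total} = \theta_1\,\mathcal{L}(Z^s) + \theta_2\,\mathcal{L}(Z^a) + \theta_3\,\mathcal{L}_{align} \qquad (9)$$

where $\mathcal{L}_{align}$ is the tight or the loose alignment loss and $\theta = (\theta_1, \theta_2, \theta_3)$ is a hyper-parameter vector to parameterize the weights of different losses.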
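The two alignment strategies and the weighted combination of losses can be sketched as follows. This is a hedged illustration, not the authors' code: the tight variant is assumed to be the negative mean cosine similarity between each node's two embeddings, the loose variant reuses a logistic ranking loss on cross-encoder pairs (with the distance weighting of negatives omitted for brevity), and all names and toy values are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def phi(x, gamma=1.0, b=0.0):
    """Assumed generalized logistic loss."""
    return math.log1p(math.exp(-gamma * x + b)) / gamma

def tight_alignment(z_a, z_s):
    """Assumed form: negative mean similarity between the two embeddings of each node."""
    nodes = list(z_a)
    return -sum(cosine(z_a[i], z_s[i]) for i in nodes) / len(nodes)

def loose_alignment(batch, z_a, z_s):
    """Ranking loss between the attribute embedding of p and the structure embedding of q.

    batch holds (p, q, label) triples; negative-sample weights are omitted here.
    """
    total = 0.0
    for p, q, y in batch:
        s = cosine(z_a[p], z_s[q])
        total += phi(s) if y == 1 else phi(-s)
    return total / len(batch)

def total_loss(loss_s, loss_a, loss_align, theta=(1.0, 1.0, 1.0)):
    """Weighted sum of the structure, attribute, and alignment losses."""
    return theta[0] * loss_s + theta[1] * loss_a + theta[2] * loss_align

z_a = {0: [1.0, 0.0], 1: [0.0, 2.0]}
z_s = {0: [1.0, 0.0], 1: [0.0, 2.0]}
tight = tight_alignment(z_a, z_s)
loose = loose_alignment([(0, 0, 1), (0, 1, 0)], z_a, z_s)
```

The tight variant constrains every node individually, whereas the loose variant only asks linked pairs to score higher than unlinked ones across the two embedding spaces, which leaves each encoder more freedom.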

Training algorithm and prediction.

Algorithm 1 summarizes the training process of the proposed model.

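Input: Graph $G$; a set of mini-batches $\mathcal{B}$; loss weight $\theta$ and other hyper-parameters
Output: Node embeddings $Z^a$ and $Z^s$

for each mini-batch $B \in \mathcal{B}$ do
    Compute the structure-oriented embeddings of the nodes in $B$ with $\mathcal{H}_s$
    Compute the attribute-oriented embeddings of the nodes in $B$ with $\mathcal{H}_a$
    Compute the loss in Eq. 9
    Update $\mathcal{H}_a$ and $\mathcal{H}_s$ with stochastic gradient
end for
Update $Z^a$ and $Z^s$ via $\mathcal{H}_a$ and $\mathcal{H}_s$
return $Z^a$ and $Z^s$
Algorithm 1 The learning process of Deal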

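To predict whether there is a link between two nodes $v_p$ and $v_q$, we can calculate a score with $\mathbf{z}^s$ and $\mathbf{z}^a$ as follows:

$$\mathrm{score}(v_p, v_q) = \lambda_1\, s(\mathbf{z}_p^s, \mathbf{z}_q^s) + \lambda_2\, s(\mathbf{z}_p^a, \mathbf{z}_q^a) + \lambda_3\, s(\mathbf{z}_p^s, \mathbf{z}_q^a) \qquad (10)$$

where $\lambda = (\lambda_1, \lambda_2, \lambda_3)$ is another hyper-parameter vector used to give each similarity score a different weight. In inductive link prediction, for a new node $v_q$, $\mathbf{z}_q^a$ is computed by $\mathcal{H}_a$, and $\lambda_1 = 0$. Our model can also perform transductive link prediction by setting $\lambda_1$ to a non-zero value.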
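The scoring step can be sketched in Python as a weighted sum of similarity scores. The three-term form, the dictionary layout of the embeddings, and the weight values below are assumptions for illustration; the key point is that a new node contributes only an attribute-oriented embedding, so the structure-structure term is switched off in the inductive setting.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def link_score(p, q, lam):
    """Weighted sum of similarity scores between nodes p and q.

    p and q are dicts with key "a" (attribute-oriented embedding) and,
    for nodes seen in training, key "s" (structure-oriented embedding).
    lam = (structure-structure, attribute-attribute, structure-attribute) weights.
    """
    l1, l2, l3 = lam
    score = l2 * cosine(p["a"], q["a"])
    if l1 != 0.0:
        score += l1 * cosine(p["s"], q["s"])
    if l3 != 0.0:
        score += l3 * cosine(p["s"], q["a"])
    return score

seen = {"s": [1.0, 0.0], "a": [0.0, 1.0]}
new_node = {"a": [0.0, 1.0]}              # attributes only, no structure embedding
inductive = link_score(seen, new_node, lam=(0.0, 1.0, 1.0))
transductive = link_score(seen, seen, lam=(1.0, 1.0, 1.0))
```

Setting the first weight to zero is what allows the same function to serve both settings without requiring a structure embedding for the new node.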

5 Experiments

5.1 Datasets

Datasets Nodes Edges Attributes
CS ([shchur2018pitfalls]) 18,333 81,894 6,805
PPI ([Zitnik2017]) 1,767 16,159 50
Cora ([mccallum2000automating]) 2,708 5,278 1,433
CiteSeer ([sen2008collective]) 3,327 4,552 3,703
PubMed ([namata2012query]) 19,717 44,324 500
Computers ([mcauley2015image]) 13,752 245,861 767
Photo ([mcauley2015image]) 7,650 119,081 745
Table 1: Statistics of experimental datasets.

For link prediction tasks, we evaluate our proposed model and baselines on four types of real-world datasets, i.e., the co-authorship graph (CS), the protein-protein interactions graph (PPI), co-purchase graphs (Computers and Photo), and citation network datasets (Cora, CiteSeer and PubMed). Details of these datasets are summarised in Table 1.

Method      Cora           CiteSeer       CS             PubMed         Computers      Photo
            AUC    AP      AUC    AP      AUC    AP      AUC    AP      AUC    AP      AUC    AP
MLP         0.826  0.674   0.897  0.789   0.921  0.810   0.842  0.705   0.866  0.692   0.901  0.753
Cite.       0.839  0.712   0.914  0.824   0.939  0.862   0.912  0.809   0.898  0.762   0.926  0.808
G2G         0.845  0.739   0.922  0.842   0.948  0.889   0.910  0.798   0.853  0.684   0.862  0.704
GCN-Deal    0.855  0.766   0.912  0.862   0.969  0.943   0.961  0.924   0.943  0.888   0.959  0.907
Deal        0.864  0.804   0.937  0.907   0.977  0.959   0.966  0.931   0.953  0.899   0.965  0.922
Table 2: The results of inductive link prediction.

5.2 Baseline Methods

We compare our model Deal with MLP and several state-of-the-art graph embedding methods, including SEAL [zhang2018link], G2G [bojchevski2018deep] and GAE [kipf2016variational]. In addition, the original GAE takes GCN as the encoder. We also consider other GAE variants, which replace the GCN encoder with GIN [xu2018how], GAT [velickovic2018graph] and SAGE [hamilton2017inductive] respectively. The GAE variants are denoted as their encoder model names.

Moreover, Deal variants can use different graph embedding models as structure-oriented or attribute-oriented encoders. Deal denotes the proposed encoders presented in Section 4. A Deal variant is denoted as $X$-$Y$, using the model $X$ as the structure-oriented encoder and the model $Y$ as the attribute-oriented encoder. We select three representative variants. All the Deal variants aim to minimize Eq. 9. To ensure fairness, we set all models to a similar number of parameters and train them for the same number of epochs.

5.3 Experimental Setup

We evaluate the proposed model Deal and baseline models under both inductive and transductive learning settings.

Inductive link prediction.

For the inductive case, the nodes in the test set are unseen during the training process. Similar to the dataset split setting of [bojchevski2018deep], we randomly hide 10% nodes and use the edges between them for the test set. The remaining nodes and edges are used for training and validation.
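The inductive split described above can be sketched as follows. The function name `inductive_split`, the fixed seed, and the ring-shaped toy graph are assumptions; in particular, this sketch drops edges that connect a hidden node to a visible node, which is one reading of the description and may differ from the authors' exact procedure.

```python
import random

def inductive_split(nodes, edges, hide_frac=0.1, seed=0):
    """Hide a fraction of nodes; edges among hidden nodes form the test set."""
    rng = random.Random(seed)
    nodes = list(nodes)
    n_hidden = max(1, int(len(nodes) * hide_frac))
    hidden = set(rng.sample(nodes, n_hidden))
    # Test edges: both endpoints are unseen during training.
    test_edges = [(u, v) for u, v in edges if u in hidden and v in hidden]
    # Training/validation edges: neither endpoint is hidden.
    train_edges = [(u, v) for u, v in edges if u not in hidden and v not in hidden]
    return hidden, train_edges, test_edges

# Hypothetical toy graph: a ring of 20 nodes.
nodes = range(20)
edges = [(i, (i + 1) % 20) for i in range(20)]
hidden, train_edges, test_edges = inductive_split(nodes, edges)
```

Keeping every edge that touches a hidden node out of the training set ensures that the model never observes any structure information about the test nodes.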

Transductive link prediction.

For the transductive case, all the nodes on the graph can be seen during the training. Similar to the dataset split setting of [you2019position], we randomly sample 10%/10% edges and an equal number of non-edges as validation/test set. The remaining non-edges and 80% edges are used as the training set.

The test set performance will be reported when the model achieves the best performance on the validation set. For the experimental results, we report the mean area under the ROC curve (AUC) and the average precision (AP) scores over ten trials with different random seeds and train/validation splits. In all the experiments, the default embedding size is 64. For each training mini-batch, the linked node pairs account for 40%. We tune the hyper-parameters of baseline models and our proposed Deal with the grid search algorithm on the validation set.
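For reference, the two reported metrics can be computed from labels and predicted scores as in the following pure-Python sketch. The function names and the toy data are illustrative assumptions; an established library implementation would normally be used in practice.

```python
def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as one half. Assumes at least one positive and one negative.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Hypothetical predictions for two edges (label 1) and two non-edges (label 0).
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
```

In this toy case one non-edge is ranked above a true edge, so three of the four positive-negative pairs are ordered correctly and the AUC is 0.75.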

5.4 Results of Inductive Link Prediction

As GCN-based models cannot aggregate neighbours’ information in the inductive link prediction scenario, we compare our proposed model with MLP and G2G, and the experimental results are shown in Table 2. Deal significantly outperforms MLP and G2G across all datasets. On the Computers dataset, for instance, Deal improves the AUC and AP scores over the best baseline by 6.12% and 17.98%, respectively (from 0.898 to 0.953 and from 0.762 to 0.899). Comparing GCN-Deal with Deal shows that using a GCN layer as $\mathcal{H}_s$ does not improve the link prediction performance. Also, G2G performs worse when the graphs contain a small number of feature dimensions (compared with the number of nodes), such as Computers and Photo. The reason is that the node embedding encoder of G2G takes solely the feature matrix as the input. As an extreme example, when the features of all the nodes are similar, it is difficult for G2G to distinguish different nodes.

Method      Cora             CiteSeer         CS               PubMed           PPI
            AUC     AP       AUC     AP       AUC     AP       AUC     AP       AUC     AP
GAT         0.8684  0.8866   0.8423  0.8662   0.9465  0.9473   0.9193  0.9202   0.8092  0.8136
GCN         0.8670  0.8755   0.8466  0.8620   0.9452  0.9421   0.9287  0.9272   0.8384  0.8364
GIN         0.8666  0.8762   0.8405  0.8617   0.9432  0.9407   0.9262  0.9254   0.8086  0.8086
SAGE        0.8739  0.8881   0.8498  0.8721   0.9485  0.9504   0.9254  0.9270   0.8112  0.8131
Cite.       0.9145  0.9143   0.9385  0.9417   0.9501  0.9517   0.9435  0.9378   0.6047  0.5981
SEAL        0.8269  0.7959   0.8064  0.7769   0.9146  0.8856   0.9235  0.9239   0.8825  0.8749
P-GNN       0.8225  0.8427   0.8065  0.8436   0.8779  0.8811   0.8145  0.8647   0.7303  0.6716
G2G         0.9282  0.9336   0.9413  0.9421   0.9636  0.9640   0.9432  0.9364   0.5896  0.5333
GCN-Deal    0.9163  0.9047   0.9221  0.9219   0.9796  0.9801   0.9456  0.9498   0.8673  0.8709
Deal-GCN    0.9002  0.9075   0.8496  0.8747   0.9646  0.9663   0.9299  0.9348   0.8861  0.8868
Deal-GAT    0.8985  0.9091   0.8463  0.8752   0.9640  0.9666   0.9311  0.9355   0.8711  0.8762
Deal        0.9455  0.9501   0.9519  0.9591   0.9827  0.9841   0.9593  0.9611   0.8894  0.8973
Table 3: The results of transductive link prediction.

5.5 Results of Transductive Link Prediction

The experimental results of transductive link prediction are summarized in Table 3. The results show that our proposed model Deal achieves the best performance. For the baselines, G2G performs well on the citation networks and the co-authorship graph, which have informative node attributes. It is worth noting that the number of node attributes in PPI is at most 10% of that in the other datasets (50 versus 500 to 6,805; see Table 1). In the PPI graph, where each node contains limited attribute information, G2G has the worst performance, while SEAL achieves outstanding performance.

Interestingly, the remaining GAE baseline models achieve comparable performance on all the datasets, although they have different methods of aggregating neighbours’ information. Moreover, the attention mechanism of GAT does not outperform other GNN layers in this scenario. The reason may be that the GAE variants are insensitive to the different information aggregation methods on the link prediction problem. Compared to the baselines, Deal is more robust and shows stronger generalization ability on different types of datasets. In addition, the Deal framework enables GAEs to achieve better performance.

5.6 Comparison of Alignment Methods

To compare the alignment methods in Section 4.4, we conduct both inductive and transductive link prediction experiments on three representative datasets, i.e., Cora, CS, and PubMed. The experimental results (Table 4) show that, on these three datasets, both alignment methods are effective, and the loose alignment method slightly outperforms the tight alignment method, especially for the inductive link prediction task. The reason is that the loose alignment method places fewer restrictions on the node embedding alignment. The loose method also provides the flexibility that the node embeddings $\mathbf{z}^a$ and $\mathbf{z}^s$ need, through adjustment of the hyper-parameters.

Method       Cora           CS             PubMed
             AUC    AP      AUC    AP      AUC    AP
T-align-I    0.845  0.774   0.972  0.951   0.951  0.905
L-align-I    0.865  0.803   0.976  0.956   0.966  0.931
T-align-T    0.939  0.942   0.976  0.978   0.955  0.958
L-align-T    0.946  0.950   0.983  0.984   0.954  0.961
Table 4: Comparison of different alignment methods (T-align: tight alignment; L-align: loose alignment). We consider both inductive and transductive link prediction tasks, which are denoted as -I and -T respectively.

Cora           CS             PPI
AUC    AP      AUC    AP      AUC    AP
0.932  0.933   0.952  0.950   0.813  0.820
0.934  0.941   0.962  0.960   0.848  0.851
0.939  0.943   0.973  0.978   0.869  0.875
0.946  0.950   0.983  0.984   0.889  0.897
Table 5: The results of transductive link prediction with different values of the loss margin parameter (Eq. 5) and the weight hyper-parameter (Eq. 6). The optimal hyper-parameter values are those found by the grid search algorithm on each dataset.

5.7 Parameter Analysis

We here conduct experiments to analyse two key parameters in Deal: a loss margin parameter of Eq. 5 and the weight hyper-parameter of Eq. 6. Table 5 indicates that the performance can be improved by tuning both of them, and that the former plays a more important role than the latter. The reason is that varying the margin parameter regularizes the loss (Eq. 4). It is interesting to note that different values of the margin parameter can also change the similarity of node pairs in the embedding space, as shown in Figure 3. Figure 3 also indicates that the same node pair tends to have a higher similarity score in the structure-oriented embedding space than in the attribute-oriented embedding space. The reason is that the node embeddings in $Z^s$ are separate, while there are correlations between the ones in $Z^a$, especially for nodes that have certain common attributes.

Figure 3: Comparison of similarity scores with different values of the loss margin parameter. Given a set of $k$-hop node pairs $P_k$ and the node embeddings $Z^s$ and $Z^a$, the plotted score is the average cosine similarity of the pairs in $P_k$. Other hyper-parameters in Deal are fixed. The comparison is for the transductive link prediction task on CiteSeer.

6 Conclusions

In this work, we propose a novel model Deal to address the inductive link prediction problem on attributed graphs, where the local structure of the new node is unknown. Different from the typical GCNs that aggregate information from neighbours, Deal learns comprehensive node representations via two encoders and an alignment mechanism. We have experimentally shown that our proposed model Deal consistently outperforms state-of-the-art methods. In the future, we will develop more efficient training algorithms, so that our model can process large-scale datasets.


Xin Cao is supported by ARC DE190100663. Xike Xie is supported by NSFC (No. 61772492), Jiangsu NSF (No. BK20171240) and the CAS Pioneer Hundred Talents Program. Sibo Wang is supported by Hong Kong RGC ECS Grant (No. 24203419), CUHK Direct Grant (No. 4055114), and NSFC (No. U1936205).