1. Introduction
Graphs are a powerful tool for representing diverse types of data, including social networks and chemical networks. Learning with graphs has been an active research topic recently, and various representation learning methods on graphs (Perozzi et al., 2014; Tang et al., 2015; Cao et al., 2015; Tu et al., 2016; Yang et al., 2016; Grover and Leskovec, 2016; Kipf and Welling, 2017; Duran and Niepert, 2017; Veličković et al., 2018; Rossi et al., 2018; Hamilton et al., 2017; Xu et al., 2019; Ribeiro et al., 2017; Xu et al., 2018; Qu et al., 2019; Ma et al., 2019; Liu et al., 2019; Chami et al., 2019; Wu et al., 2019a; Cui et al., 2020) have been proposed. Given a graph, representation learning on the graph aims to learn a node embedding function that maps each node in the observational space to a latent space by capturing the graph structural information. The learned node embedding function can be used for many graph-related tasks. For instance, node classification and link prediction are two basic tasks. Node classification aims to predict the labels of unlabeled nodes in a graph based on a set of labeled nodes, while link prediction aims to predict the link status between a pair of nodes, based on a set of observed positive/negative links in a graph (positive means that a link exists between a pair of nodes, and negative means that no link exists between them).
Existing representation learning methods on graphs have been shown to achieve state-of-the-art performance on these tasks, e.g., node classification (Kipf and Welling, 2017; Veličković et al., 2018) and link prediction (Zhang and Chen, 2018). However, because the learnt node representations are not task-specific, we note that existing methods could unintentionally leak important information. For instance, we observe that one can accurately infer the links in a graph from a node classifier trained on the learnt node representations, and one can also predict the node labels from a link predictor based on the node representations (see Table 2(a) and Table 2(c) in Section 5.2). Such information leakage could raise serious privacy issues. Take users in a social network (e.g., Twitter) as an example. Some users (e.g., celebrities) in the social network may just want to make their identities known to the public, but they do not want to expose their private social relationships (e.g., family relationships). Some other users (e.g., malicious users) do not want to reveal their identities, but do want to expose their social relationships with normal users to make themselves also look normal. Suppose the social network has deployed a user identity classification system (i.e., node classification) or a friendship recommendation system (i.e., link prediction) using a certain graph representation learning method. Then, an adversary (e.g., an insider) who knows the method could infer the users’ private friendship links or identities.
Our work: In this paper, we aim to address the above privacy violation issue and propose a privacy-preserving representation learning framework on graphs from the mutual information perspective. Specifically, our framework includes a primary learning task and a privacy protection task. We consider node classification and link prediction as the two tasks of interest (note that our framework can be generalized to any graph-related tasks). Under this context, our framework includes three modules: node embedding function (for node representation learning), link predictor (uses the node representations to perform link prediction), and node classifier (uses node representations to perform node classification).
Then, we target the following two problems:

Problem 1: Link prediction + node privacy protection. The primary learning task is learning node representations such that the link predictor can achieve high link prediction performance, and the privacy protection task is to enforce that the learnt node representations cannot be used by the node classifier to accurately infer the node label.

Problem 2: Node classification + link privacy protection. The primary learning task is learning node representations such that the node classifier can achieve high node classification performance, and the privacy protection task is to enforce that the learnt node representations cannot be used by the link predictor to accurately infer the link status.
We formally formulate our problems using two mutual information objectives, which are defined on the primary learning task and the privacy protection task, respectively. Then, for Problem 1, the goal of the two objectives is to learn an embedding function such that: 1) The representations of a pair of nodes retain as much information as possible about the respective link status. Intuitively, when the pair of node representations keeps the most information about the link status, the link predictor trained on the node representations could have the highest link prediction performance. 2) The node representation contains as little information as possible about the node label. Intuitively, when the learnt node representation preserves the least information on the node label, the node classifier trained on the node representations could have the lowest node classification performance. Similarly, for Problem 2, the goal is to learn an embedding function such that: 1) The node representation contains as much information as possible to facilitate predicting the node label. 2) The representations of node pairs retain as little information as possible to prevent inferring the link status.
However, the mutual information terms are challenging to calculate in practice, as they require computing an intractable posterior distribution. Motivated by mutual information neural estimators (Belghazi et al., 2018; Chen et al., 2016; Cheng et al., 2020), we convert the intractable mutual information terms into tractable ones by introducing variational (upper and lower) bounds. Specifically, each variational bound involves a variational posterior distribution, which is parameterized via a neural network. Estimating the true mutual information thus reduces to training the parameterized neural networks. Furthermore, we propose an alternating training algorithm to train these neural networks.
We finally evaluate our framework on multiple benchmark graph datasets. Experimental results demonstrate that, without privacy protection, the node representations learnt by existing methods for the primary learning task can also be used to obtain high performance on the privacy protection task. However, with our proposed privacy-protection mechanism, the learnt node representations can only be used to achieve high performance for the primary learning task, while the performance for the privacy protection task is close to random guessing. Our key contributions can be summarized as follows:

We present the first work to study privacy-preserving representation learning on graphs.

We formally formulate our problems via mutual information objectives and design tractable algorithms to estimate intractable mutual information.

We evaluate our framework on various graph datasets, and the results demonstrate the effectiveness of our framework for privacy-preserving representation learning on graphs.
2. Related Work
2.1. Representation Learning on Graphs
Various representation learning methods on graphs have been proposed (Perozzi et al., 2014; Tang et al., 2015; Cao et al., 2015; Tu et al., 2016; Yang et al., 2016; Grover and Leskovec, 2016; Kipf and Welling, 2017; Duran and Niepert, 2017; Veličković et al., 2018; Rossi et al., 2018; Hamilton et al., 2017; Xu et al., 2019; Ribeiro et al., 2017; Xu et al., 2018; Qu et al., 2019; Ma et al., 2019; Wu et al., 2019b; Liu et al., 2019; Chami et al., 2019; Wu et al., 2019a; Cui et al., 2020) in the past several years. Graph representation learning methods based on graph neural networks have exhibited stronger performance than random walk and factorization-based methods (Perozzi et al., 2014; Grover and Leskovec, 2016; Tang et al., 2015; Cao et al., 2015; Qiu et al., 2018). For instance, Graph Convolutional Network (GCN) (Kipf and Welling, 2017) is motivated by spectral graph convolutions (Duvenaud et al., 2015) and learns node representations, based on the graph convolutional operator, for node classification. HGCN (Chami et al., 2019) leverages both the expressiveness of GCN and hyperbolic geometry to learn node representations. Specifically, HGCN designs GCN operations in the hyperbolic space and maps Euclidean node features to embeddings in hyperbolic spaces with trainable curvatures at each layer. The learnt node representations allow HGCN to achieve both higher node classification performance and higher link prediction performance than Euclidean space-based GCNs.
A few recent works (Velickovic et al., 2019; Sun et al., 2020; Peng et al., 2020) propose to leverage mutual information to perform unsupervised graph representation learning. For instance, Peng et al. (Peng et al., 2020) propose a concept called Graphical Mutual Information (GMI), which measures the correlation between the entire graph and high-level hidden representations, and is invariant to the isomorphic transformation of input graphs. By virtue of GMI, the authors design an unsupervised model trained by maximizing GMI between the input and output of a graph neural encoder. The learnt node representations of GMI are used for node classification and link prediction, and GMI achieves better performance than other unsupervised graph representation learning methods.
Note that although our framework also adopts mutual information, its goal is completely different from that of mutual information-based graph representation learning methods. Our goal is to learn privacy-preserving node representations that consider both a primary learning task and a privacy protection task, while these existing methods mainly focus on learning node representations that achieve high performance for a primary learning task.
2.2. Mutual Information Estimation
Estimating mutual information accurately between high-dimensional continuous random variables is challenging (Belghazi et al., 2018). To obtain differentiable and scalable mutual information estimation, recent methods (Alemi et al., 2017; Belghazi et al., 2018; Oord et al., 2018; Poole et al., 2019; Hjelm et al., 2019; Cheng et al., 2020) propose to first derive mutual information (upper or lower) bounds by introducing auxiliary variational distributions and then train parameterized neural networks to estimate the variational distributions and approximate the true mutual information. For instance, MINE (Belghazi et al., 2018) treats mutual information as the KL divergence between the joint and marginal distributions, converts it into the dual representation, and obtains a lower mutual information bound. Cheng et al. (Cheng et al., 2020) propose a Contrastive Log-ratio Upper Bound (CLUB) of mutual information. CLUB bridges mutual information estimation with contrastive learning (Oord et al., 2018), and mutual information is estimated by the difference of conditional probabilities between positive and negative sample pairs.
2.3. Other Privacy-Preserving Techniques
Differential privacy (DP) and homomorphic encryption (HE) are two other types of methods that ensure privacy protection. However, DP incurs utility loss and HE incurs intolerable computation overheads. Mutual information is a recent methodology that protects privacy based on information theory (Li et al., 2020). Compared to DP and HE, the mutual information-based method has been demonstrated to be more efficient and/or effective. Motivated by these advantages, we adopt mutual information to study privacy-preserving graph representation learning.
3. Background & Problem Definition
3.1. Representation Learning on Graphs
Let $G = (V, E, X)$ be an attributed graph, where $v \in V$ is a node and $n = |V|$ is the total number of nodes; $e_{uv} \in E$ is a link between $u$ and $v$; $A \in \{0, 1\}^{n \times n}$ is the adjacency matrix, where $A_{uv} = 1$ if $e_{uv} \in E$ and $A_{uv} = 0$ otherwise; and $X$ is the node feature matrix, with $x_v$ the feature vector of node $v$. The purpose of representation learning on graphs is to learn a node embedding function $f_\theta$, parameterized by $\theta$, that maps each node $v$'s feature vector $x_v$ in the observational space to a feature vector $z_v$ in a latent space by capturing the graph structural information, i.e., $z_v = f_\theta(x_v)$, where we call $z_v$ the node representation. The learnt node representations can be used for various graph-related tasks. In this paper, we mainly focus on two tasks of interest: node classification and link prediction.
Node classification. Each node $v$ in the graph is associated with a label $y_v$ from a label set $\mathcal{Y}$. Then, given a set of labeled nodes $V_L \subset V$ with the node representations $\{z_v\}_{v \in V_L}$ as the training nodes, node classification takes the training nodes and their learnt representations as input and learns a node classifier $g_\phi$, parameterized by $\phi$, that has a minimal loss on the training nodes. Suppose we use the cross-entropy loss. Then, the objective function of node classification is defined as follows:
$$\min_{\phi} \; -\sum_{v \in V_L} \mathbf{1}_{y_v}^{\top} \log g_\phi(z_v),$$
where $\mathbf{1}_{y_v}$ is an indicator vector whose $y_v$-th entry is 1, and 0 otherwise. With the learnt $\phi$, we can predict the label for each unlabeled node $v$ as $\hat{y}_v = \arg\max_{i} \, g_\phi(z_v)_i$.
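As a minimal sketch of the cross-entropy objective and the prediction rule described above, the following pure-Python snippet assumes the node classifier outputs one raw score (logit) per class for each node and applies a softmax to obtain class probabilities. The function names `softmax`, `cross_entropy_loss`, and `predict_label` are hypothetical and chosen for illustration only.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (shifted for stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_loss(all_logits, labels):
    """Sum over training nodes of -log p(true label of the node)."""
    loss = 0.0
    for logits, y in zip(all_logits, labels):
        probs = softmax(logits)
        # The indicator vector picks out the probability of the true class.
        loss -= math.log(probs[y])
    return loss


def predict_label(logits):
    """Predict the class with the largest probability."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)
```

With two classes and uninformative scores, `cross_entropy_loss([[0.0, 0.0]], [0])` equals `math.log(2)`, which is the loss of random guessing on one node.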
Link prediction. Suppose we are given a set of positive links $E_p$ (i.e., $A_{uv} = 1$) and a set of negative links $E_n$ (i.e., $A_{uv} = 0$) as the training links. Link prediction takes the training links and the associated nodes’ representations as input and learns a link predictor $h_\psi$, parameterized by $\psi$, that has a minimal reconstruction error on the training links.
With the learnt $\psi$, we predict a link between an unlabeled pair of nodes $u$ and $v$ if the link predictor's output $h_\psi(z_u, z_v)$ exceeds the decision threshold, and predict no link otherwise.
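The following sketch illustrates one possible instantiation of the link prediction task. It rests on assumptions that are not stated in the text: the reconstruction error is taken to be the binary cross-entropy over positive and negative links, the link predictor is a sigmoid of the inner product of the two node representations, and the decision threshold is 0.5. All function names are hypothetical.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def link_score(z_u, z_v):
    """Assumed link predictor: sigmoid of the inner product of z_u and z_v."""
    return sigmoid(sum(a * b for a, b in zip(z_u, z_v)))


def link_loss(z, pos_links, neg_links):
    """Assumed reconstruction error: binary cross-entropy over training links."""
    loss = 0.0
    for u, v in pos_links:
        loss -= math.log(link_score(z[u], z[v]))
    for u, v in neg_links:
        loss -= math.log(1.0 - link_score(z[u], z[v]))
    return loss


def predict_link(z_u, z_v, threshold=0.5):
    """Predict a link when the score exceeds the (assumed) threshold."""
    return link_score(z_u, z_v) > threshold
```

For example, with representations `z = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [-1.0, 0.0]}`, the pair (0, 1) is predicted as linked and the pair (0, 2) as not linked.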
3.2. Problem Definition
Node classification and link prediction are two graph-related tasks. However, in existing graph representation learning methods, given a primary learning task, one can also obtain promising performance for the other task with the learnt node representations. That is, one can accurately infer the link status between nodes (or infer the node label) even if the primary learning task is node classification (or link prediction) (see Table 2(a) and Table 2(c) in Section 5.2). Such a phenomenon could induce privacy concerns in practical applications. For instance, a celebrity on Twitter just wants to share his identity, but does not want to reveal his private family relationships. A malicious user on Twitter does not want to expose his identity, but does want to make his social relationships with normal users known to the public, in order to make himself also look normal.
We highlight that the root cause of the above consequences is that, when learning node representations for a task, existing methods do not consider protecting privacy for other tasks. To address the issue, we are motivated to propose privacy-preserving representation learning methods on graphs. We mainly consider the node classification and link prediction tasks, where one is the primary learning task and the other is the privacy protection task. Therefore, our problem involves three modules: node embedding function (for node representation learning), link predictor (uses the node representations to perform link prediction), and node classifier (uses node representations to perform node classification). In particular, we study the following two problems, each involving a primary learning task and a privacy protection task.

Problem 1: Link prediction + node privacy protection. In this problem, our primary learning task is learning node representations such that the link predictor can achieve high link prediction performance, and our privacy protection task is to enforce that the learnt node representations cannot be used by the node classifier to accurately infer the node label.

Problem 2: Node classification + link privacy protection. In this problem, our primary learning task is learning node representations such that the node classifier can achieve high node classification performance, and our privacy protection task is to enforce that the learnt node representations cannot be used by the link predictor to accurately infer the link status.
In the next section, we will formally formulate our two problems and design algorithms to solve the problems.
4. Privacy-Preserving Representation Learning on Graphs
We formulate our problems via mutual information. Specifically, we define two mutual information objectives that are associated with the primary learning task and the privacy protection task, respectively. However, the mutual information terms are challenging to calculate in practice. We therefore convert them into tractable ones by designing variational bounds, where each bound can be estimated by a parameterized neural network. Finally, we propose algorithms to train these neural networks to achieve high performance for the primary learning task, and performance close to random guessing for the privacy protection task. Figure 1 overviews our privacy-preserving graph representation learning framework.
4.1. Problem 1: Link Prediction with Node Privacy Protection
4.1.1. Formulating Problem 1 with mutual information objectives
Suppose we have a set of samples $\{(x_u, y_u, A_{uv})\}$, consisting of node features $x_u$, node label $y_u$, and link status $A_{uv}$. We define the probability distribution associated with a node (i.e., node features $x_u$ and node label $y_u$) as $p(x_u, y_u)$. Moreover, we define the probability distribution associated with a link (i.e., a pair of node features $x_u$ and $x_v$, and the associated link status $A_{uv}$) as $p(x_u, x_v, A_{uv})$. (For notational simplicity, we slightly abuse the notations $A_{uv}$, $x_u$, and $y_u$: they are originally used for the link status between nodes $u$ and $v$, $u$'s raw feature vector, and $u$'s label in the graph, and here we also use them to indicate the corresponding random variable/vector.)
Our goal is to learn an embedding function $f_\theta$ to transform $x_u$ to the representation $z_u$ such that: 1) The node representations of a pair (e.g., $z_u$ and $z_v$) retain as much information as possible on the link status (e.g., $A_{uv}$). Intuitively, when the representations of the node pair keep the most information about the link, the link predictor trained on the node representations could have the highest link prediction performance. 2) The node representation (e.g., $z_u$) contains as little information as possible about the node label (e.g., $y_u$). Intuitively, when the node representation preserves the least information on the node label, the node classifier trained on the node representation could have the lowest node classification performance. Formally, to achieve 1) and 2), we have the following two respective mutual information objectives:
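$$\text{Link prediction:} \quad \max_{\theta} \; I(z_u, z_v; A_{uv}) \tag{1}$$
$$\text{Node privacy protection:} \quad \min_{\theta} \; I(z_u; y_u) \tag{2}$$
where $z_u = f_\theta(x_u)$ is the random vector after applying the embedding function $f_\theta$ on $x_u$. $I(z_u, z_v; A_{uv})$ is the mutual information between $A_{uv}$ and the joint ($z_u$, $z_v$), which indicates the information ($z_u$, $z_v$) kept for the link variable $A_{uv}$. We maximize such mutual information to enhance the link prediction performance. $I(z_u; y_u)$ is the mutual information between $z_u$ and $y_u$ and indicates the information $z_u$ preserves for the label variable $y_u$. We minimize such mutual information to protect node privacy. Ideally, if $I(z_u; y_u) = 0$, no node’s label can be inferred from its node representation, i.e., no node classifier can perform better than random guessing.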
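To make the two extremes of the privacy objective concrete, the sketch below computes the mutual information between two discrete variables from a joint probability table. The discretization is an assumption made purely for illustration (the node representations in the framework are continuous), and the function name `mutual_information` is hypothetical.

```python
import math


def mutual_information(joint):
    """Mutual information I(Z; Y) in nats, with joint[i][j] = p(Z=i, Y=j)."""
    pz = [sum(row) for row in joint]          # marginal of Z
    py = [sum(col) for col in zip(*joint)]    # marginal of Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log(p / (pz[i] * py[j]))
    return mi


# Representation independent of the label: nothing can be inferred.
independent = [[0.25, 0.25], [0.25, 0.25]]
# Representation that determines the label: the label leaks completely.
leaky = [[0.5, 0.0], [0.0, 0.5]]
```

Here `mutual_information(independent)` is 0, the ideal case for the privacy protection task, while `mutual_information(leaky)` equals `math.log(2)`, the full entropy of a balanced binary label.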
4.1.2. Estimating mutual information via tractable variational lower bound and upper bound
In practice, the mutual information terms in Equations 1 and 2 are hard to compute, as the random variables are potentially high-dimensional and the mutual information terms require knowing posterior distributions that are challenging to calculate. To address the challenge, we are inspired by existing mutual information neural estimation methods (Alemi et al., 2017; Belghazi et al., 2018; Oord et al., 2018; Poole et al., 2019; Hjelm et al., 2019; Cheng et al., 2020), which convert the intractable mutual information calculation to a tractable one by designing variational bounds. Specifically, we first obtain a mutual information variational lower bound for Equation 1 and a mutual information variational upper bound for Equation 2 by introducing two auxiliary posterior distributions, respectively. Then, we parameterize each auxiliary distribution with a neural network, and approximate the true posteriors by maximizing the variational lower bound and minimizing the variational upper bound through training the involved neural network.
Maximizing the mutual information in Equation 1. To solve Equation 1, we derive the following variational lower bound:
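$$\begin{aligned} I(z_u, z_v; A_{uv}) &= H(A_{uv}) - H(A_{uv} \mid z_u, z_v) \\ &= H(A_{uv}) + \mathbb{E}_{p(z_u, z_v, A_{uv})}[\log q_\psi(A_{uv} \mid z_u, z_v)] + \mathbb{E}_{p(z_u, z_v)}[KL(p(\cdot \mid z_u, z_v) \,\|\, q_\psi(\cdot \mid z_u, z_v))] \\ &\geq H(A_{uv}) + \mathbb{E}_{p(z_u, z_v, A_{uv})}[\log q_\psi(A_{uv} \mid z_u, z_v)] =: I_{vLB}(z_u, z_v; A_{uv}), \end{aligned} \tag{3}$$
where $KL(p \,\|\, q)$ is the Kullback-Leibler divergence between two distributions $p$ and $q$ and is nonnegative. $q_\psi(A_{uv} \mid z_u, z_v)$ is an (arbitrary) auxiliary posterior distribution. $I_{vLB}(z_u, z_v; A_{uv})$ is the variational lower bound of the true mutual information, and $H(A_{uv})$ is a constant. Note that the lower bound is tight when the auxiliary distribution $q_\psi(A_{uv} \mid z_u, z_v)$ becomes the true posterior distribution $p(A_{uv} \mid z_u, z_v)$.
Our target now is to maximize the lower bound by estimating the auxiliary posterior distribution via a parameterized neural network. Specifically, we have
$$\max_{\theta} \; I(z_u, z_v; A_{uv}) \;\Leftrightarrow\; \max_{\theta} \max_{\psi} \; \mathbb{E}_{p(x_u, x_v, A_{uv})}[\log q_\psi(A_{uv} \mid f_\theta(x_u), f_\theta(x_v))]. \tag{4}$$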
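The properties of the variational lower bound described above (it never exceeds the true mutual information, and it is tight when the auxiliary posterior equals the true posterior) can be checked numerically. The sketch below does so on a small discrete joint distribution, which is an assumption made for illustration; the bound is written in the standard form "entropy of the target plus expected log-likelihood under the auxiliary posterior". All function names are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(v * math.log(v) for v in p if v > 0)


def mutual_information(joint):
    """I(Z; Y) in nats, with joint[i][j] = p(Z=i, Y=j)."""
    pz = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(p * math.log(p / (pz[i] * py[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)


def variational_lower_bound(joint, q):
    """H(Y) + E_{p(z,y)}[log q(y|z)], with q[i][j] = q(Y=j | Z=i) > 0."""
    py = [sum(col) for col in zip(*joint)]
    expected_log_q = sum(p * math.log(q[i][j])
                         for i, row in enumerate(joint)
                         for j, p in enumerate(row) if p > 0)
    return entropy(py) + expected_log_q


def true_posterior(joint):
    """p(Y=j | Z=i), obtained by normalizing each row of the joint."""
    return [[p / sum(row) for p in row] for row in joint]


joint = [[0.4, 0.1], [0.2, 0.3]]
mi = mutual_information(joint)
tight = variational_lower_bound(joint, true_posterior(joint))
loose = variational_lower_bound(joint, [[0.5, 0.5], [0.5, 0.5]])
```

In this example `tight` matches `mi` up to floating-point error, while the uninformative auxiliary posterior gives a strictly smaller value `loose`. Training the auxiliary network to maximize the expected log-likelihood therefore moves the bound toward the true mutual information.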
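Minimizing the mutual information in Equation 2. To solve Equation 2, we leverage the variational upper bound in (Cheng et al., 2020):
$$I(z_u; y_u) \leq I_{vCLUB}(z_u; y_u) = \mathbb{E}_{p(z_u, y_u)}[\log q_\phi(y_u \mid z_u)] - \mathbb{E}_{p(z_u) p(y_u)}[\log q_\phi(y_u \mid z_u)],$$
where $q_\phi(y_u \mid z_u)$ is an auxiliary distribution of $p(y_u \mid z_u)$ that needs to satisfy the following condition (Cheng et al., 2020):
$$KL(p(z_u, y_u) \,\|\, q_\phi(z_u, y_u)) \leq KL(p(z_u) p(y_u) \,\|\, q_\phi(z_u, y_u)), \tag{5}$$
with $q_\phi(z_u, y_u) = q_\phi(y_u \mid z_u) \, p(z_u)$ the variational joint distribution.
To achieve Inequality 5, we need to minimize:
$$\begin{aligned} \min_{\phi} \; KL(p(z_u, y_u) \,\|\, q_\phi(z_u, y_u)) &= \min_{\phi} \; \mathbb{E}_{p(z_u, y_u)}[\log p(y_u \mid z_u)] - \mathbb{E}_{p(z_u, y_u)}[\log q_\phi(y_u \mid z_u)] \\ &\Leftrightarrow \max_{\phi} \; \mathbb{E}_{p(z_u, y_u)}[\log q_\phi(y_u \mid z_u)], \end{aligned} \tag{6}$$
where we have the last Equation because the first term in the second-to-last Equation is irrelevant to $\phi$.
Finally, achieving Equation 2 becomes solving the following adversarial training objective:
$$\min_{\theta} \; I(z_u; y_u) \;\Leftrightarrow\; \min_{\theta} \max_{\phi} \; \mathbb{E}_{p(x_u, y_u)}[\log q_\phi(y_u \mid f_\theta(x_u))]. \tag{7}$$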
Remark. The above objective function can be interpreted as an adversarial game between an adversary $q_\phi$ who aims to infer the label $y_u$ from $z_u$ and a defender (i.e., the embedding function $f_\theta$) who aims to protect the node privacy from being inferred.
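The upper-bound property of the variational bound from (Cheng et al., 2020) can also be checked numerically. The sketch below uses the CLUB form (expected log-likelihood of the auxiliary distribution under the joint, minus its expectation under the product of marginals) on a small discrete joint distribution. The discrete setting and all function names are assumptions made for illustration.

```python
import math


def marginals(joint):
    """Return (p(z), p(y)) for joint[i][j] = p(Z=i, Y=j)."""
    pz = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return pz, py


def mutual_information(joint):
    """I(Z; Y) in nats."""
    pz, py = marginals(joint)
    return sum(p * math.log(p / (pz[i] * py[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)


def vclub(joint, q):
    """E_{p(z,y)}[log q(y|z)] - E_{p(z)p(y)}[log q(y|z)], with all q[i][j] > 0."""
    pz, py = marginals(joint)
    positive = sum(p * math.log(q[i][j])
                   for i, row in enumerate(joint)
                   for j, p in enumerate(row) if p > 0)
    negative = sum(pz[i] * py[j] * math.log(q[i][j])
                   for i in range(len(pz)) for j in range(len(py)))
    return positive - negative


def true_posterior(joint):
    """p(Y=j | Z=i), obtained by normalizing each row of the joint."""
    return [[p / sum(row) for p in row] for row in joint]


joint = [[0.4, 0.1], [0.2, 0.3]]
mi = mutual_information(joint)
upper = vclub(joint, true_posterior(joint))
```

With the auxiliary distribution set to the true posterior, `upper` is at least `mi`. This is why the auxiliary network is first fitted to the posterior (the inner maximization) before the embedding function is updated to reduce what the fitted network can infer.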
Implementation via training parameterized neural networks. We solve Equation 4 and Equation 7 in practice via training two parameterized neural networks associated with the two auxiliary posterior distributions $q_\psi$ and $q_\phi$. With them, we expect to obtain high link prediction performance for our primary learning task and low node classification performance for our privacy protection task.
To solve Equation 4, we first sample a set of triplets $\{(x_u, x_v, A_{uv})\}$ from the graph $G$. Then, we parameterize the variational posterior distribution $q_\psi(A_{uv} \mid z_u, z_v)$ via a link predictor $h_\psi$ defined on the node representations $z_u$ and $z_v$ of the sampled nodes $u$ and $v$. Suppose we are given a set of positive links $E_p$ and a set of negative links $E_n$; then we have
(8) 
To solve Equation 7, we first sample a set of labeled nodes, and then we parameterize $q_\phi(y_u \mid z_u)$ via a node classifier $g_\phi$ defined on the node representations of these labeled nodes. Suppose we sample a set of labeled nodes $V_L$; then we have
(9) 
Combining Equation 8 and Equation 9, we have the final objective function for our Problem 1 as follows:
(10) 
where $\lambda$ is a tradeoff factor to balance between achieving high link prediction performance and low node classification performance.
Note that Equation 10 involves optimizing three neural networks: the node embedding function $f_\theta$, the link predictor $h_\psi$, and the node classifier $g_\phi$. We train the three neural networks alternately. Specifically, in each round, we update each of the three networks in turn with several iterations of gradient descent or gradient ascent, as appropriate to its objective. We iteratively perform these steps until reaching a predefined maximal number of rounds or the convergence condition. Algorithm 1 in the Appendix illustrates the training procedure of these networks.
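The alternating schedule described above can be illustrated on a toy problem. The sketch below is not Algorithm 1: it is a hypothetical stand-in in which the embedding is a scalar linear map, both the primary-task predictor and the privacy-task adversary are logistic models on that scalar, and the embedding is updated by gradient descent on a weighted difference of the two losses. The weighting by `lam`, the sign conventions, and all names are assumptions of this sketch.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def bce(p, y, eps=1e-12):
    """Binary cross-entropy of probability p against a label y in {0, 1}."""
    return -(y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps))


def train_alternating(data, lam=0.5, rounds=100, inner_steps=5, lr=0.1):
    """data: list of (x, primary_label, private_label), x a list of floats."""
    n, dim = len(data), len(data[0][0])
    w = [0.1] * dim      # embedding parameters
    a, b = 0.1, 0.0      # primary-task head
    c, d = 0.1, 0.0      # privacy-task head (adversary)

    def embed(x):
        return sum(wk * xk for wk, xk in zip(w, x))

    for _ in range(rounds):
        # Step 1: the primary head descends on its own loss.
        for _ in range(inner_steps):
            ga = gb = 0.0
            for x, s, _t in data:
                z = embed(x)
                err = sigmoid(a * z + b) - s
                ga += err * z / n
                gb += err / n
            a, b = a - lr * ga, b - lr * gb
        # Step 2: the adversary head descends on its own loss.
        for _ in range(inner_steps):
            gc = gd = 0.0
            for x, _s, t in data:
                z = embed(x)
                err = sigmoid(c * z + d) - t
                gc += err * z / n
                gd += err / n
            c, d = c - lr * gc, d - lr * gd
        # Step 3: the embedding descends on lam * L_primary - (1 - lam) * L_adv,
        # i.e. it helps the primary head and hurts the adversary.
        for _ in range(inner_steps):
            gw = [0.0] * dim
            for x, s, t in data:
                z = embed(x)
                dz = (lam * (sigmoid(a * z + b) - s) * a
                      - (1.0 - lam) * (sigmoid(c * z + d) - t) * c)
                for k in range(dim):
                    gw[k] += dz * x[k] / n
            for k in range(dim):
                w[k] -= lr * gw[k]

    primary = sum(bce(sigmoid(a * embed(x) + b), s) for x, s, _t in data) / n
    adversary = sum(bce(sigmoid(c * embed(x) + d), t) for x, _s, t in data) / n
    return {"primary_loss": primary, "adversary_loss": adversary}
```

A possible usage is `train_alternating(data, lam=0.5)` on samples whose first feature encodes the primary label and whose second feature encodes the private label. Min-max training of this kind is not guaranteed to converge, so the returned losses should be inspected rather than assumed.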
4.2. Problem 2: Node Classification with Link Privacy Protection
4.2.1. Formulating Problem 2 with mutual information objectives
In this problem, our goal is to learn an embedding function $f_\theta$ to transform $x_u$ to the representation $z_u$ such that: 1) The representation of a node (e.g., $z_u$) contains as much information as possible to facilitate predicting the node label (e.g., $y_u$). 2) The representations of a node pair (e.g., $z_u$ and $z_v$) retain as little information as possible to prevent inferring the link status (e.g., $A_{uv}$). Formally, to achieve 1) and 2), we have the following two mutual information objectives:
$$\text{Node classification:} \quad \max_{\theta} \; I(z_u; y_u) \tag{11}$$
$$\text{Link privacy protection:} \quad \min_{\theta} \; I(z_u, z_v; A_{uv}) \tag{12}$$
4.2.2. Estimating mutual information via tractable variational lower bound and upper bound
Similarly, we first obtain a lower bound for Equation 11 and an upper bound for Equation 12 by introducing two auxiliary posterior distributions, respectively. Then, we parameterize each auxiliary distribution with a neural network, and train each neural network to maximize the lower bound or minimize the upper bound, respectively.
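Maximizing the mutual information in Equation 11. To solve Equation 11, we have the following variational lower bound:
$$I(z_u; y_u) \geq H(y_u) + \mathbb{E}_{p(z_u, y_u)}[\log q_\phi(y_u \mid z_u)] =: I_{vLB}(z_u; y_u). \tag{13}$$
Note that the variational lower bound is tight when the auxiliary distribution $q_\phi(y_u \mid z_u)$ becomes the true posterior distribution $p(y_u \mid z_u)$. Now, we maximize the variational lower bound to achieve Equation 11 by estimating $q_\phi(y_u \mid z_u)$. Specifically, we have
$$\max_{\theta} \; I(z_u; y_u) \;\Leftrightarrow\; \max_{\theta} \max_{\phi} \; \mathbb{E}_{p(x_u, y_u)}[\log q_\phi(y_u \mid f_\theta(x_u))]. \tag{14}$$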
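Minimizing the mutual information in Equation 12. To solve Equation 12, we derive the vCLUB motivated by (Cheng et al., 2020) and have
$$I(z_u, z_v; A_{uv}) \leq I_{vCLUB}(z_u, z_v; A_{uv}) = \mathbb{E}_{p(z_u, z_v, A_{uv})}[\log q_\psi(A_{uv} \mid z_u, z_v)] - \mathbb{E}_{p(z_u, z_v) p(A_{uv})}[\log q_\psi(A_{uv} \mid z_u, z_v)], \tag{15}$$
where $q_\psi(A_{uv} \mid z_u, z_v)$ is an auxiliary distribution of $p(A_{uv} \mid z_u, z_v)$ that needs to satisfy the following condition:
$$KL(p(z_u, z_v, A_{uv}) \,\|\, q_\psi(z_u, z_v, A_{uv})) \leq KL(p(z_u, z_v) p(A_{uv}) \,\|\, q_\psi(z_u, z_v, A_{uv})). \tag{16}$$
That is, $I_{vCLUB}(z_u, z_v; A_{uv})$ is a mutual information upper bound if the variational joint distribution $q_\psi(z_u, z_v, A_{uv})$ is closer to the joint distribution $p(z_u, z_v, A_{uv})$ than to $p(z_u, z_v) p(A_{uv})$.
To achieve Inequality 16, we need to minimize the KL-divergence as follows:
$$\min_{\psi} \; KL(p(z_u, z_v, A_{uv}) \,\|\, q_\psi(z_u, z_v, A_{uv})) \;\Leftrightarrow\; \max_{\psi} \; \mathbb{E}_{p(z_u, z_v, A_{uv})}[\log q_\psi(A_{uv} \mid z_u, z_v)].$$
Finally, our target to achieve Equation 12 becomes the following adversarial training objective:
$$\min_{\theta} \; I(z_u, z_v; A_{uv}) \;\Leftrightarrow\; \min_{\theta} \max_{\psi} \; \mathbb{E}_{p(x_u, x_v, A_{uv})}[\log q_\psi(A_{uv} \mid f_\theta(x_u), f_\theta(x_v))]. \tag{17}$$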
Remark. The above objective function can be interpreted as an adversarial game between an adversary who aims to infer the link $A_{uv}$ from the pair of node representations $z_u$ and $z_v$, and a defender (i.e., the embedding function $f_\theta$) who aims to protect the link privacy from being inferred.
Implementation via training parameterized neural networks. We solve Equation 14 and Equation 17 in practice via training two parameterized neural networks. With them, we expect to obtain a high node classification performance for our primary learning task and a low link prediction performance for our privacy protection task.
Similar to solving Problem 1, to solve Equation 14, we first sample a set of labeled nodes, and then we parameterize the variational posterior distribution $q_\phi(y_u \mid z_u)$ via a node classifier $g_\phi$ defined on the node representations of these labeled nodes. Suppose we sample a set of labeled nodes $V_L$; then we have
(18) 
To solve Equation 17, we first sample a set of triplets $\{(x_u, x_v, A_{uv})\}$ from the graph $G$. Then, we parameterize $q_\psi(A_{uv} \mid z_u, z_v)$ via a link predictor $h_\psi$ defined on the node representations $z_u$ and $z_v$ of the sampled nodes $u$ and $v$. Depending on the real scenario, we can protect a set of positive links with $A_{uv} = 1$ and/or a set of negative links with $A_{uv} = 0$. In our experiments, we consider protecting both positive links and negative links. Suppose we are given a set of positive links $E_p$ and a set of negative links $E_n$; then we have
(19) 
Combining Equation 18 and Equation 19, we have the final objective function for our Problem 2 as follows:
(20) 
where $\lambda$ is a tradeoff factor to balance between achieving high node classification performance and low link prediction performance.
Similar to Problem 1, our objective function for Problem 2 in Equation 20 involves optimizing three neural networks: the node embedding function $f_\theta$, the link predictor $h_\psi$, and the node classifier $g_\phi$. We train the three neural networks alternately. Algorithm 2 in the Appendix illustrates the training procedure of these networks.
Table 1. Statistics of the citation graphs.
Dataset  | #Nodes | #Edges | #Features | #Labels
Cora     | 2,708  | 5,429  | 1,433     | 7
Citeseer | 3,327  | 4,732  | 3,703     | 6
Pubmed   | 19,717 | 44,338 | 500       | 3
5. Evaluation
5.1. Experimental Setup
Dataset description. We use three benchmark citation graphs (i.e., Cora, Citeseer, and Pubmed) (Sen et al., 2008) to evaluate our method. In these graphs, each node represents a document and each edge indicates a citation between two documents. Each document uses its bag-of-words feature as the node feature vector and also has a label. Table 1 shows basic statistics of these citation graphs.
Representation learning methods. We select three graph neural networks, i.e., GCN (Kipf and Welling, 2017), GAT (Veličković et al., 2018), and HGCN (Chami et al., 2019), as the representative graph representation learning methods. Each method learns node representations for both node classification and link prediction. Specifically, in these methods, the input layer through the second-to-last layer are used for learning node representations. When performing node classification, all these methods train a (Euclidean) softmax classifier on the node representations in the last layer. When performing link prediction, GCN and GAT train a parameterized bilinear link predictor on pairs of node representations; HGCN trains a Fermi-Dirac decoder (Krioukov et al., 2010; Nickel and Kiela, 2017) as the link predictor.
Note that these methods only focus on learning node representations for solving the primary learning task and do not consider protecting the privacy for the other task. Furthermore, we apply our framework to learn privacy-preserving node representations.
Training set, validation set, and testing set. Following existing works (Chami et al., 2019; Zhang and Chen, 2018; Kipf and Welling, 2017), in each graph dataset, for node classification, we randomly sample 20 nodes per class to form the training set, randomly sample 500 nodes in total as the validation set, and randomly sample 1,000 nodes in total as the testing set. For link prediction, we randomly sample 85% positive links and 50% negative links for training, sample 5% positive links and an equal number of negative links for validation, and use the remaining 10% positive links and sample an equal number of negative links for testing.
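As a minimal sketch of the node split described above (20 training nodes per class, 500 validation nodes, and 1,000 testing nodes), the following function draws the three sets at random. It assumes that the three sets are disjoint, which the text does not state explicitly, and the function name `split_nodes` and its `seed` parameter are hypothetical.

```python
import random


def split_nodes(labels, per_class=20, n_val=500, n_test=1000, seed=0):
    """Return (train, val, test) lists of node indices.

    labels[v] is the class of node v. The sets are disjoint by construction,
    which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    by_class = {}
    for node, y in enumerate(labels):
        by_class.setdefault(y, []).append(node)

    # Training set: a fixed number of randomly chosen nodes per class.
    train = []
    for nodes in by_class.values():
        train.extend(rng.sample(nodes, per_class))

    # Validation and testing sets: drawn from the remaining nodes.
    train_set = set(train)
    rest = [v for v in range(len(labels)) if v not in train_set]
    rng.shuffle(rest)
    val = rest[:n_val]
    test = rest[n_val:n_val + n_test]
    return train, val, test
```

For a graph with 2,708 nodes and 7 classes (the size of Cora in Table 1), this yields 140 training nodes, 500 validation nodes, and 1,000 testing nodes.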
Parameter setting. We train our framework on the training set and tune hyperparameters to select the model with the minimal error on the validation set. Then, we evaluate the selected model on the testing set. By default, we set the tradeoff factor $\lambda$ to be 0.5 during training. We also study the impact of $\lambda$ in our experiments. We train all graph neural networks using the publicly available source code. We implement our framework in PyTorch.
Evaluation metric. Following previous works (Chami et al., 2019; Zhang and Chen, 2018; Kipf and Welling, 2017, 2016), we use AUC to evaluate the link prediction performance and accuracy to evaluate the node classification performance.
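For reference, the two metrics can be computed as in the sketch below. AUC is written in its pairwise form: the fraction of (positive link, negative link) pairs in which the positive link receives the higher score, with ties counted as one half. This quadratic-time form is chosen for clarity rather than efficiency, and the function names are hypothetical.

```python
def auc(pos_scores, neg_scores):
    """Probability that a random positive link outscores a random negative one."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


def accuracy(predicted, true):
    """Fraction of nodes whose predicted label matches the true label."""
    return sum(int(p == t) for p, t in zip(predicted, true)) / len(true)
```

Under this definition a link predictor that assigns the same score to every pair obtains an AUC of 0.5, which is the random-guessing reference point for link privacy, while random guessing over k balanced classes yields an accuracy of about 1/k.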
5.2. Experimental Results
5.2.1. Link prediction without/with node privacy protection
In this experiment, we consider link prediction as the primary learning task, and node classification as the privacy protection task. Table 2(a) shows the performance of the two tasks without node privacy protection by existing methods. Specifically, we use the three graph neural networks, i.e., GCN, GAT, and HGCN, to learn node representations, and use them to train a link predictor for link prediction. Next, we also leverage these node representations to train a node classifier. We have the following observations: 1) All these methods achieve very high AUCs on the three graphs, i.e., almost all AUCs are above 90%, demonstrating their effectiveness for link prediction. 2) Although these node representations are not specifically learnt for node classification, they can be used by the node classifier to accurately infer the node labels, thus leaking node privacy. For instance, all the methods obtain accuracies around or above 70%, and they perform significantly better than random guessing.
Table 2(b) shows the performance of the two tasks with node privacy protection. Specifically, we use our framework to learn node representations, the link predictor, and the node classifier, simultaneously. We observe that our framework achieves a utility-privacy tradeoff. In particular, our framework has a tolerable AUC drop, compared with the AUCs in Table 2(a). However, our framework obtains much lower accuracies than those in Table 2(a). In some cases, the accuracies are close to random guessing, demonstrating nearly perfect node privacy protection. The above results validate that our framework is effective for link prediction, as well as for protecting node privacy.
5.2.2. Node classification without/with link privacy protection
In this experiment, we consider node classification as the primary learning task, and link prediction as the privacy protection task. Table 2(c) shows the performance of the two tasks without link privacy protection by existing methods. Similarly, we use the three graph neural networks to learn node representations, and use them to train a node classifier for node classification. Our observations are similar to those for Table 2(a). First, all methods achieve promising accuracies on the three graphs, i.e., close to the results reported in (Kipf and Welling, 2017; Veličković et al., 2018; Chami et al., 2019). Next, we leverage these node representations to train a link predictor to infer link status. We observe that these methods obtain AUCs significantly larger than those obtained by random guessing, thus seriously leaking link privacy.
Table 2(d) shows the performance of the two tasks with link privacy protection. Our framework achieves a utility-privacy tradeoff, similar to the results in Table 2(b). First, our framework has a slight accuracy degradation (around 1%-4%) compared with the accuracies in Table 2(c). Second, our framework obtains much lower AUCs, and almost all of these AUCs are close to random guessing. The results again validate that our framework is effective for node classification, as well as for protecting link privacy.
5.2.3. Impact of the tradeoff factor
In this experiment, we study the impact of the tradeoff factor in our framework. Figure 2 and Figure 3 show the performance on the three graphs vs. different values of the tradeoff factor for protecting node privacy and protecting link privacy, respectively. We have the following key observations: 1) At the extreme where our framework only considers protecting node/link privacy, it achieves the lowest performance (i.e., close to random guessing) for inferring the node label/link status. However, the performance for the primary learning task is also the worst (i.e., random guessing). 2) At the opposite extreme, where our framework only considers primary task learning, it achieves the highest performance for link prediction/node classification. However, it also obtains the highest performance for inferring the node label/link status, thus leaking the most information about node labels/link status. 3) For values between the two extremes, our framework considers both primary learning and privacy protection. We note that our framework is not sensitive to the value of the tradeoff factor in this range. That is, the performance of graph neural networks based on our framework for primary learning and privacy protection is relatively stable across all values in this range.
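To make the role of the two extremes concrete, the following sketch uses a hypothetical scalarized objective in which a weight in [0, 1] interpolates between a privacy-only and a primary-task-only criterion. This is an assumption for illustration, not the paper's actual objective: the convex-combination form, the endpoint convention (0 for privacy only, 1 for primary task only), and the names `combined_objective`, `primary_score`, and `privacy_leakage` are all hypothetical.

```python
def combined_objective(primary_score, privacy_leakage, weight):
    """Hypothetical objective to maximize: reward primary-task performance
    and penalize privacy leakage, balanced by a weight in [0, 1].

    weight = 0.0 -> only the privacy term matters (assumed convention)
    weight = 1.0 -> only the primary-task term matters (assumed convention)
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * primary_score - (1.0 - weight) * privacy_leakage


# Sweep the weight for made-up scores (not results from the paper).
sweep = {
    w: combined_objective(primary_score=0.9, privacy_leakage=0.7, weight=w)
    for w in (0.0, 0.5, 1.0)
}
```

Under this assumed form, the primary-task term vanishes at one endpoint and the privacy term vanishes at the other, which matches the qualitative behavior described in observations 1) and 2); intermediate weights trade the two terms off against each other.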
6. Conclusion
We propose the first framework for privacy-preserving representation learning on graphs from the mutual information perspective. Our framework includes a primary learning task and a privacy protection task. The goal is to learn node representations such that they can be used to achieve high performance for the primary learning task, while obtaining low performance for the privacy protection task (e.g., close to random guessing). We formally formulate our goal via mutual information objectives. However, mutual information is challenging to compute in practice. Motivated by mutual information neural estimation, we derive tractable variational bounds for the mutual information, and parameterize each bound via a neural network. Next, we train these neural networks to approximate the true mutual information and learn privacy-preserving node representations. We evaluate our framework on various graph datasets and show that our framework is effective for learning privacy-preserving node representations on graphs.
Acknowledgements. We thank the anonymous reviewers for their constructive comments. This work is supported by the Amazon Research Award. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies.
References
Deep variational information bottleneck. In ICLR.
Mutual information neural estimation. In ICML.
Grarep: learning graph representations with global structural information. In CIKM.
Hyperbolic graph convolutional neural networks. In NeurIPS.
Infogan: interpretable representation learning by information maximizing generative adversarial nets. In NIPS.
CLUB: a contrastive log-ratio upper bound of mutual information. In ICML.
Adaptive graph encoder for attributed graph embedding. In KDD.
Learning graph representations with embedding propagation. In NIPS.
Convolutional networks on graphs for learning molecular fingerprints. In NIPS.
Node2vec: scalable feature learning for networks. In SIGKDD.
Inductive representation learning on large graphs. In NIPS.
Learning deep representations by mutual information estimation and maximization. In ICLR.
Variational graph auto-encoders. In NIPS Workshop.
Semi-supervised classification with graph convolutional networks. In ICLR.
Hyperbolic geometry of complex networks. Physical Review E.
TIPRDC: task-independent privacy-respecting data crowdsourcing framework for deep learning with anonymized intermediate representations. In KDD.
Hyperbolic graph neural networks. In NeurIPS.
Graph convolutional networks with eigenpooling. In KDD.
Poincaré embeddings for learning hierarchical representations. In NIPS.
Representation learning with contrastive predictive coding. arXiv.
Graph representation learning via graphical mutual information maximization. In WWW.
Deepwalk: online learning of social representations. In SIGKDD.
On variational bounds of mutual information. In ICML.
Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. In WSDM.
GMNN: graph markov neural networks. In ICML.
Struc2vec: learning node representations from structural identity. In KDD.
Deep inductive network representation learning. In WWW.
Collective classification in network data. AI magazine.
Infograph: unsupervised and semi-supervised graph-level representation learning via mutual information maximization. In ICLR.
Line: large-scale information network embedding. In WWW.
Max-margin deepwalk: discriminative learning of network representation. In IJCAI.
Graph attention networks. In ICLR.
Deep graph infomax. In ICLR.
Simplifying graph convolutional networks. In ICML.
Demonet: degree-specific graph neural networks for node and graph classification. In KDD.
How powerful are graph neural networks? In ICLR.
Representation learning on graphs with jumping knowledge networks. In ICML.
Revisiting semi-supervised learning with graph embeddings. In ICML.
Link prediction based on graph neural networks. In NIPS.