With the rapid development of the internet, online activities produce massive amounts of data in various forms, such as text, images, and graphs. Among these, graph data has recently attracted increasing attention and interest from the research community due to its powerful capability to illustrate complex relations among different entities [19, 35]. To acquire the informative semantics maintained in graphs, researchers have developed a variety of graph representation learning methods, including graph embedding methods [25, 32]
and graph neural networks (GNNs) [19, 35]. Usually, the basic versions of such methods need to be trained with massive training data in a centralized manner. However, such a setting is prohibitive in real-world scenarios due to privacy concerns, commercial competition, and increasingly strict regulations and laws [12, 10], such as GDPR (https://gdpr-info.eu/) and CCPA (https://oag.ca.gov/privacy/ccpa). Some researchers have introduced federated learning (FL) into the graph learning domain to address this issue. FL is a distributed learning paradigm that overcomes this limitation by training graph models on local devices and incorporating various privacy-preserving mechanisms, for example, differential privacy on graphs. Differential privacy (DP) is one of the most widely used privacy-preserving methods and comes with a strong theoretical guarantee. The basic idea of DP is to introduce a controlled level of noise into query results so that the outputs on two adjacent datasets become statistically indistinguishable. There are several perspectives from which to implement DP in the graph domain, such as node privacy, edge privacy, out-link privacy, and partition privacy, among which edge privacy provides meaningful privacy protection in many real-world applications and is more widely used. So, we mainly focus on edge-level DP in this paper. The advantages of DP lie in that it can be easily achieved and has a solid theoretical guarantee. It has been proved that introducing noises that follow specific Laplacian or Gaussian distributions can achieve the designated level of privacy protection according to the Laplacian or Gaussian mechanism. Such a property means that the DP mechanism is robust against any post-processing and would never be compromised by any algorithm.
Although the differential privacy mechanism in federated graph learning secures the sensitive information maintained in graphs and has theoretically proven advantages, it degrades the performance of graph learning models. There are few research works on alleviating such a performance drop. The representative work is that of J. Wang et al., who proposed to generate and inject specific perturbed data and train the model with both clean and perturbed data to help the model gain robustness to the DP noises. However, two significant limitations exist in adopting this strategy in federated graph learning. First, this strategy is designed for deep neural networks (DNNs) and image classification tasks. It cannot be directly used to address graph learning and GNN problems. Second, the model training requires clean data. If DP processes the data uploaded from edge devices, the model stored on the cloud server can never acquire clean data to fulfil such a training process. Hence, we need new solutions for federated graph learning.
The intuition behind addressing the performance sacrifice is to obtain robustness to the noises introduced by DP, which could lead us to new solutions. Note that the DP mechanism on graph edges introduces noises to perturb the proximity of the graph. Such an operation meets the definition of edge perturbation, one of the graph augmentations. Therefore, the DP mechanism on graph edges can be regarded as a kind of graph augmentation. Moreover, we notice that Y. You et al. proposed to leverage graph contrastive learning in cooperation with graph augmentations to help the graph learning model obtain robustness against such noises. Inspired by this finding, we propose to utilise graph contrastive learning to learn from the graph processed by the DP mechanism. Graph contrastive learning (GCL) [46, 31, 36, 44, 9], a trending graph learning paradigm, leverages contrastive learning concepts to acquire critical semantics between augmented graphs and is robust to the noises brought by augmentations. Besides edge perturbation, there are other graph augmentations, such as node dropping, attribute masking, and subgraph sampling. GCL is an empirically proven graph learning technique, which has achieved SOTA performances on various graph learning tasks [46, 50] and has been successfully applied to many real-world scenarios [43, 39].
Nevertheless, applying GCL in federated graph learning is challenging and not straightforward. First, the noises introduced by DP follow random distributions, which means privacy is not protected by simply deleting or adding edges in the graph. Instead, DP perturbs the probability that a link exists between two nodes. To cooperate with graph learning settings, we need to convert the graph into a fully connected graph and use the perturbed probabilities to denote the edge weights. Second, to follow real-world settings, many federated frameworks train models in a streaming manner, which means that the training batch size is 1. So, we cannot follow the settings in the current GCL literature, which collect negative samples from the same training batch [46, 44], to fulfil the contrastive learning protocol. In this paper, specifically, we maintain a stack with a limited size on each device to store previously trained graphs and randomly extract samples from the stack to serve as negative samples in each training round. More details can be found in the methodology part of this paper.
In summary, the contributions of this paper are listed as follows:
We propose a method to convert a graph to a table so that current DP methods can achieve privacy preservation on graphs.
This paper is the first to propose a federated graph contrastive learning method (FGCL). The proposed method can serve as a plug-in for federated graph learning with any graph encoder.
We conduct extensive experiments to show that contrastive learning can alleviate the performance dropping caused by the DP mechanism’s noise.
This section provides background knowledge related to the research work of this paper, covering differential privacy and graph contrastive learning.
II-A Differential Privacy
Differential privacy (DP) is a mechanism tailored to privacy-preserving data analysis, designed to defend against differential attacks. The DP mechanism has a theoretically proven guarantee that controls how much information is leaked during a malicious attack. Such a property means that the DP mechanism is immune to any post-processing and cannot be compromised by any algorithm. In other words, if the DP mechanism protects a dataset, the privacy loss of this dataset during a malicious attack would never exceed the given threshold, even if attacked by the most advanced and sophisticated methods.
Formally, differential privacy is defined as follows:
A randomized query algorithm $\mathcal{M}$ satisfies $\epsilon$-differential privacy, iff for any datasets $D$ and $D'$ that differ on a single element and any subset of outputs $S \subseteq \mathrm{Range}(\mathcal{M})$, such that
$$\Pr[\mathcal{M}(D) \in S] \leq e^{\epsilon} \cdot \Pr[\mathcal{M}(D') \in S].$$
In the definition above, the datasets $D$ and $D'$ that differ by only one single data item are named adjacent datasets, which are defined in a task-specific way. For example, in graph edge privacy protection, a graph $G$ denotes a dataset, and the graph $G'$ is $G$'s adjacent dataset if $G'$ is derived from $G$ via adding or deleting an edge. The parameter $\epsilon$ in the definition denotes the privacy budget, which controls how much privacy leakage by $\mathcal{M}$ can be tolerated. A lower privacy budget indicates less tolerance to information leakage, that is, a higher privacy guarantee.
A general method to convert a deterministic query function $f$ into a randomized query function $\mathcal{M}$ is to add random noise to the output of $f$. The noise is generated by a specific random distribution calibrated to the privacy budget $\epsilon$ and $\Delta f$, the global sensitivity of $f$, which is defined as $\Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert_1$ over adjacent datasets. In this paper, for simplicity, we consider the Laplacian mechanism to achieve DP:
$$\mathcal{M}(D) = f(D) + \mathrm{Lap}(\Delta f / \epsilon),$$ where $\mathrm{Lap}(\Delta f / \epsilon)$ denotes the zero-mean Laplacian distribution with scale $\Delta f / \epsilon$.
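As an illustration, the Laplacian mechanism can be sketched in a few lines of Python. This is a minimal sketch using NumPy; the function name and interface are our own, not from any specific library:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Answer a query under epsilon-DP by adding Lap(sensitivity/epsilon) noise."""
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon  # noise scale grows as the budget shrinks
    return true_value + rng.laplace(loc=0.0, scale=scale)

# Example: a counting query has sensitivity 1; answer it with budget epsilon = 1.
noisy_count = laplace_mechanism(42.0, sensitivity=1.0, epsilon=1.0)
```

A smaller $\epsilon$ yields larger noise and hence stronger privacy, matching the discussion of the privacy budget above.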
II-B Graph Augmentations for Graph Contrastive Learning
Graph contrastive learning (GCL) [36, 26, 46, 50, 44] has emerged as a delicate tool for learning high-quality graph representations. Graph augmentation techniques contribute substantially to the success of GCL, playing a critical role in the GCL process, and many researchers have devoted their efforts to this part. GraphCL is one of the most impactful works in the GCL domain, giving a comprehensive and insightful analysis of graph augmentations and summarizing four basic graph augmentation methods and their corresponding underlying priors:
Node Dropping. Deleting a small portion of vertices does not change the semantics of a graph.
Edge Perturbation. Graphs have semantic robustness to scale-limited proximity variations.
Attribute Masking. Graphs have semantic robustness to losing partial attributes on nodes, edges, or graphs.
Subgraph Sampling. Local subgraphs can hint at the entire semantics of the original graph.
In summary, the intuition behind graph augmentations is to introduce noises into the graph, and the augmented graph is then utilised in the following contrastive learning phase to help the model learn the semantic robustness of the original graph. Such an intuition provides researchers with a potentially feasible way to bridge graph-level privacy preserving with graph augmentations and GCL.
The graph augmentations mentioned above are implemented randomly, achieving suboptimal performances in GCL practice. GCA is an improved version of GraphCL, which conducts adaptive augmentations to attain better performances. Besides the aforementioned noise-based graph augmentations, there are other methods, such as MVGRL and DSGC, which alleviate semantic compromise by avoiding introducing noises to implement graph augmentations.
Details of the proposed method are given in this section, which consists of four parts.
The overview of the proposed method FGCL is shown in Fig. 1. There are five steps to conducting graph contrastive learning in the context of federated learning:
(1) Download global model. The initial parameters of the graph encoder and the classifier are stored on the server. Clients must first download these parameters before local training.
(2) Introduce noises into graphs for DP. Each graph in the current training batch on every client is processed twice by the Laplacian mechanism to introduce noise into the graph proximity, yielding two augmented views for graph contrastive learning.
(3) Update local model via training. Following the training protocols of graph contrastive learning, the model’s parameters downloaded from the server will be updated.
(4) Upload updated parameters. Clients involved in the current training round will upload the locally updated parameters to the server for global model updating.
(5) FedAvg & update the global model. The server utilises FedAvg algorithm  to aggregate the updated local parameters and update the global parameters. Specifically, the server collects uploaded parameter updates and averages them to acquire the global updates.
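The five steps above can be sketched as one communication round. This is a hypothetical sketch; the client hooks `dp_augment` and `train_step` are placeholder names of our own, not the actual FedGraphNN APIs:

```python
import random
import numpy as np

def average(params_list):
    """Step (5): FedAvg-style averaging of the uploaded parameter dicts."""
    return {k: np.mean([p[k] for p in params_list], axis=0)
            for k in params_list[0]}

def federated_round(global_params, clients, n_sample, rng):
    """Run one round of the five-step protocol described above."""
    selected = rng.sample(clients, n_sample)                       # sample participants
    updates = []
    for client in selected:
        params = {k: v.copy() for k, v in global_params.items()}  # (1) download
        for graph in client.local_graphs:
            view_a, view_b = client.dp_augment(graph)             # (2) DP noise -> two views
            params = client.train_step(params, view_a, view_b)    # (3) local update
        updates.append(params)                                    # (4) upload
    return average(updates)                                       # (5) aggregate
```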
III-B Graph Edge Differential Privacy
Some research works adopted differential privacy to introduce noises into the node or graph embeddings to protect privacy. However, such a strategy could be problematic in scenarios lacking initial node features. It is essential for researchers to adopt DP methods on graph structures (e.g., node privacy and edge privacy). Here we convert graph data to tabular data to utilise current differential privacy methods, such as the Laplacian and Gaussian mechanisms. Without loss of generality, we solely consider the Laplacian mechanism in this paper.
Differential privacy methods, like the Laplacian mechanism, originate from tabular data. The key to adapting such methods to preserve graph structural privacy is how to convert graph data to tabular data and how to define 'adjacent graphs'. Our method sets a pair of node indices as the key and the connectivity between these two nodes as the value to form a tabular dataset. As for 'adjacent graphs', given two graphs $G = (V, E)$ and $G' = (V', E')$, they are 'adjacent graphs' if and only if $V = V'$ and $G'$ can be derived from $G$ by adding or deleting exactly one edge; an illustrative example is shown in Fig. 2. Let $f$ denote the query function on the tables generated from 'adjacent graphs'. To defend against differential attacks, we need to introduce noises into the look-up results, i.e., $f(G) + x$, where $x$ is Laplacian random noise. Specifically, $\mathcal{M}(G) = f(G) + \mathrm{Lap}(\Delta f / \epsilon)$ satisfies $\epsilon$-differential privacy, where $\Delta f$ is the sensitivity and $\epsilon$ is the privacy budget.
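The graph-to-table conversion can be illustrated as follows (a minimal sketch; the function name is ours):

```python
def graph_to_table(num_nodes, edges):
    """Flatten an undirected graph into a key-value table: each key is a node
    index pair (i, j) with i < j, and the value is the connectivity (1 or 0)."""
    edge_set = {tuple(sorted(e)) for e in edges}
    return {(i, j): 1.0 if (i, j) in edge_set else 0.0
            for i in range(num_nodes) for j in range(i + 1, num_nodes)}
```

Adding Laplacian noise to every value of this table is exactly the perturbation of graph proximity discussed below.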
From a graph structural perspective, introducing noises to the values in the table generated from a graph is equivalent to perturbing the proximity of the graph. Let $A$ denote the adjacency matrix of the graph $G$; then the perturbed adjacency matrix $\hat{A}$ would be:
$$\hat{A} = A + E, \qquad E_{ij} \sim \mathrm{Lap}(\Delta f / \epsilon),$$ where $E$ denotes the Laplacian noise matrix. Privacy preserving sacrifices performance due to the noises introduced into the graph. However, such a process can be regarded as a graph augmentation, categorized as edge perturbation, whose underlying prior is semantic robustness against connectivity variations. With augmented views of the same graph, the aim of graph contrastive learning is to maximize the agreement among these views via a contrastive loss function, such as InfoNCE, to enforce the graph encoders to acquire representations invariant to the graph augmentations. Graph contrastive learning can help to improve performances on various graph learning tasks, which is empirically proved by some works [46, 26]. Therefore, we can leverage the advantages of graph contrastive learning to alleviate the performance dropping while preserving privacy. Moreover, it is worth noting that the entries in the adjacency matrix are binary and cannot take the decimal values derived from adding noises to the original entries. So, in practice, we convert the original graph into a fully connected graph and store the protected proximity information in the edge weights instead of the adjacency matrix.
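Concretely, the perturbation and the conversion to a weighted fully connected graph can be sketched as follows (assuming a symmetric, unweighted input adjacency; a sketch, not the authors' released code):

```python
import numpy as np

def perturb_adjacency(adj, epsilon, sensitivity=1.0, rng=None):
    """Add symmetric Laplacian noise to an adjacency matrix, returning a dense
    matrix of edge weights for the resulting fully connected graph."""
    rng = rng or np.random.default_rng()
    n = adj.shape[0]
    noise = np.triu(rng.laplace(0.0, sensitivity / epsilon, size=(n, n)), 1)
    noise = noise + noise.T            # keep the perturbation symmetric
    perturbed = adj.astype(float) + noise
    np.fill_diagonal(perturbed, 0.0)   # no self-loops
    return perturbed
```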
To better understand the graph structural privacy preserving in this section, we give an illustrative example here, shown in Fig. 2. Assume a social network consisting of three participants $v_1$, $v_2$, and $v_3$. At the beginning, $v_1$ has connections with both $v_2$ and $v_3$. Then, a new connection is built between $v_2$ and $v_3$. If a malicious attacker initiates a differential attack on this social network in two different time slots, he or she will become aware of a change in the relationship between $v_2$ and $v_3$, which could be private information. To defend against such attacks, we introduce noise to perturb the values, in which case the attacker would remain uncertain whether two participants are connected, even via single or several differential attacks. Because the expectation of the introduced noise is $0$, obtaining a precise look-up result would require issuing many queries and averaging the look-up results in a short period of time, which is infeasible for the malicious attacker.
III-C Graph Contrastive Learning with Augmented Graphs
Performance dropping is unavoidable since noises are introduced into the graph to perturb graph structures. However, with their rapid development, graph contrastive learning techniques [26, 46, 31, 36, 44, 9, 50] can facilitate invariant representation learning and identify critical structural semantics for augmented graphs, along with rigorous theoretical and empirical analysis [46, 50]. Because the process of achieving graph structural differential privacy introduced in the previous section meets the definition of graph augmentation, we can utilise graph contrastive learning to acquire invariant representations of those augmented graphs to mitigate the performance dropping brought by introducing noises.
We first consider the scenario of conducting graph contrastive learning for augmented graphs on a single client. As shown in Fig. 3, the whole process can be roughly broken down into six parts:
(1) Training batch partitioning. Every client possesses a graph dataset $\mathcal{D}$ locally. It is unlikely for the local model to train on all of the data simultaneously. Following the practical experience widely adopted by most machine learning methods, we first need to determine the batch size $b$ to partition the entire local dataset into training batches $\{B_1, B_2, \dots\}$, where $|B_j| \le b$. Moreover, according to many graph contrastive learning methods [46, 44, 26], having negative contrasting samples to formulate negative pairs is mandatory to fulfil the graph contrastive learning process. We can follow the settings adopted by [46, 44] to sample negative contrasting samples from the same training batch as the positive sample. However, if the model on the local device is trained in a streaming manner, in which case $b = 1$, such a method no longer works.
(2) Applying DP to graph edges. Laplacian random noises are introduced here to perturb graph proximity to achieve differential privacy. This process can be regarded as graph augmentation from the perspective of graph contrastive learning. For a graph $G_i$ in $B_j$ with adjacency matrix $A_i$, we apply DP to the adjacency matrix twice to acquire two augmented views $\hat{G}_i^1$ and $\hat{G}_i^2$ of the original graph, whose adjacency matrices are $\hat{A}_i^1 = A_i + E^1$ and $\hat{A}_i^2 = A_i + E^2$, respectively. The privacy budget $\epsilon$ could differ when producing the two augmented views. Intuitively, graph contrastive learning could benefit more from two more distinguishable views (e.g., produced with different privacy budgets) via maximizing agreement between the augmented views. We will examine this with comprehensive experiments.
(3) Maintaining a stack to store negative samples. To address the limitation of traditional methods mentioned in the first part, we propose to maintain a fixed-size stack to store negative samples. Specifically, at the beginning of the training process, we initialize a stack whose size is $N$. Then, we select the last $k$ instances, where $k \le N$, in the graph dataset and insert them into the stack. These instances serve as the negative samples of the target graph in the first training round. When a training round finishes, the target graph has two perturbed views, $\hat{G}^1$ and $\hat{G}^2$, one of which is inserted into the stack. Once the number of elements in the stack exceeds $N$, the oldest element is popped out. In all the following training rounds, all the negative samples are sampled from this stack.
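The negative-sample store can be implemented with a bounded double-ended queue (a minimal sketch; the class and method names are ours):

```python
import random
from collections import deque

class NegativeSampleStack:
    """Fixed-capacity container for previously trained (augmented) graphs;
    once more than `capacity` elements are pushed, the oldest is evicted,
    mirroring the pop-out behaviour described above."""

    def __init__(self, capacity):
        self._buf = deque(maxlen=capacity)

    def push(self, graph):
        self._buf.append(graph)  # deque(maxlen=...) drops the oldest item

    def sample(self, k, rng=random):
        # Sample with replacement so k may exceed the current stack size,
        # matching the duplicated-negatives fallback described in step (4).
        return [rng.choice(list(self._buf)) for _ in range(k)]
```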
(4) Contrasting samples coupling. Though some details about acquiring negative and positive samples were introduced in the previous two steps, we give a formal description of contrasting samples coupling here. Given a target graph $G$, we apply DP to $G$ to obtain two augmented views, $\hat{G}^1$ and $\hat{G}^2$, respectively. Then, we couple these two views as the positive pair, between which the agreement will be maximized. For negative samples, we follow the settings adopted by [46, 44]. Specifically, we sample $k$ negative graph instances, which should have labels different from that of the target graph. The sampled negative instances could be duplicated in case the number of graphs whose labels differ from that of the target graph is less than $k$. One of the augmented views of each negative sample is coupled with one of the augmented views of the target graph. Without loss of generality, we have a set of negative pairs $\{(\hat{G}^1, \hat{G}_j^-)\}_{j=1}^{k}$. The agreement between each negative pair will be minimized.
(5) Graph encoding. After obtaining a series of graphs, the next step is to encode these graphs to acquire high-quality graph embeddings. Various graph encoders have been proposed, such as GNN models. In this paper, we select three representative GNN models to study, including GCN, GAT, and GraphSAGE. Let $f(\cdot)$ denote the graph encoder, $\hat{A}$ indicate the proximity, and $X \in \mathbb{R}^{n \times d}$ be the node feature matrix of graph $G$, where $n$ is the number of nodes in $G$ and $d$ denotes the dimension of the initial node features. We obtain updated node embeddings via feeding the adjacency matrix and the initial feature matrix into the graph encoder:
$$H = f(\hat{A}, X) \in \mathbb{R}^{n \times d'},$$ where $d'$ indicates the hidden dimension. Then, a readout function is applied to summarize the node embeddings into the graph embedding: $$g = \mathrm{READOUT}(H).$$
There are many choices for the readout function, and the decision may vary among downstream tasks. To obtain the embeddings of the selected positive pair and negative pairs, we apply graph encoding to these graphs to obtain the positive embedding pair $(g^1, g^2)$ and the negative embedding pairs $\{(g^1, g_j^-)\}_{j=1}^{k}$.
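As an illustration of the encoding and readout steps, a single GCN-style propagation layer with a mean readout might look as follows (a simplified sketch; real encoders such as GCN, GAT, or GraphSAGE add more machinery, and a DP-perturbed weighted adjacency would additionally need its degrees checked for positivity before normalisation):

```python
import numpy as np

def gcn_layer(adj, x, weight):
    """One propagation step: symmetrically normalise the adjacency (with
    self-loops), aggregate neighbour features, then apply a linear map + ReLU."""
    a = adj + np.eye(adj.shape[0])                     # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a.sum(axis=1)))
    a_norm = d_inv_sqrt @ a @ d_inv_sqrt
    return np.maximum(a_norm @ x @ weight, 0.0)

def mean_readout(h):
    """Summarise node embeddings H into a single graph embedding g."""
    return h.mean(axis=0)
```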
(6) Learning objectives. The objective of graph contrastive learning can be roughly summarized as maximizing the agreement between the positive pair and minimizing the agreement between negative pairs. We choose InfoNCE, which is widely used by many works [26, 44], to serve as the objective function for graph contrastive learning:
$$\mathcal{L}_{\mathrm{CL}} = -\log \frac{\exp(s(g^1, g^2)/\tau)}{\exp(s(g^1, g^2)/\tau) + \sum_{j=1}^{k} \exp(s(g^1, g_j^-)/\tau)},$$ where $s(\cdot, \cdot)$ is the score function measuring the similarity between two graph embeddings, such as cosine similarity, and $\tau$ serves as the temperature hyper-parameter. Moreover, considering scenarios where the training data has labels, we adopt the cross-entropy function to introduce supervision signals into the model training. Before that, we need a classifier to predict graph labels according to the obtained graph embeddings: $$\hat{y} = \mathrm{Classifier}(g).$$
Then, we apply the cross-entropy function: $$\mathcal{L}_{\mathrm{CE}} = -\sum_{c} y_c \log \hat{y}_c.$$
For a single client and the training batch $B_j$, we have the overall training objective:
$$\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \lambda \mathcal{L}_{\mathrm{CL}},$$ where $\lambda$ is the hyper-parameter controlling the weight of the contrastive learning term.
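The contrastive and overall objectives can be sketched numerically as follows, using cosine similarity as the score function $s$ (a NumPy sketch, not the training code):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity, used here as the score function s(., .)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def info_nce(anchor, positive, negatives, tau=1.0):
    """InfoNCE loss for one positive pair and a list of negative embeddings."""
    pos = np.exp(cosine(anchor, positive) / tau)
    neg = sum(np.exp(cosine(anchor, n) / tau) for n in negatives)
    return -np.log(pos / (pos + neg))

def total_loss(ce_loss, cl_loss, lam=0.1):
    """Overall objective: classification loss plus the weighted contrastive term."""
    return ce_loss + lam * cl_loss
```

An anchor that closely matches its positive view and differs from the negatives yields a lower loss, which is exactly the agreement-maximizing behaviour described above.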
Each client in the federated learning system follows the training protocol above to conduct graph contrastive learning locally. By leveraging the advantages of graph contrastive learning, we hope to distill critical structural semantics and maintain invariant graph representations after introducing noises, so as to acquire high-quality graph embeddings for downstream tasks.
III-D Global Parameter Update
| Hyper-parameter | Definition | Value(s) |
| $N$ | The maximum number of negative samples stored in the negative sample stack. | 100 |
| $k$ | The number of negative samples used in each graph contrastive learning training round. | [1, 5, 10, 20, 30, 40, 50] |
| $\epsilon$ | Privacy budget. | [0.1, 1, 10, 100] |
| $\lambda$ | The hyper-parameter that controls the weight of graph contrastive learning in the overall training objective. | [0.001, 0.01, 0.1, 1] |
| $\tau$ | The temperature hyper-parameter in the contrastive learning objective. | 1 |

| Dataset | # Graphs | Avg. # Nodes | Avg. # Edges | Avg. # Degree | # Classes |
When the training phase finishes on each client, the updated local parameters are uploaded to the server. Then, the server adopts a federated learning algorithm to aggregate them and update the global model. For the sake of simplicity, we consider the most widely used algorithm, FedAvg, in this paper.
Specifically, the server collects uploaded parameter updates and averages them to acquire the global updates. Let $\theta_t$ and $\theta_{t+1}$ denote the global parameters at time $t$ and $t+1$, and let $\theta_{t+1}^i$ indicate the updated parameters of the local model stored on client $i$. Assuming that there are $M$ clients in total and $m$ clients are sampled in each round, denoted as a set $\mathcal{S}$, to participate in the training, the process of FedAvg can be formulated as: $$\theta_{t+1} = \frac{1}{m} \sum_{i \in \mathcal{S}} \theta_{t+1}^i.$$
After the whole training process ends, the model can be used to conduct inference. Note that there is no difference between the training and inference process. Moreover, we summarize the whole training procedure in Algorithm 1 to better illustrate the proposed methodology.
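The aggregation step can be sketched as a plain equal-weight FedAvg over the sampled clients, matching the averaging described above (a sketch; parameter dicts mapping names to arrays are our own convention):

```python
import numpy as np

def fedavg(local_params):
    """Average each parameter tensor over the sampled clients' uploads to
    obtain the new global parameters."""
    return {k: np.mean([p[k] for p in local_params], axis=0)
            for k in local_params[0]}
```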
Detailed experimental settings are listed for reproducibility at the beginning of this section, followed by introductions to the datasets and base models adopted in the experiments. Then, we conduct comprehensive experiments and give the corresponding analysis in this section, providing a detailed view of the proposed FGCL method.
IV-A Experimental Settings
| Fed+Noise / 100 | 0.6050 | 0.6162 | 0.8003 | 0.6632 | 0.5317 |
| Fed+Noise / 10 | 0.5884 | 0.6245 | 0.7598 | 0.6577 | 0.5303 |
| Fed+Noise / 1 | 0.5636 | 0.6246 | 0.7061 | 0.6185 | 0.5327 |
| Fed+Noise / 0.1 | 0.5374 | 0.6666 | 0.7091 | 0.6036 | 0.5323 |
| FGCL+Noise / 100 | 0.6285 (+3.88%) | 0.6697 (+8.68%) | 0.6740 (-15.78%) | 0.7035 (+6.08%) | 0.6675 (+25.54%) |
| FGCL+Noise / 10 | 0.6334 (+7.65%) | 0.6633 (+6.21%) | 0.6999 (-7.88%) | 0.6915 (+5.14%) | 0.6697 (+24.17%) |
| FGCL+Noise / 1 | 0.6062 (+7.56%) | 0.6665 (+6.71%) | 0.7917 (+12.12%) | 0.6980 (+12.85%) | 0.6470 (+21.47%) |
| FGCL+Noise / 0.1 | 0.5549 (+3.26%) | 0.6858 (+2.88%) | 0.8378 (+18.15%) | 0.7044 (+16.70%) | 0.5961 (+11.99%) |

| Fed+Noise / 100 | 0.6035 | 0.6296 | 0.8108 | 0.6526 | 0.5499 |
| Fed+Noise / 10 | 0.5817 | 0.647 | 0.7502 | 0.6097 | 0.5349 |
| Fed+Noise / 1 | 0.5663 | 0.6206 | 0.8160 | 0.5906 | 0.5380 |
| Fed+Noise / 0.1 | 0.5345 | 0.6291 | 0.7421 | 0.5920 | 0.5334 |
| FGCL+Noise / 100 | 0.6140 (+1.74%) | 0.6826 (+8.42%) | 0.7037 (-13.21%) | 0.7314 (+12.07%) | 0.6720 (+22.20%) |
| FGCL+Noise / 10 | 0.6223 (+6.98%) | 0.6803 (+5.15%) | 0.7034 (-6.24%) | 0.7264 (+19.14%) | 0.6615 (+23.67%) |
| FGCL+Noise / 1 | 0.6025 (+6.39%) | 0.6789 (+9.39%) | 0.7093 (-13.01%) | 0.6817 (+15.42%) | 0.6535 (+21.47%) |
| FGCL+Noise / 0.1 | 0.5287 (-1.09%) | 0.6433 (+2.26%) | 0.8433 (+17.86%) | 0.6356 (+7.36%) | 0.6352 (+7.36%) |

| Fed+Noise / 100 | 0.5509 | 0.6539 | 0.8812 | 0.6208 | 0.5335 |
| Fed+Noise / 10 | 0.5482 | 0.6324 | 0.7529 | 0.5808 | 0.5416 |
| Fed+Noise / 1 | 0.5543 | 0.6297 | 0.7915 | 0.6370 | 0.5405 |
| Fed+Noise / 0.1 | 0.5551 | 0.5839 | 0.7700 | 0.6173 | 0.5253 |
| FGCL+Noise / 100 | 0.6174 (+12.07%) | 0.7091 (+8.44%) | 0.9034 (+2.52%) | 0.7005 (+12.84%) | 0.5592 (+4.82%) |
| FGCL+Noise / 10 | 0.6369 (+16.18%) | 0.6744 (+6.64%) | 0.7923 (+5.23%) | 0.7044 (+21.28%) | 0.5463 (+0.87%) |
| FGCL+Noise / 1 | 0.5619 (+1.37%) | 0.6977 (+10.80%) | 0.7387 (-6.67%) | 0.6755 (+6.04%) | 0.5342 (-1.17%) |
| FGCL+Noise / 0.1 | 0.5721 (+3.06%) | 0.6720 (+15.09%) | 0.5978 (-22.36%) | 0.6726 (+8.96%) | 0.5359 (+2.02%) |

| Fed+Noise / 100 | 0.5916 | 0.6863 | 0.7785 | 0.6632 | 0.5350 |
| Fed+Noise / 10 | 0.5942 | 0.6619 | 0.6882 | 0.6832 | 0.5324 |
| Fed+Noise / 1 | 0.5747 | 0.6157 | 0.7938 | 0.5826 | 0.5358 |
| Fed+Noise / 0.1 | 0.5610 | 0.5952 | 0.6719 | 0.6017 | 0.5385 |
| FGCL+Noise / 100 | 0.6331 (+7.01%) | 0.6898 (+0.51%) | 0.8716 (+11.96%) | 0.7229 (+9.00%) | 0.6324 (+18.21%) |
| FGCL+Noise / 10 | 0.6335 (+10.23%) | 0.7093 (+7.16%) | 0.7291 (+5.94%) | 0.7138 (+4.48%) | 0.6396 (+20.14%) |
| FGCL+Noise / 1 | 0.6016 (+4.68%) | 0.6870 (+11.58%) | 0.8027 (+1.12%) | 0.7450 (+27.88%) | 0.6474 (+20.22%) |
| FGCL+Noise / 0.1 | 0.5678 (+1.21%) | 0.6826 (+14.68%) | 0.7882 (+17.31%) | 0.6547 (+8.81%) | 0.5823 (+8.13%) |
The proposed FGCL focuses on federated graph learning. To implement the proposed method, a widely used toolkit named FedGraphNN (https://github.com/FedML-AI/FedGraphNN) is adopted as the backbone of our implementation. It provides various APIs and exemplars to build federated graph learning models quickly. FGCL shares the same training protocols and common hyper-parameters, including hidden dimensions, number of GNN layers, and learning rate, with FedGraphNN. Only the graph classification task serves as the evaluation task in our experiments. The detailed settings can be found in [12, 10].
Nevertheless, it is worth noting that some hyper-parameters are solely involved in the graph contrastive learning module and need to be specified for reproducibility. These hyper-parameters' definitions and values can be found in Table I.
IV-B Datasets and Baselines
We choose several widely used and well-known graph datasets to conduct experiments, as follows:
BACE provides qualitative binding results (binary labels) for a set of inhibitors of human beta-secretase 1.
ClinTox compares drugs approved by the FDA with drugs that have failed clinical trials for toxicity reasons. It assigns a binary label to each drug molecule to indicate whether it has clinical trial toxicity.
BBBP is designed for the modeling and prediction of blood-brain barrier permeability. It assigns binary labels to drug molecules to indicate whether they can penetrate the blood-brain barrier.
Tox21 (https://tripod.nih.gov/tox21/challenge/) contains many graphs representing chemical compounds. Each compound has 12 labels reflecting its outcomes in 12 different toxicological experiments.
The statistics of the selected datasets are summarized in Table II.
Since the performance of federated graph learning may vary with different graph encoders, we adopt several widely used GNN models as graph encoders to comprehensively examine the proposed method. Moreover, we leverage the toolkit PyTorch Geometric (https://www.pyg.org/) to efficiently implement high-quality GNN models.
Note that Section III-B mentioned that the protected proximity information is stored on edge weights. As a consequence, we only select GNNs in PyTorch Geometric that are capable of handling edge weights, which are as follows:
GCN is one of the earliest GNNs, implementing a localized first-order approximation of spectral graph convolutions.
kGNNs can take higher-order graph structures at multiple scales into account, overcoming shortcomings of conventional GNNs.
LightGCN is a simplified version of GCN that is more concise and appropriate for recommendation, including only the most essential components of GCN.
IV-C Experimental Results
In this part, a detailed and comprehensive analysis of experimental results is given to illustrate the properties of the proposed method and justify FGCL’s effectiveness.
IV-C1 How much does graph structural differential privacy degrade federated graph learning?
Each base model has three rows in Table III, which are experimental results of centralized and federated settings, federated settings with differential privacy, and the proposed federated graph contrastive learning. In this part, we examine the first two rows of the experimental results.
First, let us compare the experimental results between the centralized and federated settings. We notice that the centralized setting outperforms the federated setting on all the datasets with any base model. Such a phenomenon is verified by many research works [12, 10, 38]. The reason is that the federated learning protocol distributes the training data to different devices and updates the global model based on the gradients or parameters passed from the local devices, which makes it more difficult for the model to find the global optimum.
Then, random noises are introduced to achieve DP on graph edges. The corresponding results are shown in the second row. In most cases, the ROC-AUC scores of the federated setting with DP privacy preservation are lower than those of the pure federated setting. This is reasonable: the introduced noises protect privacy, but they undermine the semantics in the graph, resulting in performance decreases. Moreover, different privacy budgets are tried in the experiments. Theoretically, a larger privacy budget means smaller noise, indicating that the federated setting with a larger privacy budget should perform better than one with a smaller privacy budget. However, the experimental results do not strictly follow such a pattern because the noises are randomly generated; the pattern would only be revealed after sufficiently many rounds of experiments.
In summary, federated graph learning addresses some limitations of the centralized training protocol, but it sacrifices the performance of the model. Moreover, introducing privacy-preserving mechanisms, such as differential privacy, into federated learning further decreases the performance. Hence, it is critical for the research community to explore solutions to alleviate such performance dropping.
IV-C2 Can graph contrastive learning help to alleviate the performance dropping caused by graph structure differential privacy?
As mentioned in the last section, introducing the DP mechanism to inject noises to protect privacy decreases the performance of federated graph learning. In this paper, we propose leveraging the advantages of graph contrastive learning to acquire semantic robustness against noises via contrasting perturbed graph views. The experimental results of the proposed method are also listed in Table III, in the third row. Overall, though the proposed FGCL is not competitive with the centralized setting, it outperforms the federated setting in some cases and is better than the federated setting with DP privacy preservation in most cases. The improvement is generally between 3% and 15%, but we notice that in some cases there is no improvement or the improvement is exceptionally high, such as more than 20%. First, the non-improvement cases mainly occur in the experiments on the dataset ClinTox. We observe that the performances of the centralized setting, the federated setting, and the federated setting with DP noises on ClinTox are very high and much better than those on other datasets. Therefore, the room for improvement on ClinTox is smaller than on the others, and FGCL may not work well when the room for improvement is limited. Second, some exceptionally significant improvements appear in the experimental results on the dataset Tox21. We note that, on Tox21, the performance decrease is the most severe, around -35%, after implementing the federated setting and the federated setting with the DP mechanism. The significant gap between the performances provides sufficient room for improvement, which indicates that FGCL is more powerful when the performance drops dramatically after implementing federated learning and the DP mechanism. Moreover, it is worth noting that there is a particular case: the base model TAG. The improvement brought by FGCL with TAG as the base model on the dataset Tox21 is minor.
This is due to the nature of TAG. In the centralized setting, TAG performs worse than the other base models, indicating that TAG is not as expressive as the others. We also find that the overall performance of FGCL with TAG as the base model is the worst among all base models, as illustrated in the next section. Hence, we believe an appropriate base model is one of the keys to fully leveraging the advantages of the proposed FGCL.
IV-C3 Hyper-parameter study for federated graph contrastive learning.
Three hyper-parameter experiments are conducted in this section to fully reveal the properties of the proposed FGCL. The first concerns the number of negative samples in the graph contrastive learning module. Negative samples have been verified to play a critical role in the model training process. Here, we fix all other hyper-parameters and vary only the number of negative samples to investigate its impact; the experimental results are illustrated in Figure 4. The first observation is that selecting a good number of negative samples can further enhance the performance of FGCL and achieve greater improvement. However, no obvious pattern indicates how to conduct this selection, and the choice is highly task- or dataset-specific. For instance, the dataset BACE requires FGCL to use a relatively large number of negative samples for better performance, while a relatively small number is enough for the dataset Tox21. In summary, the number of negative samples is highly task- or dataset-specific and should be carefully selected; for the sake of computational efficiency, choosing a small value when possible is preferable.
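The role of the negative-sample count can be made concrete with a minimal InfoNCE-style loss, a standard form of the contrastive objective. This is an illustrative sketch, not FGCL's exact objective; the function names and the temperature value are our own.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors (plain lists)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def info_nce(anchor, positive, negatives, tau=0.5):
    """InfoNCE-style loss for one anchor embedding: pull the positive
    view close while pushing the sampled negatives away. The length of
    `negatives` is the negative-sample count studied above."""
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))
```

Adding more negatives sharpens the discrimination task (and raises the loss for the same anchor-positive pair), which is consistent with the observation that the best count is dataset-specific and that small counts are cheaper to compute.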
The second hyper-parameter experiment concerns the weight of the contrastive learning objective. The overall training objective of FGCL consists of two parts, a graph classification loss and a contrastive learning loss, where graph classification is the primary task, and this weight balances the two objectives. We evaluate several candidate values, and the corresponding experimental results are shown in Figure 5. According to the results, the impact of this weight on performance is not as significant as that of the number of negative samples. Nevertheless, we observe that FGCL with a small weight can achieve slightly higher ROC-AUC scores. When the weight is too large, the contrastive learning objective undermines the importance of the primary task, graph classification, in the overall objective, resulting in suboptimal performance on the graph classification task. Therefore, we recommend adopting a relatively small value to achieve better results in practice.
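The balance described above can be written compactly. Since the original weight symbol was lost in extraction, we write it here as $\lambda$ (a placeholder of our choosing):

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{cls}} + \lambda \, \mathcal{L}_{\text{con}},
\qquad \lambda > 0,
```

where $\mathcal{L}_{\text{cls}}$ is the graph classification loss and $\mathcal{L}_{\text{con}}$ the contrastive loss; a small $\lambda$ keeps classification as the dominant term, matching the recommendation above.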
The third experiment concerns the privacy budget, which controls how much privacy leakage can be tolerated; in other words, the privacy budget determines how much noise will be introduced. Note that, to obtain a pair of positive contrasting views, we need to augment the original graph twice, each time with its own privacy budget. In this experiment, we study the performance of FGCL under different combinations of the two privacy budgets. Several candidate values are evaluated for each budget, yielding a set of budget combinations on each dataset with a selected base model. Figure 6 shows the experimental results. We notice that the overall performances are similar to what Table III reflects. However, careful fine-tuning of this hyper-parameter can yield better outcomes than the results listed in Table III, even on the dataset ClinTox. Moreover, no regular pattern reveals how to choose the combination of the two budgets. Hence, this hyper-parameter should be selected according to the requirements of privacy protection.
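How the budget controls the noise level can be sketched with randomized response, a standard mechanism for edge-level DP: each off-diagonal adjacency entry is flipped with probability 1/(1 + e^epsilon). This is a sketch of the general mechanism, not necessarily the paper's exact perturbation procedure; the function name and seeding scheme are ours.

```python
import math
import random

def perturb_edges(adj, epsilon, seed=0):
    """Edge-level DP augmentation via randomized response: flip each
    off-diagonal entry of the (symmetric, 0/1) adjacency matrix with
    probability 1 / (1 + e^epsilon). A larger privacy budget epsilon
    gives a smaller flip probability, i.e. less noise but weaker
    protection."""
    rng = random.Random(seed)
    p_flip = 1.0 / (1.0 + math.exp(epsilon))
    noisy = [row[:] for row in adj]
    n = len(adj)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p_flip:
                noisy[i][j] = noisy[j][i] = 1 - noisy[i][j]
    return noisy

# Two perturbations with their own budgets form a positive contrasting pair:
# view_1 = perturb_edges(adj, eps_1, seed=1); view_2 = perturb_edges(adj, eps_2, seed=2)
```

With a large budget the perturbed graph is almost identical to the original, while a small budget flips nearly half the entries, which is why the two budgets trade privacy against the quality of the contrasting views.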
V Related Work
V-A Graph Contrastive Learning
Graph contrastive learning (GCL) has emerged as a powerful tool for graph representation learning. Deep Graph Infomax (DGI) is one of the first works that introduced the concept of contrastive learning [7, 13, 33] into the graph learning domain; it adopts mutual information maximization as the learning objective to conduct contrastive learning between a graph and its corrupted instance. Subsequent works borrowed the same idea with different contrasting samples. For example, GCC selects graph instances from different datasets to construct contrasting pairs, GraphCL proposes using graph augmentation methods to do so, and MVGRL and DSGC generate multiple views to serve as the contrasting pairs. The success of GCL is revealed by its wide application in real-world scenarios, including recommender systems [43, 47, 45] and smart healthcare [21, 22, 39, 40].
V-B Federated Graph Learning
Federated graph learning (FGL) lies at the intersection of graph neural networks (GNNs) and federated learning (FL); it leverages the advantages of FL to address limitations in the graph learning domain and has achieved notable success in many scenarios. A representative application of FGL is molecular learning, which helps different institutions collaborate efficiently to train models on the small-molecule graphs stored at each institution without transferring their confidential data to a centralized server [11, 42]. Moreover, FGL is also applied in recommender systems [38, 41], social network analysis, and the internet of things. Various toolkits help researchers quickly build their own FGL models, such as TensorFlow Federated (https://www.tensorflow.org/federated) and PySyft. However, these toolkits do not provide graph datasets, benchmarks, or high-level APIs for implementing FGL. He et al. [12, 10] developed FedGraphNN, the framework used in this paper, which focuses on FGL: it provides comprehensive and high-quality graph datasets, convenient high-level APIs, and tailored graph learning settings, which should facilitate research on FGL.
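The collaboration pattern described above is typically realized with the FedAvg aggregation rule: the server averages client model parameters weighted by local dataset size, so raw graph data never leaves the clients. The following is a minimal sketch under that assumption; the function name and flat-parameter representation are ours, not FedGraphNN's API.

```python
def fedavg(client_weights, client_sizes):
    """Server-side FedAvg aggregation. `client_weights` is a list of
    flat parameter lists (one per client); `client_sizes` holds each
    client's local dataset size. Returns the size-weighted average of
    every parameter."""
    total = sum(client_sizes)
    num_params = len(client_weights[0])
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(num_params)
    ]
```

In each round, clients train locally, upload parameters, and download this aggregate; only model weights, never graphs, cross the network.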
VI Conclusion
This paper proposes FGCL, the first work on graph contrastive learning in federated scenarios. We observe the similarity between differential privacy on graph edges and graph augmentations in graph contrastive learning, and innovatively adopt graph contrastive learning to help the model obtain robustness against the noise introduced by the DP mechanism. According to comprehensive experimental results, the proposed FGCL successfully alleviates the performance decrease caused by the noise introduced by the DP mechanism.
Acknowledgment
This work is supported by the Australian Research Council (ARC) under Grant No. DP220103717, LE220100078, LP170100891 and DP200101374, and is partially supported by the APRC - CityU New Research Initiatives (No. 9610565, Start-up Grant for the New Faculty of the City University of Hong Kong), the SIRG - CityU Strategic Interdisciplinary Research Grant (No. 7020046, No. 7020074), and the CCF-Tencent Open Fund.
References
- (2019) Differential privacy has disparate impact on model accuracy. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019), pp. 15453–15462.
- (2017) Topology adaptive graph convolutional networks. CoRR abs/1710.10370.
- (2006) Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography (TCC 2006), Lecture Notes in Computer Science, Vol. 3876, pp. 265–284.
- (2014) The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9 (3-4), pp. 211–407.
- (2016) A data-driven approach to predicting successes and failures of clinical trials. Cell Chemical Biology 23 (10), pp. 1294–1301.
- (2006) Dimensionality reduction by learning an invariant mapping. pp. 1735–1742.
- (2017) Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems 30 (NeurIPS 2017), pp. 1024–1034.
- (2020) Contrastive multi-view representation learning on graphs. In Proceedings of the 37th International Conference on Machine Learning (ICML 2020), PMLR Vol. 119, pp. 4116–4126.
- (2021) FedGraphNN: a federated learning system and benchmark for graph neural networks. CoRR abs/2104.07145.
- (2021) SpreadGNN: serverless multi-task federated learning for graph neural networks. CoRR abs/2106.02743.
- (2020) FedML: a research library and benchmark for federated machine learning. CoRR abs/2007.13518.
- (2020) Momentum contrast for unsupervised visual representation learning. In CVPR 2020, pp. 9726–9735.
- (2020) LightGCN: simplifying and powering graph convolution network for recommendation. In SIGIR 2020, pp. 639–648.
- (2021) Differential privacy in privacy-preserving big data and learning: challenge and opportunity. In Silicon Valley Cybersecurity Conference (SVCC 2021), Communications in Computer and Information Science, Vol. 1536, pp. 33–44.
- (2021) Applications of differential privacy in social network analysis: a survey. IEEE Transactions on Knowledge and Data Engineering, pp. 1–1.
- (2022) Gromov-Wasserstein discrepancy with local differential privacy for distributed structural graphs. CoRR abs/2202.00808.
- (2011) Private analysis of graph structure. Proc. VLDB Endow. 4 (11), pp. 1146–1157.
- (2017) Semi-supervised classification with graph convolutional networks. In ICLR 2017.
- (2015) The SIDER database of drugs and side effects. Nucleic Acids Research 44 (D1), pp. D1075–D1079.
- (2021) GeomGCL: geometric graph contrastive learning for molecular property prediction. CoRR abs/2109.11730.
- (2021) GraphCDR: a graph neural network method with contrastive learning for cancer drug response prediction. Briefings in Bioinformatics 23 (1), bbab457.
- (2017) Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS 2017), PMLR Vol. 54, pp. 1273–1282.
- (2019) Weisfeiler and Leman go neural: higher-order graph neural networks. In AAAI 2019, pp. 4602–4609.
- (2014) DeepWalk: online learning of social representations. In KDD '14, pp. 701–710.
- (2020) GCC: graph contrastive coding for graph neural network pre-training. In KDD '20, pp. 1150–1160.
- (2021) Contrastive learning with hard negative samples. In ICLR 2021.
- (2018) A generic framework for privacy preserving deep learning. CoRR abs/1811.04017.
- (2022) The application of in silico methods for prediction of blood-brain barrier permeability of small molecule PET tracers. Frontiers in Nuclear Medicine 2.
- (2016) Computational modeling of β-secretase 1 (BACE-1) inhibitors using ligand based approaches. Journal of Chemical Information and Modeling 56 (10), pp. 1936–1949. PMID: 27689393.
- (2020) InfoGraph: unsupervised and semi-supervised graph-level representation learning via mutual information maximization. In ICLR 2020.
- (2015) LINE: large-scale information network embedding. In WWW 2015, pp. 1067–1077.
- (2020) Contrastive multiview coding. In ECCV 2020, Lecture Notes in Computer Science, Vol. 12356, pp. 776–794.
- (2018) Representation learning with contrastive predictive coding. CoRR abs/1807.03748.
- (2018) Graph attention networks. In ICLR 2018.
- (2019) Deep graph infomax. In ICLR 2019.
- (2018) Not just privacy: improving performance of private deep learning in mobile cloud. In KDD 2018, pp. 2407–2416.
- (2021) Fast-adapting and privacy-preserving federated recommender system. CoRR abs/2104.00919.
- (2021) Multi-view graph contrastive representation learning for drug-drug interaction prediction. In WWW '21, pp. 2921–2933.
- (2021) MolCLR: molecular contrastive learning of representations via graph neural networks. CoRR abs/2102.10056.
- (2021) FedGNN: federated graph neural network for privacy-preserving recommendation. CoRR abs/2102.04925.
- (2021) Federated graph classification over non-IID graphs. In Advances in Neural Information Processing Systems 34 (NeurIPS 2021), pp. 18839–18852.
- (2021) Hyper meta-path contrastive learning for multi-behavior recommendation. In ICDM 2021, pp. 787–796.
- (2022) Dual space graph contrastive learning. CoRR abs/2201.07409.
- (2022) Knowledge graph contrastive learning for recommendation. CoRR abs/2205.00976.
- (2020) Graph contrastive learning with augmentations. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020).
- (2021) Are graph augmentations necessary? Simple graph contrastive learning for recommendation. arXiv.
- (2021) Subgraph federated learning with missing neighbor generation. In Advances in Neural Information Processing Systems 34 (NeurIPS 2021), pp. 6671–6682.
- (2021) ASFGNN: automated separated-federated graph neural network. Peer-to-Peer Netw. Appl. 14 (3), pp. 1692–1704.
- (2021) Graph contrastive learning with adaptive augmentation. In WWW '21, pp. 2069–2080.