I Introduction
With the rapid development of the internet, online activities produce massive amounts of data in various forms, such as text, images, and graphs. Among these, graph data has recently attracted increasing attention from the research community due to its powerful capability to illustrate complex relations among different entities [19, 35]. To acquire the informative semantics maintained in graphs, researchers have developed a series of graph representation learning methods, including graph embedding methods [25, 32] and graph neural networks (GNNs) [19, 35]. Usually, the basic versions of such methods need to be trained on massive training data in a centralized manner. However, such a setting is prohibitive in real-world scenarios due to privacy concerns, commercial competition, and increasingly strict regulations and laws [12, 10], such as GDPR (https://gdpr-info.eu/) and CCPA (https://oag.ca.gov/privacy/ccpa). To address this, some researchers have introduced federated learning (FL) into the graph learning domain [10]. FL is a distributed learning paradigm that overcomes this limitation by training graph models on local devices and incorporating various privacy-preserving mechanisms, for example, differential privacy on graphs [16]. Differential privacy (DP) [3] is one of the most widely used privacy-preserving methods with a strong guarantee [5]. The basic idea of DP is to introduce a controlled level of noise into query results to perturb the outcome of comparing two adjacent datasets. There are several perspectives for implementing DP in the graph domain, such as node privacy, edge privacy, out-link privacy, and partition privacy [15], among which edge privacy provides meaningful protection in many real-world applications and is more widely used [16]. Therefore, we mainly focus on edge-level DP in this paper. The advantages of DP lie in that it can be easily achieved and has a solid theoretical guarantee. It has been proved that introducing noise following specific Laplacian or Gaussian distributions achieves the designated level of privacy protection according to the Laplacian or Gaussian mechanism [5]. Such a property means that the DP mechanism is robust against any post-processing and would never be compromised by any algorithm [4].

Although the differential privacy mechanism in federated graph learning secures the classified information maintained in graphs and has theoretically proven advantages, it degrades the performance of graph learning models [1]. There are few research works on alleviating such performance dropping. The representative work is that of J. Wang et al. [37], who proposed to generate and inject specific perturbed data and train the model on both clean and perturbed data to help the model gain robustness to the DP noise. However, two significant limitations exist in adopting this strategy in federated graph learning. First, this strategy is designed for deep neural networks (DNNs) and image classification tasks; it cannot be directly used to address graph learning and GNN problems. Second, the model training requires clean data. If DP processes the data uploaded from edge devices, the model stored on the cloud server can never acquire clean data to fulfil such a training process. Hence, we need new solutions for federated graph learning.
The intuition behind addressing the performance sacrifice in [37] is to obtain robustness to the noises introduced by DP, which can lead us to new solutions. Note that the DP mechanism on graph edges introduces noises to perturb the proximity of the graph. Such an operation meets the definition of edge perturbation, one of the graph augmentations in [46]. Therefore, the DP mechanism on graph edges can be regarded as a kind of graph augmentation. Moreover, we notice that Y. You et al. [46] proposed to leverage graph contrastive learning in cooperation with graph augmentations to help the graph learning model obtain robustness against such noises. Inspired by this finding, we propose to utilise graph contrastive learning to learn from the graph processed by the DP mechanism. Graph contrastive learning (GCL) [46, 31, 36, 44, 9], a trending graph learning paradigm, leverages contrastive learning concepts to acquire critical semantics between augmented graphs and is robust to the noises brought by augmentations [46]. Besides edge perturbation, there are other graph augmentations, such as node dropping, attribute masking, and subgraph sampling [46]. GCL is an empirically proven graph learning technique that has achieved SOTA performance on various graph learning tasks [46, 50] and has been successfully applied to many real-world scenarios [43, 39].
Nevertheless, applying GCL in federated graph learning is challenging and not straightforward. First, noises introduced by DP follow random distributions, which means that privacy is not protected by simply deleting or adding edges in the graph; instead, the probability that a link exists between two nodes is perturbed. To cooperate with graph learning settings, we need to convert the graph into a fully connected graph and use the perturbed probabilities to denote the edge weights. Second, to follow real-world settings, many federated frameworks train models in a streaming manner, which means that the training batch size is 1. Consequently, we cannot follow the settings in the current GCL literature, which collect negative samples from the same training batch [46, 44], to fulfil the contrastive learning protocol. In this paper, specifically, we maintain a stack with a limited size on each device to store previously trained graphs and randomly extract samples from the stack to serve as negative samples in each training round. More details can be found in the methodology part of this paper.

In summary, the contributions of this paper are listed as follows:

We propose a method to convert a graph to a table, which enables current DP methods to achieve privacy preservation on graphs.

This paper is the first to propose a federated graph contrastive learning method (FGCL). The proposed method can serve as a plugin for federated graph learning with any graph encoder.

We conduct extensive experiments to show that contrastive learning can alleviate the performance dropping caused by the noise of the DP mechanism.
II Preliminaries
This section gives the background knowledge related to the research of this paper, namely differential privacy and graph contrastive learning.
II-A Differential Privacy
Differential privacy (DP) is a mechanism tailored to privacy-preserving data analysis [37], designed to defend against differential attacks [4]. The superiority of the DP mechanism is that it has a theoretically proven guarantee to control how much information is leaked during a malicious attack [15]. Such a property means that the DP mechanism is immune to any post-processing and cannot be compromised by any algorithm [4]. In other words, if the DP mechanism protects a dataset, the privacy loss in this dataset during a malicious attack will never exceed the given threshold, even under the most advanced and sophisticated attack methods.
Formally, differential privacy is defined as follows:
Definition 1
[4] A randomized query algorithm M satisfies ε-differential privacy, iff for any datasets D and D′ that differ on a single element and any output S of M, it holds that

Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S].   (1)

In the definition above, the datasets D and D′ that differ by only one single data item are named adjacent datasets, whose definition is task-specific. For example, in graph edge privacy protection, a graph G denotes a dataset, and the graph G′ is G's adjacent dataset if G′ is derived from G via adding or deleting an edge. The parameter ε in the definition denotes the privacy budget [4], which controls how much privacy leakage by M can be tolerated. A lower privacy budget indicates less tolerance to information leakage, that is, a higher privacy guarantee.
A general method to convert a deterministic query function f into a randomized query function M is to add random noise to the output of f. The noise is generated by a specific random distribution calibrated to the privacy budget ε and Δf, the global sensitivity of f, which is defined as the maximal value of ||f(D) − f(D′)||₁ over all adjacent datasets D and D′. In this paper, for simplicity, we consider the Laplacian mechanism to achieve DP [4]:

M(D) = f(D) + η,   (2)

where η ∼ Lap(Δf/ε), and Lap(·) denotes the Laplacian distribution.
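As a concrete illustration of the Laplacian mechanism, the following sketch (the function and variable names are our own, not from any particular library) perturbs a query answer with noise drawn from Lap(Δf/ε):

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Perturb a query answer with Laplace noise of scale sensitivity/epsilon."""
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon
    return true_answer + rng.laplace(loc=0.0, scale=scale)

# A counting query has sensitivity 1: adding or removing one record
# changes the count by at most 1.
rng = np.random.default_rng(0)
noisy_count = laplace_mechanism(42.0, sensitivity=1.0, epsilon=0.5, rng=rng)
```

Note how a smaller privacy budget ε yields a larger noise scale Δf/ε and hence stronger protection.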
II-B Graph Augmentations for Graph Contrastive Learning
Graph contrastive learning (GCL) [36, 26, 46, 50, 44] has emerged as a delicate tool for learning high-quality graph representations. Graph augmentation techniques contribute substantially to the success of GCL, playing a critical role in the GCL process, and many researchers have devoted their efforts to this part. GraphCL [46] is one of the most impactful works in the GCL domain, giving a comprehensive and insightful analysis of graph augmentations; it summarized four basic graph augmentation methods and their corresponding underlying priors:

Node Dropping. Deleting a small portion of vertices does not change the semantics of a graph.

Edge Perturbation. Graphs have semantic robustness to scale-limited proximity variations.

Attribute Masking. Graphs have semantic robustness to losing partial attributes on nodes, edges, or graphs.

Subgraph Sampling. Local subgraphs can hint at the entire semantics of the original graph.
In summary, the intuition behind graph augmentations is to introduce noises into the graph, and the augmented graph is then utilised in the following contrastive learning phase to help the model learn the semantic robustness of the original graph. Such an intuition provides researchers with a potentially feasible way to bridge graph-level privacy preservation and graph augmentations or GCL.
The graph augmentations mentioned above are randomly implemented, achieving suboptimal performance in GCL practice [50]. GCA [50] is an improved version of GraphCL, which conducts adaptive augmentations to achieve better performance. Besides the aforementioned noise-based graph augmentations, there are other methods, such as MVGRL [9] and DSGC [44], which alleviate semantic compromise by avoiding the introduction of noises when implementing graph augmentations.
III Methodology
This section gives the details of the proposed method.
III-A Overview
An overview of the proposed FGCL method is demonstrated in Fig 1. There are five steps to conduct graph contrastive learning in the context of federated learning:
(1) Download global model. The initial parameters of the graph encoder and the classifier are stored on the server. Clients must first download these parameters before local training.
(2) Introduce noises into graphs for DP. Each graph in the current training batch on every client is processed twice by the Laplacian mechanism to introduce noises into the graph proximity, obtaining two augmented views for graph contrastive learning.
(3) Update local model via training. Following the training protocols of graph contrastive learning, the model’s parameters downloaded from the server will be updated.
(4) Upload updated parameters. Clients involved in the current training round upload the locally updated parameters to the server for global model updating.
(5) FedAvg & update the global model. The server utilises FedAvg algorithm [23] to aggregate the updated local parameters and update the global parameters. Specifically, the server collects uploaded parameter updates and averages them to acquire the global updates.
III-B Graph Edge Differential Privacy
Some research works adopted differential privacy to introduce noises into the node or graph embeddings to protect privacy [17]. However, such a strategy can be problematic in scenarios lacking initial features. It is therefore essential to adopt DP methods on graph structures (e.g., node privacy and edge privacy [18]). Here, we convert graph data to tabular data to utilise current differential privacy methods, such as the Laplacian mechanism and the Gaussian mechanism. Without loss of generality, we solely consider the Laplacian mechanism in this paper.
Differential privacy methods, like the Laplacian mechanism, originate from tabular data [15]. The key to adapting such methods to preserve graph structural privacy is how to convert graph data to tabular data and how to define 'adjacent graphs' [16]. Our method sets a pair of node indices as the key and the connectivity between these two nodes as the value to form a tabular dataset. As to 'adjacent graphs', given two graphs G = (V, E) and G′ = (V′, E′), they are 'adjacent graphs' if and only if V = V′ and the edge sets E and E′ differ by exactly one edge [16]; an illustrative example is shown in Fig 2. Let f denote the lookup query function on the tables generated from 'adjacent graphs'. To defend against differential attacks, we need to introduce noises into the lookup results f(G) + η, where η is Laplacian random noise. Specifically, f(G) + Lap(Δf/ε) satisfies ε-differential privacy, where Δf is the sensitivity and ε is the privacy budget [5].
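The graph-to-table conversion described above can be sketched as follows (a minimal illustration with our own function name; keys are unordered node pairs and values are binary connectivity):

```python
import itertools

def graph_to_table(num_nodes, edges):
    """Flatten a graph into a key-value table: (i, j) -> connectivity in {0, 1}.

    Every unordered node pair becomes a row, so the table has a fixed
    schema regardless of which edges actually exist in the graph.
    """
    edge_set = {tuple(sorted(e)) for e in edges}
    return {
        (i, j): 1 if (i, j) in edge_set else 0
        for i, j in itertools.combinations(range(num_nodes), 2)
    }

# A three-node graph where node 0 is connected to nodes 1 and 2.
table = graph_to_table(3, [(0, 1), (0, 2)])
# table == {(0, 1): 1, (0, 2): 1, (1, 2): 0}
```

Adding or deleting one edge flips exactly one value in this table, which is precisely the adjacent-dataset relation required by DP.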
From a graph structural perspective, introducing noises to the values in the table generated from a graph is equivalent to a perturbation on the proximity of the graph. Let A denote the adjacency matrix of the graph G; then the perturbed adjacency matrix would be:

Â = A + E,   (3)

where E_{ij} ∼ Lap(Δf/ε). Privacy preservation sacrifices performance due to the noises introduced into the graph. However, such a process can be regarded as a graph augmentation, specifically one categorized as edge perturbation. The underlying prior of edge perturbation is semantic robustness against connectivity variations [46]. Given augmented views of the same graph, the aim of graph contrastive learning is to maximize the agreement among these views via a contrastive loss function, such as InfoNCE [34], to enforce the graph encoder to acquire representations invariant to the graph augmentations [46]. Graph contrastive learning can help to improve performance on various graph learning tasks, which has been empirically proved by several works [46, 26]. Therefore, we can leverage the advantages of graph contrastive learning to alleviate the performance dropping while preserving privacy. Moreover, it is worth noting that the entries in the adjacency matrix are binary and cannot be the decimals derived from the original entries via adding noises. So, in practice, we convert the original graph into a fully connected graph and store the protected proximity information in the edge weights instead of the adjacency matrix.

To better understand the graph structural privacy preservation in this section, we give an illustrative example, shown in Fig 2. Assume that there is a social network consisting of three participants. At the beginning, the first participant has connections with the other two. Then, a new connection is built between the latter two participants. If a malicious attacker initiates a differential attack on this social network in two different time slots, he or she will become aware that the relationship between these two participants has changed, which could be private information. To defend against such attacks, we introduce noise to perturb the values, in which case the attacker would be confused about whether two participants are connected, even after single or several differential attacks. Because the expectation of the introduced noise is zero, obtaining a precise lookup result would require a large number of queries and averaging of the lookup results in a short period of time, which is infeasible for the malicious attacker.
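A minimal sketch of this edge-weight perturbation, assuming an undirected graph (the symmetrisation of the noise matrix is our own simplification, not a detail specified by the method):

```python
import numpy as np

def perturb_adjacency(adj, sensitivity, epsilon, rng=None):
    """Return a fully connected weighted view of the graph: every entry of
    the adjacency matrix is perturbed with Laplace noise, so the real-valued
    edge weights (not binary edges) carry the protected proximity."""
    rng = rng or np.random.default_rng()
    n = adj.shape[0]
    noise = rng.laplace(0.0, sensitivity / epsilon, size=(n, n))
    noise = np.triu(noise, k=1)
    noise = noise + noise.T          # keep the perturbation symmetric
    return adj + noise

adj = np.array([[0., 1., 1.],
                [1., 0., 0.],
                [1., 0., 0.]])
# Two independent perturbations give the two augmented views used by GCL.
view_1 = perturb_adjacency(adj, sensitivity=1.0, epsilon=1.0)
view_2 = perturb_adjacency(adj, sensitivity=1.0, epsilon=1.0)
```

Applying the function twice with independent noise yields the two augmented views contrasted later during training.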
III-C Graph Contrastive Learning with Augmented Graphs
Performance dropping is unavoidable once noises are introduced into the graph to perturb its structure [1]. However, graph contrastive learning techniques [26, 46, 31, 36, 44, 9, 50] can facilitate invariant representation learning and identify critical structural semantics for augmented graphs, supported by rigorous theoretical and empirical analysis [46, 50]. Because the process for achieving graph structural differential privacy introduced in the previous section meets the definition of graph augmentation, we can utilise graph contrastive learning to acquire invariant representations of the augmented graphs and relieve the performance dropping brought by the introduced noises.
We first consider the scenario of conducting graph contrastive learning for augmented graphs on a single client. As shown in Fig 3, the whole process can be roughly broken down into six parts:

(1) Training batch partitioning. Every client possesses a graph dataset D locally. It is unlikely for the local model to train on all of the data simultaneously. Following the practical experience widely adopted by most machine learning methods, we first need to determine the batch size b to partition the entire local dataset into training batches B_1, B_2, ..., each of size b. Moreover, according to many graph contrastive learning methods [46, 44, 26], having negative contrasting samples to formulate negative pairs is mandatory to fulfil the graph contrastive learning process. We could follow the settings adopted by [46, 44] to sample negative contrasting samples from the same training batch as the positive sample. However, if the model on the local device is trained in a streaming manner, in which case b = 1, such a method no longer works.

(2) Applying DP to graph edges. Laplacian random noises are introduced here to perturb the graph proximity to achieve differential privacy. This process can be regarded as a graph augmentation from the perspective of graph contrastive learning. For a graph G in the current batch with adjacency matrix A, we apply DP to the adjacency matrix twice to acquire two augmented views G′ and G″ of the original graph, with adjacency matrices A′ and A″, respectively. The privacy budgets for producing the two augmented views could be different. Intuitively, graph contrastive learning could benefit more from two more distinguishable views (e.g., produced with different privacy budgets) via maximizing the agreement between the augmented views. We will examine this with comprehensive experiments.
(3) Maintaining a stack to store negative samples. To address the limitation of traditional methods mentioned in the first part, we propose to maintain a size-fixed stack to store negative samples. Specifically, at the beginning of the training process, we initialize a stack whose size is N. Then, we select the last k instances, where k ≤ N, in the graph dataset and insert them into the stack. These instances serve as the negative samples of the target graph in the first training round. When a training round is finished, the target graph has two perturbed views, G′ and G″, one of which is inserted into the stack. Once the number of elements in the stack exceeds N, the oldest element is popped out. In all following training rounds, all negative samples are drawn from this stack.
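The size-fixed stack described above behaves like a first-in-first-out buffer; a minimal sketch (the class and method names are our own, not part of the method's specification):

```python
from collections import deque
import random

class NegativeSampleBuffer:
    """Fixed-size store of previously seen graphs (the "stack" in the text).

    When full, the oldest element is discarded; negatives are drawn
    uniformly at random from the stored elements each training round.
    """
    def __init__(self, max_size):
        self.buffer = deque(maxlen=max_size)  # oldest element auto-evicted

    def push(self, graph):
        self.buffer.append(graph)

    def sample(self, k):
        # Sample with replacement so k negatives are available even when
        # the buffer holds fewer than k distinct graphs.
        return random.choices(list(self.buffer), k=k)

buf = NegativeSampleBuffer(max_size=3)
for g in ["g1", "g2", "g3", "g4"]:
    buf.push(g)
# "g1" has been evicted; only the 3 most recent graphs remain.
negatives = buf.sample(k=2)
```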
(4) Contrasting samples coupling. Though some details about acquiring negative and positive samples were introduced in the previous two steps, we give a formal description of contrasting sample coupling here. Given a target graph G, we apply DP to G to obtain its two augmented views, G′ and G″. We couple these two views as the positive pair, between which the agreement will be maximized. For negative samples, we follow the settings adopted by [46, 44]. Specifically, we sample k negative graph instances, which should have labels different from that of the target graph. The sampled negative instances may be duplicated in case the number of graphs whose labels differ from that of the target graph is less than k. One augmented view of each negative sample is coupled with one augmented view of the target graph; without loss of generality, this yields a set of k negative pairs, between each of which the agreement will be minimized.
(5) Graph encoding. After obtaining a series of graphs, the next step is to encode them to acquire high-quality graph embeddings. Various graph encoders have been proposed, such as GNN models; the representative GNN models studied in this paper are detailed in Section IV-B2. Let f_enc denote the graph encoder, A indicate the proximity, and X ∈ R^{n×d₀} be the node feature matrix of graph G, where n is the number of nodes in G and d₀ denotes the dimension of the initial node features. We can obtain updated node embeddings via feeding the adjacency matrix and the initial feature matrix into the graph encoder:

H = f_enc(A, X) ∈ R^{n×d},   (4)

where d indicates the hidden dimension. Then, a readout function is applied to summarize the node embeddings into the graph embedding:

h_G = READOUT(H).   (5)

There are many choices for the readout function, and the decision could vary among different downstream tasks. To obtain the embeddings of the selected positive pair and negative pairs, we apply graph encoding to these graphs, yielding the positive embedding pair (h′_G, h″_G) and the negative embedding pairs {(h′_G, h⁻_i)}, i = 1, ..., k.
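For instance, mean pooling is one common readout choice (a minimal sketch, not tied to any particular library):

```python
import numpy as np

def mean_readout(node_embeddings):
    """Summarize node embeddings H (n x d) into one graph embedding (d,).
    Mean pooling is one common choice; sum or max pooling also work."""
    return node_embeddings.mean(axis=0)

# Three nodes with 2-dimensional embeddings.
H = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
h_g = mean_readout(H)
# h_g == [3.0, 4.0]
```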
(6) Learning objectives. The objective of graph contrastive learning can be roughly summarized as maximizing the agreement within the positive pair and minimizing the agreement within the negative pairs. We choose InfoNCE [34], which is widely used by many works [26, 44], to serve as the objective function for graph contrastive learning:

L_cl = −log [ exp(s(h′_G, h″_G)/τ) / ( exp(s(h′_G, h″_G)/τ) + Σ_{i=1}^{k} exp(s(h′_G, h⁻_i)/τ) ) ],   (6)

where s(·, ·) is the score function measuring the similarity between two graph embeddings, such as cosine similarity, and τ serves as the temperature hyperparameter. Moreover, considering scenarios where the training data has labels, we adopt the cross-entropy function to introduce supervision signals into the model training. Before that, we need a classifier to predict graph labels according to the obtained graph embeddings:

z = Classifier(h_G),   (7)

ŷ = softmax(z).   (8)

Then, we apply the cross-entropy function:

L_ce = −Σ_c y_c log ŷ_c.   (9)

For a single client and the training batch B_i, we have the overall training objective:

L = L_ce + λ · L_cl,   (10)

where λ is the hyperparameter controlling the weight of the contrastive learning term.
Each client in the federated learning system follows the training protocol above to conduct graph contrastive learning locally. By leveraging the advantages of graph contrastive learning, we hope to distill critical structural semantics and maintain invariant graph representations after introducing noises, thereby acquiring high-quality graph embeddings for downstream tasks.
III-D Global Parameter Update
TABLE I: Hyperparameters of the graph contrastive learning module.

Notation  Definition  Value

N  The maximum number of negative samples stored in the negative sample stack.  100

k  The number of negative samples used in each graph contrastive learning training round.  [1, 5, 10, 20, 30, 40, 50]

ε  Privacy budget.  [0.1, 1, 10, 100]

λ  The hyperparameter that controls the weight of graph contrastive learning in the overall training objective.  [0.001, 0.01, 0.1, 1]

τ  The temperature hyperparameter in the contrastive learning objective.  1
TABLE II: Statistics of the selected datasets.

Dataset  # Graphs  Avg. # Nodes  Avg. # Edges  Avg. Degree  # Classes

SIDER  1427  33.64  35.36  2.10  27
BACE  1513  34.12  36.89  2.16  2
ClinTox  1478  26.13  27.86  2.13  2
BBBP  2039  24.05  25.94  2.16  2
Tox21  7831  18.51  25.94  2.80  12
When the training phase finishes on each client, the updated local parameters are uploaded to the server. Then, the server adopts a federated learning algorithm to aggregate them and update the global model. For the sake of simplicity, we consider the most widely used algorithm, FedAvg [23], in this paper.

Specifically, the server collects the uploaded parameter updates and averages them to acquire the global update. Let θ^t and θ^{t+1} denote the global parameters at time t and t+1, and θ_c^t indicate the updated parameters of the local model stored on client c. Assume that there are M clients in total and that m clients, denoted as a set S_t, are sampled to participate in the training in each round. The process of FedAvg can be formulated as:

θ^{t+1} = (1/m) · Σ_{c ∈ S_t} θ_c^t.   (11)
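The FedAvg aggregation above amounts to a plain element-wise mean over the sampled clients' parameters; a minimal sketch (the function name is our own):

```python
import numpy as np

def fedavg(client_params):
    """Average parameter updates from sampled clients: the plain mean of
    each named parameter tensor across the clients, as described above."""
    return {
        name: np.mean([params[name] for params in client_params], axis=0)
        for name in client_params[0]
    }

# Two clients upload locally updated values for the same two tensors.
updates = [
    {"w": np.array([1.0, 2.0]), "b": np.array([0.0])},
    {"w": np.array([3.0, 4.0]), "b": np.array([2.0])},
]
new_global = fedavg(updates)
# new_global["w"] == [2.0, 3.0], new_global["b"] == [1.0]
```

Note that this sketch uses an unweighted mean, matching the averaging described here; weighting clients by local dataset size is another common FedAvg variant.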
III-E Summary
After the whole training process ends, the model can be used to conduct inference. Note that there is no difference between the training and inference process. Moreover, we summarize the whole training procedure in Algorithm 1 to better illustrate the proposed methodology.
IV Experiments
Detailed experimental settings are listed for reproducibility at the beginning of this section, followed by introductions to the datasets and base models adopted in the experiments. Then, we conduct comprehensive experiments and give the corresponding analysis in this section, providing a detailed view of the proposed FGCL method.
IV-A Experimental Settings
TABLE III: Experimental results (ROC-AUC scores) on the graph classification task.

Base Models  Settings  SIDER  BACE  ClinTox  BBBP  Tox21
GCN  Centralized  0.6637  0.8154  0.9227  0.8214  0.7990 
Federated  0.6055  0.6373  0.8309  0.6576  0.5338  
Fed+Noise / 100  0.6050  0.6162  0.8003  0.6632  0.5317  
Fed+Noise / 10  0.5884  0.6245  0.7598  0.6577  0.5303  
Fed+Noise / 1  0.5636  0.6246  0.7061  0.6185  0.5327  
Fed+Noise / 0.1  0.5374  0.6666  0.7091  0.6036  0.5323  
FGCL+Noise / 100  0.6285 (+3.88%)  0.6697 (+8.68%)  0.6740 (-15.78%)  0.7035 (+6.08%)  0.6675 (+25.54%)  
FGCL+Noise / 10  0.6334 (+7.65%)  0.6633 (+6.21%)  0.6999 (-7.88%)  0.6915 (+5.14%)  0.6697 (+24.17%)  
FGCL+Noise / 1  0.6062 (+7.56%)  0.6665 (+6.71%)  0.7917 (+12.12%)  0.6980 (+12.85%)  0.6470 (+21.47%)  
FGCL+Noise / 0.1  0.5549 (+3.26%)  0.6858 (+2.88%)  0.8378 (+18.15%)  0.7044 (+16.70%)  0.5961 (+11.99%)  
kGNNs  Centralized  0.6785  0.8912  0.9277  0.8541  0.7687 
Federated  0.6033  0.6131  0.8787  0.6371  0.5310  
Fed+Noise / 100  0.6035  0.6296  0.8108  0.6526  0.5499  
Fed+Noise / 10  0.5817  0.6470  0.7502  0.6097  0.5349  
Fed+Noise / 1  0.5663  0.6206  0.8160  0.5906  0.5380  
Fed+Noise / 0.1  0.5345  0.6291  0.7421  0.5920  0.5334  
FGCL+Noise / 100  0.6140 (+1.74%)  0.6826 (+8.42%)  0.7037 (-13.21%)  0.7314 (+12.07%)  0.6720 (+22.20%)  
FGCL+Noise / 10  0.6223 (+6.98%)  0.6803 (+5.15%)  0.7034 (-6.24%)  0.7264 (+19.14%)  0.6615 (+23.67%)  
FGCL+Noise / 1  0.6025 (+6.39%)  0.6789 (+9.39%)  0.7093 (-13.01%)  0.6817 (+15.42%)  0.6535 (+21.47%)  
FGCL+Noise / 0.1  0.5287 (-1.09%)  0.6433 (+2.26%)  0.8433 (+17.86%)  0.6356 (+7.36%)  0.6352 (+7.36%)  
TAG  Centralized  0.6276  0.7397  0.9392  0.7642  0.6052 
Federated  0.5552  0.6286  0.8865  0.6535  0.5360  
Fed+Noise / 100  0.5509  0.6539  0.8812  0.6208  0.5335  
Fed+Noise / 10  0.5482  0.6324  0.7529  0.5808  0.5416  
Fed+Noise / 1  0.5543  0.6297  0.7915  0.6370  0.5405  
Fed+Noise / 0.1  0.5551  0.5839  0.7700  0.6173  0.5253  
FGCL+Noise / 100  0.6174 (+12.07%)  0.7091 (+8.44%)  0.9034 (+2.52%)  0.7005 (+12.84%)  0.5592 (+4.82%)  
FGCL+Noise / 10  0.6369 (+16.18%)  0.6744 (+6.64%)  0.7923 (+5.23%)  0.7044 (+21.28%)  0.5463 (+0.87%)  
FGCL+Noise / 1  0.5619 (+1.37%)  0.6977 (+10.80%)  0.7387 (-6.67%)  0.6755 (+6.04%)  0.5342 (-1.17%)  
FGCL+Noise / 0.1  0.5721 (+3.06%)  0.6720 (+15.09%)  0.5978 (-22.36%)  0.6726 (+8.96%)  0.5359 (+2.02%)  
LightGCN  Centralized  0.6866  0.8385  0.9269  0.7829  0.7734 
Federated  0.5933  0.6741  0.8098  0.6930  0.5379  
Fed+Noise / 100  0.5916  0.6863  0.7785  0.6632  0.5350  
Fed+Noise / 10  0.5942  0.6619  0.6882  0.6832  0.5324  
Fed+Noise / 1  0.5747  0.6157  0.7938  0.5826  0.5358  
Fed+Noise / 0.1  0.5610  0.5952  0.6719  0.6017  0.5385  
FGCL+Noise / 100  0.6331 (+7.01%)  0.6898 (+0.51%)  0.8716 (+11.96%)  0.7229 (+9.00%)  0.6324 (+18.21%)  
FGCL+Noise / 10  0.6335 (+10.23%)  0.7093 (+7.16%)  0.7291 (+5.94%)  0.7138 (+4.48%)  0.6396 (+20.14%)  
FGCL+Noise / 1  0.6016 (+4.68%)  0.6870 (+11.58%)  0.8027 (+1.12%)  0.7450 (+27.88%)  0.6474 (+20.22%)  
FGCL+Noise / 0.1  0.5678 (+1.21%)  0.6826 (+14.68%)  0.7882 (+17.31%)  0.6547 (+8.81%)  0.5823 (+8.13%) 
The proposed FGCL focuses on federated graph learning. To implement the proposed method, a widely used toolkit named FedGraphNN (https://github.com/FedML-AI/FedGraphNN) is adopted to serve as the backbone of our implementation. It provides various APIs and exemplars to build federated graph learning models quickly. FGCL shares the same training protocols and common hyperparameters, including hidden dimensions, number of GNN layers, and learning rate, with FedGraphNN, and only the graph classification task serves as the measurement in our experiments. The detailed settings can be found in [12, 10].
Nevertheless, it is worth noting that some hyperparameters are solely involved in the graph contrastive learning module and need to be specified for reproducibility. These hyperparameters' definitions and values can be found in Table I.
IV-B Datasets and Baselines
IV-B1 Datasets
We choose several widely used graph datasets to conduct experiments, which are as follows:

SIDER (http://sideeffects.embl.de/) [20] contains information about medicines and their observed adverse drug reactions. This information includes drug classification.

BACE [30] provides qualitative binding results (binary labels) for a set of inhibitors of human beta-secretase 1.

ClinTox [6] compares drugs approved by the FDA with drugs that have failed clinical trials for toxicity reasons. It assigns a binary label to each drug molecule to indicate whether it has clinical trial toxicity.

BBBP [29] is designed for modeling and predicting blood-brain barrier permeability. It allocates binary labels to drug molecules to indicate whether they can penetrate the blood-brain barrier.

Tox21 (https://tripod.nih.gov/tox21/challenge/) contains many graphs representing chemical compounds. Each compound has 12 labels reflecting its outcomes in 12 different toxicological experiments.
The statistics of the selected datasets are summarized in Table II.
IV-B2 Baselines
Since the performance of federated graph learning may vary with different graph encoders, we adopt several widely used GNN models as graph encoders to comprehensively examine the proposed method. Moreover, we leverage the toolkit PyTorch Geometric (https://www.pyg.org/) to efficiently implement high-quality GNN models.
Note that Section III-B mentioned that the protected proximity information is stored in edge weights. As a consequence, we only select GNNs in PyTorch Geometric that are capable of handling edge weights, which are as follows:

GCN [19] is one of the first proposed GNNs, which implements a localized firstorder approximation of spectral graph convolutions.

kGNNs [24] can take higherorder graph structures at multiple scales into account, which overcomes the shortcomings existing in conventional GNNs.

TAG [2] is a novel graph convolutional network defined in the vertex domain, which alleviates the computational complexity of spectral graph convolutional neural networks.

LightGCN [14] is a simplified version of GCN that includes only the most essential components of GCN, making it more concise and appropriate for recommendation.
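Since the DP-perturbed graphs are fully connected with real-valued edge weights, each of these encoders must propagate over a dense weight matrix rather than binary edges. A single message-passing step of this kind might look as follows (a simplified sketch with our own names and normalisation choices, not the API of PyTorch Geometric or of any of the models above):

```python
import numpy as np

def weighted_gcn_layer(adj_weights, features, weight_matrix):
    """One GCN-style propagation step over a dense, weighted adjacency
    matrix: add self-loops, symmetrically normalise, transform, activate.
    Taking abs() of the degrees is our simplification for DP-perturbed
    weights, which may be negative."""
    a_hat = adj_weights + np.eye(adj_weights.shape[0])  # self-loops
    deg = np.abs(a_hat).sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg + 1e-8))
    return np.tanh(d_inv_sqrt @ a_hat @ d_inv_sqrt @ features @ weight_matrix)

rng = np.random.default_rng(0)
weights = rng.laplace(0.0, 1.0, size=(4, 4))
weights = (weights + weights.T) / 2          # symmetric perturbed proximity
features = rng.normal(size=(4, 3))
out = weighted_gcn_layer(weights, features, rng.normal(size=(3, 8)))
# out has shape (4, 8): one embedding row per node
```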
IV-C Experimental Results
In this part, a detailed and comprehensive analysis of experimental results is given to illustrate the properties of the proposed method and justify FGCL’s effectiveness.
IV-C1 How much does graph structural differential privacy degrade federated graph learning?
Each base model has three groups of rows in Table III: the experimental results of the centralized and federated settings, the federated setting with differential privacy, and the proposed federated graph contrastive learning. In this part, we examine the first two groups of results.
First, let us compare the experimental results between the centralized and federated settings. We notice that the centralized setting outperforms the federated setting on all datasets with any base model. Such a phenomenon has been verified by many research works [12, 10, 38]. The reason is that the federated learning protocol distributes the training data across different devices and updates the global model based on the gradients or parameters passed from the local devices, which makes it more difficult for the model to find the global optimum.
Then, random noises are introduced to achieve DP on graph edges. The corresponding results are shown in the second group of rows. In most cases, the ROC-AUC scores of the federated setting with DP privacy preservation are lower than those of the pure federated setting. This is reasonable: the introduced noises protect privacy, but they undermine the semantics in the graph, resulting in performance decreases. Moreover, different privacy budgets are tried in the experiments. Theoretically, a larger privacy budget means smaller noise, indicating that the federated setting with a larger privacy budget should perform better than one with a smaller budget. However, the experimental results do not strictly follow such a pattern, because the noises are randomly generated; the pattern would only be revealed after a sufficient number of experimental rounds.
In summary, federated graph learning addresses some limitations of the centralized training protocol, but it sacrifices model performance. Moreover, introducing privacy-preserving mechanisms such as differential privacy into federated learning decreases the performance further. Hence, it is critical for the research community to explore solutions that alleviate such performance drops.
IV-C2 Can graph contrastive learning help to alleviate the performance drop caused by graph structural differential privacy?
As mentioned in the last section, introducing the DP mechanism to import noises that protect privacy decreases the performance of federated graph learning. In this paper, we propose leveraging the advantages of graph contrastive learning to acquire semantic robustness against noises via contrasting perturbed graph views. The experimental results of the proposed method are listed in the third row of Table III. Overall, although the proposed FGCL is not competitive with the centralized setting, it outperforms the federated setting in some cases and the federated setting with DP protection in most cases. The improvement is generally between 3% and 15%, but we notice that in some cases there is no improvement, or the improvement is exceptionally high, such as more than 20%. First, the non-improvement cases mainly occur in the experiments on the dataset ClinTox. We observe that the performances of the centralized setting, the federated setting, and the federated setting with DP noises on ClinTox are very high and much better than those on other datasets. Therefore, the room for improvement on ClinTox is smaller than on the others, and our proposed FGCL may not work well when the room for improvement is limited. Then, some exceptionally significant improvements appear in the experimental results on the dataset Tox21. We note that, on Tox21, the performance decrease is the most severe, around 35%, after implementing the federated setting and the federated setting with the DP mechanism. The significant gap between the performances provides sufficient room for improvement, which indicates FGCL is more powerful when the performance drops dramatically after implementing federated learning and the DP mechanism. Moreover, it is worth noting a particular case: the base model TAG. The improvement brought by FGCL with TAG as the base model on the dataset Tox21 is minor.
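The contrast between two DP-perturbed views can be sketched with an InfoNCE-style objective [34]: embeddings of the same graph under two independent perturbations form a positive pair, while other graphs in the batch serve as negatives. The function `info_nce`, the temperature `tau`, and the in-batch negative scheme below are illustrative assumptions, not FGCL's exact loss:

```python
import numpy as np

def info_nce(z1: np.ndarray, z2: np.ndarray, tau: float = 0.5) -> float:
    """InfoNCE-style loss between two batches of graph embeddings.
    z1[i] and z2[i] embed the same graph under two independent DP
    perturbations (positive pair); all other rows act as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                        # pairwise cosine similarities
    sim = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))    # maximize diagonal agreement
```

Minimizing this loss pulls the two perturbed views of each graph together in embedding space, which is the robustness-to-noise effect the method relies on.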
This is due to the nature of TAG. In the centralized setting, TAG performs worse than the other base models, indicating that TAG is not as expressive as the others. We also find that the overall performance of FGCL with TAG as the base model is the worst among all the base models, which will be illustrated in the next section. Hence, we think an appropriate base model is one of the keys to fully leveraging the advantages of the proposed FGCL.
IV-C3 Hyperparameter study for federated graph contrastive learning.
Three hyperparameter experiments are conducted in this section to fully reveal the properties and details of the proposed FGCL. The first concerns the number of negative samples in the graph contrastive learning module. Negative samples are verified to play a critical role in model training [27]. Here, we fix all the other hyperparameters to investigate the impact of the number of negative samples; the experimental results over the candidate values are illustrated in Figure 4. The first observation is that selecting a good number of negative samples can further enhance the performance of FGCL and achieve greater improvement. However, no obvious pattern indicates how to conduct this selection, and the task is highly task- or dataset-specific. For instance, the dataset BACE requires FGCL to use a relatively large number of negative samples for better performance, whereas a relatively small number is enough for the dataset Tox21. In summary, the number of negative samples is highly task- or dataset-specific and should be carefully selected; for the sake of computational efficiency, it is better to select a small number when possible.
The second hyperparameter experiment concerns the parameter that controls the weight of the contrastive learning objective. The overall training objective of FGCL consists of two parts, the graph classification loss and the contrastive learning loss, where graph classification is the primary task; this weight balances the two objectives. The corresponding experimental results over the candidate values are shown in Figure 5. According to the results, the impact of the contrastive weight on performance is not as significant as that of the number of negative samples. Nevertheless, we observe that FGCL with a smaller weight can achieve slightly higher ROC-AUC scores. When the weight is large, the contrastive learning objective undermines the importance of the primary task, graph classification, in the overall objective, resulting in suboptimal performance on the graph classification task. Therefore, we recommend adopting a relatively small weight to achieve better results in practice.
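In symbols, denoting the balancing weight by $\lambda$ (our notation, since the original symbol is not reproduced in this text), the overall objective described above plausibly takes the form:

```latex
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{cls}} \;+\; \lambda\, \mathcal{L}_{\mathrm{con}},
```

so a small $\lambda$ keeps graph classification dominant while still injecting the contrastive regularization.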
The third experiment concerns the privacy budget, which controls how much privacy leakage can be tolerated; in other words, the privacy budget determines how much noise will be introduced. Note that, to obtain a pair of positive contrasting views, we need to augment the original graph twice, each time with its own privacy budget. In this experiment, we study the performance of FGCL under different combinations of the two budgets, drawn from a set of candidate values, on each dataset with a selected base model. Figure 6 shows the experimental results. We notice that the overall performances are similar to what Table III reflects. However, careful fine-tuning of this hyperparameter can give FGCL better outcomes than the results listed in Table III, and this even happens on the dataset ClinTox. Moreover, no regular pattern reveals how to choose the combination of the two budgets. Hence, this hyperparameter should be selected according to the requirements of privacy protection.
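One point worth keeping in mind when choosing the two budgets (written here as $\varepsilon_1$ and $\varepsilon_2$, our notation) is the standard sequential composition property of DP [4]: releasing both perturbed views of the same graph consumes the sum of the budgets,

```latex
\mathcal{M}_{\varepsilon_1}(G) \ \text{and}\ \mathcal{M}_{\varepsilon_2}(G)
\ \text{together satisfy}\ (\varepsilon_1 + \varepsilon_2)\text{-edge-level DP},
```

so the pair of augmentations should be budgeted jointly, not independently.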
V Related Work
V-A Graph Contrastive Learning
Graph contrastive learning (GCL) has emerged as a powerful tool for graph representation learning. Deep Graph Infomax (DGI) [36] is one of the first research works that introduced the concept of contrastive learning [7, 13, 33] into the graph learning domain; it adopts mutual information maximization as the learning objective to conduct contrastive learning between a graph and its corrupted instance. Subsequent works borrow the same idea with different contrasting samples. For example, GCC [26] selects graph instances from different datasets to construct contrasting pairs, GraphCL [46] proposes using graph augmentation methods to do so, and MVGRL [9] and DSGC [44] generate multiple views to serve as the contrasting pairs. The success of GCL is revealed by its wide applications in real-world scenarios, including recommender systems [43, 47, 45] and smart medicine [21, 22, 39, 40].
V-B Federated Graph Learning
Federated graph learning (FGL) lies at the intersection of graph neural networks (GNNs) and federated learning (FL); it leverages the advantages of FL to address limitations in the graph learning domain and has achieved great success in many scenarios. A representative application of FGL is molecular learning [10], which helps different institutions collaborate efficiently to train models on the small-molecule graphs stored in each institution without transferring their classified data to a centralized server [11, 42]. Moreover, FGL is also applied in recommender systems [38, 41], social network analysis [48], and the Internet of Things [49]. Various toolkits help researchers quickly build their own FGL models, such as TensorFlow Federated (https://www.tensorflow.org/federated) and PySyft [28]. However, these toolkits do not provide graph datasets, benchmarks, or high-level APIs for implementing FGL. He et al. [12, 10] developed a framework named FedGraphNN, which is used in this paper, focusing on FGL. It provides comprehensive, high-quality graph datasets, convenient high-level APIs, and tailored graph learning settings, which should facilitate research on FGL.
VI Conclusion
This paper proposes FGCL, the first work on graph contrastive learning in federated scenarios. We observe the similarity between differential privacy on graph edges and graph augmentations in graph contrastive learning, and innovatively adopt graph contrastive learning to help the model obtain robustness against the noises introduced by the DP mechanism. According to the comprehensive experimental results, the proposed FGCL successfully alleviates the performance decrease caused by the noises introduced by the DP mechanism.
Acknowledgment
This work is supported by the Australian Research Council (ARC) under Grant No. DP220103717, LE220100078, LP170100891 and DP200101374, and is partially supported by the APRC - CityU New Research Initiatives (No. 9610565, Start-up Grant for New Faculty of the City University of Hong Kong), the SIRG - CityU Strategic Interdisciplinary Research Grant (No. 7020046, No. 7020074), and the CCF-Tencent Open Fund.
References
 [1] (2019) Differential privacy has disparate impact on model accuracy. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019), December 8-14, 2019, Vancouver, BC, Canada, pp. 15453–15462.
 [2] (2017) Topology adaptive graph convolutional networks. CoRR abs/1710.10370.
 [3] (2006) Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, Third Theory of Cryptography Conference (TCC 2006), New York, NY, USA, March 4-7, 2006, Lecture Notes in Computer Science, Vol. 3876, pp. 265–284.
 [4] (2014) The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9 (3-4), pp. 211–407.
 [5] (2014) The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9 (3-4), pp. 211–407.
 [6] (2016) A data-driven approach to predicting successes and failures of clinical trials. Cell Chemical Biology 23 (10), pp. 1294–1301.
 [7] (2006) Dimensionality reduction by learning an invariant mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006), June 17-22, 2006, New York, NY, USA, pp. 1735–1742.
 [8] (2017) Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems 30 (NeurIPS 2017), December 4-9, 2017, Long Beach, CA, USA, pp. 1024–1034.
 [9] (2020) Contrastive multi-view representation learning on graphs. In Proceedings of the 37th International Conference on Machine Learning (ICML 2020), July 13-18, 2020, Virtual Event, PMLR Vol. 119, pp. 4116–4126.
 [10] (2021) FedGraphNN: a federated learning system and benchmark for graph neural networks. CoRR abs/2104.07145.
 [11] (2021) SpreadGNN: serverless multi-task federated learning for graph neural networks. CoRR abs/2106.02743.
 [12] (2020) FedML: a research library and benchmark for federated machine learning. CoRR abs/2007.13518.
 [13] (2020) Momentum contrast for unsupervised visual representation learning. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), Seattle, WA, USA, June 13-19, 2020, pp. 9726–9735.
 [14] (2020) LightGCN: simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020), Virtual Event, China, July 25-30, 2020, pp. 639–648.
 [15] (2021) Differential privacy in privacy-preserving big data and learning: challenge and opportunity. In Silicon Valley Cybersecurity Conference (SVCC 2021), San Jose, CA, USA, December 2-3, 2021, Communications in Computer and Information Science, Vol. 1536, pp. 33–44.
 [16] (2021) Applications of differential privacy in social network analysis: a survey. IEEE Transactions on Knowledge and Data Engineering, pp. 1–1.
 [17] (2022) Gromov-Wasserstein discrepancy with local differential privacy for distributed structural graphs. CoRR abs/2202.00808.
 [18] (2011) Private analysis of graph structure. Proc. VLDB Endow. 4 (11), pp. 1146–1157.
 [19] (2017) Semi-supervised classification with graph convolutional networks. In 5th International Conference on Learning Representations (ICLR 2017), Toulon, France, April 24-26, 2017.
 [20] (2015) The SIDER database of drugs and side effects. Nucleic Acids Research 44 (D1), pp. D1075–D1079.
 [21] (2021) GeomGCL: geometric graph contrastive learning for molecular property prediction. CoRR abs/2109.11730.
 [22] (2021) GraphCDR: a graph neural network method with contrastive learning for cancer drug response prediction. Briefings in Bioinformatics 23 (1), bbab457.
 [23] (2017) Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS 2017), April 20-22, 2017, Fort Lauderdale, FL, USA, PMLR Vol. 54, pp. 1273–1282.
 [24] (2019) Weisfeiler and Leman go neural: higher-order graph neural networks. In The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019), Honolulu, Hawaii, USA, January 27 - February 1, 2019, pp. 4602–4609.
 [25] (2014) DeepWalk: online learning of social representations. In The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '14), New York, NY, USA, August 24-27, 2014, pp. 701–710.
 [26] (2020) GCC: graph contrastive coding for graph neural network pre-training. In The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '20), Virtual Event, CA, USA, August 23-27, 2020, pp. 1150–1160.
 [27] (2021) Contrastive learning with hard negative samples. In 9th International Conference on Learning Representations (ICLR 2021), Virtual Event, Austria, May 3-7, 2021.
 [28] (2018) A generic framework for privacy preserving deep learning. CoRR abs/1811.04017.
 [29] (2022) The application of in silico methods for prediction of blood-brain barrier permeability of small molecule PET tracers. Frontiers in Nuclear Medicine 2.
 [30] (2016) Computational modeling of β-secretase 1 (BACE-1) inhibitors using ligand based approaches. Journal of Chemical Information and Modeling 56 (10), pp. 1936–1949.
 [31] (2020) InfoGraph: unsupervised and semi-supervised graph-level representation learning via mutual information maximization. In 8th International Conference on Learning Representations (ICLR 2020), Addis Ababa, Ethiopia, April 26-30, 2020.
 [32] (2015) LINE: large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web (WWW 2015), Florence, Italy, May 18-22, 2015, pp. 1067–1077.
 [33] (2020) Contrastive multiview coding. In Computer Vision - ECCV 2020, 16th European Conference, Glasgow, UK, August 23-28, 2020, Lecture Notes in Computer Science, Vol. 12356, pp. 776–794.
 [34] (2018) Representation learning with contrastive predictive coding. CoRR abs/1807.03748.
 [35] (2018) Graph attention networks. In 6th International Conference on Learning Representations (ICLR 2018), Vancouver, BC, Canada, April 30 - May 3, 2018.
 [36] (2019) Deep graph infomax. In 7th International Conference on Learning Representations (ICLR 2019), New Orleans, LA, USA, May 6-9, 2019.
 [37] (2018) Not just privacy: improving performance of private deep learning in mobile cloud. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD 2018), London, UK, August 19-23, 2018, pp. 2407–2416.
 [38] (2021) Fast-adapting and privacy-preserving federated recommender system. CoRR abs/2104.00919.
 [39] (2021) Multi-view graph contrastive representation learning for drug-drug interaction prediction. In The Web Conference 2021 (WWW '21), Virtual Event / Ljubljana, Slovenia, April 19-23, 2021, pp. 2921–2933.
 [40] (2021) MolCLR: molecular contrastive learning of representations via graph neural networks. CoRR abs/2102.10056.
 [41] (2021) FedGNN: federated graph neural network for privacy-preserving recommendation. CoRR abs/2102.04925.
 [42] (2021) Federated graph classification over non-IID graphs. In Advances in Neural Information Processing Systems 34 (NeurIPS 2021), December 6-14, 2021, virtual, pp. 18839–18852.
 [43] (2021) Hyper meta-path contrastive learning for multi-behavior recommendation. In IEEE International Conference on Data Mining (ICDM 2021), Auckland, New Zealand, December 7-10, 2021, pp. 787–796.
 [44] (2022) Dual space graph contrastive learning. CoRR abs/2201.07409.
 [45] (2022) Knowledge graph contrastive learning for recommendation. CoRR abs/2205.00976.
 [46] (2020) Graph contrastive learning with augmentations. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020), December 6-12, 2020, virtual.
 [47] (2021) Are graph augmentations necessary? Simple graph contrastive learning for recommendation. arXiv.
 [48] (2021) Subgraph federated learning with missing neighbor generation. In Advances in Neural Information Processing Systems 34 (NeurIPS 2021), December 6-14, 2021, virtual, pp. 6671–6682.
 [49] (2021) ASFGNN: automated separated-federated graph neural network. Peer-to-Peer Netw. Appl. 14 (3), pp. 1692–1704.
 [50] (2021) Graph contrastive learning with adaptive augmentation. In The Web Conference 2021 (WWW '21), Virtual Event / Ljubljana, Slovenia, April 19-23, 2021, pp. 2069–2080.