1 Introduction
Since graph neural networks (GNNs) can directly model the structural information of network topology, they have attracted significant interest recently, from both research and application perspectives [wu2020comprehensive, zhou2020graph]. However, primarily due to business competition and regulatory restrictions, a wealth of sensitive graph-structured data held by different clients cannot be shared, which hampers many practical applications, such as fraud detection across banks [kurshan2020graph] and social network recommendation across platforms [wu2021fedgnn].
Although various privacy-preserving machine learning models have been successfully applied to data types such as images
[hsu2020federated], text [ge2020fedner] and tables [wu2021fedgnn], few works have focused on the domain of graph machine learning. For decentralized graph-structured data, both nodes and edges are isolated, rendering most privacy-preserving learning methods designed for conventional datasets infeasible. In this work, we restrict attention to the problem of designing a privacy-preserving GNN for the node classification task that keeps performance intact in the setting of a horizontally partitioned graph dataset, meaning that the attributes of nodes and edges are aligned across parties. As illustrated in Fig. 1, we consider the scenario where several data holders, each storing a private subgraph, access one semi-honest (a.k.a. honest-but-curious) server. Each local subgraph contains sensitive information about nodes, edges, attributes and labels. The semi-honest assumption means the server follows the protocol honestly but attempts to infer as much information as possible from the messages it receives. Since one node may interact with the same entity on several platforms, unlike previous work, we consider a more general scenario where overlapping nodes and edges exist among subgraphs.
To address the decentralized graph learning problem under privacy constraints, motivated by the ideas of split learning [vepakomma2018split] and horizontal federated learning [aono2017privacy], we propose a Server-Aided Privacy-preserving GNN (SAPGNN), where each GNN layer is divided into two sub-models: the local model includes all the private-data-related computation and generates local node embeddings, whereas the global model computes global embeddings by aggregating all local embeddings. In this way, the isolated neighborhoods can be utilized collaboratively, and the receptive field can be enlarged by stacking multiple layers. Most importantly, when employing a pooling aggregator with a proper update function, SAPGNN generates node representations identical to those learned over the combined graph.
The main contributions of this paper are summarized as follows:

We present a novel SAPGNN framework for training a privacy-preserving GNN in the horizontally partitioned data setup. To the best of our knowledge, it is the first GNN learning paradigm capable of generating the same node embeddings as its centralized counterpart.

We analyse the privacy and overhead of the proposed SAPGNN. A secure pooling mechanism, instead of a naive global pooling aggregator, is proposed to further protect privacy against a semi-honest server.

Experimental results on three datasets demonstrate that the accuracy and macro-F1 of SAPGNN surpass those of models learned over isolated data, and are comparable to the state-of-the-art approach, especially in the setting of an I.I.D. label distribution.
This paper is organized as follows: Sections 2 and 3 introduce recent work on privacy-preserving GNN learning paradigms, notation and preliminaries; Section 4 describes and discusses the proposed SAPGNN framework in detail; Section 5 presents the experiments; finally, Section 6 concludes and gives an outlook.
2 Related Works
To tackle the privacy-preserving node classification problem over decentralized graph data, several methods have recently been investigated to train a global GNN collaboratively over various split types of datasets.
First, two learning paradigms named PPGNN [zhou2020privacy] and ASFGNN [zheng2021asfgnn] were proposed based on split learning, for vertically and horizontally split datasets respectively. Both alleviate isolation by first training local GNN models over private graphs and then learning global embeddings at a semi-trusted third party. As the graph topology is only exploited locally, model performance may be substantially reduced when the dataset is highly decentralized. More recently, LPGNN [sina2020practical] was developed to reduce communication overhead under the assumption that the server has access to the global graph topology, but not to the private node attributes. Despite its potential, this precondition is not always satisfied, since releasing the topology to the server may lead to privacy disclosure risks. We compare these methods in Table I.
Table I. Comparison of privacy-preserving GNN learning paradigms.

Model    Nodes        Edges             Features
PPGNN    aligned      not limited       not limited
ASFGNN   different    different         aligned
LPGNN    not limited  shared to server  aligned
SAPGNN   not limited  not limited       aligned
From the perspective of applications, [wu2021fedgnn] proposes a GNN-based privacy-preserving recommendation framework for decentralized learning from user-item graphs. [hefedgraphnn] presents an open-source federated learning system and gives important insights into federated GNN training over non-I.I.D. molecular datasets.
A key property of our proposed SAPGNN is that it generates the same node embeddings as the centralized GNN without accessing the raw data stored at other data holders. Unlike previous works, it achieves the same accuracy on isolated datasets as a model learned over the combined data. In addition, it relaxes the constraints on how both nodes and edges are partitioned.
3 Preliminaries
For clarity, we summarize all the notations used in this paper in Table II.
Table II. Notations. (Symbols reconstructed to be consistent with the equations in this paper.)

$G_k$                         local graph of data holder $k$
$V_k$                         nodes of data holder $k$
$E_k$                         edges of data holder $k$
$P$                           total number of data holders
$\mathcal{P}$                 set of data holders
$L$                           total number of layers
$\mathcal{L}_k$               local loss at data holder $k$
$\mathcal{L}$                 total loss of all data holders
$N_k(v)$                      neighbors of node $v$ at data holder $k$
$h_v^{(l-1)}$, $h_v^{(l)}$    input embedding and global embedding of node $v$ at the $l$-th layer
$m_{uv,k}^{(l)}$              message of the edge connecting nodes $u$ and $v$ at data holder $k$
$a_{v,k}^{(l)}$               local aggregation of node $v$ at data holder $k$
$z_{v,k}^{(l)}$               local embedding of node $v$ at data holder $k$
$a_v^{(l)}$                   global aggregation of node $v$ at the server
$\phi^{(l)}$                  message construction function at layer $l$
$\psi_{loc}^{(l)}$            local vertex update function at layer $l$
$\psi_{glob}^{(l)}$           global vertex update function at layer $l$
$\oplus$                      XOR operator
$\mathbb{Z}_n$                nonnegative integer set not greater than $n$
$\langle \cdot \rangle^A$     encryption using additive sharing
$\langle \cdot \rangle^B$     encryption using boolean sharing
$W$                           model weights
$g_k$                         gradient of the local weights at data holder $k$
$S$                           data size of the local model weights
$d$                           length of the node embedding
$s$                           data size of the value of each weight
$N$                           number of nodes from all local graphs
$W^{(l)}$                     weights of the linear transformation matrix
$\alpha$                      label distribution ratio
3.1 Graph representation learning
Let $G = (V, E)$ denote a graph with vertex set $V$ and edge set $E$. Most existing layer-stacked GNN models can be viewed as special cases of the message passing architecture [gilmer2017neural]. Specifically, at the $l$-th layer, message passing on node $v$ and its neighborhood set $N(v)$ can be decomposed into three steps:

$m_{uv}^{(l)} = \phi^{(l)}\big(h_u^{(l-1)}, h_v^{(l-1)}, e_{uv}\big), \; u \in N(v)$  (1)

$a_v^{(l)} = \rho^{(l)}\big(\{ m_{uv}^{(l)} : u \in N(v) \}\big)$  (2)

$h_v^{(l)} = \psi^{(l)}\big(h_v^{(l-1)}, a_v^{(l)}\big)$  (3)

where $\phi^{(l)}$ in (1) is a message construction function defined on each edge connected to $v$. The message is constructed by combining the edge feature $e_{uv}$ with the features of its incident nodes $u$ and $v$. The message aggregation function $\rho^{(l)}$ in (2) computes $a_v^{(l)}$ by aggregating the incoming finite unordered message set. The function $\rho^{(l)}$ is usually designed as a permutation-invariant set function to guarantee invariance/equivariance to isomorphic graphs; popular choices include mean [hamilton2017inductive], pooling [li2019deepgcns], sum [xu2018powerful] and attention [velickovic2018graph]. The vertex update function $\psi^{(l)}$ in (3) updates the node feature according to the node's own feature and the aggregated message $a_v^{(l)}$. Lastly, the node representations are fed to loss functions for specific downstream tasks, e.g., node or graph classification [kurshan2020graph, xu2018powerful], link prediction [wu2021fedgnn], etc.
3.2 Split learning
Unlike federated learning [yang2019federated], where each client trains an entire replica of the model, the key idea of split learning is to split the execution of a model on a per-layer basis between clients and an aiding server [vepakomma2018split, gupta2018distributed]. In principle, each data holder first finishes the private-data-related computation up to a cut layer; the outputs are then sent to another entity for subsequent computation. After the forward propagation, gradients are computed from the loss function and back-propagated. Throughout training and inference, data privacy is protected by the fact that raw data only participates in local computation and is never accessed by others. Both theoretical analysis [singh2019detailed] and practical application [gao2020end] compare the efficiency and effectiveness of federated learning and split learning, and show the potential of both methods for designing private decentralized learning procedures. For more details and advances, we refer the reader to [kairouz2019advances] and the website https://splitlearning.github.io/.
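The split execution described above can be sketched in a toy NumPy round; the two-matrix model, the cut-layer placement, and all names here are illustrative assumptions rather than any particular system's API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-part model: the client owns the cut layer, the server the rest.
W_client = rng.normal(size=(4, 8))    # client-side weights; raw data never leaves
W_server = rng.normal(size=(8, 3))    # server-side weights

def client_forward(x):
    """Client computes up to the cut layer and ships only the activations."""
    return np.maximum(x @ W_client, 0.0)

def server_forward(smashed):
    """Server finishes the forward pass from the received activations."""
    return smashed @ W_server

x_private = rng.normal(size=(5, 4))   # private rows stay on the client
smashed = client_forward(x_private)   # the only tensor that crosses the network
logits = server_forward(smashed)

# Backward pass: the server returns d(loss)/d(smashed); the client continues locally.
grad_logits = np.ones_like(logits)                            # toy upstream gradient
grad_W_server = smashed.T @ grad_logits                       # server-side update
grad_smashed = grad_logits @ W_server.T                       # sent back to the client
grad_W_client = x_private.T @ (grad_smashed * (smashed > 0))  # ReLU mask, client-side
```

Only `smashed` and `grad_smashed` cross the network; the raw rows `x_private` and the client weights stay local, which is the privacy argument made above.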
3.3 Secret sharing
Our proposed model employs n-out-of-n secret sharing schemes to reconstruct private values from secret shares [shamir1979share, demmler2015aby]. In particular, when a client wants to share an $\ell$-bit value $x$ among $n$ parties, it first generates $n-1$ shares $x_1, \dots, x_{n-1}$ uniformly at random, sends one to each party, and computes the last share $x_n$ such that $x = \sum_{i=1}^{n} x_i \bmod 2^{\ell}$ for additive sharing and $x = x_1 \oplus \dots \oplus x_n$ for boolean sharing, respectively. Accordingly, $x$ can be reconstructed at any entity by gathering all shares. Secret sharing has become a popular basis of advanced secure multi-party computation frameworks [patra2020aby2, byali2020flash] and has been applied to many privacy-preserving machine learning algorithms, such as secure aggregation [bonawitz2017practical], embedding generation [zhou2020privacy], and secure computation [mohassel2020practical]. For clarity, we denote additive sharing by $\langle \cdot \rangle^A$ and boolean sharing by $\langle \cdot \rangle^B$ in the following.
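As a concrete sketch (the ring size and helper names are our own choices), n-out-of-n additive and boolean sharing can be written as:

```python
import secrets

MOD = 2**32  # ring for additive sharing (a 32-bit value, for illustration)

def share_additive(x, n):
    """Split x into n shares that sum to x modulo MOD."""
    shares = [secrets.randbelow(MOD) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % MOD)
    return shares

def reconstruct_additive(shares):
    return sum(shares) % MOD

def share_boolean(x, n):
    """Split x into n shares whose XOR equals x."""
    shares = [secrets.randbelow(MOD) for _ in range(n - 1)]
    last = x
    for s in shares:
        last ^= s
    shares.append(last)
    return shares

def reconstruct_boolean(shares):
    out = 0
    for s in shares:
        out ^= s
    return out
```

Any strict subset of the shares is uniformly random, so nothing leaks until all n parties pool their shares.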
4 The Proposed SAPGNN Framework
In this section, we describe the proposed SAPGNN framework, which keeps accuracy intact compared to the counterpart learned over the combined graph. The learning paradigm consists of parameter initialization, forward propagation, back propagation and local parameter fusion. Finally, we discuss the additional overhead and the data privacy in the presence of semi-honest adversaries.
4.1 Parameter initialization
First of all, the participating data holders and server build pairwise secure channels for all subsequent communication to ensure data integrity. Recall that all the nodes from local graphs share the same feature domain. Inspired by horizontal federated learning [aono2017privacy], the local models at all data holders are initialized with the same weights to keep model behavior identical. This can easily be implemented by sharing the same initialization approach and random seed. Additionally, the shared parameters include: (1) training hyperparameters, shared among data holders and server; (2) a hashed node index list, shared only with the server. The hashed index list is used to index and distinguish nodes from all local graphs while hiding the raw index information from the server. As for the server, it randomly initializes the global model weights used to generate global embeddings.
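For instance (illustrative Python; the seed, salt and function names are assumptions, not part of the protocol specification), identical local initialization and a hashed node index could be realized as:

```python
import hashlib
import numpy as np

SHARED_SEED = 42  # agreed upon by all data holders out of band

def init_local_weights(shape, seed=SHARED_SEED):
    """Every holder calls this with the same seed -> identical local weights."""
    return np.random.default_rng(seed).normal(scale=0.1, size=shape)

def hashed_index(node_id, salt=b"shared-salt"):
    """Hash raw node identifiers before revealing the node list to the server."""
    return hashlib.sha256(salt + str(node_id).encode()).hexdigest()

# Two holders initialize independently but end up with the same weights.
W_holder1 = init_local_weights((16, 8))
W_holder2 = init_local_weights((16, 8))
```

The server can match the same node across holders by its hash without ever seeing the raw identifier.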
4.2 Forward Propagation
As illustrated in Fig. 2, in order to protect data privacy (i.e., node attributes, edge information and node labels) while exploiting all isolated graph information, we design a modified message passing architecture in the manner of layer-wise split learning. Specifically, the forward pass at each layer is divided into two steps: first, each data holder individually computes local embeddings from its private data; then, the semi-honest server collects the non-private local embeddings to compute global embeddings. In the end, the output of the last layer is fed to the label prediction and loss computation functions.
4.2.1 Private local embedding computation
In line with the message passing architecture, each data holder $k$ first constructs local messages as

$m_{uv,k}^{(l)} = \phi^{(l)}\big(h_u^{(l-1)}, h_v^{(l-1)}, e_{uv}; W_\phi^{(l)}\big), \; u \in N_k(v)$  (4)

where $N_k(v)$ denotes the neighbor set of node $v$ in the local graph of data holder $k$, and $W_\phi^{(l)}$ denotes the parameters of the function $\phi^{(l)}$.
The next step is local message aggregation. Suppose that aggregation were conducted over the combined graph of all data holders. Since the same edge may appear at several data holders simultaneously, the same node would be counted multiple times when sum [xu2018powerful], mean [hamilton2017inductive] or degree-based [kipf2017semi] aggregators are employed. Fortunately, the max/min pooling aggregator avoids this problem naturally; we therefore build the decentralized learning paradigm on the pooling aggregator. Taking max pooling as an example, each data holder $k$ aggregates messages over local neighbors by

$a_{v,k}^{(l)} = \max_{u \in N_k(v)} m_{uv,k}^{(l)}$  (5)

After local aggregation, each data holder computes the local node embeddings from the node feature and the aggregated neighbor feature via the local vertex update function

$z_{v,k}^{(l)} = \psi_{loc}^{(l)}\big(h_v^{(l-1)}, a_{v,k}^{(l)}\big)$  (6)

where $a_{v,k}^{(l)}$ is set to the vector whose elements are all infinitesimals when $N_k(v)$ is empty. The local embeddings $z_{v,k}^{(l)}$ hide the raw information of the local graph, hence they can be sent to the server for further global computation.
4.2.2 Global embedding computation
This step consists of global aggregation and vertex update. Concretely, the server first aggregates the local node embeddings from all data holders with the same pooling function as in (5):

$a_v^{(l)} = \max_{k \in \mathcal{P}} z_{v,k}^{(l)}$  (7)

After that, the server transforms the aggregated embedding to compute the global node representation of layer $l$ as

$h_v^{(l)} = \psi_{glob}^{(l)}\big(a_v^{(l)}; W_{glob}^{(l)}\big)$  (8)
To cover the design space of GNNs [you2020design], combinations of linear transformation, batch normalization, activation and dropout can be incorporated into the vertex update function to enhance model capacity.
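To make the two-stage forward pass concrete, the following NumPy sketch (our own illustration; the linear message function and concatenation update are hypothetical choices) computes local embeddings at two data holders whose edge sets partition a combined graph, then lets a server max-aggregate them. Because the update is monotone in the aggregated part, the result matches a single holder owning the whole graph:

```python
import numpy as np

NEG_INF = -1e30  # stands in for the all-infinitesimal vector for empty neighborhoods

def local_embedding(h, adj, W_msg):
    """Data holder side: linear message function, element-wise max aggregation,
    and a concatenation update (monotone in the aggregated part)."""
    n = h.shape[0]
    msgs = h @ W_msg                                  # hypothetical message function
    agg = np.full((n, msgs.shape[1]), NEG_INF)
    for v in range(n):
        nbrs = np.nonzero(adj[v])[0]
        if len(nbrs) > 0:
            agg[v] = msgs[nbrs].max(axis=0)           # local max pooling
    return np.concatenate([h, agg], axis=1)           # local embedding sent to server

def global_embedding(local_embs, W_glob):
    """Server side: element-wise max over holders, then a linear update with ReLU."""
    agg = np.maximum.reduce(local_embs)
    return np.maximum(agg @ W_glob, 0.0)

rng = np.random.default_rng(1)
h = rng.normal(size=(4, 3))                           # shared input embeddings
W_msg, W_glob = rng.normal(size=(3, 3)), rng.normal(size=(6, 5))
adj_full = np.array([[0,1,1,0],[1,0,0,1],[1,0,0,0],[0,1,0,0]])
adj_a    = np.array([[0,1,0,0],[1,0,0,1],[0,0,0,0],[0,1,0,0]])  # holder A's edges
adj_b    = np.array([[0,0,1,0],[0,0,0,0],[1,0,0,0],[0,0,0,0]])  # holder B's edges

split = global_embedding([local_embedding(h, adj_a, W_msg),
                          local_embedding(h, adj_b, W_msg)], W_glob)
central = global_embedding([local_embedding(h, adj_full, W_msg)], W_glob)
```

Here `adj_a` and `adj_b` together cover `adj_full`, so `split` and `central` agree element for element, which is exactly the equivalence claim made for SAPGNN.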
Note that the result of the pooling aggregation in (5) and (7) depends only on the element-wise maximum. In order to follow the behavior of the centralized GNN layer exactly, the globally aggregated result (i.e., the left side of (9)) should be identical to the result aggregated over all neighbors of the combined graph (i.e., the right side of (9)), which can be formulated as

$\max_{k \in \mathcal{P}} \psi_{loc}^{(l)}\big(h_v^{(l-1)}, a_{v,k}^{(l)}\big) = \psi_{loc}^{(l)}\big(h_v^{(l-1)}, \max_{k \in \mathcal{P}} a_{v,k}^{(l)}\big)$  (9)

To satisfy the equation above, the constraints on the local update function are given as follows:
Proposition (Constraints on the local update function). When the aggregation function is element-wise max, each element of the output of the local update function $\psi_{loc}^{(l)}$ should increase monotonically with each element of $a_{v,k}^{(l)}$; e.g., $\psi_{loc}^{(l)}$ can be chosen from the concatenation $h \,\|\, a$, the element-wise product $h \odot a$ with nonnegative $h$, and an element-wise monotone MLP, where $\|$ denotes concatenation, $\odot$ denotes element-wise multiplication, and MLP denotes a multilayer perceptron.
Proof.
Denote the result of element-wise max aggregation over the local neighbor information as

$a_{v,k}^{(l)} = \max_{u \in N_k(v)} m_{uv,k}^{(l)}$  (10)

and over the entire neighbor information as

$a_v^{(l)} = \max_{k \in \mathcal{P}} \max_{u \in N_k(v)} m_{uv,k}^{(l)} = \max_{k \in \mathcal{P}} a_{v,k}^{(l)}$  (11)

respectively. Incorporating (10) and (11), equation (9) can be simplified as

$\max_{k \in \mathcal{P}} \psi_{loc}\big(h_v, a_{v,k}\big) = \psi_{loc}\big(h_v, \max_{k \in \mathcal{P}} a_{v,k}\big)$  (12)

Omitting the layer index $l$ and node index $v$, and denoting $x_k = a_{v,k}$, the above equation reduces to

$\max_{k \in \mathcal{P}} \psi_{loc}(h, x_k) = \psi_{loc}\big(h, \max_{k \in \mathcal{P}} x_k\big)$  (13)

Obviously, by the property of the max function, equation (13) holds if and only if each element of the output of $\psi_{loc}$ increases monotonically with each element of $x_k$. ∎
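The proposition can be checked numerically. In the sketch below (the concrete update functions are our own illustrative choices), concatenation followed by ReLU is element-wise monotone in the aggregation, so max and update commute, whereas negating the aggregation breaks the equality:

```python
import numpy as np

h = np.array([0.5, -1.0])                            # node's own feature
local_aggs = [np.array([1.0, -2.0, 3.0, 0.0]),
              np.array([0.0, 5.0, -1.0, 2.0]),
              np.array([2.0, 1.0, 0.0, -3.0])]       # a_{v,k} from three holders

def monotone_update(h, a):
    """Concatenate, then ReLU: element-wise monotone in every element of a."""
    return np.maximum(np.concatenate([h, a]), 0.0)

def non_monotone_update(h, a):
    """Negating a destroys monotonicity, so max and update no longer commute."""
    return np.concatenate([h, -a])

# max-of-updates vs update-of-max (the two sides of eq. (13))
lhs = np.maximum.reduce([monotone_update(h, a) for a in local_aggs])
rhs = monotone_update(h, np.maximum.reduce(local_aggs))

lhs_bad = np.maximum.reduce([non_monotone_update(h, a) for a in local_aggs])
rhs_bad = non_monotone_update(h, np.maximum.reduce(local_aggs))
```

The first pair agrees exactly; the second differs, illustrating why monotonicity is necessary and not just sufficient in practice.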
When the global embeddings of all nodes at layer $l$ have been obtained by (8), the server distributes them to each data holder according to the node list of its local graph, for the forward propagation of the next layer. This process is repeated until the last layer $L$.
4.2.3 Private local loss computation
When the global node embeddings $h_v^{(L)}$ of the last layer have been computed, each data holder predicts labels from the embeddings by

$\hat{y}_{v,k} = f_{pred}\big(h_v^{(L)}; W_k^{pred}\big)$  (14)

and the local loss at data holder $k$ over the local training node labels $Y_k$ can then be computed by

$\mathcal{L}_k = \frac{1}{|Y_k|} \sum_{v \in Y_k} loss\big(\hat{y}_{v,k}, y_v\big)$  (15)

respectively, where $|Y_k|$ is the number of labels at data holder $k$ and $loss(\cdot)$ is the loss function, such as cross-entropy for a classification task or mean squared error for a regression task.
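A minimal sketch of this per-holder prediction and loss step (the softmax/cross-entropy choice and all names are illustrative) might look like:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def local_loss(final_embeddings, W_pred, labels):
    """One holder predicts from the final global embeddings of its own
    labeled nodes and averages cross-entropy over its local labels."""
    probs = softmax(final_embeddings @ W_pred)
    n = len(labels)
    return -np.log(probs[np.arange(n), labels] + 1e-12).mean()

rng = np.random.default_rng(0)
H_last = rng.normal(size=(6, 8))           # global embeddings of 6 labeled nodes
W_pred = rng.normal(size=(8, 3))           # this holder's private prediction weights
y_local = np.array([0, 2, 1, 1, 0, 2])     # this holder's private labels
loss_k = local_loss(H_last, W_pred, y_local)
```

Both `W_pred` and `y_local` stay on the holder; only the gradient with respect to `H_last` would later be sent to the server.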
To summarize, the forward propagation algorithm is given in Algorithm 1. When the forward propagation is finished, model weights can be updated by the back propagation procedure outlined in what follows.
4.3 Back Propagation
Recall that the local part (i.e., the local embedding and loss computation at the data holders) and the global part (i.e., the global embedding computation at the server) of each layer are spatially separated. According to the chain rule, the entire model can be updated iteratively by communicating intermediate gradients between the data holders and the server. Herein, the gradients of the local model weights are computed individually and then secretly aggregated for the update. In the following, we detail the computation and communication of the back propagation procedure.
4.3.1 Individual back propagation of the prediction layer
As the bridge between the final node embeddings and the model output, the weights of the prediction function at data holder $k$ can first be learned by gradient descent, minimizing the local loss individually. After that, the data holder computes the gradient of the loss with respect to the input of the prediction function and sends it to the server for subsequent back propagation.
4.3.2 Back propagation of each SAPGNN layer
Due to the linearity of derivatives, the gradient of the total loss with respect to the output of the last layer can be computed as

$\frac{\partial \mathcal{L}}{\partial h_v^{(L)}} = \sum_{k \in \mathcal{P}} \frac{\partial \mathcal{L}_k}{\partial h_v^{(L)}}$  (16)

while for the $l$-th layer ($l < L$), based on the derivative of the max function, the gradient of the loss with respect to the input embedding can be decomposed as

$\frac{\partial \mathcal{L}}{\partial h_v^{(l-1)}} = \sum_{k \in \mathcal{P}} \frac{\partial \mathcal{L}}{\partial h_v^{(l)}} \cdot \frac{\partial h_v^{(l)}}{\partial a_v^{(l)}} \cdot \frac{\partial a_v^{(l)}}{\partial z_{v,k}^{(l)}} \cdot \frac{\partial z_{v,k}^{(l)}}{\partial h_v^{(l-1)}}$  (17)

where the first factor denotes the gradient of the loss with respect to the output global embedding of layer $l$, the second factor denotes the gradient of the global embedding with respect to the result of the global aggregation, and the third factor denotes the gradient of the global aggregation with respect to the input of the global model. Obviously, the second and third factors can be computed at the server side. The last factor denotes the gradient of the data holder output with respect to the input embedding, which can be obtained at each data holder individually. Therefore, according to (17), the gradients can be back-propagated layer by layer recursively. At each layer, the propagation is first carried out globally at the server side and then locally and in parallel at each data holder side.
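The server-side factor involving the max aggregation is simply an argmax indicator: the gradient flows back only to the data holder whose embedding attained the element-wise maximum. A tiny NumPy illustration (the values are our own):

```python
import numpy as np

# Local embeddings z_{v,k} for one node: rows are data holders, columns are dims.
z = np.array([[1.0, 4.0, 2.0],
              [3.0, 0.0, 5.0]])
g_agg = np.array([10.0, 20.0, 30.0])   # upstream gradient w.r.t. the global aggregation

# Derivative of element-wise max: an indicator of which holder attained the max,
# i.e., the aggregation factor needed in (17).
winner = (z == z.max(axis=0, keepdims=True)).astype(float)
g_local = winner * g_agg               # gradient routed to each holder only where it won
```

Each column of the upstream gradient reaches exactly one holder, so the per-holder gradients sum back to `g_agg`.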
4.3.3 Global back propagation at server side.
The server first obtains $\partial \mathcal{L} / \partial h_v^{(l)}$ by summing the gradients received from all data holders, and then computes the derivatives with respect to the global model weights and the local embeddings for every $k \in \mathcal{P}$:

$\frac{\partial \mathcal{L}}{\partial W_{glob}^{(l)}} = \sum_{v} \frac{\partial \mathcal{L}}{\partial h_v^{(l)}} \cdot \frac{\partial h_v^{(l)}}{\partial W_{glob}^{(l)}}$  (18)

$\frac{\partial \mathcal{L}}{\partial z_{v,k}^{(l)}} = \frac{\partial \mathcal{L}}{\partial h_v^{(l)}} \cdot \frac{\partial h_v^{(l)}}{\partial a_v^{(l)}} \cdot \frac{\partial a_v^{(l)}}{\partial z_{v,k}^{(l)}}$  (19)

respectively. The result of (19) is sent to the corresponding data holder for the sequential local back propagation.
4.3.4 Local back propagation at data holder side.
The gradient of the loss with respect to the local weight set $W_{loc}^{(l)}$ at data holder $k$ can be expressed as

$\frac{\partial \mathcal{L}}{\partial W_{loc}^{(l)}} = \sum_{v} \frac{\partial \mathcal{L}}{\partial z_{v,k}^{(l)}} \cdot \frac{\partial z_{v,k}^{(l)}}{\partial W_{loc}^{(l)}}$  (20)

According to (17), each data holder also needs to compute the gradient of its output local embedding with respect to the input node embedding and send it to the server.
4.4 Weights update
As described above, the model weights of SAPGNN are spatially divided into two categories: the global sub-model weights held by the server and the local sub-model weights held by the data holders.
4.4.1 Update of global model weight.
When the corresponding gradients have been obtained by (18), the global weights can be directly updated through gradient descent.
4.4.2 Update of local model weight.
To keep the isolated local weights identical at all data holders during training, the corresponding local gradients should be aggregated federally across all data holders, e.g., via secure aggregation [bonawitz2017practical] or homomorphic encryption [aono2017privacy]. Taking secure aggregation as an example, let $g_k$ denote the gradient of the local weights at data holder $k$. Each data holder first secretly shares its local gradient with the other data holders, and each data holder $j$ then sums up its received shares by

$g^{(j)} = \sum_{k \in \mathcal{P}} \langle g_k \rangle_j$  (21)

After that, each data holder reconstructs the aggregated gradient for the update by gathering the partial sums from the others:

$g = \sum_{j \in \mathcal{P}} g^{(j)}$  (22)
Note that during this procedure, each data holder only accesses secret shares and the reconstructed aggregate gradient, while the server learns nothing about the local gradients.
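A sketch of this secure aggregation round in Python follows; the fixed-point encoding, ring size and function names are our own assumptions for illustration:

```python
import secrets
import numpy as np

MOD = 2**32
SCALE = 2**16      # fixed-point scaling so float gradients fit the integer ring

def to_ring(g):
    return (np.round(g * SCALE).astype(np.int64)) % MOD

def from_ring(r):
    r = np.where(r > MOD // 2, r - MOD, r)   # map ring elements back to signed values
    return r / SCALE

def make_shares(g, n):
    """Split an encoded gradient vector into n additive shares."""
    enc = to_ring(g)
    shares = [np.array([secrets.randbelow(MOD) for _ in enc], dtype=np.int64)
              for _ in range(n - 1)]
    shares.append((enc - sum(shares)) % MOD)
    return shares

grads = [np.array([0.5, -1.25, 2.0]),     # holder 1's local gradient
         np.array([1.0, 0.25, -0.5])]     # holder 2's local gradient
n = len(grads)

all_shares = [make_shares(g, n) for g in grads]       # each holder shares g_k
# Each holder j sums the shares it received (eq. (21)) ...
partial = [sum(all_shares[k][j] for k in range(n)) % MOD for j in range(n)]
# ... then everyone gathers the partial sums and reconstructs (eq. (22)).
total = from_ring(sum(partial) % MOD)
```

No single partial sum reveals any individual gradient; only the final reconstruction yields the aggregate used for the weight update.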
4.5 Discussion of security and overhead
4.5.1 Data privacy
In our proposed learning paradigm, data privacy is guaranteed for the following reasons:

All computations involving the aforementioned private data (including node attributes, edge information, labels and local model gradients) are carried out locally by the data holders. From the perspective of the semi-honest server, only the hashed node lists of the local graphs, the local embeddings computed in (6) and the global model are observable. Therefore, SAPGNN is secure against a semi-honest server.

The only sensitive messages observed by data holders are the secret shares of the gradients of the local model weights. Since the gradients are split by an n-out-of-n secret sharing algorithm, the raw data can be reconstructed if and only if all shares are gathered. This prevents attacks by semi-honest adversaries among the other data holders.

The TLS/SSL protocol ensures the security and data integrity of pairwise network communication [aono2017privacy].
4.5.2 Extra communication overhead
n-out-of-n secret sharing leads to quadratic growth of the communication overhead with respect to the number of data holders $P$. The overhead of aggregating the local model gradients for the update can be given as $O(P^2 S)$, where $S$ denotes the data size of the local model weights. In addition, as explained in the forward process, the local embeddings and the global embeddings are transmitted between the server and the data holders at each layer. Let $d$ denote the length of the node embedding and $N$ the number of nodes from all local graphs; this communication overhead can be represented as $O(L P N d s)$, where $s$ is the data size of the value of each weight. Therefore, although a small number of layers is sufficient for training a competitive GNN [chen2020simple] and also impedes over-smoothing, the communication overhead can become a bottleneck that limits efficiency and scalability, since $N$ and $P$ can be extremely large in the case of a heavy model with millions of parameters or an Internet of Things scenario with massive numbers of devices [gao2020end]. Potential solutions include mini-batch instead of full-batch training, or model and communication compression techniques [rothchild2020fetchsgd]. We leave these optimizations as future work.
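Plugging in some hypothetical sizes (every number below is an assumption chosen purely for illustration) shows how the two overhead terms scale:

```python
# Hypothetical deployment parameters, chosen only to illustrate the scaling.
P = 4                # number of data holders
S = 1_000_000        # bytes to encode one holder's local-model gradient
L = 2                # number of GNN layers
N = 20_000           # nodes across all local graphs
d = 64               # node embedding length
BYTES_PER_VALUE = 4  # e.g., float32

# n-out-of-n sharing: every holder sends a share to every other holder -> O(P^2 S).
secret_sharing_bytes = P * (P - 1) * S

# Per epoch, local and global embeddings cross the network at every layer,
# in both directions -> O(L P N d) values.
embedding_bytes = 2 * L * P * N * d * BYTES_PER_VALUE
```

Even at these modest sizes the embedding traffic dominates, which is why mini-batching and compression are the natural optimization targets.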
4.5.3 Secure global pooling aggregation
Note that when conducting the global aggregation, only the element-wise maximum values over all local embeddings in (7) are required during the forward step, while the corresponding data holder indexes (i.e., the derivative of the global aggregation with respect to the local embeddings in (17)) are needed at the backward step. To further improve privacy, this raw information can be protected by private comparison approaches that exploit secure maximum computation protocols, which have been widely utilized in machine learning applications such as k-means [jaschke2018unsupervised, mohassel2020practical]. Specifically, for each element of the local embeddings $z_{v,k}^{(l)}$, the problem is to output the secret shares of the index vector that indicates the maximum value among $P$ numbers. This functionality has been deeply investigated in recent works such as [mohassel2020practical], and can be efficiently implemented by employing less-than garbled circuits and instances of oblivious transfer extension. Utilizing the secure global pooling aggregation creates further obstacles for the semi-honest server to learn private information from the data holders.
5 Evaluation
In this section we present the experimental results for our proposed SAPGNN. We first describe the datasets, the experimental setup and the compared methods. After that, we run experiments to show the advantage of SAPGNN under a near-I.I.D. label distribution setting.
Table III. Dataset statistics.

Dataset   Cora   Citeseer  Pubmed
Nodes     2708   3327      19717
Edges     5278   4552      44324
Features  1433   3703      500
Train     140    120       60
Val       500    500       500
Test      1000   1000      1000
Classes   7      6         3
5.1 Datasets and experimental setup
We test SAPGNN on three publicly available citation datasets used for node classification in previous works [zhou2020privacy, zheng2021asfgnn], i.e., Cora, Citeseer and Pubmed. In these datasets, each node represents a document and edges denote citation links. Each node has a bag-of-words feature vector and a label indicating its category. We follow the default node masks of the DGL framework [wang2019dgl] for the training, validation and test node sets. The main characteristics of each dataset are given in Table III. All experiments are run on a Windows desktop with a 3.2 GHz 6-core Intel Core i7-8700 CPU and 16 GB of RAM.
5.2 Compared methods
We compare SAPGNN against two methods:

The first is separate training (SP), i.e., each data holder trains a GNN individually over its own subgraph. It cannot utilize information from others and is thus treated as a baseline.

The second is PPGNN [zhou2020privacy], which first conducts separate training and then predicts over the combined node embeddings. Note that the training, validation and test node sets for PPGNN need to be privately aligned among data holders before the experiments, since PPGNN requires each node to exist at all local graphs.
For all methods, we use a two-layer GNN constructed by the following formulation:

$h_v^{(l)} = \sigma\Big( W^{(l)} \big[ h_v^{(l-1)} \,\|\, \max_{u \in N(v)} h_u^{(l-1)} \big] \Big)$  (23)
The ReLU activation function and dropout are applied to the output of each layer except the last. All considered models are trained for a maximum of 300 epochs using the cross-entropy loss with the Adam optimizer and a learning rate of 0.01. We performed a grid search with early stopping to find the best hidden size for each method, and the accuracy and macro-F1 are evaluated on the test set over 40 consecutive runs.
Table IV. Test accuracy and macro-F1 (%) with uniformly split edges, mean ± std over 40 runs, for 1 to 4 data holders. "–" marks settings where PPGNN is not applicable.

Dataset   Model   1                       2                       3                       4
                  Acc / F1                Acc / F1                Acc / F1                Acc / F1
Cora      SP      78.5±0.54 / 77.4±0.55   75.2±0.79 / 74.3±0.77   72.7±0.85 / 71.7±1.00   70.6±0.87 / 69.5±0.91
          PPGNN   – / –                   77.5±1.36 / 76.5±1.31   77.0±1.10 / 75.9±1.14   76.4±1.36 / 75.1±1.36
          SAPGNN  78.5±0.54 / 77.4±0.55   78.5±0.54 / 77.4±0.55   78.5±0.54 / 77.4±0.55   78.5±0.54 / 77.4±0.55
Citeseer  SP      69.8±0.59 / 66.6±0.62   68.0±1.81 / 64.8±1.76   65.2±0.99 / 61.7±1.16   63.2±2.12 / 59.0±2.38
          PPGNN   – / –                   67.1±1.72 / 63.3±2.45   66.3±2.07 / 62.8±1.81   64.9±2.69 / 61.5±2.58
          SAPGNN  69.8±0.59 / 66.6±0.62   69.8±0.59 / 66.6±0.62   69.8±0.59 / 66.6±0.62   69.8±0.59 / 66.6±0.62
Pubmed    SP      78.3±0.51 / 77.7±0.49   75.9±1.08 / 75.3±1.12   73.9±1.09 / 73.4±1.08   72.2±1.01 / 71.7±1.01
          PPGNN   – / –                   78.9±0.88 / 78.4±0.84   79.0±0.58 / 78.7±0.57   79.2±0.61 / 79.0±0.56
          SAPGNN  78.3±0.51 / 77.7±0.49   78.3±0.51 / 77.7±0.49   78.3±0.51 / 77.7±0.49   78.3±0.51 / 77.7±0.49
5.3 Results with uniformly split edges
First, we compare the three decentralized learning methods under the I.I.D. edge setting, where the original edge set is divided uniformly into the subgraphs of the data holders; the results are reported in Table IV. We observe that the metrics of SAPGNN remain identical for varying numbers of data holders, and equal the results obtained by the centralized counterpart (i.e., SP with a single data holder). The reason is straightforward: the global node representations learned by SAPGNN are the same as those learned over the combined graph. Second, SAPGNN consistently outperforms SP, and the gap widens as the number of data holders grows, since SP only accesses local information. Compared to PPGNN, SAPGNN is competitive on the Cora and Citeseer datasets, but slightly worse on Pubmed. In the following, we mainly compare SAPGNN and PPGNN under non-I.I.D. label distributions and drop the SP method for conciseness.
5.4 Results with varying label distributions
Existing works have demonstrated that the performance of decentralized learning methods decreases as the label distribution becomes more non-I.I.D. [gao2020end, zhao2018federated]. To examine this, we first divide the nodes among the data holders according to their labels, and then $\alpha$% of the nodes from each data holder are split uniformly across the data holders. Only the edges connecting nodes at the same data holder are retained. Thus, varying the label distribution ratio $\alpha$ from 0 to 100 yields increasingly similar label distributions among data holders, and increasing the number of data holders leads to more removed edges. Taking two data holders on the Cora dataset as an example, Fig. 3 shows the percentage of nodes at each data holder for each class, where $\alpha = 0$ implies that the label sets of the data holders are completely disjoint, i.e., the subgraph at data holder 1 includes 1097 nodes of the first four classes, while data holder 2 only has 543 nodes with labels of the last three classes. In the case of $\alpha = 100$, each subgraph contains about half of the nodes of each class (821 nodes at data holder 1 and 819 nodes at data holder 2). Note that the original PPGNN can only generate embeddings for nodes that overlap at all data holders. For a fair comparison, instead of directly removing the remaining nodes, we remove all edges connected to them at each local subgraph, so that no messages pass from or to their adjacent neighbors.
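The label-splitting procedure can be sketched as follows; the round-robin class assignment and the choice to redistribute the selected $\alpha$% uniformly over all holders are our reading of the setup, not the paper's exact script:

```python
import numpy as np

def split_by_label(labels, num_holders, alpha, seed=0):
    """Assign nodes to holders class by class, then redistribute alpha% of each
    holder's nodes uniformly over the holders (alpha=0: disjoint label sets,
    alpha=100: roughly identical label distributions)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    holder_of = np.empty(len(labels), dtype=int)
    for i, c in enumerate(np.unique(labels)):        # class -> holder, round robin
        holder_of[labels == c] = i % num_holders
    initial = holder_of.copy()                       # snapshot so nodes move once
    for k in range(num_holders):
        idx = np.nonzero(initial == k)[0]
        n_move = int(len(idx) * alpha / 100)
        move = rng.choice(idx, size=n_move, replace=False)
        holder_of[move] = rng.integers(0, num_holders, size=n_move)
    return holder_of

labels = np.repeat(np.arange(4), 10)                 # 4 classes, 10 nodes each
disjoint = split_by_label(labels, 2, alpha=0)        # classes 0,2 -> holder 0; 1,3 -> holder 1
mixed = split_by_label(labels, 2, alpha=100)         # each holder gets ~half of every class
```

After assignment, only intra-holder edges would be kept, matching the edge-removal rule described above.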
Fig. 4 and Fig. 5 respectively show the node classification accuracy and F1 score when the number of data holders varies from 2 to 4 and $\alpha$ varies from 0 to 100. We observe that the label distribution has an important influence on the metrics. Specifically, when $\alpha$ is small, the performance of PPGNN has a comfortable lead over SAPGNN. This is because PPGNN generates node embeddings locally and can thus balance the contributions of the different data holders. When the classes of nodes are totally different among data holders, training a shared or federated model has no benefit over models trained individually on a relatively simple classification task [zhao2018federated]. On the other hand, SAPGNN is comparable for intermediate $\alpha$ and outperforms PPGNN for large $\alpha$ on all datasets, which means SAPGNN is more effective at learning from adjacency information in scenarios where all data holders have near-I.I.D. label distributions. Finally, by comparing the performance of SAPGNN for the same $\alpha$ over various numbers of data holders, we find that removing inter-class edges may reduce the learning performance on Citeseer, while it has relatively little influence on Cora and Pubmed.
6 Conclusion
In this paper, we proposed a server-aided privacy-preserving GNN framework for horizontally partitioned graph-structured datasets. It generates the same node embeddings as the centralized GNN without revealing raw data. Therefore, proven properties of the centralized counterpart (e.g., convergence and generalization) transfer to the proposed SAPGNN. To address privacy concerns, we further gave a secure global pooling aggregation mechanism that hides the raw local embeddings from semi-honest adversaries. We showed successful applications of SAPGNN to the node classification task, especially when the labels of the isolated datasets are close to identically distributed, but it behaves worse than existing methods under highly skewed non-I.I.D. label distributions. This observation can guide the choice of a suitable decentralized learning paradigm according to the distribution of the graph data.
In the future, we would like to transfer the proposed learning framework to more general GNN architectures and more partition types of graph datasets. More importantly, communication efficiency deserves attention in order to unleash the full potential of SAPGNN and other decentralized GNN learning approaches in real-world applications.