Privacy-Preserving Graph Neural Network for Node Classification

05/25/2020 ∙ by Jun Zhou, et al. ∙ Ant Financial ∙ Zhejiang University ∙ Peking University

Recently, Graph Neural Network (GNN) has achieved remarkable progress in various real-world tasks on graph data, which consist of node features and the adjacency information between nodes. High-performance GNN models always depend on both rich features and complete edge information in the graph. However, such information could possibly be isolated by different data holders in practice, which is the so-called data isolation problem. To solve this problem, in this paper, we propose a Privacy-Preserving GNN (PPGNN) learning paradigm for the node classification task, which can be generalized to existing GNN models. Specifically, we split the computation graph into two parts. We leave the private data (i.e., features, edges, and labels) related computations on data holders, and delegate the rest of the computations to a semi-honest server. We conduct experiments on three benchmarks, and the results demonstrate that PPGNN significantly outperforms the GNN models trained on the isolated data and has comparable performance with the traditional GNN trained on the mixed plaintext data.


1 Introduction

Graph Neural Network (GNN) has gained increasing attention from both academia and industry due to its ability to model high-dimensional feature information and high-order adjacency information on both homogeneous and heterogeneous graphs [34]. GNN has a wide range of practical applications such as computer vision [33], traffic prediction [26], fraud detection [22], and recommender systems [36].

An important ingredient for high-performance GNN models is high-quality graph data, including rich node features and complete adjacency information. However, in practice, such information could possibly be isolated by different data holders, which is the so-called data isolation problem. That is, due to business competition or regulatory reasons, different data holders are unwilling or unable to share data with each other, and thus the data exists in the form of isolated islands [35]. Such a data isolation problem also presents a serious challenge for developing other Artificial Intelligence (AI) algorithms. To this end, how to design private AI algorithms has become a hot research topic.

To date, various privacy-preserving machine learning models have been designed for the data isolation problem, including decision trees [18], linear regression [12], logistic regression [3, 5], recommender systems [7, 6], and neural networks [32, 38]. Surprisingly, there is no existing work on privacy-preserving GNN yet. To fill this gap, in this paper, we present the first framework for developing privacy-preserving GNN models. However, unlike previous privacy-preserving machine learning models, which assume that only samples (nodes) are held by different parties and that these samples have no relationships, our task is more challenging because GNN relies on the relationships between samples, and these relationships are also held by different data holders.

Problem. Figure 1 shows a node classification problem under the vertically partitioned data setting. Here, for simplicity, we assume there are three data holders that hold the same four nodes. The node features are vertically split, i.e., each data holder holds a different subset of the feature columns. Meanwhile, the data holders may have their own edges; for example, one holder has social relations between nodes while the others have payment relations between nodes. We also assume that one of the data holders is the party who holds the node labels. Note that one can slightly modify Figure 1 to obtain the horizontally partitioned setting [35], i.e., data holders have the same features and edge types but different nodes. The problem is to build privacy-preserving GNN models by using the graph data of all data holders cooperatively. In this paper, we take the vertically partitioned setting as an example and present how to build privacy-preserving GNN models.

Figure 1: The proposed PPGNN on vertically partitioned data.

Naive solution. A direct way to build a privacy-preserving GNN is to employ advanced cryptographic techniques, such as homomorphic encryption (HE) and secure multi-party computation (MPC) [23, 11], for the training and inference of GNN models. Such a purely cryptographic approach provides strong security guarantees; however, it suffers from high computation and communication costs, which limit its efficiency, as analyzed in [24].

Our solution. Instead, we propose an efficient Privacy-Preserving GNN (PPGNN) learning paradigm. Motivated by existing work on split learning [31, 24, 13], we split the computation graph of GNN into two parts for privacy and efficiency concerns, i.e., the private data related computations carried out by data holders, and the non-private data related computations conducted by a semi-honest server. Here, private data refers to node features, edges, and node labels, while non-private data refers to the encoded hidden representations.

Specifically, data holders first apply secure Multi-Party Computation (MPC) techniques to collaboratively compute the initial layer of the GNN using the private node feature information, which acts as the feature extractor module, then perform neighborhood aggregation individually using their private edge information, similar to existing GNNs [30], and finally obtain the local node embeddings. Next, we propose different combination strategies for a semi-honest server to combine the local node embeddings from data holders and generate global node embeddings, based on which the server can conduct the successive non-private data related computations, e.g., the non-linear operations in deep network structures that are time-consuming for MPC techniques. Finally, the server returns the final hidden layer to the party who has the labels to compute the prediction and loss. Data holders and the server perform forward and backward propagation to complete model training and prediction, during which the private data (i.e., features, edges, and labels) are always kept by the data holders themselves. Here we assume data holders are honest-but-curious and that the server does not collude with data holders. We argue that this is a reasonable assumption, since the server can be played by authorities such as governments or replaced by a trusted execution environment [10].

Contributions. We summarize our main contributions as:


  • We propose a novel PPGNN learning paradigm, which not only can be generalized to most existing GNNs, but also enjoys good accuracy and efficiency.

  • We propose different combination strategies for the server to combine local node embeddings from data holders.

  • We evaluate our proposals on three real-world datasets, and the results demonstrate their effectiveness.

The rest of the paper is organized as follows. In Section 2, we describe the background knowledge on the security model, GNN, and a popular MPC technique, i.e., secret sharing. In Section 3, we present our proposed privacy-preserving GNN learning framework. In Section 4, we present the experimental results and analysis. In Section 5, we review related work. Finally, we conclude the paper in Section 6.

2 Preliminaries

In this section, we present some preliminary techniques of our proposal, including security model, GNN, and a popular MPC technique, i.e., secret sharing.

2.1 Security Model

In the literature, there are two commonly used security models, i.e., the honest-but-curious (semi-honest) model and the malicious model. The former allows better efficiency, while the latter provides a stronger security guarantee. Our proposed model is secure against honest-but-curious (semi-honest) adversaries. That is, data holders and the server strictly follow the protocol, but they may use all intermediate computation results to infer as much information as possible. We also assume that the server does not collude with any data holder. This security setting is the same as in most existing works [23, 15].

2.2 Graph Neural Network

GNN extends existing neural networks to process data represented in graph domains. The key of GNN is to learn a function that generates an embedding $h_v$ for each node $v$ in a graph based on its own features $x_v$ and its neighbors' features $\{x_u, u \in \mathcal{N}(v)\}$, with $\mathcal{N}$ denoting the neighborhood function of the graph [34]. With the generated node embeddings, one can then perform further tasks such as classification [16], link prediction [37], and recommendation [36] using deep neural networks. In this paper, we focus on the node classification task. The key to this task is the design of the aggregator function, which learns to aggregate feature information from a node's local neighborhood [39]. To date, different types of aggregator functions have been proposed, e.g., convolution based [14], gated mechanism based [17], and attention based [30, 21].

2.3 Additive Secret Sharing

Our proposal depends on Additive Secret Sharing [28]. Specifically, we focus on $n$-out-of-$n$ secret sharing in this paper, i.e., all shares are needed to reconstruct a secret, which has been popularly used in machine learning and data mining [20, 6]. To additively share an $\ell$-bit value $a$ held by party $P_i$, $P_i$ generates one share for each other party uniformly at random in $\mathbb{Z}_{2^\ell}$, sends these shares out, and keeps $a$ minus the sum of the distributed shares mod $2^\ell$ as its own share. We use $\langle a \rangle_i$ to denote the share held by party $P_i$. To reconstruct a shared value $\langle a \rangle$, each party $P_i$ sends $\langle a \rangle_i$ to one party, who computes $\sum_i \langle a \rangle_i \bmod 2^\ell$. For simplicity, we denote the additive sharing of $a$ by $\langle a \rangle$ in this paper.
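
To make the sharing and reconstruction steps concrete, the following is a minimal NumPy sketch of $n$-out-of-$n$ additive sharing; the function names and the choice of a 64-bit ring ($\mathbb{Z}_{2^{64}}$) are illustrative assumptions rather than details taken from the paper.

import numpy as np

RING_BITS = 64  # shares live in Z_{2^64}; uint64 arithmetic wraps modulo 2^64


def share(x, n_parties, rng):
    """Split an integer tensor x into n additive shares that sum to x mod 2^64."""
    shares = [rng.integers(0, np.iinfo(np.uint64).max, size=x.shape,
                           dtype=np.uint64, endpoint=True)
              for _ in range(n_parties - 1)]
    shares.append(x.astype(np.uint64) - sum(shares))  # last share fixes the sum
    return shares


def reconstruct(shares):
    """Recover the secret by summing all shares (mod 2^64, via uint64 wrap-around)."""
    return sum(shares)


# toy usage: a secret vector is shared among three parties and then reconstructed
rng = np.random.default_rng(0)
secret = np.array([42, 7, 2020], dtype=np.uint64)
shares = share(secret, n_parties=3, rng=rng)
assert np.array_equal(reconstruct(shares), secret)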

Multiplication and Vectorization. Existing secret sharing multiplication protocols are based on Beaver's triplet technique [4]. Specifically, to multiply two secretly shared values $\langle a \rangle$ and $\langle b \rangle$ held between two parties $P_0$ and $P_1$, the parties first collaboratively obtain a shared triple $\langle u \rangle$, $\langle v \rangle$, and $\langle w \rangle$, where $u$ and $v$ are uniformly random values in $\mathbb{Z}_{2^\ell}$ and $w = uv \bmod 2^\ell$. Each party $P_i$ then locally computes $\langle e \rangle_i = \langle a \rangle_i - \langle u \rangle_i$ and $\langle f \rangle_i = \langle b \rangle_i - \langle v \rangle_i$, and the parties open $e = a - u$ and $f = b - v$. Finally, $P_i$ computes $\langle ab \rangle_i = i \cdot e \cdot f + f \cdot \langle u \rangle_i + e \cdot \langle v \rangle_i + \langle w \rangle_i$ as its share of the multiplication result, such that $\langle ab \rangle_0 + \langle ab \rangle_1 = ab \bmod 2^\ell$. It is easy to vectorize the addition and multiplication protocols under the secret sharing setting.
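
The following two-party sketch illustrates Beaver-triplet multiplication on such shares; it assumes a trusted dealer that hands out the triple in an offline phase, and all names are illustrative, not part of the paper's implementation.

import numpy as np

rng = np.random.default_rng(0)
U64 = np.uint64


def share2(x):
    """2-out-of-2 additive sharing over Z_{2^64} (uint64 wrap-around)."""
    r = rng.integers(0, np.iinfo(U64).max, size=np.shape(x), dtype=U64, endpoint=True)
    return r, np.asarray(x, dtype=U64) - r


def beaver_mul(a_sh, b_sh):
    """Multiply secret-shared values a and b given as (share_0, share_1) pairs."""
    # offline phase: a dealer provides a shared triple u, v, w with w = u * v
    u, v = rng.integers(0, 2**16, size=2, dtype=U64)
    u_sh, v_sh, w_sh = share2(u), share2(v), share2(u * v)
    # each party locally masks its shares; e = a - u and f = b - v are then opened
    e = (a_sh[0] - u_sh[0]) + (a_sh[1] - u_sh[1])
    f = (b_sh[0] - v_sh[0]) + (b_sh[1] - v_sh[1])
    # party i computes its share of a * b; only party 1 adds the public e * f term
    z0 = f * u_sh[0] + e * v_sh[0] + w_sh[0]
    z1 = e * f + f * u_sh[1] + e * v_sh[1] + w_sh[1]
    return z0, z1


a_sh, b_sh = share2(np.array([6], dtype=U64)), share2(np.array([7], dtype=U64))
z0, z1 = beaver_mul(a_sh, b_sh)
assert (z0 + z1) == np.array([42], dtype=U64)  # reconstruct a * b mod 2^64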

Apply to decimal numbers. The above protocols cannot work directly with decimal numbers, since it is not possible to sample uniformly over the reals [9]. We approximate decimal arithmetic following the existing work [23]. Suppose $x$ and $y$ are two decimal numbers with at most $l_F$ bits in the fractional part. To perform fixed-point multiplication, we first transform them to integers by letting $x' = 2^{l_F} x$ and $y' = 2^{l_F} y$, and then calculate $z = x' y'$. Finally, we truncate the last $l_F$ bits of $z$ so that it has at most $l_F$ bits representing the fractional part. It has been proven that this truncation technique also works when $z$ is secret shared [23].
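
Below is a small sketch of the fixed-point encoding, multiplication, and truncation described above, run on plaintext integers for clarity (the real protocol applies the same truncation to secret-shared values); the 16-bit fractional part is an illustrative choice, not necessarily the paper's setting.

import numpy as np

FRAC_BITS = 16          # illustrative choice of the fractional bit length l_F
SCALE = 1 << FRAC_BITS  # 2^{l_F}


def encode(x):
    """Lift a decimal number to a fixed-point integer with l_F fractional bits."""
    return np.int64(round(x * SCALE))


def decode(z):
    """Map a fixed-point integer back to a decimal number."""
    return float(z) / SCALE


def fp_mul(x, y):
    """Fixed-point multiplication: multiply the encodings, then truncate l_F bits."""
    z = encode(x) * encode(y)   # the product carries 2 * l_F fractional bits
    z = z >> FRAC_BITS          # truncation restores l_F fractional bits
    return decode(z)


print(fp_mul(3.25, -1.5))       # approximately -4.875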

3 The Proposed Model

In this section, we first give an overview of the proposed privacy-preserving GNN learning framework, which divides the computational graph of GNN into three sub-graphs. We then present its three important modules, i.e., generating initial node embeddings, generating local node embeddings, and generating global node embeddings on server. Finally, we present how to learn the proposed model.

3.1 Overview of PPGNN

Figure 2: The proposed PPGNN on vertically partitioned data. PPGNN is divided into three Computational Graphs (CG), i.e., private feature and edge related computations (CG1, shown in red), non-private data related computations (CG2, shown in green), and private label related computations on data holder (CG3, shown in blue). Besides, PPGNN has three key steps, i.e., generate initial node embeddings (Step 1), generate local node embeddings (Step 2), and generate global node embeddings (Step 3).

In practice, the data isolation problem can be mainly divided into two types based on how the data are split, i.e., horizontally partitioned data and vertically partitioned data. Taking GNN as an example, the former indicates that the isolated data holders have different nodes but the same feature schema and edge types, while the latter indicates that the isolated data holders have the same nodes but different feature schemas and edge types, as shown in Figure 1. Nowadays, users are often active on different platforms at the same time, e.g., a social platform and an e-commerce platform, which gives these platforms different feature schemas and edge types for the same users. Therefore, the vertically partitioned setting is more common in practice, and we focus on this setting in this paper.

The first step in private machine learning under the vertically partitioned setting is secure entity alignment, also known as Private Set Intersection (PSI). That is, data holders align their nodes without exposing those that do not overlap with each other. PSI has been researched extensively in prior work [25]. In this paper, we assume data holders have already aligned their nodes and are ready to perform privacy-preserving GNN training and inference.

As described in Section 1, for the sake of privacy and efficiency, we design a novel Privacy-Preserving GNN (PPGNN) learning paradigm by splitting the computational graph of GNN into two parts. That is, we keep the private data related computations on data holders for privacy concerns, and delegate the non-private data related computations to a semi-honest server for efficiency concerns. In the context of GNN, the private data refers to node features, labels, and edges (node relations). To be specific, we divide the computational graph into the following three sub-Computational Graphs (CG), as shown in Figure 2.

CG1: private feature and edge related computations. The first step of GNN is generating initial node embeddings using nodes' private features, e.g., user features in social networks. In the vertically partitioned setting, each data holder has only part of the node features, as shown in Figure 1. We will present how data holders collaboratively learn initial node embeddings in Section 3.2. In the next step, data holders generate local node embeddings by aggregating multi-hop neighbors' information using different aggregator functions, which we describe in Section 3.3.

CG2: non-private data related computations. We delegate the non-private data related computations to a semi-honest server for efficiency concerns. First, the server combines the local node embeddings from data holders with different COMBINE strategies and obtains the global node embeddings, which we describe in detail in Section 3.4. Next, the server performs the successive computations using plaintext data. Note that this part contains many non-linear computations, such as max-pooling and activation functions, which are not cryptographically friendly. For example, existing works approximate the non-linear activations using either piece-wise functions that need secure comparison protocols [23] or high-order polynomials [15], so their accuracy and efficiency are limited. Delegating these plaintext computations to the server not only improves model accuracy, but also significantly improves efficiency, as we show in the experiments. After this, the server obtains the final hidden layer $h^{(L)}$ and sends it back to the data holder who has the labels to calculate the prediction, where $L$ is the total number of layers of the deep neural network.

CG3: private label related computations on the data holder. The data holder who has the labels can compute the prediction using the final hidden layer it receives from the server. For the node classification task, the Softmax activation function is used for the output layer [16], defined as $\mathrm{softmax}(z_c) = \frac{1}{Z}\exp(z_c)$ with $c$ being the node class and $Z = \sum_{c'} \exp(z_{c'})$.

In the following subsections, we describe the three important components of PPGNN, i.e., initial node embedding generation and local node embedding generation in CG1, and global node embedding generation in CG2.

(a) Individually
(b) Collaboratively
Figure 3: Two methods of generating initial node embeddings.

3.2 Generate Initial Node Embeddings

Initial node embeddings are generated from node features. Under the vertically partitioned data setting, each data holder holds only part of the node features. In such a setting, there are two methods for data holders to generate initial node embeddings, i.e., individually and collaboratively, as shown in Figure 3.

The ‘individually’ method means that each data holder generates initial node embeddings using its own node features alone. For data holder $i$, this can be done by $h_i^0 = x_i W_i$, where $x_i$ and $W_i$ are the node features and weight matrix of data holder $i$. Although this method is simple and data holders do not need to communicate with each other, it cannot capture the relationships between the features held by different data holders and thus causes information loss. As in the example in Figure 3, each data holder generates its initial node embeddings using only the features it holds. That is, this method implicitly assumes the features of different data holders are independent, and thus cannot capture the relations between them.

To address the shortcoming of the ‘individually’ method, we propose the ‘collaboratively’ method. That is, data holders generate initial node embeddings using their node features collaboratively, while keeping their private features secure. Technically, this can be done by using cryptographic methods such as secret sharing and homomorphic encryption [2]. In this paper, we choose additive secret sharing (described in Section 2.3) due to its high efficiency. We summarize the protocol in Algorithm 1. Traditionally, the initial node embeddings can be generated by $h^0 = xW$, where $x$ is the node feature matrix and $W$ is the weight matrix. When features are vertically partitioned, we calculate the initial node embeddings as follows. First, each data holder secretly shares its part of $x$ among all data holders. Then, data holders concatenate the shares they receive in order. After that, we calculate $xW$ following the distributive law: taking two data holders $A$ and $B$ as an example, $xW = (\langle x\rangle_A + \langle x\rangle_B)(W_A + W_B) = \langle x\rangle_A W_A + \langle x\rangle_A W_B + \langle x\rangle_B W_A + \langle x\rangle_B W_B$, where the cross terms are computed with the secret sharing multiplication protocol. Finally, data holders reconstruct $h^0$ by summing over all the shares. The security can be proved by following the real-world/ideal-world simulation based approach [19], similar to SecureML [23].

Input: node features $x_i$ held by each data holder $P_i$, $i \in \{1, \dots, P\}$
Output: the initial node embeddings $h^0$ for each data holder
1 for each data holder $P_i$ in parallel do
2       $P_i$ randomly initializes its weight matrix $W_i$
3       $P_i$ locally generates additive shares of its features $x_i$
4       $P_i$ distributes the shares of $x_i$ to the other data holders
5       $P_i$ concatenates the shares it holds in order and gets $\langle x\rangle_i$
6       $P_i$ locally calculates $\langle x\rangle_i W_i$ as one share of $xW$
7 for each pair of data holders $P_i$ and $P_j$ ($i \neq j$) do
8       $P_i$ and $P_j$ calculate the shares of $\langle x\rangle_i W_j$ using secret sharing multiplication
9       $P_i$ and $P_j$ calculate the shares of $\langle x\rangle_j W_i$ using secret sharing multiplication
10 for each data holder $P_i$ in parallel do
11       $P_i$ locally calculates the summation of all the shares it holds, denoted as $\langle h^0\rangle_i$
12       $P_i$ sends $\langle h^0\rangle_i$ to the other data holders
13       $P_i$ reconstructs $h^0 = \sum_j \langle h^0\rangle_j$
return $h^0$ for each data holder
Algorithm 1 Data holders securely generate the initial node embeddings using secret sharing
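
To make the data flow of Algorithm 1 concrete, the following NumPy simulation with two data holders works over plaintext floats and replaces the secure cross-term products with a stand-in function, so it illustrates only the arithmetic $xW = \sum_i \sum_j \langle x\rangle_i W_j$ rather than the cryptographic protocol; all names and dimensions are illustrative.

import numpy as np

rng = np.random.default_rng(0)


def additive_share(x):
    """Split x into two additive shares (simulated over the reals for clarity)."""
    r = rng.standard_normal(x.shape)
    return r, x - r


def secure_matmul(x_share, w):
    """Stand-in for the Beaver-triplet matrix product of Section 2.3.

    In the real protocol, the owner of x_share and the owner of w would obtain
    additive shares of x_share @ w without revealing either input; here we
    simply return such shares computed in the clear.
    """
    return additive_share(x_share @ w)


# toy vertically partitioned features: 4 nodes, holder A has 3 features, B has 2
x_A, x_B = rng.standard_normal((4, 3)), rng.standard_normal((4, 2))
d_hidden = 8
W_A, W_B = rng.standard_normal((5, d_hidden)), rng.standard_normal((5, d_hidden))
# full inputs that the protocol never materialises in one place:
x = np.concatenate([x_A, x_B], axis=1)     # shared feature matrix
W = W_A + W_B                              # weight matrix, additively shared

# Step 1: each holder secretly shares its features; both concat the shares in order
xA_0, xA_1 = additive_share(x_A)
xB_0, xB_1 = additive_share(x_B)
x_at_A = np.concatenate([xA_0, xB_0], axis=1)   # <x> held by A
x_at_B = np.concatenate([xA_1, xB_1], axis=1)   # <x> held by B

# Step 2: local terms <x>_i W_i, plus secret-shared cross terms <x>_i W_j
h_A = x_at_A @ W_A
h_B = x_at_B @ W_B
c0_A, c0_B = secure_matmul(x_at_A, W_B)    # shares of <x>_A W_B
c1_A, c1_B = secure_matmul(x_at_B, W_A)    # shares of <x>_B W_A

# Step 3: sum the shares and reconstruct the initial node embeddings
h0 = (h_A + c0_A + c1_A) + (h_B + c0_B + c1_B)
assert np.allclose(h0, x @ W)
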
Figure 4: Generate local node embeddings by neighborhood aggregation.

3.3 Generate Local Node Embeddings

We generate local node embeddings by multi-hop neighborhood aggregation on graphs, based on the initial node embeddings. Note that, unlike existing GNNs that perform neighborhood aggregation on centralized data, under the data isolated setting neighborhood aggregation should be done by data holders separately, rather than cooperatively, to protect the private edge information. This is because, if neighborhood aggregation were done by data holders together, one could infer the neighborhood information of a node $v$ from its $k$-hop and $(k+1)$-hop aggregation results $h_v^k$ and $h_v^{k+1}$. A special case is $h_v^k = h_v^{k+1}$, which suggests that $v$ is likely an isolated node in the graph of some data holder. Therefore, to protect graph privacy, we let data holders perform multi-hop neighborhood aggregation separately using their own graphs.

For each data holder, neighborhood aggregation is the same as in a traditional GNN, as shown in Figure 4. Take GraphSAGE [14], a general inductive framework, for example: it introduces aggregator functions to update hidden embeddings by sampling and aggregating features from a node's local neighborhood:

$h_{\mathcal{N}(v)}^k \leftarrow \mathrm{AGG}_k\left(\{h_u^{k-1}, \forall u \in \mathcal{N}(v)\}\right), \qquad h_v^k \leftarrow \sigma\left(W^k \cdot \mathrm{CONCAT}(h_v^{k-1}, h_{\mathcal{N}(v)}^k)\right), \qquad (1)$

where we follow the same notation as GraphSAGE, and the aggregator function AGG can be one of three types, i.e., Mean, LSTM, and Pooling. Since our proposed PPGNN learning framework is suitable for any existing GNN model, without loss of generality, in this paper we take GraphSAGE as a typical GNN and present how to make it secure in the data isolated setting. After the aggregation, data holders send their local node embeddings to a semi-honest server for combination and further non-private data related computations.
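
As a reference point, the following NumPy sketch shows one GraphSAGE layer with the Mean aggregator of Eq. (1), run by a single data holder on its own adjacency list; neighbor sampling is omitted and all names are illustrative.

import numpy as np

rng = np.random.default_rng(0)


def graphsage_mean_layer(h, neighbors, W):
    """One GraphSAGE layer with a Mean aggregator (Eq. 1), no neighbor sampling.

    h:          (N, d_in) node embeddings from the previous hop
    neighbors:  dict mapping node id -> list of neighbor ids (this holder's edges)
    W:          (2 * d_in, d_out) weight matrix applied after CONCAT
    """
    h_new = np.zeros((h.shape[0], W.shape[1]))
    for v in range(h.shape[0]):
        nbrs = neighbors.get(v, [])
        # aggregate the neighborhood (fall back to the node itself if isolated)
        h_nbr = h[nbrs].mean(axis=0) if nbrs else h[v]
        # CONCAT(h_v, h_N(v)) followed by a linear map and a non-linearity
        h_new[v] = np.tanh(np.concatenate([h[v], h_nbr]) @ W)
    return h_new


# toy graph held by one data holder: 4 nodes and a few edges
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
h0 = rng.standard_normal((4, 8))                 # initial node embeddings
W1 = rng.standard_normal((16, 8))
local_embeddings = graphsage_mean_layer(h0, neighbors, W1)  # 1-hop local embeddings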

3.4 Generate Global Node Embeddings

The server combines the local node embeddings from data holders to obtain global node embeddings. The combination strategy should be trainable and maintain high representational capacity. We design three combination strategies.

Concat. The concat operator fully preserves the local node embeddings learnt from different data holders. That is, the COMBINE step in Algorithm 2 becomes

$h_v = \mathrm{CONCAT}\left(h_v^1, h_v^2, \dots, h_v^P\right), \qquad (2)$

where $h_v^i$ denotes the local node embedding of node $v$ from data holder $i$ and $P$ is the number of data holders.

Mean. The mean operator takes the element-wise mean of the local node embeddings, assuming data holders contribute equally to the global node embeddings, i.e.,

$h_v = \frac{1}{P}\sum_{i=1}^{P} h_v^i. \qquad (3)$

Regression. The above two strategies treat data holders equally. In reality, the local node embeddings from different data holders may contribute differently to the global node embeddings. We propose a Regression strategy to handle this situation. The regression operator combines the embedding elements from data holders through a regression model whose parameters are learnt during training. Let $w_i$ be the weight vector for the local node embeddings from data holder $i$; then

$h_v = \sum_{i=1}^{P} w_i \odot h_v^i, \qquad (4)$

where $\odot$ denotes element-wise multiplication. Regression can handle the situation where the data quality and quantity (feature and edge size) of data holders differ from each other.

These different combination operators utilize local node embeddings in diverse ways, and we will empirically study their effects on model performance in the experiments.
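
The three COMBINE strategies can be sketched as follows (NumPy, illustrative names); in PPGNN the regression weights would be trained jointly with the server-side network, which is omitted here.

import numpy as np

rng = np.random.default_rng(0)


def combine_concat(local_embs):
    """Concat: preserve every holder's embedding, Eq. (2)."""
    return np.concatenate(local_embs, axis=1)


def combine_mean(local_embs):
    """Mean: element-wise average, Eq. (3); assumes equal contributions."""
    return np.mean(local_embs, axis=0)


def combine_regression(local_embs, weights):
    """Regression: element-wise weights per holder, Eq. (4), learnt in training."""
    return sum(w * h for w, h in zip(weights, local_embs))


# toy local node embeddings from P = 3 data holders (5 nodes, 8 dims each)
local_embs = [rng.standard_normal((5, 8)) for _ in range(3)]
weights = [rng.standard_normal(8) for _ in range(3)]      # one weight vector per holder
g_concat = combine_concat(local_embs)                     # shape (5, 24)
g_mean = combine_mean(local_embs)                         # shape (5, 8)
g_reg = combine_regression(local_embs, weights)           # shape (5, 8)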

Input: graph $\mathcal{G}_i$ and node features $x_i$ on each data holder $i$; propagation depth $K$; aggregator functions AGG; aggregation weight matrices $W^k$; the server-side network and output-layer weight matrices; non-linearity $\sigma$; neighborhood functions $\mathcal{N}_i$; node labels on the data holder who holds them
Output: node label predictions on the data holder who holds the labels
1 # CG1: private feature and edge related computations by data holders
2 Data holders: collaboratively calculate the initial node embeddings $h^0$ using Algorithm 1
3 for each data holder $i$ in parallel do
4       for $k = 1$ to $K$ do
5             for each node $v$ do
6                   Data holder $i$: calculates $h_{\mathcal{N}(v)}^k = \mathrm{AGG}_k(\{h_u^{k-1}, \forall u \in \mathcal{N}_i(v)\})$
7                   Data holder $i$: calculates $h_v^k = \sigma(W^k \cdot \mathrm{CONCAT}(h_v^{k-1}, h_{\mathcal{N}(v)}^k))$
8       Data holder $i$: takes $h_v^K$ as its local node embeddings and sends them to the server
9 # CG2: non-private data related computations by the server
10 for each node $v$ do
11       Server: combines the local node embeddings from data holders, $h_v = \mathrm{COMBINE}(h_v^1, \dots, h_v^P)$
12 Server: forward propagation of the deep neural network based on the global node embeddings
13 Server: sends the final hidden layer to the data holder who has the labels
14 # CG3: private label related computations by the data holder who has the labels
15 Data holder with labels: makes the prediction by applying the output layer and Softmax
Algorithm 2 Privacy preserving GraphSAGE for node label prediction (forward propagation)

3.5 Putting together

By combining the three sub-computational graphs (CG1-CG3) and the key components described above, we complete the forward propagation of PPGNN. To describe the procedure in detail, we take GraphSAGE [14], a general inductive GNN framework, as an example and present its forward propagation process in Algorithm 2. The first block corresponds to CG1 on data holders: the call to Algorithm 1 shows how data holders generate initial node embeddings collaboratively using node feature information (Section 3.2), and the subsequent aggregation loop describes how data holders generate local node embeddings separately (Section 3.3), i.e., perform multi-hop neighborhood aggregation using their edge information. Note that this aggregation loop can be replaced with the aggregation strategies of other existing GNNs, e.g., [14, 17, 30], to adapt them to our PPGNN learning paradigm. The second block is CG2 on the server, which first generates global node embeddings with a COMBINE strategy (Section 3.4) and then performs forward propagation to obtain the last hidden layer. The final step shows how the data holder who has the labels conducts CG3 using the last hidden layer.

3.6 Model learning

PPGNN can be learnt by gradient descent through minimizing the cross-entropy error over all labeled training examples:

$\mathcal{L} = -\sum_{v \in \mathcal{Y}_L} \sum_{c=1}^{C} Y_{vc} \ln \hat{Y}_{vc}, \qquad (5)$

where $\mathcal{Y}_L$ is the set of training nodes that have labels, $C$ is the number of classes, $Y_{vc}$ indicates whether node $v$ belongs to class $c$, and $\hat{Y}_{vc}$ is the predicted probability.
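
For completeness, a small NumPy illustration of the loss in Eq. (5), evaluated only over the labeled training nodes; shapes and variable names are illustrative.

import numpy as np

rng = np.random.default_rng(0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def masked_cross_entropy(logits, labels, labeled_idx):
    """Eq. (5): sum of -log p(true class) over the labeled training nodes only."""
    probs = softmax(logits[labeled_idx])
    true_class_probs = probs[np.arange(len(labeled_idx)), labels[labeled_idx]]
    return -np.log(true_class_probs).sum()


logits = rng.standard_normal((6, 3))          # CG3 output for 6 nodes, 3 classes
labels = rng.integers(0, 3, size=6)
labeled_idx = np.array([0, 2, 5])             # only these nodes are training nodes
loss = masked_cross_entropy(logits, labels, labeled_idx)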

Specifically, the model weights of PPGNN fall into four parts: the weights for the initial node embeddings, which are secretly shared by the data holders; the weights for neighborhood aggregation on graphs, which are also kept by the data holders; the weights for the hidden layers of the deep neural network, which are held by the server; and the weights for the output layer, which are held by the data holder who has the labels. These weights are trained using gradient descent. In this work, we perform batch gradient descent, i.e., we use the full dataset in each iteration, which is a viable option as long as the dataset fits in memory. We leave memory-efficient extensions with mini-batch gradient descent for future work. As can be seen, in PPGNN both the private data and the model weights that touch it are kept by the data holders themselves, and thus data privacy is preserved.

4 Experiments

We conduct experiments to answer the following questions:


  • Q1: whether PPGNN outperforms the GNN models that are trained on the isolated data.

  • Q2: how does PPGNN behave compared with the traditional insecure model trained on the plaintext mixed data.

  • Q3: how does PPGNN perform compared with the naive solution in Section 1.

  • Q4: are our proposed combination strategies effective for PPGNN.

  • Q5: what is the effect of the number of data holders on PPGNN.

4.1 Experimental Setup

We first describe the datasets and comparison methods we use in our experiments.

Datasets. We use three benchmark datasets, i.e., Cora, Pubmed, and Citeseer [27], which are popularly used to evaluate the performance of GNNs. We use exactly the same training, validation, and test partition as the prior work [16]. Besides, in the data isolated GNN setting, both node features and edges are held by different parties. For all the experiments, we split the datasets randomly, use five-fold cross validation, and adopt accuracy as the evaluation metric.

Dataset #Node #Edge #Features #Classes
Cora 2,708 5,429 1,433 7
Pubmed 19,717 44,338 500 3
Citeseer 3,327 4,732 3,703 6
Table 1: Dataset statistics.

Comparison methods. We compare PPGNN with the GraphSAGE models [14] trained on the isolated data and on the mixed plaintext data to answer Q1 and Q2. We also compare PPGNN with the naive solution described in Section 1 to answer Q3. To answer Q4, we vary the proportion of data (both features and edges) held by the two data holders and study the performance of PPGNN with different combination strategies. We vary the number of data holders in PPGNN to answer Q5. For all these models, we choose the Mean operator as the aggregator function.

Parameter settings. For all the models, we use TanH as the activation function for neighborhood propagation and Sigmoid as the activation function of the hidden layers. For the deep neural network on the server, we set the dropout rate to 0.5 and choose the network structure according to the dimension of the node embeddings and the number of classes. Since we have many comparison and ablation models, and they achieve their best performance with different parameters, we cannot report all the best parameters; instead, we report the range of the best parameters. We vary the propagation depth, the L2 regularization strength, and the learning rate. We tune parameters on the validation set and evaluate model performance on the test set during five-fold cross validation.

Dataset   Cora   Pubmed   Citeseer
GraphSAGE (isolated data, holder 1) 0.6110 0.6720 0.5410
GraphSAGE (isolated data, holder 2) 0.6060 0.7030 0.4570
PPGNN_C 0.7900 0.7740 0.6850
PPGNN_M 0.8090 0.7810 0.6950
PPGNN_R 0.8020 0.7820 0.6930
GraphSAGE (mixed plaintext data) 0.8150 0.7910 0.7001
Table 2: Performance comparison on three datasets in terms of accuracy.

4.2 Comparison Results and Analysis

To answer Q1-Q3, we assume there are two data holders who hold equal numbers of node features and edges, i.e., the proportion of data held by the two holders is 5:5, and compare our models with GraphSAGE trained on each holder's isolated data individually and on the mixed plaintext data. We summarize the results in Table 2, where PPGNN_C, PPGNN_M, and PPGNN_R denote PPGNN with the Concat, Mean, and Regression combination strategies, respectively.

Result1: answer to Q1. We first compare PPGNN with the GraphSAGE models trained on the isolated feature and edge data of each holder. From Table 2, we find that PPGNN with different combination strategies significantly improves over both isolated GraphSAGE models on all three datasets. Take Citeseer for example: PPGNN_R improves the accuracy of the two isolated GraphSAGE models by as much as 28.10% and 51.64%, respectively.

Analysis of Result1. The reason for Result1 is straightforward. Each isolated GraphSAGE model can only use the partial feature and edge information held by a single data holder. In contrast, PPGNN provides a solution for the data holders to train GNNs collaboratively without exposing their own data. By doing this, PPGNN can use the information from the data of both holders simultaneously, and therefore achieves better performance.

Result2: answer to Q2. We then compare PPGNN with the GraphSAGE model trained on the mixed plaintext data. It can be seen from Table 2 that PPGNN has comparable performance with the plaintext model, e.g., 0.8090 vs. 0.8150 on the Cora dataset and 0.6950 vs. 0.7001 on the Citeseer dataset.

Analysis of Result2. It is not hard to explain why our proposal has comparable performance with the model trained on the mixed plaintext data. First, we propose a secret sharing based protocol (Algorithm 1) for the data holders to securely generate the initial node embeddings from their node features, and these embeddings are the same as those generated from the mixed plaintext features. Second, although the data holders generate local node embeddings by using their own edge data to do neighborhood aggregation separately (for security concerns), we propose different combination strategies to combine their local node embeddings, so that the edge information of all data holders is eventually used for training the classification model. Therefore, PPGNN achieves performance comparable with the plaintext GraphSAGE.

Result3: answer to Q3. In PPGNN, we delegate the non-private data related computations to the server. One may wonder what happens if these computations are also performed by the data holders using existing secure neural network protocols, i.e., SecureML [23]. To answer this question, we compare PPGNN with a secure GNN model implemented using SecureML, which we call SecureML-GNN, where we use a 3-degree Taylor expansion to approximate TanH and Sigmoid. The accuracy and running time per epoch (in seconds) of PPGNN vs. SecureML-GNN on Pubmed are 0.8090 vs. 0.7970 and 18.65 vs. 166.81, respectively, measured in a local area network setting.

Analysis of Result3. From the above comparison results, we find that our proposed PPGNN learning paradigm not only achieves better accuracy, but also has much better efficiency. This is because the non-private data related computations involve non-linear functions that are not cryptographically friendly, which are approximately calculated using time-consuming MPC techniques in SecureML.

4.3 Ablation Study

We now study the effects of different combination operators and different numbers of data holders on PPGNN.

Prop. PPGNN_C PPGNN_M PPGNN_R
9:1 0.8085 0.8050 0.8090
8:2 0.8015 0.7960 0.8070
7:3 0.7925 0.7925 0.8030
Table 3: Comparison of combination operators on Cora by varying the proportion of data held by the two data holders.

Result4: answer to Q4. Different combination operators utilize local node embeddings in diverse ways and make our proposed PPGNN adaptable to different scenarios. We study this by varying the proportion (Prop.) of data (node features and edges) held by the two data holders in {9:1, 8:2, 7:3}. The results on the Cora dataset are shown in Table 3.

Analysis of Result4. From Table 3, we find that as the proportion of data held by the two holders becomes more even, i.e., from 9:1 to 7:3, the performance of most strategies tends to decrease. This is because the neighborhood aggregation is done by data holders individually, and with a bigger proportion of data held by a single holder, it is easier for that party to generate better local node embeddings. Moreover, we also find that the Mean operator works well when the data are evenly split, while the Regression operator is better at handling situations where data holders have different quantities of data, since it treats the local node embeddings from each data holder differently and assigns weights to them intelligently.

No. of DH PPGNN_C PPGNN_M PPGNN_R
2 0.7900 0.8090 0.8020
3 0.7490 0.7740 0.7600
4 0.7120 0.7330 0.7220
Table 4: Comparison results on Cora by varying the number of DHs.

Result5: answer to Q5. We vary the number of Data Holders (DHs) in {2, 3, 4} and study the performance of PPGNN. We report the results in Table 4, where we use the Cora dataset and assume the data holders evenly split the feature and edge data.

Analysis of Result5. From Table 4, we find that as the number of data holders increases, the accuracy of all the models decreases. This is because the neighborhood aggregation in PPGNN is done by each holder individually for privacy concerns, and each data holder has less edge data when there are more data holders, since they split the original edge information evenly. Therefore, when more participants are involved, more information is lost during the neighborhood aggregation procedure.

5 Related Work

We review three kinds of existing Privacy Preserving Neural Network (PPNN) models.

PPNN based on cryptographic methods. These methods mainly use cryptographic techniques, e.g., secret sharing and homomorphic encryption, to build approximated neural network models [23, 32], since the non-linear activation functions are not cryptographically computable. Cryptography-based neural network models are difficult to scale to deep networks and large datasets due to their high communication and computation complexities. In this paper, we use cryptographic techniques for data holders to calculate the initial node embeddings securely.

PPNN based on differential privacy. These methods adopt differential privacy when training neural networks [29, 1]. The most common approach follows the parameter server learning paradigm, that is, data holders calculate model gradients, add noise to them, and send them to a server for updating the global model. These methods are efficient but fail to achieve promising performance when too much noise is added. Moreover, differential privacy is difficult to apply to the vertically partitioned data setting.

PPNN based on split computation graphs. These methods split the computation graph of neural networks into two parts, let data holders perform the private data related computations individually to obtain a hidden layer, and then let a server perform the rest of the computations [31, 8, 24, 13]. Our model differs from them in mainly two aspects. First, we train a GNN rather than a plain neural network. Second, we use cryptographic techniques for data holders to calculate the initial node embeddings collaboratively, rather than computing them from their plaintext data individually.

6 Conclusion and Future Work

We proposed a privacy-preserving GNN learning paradigm for the node classification task. We did this by splitting the computation graph of GNN: we left the private data related computations on data holders and delegated the rest of the computations to a server. Experiments on real-world datasets demonstrate that our model significantly outperforms the GNNs trained on the isolated data and has comparable performance with the traditional GNN trained insecurely on the mixed plaintext data.

In the future, we would like to extend our approach to mini-batch gradient descent. We are also interested in studying how to prevent attacks from the server, e.g., membership inference attacks.

References

  • [1] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang (2016) Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318. Cited by: §5.
  • [2] A. Acar, H. Aksu, A. S. Uluagac, and M. Conti (2018) A survey on homomorphic encryption schemes: theory and implementation. ACM Computing Surveys (CSUR) 51 (4), pp. 79. Cited by: §3.2.
  • [3] Y. Aono, T. Hayashi, L. Trieu Phong, and L. Wang (2016) Scalable and secure logistic regression via homomorphic encryption. In Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, pp. 142–144. Cited by: §1.
  • [4] D. Beaver (1991) Efficient multiparty protocols using circuit randomization. In Annual International Cryptology Conference, pp. 420–432. Cited by: §2.3.
  • [5] C. Chen, L. Li, W. Fang, J. Zhou, L. Wang, L. Wang, S. Yang, A. Liu, and H. Wang (2020) Secret sharing based secure regressions with applications. arXiv preprint arXiv:2004.04898. Cited by: §1.
  • [6] C. Chen, L. Li, B. Wu, C. Hong, L. Wang, and J. Zhou (2020) Secure social recommendation based on secret sharing. arXiv preprint arXiv:2002.02088. Cited by: §1, §2.3.
  • [7] C. Chen, Z. Liu, P. Zhao, J. Zhou, and X. Li (2018) Privacy preserving point-of-interest recommendation using decentralized matrix factorization. In Thirty-Second AAAI Conference on Artificial Intelligence, Cited by: §1.
  • [8] J. Chi, E. Owusu, X. Yin, T. Yu, W. Chan, P. Tague, and Y. Tian (2018) Privacy partitioning: protecting user data during the deep learning inference phase. arXiv preprint arXiv:1812.02863. Cited by: §5.
  • [9] M. d. Cock, R. Dowsley, A. C. Nascimento, and S. C. Newman (2015) Fast, privacy preserving linear regression over distributed datasets based on pre-distributed data. In Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security, pp. 3–14. Cited by: §2.3.
  • [10] V. Costan and S. Devadas (2016) Intel sgx explained.. IACR Cryptology ePrint Archive 2016 (086), pp. 1–118. Cited by: §1.
  • [11] D. Demmler, T. Schneider, and M. Zohner (2015) ABY-a framework for efficient mixed-protocol secure two-party computation. In NDSS, Cited by: §1.
  • [12] A. Gascón, P. Schoppmann, B. Balle, M. Raykova, J. Doerner, S. Zahur, and D. Evans (2017) Privacy-preserving distributed linear regression on high-dimensional data. Proceedings on Privacy Enhancing Technologies 2017 (4), pp. 345–364. Cited by: §1.
  • [13] Z. Gu, H. Huang, J. Zhang, D. Su, A. Lamba, D. Pendarakis, and I. Molloy (2018) Securing input data of deep learning inference systems via partitioned enclave execution. arXiv preprint arXiv:1807.00969. Cited by: §1, §5.
  • [14] W. Hamilton, Z. Ying, and J. Leskovec (2017) Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, pp. 1024–1034. Cited by: §2.2, §3.3, §3.5, §4.1.
  • [15] S. Hardy, W. Henecka, H. Ivey-Law, R. Nock, G. Patrini, G. Smith, and B. Thorne (2017) Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption. arXiv preprint arXiv:1711.10677. Cited by: §2.1, §3.1.
  • [16] T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §2.2, §3.1, §4.1.
  • [17] Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel (2015) Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493. Cited by: §2.2, §3.5.
  • [18] Y. Lindell (2005) Secure multiparty computation for privacy preserving data mining. In Encyclopedia of Data Warehousing and Mining, pp. 1005–1009. Cited by: §1.
  • [19] Y. Lindell (2017) How to simulate it–a tutorial on the simulation proof technique. In Tutorials on the Foundations of Cryptography, pp. 277–346. Cited by: §3.2.
  • [20] Y. Liu, C. Chen, L. Zheng, L. Wang, J. Zhou, and G. Liu (2020) Privacy preserving pca for multiparty modeling. arXiv preprint arXiv:2002.02091. Cited by: §2.3.
  • [21] Z. Liu, C. Chen, L. Li, J. Zhou, X. Li, L. Song, and Y. Qi (2019) Geniepath: graph neural networks with adaptive receptive paths. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 4424–4431. Cited by: §2.2.
  • [22] Z. Liu, C. Chen, X. Yang, J. Zhou, X. Li, and L. Song (2018) Heterogeneous graph neural networks for malicious account detection. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 2077–2085. Cited by: §1.
  • [23] P. Mohassel and Y. Zhang (2017) SecureML: a system for scalable privacy-preserving machine learning. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 19–38. Cited by: §1, §2.1, §2.3, §3.1, §3.2, §4.2, §5.
  • [24] S. A. Osia, A. S. Shamsabadi, A. Taheri, K. Katevas, S. Sajadmanesh, H. R. Rabiee, N. D. Lane, and H. Haddadi (2019) A hybrid deep learning architecture for privacy-preserving mobile analytics. arXiv preprint arXiv:1703.02952. Cited by: §1, §1, §5.
  • [25] B. Pinkas, T. Schneider, and M. Zohner (2014) Faster private set intersection based on ot extension. In USENIX Security Symposium, pp. 797–812. Cited by: §3.1.
  • [26] A. Rahimi, T. Cohn, and T. Baldwin (2018) Semi-supervised user geolocation via graph convolutional networks. arXiv preprint arXiv:1804.08049. Cited by: §1.
  • [27] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad (2008) Collective classification in network data. AI magazine 29 (3), pp. 93–93. Cited by: §4.1.
  • [28] A. Shamir (1979) How to share a secret. Communications of the ACM 22 (11), pp. 612–613. Cited by: §2.3.
  • [29] R. Shokri and V. Shmatikov (2015) Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC conference on Computer and Communications Security, pp. 1310–1321. Cited by: §5.
  • [30] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio (2017) Graph attention networks. arXiv preprint arXiv:1710.10903. Cited by: §1, §2.2, §3.5.
  • [31] P. Vepakomma, O. Gupta, T. Swedish, and R. Raskar (2018) Split learning for health: distributed deep learning without sharing raw patient data. arXiv preprint arXiv:1812.00564. Cited by: §1, §5.
  • [32] S. Wagh, D. Gupta, and N. Chandran (2019) SecureNN: 3-party secure computation for neural network training. Proceedings on Privacy Enhancing Technologies 1, pp. 24. Cited by: §1, §5.
  • [33] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon (2018) Dynamic graph cnn for learning on point clouds. arXiv preprint arXiv:1801.07829. Cited by: §1.
  • [34] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu (2019) A comprehensive survey on graph neural networks. arXiv preprint arXiv:1901.00596. Cited by: §1, §2.2.
  • [35] Q. Yang, Y. Liu, T. Chen, and Y. Tong (2019) Federated machine learning: concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST) 10 (2), pp. 12. Cited by: §1, §1.
  • [36] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec (2018) Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 974–983. Cited by: §1, §2.2.
  • [37] M. Zhang and Y. Chen (2018) Link prediction based on graph neural networks. In Advances in Neural Information Processing Systems, pp. 5165–5175. Cited by: §2.2.
  • [38] L. Zheng, C. Chen, Y. Liu, B. Wu, X. Wu, L. Wang, L. Wang, J. Zhou, and S. Yang (2020) Industrial scale privacy preserving deep neural network. arXiv preprint arXiv:2003.05198. Cited by: §1.
  • [39] J. Zhou, G. Cui, Z. Zhang, C. Yang, Z. Liu, and M. Sun (2018) Graph neural networks: a review of methods and applications. arXiv preprint arXiv:1812.08434. Cited by: §2.2.