1 Introduction
Nowadays, open network data play a pivotal role in data mining and data analytics tang2008arnetminer ; sen2008collective ; blum2013learning ; snapnets . By releasing and sharing structured relational data with research facilities and enterprise partners, data companies are harvesting the enormous potential value of their data, which benefits decision making in various aspects, including social, financial, and environmental ones, through collectively improved ads, recommendation, retention, and so on yang2017bridging ; yang2018know ; sigurbjornsson2008flickr ; kuhn2009compensation . However, network data usually encode sensitive information not only about individuals but also about their interactions, which makes direct release and exploitation rather unsafe. More importantly, even with careful anonymization, individual privacy is still at stake under collective attack models facilitated by the underlying network structure zhang2019enabling ; cai2018collective . Can we find a way to securely release network data without drastic sanitization that essentially renders the released data useless?
To deal with such tension between the need to release utilizable data and the concern for data owners' privacy, quite a few models have been proposed recently, focusing on grid-based data like images, texts and gene sequences frigerio2019differentially ; papernot2018scalable ; triastcyn2018generating ; narayanan2008robust ; mohammed2011differentially ; xie2018differentially ; chen2018differentially ; digvijay2018private ; balog2018differentially ; lecuyer2018certified ; zhang2018differentially . However, none of the existing models can be directly applied to the network (graph) setting. While a secure generative model on grid-based data aims to preserve high-level semantics (e.g., class distributions) and protect detailed training data (e.g., exact images or sentences), it remains unclear what should be preserved and what should be protected for network data, due to its modeling of complex interactive objects.
Motivating scenario.
In Figure 1, a bank aims to encourage public studies on the community structures of its customers. It does so by first anonymizing all users in the network and then sharing the anonymized network (i.e., network (a) in Figure 1) with the public. However, an attacker interested in knowing the financial interactions (e.g., money transfers) between particular customers can easily get access to another public social network and locate a group of users that likely overlap with the customers in network (a) (e.g., by leveraging public user attributes). Simple graph properties like node degree distribution and triangle count can then be used to identify specific users with high accuracy (e.g., customer A as the only node with degree 5 and within 1 triangle, and customer B as the only node with degree 2 and within 1 triangle). Thus, the attacker confidently knows the identities of A and B and the fact that they have financial interactions, which seriously harms customers' privacy and poses potential crises.
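The re-identification step in this scenario can be illustrated with a minimal sketch. It computes a (degree, triangle count) signature for every node of an anonymized graph and reports the nodes whose signature is unique. The toy edge list and the function names are hypothetical and are not the network drawn in Figure 1.

```python
from collections import defaultdict
from itertools import combinations


def structural_signatures(edges):
    """Map each node to (degree, number of triangles it participates in)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    sigs = {}
    for node, nbrs in adj.items():
        # A triangle through `node` is a pair of its neighbors that are adjacent.
        triangles = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
        sigs[node] = (len(nbrs), triangles)
    return sigs


def uniquely_identifiable(edges):
    """Return the nodes whose signature is shared by no other node."""
    sigs = structural_signatures(edges)
    counts = defaultdict(int)
    for sig in sigs.values():
        counts[sig] += 1
    return {node: sig for node, sig in sigs.items() if counts[sig] == 1}


# Hypothetical anonymized toy graph (node ids carry no identity information).
toy_edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (3, 6), (6, 7), (2, 7)]
exposed = uniquely_identifiable(toy_edges)
```

In this toy graph, node 0 is the only node with degree 5 inside one triangle and node 1 is the only node with degree 2 inside one triangle, so an attacker holding the same signatures from an auxiliary network could single them out despite anonymization.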
In this work, we formulate the goal of secure network release as preserving global network structure while protecting individual link privacy. Continuing with the toy example, the solution we propose is to train a graph neural network model on the original network and release the generated networks (e.g., (b) in Figure 1). Towards the utility of generated networks, we require them to be similar to the original networks from a global perspective, which can be measured by various global graph properties (e.g., network (b) has a very similar degree distribution and the same triangle count as (a)). In this way, we expect many downstream data mining and analytical tasks on them to produce similar results as on the original networks. As for privacy protection, we require that the information in the generated networks cannot confidently reveal the existence or absence of any individual links in the original networks (e.g., the attacker may still identify customers A and B in network (b), but their individual link structure has changed).
However, there are two unique challenges in learning such structure-preserved and privacy-protected graph generation models, which have not been explored in the existing literature so far.
Challenge 1: Rigorous protection of individual link privacy.
The rich relational structures in graph data often allow attackers to recover private information through various ways of collective inference zhang2014privacy ; narayanan2009anonymizing ; backstrom2007wherefore . Moreover, graph structure can always be converted to numerical features such as spectral embeddings, after which most attacks on grid-based data, like model inversion fredrikson2015model and membership inference shokri2017membership , can be directly applied for link identification. How can we design an effective mechanism with rigorous privacy protection on links in networks against various attacks?
Challenge 2: Effective preservation of global network structure.
In order to capture global network structure, the model has to constantly compare the structures of the input and currently generated graphs during training. However, unlike images and other grid-based data, graphs have flexible structures, and thus lack efficient universal representations dong2019network . How can we allow a network generation model to effectively learn from the structural difference between two graphs, without repeatedly conducting very time-costly operations like isomorphism tests?
Present work.
In this work, for the first time, we draw attention to the secure release of network data with deep generative models. Technically, towards the aforementioned two challenges, we develop Differentially Private Graph Generative Nets (DPGGen), which imposes DP training over a link prediction based network generation model for rigorous individual link privacy protection, and further ensures structure-oriented graph comparison for effective global network structure preservation. In particular, we first formulate and enforce edge-DP via Gaussian gradient distortion by injecting designed noise into the sensitive modules during model training. Then we leverage graph convolutional networks kipf2016semi through a variational generative adversarial network architecture gu2018dialogwae ; larsen2016autoencoding to enable structure-oriented network generation.
To evaluate the effectiveness of DPGGen, we conduct extensive experiments on two real-world network datasets. On the one hand, we evaluate the utility of generated networks by computing a suite of commonly used graph properties to compare the global structure of generated networks with the original ones. On the other hand, we validate the privacy of individual links by evaluating links predicted from the generated networks on the original networks. Consistent experimental results show that DPGGen is able to effectively generate networks that are similar to the original ones regarding global network structure, while at the same time being useless for individual link prediction.
2 Related Work
Differential Privacy (DP).
With graph-structured data, two types of privacy constraint can be applied, i.e., node-DP Kasiviswanathan13NodeDP and edge-DP Blocki12EdgeDP , which define two neighboring graphs to differ by at most one node or edge. In this work, we aim at the secure release of network data, and particularly, we focus on edge privacy, because it is essential for the protection of object interactions, which are unique to network data in comparison with other types of data. Several existing works have studied the protection of edge-DP. For example, Sala11ShareGraphDP generates graphs based on the statistical representations extracted from the original graphs blurred by designed noise, whereas Wang13DPDegreeGraphGeneration enforces the parameters of dK-graph models to be private. However, being based on shallow graph generation models, they do not flexibly capture global network structure that can support various unknown downstream analytical tasks zhang2019enabling ; wasserman2010statistical .
Recent advances in deep learning have led to the rapid development of DP-oriented learning schemes. For example, Abadi:2016:DLD:2976749.2978318 refines the analysis of privacy costs, which provides a tighter estimation of the overall privacy loss by tracking detailed information of the stochastic gradient descent process. DP learning has also been widely adapted to generative models frigerio2019differentially ; papernot2018scalable ; triastcyn2018generating ; narayanan2008robust ; mohammed2011differentially ; xie2018differentially ; chen2018differentially ; digvijay2018private ; balog2018differentially ; lecuyer2018certified ; zhang2018differentially . For example, frigerio2019differentially ; chen2018differentially ; digvijay2018private ; zhang2018differentially share the same spirit by enforcing DP on the discriminators, and thus inductively on the generators, in a generative adversarial network (GAN) scheme. However, none of them can be directly applied to graph data due to the lack of consideration of structure generation.
Graph Generation (GGen).
GGen has been studied for decades and is widely used to synthesize network data for the development of various collective analysis and mining models evans2009line ; hallac2017network . Earlier works mainly use probabilistic models to generate graphs with certain properties erds1960evolution ; watts1998collective ; barabasi1999emergence ; newman2001clustering , which are manually designed based on mere observations and prior assumptions.
Thanks to the surge of deep learning, many advanced GGen models have been developed recently, which leverage different powerful neural networks in a learn-to-generate manner kipf2016variational ; bojchevski2018netgan ; you2018graphrnn ; simonovsky2018graphvae ; li2018learning ; you2018graph ; jin2018junction ; grover2018graphite ; de2018molgan ; zou2018encoding ; ma2018constrained . For example, NetGAN bojchevski2018netgan converts graphs into biased random walks, learns the generation of walks with GAN, and assembles the generated walks into graphs; GraphRNN you2018graphrnn regards the generation of graphs as node-and-edge addition sequences, and models it with a heuristic breadth-first-search scheme and hierarchical RNN. These neural network based models can often generate graphs with much richer properties and flexible structures learned from real-world graphs.
To the best of our knowledge, no existing work on deep GGen has looked into the potential privacy threats arising during the learning and releasing of these powerful models. In fact, such concerns are rather urgent in the network setting, where sensitive information can often be more easily compromised in a collective manner dai2018adversarial ; backstrom2007wherefore ; zhang2014privacy and privacy leakage can easily propagate further narayanan2009anonymizing ; zugner2018adversarial .
3 DPGGen
In this work, we propose DPGGen for the secure release of generated networks, whose global graph structures are similar to the original sensitive networks, but the individual links (edges) between objects (nodes) are safely protected.
To provide robust privacy guarantees against various graph attacks, we propose to leverage the well-studied technique of differential privacy (DP) dwork2014algorithmic by enforcing edge-DP, defined as follows.
Definition 1 (Edge Differential Privacy Blocki12EdgeDP )
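A randomized mechanism $\mathcal{M}$ satisfies $(\varepsilon, \delta)$-edge-DP if for any two neighboring graphs $G_1, G_2$, which differ by at most one edge, $\Pr[\mathcal{M}(G_1) \in S] \leq e^{\varepsilon} \Pr[\mathcal{M}(G_2) \in S] + \delta$, where $S \subseteq \mathrm{Range}(\mathcal{M})$.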
Our key insight is that a graph generation model satisfying the above edge-DP should learn to generate similar graphs if trained with two neighboring graphs that differ by at most one edge; as a consequence, information in the generated graph does not confidently reveal the existence or absence of any one particular edge in the original graph, thus protecting individual link privacy.
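To make the inequality in Definition 1 concrete, the following sketch checks it for a mechanism with a finite output space. It uses randomized response on a single edge indicator as the example mechanism, which is a textbook DP mechanism chosen purely for illustration and is not the mechanism proposed in this work; all function names are hypothetical.

```python
import math


def satisfies_dp(p, q, eps, delta=0.0, tol=1e-12):
    """Check Pr[M(G1) in S] <= e^eps * Pr[M(G2) in S] + delta for all S, both ways.

    p and q map every possible output to its probability under the two
    neighboring inputs. The worst-case set S collects exactly the outputs
    where one probability exceeds e^eps times the other.
    """
    outputs = set(p) | set(q)

    def worst_gap(a, b):
        return sum(max(0.0, a.get(o, 0.0) - math.exp(eps) * b.get(o, 0.0)) for o in outputs)

    return worst_gap(p, q) <= delta + tol and worst_gap(q, p) <= delta + tol


def randomized_response(edge_present, eps):
    """Output distribution when one edge bit is reported truthfully w.p. e^eps / (1 + e^eps)."""
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    return {edge_present: keep, (not edge_present): 1.0 - keep}


# Two neighboring "graphs" that differ in exactly one edge.
with_edge = randomized_response(True, eps=1.0)
without_edge = randomized_response(False, eps=1.0)
ok_at_budget = satisfies_dp(with_edge, without_edge, eps=1.0)
ok_below_budget = satisfies_dp(with_edge, without_edge, eps=0.5)
ok_with_slack = satisfies_dp(with_edge, without_edge, eps=0.5, delta=0.3)
```

The mechanism calibrated for a budget of 1.0 passes the check at that budget, fails at the tighter budget 0.5, and passes again once a sufficiently large slack delta is allowed, which mirrors the roles of the two parameters in the definition.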
To ensure DP on individual links, we exploit the existing link reconstruction based graph generation model GraphVAE kipf2016variational , and design a training algorithm to dynamically distort the gradients of its sensitive model parameters by injecting proper amounts of Gaussian noise based on the framework of DPSGD Abadi:2016:DLD:2976749.2978318 . Moreover, to improve the capturing of global graph structures, we replace the direct BCE loss on graph adjacency matrices in GraphVAE with a structure-oriented graph discriminator based on GCN kipf2016semi and the framework of VAEGAN gu2018dialogwae .
Backbone GraphVAE.
Recent research on graph models has largely focused on GCN kipf2016semi , which has been shown to be promising in calculating universal graph representations maron2019invariant ; xu2018powerful ; chen2019equivalence ; keriven2019universal . In this work, we harness the power of GCN under the consideration of edge-DP by adapting the link reconstruction based graph variational autoencoder (GraphVAE) kipf2016variational as our backbone graph generation model.
Particularly, we are given a graph $G = \{V, E\}$, where $V$ is the set of $N$ nodes, and $E$ is the set of edges, which can be further modeled by a binary adjacency matrix $A$. As a common practice hamilton2017inductive , we set the node features $X$ simply as the one-hot node identity matrix. The autoencoder architecture of GraphVAE consists of a GCN-based graph encoder to guide the learning of a fully connected feedforward neural network (FNN) based adjacency matrix decoder, which can be trained to directly reconstruct graphs with similar links as in the input graphs. A stochastic latent variable $Z$ is further introduced as the latent representation of $G$ as
(1)  $q(Z \mid X, A) = \prod_{i=1}^{N} q(z_i \mid X, A) = \prod_{i=1}^{N} \mathcal{N}(z_i \mid \mu_i, \mathrm{diag}(\sigma_i^2))$
where $\mu = g_{\mu}(X, A)$ is the matrix of mean vectors $\mu_i$, and $\sigma = g_{\sigma}(X, A)$ is the matrix of standard deviation vectors $\sigma_i$. $g_{\bullet}(X, A) = \tilde{A}\,\mathrm{ReLU}(\tilde{A} X W_0) W_1$ is a two-layer GCN model. $g_{\mu}$ and $g_{\sigma}$ share the first-layer parameters $W_0$. $\tilde{A} = D^{-1/2} A D^{-1/2}$ is the symmetrically normalized adjacency matrix of $G$, with degree matrix $D_{ii} = \sum_j A_{ij}$. $g_{\mu}$ and $g_{\sigma}$ form the encoder network.
To generate a graph $G'$, a reconstructed adjacency matrix $A'$ is computed from $Z$ by a decoder network as
(2)  $p(A \mid Z) = \prod_{i=1}^{N} \prod_{j=1}^{N} p(A_{ij} \mid z_i, z_j)$, with $p(A_{ij} = 1 \mid z_i, z_j) = \mathrm{sigm}(f(z_i)^{\top} f(z_j))$
where $\mathrm{sigm}(x) = 1 / (1 + e^{-x})$, and $f$ is a two-layer FNN appended to $Z$ before the logistic sigmoid function. It aims to generate individual links to be compared with those in the input graph.
The whole model is trained through standard variational inference by optimizing the following variational lower bound
(3)  $\mathcal{L}_{vae} = \mathcal{L}_{rec} + \mathcal{L}_{prior} = \mathbb{E}_{q(Z \mid X, A)}[\log p(A \mid Z)] - D_{KL}(q(Z \mid X, A) \,\|\, p(Z))$
where $\mathcal{L}_{rec}$ is implemented as the sum of an element-wise binary cross entropy (BCE) loss between the adjacency matrices of the input and generated graphs, and $\mathcal{L}_{prior}$ is a prior loss based on the Kullback-Leibler divergence towards the Gaussian prior
$p(Z) = \prod_{i=1}^{N} p(z_i) = \prod_{i=1}^{N} \mathcal{N}(z_i \mid 0, I)$
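The backbone described above can be sketched as a single forward pass in NumPy. This is an untrained illustration under stated assumptions: the weight names, layer sizes and initialization are hypothetical, the second encoder head outputs the log standard deviation for numerical convenience (as is common in variational graph autoencoders), and the loss is written as a quantity to minimize (BCE plus KL).

```python
import numpy as np


def normalize_adjacency(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; isolated nodes keep zero rows."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    mask = deg > 0
    inv_sqrt[mask] = deg[mask] ** -0.5
    return A * inv_sqrt[:, None] * inv_sqrt[None, :]


def gcn(A_norm, X, W0, W1):
    """Two-layer GCN: A_norm ReLU(A_norm X W0) W1."""
    return A_norm @ np.maximum(A_norm @ X @ W0, 0.0) @ W1


def init_params(n, hidden, latent, rng):
    """Hypothetical small random weights; a real model would learn these."""
    shapes = {"W0": (n, hidden), "W_mu": (hidden, latent), "W_sigma": (hidden, latent),
              "U0": (latent, hidden), "U1": (hidden, latent)}
    return {name: 0.1 * rng.standard_normal(shape) for name, shape in shapes.items()}


def graphvae_forward(A, params, rng):
    A_norm = normalize_adjacency(A)
    X = np.eye(A_norm.shape[0])  # one-hot node identity features
    mu = gcn(A_norm, X, params["W0"], params["W_mu"])
    log_sigma = gcn(A_norm, X, params["W0"], params["W_sigma"])  # shared first layer
    Z = mu + np.exp(log_sigma) * rng.standard_normal(mu.shape)   # reparameterized sample
    H = np.maximum(Z @ params["U0"], 0.0) @ params["U1"]         # two-layer FNN decoder
    A_prob = 1.0 / (1.0 + np.exp(-(H @ H.T)))                    # link probabilities
    return mu, log_sigma, A_prob


def negative_elbo(A, mu, log_sigma, A_prob, eps=1e-9):
    """Element-wise BCE reconstruction term plus KL divergence to N(0, I)."""
    A = np.asarray(A, dtype=float)
    bce = -np.sum(A * np.log(A_prob + eps) + (1.0 - A) * np.log(1.0 - A_prob + eps))
    kl = -0.5 * np.sum(1.0 + 2.0 * log_sigma - mu ** 2 - np.exp(2.0 * log_sigma))
    return bce + kl


rng = np.random.default_rng(0)
A_toy = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])  # 4-node path
params = init_params(n=4, hidden=8, latent=4, rng=rng)
mu, log_sigma, A_prob = graphvae_forward(A_toy, params, rng)
loss = negative_elbo(A_toy, mu, log_sigma, A_prob)
```

Because the decoder scores a pair of nodes through an inner product of their transformed latent vectors, the resulting probability matrix is symmetric, which matches the undirected links being reconstructed.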
Enforcing DP.
The probabilistic nature of the latent variable allows the model to be generative, meaning that after training the model with an input graph, we can detach and disregard the encoder, and then freely generate an unlimited number of graphs with links similar to those of the input graph, by solely drawing random samples of the latent variable from the prior distribution and computing the reconstructed adjacency matrix with the learned decoder network w.r.t. Eq. 2. However, as shown in kurakin2016adversarial ; gondim2018adversarial , powerful neural network models like VAE can easily overfit training data, so directly releasing a trained GraphVAE model poses potential privacy threats, as links in its generated graphs may be highly indicative of links in the training graphs.
In this work, we care about rigorously protecting the privacy of individual links in the training data, i.e., ensuring edge-DP. Particularly, in Definition 1, the inequality guarantees that the distinguishability of any one edge in the graph is restricted to a privacy leak level proportional to $\varepsilon$, while $\delta$ relaxes the constraint for outlier nodes existing in the graph. The two parameters together quantify the absolute amount of private information that can possibly be leaked by the mechanism, i.e., a graph generation model.
According to Eq. 2, GraphVAE essentially takes a graph as input and generates a new graph of the same size as output by reconstructing links among the same set of nodes. Therefore, if we regard GraphVAE as the mechanism, as long as its model parameters are properly randomized, the framework satisfies edge-DP. Particularly, any two input graphs differing by at most one edge in principle lead to similar generated graphs, so information in a generated graph does not confidently reveal the existence or absence of any particular link in either input graph. To exploit the well-structured graph generation framework of GraphVAE, we leverage the technique of the Gaussian mechanism to enforce edge-DP on it.
Theorem 1 (Gaussian Mechanism dwork2014algorithmic )
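If the $\ell_2$-norm sensitivity of a deterministic function $f$ is $\Delta_2 f = \max_{G_1, G_2} \| f(G_1) - f(G_2) \|_2$ over neighboring inputs $G_1, G_2$, we have:
(4)  $\mathcal{M}_f(G) \triangleq f(G) + \mathcal{N}(0, (\Delta_2 f)^2 \sigma^2)$
where $\mathcal{N}(0, (\Delta_2 f)^2 \sigma^2)$ is a random variable obeying the Gaussian distribution with mean 0 and standard deviation $\Delta_2 f \cdot \sigma$. The randomized mechanism $\mathcal{M}_f(G)$ is $(\varepsilon, \delta)$ differentially private if $\sigma \geq \sqrt{2 \ln(1.25 / \delta)} / \varepsilon$ and $\varepsilon < 1$.
In our setting, $G$ is the original training graph. Then Eq. 4 tells us that a link reconstruction based graph generation model can be randomized to ensure edge-DP with properly parameterized Gaussian noise. Therefore, we leverage Theorem 1 by perturbing the gradient optimization of GraphVAE. In particular, we follow frigerio2019differentially to inject designed Gaussian noise into the gradients of our decoder network, clipped by a hyperparameter $C$, as follows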
(5)  $\tilde{g}(v_i) = g(v_i) / \max(1, \| g(v_i) \|_2 / C) + \mathcal{N}(0, \sigma^2 C^2 I)$
where $g(v_i)$ is the original gradient of the decoder network on node $v_i$, $C$ is the clipping hyperparameter required to bound the influence of each individual node, and $\sigma$ is the noise scale hyperparameter. This method is known as DPSGD Abadi:2016:DLD:2976749.2978318 . According to Theorem 1 and the analysis in Abadi:2016:DLD:2976749.2978318 ; frigerio2019differentially , a model trained with such distorted gradients is guaranteed to be DP.
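The gradient distortion can be sketched as follows, assuming the standard DPSGD recipe of Abadi:2016:DLD:2976749.2978318 : each per-node gradient is rescaled so that its L2 norm is at most the clipping bound, the clipped gradients are summed, Gaussian noise with standard deviation equal to the noise scale times the clipping bound is added once, and the result is averaged. The function names and toy gradients are hypothetical, and the exact placement of the noise in the authors' implementation may differ.

```python
import numpy as np


def clip_gradient(g, C):
    """Scale g down so that its L2 norm is at most C (no change if already smaller)."""
    return g / max(1.0, float(np.linalg.norm(g)) / C)


def noisy_gradient(per_node_grads, C, sigma, rng):
    """Average of clipped per-node gradients with Gaussian noise of std sigma * C."""
    clipped = np.stack([clip_gradient(g, C) for g in per_node_grads])
    noise = rng.normal(0.0, sigma * C, size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_node_grads)


rng = np.random.default_rng(0)
grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]  # toy per-node gradients
clipped_only = noisy_gradient(grads, C=1.0, sigma=0.0, rng=rng)  # sigma=0 isolates clipping
distorted = noisy_gradient(grads, C=1.0, sigma=5.0, rng=rng)
```

Clipping is what bounds the sensitivity of the summed gradient to any single node by the clipping bound, which is the quantity the Gaussian noise in Theorem 1 has to be calibrated against.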
Since GraphVAE is trained in iterations, to guarantee $(\varepsilon, \delta)$-DP in the whole training process, we leverage the moments accountant mechanism proposed in Abadi:2016:DLD:2976749.2978318 . Particularly, according to the composability property of the moments accountant, we can accurately bound the total privacy loss of GraphVAE by setting the degree of perturbation (noise scale) at each training iteration as
(6)  $\sigma = 2 \frac{q}{\varepsilon} \sqrt{T \log(1 / \delta)}$
where $T$ is the number of training iterations, and $q$ the sampling ratio. We term this model DPGVae.
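The relation between the privacy budget and the number of training iterations can be sketched numerically. The sketch assumes that Eq. 6 has the form used in frigerio2019differentially , i.e., the noise scale equals 2 (q / epsilon) sqrt(T log(1 / delta)), and solves it for the largest number of iterations T allowed by a fixed noise scale. The value delta = 1e-5 below is a hypothetical placeholder, not a setting taken from this work.

```python
import math


def noise_scale(q, T, eps, delta):
    """Noise scale required by the assumed form of Eq. 6."""
    return 2.0 * q * math.sqrt(T * math.log(1.0 / delta)) / eps


def max_iterations(sigma, q, eps, delta):
    """Largest T whose required noise scale does not exceed the fixed sigma."""
    return math.floor((sigma * eps / (2.0 * q)) ** 2 / math.log(1.0 / delta))


# Fixed noise scale and sampling ratio, varying privacy budget.
budgets = [0.1, 1.0, 10.0]
iterations = {eps: max_iterations(sigma=5.0, q=0.01, eps=eps, delta=1e-5) for eps in budgets}
```

Under this assumed form, the admissible number of iterations grows quadratically with the privacy budget, so a tight budget forces training to stop early.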
In the generation stage, we can disregard the encoder and only use the decoder to generate an unlimited number of graphs from vectors randomly sampled from the prior distribution. Since the normal Gaussian distribution is privacy irrelevant, it can be regarded as DP. By the composability property of DP dwork2014algorithmic , graphs generated by DPGVae then satisfy DP. In particular, according to Eq. 2, since the decoder network of GraphVAE is essentially generating links, the system is edge-DP, the release of which in principle does not disclose sensitive information regarding individual links in the original sensitive networks.
Note that, although the encoder network also directly touches sensitive data, according to Eq. 1, the gradients are already mixed with the randomness of samples from the Gaussian prior before reaching the decoder network, so we do not need to add noise to it. Through this design, we can improve the training of the decoder network under a limited privacy budget, with minimum interruptions to the encoder network, while guaranteeing the whole generation process to be edge-DP.
Improving structure learning.
Besides individual link privacy, we also aim to preserve global network structure so as to ensure the utility of released data. As discussed above, the original GraphVAE computes the reconstruction loss between input and generated graphs based on the element-wise BCE between their adjacency matrices. Such a computation is specified on each individual link, rather than on the structure of the graph as a whole. To improve the learning of global graph structure, we leverage GCN again, which has been shown to be universally powerful in capturing graph-level structures maron2019invariant ; xu2018powerful ; chen2019equivalence ; keriven2019universal . In particular, we borrow the framework of VAEGAN from recent research gu2018dialogwae ; larsen2016autoencoding ; yang2019conditional , and compute a structure-oriented generative adversarial network (GAN) loss as
(7) 
where the newly introduced GCN and FNN networks are defined similarly as before, except that at the end of the GCN the node-level representations are element-wise summed up as the graph-level representation, which resembles the recently proposed GIN model for graph-level representation learning xu2018powerful . In the VAEGAN framework, the decoder also serves as the generator, while the new GCN followed by the FNN forms the discriminator. The intuition behind this technique is that the GCN encodings of the input and generated graphs capture their graph structures, so a reconstruction loss computed on these encodings captures the intrinsic structural difference between the two graphs instead of the simple sum of the differences over their individual links. Note that the effectiveness of our structure-oriented discriminator is critical not only because it can directly enforce better structure learning of the link-based generator through the minimax game in Eq. 7, but also because it can learn to relax the penalty on certain individual links, through flexible and diverse configurations of the whole graph as long as the global structures remain similar, which exactly fulfills our goals of secure network release. The benefits of such diversity enabled by the VAEGAN have also been discussed in cases like image generation gu2018dialogwae ; larsen2016autoencoding .
Following gu2018dialogwae , the encoder, the decoder (generator), and the discriminator are each trained w.r.t. their own combination of the prior, reconstruction and GAN losses, balanced by two loss weighting hyperparameters. To enforce DP constraints and complete our proposed DPGGen framework, Eq. 5 is applied to distort the gradients of the discriminator and guarantee the generator to be edge-DP, which can be used to securely generate networks with the other parts disregarded after training. The overall framework of DPGGen is shown in Figure 2 and more training details are put in the Appendix.
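The structure-oriented discriminator can be sketched as a two-layer GCN whose node-level outputs are summed into a graph-level vector, followed by a small FNN that scores the graph. The sketch is untrained and rests on assumptions: the weight names and sizes are hypothetical, and the objective shown is the standard GAN discriminator objective, log D(real) + log(1 - D(generated)), used here as a stand-in since the exact form of Eq. 7 is not reproduced.

```python
import numpy as np


def normalize_adjacency(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; isolated nodes keep zero rows."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    mask = deg > 0
    inv_sqrt[mask] = deg[mask] ** -0.5
    return A * inv_sqrt[:, None] * inv_sqrt[None, :]


def discriminator(A, params):
    """Return (probability that A is a real graph, graph-level representation)."""
    A_norm = normalize_adjacency(A)
    X = np.eye(A_norm.shape[0])                                   # one-hot node features
    H = A_norm @ np.maximum(A_norm @ X @ params["W0"], 0.0) @ params["W1"]
    h_graph = H.sum(axis=0)                                       # sum readout over nodes
    logit = np.maximum(h_graph @ params["U0"], 0.0) @ params["U1"]
    return 1.0 / (1.0 + np.exp(-logit)), h_graph


def gan_objective(d_real, d_fake, eps=1e-9):
    """Standard GAN value: maximized by the discriminator, minimized by the generator."""
    return float(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))


rng = np.random.default_rng(0)
n, hidden, out = 4, 8, 4
params = {"W0": 0.1 * rng.standard_normal((n, hidden)),
          "W1": 0.1 * rng.standard_normal((hidden, out)),
          "U0": 0.1 * rng.standard_normal((out, hidden)),
          "U1": 0.1 * rng.standard_normal(hidden)}
A_real = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])  # path graph
A_fake = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]])  # star graph
d_real, h_real = discriminator(A_real, params)
d_fake, h_fake = discriminator(A_fake, params)
value = gan_objective(d_real, d_fake)
structural_gap = float(np.sum((h_real - h_fake) ** 2))  # distance between graph encodings
```

The last line illustrates the idea of comparing two graphs through their graph-level encodings rather than link by link: two graphs with the same number of edges but different shapes still receive different encodings.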
4 Experimental Evaluations
We conduct two sets of experiments to evaluate the effectiveness of DPGGen in preserving global network structure and protecting individual link privacy. All code and data will be made public upon the acceptance of this work.
Experimental settings.
To provide a side-by-side comparison between the original networks and generated networks, we use two standard datasets of real-world networks, i.e., DBLP and IMDB. DBLP includes 72 networks of author nodes and co-author links, where the average numbers of nodes and links are 177.2 and 258; IMDB includes 1500 networks of actor/actress nodes and co-star links, with average node and link numbers of 13 and 65.9. The DBLP networks are constructed based on research communities, whereas the IMDB networks are based on movie genres.
To show that DPGGen effectively captures global network structure, we compare it against DPGVae under different privacy budgets (controlled by $\varepsilon$ in Eq. 6), as well as the original GraphVAE model kipf2016variational , regarding a suite of graph statistics commonly used to evaluate the performance of graph generation models, especially from a global perspective bojchevski2018netgan ; you2018graphrnn ; yang2019conditional . (Footnote 1: Statistics we use include LCC (size of largest connected component), TC (triangle count), CPL (characteristic path length), GINI (Gini index), and REDE (relative edge distribution entropy).) Specifically, we train all compared models from scratch to convergence as many times as there are networks in each dataset. Each time, the trained models are used to generate one network, which is compared with the original network regarding the suite of graph statistics. Then we average the absolute differences between the generated networks and the original networks, which ensures that positive and negative differences do not cancel out.
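The comparison protocol can be sketched with plain Python. The block computes four of the listed statistics (LCC, TC, CPL and the Gini index of the degree distribution) and averages the absolute differences over pairs of original and generated graphs. REDE is omitted, CPL is computed over the largest connected component, and the toy graphs and function names are hypothetical; the exact definitions used by the authors may differ in detail.

```python
from collections import deque


def adjacency(n, edges):
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def bfs_distances(adj, source):
    dist, queue = {source: 0}, deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def largest_component(adj):
    seen, best = set(), []
    for s in range(len(adj)):
        if s not in seen:
            comp = list(bfs_distances(adj, s))
            seen.update(comp)
            if len(comp) > len(best):
                best = comp
    return best


def triangle_count(adj):
    # Count each triangle once by enforcing u < v < w.
    return sum(1 for u in range(len(adj)) for v in adj[u] if v > u
               for w in adj[u] & adj[v] if w > v)


def characteristic_path_length(adj):
    nodes = largest_component(adj)
    total = sum(sum(bfs_distances(adj, s).values()) for s in nodes)
    pairs = len(nodes) * (len(nodes) - 1)  # ordered pairs inside the component
    return total / pairs if pairs else 0.0


def gini(adj):
    degs = sorted(len(nbrs) for nbrs in adj)
    n, s = len(degs), sum(degs)
    if s == 0:
        return 0.0
    return 2.0 * sum((i + 1) * d for i, d in enumerate(degs)) / (n * s) - (n + 1) / n


def graph_stats(n, edges):
    adj = adjacency(n, edges)
    return {"LCC": len(largest_component(adj)), "TC": triangle_count(adj),
            "CPL": characteristic_path_length(adj), "GINI": gini(adj)}


def mean_abs_diff(stat_pairs):
    """Average |original - generated| per statistic over all graph pairs."""
    keys = stat_pairs[0][0].keys()
    return {k: sum(abs(o[k] - g[k]) for o, g in stat_pairs) / len(stat_pairs) for k in keys}


original = graph_stats(5, [(0, 1), (1, 2), (0, 2), (2, 3)])   # triangle + pendant + isolate
generated = graph_stats(5, [(0, 1), (1, 2), (2, 3), (3, 4)])  # path on five nodes
diff = mean_abs_diff([(original, generated)])
```

Averaging absolute rather than signed differences means that a model overshooting a statistic on one graph cannot hide an undershoot on another.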
To facilitate a better understanding of how the graph statistics reflect the global network structure captured by the models, we also provide results of two recent state-of-the-art network generation methods, i.e., NetGAN bojchevski2018netgan and GraphRNN you2018graphrnn , with default parameter settings and no DP constraints at all. In this experiment, we expect to see the more effective structure-preserving models generate networks that are more similar to the original ones regarding various graph statistics, thus maintaining high network data utility. Besides similarity on graph statistics, we further evaluate the utility of generated graphs against the original ones on the downstream task of graph classification, of which the details and results are put in the Appendix due to space limitations.
To show that DPGGen effectively guarantees individual link privacy, we train all models the same number of times again on each dataset. Differently from the previous setting where the complete networks are used, we randomly sample 80% of the links from the original networks to train the models. After generating the full networks from the trained models, we use degree distribution to align the nodes in the generated networks with those in the original networks. Then we evaluate the standard AUC metric on the task of individual link prediction (footnote 2: https://github.com/graphstarteam/graph_star) by comparing links predicted in the generated networks and links hidden during training in the original networks. In this experiment, we expect to see the more effective privacy-protecting models generate networks that are less useful when used to predict individual links in the original networks, thus rigorously guaranteeing network data privacy.
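The privacy evaluation can be sketched as follows: hide 20% of the links, score node pairs using only the released network, and compute the AUC of separating hidden links from true non-links. The scoring dictionaries below are hypothetical stand-ins for whatever link predictor is run on a generated network, and sampling non-links as negatives is an assumption of the usual AUC protocol rather than a detail stated in the text.

```python
import random
from itertools import combinations


def split_links(edges, train_ratio=0.8, seed=0):
    """Randomly keep train_ratio of the links for training and hide the rest."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    cut = int(round(train_ratio * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def auc(pos_scores, neg_scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


def link_prediction_auc(scores, hidden_links, non_links, default=0.0):
    """AUC of pair scores, with hidden links as positives and non-links as negatives."""
    pos = [scores.get(frozenset(e), default) for e in hidden_links]
    neg = [scores.get(frozenset(e), default) for e in non_links]
    return auc(pos, neg)


nodes = range(6)
all_pairs = list(combinations(nodes, 2))
edges = all_pairs[:10]          # toy original network with 10 links
non_links = all_pairs[10:]      # the remaining 5 node pairs
train_links, hidden_links = split_links(edges)

# A released network that leaks the hidden links versus one that reveals nothing.
leaky_scores = {frozenset(e): 0.9 for e in hidden_links}
leaky_scores.update({frozenset(e): 0.1 for e in non_links})
uninformative_scores = {frozenset(e): 0.5 for e in all_pairs}
auc_leaky = link_prediction_auc(leaky_scores, hidden_links, non_links)
auc_private = link_prediction_auc(uninformative_scores, hidden_links, non_links)
```

An AUC close to 0.5 on the hidden links is the desired outcome for a privacy-protecting generator, since it means the released network is no better than chance at revealing them.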
For GraphVAE and our models, we use two-layer GCNs with sizes for both and of the encoder network, where the first layer is shared, and we use two-layer FNNs with sizes for of the decoder (generator) network. For DPGGen, we use another two-layer GCN with the same sizes for and a three-layer FNN with sizes for . For DP-related hyperparameters, we follow existing works dwork2014algorithmic ; Abadi:2016:DLD:2976749.2978318 ; ShokriS15PPDL to fix to , noise scale to 5, and sampling ratio to 0.01 (which determines the batch size as with as the graph size). Then we vary the privacy budget from 0.1 to 10 to see how much graph-level utility is preserved under different privacy budgets. According to Eq. 6, we terminate the training of DPGGen at iterations when the privacy budget is depleted. Other than the essential parameters in Eq. 6, we empirically set the clipping parameter to , decay ratio to , learning rate to , and the loss weighting parameters and both to 0.1. We do not observe the model to be very sensitive to the setting of these non-essential parameters.
All experiments are done on a server with four GeForce GTX 1080 GPUs and a 12-core 2.2GHz CPU. The training time of DP-enforced models is often slightly shorter due to early stops when the privacy budget runs out (e.g., a typical training run of GraphVAE, DPGVae, and DPGGen takes 60, 42 and 53 seconds on average on DBLP, respectively). After training, the generation times of the three models are roughly the same (e.g., 0.02 seconds on average on DBLP). As a direct comparison, the state-of-the-art deep network generation models NetGAN and GraphRNN take longer under the same settings, especially for generation (e.g., 89 and 4.5 seconds for NetGAN to train and generate on DBLP, and 75 and 2.4 seconds for GraphRNN). Note that, although efficiency is not our major concern in this work, short runtimes (especially for generation) are favorable for efficient data sharing.
                  | DBLP Networks                            | IMDB Networks
Models            | LCC   | TC     | CPL    | GINI   | REDE   | LCC    | TC    | CPL    | GINI   | REDE
Original          | 107.5 | 59.90  | 3.6943 | 0.3248 | 0.9385 | 13.001 | 305.9 | 1.2275 | 0.1222 | 0.9894
GraphVAE (no DP)  | 7.51  | 66.93  | 0.1330 | 0.0213 | 0.0084 | 0.0145 | 25.83 | 0.0121 | 0.0030 | 0.0016
NetGAN (no DP)    | 9.66  | 39.87  | 0.1943 | 0.0105 | 0.0022 | 0.0083 | 27.54 | 0.0192 | 0.0042 | 0.0011
GraphRNN (no DP)  | 10.27 | 57.43  | 0.2043 | 0.0415 | 0.0052 | 0.0594 | 27.26 | 0.0214 | 0.0155 | 0.0094
DPGVae (ε=10)     | 21.96 | 175.29 | 0.2471 | 0.0339 | 0.0153 | 0.0147 | 43.63 | 0.0367 | 0.0036 | 0.0030
DPGVae (ε=1)      | 23.80 | 187.20 | 0.3059 | 0.0343 | 0.0156 | 0.0253 | 43.73 | 0.0373 | 0.0038 | 0.0031
DPGVae (ε=0.1)    | 26.07 | 215.13 | 0.3342 | 0.0344 | 0.0158 | 0.0320 | 44.12 | 0.0392 | 0.0042 | 0.0032
DPGGen (ε=10)     | 10.61 | 64.75  | 0.2035 | 0.0224 | 0.0093 | 0.0040 | 22.89 | 0.0164 | 0.0010 | 0.0017
DPGGen (ε=1)      | 12.38 | 70.97  | 0.2643 | 0.0353 | 0.0117 | 0.0053 | 23.81 | 0.0168 | 0.0029 | 0.0023
DPGGen (ε=0.1)    | 24.62 | 77.41  | 0.2713 | 0.0485 | 0.0191 | 0.0113 | 24.91 | 0.0168 | 0.0029 | 0.0025
Performances.
In Table 1, our strictly DP-constrained models consistently yield highly competitive and sometimes even better results compared with the strongest DP-free baselines regarding the global network structural similarity between generated and original networks on both datasets. As we gradually increase the privacy budget ε, our two models (especially DPGGen) perform better. The performance gaps are more significant in the poorer conditions, i.e., on DBLP and regarding the more community-sensitive metrics like LCC and REDE. Such results support the advantages of DPGGen in capturing global network structure and the effectiveness of our privacy constraints.
Looking deeper into the numbers, we observe that DPGGen consistently achieves significantly better performance than DPGVae under the same privacy budgets on both datasets (scores all passed t-tests with p-value 0.01), which corroborates our novel model designs. Moreover, the suite of statistics measures global network structure from different perspectives. As can be inferred from TC, CPL and GINI, the IMDB networks are in general smaller, tighter and likely more structurally complex than the DBLP networks, which favors link generation models (e.g., GraphVAE) over sequence generation models (e.g., NetGAN and GraphRNN), especially regarding the more structure-sensitive measures like TC and CPL. Consequently, our DPGVae and DPGGen models also perform better on the IMDB networks, indicating their advantages in modeling complex link structures.
As shown in Figure 3, for both datasets, links predicted on the networks generated by DPGGen are much less accurate than those predicted on the original networks (26%-35% and 15%-20% AUC drops on DBLP and IMDB, respectively) as well as on the networks generated by all baselines. This means that even if attackers somehow identify nodes in the generated (released) networks, they cannot leverage the information there to accurately infer the existence or absence of links between particular pairs of nodes in the original networks. This directly corroborates our claim that DPGGen is effective in protecting individual link privacy.
To conduct more detailed inspections, we vary two of the major hyperparameters, i.e., the privacy budget and the sampling ratio. Consistent with the results in Table 1, larger privacy budgets lead to more privacy leakage, which allows attackers to infer individual links in the original networks with higher accuracy. While some DP-constrained deep learning models are observed to be sensitive to the sampling ratio during training Abadi:2016:DLD:2976749.2978318 ; ShokriS15PPDL , the privacy protection utility of DPGGen is robust when the sampling ratio is changed over large ranges in practice.
5 Conclusion
Due to the recent development of deep graph generation models, synthetic networks are often generated and released without concern about possible privacy leakage over the original networks used for model training. In this work, for the first time, we pay attention to the task of secure network release and formulate its goals as preserving global network structure while protecting individual link privacy. Subsequently, we adopt the well-studied DP framework and develop DPGGen, which protects individual link privacy by enforcing edge-DP over the link prediction based graph generation model of GraphVAE while preserving global network structure through adversarial learning with a structure-oriented graph discriminator. Comprehensive experiments show that DPGGen is advantageous in generating networks that are globally similar to the original ones (thus effectively maintaining network data utility), and at the same time useless for predicting individual links in the original network (thus rigorously protecting network data privacy).
References
 [1] Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In SIGSAC, 2016.
 [2] Lars Backstrom, Cynthia Dwork, and Jon Kleinberg. Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography. In WWW, 2007.
 [3] Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
 [4] Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. The Johnson-Lindenstrauss transform itself preserves differential privacy. In FOCS, 2012.
 [5] Avrim Blum, Katrina Ligett, and Aaron Roth. A learning theory approach to noninteractive database privacy. JACM, 2013.
 [6] Aleksandar Bojchevski, Oleksandr Shchur, Daniel Zügner, and Stephan Günnemann. NetGAN: Generating graphs via random walks. In ICML, 2018.
 [7] Digvijay Boob, Rachel Cummings, Dhamma Kimpara, Uthaipon (Tao) Tantipongpipat, Chris Waites, and Kyle Zimmerman. Private synthetic data generation via GANs. arXiv preprint arXiv:1803.03148, 2018.
 [8] Z. Cai, Z. He, X. Guan, and Y. Li. Collective data-sanitization for preventing sensitive information inference attacks in social networks. TDSC, 2018.
 [9] Qingrong Chen, Chong Xiang, Minhui Xue, Bo Li, Nikita Borisov, Dali Kaafar, and Haojin Zhu. Differentially private data generative models. arXiv preprint arXiv:1812.02274, 2018.
 [10] Zhengdao Chen, Soledad Villar, Lei Chen, and Joan Bruna. On the equivalence between graph isomorphism testing and function approximation with GNNs. In NIPS, 2019.
 [11] Quanyu Dai, Qiang Li, Jian Tang, and Dan Wang. Adversarial network embedding. In AAAI, 2018.
 [12] Nicola De Cao and Thomas Kipf. MolGAN: An implicit generative model for small molecular graphs. arXiv preprint arXiv:1805.11973, 2018.
 [13] Kun Dong, Austin R Benson, and David Bindel. Network density of states. KDD, 2019.
 [14] Cynthia Dwork, Aaron Roth, et al. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407, 2014.
 [15] Matej Balog, Ilya Tolstikhin, and Bernhard Schölkopf. Differentially private database release via kernel mean embeddings. In ICML, 2018.
 [16] Pál Erdős and Alfréd Rényi. On the evolution of random graphs. Publ. Math. Inst. Hungar. Acad. Sci, 5:17–61, 1960.
 [17] TS Evans and Renaud Lambiotte. Line graphs, link partitions, and overlapping communities. Physical Review E, 80(1):016105, 2009.
 [18] Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. In SIGSAC, 2015.
 [19] Lorenzo Frigerio, Anderson Santana de Oliveira, Laurent Gomez, and Patrick Duverger. Differentially private generative adversarial networks for time series, continuous, and discrete open data. In IFIP SEC, 2019.
 [20] George Gondim-Ribeiro, Pedro Tabacof, and Eduardo Valle. Adversarial attacks on variational autoencoders. arXiv preprint arXiv:1806.04646, 2018.
 [21] Aditya Grover, Aaron Zweig, and Stefano Ermon. Graphite: Iterative generative modeling of graphs. arXiv preprint arXiv:1803.10459, 2018.
 [22] Xiaodong Gu, Kyunghyun Cho, Jungwoo Ha, and Sunghun Kim. DialogWAE: Multimodal response generation with conditional Wasserstein autoencoder. In ICLR, 2019.
 [23] David Hallac, Youngsuk Park, Stephen Boyd, and Jure Leskovec. Network inference via the time-varying graphical lasso. In KDD, 2017.
 [24] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In NIPS, 2017.
 [25] Wengong Jin, Regina Barzilay, and Tommi Jaakkola. Junction tree variational autoencoder for molecular graph generation. In ICML, 2018.
 [26] Shiva Prasad Kasiviswanathan, Kobbi Nissim, Sofya Raskhodnikova, and Adam D. Smith. Analyzing graphs with node differential privacy. In TCC, 2013.
 [27] Nicolas Keriven and Gabriel Peyré. Universal invariant and equivariant graph neural networks. In NIPS, 2019.
 [28] Thomas N Kipf and Max Welling. Variational graph autoencoders. In NIPS Workshop on Bayesian Deep Learning, 2016.
 [29] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.
 [30] Kristine M Kuhn. Compensation as a signal of organizational culture: the effects of advertising individual or collective incentives. IJHRM, 20(7):1634–1648, 2009.
 [31] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. ICLR, 2017.
 [32] Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, and Ole Winther. Autoencoding beyond pixels using a learned similarity metric. In ICML, 2016.
 [33] Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. Certified robustness to adversarial examples with differential privacy. arXiv preprint arXiv:1802.03471, 2018.
 [34] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.
 [35] Yujia Li, Oriol Vinyals, Chris Dyer, Razvan Pascanu, and Peter Battaglia. Learning deep generative models of graphs. In ICML, 2018.
 [36] Tengfei Ma, Jie Chen, and Cao Xiao. Constrained generation of semantically valid graphs via regularizing variational autoencoders. In NIPS, 2018.
 [37] Haggai Maron, Heli Ben-Hamu, Nadav Shamir, and Yaron Lipman. Invariant and equivariant graph networks. In ICLR, 2019.
 [38] Noman Mohammed, Rui Chen, Benjamin C. M. Fung, and Philip S. Yu. Differentially private data release for data mining. In KDD, 2011.
 [39] Arvind Narayanan and Vitaly Shmatikov. Robust de-anonymization of large datasets (how to break anonymity of the Netflix prize dataset). SP, 2008.
 [40] Arvind Narayanan and Vitaly Shmatikov. De-anonymizing social networks. SP, 2009.
 [41] Mark EJ Newman. Clustering and preferential attachment in growing networks. Physical Review E, 64(2):025102, 2001.
 [42] Nicolas Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Úlfar Erlingsson. Scalable private learning with PATE. In ICLR, 2018.
 [43] Alessandra Sala, Xiaohan Zhao, Christo Wilson, Haitao Zheng, and Ben Y. Zhao. Sharing graphs using differentially private graph models. In SIGCOMM, 2011.
 [44] Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Gallagher, and Tina Eliassi-Rad. Collective classification in network data. AI Magazine, 2008.
 [45] Reza Shokri and Vitaly Shmatikov. Privacy-preserving deep learning. In SIGSAC, 2015.
 [46] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In ISP, 2017.
 [47] Börkur Sigurbjörnsson and Roelof Van Zwol. Flickr tag recommendation based on collective knowledge. In WWW, 2008.
 [48] Martin Simonovsky and Nikos Komodakis. GraphVAE: Towards generation of small graphs using variational autoencoders. In ICANN, 2018.
 [49] Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. ArnetMiner: extraction and mining of academic social networks. In KDD, 2008.
 [50] Aleksei Triastcyn and Boi Faltings. Generating artificial data for private deep learning. AAAI, 2018.
 [51] Yue Wang and Xintao Wu. Preserving differential privacy in degree-correlation based graph generation. TDP, 6(2):127–145, 2013.
 [52] Larry Wasserman and Shuheng Zhou. A statistical framework for differential privacy. JASA, 105(489):375–389, 2010.
 [53] Duncan J Watts and Steven H Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393(6684):440, 1998.
 [54] Liyang Xie, Kaixiang Lin, Shu Wang, Fei Wang, and Jiayu Zhou. Differentially private generative adversarial network. arXiv preprint arXiv:1802.06739, 2018.
 [55] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In ICLR, 2019.

 [56] Carl Yang, Lanxiao Bai, Chao Zhang, Quan Yuan, and Jiawei Han. Bridging collaborative filtering and semi-supervised learning: a neural approach for POI recommendation. In KDD, 2017.
 [57] Carl Yang, Xiaolin Shi, Luo Jie, and Jiawei Han. I know you’ll be back: Interpretable new user clustering and churn prediction on a mobile social application. In KDD, 2018.
 [58] Carl Yang, Peiye Zhuang, Wenhan Shi, Alan Luu, and Pan Li. Conditional structure generation through graph variational generative adversarial nets. In NIPS, 2019.
 [59] Jiaxuan You, Bowen Liu, Zhitao Ying, Vijay Pande, and Jure Leskovec. Graph convolutional policy network for goal-directed molecular graph generation. In NIPS, 2018.
 [60] Jiaxuan You, Rex Ying, Xiang Ren, William L Hamilton, and Jure Leskovec. GraphRNN: Generating realistic graphs with deep autoregressive models. In ICML, 2018.
 [61] Aston Zhang, Xing Xie, Kevin Chen-Chuan Chang, Carl A Gunter, Jiawei Han, and XiaoFeng Wang. Privacy risk in anonymized heterogeneous information networks. In EDBT, 2014.
 [62] Xinyang Zhang, Shouling Ji, and Ting Wang. Differentially private releasing via deep generative model (technical report). arXiv preprint arXiv:1801.01594, 2018.
 [63] Yanjun Zhang, Xin Zhao, Xue Li, Mingyang Zhong, Caitlin Curtis, and Chen Chen. Enabling privacy-preserving sharing of genomic data for GWASs in decentralized networks. In WSDM, 2019.
 [64] Dongmian Zou and Gilad Lerman. Encoding robust representation for graph generation. arXiv preprint arXiv:1809.10851, 2018.
 [65] Daniel Zügner, Amir Akbarnejad, and Stephan Günnemann. Adversarial attacks on neural networks for graph data. In KDD, 2018.