Privacy is important for everyone. The recent exposure of the Facebook user data leakage has raised public privacy awareness and drawn the attention of researchers. On the one hand, companies need data to build models for their services, which requires features learned from the data. On the other hand, analyzing the data can reveal private information, which an adversary can further infer through attacks such as the linkage attack. To tackle this dilemma between privacy and data usage, differential privacy was proposed [1]. Differential privacy is a widely adopted framework for quantifying the privacy guarantees of privacy-preserving data analysis algorithms, and it has been used extensively in machine learning and data analysis for designing such algorithms. However, besides conventional static data privacy, privacy concerns also arise in many other scenarios. For example, people wish to remain anonymous in the online world, especially when they publish sensitive information. Private information is often shared by different organizations during collaboration and needs to remain anonymous to others. This kind of privacy, which concerns identity during the information distribution process, is referred to as information source privacy and has been recognized and investigated recently.
Information source privacy is studied in [2] using the differential privacy framework for the complete network. The information distribution model used in that work is gossip protocols. Gossip protocols, being simple and efficient, are commonly adopted in distributed applications for information exchange. In machine learning, gossip protocols are often combined with stochastic gradient descent to implement distributed machine/deep learning [3, 4, 5]. Gossip protocols are highly robust and adaptive; therefore, they are often adopted in dynamic networks such as mobile networks, wireless sensor networks, and unstructured P2P networks [6, 7, 8]. They are also used to model information spreading in social networks with direct communications. The speed of the gossip-based information spreading process is studied for general networks and multiplex networks in [9, 10]. Therefore, it is crucial to understand the limit of the privacy guarantee of gossip protocols.
The differential privacy of gossip protocols in the complete network is investigated in [2]. However, the complete network assumption is unrealistic. In this work, we generalize the study of quantifying the source privacy of push-based gossip protocols using differential privacy to general networks. This work makes three major contributions:
A lower bound of the differential privacy guarantees of general gossip protocols is estimated for general networks in both synchronous and asynchronous settings. The prediction uncertainty of the source node given a uniform prior is also calculated.
The differential privacy of standard gossip and private gossip protocols is studied in wireless networks, where communications are assumed to be unreliable. It is found that noise can enhance the differential privacy while it also slows down the spreading process. Through calculating and simulating the spreading times in networks with unreliable communications, the tradeoff between the differential privacy and spreading speed is observed.
Finally, the delayed monitoring, a more realistic monitoring model, is studied for general networks. The effect of the additional uncertainty brought by the delay on the differential privacy is shown.
The remainder of this paper is organized as follows. Section II presents the formulation of differential privacy of gossip protocols. Section III presents the system model. All theoretical and simulation results are presented in Section IV. Finally, Section V concludes this paper.
II Problem Formulation
In this work, we consider the privacy of the information source in gossip-based single-piece information spreading. The goal is to measure the capability of gossip protocols in keeping the information source anonymous. Specifically, given a connected network $G = (V, E)$, where $V$ is the node set and $E$ is the edge set connecting these nodes, a node (the source) initially possesses a piece of information and needs to deliver it to all other nodes in the network. Meanwhile, there are some attackers that can observe some activities and try to identify the source based on their observations. Differential privacy is a common framework for measuring and protecting privacy, and is defined as follows:
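A randomized algorithm $\mathcal{M}$ with domain $\mathbb{N}^{|\mathcal{X}|}$ is $(\epsilon, \delta)$-differentially private if for all $\mathcal{S} \subseteq \mathrm{Range}(\mathcal{M})$ and for any two databases $x, y \in \mathbb{N}^{|\mathcal{X}|}$ such that $\|x - y\|_1 \le 1$ (i.e., at most one row of difference in data):

$$\Pr[\mathcal{M}(x) \in \mathcal{S}] \le e^{\epsilon} \Pr[\mathcal{M}(y) \in \mathcal{S}] + \delta. \tag{1}$$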
Consider a source indicator vector $I^{(i)} \in \{0, 1\}^{|V|}$ with exactly one nonzero entry, $I^{(i)}_i = 1$, if node $i$ is the source. Given $\mathcal{I} = \{I^{(i)} : i \in V\}$ (a set of source indicator vector databases in which each database is a single-column indicator vector with rows indexed by node ID) and the graph $G$ as inputs, a gossip protocol can be viewed as the randomized algorithm $\mathcal{M}$, and the output consists of all possible observations $S$ by the attacker. To measure the capability of gossip protocols in protecting the source's identity, differential privacy of the source indicator database can be similarly defined as:

Given a graph $G$, a gossip protocol is $(\epsilon, \delta)$-differentially private in $G$ if for any possible observation $S$ and any two source indicator vectors $I^{(i)}, I^{(j)} \in \mathcal{I}$:

$$p^{(G)}(S \mid I^{(i)}) \le e^{\epsilon}\, p^{(G)}(S \mid I^{(j)}) + \delta, \tag{2}$$

where $p^{(G)}(S \mid I^{(i)})$ is the conditional probability of an observation $S$ given the graph $G$ and the source indicator vector $I^{(i)}$.
In differential privacy, $\epsilon$ is considered the privacy loss, while $\delta$ is the tolerance level. Specifically, given the privacy loss $\epsilon$ and tolerance level $\delta$, Eq. (2) implies that the randomized algorithm guarantees that the privacy loss is bounded by $\epsilon$ with probability at least $1 - \delta$. In this work, due to the topological and observation model constraints, there exist events $S$ whose probability $p^{(G)}(S \mid I^{(j)})$ under source $j$ is zero (e.g., if $S$ is the observed event that node $i$ informs a node at time $0$, then $p^{(G)}(S \mid I^{(j)}) = 0$ for any $j \neq i$). It is meaningless to study the privacy loss alone in this situation, as the privacy loss is infinite. Thus, a certain tolerance is needed, and how well gossip protocols can guarantee a given privacy loss is the main focus of this study.
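To make the role of the tolerance concrete, the following is a minimal sketch, not part of the original analysis, that computes the smallest tolerance for which two finite observation distributions satisfy the differential privacy inequality at a given privacy loss. It relies on the standard fact that the worst-case event collects exactly the outcomes whose probability under one source exceeds $e^{\epsilon}$ times the probability under the other. The function names, the toy outcome labels, and the value `alpha = 0.3` are illustrative assumptions.

```python
import math

def min_delta_one_way(p, q, eps):
    """Smallest delta such that P(S) <= e^eps * Q(S) + delta for every event S.

    p, q: dicts mapping an observation label to its probability.
    """
    outcomes = set(p) | set(q)
    return sum(
        max(0.0, p.get(o, 0.0) - math.exp(eps) * q.get(o, 0.0)) for o in outcomes
    )

def min_delta(p, q, eps):
    """Symmetric version: the inequality must hold in both directions."""
    return max(min_delta_one_way(p, q, eps), min_delta_one_way(q, p, eps))

# Toy example (assumed numbers): an observation that is possible only when
# node i is the source forces delta to be at least its probability.
alpha = 0.3
obs_if_i_is_source = {"i seen first": alpha, "nothing seen": 1 - alpha}
obs_if_j_is_source = {"nothing seen": 1.0}
print(min_delta(obs_if_i_is_source, obs_if_j_is_source, eps=1.0))
```

In this toy case no finite privacy loss can hide the outcome that has zero probability under the other source, which is why the tolerance, rather than the privacy loss, carries the guarantee.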
III System Model
III-A Gossip Protocol
As a common tool used for distributed applications, classic gossip protocols share the same key communication component, which can be performed by any node $i \in I_t$ ($I_t$ is the informed node set at time $t$). Each time a node performs the action, it contacts one of its neighboring nodes with a uniform probability of $1/d_i$ ($d_i$ is the degree of node $i$). After the action finishes, the informed node set is updated if the initiating node has the information and the contacted node does not. Such protocols are also called push protocols in the literature. (In pull protocols, uninformed nodes are active and try to pull the information from informed nodes.) In short, gossip protocols are defined as a class of protocols that (1) update the informed node set through the actions and (2) terminate when all nodes have received the information. The following two specific gossip protocols are considered in this work.
Standard Gossip: All informed nodes remain active during the spreading process.
Private Gossip: Once an active informed node informs a new uninformed node, it turns inactive.
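The push action described above can be sketched for standard gossip in the synchronous setting as follows. This is a minimal illustration under stated assumptions, not the authors' simulator: the graph is given as an adjacency dictionary, the function name and the `max_rounds` safety limit are hypothetical, and nodes informed during a round start pushing only in the next round.

```python
import random

def standard_push_gossip_sync(adj, source, rng, max_rounds=100_000):
    """Return the number of rounds until every node is informed.

    adj: dict mapping each node to the list of its neighbours
         (assumed to describe a connected, undirected graph).
    """
    informed = {source}
    rounds = 0
    while len(informed) < len(adj):
        if rounds >= max_rounds:
            raise RuntimeError("spreading did not finish within max_rounds")
        rounds += 1
        newly_informed = set()
        # Every informed node stays active and pushes once per round.
        for u in sorted(informed):
            v = rng.choice(adj[u])  # uniform neighbour, probability 1/deg(u)
            if v not in informed:
                newly_informed.add(v)
        informed |= newly_informed
    return rounds

# Usage on a 6-node ring (an assumed toy topology).
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(standard_push_gossip_sync(ring, source=0, rng=random.Random(0)))
```

Because information moves at most one hop per round in this sketch, the returned value can never be smaller than the distance from the source to the farthest node.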
III-B Time Model
Both synchronous and asynchronous time models are considered for the gossip-based information spreading process. In the synchronous time model, all nodes share a global discrete time clock. Each time the clock ticks, all nodes perform the action and the informed node set is updated accordingly. In the asynchronous time model, each node has its own internal clock which ticks according to a rate Poisson process. The action and the update of the informed node set are performed each time the clock of an informed node ticks.
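The asynchronous time model can be sketched as below. The sketch uses the standard fact that the superposition of $n$ independent Poisson clocks with a common rate is a Poisson process with $n$ times that rate whose ticks belong to a uniformly random node. The per-node rate defaults to 1 here, which is an assumption, as are the function name and the `max_ticks` safety limit.

```python
import random

def standard_push_gossip_async(adj, source, rng, rate=1.0, max_ticks=1_000_000):
    """Simulate asynchronous standard push gossip.

    Returns (finish_time, events), where events is the time-ordered list of
    (time, sender, receiver) for every action taken by an informed node.
    """
    nodes = list(adj)
    informed = {source}
    now = 0.0
    events = []
    ticks = 0
    while len(informed) < len(nodes):
        ticks += 1
        if ticks > max_ticks:
            raise RuntimeError("spreading did not finish within max_ticks")
        # Next tick of the merged clock: exponential gap with rate n * rate.
        now += rng.expovariate(rate * len(nodes))
        u = rng.choice(nodes)  # the tick belongs to a uniformly random node
        if u in informed:      # only informed nodes act
            v = rng.choice(adj[u])
            events.append((now, u, v))
            informed.add(v)
    return now, events

# Usage on a 5-node ring (an assumed toy topology).
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
finish_time, events = standard_push_gossip_async(ring, 0, random.Random(0))
print(len(events), finish_time)
```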
III-C Observation Model
It is assumed that there is a group of sensors deployed by the attacker in the network that can scan and eavesdrop on the communications among the nodes. However, due to the limited eavesdropping capability, it is assumed that with a probability of , the sensors can correctly detect and observe the ongoing communication. Specifically, the observed event has the form of in the synchronous setting, indicating that the attacker knows exactly when each node performs its gossip action (the attacker, therefore, can infer a node's active state at each time). In the asynchronous setting, however, the attacker does not know the exact time of each observed event, but it knows the relative order of the nodes' activities. The observed event in this case is represented by , where the conditional stands for the latent information not known to the attacker. The details are shown in Fig. 1.
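The observation model above can be sketched as an independent filter applied to the true sequence of gossip actions. This is only an illustration under assumptions: each action is detected independently with the same probability (named `detect_prob` here), an action is represented as a `(time, node)` pair, and the asynchronous view is obtained by dropping the timestamps while keeping the order. None of these names come from the original text.

```python
import random

def observe(events, detect_prob, rng, synchronous=True):
    """Return what the attacker records from a time-ordered list of actions.

    events: list of (time, node) pairs, one per gossip action.
    synchronous=True  -> attacker keeps (time, node) for each detected action.
    synchronous=False -> attacker keeps only the ordered list of nodes.
    """
    seen = [(t, u) for (t, u) in events if rng.random() < detect_prob]
    if synchronous:
        return seen
    return [u for _, u in seen]

# Usage with an assumed toy trace of three actions.
trace = [(0, "a"), (1, "a"), (1, "b")]
print(observe(trace, detect_prob=0.5, rng=random.Random(0)))
print(observe(trace, detect_prob=0.5, rng=random.Random(0), synchronous=False))
```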
III-D Prediction Uncertainty
The definition of differential privacy does not provide guarantees for the source indistinguishability. In order to avoid the “bad” things (i.e., the source is identified by the attacker) from happening, it is further required that some prediction uncertainty is guaranteed for a given differentially private protocol, defined as:
Given a graph , a gossip protocol can guarantee -prediction uncertainty if there is a constant such that for a uniform prior on source nodes and any :
where stands for the informed node set at time and the element of it represents the source node.
Differential privacy and prediction uncertainty are related but not the same. Especially, because of the uniform prior , by the Bayes’ formula. Moreover, given a prediction uncertainty of , it can be shown that .
IV Main Results
To facilitate our following analysis, we need the following Lemma and definition of decay centrality.
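Lemma 1. Given any gossip protocol in a graph $G$, let $S$ be an observation and $i, j \in V$ two nodes, and suppose there are two constants $w_0, w_1$ such that $p^{(G)}(S \mid I^{(i)}) \ge w_0$ and $p^{(G)}(S \mid I^{(j)}) \le w_1$. If the gossip protocol satisfies $(\epsilon, \delta)$-differential privacy, then $\delta \ge w_0 - e^{\epsilon} w_1$.

Proof: By the definition of differential privacy, $w_0 \le p^{(G)}(S \mid I^{(i)}) \le e^{\epsilon}\, p^{(G)}(S \mid I^{(j)}) + \delta \le e^{\epsilon} w_1 + \delta$; therefore, $\delta \ge w_0 - e^{\epsilon} w_1$.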
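The lemma turns any pair of bounds on the probability of a single observation into a bound on the tolerance. As a hedged sketch, assuming the lemma has the usual form in which an observation with probability at least `w0` under one source and at most `w1` under another forces the tolerance to be at least `w0 - exp(eps) * w1`, the bound can be evaluated as follows. The function name and the numeric values are illustrative assumptions.

```python
import math

def delta_lower_bound(w0, w1, eps):
    """Lower bound on delta implied by one distinguishing observation.

    w0:  lower bound on the probability of the observation under source i.
    w1:  upper bound on the probability of the same observation under source j.
    eps: privacy loss.
    The bound is clipped at 0 because delta is non-negative by definition.
    """
    return max(0.0, w0 - math.exp(eps) * w1)

# An observation that is impossible under the other source (w1 = 0) gives a
# bound equal to w0, whatever the privacy loss is.
print(delta_lower_bound(w0=0.3, w1=0.0, eps=1.0))
# When the observation is likely under both sources, the bound vanishes.
print(delta_lower_bound(w0=0.1, w1=0.5, eps=0.0))
```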
Given a network $G$, consider a decay parameter $\beta$, $0 < \beta < 1$. The decay centrality of node $i$ is defined as

$$C_{\beta}(i) = \sum_{j \neq i} \beta^{d(i, j)},$$

where $d(i, j)$ is the length of the shortest path between nodes $i$ and $j$.

Decay centrality measures the ease with which a node reaches other nodes. If the decay centrality of a node is large, it is close to the other nodes in the network and can reach them easily. The difficulty increases as the decay parameter $\beta$ decreases.
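Decay centrality is straightforward to compute with one breadth-first search per node. The sketch below assumes the standard definition, in which each other node contributes the decay parameter raised to its shortest-path distance, and an unweighted connected graph stored as an adjacency dictionary. The function names and the parameter name `beta` are illustrative.

```python
from collections import deque

def shortest_path_lengths(adj, src):
    """Hop distances from src to every reachable node (breadth-first search)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def decay_centrality(adj, node, beta):
    """Sum of beta ** d(node, j) over all other reachable nodes j."""
    if not 0 < beta < 1:
        raise ValueError("beta must lie strictly between 0 and 1")
    dist = shortest_path_lengths(adj, node)
    return sum(beta ** d for j, d in dist.items() if j != node)

# Usage on a 3-node path 0 - 1 - 2: the middle node is the most central.
path = {0: [1], 1: [0, 2], 2: [1]}
print(decay_centrality(path, 1, beta=0.5))  # 0.5 + 0.5
print(decay_centrality(path, 0, beta=0.5))  # 0.5 + 0.25
```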
IV-A Differential Privacy of Gossip Protocols in General Networks
Given a network with nodes and diameter of ( is the distance between node and ), if the attacker is able to detect any communication happening within the network with a probability of , the differential privacy of gossip protocols in this network is given by:
If a gossip protocol satisfies -differential privacy and -prediction uncertainty, then we have and in the synchronous setting. In the asynchronous setting,
First, for the synchronous setting, let be the event that node ’s activity is observed by the attacker’s sensors at time . Then, the probability that such event happens given the source node is is . If source node is any other node , since node cannot initialize a communication if it is not a source node at time . Therefore, and .
In the asynchronous setting, on the one hand, let be the event that node ’s activity is observed by the attacker’s sensors and it is the first event record. It can be seen that, if the source node is , then , where stands for the event that the source node is detected during its first communication. If the source node is , then the following event needs to be considered to find out : there is no communication detected by the sensors in the network after communications have been executed. This event is denoted as . The probability of this event is . Notice that it requires at least for the information to reach node from node in any push-based gossip protocols. Therefore, ( stands for the complementary events of and it means that there is at least one communication observed by the sensors in rounds), this is by the fact that node ’s activity cannot be observed by attackers before as the information has not reached to node . In all, we have
Also, as the source node has probability to be detected during its first communication which makes it the first node in attacker’s record. Finally, by applying Lemma 1, we have
On the other hand, since , therefore, there exists a node such that,
This implies, . By Eq. (8), we have
Meanwhile, as we have , then the detection uncertainty can be calculated as
In the context of information spreading, if two nodes are distant, it takes time for information to spread from one to the other, and this time allows the attacker to detect possibly distinguishable events, which causes the privacy loss of the node's identity. The network diameter, as the overall distance measure of the whole network, captures the potential privacy loss between any two nodes and becomes one deciding factor of the differential privacy lower bound in Eq. (5). The same logic is reflected in the prediction uncertainty: if every node is close to all other nodes in the network, all nodes are "similar" to each other in terms of information spreading. Therefore, the larger the decay centrality of a network, the less likely it is that the attacker can identify the source node through its observations. Such a network constraint on preserving privacy cannot be avoided by any spreading protocol and needs to be dealt with by other protection mechanisms.
IV-B Differential Privacy of Gossip Protocols in Wireless Networks
Distributed machine learning is considered one of the major techniques to help manage the increasing data volumes and algorithm-driven applications in modern wireless systems. As gossip protocols are an important component in distributed machine learning, their differential privacy in wireless networks is investigated in this part. It is assumed that the communications between nodes and the monitoring by the attacker are not perfect in wireless systems due to different kinds of interference (e.g., noise). To simplify the analysis, a fail probability is considered in this setting: due to interference, the communications can fail with a probability of between two nodes during the gossip step, and it is assumed that the attacker fails to receive a report from its sensors during its monitoring with the same probability . (As this is the first work exploring this area, this simple assumption is adopted. More realistic assumptions will be explored in our future work.) Note that the fail probability is caused by natural conditions such as the wireless channel quality, while the imperfect detection probability is due to the eavesdropping capability (e.g., computation power) of the attacker. In this case, the differential privacy of gossip protocols is shown through the following theorem.
In a wireless environment with legitimate communications and adversarial monitoring failed with a probability of and a given wireless network , the gossip-based protocols can guarantee the -differential privacy with and -prediction uncertainty with in the synchronous setting, and and in the asynchronous setting.
This is a straightforward extension from the previous results with an additional factor of .
It is obvious that differential privacy will increase in a bad wireless environment, as it is hard for the attacker to monitor the nodes' activities. However, the information spreading process is impeded as well when the fail probability is high. The information spreading time of the standard gossip protocol and the private gossip protocol in this case can be estimated through the following theorem.
In a wireless environment with communications being failed with a probability of and a given wireless network , we have
In the synchronous and asynchronous settings, the private gossip takes average rounds and time to inform all nodes in the network, respectively, where is the cover time of a random walk in network .
In the asynchronous setting, if the standard gossip spreads information to all nodes in time on average, where is the spreading time of standard gossip when the communication is perfect.
Sketch of Proof: For private gossip, it is a single random walk on the graph. The average time to inform all nodes is equal to in both synchronous and asynchronous settings. Also, given a fail probability of , the interstate time is amplified by a factor of . Therefore, the average time to inform all nodes is equal to for both synchronous and asynchronous settings. The same logic can be applied to standard gossip in the asynchronous setting, as in each round there is only one node being activated and the communication failure is independent of the next node being activated. Thus, the interstate time is amplified by a factor of as well, which gives the result in .
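The amplification argument in the proof sketch can be checked numerically. The sketch below is an illustration under assumptions rather than the authors' simulation: it takes the proof sketch's view of private gossip as a single random walk, assumes each transmission is lost independently with probability `fail_prob` (a hypothetical name), and lets the active node retry in the next round after a loss. Under these assumptions the number of attempts per successful hop is geometric with mean `1 / (1 - fail_prob)`, which is the amplification factor one would expect.

```python
import random

def private_gossip_rounds(adj, source, fail_prob, rng, max_rounds=1_000_000):
    """Rounds needed by a single random walk with lossy hops to visit all nodes."""
    visited = {source}
    current = source
    rounds = 0
    while len(visited) < len(adj):
        rounds += 1
        if rounds > max_rounds:
            raise RuntimeError("walk did not cover the graph within max_rounds")
        target = rng.choice(adj[current])  # uniform neighbour
        if rng.random() < fail_prob:
            continue                       # transmission lost, retry next round
        current = target                   # the contacted node becomes active
        visited.add(current)
    return rounds

def mean_rounds(adj, source, fail_prob, runs, seed=0):
    rng = random.Random(seed)
    return sum(private_gossip_rounds(adj, source, fail_prob, rng) for _ in range(runs)) / runs

# Usage on a 6-node ring: the ratio should be close to 1 / (1 - 0.5) = 2.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
reliable = mean_rounds(ring, 0, fail_prob=0.0, runs=2000)
lossy = mean_rounds(ring, 0, fail_prob=0.5, runs=2000)
print(reliable, lossy, lossy / reliable)
```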
For standard gossip in the synchronous setting, multiple random walks can exist during the spreading process, which renders the analysis of unreliable spreading challenging in general networks. However, we argue that our results above remain valid in this case through the following conjecture:
In the synchronous setting, the standard gossip spreads information to all nodes in time on average, where is the spreading time of standard gossip when the communication is perfect.
The above results show that interference such as noise, though it improves the overall differential privacy of gossip protocols in wireless systems, also slows down the information spreading process. The trade-off between privacy and spreading speed is shown in the following simulation results. Erdős-Rényi (ER) networks and Geometric Random (GR) networks with a total number of nodes and average degree of are considered. Each point in the figure is obtained based on simulations with graph instances and Monte Carlo runs for each instance. Here, the average spreading time is considered, i.e., the average time required for of nodes in the network to be informed. The privacy-speed trade-offs for ER and GR networks for standard gossip in the synchronous and asynchronous settings are plotted in Figs. 2 and 3, respectively. It is assumed that and privacy loss . The corresponding privacy lower bound can be calculated for ER and GR networks using Theorem 2 given the fail probability . Similar results can be obtained for private gossip and are omitted here due to space constraints.
In wireless networks, interference such as noise is indeed a useful tool to help strengthen the differential privacy of gossip protocols, as it lowers the probability of activity detection by the attacker. However, this benefit comes at the cost of spreading speed. In calculations and simulations, it can be seen that the spreading speed is inversely proportional to while the privacy lower bound is proportional to . The differential privacy of gossip protocols can be strengthened with a small loss of spreading speed, which suggests that methods such as artificial noise protection can be useful in privacy-preserving information spreading.
IV-C Differential Privacy of Gossip Protocols in Delayed Monitoring
In reality, the attacker often cannot monitor the whole information spreading process from the beginning. In this section, we estimate the differential privacy of general gossip protocols when the monitoring is delayed. To simplify the analysis, it is assumed that the attacker knows the number of rounds that have passed in the synchronous setting, or the number of communications that have occurred in the asynchronous setting, since the beginning of information spreading.
Given a network with diameter , if the attacker starts monitoring the information spreading process rounds (or times of communications in the asynchronous case) after it begins and , the gossip-based protocols can guarantee the -differential privacy with in the synchronous setting and in the asynchronous setting
In the synchronous setting, consider two nodes such that , given the event that node 's activity is observed by the attacker the moment it starts monitoring, which is denoted as , it is clear that since it takes at least steps for the information to flow to node from node . Consider another node such that , the probability that node is informed at round is
where is a path from node to node and is the length of this path. Then . (A newly informed node has to be active in its first communication with other nodes in any gossip protocol; otherwise, there is a nonzero probability that the information cannot be spread to the whole network.) Finally, by Lemma 1, .
In the asynchronous setting, denotes the event that node 's activity is the first one observed by the attacker. Again, consider two nodes such that . After communications, if the source node is , information reaches a newly informed node , it is the closest node to node , and . As it requires at least rounds for information to reach node , consider as the event that no communication is observed by the attacker in communications. Then,
Also, consider another node such that , the probability that node is informed at th communication is
where is the probability that all nodes in a path are activated in a fixed order so that information reaches node after communications from node . Finally, the probability that node is activated and its gossip action is observed by the attacker is . Therefore, . Finally, by Lemma 1, the same logic as Eq. (9), and , we have .
Gossip protocols are not able to protect the source’s identity effectively during the early stage of information spreading as gossip protocols have only been performed for a few rounds introducing barely any randomization into the output observations. More randomization indeed indicates stronger privacy which requires more rounds of gossiping. Therefore, in delayed monitoring, it is more and more difficult for the attacker to identify the rumor source as the delayed time increases.
V Conclusions and Future Work
In this paper, we investigate the differential privacy of gossip protocols in general networks. A lower bound of differential privacy of all gossip protocols is given for general graphs. In the asynchronous setting, it is related to the network diameter, as the difference between two nodes in the context of information spreading is determined by their distance. The prediction uncertainty of gossip protocols in this case is related to the decay centrality of the network. In wireless networks, due to possibly unreliable communications, a fail probability is assumed for both the communications between nodes and the detection by the attacker. Though there exists a trade-off between privacy and the information spreading speed, the calculations and simulations based on private and standard gossip in random networks such as ER and GR networks suggest that noise is effective in protecting the differential privacy of gossip protocols, with the spreading speed as the cost. Finally, in delayed monitoring, it is revealed that the differential privacy of gossip protocols is enhanced as the number of delayed rounds increases.
Many interesting problems remain open in this direction. For example, if the attacker is able to measure the distance between any two nodes in the network and rule out impossible source nodes given every possible observation, what will the corresponding differential privacy be? In addition, if the observation model is switched back to the network snapshot, how can we measure the differential privacy of gossip protocols in this case? These problems will be further explored in our future work.
[1] C. Dwork, A. Roth et al., “The algorithmic foundations of differential privacy,” Foundations and Trends® in Theoretical Computer Science, vol. 9, no. 3–4, pp. 211–407, 2014.
[2] A. Bellet, R. Guerraoui, and H. Hendrikx, “Who started this rumor? Quantifying the natural differential privacy guarantees of gossip protocols,” arXiv preprint arXiv:1902.07138, 2019.
[3] S. S. Ram, A. Nedić, and V. V. Veeravalli, “Asynchronous gossip algorithms for stochastic optimization,” in Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference. IEEE, 2009, pp. 3581–3586.
[4] P. Bianchi and J. Jakubowicz, “Convergence of a multi-agent projected stochastic gradient algorithm for non-convex optimization,” IEEE Transactions on Automatic Control, vol. 58, no. 2, pp. 391–405, 2013.
[5] Y. Liu, J. Liu, and T. Basar, “Differentially private gossip gradient descent,” in 2018 IEEE Conference on Decision and Control (CDC). IEEE, 2018, pp. 2777–2782.
[6] R. Chandra, V. Ramasubramanian, and K. Birman, “Anonymous gossip: Improving multicast reliability in mobile ad-hoc networks,” in Proceedings 21st International Conference on Distributed Computing Systems. IEEE, 2001, pp. 275–283.
[7] A. D. Dimakis, A. D. Sarwate, and M. J. Wainwright, “Geographic gossip: Efficient averaging for sensor networks,” IEEE Transactions on Signal Processing, vol. 56, no. 3, pp. 1205–1216, 2008.
[8] A. J. Ganesh, A.-M. Kermarrec, and L. Massoulié, “Peer-to-peer membership management for gossip-based protocols,” IEEE Transactions on Computers, vol. 52, no. 2, pp. 139–149, 2003.
[9] F. Chierichetti, G. Giakkoupis, S. Lattanzi, and A. Panconesi, “Rumor spreading and conductance,” Journal of the ACM (JACM), vol. 65, no. 4, p. 17, 2018.
[10] Y. Huang and H. Dai, “Multiplex conductance and gossip based information spreading in multiplex networks,” IEEE Transactions on Network Science and Engineering, 2018.