Modern technology companies often gather data from a large population of users/clients to improve their services. Going beyond collecting data, more recently, users are asked to perform certain computations and send reports back to the server, i.e., to participate in federated learning or federated optimization (McMahan et al., 2017; Kairouz et al., 2021; Wang et al., 2021). The reports/data involved can be sensitive in such distributed systems, and in order to protect the privacy of users, formal privacy guarantees are sought after. For this purpose, differential privacy (DP) (Dwork et al., 2006b, a), widely regarded as a gold-standard notion of privacy, has seen adoption by companies such as Google (Erlingsson et al., 2014), Apple (Differential Privacy Team, 2017), LinkedIn (Rogers et al., 2020) and Microsoft (Ding et al., 2017).
Early academic studies of DP assume the existence of a centralized, trusted curator. The centralized curator has access to the raw data of all users, and is responsible for releasing aggregated results with DP guarantees. Such a central model requires users to trust the curator to handle their data securely. However, many applications such as federated learning, and regulations like the General Data Protection Regulation (GDPR), require different assumptions about the availability of such an entity. High-profile data breaches reported lately (Henriquez, 2020) have also given service providers second thoughts about collecting data in a centralized manner, to avoid such risks.
Alternative trust models without a centralized, trusted curator have been proposed. Among these, the model of local differential privacy (LDP) provides the strongest privacy guarantees (Evfimievski et al., 2003; Kasiviswanathan et al., 2011): a user does not have to trust any entity other than herself. This is achieved by each user randomizing her data herself, using techniques such as randomized response, to achieve LDP before the data are aggregated. However, LDP is known to suffer from significant utility loss. For example, the error of real summation with $n$ users under LDP is a factor of $\Omega(\sqrt{n})$ larger than that of the central model (Chan et al., 2012).
Intermediate trust models between the local and central models have also been considered in the literature. These models aim to obtain better utility under more practical privacy assumptions (Dwork et al., 2010; Cheu et al., 2019; Erlingsson et al., 2019; Chowdhury et al., 2020). The pan-private model (Dwork et al., 2010) assumes that the curator is trustworthy for now, and stores distorted values to protect against future adversaries. More recently, the shuffle model (Bittau et al., 2017) has attracted attention. Within this model, it is assumed that there exists a centralized, trusted “shuffler”. Users first randomize their data before sending them to the shuffler, which randomly permutes the set of reports to keep their identities anonymous. Finally, the set of reports is sent to the curator, on which no trust assumption is placed. Privacy amplification due to anonymity is shown to be achievable in this model, where it suffices for each user to apply a relatively small amount of randomization/noise to the data to achieve strong overall DP guarantees (Cheu et al., 2019; Erlingsson et al., 2019). Various aspects of shuffling have been studied in the literature (Balle et al., 2019; Balcer and Cheu, 2020; Girgis et al., 2021; Liu et al., 2021; Ghazi et al., 2021; Feldman et al., 2021).
Shuffle models provide better privacy–utility trade-offs, but they ultimately rely on a centralized entity; reintroducing such an entity seems to surrender the original benefit of the local model of not depending on any centralized entity, for the reasons discussed above (GDPR constraints, data breach risks). Furthermore, achieving anonymity via a centralized entity is not trivial in practice. One realization of the shuffler, Prochlo (Bittau et al., 2017), requires, at its core, the use of a Trusted Execution Environment (TEE) such as Intel Software Guard Extensions (SGX). However, SGX is known to be vulnerable to side-channel attacks (Nilsson et al., 2020). Another known realization is via a centralized set of mix relays, or mix-nets (Chaum, 1981; Cheu et al., 2019), but mix-nets remain vulnerable to individual relay failures and other issues (Ren and Wu, 2010).
1.1. Our Contributions
In order to overcome the aforementioned weaknesses, we propose network shuffling, a mechanism that achieves the effect of privacy amplification without requiring any centralized and trusted entity, in contrast to previous studies of privacy amplification where the existence of such an entity is assumed by default.
Essentially, within our framework, users exchange their reports in a random, secret, and peer-to-peer manner on a graph for multiple rounds. The random and secret exchange is vital at achieving anonymity of the reports: all users are potential original holders of each report after multiple exchanges. The exchanged reports are subsequently sent to the curator. When this collection of reports is viewed in the central model, the privacy guarantees are enhanced, yielding privacy amplification via anonymity. A simple illustration of network shuffling is shown in Figure 1.
Network shuffling is inherently a decentralized approach. Comparing it with existing centralized approaches, we show that network shuffling has several favorable properties in terms of security and practicality. Concrete distributed protocols are also proposed to realize network shuffling. Furthermore, to perform a formal privacy analysis of network shuffling, we model it as a random walk on graphs. Harnessing graph theory, we give analytical results on privacy amplification, which exhibit interesting relations with the underlying graph structure. Table 1 compares our privacy amplification result with other techniques; it can be seen that privacy amplification similar to that of other techniques can be attained with network shuffling. As far as we know, it is also the first general technique that achieves privacy amplification without any centralized entity.
| Technique |
|---|
| No amplification (Duchi et al., 2013) |
| Uniform subsampling (Kasiviswanathan et al., 2011; Abadi et al., 2016) |
| Uniform shuffling (Erlingsson et al., 2019) |
| Uniform shuffling (w/ clones) (Feldman et al., 2021) |
| Network shuffling (ours) |
We note that our main purpose is to show, in an application and system-agnostic manner, that privacy amplification in a decentralized and peer-to-peer manner is viable through network shuffling. To make our arguments applicable in general, we necessarily make idealistic assumptions on the realization of the network shuffling mechanism. Some of the assumptions include non-colluding users and fault-tolerant communication. Such limitations and possible solutions will also be discussed in this work in later sections. Our contributions are summarized as follows:
We propose and motivate network shuffling, a practical, simple yet effective mechanism of privacy amplification that relaxes the assumption of requiring a centralized, trusted entity. We compare network shuffling with existing approaches to demonstrate its advantages (Section 3).
We formalize network shuffling as a random walk on graphs and propose minimal designs and distributed protocols that can be implemented in a simple and practical fashion (Section 4).
2. Preliminaries and notations
This section gives essential terminology, definitions, and theorems related to differential privacy, as well as relevant privacy amplification techniques for understanding our proposals. Table 2 gives some notations frequently used in this work.
Terminology. The following terminology is also used. A user or a node refers to an entity in the system which holds a piece of information, referred to as a report or data. A user may send reports to or receive reports from other users, a process referred to as an exchange or relay. The exchange typically takes place over multiple rounds or time steps. Users exchange reports over a path or channel, and two users are said to be connected if they are able to exchange reports with each other over the path. The system of users and paths forms a graph or communication network, or network in short. A curator or server eventually collects all reports from the users. At times we abstract away from how network shuffling is processed in practice and assume that the report anonymization procedure satisfies several minimal requirements, particularly when discussing its theoretical properties. When discussing more practical details, such as protocols to achieve the requirements of report anonymization, we refer to them as the implementation or realization of network shuffling. Last but not least, we use “log” to represent the natural logarithm.
2.1. Differential Privacy
Definition 2.1 (($\varepsilon$, $\delta$)-Differential Privacy).
Given privacy parameters $\varepsilon \ge 0$ and $\delta \in [0, 1]$, a randomized mechanism $\mathcal{M}$ with domain $\mathcal{D}$ and range $\mathcal{S}$ satisfies ($\varepsilon$, $\delta$)-differential privacy (DP) if for any two adjacent inputs $D, D' \in \mathcal{D}$ and for any subset of outputs $S \subseteq \mathcal{S}$, the following holds:

$$\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \Pr[\mathcal{M}(D') \in S] + \delta.$$

Here, the notion of “adjacent” is application-dependent. For $D$ with $n$ elements, it refers to $D$ and $D'$ differing in one element. For such cases, w.l.o.g., we also say that the mechanism is DP at index 1 in the central model. $\varepsilon$ is assumed to be a small constant and $\delta$ is assumed to be negligible in $n$ to provide meaningful central DP guarantees. We also say that an ($\varepsilon$, $\delta$)-DP mechanism satisfies approximate DP, and that two random variables are ($\varepsilon$, $\delta$)-indistinguishable when the above inequality holds (in both directions) for their distributions.
When $\mathcal{D}$ consists of a single element, $\mathcal{M}$ is also called a local randomizer, which provides local DP guarantees. The formal definition is given below.
Definition 2.2 (Local randomizer).
A mechanism $\mathcal{R}$ is an $\varepsilon_0$-DP local randomizer if for all pairs of inputs $x, x'$, $\mathcal{R}(x)$ and $\mathcal{R}(x')$ are ($\varepsilon_0$, 0)-indistinguishable.
As $n$ elements each satisfying $\varepsilon_0$-LDP are naturally $\varepsilon_0$-DP in the central model, privacy amplification is said to occur when the central model is ($\varepsilon$, $\delta$)-DP with $\varepsilon < \varepsilon_0$.
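To make the role of the local randomizer concrete, the following toy sketch (our illustration, not part of any cited implementation) shows binary randomized response, a standard $\varepsilon_0$-DP local randomizer; the worst-case likelihood ratio between its output distributions equals $e^{\varepsilon_0}$:

```python
import math, random

def randomized_response(bit, eps0, rng=random):
    """eps0-LDP local randomizer for one bit: keep the true bit with
    probability e^eps0 / (e^eps0 + 1), flip it otherwise."""
    p_true = math.exp(eps0) / (math.exp(eps0) + 1.0)
    return bit if rng.random() < p_true else 1 - bit

# The worst-case ratio Pr[output = b | input = b] / Pr[output = b | input = 1-b]
# is exactly e^eps0, matching the eps0-LDP guarantee (delta = 0).
eps0 = 1.0
p_true = math.exp(eps0) / (math.exp(eps0) + 1.0)
assert abs(p_true / (1 - p_true) - math.exp(eps0)) < 1e-9
```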
| Notation | Description |
|---|---|
| $n$ | Total number of users |
| $x_i$ | Report generated by user $i$ |
| $\varepsilon_0$ | LDP guarantee of the local randomizer |
| $y_i$ | $i$-th randomized report |
| $\mathcal{Y}_i$ | Domain of the $i$-th randomized report |
| $p_t$ | Probability distribution of users holding a report on a graph at time $t$ (position probability distribution) |
| $\xi$ | Irregularity measure of the graph |
2.2. Privacy Amplification Techniques
We here introduce the shuffle model and other techniques of privacy amplification.
The shuffle model is a distributed model of computation, where there are $n$ users, each holding a report $x_i$ for $i \in [n]$, and a server receiving reports from the users for analysis. The report from each user is first sent to a shuffler, where a random permutation is applied to all the reports for anonymization. This procedure is also known as uniform shuffling in the literature.
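As a minimal illustration (a sketch under our own simplifying assumptions, not a secure implementation), uniform shuffling amounts to applying a uniformly random permutation to the locally randomized reports before they reach the analyzer:

```python
import random

def uniform_shuffle(local_reports, rng=random):
    """Trusted shuffler: apply a uniformly random permutation so the
    analyzer cannot link a report's position back to its sender."""
    shuffled = list(local_reports)
    rng.shuffle(shuffled)
    return shuffled

reports = [f"y{i}" for i in range(5)]   # locally randomized reports
mixed = uniform_shuffle(reports)
# The multiset of reports is preserved; only the order (linkage) is lost.
assert sorted(mixed) == reports
```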
Other mechanisms of privacy amplification have also been considered, such as privacy amplification by subsampling (Kasiviswanathan et al., 2011), which is utilized in federated learning (McMahan et al., 2018). However, it is necessary to trust the server to hide the identities of the subsampled users in order to establish privacy amplification.
A technique called random check-in is introduced in (Balle et al., 2020), where a more practical distributed protocol within the federated learning setting is studied. There, a centralized and trusted orchestrating server is still required to perform the “check-ins” of the users, which forgoes the benefits of a decentralized approach.
Privacy amplification by decentralization has also been proposed (Cyffers and Bellet, 2020). Deterministic (non-random) walking on graphs is considered there, and it is assumed that the central adversary does not have a local view of the data (that is, the central adversary can only access aggregated quantities). In our work, we instead utilize random walks on graphs to achieve privacy amplification while the central adversary maintains a local view of the data. Due to the differences in privacy models, (Cyffers and Bellet, 2020) is largely orthogonal to our work.
3. Shuffling without a centralized, trusted shuffler
We motivate the need for network shuffling by first discussing the properties of current implementations of the centralized shuffler. Then, we discuss the general properties of network shuffling before closing the section with the threat model and assumptions of our proposal.
3.1. Motivations for Network Shuffling
Our work aims to resolve the shortcomings faced by existing realizations of the shuffle model, particularly Prochlo and mix-nets. Let us first discuss in more detail the inadequate properties of the TEE-based Prochlo framework (Bittau et al., 2017). First, as mentioned above, it is difficult to avoid all types of side-channel attacks against TEEs (Schwarz et al., 2017; Bulck et al., 2018; Nilsson et al., 2020). Second, Prochlo must first collect and batch reports from all users before shuffling them together. This means that Prochlo does not scale well with the number of users.
Another common way of achieving anonymity is through the utilization of mix-nets (Chaum, 1981; Kwon et al., 2016). Essentially, reports are relayed through a centralized set of mix relays (without batching) to avoid a single point of failure. However, an adversary can still monitor the traffic to and from the mix-nets to determine the source of (un-batched) reports, thus breaking anonymity. Cover traffic (blending authentic messages with noise to counter traffic analysis) can alleviate this risk, but as will be discussed later in this section, network shuffling is more efficient than mix-nets in terms of traffic complexity in this respect. Moreover, the individual mix relays in turn become obvious targets of attack and compromise (Ren and Wu, 2010; Freedman and Morris, 2002; Rennhard and Plattner, 2002).
The existence of a malicious insider at the centralized shuffler (insider threat) is yet another potential risk that compromises the anonymity of reports. Generally, introducing a centralized entity simply increases the attack surface. These constraints, along with several other practical considerations discussed in the Introduction, motivate us to consider alternatives that relax the assumption of the availability of a centralized shuffler, while still reaping the benefits of shuffling, i.e., privacy amplification.
Network Shuffling. Uniform shuffling is, simply put, a mechanism that mixes up reports from distinct users to break the link between a user and her report. In the literature, this is assumed to be executed by a centralized “shuffler”. The core idea of our proposal, which removes the requirement of any centralized and trusted entity, is simple: exchange data among the users randomly to achieve the shuffling/privacy amplification effect, assuming that the users can communicate with the server and with each other over a communication network. We call this mechanism network shuffling.
Initially, each user randomizes her own report using the local randomizer. Then, the user sends out the report randomly to one of her connected users while receiving incoming reports from connected users. The above procedure is iterated for a pre-determined number of rounds before the reports are sent to the curator. This procedure of exchanging reports essentially hides the origin of each report, as a centralized shuffler does.
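The exchange procedure can be sketched as follows (a simplified, insecure simulation for intuition only; the `neighbors` map and the ring topology are illustrative choices, not part of any specific deployment):

```python
import random
from collections import defaultdict

def network_shuffle(reports, neighbors, rounds, rng=random.Random(0)):
    """Simulate network shuffling: at each round, every held report is
    forwarded to a uniformly chosen neighbor (a random-walk step)."""
    held = defaultdict(list)
    for user, r in enumerate(reports):
        held[user].append(r)
    for _ in range(rounds):
        nxt = defaultdict(list)
        for user, rs in held.items():
            for r in rs:
                nxt[rng.choice(neighbors[user])].append(r)
        held = nxt
    return held  # user -> reports to be sent to the curator

# Six users on a ring; every report survives the exchanges exactly once,
# but its final holder is essentially unlinkable to its origin.
nbrs = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
out = network_shuffle([f"y{i}" for i in range(6)], nbrs, rounds=10)
assert sorted(r for rs in out.values() for r in rs) == [f"y{i}" for i in range(6)]
```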
Network shuffling is applicable to any group of users able to form a communication network to exchange reports with each other. For example, it can be applied to users of messaging applications, where the report-exchanging network is naturally defined by the social network. It can also be applied straightforwardly to wireless sensor networks (Zhang and Zhang, 2012), as well as the Internet Protocol (IP) network overlay which is general-purpose and application-transparent (Freedman et al., 2002; Freedman and Morris, 2002; Rennhard and Plattner, 2002).
3.2. Complexity Analyses
There are a few basic properties of network shuffling that can be inferred without specifying its detailed realization. First, in terms of memory, as each user exchanges reports without needing to store them, the amount of memory or space taken is constant with respect to $n$ ($O(1)$). This is in contrast to Prochlo ($O(n)$). Mix-nets also have a memory complexity of $O(1)$ at best, if no batching of data is made to the relay traffic.
Prochlo requires each user to send her report once, leading to a traffic complexity of $O(1)$ per user. Typically, mix-nets utilize cover traffic to defend against traffic analysis, such that the adversary cannot distinguish whether a report is genuine or merely noise. To explicitly cover all $n$ users, mix-nets must send cover traffic to all users, leading to a traffic complexity of $O(n)$ per user. In contrast, in network shuffling protocols each user exchanges traffic only with her $d$ neighbors when using cover traffic, and the number of neighbors is typically much smaller than $n$. Some protocols require relaying reports for $O(\log n)$ rounds (ignoring other contributing factors; see Section 4.2), amounting to a traffic complexity of $O(d \log n)$, or $O(d)$ per user if additional relays depending on $n$ are not required.
Finally, we note that as TEEs are limited in terms of memory, shuffling is processed in batches of reports, requiring multiple rounds of processing (Bittau et al., 2017). On the other hand, mix-nets and network shuffling at minimum require only simple encryption–decryption mechanisms (see Section 4.4). Overall, although requiring at most $O(d \log n)$ of traffic overhead, network shuffling has minimal memory and processing overhead. The dependence on $n$ of both the memory and traffic complexities is summarized in Table 3.
We would like to emphasize that the focus of this paper is on the general, application- or system-agnostic study of privacy amplification with network shuffling. A full-fledged implementation and system analysis would be specific to the underlying application or system, and is therefore out of the scope of this work (see (Freedman et al., 2002; Freedman and Morris, 2002; Rennhard and Plattner, 2002) for an implementation and analysis of a similarly decentralized, peer-to-peer anonymization system specific to the IP network overlay).
Nevertheless, for the rest of this section and the next, we describe a minimal and self-contained implementation of network shuffling, along with some idealistic assumptions that abstract away application- or system-specific details, while remaining realistic enough to be implemented. We hope that this will help practitioners understand the minimal requirements for achieving privacy amplification via network shuffling, which is, as far as we know, the first decentralized approach to privacy amplification in the literature. Furthermore, we discuss in detail in later sections where there is room for refinement in our descriptions for real-world deployment.
We begin by elaborating on our threat model which forms the basis of the network shuffling protocols provided in Section 4.
| Entity space complexity | User traffic complexity |
|---|---|
3.3. Our Threat Model and Assumptions
As all users apply an $\varepsilon_0$-DP local randomizer to their reports before sending them out, they are guaranteed local DP quantified by $\varepsilon_0$. This is true even when all other parties (including other users) collaborate to attack a specific user. This forms the worst-case privacy guarantee of network shuffling.
Our privacy analysis in the following sections is however mainly against a central analyzer. In order to establish central DP, our threat model requires additional assumptions.
Non-collusion. We assume that there is no collusion among users, that is, users do not collude against a certain victim user. Note that this assumption is also required by the shuffle model when privacy amplification via shuffling is considered (Wang et al., 2020).
Honest-but-curious users. All users are assumed to be honest but can be curious. That is, users will not deviate from the protocol but can try to retrieve information from the received reports. We will demonstrate a communication protocol in Section 4.4 to verify that this is achievable in practice. Besides, we assume that all users are available to participate in relaying the reports at each round. Relaxation of these assumptions will be discussed in Section 4.5.
No traffic analysis. For simplicity, we further assume that it is not possible for the adversary to perform timing or traffic analysis. Achieving this in practice may require users to send more reports than necessary as cover traffic, or to hide their trails by sending the report along with other information. Our threat model abstracts away these considerations. Nevertheless, we remark that the final-round reports are not anonymous: the central adversary is able to link a report to the user sending it at the final round of network shuffling (but not to the “original” owner of the report). This is realistic when considering communication protocols in practice, such as those given in Section 4.
When the above assumptions on the central adversary fail to hold, the privacy guarantees degrade at worst to the $\varepsilon_0$ privacy guarantees of the LDP setting. We also note that our threat model is reasonable compared to the shuffle model's. That is, in both cases, when all users except the victim collude with the server, or when traffic analysis is possible, the privacy guarantees drop to the LDP setting. Within the uniform shuffling framework, the server may additionally collude with the shuffler; network shuffling, on the other hand, offers an alternative privacy amplification solution without this attack surface.
4. Protocols of Network Shuffling
In order to perform privacy analysis of network shuffling, we model the report exchanges as random walks on graphs. In the following, we first provide notions of random walks, focusing on relating them to applications in network shuffling. We will also quote well-known results from graph theory, particularly those related to the study of random walks on graphs. See (Lovász, 1996) for a relevant comprehensive survey of random walks on graphs. Then, we describe the distributed protocols of network shuffling.
4.1. Random Walks on Graphs
A graph, $G = (V, E)$, is characterized by a set of nodes or vertices, $V$, and a set of paired nodes or edges, $E$. In our case, $G$ is the communication network: users may be viewed as nodes, and $E$ represents the set of communication paths between users. In this work, we consider undirected graphs, that is, each pair of connected users can send messages to each other.
A particular graph topology is of interest: the $d$-regular graph, a graph in which each node has the same number, $d$, of connected neighbors.
We define the adjacency matrix of $G$ as $A \in \{0, 1\}^{n \times n}$, where $n$ is the number of nodes in $G$; $A_{ij}$ indicates whether nodes $i$ and $j$ are connected. Then, the probability of user $i$ sending a report to a randomly chosen recipient $j$ is characterized by the transition probability from node $i$ to node $j$ on the graph,

$$P_{ij} = \frac{A_{ij}}{d_i},$$

where $d_i$ is the degree of node $i$.
Denote by $D$ the diagonal matrix $\mathrm{diag}(d_1, \ldots, d_n)$. Then, we can rewrite the transition probability matrix of graph $G$ as $P = D^{-1}A$. We write $p_t$ for the probability distribution of the users holding a report at time $t$ (we also refer to it as a position probability distribution). Then, the update of the probability distribution due to the random exchange (random walk) of a report at each time step may be expressed recursively as $p_{t+1} = p_t P$. Given a certain initial probability distribution, $p_0$, we can calculate the probability distribution after $t$ time steps as $p_t = p_0 P^t$. At times, we suppress the dependence on $G$ and write the $i$-th component of $p_t$ as $p_t^{(i)}$ for notational convenience.
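The update rule can be checked numerically. The sketch below (illustrative only) builds $P = D^{-1}A$ for a small complete graph and iterates $p_{t+1} = p_t P$ until the position distribution is close to uniform:

```python
def transition_matrix(adj):
    """Build P = D^{-1} A from a 0/1 adjacency matrix: row i gives the
    probability of forwarding a report from node i to each neighbor."""
    return [[a / sum(row) for a in row] for row in adj]

def step(p, P):
    """One random-walk update, p_{t+1} = p_t P (row-vector convention)."""
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

# Complete graph on 3 nodes; the report starts at node 0.
P = transition_matrix([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
p = [1.0, 0.0, 0.0]
for _ in range(60):
    p = step(p, P)
# The position distribution approaches the uniform distribution.
assert all(abs(x - 1/3) < 1e-9 for x in p)
```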
We are also interested in the probability distribution in the long run. The stationary distribution, $\pi$, characterizes such behavior:
Definition 4.1 (Stationary distribution).
A probability distribution $\pi$ over the set of nodes of a graph $G$ is a stationary distribution of the random walk when $\pi = \pi P$.
When a random walk converges to a stationary distribution eventually, we say that the walk is ergodic:
Definition 4.2 (Ergodicity).
A random walk is ergodic when, for every initial probability distribution $p_0$ over the set of nodes of a graph $G$, $p_t$ converges to a stationary distribution as $t \to \infty$.
The following theorem describes the conditions guaranteeing the ergodicity of a random walk on a graph $G$:
Theorem 4.3.
A random walk on a graph $G$ is ergodic if and only if $G$ is connected and not bipartite.
Let $d_i$ be the number of edges connected to node $i$, and $m$ the total number of edges of an ergodic graph $G$. It can further be shown that $\pi = (d_1/2m, \ldots, d_n/2m)$ is a stationary distribution. A regular graph's stationary distribution is therefore the uniform distribution, $(1/n, \ldots, 1/n)$.
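The claim can be verified on a small example (our own toy check): for an irregular, ergodic graph, the vector with components $d_i/2m$ is indeed fixed by $P$:

```python
# pi_i = d_i / 2m is stationary: check pi P = pi on a small irregular graph.
adj = [[0, 1, 1, 0],
       [1, 0, 1, 1],
       [1, 1, 0, 1],
       [0, 1, 1, 0]]
deg = [sum(row) for row in adj]          # degrees d_i
two_m = sum(deg)                         # 2m = sum of degrees
pi = [d / two_m for d in deg]            # candidate stationary distribution
P = [[adj[i][j] / deg[i] for j in range(4)] for i in range(4)]
pi_next = [sum(pi[i] * P[i][j] for i in range(4)) for j in range(4)]
assert all(abs(a - b) < 1e-12 for a, b in zip(pi, pi_next))
```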
One is also interested in the mixing time, the number of time steps it takes for a probability distribution to become close to the stationary distribution. To estimate the mixing time, we first consider a $d$-regular graph and introduce several more results from spectral graph theory (we mainly follow the arguments given in (Williamson, 2016); more details can be found there and in the references therein). First, the transition probability matrix (equivalent to $A/d$, known as the normalized adjacency matrix) is characterized by its eigenvalues, denoted by $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, and its corresponding orthonormal eigenvectors, $v_1, \ldots, v_n$. It can be shown that $\lambda_1 = 1$ and $v_1 = (1/\sqrt{n}, \ldots, 1/\sqrt{n})^\top$. Since $\{v_i\}_{i \in [n]}$ is orthonormal in $\mathbb{R}^n$, we can write any initial distribution as $p_0 = \sum_i c_i v_i$ with $c_i = \langle p_0, v_i \rangle$. Then, $p_t = p_0 P^t = \sum_i c_i \lambda_i^t v_i$, and $c_1 = \langle p_0, v_1 \rangle = 1/\sqrt{n}$. Since $|\lambda_i| < 1$ for $i \ge 2$ when the walk is ergodic, $p_t \to c_1 v_1 = \pi$ as $t \to \infty$.
We also quantify the concept of walk convergence as follows.
Definition 4.4 (Total variation distance).
The total variation distance between two distributions $p$ and $q$ on a graph $G$ is defined as:

$$\Delta(p, q) = \frac{1}{2} \sum_{i \in V} |p_i - q_i|.$$

Recall that the distribution at time $t$ is $p_t = \sum_i c_i \lambda_i^t v_i$. Let $\lambda = \max(|\lambda_2|, |\lambda_n|)$, so that $1 - \lambda$ is the spectral gap. It can be shown that:

$$\Delta(p_t, \pi) = \frac{1}{2} \|p_t - \pi\|_1 \le \frac{\sqrt{n}}{2} \|p_t - \pi\|_2,$$

using the Cauchy–Schwarz inequality. Note that

$$\|p_t - \pi\|_2^2 = \sum_{i \ge 2} c_i^2 \lambda_i^{2t} \le \lambda^{2t} \sum_i c_i^2 \le \lambda^{2t},$$

where we have used $\sum_i c_i^2 = \|p_0\|_2^2 \le \|p_0\|_1^2 = 1$ by the Minkowski inequality. Then, we have

$$\Delta(p_t, \pi) \le \frac{\sqrt{n}}{2} \lambda^t \le \frac{\sqrt{n}}{2} e^{-(1-\lambda)t}.$$

This means that when choosing $t = c \log n / (1 - \lambda)$ for a constant $c > 1/2$, we have

$$\Delta(p_t, \pi) \le \frac{1}{2} n^{1/2 - c},$$

where we have used $1 - x \le e^{-x}$. Therefore, we can say that when the mixing time is of $O(\log n / (1 - \lambda))$, the total variation distance is small for sufficiently large $n$, and the probability distribution is close to the stationary distribution. Finally, as $P$ is similar (in the linear-algebraic sense) to the corresponding normalized adjacency matrix for non-regular graphs, non-regular graphs also have the same convergence behavior (Williamson, 2016).
4.2. Network Shuffling as a Random Walk on Graphs
Let us elaborate on the setups of network shuffling in terms of graph theoretical notions introduced in Section 4.1.
We consider only connected graphs in our analysis. The privacy of a disconnected graph may be viewed as a parallel composition of the privacy of its connected sub-graphs, meaning that shuffling occurs only within each connected sub-graph. It is then sufficient to analyze connected graphs only.
We also assume that all users on the graph participate in network shuffling. That is, all users are required to participate in receiving and sending reports to neighboring users. Before the process of exchanging reports starts, each user is also required to have produced a randomized report to be exchanged: user $i$, for $i \in [n]$, produces $y_i = \mathcal{R}(x_i)$. After the final round of random exchanges, a user can possibly have received no report, a single report, or multiple reports. The set of reports held by user $i$ is denoted by $S_i$. Figure 2 illustrates how the reports are distributed to the users at different time steps.
We analyze separately the following two scenarios at any time step $t$:
Stationary distribution: $p_t = \pi$ of an ergodic graph.
Symmetric distribution: symmetric $p_t$ of a $d$-regular graph.
The technical reason for this separation is their difference in the dynamics of $p_t$ at time $t$, to be explained in detail in Section 5.2.
The first scenario concerns the modeling of any connected and non-bipartite graph, which is best analyzed (as will be shown later) with respect to the stationary distribution it converges to, hence the name stationary distribution.
The second scenario concerns a walk that is symmetric over all nodes. Under this setup, one can w.l.o.g. analyze the privacy with respect to the first user, allowing for precise privacy analysis. Note that such a consideration is not purely of theoretical interest. Certain designs (Freedman et al., 2002; Freedman and Morris, 2002; Rennhard and Plattner, 2002) implement peer discovery protocols where each user proactively selects a set of users to form a path through the network. In this case, forming a $d$-regular graph is a reasonable consideration when each user uses the same peer discovery protocol to select a fixed number of other users to communicate with.
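As a toy illustration of such a peer discovery outcome (the circulant construction below is our own illustrative choice, not taken from the cited designs), connecting each user to a fixed set of ring offsets yields a $d$-regular communication graph:

```python
def circulant_regular_graph(n, offsets):
    """Each user connects to peers at fixed ring offsets (and their
    negatives), yielding a 2*len(offsets)-regular graph when the
    offsets and n are chosen so that all neighbors are distinct."""
    return {i: sorted({(i + o) % n for o in offsets} |
                      {(i - o) % n for o in offsets})
            for i in range(n)}

g = circulant_regular_graph(10, offsets=(1, 3))  # a 4-regular graph
assert all(len(v) == 4 for v in g.values())
```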
4.3. Reporting Protocols
We propose two user protocols of sending reports. Note that we first abstract away from security considerations of the protocol which are discussed in Section 4.4. With this abstraction, we focus on the mechanism of sending the reports to the server by users.
“All” protocol. The first protocol is described in Algorithm 1. Here, each user exchanges the reports in a random-walk manner for a pre-determined number of communication rounds. Afterwards, the user sends all reports held by her to the server. Note that the user simply sends a null response to the server if she holds no report during the final round. This protocol is referred to as all.
“Single” protocol. The second protocol is a modification of Algorithm 1 to provide better privacy guarantees, inspired by the federated learning approach in (Balle et al., 2020). As before, users exchange the reports in a random-walk manner for a pre-determined number of communication rounds. After the final communication round, if $S_i$ is empty, a dummy report is sent to the server by user $i$. Otherwise, a report is sampled uniformly from $S_i$ to be sent to the server. Intuitively, since each user sends only a single report irrespective of how many reports she has received, it is harder for the adversary to infer the identity of the received reports, providing stronger privacy guarantees. Note that as a trade-off, not sending all reports to the server could induce utility loss. The protocol is described in Algorithm 2, and is referred to as single.
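The two reporting behaviors can be sketched as follows (a simplified model of the final step of Algorithms 1 and 2; the function names are ours):

```python
import random

def report_all(held, server_inbox):
    """'all' protocol: send every held report, or a null response."""
    server_inbox.append(list(held) if held else None)

def report_single(held, server_inbox, rng=random):
    """'single' protocol: send exactly one report, sampled uniformly
    from the held set, or a dummy report when none was received."""
    server_inbox.append(rng.choice(held) if held else "dummy")

inbox = []
report_all(["y1", "y2"], inbox)   # a user holding two reports
report_all([], inbox)             # a user holding none sends null
assert inbox == [["y1", "y2"], None]

inbox = []
report_single([], inbox)          # the dummy hides that nothing was held
report_single(["y1"], inbox)
assert inbox == ["dummy", "y1"]
```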
4.4. Communication Protocol
As a concrete use case to motivate this work, let us consider collecting data from users of instant messaging apps offered by social networking services, such as WhatsApp, or Messenger offered by Facebook. Messages are commonly exchanged between two or more parties, where the sender sends messages to recipients connected to a centralized network run by a server. Within the social networking setting, users commonly interact only with other users connected on the social network. It is then natural to consider exchanging the private reports only with users connected on the social network, hiding the private reports within the traffic of usual data exchanges. We note that user-to-user communication via a centralized server is not strictly required in other applications such as wireless sensor or Internet of Things (IoT) networks, where peer-to-peer (P2P) communication is feasible.
Let us consider the communication protocol running between the server and the users. The server is assumed to be able to communicate with the users via a secure, authenticated and private channel. We also assume the existence of a Public Key Infrastructure (PKI), which ensures that only authenticated users can participate in the data exchange. All users utilize two types of public–private keypairs. One is for end-to-end encrypted communication with other users ($pk_u$, $sk_u$), which is unique to each user, and the other is for encrypting the report to be exchanged with other users ($pk_s$), where the server holds the corresponding private key ($sk_s$).
All users initially publish and receive public keys via the PKI. A user then applies the local randomizer to her report, and encrypts it with $pk_s$. Subsequently, the user uses the recipient's $pk_u$ to send the report to another user in an end-to-end encrypted manner. After exchanging the reports for a number of communication rounds, the user sends the reports (encrypted only with $pk_s$) to the server. The server then decrypts all the received reports with $sk_s$ and performs data analysis on them. Notice that the server can link a user to her last-received reports. The illustration is shown in Figure 3. We next analyze the security properties of the protocol.
Security against adversarial server. The use of $pk_u$ prevents the exposure of the encrypted report to anyone other than the communicating users. This end-to-end encryption especially protects the report's privacy from a possibly adversarial server.
Security against honest-but-curious users. Encrypting the report under the server-held keypair prevents the exposure of its content to anyone other than the user applying the local randomizer and the server. This protects the report’s privacy from honest-but-curious users.
We would like to emphasize that our communication protocol, based on asymmetric encryption, is simple to implement, in contrast to secure aggregation (Bonawitz et al., 2017) or secure shuffling, which require sophisticated secure multi-party computation protocols or TEEs that can be challenging to implement.
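To make the two-layer scheme concrete, the following toy sketch mimics the protocol's encryption layering with a stand-in XOR cipher. The names `server_key` and `p2p_key` and the cipher itself are illustrative only; a real deployment would use proper asymmetric primitives distributed via the PKI, as described above.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    # Toy stand-in for encryption; NOT secure, for illustration only.
    return bytes(b ^ k for b, k in zip(data, key))

# Stand-in for the server-held keypair: in the protocol, only the server
# can remove this layer.
server_key = secrets.token_bytes(32)
# Stand-in for the end-to-end key shared by two communicating users.
p2p_key = secrets.token_bytes(32)

report = b"randomized report"  # output of the local randomizer
padded = report.ljust(32, b"\0")

# Inner layer: encrypt for the server, so relaying users cannot read it.
inner = xor_bytes(padded, server_key)
# Outer layer: end-to-end encrypt for the next hop, hiding it from the server.
outer = xor_bytes(inner, p2p_key)

# The receiving user strips the outer layer; only the server can strip
# the inner one and recover the randomized report.
assert xor_bytes(xor_bytes(outer, p2p_key), server_key) == padded
```

The nesting order matters: the relaying user learns only ciphertext under the server's key, while the server observing network traffic sees only the end-to-end layer.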
4.5. Practical Considerations
The network shuffling protocols described thus far are self-contained and secure if the threat model and assumptions given in Section 3.3 are satisfied. Here, we consider scenarios where some of the assumptions are relaxed. While the detailed study is beyond the current scope, we discuss potential solutions or workarounds by giving reference to relevant works in the literature.
Fault tolerance. In practice, users may fail to operate properly due to various reasons. They may disconnect from the network temporarily due to, e.g., battery depletion or network outage. One way to model such a situation is via the use of lazy random walk. A lazy random walk is a random walk where at each time step, the walk has a certain probability of staying at its current node instead of transitioning to other nodes. This behavior reflects the probability of certain users being disconnected temporarily and unable to exchange reports at a certain time. Another approach is to analyze random walks on a dynamic graph to study the effects of potential walker losses (Zhong et al., 2008).
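As an illustration, a lazy random walk is straightforward to simulate; the sketch below uses a hypothetical 4-user adjacency list and an assumed stay probability of 0.3, showing how temporary disconnections can be folded into the walk dynamics.

```python
import random

def lazy_step(graph, node, stay_prob, rng):
    # With probability stay_prob the walk stays put, modeling a user who is
    # temporarily offline; otherwise it moves to a uniform random neighbor.
    if rng.random() < stay_prob:
        return node
    return rng.choice(graph[node])

# Hypothetical 4-user network given as an adjacency list.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}

rng = random.Random(0)
node = 0
for _ in range(100):  # 100 rounds of (possibly lazy) report exchanges
    node = lazy_step(graph, node, stay_prob=0.3, rng=rng)
print(node)
```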
Collusion. Colluding users can be a threat to the anonymity guarantees. Discussing this threat however requires more assumptions on the system in use. For example, implementations at the network layer of IP (Freedman et al., 2002; Freedman and Morris, 2002) defend against such an adversary by requiring the node to select peers in a pseudo-random way: it is highly unlikely for an adversary to control all pseudo-randomly selected nodes in a path. Another defensive method is to monitor user behavior and use collusion detection algorithms to counter such an adversary (Rennhard and Plattner, 2002). Moreover, the orchestrating server may drop users considered to be adversarial; such a dynamic scenario may be analyzed with a dynamic graph as discussed above.
5. Privacy Analysis
In this section, we first describe useful preparations for analyzing the privacy of network shuffling. Then, we present privacy theorems for the scenarios and protocols under consideration, along with proof sketches and interpretations. Some of the technical analyses are relegated to Section 6. Numerical evaluations involving the use of real-world datasets are also given.
The following notation describing the distance between distributions is convenient when illustrating the proofs. Given two distributions $\mu_0$ and $\mu_1$ that are $(\varepsilon,\delta)$-DP close, i.e., for all measurable outcomes $E$, the following holds:
$$\mu_0(E) \le e^{\varepsilon}\mu_1(E) + \delta \quad\text{and}\quad \mu_1(E) \le e^{\varepsilon}\mu_0(E) + \delta,$$
we denote this relation by $\mu_0 \simeq_{(\varepsilon,\delta)} \mu_1$.
We will make use of the heterogeneous advanced composition theorem for DP (Kairouz et al., 2017). For a sequence of mechanisms which are $\varepsilon_i$-DP each, the $k$-fold adaptive composition is $(\tilde{\varepsilon},\delta)$-DP for any $\delta>0$, with
$$\tilde{\varepsilon} = \sum_{i=1}^{k}\frac{(e^{\varepsilon_i}-1)\varepsilon_i}{e^{\varepsilon_i}+1} + \sqrt{2\log(1/\delta)\sum_{i=1}^{k}\varepsilon_i^{2}}.$$
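As a hedged illustration, the bound above can be evaluated numerically; the sketch below implements the advanced term together with basic composition and takes their minimum (assuming pure $\varepsilon_i$-DP mechanisms and a user-chosen $\delta$):

```python
import math

def composed_eps(eps_list, delta):
    """Composed privacy parameter for a sequence of eps_i-DP mechanisms,
    using the minimum of basic composition and (one form of) the advanced
    bound of Kairouz et al. (2017), valid for any delta > 0."""
    basic = sum(eps_list)
    advanced = sum(e * (math.exp(e) - 1) / (math.exp(e) + 1) for e in eps_list) \
        + math.sqrt(2 * math.log(1 / delta) * sum(e * e for e in eps_list))
    return min(basic, advanced)

# Composing 1000 mechanisms at eps = 0.1 each: the advanced bound is far
# below the basic-composition value of 100.
print(composed_eps([0.1] * 1000, delta=1e-6))
```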
We next provide several other results useful for proving the main theorems of this paper.
Lemma 5.1.
Let denote the number of reports each of the users is allocated in the protocol from Figure 1. Also let be the allocation probabilities. With probability at least , we have
The proof is based on McDiarmid’s inequality and is omitted here due to space constraints.
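Although the formal proof is omitted, the concentration behavior is easy to observe empirically. The following sketch, with hypothetical sizes and uniform allocation probabilities standing in for the walk-induced allocation, shows that per-user report counts stay close to their expectation:

```python
import random
from collections import Counter

rng = random.Random(42)
num_users = 1000       # hypothetical population size
num_reports = 100_000  # reports to allocate (one per walk)

# Hypothetical allocation probabilities: uniform, standing in for the
# allocation induced by the random walks in the protocol.
counts = Counter(rng.choices(range(num_users), k=num_reports))
expected = num_reports / num_users  # 100 reports per user in expectation

max_dev = max(abs(counts[u] - expected) for u in range(num_users))
print(max_dev)  # small relative to the expectation of 100
```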
The following lemma is useful for extending the study of pure DP to approximate DP via the total variation distance between probability distributions (see also Definition 2):
Lemma 5.2 (Balle et al., Cheu et al.).
Suppose is an -DP local randomizer with . Then there exists an -DP local randomizer such that for any we have .
See Lemma A.3 of (Balle et al., 2020). ∎
In order to study shuffling, we make use of the technique introduced in (Erlingsson et al., 2019), which reduces shuffling to swapping: the first element is swapped with another element selected uniformly at random from the dataset, after which the local randomizers are applied.
Our privacy analysis concerns how the adversary deduces the underlying data identities (assuming that it is known) by making observations on the distribution of reports at the final time step, as in Figure 2. At the heart of the proof is the reduction of the protocol to variants of the swapping algorithm (Erlingsson et al., 2019).
Compared to uniform shuffling, analyzing the privacy of network shuffling faces a few additional challenges. To understand this, let us first describe the technique (Erlingsson et al., 2019) of analyzing uniform shuffling.
The uniform shuffling mechanism may be considered as a sequence of algorithms of the form . W.l.o.g., consider two datasets differing in the first element. For any permutation, one may first permute the last elements. Then, the first element is swapped with an element uniformly sampled from the indices. It is easy to see that this is equivalent to performing a uniform permutation, and hence uniform shuffling reduces to swapping. As each output has a certain probability of being swapped with the first element, its distribution can be seen as a mixture of distributions, , where is independent of the first element (not swapped with the first element), and is the output distribution of randomized by the -th local randomizer (swapped with the first element). Then, one can show that and are overlapping mixtures, achieving the desired amplification (like subsampling (Balle et al., 2018)). The amplified is then obtained by bounding . Finally, the heterogeneous composition theorem (Kairouz et al., 2017) is used to compose all ’s to obtain the overall .
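The swap reduction in the argument above can be checked by exhaustive enumeration on a tiny dataset; the sketch below verifies that permuting the last n−1 elements and then swapping the first element with a uniformly chosen index is exactly a uniform shuffle.

```python
from itertools import permutations

def swap_reduction_outputs(data):
    # Uniformly permute the last n-1 elements, then swap the first element
    # with an index chosen uniformly from all n positions (the reduction of
    # uniform shuffling to swapping in Erlingsson et al., 2019).
    outputs = []
    first, rest = data[0], data[1:]
    for tail in permutations(rest):       # uniform over the last n-1 elements
        for i in range(len(data)):        # uniform swap index
            out = [first, *tail]
            out[0], out[i] = out[i], out[0]
            outputs.append(tuple(out))
    return outputs

outs = swap_reduction_outputs(["a", "b", "c"])
# All 3! permutations appear, each exactly once over the 2 x 3 equally
# likely branches: the two-step procedure is a uniform shuffle.
assert sorted(outs) == sorted(permutations(["a", "b", "c"]))
```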
Network shuffling differs in at least three ways. First, the adversary can link the reports to the user last receiving them, unlike uniform shuffling, where the link between the reports and the users is completely broken (randomized). Second, since in , each user may output more than one report, the decomposition of uniform shuffling does not apply. Third, as for in general, one also needs to modify the uniform sampling assumptions made in (Erlingsson et al., 2019). We next study a quantity which is vital to our analyses.
5.2. Finite-Time Privacy Guarantees
To characterize , let us take a closer look at the dynamics of message exchanges for -regular (symmetric distribution) and ergodic graphs (stationary distribution). Message exchanges on a -regular graph can be tracked (in a probabilistic sense) at each time step due to its symmetric structure. This allows us to calculate exactly and provide a precise privacy analysis at any .
On the other hand, for generic ergodic graphs, this quantity depends on the initial distribution, especially when is finite. We resort to giving an upper (worst-case in the DP sense) bound in this case. That is, we give an upper bound on for a protocol that runs for t steps and then stops. To derive the bound, it is convenient to view it as a deviation from the stationary distribution, , as detailed in the following.
Recall from Section 4.1 that any probability distribution at time step $t$ can be expanded with the eigenvectors $v_i$ of the transition matrix as $p_t = \pi + \sum_{i=2}^{n} c_i \lambda_i^{t} v_i$. Then, using the fact that the orthogonal transformation preserves the inner product, the error is
$$\|p_t - \pi\|_2^{2} = \sum_{i=2}^{n} c_i^{2} \lambda_i^{2t} \le (1-\gamma)^{2t}\,\|p_0 - \pi\|_2^{2},$$
where $\gamma$ is the spectral gap of the graph.
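The geometric decay toward the stationary distribution can be observed directly; the following sketch, using a small symmetric chain as a stand-in for a network graph, tracks the l2 distance to the uniform stationary distribution, which shrinks by the second-largest eigenvalue at every step.

```python
def step(p, P):
    # One step of the walk: p_{t+1} = p_t P for a row-stochastic matrix P.
    n = len(p)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

def l2_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

# A small symmetric chain (stand-in for a network graph): the stationary
# distribution is uniform and the second-largest eigenvalue is 0.25.
P = [[0.50, 0.25, 0.25],
     [0.25, 0.50, 0.25],
     [0.25, 0.25, 0.50]]
pi = [1 / 3] * 3

p = [1.0, 0.0, 0.0]  # the walk starts at a known user
dists = []
for _ in range(10):
    dists.append(l2_dist(p, pi))
    p = step(p, P)

# The l2 error shrinks by a factor of 0.25 every step, mirroring the
# (1 - gamma)^t decay toward the stationary distribution.
print(dists[0], dists[-1])
```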
The reason we compute this quantity will be made clear when we present the privacy theorems in the following, as we will see that they all depend on it. We begin by providing the privacy theorem for with stationary distribution.
5.3. with Stationary Distribution
Theorem 5.3 (“All” protocol, Stationary distribution).
Let be a -local randomizer. Let be the protocol as shown in Algorithm 1 sending all reports to the server. Then, satisfies ()-DP, with
and when the protocol runs and stops at time step . is the graph stationary distribution and is the graph spectral gap. Moreover, if is -DP for , is -DP with and .
Proof sketch. First, consider fixing (conditioning on) the number of reports held by each user (e.g., user 1 holding 2 reports, user 2 holding zero reports, and so on). This conditioned distribution (realized by Algorithm 3) may be seen as a distribution consisting of all permutations of data elements but with the number of reports held by each user fixed. One may then reduce such a uniform permutation to swapping and use a variant of the swapping technique (Erlingsson et al., 2019) to bound . Finally, a concentration bound on the distribution of report sizes, and a bound on using Equation 5.2, are applied to complete the proof. The full proof is given in Section 6.
Interpretations. In Table 2, we compare the privacy amplification of network shuffling with other existing privacy amplification mechanisms. We make comparisons assuming for convenience, and hide the dependence on polylogs of and . “No amplification” means that only the local randomizer is applied to each report. Note that subsampling and uniform shuffling mechanisms require a centralized and trusted server to achieve the amplification. Such an entity is powerful in that it completely breaks the link between the users and the reports. Network shuffling works without this advantage, yet is still capable of achieving an amplification of (albeit with a weaker exponential dependence on ).
We show in Figure 4 how the central DP guarantees change with respect to the number of communication rounds/time steps per user. Here, three real-world graphs (see Table 4) with around the same number of users () are used to make the comparisons. is calculated to be , and we see that the privacy guarantees converge at a time step of around . Note that the total communication overhead is multiplied by the number of time steps, and there is no cover traffic as assumed in Section 3.3.
[Table: datasets — Facebook (Rozemberczki et al., 2019a), Twitch (Rozemberczki et al., 2019a), Deezer (Rozemberczki et al., 2019b), Enron (Klimt and Yang, 2004), Google (Leskovec et al., 2009)]
5.4. with Symmetric Distribution
Theorem 5.4 (“All” protocol, Symmetric distribution).
Let be a -local randomizer. Let be the protocol as shown in Algorithm 1 sending all reports to the server. Then, satisfies ()-DP, with
and the position probability distribution of any user when the protocol runs and stops at time step . is the ratio of the largest value of to the smallest non-zero . Moreover, if is -DP for , is -DP with and .
Proof sketch. Our approach is similar to the proof of Theorem 5.3. The difference is in the conditioning of the output distribution with for some fixed with . For a pair of datasets and differing in one element, we let the differing element be the first one w.l.o.g. Here, the conditioned output distribution allocates the differing element according to the position probability distribution, instead of the uniform distribution as in Theorem 5.3. The rest of the calculation follows closely those given in Section 6.
Trade-off between privacy and communication overheads. We show in Figure 5 how the central DP guarantees change with respect to the number of communication rounds. Perhaps not surprisingly, with larger , the distribution “mixes” faster as the walk has more choices of nodes to move to. Subsequently, the privacy guarantees converge faster to the asymptotic value. Note that we are tracing the random walk exactly in this case: the walk exhibits non-monotonic behavior at early times as it “oscillates” between its neighbors without spreading out initially. This is in contrast to Figure 4, which shows the upper bound of , decreasing monotonically.
The privacy theorems for with stationary and symmetric distributions are given below. We begin with the stationary distribution:
Theorem 5.5 (“Single” protocol, Stationary distribution).
Let be a -local randomizer. Let be the protocol as shown in Algorithm 2 sending single reports to the server. Then, satisfies ()-DP, with
and when the protocol runs and stops at time step . is the graph stationary distribution and is the graph spectral gap. Moreover, if is -DP for , is -DP with and . Particularly, when , we obtain and .
For the symmetric distribution, the privacy guarantees for the “single” protocol turn out to be almost the same as Theorem 5.5:
Theorem 5.6 (“Single” protocol, Symmetric distribution).
Let be a -local randomizer. satisfies ()-DP, with equal to the one given in Theorem 5.5, except that is the position probability distribution of any user when the protocol runs and stops at time step . Moreover, if is -DP for , is -DP with also equal to the ones given in Theorem 5.5 except that is the position probability distribution of any user at time step .
Proof sketch. In order to prove the privacy guarantees of (Algorithm 2), we utilize random replacement (Balle et al., 2020): we reduce to an algorithm which works as follows. Substitute the first element in the dataset with a pre-determined element, and choose an element in the dataset (according to a certain probability distribution) to substitute with the original first element. Then, all elements are randomized with the local randomizer. The rest of the calculation follows similarly to that given in Section 6. (The communication overheads of have the same trend as those of and are therefore not shown for brevity.)
5.6. Numerical Analyses
Here, we provide several numerical analyses based on our privacy theorems. In the following, unless stated otherwise, we always assume that the network shuffling protocol runs and stops at the mixing time, and present the results at this time step. Table 4 shows the values of and for five real-world network datasets. We see that social networks, which are our main use case, have a reasonably regular structure compared to other networks, e.g., the Google web graph. The dependence of the amplified on for the real-world datasets is shown in Figure 6. It can be seen that, nevertheless, population size matters the most, as Google with yields the most significant privacy amplification.
Figure 7 compares the levels of privacy amplification of and . Plots are made using two datasets (Twitch and Google) which differ significantly in (9,498 and 855,802 respectively). Furthermore, in Figure 8, we show the dependence of the amplified on the relevant parameters (, and protocol) without assumptions on the underlying dataset, but in the limit of the stationary distribution, in contrast to the previous analyses.
Notice (e.g., in Figure 7) that at large , using gives better privacy amplification. However, we should not conclude that, utility-wise, is always better at large . This is because, as mentioned in Section 4.3, does not send all reports truthfully (a user not holding any report sends a dummy report; a user holding multiple reports sends only one report), leading to potential utility losses. To demonstrate this, we study a privacy-utility trade-off scenario by performing mean estimation on the Twitch dataset following a setup similar to (Chen et al., 2020): we generate -dimensional synthetic samples independently but non-identically; we set and . Each sample is normalized by setting . Additionally, we generate dummy samples (as required by ) by setting . is set to be 200.
The PrivUnit (Bhowmick et al., 2018) algorithm is applied to perturb each report to obtain -LDP guarantees. The utility of the mean estimation is quantified by the mean squared loss, or the error of . The number of dummy reports is determined as the expected number of users holding less than one sample (7,080 for Twitch).
In Figure 9, we plot the relationship between the central and the expected squared error. This is done by sampling a few values of , applying PrivUnit to the data, and calculating the central and the expected squared error according to the protocols. It can be observed that, at least in the studied parameter region, for any fixed value of , the expected squared error of is consistently smaller than that of , serving as a counter-example to the argument that is better in the large-privacy region. In general, the utility-privacy trade-offs of and are scenario- and dataset-dependent.
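The utility gap between the two protocols can be seen even without any local randomizer; the sketch below (with hypothetical sizes, and uniform random allocation standing in for where the walks stop) compares mean estimation when the server receives all reports versus a single report per user, with dummy reports for users holding none.

```python
import random

rng = random.Random(0)
n = 1000  # hypothetical number of users (= number of reports)
data = [rng.uniform(-1, 1) for _ in range(n)]
true_mean = sum(data) / n

# Allocate the n reports to n users uniformly at random, a stand-in for
# where the random walks stop; no local randomizer is applied, so as to
# isolate the utility loss of the "single report" rule itself.
holders = [[] for _ in range(n)]
for x in data:
    holders[rng.randrange(n)].append(x)

# "All" protocol: every report reaches the server.
est_all = sum(x for h in holders for x in h) / n

# "Single" protocol: a user with no report sends a dummy (0.0); a user
# holding several reports forwards only one of them.
singles = [(h[0] if h else 0.0) for h in holders]
est_single = sum(singles) / n

print(abs(est_all - true_mean), abs(est_single - true_mean))
```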
6. Detailed Privacy Analysis of with Stationary Distribution
In this section, we give a detailed proof of the privacy theorem of with stationary distribution, Theorem 5.3. The proofs for the other scenarios are omitted due to space limits and will be given elsewhere.
Let be the report sizes, i.e., the number of reports received by each user, for as in Algorithm 3, . The output distribution of may be considered to be a distribution conditioned on for some fixed with . Fixing the position probability distribution, the output distribution is then the same as the one produced by Algorithm 3 with report sizes , but consisting of all permutations of the original dataset . This can be viewed as a variant of the shuffle model with fixed report sizes, and can be analyzed by reducing random permutation/uniform shuffling to swapping (Erlingsson et al., 2019). For a pair of datasets and differing in the first record, according to this reduction, it is sufficient to analyze , where is a procedure swapping with for uniformly sampled from . We now prove the following theorem.
Theorem 6.1.
Let for be an -DP local randomizer. Let s.t. . We also define to be the swapping operation on dataset , where is swapped with for uniformly sampled from . Then,