1 Introduction


Table 1: Comparison of privacy models and frameworks.

| Privacy Model and Protocol(s) | Server (Threat Model) | Clients Collusion (Threat Model) | Noise Generator | Perturbation Mechanism | STD of Noise in Aggregation^{§} | Robustness Mechanism | Impact of Added Noise |
|---|---|---|---|---|---|---|---|
| CDP [42, 26] | trusted | all except victim | server | | | via robust aggregation | noise enhances robustness |
| LDP [38, 50] | untrusted | all except victim | client | | | via robust aggregation | noise reduces robustness |
| DDP+Crypto^{*} [54, 59, 40] | honest-but-curious^{⋄} | non-colluding | client | | | ✗ | – |
| PRECAD (ours)^{†} | honest-but-curious^{⋄}, non-colluding | all except victim | two servers | | | via MPC and DP | noise enhances robustness |


^{*} The DDP+Crypto framework assumes a minimum number of non-colluding clients out of the total clients, which influences the privacy guarantees. The symbol "–" indicates not applicable.

^{⋄} Honest-but-curious means that the server follows the protocol instructions honestly, but will try to learn additional information.

^{§} We show the privacy-utility tradeoff by fixing the same privacy cost (with the required noise) and then comparing the standard deviation (STD) of the noise on the aggregation, where an approach with a smaller STD has better utility. Each client's local model update (without noise) is considered; for convenience, we ignore the scaling factor for averaging local models, and let the record-level sensitivity of all updates be 1, which can be achieved via record-level clipping.

^{†} PRECAD considers two types of attacker settings on privacy, i.e., with or without a corrupted server. The latter assumption is weaker, and thus needs less noise.
Federated learning (FL) [41] is an emerging paradigm that enables multiple clients to collaboratively learn models without explicitly sharing their data. The clients upload their local model updates to the server, which then shares the global average with the clients in an iterative process. This offers a promising solution to mitigating the potential privacy leakage of sensitive information about individuals (since the data stays locally with each client), such as typing history, shopping transactions, geographical locations, and medical records. However, recent works have demonstrated that FL may not always provide sufficient privacy and robustness guarantees. In terms of privacy leakage, exchanging model updates throughout the training process can still reveal sensitive information [9, 44] and cause deep leakage such as pixel-wise accurate image recovery [63, 61], either to a third party (including other participating clients) or to the central server. In terms of robustness, FL systems are vulnerable to model poisoning attacks, where the attacker controls a subset of (malicious) clients and aims either to prevent the convergence of the global model (a.k.a. Byzantine attacks) [25, 7] or to implant a backdoor trigger into the global model to cause targeted misclassification (a.k.a. backdoor attacks) [5, 56].
To mitigate the privacy leakage in FL, Differential Privacy (DP) [22, 23] has been adopted as a rigorous privacy notion. Several existing frameworks [42, 26, 38] applied DP in FL to provide client-level privacy under the assumption of a trusted server: whether a client has participated in the training process cannot be inferred from the released model, and the client's whole dataset remains private. Other works in FL [62, 38, 59, 54] focused on record-level privacy: whether a data record has participated during training cannot be inferred by adversaries, including the server, which may be untrusted. Record-level privacy is more relevant in cross-silo (as opposed to cross-device) FL scenarios, such as multiple hospitals collaboratively learning a prediction model for COVID-19, in which case what needs to be protected is the privacy of each patient (corresponding to each record in a hospital's dataset).
In this paper, we focus on cross-silo FL with record-level DP, where each client possesses a set of raw records (aggregated from some individuals), and each record corresponds to an individual's private data. Existing solutions in this setting can be categorized as Centralized DP (CDP), Local DP (LDP)^{1}, and Distributed DP (DDP). (^{1} Following [39, 49, 54], we use LDP to refer to the client-based approaches for ease of presentation, but it is different from the traditional LDP for data collection in [21, 57, 29].) In CDP-based solutions [42, 26], each client submits the raw update to a trusted server, which applies a model aggregation mechanism with DP guarantees. In LDP-based solutions [38, 50], each client adds noise to the update with DP guarantees before sending it to the server, where the server and other clients are assumed to be untrusted. Though relying on a weaker trust assumption, LDP approaches suffer from poor utility because the noise added by all the clients accumulates when the server aggregates the updates (whereas in CDP only one party, the server, adds noise). In DDP-based solutions [54, 59, 40], each client adds partial noise so that the required global DP noise is jointly contributed by all clients in a distributed way, and sends the encrypted output to the server. The utilized cryptographic primitives, such as additive homomorphic encryption (HE), guarantee that everything except the final result is hidden from the server and other clients. By leveraging cryptographic techniques, DDP-based solutions avoid placing trust in the server (relying instead on an honest-but-curious assumption) and offer better utility than LDP-based solutions. However, the utility enhancement of DDP-based approaches is sensitive to the minimum number of trusted parties, which must be known in advance to derive the required amount of noise, and it reduces to LDP in the worst case, when all-but-one clients collude to infer the victim client's data.
Furthermore, such encryption-based methods prohibit the server from auditing clients' model updates, which leaves room for malicious attacks. For example, malicious clients can introduce stealthy backdoor functions into the global model without being detected.
On the robustness of FL, recent works [53, 49] empirically observed that the noise added to the clipped gradients under CDP and/or LDP is able to defend against backdoor attacks. The intuition is that CDP limits the information learned about a specific client, while LDP does so for the records in a client's dataset. In both cases, the impact of poisoned data is reduced while simultaneously providing DP guarantees. However, CDP assumes a trusted server, and LDP defends against backdoor attacks only when the corrupted (malicious) clients also implement the noise augmentation mechanism. If malicious clients opt out of the LDP protocol [49], the framework becomes less robust, because the malicious clients can potentially have a bigger impact on the aggregated model when the other benign clients honestly add noise (and thus have less impact) to satisfy LDP. In other words, the noise of LDP actually reduces robustness in practice, where the attacker has full control over the corrupted clients. Also, a concurrent work [30] shows, from both theoretical and experimental perspectives, that the classical approaches to Byzantine resilience and LDP are practically incompatible in machine learning. Furthermore, similar observations of the manipulation vulnerability of LDP have been made in [16, 14] in the context of data aggregation for statistical queries.
In summary, none of the CDP-, LDP-, or DDP-based solutions can achieve both privacy and robustness without sacrificing performance: CDP does not protect privacy against the server, LDP has poor utility and is vulnerable to a strong attacker with full control over malicious clients, and DDP is not able to audit (malicious) clients' updates. The main challenge lies in the dilemma between the server learning little information from clients' data (for privacy) while still being able to detect anomalous submissions injected by malicious clients (for robustness).
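The utility gap among the three paradigms can be made concrete with a quick variance calculation. The sketch below is a simplified model assuming all noise sources are independent Gaussians, so variances add; the function and parameter names are ours, not from any of the cited frameworks:

```python
import math

def aggregate_noise_std(sigma: float, n_clients: int, paradigm: str,
                        min_honest: int = 1) -> float:
    """Std of the total Gaussian noise in the aggregate, assuming each
    noise source is independent (so variances add up)."""
    if paradigm == "CDP":   # one trusted party (the server) adds noise once
        return sigma
    if paradigm == "LDP":   # every client adds the full local noise
        return sigma * math.sqrt(n_clients)
    if paradigm == "DDP":   # noise is split across the assumed honest clients,
        # but all n clients must each add sigma/sqrt(min_honest) to be safe
        return sigma * math.sqrt(n_clients / min_honest)
    raise ValueError(paradigm)

n = 100
print(aggregate_noise_std(1.0, n, "CDP"))                 # 1.0
print(aggregate_noise_std(1.0, n, "LDP"))                 # 10.0
print(aggregate_noise_std(1.0, n, "DDP", min_honest=25))  # 2.0
```

In this toy model, DDP's advantage over LDP shrinks as the assumed number of honest clients drops, and disappears entirely at `min_honest=1`, mirroring the worst-case collapse to LDP discussed above.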
In this paper, we propose a Privacy and Robustness Enhanced FL framework via Crypto-Aided DP (PRECAD), the first scheme that simultaneously provides strong DP guarantees and quantifiable robustness against model poisoning attacks, while maintaining good model utility. This is achieved by combining DP with secure multi-party computation (MPC) techniques (secret sharing). PRECAD involves two honest-but-curious and non-colluding servers; this setting has been widely formalized and instantiated in previous works such as [48, 51, 18, 2, 31]. In PRECAD, each client clips its gradients (which form the local updates) at both the record level and the client level, then uploads random shares of its local update to the two servers respectively. The two servers run a secret-sharing-based MPC protocol to securely verify client-level clipping and jointly add Gaussian noise to the aggregated result. Our protocol guarantees that the two servers learn only the validity of each client-submitted model update (whether it is bounded by the clipping norm, which leaks nothing about a benign client's data) and the result of the noisy aggregation (where the privacy leakage at the record level is carefully accounted for by DP). PRECAD simulates the CDP paradigm (a.k.a. SIM-CDP [46]) in FL, but does not rely on the assumption of trusted server(s). Furthermore, due to the verifiable client-level clipping, the contribution of each (malicious) client to the global model is bounded. We use DP to derive bounds on the robustness guarantee, where the noise added in PRECAD for privacy purposes enhances robustness (in contrast, the noise in LDP reduces robustness). A comparison of the different privacy models/frameworks and their privacy/robustness guarantees (including CDP, LDP, DDP, and PRECAD) is shown in Table 1.
Contributions. Our main contributions include:
1) The proposed scheme PRECAD is the first framework that simultaneously enhances the privacy-utility tradeoff of DP and the robustness against model poisoning attacks for FL under a practical threat model: for private data inference, we assume the servers are honest-but-curious and allow the collusion of all clients except the victim; for model poisoning attacks, the attacker has full control over the corrupted clients. In PRECAD, the record-level clipping at each client, combined with secret sharing and server-side perturbation, ensures record-level privacy, while the client-level clipping and secure verification ensure robustness against malicious clients.
2) We show that PRECAD satisfies record-level DP in Theorem 1 while providing quantifiable robustness against model poisoning attacks in Theorem 2, where the theoretical analysis shows that the noise added for privacy purposes enhances the robustness guarantees.
3) We conduct several experiments to demonstrate PRECAD's enhancement of the privacy-utility tradeoff (compared with LDP-based approaches) and its robustness against backdoor attacks (a prominent type of targeted model poisoning attack). The experimental results validate our theoretical analysis that the DP noise in PRECAD enhances model robustness.
2 Preliminaries
2.1 Differential Privacy (DP)
Differential Privacy (DP) is a rigorous mathematical framework for the release of information derived from private data. Applied to machine learning, a differentially private training mechanism allows the public release of model parameters with a strong privacy guarantee: adversaries are limited in what they can learn about the original training data based on analyzing the parameters, even when they have access to arbitrary side information. The formal definition is as follows:
Definition 1 ((ε, δ)-DP [23, 22]).
For $\epsilon > 0$ and $\delta \in [0, 1)$, a randomized mechanism $\mathcal{M}$ with a domain $\mathcal{D}$ (e.g., possible training datasets) and range $\mathcal{R}$ (e.g., all possible trained models) satisfies $(\epsilon, \delta)$-Differential Privacy ($(\epsilon, \delta)$-DP) if for any two neighboring datasets $D, D' \in \mathcal{D}$ and for any subset of outputs $S \subseteq \mathcal{R}$, it holds that
$$\Pr[\mathcal{M}(D) \in S] \leq e^{\epsilon} \Pr[\mathcal{M}(D') \in S] + \delta,$$
where larger $\epsilon$ and $\delta$ indicate a less private mechanism.
Gaussian Mechanism. A common paradigm for approximating a deterministic real-valued function $f$ with a differentially private mechanism is via additive noise calibrated to $f$'s sensitivity $S_f$, which is defined as the maximum of the distance $\|f(D) - f(D')\|_2$, where $D$ and $D'$ are neighboring datasets. The Gaussian mechanism is defined by $\mathcal{M}(D) = f(D) + \mathcal{N}(0, S_f^2 \sigma^2)$, where $\mathcal{N}(0, S_f^2 \sigma^2)$ is the normal (Gaussian) distribution with mean 0 and standard deviation $S_f \sigma$. It was shown that the mechanism $\mathcal{M}$ satisfies $(\epsilon, \delta)$-DP if $\epsilon < 1$ and $\sigma \geq \sqrt{2 \ln(1.25/\delta)}/\epsilon$ [23]. Note that we use an advanced privacy analysis tool from [20] (refer to Lemma 1 in Appendix B), which works for all $\epsilon > 0$.

DP-SGD Algorithm. The most well-known differentially private algorithm in machine learning is DP-SGD [1], which introduces two modifications to vanilla stochastic gradient descent (SGD). First, a clipping step is applied to each per-record gradient so that the gradient is in effect bounded; this step is necessary to obtain a finite sensitivity. The second modification is Gaussian noise augmentation on the summation of the clipped gradients, which is equivalent to applying the Gaussian mechanism to the updated iterates. The privacy accountant of DP-SGD is shown in Appendix B.

2.2 FL with DP
FL is a collaborative learning setting to train machine learning models. It involves multiple clients, each holding their own private dataset, and a central server (or aggregator). Unlike the traditional centralized approach, data is not stored at a central server; instead, clients train models locally and exchange updated parameters with the server, who aggregates the received local model parameters and sends them to the clients. FL involves multiple iterations. At each iteration, the server randomly chooses a subset of clients and sends them the current model parameter; then these clients locally compute training gradients according to their local datasets and send the updated parameters to the server. The latter aggregates the results and updates the global model. After a certain number of iterations (or until convergence), the final model parameter is returned as the output of the FL process.
Record-level vs. Client-level DP. In FL, the neighboring datasets $D$ and $D'$ (in Definition 1) can be defined at two distinct levels: record level and client level. For record-level DP, $D$ and $D'$ are neighboring if $D'$ can be formed by adding or removing a single training record/example from $D$. On the other hand, for client-level DP, $D'$ is obtained by adding or removing one client's whole training dataset from $D$. In this paper, the main privacy goal is to guarantee record-level DP, as this is most relevant in the FL applications we consider. On the other hand, client-level DP is also achieved as a byproduct of our protocol, and we utilize it to show robustness against model poisoning attacks.
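The clip-then-noise pattern underlying DP-SGD (Sec. 2.1) can be sketched as follows. This is a minimal illustration rather than the paper's algorithm, and all names are ours:

```python
import numpy as np

def dp_clipped_noisy_sum(grads, clip_norm, sigma, rng):
    """Clip each per-record gradient to L2 norm <= clip_norm, sum them,
    then add Gaussian noise scaled to the record-level sensitivity
    (which equals clip_norm after clipping)."""
    clipped = []
    for g in grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / norm) if norm > 0 else g)
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, sigma * clip_norm, size=total.shape)
    return total + noise

rng = np.random.default_rng(0)
grads = [np.array([3.0, 4.0]), np.array([0.6, 0.8])]
# sigma=0 isolates the clipping effect: [3,4] is scaled to unit norm,
# [0.6,0.8] already has unit norm and passes through unchanged.
out = dp_clipped_noisy_sum(grads, clip_norm=1.0, sigma=0.0, rng=rng)
print(out)  # [1.2 1.6]
```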
2.3 Secret Sharing
We will make use of the additive secret sharing primitive in [19, 18]. Consider a decentralized setting with $n$ clients and $s$ servers, where each client $i$ holds an integer $x_i$ and the servers want to compute the sum of the clients' private values $\sum_{i=1}^{n} x_i$. All arithmetic of a secret sharing scheme takes place in a finite field $\mathbb{Z}_p$ with a public, large prime $p$. For convenience, all sums below are taken modulo $p$. The additive secret sharing for computing sums proceeds in three steps:

Step 1: Upload. Each client $i$ splits its private value $x_i$ into $s$ shares, one per server, using a secret-sharing scheme. In particular, the client picks random integers $[x_i]_1, \ldots, [x_i]_s \in \mathbb{Z}_p$, subject to the constraint $[x_i]_1 + \cdots + [x_i]_s \equiv x_i \pmod{p}$. The client then sends one share of its submission to each server through a secure (private and authenticated) channel.

Step 2: Aggregate. Each server $j$ aggregates all the shares received from the clients by computing the value of an accumulator $A_j = \sum_{i=1}^{n} [x_i]_j \bmod p$.

Step 3: Publish. Once the servers have received shares from all clients, they publish their accumulator values. Computing the sum of the accumulator values yields the desired sum of the clients' private values, as long as the modulus $p$ is larger than the final result (i.e., the sum does not overflow the modulus).

The above secret sharing scheme protects clients' privacy in an unconditional and information-theoretic sense: an attacker who gets hold of any subset of up to $s - 1$ shares of $x_i$ (i.e., as long as at least one of the servers is honest) learns nothing, except what the aggregate statistic itself reveals.
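The three steps above can be sketched in a few lines; the prime modulus and all names are illustrative choices:

```python
import random

P = 2**61 - 1  # a public prime modulus (illustrative choice)

def share(x: int, n_servers: int) -> list[int]:
    """Split x into additive shares that sum to x mod P (Step 1)."""
    shares = [random.randrange(P) for _ in range(n_servers - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

# Each of three clients shares a private value between two servers.
values = [10, 20, 12]
shares = [share(v, 2) for v in values]

# Step 2: each server sums only the shares it received (its accumulator).
acc_a = sum(s[0] for s in shares) % P
acc_b = sum(s[1] for s in shares) % P

# Step 3: publishing the accumulators reveals only the total.
print((acc_a + acc_b) % P)  # 42
```

Any single accumulator is uniformly random on its own; only the sum of both reveals the aggregate, which illustrates the information-theoretic guarantee.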
3 Problem Statement
3.1 System Model
We assume multiple parties in the FL system: two aggregation servers ($\mathcal{S}_A$ and $\mathcal{S}_B$) and $n$ participating clients $\mathcal{C}_1, \ldots, \mathcal{C}_n$. The servers hold a global model, and each client $\mathcal{C}_i$ possesses a private training dataset $D_i$. Each server communicates with the other server and with each client through a secure (private and authenticated) channel. At the $t$-th iteration ($t = 1, \ldots, T$), the servers randomly select a subset $\mathcal{C}^t$ of clients and send the current global model parameter $\mathbf{w}^t$ to them. Next, each client $\mathcal{C}_i$ who receives $\mathbf{w}^t$ trains the local model on its own private dataset $D_i$ and sends the update $\Delta_i^t$ (i.e., the difference between the local model and the global model) to the servers. Then, the servers update the global model by aggregating all $\Delta_i^t$:

$$\mathbf{w}^{t+1} = \mathbf{w}^t + \eta \sum_{i \in \mathcal{C}^t} \alpha_i^t \Delta_i^t, \qquad (1)$$

where $\eta$ is the learning rate of the global model and $\alpha_i^t$ is the aggregation weight of client $\mathcal{C}_i$ at the $t$-th iteration. The above procedure is iterated until convergence.
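The update rule of Eq. (1) can be illustrated as follows (a minimal sketch; the symbols are ours):

```python
import numpy as np

def aggregate(w_global, updates, weights, lr):
    """One FL round: apply the weighted sum of client updates to the
    global model, as in the update rule sketched in Eq. (1)."""
    step = sum(a * u for a, u in zip(weights, updates))
    return w_global + lr * step

w = np.zeros(3)
updates = [np.array([1.0, 0.0, 2.0]), np.array([3.0, 2.0, 0.0])]
w_next = aggregate(w, updates, weights=[0.5, 0.5], lr=0.1)
print(w_next)  # [0.2 0.1 0.1]
```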
Note that the model parameter $\mathbf{w}$ is a $d$-dimensional real vector (i.e., $\mathbf{w} \in \mathbb{R}^d$), while the utilized secret sharing scheme for secure aggregation operates in a finite field $\mathbb{Z}_p$. For any real value $x$ (or real vector $\mathbf{x}$), we can embed the real values into the finite field (with size $p$) using a fixed-point representation, as long as the size of the field is large enough to avoid overflow. In this paper, we use $\langle x \rangle$ to denote the fixed-point representation (within the finite field $\mathbb{Z}_p$) of a real value $x$, and use $\langle x \rangle_j$ to denote the share of $\langle x \rangle$ held by server $\mathcal{S}_j$, where $j \in \{A, B\}$ and $\langle x \rangle_A + \langle x \rangle_B \equiv \langle x \rangle \pmod{p}$.

3.2 Threat Model
In this paper, we consider two types of attacks: record inference attacks and model poisoning attacks, where the former compromises data privacy and the latter compromises model robustness. Their threat models differ:
Record Inference Attacks. To infer a record of a benign client (the victim), the attacker can corrupt at most one server and a subset of clients (except the victim). We assume that the two servers are honest-but-curious and non-colluding; this setting has been widely formalized and instantiated in previous works such as [48, 51, 18, 2, 31]. Non-colluding means that they avoid revealing information to each other beyond what is allowed by the protocol definition. Honest-but-curious (a.k.a. semi-honest) means that they follow the protocol instructions honestly, but will try to learn additional information. We assume the corrupted clients are also honest-but-curious and can communicate with the corrupted server. We note that even when the corrupted clients deviate from the protocol, they do not gain an additional advantage for privacy inference, because their submissions only affect the parameters of the global model and not the privacy protocol.
Model Poisoning Attacks. The attacker can corrupt a subset of malicious clients (but not the servers) to implement model poisoning attacks. We assume the attacker has full control over both the local training data and the submissions to the servers of these corrupted (malicious) clients, but has no influence on the other benign clients. Furthermore, the malicious clients can fully cooperate with each other to achieve a stronger attack. We assume that the malicious clients are fewer than the benign clients. Following [5, 53, 49], we consider a model-replacement methodology for model poisoning attacks. In FL (described in Sec. 3.1), each selected client sends the update $\Delta_i^t$ to the servers. At the $t$-th iteration, suppose only one malicious client $\mathcal{C}_m$ is selected; then he/she attempts to replace the global model by a targeted model $\mathbf{w}^*$ via sending a scaled update $\Delta_m^t$ such that $\eta \alpha_m^t \Delta_m^t = \mathbf{w}^* - \mathbf{w}^t$. We note that each local model may be far from the global model. However, as the global model converges, the benign deviations start to cancel out, i.e., $\sum_{i \neq m} \alpha_i^t \Delta_i^t \approx 0$. Therefore, if we assume the model has sufficiently converged, the parameter of the global model in (1) will be replaced by $\mathbf{w}^*$. When multiple malicious clients appear in the same iteration, we assume that they can coordinate with each other and divide the malicious update evenly. Furthermore, such an attack can be implemented over multiple iterations.
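The model-replacement arithmetic described above can be illustrated as follows (a sketch under the assumption that the benign updates cancel near convergence; all names are ours):

```python
import numpy as np

def malicious_update(w_global, w_target, lr, weight):
    """Craft the model-replacement update: choose delta such that
    lr * weight * delta == w_target - w_global."""
    return (w_target - w_global) / (lr * weight)

w_t = np.array([0.0, 1.0])
w_star = np.array([5.0, -3.0])   # the attacker's targeted model
lr, alpha = 0.1, 1.0

delta_m = malicious_update(w_t, w_star, lr, alpha)
# Assuming the benign terms of Eq. (1) roughly cancel, one round yields:
w_next = w_t + lr * alpha * delta_m
print(w_next)  # [ 5. -3.]
```

The example shows why a norm bound on each submission matters: the crafted update must be roughly $1/(\eta \alpha_m^t)$ times larger than a typical benign update, so client-level clipping directly limits this attack.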
4 Our Framework: PRECAD
In this section, we introduce the proposed framework called PRECAD. It simulates Centralized DP (CDP) in FL, without relying on the assumption of trusted server(s), and provides quantifiable robustness against malicious clients.
4.1 Overview
An illustration of our framework PRECAD is shown in Figure 1. It follows the general FL process (as discussed in Sec. 3.1) with secure aggregation via additive secret sharing (refer to Sec. 2.3 for preliminaries). In PRECAD, the two servers engage in an interactive protocol to add Gaussian noise to the aggregated model updates, using a secure aggregation method. Note that other MPC primitives are possible, but we opt for additive secret sharing in this paper. In each iteration, the model update of each client is supposed to be clipped before submission, and the servers run another MPC protocol on the submitted shares to verify the validity of each submission (i.e., to ensure its norm is smaller than a threshold), in order to mitigate poisoning attacks. Since the noise for DP is added by the servers (rather than by the clients) after the clients' submissions, we will show (in Sec. 5.2) that it can actually enhance model robustness.
Main Steps. Our instantiation of PRECAD is shown in Algorithm 1. In each iteration $t$, the servers and clients execute the following four steps:
Step 1. Selection of Participating Clients. The servers select a subset of clients to participate in the current iteration. Since both servers follow the protocol honestly, either server ($\mathcal{S}_A$ or $\mathcal{S}_B$) can execute this step, where each client is randomly selected with a fixed sampling probability. Then, the servers send the current model parameter $\mathbf{w}^t$ to these clients and wait for responses.
Step 2. Local Model Update and Submission. After receiving the request from the servers, each selected client $\mathcal{C}_i$ trains the local model on its private local dataset $D_i$, where the gradient of each record is clipped by a record-level bound $C_r$ (record-level clipping). Then, the update $\Delta_i^t$ is clipped by a client-level bound $C_c$ (client-level clipping), and the clipped update is split into two shares (see more details in Sec. 4.2), which are sent to servers $\mathcal{S}_A$ and $\mathcal{S}_B$ respectively.
Step 3. Secure Submission Verification. Since a malicious client may send a large submission (with norm beyond the bound $C_c$), the servers, who hold the shares $\langle \Delta_i^t \rangle_A$ and $\langle \Delta_i^t \rangle_B$ respectively, must securely verify whether the norm of each $\Delta_i^t$ is indeed bounded by $C_c$ (refer to Sec. 4.3 for the detailed protocol). Note that this verification step, which outputs either valid (the servers accept $\Delta_i^t$) or invalid (the servers reject $\Delta_i^t$), leaks nothing about a benign client's private information, because benign clients follow the protocol honestly and their local model update submissions will always be valid.
Step 4. Secure Aggregation with Noise. After verifying all submissions, server $\mathcal{S}_A$ draws a real random vector $\mathbf{z}_A \sim \mathcal{N}(0, \sigma^2 \mathbf{I}_d)$ and converts it into the fixed-point representation $\langle \mathbf{z}_A \rangle$, where $\mathbf{I}_d$ is the identity matrix of size $d \times d$. Denote the set of indices of valid submissions as $\mathcal{V}$. Then, $\mathcal{S}_A$ aggregates all valid shares with its Gaussian noise by computing $A = \sum_{i \in \mathcal{V}} \langle \Delta_i^t \rangle_A + \langle \mathbf{z}_A \rangle \bmod p$. Similarly, $\mathcal{S}_B$ computes $B = \sum_{i \in \mathcal{V}} \langle \Delta_i^t \rangle_B + \langle \mathbf{z}_B \rangle \bmod p$ with its own noise $\mathbf{z}_B \sim \mathcal{N}(0, \sigma^2 \mathbf{I}_d)$. By exchanging the above values with each other, both servers obtain the sum $A + B$ (modulo $p$) and convert it into a real vector $\tilde{\Delta}^t$, which is utilized to update the global model parameter:

$$\mathbf{w}^{t+1} = \mathbf{w}^t + \eta \tilde{\Delta}^t, \qquad (2)$$

where $\eta$ is the learning rate of the global model.
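Step 4 can be sketched as follows. For readability, this toy version secret-shares real vectors directly and each server adds its own Gaussian noise before publishing; the actual protocol works over a finite field with fixed-point encoding:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.1   # per-server noise scale (illustrative)
d = 4

# Clients' (already clipped) updates, additively shared between two servers.
updates = [rng.normal(size=d) for _ in range(5)]
shares_a, shares_b = [], []
for u in updates:
    r = rng.normal(size=d)   # random mask: neither share alone reveals u
    shares_a.append(u - r)
    shares_b.append(r)

# Each server aggregates its shares of the valid submissions and adds its
# own Gaussian noise share before publishing the accumulator.
acc_a = np.sum(shares_a, axis=0) + rng.normal(0, sigma, size=d)
acc_b = np.sum(shares_b, axis=0) + rng.normal(0, sigma, size=d)

noisy_sum = acc_a + acc_b  # = sum of updates + noise from both servers
print(np.allclose(noisy_sum, np.sum(updates, axis=0), atol=1.0))  # True
```

Because each server contributes an independent noise vector, a single corrupted server that subtracts its own noise still faces the other server's noise, which is why the total noise only "accumulates twice" regardless of how many clients participate.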
Comparisons. Our framework provides record-level DP and robustness against model poisoning attacks; the detailed analysis of privacy and robustness is provided in Sec. 5. Compared with the LDP-based solutions [38, 50] (where each client adds noise during local training and sends the result to the server in the clear) and the DDP-based solutions [54, 59] (where each noise-augmented submission is encrypted, and the central server only observes the aggregation of noisy submissions), the proposed framework PRECAD has several advantages (also refer to Table 1):
1) Better utility under the same DP privacy budget. In the LDP-based solutions, the noisy submissions of all clients are aggregated by the server, so the global model is updated with accumulated noise, whose variance is proportional to the number of participating clients. Although the DDP-based solutions can reduce the noise during local training (due to encrypted submissions), the variance of the added noise is proportional to the ratio between the number of all clients and the minimum number of non-colluding clients (an accurate value of which is difficult to obtain in real-world applications). Furthermore, they reduce to the LDP-based solutions in the worst case, when all other clients collude to infer the victim's data. In contrast, the noise in PRECAD only accumulates twice (once by $\mathcal{S}_A$ and once by $\mathcal{S}_B$) and is independent of the number of non-colluding clients.

2) Effective defense against poisoning attacks. In FL, it is not easy to perform anomaly detection because the server cannot access clients' private datasets. This task is even more challenging with DP, since the random noise added by clients decreases the distributional distance between normal and abnormal submissions (thus making them harder to distinguish). Therefore, both LDP-based and DDP-based solutions are vulnerable to malicious submissions: the LDP-based solutions require more noise under the same privacy guarantee, and the encryption of submissions in the DDP-based solutions limits the capability of anomaly detection. In contrast, the secure validation step in PRECAD (which learns nothing about a benign client) completely bounds the norm of malicious submissions at the client level (those who violate this constraint will be detected), and the noise added by the servers increases the uncertainty when the attackers attempt to modify the global model.
4.2 Local Model Update
The protocol of the local model update is shown in Algorithm 2. After receiving the current global model parameter $\mathbf{w}^t$, client $\mathcal{C}_i$ samples a subset of records from the local dataset, where each record is sampled with a fixed probability. For each sampled record, the corresponding gradient is computed and then clipped with the record-level bound $C_r$ (record-level clipping). Then, $\mathcal{C}_i$ computes the sum of the negative gradients (for gradient descent) and clips the result by the client-level bound $C_c$ (client-level clipping). We denote the clipped result as $\Delta_i^t$, which is converted to its fixed-point representation $\langle \Delta_i^t \rangle$ and then split into two shares in $\mathbb{Z}_p$, which are sent to the two servers respectively.
The record-level clipping guarantees that, by removing or adding one record from client $\mathcal{C}_i$'s dataset, the aggregation result at the server side changes by at most $C_r$ in terms of $\ell_2$ norm (i.e., bounded sensitivity); thus adding Gaussian noise to the aggregation provides record-level DP (shown in Sec. 5.1). Similarly, the client-level clipping guarantees client-level DP of the noisy aggregation. Though client-level DP is not our privacy goal, we will show (in Sec. 5.2) that it can be exploited to provide robustness of the learning process against model poisoning attacks. While the record-level clipping automatically achieves some client-level clipping, the exact clipping bound depends on the number of sampled records (which is a random variable). We use explicit client-level clipping to ensure the robustness is controllable.

For malicious clients, who may deviate from the protocol execution and collude with other clients, record-level privacy is not guaranteed, because they can opt out of record-level clipping. However, even in the presence of malicious clients, the aggregation on the server side guarantees client-level privacy, because submissions without client-level clipping (i.e., those whose real-vector norm exceeds the bound $C_c$) will be rejected by the servers during secure validation (refer to Sec. 4.3). Thus, malicious clients have to execute client-level clipping; otherwise, their submissions will be rejected and have no influence on the aggregation result.
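The two-level clipping performed by each client (Algorithm 2) can be sketched as follows; this is a minimal illustration, and the bound names `c_record`/`c_client` are ours:

```python
import numpy as np

def clip(v, bound):
    """Scale v down so that its L2 norm is at most `bound`."""
    n = np.linalg.norm(v)
    return v if n <= bound else v * (bound / n)

def local_update(per_record_grads, c_record, c_client):
    """Record-level clip each gradient, sum the negatives (for gradient
    descent), then client-level clip the resulting update."""
    total = -np.sum([clip(g, c_record) for g in per_record_grads], axis=0)
    return clip(total, c_client)

grads = [np.array([0.0, 2.0]), np.array([0.0, 2.0])]
delta = local_update(grads, c_record=1.0, c_client=1.5)
print(delta)  # [ 0.  -1.5]: each record clipped to norm 1, sum clipped to 1.5
```

The record-level bound fixes the per-record sensitivity (for DP), while the client-level bound caps the whole submission, which is exactly the quantity the servers later verify in zero knowledge.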
4.3 Secure Validation
After a client submits its shares of $\langle \Delta_i^t \rangle$, the servers need to securely verify whether $\|\Delta_i^t\|_2 \leq C_c$ holds. For ease of presentation, we use $\mathbf{x}$ to denote $\langle \Delta_i^t \rangle$ in this subsection. Specifically, servers $\mathcal{S}_A$ and $\mathcal{S}_B$ hold the shares $\mathbf{x}_A$ and $\mathbf{x}_B$ separately and want to verify whether the value of $\mathbb{1}[\mathbf{x} \cdot \mathbf{x} \leq \langle C_c^2 \rangle]$ is 1 without leaking any additional information, where $\mathbb{1}[\cdot]$ denotes the indicator function and $\langle C_c^2 \rangle$ is the fixed-point representation of the squared client-level clipping bound (which is public). To do so, we build upon the secure multiplication technique of Beaver [8], which computes the product of two secret-shared numbers (refer to Appendix A for implementation details). The original Beaver's protocol computes the shares of the product of two private numbers, and we extend it to the case of the inner product as follows.

We assume that $\mathcal{S}_A$ and $\mathcal{S}_B$ have access to a sufficient number of random one-time-use shares of a vector $\mathbf{a}$ and a scalar $c$ with the constraint $c = \mathbf{a} \cdot \mathbf{a}$, where $\mathbf{a}$ and $c$ correspond to Beaver's triples in the original multiplication protocol [8]. Similar to Beaver's triples, the random shares of $\mathbf{a}$ and $c$ can be provided by a trusted third party (TTP), or generated offline via cryptographic techniques such as additive homomorphic encryption [33] or oblivious transfer [32]. Since their security and performance have been demonstrated, we assume these random shares are ready for use at the initial step of our protocol. To securely compute the result of $\mathbb{1}[\mathbf{x} \cdot \mathbf{x} \leq \langle C_c^2 \rangle]$, the two servers implement the following steps:
Step 1. Server $\mathcal{S}_A$ computes $\mathbf{e}_A = \mathbf{x}_A - \mathbf{a}_A$, and server $\mathcal{S}_B$ computes $\mathbf{e}_B = \mathbf{x}_B - \mathbf{a}_B$. Then, they exchange the values of $\mathbf{e}_A$ and $\mathbf{e}_B$ with each other, and both of them hold the value of the vector $\mathbf{e} = \mathbf{x} - \mathbf{a}$.

Step 2. Server $\mathcal{S}_A$ computes $y_A = c_A + 2\,\mathbf{e} \cdot \mathbf{a}_A + (\mathbf{e} \cdot \mathbf{e})/2$, and server $\mathcal{S}_B$ computes $y_B = c_B + 2\,\mathbf{e} \cdot \mathbf{a}_B + (\mathbf{e} \cdot \mathbf{e})/2$, where the addition and division are computed in the field $\mathbb{Z}_p$ (element-wise for vectors). Now the servers hold the shares of $y = y_A + y_B$, which essentially has the following representation (taken in the field $\mathbb{Z}_p$):
$$y = c + 2\,\mathbf{e} \cdot \mathbf{a} + \mathbf{e} \cdot \mathbf{e} = \mathbf{a} \cdot \mathbf{a} + 2(\mathbf{x} - \mathbf{a}) \cdot \mathbf{a} + (\mathbf{x} - \mathbf{a}) \cdot (\mathbf{x} - \mathbf{a}),$$
which indicates $y = \mathbf{x} \cdot \mathbf{x}$.

Step 3. The servers compute the shares $b_A$ and $b_B$ of $b = \mathbb{1}[y \leq \langle C_c^2 \rangle]$ via the comparison gate (refer to [34] for more details). Finally, by exchanging these two shares with each other, both servers can compute the value of $b$. The submission will be accepted if $b = 1$, or rejected otherwise.
Note that in practice, the fixed-point representation might incur a very small difference between the original real value and the result computed in the field $\mathbb{Z}_p$, which could make the protocol mistakenly reject a valid submission. To address this problem, we can use a slightly larger bound in the secure validation, while the clients are still required to clip their updates with the original value of $C_c$.
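The inner-product extension of Beaver's protocol (Steps 1–2 above) can be simulated in a few lines; the comparison gate of Step 3 is omitted, and the modulus and all names are illustrative:

```python
import random

P = 2**61 - 1  # public prime modulus (illustrative)

def share(v):
    """Additively share a vector of field elements between two servers."""
    r = [random.randrange(P) for _ in v]
    return r, [(x - ri) % P for x, ri in zip(v, r)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v)) % P

# Correlated randomness: a random vector a and c = a . a, both shared.
d = 3
a = [random.randrange(P) for _ in range(d)]
a_A, a_B = share(a)
c_A, c_B = (s[0] for s in share([dot(a, a)]))

# The submission to check: x with squared norm 1 + 4 + 4 = 9.
x = [1, 2, 2]
x_A, x_B = share(x)

# Step 1: each server opens its share of e = x - a.
e_A = [(xa - aa) % P for xa, aa in zip(x_A, a_A)]
e_B = [(xb - ab) % P for xb, ab in zip(x_B, a_B)]
e = [(ea + eb) % P for ea, eb in zip(e_A, e_B)]

# Step 2: shares of x.x = c + 2*e.a + e.e; each server adds half of the
# public term e.e, where "half" is division by 2 in the field.
half = pow(2, -1, P)
y_A = (c_A + 2 * dot(e, a_A) + dot(e, e) * half) % P
y_B = (c_B + 2 * dot(e, a_B) + dot(e, e) * half) % P

print((y_A + y_B) % P)  # 9, the squared norm of x
```

The only value opened in the clear is $\mathbf{e} = \mathbf{x} - \mathbf{a}$, which is uniformly random because $\mathbf{a}$ is, so neither server learns anything about $\mathbf{x}$ beyond the final comparison result.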
4.4 Security Analysis
We discuss the correctness and security (soundness and zeroknowledge) properties of secure aggregation and secure validation in PRECAD as follows.
1) Correctness. If a client $\mathcal{C}_i$ is honest (i.e., clips with the bound $C_c$), the servers will always accept $\Delta_i^t$. The correctness of the scheme follows by construction.
2) Soundness. In our scenario, soundness means that a malicious client who does not clip the norm of the update with the bound $C_c$ will be detected in the secure validation (except with negligible probability due to the fixed-point representation). Specifically, servers $\mathcal{S}_A$ and $\mathcal{S}_B$ check each client's submission and assign the result to $b$, so that the servers either accept (when $b = 1$) or reject (when $b = 0$). Thus, any malicious client must either submit a well-formed submission or be treated as invalid.
3) Zero-Knowledge. For secure aggregation, the random splitting of secret sharing guarantees that each server gains no information about a client's submission except the result of the aggregation, where the information leakage from the aggregation result is bounded and quantified via DP (refer to Sec. 5.1). For secure validation, the two servers implement Beaver's multiplication, where Beaver's analysis [8] guarantees that the multiplication gate leaks no information to the servers. Note that the secure addition and secure comparison are executed by each server locally, without communicating with each other, so these steps leak no information either. Therefore, the secure validation leaks no information except the validation result, which is always valid for benign clients; thus even the validation result leaks no information about benign clients. Finally, the whole protocol is unconditionally secure, because each cryptographic primitive is unconditionally secure against adversaries with unbounded computational power, and the primitives can be securely composed [27].
5 Privacy and Robustness Analysis
Our privacy-preserving FL framework in Sec. 4 simulates CDP under the assumption that the two servers are honest-but-curious and non-colluding. Thus, PRECAD achieves SIM-CDP [46] in FL; in the rest of this paper, we use DP to denote SIM-CDP for ease of presentation. Recall that PRECAD utilizes two levels of clipping (record-level and client-level), where record-level clipping is guaranteed for benign clients, and client-level clipping is guaranteed for all clients (including malicious ones) because the servers perform secure validation. Therefore, it provides record-level DP for benign clients and client-level DP for all clients; the former is our privacy goal, and the latter can be shown to provide robustness against malicious clients who mount model poisoning attacks.
5.1 Record-level Differential Privacy
Without loss of generality, we assume the attacker wants to infer one record of a benign victim client . Recall that the attacker corrupts at most one server and any number of clients (except the victim client ), where all corrupted parties are assumed to be honest-but-curious (a.k.a. semi-honest). Note that PRECAD provides record-level privacy for all benign clients who implement the protocol honestly, while the privacy of malicious clients (who may deviate from the protocol) is not guaranteed. Since corrupting one of the two servers or neither of them leads to different privacy guarantees, our results cover two cases, shown in the following theorem (the notations involved are summarized in Table 2).
Theorem 1 (Privacy Analysis).
Denote the set of parties corrupted by the attacker (for the purpose of privacy breach) as . Then, Algorithm 1 satisfies record-level DP for a benign client with any and
δ(ε) = Φ(−ε/μ + μ/2) − e^ε · Φ(−ε/μ − μ/2),   (3)
where Φ denotes the cumulative distribution function (CDF) of the standard normal distribution, and μ is defined by (4).
Proof.
See Appendix C. ∎
In Theorem 1, the number of corrupted semi-honest clients (excluding the victim client ) does not affect the privacy analysis because the privacy leakage of all iterations is accounted for (which corresponds to the worst case where the information leaked in every iteration is captured by the attacker). The parameter μ quantifies the indistinguishability of each record being in or out of client 's dataset, where a smaller μ indicates higher indistinguishability and thus a stronger privacy guarantee (similar to the role of the privacy budget). In (3), δ is a decreasing function w.r.t. the privacy budget ε, reflecting the tradeoff between the privacy budget ε and the small probability δ in DP. Also, a smaller μ incurs a smaller value of δ. Therefore, if we fix the value of δ, the value of ε decreases with a decreasing μ, which yields a stronger DP guarantee.
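Although the exact constants of (3) and (4) are given in the theorem, the ε–δ tradeoff discussed above follows the standard Gaussian-DP conversion of Dong et al. [20], which can be sketched as:

```python
from math import erf, exp, sqrt

def Phi(t):
    """CDF of the standard normal distribution."""
    return 0.5 * (1 + erf(t / sqrt(2)))

def delta(eps, mu):
    """Gaussian-DP conversion: delta as a function of the budget eps and noise parameter mu."""
    return Phi(-eps / mu + mu / 2) - exp(eps) * Phi(-eps / mu - mu / 2)

# delta decreases in eps (the privacy budget) and increases in mu,
# mirroring the tradeoff discussed above.
assert delta(2.0, 1.0) < delta(1.0, 1.0) < delta(1.0, 2.0)
assert 0.0 < delta(1.0, 1.0) < 1.0
```

Fixing δ and shrinking μ thus yields a smaller ε, i.e., a stronger DP guarantee.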
The two cases in (4), i.e., whether the attacker corrupts one server or not, mainly differ in 1) the influence of the probability that each client is selected by the servers; and 2) the multiplier of the noise that affects the quantification of the DP guarantees. We explain the reasons below:
1) Since each client is randomly selected with probability in each iteration, and there are global iterations, the expected number of iterations in which the victim participates is . Then, the factor in the first case can be approximated as (vs. in the second case), which provides weaker privacy protection. The intuition is that the privacy amplification from client-level sampling holds when the sampling result is a random variable in the attacker's view (which improves the randomness and thus the privacy). However, in the first case, the attacker corrupts one server and can identify the iterations in which the victim actually participates; thus, client-level sampling in this case only reduces the number of participating iterations, where only the participated iterations leak 's privacy. In contrast, the attacker in the second case is uncertain about the participating iterations and thus enjoys more privacy benefit from the randomness introduced by the sampling probability . We note that although the random sampling/selection could be done via cryptographic primitives, one of the servers will ultimately know the selection results.
2) Recall that both servers add Gaussian noise with variance in the aggregation. In the first case, no matter which server is corrupted by the attacker, only one of the Gaussian noises provides DP, because the corrupted server can cancel out its own noise from the aggregation. In the second case, however, both Gaussian noises remain effective against the attacker, so the effective noise variance is doubled in the attacker's view. Note that the parameter in is ultimately cancelled out in (4) because record-level clipping ensures the record-level sensitivity to be .
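A quick numerical sanity check of this variance argument (illustrative, with unit noise scale): when the corrupted server subtracts its own noise, the remaining noise has half the variance seen by an attacker who corrupts neither server:

```python
import random
import statistics

random.seed(0)
sigma = 1.0
n = 200_000

# Each of the two servers independently adds Gaussian noise to the aggregate.
n1 = [random.gauss(0, sigma) for _ in range(n)]
n2 = [random.gauss(0, sigma) for _ in range(n)]

honest_view = [a + b for a, b in zip(n1, n2)]  # neither server corrupted: both noises count
corrupt_view = n1                              # corrupted server 2 cancels its own noise n2

v_honest = statistics.pvariance(honest_view)
v_corrupt = statistics.pvariance(corrupt_view)
assert abs(v_honest - 2 * sigma**2) < 0.05   # variance ~ 2*sigma^2
assert abs(v_corrupt - sigma**2) < 0.05      # variance ~ sigma^2
```

This is why the first case (one corrupted server) is analyzed with a single noise term while the second case enjoys the sum of both.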
Table 2: Summary of notations.
Two non-colluding servers
The -th client (the victim of the privacy breach)
The total number of global iterations
The total number of iterations that client participates in
The probability of each client being selected in each iteration
The probability of each record being sampled by client
The maximum gradient norm for each record
The maximum local model update norm for each client
The multiplier of the additive Gaussian noise
The CDF of the standard normal distribution
5.2 Robustness against Poisoning Attacks
Recall that for model poisoning attacks, we assume the servers are trusted and the attacker corrupts a set of malicious clients of size . The set of all benign clients is denoted as , and the set of all participating clients (both benign and malicious) is denoted as . Consider the randomized learning mechanism (i.e., Algorithm 1) whose input is the training dataset of or ; then and are the two distributions of the final model parameters learned from the dataset with or without the participation of the malicious clients . The robustness of PRECAD against such an attacker focuses on a fixed record in the testing phase and a bounded loss function for any in the model parameter space. We denote the expected losses over the random model parameter on distributions and as
(5)
where is the expected loss without attack, and is the expected loss under the attack with malicious clients . The following theorem states that, due to the participation of the malicious clients , the attacked loss is not far from the unattacked loss (refer to Table 2 for the definitions of the notations).
Theorem 2 (Robustness against model poisoning attacks).
For the randomized mechanism in Algorithm 1, the expected loss defined in (5) on the model under the poisoning attack satisfies the following upper and lower bounds (w.r.t. the expected loss on the unattacked model):
(6)  
(7) 
Recall that is the range of the loss function . The function is defined by
(8) 
where is the CDF of the standard normal distribution, and is computed by
(9) 
Proof.
(Sketch) We first show that releasing the final model satisfies client-level DP for any . Then, by leveraging the properties of DP, we can bound the expected loss on the poisoned model. The factor is introduced by the group privacy property of DP. Refer to Appendix D for the full proof. ∎
The robustness analysis in Theorem 2 does not depend on the details of the attack implementation, such as what auxiliary information the attacker has, how the local poisoning model is trained, or the value of the scaling factor used to amplify the impact of the poisoning attack. The only assumption on the attacker is the value of (i.e., the number of malicious clients). We can observe that the robustness guarantee is stronger when and are small: a small makes the range of the loss small, and a small limits the attacker's control capability. From (9), would be small when is large and and are small, where a larger introduces more noise, and smaller values restrict the manipulation capability of the attacker. Note that such changes may also impact the value of , and the main task usually becomes less accurate (since more noise is introduced or the global model learns less from benign clients' data), which reflects the tradeoff between robustness and utility.
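The DP-style sandwich bound behind Theorem 2 can be illustrated numerically. The sketch below uses two hypothetical output distributions P (no attack) and Q (under attack) that are pointwise ε-indistinguishable, and checks the expected-loss bound for a bounded loss:

```python
from math import exp, log

# Hypothetical output distributions of the learning mechanism over three models:
# P without the attack, Q with malicious clients participating.
P = [0.50, 0.30, 0.20]
Q = [0.45, 0.35, 0.20]
eps, dlt = log(1.2), 0.0

# Check (eps, 0)-indistinguishability pointwise (hence for every event).
assert all(q <= exp(eps) * p and p <= exp(eps) * q for p, q in zip(P, Q))

loss = [0.0, 1.0, 2.0]   # a bounded loss with range B = 2
B = 2.0
J_P = sum(p * f for p, f in zip(P, loss))   # expected loss without attack
J_Q = sum(q * f for q, f in zip(Q, loss))   # expected loss under attack

# DP-style sandwich bound on the attacked loss, in the spirit of (6)-(7):
assert exp(-eps) * (J_P - dlt * B) <= J_Q <= exp(eps) * J_P + dlt * B
```

In Theorem 2, ε and δ are instantiated via the group-privacy guarantee over the malicious clients, so the bound tightens as the number of malicious clients, the clipping bound, and the loss range shrink.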
The function in (8) has a form similar to that in (3), but differs in the value of and introduces an additional factor . Compared with (4), the constant in (9) no longer depends on the record-level sampling probability, because we quantify the bound on each client's contribution (where only influences the contribution of one record). Note that malicious clients can opt out of record-level clipping, but client-level clipping can be securely verified by the servers without leaking any additional information about benign clients. The ratio appears in (9) due to the different sensitivities at the record level and the client level. The factor is introduced in (8) via the group privacy property of DP, because the neighboring datasets under the poisoning attack differ in a group of (malicious) clients of size . A larger leads to a larger , i.e., the privacy guarantee degrades with the size of the group, because the distance between neighboring datasets in group privacy is (versus a distance of 1 in the original DP definition).
6 Evaluation
In this section, we demonstrate the improvement of PRECAD in the privacy-utility tradeoff and poisoning robustness via experimental results on the MNIST [36] and CIFAR-10 [35] datasets. All experiments are developed in Python. The experimental settings of FL mainly follow the previous work [62], and the cryptographic protocols are implemented with the CrypTen library [34].
6.1 Experimental Setup
Baselines. We use 1) a non-private and 2) an LDP-based solution in FL as baseline approaches. In the non-private setting, clients neither implement clipping nor noise augmentation, and the server simply aggregates the submissions and updates the global model. In the LDP setting, benign clients implement record-level clipping (with norm clipping bound ) and Gaussian noise augmentation (with standard deviation ) on the summation of the record gradients, then send the noisy update to the server in plaintext. Note that the privacy accounting (i.e., the value of the privacy budget ) of LDP is the same as the privacy analysis of PRECAD in the threat model with one corrupted server (i.e., the first case in Theorem 1). We do not include CDP as a baseline because CDP assumes a trusted server, and our scheme PRECAD is essentially a simulation of CDP. DDP is not included because DDP reduces to LDP in the worst case (i.e., when in Table 1).
Datasets (non-IID). We use two datasets for our experiments: MNIST [36] and CIFAR-10 [35], where the default number of total clients is . To simulate heterogeneous data distributions, we create non-i.i.d. partitions of the datasets, in a setup similar to [62], as described below:
1) Non-IID MNIST: The MNIST dataset contains 60,000 training images and 10,000 testing images of 10 classes. There are 100 clients, each holding 600 training images. We sort the training data by digit label and evenly divide it into 400 shards. Each client is assigned four random shards of the data, so that most clients have examples of three or four digits.
2) Non-IID CIFAR-10: The CIFAR-10 dataset contains 50,000 training images and 10,000 test images of 10 classes. There are 100 clients, each holding 500 training images. We sample the training images for each client using a Dirichlet distribution with hyperparameter 0.5.
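The two partitioning procedures can be sketched as follows (illustrative code with stand-in labels; the actual experiments load the real MNIST/CIFAR-10 label arrays):

```python
import random

random.seed(0)
NUM_CLIENTS = 100

# --- MNIST-style sharding: sort by label, split into 400 shards, 4 shards per client ---
labels = [i % 10 for i in range(60_000)]          # stand-in for the MNIST label array
order = sorted(range(len(labels)), key=lambda i: labels[i])
shard_size = len(order) // 400                    # 150 images per shard
shards = [order[i * shard_size:(i + 1) * shard_size] for i in range(400)]
random.shuffle(shards)
mnist_parts = [sum(shards[4 * c:4 * (c + 1)], []) for c in range(NUM_CLIENTS)]
assert all(len(p) == 600 for p in mnist_parts)    # 4 shards x 150 images = 600

# --- CIFAR-10-style Dirichlet split with concentration 0.5 ---
def dirichlet(alpha, k):
    """Sample a k-dimensional Dirichlet(alpha) vector via normalized Gamma draws."""
    draws = [random.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

# Per-class client proportions; a smaller alpha gives a more skewed (more non-IID) split.
props = {cls: dirichlet(0.5, NUM_CLIENTS) for cls in range(10)}
assert all(abs(sum(p) - 1.0) < 1e-9 for p in props.values())
```

Sorting by label before sharding is what limits each MNIST client to a few digits, while the Dirichlet concentration parameter controls the skew of the CIFAR-10 split.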
Evaluation Metrics. We consider main task accuracy and backdoor accuracy (if applicable) as the evaluation metrics. The former is measured on the original test dataset (without backdoor images), while the latter is measured on a modified version of the test dataset; a lower backdoor accuracy indicates stronger robustness against backdoor attacks. The detailed implementation of backdoor attacks and the definition of backdoor accuracy are described in Appendix E.
Model Architecture.
For the MNIST dataset, we use the CNN model from the PyTorch example (https://github.com/pytorch/opacus). For the CIFAR-10 dataset, we use the CNN model from the TensorFlow tutorial (https://www.tensorflow.org/tutorials/images/cnn), as in previous works [62, 42]. The hyperparameters for training are described in Appendix E.
6.2 Privacy-Utility Tradeoff
Figure 2: Privacy budget curve (left) and accuracy curve (right) of PRECAD w.r.t. the epoch, where one epoch equals 200 global iterations because we set the sampling ratios as (for records) and (for clients). The solid and dashed lines represent the privacy budget (under a fixed ) in the two cases where one of the two servers is or is not corrupted by the attacker; in the first case, PRECAD provides a weaker privacy guarantee (thus a larger value of ) than in the second case.
In this subsection, we show the privacy-utility tradeoff when there are no backdoor attacks. In this scenario, we disable the client-level clipping on the client side (Line 7 in Algorithm 2) and the secure validation on the server side (Line 4 in Algorithm 1) of PRECAD for a fair comparison. Note that the record-level clipping, the additive secret sharing scheme, and the Gaussian noise augmentation are always maintained to achieve record-level DP, as discussed in Sec. 5.1.
Privacy and Accuracy Curves of PRECAD. Figure 2 shows the privacy cost and the accuracy (for the MNIST dataset) of PRECAD with respect to the epoch/iteration. For each fixed epoch, we can observe the privacy-utility tradeoff: larger noise leads to a stronger privacy guarantee (i.e., a smaller privacy budget ) but also lower accuracy. In Figure 2, the privacy budget against a stronger attacker (who corrupts one server and any number of clients) is always larger than that against a weaker attacker (who only corrupts clients). In the following, we focus only on the worst-case privacy budget of the two cases, which corresponds to the noise multiplier when the global model is trained for 25 epochs (i.e., ) under the default hyperparameters. Note that the privacy curve (privacy budget vs. epochs) for CIFAR-10 is the same as for MNIST because the hyperparameters are the same.
Accuracy Comparison. Figure 3 (left) shows how the total number of clients (i.e., the value of ) affects the accuracy. We can observe that increasing has no impact on the accuracy of the non-private setting and PRECAD, but significantly decreases the accuracy of the LDP setting, because more noise is aggregated when more clients participate in each iteration. Note that for the CIFAR-10 dataset, the accuracy of the non-private setting and PRECAD when is slightly lower than when . This is because the CIFAR-10 training examples of each client are sampled by random drawing (whereas the MNIST dataset is jointly partitioned), so with a small some training examples may not be assigned to any client.
Efficiency Comparison. Figure 3 (right) compares the computational efficiency of the different protocols, quantified by the average run time per iteration. With an increased , the run time increases for all approaches because more clients train their local models (we train local models sequentially due to the GPU memory limit). We can observe that PRECAD's run time is approximately 7× that of the LDP-based solution, due to the additional computation introduced by cryptography, but the absolute time is acceptable (since the additive secret sharing scheme is a lightweight cryptographic technique). In addition, the non-private setting has a shorter run time than the LDP setting, because the former needs neither the clipping nor the noise augmentation steps.
6.3 Robustness against Backdoor Attacks
In this subsection, we assume that the attacker corrupts (malicious) clients to implement backdoor attacks (refer to Appendix E for the attack details). In PRECAD, all clients are required to implement client-level clipping with bound ; otherwise, the invalid submission will be identified and excluded from the global model update. As a comparison, clients' submissions in the LDP-based solution are also clipped with bound (but after the noise is added by the clients), which can be directly verified because clients upload plaintext submissions in LDP. However, for the non-private setting, we do not clip the submissions, because benign clients do not implement record-level clipping in this setting.
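The client-level clipping and the server-side validity check can be sketched as follows (a plaintext illustration; in PRECAD the norm check is performed over secret shares via secure validation rather than in the clear):

```python
from math import sqrt

def clip(update, C):
    """Scale an update so that its L2 norm is at most C (client-level clipping)."""
    norm = sqrt(sum(x * x for x in update))
    scale = min(1.0, C / norm) if norm > 0 else 1.0
    return [x * scale for x in update]

def is_valid(update, C, slack=1e-9):
    """Server-side check that ||update||_2 <= C; `slack` tolerates rounding error.
    (In PRECAD this comparison runs over secret shares; in the LDP baseline the
    submission is plaintext, so it can be checked directly.)"""
    return sum(x * x for x in update) <= C * C + slack

u = [3.0, 4.0]                               # norm 5
assert is_valid(clip(u, 1.0), 1.0)           # clipped update passes validation
assert not is_valid(u, 1.0)                  # unclipped update is rejected
assert clip([0.3, 0.4], 1.0) == [0.3, 0.4]   # updates within the bound are unchanged
```

A submission that fails this check is simply excluded from the aggregation, which is what bounds each malicious client's influence on the global model.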
Influence of . Figure 4 shows how the value of (i.e., the number of backdoor clients) affects the main task accuracy and the backdoor accuracy under different levels of privacy guarantees (i.e., different amounts of added Gaussian noise). In general, a larger increases the accuracy of both the main task and the backdoor task (recall that the backdoor attacker's goal is to increase backdoor accuracy while maintaining main task accuracy), but the degree of increase differs across cases. 1) The non-private and LDP settings are vulnerable to backdoor attacks: merely 4 backdoor clients (for MNIST) or 1 backdoor client (for CIFAR-10) can increase the backdoor accuracy to more than , where the Semantic Backdoor Attack on CIFAR-10 is stronger than the Pixel-pattern Backdoor Attack on MNIST. 2) PRECAD shows higher robustness against backdoor attacks than the other two settings, and a smaller (i.e., a larger ) yields a lower backdoor accuracy, which indicates that the noise in PRECAD enhances robustness; in contrast, the results show that the noise in LDP reduces robustness. 3) More backdoor clients result in a slight improvement in main task accuracy for the non-private setting and PRECAD, but a large improvement for the LDP setting. This is because the accuracy of the LDP setting is relatively low (since a larger amount of noise needs to be added under the same privacy guarantee), and backdoor clients have additional advantages over benign clients in improving main task accuracy (they neither clip the gradient nor add noise).
Influence of . Figure 5 shows how the value of (i.e., the client-level clipping norm bound) affects the performance of PRECAD. We observe that a smaller value of results in higher robustness against backdoor attacks, because the impact of the malicious clients is more strictly bounded. 1) For the MNIST dataset, when we set , the backdoor accuracy is against malicious clients, and less than against malicious clients, while the main task accuracy is reduced by only , a negligible influence on utility. 2) For the CIFAR-10 dataset, under a stronger backdoor attack strategy, the backdoor accuracy is below when and , while the main task accuracy does not decrease much. 3) Regarding the impact of for both datasets, a smaller (stronger privacy) leads to a lower backdoor accuracy (stronger robustness) with very little impact on main task accuracy.
In summary, the noise added for privacy in PRECAD reduces utility but enhances robustness against backdoor attacks, which implies a robustness-utility tradeoff (similar observations have been made for adversarial robustness in [37, 17]). In contrast, the noise in the LDP setting reduces both utility and robustness (refer to Table 1 for a summary comparison of the different approaches).
7 Discussion
Limitations. Although PRECAD improves both the privacy-utility tradeoff of DP and the poisoning robustness against malicious clients, it has several limitations: 1) The trust assumption of non-colluding servers is relatively strong and might not hold in all application scenarios. 2) The cryptographic techniques used, including secret sharing and MPC, incur additional computation and communication costs. Thus, PRECAD might not be the most economical solution when the application has rigorous constraints on computation and communication.
Generality. In PRECAD, the secret sharing scheme can be substituted by other cryptographic primitives, such as the pairwise masking strategy in [12], provided that the client-level clipping can still be securely verified by the server(s). Also, the perturbation mechanism can be substituted by other DP mechanisms, such as the Laplace Mechanism [22] or the Exponential Mechanism [43], depending on the application scenario, but the privacy accounting method and the robustness analysis would differ accordingly.
Guarantees under a Malicious Setting. In PRECAD, we assume both servers implement the protocol honestly to achieve the stated privacy and robustness guarantees. However, if one of them, say server , maliciously deviates from the protocol by omitting the noise augmentation (or adding less noise), PRECAD still provides the same privacy guarantee against as in Theorem 1, because the noise providing DP against it is honestly added by . However, the privacy against corrupted clients and the robustness against poisoning attacks become weaker (than in Theorems 1 and 2), since the overall noise added to the global model is reduced.
A Faster Version with only a Privacy Guarantee. If we only need to provide record-level privacy (i.e., without the robustness requirement), then both the client-level clipping on the client side and the secure validation on the server side can be skipped. This yields a more accurate model (since clipping introduces bias) and improves computational efficiency.
8 Related Work
8.1 Privacy-Preserving Federated Learning
Existing approaches on privacypreserving federated learning are typically designed based on cryptography and/or DP.
Crypto-based. Aono et al. [4] used additively homomorphic encryption to preserve the privacy of gradients and enhance the security of the distributed learning system. Mohassel et al. [48] proposed SecureML, which conducts privacy-preserving learning via Secure Multi-Party Computation (MPC) [60], where data owners need to process, encrypt, and/or secret-share their data among two non-colluding servers in the initial setup phase. Bonawitz et al. [12] proposed a secure, communication-efficient, and failure-robust protocol for the secure aggregation of individual model updates. However, all the above cryptography-based protocols in some way prevent anyone from auditing clients' updates to the global model, which leaves room for malicious clients to attack. For example, malicious clients can introduce stealthy backdoor functionality into the global model without being detected.
DP-based. Differential Privacy (DP) was originally designed for the centralized scenario, where a trusted database server, with direct access to all clients' data in the clear, wishes to answer queries or publish statistics in a privacy-preserving manner by randomizing query results. In FL, McMahan et al. [42] introduced two algorithms, DP-FedSGD and DP-FedAvg, which provide client-level privacy with a trusted server. Geyer et al. [26] use an algorithm similar to DP-FedSGD for the architecture search problem, where the privacy guarantee also acts at the client level with a trusted server. Li et al. [38] study online transfer learning and introduce a notion called task global privacy that works at the record level; however, the online setting assumes the client interacts with the server only once and does not extend to the federated setting. Zheng et al. [62] introduced two privacy notions that describe record-level privacy guarantees against an individual malicious client and against a group of malicious clients, based on a new variant of differential privacy. However, the privacy analysis of that work does not consider the case where the server is corrupted, and the privacy budget in the worst-case adversary setting (i.e., all clients except the victim are malicious) is too large to provide a meaningful privacy guarantee.
Hybrid Solutions. Truex et al. [54] proposed a hybrid solution that utilizes threshold-based partially additive homomorphic encryption to reduce the noise needed for a record-level DP guarantee. Xu et al. [59] improved this hybrid solution with enhanced efficiency and accommodation of client dropout. However, these hybrid solutions are vulnerable to malicious clients (due to encryption), and the utility gain from encryption is sensitive to the number of non-colluding parties.
Privacy Amplification by Shuffling. Different from using cryptography to improve the privacy-utility tradeoff of DP, researchers have introduced the shuffler model, which achieves a middle ground between CDP and LDP in terms of both privacy and utility. Bittau et al. [10] were the first to propose the shuffling idea, where a shuffler is inserted between the users and the server to break the linkage between a report and the user's identity. Cheu et al. [15] analyzed the differential privacy properties of the shuffler model and showed that in some cases shuffled protocols provide strictly better accuracy than local protocols. Balle et al. [6] provided a tighter and more general privacy amplification bound by leveraging a technique called blanket decomposition. We note that current shuffler models are mainly used in local data aggregation applications.
8.2 Robust Federated Learning
FL systems are vulnerable to model poisoning attacks, which aim either to thwart the learning of the global model (a.k.a. Byzantine attacks) or to implant a backdoor trigger into the global model (a.k.a. backdoor attacks). These attacks poison local model updates before they are uploaded to the server. More details on poisoning attacks and other threats to FL can be found in the survey paper [39].
Byzantine Robustness. Most state-of-the-art Byzantine-robust solutions rely on mean or median statistics of the gradient contributions. Blanchard et al. [11] proposed Krum, which uses Euclidean distances to determine which gradient contributions should be removed, and can theoretically withstand poisoning attacks by up to adversaries in the participant pool. Mhamdi et al. [45] proposed Bulyan, a two-step meta-aggregation algorithm based on Krum and the trimmed median, which filters malicious updates and then computes the trimmed median of the remaining updates.
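The Krum selection rule can be sketched as follows (an illustrative implementation of the distance-scoring idea; the original algorithm of Blanchard et al. [11] operates on high-dimensional gradient vectors):

```python
def krum_select(updates, f):
    """Score each update by the sum of squared distances to its n-f-2 nearest
    neighbors, and return the index of the update with the lowest score."""
    n = len(updates)

    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    scores = []
    for i, u in enumerate(updates):
        dists = sorted(sqdist(u, v) for j, v in enumerate(updates) if j != i)
        scores.append(sum(dists[: n - f - 2]))
    return min(range(n), key=scores.__getitem__)

# Four benign updates near [1, 1] and one poisoned outlier at index 4.
updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.1], [10.0, -10.0]]
assert krum_select(updates, f=1) != 4   # the outlier is never selected
```

Because the outlier is far from every cluster of benign updates, its neighbor distances (and hence its score) are large, so it cannot win the selection.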
Backdoor Robustness. Andreina et al. [3] incorporate an additional validation phase into each round of FL to detect backdoors. Sun et al. [53] showed that clipping the norm of model updates and adding Gaussian noise can mitigate backdoor attacks that are based on the model replacement paradigm. Xie et al. [58] provided the first general framework for training certifiably robust FL models against backdoors, exploiting clipping and smoothing of the model parameters to control the smoothness of the global model. However, these works do not consider the privacy issue in FL.
8.3 On both Privacy and Robustness
Recently, some works have tried to achieve both privacy and robustness in FL simultaneously. He et al. [31] proposed a Byzantine-resilient and privacy-preserving solution that makes distance-based robust aggregation rules (such as Krum [11]) compatible with secure aggregation via MPC and secret sharing. So et al. [52] developed a similar scheme based on Krum aggregation, but relying on different cryptographic techniques, such as verifiable Shamir secret sharing and Reed-Solomon codes. Velicheti et al. [55] achieve both privacy and Byzantine robustness by incorporating secure averaging among randomly clustered clients before filtering malicious updates through robust aggregation. However, these works do not achieve DP against the server, since the aggregated model is directly revealed.
Regarding the relationship between DP and robustness, Guerraoui et al. [30] provided a theoretical analysis of combining DP and Byzantine resilience in FL frameworks. They concluded that the classical approaches to Byzantine resilience and DP in distributed SGD (i.e., the LDP setting) are practically incompatible. Naseri et al. [49] presented a comprehensive empirical evaluation showing that local and centralized DP (LDP/CDP) can defend against backdoor attacks in FL. However, the malicious clients in the LDP setting are assumed to follow the DP protocol honestly (which usually does not hold in practice), and the CDP setting assumes a trusted server (which is too strong an assumption).
9 Conclusion
In this paper, we developed PRECAD, a novel FL framework that enhances the privacy-utility tradeoff of DP and the robustness against model poisoning attacks by leveraging secret sharing and MPC techniques. With record-level clipping and securely verified client-level clipping, the noise added by the servers provides both record-level DP and client-level DP; the former is our privacy goal, and the latter is used to show robustness against poisoned model updates uploaded by malicious clients. Our experimental results validate the improvements of PRECAD in both privacy and robustness.
For future work, we will extend our framework to other cryptographic primitives and DP mechanisms, and show certifiable robustness against model poisoning attacks.
References
[1] (2016) Deep learning with differential privacy. In ACM SIGSAC Conference on Computer and Communications Security (CCS). Cited by: Appendix B, §2.1.
[2] (2019) QUOTIENT: two-party secure neural network training and prediction. In ACM SIGSAC Conference on Computer and Communications Security (CCS). Cited by: §1, §3.2.
[3] (2020) BAFFLE: backdoor detection via feedback-based federated learning. arXiv preprint arXiv:2011.02167. Cited by: §8.2.
[4] (2017) Privacy-preserving deep learning via additively homomorphic encryption. IEEE Transactions on Information Forensics and Security (TIFS) 13 (5). Cited by: §8.1.
 [5] (2020) How to backdoor federated learning. In AISTATS, Cited by: Appendix E, Appendix E, §1, §3.2.
 [6] (2019) The privacy blanket of the shuffle model. In CRYPTO, Cited by: §8.1.
 [7] (2019) A little is enough: circumventing defenses for distributed learning. NeurIPS. Cited by: §1.
[8] (1991) Efficient multi-party protocols using circuit randomization. In CRYPTO. Cited by: Appendix A, §4.3, §4.4.
 [9] (2018) Protection against reconstruction and its applications in private federated learning. arXiv preprint arXiv:1812.00984. Cited by: §1.
 [10] (2017) Prochlo: strong privacy for analytics in the crowd. In Symposium on Operating Systems Principles (SOSP), Cited by: §8.1.
 [11] (2017) Machine learning with adversaries: byzantine tolerant gradient descent. In NeurIPS, Cited by: §8.2, §8.3.
[12] (2017) Practical secure aggregation for privacy-preserving machine learning. In ACM SIGSAC Conference on Computer and Communications Security (CCS). Cited by: §7, §8.1.

[13] (2020) Deep learning with Gaussian differential privacy. Harvard Data Science Review 2020 (23). Cited by: Appendix B, Lemma 5.
[14] (2021) Data poisoning attacks to local differential privacy protocols. In USENIX Security Symposium. Cited by: §1.
 [15] (2019) Distributed differential privacy via shuffling. In EUROCRYPT, Cited by: §8.1.
 [16] (2021) Manipulation attacks in local differential privacy. In IEEE Symposium on Security and Privacy (S&P), Cited by: §1.
 [17] (2019) Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning (ICML), Cited by: §6.3.
 [18] (2017) Prio: private, robust, and scalable computation of aggregate statistics. In USENIX Symposium on Networked Systems Design and Implementation (NSDI), Cited by: §1, §2.3, §3.2.
[19] (2005) Share conversion, pseudorandom secret-sharing and applications to secure computation. In Theory of Cryptography Conference (TCC). Cited by: §2.3.
 [20] (2019) Gaussian differential privacy. To appear in Journal of the Royal Statistical Society: Series B (Statistical Methodology). Cited by: Appendix B, Appendix B, §2.1, Lemma 1, Lemma 2, Lemma 3, Lemma 4.
 [21] (2013) Local privacy and statistical minimax rates. In IEEE Annual Symposium on Foundations of Computer Science (FOCS), Cited by: footnote 1.
 [22] (2006) Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference (TCC), Cited by: §1, §7, Definition 1.
[23] (2014) The algorithmic foundations of differential privacy. Now Publishers. Cited by: Appendix B, §1, §2.1, Definition 1.
 [24] (2010) Boosting and differential privacy. In IEEE Annual Symposium on Foundations of Computer Science (FOCS), Cited by: Appendix B.
 [25] (2020) Local model poisoning attacks to Byzantine-robust federated learning. In USENIX Security Symposium, Cited by: §1.
 [26] (2017) Differentially private federated learning: a client level perspective. arXiv preprint arXiv:1712.07557. Cited by: Table 1, §1, §1, §8.1.
 [27] (2009) Foundations of cryptography: volume 2, basic applications. Cambridge university press. Cited by: §4.4.
 [28] (2017) Badnets: identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733. Cited by: Appendix E.
 [29] (2020) PCKV: locally differentially private correlated key-value data collection with optimized utility. In USENIX Security Symposium, Cited by: footnote 1.
 [30] (2021) Differential privacy and byzantine resilience in sgd: do they add up?. arXiv preprint arXiv:2102.08166. Cited by: §1, §8.3.
 [31] (2020) Secure byzantinerobust machine learning. arXiv preprint arXiv:2006.04747. Cited by: §1, §3.2, §8.3.
 [32] (2016) MASCOT: faster malicious arithmetic secure computation with oblivious transfer. In ACM SIGSAC Conference on Computer and Communications Security (CCS), Cited by: §4.3.
 [33] (2018) Overdrive: making spdz great again. In EUROCRYPT, Cited by: §4.3.
 [34] (2021) CrypTen: secure multiparty computation meets machine learning. arXiv preprint arXiv:2109.00984. Cited by: §4.3, §6.
 [35] (2009) Learning multiple layers of features from tiny images. Cited by: §6.1, §6.
 [36] (1998) The mnist database of handwritten digits. Cited by: §6.1, §6.
 [37] (2019) Certified robustness to adversarial examples with differential privacy. In IEEE Symposium on Security and Privacy (S&P), Cited by: §6.3.
 [38] (2020) Differentially private meta-learning. In International Conference on Learning Representations (ICLR), Cited by: Table 1, §1, §1, §4.1, §8.1.
 [39] (2020) Privacy and robustness in federated learning: attacks and defenses. arXiv preprint arXiv:2012.06337. Cited by: §8.2, footnote 1.
 [40] (2020) Lightweight crypto-assisted distributed differential privacy for privacy-preserving distributed learning. In IEEE International Joint Conference on Neural Networks (IJCNN), Cited by: Table 1, §1.
 [41] (2017) Communication-efficient learning of deep networks from decentralized data. In AISTATS, Cited by: §1.
 [42] (2018) Learning differentially private recurrent language models. In International Conference on Learning Representations (ICLR), Cited by: Table 1, §1, §1, §6.1, §8.1.
 [43] (2007) Mechanism design via differential privacy. In IEEE Symposium on Foundations of Computer Science (FOCS), Cited by: §7.
 [44] (2019) Exploiting unintended feature leakage in collaborative learning. In IEEE Symposium on Security and Privacy (S&P), Cited by: §1.
 [45] (2018) The hidden vulnerability of distributed learning in byzantium. In International Conference on Machine Learning (ICML), Cited by: §8.2.
 [46] (2009) Computational differential privacy. In CRYPTO, Cited by: §1, §5.
 [47] (2017) Rényi differential privacy. In IEEE Computer Security Foundations Symposium (CSF), Cited by: Appendix B.
 [48] (2017) SecureML: a system for scalable privacy-preserving machine learning. In IEEE Symposium on Security and Privacy (S&P), Cited by: §1, §3.2, §8.1.
 [49] (2020) Toward robustness and privacy in federated learning: experimenting with local and central differential privacy. arXiv preprint arXiv:2009.03561. Cited by: §1, §3.2, §8.3, footnote 1.
 [50] (2018) Differentially-private draw and discard machine learning. arXiv preprint arXiv:1807.04369. Cited by: Table 1, §1, §4.1.
 [51] (2020) Cryptε: crypto-assisted differential privacy on untrusted servers. In ACM SIGMOD International Conference on Management of Data (SIGMOD), Cited by: §1, §3.2.
 [52] (2020) Byzantine-resilient secure federated learning. IEEE Journal on Selected Areas in Communications. Cited by: §8.3.
 [53] (2019) Can you really backdoor federated learning?. arXiv preprint arXiv:1911.07963. Cited by: §1, §3.2, §8.2.
 [54] (2019) A hybrid approach to privacy-preserving federated learning. In ACM Workshop on Artificial Intelligence and Security, Cited by: Table 1, §1, §1, §4.1, §8.1, footnote 1.
 [55] (2021) Secure Byzantine-robust distributed learning via clustering. arXiv preprint arXiv:2110.02940. Cited by: §8.3.
 [56] (2020) Attack of the tails: yes, you really can backdoor federated learning. In NeurIPS, Cited by: §1.
 [57] (2017) Locally differentially private protocols for frequency estimation. In USENIX Security Symposium, Cited by: footnote 1.
 [58] (2021) CRFL: certifiably robust federated learning against backdoor attacks. In International Conference on Machine Learning (ICML), Cited by: §8.2.
 [59] (2019) HybridAlpha: an efficient approach for privacy-preserving federated learning. In ACM Workshop on Artificial Intelligence and Security, Cited by: Table 1, §1, §1, §4.1, §8.1.
 [60] (1982) Protocols for secure computations. In IEEE Symposium on Foundations of Computer Science (FOCS), Cited by: §8.1.
 [61] (2021) See through gradients: image batch recovery via gradinversion. In CVPR, Cited by: §1.
 [62] (2021) Federated f-differential privacy. In AISTATS, Cited by: §1, §6.1, §6.1, §6, §8.1.
 [63] (2019) Deep leakage from gradients. In NeurIPS, Cited by: §1.
Appendix A Beaver’s Multiplication Protocol
In the context of additive secret sharing discussed in Sec. 2.3, we assume the $i$-th server holds shares $[x]_i$ and $[y]_i$ and wants to compute a share of the product $z = xy$. All arithmetic in this section is in a finite field $\mathbb{F}_p$. Beaver [8] showed that the servers can use precomputed multiplication triples to evaluate multiplication gates. A multiplication triple is a one-time-use triple of values $(a, b, c)$, chosen at random subject to the constraint that $c = ab$. When used in the context of multiparty computation, each server holds a share of the triple. To jointly evaluate shares of the output of a multiplication gate $z = xy$, the $i$-th server computes the following values:
$$[d]_i = [x]_i - [a]_i, \qquad [e]_i = [y]_i - [b]_i.$$
Each server then broadcasts $[d]_i$ and $[e]_i$. Using the broadcasted shares, every server can reconstruct $d = x - a$ and $e = y - b$, which allows each of them to compute
$$[z]_i = de/K + d\,[b]_i + e\,[a]_i + [c]_i.$$
Recall that $K$ is the number of servers (which is a public constant) and the division symbol here indicates division (i.e., inversion then multiplication) in the field $\mathbb{F}_p$. A few lines of arithmetic confirm that $[z]$ is a sharing of the product $xy$:
$$\sum_{i=1}^{K}[z]_i = de + d\,b + e\,a + c = (x-a)(y-b) + (x-a)\,b + (y-b)\,a + ab = xy.$$
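The protocol above can be sketched in a few lines of Python over a small prime field. This is a minimal illustration only: the modulus, the choice of $K = 2$ servers, and all function names are ours, not part of the paper's implementation.

```python
import random

P = 2**31 - 1  # a Mersenne prime; arithmetic is in the field F_P (illustrative choice)
K = 2          # number of servers (a public constant)

def share(x):
    """Additively secret-share x into K random shares that sum to x mod P."""
    shares = [random.randrange(P) for _ in range(K - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

def beaver_multiply(x_sh, y_sh, a_sh, b_sh, c_sh):
    """Given shares of x, y and of a triple (a, b, c) with c = a*b,
    return shares of z = x*y using Beaver's protocol."""
    # Each server broadcasts [d]_i = [x]_i - [a]_i and [e]_i = [y]_i - [b]_i,
    # so everyone can reconstruct d = x - a and e = y - b.
    d = reconstruct([(x_sh[i] - a_sh[i]) % P for i in range(K)])
    e = reconstruct([(y_sh[i] - b_sh[i]) % P for i in range(K)])
    inv_K = pow(K, -1, P)  # division by the public constant K in the field
    # [z]_i = d*e/K + d*[b]_i + e*[a]_i + [c]_i
    return [(d * e * inv_K + d * b_sh[i] + e * a_sh[i] + c_sh[i]) % P
            for i in range(K)]

# One-time-use triple (a, b, c) with c = a*b, generated in a trusted setup phase.
a, b = random.randrange(P), random.randrange(P)
c = (a * b) % P

x, y = 1234, 5678
z_sh = beaver_multiply(share(x), share(y), share(a), share(b), share(c))
assert reconstruct(z_sh) == (x * y) % P
```

Note that only the masked values $d$ and $e$ are broadcast; since $a$ and $b$ are uniformly random and used once, they reveal nothing about $x$ or $y$.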
Appendix B Gaussian Differential Privacy (GDP)
Privacy Accountant. Since deep learning needs to iterate over the training data and apply gradient computation many times during training, each access to the training data incurs some privacy leakage from the overall privacy budget $\epsilon$. The total privacy leakage (or loss) of repeated applications of additive noise mechanisms follows from the composition theorems and their refinements [23]. The task of keeping track of the accumulated privacy loss over the course of executing a composite mechanism, and of enforcing the applicable privacy policy, is performed by a privacy accountant. Abadi et al. [1] proposed the moments accountant to provide a tighter bound on the privacy loss than the generic advanced composition theorem [24]. A newer, more state-of-the-art privacy accountant is Gaussian Differential Privacy (GDP) [20, 13], which has been shown to obtain tighter results than the moments accountant.
Gaussian Differential Privacy.
GDP is a new privacy notion that faithfully retains the hypothesis-testing interpretation of differential privacy. By leveraging the Gaussian central limit theorem, GDP admits an analytically tractable privacy accountant (whereas the moments accountant must be evaluated numerically). Furthermore, GDP can be converted to a collection of $(\epsilon, \delta)$-DP guarantees (refer to Lemma 4). Note that even in terms of $(\epsilon, \delta)$-DP, the GDP approach gives a tighter privacy accountant than the moments accountant. GDP uses a single parameter $\mu$ (called the privacy parameter) to quantify the privacy of a randomized mechanism. Similar to the privacy budget $\epsilon$ in DP, a larger $\mu$ in GDP indicates a weaker privacy guarantee. Compared with $(\epsilon, \delta)$-DP, GDP can losslessly reason about the common primitives associated with differential privacy, including composition, privacy amplification by subsampling, and group privacy. In the following, we briefly introduce the properties of GDP that will be used in the analysis of our approach. The formal definition and more detailed results can be found in the original paper [20].

Lemma 1 (Gaussian Mechanism for GDP [20]).
Consider the problem of privately releasing a univariate statistic $\theta(D)$ of a dataset $D$. Define the sensitivity of $\theta$ as $\Delta = \sup_{D, D'} |\theta(D) - \theta(D')|$, where the supremum is over all neighboring datasets. Then, the Gaussian mechanism $M(D) = \theta(D) + \xi$, where $\xi \sim \mathcal{N}(0, \Delta^2/\mu^2)$, satisfies $\mu$-GDP.
Lemma 2 (Composition Theorem of GDP [20]).
The $k$-fold composition of $\mu_i$-GDP mechanisms ($i = 1, \ldots, k$) is $\sqrt{\mu_1^2 + \cdots + \mu_k^2}$-GDP.
Lemma 3 (Group Privacy of GDP [20]).
If a mechanism is $\mu$-GDP, then it is $k\mu$-GDP for a group with size $k$.
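Both the composition rule (sum of squares under a square root) and group privacy (linear scaling) are one-line computations. The sketch below illustrates them; the function names are ours:

```python
import math

def compose_gdp(mus):
    """Composition of mu_i-GDP mechanisms: the result is
    sqrt(mu_1^2 + ... + mu_k^2)-GDP."""
    return math.sqrt(sum(mu**2 for mu in mus))

def group_gdp(mu, k):
    """A mu-GDP mechanism is (k*mu)-GDP for groups of size k."""
    return k * mu

# Composing a 3-GDP and a 4-GDP mechanism yields a 5-GDP guarantee.
print(compose_gdp([3.0, 4.0]))  # -> 5.0
```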
Lemma 4 (GDP to DP [20]).
A mechanism is $\mu$-GDP if and only if it is $(\epsilon, \delta(\epsilon))$-DP for all $\epsilon \geq 0$, where
$$\delta(\epsilon) = \Phi\left(-\frac{\epsilon}{\mu} + \frac{\mu}{2}\right) - e^{\epsilon}\,\Phi\left(-\frac{\epsilon}{\mu} - \frac{\mu}{2}\right),$$
and $\Phi$ denotes the CDF of the standard normal (Gaussian) distribution.
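The GDP-to-DP duality of [20], $\delta(\epsilon) = \Phi(-\epsilon/\mu + \mu/2) - e^{\epsilon}\,\Phi(-\epsilon/\mu - \mu/2)$, is straightforward to evaluate numerically. A minimal sketch (function names are ours), using the standard-normal CDF via `math.erf`:

```python
import math

def std_normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gdp_to_dp(mu, eps):
    """delta such that a mu-GDP mechanism is (eps, delta)-DP."""
    return (std_normal_cdf(-eps / mu + mu / 2)
            - math.exp(eps) * std_normal_cdf(-eps / mu - mu / 2))
```

For instance, a 1-GDP mechanism corresponds to roughly $(0, 0.383)$-DP, and the required $\delta$ shrinks rapidly as $\epsilon$ grows.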
Lemma 5 (Privacy Central Limit Theorem of GDP [13]).
Denote $p$ as the subsampling probability, $T$ as the total number of iterations, and $\sigma$ as the noise scale (i.e., the ratio between the standard deviation of the Gaussian noise and the gradient norm bound). Then, the DP-SGD algorithm asymptotically satisfies $\mu$-GDP with privacy parameter $\mu = p\sqrt{T(e^{1/\sigma^2} - 1)}$.
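The privacy central limit theorem of [13] gives the asymptotic GDP parameter of DP-SGD as $\mu = p\sqrt{T(e^{1/\sigma^2} - 1)}$, which can be computed in one line (a sketch; the function name is ours):

```python
import math

def dp_sgd_gdp_mu(p, T, sigma):
    """Asymptotic GDP privacy parameter of DP-SGD with subsampling
    probability p, T iterations, and noise scale sigma."""
    return p * math.sqrt(T * (math.exp(1.0 / sigma**2) - 1.0))

# Example: batches of 1% of the data, 10,000 iterations, noise scale 1.0
# yield mu of roughly 1.31; feeding this into Lemma 4 gives (eps, delta)-DP.
print(dp_sgd_gdp_mu(p=0.01, T=10_000, sigma=1.0))
```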
In this paper, we use GDP as our primary privacy accountant because of its favorable composition properties and its accounting of privacy amplification by subsampling in Lemma 5, and we then convert the result to $(\epsilon, \delta)$-DP via Lemma 4. We note that other privacy accountant methods, such as the moments accountant [1] and Rényi DP (RDP) [47], are also applicable to the proposed scheme and theoretical analysis, but might lead to suboptimal results.
Appendix C Proof of Privacy (Theorem 1)
We utilize GDP (introduced in Appendix B) as our privacy accountant tool. The two cases in Theorem 1 are discussed as follows.
Case 1: one server and multiple clients are corrupted. We first consider the case where one of the two servers is corrupted by the attacker (the case of corrupting the other server is symmetric). Recall that each server receives one share of the local update from every client and the noisy share aggregation from the other server. To infer one record of the victim client, the attacker can obtain the maximum information by adding its own share aggregation to the noisy share aggregation received from the other server and converting the result into a real vector. In the strongest case, where all clients except the victim are corrupted, the best the attacker can do is subtract the updates of all corrupted clients from this aggregate, leaving the victim's update plus the Gaussian noise. On the other hand, since the corrupted server knows which iterations the victim client participates in, the attacker only needs to consider those iterations to infer one record of the victim. By leveraging Lemma 5, we obtain the following lemma.
Lemma 6 (Privacy against one server and any number of clients).
Assume the attacker corrupts one server (but not both) and any number of clients (except the victim client). Then, for the benign client who subsamples one record with probability