1 Introduction
Recent and upcoming data privacy regulations [europe2018reg] pose significant challenges for machine learning (ML) applications that collect sensitive user data at servers controlled by the application owners. Federated learning (FL) [mcmahan2017communication] is a promising way to address these challenges by enabling clients to jointly train ML models via a coordinating server without sharing their data. Although it offers better data privacy, FL typically suffers from disparities in model performance caused by the non-independent and identically distributed (non-iid) data across clients [zhu2021federated].
One of the state-of-the-art approaches to address this problem in FL uses a single joint neural network, called HyperNetFL, to generate local models from personalized descriptors optimized independently for each client [shamsian2021personalized]. This enables smart gradient and parameter sharing. Despite its superior performance, the unique training approach of HyperNetFL poses previously unknown risks from backdoor attacks, which are typically carried out through poisoning in FL [kairouz2019advances]. In backdoor attacks, an adversary manipulates the training process to cause model misclassification on a subset of chosen data samples [bagdasaryan2020backdoor; bhagoji2019analyzing]. In FL, the adversary tries to construct malicious gradients or model updates that encode the backdoor. When aggregated with other clients' updates, the aggregated model exhibits the backdoor.
In this work, we investigate backdoor attacks against HyperNetFL and formulate robust HyperNetFL training as a defense. Our attack (called HNTroj) consistently and effectively crafts malicious local gradients across compromised clients using a single backdoor-infected model, forcing HyperNetFL to generate backdoor-infected local models regardless of their personalized descriptors. An extensive analysis and evaluation on benchmark datasets in non-iid settings show that HNTroj notably outperforms existing model replacement and data poisoning attacks, bypassing recently developed robust federated training algorithms adapted to HyperNetFL even with small numbers of compromised clients.
2 Background
Federated Learning (FL)
FL is a multi-round communication protocol between a coordinating server and a set of clients to jointly train an ML model. Each client holds a set of training samples, each consisting of input features and an associated ground-truth label that is one-hot encoded over the categorical model outcomes. The clients try to minimize the average of their loss functions, where each client's loss is defined over its local data given a set of model weights. For instance, given a model, e.g., a neural network, the loss is commonly an empirical risk with a cross-entropy error function penalizing the mismatch between the predicted values and the ground-truth labels.
HyperNetwork-based Personalized FL (HyperNetFL)
To address the disparity of model utility across clients, HyperNetFL [ha2016hypernetworks; shamsian2021personalized] uses a neural network located at the server to output the model weights for each client, taking a (trainable) personalized descriptor as input. HyperNetFL offers a natural way to share information across clients through the hypernetwork weights while maintaining the personalization of each client via its descriptor. In other words, HyperNetFL learns a family of personalized models. To achieve this goal, the clients and the server jointly minimize the sum of the clients' loss functions over the hypernetwork weights and descriptors.
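To make the mapping concrete, the following is a minimal sketch of the idea, assuming (purely for illustration) a linear hypernetwork; the paper's actual architecture and shapes differ:

```python
import random

# Minimal sketch of the HyperNetFL idea (the linear form of h and all shapes
# are illustrative assumptions): a hypernetwork h with shared weights phi maps
# each client's trainable descriptor e_i to that client's personalized model
# weights theta_i = h(e_i; phi).

def hypernet(phi, descriptor):
    """Personalized weights as a matrix-vector product: theta = phi @ e."""
    return [sum(w * e for w, e in zip(row, descriptor)) for row in phi]

random.seed(0)
DESC_DIM, THETA_DIM = 4, 6
phi = [[random.gauss(0, 1) for _ in range(DESC_DIM)] for _ in range(THETA_DIM)]

# Two clients with different descriptors obtain different personalized models
# from the same shared hypernetwork weights phi.
theta_1 = hypernet(phi, [1.0, 0.0, 0.0, 0.0])
theta_2 = hypernet(phi, [0.0, 1.0, 0.0, 0.0])
```

Information is shared across clients through `phi`, while personalization lives entirely in each client's descriptor.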
The training protocol of HyperNetFL (Alg. 1) is generally similar to that of typical FL (Appx. A). However, at each round, there are four differences compared with FedAvg [kairouz2019advances]: (1) There is no global model generated by the aggregation function in Eq. 3; (2) Each client receives its personalized weights from the HyperNetFL, computes the local gradient, and sends it to the server; (3) The server uses all the local gradients received from the participating clients to update the hypernetwork weights and the descriptors using the general update rules in Lines 9 and 10 (Alg. 1); and (4) The size of the HyperNetFL weights is significantly larger than that of a local model, causing extra computational cost at the server. This protocol is more general than using only one client per communication round as in [shamsian2021personalized]. By using a small batch of clients per communication round, we observe that we can enhance the performance of the HyperNetFL and make it more reliable.
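A toy version of one communication round, with a linear hypernetwork and simplified stand-ins for the update rules in Lines 9 and 10 of Alg. 1 (all names, the learning-rate handling, and the update order are illustrative assumptions):

```python
def hypernetfl_round(phi, descriptors, client_grad_fn, sampled, lr=0.01):
    """One HyperNetFL communication round on a toy linear hypernetwork
    (`client_grad_fn` stands in for a client's local training): the server
    generates theta_i = h(e_i; phi) for each sampled client, collects the
    local gradient, and uses it to update phi and the descriptor e_i."""
    for i in sampled:
        e = descriptors[i]
        theta = [sum(w * x for w, x in zip(row, e)) for row in phi]
        d_theta = client_grad_fn(i, theta)  # local gradient sent by client i
        # Chain rule for theta = phi @ e:
        #   dL/dphi[r][c] = d_theta[r] * e[c]
        #   dL/de[c]      = sum_r d_theta[r] * phi[r][c]
        for r in range(len(phi)):
            for c in range(len(e)):
                phi[r][c] -= lr * d_theta[r] * e[c]
        descriptors[i] = [
            e[c] - lr * sum(d_theta[r] * phi[r][c] for r in range(len(phi)))
            for c in range(len(e))
        ]
    return phi, descriptors
```

Note that each descriptor update touches only that client's descriptor, while every client's gradient contributes to the shared weights.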
Backdoor and Poisoning Attacks
Training-time poisoning attacks against ML and FL models can be classified into byzantine and backdoor attacks. In byzantine attacks, the adversarial goal is to degrade or severely damage the model's test accuracy [Biggio:2012:PAA; Nelson:2008:EML; Steinhardt:2017:CDD; MunozGonzalez:2017:TPD]. Byzantine attacks are relatively easy to detect by tracking model accuracy on validation data [ozdayi2020defending]. Meanwhile, in backdoor attacks, the adversarial goal is to cause model misclassification on a set of chosen inputs without affecting model accuracy on legitimate data samples. A well-known way to carry out backdoor attacks is using Trojans [gu2017badnets; liu2017trojaning]. A Trojan is a carefully crafted pattern, e.g., a brand logo or blank pixels, added to legitimate data samples to cause the desired misclassification. A recently developed image warping-based Trojan mildly deforms an image by applying a geometric transformation [nguyen2021wanet], making it unnoticeable to humans and able to bypass all well-known Trojan detection methods, such as Neural Cleanse [wang2019neural], Fine-Pruning [liu2018fine], and STRIP [gao2019strip]. The adversary applies the Trojan to legitimate data samples to activate the backdoor at inference time.
In FL, the training data is scattered across clients and the server only observes local gradients. Therefore, backdoor attacks are typically carried out by a small set of compromised clients fully controlled by an adversary, which construct malicious local gradients and send them to the server. The adversary can apply data poisoning (DPois) or model replacement to create malicious local gradients. In DPois [suciu2018does; li2016data], compromised clients train their local models on Trojaned datasets to construct malicious local gradients such that the aggregated model at the server exhibits the backdoor. DPois may take many training rounds to implant the backdoor into the aggregated model.
Meanwhile, in model replacement [bagdasaryan2020backdoor], the adversary constructs malicious local gradients such that the aggregated model at the server closely approximates, or is replaced by, a predefined Trojaned model. Model replacement is highly severe since it can be effective after only one training round [fang2020local].
To our knowledge, these attacks were not designed for HyperNetFL, in which there is no aggregated model (Eq. 3) at the server. This poses an unknown risk of backdoors through poisoning attacks in HyperNetFL.
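For intuition, a classic patch-style Trojan (in the spirit of BadNets, and simpler than the warping-based trigger used later in this paper) can be sketched as:

```python
def apply_trojan(image, trigger, value=1.0):
    """Stamp a small trigger pattern (e.g., blank pixels or a logo-like
    patch) into a legitimate image, yielding a Trojaned sample. The
    3-pixel corner trigger below is a hypothetical example."""
    out = [row[:] for row in image]  # copy so the clean sample is untouched
    for r, c in trigger:
        out[r][c] = value
    return out

clean = [[0.0] * 4 for _ in range(4)]
trigger_coords = [(0, 0), (0, 1), (1, 0)]  # hypothetical trigger location
trojaned = apply_trojan(clean, trigger_coords)
```

At inference time, the adversary stamps the same trigger onto legitimate inputs to activate the backdoor; the warping-based Trojan used in this paper replaces the visible patch with an imperceptible geometric deformation.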
3 Data Poisoning in HyperNetFL
We first consider both white-box and black-box model replacement threat models. Although unrealistic, the white-box setting (Appx. B) allows us to identify the upper-bound risk; meanwhile, the black-box setting informs the realistic risk in practice. Interested readers can refer to Appx. B for details on the white-box setting and the adaptation of model replacement attacks to HyperNetFL, called HNRepl. We present our black-box threat model and adapted attacks as follows.
Black-box Threat Model
At each round, an adversary fully controls a small set of compromised clients. The adversary cannot modify the training protocol of the HyperNetFL at the server or at the legitimate clients. The adversary's goal is to implant backdoors in all local models by minimizing a backdoor poisoning objective:
(1) 
where each term is the (backdoor) loss function of a client given Trojaned examples carrying the trigger [nguyen2021wanet], with each Trojaned sample relabeled to the targeted label. One can vary the proportion of Trojaned samples to optimize the attack performance. This black-box threat model is applied throughout this paper.
We found that HNRepl is infeasible in the black-box setting, since the hypernetwork weights and the descriptors are hidden from all the clients. Also, there is a lack of effective approaches to infer the (large) hypernetwork weights and the descriptors given a small number of compromised clients (Appx. B).
Data Poisoning (DPois) in HyperNetFL
To address the issues of HNRepl, we consider another fundamental approach: applying black-box DPois. The pseudocode of the attack is in Alg. 4 (Appx. C). At each round, the compromised clients receive their personalized model weights from the server. Then, they compute malicious local gradients using their Trojaned datasets, i.e., their legitimate data combined with Trojaned data samples, to minimize their local backdoor loss functions after a certain number of local SGD steps. All the malicious local gradients are sent to the server. If the HyperNetFL updates the model weights and the descriptors using them, the local model weights generated by the HyperNetFL will be Trojan-infected. This is because the update rules of the HyperNetFL become the gradient of an approximation to the Trojaned surrogate loss
(2) 
where the compromised clients' terms involve their optimal local Trojaned model weights, while the legitimate clients keep their associated legitimate loss functions.
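A toy sketch of how a compromised client derives its DPois gradient, using a 1-D model and a squared loss as illustrative stand-ins for the local backdoor loss:

```python
def dpois_local_update(theta, trojan_targets, lr=0.1, steps=3):
    """A compromised client's DPois step on a toy 1-D model (the squared
    loss toward Trojaned targets is an illustrative stand-in for the local
    backdoor loss): run a few SGD steps on the Trojaned data, then send the
    resulting weight delta to the server as the local gradient."""
    w = theta
    for _ in range(steps):
        grad = sum(2.0 * (w - t) for t in trojan_targets) / len(trojan_targets)
        w -= lr * grad
    return w - theta  # the malicious local "gradient" sent to the server
```

Note how fewer local SGD steps yield an update that stops farther from the Trojaned target, one facet of the consistency problem discussed next.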
Disadvantages of DPois
Obviously, the larger the number of compromised clients, the more effective the attack. Although more practical than HNRepl in poisoning HyperNetFL, black-box DPois has two issues: (1) the attack causes notable degradation in model utility on legitimate data samples; and (2) the attack requires a significant number of compromised clients to be successful. These disadvantages reduce the stealthiness and effectiveness of the attack, respectively.
The root cause of these issues is the lack of consistency in deriving the malicious local gradients across communication rounds and among compromised clients so as to outweigh the local gradients from legitimate clients. First, each malicious local gradient is derived after a small number of local SGD steps minimizing the local backdoor loss function, in which the local model weights and loss functions vary among compromised clients due to both their descriptors and their dissimilar local datasets. As a result, the (supposedly) Trojaned model weights differ among compromised clients. Second, a small number of local training steps (i.e., given limited computational power on the compromised clients) is not sufficient to approximate a good Trojaned model. The adversary can increase the number of local training steps if more computational power is available; however, there is still no guarantee that the resulting models will be alike without control over the dissimilar descriptors. Third, the local model weights change after every communication round and are heavily affected by the local gradients from legitimate clients. Consequently, the malicious local gradients derived across all the compromised clients do not synergistically optimize the approximation to the Trojaned surrogate loss function (Eq. 2) such that the outputs of the HyperNetFL are Trojan-infected.
Thus, developing a practical, stealthy, and effective backdoor attack in HyperNetFL is nontrivial and an open problem.
4 Model Transferring Attack
To overcome the lack of consistency in deriving the malicious local gradients in DPois and to avoid the sudden shifts in model utility on legitimate data samples in HNRepl, we propose a novel model transferring attack (HNTroj) against HyperNetFL.
In HNTroj (Alg. 2), our idea is to replace the locally derived Trojaned models with a single Trojaned model, shared across all the compromised clients and all communication rounds, to compute the malicious local gradients; each gradient is scaled by a dynamic learning rate randomly sampled from a specific distribution, e.g., a uniform distribution. In practice, the adversary can collect its own data sharing a similar distribution with the legitimate clients to locally train the Trojaned model. By doing so, we achieve several key advantages, as follows:
(1) The gradients become more effective in creating backdoors, since the single transferred model is a better-optimized Trojaned model than the ones derived locally in DPois.
(2) The gradients across compromised clients synergistically approximate the Trojaned surrogate loss (Eq. 2), closely aligning the outputs of the HyperNetFL to a unified Trojaned model regardless of the varying descriptors and dissimilar local datasets. The new Trojaned surrogate loss replaces each compromised client's locally derived Trojaned model in Eq. 2 with the unified Trojaned model.
(3) The gradients become stealthier, since updating the HyperNetFL with them significantly improves the model utility on legitimate data samples. This is because the Trojaned model has better utility on legitimate data samples than the local models of legitimate clients. More importantly, by keeping the random and dynamic learning rate known only to each compromised client, we prevent the server from tracking the Trojaned model or identifying suspicious behavior patterns from the compromised clients.
(4) Theorem 2 (Appx. E) shows that the norm distance between the local model of a compromised client generated by the HyperNetFL and the Trojaned model is bounded by a term depending on the closest round in which the compromised client participated before the current round and on a small error rate. When the HyperNetFL model converges, these terms become tiny, ensuring that the output of the HyperNetFL for the compromised client converges into a bounded, low-loss area surrounding the Trojaned model, imitating the model convergence behavior of legitimate clients.
Consequently, HNTroj requires a smaller number of compromised clients to be highly effective compared with DPois. Also, HNTroj is stealthier than (white-box) HNRepl and DPois, avoiding degradation and shifts in model utility on legitimate data samples during the whole poisoning process.
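The core of HNTroj's malicious gradient can be sketched as follows; the exact functional form and the uniform distribution for the dynamic learning rate `alpha` are assumptions for illustration:

```python
import random

def hntroj_gradient(theta_i, theta_troj, rng, alpha_range=(0.5, 1.5)):
    """Sketch of the HNTroj malicious local gradient (the form and the
    alpha distribution are assumptions): point the update from the received
    personalized weights theta_i toward a single pre-trained Trojaned model
    theta_troj, scaled by a random dynamic learning rate alpha that only the
    compromised client knows."""
    alpha = rng.uniform(*alpha_range)
    return [alpha * (t_star - t) for t, t_star in zip(theta_i, theta_troj)]

rng = random.Random(0)
theta_i = [0.2, -0.1, 0.4]     # weights received from the hypernetwork
theta_troj = [1.0, 0.0, 0.0]   # the unified Trojaned model
grad = hntroj_gradient(theta_i, theta_troj, rng)
```

Because every compromised client points at the same `theta_troj` in every round, the gradients are consistent by construction, while the per-client randomness of `alpha` obscures the shared target from the server.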
5 Robust HyperNetFL Training
In this section, we first investigate state-of-the-art defenses against backdoor poisoning in FL and point out the differences between FL and HyperNetFL. We then present our robust training approaches, adapted from existing defenses, for HyperNetFL against HNTroj.
Existing defense approaches against backdoor poisoning in ML can be categorized into two lines: (1) Trojan detection in the inference phase; and (2) robust aggregation to mitigate the impact of malicious local gradients in aggregation functions. In this paper, we applied the state-of-the-art warping-based Trojans, which bypass all the well-known Trojan detection methods, i.e., Neural Cleanse [wang2019neural], Fine-Pruning [liu2018fine], and STRIP [gao2019strip], in the inference phase. HNTroj does not affect the warping-based Trojans (Figs. 72 and 82, Appx. G), thus bypassing these detection methods. Therefore, we focus on identifying which robust aggregation approaches can be adapted to HyperNetFL and how.
Robust Aggregation
Several works have proposed robust aggregation approaches to deter byzantine attacks in typical FL, such as coordinate-wise median, geometric median, trimmed mean, or variants and combinations of these techniques [yin2018byzantine]. Recently proposed approaches include weight-clipping and noise addition with certified bounds, ensemble models, differential privacy (DP) optimizers, and adaptive and robust learning rates (RLR) across clients and at the server [hong2020effectiveness; ozdayi2020defending].
Despite their differences, existing robust aggregation approaches focus on analyzing and manipulating the local gradients, which share the global aggregated model as their common root. The fundamental assumption in these approaches is that the local gradients from compromised clients and from legitimate clients differ in magnitude and direction.
Robust FL Training vs. HyperNetFL
Departing from typical FL, the local gradients in HyperNetFL have different, personalized roots. Therefore, the local gradients in HyperNetFL may diverge in magnitude and direction within their own suboptimal spaces, making it challenging to adapt existing robust aggregation methods to HyperNetFL. More importantly, manipulating the local gradients alone can significantly affect the original update rules of the HyperNetFL, which are derived from the combination of the local gradients and the derivatives of the hypernetwork weights and descriptors given the hypernetwork's output. For instance, adapting the recently developed RLR [ozdayi2020defending] to the local gradients can degrade the model utility on legitimate data samples to a random-guess level on several benchmark datasets (Appx. D). In addition, the significantly large size of the hypernetwork introduces an expensive computational cost when adapting (statistics-based) robust aggregation approaches to HyperNetFL against HNTroj.
Robust HyperNetFL Training
Based on our observations, to avoid damaging the update rule of HyperNetFL, a suitable way to develop robust HyperNetFL training algorithms is to adapt existing robust aggregation to the per-client gradients of the hypernetwork weights. It is worth noting that we may not need to modify the update rule of the descriptors, since each descriptor update is personalized and does not affect the updates of any other descriptors or the hypernetwork weights.
Client-level DP Optimizer
Among robust training approaches against backdoor poisoning attacks, differential privacy (DP) optimizers and weight-clipping with noise addition can be adapted to defend against HNTroj. Since they share the same spirit, that is, clipping gradients from all clients before adding noise to their aggregation, we only consider DP optimizers in this paper, without loss of generality. Specifically, we consider a DP optimizer in which the weight updates are not excessively influenced by any single client's local gradient. By clipping every client's gradient of the hypernetwork weights under a predefined norm, we bound the influence of a single client's gradient on the model weights. To make the gradients indistinguishable, we add Gaussian noise with a predefined noise scale into the gradient aggregation. The pseudocode of our approach is in Alg. 5, Appx. F. We utilize this client-level DP optimizer as an effective defense against HNTroj. As in [hong2020effectiveness], we focus on how parameter configurations of the client-level DP optimizer defend against HNTroj with minimal utility loss, regardless of the privacy provided.
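A sketch of the client-level clipping and noise addition; the noise calibration below (std = noise_scale * clip_norm / n) is an assumption, not the paper's exact formula:

```python
import math
import random

def dp_aggregate(gradients, clip_norm, noise_scale, rng=None):
    """Client-level DP-style aggregation sketch: clip each client's
    phi-gradient to l2 norm clip_norm, average, then add Gaussian noise
    (std = noise_scale * clip_norm / n is an assumed calibration)."""
    rng = rng or random.Random(0)
    n = len(gradients)
    clipped = []
    for g in gradients:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in g])
    dim = len(gradients[0])
    avg = [sum(g[d] for g in clipped) / n for d in range(dim)]
    std = noise_scale * clip_norm / n
    return [a + rng.gauss(0, std) for a in avg]
```

The clipping bounds how far any single client, malicious or not, can push the hypernetwork weights in one round, and the noise blurs the remaining differences between clients' contributions.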
Trimmed Norm
In addition, among robust aggregation approaches against byzantine attacks, median-based approaches, trimmed mean, and their variants [pillutla2019robust; guerraoui2018hidden] can be adapted to HyperNetFL against HNTroj by applying them to the gradients of the hypernetwork weights. Without loss of generality, we adapt the well-applied trimmed mean approach [yin2018byzantine] to HyperNetFL to eliminate potentially malicious gradients. The adapted algorithm needs to be less computationally hungry in order to work efficiently with the large size of the hypernetwork. Therefore, instead of examining each element of the gradient as in trimmed mean, we trim the top and bottom fractions of the gradients that have the highest and lowest magnitudes, respectively, quantified by an l2 norm. The remaining gradients after trimming are used to update the HyperNetFL model weights. The descriptors are updated normally. The pseudocode of the trimmed norm for HyperNetFL is in Alg. 6, Appx. F.
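The trimmed norm can be sketched as follows (names and the handling of the trimming fraction `beta` are illustrative):

```python
import math

def trimmed_norm(gradients, beta):
    """Trimmed-norm sketch: rank clients' phi-gradients by l2 norm, drop the
    top and bottom beta fractions, and average the rest. Ranking whole
    gradients by one norm is far cheaper than coordinate-wise trimmed mean
    for a large hypernetwork."""
    k = int(len(gradients) * beta)  # number trimmed from each end
    ranked = sorted(gradients, key=lambda g: math.sqrt(sum(x * x for x in g)))
    kept = ranked[k:len(ranked) - k] if k > 0 else ranked
    dim = len(kept[0])
    return [sum(g[d] for g in kept) / len(kept) for d in range(dim)]
```

Gradients with outlying magnitudes, the expected signature of malicious updates, are discarded wholesale rather than element by element.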
Regarding other approaches, including robustness bounds against backdoor attacks [jia2020intrinsic; xie2021crfl], our detailed analysis of why they cannot be trivially adapted to HyperNetFL is in Appx. F.
6 Experimental Results
We focus on answering the following three questions in our evaluation: (1) Is HNTroj more effective than DPois in HyperNetFL? (2) What percentage of compromised clients is required for an effective attack? and (3) Is it possible to defend against HNTroj, and what are the costs and limitations of such defenses?
[Figure 4: Legitimate ACC and backdoor SR of each attack and the clean model over different numbers of compromised clients in the CIFAR10 dataset. A complete version is in Appx. G. Fig. 4a has the same legend as Fig. 4b.]
Data and Model Configuration
We conduct extensive experiments on the CIFAR10 [krizhevsky2009learning] and Fashion-MNIST [xiao2017fashion] datasets. To generate a non-iid data distribution across clients in terms of classes and sizes of local training data, we randomly sample two classes for each client, with a sampling rate following a predefined distribution. We use a fixed number of clients and one targeted class (Eq. 4) in each dataset. We divide each dataset into three non-overlapping sets: samples for testing, samples for training the Trojaned model, and the rest for training. The distributions of classes and sizes of local training data are shown in Figs. 19 and 26 (Appx. G). For generating backdoor data, we use the image warping-based WaNet [nguyen2021wanet], one of the state-of-the-art backdoor attacks. We adopt the model configurations described in [shamsian2021personalized] for training the HyperNetFL and in [nguyen2021wanet] for generating backdoor images. In the DP optimizer, we vary the noise scale and the clipping norm. For the trimmed norm approach, we vary the trimming level. The complete details are in Appx. G.
Evaluation Approach
We carry out the validation through three approaches. First, we compare HNTroj with DPois and HNRepl in terms of legitimate accuracy (ACC) on legitimate data samples and backdoor success rate (SR) on Trojaned data samples over a wide range of numbers of compromised clients. Second, we investigate the effectiveness of the adapted robust HyperNetFL training algorithms, including the client-level DP optimizer and the trimmed norm, under a variety of hyperparameter settings against HNTroj. Third, building on these results, we provide a performance summary of both attacks and defenses to map the surface of backdoor risks in HyperNetFL. The (average) legitimate ACC and backdoor SR across clients on testing data are computed as follows:
where a prediction on a Trojaned sample counts as a success if it matches the targeted label and as a failure otherwise, normalized by the number of testing samples in each client.
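In code, the two metrics reduce to accuracy computations against different label sets (a hypothetical illustration, not the paper's evaluation harness):

```python
def accuracy(preds, labels):
    """Fraction of correct predictions (computed per client, then
    averaged across clients in the paper's metrics)."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Legitimate ACC: predictions on clean test samples vs. ground-truth labels.
# Backdoor SR: predictions on Trojaned samples vs. the adversary's target.
clean_preds, clean_labels = [0, 1, 2, 1], [0, 1, 2, 2]
troj_preds, target = [7, 7, 3, 7], 7
acc = accuracy(clean_preds, clean_labels)
sr = accuracy(troj_preds, [target] * len(troj_preds))
```

A stealthy attack keeps `acc` near the clean model's value while driving `sr` toward 1.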
HNTroj vs. DPois and White-box HNRepl
Figs. 4 and 30 (Appx. G) present the legitimate ACC and backdoor SR of each attack and the clean model (i.e., trained without poisoning) as functions of the communication round and the number of compromised clients in a defense-free environment on the CIFAR10 dataset. HNTroj significantly outperforms DPois: it requires a notably smaller number of compromised clients to successfully backdoor the HyperNetFL with a high backdoor SR, without an undue cost in legitimate ACC.
In addition, HNTroj does not introduce degradation or sudden shifts in legitimate ACC during the training process, regardless of the number of compromised clients, making it stealthier than DPois and (white-box) HNRepl. This is because we consistently poison the HyperNetFL training with a relatively good Trojaned model, achieving high legitimate ACC and backdoor SR while addressing the inconsistency in deriving the malicious local gradients. There is a small average gap in legitimate ACC between HNTroj and the clean model; however, this gap will not be noticed by the server, since the clean model is invisible to the server when compromised clients are present.
HNTroj vs. Trimmed Norm
Since HNTroj outperforms the other poisoning attacks, we now focus on understanding its performance under robust HyperNetFL training. Fig. 5 shows the performance of the trimmed norm against HNTroj as a function of the number of compromised clients. There are three key observations: (1) Applying the trimmed norm does reduce the backdoor SR, especially when the number of compromised clients is small. However, when the number of compromised clients is slightly larger, the backdoor SR remains at highly severe levels regardless of a wide range of trimming levels; (2) The larger the trimming level, the lower the backdoor SR tends to be. This comes at a toll on the legitimate ACC, which is notably reduced for larger trimming levels, clearly highlighting a nontrivial trade-off between legitimate ACC and backdoor SR given attacks and defenses; and (3) The more compromised clients there are, the better the legitimate ACC is when the trimming level is large. This is because training with the Trojaned model, which has a relatively good legitimate ACC, can mitigate the damage of large trimming levels to the legitimate ACC. In fact, a larger number of compromised clients implies a better probability for the compromised clients to sneak through the trimming, thus improving both legitimate ACC and backdoor SR.
HNTroj vs. Client-level DP Optimizer
We observe a similar phenomenon when applying the client-level DP optimizer as a defense against HNTroj (Fig. 41, Appx. G). First, using small noise scales, the client-level DP optimizer is effective in defending against HNTroj when the number of compromised clients is small, achieving a low average backdoor SR while maintaining an acceptable average legitimate ACC. When the number of compromised clients is slightly larger, the defense either pays a notably large toll on the legitimate ACC or fails to reduce the backdoor SR. This is consistent with our analysis: a sufficiently large, though still small, set of compromised clients can synergistically and consistently pull the outputs of the HyperNetFL into the area surrounding the Trojaned model.
Backdoor Risk Surface: Attacks and Defenses
The trade-off between legitimate ACC and backdoor SR is nontrivially observable across many attack and defense configurations. To better map the surface of backdoor risks, we examine a fundamental question: “What can the adversary or the defender achieve given a specific number of compromised clients?”
We answer this question by summarizing the best defense performance and the most stealthy and severe backdoor risk across hyperparameter settings in the same diagram. Given a number of compromised clients and a robust training algorithm, the best defense performance, which maximizes both (1) the legitimate ACC (ACC for short) and (2) the gap between the legitimate ACC and the backdoor SR (SR for short), is identified across the algorithm's hyperparameter space as follows:
where the ACC and SR are measured under a specific hyperparameter configuration. Similarly, we identify the most stealthy and severe backdoor risk, which maximizes both the legitimate ACC and the backdoor SR:
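Both selection criteria are argmax operations over the hyperparameter space; a sketch assuming (for illustration) an equal weighting of the two objectives:

```python
def best_defense(results):
    """Pick the hyperparameter configuration maximizing legitimate ACC plus
    the ACC-SR gap (the equal weighting of the two objectives is an
    assumption for this sketch)."""
    return max(results, key=lambda r: r["acc"] + (r["acc"] - r["sr"]))

def worst_case_attack(results):
    """Pick the configuration maximizing both legitimate ACC and backdoor
    SR, i.e., the most stealthy and severe risk."""
    return max(results, key=lambda r: r["acc"] + r["sr"])
```

Each `results` entry would hold the measured ACC and SR for one hyperparameter configuration of a given robust training algorithm.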
Fig. 8 summarizes the best performance of both defenses and attacks on the CIFAR10 dataset as a function of the number of compromised clients throughout the hyperparameter space. For instance, given a certain number of compromised clients, using the client-level DP optimizer, the best defense can reduce the backdoor SR at the cost of a drop in legitimate ACC; meanwhile, against a weak configuration of the same defense, the adversary can increase the backdoor SR without sacrificing much legitimate ACC. From Figs. 8a-b, we observe that the trimmed norm is slightly more effective than the client-level DP optimizer, exhibiting a wider gap between legitimate ACC and backdoor SR. From the adversary's angle, to ensure the success of HNTroj regardless of the defenses, the adversary needs at least a certain minimum number of compromised clients.
Results on the Fashion MNIST dataset
The results on the Fashion-MNIST dataset further strengthen our observations. DPois even fails to implant backdoors into HyperNetFL (Figs. 34 and 55, Appx. G). This is because the HyperNetFL model converges faster than on the CIFAR10 dataset, given the simplicity of the Fashion-MNIST dataset, significantly reducing the poisoning opportunities of a small set of compromised clients participating in the training. Thanks to the consistency in deriving malicious local gradients, HNTroj is still highly effective in the defense-free environment. However, we find that the client-level DP optimizer can significantly mitigate HNTroj due to the model's fast convergence; meanwhile, the trimmed norm still fails to defend against HNTroj (Figs. 9, 12, 62).
7 Conclusion
We presented a black-box model transferring attack (HNTroj) to implant backdoors into HyperNetFL. We overcome the lack of consistency in deriving malicious local gradients to efficiently transfer a Trojaned model to the outputs of the HyperNetFL, and we multiply the malicious local gradients by a random and dynamic learning rate to make the attack stealthy. To defend against HNTroj, we adapted several robust FL training algorithms to HyperNetFL. Extensive experimental results show that HNTroj outperforms black-box DPois and white-box HNRepl, bypassing the adapted robust training algorithms with small numbers of compromised clients.
References
 (1) GDPR, “The European data protection regulation,” https://gdprinfo.eu/, 2018.
 (2) B. McMahan, E. Moore, et al., “Communicationefficient learning of deep networks from decentralized data,” in AISTATS, 2017.
 (3) H. Zhu, J. Xu, S. Liu, and Y. Jin, “Federated learning on non-iid data: A survey,” arXiv preprint arXiv:2106.06843, 2021.
 (4) A. Shamsian, A. Navon, E. Fetaya, and G. Chechik, “Personalized federated learning using hypernetworks,” ICML, 2021.
 (5) P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, et al., “Advances and open problems in federated learning,” arXiv preprint arXiv:1912.04977, 2019.
 (6) E. Bagdasaryan, A. Veit, Y. Hua, D. Estrin, and V. Shmatikov, “How to backdoor federated learning,” in AISTATS, 2020.
 (7) A.N. Bhagoji, S. Chakraborty, P. Mittal, and S. Calo, “Analyzing federated learning through an adversarial lens,” in ICML, 2019, pp. 634–643.
 (8) D. Ha, A. Dai, and Q.V. Le, “Hypernetworks,” ICLR, 2016.

 (9) B. Biggio, B. Nelson, and P. Laskov, “Poisoning attacks against support vector machines,” in ICML, 2012, pp. 1467–1474.
 (10) B. Nelson, M. Barreno, F.J. Chi, A.D. Joseph, B.I.P. Rubinstein, U. Saini, C. Sutton, J.D. Tygar, and K. Xia, “Exploiting machine learning to subvert your spam filter,” in USENIX Workshop, 2008, pp. 7:1–7:9.
 (11) J. Steinhardt, P.W. Koh, and P. Liang, “Certified defenses for data poisoning attacks,” in NeurIPS, 2017, pp. 3520–3532.
 (12) L. Muñoz-González, B. Biggio, A. Demontis, A. Paudice, V. Wongrassamee, et al., “Towards poisoning of deep learning algorithms with back-gradient optimization,” in AISEC, 2017.
 (13) M.S. Ozdayi, M. Kantarcioglu, and Y.R. Gel, “Defending against backdoors in federated learning with robust learning rate,” AAAI, 2021.
 (14) T. Gu, B. Dolan-Gavitt, and S. Garg, “Badnets: Identifying vulnerabilities in the machine learning model supply chain,” Machine Learning and Computer Security Workshop, 2017.
 (15) Y. Liu, S. Ma, Y. Aafer, W.C. Lee, J. Zhai, W. Wang, and X. Zhang, “Trojaning attack on neural networks,” 2017.
 (16) T.A. Nguyen and A.T. Tran, “WaNet - imperceptible warping-based backdoor attack,” in ICLR, 2021.
 (17) B. Wang, Y. Yao, S. Shan, H. Li, B. Viswanath, H. Zheng, and B.Y. Zhao, “Neural cleanse: Identifying and mitigating backdoor attacks in neural networks,” in SP, 2019, pp. 707–723.
 (18) K. Liu, B. Dolan-Gavitt, and S. Garg, “Fine-pruning: Defending against backdooring attacks on deep neural networks,” in RAID, 2018, pp. 273–294.
 (19) Y. Gao, C. Xu, D. Wang, S. Chen, D.C. Ranasinghe, and S. Nepal, “Strip: A defence against trojan attacks on deep neural networks,” in ACSAC, 2019.
 (20) O. Suciu, R. Marginean, Y. Kaya, H. Daume III, and T. Dumitras, “When does machine learning fail? generalized transferability for evasion and poisoning attacks,” in USENIX, 2018, pp. 1299–1316.
 (21) B. Li, Y. Wang, A. Singh, and Y. Vorobeychik, “Data poisoning attacks on factorization-based collaborative filtering,” NeurIPS, vol. 29, pp. 1885–1893, 2016.
 (22) M. Fang, X. Cao, J. Jia, and N. Gong, “Local model poisoning attacks to Byzantine-robust federated learning,” in USENIX, 2020, pp. 1605–1622.
 (23) D. Yin, Y. Chen, R. Kannan, and P. Bartlett, “Byzantine-robust distributed learning: Towards optimal statistical rates,” in ICML, 2018, pp. 5650–5659.
 (24) S. Hong, V. Chandrasekaran, Y. Kaya, T. Dumitraş, and N. Papernot, “On the effectiveness of mitigating data poisoning attacks with gradient shaping,” arXiv preprint arXiv:2002.11497, 2020.
 (25) K. Pillutla, S.M. Kakade, and Z. Harchaoui, “Robust aggregation for federated learning,” arXiv preprint arXiv:1912.13445, 2019.
 (26) R. Guerraoui, S. Rouault, et al., “The hidden vulnerability of distributed learning in byzantium,” in ICML, 2018, pp. 3521–3530.
 (27) J. Jia, X. Cao, and N.Z. Gong, “Intrinsic certified robustness of bagging against data poisoning attacks,” AAAI, 2021.
 (28) C. Xie, M. Chen, P.Y. Chen, and B. Li, “CRFL: Certifiably robust federated learning against backdoor attacks,” ICML, 2021.
 (29) A. Krizhevsky et al., “Learning multiple layers of features from tiny images,” 2009.
 (30) H. Xiao, K. Rasul, and R. Vollgraf, “Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms,” arXiv preprint arXiv:1708.07747, 2017.
 (31) Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
Appendix A Federated Learning Protocol
We consider the following FL protocol: at round $t$, the server sends the latest model weights $\theta^t$ to a randomly sampled subset of clients $S^t$. Upon receiving $\theta^t$, each client $i \in S^t$ uses $\theta^t$ to train its local model for some number of iterations, e.g., via stochastic gradient descent (SGD), resulting in model weights $\theta^t_i$. The client computes its local gradient $\Delta^t_i = \theta^t_i - \theta^t$ and sends it back to the server. After receiving the local gradients from all the clients in $S^t$, the server updates the model weights by aggregating the local gradients with an aggregation function $G(\{\Delta^t_i\}_{i \in S^t})$, where $K = |S^t|$ is the size of $S^t$. The aggregated gradient is added to $\theta^t$, that is, $\theta^{t+1} = \theta^t + \eta\, G(\{\Delta^t_i\}_{i \in S^t})$, where $\eta$ is the server's learning rate. A typical aggregation function is weighted averaging, i.e., Federated Averaging (FedAvg), applied in many papers in FL kairouz2019advances, presented as follows:
(3) $G(\{\Delta^t_i\}_{i \in S^t}) = \sum_{i \in S^t} \frac{n_i}{\sum_{j \in S^t} n_j} \Delta^t_i$, where $n_i$ is the number of training samples of client $i$.
When the number of training samples is hidden from the server, one can use an unweighted aggregation function: $G(\{\Delta^t_i\}_{i \in S^t}) = \frac{1}{K} \sum_{i \in S^t} \Delta^t_i$.
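The two aggregation rules above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function names and the flat-vector representation of model weights are our own assumptions.

```python
import numpy as np

def fedavg(global_w, local_grads, n_samples, lr=1.0):
    """Weighted FedAvg: theta^{t+1} = theta^t + lr * sum_i (n_i / sum_j n_j) * delta_i."""
    total = sum(n_samples)
    agg = sum((n / total) * g for g, n in zip(local_grads, n_samples))
    return global_w + lr * agg

def fedavg_unweighted(global_w, local_grads, lr=1.0):
    """Unweighted variant, used when the sample counts n_i are hidden from the server."""
    return global_w + lr * np.mean(local_grads, axis=0)
```

With equal sample counts the two rules coincide; with skewed counts the weighted rule pulls the update toward the larger clients.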
Appendix B Whitebox Model Replacement Attacks in HyperNetFL
Whitebox Threat Model
At round $t$, an adversary fully controls a compromised client $c$, and has access to the HyperNetFL weights $\phi^t$, all the descriptors $\{z_i\}_{i \in [N]}$, and the local gradients of all the clients. The adversary cannot modify the training protocol of the HyperNetFL at the server or at legitimate clients. The adversary tries to open backdoors in all local models by minimizing a backdoor poisoning objective:
(4) $\min_{\phi} \sum_{i \in [N]} \mathcal{L}^{bd}_i\big(h(\phi, z_i)\big)$
where $h(\phi, z_i)$ is the local model the HyperNetFL generates for client $i$, and $\mathcal{L}^{bd}_i$ is the (backdoor) loss function of client $i$ given Trojaned examples stamped with the trigger, e.g., the loss $\ell\big(h(\phi, z_i)(x^{bd}), y^{bd}\big)$, where $y^{bd}$ is the targeted label for the Trojaned sample $x^{bd}$. One can vary the portion of Trojaned samples to optimize the attack performance.
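Crafting the Trojaned examples that feed this loss amounts to stamping a trigger onto a fraction of each batch and relabeling those samples to the target class. A minimal sketch follows; the 3x3 corner patch, the poison fraction, and the function name are illustrative assumptions, not the trigger used in the paper.

```python
import numpy as np

def make_trojaned_batch(images, labels, target_label, poison_frac=0.5, patch_value=1.0):
    """Stamp a trigger on a fraction of the batch and relabel to the target class.

    The bottom-right 3x3 patch and the 50% poison fraction are illustrative choices.
    """
    images = images.copy()
    labels = labels.copy()
    n_poison = int(len(images) * poison_frac)
    for k in range(n_poison):
        images[k, -3:, -3:] = patch_value  # trigger: bottom-right 3x3 patch
        labels[k] = target_label           # targeted (dirty-label) backdoor
    return images, labels
```

The backdoor loss is then the ordinary training loss evaluated on such a poisoned batch.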
When the compromised client $c$ is selected for training at a round $t$, given the observed knowledge, the adversary's goal is to replace the HyperNetFL weights $\phi^t$ with a poisoned $\phi^*$ by computing and sending a malicious local gradient $\Delta\phi_c$ to the server, such that the HyperNetFL returns a Trojaned model given any of the descriptors $\{z_i\}_{i \in [N]}$. That means all the local model weights generated by the HyperNetFL for every client are Trojan-infected, i.e., $\forall i \in [N]: h(\phi^*, z_i) \approx \theta^*$, where $\theta^*$ denotes the Trojaned model. In practice, the adversary can collect its own data that shares a similar distribution with legitimate clients to locally train the Trojaned model $\theta^*$. To find $\phi^*$, the adversary minimizes the following objective:
(5) $\phi^* = \arg\min_{\phi} \sum_{i \in [N]} \big\| h(\phi, z_i) - \theta^* \big\|_2^2$
Given $\phi^*$, the adversary computes the exact malicious local gradient $\Delta\phi_c$ (Eq. 6), where the legitimate clients' local gradients can be computed by having access to $\phi^t$ and the descriptors $\{z_i\}_{i \in [N]}$. When the server updates $\phi^t$ using the malicious local gradient $\Delta\phi_c$ together with all the other local gradients from legitimate clients, $\phi^t$ will be replaced with the poisoned $\phi^*$. As a result, the adversary can activate backdoors. Note that replacing $\phi^t$ with $\phi^*$ (Eq. 5) achieves the backdoor poisoning objective in Eq. 4.
The pseudocode of HNRepl and its correctness proof are in Alg. 3 and Theorem 1. The attack success rate of this whitebox model replacement is very high right after the replacement occurs (as shown in our experiments). However, it is impractical for the adversary to gain access to the HyperNetFL weights $\phi^t$, all the descriptors $\{z_i\}_{i \in [N]}$, and the local gradients of all the clients; these quantities are hidden from all the clients during the whole training process of a HyperNetFL. To capture a more realistic risk, we consider the following practical blackbox threat model.
Blackbox Threat Model
At round $t$, an adversary fully controls a small set of compromised clients $C$. The adversary cannot modify the training protocol of the HyperNetFL at the server or at the legitimate clients. The adversary's goal is to implant backdoors in all local models $\{h(\phi, z_i)\}_{i \in [N]}$.
To adapt the model replacement attack to the blackbox setting, the adversary performs the following two steps: (1) leverage the small set of compromised clients $C$ to infer $\phi^t$ and the descriptors by training a local neural network $\hat{h}(\hat{\phi}, \cdot)$ imitating the behavior of the HyperNetFL; and (2) compute the malicious local gradient $\Delta\phi_c$ using the legitimate data collected by the compromised clients, where $\phi^*$ is now identified through the local imitation $\hat{h}(\hat{\phi}, \cdot)$ instead of the true HyperNetFL as in Eq. 5: $\phi^* = \arg\min_{\phi} \sum_{i \in C} \big\| \hat{h}(\phi, \hat{z}_i) - \theta^* \big\|_2^2$.
However, we found that this blackbox model replacement attack is not effective, since approximating the HyperNetFL with a small number of compromised clients is infeasible given the large size of $\phi$. In addition, we also found that model replacement attacks usually introduce sudden shifts in model utility on legitimate data samples, reducing the attacks' stealthiness in poisoning HyperNetFL.
Theorem 1.
At a communication round $t$, the compromised client $c$ is able to substitute the HyperNetFL weights $\phi^t$ with the Trojaned $\phi^*$ by submitting a malicious gradient as follows:
(6) $\Delta\phi_c = \frac{K}{\eta}\big(\phi^* - \phi^t\big) - \sum_{i \in S^t \setminus \{c\}} \Delta\phi_i$
Proof.
At round $t$, the server sends the latest model weights $\theta^t_i = h(\phi^t, z_i)$ generated from the current HyperNetFL to a randomly sampled subset of clients $S^t$. Upon receiving $\theta^t_i$, each legitimate client $i \in S^t$ uses $\theta^t_i$ to train its local model, and sends the resulting gradient back to the server. Meanwhile, the compromised client $c$ attempts to substitute the global HyperNetFL weights $\phi^t$ with the poisoned $\phi^*$ that generates the Trojaned model $\theta^*$. In other words, the compromised client needs to submit a gradient $\Delta\phi_c$ satisfying the following condition:
(7) $\phi^* = \phi^t + \frac{\eta}{K}\Big(\Delta\phi_c + \sum_{i \in S^t \setminus \{c\}} \Delta\phi_i\Big)$
The compromised client can solve Eq. 7 for the gradient it needs to submit as follows:
(8) $\Delta\phi_c = \frac{K}{\eta}\big(\phi^* - \phi^t\big) - \sum_{i \in S^t \setminus \{c\}} \Delta\phi_i$
Consequently, Theorem 1 holds. ∎
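The substitution in Theorem 1 can be checked numerically. The sketch below assumes the unweighted averaging of Appendix A over $K$ sampled clients with server learning rate $\eta$; the function names and the flat-vector view of $\phi$ are illustrative assumptions.

```python
import numpy as np

def replacement_gradient(phi_t, phi_poisoned, other_grads, eta):
    """Gradient the compromised client submits so the aggregated update lands
    exactly on the poisoned weights phi* (whitebox: the other clients'
    gradients are known). Assumes unweighted averaging over K sampled clients."""
    K = len(other_grads) + 1  # compromised client plus the legitimate ones
    return (K / eta) * (phi_poisoned - phi_t) - np.sum(other_grads, axis=0)

def server_update(phi_t, grads, eta):
    """Server's unweighted-averaging update of the HyperNetFL weights."""
    return phi_t + (eta / len(grads)) * np.sum(grads, axis=0)
```

Feeding the malicious gradient into the server update together with the legitimate gradients cancels them out and yields exactly the poisoned weights, mirroring Eqs. 7 and 8.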
Appendix C Backdoor Data Poisoning in HyperNetFL
Appendix D Robust Learning Rate in HyperNetFL
We attempt to adapt the recently developed robust learning rate (RLR) ozdayi2020defending over all dimensions of the local gradients in HyperNetFL. For RLR to move the model towards a particular direction along a given dimension, it requires a sufficient number of votes (larger than a predefined threshold $\tau$), in the form of signs of the local gradients. For every dimension $k$ where the magnitude of the sum of the signs of the local gradients is less than $\tau$, the learning rate is multiplied by $-1$ to maximize the loss on that dimension. At round $t$, the learning rate for the $k$-th dimension of the local gradients at the server is given by
(9) $\eta^t_k = \begin{cases} \eta & \text{if } \big|\sum_{i \in S^t} \mathrm{sgn}(\Delta\phi_{i,k})\big| \geq \tau \\ -\eta & \text{otherwise} \end{cases}$
The per-dimension learning rate $\eta^t_k$ is then applied when aggregating the local HyperNetFL gradients, revising the update rules (Lines 9 and 10, Alg. 1) accordingly.
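The per-dimension sign-voting rule of Eq. 9 can be sketched as follows. This is a minimal illustration assuming unweighted averaging over flat gradient vectors; the function name and threshold argument are our own.

```python
import numpy as np

def robust_lr_update(phi_t, local_grads, eta, tau):
    """Per-dimension robust learning rate (RLR): dimensions whose sign votes
    |sum_i sgn(grad_{i,k})| fall below the threshold tau get their learning
    rate flipped to -eta, maximizing the loss along suspicious directions."""
    grads = np.stack(local_grads)               # shape: (num_clients, dim)
    votes = np.abs(np.sign(grads).sum(axis=0))  # per-dimension sign agreement
    lr = np.where(votes >= tau, eta, -eta)      # Eq. 9, applied elementwise
    return phi_t + lr * grads.mean(axis=0)
```

Dimensions on which the sampled clients agree keep the normal learning rate, while contested dimensions, where a backdoor gradient may be hiding, are pushed in the opposite direction.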