1 Introduction
Federated learning (FL) McMahan et al. (2017, 2016); Yang et al. (2019); Blanco-Justicia et al. (2021); Zhang et al. (2021); Jiang et al. (2021a); Wang et al. (2021), a type of distributed machine learning Jiang et al. (2022); Chen (2022), has been proposed to train a global model in which clients upload local model parameters, such as gradients, to the server without sharing their private data. Given its significant advantage in privacy preservation, FL has been applied to various data-sensitive practical applications, e.g., loan prediction Long et al. (2020); Shingi (2020), health assessment Xu et al. (2021); Kuo and Pham (2022) and next-word prediction Hard et al. (2018); Yang et al. (2018).
In a traditional FL system, each client is supposed to contribute its own data for global model training. As a reward, the client has the privilege to use the final trained global model. In other words, the server usually distributes the final trained global model to every client, regardless of its contribution. This leads to the free-rider attack Lin et al. (2019); Zhao et al. (2022), in which clients that contribute nothing can still obtain the high-quality global model. These clients are called free-riders. In general, the free-rider issue arises in any shared-resource environment where a participant enjoys the benefits of the environment without contributing to it. This issue is well studied in several areas, e.g., stock markets Liu and Ma (2022), transport Jiang et al. (2021b) and distributed systems Li et al. (2022).
In this paper, we study the free-rider attack in the FL system. Several existing works address the free-rider issue in FL, mainly from two aspects: outlier detection Lin et al. (2019); Zong et al. (2018); Jiang et al. (2016) of model parameters and clients’ contribution evaluation Huang et al. (2022); Gao et al. (2022). STD-DAGMM Lin et al. (2019) is a typical outlier detection method. It is deployed on the server as a deep autoencoding Gaussian mixture model, which detects free-riders as outliers through the learned features of model update parameters. However, it requires enough benign clients to pre-train the autoencoder. Additionally, model updates are easy for free-riders to disguise. Notably, it is difficult to distinguish free-riders from benign clients once the proportion of free-riders becomes large. CFFL Lyu et al. (2020) is a defense approach in which the server evaluates the contribution of each client based on a validation dataset. However, it relies on the strong assumption that the server has enough validation data in real-world FL scenarios. An advanced free-rider can adopt camouflage that has little effect on the model accuracy, so its contribution will not decrease rapidly. As a result, the free-rider can still obtain the global model, rendering the defense invalid. RFFL Xu and Lyu (2021) proposes that the server evaluate each client’s contribution based on the cosine similarity between the global model gradient and the local model gradient, which may be less effective when clients’ data are non-independent and identically distributed (Non-IID) Zhu et al. (2021a, b).
The existing defense methods against free-rider attacks still face three challenges: 1) defending against advanced camouflaged free-riders, 2) handling scenarios with many free-riders (more than 50% of clients), and 3) balancing the main task performance and the defense effect. To overcome these challenges, we reconsider the difference between benign clients and free-riders during the dynamic training process. Surprisingly, we observe that free-riders can use the global model aggregated and distributed by the server to disguise their model weights as those of benign clients, but cannot disguise the process of model weight optimization. The reason is that free-riders do not perform normal training, so their weights cannot evolve as efficiently as those of benign clients. Therefore, we leverage the model evolving information to identify free-riders. We define the evolving frequency of model weights, a statistic that does not involve private information and that records the model weights whose values vary drastically, to measure the difference between free-riders and benign clients.

We visualize the clients’ weight evolving frequency in the following example for illustration purposes. An MLP model Tolstikhin et al. (2021) with two fully connected layers and one softmax output layer is trained with FL on the ADULT dataset Kohavi (1996). In this example, there are five clients: four benign clients and one free-rider. The free-rider executes, in turn, the ordinary attack Lin et al. (2019), the stochastic perturbations attack Fraboni et al. (2021), the random weight attack Lin et al. (2019) and the delta weight attack Lin et al. (2019). The resulting weight evolving frequencies are shown in Fig.1. We observe that during the training process, the weight evolving frequencies of different benign clients are similar, while there is a significant difference between the free-rider and the benign clients, especially for the ordinary attack, the stochastic perturbations attack and the random weight attack. Although the weight evolving frequencies under the delta weight attack are similar to those of the benign clients, it is worth noting that the scales are different.
Inspired by the difference we observed between free-riders and benign clients during the FL training process, we propose a defense method based on Weight Evolving Frequency, referred to as WEF-Defense. Specifically, we define the weight evolving frequency matrix (WEF-Matrix) to record the weight evolving frequency of the penultimate layer of the model. WEF-Defense calculates the variation of each weight between two consecutive rounds of local training, and takes the average of the overall variation as a dynamic threshold to evaluate the evolving frequency of all weights. Each client uploads its local model’s WEF-Matrix to the server together with its model weights. The server then distinguishes free-riders from benign clients based on the Euclidean distance, cosine similarity and overall average frequency of the WEF-Matrix among clients. The server aggregates and distributes different global models to benign clients and free-riders, based only on their evolving frequency differences. In this way, the global model obtained by free-riders does not contain model weights contributed by the benign clients, which prevents free-riders from stealing the trained high-quality model.
The main contributions of this paper are summarized as follows.
- We first observe that the dynamic information during FL local training differs between benign clients and free-riders. We highlight the potential of using the model weight evolving frequency during training to detect free-riders.
- Inspired by this observation, we propose WEF-Defense. We design the WEF-Matrix to collect the model weight evolving frequency during each client’s training process and use it as an effective means of detecting free-riders.
- To address the free-rider attack when the majority of clients are free-riders, i.e., 50% or even up to 90%, WEF-Defense adopts a personalized model aggregation strategy Tan et al. (2021) to defend against the attack at an early training stage.
- Extensive experiments on five datasets and five models have been conducted. The results show that WEF-Defense achieves better defense effectiveness than the state-of-the-art (SOTA) baselines and identifies free-riders at an earlier stage of training. Besides, it is also effective against an adaptive attack. We further provide weight visualizations to interpret its effectiveness.
The rest of the paper is organized as follows. Related works are discussed in Section 2. The preliminaries, problem statement and methodology are detailed in Sections 3, 4 and 5, respectively. The experimental setup and analysis are presented in Sections 6 and 7, respectively. Finally, we discuss the limitations in Section 8 and conclude in Section 9.
2 Related Work
In this section, we review the related work and briefly summarize attack and defense methods used as baselines in the experiments.
2.1 Free-Rider Attacks on Federated Learning
According to the attacker’s camouflage tactics, free-rider attacks include the ordinary attack Lin et al. (2019), the random weight attack Lin et al. (2019), the stochastic perturbations attack Fraboni et al. (2021) and the delta weight attack Lin et al. (2019).
The ordinary attack Lin et al. (2019) is a primitive attack without camouflage, where the malicious client does not have any local data, i.e., it does not perform local training. By participating in FL training, it obtains the global model issued by the server. Building on it, the random weight attack Lin et al. (2019) builds a gradient update matrix by randomly sampling each value from a uniform distribution over a given range. However, it only works well if an ideal range is selected in advance. Besides, randomly generated weights generally cannot imitate the benign clients’ model weights well enough to guarantee good attack performance. The stochastic perturbations attack Fraboni et al. (2021) is a covert free-rider attack that uploads crafted model weights by adding specific noise to the distributed global model. In this way, it is difficult for the server to effectively detect the free-riders. Compared with the previous attacks, the delta weight attack Lin et al. (2019) submits a crafted update to the server by calculating the difference between the last two global models it received. Note that in machine learning training, except for the first few epochs, the weight variations at each round are small. Therefore, the crafted updates can be similar to the updates of the benign clients.
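The three camouflage strategies described above can be sketched as follows. This is only a minimal illustration on flat weight lists, not the reference implementation of the cited attacks: the function names, the symmetric range bound `r` and the noise scale `sigma` are hypothetical, and Gaussian noise is an assumption made for the stochastic perturbation.

```python
import random

def random_weight_update(size, r, rng):
    """Random weight attack: sample every value uniformly from [-r, r].
    The symmetric range is an assumption of this sketch."""
    return [rng.uniform(-r, r) for _ in range(size)]

def stochastic_perturbation(global_w, sigma, rng):
    """Stochastic perturbations attack: received global weights plus noise.
    Gaussian noise with scale sigma is an assumption of this sketch."""
    return [w + rng.gauss(0.0, sigma) for w in global_w]

def delta_weight_update(prev_global_w, curr_global_w):
    """Delta weight attack: reuse the difference between the last two
    global models received from the server as the crafted update."""
    return [c - p for p, c in zip(prev_global_w, curr_global_w)]

rng = random.Random(0)
g_prev = [0.10, -0.20, 0.30]   # global weights received in the previous round
g_curr = [0.12, -0.18, 0.29]   # global weights received in the current round
fake_random = random_weight_update(len(g_curr), r=0.5, rng=rng)
fake_noisy = stochastic_perturbation(g_curr, sigma=0.01, rng=rng)
fake_delta = delta_weight_update(g_prev, g_curr)
```

None of the three functions touches any training data, which is the property that later sections exploit: the crafted values can resemble benign weights, but they are not produced by an optimization process.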
2.2 Defenses against Free-Rider Attacks
The existing defense methods can be mainly categorized into two types, i.e., outlier detection of model parameters and clients’ contribution evaluation.
In the first work on the free-rider attack on FL, Lin et al. (2019) explored a possible defense based on outlier detection, named STD-DAGMM, which adds a standard deviation indicator to the deep autoencoding Gaussian mixture model Zong et al. (2018). Its network structure is divided into two parts: the compression network and the estimation network. Specifically, the gradient update matrix is fed into the compression network to obtain a low-dimensional output vector, and the standard deviation of the input vector is calculated and stacked with the calculated Euclidean and cosine distance metrics. The resulting vector is then concatenated with the low-dimensional representation vector learned by the compression network, and the concatenated vector is fed into the estimation network for multivariate Gaussian estimation. However, the time complexity of STD-DAGMM is high, because each client is required to pre-train its network structure in an early stage. Meanwhile, when free-riders make up more than 20% of the total clients, it is difficult to select a proper threshold to distinguish free-riders from benign clients.
The other type of defense against free-rider attacks is based on clients’ contribution evaluation. Lyu et al. (2020) proposed collaborative fair federated learning, CFFL, to achieve collaborative fairness through a reputation mechanism. It mainly evaluates the contribution of each client using the server’s validation dataset. The clients iteratively update their respective reputations, and the server assigns models of different quality according to their contributions: the higher a client’s reputation, the better the aggregated model it obtains. However, CFFL relies on a proxy dataset, which is not practical in real-world applications. On this basis, Xu and Lyu (2021) proposed robust and fair federated learning, RFFL, to realize both collaborative fairness and adversarial robustness through a reputation mechanism. The server in RFFL iteratively evaluates the contribution of each client by the cosine similarity between the uploaded local gradient and the aggregated global gradient. Compared with CFFL, RFFL does not require a validation dataset in advance. However, RFFL is not effective against an adaptive free-rider that can camouflage gradients under Non-IID data.
3 Preliminaries and Background
3.1 Horizontal Federated Learning
Compared with the standard centralized learning paradigm, FL Huang et al. (2022); Wan et al. (2021) provides a simple and effective solution to prevent private local data from being leaked. Only global model parameters and local model parameters are allowed to communicate between the server and clients. All private training data are on the client device, inaccessible to other clients.
As one of the most widely used FL frameworks, horizontal FL (HFL) represents a scenario where the training data of participating clients share the same feature space but have different sample spaces. Its training objective can be summed up as searching for the optimal global model:
$$w^{*} = \arg\min_{w} \sum_{i=1}^{N} f_i(w) \tag{1}$$
where $N$ is the number of participating clients and $f_i$ represents the local model. Each local model is defined as $f_i(w_i) = \ell(\{x_j, y_j\}_{j \in D_i}; w_i)$, where $\{x_j, y_j\}$ represents each data sample and its corresponding label, and $\ell$ represents the prediction loss using the local parameter $w_i$.
HFL performs distributed training across multiple clients and uses the classic HFL algorithm FedAvg McMahan et al. (2017), which computes a weighted average to update the global model weights:
$$w_g^{t+1} = w_g^{t} + \frac{1}{N}\sum_{i=1}^{N}\left(w_i^{t} - w_g^{t}\right) \tag{2}$$
where $t$ is the communication round, $w_g^{t}$ denotes the global model weights in the $t$-th round, and $w_i^{t}$ represents the model weights uploaded by the $i$-th client participating in the $t$-th round of training.
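As a rough illustration of the aggregation step, the following sketch averages the uploaded weight vectors element-wise. The function name `fedavg` and the optional `sizes` argument are hypothetical; with `sizes` omitted every client receives equal weight, whereas FedAvg as proposed by McMahan et al. (2017) weights each client by its number of local samples.

```python
def fedavg(client_weights, sizes=None):
    """Element-wise weighted average of the uploaded weight vectors.

    client_weights: list of flat weight lists, one per client.
    sizes: optional per-client sample counts used as aggregation weights;
           equal weights are assumed when omitted.
    """
    if sizes is None:
        sizes = [1] * len(client_weights)
    total = sum(sizes)
    return [
        sum(s * w for s, w in zip(sizes, ws)) / total
        for ws in zip(*client_weights)  # iterate over weight positions
    ]

uploads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
global_equal = fedavg(uploads)                       # equal client weights
global_weighted = fedavg(uploads, sizes=[1, 1, 2])   # third client counts double
```

Because every uploaded vector enters the average, a free-rider that uploads crafted weights both influences the global model and receives the result, which is the situation the defense has to prevent.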
3.2 Free-Rider Attack
Almost all existing free-rider attacks are conducted on the HFL framework, thus we mainly address the issue of defending against free-riders on HFL. Free-riders are those clients who have no local data for normal model training, but aim to obtain the final aggregated model without any contribution. Since they are involved in the FL process, free-riders can use some knowledge about the global model (e.g., global model architecture, global model weights received at each round) to generate fake model updates to bypass the server.

Fig.2 illustrates an example of the free-rider attack in a practical scenario in the financial field, e.g., FL is adopted for the bank’s loan evaluation system. A malicious client may pretend to participate in federated training while concealing the fact that there are no data contributed locally through uploading fake model updates to the server. Consequently, the free-rider obtains a high-quality model benefiting from other clients’ valuable data and computation power.
4 Problem Statement
4.1 Problem Formulation
Suppose there are $N$ clients, denoted by $c_1, \ldots, c_N$. Each benign client $c_i$ has a local dataset $D_i$, while the free-riders have no local dataset. Our goal is that, when free-riders are present in the federated system, the central server can distinguish the free-riders from the benign clients to prevent free-riders from stealing a high-quality global model.
4.2 Assumptions and Threat Model
Attacker’s Goal. The purpose of a free-rider attack is not to harm the server, but to obtain the global model issued by the server from the federated training without actually contributing any local data. A free-rider can send arbitrary crafted local model updates to the server in each round of the FL training process to disguise itself as a benign client. The uploaded fake updates have little impact on the performance of the aggregated model, so a high-quality model can be finally obtained by the free-rider.
Attacker’s Capability. We assume that the server is honest and does not know how many free-riders exist among the clients. If there are multiple free-riders, they can communicate and collude with each other and manipulate their model updates, but cannot access or manipulate benign clients’ data. The free-riders have the generally accessible information in an FL system, including the local model, loss function, learning rate and FL’s aggregation rules. Free-riders use this knowledge to generate fake model weights $w_f$ to bypass the server. In the $t$-th round, the attack target of free-riders is:
$$w_f^{t} = \varphi\left(w_g^{t}; \theta\right) \tag{3}$$
where the camouflage function $\varphi$ uses a set of parameters $\theta$ to process the global model weights $w_g^{t}$ issued by the server in the $t$-th round, and runs the camouflage method to generate crafted model weights $w_f^{t}$ aiming to bypass the free-rider detection and defense methods on the server. In addition, free-riders can also perform adaptive attacks against specific defense methods, which we discuss in Section 7.7.
Defender’s Knowledge and Capability. The server can set up defense methods against free-riders. But it does not have access to the client’s local training data, nor does it know how many free-riders exist in the FL system. However, in each training round, the server has full access to the global model as well as local model updates from all clients. Additionally, the server can request each local client to upload other non-private information, and use the information to further defend against free-riders. The goal of defense can be defined as:
(4) |
where the selection function selects model updates uploaded by benign clients as much as possible when the model is aggregated. represents the model weights uploaded by the benign clients, and represents the selected model weights. is the total number of clients.
5 WEF-Defense
5.1 Overview
The concept of sensitive neurons has been widely discussed recently Xu et al. (2020); Malmierca et al. (2019). It is observed that when data is input to a neural network, not all neurons are activated. Different neurons in different layers respond to different data features with various intensities, so their weights vary significantly. Free-riders do not have data, and thus have no information about the influence of sensitive and insensitive neurons on the parameters to take into account when they craft their fake updates. Thus, it is difficult for a free-rider to camouflage the frequency of weight variation. Motivated by this, WEF-Defense adopts the weight evolving frequency during the local model training process as an effective means to defend against free-riders. The overview of WEF-Defense is shown in Fig.3, including three main components: ① WEF-Matrix information collection (Section 5.2), ② client separation (Section 5.3), ③ personalized model aggregation (Section 5.4).
5.2 WEF-Matrix Information Collection
To obtain effective information about the clients, the WEF-Matrix collection is divided into three steps: (i) WEF-Matrix initialization, (ii) threshold determination, (iii) WEF-Matrix calculation.
5.2.1 WEF-Matrix Initialization
We first define the WEF-Matrix $F_i$, which is determined by the weights in the penultimate layer of client $i$, denoted $w_{i,p}$, and initialized to an all-zero matrix. It records the information on weight evolving frequency in local training. We use the weights of the penultimate layer for the following reasons. The softmax output in the last layer produces the final classification result. The closer the weights are to the last layer, the greater their impact on the final classification result, and the more representative the weight variations in this layer are. The initialization process is as follows:
$$F_i^{0} = \mathrm{zeros}(H, W) \tag{5}$$
where $\mathrm{zeros}(H, W)$ returns an all-zero matrix of size $H \times W$. $F_i^{0}$ has the same size as $w_{i,p}$.
5.2.2 Threshold Determination
We collect the corresponding weight evolving frequency during the local training process of the client through the initialized WEF-Matrix. Before computing the WEF-Matrix, we need to determine a dynamic threshold for measuring frequency variations. Suppose client $i$ is performing the $t$-th round of local training, and its model weights obtained after training are $w_i^{t}$. We select the weights of the client in the penultimate layer, represented as $w_{i,p}^{t}$. Then, we calculate the weight variations between $w_{i,p}^{t+1}$ of the $(t+1)$-th round and $w_{i,p}^{t}$ of the $t$-th round, and take the overall average variation as the threshold. The threshold $\alpha_i^{t+1}$ of client $i$ at the $(t+1)$-th round is calculated as follows:
$$\alpha_i^{t+1} = \frac{1}{H \times W}\sum_{j=1}^{H}\sum_{k=1}^{W}\left| w_{i,p}^{t+1}(j,k) - w_{i,p}^{t}(j,k) \right| \tag{6}$$
where $|\cdot|$ returns the absolute value, $w_{i,p}^{t+1}(j,k)$ is the weight value in the $j$-th row and the $k$-th column of the penultimate layer of client $i$ in the $(t+1)$-th round, and $H$ and $W$ represent the numbers of rows and columns of $w_{i,p}$, respectively.
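The threshold described above is the mean absolute change of the penultimate-layer weights between two consecutive rounds. A minimal sketch, assuming the layer is stored as a list of rows; the function name `dynamic_threshold` is hypothetical:

```python
def dynamic_threshold(prev_w, curr_w):
    """Mean absolute change of the penultimate-layer weights between
    two consecutive rounds (both given as H x W lists of rows)."""
    rows, cols = len(curr_w), len(curr_w[0])
    total = sum(
        abs(curr_w[j][k] - prev_w[j][k])
        for j in range(rows)
        for k in range(cols)
    )
    return total / (rows * cols)

prev_w = [[0.0, 1.0], [2.0, 3.0]]   # weights after round t
curr_w = [[0.5, 1.0], [2.0, 2.0]]   # weights after round t+1
alpha = dynamic_threshold(prev_w, curr_w)  # (0.5 + 0 + 0 + 1.0) / 4
```

Since the threshold is recomputed from the client's own weights in every round, it follows the training dynamics instead of being a fixed constant.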

To examine how the threshold evolves during training, we conduct an experiment to visualize the threshold of the $i$-th client, shown in Fig.4. We use the ADULT dataset Kohavi (1996) and the MLP model Tolstikhin et al. (2021) for illustration. There are 50 rounds of global training and 3 rounds of local training, thus 150 iterations in total. For a benign client $i$, we find that before the model converges, in the first 60 rounds, the threshold varies greatly. After the model has converged, the threshold fluctuation tends to stabilize. This illustrates that the threshold changes dynamically in most training rounds, and this characteristic is difficult to simulate.
5.2.3 WEF-Matrix Calculation
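We calculate the weight evolving frequency in local training based on the calculated dynamic threshold. Its calculation process is as follows:
$$F_i^{t+1}(j,k) = \begin{cases} F_i^{t}(j,k) + 1, & \left| w_{i,p}^{t+1}(j,k) - w_{i,p}^{t}(j,k) \right| > \alpha_i^{t+1} \\ F_i^{t}(j,k), & \text{otherwise} \end{cases} \tag{7}$$
where $F_i^{t}(j,k)$ represents the frequency value in the $j$-th row and the $k$-th column of the WEF-Matrix of client $i$ in the $t$-th round, $w_{i,p}^{t}$ denotes the penultimate-layer weights of client $i$ in the $t$-th round, $\alpha_i^{t+1}$ is the dynamic threshold from Equ. (6), $j \in [1, H]$ and $k \in [1, W]$. The frequency counts calculated in each round are accumulated. Finally, the client uploads the WEF-Matrix together with the model updates to the server. It is worth noting that the uploaded information does not involve the client’s data privacy.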
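The accumulation step can be sketched as below. The sketch assumes that an entry is counted when its absolute change strictly exceeds the round's dynamic threshold (the mean absolute change); the strict comparison and the function name `update_wef_matrix` are assumptions made for illustration.

```python
def update_wef_matrix(wef, prev_w, curr_w):
    """Return the WEF-Matrix after one more round of local training.

    An entry is incremented when the absolute change of the matching
    weight exceeds the mean absolute change of the whole layer.
    """
    rows, cols = len(curr_w), len(curr_w[0])
    diffs = [
        [abs(curr_w[j][k] - prev_w[j][k]) for k in range(cols)]
        for j in range(rows)
    ]
    threshold = sum(map(sum, diffs)) / (rows * cols)
    return [
        [wef[j][k] + (1 if diffs[j][k] > threshold else 0) for k in range(cols)]
        for j in range(rows)
    ]

wef = [[0, 0], [0, 0]]               # all-zero initialization
w0 = [[0.0, 1.0], [2.0, 3.0]]
w1 = [[0.5, 1.0], [2.0, 2.0]]
w2 = [[0.5, 1.0], [2.0, 1.0]]
wef = update_wef_matrix(wef, w0, w1)  # threshold 0.375, two entries counted
wef = update_wef_matrix(wef, w1, w2)  # threshold 0.25, one entry counted
```

The matrix only stores counts, so what the client uploads is a statistic of how often each weight moved, not the training data itself.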
5.3 Client Separation
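To distinguish benign clients from free-riders, we use the differences among WEF-Matrices to calculate three metrics and combine them to detect free-riders. The server randomly selects a client $i$, then based on its uploaded WEF-Matrix, calculates 1) the Euclidean distance $Dis_i$ and 2) the cosine similarity $Cos_i$ with other clients’ WEF-Matrices, and 3) the average frequency $Avg_i$ of its WEF-Matrix, as follows:
$$Dis_i = \sum_{j=1, j \neq i}^{N} \sqrt{\sum_{h=1}^{H}\sum_{k=1}^{W}\left(F_i(h,k) - F_j(h,k)\right)^2} \tag{8}$$
where $F_i$ represents the WEF-Matrix uploaded by client $i$, and $N$ represents the total number of clients.
$$Cos_i = \sum_{j=1, j \neq i}^{N} \frac{F_i \cdot F_j}{\|F_i\| \, \|F_j\|} \tag{9}$$
where $\cdot$ represents the matrix dot product, and $\|\cdot\|$ represents the 2-norm of the matrix.
$$Avg_i = \frac{1}{H \times W}\sum_{h=1}^{H}\sum_{k=1}^{W} F_i(h,k) \tag{10}$$
where $H$ and $W$ represent the numbers of rows and columns of $F_i$, respectively.
For client $i$, we further calculate the similarity deviation value $Dev_i$ by adding the normalized deviations of $Dis_i$, $Cos_i$ and $Avg_i$, as follows: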
(11) |
Three metrics are used in order to cover the various scenarios in which free-riders may exist and to reduce the success rate of free-riders bypassing the defense. Specifically, the Euclidean distance can effectively identify free-riders, but does not work when the number of benign clients is close to that of free-riders due to its symmetric nature. Therefore, we also leverage the cosine similarity and the average frequency to achieve a better distinction. These three metrics complement each other and work together.
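To make the three metrics concrete, the sketch below computes them for every client from the uploaded WEF-Matrices. It is an illustration under stated assumptions: the per-client distance and similarity are summed over all other clients (the exact aggregation is an assumption), at least two clients are required, and the helper names are hypothetical. The combination into the similarity deviation value is not reproduced here.

```python
import math

def flatten(matrix):
    """Flatten an H x W list of rows into one list."""
    return [v for row in matrix for v in row]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # An all-zero matrix has no direction; treat its similarity as 0.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def wef_metrics(wef_matrices):
    """Return one (distance, cosine, average frequency) tuple per client."""
    flat = [flatten(m) for m in wef_matrices]
    results = []
    for i, fi in enumerate(flat):
        others = [fj for j, fj in enumerate(flat) if j != i]
        dis = sum(euclidean(fi, fj) for fj in others)
        cos = sum(cosine(fi, fj) for fj in others)
        avg = sum(fi) / len(fi)
        results.append((dis, cos, avg))
    return results

benign_a = [[3, 0], [0, 4]]
benign_b = [[3, 0], [0, 4]]
free_rider = [[0, 1], [1, 0]]
metrics = wef_metrics([benign_a, benign_b, free_rider])
```

In this toy input the third client stands out on all three metrics at once: a larger distance, a lower cosine similarity and a lower average frequency than the two clients whose matrices agree.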
The server sets the reputation threshold according to the similarity deviation value, then separates benign clients and free-riders into two groups. Through experimental evaluation, we find that the similarity deviation gap between benign clients and free-riders is large, but the gap among free-riders is small. Thus, free-riders can be identified by setting a certain range according to the maximum similarity deviation value. In the experiment, we define the reputation threshold through a hyperparameter, which we set by conducting a preliminary study on a small dataset; we find that such a setting is effective in general.
5.4 Personalized Model Aggregation
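Based on the client separation process, the server maintains two separate models in each round and aggregates the model updates uploaded by the two groups of clients respectively. The server uses the two groups to form two global models and then distributes each model to its corresponding group. As a result, the global model trained by benign clients cannot be obtained by the free-riders. The aggregation process is as follows:
$$w_{g,b}^{t+1} = w_{g,b}^{t} + \frac{1}{N_b}\sum_{i \in G_b}\left(w_i^{t} - w_{g,b}^{t}\right) \tag{12}$$
$$w_{g,f}^{t+1} = w_{g,f}^{t} + \frac{1}{N_f}\sum_{i \in G_f}\left(w_i^{t} - w_{g,f}^{t}\right) \tag{13}$$
where $w_{g,b}^{t+1}$, $w_{g,f}^{t+1}$ and $w_{g,b}^{t}$, $w_{g,f}^{t}$ are the global models of the benign group $G_b$ and the free-rider group $G_f$ in the $(t+1)$-th round and the $t$-th round, $w_i^{t}$ is the local model update uploaded by client $i$, and $N_b$ and $N_f$ represent the numbers of clients in the two groups, respectively.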
The server separates the clients in each round. The evolving frequency of model weights collected by the server is continuously accumulated. Then, the difference in updated evolving frequency between the benign clients and the free-riders will be further enlarged.
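The group-wise aggregation can be sketched as follows. The sketch assumes that the set of client identifiers judged benign (`benign_ids`, a hypothetical name) is already available from the client separation step, and it uses a plain unweighted average within each group.

```python
def average(weights):
    """Element-wise mean of a non-empty list of flat weight lists."""
    n = len(weights)
    return [sum(ws) / n for ws in zip(*weights)]

def personalized_aggregate(uploads, benign_ids):
    """Aggregate the two groups separately.

    uploads: dict mapping client id to its uploaded flat weight list.
    benign_ids: ids assigned to the benign group by client separation.
    Returns (benign_model, free_rider_model); an empty group yields None.
    """
    benign = [w for cid, w in uploads.items() if cid in benign_ids]
    suspects = [w for cid, w in uploads.items() if cid not in benign_ids]
    return (
        average(benign) if benign else None,
        average(suspects) if suspects else None,
    )

uploads = {"c1": [1.0, 1.0], "c2": [3.0, 3.0], "c3": [0.0, 10.0]}
g_benign, g_free = personalized_aggregate(uploads, benign_ids={"c1", "c2"})
```

Here `c3` only receives the average of its own group, so none of the weights contributed by `c1` and `c2` reach it.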
The detailed implementation of WEF-Defense is shown in Algorithm 1. It mainly includes three steps: (1) WEF-Matrix information collection, (2) clients separation, and (3) personalized model aggregation.
5.5 Algorithm Complexity
We analyze the complexity of WEF-Defense in two parts, i.e., the information collection on the client and identification on the server.
On the client, we select the weights of the penultimate layer in the model to initialize the WEF-Matrix, then use it to record the weight evolving frequency information. Therefore, the computational complexity can be defined as:
(14) |
where is the local training epochs.
On the server, we calculate and perform model aggregation for clients in respectively. Therefore, the time complexity is:
(15) |
where is the number of clients.
Algorithm 1: WEF-Defense. | |
---|---|
Input: dataset of each client, ; the global epochs and the local epochs ; initial global model ; total number of clients ; benign clients and free-riders ; hyperparameter . | |
Output: the global models and . | |
1. | Initialization: initialize WEF-Matrix based on Equ. (5), local model . |
2 | Role: Client #WEF-Matrix Information Collection |
3. | If |
4. | |
5. | For do |
6. | Calculate according to Equ. (7) |
7. | End For |
8. | Else If |
9. | |
10. | For do |
11. | Calculate according to Equ. (7) |
12. | End For |
13. | End If |
14. | Local updates upload to Server |
15. | Role: Server #Client Separation and Personalized Model Aggregation |
16. | For do |
17. | Calculate , and according to Equs.(8),(9) and (10) |
18. | End For |
19. | For do |
20. | Calculate according to Equ. (11) |
21. | End For |
22. | Separate clients to groups according to the reputation threshold. |
23. | Calculate and according to Equs. (12) and (13) |
24. | Return: the global models and |
6 Experiments Setting
Platform: i7-7700K 4.20 GHz ×8 (CPU), TITAN Xp 12 GiB ×2 (GPU), 16 GB ×4 memory (DDR4), Ubuntu 16.04 (OS), Python 3.6, PyTorch 1.8.2 Sankaran et al. (2022).
Datasets: We evaluate WEF-Defense on five datasets, i.e., MNIST, CIFAR-10, GTSRB, BANK and ADULT. The MNIST LeCun et al. (1998) dataset contains 70,000 real-world handwritten images with digits ranging from 0 to 9. The CIFAR-10 Ayi and El-Sharkawy (2020) dataset contains 60,000 color images of size 32 × 32 in 10 classes, with 6,000 images per class. The GTSRB Sermanet and LeCun (2011) dataset contains 51,839 real-world colored images of German traffic signs in 43 categories. The ADULT Kohavi (1996) dataset contains 48,843 records in total. We manually balance the ADULT dataset to have 11,687 records over 50K and 11,687 records under 50K, resulting in a total of 23,374 records. The BANK Moro et al. (2011) dataset is related to the direct marketing campaign of a Portuguese banking institution and records whether each of 45,211 customers subscribes to a fixed deposit, with 16 attributes per customer. For each dataset, we conduct an 80-20 train-test split. The detailed information of the datasets is shown in Table 1.
Datasets | Samples | Dimensions | Classes | Models | Learning rate | Momentum | Epochs | Batch size |
---|---|---|---|---|---|---|---|---|
MNIST | 70,000 | 28×28 | 10 | LeNet | 0.005 | 0.0001 | 50 | 32 |
CIFAR-10 | 60,000 | 32×32 | 10 | VGG16 | 0.01 | 0.9 | 80 | 32 |
GTSRB | 51,839 | 32×32 | 43 | ResNet18 | 0.001 | 0.001 | 80 | 32 |
ADULT | 23,374 | 14 | 2 | MLP | 0.0001 | 0.0001 | 50 | 32 |
BANK | 45,211 | 16 | 2 | MLP | 0.02 | 0.5 | 80 | 32 |
Data Distribution: Two typical data distribution scenarios are considered in our experiments. Independent and identically distributed (IID) data Rached et al. (2021): each client holds the same amount of data, covering all categories. Non-independent and identically distributed (Non-IID) data Zhu et al. (2021b): in real-world scenarios, the data among clients is heterogeneous, so we use the Dirichlet distribution Zamzami and Bouguila (2022); Rademacher and Doroslovacki (2021); Zoghbi et al. (2016) to divide the training data among clients. Specifically, we sample proportions from a Dirichlet distribution with concentration parameter $\beta$ and divide the dataset among the clients accordingly. With this partitioning strategy, each client can have relatively few data samples in certain classes. We use $\beta = 0.5$ in the experiments to explore the problem of heterogeneity.
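A Dirichlet-based Non-IID split of this kind can be sketched as below. This is one common partitioning scheme and only an assumption about the exact procedure: for every class, the client proportions are drawn from a symmetric Dirichlet distribution, and the class samples are cut accordingly. The function names, the `seed` argument and the use of normalized Gamma variates to sample the Dirichlet distribution are choices made for this illustration.

```python
import random

def dirichlet_sample(beta, n, rng):
    """One draw from a symmetric Dirichlet(beta) over n entries,
    obtained by normalizing independent Gamma(beta, 1) variates."""
    draws = [rng.gammavariate(beta, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

def partition_non_iid(labels, num_clients, beta, seed=0):
    """Split sample indices among clients, class by class.

    Returns a list with one index list per client. A small beta gives
    skewed class proportions, a large beta approaches an even split.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    clients = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        props = dirichlet_sample(beta, num_clients, rng)
        start, acc = 0, 0.0
        for c in range(num_clients):
            acc += props[c]
            # The last client takes the remainder so no sample is dropped.
            end = len(idxs) if c == num_clients - 1 else round(acc * len(idxs))
            clients[c].extend(idxs[start:end])
            start = end
    return clients

labels = [i % 3 for i in range(300)]   # toy labels with three classes
parts = partition_non_iid(labels, num_clients=10, beta=0.5)
```

Every sample index is assigned to exactly one client, while the number of samples a client holds per class can differ strongly from client to client.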
Number of clients: In all experimental scenarios, we mainly evaluate the effect of different ratios of free-riders on our method. The total number of clients is 10, and free-rider ratios of 10%, 30%, 50% and 90% of the total clients are considered.
Models: Different classifiers are used for various datasets. For MNIST, LeNet El-Sawy et al. (2016) is used for classification. For the more complex image datasets, CIFAR-10 and GTSRB, VGG16 Simonyan and Zisserman (2015) and ResNet18 He et al. (2016) are adopted, respectively. For the structured datasets, ADULT and BANK, an MLP Tolstikhin et al. (2021) is applied. Refer to Table 1 for the specific parameter settings. All evaluation results are the average of 3 runs of the same setting.
Hyper-Parameters: For all experiments, we use the same setting of the hyperparameter.
Attack Methods: Three existing free-rider attack methods are applied to evaluate the detection performance, including the random weight attack Lin et al. (2019), the stochastic perturbations attack Fraboni et al. (2021) and the delta weight attack Lin et al. (2019). Among them, the random weight attack uses a fixed weight generation range. In the adaptive attack scenario, we design a new free-rider attack to evaluate the defense performance.
Baselines: Two defense approaches are used for comparison: CFFL Lyu et al. (2020), based on the validation dataset, and RFFL Xu and Lyu (2021), based on the cosine similarity between local gradients and aggregated global gradients. The undefended FedAvg aggregation algorithm McMahan et al. (2017) serves as a benchmark.
Evaluation Metrics: We evaluate the performance of the detection methods by evaluating the highest mean accuracy (HMA) of the model that can be stolen by free-riders. The lower the HMA is, the better the detection is.
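One plausible reading of this metric is sketched below: in every round the test accuracies of the models received by the free-riders are averaged, and the HMA is the highest such mean over the whole training process. Both the averaging over free-riders and the function name `highest_mean_accuracy` are assumptions of this sketch.

```python
def highest_mean_accuracy(acc_per_round):
    """acc_per_round[t] lists the test accuracy (%) of the model that
    each free-rider received in round t; return the best per-round mean."""
    return max(sum(accs) / len(accs) for accs in acc_per_round)

# Two free-riders observed over three rounds.
history = [[10.0, 12.0], [35.0, 45.0], [30.0, 20.0]]
hma = highest_mean_accuracy(history)  # per-round means: 11.0, 40.0, 25.0
```

Under this reading, a defense is judged by the best model a free-rider could have walked away with at any point, not only by the final round.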
7 Evaluation and Analysis
In this section, we evaluate the performance of WEF-Defense by answering the following five research questions (RQs):
- RQ1: Does WEF-Defense achieve the SOTA defense performance compared with baselines when defending against various free-rider attacks?
- RQ2: Does WEF-Defense still achieve the best performance when the proportion of free-riders is higher?
- RQ3: Will WEF-Defense affect the main task performance? What is its communication overhead?
- RQ4: How can the defense of WEF-Defense be interpreted through visualizations?
- RQ5: Can WEF-Defense defend against an adaptive attack? How sensitive is it to the hyperparameter?
7.1 RQ1: Defense Effectiveness of WEF-Defense
In this section, we verify the defense effect of WEF-Defense compared with baselines on different datasets for different models.
IID Datasets | Attacks | FedAvg (10%) | FedAvg (30%) | CFFL (10%) | CFFL (30%) | RFFL (10%) | RFFL (30%) | WEF-Defense (10%) | WEF-Defense (30%) |
---|---|---|---|---|---|---|---|---|---|
MNIST | RWA | 10.75 | 10.81 | 10.75 | 10.75 | 10.75 | 10.85 | 10.13 | 9.91 |
 | SPA | 98.26 | 97.92 | 81.35 | 71.83 | 55.01 | 68.41 | 9.43 | 10.71 |
 | DWA | 98.05 | 98.36 | 77.52 | 87.81 | 87.74 | 79.21 | 9.52 | 22.21 |
CIFAR-10 | RWA | 44.78 | 10.55 | 10.31 | 11.30 | 10.65 | 10.55 | 10.12 | 10.41 |
 | SPA | 84.55 | 84.12 | 68.25 | 75.53 | 10.05 | 10.85 | 9.11 | 10.66 |
 | DWA | 86.45 | 84.92 | 56.43 | 59.45 | 20.59 | 20.72 | 9.42 | 19.74 |
GTSRB | RWA | 72.40 | 6.13 | 4.86 | 4.98 | 5.06 | 5.06 | 3.56 | 4.86 |
 | SPA | 94.57 | 94.12 | 4.70 | 4.71 | 4.71 | 4.71 | 4.86 | 3.52 |
 | DWA | 94.65 | 94.18 | 39.66 | 39.19 | 39.77 | 38.36 | 35.27 | 36.38 |
ADULT | RWA | 50.00 | 50.00 | 76.83 | 49.44 | 74.93 | 57.85 | 50.00 | 49.96 |
 | SPA | 78.88 | 78.91 | 78.95 | 77.24 | 79.21 | 52.66 | 35.22 | 45.34 |
 | DWA | 78.92 | 78.91 | 78.91 | 78.96 | 73.19 | 73.79 | 61.26 | 56.23 |
BANK | RWA | 84.56 | 71.25 | 79.95 | 74.65 | 50.24 | 50.10 | 50.01 | 50.00 |
 | SPA | 84.66 | 83.85 | 80.55 | 79.14 | 65.64 | 67.40 | 49.95 | 50.00 |
 | DWA | 84.63 | 82.88 | 75.94 | 74.65 | 71.95 | 72.45 | 63.52 | 70.00 |
Implementation Details. (1) Five datasets are tested under IID and Non-IID data settings. The Non-IID data adopts the Dirichlet distribution to explore the problem of heterogeneity, where the distribution coefficient defaults to 0.5. (2) In general, the number of free-riders is less than that of benign clients. Consequently, with 10 clients, we set up two scenarios with free-rider ratios of 10% and 30%, in which the free-riders camouflage themselves with the random weight attack (RWA), the stochastic perturbations attack (SPA) and the delta weight attack (DWA). (3) We adopt three baselines for comparison, i.e., undefended FedAvg aggregation McMahan et al. (2017), RFFL Xu and Lyu (2021) and CFFL Lyu et al. (2020). We use the HMA obtained by free-riders as the evaluation metric. The results for IID data and Non-IID data are shown in Tables 2 and 3, respectively.
Results and Analysis. The results in Tables 2 and 3 demonstrate that the overall defense performance of WEF-Defense is better than that of CFFL and RFFL in most cases. In particular, the defense effect of WEF-Defense remains stable when dealing with the heterogeneity of Non-IID data. For all image datasets (i.e., MNIST, CIFAR-10 and GTSRB) in both tables, the HMA obtained by free-riders under WEF-Defense does not exceed 36.38%, and the lowest HMA is only 2.29%. For all structured datasets (i.e., ADULT and BANK) in both tables, the HMA obtained by free-riders under WEF-Defense does not exceed 70%, and the lowest HMA is only 35.22%. It is worth noting that since both ADULT and BANK have only two classes, an HMA of about 50% corresponds to random guessing, i.e., the level reached at the start of training. Therefore, we conclude that WEF-Defense achieves SOTA defense performance compared with baselines.
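The Dirichlet-based Non-IID split mentioned in (1) is commonly realized by drawing, for each class, a vector of client proportions from a symmetric Dirichlet distribution. The following is a minimal sketch of that common recipe using only the standard library; the function name `dirichlet_partition`, the seed, and the toy labels are assumptions for illustration, and the paper's exact partitioning code may differ.

```python
import random


def dirichlet_partition(labels, num_clients, alpha=0.5, seed=0):
    """Split sample indices across clients with label skew.

    For each class, client proportions are drawn from Dirichlet(alpha, ..., alpha),
    sampled here as normalized Gamma(alpha, 1) draws. Smaller alpha gives
    more heterogeneous (more Non-IID) clients.
    """
    rng = random.Random(seed)
    client_indices = [[] for _ in range(num_clients)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        proportions = [g / total for g in gammas]
        # Turn cumulative proportions into cut points over this class's samples.
        start, cum = 0, 0.0
        for k in range(num_clients):
            cum += proportions[k]
            end = len(idx) if k == num_clients - 1 else round(cum * len(idx))
            client_indices[k].extend(idx[start:end])
            start = end
    return client_indices


# Toy example: 1000 samples, 10 balanced classes, 10 clients, coefficient 0.5.
labels = [i % 10 for i in range(1000)]
parts = dirichlet_partition(labels, num_clients=10, alpha=0.5, seed=0)
print([len(p) for p in parts])
```

Every sample is assigned to exactly one client; only the per-client class mix and sizes vary with the coefficient.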
| Datasets (Non-IID) | Attacks | FedAvg 10% | FedAvg 30% | CFFL 10% | CFFL 30% | RFFL 10% | RFFL 30% | WEF-Defense 10% | WEF-Defense 30% |
|---|---|---|---|---|---|---|---|---|---|
| MNIST | RWA | 78.41 | 76.55 | 25.03 | 14.49 | 10.73 | 10.81 | 10.02 | 9.99 |
| MNIST | SPA | 95.22 | 97.01 | 68.41 | 59.75 | 17.92 | 26.70 | 9.72 | 10.76 |
| MNIST | DWA | 95.10 | 98.54 | 40.42 | 64.92 | 71.31 | 62.64 | 20.22 | 13.91 |
| CIFAR-10 | RWA | 53.91 | 9.06 | 9.52 | 10.04 | 9.50 | 11.32 | 9.05 | 9.91 |
| CIFAR-10 | SPA | 84.83 | 85.42 | 28.54 | 16.72 | 9.02 | 11.65 | 8.92 | 9.52 |
| CIFAR-10 | DWA | 84.44 | 84.98 | 19.12 | 33.62 | 17.52 | 23.13 | 9.45 | 20.00 |
| GTSRB | RWA | 6.13 | 5.82 | 6.13 | 5.85 | 5.90 | 4.52 | 4.71 | 4.86 |
| GTSRB | SPA | 94.00 | 92.63 | 4.71 | 4.71 | 4.72 | 4.75 | 2.29 | 4.82 |
| GTSRB | DWA | 94.22 | 94.00 | 23.35 | 24.86 | 28.43 | 28.73 | 30.04 | 19.11 |
| ADULT | RWA | 50.03 | 50.04 | 50.20 | 50.10 | 50.01 | 49.94 | 49.62 | 49.92 |
| ADULT | SPA | 76.22 | 76.02 | 59.92 | 55.42 | 54.91 | 60.00 | 49.62 | 46.42 |
| ADULT | DWA | 77.00 | 78.36 | 63.62 | 57.42 | 49.92 | 50.05 | 51.82 | 49.92 |
| BANK | RWA | 71.21 | 50.02 | 50.23 | 50.21 | 50.54 | 50.10 | 50.00 | 50.00 |
| BANK | SPA | 76.95 | 71.75 | 70.45 | 50.99 | 50.13 | 50.92 | 50.00 | 50.00 |
| BANK | DWA | 80.63 | 80.26 | 50.52 | 71.05 | 57.09 | 71.05 | 50.00 | 50.00 |
Besides, WEF-Defense shows more stable performance in defending against various free-rider attacks in different scenarios. For instance, in Table 2 with the IID setting, the standard deviation of HMA for WEF-Defense on image datasets is around 9.36, while that for CFFL and RFFL reaches 31.19 and 26.72, respectively. In Table 3 with the Non-IID setting, the standard deviation of HMA for WEF-Defense is around 6.69, while that for CFFL and RFFL reaches 20.04 and 18.43, respectively. The average standard deviation of WEF-Defense is about 1/3 that of baselines, which further demonstrates its stability.
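As a worked check of the stability figure, the sketch below computes the standard deviation of the 18 WEF-Defense entries for the image datasets in Table 2. It assumes the population standard deviation over all attack and ratio combinations is what is being reported; that assumption is ours, and the result comes out close to the quoted value of about 9.36.

```python
from statistics import pstdev

# WEF-Defense HMA (%) from Table 2 (IID), image datasets only.
# Per dataset: RWA 10%, RWA 30%, SPA 10%, SPA 30%, DWA 10%, DWA 30%.
wef_iid_image_hma = [
    10.13, 9.91, 9.43, 10.71, 9.52, 22.21,   # MNIST
    10.12, 10.41, 9.11, 10.66, 9.42, 19.74,  # CIFAR-10
    3.56, 4.86, 4.86, 3.52, 35.27, 36.38,    # GTSRB
]

# Population standard deviation (divide by N, not N - 1): an assumption.
wef_std = pstdev(wef_iid_image_hma)
print(f"std of WEF-Defense HMA on image datasets (IID): {wef_std:.2f}")
```

The same computation applied to the CFFL and RFFL columns gives the corresponding baseline figures for comparison.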
Answer to RQ1: WEF-Defense shows the SOTA performance compared with baselines and prevents various free-rider attacks, whether 10% or 30% of clients are free-riders. Under the IID and Non-IID settings, on average, 1) its defense effect is 1.68 and 1.33 times that of baselines, respectively; and 2) its defense stability is 3.09 and 2.87 times that of baselines, respectively.
7.2 RQ2: Defense Effect at Higher Free-Rider Ratios
Under the traditional FL framework, the global model’s accuracy is not much affected even when free-riders make up more than half of all clients. For instance, in Table 4, free-riders with DWA achieve over 80% HMA on average when free-riders account for 90% of all clients in the undefended FedAvg aggregation framework. Therefore, we examine whether a high proportion of free-riders affects defense effectiveness.
| Datasets (IID) | Attacks | FedAvg 50% | FedAvg 90% | CFFL 50% | CFFL 90% | RFFL 50% | RFFL 90% | WEF-Defense 50% | WEF-Defense 90% |
|---|---|---|---|---|---|---|---|---|---|
| MNIST | RWA | 10.82 | 10.23 | 10.65 | 9.34 | 14.39 | 10.65 | 8.71 | 10.73 |
| MNIST | SPA | 97.44 | 89.24 | 66.34 | 64.34 | 76.52 | 22.84 | 10.72 | 10.72 |
| MNIST | DWA | 98.25 | 95.73 | 75.16 | 38.45 | 87.74 | 79.21 | 19.82 | 27.22 |
| CIFAR-10 | RWA | 11.35 | 10.72 | 11.31 | 9.90 | 11.30 | 11.30 | 10.21 | 9.30 |
| CIFAR-10 | SPA | 80.00 | 45.53 | 59.49 | 10.05 | 10.85 | 11.75 | 10.60 | 11.38 |
| CIFAR-10 | DWA | 82.55 | 65.96 | 74.67 | 23.39 | 21.63 | 22.45 | 21.30 | 21.12 |
| GTSRB | RWA | 6.13 | 5.06 | 6.13 | 6.13 | 5.85 | 6.13 | 4.94 | 4.86 |
| GTSRB | SPA | 93.74 | 84.36 | 4.71 | 4.87 | 4.73 | 4.91 | 4.86 | 4.86 |
| GTSRB | DWA | 93.34 | 84.52 | 36.74 | 29.17 | 39.58 | 35.41 | 36.10 | 36.50 |
| ADULT | RWA | 50.24 | 50.00 | 50.15 | 50.10 | 53.61 | 51.20 | 50.00 | 49.91 |
| ADULT | SPA | 78.66 | 72.66 | 79.14 | 63.12 | 68.11 | 54.12 | 45.52 | 47.42 |
| ADULT | DWA | 79.11 | 79.15 | 78.99 | 78.94 | 73.43 | 72.92 | 56.13 | 56.32 |
| BANK | RWA | 51.03 | 51.40 | 50.15 | 50.60 | 50.30 | 50.30 | 50.14 | 50.00 |
| BANK | SPA | 83.54 | 80.85 | 76.85 | 66.85 | 70.65 | 69.39 | 50.00 | 50.00 |
| BANK | DWA | 83.33 | 83.45 | 72.39 | 69.25 | 72.45 | 72.35 | 68.89 | 68.53 |
Implementation Details. (1) Both the IID and Non-IID settings are adopted for the five datasets. The Non-IID data adopts the Dirichlet distribution to explore the problem of heterogeneity, where the distribution coefficient defaults to 0.5. (2) We set the free-rider ratio to 50% and 90% with 10 clients. This helps to reveal how WEF-Defense performs when the number of free-riders is equal to or much greater than that of benign clients. Tables 4 and 5 show the results.
| Datasets (Non-IID) | Attacks | FedAvg 50% | FedAvg 90% | CFFL 50% | CFFL 90% | RFFL 50% | RFFL 90% | WEF-Defense 50% | WEF-Defense 90% |
|---|---|---|---|---|---|---|---|---|---|
| MNIST | RWA | 58.13 | 9.48 | 26.23 | 10.84 | 18.81 | 10.81 | 10.81 | 9.98 |
| MNIST | SPA | 96.86 | 98.94 | 58.42 | 16.87 | 62.92 | 33.89 | 10.72 | 10.72 |
| MNIST | DWA | 97.75 | 99.21 | 55.81 | 82.89 | 61.41 | 39.92 | 21.43 | 23.45 |
| CIFAR-10 | RWA | 10.00 | 9.93 | 10.00 | 10.71 | 11.39 | 11.35 | 9.05 | 11.31 |
| CIFAR-10 | SPA | 85.87 | 79.53 | 10.00 | 10.32 | 10.19 | 10.23 | 11.32 | 11.33 |
| CIFAR-10 | DWA | 85.73 | 85.51 | 29.25 | 40.52 | 24.63 | 19.12 | 19.18 | 16.84 |
| GTSRB | RWA | 5.97 | 6.13 | 6.13 | 5.85 | 5.80 | 6.14 | 5.46 | 4.86 |
| GTSRB | SPA | 89.52 | 57.63 | 4.91 | 4.92 | 5.70 | 4.83 | 4.81 | 4.80 |
| GTSRB | DWA | 91.17 | 68.82 | 27.67 | 23.47 | 32.23 | 28.94 | 31.23 | 22.73 |
| ADULT | RWA | 50.20 | 50.00 | 50.42 | 50.00 | 52.45 | 51.18 | 50.00 | 49.72 |
| ADULT | SPA | 76.34 | 69.42 | 78.34 | 71.12 | 61.04 | 55.23 | 54.09 | 49.79 |
| ADULT | DWA | 79.05 | 79.00 | 73.34 | 79.42 | 57.52 | 52.16 | 53.00 | 51.62 |
| BANK | RWA | 50.30 | 50.04 | 50.40 | 50.26 | 50.19 | 50.20 | 50.00 | 50.00 |
| BANK | SPA | 83.33 | 50.02 | 55.15 | 53.84 | 52.42 | 57.49 | 50.00 | 50.00 |
| BANK | DWA | 82.85 | 52.55 | 63.82 | 50.14 | 68.45 | 50.73 | 63.62 | 50.00 |
Results and Analysis. The results in Tables 4 and 5 show that WEF-Defense still achieves SOTA defense performance when half or more of the clients are free-riders. For instance, on all image datasets, the HMA of global models obtained by free-riders does not exceed 36.50% when WEF-Defense is applied. On all structured datasets, the HMA obtained by free-riders does not exceed 68.89% when WEF-Defense is applied. These results are solid evidence of the stable defense effect of WEF-Defense at high free-rider ratios.
Meanwhile, WEF-Defense is more effective in preventing free-riders from obtaining high-quality models. For instance, compared with FedAvg on both tables, the average HMA obtained by free-riders of WEF-Defense is reduced by 65%, while that of CFFL and RFFL is only reduced by 35% and 40%, respectively. Besides, WEF-Defense shows more stable performance than CFFL and RFFL when there are more free-riders than benign clients. For instance, in Table 4 with the IID setting, the standard deviation of HMA for WEF-Defense on image datasets is around 9.85, while that for CFFL and RFFL reaches 25.74 and 26.29, respectively. In Table 5 with the Non-IID setting, the standard deviation of HMA for WEF-Defense is around 7.31, while that for CFFL and RFFL reaches 21.54 and 17.59, respectively. The outstanding performance of WEF-Defense is mainly because it identifies differences in the evolution process of the local model training, which effectively avoids model update camouflage for free-riders.
Answer to RQ2: When the number of free-riders is equal to or greater than that of benign clients, WEF-Defense shows better and more stable performance compared with baselines. Under the IID and Non-IID settings, on average, 1) its defense effect is 1.41 and 1.28 times that of baselines, respectively; and 2) its defense stability is 2.64 and 2.67 times that of baselines, respectively.
7.3 Defensive Timeliness
We conduct defense timeliness analysis for the experiments in RQ1 and RQ2, where timeliness refers to earlier detection of free-riders. Since CFFL cannot provide detection results during the training, we only compare the timeliness with RFFL.
| Datasets | Attacks | RFFL IID 10% | RFFL IID 30% | RFFL IID 50% | RFFL IID 90% | RFFL Non-IID 10% | RFFL Non-IID 30% | RFFL Non-IID 50% | RFFL Non-IID 90% | WEF-Defense IID 10% | WEF-Defense IID 30% | WEF-Defense IID 50% | WEF-Defense IID 90% | WEF-Defense Non-IID 10% | WEF-Defense Non-IID 30% | WEF-Defense Non-IID 50% | WEF-Defense Non-IID 90% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MNIST | RWA | 3/50 | 5/50 | – | – | 1/50 | 5/50 | – | – | 1/50 | 1/50 | 1/50 | 1/50 | 1/50 | 1/50 | 1/50 | 1/50 |
| MNIST | SPA | 11/50 | 5/50 | 10/50 | 21/50 | 8/50 | 11/50 | 11/50 | – | 1/50 | 1/50 | 1/50 | 1/50 | 1/50 | 1/50 | 1/50 | 1/50 |
| MNIST | DWA | – | – | – | – | – | – | – | – | 3/50 | 3/50 | 3/50 | 3/50 | 3/50 | 3/50 | 3/50 | 3/50 |
| CIFAR-10 | RWA | 10/80 | 11/80 | 16/80 | – | 11/80 | 20/80 | – | – | 1/80 | 1/80 | 1/80 | 1/80 | 1/80 | 1/80 | 1/80 | 1/80 |
| CIFAR-10 | SPA | 11/80 | 11/80 | 11/80 | – | 11/80 | 11/80 | 16/80 | – | 1/80 | 1/80 | 1/80 | 1/80 | 1/80 | 1/80 | 1/80 | 1/80 |
| CIFAR-10 | DWA | – | – | – | – | – | – | – | – | 3/80 | 3/80 | 3/80 | 3/80 | 3/80 | 3/80 | 3/80 | 3/80 |
| GTSRB | RWA | 11/80 | 11/80 | – | – | – | – | – | – | 1/80 | 1/80 | 1/80 | 1/80 | 1/80 | 1/80 | 1/80 | 1/80 |
| GTSRB | SPA | 11/80 | 11/80 | 11/80 | – | 11/80 | 11/80 | 17/80 | – | 1/80 | 1/80 | 1/80 | 1/80 | 1/80 | 1/80 | 1/80 | 1/80 |
| GTSRB | DWA | – | – | – | – | – | – | – | – | 3/80 | 3/80 | 3/80 | 3/80 | 3/80 | 3/80 | 3/80 | 3/80 |
| ADULT | RWA | 19/50 | 10/50 | – | – | – | – | – | – | 1/50 | 1/50 | 1/50 | 1/50 | 1/50 | 1/50 | 1/50 | 1/50 |
| ADULT | SPA | 10/50 | 10/50 | 10/50 | – | 11/50 | – | 11/50 | 21/80 | 1/50 | 1/50 | 1/50 | 1/50 | 1/50 | 1/50 | 1/50 | 1/50 |
| ADULT | DWA | – | – | – | – | – | – | – | – | 3/50 | 3/50 | 3/50 | 3/50 | 3/50 | 3/50 | 3/50 | 3/50 |
| BANK | RWA | 2/80 | 3/80 | 5/80 | – | 11/80 | 11/80 | 10/80 | – | 1/80 | 1/80 | 1/80 | 1/80 | 1/80 | 1/80 | 1/80 | 1/80 |
| BANK | SPA | 10/80 | 5/80 | 7/80 | – | 11/80 | 11/80 | 8/80 | 11/80 | 1/80 | 1/80 | 1/80 | 1/80 | 1/80 | 1/80 | 1/80 | 1/80 |
| BANK | DWA | – | – | – | – | – | – | – | – | 3/80 | 3/80 | 3/80 | 3/80 | 3/80 | 3/80 | 3/80 | 3/80 |
| Method Comparison | Distribution | 10% | 30% | 50% | 90% |
|---|---|---|---|---|---|
| FedAvg vs. WEF-Defense | IID | 1.59E-05 | 2.84E-04 | 4.63E-04 | 6.42E-04 |
| FedAvg vs. WEF-Defense | Non-IID | 1.59E-05 | 3.71E-04 | 1.69E-04 | 1.83E-03 |
| CFFL vs. WEF-Defense | IID | 4.52E-04 | 1.38E-03 | 2.33E-03 | 4.53E-02 |
| CFFL vs. WEF-Defense | Non-IID | 8.66E-03 | 9.82E-03 | 1.03E-02 | 2.30E-02 |
| RFFL vs. WEF-Defense | IID | 6.96E-03 | 1.90E-02 | 1.48E-02 | 2.55E-02 |
| RFFL vs. WEF-Defense | Non-IID | 6.32E-02 | 1.93E-02 | 3.58E-02 | 2.04E-02 |

Table 7: The p-values of the T-test for each free-rider ratio.
As shown in Table 6, WEF-Defense detects free-riders earlier than RFFL on all datasets. For instance, in almost all cases, WEF-Defense detects free-riders in the first round, while RFFL fails to detect them until the end of training in most cases. The reason is that, based on the collected WEF-Matrix information, WEF-Defense can easily distinguish free-riders from benign clients. Besides, it is difficult for free-riders to disguise the WEF-Matrix, so WEF-Defense can identify them earlier.
7.4 Significance Analysis
To illustrate the superiority of WEF-Defense’s effect, we perform a preliminary T-test for the experiments in RQ1 and RQ2, compared with baselines, to confirm whether there is a significant difference in the defense effect of WEF-Defense. The results are shown in Table 7.
For the T-test, we define the null hypothesis as that the differences between defense methods are small. From the experimental results, we can see that the overall p-value is small enough (< 0.05) to reject the null hypothesis, which supports the superiority of WEF-Defense.
7.5 RQ3: Trade-off between Defense and Main Task Performance
In this section, we discuss whether defensive approaches sacrifice main task performance.
7.5.1 Comparison of Global Model Accuracy
Implementation Details. (1) Since CFFL and RFFL are contribution-based federated frameworks, we only consider testing under the IID setting. (2) We test two scenarios of 10% and 90% free-riders with 10 clients. We plot the HMA obtained by benign clients and by free-riders separately, and compare them with the HMA obtained by benign clients in the scenario without free-riders, as shown in Fig.5.
Results and Analysis. Experiments show that WEF-Defense not only defends against free-rider attacks but also maintains the high accuracy of global models by aggregating all benign clients and eliminating the negative effect of free-riders’ updates. In Fig.5, this is most evident in the scenario where free-riders account for 90% of clients. For instance, on CIFAR-10, the overall average HMA obtained by benign clients with RFFL (44.04%) and CFFL (51.73%) is much lower than that with WEF-Defense (78.13%).
Comparing subfigures (a) and (b) in Fig.5, we notice that random weight attack decreases the global model accuracy as the number of free-riders increases. Benefiting from the personalized aggregation, WEF-Defense eliminates the impact of the random weight by leaving out the update from the free-riders. Therefore, the trained model for benign clients achieves the trade-off between accuracy and defensibility.
Observing the lines in Fig.5, we can conclude that the HMA of the global model trained with only benign clients and that of the model trained with WEF-Defense are close, where “” represents the HMA trained with only benign clients. Comparing the main task performance of different defense methods, especially on the CIFAR-10 dataset with a free-rider ratio of 90%, the global model's main task performance degrades with the baselines, and only WEF-Defense achieves the expected performance. This is mainly because WEF-Defense separates free-riders and benign clients into groups , and adopts a personalized federated learning approach to provide them with different global models.
7.5.2 Time Complexity Analysis
Compared with the FedAvg, WEF-Defense requires each client to upload not only the updated weights of the local model, but also the WEF-Matrix for free-rider detection. Thus the communication overhead of WEF-Defense is calculated to analyze its complexity.
| Datasets | Models | Methods | Ratio 10% | Ratio 30% | Ratio 50% | Ratio 90% |
|---|---|---|---|---|---|---|
| MNIST | LeNet | FedAvg | 28.71 | 22.35 | 16.82 | 6.51 |
| MNIST | LeNet | WEF-Defense | 29.97 | 24.55 | 17.82 | 6.88 |
| CIFAR-10 | VGG16 | FedAvg | 204.98 | 170.38 | 133.51 | 54.85 |
| CIFAR-10 | VGG16 | WEF-Defense | 211.98 | 180.35 | 167.64 | 78.52 |
| GTSRB | ResNet18 | FedAvg | 284.74 | 222.35 | 159.57 | 80.38 |
| GTSRB | ResNet18 | WEF-Defense | 318.23 | 252.82 | 184.79 | 85.82 |
| ADULT | MLP | FedAvg | 7.15 | 5.01 | 3.77 | 1.62 |
| ADULT | MLP | WEF-Defense | 9.19 | 5.82 | 4.42 | 2.22 |
| BANK | MLP | FedAvg | 2.84 | 1.78 | 1.54 | 0.61 |
| BANK | MLP | WEF-Defense | 4.33 | 3.11 | 2.53 | 1.08 |
Implementation Details. We evaluate and compare the time cost of one epoch for normal training (i.e., FedAvg) and WEF-Defense training on each dataset. The experimental results are shown in Table 8.
Results and Analysis. The time cost of WEF-Defense is tolerable. For instance, on CIFAR-10 and GTSRB in Table 8, WEF-Defense takes about 3.41% to 43.15% more time than the normal training process. Moreover, several datasets (e.g., ADULT and BANK) are easier to train, so their absolute time cost is small.
Answer to RQ3: In summary, WEF-Defense balances defense and main task performance. On average, 1) WEF-Defense outperforms CFFL and RFFL by 33.02% and 10.73% on the main task performance when facing high-proportion free-riders, respectively; 2) compared with FedAvg, WEF-Defense only takes an average of 23.8% more time.
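The quoted range can be reproduced from Table 8 by computing the relative extra time of WEF-Defense over FedAvg for each free-rider ratio. The sketch below does this for all five datasets; the dictionary layout and the helper name `relative_overhead` are illustrative choices, while the numbers are copied from Table 8.

```python
# Per-epoch time cost from Table 8 at free-rider ratios 10%, 30%, 50%, 90%.
# Each entry is (FedAvg times, WEF-Defense times).
epoch_time = {
    "MNIST": ([28.71, 22.35, 16.82, 6.51], [29.97, 24.55, 17.82, 6.88]),
    "CIFAR-10": ([204.98, 170.38, 133.51, 54.85], [211.98, 180.35, 167.64, 78.52]),
    "GTSRB": ([284.74, 222.35, 159.57, 80.38], [318.23, 252.82, 184.79, 85.82]),
    "ADULT": ([7.15, 5.01, 3.77, 1.62], [9.19, 5.82, 4.42, 2.22]),
    "BANK": ([2.84, 1.78, 1.54, 0.61], [4.33, 3.11, 2.53, 1.08]),
}


def relative_overhead(fedavg, wef):
    """Extra time of WEF-Defense relative to FedAvg, in percent."""
    return [100.0 * (w - f) / f for f, w in zip(fedavg, wef)]


overhead = {name: relative_overhead(f, w) for name, (f, w) in epoch_time.items()}
for name, values in overhead.items():
    print(name, [f"{v:.2f}%" for v in values])
```

The smallest and largest values for CIFAR-10 and GTSRB both occur on CIFAR-10, at the 10% and 90% ratios respectively.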
7.6 RQ4: Interpretation of WEF-Defense via Visualization
We further illustrate the effectiveness of WEF-Defense by visualization.
Implementation Details. (1) For federated training with WEF-Defense, we visualize the global model accuracy obtained by the two groups on the GTSRB dataset. The results are shown in Fig.6. (2) In the experiment under the IID setting, the WEF-Matrix of four selected benign clients and of four free-rider attacks (including the original free-rider attack) are displayed as heatmaps, as shown in Fig.7. More visualization results are provided in Appendix A.
Results and Analysis. At the early stage of federated training, the server can completely separate benign clients and free-riders, as shown in Fig.6. Consequently, WEF-Defense is capable of preventing free-riders from obtaining a high-quality model. Meanwhile, after free-riders are separated from the benign clients, the accuracy of the global models assigned to free-riders is low or even degrades, while benign clients can train collaboratively to build high-quality models.
The superiority of defense timeliness is because WEF-Matrix can effectively distinguish benign clients from free-riders. It is obvious from Fig.7 that, on the one hand, the model weight evolving frequency of benign clients has a certain evolving pattern, e.g., some weights evolving frequency are much larger than others, indicating that during normal training, the input data has a greater impact on the weights and has strong activation. Some weights do not have a large frequency variation, indicating that some neurons are difficult to activate, resulting in a weaker optimization of the weights. On the other hand, in the free-rider’s WEF-Matrix, the original free-rider attack does not perform any operation on the model issued by the server, so the weight does not have any optimization process, resulting in the overall weight variation frequency of 0.
In the other three free-rider attacks, although different degrees of camouflage are used, it is difficult to identify sensitive and insensitive neurons because the local model did not carry out normal training. Meanwhile, due to the non-sharing between clients, stealing the optimization results of each weight is difficult. Therefore, it is a challenge for camouflage methods to correctly simulate the variation frequency of each weight, which leads to a very large difference from the WEF-Matrix of the benign client. This enables the server to separate free-riders from benign clients in the early stages of training.
Answer to RQ4: The visual analysis of WEF-Defense further illustrates the effectiveness of the WEF-Matrix. WEF-Defense outperforms baselines in two aspects: (1) it prevents free-riders from acquiring the global model contributed by benign clients at the early stage of training; (2) it does not affect the global model trained by benign clients.
7.7 RQ5: Defense against Adaptive Attacks and Hyperparameter Sensitivity Analysis
This section discusses the effectiveness of WEF-Defense against an adaptive attack and the sensitivity of its hyperparameter under different experimental settings.
7.7.1 Defense against Adaptive Attack
To answer this question, we design an adaptive attack method to test the robustness of WEF-Defense. The attack absorbs the advantages of the most camouflaged attack, the delta weight attack. Meanwhile, it imitates the evolving frequency of the weights during the training process of benign clients, so as to narrow the difference from the WEF-Matrix of benign clients.
Implementation Details. (1) For our proposed WEF-Defense, we design an adaptive free-rider attack method that simulates the normal training and weight optimization process of benign clients. It adds different perturbation noise in the local training round, including , , normally distributed noise. (2) Experiments are conducted under the IID and Non-IID data of five datasets. The results are shown in the Table 9.
| Datasets | IID 10% | IID 30% | IID 50% | IID 90% | Non-IID 10% | Non-IID 30% | Non-IID 50% | Non-IID 90% |
|---|---|---|---|---|---|---|---|---|
| MNIST | 10.14 | 20.12 | 19.71 | 24.2 | 21.74 | 15.63 | 21.52 | 23.41 |
| CIFAR-10 | 10.15 | 21.53 | 20.52 | 20.45 | 9.47 | 18.31 | 18.66 | 13.63 |
| GTSRB | 35.27 | 36.46 | 36.22 | 35.95 | 30.04 | 26.99 | 34.71 | 23.04 |
| ADULT | 64.65 | 56.66 | 56.36 | 55.94 | 63.63 | 51.82 | 51.73 | 49.96 |
| BANK | 62.55 | 70.05 | 69.55 | 68.95 | 67.45 | 50.00 | 63.67 | 50.00 |
Results and Analysis. As shown in Table 9, the designed adaptive attack still struggles to obtain a high-quality model under WEF-Defense. Although we try to simulate the weight optimization process of benign clients by adding different noises, the free-rider itself is not trained and cannot obtain information about the local training of benign clients. It is therefore difficult for the camouflage methods to simulate the optimization process correctly, which results in a difference from the WEF-Matrix of benign clients.
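To make the idea of the adaptive attack concrete, here is a rough sketch of a free-rider that fabricates an update from the change between the two most recent global models (the delta weight idea) and perturbs it with noise. Only Gaussian noise is shown; the function name `adaptive_fake_update`, the sign convention of the delta, the noise scale, and the flat-list weight representation are all assumptions for illustration, not the attack as implemented in the experiments.

```python
import random


def adaptive_fake_update(prev_global, curr_global, noise_std, rng):
    """Fabricate an update without any local training (illustrative only).

    prev_global, curr_global: flattened weights of the two most recent
    global models received from the server.
    noise_std: standard deviation of the Gaussian camouflage noise.
    """
    return [
        (curr - prev) + rng.gauss(0.0, noise_std)
        for prev, curr in zip(prev_global, curr_global)
    ]


rng = random.Random(0)
prev_w = [0.10, -0.20, 0.05]
curr_w = [0.12, -0.25, 0.05]
fake = adaptive_fake_update(prev_w, curr_w, noise_std=0.01, rng=rng)
print(fake)
```

Because the noise is drawn independently of any data, it carries no information about which weights a genuinely trained model would change often, which is consistent with the explanation above.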
7.7.2 Hyperparameter Analysis of Reputation Threshold
In this section, we investigate robust bounds on the reputation threshold. The selection of the reputation threshold separates free-riders from benign clients by grouping clients into . A key challenge lies in choosing an appropriate reputation threshold: a threshold that is too large or too small may make it difficult to separate all free-riders from benign clients.
Implementation Details. The similarity deviation values for all clients in the five datasets are tested and visualized under the IID and Non-IID settings, where takes the average of the first five rounds of clients. Besides, we perform a unified analysis of the client proportions for the three free-rider attacks. The result is shown in Fig.8.
Results and Analysis. Through visual analysis, we find that the reputation threshold of WEF-Defense can be chosen within a certain boundary range, which explains why WEF-Defense can effectively separate benign clients and free-riders in various scenarios. Fig.8 shows that in the IID data scenario, the feasible range of thresholds is larger than in the Non-IID data scenario. We conjecture that under Non-IID data, the local data distribution of some benign clients is more extreme, resulting in a certain difference between their WEF-Matrix and that of other benign clients, but this does not affect the implementation of our method. The reputation threshold set in the experiments distinguishes all benign clients from free-riders.
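The notion of a feasible threshold range can be sketched as follows. The example assumes that free-riders show a larger similarity deviation than benign clients, so any threshold strictly between the largest benign deviation and the smallest free-rider deviation separates the two groups; the direction of the comparison, the function names, and the deviation values are hypothetical.

```python
def feasible_threshold_range(benign_dev, free_rider_dev):
    """Return (low, high) bounding the separating thresholds, or None."""
    low = max(benign_dev)       # largest deviation among benign clients
    high = min(free_rider_dev)  # smallest deviation among free-riders
    return (low, high) if low < high else None


def split_clients(deviations, threshold):
    """Group client ids by comparing their deviation with the threshold."""
    benign = [cid for cid, d in deviations.items() if d <= threshold]
    free_riders = [cid for cid, d in deviations.items() if d > threshold]
    return benign, free_riders


# Hypothetical similarity deviations, averaged over early rounds.
deviations = {"c0": 0.02, "c1": 0.05, "c2": 0.03, "c3": 0.40, "c4": 0.55}
bounds = feasible_threshold_range([0.02, 0.05, 0.03], [0.40, 0.55])
threshold = sum(bounds) / 2  # any value inside the range works
benign_ids, free_rider_ids = split_clients(deviations, threshold)
print(bounds, benign_ids, free_rider_ids)
```

A narrower range, as observed under Non-IID data, leaves less slack for choosing the threshold but still admits a perfect split as long as the range is non-empty.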
7.7.3 Effect of Learning Rate on Reputation Threshold
We analyze whether the learning rate has a strong effect on the bounds of the reputation threshold.
Implementation Details. On the MNIST dataset under the IID and Non-IID settings, we consider the influence of different learning rates on the reputation threshold, where the learning rates are set to 0.005, 0.01, and 0.1, respectively. The experimental results are shown in Fig.9.
Results and Analysis. As can be seen from Fig.9, the similarity deviation of the clients does not fluctuate greatly under different learning rates, indicating that the effect of the learning rate on the threshold boundary is small. The reason may be that, regardless of the learning rate, the optimization of the weights requires a variation process, so the formation of the WEF-Matrix is not affected. This further suggests that the reputation threshold is not strongly affected by the learning rate.
Answer to RQ5: Experiments demonstrate that WEF-Defense is robust to adaptive attacks and hyperparameter . Specifically, 1) due to the significant difference between benign clients and free-riders, WEF-Defense has a strong ability to resist camouflage and can effectively defend against adaptive attacks; 2) the hyperparameter in WEF-Defense has a good adjustable range, and is not greatly affected by the learning rate.
8 Limitation and Discussion
Although WEF-Defense has demonstrated its outstanding performance in defending against various free-rider attacks, its effectiveness can still be improved in terms of Non-IID data and time cost.
Process Non-IID data. The reputation threshold boundary range under the Non-IID setting is not as wide as that under the IID setting. We speculate the reason is that there are several benign clients with poor local data quality under the Non-IID setting. These clients’ contribution to federated training may not be much more than that of free-riders. Therefore, it is necessary to improve the identification of free-riders under the Non-IID setting.
Reduce time cost. Despite the advantages of WEF-Defense in terms of defense, it can be further improved in terms of time cost. The main reason is that the client needs to upload additional information, which increases the time cost. It is worth the effort to reduce the time cost while ensuring the defense effectiveness.
9 Conclusion
In this paper, we highlight that the difference between free-riders and benign clients in the dynamic training process can be effectively used to defend against free-rider attacks, based on which we propose WEF-Defense. WEF-Defense generally outperforms all baselines and also performs well against various camouflaged free-rider attacks. The experiments further analyze the effectiveness of WEF-Defense from five perspectives and verify that WEF-Defense not only defends against free-rider attacks but also does not affect the training of benign clients. Since WEF-Defense and existing methods are complementary to each other, we plan to design a more robust and secure federated learning mechanism by exploring the potential of combining them in future work. Besides, it is possible to conduct free-rider attacks on vertical FL. In the future, we will explore free-rider attacks on vertical FL and possible defenses.
10 Acknowledgements
This research is supported by the National Natural Science Foundation of China (No. 62072406), the National Key R&D Projects of China (No. 2018AAA0100801), the Key R&D Projects in Zhejiang Province (No. 2021C01117), the 2020 Industrial Internet Innovation Development Project (No.TC200H01V), “Ten Thousand Talents Program” in Zhejiang Province (No. 2020R52011), and the Key Lab of Ministry of Public Security (No. 2020DSJSYS001).
References
- Ayi and El-Sharkawy (2020) Ayi, M., El-Sharkawy, M., 2020. Rmnv2: Reduced mobilenet V2 for CIFAR10, in: 10th Annual Computing and Communication Workshop and Conference, CCWC 2020, Las Vegas, NV, USA, January 6-8, 2020, IEEE. pp. 287–292. URL: https://doi.org/10.1109/CCWC47524.2020.9031131.
- Blanco-Justicia et al. (2021) Blanco-Justicia, A., Domingo-Ferrer, J., Martínez, S., Sánchez, D., Flanagan, A., Tan, K.E., 2021. Achieving security and privacy in federated learning systems: Survey, research challenges and future directions. Eng. Appl. Artif. Intell. 106, 104468. URL: https://doi.org/10.1016/j.engappai.2021.104468.
- Chen (2022) Chen, H., 2022. Reliable and Efficient Distributed Machine Learning. Ph.D. thesis. Royal Institute of Technology, Stockholm, Sweden. URL: https://nbn-resolving.org/urn:nbn:se:kth:diva-310374.
- El-Sawy et al. (2016) El-Sawy, A., El-Bakry, H.M., Loey, M., 2016. CNN for handwritten arabic digits recognition based on lenet-5, in: Proceedings of the International Conference on Advanced Intelligent Systems and Informatics, AISI 2016, Cairo, Egypt, October 24-26, 2016, pp. 566–575. URL: https://doi.org/10.1007/978-3-319-48308-5_54.
- Fraboni et al. (2021) Fraboni, Y., Vidal, R., Lorenzi, M., 2021. Free-rider attacks on model aggregation in federated learning, in: The 24th International Conference on Artificial Intelligence and Statistics, AISTATS 2021, April 13-15, 2021, Virtual Event, PMLR. pp. 1846–1854. URL: http://proceedings.mlr.press/v130/fraboni21a.html.
- Gao et al. (2022) Gao, L., Li, L., Chen, Y., Xu, C., Xu, M., 2022. FGFL: A blockchain-based fair incentive governor for federated learning. J. Parallel Distributed Comput. 163, 283–299. URL: https://doi.org/10.1016/j.jpdc.2022.01.019.
- Hard et al. (2018) Hard, A., Rao, K., Mathews, R., Beaufays, F., Augenstein, S., Eichner, H., Kiddon, C., Ramage, D., 2018. Federated learning for mobile keyboard prediction. CoRR abs/1811.03604. URL: http://arxiv.org/abs/1811.03604.
- He et al. (2016) He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, IEEE Computer Society. pp. 770–778. URL: https://doi.org/10.1109/CVPR.2016.90.
- Huang et al. (2022) Huang, W., Li, T., Wang, D., Du, S., Zhang, J., Huang, T., 2022. Fairness and accuracy in horizontal federated learning. Inf. Sci. 589, 170–185. URL: https://doi.org/10.1016/j.ins.2021.12.102.
- Jiang et al. (2021a) Jiang, C., Xu, C., Zhang, Y., 2021a. PFLM: privacy-preserving federated learning with membership proof. Inf. Sci. 576, 288–311. URL: https://doi.org/10.1016/j.ins.2021.05.077.
- Jiang et al. (2016) Jiang, F., Liu, G., Du, J., Sui, Y., 2016. Initialization of k-modes clustering using outlier detection techniques. Inf. Sci. 332, 167–183. URL: https://doi.org/10.1016/j.ins.2015.11.005.
- Jiang et al. (2022) Jiang, J., Cui, B., Zhang, C., 2022. Distributed Machine Learning and Gradient Optimization. Springer. URL: https://doi.org/10.1007/978-981-16-3420-8.
- Jiang et al. (2021b) Jiang, Y., Xu, G., Fang, Z., Song, S., Li, B., 2021b. Heterogeneous fairness algorithm based on federated learning in intelligent transportation system. J. Comput. Methods Sci. Eng. 21, 1365–1373. URL: https://doi.org/10.3233/JCM-214991.
- Kohavi (1996) Kohavi, R., 1996. Scaling up the accuracy of naive-bayes classifiers: A decision-tree hybrid, in: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, Oregon, USA, AAAI Press. pp. 202–207. URL: http://www.aaai.org/Library/KDD/1996/kdd96-033.php.
- Kuo and Pham (2022) Kuo, T., Pham, A., 2022. Detecting model misconducts in decentralized healthcare federated learning. Int. J. Medical Informatics 158, 104658. URL: https://doi.org/10.1016/j.ijmedinf.2021.104658.
- LeCun et al. (1998) LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., 1998. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324. URL: https://doi.org/10.1109/5.726791.
- Li et al. (2022) Li, D., Han, D., Weng, T., Zheng, Z., Li, H., Liu, H., Castiglione, A., Li, K., 2022. Blockchain for federated learning toward secure distributed machine learning systems: a systemic survey. Soft Comput. 26, 4423–4440. URL: https://doi.org/10.1007/s00500-021-06496-5.
- Lin et al. (2019) Lin, J., Du, M., Liu, J., 2019. Free-riders in federated learning: Attacks and defenses. CoRR abs/1911.12560. URL: http://arxiv.org/abs/1911.12560.
- Liu and Ma (2022) Liu, G., Ma, W., 2022. A quantum artificial neural network for stock closing price prediction. Inf. Sci. 598, 75–85. URL: https://doi.org/10.1016/j.ins.2022.03.064.
- Long et al. (2020) Long, G., Tan, Y., Jiang, J., Zhang, C., 2020. Federated learning for open banking, in: Yang, Q., Fan, L., Yu, H. (Eds.), Federated Learning - Privacy and Incentive. Springer. volume 12500 of Lecture Notes in Computer Science, pp. 240–254. URL: https://doi.org/10.1007/978-3-030-63076-8_17.
- Lyu et al. (2020) Lyu, L., Xu, X., Wang, Q., Yu, H., 2020. Collaborative fairness in federated learning, in: Federated Learning - Privacy and Incentive. Springer. volume 12500 of Lecture Notes in Computer Science, pp. 189–204. URL: https://doi.org/10.1007/978-3-030-63076-8_14.
- Malmierca et al. (2019) Malmierca, M.S., Niño-Aguillón, B.E., Nieto-Diego, J., Porteros, Á., Pérez-González, D., Escera, C., 2019. Pattern-sensitive neurons reveal encoding of complex auditory regularities in the rat inferior colliculus. NeuroImage 184, 889–900. URL: https://doi.org/10.1016/j.neuroimage.2018.10.012.
- McMahan et al. (2017) McMahan, B., Moore, E., Ramage, D., Hampson, S., y Arcas, B.A., 2017. Communication-efficient learning of deep networks from decentralized data, in: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017, 20-22 April 2017, Fort Lauderdale, FL, USA, PMLR. pp. 1273–1282. URL: http://proceedings.mlr.press/v54/mcmahan17a.html.
- McMahan et al. (2016) McMahan, H.B., Moore, E., Ramage, D., y Arcas, B.A., 2016. Federated learning of deep networks using model averaging. CoRR abs/1602.05629. URL: http://arxiv.org/abs/1602.05629.
- Moro et al. (2011) Moro, S., Laureano, R., Cortez, P., 2011. Using data mining for bank direct marketing: An application of the crisp-dm methodology.
- Rached et al. (2021) Rached, N.B., Haji-Ali, A., Rubino, G., Tempone, R., 2021. Efficient importance sampling for large sums of independent and identically distributed random variables. Stat. Comput. 31, 79. URL: https://doi.org/10.1007/s11222-021-10055-1.
- Rademacher and Doroslovacki (2021) Rademacher, P., Doroslovacki, M., 2021. Bayesian learning for regression using dirichlet prior distributions of varying localization, in: IEEE Statistical Signal Processing Workshop, SSP 2021, Rio de Janeiro, Brazil, July 11-14, 2021, IEEE. pp. 236–240. URL: https://doi.org/10.1109/SSP49050.2021.9513745.
- Sankaran et al. (2022) Sankaran, A., Alashti, N.A., Psarras, C., Bientinesi, P., 2022. Benchmarking the linear algebra awareness of tensorflow and pytorch. CoRR abs/2202.09888. URL: https://arxiv.org/abs/2202.09888.
- Sermanet and LeCun (2011) Sermanet, P., LeCun, Y., 2011. Traffic sign recognition with multi-scale convolutional networks, in: The 2011 International Joint Conference on Neural Networks, IJCNN 2011, San Jose, California, USA, July 31 - August 5, 2011, IEEE. pp. 2809–2813. URL: https://doi.org/10.1109/IJCNN.2011.6033589.
- Shingi (2020) Shingi, G., 2020. A federated learning based approach for loan defaults prediction, in: Fatta, G.D., Sheng, V.S., Cuzzocrea, A., Zaniolo, C., Wu, X. (Eds.), 20th International Conference on Data Mining Workshops, ICDM Workshops 2020, Sorrento, Italy, November 17-20, 2020, IEEE. pp. 362–368. URL: https://doi.org/10.1109/ICDMW51313.2020.00057.
- Simonyan and Zisserman (2015) Simonyan, K., Zisserman, A., 2015. Very deep convolutional networks for large-scale image recognition, in: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, pp. 1–14. URL: http://arxiv.org/abs/1409.1556.
- Tan et al. (2021) Tan, A.Z., Yu, H., Cui, L., Yang, Q., 2021. Towards personalized federated learning. CoRR abs/2103.00710. URL: https://arxiv.org/abs/2103.00710.
- Tolstikhin et al. (2021) Tolstikhin, I.O., Houlsby, N., Kolesnikov, A., Beyer, L., Zhai, X., Unterthiner, T., Yung, J., Steiner, A., Keysers, D., Uszkoreit, J., Lucic, M., Dosovitskiy, A., 2021. Mlp-mixer: An all-mlp architecture for vision. CoRR abs/2105.01601. URL: https://arxiv.org/abs/2105.01601.
- Wan et al. (2021) Wan, S., Lu, J., Fan, P., Shao, Y., Peng, C., Letaief, K.B., 2021. How global observation works in federated learning: Integrating vertical training into horizontal federated learning. CoRR abs/2112.01039. URL: https://arxiv.org/abs/2112.01039.
- Wang et al. (2021) Wang, F., Zhu, H., Lu, R., Zheng, Y., Li, H., 2021. A privacy-preserving and non-interactive federated learning scheme for regression training with gradient descent. Inf. Sci. 552, 183–200. URL: https://doi.org/10.1016/j.ins.2020.12.007.
- Xu et al. (2021) Xu, J., Glicksberg, B.S., Su, C., Walker, P.B., Bian, J., Wang, F., 2021. Federated learning for healthcare informatics. J. Heal. Informatics Res. 5, 1–19. URL: https://doi.org/10.1007/s41666-020-00082-4.
- Xu et al. (2020) Xu, J., Park, S.H., Zhang, X., 2020. A temporally irreversible visual attention model inspired by motion sensitive neurons. IEEE Trans. Ind. Informatics 16, 595–605. URL: https://doi.org/10.1109/TII.2019.2934144.
- Xu and Lyu (2021) Xu, X., Lyu, L., 2021. A reputation mechanism is all you need: Collaborative fairness and adversarial robustness in federated learning, in: Proc. ICML Workshop on Federated Learning for User Privacy and Data Confidentiality, pp. 1–13.
- Yang et al. (2019) Yang, Q., Liu, Y., Chen, T., Tong, Y., 2019. Federated machine learning: Concept and applications. ACM Trans. Intell. Syst. Technol. 10, 12:1–12:19. URL: https://doi.org/10.1145/3298981.
- Yang et al. (2018) Yang, T., Andrew, G., Eichner, H., Sun, H., Li, W., Kong, N., Ramage, D., Beaufays, F., 2018. Applied federated learning: Improving google keyboard query suggestions. CoRR abs/1812.02903. URL: http://arxiv.org/abs/1812.02903.
- Zamzami and Bouguila (2022) Zamzami, N., Bouguila, N., 2022. Sparse count data clustering using an exponential approximation to generalized dirichlet multinomial distributions. IEEE Trans. Neural Networks Learn. Syst. 33, 89–102. URL: https://doi.org/10.1109/TNNLS.2020.3027539.
- Zhang et al. (2021) Zhang, C., Xie, Y., Bai, H., Yu, B., Li, W., Gao, Y., 2021. A survey on federated learning. Knowl. Based Syst. 216, 106775. URL: https://doi.org/10.1016/j.knosys.2021.106775.
- Zhao et al. (2022) Zhao, Z., Huang, J., Roos, S., Chen, L.Y., 2022. Attacks and defenses for free-riders in multi-discriminator GAN. CoRR abs/2201.09967. URL: https://arxiv.org/abs/2201.09967.
- Zhu et al. (2021a) Zhu, H., Xu, J., Liu, S., Jin, Y., 2021a. Federated learning on non-iid data: A survey. Neurocomputing 465, 371–390. URL: https://doi.org/10.1016/j.neucom.2021.07.098.
- Zhu et al. (2021b) Zhu, H., Xu, J., Liu, S., Jin, Y., 2021b. Federated learning on non-iid data: A survey. Neurocomputing 465, 371–390. URL: https://doi.org/10.1016/j.neucom.2021.07.098.
- Zoghbi et al. (2016) Zoghbi, S., Vulic, I., Moens, M., 2016. Latent dirichlet allocation for linking user-generated content and e-commerce data. Inf. Sci. 367-368, 573–599. URL: https://doi.org/10.1016/j.ins.2016.05.047.
- Zong et al. (2018) Zong, B., Song, Q., Min, M.R., Cheng, W., Lumezanu, C., Cho, D., Chen, H., 2018. Deep autoencoding gaussian mixture model for unsupervised anomaly detection, in: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, OpenReview.net. pp. 1–19. URL: https://openreview.net/forum?id=BJJLHbb0-.
Appendix A More Visualizations