With the development of the Internet of Things (IoT), the amount of data generated by end devices is exploding at an unprecedented rate. Conventional machine learning (ML) technologies face the problem of efficiently collecting distributed data from various IoT devices for centralized processing [TheNextGrandChallenges]. To tackle this transmission bottleneck, distributed machine learning (DML) has emerged to process data at the network edge in a distributed manner. DML can alleviate the burden on the central server by dividing a task into sub-tasks assigned to multiple nodes. However, DML needs to exchange samples when training a task [DBLP:conf/aistats/McMahanMRHA17], posing a serious risk of privacy leakage. As such, federated learning (FL) [DBLP:journals/corr/KonecnyMRR16], proposed by Google as a novel DML paradigm, shows its potential advantages. In an FL system, a machine learning model is trained across multiple distributed clients with local datasets and then aggregated on a centralized server. FL is able to cooperatively accomplish machine learning tasks without raw data transmissions, thereby protecting clients' data privacy [9048613, DBLP:journals/corr/abs-2007-02056, DBLP:journals/tifs/WeiLDMYFJQP20]. FL has been applied to various data-sensitive scenarios, such as smart health-care, E-commerce [DBLP:journals/tist/YangLCT19], and the Google project Gboard [DBLP:journals/corr/abs-1912-01218].
However, due to centralized aggregations of models, standard FL is vulnerable to server malfunctions and external attacks, incurring either inaccurate model updates or even training failures. To solve this single-point-failure issue, blockchain [nakamoto2008peer, DBLP:journals/fgcs/ReynaMCSD18, 8436042] has been applied to FL systems. Leveraging the advantages of blockchain techniques, the work in [DBLP:journals/corr/abs-1808-03949] developed a blockchain-enabled FL architecture to validate the uploaded parameters and investigated system performance, such as the block generation rate and learning latency. Later, the work in [DBLP:conf/cyberc/MartinezFH19] incorporated Delegated Proof of Stake (DPoS) into blockchain-enabled FL to enhance the delay performance at the expense of robustness. The recent work in [DBLP:journals/tii/LuHDMZ20a] developed a tamper-proof architecture that utilized blockchain to enhance system security when sharing parameters, and proposed a novel consensus mechanism, i.e., Proof of Quality (PoQ), to optimize the reward function. Since model aggregations are fulfilled by miners in a decentralized manner, blockchained FL can solve the single-point-failure problem. In addition, owing to a validation process of local training, FL can be extended to untrustworthy devices in a public network.
Although the above-mentioned works resorted to blockchain architectures to avoid the single point of failure, they inevitably introduced a third party, i.e., the miners rooted in blockchain, to store the aggregated models distributively, causing potential information leakage. Also, these works did not analyze the convergence of model training, which is important for evaluating FL learning performance. In addition, the consumption of resources (e.g., computing capability) caused by mining in blockchain is generally not taken into account in these works. However, the resources consumed by mining are not negligible compared with those consumed by FL model training [DBLP:journals/fgcs/ReynaMCSD18]. Hence, blockchain-enabled FL needs to balance resource allocation between training and mining.
In this work, we propose a novel blockchain-assisted decentralized FL (BLADE-FL) architecture. In BLADE-FL, the training and mining processes are incorporated and implemented at each client, i.e., a client conducts both model training and mining tasks with its own computing capability. We analyze an upper bound on the loss function to evaluate the learning performance of BLADE-FL. Then we optimize the computing resource allocation between local training and mining on a client to approach the optimal learning performance. We also pay special attention to a security issue that inherently exists in BLADE-FL, known as the lazy clients problem. In this problem, lazy clients try to save their computing resources by directly plagiarizing models from others, leading to training deficiency and performance degradation.
The main contributions of this paper can be summarized as follows.
We propose a novel blockchain-assisted FL framework, called BLADE-FL, to overcome the issues raised by the centralized aggregations in conventional FL systems. In each round of BLADE-FL, clients first train local models and broadcast them to others. Then they act as miners, competing to generate a block based on the received models. Afterwards, each client aggregates the models from the verified block to form an initial model for the local training in the next round. Compared with conventional blockchain-enabled FL, our BLADE-FL helps promote privacy against model leakage, and guarantees tamper-resistant model updates in a trusted blockchain network.
We analyze an upper bound on the loss function to evaluate the learning performance of BLADE-FL. In particular, we minimize the upper bound by optimizing the computing resource allocation between training and mining, and further explore the relationship among the optimal number of integrated rounds, the training time per iteration, the mining time per block, the number of clients, and the learning rate.
We develop a lazy model for BLADE-FL, where the lazy clients plagiarize others’ weights and add artificial noises. Moreover, we develop an upper bound on the loss function for this case, and investigate the impact of the number of lazy clients and the power of artificial noises on the learning performance.
We provide experimental results, which are consistent with the analytical results. In particular, the developed upper bound on the loss function is tight with respect to the experimental results (e.g., the gap can be lower than ), and the optimized resource allocation approaches the minimum of the loss function.
TABLE I: Main notation used in this paper
The set of training samples in the -th client
The -th client
The total number of clients
The total number of lazy clients
The variance of artificial noise added by a lazy client
The total number of integrated rounds
The number of iterations of local training
The global loss function
The local loss function of the -th client
Local model weights of the -th client at the -th integrated round
Global model weights aggregated from local models at the -th integrated round
Local model weights of the -th lazy client at the -th integrated round
Learning rate of the gradient descent algorithm
Training time per iteration
Mining time per block
Total computing time constraint of an FL task
The remainder of this paper is organized as follows. Section 2 first introduces the background of this paper. Then we propose BLADE-FL in Section 3, and optimize the upper bound on the loss function in Section 4. Section 5 investigates the issue of lazy clients. The experimental results are presented in Section 6. Section 7 concludes this paper. In addition, Table I lists the main notation used in this paper.
2.1 Federated Learning
In an FL system, there are multiple clients, with the -th client possessing a local dataset. Each client trains its local model, e.g., a deep neural network, based on its local data and transmits the trained model to the server. Upon receiving the weights from all the clients, the server performs a global model aggregation. There are a number of communication rounds for exchanging models between the server and the clients. Each round consists of an uploading phase, in which the clients upload their local models, and a downloading phase, in which the server aggregates the models and broadcasts the result to the clients. The clients then update their local models based on the global one.
In the -th communication round, the server performs a global aggregation according to some combining rule, e.g., , where and denote the local weights of the -th client and the aggregated weights, respectively. The global loss function is defined as [DBLP:journals/corr/abs-1907-09693], where is the local loss function of the -th client. In FL, each client is trained locally to minimize its local loss function, while the entire system is trained to minimize the global loss function . The FL system finally outputs , where is the overall number of communication rounds. Different from the training process in conventional DML systems [DBLP:conf/aistats/McMahanMRHA17], each client in FL only shares its local model rather than its personal data to update the global model , promoting the clients' privacy.
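The combining rule described above can be sketched as a sample-size-weighted average of the local weights (the FedAvg-style rule); the function name and toy values below are ours, for illustration only:

```python
import numpy as np

def fedavg(local_weights, sample_counts):
    """Aggregate local model weights by a sample-size-weighted average,
    a common combining rule for the global aggregation step."""
    total = sum(sample_counts)
    agg = np.zeros_like(local_weights[0], dtype=float)
    for w, n in zip(local_weights, sample_counts):
        agg += (n / total) * np.asarray(w, dtype=float)
    return agg

# Three clients with equal-sized datasets: the aggregate is the plain mean.
w_global = fedavg(
    [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])],
    [100, 100, 100],
)
```

With unequal sample counts, clients holding more data pull the global model proportionally closer to their local weights.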
Blockchain is a shared and decentralized ledger. In a blockchain system, each block stores a group of transactions. The blocks are linked together to form a chain by referencing the hash value of the previous block. Owing to the cryptographic chain structure, any data modification within any block destroys the preceding chain structure. Therefore, it is impossible to tamper with the data that has been stored in the blockchain.
In addition, thanks to the consensus mechanism, each transaction included in the newly generated block is also immutable. The consensus mechanism validates the data within the blocks and ensures that all the nodes participating in the blockchain store the same data. The most prevalent consensus mechanism is Proof of Work (PoW), used in the Bitcoin system [nakamoto2008peer]. In Bitcoin, a block is generated as follows. First, a node broadcasts a transaction with its signature to the blockchain network via the gossip protocol [DBLP:journals/sigops/DemersGHILSSST88]. Then the nodes in the blockchain verify the transaction by its signature. Afterward, each node collects the verified transactions and competes to generate a new block that includes these transactions by finding a unique, one-time number (called a nonce) that makes the hash value of the block data meet a specific target. The node that finds a proper nonce is eligible to generate the new block and broadcasts it to the entire network. Finally, the nodes validate the new block and append the verified block to the existing blockchain [DBLP:journals/cem/PuthalMMKD18]. Notably, the puzzle in PoW is a mathematical problem that is easy to verify but extremely hard to solve. The nodes in the blockchain consume massive computing resources to solve this problem. This process is called mining, and those who take part in it are known as miners. Because of the mining process, PoW can defend against attacks on the condition that the total computing power of malicious devices is less than that of honest devices (i.e., the 51% attack) [DBLP:journals/cacm/EyalS18].
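The nonce search and its cheap verification can be sketched as follows; this is a toy PoW with a reduced difficulty for illustration, not Bitcoin's actual block format or parameters:

```python
import hashlib

def mine(block_data: bytes, difficulty_bits: int) -> int:
    """Search for a nonce such that SHA-256(block_data || nonce) has
    `difficulty_bits` leading zero bits -- hard to find, easy to verify."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def verify(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Verification needs only a single hash evaluation."""
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))

nonce = mine(b"block with model updates", difficulty_bits=16)
assert verify(b"block with model updates", nonce, 16)
```

The asymmetry between the loop in `mine` and the single hash in `verify` is exactly the "easy to verify, hard to solve" property the text describes.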
In this context, blockchain is safe and reliable with the aid of the chain structure and the consensus mechanism. Driven by these merits, we deploy blockchain to replace the central server and build a decentralized FL network with privacy protection.
3 Proposed Framework
The BLADE-FL network consists of clients, each with equal computing power. (In this paper, computing resource and computing power are used interchangeably; both are measured in CPU cycles per second.) In this network, each client acts not only as a trainer but also as a miner, and the role transition is designed as follows. First, each client (as a trainer) trains the local model, and then broadcasts the local model to the entire network as a requested transaction of the blockchain. Second, the client (as a miner) mines the block that includes all the local models that are ready to be aggregated. Once the newly generated block is validated by the majority of clients, the verified models in the block are immutable. Without the intervention of any centralized server, each client performs the global aggregation to update its local model by using all the shared models in the validated block. Suppose that the uploading and downloading phases cannot be tampered with by external attackers.
Let us consider that all the clients deploy the same time allocation strategy for local training and mining. In other words, all the clients start the training at the same time, and then turn to the mining simultaneously. In this context, for each global model update and block generation, we define an integrated round for BLADE-FL that combines a communication round of FL and a mining round of blockchain. As illustrated in Fig. 1, the -th integrated round can be specified as the following steps. (At the very beginning of the first integrated round, each client initializes its local parameters, such as the initial weight and the learning rate.)
Local Training. Each client performs the local training by iterating the learning algorithm times to update its own model .
Model Broadcasting and Verification. Each client signs its model with a digital signature and propagates the model as its requested transaction. The other clients verify the transaction of the requesting client (i.e., the identity of the client).
Mining. Upon receiving the models from others, all the clients compete to mine the -th block.
Block Validation. All the clients append the new block onto their local ledgers only if the block is validated.
Local Updating. Upon receipt of verified transactions in this block, each client updates its local model. Then the system proceeds to the -th round.
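The five steps above can be sketched in a highly simplified simulation; here local training is mocked as a random walk on a scalar weight, and the mining winner is drawn at random rather than by actual PoW, so only the round structure is faithful:

```python
import random

class Client:
    """Minimal stand-in for a BLADE-FL client (names are ours)."""
    def __init__(self, cid, weight=0.0):
        self.cid, self.weight, self.ledger = cid, weight, []

    def train(self, iters):
        for _ in range(iters):              # Step 1: local training (mocked)
            self.weight += random.uniform(-0.1, 0.1)
        return self.weight

def integrated_round(clients, local_iters, block_index):
    # Step 1: each client trains its local model.
    models = {c.cid: c.train(local_iters) for c in clients}
    # Step 2: broadcasting and verification (signatures omitted in this sketch).
    # Step 3: mining -- in PoW the fastest miner wins; here picked at random.
    winner = random.choice(clients)
    block = {"index": block_index, "models": models, "miner": winner.cid}
    # Step 4: block validation -- every client appends the block to its ledger.
    for c in clients:
        c.ledger.append(block)
    # Step 5: local updating -- each client aggregates the models in the block.
    avg = sum(models.values()) / len(models)
    for c in clients:
        c.weight = avg
    return block

clients = [Client(i) for i in range(4)]
for k in range(3):
    integrated_round(clients, local_iters=5, block_index=k)
```

After each round, every client holds the same aggregated weight and the same ledger, mirroring the decentralized aggregation without a central server.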
In contrast to [DBLP:journals/tii/LuHDMZ20a], BLADE-FL does not rely on an additional third party for global aggregation, thereby promoting privacy against model leakage. From the above steps, the consensus mechanism builds a bridge between the local models from the clients and the model aggregation. Thanks to PoW, BLADE-FL guarantees tamper-resistant model updates in a trusted blockchain network.
3.2 Computing Resource Allocation Model
In this subsection, we model the time required for training and mining, to show the relationship between FL and blockchain in BLADE-FL.
Block Generation Rate: The block generation rate is determined by the computational complexity of the hash function and the total computing power of the blockchain network (i.e., total CPU cycles). The average number of CPU cycles required to generate a block in PoW is defined as , where is the mining difficulty (following PoW, the mining difficulty is adjusted at different intervals but remains unaltered over each interval; thus, we consider the average CPU cycles invariant over a period with a fixed mining difficulty), and denotes the average number of total CPU cycles to generate a block [DBLP:journals/tpds/XuWLGLYG19]. Thus, we define the average generation time of a block as
where denotes the CPU cycles per second of each client. Given a fixed , is a constant.
Local Training Rate: Recall that the local training of each client contains iterations. The training time consumed by each training iteration at the -th client is given by 
where denotes the number of samples in the -th client, and denotes the number of CPU cycles required to train one sample. This paper considers that each client is equipped with the same hardware resources (e.g., CPU, battery, and cache memory). Therefore, each client is loaded with the same number of local samples, and has the same and . However, the contents of the samples owned by different clients are diverse. For simplicity, we assume that each client uses the same training algorithm and trains the same number of iterations for its local model update. Consequently, each client has an identical local training time per iteration. In this context, we let be a constant.
Consider that a typical FL learning task is required to be accomplished within a fixed duration of . Given the same hardware configuration, each client has the total number of CPU cycles . From (1) and (2), the number of iterations for local training in each integrated round is given by
where denotes the floor function, and is a positive integer that represents the total number of integrated rounds. Furthermore, denotes the total training time, while is the total mining time. Under the constraint of computing time , we notice that the longer the mining takes, the shorter the training becomes. That is because (3) implies a fundamental tradeoff in BLADE-FL, i.e., the more iterations each client trains locally, the fewer integrated rounds the BLADE-FL network performs. Moreover, due to the floor operation in (3), there may exist some computing time left, i.e., . We stress that this extra time is not sufficient to perform another integrated round, and thereby the global model cannot be updated during this period. In this context, we ignore this computing time and assume in the following analysis.
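The time-allocation tradeoff behind (3) can be illustrated with a small helper; the variable names below are ours, not the paper's symbols. Splitting a fixed budget over more integrated rounds leaves fewer local iterations per round, since each round also pays the mining time:

```python
import math

def local_iters_per_round(T_total, K, t_train, t_mine):
    """Iterations of local training per integrated round when a total time
    budget T_total is split over K rounds, each also paying t_mine for the
    block generation. Sketch of the floor relation in (3)."""
    per_round = T_total / K - t_mine   # time left for training in one round
    if per_round <= 0:
        return 0                       # budget exhausted by mining alone
    return math.floor(per_round / t_train)

# More rounds -> fewer local iterations each, under the same budget.
print(local_iters_per_round(T_total=100.0, K=5, t_train=1.0, t_mine=4.0))   # 16
print(local_iters_per_round(T_total=100.0, K=10, t_train=1.0, t_mine=4.0))  # 6
```

The floor operation is why a sliver of unused time can remain at the end, as noted above.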
In what follows, we optimize the learning performance of BLADE-FL based on (3).
4 Performance Analysis of the BLADE-FL System
In this section, we evaluate the learning performance of BLADE-FL with the upper bound on the loss function in Section 4.1, and optimize the learning performance with respect to the number of integrated rounds in Section 4.2.
4.1 Achievable Upper Bound Analysis
Existing works have evaluated the learning performance of standard FL based on the loss function, where a smaller value of the loss function corresponds to a learning model with higher accuracy. Recently, the work in [DBLP:journals/jsac/WangTSLMHC19] derived an upper bound on the loss function that relates the iterations of local training and global aggregation.
Compared with standard FL, our BLADE-FL replaces the centralized server with a blockchain network for global aggregation. Notably, the training process and the aggregation rule are the same as in centralized FL. Thus, the upper bound on the loss function derived in [DBLP:journals/jsac/WangTSLMHC19] can be applied to BLADE-FL.
We make the following assumption for all the clients.
For any two different and , we assume that
is convex, i.e., ;
is -Lipschitz, i.e., ;
is L-smooth, i.e., .
According to Assumption 4.1, is convex, -Lipschitz, and L-smooth [DBLP:conf/pkdd/KarimiNS16].
The work in [DBLP:journals/jsac/WangTSLMHC19] also introduced the following measurement to capture the divergence between the gradient of the local loss function and that of the global loss function.
Definition (Gradient Divergence [DBLP:journals/jsac/WangTSLMHC19])
For each client, we define as an upper bound on , i.e., . Thus, the global gradient divergence can be expressed as .
This divergence is related to the distribution of local datasets over different clients. From [DBLP:journals/jsac/WangTSLMHC19], the following lemma presents an upper bound on the loss function in the standard FL.
An upper bound on the loss function is given by
where denotes the initial weight, denotes the optimal global weight, and denotes the learning rate with .
and denotes the upper bound on the loss function in BLADE-FL.
The upper bound in (6) shows that the learning performance depends on the total number of integrated rounds , the local training time per iteration , the average mining time per block , the learning rate , the data distribution , and the total computing time . From Definition 4.1, is fixed given and the datasets of each client, and is preset. Recall that and are both constant in (1) and (2). Given any fixed , , , and , in (6) is a univariate function of . In the following theorem, we verify that is a convex function with respect to .
is convex with respect to .
From (6), we define
where and . Since is a univariate function, we can optimize to minimize . Notice that are independent of , and is a function with respect to . Therefore, we compute the first and second derivatives of with respect to , respectively, as
Then, we have
Thus, is convex. Since we have , we prove that is convex [DBLP:journals/pieee/LiFL20].
In practice, should not be too small, since a tiny will make the system vulnerable to external attacks.
4.2 Optimal Computing Resource Allocation
First, the following theorem shows the optimal solution that minimizes .
Given any fixed , , (or ), and , the optimal number of integrated rounds that minimizes the upper bound on the loss function in (6) is given by
Let and . We first have
Using (14), we obtain
Then, we approximate as a quadratic term with Taylor expansion:
Thus, can be written as
To solve the convex problem, we let , i.e.,
Finally, we have
This completes the proof.
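Since the upper bound is convex in the (integer) number of integrated rounds, the minimizer can also be located numerically by a one-dimensional scan that stops once the values start rising. The bound used below is an illustrative convex shape standing in for (6), not the paper's exact expression:

```python
def argmin_convex_int(f, k_max):
    """For a function convex in an integer argument, scan k = 1..k_max and
    return the minimizer; convexity guarantees a single valley, so we can
    stop at the first increase."""
    best_k, best_v = 1, f(1)
    for k in range(2, k_max + 1):
        v = f(k)
        if v > best_v:      # past the valley of a convex sequence
            break
        best_k, best_v = k, v
    return best_k

# Toy convex bound in K: a decaying term (more aggregation helps) plus a
# linear cost term (each round spends mining time). Coefficients are ours.
bound = lambda K: 50.0 / K + 0.5 * K
print(argmin_convex_int(bound, 100))  # 10
```

In practice, such a scan is a useful cross-check on a closed-form optimum obtained via a Taylor approximation, as in the proof above.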
Then, under a fixed constraint , let us focus on the effect of and on under fixed and in the following corollaries. (The analytical results in Corollaries 1, 2, 3, 4, and 5 are with respect to . Due to the fundamental tradeoff between and , the opposite results with respect to also hold.)
Given and , the optimal value decreases as either or goes up. In this case, more time is allocated to training when gets larger or to mining when gets larger.
This corollary is a straightforward result from Theorem 4.2.
Recall that denotes the training time per iteration, and denotes the mining time per block. From Corollary 4.2, the longer a local training iteration takes, the more computing power allocated to the local training at each client. Similarly, each client allocates more computing power to the mining when the mining time is larger.
Given fixed and , becomes larger as grows. In this case, more time is allocated to the mining.
Without approximation of (17), we first let , i.e.,
For simplicity, we let
Then, the first derivative of is given by
Notice that in (22) is a decreasing function with respect to , is an increasing function with respect to , and is a decreasing function with respect to . Thus, the solution of (22) drops as grows. Finally, we conclude that increases as rises.
Given fixed and , becomes smaller as grows. In this case, more time is allocated to the training.
The explanation of Corollary 4.2 is that each client may have trained an accurate local model but not an accurate global model ( is large), and thus BLADE-FL needs to perform more global aggregations, especially when is small. This paper considers a number of honest clients in BLADE-FL to defend against malicious mining [DBLP:conf/trustbus/AbramsonHPPB20]. When is sufficiently large, converges to its mean value according to the law of large numbers. In this context, Corollary 4.2 shows that approaches a constant as converges, and further implies that is independent of .
Given fixed and , increases as goes larger. Meanwhile, the upper bound in (6) drops as grows if .
From (23), we know that increases as the learning rate rises, which leads to a larger . Thus, from the proof of Corollary 4.2, descends as ascends. Then the derivative of the function with respect to is
where . It indicates that the loss function decreases as the learning rate increases if . However, this condition is not satisfied when is sufficiently large. In this case, is not an increasing function with respect to , resulting in a larger loss function. This completes the proof.
The reason behind Corollary 4.2 is that the global model may not converge when each client is allocated limited learning resources and a small learning rate. In addition, a higher learning rate may lead to faster convergence but a less accurate local model. To compensate for the inaccurate training, more computing power is allocated to the local training. In practice, the learning rate is decided by the learning algorithm, and the learning rates of different learning algorithms are diverse. Therefore, we can treat as a constant in BLADE-FL.
5 Performance Analysis with Lazy clients
Different from conventional FL, a new problem of learning deficiency caused by lazy clients emerges in the BLADE-FL system. This issue fundamentally originates from the lack of an effective detection and penalty mechanism in an unsupervised network such as blockchain, where a lazy client is able to plagiarize models from others to save its own computing power. A lazy client does not contribute to the global aggregation, and may even cause training deficiency and performance degradation. To study this issue, we first model the lazy client in Section 5.1. Then, we develop an upper bound on the loss function to evaluate the learning performance of BLADE-FL in the presence of lazy clients in Section 5.2. Next, we investigate the impact of the ratio of lazy clients and the power of artificial noises on the learning performance in Section 5.3. In this section, suppose that there exist lazy clients in BLADE-FL and . Let us define the lazy ratio as .
5.1 Model of Lazy Clients
A lazy client can simply plagiarize another model before mining a new block. To avoid being spotted by the system, each lazy client adds artificial noise to its model weights as
where denotes the set of lazy clients. As Fig. 2 illustrates, the -th client is identified as a lazy client if it plagiarizes an uploaded model from others and adds artificial noise to it in Step ①. Except for the plagiarism in Step ①, the lazy clients follow the honest clients in performing Steps ②-⑤. (In this paper, we assume that each client is honest in the mining, because of the mining reward.)
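The lazy update in (27) can be sketched as copying a plagiarized weight vector and perturbing it with zero-mean Gaussian noise of a chosen variance; the helper name, seed, and values below are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def lazy_update(plagiarized_weights, noise_var):
    """A lazy client skips training: it copies another client's broadcast
    weights and adds zero-mean Gaussian noise to disguise the plagiarism."""
    w = np.asarray(plagiarized_weights, dtype=float)
    return w + rng.normal(0.0, np.sqrt(noise_var), size=w.shape)

honest_w = np.array([0.2, -0.5, 1.0])
lazy_w = lazy_update(honest_w, noise_var=0.01)
```

The variance of the added noise is the knob analyzed in the bound of Section 5.2: too little noise makes the copy easy to spot, too much degrades the aggregate.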
5.2 Achievable Upper Bound with Lazy Clients
In this subsection, we develop an upper bound on the loss function with the lazy ratio and the power of artificial noise in the following theorem.
Using the model of lazy clients in (27), an upper bound on the loss function after integrated rounds with the lazy ratio of is given by
where denotes the aggregated weights of BLADE-FL with lazy clients after integrated rounds, and denotes the performance degradation caused by lazy clients after integrated rounds.
Define the model weights of lazy clients as
where is the model parameters that are plagiarized by lazy clients.
Since is -Lipschitz, the proof of Lemma 1 in [DBLP:journals/jsac/WangTSLMHC19] has shown that
Therefore, the upper bound can be expressed as [DBLP:journals/jsac/WangTSLMHC19]
From (32), we further have
Thereafter, we use the upper bound in (28) to evaluate the learning performance of BLADE-FL with lazy clients.
5.3 Discussion on Performance with Lazy Clients
Practically, a lazy node tends to add noise that is neither too large nor too small in order to conceal itself. To this end, the value of needs to be comparable to that of .
From (28), the plagiarism behavior contributes a term proportional to to the bound, while the artificial noise exhibits an impact term proportional to . This indicates that the plagiarism has a more significant effect on the learning performance compared with the noise perturbation.
Then, we reveal the impact of and on the optimal value of in the following corollary.
The optimal that minimizes in (28) decreases as either the lazy ratio or the noise variance grows.
From the definition of in (8), we let
As such, represents the loss function of BLADE-FL with lazy clients.
Since we have
we obtain that is still convex with respect to .
Furthermore, we let . Plugging this into , we have
Then we let
and express (39) as
We notice that
Thus is an increasing function with respect to . Let
Thus (41) can be rewritten as
where grows as either or increases, goes up as grows, and declines as goes up. Finally, that minimizes in (28) decreases as either or grows. This concludes the proof.
When the system is infested with a large number of lazy clients (i.e., the lazy ratio approaches 1), more computing power should be allocated to local training to compensate for the insufficient learning.
6 Experimental Results
In this section, we evaluate the analytical results with various learning parameters under a limited computing time . First, we evaluate the developed upper bound in (6), and then investigate the optimal number of integrated rounds under different values of the training time per iteration , the mining time per block , the number of clients , the learning rate , the lazy ratio , and the power of artificial noise .
6.1 Experimental Setting
1) Datasets: In our experiments, we use two datasets with a non-IID setting to demonstrate the loss function and accuracy for different values of .
MNIST. The standard MNIST handwritten digit recognition dataset consists of 60,000 training examples and 10,000 testing examples. Each example is a 28×28 grayscale image of a handwritten digit from 0 to 9. In Fig. 3, we illustrate several samples from the standard MNIST dataset.
Fashion-MNIST. Fashion-MNIST contains clothing images of 10 different types, such as T-shirt, trousers, pullover, dress, coat, sandal, shirt, sneaker, bag, and ankle boot, as illustrated in Fig. 4.
2) FL setting. Each client trains a Multi-Layer Perceptron (MLP) model. The MLP network has a single hidden layer containing 256 hidden units with rectified linear activations, and the output layer applies a softmax function over 10 classes (corresponding to the 10 digits in Fig. 3 and the 10 clothing types in Fig. 4).
3) Parameters setting. In our experiments, by default we set the total computing time , the number of samples per client , the number of clients , the mining time per block , the number of lazy clients , and the learning rate , where time is normalized by the training time per iteration .
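A common recipe for a non-IID split in such experiments is label-sharding: sort the examples by label, cut them into shards, and hand each client a couple of shards so that each client sees only a few classes. The helper below is a hypothetical sketch of this recipe (function name and shard counts are ours), not necessarily the exact partition used in the paper:

```python
import numpy as np

def noniid_partition(labels, num_clients, shards_per_client=2, seed=0):
    """Sort example indices by label, slice into equal shards, and assign
    each client `shards_per_client` shards at random -- a standard
    label-sharding recipe for a non-IID split."""
    rng = np.random.default_rng(seed)
    order = np.argsort(labels, kind="stable")          # indices sorted by label
    shards = np.array_split(order, num_clients * shards_per_client)
    shard_ids = rng.permutation(len(shards))
    parts = []
    for c in range(num_clients):
        ids = shard_ids[c * shards_per_client:(c + 1) * shards_per_client]
        parts.append(np.concatenate([shards[i] for i in ids]))
    return parts

labels = np.repeat(np.arange(10), 60)   # toy stand-in for MNIST-style labels
parts = noniid_partition(labels, num_clients=10)
```

With two shards per client, each client's local dataset covers at most two of the ten classes, which is what makes the setting non-IID.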
6.2 Experiments on Performance of BLADE-FL
Fig. 5 plots the gap between the developed upper bound in (6) and the experimental results. We set the learning rate and the lazy ratio in conditions (a) and (b), respectively. First, we can see that the developed bound is close to but always higher than the experimental one under both conditions. Second, both the developed upper bound and the experimental results are convex with respect to , which agrees with Theorem 4.1. Third, both the upper bound in (6) and the experimental results reach the minimum at the same optimal value of .
Fig. 6 plots the experimental results of the loss function and accuracy on MNIST and Fashion-MNIST for different values of , while Table II shows the optimal training time and corresponding accuracy. Here, we set . First, Fig. 6(a) shows that a larger leads to a larger loss function. This is due to the fact that both and in (3) drop as grows. Second, from Table II, the longer a training iteration takes, the more training time each client allocates. For example, using MNIST, the training time increases from to as rises from to . This observation is consistent with Corollary 4.2.
Fig. 7 plots the experimental results of the loss function and accuracy on MNIST and Fashion-MNIST for different values of , while Table III shows the optimal mining time and corresponding accuracy. First, Fig. 7(a) shows that a larger leads to a larger loss function, since both and in (3) drop as grows. Second, from Table III, reduces as rises, but the optimal mining time goes up as rises. For example, using MNIST, the mining time increases from to as grows from to . This observation agrees with Corollary 4.2.
Fig. 8 shows the experimental results of the loss function and accuracy on MNIST and Fashion-MNIST for different values of , while Table IV illustrates the optimal mining time and corresponding accuracy. We set . First, from Table IV, we notice that the optimal mining time drops as increases, which is consistent with Proposition 4.2. For example, using MNIST, drops from to as rises from to . Second, from Fig. 8(a), a larger leads to a lower loss function. This is because the involved datasets grow larger as increases, which yields a smaller loss function. Third, from both Fig. 8(a) and (b), approaches a fixed value when is sufficiently large (e.g., ). This observation is in line with Corollary 4.2.
Fig. 9 plots the experimental results of the loss function and accuracy on MNIST and Fashion-MNIST for different values of , while Table V illustrates the optimal mining time and corresponding accuracy. First, from Table V, we find that the optimal mining time rises as grows, which is in line with Corollary 4.2. For example, using MNIST, rises from to as grows from to . Second, from Fig. 9(a), the loss function drops as increases, except when . This is because grows significantly when , and our developed upper bound is no longer applicable. For example, when , the loss function increases as rises in our experiments for both MNIST and Fashion-MNIST.
6.3 Experiments on Performance with Lazy Clients
Fig. 10 plots the experimental results of the loss function and accuracy on MNIST and Fashion-MNIST for different values of the lazy ratio , while Table VI shows the optimal training time and corresponding accuracy. We set the power of artificial noise . First, from Table VI, it is observed that the optimal training time steps up as increases. For example, using MNIST, the time allocated to training rises from to as increases from to . This observation is consistent with Corollary 5.3. Second, from Fig. 10(a), the learning performance degrades as grows. This is because more lazy clients are involved in the system as grows, leading to lower training efficiency.
Fig. 11 plots the experimental results of the loss function and accuracy on MNIST and Fashion-MNIST for different values of , while Table VII shows the optimal training time and corresponding accuracy. We set . First, from Table VII, we notice that the optimal training time grows as increases, which agrees with Corollary 5.3. For example, using MNIST, grows from to as increases from to . Second, from Fig. 11(a), the learning performance of BLADE-FL (i.e., loss function and accuracy) degrades as the noise power grows.
In this paper, we have proposed the BLADE-FL framework, which integrates the training and mining processes at each client, to overcome the single point of failure of a centralized network while maintaining the privacy-promoting capabilities of the FL system. To evaluate the learning performance of BLADE-FL, we have developed an upper bound on the loss function. We have also verified that the upper bound is convex with respect to the total number of integrated rounds and have minimized the upper bound by optimizing . Moreover, we have investigated a problem unique to the proposed BLADE-FL system, called the lazy client problem, and have derived an upper bound on the loss function with lazy clients. We have provided experimental results that are consistent with the analytical results. In particular, the developed upper bound is close to the experimental results (e.g., the gap can be lower than ), and the optimal that minimizes the upper bound also reaches the minimum of the loss function in the experiments.