1 Introduction
With the development of the Internet of Things (IoT), the amount of data from end devices is exploding at an unprecedented rate. Conventional machine learning (ML) technologies encounter the problem of how to efficiently collect distributed data from various IoT devices for centralized processing [TheNextGrandChallenges]. To tackle this issue raised by the transmission bottleneck, distributed machine learning (DML) has emerged to process data at the network edge in a distributed manner [8805879]. DML can alleviate the burden on the central server by dividing a task into subtasks assigned to multiple nodes. However, DML needs to exchange samples when training a task [DBLP:conf/aistats/McMahanMRHA17], posing a serious risk of privacy leakage [9090973]. As such, federated learning (FL) [DBLP:journals/corr/KonecnyMRR16], proposed by Google as a novel DML paradigm, shows its potential advantages [8951246]. In an FL system, a machine learning model is trained across multiple distributed clients with local datasets and then aggregated on a centralized server. FL is able to cooperatively implement machine learning tasks without raw data transmissions, thereby promoting clients’ data privacy [9048613, DBLP:journals/corr/abs200702056, DBLP:journals/tifs/WeiLDMYFJQP20]. FL has been applied to various data-sensitive scenarios, such as smart healthcare, e-commerce [DBLP:journals/tist/YangLCT19], and the Google project Gboard [DBLP:journals/corr/abs191201218].
However, due to centralized aggregations of models, standard FL is vulnerable to server malfunctions and external attacks, incurring either inaccurate model updates or even training failures. In order to solve this single-point-failure issue, blockchain [nakamoto2008peer, DBLP:journals/fgcs/ReynaMCSD18, 8436042] has been applied to FL systems. Leveraging the advantages of blockchain techniques, the work in [DBLP:journals/corr/abs180803949] developed a blockchain-enabled FL architecture to validate the uploaded parameters and investigated system performance, such as block generation rate and learning latency. Later, the work in [DBLP:conf/cyberc/MartinezFH19] incorporated Delegated Proof of Stake (DPoS) into blockchain-enabled FL to enhance the delay performance at the expense of robustness. The recent work in [DBLP:journals/tii/LuHDMZ20a] developed a tamper-proof architecture that utilized blockchain to enhance system security when sharing parameters, and proposed a novel consensus mechanism, i.e., Proof of Quality (PoQ), to optimize the reward function. Since model aggregations are fulfilled by miners in a decentralized manner, blockchain-enabled FL can solve the single-point-failure problem. In addition, owing to a validation process of local training, FL can be extended to untrustworthy devices in a public network [8470083].
Although the above-mentioned works resorted to a blockchain architecture to avoid single-point-failure, they inevitably introduced a third party, i.e., miners rooted in the blockchain, to store the aggregated models distributively, causing potential information leakage. Also, these works did not analyze the convergence performance of model training, which is important for evaluating FL learning performance. In addition, the consumption of resources, e.g., computing capability, caused by mining in blockchain [8946151] is generally not taken into account in these works. However, resources consumed by mining are not negligible compared with those consumed by FL model training [DBLP:journals/fgcs/ReynaMCSD18]. Hence, blockchain-enabled FL needs to balance resource allocation between training and mining.
In this work, we propose a novel blockchain-assisted decentralized FL (BLADEFL) architecture. In BLADEFL, training and mining processes are incorporated and implemented at each client, i.e., a client conducts both model training and mining tasks with its own computing capability. We analyze an upper bound on the loss function to evaluate the learning performance of BLADEFL. Then we optimize the computing resource allocation between local training and mining on a client to approach optimal learning performance. We also pay special attention to a security issue that inherently exists in BLADEFL, known as the lazy client problem. In this problem, lazy clients try to save their computing resources by directly plagiarizing models from others, leading to training deficiency and performance degradation.
The main contributions of this paper can be summarized as follows.

We propose a novel blockchain-assisted FL framework, called BLADEFL, to overcome the issues raised by the centralized aggregations in conventional FL systems. In each round of BLADEFL, clients first train local models and broadcast them to others. Then they act as miners and compete to generate a block based on the received models. Afterwards, each client aggregates the models from the verified block to form an initial model for the local training in the next round. Compared with conventional blockchain-enabled FL, BLADEFL helps promote privacy against model leakage, and guarantees tamper-resistant model updates in a trusted blockchain network.

We analyze an upper bound on the loss function to evaluate the learning performance of BLADEFL. In particular, we minimize the upper bound by optimizing the computing resource allocation between training and mining, and further explore the relationship among the optimal number of integrated rounds, the training time per iteration, the mining time per block, the number of clients, and the learning rate.

We develop a lazy-client model for BLADEFL, where the lazy clients plagiarize others’ weights and add artificial noise. Moreover, we develop an upper bound on the loss function for this case, and investigate the impact of the number of lazy clients and the power of the artificial noise on the learning performance.

We provide experimental results, which are consistent with the analytical results. In particular, the developed upper bound on the loss function is tight with respect to the experimental results (e.g., the gap can be lower than ), and the optimized resource allocation approaches the minimum of the loss function.
Notation  Description

The set of training samples in the th client
The th client
The total number of clients
The total number of lazy clients
The variance of artificial noise added by lazy clients
The total number of integrated rounds
The number of iterations of local training
The global loss function
The local loss function of the th client
Local model weights of the th client at the th integrated round
Global model weights aggregated from local models at the th integrated round
Local model weights of the th lazy client at the th integrated round
Learning rate of the gradient descent algorithm
Training time per iteration
Mining time per block
Total computing time constraint of an FL task
The remainder of this paper is organized as follows. Section 2 first introduces the background of this paper. Then we propose BLADEFL in Section 3, and optimize the upper bound on the loss function in Section 4. Section 5 investigates the issue of lazy clients. The experimental results are presented in Section 6. Section 7 concludes this paper. In addition, Table I lists the main notation used in this paper.
2 Background
2.1 Federated Learning
In an FL system, there are clients with the th client possessing the dataset of size , . Each client trains its local model, e.g., a deep neural network, based on its local data and transmits the trained model to the server. Upon receiving the weights from all the clients, the server performs a global model aggregation. There are a number of communication rounds for exchanging models between the server and clients. Each round consists of an uploading phase where the clients upload their local models, and a downloading phase where the server aggregates the model and broadcasts it to the clients. Clients then update their local models based on the global one.
In the th communication round, the server performs a global aggregation according to some combining rule, e.g., , where and denote the local weights of the th client and the aggregated weights, respectively. The global loss function is defined as [DBLP:journals/corr/abs190709693], where is the local loss function of the th client. In FL, each client is trained locally to minimize the local loss function, while the entire system is trained to minimize the global loss function . The FL system finally outputs , where is the total number of communication rounds. Different from the training process in conventional DML systems [DBLP:conf/aistats/McMahanMRHA17], each client in FL shares only its local model, rather than its personal data, to update the global model [9247530], promoting the clients’ privacy.
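As a minimal sketch of the aggregation step described above, the following Python snippet assumes a dataset-size-weighted average as the combining rule and a size-weighted sum of local losses as the global loss. Both choices are common in FL but are assumptions here, and the function names `aggregate` and `global_loss` are hypothetical.

```python
def aggregate(local_weights, dataset_sizes):
    """Size-weighted average of the clients' weight vectors (assumed rule)."""
    total = sum(dataset_sizes)
    dim = len(local_weights[0])
    return [
        sum(n * w[j] for w, n in zip(local_weights, dataset_sizes)) / total
        for j in range(dim)
    ]


def global_loss(local_losses, dataset_sizes):
    """Size-weighted average of the local loss values (assumed definition)."""
    total = sum(dataset_sizes)
    return sum(n * loss for loss, n in zip(local_losses, dataset_sizes)) / total


# Two clients holding 1 and 3 samples, each with a 2-dimensional model.
w_global = aggregate([[1.0, 2.0], [3.0, 4.0]], [1, 3])
f_global = global_loss([1.0, 2.0], [1, 3])
```

Only model weights and loss values enter these functions; the raw samples never leave the clients, which is the privacy argument made in the text.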
2.2 Blockchain
Blockchain is a shared and decentralized ledger. In a blockchain system, each block stores a group of transactions. The blocks are linked together to form a chain by referencing the hash value of the previous block. Owing to the cryptographic chain structure, any data modification within a block changes its hash value and thus breaks the links of all subsequent blocks. Therefore, data that has been stored in the blockchain cannot be tampered with without detection.
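The chain structure can be illustrated with a small sketch. It assumes SHA-256 as the hash function and a simplified block layout (previous hash plus a list of transaction strings); the helper names are hypothetical and real blockchains use richer block headers.

```python
import hashlib

GENESIS = "0" * 64  # placeholder hash referenced by the first block


def block_hash(prev_hash, transactions):
    """Hash a block's transactions together with the previous block's hash."""
    data = prev_hash + "|" + "|".join(transactions)
    return hashlib.sha256(data.encode()).hexdigest()


def build_chain(batches):
    """Link one block per batch of transactions into a chain."""
    chain, prev = [], GENESIS
    for txs in batches:
        h = block_hash(prev, txs)
        chain.append({"prev_hash": prev, "transactions": list(txs), "hash": h})
        prev = h
    return chain


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    prev = GENESIS
    for block in chain:
        if block["prev_hash"] != prev:
            return False
        if block_hash(block["prev_hash"], block["transactions"]) != block["hash"]:
            return False
        prev = block["hash"]
    return True


chain = build_chain([["a->b:1"], ["b->c:2"], ["c->a:3"]])
```

Changing any stored transaction makes `is_valid` fail, because the recomputed hash no longer matches the hash that the following block refers to.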
In addition, thanks to the consensus mechanism, each transaction included in the newly generated block is also immutable. The consensus mechanism validates the data within the blocks and ensures that all the nodes participating in the blockchain store the same data. The most prevalent consensus mechanism is Proof of Work (PoW), used in the Bitcoin system [nakamoto2008peer]. In Bitcoin, the process of block generation is as follows. First, a node broadcasts a transaction with its signature to the blockchain network by the gossip protocol [DBLP:journals/sigops/DemersGHILSSST88]. Then the nodes in the blockchain verify the transaction by the signature. Afterward, each node collects the verified transactions and competes to generate a new block that includes these transactions, by finding a one-time number (called a nonce) that makes the hash value of the block data meet a specific target value. The node that finds a proper nonce is eligible to generate a new block and broadcasts the block to the entire network. Finally, the nodes validate the new block and append the verified block to the existing blockchain [DBLP:journals/cem/PuthalMMKD18]. Notably, the work in PoW is a mathematical problem whose solution is easy to verify but extremely hard to find. The nodes in the blockchain consume massive computing resources to solve this problem. This process is called mining, and those who take part in it are known as miners. Because of the mining process, PoW can defend against attacks on the condition that the total computing power of malicious devices is less than that of honest devices (i.e., as long as no 51% attack is mounted) [DBLP:journals/cacm/EyalS18].
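The asymmetry between solving and verifying the PoW puzzle can be sketched as follows. This is a simplified illustration, not Bitcoin's actual rule: it assumes a single SHA-256 hash and expresses the target as a required number of leading zero hex digits, whereas Bitcoin uses double SHA-256 and a numeric target. The names `meets_target` and `mine` are hypothetical.

```python
import hashlib


def meets_target(block_data, nonce, difficulty):
    """Verification: a single hash evaluation checks a candidate nonce."""
    digest = hashlib.sha256(f"{block_data}|{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


def mine(block_data, difficulty):
    """Mining: brute-force search for the smallest nonce meeting the target."""
    nonce = 0
    while not meets_target(block_data, nonce, difficulty):
        nonce += 1
    return nonce


# Each extra hex digit of difficulty multiplies the expected work by 16.
nonce = mine("block-1", 3)
```

Finding `nonce` takes thousands of hash evaluations on average at this difficulty, while any other node can confirm it with one call to `meets_target`.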
In this context, blockchain is safe and reliable with the aid of the chain structure and consensus mechanism. Driven by these merits, we deploy the blockchain to replace the central server, and build up a decentralized FL network with privacy protection.
3 Proposed Framework
In this section, we detail the proposed BLADEFL framework in Section 3.1 and the computing resource allocation model in Section 3.2.
3.1 BLADEFL
The BLADEFL network consists of clients each with equal computing power (in this paper, computing resource and computing power are used interchangeably, and both are measured in CPU cycles per second). In this network, each client acts not only as a trainer but also as a miner, and the role transition is designed as follows. First, each client (as a trainer) trains the local model, and then broadcasts the local model to the entire network as a requested transaction of the blockchain. Second, the client (as a miner) mines the block that includes all the local models that are ready to be aggregated. Once the newly generated block is validated by the majority of clients, the verified models in the block are immutable. Without the intervention of any centralized server, each client performs the global aggregation to update its local model by using all the shared models in the validated block. Suppose that the uploading and downloading phases cannot be tampered with by external attackers.
Let us consider that all the clients deploy the same time allocation strategy for local training and mining. In other words, all the clients start the training at the same time, and then turn to the mining simultaneously. In this context, for each global model update and block generation, we define an integrated round for BLADEFL that combines a communication round of FL and a mining round of blockchain. As illustrated in Fig. 1, the th integrated round can be specified as the following steps (at the very beginning of the first integrated round, each client initializes its local parameters, such as the initial weights and the learning rate).
Step ①: Local Training. Each client performs the local training by iterating the learning algorithm times to update its own model .
Step ②: Model Broadcasting and Verification. Each client signs its model with its digital signature and propagates the model as a requested transaction. Other clients verify the transaction of the requesting client (i.e., the identity of the client).
Step ③: Mining. Upon receiving the models from others, all the clients compete to mine the th block.
Step ④: Block Validation. All the clients append the new block onto their local ledgers only if the block is validated.
Step ⑤: Local Updating. Upon receipt of the verified transactions in this block, each client updates its local model. Then the system proceeds to the th round.
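The five steps above can be sketched as a toy simulation. Everything concrete here is an assumption made for illustration: each client holds a scalar model and a quadratic local loss, signature checks are omitted, mining uses a low-difficulty SHA-256 puzzle, the aggregation is a plain average, and all function names are hypothetical.

```python
import hashlib
import json


def local_train(w, target, lr, tau):
    """Run tau gradient-descent steps on the toy loss 0.5 * (w - target)**2."""
    for _ in range(tau):
        w -= lr * (w - target)
    return w


def mine(payload, miner_id, prefix="00"):
    """Search a nonce so that SHA-256(payload, miner, nonce) starts with prefix."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{payload}|{miner_id}|{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


def integrated_round(w_global, targets, prev_hash, lr=0.1, tau=5):
    # Local training: every client starts from the current global model.
    local_models = [local_train(w_global, t, lr, tau) for t in targets]
    # Broadcasting: all models become the block payload (signatures omitted).
    payload = json.dumps({"prev": prev_hash, "models": local_models})
    # Mining: with equal computing power, the client that needs the fewest
    # hash attempts is taken as the winner of the competition.
    results = [mine(payload, i) for i in range(len(targets))]
    winner = min(range(len(targets)), key=lambda i: results[i][0])
    nonce, block_hash = results[winner]
    # Block validation: every client recomputes the winner's hash.
    check = hashlib.sha256(f"{payload}|{winner}|{nonce}".encode()).hexdigest()
    assert check == block_hash and check.startswith("00")
    # Local updating: each client averages the models stored in the block.
    return sum(local_models) / len(local_models), block_hash


def run(targets, rounds=20):
    w, prev_hash = 0.0, "0" * 64
    for _ in range(rounds):
        w, prev_hash = integrated_round(w, targets, prev_hash)
    return w
```

Calling `run([1.0, 2.0, 3.0, 6.0])` drives the shared model toward the average of the clients' targets without any central server; every client can compute the same aggregate from the validated block.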
In contrast to [DBLP:journals/tii/LuHDMZ20a], BLADEFL does not rely on an additional third party for global aggregation, thereby promoting privacy against model leakage. From the above steps, the consensus mechanism builds a bridge between the local models from clients and the model aggregation. Thanks to PoW, BLADEFL guarantees tamper-resistant model updates in a trusted blockchain network.
3.2 Computing Resource Allocation Model
In this subsection, we model the time required for training and mining, to show the relationship between FL and blockchain in BLADEFL.
Block Generation Rate: The block generation rate is determined by the computation complexity of the hash function and the total computing power of the blockchain network (i.e., total CPU cycles). The average CPU cycles required to generate a block in PoW is defined as , where is the mining difficulty (following PoW, the mining difficulty is adjusted at different intervals but remains unchanged over each interval; thus, we consider the average number of CPU cycles to be invariant over a period with a fixed mining difficulty), and denotes the average number of total CPU cycles to generate a block [DBLP:journals/tpds/XuWLGLYG19]. Thus, we define the average generation time of a block as
(1) 
where denotes the CPU cycles per second of each client. Given a fixed , is a constant.
Local Training Rate: Recall that the local training of each client contains iterations. The training time consumed by each training iteration at the th client is given by [9242286]
(2) 
where denotes the number of samples in the th client, and denotes the number of CPU cycles required to train one sample. This paper considers that each client is equipped with the same hardware resources (e.g., CPU, battery, and cache memory). Therefore, each client is loaded with the same number of local samples, and has the same and . However, the contents of samples owned by different clients are diverse. For simplicity, we assume that each client uses the same training algorithm and trains the same number of iterations for its local model update. Consequently, each client has an identical local training time per iteration. In this context, we let , as a constant.
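The two timing quantities can be sketched numerically. The formulas below are one reading of the prose around (1) and (2), not a transcription of the equations: the block time is taken as the average CPU cycles per block divided by the network's total computing power, and the training time per iteration as samples times cycles per sample divided by the client's CPU speed. All parameter names and example values are hypothetical.

```python
def block_generation_time(cycles_per_block, num_clients, cpu_hz):
    """Average seconds to mine a block when all clients mine together."""
    return cycles_per_block / (num_clients * cpu_hz)


def training_time_per_iteration(num_samples, cycles_per_sample, cpu_hz):
    """Seconds for one local training iteration over the local samples."""
    return num_samples * cycles_per_sample / cpu_hz


# Illustrative values only: 20 clients at 1 GHz, 2e10 cycles per block,
# 512 local samples at 1e6 cycles per sample.
beta = block_generation_time(2e10, 20, 1e9)
alpha = training_time_per_iteration(512, 1e6, 1e9)
```

Under the equal-hardware assumption stated in the text, both values are identical for every client, which is why they can be treated as constants in the analysis.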
Consider that a typical FL learning task is required to be accomplished within a fixed duration of . Given the same hardware configuration, each client has the total number of CPU cycles . From (1) and (2), the number of iterations for local training in each integrated round is given by
(3) 
where denotes the floor function, and is a positive integer that represents the total number of integrated rounds. Furthermore, denotes the total training time, while is the total mining time. Under the constraint of computing time , we notice that the longer the mining takes, the less time remains for training. This is because (3) implies a fundamental tradeoff in BLADEFL, i.e., the more iterations each client trains locally, the fewer integrated rounds the BLADEFL network performs. Moreover, due to the floor operation in (3), there may exist some computing time left, i.e., . We stress that the extra time is not sufficient to perform another integrated round, and thereby the global model cannot update during this period. In this context, we ignore this computing time and assume in the following analysis.
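The tradeoff can be made concrete with a short sketch. It assumes the budget constraint described in the prose, namely that the number of rounds times (training time per round plus mining time per block) must not exceed the total computing time, and takes the largest integer number of local iterations that fits. The argument names `t_sum`, `num_rounds`, `alpha`, and `beta` are hypothetical labels for the total time, the number of integrated rounds, the training time per iteration, and the mining time per block.

```python
import math


def iterations_per_round(t_sum, num_rounds, alpha, beta):
    """Largest number of local iterations per round that fits the budget."""
    # Floating-point inputs may need a small tolerance before flooring.
    return math.floor((t_sum / num_rounds - beta) / alpha)


def leftover_time(t_sum, num_rounds, alpha, beta):
    """Computing time left unused because of the floor operation."""
    tau = iterations_per_round(t_sum, num_rounds, alpha, beta)
    return t_sum - num_rounds * (alpha * tau + beta)


# Time normalized by the training time per iteration (alpha = 1).
budget = {k: iterations_per_round(100, k, 1, 6) for k in (2, 5, 7, 10)}
```

With these illustrative numbers, raising the number of rounds from 2 to 10 cuts the local iterations per round from 44 to 4, and at 7 rounds two time units are left unused.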
In what follows, we optimize the learning performance of BLADEFL based on (3).
4 Performance Analysis of the BLADEFL System
In this section, we evaluate the learning performance of BLADEFL with the upper bound on the loss function in Section 4.1, and optimize the learning performance with respect to the number of integrated rounds in Section 4.2.
4.1 Achievable Upper Bound Analysis
Existing works such as [3][11] evaluated the learning performance of the standard FL based on the loss function, where a smaller value of the loss function corresponds to a learning model with higher accuracy. Recently, the work in [DBLP:journals/jsac/WangTSLMHC19] derived an upper bound on the loss function between the iterations of local training and global aggregation.
Compared with the standard FL, our BLADEFL replaces the centralized server with a blockchain network for global aggregation. Notably, the training process and the aggregation rule are the same as the centralized FL. Thus, the derived upper bound on the loss function in [DBLP:journals/jsac/WangTSLMHC19] can be applied to BLADEFL.
We make the following assumption for all the clients.
Assumption
For any two different and , we assume that
is convex, i.e., ;
is Lipschitz, i.e., ;
is L-smooth, i.e., .
According to Assumption 4.1, is convex, Lipschitz, and L-smooth [DBLP:conf/pkdd/KarimiNS16].
The work in [DBLP:journals/jsac/WangTSLMHC19] also introduced the following measure to capture the divergence between the gradient of the local loss function and that of the global loss function.
Definition ((Gradient Divergence) [DBLP:journals/jsac/WangTSLMHC19])
For each client, we define as an upper bound on , i.e., . Thus, the global gradient divergence can be expressed as .
This divergence is related to the distribution of local datasets over different clients. From [DBLP:journals/jsac/WangTSLMHC19], the following lemma presents an upper bound on the loss function in the standard FL.
Lemma ([DBLP:journals/jsac/WangTSLMHC19])
An upper bound on the loss function is given by
(4) 
where
(5) 
denotes the initial weight, denotes the optimal global weight, and denotes the learning rate with , respectively.
From (3), we have . Substituting into (4) yields
(6) 
where
(7) 
and denotes the upper bound on the loss function in BLADEFL.
The upper bound in (6) shows that the learning performance depends on the total number of integrated rounds , the local training time per iteration , the average mining time per block , the learning rate , the data distribution , and the total computing time . From Definition 4.1, is fixed given and the datasets of each client, and is preset. Recall that and are both constant in (1) and (2). Given any fixed , , , and , in (6) is a univariate function of . In the following theorem, we verify that is a convex function with respect to .
Theorem
is convex with respect to .
Proof:
From (6), we define
(8) 
where and . Since is a univariate function, we can optimize to maximize . Notice that are independent of , and is a function with respect to . Therefore, we compute the first derivative and second derivative of with respect to , respectively, as
(9) 
Then, we have
(10) 
and
(11) 
We substitute (9) into (11), and obtain
(12) 
Thus, is convex. Since we have , we prove that is convex [DBLP:journals/pieee/LiFL20].
Remark
In practice, should not be too small, since a tiny will make the system vulnerable to external attacks [9119406].
4.2 Optimal Computing Resource Allocation
First, the following theorem shows the optimal solution that minimizes .
Theorem
Given any fixed , , (or and ), the optimal number of integrated rounds that minimizes the upper bound on the loss function in (6) is given by
(13) 
when .
Proof:
Let and . We first have
(14) 
where
(15) 
Using (14), we obtain
(16) 
Then, we approximate as a quadratic term with Taylor expansion:
(17) 
Thus, can be written as
(18) 
To solve the convex problem, we let , i.e.,
(19) 
Finally, we have
(20) 
This completes the proof.
Then, under a fixed constraint , let us focus on the effect of and on under fixed and by the following corollary. (The analytical results in Corollaries 1, 2, 3, 4, and 5 are with respect to . Due to the fundamental tradeoff between and , the opposite results with respect to also hold.)
Corollary
Given and , the optimal value decreases as either or goes up. In this case, more time is allocated to training when gets larger or to mining when gets larger.
Proof:
This corollary is a straightforward result from Theorem 4.2.
Recall that denotes the training time per iteration, and denotes the mining time per block. From Corollary 4.2, the longer a local training iteration takes, the more computing power is allocated to the local training at each client. Similarly, each client allocates more computing power to the mining when the mining time is larger.
Next, we investigate the impact of and on when and are fixed by the following corollaries (i.e., Corollary 4.2 and Corollary 4.2).
Corollary
Given fixed and , becomes larger as grows. In this case, more time is allocated to the mining.
Proof:
Without approximation of (17), we first let , i.e.,
(21) 
For simplicity, we let
(22) 
where
(23) 
Then, the first derivative of is given by
(24) 
Notice that in (22) is a decreasing function with respect to , is an increasing function with respect to , and
(25) 
is a decreasing function with respect to , respectively. Thus, the solution of (22) drops as grows. Finally, we conclude that increases as rises.
Corollary
Given fixed and , becomes smaller as grows. In this case, more time is allocated to the training.
Proof:
The explanation of Corollary 4.2 is that each client may have trained an accurate local model but not an accurate global model ( is large), and thus BLADEFL needs to perform more global aggregations, especially when is small. This paper considers a number of honest clients in BLADEFL to defend against malicious mining [DBLP:conf/trustbus/AbramsonHPPB20]. When is sufficiently large, converges to its mean value according to the law of large numbers. In this context, Corollary 4.2 shows that approaches a constant as converges, and further implies that is independent of .
Corollary
Given fixed and , increases as grows. Meanwhile, the upper bound in (6) drops as grows if .
Proof:
From (23), we know that increases as the learning rate rises, which leads to larger . Thus, from the proof of Corollary 4.2, descends as ascends. Then the derivative of the function with respect to is
(26) 
where . It indicates that the loss function decreases as rate increases if . However, the condition is not satisfied when is sufficiently large. In this case, is not an increasing function with respect to , resulting in larger loss function. This completes the proof.
The reason behind Corollary 4.2 is that the global model may not converge when each client is allocated limited learning resources and a small learning rate. In addition, a higher learning rate may lead to faster convergence but a less accurate local model. To compensate for the inaccurate training, more computing power is allocated to the local training. In practice, the learning rate is decided by the learning algorithm, and the learning rates of different learning algorithms are diverse. Therefore, we can treat as a constant in BLADEFL.
5 Performance Analysis with Lazy Clients
Different from conventional FL, a new problem of learning deficiency caused by lazy clients emerges in the BLADEFL system. This issue fundamentally originates from the lack of an effective detection and penalty mechanism in an unsupervised network such as blockchain, where a lazy client is able to plagiarize models from others to save its own computing power. The lazy client does not contribute to the global aggregation, and even causes training deficiency and performance degradation. To study this issue, we first model the lazy client in Section 5.1. Then, we develop an upper bound on the loss function to evaluate the learning performance of BLADEFL in the presence of lazy clients in Section 5.2. Next, we investigate the impact of the ratio of lazy clients and the power of artificial noise on the learning performance in Section 5.3. In this section, suppose that there exist lazy clients in BLADEFL and . Let us define the lazy ratio as .
5.1 Model of Lazy Clients
A lazy client can simply plagiarize other models before mining a new block. To avoid being spotted by the system, each lazy client adds artificial noises to its model weights as
(27) 
where denotes the set of lazy clients, and is the artificial noise vector following a Gaussian distribution with mean zero and variance . As Fig. 2 illustrates, the th client is identified as a lazy client if it plagiarizes an uploaded model from others and adds artificial noise onto it in Step ①. Except for the plagiarism in Step ①, the lazy clients follow the honest clients (in this paper, we assume that each client is honest in the mining, because of the mining reward) to perform Steps ②-⑤.
5.2 Achievable Upper Bound with Lazy Clients
In this subsection, we develop an upper bound on the loss function with the lazy ratio and the power of artificial noise in the following theorem.
Theorem
Using the model of lazy clients in (27), an upper bound on the loss function after integrated rounds with the lazy ratio of is given by
(28) 
where denotes the aggregated weights of BLADEFL with lazy clients after integrated rounds, and denotes the performance degradation caused by lazy clients after integrated rounds.
Proof:
Define the model weights of lazy clients as
(29) 
where is the model parameters that are plagiarized by lazy clients.
Since is Lipschitz, the proof of Lemma 1 in [DBLP:journals/jsac/WangTSLMHC19] has shown that
(30) 
Therefore, the upper bound can be expressed as [DBLP:journals/jsac/WangTSLMHC19]
(31) 
In addition, plugging (6) into (31), we have
(32) 
From (32), we further have
(33) 
If each lazy client adds the Gaussian noise with the same variance to its plagiarized model, follows a chi-square distribution with degrees of freedom (i.e., ). Given the mean value , we have
(34) 
The upper bound in (31) can be written as
(35) 
This completes the proof.
Thereafter, we use the upper bound in (28) to evaluate the learning performance of BLADEFL with lazy clients.
5.3 Discussion on Performance with Lazy Clients
Practically, a lazy node tends not to add either huge or tiny noise in order to conceal itself. To this end, it is required that the value of is comparable to that of .
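To make the lazy-client model of (27) concrete, the following sketch copies another client's weights and perturbs each entry with zero-mean Gaussian noise. The function names, the weight values, and the noise level are hypothetical. The sketch also checks the standard fact used in the proof above: the expected squared norm of the noise equals the number of weights times the noise variance.

```python
import random


def plagiarize(victim_weights, sigma, rng):
    """Copy another client's weights and add N(0, sigma^2) noise per entry."""
    return [w + rng.gauss(0.0, sigma) for w in victim_weights]


def squared_distance(a, b):
    """Squared Euclidean distance between two weight vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


rng = random.Random(0)          # fixed seed for reproducibility
sigma = 0.1                     # illustrative noise standard deviation
honest = [0.5] * 10_000         # stand-in for an honestly trained model
lazy = plagiarize(honest, sigma, rng)

# Ratio of the observed squared noise norm to its expectation d * sigma^2.
ratio = squared_distance(lazy, honest) / (len(honest) * sigma ** 2)
```

The ratio stays close to one, so the noise variance directly controls how far the lazy model drifts from the plagiarized one, which is why very large or very small noise would make a lazy client easier to spot.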
Remark
From (28), the plagiarism behavior contributes a term proportional to to the bound, while the artificial noise exhibits an impact term proportional to . This indicates that the plagiarism has a more significant effect on the learning performance compared with the noise perturbation.
Then, we reveal the impact of and on the optimal value of in the following corollary.
Corollary
The optimal that minimizes in (28) decreases as either the lazy ratio or the noise variance grows.
Proof:
From the definition of in (8), we let
(36) 
As such, represents the loss function of BLADEFL with lazy clients.
Since we have
(37) 
and
(38) 
we obtain that is still convex with respect to .
Furthermore, we let . Plugging this into , we have
(39) 
Then we let
(40) 
and express (39) as
(41) 
We notice that
(42) 
Thus is an increasing function with respect to . Let
(43) 
Thus (41) can be rewritten as
(44) 
where grows as either or increases, goes up as grows, and declines as goes up. Finally, that minimizes in (28) decreases as either or grows. This concludes the proof.
When the system is infested with a large number of lazy clients (i.e., the lazy ratio approaches 1), more computing power should be allocated to local training to compensate for the insufficient learning.
6 Experimental Results
In this section, we evaluate the analytical results with various learning parameters under limited computing time . First, we evaluate the developed upper bound in (6), and then investigate the optimal value of overall integrated rounds under training time per iteration , mining time per block , number of clients , learning rate , the ratio , and power of artificial noise .
6.1 Experimental Setting
1) Datasets: In our experiments, we use two datasets under a non-IID setting to demonstrate the loss function and accuracy versus different values of .
MNIST. The standard MNIST handwritten digit recognition dataset consists of 60,000 training examples and 10,000 testing examples [726791]. Each example is a 28×28 grayscale image of a handwritten digit from 0 to 9. In Fig. 3, we illustrate several samples from the standard MNIST dataset.
Fashion-MNIST. The Fashion-MNIST dataset of clothing images has 10 different types, namely T-shirt, trousers, pullover, dress, coat, sandal, shirt, sneaker, bag, and ankle boot, as illustrated in Fig. 4.

Maximal accuracy
MNIST  Fashion-MNIST  MNIST  Fashion-MNIST
40  46  87.44%  59.57%
58  70  82.16%  57.18%
64  82  66.47%  50.11%
2) FL setting. Each client trains a multilayer perceptron (MLP) model. The MLP network has a single hidden layer that contains 256 hidden units. The network uses rectified linear units and a softmax function over 10 classes (corresponding to the 10 digits in Fig. 3 and 10 clothes in Fig. 4).
3) Parameters setting. In our experiments, we set the total computing time , the samples of each client , the number of clients , the mining time per block , the number of lazy clients , and the learning rate as default, where the time is normalized by the training time per iteration .
6.2 Experiments on Performance of BLADEFL

Maximal accuracy
MNIST  Fashion-MNIST  MNIST  Fashion-MNIST
60  30  87.47%  61.51%
64  40  85.68%  60.34%
72  48  79.32%  55.68%
Fig. 5 plots the gap between the developed upper bound in (6) and the experimental results. We set learning rate and lazy ratio in conditions (a) and (b), respectively. First, we can see that the developed bound is close to but always higher than the experimental one under both conditions. Second, both the developed upper bound and the experimental results are convex with respect to , which agrees with Theorem 4.1. Third, both the upper bound in (6) and the experimental results reach the minimum at the same optimal value of .

Maximal accuracy
MNIST  Fashion-MNIST  MNIST  Fashion-MNIST
N=10  70  42  74.52%  52.66%
N=15  60  36  75.74%  55.83%
N=20  50  30  82.89%  62.91%
N=25  50  30  83.03%  62.64%
Fig. 6 plots the experimental results of the loss function and accuracy on MNIST and FashionMNIST for different values of values of , while Table II shows the optimal training time and corresponding accuracy. Here, we set . First, Fig. 6(a) shows that larger leads to larger loss function. This is due to the fact that both and from (3) drops as grows. Second, from Table II, the longer a training iteration consumes, the more training time each client takes. For example, using MNIST, the training time increases from to as rises from to . This observation is consistent with Corollary 4.2.
Fig. 7 plots the experimental results of the loss function and accuracy on MNIST and Fashion-MNIST for different values of , while Table III shows the optimal mining time and corresponding accuracy. First, Fig. 7(a) shows that a larger leads to a larger loss, since both and from (3) drop as grows. Second, from Table III, decreases as rises, but the optimal mining time goes up as rises. For example, using MNIST, the mining time increases from to as grows from to . This observation agrees with Corollary 4.2.

                            Maximal accuracy
MNIST    Fashion-MNIST      MNIST     Fashion-MNIST
54       30                 74.70%    58.57%
60       54                 88.17%    72.50%
72       42                 85.51%    70.14%
Fig. 8 shows the experimental results of the loss function and accuracy on MNIST and Fashion-MNIST for different values of , while Table IV lists the optimal mining time and corresponding accuracy. We set . First, from Table IV, we notice that the optimal mining time drops as increases, which is consistent with Proposition 4.2. For example, using MNIST, drops from to as rises from to . Second, from Fig. 8(a), a larger leads to a lower loss. This is because the datasets involved become larger as grows, which reduces the loss. Third, from both Fig. 8(a) and (b), approaches a fixed value when is sufficiently large (e.g., ). This observation is in line with Corollary 4.2.

                            Maximal accuracy
MNIST    Fashion-MNIST      MNIST     Fashion-MNIST
30       50                 85.53%    54.86%
40       50                 85.33%    54.76%
50       80                 78.11%    48.92%
50       80                 78.80%    46.25%
Fig. 9 plots the experimental results of the loss function and accuracy on MNIST and Fashion-MNIST for different values of , while Table V lists the optimal mining time and corresponding accuracy. First, from Table V, we find that the optimal mining time rises as grows, which is in line with Corollary 4.2. For example, using MNIST, rises from to as grows from to . Second, from Fig. 9(a), the loss function drops as increases except . This is because grows significantly when , and our developed upper bound is no longer suitable. For example, when , the loss function increases as rises in our experiments for both MNIST and Fashion-MNIST.
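The trade-off behind the optimal training and mining times reported in this subsection can be illustrated with a small budget calculation. The sketch below rests on an assumption that is not stated in this section: each integrated round consists of a number of local training iterations followed by the mining of one block, so a fixed total computing time spent over more rounds leaves fewer training iterations per round. The function name, parameter names, and numeric values are hypothetical.

```python
import math

def allocate(total_time, num_rounds, time_per_iteration, mining_time_per_block):
    """Split a fixed time budget between training and mining (assumed model)."""
    per_round = total_time / num_rounds
    if per_round < mining_time_per_block:
        # Not even one block can be mined within each round.
        return None
    # Time left in a round after mining is spent on whole training iterations.
    iterations = math.floor(
        (per_round - mining_time_per_block) / time_per_iteration)
    return {
        "iterations_per_round": iterations,
        "training_time": num_rounds * iterations * time_per_iteration,
        "mining_time": num_rounds * mining_time_per_block,
    }

if __name__ == "__main__":
    # Hypothetical budget: 100 time units, 1 unit per iteration, 10 per block.
    for num_rounds in (1, 2, 5, 10, 20):
        print(num_rounds, allocate(100.0, num_rounds, 1.0, 10.0))
```

Under this assumed model, few rounds favor local training and many rounds favor aggregation at the cost of mining overhead, which is one way to read why an interior optimum appears in the experiments.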
6.3 Experiments on Performance with Lazy Clients
Fig. 10 plots the experimental results of the loss function and accuracy on MNIST and Fashion-MNIST for different values of the lazy ratio , while Table VI shows the optimal training time and corresponding accuracy. We set the power of artificial noise . First, from Table VI, we observe that the optimal training time increases as increases. For example, using MNIST, the time allocated to training rises from to as increases from to . This observation is consistent with Corollary 5.3. Second, from Fig. 10(a), the learning performance degrades as grows. This is because more lazy clients are involved in the system as grows, leading to lower training efficiency.

                            Maximal accuracy
MNIST    Fashion-MNIST      MNIST     Fashion-MNIST
30       50                 78.35%    57.44%
50       50                 77.22%    53.19%
50       50                 59.96%    52.06%
50       60                 50.94%    44.08%
Fig. 11 plots the experimental results of the loss function and accuracy on MNIST and Fashion-MNIST for different values of , while Table VII shows the optimal training time and corresponding accuracy. We set . First, from Table VII, we notice that the optimal training time grows as increases, which agrees with Corollary 5.3. For example, using MNIST, grows from to as increases from to . Second, from Fig. 11(a), the learning performance of BLADE-FL (i.e., loss function and accuracy) degrades as the noise power increases.
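The effect of lazy clients and artificial noise discussed in this subsection can be sketched with a toy aggregation. The sketch assumes, as an illustration only, that a lazy client skips local training, copies another client's model, and perturbs it with zero-mean Gaussian noise whose variance equals the noise power. This mechanism, the function names, and the weight values are assumptions introduced here, not details taken from this section.

```python
import math
import random

def lazy_update(copied_weights, noise_power, rng):
    # Assumed behavior: reuse another client's weights and add
    # zero-mean Gaussian noise with variance equal to noise_power.
    std = math.sqrt(noise_power)
    return [w + rng.gauss(0.0, std) for w in copied_weights]

def average(models):
    # Element-wise mean of the submitted weight vectors.
    return [sum(column) / len(models) for column in zip(*models)]

# Hypothetical toy weights, NOT taken from the paper's experiments.
honest_models = [[0.9, -0.2, 0.4], [1.1, -0.1, 0.5], [1.0, -0.3, 0.6]]

if __name__ == "__main__":
    rng = random.Random(0)
    lazy_models = [lazy_update(honest_models[0], noise_power=0.01, rng=rng)]
    print("honest only:", average(honest_models))
    print("with lazy  :", average(honest_models + lazy_models))
```

In this toy setup the lazy submission pulls the aggregate toward the copied model without contributing new information, and the perturbation it adds grows with the noise power.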
7 Conclusions
In this paper, we have proposed the BLADE-FL framework, which integrates the training and mining processes in each client to overcome the single-point-failure problem of a centralized network while maintaining the privacy-promoting capabilities of the FL system. To evaluate the learning performance of BLADE-FL, we have developed an upper bound on the loss function. We have also verified that the upper bound is convex with respect to the total number of integrated rounds and have minimized the upper bound by optimizing this number. Moreover, we have investigated a problem unique to the proposed BLADE-FL system, called the lazy client problem, and have derived an upper bound on the loss function with lazy clients. The experimental results are consistent with the analytical results. In particular, the developed upper bound is close to the experimental results (e.g., the gap can be lower than ), and the optimal number of integrated rounds that minimizes the upper bound also minimizes the loss function in the experiments.