## I Introduction

Google introduced federated learning (FL) as a cost-effective approach to distributed machine learning that does not rely on a centralized data center [2017-FL-First]. In general, FL enables devices to collaboratively learn a shared model without sending their data to a centralized server. As a result, considerable research has been devoted to implementing FL in practical settings, particularly in massive IoT networks [2020-Survey-FL-MEC, 2021-FL-Survey-IoT]. The work in [FL-2018-ConvergenceTime] proposed a joint distance-based user selection and resource block allocation algorithm to minimize the FL convergence time. In [FL-2019-SchedulingPolicies], the authors proposed several user scheduling policies that improve the FL convergence rate by reducing the number of communication rounds. In [FL-2019-Optimization-AnalysisModel], the authors addressed the trade-off between computation and communication efficiency under various wireless communication factors (e.g., devices' power constraints and local data sizes), which allowed them to optimize the total energy consumption of the FL training process under different IoT device settings. The work in [2021-Comm-EnergyFL-Optimization] derived a lower bound on the number of FL training iterations and then formulated a joint computation and transmission optimization problem to minimize the total energy consumed by local computation and network communication. The authors in [2021-FL-LearningRateOptimization-OTA] used a dynamic learning-rate method to reduce the aggregation error of FL in wireless networks. Nevertheless, these works focused on vanilla FedAvg [2017-FL-First], into which no communication-efficient method, such as model quantization [2020-FL-TernaryFL] or model sparsification [2020-FL-FedPAQ], is integrated. Therefore, the impact of lossy compression on FL remains under-explored.

The main contribution of this work is a novel framework that supports the FL process under high model compression by investigating the relationship between the FL convergence rate and the variance of the model weights.

To the best of our knowledge, this is the first work that analyzes the FL convergence rate under the distortion introduced by model compression. To this end, we propose a compression-aided FL model for IoT networks, in which a server located at the base station (BS) estimates the long-term communication load that the entire network needs for FL to converge. We then formulate a joint resource-allocation and user-selection problem that minimizes the total communication time of FL. In this problem, we first determine the minimum number of participating IoT devices per round that ensures FL convergence. We then employ a coalition game to find an efficient bandwidth assignment for the resulting optimization problem.

The rest of this article is organized as follows. In Section II, we introduce the system model, the fundamentals of FL, and the problem formulation; in particular, Section II-B investigates the relationship between the FL convergence rate and other features of compression-aided FL. The proposed algorithm is presented in Section III. Simulation results are provided in Section IV. Finally, Section V concludes the paper.

## II System Model and Problem Formulation

We consider a wireless network consisting of a single base station (BS) and a set $\mathcal{N}$ of $N$ IoT devices. For FL, the BS selects IoT devices to participate in each FL communication round. These IoT devices and the BS jointly execute an FL algorithm for data analysis and model inference. The $n$-th IoT device collects a set $\mathcal{D}_n$ of data samples from the environment and processes them via deep neural networks. We assume that the data in our system are independent and identically distributed (IID), and we let $D_n = |\mathcal{D}_n|$ denote the number of data samples collected by the $n$-th IoT device in each round. In this paper, we use orthogonal frequency-division multiple access (OFDMA) as the uplink multiple-access scheme. There are $K$ sub-channels in total, the set of which is denoted by $\mathcal{K}$, and $k \in \mathcal{K}$ indexes the $k$-th sub-channel. Each sub-channel can be occupied by at most one IoT device.

### II-A Federated Learning

In this work, we employ a cooperative training process between the IoT devices and the server. Each participating IoT device shares the loss function obtained from its local training, and the server aggregates these into the global loss. The global loss function at communication round $t$ is given by

$$F(\mathbf{w}^t) = \frac{1}{M}\sum_{n\in\mathcal{N}} a_n^t F_n(\mathbf{w}^t). \tag{1}$$

Here, $a_n^t \in \{0, 1\}$ denotes the participation decision variable, where $a_n^t = 1$ indicates that IoT device $n$ participates in the FL process and contributes its local model parameters to the aggregation at the server; otherwise, $a_n^t = 0$. In (1), $\mathbf{w}^t$ denotes the global model parameters, $F_n(\cdot)$ denotes the loss function on IoT device $n$ after the $t$-th training round, and $M$ is the number of selected IoT devices. Note that each IoT device employs its own local loss function. Equivalently, (1) can be written as:

(2)

In each round, the server selects a fixed number $M$ of IoT devices to join the FL training process. Hence, $\sum_{n\in\mathcal{N}} a_n^t = M$ for every round $t$.
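As a minimal sketch, assuming the global loss in (1) is an unweighted average of the local losses of the selected devices, the server-side aggregation and the participation constraint $\sum_n a_n^t = M$ can be written as follows. The function name `global_loss` and the plain-list representation are illustrative choices, not part of the original framework.

```python
from typing import Sequence


def global_loss(local_losses: Sequence[float], participation: Sequence[int]) -> float:
    """Equal-weight average of the local losses of the participating devices.

    Assumed form of (1): F = (1/M) * sum_n a_n * F_n, with a_n in {0, 1}
    and M = sum_n a_n (the number of selected devices in this round).
    """
    if len(local_losses) != len(participation):
        raise ValueError("one participation flag is needed per device")
    if any(a not in (0, 1) for a in participation):
        raise ValueError("participation flags must be 0 or 1")
    m = sum(participation)  # constraint: sum_n a_n = M
    if m == 0:
        raise ValueError("at least one device must participate")
    return sum(a * f for a, f in zip(participation, local_losses)) / m


# Devices 0 and 2 participate (M = 2); device 1 is idle this round.
print(global_loss([0.5, 1.0, 2.0], [1, 0, 1]))  # 1.25
```

Idle devices contribute nothing to the numerator, and the division by $M$ rather than $N$ is what keeps the aggregate an average over participants only.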

### II-B Lossy Federated Learning over Model Distortion

In compression-aided FL, the model trained on each IoT device is compressed before transmission to improve communication efficiency. However, lossy compression at an extremely high compression ratio may cause considerable information loss, i.e., compression followed by decompression degrades the quality of the data. More specifically, information loss can be understood as the reduction in mutual information between the original and decompressed models on the devices [MF-2006-InformationTheory]. This loss appears as distortion in the reconstructed data, which can be quantified by the expected distance between the original and decompressed models, $\delta = \mathbb{E}\big[d(\mathbf{w}, \hat{\mathbf{w}})\big]$, where the distance $d(\cdot,\cdot)$ can be measured in various ways (e.g., the mean squared error) [MF-2006-InformationTheory]. Consequently, the error between the original and reconstructed models makes lossy FL underperform conventional FL methods (e.g., FedAvg), because the modified models can cause training divergence.

Nevertheless, the work in [FL-2021-HCFL] proved that increasing the number of participating IoT devices can significantly reduce the training error and thus accelerate convergence. Based on this observation, we formulate a problem whose solution is an operating point that satisfies both communication-efficiency and convergence-rate requirements. To find this point, we propose a function that characterizes the relationship among the number of participating IoT devices, the model distortion caused by the lossy compression scheme, and the FL convergence rate.

#### II-B1 Distortion Effect in Federated Learning

A compression-integrated FL method inevitably introduces distortion between the original model parameters and the model reconstructed from the compressed data [MF-2006-InformationTheory]. Specifically, when a compression scheme is applied, the reconstructed model of an IoT device can be represented as follows:

(3)

As proven in [FL-2021-HCFL], the bias of the reconstructed model follows a Gaussian distribution with standard deviation $\sigma_{\mathrm{c}}$. The relationship between $\sigma_{\mathrm{c}}$ and other FL settings, such as the compressor's reconstruction loss $\delta$ and the number of participating IoT devices $M$, is given by

(4)

As observed from (4), the standard deviation $\sigma_{\mathrm{c}}$ decreases as the number of participating IoT devices $M$ grows. Intuitively, as more IoT devices participate, the aggregated reconstructed model parameters move closer to their original values.
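The intuition behind (4) can be checked numerically under a simplifying assumption of our own, not taken from [FL-2021-HCFL]: each participating device's reconstruction error is an independent, zero-mean Gaussian with the same per-coordinate standard deviation, and the server averages the $M$ reconstructed models. Averaging $M$ independent errors then shrinks the standard deviation by a factor of $\sqrt{M}$. This sketch illustrates the trend only and does not reproduce the exact form of (4); the names `aggregated_error_std` and `per_device_std` are hypothetical.

```python
import math
import random
import statistics


def aggregated_error_std(num_devices: int, per_device_std: float = 0.1,
                         dim: int = 5000, seed: int = 0) -> float:
    """Empirical std of the averaged reconstruction error over `dim` coordinates.

    Assumption (illustrative only): every device adds an independent
    N(0, per_device_std^2) error to each model coordinate.
    """
    rng = random.Random(seed)
    averaged_errors = []
    for _ in range(dim):
        # The server averages num_devices reconstructed copies of this coordinate,
        # so the aggregated error is the mean of the per-device errors.
        total = sum(rng.gauss(0.0, per_device_std) for _ in range(num_devices))
        averaged_errors.append(total / num_devices)
    return statistics.pstdev(averaged_errors)


for m in (1, 4, 16, 64):
    measured = aggregated_error_std(m)
    predicted = 0.1 / math.sqrt(m)
    print(f"M={m:3d}  measured={measured:.4f}  predicted={predicted:.4f}")
```

The measured values track $0.1/\sqrt{M}$ closely, which matches the qualitative statement above; correlated or biased compression errors would not average out this way.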

#### II-B2 Assumptions and Notations

To evaluate the FL performance, we make the following assumptions on the local loss functions $F_n$, $n \in \mathcal{N}$, of the IoT devices.

###### Assumption 1.

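$F_1, \dots, F_N$ are all $L$-smooth: for all $\mathbf{v}$ and $\mathbf{w}$, $F_n(\mathbf{v}) \le F_n(\mathbf{w}) + (\mathbf{v}-\mathbf{w})^{\top}\nabla F_n(\mathbf{w}) + \frac{L}{2}\|\mathbf{v}-\mathbf{w}\|_2^2$.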

###### Assumption 2.

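$F_1, \dots, F_N$ are all $\mu$-strongly convex: for all $\mathbf{v}$ and $\mathbf{w}$, $F_n(\mathbf{v}) \ge F_n(\mathbf{w}) + (\mathbf{v}-\mathbf{w})^{\top}\nabla F_n(\mathbf{w}) + \frac{\mu}{2}\|\mathbf{v}-\mathbf{w}\|_2^2$.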

###### Assumption 3.

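Let $\xi_n^t$ be sampled uniformly at random from the local data of the $n$-th device at iteration $t$. The variance of the stochastic gradient on each device is bounded: $\mathbb{E}\big\|\nabla F_n(\mathbf{w}_n^t, \xi_n^t) - \nabla F_n(\mathbf{w}_n^t)\big\|^2 \le \sigma_n^2$.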

###### Assumption 4.

For all $n \in \mathcal{N}$ and any iteration $t$, the expected squared norm of the stochastic gradient is uniformly bounded: $\mathbb{E}\big\|\nabla F_n(\mathbf{w}_n^t, \xi_n^t)\big\|^2 \le G^2$.

#### II-B3 Convergence Rate Analysis

###### Theorem 1.

Let Assumptions 1 to 4 hold and let $L$, $\mu$, $\sigma_n$, and $G$ be defined therein. We define the following quantities:

(5)

(6)

(7)

(8)

where $E$ is the number of local training epochs, and $\Gamma$ represents the expected distance between the local loss on each IoT device and the global loss of the FL system. By choosing the learning rate $\eta_t$ appropriately, we have the following bound:

(9)

where $\lceil \cdot \rceil$ denotes the ceiling operator and $T$ is the lower bound on the number of communication rounds required by the proposed algorithm, which in turn determines its total communication time.

Proof. The proof is omitted due to space limitations.

### II-C Communication Model

In this section, we describe the communication delay model of the FL-integrated wireless network. We apply OFDMA for uplink data transmission, and each sub-channel can be occupied by at most one IoT device. Thus, the data rate of the $n$-th IoT device on the $k$-th sub-channel in communication round $t$ is given by

$$r_{n,k}^t = a_n^t\,\beta_{n,k}^t\,B\log_2\!\left(1 + \frac{p_n h_{n,k}^t}{B N_0}\right), \tag{10}$$

where $a_n^t = 1$ indicates that the $n$-th IoT device is selected to upload its data to the server in round $t$, and $a_n^t = 0$ otherwise. Meanwhile, $\beta_{n,k}^t = 1$ indicates that the $n$-th IoT device occupies the $k$-th sub-channel in round $t$, and $\beta_{n,k}^t = 0$ otherwise. Additionally, $h_{n,k}^t$ denotes the channel power gain, $p_n$ the transmit power, $B$ the channel bandwidth, and $N_0$ the noise power spectral density. Given the achievable rates of all IoT devices, the total transmission time $\tau^t$ of the model updates from the IoT devices to the server in the $t$-th communication round is given by

(11)

where $S$ denotes the size of the compressed model. In the general FL setting [2017-FL-First], all IoT devices are assumed to use the same model for convenient FL implementation; thus, the same value of $S$ applies to every device. Based on the transmission time of each communication round, we formulate the long-term transmission-time minimization problem in Section II-D.
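A minimal numerical sketch of (10) and (11) follows. It assumes the standard Shannon-rate form $r_{n,k} = B\log_2\big(1 + p_n h_{n,k}/(B N_0)\big)$ for an occupied sub-channel, uses the transmit power, bandwidth, and noise density of Table I as defaults (treating the 180 kHz bandwidth as per sub-channel, which is our assumption), sums the rates of all sub-channels held by a device, and takes the round time as the upload time of the slowest selected device. The last point is a synchronous-aggregation assumption; the exact form of (11) may differ. All function names are illustrative.

```python
import math
from typing import Sequence


def dbm_to_watt(dbm: float) -> float:
    """Convert a power level in dBm to watts."""
    return 10.0 ** (dbm / 10.0) / 1000.0


def subchannel_rate(gain: float, p_tx_dbm: float = 23.0,
                    bandwidth_hz: float = 180e3,
                    n0_dbm_per_hz: float = -174.0) -> float:
    """Achievable rate (bit/s) of one device on one occupied sub-channel (assumed form of (10))."""
    p_tx = dbm_to_watt(p_tx_dbm)
    noise_power = dbm_to_watt(n0_dbm_per_hz) * bandwidth_hz  # N0 * B in watts
    return bandwidth_hz * math.log2(1.0 + p_tx * gain / noise_power)


def round_time(rates_per_device: Sequence[Sequence[float]], model_bits: float) -> float:
    """Round time as the slowest selected device's upload time (assumed form of (11)).

    rates_per_device[i] lists the rates of the sub-channels held by the i-th
    selected device; each device uploads its S-bit compressed model over all of them.
    """
    times = []
    for rates in rates_per_device:
        total_rate = sum(rates)
        if total_rate <= 0:
            return math.inf  # a selected device without capacity never finishes
        times.append(model_bits / total_rate)
    return max(times)


# Device 0 holds one 1 Mbit/s sub-channel; device 1 holds two 2 Mbit/s sub-channels.
print(round_time([[1e6], [2e6, 2e6]], model_bits=4e6))  # 4.0
```

Because the slowest device dominates the round, giving extra sub-channels to a device with a poor channel can shorten the round even if it lowers the sum rate, which is what the sub-channel assignment in Section III exploits.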

### II-D Problem Formulation

We focus on the total communication time of the compression-integrated FL process. Our objective is to minimize the model transmission time needed to achieve the desired global accuracy. Specifically, the total transmission time is the sum of the transmission times of all $T$ communication rounds, where the time of the $t$-th round, $\tau^t$, is computed independently as in (11). In various works (e.g., [2022-FL-OptimizeWirelessIoTNetworks, 2021-FL-JointCommFramework]), the minimum transmission time is estimated based on the FL loss function. However, because the behavior of FL is not well understood, such problems are difficult to solve and prone to overfitting. To make the problem tractable, we apply Theorem 1, i.e., we find the optimal lower bound on $T$ so that the total transmission time is minimized. Intuitively, reducing the lower bound reduces the expected value of $T$; as a consequence, $T$ is likely to decrease as the lower bound is reduced. For convenience, we define $\mathbf{a} = \{a_n^t\}$ and $\boldsymbol{\beta} = \{\beta_{n,k}^t\}$. The optimization problem is formulated as:

(12a)

s.t. (12b)

(12c)

(12d)

(12e)

(12f)

where (12b) is the feasibility condition on the number of participating IoT devices, (12c) indicates that the number of participating IoT devices in each communication round equals $M$, (12d) indicates that each sub-channel is occupied by at most one IoT device in each communication round, (12e) indicates that each selected IoT device is assigned at least one sub-channel and may use more than one, and (12f) ensures that all sub-channels are utilized.
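For reference, one way to write constraints (12c) to (12f) that is consistent with their verbal descriptions, using the variables $a_n^t$ and $\beta_{n,k}^t$ of (10), is given below. This is our reading rather than a verbatim reproduction of the original constraints; in particular, (12e) is interpreted as "each selected device receives at least one sub-channel", and (12d) together with (12f) then means that every sub-channel is held by exactly one device.

$$
\begin{aligned}
&\sum_{n\in\mathcal{N}} a_n^t = M, && \forall t, && \text{(12c)}\\
&\sum_{n\in\mathcal{N}} \beta_{n,k}^t \le 1, && \forall k \in \mathcal{K},\ \forall t, && \text{(12d)}\\
&\sum_{k\in\mathcal{K}} \beta_{n,k}^t \ge a_n^t, && \forall n \in \mathcal{N},\ \forall t, && \text{(12e)}\\
&\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{K}} \beta_{n,k}^t = K, && \forall t. && \text{(12f)}
\end{aligned}
$$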

## III Proposed Algorithm

From (12), we observe that the objective function depends on integer variables (i.e., $M$, $\mathbf{a}$, and $\boldsymbol{\beta}$), each of which is restricted by the constraints. Because the search space is finite, the optimal solution could in principle be obtained by exhaustive search. However, exhaustive search is impractical because its complexity grows rapidly with the network size. Moreover, the objective function is a sum of $T$ per-round terms, and $T$, the number of communication rounds, plays a vital role in this problem: it affects not only the total communication time but also the computational requirements. Furthermore, $T$ depends only on the number of participating IoT devices per round, $M$. Motivated by these observations, we solve (12) in two steps. We first optimize the total number of communication rounds $T$ with respect to the number of selected IoT devices per round, $M$. We then minimize the time consumption of each round, $\tau^t$, with respect to the device selection $\mathbf{a}^t$ and the sub-channel assignment $\boldsymbol{\beta}^t$.

### III-A Communication rounds minimization problem

In this subsection, we aim to optimize the number of communication rounds under the constraint of the number of selected IoT devices per round. The problem is given by

(13a)

s.t. (13b)

This problem is non-convex even after relaxing the discrete variable $M$ to a continuous one. However, since the number of candidate values of $M$ grows only linearly with the number of IoT devices, the optimal number of IoT devices per round can be found by searching over all candidates, as described in Algorithm 1.
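Because problem (13) has a single integer variable, the search in Algorithm 1 can be sketched as a linear scan over the feasible values of $M$. The sketch below is hypothetical: `rounds_bound` stands in for the lower bound on the number of rounds from Theorem 1, whose exact form (9) is not reproduced here, and `toy_rounds_bound` is an invented function used purely to exercise the search. Breaking ties toward the smaller $M$ is our design choice, motivated by the lower per-round communication load.

```python
import math
from typing import Callable, Iterable, Tuple


def search_num_devices(candidates: Iterable[int],
                       rounds_bound: Callable[[int], float]) -> Tuple[int, float]:
    """Return (M, T) minimizing the round bound T(M) by exhaustive linear search.

    Candidates are scanned in increasing order and only strict improvements are
    kept, so ties go to the smallest M (fewer uploads per round).
    """
    best_m, best_t = None, math.inf
    for m in sorted(candidates):
        t = rounds_bound(m)
        if t < best_t:
            best_m, best_t = m, t
    if best_m is None:
        raise ValueError("no candidate value of M has a finite round bound")
    return best_m, best_t


def toy_rounds_bound(m: int) -> int:
    # Invented bound for illustration only: distortion dominates for small M,
    # then the number of rounds saturates at 300.
    return max(300, math.ceil(6000 / m))


print(search_num_devices(range(1, 51), toy_rounds_bound))  # (20, 300)
```

The cost is one evaluation of the bound per candidate, so the search is linear in the number of candidate values of $M$.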

### III-B Communication time consumption minimization

Given the number of selected IoT devices per round, our goal is to select the IoT devices and assign sub-channels to them for model uploading. The problem of minimizing the total transmission time in the $t$-th communication round can be written as

(14a)

s.t. (14b)

In some works, the device selection variable is optimized to reduce the energy consumption of FL. However, since the IoT devices are geographically fixed, the distribution of their channel gains is largely static. An optimized $\mathbf{a}^t$ therefore lacks randomness and may bias the choice of participating IoT devices across FL communication rounds. As a result, FL loses generalization over the whole dataset, because data across the network are not sampled. Hence, we adopt a random selection scheme in (14) to obtain $\mathbf{a}^t$. Problem (14) is then optimized with respect to $\boldsymbol{\beta}^t$ as follows:
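The random selection step can be sketched as sampling $M$ of the $N$ devices uniformly without replacement in every round, which yields a participation vector that satisfies $\sum_n a_n^t = M$ by construction. This is a minimal sketch: uniform sampling and the function name `random_participation` are our assumptions, since the sampling distribution is not specified above.

```python
import random
from typing import List, Optional


def random_participation(num_devices: int, num_selected: int,
                         rng: Optional[random.Random] = None) -> List[int]:
    """Participation vector a^t with exactly num_selected ones, chosen uniformly."""
    if not 0 < num_selected <= num_devices:
        raise ValueError("need 0 < M <= N")
    rng = rng or random.Random()
    chosen = set(rng.sample(range(num_devices), num_selected))  # no replacement
    return [1 if n in chosen else 0 for n in range(num_devices)]


rng = random.Random(42)
for t in range(3):
    a_t = random_participation(num_devices=10, num_selected=4, rng=rng)
    print(t, a_t, sum(a_t))  # sum(a_t) == 4 in every round
```

Drawing a fresh sample each round, independently of the channel gains, is what keeps every device's data reachable over many rounds.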

(15a)

s.t. (15b)

where $r_{n,k}^t$ is the achievable rate of the $n$-th IoT device on the $k$-th sub-channel. Recall that, for a given device selection, the uplink achievable rate of an IoT device depends only on its sub-channel assignment. However, problem (15) is still difficult to solve because of the large numbers of sub-channels and IoT devices. In this case, a coalition game can provide an efficient sub-channel assignment whose performance is close to that of the optimal solution obtained by exhaustive search. Specifically, (15) can be viewed as a trading game among the IoT devices in the system: the devices share a set of sub-channels, and each device selects sub-channels while taking the selections of the other devices into account, in pursuit of a common target, namely the minimum communication time $\tau^t$. We denote the set of sub-channels assigned to the $n$-th IoT device as $\mathcal{K}_n$, where the sets $\mathcal{K}_n$ are disjoint. Sub-channels are then exchanged between these subsets to reach a sub-optimal sub-channel assignment structure for the system. A sub-channel is transferred from one IoT device to another only when the exchange lowers the total time consumption.

Note that each IoT device needs at least one sub-channel to communicate with the server. Therefore, when an IoT device holds only one sub-channel, the controller swaps that sub-channel with a sub-channel of another IoT device instead of moving it. The coalition game-based sub-channel assignment is described in Algorithm 2. Algorithm 2 starts by initializing the sub-channel assignment structure, and the controller then examines every sub-channel in the system in turn. Specifically, it first checks how many sub-channels are assigned to the IoT device that currently holds the considered sub-channel. This check determines whether the sub-channel can be moved to another device or must instead be swapped with a sub-channel of another IoT device. If relocating the considered sub-channel improves system performance, a new sub-channel structure is formed.
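The move-or-swap procedure described above can be sketched as follows. This is a hypothetical illustration, not a reproduction of Algorithm 2: it assumes the per-round objective is the upload time of the slowest selected device (using the sum of its sub-channel rates), starts from a round-robin structure, visits the sub-channels in turn, moves a sub-channel when its holder has more than one and swaps it otherwise, and accepts a new structure only when it strictly lowers the objective. All names are illustrative, and `max_sweeps` is an extra safety cap.

```python
from typing import Iterator, List, Sequence, Tuple


def round_time(assign: Sequence[int], rates: Sequence[Sequence[float]],
               model_bits: float) -> float:
    """Slowest device's upload time; assign[k] is the device holding sub-channel k
    and rates[n][k] is device n's rate on sub-channel k (assumed objective)."""
    total = [0.0] * len(rates)
    for k, n in enumerate(assign):
        total[n] += rates[n][k]
    return max(model_bits / r if r > 0 else float("inf") for r in total)


def _candidate_structures(assign: List[int], k: int,
                          num_devices: int) -> Iterator[List[int]]:
    """Structures reachable by moving or swapping sub-channel k."""
    owner = assign[k]
    if assign.count(owner) > 1:
        # The owner keeps at least one sub-channel: try moving k to each other device.
        for n in range(num_devices):
            if n != owner:
                cand = list(assign)
                cand[k] = n
                yield cand
    else:
        # The owner holds only k: try swapping k with another device's sub-channel.
        for j, holder in enumerate(assign):
            if holder != owner:
                cand = list(assign)
                cand[k], cand[j] = holder, owner
                yield cand


def coalition_assignment(rates: Sequence[Sequence[float]], model_bits: float,
                         max_sweeps: int = 100) -> Tuple[List[int], float]:
    num_devices, num_sub = len(rates), len(rates[0])
    if num_sub < num_devices:
        raise ValueError("need at least one sub-channel per selected device")
    assign = [k % num_devices for k in range(num_sub)]  # initial structure
    best = round_time(assign, rates, model_bits)
    for _ in range(max_sweeps):
        improved = False
        for k in range(num_sub):  # visit every sub-channel in turn
            for cand in _candidate_structures(assign, k, num_devices):
                t = round_time(cand, rates, model_bits)
                if t < best:  # accept only strict improvements
                    assign, best, improved = cand, t, True
                    break
        if not improved:
            break  # no move or swap helps: the structure is stable
    return assign, best


rates = [[1.0, 1.0, 1.0, 1.0],  # device 0: weak on every sub-channel
         [4.0, 4.0, 4.0, 4.0]]  # device 1: strong on every sub-channel
print(coalition_assignment(rates, model_bits=4.0))  # ([0, 0, 0, 1], 1.3333333333333333)
```

In the toy example the weak device ends up with three sub-channels and the strong one keeps a single sub-channel, balancing their upload times. Because every accepted change strictly decreases the objective over a finite set of structures, the loop terminates even without the sweep cap.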

###### Remark 1 (Convergence and complexity analysis).

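Since the numbers of sub-channels and IoT devices are finite, the number of possible sub-channel assignment structures is also finite. Moreover, a new structure is formed only when it lowers the total time consumption, so no structure can be revisited; therefore, Algorithm 2 converges after a finite number of iterations. Assume that Algorithm 2 converges after $I$ iterations. Since each iteration requires a fixed number of data-rate calculations, the complexity of Algorithm 2 grows linearly with $I$. In contrast, exhaustive search must evaluate every feasible assignment, whose number grows exponentially with the number of sub-channels.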
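To make the gap with exhaustive search concrete: if every sub-channel is held by exactly one device and each of the $M$ selected devices holds at least one sub-channel (our reading of (12d) to (12f)), the number of feasible sub-channel assignments in one round equals the number of surjections from $K$ sub-channels onto $M$ devices, which inclusion-exclusion gives as $\sum_{i=0}^{M} (-1)^i \binom{M}{i} (M-i)^K$. The count below ignores device selection, which would multiply it further by $\binom{N}{M}$; the function name is illustrative.

```python
from math import comb


def num_feasible_assignments(num_subchannels: int, num_devices: int) -> int:
    """Ways to give every sub-channel to exactly one device so that each device
    gets at least one (number of surjections, by inclusion-exclusion)."""
    return sum((-1) ** i * comb(num_devices, i) * (num_devices - i) ** num_subchannels
               for i in range(num_devices + 1))


for k in (5, 10, 20, 40):
    print(k, num_feasible_assignments(k, 5))  # grows roughly like 5**k
```

Even with only five selected devices, the count explodes as sub-channels are added, while a single sweep of the move-or-swap procedure evaluates only a number of candidate structures that is polynomial in $K$ and $M$.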

## IV Performance Evaluation

In this section, we evaluate the performance of the proposed methods under various scenarios. First, we analyze how the number of participating IoT devices affects the model convergence rate. Then, we explain how the value of $M$ in problem (13) is chosen. Next, to demonstrate the effectiveness of our methods, we compare them with a benchmark: the proposed sub-channel assignment algorithm is compared with a bandwidth-fairness method, in which the controller allocates approximately the same amount of bandwidth to each IoT device regardless of its channel conditions. The considered communication area is a circle with a radius of 200 m, within which the IoT devices are randomly distributed. The main system parameters are listed in Table I.

Table I: System parameters

| Parameter | Value |
|---|---|
| Cell radius | 200 [m] |
| Transmit power of each IoT device | 23 [dBm] |
| Bandwidth | 180 [kHz] |
| Power spectral density of the thermal noise | -174 [dBm/Hz] |
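The bandwidth-fairness benchmark described above can be sketched as a channel-agnostic round-robin split, which gives every selected device either $\lfloor K/M \rfloor$ or $\lceil K/M \rceil$ sub-channels regardless of channel conditions. The round-robin rule and the function name are our assumptions about how "approximately the same amount of bandwidth" is realized.

```python
from collections import Counter
from typing import List


def fair_assignment(num_subchannels: int, num_devices: int) -> List[int]:
    """assign[k] = device holding sub-channel k, ignoring channel gains."""
    if num_subchannels < num_devices:
        raise ValueError("each selected device needs at least one sub-channel")
    return [k % num_devices for k in range(num_subchannels)]


assign = fair_assignment(num_subchannels=10, num_devices=4)
print(assign)                            # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
print(sorted(Counter(assign).values()))  # [2, 2, 3, 3]
```

Because this split never looks at the channel gains, a device in a deep fade keeps the same share as a device with a strong channel, which is the inefficiency the coalition game-based assignment targets.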

### IV-A Evaluation of the convergence rate theorem

Figure 1(a) illustrates the convergence rate of the FL algorithm for different numbers of participating devices. As can be seen from the figure, when the number of participating devices is small, the FL process tends to diverge after the initial rounds of global aggregation. The reason is that the FL process is affected by the distortion introduced by lossy compression at the devices. When $M$ becomes large, the FL process converges efficiently to the optimal model. Nevertheless, once $M$ is large enough, the convergence rate stabilizes, and further increasing $M$ yields no significant improvement. Thus, by choosing a suitable value of $M$, we can achieve the desired convergence rate while reducing the communication burden.

To choose an appropriate $M$, we observe the convergence of the FL process as $M$ varies, as illustrated in Figure 1(b). Figure 1(b) shows the number of communication rounds needed for FL to reach a target accuracy (i.e., until the distance between the current model and the optimal model falls below a given threshold). As the figure shows, once the number of devices participating in each communication round exceeds a certain value, convergence is achieved in approximately 250 rounds. Consequently, $M$ can be chosen from this range as a near-optimal value. With this choice of $M$, the FL process converges in a near-minimal number of rounds; therefore, we only need to focus on resource allocation to minimize the transmission time.

### IV-B Evaluation of the coalition game-based method

Figure 2 shows how the total transmission time changes as the network size increases. Specifically, after obtaining the number of participating IoT devices per communication round, the controller randomly selects a set of IoT devices to upload their models to the server. The final results are obtained by averaging over multiple simulation runs. As shown in Figure 2(a), the transmission time of both the proposed method and the bandwidth-fairness scheme decreases as the number of sub-channels increases. This is because more sub-channels provide more bandwidth to the IoT devices, which increases their achievable rates and reduces the transmission time. However, the proposed coalition game-based method reduces the transmission time significantly more, especially when the system has a small number of sub-channels. Even with a larger number of sub-channels, the bandwidth-fairness scheme requires almost twice the transmission time of the proposed method.

The effectiveness of the coalition game-based method can also be seen in Figure 2(b), which shows that the transmission time increases with the number of participating IoT devices $M$. The transmission time of the proposed method grows linearly with $M$, whereas that of the bandwidth-fairness scheme rises much more rapidly.

## V Conclusion

In this work, we proposed a joint framework to optimize the total communication time of a compression-aided FL algorithm. We determined the number of participating devices per round from the convergence analysis and employed a coalition game as a simple and straightforward method for bandwidth allocation. In future work, we will develop a reinforcement-learning solution to the considered problem that can adapt to stochastic IoT networks.