I Introduction
Today’s mobile devices are generating an unprecedented amount of data every day. Leveraging the recent success of machine learning (ML) and artificial intelligence (AI), this rich data has the potential to power a wide range of new functionalities and services, such as learning the activities of smartphone users, predicting health events from wearable devices, or adapting to pedestrian behavior in autonomous vehicles. With the help of multi-access edge computing (MEC) servers, ML models can be quickly trained/updated using this data to adapt to the changing environment without moving the data to a remote cloud data center, which is envisioned in intelligent next-generation communication systems [27]. Furthermore, due to the growing storage and computational power of mobile devices as well as privacy concerns associated with uploading personal data, it is increasingly attractive to store and process data directly on mobile devices. Federated learning (FL) [11] is thus proposed as a new distributed ML framework, in which mobile devices collaboratively train a shared ML model under the coordination of an edge server while keeping all the training data on device, thereby decoupling the ability to do ML from the need to upload/store the data to/in a public entity.

A typical FL service involves a number of mobile devices (a.k.a. participating clients) and an edge server (a.k.a. a parameter server) that train an ML model over a number of learning rounds. In each round, the clients download the current ML model from the server, improve it by learning from their local data, and then upload their individual model updates to the server; the server then aggregates the local updates to improve the shared model. For example, the seminal work [22] proposed the FedAvg algorithm, in which the global model is obtained by averaging the parameters of the local models. Although other FL algorithms differ in the specifics, the majority of them follow the same procedure. Because the clients share the same wireless network to download and upload models, how the limited wireless bandwidth is allocated among the participating clients has a crucial impact on the resulting FL speed and efficiency. Resource allocation for wireless FL systems has therefore attracted much recent attention in the wireless communications community [3, 32]. Compared to resource allocation in traditional throughput-maximizing wireless networks, the resource allocation objective and outcome are considerably different for wireless FL due to its unique requirements and characteristics.
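As a point of reference, the parameter-averaging step of FedAvg [22] described above can be sketched as follows. This is an illustrative sketch, not code from the paper; models are represented as plain parameter lists, and the sample-size weighting follows the usual FedAvg description.

```python
# Minimal sketch of one FedAvg aggregation round (illustrative only).
# Each client returns its locally updated parameter vector together with
# its number of local samples; the server takes the sample-weighted average.

def fedavg_aggregate(local_models, local_sizes):
    """Average local parameter vectors, weighted by local dataset size."""
    total = sum(local_sizes)
    dim = len(local_models[0])
    global_model = [0.0] * dim
    for params, n in zip(local_models, local_sizes):
        w = n / total  # weight proportional to local sample count
        for i, p in enumerate(params):
            global_model[i] += w * p
    return global_model
```

The aggregated vector then becomes the global model that clients download at the start of the next round.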
Although existing works have made meaningful progress towards efficient resource allocation for wireless FL, they share the common limitation that only a single FL service is considered. As ML-powered applications grow and become more diverse, it is anticipated that the wireless network will host multiple coexisting FL services, whose set may also change dynamically over time. See Figure 1 for an illustration of the multi-FL-service scenario. The presence of multiple FL services makes resource allocation for wireless FL much more challenging. First, the achievable FL performance depends not only on the intra-service resource allocation among the participating clients within each FL service but also on the inter-service resource allocation among different FL services, and these two levels of allocation decisions are strongly coupled. Second, the FL service providers may adopt different FL algorithms and choose different configurations (e.g., number of participating clients, number of epochs of local training, etc.), yet this information is not always available to the wireless network operator when making resource allocation decisions, due to privacy concerns.
Third, because FL service providers have individual goals, they may have incentives to untruthfully reveal their valuations of the wireless bandwidth if, by doing so, they gain advantages in the inter-service bandwidth allocation. Without correct information, there is no guarantee on the overall system performance. Finally, as in any multi-user system, resource allocation should strike a good balance between efficiency and fairness: every FL service provider should obtain a reasonable share of the wireless resource to train its ML models using FL.

In this paper, we make an initial effort to study wireless FL with multiple coexisting FL services, which share the same bandwidth to train their respective ML models. Our focus is on the efficient bandwidth allocation among different FL services as well as among the participating clients within each FL service, thereby understanding the interplay between these two levels of allocation decisions. Our main contributions are summarized as follows:

We formalize a two-level bandwidth allocation problem for multiple FL services coexisting in the wireless network, which may start and complete at different times depending on their own demand and FL requirements. The model is general enough to cover any FL algorithm that involves downloading, local learning, uploading and global aggregation in each learning round, and hence has wide applicability in real-world systems. In addition, we explicitly take fairness into consideration when optimizing bandwidth allocation to ensure that no FL service is starved of bandwidth.

We consider two use cases depending on the nature/goals of the FL service providers. In the first case, the FL service providers are fully cooperative and aim to maximize the overall system performance. For this case, we design a distributed optimization algorithm based on dual decomposition to solve the two-level bandwidth allocation problem. The algorithm keeps all FL-related information on the individual FL service provider side without sharing it with the network operator, thereby reducing the communication overhead and enhancing privacy protection.

We further consider a second case where the FL service providers selfishly maximize their own performance. To address this selfishness, we design a multi-bid auction mechanism that is able to elicit the FL service providers’ truthful valuations of bandwidth based on their submitted bids. With a fairness-adjusted ex post charge, the proposed auction mechanism makes a tunable tradeoff between efficiency and fairness.
The rest of this paper is organized as follows. Section II discusses related work. Section III presents the system model. Section IV formulates the problem for the cooperative case and develops a distributed bandwidth allocation algorithm. Section V studies the case of selfish service providers and develops a multi-bid auction mechanism. Section VI presents simulation results. Concluding remarks are made in Section VII.
II Related Work
A lot of research has been devoted to tackling various challenges of FL, including but not limited to developing new optimization and model aggregation methods [9, 14, 7], handling non-i.i.d. and unbalanced datasets [16, 40, 29], dealing with the straggler problem [31], preserving model and data privacy [6, 10], and ensuring fairness [23, 15]. A comprehensive review of these challenges can be found in [13, 17, 37]. In particular, the communication aspect of FL has been recognized as a primary bottleneck due to the tension between uploading a large amount of model data for aggregation and the limited network resources to support this transmission, especially in a wireless environment. In this regard, early research on communication-efficient FL largely focused on reducing the amount of transmitted data while assuming that the underlying communication channel had been established, e.g., updating only clients with significant training improvement [4], compressing the gradient vectors via quantization [18], or accelerating training using sparse or structured updates [1]. More recent research starts to address this problem from a communication system perspective, e.g., using a hierarchical FL network architecture [19] that allows partial model aggregation, or leveraging wireless transmission properties to perform analog model aggregation over the air [36].

As wireless networks are envisioned as a main deployment scenario of FL, wireless resource allocation for FL is another active research topic. Many existing works [34, 39, 32] study the tradeoff between local model update and global model aggregation. Client selection is essential to enable FL at scale and to address the straggler problem. Different types of joint bandwidth allocation and client scheduling policies [35, 38, 30, 25, 2] have been proposed to minimize either the training loss or the training time. In all these works, resource allocation is carried out among the clients of a single FL service, while assuming that the FL service itself has already received dedicated resources. In stark contrast, our paper studies a network consisting of multiple coexisting FL services and performs resource allocation at both the FL service level and the client level. We note that a related problem where multiple FL services are trained at the same time is also considered in a recent work [24]. In that paper, different FL services run on the same set of clients and a joint computation and communication resource scheduling problem is studied. In our paper, different FL services have separate client sets, which may experience very different channel qualities, and hence we focus on the bandwidth allocation problem. Moreover, while [24] assumed that all clients are obedient, we study the possibly selfish nature of FL service providers and highlight bandwidth allocation fairness.
Considering each FL service as a “user”, our problem is a special type of resource allocation problem for multi-user wireless networks. While many of the concepts and techniques adopted in this paper, including proportional fairness [21], dual decomposition [26] and multi-bid auctions [20], have seen applications in other multi-user wireless resource allocation domains, applying them to multi-service FL requires special treatment because two levels of resource allocation are involved in our problem. In particular, there is no closed-form expression of how the performance (i.e., learning speed) of an FL service depends on the resource allocation among its clients. Therefore, understanding the interdependency of intra-service and inter-service bandwidth allocation is essential. Furthermore, we put an emphasis on resource fairness among different FL services by designing a new fairness-adjusted multi-bid auction mechanism for the selfish FL service provider case, thereby achieving a tunable tradeoff between efficiency and fairness. We point out that there are some existing works [5, 28, 8, 12] on designing incentive mechanisms for client participation in a single FL service. These works are very different from our paper in terms of both the problem and the approaches, and do not consider fairness when designing the mechanism.
III System Model
We consider a wireless network where machine learning models are trained using federated learning (FL). The wireless network has a fixed total bandwidth, which the network operator must allocate among concurrent FL services to enable their individual training. Because new FL services may start and old FL services may finish over time, bandwidth allocation has to be performed periodically to adapt to the currently active FL services. Therefore, we divide time into periods of equal length. At the beginning of each period, a set of FL services are active and require wireless bandwidth to carry out their training. These services are either newly initiated in the current period or continuing from the previous period. An FL service finishes, and hence exits the wireless network, when a certain termination criterion is satisfied (e.g., the training loss is below a threshold, the testing accuracy is above a threshold, or some other convergence criterion), which usually varies across FL services and is prespecified by the corresponding service provider. Therefore, an FL service may span multiple periods. The wall-clock time (i.e., the number of periods) that an FL service takes to finish depends on the difficulty and other inherent characteristics of the service itself, on how much wireless resource is allocated to the service in each period for which it stays, and on how this bandwidth is further allocated among its participating clients. In what follows, we first formulate the client-level (i.e., intra-service) bandwidth allocation problem and then describe the service-level (i.e., inter-service) bandwidth allocation problem.
III-A Intra-Service Bandwidth Allocation
To understand how bandwidth allocation affects FL performance, let us consider a single representative FL service in one period (the period index is dropped for conciseness). Suppose that this service is allocated a certain bandwidth in this period, which is further allocated among its set of participating clients. Each client has a computing speed as well as uplink and downlink wireless channel gains to the parameter server of its service, all of which are assumed to be invariant within a period. We consider a synchronized FL model for each FL service, where a number of FL rounds take place in a period. Nonetheless, different FL services do not have to be synchronized with each other; they learn at their own pace. See Figure 2 for an illustration.
An FL round consists of four stages: download transmission, local computation, upload transmission and global computation:

Download Transmission (DT). Each FL round starts with a DT stage in which each client downloads the current global model from its parameter server residing on the base station. Given the bandwidth allocated to a client, its DT rate follows Shannon’s equation and depends on the transmission power of the parameter server, the downlink channel gain and the noise power. For notational convenience, we refer to the resulting rate per unit bandwidth as the client’s DT base rate. The DT latency is then the download data size (e.g., the size of the global model) divided by the DT rate.

Local Computation (LC). With the current global model, each client then updates its local model using its local dataset. The per-round local computation workload depends on the ML model complexity, the local dataset size and the number of epochs of local training. The LC latency of a client is its workload divided by its computing speed.

Upload Transmission (UT). Once the local update is finished, each client transmits the result to the parameter server. Given the allocated bandwidth, the UT rate again follows Shannon’s equation, now with the client’s transmission power, the uplink channel gain and the noise power; we refer to the rate per unit bandwidth as the client’s UT base rate. The UT latency is the upload data size divided by the UT rate.

Global Computation (GC). Finally, once the local updates of all clients are received by the parameter server, the global model is updated. The GC latency is the global model update workload divided by the computing speed of the parameter server.
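To make the four-stage latency model concrete, here is a minimal sketch (not from the paper) that assumes Shannon-rate transmission for the DT/UT stages and workload-over-speed for LC; all parameter names are illustrative.

```python
import math

def shannon_rate(bandwidth, tx_power, gain, noise):
    """Transmission rate: bandwidth * log2(1 + P * g / N)."""
    return bandwidth * math.log2(1.0 + tx_power * gain / noise)

def client_round_latency(bw, c):
    """Per-round latency of one client: DT + LC + UT.

    bw: bandwidth allocated to this client (assumed shared by DT and UT).
    c:  dict of channel/compute parameters (illustrative names).
    """
    t_dt = c["download_size"] / shannon_rate(bw, c["server_power"], c["gain_down"], c["noise"])
    t_lc = c["local_workload"] / c["cpu_speed"]
    t_ut = c["upload_size"] / shannon_rate(bw, c["client_power"], c["gain_up"], c["noise"])
    return t_dt + t_lc + t_ut
```

The round length of a service is then the maximum of these latencies over its clients, plus the GC latency at the server.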
Note that our framework is applicable to a broad class of FL algorithms (e.g., FedAvg, FedSGD) that can be chosen for a service. For instance, the downloaded/uploaded data may be the model itself, a compressed version of the model, or the model gradient information. For the purpose of bandwidth allocation, it is sufficient to describe an FL service by the tuple of its download/upload data sizes, computation workloads, client computing speeds and channel gains.
In synchronized FL, the parameter server does not update the global model until it has received the local updates from all participating clients. Hence, the length of an FL round of a service is determined by the total latency of its slowest client. To minimize the FL round length of a service, so that more FL rounds can be executed in a period, one has to optimally allocate the service’s bandwidth among its clients. The intra-service bandwidth allocation problem can be formulated as
(1) 
Given the optimal solution to Eqn. (1), the optimal FL frequency of a service is the reciprocal of its minimum round length, which we use to represent the FL speed of the service. Note that the frequency multiplied by the period length gives the number of FL rounds that can be performed in one period.
III-B Inter-Service Bandwidth Allocation
In a period, multiple FL services may be active and require wireless bandwidth to carry out learning. Since they share the total bandwidth, how this bandwidth is allocated among the different services determines their achievable learning frequencies, and thus their convergence speed in terms of wall-clock time. In this paper, we consider two scenarios depending on the goals of the FL service providers and on how inter-service bandwidth allocation is implemented. In the first scenario, all FL service providers are cooperative, and their goal is to maximize the FL performance of the overall system. This is equivalent to the network operator solving a system-wide optimization problem. In the second scenario, the FL service providers are selfish and care only about their own FL performance. As these service providers compete for the limited bandwidth resource, addressing their incentive issues is crucial; for this case we design a fairness-adjusted multi-bid auction mechanism for the inter-service bandwidth allocation. The following two sections discuss these two scenarios separately.
IV Cooperative Service Providers
In the cooperative scenario, the network operator directly decides the bandwidth allocation to maximize the overall system performance. As in any multi-user network, bandwidth allocation for multi-service FL has to address both efficiency and fairness: every active FL service should get a reasonable share of the bandwidth. Thus, we adopt the notion of proportional fairness [21], a metric widely used in multi-user resource allocation, and aim to solve the following optimization problem:
subject to  (2) 
where we drop the period index and sum over the active FL services in the period for conciseness. The objective function adds a “1” inside the logarithm to ensure that the function value is always nonnegative. This change has very little impact on the final allocation since the frequency is typically much larger than 1 in a period. Note also that the above inter-service bandwidth allocation problem Eqn. (2) implicitly incorporates the intra-service problem, since each service’s frequency is obtained from the solution to Eqn. (1).
IV-A Optimal Solution to the Intra-Service Problem
We first investigate the optimal solution to the intra-service bandwidth allocation problem and see how it can be used to solve the inter-service problem. According to our system model and Eqn. (1), the intra-service bandwidth allocation is equivalent to
(3)  
subject to  (4)  
(5) 
where shorthand notation is introduced for the per-client constants for convenience. Clearly, the optimal solution must satisfy
(6) 
Therefore, the optimal frequency solves the following equation,
(7) 
Although there is no closed-form solution, a bisection algorithm can easily solve the above problem to obtain the optimal allocation and, consequently, the optimal frequency as a function of the allocated bandwidth. Furthermore, this frequency function can be characterized by the following lemma.
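The bisection idea can be sketched as follows, under the simplifying assumption (consistent with Eqns. (3)–(7)) that each client’s latency takes the form compute_time + comm_load / allocated_bandwidth, so that the optimum equalizes all client latencies; the client parameters here are illustrative.

```python
def min_round_length(clients, total_bw, tol=1e-9):
    """Bisection on the round length T for one FL service.

    clients: list of (compute_time, comm_load) pairs; a client's latency
             is compute_time + comm_load / allocated_bandwidth.
    A candidate T is feasible iff the bandwidth needed to finish every
    client within T, i.e. the sum of comm_load / (T - compute_time),
    fits in total_bw.  The smallest feasible T is the optimal round length.
    """
    def needed_bw(T):
        return sum(w / (T - c) for c, w in clients)

    lo = max(c for c, _ in clients)                   # infeasible end
    hi = lo + sum(w for _, w in clients) / total_bw   # always feasible
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if needed_bw(mid) <= total_bw:
            hi = mid
        else:
            lo = mid
    return hi
```

The optimal FL frequency of the service is then the reciprocal of the returned round length.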
Lemma 1.
The optimal FL frequency is a differentiable, increasing and concave function of the bandwidth allocated to the service.
Proof.
Consider the inverse function defined by Eqn. (7). It is easy to see that this inverse is monotonically increasing and grows without bound. Therefore, the frequency function itself is also monotonically increasing. Its first-order derivative is
(8) 
Therefore, the frequency function is differentiable and
(9) 
The second-order derivative can also be computed as follows:
(10) 
This proves that the frequency function is concave. ∎
With Lemma 1, it is straightforward to see that the inter-service bandwidth allocation problem (2) is a convex optimization problem.
Proposition 1.
The inter-service bandwidth allocation problem (2) is an equality-constrained convex optimization problem.
Proof.
Because the frequency function is concave and the logarithm is concave and increasing, their composition is also concave. It is then straightforward to see that the problem is a concave maximization problem with an equality constraint. ∎
IV-B Distributed Algorithm for Inter-Service Bandwidth Allocation
We now proceed to solve the inter-service bandwidth allocation problem. While various centralized algorithms, such as Newton’s method, can efficiently solve the inter-service problem Eqn. (2) given that it is a convex optimization problem, we prefer a distributed algorithm in which individual FL service providers do not share their FL algorithm details and client-level information with each other or with the network operator. This reduces the communication overhead and preserves the privacy of the client devices of individual FL service providers. Our algorithm is developed based on dual decomposition [26] as follows.
We first relax the total bandwidth equality constraint to an inequality, and then form the Lagrangian by relaxing this coupling constraint:
(11) 
where the Lagrange multiplier is associated with the total bandwidth constraint, and each term of the Lagrangian is to be maximized by the corresponding service provider. This dual decomposition results in each service provider solving, for a given multiplier, the following problem
(12) 
where the solution is unique due to the strict concavity of the objective according to Lemma 1. Specifically, to solve this maximization problem, we only need to solve its first-order condition,
(13) 
which can be converted into the following equation to solve,
(14) 
Clearly, the left-hand side is an increasing function, and thus a simple bisection algorithm can be devised to solve Eqn. (14) and obtain the allocated bandwidth. Plugging this bandwidth into Eqn. (7) then yields the optimal frequency.
Let the maximized objective of Eqn. (12) be the local dual function of each service provider. Then the master dual problem is
(15) 
Since the solution of each subproblem is unique, the dual function is differentiable, and the following gradient method can be used to iteratively update the dual variable:
(16) 
where the iteration uses a sufficiently small positive step size and a projection onto the nonnegative orthant. The dual variable converges to the dual optimum as the iterations proceed. Since the duality gap of the inter-service problem Eqn. (2) is zero and the solution to Eqn. (12) is unique, the primal variables also converge to the primal optimum.
Algorithm 1 summarizes the distributed inter-service bandwidth allocation (DISBA) algorithm, which works iteratively. In each iteration, the operator sends the current dual variable (i.e., the bandwidth price) to all service providers. Each service provider then solves its subproblem using its local information and sends the resulting bandwidth demand to the network operator. The network operator finally updates the dual variable for the next iteration. The algorithm terminates once the dual variable converges.
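A minimal sketch of the DISBA price-update loop follows. Since the paper’s frequency functions have no closed form, each provider’s objective is replaced here by an illustrative concave stand-in a·log(b), whose best response to a price λ is simply a/λ; everything except the dual-gradient structure is an assumption.

```python
def disba(weights, total_bw, step=0.05, iters=5000):
    """Dual-decomposition loop (illustrative): the operator announces a
    price lam, each provider privately maximizes  a*log(b) - lam*b
    (best response b = a/lam), and the operator adjusts lam by a
    projected gradient step until supply meets demand."""
    lam = 1.0
    demands = []
    for _ in range(iters):
        demands = [a / lam for a in weights]                     # local solves
        lam = max(1e-9, lam + step * (sum(demands) - total_bw))  # price update
    return demands, lam
```

At convergence the price clears the market: the providers’ aggregate demand equals the total bandwidth, without any provider revealing its utility function to the operator.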
V Selfish Service Providers
In the previous section, the distributed inter-service bandwidth allocation works by letting each FL service provider compute its allocated bandwidth given the announced price. This, however, creates an opportunity for a selfish service provider to misreport its computation result in a way that favors itself but reduces the performance of the system as a whole. In fact, even if the inter-service bandwidth allocation problem (2) is solved in a centralized way, similar selfish behavior may still undermine efficient system operation, as a selfish service provider may misreport its FL service and client parameters (e.g., FL workload, client computing power and channel gains), which alters the frequency function used on the operator side. With a wrong frequency function, the operator cannot determine the true optimal bandwidth allocation.
In this section, we address the selfishness issue in inter-service bandwidth allocation by designing a multi-bid auction mechanism. This mechanism ensures that the FL service providers use their true FL frequency functions when making bandwidth bids.
V-A Multi-Bid Auction
First, we describe the general rules of the multi-bid auction mechanism.
V-A1 Bidding
At the beginning of each bandwidth allocation period, each service provider submits a set of bids. Each bid is two-dimensional, consisting of a requested bandwidth and the unit price that the service provider is willing to pay to obtain that bandwidth. Without loss of generality, we assume that the bids are sorted according to their prices. The collection of bids that a service provider submits is called its multi-bid.
V-A2 Bandwidth Allocation and Charges
Once the network operator collects the multi-bids from all service providers, it computes and implements the inter-service bandwidth allocation. Each service provider then further allocates its share to its clients to perform FL. At the end of the period, the network operator determines the charge for each service provider based on the allocated bandwidth and the realized FL performance.
Now, a couple of issues remain to be addressed. First, how should the bandwidth allocation and the charges be computed from the submitted multi-bids? Second, do the service providers have incentives to truthfully report their valuations of the bandwidth? These questions are addressed in the next subsections.
V-B Market Clearing Prices with Full Information
We first consider a simpler case where the service providers truthfully report their complete FL frequency functions to the network operator. This analysis provides insights on how to design bandwidth allocation and charging rules in the more difficult multi-bid auction case.
Recall that a service’s optimal FL frequency is a function of its allocated bandwidth. Taking into account the price paid to obtain this bandwidth, the (net) utility of a service provider is
(17) 
Now, if bandwidth were sold at a given unit price, each service provider would buy the amount of bandwidth that maximizes its utility. We call the mapping from the unit price to this amount the bandwidth demand function (BDF); it is easy to derive by checking the first-order condition of Eqn. (17). Conversely, for a given amount of bandwidth, there is a maximum unit price that a service provider would be willing to pay; we call this mapping the marginal valuation function (MVF).
V-B1 Market clearing price
With complete information of the frequency functions, and hence the BDFs, of all service providers, the network operator can compute the market clearing price (MCP) at which the aggregate bandwidth demand equals the total available bandwidth. One can prove that the MCP is unique and optimal in the sense that it maximizes the total (equivalently, average) FL frequency.
Proposition 2.
The market clearing price is unique, and the resulting allocation maximizes the total FL frequency.
Proof.
According to Lemma 1, the frequency function is increasing and concave, so its derivative is monotonic; the BDF, which is the inverse of this derivative, is monotonic as well. As a result, there exists a unique price at which the aggregate demand equals the total bandwidth.
To show that the MCP maximizes the total FL frequency, consider the following maximization problem
(18) 
This is clearly a convex optimization problem. Consider its KarushKuhnTucker conditions. In particular, the stationarity condition is
(19) 
where the Lagrange multiplier is associated with the bandwidth constraint. The optimal solution requires
(20) 
Together with the feasibility constraint, this is equivalent to imposing a homogeneous market clearing price. ∎
Because the aggregate demand is a monotonically decreasing function of the price, a bisection algorithm can easily find the unique market clearing price at which demand equals the total bandwidth.
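The MCP search can be sketched as a bisection on the price; each provider’s demand function is assumed given as a callable and decreasing, per the discussion above. The demand functions in the usage example are hypothetical.

```python
def market_clearing_price(demand_fns, total_bw, p_lo=1e-9, p_hi=1e6, tol=1e-9):
    """Bisection for the unique price at which the (decreasing)
    aggregate bandwidth demand equals total_bw."""
    def aggregate(p):
        return sum(d(p) for d in demand_fns)

    while p_hi - p_lo > tol:
        mid = 0.5 * (p_lo + p_hi)
        if aggregate(mid) > total_bw:
            p_lo = mid   # demand exceeds supply -> raise the price
        else:
            p_hi = mid
    return 0.5 * (p_lo + p_hi)
```

For instance, with three providers whose demands are 1/p, 2/p and 3/p and a total bandwidth of 6, the clearing price is 1.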
V-B2 Fairness-adjusted costs
One major issue with the above pricing scheme is that it ignores fairness among the service providers: although it maximizes efficiency in terms of the average FL frequency according to Proposition 2, the average FL frequency may be maximized at an operating point where a few service providers receive most of the bandwidth while others obtain very little. In this paper, we design and incorporate a fairness-adjusted charging scheme into the above pricing scheme. The payment of each service provider now consists of two parts:

The first part of the payment depends on the amount of bandwidth allocated to the service provider and the unit price set by the operator; specifically, it is the product of the two.

The second part of the payment depends on the realized FL frequency of the service provider. Specifically, the service provider is charged a fairness-adjusted cost at the end of the period, once its FL frequency has been realized; the cost is scaled by a tunable parameter.
With these payments, a service provider’s utility becomes
(21) 
where the modified benefit function is defined accordingly. Comparing this new utility function Eqn. (21) with Eqn. (17), we make the following remarks. First, the fairness-adjusted cost essentially replaces the original benefit function with a modified one; the decision problem remains largely the same. Second, in the new utility function Eqn. (21), given any allocated bandwidth, it is still in the service provider’s interest to perform the optimal client-level bandwidth allocation that maximizes its FL frequency, because the modified benefit is an increasing function of the frequency. Therefore, we can directly write the modified benefit as a function of the optimal FL frequency. Third, to charge the fairness-adjusted cost, the network operator does not need to know the exact frequency function; it only has to observe the realized FL frequency at the end of the current period. This is key to achieving fairness in the multi-bid auction, where FL service providers do not report their complete FL frequency functions.
We call the resulting demand function the modified bandwidth demand function (mBDF) and, likewise, the resulting valuation function the modified marginal valuation function (mMVF). The network operator can similarly compute the modified market clearing price (mMCP) at which the aggregate modified demand equals the total bandwidth. Using an argument similar to the proof of Proposition 2, one can prove the following.
Proposition 3.
The mMCP is unique, and the resulting bandwidth allocation maximizes the fairness-adjusted objective.
Proof.
Because the frequency function is concave and increasing, its fairness-adjusted transformation is also concave and increasing. Following arguments similar to those in the proof of Proposition 2, the bandwidth allocation resulting from the mMCP maximizes the fairness-adjusted objective. ∎
The tunable parameter trades off efficiency and fairness. At one extreme, the problem reduces to the total FL frequency maximization problem; at the other, it achieves proportional fairness among the service providers.
V-C Bandwidth Allocation and Charging Rules
Now, we are ready to describe the bandwidth allocation and charging rules of the fairness-adjusted multi-bid auction. In this subsection, each service provider submits only a multi-bid instead of its complete FL frequency function. We will assume for now that the service providers submit their bids truthfully, an assumption justified in the next subsection. Specifically, we say that a bid is truthful if the requested bandwidth and the price that the FL service provider is willing to pay satisfy its mBDF, because such a bid reveals the provider’s true valuation of bandwidth after taking the fairness-adjusted cost into account. A multi-bid is truthful if all of its bids are truthful.
Definition 1.
(Truthful Multi-Bid) A multi-bid is truthful if each of its bids satisfies the mBDF, i.e., the requested bandwidth in every bid equals the modified bandwidth demand at the corresponding bid price.
The network operator does not know the BDF (and hence the mBDF) of each FL service provider, because it does not have access to the FL frequency functions. Nonetheless, if a service provider submits a truthful multi-bid, the operator can compute a pseudo-mBDF from the bids to approximate the actual mBDF. Specifically, given the submitted multi-bid, a left-continuous step function describes the pseudo-mBDF as follows,
(22) 
Essentially, the pseudo-mBDF uses the requested bandwidths to approximate the bandwidth demand between consecutive bid prices. Similarly, the operator can construct a pseudo-mMVF, an approximation of the service provider’s actual mMVF, from the submitted multi-bid as follows,
(23) 
In other words, the pseudo-mMVF uses the bid prices to approximate the marginal value of bandwidth between consecutive requested amounts. We illustrate the pseudo-mBDF and pseudo-mMVF in Figure 3.
The aggregated pseudo-mBDF is the sum of the pseudo-mBDFs of all FL service providers:
(24) 
The pseudo-mMCP is the largest possible price at which the aggregated pseudo-mBDF exceeds the total available bandwidth, i.e.,
(25) 
This implies that reducing the pseudo-mMCP by just a little results in the supply (i.e., the total available bandwidth) being no greater than the demand. Because every individual pseudo-mBDF is a step function with finitely many steps, the aggregated pseudo-mBDF is also a step function whose number of steps is at most the total number of submitted bids. Therefore, the pseudo-mMCP can be computed with complexity linear in the total number of bids.
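One plausible implementation of this step-function machinery is sketched below; the bid format ((price, bandwidth) pairs), the tie-breaking and the left-continuity convention are assumptions on our part.

```python
def pseudo_mbdf(multibid):
    """Step-function demand built from a multi-bid.

    multibid: list of (price, bandwidth) pairs.  At a price u the
    provider is taken to demand the largest bandwidth among its bids
    whose price is at least u (0 if there is none).
    """
    def demand(u):
        return max((bw for price, bw in multibid if price >= u), default=0.0)
    return demand

def pseudo_mmcp(multibids, total_bw):
    """Largest submitted bid price at which the aggregated pseudo-mBDF
    still exceeds the supply.  Since every pseudo-mBDF is a step
    function, scanning the submitted prices in decreasing order
    suffices; the cost is linear in the total number of bids."""
    demands = [pseudo_mbdf(mb) for mb in multibids]
    for u in sorted({p for mb in multibids for p, _ in mb}, reverse=True):
        if sum(d(u) for d in demands) > total_bw:
            return u
    return 0.0
```

For example, with one provider bidding {(3, 1), (1, 2)}, another bidding {(2, 1.5)}, and a supply of 2, the aggregate demand first exceeds the supply at price 2.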
Next, we describe our bandwidth allocation and charging rules. For notational convenience, we denote when this limit exists for a function and all .
V-C1 Bandwidth allocation
With the pseudo-mMCP , our bandwidth allocation rule is as follows: if FL service provider submits the multibid (and thereby declares the associated functions and ), then it receives bandwidth , with
(26) 
In other words: (1) each FL service provider receives the amount of bandwidth it asks for at the lowest price for which the supply exceeds the pseudo bandwidth demand; (2) if not all bandwidth has been allocated yet, the surplus is shared among the service providers. This sharing is done proportionally to ; noting that , this ensures that all bandwidth is allocated.
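A sketch of this two-step allocation rule (again restating the step-function helpers; all bid values hypothetical): each provider first receives its demand just above the pseudo-mMCP, and the leftover bandwidth is then split proportionally to the demand jumps at the pseudo-mMCP:

```python
def pseudo_mbdf(bids, p):
    """Left-continuous step demand; `bids` sorted by increasing price."""
    for price, qty in bids:
        if p <= price:
            return qty
    return 0.0

def demand_right(bids, p):
    """Right limit of the pseudo-mBDF at p (demand just above price p)."""
    for price, qty in bids:
        if p < price:
            return qty
    return 0.0

def pseudo_mmcp(all_bids, B):
    """Largest price at which aggregated pseudo-demand still exceeds B."""
    prices = sorted({price for bids in all_bids for price, _ in bids})
    u = 0.0
    for p in prices:
        if sum(pseudo_mbdf(b, p) for b in all_bids) > B:
            u = p
    return u

def allocate(all_bids, B):
    """Step (1): demand just above the pseudo-mMCP.
    Step (2): split the surplus proportionally to the demand jumps."""
    u = pseudo_mmcp(all_bids, B)
    base = [demand_right(b, u) for b in all_bids]
    jumps = [pseudo_mbdf(b, u) - d for b, d in zip(all_bids, base)]
    surplus, total_jump = B - sum(base), sum(jumps)
    return [d + (surplus * j / total_jump if total_jump > 0 else 0.0)
            for d, j in zip(base, jumps)]
```

In this toy instance the two providers demand 2 and 1 units just above the pseudo-mMCP, and the remaining 7 units are split evenly because their demand jumps are equal, so the allocation exhausts the supply of 10.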
V-C2 Charging
Given the submitted multibids , each service provider is charged a payment as follows,
(27) 
The first term on the right-hand side is based on the exclusion-compensation principle of second-price auction mechanisms [33]: service provider pays so as to cover the "social opportunity cost", namely the loss of utility its presence imposes on all other service providers. The second term on the right-hand side is the fairness-adjusted cost, which is charged at the end of each period after the actual FL frequency is realized and observed.
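The exclusion-compensation term can be illustrated with a self-contained sketch (bid values hypothetical; the allocation helpers restate the rule of (26), and the fairness-adjusted second term is omitted here since it depends on the realized FL frequency). Provider s pays the declared value, measured by the area under the pseudo-mMVFs, that the other providers lose because of its presence:

```python
def pseudo_mbdf(bids, p):
    """Left-continuous step demand; `bids` sorted by increasing price."""
    for price, qty in bids:
        if p <= price:
            return qty
    return 0.0

def demand_right(bids, p):
    """Right limit of the pseudo-mBDF at p."""
    for price, qty in bids:
        if p < price:
            return qty
    return 0.0

def pseudo_mmcp(all_bids, B):
    prices = sorted({price for bids in all_bids for price, _ in bids})
    u = 0.0
    for p in prices:
        if sum(pseudo_mbdf(b, p) for b in all_bids) > B:
            u = p
    return u

def allocate(all_bids, B):
    u = pseudo_mmcp(all_bids, B)
    base = [demand_right(b, u) for b in all_bids]
    jumps = [pseudo_mbdf(b, u) - d for b, d in zip(all_bids, base)]
    surplus, total_jump = B - sum(base), sum(jumps)
    return [d + (surplus * j / total_jump if total_jump > 0 else 0.0)
            for d, j in zip(base, jumps)]

def declared_value(bids, a):
    """Area under the pseudo-mMVF from 0 to a: the declared value of
    receiving an allocation of a units."""
    total, lo = 0.0, 0.0
    for price, qty in sorted(bids, key=lambda b: -b[0]):  # highest price first
        hi = min(a, qty)
        if hi > lo:
            total += price * (hi - lo)
            lo = hi
    return total

def exclusion_payment(all_bids, B, s):
    """Social opportunity cost of provider s: declared value the other
    providers lose because s participates (second-price-style term)."""
    with_s = allocate(all_bids, B)
    others = [b for i, b in enumerate(all_bids) if i != s]
    without_s = allocate(others, B)
    others_with_s = [a for i, a in enumerate(with_s) if i != s]
    return sum(declared_value(b, a_wo) - declared_value(b, a_w)
               for b, a_wo, a_w in zip(others, without_s, others_with_s))
```
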
Considering both the achieved FL frequency and the payment, FL service provider ’s utility is therefore
(28) 
V-D Incentives of Truthful Reporting
In the previous subsection, we assumed that every service provider truthfully submits its multibid. Now, we prove that this assumption indeed "approximately" holds under the designed bandwidth allocation and charging rules.
We first study the individual rationality of the designed mechanism.
Definition 2.
A mechanism is said to be individually rational if no service provider can be worse off from participating in the auction than it would have been had it declined to participate.
Proposition 4.
If FL service provider submits a truthful multibid , then .
Proof.
Next, we show that truthful reporting is approximately incentive compatible, i.e., a service provider cannot do much better than simply revealing its true valuation.
Proposition 5.
Consider any truthful multibid for service provider , and any other multibid , , we have
(30) 
where
(31) 
with and .
Proof.
The proof follows Proposition 2 of [20]. ∎
The above proposition shows that if service provider submits a truthful multibid , then any other multibid corresponds to an increase in utility of no more than . In other words, truthful bidding brings service provider the best possible utility up to a gap . Importantly, this value does not depend on the number of other service providers or on the multibids they submit. In game-theoretic terminology, the situation where all service providers submit truthful multibids is an ex post Nash equilibrium, where , in the sense that no service provider could have improved its utility by more than had it submitted a different multibid.
V-E A Uniform Multi-Bidding Example
To conclude the multibid auction mechanism design, we illustrate a uniform multibidding approach as an example of how an individual service provider can decide its multibid. Instead of having the service provider submit both prices and bandwidth requests, the operator can announce prices to service provider and let it report its requested bandwidth at these price points. This gives the operator better control over how the service providers construct their multibids, helping to avoid multibids that may result in a large , which would reduce the providers' incentives to report truthfully. Because the operator does not know the demand function of service provider , a natural approach is to uniformly distribute these prices in the range , where is the largest price at which the service provider may still request a positive amount of bandwidth. Specifically,
(32) 
Assume that the network operator has prior knowledge , , , and of the lower/upper bounds on the parameters; then can be upper bounded by
(33) 
Thus, the operator can set the uniform prices as
(34) 
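The uniform multibidding step can be sketched as follows, with a hypothetical linear demand function standing in for the provider's private (unknown to the operator) demand:

```python
def uniform_multibid(demand_fn, p_max, M):
    """Report demand at M uniformly spaced prices in (0, p_max], in the
    spirit of (34). `demand_fn` is the provider's private bandwidth
    demand function, queried at the operator-announced price points."""
    prices = [p_max * k / M for k in range(1, M + 1)]
    return [(p, demand_fn(p)) for p in prices]

# Hypothetical demand: 10 units at price 0, dropping linearly to 0 at price 5.
bid = uniform_multibid(lambda p: max(0.0, 10.0 - 2.0 * p), p_max=5.0, M=5)
```
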
Note that there is an intrinsic tradeoff in the choice of . On the one hand, a large allows the pseudo-BDF and pseudo-MVF to more accurately reflect the true BDF and MVF, at the cost of increased complexity and signaling overhead. On the other hand, a smaller makes multibidding easier, but the discrepancy between the pseudo functions and the true functions introduces a larger performance loss.
VI Simulations
In this section, we conduct simulations to evaluate the performance of the proposed methods.
VI-A Simulation Setup
The simulated wireless network adopts an OFDMA system with a total bandwidth of MHz. The period length is set as . The number of clients of an FL service is drawn from a Normal distribution with mean 25. In every period, a new FL task may start following a scheduled plan, which is defined by a Poisson process with mean interval . By tuning , we adjust the FL service demand: a smaller will more likely lead to more concurrent FL services in a period, as an FL service often lasts multiple periods. Each FL service has a predetermined target training accuracy; when the accuracy reaches the target, the FL service terminates and exits the wireless network. The clients' wireless channel gain is modeled as independent free-space fading, where the average path loss is drawn from a Normal distribution with different means and variances in different circumstances. The variance of the complex white Gaussian channel noise is set as . For each client, the local training time is uniformly randomly drawn from s. We fix the global aggregation time to be . We consider typical neural network sizes in the range of Mbits. The upload transmission power is uniformly random between 0.05 and 0.15 W, and the download transmission power is uniformly random between 0.1 and 0.3 W.
VI-B Convergence of DISBA in the Cooperative Case
We first illustrate the convergence behavior of DISBA in the cooperative FL service provider case in a representative period with 5 concurrent FL services. These services have 10, 12, 14, 16, and 18 clients, respectively. In Figure 5, we show the computed FL frequency for each service provider before convergence. As Figure 5 shows, the bandwidth allocation quickly converges to the optimal allocation for a convergence tolerance gap . The resulting FL frequencies of these FL services in this period are reported in Table I. We further show in Table II the computation time of DISBA for different values of the tolerance gap and step size. The time values are measured on a desktop computer with an Intel Core i5-9400 2.9 GHz CPU and 16 GB of memory.
Service Index  Number of Clients  Bandwidth Ratio  Frequency 
1  10  0.182  113 
2  12  0.196  107 
3  14  0.209  102.6 
4  16  0.205  90.4 
5  18  0.205  81.2 
Tolerated Gap  Step Size  # of Iterations  Time (s) 

1e-3  0.1  131  0.332 
1e-3  0.5  37  0.094 
5e-3  0.1  72  0.169 
5e-3  0.5  26  0.069 
VI-C Fairness-Adjusted Multibid Auction in the Selfish Case
We perform the fairness-adjusted multibid auction in the same representative period as in the last subsection, with and . The pseudo-mBDFs of the FL service providers and the aggregated pseudo-mBDF are illustrated in Figures 7 and 7, respectively. The pseudo-mMCP is also shown in Figure 7. Table III reports the resulting bandwidth allocation and the achieved FL frequency.
Service Index  Number of Clients  Bandwidth Ratio  Frequency 

1  10  0.164  105.82 
2  12  0.177  99.52 
3  14  0.217  105.46 
4  16  0.218  94.4 
5  18  0.223  86.56 
As we briefly mentioned in Section V, there is a tradeoff when selecting the number of bids . On the one hand, a larger increases the computational complexity of searching for the pseudo-mMCP and determining the eventual bandwidth allocation. On the other hand, a larger improves the precision of the pseudo-mMCP, thereby improving the allocation performance. In Figure 8, we demonstrate the overall performance by varying . As can be seen, as increases, the overall performance improves, while each FL service provider needs to submit more bids to the server, which causes transmission delays and data backlogs.
The parameter plays an important role in the selfish owner case, as it strikes a tradeoff between efficiency and fairness. With a larger , the system treats fairness as more important; conversely, with a smaller , the system is more concerned with overall efficiency. The market clearing price is shown in Figure 10, and the overall utility is shown in Figure 10. As increases, the market clearing price and the total utility decrease, which can be viewed as a concession made to achieve fairness between different FL services.
VI-D Performance Comparison
In the following experiments, we compare our proposed algorithms with three benchmark algorithms.

EqualClient (EC): Bandwidth is equally allocated among all clients; therefore, each client gets a bandwidth of .

EqualService (ES): Bandwidth is equally allocated among the FL services; that is, each FL service gets a bandwidth of . However, each FL service provider still performs the optimal intra-service bandwidth allocation among its clients.

Proportional (PP): Each FL service obtains bandwidth proportional to the number of its clients; that is, FL service obtains a bandwidth of . This bandwidth is further allocated among its clients following the optimal intra-service bandwidth allocation.
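The service-level shares of the three benchmarks can be summarized as follows (a sketch; in ES and PP each service's share is subsequently redistributed among its clients by the optimal intra-service allocation, which is omitted here):

```python
def equal_client(B, clients):
    """EC: every client in the network gets the same fixed share B / N;
    the per-service total follows from the client counts, with no
    intra-service optimization."""
    per_client = B / sum(clients)
    return [per_client * c for c in clients]

def equal_service(B, clients):
    """ES: every FL service gets the same share, optimized internally."""
    return [B / len(clients)] * len(clients)

def proportional(B, clients):
    """PP: each service's share is proportional to its client count,
    then optimized internally among its clients."""
    n = sum(clients)
    return [B * c / n for c in clients]
```

Note that EC and PP coincide at the service level; they differ only in whether the per-service bandwidth is further optimized among the clients.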
We start by comparing the proposed algorithms with the benchmarks in the per-period setting. The overall performance is shown in Figure 11. In this setting, there are five FL services with random numbers of clients drawn from a Normal distribution with mean 20 and variance 10 and random channel conditions drawn from a Normal distribution with mean 85 and variance 15; the results are averaged over 20 runs. As can be seen, our DISBA algorithm for the cooperative case (labeled Coop) has the best performance, and the auction mechanism for the selfish case (labeled Self) also outperforms the other benchmarks. Although ES and PP also perform intra-service bandwidth allocation, the heterogeneity of client numbers and channel conditions renders them suboptimal.
Because FL is a long-term process, we further investigate the long-term performance of the proposed algorithms. In the long-term setting, 10 FL services join the wireless network at different times, controlled by the parameterized Poisson process, and an FL service is removed from the wireless network when its test accuracy has converged. Although the convergence of FL is affected in a complex way by many factors, including the adopted FL algorithm, the dataset, and the selected clients, we assume that each of these 10 FL services requires 2000 FL rounds to reach convergence, a typical value observed in the literature [11], in order to provide a meaningful comparison of the algorithms in a controlled environment. Whenever an FL service has run for 2000 rounds, it exits the system.
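The long-term setting can be mimicked by a toy discrete-time simulation (all numbers are hypothetical except the 2000-round convergence target; the per-period round budget `100 / len(active)` is a crude stand-in for the bandwidth-dependent FL frequency under an equal inter-service split):

```python
import random

def simulate(num_services=10, rounds_to_converge=2000,
             mean_interval=5.0, seed=0):
    """Toy long-term simulation: FL services arrive via a Poisson process
    (exponential inter-arrival times, measured in periods) and exit after
    a fixed number of FL rounds. More concurrent services means less
    bandwidth each, hence fewer rounds completed per period."""
    rng = random.Random(seed)
    arrivals, t = [], 0.0
    for _ in range(num_services):
        t += rng.expovariate(1.0 / mean_interval)   # mean = mean_interval
        arrivals.append(t)
    remaining = {i: float(rounds_to_converge) for i in range(num_services)}
    start, finish, period = {}, {}, 0
    while remaining:
        active = [i for i in remaining if arrivals[i] <= period]
        rounds = 100.0 / len(active) if active else 0.0  # hypothetical budget
        for i in active:
            start.setdefault(i, period)
            remaining[i] -= rounds
            if remaining[i] <= 0:
                finish[i] = period
                del remaining[i]
        period += 1
    durations = [finish[i] - start[i] + 1 for i in finish]
    return sum(durations) / len(durations)   # average duration in periods
```

Consistent with the observations below, a larger mean arrival interval leaves fewer services competing for the band at any time and hence shortens the average duration.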
Figure 12 illustrates the average duration (in terms of the number of periods) of all FL services under the different algorithms for , where the client number of an FL service is drawn from a Normal distribution with mean 25 and variance 15 and the channel condition of an FL service is drawn from a Normal distribution with mean 85 and variance 15. The results are averaged over 20 runs. We can see that the proposed algorithms achieve the smallest average duration compared to the benchmarks, confirming their fast FL convergence even in the long run.
Next, we study the impact of the client number heterogeneity (which reflects the FL service size heterogeneity) on the performance of the different algorithms. To this end, the client number of an FL service is drawn from a Normal distribution with mean 25, and we vary the variance between 0 and 15 to adjust the degree of heterogeneity. The result is shown in Figure 13: as the variance increases (i.e., with a higher degree of heterogeneity), the mean of the average duration increases, and so does its standard deviation. This is understandable because a higher degree of heterogeneity causes wireless bandwidth to be more unevenly distributed among the FL services, thereby degrading the overall FL performance. Notably, the performance gain of our proposed algorithms increases with the variance, which demonstrates the superior ability of our algorithms to handle the heterogeneous case.
Furthermore, we investigate the impact of the channel condition heterogeneity on the FL performance. In these simulations, the average channel condition of an FL service is drawn from a Normal distribution with mean 85, and we vary the variance between 0 and 15 to adjust the degree of heterogeneity. The channel conditions of the clients of this FL service are further drawn from a Normal distribution whose mean is the instantiated average channel condition. In Figure 14, we observe a similar phenomenon as in Figure 13, which further confirms the advantage of adopting our proposed algorithms.
Finally, we study the influence of the mean arrival interval parameter on the resulting average FL duration. As shown in Figure 15, as increases, the average duration of the FL services decreases. This is because when is small, many FL services pile up and coexist in the wireless network, reducing the wireless bandwidth an individual FL service can receive.
VII Conclusion
This paper studied a bandwidth allocation problem for multiple FL services in a wireless network, which has not been well studied in the literature. The considered problem consists of two interconnected subproblems: intra-service resource allocation and inter-service resource allocation. By solving these problems, we optimally allocate bandwidth to multiple FL services and their corresponding clients to speed up the training process while guaranteeing fairness, for both the cooperative and the selfish FL service provider cases. Our method has shown superior performance compared to the benchmarks. Several directions remain to extend the impact of this work. For example, this paper takes the FL frequency as the key metric to be optimized, but the true performance of FL is also affected by the dataset, the federated optimization algorithm, and many other factors. In addition, when a client can simultaneously participate in multiple FL services, resource allocation has to consider both the wireless bandwidth and the clients' computing resources.
References
 [1] (2017) Sparse communication for distributed gradient descent. arXiv preprint arXiv:1704.05021. Cited by: §II.
 [2] (2020) Convergence time optimization for federated learning over wireless networks. arXiv preprint arXiv:2001.07845. Cited by: §II.
 [3] (2020) A joint learning and communications framework for federated learning over wireless networks. IEEE Transactions on Wireless Communications. Cited by: §I.
 [4] (2018) LAG: lazily aggregated gradient for communication-efficient distributed learning. In Advances in Neural Information Processing Systems, pp. 5050–5060. Cited by: §II.
 [5] (2019) Joint service pricing and cooperative relay communication for federated learning. In 2019 International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pp. 815–820. Cited by: §II.
 [6] (2018) Mitigating sybils in federated learning poisoning. arXiv preprint arXiv:1808.04866. Cited by: §II.
 [7] (2019) On the convergence of local descent methods in federated learning. arXiv preprint arXiv:1910.14425. Cited by: §II.
 [8] (2019) Incentive design for efficient federated learning in mobile networks: a contract theory approach. In 2019 IEEE VTS Asia Pacific Wireless Communications Symposium (APWCS), pp. 1–5. Cited by: §II.
 [9] (2019) SCAFFOLD: stochastic controlled averaging for on-device federated learning. arXiv preprint arXiv:1910.06378. Cited by: §II.
 [10] (2019) Blockchained on-device federated learning. IEEE Communications Letters 24 (6), pp. 1279–1283. Cited by: §II.
 [11] (2016) Federated learning: strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492. Cited by: §I, §VID.
 [12] (2020) Auction based incentive design for efficient federated learning in cellular wireless networks. In 2020 IEEE Wireless Communications and Networking Conference (WCNC), pp. 1–6. Cited by: §II.
 [13] (2020) Federated learning: challenges, methods, and future directions. IEEE Signal Processing Magazine 37 (3), pp. 50–60. Cited by: §II.
 [14] (2018) Federated optimization in heterogeneous networks. arXiv preprint arXiv:1812.06127. Cited by: §II.
 [15] (2019) Fair resource allocation in federated learning. arXiv preprint arXiv:1905.10497. Cited by: §II.
 [16] (2019) On the convergence of FedAvg on non-IID data. arXiv preprint arXiv:1907.02189. Cited by: §II.
 [17] (2020) Federated learning in mobile edge networks: a comprehensive survey. IEEE Communications Surveys & Tutorials. Cited by: §II.
 [18] (2017) Deep gradient compression: reducing the communication bandwidth for distributed training. arXiv preprint arXiv:1712.01887. Cited by: §II.
 [19] (2020) Client-edge-cloud hierarchical federated learning. In ICC 2020 - 2020 IEEE International Conference on Communications (ICC), pp. 1–6. Cited by: §II.
 [20] (2004) Multibid auctions for bandwidth allocation in communication networks. In IEEE INFOCOM 2004, Vol. 1. Cited by: §II, §VD, §VD.
 [21] (1999) Bandwidth sharing: objectives and algorithms. In IEEE INFOCOM’99. Conference on Computer Communications. Proceedings. Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies. The Future is Now (Cat. No. 99CH36320), Vol. 3, pp. 1395–1403. Cited by: §II, §IV.
 [22] (2017) Communicationefficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pp. 1273–1282. Cited by: §I.
 [23] (2019) Agnostic federated learning. arXiv preprint arXiv:1902.00146. Cited by: §II.
 [24] (2020) Toward multiple federated learning services resource sharing in mobile edge networks. arXiv preprint arXiv:2011.12469. Cited by: §II.
 [25] (2019) Client selection for federated learning with heterogeneous resources in mobile edge. In ICC 2019 - 2019 IEEE International Conference on Communications (ICC), pp. 1–7. Cited by: §II.
 [26] (2006) A tutorial on decomposition methods for network utility maximization. IEEE Journal on Selected Areas in Communications 24 (8), pp. 1439–1451. Cited by: §II, §IVB.
 [27] (2019) Wireless network intelligence at the edge. Proceedings of the IEEE 107 (11), pp. 2204–2239. Cited by: §I.
 [28] (2019) Motivating workers in federated learning: a stackelberg game perspective. IEEE Networking Letters 2 (1), pp. 23–27. Cited by: §II.
 [29] (2019) Robust and communication-efficient federated learning from non-IID data. IEEE Transactions on Neural Networks and Learning Systems. Cited by: §II.
 [30] (2020) Device scheduling with fast convergence for wireless federated learning. In ICC 2020 - 2020 IEEE International Conference on Communications (ICC), pp. 1–6. Cited by: §II.
 [31] (2017) Federated multi-task learning. Advances in Neural Information Processing Systems 30, pp. 4424–4434. Cited by: §II.
 [32] (2019) Federated learning over wireless networks: optimization model design and analysis. In IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, pp. 1387–1395. Cited by: §I, §II.
 [33] (1961) Counterspeculation, auctions, and competitive sealed tenders. The Journal of Finance 16 (1), pp. 8–37. Cited by: §VC2.
 [34] (2019) Adaptive federated learning in resource constrained edge computing systems. IEEE Journal on Selected Areas in Communications 37 (6), pp. 1205–1221. Cited by: §II.
 [35] (2020) Client selection and bandwidth allocation in wireless federated learning networks: a longterm perspective. arXiv preprint arXiv:2004.04314. Cited by: §II.
 [36] (2020) Federated learning via overtheair computation. IEEE Transactions on Wireless Communications 19 (3), pp. 2022–2035. Cited by: §II.
 [37] (2019) Federated machine learning: concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST) 10 (2), pp. 1–19. Cited by: §II.
 [38] (2020) Energy-efficient radio resource allocation for federated edge learning. In 2020 IEEE International Conference on Communications Workshops (ICC Workshops), pp. 1–6. Cited by: §II.

[39] (2020) Experience-driven computational resource allocation of federated learning by deep reinforcement learning. In Proc. of IPDPS. Cited by: §II.
 [40] (2018) Federated learning with non-IID data. arXiv preprint arXiv:1806.00582. Cited by: §II.