Bandwidth Allocation for Multiple Federated Learning Services in Wireless Edge Networks

01/10/2021
by   Jie Xu, et al.

This paper studies a federated learning (FL) system, where multiple FL services co-exist in a wireless network and share common wireless resources. It fills the void of wireless resource allocation for multiple simultaneous FL services in the existing literature. We design a two-level resource allocation framework comprising intra-service resource allocation and inter-service resource allocation. The intra-service resource allocation problem aims to minimize the length of FL rounds by optimizing the bandwidth allocation among the clients of each FL service. Based on this, an inter-service resource allocation problem is further considered, which distributes bandwidth resources among multiple simultaneous FL services. We consider both cooperative and selfish providers of the FL services. For cooperative FL service providers, we design a distributed bandwidth allocation algorithm that optimizes the overall performance of multiple FL services while also catering to fairness among FL services and the privacy of clients. For selfish FL service providers, a new auction scheme is designed with the FL service owners as the bidders and the network provider as the auctioneer. The designed auction scheme strikes a balance between the overall FL performance and fairness. Our simulation results show that the proposed algorithms outperform other benchmarks under various network conditions.

I Introduction

Today’s mobile devices are generating an unprecedented amount of data every day. Leveraging the recent success of machine learning (ML) and artificial intelligence (AI), this rich data has the potential to power a wide range of new functionalities and services, such as learning the activities of smartphone users, predicting health events from wearable devices or adapting to pedestrian behavior in autonomous vehicles. With the help of multi-access edge computing (MEC) servers, ML models can be quickly trained/updated using this data to adapt to the changing environment without moving the data to the remote cloud data center, which is envisioned in intelligent next-generation communication systems [27]. Furthermore, due to the growing storage and computational power of mobile devices as well as privacy concerns associated with uploading personal data, it is increasingly attractive to store and process data directly on mobile devices. Federated learning (FL) [11] is thus proposed as a new distributed ML framework, where mobile devices collaboratively train a shared ML model with the coordination of an edge server while keeping all the training data on device, thereby decoupling the ability to do ML from the need to upload/store the data to/in a public entity.

A typical FL service involves a number of mobile devices (a.k.a. participating clients) and an edge server (a.k.a. a parameter server) to train a ML model, which lasts for a number of learning rounds. In each round, the clients download the current ML model from the server, improve it by learning from their local data, and then upload the individual model updates to the server; the server then aggregates the local updates to improve the shared model. For example, the seminal work [22] proposed the FedAvg algorithm in which the global model is obtained by averaging the parameters of local models. Although other FL algorithms differ in the specifics, the majority of them follow the same procedure. Because the clients work in the same wireless network to download and upload models, how to allocate the limited wireless bandwidth among the participating clients has a crucial impact on the resulting FL speed and efficiency. Therefore, resource allocation for wireless FL systems has recently attracted much attention in the wireless communications community [3, 32]. Compared to resource allocation in traditional throughput-maximizing wireless networks, the resource allocation objective and outcome become considerably different for wireless FL due to its unique requirements and characteristics.

Fig. 1: System Overview.

Although existing works have made meaningful progress towards efficient resource allocation for wireless FL, they share the common limitation that only a single FL service was considered. As ML-powered applications grow and become more diverse, it is anticipated that the wireless network will host multiple co-existing FL services, the set of which may also dynamically change over time. See Figure 1 for an illustration of the multi-FL service scenario. The presence of multiple FL services makes resource allocation for wireless FL much more challenging. First, the achievable FL performance depends on not only intra-service resource allocation among the participating clients within each FL service but also inter-service resource allocation among different FL services, and these two levels of allocation decisions are also strongly coupled. Second, the FL service providers may adopt different FL algorithms and choose different configurations (e.g., number of participating clients, number of epochs of local training, etc.), yet this information is not always available to the wireless network operator due to privacy concerns when making resource allocation decisions.

Third, because FL service providers have their individual goals, they may have incentives to untruthfully reveal their valuation of the wireless bandwidth if by doing so they gain advantages in the inter-service bandwidth allocation. Without the correct information, there is no guarantee on the overall system performance. Finally, as in any multi-user system, resource allocation should strike a good balance between efficiency and fairness – every FL service provider should obtain a reasonable share of the wireless resource to train their ML models using FL.

In this paper, we make an initial effort to study wireless FL with multiple co-existing FL services, which share the same bandwidth to train their respective ML models. Our focus is on the efficient bandwidth allocation among different FL services as well as among the participating clients within each FL service, thereby understanding the interplay between these two levels of allocation decisions. Our main contributions are summarized as follows:

  • We formalize a two-level bandwidth allocation problem for multiple FL services co-existing in the wireless network, which may start and complete at different times depending on their own demand and FL requirements. The model is general enough for any FL algorithm that involves downloading, local learning, uploading and global aggregation in each learning round, and hence has wide applicability in real-world systems. In addition, we explicitly take fairness into consideration when optimizing bandwidth allocation to ensure that no FL service is starved of bandwidth.

  • We consider two use cases depending on the nature/goals of the FL service providers. In the first case, FL service providers are fully cooperative to maximize the overall system performance. For this, we design a distributed optimization algorithm based on dual decomposition to solve the two-level bandwidth allocation problem. The algorithm keeps all FL-related information at the individual FL service provider side without sharing it with the network operator, thereby reducing the communication overhead and enhancing privacy protection.

  • We further consider a second case where FL service providers are selfishly maximizing their own performance. To address the selfishness issue, we design a multi-bid auction mechanism, which is able to elicit the FL service providers’ truthful valuation of bandwidth based on their submitted bids. With a fairness-adjusted ex post charge, the proposed auction mechanism is able to make a tunable trade-off between efficiency and fairness.

The rest of this paper is organized as follows. Section II discusses related works. Section III builds the system model. Section IV formulates the problem for the cooperative case and develops a distributed bandwidth allocation algorithm. Section V studies the selfish service providers case and develops a multi-bid auction mechanism. Section VI performs simulations. Concluding remarks are made in Section VII.

II Related Work

A lot of research has been devoted to tackling various challenges of FL, including but not limited to developing new optimization and model aggregation methods [9, 14, 7], handling non-i.i.d. and unbalanced datasets [16, 40, 29], dealing with the straggler problem [31], preserving model and data privacy [6, 10], and ensuring fairness [23, 15]. A comprehensive review of these challenges can be found in [13, 17, 37]. In particular, the communication aspect of FL has been recognized as a primary bottleneck due to the tension between uploading a large amount of model data for aggregation and the limited network resource to support this transmission, especially in a wireless environment. In this regard, early research on communication-efficient FL largely focuses on reducing the amount of transmitted data while assuming that the underlying communication channel has been established, e.g., updating clients with significant training improvement [4], compressing the gradient vectors via quantization [18], or accelerating training using sparse or structured updates [1]. More recent research starts to address this problem from a more communication system perspective, e.g., using a hierarchical FL network architecture [19] that allows partial model aggregation, and leveraging the wireless transmission property to perform analog model aggregation over the air [36].

As wireless networks are envisioned as a main deployment scenario of FL, wireless resource allocation for FL is another active research topic. Many existing works [34, 39, 32] study the trade-off between local model update and global model aggregation. Client selection is essential to enable FL at scale and address the straggler problem. Different types of joint bandwidth allocation and client scheduling policies [35, 38, 30, 25, 2] have been proposed to either minimize the training loss or the training time. In all these works, resource allocation is carried out among clients of a single FL service, while assuming that the FL service itself has already received dedicated resource. In stark contrast, our paper studies a network consisting of multiple co-existing FL services and performs resource allocation at both the FL service level and the client level. We notice that a related problem where multiple FL services are being trained at the same time is also considered in a recent work [24]. In that paper, different FL services run on the same set of clients and a joint computation and communication resource scheduling problem is studied. In our paper, different FL services have their separate client sets which may experience very different channel qualities and hence we focus only on the bandwidth allocation problem. Moreover, while [24] assumed that all clients are obedient, we study the possible selfish nature of FL service providers and highlight bandwidth allocation fairness.

Considering each FL service as a “user”, our problem is a special type of resource allocation problem for multi-user wireless networks. While many concepts and techniques adopted in this paper, including proportional fairness [21], dual decomposition [26] and multi-bid auction [20], have seen applications in other multi-user wireless resource allocation domains, applying them in multi-service FL requires special treatment as two levels of resource allocation are involved in our problem. In particular, there is no closed-form expression of how the performance (i.e., learning speed) of a FL service depends on the resource allocation among its clients. Therefore, understanding the inter-dependency of intra-service and inter-service bandwidth allocation is essential. Furthermore, we put an emphasis on the resource fairness among different FL services by designing a new fairness-adjusted multi-bid auction mechanism in the selfish FL service provider case, thereby achieving a tunable tradeoff between efficiency and fairness. We point out that there are some existing works [5, 28, 8, 12] on designing incentive mechanisms for client participation in a single FL service. These works are very different from our paper in terms of both the problem and the approaches, and do not consider fairness when designing the mechanism.

III System Model

We consider a wireless network where machine learning models are trained using Federated Learning (FL). The wireless network has a total bandwidth $B$, and the network operator has to allocate this bandwidth among concurrent FL services when needed to enable their individual training. Because new FL services may start and old FL services may finish over time, bandwidth allocation has to be periodically performed to adapt to the current active FL services. Therefore, we divide time into periods and let the length of a period be $T$. At the beginning of each period $t$, a set $\mathcal{N}^t$ of FL services are active and require wireless bandwidth to carry out their training. These services are either newly initiated services in period $t$ or continuing services from the previous period. A FL service finishes and hence exits the wireless network when a certain termination criterion is satisfied (e.g., the training loss is below a threshold, the testing accuracy is above a threshold, or another convergence criterion is met); this criterion usually varies across FL services and is pre-specified by the corresponding service provider. Therefore, a FL service may span multiple periods. The wall clock time (i.e., the number of periods) that a FL service takes to finish depends on the difficulty and other inherent characteristics of the service itself, on how much wireless bandwidth is allocated to this service in each period for which it stays, and on how this bandwidth is further allocated among its participating clients. In what follows, we first formulate the client-level (i.e., intra-service) bandwidth allocation problem and then describe the service-level (i.e., inter-service) bandwidth allocation problem.

III-A Intra-Service Bandwidth Allocation

To understand how bandwidth allocation affects FL performance, let us consider a single representative FL service $n$ in one period (period index is dropped for conciseness). Suppose that this service is allocated with a bandwidth $b_n$ in this period, which is further allocated among its participating clients, the set of which is denoted by $\mathcal{K}_n$. For each client $k \in \mathcal{K}_n$, let $c_{n,k}$ be its computing speed, and $h^u_{n,k}$ and $h^d_{n,k}$ be the uplink and downlink wireless channel gains to the parameter server of service $n$, respectively, which are assumed to be invariant within a period. We consider a synchronized FL model for each FL service, where a number of FL rounds take place in a period. Nonetheless, different FL services do not have to be synchronized: they learn at their own pace. See Figure 2 for an illustration.

Fig. 2: Bandwidth Allocation among Multiple FL Services.

A FL round consists of four stages: download transmission, local computation, upload transmission and global computation:

  • Download Transmission (DT). Each FL round starts with a DT stage in which each client downloads the current global model from its parameter server residing on the base station. Suppose client $k$ is allocated with bandwidth $b_{n,k}$, then its DT rate is $b_{n,k} \log_2\left(1 + P_n h^d_{n,k} / N_0\right)$ following Shannon’s equation, where $P_n$ is the transmission power of parameter server $n$ and $N_0$ is the noise power. For notational convenience, we denote $r^d_{n,k} = \log_2\left(1 + P_n h^d_{n,k} / N_0\right)$ as the DT base rate of client $k$. Let $m^d_n$ be the download data size (e.g., the size of the global model), then the DT latency is $t^{\mathrm{DT}}_{n,k} = m^d_n / (b_{n,k} r^d_{n,k})$.

  • Local Computation (LC). With the current global model, each client then updates its local model using its local dataset. Depending on the ML model complexity, the local dataset size and the number of epochs in local training, the per-round local computation workload is denoted by $w_n$. Therefore, the LC latency of client $k$ is $t^{\mathrm{LC}}_{n,k} = w_n / c_{n,k}$, where $c_{n,k}$ is the computing speed of client $k$.

  • Upload Transmission (UT). Once the local update is finished, client $k$ transmits the result to the parameter server $n$. Given the bandwidth $b_{n,k}$, its UT rate is $b_{n,k} \log_2\left(1 + P_{n,k} h^u_{n,k} / N_0\right)$, where $P_{n,k}$ is the transmission power of client $k$ and $N_0$ is the noise power. Again, for notational convenience, we denote $r^u_{n,k} = \log_2\left(1 + P_{n,k} h^u_{n,k} / N_0\right)$ as the UT base rate of client $k$. Let $m^u_n$ be the data size that has to be transmitted to the parameter server, then the UT latency of client $k$ is $t^{\mathrm{UT}}_{n,k} = m^u_n / (b_{n,k} r^u_{n,k})$.

  • Global Computation (GC). Finally, once the local updates of all clients are received by parameter server $n$, the global model is updated. Let $v_n$ be the global model update workload and $c_n$ be the computing speed of parameter server $n$, then the GC latency is $t^{\mathrm{GC}}_n = v_n / c_n$.

Note that our framework is applicable to a vast set of FL algorithms (e.g., FedAvg, FedSGD) that can be chosen for service $n$. For instance, the downloaded/uploaded data may be the model itself, the compressed version of the model, or the model gradient information. For the purpose of bandwidth allocation, it is sufficient to describe the FL service as a tuple $(m^d_n, w_n, m^u_n, v_n)$ of its download data size, local computation workload, upload data size and global computation workload.
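The four-stage round described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary keys, function names and numeric values are hypothetical, and the rate model assumes that a client's rate is its allocated bandwidth times a base rate of the form log2(1 + SNR), as suggested by the "base rate" terminology.

```python
import math

def base_rate(tx_power, channel_gain, noise_power):
    """Base rate log2(1 + SNR): achievable bit/s per Hz of bandwidth."""
    return math.log2(1.0 + tx_power * channel_gain / noise_power)

def client_latency(bw, client, service):
    """Total per-round latency of one client that is given bandwidth `bw`."""
    dt = service["down_bits"] / (bw * client["rate_down"])  # download transmission
    lc = service["local_work"] / client["speed"]            # local computation
    ut = service["up_bits"] / (bw * client["rate_up"])      # upload transmission
    gc = service["global_work"] / service["server_speed"]   # global computation
    return dt + lc + ut + gc

def round_length(bandwidths, clients, service):
    """Synchronized FL: a round lasts as long as its slowest client."""
    return max(client_latency(b, c, service)
               for b, c in zip(bandwidths, clients))

# Hypothetical service tuple and two clients (all numbers are made up).
service = {"down_bits": 4.0, "up_bits": 3.0, "local_work": 2.0,
           "global_work": 1.0, "server_speed": 2.0}
clients = [
    {"rate_down": base_rate(1.0, 3.0, 1.0), "rate_up": 1.0, "speed": 1.0},
    {"rate_down": base_rate(1.0, 3.0, 1.0), "rate_up": 1.0, "speed": 2.0},
]
print(round_length([1.0, 1.0], clients, service))
```

With an equal split of bandwidth, the slower client (lower computing speed) dictates the round length, which is why an unequal allocation can shorten the round.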

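In synchronized FL, the parameter server updates the global model only after it has received the local updates from all participating clients. Hence, the length of a FL round of service $n$ is determined by the total latency of the slowest client, i.e., $t_n = \max_{k \in \mathcal{K}_n} \left( t^{\mathrm{DT}}_{n,k} + t^{\mathrm{LC}}_{n,k} + t^{\mathrm{UT}}_{n,k} + t^{\mathrm{GC}}_n \right)$, where the four terms are the DT, LC, UT and GC latencies of client $k$. To minimize the FL round length of service $n$ so that more FL rounds can be executed in a period, one has to optimally allocate bandwidth among the clients of service $n$. Given the service bandwidth $b_n$ and the per-client allocations $b_{n,k}$, the intra-service bandwidth allocation problem can be formulated as

$$\min_{\{b_{n,k}\}_{k \in \mathcal{K}_n}} \; \max_{k \in \mathcal{K}_n} \left( t^{\mathrm{DT}}_{n,k} + t^{\mathrm{LC}}_{n,k} + t^{\mathrm{UT}}_{n,k} + t^{\mathrm{GC}}_n \right) \quad \text{subject to} \quad \sum_{k \in \mathcal{K}_n} b_{n,k} = b_n. \tag{1}$$

Let $t_n^*(b_n)$ denote the optimal round length obtained from Eqn. (1). Then the optimal FL frequency of service $n$ is $f_n(b_n) = 1 / t_n^*(b_n)$, which is used to represent the FL speed of service $n$. Note that this means $T f_n(b_n)$ FL rounds can be performed in one period of length $T$.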

III-B Inter-Service Bandwidth Allocation

In a period, multiple FL services may be active and require wireless bandwidth to carry out learning. Since they share a total bandwidth $B$, how this bandwidth is allocated among different services will determine their achievable learning frequencies $f_n(b_n)$, and thus the convergence speed in terms of the wall clock time. In this paper, we consider two scenarios depending on the goals of the FL service providers and how inter-service bandwidth allocation is implemented. In the first scenario, all FL service providers are cooperative, and their goal is to maximize the FL performance of the overall system. Therefore, it is equivalent to the network operator solving a system-wide optimization problem. In the second scenario, the FL service providers are selfish and care only about their own FL performance. As these service providers are competing for the limited bandwidth resource, addressing their incentive issues is crucial. In this paper, we design a fairness-adjusted multi-bid auction mechanism for the inter-service bandwidth allocation in this case. In the following two sections, we discuss these two scenarios separately.

IV Cooperative service providers

In the cooperative service providers scenario, the network operator directly decides the bandwidth allocation to maximize the overall system performance. As in any multi-user network, bandwidth allocation for multi-service FL has to address both efficiency and fairness: every active FL service should get a reasonable share of the bandwidth. Thus, we adopt the notion of proportional fairness [21], a metric widely used in multi-user resource allocation, and aim to solve the following optimization problem:

$$\max_{b_1, \ldots, b_N} \; \sum_{n=1}^{N} \log\big(1 + f_n(b_n)\big) \quad \text{subject to} \quad \sum_{n=1}^{N} b_n = B, \tag{2}$$

where we drop the period index and let $N$ be the number of active FL services in the period for conciseness, $b_n$ is the bandwidth allocated to service $n$, $f_n(b_n)$ is its optimal FL frequency and $B$ is the total bandwidth. The objective function adds a “1” inside the logarithm to ensure that the function value is always non-negative. This change has very little impact on the final allocation since the frequency is often much larger than 1 in a period. Note also that the above inter-service bandwidth allocation problem Eqn. (2) implicitly incorporates the intra-service problem, as $f_n(b_n)$ is determined by the solution to Eqn. (1).

IV-A Optimal Solution to the Intra-Service Problem

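We first investigate the optimal solution to the intra-service bandwidth allocation problem and see how it can be used to solve the inter-service problem. According to our system model and Eqn. (1), the intra-service bandwidth allocation is equivalent to

$$\min_{t_n, \{b_{n,k}\}} \; t_n \tag{3}$$
$$\text{subject to} \quad \frac{a_{n,k}}{b_{n,k}} + \tau_{n,k} \le t_n, \quad \forall k \in \mathcal{K}_n, \tag{4}$$
$$\sum_{k \in \mathcal{K}_n} b_{n,k} = b_n, \tag{5}$$

where, for notational convenience, we let $a_{n,k}$ be the sum of the download data size divided by the DT base rate and the upload data size divided by the UT base rate of client $k$, and let $\tau_{n,k}$ be the sum of the LC latency of client $k$ and the GC latency. Clearly, the optimal solution must satisfy

$$\frac{a_{n,k}}{b^*_{n,k}} + \tau_{n,k} = t_n^*, \quad \forall k \in \mathcal{K}_n, \tag{6}$$

since otherwise bandwidth could be shifted from a client that finishes early to the slowest client, which would shorten the round. Therefore, the optimal $t_n^*$ solves the following equality,

$$\sum_{k \in \mathcal{K}_n} \frac{a_{n,k}}{t_n^* - \tau_{n,k}} = b_n. \tag{7}$$

Although we do not have a closed-form solution of $t_n^*$, a bi-section algorithm can be constructed to easily solve the above problem to obtain the optimal $t_n^*$ and consequently the optimal frequency $f_n = 1/t_n^*$ as a function of $b_n$. Furthermore, the property of $f_n(b_n)$ can be characterized in the following lemma.

Lemma 1.

$f_n(b_n)$ is a differentiable, increasing and concave function for $b_n \in [0, \infty)$.

Proof.

Let us consider the inverse function $b_n(f_n) = \sum_{k \in \mathcal{K}_n} \frac{a_{n,k} f_n}{1 - \tau_{n,k} f_n}$ defined by Eqn. (7) with $t_n^* = 1/f_n$. It is easy to see that for $f_n \in [0, 1/\max_k \tau_{n,k})$, $b_n(f_n)$ is a monotonically increasing function in $f_n$ with $b_n(0) = 0$ and $b_n(f_n) \to \infty$ as $f_n \to 1/\max_k \tau_{n,k}$. Therefore, for $b_n \in [0, \infty)$, $f_n(b_n)$ is also monotonically increasing. The first-order derivative of $b_n(f_n)$ is

$$\frac{d b_n}{d f_n} = \sum_{k \in \mathcal{K}_n} \frac{a_{n,k}}{(1 - \tau_{n,k} f_n)^2} > 0. \tag{8}$$

Therefore, $f_n(b_n)$ is differentiable for $b_n \in [0, \infty)$ and

$$\frac{d f_n}{d b_n} = \left( \sum_{k \in \mathcal{K}_n} \frac{a_{n,k}}{(1 - \tau_{n,k} f_n)^2} \right)^{-1} > 0. \tag{9}$$

The second-order derivative can also be computed as follows:

$$\frac{d^2 f_n}{d b_n^2} = - \frac{d^2 b_n / d f_n^2}{\left( d b_n / d f_n \right)^3} = - \frac{\sum_{k \in \mathcal{K}_n} \frac{2 a_{n,k} \tau_{n,k}}{(1 - \tau_{n,k} f_n)^3}}{\left( \sum_{k \in \mathcal{K}_n} \frac{a_{n,k}}{(1 - \tau_{n,k} f_n)^2} \right)^3} < 0. \tag{10}$$

This proves that $f_n(b_n)$ is a concave function for $b_n \in [0, \infty)$. ∎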
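The bi-section step mentioned above can be sketched as follows. This is a minimal illustration under assumptions: each client is summarized by a transmission load `a[k]` (data size divided by base rate, summed over download and upload) and a fixed computation latency `tau[k]`, so that its round latency is `a[k] / bandwidth + tau[k]`. The function names and numbers are hypothetical.

```python
def optimal_round_length(a, tau, bw):
    """Solve sum_k a[k] / (t - tau[k]) = bw for t > max(tau) by bisection."""
    lo = max(tau)                # required bandwidth blows up as t -> max(tau)
    hi = lo + sum(a) / bw        # at this t the required bandwidth is <= bw
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:   # interval exhausted at float precision
            break
        needed = sum(ak / (mid - tk) for ak, tk in zip(a, tau))
        if needed > bw:
            lo = mid             # round too short for the available bandwidth
        else:
            hi = mid
    return hi

def optimal_allocation(a, tau, bw):
    """Per-client bandwidths that make every client finish at the same time."""
    t = optimal_round_length(a, tau, bw)
    return [ak / (t - tk) for ak, tk in zip(a, tau)]

def frequency(a, tau, bw):
    """FL frequency, taken here as the reciprocal of the optimal round length."""
    return 1.0 / optimal_round_length(a, tau, bw)

a, tau = [2.0, 1.0], [1.0, 3.0]          # made-up client parameters
print(optimal_round_length(a, tau, 1.0))  # close to 5.0 for these numbers
print(optimal_allocation(a, tau, 1.0))    # close to [0.5, 0.5]
print([round(frequency(a, tau, b), 4) for b in (1.0, 2.0, 3.0)])
```

For these numbers the equation reduces to a quadratic with root t = 5, which gives a quick sanity check. Evaluating the frequency at increasing bandwidths shows diminishing increments, in line with the concavity stated in Lemma 1.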

With Lemma 1, it is straightforward to see that the inter-service bandwidth allocation problem (2) is a convex optimization problem.

Proposition 1.

The inter-service bandwidth allocation problem (2) is an equality-constrained convex optimization problem.

Proof.

Because $f_n(b_n)$ is concave, and $\log(1 + x)$ is concave and increasing in $x$, the composition $\log(1 + f_n(b_n))$ is also a concave function. Then it is straightforward to see that the problem is a concave maximization problem with an equality constraint. ∎

IV-B Distributed Algorithm for Inter-Service Bandwidth Allocation

We now proceed with solving the inter-service bandwidth allocation problem. While various centralized algorithms, such as Newton’s method, can efficiently solve the inter-service problem Eqn. (2) given the fact that it is a convex optimization problem, we prefer a distributed algorithm where individual FL service providers do not share their FL algorithm details and client-level information with each other or the network operator. This reduces the communication overhead and preserves the privacy of the client devices of individual FL service providers. Our algorithm is developed based on dual decomposition [26] as follows.

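We first relax the total bandwidth constraint to be $\sum_{n} b_n \le B$, and then form the Lagrangian by relaxing the coupling constraint:

$$L(b_1, \ldots, b_N, \lambda) = \sum_{n} \log\big(1 + f_n(b_n)\big) - \lambda \Big( \sum_{n} b_n - B \Big) = \sum_{n} L_n(b_n, \lambda) + \lambda B, \tag{11}$$

where $\lambda \ge 0$ is the Lagrange multiplier associated with the total bandwidth constraint, and $L_n(b_n, \lambda) = \log(1 + f_n(b_n)) - \lambda b_n$ is the Lagrangian to be maximized by service provider $n$. Such dual decomposition results in each service provider $n$ solving, for a given $\lambda$, the following problem

$$b_n^*(\lambda) = \arg\max_{b_n \ge 0} \; \log\big(1 + f_n(b_n)\big) - \lambda b_n, \tag{12}$$

where the solution is unique due to the strict concavity of $f_n(b_n)$ according to Lemma 1. Specifically, to solve this maximization problem, we only need to solve its first-order condition,

$$\frac{f_n'(b_n)}{1 + f_n(b_n)} = \lambda, \tag{13}$$

which, since $f_n'(b_n)$ is the reciprocal of the derivative of the inverse function $b_n(f_n)$ defined by Eqn. (7), can be converted to solve for $f_n$ using

$$(1 + f_n) \sum_{k \in \mathcal{K}_n} \frac{a_{n,k}}{(1 - \tau_{n,k} f_n)^2} = \frac{1}{\lambda}, \tag{14}$$

where $a_{n,k}$ and $\tau_{n,k}$ are the per-client constants appearing in Eqn. (7). Clearly, the left-hand side is an increasing function of $f_n$ for $f_n \in [0, 1/\max_k \tau_{n,k})$ and thus, a simple bi-section algorithm can be devised to solve Eqn. (14) to obtain $f_n^*(\lambda)$. Then plugging $f_n^*(\lambda)$ (hence $t_n^* = 1/f_n^*(\lambda)$) into Eqn. (7) yields the optimal $b_n^*(\lambda)$.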

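Let $g_n(\lambda) = \max_{b_n \ge 0} L_n(b_n, \lambda)$ be the local dual function for service provider $n$, where $L_n(b_n, \lambda) = \log(1 + f_n(b_n)) - \lambda b_n$ and $\lambda$ is the Lagrange multiplier of the total bandwidth constraint. Then the master dual problem is

$$\min_{\lambda \ge 0} \; g(\lambda) = \sum_{n} g_n(\lambda) + \lambda B. \tag{15}$$

Since the maximizer $b_n^*(\lambda)$ is unique, it follows that the dual function $g(\lambda)$ is differentiable and the following gradient method can be used to iteratively update $\lambda$:

$$\lambda(i+1) = \Big[ \lambda(i) - \epsilon \Big( B - \sum_{n} b_n^*(\lambda(i)) \Big) \Big]^+, \tag{16}$$

where $i$ is the iteration index, $\epsilon$ is a sufficiently small positive step-size, and $[\cdot]^+$ denotes the projection onto the non-negative orthant. The dual variable $\lambda(i)$ will converge to the dual optimum $\lambda^*$ as $i \to \infty$. Since the duality gap for the inter-service problem Eqn. (2) is zero and the solution to Eqn. (12) is unique, the primal variable $b_n^*(\lambda(i))$ will also converge to the primal optimal variable $b_n^*$.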

Algorithm 1 summarizes the distributed inter-service bandwidth allocation (DISBA) algorithm. The algorithm works iteratively. In each iteration, the operator sends the current dual variable $\lambda$ to all service providers. Then, each service provider $n$ solves for its bandwidth $b_n^*(\lambda)$ using its local information and sends the result to the network operator. The network operator finally updates $\lambda$ for the next iteration’s computation. The algorithm terminates when $\lambda$ converges.

Algorithm 1 Distributed Inter-Service Bandwidth Allocation (DISBA)
Input to Network Operator: total bandwidth $B$, step size $\epsilon$, convergence gap $\delta$
Input to service provider $n$: FL service parameters of service $n$, channel gains and computing speeds of its clients $k \in \mathcal{K}_n$
Initialization: set $i = 0$ and $\lambda(0)$ equal to some non-negative value
while $i = 0$ or $|\lambda(i) - \lambda(i-1)| > \delta$ do
     Network Operator sends $\lambda(i)$ to all service providers
     Each service provider $n$ obtains $b_n^*(\lambda(i))$ by solving Eqn. (12) using bi-section
     Each service provider $n$ sends $b_n^*(\lambda(i))$ to Network Operator
     Network Operator updates $\lambda(i+1)$ according to Eqn. (16)
     $i \leftarrow i + 1$
end while
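The loop in Algorithm 1 can be sketched in Python as below. This is a hedged illustration, not the authors' implementation. It assumes each service is summarized by per-client pairs `(a, tau)` with round latency `a[k] / bandwidth + tau[k]`, that the FL frequency is the reciprocal of the optimal round length, and that each provider maximizes `log(1 + f(b)) - lam * b`. The small positive floor on the price is an added numerical guard, and all names and numbers are hypothetical.

```python
def bandwidth_response(a, tau, lam):
    """Service side: the b >= 0 that maximizes log(1 + f(b)) - lam * b."""
    target = 1.0 / lam

    def lhs(f):
        # (1 + f) / f'(b), written in terms of the frequency f; increasing in f.
        return (1.0 + f) * sum(ak / (1.0 - tk * f) ** 2
                               for ak, tk in zip(a, tau))

    if lhs(0.0) >= target:       # price too high: request no bandwidth
        return 0.0
    lo, hi = 0.0, 1.0 / max(tau)
    for _ in range(200):         # bisection on the first-order condition
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:
            break
        if lhs(mid) < target:
            lo = mid
        else:
            hi = mid
    # Bandwidth needed to sustain frequency lo (inverse of f(b)).
    return sum(ak * lo / (1.0 - tk * lo) for ak, tk in zip(a, tau))

def disba(services, total_bw, step=0.05, gap=1e-9, lam=1.0, max_iters=100000):
    """Operator side: projected gradient update of the dual price lam."""
    demands = []
    for _ in range(max_iters):
        demands = [bandwidth_response(a, tau, lam) for a, tau in services]
        # Raise the price when demand exceeds supply, lower it otherwise.
        new_lam = max(1e-12, lam - step * (total_bw - sum(demands)))
        if abs(new_lam - lam) < gap:
            return new_lam, demands
        lam = new_lam
    return lam, demands

# Two identical single-client services sharing 2 units of bandwidth.
services = [([1.0], [1.0]), ([1.0], [1.0])]
lam, alloc = disba(services, total_bw=2.0)
print(lam, alloc)
```

Only the scalar price and the scalar bandwidth requests cross the operator/provider boundary, which mirrors the privacy argument in the text. The step size matters: a value that is too large can make the price oscillate, so the default here is only suited to this toy instance.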

V Selfish service providers

In the previous section, the distributed inter-service bandwidth allocation works by letting each FL service provider $n$ compute the allocated bandwidth $b_n^*(\lambda)$ given the dual variable $\lambda$. This, however, creates an opportunity for a selfish service provider to mis-report its computation result in a way that favors itself but reduces the system performance as a whole. In fact, even if the inter-service bandwidth allocation problem (2) is solved in a centralized way, similar selfish behavior may still undermine the efficient system operation as a selfish service provider may mis-report its FL service and client parameters (e.g., FL workload, client computing power and channel gains etc.), which will alter the frequency function $f_n(b_n)$ used at the operator side. With a wrong frequency function $f_n(b_n)$, the operator will not be able to determine the true optimal bandwidth allocation.

In this section, we address the selfishness issue in inter-service bandwidth allocation by designing a multi-bid auction mechanism. This auction mechanism will ensure that the FL service providers are using their true FL frequency functions when making bandwidth bids.

V-A Multi-bid Auction

First, we describe the general rules of the multi-bid auction mechanism.

V-A1 Bidding

At the beginning of each bandwidth allocation period, each service provider $n$ submits a set of bids $s_n = \{s_n^1, \ldots, s_n^M\}$, where $M$ is the number of bids. For each $m$, $s_n^m = (b_n^m, p_n^m)$ is a two-dimensional bid, where $b_n^m$ is the requested bandwidth and $p_n^m$ is the unit price that service provider $n$ is willing to pay to get the requested bandwidth $b_n^m$. Without loss of generality, we assume that bids are sorted according to the price such that $p_n^1 \le p_n^2 \le \cdots \le p_n^M$. Let $\mathcal{S}$ denote the set of multi-bids that a service provider can submit.

V-A2 Bandwidth Allocation and Charges

Once the network operator collects all multi-bids from all service providers, denoted by $s = (s_1, \ldots, s_N)$, it computes and implements the inter-service bandwidth allocation $(b_1, \ldots, b_N)$. Each service provider $n$ then further allocates $b_n$ to its clients to perform FL. At the end of the period, the network operator determines the charges for all service providers depending on the allocated bandwidth and the realized FL performance.

Now, a couple of issues remain to be addressed. First, how to compute the bandwidth allocation and determine the charges given the service provider-submitted multi-bids? Second, do the service providers have incentives to truthfully report their valuations of the bandwidth? These are the questions to be addressed in the next subsections.

V-B Market Clearing Prices with Full Information

We first consider a simpler case where the service providers truthfully report the complete FL frequency function $f_n(b_n)$ to the network operator. This analysis will provide us with insights on how to design bandwidth allocation and charging rules in the more difficult multi-bid auction case.

Recall that $f_n(b_n)$ is the optimal FL frequency of service $n$ if it has bandwidth $b_n$. Taking into account the price paid to obtain this bandwidth, the (net) utility of service provider $n$ is

$$u_n(b_n) = f_n(b_n) - p \, b_n, \tag{17}$$

where $p$ is the unit price of bandwidth. Now, if the bandwidth were sold at the unit price $p$, then service provider $n$ would buy bandwidth $d_n(p) = \arg\max_{b_n \ge 0} \left[ f_n(b_n) - p \, b_n \right]$ in order to maximize its utility. We call $d_n(p)$ the bandwidth demand function (BDF), and it is easy to show that $d_n(p) = (f_n')^{-1}(p)$ by checking the first-order condition of Eqn. (17). On the other hand, if service provider $n$ requires a bandwidth $b_n$, then the service provider would pay a unit price no more than $\theta_n(b_n) = f_n'(b_n)$. We call $\theta_n(b_n)$ the marginal valuation function (MVF).

V-B1 Market clearing price

With the complete information of $f_n(b_n)$ and hence the BDF $d_n(p)$ for all service providers, the network operator can compute the market clearing price (MCP) $p^*$ so that $\sum_n d_n(p^*) = B$. One can prove that the MCP is unique and optimal in the sense that it maximizes the total (equivalently, average) FL frequency.

Proposition 2.

The market clearing price $p^*$ is unique and maximizes the total FL frequency $\sum_n f_n(b_n)$.

Proof.

According to Lemma 1, $f_n(b_n)$ is an increasing and concave function, so its derivative $f_n'(b_n)$ is positive and decreasing. Therefore, the BDF $d_n(p)$, which is the inverse function of $f_n'(b_n)$, is decreasing in $p$. As a result, $\sum_n d_n(p)$ is decreasing in $p$ and there exists a unique solution to $\sum_n d_n(p) = B$.

To show that $p^*$ maximizes $\sum_n f_n(b_n)$, consider the following maximization problem

$$\max_{b_1, \ldots, b_N} \; \sum_{n} f_n(b_n) \quad \text{subject to} \quad \sum_{n} b_n = B. \tag{18}$$

This is clearly a convex optimization problem. Consider its Karush-Kuhn-Tucker conditions. In particular, the stationarity condition is

$$f_n'(b_n) - \mu = 0, \quad \forall n, \tag{19}$$

where $\mu$ is the Lagrangian multiplier associated with the constraint. The solution requires

$$f_1'(b_1) = f_2'(b_2) = \cdots = f_N'(b_N) = \mu. \tag{20}$$

Together with the feasibility constraint, this is equivalent to imposing a homogeneous market clearing price. ∎

Because the total demand $\sum_n d_n(p)$ is a monotonically decreasing function in the unit price $p$, a bi-section algorithm can be easily designed to find the unique market clearing price $p^*$ so that $\sum_n d_n(p^*) = B$.
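A bisection search for the market clearing price might look like the sketch below. It rests on assumptions that are not part of the text: each service is summarized by per-client pairs `(a, tau)` with round latency `a[k] / bandwidth + tau[k]`, the FL frequency is the reciprocal of the optimal round length, and a provider facing unit price `p` demands the bandwidth at which its marginal frequency gain equals `p`. Function names and numbers are hypothetical.

```python
def demand(a, tau, price):
    """Bandwidth demand: the b >= 0 with f'(b) = price (0 if price is too high)."""
    target = 1.0 / price

    def inv_slope(f):
        # 1 / f'(b), expressed through the frequency f; increasing in f.
        return sum(ak / (1.0 - tk * f) ** 2 for ak, tk in zip(a, tau))

    if inv_slope(0.0) >= target:
        return 0.0
    lo, hi = 0.0, 1.0 / max(tau)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:
            break
        if inv_slope(mid) < target:
            lo = mid
        else:
            hi = mid
    return sum(ak * lo / (1.0 - tk * lo) for ak, tk in zip(a, tau))

def market_clearing_price(services, total_bw):
    """Bisection on the price; total demand is decreasing in the price."""
    lo = 0.0                                          # demand unbounded near 0
    hi = max(1.0 / sum(a) for a, _ in services)       # all demands are 0 here
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:
            break
        if sum(demand(a, tau, mid) for a, tau in services) > total_bw:
            lo = mid                                  # price too low
        else:
            hi = mid
    return hi

services = [([1.0], [1.0]), ([4.0], [1.0])]   # made-up single-client services
p_star = market_clearing_price(services, total_bw=4.0)
print(p_star, [demand(a, tau, p_star) for a, tau in services])
```

In this toy instance the demands have closed forms (`1/sqrt(p) - 1` and `2/sqrt(p) - 4`), so the clearing price can be checked by hand to be 1/9 with both services receiving 2 units.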

V-B2 Fairness-adjusted costs

One major issue with the above pricing scheme is that it ignores fairness among the service providers: although it maximizes efficiency in terms of the average FL frequency according to Proposition 2, it is possible that the average FL frequency is maximized at an operating point where a few service providers are allocated with most of the bandwidth while some service providers obtain very little bandwidth. In this paper, we design and incorporate a fairness-adjusted charging scheme into the above pricing scheme. The payment of service provider now consists of two parts as follows:

  • The first part of the payment depends on the amount of bandwidth allocated to the service provider , and the unit price set by the operator. Specifically, this payment is .

  • The second part of the payment depends on the realized FL frequency of service provider . Specifically, service provider will be charged a fairness-adjusted cost of at the end of the period once has been realized, where is a tunable parameter.

With these payments, service provider ’s utility becomes

(21)

where . Comparing this new utility function Eqn. (21) with Eqn. (17), we make the following remarks. First, the fairness-adjusted cost essentially replaces with . The decision problem remains largely the same except that now we have a different benefit function. Second, in the new utility function Eqn. (21), given any allocated bandwidth , it is still in the service provider’s interest to perform the optimal client-level bandwidth allocation to maximize . This is because is an increasing function in for . Therefore, we can directly write as a function of the optimal FL frequency . Third, to charge the fairness-adjusted cost, the network operator does not need to know the exact function . Rather, it only has to know the realized FL frequency at the end of the current period. This is key to achieving fairness in the multi-bid auction, where FL service providers do not report the complete FL frequency function .

We call the modified bandwidth demand function (mBDF). Likewise, we call the modified marginal valuation function (mMVF). The network operator can similarly compute the modified market clearing price (mMCP) so that . Using a similar argument that proves Proposition 2, one can prove Proposition 3 as follows.

Proposition 3.

The mMCP is unique and the resulting bandwidth allocation maximizes .

Proof.

Because is a concave increasing function, is also concave and increasing. This further shows that is concave and increasing. Following arguments similar to those in the proof of Proposition 2 shows that the bandwidth allocation resulting from the mMCP maximizes . ∎

The parameter makes a tradeoff between efficiency and fairness. On the one hand, setting reduces the problem to the total FL frequency maximization problem. On the other hand, setting achieves proportional fairness among the service providers.

V-C Bandwidth Allocation and Charging Rules

Now, we are ready to describe the bandwidth allocation and charging rules in fairness-adjusted multi-bid auction. In this subsection, each service provider submits only a multi-bid instead of the complete FL frequency function . However, we will assume that the service providers are truthfully submitting their bids, which will be proven indeed true in the next subsection. Specifically, we say that a bid is truthful if the bandwidth demand and the price that FL service provider is willing to pay satisfy the mBDF because it reveals FL service provider ’s true valuation of bandwidth after taking into consideration the fairness-adjusted costs. A multi-bid is truthful if all bids are truthful.

Definition 1.

(Truthful Multi-bid) A multi-bid is truthful if , is such that .

The network operator does not know the BDF (and hence the mBDF) of each FL service provider because it does not have access to the FL frequency function . Nonetheless, suppose service provider submitted a truthful multi-bid , then the operator can compute a pseudo-mBDF using these bids to have some idea of the actual mBDF. Specifically, given the submitted multi-bid , a left-continuous step function can be used to describe the pseudo-mBDF as follows,

(22)

Essentially, the pseudo-mBDF uses to approximate the bandwidth demand for prices in the range . Similarly, the operator can also construct a pseudo-mMVF (pseudo-MVF), an approximation of service provider ’s actual mMVF using the submitted multi-bid, as follows,

(23)

In other words, the pseudo-mMVF uses to approximate the marginal value for bandwidth allocation in the range . We illustrate the pseudo-mBDF and pseudo-mMVF in Figure 3.

Fig. 3: Pseudo-mBDF and Pseudo-mMVF.
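The two step functions described above can be sketched in a few lines. This is a hedged illustration that follows the verbal description (a left-continuous step function built from the submitted bids) and the construction commonly used in multi-bid auctions [20]; the names `Bid`, `pseudo_demand`, and `pseudo_marginal_value` and the example bids are assumptions, and the exact forms of Eqns. (22) and (23) are not reproduced here.

```python
from typing import List, Tuple

# A bid is a (bandwidth request, unit price) pair; names are illustrative.
Bid = Tuple[float, float]


def pseudo_demand(multi_bid: List[Bid], price: float) -> float:
    """Left-continuous step demand: the largest request among bids whose
    price is at least `price`, and zero if no bid is priced that high."""
    return max((b for b, p in multi_bid if p >= price), default=0.0)


def pseudo_marginal_value(multi_bid: List[Bid], bandwidth: float) -> float:
    """Step marginal valuation: the highest price among bids that request
    at least `bandwidth`, and zero if no bid requests that much."""
    return max((p for b, p in multi_bid if b >= bandwidth), default=0.0)


# Hypothetical truthful multi-bid: requests shrink as the price grows.
example_bid: List[Bid] = [(6.0, 1.0), (4.0, 2.0), (1.0, 3.0)]
demand_values = [pseudo_demand(example_bid, p) for p in (0.5, 1.0, 1.5, 3.0, 3.1)]
value_values = [pseudo_marginal_value(example_bid, q) for q in (0.5, 4.0, 5.0, 7.0)]
```

With these example bids, the pseudo-demand stays at 6 for any price up to and including 1, drops to 4 on (1, 2], to 1 on (2, 3], and to 0 above 3, which is the left-continuity described in the text.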

The aggregated pseudo-mBDF is the sum of pseudo-mBDFs of all FL service providers:

(24)

The pseudo-mMCP is the largest possible price so that the aggregated pseudo-mBDF exceeds the total available bandwidth, i.e.,

(25)

This implies that reducing the mMCP by just a little bit will result in the supply (i.e., the total available bandwidth ) being no greater than the demand. Because every individual pseudo-mBDF function is a step function with steps, the aggregated pseudo-mBDF is also a step function with at most steps. Therefore, the complexity of computing is at most .
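Because the aggregated pseudo-mBDF is a step function whose jumps occur only at submitted bid prices, the pseudo-mMCP can be found by scanning those prices. The following is a naive sketch under that observation; the function names and the example bids are assumptions, it returns 0 when demand never exceeds supply (a convention chosen here, not taken from the paper), and no attempt is made to match the complexity bound stated above.

```python
from typing import List, Tuple

Bid = Tuple[float, float]  # (bandwidth request, unit price); illustrative


def pseudo_demand(multi_bid: List[Bid], price: float) -> float:
    """Left-continuous step demand built from one provider's multi-bid."""
    return max((b for b, p in multi_bid if p >= price), default=0.0)


def aggregated_demand(multi_bids: List[List[Bid]], price: float) -> float:
    """Sum of the individual pseudo-demands at `price`."""
    return sum(pseudo_demand(mb, price) for mb in multi_bids)


def pseudo_clearing_price(multi_bids: List[List[Bid]],
                          total_bandwidth: float) -> float:
    """Largest bid price at which aggregated demand strictly exceeds supply.

    Assumed convention: returns 0.0 if demand never exceeds supply.
    """
    prices = sorted({p for mb in multi_bids for _, p in mb}, reverse=True)
    for price in prices:
        if aggregated_demand(multi_bids, price) > total_bandwidth:
            return price
    return 0.0


# Two hypothetical providers sharing 6 units of bandwidth.
example_bids = [
    [(6.0, 1.0), (4.0, 2.0), (1.0, 3.0)],
    [(5.0, 1.0), (3.0, 2.0), (2.0, 3.0)],
]
mcp = pseudo_clearing_price(example_bids, total_bandwidth=6.0)
```

In this example the aggregated demand is 3 at price 3 and 7 at price 2, so the scan stops at price 2.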

Next, we describe our bandwidth allocation and charging rules. For notational convenience, we denote when this limit exists for a function and all .

V-C1 Bandwidth allocation

With the pseudo-mMCP , our bandwidth allocation rule is as follows: if FL service provider submits the multi-bid (and thereby declares the associated functions and ), then it receives bandwidth , with

(26)

In other words: (1) Each FL service provider receives the amount of bandwidth it asks for at the lowest price for which supply exceeds the pseudo-bandwidth demand. (2) If not all bandwidth has been allocated, the surplus is shared among the service providers. This sharing is done proportionally to as we notice that , and it ensures that all bandwidth is allocated.
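The two-step rule just described can be sketched as below. This is an interpretation of the verbal description, assuming the rule matches the multi-bid allocation of [20]: each provider first receives its pseudo-demand just above the clearing price, and the leftover bandwidth is split in proportion to the size of each provider's demand jump at the clearing price. All function names and the example values are assumptions, and Eqn. (26) itself is not reproduced.

```python
from typing import List, Tuple

Bid = Tuple[float, float]  # (bandwidth request, unit price); illustrative


def demand_at(multi_bid: List[Bid], price: float) -> float:
    """Pseudo-demand at `price` (bids priced at or above `price`)."""
    return max((b for b, p in multi_bid if p >= price), default=0.0)


def demand_right_of(multi_bid: List[Bid], price: float) -> float:
    """Right limit of the pseudo-demand (bids priced strictly above `price`)."""
    return max((b for b, p in multi_bid if p > price), default=0.0)


def allocate(multi_bids: List[List[Bid]],
             total_bandwidth: float,
             clearing_price: float) -> List[float]:
    """Base allocation plus a proportional share of the surplus."""
    base = [demand_right_of(mb, clearing_price) for mb in multi_bids]
    jump = [demand_at(mb, clearing_price) - b for mb, b in zip(multi_bids, base)]
    surplus = total_bandwidth - sum(base)
    total_jump = sum(jump)
    if total_jump <= 0:
        # No provider's demand jumps at this price: nothing to share out.
        return base
    return [b + j / total_jump * surplus for b, j in zip(base, jump)]


example_bids = [
    [(6.0, 1.0), (4.0, 2.0), (1.0, 3.0)],
    [(5.0, 1.0), (3.0, 2.0), (2.0, 3.0)],
]
allocation = allocate(example_bids, total_bandwidth=6.0, clearing_price=2.0)
```

Here the base allocations are 1 and 2, the jumps at price 2 are 3 and 1, and the surplus of 3 is split 3:1, giving 3.25 and 2.75, which together use the full 6 units.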

V-C2 Charging

Given the submitted multi-bids , each service provider is charged a payment as follows,

(27)

The first term on the right-hand side is based on the exclusion-compensation principle in second-price auction mechanisms [33]: service provider pays so as to cover the “social opportunity cost”, namely the loss of utility it imposes on all other service providers by its presence. The second term on the right-hand side is the fairness-adjusted cost, which is charged at the end of each period after the actual FL frequency is realized and observed.

Considering both the achieved FL frequency and the payment, FL service provider ’s utility is therefore

(28)

V-D Incentives of Truthful Reporting

In the previous subsection, we assumed that every service provider truthfully submits its bid. Now, we prove that this assumption indeed “approximately” holds under the designed bandwidth allocation and charging rules.

We first study the individual rationality of the designed mechanism.

Definition 2.

A mechanism is said to be individually rational if no service provider can be worse off from participating in the auction than if it had declined to participate.

Proposition 4.

If FL service provider submits a truthful multi-bid , then .

Proof.

By Lemma 1, it is straightforward to see that has the following properties:

  • is differentiable and

  • is positive, non-increasing and continuous

  • , , , .

Therefore, satisfies [Assumption 1, [20]]. According to [Property 10, [20]], we have

(29)

which is equivalent to . Therefore, . ∎

Next, we show that truthful reporting is approximately incentive compatible, i.e., a service provider cannot do much better than simply reveal its true valuation.

Proposition 5.

Consider any truthful multi-bid for service provider , and any other multi-bid , , we have

(30)

where

(31)

with and .

Proof.

The proof follows [Proposition 2, [20]]. ∎

The above proposition shows that if service provider submits a truthful multi-bid , then every other multi-bid necessarily corresponds to an increase of utility no larger than . In other words, a truthful bidding brings service provider the best utility possible up to a gap . Importantly, this value does not depend on the number of other service providers or the multi-bids they submit. In the game theoretic terminology, the situation where all service providers submit truthful multi-bids is an ex post -Nash equilibrium, where , in the sense that no service provider could have improved its utility by more than if it had submitted a different multi-bid.

V-E A Uniform Multi-Bidding Example

To conclude the multi-bid auction mechanism design, we illustrate a uniform multi-bidding approach as an example of how to decide the multi-bid of an individual service provider. Instead of having the service provider submit both prices and bandwidth requests, the operator can announce prices to service provider and let service provider report its requested bandwidth at these price points. This way, the operator has better control over how the service providers construct their multi-bids and can avoid multi-bids that result in a large , which may reduce the service providers’ incentives to report truthfully. Because the operator does not know the demand function of service provider , a natural approach is to uniformly distribute these prices in the range where is the largest price at which the service provider may still request a positive amount of bandwidth. Specifically,

(32)

Assume that the network operator has prior knowledge , , , and on the lower/upper bounds on the parameters, then can be upper bounded by

(33)

Thus, the operator can set the uniform prices as

(34)

Note that there is an intrinsic trade-off on the choice of . On the one hand, a large allows the pseudo-BDF and pseudo-MVF to more accurately reflect the true BDF and MVF at the cost of increased complexity and signaling overhead. On the other hand, a smaller makes multi-bidding easier, but the discrepancy between the pseudo functions and the true functions will introduce a larger performance loss.
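A uniform multi-bid can be sketched as follows. The exact spacing of Eqns. (32) and (34) is not recoverable from this copy of the text, so the spacing used here (price m equal to m times the maximum price divided by the number of bids plus one) is an assumption borrowed from the usual uniform multi-bid construction, and the linear demand function is purely hypothetical.

```python
from typing import Callable, List, Tuple


def uniform_prices(p_max: float, num_bids: int) -> List[float]:
    """Evenly spaced price points strictly inside (0, p_max).

    Assumed spacing: m * p_max / (num_bids + 1) for m = 1..num_bids.
    """
    return [m * p_max / (num_bids + 1) for m in range(1, num_bids + 1)]


def build_multi_bid(demand: Callable[[float], float],
                    prices: List[float]) -> List[Tuple[float, float]]:
    """A provider reports its requested bandwidth at each announced price."""
    return [(demand(p), p) for p in prices]


def example_demand(price: float) -> float:
    # Hypothetical decreasing demand that reaches zero at price 4.
    return max(0.0, 8.0 - 2.0 * price)


announced = uniform_prices(p_max=4.0, num_bids=3)
multi_bid = build_multi_bid(example_demand, announced)
```

Increasing `num_bids` adds price points and therefore more steps to the resulting pseudo functions, which is the accuracy versus overhead trade-off discussed above.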

VI Simulations

In this section, we conduct simulations to evaluate the performance of the proposed methods.

VI-A Simulation Setup

The simulated wireless network adopts an OFDMA system with a total bandwidth of MHz. The period length is set as . The number of clients of an FL service is drawn from a Normal distribution with mean 25. In every period, a new FL task may start following a scheduled plan, which is defined by a Poisson distribution with the mean interval . By tuning , we adjust the FL service demand, and a smaller will more likely lead to more concurrent FL services in a period, as an FL service often lasts multiple periods. Each FL service has a pre-determined target training accuracy, and when the accuracy reaches the target, the FL service terminates and exits the wireless network. The clients’ wireless channel gain is modeled as independent free-space fading, where the average path loss is drawn from a Normal distribution with different mean and variance in different circumstances. The variance of the complex white Gaussian channel noise is set as . For each client, the local training time is drawn uniformly at random from s. We fix the global aggregation time to be . We consider typical neural network sizes in the range of Mbits. The upload transmission power is drawn uniformly at random between 0.05 and 0.15 W, and the download transmission power is drawn uniformly at random between 0.1 and 0.3 W.

VI-B Convergence of DISBA in the Cooperative Case

We first illustrate the convergence behavior of DISBA in the cooperative FL service provider case in a representative period with 5 concurrent FL services. These services have 10, 12, 14, 16, and 18 clients, respectively. Figure 4 shows the computed FL frequency of each service provider before convergence, and Figure 5 shows the corresponding bandwidth allocation, which quickly converges to the optimal allocation for a convergence tolerance gap . The resulting bandwidth allocation and FL frequencies of these FL services in this period are reported in Table I. We further show in Table II the computation time of DISBA for different values of the tolerance gap and step size. The time values are measured on a desktop computer with an Intel Core i5-9400 2.9 GHz CPU and 16 GB of memory.

Fig. 4: Frequency of Each Service before Convergence
Fig. 5: Bandwidth of Each Service before Convergence
| Service Index | Number of Clients | Bandwidth Ratio | Frequency |
| --- | --- | --- | --- |
| 1 | 10 | 0.182 | 113 |
| 2 | 12 | 0.196 | 107 |
| 3 | 14 | 0.209 | 102.6 |
| 4 | 16 | 0.205 | 90.4 |
| 5 | 18 | 0.205 | 81.2 |

TABLE I: Resulting Bandwidth Allocation and Frequency of Each FL Service (Cooperative)

| Tolerated Gap | Step Size | # of Iterations | Time (s) |
| --- | --- | --- | --- |
| 1e-3 | 0.1 | 131 | 0.332 |
| 1e-3 | 0.5 | 37 | 0.094 |
| 5e-3 | 0.1 | 72 | 0.169 |
| 5e-3 | 0.5 | 26 | 0.069 |

TABLE II: Computational Complexity for the Cooperative Provider Case

VI-C Fairness-adjusted Multi-bid Auction in the Selfish Case

We perform the fairness-adjusted multi-bid auction in the same representative period as in the last subsection, with and . The pseudo-mBDFs of the FL service providers and the aggregated pseudo-mBDF are illustrated in Figures 6 and 7, respectively. The pseudo-MCP is also shown in Figure 7. Table III reports the resulting bandwidth allocation and achieved FL frequency.

Fig. 6: Pseudo-mBDF of Individual FL Services
Fig. 7: Aggregated Pseudo-mBDF and Pseudo-MCP
| Service Index | Number of Clients | Bandwidth Ratio | Frequency |
| --- | --- | --- | --- |
| 1 | 10 | 0.164 | 105.82 |
| 2 | 12 | 0.177 | 99.52 |
| 3 | 14 | 0.217 | 105.46 |
| 4 | 16 | 0.218 | 94.4 |
| 5 | 18 | 0.223 | 86.56 |

TABLE III: Optimal Bandwidth and Frequency of Each Service (Selfish)

As we briefly mentioned in Section V, there is a trade-off when selecting the number of bids . On the one hand, a larger increases the computational complexity for searching for the pseudo-MCP and determining the eventual bandwidth allocation. On the other hand, a larger improves the precision of the pseudo-MCP, thereby improving the allocation performance. In Figure 8, we demonstrate the overall performance by varying . As can be seen, as increases, the overall performance will increase while each FL service provider needs to submit more bids to the server which will cause transmission delays and data backlogs.

Fig. 8: Overall Performance in the Selfish Service Providers Case with Different

The parameter plays an important role in the selfish service provider case, as it trades off efficiency against fairness. With a larger , the system places more weight on fairness; with a smaller , it places more weight on overall efficiency. The market clearing price is shown in Figure 9 and the total utility is shown in Figure 10. As increases, both the market clearing price and the total utility decrease, which can be viewed as a concession made to achieve fairness among the FL services.

Fig. 9: Pseudo-MCP with Different
Fig. 10: Total Utility with Different

VI-D Performance Comparison

In the following experiments, we compare our proposed algorithms with three benchmark algorithms.

  • Equal-Client (EC): Bandwidth is equally allocated to the clients. Therefore, each client gets a bandwidth of .

  • Equal-Service (ES): Bandwidth is equally allocated to the FL services. That is, each FL service gets a bandwidth of . However, each FL service provider still performs the optimal intra-service bandwidth allocation among its clients.

  • Proportional (PP): Each FL service obtains a bandwidth that is proportional to the number of its clients. That is, FL service obtains a bandwidth of . This bandwidth is further allocated among its clients following the optimal intra-service bandwidth allocation.
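The inter-service split of the three benchmarks can be sketched directly from their definitions. The function names are illustrative, the total bandwidth is normalized to 1 for the example, and the optimal intra-service allocation that ES and PP apply afterwards is not modeled here.

```python
from typing import List


def equal_client(total_bandwidth: float, clients: List[int]) -> float:
    """EC: every client gets the same bandwidth, regardless of its service."""
    return total_bandwidth / sum(clients)


def equal_service(total_bandwidth: float, clients: List[int]) -> List[float]:
    """ES: every FL service gets the same bandwidth."""
    return [total_bandwidth / len(clients) for _ in clients]


def proportional(total_bandwidth: float, clients: List[int]) -> List[float]:
    """PP: each service's bandwidth is proportional to its client count."""
    total_clients = sum(clients)
    return [total_bandwidth * k / total_clients for k in clients]


# Client counts of the five services from the representative period.
client_counts = [10, 12, 14, 16, 18]
ec_per_client = equal_client(1.0, client_counts)
es_shares = equal_service(1.0, client_counts)
pp_shares = proportional(1.0, client_counts)
```

Note that under EC the aggregate bandwidth of a service equals its PP share; the two benchmarks differ only in how that bandwidth is divided among the clients of the service.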

We start by comparing the proposed algorithms with benchmarks in the per-period setting. The overall performance is shown in Figure 11. In this setting, there are five FL services with a random number of clients drawn from a Normal distribution with mean 20 and variance 10 and random channel conditions drawn from a Normal distribution with mean 85 and variance 15, and the result is averaged over 20 runs. As can be seen, our DISBA algorithm for the cooperative case (labeled as Coop) has the best performance, and the auction mechanism for the selfish case (labeled as Self) also outperforms the other benchmarks. Although ES and PP also perform the intra-service bandwidth allocation, the heterogeneity of the client number and channel conditions renders them suboptimal.

Fig. 11: Per-period FL Performance of Different Algorithms

Because FL is a long-term process, we further investigate the long-term performance of the proposed algorithms. In the long-term setting, 10 FL services join the wireless network at different times controlled by the -parameterized Poisson process, and an FL service is removed from the wireless network when its test accuracy has converged. Although the convergence of FL depends in complex ways on many factors, including the adopted FL algorithm, the dataset, and the selected clients, we assume that each of these 10 FL services requires 2000 FL rounds to reach convergence, which is a typical value observed in the literature [11], in order to provide a meaningful comparison of the algorithms in a controlled environment. Whenever an FL service has run for 2000 rounds, it exits the system.
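The exit rule above can be sketched as a simple counter. This assumes that the FL frequency of a service can be read as the number of FL rounds it completes in one period, which is an interpretation made for this example; the function name and the example frequency sequences are hypothetical.

```python
from typing import Iterable, Optional


def periods_until_done(rounds_per_period: Iterable[float],
                       target_rounds: float = 2000) -> Optional[int]:
    """Number of periods an FL service stays in the system.

    Accumulates the rounds completed in each period and returns the index
    (starting at 1) of the period in which the target is reached, or None
    if the supplied sequence ends first.
    """
    completed = 0.0
    for period, rounds in enumerate(rounds_per_period, start=1):
        completed += rounds
        if completed >= target_rounds:
            return period
    return None


# Hypothetical services running at a constant frequency in every period.
fast_service = periods_until_done([113] * 30)
slow_service = periods_until_done([100] * 20)
unfinished = periods_until_done([100] * 5)
```

In a full simulation the per-period frequency would change as services arrive and leave and the bandwidth is reallocated, so the sequence passed in would come from the allocation algorithm rather than being constant.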

Figure 12 illustrates the average duration (in terms of the number of periods) of all FL services by running different algorithms for , where the client number of an FL service is drawn from a Normal distribution with mean 25 and variance 15 and the channel condition of an FL service is drawn from a Normal distribution with mean 85 and variance 15. The results are averaged over 20 runs. We can see that the proposed algorithms achieve the smallest average duration compared to the benchmarks, confirming their fast FL convergence even in the long run.

Fig. 12: Average Duration Period of FL Services

Next, we study the impact of the client number heterogeneity (which reflects the FL service size heterogeneity) on the performance of different algorithms. To this end, the client number of an FL service is drawn from a Normal distribution with mean 25 and we change the variance between 0 and 15 to adjust the heterogeneity degree. The result is shown in Figure 13: as the variance increases (i.e., a higher degree of heterogeneity), the mean of the average duration decreases, while the standard deviation of the average duration increases. This is understandable because a higher degree of heterogeneity causes wireless bandwidth to be more unevenly distributed among the FL services, thereby degrading the overall FL performance. Notably, the performance gain of our proposed algorithms increases as the variance increases, which demonstrates the superior ability of our algorithms to handle the heterogeneous case.

Fig. 13: Impact of Client Number Heterogeneity

Furthermore, we investigate the impact of the channel condition heterogeneity on the FL performance. In these simulations, the average channel condition of an FL service is drawn from a Normal distribution with mean 85 and we change the variance between 0 and 15 to adjust the heterogeneity degree. The channel conditions of the clients of this FL service are further drawn from a Normal distribution whose mean is the instantiated average channel condition. In Figure 14, we observe a similar phenomenon as in Figure 13, which further confirms the advantage of adopting our proposed algorithms.

Fig. 14: Impact of the Channel Condition Heterogeneity

Finally, we study the influence of the mean arrival interval parameter on the resulting average FL duration. As shown in Figure 15, as increases, the average duration of the FL services decreases. This is because when is small, many FL services pile up and co-exist in the wireless network, thereby reducing the wireless bandwidth an individual FL service can receive.

Fig. 15: Average Duration With Varying

VII Conclusion

This paper studied a bandwidth allocation problem for multiple FL services in a wireless network, which has not been well studied in the literature. The considered problem consists of two interconnected subproblems: intra-service resource allocation and inter-service resource allocation. By solving these problems, we optimally allocate bandwidth resources to multiple FL services and their corresponding clients to speed up the training process while guaranteeing fairness in both the cooperative and the selfish FL service provider cases. Our method has shown superior performance compared to the benchmarks. Several directions remain for future research. For example, this paper takes FL frequency as the key metric to be optimized, but the true performance of FL is also affected by the dataset, the federated optimization algorithm, and many other factors. In addition, when a client can simultaneously participate in multiple FL services, resource allocation has to consider both the wireless bandwidth and client computing resources.

References

  • [1] A. F. Aji and K. Heafield (2017) Sparse communication for distributed gradient descent. arXiv preprint arXiv:1704.05021. Cited by: §II.
  • [2] M. Chen, H. V. Poor, W. Saad, and S. Cui (2020) Convergence time optimization for federated learning over wireless networks. arXiv preprint arXiv:2001.07845. Cited by: §II.
  • [3] M. Chen, Z. Yang, W. Saad, C. Yin, H. V. Poor, and S. Cui (2020) A joint learning and communications framework for federated learning over wireless networks. IEEE Transactions on Wireless Communications. Cited by: §I.
  • [4] T. Chen, G. Giannakis, T. Sun, and W. Yin (2018) LAG: lazily aggregated gradient for communication-efficient distributed learning. In Advances in Neural Information Processing Systems, pp. 5050–5060. Cited by: §II.
  • [5] S. Feng, D. Niyato, P. Wang, D. I. Kim, and Y. Liang (2019) Joint service pricing and cooperative relay communication for federated learning. In 2019 International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pp. 815–820. Cited by: §II.
  • [6] C. Fung, C. J. Yoon, and I. Beschastnikh (2018) Mitigating sybils in federated learning poisoning. arXiv preprint arXiv:1808.04866. Cited by: §II.
  • [7] F. Haddadpour and M. Mahdavi (2019) On the convergence of local descent methods in federated learning. arXiv preprint arXiv:1910.14425. Cited by: §II.
  • [8] J. Kang, Z. Xiong, D. Niyato, H. Yu, Y. Liang, and D. I. Kim (2019) Incentive design for efficient federated learning in mobile networks: a contract theory approach. In 2019 IEEE VTS Asia Pacific Wireless Communications Symposium (APWCS), pp. 1–5. Cited by: §II.
  • [9] S. P. Karimireddy, S. Kale, M. Mohri, S. J. Reddi, S. U. Stich, and A. T. Suresh (2019) Scaffold: stochastic controlled averaging for on-device federated learning. arXiv preprint arXiv:1910.06378. Cited by: §II.
  • [10] H. Kim, J. Park, M. Bennis, and S. Kim (2019) Blockchained on-device federated learning. IEEE Communications Letters 24 (6), pp. 1279–1283. Cited by: §II.
  • [11] J. Konecny, H. B. McMahan, F. X. Yu, P. Richtarik, A. T. Suresh, and D. Bacon (2016) Federated learning: strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492. Cited by: §I, §VI-D.
  • [12] T. H. T. Le, N. H. Tran, Y. K. Tun, Z. Han, and C. S. Hong (2020) Auction based incentive design for efficient federated learning in cellular wireless networks. In 2020 IEEE Wireless Communications and Networking Conference (WCNC), pp. 1–6. Cited by: §II.
  • [13] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith (2020) Federated learning: challenges, methods, and future directions. IEEE Signal Processing Magazine 37 (3), pp. 50–60. Cited by: §II.
  • [14] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith (2018) Federated optimization in heterogeneous networks. arXiv preprint arXiv:1812.06127. Cited by: §II.
  • [15] T. Li, M. Sanjabi, A. Beirami, and V. Smith (2019) Fair resource allocation in federated learning. arXiv preprint arXiv:1905.10497. Cited by: §II.
  • [16] X. Li, K. Huang, W. Yang, S. Wang, and Z. Zhang (2019) On the convergence of fedavg on non-iid data. arXiv preprint arXiv:1907.02189. Cited by: §II.
  • [17] W. Y. B. Lim, N. C. Luong, D. T. Hoang, Y. Jiao, Y. Liang, Q. Yang, D. Niyato, and C. Miao (2020) Federated learning in mobile edge networks: a comprehensive survey. IEEE Communications Surveys & Tutorials. Cited by: §II.
  • [18] Y. Lin, S. Han, H. Mao, Y. Wang, and W. J. Dally (2017) Deep gradient compression: reducing the communication bandwidth for distributed training. arXiv preprint arXiv:1712.01887. Cited by: §II.
  • [19] L. Liu, J. Zhang, S. Song, and K. B. Letaief (2020) Client-edge-cloud hierarchical federated learning. In ICC 2020-2020 IEEE International Conference on Communications (ICC), pp. 1–6. Cited by: §II.
  • [20] P. Maille and B. Tuffin (2004) Multibid auctions for bandwidth allocation in communication networks. In IEEE INFOCOM 2004, Vol. 1. Cited by: §II, §V-D, §V-D.
  • [21] L. Massoulie and J. Roberts (1999) Bandwidth sharing: objectives and algorithms. In IEEE INFOCOM’99. Conference on Computer Communications. Proceedings. Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies. The Future is Now (Cat. No. 99CH36320), Vol. 3, pp. 1395–1403. Cited by: §II, §IV.
  • [22] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas (2017) Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pp. 1273–1282. Cited by: §I.
  • [23] M. Mohri, G. Sivek, and A. T. Suresh (2019) Agnostic federated learning. arXiv preprint arXiv:1902.00146. Cited by: §II.
  • [24] M. N. Nguyen, N. H. Tran, Y. K. Tun, Z. Han, and C. S. Hong (2020) Toward multiple federated learning services resource sharing in mobile edge networks. arXiv preprint arXiv:2011.12469. Cited by: §II.
  • [25] T. Nishio and R. Yonetani (2019) Client selection for federated learning with heterogeneous resources in mobile edge. In ICC 2019-2019 IEEE International Conference on Communications (ICC), pp. 1–7. Cited by: §II.
  • [26] D. P. Palomar and M. Chiang (2006) A tutorial on decomposition methods for network utility maximization. IEEE Journal on Selected Areas in Communications 24 (8), pp. 1439–1451. Cited by: §II, §IV-B.
  • [27] J. Park, S. Samarakoon, M. Bennis, and M. Debbah (2019) Wireless network intelligence at the edge. Proceedings of the IEEE 107 (11), pp. 2204–2239. Cited by: §I.
  • [28] Y. Sarikaya and O. Ercetin (2019) Motivating workers in federated learning: a stackelberg game perspective. IEEE Networking Letters 2 (1), pp. 23–27. Cited by: §II.
  • [29] F. Sattler, S. Wiedemann, K. Muller, and W. Samek (2019) Robust and communication-efficient federated learning from non-iid data. IEEE transactions on neural networks and learning systems. Cited by: §II.
  • [30] W. Shi, S. Zhou, and Z. Niu (2020) Device scheduling with fast convergence for wireless federated learning. In ICC 2020-2020 IEEE International Conference on Communications (ICC), pp. 1–6. Cited by: §II.
  • [31] V. Smith, C. Chiang, M. Sanjabi, and A. S. Talwalkar (2017) Federated multi-task learning. Advances in neural information processing systems 30, pp. 4424–4434. Cited by: §II.
  • [32] N. H. Tran, W. Bao, A. Zomaya, N. M. NH, and C. S. Hong (2019) Federated learning over wireless networks: optimization model design and analysis. In IEEE INFOCOM 2019-IEEE Conference on Computer Communications, pp. 1387–1395. Cited by: §I, §II.
  • [33] W. Vickrey (1961) Counterspeculation, auctions, and competitive sealed tenders. The Journal of finance 16 (1), pp. 8–37. Cited by: §V-C2.
  • [34] S. Wang, T. Tuor, T. Salonidis, K. K. Leung, C. Makaya, T. He, and K. Chan (2019) Adaptive federated learning in resource constrained edge computing systems. IEEE Journal on Selected Areas in Communications 37 (6), pp. 1205–1221. Cited by: §II.
  • [35] J. Xu and H. Wang (2020) Client selection and bandwidth allocation in wireless federated learning networks: a long-term perspective. arXiv preprint arXiv:2004.04314. Cited by: §II.
  • [36] K. Yang, T. Jiang, Y. Shi, and Z. Ding (2020) Federated learning via over-the-air computation. IEEE Transactions on Wireless Communications 19 (3), pp. 2022–2035. Cited by: §II.
  • [37] Q. Yang, Y. Liu, T. Chen, and Y. Tong (2019) Federated machine learning: concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST) 10 (2), pp. 1–19. Cited by: §II.
  • [38] Q. Zeng, Y. Du, K. Huang, and K. K. Leung (2020) Energy-efficient radio resource allocation for federated edge learning. In 2020 IEEE International Conference on Communications Workshops (ICC Workshops), pp. 1–6. Cited by: §II.
  • [39] Y. Zhan, P. Li, and S. Guo (2020) Experience-driven computational resource allocation of federated learning by deep reinforcement learning. In Proc. of IPDPS, Cited by: §II.
  • [40] Y. Zhao, M. Li, L. Lai, N. Suda, D. Civin, and V. Chandra (2018) Federated learning with non-iid data. arXiv preprint arXiv:1806.00582. Cited by: §II.