FMore: An Incentive Scheme of Multi-dimensional Auction for Federated Learning in MEC

02/22/2020 ∙ by Rongfei Zeng et al. ∙ Northeastern University and Hong Kong Baptist University

Federated learning coupled with Mobile Edge Computing (MEC) is considered one of the most promising solutions for AI-driven service provision. Plenty of studies focus on federated learning from the performance and security aspects, but they neglect the incentive mechanism. In MEC, edge nodes will not voluntarily participate in learning, and they differ in their provision of multi-dimensional resources; both factors can deteriorate the performance of federated learning. Moreover, lightweight schemes appeal to edge nodes in MEC. These features require an incentive mechanism designed specifically for MEC. In this paper, we present FMore, an incentive mechanism based on a multi-dimensional procurement auction with K winners. FMore is not only lightweight and incentive compatible, but also encourages more high-quality, low-cost edge nodes to participate in learning, eventually improving the performance of federated learning. We also derive the Nash equilibrium strategy for edge nodes and employ expected utility theory to provide guidance to the aggregator. Both extensive simulations and real-world experiments demonstrate that the proposed scheme can effectively reduce the number of training rounds and drastically improve model accuracy for challenging AI tasks.


I Introduction

Mobile Edge Computing (MEC) [1], [2], considered a promising architecture for future networks, enables edge nodes to locally collect and process various data in coordination with the remote cloud, which especially appeals to the Internet of Things (IoT), social networking, 5G, etc. In these scenarios, huge amounts of data are generated and further employed by machine learning to provide AI-driven services such as classification, recommendation, and prediction. However, the proliferation of data will gradually phase out the traditional paradigm of centrally processing all data at a remote cloud. Fortunately, edge nodes equipped with powerful computing capability, sufficient Flash storage, etc., accelerate the adoption of local data processing. Recent studies have shown that more than 90% of data will be stored and processed locally in the near future [3]. The salient features of MEC attract not only researchers but also investors from the capital market. An analyst from Goldman Sachs believes that MEC will change the world we live in [4].

MEC might also boost the widespread use of federated learning. Federated learning [5], an emerging branch of machine learning, allows collaboratively training a shared model with distributed data, without the need for centralized storage at a cloud. Moreover, federated learning endeavors to address the privacy concerns of users who would hesitate to upload their private data to a remote cloud. These two prominent features have attracted strong industry interest. Google applies federated learning to the AI-enabled application Gboard for mobile users [6], and the open-source federated learning framework FATE was published by WeBank in April 2019. Other instances include FedVision, emoji prediction [7], anti-money laundering with multiple banks [8], etc. Consequently, federated learning coupled with MEC is considered one of the most promising solutions for AI-driven service provision.

A plethora of studies concentrate on federated learning [9], [10], which has become a hot topic in both academia and industry in recent years. Starting from the impressive work in [5], researchers have focused on the performance improvement of federated learning [11], [12]. They have studied the comparison of synchronous and asynchronous aggregation [13], the compression of information exchanged in the global aggregation [14], [15], the control algorithm to trade off local updates against global aggregations [16], etc. The security and privacy of federated learning are another popular topic [17], [18]. For example, a chained anomaly detection scheme [19], secure global aggregation algorithms [20], and a privacy-preserving mechanism [21] have been proposed in the past two years. In these studies, a critical and optimistic assumption is that local nodes participate voluntarily, without any returns, which does not hold in realistic MEC scenarios.

The incentive mechanism is essential and crucial to federated learning in MEC. Since learning operations at edge nodes consume various resources, such as battery, bandwidth, and computation power, rational edge nodes are unwilling to join this collaboration voluntarily, without any compensation [22]. Moreover, although federated learning does not require edge nodes to upload their raw data to the remote cloud, smart malicious attackers may still infer the source information from model parameters [23]. Such potential threats aggravate edge nodes' reluctance to participate. For service providers, the performance of federated learning is negatively impacted without sufficient participation of high-quality nodes [24]. In sum, an incentive mechanism is indispensable for federated learning.

Unfortunately, previous incentive mechanisms designed for other scenarios cannot be directly applied to federated learning in MEC. Most importantly, there is a widening resource gap between different edge nodes [3], and this gap might deteriorate the performance of federated learning. Consequently, the proposed incentive scheme should encourage more participation of high-quality nodes and select them so as to improve the performance of federated learning. Furthermore, the resources provided by edge nodes are multi-dimensional and dynamic, and edge nodes are selected anew in each game. Besides, the proposed scheme should not introduce much computational cost or communication overhead, since these resources are constrained at some nodes. In short, these prominent features must be considered seriously in the design of an incentive mechanism for MEC.

In this paper, we study the incentive problem of motivating more high-quality, low-cost edge nodes to participate in collaborative learning, eventually improving the performance of federated learning in MEC. To achieve this goal, we borrow and extend the model of multi-dimensional procurement auction proposed by Che in [25]. The aggregator broadcasts a bid ask with the selection criteria before participators separately submit bids containing their resource qualities and expected payments. Then, the aggregator chooses winners according to the sorted scores. We provide each node with a unique Nash equilibrium strategy to maximize its expected profit, and give guidance to the aggregator on obtaining the expected resources, both of which are among the most challenging tasks in the design of an incentive scheme. To demonstrate the performance of our proposal, we implement a smart simulator and test it with multiple datasets and learning models. We also deploy a real system with 32 nodes.

The main contributions of this paper are three-fold.

  1. We present a multi-dimensional incentive framework FMore for federated learning. FMore accommodates a series of scoring functions and is Pareto efficient in some specific cases. It uses game theory to derive optimal strategies for the participating edge nodes, and leverages expected utility theory to guide the aggregator in effectively obtaining the desired resources.

  2. The proposed scheme is lightweight and Incentive Compatible (IC). The computational overhead and communication cost are negligible in realistic deployments, and IC means that edge nodes gain nothing by declaring false qualities in FMore.

  3. The results of extensive simulations show that FMore speeds up federated training, reducing training rounds by 51.3% on average, and improves model accuracy by 28% for the tested CNN and LSTM models. A real deployment with 31 edge nodes and one aggregator also shows a 44.9% improvement in model accuracy and a 38.4% reduction in training time.

The remainder of this paper is organized as follows. Section II introduces the system model and some preliminaries. In Section III, we present the proposed incentive scheme FMore with the multi-dimensional auction, followed by some theoretical results in Section IV. Extensive performance evaluations are presented in Section V. Section VI surveys related work, and Section VII concludes the whole paper.

II System Model and Preliminaries

II-A System Model

In this paper, we consider a typical MEC network, where edge nodes such as micro servers, home gateways, laptops, and sensors are connected to a remote cloud. The number of potential edge nodes is large, and the various resources of each node are dynamic. Edge nodes have constrained resources for federated learning since they have other important tasks. In Fig. 1, an aggregator resides in the remote cloud to orchestrate federated learning with distributed edge nodes. To obtain a well-trained global model, the aggregator is motivated to pay the recruited edge nodes. Edge nodes are unwilling to upload their private data to the remote cloud, and will not offer their dynamic resources to the aggregator unless they are paid for their contributions. Both the aggregator and the edge nodes are assumed to obey the contracts they negotiate, and edge nodes are trusted to provide what they bid; techniques such as blacklisting can be applied to defaulters. Similar to [25], [26], we adopt the independent private value model for the edge nodes (sellers) and the aggregator (buyer). Finally, some threats, e.g., collusion attacks and false-name attacks, are not considered in this paper. In Table I, we summarize the notations frequently used throughout this paper.

Fig. 1: The system model of mobile edge computing

II-B Preliminaries

Federated learning is designed to train a shared global model that minimizes the global loss function F(w) in a cooperative and distributed manner [5]. Formally, the goal of federated learning is to find model parameters w* that satisfy

w^* = \arg\min_w F(w), \qquad F(w) = \sum_{k} \frac{n_k}{n} F_k(w), (1)

where F_k is the local loss function of node k, n_k is its local data size, and n = \sum_k n_k.

Typically, the training process takes a number of rounds to converge. In each round, the aggregator randomly chooses K nodes from all the edge nodes, and then distributes the global parameter w_t, where t is the iteration index, to those selected nodes. Based on the global parameter w_t, each chosen node k trains the shared model with its local data, i.e.,

w_{t+1}^k = w_t - \eta \nabla F_k(w_t), (2)

where the parameter \eta is the step size. After the local training, these nodes upload their model parameters w_{t+1}^k to the aggregator, and the aggregator generates the global parameters w_{t+1} as

w_{t+1} = \sum_{k=1}^{K} \frac{n_k}{n} w_{t+1}^k, (3)

where n_k is the data size of node k and n is the total data size of the selected nodes. Then, the aggregator initializes the next round of training by randomly choosing K nodes. When the accuracy of the global model satisfies the requirement or the training time exceeds a predefined threshold, the training process terminates. Briefly, federated learning consists of many iterations of global aggregation and local training, as shown in Fig. 2(a). Model accuracy and the number of training rounds are two critical performance metrics. Finally, we note that our proposed scheme can be applied to this classic federated learning [5] as well as other paradigms [27].
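As a minimal sketch of the procedure above (our own illustration, not the paper's code; `grad_fn` and the data sizes are placeholders), the local update of Eq. (2) and the weighted aggregation of Eq. (3) can be written as:

```python
def local_update(w_global, grad_fn, eta, steps=1):
    """Local training (Eq. 2): gradient-descent steps from the global model."""
    w = list(w_global)
    for _ in range(steps):
        g = grad_fn(w)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w

def aggregate(local_params, data_sizes):
    """Global aggregation (Eq. 3): average weighted by local data sizes."""
    total = sum(data_sizes)
    dim = len(local_params[0])
    return [sum((n / total) * w[j] for w, n in zip(local_params, data_sizes))
            for j in range(dim)]
```

A round then consists of distributing w_t to the K selected nodes, calling `local_update` at each node, and calling `aggregate` on the returned parameters.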

Notation: Description
N, K: total number of edge nodes; size of the winner set
d: number of resource types
W, \mathcal{N}: winner set; edge node set
q_{ij}: quality of the j-th resource of user i
q_i: quality vector of user i
\hat{q}_{ij}: declared quality of the j-th resource
p_i: payment of user i
\theta_i: private cost parameter of user i
F(\theta): cumulative distribution function of \theta
f(\theta): probability density function of \theta
S(q, p): scoring function given by the aggregator
C(q, \theta): cost function
\pi_i, \Pi: profit functions of user i and the aggregator
U(q): utility function of the aggregator
B_i: Nash equilibrium strategy of node i
B_{-i}: Nash equilibrium strategies except node i
Pr_i: probability of edge node i being selected
A_K: auction with K winners
TABLE I: The notations frequently used in this paper

III FMore: The Proposed Incentive Scheme

In this section, we present the incentive mechanism FMore based on the multi-dimensional procurement auction and detail the design rationale for each step in FMore. To explicitly illustrate our proposal, we also describe a walk-through example with five edge nodes. Further discussions are provided for specific scenarios as well.

Fig. 2: The procedure of RandFL and FMore

III-A The Description of FMore

The proposed incentive framework FMore consists of six steps, i.e., bid ask, bid collection, winner determination, task assignment, local training, and global aggregation, in each round of training, as shown in Fig. 2(b). The last three steps are similar to the classic federated learning in [5] (referred to as RandFL). Additional computational cost and communication overhead are introduced only in the first three steps, which therefore must be designed carefully.

(1) Bid Ask: In each round of federated learning, the aggregator initially broadcasts a scoring rule S(q, p), where q is the quality vector of resources and p is the payment that an edge node expects for providing q. The resources considered in FMore include local data, computation capability, bandwidth, CPU cycles, etc. The aggregator leverages the scoring function S to choose participators. We formulate S as a quasi-linear function

S(q_i, p_i) = U(q_i) - p_i, (4)

where the subscript i is the node index and U is the utility function of the aggregator. Compared with the size of the model parameters, the communication overhead in this step is negligible, because only the scoring function and some simple requirements are delivered from the aggregator to the edge nodes, and the corresponding data size is just a few bytes.

Many scoring functions can be accommodated by FMore. For instance, U can be set as the utility function of the aggregator. Some classic utility functions are the perfect substitution utility function, the perfect complement utility function, and the general Cobb-Douglas function, denoted respectively as

U(q) = \sum_{j=1}^{d} a_j q_j, \qquad U(q) = \min_j \{a_j q_j\}, \qquad U(q) = \prod_{j=1}^{d} q_j^{a_j},

where the a_j are coefficients. We may add the constraint \sum_j a_j = 1, but it is not imperative. Among these functions, the additive form is preferred for perfectly substitutable resources such as GPU and CPU, while the perfect complement form might be the best choice for scenarios where both bandwidth and computing power are required simultaneously.
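The three classic utility forms can be sketched in a few lines of Python (our own illustration; the coefficient vector `a` is a placeholder chosen by the aggregator):

```python
import math

def perfect_substitution(q, a):
    """U(q) = sum_j a_j * q_j: resources substitute for one another."""
    return sum(aj * qj for aj, qj in zip(a, q))

def perfect_complement(q, a):
    """U(q) = min_j a_j * q_j: the scarcest resource limits the utility."""
    return min(aj * qj for aj, qj in zip(a, q))

def cobb_douglas(q, a):
    """U(q) = prod_j q_j ** a_j: the general Cobb-Douglas form."""
    return math.prod(qj ** aj for qj, aj in zip(q, a))
```

For example, `perfect_complement` is a natural fit when both bandwidth and compute are needed: doubling one without the other adds no utility.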

(2) Bid Collection: When edge nodes receive a bid ask with scoring function S, they separately decide, based on their available resources, whether to bid or not. According to the private value model in [22], edge node i has a private cost parameter \theta_i, which yields the private cost function C(q_i, \theta_i). Note that the cost function is increasing in q_i. In this paper, we assume the single-crossing conditions C_\theta > 0, C_{q_j} > 0, and C_{q_j \theta} > 0, which mean that the marginal cost increases with the parameter \theta. Before bidding, each node learns its private cost parameter \theta and obtains the cumulative distribution function (CDF) F(\theta) from historical data. It is assumed that \theta is independently and identically distributed over the range [\underline{\theta}, \bar{\theta}]. There also exists a positive and continuously differentiable density function f(\theta).

How much to bid? As a rational edge node, node i needs to choose q_i and p_i to maximize the expected profit function

\pi_i(q_i, p_i) = (p_i - C(q_i, \theta_i)) \cdot Pr_i, (5)

where Pr_i is the probability that node i is selected. In this optimization problem, one constraint is Individual Rationality (IR), which implies that a node will not participate in federated learning when its profit is negative, i.e., \pi_i \geq 0; a node that chooses not to join the training obtains zero profit. In the next section, we present the theoretical results on the optimal strategy for each edge node to compete with the others.

When edge node i submits its bid to the aggregator, a sealed-bid auction is adopted, meaning that the bid is known only to the aggregator and node i. The sealed-bid auction is well suited to network scenarios and can easily be implemented in FMore.

(3) Winner Determination: When the aggregator has collected sufficient bids or a timer with a predefined threshold expires, the aggregator finishes the bid collection process and starts to determine the winners. In this paper, we extend the classic multi-dimensional auction to multiple winners. In the winner determination, the aggregator maximizes its profit function

\Pi = \sum_{i \in W} (U(q_i) - p_i), (6)

where W is the winner set and U is the utility function of the aggregator. Similar to the literature [25], we also assume that U is increasing and concave in each quality dimension. Moreover, the IR constraint \Pi \geq 0 should be satisfied for the rational aggregator as well.

In FMore, the aggregator chooses the K edge nodes with the best scores to construct the winner set W. The parameter K is decided by the aggregator and can be estimated from historical data. Besides the winner determination, the aggregator has to perform the payment allocation. Both the first-price auction and the second-price auction can be applied to FMore; we use the first-price auction for simplicity in this paper.
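The winner determination step (score each bid with the quasi-linear rule of Eq. (4), sort, take the top K, pay first price) can be sketched as follows; the bid format and the utility callback are illustrative assumptions:

```python
def determine_winners(bids, utility, K):
    """Score each bid with S = U(q) - p (Eq. 4), sort in descending order
    of score, and choose the K bidders with the best scores.
    bids: list of (node_id, quality_vector, asked_payment).
    First-price rule: each winner is paid exactly what it asked."""
    scored = [(utility(q) - p, node, p) for node, q, p in bids]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(node, p) for _, node, p in scored[:K]]
```

Under the second-price variant mentioned above, the payment would instead be derived from the next-best score; the selection logic is unchanged.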

(4) Task Assignment, Local Training and Global Aggregation: The last three steps are similar to the classic federated learning scheme RandFL: winners locally train the model with the declared resources according to Eq. (2). After finishing local updates, they submit their model parameters to the aggregator and obtain the corresponding payment. Any edge node that does not comply with the contract is put into a blacklist by the aggregator.

The pseudocode of FMore is given in Algorithm 1. Compared with RandFL, FMore adds just one round of information exchange between the edge nodes and the aggregator, and the total communication cost is a linear function of N. The computational cost consists of the calculation of the optimal strategy at each edge node and the sorting operation at the aggregator. The time complexity of the optimal strategy computation is linear, as shown in Section IV. Thus, our proposed scheme is lightweight and well suited to MEC.

III-B A Walk-Through Example

In Fig. 3, we present a walk-through example with five edge nodes and two types of resources, i.e., training data and bandwidth, each restricted to a fixed range. The public scoring function is set as S(q, p) = a q_1 + b q_2 - p, where the coefficients a and b balance the two types of resources and are both set to 0.5, and the data size q_1 and bandwidth q_2 are normalized by min-max normalization when computing the scores. It should be noted that the strategy of each node might not be optimal in this example; we provide the Nash equilibrium strategy for a rational node in Section IV.

In the first round of training, the five edge nodes individually submit their bids (q_i, p_i). After collecting all the bids, the aggregator computes the scores, sorts them in descending order, and chooses the three nodes with the top three scores as winners. The payments for the winners are 0.175, 0.221, and 0.300 in the first-price auction. The aggregator distributes the global parameters to these three winners for their local learning. When the winners finish local training, the aggregator performs the global aggregation as in Eq. (3) and then terminates this round of training.

In the second round, the nodes might change their bids. Take one node as an example to illustrate the dynamic provision of resources. The reasons why a node changes its bid include, but are not limited to: (1) its available resources have changed; (2) its private cost parameter has been reestimated and revised; and (3) the node trades revenue for other benefits such as reputation. Whatever the reason, suppose this node submits a new bid and is now ranked first. The selection of winners is similar to the last round, and a new winner set is constructed. The three chosen nodes are responsible for local training in this round, and their payments are 0.16, 0.15, and 0.3 in the first-price auction. The processes of bid ask, bid collection, winner determination, task assignment, local training, and global aggregation are performed iteratively until the model accuracy satisfies the requirement.
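The score computation of the walk-through can be sketched as below. The bid values in the test are hypothetical, not those of the paper's example; only the scoring form (equal weights of 0.5 after min-max normalization, minus the asked payment) follows the text:

```python
def minmax(xs):
    """Min-max normalize a list to [0, 1]; assumes the values differ."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def walk_through_scores(bids, a=0.5, b=0.5):
    """Score bids of (data_size, bandwidth, payment) with
    S = a * d_norm + b * bw_norm - p, as in the walk-through example."""
    d_norm = minmax([d for d, _, _ in bids])
    bw_norm = minmax([bw for _, bw, _ in bids])
    return [a * d + b * bw - p
            for d, bw, (_, _, p) in zip(d_norm, bw_norm, bids)]
```

The aggregator would then sort these scores in descending order and take the top three as winners, paying each its asked payment.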

Fig. 3: The example of FMore with five nodes

III-C Discussion

In MEC, the resource provision is dynamic and heterogeneous, and some nodes have sufficient local data of high quality. However, the situation changes in other scenarios where the resources of participators are relatively stable and the local data size of each node is very small. In such cases, repeatedly selecting a fixed set of nodes with limited data and inferior-quality resources may negatively affect the performance of federated learning; for instance, overfitting is frequently encountered. To tackle these problems, we extend FMore for wider application.

In the winner determination phase of FMore, the K top-score nodes are deterministically added to the winner set. We now revise this step as follows: nodes, in descending order of score, are individually added to the winner set with probability \sigma until K nodes are chosen for the set W. This can be achieved by changing Line 11 of Algorithm 1. We name this extension \sigma-FMore; FMore is the special case of \sigma-FMore with \sigma = 1. In \sigma-FMore, the winner set of K nodes is constructed with high probability, and it can easily be verified that this probability approaches one for appropriate parameters. The parameter \sigma should be set carefully to balance model accuracy and training speed in extreme scenarios, since a small \sigma may degrade FMore toward the classic federated learning RandFL. In Section V, we demonstrate the impact of \sigma on the performance of federated learning.
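The revised selection rule can be sketched as follows; the symbol `sigma` for the acceptance probability and the single-pass behavior are our assumptions, since the original symbol is not reproduced in the source:

```python
import random

def probabilistic_winners(sorted_nodes, K, sigma, seed=0):
    """Probabilistic winner selection: walk the list of nodes sorted by
    descending score and accept each with probability sigma until K winners
    are chosen. sigma = 1 recovers plain FMore (deterministic top-K).
    A single pass may yield fewer than K winners when sigma is small."""
    rng = random.Random(seed)
    winners = []
    for node in sorted_nodes:
        if len(winners) == K:
            break
        if rng.random() < sigma:
            winners.append(node)
    return winners
```

Lower-ranked nodes thus gain a chance to be selected whenever a higher-ranked node is skipped, which increases data diversity at the cost of occasionally passing over the best bids.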

IV Optimal Strategy and Utility Analysis

In this section, we first analyze the Nash equilibrium strategies for edge nodes and present closed-form theoretical results. The impacts of the parameters N and K are also studied here. Then, we provide guidance to the aggregator on obtaining the expected resources. Finally, we prove that FMore is Pareto efficient and IC.

In FMore, the key is to discover the Nash equilibrium strategy for an edge node. A Nash equilibrium strategy is the optimal choice for node i, no matter what strategies the other players choose. We define the Nash equilibrium in our auction as follows:

Definition 1.

Nash Equilibrium: The strategy set (B_1, \ldots, B_N) of all the participators is a Nash equilibrium if every edge node i satisfies

\pi_i(B_i, B_{-i}) \geq \pi_i(B_i', B_{-i}),

where B_{-i} = (B_1, \ldots, B_{i-1}, B_{i+1}, \ldots, B_N), and B_i' is any other strategy for node i.

The Nash equilibrium strategy for edge node i consists of two components, i.e., the qualities of resources and the expected payment. For the former, Che's Theorem 1 has already shown that the choice of quality depends only on the private cost parameter \theta; in other words, the quality can be chosen independently. We then provide the unique Nash equilibrium strategy in the one-winner game (Che's Theorem 2) and extend the theoretical results to the two-winner game in Proposition 1. Since the proofs of Che's Theorems 1 and 2 can be found in [25], we omit them here due to space limitations. We also omit the subscript i in the following analysis for simplicity.

Che’s Theorem 1.

In the first-price auction with K winners, the quality of resources is chosen at q(\theta) = \arg\max_q \{U(q) - C(q, \theta)\} for all \theta \in [\underline{\theta}, \bar{\theta}].
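As a concrete illustration of this independence (our own toy example, not from the paper), take a single resource with U(q) = q and C(q, \theta) = \theta q^2 / 2:

```latex
q(\theta) = \arg\max_{q}\left\{ q - \frac{\theta q^{2}}{2} \right\}
\;\Longrightarrow\;
\frac{\partial}{\partial q}\!\left( q - \frac{\theta q^{2}}{2} \right) = 1 - \theta q = 0
\;\Longrightarrow\;
q(\theta) = \frac{1}{\theta}.
```

As the theorem states, the chosen quality depends only on the private cost parameter \theta, not on the number of competitors or winners; competition affects only the payment component of the bid.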

Che’s Theorem 2.

The unique Nash equilibrium strategy for each node in the first-price auction with one winner is given as

q(\theta) = \arg\max_q \{U(q) - C(q, \theta)\}, \qquad
p(\theta) = C(q(\theta), \theta) + \int_{\theta}^{\bar{\theta}} C_{\theta}(q(x), x) \left( \frac{1 - F(x)}{1 - F(\theta)} \right)^{N-1} dx.

Proposition 1.

The unique Nash equilibrium for each node in the first-price auction with two winners can be denoted as

p(\theta) = C(q(\theta), \theta) + \frac{1}{G(\theta)} \int_{\theta}^{\bar{\theta}} C_{\theta}(q(x), x)\, G(x)\, dx,

where G(\theta) = (1 - F(\theta))^{N-1} + (N-1) F(\theta) (1 - F(\theta))^{N-2} is the probability of achieving the first or the second score.

The proof of Proposition 1 is similar to that of Che's Theorem 2. The only difference is that the winning probability is computed as the sum of the probability of achieving the first score and the probability of achieving the second score. Interested readers can refer to [25] for details.

Theorem 1.

The unique Nash equilibrium of each node in the first-price auction with K winners can be denoted as

(7)
(8)
(9)
(10)
Proof.

The expected profit of edge node is denoted as

(11)

We define the maximum score and as

Since the CDF of is , we can use the Envelope theorem to get the CDF of as . We also define . Then, the expected profit can be represented by

To get the maximum , we have

and . Define . Then, we can easily obtain the first-order linear differential equation

(12)

We can solve this equation to get a unique with the initial condition . Then, we can get

Since the quality holds at the equilibrium point, we can get the equilibrium as

Thus, we obtain the conclusion. ∎

It should be noted that the closed form of p(\theta) is extremely complicated. We can use classic numerical methods, e.g., the Euler method or the Runge-Kutta method, to compute p(\theta). Here, the Euler method can be described as

(13)
(14)

where h is the step size. Eq. (12) can be put in the form of Eq. (13). Then, we can obtain p(\theta) in linear time.

Theorem 2.

In a game with K winners, the expected profit of an edge node is a decreasing function of the total node number N.

Proof.

From the proof of Theorem 1, we can easily have

Since the CDF satisfies 0 \leq F(\theta) \leq 1, the profit function is a decreasing function of N. ∎

Theorem 2 conforms to the intuition that when more edge nodes join the game, the competition becomes more severe and the profit of each node is correspondingly reduced. Hence, an increase in N benefits the aggregator, which is why the incentive mechanism is significant. Next, we show that the profit of a participator also increases with the number of winners.

Theorem 3.

In a game with N nodes, the expected profit of each edge node is an increasing function of the winner number K.

Proof.

Let B^K and B^{K+1} denote the equilibrium strategies in the games with K and K+1 winners, respectively, and let \pi^K denote the corresponding equilibrium profit of a node. If a node keeps playing B^K in the (K+1)-winner game, its winning probability can only increase, because one more winner is admitted while the bids are unchanged; hence its profit is at least \pi^K. Since B^K might not be the Nash equilibrium strategy of the game with K+1 winners, the equilibrium profit satisfies \pi^{K+1} \geq \pi^K. Thus, \pi is an increasing function of K. ∎

Proposition 2.

Suppose that all the participators have the same private value \theta and K winners must be selected from N nodes; then applying the acceptance probability \sigma to each node does not affect its winning probability.

Proposition 2 relies on an idealized assumption; its proof is given in Appendix A. In realistic scenarios, the private values of most nodes are not identical. For a node that FMore selects with high probability, \sigma-FMore reduces its winning probability; conversely, the winning probability of a low-score node is improved. More nodes are thus involved by \sigma-FMore, and the critical parameter \sigma should be chosen carefully. In sum, \sigma-FMore improves the performance of federated learning in extreme cases owing to the increased data diversity.

Proposition 3.

In a game with multi-dimensional resources and K winners, the choice of q is independent of p. The quality can be computed separately according to Che's Theorem 1.

For multi-dimensional resources, the quality combination is computed by maximizing U(q) - C(q, \theta), and the result is a set of quality combinations. The proof of Proposition 3 is presented in Appendix B. The aggregator can set the weight of each resource in the function U to obtain what it needs. In the following proposition, we provide guidance for the aggregator to obtain the expected resources in an efficient market.

Proposition 4.

When we consider the general Cobb-Douglas utility function U(q) = \prod_j q_j^{a_j} and the additive cost function C(q, \theta) = \theta \sum_j c_j q_j, where a_j > 0 and c_j > 0, the aggregator can adjust the parameters a_j to obtain different proportions of resources. That is,

q_1 : q_2 : \cdots : q_d = \frac{a_1}{\hat{c}_1} : \frac{a_2}{\hat{c}_2} : \cdots : \frac{a_d}{\hat{c}_d},

where \hat{c}_j is the estimate of the cost coefficient c_j obtained from historical data in a public and efficient market.

Proposition 4 can be proved via expected utility theory: the general Cobb-Douglas utility function is maximized under the cost constraint. The proof is given in Appendix C. In this way, the aggregator can obtain the expected resources from a macro view.

Theorem 4.

When the utility function of the aggregator is equal to the function U in the scoring rule and has the additive form, our proposed FMore is Pareto efficient.

Proof.

Pareto efficiency is equivalent to the maximization of the social surplus, which is given as

\sum_{i \in W} \left( U(q_i) - C(q_i, \theta_i) \right).

Since the quality of each winner is chosen as q(\theta) = \arg\max_q \{U(q) - C(q, \theta)\} by Che's Theorem 1, the surplus of each winner is maximized, and we directly arrive at the conclusion. ∎

Theorem 5.

FMore is Incentive Compatible (IC).

Proof.

The payment p^* is computed by maximizing the expected profit in Eq. (11) for the true quality combination q, and the corresponding score is S(q, p^*). If a node declares a false quality \hat{q} with payment \hat{p}, the resulting score satisfies S(\hat{q}, \hat{p}) \leq S(q, p^*), which indicates that declaring a false quality can only reduce the winning probability. Thus, FMore is IC. ∎

V Performance Evaluation

In this section, we demonstrate the performance of FMore via both simulations and real-world experiments. A smart simulator is developed to comprehensively analyze the performance with a large number of edge nodes, and we also present the performance improvement with dynamic multi-dimensional resources in a realistic deployment.

(a) The accuracy and loss for CNN with MNIST-O; (b) CNN with MNIST-F; (c) CNN with CIFAR-10; (d) LSTM with HPNews

V-A Setup

We design a smart simulator based on TensorFlow to study the performance of FMore. In this simulator, we consider four classic datasets, i.e., MNIST (referred to as MNIST-O), Fashion-MNIST (referred to as MNIST-F), CIFAR-10, and the news category dataset (referred to as HPNews). The former three are collections of pictures; for instance, CIFAR-10 contains 60,000 color images of 10 different types of objects. The news category dataset HPNews is a collection of 200,000 news headlines from HuffPost between 2012 and 2018. The underlying models include two CNNs and an LSTM. The CNN for MNIST-O and MNIST-F has 8 layers with the structure Convolutional → Convolutional → MaxPool → Dropout → Flatten → Fully connected → Dropout → Fully connected → Softmax, similar to the model in [5]. The CNN for CIFAR-10 has 11 layers with the structure Convolutional → Dropout → MaxPool → Convolutional → Dropout → MaxPool → Flatten → Dropout → Fully connected → Dropout → Fully connected → Softmax. Similar to [5], a non-IID distribution of the sample data across edge nodes is studied. FMore, the classic federated learning scheme (referred to as RandFL), and federated learning with fixed node selection (referred to as FixedFL) are all implemented.

Our simulator consists of one aggregator and 100 participators (N = 100). In each round of training, K winners are selected to join the cooperative training. The resources considered in the simulator are two-dimensional, i.e., data size and data category. Each participator computes its Nash equilibrium strategy via the Euler method. For the aggregator, the score function is quasi-linear as in Eq. (4), combining the data size and the proportion of data categories, with the category coefficient set to 25. The determination of winners is based on the first-score sealed auction, and ties are resolved by the flip of a coin. These default parameters are adopted throughout the simulations unless explicitly specified.

We also implement FMore on a realistic HPC cluster with one aggregator and 31 edge nodes. The specifications of these 32 machines include an Intel Core i7 CPU, 8 GB DDR memory, a 1 TB HDD plus a 256 GB SSD, 1 Gbps Ethernet, and Ubuntu Linux 18.04. All nodes are connected by a switch. The resources considered here include computing power, bandwidth, and data size, and the scoring function is quasi-linear with the coefficients for computing power, bandwidth, and data size set to 0.4, 0.3, and 0.3, respectively. The computing power is tuned by the number of CPU cores in the experiments, and the data size is varied over a fixed range for the accuracy test. Nodes randomly choose different quantities of resources in each round of training. All results in this section are averaged over five runs, for both simulations and real-world experiments.

V-B Simulations

The goals of FMore are to motivate more nodes with high quality and low cost to participate in cooperative federated learning and to improve the performance of the global model. In essence, performance improvement is the ultimate goal of federated learning. Here, we mainly discuss the results on the performance improvement of FMore from a variety of aspects.

(1) Model Accuracy and Loss: From Fig. 3(a) to Fig. 3(d), we can see that the model accuracy of FMore is higher than those of RandFL and FixedFL after 20 rounds of training. When the underlying model is complicated or the training task is challenging, the accuracy gap between FMore and the other two is large, since training CIFAR-10 and HPNews requires more data with high quality and low cost. At the 20th round of learning with the LSTM, the accuracy of FMore is 60.4%, while that of FixedFL is only 40.6%. FMore also accelerates training, compared with RandFL, by 50% for MNIST-O (at 95% accuracy), 42% for MNIST-F (84%), 45% for CIFAR-10 (50%), and 68% for HPNews (46%). This performance improvement is attributed to the selection of high-quality nodes, as shown in Fig. 8. Finally, the training speedup especially benefits the aggregator when its total payment is constrained, or when nodes hesitate to participate in a long-running training.

(2) The Impacts of Parameter: Increasing this parameter improves data diversity and gives the aggregator more opportunities to select nodes with high-quality resources and low cost, which in turn improves both accuracy and training speed. In Fig. 9, the number of training rounds needed to reach an accuracy of 84% is reduced by 28% under the larger setting, and in each round its accuracy exceeds that of the smaller one. When the parameter is large enough that the selected nodes amount to 10% of all nodes, the improvement in model accuracy levels off, since data diversity is already satisfied. Moreover, increasing the parameter also reduces the payment to each node, which further benefits the aggregator.

(3) The Impacts of Parameter: A larger value of this parameter reduces the scores of winners, since each node has more opportunities to be chosen. From Fig. 10(b), we find that the payment increases as well, which conforms to Theorem 3. On the other hand, a larger value feeds the model with more data, which can benefit model accuracy. In Fig. 10(a), reaching an accuracy of 86% requires 20 rounds of training under the smaller setting, while 15 rounds suffice under the larger one; the larger value thus speeds up the training process. When the value is too large, however, the marginal gain in training speed is limited, and the training results for the two largest settings are similar in our simulations.

(4) The Impacts of Parameter: We also use this parameter to increase data diversity in some extreme scenarios. In Fig. 11(b), we find that the winner scores under the larger setting are more scattered than those under the smaller one. Under one setting, almost 66.6% of the nodes selected by the parameterized FMore are among the top 30 scores; as the parameter grows, the parameterized FMore approaches RandFL. Moreover, the parameterized FMore performs better than plain FMore in small-data-size scenarios, which require more data diversity for federated learning. Note that the increase in data diversity prevents overfitting but sacrifices learning speed, as shown in Fig. 11(a). Under the parameterized setting, the accuracy reaches only 85%, which plain FMore already achieves at the 11th round.

Fig. 8: The distribution of scores (panels: CNN with CIFAR-10; LSTM with HPNews)
Fig. 10: The training speed and payment with parameter (panels: rounds vs. accuracy; payment and score)

V-C Real-world Experiments

In the realistic deployment, the performance improvement of FMore has two aspects: accuracy improvement and reduction of training time. From Fig. 12, we find that the model accuracy for CIFAR-10 reaches 59.9% after the 20th round of training with FMore, an increase of 44.9% compared with RandFL. A similar accuracy improvement is observed for the LSTM model as well; in addition, RandFL exhibits some accuracy jitter. FMore also outperforms RandFL in training time, as shown in Fig. 13. The total training time of 20 rounds for CIFAR-10 is 1119.3 s with FMore, a reduction of 38.4%. To reach an accuracy of 50% on CIFAR-10, RandFL needs almost 17 rounds (1552.7 s), while FMore requires only 8 rounds (427.7 s). The advantages of FMore become increasingly prominent when challenging AI tasks are trained collaboratively.

VI Related Work

VI-A Mobile Edge Computing

MEC has drawn increasing attention in recent years [1]. Most studies focus on service placement [2], [11], [28], task scheduling [29], deployment issues [30], [31], etc. Among these studies, Wang et al. first considered the performance issue of federated learning in MEC systems and proposed an efficient control algorithm that trades off local updates against global aggregation to minimize the loss function under resource constraints [16]. Another interesting work is given in [32], where both deep reinforcement learning and federated learning are employed to optimize edge computing, caching, and communication in MEC.

Fig. 9: The training speed and payment with parameter (panels: rounds vs. accuracy; payment and score)
Fig. 11: The performance impacts of parameter (panels: training speed; proportion of selected nodes)

VI-B Federated Learning

Federated learning, first proposed by McMahan et al. in [5], has become a fascinating topic in the machine learning community [33]. It is designed for privacy-sensitive scenarios where local nodes are unwilling to upload and share their private data. Federated learning is quite different from another impressive technique, distributed machine learning [34], which is adopted to handle massive datasets by partitioning subsets of the data across many nodes. Nowadays, plenty of studies focus on federated learning: Kairouz et al. summarized 438 papers and presented recent advances and open problems in the field [35].

Many papers concentrate on the performance improvement of federated learning [36], [37]. In [14], Sattler et al. proposed a sparse ternary compression scheme for non-IID data. Zhao et al. also focused on non-IID data and presented a method of sharing a small subset of data among all the edge nodes to improve the accuracy of federated learning [13]. For Stochastic Gradient Descent (SGD), Wang et al. presented and analyzed the cooperative SGD method and provided convergence guarantees for existing algorithms [15]. In [10], Yu et al. provided theoretical studies comparing model averaging with mini-batch SGD. Nishio et al. studied the node selection problem with resource constraints and provided a heuristic algorithm to find qualified nodes [24]. That work is close to ours, but it neglects the incentive mechanism, which is quite significant for MEC systems.

Both security and privacy are important concerns for federated learning [27]. Bonawitz et al. proposed a secure global aggregation algorithm that allows the server to compute without learning each user's contribution [17]. Impressively, Wang et al. explored user-level privacy leakage against federated learning through attacks from malicious servers, and proposed a GAN-based framework to discriminate the category and client identity of input samples [21]. For local privacy, Bhowmick et al. designed an optimal locally differentially private scheme for statistical learning problems [20]. In [19], Preuveneers et al. considered attacks from local models with malicious training samples and provided a chained anomaly detection method for federated learning. In [18], Bagdasaryan et al. showed that participants can inject hidden backdoors into the global model and proposed a new model poisoning methodology based on model replacement.

All these studies assume that edge nodes voluntarily participate in federated learning without requiring any return, which does not hold in realistic MEC scenarios. They neglect the incentive issue of federated learning, with the exception of [23], where Kang et al. utilized contract theory to motivate nodes to participate in federated training in mobile networks. Kang et al. and we discovered the significance of the incentive issue almost simultaneously. However, there are some key differences: (1) Kang et al. consider the incentive problem in a monopoly market, where mobile terminals can only decide whether or not to accept the contracts, and the efficiency of their scheme is determined by the total number of contracts; in our scheme, edge nodes are free to submit any combination of resources and expected payment, and the buyer (the aggregator) can choose any node with qualified resources. (2) In Kang's scheme, computing the optimal contracts is NP-hard, while edge nodes in FMore only need linear time to obtain their optimal strategy. (3) Node selection is not provided in [23], while we not only motivate more high-quality edge nodes to participate in the training but also select suitable nodes with low costs.

Fig. 12: The performance of CIFAR-10 in realistic deployment

VI-C Procurement Auction

Incentive schemes based on procurement auctions have been designed to address a variety of problems, such as the allocation of radio-frequency spectrum [26], crowdsensing [22], display advertising [38], and client-assisted cloud storage systems [39], [40]. Unfortunately, none of them can be directly applied to federated learning in MEC, since each considers only the properties specific to its own problem. In MEC, the resources provided by edge nodes are multi-dimensional and dynamic, and winners are selected in each game. In addition, the scheme should be lightweight and able to improve the performance of federated learning with well-chosen nodes. Consequently, a novel incentive scheme needs to be designed for federated learning in MEC.

Fig. 13: The training speed for CIFAR-10 in realistic deployment

VII Conclusion

In this paper, we have considered the incentive mechanism for federated learning in MEC and proposed FMore, a lightweight and efficient scheme. FMore adopts a multi-dimensional procurement auction with K winners. In FMore, edge nodes can obtain their Nash equilibrium strategy from our theoretical results in linear computation time, and we provide guidance to the aggregator on obtaining the expected resources. We develop a simulator with two models and four datasets to demonstrate the advantages of FMore. Extensive simulations show that FMore can reduce the training rounds by almost 51.3% and improve the accuracy by 28% for the LSTM model. We also implement a real-world system with 32 nodes in a Linux HPC cluster, where the training time is reduced by 38.4% and model accuracy is increased by 44.9%. The budget constraint of the aggregator is not considered in this paper and is left for future work. In addition, whether the selection probability should be identical or distinct for each node remains to be studied.

References