1 Introduction
The ubiquitous deployment of Internet-connected mobile devices is causing the amount of data generated at the network edge to grow exponentially, fostering a transformative computing paradigm, namely mobile edge computing [29]. According to a recent report, the global edge computing market was $3.6 billion in 2020 and is anticipated to reach $15.7 billion by 2025 [6]. Facilitated by faster networking technologies such as 5G, edge computing is a promising way to support real-time applications, which call for strong data processing and analysis capability at the edge. Thanks to the rapid growth of artificial intelligence, edge computing is becoming more intelligent by implementing machine learning (ML) algorithms for functions such as classification and prediction.
However, since the data generated at the edge devices may be highly sensitive to the end users, it might be inappropriate to deploy conventional centralized ML algorithms, which need to physically collect all training data from the devices. Federated learning (FL), a representative form of distributed ML, is well suited to edge computing: the edge server and all connected devices train the same ML model in a collaborative manner, and this paradigm is therefore also termed federated edge learning (FEL) [12, 15]. More specifically, no device explicitly uploads its generated data in FEL, but the data can still contribute to training the shared ML model through iterative local learning, global aggregation, and updating [18].
Within this collaborative system, the most challenging yet critical issue is to guarantee that all participants cooperate smoothly. To this end, two lines of research have been pursued, namely in-process [19, 8, 28, 38, 4, 33, 2, 27, 30, 24, 1, 35, 31, 32, 3] and in-advance [22, 10, 34, 37, 36, 23, 13] FEL optimization. The former improves FEL system performance by optimizing learning algorithms or communication configurations during the FEL process, while the latter achieves the desired performance by designing effective schemes that better establish and maintain the FEL system, avoiding inefficiency before the FEL process begins. Since taking precautions strengthens the FEL system ahead of time, in-advance optimization is usually more cost-efficient than patching problems during operation. The state of the art accomplishes this objective via either device selection [22, 10, 34], which directly filters out unqualified devices, or incentive mechanism design [37, 36, 23], which relies on the strong assumption of perfect information in the Stackelberg game [21]. In practice, however, there may not be enough devices to afford such elimination, and the devices may not have full knowledge of each other.
In this paper, we consider that an edge server and multiple devices collaborate in an FEL process repeatedly to optimize user experience in the long run. The server is the coordinator in charge of the whole FL process, while the devices contribute their local learning results and obtain the globally trained model as compensation at the end of each FL round. (In this paper, a “round” means finishing a specific FL task and obtaining a well-trained ML model, rather than one pass of local training in FL or an epoch in the traditional ML model training phase.) Within the whole FL process, local training is visible to and manageable by only the individual devices, leaving room for selfish behavior in which a device contributes perfunctorily to the FEL by training the ML model on only part of its local dataset. To suppress this behavior, we use a multiplayer simultaneous game to model the interactions between the edge server and devices in an FEL system, where no player has perfect information about the others, and aim to elicit the full contribution of devices from the perspective of the server, instead of intolerantly eliminating malicious devices. However, the tight coupling of action and utility in this game creates a dilemma for the server, because recklessly changing its behavior can decrease its own utility. This raises a question: is it possible for the server to entice full contributions from the devices without worrying about its utility loss?

To answer this question, we resort to the extortion scheme, which was first introduced as a special form of the zero-determinant (ZD) strategy [25]. By employing the extortion strategy, a player can unilaterally control the ratio between its own expected utility and that of the opponent, which suggests a way for the server to control utilities when playing against devices. Nonetheless, the classical extortion strategy is derived for the two-player game and is not applicable to our problem involving multiple players. In addition, it is clearly inefficient to apply it between the server and every device one by one. To address this challenge, we put forward a collective extortion (CE) strategy, which allows the server to control the overall utility of all devices with only a one-time setting. More importantly, we comprehensively analyze the potential of the proposed CE strategy to enforce the full cooperation of the devices, and further validate that it works impartially for all players with respect to utilities.
The main contributions are summarized as follows:

We model the interactions between the edge server and devices in FEL as a multiplayer simultaneous game, based on which we derive, for the first time, the CE strategy to efficiently control the relative utility proportion between the CE adopter and a group of opponents.

The proposed CE strategy not only suppresses the selfish behaviors of devices in FEL by enforcing their full contributions, but also enriches game theory by extending the original two-player extortion strategy to the multiplayer situation, thus enlarging its application scope.

We demonstrate the effectiveness and fairness of the proposed CE strategy in driving the full cooperation of the devices through both theoretical analysis and experimental evaluations, which benefits long-term system stability and liveness.
The rest of this paper comprises the following six sections. Section 2 reviews the most closely related work on improving FEL performance, and Section 3 introduces our problem formulation. In Section 4, we derive the CE strategy for the multiplayer situation, followed by an analysis of its potential to enforce the full contribution of the devices in Section 5. Experimental evaluations are presented in Section 6, and Section 7 concludes the paper.
2 Related Work
Existing research on enhancing the overall system performance of FEL can be classified into in-process and in-advance optimization, depending on whether the operations take place during the FEL process or before it.

For in-process optimization, researchers have tried to improve FEL performance by designing advanced learning algorithms [19, 8, 28, 38, 4, 33, 2, 27, 30, 24] or optimizing communication configurations [1, 35, 31, 32, 3]. In [19], Mills et al. proposed an adapted FedAvg algorithm based on Adam optimization, which overcomes the long convergence time of the original FedAvg when dealing with the non-independent and identically distributed (non-IID) data generated in the Internet of Things (IoT). Considering the constrained resources of edge devices, Jiang et al. [8] proposed a scheme named PruneFL to adaptively adjust the model size, reducing training cost while maintaining accuracy comparable to the full model. To better control the global aggregation frequency in edge computing with limited resources, Wang et al. [28] theoretically analyzed the convergence bound of gradient descent. Leveraging over-the-air computation, several studies [38, 4, 33, 2] achieved more efficient FL aggregation by taking advantage of the superposition of signals in the wireless multiple access channel. Tran et al. [27] considered the tradeoff between computation and communication latency, and that between learning time and energy consumption, in FL for wireless networks by solving a non-convex optimization problem. To deal with stragglers in FEL, a framework named ELFISH was proposed in [30] to achieve resource-aware learning by dynamically masking computation-intensive neurons, while Prakash et al. designed CodedFedL [24], which uses coded computing to inject structured redundancy into FL to compensate for the negative impact of straggling updates. On the other hand, aiming to facilitate FEL from the perspective of communications, optimal resource allocation was investigated in [1, 35, 16] and various transmission scheduling policies were designed in [31, 32, 3, 20].

For in-advance FEL performance optimization, several recent studies focus mainly on device selection [22, 10, 34, 11] and incentive mechanism design [37, 36, 23, 13, 17]. In [22], to achieve the best learning result, a novel protocol was devised to select qualified devices according to their computational resources and communication conditions. In [10], Kang et al. proposed a reputation-based mechanism using contract theory to screen out reliable devices and obtain high-quality model updates in FEL. To facilitate vehicular edge learning, selectively collecting good local model updates was considered in [34] using two-dimensional contract theory. Besides, Zhan et al. designed deep reinforcement learning (DRL) based incentive mechanisms for edge-based FL in [37, 36], where the optimal pricing strategy of the aggregator and the best contribution strategy of the participants can be derived based on the hierarchical Stackelberg game. While Pandey et al. solved the incentive problem in FL with communication efficiency in mind, using a crowdsourcing framework and the two-stage Stackelberg game for equilibrium analysis, Le et al. studied incentive mechanism design for FL in wireless scenarios via an auction game.

Although these studies exploit the benefit of taking precautions to enhance FEL performance, they either rigidly filter out unsatisfactory devices or assume the availability of perfect information, which can be impractical because an FEL system rarely has a surplus of participants or full knowledge of each participant. To overcome these shortcomings, we use the multiplayer simultaneous game to model the interactions between the edge server and devices, with no player having perfect knowledge of the others, and then design an effective CE strategy to enforce the full contribution of selfish devices with fairness guaranteed.
3 Problem Formulation
3.1 System Model
As illustrated in Fig. 1, we consider an FEL system consisting of one edge server, denoted as , and a set of edge devices, denoted as . The system aims at providing better services to end users by conducting collaborative machine learning based on the data generated by all edge devices. Specifically, we assume that the FEL is conducted in a round-by-round manner, where a round is defined as accomplishing a certain learning task with the objective of training a global ML model with good performance. Each device joins a round of the FEL task by contributing the locally learned results obtained by training the initial ML model on the local dataset for multiple iterations. As compensation, the server, who works as the FEL coordinator, returns the final well-trained ML model to the participating devices once the current round of the FEL task finishes.
However, some devices may behave selfishly by using only part of their local data to train the ML model, by which they can make extra profit, such as saving computational resources and using the rest of the data to further improve the final ML model only for themselves. (We use “he” and “she” to refer to any one of the devices and to the server, respectively.) This sort of malicious behavior is difficult for the server to detect and prevent in time, for two reasons. First, the server has no access to the local datasets held by devices and thus cannot directly obtain their sizes or the devices’ training efforts; second, the data distribution across devices is usually skewed in FEL, making it impossible to infer the size information either. In this case, the server may behave strategically by choosing whether or not to return the final ML model to the devices, thus helping suppress the selfishness in an opportunistic way, which will be detailed in the next subsection.
For better understanding, we summarize major notations used in the following sections in Table I.
Notation  Explanation 

The action of the server playing against device  
The action of any device  
The utility of the server  
The utility of any device  
The profit of the server  
The cost of the server sending the final model to  
The error of the final model  
The profit of the device  
The extra income of the device using partial data 
3.2 Game Formulation
It is clear that neither the server nor any device can know the action of each other when they make their own decisions, which can be exactly modeled by a multiplayer simultaneous game. Even though it seems that there are only two types of players, i.e., the edge server and the device, the number of players involved in the decision making and outcome witnessing of this game is multiple. In particular, the number of devices playing against the server in this FEL scenario can be large, and every device has his own preference on game strategy selection and operates with independent system parameters related to their benefits and costs.
Formally, we define the server’s action of returning the final ML model to the device as cooperation () and the action of not sharing the welltrained ML model as defection (). For the device, we regard the action of conducting local learning using the full local dataset in a round of FEL as cooperation (), while the behavior of employing only partial local data for FEL training can be viewed as defection (). For clarity, we utilize to denote the action of the server playing against device and to express the action of device in this game. Thus, we have , where .
It is worth noting that in the case of , the specific amount of data used by each device during the FEL process may differ from that of its peer devices. Here we treat any selfish behavior of not fully using the local data for model training as defection, no matter how severe or slight this malicious action is. This qualitative treatment lets us focus on eliminating devices’ undesirable activities in the subsequent quantitative modeling and algorithm design sections.
Given the above actions, we can define the utility function of device as
(1) 
where are scale parameters, is the profit the device can obtain according to the server’s action of whether or not returning the final model, and represents the extra income that the device can make by not fully using his local data to train the model, such as the spared computation, communication, and energy resource consumption.
Here we have because the final model returned by the server enables the device to provide more efficient service to the end user and thus increases user satisfaction, which can be regarded as a higher payoff for the device. For ease of expression, we use and to respectively represent and . Since cooperating by contributing to FEL with the full dataset leaves the device no room for extra profit, we assume . For the selfish behavior of using only partial local data for training the ML model, with denoting the percentage of device ’s dataset contributed to FEL, we can define , where is a device-dependent positive constant indicating the heterogeneity of devices. (As is a parameter related to the personal preference of each device regarding being selfish, here we assume that is a relatively stable value that does not fluctuate drastically across game rounds and can be approximately estimated by the edge server from historical behaviors.)

Next we define the utility of the server as
(2) 
where are scalars; refers to the profit of the server gained from this round of FEL with the globally trained ML model and denotes the action vector of the devices; is the cost of the server to send device the final trained model. Since the final model returned to all devices is the same, the cost of sending it to every device is assumed to be the same as an example here, with , where as a positive scalar denotes the overall cost of the server, and . (For different costs of the server to send the final model to devices, the overall research methodology proposed in this paper can still be applied, although the derivation details may vary.)

The profit of the server obtained from the final model can be relatively complicated to describe, and it generally depends on the specific ML model trained in the FEL system. In this paper, taking the convolutional neural network (CNN) based classifier as an example, we can describe as follows:
(3)
where are positive scalars, and represents the classification error of the final trained model, jointly determined by the actions of all devices. Specifically, the server’s profit reaches its maximum if approaches zero; and if the error is too large, becomes very small. Inspired by the power-law function proposed in [5, 9], we can define an exemplary as
(4) 
In the above equation, denotes the data size of device ; and are tuning scalars that capture the nonlinear relationship between the classification error and the training data size, where the larger the total data size used for training, the smaller the error. Combining (3) and (4), one can find that the fewer the defective devices, the larger the effective global training dataset and the smaller the error, which results in a larger profit for the server. In the extreme case where all devices choose (or ), reaches its minimum (or maximum), and accordingly, reaches its maximum (or minimum), denoted as (or ).
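The qualitative behavior described above can be illustrated with a minimal Python sketch. It assumes a power-law error of the form kappa * D^(-lam), where D is the total amount of data actually used for training; this functional form, the names `kappa`, `lam`, and `fractions`, and all numeric values are illustrative assumptions, not the exact expression or parameters of (4).

```python
from itertools import product


def classification_error(actions, data_sizes, fractions, kappa=5.0, lam=0.5):
    """Assumed power-law error kappa * D**(-lam), with D the effective data size.

    actions[i] is "c" (device i trains on all of its data) or "d" (device i
    trains on only fractions[i] of its data). All names are hypothetical.
    """
    effective = sum(
        size if action == "c" else fraction * size
        for action, size, fraction in zip(actions, data_sizes, fractions)
    )
    return kappa * effective ** (-lam)


data_sizes = [1000, 2000, 1500]  # hypothetical local dataset sizes
fractions = [0.3, 0.5, 0.4]      # hypothetical share of data used when defecting

# Error for every combination of device actions.
errors = {
    actions: classification_error(actions, data_sizes, fractions)
    for actions in product("cd", repeat=len(data_sizes))
}
e_min = errors[("c", "c", "c")]  # all devices cooperate
e_max = errors[("d", "d", "d")]  # all devices defect
```

Under these assumptions the error is smallest when every device cooperates and largest when every device defects, which is the only property the later analysis relies on.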
Note that for other ML model training tasks in FEL, we may propose different formulas to describe the profit function , but its main characteristics about all cooperative devices producing while all defective devices leading to will generally hold. Therefore, the overall analysis framework, as well as the subsequent full contribution enforcement scheme, can still work in a similar way.
Theorem 3.1.
The FEL system can form and function only when and .
Proof.
To ensure that such an FEL system comprising one server and multiple devices functions well, the basic requirement is that all-cooperation is more beneficial than all-defection for every player. Otherwise, there is not enough motivation for any device or the server to participate collaboratively in this FEL.
For device , the utility of the all-cooperation case is and that of all-defection is . The above requirement leads to , which is equivalent to .
Similarly, for the server, the utility with cooperation actions from all players is , while all-defection results in the utility of . Thus the FEL system requires that , which equals . ∎
Based on the above definitions of utilities, we can formally define an FEL game as follows.
Definition 3.1 (FEL Game).
In the FEL system consisting of one server and devices, their interactions regarding whether to return the final model and whether to fully contribute to the learning process can be defined as a normalform game with .
3.3 Dilemma in the FEL Game
In fact, there exists a defection dilemma in the FEL game, which can be summarized in the following theorem.
Theorem 3.2.
In the FEL game defined in Definition 3.1, is the best action for any player.
Proof.
For any rational player, the best action can be derived by comparing the utility values under situations of choosing and . For any device , the server’s action being or clearly affects his utility, thus the device can consider these two cases separately. If , his utility is , and since , there exists , which leads to his best action of . If , the device’s utility becomes , where the function enforces the best action for the device again. In other words, no matter what action the server takes, the best action of the device is to defect.
Similarly, for the server, no matter what the action vector of the devices is, the only factor affecting her utility that she can control is . Referring to (2), it can be concluded that only when the last item becomes zero can be maximized, which corresponds to . ∎
According to Theorem 3.2, one can observe that the individually optimal action in the game among the server and the devices is always , which means that each device always decides to take part in the FEL using a partial dataset and the server never shares the final well-trained model with any device. This is obviously harmful to the overall benefit of the FEL system, since the global model cannot be trained on all generated data, leading to reduced model performance. Thus, it becomes critical to solve this all-defection dilemma. Here we consider that the server is in charge of driving the cooperation of the devices, for two reasons. First, as the upper-level controller of the FEL system, the server hopes to obtain an optimal collaborative learning result, which motivates her to get rid of this undesired situation; second, as the coordinator, the server can punish defective devices by not returning the final model, which indicates her capability to suppress malice.
To elicit full contributions from the devices, one intuitive solution for the server is to design cooperation incentive schemes, which usually cost the server more in order to entice profit-driven devices. Thus, it is imperative to design a new scheme embedded in this multiplayer game process that prevents any loss of interest for the server. Referring to (2), one can observe that the utility of the server is collectively affected by the actions of all devices as well as her own. Thus, any reckless behavior change without a delicate plan would harm the server, making it a critical challenge for the server to manage the behaviors of the devices without worrying about her own utility. Inspired by [25], we find that the extortion mechanism, as a type of zero-determinant (ZD) strategy, enables the adopter to unilaterally control a proportional relationship between the expected utilities of two players, which suggests a way to solve the server’s challenge.
However, the conventional extortion strategy was originally developed for the two-player game, which is not directly applicable to our problem. Although one possible idea is to apply it between the server and each device, this one-by-one method is clearly inefficient. Thus, we extend the extortion strategy to the multiplayer scenario and name it the collective extortion (CE) strategy, which will be elaborated in the next section.
4 Collective Extortion Strategy
As mentioned above, the classical extortion strategy derived for the two-player game does not fit the FEL game scenario well. In this section, we extend the two-player extortion strategy to a multiplayer version, namely the CE strategy, which can solve the defection dilemma in the FEL game without suffering from the inefficiency of directly applying the extortion strategy to each device.
To be specific, we aim to enable the server to collectively control the overall utilities of all devices so as to further drive their cooperation behaviors, so here we set the action of the server playing against all devices to be homogeneous, denoted as . Since there exist devices, the number of players in our FEL game is with each player choosing from two actions and . And thus there exist possible game results in total, which can be expressed as follows,
where denotes the th game result.
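As a small illustration of this state space, the following Python sketch enumerates all game results for a given number of devices. The lexicographic ordering used here (server's action first, with "c" before "d") is an assumption chosen for illustration; the names `enumerate_game_results` and `num_devices` are hypothetical.

```python
from itertools import product


def enumerate_game_results(num_devices):
    """Return all joint action profiles (server first, then each device).

    Each profile is a tuple of "c"/"d" characters of length num_devices + 1,
    so there are 2 ** (num_devices + 1) profiles in total.
    """
    return list(product("cd", repeat=num_devices + 1))


results = enumerate_game_results(2)
for index, result in enumerate(results, start=1):
    server_action, *device_actions = result
    print(index, server_action, "".join(device_actions))
```

With this assumed ordering, all results in which the server cooperates come first, and the last of them is the result where the server cooperates while every device defects.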
In light of the conclusion in [25] that a short-memory player is not at a disadvantage compared to a long-memory one, we assume that both the server and the devices have one-step memory and select their actions based on the game result of the last round. Thus, one can introduce the definitions of their mixed strategies as follows.
Definition 4.1 (Mixed Strategy of the Server).
The server’s mixed strategy is defined as with denoting her conditional probability of choosing cooperation given the game result in the last round .
Definition 4.2 (Mixed Strategy of the Device ).
The device ’s mixed strategy is defined as with denoting the conditional cooperation probability of device given the game result in the last round .
Accordingly, the defection probability of the server is and that of is , where . Then the Markov state transition matrix of this FEL game can be written as
where the element is the probability of transiting from the previous game result to the current one and can be defined as
In the above equation, and are calculated according to
where and denote the actions of the server and device in the round with game result , respectively. And they are assigned values according to
In other words, when the server’s behavior is in the current game result , it is the former part of functioning and thus ; otherwise, . Next, is derived in the same way according to the action of device .
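The construction of the transition matrix can be sketched in Python as follows, assuming that players choose their actions independently given the previous game result, so that each entry is the product of the per-player probabilities. The function name `transition_matrix`, the lexicographic state ordering, and the random strategies are illustrative assumptions.

```python
from itertools import product

import numpy as np


def transition_matrix(server_strategy, device_strategies):
    """Build the one-step-memory Markov transition matrix of the game.

    server_strategy[j]       : server's cooperation probability after result j
    device_strategies[i][j]  : device i's cooperation probability after result j
    Entry (j, k) is the probability of moving from result j to result k.
    """
    server_strategy = np.asarray(server_strategy, dtype=float)
    device_strategies = np.asarray(device_strategies, dtype=float)
    num_devices = device_strategies.shape[0]
    states = list(product("cd", repeat=num_devices + 1))
    size = len(states)
    matrix = np.empty((size, size))
    for prev in range(size):
        for cur, (server_action, *device_actions) in enumerate(states):
            # Server's factor: cooperation probability or its complement.
            prob = server_strategy[prev] if server_action == "c" else 1.0 - server_strategy[prev]
            # One factor per device, chosen in the same way.
            for i, action in enumerate(device_actions):
                coop = device_strategies[i, prev]
                prob *= coop if action == "c" else 1.0 - coop
            matrix[prev, cur] = prob
    return matrix


rng = np.random.default_rng(0)
N = 2  # hypothetical number of devices
p = rng.uniform(0.1, 0.9, size=2 ** (N + 1))       # server's mixed strategy
q = rng.uniform(0.1, 0.9, size=(N, 2 ** (N + 1)))  # devices' mixed strategies
M = transition_matrix(p, q)
```

Each row of `M` sums to one, since for every previous result the probabilities over all current results form a distribution.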
Then we define a nonnegative vector with the feature of , denoting the probability distribution over all possible game results in the stable state. Since is the transition matrix, we know that when the Markov process reaches the stable state, there exists , which equals with denoting the identity matrix and .

Let and be the adjugate and determinant operations on a matrix, respectively. According to Cramer’s rule, there exists . Comparing it with the above equation, one can conclude that is proportional to every row of . Accordingly, the dot product of and any vector can be proportionally calculated by
(5) 
where , denotes the th column of , and “” represents the proportional relationship.
Next, in light of the fact that the elementary transformation on any matrix does not change its determinant value, we conduct column transformations on the matrix in (5). More specifically, we first locate that the th column of this matrix refers to the game result of the server’s cooperation and all devices’ defection, i.e., , and the th element in this column can be expressed as ; when adding all columns before the th column to , we obtain the new form of this column as follows:
Then (5) can be written as
It is clear that the th column is only related to the strategy of the server. Therefore, given any constant parameter , the server can adjust the strategy to meet the condition so as to achieve
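The key observation, that the summed column depends only on the server's strategy, can be checked numerically. The sketch below assumes the lexicographic state ordering with the server's action first and independent action choices; the variable names and random strategies are hypothetical. It sums the columns of the transition matrix minus the identity over all game results in which the server cooperates and compares the outcome with the server's strategy minus an indicator of her previous cooperation.

```python
from itertools import product

import numpy as np

N = 2  # hypothetical number of devices
states = list(product("cd", repeat=N + 1))
size = len(states)

rng = np.random.default_rng(1)
p = rng.uniform(0.1, 0.9, size=size)       # server's cooperation probabilities
q = rng.uniform(0.1, 0.9, size=(N, size))  # devices' cooperation probabilities

# Transition matrix: column `cur` holds the probability of reaching that result.
M = np.ones((size, size))
for cur, (server_action, *device_actions) in enumerate(states):
    M[:, cur] *= p if server_action == "c" else 1.0 - p
    for i, action in enumerate(device_actions):
        M[:, cur] *= q[i] if action == "c" else 1.0 - q[i]

M_prime = M - np.eye(size)

# Sum the columns of all results in which the server cooperates.
server_cooperates = np.array([state[0] == "c" for state in states])
summed_column = M_prime[:, server_cooperates].sum(axis=1)

# The device probabilities sum out, leaving only the server's strategy.
expected = p - server_cooperates.astype(float)
```

Because the device terms sum to one, `summed_column` matches `expected` regardless of the devices' strategies, which is what lets the server shape this column unilaterally.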
(6) 
because the th column and the last one of the matrix are proportional to each other.
In fact, the above proportional value can be converted to a real value by normalizing on the value of , where denotes the all-one vector with the size of . In particular, the expected utility of the server, denoted by , and that of device , denoted by , can be calculated by
where and are respectively the utility vector of the server and that of device following the same order of game results and can be calculated according to (2) and (1).
Next, we can derive the CE strategy as follows.
Theorem 4.1.
By setting the strategy to satisfy
(7) 
the server can enforce an extortionate relationship between their expected utilities
(8) 
with being the extortion factor.
Proof.
Given the expressions of and , the server can enforce a zero value for any linear combination of the expected payoffs based on (6). In particular, if the server hopes to realize an extortionate share of expected utilities larger than the all-cooperation payoff , the server can set because the utility relationship is equivalent to . Accordingly, the server’s strategy needs to comply with . ∎
With a feasible strategy satisfying the above condition, the server can unilaterally ensure that the difference between her own expected utility and , i.e., her utility at the all-cooperation state, is always times the sum of the differences between all devices’ expected utilities and . Based on the one-for-all feature of this strategy, we name it the collective extortion (CE) strategy. In fact, CE not only expands the application scope of the original extortion strategy from the two-player game to the multiplayer game, but also effectively solves the problem of stimulating full contribution, which will be elaborated in the next section.
It is worth noting that the base values in the CE strategy, i.e., the subtrahends and in (8), can be other values as long as the strategy has feasible solutions that satisfy the corresponding condition similar to (7). For example, in the two-player game scenario, the original extortion strategy was proposed using the payoffs at the all-defection state as the base values [25], where the feasibility of the extortion strategy was analyzed accordingly; while in [7], the range of base values is demonstrated to lie between the payoffs of all-defection and all-cooperation.
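For intuition, the classical two-player case with all-defection base values can be checked numerically. The sketch below is not the CE strategy of this paper: it uses the standard iterated prisoner's dilemma payoffs (3, 0, 5, 1), an extortion factor of 3, and a scaling constant of 1/26, all of which are illustrative assumptions, and it pits the resulting extortion strategy against a randomly drawn one-step-memory opponent.

```python
import numpy as np

# Illustrative prisoner's dilemma payoffs (not taken from this paper).
R, S, T, P = 3.0, 0.0, 5.0, 1.0
# Game results ordered (cc, cd, dc, dd) with the extortioner X's action first.
payoff_x = np.array([R, S, T, P])
payoff_y = np.array([R, T, S, P])

chi, phi = 3.0, 1.0 / 26.0  # extortion factor and scaling constant (assumed)
# Zero-determinant condition with the all-defection payoff P as base value.
p_tilde = phi * ((payoff_x - P) - chi * (payoff_y - P))
p = p_tilde + np.array([1.0, 1.0, 0.0, 0.0])  # X's cooperation probabilities


def stationary_distribution(matrix):
    """Solve v M = v together with sum(v) = 1 by least squares."""
    size = matrix.shape[0]
    a = np.vstack([matrix.T - np.eye(size), np.ones((1, size))])
    b = np.zeros(size + 1)
    b[-1] = 1.0
    v, *_ = np.linalg.lstsq(a, b, rcond=None)
    return v


def expected_payoffs(p, q):
    """q[k] is the opponent's cooperation probability after game result k."""
    M = np.column_stack([p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q)])
    v = stationary_distribution(M)
    return v @ payoff_x, v @ payoff_y


rng = np.random.default_rng(2)
q = rng.uniform(0.1, 0.9, size=4)  # arbitrary opponent strategy
s_x, s_y = expected_payoffs(p, q)
```

Whatever strategy the opponent uses, the extortioner's surplus over the all-defection payoff stays `chi` times the opponent's surplus; the CE strategy generalizes this kind of linear relationship to one adopter against a group of opponents.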
5 Full Contribution Enforcement Based on CE
As mentioned earlier, the server can fulfill an extortionate relationship between the expected utilities of herself and those of all devices via elaborately setting a CE strategy. In this section, we further explore the potential of this strategy in stimulating full cooperation of the devices so as to solve our problem defined in Section 3.
5.1 Feasibility of the CE Strategy
According to (7), one can get the server’s strategy as
Given a certain , its feasibility is dependent on the utility vectors of the server and the devices. Denote and , . Considering that , the constraints of the utility vectors vary in the following two cases:
Case 1: .
Case 2: .
5.2 Potential of the CE Strategy to Drive Devices’ Cooperation
Under a feasible CE strategy adopted by the server, one can analyze its potential to drive the devices to fully utilize their local datasets in FEL. To that end, we first assume that each device in the FEL game searches for the best response strategy in an evolutionary manner. The reason is that the device lacks the global game information available to the server, who can interact with all devices in the game. Here we assume that a device adjusts his strategy with the goal of improving his own utility regardless of the strategy or utility of the server. Inspired by [26], we define the following strategy evolution path for an evolutionary device, with denoting his cooperation probability at round (here we discuss one of the devices as a representative and thus omit the subscript for brevity),
(9) 
where refers to the expected utility of cooperation, and represents the total expected utility. With denoting the expected utility of defection, the total expected utility can be calculated by
(10) 
Referring to the right side of (9), we can find that the numerator is a part of the denominator, resulting in .
To investigate whether the proposed CE strategy can drive the full cooperation of devices, we need to study the condition under which increases. According to (9), we can find that only when can the cooperation probability of the device increase in the next round. Combining it with (10), we can derive the sufficient condition for the CE strategy to drive the device to become more cooperative as follows:
for . In fact, in the case of , there exists according to (10) and thus is always 1, which never requires any function of the CE strategy.
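The evolutionary dynamics can be illustrated with a short Python sketch. It assumes an update in which the next cooperation probability equals the current one multiplied by the ratio of the expected cooperation utility to the total expected utility, with the total being the probability-weighted mix of the cooperation and defection utilities; this is a reading of the verbal description of (9) and (10), not a verbatim reproduction. The names `evolve_cooperation`, `u_coop`, and `u_defect` are hypothetical, and the utilities are held fixed and positive for simplicity.

```python
def evolve_cooperation(q0, u_coop, u_defect, rounds):
    """Iterate the assumed update q <- q * u_coop / u_total.

    u_total = q * u_coop + (1 - q) * u_defect is the total expected utility.
    Both utilities are assumed positive and constant across rounds here.
    """
    history = [q0]
    q = q0
    for _ in range(rounds):
        u_total = q * u_coop + (1.0 - q) * u_defect
        q = q * u_coop / u_total  # numerator is part of the denominator, so q <= 1
        history.append(q)
    return history


# Cooperation pays more than defection: the probability climbs toward 1.
rising = evolve_cooperation(0.2, u_coop=3.0, u_defect=2.0, rounds=30)
# Defection pays more than cooperation: the probability decays toward 0.
falling = evolve_cooperation(0.2, u_coop=2.0, u_defect=3.0, rounds=30)
```

In this simplified setting the cooperation probability increases exactly when the cooperation utility exceeds the defection utility, which mirrors the sufficient condition stated above.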
Therefore, in the following, we focus on the question: when , can the CE strategy elicit cooperation from the device? Referring to the sufficient condition derived above, this question reduces to whether the CE strategy can lead to .
Recalling the power of the CE strategy presented in Section 4, the server’s strategy works on the whole set of devices according to (8) and (7). To study the effect of the CE strategy on any individual device, we consider two possible situations of devices in the FEL game:

S1: Devices are homogeneous, using the same strategy and receiving the same utility;
S2: Devices are heterogeneous, with various strategies and utilities.
Then, for both situations, we can demonstrate that the device tends to cooperate under the server’s CE strategy, as presented in the following two theorems.
Theorem 5.1.
In the case of all devices with the same strategy and utility, the server utilizing the CE strategy can enforce any evolutionary device to obtain the cooperation probability .
Proof.
For situation S1, where all devices involved in FEL are homogeneous, everyone uses the same strategy and the server applies one uniform strategy to all of them, so we study the cooperation probability of any one device as a representative. According to (8), we can derive the expected utility of the device as
(11) 
Next, we analyze the expected utilities of the evolutionary device with different actions, i.e., and . More specifically, when the device takes the cooperation action, the server’s expected utility depends on her own action, where leads to while results in according to (2). Based on the above equation (11), one can find that the server’s expected utility brings two possible payoffs for the device, which are
Assuming that the cooperation of the server at round is , the expected cooperation payoff of the device can be calculated by
(12) 
When the device chooses the action , the expected utility of the server would become and for and , respectively. According to (11), the device’s payoff can then be
Thus, the expected defection payoff of the device turns to be
(13) 
Theorem 5.2.
In the case of all devices with different strategies and utilities, the server’s CE strategy can drive an evolutionary device to get .
Proof.
Given the heterogeneous devices in situation S2, to focus on the behavior of (any) one specific device , we assume that the strategies of other devices are given fixed, and thus their expected utilities are also certain values. To comply with (8), we denote
Then the expected utility of in this case turns to be
Similar to the proof of Theorem 5.1, we can calculate according to , where
For the calculation of , we have
For the same reason of , we can obtain and in this situation as well, resulting in , which leads to the gradual increase of until it approaches 1. ∎
From the above two theorems, we can tell that the CE strategy can theoretically incentivize the eventual cooperation of any device involved in the FEL game with an evolutionary mindset, in both the homogeneous and the heterogeneous device settings. In other words, devices can be driven to participate in the FEL process by fully using their local datasets and contributing to the global learning without any reservation.
5.3 Fairness of the CE Strategy
Given the vigorous force of the CE strategy in stimulating devices’ collaboration, one may be concerned that the server could behave defectively by not returning the final well-trained model to the devices, so as to save the sending cost and obtain a better utility. This question is investigated in detail as follows.
According to Theorems 5.1 and 5.2, the final actions of the devices become cooperation as the number of game rounds increases. Nevertheless, the server can still select her action from and . However, according to the following theorem, one can see that the best action for the server with the CE strategy to keep the long-term stability is to eventually choose .
Theorem 5.3.
The final action of the server adopting the CE strategy is cooperation.
Proof.
After a sufficient number of FEL game rounds, devices eventually choose cooperation. Then the server’s cooperation can bring the cooperative device the utility with the game result , and her defection action can make the cooperative device obtain the utility at the game result .
Referring to (8), one can find that the cooperative server forming the game state can still make it hold stably since the right side turns out to be zero with in the long run. However, if the server chooses to be defective constantly, the right side of (8) would become negative because the device’s utility in this case is , which is less than due to , and thus there exists . This is clearly unfavorable for a reasonable server. Thus, the best action for the server is also cooperation in the long run. ∎
Based on the above theorem, we can conclude that our proposed CE strategy employed by the server is fair for all players, as it results in full cooperation and brings the same level of utility to the server and the devices.
6 Experimental Evaluation
In this section, we conduct a series of experiments to demonstrate the effectiveness of the proposed CE strategy in eliciting full cooperation from all devices in the FEL game, along with the other attractive features mentioned in the previous section. The machine used for simulation experiments is a desktop computer with a 3.59 GHz 6-core processor and 16 GB memory. In all experiments, we fix the number of devices . Scalar parameters for devices are randomly set following uniform distributions with , and . For the server, the parameter values independent of the ML model are first set as . To appropriately set the parameters related to the profit function, which depends closely on the specific ML task, we utilize the MNIST database [14], using 6,000 data samples to train a 2-layer CNN classifier, where each device is assumed to generate 750 samples in a non-IID manner. The obtained fitting parameters in (4) are and with 95% confidence, and . Further, we fix in (3) and obtain the extreme values of as and . Note that we also tested other sets of parameter values satisfying the requirements shown in Theorem 3.1 and Section 5.1, and obtained similar results, which are omitted for brevity. Besides, each experiment is repeated 20 times to obtain the average for statistical confidence.
6.1 Effectiveness of the CE strategy to enforce full cooperation
To determine whether the proposed CE strategy adopted by the server can enforce full cooperation from any evolutionary device, as theoretically proved in Section 5.2, we compare it with four classical strategies, namely ALLC (all cooperation), ALLD (all defection), TFT (tit-for-tat), and WSLS (win-stay-lose-shift). The first two strategies are easy to understand: the server stays constantly cooperative or constantly defective. Under TFT, the server behaves according to the device’s previous action, while under WSLS the server keeps choosing an action if it brings a high utility and switches to the other action otherwise.
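The four baseline strategies can be sketched as simple decision rules for the server. This is a minimal illustration, not the experimental code: the action labels, the shared function signature, and in particular the `threshold` that decides what counts as a "high" utility for WSLS are assumptions introduced here.

```python
COOPERATE, DEFECT = "C", "D"


def allc(prev_server, prev_device, prev_utility, threshold):
    """ALLC: the server always cooperates."""
    return COOPERATE


def alld(prev_server, prev_device, prev_utility, threshold):
    """ALLD: the server always defects."""
    return DEFECT


def tft(prev_server, prev_device, prev_utility, threshold):
    """TFT: the server copies the device's previous action."""
    return prev_device


def wsls(prev_server, prev_device, prev_utility, threshold):
    """WSLS: repeat the previous action after a 'win', switch after a 'loss'.

    A round counts as a win when the server's previous utility reaches the
    (hypothetical) threshold.
    """
    if prev_utility >= threshold:
        return prev_server
    return DEFECT if prev_server == COOPERATE else COOPERATE


if __name__ == "__main__":
    # Server cooperated, device defected, server utility was low.
    for rule in (allc, alld, tft, wsls):
        print(rule.__name__, rule(COOPERATE, DEFECT, 1.0, 3.0))
```

All four rules depend at most on the previous round, which makes them natural comparison points for a strategy that conditions on the last game state.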
Taking the first device as an example, we report the comparative experiment results in Fig. 2, where his initial cooperation probability varies as to indicate the robustness of our proposed CE strategy. It is clear that no matter how cooperative the device is at the beginning, the server adopting the CE strategy can elicit the final cooperation of the evolutionary device. As increases, less time is needed to reach the stable state, because the more cooperative the device is initially, the easier it is to drive his full cooperation. In contrast, the other strategies cannot achieve this goal, as all of them lead to the cooperation probability eventually approaching zero.
6.2 Fairness of the CE strategy
Next, we explore whether the CE strategy is fair for both the server and the devices. We compare their utilities at the stable state in five cases where the server adopts different strategies. Specifically, we set the initial cooperation probability of a device as in this experiment and present the experimental results in Fig. 3. It is worth noting that since the utility of the server and that of the device take different values according to the definitions in (1) and (2), we utilize a metric termed relative utility, calculated as the ratio of the actual utility to the utility at the all-cooperation state, to study the fairness of each strategy.
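The relative utility metric defined above is a simple ratio, and a small helper makes the comparison between the two sides concrete. The function names and the numbers in the usage example are illustrative assumptions, not values from the experiments; `fairness_gap` is a hypothetical summary introduced here to show how the two relative utilities can be compared.

```python
def relative_utility(actual_utility, all_cooperation_utility):
    """Ratio of a player's actual utility to its utility when all cooperate."""
    if all_cooperation_utility == 0:
        raise ValueError("all-cooperation utility must be non-zero")
    return actual_utility / all_cooperation_utility


def fairness_gap(server_relative, device_relative):
    """Absolute difference between the two relative utilities (0 means equal)."""
    return abs(server_relative - device_relative)


if __name__ == "__main__":
    # Made-up stable-state utilities for one strategy.
    server = relative_utility(4.0, 8.0)
    device = relative_utility(6.0, 6.0)
    print(server, device, fairness_gap(server, device))
```

Normalizing by the all-cooperation utility puts both players on a common scale, so a value near 1 for both indicates that neither side is worse off than under mutual cooperation.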
According to Fig. 3, one can find that only the proposed CE strategy can achieve almost the same relative utility level for both the server and the device, which approximately equals 1, indicating that both of them obtain the stable utility with the value equivalent to the utility when all cooperate, i.e., and . This clearly demonstrates the fairness of the CE strategy in incentivizing the full cooperation of all devices. For other cases, one can find that the ALLC strategy makes the server suffer from a severe loss since the evolutionary device can strategically exploit her friendliness and behave defectively to obtain a higher utility. The ALLD and TFT strategies lead to similar results for them where the server gains slightly less than the device. This is because the server cannot be fully exploited with ALLD and TFT strategies but the device in these cases will not be driven to cooperate, and thus both of them obtain less profit compared to the situation where the server adopts the CE strategy. The WSLS strategy also makes the server acquire less but performs better than the case of ALLC.
Knowing that the CE strategy can lead to full cooperation of any evolutionary device and achieve almost the same level of stable utilities for both sides, we continue to investigate how the utilities change over time. In Figs. 4 and 5, we first plot both utilities at the stable state with four initial device cooperation probabilities, and then depict the dynamic change of the utilities in each round with each reflecting one case of . It can be observed that brings no difference to the stable utilities as shown in the bar graph, while the dynamics of the utilities vary according to the device’s initial cooperation probability. More specifically, with the increase of , the utilities of both sides converge faster. In other words, the more cooperative the devices are at the beginning, the quicker they reach the stable state, which is consistent with the evolution of the cooperation probability presented in Fig. 2.
Further, we study the dynamics of utilities with the server adopting the four other classical strategies and report the experimental results in Fig. 6. One can find that the four classical strategies produce different utility evolution paths, especially at the beginning, but all of them converge to a stable result in which the server obtains less utility than the device, which cannot meet the server’s expectation.
6.3 Impacts of the extortion factor
As can be observed in Section 4, the extortion factor in (8) plays an important role in determining the degree of utility difference between the server and all devices. To uncover the impact of on the FEL game, we investigate in this section the trend of the cooperation probability of any device and the corresponding utility evolution dynamics under different extortion factors , where the initial cooperation probability of the device is set as . Detailed experimental results are reported in Figs. 7 and 8, respectively.
According to Fig. 7, we can observe that the higher the extortion factor, the more time is needed for the device to become fully cooperative. Taking the case of as an example, the convergence round of realizing is about 10; while for , the cooperation probability of the device converges to 1 after 50 rounds. This phenomenon suggests that even though the server can relatively dominate the FEL game using the CE strategy, it is not a wise choice for her to enforce severely imbalanced expected utilities, since the time needed to elicit cooperation from devices can be large.
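The notion of a convergence round used above can be made precise with a small helper. This is a sketch under assumptions: the text does not state how convergence was measured, so we take the convergence round to be the first round from which the cooperation probability stays within a tolerance `tol` of 1. Both `tol` and the sample path are hypothetical.

```python
def convergence_round(path, tol=1e-3):
    """Return the first index t such that every path[t:] is within tol of 1.

    Returns None if the path is empty or has not settled by its final round.
    """
    last_unsettled = -1
    for t, p in enumerate(path):
        if abs(p - 1.0) > tol:
            last_unsettled = t
    if last_unsettled == len(path) - 1:
        return None
    return last_unsettled + 1


if __name__ == "__main__":
    # Hypothetical cooperation probabilities over five rounds.
    print(convergence_round([0.2, 0.6, 0.95, 0.9995, 1.0]))
```

Requiring the probability to stay near 1 for the rest of the path, rather than merely touching it once, avoids reporting a round from which the trajectory later departs.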
With respect to the impact of on the utilities of the server and the device, Fig. 8 offers some clues. Although the specific evolution paths of the instant utilities differ with varying , the stable results are the same, where each player obtains the utility of mutual cooperation. This outcome implies that the extortion factor in the CE strategy has little impact on the utilities that each player can obtain at the stable state. The underlying reason is that the powerful CE strategy can drive the device to fully collaborate given any , which leads to mutual cooperation and thus the same level of relative utilities for all players. In fact, this result also complies with the fairness feature of the CE strategy presented earlier.
7 Conclusion
In this paper, we investigate the problem of optimizing the FEL system performance by eliminating selfish device behaviors. Specifically, we model the interactions between the edge server and the devices as a multi-player simultaneous game, based on which we derive a CE strategy to collectively control the proportional relationship between the utility of the server and that of the devices. Based on this CE strategy, the server can efficiently enforce full contribution of all devices without worrying about her own utility, which is both theoretically analyzed and experimentally evaluated. Essentially, the proposed CE strategy is impartial for both the adopter and the opponents, indicating its liveness to maintain the stability of the FEL systems.
In the future, we plan to examine the efficiency and scalability of the proposed game-theoretic scheme in playing against selfish devices in FEL. Besides, we will explore more intelligent solutions for countering other malicious behaviors of devices in FEL, where dynamically joining and leaving the learning process will be discussed to describe more realistic scenarios.
References
[1] (2020) Hierarchical federated learning across heterogeneous cellular networks. In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8866–8870.
[2] (2019) Wireless federated distillation for distributed edge learning with heterogeneous data. In 2019 IEEE 30th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), pp. 1–6.
[3] (2020) Update aware device scheduling for federated learning at the wireless edge. In 2020 IEEE International Symposium on Information Theory (ISIT), pp. 2598–2603.
[4] (2020) Machine learning at the wireless edge: distributed stochastic gradient descent over-the-air. IEEE Transactions on Signal Processing 68, pp. 2155–2169.
[5] (2018) Why is my classifier discriminatory?. In Advances in Neural Information Processing Systems, pp. 3539–3550.
[6] Edge computing market. https://www.marketsandmarkets.com/MarketReports/edgecomputingmarket133384090.html. Accessed: 2020-07-30.
[7] (2018) Payoff control in the iterated prisoner’s dilemma. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pp. 296–302.
[8] (2019) Model pruning enables efficient federated learning on edge devices. arXiv preprint arXiv:1909.12326.
[9] (2018) Predicting accuracy on large datasets from smaller pilot data. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 450–455.
[10] (2019) Incentive mechanism for reliable federated learning: a joint optimization approach to combining reputation and contract theory. IEEE Internet of Things Journal 6 (6), pp. 10700–10714.
[11] (2020) Reliable federated learning for mobile networks. IEEE Wireless Communications 27 (2), pp. 72–80.
[12] (2020) Federated learning for edge networks: resource optimization and incentive mechanism. IEEE Communications Magazine 58 (10), pp. 88–93.
[13] (2021) An incentive mechanism for federated learning in wireless cellular network: an auction approach. IEEE Transactions on Wireless Communications (Early Access).
[14] (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324.
[15] (2021) Towards federated learning in UAV-enabled internet of vehicles: a multi-dimensional contract-matching approach. IEEE Transactions on Intelligent Transportation Systems.
[16] (2021) Decentralized edge intelligence: a dynamic resource allocation framework for hierarchical federated learning. IEEE Transactions on Parallel and Distributed Systems 33 (3), pp. 536–550.
[17] (2020) Hierarchical incentive mechanism design for federated machine learning in mobile networks. IEEE Internet of Things Journal 7 (10), pp. 9575–9588.
[18] (2020) Federated learning for 6G communications: challenges, methods, and future directions. China Communications 17 (9), pp. 105–118.
[19] (2020) Communication-efficient federated learning for wireless edge intelligence in IoT. IEEE Internet of Things Journal 7 (7), pp. 5986–5994.
[20] (2020) Joint auction-coalition formation framework for communication-efficient federated learning in UAV-enabled internet of vehicles. IEEE Transactions on Intelligent Transportation Systems 22 (4), pp. 2326–2344.
[21] (2018) A Stackelberg game approach toward socially-aware incentive mechanisms for mobile crowdsensing. IEEE Transactions on Wireless Communications 18 (1), pp. 724–738.
[22] (2019) Client selection for federated learning with heterogeneous resources in mobile edge. In 2019 IEEE International Conference on Communications (ICC), pp. 1–7.
[23] (2020) A crowdsourcing framework for on-device federated learning. IEEE Transactions on Wireless Communications 19 (5), pp. 3241–3256.
[24] (2020) Coded computing for low-latency federated learning over wireless edge networks. IEEE Journal on Selected Areas in Communications 39 (1), pp. 233–250.
[25] (2012) Iterated prisoner’s dilemma contains strategies that dominate any evolutionary opponent. Proceedings of the National Academy of Sciences 109 (26), pp. 10409–10413.
[26] (1982) Evolution and the theory of games. Cambridge University Press.
[27] (2019) Federated learning over wireless networks: optimization model design and analysis. In 2019 IEEE Conference on Computer Communications (INFOCOM), pp. 1387–1395.
[28] (2019) Adaptive federated learning in resource constrained edge computing systems. IEEE Journal on Selected Areas in Communications 37 (6), pp. 1205–1221.
[29] (2019) Edge computing security: state of the art and challenges. Proceedings of the IEEE 107 (8), pp. 1608–1631.
[30] (2019) ELFISH: resource-aware federated learning on heterogeneous edge devices. arXiv preprint arXiv:1912.01684.
[31] (2020) Age-based scheduling policy for federated learning in mobile edge networks. In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8743–8747.
[32] (2019) Scheduling policies for federated learning in wireless networks. IEEE Transactions on Communications 68 (1), pp. 317–333.
[33] (2020) Federated learning via over-the-air computation. IEEE Transactions on Wireless Communications 19 (3), pp. 2022–2035.
[34] (2020) Federated learning in vehicular edge computing: a selective model aggregation approach. IEEE Access 8, pp. 23920–23935.
[35] (2020) Energy-efficient radio resource allocation for federated edge learning. In 2020 IEEE International Conference on Communications Workshops (ICC Workshops), pp. 1–6.
[36] (2020) An incentive mechanism design for efficient edge learning by deep reinforcement learning approach. In IEEE INFOCOM 2020 - IEEE Conference on Computer Communications, pp. 2489–2498.