Nothing Wasted: Full Contribution Enforcement in Federated Edge Learning

The explosive growth of data generated at the network edge makes mobile edge computing an essential technology for supporting real-time applications, which call for the powerful data processing and analysis capabilities provided by machine learning (ML) techniques. In particular, federated edge learning (FEL) has become prominent in protecting the privacy of data owners by keeping the data used to train ML models local. Existing studies on FEL either rely on in-process optimization or remove unqualified participants in advance. In this paper, we enhance the collaboration among all edge devices in FEL to guarantee that the ML model is trained using all available local data, thereby accelerating the learning process. To that end, we propose a collective extortion (CE) strategy under an imperfect-information multi-player FEL game, which is proved effective in helping the server efficiently elicit the full contribution of all devices without suffering any economic loss. Technically, our proposed CE strategy extends the classical extortion strategy, which controls the proportionate share of expected utilities against a single opponent, to simultaneous and homogeneous control over a group of players, and further exhibits the attractive trait of being impartial toward all participants. Moreover, the CE strategy enriches the game-theoretic hierarchy, facilitating a wider application scope of the extortion strategy. Both theoretical analysis and experimental evaluations validate the effectiveness and fairness of our proposed scheme.




1 Introduction

The ubiquitous deployment of Internet-connected mobile devices has caused the amount of data generated at the network edge to increase exponentially, fostering the transformative computing paradigm of mobile edge computing [29]. According to a recent report, the global edge computing market was valued at $3.6 billion in 2020 and is anticipated to reach $15.7 billion by 2025 [6]. Facilitated by faster networking technologies such as 5G, edge computing is promising for supporting real-time applications, which calls for powerful data processing and analysis capability at the edge. Thanks to the explosive growth of artificial intelligence, edge computing has become more intelligent through implementing machine learning (ML) algorithms to achieve various functions such as classification and prediction.

However, since the data generated at edge devices may be highly sensitive to end users, it can be inappropriate to deploy conventional centralized ML algorithms, which need to physically collect all training data from the devices. Federated learning (FL), a representative of distributed ML, has become the most suitable paradigm for edge computing: the edge server and all connected devices collaboratively train the same ML model, and this paradigm is thus also termed federated edge learning (FEL) [12, 15]. More specifically, no device explicitly uploads its generated data in FEL, yet the data still contribute to training the shared ML model through iterative local learning, global aggregation, and updating [18].

Within this collaborative system, the most challenging yet critical issue is to guarantee that all participants cooperate faithfully. To fulfill this goal, two lines of research have been carried out, namely in-process [19, 8, 28, 38, 4, 33, 2, 27, 30, 24, 1, 35, 31, 32, 3] and in-advance [22, 10, 34, 37, 36, 23, 13] FEL optimization. The former improves FEL system performance by optimizing learning algorithms or communication configurations during the FEL process, while the latter achieves the desired performance by designing effective schemes to better establish and maintain the FEL system, avoiding inefficiency before the FEL process begins. Since taking precautions can strengthen the FEL system in advance, in-advance optimization is generally more cost-efficient than patching leaks during the working process. The state of the art accomplishes this objective via either device selection [22, 10, 34], which directly filters out unqualified devices, or incentive mechanism design [37, 36, 23], which relies on the strong assumption of perfect information in the Stackelberg game [21]. Nevertheless, in practice, there may not be enough devices to afford such elimination, and the devices may not have full knowledge about each other.

In this paper, we consider that an edge server and multiple devices collaborate in an FEL process repeatedly to optimize user experience in the long run. The server is the coordinator in charge of the whole FL process, while the devices contribute their local learning results and receive the globally trained model as compensation at the end of each FL round¹.

¹ We term a “round” in this paper as finishing a specific FL task and obtaining a well-trained ML model, rather than one iteration of local training in FL or an epoch in the traditional ML model training phase.

Within the whole FL process, the local training is only visible to and manageable by individual devices, leaving room for the selfish behavior of contributing perfunctorily to FEL by training the ML model on only part of the local dataset. To suppress this phenomenon, we model the interactions between the edge server and devices in an FEL system as a multi-player simultaneous game, where no player has perfect information about the others, and we aim at eliciting the full contribution of devices from the perspective of the server, rather than intolerantly eliminating misbehaving devices. However, the tight coupling of actions and utilities in this game creates a dilemma for the server, because recklessly changing her behavior can decrease her own utility. This raises a question: can the server entice full contributions from the devices without worrying about its utility loss?

To answer this question, we resort to the extortion scheme, first introduced as a special form of the zero-determinant (ZD) strategy [25]. By employing the extortion strategy, a player can independently control the ratio between its own expected utility and that of its opponent, which suggests its potential to help the server control the utilities when playing against devices. Nonetheless, the classical extortion strategy is derived for the two-player game and is not applicable to our problem involving multiple players; moreover, it is clearly inefficient to carry it out between the server and every device in a one-by-one manner. To address this challenge, we put forward a collective extortion (CE) strategy, which enables the server to control the overall utility of all devices with only a one-time setting. More importantly, we comprehensively analyze the potential of the proposed CE strategy to enforce the full cooperation of the devices, and further validate that it treats all players impartially with respect to utilities.

The main contributions are summarized as follows:

  • We model the interactions between the edge server and devices in FEL as a multi-player simultaneous game, based on which, for the first time, we derive the powerful CE strategy to efficiently control the relative utility proportion between the CE adopter and a group of opponents.

  • The proposed CE strategy can not only effectively suppress the selfish behaviors of devices in FEL via enforcing their full contributions, but also enrich the theoretical system of game theory through extending the original two-player extortion strategy to the multi-player situation, and thus enlarging its application scope.

  • We demonstrate the effectiveness and fairness of the proposed CE strategy on driving the full cooperation of the devices with both theoretical analysis and experimental evaluations, which benefits the long-term system stability and liveness.

The rest of this paper comprises the following six sections. Section 2 investigates the most related work on improving FEL performance, and Section 3 introduces our problem formulation. In Section 4, we derive the CE strategy for the multi-player situation, followed by an analysis of its potential to enforce the full contribution of the devices in Section 5. Experimental evaluations are presented in Section 6, and we conclude this paper in Section 7.

2 Related Work

Existing research focusing on enhancing the overall system performance of FEL can be classified into in-process and in-advance optimization, depending on whether the operation steps lie within the FEL process or before it.

For the in-process optimization, researchers have tried to improve FEL performance by designing advanced learning algorithms [19, 8, 28, 38, 4, 33, 2, 27, 30, 24] or optimizing communication configurations [1, 35, 31, 32, 3]. In [19], Mills et al. proposed an adaptive FedAvg algorithm based on Adam optimization, which overcomes the original FedAvg's slow convergence on the non-independent and identically distributed data generated in the Internet of Things (IoT). Considering the constrained resources of edge devices, Jiang et al. [8] proposed a scheme named PruneFL to adaptively adjust the model size, reducing training cost while maintaining accuracy comparable to the full model. To better control the global aggregation frequency in edge computing with limited resources, Wang et al. [28] theoretically analyzed the gradient descent convergence bound. Leveraging over-the-air computation, several studies [38, 4, 33, 2] achieved more efficient FL aggregation by taking advantage of the superposition of signals in the wireless multiple access channel. Tran et al. [27] considered the trade-off between computation and communication latency, and that between learning time and energy consumption, in FL for wireless networks by solving a non-convex optimization problem. To deal with the straggler concern in FEL, a framework named ELFISH was proposed in [30] to achieve resource-aware learning by dynamically masking computation-intensive neurons, while Prakash et al. designed CodedFedL [24] based on coded computing to inject structured redundancy into FL, compensating for the negative impact of straggling updates. On the other hand, aiming to facilitate FEL from the communication perspective, optimal resource allocation was investigated in [1, 35, 16] and various transmission scheduling policies were designed in [31, 32, 3, 20].

For the in-advance FEL performance optimization, several recent studies mainly focus on device selection [22, 10, 34, 11] and incentive mechanism design [37, 36, 23, 13, 17]. In [22], to achieve the best learning result, a novel protocol was devised to select qualified devices according to their computational resources and communication conditions. In [10], Kang et al. proposed a reputation-based mechanism for screening out reliable devices to obtain high-quality model updates in FEL using contract theory. To facilitate vehicular edge learning, selectively collecting good local model updates was considered in [34] using two-dimensional contract theory. Besides, Zhan et al. designed deep reinforcement learning (DRL) based incentive mechanisms for edge-based FL in [37, 36], where the optimal pricing strategy of the aggregator and the best contribution strategy of the participants can be derived based on the hierarchical Stackelberg game. While Pandey et al. solved the incentive problem in FL with communication-efficiency considerations using a crowdsourcing framework and a two-stage Stackelberg game for equilibrium analysis, Le et al. studied the incentive mechanism design for FL in wireless scenarios via an auction game.

Although the above studies exploit the power of taking precautions to enhance FEL performance, they either rigidly filter out unqualified devices or assume the availability of perfect information, which can be impractical since FEL rarely has a redundant number of participants or full mutual knowledge among players. To overcome these shortcomings, we model the interactions between the edge server and devices as a multi-player simultaneous game in which nobody has perfect knowledge of the others, and then design an effective CE strategy to enforce the full contribution of selfish devices with fairness guaranteed.

3 Problem Formulation

3.1 System Model

As illustrated in Fig. 1, we consider an FEL system consisting of one edge server and a set of edge devices. The system aims at providing better services to end users by conducting collaborative machine learning based on the data generated by all edge devices. Specifically, we assume that FEL is conducted in a round-by-round manner, where a round is defined as accomplishing a certain learning task with the objective of training a well-performing global ML model. Each device joins a round of FEL by contributing the locally learned results obtained through training the initial ML model on its local dataset for multiple iterations. As compensation, the server, who acts as the FEL coordinator, returns the final well-trained ML model to the participating devices once the current round of FEL finishes.

However, some devices may behave selfishly by using only part of his² local data to conduct the local training of the ML model, by which they can make extra profits, such as saving computational resources and using the rest of the data to further improve the final ML model only for themselves.

² We use “he” and “she” to represent any of the devices and the server, respectively.

This sort of selfish behavior is difficult for the server to detect and prevent in time for two reasons. First, the server has no access to the local datasets held by the devices and thus cannot directly learn their sizes or the devices' training efforts; second, the data distribution across devices in FEL is usually skewed, making it impossible to infer the size information indirectly, either. In this case, the server may behave strategically by choosing whether or not to return the final ML model to the devices, thus suppressing selfishness in an opportunistic way, which will be detailed in the next subsection.

For better understanding, we summarize major notations used in the following sections in Table I.

Fig. 1: The FEL system architecture.
TABLE I: Summary of Notations.
  • The action of the server playing against a given device
  • The action of any device
  • The utility of the server
  • The utility of any device
  • The profit of the server
  • The cost of the server sending the final model to a device
  • The error of the final model
  • The profit of a device
  • The extra income of a device using partial data

3.2 Game Formulation

It is clear that neither the server nor any device can know the others' actions when making their own decisions, which can be exactly modeled as a multi-player simultaneous game. Even though there are only two types of players, i.e., the edge server and the devices, multiple players are involved in the decision making and outcome observation of this game. In particular, the number of devices playing against the server in this FEL scenario can be large, and every device has his own preference in game strategy selection and operates with independent system parameters related to his benefits and costs.

Formally, we define the server's action of returning the final ML model to a device as cooperation (C) and the action of not sharing the well-trained ML model as defection (D). For a device, we regard the action of conducting local learning using the full local dataset in a round of FEL as cooperation (C), while the behavior of employing only partial local data for FEL training is viewed as defection (D). For clarity, we distinguish the action of the server playing against each device from the action of that device in this game; each of these actions takes a value in {C, D}.

It is worth noting that in the case of defection, the specific amount of data utilized by each device during the FEL process can differ from that of its peers. Here we treat any selfish behavior of not fully using the local data for model training as defection, no matter how severe or slight this action is. This qualitative treatment allows us to focus on eliminating devices' undesirable activities in the subsequent quantitative modeling and algorithm design sections.
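Since only the qualitative cooperate/defect distinction matters here, the action model can be sketched as follows; this is a hypothetical encoding for illustration, and the threshold-style mapping is an assumption rather than the paper's formal definition:

```python
from enum import Enum

class Action(Enum):
    C = "cooperate"  # server: return the final model / device: train on the full local dataset
    D = "defect"     # server: withhold the model / device: train on only part of the data

def device_action(fraction_used: float) -> Action:
    """Any partial use of local data counts as defection, however slight."""
    return Action.C if fraction_used >= 1.0 else Action.D
```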

Given the above actions, we define the utility function of each device in (1) as the weighted sum of two terms: the profit the device obtains depending on whether or not the server returns the final model, and the extra income the device can make by not fully using his local data to train the model, such as the spared computation, communication, and energy resource consumption.

The model-access profit under the server's cooperation is higher than under her defection, because the returned final model enables the device to provide more efficient service to the end user and thus increase user satisfaction, which can be regarded as a higher payoff for the device. Considering that the cooperative action of contributing to FEL with the full dataset leaves no extra room for the device to make more profit, we assume the extra income under cooperation is zero. For the selfish behavior of using only partial local data for training, with a percentage of the device's dataset contributed to FEL³, we define the extra income as proportional to the withheld remainder, scaled by a device-dependent positive constant indicating the heterogeneity of devices.

³ As this percentage relates to the personal preference of each device regarding being selfish, we assume it is relatively stable, not fluctuating drastically across game rounds, so that it can be approximately estimated by the edge server from historical behaviors.
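A concrete sketch of this utility structure follows; the exact symbols were not preserved here, so the names r_c, r_d, k, x, alpha, and beta are illustrative, not the paper's notation:

```python
def device_utility(server_cooperates: bool, device_cooperates: bool,
                   r_c: float, r_d: float, k: float, x: float,
                   alpha: float = 1.0, beta: float = 1.0) -> float:
    """Device utility = scaled model-access profit + scaled extra income.

    r_c / r_d : profit when the server does / does not return the final model (r_c > r_d)
    k         : device-dependent positive heterogeneity constant
    x         : fraction of the local dataset contributed to FEL
    The extra income is zero under full contribution and k * (1 - x) otherwise.
    """
    r = r_c if server_cooperates else r_d
    extra = 0.0 if device_cooperates else k * (1.0 - x)
    return alpha * r + beta * extra
```

For any parameter choice with k > 0 and x < 1, defection adds a strictly positive term, which is exactly the temptation analyzed in Section 3.3.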

Next, we define the utility of the server in (2) as a scaled profit term minus the model-returning cost: the profit the server gains from this round of FEL with the globally trained ML model, which depends on the action vector of the devices, minus the cost of sending the final trained model to each participating device. Since the final model returned to all devices is the same, the cost of sending it to every device is assumed identical as an example here⁴, i.e., the positive overall cost of the server divided evenly among the devices.

⁴ For different costs of sending the final model to different devices, the overall research methodology proposed in this paper can still be applied, although the derivation details may vary.
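A minimal sketch of this server-side utility under the uniform-cost assumption above; the names gamma and total_cost are illustrative stand-ins for the lost symbols:

```python
def server_utility(profit: float, server_cooperates: bool,
                   total_cost: float, gamma: float = 1.0) -> float:
    """Server utility = scaled FEL profit - model-returning cost.

    The cost of sending the same final model is split evenly across devices,
    so returning it to all of them incurs the full total_cost, while
    withholding it (defection) incurs none.
    """
    cost = total_cost if server_cooperates else 0.0
    return gamma * profit - cost
```

Note that for a fixed profit the cost term alone decides the comparison, which is the observation used in the dilemma analysis of Section 3.3.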

The profit the server obtains from the final model can be relatively complicated to depict and generally depends on the specific ML model trained in the FEL system. In this paper, taking a convolutional neural network (CNN) based classifier as an example, we describe the profit in (3) as a decreasing function of the classification error of the final trained model, which is jointly determined by the actions of all devices: the server's profit reaches its maximum as the error approaches zero, and becomes very small if the error is too large. Inspired by the power-law function proposed in [5, 9], we define an exemplary error in (4) as a power law of the effective training data size, i.e., the total amount of data actually contributed by the devices, with tuning scalars depicting the non-linear relationship: the larger the total data size used for training, the smaller the error. Combining (3) and (4), one can find that the fewer the defecting devices, the larger the effective global training dataset and the smaller the error, which results in a larger profit for the server. In the extreme case where all devices choose cooperation (or defection), the error reaches its minimum (or maximum), and accordingly the profit attains its maximum (or minimum).
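The power-law relationship can be sketched as below; mu and nu are illustrative tuning scalars, and each device's effective contribution is assumed to be its data size times the fraction it actually uses:

```python
def classification_error(data_sizes, fractions, mu=1.0, nu=0.5):
    """Power-law error of the final model: decreases as the effective
    training set (total contributed data) grows."""
    effective = sum(d * x for d, x in zip(data_sizes, fractions))
    return mu * effective ** (-nu)
```

With two devices holding 100 samples each, full contribution gives a strictly smaller error than half contribution, matching the monotonicity argued above.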

Note that for other ML model training tasks in FEL, different formulas may be needed to describe the profit function, but its main characteristic, namely that all-cooperation produces the maximum profit while all-defection leads to the minimum, will generally hold. Therefore, the overall analysis framework, as well as the subsequent full contribution enforcement scheme, can still work in a similar way.

Theorem 3.1.

The FEL system can form and function only when, for every device and for the server, the utility under all-cooperation strictly exceeds that under all-defection.

Proof. To ensure that such an FEL system comprising one server and multiple devices functions well, the basic requirement is that all-cooperation is more beneficial than all-defection for every player. Otherwise, there is not enough motivation for any device or the server to participate collaboratively in this FEL.

For any device, the all-cooperation utility is the scaled profit of receiving the final model, while the all-defection utility is the scaled profit without the model plus the extra income from the withheld data. The above requirement thus translates into the former strictly exceeding the latter.

Similarly, for the server, the utility with cooperative actions from all players is the scaled maximum profit minus the total model-returning cost, while all-defection results in the scaled minimum profit at no cost. Thus the FEL system requires the former to strictly exceed the latter. ∎
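The two viability conditions of Theorem 3.1 can be checked numerically under the assumed utility forms; all parameter names here are illustrative, mirroring the sketches above:

```python
def fel_system_viable(r_c, r_d, k, x, profit_max, profit_min, total_cost,
                      alpha=1.0, beta=1.0, gamma=1.0):
    """All-cooperation must strictly beat all-defection for every player.

    Device:  alpha*r_c                      >  alpha*r_d + beta*k*(1 - x)
    Server:  gamma*profit_max - total_cost  >  gamma*profit_min
    """
    device_ok = alpha * r_c > alpha * r_d + beta * k * (1.0 - x)
    server_ok = gamma * profit_max - total_cost > gamma * profit_min
    return device_ok and server_ok
```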

Based on the above definitions of utilities, we can formally define an FEL game as follows.

Definition 3.1 (FEL Game).

In the FEL system consisting of one server and multiple devices, their interactions regarding whether to return the final model and whether to fully contribute to the learning process can be defined as a normal-form game specified by the player set, the binary action space {C, D} of every player, and the utility functions defined above.

3.3 Dilemma in the FEL Game

In fact, there exists a defection dilemma in the FEL game, which can be summarized in the following theorem.

Theorem 3.2.

In the FEL game defined in Definition 3.1, defection (D) is the best action for any player.


Proof. For any rational player, the best action can be derived by comparing the utility values under cooperation and defection. For any device, the server's action clearly affects his utility, so the two cases can be considered separately. If the server cooperates, the device's model-access profit is fixed at its higher, cooperative value, while defecting adds a strictly positive extra-income term, so his best action is to defect. If the server defects, the model-access profit is fixed at its lower value, and the same positive extra-income term again makes defection the best action. In other words, no matter what action the server takes, the best action of the device is to defect.

Similarly, for the server, no matter what the action vector of the devices is, the only factor affecting her utility that she can control is her own action. Referring to (2), her utility is maximized only when the cost term becomes zero, which corresponds to defection. ∎

According to Theorem 3.2, the individually optimal action in the game between the server and the devices is always defection, which means that every device takes part in FEL using only a partial dataset and the server never shares the final well-trained model with any device. This is obviously harmful to the overall benefit of the FEL system, where the global model cannot be trained on all generated data, leading to reduced model performance. Thus, it becomes critical to resolve this all-defection dilemma. Here we consider the server to be in charge of driving the cooperation of the devices for two reasons. First, as the upper-level controller of the FEL system, the server hopes to obtain an optimal collaborative learning result, which motivates her to escape this undesired situation; second, as the coordinator, the server can punish defecting devices by not returning the final model, which indicates her capability to suppress such misbehavior.

To elicit full contributions from the devices, one intuitive solution for the server is to design cooperation incentive schemes, which usually cost more for the server to entice profit-driven devices. Thus, it is imperative to design a new scheme embedded in this multi-player game process that prevents any interest loss for the server. Referring to (2), one can observe that the utility of the server is collectively affected by the actions of all devices as well as her own. Thus, any reckless behavior change without a delicate plan would damage the server's interest, making it a critical challenge for the server to manage the behaviors of the devices without compromising her own utility. Inspired by [25], we find that the extortion mechanism, a type of zero-determinant (ZD) strategy, enables its adopter to unilaterally control a proportional relationship between the expected utilities of two players, which implies its potential to help solve the server's challenge.

However, the conventional extortion strategy was originally developed for the two-player game and is not directly applicable to our problem. Although one possible idea is to carry it out between the server and each device, this one-by-one method is clearly inefficient. Thus, we extend the extortion strategy to the multi-player scenario and name it the collective extortion (CE) strategy, which will be elaborated in the next section.

4 Collective Extortion Strategy

As mentioned above, the classical extortion strategy derived in the two-player game cannot effectively fit in the FEL game scenario. In this section, we extend the two-player extortion strategy to the multi-player version, namely the CE strategy, which can solve the defection dilemma in the FEL game without suffering from the inefficiency of directly implementing the extortion strategy for each device.

To be specific, we aim to enable the server to collectively control the overall utilities of all devices so as to further drive their cooperative behaviors, so we set the action of the server playing against all devices to be homogeneous. With n devices, our FEL game involves n+1 players, each choosing from the two actions C and D. Thus there exist 2^(n+1) possible game results in total, each being one combination of the n+1 players' actions.
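The outcome space can be enumerated directly; a small sketch with the server's action listed first:

```python
from itertools import product

def game_results(n: int):
    """All 2**(n+1) outcomes of one game round: the server's action first,
    followed by each of the n devices' actions ('C' or 'D')."""
    return list(product("CD", repeat=n + 1))
```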

In light of the conclusion in [25] that a short-memory player is not at a disadvantage compared to a long-memory one, we assume that both the server and the devices have one-step memory and select their actions based on the game result of the last round. Thus, one can introduce the definitions of their mixed strategies as follows.

Definition 4.1 (Mixed Strategy of the Server).

The server's mixed strategy is defined as the vector of her conditional probabilities of choosing cooperation, one for each possible game result of the last round.


Definition 4.2 (Mixed Strategy of a Device).

A device's mixed strategy is defined as the vector of his conditional probabilities of choosing cooperation, one for each possible game result of the last round.

Accordingly, the defection probability of the server or of any device after a given game result is one minus the corresponding cooperation probability. The Markov state transition matrix of this FEL game then contains one row and one column per game result, where each element is the probability of transiting from the previous game result to the current one. Since the players act independently given the last result, this element is the product of the individual action probabilities: the server contributes her conditional cooperation probability if she cooperates in the current result and her defection probability otherwise, and each device contributes the analogous factor according to his action.
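Under the one-step-memory assumption, the transition matrix is the product of the players' independent conditional action probabilities. A sketch follows; representing each mixed strategy as a dictionary keyed by the previous outcome tuple is an illustrative choice, not the paper's notation:

```python
import numpy as np
from itertools import product

def transition_matrix(p, qs):
    """Markov matrix over all 2**(n+1) outcomes of the FEL game.

    p      : server's cooperation probability given each previous outcome
    qs[i]  : device i's cooperation probability given each previous outcome
    Outcomes are tuples ('C'/'D' for the server, then one entry per device).
    """
    n = len(qs)
    states = list(product("CD", repeat=n + 1))
    M = np.zeros((len(states), len(states)))
    for i, prev in enumerate(states):
        for j, cur in enumerate(states):
            prob = p[prev] if cur[0] == "C" else 1.0 - p[prev]
            for d in range(n):
                q = qs[d][prev]
                prob *= q if cur[d + 1] == "C" else 1.0 - q
            M[i, j] = prob
    return states, M
```

Each row sums to one by construction, as required for a stochastic matrix.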

Then we define a non-negative vector v, whose entries sum to one, denoting the probability distribution over all possible game results in the stable state. With M denoting the transition matrix, when the Markov process reaches the stable state there exists vM = v, which equals vM′ = 0 with M′ = M − I and I denoting the unit matrix.

Let Adj(·) and det(·) be the adjugate and determinant operations on a matrix, respectively. According to Cramer's rule, there exists Adj(M′)M′ = det(M′)I = 0. Comparing this with the above equation, one can conclude that v is proportional to every row of Adj(M′). Accordingly, the dot product of v and any vector f can be proportionally calculated by a determinant, denoted as (5), in which one column of M′ is replaced by f, where “∝” represents the proportional relationship.

Next, in light of the fact that adding one column of a matrix to another does not change its determinant value, we conduct column transformations on the matrix in (5). More specifically, we first locate the column that corresponds to the game result of the server's cooperation and all devices' defection; after adding all the columns before it to this column, we obtain a new form of this column whose entries depend only on the mixed strategy of the server, and (5) can be rewritten accordingly.

Since this transformed column is only related to the strategy of the server, given any constant parameters, the server can adjust her strategy to make this column proportional to the last column of the matrix, so as to achieve the zero-determinant result in (6), because two proportional columns force the determinant to vanish.

In fact, the above proportional value can be converted to a real value by normalizing with respect to the dot product of v and the all-one vector. In particular, the expected utility of the server, and that of each device, can be calculated as the normalized dot product of v and the corresponding utility vector, where the utility vectors of the server and of each device follow the same ordering of game results and are calculated according to (2) and (1), respectively.
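The normalization step amounts to computing the stationary distribution of the transition matrix and dotting it with a payoff vector; a numerical sketch:

```python
import numpy as np

def stationary_distribution(M: np.ndarray) -> np.ndarray:
    """Probability vector v with v M = v (left eigenvector for eigenvalue 1)."""
    w, V = np.linalg.eig(M.T)
    v = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    return v / v.sum()

def expected_utility(M: np.ndarray, payoffs) -> float:
    """Long-run expected utility: stationary distribution dotted with the
    per-outcome payoff vector (the normalized value described above)."""
    return float(stationary_distribution(M) @ np.asarray(payoffs, dtype=float))
```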

Next, we can derive the CE strategy as follows.

Theorem 4.1.

By setting her mixed strategy to satisfy condition (7), the server can enforce an extortionate relationship between the expected utilities, stated in (8), in which her expected surplus over the all-cooperation payoff equals χ times the total expected surplus of the devices over theirs, with χ being the extortion factor.

Proof. Given the expressions of the expected utilities, the server can enforce a zero value for any linear combination of the expected payoffs based on (6). In particular, if the server hopes to realize an extortionate share of expected utilities relative to the all-cooperation payoffs, she can choose the linear combination whose vanishing is equivalent to the utility relationship in (8). Accordingly, the server's strategy needs to comply with condition (7). ∎

With a feasible strategy satisfying the above condition, the server can unilaterally ensure that her own expected utility difference from the all-cooperation utility is always a fixed multiple, namely the extortion factor, of the sum of all devices' expected utility differences from theirs. Based on this one-for-all feature, we name it the collective extortion (CE) strategy. In fact, CE not only expands the application scope of the original extortion strategy from the two-player game to the multi-player game, but is also effective in solving the problem of full contribution stimulation, as elaborated in the next section.
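As a sanity check of the underlying mechanism, the classical two-player special case can be verified numerically. The Press–Dyson extortion strategy below (prisoner's dilemma payoffs T=5, R=3, P=1, S=0, extortion factor 3, the example from [25]) pins a linear payoff relationship regardless of the opponent's memory-one strategy; the CE strategy generalizes exactly this kind of control to a group of opponents at once:

```python
import numpy as np

# Press–Dyson extortion strategy for the iterated prisoner's dilemma:
# cooperation probabilities conditioned on the last outcome (CC, CD, DC, DD),
# enforcing s_X - P = 3 * (s_Y - P) with P = 1 (extortion factor chi = 3).
P_EXT = np.array([11 / 13, 1 / 2, 7 / 26, 0.0])

def long_run_payoffs(p, q, T=5.0, R=3.0, P=1.0, S=0.0):
    """Stationary payoffs of two memory-one players X (strategy p) and Y
    (strategy q); outcomes are ordered (CC, CD, DC, DD) from X's view."""
    q = np.asarray(q, dtype=float)
    qv = q[[0, 2, 1, 3]]  # re-index q from Y's own viewpoint to X's
    M = np.zeros((4, 4))
    for s in range(4):
        px, py = p[s], qv[s]
        M[s] = [px * py, px * (1 - py), (1 - px) * py, (1 - px) * (1 - py)]
    w, V = np.linalg.eig(M.T)
    v = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    v = v / v.sum()
    return v @ np.array([R, S, T, P]), v @ np.array([R, T, S, P])
```

Whatever strategy q the opponent plays, the extorter's surplus over the punishment payoff stays exactly three times the opponent's.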

It is worth noting that the base values in the CE strategy, i.e., the all-cooperation utilities subtracted in (8), can be replaced by other values as long as the strategy has feasible solutions satisfying a condition similar to (7). For example, in the two-player game scenario, the original extortion strategy was proposed using the payoffs of the all-defection state as base values [25], with its feasibility analyzed accordingly, while [7] demonstrated that feasible base values range between the payoffs of all-defection and all-cooperation.

5 Full Contribution Enforcement Based on CE

As mentioned earlier, the server can enforce an extortionate relationship between her own expected utility and those of all devices by carefully designing a CE strategy. In this section, we further explore the potential of this strategy to stimulate full cooperation of the devices, thereby solving the problem defined in Section 3.

5.1 Feasibility of the CE Strategy

According to (7), one can get the server’s strategy as

Given a certain , its feasibility depends on the utility vectors of the server and the devices. Denote and , . Considering that , the constraints on the utility vectors vary in the following two cases:

Case 1: .

Case 2: .

5.2 Potential of the CE Strategy to Drive Devices’ Cooperation

Under a feasible CE strategy adopted by the server, one can analyze its potential to drive the devices to fully utilize their local datasets in FEL. To that end, we first assume that each device in the FEL game searches for the best-response strategy in an evolutionary manner, since the device lacks global game information, unlike the server who interacts with all devices. Here we assume that a device adjusts his strategy to improve his own utility, regardless of the strategy or utility of the server. Inspired by [26], we define the following strategy-evolving path for an evolutionary device (we discuss one device as a representative and thus omit the subscript for brevity), with denoting his cooperation probability at round ,


where refers to the expected utility of cooperation, and represents the total expected utility. With denoting the expected utility of defection, the total expected utility can be calculated by


Referring to the right-hand side of (9), note that the numerator is part of the denominator, ensuring .
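As a concrete illustration, the evolving path of (9)-(10) can be simulated with hypothetical utility values; the function and numbers below are illustrative placeholders, not the paper's actual payoff definitions:

```python
def evolve(p0, u_c, u_d, rounds=100):
    """Iterate the evolutionary update of (9): p_{t+1} = p_t * U_C / U_bar,
    where U_bar = p_t * U_C + (1 - p_t) * U_D per (10).
    u_c and u_d are hypothetical (positive) expected utilities of
    cooperation and defection."""
    p = p0
    history = [p]
    for _ in range(rounds):
        u_bar = p * u_c + (1 - p) * u_d  # total expected utility
        p = p * u_c / u_bar              # next-round cooperation probability
        history.append(p)
    return history

# When cooperation pays more than defection (U_C > U_D), p climbs toward 1.
traj_up = evolve(p0=0.2, u_c=3.0, u_d=2.0)
# When defection pays more (U_D > U_C), p decays toward 0.
traj_down = evolve(p0=0.8, u_c=2.0, u_d=3.0)
```

The update multiplies the cooperation odds by the constant ratio U_C / U_bar each round, which is exactly why U_C > U_bar is the condition for the cooperation probability to grow.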

To investigate whether the proposed CE strategy can drive full cooperation of the devices, we need to study the condition under which increases. According to (9), the cooperation probability of the device increases in the next round only when . Combining this with (10), we can derive the following sufficient condition for the CE strategy to make the device more cooperative:

for . In fact, in the case of , we have according to (10), and thus always equals 1, which requires no intervention from the CE strategy.

Therefore, in the following, we focus on the question: when , can the CE strategy elicit cooperation from the device? Referring to the sufficient condition derived above, this reduces to whether the CE strategy can lead to .

Recalling the power of the CE strategy presented in Section 4, the server’s strategy works on the whole set of devices according to (8) and (7). To study the effect of the CE strategy on any individual device, we consider two possible situations of devices in the FEL game:

  1. Devices are homogeneous using the same strategy and receiving the same utility;

  2. Devices are heterogeneous with various strategies and utilities.

Then, for both situations, we can demonstrate that the device tends to cooperate under the server’s CE strategy, as presented in the following theorems.

Theorem 5.1.

In the case of all devices using the same strategy and receiving the same utility, the server utilizing the CE strategy can drive any evolutionary device to the cooperation probability .


For situation S1, where all devices involved in FEL are homogeneous, since every device uses the same strategy and the server applies one uniform strategy to all of them, we study the cooperation probability of an arbitrary device as a representative. According to (8), we can derive the expected utility of the device as


Next, we analyze the expected utilities of the evolutionary device under different actions, i.e., and . More specifically, when the device cooperates, the server’s expected utility depends on her own action, where leads to while results in according to (2). Based on equation (11) above, the server’s expected utility yields two possible payoffs for the device:

Assuming that the cooperation probability of the server at round is , the expected cooperation payoff of the device can be calculated by


When the device chooses the action , the expected utility of the server becomes and for and , respectively. According to (11), the device’s payoff can be

Thus, the expected defection payoff of the device is


Since , we clearly have and , which leads to by comparing the two expected payoffs above, concluding the proof of the theorem. ∎

Theorem 5.2.

In the case of all devices with different strategies and utilities, the server’s CE strategy can drive an evolutionary device to get .


Given the heterogeneous devices in situation S2, to focus on the behavior of any one specific device , we assume that the strategies of the other devices are fixed, so their expected utilities are also fixed values. To comply with (8), we denote

Then the expected utility of in this case becomes

Similar to the proof of Theorem 5.1, we can calculate according to , where

For the calculation of , we have

For the same reason that , we obtain and in this situation as well, resulting in , which leads to the gradual increase of until it approaches 1. ∎

From the above two theorems, we can conclude that the CE strategy can theoretically incentivize the eventual cooperation of any evolutionary device involved in the FEL game, in both homogeneous and heterogeneous device settings. In other words, devices can be driven to participate in the FEL process, fully using their local datasets and contributing to the global learning without reservation.

5.3 Fairness of the CE Strategy

Given the strong force of the CE strategy in stimulating devices’ collaboration, one may wonder what happens if the server behaves defectively by not returning the final well-trained model to the devices, saving the sending cost to obtain a higher utility. This question is investigated in detail below.

According to Theorems 5.1 and 5.2, the devices’ final actions become cooperation as the number of game rounds increases. The server, however, can still select her action from and . According to the following theorem, the best action for a server adopting the CE strategy to maintain long-term stability is to eventually choose .

Theorem 5.3.

The final action of the server adopting the CE strategy is cooperation.


After a sufficient number of FEL game rounds, the devices eventually choose cooperation. Then the server’s cooperation brings a cooperative device the utility under the game result , while her defection makes the cooperative device obtain the utility under the game result .

Referring to (8), one can see that a cooperative server forming the game state keeps it stable, since the right-hand side becomes zero with in the long run. However, if the server defects constantly, the right-hand side of (8) becomes negative, because the device’s utility in this case is , which is less than due to , and thus . This is clearly unfavorable for a rational server, so her best action in the long run is also cooperation. ∎

Based on the above theorem, we can conclude that our proposed CE strategy is fair for all players: it results in full cooperation and brings the same level of utility to the server and the devices.

6 Experimental Evaluation

In this section, we conduct a series of experiments to demonstrate the effectiveness of the proposed CE strategy in eliciting full cooperation from all devices in the FEL game, along with the other attractive features discussed in the previous section. The machine used for the simulation experiments is a desktop computer with a 3.59 GHz 6-core processor and 16 GB of memory. In all experiments, we fix the number of devices. Scalar parameters for devices are randomly set following uniform distributions.

For the server, the parameter values independent of the ML model are first set as . To appropriately set the parameters related to the profit function , which closely depends on the specific ML task, we utilize the MNIST database [14], using 6,000 data samples to train a 2-layer CNN classifier, where each device is assumed to generate 750 samples in a non-iid manner. The obtained fitting parameters in (4) are and with 95% confidence, and . Further, we fix in (3) and obtain the extreme values of as and . Note that we also test other sets of parameter values satisfying the requirements in Theorem 3.1 and Section 5.1, but obtain similar results, which are omitted for brevity. Besides, each experiment is repeated 20 times and the average is taken for statistical confidence.

6.1 Effectiveness of the CE Strategy to Enforce Full Cooperation

To figure out whether the proposed CE strategy adopted by the server can enforce full cooperation from any evolutionary device, as theoretically proved in Section 5.2, we compare it with four classical strategies, namely ALLC (all cooperation), ALLD (all defection), TFT (tit-for-tat), and WSLS (win-stay-lose-shift). The first two are straightforward: the server stays constantly cooperative or defective, respectively. Under TFT, the server mirrors the device’s previous action, while under WSLS the server keeps choosing an action as long as it brings a high utility and switches to the other action otherwise.
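The four baseline server strategies can be sketched as memory-one rules mapping the previous round to the next action (1 = cooperate, 0 = defect); the function names and the WSLS payoff threshold below are our own illustrative choices, not the paper's implementation:

```python
def allc(prev_own, prev_device, prev_payoff):
    """ALLC: always cooperate, regardless of history."""
    return 1

def alld(prev_own, prev_device, prev_payoff):
    """ALLD: always defect, regardless of history."""
    return 0

def tft(prev_own, prev_device, prev_payoff):
    """TFT: mirror the device's previous action."""
    return prev_device

def wsls(prev_own, prev_device, prev_payoff, threshold=2.0):
    """WSLS: keep the previous action if it earned at least `threshold`
    (win-stay), otherwise switch (lose-shift)."""
    return prev_own if prev_payoff >= threshold else 1 - prev_own
```

Each rule consumes the same triple (own previous action, device's previous action, own previous payoff), so a round-robin tournament against an evolutionary device only needs to swap the strategy function.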

Taking the first device as an example, we report the comparative results in Fig. 2, where his initial cooperation probability varies as to indicate the robustness of our proposed CE strategy. No matter how cooperative the device is at the beginning, the server adopting the CE strategy can elicit his final cooperation. As increases, the time needed to reach the stable state decreases, because the more cooperative the device, the easier it is to drive his full cooperation. In contrast, the other strategies fail to achieve this goal, as all of them drive the cooperation probability toward zero.

(a) .
(b) .
(c) .
(d) .
Fig. 2: Cooperation probability dynamics of the evolutionary device given different strategies adopted by the server.

6.2 Fairness of the CE Strategy

Next, we explore whether the CE strategy is fair for both the server and the devices. We compare their utilities at the stable state in five cases where the server adopts different strategies. Specifically, we set the initial cooperation probability of a device as in this experiment and present the results in Fig. 3. It is worth noting that since the utilities of the server and the device differ in value by the definitions in (1) and (2), we use a metric termed relative utility, calculated as the ratio of the actual utility to the utility at the all-cooperation state, to study the fairness of each strategy.

According to Fig. 3, only the proposed CE strategy achieves almost the same relative utility level for both the server and the device, approximately equal to 1, indicating that both obtain a stable utility equivalent to the all-cooperation utility, i.e., and . This clearly demonstrates the fairness of the CE strategy in incentivizing the full cooperation of all devices. Among the other cases, the ALLC strategy makes the server suffer a severe loss, since the evolutionary device can strategically exploit her friendliness and behave defectively to obtain a higher utility. The ALLD and TFT strategies lead to similar results, where the server gains slightly less than the device: the server cannot be fully exploited under ALLD and TFT, but the device is not driven to cooperate either, so both obtain less profit than when the server adopts the CE strategy. The WSLS strategy also leaves the server with less utility, though it performs better than ALLC.

Fig. 3: Stable relative utilities of the server and the device given different strategies adopted by the server.

Knowing that the CE strategy leads to full cooperation of any evolutionary device and achieves almost the same level of stable utilities for both sides, we next investigate how the utilities change over time. In Figs. 4 and 5, we first plot both utilities at the stable state for four initial device cooperation probabilities, and then depict the dynamic change of the utilities in each round, with each subfigure reflecting one value of . We observe that makes no difference to the stable utilities, as shown in the bar graph, while the utility dynamics vary with the device’s initial cooperation probability. More specifically, as increases, the utilities of both sides converge faster. In other words, the more cooperative the devices at the beginning, the sooner they reach the stable state, consistent with the evolution of the cooperation probability presented in Fig. 2.
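The observation that a larger initial cooperation probability yields faster convergence can be reproduced with the evolutionary update of Section 5.2 under hypothetical utility values (the numbers below are illustrative, not the paper's fitted parameters):

```python
def rounds_to_converge(p0, u_c=3.0, u_d=2.0, tol=0.99, max_rounds=10000):
    """Count game rounds until the cooperation probability exceeds `tol`
    under the evolutionary update p <- p * U_C / (p * U_C + (1 - p) * U_D).
    The utilities u_c > u_d are hypothetical placeholders."""
    p, t = p0, 0
    while p < tol and t < max_rounds:
        p = p * u_c / (p * u_c + (1 - p) * u_d)
        t += 1
    return t

# A more cooperative start (larger p0) needs fewer rounds to stabilize.
counts = [rounds_to_converge(p0) for p0 in (0.2, 0.4, 0.6, 0.8)]
```

Since the update multiplies the cooperation odds by the fixed ratio u_c / u_d each round, a larger starting probability simply has less distance to cover, so the round count decreases monotonically in p0.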

Fig. 4: Relative utilities of the server and the device at the stable state.
(a) .
(b) .
(c) .
(d) .
Fig. 5: Relative utilities of the server and the device in dynamic.

Further, we study the utility dynamics when the server adopts the four other classical strategies and report the results in Fig. 6. The four classical strategies produce different utility evolution paths, especially at the beginning, but all converge to a stable result in which the server obtains less utility than the device, failing to meet the server’s expectation.

(a) ALLC.
(b) ALLD.
(c) TFT.
(d) WSLS.
Fig. 6: Dynamic relative utilities of the server and the device given different strategies adopted by the server.

6.3 Impacts of the Extortion Factor

As observed in Section 4, the extortion factor in (8) plays an important role in determining the degree of utility difference between the server and all devices. To uncover the impact of on the FEL game, we investigate the evolution of the cooperation probability of any device and the corresponding utility dynamics under different extortion factors, where the initial cooperation probability of the device is set as . Detailed results are reported in Figs. 7 and 8, respectively.

According to Fig. 7, the higher the extortion factor, the longer it takes for the device to become fully cooperative. Taking the case of as an example, the convergence round for reaching is about 10, while for , the cooperation probability converges to 1 only after 50 rounds. This phenomenon suggests that even though the server can relatively dominate the FEL game using the CE strategy, it is unwise for her to enforce severely imbalanced expected utilities, since eliciting cooperation from the devices can then take a long time.

With respect to the impact of on the utilities of the server and the device, Fig. 8 offers some clues. Although the specific evolution paths of the instant utilities differ with varying , the stable results are the same: each player obtains the utility of mutual cooperation. This implies that the extortion factor in the CE strategy has little impact on the utilities each player obtains at the stable state. The underlying reason is that the powerful CE strategy can drive the device to fully collaborate given any , leading to mutual cooperation and thus the same level of relative utilities for all players. This consequence is also consistent with the fairness of the CE strategy presented earlier.

Fig. 7: Cooperation probability dynamics of the evolutionary device for different .
(a) .
(b) .
(c) .
(d) .
Fig. 8: Relative utilities of the server and the device in dynamic with varying .

7 Conclusion

In this paper, we investigate the problem of optimizing FEL system performance by eliminating selfish device behavior. Specifically, we model the interactions between the edge server and the devices as a multi-player simultaneous game, based on which we derive a CE strategy to collectively control the proportional relationship between the utility of the server and that of the devices. With this CE strategy, the server can efficiently enforce the full contribution of all devices without worrying about her own utility, as both theoretically analyzed and experimentally evaluated. Essentially, the proposed CE strategy is impartial for both the adopter and the opponents, indicating its ability to maintain the stability of FEL systems.

In the future, we plan to examine the efficiency and scalability of the proposed game-theoretic scheme in playing against selfish devices in FEL. Besides, we will explore more intelligent solutions for countering other malicious device behaviors in FEL, considering devices dynamically joining and leaving the learning process to capture more realistic scenarios.


  • [1] M. S. H. Abad, E. Ozfatura, D. Gunduz, and O. Ercetin (2020) Hierarchical federated learning across heterogeneous cellular networks. In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8866–8870. Cited by: §1, §2.
  • [2] J. Ahn, O. Simeone, and J. Kang (2019) Wireless federated distillation for distributed edge learning with heterogeneous data. In 2019 IEEE 30th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), pp. 1–6. Cited by: §1, §2.
  • [3] M. M. Amiri, D. Gündüz, S. R. Kulkarni, and H. V. Poor (2020) Update aware device scheduling for federated learning at the wireless edge. In 2020 IEEE International Symposium on Information Theory (ISIT), pp. 2598–2603. Cited by: §1, §2.
  • [4] M. M. Amiri and D. Gündüz (2020) Machine learning at the wireless edge: distributed stochastic gradient descent over-the-air. IEEE Transactions on Signal Processing 68, pp. 2155–2169. Cited by: §1, §2.
  • [5] I. Chen, F. D. Johansson, and D. Sontag (2018) Why is my classifier discriminatory?. In Advances in Neural Information Processing Systems, pp. 3539–3550. Cited by: §3.2.
  • [6] Edge computing market. Note: /Market-Reports/edge-computing-market-133384090.html. Accessed: 2020-07-30. Cited by: §1.
  • [7] D. Hao, K. Li, and T. Zhou (2018) Payoff control in the iterated prisoner’s dilemma. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pp. 296–302. Cited by: §4.
  • [8] Y. Jiang, S. Wang, B. J. Ko, W. Lee, and L. Tassiulas (2019) Model pruning enables efficient federated learning on edge devices. arXiv preprint arXiv:1909.12326. Cited by: §1, §2.
  • [9] M. Johnson, P. Anderson, M. Dras, and M. Steedman (2018) Predicting accuracy on large datasets from smaller pilot data. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 450–455. Cited by: §3.2.
  • [10] J. Kang, Z. Xiong, D. Niyato, S. Xie, and J. Zhang (2019) Incentive mechanism for reliable federated learning: a joint optimization approach to combining reputation and contract theory. IEEE Internet of Things Journal 6 (6), pp. 10700–10714. Cited by: §1, §2.
  • [11] J. Kang, Z. Xiong, D. Niyato, Y. Zou, Y. Zhang, and M. Guizani (2020) Reliable federated learning for mobile networks. IEEE Wireless Communications 27 (2), pp. 72–80. Cited by: §2.
  • [12] L. U. Khan, S. R. Pandey, N. H. Tran, W. Saad, Z. Han, M. N. Nguyen, and C. S. Hong (2020) Federated learning for edge networks: resource optimization and incentive mechanism. IEEE Communications Magazine 58 (10), pp. 88–93. Cited by: §1.
  • [13] T. H. T. Le, N. H. Tran, Y. K. Tun, M. N. Nguyen, S. R. Pandey, Z. Han, and C. S. Hong (2021) An incentive mechanism for federated learning in wireless cellular network: an auction approach. IEEE Transactions on Wireless Communications (Early Access). Cited by: §1, §2.
  • [14] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. Cited by: §6.
  • [15] W. Y. B. Lim, J. Huang, Z. Xiong, J. Kang, D. Niyato, X. Hua, C. Leung, and C. Miao (2021) Towards federated learning in uav-enabled internet of vehicles: a multi-dimensional contract-matching approach. IEEE Transactions on Intelligent Transportation Systems. Cited by: §1.
  • [16] W. Y. B. Lim, J. S. Ng, Z. Xiong, J. Jin, Y. Zhang, D. Niyato, C. Leung, and C. Miao (2021) Decentralized edge intelligence: a dynamic resource allocation framework for hierarchical federated learning. IEEE Transactions on Parallel and Distributed Systems 33 (3), pp. 536–550. Cited by: §2.
  • [17] W. Y. B. Lim, Z. Xiong, C. Miao, D. Niyato, Q. Yang, C. Leung, and H. V. Poor (2020) Hierarchical incentive mechanism design for federated machine learning in mobile networks. IEEE Internet of Things Journal 7 (10), pp. 9575–9588. Cited by: §2.
  • [18] Y. Liu, X. Yuan, Z. Xiong, J. Kang, X. Wang, and D. Niyato (2020) Federated learning for 6g communications: challenges, methods, and future directions. China Communications 17 (9), pp. 105–118. Cited by: §1.
  • [19] J. Mills, J. Hu, and G. Min (2020) Communication-efficient federated learning for wireless edge intelligence in iot. IEEE Internet of Things Journal 7 (7), pp. 5986–5994. Cited by: §1, §2.
  • [20] J. S. Ng, W. Y. B. Lim, H. Dai, Z. Xiong, J. Huang, D. Niyato, X. Hua, C. Leung, and C. Miao (2020) Joint auction-coalition formation framework for communication-efficient federated learning in uav-enabled internet of vehicles. IEEE Transactions on Intelligent Transportation Systems 22 (4), pp. 2326–2344. Cited by: §2.
  • [21] J. Nie, J. Luo, Z. Xiong, D. Niyato, and P. Wang (2018) A stackelberg game approach toward socially-aware incentive mechanisms for mobile crowdsensing. IEEE Transactions on Wireless Communications 18 (1), pp. 724–738. Cited by: §1.
  • [22] T. Nishio and R. Yonetani (2019) Client selection for federated learning with heterogeneous resources in mobile edge. In 2019 IEEE International Conference on Communications (ICC), pp. 1–7. Cited by: §1, §2.
  • [23] S. R. Pandey, N. H. Tran, M. Bennis, Y. K. Tun, A. Manzoor, and C. S. Hong (2020) A crowdsourcing framework for on-device federated learning. IEEE Transactions on Wireless Communications 19 (5), pp. 3241–3256. Cited by: §1, §2.
  • [24] S. Prakash, S. Dhakal, M. R. Akdeniz, Y. Yona, S. Talwar, S. Avestimehr, and N. Himayat (2020) Coded computing for low-latency federated learning over wireless edge networks. IEEE Journal on Selected Areas in Communications 39 (1), pp. 233–250. Cited by: §1, §2.
  • [25] W. H. Press and F. J. Dyson (2012) Iterated prisoner’s dilemma contains strategies that dominate any evolutionary opponent. Proceedings of the National Academy of Sciences 109 (26), pp. 10409–10413. Cited by: §1, §3.3, §4, §4.
  • [26] J. M. Smith and J. M. M. Smith (1982) Evolution and the theory of games. Cambridge university press. Cited by: §5.2.
  • [27] N. H. Tran, W. Bao, A. Zomaya, N. M. NH, and C. S. Hong (2019) Federated learning over wireless networks: optimization model design and analysis. In 2019 IEEE Conference on Computer Communications (INFOCOM), pp. 1387–1395. Cited by: §1, §2.
  • [28] S. Wang, T. Tuor, T. Salonidis, K. K. Leung, C. Makaya, T. He, and K. Chan (2019) Adaptive federated learning in resource constrained edge computing systems. IEEE Journal on Selected Areas in Communications 37 (6), pp. 1205–1221. Cited by: §1, §2.
  • [29] Y. Xiao, Y. Jia, C. Liu, X. Cheng, J. Yu, and W. Lv (2019) Edge computing security: state of the art and challenges. Proceedings of the IEEE 107 (8), pp. 1608–1631. Cited by: §1.
  • [30] Z. Xu, Z. Yang, J. Xiong, J. Yang, and X. Chen (2019) Elfish: resource-aware federated learning on heterogeneous edge devices. arXiv preprint arXiv:1912.01684. Cited by: §1, §2.
  • [31] H. H. Yang, A. Arafa, T. Q. Quek, and H. V. Poor (2020) Age-based scheduling policy for federated learning in mobile edge networks. In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8743–8747. Cited by: §1, §2.
  • [32] H. H. Yang, Z. Liu, T. Q. Quek, and H. V. Poor (2019) Scheduling policies for federated learning in wireless networks. IEEE transactions on communications 68 (1), pp. 317–333. Cited by: §1, §2.
  • [33] K. Yang, T. Jiang, Y. Shi, and Z. Ding (2020) Federated learning via over-the-air computation. IEEE Transactions on Wireless Communications 19 (3), pp. 2022–2035. Cited by: §1, §2.
  • [34] D. Ye, R. Yu, M. Pan, and Z. Han (2020) Federated learning in vehicular edge computing: a selective model aggregation approach. IEEE Access 8, pp. 23920–23935. Cited by: §1, §2.
  • [35] Q. Zeng, Y. Du, K. Huang, and K. K. Leung (2020) Energy-efficient radio resource allocation for federated edge learning. In 2020 IEEE International Conference on Communications Workshops (ICC Workshops), pp. 1–6. Cited by: §1, §2.
  • [36] Y. Zhan and J. Zhang (2020) An incentive mechanism design for efficient edge learning by deep reinforcement learning approach. In IEEE INFOCOM 2020 - IEEE Conference on Computer Communications, pp. 2489–2498. Cited by: §1.