I Introduction
In recent years, the Internet of Things (IoT) has grown rapidly, and billions of smart devices are expected to be added over the next few years. These devices generate a tremendous amount of data, from health information [celik2018soteria] to social networking [zeng2017end]. Deep learning models use this data for training and for enhancing the intelligence of various data-driven IoT applications. Most IoT devices connect to a central cloud platform to use cloud services. These cloud services are crucial for storing the datasets and for model learning. However, the use of cloud services adds latency to real-time applications. To overcome this issue, edge devices are used for local training, which also safeguards the privacy of personal data. Unlike constrained IoT devices, such edge devices have the capability to support Machine Learning (ML) models and have been used in various applications. For example, a video doorbell performs training on its local dataset and identifies the person at the door.
The performance of a deep learning model is closely tied to the size of its training dataset. Under a reasonable learning mechanism, more training data will enhance the accuracy and performance of the trained model. However, in the era of big data, data is often distributed and cannot be brought together due to personal privacy constraints. Collaborative Deep Learning (CDL) allows multiple IoT devices to train their models without revealing the associated personal data. CDL offers an attractive tradeoff between privacy and utility of datasets. Recent research [jiang2019lightweight, chen2019communication] has discussed the privacy issues of local training devices and the impact of communication latency between IoT edge devices and the Parameter Server (PS). However, the strategic behavior of rational local training devices has not been discussed in previous research, i.e., the authors have assumed that all IoT devices are altruistic. Altruistic devices always follow a suggested protocol (the protocol all devices initially agreed to follow), regardless of whether they benefit or lose by following it. However, real devices are not altruistic; they are rational. Rational devices deviate from the suggested protocol if they believe they will benefit more by following a different one. In our proposed system model, we assume that all mobile edge devices are rational.
A mobile edge device that has low-quality data always wants to be part of CDL to increase the accuracy of its local model. Other mobile edge devices, which hold high-quality data, do not want to collaborate with the low-quality data holder. Therefore, mobile edge devices face a dilemma over whether to participate in CDL. In this paper, we address this research problem of the learner's dilemma by proposing a general system model, a CDL game model, and a novel cluster-based fair strategy that enables each participant to cooperate in CDL, based on the clusters formed, so as to gain an overall benefit in training its local ML model. We also evaluate our CDL game model and the novel cluster-based strategy in a smart home deployment using the ARAS dataset [SmartHome]. The main contributions of this paper are as follows.

We identify the problem of unfair cooperation among participants in CDL: a local training device with low-quality data builds its learning model by taking advantage of other devices that hold high-quality data.

We address this research problem by analyzing the behavior of mobile edge devices using a game-theoretic model, in which each device aims at maximizing the accuracy of its local model with minimal cost of participating in CDL.

We introduce a system model for CDL and propose a solution to the problem defined above.

We also implement a cluster-based fair algorithm on the ARAS dataset [SmartHome], and the results show that the proposed solution elicits cooperation in CDL.
The rest of the paper is organized as follows. Section II presents relevant work and related background. The system model, along with the rationality assumption, is discussed in Section III. The game model and game analysis are explained in Sections IV and V, respectively. Section VI presents the implementation of the proposed system model along with results. Section VII concludes the paper with future research directions.
II Related Work
In this section, we describe related work on information leakage in deep learning models, privacy-preserving deep learning, and game models.
II-A Information Leakage on Deep Learning Models
Information leakage of individuals’ private data has become a well-known problem for deep learning models. Data masking techniques, such as pseudonymization and anonymization, are used to mitigate this problem. With pseudonymization, data can be traced back to its original state, whereas with anonymization this becomes impossible. However, indirect re-identification may still be possible with anonymized data. For example, Netflix released a hundred million anonymized film ratings, which were later matched against another dataset, the Internet Movie Database (IMDb).
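The difference between the two masking techniques can be sketched in a few lines (a toy illustration; the token scheme, salt, and example identifier are our own, not from any cited system):

```python
import hashlib

pseudonym_table = {}   # kept by the data holder; makes the mapping reversible

def pseudonymize(user_id):
    # reversible: the token-to-identity mapping is stored
    token = f"user_{len(pseudonym_table)}"
    pseudonym_table[token] = user_id
    return token

def anonymize(user_id):
    # one-way: only a salted hash remains, and no mapping is kept
    return hashlib.sha256(b"some-salt:" + user_id.encode()).hexdigest()[:12]

token = pseudonymize("alice@example.com")
recovered = pseudonym_table[token]        # traced back to its original state
digest = anonymize("alice@example.com")   # cannot be traced back from digest
```

Note that even the anonymized digest can be linked across datasets if the same identifier is hashed elsewhere, which is exactly the kind of indirect re-identification the Netflix example exhibits.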
Cloud platforms such as Google and Amazon offer various “AI Deep Learning” services. Any customer can upload a dataset and pay to build a prediction model, which works as a black-box API. Membership inference attacks on black-box APIs are discussed in [shokri2017membership, yeom2018privacy]: an attacker queries the target model and receives the model’s predictions. Rahman et al. [rahman2018membership] show that a differentially private deep model can also fail against a membership inference attack. A novel white-box membership inference attack against deep learning algorithms was proposed by Nasr et al. [nasr2018comprehensive] to measure the membership leakage of their training datasets. Melis et al. [melis2019exploiting] demonstrate that the updated parameters leak information about participants, and develop passive and active inference attacks to exploit this leakage.
II-B Privacy-Preserving Deep Learning
Each participant has its own sensitive dataset, which needs to be protected from information leakage. Various privacy mechanisms, such as Secure Multiparty Computation (SMC) [kerschbaum2009practical], Homomorphic Encryption (HE) [rivest1978data], and Differential Privacy (DP) [dwork2014algorithmic], have been proposed to protect the datasets in CDL. SMC helps to protect the intermediate steps of a computation. Mohassel et al. [mohassel2017secureml] adopt a two-server model for privacy-preserving training, as used by previous work on privacy-preserving deep learning via SMC [nikolaenko2013privacy1].
However, Aono et al. [aono2018privacy] pointed out that local data may actually be leaked to an honest-but-curious server. Additively homomorphic encryption techniques fix several of these problems but also have drawbacks. To obscure an individual’s identity, DP adds mathematical noise to a small sample of the individual’s usage pattern. Prior work [abadi2016deep, jiang2019lightweight, shokri2015privacy, weng2018deepchain] uses differential privacy in privacy-preserving CDL systems to protect the privacy of training data. However, Hitaj et al. [hitaj2017deep] pointed out that the privacy-preserving deep learning approach fails to protect data privacy, and demonstrated that a malicious participant can learn personal information about other participants through Generative Adversarial Network (GAN) learning.
The most dominant technique to optimize the loss function is Stochastic Gradient Descent (SGD), a method for finding the optimal parameter configuration for an ML algorithm. SGD is applied in various privacy-preserving deep learning models [abadi2016deep, melis2019exploiting, mohassel2017secureml, nasr2018comprehensive]. The PS receives the gradients from mobile edge devices using different approaches, such as round robin, random order [shokri2015privacy], cosine distance [chen2018machine], and time-based selection [weng2018deepchain]. The server aggregates the received parameters using the Federated Averaging algorithm [mcmahan2016communication] or a weighted aggregation strategy [chen2018machine]. The Federated Averaging algorithm, introduced for model averaging, combines local SGD on each client with server-side averaging. It is robust to unbalanced and non-IID data distributions, and reduces the number of communication rounds needed to train a Deep Learning (DL) model.

II-C Game Models
In prior academic research, game theory has been applied to data privacy to analyze privacy and accuracy. Pejo et al. [pejo2018price] defined a two-player game in which one player is privacy-concerned and the other is not. Esposito et al. [esposito2018securing] proposed a game model to analyze the interaction between a provider (global ML model) and a requester (local ML model) within a CDL model. In the literature, there have been various game models addressing the privacy-accuracy tradeoff and energy-efficient solutions. However, to the best of our knowledge, there is no prior work that utilizes game theory to analyze mobile edge devices’ rational behavior in a selfish environment. Therefore, we construct a game model for rational mobile edge devices in CDL and analyze the game.
III System Model
We first generically outline the details of the CDL model, in which all edge computing devices, such as mobile phones and IoT devices, are assumed to be altruistic. Then, we clarify the rationality assumptions for mobile edge devices in the CDL model.
III-A Collaborative Deep Learning Model
Figure 1 illustrates the main components of the system model. Consider N mobile edge devices, where each mobile edge device is connected to multiple IoT devices. These IoT devices generate a huge amount of data, which is used for training to build the ML model. The devices train on their data to build local models in a collaborative manner without compromising data privacy, which is beneficial for both mobile edge devices and IoT devices. In our model, we assume that each mobile edge device i maintains a local vector of neural-network parameters w_i. The PS maintains a separate parameter vector w^{global}. Each edge device can initialize its parameters (weights) w_i, where i = 1, 2, 3, ..., N, randomly or by downloading their latest values from the PS. Each edge device trains a local model and optimizes the loss value using SGD, where one sample is selected at random in each optimization step. This process continues until SGD converges to a local optimum. Let E be the error or loss value, i.e., the difference between the true value of the objective function and the computed output of the network; it can be based on a norm or on cross-entropy. The backpropagation algorithm computes the partial derivative of E with respect to each parameter in w_i and updates the parameter so as to reduce its gradient. We refer to one full iteration over all available input data as an epoch. All mobile edge devices train their local models simultaneously through the PS.
w_i ← w_i − α · ∂E/∂w_i    (1)
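The local training loop described above can be sketched as follows (a minimal illustration with a one-parameter linear model and squared-error loss; the model, data, and learning rate are illustrative choices, not the paper's setup):

```python
import random

def sgd_epoch(w, data, lr=0.01):
    """One epoch of local SGD for a toy linear model y = w * x."""
    random.shuffle(data)           # SGD: samples visited in random order
    for x, y in data:
        # loss E = (w*x - y)^2; backpropagation gives dE/dw = 2*(w*x - y)*x
        grad = 2 * (w * x - y) * x
        w -= lr * grad             # w <- w - alpha * dE/dw
    return w

# toy dataset generated from a true weight of 3.0
data = [(x, 3.0 * x) for x in [0.1, 0.2, 0.5, 1.0]]
w = 0.0
for _ in range(200):               # repeat epochs until (near) convergence
    w = sgd_epoch(w, data)
```

After enough epochs the parameter approaches the data-generating weight, which is the "local optimum" the text refers to.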
There is no need for any coordination among the local training devices; they influence each other’s training indirectly, via the PS. The PS receives local gradients from each edge device and aggregates them into the global parameter w^{global}. After this global parameter is updated, each participant downloads w^{global} from the PS and continues training based on the global parameter. There are various scenarios for exchanging the parameters between the PS and the mobile edge devices. In this model, the PS exchanges the parameters asynchronously, i.e., the PS does not wait for the local gradients from all edge devices. While one participant trains its local model, others may update their parameters through the PS. This process continues until the model achieves the desired output.
w^{global} ← w^{global} + Δw_i    (2)
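The asynchronous exchange can be sketched as follows (the additive update rule and the learning rate are assumptions for illustration; the text only states that the PS folds in each local gradient without waiting for the other devices):

```python
class ParameterServer:
    """Minimal asynchronous PS: gradients are applied as they arrive."""
    def __init__(self, dim, lr=0.1):
        self.lr = lr
        self.w_global = [0.0] * dim

    def push(self, local_gradient):
        # applied as soon as one device uploads -- no barrier on the others
        self.w_global = [w - self.lr * g
                         for w, g in zip(self.w_global, local_gradient)]

    def pull(self):
        # any device may download the latest global parameters at any time
        return list(self.w_global)

ps = ParameterServer(dim=3)
ps.push([1.0, 0.0, -1.0])   # device 1 uploads its local gradient
ps.push([0.5, 0.5, 0.0])    # device 2 uploads later, independently
```

Because `push` never blocks on other devices, a slow participant cannot stall the rest, matching the asynchronous exchange described above.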
III-B Mobile Edge Device Costs
We now characterize the costs (computation and communication costs) borne by mobile edge devices and IoT devices for their participation in the CDL system. It should be noted that our goal is not to arrive at a precise quantification of these costs, but rather to characterize them such that they can be used to analyze the strategic behavior of the devices while participating in CDL. The CDL system operates in two phases: (1) the training phase and (2) the participation phase. During the training phase, each device builds a local model and initializes its weights to train the network. During training, each participant calculates its local gradients and uploads them to the PS. The PS aggregates all the local gradients and sends the result back to each device. The updated parameters are downloaded by each device, and the training process continues until the loss value becomes negligible.
Thus, we can characterize the total cost for a mobile edge device to participate in an epoch of building the ML model based on the cost of executing the above two phases. For the training phase, a mobile edge device bears a cost c^{plocal}, which is the computation cost to build a local ML model, and another computation cost c^{pglobal} for training the local model using the updated global parameters. Accordingly, for executing the participation phase, a mobile edge device bears two further costs, c^{m} and c^{m’}. The cost c^{m} is the communication cost for a mobile edge device to upload its parameters to the PS, and c^{m’} is the communication cost to download the updated parameters from the PS. The average per-device cost for participation in each epoch of the collaborative deep learning system can be characterized as
c = c^{plocal} + c^{pglobal} + c^{m} + c^{m’}    (3)
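As a trivial sketch, the per-epoch cost of a cooperating device is just the sum of the four components (the numeric values below are placeholders in arbitrary units):

```python
def epoch_cost(c_plocal, c_pglobal, c_m_up, c_m_down):
    """Per-epoch cost of a cooperating device: local training +
    training on the updated global parameters + upload + download."""
    return c_plocal + c_pglobal + c_m_up + c_m_down

c_total = epoch_cost(c_plocal=4.0, c_pglobal=2.0, c_m_up=1.5, c_m_down=1.5)
```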
One point that needs further clarification is why a participant may choose not to incur the costs c^{m}, c^{m’}, and c^{pglobal}. Our rationality assumption provides this clarification.
III-C Rationality Assumption
Prior research in distributed DL [chen2018machine] has assumed a byzantine adversary, where mobile edge devices or IoT devices controlled by the adversary can be arbitrarily malicious, i.e., a malicious participant could arbitrarily deviate from the suggested protocol in CDL or arbitrarily drop communication between a mobile edge device and the PS. Here, however, we assume that mobile edge devices and IoT devices are honest but selfish. In this setup, rationality means that a device chooses whether or not to participate so as to maximize its profit in CDL.
Symbol | Definition
N | Number of mobile edge devices
 | Total number of IoT devices
K | Batch size
H | Number of local epochs
 | Data generated by IoT device i
Δw_i | Local gradient of participant i
w^{global} | Global parameter
α | Learning rate
 | Privacy mechanism
ℓ_i | Loss value of participant i when trained individually
ℓ_i^{c} | Loss value of participant i when trained collaboratively
ℓ_i^{a} | Loss value of participant i when trained individually on an auxiliary dataset
B | Coefficient
c^{plocal} | Computation cost to build a local model
c^{pglobal} | Computation cost to build a global model
c^{m} | Communication cost to upload the parameters to the PS
c^{m’} | Communication cost to download the parameters from the PS
c | Total cost to build a ML model
 | Number of cooperative participants
 | Number of defective participants
IV The Collaborative Deep Learning Game
We present a game model of the CDL system with multiple mobile edge devices in an honest but selfish environment. We introduce a game model with players that we refer to as the collaborative learning game G. In this game, the edge devices send their local gradients to the PS to learn a common objective without compromising the privacy of their data. The PS aggregates the gradients and creates a global model. This updated global model is downloaded by all participating edge devices, and a social dilemma arises between cooperation and defection.
IV-A Game Model
Game theory allows for modeling situations of conflict and for predicting the behavior of participants when they interact with each other. In our CDL game G, the mobile edge devices, each connected to multiple IoT devices, are the participants; they interact with the PS simultaneously, without any knowledge about each other. The game G is a static game, because all participants must choose their strategies simultaneously. The game G is a tuple (P, S, U), where P is the set of players, S is the set of strategies, and U is the set of payoff values.

Players (P): The set of players P corresponds to the set of mobile edge devices that received a common objective from the PS to build their own local models in the CDL game G.

Strategy (S): Each participant can choose between two actions: (i) Cooperative (C) or (ii) Defective (D). Hence, the set of strategies in this game is S = {C, D}. The strategy of a mobile edge device determines whether it participates in CDL. In particular, if a participant plays strategy C, it sends its local gradients to the PS and downloads the updated parameters from the PS to update its local model; here, the participant pays the total cost. In contrast, if a participant neither uploads its local gradients to the PS nor downloads the updated global parameters from the PS, the mobile edge device plays strategy D. Thus, the participant saves its communication costs c^{m} and c^{m’}, and the global computation cost c^{pglobal}. This participant is not part of CDL and trains its local model individually.

Payoff (U): At a high level, the players’ goal in the CDL game G is to maximize their utility, which is a function of the loss value and the costs. In this work, we do not consider the adversarial aspect of players; hence, the gain includes only the accuracy improvement of the model for a particular player as a benefit, while the costs are incurred for training the models and for communication between the participant and the PS.
Here, the benefit and the cost are not on the same scale, since the former depends on the loss value while the latter depends on computation and communication costs. To make them comparable, we introduce a coefficient: the benefit is multiplied by B.
Now, we compute the payoffs of a mobile edge device in this game. Assume first that participant i is cooperative, i.e., plays C; alternatively, participant i may be defective, i.e., play D. These payoffs can be defined as follows.
u_i(C) = B · (ℓ_i − ℓ_i^{c}) − (c^{plocal} + c^{pglobal} + c^{m} + c^{m’})    (4)

u_i(D) = −c^{plocal}    (5)
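A minimal sketch of these payoffs follows (the exact functional form of the benefit term is our assumption, consistent with the text: a cooperator gains the scaled loss improvement but pays all four costs, while a defector trains alone, gains no collaborative benefit, and pays only the local training cost; all numbers are placeholders):

```python
def payoff(strategy, B, loss_alone, loss_collab,
           c_plocal, c_pglobal, c_up, c_down):
    """Illustrative payoffs for the two strategies C and D."""
    if strategy == "C":
        benefit = B * (loss_alone - loss_collab)   # scaled accuracy gain
        return benefit - (c_plocal + c_pglobal + c_up + c_down)
    return -c_plocal                               # defector: local cost only

costs = dict(c_plocal=0.05, c_pglobal=0.05, c_up=0.02, c_down=0.02)
u_c = payoff("C", B=1.0, loss_alone=0.90, loss_collab=0.88, **costs)
u_d = payoff("D", B=1.0, loss_alone=0.90, loss_collab=0.88, **costs)
# with a small collaborative gain, defection yields the higher payoff
```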
Based on the above calculated utilities, we analyze the game G as discussed in the next section.
V Game Analysis
In order to gain insight into the strategic behavior of participants, we apply the most fundamental game-theoretic concept, known as the Nash Equilibrium, introduced by John Nash [nash1951non].
Definition 1. A Nash Equilibrium is a strategy profile in which none of the players can unilaterally change their strategy to increase their payoff.
In other words, if in a non-cooperative game all strategies are mutual best responses to each other, then no player has any motivation to deviate unilaterally from the given strategy profile, which is a Nash Equilibrium strategy profile. For example, in any prisoners’ dilemma game, there is always a cooperative strategy and a defecting strategy. If both players use the cooperative strategy, it yields the best joint outcome for the players. If the players do not cooperate with one another, they choose the defecting strategy in the hope of attaining individual gain at the rival’s expense. In the prisoners’ dilemma, the defecting strategy strictly dominates the cooperative strategy. Hence, the only Nash Equilibrium in the prisoners’ dilemma is mutual defection.
Based on the cost and benefit for mobile edge devices to learn a neural-network model, we build a one-shot CDL game model G. In the following theorems, we show that the game G is a public good game.
Theorem 1. In the CDL game G, if a participant builds its local ML model, then G reduces to a public good game.
Proof. Consider the case where all N participants follow the defective strategy: no participant sends its local gradients to the PS or downloads updated parameters from the PS, i.e., there is no communication between the mobile edge devices and the PS. Thus, participants do not pay the communication costs c^{m} and c^{m’}, or the global computation cost c^{pglobal}. Each participant trains on its local dataset to build its ML model individually and minimizes its loss value ℓ_i. No participant can improve its payoff by changing its strategy unilaterally: if a participant deviates from the defect strategy to the cooperate strategy unilaterally, it pays the additional costs (c^{m} + c^{m’} + c^{pglobal}) while collaborating with no one. The payoff of the cooperate strategy is therefore less than that of the defect strategy, so All-Defect is a Nash Equilibrium profile and G is a public good game.
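The deviation argument in the proof can be illustrated numerically (the utility values and the benefit rule below are made-up placeholders; the key property is that a lone cooperator collaborates with no one and so gains nothing):

```python
def utility(strategy, others_cooperating, B=1.0, gain=0.02,
            c_plocal=0.05, c_pglobal=0.05, c_up=0.02, c_down=0.02):
    """Made-up utility: cooperating yields a benefit only when at least
    one other device also cooperates; a defector pays only c_plocal."""
    if strategy == "D":
        return -c_plocal
    benefit = B * gain if others_cooperating > 0 else 0.0
    return benefit - (c_plocal + c_pglobal + c_up + c_down)

# starting from All-Defect, a unilateral deviation to C collaborates
# with no one, so it pays the extra costs for no benefit
lone_cooperator = utility("C", others_cooperating=0)
stay_defecting = utility("D", others_cooperating=0)
```

Since `lone_cooperator < stay_defecting`, no device wants to deviate from All-Defect, which is the Nash Equilibrium property the proof establishes.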
Theorem 2 further shows that an all-cooperate strategy profile can never be enforced in game G, since it cannot be established as a Nash Equilibrium.
Theorem 2. In the CDL game G, if a participant builds its local ML model, then the All-Cooperation strategy profile cannot be established as a Nash Equilibrium.
Proof. We first assume that all N participants cooperate in collaborative learning (i.e., the all-cooperate strategy profile) and have paid the communication costs as well as the global computation cost. We can compute the payoff of each participant by Equation (4). If a participant deviates from cooperation and plays defection unilaterally, its payoff is given by Equation (5), which is always greater than the cooperative payoff of Equation (4). Hence, each participant has an incentive to deviate unilaterally and increase its payoff. Therefore, the All-Cooperate strategy profile is never a Nash Equilibrium.
V-A Cluster-Based Representation
Each participant has the loss values of all other participants, calculated on an auxiliary dataset. Before the start of the game, each participant has to choose its strategy for playing the collaborative game G. However, at the beginning of the game, a participant is unsure about its strategy, which depends on the other participants’ strategies. Therefore, all participants are in a dilemma when choosing a strategy between C and D. We solve this problem by proposing the cluster-based fair strategy algorithm. K-means clustering is an unsupervised ML technique whose purpose is to segment a dataset into K clusters. Each participant applies the K-means clustering algorithm to all loss values (one-dimensional data).
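A minimal sketch of the clustering step, assuming K = 2 (Lloyd's algorithm on one-dimensional loss values; the loss values and the min/max initialization are illustrative, not the paper's exact algorithm):

```python
def two_means_1d(values, iters=20):
    """1-D K-means with K=2 (Lloyd's algorithm) over loss values.
    Returns the low-loss and high-loss clusters."""
    c_low, c_high = min(values), max(values)   # simple initialization
    low, high = [], []
    for _ in range(iters):
        # assignment step: each value joins its nearest centroid
        low = [v for v in values if abs(v - c_low) <= abs(v - c_high)]
        high = [v for v in values if abs(v - c_low) > abs(v - c_high)]
        # update step: recompute the centroids
        if low:
            c_low = sum(low) / len(low)
        if high:
            c_high = sum(high) / len(high)
    return low, high

# illustrative loss values of six peers, computed on an auxiliary dataset
losses = [0.10, 0.12, 0.11, 0.55, 0.60, 0.09]
low, high = two_means_1d(losses)
```

A participant could then, for instance, treat peers falling in the low-loss cluster as worthwhile collaborators.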
VI Numerical Analysis
To evaluate our proposed cluster-based fair strategy, we apply it to smart home datasets.
VI-A Datasets
We use the ARAS dataset to build a smart home interaction model, since the data is publicly available [SmartHome]. The ARAS dataset is a real-world dataset for activity recognition with ambient sensing. The residents did not follow any specific rules while living in the smart homes. The dataset contains data from two real smart homes with multiple residents over one month: 3000 daily-life activities captured by 26 million sensor readings. It also provides ground-truth labels for the activities, which enables the development of a sophisticated ML smart home interaction model.
VI-B Experimental Setup
We simulate the proposed system model with N mobile edge devices, each associated with a smart home. Multiple IoT devices connect to one mobile edge device in each smart home. We partitioned the ARAS dataset unevenly among 10 participants. For the unbalanced dataset setting, the data is sorted by class and divided into two cases: (a) low quality, where a participant receives a data partition from a single class, and (b) high quality, where a participant receives data partitions from 27 classes. Figure 2 shows the unbalanced partitioning of the dataset: smart home 1 generates high-quality data (multiple IoT devices), while smart home 10 generates a low-quality dataset (one IoT device). The following parameters are used for Algorithms 1 and 2: batch size K = 10 or 100, local epochs H = 1 or 3, learning rate = 0.01.
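The unbalanced split can be sketched as follows (class labels and sample counts are placeholders, not the ARAS schema):

```python
def partition_unbalanced(samples_by_class):
    """Give the low-quality participant data from one class only, and the
    high-quality participant data from every class."""
    classes = sorted(samples_by_class)
    low_quality = {classes[0]: samples_by_class[classes[0]]}   # one class
    high_quality = {c: samples_by_class[c] for c in classes}   # all classes
    return low_quality, high_quality

# placeholder data: 27 activity classes with 3 samples each
data = {c: [f"sample_{c}_{i}" for i in range(3)] for c in range(27)}
low, high = partition_unbalanced(data)
```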
VI-C Results
Our goal in this work is to design a mechanism for eliciting cooperation in CDL. The cluster-based fair strategy enforces cooperation among participants in CDL, whereas Theorems 1 and 2 prove that participants otherwise defect in CDL. For the unbalanced datasets, the clusters of loss values are shown in Figure 3. The results show that 80% of the participants collaborate with other participants, while 20% of the participants learn individually by choosing strategy D in the game G.
VII Conclusion and Future Work
In this paper, we presented a system model of CDL and introduced the problem of strategic behavior of mobile edge devices in the CDL system. We evaluated the rationality of mobile edge devices in CDL using a game-theoretic model, the CDL game. We also evaluated the Nash Equilibrium (NE) strategy profile for each scenario, in which the learning mobile edge devices are enforced to cooperate using our cluster-based fair strategy in CDL. We believe that this work is a first step towards a deeper understanding of the effect of non-cooperative behavior in CDL. For future work, we plan to extend the model and evaluation to determine the accuracy of the ML/DL model and to train our proposed model with other IoT datasets.