A Crowdsourcing Framework for On-Device Federated Learning

11/04/2019
by   Shashi Raj Pandey, et al.

Federated learning (FL) rests on the notion of training a global model in a decentralized manner. Under this setting, mobile devices perform computations on their local data before uploading the required updates to improve the global model. However, when the participating clients implement an uncoordinated computation strategy, the difficulty is to handle the communication efficiency (i.e., the number of communications per iteration) while exchanging the model parameters during aggregation. Therefore, a key challenge in FL is how users participate to build a high-quality global model with communication efficiency. We tackle this issue by formulating a utility maximization problem, and propose a novel crowdsourcing framework to leverage FL that considers the communication efficiency during parameter exchange. First, we show an incentive-based interaction between the crowdsourcing platform and the participating clients' independent strategies for training a global learning model, where each side maximizes its own benefit. We formulate a two-stage Stackelberg game to analyze such a scenario and find the game's equilibria. Second, we formalize an admission control scheme for participating clients to ensure a level of local accuracy. Simulation results demonstrate the efficacy of our proposed solution with up to 22% gain in the offered reward.


I Introduction

I-A Background and Motivation

Recent years have witnessed tremendous growth in the use of Machine Learning (ML) techniques and their applications on mobile devices. On one hand, according to the International Data Corporation, smartphone shipments reached 3 billion units in 2018 [2], which implies a large crowd of mobile users generating personalized data via interaction with mobile applications or with built-in sensors (e.g., cameras, microphones and GPS), exploited efficiently by the mobile crowdsensing paradigm (e.g., for indoor localization, traffic monitoring, and navigation [3], [4], [5], [6]). On the other hand, mobile devices are being extensively empowered with specialized hardware architectures and computing engines such as the CPU, GPU and DSP (e.g., the energy-efficient Qualcomm Hexagon Vector eXtensions on the Snapdragon 835 [7]) for solving diverse machine learning problems. Gartner predicts that 80 percent of smartphones will have on-device AI capabilities by 2022. Dedicated chipsets will let smartphone makers gain market share by offering more secure facial recognition, the ability to understand user behavior, and predictive features [8]. This means on-device intelligence will be ubiquitous!

Against the backdrop of these exciting possibilities for on-device intelligence, a White House report published in 2012 advocated the principle of data minimization to protect the privacy of consumer data [9]. A direct application of this principle is the ML technique that leaves the training data distributed on the mobile devices, called Federated Learning (FL) [7], [10], [11], [12], [13]. This technique unleashes a new collaborative ecosystem in ML to build a shared learning model while keeping the training data locally on user devices, which complies with the data minimization principle and protects user data privacy. Unlike conventional approaches that collect all the training data in one place for training, the mobile users (participating clients) compute updates on their local training data using the current global model parameters; these updates are then aggregated and broadcast back by the centralized coordinating server. This iterative process continues until the learning model reaches a desired accuracy level. In this way, FL decouples the training of a global model from the need to move local training data.

In another report, research organizations estimate that over 90% of all data will be stored and processed locally [14] (e.g., at the network edge), which provides an immense opportunity to extract the benefits of FL. Moreover, because of the huge market potential of this untapped private data, FL is a promising tool for enabling more personalized, service-oriented applications.

Local computations at the devices and their communication with the centralized coordinating server are interleaved in a complex manner to build a global learning model. Therefore, realizing a communication-efficient FL framework [12], [15] requires solving several challenges. Furthermore, because each device holds only a small data sample for training a high-quality learning model, the difficulty is to incentivize a large number of mobile users to ensure cooperation. This important aspect of FL has been overlooked so far, and the question is: how can we motivate a number of participating clients, collectively providing a large number of data samples, to enable FL without sharing their private data? Note that both the participating clients and the server can benefit from training a global model. However, to fully reap the benefits of high-quality updates, the MEC server has to stimulate clients to participate. In particular, under heterogeneous scenarios, such as an adaptive and cognitive-communication network, clients' participation in FL can spur collaboration and provide benefits for operators to accelerate and deliver network-wide services [16]. Similarly, clients in general are not concerned with the reliability and scalability issues of FL [17]. Therefore, to incentivize users to participate in the collaborative training, we require a marketplace. For this purpose, we present a value-based compensation mechanism for the participating clients, such as a bounty (e.g., a data discount package), proportional to their level of participation in the crowdsourcing framework. Participation is reflected in the attained local accuracy level, i.e., the quality of the solution to the local subproblem, and the framework protects the model from imperfect updates by restricting clients who might compromise it (for instance, with skewed data due to its non-i.i.d. nature, or with data poisoning) [3].

The goal of this paper is two-fold: First, we formalize an incentive mechanism to develop a participatory framework for mobile clients to perform FL for improving the global model. Second, we address the challenge of maintaining communication efficiency while exchanging the model parameters with a number of participating clients during aggregation. Specifically, communication efficiency in this scenario refers to the number of communication rounds per iteration, under an arbitrary algorithm, needed to maintain an acceptable accuracy level for the global model.

I-B Contributions

In this work, we design and analyze a novel crowdsourcing framework to realize the FL vision. Specifically, our contributions are summarized as follows:

  • A crowdsourcing framework to enable communication-efficient FL. We design a crowdsourcing framework in which FL participating clients iteratively solve the local learning subproblems for an accuracy level subject to an offered incentive. We establish a communication-efficient cost model for the participating clients, and formulate an incentive mechanism to induce the necessary interaction between the multi-access edge computing (MEC) server and the participating clients for FL in Section IV.

  • Solution approach using Stackelberg game. With the offered incentive, the participating clients independently choose their strategies to solve the local subproblem for a certain accuracy level in order to minimize their participation costs. Correspondingly, the MEC server builds a high-quality centralized model, characterized by its utility function, over the data distributed across the participating clients by offering the reward. We model these tightly coupled motives of the participating clients and the MEC server as a two-stage Stackelberg game. The equivalent optimization problem is a mixed-boolean program, which requires exponential complexity to solve. We analyze the game's equilibria and propose a linear-complexity algorithm to obtain the optimal solution.

  • Participant’s response analysis and case study. We next analyze the response behavior of the participating clients via the solutions of the Stackelberg game, and establish the efficacy of our proposed framework via case studies. We show that the linear-complexity solution approach attains the same performance as the mixed-boolean programming problem. Furthermore, we show that our mechanism design can achieve the optimal solution while outperforming a heuristic approach for attaining the maximal utility, with up to 22% of gain in the offered reward.

  • Admission control strategy. Finally, we show that it is important to admit a sufficient number of participating clients to guarantee communication efficiency for a given accuracy level in FL. We formulate a probabilistic model for threshold accuracy estimation and find the corresponding number of participating clients required to build a high-quality learning model. We analyze the impact of the number of participating clients on the threshold accuracy level with closed-form solutions. Finally, with numerical results, we demonstrate the structure of the admission control model for different configurations.

The remainder of this paper is organized as follows. We review related work in Section II, and present the system model in Section III. In Section IV, we formulate an incentive mechanism with a two-stage Stackelberg game, and investigate the game's equilibria with simulation results in Section V. An admission control strategy is formulated to define a minimum local accuracy level, and the corresponding numerical analysis is presented in Section VI. Finally, conclusions are drawn in Section VII.

II Related Work

The unprecedented amount of data necessitates the use of distributed computational frameworks to provide solutions for various machine learning applications [11], [15]. Using distributed optimization techniques, research on decentralized machine learning has largely focused on competitive algorithms that train learning models across a number of cluster nodes [18], [19], [20], [21], with balanced and i.i.d. data.

Setting a different motivation, FL has recently attracted increasing interest [7], [11], [12], [13], [15], [22], in which a number of devices with non-i.i.d. and unbalanced data collaborate to train a learning model. In the pioneering works [11], [12], the authors presented the setting for federated optimization and the related technical challenges for understanding convergence properties in FL. Existing work has studied these issues. For example, Wang, Shiqiang, et al. [16] theoretically analyzed the convergence rate of distributed gradient descent. In this detailed work, the authors focus on deducing the optimal global aggregation frequency in a distributed learning setting to minimize the loss function of the global problem. Their formulation considers a resource-constrained edge computing system. However, that setting differs from our proposed model, in which we introduce the notion of participation and propose a game-theoretic interaction between the workers (participating clients) and the master (MEC server) to attain a cost-effective FL framework. Prior to this work, McMahan, H. Brendan, et al. in [15] proposed a practical variant of FL where the global aggregation is synchronous with a fixed frequency, and confirmed the effectiveness of this approach using various datasets. Furthermore, the authors in [18] extended the theoretical convergence analysis of [15] to general classes of distributed learning approaches with communication and computation costs. For deep learning architectures where the objectives are non-convex, the authors in [23] proposed an algorithm named FedProx, a generalization of FedAvg in which a surrogate of the global objective function is used to efficiently ensure an empirical performance bound in the FL setting. In that work, the authors demonstrated the improvement in performance, in line with their theoretical assumptions, both in terms of robustness and convergence, through a set of experiments.

Recent works adapt and extend the core concepts in [11], [12], [15] to develop communication-efficient FL algorithms, where each participating client in the federated learning setting independently computes its local updates on the current model and communicates with a central server that aggregates the parameters to compute the global model. The framework uses the Federated Averaging (FedAvg) algorithm to reduce communication costs. In this regard, to characterize the communication and computation trade-off during model updates, distributed machine learning based on gradient descent is widely used. In [11], a variant of distributed stochastic gradient descent (SGD) was used to attain parallelism and improved computation. Similarly, in [12], the authors discussed a family of new randomized methods combining SGD with primal and dual variants such as Stochastic Variance Reduced Gradient (SVRG), Federated Stochastic Variance Reduced Gradient (FSVRG) and Stochastic Dual Coordinate Ascent (SDCA). Further, in [24] the authors examined the redundancy in gradient exchanges in distributed SGD, and proposed a Deep Gradient Compression (DGC) algorithm to enhance communication efficiency in the FL setting. The performance of parallel SGD and mini-batch parallel SGD has been discussed in [25], [23] for fast convergence and effective communication rounds. The authors of the recent work [25] argue that a variant of local SGD yields sufficient improvement in generalization performance relative to large mini-batch sizes, even in a non-convex setting. In [26], the authors proposed the Distributed Approximate Newton (DANE) algorithm, which precisely solves a general subproblem available locally before averaging the solutions. In the recent work [27], the authors designed a robust method that applies the proposed periodic-averaging SGD (PASGD) technique to mitigate communication delay in the distributed SGD setting; the idea is to adapt the communication period so as to minimize the optimization error at each wall-clock time. To this end, interestingly, some of the latest works, such as [28], have studied and demonstrated privacy risks in collaborative learning mechanisms such as FL.

In contrast to the above research, which has overlooked the participatory method for building a high-quality central ML model and its criticality, and has primarily focused on learning-time convergence with variants of learning algorithms, our work addresses the challenge of designing a communication- and computation-cost-effective FL framework by exploring a crowdsourcing structure. In this regard, a few recent studies have discussed participation in building a global ML model with FL, as in [29], [30]. In [29], the authors proposed a novel distributed approach based on FL to learn the network-wide queue dynamics in vehicular networks for achieving ultra-reliable low-latency communication (URLLC) via a joint power and resource allocation problem. The vehicles participate in FL to provide information related to sample events (i.e., queue lengths) to parameterize the distribution of extremes. In [30], the authors provided new design principles to characterize edge learning and highlighted important research opportunities and applications of a new philosophy for wireless communication called learning-driven communication, presenting significant case studies and demonstrating the effectiveness of the design principles. Further, the recent work [17] studied a blockchained FL architecture, proposing data reward and mining reward mechanisms for FL. However, these works largely provide latency analyses for the related applications. Our paper focuses on a Stackelberg game-based incentive mechanism design to reveal the iteration strategy of the participating clients in solving the local subproblems for building a high-quality centralized learning model. Interestingly, incentive mechanisms have been studied for years in mobile crowdsourcing/crowdsensing systems, especially with auction mechanisms (e.g., [31], [32], [33]), contract and tournament models (e.g., [34], [35]) and Stackelberg game-based incentive mechanisms such as in [36] and [37]. However, those design goals were specific to fair and truthful data trading for distributed sensing tasks. In this regard, the novelty of our model is that we untangle and analyze the complex interaction between the participating clients and the aggregating edge server in the crowdsourcing framework to obtain a cost-effective global learning model without sharing local datasets. Moreover, the proposed incentive mechanism models such interactions to enable communication-efficient FL that achieves a target accuracy, in consideration of the performance metrics. Further, we adopt the dual formulation of the learning problem to better decompose the global problem into distributed subproblems for federated computation across the participating clients.

III System Model

Fig. 1 illustrates our proposed system model for the crowdsourcing framework to enable FL. The model consists of a number of mobile clients associated with a base station hosting a central coordinating server (MEC server), which acts as the central entity. The server facilitates the aggregation of the parameters and feeds back the global model updates in each global iteration. We consider a set of participating clients in the crowdsourcing framework. The crowdsourcer (platform) can interact with mobile clients via an application interface, and aims at leveraging FL to build a global ML model. As an example, consider a case where the crowdsourcer (referred to as the MEC server hereafter, to avoid any confusion) wants to build an ML model. Instead of relying only on the local data available at the MEC server to train the global model, the global model is constructed using the local training data spread across several distributed mobile clients. Here, the global model parameter is first shared by the MEC server to train the local models at each participating client. The local models' parameters, which minimize the local loss functions, are then sent back as feedback and aggregated to update the global model parameter. The process continues iteratively until convergence.
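
For concreteness, the Python sketch below outlines this iterative exchange. It is only an illustration of the workflow described above, not the paper's exact procedure: the logistic-loss local update, the data-size-weighted averaging rule, and all parameter values are our own assumptions.

import numpy as np

def local_update(w_global, X, y, lr=0.1, local_steps=10):
    # Client side: refine the received global model on local data only.
    w = w_global.copy()
    for _ in range(local_steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # logistic model (assumed)
        grad = X.T @ (p - y) / len(y)
        w -= lr * grad
    return w

def federated_training(clients, dim, rounds=50):
    # Server side: broadcast, collect local parameters, aggregate, repeat.
    w_global = np.zeros(dim)
    for _ in range(rounds):
        sizes, local_models = [], []
        for X, y in clients:                    # each client keeps (X, y) private
            local_models.append(local_update(w_global, X, y))
            sizes.append(len(y))
        weights = np.array(sizes) / sum(sizes)  # data-size-weighted averaging
        w_global = sum(wk * m for wk, m in zip(weights, local_models))
    return w_global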

Fig. 1: Crowdsourcing Framework for Decentralized Machine Learning.

III-A Federated Learning Background

For FL, we consider unevenly partitioned training data over a large number of participating clients that train local models under an arbitrary learning algorithm. Each client stores its own local dataset, and the total training data size is the sum of the local dataset sizes. In a typical supervised learning setting, the local dataset is a collection of data samples given as input-output pairs, where each input is a sample vector of features and the output is the labeled value for that sample. The learning problem, for an input sample vector (e.g., the pixels of an image), is to find the model parameter vector that characterizes the output (e.g., the labeled output of the image, such as the corresponding product names in a store) through a loss function. Typical examples include the squared-error loss for a linear regression problem and the hinge loss for support vector machines, both applied to a linear mapping of the input. Therefore, the loss function based on the local data of a client, termed the local subproblem, is formulated as

(1)

where the optimization variable is the local model parameter and the last term is a regularizer. This characterizes the local model in the FL setting.

1: Input: Initialize the dual variables
2: for each aggregation round do
3:     for each participating client k do
4:         Solve the local subproblem (5) in parallel
5:         Update the local dual variables as in (7)
6:     end for
7:     Aggregate the local parameters to update the global parameter as in (8)
8: end for
Algorithm 1 Federated Learning Framework

Global Problem: At the MEC server, the global problem can be represented as the finite-sum objective of the form

(2)

Problems with the structure of (2), where we aim to minimize an average of local objectives, are well known as distributed consensus problems [38].

Solution Framework under Federated Learning: We recast the regularized global problem in (2) as

(3)

and decompose it as a dual optimization problem (the duality gap provides a certificate of the quality of the local solutions and facilitates distributed training) in a distributed scenario [39] amongst the participating clients. For this, we first define a data matrix whose columns are the data points held by the clients. Then, the corresponding dual optimization problem of (3) for a convex loss function is

(4)

where the dual variables map to the primal candidate vector, and the respective terms are the convex conjugates of the loss functions and of the regularizer [40]. For ease of representation, we use a compact notation for the objective in (4) hereafter. We assume strong convexity, so that the corresponding conjugate function is continuously differentiable. The solution is then obtained following an iterative approach to attain a global accuracy.

Under the distributed setting, we further define data partitioning notations for the clients to represent the working principle of the framework. We define a weight vector for each local subproblem whose elements are zero for the unavailable data points. Under the smoothness and strong convexity assumptions that ensure convergence, the consequence is an approximate solution to the local problem, defined over the local dual variables and characterized as

(5)

where the local data matrix contains the client's own data points as columns and is zero-padded otherwise. Each participating client iterates over its computational resources to solve its local problem (5) up to a local relative accuracy that characterizes the quality of the local solution, and produces a random output satisfying

(6)

The local dual variable is updated as follows:

(7)

Correspondingly, each participating client broadcasts its local parameter to the MEC server during each round of communication. The MEC server aggregates (averages) the local parameters with the following rule:

(8)

and distributes the global change in the parameter to the participating clients. In this way, the global model parameter is decoupled from the need to access the local clients' data for training a global model. Note that we assume each participating client holds data of sufficient quality for solving its corresponding local subproblem; a related demonstration of the dependency between normalized data size and accuracy can be found in [41].

Alg. 1 briefly summarizes the FL framework as an iterative process that solves the global problem characterized in (3) for a global accuracy. A participating client strategically iterates over its local training data to solve the local subproblem (5) up to a relative accuracy (too few iterations might not be sufficient to obtain a good local solution [16]). In each communication round with the MEC server, the participating clients synchronously pass their parameters over a shared wireless channel. The MEC server then aggregates the local model parameters as in (8), and broadcasts the global parameters required by the participating clients to solve their local subproblems in the next communication round.

Fig. 2: Interaction Environment of Federated Learning Setting under Crowdsourcing Framework.

Within the framework, consider that each participating client uses an arbitrary optimization algorithm (such as Stochastic Gradient Descent (SGD), Stochastic Average Gradient (SAG), or Stochastic Variance Reduced Gradient (SVRG)) to attain a relative accuracy on its local subproblem. Then, for strongly convex objectives, the general upper bound on the number of iterations depends on the local relative accuracy of the local subproblem and the global model's accuracy as [12]:

(9)

where the local relative accuracy measures the quality of the local solution as defined above. Further, in this formulation, we have replaced the term in the numerator with a constant. For a fixed number of MEC server iterations to solve the global problem, we observe from (9) that a very high local accuracy (a small relative accuracy value) can significantly improve the global accuracy. However, each client has to spend excessive resources, in terms of local iterations, to attain such a small relative accuracy, as

(10)

where the parameter is a choice of the client that depends on the data size and the condition number of the local subproblem [42]. Therefore, to address this trade-off, the MEC server can set up an economic interaction environment (a crowdsourcing framework) to motivate the participating clients to improve their local relative accuracy. Correspondingly, with an increased reward, the participating clients are motivated to attain better local accuracy, which, as observed in (9), improves the global accuracy for a fixed number of MEC server iterations on the global problem. In this scenario, the corresponding performance bound in (9) for heterogeneous responses can be modified by considering the worst-case response of the participating clients as

(11)

Fig. 2 describes the interaction environment that incorporates the crowdsourcing framework and the FL setting. In the following section, we discuss the proposed incentive mechanism in detail and present the interaction between the MEC server and the participating clients as a two-stage Stackelberg game.
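
To see the trade-off between (9) and (10) numerically, consider the following sketch. The functional forms (global rounds growing like 1/(1 - relative accuracy) for a fixed target accuracy, and local iterations growing like log(1/relative accuracy)) and all constants are assumptions chosen only to mimic the stated bounds.

import numpy as np

def global_rounds(theta, eps=1e-3, c=1.0):
    # Assumed form of the bound (9): rounds for global accuracy eps grow with
    # log(1/eps) and blow up as the relative accuracy theta approaches 1.
    return c * np.log(1.0 / eps) / (1.0 - theta)

def local_iterations(theta, gamma=5.0):
    # Assumed form of (10): local work per round grows as theta approaches 0,
    # i.e., as the local subproblem is solved more exactly.
    return gamma * np.log(1.0 / theta)

for theta in (0.05, 0.2, 0.5, 0.8):
    total = global_rounds(theta) * local_iterations(theta)
    print(f"theta={theta:4.2f}  rounds={global_rounds(theta):7.1f}  "
          f"local iters/round={local_iterations(theta):5.1f}  total={total:8.1f}")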

III-B Cost Model

Training on local data for a defined accuracy level incurs a cost for the participating clients. We discuss its significance with two typical costs: the computing cost and the communication cost.

Computing cost: This cost is related to the number of iterations performed by a client on its local data to train the local model to a relative accuracy within a single round of communication. With (10), we define the computing cost incurred by a client when it performs computation on its local data.

Communication cost: This cost is incurred when client interacts with MEC server for parameter updates to maintain accuracy. During a round of communication with the MEC server, let be the size (in bits) of local parameters in a floating point representation produced by the participating client after processing a mini-batch [21].

While the update size is the same for all participating clients under a specified learning setting of the global problem, each participating client can invest resources to attain a specific relative accuracy as defined in (10). Although the best choice would be a relative accuracy for which the local solution time is comparable to the time spent in a single communication round, a larger relative accuracy value will induce more rounds of communication until global convergence, as formalized in (9).

With the inverse relation between the number of global iterations and the local relative accuracy in (9), we can characterize the total communication expenditure as

(12)

where the per-round time is the time required for the client to communicate with the MEC server in each round of model parameter exchange. Here, we normalize the constant in (9) to 1, as it can be absorbed into the per-round cost when characterizing the communication expenditure in (12). Using a first-order Taylor approximation (for a small relative accuracy value, the reciprocal term in (12) is approximately linear), we can approximate the total communication cost accordingly. We assume that clients are allocated orthogonal sub-channels so that there is no interference between them; the possible delay introduced by interference on a poor wireless uplink channel can affect the local model update time, which can be mitigated by adjusting the maximum waiting time at the MEC server, as in [17]. Therefore, the instantaneous data rate for a client can be expressed as

(13)

where the terms denote, respectively, the total bandwidth allocated to the client, the transmission power of the client, the channel gain between the participating client and the base station, and the Gaussian noise power at the client. Then, using (13), we can characterize the time each client needs in one round of communication with the MEC server to upload the required updates as

(14)

Equation (14) captures the dependency of the per-round communication time on the wireless conditions and network connectivity.
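
A worked illustration of (13)-(14) is given below; the numeric values (update size, bandwidth, transmit power, channel gain, noise power) are arbitrary placeholders.

import math

def uplink_rate(bandwidth_hz, tx_power_w, channel_gain, noise_w):
    # Achievable data rate of a client on its orthogonal sub-channel, cf. (13).
    return bandwidth_hz * math.log2(1.0 + tx_power_w * channel_gain / noise_w)

def per_round_comm_time(update_bits, bandwidth_hz, tx_power_w, channel_gain, noise_w):
    # Time to upload the local model update in one communication round, cf. (14).
    return update_bits / uplink_rate(bandwidth_hz, tx_power_w, channel_gain, noise_w)

# Example with placeholder values: a 100 kB update over a 1 MHz sub-channel.
t = per_round_comm_time(update_bits=8e5, bandwidth_hz=1e6,
                        tx_power_w=0.2, channel_gain=1e-7, noise_w=1e-10)
print(f"upload time per round: {t:.3f} s")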

Assimilating the rationale behind our earlier discussions, for a participating client with an evaluated per-round communication time, a larger relative accuracy value (poorer local accuracy) contributes to a larger communication expenditure. This is because the participating client has to interact more frequently with the MEC server (more rounds of communication) to update its local model parameters and attain the target accuracy.

Therefore, the participating client ’s cost for the relative accuracy level on the local subproblem is

(15)

where the normalized monetary weight for the communication and computing costs (i.e., $/rounds of iteration) lies between 0 and 1. A smaller value of the relative accuracy parameter indicates a higher local accuracy. Thus, there exists a trade-off between the communication and the computing costs in (15). A participating client can adjust its preference for each of these costs with the weight metric. A higher weight on the communication cost emphasizes the larger number of interaction rounds with the MEC server needed to adjust the local model parameters for the relative accuracy. Conversely, a higher weight on the computing cost reflects the increased number of iterations on the local subproblem needed to achieve the relative accuracy, and also significantly reduces the overall contribution of the communication expenditure to the client's total cost. Note that the client's cost need not be the same over iterations. However, to make the problem more tractable, we follow (9) and minimize an upper bound of the cost instead of the actual cost, similar to the approach in [16].
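
The following sketch combines the two expenditures as in (15). The per-round communication term scaled by an assumed 1/(1 - relative accuracy) number of rounds, the log(1/relative accuracy) computing term, and the placement of the weight are illustrative assumptions, not the paper's exact expressions.

import math

def client_cost(theta, comm_time, weight, gamma=5.0):
    # Communication expenditure, cf. (12): per-round time scaled by the
    # assumed number of global rounds ~ 1 / (1 - theta).
    comm_cost = comm_time / (1.0 - theta)
    # Computing expenditure, cf. (10): local iterations ~ log(1 / theta).
    comp_cost = gamma * math.log(1.0 / theta)
    # Weighted combination as in (15); which term the weight multiplies
    # is an assumption made here for illustration.
    return weight * comm_cost + (1.0 - weight) * comp_cost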

IV Incentive Mechanism for Client’s Participation in the Decentralized Learning Framework

In this section, we first present our motivation for realizing the concept of FL through a crowdsourcing framework. We then advocate an incentive mechanism required to realize this decentralized learning setting, together with our proposed solution approach.

IV-A Incentive Mechanism: A Two-Stage Stackelberg Game Approach

The MEC server allocates a reward to the participating clients so that they attain the optimal local accuracy, with the aim of improving the communication efficiency of the system. That is, the MEC server plans to incentivize clients in order to maximize its own benefit, i.e., an improved global model. Consequently, upon receiving the announced reward, any rational client will individually maximize its own profit. Such an interaction scenario can be modeled with a Stackelberg game approach.

Specifically, we formulate our problem as a two-stage Stackelberg game between the MEC server (leader) and the participating clients (followers). Under the crowdsourcing framework, the MEC server designs an incentive mechanism for the participating clients to attain a local consensus accuracy level (i.e., an agreement among the participating clients on the quality of the solutions to the local subproblems for building a high-quality centralized learning model) on the local models while improving the performance of the centralized model. The MEC server cannot directly control the participating clients to maintain a local consensus accuracy level, and therefore requires an effective incentive plan to enroll clients in this setting.

Clients (Stage II): The MEC server, being the leader, has a first-move advantage and influences the followers to participate with a local consensus accuracy. It first announces a uniform reward rate (e.g., a fair data package discount in $/accuracy level) for the participating clients. Two kinds of pricing schemes are prominent, following different design goals: uniform pricing and discriminatory (differentiated) pricing [43]. The differentiated pricing scheme is more efficient, but also requires more information and higher complexity than uniform pricing [44], [45]. Therefore, based on the offered motivations and benefits, our proposed crowdsourcing framework follows a platform-centric model that trains a high-quality global model with low complexity and little information exchange by using the uniform pricing scheme. Given the announced reward rate, at Stage II, a rational client tries to improve its local model's accuracy to maximize its net utility by training over the local data with the global parameters. The proposed utility framework incorporates the cost incurred while a client tries to maximize its own individual utility.

Client Utility Model: We use a valuation function to denote the model's effectiveness, i.e., the valuation of the client when a given relative accuracy is attained for the local subproblem.
Assumption 1. The valuation function is a linear, decreasing function of the relative accuracy. Intuitively, for a smaller relative accuracy value (higher local accuracy) at the local subproblem, there will be an increase in the reward for the participating client.

Given the announced reward rate, each participating client's strategy is to maximize its own utility as follows:

(16)

given the cost as in (15). The feasible solution is always restricted to relative accuracy values less than 1 (without loss of generality, a value of 1 or more would violate the participation assumption of the crowdsourcing framework). Therefore, problem (16) can be represented as

(17)

Also, the objective in (17) is a strictly convex function. Thus, there exists a unique solution.
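
Reusing the client_cost sketch above and the linear valuation of Assumption 1, a client's Stage-II best response can be found numerically; the grid search below is an illustrative stand-in for the closed-form solution derived later in (27)-(29), not the paper's expression.

import numpy as np

def client_best_response(reward_rate, comm_time, weight):
    # Stage II: pick the relative accuracy theta in (0, 1) maximizing
    # reward_rate * (1 - theta) - client_cost(theta, comm_time, weight).
    grid = np.linspace(0.01, 0.99, 981)
    utilities = [reward_rate * (1.0 - th) - client_cost(th, comm_time, weight)
                 for th in grid]
    return float(grid[int(np.argmax(utilities))])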

MEC Server (Stage I): Knowing the responses (strategies) of the participating clients, the MEC server can evaluate an optimal reward rate to maximize its utility. The utility of the MEC server is defined in relation to the satisfaction measure achieved for the local consensus accuracy level.

MEC Server Utility Model: We define the number of iterations required for an arbitrary algorithm to converge to some accuracy, and similarly the number of global iterations of the framework needed to reach a given relative accuracy on the local subproblems.

From this perspective, we require an appropriate utility function as the satisfaction measure of the framework with respect to the number of iterations needed to achieve a given accuracy. In this regard, we use the definition of the number of iterations for accuracy as

Since the number of iterations is large, we approximate it as a continuous value; with the aforementioned relation, we choose the satisfaction measure as a strictly concave, increasing function. Thus, we propose a normalized utility function bounded within [0, 1] as

(18)

which is strictly increasing in the accuracy measure, and represents the increase in the MEC server's satisfaction with respect to accuracy.
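
Since the exact expression of (18) is not reproduced here, the stand-in below only illustrates the stated properties: normalized to [0, 1], strictly concave and increasing in the accuracy measure, with parameters a and b shaping the curvature and the threshold below which the utility is (near) zero.

import numpy as np

def mec_satisfaction(x, a=2.0, b=0.1):
    # x: accuracy measure in [0, 1].  Illustrative concave, increasing,
    # normalized utility; not the paper's exact form of (18).
    return float(np.clip(1.0 - np.exp(-a * (x - b)), 0.0, 1.0))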

Fig. 3: MEC utility as a function of with different parameter values of .

As for the global model, there exists an acceptable threshold accuracy measure, correspondingly reflected in the utility. This implies a near-zero utility for the MEC server if it fails to attain that value.

Fig. 3 depicts our proposed utility function, a concave function of the accuracy with parameters a and b that produce the required behavior of the utility function defined in (18). In Fig. 3, we can observe that a larger value of one of the parameters reduces the iteration requirement, while larger values of the other produce flatter curves, suggesting more flexibility in accuracy. We can therefore analyze the impact of the parameters a and b in (18), and set them to model the utility function for the MEC server as per the design requirements of the learning framework. Furthermore, in our setting, the number of iterations can be bounded above (by the maximum number of global iterations) as

(19)

Equation (19) explains the efficiency paradigm of the proposed framework in terms of the time required for convergence to some accuracy. Denoting the time per iteration needed to reach a relative accuracy on a local subproblem and the communication time required during a single iteration of an arbitrary algorithm, we can analyze the result in (19) in terms of the efficiency of the global model as

(20)

Because the cost of communication is proportional to the speed and energy consumption in a distributed scenario [20], the bound defined in (19) expresses the efficiency in terms of the MEC server's resource restrictions for attaining a given accuracy.

The utility of the MEC server can therefore be defined for the set of measured best responses as

where the system parameter characterizes a linear scaling of the utility function; it can be set arbitrarily without altering our evaluation, and can equivalently be understood as the MEC server's physical resource consignment for FL that reflects the satisfaction measure of the framework. The remaining term is the cost spent on incentivizing the participating clients in the crowdsourcing framework for FL. So, for the responses measured from the participating clients at the MEC server, the utility maximization problem can be formulated as follows:

(21)
s.t. (22)

In constraint (22), the worst-case response characterizes the server-side utility maximization problem with a bound on the permissible number of global iterations. Note that the MEC server adopts an admission control strategy (discussed in Section VI) to increase the number of participating clients and thereby maximize its utility. In fact, the MEC server has to increase the reward rate to maintain a minimum number of participants (at least two) to realize the distributed optimization setting of FL. In addition, the framework may suffer from slower convergence when there are fewer participants. Thus, the MEC server avoids deliberately dropping clients in order to achieve a faster consensus under (22).

Furthermore, using the relationship defined in (19) between the number of iterations and the relative accuracy of the subproblems, we can analyze the impact of the responses on the MEC server's utility in an FL setting under constraint (11). More specifically, we observe that for a better response, i.e., a lower relative accuracy value (higher local accuracy), the MEC server attains a better utility due to the corresponding increase in the satisfaction measure. Note that in the client cost problem, the reward rate is treated as a constant provided by the MEC problem, and can be ignored when solving (16).

Lemma 1.

The optimal solution for (21) can be derived as .

Proof:

See Appendix A. ∎

Therefore, for a given response , we can formalize (21) as

(23)

Stackelberg Equilibrium. With a solution to MEC server’s utility maximization problem, we have the following definition.

Definition 1. For any values of the reward rate and client responses, a strategy pair is a Stackelberg equilibrium if it satisfies the following conditions:

(24)
(25)

Next, we employ the backward-induction method to analyze the Stackelberg equilibria: the Stage-II problem is solved first to obtain the clients' best responses, which are then used to solve the Stage-I problem for the optimal reward rate.
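
The backward induction can be sketched as follows, reusing client_best_response and mec_satisfaction from the earlier sketches. The Stage-I objective here (scaled satisfaction of the worst-case response minus the incentive paid out) and the grid search over the reward rate are illustrative assumptions, not the closed-form analysis of Section IV-B.

def stackelberg_by_backward_induction(clients, reward_grid, sigma=10.0):
    # clients: list of (comm_time, weight) pairs describing each follower.
    best = None
    for r in reward_grid:
        # Stage II: every client replies with its own best response to r.
        thetas = [client_best_response(r, ct, w) for ct, w in clients]
        # Stage I: assumed server objective, driven by the worst-case response.
        payout = r * sum(1.0 - th for th in thetas)
        utility = sigma * mec_satisfaction(1.0 - max(thetas)) - payout
        if best is None or utility > best[2]:
            best = (r, thetas, utility)
    return best   # (reward rate, induced responses, server utility)

For instance, stackelberg_by_backward_induction([(0.5, 0.3), (2.0, 0.6)], [0.5 * i for i in range(1, 40)]) would return the reward rate on the grid that maximizes this assumed server objective.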

IV-B Stackelberg Equilibrium: Algorithm and Solution Approach

Intuitively, from (19), we see that the server can evaluate the maximum number of iterations required for attaining a given accuracy for the centralized model while maintaining a relative accuracy amongst the participating clients. Here, the local consensus accuracy is an agreed maximum local accuracy level amongst the participating clients, i.e., the local subproblems will maintain at least this relative accuracy. So, with the measured responses from the participating clients, the server can design a proper incentive plan to improve the global model while maintaining this worst-case relative accuracy for the local models.

Since the threshold accuracy can be adjusted by the MEC server in each round of the solution, each participating client will maintain a response toward the maximum local consensus accuracy. This formalizes the client selection criterion [see Remark 1], which is sufficient for the MEC server to maintain the target accuracy. We also have a lower bound on the number of iterations for an equivalent accuracy when dealing with the clients' responses, i.e.,

(26)

where the bound is the maximum permissible number of global iterations.

As explained before, and following (26), the threshold value can be lowered by the MEC server to improve the overall performance of the system. In the worst-case scenario, where the offered reward is insufficient to motivate a client to participate with an improved local relative accuracy, we may have no participation at all.

Lemma 2.

For a given reward rate and a per-round communication time determined by the channel conditions (14), the unique solution for the participating client satisfies the following relation:

(27)

for , where,

Proof: Because (17) has a linear-plus-convex structure, it is a strictly convex function. Therefore, by the first-order condition, (17) can be reduced to

(28)

We observe that Lemma 2 is a direct consequence of the solution structure derived in (28). This concludes the proof.

From Lemma 2, we make some observations regarding the responses of the participating clients. First, for a given reward rate, a poorer channel condition increases the per-round communication expenditure. Second, in such a scenario, an increase in the reward rate motivates the participating clients to iterate more during their computation phase, resulting in a lower relative accuracy value (higher local accuracy). This reduces the number of global iterations needed to attain a given accuracy level for the global problem.

We can therefore characterize the participating client ’s best response under the proposed framework as

(29)

Equation (29) represents the best-response strategy of a participating client under our proposed framework. Intuitively, from the logarithmic structure in (27), we observe that an increase in the incentive motivates participating clients to increase their local iteration effort within one global iteration. This is reflected in a better response, i.e., a lower relative accuracy value (higher local accuracy), during each round of communication with the MEC server.
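
Using the client_best_response sketch from Section IV-A, the monotonic behaviour discussed around Lemma 2 and (29) can be checked numerically (all numbers are illustrative): raising the reward rate, or making the channel slower, pushes the best response toward a smaller relative accuracy value, i.e., more local iterations.

for comm_time in (0.5, 2.0):                 # fast vs. slow (poor) channel
    responses = [client_best_response(r, comm_time, weight=0.5)
                 for r in (1.0, 5.0, 10.0, 20.0)]
    print(f"comm_time={comm_time}: theta over increasing reward -> {responses}")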

Fig. 4: An illustration showing participating clients response over the offered reward rate.

Fig. 4 illustrates such strategic responses of the participating clients to an offered reward for a given configuration. In this scenario, to elaborate the best-response strategy characterized in (29), we consider four participating clients with different preferences (e.g., Client 3 being the most reluctant participant). We observe that Client 3 seeks more incentive to maintain an accuracy level comparable to Client 1. Further, we consider the trade-off between the communication cost and the computation cost as discussed in relation to (15). These costs are complementary through the normalized weight, and each client's preference over them differs. For instance, a higher weight on the communication cost for a client emphasizes the increased number of communication rounds with the MEC server needed to improve the local relative accuracy.

Fig. 5: Solution analysis of (27) (left y-axis: relative accuracy; right y-axis: communication cost): (a) impact of communication adversity on the local relative accuracy for a constant reward; (b) normalized weight versus relative accuracy for a fair data rate (good communication channel); (c) normalized weight versus relative accuracy for an expensive data rate.
Fig. 6: Case study: impact of communication cost and offered reward rate for different values of the normalized weight (preferences), defining the client categories (a) Reluctant, (b) Rational, (c) Sensitive. The x-axis shows the increase in the incentive (r) value from left to right, and the y-axis the increase in communication expenditure from top to bottom.

In Fig. 5, we briefly present the solution analysis of (27), showing the impact of the channel condition (which we term communication adversity) on the local relative accuracy for a constant reward. For this, in Fig. 5a we consider a participating client with a fixed offered reward and with values uniformly distributed from 0.1 to 5, using a normalized weight parameter for the client to illustrate the response analysis. In Fig. 5b and Fig. 5c, the normalized weight is uniformly distributed on [0.1, 1], with the remaining setting fixed at 0.6. Intuitively, as in Fig. 5a, an increase in communication time for a fixed reward influences the participating client to iterate more locally to improve local accuracy rather than to rely upon the global model, which minimizes its total cost. Under this scenario, we observe an increase in the communication cost with the increase in communication time; thus, the client iterates more locally. However, the trend is significantly affected by the normalized weights, as observed in Fig. 5b and Fig. 5c. For a poor channel condition, as in Fig. 5c, increasing the weight, i.e., placing more preference on the communication cost in the total cost model, results in more local iterations for solving the local subproblem, reflected in better local accuracy, unlike in Fig. 5b. In both cases we observe a decrease in the communication cost with participation; however, in Fig. 5c the communication cost is higher because of the expensive data rate. Therefore, for a given reward, a client can adjust its weight metric accordingly to improve its response.

In Fig. 6, we explore these behaviors of the participating clients through heatmap plots. To explain them better, we define three categories of participating clients based upon the value of the normalized weight, i.e., their individual preferences over the computation cost and the communication cost for the convergence of the learning framework. (i) Reluctant clients, with a lower weight, consume more reward to improve local accuracy, even when the communication is expensive, as observed in Fig. 6a. (ii) Sensitive clients are more susceptible to the channel quality, with a larger weight, and iterate more locally within a round of communication to the MEC server to improve local accuracy, as observed in Fig. 6c. (iii) Rational clients, as shown in Fig. 6b, tend to balance these extreme preferences, which in fact would be unrealistic to expect all the time due to the heterogeneity of the participating clients' resources.

To solve (23) efficiently, using (29) we introduce a new indicator variable related to the consensus on the local relative accuracy,

(30)

where the corresponding threshold is the minimum incentive, obtained from (29), required to attain the local consensus accuracy at a client for the defined parameters.

This means the indicator is one when the offered reward reaches a client's minimum required incentive, and zero otherwise. The MEC server can use this setting to drop participants with poor accuracy. As discussed before, the worst-case scenario is no participation.

Therefore, the utility maximization problem can be equivalently written as

(31)
s.t. (32)
(33)
1: Sort the clients in increasing order of their minimum required incentive
2:
3: while  do
4:     Obtain the solutions to the following problem
5:     if  then
6:     end if
7:
8:
9: end while
10: Return the solution with the highest optimal value of problem (34)
Algorithm 2 MEC Server’s Utility Maximization

Problem (31) is a mixed-boolean program, which may require exponential complexity (i.e., enumerating the configurations of the indicator variables) to solve by exhaustive search. To solve this problem with linear complexity, we use the solution approach given in Algorithm 2.

The utility maximization problem at the MEC server can be reformulated as the constrained optimization problem (34)-(35), assuming a fixed configuration of the indicator variables, as

(34)
s.t. (35)

where (35) is the budget constraint of the problem. The second-order derivative condition shows that problem (34) is a convex problem, which can be solved as in Algorithm 2 (lines 4-5).
Proposition 1. Algorithm 2 can solve the Stage-I equivalent problem (23) with linear complexity.

Proof: As the clients are sorted in increasing order of their minimum required incentive (line 1), under the resulting sufficient condition the MEC's utility maximization problem reduces to a single-variable problem that can be solved using standard numerical methods.
Remark 1. Algorithm 2 maintains the consensus accuracy by formalizing the client selection criterion. This follows from (30): the indicator is one for clients whose minimum required incentive is covered by the offered reward, and zero otherwise. Thus, the MEC server uses this setting to drop the participants that cannot meet the consensus accuracy.
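
A compact sketch of the idea behind Algorithm 2 and Remark 1 is given below, again reusing the illustrative client_best_response and mec_satisfaction helpers. Clients are sorted by the minimum reward they require (cf. (30)); each pass admits one more client and solves the remaining one-dimensional problem over the reward rate, so an exhaustive search over client subsets is avoided. The objective and the grid solve are assumptions for illustration.

def mec_utility_maximization(clients, min_rewards, reward_grid, sigma=10.0):
    # clients: list of (comm_time, weight); min_rewards[k]: minimum incentive
    # needed by client k to reach the consensus accuracy, cf. (30).
    order = sorted(range(len(clients)), key=lambda k: min_rewards[k])
    best, admitted = None, []
    for k in order:
        admitted.append(k)
        sub = [clients[i] for i in admitted]
        # Only reward rates that attract every admitted client are feasible.
        for r in (x for x in reward_grid if x >= min_rewards[k]):
            thetas = [client_best_response(r, ct, w) for ct, w in sub]
            utility = sigma * mec_satisfaction(1.0 - max(thetas)) \
                      - r * sum(1.0 - th for th in thetas)
            if best is None or utility > best[0]:
                best = (utility, r, list(admitted))
    return best   # (utility, reward rate, admitted client indices)
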
Theorem 1. The Stackelberg equilibria of the crowdsourcing framework are the set of pairs .

Fig. 7: Comparison of (a) Reward Rate and (b) MEC Utility, under three schemes for different values of threshold accuracy.

Proof: For any given set of responses, the announced reward rate maximizes the Stage-I objective, since it is the solution to the Stage-I problem. Similarly, for any given reward rate, each client's best response maximizes its own utility in Stage II. Combining these facts with the definitions in (24) and (25), we conclude the proof.

V Simulation Results and Analysis

In this section, we present numerical simulations to illustrate our results. We consider the learning setting of a strongly convex model, such as logistic regression, as discussed in Section III, to characterize and demonstrate the efficacy of the proposed framework. First, we show the optimal solution of Algorithm 2 (ALG. 2) and compare its performance with two baselines. The first, named OPT, is the optimal solution of problem (23) obtained by exhaustive search over the optimal responses. The second, called Baseline, considers the worst response amongst the participating clients for attaining the local consensus accuracy at an offered price; this is an inefficient scheme but still yields feasible solutions. Finally, we analyze the system performance by varying different parameters, and compare the incentive mechanism with the baseline in terms of the corresponding utilities. In our analysis, smaller values of the local consensus accuracy are of specific interest, as they reflect the effectiveness of FL.

1) Settings: For an illustrative scenario, we fix the number of participating clients to 4. We set the system parameter and the upper bound on the number of global iterations, which characterizes the permissible rounds of communication to ensure global accuracy. The MEC utility model is defined with its parameters. For each client, we consider the normalized weight to be uniformly distributed on [0.1, 0.5], which provides an insight into the system's efficacy as presented in Fig. 6. We characterize the interaction between the MEC server and the participating clients under homogeneous channel conditions, and use the normalized value of the communication time for all participating clients.

Fig. 8: Impact of the parametric choice on the offered reward: (a) identical parameters across the participating clients; (b) parameters uniformly distributed on [1, 5].

2) Reward Rate: In Fig. 7 we increase the value of the local consensus accuracy from 0.2 to 0.6. When the accuracy level is improved (from 0.4 to 0.2), we observe a significant increase in the reward rate. These results are consistent with the analysis in Section IV-B. The reason is that attaining a higher local accuracy level requires more local iterations, and thus the participating clients demand more incentive to compensate for their costs.

We also show that the reward variation is prominent for smaller threshold values, and observe that ALG. 2 and OPT achieve the same performance, while Baseline is not as efficient. Here, we observe up to a 22% gain in the offered reward for the two proposed schemes against the Baseline. In Fig. 7b, we see the corresponding MEC utilities for the offered reward, which confirm the competence of the proposed ALG. 2. The trend of utility against the offered reward is in line with our analysis.

3) Parametric choice: In Fig. 8 we show the impact of the parametric choice adopted by the participating clients to solve the local subproblem [19]. In Fig. 8a, we see a lower offered reward for an improved local accuracy level when the participating clients adopt the same parameters (algorithms) for solving the local subproblem, in contrast to Fig. 8b, where the parameter is uniformly distributed on [1, 5] to achieve a competitive utility.

4) Comparisons: In Table (a) and Table (b), we observe the effect of the randomized parameter for different configurations of the MEC utility model. For smaller threshold values, which capture the competence of the proposed mechanism, we observe that the parametric choice provides a consistent offered reward with improved utility, which follows our analysis in Section IV-A. For larger threshold values, we see a similar trend in the MEC utility. For a randomized setting, we observe up to a 71% gain in the offered reward against the Baseline, which validates the efficacy of our proposal in aiding FL.

Threshold Accuracy Baseline ALG. 2 ALG. 2 ALG. 2
0.2 18 5.22 5.22 5.22
0.3 12 3.48 3.48 3.48
0.4 8.99 2.602 2.6 2.61
0.5 7.19 2.79 4.3 2.2
0.6 5.99 2.38 2.87 2.1
0.7 5.13 2.84 3.17 1.9
(a) OFFERED REWARD RATE COMPARISON WITH RANDOMIZED EFFECT FOR DIFFERENT SETTING.
Threshold Accuracy ALG. 2 ALG. 2 ALG. 2
0.2 8.55 8.79 8.96
0.3 8.41 8.60 8.95
0.4 8.33 8.58 8.94
0.5 8.2 8.73 8.91
0.6 8.18 8.4 8.91
0.7 7.8 8.51 8.86
(b) UTILITY COMPARISON WITH RANDOMIZED EFFECT FOR DIFFERENT SETTING.

VI Threshold Accuracy Estimation: An Admission Control Strategy

Our earlier discussion in Section IV and the simulation results explain the significance of choosing a local accuracy level for building a global model that maximizes the utility of the MEC server. In this regard, the MEC server first invokes admission control to determine the threshold accuracy, and the final model is learned afterwards. This means that, given the number of expected clients, it is crucial to select an appropriate prior value of the threshold accuracy that corresponds to the participating clients' selection criterion for training a specific learning model. Note that, in each communication round of synchronous aggregation at the MEC server, the quality of the local solutions helps to evaluate the performance of the local subproblems. In this section, we discuss the probabilistic model employed by the MEC server to determine the value of the consensus accuracy.

We consider the local accuracy of the participating clients to be an i.i.d. random variable, uniformly distributed over a given range, so that the PDF of the responses is defined accordingly. Let us consider a sequence of discrete time slots in which the MEC server updates its configuration for improving the accuracy of the system. Following our earlier definitions, at each time slot there is a number of participating clients in the crowdsourcing framework for FL. We restrict admission to clients whose accuracy measure satisfies the threshold, so that out of the total number of participation requests a subset of responses is accepted. At each time slot, the MEC server chooses the threshold accuracy that maximizes the sum of its utility as defined in (18), for the defined parameters and the total participation, subject to the constraint that the response lies between the minimum and maximum accuracy measures. Using the definitions in (19), the MEC server maximizes its utility over the accepted participants as

(36)

The Lagrangian of the problem (36) is as follows:

(37)

where the multipliers are dual variables. Problem (36) is a convex problem whose optimal primal and dual variables can be characterized using the Karush-Kuhn-Tucker (KKT) conditions [40] as

(38)
(39)
(40)

Following the complementary slackness criterion, we have

(41)

Therefore, from (41), we solve (36) with the KKT conditions under the admission control assumption, and find the optimal threshold accuracy that satisfies the following relation:

(42)

(42) can be rearranged as

(43)

To obtain the value of the threshold accuracy, we use the Newton-Raphson method [46] with an appropriate initial guess, which yields quadratic convergence of the solution. We choose an initial guess based on the PDF of the responses. The solution method is then the following iterative approach:

(44)
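
The iteration in (44) is a standard Newton-Raphson update. A generic sketch is given below; the function g stands in for the optimality condition (43), whose exact expression (depending on the utility parameters and the number of admitted clients) is not reproduced here, and the sample initial guess is simply the mean of the assumed uniform response distribution.

def newton_raphson(g, g_prime, x0, tol=1e-8, max_iter=100):
    # Solve g(x) = 0 iteratively, as in (44): x <- x - g(x) / g'(x).
    x = x0
    for _ in range(max_iter):
        step = g(x) / g_prime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Placeholder condition standing in for (43); not the paper's expression.
g = lambda x: x**3 - 0.5
g_prime = lambda x: 3.0 * x**2
theta_threshold = newton_raphson(g, g_prime, x0=0.5)
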
Fig. 9: Variation of local accuracy for different values of given the density function, , (a) For a = 0.35, b = -1. (b) For a = 0.45, b = -1.05.

Numerical Analysis: In Fig. 9, we vary the number of participating clients up to 50 for different values of the upper bound on global iterations. The responses of the clients are set to follow a uniform distribution on [0.1, 0.9] for ease of presentation. In Fig. 9a, for the model parameters (a, b) = (0.35, -1), we see that the threshold accuracy value increases with the number of participating clients for all values of the upper bound. This is intuitive and in line with our earlier analysis: for a small number of participating clients, a smaller threshold captures the efficacy of our proposed framework. Because this is an iterative process, the evolution of the threshold over the rounds of communication is reflected in the framework design. Subsequently, a larger upper bound exhibits a similar impact on the threshold, where a smaller threshold imposes a stricter local accuracy requirement to attain a high-quality centralized model. For the same reason, in Fig. 9b, the threshold also increases with the number of participating clients, but at lower values; this is due to the choice of the parameters (a, b), as explained in Section IV-A.

VII Conclusion

In this paper, we designed and analyzed a novel crowdsourcing framework to enable FL. An incentive mechanism was established to enable the participation of several devices in FL. In particular, we adopted a two-stage Stackelberg game model to jointly study the utility maximization of the participating clients and the MEC server interacting via an application platform for building a high-quality learning model. We incorporated the challenge of maintaining communication efficiency while exchanging the model parameters among participating clients during aggregation. Further, we derived the best-response solutions and proved the existence of the Stackelberg equilibrium. We examined the characteristics of participating clients for different parametric configurations. Additionally, we conducted numerical simulations and presented several case studies to evaluate the framework's efficacy. Through a probabilistic model, we designed and presented numerical results for an admission control strategy that relates the number of participating clients to the corresponding local consensus accuracy. For future work, we will focus on a mobile crowdsourcing framework enabling self-organizing FL that considers task offloading strategies for resource-constrained devices. We will consider the scenario where the central coordinating MEC server is replaced by one of the participating clients and devices can offload their training tasks to the edge computing infrastructure. Another direction is to study the impact of a discriminatory pricing scheme on participation. Such work can lead to various incentive mechanism designs, such as offered tokens in a blockchain network [17]. We also plan to further investigate participating clients' behavior, in terms of incentives and communication efficiency, to incorporate cooperative data trading scenarios into the proposed framework [47], [48].

Appendix A KKT Solution

The utility maximization problem in (21) is a convex optimization problem whose optimal solution can be obtained using Lagrangian duality. The Lagrangian of (21) is

(A.1)

where the Lagrange multiplier corresponds to constraint (22).