I Introduction
Currently, there are nearly 7 billion connected Internet-of-Things (IoT) devices (footnote 1: https://iotanalytics.com/stateoftheiotupdateq1q22018numberofiotdevicesnow7b/) and billions of smartphones around the world. These devices continuously generate a large amount of fresh data. Traditional data analytics and machine learning require all the data to be collected at a centralized data center or server, where they are analyzed or used to produce effective machine learning models. This is the current practice of giant AI companies, including Amazon, Facebook, and Google. However, this approach may raise concerns regarding data security and privacy. Although various privacy preservation methods have been proposed, such as differential privacy [1] and secure multi-party computation (MPC) [10], a large proportion of people are still unwilling to expose private data that can be inspected by the server. This discourages the development of advanced AI technologies as well as new industrial applications. Motivated by the increasing privacy concerns among data owners, Google introduced the concept of federated learning (FL) [5]. FL is a collaborative learning scheme that distributes the training process to individual users, who collaboratively train a shared model while keeping the data on their devices, thus alleviating the privacy issues.
A typical FL system is composed of two entities: the FL platform and the data owners. Each data owner, e.g., a mobile phone user, has a set of private data stored on its local device. The local data are used to train a local machine learning model, where the initial model and hyperparameters are preset by the FL platform. Once the local training is completed, each data owner sends only the trained model to the FL platform. Then, all received local models are aggregated by the FL platform to build a global model. The training process iterates until the target performance is achieved or the predefined number of iterations is reached. Federated learning has three distinctive characteristics: 1) A massive number of distributed FL participants are independent and uncontrollable, which is different from traditional distributed training at a centralized data center. 2) The communication among devices, especially through the wireless channel, can be asymmetric, slow and unstable. The assumption of a perfect communication environment with a high information transmission rate and negligible packet loss is not realistic. For example, the Internet upload speed is typically much slower than the download speed. Some participants may consequently drop out due to disconnection from the Internet, especially when using mobile phones over congested wireless communication channels. 3) The local data are not independent and identically distributed (non-IID), which significantly affects the learning performance [33]. Since data owners’ local data cannot be accessed and fused by the FL platform and may follow different distributions, assuming that all local datasets are IID is impractical.
As implied by the first characteristic above, an important prerequisite for a successful FL task is the participation of a large base of data owners that contribute sufficient training data. Therefore, establishing an FL services market is necessary for the sustainable development of the FL community. We propose an auction-based market model to facilitate commercializing federated learning services among different entities. Specifically, the FL platform first initiates and announces an FL task. On receiving the information about the FL task, each data owner determines its service value by evaluating its local data quality and its computing and communication capabilities. Then, data owners report their types, including bids representing the service value and their resource information, to the FL platform. According to the received types, the platform selects a set of FL workers from the data owners and decides the service payments. Finally, the FL platform coordinates the selected FL workers to conduct model training.
In this paper, we mainly investigate federated learning in the wireless communication scenario and design applicable auction mechanisms to realize the trading between the FL platform and the data owners. From the system perspective, we aim to maximize their total utility, i.e., the social welfare. For an efficient and stable business ecosystem of the FL services market, there are several critical issues regarding FL task allocation and pricing. First, which data owners can participate in the federated training as FL workers? Due to the unique features listed above, the FL platform should consider the data owners’ reported data size and the non-IID degree of their data. Also, the limited wireless spectrum resource needs to be reasonably allocated, since a large population of participating data owners may exacerbate the communication congestion. Second, how should reasonable payments be set so that data owners are incentivized to undertake the FL tasks? An auction is an efficient method for pricing and task allocation [16]. The payment amount should satisfy individual rationality, which means that data owners incur no loss from trading. We should also consider how to make data owners truthfully expose their private types. The truthfulness property can stabilize the market, prevent possible manipulation, and may significantly reduce the communication overhead and improve the learning efficiency. The major contributions of this paper can be summarized as follows:

Based on real-world datasets and experiments, we define and verify a data quality function that reflects the impacts of local data volume and distribution on the federated training performance. The earth mover’s distance (EMD) [33] is used as the metric to measure the non-IID degree of the data. Moreover, we consider the wireless channel sharing conflicts among data owners.

We propose an auction framework for the wireless federated learning services market. From the perspective of the FL platform, we formulate the social welfare maximization problem, which is a combinatorial NP-hard problem.

We first design a reverse multi-dimensional auction (RMA) mechanism as an approximate algorithm to maximize the social welfare. To further improve the social welfare and the efficiency, we develop a novel automated deep reinforcement learning based auction (DRLA) mechanism which is integrated with the graph neural network (GNN). According to the data owners’ requested wireless channels, we construct a conflict graph as the input to the GNN. Both mechanisms, i.e., RMA and DRLA, are theoretically proved to be strategy-proof, i.e., truthful and individually rational.

Demonstrated by our simulation results, the proposed auction mechanisms can help the FL platform make practical trading strategies to efficiently coordinate data owners to invest their data and computing resources in the federated learning while optimizing the social welfare of the FL services market. Particularly, the automated DRLA mechanism shows significant improvement in social welfare compared with the RMA mechanism.
To the best of our knowledge, this is the first work that studies the auction-based wireless FL services market and applies the GNN and deep reinforcement learning (DRL) in the design of a truthful auction mechanism to solve a combinatorial NP-hard problem.
The rest of this paper is organized as follows. Section II reviews related work. The system model of the FL services market and the social welfare maximization problem are introduced in Section III. Section IV presents the reverse multi-dimensional auction mechanism. In Section V, the automated auction mechanism based on GNN and DRL is presented in detail. Section VI presents and analyzes simulation results based on real-world and synthetic datasets. Finally, Section VII concludes this paper. Table I lists notations frequently used in the paper.
Notation  Description 

,  Set of data owners and the total number of data owners 
,  Set of selected FL workers and the total number of FL workers 
,  Data owner ’s local training dataset 
Total size of selected workers  
, , ,  Data owner ’s bid, data size, EMD value and set of requested wireless channels 
Set of data owners that have channel conflicts with set of selected workers  
FL Platform’s data utility  
Average EMD value of selected workers  
Model transmission rate between the platform and workers  
Number of iterations in local and global training 
II Related Work
Due to the resource constraints and the heterogeneity of devices, some papers have discussed the optimal allocation of resources and tasks to improve the efficiency of federated training. The relevant issues mainly include client selection, computation offloading and incentive mechanisms. The authors in [24] designed a protocol called FedCS. The FedCS protocol has a resource request phase to gather information such as computing power and wireless channel states from a subset of randomly selected clients, i.e., FL workers. To trade off accuracy and efficiency, the FL platform optimally selects a set of clients that are able to finish the local training punctually. Compared with a protocol that ignores client selection, FedCS can achieve higher performance. Besides improving the training efficiency, the authors in [20, 17] discussed the fairness issue: if a protocol selects the clients by computing power, the final trained model caters more to the data distribution of clients with high computational capability. Based on the original federated averaging (FedAvg) algorithm [15], a modified FedAvg training algorithm was proposed in [17] to give clients with low performance a higher weight in optimizing the objective function. For computation offloading, the authors in [30] combined deep reinforcement learning (DRL) and FL to optimally allocate the mobile edge computing (MEC) resources. The client can use DRL to intelligently decide whether to perform the training locally or offload it to the edge server. The simulation results showed that the DRL based approaches can achieve similar average utilities in FL and centralized learning. With respect to incentive mechanism design, the authors in [9] proposed a Stackelberg game model to investigate the interactions between the server and the mobile devices in a cooperative relay communication network.
The mobile devices determine the price per unit of data for individual profit maximization, while the server chooses the size of training data to optimize its own profit. The simulation results demonstrate that the interaction can finally reach an equilibrium, and that the cooperative communication scheme can reduce the congestion and improve the energy efficiency. In a setting similar to that of [9], the authors in [12] proposed a contract theory method to incentivize the mobile devices to take part in the FL and contribute high-quality data. The mobile users can only choose the contract matching their own types to maximize the utility. However, the above incentive mechanisms did not consider the non-IID data or the wireless channel constraints, both of which are taken into account in this paper.
Different from the Stackelberg game and contract theory, the auction mechanism allows the data owner to actively report its type. Thus, the FL platform can sufficiently understand the data owners’ status and requests to optimize the target performance metric, such as the social welfare of the market or the platform’s revenue. To design new auction mechanisms that achieve higher performance or other properties that manually designed auction mechanisms cannot realize, automated mechanism design [6, 25] assisted by machine learning techniques is gaining popularity. In [8], the authors used a multilayer neural network to model an auction with the guarantee of individual rationality (IR) and incentive compatibility (IC). (Footnote 2: In this paper, truthfulness and incentive compatibility are used interchangeably.) The proposed deep learning based framework successfully recovered all known analytical solutions to classical multi-item auction settings, and discovered new mechanisms for settings where the optimal analytical solution is unknown. The study of using DRL to solve combinatorial problems over graphs was initiated in [13]. The authors first calculated the graph embedding and then trained a deep Q network to optimize several classical NP-hard problems in a greedy style. Since the wireless channel conflicts among the data owners are represented by a conflict graph in this paper, we propose an automated auction mechanism based on DRL and GNN to optimize the social welfare of the FL services market while meeting the requirements of IC and IR.

III System Model: Federated Learning Services Market
III-A Preliminary Knowledge of Federated Learning
As illustrated in Fig. 1, we focus on a representative monopoly FL services market structure which consists of one FL platform and a community of data owners . The platform publishes the FL task and selects data owners as FL workers. Each data owner maintains a set of private local data and has a local FL runtime to train a local model . We use to denote the set of FL workers selected from data owners. Different from the traditional centralized training that collects all local data , the FL platform only collects and aggregates the updated local models from workers to generate a global model . We assume that the data owners are honest, i.e., they use their real private data for training and submit the true local models to the platform. The FL training process generally contains the following steps, where Steps 2 and 3 form an iterative loop between the platform and the workers.

Step 1 (task initialization): The platform determines the training task, i.e., the target application, and the corresponding data requirements. Meanwhile, it specifies the hyperparameters of the machine learning model and the training process. Then, the platform transmits the task information and the initial global model to all workers.

Step 2 (local model training and update): Based on the global model , where denotes the current global epoch index, each worker uses its local data and device to update the local model parameters . The worker ’s goal in epoch is to find the parameters that minimize the predefined loss function , i.e., (1)
Step 3 (global model aggregation and update): The platform receives and aggregates the local models from workers, and then sends the updated global model parameters back. The platform aims to minimize the global loss function , i.e.,
Steps 2 and 3 repeat until the global loss converges. Note that the federated training process can be adopted for various machine learning approaches based on the gradient descent method, such as support vector machines (SVM), convolutional neural networks, and linear regression. The worker ’s local training dataset usually contains a set of feature vectors and a set of corresponding labels . Let denote the predicted result from the model using data vector . We focus on the neural network model, in which a common loss function is the mean square error (MSE) defined as (2)
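For reference, the local objective (1) and the MSE loss (2) can be written out in the following generic form. The symbols are assumptions introduced here for illustration and may differ from the notation of Table I: $\omega_i^{t}$ denotes the local parameters of worker $i$ in global epoch $t$, $D_i$ its local dataset of feature-label pairs $(x_j, y_j)$ with scalar labels, $f(x_j;\omega)$ the model prediction, and $L_i$ the local loss.

```latex
\begin{align*}
\omega_i^{t} &= \arg\min_{\omega} \; L_i(\omega), \\
L_i(\omega) &= \frac{1}{|D_i|} \sum_{(x_j, y_j) \in D_i} \bigl( y_j - f(x_j; \omega) \bigr)^{2}.
\end{align*}
```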
Global model aggregation is the core part of the FL scheme. In this paper, we apply the classical federated averaging algorithm (FedAvg) [5] in Algorithm 1. According to (1), the worker trains the local model on minibatches sampled from the original local dataset (lines ). At the th iteration, the platform minimizes the global loss using the averaging aggregation which is formally defined as
(3) 
As the hyperparameters of Algorithm 1, is the local minibatch size, is the number of local epochs, is the number of global epochs, and is the learning rate.
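The averaging aggregation in (3) can be illustrated with a short sketch. It assumes the standard FedAvg weighting, in which each local model is weighted by the share of its local data size in the total data size of the selected workers; whether Algorithm 1 uses exactly these weights is an assumption, and the function name `fedavg_aggregate` is hypothetical.

```python
def fedavg_aggregate(local_models, data_sizes):
    """Return the data-size-weighted average of local parameter vectors.

    local_models: list of equally long lists of floats (one per worker).
    data_sizes:   list of local dataset sizes, in the same worker order.
    Assumption: weights are proportional to local data size (standard FedAvg).
    """
    if len(local_models) != len(data_sizes) or not local_models:
        raise ValueError("need one data size per local model")
    total = float(sum(data_sizes))
    if total <= 0:
        raise ValueError("total data size must be positive")

    dim = len(local_models[0])
    global_model = [0.0] * dim
    for params, size in zip(local_models, data_sizes):
        weight = size / total  # share of this worker in the total data
        for k in range(dim):
            global_model[k] += weight * params[k]
    return global_model
```

A worker holding three times as much data as another therefore pulls the global parameters three times as strongly toward its local model.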
III-B Local Data Evaluation
The evaluation of local data is critical for both the data owners and the platform. The data owner has to calculate the data cost for the valuation of its FL service. (Footnote 3: Note that an FL worker refers to a data owner that has been selected by the platform to perform the FL training.) The platform cares about the data quality and needs a metric to quantify data owners’ potential contributions to the task completion. Specifically, we evaluate the local data from two perspectives: the data size and the data distribution. The local data cost mainly comes from collecting data and is closely related to the data size. Thus, the worker ’s data cost can be written as
(4) 
where are respectively the data owner ’s data size and unit cost of data collection.
For the FL platform, more data generally means better prediction performance [7, 11]. With respect to the data distribution, conventional centralized learning, e.g., data center learning, usually assumes that the training data are independently and identically distributed (IID). However, the local data are user-specific and usually non-IID in the FL scenario. The non-IID characteristic strongly affects the performance, e.g., prediction accuracy, of the trained FL model [5]. As indicated in [33], the accuracy reduction is mainly due to the weight divergence, which can be quantified by the earth mover’s distance (EMD) metric. A large EMD value means that the weight divergence is high, which adversely affects the global model quality. We consider an class classification problem defined over a compact space and a label space . The data owner ’s data samples distribute over following the distribution . Let denote the EMD of . Specifically, given the actual distribution for the whole population, the EMD is calculated by [33]
(5) 
The actual distribution is used as a reference distribution. It can be public knowledge or announced by the platform, which has sufficient historical data to estimate it. Let denote the set of all data owners’ EMD values. With the data size and the EMD metric, the FL platform can measure its data utility. The real-world experimental results in Section VI indicate that the relationship between the model quality , e.g., prediction accuracy, and the selected workers’ total data size and average EMD can be well represented by the following function: (6)
where and are functions of the set of workers , i.e., the total data size and the average EMD metric with , and . are positive curve fitting parameters. The curve fitting approach for determining the function of machine learning quality is typical in the literature, and a similar function has been adopted in other works, such as [18]. In the experiment presented in Section VI-A, the data utility function (6) fits well when falls in the . To guarantee good service quality, can be set as the maximum EMD that the platform can accept. The first term reflects that an increasing average EMD metric degrades the model performance. The exponential term captures the diminishing marginal returns when the total data size increases. Hereby, we define the platform’s data utility as a linear function of as
(7) 
where represents the profit per unit performance.
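The EMD metric in (5) can be illustrated with a short sketch. It assumes the form used in [33], i.e., the sum over classes of the absolute difference between a data owner's label distribution and the reference distribution; the helper names `label_distribution` and `emd` are hypothetical.

```python
from collections import Counter


def label_distribution(labels, classes):
    """Empirical label distribution of a local dataset over the given classes."""
    if not labels:
        raise ValueError("local dataset is empty")
    counts = Counter(labels)
    return [counts.get(c, 0) / len(labels) for c in classes]


def emd(local_dist, reference_dist):
    """Sum of absolute per-class probability differences.

    Assumption: this follows the EMD definition of [33]; a value of 0 means
    the local data match the reference distribution exactly.
    """
    if len(local_dist) != len(reference_dist):
        raise ValueError("distributions must cover the same classes")
    return sum(abs(p - q) for p, q in zip(local_dist, reference_dist))


# Example: a two-class task whose reference distribution is uniform.
classes = [0, 1]
reference = [0.5, 0.5]
skewed_owner = label_distribution([0, 0, 0, 1], classes)
balanced_owner = label_distribution([0, 1, 1, 0], classes)
```

In this example the balanced owner has an EMD of zero, while the skewed owner, whose labels are 75% class 0, deviates from the reference on both classes.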
III-C Auction based FL Services Market
To recruit enough qualified workers for successful federated training, the FL platform conducts an auction. (Footnote 4: We use “FL platform” and “platform” interchangeably.) Figure 1 depicts the auction-supported FL process. For simplicity, we assume that the data owners’ computing and storage capabilities, i.e., the CPU frequency and memory, can meet the FL platform’s minimum requirements on the training speed and the local model size. Due to the requirement of low latency/timeliness, the FL worker should immediately transmit back the updated local model at the required transmission rate bits/s when the local training is completed. As described in Step 1 in Section III-A, the platform first initializes the global neural network model with size and hyperparameters, such as and . Then, the platform announces the auction rule and advertises the FL task to the data owners. Next, the data owners report their type profile . The data owner ’s type contains the bid which reveals its private service cost/valuation , the size and EMD value of its possessed local data, and the set of its requested wireless channels to communicate with the FL platform, i.e., . Here, the data owners cannot provide services with higher data quality than they truly own. That is, they will not report a higher data size or a lower EMD metric to the platform. Based on the received types, the platform selects workers and notifies all data owners of the service allocation, i.e., the set of FL workers , and the corresponding payments to each data owner. The workers are considered to be single-minded with respect to the channel allocation. That is, the data owner only accepts the set of its requested channels if it wins the auction. The payment for a data owner failing the auction is set to be zero, i.e., if . Once the auction results are released, an FL session starts and the selected workers train the local model using the local data.
Meanwhile, the platform keeps aggregating the local models and updating the global model. Finally, the platform pays the workers when the FL session is completed.
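The reported type and the channel conflicts it induces can be represented with a small data structure. The sketch below is illustrative only: the class name `OwnerType`, its field names, and the rule that two data owners conflict whenever their requested channel sets overlap are assumptions made here, consistent with the single-minded channel requests described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OwnerType:
    """Type reported by one data owner (field names are hypothetical)."""
    owner_id: int
    bid: float            # reported service cost
    data_size: int        # reported number of local samples
    emd: float            # reported EMD value of the local data
    channels: frozenset   # requested wireless channels


def in_conflict(a, b):
    """Assumed conflict rule: two owners conflict if they request a common channel."""
    return bool(a.channels & b.channels)


def conflict_graph(owners):
    """Adjacency sets of the channel conflict graph, keyed by owner_id."""
    graph = {o.owner_id: set() for o in owners}
    for idx, a in enumerate(owners):
        for b in owners[idx + 1:]:
            if in_conflict(a, b):
                graph[a.owner_id].add(b.owner_id)
                graph[b.owner_id].add(a.owner_id)
    return graph


owners = [
    OwnerType(0, bid=2.0, data_size=500, emd=0.3, channels=frozenset({1, 2})),
    OwnerType(1, bid=1.5, data_size=300, emd=0.6, channels=frozenset({2, 3})),
    OwnerType(2, bid=3.0, data_size=800, emd=0.1, channels=frozenset({4})),
]
graph = conflict_graph(owners)
```

Here owners 0 and 1 both request channel 2 and therefore cannot be selected together, whereas owner 2 is compatible with either of them.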
III-D Service Cost in the FL Market
Besides the data cost defined in (4), the data owner also needs to calculate the costs of computation and communication to estimate its service cost if it becomes the worker. The data owner ’s computation cost is defined as
(8) 
where is the data owner ’s unit computation cost, mainly including the energy expenditure and the equipment depreciation. Since the structures of the global model and the local model are the same if applying FedAvg, we use to denote the model size. With respect to the communication cost, we ignore the communication overhead and assume the channel is slow-fading and stable. Since this paper focuses on the design of the incentive mechanism, we consider a frequency-division multiple-access (FDMA) communication scheme. This is also for simplicity and minimum communication interference. Nonetheless, other more sophisticated wireless communication configurations can be adopted with slight modification in the cost function. According to Shannon’s formula, the data owner ’s communication power cost is , where is the channel bandwidth, is the number of data owner ’s requested channels, is the total bandwidth, is the normalized channel power gain, is the channel gain between the data owner and the FL platform (as a base station), and is the one-sided noise power spectral density. The total cost for communication is
(9)  
(10) 
where is the total time for model transmission, is the data owner ’s unit energy cost for communication. The channel conditions of different subcarriers for each data owner can be perfectly estimated. That is, is known by both the data owner and the platform. Adding all costs in (4), (8) and (9) together, the data owner ’s total service cost is
(11) 
Since our proposed auction mechanisms are truthful (to be proved later), the reported bid is equal to the true service cost , i.e., .
Similarly, the FL platform has the computation cost for model averaging and the communication cost for global model transmission defined as follows:
(12) 
(13) 
where and are respectively the unit costs for computation and communication. Hence, we have the platform’s total cost as follows
(14)  
(15) 
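The data owner's communication cost described earlier in this subsection relies on inverting Shannon's formula to obtain the transmit power needed for a required rate. The sketch below illustrates that reasoning for a single flat channel; it is not the paper's exact expression in (9) and (10). The function names, the treatment of all requested channels as one aggregate bandwidth with a common gain, and the airtime model (model size divided by rate, per round) are assumptions.

```python
import math


def required_power(rate_bps, bandwidth_hz, channel_gain, noise_psd):
    """Transmit power (W) needed to reach rate_bps over an AWGN channel.

    Inverts rate = B * log2(1 + P * h / (N0 * B)) for P.
    """
    snr = 2.0 ** (rate_bps / bandwidth_hz) - 1.0
    return snr * noise_psd * bandwidth_hz / channel_gain


def shannon_rate(power_w, bandwidth_hz, channel_gain, noise_psd):
    """Achievable rate (bits/s) for a given transmit power."""
    snr = power_w * channel_gain / (noise_psd * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)


def communication_cost(model_bits, rate_bps, bandwidth_hz, channel_gain,
                       noise_psd, unit_energy_cost, rounds=1):
    """Energy cost of uploading the model once per round (assumed airtime model)."""
    power = required_power(rate_bps, bandwidth_hz, channel_gain, noise_psd)
    airtime = rounds * model_bits / rate_bps  # seconds spent transmitting
    return unit_energy_cost * power * airtime
```

Because the required power grows exponentially in the ratio of rate to bandwidth, requesting more channels lowers the power needed for the same transmission rate, which is one reason the set of requested channels is part of the reported type.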
III-E Social Welfare Optimization and Desired Economic Properties
With the data utility and the service cost introduced in Sections III-B and III-D, we can obtain the utility functions of all entities. The FL platform’s utility is the data utility minus the total cost and the total payments to workers, which is written as
(16) 
The data owner ’s utility is the difference between its payment and service cost , which is expressed as
(17) 
In Section IV, we design the auction mechanism to maximize the social welfare which can be regarded as the FL system efficiency [32] and is defined as the sum of the platform’s utility and the data owners’ utilities. Formally, the social welfare maximization problem is
(18)  
(19)  
(20) 
As we consider the FDMA communication scheme, the constraint in (20) requires that the sets of workers’ allocated channels have no conflict with each other. For an efficient and stable FL market, the following economic properties should be guaranteed.

Truthfulness (incentive compatibility, IC). The data owner has no incentive to report a false type for a higher utility. Formally, with the other data owners’ types fixed, the condition for truthfulness is
where is data owner ’s true type and is a false type.

Individual rationality (IR). No data owner will suffer a deficit from its FL service provision, i.e., the utility of every data owner is nonnegative.

Computational efficiency (CE). The auction algorithm can be completed in polynomial time.
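The first two properties can be stated compactly as follows. The notation is an assumption introduced here for illustration: $t_i$ is data owner $i$'s true type, $t_i'$ is any false type, $t_{-i}$ collects the types reported by the other data owners, $p_i$ is the payment, $c_i$ is the true service cost, and $u_i$ is the resulting utility (payment minus service cost for a selected worker, and zero otherwise).

```latex
\begin{align*}
\text{(IC)} \quad & u_i(t_i, t_{-i}) \;\ge\; u_i(t_i', t_{-i})
  && \text{for every data owner } i \text{ and every false type } t_i', \\
\text{(IR)} \quad & u_i(t_i, t_{-i}) \;=\; p_i - c_i \;\ge\; 0
  && \text{for every selected worker } i.
\end{align*}
```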
IV Reverse Multidimensional Auction Mechanism for Federated Training
In this section, we first design a truthful auction mechanism, called the Reverse Multidimensional Auction (RMA) mechanism, to maximize the social welfare defined in (18). As presented in Algorithm 2, the RMA generally follows a randomized and greedy way to choose the FL workers and decide the payments. It consists of three consecutive phases: dividing (lines 2-9), worker selection (lines 12-20) and service payment determination (lines 21-41).
The RMA first divides the workers into groups, i.e., , according to the EMD metric. Each group consecutively covers an EMD interval . That is, the data owner whose EMD value falls in will be put in the group . Meanwhile, we define a virtual EMD value for the data owner in group by the corresponding interval midpoint, i.e., . For group , the virtual social welfare is calculated by using the virtual EMD value as follows:
(21)  
(22) 
where and . Let denote the set of workers that have channel conflicts with the worker set . We introduce the marginal virtual social welfare density for the worker in group defined as
(23)  
(24) 
For the sake of brevity, we simply call it marginal density.
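The dividing phase described above can be sketched as follows. The sketch assumes that the group intervals are given as an increasing list of boundaries, that each interval is half-open except the last, and that data owners whose EMD falls outside the covered range are excluded; these choices and the function name `group_by_emd` are assumptions, not details confirmed by Algorithm 2.

```python
from bisect import bisect_right


def group_by_emd(emd_values, boundaries):
    """Assign data owners to EMD groups and compute virtual EMD values.

    emd_values: dict mapping owner id -> reported EMD value.
    boundaries: increasing list [e_0, e_1, ..., e_G]; group g covers
                [e_g, e_{g+1}), and the last group also includes e_G.
    Returns (groups, midpoints), where midpoints[g] is the virtual EMD
    value shared by all data owners in group g.
    """
    num_groups = len(boundaries) - 1
    if num_groups < 1:
        raise ValueError("need at least two boundaries")
    groups = [[] for _ in range(num_groups)]
    for owner, value in emd_values.items():
        if value < boundaries[0] or value > boundaries[-1]:
            continue  # outside the accepted EMD range (assumed to be excluded)
        g = min(bisect_right(boundaries, value) - 1, num_groups - 1)
        groups[g].append(owner)
    midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
    return groups, midpoints
```

Replacing each reported EMD value by the midpoint of its interval means that small changes in the reported value leave the data owner's group, and hence its virtual EMD, unchanged.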
We use to denote the set of already selected workers from other groups. In each group , the RMA first excludes the workers that are conflicted with , i.e., . Then, the RMA finds and sorts the data owners which have no channel conflict with each other in by nonincreasing order of the marginal density:
(25) 
where is the set of first sorted data owners and . There are in total data owners in the sorting and the th data owner has the largest marginal density in while having no channel conflict with data owners in . From the sorting, the RMA aims to find the set containing data owners as workers, such that and (lines 12-19).
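The worker selection within one group can be pictured as a greedy loop. The sketch below is a simplified illustration under stated assumptions: the marginal density of (23) is supplied by the caller as a function `marginal_density(candidate, selected)`, conflicts are supplied as a predicate `conflicts(a, b)`, and selection stops as soon as the best remaining marginal density is negative. The function name `greedy_select` is hypothetical.

```python
def greedy_select(candidates, marginal_density, conflicts):
    """Greedily pick conflict-free candidates by largest marginal density.

    candidates:       iterable of candidate identifiers in one group.
    marginal_density: callable (candidate, selected_list) -> float.
    conflicts:        callable (a, b) -> bool, True if a and b share a channel.
    Returns the list of selected workers in selection order.
    """
    selected = []
    pool = list(candidates)
    while pool:
        # Candidate with the largest marginal density given the current set.
        best = max(pool, key=lambda c: marginal_density(c, selected))
        if marginal_density(best, selected) < 0:
            break  # adding any remaining candidate would reduce welfare
        selected.append(best)
        # Remove the winner and every candidate that conflicts with it.
        pool = [c for c in pool if c != best and not conflicts(c, best)]
    return selected
```

Each iteration evaluates the density of every remaining candidate once more than strictly necessary, which keeps the sketch short; a production implementation would cache these values.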
Once the set of workers in group has been determined, the RMA re-executes the worker selection on the set of data owners in group (except the data owner ), i.e., , to calculate the payment for worker (lines 22-34). Similarly, the RMA sorts data owners in as follows:
(26) 
where is the set of the first data owners in the sorting and . From the sorting, we select the first data owners as the workers where the th data owner is (1) the first one that has a nonnegative marginal density and channel conflicts with worker , i.e., and , or (2) the last one that satisfies and . If the data owner is chosen by condition (1), the payment is set to be the bid value such that the worker and the data owner have equal marginal density on , i.e., (lines 31-33). If data owner is chosen by condition (2), is set to be the maximum value such that , or (lines 28-30 and 35-39).
The dividing phase decomposes the original auction mechanism into a set of subauctions. We use to denote the subauction mechanism for group . Since the data owners in each group have the same EMD value and the reported channel information is true, only the bid and the data size in the type can be manipulated. Thus, each subauction can be reduced to a deterministic reverse multi-unit auction where each data owner bids to sell data units. As reflected in the data utility function in (6), the data units here essentially represent the data owner ’s service quality. Here, again, the data owners are single-minded, which means they can only sell the reported amount of data units. The deterministic auction mechanism here means the same input types will deterministically generate the same unique output. As the randomization is applied over a collection of deterministic mechanisms (line 11), the original auction mechanism is a randomized auction mechanism [2]. Our design rationale of each subauction is formally presented in Theorem 1, which adopts the characterizations for the truthful forward multi-unit auction presented in [23, Section 9.5.4].
Theorem 1.
In the reverse multi-unit and single-minded setting, an auction mechanism is truthful if it satisfies the following two properties:

Monotonicity: If a bidder wins with type , then it will also win with any type which offers at most as much price for at least as many items. That is, bidder will still win if the other bidders do not change their types and bidder changes its type to some with and .

Critical payment: The payment of a winning type by bidder is the largest value needed in order to sell items, i.e., the supremum of such that is still a winning type, when the other bidders do not change their types.
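The two properties of Theorem 1 fit together in a simple way: if the allocation rule is monotone in the bid, the set of winning bids of a data owner is an interval, and the critical payment is its upper end. The sketch below illustrates this with a generic bisection search over a caller-supplied predicate `wins(bid)`. It is a numerical illustration of the definition only, not the payment rule of Algorithm 2, which derives the payment from the sorting (26); the function name `critical_payment` is hypothetical.

```python
def critical_payment(wins, low, high, tol=1e-6):
    """Approximate the largest winning bid of a monotone allocation rule.

    wins: callable bid -> bool, with the other bidders' types held fixed.
          Monotonicity is assumed: if a bid wins, every lower bid wins too.
    low:  a bid known to win; high: a bid known to lose.
    Returns a winning bid within tol of the supremum of winning bids.
    """
    if not wins(low):
        raise ValueError("low must be a winning bid")
    if wins(high):
        raise ValueError("high must be a losing bid")
    while high - low > tol:
        mid = (low + high) / 2
        if wins(mid):
            low = mid   # mid still wins, so the threshold is at or above mid
        else:
            high = mid  # mid loses, so the threshold is below mid
    return low
```

Paying the winner this threshold, rather than its own bid, is what removes the incentive to overstate the service cost: the payment does not depend on the winner's bid as long as the bid stays below the threshold.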
We next show the desired properties of the RMA, including the truthfulness (Proposition 1), the individual rationality (Proposition 2) and the computational efficiency (Proposition 3).
Proposition 1.
The RMA mechanism is universally truthful (incentive compatible).
Proof:
We first investigate the truthfulness of the subauction . Since the RMA guarantees that data owners in the same group have the same virtual EMD value and the group selection is random (line 11), data owners have no incentive to report false EMD value. Therefore, we just need to discuss the truthfulness of the reported data size and the bid. According to Theorem 1, it suffices to prove that the worker selection of is monotone, and the payment is the critical value for the data owner to win the auction. Given a fixed EMD value , we construct a function as
(27) 
where and , are parameters. The first derivative and the second derivative of are respectively
(28) 
(29) 
Since and , we can find that and which means is a convex and monotonically decreasing function. Note that expanding is equivalent to increasing the total data size . Substituting and into , we can find which is monotonically decreasing with since and the monotonicity and convexity of . It is also clear that the marginal density defined in (23) is monotonically decreasing with the bid while monotonically increasing with . As the data owner takes the th place in the sorting (25), if it changes the type from to by lowering its bid from to ( ) or raising the reported data size from to ( ), it will have a larger marginal density . Since is a decreasing function of , the data owner ’s marginal density can only increase when it is at a higher rank in the sorting (25), i.e., . Thus, we have proved the monotonicity condition required by Theorem 1.
We next prove that calculated by Algorithm 2 is the critical payment, which means that with fixed, bidding a higher price causes the worker to fail the auction. As mentioned above, the final payment depends on the data owner in the sorting (26). If the th worker has channel conflict with the worker , submitting a higher bid makes worker be ranked after data owner , i.e., , and then worker would be removed from the candidate pool in the subsequent selection. If the data owner has no channel conflict with the data owner , a higher bid still causes and , which apparently cannot lead the data owner to win the auction. Thus, the truthfulness of the subauction is proved. Since each subauction is truthful and the original auction mechanism is a randomization over the collection of the subauctions, we can finally prove that the RMA mechanism is universally truthful [22, Definition 9.38]. ∎
Proposition 2.
The RMA mechanism is individually rational.
Proof:
Let denote the worker ’s replacement in the payment determination process, i.e., the th data owner in the sorting (26). Since the data owner must be after the th place in the sorting (26), or not in the sorting at all, if worker wins the auction, we have . As shown in Algorithm 2, the payment for worker is the maximum winning bid , which means the corresponding marginal density satisfies . Since is monotonically decreasing with the bid (see the proof of Proposition 1), we have , which means the worker ’s utility defined in (17) is nonnegative, i.e., . Therefore, we can guarantee the individual rationality of each sub-auction and of the original RMA mechanism . ∎
Proposition 3.
The RMA mechanism is computationally efficient.
Proof:
For each sub-auction (lines 12-41) in Algorithm 2, finding the workers in group with the maximum marginal density has the time complexity of (line 14). Since the number of workers is at most , the worker selection process (the while-loop, lines 13-20) has the time complexity of . In the payment determination process (lines 21-41), each iteration of the for-loop executes steps similar to the while-loop in lines 13-20, and the payment determination process has the time complexity of . Dominated by the for-loop (lines 21-41), the time complexity of a sub-auction (Algorithm 2) is . Since and , the running time of the original RMA is bounded by polynomial time . ∎
V Deep Reinforcement Learning based Auction Mechanism (DRLA)
Although the RMA mechanism can guarantee the IC, IR and CE, its achieved social welfare is still restricted. The reasons are that the randomization may degrade the social welfare performance and that the channel conflicts among workers are not well represented and exploited. Resolving these issues is very challenging. In this section, we attempt to utilize the powerful artificial intelligence (AI) to establish an automated mechanism for improving the social welfare while ensuring the IC and IR. Specifically, we first use the graph neural network (GNN) [26] to exploit the conflict relationships and generate effective embeddings. Based on the embeddings, we propose a deep reinforcement learning (DRL) framework to design truthful auction mechanisms in order to improve the social welfare.
V-A Feature engineering with embeddings of wireless spectrum conflict graph
Although the bid , data size and EMD in data owner’s type and the channel information are already continuous variables, the information of requested wireless channels is a discrete variable, which prevents directly applying the DRL approach. Therefore, we construct a spectrum conflict graph [34] to represent the channel conflict relationships among the data owners. As illustrated in Fig. 2, each node in the graph is a data owner and each undirected edge represents the conflict between the two connected data owners. Due to differences in aspects such as hardware or wireless channel occupancy, each data owner may have different demands for wireless channels. Taking an example with data owners, the data owners and respectively request channels , and . Since the data owners and are single-minded and both of them request the channel , they conflict in wireless channels and there should be an edge between data owners and . The data owner has no channel conflict with any other worker’s requested channels, so there is no edge connected to data owner . To map the discrete channel information to continuous embeddings, we specifically apply a multilayer Graph Convolutional Network (GCN) [13] in which the th layer output is calculated by
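(30) $H^{(l+1)} = \mathrm{ReLU}\big(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}H^{(l)}W^{(l)}\big)$
where $H^{(0)}$ is an all-ones matrix and $\tilde{A} = A + I$ denotes the adjacency matrix with self-connections. $I$ is an identity matrix, $\tilde{D}$ is the diagonal degree matrix of $\tilde{A}$ and $W^{(l)}$ is the trainable weight matrix of the $l$-th layer. We use the rectified linear units $\mathrm{ReLU}(\cdot) = \max(0, \cdot)$ [21] as an activation function. Then, the embedding of each data owner (node) generated by the GCN can be obtained from the output of the last layer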
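
The graph construction and one propagation step of the standard GCN rule from [13] can be sketched as follows. This is a minimal pure-Python illustration rather than the authors' implementation: the channel requests, the weight matrix and the function names `conflict_adjacency` and `gcn_layer` are hypothetical, and a trained model would learn the weights instead of fixing them.

```python
import math

def conflict_adjacency(requests):
    """Build the adjacency matrix of the spectrum conflict graph.

    requests[i] is the set of channels requested by data owner i;
    two owners are connected iff their requested sets intersect.
    """
    n = len(requests)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if requests[i] & requests[j]:
                adj[i][j] = adj[j][i] = 1.0
    return adj

def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcn_layer(adj, h, w):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-connections, then normalise symmetrically by node degree.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    a_hat = [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
             for i in range(n)]
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]

# Hypothetical example: owners 0 and 1 both request channel 2,
# owner 2 requests a channel nobody else wants.
requests = [{1, 2}, {2, 3}, {4}]
adj = conflict_adjacency(requests)

h0 = [[1.0] for _ in requests]   # all-ones input features (one per node)
w0 = [[1.0, -1.0]]               # illustrative 1x2 weight matrix
embeddings = gcn_layer(adj, h0, w0)
```

Here `adj` has a single edge between owners 0 and 1, and `embeddings` holds one two-dimensional row per data owner. Stacking several such layers gives each node information about conflicts more than one hop away.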