Auction-based Charging Scheduling with Deep Learning Framework for Multi-Drone Networks

01/09/2020 ∙ by MyungJae Shin, et al. ∙ 0

State-of-the-art drone technologies have severe flight time limitations due to weight constraints, which inevitably lead to a relatively small amount of available energy. Therefore, frequent battery replacement or recharging is necessary in applications such as delivery, exploration, or support to the wireless infrastructure. Mobile charging stations (i.e., mobile stations with charging equipment) for outdoor ad-hoc battery charging are one of the feasible solutions to address this issue. However, the ability of these platforms to charge the drones is limited in terms of the number of drones served and the charging time. This paper designs an auction-based mechanism to control the charging schedule in a multi-drone setting. Charging time slots are auctioned, and their assignment is determined by a bidding process. The main challenge in developing this framework is the lack of prior knowledge on the distribution of the number of drones participating in the auction. Building on the optimal second-price auction, the proposed formulation relies on deep learning algorithms to learn such distribution online. Numerical results from extensive simulations show that the proposed deep learning-based approach provides effective battery charging control in multi-drone scenarios.

I Introduction

The possibility of using commercial drones in a broad range of applications is being extensively studied by the research community, and drones are expected to replace manned operations in remote locations [park2017battery]. In general, commercial drones have inherent limitations in the amount of energy available to support their operations. This is due to the energy/weight ratio of current energy storage technologies, where increasing the capacity of the battery beyond a certain point degrades flight time due to excessive weight.

As a consequence, effective battery management is one of the main enablers of practical deployments of drone-based technologies and applications. Importantly, in applications requiring extensive flight time, the energy constraint problem cannot be solved by optimizing power consumption alone. Thus, charging during the completion of long-term tasks has been proposed to extend the operational range of the drones [park2017battery, couture2009adaptive]. Several ways of powering the drones have been proposed in the literature. We can divide them into two main classes: (i) harvesting energy directly from the surrounding environment, and (ii) taking energy from an electrical source such as a charging station [couture2009adaptive]. Within the latter class of approaches, the charging stations can be either stationary or mobile. However, solutions based on stationary charging stations may constrain the geographical area of operations around specific locations. To deal with this issue, mobile charging stations can be used, although they face other challenges [couture2009adaptive]. As they are mobile, these charging stations need to be smaller than fixed stations. As a consequence, the capacity of the system is limited, leading to relatively low charging speeds and a relatively small number of drones that can be charged simultaneously [couture2009adaptive, frankenberger2017mobile].

Motivated by this compelling problem, we consider a scenario where multiple drones compete to access the services provided by a mobile charging station (see Fig. 1). The framework proposed in this paper controls the charging process of the drones, where the charging station takes the role of leader in the distributed drone-charging system, and coordination within the system is supported by Internet-of-Vehicle (IoV) networking functions [wang2018internet, hou2016vehicular, chen2018capacity].

Fig. 1: Multi-drone network model for mobile charging stations.

We take an econometric approach, where the problem of controlling the scheduling is formulated as an auction, whose objective is to maximize the utility of the drones (i.e., the difference between payment and bid during auction computation) as well as the station’s revenue (i.e., the payment received from the drones through charging scheduling). In general auction problems, buyers (the drones in this system) bid to access services periodically auctioned by a seller (the mobile charging station in the considered setting). The value of the bid is individually, and privately, estimated by each drone based on the urgency of its charging needs. The auction approach is especially useful when there is no accurate estimate of a buyer’s true valuation, and buyers are not aware of the private true values of other buyers. In the drone network model considered herein, the drones are assumed to be non-cooperative, that is, they operate independently and in a distributed manner. Furthermore, the mobile charging station is not assumed to know the exact true values associated with each drone, which become available only when the actual values are submitted. The auction approach we take is especially suitable for assigning time slots to drones in this information-limited system. Among the various auction formulations available (e.g., ascending auction, descending auction, first price auction, second price auction), we choose a second price auction formulation, where the highest bidder wins but the price paid is set to the second highest bid. One of the main benefits of the second price auction is that it results in a truthful auction process.

In the considered system model, the mobile station is the auctioneer and owner/seller of the resource (that is, the charging time slot) and each drone is considered as a buyer. The drones compete for battery charging slots by submitting price bids based on their own private valuations. As auctioneer, the mobile station (i) receives all bids from the drones, (ii) calculates the charging time allocation probabilities and payments, (iii) assigns the charging time to the drone (i.e., the winner of the auction) that bids the highest value, corresponding to the largest allocation probability, (iv) announces the value which should be paid by the winner drone, and (v) receives the payment.

During the auction, drones strategically submit bids to increase their profits, i.e., utility. Similarly, the auctioneer, which owns the resource, is not a sacrificial seller, so its revenue must also be considered, i.e., the auction must be profitable for the auctioneer. Therefore, revenue optimality has been considered one of the major objectives in auction design. Although many variants are already available in the auction theory literature, the problem of simultaneously optimizing the auctioneer’s revenue and the buyers’ utility is still open [yong2015double, chang2010auction, wen2015quality, yi2015multi]. Among various auction algorithms, the Myerson auction is one of the most efficient revenue-optimal single-item auctions [myerson1981]. The auction transforms the bid values, and the winner and payment are then determined based on the transformed bids. If the transformation function is monotonic, a revenue-optimal auction can be configured. Therefore, the proposed auction is designed as a revenue-optimal auction based on the concept of the Myerson auction.

However, it is difficult to apply the existing auction as it is in the distributed drone network environment considered in this paper. The charging scheduling system of the drones is still in the early stages of research, and key properties of the system such as the drones’ location distribution and residual energy distribution have not been fully characterized in the literature. Therefore, a system that can extract the desired data (i.e., the distribution of drones) from the actual system without prior knowledge or assumptions is desirable. For this reason, this paper takes advantage of deep learning to learn important features on-the-fly from the operating environment. Recently, frameworks combining game theory and deep learning have been an active subject of research [subba2018game, tian2018application]. Results illustrate applications of such an approach in various domains [cst2018wang, you2017deep, sill1998monotonic]. The key is that deep learning can automatically extract and learn important features from data, and it has been widely demonstrated that neural network structures can approximate complex non-linear functions [csaji2001, dutting2017, luong2017]. In this paper, we use this feature to approximate some key monotonic functions governing the behavior of the system using relatively simple neural networks [sill1998monotonic]. Specifically, we use deep learning to learn the features necessary for the virtual transformation step of Myerson auctions. Then, the proposed auction is configured by using the trained deep learning network in place of the virtual transformation function. We remark that the functions to be learned by the deep learning layer are monotonically non-decreasing [myerson1981].

The proposed deep learning network uses the ReLU (activation function) and softmax (classification function), which are widely used in optimization procedures. In addition, due to the fact that the operations mostly amount to linear multiplications, the proposed approach has low complexity, and its execution takes a limited amount of time.

Contributions. Our proposed auction-based charging scheduling algorithm makes the following contributions. First, the revenue of the auctioneer is taken into account even if the drones submit false bids; the proposed algorithm is therefore both self-configurable and truthful. Second, the proposed auction automatically learns environmental features. In distributed drone scenarios, various time-varying features exist, which makes this self-configurable nature essential to adapt to different scenarios and environments. Third, the proposed deep learning based auction structure is simple to implement and imposes a small computational burden.

Organization. The rest of this paper is organized as follows. Sec. II discusses related work, and then Sec. III describes the auction-based mobile charging model. In Sec. IV, the deep learning based approach is presented. In Sec. V, performance evaluation results are presented. Sec. VI concludes the paper.

II Related Work

Several studies have addressed limited-battery and limited-resource scheduling problems through auctions [park2017battery, couture2009adaptive, wellman2001auction, parkes2001auction]. The method in [park2017battery] aims at optimizing battery assignment and drone scheduling, assuming that the battery can be quickly replaced. The joint assignment and scheduling problem is formulated as a two-stage problem, where the assignment problem is solved by a heuristic and the scheduling problem is formulated as an integer-linear programming (ILP) problem. In contrast, the present paper proposes a scheduling algorithm based on an auction. The proposed method uses information provided by drones capable of communicating with mobile charging stations to overcome the inability of a central service provider to acquire perfect state information in a distributed drone network. Moreover, a solution based on battery assignment necessarily maps to a stationary service station. This imposes some limitations [couture2009adaptive], which are mitigated when using mobile charging stations.

In [couture2009adaptive], a system of mobile robots executing a transportation task supported by a charging station is considered. The location of the charging station is a major factor in determining the operations and performance of the robots, and the paper assumes that the mobile charging station is itself an autonomous robot that attempts to incrementally improve its location. Although this work considers a mobile charging station, the problem of charging scheduling is not considered. In a more general scenario, the resources of the charging station are limited and the number of robots to be charged may be larger than the actual charging capacity of the station. Therefore, the charging system will need to implement forms of prioritization to optimize the charging process. The method proposed herein incorporates a notion of priority using an auction formulation based on the valuation of the drones.

In [wellman2001auction], an auction mechanism is proposed to solve a resource allocation problem in a distributed computing system. The inherently distributed nature of the system makes the problem much harder to solve, and the paper proposes an auction-based solution to address this challenge. The proposed mechanism is configured as a two-auction mechanism, used to compute optimal solutions at the single unit within the distributed scheduling problem in a computationally efficient manner. However, [wellman2001auction] assumes prior knowledge of the environment where the auction mechanism is executed, which may limit its application in real-world distributed scenarios. The method we propose herein uses an auction-based solution to solve the resource allocation problem, and employs deep learning to extract the required features automatically from the environment, so that prior knowledge is not necessary.

The method in [parkes2001auction] addresses a distributed train scheduling problem using an auction method. The determination of the winner is formulated as a mixed-integer problem. The bidding strategy of the buyers is solved via dynamic programming. In that method, the auctioneer computes the set of bids that maximizes revenue. Both the method proposed in [parkes2001auction] and the one proposed in this paper are based on an auction formulation to effectively solve resource scheduling in distributed environments and maximize the revenue of the auctioneer. However, the method in [parkes2001auction] differs from the proposed deep learning based auction in terms of the prior information required to conduct the auction. The deep learning-based auction proposed in this paper only requires limited information, since it can learn environmental characteristics and parameters in real time.

III Charging Scheduling Mechanism Design

Variables Descriptions
The number of drones
Mobile charging station
Bid profiles
-th bid profile
-th user
Maximum battery capacity of
Remaining battery capacity of
Average amperage draw of
Battery discharge of
Charging rate per unit time
Scheduled charging time to
Amount of energy charged of
Flight time with current battery of
The valuation of
The bid of
The transformed bid of
Allocation probability of
The virtual payment of
Actual payment of
The forward transformation function for
Weight of -th group, -th unit for
Weight of -th group, -th unit of
Bias of -th group, -th unit for
Bias of -th group, -th unit of
The utility of
The number of groups in network
The number of units in group
The number of epochs
The number of bid sets
TABLE I: Notations

Drone Network Model. The system is composed of the mobile charging station and the drones (the notation used in this paper is summarized in Table I). The mobile station is governed by the charging service controller, which collects revenue by providing charging services. The revenue of the charging service controller is recorded and will be requested later to be paid to drone operators. This paper assumes that the mobile station can provide charging service to only one drone in each time slot. Thus, drones compete to obtain charging opportunities. Note that we consider a short-range Internet of Vehicles (IoV) multi-drone network supporting short-distance communications among drones based on IEEE 802.11-based wireless local area network (WLAN) technologies. Therefore, the size of the network composed of one single mobile charging station and multiple drones is relatively small, and we assume that the flight time from the drones’ current positions to the mobile charging station is negligible. Thus, unexpected operational problems due to the delay induced by a long flight toward the mobile charging station are not considered in this paper. Furthermore, we note that the specific design, system capabilities, and state of the drones participating in the auction can vary in terms of battery capacity, residual battery, charging rates, and so forth. Formally, each drone is characterized by its battery capacity, average amperage draw, and residual battery charge, which together determine the mission lifetime. The average amperage draw denotes the current required by the drone to operate on-board systems such as motors, embedded computers, sensors, etc. Each drone continuously monitors its own state and requests the scheduling of a charging slot from the mobile station if needed.

The requests from multiple drones to the mobile station for charging services can be interpreted as a distributed competition for a limited resource, which here is modeled and solved using an auction-based approach. In the considered setting, the auctioneer is the mobile station, which is also the owner and provider of the resource, and the drones are the buyers.

The mobile station and drones exchange information, i.e., bids and other auction variables, over wireless links. The mobile station announces the start of the auction to the drones when the charging system is ready to serve (i.e., idle). Upon reception of the announcement, each drone makes its own private, and independent, valuation for the use of the charging system. The private valuation of each drone is used to compete for the charging service. Note that the charging resource is assigned at the granularity of individual time slots, as illustrated in Fig. 2. The mobile station sells the charging service and obtains the revenue paid by the winner drone via the auction.

Fig. 2: Auction procedure.

Drone Scheduling Auction Design. We use second price auction (SPA) as a baseline to design the auction in the considered setting. In SPA, all buyers submit their bids privately. The auctioneer receives the sealed bids and selects as winner the buyer who made the highest bid. The amount paid by the winner is set to be equal to the second highest bid value. Herein, the problem of assigning slots to drones is formulated as a single item auction based on SPA. Therefore, drones compete for one item, i.e., the charging service. Since the proposed approach is based on SPA, it is guaranteed that the charging service will be assigned to the drone with the highest valuation to the service [sujit2007distributed, lemaire2004distributed, bertuccelli2009real]. Myerson presents provable analytical results for single item auctions optimizing the auctioneer revenue where each buyer has its own private valuation of the resource [myerson1981, dutting2017].
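As a minimal illustration of the SPA rule described above, the following Python sketch selects the winner and the payment from a list of sealed bids. The function name `second_price_auction`, the `reserve` argument, the tie-breaking rule, and the sample bids are assumptions introduced here for illustration only; they are not taken from the original formulation.

```python
from typing import List, Tuple


def second_price_auction(bids: List[float], reserve: float = 0.0) -> Tuple[int, float]:
    """Return (winner_index, payment) for a sealed-bid second-price auction.

    winner_index is -1 when there are no bids or no bid reaches the reserve.
    Ties are broken in favor of the lowest index (an illustrative choice).
    """
    if not bids:
        return -1, 0.0
    winner = max(range(len(bids)), key=lambda i: bids[i])
    if bids[winner] < reserve:
        return -1, 0.0
    # The winner pays the highest competing bid, but never less than the reserve.
    others = [b for i, b in enumerate(bids) if i != winner]
    payment = max(others + [reserve])
    return winner, payment


if __name__ == "__main__":
    # Three hypothetical drones bidding for one charging slot.
    winner, payment = second_price_auction([3.0, 7.5, 5.0])
    print(f"winner: drone {winner}, payment: {payment}")
```

With a reserve of 0, as assumed later in the paper, a single bidder wins the slot and pays nothing.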

When the auction-based mechanism is designed, it is important to let the participants act truthfully to ensure system stability [myerson1981, dutting2017, luong2017, jiao2018, yen2018, khan2016]. Previous studies attempted to achieve this objective by enforcing truthfulness to individual participants. The concepts such as incentive compatibility (IC) and individual rationality (IR) are the characteristics of auctions inducing the truthful action of participants. Based on this approach, we use a Myerson auction where the following characteristic is used as the baseline mechanism: The Myerson auction guarantees dominant strategy incentive compatibility (DSIC) and IR.

Definition 1.

(Incentive Compatibility [bartal2003]) Incentive compatibility is defined by the following property: for every bidder $i$, every valuation $v_i$, all declarations of the other bidders $v_{-i}$, and all possible "false declarations" $v'_i$, bidder $i$'s utility when bidding $v'_i$ is no more than its utility when bidding the truth $v_i$. Formally, let $a$ and $p_i$ be the mechanism's output with input $(v_i, v_{-i})$, and let $a'$ and $p'_i$ be the mechanism's output with input $(v'_i, v_{-i})$; then $v_i(a) - p_i \ge v_i(a') - p'_i$.

Thus, IC, which is a weaker notion than DSIC, means that the utility a participant obtains by acting truthfully is no less than the utility it obtains by misreporting, as stated in Definition 1.

Definition 2.

(Dominant Strategy Incentive Compatibility) Dominant strategy incentive compatibility is defined by the following property: for each bidder $i$, and for every possible report of the other bidders' bids $b_{-i}$, bidder $i$ weakly maximizes its utility by reporting its true valuation $v_i$. That is, for all possible reports $b_i$, $u_i(v_i, b_{-i}) \ge u_i(b_i, b_{-i})$.

Thus, DSIC is a stronger notion than IC, meaning that a truthful action is a weakly dominant strategy, that is, the action is guaranteed to be the best regardless of the actions of others, as shown in Definition 2.

Definition 3.

(Individual Rationality) Individual rationality (IR) is defined by the following property: for every bidder $i$ and for every bid profile $b = (b_i, b_{-i})$, we have $p_i(b) \le b_i$, that is, no bidder is ever asked to pay more than its bid valuation.

In a DSIC and IR auction, it is in the best interest of each bidder to report truthfully. Therefore, these characteristics make the overall auction truthful. The Myerson auction guarantees DSIC and IR, thus encouraging the bidders to report truthfully [myerson1981, dutting2017]. Furthermore, the Myerson auction also guarantees auctioneer’s revenue optimality. In the considered drone network model, we remark that the charging service controller obtains revenue by providing charging services. The following subsections describe in detail the components of the Myerson auction mechanism, i.e., private valuation, allocation rule, payment rule, reserve price, utility, and auction design.
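The DSIC and IR properties can be checked numerically on a small example. The sketch below is only an illustration for a plain second-price auction with reserve price 0, not the learned auction of this paper; the function names, the valuation, and the competing bid profiles are hypothetical. It verifies that, for each profile of competing bids, no alternative bid yields a higher utility than the truthful bid (DSIC) and that the truthful utility is never negative (IR).

```python
def utility(value, bid, other_bids):
    """Utility of one bidder in a second-price auction with reserve price 0."""
    highest_other = max(other_bids, default=0.0)
    if bid > highest_other:
        # The bidder wins and pays the highest competing bid.
        return value - highest_other
    # The bidder loses (ties are resolved against it in this sketch).
    return 0.0


def truthful_is_weakly_dominant(value, candidate_bids, other_bid_profiles):
    """Check DSIC and IR on a finite grid of deviations and competing profiles."""
    for others in other_bid_profiles:
        truthful = utility(value, value, others)
        if truthful < 0.0:  # IR violated
            return False
        for bid in candidate_bids:
            if utility(value, bid, others) > truthful:  # profitable deviation
                return False
    return True


if __name__ == "__main__":
    profiles = [[2.0, 4.0], [5.9, 1.0], [6.5, 3.0], [9.0, 9.5]]
    deviations = [0.0, 3.0, 5.0, 6.0, 7.0, 10.0]
    print(truthful_is_weakly_dominant(6.0, deviations, profiles))
```

Overbidding only changes the outcome when it wins a slot at a price above the true valuation, and underbidding only changes the outcome when it loses a slot that would have been profitable, which is why the check succeeds.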

Private Valuation. In the proposed auction, each drone has its own individual private valuation . Each drone has a maximum battery capacity (denoted by ) and a current remaining battery (denoted by ). If the drone is assigned the mobile charging service time slot, the charged energy will be added to the residual energy in its own battery . The amount of energy charged by the mobile charging station can be expressed as where is the charging rate per unit of time in the mobile charging station. The scheduled charging time to via auction is denoted by . denotes the item being sold by auction, i.e., charging time. The higher , the higher the valuation of by drone , and the drone is willing to pay a higher amount for the charging service. The expected drone flight time with current battery status is denoted as , calculated as . If is larger, the drone will give a smaller valuation to . Let denote the private valuation of drone . Then, can be expressed as .

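Allocation Rule. The allocation rule is used to determine the winner drone based on the valuation, i.e., to find which drone should be scheduled for charging. In the Myerson auction, the allocation rule that awards the item to the highest bidder is monotone. Therefore, in the proposed auction model, the allocation rule used to award the charging service to the highest bidder is also monotone, and it can be expressed as follows:

$$g_i(b) = \begin{cases} 1, & \text{if } b_i > \max_{j \neq i} b_j, \\ 0, & \text{otherwise,} \end{cases} \tag{1}$$

where $b = (b_1, \dots, b_N)$ is the bid profile of the $N$ drones and $g_i(b)$ indicates whether the charging slot is awarded to drone $i$.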

Payment Rule. The payment rule is used to determine the payment by the winner drone based on the valuation. In the proposed auction, the payment rule chooses a payment which is not higher than the private valuation, and it can be expressed as follows:

(2)

where , stands for the variable to represent the winning valuation in the auction, and is the winner drone in the auction.

Reserve Price. The proposed auction sets a specific price called a reserve price. The reserve price is the minimum reward the seller accepts [myerson1981]. In this paper, the reserve price is set to 0. In the auction, the auctioneer solicits the private bids from the bidders and computes the allocation rule and payment rule .

Utility. The proposed auction guarantees DSIC and IR; and thus each bidder reports truthfully to maximize its own utility. Note that stands for the variable corresponding to the winning in the auction. If the drone wins in auction, is set to whereas the is set to otherwise. Thus, the utility of drone can be calculated as .

Revenue Optimal Auction Design. We define the virtual valuation and virtual surplus as in Myerson [myerson1981]. The virtual valuation of a buyer in the auction is a function used to calculate the expected revenue of the auctioneer from that buyer. The virtual surplus is the expected revenue excluding the computing cost defined below. In Myerson auctions, each bidder $i$ has its own individual private valuation $v_i$, which is drawn from a strictly increasing cumulative distribution function $F_i$ whose probability density function is denoted as $f_i$ [krishna2009auction, vickrey1961counterspeculation]. The virtual valuation of bidder $i$ with private valuation $v_i$ can be expressed as follows:

$$\phi_i(v_i) = v_i - \frac{1 - F_i(v_i)}{f_i(v_i)} \tag{3}$$
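To make the virtual valuation concrete, the sketch below evaluates Myerson's standard expression $\phi(v) = v - (1 - F(v))/f(v)$ for two textbook distributions. The distributions are assumptions chosen purely for illustration (the paper does not assume any particular distribution), and the function names are hypothetical. For valuations uniform on $[0, 1]$ the expression reduces to $2v - 1$, and for an exponential distribution with rate 1 it reduces to $v - 1$.

```python
import math


def virtual_valuation(v, cdf, pdf):
    """Myerson virtual valuation: v - (1 - F(v)) / f(v)."""
    return v - (1.0 - cdf(v)) / pdf(v)


# Uniform distribution on [0, 1] (illustrative assumption).
def uniform_cdf(v):
    return v


def uniform_pdf(v):
    return 1.0


# Exponential distribution with rate 1 (illustrative assumption).
def exp_cdf(v):
    return 1.0 - math.exp(-v)


def exp_pdf(v):
    return math.exp(-v)


if __name__ == "__main__":
    for v in (0.25, 0.5, 0.75):
        print(v, virtual_valuation(v, uniform_cdf, uniform_pdf))
    print(2.0, virtual_valuation(2.0, exp_cdf, exp_pdf))
```

In the uniform case the virtual valuation crosses zero at $v = 0.5$, which is the reserve price of the revenue-optimal auction for that distribution.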

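There is a cost $c(\cdot)$ in computing the outcome, which must be paid by the auction [hartline2006lectures]. Given the valuations $v = (v_1, \dots, v_N)$, the virtual valuations $\phi_i(v_i)$, and the allocation rule $g_i(v)$, the virtual surplus can be calculated as follows:

$$\sum_{i=1}^{N} \phi_i(v_i)\, g_i(v) - c\big(g(v)\big) \tag{4}$$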

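In the Myerson auction, the expected payment of each bidder is determined by its expected virtual surplus, and it can be computed as follows [myerson1981, hartline2006lectures]:

$$\mathbb{E}_{v}\big[p_i(v)\big] = \mathbb{E}_{v}\big[\phi_i(v_i)\, g_i(v)\big] \tag{5}$$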

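Therefore, if the virtual valuations $\phi_i(v_i)$ are non-decreasing in the valuations $v_i$, the virtual surplus is non-decreasing in the valuations $v_i$. The bid $b_i$ is drawn from the distribution $F_i$ with probability density function $f_i$. With the bids of the other drones fixed, the payment of a DSIC auction satisfies $p_i(b_i) = b_i\, g_i(b_i) - \int_0^{b_i} g_i(z)\, dz$. Then, the expected payment can be computed as follows:

$$\mathbb{E}\big[p_i(b_i)\big] = \int_0^{\infty} p_i(b)\, f_i(b)\, db \tag{6}$$
$$= \int_0^{\infty} b\, g_i(b)\, f_i(b)\, db - \int_0^{\infty} \!\! \int_0^{b} g_i(z)\, f_i(b)\, dz\, db \tag{7}$$
$$= \int_0^{\infty} \left[ b - \frac{1 - F_i(b)}{f_i(b)} \right] g_i(b)\, f_i(b)\, db = \mathbb{E}\big[\phi_i(b_i)\, g_i(b_i)\big] \tag{8}$$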

As a result, the proposed auction approach, which consists of a variant of the Myerson auction, is DSIC, IR, and revenue optimal.
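The relation between expected payment and expected virtual surplus can be checked with a small Monte Carlo experiment. The sketch below is an illustration under strong assumptions that are not part of the paper: two bidders with valuations drawn independently and uniformly from $[0, 1]$, truthful bidding, and zero computing cost. In that textbook case the virtual valuation is $2v - 1$, the Myerson auction is a second-price auction with reserve price 0.5, and both averages should approach $5/12$. All function names are hypothetical.

```python
import random


def phi_uniform(v):
    """Virtual valuation for a valuation uniform on [0, 1]."""
    return 2.0 * v - 1.0


def myerson_two_bidders(v1, v2, reserve=0.5):
    """Return (payment, virtual surplus) for one auction round."""
    high, low = max(v1, v2), min(v1, v2)
    if high < reserve:
        return 0.0, 0.0  # the slot is not sold
    # The winner pays the larger of the losing bid and the reserve price.
    return max(low, reserve), phi_uniform(high)


def estimate(num_samples=100_000, seed=0):
    """Average payment and average virtual surplus over random valuations."""
    rng = random.Random(seed)
    total_payment = 0.0
    total_surplus = 0.0
    for _ in range(num_samples):
        payment, surplus = myerson_two_bidders(rng.random(), rng.random())
        total_payment += payment
        total_surplus += surplus
    return total_payment / num_samples, total_surplus / num_samples


if __name__ == "__main__":
    revenue, surplus = estimate()
    print(f"average payment: {revenue:.4f}, average virtual surplus: {surplus:.4f}")
```

The two averages are computed from the same samples, so their agreement reflects the identity itself rather than a coincidence of the random draws.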

However, Myerson auctions require full knowledge of the valuation distributions according to Eq. (8). In the considered scenario, it is hard to obtain such information a priori, and we propose to use deep learning to estimate the distributions. Previous research has shown that deep learning with a limited structure can approximate specific functions [csaji2001, you2017deep, sill1998monotonic]. Specifically, herein, we use neural networks and unsupervised learning [dutting2017, luong2017] to approximate the virtual valuation function. The strength of deep learning is that the approximated function can be continually updated as inputs are acquired. The use of unsupervised learning makes the learning process possible, as it does not require the true values as input. The resulting auction is not only easily applicable to the distributed multi-drone network problem, but is also capable of adapting to continuously changing environments.

Fig. 3: The proposed deep learning framework (revenue network) for revenue-optimal auction computation.

IV Deep Learning based Auction Design

In this section, a deep learning based method for single item auctions is introduced. The method defines the allocation rule, the payment rule, and the virtual valuation function for maximizing the revenue of the mobile charging station via deep learning. The deep learning model constitutes an auction that guarantees DSIC and IR and enables revenue-optimal computation for the auctioneer [dutting2017, luong2017]. The revenue optimal auction can be configured through a relatively simple deep learning structure, i.e., one composed of max/min operations and a loss function shaping the training process.

Theorem 1.

(Myerson [myerson1981]). There exists a collection of monotonically increasing functions $\phi_i : \mathbb{R}_{\ge 0} \to \mathbb{R}$, referred to as the virtual valuation functions, such that the revenue-optimal mechanism for selling a single item is the DSIC mechanism that assigns the item to the buyer with the highest virtual value, provided that this quantity is positive, and charges the winning bidder the smallest bid that ensures that the bidder is winning.

As mentioned earlier, the proposed deep learning based auction is a variant of Myerson auctions; thus, the bid set is transformed into virtual valuations via the virtual valuation transformation. Specifically, as expressed in Theorem 1, each bid $b_i$ in the bid set is converted to $\bar{b}_i = \phi_i(b_i)$, where $\bar{b}_i$ denotes the transformed bid of drone $i$. In this procedure, the trained deep network (a monotonic network) is used in place of the virtual valuation function $\phi_i$. The network consists of two layers, and is composed of linear computation units and min/max operation units. Based on the transformed bids $\bar{b}_i$, the SPA with reserve price 0 (SPA-0) is performed. The SPA-0 calculates the allocation probability and the payment of the winner drone based on the payment rule and the allocation rule, as follows:

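Theorem 2.

(Myerson [myerson1981]). For any set of strictly monotonically increasing functions $\phi_1, \dots, \phi_N : \mathbb{R}_{\ge 0} \to \mathbb{R}$, an auction defined by the allocation rule $g_i = g_i^0 \circ \phi$ and payment rule $p_i = \phi_i^{-1} \circ p_i^0 \circ \phi$ is DSIC and IR, where $g^0$ and $p^0$ are the allocation and payment rules of SPA-0.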

The transformation network must be monotonically non-decreasing when converting a bid into a transformed bid. Therefore, the proposed deep learning network has a parameter constraint and a specific structure, so that it approximates a monotonic function through the training process. The parameters used for deep learning, i.e., weights and biases, are positive. The structure of the network, shown in Fig. 3, is rather simple. The two-layer network follows the min-max structure of [sill1998monotonic, dutting2017].

The allocation rule consists of the softmax operation, which has been used in deep learning based multi-class classification. The payment rule is composed of a max operation and a ReLU. The ReLU sets any transformed bid that is less than the reserve price of SPA-0 to 0. The max unit selects, for each drone, the highest transformed bid among those of the other drones. The combined output of the ReLU and max units is the virtual payment, that is, the value, before conversion to the actual payment, that should be paid by the winner drone. Note that the virtual payment can be larger than the bid of the winner. Therefore, in an IR auction, the virtual payment cannot be used as the payment. Thus, the virtual payment is converted to the actual payment via the inverse transformation function. This process makes the result of the deep learning based auction IR when the revenue optimal auction is designed as shown in Fig. 3. The inverse transformation and the related computations are described as follows.

(9)
(10)

The two-layer network constitutes the virtual valuation function of the Myerson auction, as shown in Fig. 3. In the inverse transformation, it is important to reuse the weights of the forward transformation network, as presented in (9) and (10). This forces the inverse transformation to be the exact inverse of the forward transformation, as in the case in which the Myerson virtual valuation function is based on full knowledge of the distribution. The result of the inverse transformation is the payment which should be paid by the winner drone. Therefore, the result is at least the second highest bid and at most the winning bid [myerson1981].
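The SPA-0 units described above can be sketched in a few lines of Python operating on already-transformed bids. This is a minimal sketch under stated assumptions, not the paper's implementation: the softmax sharpness parameter `kappa`, the extra zero input that represents the "no sale" outcome (a construction used in MyersonNet-style networks such as [dutting2017]), and the sample values are all introduced here for illustration. The allocation unit returns one probability per drone, and the payment unit returns, for each drone, the ReLU of the highest transformed bid among the other drones.

```python
import math


def softmax_allocation(transformed_bids, kappa=10.0):
    """Soft allocation probabilities; a dummy input of 0 models 'no sale'."""
    scores = [kappa * b for b in transformed_bids] + [0.0]
    top = max(scores)
    # Subtract the maximum score before exponentiating for numerical stability.
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps[:-1]]


def relu(x):
    return max(0.0, x)


def spa0_payments(transformed_bids):
    """Virtual payment of each drone: ReLU of the best competing transformed bid."""
    payments = []
    for i in range(len(transformed_bids)):
        others = [b for j, b in enumerate(transformed_bids) if j != i]
        payments.append(relu(max(others, default=0.0)))
    return payments


if __name__ == "__main__":
    transformed = [1.2, -0.3, 0.8]  # hypothetical transformed bids
    print(softmax_allocation(transformed))
    print(spa0_payments(transformed))
```

The virtual payments produced here would still have to be mapped back through the inverse transformation to obtain actual payments. A larger `kappa` makes the soft allocation approach the hard winner-takes-all rule, at the cost of less informative gradients during training.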

Additional networks are required to implement the rules in overall auction processes. The deep learning networks used in the proposed auction consist of three modular networks as follows: (i) a network that can replace the virtual valuation function of the Myerson auction, (ii) a network for the allocation rule , and (iii) a network for the payment rule . The above networks are optimized according to a loss function via a training process. The loss function is essential to enable the deep learning computation of the same structure to have different characteristics [arjovsky2017wasserstein, mao2017least], and plays an important role in deep learning.

Loss Function. In this paper, the negative expected virtual surplus is used as the loss function, where the virtual surplus is equivalent to the revenue of the mobile charging station, that is, the auctioneer and seller. The loss function is used to train the deep neural network parameters (weights and biases). The deep neural network that configures a revenue-optimal auction is composed of weights (denoted as $w$) and biases (denoted as $\beta$), and it replaces the virtual valuation function of Myerson. Here, the deep neural network model automatically learns the distribution, and fits its parameters to the actual distribution of the data during the training process. The trained neural network approximates the virtual valuation function that is based on full knowledge of the distribution. The parameters $w$ and $\beta$ of the deep neural network are trained through unsupervised learning without ground truth information, i.e., the winner (which drone will be scheduled for charging) and the payment (how much the winner drone will pay). Therefore, the results of the allocation and payment rules are used for training the parameters $w$ and $\beta$. Denoting by $g_i^{(w,\beta)}(b)$ and $p_i^{(w,\beta)}(b)$ the allocation probability and payment of drone $i$ produced by the networks for the bid set $b$, the loss function is:

$$\hat{R}(w, \beta) = -\sum_{i=1}^{N} g_i^{(w,\beta)}(b)\, p_i^{(w,\beta)}(b) \tag{11}$$

where the loss function (11) is the negative expected revenue of the auctioneer. Because the loss function is minimized during the training procedure, minimizing (11) maximizes the expected revenue of the auctioneer. The benefit of the deep learning-based auction is seen in the training process: the proposed networks, which replace the virtual valuation function as well as the auction rules, are optimized toward a revenue-optimal auction that satisfies DSIC and IR.
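The idea of a negative expected revenue loss can be illustrated with a minimal sketch. It assumes the loss is the batch average of the allocation-weighted payments with a minus sign; the function name and the nested-list layout are hypothetical and are not taken from the paper.

```python
def negative_expected_revenue(alloc_probs, payments):
    """Negative mean revenue over a batch of bid sets.

    alloc_probs[l][i]: allocation probability of drone i in bid set l.
    payments[l][i]:    payment charged to drone i if it wins in bid set l.
    Both layouts are assumptions made for this illustration.
    """
    total = 0.0
    for probs, pays in zip(alloc_probs, payments):
        # Expected revenue of one auction: sum of probability * payment.
        total += sum(g * p for g, p in zip(probs, pays))
    # Minimizing the negative mean maximizes the expected revenue.
    return -total / len(alloc_probs)


loss = negative_expected_revenue(
    [[1.0, 0.0], [0.5, 0.5]],
    [[2.0, 0.0], [4.0, 2.0]],
)
```

Only allocation probabilities and payments enter the computation, which is consistent with the statement that no ground truth labels are needed.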

Deep Learning Training. The detailed training process of the three networks is summarized in Algorithm 1, where, based on the bid , the payment and allocation probabilities are calculated (line ). is the loss function that guides the deep learning network training. The negative expected revenue is used as the loss function, and it is calculated from the allocation probability and the payment (line ). The is the regularization factor, which is used to regularize the deep learning parameters (weights and biases) (line ). The regularization prevents the parameters from becoming excessively large. The training process is based on unsupervised learning; thus, the allocation probability and payment are the only required information. This means that environmental information such as the distribution of private valuations is not required. As a result, the proposed deep learning network can be easily applied to mobile charging stations. The parameters are determined by means of empirical experiments, as the payments of winners are sensitive to the weight range (line ).

IV-A Deep Learning Networks

In this paper, the virtual valuation function is replaced by the two-layer network , composed of monotonic networks.

Monotonic Network. As shown in Fig 3, the monotonic network is a three-layer deep neural network. The input layer is configured with multiple groups composed of sets of linear units. The maximum value of each group is calculated in the second layer. The last layer selects the minimum value of the given output of the second layer. As the name suggests, the monotonic network is monotonic, and this characteristic is preserved regardless of the number of groups, units, and the order of min/max operations.
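The max-then-min structure described above can be sketched for a scalar bid as follows. This is a minimal illustration, not the authors' implementation: it assumes that each linear unit has a positive slope, obtained here by exponentiating a raw weight, which is one common way to make such a network non-decreasing. The function and parameter names are hypothetical.

```python
import math


def monotonic_network(x, weights, biases):
    """Min-max network: linear units -> max per group -> min over groups.

    weights[g][u] and biases[g][u] are the raw parameters of unit u in
    group g. exp(weight) keeps every slope positive (an assumption made
    for this sketch), so each unit is increasing in x.
    """
    group_max = [
        max(math.exp(w) * x + b for w, b in zip(ws, bs))
        for ws, bs in zip(weights, biases)
    ]
    return min(group_max)


# Two groups with two units each (arbitrary illustrative parameters).
W = [[0.0, -1.0], [0.5, 0.2]]
B = [[0.0, 1.0], [-1.0, 0.3]]
outputs = [monotonic_network(x / 10.0, W, B) for x in range(100)]
```

Since the maximum and the minimum of increasing functions are themselves increasing, the order of the inputs is preserved for any number of groups and units.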

1.0 Input :  where each input set
Output :  Optimized weights and
Initialize :  The network weights and using Xavier initialization
1 while epoch r:  do
2       while   do
3             Forward:
4             ;
5             ;
6             ;
7             ;
8             = ;
9             ;
10             Compute the expected negative revenue ;
11             Compute weight loss ;
12             Compute =
13             Optimize:
14             Update and for minimizing ;
15             Clip ;
16             Clip ;
17            
18       end while
19      
20 end while
Algorithm 1 Deep Learning Training

Virtual Valuation Network. The virtual valuation function in the Myerson auction is replaced with the monotonic network. The computation of and is implemented as follows.

(12)
(13)

The bid of the drone is transformed to via the virtual valuation network . In the , all outcomes of are calculated on the same weights, whereas the calculates the outcome using different weights for each bid. The inverse computation of is denoted by . The , which determines the payment of the winner drone, is composed of two networks. In the computation of , the weights of are used. Thus, the computations of the two layers can be expressed as follows:

(14)
(15)

The payment of the drone is transformed to via the . The consists of and . The same weights are used to calculate all outcomes of . The outcome of is calculated based on different weights for each bid, as shown in (15).
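The reuse of weights between a monotonic network and its inverse can be illustrated with a short sketch. It assumes a scalar min-max network with positive slopes (via exponentiated raw weights); under that assumption the inverse is obtained in closed form by inverting each linear unit and swapping the min and max operations. All names are hypothetical.

```python
import math


def forward(x, weights, biases):
    """min over groups of max over units of exp(w) * x + b."""
    return min(
        max(math.exp(w) * x + b for w, b in zip(ws, bs))
        for ws, bs in zip(weights, biases)
    )


def inverse(y, weights, biases):
    """max over groups of min over units of (y - b) / exp(w).

    Uses the same weights and biases as forward(); no extra parameters.
    """
    return max(
        min((y - b) * math.exp(-w) for w, b in zip(ws, bs))
        for ws, bs in zip(weights, biases)
    )


W = [[0.0, -1.0], [0.5, 0.2]]
B = [[0.0, 1.0], [-1.0, 0.3]]
recovered = inverse(forward(2.0, W, B), W, B)  # recovers the input bid
```

Sharing the parameters in this way means that a value in the transformed space can be mapped back to the bid space without training a second set of weights.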

The monotonic network is responsible for the transformation of the virtual bid in the auction. As mentioned above, the optimal revenue is equivalent to the optimal virtual surplus. Thus, the monotonic network is a major component of the auction. However, in order to configure the revenue-optimal auction, additional networks for the allocation and payment rules are required. In this paper, the payment and allocation rules are configured with ReLU and softmax, which are widely used in deep learning as activation functions. This makes backpropagation easy during the training process.

Allocation Rule Network (). This section describes in more detail the structure of the allocation rule (). In this paper, since the allocation rule is implemented using a deep neural network, the probability is calculated using softmax, which converts the input vector into a probability vector. The allocation rule awards the charging service to the highest bidder drone; thus, the highest probability is assigned to the highest bidder. The continuous function (2) of the traditional auction is approximated using the deep network, which converts the input vector to the probability vector. In the SPA auction with reserve price 0 (SPA-0), the allocation rule assigns the highest winning probability to the highest bidder whose transformed bid is greater than , . The softmax based assignment can be calculated as follows:

(16)

The parameter is a constant value that determines the quality of the approximation. As the increases, the quality of the approximation increases, whereas the smoothness of the allocation network decreases. Put simply, a higher makes a larger difference between the allocation probabilities of users [dutting2017]. When the networks are trained to minimize (11), the value of increases. As a result, since the profit of the auctioneer is related to the second highest , it is also a function of the parameter . Results in Sec. V show how larger values of lead to higher profits.
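The trade-off between approximation quality and smoothness can be seen in a small sketch of a scaled softmax. The parameter name `sharpness` is a placeholder for the constant discussed above, and the extra zero entry, which stands in for the reserve price of SPA-0, is an assumption of this sketch rather than something confirmed by the text.

```python
import math


def softmax_allocation(transformed_bids, sharpness):
    """Allocation probabilities from transformed bids.

    A dummy score of 0 is appended (assumption) so that probability mass
    moves away from all drones when every transformed bid is negative.
    """
    scores = [sharpness * b for b in transformed_bids] + [0.0]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps[:-1]]  # drop the dummy entry


smooth = softmax_allocation([1.0, 2.0, 3.0], sharpness=1.0)
sharp = softmax_allocation([1.0, 2.0, 3.0], sharpness=5.0)
```

With the larger constant, nearly all of the probability goes to the highest transformed bid, which is closer to a winner-takes-all allocation but gives a less smooth function to differentiate.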

Payment Rule Network (). This section describes the structure of the payment rule (). The ReLU is widely used in deep learning computation as an activation function. In the proposed auction, the payment of drone is calculated from the transformed bid . Before the computation of , the deep network excludes any bid below the reserve price 0 via ReLU . The input is the second highest transformed bid, which is the output of . The payment rule network can then be calculated as:

(17)

and the result is used as an input of (9), i.e., the actual payment of winner drone.
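The ReLU step of the payment rule can be sketched as follows, assuming that the quantity computed for each drone is the highest transformed bid among the other drones, floored at the reserve price 0. The function names are hypothetical, and the final mapping back to an actual payment through the inverse network is not shown.

```python
def relu(x):
    """Rectified linear unit: values below 0 are replaced by 0."""
    return max(0.0, x)


def transformed_payments(transformed_bids):
    """For each drone, ReLU of the highest transformed bid of the others."""
    result = []
    for i in range(len(transformed_bids)):
        others = transformed_bids[:i] + transformed_bids[i + 1:]
        # With no competitor, only the reserve price 0 remains.
        result.append(relu(max(others)) if others else 0.0)
    return result


values = transformed_payments([3.0, 1.0, -2.0])
```

For the drone with the highest transformed bid, the value computed this way is the second highest transformed bid, which matches the second-price structure of the auction.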

1.00 Input : , , Bid sets
Output : allocation probability set ,
payment set
1 while Mobile charging system is  do
2       Drones: charging scheduling valuation ;
3       Drones: submit bid ;
4       ;
5       ;
6       ;
7       ;
8       = ;
9       ;
10       Calculate winner and payment ;
11       Winner Drone: Pay payment;
12       Allocate charging system to the winner;
13      
14 end while
Algorithm 2 Deep Learning-Based Algorithm for the Auction Controlling the Charging Scheduling

IV-B Overall Auction Mechanism

The overall deep learning-based auction mechanism is summarized in Algorithm 2. If the mobile charging system becomes idle, the auction is initiated (line ). The valuation for the charging time is computed by each drone based on its own private criteria. Then, based on the individual private valuation, each drone submits its bid (line ). The mobile charging station runs the auction using the pre-trained networks. If , then all the drones assign a low valuation to the charging time and the mobile charging system does not allocate the charging time to users. If there exist bids which are larger than the reserve price 0, the corresponding allocation probabilities and payments are calculated using the proposed deep learning networks, i.e., the virtual valuation network, allocation network, and payment network (line ). Because the proposed deep learning auction is a variant of SPA-0, any bid below the reserve price 0 is converted to 0 (line ). As shown in line , the mobile charging station assigns the payment of to the drone with the highest . Finally, the mobile charging station allocates the charging time to the winner drone (line ). The drone then reaches the charging station and occupies it for the duration of the slot. After the winner drone leaves the charging station, the next iteration starts if the mobile station is idle.
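For comparison with the learned mechanism, the SPA-0 baseline that the proposed auction builds on can be written in a few lines. This is a sketch of a standard second-price auction with reserve price 0, not of the trained networks; the function name and the tie-breaking rule (lowest index wins) are assumptions.

```python
def spa0(bids):
    """Second-price auction with reserve price 0.

    Returns (winner_index, payment); winner_index is None when no bid
    exceeds the reserve price, in which case the slot is not allocated.
    """
    if not bids or max(bids) <= 0.0:
        return None, 0.0
    winner = max(range(len(bids)), key=lambda i: bids[i])
    others = [b for i, b in enumerate(bids) if i != winner]
    # The winner pays the second highest bid, but never less than 0.
    payment = max([0.0] + others)
    return winner, payment


winner, payment = spa0([3.0, 7.0, 5.0])
```

In this baseline the payment does not depend on the winner's own bid, which is the property that makes truthful bidding a dominant strategy in a second-price auction.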

(a) Revenue statistics, 5 drones
(b) Revenue statistics, 10 drones
(c) Revenue statistics, 15 drones
(d) Revenue statistics, 5/10/15 drones,
Fig. 4: Revenue changes by and .
SPA
5 drones 4.7532 7.0001 7.0121 7.1009
10 drones 5.8493 7.0345 7.1408 7.2235
15 drones 7.4829 8.0912 8.6038 8.6471
TABLE II: Revenue changes by , (in Fig. 4(a)-4(c))
Variables | Descriptions
The number of drones | 5, 10, 15
Learning rate | 0.0001
regularization parameter | 0.001
Training set size | 100000 bid sets
Simulation epoch | 100
Approximate quality | 1, 3, 5
Distribution of | U[1:5], U[5:10], U[1:10]
Weight range B | 0.0001
TABLE III: Parameters

V Performance Evaluation

Software Prototype. First, we describe the software developed to test the auction mechanism. The Xavier initializer was used for weight initialization, and the biases were initialized to 0. As mentioned earlier, regularization is used to prevent excessive parameter growth during training and to reduce overfitting. The regularization factor was set to . During the training phase, the Adam [kingma2014adam] optimizer was used to update the parameters iteratively. This choice is motivated by the need to keep a separate learning rate for each weight. An exponentially decaying average of previous gradients was used for iteration-based optimization. In the experiments, different uniform distributions were used for data generation, as shown in Table III. Data-intensive evaluation was conducted with generated data sets. Among the data sets, % of sets were used for training, and the remaining % were used for testing. The proposed deep learning-based auction mechanism was implemented in Python/TensorFlow [tensorflow] and Keras [keras]. A multi-GPU platform (equipped with 2 NVIDIA Titan XP GPUs using 1405 MHz main clock and 12 GB memory) was used for training and testing.

Experimental Setting. The test environment includes 5, 10, or 15 drones. During performance evaluation, the parameter is determined to control the quality of approximation. First, we compare the proposed model with SPA-0 with a priori knowledge to demonstrate revenue-optimality. Results show the ability of the proposed deep learning-based approach to adapt to different scenarios. The valuation results of drones are generated based on various distributions as defined in Table III. Table III summarizes the used parameters.

(a) Revenue statistics of mobile charging station.
(b) Revenue of mobile charging station.
Fig. 5: Revenue analysis.

Revenue Analysis - Parameter . The proposed framework is based on the Myerson optimal auction, which produces an increased revenue for the mobile charging station compared to SPA-0 auctions. The experiments shown in Fig. 4 confirm this effect, and illustrate the effect of the parameter . Fig. 4(a)-4(c) show a comparison between the revenue of the mobile charging station (the auctioneer) as a function of the parameter defined in (16). In the experiments, the bid set is uniformly generated in the range of . The bid is calculated based on the private valuation as discussed in Sec. III. The values of system parameters, such as battery consumption rate and weight, are also assumed to be uniformly distributed. The results in Fig. 4(a)-4(c) show that the revenue increases as the increases. The numerical results are presented in Table II. The revenue gap between the SPA-0 and the proposed auction when the number of drones is is near when , near when and about when . The mobile charging station obtains the highest revenue in the case where . This result shows that the revenue of the mobile charging station increases in the order of , , and . The parameter determines not only the approximation quality of the softmax function but also the revenue of the charging station. In Fig. 4, the number of drones which participate in the proposed charging scheduling auction is varied. In general, the more drones participate in the auction, the more likely it is that a high bid is submitted; thus, the second highest bid value of the auction tends to increase as the number of drones increases. Note that the revenue of the auctioneer is equivalent to the payment of the user. Therefore, the payment of the winner drone increases. As a result, the revenue of the mobile charging station becomes larger. In the SPA-0, the revenue increases from to when the number of drones increases from to .
Similarly, the revenue of the proposed charging scheduling auction increases to when , when and when . This tendency is maintained when the number of drones increases from to , as shown in Fig. 4(a)-4(c). Fig. 4(d) and Table II show the revenue of model when the number of drones increases from to . In this evaluation, the training of the deep learning network uses the pre-trained weights. Based on this experimental result, we can confirm that the proposed deep learning auction provides higher revenue to the mobile charging system when the number of drones increases. In Fig. 4, the horizontal axis represents the training iterations of the proposed deep learning network. The convergence of the deep learning networks within a small number of iterations shows high adaptability to specific applications. Fig. 4(a)-4(c) show that the proposed deep learning-based auction can achieve stability in approximately 300 iterations. Fig. 4(d) shows that stability can be achieved much faster when the pre-trained network is used. These results mean that the proposed deep learning-based auction has high adaptability; thus, it can be applied to various environments with partial knowledge of the valuation distribution, as presented in Sec. III. In Fig. 4, the results show that the proposed auction guarantees increased revenue for the mobile charging station over SPA-0 and is highly adaptive under partial knowledge of the distribution (i.e., without full knowledge of the distribution).

SPA
Mean 8.0536 8.0548 8.6032
Top 25 percentile 6.6003 6.6081 7.1733
Top 75 percentile 9.0358 9.0372 9.3873
TABLE IV: Revenue statistics (in Fig. 5(a))
Case
SPA 7.5585 7.7175 6.1124 6.4550 5.7769
7.7392 7.8891 6.2405 6.5808 5.9459
7.9419 9.5227 8.8311 9.5005 9.9011
TABLE V: Revenue of mobile charging station (in Fig. 5(b))
(a) Payment changes due to false bidding.
(b) Increase of payment against SPA due to false bidding.
Fig. 6: Payment comparison among drones.

Statistical Analysis (Parameter ). In this section, we show the case where a penalty is given to a participant who submits a false bid. We confirm that the proposed method imposes a penalty on the false bidder. In addition, the experimental results show how the penalty varies depending on values. In Fig. 4, the effect of parameter can be observed while the number of drones varies. Fig. 5 shows the statistical analysis of revenue values for different values in (16) and SPA-0 when the number of drones does not vary. The experimental results compare the average revenue, maximum revenue, minimum revenue, top 25 percentile, and top 75 percentile. The evaluation uses the deep learning networks when the values are and . As increases, the gap between the average revenue of model and the average revenue of SPA-0 gets larger, as shown in Fig. 5(a) and Table IV. When the proposed model is , the revenue average is , similar to the revenue average of SPA-0. However, when the value of model is , the revenue average is ; thus, the model gets near % higher revenue average than SPA-0. When , the gap between the proposed model and SPA-0 is near in terms of top percentile whereas the gap is about , when . In addition, in terms of top percentile, the revenue of model is about larger than model and SPA-0. This result shows that the proposed model with large takes higher revenue. Therefore, we can confirm that the revenue of the mobile charging station declines in the order of , , and SPA-0. In Fig. 5(b), the graph shows the results of validation experiments, i.e., cases are considered in the validation experiments. The number on the -axis in Fig. 5(b) represents the indices of individual cases. The result stands for the revenue of the mobile charging station via the deep learning auction. We can confirm that the revenue with is always smaller than the one with .
The gap between the and models is about in Case , which is the minimum, whereas the maximum gap is about in Case , as shown in Table V. The revenue with is larger than, but similar to, that of SPA-0. The gap between the and SPA-0 is in Case , which is the minimum. The maximum gap is about in Case . These experiments also show that the gaps between the results of the two models with / and the results of SPA-0 are not always the same. For example, the gap between and models is near in Case , but near in Case . This is due to the fact that the transformation depends on the weights of ; thus, the transformation via is not applied equally to the same bids. This means that if and also , these two bids can be transformed differently. Therefore, the payment is not always equivalent. This means that the proposed deep learning auction adapts to the bid distribution at the time of the auction procedure, giving the mobile charging station high revenue. It can be seen that higher revenue is guaranteed by increasing the value of parameter .

False Rate
SPA (6a) 8.6177 8.6177 8.6177 8.6177
(6a) 11.2161 17.8814 24.3196 32.4915
(6a) 11.3513 18.5424 25.1623 33.8989
(6b) 130.15% 207.49% 282.20% 377.03%
(6b) 131.72% 215.16% 291.98% 393.36%
TABLE VI: Payment of drone (in Fig. 6(a))

The proposed deep learning-based auction algorithm has a strength in terms of penalizing false bidders. In Fig. 6, the experimental results present the payment of a drone when the drone submits a false bid (i.e., a fake bid). This experiment is conducted with the models of and . The experiment assumes that the number of drones which participate in the auction is . The true valuation of the drone which submits the false bid is set to . The bid values of the other drones are generated from a uniform distribution. This experiment uses the scenario where drones exist, one with a fake bid and the other four with truthful bids. In Fig. 6(a), the second highest bid is set to as shown in Table VI. This result shows that a drone cannot win the auction when it bids up to times larger than the true valuation . As a result, the drone with the false bid is defeated in the auction. On the other hand, when a drone submits a bid times larger than the true valuation, the fake bid wins the auction. However, the fake bid increases the payment of the fake-bid drone. For example, if the drone falsely submits a bid near in the SPA-based auction, the drone pays only about . However, the payment is in the proposed auction with . This means that a drone which submits a false bid to obtain charging increases its own payment. In addition, the payment increases when the value of models increases. Table VI also shows the payment increment while increases. The increased payment of the proposed model is at least % greater than that of the auction using SPA-0 as shown in Table VI. When the drone submits its true valuation, the payment is just about % higher than in the SPA-0 auction. However, if the bid is times larger than the true valuation, the payment is about % larger than in the SPA-0 auction. This experiment shows that if a drone submits a false bid to win the auction, the drone incurs a loss in terms of the payment; thus, the loss discourages the drone from fake bidding.

(a) Transfer learning by changing bid distribution,
(b) Number of discharging drones during the auction,
Fig. 7: Performance improvements under various bid distributions.

Fig. 7(a) shows that the proposed deep learning-based auction can be trained through transfer learning when the distribution of bid values varies. The dotted lines are the revenue when SPA-0 is executed. The red and blue lines stand for the revenue when the proposed algorithm is used. The revenue is higher than SPA-0 as shown in previous experiments. The proposed model is stabilized after 400 training iterations if training starts from the initialized model. If the training starts from the trained model (i.e., transfer learning), the proposed model is stabilized after approximately 100 training iterations when the bid distribution varies. This experiment shows that the proposed model adapts to changes in the bid distribution and can provide reasonable results under various distributions. In Fig. 7(b), we consider the flight energy consumption of drones as follows [pugliese2016modelling, zorbas2016optimal].

(18)

where is a motor speed multiplier, is the minimum power needed to hover just over the ground (when altitude is almost zero), means the height at time step , is the speed of the drone, and is the maximum power of the motor. Therefore, the term refers to the power consumption needed to lift to height with speed [pugliese2016modelling, zorbas2016optimal]. In this experiment, we set the values of to , to , to , and to . Fig. 7(b) shows the number of drones whose batteries are discharged. In this experiment, we assume that the charging service fully charges the battery of the drone which wins the auction. The charging service is only for time slot. Therefore, the drones consume per time step (1 hour) and recharge when recharged. In this experiment, we assume that drones exist and that they constantly want to join the charging service scheduling. In Fig. 7(b), when the proposed deep learning auction is used to schedule charging services, of the drones can be charged without being discharged. This experiment shows that the proposed method can increase the drone flight time in multi-drone networks.

(a) bid change,
(b) bid change,
(c) bid change,
Fig. 8: changes according to drone’s valuation.
(a) bid change,
(b) bid change,
(c) bid change,
Fig. 9: changes according to drone’s valuation.

Deep Learning Model Characteristics. These experimental results explain why the monotonic network is considered as the baseline structure of the proposed deep learning auction architecture. In the proposed auction, when bids are transformed to virtual values through the monotonic network, the order of bids must be maintained. For example, if the bid of user is larger than the bid of user , converted must be larger than . The corresponding experimental results show that the monotonic network performs a transformation that maintains the order. Fig. 8 shows the transformed value which is the result of the virtual valuation function when the bid of the winner drone and the second highest bid increase from to . The fixed weights of the were used when the winning bid and the second highest bid were transformed because the weights are continuously updated during the deep learning training process. For this experiment, the networks are trained when the winning bid is , the second highest bid is , and other bids are generated from the uniform distribution. In Fig. 8(a), the transformed value of the second highest bid is higher than the one of the winning bid. This tendency can also be seen in Fig. 8(b) and Fig. 8(c). By comparing Fig. 8(a) and Fig. 8(b), it can be observed that the transformed bid decreases as the number of drones increases. However, through Fig. 8(b) and Fig. 8(c), we can confirm that the transformed value is independent of the number of drones. Instead of the number of drones, the weights are affected by the allocation probability and payment of other bids because the loss function is configured based on the allocation rule and payment rule. Fig. 8 also shows that the network conducts a non-decreasing monotonic transformation. Therefore, the network is able to replace the function in the Myerson auction because the network performs a monotonic transformation.

In Fig. 9, the changes of are presented when the bid of the winner drone and the second highest bid increase from to . The fixed weights of the were also used. The deep learning network is trained when the winning bid is , the second highest bid is , and other bids are generated from the uniform distribution, similar to the evaluation for Fig. 8. It can be seen that the result of is larger than the one of the shown in Fig. 8 because the computation of is conducted based on the result of the , and the also performs a non-decreasing monotonic transformation. Through comparison of Fig. 9(a) and Fig. 9(b), it can be seen that the result of decreases when the number of drones increases, as in Fig. 8. However, in Fig. 9(c), it can be observed that the result of with drones becomes larger than the one of with drones. Therefore, this experiment shows that the weights are affected by the allocation probability and payment and that they are independent of the number of drones. The transformed bid of the second highest bid is larger than the one of the winning bid. The tendency shown in Fig. 8 is maintained. The network has independent weights per input data. That is, if there are inputs, there are networks, and all learning networks have different weights. However, network is just one network regardless of the number of inputs. If there are inputs, only one network exists. Through Fig. 8 and Fig. 9, it can be observed that the non-decreasing monotonic feature of networks and is learned through the limited network structure and the loss function regardless of whether the weights are shared or not.

VI Concluding Remarks and Future Work

The proposed deep learning-based auction is revenue-optimal for mobile charging scheduling in distributed multi-drone networks. In this paper, the mobile charging scheduling problem is interpreted as an auction problem where each drone bids its own valuation and the charging station then schedules drones based on the bids in terms of revenue-optimality. Through the proposed deep learning-based solution approach, the charging auction enables efficient scheduling by automatically learning the knowledge (i.e., the bid distribution) that is required in conventional auction mechanisms. Therefore, environmental information is no longer required in the auction computation. This makes effective troubleshooting possible in distributed multi-drone networks. The proposed algorithm only requires the payment and allocation probabilities of the drones. The loss function in the deep learning computation is an important factor that allows the proposed auction to be constructed based on environment-independent information. As verified via software prototype-based performance evaluation, the following facts are observed: (i) guaranteeing optimal revenue in terms of individual rationality and dominant strategy incentive compatibility, (ii) limiting the false bids of drones by increasing the payment of the false-bid drones, and (iii) enabling a revenue-optimal auction to be constructed without complex prior knowledge, i.e., the bid distribution.

As future research directions, advanced auction mechanism designs with multiple mobile charging stations are worth considering. In this case, the problem can be formulated as a multi-item auction, and the corresponding mathematical formulation, verification, and analysis are then desired. Furthermore, the proposed deep learning-based auction mechanisms can be advantageous in various applications. For example, visual attention is worth considering because it can be reformulated as resource allocation [zhang2016detection, zhang2017co, han2006unsupervised, han2015background].

References