I Introduction
The internet of things (IoT) has emerged as a huge network that extends connectivity beyond standard devices to a wide range of traditionally non-internet-enabled devices. For instance, everyday objects such as vehicles, home appliances and street lamps will join the network for data exchange. This extension will greatly increase the required coverage range and data volume, far beyond the capability of existing networks. For online data processing with delay requirements, conventional cloud computing will face huge challenges. To collect and process widely distributed big data sets, mobile-edge computing (MEC) and unmanned aerial vehicle base stations (UAVBSs) have recently emerged to equip existing networks with intelligence and mobility.
Conventionally, cloud computing has been deployed to provide a huge pool of computing resources for connected devices [1]. However, as the data transmission speed is limited by communication resources, cloud computing cannot guarantee low latency [2]. Given the high data rates in IoT, the data transmission load will overwhelm the communication network, which poses a great challenge to online data processing. Recently, mobile-edge computing (MEC) has emerged as a promising technique in IoT. By deploying cloud-like infrastructure in the vicinity of data sources, data can be partly processed at the edge [3]. In this way, the data stream in the network will be largely reduced.
In existing works, problems with respect to computation offloading, network resource allocation and related network structure designs in MEC have been broadly studied in various models [2, 4, 5, 6, 7]. In [2], the authors employed deep reinforcement learning to allocate cache, computing and communication resources for MEC systems in vehicle networks. In [4], the authors optimized the offloading decision and resource allocation to obtain a maximum computation rate for a wireless powered MEC system. Considering the combination of MEC and existing communication services, a novel two-layer TDMA-based unified resource management scheme was proposed to handle both conventional communication service and MEC data traffic at the same time [5]. In [6], the authors jointly optimized the radio and computational resources for a multiuser MEC system. In addition to the edge, the cloud was also taken into consideration in [7].
MEC system design considering computation task offloading has been extensively investigated in previous works. However, for IoT-based big data processing, MEC servers may also serve to process local data at the edge [8, 9, 10]. In [8], the authors discussed the application of MEC in data processing. In [9], the authors indicated that edge servers can process part of the data rather than send all of it to the cloud. Then in [10], the authors proposed a scheme for this system. In the field of edge computing, the design of edge data processing algorithms is still an open problem.
In IoT networks, devices are often widely distributed and move flexibly. In this situation, conventional ground base stations face a great challenge in providing sufficient service coverage. To address this problem, unmanned aerial vehicle base stations (UAVBSs) have recently emerged as a promising technique to make network coverage more flexible. In UAVBS wireless systems, energy-aware UAV deployment and operation mechanisms are crucial for intelligent energy usage and replenishment [11]. In the literature, this issue has been widely studied [12, 13, 14, 15, 16]. In [12], the authors characterized the UAV-ground channels. In [13], the optimal hovering altitude and coverage radius were investigated. In [14], the authors jointly considered energy efficiency and user coverage to optimize UAVBS placement. In [15], the authors considered the placement of UAVBSs with the criterion of minimizing the UAV recall frequency. Furthermore, a UAVBS was also considered as an MEC server in [16]. However, that work only considered one UAVBS and focused on the computation offloading problem. Besides, the cloud center was excluded from the discussion.
In IoT networks, the data sets are generated by distributed sensors and reflect their local information. In tasks such as supervision, the network is supposed to keep collecting and processing distributed data. Considering data freshness, the system should work in an online manner. In conventional cloud computing, all data will be transmitted to the cloud through base stations. Though the cloud may be powerful enough, the huge amount of data will still pose a heavy load on the communication network. Furthermore, building base stations in a large region may cost too much, especially for rural regions. In this paper, we consider an MEC-based IoT network, where hovering UAVBSs are deployed as edge servers. The network structure is shown in Fig. 1. The system is composed of three layers, involving distributed sensors, UAVBSs and the center cloud. Distributed sensors keep generating data, which is collected by nearby UAVBSs. Each UAVBS is equipped with onboard edge servers for executing the initial steps of data processing. A large proportion of redundant data is filtered out and the extracted information is transmitted to the cloud for further analysis. The edge processing will largely relieve the heavy burden on the communication network. However, the limited edge computational capacity will bring new challenges. To balance the burden, part of the data will be directly offloaded to the cloud. The remaining data will be temporarily stored in edge buffers, which results in delay. In this paper, it is assumed that the cloud is powerful enough. Therefore, our focus is on the mobile edge nodes, i.e., the UAVBSs, and we discuss how to minimize the cost and delay at the edge.
The system design faces great challenges with respect to the cooperation of different layers and agents. In this paper, we investigate the problems related to UAV path planning and network resource management. Our major contributions are summarized as follows:

We propose a three-layer data processing network structure, which integrates cloud computing, mobile edge computing (MEC) and UAV base stations (UAVBSs), as well as distributed IoT sensors. Data generated by distributed sensors are transmitted to UAVBSs with onboard edge servers. It is assumed that redundant data are filtered out at the edge and the extracted information takes only a little bandwidth to transmit. Under high data rates, the remaining bandwidth will be allocated to UAVBSs for data offloading. This system will largely relieve the communication burden while providing flexible service coverage.

A reinforcement learning based algorithm is proposed for UAVBS path planning. A local map of the surrounding service requirements is taken as input to train a convolutional neural network (CNN), which predicts a reward for each possible action. The training samples are obtained from trials, feedback and the corresponding observations. Considering the heavy computational burden of network training, the training process is carried out by the powerful center cloud. Each UAVBS receives network weights from the cloud and selects its own moving action based on current local observations. With a well-trained neural network, UAVBSs will automatically cooperate to cover the region of interest.

The distributed online data processing system faces challenges in network management, as the onboard energy and computational resources of UAVBSs are limited. Under high data rates, part of the received data will be offloaded to the cloud. Meanwhile, under low data rates, edge servers can lower the processor frequency to save energy. Besides, they can also offload part of the data to further reduce energy consumption. This leads to the issue of optimal network resource management. In this paper, we propose an online network scheduling algorithm based on the Lyapunov optimization framework [17]. Without knowing the probability distributions of data sources, the network updates its policy according to the current buffer length, aiming at stabilizing delay while saving energy.

The proposed algorithms are tested by simulations in Python. Simulation results show that the region of interest can be covered with good balance and high efficiency under our proposed path planning. Meanwhile, the performance with respect to energy consumption and delay is also tested in simulations. The results may help to build an IoT network for processing a huge amount of data distributed in a large area.
The rest of the paper is organized as follows. We will introduce the system model and some key notations in Section II. In Section III, the path planning problem based on deep reinforcement learning will be investigated. In Section IV, the network scheduling algorithm for data processing will be proposed based on Lyapunov optimization. The simulation results of the data processing network will be shown in Section V. Finally, we will conclude in Section VI.
II System model
Consider an online distributed data processing network as shown in Fig. 1, where the data sources are distributed sensors denoted as . Above them, hovering UAVBSs carrying onboard edge servers are denoted as . They collect data from surrounding sensors and execute the initial steps of data processing. The edge processing will filter out a large amount of redundant data and the extracted information will be transmitted to the center cloud for further analysis. The internal environmental state is , which is affected by environmental elements and the network scheduling policy. The observations of by compose the set . We denote the sensor index set as . The UAVBS index set is . The system time set is , with interval . In this section, we will introduce the network model, involving the air-ground channel model, data generation model, UAV path planning model and edge computing model.
II-A Air-ground channel
The air-ground (AG) channel involves a line-of-sight (LOS) link and a non-line-of-sight (NLOS) link [12]. In the literature [18], the corresponding path loss is defined as follows.
(1) 
where and respectively represent the LOS link and the NLOS link. Projecting the UAV on the ground, its distance from the covered sensor is denoted as . Besides, is the speed of light and represents the signal frequency. Parameter is the hovering altitude of UAVBSs, while and are respectively the path loss parameters for the LOS link and the NLOS link. As obstacles will typically reduce a large proportion of signal intensity, we have .
The probability of LOS link is affected by environmental elements, which is given by [18] as
(2) 
where and are environmental constants of the target region and is the elevation angle of UAVBSs. Meanwhile, represents the NLOS probability. Then the final average path loss of AG channel is
(3) 
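As a minimal sketch of how the average path loss in (1)-(3) could be evaluated, the following Python functions assume the commonly used form of the model in [18]: a free-space loss term plus a link-dependent excess loss, weighted by a sigmoid LOS probability in the elevation angle. The function and parameter names (`eta_los`, `eta_nlos`, `a`, `b`) are illustrative assumptions and are not taken from the paper.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def los_probability(elevation_deg, a, b):
    """Assumed sigmoid LOS probability; a and b are environmental constants."""
    return 1.0 / (1.0 + a * math.exp(-b * (elevation_deg - a)))


def average_path_loss_db(r, h, freq_hz, eta_los, eta_nlos, a, b):
    """Average AG path loss in dB for ground distance r and altitude h (metres)."""
    d = math.hypot(r, h)                             # UAV-to-sensor distance
    elevation_deg = math.degrees(math.atan2(h, r))   # elevation angle seen by the sensor
    fspl = 20.0 * math.log10(4.0 * math.pi * freq_hz * d / C)  # free-space loss
    p_los = los_probability(elevation_deg, a, b)
    # Weighted sum of the LOS and NLOS losses; the NLOS probability is 1 - p_los.
    return p_los * (fspl + eta_los) + (1.0 - p_los) * (fspl + eta_nlos)
```

With `eta_nlos` larger than `eta_los`, the returned value always lies between the pure LOS loss and the pure NLOS loss, and it moves towards the LOS loss as the elevation angle grows.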
Notation  Explanations 

Set of distributed sensors  
Set of UAVBSs  
The internal environmental state in time slot  
Set of observations of local environmental elements by UAVBSs  
Set of system time slot  
Set of time slot for UAV path update  
The position in planned path for UAV at time slot  
The path update policy of UAV at time slot  
The generated data bits of sensor in time slot  
The collected data bits by UAV in time slot  
The capability of edge data processing on in time slot  
The capability of data transmission through network in time slot  
The edge buffer length on at  
The edge processor frequency on at  
The data transmission power of at  
The proportion of allocated bandwidth to at  
The update rate of network training  
The occurring frequency of action in training samples  
Decay coefficient of future rewards  
A coefficient reflecting the uncover rate of sensor 
II-B UAV path
The position of is denoted as , where represents its projection on the ground and is its corresponding hovering altitude. It is assumed that covers sensors around within radius . In our previous work [15], we proved that the optimal height satisfies
(4) 
where is the optimal elevation angle on the coverage boundary. That is, is the optimal height with . It is assumed that the data transmission rate is and the channel path loss is modeled as in the above subsection. In this case, can be derived by binary search, see [15]. With the optimized , the UAV path only involves the two-dimensional position . The time slot for path update is with interval , where is the time slot set for path update. The corresponding position is denoted as . Note that the reaction speed of the flight control system is typically slower than computation and communication management. While is typically tiny, should be larger than .
In this paper, the UAV path update is conducted in an online manner. At , the path node for next time slot is determined based on observation set . Suppose the position of at is , its position in path at is
(5) 
where is the path update part for time slot . is the candidate policy set. Therefore, the path for is
(6) 
The entire multi-UAV path set is denoted as .
II-C Data generation
The distributed sensors generate data involving local information. The data is temporarily stored in the sensor buffer denoted as . It is assumed that sensor generates bits of data during time slot , where . Parameter is an i.i.d. random variable. It is supposed that follows a Poisson distribution with . In practical systems, is typically constrained by hardware limitations. Therefore, is assumed to be bounded by , where is the largest value of . Note that is an empirical parameter which may vary among different places.
II-D Edge computing
It is assumed that data collection and its correlated network scheduling policy are updated in discrete time slots with interval [19, 16]. We suppose collects bits of data in time slot . The collected data will be temporarily stored in the edge data buffer.
Initial steps of data processing are executed at the edge, where a large amount of redundant data is filtered out. It is supposed that the extracted information at the edge takes only part of the bandwidth between edge and cloud for transmission. This relieves the heavy burden on network communication. However, the limited edge processing capability will bring new challenges. In this case, the remaining bandwidth can be allocated to edge nodes for data offloading, which balances the burden between edge processing and network communication.
II-D1 Data caching
In time slot , the data processing capability on is , while the edge data offloading capability is . The queuing length at the beginning of time slot on is , which evolves as follows.
(7) 
where is set to be zero.
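The buffer dynamics can be illustrated with a short simulation. The sketch below assumes the usual queue update in which the served bits (edge processing plus offloading) are removed first, the buffer never drops below zero, and new arrivals are then added. Arrivals are drawn from a Poisson distribution clipped at an upper bound, in line with the bounded data generation model of Section II-C. All names, the constant service amounts and the use of Knuth's sampling method are illustrative assumptions.

```python
import math
import random


def bounded_poisson(rng, lam, a_max):
    """Draw a Poisson(lam) sample with Knuth's method and clip it at a_max."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            break
        k += 1
    return min(k, a_max)


def queue_step(q, processed, offloaded, arrived):
    """Assumed buffer update: serve first (never below zero), then add arrivals."""
    return max(q - processed - offloaded, 0) + arrived


def simulate_queue(slots, lam, a_max, processed, offloaded, seed=0):
    """Track the buffer length over time, starting from an empty buffer."""
    rng = random.Random(seed)
    q = 0
    history = []
    for _ in range(slots):
        arrived = bounded_poisson(rng, lam, a_max)
        q = queue_step(q, processed, offloaded, arrived)
        history.append(q)
    return history
```

For example, `simulate_queue(100, lam=5.0, a_max=12, processed=3, offloaded=3)` gives a buffer trajectory that stays bounded when the total service per slot exceeds the mean arrival rate.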
II-D2 Edge processing
It is assumed that the edge server on needs CPU cycles to process one bit of data, which depends on the applied algorithm [6]. The CPU cycle frequency of in time slot is denoted as , where . Then is
(8) 
where is the time slot length. The power consumption of edge data processing [20] by is
(9) 
where is the effective switched capacitance [20] of , which is determined by processor chip structure.
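A minimal numerical sketch of the edge processing model is given below. It assumes that the bits processed per slot equal the CPU cycles available in the slot divided by the cycles needed per bit, and that the processing power follows the common cubic model in the processor frequency scaled by the effective switched capacitance [20]. The function and parameter names are illustrative assumptions.

```python
def processed_bits(freq_hz, slot_s, cycles_per_bit):
    """Bits processed in one slot: available CPU cycles divided by cycles per bit."""
    return freq_hz * slot_s / cycles_per_bit


def processing_power(freq_hz, kappa):
    """Assumed cubic power model: effective switched capacitance times frequency cubed."""
    return kappa * freq_hz ** 3
```

Under these assumptions, doubling the frequency doubles the processed bits but raises the power by a factor of eight, which is why lowering the frequency at low data rates saves energy.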
II-D3 Data offloading
It is assumed that the wireless channels between UAVBSs and the center cloud are i.i.d. frequency-flat block fading [15]. Thus the channel power gain between and the center cloud is supposed to be , where represents the small-scale fading part of the channel power gain, is the path loss constant, is the path loss exponent, is the reference distance and is the distance between and the center cloud. Considering that the system works in FDMA mode, the data transmission capacity from to the center cloud is
(10) 
where is the proportion of the bandwidth allocated to , is the transmission power with , is the entire bandwidth for data offloading and is the power spectral density of noise.
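The offloading capacity can be sketched as a Shannon-type rate over the allocated share of the bandwidth. The code below is an illustration under stated assumptions: the channel gain is the product of the small-scale fading term, the path loss constant and a distance power law, and the noise power scales with the allocated bandwidth. The names and the multiplication by the slot length (to obtain bits per slot) are assumptions made for this example.

```python
import math


def channel_gain(fading, g0, d0, d, theta):
    """Assumed channel power gain: fading * g0 * (d0 / d) ** theta."""
    return fading * g0 * (d0 / d) ** theta


def offloaded_bits(alpha, p_tx, gain, bandwidth_hz, noise_psd, slot_s):
    """Bits offloaded in one slot over a fraction alpha of the bandwidth (FDMA)."""
    if alpha <= 0.0 or p_tx <= 0.0:
        return 0.0  # no bandwidth or no power means nothing is sent
    snr = gain * p_tx / (alpha * bandwidth_hz * noise_psd)
    return alpha * bandwidth_hz * slot_s * math.log2(1.0 + snr)
```

The rate grows with both the transmission power and the bandwidth share, so the two resources have to be traded off jointly across UAVBSs.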
III UAV Path Planning
Moving UAVBSs provide flexible and wide service coverage, which is especially effective for surveillance tasks. However, these advantages depend on smart path planning. In [16], the authors proposed an offline path planning algorithm based on convex optimization. However, it only targets a single UAV. In a multi-UAV system, there exists correlation among UAVBSs. Each UAVBS may only obtain local observations. Furthermore, many unexpected environmental factors may pose a great challenge to offline path planning. Therefore, it is essential to adaptively plan UAV paths in an online manner.
In the last decade, deep reinforcement learning has obtained impressive results in online policy determination. Different from conventional reinforcement learning, deep reinforcement learning trains a deep neural network to predict the rewards of each candidate action. Typically, the neural network is utilized to fit complex unknown functions in learning tasks [21]. Besides, it can handle more complex input features. In [22], the authors adopted deep reinforcement learning to train a CNN for playing computer games with an online policy. In this paper, we adopt a similar approach to train an adaptive path planning network. For at time , its input is observation . In this section, we will discuss the problem formulation and its solution based on deep reinforcement learning.
III-A Problem formulation
The UAV path is planned in terms of time slot . Our objective is to optimize to enhance UAV coverage. In time slot , is supposed to use the plan by local observation . The policy is determined in a distributed manner without global information. However, local is not sufficient to depict the entire coverage. In this case, we need to find an alternative optimization objective to represent entire UAV coverage. Typically, an ideal coverage will sufficiently utilize data processing capability of . That is, if UAVBSs cooperate to enhance data collection amount, they will achieve a relatively good coverage. Therefore, the path planning problem is formulated as follows.
We suppose collects bits data in time slot . It is straightforward to see is determined by state set and UAV path set within time slot . The connection is represented by
(11) 
where is a time varying function determined by environmental elements. The environmental state is supposed to be characterized by a Markov process. The state update is determined by current state and path set , which is represented by
(12) 
Then the problem is formulated as follows.
(13)  
s.t.  (13a)  
(13b)  
(13c) 
where constraint (13a) represents the path update policy. Constraint (13b) represents the internal state update, which is determined by specific environment. Constraint (13c) represents the system reward by and .
The direct optimization of faces great challenges. In a multi-agent system, there exists correlation among agents. Models in (13b) and (13c) are determined by complex environmental elements involving correlations among UAVBSs. Therefore, it is very hard to specifically model and . Furthermore, the internal environmental state is also beyond our reach. Instead, we can only plan the path by local observation . In this case, training an alternative function to approximate the complex environmental models may provide an achievable solution. This is the so-called reinforcement learning algorithm.
III-B Reinforcement learning algorithm
The optimal policy is selected by the rewards of each candidate action. In reinforcement learning, the Q-function represents the rewards of action under state . Faced with complex environmental elements, it is very hard to model the Q-function specifically. In this case, reinforcement learning is applied to learn by iteratively interacting with the surrounding environment. Through trials and feedback, the agents will obtain training samples in the form of . With these dynamically updated training samples, the trained will be a good approximation to the environmental Q-function. Reinforcement learning enables agents to learn an adaptive policy maker, which is widely applied in dynamic control and optimization. In the path planning problem, UAVBSs only obtain observations of the internal state . To explore internal features in the obtained observations, the deep Q-learning algorithm is applied.
In deep Q-learning, a deep neural network is applied to approximate the Q-function, where represents network weights and is the observation data. Taking as input, the Q-network will output the predicted rewards of each candidate action. By continuous interaction with the surrounding environment, will be adaptively adjusted to fit the unknown environmental model. In [22], a CNN is trained to adaptively play computer games with screen pictures as input. For such rather complex tasks, the observations can be a matrix or a sequence. In this case, the CNN can exploit local correlations of elements in by convolutional filters, which enables the extraction of high-dimensional features. In many practical applications, the algorithm works robustly with high-level performance. The training process is summarized in Algorithm 1.
In the training process, the training sample generated by at is denoted as , where represents the observations by at , is its action, is the feedback reward and is the new observation. In this paper, a centralized training mode is applied. Training samples of distributed UAVBSs are gathered by the center for network training. The UAVBSs share the centrally trained network weights. Based on different local observations, they can choose separate actions. The collected training samples are stored in replay memory , where is the buffer length. Each time, the algorithm randomly samples a batch from for training. Compared with conventional training on consecutive samples, this method may enable networks to learn from more varied past experiences rather than only the most recent ones.
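The sample memory described above can be sketched as a fixed-length buffer with uniform random batch sampling. This is a minimal illustration and not the implementation used in the simulations; the class name `ReplayMemory` and its methods are hypothetical.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-length buffer of (observation, action, reward, next_observation) samples."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest sample once the buffer is full.
        self.buffer = deque(maxlen=capacity)

    def push(self, obs, action, reward, next_obs):
        self.buffer.append((obs, action, reward, next_obs))

    def sample(self, batch_size, rng=random):
        """Draw a random batch without replacement, at most the current buffer size."""
        size = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), size)

    def __len__(self):
        return len(self.buffer)
```

In a centralized setting, every UAVBS would push its samples into one shared memory at the center, and each training step would call `sample` once to obtain a batch.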
The MSE-based loss function for is defined as follows.
(14) 
where and is the reference network weight. Parameter is the decay coefficient of future rewards while is the update rate. Note that the loss for other actions in the policy set is set to be .
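For illustration, the target and loss for a single training sample can be computed as in the sketch below. It assumes the standard deep Q-learning target of [22], i.e. the observed reward plus the discounted maximum of the reference network's predictions for the new observation. Plain Python lists stand in for the network outputs, and the function names are hypothetical.

```python
def td_target(reward, next_q_values, gamma):
    """Assumed target: reward plus discounted best value predicted by the reference network."""
    return reward + gamma * max(next_q_values)


def squared_td_loss(q_taken, reward, next_q_values, gamma):
    """Squared error between the target and the prediction for the action actually taken."""
    return (td_target(reward, next_q_values, gamma) - q_taken) ** 2
```

Only the output corresponding to the sampled action contributes to this loss, which matches the convention that the other actions in the policy set receive no training signal from that sample.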
To ensure convergence, is typically set as . Note that a frequently occurring action will be trained more intensely, which will break the balance among all candidate actions. Therefore, the sample proportion of each candidate action is maintained here, denoted as . Parameter is the action index. Suppose the sample action index is and is upper-bounded by , is determined by
(15) 
where is the maximum value of . Note that an action with a larger will have a smaller update rate.
III-C Interaction with environment
The environmental model and the internal state are unknown. In the previous subsection, we proposed a deep Q-learning algorithm to adaptively learn environmental elements. Before its implementation, the specific interaction mode with the surrounding environment will be discussed in this subsection.
A model of the internal environment and its interaction with the deep Q-learning algorithm is shown in Fig. 2. Based on state and action , the internal environment will generate a reward by model . In this case, an optimal policy is generated by maximizing the outcome rewards. Then the environmental state will be updated by its internal model . To approximate this environmental model for policy learning, a deep Q-network is implemented to interact with the environment. The observations are taken by the Q-network as input and carry the essential information about within . By directly receiving the outcome from the environment, the Q-network will be trained to adaptively estimate . Based on its estimation, we will derive a nearly optimal policy. In this paper, model-free reinforcement learning is applied. Therefore, the Q-network only needs to receive observations and estimate , without considering the internal state update model . The key elements of the interaction are observations, rewards and action policy.
III-C1 Observations
The observations of distributed sensors should involve information on the surrounding service requirements, so that the planned path can ensure better coverage. The sensors which have long been uncovered should have more urgent service requirements. Besides, sensors with larger data rates also require more coverage. Furthermore, it is also important to avoid overlap among the coverage of different UAVBSs. Therefore, the observations by UAVBSs should involve the above essential elements for a proper path.
It is straightforward to see that the local observations should be a two-dimensional data set. Suppose at time , the local observations involve a region around . The observation data is set as a matrix . The position of is . Then the position in map corresponding to is . represents observations of sensors around . In this way, the local region is represented in a discrete manner. Parameter is determined by the input data size of the Q-network . is set according to the observation range of UAVBSs. is called the observation sight, which describes the width of the observed region.
We suppose sensor maintains its service requirement , which illustrates its data freshness and accumulation. The process is summarized in Algorithm 2. represents the data freshness of . Local data rate represents data accumulation rate. They are synthesized by . The initial sensor buffer is supposed to be . They are updated in terms of time slot . If uncovered, the data freshness will decay by (16). If covered by UAVBSs, it is assumed that will transmit at most bits data in time slot . In this case, will update by (17) and the data freshness will be renewed by (18).
It is assumed that can obtain from the sensors in the region around it. The processing of the corresponding observations is summarized in Algorithm 3. Matrix is initialized as a zero matrix. from sensors around is added to . In this way, will reflect the local data freshness and accumulation. For covered by other UAVBSs, will be adjusted by (19). In this case, the observations will involve the coverage overlap among UAVBSs. Note that outside the region will lead to . The processed will be taken as input of the CNN Q-network for reward estimation.
(16) 
(17) 
(18) 
(19) 
III-C2 Action policy
The path for is defined by (6). The corresponding action policy for online path planning is defined in (5). In this paper, we define a set with finite candidate policy. It is assumed that is a constant. That is, the UAV speed is supposed to remain stable and the length of path update does not change. Then the policy set with discrete direction is defined as follows.
(20) 
where is the length of a path step and is the discrete path angle. The zero element means hovering at the current position.
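A discrete policy set of this kind can be generated as in the following sketch, which assumes evenly spaced path angles over the full circle plus a zero element for hovering. The number of directions, the step length and the function names are illustrative assumptions.

```python
import math


def action_set(step_len, n_directions):
    """Candidate path updates: hover, plus n_directions steps of equal length."""
    actions = [(0.0, 0.0)]  # the zero element: hover at the current position
    for k in range(n_directions):
        angle = 2.0 * math.pi * k / n_directions  # evenly spaced path angle
        actions.append((step_len * math.cos(angle), step_len * math.sin(angle)))
    return actions


def next_position(position, action):
    """Apply one path update to a two-dimensional position."""
    return (position[0] + action[0], position[1] + action[1])
```

For instance, `action_set(10.0, 8)` yields nine candidate actions, and the Q-network would output one predicted reward for each of them.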
III-C3 Reward function
The objective of is to maximize the overall data collection, so that the edge capability is sufficiently utilized. For distributed online decision making, the reward must be accessible at the edge UAVBSs. Therefore, the reward is defined as the collected data bits in time slot . Note that the interaction experiences will be transmitted to the center for network training. Furthermore, the observations also involve other nearby UAVBSs. Therefore, in the process of interaction and learning, the UAVBSs will tend to cooperate with each other to ensure relatively good coverage.
IV System Data Management
After receiving data from surrounding sensors, the UAVBSs process their collected raw data and transmit the edge processing results to the center cloud. It is assumed that the transmission of processing results takes very little communication resources. Therefore, the majority of the communication bandwidth between UAVBSs and the center cloud can be utilized for transmitting part of the unprocessed data. In this way, the edge system can enhance its data throughput while reducing UAV onboard energy cost. In this section, we will formulate the data offloading problem as a Lyapunov optimization problem. As the cloud is supposed to be powerful enough, we consider the edge energy cost and data processing delay as the system cost.
IV-A Problem formulation
The data offloading policy focuses on stabilizing delay while reducing the power consumption of edge processing and data transmission. It is managed in terms of system time slot . It is assumed that each UAVBS is hovering at a constant speed. Thus, the power consumption of the onboard dynamical system is excluded. At time slot , the power consumption of local computation on UAVBS is . The data transmission power of is . We denote the power consumption of in time slot as
(21) 
Then the average weighted sum power consumption is
(22) 
where is a positive parameter with regard to , which can be adjusted to balance the power management of all UAVBSs. As the system performance metric, is the long-term edge power consumption. The data offloading policy with respect to can be derived by statistical optimization.
The data collected by will be temporarily stored in the onboard data buffer for future processing. In this case, the data queuing delay is the metric of edge system service quality. By Little’s Law [23], the average queuing delay of a queuing agent is proportional to the average queuing length. Therefore, the average data amount in the onboard data memory is viewed as the system service quality metric. The long-term queuing length for edge is defined as
(23) 
The network policy at time slot for UAVBSs is denoted as . The operation is the processor frequency for edge data processing on UAVBSs. The operation is the transmission power of data offloading. is the proportion of bandwidth allocation among the UAVBSs. Therefore, the optimization of edge data processing policy can be formulated as problem .
(24)  
s.t.  (24a)  
(24b)  
(24c) 
Eq. (24a) is the bandwidth allocation constraint, where is a system constant. Constraints (24b) indicates the boundary of processor frequency and transmission power. For delay consideration, constraint (24c) forces the edge data buffers to be stable, which guarantees the collected data can be processed in a finite time. Among the constraints, index belongs to set and time slot belongs to set
is obviously a statistical optimization problem with randomly arriving data. Therefore, the policy has to be determined dynamically in each time slot. Furthermore, the spatial coupling of bandwidth allocation among UAVBSs poses a great challenge to the problem solution. Instead of solving directly, we propose an online joint resource management algorithm based on Lyapunov optimization.
IV-B Online optimization framework
The proposed is a challenging statistical optimization problem. By Lyapunov optimization [24], can be reformulated as a deterministic problem for each time slot, which can be solved with low complexity. The online algorithm can cope with the dynamic random environment while deriving an overall optimal outcome. Based on the Lyapunov optimization framework, the algorithm aims at saving energy while stabilizing the edge data buffers.
The Lyapunov function for time slot is defined as
(25) 
This quadratic function is a scalar measure of data accumulation in queue. Its corresponding Lyapunov drift is defined as follows.
(26) 
To stabilize the network queuing buffer while minimizing the average energy penalty, the policy is determined by minimizing a bound on the following drift-plus-penalty function for each time slot .
(27) 
where is a positive system parameter which represents the tradeoff between Lyapunov drift and energy cost. is the expectation of a random process with unknown probability distribution. Therefore, an upper bound of is estimated so that we can minimize without the specific probability distribution. According to the following Lemma 1, we derive a deterministic upper bound of for each time slot.
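For readers less familiar with the framework, the quantities in (25)-(27) usually take the following standard form in Lyapunov optimization [24]. The symbols below are assumptions introduced only for this illustration: $Q_m(t)$ is the buffer length of the $m$-th UAVBS, $\mathbf{Q}(t)$ collects all buffer lengths, $\bar{P}(t)$ is the weighted sum power in slot $t$, and $V$ is the trade-off parameter.

```latex
\begin{align*}
L(\mathbf{Q}(t)) &= \frac{1}{2}\sum_{m} Q_m^2(t), \\
\Delta(\mathbf{Q}(t)) &= \mathbb{E}\left[ L(\mathbf{Q}(t+1)) - L(\mathbf{Q}(t)) \,\middle|\, \mathbf{Q}(t) \right], \\
\Delta_V(\mathbf{Q}(t)) &= \Delta(\mathbf{Q}(t)) + V\,\mathbb{E}\left[ \bar{P}(t) \,\middle|\, \mathbf{Q}(t) \right].
\end{align*}
```

A larger $V$ puts more weight on the energy penalty and less on keeping the buffers short, which is the trade-off referred to in the text.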
Lemma 1.
For an arbitrary policy constrained by (24a), (24b) and (24c), the Lyapunov drift function is upper bounded by
(28) 
where is a known constant independent with the system policy and is the current data buffer length. is the edge processing data bits while is the offloaded data bits. They are all for time slot .
Proof.
From equation (7), we have
(29) 
By (29), we can subtract on both sides and sum up the inequalities for , which leads to the following.
(30) 
As stated in Section II, the data rate of sensors is bounded by . Furthermore, the channel capacity between sensors and UAVBSs is also limited. Therefore, is supposed to be upper bounded by . Note that the computation and communication resources are limited. Therefore, and are also bounded by their corresponding maximum processing rate. As the maximum processor frequency is , we have . Since and , we have . For simplicity, we separately denote and as and . Then the term should be bounded by Therefore, we have
(31) 
where . When considering a specific time slot , it is straightforward to see that is a deterministic constant. This completes the proof. ∎
Together with (27) and (28), the drift-plus-penalty function is upper-bounded by
(32) 
By optimizing the above upper bound of in each time slot , the data queuing length can be stabilized at a low level while the power consumption penalty is also minimized. In this way, the overall optimal policy can be derived without specific probability distributions. In Lemma 1, parameter is not affected by the system policy. Therefore, it is reasonable to omit in the policy determination problem.
Then the modified problem in each time slot based on Lyapunov optimization framework is defined as follows.
(33)  
s.t.  (33a)  
(33b) 
IV-C Solution for
In the last subsection, we formulated for deriving the optimal policy in each time slot. The optimization variables include the local computation processor frequency , data transmission power and bandwidth allocation . In this subsection, we will divide into two subproblems and derive a solution for the optimal policy.
IV-C1 Optimal frequency for edge processor
We first drop the part of the objective function that is independent of . Then it is straightforward to see that the subproblem with respect to is defined as follows.
(34)  
s.t.  (34a) 
It is easy to verify that is a convex optimization problem. Furthermore, there is no coupling among elements in . Therefore, the optimal processor frequency can be derived separately for each . The stationary point of is . In addition, the optimal processor frequency may also lie on the boundary . Then the final solution is given by
(35) 
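The structure of this solution can be checked numerically. The sketch below assumes a per-UAVBS objective of the form "weighted cubic processing power minus buffer length times processed bits", whose stationary point is a square root expression in the buffer length, and then clips the result at the maximum frequency. The exact objective, and therefore the constant inside the square root, is an assumption made for this example, as are all names.

```python
import math


def frequency_objective(f, q_len, slot_s, cycles_per_bit, v, weight, kappa):
    """Assumed per-slot objective: V * w * kappa * f^3 - Q * f * tau / L."""
    return v * weight * kappa * f ** 3 - q_len * f * slot_s / cycles_per_bit


def optimal_frequency(q_len, slot_s, cycles_per_bit, v, weight, kappa, f_max):
    """Stationary point of the assumed objective, clipped at the maximum frequency."""
    stationary = math.sqrt(q_len * slot_s / (3.0 * v * weight * kappa * cycles_per_bit))
    return min(stationary, f_max)
```

Under these assumptions the frequency increases with the buffer length and decreases as the trade-off parameter, the weight, the capacitance or the cycles per bit grow, which is consistent with the qualitative behaviour discussed in Remark 1.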
Remark 1.
The optimal processor frequency is a monotonically increasing function of the data queuing length . A straightforward insight is that edge servers tend to process faster when more data accumulates in the data buffer. Besides, as or increases, the proportion of edge computation energy cost becomes larger, which results in a lower processor frequency. As parameter increases, the energy consumption per unit frequency gets larger, which causes to decrease. Furthermore, a larger corresponds to a lower edge processing frequency. Then the edge server should lower its processor frequency and offload more data to the cloud.
IV-C2 Bandwidth allocation and data transmission power
We reserve the elements with respect to and and derive the following subproblem.
(36)  
s.t.  (36a)  