
Towards Big Data Processing in IoT: Path Planning and Resource Management of UAV Base Stations in Mobile-Edge Computing System

Heavy data load and wide coverage range have always been crucial problems for online data processing in the internet of things (IoT). Recently, mobile-edge computing (MEC) and unmanned aerial vehicle base stations (UAV-BSs) have emerged as promising techniques in IoT. In this paper, we propose a three-layer online data processing network based on the MEC technique. On the bottom layer, raw data are generated by widely distributed sensors, which reflect local information. Above them, UAV-BSs are deployed as moving MEC servers, which collect data and conduct the initial steps of data processing. On top of them, a center cloud receives the processed results and conducts further evaluation. As this is an online data processing system, the edge nodes should stabilize delay to ensure data freshness. Furthermore, limited onboard energy constrains the edge processing capability. To smartly manage network resources for saving energy and stabilizing delay, we develop an online decision policy based on Lyapunov optimization. When the data rate is low, it tends to reduce the edge processor frequency to save energy. When the data rate is high, it smartly allocates bandwidth for edge data offloading. Meanwhile, hovering UAV-BSs bring a large and flexible service coverage, which raises the problem of effective path planning. In this paper, we apply deep reinforcement learning and develop an online path planning algorithm. Taking observations of the surrounding environment as input, a CNN is trained to predict the reward of each action. Through simulations, we validate its effectiveness in enhancing service coverage. The results will contribute to big data processing in future IoT.



I Introduction

The internet of things (IoT) has emerged as a huge network that extends connectivity beyond standard devices to a wide range of traditionally non-internet-enabled devices. For instance, everyday objects such as vehicles, home appliances and street lamps will join the network to exchange data. This extension will cause an extraordinary increase in the required coverage range and data volume, far beyond existing network capability. For online data processing with delay requirements, conventional cloud computing faces huge challenges. To collect and process big, widely distributed data sets, mobile-edge computing (MEC) and unmanned aerial vehicle base stations (UAV-BSs) have recently emerged to augment existing networks with intelligence and mobility.

Conventionally, cloud computing has been deployed to provide a huge pool of computing resources for connected devices [1]. However, as the data transmission speed is limited by communication resources, cloud computing cannot guarantee latency [2]. Faced with the high data rates in IoT, the data transmission load will overwhelm the communication network, which poses a great challenge to online data processing. Recently, MEC has emerged as a promising technique in IoT. By deploying cloud-like infrastructure in the vicinity of data sources, data can be partly processed at the edge [3]. In this way, the data stream in the network will be largely reduced.

In existing works, problems of computation offloading, network resource allocation and related network structure design in MEC have been broadly studied under various models [2, 4, 5, 6, 7]. In [2], the authors employed deep reinforcement learning to allocate cache, computing and communication resources for an MEC system in vehicular networks. In [4], the authors optimized the offloading decision and resource allocation to maximize the computation rate of a wireless powered MEC system. Considering the combination of MEC and existing communication services, a novel two-layer TDMA-based unified resource management scheme was proposed to handle both conventional communication service and MEC data traffic at the same time [5]. In [6], the authors jointly optimized the radio and computational resources for a multi-user MEC system. In addition to the edge, the cloud was also taken into consideration in [7].

MEC system design for computation task offloading has been extensively investigated in previous works. However, for IoT-based big data processing, an MEC server may also process local data at the edge [8, 9, 10]. In [8], the authors discussed the application of MEC in data processing. In [9], the authors indicated that edge servers can process part of the data rather than sending all of it to the cloud. Then, in [10], the authors proposed a scheme for this system. In the field of edge computing, the design of edge data processing algorithms is still an open problem.

In IoT networks, devices are often widely distributed and may move flexibly. In this situation, conventional ground base stations face great challenges in providing sufficient service coverage. To address this problem, unmanned aerial vehicle base stations (UAV-BSs) have recently emerged as a promising technique to extend network coverage with flexibility. In UAV-BS wireless systems, energy-aware UAV deployment and operation mechanisms are crucial for intelligent energy usage and replenishment [11]. This issue has been widely studied in the literature [12, 13, 14, 15, 16]. In [12], the authors characterized the UAV-ground channels. In [13], the optimal hovering altitude and coverage radius were investigated. In [14], the authors jointly considered energy efficiency and user coverage to optimize UAV-BS placement. In [15], the authors considered the placement of UAV-BSs with the criterion of minimizing the UAV recall frequency. Furthermore, UAV-BSs were also considered as MEC servers in [16]. However, that work only considered one UAV-BS and focused on the computation offloading problem. Besides, the cloud center was excluded from the discussion.

Fig. 1: The structure of the three-layer network system. Data are generated by distributed sensors and transmitted to the cloud through UAVs at the edge.

In IoT networks, data sets are generated by distributed sensors and reflect local information. In tasks such as surveillance, the network is supposed to keep collecting and processing the distributed data. Considering data freshness, the system should work in an online manner. In conventional cloud computing, all data are transmitted to the cloud through base stations. Although the cloud may be powerful enough, the huge amount of data still places a heavy load on the communication network. Furthermore, building base stations over a large region may be too costly, especially in rural areas. In this paper, we consider an MEC-based IoT network, where hovering UAV-BSs are deployed as edge servers. The network structure is shown in Fig. 1. The system is composed of three layers: distributed sensors, UAV-BSs and the center cloud. Distributed sensors keep generating data, which are collected by nearby UAV-BSs. Each UAV-BS is equipped with an onboard edge server for executing the initial steps of data processing. A large proportion of redundant data is filtered out, and the extracted information is transmitted to the cloud for further analysis. Edge processing will largely relieve the heavy burden on the communication network. However, the limited edge computational capacity brings new challenges. To balance the burden, part of the data will be directly offloaded to the cloud. The remaining data will be temporarily stored in edge buffers, which results in delay. In this paper, the cloud is assumed to be powerful enough. Therefore, we focus on the mobile edge nodes, i.e., the UAV-BSs, and discuss how to minimize the cost and delay at the edge.

The system design faces great challenges with respect to the cooperation of different layers and agents. In this paper, we investigate the problems related to UAV path planning and network resource management. Our major contributions are summarized as follows:

  • We propose a three-layer data processing network structure, which integrates cloud computing, mobile edge computing (MEC), UAV base stations (UAV-BSs) and distributed IoT sensors. Data generated by distributed sensors are transmitted to UAV-BSs with onboard edge servers. It is assumed that redundant data are filtered out at the edge and that the extracted information requires only a small amount of bandwidth to transmit. When the data rate is high, the remaining bandwidth will be allocated to UAV-BSs for data offloading. This system will largely relieve the communication burden while providing flexible service coverage.

  • A reinforcement learning based algorithm is proposed for UAV-BS path planning. A local map of the surrounding service requirements is taken as input to train a CNN, which predicts a reward for each possible action. The training samples are obtained from trials, feedback and the corresponding observations. Considering the heavy computational burden of network training, the training process is carried out by the powerful center cloud. Each UAV-BS receives the network weights from the cloud and selects its own moving action based on its current local observations. With a well-trained neural network, UAV-BSs will automatically cooperate to cover the region of interest.

  • The distributed online data processing system faces challenges in network management, as the onboard energy and computational resources of UAV-BSs are limited. When the data rate is high, part of the received data will be offloaded to the cloud. Meanwhile, when the data rate is low, edge servers can lower the processor frequency to save energy. Besides, they can also offload part of the data to further reduce energy consumption. This leads to the problem of optimal network resource management. In this paper, we propose an online network scheduling algorithm based on the Lyapunov optimization framework [17]. Without knowing the probability distributions of the data sources, the network updates its policy based on the current buffer lengths, aiming to stabilize delay while saving energy.

  • The proposed algorithms are tested by simulations in Python. Simulation results show that the region of interest can be covered with good balance and high efficiency under the proposed path planning. Meanwhile, the performance in terms of energy consumption and delay is also evaluated in simulations. The results may assist in building IoT networks that process huge amounts of data distributed over a large area.

The rest of this paper is organized as follows. We introduce the system model and some key notations in Section II. In Section III, the path planning problem based on deep reinforcement learning is investigated. In Section IV, the network scheduling algorithm for data processing is proposed based on Lyapunov optimization. The simulation results of the data processing network are shown in Section V. Finally, we conclude in Section VI.

II System Model

Consider an online distributed data processing network as shown in Fig. 1, where the data sources are distributed sensors denoted as . Above them, hovering UAV-BSs carrying onboard edge servers are denoted as . They collect data from nearby sensors and execute the initial steps of data processing. Edge processing filters out a large amount of redundant data, and the extracted information is transmitted to the center cloud for further analysis. The internal environmental state is , which is affected by environmental elements and the network scheduling policy. The observations of by compose the set . We denote the sensor index set as . The UAV-BS index set is . The system time set is , with interval . In this section, we introduce the network model, including the air-ground channel model, data generation model, UAV path planning model and edge computing model.

II-A Air-Ground Channel

The air-ground (AG) channel involves a line-of-sight (LOS) link and a non-line-of-sight (NLOS) link [12]. Following [18], the corresponding path loss is defined as follows.


where and respectively represent the LOS link and the NLOS link. Projecting the UAV onto the ground, its distance from the covered sensor is denoted as . Besides, is the speed of light and represents the signal frequency. Parameter is the hovering altitude of UAV-BSs, while and are respectively the path loss parameters for the LOS link and the NLOS link. As obstacles typically attenuate a large proportion of the signal intensity, we have .

The probability of LOS link is affected by environmental elements, which is given by [18] as


where and are environmental constants of the target region and is the elevation angle of UAV-BSs. Meanwhile, represents the NLOS probability. Then the final average path loss of the AG channel is

Notation Explanation
Set of distributed sensors
Set of UAV-BSs
The internal environmental state in time slot
Set of observations of local environmental elements by UAV-BSs
Set of system time slots
Set of time slots for UAV path updates
The position in the planned path of UAV at time slot
The path update policy of UAV at time slot
The data bits generated by sensor in time slot
The data bits collected by UAV in time slot
The edge data processing capability of in time slot
The data transmission capability through the network in time slot
The edge buffer length on at
The edge processor frequency on at
The data transmission power of at
The proportion of bandwidth allocated to at
The update rate of network training
The occurrence frequency of action in training samples
Decay coefficient of future rewards
A coefficient reflecting the uncover rate of sensor
TABLE I: Summary of key notations
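To make the air-ground channel model of Section II-A concrete, here is a minimal Python sketch. It assumes the widely used form in which the LOS probability is a sigmoid of the elevation angle (in degrees) with environmental constants a and b, and each link's loss is the free-space loss over the slant distance plus an excess loss for LOS or NLOS. The function names and the parameter values in the example are illustrative assumptions, not values taken from this paper.

```python
import math

C_LIGHT = 3e8  # speed of light (m/s)


def los_probability(theta_deg, a, b):
    """Assumed sigmoid LOS model; theta_deg is the elevation angle in degrees."""
    return 1.0 / (1.0 + a * math.exp(-b * (theta_deg - a)))


def path_loss_db(h, r, freq_hz, eta_db):
    """Free-space loss over the slant distance plus an excess loss eta_db (dB)."""
    d = math.hypot(h, r)  # slant distance between UAV-BS and sensor
    return 20.0 * math.log10(4.0 * math.pi * freq_hz * d / C_LIGHT) + eta_db


def average_path_loss_db(h, r, freq_hz, a, b, eta_los_db, eta_nlos_db):
    """LOS and NLOS losses weighted by their probabilities."""
    theta_deg = math.degrees(math.atan2(h, r))  # elevation angle seen from the sensor
    p_los = los_probability(theta_deg, a, b)
    return (p_los * path_loss_db(h, r, freq_hz, eta_los_db)
            + (1.0 - p_los) * path_loss_db(h, r, freq_hz, eta_nlos_db))


# Illustrative parameters only: 100 m altitude, 200 m ground distance, 2 GHz carrier.
print(average_path_loss_db(100.0, 200.0, 2e9, 9.61, 0.16, 1.0, 20.0))
```

Because the LOS probability lies strictly between 0 and 1, the average loss always falls between the pure LOS and pure NLOS losses.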

II-B UAV Path

The position of is denoted as , where represents its projection on the ground and is its corresponding hovering altitude. It is assumed that covers the surrounding sensors within radius . In our previous work [15], we proved that the optimal height satisfies


where is the optimal elevation angle on the coverage boundary. That is, is the optimal height with . It is assumed that the data transmission rate is and the channel path loss is modeled as in the above subsection. In this case, can be derived by binary search; see [15]. With the optimized , the UAV path only involves the two-dimensional position . The time slot for path update is with interval , where is the set of time slots for path updates. The corresponding position is denoted as . Note that the reaction speed of the flight control system is typically slower than that of computation and communication management. While is typically tiny, should be larger than .
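The binary search can be sketched as follows, under loudly stated assumptions: the optimal boundary elevation angle is already known, the rate requirement has been translated into a maximum tolerable average path loss pl_max_db, and the channel follows the sigmoid LOS model assumed earlier. At a fixed angle the LOS probability is constant and the boundary loss grows monotonically with the radius, so bisection converges. A closed form also exists in this special case; bisection is shown because it carries over to more general loss models. All names and numbers are hypothetical.

```python
import math

C_LIGHT = 3e8  # speed of light (m/s)


def boundary_loss_db(r, theta_deg, freq_hz, a, b, eta_los_db, eta_nlos_db):
    """Average AG path loss at the coverage edge when h = r * tan(theta)."""
    p_los = 1.0 / (1.0 + a * math.exp(-b * (theta_deg - a)))  # fixed, since theta is fixed
    d = r / math.cos(math.radians(theta_deg))  # slant distance to a boundary sensor
    fspl = 20.0 * math.log10(4.0 * math.pi * freq_hz * d / C_LIGHT)
    return fspl + p_los * eta_los_db + (1.0 - p_los) * eta_nlos_db


def coverage_radius(pl_max_db, theta_deg, freq_hz, a, b, eta_los_db, eta_nlos_db,
                    lo=1e-3, hi=1e6, rel_tol=1e-9):
    """Bisection on r: at a fixed angle the boundary loss increases with r."""
    def excess(r):
        return boundary_loss_db(r, theta_deg, freq_hz, a, b,
                                eta_los_db, eta_nlos_db) - pl_max_db

    if excess(lo) > 0.0 or excess(hi) < 0.0:
        raise ValueError("pl_max_db is not bracketed by [lo, hi]")
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            hi = mid  # loss too high: the boundary lies closer
        else:
            lo = mid
    return 0.5 * (lo + hi)


theta = 45.0  # assumed optimal elevation angle (degrees)
r = coverage_radius(110.0, theta, 2e9, 9.61, 0.16, 1.0, 20.0)
h = r * math.tan(math.radians(theta))  # corresponding hovering altitude
```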

In this paper, the UAV path is updated in an online manner. At , the path node for the next time slot is determined based on the observation set . Suppose the position of at is ; its position in the path at is


where is the path update part for time slot , and is the candidate policy set. Therefore, the path for is


The entire multi-UAV path set is denoted as .
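Because each path node is the previous node plus the selected update, a path is simply a running sum of increments. A minimal sketch with a hypothetical helper build_path, treating positions as 2-D ground projections (the altitude being fixed at its optimized value):

```python
from typing import List, Tuple

Point = Tuple[float, float]


def build_path(start: Point, increments: List[Point]) -> List[Point]:
    """Path of one UAV-BS: each node is the previous node plus the chosen increment."""
    path = [start]
    for dx, dy in increments:
        x, y = path[-1]
        path.append((x + dx, y + dy))
    return path


# A zero increment means the UAV-BS hovers for that path-update slot.
print(build_path((0.0, 0.0), [(1.0, 0.0), (0.0, 2.0), (0.0, 0.0)]))
```

The multi-UAV path set is then one such list per UAV-BS.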

II-C Data Generation

The distributed sensors generate data containing local information. The data are temporarily stored in a buffer denoted as . It is assumed that sensor generates bits of data during time slot , where . Parameter is an i.i.d. random variable. It is supposed that follows a Poisson distribution with . In practical systems, is typically constrained by hardware limitations. Therefore, is assumed to be bounded by , where is the largest value of . Note that is an empirical parameter which may vary among different places.
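A hedged sketch of this data generation model follows. It assumes a sensor's per-slot output is a Poisson-distributed number of fixed-size data units, clipped at a hardware bound. Clipping is only one way to impose the bound, and the names mean_units, max_units and unit_bits are hypothetical.

```python
import numpy as np


def generate_sensor_bits(rng, mean_units, max_units, unit_bits, num_sensors):
    """Hypothetical model: Poisson number of data units per sensor, clipped at a hardware bound."""
    counts = rng.poisson(lam=mean_units, size=num_sensors)
    counts = np.minimum(counts, max_units)  # hardware limit on data generated per slot
    return counts * unit_bits


rng = np.random.default_rng(0)
bits = generate_sensor_bits(rng, mean_units=3.0, max_units=5, unit_bits=1000, num_sensors=10000)
```

Because of the clipping, the empirical mean sits slightly below mean_units * unit_bits.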

II-D Edge Computing

It is assumed that data collection and the corresponding network scheduling policy are updated in discrete time slots with interval [19, 16]. We suppose collects bits of data in time slot . The collected data are temporarily stored in the edge data buffer.

Initial steps of data processing are executed at the edge, where a large amount of redundant data is filtered out. It is supposed that the information extracted at the edge takes only part of the bandwidth between the edge and the cloud for transmission. This relieves the heavy burden on network communication. However, the limited edge processing capability brings new challenges. In this case, the remaining bandwidth can be allocated to edge nodes for data offloading, which balances the burden between edge processing and network communication.

II-D1 Data Caching

In time slot , the data processing capability on is , while the edge data offloading capability is . The queue length at the beginning of time slot on is , which evolves as follows.


where is set to be zero.
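The buffer dynamics can be sketched in the form commonly used in Lyapunov-based MEC work, which we assume here. In each slot, processing and offloading first drain the backlog (never below zero), and then the newly collected bits arrive. The helper names are hypothetical.

```python
def next_queue_length(q, processed_bits, offloaded_bits, arrived_bits):
    """Assumed dynamics: serve up to processed + offloaded bits, then add new arrivals."""
    return max(q - processed_bits - offloaded_bits, 0.0) + arrived_bits


def simulate_queue(arrivals, processed, offloaded):
    """Queue lengths at the start of each slot, beginning from an empty buffer."""
    q = 0.0  # the initial queue length is set to zero
    history = [q]
    for a, d_p, d_o in zip(arrivals, processed, offloaded):
        q = next_queue_length(q, d_p, d_o, a)
        history.append(q)
    return history


print(simulate_queue([5, 5, 0], [2, 2, 2], [1, 1, 1]))
```

Under this form, bits collected in a slot can only be processed or offloaded from the next slot onward.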

II-D2 Edge Processing

It is assumed that the edge server on needs CPU cycles to process one bit of data, which depends on the applied algorithm [6]. The CPU cycle frequency of in time slot is denoted as , where . Then is


where is the time slot length. The power consumption of edge data processing [20] by is


where is the effective switched capacitance [20] of , which is determined by processor chip structure.
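Two common assumptions are used here. First, the processed bits equal the available CPU cycles divided by the cycles needed per bit. Second, the dynamic CPU power scales as the effective switched capacitance times the cube of the frequency. Under these assumptions, the energy spent per processed bit grows with the square of the frequency, which is why lowering the processor frequency saves energy when the data rate is low. A minimal sketch with hypothetical function names:

```python
def processed_bits(freq_hz, slot_s, cycles_per_bit):
    """Bits processed in one slot: available CPU cycles / cycles needed per bit."""
    return freq_hz * slot_s / cycles_per_bit


def processing_power(freq_hz, kappa):
    """Dynamic CPU power under the common kappa * f^3 model (W)."""
    return kappa * freq_hz ** 3


def energy_per_bit(freq_hz, cycles_per_bit, kappa):
    """Energy per processed bit: (kappa f^3 * tau) / (f tau / C) = kappa * C * f^2."""
    return kappa * cycles_per_bit * freq_hz ** 2
```

For example, halving the frequency halves the throughput but cuts the energy per bit to a quarter.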

II-D3 Data Offloading

It is assumed that the wireless channels between UAV-BSs and the center cloud are i.i.d. frequency-flat block fading [15]. Thus the channel power gain between and the center cloud is supposed to be , where represents the small-scale fading part of the channel power gain, is the path loss constant, is the path loss exponent, is the reference distance and is the distance between and the center cloud. Assuming the system works in FDMA mode, the data transmission capacity from to the center cloud is


where is the proportion of the bandwidth allocated to , is the transmission power with , is the entire bandwidth for data offloading and is the power spectral density of noise.
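Below is a minimal sketch of the channel gain and the FDMA offloading rate. It assumes the standard Shannon form, in which the noise power scales with the allocated bandwidth (alpha times the total bandwidth times the noise spectral density). Function and parameter names are hypothetical. The bits offloaded in a slot would then be the rate times the slot length.

```python
import math


def channel_gain(small_scale, g0, d0, d, exponent):
    """Channel power gain: small-scale fading times distance-based path loss."""
    return small_scale * g0 * (d0 / d) ** exponent


def offload_rate(alpha, power_w, gain, bandwidth_hz, n0_w_per_hz):
    """FDMA Shannon rate (bit/s) on the fraction alpha of the total bandwidth."""
    if alpha <= 0.0:
        return 0.0  # no bandwidth allocated, no offloading
    b = alpha * bandwidth_hz
    return b * math.log2(1.0 + power_w * gain / (n0_w_per_hz * b))
```

Because the rate grows with alpha for a fixed transmit power, the bandwidth split among UAV-BSs directly trades one node's offloading capacity against another's.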

III UAV Path Planning

Moving UAV-BSs provide flexible and wide service coverage, which is especially effective for surveillance tasks. However, all these advantages rely on smart path planning. In [16], the authors proposed an offline path planning algorithm based on convex optimization. However, it only targets a single UAV. In a multi-UAV system, there exist correlations among UAV-BSs, and each UAV-BS may only obtain local observations. Furthermore, many unexpected environmental factors may pose great challenges to offline path planning. Therefore, it is essential to adaptively plan UAV paths in an online manner.

In the last decade, deep reinforcement learning has obtained impressive results in online policy determination. Different from conventional reinforcement learning, deep reinforcement learning trains a deep neural network to predict the reward of each candidate action. Typically, the neural network is utilized to fit complex unknown functions in learning tasks [21]. Besides, it can handle more complex input features. In [22], the authors adopted deep reinforcement learning to train a CNN to play computer games with an online policy. In this paper, we adopt a similar approach to train an adaptive path planning network. For at time , its input is observation . In this section, we discuss the problem formulation and its solution based on deep reinforcement learning.

III-A Problem Formulation

The UAV path is planned in terms of time slot . Our objective is to optimize to enhance UAV coverage. In time slot , is supposed to plan based on the local observation . The policy is determined in a distributed manner without global information. However, local is not sufficient to depict the entire coverage. In this case, we need to find an alternative optimization objective to represent the entire UAV coverage. Typically, an ideal coverage will sufficiently utilize the data processing capability of . That is, if UAV-BSs cooperate to increase the amount of collected data, they will achieve a relatively good coverage. Therefore, the path planning problem is formulated as follows.

We suppose collects bits of data in time slot . It is straightforward to see that is determined by the state set and the UAV path set within time slot . The connection is represented by


where is a time-varying function determined by environmental elements. The environmental state is supposed to be characterized by a Markov process. The state update is determined by the current state and the path set , which is represented by


Then the problem is formulated as follows.

s.t. (13a)

where constraint (13a) represents the path update policy, constraint (13b) represents the internal state update, which is determined by the specific environment, and constraint (13c) represents the system reward given and .

The direct optimization of faces great challenges. In a multi-agent system, there exist correlations among agents. The models in (13b) and (13c) are determined by complex environmental elements, including correlations among UAV-BSs. Therefore, it is very hard to model and explicitly. Furthermore, the internal environmental state is also beyond our reach; instead, we can only plan the path from the local observation . In this case, training an alternative function to approximate the complex environmental models may provide an achievable solution. This is the idea behind reinforcement learning.

III-B Reinforcement Learning Algorithm

The optimal policy is selected by the rewards of each candidate action. In reinforcement learning, the Q-function represents the reward of action under state . Faced with complex environmental elements, it is very hard to model the Q-function explicitly. In this case, reinforcement learning is applied to learn by iteratively interacting with the surrounding environment. Through trials and feedback, the agents obtain training samples in the form of . With these dynamically updated training samples, the trained will be a good approximation of the environmental Q-function. Reinforcement learning enables agents to learn an adaptive policy maker, and it is widely applied in dynamic control and optimization. In the path planning problem, UAV-BSs only obtain observations of the internal state . To explore internal features in the obtained observations, the deep Q-learning algorithm is applied.

In deep Q-learning, a deep neural network is applied to approximate the Q-function, where represents the network weights and is the observation data. Taking as input, the Q-network outputs the predicted rewards of each candidate action. Through continuous interaction with the surrounding environment, is adaptively adjusted to fit the unknown environmental model. In [22], a CNN is trained to adaptively play computer games with screen pictures as input. For such complex tasks, the observations can be matrices or sequences. In this case, the CNN can exploit local correlations of elements in through convolutional filters, which enables the extraction of high-dimensional features. In many practical applications, the algorithm works robustly with high-level performance. The training process is summarized in Algorithm 1.

0:  Initialize the replay memory ; Initialize the deep Q-network weights ; Initialize the reference network weights by ; Initialize , , and .

  for each epoch do

     Randomly initialize UAV positions.
     while  do
        for each  do
            Collect the surrounding service requirements and generate observations .
           Randomly generate .
           Choose action by:
           if  then
              randomly select an action
           end if
           Move along the planned path by executing .
           Collect data from covered sensors.
           Obtain the reward and observations .
            Transmit to the central replay memory.
        end for
         Randomly choose a batch of interaction experiences from the replay memory .
        Determine update rate by (15) and calculate the target value by
         Train the CNN by the loss function

        Update the reference network weights by every steps.
        Update .
     end while
  end for
Algorithm 1 Deep Q-learning process for UAV path planning
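Algorithm 1 chooses an action by comparing a random draw with an exploration threshold. Assuming this is the usual epsilon-greedy rule, it can be sketched as follows. The geometric decay schedule and its constants are hypothetical.

```python
import numpy as np


def select_action(q_values, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else take the best predicted action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # random exploratory action
    return int(np.argmax(q_values))


def decay_epsilon(epsilon, rate=0.995, floor=0.05):
    """Hypothetical schedule: shrink exploration geometrically but keep a floor."""
    return max(epsilon * rate, floor)
```

Each UAV-BS would call select_action on the Q-network output for its own local observation, so the shared weights can still yield different actions.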

In the training process, the training sample generated by at is denoted as , where represents the observations by at , is its action, is the feedback reward and is the new observation. In this paper, a central training mode is applied: training samples of the distributed UAV-BSs are gathered at the center for network training. The UAV-BSs share the centrally trained network weights, but based on different local observations, they can choose different actions. The collected training samples are stored in the replay memory , where is the buffer length. Each time, the algorithm randomly samples a batch from for training. Compared with training on consecutive samples, this method enables the network to learn from more diverse past experiences rather than only from concurrent, correlated ones.
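Here is a sketch of the central replay memory described above. It assumes a fixed capacity with first-in-first-out eviction and uniform sampling. The class and field names are hypothetical.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", ["obs", "action", "reward", "next_obs"])


class ReplayMemory:
    """Fixed-size central buffer; the oldest transitions are evicted first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, obs, action, reward, next_obs):
        """Store one interaction experience reported by a UAV-BS."""
        self.buffer.append(Transition(obs, action, reward, next_obs))

    def sample(self, batch_size, rng=random):
        """Uniform sampling without replacement breaks up consecutive, correlated samples."""
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

All UAV-BSs push into the same memory, so a sampled batch mixes experiences from different agents and time steps.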

The MSE-based loss function for is defined as follows.


where and is the reference network weight. Parameter is the decay coefficient of future rewards, while is the update rate. Note that the loss for the other actions in the policy set is set to .
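Below is a hedged sketch of the target and loss computation in Algorithm 1. It assumes the target for the taken action moves the current estimate toward the bootstrapped value (reward plus the discounted best reference-network prediction) by the update rate. The targets for all other actions equal the current predictions, so their loss is zero. With an update rate of 1, this reduces to the standard DQN target. Arrays stand in for network outputs, and the names are hypothetical.

```python
import numpy as np


def td_targets(q_pred, q_ref_next, actions, rewards, gamma, alpha):
    """Targets equal predictions except at the taken actions, so other actions add zero loss."""
    targets = q_pred.copy()
    rows = np.arange(len(actions))
    bootstrap = rewards + gamma * q_ref_next.max(axis=1)  # reference net evaluates next obs
    taken = q_pred[rows, actions]
    targets[rows, actions] = taken + alpha * (bootstrap - taken)
    return targets


def mse_loss(q_pred, targets):
    """Mean squared error over all action entries."""
    return float(np.mean((targets - q_pred) ** 2))
```

The alpha argument may be a scalar or a per-sample array, so a frequency-dependent update rate can be applied sample by sample.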

To ensure convergence, is typically set as . Note that more frequent actions would be trained more intensively, which breaks the balance among the candidate actions. Therefore, the sample proportion of each candidate action is maintained, denoted as , where is the action index. Suppose the sampled action index is and is upper-bounded by ; then is determined by


where is the maximum value of . Note that an action with a larger will have a smaller update rate.
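The following rule is a hypothetical stand-in for (15). It only illustrates the stated property that more frequent actions receive smaller update rates. Every action at or below the uniform share 1/K gets the maximum rate, and over-represented actions are scaled down in inverse proportion to their frequency.

```python
import numpy as np


def action_frequencies(actions, num_actions):
    """Proportion of each candidate action among the stored training samples."""
    counts = np.bincount(actions, minlength=num_actions).astype(float)
    return counts / counts.sum()


def update_rates(freqs, alpha_max):
    """Hypothetical rule: actions above the uniform share 1/K get a proportionally smaller rate."""
    uniform = 1.0 / len(freqs)
    safe = np.maximum(freqs, 1e-12)  # avoid division by zero for unseen actions
    return alpha_max * np.minimum(1.0, uniform / safe)
```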

III-C Interaction with Environment

Fig. 2: The interaction mode between deep Q-learning algorithm and environment.

The environmental model and the internal state are unknown. In the previous subsection, we proposed a deep Q-learning algorithm to adaptively learn the environmental elements. Before its implementation, the specific mode of interaction with the environment is discussed in this subsection.

A model of the internal environment and its interaction with the deep Q-learning algorithm is shown in Fig. 2. Based on state and action , the internal environment generates a reward by model . In this case, an optimal policy is generated by maximizing the outcome rewards. Then the environmental state is updated by its internal model . To approximate this environmental model for policy learning, a deep Q-network is implemented to interact with the environment. The observations , which carry the essential information about within , are taken as input by the Q-network. By directly receiving the outcome from the environment, the Q-network is trained to adaptively estimate . Based on its estimation, we derive a nearly optimal policy. In this paper, model-free reinforcement learning is applied. Therefore, the Q-network only needs to receive observations and estimate , without considering the internal state update model . The key elements of the interaction are observations, rewards and the action policy.

III-C1 Observations

The observations of distributed sensors should contain information about the surrounding service requirements, so that the planned path can ensure better coverage. Sensors that have long been uncovered should have more urgent service requirements. Besides, sensors with larger data rates also require more coverage. Furthermore, it is also important to avoid overlap among the coverage areas of different UAV-BSs. Therefore, the observations by UAV-BSs should include the above essential elements for a proper path.

It is straightforward to see that the local observations should be a two-dimensional data set. Suppose that at time , the local observations cover a region around . The observation data are set as a matrix . The position of is . Then the position in the map corresponding to is . represents observations of the sensors around . In this way, the local region is represented in a discrete manner. Parameter is determined by the input data size of the Q-network , and is set according to the observation range of UAV-BSs. is called the observation sight, which describes the observation width.

We suppose sensor maintains its service requirement , which reflects its data freshness and accumulation. The process is summarized in Algorithm 2. represents the data freshness of , and the local data rate represents the data accumulation rate. They are combined by . The initial sensor buffer is supposed to be . They are updated in terms of time slot . If uncovered, the data freshness decays by (16). If covered by UAV-BSs, it is assumed that transmits at most bits of data in time slot . In this case, is updated by (17) and the data freshness is renewed by (18).
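The bookkeeping of Algorithm 2 can be sketched as below. The specific update forms are hypothetical stand-ins for (16) to (18). Freshness decays geometrically while a sensor is uncovered and is reset to 1 when it is served. The service requirement is taken as the data rate times the staleness (1 minus freshness), so that it grows both for long-uncovered sensors and for sensors with larger data rates.

```python
from dataclasses import dataclass


@dataclass
class SensorState:
    rate: float             # mean data generation rate (bits per slot)
    freshness: float = 1.0  # 1.0 = just served; decays while uncovered
    buffer: float = 0.0     # bits waiting on the sensor

    def step(self, generated_bits, covered, max_tx_bits, decay=0.9):
        """One path-update slot of the hypothetical freshness bookkeeping; returns bits sent."""
        if covered:
            sent = min(self.buffer + generated_bits, max_tx_bits)
            self.buffer = self.buffer + generated_bits - sent
            self.freshness = 1.0  # renewed by a UAV-BS visit
            return sent
        self.buffer += generated_bits
        self.freshness *= decay  # data grows stale while uncovered
        return 0.0

    @property
    def requirement(self):
        """Service requirement grows with staleness and with the data rate."""
        return self.rate * (1.0 - self.freshness)
```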

It is assumed that can obtain from the sensors in the region around it. The processing of the corresponding observations is summarized in Algorithm 3. Matrix is initialized as a zero matrix. from the sensors around is added to . In this way, reflects the local data freshness and accumulation. For covered by other UAV-BSs, is adjusted by (19). In this case, the observations capture the coverage overlap among UAV-BSs. Note that outside the region leads to . The processed is taken as input of the CNN Q-network for reward estimation.
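Here is a minimal sketch of the observation processing in Algorithm 3. It makes the following hypothetical choices:

- an N x N grid centred on the UAV-BS, covering a square of half-width sight;
- per-cell sums of sensor service requirements;
- a constant discount standing in for the adjustment (19), applied to sensors already covered by other UAV-BSs;
- sensors outside the window are ignored.

```python
import numpy as np


def build_observation(uav_xy, sensors_xy, requirements, covered_by_other,
                      grid_size=11, sight=500.0, overlap_discount=0.5):
    """Hypothetical local map: sum sensor requirements into an N x N grid centred on the UAV."""
    obs = np.zeros((grid_size, grid_size))
    cell = 2.0 * sight / grid_size  # side length of one grid cell (m)
    rel = np.asarray(sensors_xy, dtype=float) - np.asarray(uav_xy, dtype=float)
    cols = np.floor((rel[:, 0] + sight) / cell).astype(int)
    rows = np.floor((rel[:, 1] + sight) / cell).astype(int)
    inside = (cols >= 0) & (cols < grid_size) & (rows >= 0) & (rows < grid_size)
    weights = (np.where(covered_by_other, overlap_discount, 1.0)
               * np.asarray(requirements, dtype=float))
    # np.add.at accumulates correctly when several sensors fall in the same cell.
    np.add.at(obs, (rows[inside], cols[inside]), weights[inside])
    return obs
```

The resulting matrix plays the role of the CNN input for one UAV-BS.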

0:  Initialize and as .
  while  do
     if  is beyond coverage then
     end if
  end while
Algorithm 2 Sensor data freshness maintaining process
0:  Initialize by zero matrix; Obtain position ; Observe of distributed sensors in its surrounding region.
  Obtain the position corresponding to as
  Find the surrounding sensors .
  Update corresponding to the above by
  if  is covered by other nearby UAV-BSs then
  end if
Algorithm 3 UAV-BSs observation processing on at

III-C2 Action Policy

The path for is defined by (6), and the corresponding action policy for online path planning is defined in (5). In this paper, we define a set with a finite number of candidate policies. It is assumed that is a constant; that is, the UAV speed is supposed to remain stable and the length of each path update does not change. Then the policy set with discrete directions is defined as follows.


where is the length of a path step and is the discrete path angle. The zero element means hovering at the current position.
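Assuming the discrete path angles are equally spaced, the finite action set (hovering plus K moves of constant step length) can be generated as follows. The function name is hypothetical.

```python
import math


def action_set(step_len, num_directions):
    """Hover plus `num_directions` equally spaced moves of length `step_len`."""
    moves = [(0.0, 0.0)]  # index 0: hover at the current position
    for k in range(num_directions):
        angle = 2.0 * math.pi * k / num_directions
        moves.append((step_len * math.cos(angle), step_len * math.sin(angle)))
    return moves
```

With eight directions, for instance, the Q-network would output nine values, one per entry of this list.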

III-C3 Reward Function

The objective of is to maximize the overall data collection, so that the edge capability is sufficiently utilized. For distributed online decisions, the reward must be accessible at the edge UAV-BSs. Therefore, the reward is defined as the data bits collected in time slot . Note that the interaction experiences are transmitted to the center for network training. Furthermore, the observations also involve other nearby UAV-BSs. Therefore, in the process of interaction and learning, the UAV-BSs tend to cooperate with each other to ensure a relatively good coverage.

IV System Data Management

After receiving data from nearby sensors, the UAV-BSs process the collected raw data and transmit the edge processing results to the center cloud. It is assumed that transmitting the processing results takes very few communication resources. Therefore, most of the communication bandwidth between UAV-BSs and the center cloud can be utilized for transmitting part of the unprocessed data. In this way, the edge system can enhance its data throughput while reducing UAV onboard energy cost. In this section, we formulate the data offloading problem as a Lyapunov optimization problem. As the cloud is supposed to be powerful enough, we consider the edge energy cost and data processing delay as the system cost.

IV-A Problem formulation

The data offloading policy focuses on stabilizing delay while reducing the power consumed by edge processing and data transmission. The policy is updated in each system time slot. Each UAV-BS is assumed to hover at a constant speed, so the power consumption of the onboard dynamical (propulsion) system is excluded. In each time slot, a UAV-BS consumes power for local computation and for data transmission. We denote the total power consumption of a UAV-BS in a time slot as


The long-term average of the weighted sum power consumption is then


where each UAV-BS has a positive weight that can be adjusted to balance power management across all UAV-BSs. This quantity, the long-term edge power consumption, serves as a system performance metric, and the data offloading policy with respect to it can be derived by statistical optimization.
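As a sketch, the long-term weighted power can be approximated over a finite horizon of T slots. This assumes that the per-slot power of a UAV-BS is the sum of its computation and transmission power and that the per-UAV weights are given; the function name and the `p_comp[t][i]` array layout are hypothetical.

```python
from typing import Sequence


def avg_weighted_power(p_comp: Sequence[Sequence[float]],
                       p_tx: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> float:
    """Finite-horizon estimate of (1/T) * sum_t sum_i w_i * (P_comp_i(t) + P_tx_i(t)).

    p_comp[t][i] and p_tx[t][i] are the computation and transmission power
    of UAV-BS i in slot t; weights[i] is its positive weight.
    """
    T = len(p_comp)
    if T == 0 or len(p_tx) != T:
        raise ValueError("need equally long, non-empty power traces")
    total = 0.0
    for pc_t, pt_t in zip(p_comp, p_tx):
        total += sum(w * (pc + pt) for w, pc, pt in zip(weights, pc_t, pt_t))
    return total / T


# Two slots, two UAV-BSs, weights 1 and 2.
print(avg_weighted_power([[1.0, 2.0], [3.0, 4.0]],
                         [[0.5, 0.5], [0.5, 0.5]], [1.0, 2.0]))  # 9.5
```

Raising the weight of one UAV-BS makes its energy more expensive in the objective, which is how the weights balance power management across UAV-BSs.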

The data collected by each UAV-BS are temporarily stored in its onboard data buffer until they are processed. The data queuing delay therefore measures the service quality of the edge system. By Little's Law [23], the average queuing delay of a queue is proportional to its average queue length for a given average arrival rate. Therefore, the average amount of data in the onboard buffer is used as the service-quality metric. The long-term queue length of the edge system is defined as


The network policy of the UAV-BSs in each time slot consists of three components: the processor frequency for edge data processing, the transmission power for data offloading, and the proportion of bandwidth allocated to each UAV-BS. The optimization of the edge data processing policy can then be formulated as the following problem.

s.t. (24a)

Constraint (24a) is the bandwidth allocation constraint, where is a system constant. Constraint (24b) bounds the processor frequency and the transmission power. To limit delay, constraint (24c) requires the edge data buffers to be stable, which guarantees that the collected data are processed in finite time. In all constraints, the index ranges over the set of UAV-BSs and the time slot ranges over the set of all time slots.
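The buffer dynamics behind the stability constraint (24c), and the Little's Law link between backlog and delay, can be sketched for a single UAV-BS. The update rule Q(t+1) = max(Q(t) − b(t), 0) + A(t), with A(t) the newly collected bits and b(t) the edge-processed plus offloaded bits, is an assumed form of the queue model, and `simulate_buffer` is a hypothetical helper; the delay value is only a discrete-time estimate.

```python
from typing import List, Sequence, Tuple


def simulate_buffer(arrivals: Sequence[float], served: Sequence[float],
                    q0: float = 0.0) -> Tuple[List[float], float, float]:
    """Simulate one onboard data buffer over T slots.

    Assumed update: Q(t+1) = max(Q(t) - b(t), 0) + A(t).
    Returns the backlog trace Q(0..T), the time-averaged backlog over
    slots 0..T-1, and a Little's-law delay estimate in slots
    (average backlog / average arrival rate).
    """
    if len(arrivals) != len(served) or not arrivals:
        raise ValueError("arrivals and served must be non-empty and equally long")
    q = [q0]
    for a, b in zip(arrivals, served):
        q.append(max(q[-1] - b, 0.0) + a)
    T = len(arrivals)
    avg_backlog = sum(q[:T]) / T
    avg_arrival = sum(arrivals) / T
    delay = avg_backlog / avg_arrival if avg_arrival > 0 else 0.0
    return q, avg_backlog, delay


trace, avg_q, delay = simulate_buffer([4.0, 4.0, 4.0, 4.0], [3.0, 5.0, 4.0, 4.0])
print(trace, avg_q, delay)  # [0.0, 4.0, 4.0, 4.0, 4.0] 3.0 0.75
```

If the served bits persistently fall below the arrivals, the trace grows without bound, which is exactly the instability that (24c) rules out.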

This is a statistical optimization problem with randomly arriving data, so the policy has to be determined dynamically in each time slot. Furthermore, the spatial coupling of bandwidth allocation among the UAV-BSs makes the problem difficult to solve. Instead of solving it directly, we propose an online joint resource management algorithm based on Lyapunov optimization.

IV-B Online optimization framework

The proposed problem is a challenging statistical optimization problem. Using Lyapunov optimization [24], it can be transformed into a deterministic problem in each time slot, which can be solved with low complexity. The resulting online algorithm copes with the dynamic, random environment while achieving near-optimal long-term performance. Based on the Lyapunov optimization framework, the algorithm aims to save energy while stabilizing the edge data buffers.

The Lyapunov function in each time slot is defined as


This quadratic function is a scalar measure of the data accumulated in the queues. The corresponding Lyapunov drift is defined as follows.


To stabilize the network queuing buffers while minimizing the average energy penalty, the policy in each time slot is chosen by minimizing an upper bound on the following drift-plus-penalty function.


where the penalty weight is a positive system parameter that represents the tradeoff between the Lyapunov drift and the energy cost. The function above is an expectation over random processes whose probability distributions are unknown. Therefore, we minimize an upper bound on it instead, which does not require the specific distributions. Lemma 1 gives a deterministic upper bound on the Lyapunov drift in each time slot.
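To make the quantities concrete, the sketch below evaluates the quadratic Lyapunov function and one sampled realization of the drift-plus-penalty term. The factor 1/2 in the Lyapunov function is an assumption, `v` plays the role of the tradeoff parameter, and both function names are hypothetical; the paper's quantity is a conditional expectation, which this single sample does not compute.

```python
from typing import Sequence


def lyapunov(q: Sequence[float]) -> float:
    """Quadratic Lyapunov function L(Q) = 0.5 * sum_i Q_i**2 (1/2 factor assumed)."""
    return 0.5 * sum(x * x for x in q)


def drift_plus_penalty_sample(q_now: Sequence[float], q_next: Sequence[float],
                              weighted_power: float, v: float) -> float:
    """One realization of L(Q(t+1)) - L(Q(t)) + V * weighted_power.

    Averaging this over many sampled transitions from the same Q(t) would
    estimate the conditional expectation used in the paper.
    """
    return lyapunov(q_next) - lyapunov(q_now) + v * weighted_power


print(drift_plus_penalty_sample([2.0, 4.0], [3.0, 3.0], weighted_power=1.5, v=10.0))  # 14.0
```

A larger `v` weights energy more heavily relative to queue growth, which is the tradeoff the parameter controls.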

Lemma 1.

For any policy satisfying (24a), (24b), and (24c), the Lyapunov drift is upper bounded by


where the constant term is known and independent of the system policy, and the remaining terms involve the current data buffer length, the number of data bits processed at the edge, and the number of offloaded data bits, all in the current time slot.


Proof. From the queue update equation in Section II-D1, we have


Subtracting the corresponding current-slot term from both sides of (29) and summing the inequalities over all UAV-BSs yields the following.


As stated in Section II, the sensor data rate is bounded. Furthermore, the channel capacity between the sensors and the UAV-BSs is also limited. Therefore, the number of data bits arriving at each UAV-BS in a time slot is upper bounded. Since the computation and communication resources are also limited, the numbers of edge-processed and offloaded bits are bounded by the corresponding maximum processing and transmission rates. In particular, the maximum processor frequency bounds the edge-processed bits, and the bounds on the bandwidth proportion and the transmission power bound the offloaded bits. For simplicity, we denote these two maximum values separately. The remaining quadratic term is then bounded by a constant. Therefore, we have


where the constant is formed from the above bounds. For any given time slot, this constant is clearly deterministic. This completes the proof. ∎
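As a numerical sanity check of this style of argument (not the paper's own derivation), the sketch assumes the buffer update Q(t+1) = max(Q(t) − b(t), 0) + A(t), where b(t) is the sum of edge-processed and offloaded bits and A(t) the newly collected bits, and takes L = ½ Σ Q_i². It then checks on random samples that the realized drift never exceeds C + Σ Q_i(A_i − b_i) with C = ½ Σ (A_max² + b_max²). All function names and bound values are hypothetical.

```python
import random
from typing import Sequence


def next_backlog(q: float, a: float, b: float) -> float:
    # Assumed buffer update: Q(t+1) = max(Q(t) - b(t), 0) + A(t)
    return max(q - b, 0.0) + a


def actual_drift(q: Sequence[float], a: Sequence[float], b: Sequence[float]) -> float:
    # Realized one-slot change of L(Q) = 0.5 * sum_i Q_i**2
    return 0.5 * sum(next_backlog(qi, ai, bi) ** 2 - qi ** 2
                     for qi, ai, bi in zip(q, a, b))


def drift_bound(q, a, b, a_max, b_max) -> float:
    # C + sum_i Q_i (A_i - b_i), with C = 0.5 * sum_i (A_max_i**2 + b_max_i**2)
    c = 0.5 * sum(am ** 2 + bm ** 2 for am, bm in zip(a_max, b_max))
    return c + sum(qi * (ai - bi) for qi, ai, bi in zip(q, a, b))


def random_check(trials: int = 10_000, n: int = 3, a_max: float = 50.0,
                 b_max: float = 50.0, seed: int = 0) -> bool:
    """Return True if the bound held on every random sample."""
    rng = random.Random(seed)
    for _ in range(trials):
        q = [rng.uniform(0.0, 200.0) for _ in range(n)]
        a = [rng.uniform(0.0, a_max) for _ in range(n)]
        b = [rng.uniform(0.0, b_max) for _ in range(n)]
        lhs = actual_drift(q, a, b)
        rhs = drift_bound(q, a, b, [a_max] * n, [b_max] * n)
        if lhs > rhs + 1e-9 * max(1.0, abs(lhs), abs(rhs)):
            return False
    return True


print(random_check())  # True
```

The check works because, per queue, (max(Q − b, 0) + A)² − Q² ≤ A² + b² + 2Q(A − b) whenever Q, A, b ≥ 0, and A, b are capped by their maxima.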

Combining Lemma 1 with (27) and (28), the drift-plus-penalty function is upper bounded by


By minimizing the above upper bound in each time slot, the data queue lengths are kept at a low level while the power consumption penalty is also minimized. In this way, a policy with near-optimal long-term performance can be derived without knowledge of the underlying probability distributions. Since the constant in Lemma 1 is not affected by the system policy, it can be omitted from the per-slot policy determination problem.
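The per-slot rule can be illustrated for a single UAV-BS with a small list of candidate actions. Assuming that arrivals are not affected by the data policy, the only policy-dependent terms of the bound are V·w·(power) and −Q·(served bits), so the sketch picks the candidate that minimizes their sum. The candidate list and the name `per_slot_decision` are hypothetical, and the sketch ignores the bandwidth coupling among UAV-BSs that the actual problem has.

```python
from typing import Sequence, Tuple

# (action label, power in W, bits served in the slot): hypothetical candidates
Candidate = Tuple[str, float, float]


def per_slot_decision(candidates: Sequence[Candidate], backlog: float,
                      v: float, weight: float) -> str:
    """Return the action minimizing V * weight * power - backlog * served_bits."""
    if not candidates:
        raise ValueError("need at least one candidate action")
    best = min(candidates, key=lambda c: v * weight * c[1] - backlog * c[2])
    return best[0]


options = [("idle", 0.0, 0.0), ("slow", 1.0, 1e5), ("fast", 8.0, 2e5)]
for q in (0.5, 3.0, 10.0):
    print(q, per_slot_decision(options, backlog=q, v=1e5, weight=1.0))
# prints: 0.5 idle, 3.0 slow, 10.0 fast (one per line)
```

As the backlog grows, the −Q·(served bits) term dominates and more aggressive (more power-hungry) actions become optimal, which is how the rule stabilizes the queue.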

The modified per-slot problem under the Lyapunov optimization framework is then defined as follows.

s.t. (33a)

IV-C Solution for

In the last subsection, we formulated the per-slot problem for deriving the optimal policy in each time slot. The optimization variables are the local processor frequency, the data transmission power, and the bandwidth allocation. In this subsection, we divide the problem into two subproblems and derive the optimal policy.

IV-C1 Optimal frequency for edge processor

We first remove the terms of the objective function that are independent of the processor frequency. The subproblem with respect to the processor frequency is then defined as follows.

s.t. (34a)

This subproblem is a convex optimization problem. Furthermore, the processor frequencies of different UAV-BSs are not coupled, so the optimal frequency can be derived separately for each UAV-BS. Setting the derivative of the objective to zero yields a stationary point; because the frequency is bounded, the optimum may instead lie on the boundary of the feasible range. The final solution is then given by

Remark 1.

The optimal processor frequency is a monotonically increasing function of the data queue length. Intuitively, an edge server processes faster when more data accumulate in its buffer. Conversely, as the tradeoff parameter or the power weight of the UAV-BS increases, edge computation energy accounts for a larger proportion of the objective, which decreases the processor frequency. As the energy coefficient increases, the energy consumed per unit of frequency grows, which also decreases the optimal frequency. Furthermore, a larger corresponds to a lower edge processing frequency. In these cases, the edge server should lower its processor frequency and offload more data to the cloud.
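The qualitative behavior in Remark 1 can be reproduced under a commonly used, here assumed, computation model: processing power κf³ and τf/c bits processed per slot, where c is the number of CPU cycles per bit. Under that assumption the per-slot objective is V·w·κ·f³·τ − Q·τ·f/c, the slot length τ cancels, and the optimum is the stationary point clipped at the maximum frequency. The function name and all parameter values below are hypothetical, not the paper's.

```python
import math


def optimal_frequency(backlog_bits: float, v: float, weight: float,
                      kappa: float, cycles_per_bit: float, f_max: float) -> float:
    """Minimize V*w*kappa*f**3*tau - Q*tau*f/cycles_per_bit over 0 <= f <= f_max.

    Assumed model: computation power kappa*f**3 and tau*f/cycles_per_bit
    bits processed per slot (tau cancels). The objective is convex for
    f >= 0, decreasing up to f_s = sqrt(Q / (3 V w kappa c)) and increasing
    after it, so the constrained optimum is min(f_s, f_max).
    """
    if backlog_bits <= 0.0:
        return 0.0
    f_stationary = math.sqrt(backlog_bits / (3.0 * v * weight * kappa * cycles_per_bit))
    return min(f_stationary, f_max)


f = optimal_frequency(3e6, v=1e12, weight=1.0, kappa=1e-27,
                      cycles_per_bit=1000.0, f_max=2e9)
print(f)  # approximately 1e9 Hz, below f_max, so not clipped
```

In this model the frequency grows with the square root of the backlog and shrinks as V, w, κ, or c grows, matching the monotonicity described in Remark 1.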

IV-C2 Bandwidth allocation and data transmission power

Retaining only the terms involving the bandwidth allocation and the transmission power, we obtain the following subproblem.

s.t. (36a)