Motivated by the surging traffic demands spurred by online video and Internet-of-things (IoT) applications, including machine type and mission-critical communication (e.g., augmented/virtual reality (AR/VR) and drones), mobile edge computing (MEC)/fog computing are emerging technologies that distribute computations, communication, control, and storage at the network edge [2, 3, 4, 5, 6]. When computation-intensive applications are executed at mobile devices, the performance and the user’s quality of experience are significantly affected by the device’s limited computation capability. Additionally, intensive computations are energy-consuming, which severely shortens the lifetime of battery-limited devices. To address the computation and energy issues, mobile devices can wirelessly offload their tasks to proximal MEC servers. On the other hand, offloading tasks incurs additional latency, which cannot be overlooked and should be taken into account in the system design. Hence, the energy-delay tradeoff has received significant attention and has been studied in various MEC systems [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22].
I-A Related Work
Kwak et al. focused on an energy minimization problem for local computation and task offloading in a single-user MEC system. The authors further studied a multi-user system, which takes into account both the energy cost and the monetary cost of task offloading. Therein, the cost-delay tradeoff was investigated in terms of competition and cooperation among users and the offloading service provider. Additionally, another work considered the single-user system and assumed that the mobile device is endowed with a multi-core central processing unit (CPU) to compute different applications simultaneously. In order to stabilize all task queues at the mobile device and MEC server, dynamic task offloading and resource allocation policies were proposed by utilizing Lyapunov stochastic optimization in [7, 8, 9]. Assuming that the MEC server is equipped with multiple CPU cores to compute different users’ offloaded tasks in parallel, Mao et al. studied a multi-user task offloading and bandwidth allocation problem. Subject to the stability of task queues, the energy-delay tradeoff was investigated using the Lyapunov framework. Extending this problem, the authors further took into account the server’s power consumption and resource allocation in the system analysis. A wireless powered MEC network was also considered, in which multiple users, without a fixed energy supply, are wirelessly powered by a power beacon to carry out local computation and task offloading. Taking into account the causality of the harvested energy, this work aimed at maximizing energy efficiency subject to the stability of users’ task queues. Therein, the tradeoff between energy efficiency and average execution delay was analyzed by stochastic optimization. Xu et al. studied another energy harvesting MEC scenario, in which the edge servers are mainly powered by solar or wind energy, whereas the cloud server has a constant grid power. Aiming at minimizing the long-term expected cost, which incorporates the end-to-end delay and operational cost, the authors proposed a reinforcement learning-based scheme for resource provisioning at the edge servers and workload offloading to the cloud. Besides the transmission and computation delays, the work took into account the cost (in terms of delay) of handover and computation migration, due to user mobility, in an ultra-dense network. Taking into account the long-term available energy constraint, an online energy-aware base station association and handover algorithm was proposed to minimize the average end-to-end delay by incorporating Lyapunov optimization and multi-armed bandit theory. Ko et al. analyzed the average latency performance, including communication delay and computation delay, of a large-scale spatially random MEC network. Furthermore, an upper and a lower bound on the average computation delay were derived by applying stochastic geometry and queuing theory. A hybrid cloud-fog architecture was also considered. The delay-tolerable computation workloads, requested by the end users, are dispatched from the fog devices to the cloud servers, while delay-sensitive workloads are computed at the fog devices. The studied problem was cast as a network-wide power minimization subject to an average delay requirement. Focusing on the cloud-fog architecture, Lee et al. studied a scenario in which a fog node distributes the offloaded tasks to the connected fog nodes and a remote cloud server for cooperative computation. To address the uncertainty of the arrival of neighboring fog nodes, an online fog network formation algorithm was proposed such that the maximal average latency among different computation nodes is minimized. Considering a hierarchical cloudlet architecture, Fan and Ansari proposed a workload allocation (among different cloudlet tiers) and computational resource allocation approach in order to minimize the average response time of a task request.
The authors further focused on an edge computing-based IoT network in which each user equipment (UE) can run several IoT applications. Therein, the objective was to minimize the average response time subject to the delay requirements of different applications. A distributed workload balancing scheme was also proposed for fog computing-empowered IoT networks. Based on the broadcast information of fog nodes’ estimated traffic and computation loads, each IoT device locally chooses the associated fog node in order to reduce the average latency of its data flow. In addition to the task uploading and computation phases, the work also accounted for the delay in the downlink phase, where the computed tasks are fed back to the users. The objective was to minimize a cost function of the estimated average delays of the three phases. A software-defined fog network was also studied, where the data service subscribers (DSSs) purchase the fog nodes’ computation resources via the data service operators. Modeling the average latency using queuing theory in the DSS’s utility, a Stackelberg game and a many-to-many matching game were incorporated to allocate fog nodes’ resources to the DSSs.
I-B Our Contribution
While conventional communication networks were engineered to boost network capacity, little attention has been paid to reliability and latency performance. Indeed, ultra-reliable and low latency communication (URLLC) is one of the pillars for enabling 5G and is currently receiving significant attention in both academia and industry [23, 24, 25]. Regarding the existing MEC literature, the vast majority considers the average delay as a performance metric or the quality-of-service requirement [13, 14, 15, 16, 17, 18, 19, 20, 21, 22]. In other words, these system designs focus on latency through the lens of the average. In the works addressing the stochastic nature of the task arrival process [7, 8, 9, 10, 11, 12], the prime concern is how to maintain the mean rate stability of task queues, i.e., ensuring a finite average queue length as time evolves. However, merely focusing on the average-based performance is not sufficient to guarantee URLLC for mission-critical applications, which mandates a further examination in terms of bound violation probability, high-order statistics, characterization of the extreme events with very low occurrence probabilities, and so forth.
The main contribution of this work is to propose a URLLC-centric task offloading and resource allocation framework, by taking into account the statistics of extreme queue length events. We consider a multi-user MEC architecture with multiple servers having heterogeneous computation resources. Due to the UE’s limited computation capability and the additional incurred latency during task offloading, the UEs need to smartly allocate resources for local computation and decide the amount of tasks to offload via wireless transmission if the executed applications are latency-sensitive or mission-critical. Since the queue value is implicitly related to delay, we treat the former as a delay metric in this work. Motivated by the aforementioned drawbacks of average-based designs, we set a threshold for the queue length and impose a probabilistic requirement on the threshold deviation as a URLLC constraint. In order to model the event of threshold deviation, we characterize its statistics by invoking extreme value theory and impose another URLLC constraint in terms of higher-order statistics. The problem is cast as a network-wide power minimization problem for task computation and offloading, subject to statistical URLLC constraints on the threshold deviation and extreme queue length events. Furthermore, we incorporate the UEs’ mobility feature and propose a two-timescale UE-server association and task computation framework. In this regard, taking into account task queue state information, servers’ computation capabilities and workloads, co-channel interference, and URLLC constraints, we associate the UEs with the MEC servers, in a long timescale, by utilizing matching theory. Then, given the associated MEC server, task offloading and resource allocation are performed in the short timescale. To this end, we leverage Lyapunov stochastic optimization to deal with the randomness of task arrivals, wireless channels, and task queue values.
Simulation results show that, when the statistics of the extreme queue length are taken as a reliability measure, the studied partially-offloading scheme achieves more reliable task execution than the scheme without MEC servers and the fully-offloading scheme. In contrast with the received signal strength (RSS)-based baseline, our proposed UE-server association approach achieves better delay performance for heterogeneous MEC server architectures. The performance enhancement is more remarkable in denser networks.
The remainder of this paper is organized as follows. The system model is first specified in Section II. Subsequently, we formulate the latency requirements, reliability constraints, and the studied optimization problem in Section III. In Section IV, we specify in detail the proposed UE-server association mechanism as well as the latency and reliability-aware task offloading and resource allocation framework. The network performance is evaluated numerically and discussed in Section V, followed by the conclusions in Section VI. Furthermore, for the sake of readability, we list all notations in Table II shown in Appendix A. The meaning of the notations will be defined in detail in the following sections.
II System Model
The considered MEC network consists of a set of UEs and a set of MEC servers. UEs have computation capabilities to execute their own tasks locally. However, due to the limited computation capabilities to execute computation-intensive applications, UEs can wirelessly offload their tasks to the MEC servers with an additional cost of communication latency. The MEC servers are equipped with multi-core CPUs such that different UEs’ offloaded tasks can be computed in parallel. Additionally, the computation and communication timeline is slotted and indexed by in which each time slot, with the slot length , is consistent with the coherence block of the wireless channel. We further assume that UEs are randomly distributed and move continuously in the network, whereas the MEC servers are located in fixed positions. Since the UE’s geographic location keeps changing, the UE is incentivized to offload its tasks to a different server which is closer to the UE, provides a stronger computation capability, or has a lower workload than the currently associated one. In this regard, we consider a two-timescale UE-server association and task-offloading mechanism. Specifically, we group every successive time slots as a time frame, which is indexed by and denoted by . At the beginning of each time frame (i.e., the long/slow timescale), each UE is associated with an MEC server. Let represent the UE-server association indicator in the th time frame, in which indicates that UE can offload its tasks to server during time frame . Otherwise, . We also assume that each UE can only offload its tasks to one MEC server at a time. The UE-server association rule can be formulated as
Subsequently in each time slot, i.e., the short/fast timescale, within the th frame, each UE dynamically offloads part of the tasks to the associated MEC server and computes the remaining tasks locally. The network architecture and timeline of the considered MEC network are shown in Fig. 1.
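The association rule and the two-timescale structure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `eta`, `slots_per_frame`, `is_valid_association`, and `frame_index` are hypothetical, and the check assumes that every UE is associated with exactly one server per frame, which is how the text reads but is not confirmed by a surviving equation.

```python
from typing import List


def is_valid_association(eta: List[List[int]]) -> bool:
    """Check a binary UE-server association matrix for one time frame.

    eta[i][j] == 1 means UE i may offload to server j during the frame.
    Assumption: each UE (row) is associated with exactly one server.
    """
    for row in eta:
        if any(x not in (0, 1) for x in row):
            return False  # indicators must be binary
        if sum(row) != 1:
            return False  # exactly one server per UE
    return True


def frame_index(slot: int, slots_per_frame: int) -> int:
    """Map a 0-based slot index to a 1-based frame index (assumed convention)."""
    return slot // slots_per_frame + 1
```

For instance, with five slots per frame, slots 0 to 4 fall in frame 1 and slot 5 opens frame 2, at which point the association matrix would be recomputed.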
II-A Traffic Model at the UE Side
The UE uses one application in which tasks arrive in a stochastic manner. Following the data-partition model , we assume that each task can be computed locally, i.e., at the UE, or remotely, i.e., at the server. Different tasks are independent and can be computed in parallel. Thus, having the task arrivals in time slot , each UE divides its arrival into two disjoint parts, in which one part is executed locally while the remaining tasks are offloaded to the server. Task splitting at UE can be expressed as
Here, represents the unit task which cannot be further split. Moreover, we assume that task arrivals are independent and identically distributed (i.i.d.) over time with the average arrival rate .
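As a small illustration of the task-splitting rule, the sketch below divides one slot's arrivals into a local part and an offloaded part, each a whole number of unit tasks. The function name and parameters (`split_arrivals`, `unit_bits`, `units_local`) are hypothetical, and the requirement that arrivals be an integer multiple of the unit task is an assumption drawn from the statement that the unit task cannot be split further.

```python
from typing import Tuple


def split_arrivals(arrival_bits: int, unit_bits: int, units_local: int) -> Tuple[int, int]:
    """Split one slot's arrivals (in bits) into (local, offloaded) parts.

    Both parts are integer multiples of the indivisible unit task.
    """
    total_units, remainder = divmod(arrival_bits, unit_bits)
    if remainder != 0:
        raise ValueError("arrivals must be a multiple of the unit task size")
    if not 0 <= units_local <= total_units:
        raise ValueError("units_local must lie between 0 and the number of arrived units")
    local_bits = units_local * unit_bits
    return local_bits, arrival_bits - local_bits
```

The two returned parts are disjoint and always sum to the slot's arrivals, which mirrors the splitting constraint described in the text.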
Each UE has two queue buffers to store the split tasks for local computation and offloading. For the UE ’s local-computation queue, the queue length (in the unit of bits) in time slot is denoted by which evolves as
Here, (in the unit of cycle/sec) is the UE ’s allocated CPU-cycle frequency to execute tasks, while accounts for the required CPU cycles per bit for computation, i.e., the processing density. The magnitude of the processing density depends on the performed application. For example, the six-queen puzzle, 400-frame video game, seven-queen puzzle, face recognition, and virus scanning require processing densities of 1760 cycle/bit, 2640 cycle/bit, 8250 cycle/bit, 31680 cycle/bit, and 36992 cycle/bit, respectively. Furthermore, given a CPU-cycle frequency , the UE consumes the amount of power for computation. is a parameter affected by the device’s hardware implementation [10, 29]. For UE ’s task-offloading queue, we denote the queue length (in the unit of bits) in time slot as . Analogously, the task-offloading queue dynamics is given by
is UE ’s transmission rate to offload tasks to the associated MEC server in time slot (all transmissions are encoded based on a Gaussian distribution). and are UE ’s transmit power and the power spectral density of the additive white Gaussian noise (AWGN), respectively. is the bandwidth dedicated to each server and shared by its associated UEs. Additionally, is the wireless channel gain between UE and server , including path loss and channel fading. We also assume that all channels experience block fading. In this work, we mainly consider the uplink, i.e., offloading tasks from the UE to the MEC server, and neglect the downlink, i.e., downloading the computed tasks from the server. The rationale is that compared with the offloaded tasks before computation, the computation results typically have smaller sizes [15, 30, 31]. Hence, the overheads in the downlink can be neglected.
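The UE-side queue dynamics and the uplink rate described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's exact equations: both queues are assumed to follow the standard form "backlog plus arrivals minus service, floored at zero", and the rate is assumed to be a Shannon-type expression in which co-channel interference enters through a single aggregate term. All function and parameter names are hypothetical.

```python
import math


def local_queue_update(q_bits: float, arrivals_bits: float, slot_len_s: float,
                       cpu_hz: float, cycles_per_bit: float) -> float:
    """Next local-computation queue length in bits.

    Bits served in one slot = slot length * CPU frequency / processing density.
    """
    served_bits = slot_len_s * cpu_hz / cycles_per_bit
    return max(q_bits + arrivals_bits - served_bits, 0.0)


def uplink_rate(bandwidth_hz: float, tx_power_w: float, channel_gain: float,
                noise_psd_w_per_hz: float, interference_w: float = 0.0) -> float:
    """Assumed Shannon-type uplink rate in bit/s with an aggregate interference term."""
    sinr = tx_power_w * channel_gain / (noise_psd_w_per_hz * bandwidth_hz + interference_w)
    return bandwidth_hz * math.log2(1.0 + sinr)


def offload_queue_update(q_bits: float, arrivals_bits: float, slot_len_s: float,
                         rate_bps: float) -> float:
    """Next task-offloading queue length in bits."""
    return max(q_bits + arrivals_bits - slot_len_s * rate_bps, 0.0)
```

As a usage example, a UE running at 1 GHz on an application with a processing density of 1000 cycle/bit drains about 1000 bits in a 1 ms slot, so a backlog of 1000 bits with 500 newly arrived bits leaves roughly 500 bits queued.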
In order to minimize the total power consumption of resource allocation for local computation and task offloading, the UE adopts the dynamic voltage and frequency scaling (DVFS) capability to adaptively adjust its CPU-cycle frequency [5, 29]. Thus, to allocate the CPU-cycle frequency and transmit power, we impose the following constraints at each UE , i.e.,
where is UE ’s power budget.
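To make the coupling between CPU-cycle frequency and transmit power under the power budget concrete, the sketch below assumes the cubic DVFS power model commonly used in the MEC literature (computation power equal to a hardware coefficient times the frequency cubed). The surviving text does not show the exact expression, so the cubic form, the coefficient name `kappa`, and the function names are assumptions made for illustration.

```python
def computation_power(kappa: float, cpu_hz: float) -> float:
    """Assumed cubic DVFS model: power = kappa * f^3 (in watts)."""
    return kappa * cpu_hz ** 3


def is_feasible(kappa: float, cpu_hz: float, tx_power_w: float, power_budget_w: float) -> bool:
    """Check non-negativity and the total power budget for one UE."""
    if cpu_hz < 0 or tx_power_w < 0:
        return False
    return computation_power(kappa, cpu_hz) + tx_power_w <= power_budget_w


def max_cpu_frequency(kappa: float, tx_power_w: float, power_budget_w: float) -> float:
    """Largest CPU frequency that fits in the budget left after transmission."""
    residual_w = power_budget_w - tx_power_w
    if residual_w < 0:
        raise ValueError("transmit power alone exceeds the power budget")
    return (residual_w / kappa) ** (1.0 / 3.0)
```

The last function shows why the two resources are coupled: every watt spent on transmission lowers the highest admissible local CPU frequency.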
II-B Traffic Model at the Server Side
We assume that each MEC server has distinct queue buffers to store different UEs’ offloaded tasks, where the queue length (in bits) of the UE ’s offloaded tasks at server in time slot is denoted by . The offloaded-task queue length evolves as
Here, is the server ’s allocated CPU-cycle frequency to process UE ’s offloaded tasks. Note that the MEC server is deployed to provide a faster computation capability for the UE. Thus, we consider the scenario in which each CPU core of the MEC server is dedicated to at most one UE (i.e., its offloaded tasks) in each time slot, and a UE’s offloaded tasks at each server can only be computed by one CPU core at a time [9, 10]. The considered computational resource scheduling mechanism at the MEC server is mathematically formulated as
where denotes the total CPU-core number of server , is server ’s computation capability of one CPU core, and is the indicator function. In (9), we account for the allocated CPU-cycle frequencies to all UEs even though some UEs are not associated with this server in the current time frame. The rationale will be explained in detail in Section IV-D after formulating the concerned optimization problem. Additionally, in order to illustrate the relationship between the offloaded-task queue length and the transmission rate, we introduce inequality (8) which will be further used to formulate the latency and reliability requirements of the considered MEC system and derive the solution of the studied optimization problem.
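The server-side scheduling constraint (each core serves at most one UE per slot, and each UE is served by at most one core) can be illustrated with the sketch below. The feasibility check follows the constraint as described in the text; the longest-queue-first selection rule is only a placeholder chosen for illustration and is not the scheduling policy derived later in the paper. All names are hypothetical.

```python
from typing import Dict


def schedule_cores(queue_bits: Dict[int, float], num_cores: int, core_hz: float) -> Dict[int, float]:
    """Give one full core to each selected UE and nothing to the others.

    Illustrative rule (an assumption): serve the UEs with the longest
    non-empty offloaded-task queues first.
    """
    backlogged = [ue for ue, q in queue_bits.items() if q > 0]
    ranked = sorted(backlogged, key=lambda ue: queue_bits[ue], reverse=True)
    chosen = set(ranked[:num_cores])
    return {ue: (core_hz if ue in chosen else 0.0) for ue in queue_bits}


def is_feasible_schedule(alloc_hz: Dict[int, float], num_cores: int, core_hz: float) -> bool:
    """Each UE gets either zero or exactly one core, and at most num_cores UEs are served."""
    active = [f for f in alloc_hz.values() if f > 0]
    return all(f == core_hz for f in active) and len(active) <= num_cores
```

With two cores and queues of 50, 0, 80, and 10 bits for four UEs, this placeholder rule serves the UEs holding 80 and 50 bits and leaves the other two unserved in that slot.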
III Latency Requirements, Reliability Constraints, and Problem Formulation
In this work, the end-to-end delays experienced by the locally-computed tasks and offloaded tasks consist of different components. When the task is computed locally, it experiences the queuing delay (for computation) and computation delay at the UE. If the task is offloaded to the MEC server, the end-to-end delay includes: 1) queuing delay (for offloading) at the UE, 2) wireless transmission delay while offloading, 3) queuing delay (for computation) at the server, and 4) computation delay at the server. From Little’s law, we know that the average queuing delay is proportional to the average queue length. However, without taking the tail distribution of the queue length into account, solely focusing on the average queue length fails to account for the low-latency and reliability requirement. To tackle this, we focus on the statistics of the task queue and impose probabilistic constraints on the local-computation and task-offloading queue lengths as follows:
Here, and are the queue length bounds, while and are the tolerable bound violation probabilities. Furthermore, a queue length bound violation also undermines the reliability of task computation. For example, if a finite-size queue buffer is over-loaded, the incoming tasks will be dropped.
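The probabilistic queue-length constraint can be checked empirically on a simulated or measured queue trace, as in the sketch below. It replaces the long-term time-averaged violation probability with the fraction of observed slots in which the queue exceeds its bound, which is only an estimate over a finite trace. The function names and arguments are hypothetical.

```python
from typing import Sequence


def violation_fraction(queue_trace: Sequence[float], bound: float) -> float:
    """Fraction of slots in which the queue length exceeds the bound."""
    if not queue_trace:
        raise ValueError("queue trace must not be empty")
    return sum(1 for q in queue_trace if q > bound) / len(queue_trace)


def meets_violation_constraint(queue_trace: Sequence[float], bound: float,
                               tolerable_prob: float) -> bool:
    """True if the empirical violation fraction is within the tolerable probability."""
    return violation_fraction(queue_trace, bound) <= tolerable_prob
```

A trace whose average queue length looks harmless can still fail this check, which is the gap between average-based designs and the tail-oriented requirement discussed above.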
In addition to the bound violation probability, let us look at the complementary cumulative distribution function (CCDF) of the UE’s local-computation queue length, i.e., , which reflects the queue length profile. If the monotonically decreasing CCDF decays faster while increasing , the probability of having an extreme queue length is lower. Since the prime concern in this work lies in the extreme-case events with very low occurrence probabilities, i.e., , we resort to principles of extreme value theory (a powerful and robust framework for studying the tail behavior of a distribution, which also provides statistical models for the computation of extreme risk measures) to characterize the statistics and tail distribution of the extreme event . To this end, we first introduce the Pickands–Balkema–de Haan theorem.
Theorem 1 (Pickands–Balkema–de Haan theorem).
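Consider a random variable $Q$, with the cumulative distribution function (CDF) $F_Q(q)$, and a threshold value $d$. As the threshold closely approaches $F_Q^{-1}(1)$, i.e., $d \to \sup\{q : F_Q(q) < 1\}$, the conditional CCDF of the excess value $X|_{Q>d} = Q - d > 0$, i.e., $\bar{F}_{X|Q>d}(x) = \Pr(Q - d > x \mid Q > d)$, can be approximated by a generalized Pareto distribution (GPD) $G(x;\sigma,\xi)$, i.e.,
$$\bar{F}_{X|Q>d}(x) \approx G(x;\sigma,\xi) := \begin{cases} \left(1 + \dfrac{\xi x}{\sigma}\right)^{-1/\xi}, & \text{where } x \geq 0 \text{ and } \xi > 0, & \text{(12a)} \\ e^{-x/\sigma}, & \text{where } x \geq 0 \text{ and } \xi = 0, & \text{(12b)} \\ \left(1 + \dfrac{\xi x}{\sigma}\right)^{-1/\xi}, & \text{where } 0 \leq x \leq -\sigma/\xi \text{ and } \xi < 0, & \text{(12c)} \end{cases}$$
which is characterized by a scale parameter $\sigma > 0$ and a shape parameter $\xi \in \mathbb{R}$.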
In other words, the conditional CCDF of the excess value converges to a GPD as . However, from the proof of Theorem 1, we know that the GPD provides a good approximation when is close to 1, e.g., . That is, depending on the CDF of , imposing a very large might not be necessary for obtaining the approximated GPD. Moreover, for a GPD with scale parameter $\sigma$ and shape parameter $\xi$, its mean $\sigma/(1-\xi)$ and other higher-order statistics such as variance $\sigma^2/[(1-\xi)^2(1-2\xi)]$ and skewness exist if $\xi < 1$, $\xi < 1/2$, and $\xi < 1/3$, respectively. Note that the scale parameter and the domain of are in the same order. In this regard, we can see that at and at in (12b). We also show the CCDFs of the GPDs for various shape parameters in Fig. 2, where the x-axis is indexed with respect to the normalized value . As shown in Fig. 2, the decay speed of the CCDF increases as $\xi$ decreases. In contrast with the curves with , we can see that the CCDF decays rather sharply when .
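For readers who want to experiment with the tail model, the following sketch evaluates the standard generalized Pareto CCDF together with its mean and variance. The formulas are the textbook ones for a GPD with scale `sigma` and shape `xi`; the function names are hypothetical and the code is meant as an illustration rather than as part of the proposed framework.

```python
import math


def gpd_ccdf(x: float, sigma: float, xi: float) -> float:
    """CCDF of a generalized Pareto distribution with scale sigma and shape xi."""
    if sigma <= 0:
        raise ValueError("scale parameter must be positive")
    if x < 0:
        return 1.0
    if xi == 0.0:
        return math.exp(-x / sigma)
    base = 1.0 + xi * x / sigma
    if base <= 0.0:
        return 0.0  # beyond the finite upper endpoint -sigma/xi when xi < 0
    return base ** (-1.0 / xi)


def gpd_mean(sigma: float, xi: float) -> float:
    """Mean sigma / (1 - xi); it exists only for xi < 1."""
    if xi >= 1.0:
        raise ValueError("mean does not exist for xi >= 1")
    return sigma / (1.0 - xi)


def gpd_variance(sigma: float, xi: float) -> float:
    """Variance sigma^2 / ((1 - xi)^2 (1 - 2 xi)); it exists only for xi < 1/2."""
    if xi >= 0.5:
        raise ValueError("variance does not exist for xi >= 1/2")
    return sigma ** 2 / ((1.0 - xi) ** 2 * (1.0 - 2.0 * xi))
```

Evaluating `gpd_ccdf` for several shape values at the same normalized argument shows the behavior discussed above: smaller shape parameters give a faster-decaying tail, and a negative shape parameter gives a tail that vanishes at a finite endpoint.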
Now, let us denote the excess value (with respect to the threshold in (10)) of the local-computation queue of each UE in time slot as . By applying Theorem 1, the excess queue value can be approximated by a GPD whose mean and variance are
with the corresponding scale parameter and shape parameter . In (13) and (14), we can find that the smaller and are, the smaller the mean value and variance. Since the approximated GPD is characterized solely by the scale and shape parameters, as mentioned previously, we impose thresholds on these two parameters, i.e., and . The threshold values can be selected by referring to the above discussion of the GPD, Fig. 2, and the magnitude of the values of the metric of interest. Subsequently, applying the two parameter thresholds and to (13) and (14), we consider the constraints on the long-term time-averaged conditional mean and second moment of the excess value of each UE’s local-computation queue length, i.e.,
with . Analogously, denoting the excess value, with respect to the threshold , of UE ’s task-offloading queue length in time slot as , we have the constraints on the long-term time-averaged conditional mean and second moment
in which and are the thresholds for the characteristic parameters of the approximated GPD, and .
Likewise, the average queuing delay at the server is proportional to the ratio of the average queue length to the average transmission rate. Referring to (8), we consider the probabilistic constraint as follows:
with the threshold and tolerable violation probability , on the offloaded-task queue length at the MEC server. is the moving time-averaged transmission rate. Similar to the task queue lengths at the UE side, we further denote the excess value, with respect to the threshold , in time slot as of the offloaded-task queue length of UE at server and impose the constraints as follows:
with . Here, and are the thresholds for the characteristic parameters of the approximated GPD.
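The constraints on the conditional mean and second moment of the excess queue value can be checked on a finite trace as sketched below. The bounds are obtained by substituting the scale and shape thresholds into the standard GPD mean and into the second moment implied by the GPD mean and variance, namely 2σ²/((1−ξ)(1−2ξ)); this reading of the constraints is an assumption based on the description above, and the names `sigma_th` and `xi_th` are hypothetical.

```python
from typing import List, Sequence, Tuple


def excess_values(queue_trace: Sequence[float], threshold: float) -> List[float]:
    """Excess of the queue length over the threshold, for violating slots only."""
    return [q - threshold for q in queue_trace if q > threshold]


def moment_bounds(sigma_th: float, xi_th: float) -> Tuple[float, float]:
    """Bounds on the conditional mean and second moment of a GPD-like excess."""
    if xi_th >= 0.5:
        raise ValueError("second moment requires a shape threshold below 1/2")
    mean_bound = sigma_th / (1.0 - xi_th)
    second_moment_bound = 2.0 * sigma_th ** 2 / ((1.0 - xi_th) * (1.0 - 2.0 * xi_th))
    return mean_bound, second_moment_bound


def satisfies_excess_constraints(queue_trace: Sequence[float], threshold: float,
                                 sigma_th: float, xi_th: float) -> bool:
    """Compare empirical conditional moments of the excess with the bounds."""
    excess = excess_values(queue_trace, threshold)
    if not excess:
        return True  # no violation observed, so nothing to constrain
    mean = sum(excess) / len(excess)
    second_moment = sum(x * x for x in excess) / len(excess)
    mean_bound, second_moment_bound = moment_bounds(sigma_th, xi_th)
    return mean <= mean_bound and second_moment <= second_moment_bound
```

The same check applies unchanged to the task-offloading queue at the UE and to the offloaded-task queue at the server once the corresponding threshold is supplied.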
We note that the local computation delay at the UE and the transmission delay while offloading are inversely proportional to the computation speed and the transmission rate as per (3) and (4), respectively. To decrease the local computation and transmission delays, the UE should allocate a higher local CPU-cycle frequency and more transmit power, which, on the other hand, incurs energy shortage. Since allocating a higher CPU-cycle frequency and more transmit power can also further decrease the queue length, both (local computation and transmission) delays are implicitly taken into account in the queue length constraints (10), (11), and (15)–(18). At the server side, the remote computation delay can be neglected because one CPU core with the better computation capability is dedicated to one UE’s offloaded tasks at a time. On the other hand, the server needs to schedule its computational resources, i.e., multiple CPU cores, when the associated UEs are more than the CPU cores.
Incorporating the aforementioned latency requirements and reliability constraints, the studied optimization problem is formulated as follows:
where and are the UE ’s long-term time-averaged power consumptions for local computation and task offloading, respectively. and denote the network-wide UE-server association and transmit power allocation vectors, respectively. In addition, denotes the network-wide computational resource allocation vector in which is the computational resource allocation vector of server . To solve problem MP, we utilize techniques from Lyapunov stochastic optimization and propose a dynamic task offloading and resource allocation policy in the next section.
IV Latency and Reliability-Aware Task Offloading and Resource Allocation
Let us give an overview of the proposed task offloading and resource allocation approach before specifying the details. At the beginning of each time frame, i.e., every slots, we carry out a UE-server association, taking into account the wireless link strength, the UEs’ and servers’ computation capabilities, their historical workloads, and URLLC constraints (11) and (17)–(21). To this end, a many-to-one matching algorithm is utilized to associate each server with multiple UEs. Afterwards, we focus on task offloading and resource allocation by solving three decomposed optimization problems, via Lyapunov optimization, in each time slot. At the UE side, each UE splits its instantaneous task arrivals into two parts, which will be computed locally and offloaded respectively, while allocating the local computation CPU-cycle frequency and transmit power for offloading. At the server side, each MEC server schedules its CPU cores to execute the UEs’ offloaded tasks. In the procedures (of task splitting and offloading, resource allocation, and CPU-core scheduling), the related URLLC constraints out of (10), (11), and (15)–(21) are considered. The details of our proposed approach will be illustrated in the remainder of this section.
IV-A Lyapunov Optimization Framework
We first introduce a virtual queue for the long-term time-averaged constraint (15) with the queue evolution as follows:
in which the incoming traffic amount and outgoing traffic amount correspond to the left-hand side and right-hand side of the inequality (15), respectively. Note that ensuring that the introduced virtual queue is mean rate stable, i.e., , is equivalent to satisfying the long-term time-averaged constraint (15). Analogously, for the constraints (16)–(18), (20), and (21), we respectively introduce the virtual queues as follows:
Now problem MP is equivalently transformed into
To solve problem MP’, we let denote the combined queue vector for notational simplicity and express the conditional Lyapunov drift-plus-penalty for slot as
is the Lyapunov function. The term is a parameter which trades off objective optimality and queue length reduction. Subsequently, plugging the inequality , all physical and virtual queue dynamics, and (8) into (34), we can derive
Here, is UE ’s maximum offloading rate. Since the constant does not affect the system performance in Lyapunov optimization, we omit its details in (35) for expression simplicity. Note that a solution to problem MP’ can be obtained by minimizing the upper bound (35) in each time slot , in which the optimality of MP’ is asymptotically approached by increasing . To minimize (35), we have three decomposed optimization problems P1, P2, and P3 which are detailed and solved in the following parts.
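The two generic ingredients of this framework, the virtual-queue update and the per-slot drift-plus-penalty decision, can be sketched as follows. This is a textbook-style illustration under stated assumptions rather than the paper's decomposed problems P1 to P3: the virtual queue takes the left-hand side of a time-averaged constraint as its arrival and the right-hand side as its service, and the per-slot action is selected from a finite candidate set by minimizing the penalty weighted by the tradeoff parameter plus the queue-weighted net arrivals. All names, the toy penalty, and the toy service model are hypothetical.

```python
from typing import Callable, Iterable, Sequence, TypeVar

Action = TypeVar("Action")


def virtual_queue_update(z: float, arrival: float, service: float) -> float:
    """Virtual queue for a constraint of the form avg(arrival) <= avg(service)."""
    return max(z + arrival - service, 0.0)


def drift_plus_penalty_action(actions: Iterable[Action],
                              queues: Sequence[float],
                              penalty: Callable[[Action], float],
                              net_arrivals: Callable[[Action], Sequence[float]],
                              v: float) -> Action:
    """Pick the action minimizing v * penalty + sum_k queue_k * net_arrival_k."""
    def score(action: Action) -> float:
        drift = sum(q * d for q, d in zip(queues, net_arrivals(action)))
        return v * penalty(action) + drift
    return min(actions, key=score)


# Toy example: one queue with backlog 10, candidate normalized CPU frequencies,
# cubic power penalty, and one unit of arrivals served at rate f per slot.
freqs = [0.0, 1.0, 2.0]
choice_small_v = drift_plus_penalty_action(
    freqs, [10.0], lambda f: f ** 3, lambda f: [1.0 - f], v=1.0)
choice_large_v = drift_plus_penalty_action(
    freqs, [10.0], lambda f: f ** 3, lambda f: [1.0 - f], v=100.0)
```

In this toy setting the small tradeoff parameter selects the highest frequency to drain the backlog, whereas the large one selects zero frequency to save power, which illustrates how the parameter trades objective optimality against queue length reduction.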
The first decomposed problem, which jointly associates UEs with MEC servers and allocates UEs’ computational and communication resources, is given by
Note that in P1, the UE’s allocated transmit power is coupled with the local CPU-cycle frequency. The transmit power also depends on the wireless channel strength to the associated server and the weight of the corresponding offloaded-task queue, in which the former depends on the distance between the UE and server, whereas the latter is related to the MEC server’s computation capability and the number of associated UEs. Therefore, the UEs’ geographic configuration and the servers’ computation capabilities should be taken into account while we associate the UEs with the servers. Moreover, UE-server association, i.e., , and resource allocation, i.e., and , are performed in two different timescales, i.e., in the beginning of each time frame and every time slot afterwards. We solve P1 in two steps, in which the UE-server association is decided first. Then, given the association results, UEs’ CPU-cycle frequencies and transmit powers are allocated.
IV-B UE-Server Association using Many-to-One Matching with Externalities
To associate UEs to the MEC servers, let us focus on the wireless transmission part of P1 and, thus, fix and , at this stage. The wireless channel gain and the weight factors and dynamically change in each time slot, whereas the UEs are re-associated with the servers in every slots. In order to take the impacts of ,