The explosive growth of the Internet of Things (IoT) in recent years enables cost-effective interconnection of tens of billions of wireless devices (WDs), such as sensors and wearable devices. Because of stringent size constraints and production cost concerns, an IoT device is often equipped with a limited battery and a low-performance on-chip computing unit, which are recognized as two fundamental impediments to supporting computation-intensive applications in future IoT. Mobile edge computing (MEC) [2, 3], viewed as an efficient solution, has attracted significant attention. The key idea of MEC is to offload intensive computation tasks to the edge of the radio access network, where much more powerful servers compute on behalf of the resource-limited WDs. Compared with traditional mobile cloud computing, MEC can overcome the drawbacks of high overhead and long backhaul latency.
In general, MEC has two computation offloading models: binary and partial offloading. Binary offloading requires each task to be either computed locally or offloaded to the MEC server as a whole. Partial offloading, on the other hand, allows a task to be partitioned and executed both locally and at the MEC server. In this paper, we consider binary computation offloading, which is commonly used in IoT systems for processing non-partitionable simple tasks.
Because of the time-varying wireless channel condition, it is not necessarily optimal to always offload all the computations to the MEC server, e.g., deep fading may lead to a very low offloading data rate. Meanwhile, wireless resource allocation, e.g., transmit time and power, needs to be jointly designed with the computation offloading for optimum computing performance. In this regard, on the one hand, [4, 5, 6, 7] focused on the optimal binary offloading policies when each user has only one task to be executed. Specifically,  considered energy-optimal offloading and resource allocation in the single-user case. Authors in  further considered a wireless powered MEC system and maximized the probability of successful computations. The performance optimization of multi-user wireless powered MEC systems was later studied in [4, 5]. On the other hand,  and  considered a more general scenario, where the binary offloading model is applied to multiple independent tasks. Specifically,  considered multiple mobile users that each offload multiple independent tasks to a single access point. In , a single user can offload independent tasks to multiple edge devices, with the goal of minimizing the weighted sum of the WD’s energy consumption and the total task execution latency.
Nonetheless, the above studies do not consider the important dependency among different tasks in various applications. That is, a user often needs to execute multiple related tasks, where the input of one task requires the output of another. Since the executions are coupled among tasks, the optimal design becomes much more difficult than in the previous case where independent tasks can be executed in parallel. Call graphs  are commonly used to model the dependency among different tasks [11, 12, 13, 14, 15]. For a single-user MEC system,  considered a general call graph and obtained the jointly optimal task offloading decisions and transmit power that minimize the WD’s energy consumption under a latency constraint. Besides, the authors in  considered a sequential call graph for a single user and derived an optimal one-climb policy, which means that the execution migrates at most once between the WD and the cloud server. This work was extended to a call graph with a general topology in , and an online task offloading scenario was studied in . A multi-user case was considered in , where each independent WD has multiple tasks with a general call graph and the goal is to optimize the energy efficiency.
The call graphs considered by most of the existing studies on MEC, such as in [11, 12, 13, 14, 15], only take into account the dependency among tasks executed by an individual WD. In practice, tasks executed by different WDs are often related as well. For example, an IoT sensor often needs to combine the processed data from other sensors. The inter-user task dependency has a significant impact on the offloading and resource allocation decisions. For instance, a WD is likely to offload its task to the edge server even when the channel condition is poor, because another WD with time-critical applications urgently needs its computation output. Besides, the exchange of computation results for dependent tasks also consumes extra energy and time. In general, the case with inter-user dependency requires the joint optimization of the task executions of all correlated users, which is a challenging problem that has not yet been studied in depth.
In this paper, we consider a task call graph in a two-user MEC system as shown in Fig. 1, where the computation of an intermediate task at WD2 requires the output of the last task at WD1. To the best of the authors’ knowledge, this is the first work that exploits the task dependency across different users in an MEC system. The main contributions of this paper are as follows:
With the inter-user task dependency in Fig. 1, we formulate a mixed integer optimization problem to minimize the weighted sum of the WDs’ energy consumption and task execution time. The task offloading decisions, local CPU frequencies and transmit power of each WD are jointly optimized. The problem is challenging because of the combinatorial nature of the offloading decisions among all tasks in such a call graph and their strong coupling with resource allocation.
Given the offloading decisions, we first derive closed-form solutions of the optimal local CPU frequencies and transmit power of each WD, respectively. We then establish an inequality condition of the completion time between the two dependent tasks, based on which an efficient bi-section search method is proposed to obtain the optimal resource allocation.
We further show that the optimal offloading decisions follow a one-climb policy, where each WD offloads its data at most once to the edge server at the optimum. Based on the one-climb policy, we propose a reduced-complexity Gibbs sampling algorithm to obtain the optimal offloading decisions.
Simulation results show that our proposed algorithm can effectively reduce the energy consumption and computation delay compared with other representative benchmarks. In particular, it significantly outperforms the scheme that neglects the task dependency and optimizes the two WDs’ performance individually. Meanwhile, the proposed method has low computational complexity with respect to the size of the call graph.
The rest of the paper is organized as follows. In Section II, we describe the system model and formulate the problem. The optimal CPU frequencies and transmit power of each WD under fixed offloading decisions are derived in Section III. In Section IV, we first prove that the optimal offloading decisions follow a one-climb policy and, based on that, propose a reduced-complexity Gibbs sampling algorithm. In Section V, the performance of the proposed algorithms is evaluated via simulations. Finally, we conclude the paper in Section VI.
II System Model and Problem Formulation
We consider an MEC system with two WDs and one access point (AP), each equipped with a single antenna. The AP is the gateway of the edge cloud and has a stable power supply. As shown in Fig. 1, WD1 and WD2 have and sequential tasks to execute, respectively. For simplicity of exposition, we introduce for each WD an auxiliary node 0 as the entry task, and auxiliary nodes , as the exit tasks for WD1 and WD2, respectively. In particular, we assume that the computations of the two WDs are related, such that the calculation of an intermediate task of WD2, denoted as , for , requires the output of the last task of WD1.
Each task of WD is characterized by a three-item tuple , where when , and when . Specifically, denotes the computing workload in terms of the total number of CPU cycles required for accomplishing the task, and denote the size of computation input and output data in bits, respectively. As for the two auxiliary nodes of each WD, . For WD1, it holds that , . As for the WD2, we have
Moreover, for the entry node and for the exit node of each WD.
We assume that the two series of tasks must be initiated and terminated at the respective WD. That is, the auxiliary entry and exit tasks must be executed locally, while the other actual tasks can be either executed locally or offloaded to the edge server. We denote the computation offloading decision of task of WD as , where denotes edge execution and denotes local computation.
In addition, we assume that each WD is allocated an orthogonal channel of equal bandwidth , thus there is no interference between the WDs when offloading/downloading. The wireless channel gains between the WD and the AP when offloading and downloading task are denoted as and , respectively. Besides, we assume additive white Gaussian noise (AWGN) with zero mean and equal variance at all receivers for each user.
In the following, we discuss the computation overhead in terms of execution time and energy consumption for local and edge computing, respectively.
II-A Local Computing
We denote the CPU frequency of WD for computing task as . Thus, the local computation execution time can be given by
and the corresponding energy consumption is 
where is the fixed effective switched capacitance parameter depending on the chip architecture.
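As a quick numerical illustration of the local computing model, the following minimal sketch assumes the standard dynamic voltage and frequency scaling relations: a workload of `workload_cycles` CPU cycles executed at frequency `cpu_hz` takes `workload_cycles / cpu_hz` seconds and consumes `kappa * workload_cycles * cpu_hz**2` joules. These two formulas, the function names and the default value of `kappa` are assumptions chosen for illustration.

```python
def local_time(workload_cycles: float, cpu_hz: float) -> float:
    """Local execution time in seconds (assumed model: cycles / frequency)."""
    if cpu_hz <= 0:
        raise ValueError("CPU frequency must be positive")
    return workload_cycles / cpu_hz


def local_energy(workload_cycles: float, cpu_hz: float, kappa: float = 1e-26) -> float:
    """Local execution energy in joules.

    Assumed model: power = kappa * f^3, so energy = power * time
    = kappa * cycles * f^2. The default kappa is a hypothetical value.
    """
    return kappa * workload_cycles * cpu_hz ** 2


if __name__ == "__main__":
    cycles = 1e8  # hypothetical workload
    for f in (1e8, 2e8):
        print(f, local_time(cycles, f), local_energy(cycles, f))
```

Under these assumptions, doubling the CPU frequency halves the execution time but quadruples the energy, which is the trade-off that the weighted objective later balances.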
II-B Edge Computing
Let denote the transmit power of WD when offloading task to the AP, and we can express the uplink data rate for offloading task of WD as
From (4), the transmission time of WD when offloading task is expressed as
Then, the transmission energy consumption is
We assume that the edge server can compute the tasks of different WDs in parallel. The execution time of task of WD on the edge is given by , where is the constant CPU frequency of the edge server.
Furthermore, as for the downlink transmission, we denote the fixed transmit power of the AP by . Thus, the downlink data rate for feeding the -th task’s input of WD from the AP when computing task locally can be expressed as
Likewise, the time needed for the downlink transmission is given by .
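The edge computing costs described above can be sketched numerically as follows. The sketch assumes a Shannon-capacity rate of the form W log2(1 + p h / sigma^2) for the link between a WD and the AP; this rate expression, the function names and all numerical values are assumptions made for illustration. The transmission time is the data size divided by the rate, the transmission energy is the transmit power multiplied by that time, and the edge execution time is the workload divided by the edge CPU frequency.

```python
import math


def link_rate(bandwidth_hz: float, tx_power_w: float,
              channel_gain: float, noise_power_w: float) -> float:
    """Achievable rate in bits/s (assumed Shannon-capacity form)."""
    return bandwidth_hz * math.log2(1.0 + tx_power_w * channel_gain / noise_power_w)


def offload_cost(data_bits: float, bandwidth_hz: float, tx_power_w: float,
                 channel_gain: float, noise_power_w: float) -> tuple:
    """Return (uplink time in s, uplink energy in J) for sending data_bits."""
    rate = link_rate(bandwidth_hz, tx_power_w, channel_gain, noise_power_w)
    t_up = data_bits / rate
    return t_up, tx_power_w * t_up


def edge_time(workload_cycles: float, edge_cpu_hz: float) -> float:
    """Execution time of a task on the edge server."""
    return workload_cycles / edge_cpu_hz


if __name__ == "__main__":
    # Hypothetical numbers: 1 MHz channel, SNR of 1 (0 dB), 1 Mbit of input data.
    print(offload_cost(1e6, 1e6, 1.0, 1.0, 1.0))
    print(edge_time(1e8, 1e10))
```

The same `link_rate` helper applies to the downlink by substituting the fixed AP transmit power and the downlink channel gain.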
II-C Task Dependency Model
As shown in Fig. 2, the task dependency model between the two WDs can be one of the following four cases, depending on the values of and .
Case 1: When both the -th task of WD1 and the -th task of WD2 are executed locally, i.e., and , the AP acts as a relay node. First, the WD1 uploads its output of -th task to the AP. Then, the AP forwards this information to the WD2. Specifically, the uplink transmission time and energy in this process are
respectively, where and are the corresponding uplink data rate and uplink transmit power, respectively. As for the downlink transmission, the transmission time is denoted as
Case 2: When the -th task of WD1 is executed at the edge and the -th task of WD2 is computed locally, i.e., and , the output of -th task of WD1 is downloaded to the WD2 after execution at the edge.
Case 3: In this case, the -th task of WD1 is executed locally and the -th task of WD2 is offloaded to the edge, i.e., and . The WD1 needs to upload the result before the computation of the -th task of WD2 at the edge.
Case 4: In this case, both the -th task of WD1 and the -th task of WD2 are executed at the edge, i.e., and . Therefore, neither uplink nor downlink transmission is needed.
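The four cases can be summarized by which transmissions are required to deliver the output of the last task of WD1 to the dependent task of WD2. The following minimal sketch encodes that mapping; the function name and the boolean encoding of the two offloading decisions are hypothetical.

```python
def dependency_transfers(wd1_last_at_edge: bool, wd2_task_at_edge: bool) -> tuple:
    """Return (uplink_needed, downlink_needed) for the inter-user dependency.

    wd1_last_at_edge: True if the last task of WD1 runs at the edge.
    wd2_task_at_edge: True if the dependent task of WD2 runs at the edge.
    """
    # WD1 must upload its result only if that result was produced locally.
    uplink_needed = not wd1_last_at_edge
    # The AP must forward the result only if WD2 consumes it locally.
    downlink_needed = not wd2_task_at_edge
    return uplink_needed, downlink_needed


if __name__ == "__main__":
    cases = {
        "Case 1": (False, False),  # both local: AP relays (uplink + downlink)
        "Case 2": (True, False),   # WD1 at edge, WD2 local: downlink only
        "Case 3": (False, True),   # WD1 local, WD2 at edge: uplink only
        "Case 4": (True, True),    # both at edge: no transmission
    }
    for name, (a1, a2) in cases.items():
        print(name, dependency_transfers(a1, a2))
```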
II-D Problem Formulation
From the above discussion, in order to obtain the total tasks execution time of WD1, we first denote the time spent on computations both locally and at the edge server by , which can be expressed as
As for the communication delay consumed on uploading/downloading the task data to/from the AP, we have
Note that there is no communication delay for the -th task if , i.e., the two tasks are computed at the same device. Otherwise, if and , the communication delay is due to the uplink transmission time , whereas, if and , the communication delay is due to the downlink transmission time . Therefore, the total tasks execution time of WD1 is
Furthermore, we can calculate the total energy consumption of WD1 by
which consists of the total execution energy of tasks and the energy consumption on offloading the final result if the -th task is computed locally, i.e., when . Note that the energy cost for the uplink transmission occurs in (15) only if and .
Similarly, the total computation energy consumption of WD2 can be expressed as
As for the execution time of WD2, we first consider the waiting time until the output of the -th task of WD1 reaches WD2, denoted by , as follows.
It consists of the total execution time of tasks of WD1, and the transmit time of the output of the -th task as shown in the four cases of Fig. 2.
Meanwhile, the waiting time until the output of the -th task of WD2 is ready, denoted by , is given by
which includes the total execution time of the first tasks and the transmission time on offloading task (i.e., when , ) or downloading the output of task to WD2 (i.e., when , ). From (II-D) and (II-D), the total waiting time before the -th task of WD2 is ready for execution is
Accordingly, the total task execution time of WD2 equals plus the execution time of tasks from to , i.e.,
where and denote the weights of energy consumption and computation completion time for WD1, respectively. Without loss of generality, it is assumed that the weights are related by . Accordingly, the ETC of WD2 is
where and denote the two weighting parameters satisfying . It is worth noting that represents a special case which will be discussed in Section III, while leads to a trivial solution that the WD2 will take infinitely long time to finish its task executions.
Denoting , , and , we are interested in minimizing the total ETC of the two WDs by solving the following problem:
where the first two constraints correspond to the peak transmit power and peak CPU frequency. We assume in this paper. Because of the one-to-one mappings between and in (2) and between and in (6), it is equivalent to optimize (P1) over the time allocation . By introducing an auxiliary variable , (P1) can be equivalently expressed as
), respectively. Notice that (P2) is non-convex in general due to the binary variables. However, it can be seen that for any given , the remaining optimization over is a convex problem. In the following section, we assume that the offloading decision is given and study some interesting properties of the optimal CPU frequencies and the transmit power of each WD, based on which an efficient method is proposed to obtain the optimal solutions.
III Optimal Resource Allocation under Fixed Offloading Decision
III-A Optimal Solution of (P2) Given the Offloading Decisions
Suppose that is given. A partial Lagrangian of Problem (P2) is given by
where and denote the dual variables associated with the corresponding constraints.
Let and denote the optimal dual variables. We derive the closed-form expressions of the optimal CPU frequencies and transmit power of each WD as follows.
Proposition 3.1: with , the optimal CPU frequencies of the two WDs satisfy
Please refer to Appendix A. ∎
From Proposition 3.1, we have the following observations:
The optimal local CPU frequencies are the same for all the tasks of the same type, i.e., in WD1, or in WD2, regardless of the wireless channel conditions and workloads.
For each task of WD1, when or increases (a larger corresponds to a tighter task dependency constraint at optimum), the optimal strategy is to speed up local computing. However, with the increase of , the WD1 prefers to save energy with a lower optimal .
For the -th task of WD2, , a larger leads to a higher optimal . On the other hand, the optimal is not related to for , as the corresponding executions are not constrained by the WDs’ dependency.
Here, $\mathcal{W}(x)$ denotes the Lambert W function, which is the inverse function of $z e^{z} = x$, i.e., $z = \mathcal{W}(x)$.
Please refer to Appendix B. ∎
From Proposition 3.2, we obtain the following observations:
The optimal transmit power is inversely proportional to the channel gain when is above a threshold, and equals the peak power when the channel gain is below the threshold.
With the increase of peak transmit power , the value of the threshold is decreasing. This means that for a larger , the WDs tend to transmit at the maximum power when meeting worse channel condition.
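Since the closed-form transmit power involves the Lambert function, a small numerical routine for evaluating it may be helpful. The following is a minimal sketch of the principal branch for non-negative arguments, computed with Newton's method in pure Python. The function name, the starting point and the tolerance are assumptions, and the sketch does not attempt to reproduce the specific argument or branch used in Proposition 3.2.

```python
import math


def lambert_w0(x: float, tol: float = 1e-12, max_iter: int = 100) -> float:
    """Principal branch of the Lambert W function for x >= 0.

    Solves w * exp(w) = x by Newton's method. The starting point
    log(1 + x) lies to the right of the root, where w * exp(w) is
    increasing and convex, so the iterates decrease monotonically
    towards the solution.
    """
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    w = math.log1p(x)
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


if __name__ == "__main__":
    for x in (0.0, 1.0, math.e, 100.0):
        w = lambert_w0(x)
        print(x, w, w * math.exp(w))
```

In practice, a library routine such as `scipy.special.lambertw` can be used instead when SciPy is available; it also covers other branches and complex arguments.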
Based on Propositions 3.1 and 3.2, our earlier conference paper  applies an ellipsoid method  to search for the optimal dual variables . The ellipsoid method is guaranteed to converge because (P2) is a convex problem given . In general, however, the ellipsoid method may take a long time to converge.
In this paper, we further study some interesting properties of an optimal solution in the following Lemmas 3.1 and 3.2, based on which a reduced-complexity one-dimensional bi-section search method is proposed in the following subsection.
Lemma 3.1: and hold at the optimum of (P2).
We prove this lemma by contradiction. Suppose that there exists an optimal solution with . According to the KKT conditions and , we have and . As , according to (26) and (28), the optimal and are finite, which means that are finite for all . Hence, is finite. However, when , we have the optimal from (27) and from (29). Thus, we have . This contradicts the assumption that , and thus completes the proof. ∎
The above lemma indicates that the -th task’s waiting time for the input data stream from WD1 is not larger than that for the other input from WD2. In other words, WD2 always receives the task output from WD1 first and then waits until its local tasks finish before computing the -th task. In addition to the results in Lemma 3.1, the following Lemma 3.2 gives two special cases in which is satisfied.
Lemma 3.2: holds at the optimum of (P2) if one of the following two sufficient conditions is satisfied:
The proof is similar to that of Lemma 3.1 and is omitted here. ∎
Specifically, in the first case, the role of WD1 is solely to provide needed data to WD2 and minimizing its own execution time is not an objective. Nonetheless, the execution time of WD1 still affects that of WD2, which is to be minimized. In the second case, the -th task of WD1 chooses to perform local computing, i.e., .
III-B A Low-complexity Bi-section Search Method
According to Lemma 3.1, we have . Therefore, Problem (P2) is simplified as
Similarly, the Lagrangian of Problem (P3) is
where denotes the dual variable associated with the constraint .
By applying the KKT conditions in (P3), we can obtain the optimal solutions of and . The details are omitted here. By combining with the optimal solutions in Proposition 3.1 and Proposition 3.2, we have the following proposition.
Proposition 3.3: The optimal dual variables in (P2) and in (P3) are related by
where . In other words, we have
Note that (P3) is convex given the offloading decision . Thus, is a sufficient condition for optimality. By defining , we can efficiently obtain the optimal based on the following proposition.
Proposition 3.4: is a monotonically decreasing function in . Besides, a unique that satisfies exists when .
It can be proved that both and are monotonically increasing functions in , while and , , are monotonically decreasing functions in . Therefore, all terms in decrease with , and thus is a monotonically decreasing function in . Meanwhile, when , it holds that and , , which leads to when . Together with the result that is a monotonically decreasing function, there must exist a unique that satisfies when . ∎
With Proposition 3.4, when , the optimal can be efficiently obtained via a bi-section search over that satisfies . If , we have according to the KKT condition . Now that is obtained, the optimal can be directly calculated using (31), (26), (27), (28) and (29). Due to the convexity, the primal and dual optimal values are the same for (P3) given .
The pseudo-code of the bi-section search method is illustrated in Algorithm 1. Given a precision parameter , it takes number iterations for Algorithm 1 to converge. In each iteration, the computational complexity is proportional to the number of tasks in WDs, i.e., . Therefore, the overall complexity of Algorithm 1 is .
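The core of such a search is a bi-section over a monotonically decreasing scalar function, which can be sketched as follows. This is a generic illustration rather than a reproduction of Algorithm 1: the function `psi`, the search interval and the toy example in the usage section are hypothetical stand-ins for the paper's quantities.

```python
from typing import Callable


def bisect_decreasing(psi: Callable[[float], float],
                      lo: float, hi: float, eps: float = 1e-9) -> float:
    """Find the root of a monotonically decreasing function on [lo, hi].

    Requires psi(lo) >= 0 >= psi(hi). The interval is halved until its
    width is at most eps, i.e. after about log2((hi - lo) / eps) iterations.
    """
    if psi(lo) < 0 or psi(hi) > 0:
        raise ValueError("root is not bracketed by [lo, hi]")
    while hi - lo > eps:
        mid = 0.5 * (lo + hi)
        if psi(mid) > 0:
            lo = mid  # psi is still positive, so the root lies to the right
        else:
            hi = mid  # psi is non-positive, so the root lies to the left
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Toy decreasing function with a unique root at 0.5.
    root = bisect_decreasing(lambda mu: 1.0 / mu - 2.0, 0.01, 10.0)
    print(root)
```

In the setting above, each evaluation of `psi` would recompute the closed-form resource allocation for the current dual variable, which is why the per-iteration cost scales with the number of tasks.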
IV Optimization of Offloading Decision
In Section III, we efficiently obtain the optimal of (P1) once is given. Intuitively, one can enumerate all feasible and choose the optimal one that yields the minimum objective in (P2). However, this brute-force search quickly becomes computationally prohibitive as increases. In this section, we propose an efficient approximate algorithm to reduce the complexity.
IV-A One-climb Policy
Here, we first show in the following Theorem 1 that the optimal offloading decision has a one-climb structure.
Theorem 1 (one-climb policy): Assuming that , the execution for each WD migrates at most once from the WD to the edge server at the optimum.
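To illustrate how much a one-climb structure shrinks the search space for a single WD, the following minimal sketch enumerates all one-climb decision vectors for a chain of tasks and checks them against a brute-force filter. It assumes that a decision of 1 means edge execution and 0 means local execution, and it considers one WD in isolation, ignoring any coupling through the inter-user dependency. The function names are hypothetical.

```python
from itertools import product


def one_climb_decisions(num_tasks: int) -> list:
    """All decision vectors with at most one contiguous run of offloaded tasks."""
    decisions = [tuple([0] * num_tasks)]  # everything computed locally
    for start in range(num_tasks):
        for end in range(start, num_tasks):
            decisions.append(
                tuple(1 if start <= i <= end else 0 for i in range(num_tasks))
            )
    return decisions


def is_one_climb(decision: tuple) -> bool:
    """True if execution migrates from the WD to the edge at most once."""
    previous = (0,) + tuple(decision[:-1])  # the entry task is always local
    climbs = sum(1 for p, c in zip(previous, decision) if p == 0 and c == 1)
    return climbs <= 1


if __name__ == "__main__":
    for m in (4, 10, 20):
        print(m, 2 ** m, len(one_climb_decisions(m)))
```

For a chain of m tasks there are 1 + m(m+1)/2 one-climb vectors, compared with 2^m unrestricted binary vectors, so the candidate set grows quadratically rather than exponentially.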
In the following, we prove the one-climb policy by contradiction. Suppose that the optimal offloading decision allows a WD to offload its data two times, as shown in Fig. 3(a). Under the two-time offloading scheme, tasks from to are migrated to the edge server for execution. Then, tasks from to execute at the WD , followed by tasks from to migrated to the edge server, where is the index of WDs. As for the one-climb scheme in Fig. 3(b), tasks of WD from to are, however, executed on the edge server.
We denote the optimal offloading decision, local CPU frequencies and transmit power of WD in the two-time and one-climb offloading schemes as and , respectively. By the optimality assumption, we have
For the two-time offloading policy in WD1, the total execution time from the -th task to the -th task can be expressed as
As for the one-climb policy in WD1, we have
Since the computing speed of the edge server is higher than that of the WDs, i.e., , the following inequalities hold for the -th and -th tasks:
In addition, we have for the tasks of WD1 between and . Therefore, it can be shown that .
On the other hand, with respect to the energy consumption of WD1 from the -th task to the -th task, we observe that the two-time offloading scheme consumes more energy compared with the one-climb policy due to the local tasks computing from to , the -th task’s offloading and the -th task’s offloading as illustrated in Fig. 2 (if ). That is, , where and denote the energy consumption from the -th task to the -th task in the two-time and one-climb offloading schemes, respectively.
Similarly, as for the WD2, if , and hold according to the above discussion. Since extra time cost will be introduced if according to the task dependency model illustrated in Fig. 2, we still have and when