Recent advances in fifth-generation (5G) cellular technology have enabled various new applications such as augmented reality (AR), autonomous driving, and the Internet of Things (IoT). These applications demand ultra-low-latency communication, computation, and control among a large number of wireless devices (e.g., sensors and actuators). In practice, the real-time computation tasks to be executed can be quite intensive, but wireless devices are generally small and have only limited communication, computation, and storage resources (see, e.g., ). Therefore, enhancing their computation capabilities and reducing computation latency are crucial but challenging issues that must be tackled to make these 5G applications a reality.
Conventionally, mobile cloud computing (MCC) has been widely adopted to enhance wireless devices’ computation capabilities, by moving their computing and data storage to the remote centralized cloud . However, as the cloud servers are normally distant from wireless devices, MCC may not be able to meet the stringent computation latency requirements for emerging 5G applications. To overcome such limitations, mobile edge computing (MEC) has been recently proposed as a new solution to provide cloud-like computing at the edge of wireless networks (e.g., access points (APs) and cellular base stations (BSs)), by deploying distributed MEC servers therein [4, 3, 5, 6, 7, 9, 8]. In MEC, wireless devices can offload computation-intensive and latency-critical tasks to APs/BSs in close proximity for remote execution, thus achieving much lower computation latency.
The computation offloading design in MEC systems critically relies on tractable computation task models. Two widely adopted task models in the MEC literature are binary and partial offloading [6, 7]. In binary offloading, the computation tasks are not partitionable, and thus must be executed as a whole, either via local computing at the user or by offloading to the MEC server. This practically corresponds to highly integrated or relatively simple tasks such as those in speech recognition and natural language translation. In contrast, in partial offloading, the computation tasks can be partitioned into two or more independent parts, which can be executed in parallel by local computing and offloading. This corresponds to applications with multiple fine-grained procedures/components, e.g., AR applications . Based on the binary and partial offloading models, prior works (see, e.g., [10, 11, 9, 12, 13, 14, 16, 15, 17]) investigated joint computation and communication resource allocation to improve the performance of MEC. For example,  and  considered a single-user MEC system with dynamic task arrivals and channel fading, in which the user jointly optimizes its local computing and offloading decisions to minimize the computation latency, subject to computation and communication resource constraints. References [12, 13, 14] investigated energy-efficient design in multiuser MEC systems where multiple users offload their respective tasks to a single AP/BS for execution, with the objective of minimizing the users’ energy consumption while meeting their computation latency requirements. Furthermore, [16, 15, 17] proposed wireless powered MEC systems that integrate the emerging wireless power transfer (WPT) technology into MEC for self-sustainable computing, where the AP employs WPT to power the users’ local computing and offloading.
Despite this recent progress, multiuser MEC designs still face two key technical challenges. First, the computation resources at the MEC server and the communication resources at the AP must be shared among the actively computing users. When the number of users becomes large, the computation and communication resources allocated to each user are fundamentally limited, which compromises the benefit of MEC. Second, due to signal propagation loss over distance, far-away users may consume much more communication resources than nearby users for offloading, which results in a near-far user fairness issue. Note that 5G networks are expected to comprise a massive number of wireless devices, each with certain computation and communication resources. Due to the bursty nature of wireless traffic, each active device is highly likely to be surrounded by some idle devices with unused or additional resources. As such, in this paper we propose a novel joint computation and communication cooperation approach for multiuser MEC systems, in which nearby users act as helpers that share their computation and communication resources with actively computing users, thereby improving the MEC computation performance.
In this paper, we consider a basic three-node MEC system with user cooperation, which consists of a user node, a helper node, and an AP node attached to an MEC server. Here, the helper node can be an IoT sensor, a smartphone, or a laptop that is nearby and has certain computation and communication resources.¹ We focus on the user’s latency-constrained computation within a given time block. To implement the joint computation and communication cooperation, the block is divided into four time slots. Specifically, in the first slot, the user offloads some computation tasks to the helper for remote execution. In the second and third slots, the helper acts as a decode-and-forward (DF) relay to help the user offload other computation tasks to the AP, for remote execution at the MEC server in the fourth slot. Under this setup, we pursue an energy-efficient user-cooperation MEC design for both the partial and binary offloading cases by jointly optimizing the computation and communication resource allocation. The main results of this paper are summarized as follows.
¹In general, the computation capability of the helper should be comparable to or stronger than that of the user in order for computation cooperation to be feasible.
First, for the partial offloading case, the user’s computation tasks are partitioned into three parts for local computing, offloading to the helper, and offloading to the AP, respectively. To minimize the total energy consumption at both the user and the helper subject to the user’s computation latency constraint, we jointly optimize the user’s task partition, the central processing unit (CPU) frequencies for local computing at both the user and the helper, and the time and transmit power allocations for offloading. Although the resulting problem is non-convex, it can be reformulated into a convex one. Leveraging the Lagrange duality method, we obtain the globally optimal solution in semi-closed form.
Next, for the binary offloading case, the user must execute its non-partitionable computation tasks by choosing one of three computation modes: local computing at the user, computation cooperation (offloading to the helper), or communication cooperation (offloading to the AP). For the resulting latency-constrained energy minimization problem, we develop an efficient optimal algorithm that first chooses the computation mode and then optimizes the corresponding joint computation and communication resource allocation.
Finally, extensive numerical results show that the proposed joint computation and communication cooperation approach achieves significant performance gains in terms of both the computation capacity and the system energy efficiency, compared with other benchmark schemes without such a joint design.
It is worth noting that there have been prior studies on communication cooperation (see, e.g., [19, 18, 20, 21, 22, 24, 23]) and on computation cooperation [25, 26, 27, 28, 29], separately. On the one hand, cooperative communication via relaying has been extensively investigated in wireless communication systems to increase the communication rate and improve reliability [20, 21], and has been applied in various other setups such as wireless powered communication  and wireless powered MEC systems . On the other hand, cooperative computation has emerged as a viable technique in MEC systems, enabling end users to exploit computation resources at nearby wireless devices (instead of APs or BSs). For example, in so-called device-to-device (D2D) fogging  and peer-to-peer (P2P) cooperative computing , users with intensive computing tasks can offload all or part of their tasks to other idle users via D2D or P2P communications for execution. Similar computation task sharing among wireless devices has also been investigated in mobile social networks with crowdsensing  and in mobile wireless sensor networks [26, 27] for data fusion. However, unlike these existing works, which consider either communication or computation cooperation, this work is the first to pursue a joint computation and communication cooperation approach that unifies both to further improve the user’s computation performance. Also note that this work with user cooperation differs from prior works on multiuser MEC systems [12, 13, 14], in which multiple users offload their own computation tasks to the AP/BS for execution without user cooperation.
The remainder of this paper is organized as follows. Section II introduces the system model. Section III formulates the latency-constrained energy minimization problems under the partial and binary offloading models, respectively. Sections IV and V present the optimal solutions to the two problems of interest, respectively. Section VI provides numerical results, followed by the conclusion in Section VII.
II System Model
As shown in Fig. 1, we consider a basic three-node MEC system consisting of one user node, one helper node, and one AP node with an integrated MEC server, where each node is equipped with a single antenna. We focus on a time block with duration , within which the user needs to successfully execute computation tasks with task input-bits. Since we consider latency-critical applications, we assume that the block duration is smaller than the channel coherence time, so that the channel power gains remain unchanged within the block of interest. This assumption has been commonly adopted in prior works [12, 13, 14, 16, 17, 15]. It is further assumed that a central controller collects the global channel state information (CSI) and the computation-related information of the three nodes, and accordingly designs and coordinates the computation and communication cooperation among them. The resulting design serves as a performance upper bound (or energy consumption lower bound) for practical cases in which only partial CSI and computation-related information are known.
Specifically, without loss of generality, the task input-bits can be divided into three parts intended for local computing, offloading to helper, and offloading to AP, respectively. Let , , and denote the numbers of task input-bits for local computing at the user, offloading to the helper, and offloading to the AP, respectively. We then have
We consider two cases: partial offloading and binary offloading. In partial offloading, the computation task can be arbitrarily partitioned into subtasks. Assuming that the number of subtasks is sufficiently large, it is reasonable to approximate , , and as real numbers between 0 and subject to (1). In binary offloading, , , and can only take the value 0 or , and, due to (1), exactly one of them equals .
II-A MEC Protocol With Joint Computation and Communication Cooperation
As shown in Fig. 2, the duration- block is generally divided into four slots for joint computation and communication cooperation. In the first slot with duration , the user offloads the task input-bits to the helper, and the helper can then execute them in the remaining time with duration . In the second and third slots, the helper acts as a DF relay to help the user offload task input-bits to the AP. In the second slot with duration , the user transmits wireless signals containing the task input-bits to both the AP and the helper simultaneously. After successfully decoding the received task input-bits, the helper forwards them to the AP in the third slot with duration . After decoding the signals from the user and the helper, the MEC server can remotely execute the offloaded tasks in the fourth time slot with duration .
As the computation results are normally much smaller than the input bits, the time for downloading the results to the user is negligible compared with the offloading time; we thus ignore the downloading time in this paper. To ensure that the computation tasks are successfully executed before the end of the block, we have the following time constraint
II-B Computation Offloading
In this subsection, we discuss the computation offloading from the user to the helper and the AP, respectively.
II-B1 Computation Offloading to Helper
In the first slot, the user offloads task input-bits to the helper with transmit power . Let denote the channel power gain from the user to the helper, and the system bandwidth. Accordingly, the achievable data rate (in bits/sec) for offloading from the user to the helper is given by
where represents the power of additive white Gaussian noise (AWGN) at the helper, and is a constant term accounting for the gap from the channel capacity due to a practical modulation and coding scheme (MCS). For simplicity, is assumed throughout this paper. Consequently, we have the number of task input-bits as
Furthermore, let denote the maximum transmit power at the user, and thus we have . For computation offloading, we consider the user’s transmission energy as the dominant energy consumption and ignore the energy consumed by circuits in its radio-frequency (RF) chains, baseband signal processing, etc. Therefore, in the first slot, the energy consumption for the user to offload task input-bits to the helper is given by
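The rate and energy model of this subsection can be evaluated numerically. The following is a minimal Python sketch, assuming the rate takes the standard form B·log2(1 + p·g/(Γσ²)) with the gap constant Γ set to 1, and that the user's offloading energy is transmit power times slot length. All function names and the numbers in the example (bandwidth, noise power, channel gain, slot length) are hypothetical. The function `min_offload_energy` inverts the rate to obtain the smallest power, and hence energy, that delivers a given number of bits within a given slot.

```python
import math


def rate(p, g, bandwidth, noise, gap=1.0):
    """Achievable rate in bits/s, B*log2(1 + p*g/(gap*noise)); gap = 1 as assumed in the text."""
    return bandwidth * math.log2(1.0 + p * g / (gap * noise))


def offloaded_bits(tau, p, g, bandwidth, noise):
    """Number of task input-bits delivered in a slot of length tau at constant power p."""
    return tau * rate(p, g, bandwidth, noise)


def min_offload_energy(bits, tau, g, bandwidth, noise):
    """Smallest (energy, power) that delivers `bits` within `tau`, obtained by inverting the rate."""
    p = (2.0 ** (bits / (tau * bandwidth)) - 1.0) * noise / g
    return p * tau, p  # transmission energy only; circuit energy is ignored


# Hypothetical values: B = 1 MHz, noise power 1e-9 W, user-helper gain 1e-6, 10 ms slot.
B, N0, g_uh = 1e6, 1e-9, 1e-6
bits = offloaded_bits(0.01, 0.1, g_uh, B, N0)
energy, p = min_offload_energy(bits, 0.01, g_uh, B, N0)
print(f"{bits:.0f} bits, recovered power {p:.3f} W, energy {energy:.4f} J")
```

Because the required power grows exponentially in bits/(τB), stretching the slot is usually a cheaper way to offload more bits than raising the power, which is the tradeoff the joint time and power allocation exploits.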
II-B2 Computation Offloading to AP Assisted by Helper
In the second and third slots, the helper acts as a DF relay to help the user offload task input-bits to the AP. Denote by the user’s transmit power in the second slot. In this case, the achievable data rate from the user to the helper is given by with defined in (3). Denoting as the channel power gain from the user to the AP, the achievable data rate from the user to the AP is
where is the noise power at the AP receiver.
After successfully decoding the received message, the helper forwards it to the AP in the third slot with the transmit power , where denotes the maximum transmit power at the helper. Let denote the channel power gain from the helper to the AP. The achievable data rate from the helper to the AP is thus
As in (5), we consider the user’s and helper’s transmission energy consumption for offloading as the dominant energy consumption in both the second and third slots. Therefore, we have
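To make the DF relaying step concrete, the sketch below evaluates how many bits the user can deliver to the AP in the second and third slots and the corresponding transmission energies. It assumes the standard two-hop DF bound in which the helper must decode the user's message in the second slot and the AP combines the user's direct transmission with the helper's forwarded one, i.e., bits ≤ min(τ₂·r_uh, τ₂·r_ua + τ₃·r_ha). This form, together with all names and parameter values, is an assumption for illustration rather than a reproduction of (8).

```python
import math


def rate(p, g, bandwidth, noise):
    """Achievable rate in bits/s with the gap constant set to 1."""
    return bandwidth * math.log2(1.0 + p * g / noise)


def df_offload_bits(tau2, tau3, p2, p3, g_uh, g_ua, g_ha, bandwidth, noise_h, noise_a):
    """Bits the user can deliver to the AP via DF relaying (assumed two-hop bound)."""
    helper_limit = tau2 * rate(p2, g_uh, bandwidth, noise_h)  # helper must decode in slot 2
    ap_limit = (tau2 * rate(p2, g_ua, bandwidth, noise_a)     # direct user->AP signal in slot 2
                + tau3 * rate(p3, g_ha, bandwidth, noise_a))  # helper->AP forwarding in slot 3
    return min(helper_limit, ap_limit)


def df_offload_energy(tau2, tau3, p2, p3):
    """Transmission energy (user, helper) in slots 2 and 3; circuit energy is ignored."""
    return p2 * tau2, p3 * tau3


# Hypothetical link: the user is far from the AP (weak direct gain) but close to the helper.
bits = df_offload_bits(0.01, 0.01, 0.1, 0.1, 1e-6, 1e-8, 1e-6, 1e6, 1e-9, 1e-9)
e_user, e_helper = df_offload_energy(0.01, 0.01, 0.1, 0.1)
print(f"{bits:.0f} bits delivered; energy user {e_user:.4f} J, helper {e_helper:.4f} J")
```

In this example the user-to-helper hop is the bottleneck, so the extra capacity of the AP side is wasted; a joint design would shorten the third slot or lower the helper's power accordingly.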
II-C Computing at User, Helper, and AP
In this subsection, we describe the computing models at the user, the helper, and the AP, respectively.
II-C1 Local Computing at User
The user executes the computation tasks with task input-bits over the whole block. In practice, the number of CPU cycles required to execute a computation task depends heavily on various factors, such as the specific application, the number of task input-bits, and the hardware (e.g., CPU and memory) architecture of the computing device . To characterize the most essential computation and communication tradeoff, and as commonly adopted in the literature (e.g., [10, 16, 12, 11, 15, 13, 14]), we assume that the number of CPU cycles for this task is a linear function of the number of task input-bits, where denotes the number of CPU cycles for computing each task input-bit at the user. Also, let denote the CPU frequency for the -th cycle, where . Note that the CPU frequency is constrained by a maximum value, denoted by , i.e.,
As the local computing for the task input-bits should be successfully accomplished before the end of the block, we have the following computation latency requirement
where denotes the effective capacitance coefficient that depends on the chip architecture of the user . It has been shown in [15, Lemma 1] that, to minimize the computation energy consumption under a computation latency requirement, it is optimal to set identical CPU frequencies for all CPU cycles. Using this fact and letting the constraint in (12) be met with equality (which minimizes the computation energy consumption), we have
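The uniform-frequency result can be illustrated numerically. The sketch below assumes the common dynamic-energy model in which each CPU cycle at frequency f costs κf², so that running C·ℓ cycles at the identical frequency f = Cℓ/T (latency constraint tight) costs κC³ℓ³/T². The function names and numbers are hypothetical; the test compares against a non-uniform two-speed schedule with the same deadline, which costs more, consistent with [15, Lemma 1].

```python
def local_computing(bits, T, cycles_per_bit, kappa, f_max):
    """Uniform-frequency local computing that just meets the deadline T.

    Returns (frequency in Hz, energy in J); raises if f_max cannot meet the deadline.
    """
    n_cycles = cycles_per_bit * bits
    f = n_cycles / T                    # identical frequency for every cycle, deadline tight
    if f > f_max:
        raise ValueError("infeasible: required CPU frequency exceeds f_max")
    energy = kappa * n_cycles * f ** 2  # equals kappa * C**3 * bits**3 / T**2
    return f, energy


def max_local_bits(T, cycles_per_bit, f_max):
    """Largest number of bits the user can process alone within T."""
    return T * f_max / cycles_per_bit


# Hypothetical values: 1e5 bits, T = 0.1 s, 1000 cycles/bit, kappa = 1e-28, f_max = 2 GHz.
f, e = local_computing(1e5, 0.1, 1000, 1e-28, 2e9)
print(f"f = {f:.3g} Hz, energy = {e:.3g} J, capacity = {max_local_bits(0.1, 1000, 2e9):.3g} bits")
```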
II-C2 Cooperative Computing at Helper
After receiving the offloaded task input-bits in the first time slot, the helper executes the tasks during the remaining time with duration . Let and denote the CPU frequency for the -th CPU cycle and the maximum CPU frequency at the helper, respectively. Similar to the local computing at the user, it is optimal for the helper to set the CPU frequency for the -th CPU cycle as , , where is the number of CPU cycles for computing one task input-bit at the helper. Accordingly, the energy consumption for cooperative computing at the helper is given by
where is the effective capacitance coefficient of the helper.
Similar to (16), the number of task input-bits is constrained by
where denotes the maximum CPU frequency for the helper.
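The helper's computing window shrinks as the first slot grows, which is the coupling the joint design must handle. The sketch below assumes the same uniform-frequency energy model as at the user, applied over the window T − τ₁, i.e., energy κ_h·C_h³·ℓ_h³/(T − τ₁)², with feasibility C_h·ℓ_h/(T − τ₁) ≤ f_max,h. All names and numbers are hypothetical.

```python
def helper_computing(bits_h, T, tau1, cycles_per_bit_h, kappa_h, f_max_h):
    """Helper computes bits_h over the window T - tau1 left after slot 1.

    Returns (frequency, energy), or None when even f_max_h cannot finish in time.
    """
    window = T - tau1
    if window <= 0:
        return None
    f = cycles_per_bit_h * bits_h / window  # uniform frequency that just meets the deadline
    if f > f_max_h:
        return None
    energy = kappa_h * cycles_per_bit_h ** 3 * bits_h ** 3 / window ** 2
    return f, energy


# Hypothetical helper: 1000 cycles/bit, kappa_h = 1e-28, f_max_h = 1 GHz, T = 0.1 s, 5e4 bits.
for tau1 in (0.01, 0.02, 0.04):
    f, e = helper_computing(5e4, 0.1, tau1, 1000, 1e-28, 1e9)
    print(f"tau1 = {tau1:.2f} s -> f = {f:.3g} Hz, energy = {e:.3g} J")
```

A longer first slot lowers the user's offloading power but raises the helper's computing energy, so neither extreme is generally optimal.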
II-C3 Remote Computing at AP (MEC Server)
In the fourth slot, the MEC server at the AP executes the offloaded task input-bits. To minimize the remote execution time, the MEC server executes the offloaded tasks at its maximum CPU frequency, denoted by . Hence, the time required for the MEC server to execute the offloaded bits is
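The fourth slot closes the time budget of the protocol. The sketch below assumes the MEC execution time is C_ap·ℓ_ap/f_ap and that the four slots are sequential, so that the time constraint reads τ₁ + τ₂ + τ₃ + τ₄ ≤ T. The names, tolerance, and numbers are hypothetical.

```python
def mec_time(bits_ap, cycles_per_bit_ap, f_ap):
    """Slot-4 duration when the MEC server runs at its maximum frequency f_ap."""
    return cycles_per_bit_ap * bits_ap / f_ap


def schedule_fits(T, tau1, tau2, tau3, bits_ap, cycles_per_bit_ap, f_ap, tol=1e-12):
    """Check the assumed sequential time budget tau1 + tau2 + tau3 + tau4 <= T."""
    tau4 = mec_time(bits_ap, cycles_per_bit_ap, f_ap)
    used = tau1 + tau2 + tau3 + tau4
    return used <= T + tol, used


# Hypothetical schedule: 6e4 bits to the AP, 1000 cycles/bit, f_ap = 10 GHz, T = 0.1 s.
ok, used = schedule_fits(0.1, 0.02, 0.03, 0.02, 6e4, 1000, 1e10)
print(ok, round(used, 6))
```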
III Problem Formulation
In this paper, we pursue an energy-efficient design for the three-node MEC system. As the AP normally has a reliable power supply, we use the energy consumption at the wireless devices (i.e., the user and the helper) as the performance metric. In particular, we aim to minimize the total energy consumption at the user and the helper (i.e., ), subject to the user’s computation latency constraint , by optimizing the user’s task partition as well as the joint computation and communication resource allocation. The design variables include the time allocation vector of the slots, the user’s task partition vector , and the transmit power allocation vector for offloading at the user and the helper.
In the case with partial offloading, the latency-constrained energy minimization problem is formulated as
where (1) denotes the task partition constraint, (16) and (18) are the maximum CPU frequency constraints at the user and the helper, respectively, (20) denotes the time allocation constraint, and (21b) and (21c) denote the constraints on the numbers of bits offloaded from the user to the helper and to the AP, respectively. Note that in problem (P1), we replace the two equalities in (4) and (8) with the inequality constraints (21b) and (21c); it is immediate that (21b) and (21c) are met with equality at the optimum of problem (P1). Also note that problem (P1) is non-convex due to the coupling of and in the objective function (21a) and in constraints (21b) and (21c). Nonetheless, in Section IV we will transform (P1) into an equivalent convex problem and then present an efficient algorithm to obtain its optimal solution in semi-closed form.
In the case with binary offloading, the latency-constrained energy minimization problem is formulated as
Note that problem (P2) is a mixed-integer nonlinear program (MINLP)  due to the integer variables , , and . In Section V, we will develop an efficient algorithm that solves problem (P2) optimally by examining the three computation modes in turn.
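As an illustration of what examining one mode involves, the sketch below treats only the computation-cooperation mode, in which all task input-bits go to the helper and the second to fourth slots are unused. It assumes the user's offloading energy τ₁(2^{L/(τ₁B)} − 1)σ²/g and the helper's computing energy κ_h·C_h³·L³/(T − τ₁)², both convex in τ₁, so a one-dimensional ternary search over the feasible range of τ₁ suffices. All names and values are hypothetical, and this is not presented as the algorithm of Section V; a complete solver would also evaluate the local-computing and communication-cooperation modes and pick the feasible mode with the least energy.

```python
import math


def comp_coop_energy(L, T, B, P_u, g_uh, noise_h, kappa_h, cycles_per_bit_h, f_max_h,
                     iters=200):
    """Min user+helper energy when all L bits go to the helper (binary offloading).

    Returns (energy, tau1), or (math.inf, None) if the mode is infeasible.
    """
    r_max = B * math.log2(1.0 + P_u * g_uh / noise_h)
    tau_lo = L / r_max                            # slot 1 cannot be shorter (power cap P_u)
    tau_hi = T - cycles_per_bit_h * L / f_max_h   # helper needs this long at f_max_h
    if tau_lo > tau_hi:
        return math.inf, None

    def energy(tau1):
        tx = tau1 * (2.0 ** (L / (tau1 * B)) - 1.0) * noise_h / g_uh       # user offloading
        comp = kappa_h * cycles_per_bit_h ** 3 * L ** 3 / (T - tau1) ** 2  # helper computing
        return tx + comp

    lo, hi = tau_lo, tau_hi
    for _ in range(iters):  # ternary search: energy(tau1) is convex on [tau_lo, tau_hi]
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if energy(m1) <= energy(m2):
            hi = m2
        else:
            lo = m1
    tau1 = 0.5 * (lo + hi)
    return energy(tau1), tau1


# Hypothetical instance: 5e4 bits, T = 0.1 s, B = 1 MHz, P_u = 0.1 W, helper at most 1 GHz.
E, tau1 = comp_coop_energy(5e4, 0.1, 1e6, 0.1, 1e-6, 1e-9, 1e-28, 1000, 1e9)
print(f"computation cooperation: energy {E:.4g} J with tau1 = {tau1 * 1e3:.2f} ms")
```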
III-A Feasibility of and
Before solving problems (P1) and (P2), we first check their feasibility, i.e., whether the MEC system of interest can support the latency-constrained task execution. Let and denote the maximum numbers of task input-bits supported by the MEC system within the duration- block in the partial and binary offloading cases, respectively. Evidently, if (or ), then problem (P1) (or (P2)) is feasible; otherwise, the corresponding problem is infeasible. Therefore, checking the feasibility of problems (P1) and (P2) amounts to determining and , respectively.
First, consider the partial offloading case. The maximum number of task input-bits is attained when the three nodes fully use their available communication and computation resources. This corresponds to setting as , , and letting constraints (16), (18), (20), and (21c) be met with equality in problem (P1). As a result, is the optimal value of the following problem:
Note that problem (23) is a linear program (LP) and can thus be efficiently solved via standard convex optimization techniques such as the interior point method. The feasibility of problem (P1) is then checked by comparing with .
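For intuition, the following sketch computes the partial-offloading capacity without a general LP solver. It assumes that (23) is the LP obtained by fixing all transmit powers and CPU frequencies at their maxima, with the DF bound min(τ₂·a, τ₂·b + τ₃·c) on the bits sent to the AP (a, b, c being the full-power user-helper, user-AP, and helper-AP rates) and the sequential budget τ₁ + τ₂ + τ₃ + τ₄ ≤ T. Under that assumed form, the AP path is homogeneous in its time budget T − τ₁, so the objective is concave and piecewise linear in τ₁ alone and one of its breakpoints is optimal. All names and numbers are hypothetical; an interior-point or simplex solver applied to the actual (23) is the general route.

```python
import math


def rate(p, g, bandwidth, noise):
    return bandwidth * math.log2(1.0 + p * g / noise)


def ap_path_speed(a, b, c, mec_speed):
    """Bits per second of time budget along user -> AP (DF via helper) -> MEC server.

    a, b, c: full-power rates user->helper, user->AP, helper->AP; mec_speed: f_ap / C_ap.
    Per offloaded bit, slots 2 and 3 need min(tau2 + tau3) subject to
    tau2 >= 1/a, b*tau2 + c*tau3 >= 1, tau3 >= 0; the minimum lies at a vertex.
    """
    comm = 1.0 / a + max(0.0, (1.0 - b / a) / c)  # shortest slot 2, helper tops up the AP
    if b < a:
        comm = min(comm, 1.0 / b)                  # longer slot 2, no forwarding needed
    return 1.0 / (comm + 1.0 / mec_speed)          # slot 4 adds 1/mec_speed per bit


def partial_bits(tau1, T, r1, rho, F_u, F_h):
    """Bits processed for a given slot-1 length when every resource is fully used."""
    helper = min(tau1 * r1, (T - tau1) * F_h)  # receive in slot 1, compute in T - tau1
    return T * F_u + helper + (T - tau1) * rho


def max_bits_partial(T, B, P_u, P_h, g_uh, g_ua, g_ha, N_h, N_a, F_u, F_h, F_ap):
    """Computation capacity under partial offloading; F_* are f_max / cycles_per_bit."""
    r1 = rate(P_u, g_uh, B, N_h)
    rho = ap_path_speed(r1, rate(P_u, g_ua, B, N_a), rate(P_h, g_ha, B, N_a), F_ap)
    # partial_bits is concave and piecewise linear in tau1, so a breakpoint is optimal.
    candidates = (0.0, T * F_h / (r1 + F_h), T)
    tau1 = max(candidates, key=lambda t: partial_bits(t, T, r1, rho, F_u, F_h))
    return partial_bits(tau1, T, r1, rho, F_u, F_h), tau1, r1, rho


params = dict(T=0.1, B=1e6, P_u=0.1, P_h=0.1, g_uh=1e-6, g_ua=1e-8, g_ha=1e-6,
              N_h=1e-9, N_a=1e-9, F_u=1e6, F_h=1e6, F_ap=1e7)
L_par, tau1, r1, rho = max_bits_partial(**params)
print(f"L_max (partial) ~ {L_par:.0f} bits with tau1 = {tau1 * 1e3:.2f} ms")
```

In this hypothetical instance the optimum balances slot 1 so that the helper receives exactly what it can compute in the remaining time.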
Next, consider the binary offloading case. The user’s computation tasks can only be executed by one of the three computation modes, namely the local computing, computation cooperation (offloading to helper), and communication cooperation (offloading to AP). For the three modes, the maximum numbers of supportable task input-bits can be readily obtained, as stated in the following.
For the local-computing mode, we have . With the maximum CPU frequency and setting (16) to be tight, the maximum supportable number of task input-bits is given by
For the communication-cooperation mode, we have , , and in problem (P2). The maximum number of task input-bits is obtained by solving the following LP:
By comparing with , the feasibility of problem (P2) is checked.
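The per-mode capacities for binary offloading have simple forms under assumed models: local computing gives T·f_max/C at the user; computation cooperation balances reception at full-power rate r₁ during τ₁ against helper computing during T − τ₁, giving T·r₁F_h/(r₁ + F_h); and communication cooperation spends the whole block on DF offloading plus MEC execution. The sketch below assumes the same DF bound and sequential slot budget as in Section II and uses hypothetical names and numbers; the binary capacity is the largest of the three.

```python
import math


def rate(p, g, bandwidth, noise):
    return bandwidth * math.log2(1.0 + p * g / noise)


def ap_path_speed(a, b, c, mec_speed):
    """Bits per second along user -> AP (DF via helper) -> MEC server (vertex argument)."""
    comm = 1.0 / a + max(0.0, (1.0 - b / a) / c)
    if b < a:
        comm = min(comm, 1.0 / b)
    return 1.0 / (comm + 1.0 / mec_speed)


def max_bits_binary(T, B, P_u, P_h, g_uh, g_ua, g_ha, N_h, N_a, F_u, F_h, F_ap):
    """Per-mode computation capacity under binary offloading; F_* are f_max / cycles_per_bit."""
    r1 = rate(P_u, g_uh, B, N_h)
    modes = {
        # all bits computed locally at the maximum CPU frequency
        "local": T * F_u,
        # receive in tau1 at rate r1, compute in T - tau1 at speed F_h; balance the two
        "computation_cooperation": T * r1 * F_h / (r1 + F_h),
        # whole block spent on DF offloading to the AP plus remote execution
        "communication_cooperation": T * ap_path_speed(
            r1, rate(P_u, g_ua, B, N_a), rate(P_h, g_ha, B, N_a), F_ap),
    }
    return max(modes.values()), modes


L_bin, modes = max_bits_binary(T=0.1, B=1e6, P_u=0.1, P_h=0.1, g_uh=1e-6, g_ua=1e-8,
                               g_ha=1e-6, N_h=1e-9, N_a=1e-9, F_u=1e6, F_h=1e6, F_ap=1e7)
for name, bits in modes.items():
    print(f"{name:>26}: {bits:.0f} bits")
print("best mode:", max(modes, key=modes.get))
```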
By comparing and , we can show that . This is expected, since any feasible solution to problem (P2) is also feasible for problem (P1), but the reverse is generally not true. In other words, the partial offloading case can better utilize the distributed computation resources at different nodes, and thus generally achieves a higher computation capacity than the binary offloading case.
IV Optimal Solution to (P1)
In this section, we present an efficient algorithm for optimally solving problem (P1) in the partial offloading case.
Towards this end, we introduce an auxiliary variable vector with for all . Then it holds that if , and if either or for any . By substituting , , problem (P1) can be reformulated as
Lemma IV.1: Problem (P1.1) is a convex problem.
Proof: The function is a concave function with respect to for any , and thus its perspective function is jointly concave with respect to and . As a result, the set defined by constraints (28b)–(28d) becomes convex. The function is a convex function with respect to and , and hence the term in the objective function is jointly convex with respect to and . Therefore, problem (P1.1) is convex.
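The two convexity facts used in the proof can be sanity-checked numerically. The sketch below assumes, as is common in such reformulations, that the auxiliary variable plays the role of transmit energy e = p·τ, so a rate term becomes τ·log2(1 + e/τ) (up to constants), the perspective of the concave map x ↦ log2(1 + x); and that a computation-energy term has the form x³/t², the perspective of the convex map x ↦ x³ on x ≥ 0. A random midpoint test is only a sanity check, not a proof, and the function names are hypothetical.

```python
import math
import random


def perspective_rate(e, t):
    """t * log2(1 + e / t): perspective of the concave map x -> log2(1 + x), t > 0."""
    return t * math.log2(1.0 + e / t)


def perspective_cubic(x, t):
    """x**3 / t**2 = t * (x / t)**3: perspective of the convex map x -> x**3 on x >= 0."""
    return x ** 3 / t ** 2


def worst_midpoint_violations(trials=20000, seed=1):
    """Largest observed violation of midpoint concavity / convexity over random pairs."""
    rng = random.Random(seed)
    worst_concave = worst_convex = 0.0
    for _ in range(trials):
        x1, x2 = rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0)
        t1, t2 = rng.uniform(0.01, 1.0), rng.uniform(0.01, 1.0)
        xm, tm = (x1 + x2) / 2, (t1 + t2) / 2
        avg_r = (perspective_rate(x1, t1) + perspective_rate(x2, t2)) / 2
        avg_c = (perspective_cubic(x1, t1) + perspective_cubic(x2, t2)) / 2
        worst_concave = max(worst_concave, avg_r - perspective_rate(xm, tm))  # > 0 breaks concavity
        worst_convex = max(worst_convex, perspective_cubic(xm, tm) - avg_c)   # > 0 breaks convexity
    return worst_concave, worst_convex


print(worst_midpoint_violations())
```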
As stated in Lemma IV.1, problem (P1.1) is convex and can thus be optimally solved by the standard interior point method . Alternatively, to gain essential engineering insights, we next leverage the Lagrange duality method to obtain a well-structured optimal solution for problem (P1.1).
Let , , and denote the dual variables associated with the constraints in (28b–d), respectively, and be the dual variables associated with the constraints in (20) and (1), respectively. Define and . The partial Lagrangian of problem (P1.1) is given by
Then the dual function of problem (P1.1) is given by
Lemma IV.2: In order for the dual function to be bounded from below, it must hold that .
Proof: See Appendix A.
Based on Lemma IV.2, the dual problem of problem (P1.1) is given by
Denote and as the feasible set and the optimal solution of for problem (D1.1), respectively.
Since problem is convex and satisfies Slater’s condition, strong duality holds between problems and . As a result, one can solve problem by equivalently solving its dual problem . In the following, we first evaluate the dual function under any given , and then obtain the optimal dual variables that maximize . Denote by the optimal solution to problem (29) under any given , and by the optimal primal solution to problem .
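The two-step structure (closed-form primal minimizers for fixed dual variables, then an outer search over the dual variables) can be seen on a deliberately simplified toy: splitting ℓ₁ + ℓ₂ = L bits between two nodes with cubic computing energies a_i·ℓ_i³, in normalized units. For a fixed multiplier ν the Lagrangian separates and ℓ_i = √(ν/(3a_i)); the dual function is concave with derivative L − Σℓ_i(ν), so bisection on ν maximizes it. This toy and its coefficients are hypothetical and are not the subproblems (31)–(35) or the dual update used for (P1.1); zero duality gap holds because the toy is convex with a Slater point.

```python
import math


def primal_given_dual(nu, coeffs):
    """Minimize sum_i (a_i * l_i**3 - nu * l_i) over l_i >= 0: l_i = sqrt(nu / (3 a_i))."""
    return [math.sqrt(nu / (3.0 * a)) for a in coeffs]


def dual_function(nu, coeffs, total_bits):
    """g(nu) = min_l [sum a_i l_i^3 - nu * (sum l_i - L)], evaluated in closed form."""
    ls = primal_given_dual(nu, coeffs)
    return sum(a * l ** 3 - nu * l for a, l in zip(coeffs, ls)) + nu * total_bits


def solve_by_duality(coeffs, total_bits, iters=200):
    """Maximize g(nu) by bisection on its derivative, total_bits - sum_i l_i(nu)."""
    lo, hi = 0.0, 1.0
    while sum(primal_given_dual(hi, coeffs)) < total_bits:
        hi *= 2.0  # enlarge the bracket until the partition covers all bits
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(primal_given_dual(mid, coeffs)) < total_bits:
            lo = mid  # too few bits assigned: raise the price nu
        else:
            hi = mid
    nu = 0.5 * (lo + hi)
    return nu, primal_given_dual(nu, coeffs)


# Toy instance in normalized units: energy coefficients 1 and 4, L = 3.
coeffs, L = [1.0, 4.0], 3.0
nu, ls = solve_by_duality(coeffs, L)
primal = sum(a * l ** 3 for a, l in zip(coeffs, ls))
print(nu, ls, primal, dual_function(nu, coeffs, L))
```

The node with the cheaper energy coefficient receives more bits (here twice as many), and the primal and dual values coincide at the optimum.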
IV-A Derivation of Dual Function
The optimal solutions to problems (31)–(35) are presented in the following Lemmas IV.3–IV.7, respectively. As these lemmas can be similarly proved via the Karush-Kuhn-Tucker (KKT) conditions , we only show the proof of Lemma IV.3 in Appendix B and omit the proofs of Lemmas IV.4–IV.7 for brevity.
Lemma IV.3: Under given , the optimal solution to problem (31) satisfies
where with , and
Proof: See Appendix B.
Lemma IV.4: Under given , the optimal solution to problem (32) satisfies
where with , , , , and
Lemma IV.5: Under given , the optimal solution to problem (33) satisfies
where and with