I. Introduction
Recent advancements in fifth-generation (5G) cellular technologies have enabled various new applications such as augmented reality (AR), autonomous driving, and the Internet of Things (IoT). These applications demand ultra-low-latency communication, computation, and control among a large number of wireless devices (e.g., sensors and actuators) [2]. In practice, the real-time computation tasks to be executed can be quite intensive, but wireless devices are generally of small size and have only limited communication, computation, and storage resources (see, e.g., [3]). Therefore, how to enhance their computation capabilities and reduce the computation latency is a crucial but challenging issue to be tackled for making these 5G applications a reality.
Conventionally, mobile cloud computing (MCC) has been widely adopted to enhance wireless devices’ computation capabilities, by moving their computing and data storage to the remote centralized cloud [4]. However, as cloud servers are normally distant from wireless devices, MCC may not be able to meet the stringent computation latency requirements of emerging 5G applications. To overcome such limitations, mobile edge computing (MEC) has recently been proposed as a new solution that provides cloud-like computing at the edge of wireless networks (e.g., access points (APs) and cellular base stations (BSs)), by deploying distributed MEC servers therein [4, 3, 5, 6, 7, 9, 8]. In MEC, wireless devices can offload computation-intensive and latency-critical tasks to APs/BSs in close proximity for remote execution, thus achieving much lower computation latency.
The computation offloading design in MEC systems critically relies on tractable computation task models. Two widely adopted task models in the MEC literature are binary and partial offloading, respectively [6, 7]. In binary offloading, the computation tasks are not partitionable, and thus should be executed as a whole via either local computing at the user or offloading to the MEC server. This practically corresponds to highly integrated or relatively simple tasks such as those in speech recognition and natural language translation. In contrast, under partial offloading, the computation tasks can be partitioned into two or more independent parts, which can be executed in parallel by local computing and offloading. This corresponds to applications with multiple fine-grained procedures/components, e.g., AR applications [7]. Based on the binary and partial offloading models, prior works (see, e.g., [10, 11, 9, 12, 13, 14, 16, 15, 17]) investigated the joint computation and communication resource allocation to improve the performance of MEC. For example, [10] and [11] considered a single-user MEC system with dynamic task arrivals and channel fading, in which the user jointly optimizes the local computing and offloading decisions to minimize the computation latency, subject to the computation and communication resource constraints. [12, 13, 14] investigated the energy-efficient design in multi-user MEC systems with multiple users offloading their respective tasks to a single AP/BS for execution, in which the objective is to minimize the users’ energy consumption while ensuring their computation latency requirements. Furthermore, [16, 15, 17] proposed wireless powered MEC systems by integrating the emerging wireless power transfer (WPT) technology into MEC for self-sustainable computing, where the AP employs WPT to power the users’ local computing and offloading.
Despite the recent research progress, multi-user MEC designs still face several technical challenges. First, the computation resources at the MEC server and the communication resources at the AP must be shared among the actively-computing users. As the number of users grows, the computation and communication resources allocated to each user become fundamentally limited, thus compromising the benefit of MEC. Next, due to signal propagation loss over distance, far-apart users may consume far more communication resources than nearby users for offloading, which results in a near-far user fairness issue. Note that 5G networks are expected to consist of massive numbers of wireless devices, each with certain computation and communication resources. Due to the bursty nature of wireless traffic, each active device is highly likely to be surrounded by some idle devices with unused or spare resources. As such, in this paper we propose a novel joint computation and communication cooperation approach in multi-user MEC systems, such that nearby users are enabled as helpers to share their computation and communication resources with actively-computing users, thereby improving the MEC computation performance.
In this paper, we consider a basic three-node MEC system with user cooperation, which consists of a user node, a helper node, and an AP node attached with an MEC server. Here, the helper node can be an IoT sensor, a smartphone, or a laptop, which is nearby and has certain computation and communication resources.¹ (¹In general, the computation capability of the helper should be comparable to or stronger than that of the user in order for the computation cooperation to be feasible.) We focus on the user’s latency-constrained computation within a given time block. To implement the joint computation and communication cooperation, the block is divided into four time slots. Specifically, in the first slot, the user offloads some computation tasks to the helper for remote execution. In the second and third slots, the helper acts as a decode-and-forward (DF) relay to help the user offload some other computation tasks to the AP, for remote execution at the MEC server in the fourth slot. Under this setup, we pursue an energy-efficient user cooperation MEC design for both the partial and binary offloading cases, by jointly optimizing the computation and communication resource allocations. The main results of this paper are summarized as follows.

First, for the partial offloading case, the user’s computation tasks are partitioned into three parts for local computing, offloading to the helper, and offloading to the AP, respectively. Towards minimizing the total energy consumption at both the user and the helper subject to the user’s computation latency constraint, we jointly optimize the task partition of the user, the central processing unit (CPU) frequencies for local computing at both the user and the helper, as well as the time and transmit power allocations for offloading. Although the problem of interest is non-convex in general, it can be reformulated into a convex one. Leveraging the Lagrange duality method, we obtain the globally optimal solution in a semi-closed form.

Next, for the binary offloading case, the user should execute the non-partitionable computation tasks by choosing one among three computation modes, i.e., local computing at the user, computation cooperation (offloading to the helper), and communication cooperation (offloading to the AP). Solving the resultant latency-constrained energy minimization problem, we develop an efficient optimal algorithm, by first choosing the computation mode and then optimizing the corresponding joint computation and communication resource allocation.

Finally, extensive numerical results show that the proposed joint computation and communication cooperation approach achieves significant performance gains in terms of both the computation capacity and the system energy efficiency, compared with other benchmark schemes without such a joint design.
It is worth noting that there have been prior studies on communication cooperation (see, e.g., [19, 18, 20, 21, 22, 24, 23]) and computation cooperation [25, 26, 27, 28, 29], respectively. On one hand, cooperative communication via relaying has been extensively investigated in wireless communication systems to increase the communication rate and improve the communication reliability [20, 21], and has been applied in various other setups such as wireless powered communication [22] and wireless powered MEC systems [24]. On the other hand, cooperative computation has emerged as a viable technique in MEC systems, which enables end users to exploit computation resources at nearby wireless devices (instead of APs or BSs). For example, in so-called device-to-device (D2D) fogging [28] and peer-to-peer (P2P) cooperative computing [29], users with intensive computing tasks can offload all or part of the tasks to other idle users via D2D or P2P communications for execution. Similar computation task sharing among wireless devices has also been investigated in mobile social networks with crowdsensing [25] and in mobile wireless sensor networks [26, 27] for data fusion. However, different from these existing works with either communication or computation cooperation, this work is the first to pursue a joint computation and communication cooperation approach, unifying both of them to further improve the user’s computation performance. Also note that this work with user cooperation is different from the prior works on multi-user MEC systems [12, 13, 14], in which multiple users offload their own computation tasks to the AP/BS for execution, without user cooperation considered.
The remainder of this paper is organized as follows. Section II introduces the system model. Section III formulates the latencyconstrained energy minimization problems under the partial and binary offloading models, respectively. Sections IV and V present the optimal solutions to the two problems of our interests, respectively. Section VI provides numerical results, followed by the conclusion in Section VII.
II. System Model
As shown in Fig. 1, we consider a basic three-node MEC system consisting of one user node, one helper node, and one AP node with an MEC server integrated, in which each of the three nodes is equipped with a single antenna. We focus on a time block with duration $T$, within which the user needs to successfully execute a computation task with $L$ task input-bits. By considering a latency-critical application, we assume that the block duration $T$ is smaller than the channel coherence time, such that the channel power gains remain unchanged within the block of interest. Such an assumption has been commonly adopted in prior works [12, 13, 14, 16, 17, 15]. It is further assumed that there is a central controller that can collect the global channel state information (CSI) and the computation-related information of the three nodes; accordingly, the central controller can design and coordinate the computation and communication cooperation among the three nodes. This serves as a performance upper bound (or energy consumption lower bound) for practical cases where only partial CSI and computation-related information are known.
Specifically, without loss of generality, the $L$ task input-bits can be divided into three parts intended for local computing, offloading to the helper, and offloading to the AP, respectively. Let $l_0$, $l_1$, and $l_2$ denote the numbers of task input-bits for local computing at the user, offloading to the helper, and offloading to the AP, respectively. We then have

$$l_0 + l_1 + l_2 = L. \quad (1)$$
Consider the two cases with partial offloading and binary offloading, respectively. In partial offloading, the computation task can be arbitrarily partitioned into subtasks. By assuming that the number of subtasks is sufficiently large, it is reasonable to approximate $l_0$, $l_1$, and $l_2$ as real numbers between $0$ and $L$ subject to (1). In binary offloading, each of $l_0$, $l_1$, and $l_2$ can only be set to $0$ or $L$, and exactly one of them equals $L$ due to (1).
II-A MEC Protocol With Joint Computation and Communication Cooperation
As shown in Fig. 2, the duration-$T$ block is generally divided into four slots for joint computation and communication cooperation. In the first slot with duration $t_1$, the user offloads the $l_1$ task input-bits to the helper, and the helper can then execute them in the remaining time with duration $T - t_1$. In the second and third slots, the helper acts as a DF relay to help the user offload the $l_2$ task input-bits to the AP. In the second slot with duration $t_2$, the user transmits wireless signals containing the $l_2$ task input-bits to both the AP and the helper simultaneously. After successfully decoding the received task input-bits, the helper forwards them to the AP in the third slot with duration $t_3$. After decoding the signals from the user and the helper, the MEC server remotely executes the offloaded tasks in the fourth time slot with duration $t_4$.
As the computation results are normally of much smaller size than the task input-bits, the time for downloading the results back to the user is negligible compared to the offloading time; we therefore ignore the downloading time in this paper. In order to ensure that the computation tasks are successfully executed before the end of the block, we have the following time constraint:

$$t_1 + t_2 + t_3 + t_4 \le T. \quad (2)$$
II-B Computation Offloading
In this subsection, we discuss the computation offloading from the user to the helper and the AP, respectively.
II-B1 Computation Offloading to Helper
In the first slot, the user offloads the $l_1$ task input-bits to the helper with transmit power $p_1$. Let $g_0$ denote the channel power gain from the user to the helper, and $B$ the system bandwidth. Accordingly, the achievable data rate (in bits/sec) for offloading from the user to the helper is given by

$$r_1 = B \log_2\Big(1 + \frac{g_0 p_1}{\Gamma \sigma_h^2}\Big), \quad (3)$$

where $\sigma_h^2$ represents the power of the additive white Gaussian noise (AWGN) at the helper, and $\Gamma \ge 1$ is a constant accounting for the gap from the channel capacity due to a practical modulation and coding scheme (MCS). For simplicity, $\Gamma = 1$ is assumed throughout this paper. Consequently, the number of offloaded task input-bits is

$$l_1 = t_1 B \log_2\Big(1 + \frac{g_0 p_1}{\sigma_h^2}\Big). \quad (4)$$
Furthermore, let $P_u^{\max}$ denote the maximum transmit power at the user; thus we have $p_1 \le P_u^{\max}$. For computation offloading, we consider the user’s transmission energy as the dominant energy consumption and ignore the energy consumed by circuits in its radio-frequency (RF) chains, baseband signal processing, etc. Therefore, in the first slot, the energy consumption for the user to offload the $l_1$ task input-bits to the helper is given by

$$E_1 = t_1 p_1. \quad (5)$$
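To make the slot-1 offloading relations concrete, the following minimal sketch evaluates the rate–bits–energy chain above. The function name and the numerical values are illustrative assumptions, not from the paper; the capacity gap is taken as $\Gamma = 1$, as assumed in the text.

```python
import math

def offload_bits_and_energy(t1, p1, g0, sigma2_h, B):
    """Bits delivered to the helper in slot 1 and the user's transmit energy.

    Implements r1 = B*log2(1 + g0*p1/sigma2_h) (capacity gap Gamma = 1),
    l1 = t1*r1, and E1 = t1*p1, following eqs. (3)-(5).
    """
    r1 = B * math.log2(1.0 + g0 * p1 / sigma2_h)  # achievable rate, bits/sec
    l1 = t1 * r1                                  # offloaded task input-bits
    E1 = t1 * p1                                  # transmission energy, joules
    return l1, E1
```

For instance, with a 20 dB receive SNR, doubling the slot duration doubles both the offloaded bits and the energy, while raising the power improves the bits only logarithmically; this is the basic rate–energy tradeoff exploited by the optimization later.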
II-B2 Computation Offloading to AP Assisted by Helper
In the second and third slots, the helper acts as a DF relay to help the user offload the $l_2$ task input-bits to the AP. Denote by $p_2$ the user’s transmit power in the second slot. In this case, the achievable data rate from the user to the helper is given by $B \log_2(1 + g_0 p_2/\sigma_h^2)$, with $g_0$ and $\sigma_h^2$ defined as in (3). Denoting $g_1$ as the channel power gain from the user to the AP, the achievable data rate from the user to the AP is

$$r_{ua} = B \log_2\Big(1 + \frac{g_1 p_2}{\sigma_a^2}\Big), \quad (6)$$

where $\sigma_a^2$ is the noise power at the AP receiver.
After successfully decoding the received message, the helper forwards it to the AP in the third slot with transmit power $p_3 \le P_h^{\max}$, where $P_h^{\max}$ denotes the maximum transmit power at the helper. Let $g_2$ denote the channel power gain from the helper to the AP. The achievable data rate from the helper to the AP is thus

$$r_{ha} = B \log_2\Big(1 + \frac{g_2 p_3}{\sigma_a^2}\Big). \quad (7)$$
By combining the second and third slots, the number of task input-bits offloaded to the AP via the DF relay (the helper) should satisfy [21, 20, 19]

$$l_2 \le \min\Big\{ t_2 B \log_2\Big(1 + \frac{g_0 p_2}{\sigma_h^2}\Big),\ t_2 B \log_2\Big(1 + \frac{g_1 p_2}{\sigma_a^2}\Big) + t_3 B \log_2\Big(1 + \frac{g_2 p_3}{\sigma_a^2}\Big) \Big\}. \quad (8)$$
As in (5), we consider the user’s and the helper’s transmission energy for offloading as the dominant energy consumption in the second and third slots, respectively. Therefore, we have

$$E_2 = t_2 p_2, \quad (9)$$

$$E_3 = t_3 p_3. \quad (10)$$
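The two-term minimum in the DF constraint can be sketched as follows; function name and values are hypothetical. The first term is the decoding bottleneck at the helper in slot 2, and the second is the total bits the AP accumulates from the user (slot 2) and the helper (slot 3).

```python
import math

def df_offload_bits(t2, t3, p2, p3, g0, g1, g2, s_h, s_a, B):
    """Maximum task input-bits offloadable to the AP via the DF helper, eq. (8).

    l2 is limited by (i) what the helper can decode in slot 2 and (ii) the
    total bits the AP collects from the user (slot 2) and the helper (slot 3).
    """
    decode_at_helper = t2 * B * math.log2(1.0 + g0 * p2 / s_h)
    user_to_ap = t2 * B * math.log2(1.0 + g1 * p2 / s_a)
    helper_to_ap = t3 * B * math.log2(1.0 + g2 * p3 / s_a)
    return min(decode_at_helper, user_to_ap + helper_to_ap)
```

When the user–helper link is strong (large $g_0$), the second term dominates the bound, so adding helper transmit time $t_3$ directly increases the offloadable bits; this is exactly the regime where communication cooperation pays off.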
II-C Computing at User, Helper, and AP
In this subsection, we explain the computing models at the user, the helper, and the AP, respectively.
II-C1 Local Computing at User
The user executes the computation task with the $l_0$ task input-bits within the whole block. In practice, the number of CPU cycles for executing a computation task is highly dependent on various factors such as the specific application, the number of task input-bits, as well as the hardware (e.g., CPU and memory) architectures at the computing device [30]. To characterize the most essential computation–communication tradeoff, and as commonly adopted in the literature (e.g., [10, 16, 12, 11, 15, 13, 14]), we consider that the number of CPU cycles for this task is a linear function of the number of task input-bits, i.e., $C_0 l_0$, where $C_0$ denotes the number of CPU cycles for computing one task input-bit at the user. Also, let $f_n$ denote the CPU frequency for the $n$-th cycle, where $n \in \{1, \ldots, C_0 l_0\}$. Note that the CPU frequency is constrained by a maximum value, denoted by $f_0^{\max}$, i.e.,

$$f_n \le f_0^{\max}, \ \forall n \in \{1, \ldots, C_0 l_0\}. \quad (11)$$
As the local computing of the $l_0$ task input-bits should be successfully accomplished before the end of the block, we have the following computation latency requirement:

$$\sum_{n=1}^{C_0 l_0} \frac{1}{f_n} \le T. \quad (12)$$
Accordingly, the user’s energy consumption for local computing is [7, 15]

$$E_{\rm loc} = \kappa_0 \sum_{n=1}^{C_0 l_0} f_n^2, \quad (13)$$
where $\kappa_0$ denotes the effective capacitance coefficient that depends on the chip architecture at the user [30]. It has been shown in [15, Lemma 1] that, to minimize the computation energy consumption under a computation latency requirement, it is optimal to set the CPU frequencies to be identical across different CPU cycles. By using this fact and letting the constraint in (12) be met with strict equality (for minimizing the computation energy consumption), we have

$$f_n = \frac{C_0 l_0}{T}, \ \forall n \in \{1, \ldots, C_0 l_0\}. \quad (14)$$
Substituting (14) into (13), the user’s energy consumption for local computing is re-expressed as

$$E_{\rm loc} = \frac{\kappa_0 C_0^3 l_0^3}{T^2}. \quad (15)$$
Combining (14) with the maximum CPU frequency constraint in (11), it follows that

$$l_0 \le \frac{T f_0^{\max}}{C_0}. \quad (16)$$
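The local-computing model above can be sketched numerically; the function name and parameter values are illustrative assumptions. Note the cubic growth of energy in the number of locally computed bits, which is what makes offloading attractive for large tasks.

```python
def local_computing_energy(l0, C0, T, kappa0, f0_max):
    """Energy for local computing at the user under the latency constraint.

    With equal per-cycle frequencies (eq. (14)), f = C0*l0/T, and the energy
    (eq. (15)) is kappa0 * C0^3 * l0^3 / T^2.  Raises if l0 exceeds the
    maximum supportable bits T*f0_max/C0 from eq. (16).
    """
    f = C0 * l0 / T                      # common CPU frequency, cycles/sec
    if f > f0_max:
        raise ValueError("l0 violates the maximum CPU frequency constraint (16)")
    return kappa0 * (C0 * l0) * f ** 2   # = kappa0 * C0^3 * l0^3 / T^2
```

Because of the cubic law, halving the locally computed bits reduces the local energy by a factor of eight, while the offloading energy in (5), (9), (10) scales only linearly with time; the optimization in Section III balances exactly this tradeoff.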
II-C2 Cooperative Computing at Helper
After receiving the $l_1$ offloaded task input-bits in the first time slot, the helper executes the corresponding tasks during the remaining time with duration $T - t_1$. Let $\hat{f}_n$ and $f_1^{\max}$ denote the CPU frequency for the $n$-th CPU cycle and the maximum CPU frequency at the helper, respectively. Similarly as for the local computing at the user, it is optimal for the helper to set the CPU frequency for the $n$-th CPU cycle as $\hat{f}_n = \frac{C_1 l_1}{T - t_1}$, $\forall n \in \{1, \ldots, C_1 l_1\}$, where $C_1$ is the number of CPU cycles for computing one task input-bit at the helper. Accordingly, the energy consumption for cooperative computing at the helper is given by

$$E_{\rm h} = \frac{\kappa_1 C_1^3 l_1^3}{(T - t_1)^2}, \quad (17)$$

where $\kappa_1$ is the effective capacitance coefficient of the helper.
Similarly as in (16), we have the following constraint on the number of task input-bits:

$$l_1 \le \frac{(T - t_1) f_1^{\max}}{C_1}, \quad (18)$$

where $f_1^{\max}$ denotes the maximum CPU frequency at the helper.
II-C3 Remote Computing at AP (MEC Server)
In the fourth slot, the MEC server at the AP executes the $l_2$ offloaded task input-bits. In order to minimize the remote execution time, the MEC server executes the offloaded tasks at its maximal CPU frequency, denoted by $f_2^{\max}$. Hence, the time duration for the MEC server to execute the $l_2$ offloaded bits is

$$t_4 = \frac{C_2 l_2}{f_2^{\max}}, \quad (19)$$

where $C_2$ represents the number of CPU cycles required for computing one task input-bit at the AP. By substituting (19) into (2), the time allocation constraint is re-expressed as

$$t_1 + t_2 + t_3 + \frac{C_2 l_2}{f_2^{\max}} \le T. \quad (20)$$
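The overall latency budget across the four slots can be checked with a small sketch; the function name and values are illustrative assumptions following the notation above.

```python
def latency_feasible(t1, t2, t3, l2, C2, f2_max, T):
    """Check the overall time-allocation constraint (20).

    The MEC server runs at its maximal frequency f2_max, so executing the l2
    offloaded bits takes t4 = C2*l2/f2_max (eq. (19)); feasibility requires
    t1 + t2 + t3 + t4 <= T.
    """
    t4 = C2 * l2 / f2_max
    return t1 + t2 + t3 + t4 <= T
```

Since the MEC server is typically much faster than the devices, $t_4$ is usually small, and the block budget is dominated by the three transmission slots.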
III. Problem Formulation
In this paper, we pursue an energy-efficient design for the three-node MEC system. As the AP normally has a reliable power supply, we focus on the energy consumption at the wireless-device side (i.e., the user and the helper) as the performance metric. In particular, we aim to minimize the total energy consumption at both the user and the helper (i.e., $E_{\rm loc} + E_1 + E_2 + E_3 + E_{\rm h}$), subject to the user’s computation latency constraint, by optimizing the task partition of the user as well as the joint computation and communication resource allocation. The design variables include the time allocation vector $\boldsymbol{t} = (t_1, t_2, t_3)$ of the slots, the user’s task partition vector $\boldsymbol{l} = (l_0, l_1, l_2)$, and the transmit power allocation vector $\boldsymbol{p} = (p_1, p_2, p_3)$ for offloading at the user and the helper. In the case with partial offloading, the latency-constrained energy minimization problem is formulated as
(P1): $\min_{\boldsymbol{t}, \boldsymbol{l}, \boldsymbol{p}} \ t_1 p_1 + t_2 p_2 + t_3 p_3 + \frac{\kappa_0 C_0^3 l_0^3}{T^2} + \frac{\kappa_1 C_1^3 l_1^3}{(T - t_1)^2}$ (21a)

s.t. $l_1 \le t_1 B \log_2\Big(1 + \frac{g_0 p_1}{\sigma_h^2}\Big)$ (21b)

$l_2 \le \min\Big\{ t_2 B \log_2\Big(1 + \frac{g_0 p_2}{\sigma_h^2}\Big),\ t_2 B \log_2\Big(1 + \frac{g_1 p_2}{\sigma_a^2}\Big) + t_3 B \log_2\Big(1 + \frac{g_2 p_3}{\sigma_a^2}\Big) \Big\}$ (21c)

constraints (1), (16), (18), and (20) (21d)

$0 \le p_1 \le P_u^{\max},\ 0 \le p_2 \le P_u^{\max},\ 0 \le p_3 \le P_h^{\max}$ (21e)

$t_i \ge 0,\ i \in \{1, 2, 3\}$ (21f)

$l_i \ge 0,\ i \in \{0, 1, 2\},$ (21g)
where (1) denotes the task partition constraint, (16) and (18) are the maximum CPU frequency constraints at the user and the helper, respectively, (20) denotes the time allocation constraint, and (21b) and (21c) denote the constraints on the numbers of offloaded bits from the user to the helper and to the AP, respectively. Note that in problem (P1), we replace the two equalities in (4) and (8) with the two inequality constraints (21b) and (21c). It is immediate that constraints (21b) and (21c) must be met with strict equality at the optimality of problem (P1). Also note that problem (P1) is non-convex, due to the coupling of the time and power variables in the objective function (21a) and the constraints (21b) and (21c). Nonetheless, in Section IV we will transform (P1) into an equivalent convex problem and then present an efficient algorithm to obtain the optimal solution of problem (P1) in a semi-closed form.
In the case with binary offloading, the latency-constrained energy minimization problem is formulated as

(P2): $\min_{\boldsymbol{t}, \boldsymbol{l}, \boldsymbol{p}} \ t_1 p_1 + t_2 p_2 + t_3 p_3 + \frac{\kappa_0 C_0^3 l_0^3}{T^2} + \frac{\kappa_1 C_1^3 l_1^3}{(T - t_1)^2}$ (22a)

s.t. $l_0, l_1, l_2 \in \{0, L\}$, together with (16), (18), (20), (21b), (21c), (21e), and (21f). (22b)
Note that problem (P2) is a mixed-integer nonlinear program (MINLP) [31] due to the involvement of the integer variables $l_0$, $l_1$, and $l_2$. In Section V, we will develop an efficient algorithm to solve problem (P2) optimally by examining the three computation modes, respectively.
III-A Feasibility of (P1) and (P2)
Before solving problems (P1) and (P2), we first check their feasibility, i.e., whether the MEC system of interest can support the latency-constrained task execution or not. Let $L_{\rm par}^{\max}$ and $L_{\rm bin}^{\max}$ denote the maximum numbers of task input-bits supported by the MEC system within the duration-$T$ block under the partial and binary offloading cases, respectively. Evidently, if $L \le L_{\rm par}^{\max}$ (or $L \le L_{\rm bin}^{\max}$), then problem (P1) (or (P2)) is feasible; otherwise, the corresponding problem is infeasible. Therefore, the feasibility checking procedures of problems (P1) and (P2) correspond to determining $L_{\rm par}^{\max}$ and $L_{\rm bin}^{\max}$, respectively.
First, consider the partial offloading case. The maximum number of task input-bits is attained when the three nodes fully use their available communication and computation resources. This corresponds to setting the transmit powers as $p_1 = p_2 = P_u^{\max}$ and $p_3 = P_h^{\max}$, and letting the constraints (16), (18), (20), and (21c) be met with strict equality in problem (P1). As a result, $L_{\rm par}^{\max}$ is the optimal value of the following problem:

$$\max_{\boldsymbol{t} \succeq \boldsymbol{0},\ \boldsymbol{l} \succeq \boldsymbol{0}} \ l_0 + l_1 + l_2 \quad \text{s.t. (16), (18), (20), (21b), and (21c) with the above maximum transmit powers.} \quad (23)$$

Note that problem (23) is a linear program (LP) and can thus be efficiently solved via standard convex optimization techniques such as the interior point method [33]. By comparing $L$ versus $L_{\rm par}^{\max}$, the feasibility of problem (P1) is checked. Next, consider the binary offloading case. The user’s computation tasks can only be executed by one of the three computation modes, namely local computing, computation cooperation (offloading to the helper), and communication cooperation (offloading to the AP). For the three modes, the maximum numbers of supportable task input-bits can be readily obtained, as stated in the following.

For the local-computing mode, we have $l_1 = l_2 = 0$ and $l_0 = L$. With the maximum CPU frequency $f_0^{\max}$ and setting (16) to be tight, the maximum supportable number of task input-bits is given by

$$L_{\rm loc}^{\max} = \frac{T f_0^{\max}}{C_0}. \quad (24)$$
For the computation-cooperation mode, we have $l_0 = l_2 = 0$ and $l_1 = L$ in problem (P2). With $p_1 = P_u^{\max}$ and the constraints (18) and (21b) set to be tight, the maximum supportable number of task input-bits is

$$L_{\rm h}^{\max} = \max_{0 \le t_1 \le T} \ \min\Big\{ t_1 B \log_2\Big(1 + \frac{g_0 P_u^{\max}}{\sigma_h^2}\Big),\ \frac{(T - t_1) f_1^{\max}}{C_1} \Big\}. \quad (25)$$

For the communication-cooperation mode, we have $l_0 = l_1 = 0$ and $l_2 = L$ in problem (P2). The maximum number of task input-bits $L_{\rm AP}^{\max}$ is obtained by solving the following LP:

$$\max_{t_2 \ge 0,\ t_3 \ge 0,\ l_2 \ge 0} \ l_2 \quad \text{s.t. (20) and (21c) with } p_2 = P_u^{\max} \text{ and } p_3 = P_h^{\max}. \quad (26)$$
Based on (24)–(26), the maximum number of supportable task input-bits for the binary offloading case is given by

$$L_{\rm bin}^{\max} = \max\big\{ L_{\rm loc}^{\max},\ L_{\rm h}^{\max},\ L_{\rm AP}^{\max} \big\}. \quad (27)$$
By comparing $L$ with $L_{\rm bin}^{\max}$, the feasibility of problem (P2) is checked.
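As a concrete sketch of the partial-offloading feasibility check in (23): with all transmit powers fixed at their maxima, the per-slot rates are constants, so the problem is linear in $(t_1, t_2, t_3, l_0, l_1, l_2)$ and can be handed to an off-the-shelf LP solver. The function name and all parameter values below are hypothetical.

```python
import math
from scipy.optimize import linprog

def max_partial_bits(T, B, P_u, P_h, g0, g1, g2, s_h, s_a,
                     C0, C1, C2, f0, f1, f2):
    """Feasibility LP (23) for partial offloading (illustrative sketch).

    Maximizes l0 + l1 + l2 over x = (t1, t2, t3, l0, l1, l2) subject to
    (16), (18), (20), (21b), (21c), with all powers at their maxima.
    """
    r_uh = B * math.log2(1 + g0 * P_u / s_h)   # user -> helper rate
    r_ua = B * math.log2(1 + g1 * P_u / s_a)   # user -> AP rate
    r_ha = B * math.log2(1 + g2 * P_h / s_a)   # helper -> AP rate
    c = [0, 0, 0, -1, -1, -1]                  # minimize -(l0 + l1 + l2)
    A = [
        [-r_uh, 0, 0, 0, 1, 0],                # l1 <= t1 * r_uh          (21b)
        [f1 / C1, 0, 0, 0, 1, 0],              # l1 <= (T - t1) * f1/C1   (18)
        [0, -r_uh, 0, 0, 0, 1],                # l2 <= t2 * r_uh (decoding)
        [0, -r_ua, -r_ha, 0, 0, 1],            # l2 <= t2*r_ua + t3*r_ha  (21c)
        [1, 1, 1, 0, 0, C2 / f2],              # t1+t2+t3+C2*l2/f2 <= T   (20)
    ]
    b = [0, T * f1 / C1, 0, 0, T]
    bounds = [(0, None)] * 3 + [(0, T * f0 / C0), (0, None), (0, None)]
    res = linprog(c, A_ub=A, b_ub=b, bounds=bounds)
    return -res.fun if res.success else None
```

Since local computing proceeds over the whole block independently of the slot allocation, the LP optimum always dominates the pure local-computing capacity $T f_0^{\max}/C_0$ whenever any offloading time is available.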
By comparing $L_{\rm par}^{\max}$ and $L_{\rm bin}^{\max}$, we have $L_{\rm bin}^{\max} \le L_{\rm par}^{\max}$. This is expected, since any feasible solution to problem (P2) is always feasible for problem (P1), but the reverse is generally not true. In other words, the partial offloading case can better utilize the distributed computation resources at the different nodes, and thus achieves a higher computation capacity than the binary offloading case.
IV. Optimal Solution to (P1)
In this section, we present an efficient algorithm for optimally solving problem (P1) in the partial offloading case.
Towards this end, we introduce an auxiliary variable vector $\boldsymbol{e} = (e_1, e_2, e_3)$ with $e_i = t_i p_i$ for all $i \in \{1, 2, 3\}$. Then it holds that $p_i = e_i / t_i$ if $t_i > 0$, and $e_i = 0$ if either $t_i = 0$ or $p_i = 0$, for any $i \in \{1, 2, 3\}$. By substituting $p_i = e_i / t_i$, $i \in \{1, 2, 3\}$, problem (P1) can be reformulated as
(P1.1): $\min_{\boldsymbol{t}, \boldsymbol{l}, \boldsymbol{e}} \ e_1 + e_2 + e_3 + \frac{\kappa_0 C_0^3 l_0^3}{T^2} + \frac{\kappa_1 C_1^3 l_1^3}{(T - t_1)^2}$ (28a)

s.t. $l_1 \le t_1 B \log_2\Big(1 + \frac{g_0 e_1}{t_1 \sigma_h^2}\Big)$ (28b)

$l_2 \le t_2 B \log_2\Big(1 + \frac{g_0 e_2}{t_2 \sigma_h^2}\Big)$ (28c)

$l_2 \le t_2 B \log_2\Big(1 + \frac{g_1 e_2}{t_2 \sigma_a^2}\Big) + t_3 B \log_2\Big(1 + \frac{g_2 e_3}{t_3 \sigma_a^2}\Big)$ (28d)

$0 \le e_1 \le t_1 P_u^{\max},\ 0 \le e_2 \le t_2 P_u^{\max},\ 0 \le e_3 \le t_3 P_h^{\max}$ (28e)

constraints (1), (16), (18), (20), and $t_i \ge 0$, $l_j \ge 0$, $\forall i \in \{1, 2, 3\},\ j \in \{0, 1, 2\}$. (28f)
Lemma IV.1
Problem (P1.1) is a convex problem.
Proof:
The function $B \log_2(1 + \frac{g e}{\sigma^2})$ is concave with respect to $e \ge 0$ for any channel power gain $g$ and noise power $\sigma^2$, and thus its perspective $t B \log_2(1 + \frac{g e}{t \sigma^2})$ is jointly concave with respect to $e$ and $t > 0$ [33]. As a result, the set defined by constraints (28b)–(28d) is convex. The function $l_1^3$ is convex with respect to $l_1 \ge 0$, and hence the term $\frac{\kappa_1 C_1^3 l_1^3}{(T - t_1)^2}$ in the objective function, which is a scaled perspective of $l_1^3$, is jointly convex with respect to $l_1$ and $t_1 < T$. Therefore, problem (P1.1) is convex.
As stated in Lemma IV.1, problem (P1.1) is convex and can thus be optimally solved by the standard interior point method [33]. Alternatively, to gain essential engineering insights, we next leverage the Lagrange duality method to obtain a well-structured optimal solution for problem (P1.1).
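The joint concavity of the perspective function underpinning Lemma IV.1 can be spot-checked numerically. The sketch below (hypothetical parameter values) verifies the midpoint concavity inequality $f\big(\frac{x+y}{2}\big) \ge \frac{f(x)+f(y)}{2}$ for $f(t, e) = t B \log_2(1 + g e / (t \sigma^2))$ over a small grid.

```python
import math

def rate(t, e, g=2.0, sigma2=1.0, B=1.0):
    """Perspective t * B * log2(1 + g*e/(t*sigma2)) of the concave rate."""
    return t * B * math.log2(1.0 + g * e / (t * sigma2))

# Midpoint test of joint concavity on a grid: f((x+y)/2) >= (f(x)+f(y))/2.
pts = [(t, e) for t in (0.2, 0.5, 1.0, 2.0) for e in (0.1, 0.5, 1.0, 3.0)]
for (ta, ea) in pts:
    for (tb, eb) in pts:
        mid = rate((ta + tb) / 2, (ea + eb) / 2)
        avg = (rate(ta, ea) + rate(tb, eb)) / 2
        assert mid >= avg - 1e-9  # concavity: midpoint dominates the average
```

A numerical check of course proves nothing, but it is a quick sanity test of the perspective-function argument before relying on it in the duality analysis.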
Let $\lambda_1$, $\lambda_2$, and $\lambda_3$ denote the dual variables associated with the constraints (28b), (28c), and (28d), respectively, and let $\mu$ and $\nu$ be the dual variables associated with the constraints in (20) and (1), respectively. Define $\boldsymbol{\lambda} = (\lambda_1, \lambda_2, \lambda_3)$ and $\boldsymbol{z} = (\boldsymbol{t}, \boldsymbol{l}, \boldsymbol{e})$. The partial Lagrangian of problem (P1.1) is given by

$$\mathcal{L}(\boldsymbol{z}, \boldsymbol{\lambda}, \mu, \nu) = e_1 + e_2 + e_3 + \frac{\kappa_0 C_0^3 l_0^3}{T^2} + \frac{\kappa_1 C_1^3 l_1^3}{(T - t_1)^2} + \lambda_1 \Big( l_1 - t_1 B \log_2\Big(1 + \frac{g_0 e_1}{t_1 \sigma_h^2}\Big) \Big) + \lambda_2 \Big( l_2 - t_2 B \log_2\Big(1 + \frac{g_0 e_2}{t_2 \sigma_h^2}\Big) \Big) + \lambda_3 \Big( l_2 - t_2 B \log_2\Big(1 + \frac{g_1 e_2}{t_2 \sigma_a^2}\Big) - t_3 B \log_2\Big(1 + \frac{g_2 e_3}{t_3 \sigma_a^2}\Big) \Big) + \mu \Big( t_1 + t_2 + t_3 + \frac{C_2 l_2}{f_2^{\max}} - T \Big) + \nu ( l_0 + l_1 + l_2 - L ).$$
Then the dual function of problem (P1.1) is given by

$$g(\boldsymbol{\lambda}, \mu, \nu) = \min_{\boldsymbol{z}} \ \mathcal{L}(\boldsymbol{z}, \boldsymbol{\lambda}, \mu, \nu), \quad \text{s.t. (16), (18), (28e), } t_i \ge 0,\ l_j \ge 0,\ \forall i, j. \quad (29)$$
Lemma IV.2
In order for the dual function to be bounded from below, it must hold that .
Proof:
See Appendix A.
Based on Lemma IV.2, the dual problem of problem (P1.1) is given by
(30a)  
(30b)  
(30c)  
(30d) 
Denote by $\mathcal{X}$ and $(\boldsymbol{\lambda}^\star, \mu^\star, \nu^\star)$ the feasible set and the optimal solution of problem (D1.1), respectively. Since problem (P1.1) is convex and satisfies Slater’s condition, strong duality holds between problems (P1.1) and (D1.1) [33]. As a result, one can solve problem (P1.1) by equivalently solving its dual problem (D1.1). In the following, we first evaluate the dual function $g(\boldsymbol{\lambda}, \mu, \nu)$ under any given $(\boldsymbol{\lambda}, \mu, \nu) \in \mathcal{X}$, and then obtain the optimal dual variables to maximize $g(\boldsymbol{\lambda}, \mu, \nu)$. Denote by $\boldsymbol{z}^{(\boldsymbol{\lambda}, \mu, \nu)}$ the optimal solution to problem (29) under any given $(\boldsymbol{\lambda}, \mu, \nu) \in \mathcal{X}$, and by $\boldsymbol{z}^\star$ the optimal primal solution to problem (P1.1).
IV-A Derivation of Dual Function $g(\boldsymbol{\lambda}, \mu, \nu)$
First, we obtain $\boldsymbol{z}^{(\boldsymbol{\lambda}, \mu, \nu)}$ by solving (29) under any given $(\boldsymbol{\lambda}, \mu, \nu) \in \mathcal{X}$. Equivalently, (29) can be decomposed into the following five subproblems.
(31) 
(32) 
(33) 
(34) 
(35) 
The optimal solutions to problems (31)–(35) are presented in the following Lemmas IV.3–IV.7, respectively. As these lemmas can be similarly proved via the Karush-Kuhn-Tucker (KKT) conditions [33], we only show the proof of Lemma IV.3 in Appendix B and omit the proofs of Lemmas IV.4–IV.7 for brevity.
Lemma IV.3
Under any given $(\boldsymbol{\lambda}, \mu, \nu) \in \mathcal{X}$, the optimal solution to problem (31) satisfies
(36)  
(37)  
(38) 
where with , and
(39) 
(40) 
(41) 
(42) 
Proof:
See Appendix B.