The proliferation of smartphones over the last decade has stimulated the emergence of many resource-demanding mobile applications such as video gaming and virtual/augmented reality. The limited computation and battery capacities of mobile devices have become a bottleneck for the deployment of many emerging mobile applications. Mobile edge computing (MEC) has been considered a potential solution to these problems, where heavy computation and processing tasks can be offloaded from mobile users to an MEC server for execution .
MEC servers can be deployed at radio base stations (BSs), which allows large computation tasks to be processed at the network edge. The MEC technology therefore helps reduce application latency and energy consumption, which improves the users’ quality of experience . Moreover, the employment of network function virtualization (NFV) in software-defined networking (SDN) based 5G wireless networks allows mobile application functions to run in virtual machines or containers , where the VNFs associated with a particular application can be represented by an execution graph through a process called service function chaining (SFC). The dynamic deployment of VNFs from many mobile applications requires addressing several challenging problems: (i) VNF placement, to determine a physical host running each VNF, and (ii) computing resource allocation, to execute the VNFs at the assigned hosts. The joint design of SFC placement and resource allocation across multiple clouds is an important research problem . In 5G networks, the VNFs originate from a mobile user’s application; therefore, one must decide whether to execute these VNFs on the mobile device or offload them to remote servers for execution. Offloading incurs communication delay and energy consumption, which must be taken into account in the offloading decision.
Several design aspects of MEC have been studied in the literature. Joint optimization of offloading decisions and resource allocation for delay-sensitive tasks is addressed in . The dynamic voltage frequency scaling (DVFS) technique employed for energy saving of mobile devices is explored in . Different approaches have been taken to address the computation offloading design, including heuristic mechanisms, dynamic programming , and distributed computation replication . However, the joint design of the native application chaining structure , computation offloading, and resource allocation leveraging the collaboration among servers in a multi-site MEC system has not been studied. Our current paper fills this gap in the existing literature.
In the 5G wireless system, the edge servers deployed at individual BSs may have limited resources or lack certain service libraries needed to execute the underlying applications. Collaboration among edge/cloud servers, as illustrated in Fig. 1, by offloading the computing load of different VNFs in the SFC over backhaul links, allows efficient execution of the underlying applications. In this paper, we consider such a multi-server MEC system, and our design jointly optimizes the offloading decisions, VNF placement, and computing resource allocation to minimize the weighted sum of the normalized mobile energy consumption and computation cost, subject to constraints on the maximum execution latency and the maximum computing resources at the servers. We propose an efficient algorithm to solve this challenging problem using a decomposition approach and show the efficacy of this design via extensive numerical studies.
The rest of this paper is organized as follows. Section II presents the system model. Section III describes the proposed design and algorithms. Section IV evaluates the performance of our design followed by the conclusion in Section V.
II System Model
II-A MEC System and Backhaul Network Models
Consider a multi-server MEC system with several computation servers (CoSs) denoted by the server set . Moreover, we assume that one edge server is deployed at each multi-antenna base station (BS) and the set of these edge servers is a subset of (i.e., ). Further, each combined BS and its co-located CoS provide both wireless communication and computing services to mobile users (MUs) inside its coverage. For convenience, we use to refer to the set of BSs and the set of associated CoSs. Let denote the set of MUs served by BS . We assume that each CoS has a limited computing capacity represented by the maximum clock speed (CPU cycles per second). For brevity, we refer to MU associated with BS as MU in the following.
We further assume that the CoSs are inter-connected by a backhaul network over which the computation load of one server can be offloaded to other one-hop-away servers for execution. We model this backhaul network as a directed graph where the set of CoSs corresponds to the nodes and the set of backhaul links corresponds to the set of (directed) edges. With this graph model, each vertex/CoS is connected with, and can receive data from, a set of adjacent vertices called its in-neighbor vertex set . Therefore, any CoS in can offload its computation load to CoS . For any two connected CoSs and , we assume the data transmission delay over the corresponding backhaul link is approximately equal to its connection setup time (i.e., the backhaul transmission rate is very high). To capture the connectivity of the CoSs, we introduce the binary parameters , where is the indicator function, equal to 1 if there exists a connection between CoSs and , and 0 otherwise.
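The graph model above can be sketched in a few lines of code. This is an illustrative sketch only: the names `in_neighbors`, `connected`, and the example link set are assumptions, not from the paper.

```python
# Sketch of the backhaul graph model: CoSs as nodes, directed backhaul
# links as edges. All names and the example topology are illustrative.

def in_neighbors(links, s):
    """Return the in-neighbor set of CoS s: all CoSs that can
    offload computation load to s over a one-hop backhaul link."""
    return {u for (u, v) in links if v == s}

def connected(links, u, v):
    """Binary connectivity indicator: 1 if a directed backhaul
    link (u, v) exists, 0 otherwise."""
    return 1 if (u, v) in links else 0

# Example: 3 CoSs with directed links 0 -> 1, 2 -> 1, 1 -> 2
links = {(0, 1), (2, 1), (1, 2)}
```

With this representation, CoS 1 can receive offloaded load from its in-neighbors 0 and 2, while CoS 0 has no in-neighbors.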
II-B Service Function Chain (SFC) Model
Let denote the set of all possible network functions. We assume that each CoS provides services to execute the subset of functions . Each MU at BS is assumed to run an application whose computation load can be decomposed into the set of service requests and their corresponding network functions. Moreover, each request can be executed locally and/or at the remote CoSs (via offloading). Specifically, the network functions of each request can be represented by an ordered function set, called service function chain (SFC), where the order of this set represents the execution order of the corresponding functions.
These request/function models are illustrated in Figure 2. In particular, each MU in BS has a set of service requests in the set . Each request corresponds to an ordered set of network functions , which must be placed and executed at the MU or some CoSs. Moreover, each function has a particular amount of input data (e.g., a video file) to be processed and the execution of the function produces an amount of output data. Let represent the ratio between the amount of output data after executing function and the original input data of request associated with MU of BS . The parameter will depend on the data output/input ratios of all functions executed before function in the SFC.
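Since the data entering a given function depends on the output/input ratios of all preceding functions in the SFC, the cumulative ratio is simply the product of the per-function ratios. A minimal sketch (function and variable names are assumptions for illustration):

```python
def cumulative_input_ratio(per_function_ratios, k):
    """Ratio between the data entering function k of an SFC and the
    request's original input data: the product of the output/input
    ratios of all functions executed before k in the chain."""
    ratio = 1.0
    for r in per_function_ratios[:k]:
        ratio *= r
    return ratio

# Example SFC with three functions whose output/input ratios are
# 0.5 (compression), 2.0 (decoding), 0.8 (filtering)
ratios = [0.5, 2.0, 0.8]
```

For instance, the third function (index 2) receives 0.5 × 2.0 = 1.0 times the original input data of the request.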
II-C Computation Offloading Models
We assume that MU at BS needs to run an application with the request set and the total amount of input data is (bits). Moreover, each request from this application has to process a fraction of this total input data (i.e., the amount of data to be processed by request is ). Furthermore, the computing load of each function of request can be computed based on the computing load per input data bit .
II-C1 Local Computation Models
Let be the computing resource (CPU clock speed) allocated by MU at BS to execute function of request locally at the MU. We assume that must be chosen in the range , where denotes the MU’s maximum CPU clock speed (i.e., computing capacity). Then, the processing delay and energy consumption for the local execution of request of MU can be expressed, respectively, as
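The delay and energy expressions are elided in this version of the text; a minimal sketch, assuming the widely used CPU model in which delay equals cycles divided by clock speed and energy follows the dynamic-power model E = κ · cycles · f² (the value of κ below is an assumption):

```python
def local_delay(cycles, f):
    """Processing delay (s) for a computation load of `cycles`
    CPU cycles executed at clock speed f (cycles/s)."""
    return cycles / f

def local_energy(cycles, f, kappa=1e-27):
    """Energy (J) under the common dynamic-power CPU model
    E = kappa * cycles * f^2, where kappa is the effective
    switched-capacitance coefficient (illustrative value)."""
    return kappa * cycles * f ** 2
```

This model captures the DVFS trade-off: lowering the clock speed f reduces energy quadratically but increases delay proportionally.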
II-C2 Computation Offloading Models
The mobile computation offloading scheme is illustrated in Figure 3. The data of each offloaded request must first be transmitted to the associated BS . The request can either be processed by the CoS at this BS or sent to its neighboring CoSs for execution. The total execution delay of the application of MU is defined as the maximum execution delay of the individual requests, either at the MU or at remote CoSs via offloading. This execution delay must be constrained by the maximum allowable delay :
where and denote the execution delay of request if done locally at the MU or at remote CoSs via offloading, respectively. We show how to calculate in the following.
To enable the offloading of any particular request of MU , the involved data must be transmitted in the uplink direction from MU to BS . Recall that we consider a multi-cell Massive-MIMO wireless system where each MU has a single antenna and each BS is equipped with antennas where . We assume that the same transmit power is used by each MU to transmit the training data (to estimate the channel state information) and the application data (to support the offloading). The achieved signal-to-interference ratio (SIR) of the uplink transmission from MU in the cell can be expressed as , where represents the large-scale channel coefficient capturing the path-loss effect, is the distance between the co-channel MU in cell of MU and the BS , and is the path-loss exponent. Then, the corresponding achievable rate can be written as .
For request , the total execution time is the sum of the uplink communication delay , the backhaul transfer delay and processing delay of all network functions in the SFC of request . Thus, we have
where . The energy consumption required to transmit the involved data for offloading can be calculated as . Detailed descriptions of how these delay components are calculated are given in the following.
II-D Offloading Parameters and SFC Placement Constraints
Different network functions associated with request of MU can be processed at CoS associated with BS or routed to neighboring CoSs with larger computation resource for processing. The network operator must make decisions on request offloading as well as placement and execution of different functions of each request. Toward this end, we introduce three optimization variable sets. The first set of variables represents the binary offloading decisions where if request of MU is processed locally at this MU then ; otherwise, we have if the request is offloaded to remote CoSs. The second variable set indicates the SFC placement where if function of request of MU is placed at CoS , we have ; otherwise, we have . The last variable set represents the computing resource allocation (in CPU clock speed) where denote the CPU clock speeds assigned to serve function of request locally at MU or remotely at CoS , respectively.
The function placement needs to satisfy several constraints:
II-D1 Function placement constraints
Each function should be placed at exactly one CoS:
The total computation load routed to CoS must not exceed its computing capacity:
II-D2 Routing path constraints
The functions and associated with request of MU must be placed at the same CoS or at two inter-connected CoSs. Therefore, we have the following constraints for : .
By applying the offloading parameters, the backhaul transfer delay for the involved data between CoSs and can be expressed as:
The server computation time of function is a function of the allocated computing resource and it can be expressed as:
II-E Problem Formulation
Our design aims to minimize: 1) the normalized mobile energy consumption, and 2) the normalized computing cost. Toward this end, we optimize a single objective function, namely the weighted sum of these two metrics of interest. The normalized mobile energy consumption is equal to the ratio between the energy consumption and the total energy pool , where the energy consumption equals either the local computing energy or the communication energy depending on the offloading decision.
The offloading/computing cost is calculated based on the computing price per time unit at processing speed , which is expressed as . Then, the computing cost required to process function of request of MU at CoS can be expressed as , where the required computing time of the corresponding function is . The coefficients and can vary across cloud platforms. Suppose the available budget to cover the computing expenses is .
To maintain the total system utility, a fixed budget is granted equally to all MUs. Each MU’s energy consumption and VNF placement cost are normalized by its common energy and computing budgets . Then, we define the normalized cost of each MU as:
where and are weighting parameters capturing the importance of energy consumption and computing cost, respectively, where .
The considered Joint Computation Offloading and Resource Allocation (JCORA) problem can be formulated as follows:
where the set of optimization variables is defined as .
III Proposed Algorithm
We describe our proposed algorithm to solve problem (JCORA) in this section. Problem (JCORA) is difficult to solve because it is a mixed-integer, non-linear optimization problem. Specifically, there are two sets of variables, concerning the local resource allocation and the SFC placement and resource allocation at the CoSs . To solve problem (JCORA), we employ a decomposition approach where we optimize the different sets of variables separately by tackling the corresponding sub-problems in an iterative manner. The proposed iterative algorithm is given in Algorithm 1. This algorithm has an initialization step in which we try to execute as many requests as possible locally at the MUs while using all local computing resources (step 0). After initialization, we know the set of requests executed locally (called the local request set) and the set of requests offloaded to remote servers (called the offloading request set). We then optimize the function chain placement and computing resource allocation for all offloaded requests (step 1). To further improve the performance, we iteratively update the offloading decisions by moving more requests from the MUs to the remote CoSs (step 2). We describe these steps in more detail in the following.
III-A Step 0: Initialization at MUs
For initialization, we attempt to minimize the total local computation energy by solving the following problem:
We solve this problem by first tackling the local computation allocation for all requests by assuming that the local computing capacities at all MUs are very large. Then, we use this computation allocation result to determine the local request set and offloading request set. These two sub-steps are as follows.
Sub-step 1 - Local computation resource allocation
To determine the local computation allocation, we solve problem (JPL1), which has the same objective as problem (JPL) but considers only constraints (7h). The Lagrangian for problem (JPL1) can be written as
Taking the derivative of the Lagrangian w.r.t. , we have:
Setting this derivative to zero yields an estimate of . It can be verified that the objective function of problem (JPL1) is non-decreasing in the allocated computing resource; hence, at the optimum , the equality condition for (7h) holds, i.e., . From this condition, we can obtain the allocated computing resource as follows:
Sub-step 2 - Determination of local/offloading request sets
Using the computation allocation results for (10) in problem (JPL1), we arrive at the following problem:
This is indeed a knapsack problem, which determines the requests to be executed locally at each MU (i.e., requests with ), where the size of each item/request is the total computing resource required by its functions, i.e., . The knapsack problem can be efficiently solved via an ILP solver , which packs as many items (requests) as possible to fill up the bin and stops at the split point. The remaining items/requests will be offloaded to remote CoSs.
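The paper solves this knapsack problem with an ILP solver; for small instances, an equivalent dynamic-programming sketch makes the structure concrete (function names and the integer-capacity assumption are illustrative, not from the paper):

```python
def knapsack(sizes, values, capacity):
    """0/1 knapsack sketch: select the subset of requests (items)
    maximizing total value subject to an integer computing-capacity
    budget. Returns (best_value, chosen_item_indices)."""
    n = len(sizes)
    # best[i][c] = best value using the first i items with capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # skip item i-1
            if sizes[i - 1] <= c:        # or take it if it fits
                best[i][c] = max(best[i][c],
                                 best[i - 1][c - sizes[i - 1]] + values[i - 1])
    # backtrack to recover the chosen item set
    chosen, c = set(), capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.add(i - 1)
            c -= sizes[i - 1]
    return best[n][capacity], chosen
```

Items left out of `chosen` correspond to the requests that are offloaded to the remote CoSs.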
III-B Step 1: Function chain placement and computation resource allocation at remote CoSs
After step 0, we obtain the set of offloaded requests of each MU which is denoted by . The function chain placement and computation resource allocation for all functions of these offloaded requests can be determined by solving the following problem:
where and denotes the set of offloaded requests of MU .
This problem is a mixed-integer optimization problem and is still hard to solve. To tackle it, we employ the Benders decomposition approach, which separates the original problem into a slave problem for computation resource optimization and a master problem for function placement optimization. The proposed algorithm is summarized in Algorithm 2. Detailed descriptions of the master and slave problems are given in the following.
III-B1 Master problem to optimize function placement
Similar to step 0, to solve this problem, we estimate the computation resource allocation for all functions of offloaded requests in the first sub-step; then, using this result, we determine the service function placement solution in the second sub-step.
In the first sub-step, we solve a problem related to Problem (), which has the same objective but is subject only to the delay constraints (12). Here, we assume that the maximum computing resource at each CoS is sufficiently large; therefore, the computation resource allocation is performed to achieve the minimum computation cost while maintaining the delay constraints. As a result, the computation resource allocation variables do not depend on the CoS index .
We solve this problem by defining the Lagrangian and solving the Karush-Kuhn-Tucker optimality conditions . After some manipulations, we can derive the following computation resource allocation policy:
where is the Lambert W function  and can be obtained by solving the following equation:
Hence, the root of this equation can be determined using a numerical search method.
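The numerical search can be a simple bisection on the scalar equation above. Since the paper's exact equation is elided here, `g` below is a placeholder for it; the sketch only assumes the function is continuous with a sign change on the bracketing interval:

```python
def bisect_root(g, lo, hi, tol=1e-10, max_iter=200):
    """Bisection search for a root of a continuous scalar function g
    on [lo, hi], assuming g(lo) and g(hi) have opposite signs. Used
    here as a stand-in for solving the multiplier equation that pins
    down the computation resource allocation policy."""
    glo = g(lo)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        gm = g(mid)
        if abs(gm) < tol or hi - lo < tol:
            return mid
        if glo * gm < 0:   # root lies in the lower half
            hi = mid
        else:              # root lies in the upper half
            lo, glo = mid, gm
    return 0.5 * (lo + hi)
```

Each iteration halves the bracketing interval, so the root is located to tolerance `tol` in a logarithmic number of steps regardless of the equation's exact form.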
In the second sub-step, we perform function placements by solving another problem related to Problem (), which has the same objective as () but only constraints (7h)-(7h). The computation allocation solution obtained in the first sub-step is used to estimate the computing resources consumed during the function placements. This problem is more complicated than the multi-knapsack problem due to the additional backhaul topology constraints (7h).
To solve this problem, we propose a greedy function placement algorithm, which is described in Algorithm 2. This algorithm has two phases. In phase one, we attempt to place the functions of offloaded requests at the corresponding local CoSs of the BSs by solving the knapsack problem with the local maximum computation constraint. In phase two, we perform placements for the remaining network functions, denoted as , which have not been placed in phase one. To efficiently utilize the CoSs’ computing resources while balancing the load, it is desirable to place more functions at CoSs with larger available computing resources and fewer neighboring CoSs.
After phase one, let denote the remaining computing resource (in CPU clock speed) of CoS , which equals minus the total estimated computing resource of all functions placed at CoS in phase one, where is given in (14). We define the ranking metric for each CoS as , where is the in-neighbor CoS set of CoS . We then rank the CoSs in descending order of and let denote the corresponding ordered set of CoSs. For each CoS in this ordered set , we perform function placements by solving the corresponding knapsack problem, whose objective is to minimize the total computation cost subject to the constraint on the remaining computing capacity. After performing function placements for all CoSs in , we obtain the function placement solution (i.e., ).
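The phase-two ordering can be sketched as follows. The exact ranking metric is elided above, so the capacity-over-in-degree form used here is an assumption that matches the stated intuition (prefer larger remaining capacity, fewer in-neighbors):

```python
def rank_coss(remaining_capacity, in_neighbor_sets):
    """Order CoSs for phase-two placement. Assumed ranking metric:
    remaining capacity divided by (1 + number of in-neighbors), so
    servers with more free resource and fewer competing neighbors
    are considered first. Returns CoS ids in descending metric order."""
    metric = {s: remaining_capacity[s] / (1 + len(in_neighbor_sets[s]))
              for s in remaining_capacity}
    return sorted(metric, key=metric.get, reverse=True)

# Example: CoS 2 has moderate capacity but no in-neighbors,
# so it is ranked first under this metric.
order = rank_coss({0: 10.0, 1: 8.0, 2: 8.0},
                  {0: {1, 2}, 1: {0}, 2: set()})
```

The per-CoS knapsack in the greedy loop can then reuse the knapsack routine from step 0, with the bin size set to the CoS's remaining capacity.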
III-B2 Slave problem to optimize computation resource allocation
For given , we introduce slack variable . Then, the slave problem that optimizes the computation resource allocation can be stated as:
Problem () is a convex optimization problem due to its affine equality constraints, convex objective function, and convex inequality constraint functions. Thus, it can be solved efficiently to obtain the optimal values of and .
III-C Step 2: Update offloading decisions
To update the offloading decisions, we define the following cost improvement factor :
which quantifies the cost reduction if we offload request to the CoSs.
Specifically, in step 2 of the proposed algorithm, we iteratively and greedily select the locally executed request with the maximum positive cost reduction and force it to be offloaded to the remote CoSs.
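Step 2 above can be sketched as a simple greedy loop. Names are illustrative, and the cost-improvement values are treated as given inputs (in the paper they come from re-solving the placement/allocation sub-problems):

```python
def update_offloading(local_requests, cost_improvement):
    """Greedy step-2 sketch: repeatedly move the locally executed
    request with the largest positive cost-improvement factor to the
    remote CoSs, stopping when no remaining request would further
    reduce the cost. Returns (offloaded_order, still_local)."""
    offloaded = []
    remaining = set(local_requests)
    while remaining:
        best = max(remaining, key=lambda r: cost_improvement[r])
        if cost_improvement[best] <= 0:
            break  # no further cost reduction possible
        remaining.remove(best)
        offloaded.append(best)
    return offloaded, remaining

ci = {'a': 0.3, 'b': -0.1, 'c': 0.5}
offloaded, still_local = update_offloading(['a', 'b', 'c'], ci)
```

Here request 'c' is offloaded first (largest improvement), then 'a', while 'b' stays local because offloading it would increase the cost.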
IV Numerical Results
We consider a simple 4-cell network where the distance between two nearest BSs is , as illustrated in Fig. 1. In each cell, we randomly place 8 MUs so that the distance from the BS to its MUs is in the range . The channel gains are generated with path-loss exponent . In the simulation, we choose equal to for all MUs, kHz and ms. Each MU needs to execute an application with a data size of 800 kbits (Mbit) within the maximum delay of ms (), where each application is assumed to be split into 5 requests ().
The maximum computing capacity of each MU () is randomly selected from the set GHz, and the local computing energy per CPU cycle is J/CPU cycle. Each data bit is assumed to consume CPU cycle/bit. Finally, the capacities of the four servers are chosen as GHz. The energy and computing budgets of each MU are allocated as mW and , which are set based on the costs of the Amazon AWS and IBM clouds, yielding costs of and $ per CPU clock. We compare the performance of our proposed algorithm with the following baseline algorithms.
Greedy Offloading and Joint Resource Allocation (GOJRA)
In this algorithm, as many requests as possible are offloaded to fill up the maximum capacity of the CoSs at the BSs, and then the computation allocations are jointly optimized.
Heuristic offloading decision algorithm (HODA )
This algorithm evaluates the cost reduction factor of each request and offloads the request only if its cost reduction is positive. The algorithm is run at each BS, which receives all offloading requests and then jointly decides which requests to offload based on the signs of the corresponding cost reduction factors.
First, we examine the variations of the normalized total cost (the value of the considered objective function) versus the input data size in Fig. 5. It can be seen that the system uses almost all system resources in the low-bandwidth scenario with W = 100 kHz. Low bandwidth creates a bottleneck in the communications, which can be relaxed by allocating more bandwidth (W = 500 kHz). When even more bandwidth is allocated (W = 1 MHz), the system becomes more constrained by the computing resources, so the normalized total cost can only be reduced moderately. In Fig. 5, we show the impact of the wireless bandwidth on the achievable system cost. This figure shows that the setting with and achieves about a 30% reduction of the normalized total cost compared to the setting with and . This illustrates the impact of the cost weights on the achievable performance.
Since intelligent computation offloading can help save energy in general, we show the average energy consumption versus the maximum allowable delay in Fig. 7. This figure confirms that the proposed algorithm achieves the smallest energy consumption among the considered algorithms (i.e., GOJRA, HODA, GTDA). Moreover, the larger the allowable delay, the larger the energy saving that can be achieved.
In Fig. 7, we show the benefit of cooperation among the CoSs, where the normalized total cost is shown versus the number of MUs for four different network configurations: Full Mesh, Ring, Mesh-c-Cloud (with cloud servers in the center), and Mesh-c-BS (with the BSs’ fog servers in the center). The figure confirms that the Full-Mesh backhaul topology results in the lowest normalized total cost. This is because this topology allows the most efficient placement of functions and exploitation of computation resources. The Mesh-c-BS topology achieves a cost similar to that of HODA, which is higher than those of the other backhaul topologies. Moreover, the Mesh-c-Cloud topology leads to a slightly lower cost than that achieved by the Ring topology.
In Fig. 8, we demonstrate the impact of the computation budget on the offloading data size associated with all offloaded requests, considering different backhaul topologies. As can be seen, the proposed GTDA enables more effective exploitation of the computing resources of the CoSs compared to HODA. Moreover, the total offloading data size under the Full-Mesh topology is the largest, while the offloading data size for the Mesh-c-BS topology is slightly higher than that due to HODA because GTDA can leverage cooperation among the BSs. The Mesh-c-Cloud topology leads to a larger offloading data size than the Ring topology. Finally, the offloading data size increases with the computation budget for all algorithms and topologies, and it saturates when the computation budget is sufficiently large.
In this paper, we have considered the joint optimization design for the cooperative multi-server MEC system to minimize the weighted sum of MUs’ energy consumption and computing cost. We have developed a sub-optimal but efficient algorithm to solve the underlying problem. Numerical results have confirmed the desirable performance of the proposed design and the benefits of server cooperation. Specifically, the normalized total cost achieved by the proposed algorithm is much smaller than those of the other baseline schemes. Moreover, the full-mesh backhaul topology enables the most efficient cooperation among CoSs and the best computing resource utilization; therefore, it achieves the smallest total cost among the considered topologies.
-  (2018-Apr.) Joint VNF placement and CPU allocation in 5G. In Proc. IEEE INFOCOM, Honolulu, USA, pp. 1943–1951. Cited by: §I.
-  (2017-Apr.) Optimal virtual network function placement in multi-cloud service function chaining architecture. J. Comput. Commun. 102, pp. 1–16. Cited by: §I.
-  (2004) Convex optimization. Cambridge University Press. Cited by: §III-A, §III-B1.
-  (1996) On the Lambert W function. Adv. Comput. Math. 5 (1), pp. 329–359. Cited by: §III-B1.
-  (2017-Apr.) Offloading in mobile edge computing: task allocation and computational frequency scaling. IEEE Trans. Commun. 65 (8), pp. 3571–3584. Cited by: §I.
-  (2017-Oct.) Contract design for traffic offloading and resource allocation in heterogeneous ultra-dense networks. IEEE J. Sel. Areas in Commun. 35 (11), pp. 2457–2467. Cited by: §II-E.
-  (2018-Apr.) Energy-efficient dynamic computation offloading and cooperative task scheduling in mobile cloud computing. IEEE Trans. Mobile Comput. 18 (2), pp. 319–333. Cited by: §I.
-  (2018-Jan.) Energy-efficient admission of delay-sensitive tasks for mobile edge computing. IEEE Trans. Commun. 66 (6). Cited by: §I.
-  (2016-Apr.) Multiuser joint task offloading and resource optimization in proximate clouds. IEEE Trans. Veh. Technol. 66 (4). Cited by: §I, §III-A, §IV.
-  (2017-Aug.) A survey on mobile edge computing: The communication perspective. IEEE Commun. Surveys & Tuts. 19 (4), pp. 2322–2358. Cited by: §I.
-  (2010-Oct.) Noncooperative cellular wireless with unlimited numbers of base station antennas. IEEE Trans. Wireless Commun. 9 (11), pp. 3590–3600. Cited by: §II-C2.
-  (2014-Dec.) Specifying and placing chains of virtual network functions. In Proc. IEEE CloudNet, Luxembourg, Luxembourg, pp. 7–13. Cited by: §I.
-  (2019-Sep.) Computation offloading and resource allocation for backhaul limited cooperative MEC systems. In Proc. IEEE VTC Fall, Hawaii, USA. Cited by: §I.
-  (2014) Gurobi optimizer reference manual. Cited by: §III-A.
-  (2018) Joint computation offloading and resource allocation optimization in heterogeneous networks with mobile edge computing. IEEE Access 6, pp. 19324–19337. Cited by: §I.