Computation offloading in mobile edge/cloud computing (MEC/MCC) systems is an important technique for enabling the effective deployment of desktop-level applications on resource-constrained mobile platforms [1, 2, 3]. In fact, many emerging mobile applications, such as face recognition, natural language processing, interactive gaming, and augmented reality, entail intensive computation, strict latency requirements, and high energy consumption, demands that mobile devices alone may not be able to meet. Therefore, MEC/MCC technologies are considered promising solutions for enhancing mobile usability and prolonging mobile battery life by offloading computation-heavy applications to a remote fog/cloud server. In an MCC system, enormous computing resources are available in the core network, but the limited backhaul capacity can induce significant delay for the underlying applications. In contrast, an MEC system, with computing resources deployed at the network edge in close proximity to mobile devices, can enable computation offloading while meeting demanding application requirements.
Hierarchical fog-cloud computing systems, which leverage the advantages of both MCC and MEC, can further enhance system performance: fog servers deployed at the network edge operate collaboratively with the more powerful cloud servers to execute computation-intensive user applications. Specifically, when users’ applications require high computing power or low latency, their computation tasks can be offloaded and processed at the fog and/or remote cloud servers. However, the upsurge of mobile data and the constrained radio spectrum may result in significant delays in transferring offloaded data between the mobile users and the fog/cloud servers, which ultimately degrades the quality of service (QoS). To overcome this challenge, advanced data compression techniques can be leveraged to reduce the amount of incurred data (i.e., the input data of a user’s application) [7, 8]. However, employing data compression entails additional computation for executing the corresponding compression and decompression algorithms. Therefore, an efficient joint design of data compression, offloading decisions, and resource allocation is needed to take full advantage of data compression while meeting QoS requirements and other system constraints.
Computation offloading design for MCC/MEC systems has been studied extensively in the literature; see the recent surveys [10, 11] and the references therein. Most existing works consider two main performance metrics, namely energy efficiency [12, 13, 14, 15] and delay efficiency [16, 17, 18]. Focusing on energy efficiency, partial offloading frameworks have been developed for multiuser MEC systems employing time-division multiple access and frequency-division multiple access, and wireless power transfer has been integrated into the computation offloading design. Moreover, different binary offloading frameworks are developed in [14, 15], where various branch-and-bound and heuristic algorithms are proposed to tackle the resulting mixed-integer optimization problems.
Considering computation offloading from the delay-efficiency point of view, an iterative heuristic algorithm has been proposed to optimize the binary offloading decisions for minimizing the overall computation and transmission delay in a hierarchical fog-cloud system. Other work formulates the computation offloading and resource allocation problem as a student-project-allocation game with the objective of maximizing the ratio between the average offloaded data rate and the offloading cost at the users. A binary computation offloading problem has also been studied for maximizing the weighted sum computation rate, where a coordinate-descent-based algorithm iteratively updates the offloading decisions and time-sharing variables until convergence.
Some recently proposed schemes for computation offloading consider both energy- and delay-efficiency aspects [19, 20, 21]. In particular, a radio and computing resource allocation framework has been proposed in which the computational loads of the fog and cloud servers are determined and the trade-off between power consumption and service delay is investigated. Other authors jointly optimize the transmit power and offloading probability to minimize the average weighted energy, delay, and payment cost. A fair computation offloading design has also been studied that minimizes the maximum weighted energy-delay cost (WEDC) of all users in a hierarchical fog-cloud system; in that work, a two-stage algorithm is proposed where the offloading decisions are determined in the first stage using a semidefinite relaxation and probabilistic rounding based method, while the radio and computing resource allocation is determined in the second stage. However, references [12, 13, 14, 15, 16, 17, 18, 19, 20, 21] do not exploit data compression for computation offloading.
There are only a few existing works that explore data compression for computation offloading. Specifically, an analytical framework has been proposed to evaluate the outage performance of a hierarchical fog-cloud system. Moreover, data compression has been considered in the computation offloading design for systems with a single server, but with a fixed compression ratio (i.e., this parameter is not optimized). In general, the compression ratio should be jointly optimized with the computation offloading decisions and the resource allocation to achieve optimal system performance. However, the computational load incurred by compression/decompression is a non-linear function of the compression ratio, which makes this joint optimization problem very challenging.
To the best of our knowledge, the joint design of data compression, computation offloading, and resource allocation for hierarchical fog-cloud systems has not been considered in the existing literature. The main contributions of this paper can be summarized as follows:
We propose a non-linear computation model which can be fitted to accurately capture the computational load incurred by data compression and decompression. In particular, the compression and decompression computational load as well as the quality of data recovery are modeled as functions of the compression ratio.
For data compression at only the mobile users, we formulate the fair joint design of the compression ratios, computation offloading, and resource allocation as a mixed-integer non-linear programming (MINLP) problem. This formulation takes into account practical constraints on the maximum transmit power, wireless access bandwidth, backhaul capacity, and computing resources. We propose the Joint Data compression, Computation offloading, and Resource Allocation (JCORA) algorithm, which solves this challenging problem optimally. To develop this algorithm, we first prove that users incurring a higher WEDC when executing their applications locally should have higher priority for offloading. Based on this result, the bisection search method is employed to optimally classify the users into two sets, namely the set of offloading users and the set of remaining users, and JCORA globally optimizes the optimization variables for both user sets.
We then study a more general design where data compression is performed at both the mobile users and the fog server (with different compression ratios) before the compressed data are transmitted over the wireless link and over the backhaul link connecting the fog and cloud servers, respectively. This enhanced design can lead to a significant performance gain when both the wireless access and backhaul networks are congested. Three different solution approaches are proposed to solve this more general problem. In the first approach, we extend the design principle of the JCORA algorithm by employing the piecewise linear approximation (PLA) method to tackle the coupling of the optimization variables. In the remaining approaches, we utilize the Lagrangian method and solve the dual optimization problem. Specifically, the second approach, referred to as the One-dimensional Search based Two-Stage (OSTS) algorithm, employs a one-dimensional search to determine the optimal value of the Lagrangian multiplier, while the third approach, referred to as the Iterative Update based Two-Stage (IUTS) algorithm, adopts a low-complexity iterative sub-gradient projection technique.
Extensive numerical results are presented to evaluate the performance gains of the proposed designs in comparison with conventional strategies that do not employ data compression. Moreover, our results confirm the excellent performance achievable by joint optimization of data compression, computation offloading decisions, and resource allocation in a hierarchical fog-cloud system.
The remainder of this paper is organized as follows. Section II presents the system model, the computation and transmission energy models, and the problem formulation. Section III develops the proposed optimal algorithm for the case when data compression is performed only at the mobile users. Section IV provides the enhanced problem with data compression also at the fog server and three methods for solving it. Section V evaluates the performance of the proposed algorithms. Finally, Section VI concludes this work.
II System Model and Problem Formulation
II-A System Model
We consider a hierarchical fog-cloud system consisting of mobile users, one cloud server, and one fog server co-located with a base station (BS) equipped with multiple antennas. For convenience, we denote the set of users as . We assume that each user needs to execute an application requiring CPU cycles within an interval of seconds, in which CPU cycles must be executed locally at the mobile device and the remaining offloadable CPU cycles can be processed locally or offloaded and processed at the fog/cloud server for energy saving and delay improvement. Let be the number of bits representing the corresponding incurred data (i.e., programming states, input text/image/video) of the possibly-offloaded CPU cycles. To overcome the wireless transmission bottleneck caused by the capacity-limited wireless links between the users and the BS, data compression is employed at the users for reducing the amount of data transferred to the fog server. Fig. 1 illustrates the considered system.
In particular, once CPU cycles are offloaded, user first compresses the corresponding incurred bits before sending them to the remote fog server; the ratio between these two data sizes is called the compression ratio. Depending on the available fog computing resources, the offloaded computation task can be directly processed at the fog server or be further offloaded to the cloud server. The amount of data containing the computation outcome sent back to the users is usually much smaller than that incurred by offloading the task. Therefore, similar to [12, 20, 21], we do not consider the downlink transmission of the computation results in this paper. (The design in this paper can be extended to also account for the downlink transmission of feedback data.)
II-A1 Data Compression Model
Data compression can be achieved by eliminating only statistical redundancy (i.e., lossless compression) or by also removing unnecessary information (i.e., lossy compression). To realize it, compression and decompression algorithms must be executed at the data source and destination, respectively, which induces additional computation load. Since the compression computation load, decompression computation load, and compression quality are in general non-linear functions of the compression ratio, , we propose the following model:
where the superscripts stand for compression and decompression, respectively; the indicated interval represents the possible range of the compression ratio for the compression algorithm employed at user ; the two load terms denote the additional CPU cycles needed at the source for compression and at the destination for decompression, respectively (note that when the compression and decompression algorithms are executed at a fixed CPU clock speed, the computational load in CPU cycles is linearly proportional to the execution time); the quality term represents the perceived QoS (i.e., this parameter, which is only considered for lossy compression, measures the deviation between the true data and the decompressed data); is the maximum number of CPU cycles; and the remaining quantities are constant model parameters.
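Since the exact functional forms are fixed by the model equations above, the following is only a schematic Python sketch of such a non-linear load model; the exponential shape and every parameter value are assumptions chosen for illustration, not the paper's fitted model.

```python
import math

def compression_load(r, c_max, a):
    """Illustrative extra CPU cycles for compressing to ratio r in (0, 1].

    Here r = 1 is taken to mean no compression (zero extra load); smaller r
    (stronger compression) incurs exponentially more cycles, saturating at
    c_max. Both the form and the parameters are hypothetical.
    """
    assert 0.0 < r <= 1.0
    return c_max * (math.exp(a * (1.0 - r)) - 1.0) / (math.exp(a) - 1.0)

def recovery_quality(r, q_min):
    """Perceived QoS of lossy recovery: 1.0 at r = 1, degrading linearly
    towards q_min as the data are compressed more aggressively (assumed)."""
    return q_min + (1.0 - q_min) * r

# Stronger compression costs more computation:
loads = [compression_load(r, c_max=1e9, a=3.0) for r in (1.0, 0.5, 0.1)]
assert loads[0] == 0.0 and loads[0] < loads[1] < loads[2]
```

The monotone, saturating shape is the key property the joint optimization later exploits: lowering the compression ratio trades transmission time against extra CPU cycles.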
For validation, Fig. 2 illustrates the relation between the normalized compression/decompression execution time and the compression ratio, obtained from simulation and from fitting the proposed model, using the lossless algorithms GZIP and BZ2 for the benchmark text files “alice.txt” and “asyoulik.txt” from the Canterbury Corpus, and the lossy algorithm JPEG for the images “clyde-river.jpg” and “frog.jpg” from the Canadian Museum of Nature. Here, the normalized execution time is the ratio of the actual execution time to the maximum execution time over all values of the compression ratio. The figure shows that the curves obtained by fitting the proposed model match the simulation results well. (For validation, we followed a procedure similar to the one described in the literature. In particular, we first turned off all other applications to keep the CPU clock speed almost constant when executing the compression and decompression algorithms, using the ‘cpupower’ tool in Linux. Then, we ran the GZIP, BZ2, and JPEG algorithms in Python 3 via a Linux terminal under Ubuntu 18.04.1 LTS on a computer equipped with an Intel(R) Core(TM) i7-4790 CPU and 12 GB of RAM. The simulation results were obtained by averaging over 1000 realizations. This allowed us to estimate the normalized execution time, which is proportional to the normalized computational load.) This figure also confirms that the proposed non-linear model characterizes the computational load due to compression/decompression operations more accurately than the simple linear model adopted in [26, 9].
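A simplified version of this measurement can be reproduced with the Python standard library alone; the sketch below uses synthetic text instead of the Corpus files and treats `compresslevel` as a proxy for the compression-effort/ratio trade-off, so the numbers are only indicative.

```python
import bz2
import gzip
import time

# Synthetic, highly repetitive text standing in for the benchmark files.
data = b"All work and no play makes Jack a dull boy. " * 2000

def measure(compress, levels, reps=20):
    """Return (compression ratio, mean wall-clock time) per effort level."""
    results = []
    for lvl in levels:
        t0 = time.perf_counter()
        for _ in range(reps):
            out = compress(data, lvl)
        elapsed = (time.perf_counter() - t0) / reps
        results.append((len(out) / len(data), elapsed))
    return results

gzip_res = measure(lambda d, l: gzip.compress(d, compresslevel=l), [1, 5, 9])
bz2_res = measure(lambda d, l: bz2.compress(d, compresslevel=l), [1, 9])

# Normalize times by the maximum over all levels, as in Fig. 2.
t_max = max(t for _, t in gzip_res)
gzip_norm = [(r, t / t_max) for r, t in gzip_res]
assert all(0.0 < r < 1.0 for r, _ in gzip_res)  # repetitive text compresses
```

Averaging over repetitions and pinning the CPU clock (as the paper does with `cpupower`) is what makes execution time a usable proxy for computational load.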
II-A2 Computing and Offloading Model
We now introduce the binary offloading decision variables for the computation task of user , which indicate whether the application is executed at the mobile device, the fog server, or the cloud server, respectively; each variable equals one in the corresponding scenario and zero otherwise. Moreover, we assume that the offloadable CPU cycles are executed at exactly one location, so exactly one of these variables equals one. Then, the total computation loads of user at the mobile device and at the fog server are given, respectively, as
As the fog and cloud servers are generally connected to the power grid while the capacity of a mobile battery is limited, we focus on the energy consumption of the users. The local computation energy consumed by user and the local computation time can be expressed, respectively, as where is the CPU clock speed of user and denotes the energy coefficient specified by the CPU model. Let denote the CPU clock speed used at the fog server to process the offloaded task. Then, the computing time at the fog server is given by We assume that the computation task of each user is executed at the cloud server with a fixed delay of seconds. (The delay at the cloud server consists of two components: the execution time and the CPU set-up time. Due to the huge computing resources of the cloud server, the execution time is generally much smaller than the CPU set-up time, which is identical for all users.)
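For concreteness, local computation cost can be sketched under the standard CMOS dynamic-power model, in which energy per cycle scales with the square of the clock speed; this quadratic dependence and the coefficient value below are stated as assumptions, since the paper's exact symbols are given in the equations above.

```python
def local_cost(cycles, f_hz, kappa=1e-27):
    """Local execution time and energy for a task of `cycles` CPU cycles
    run at clock speed f_hz, under the assumed model
    t = cycles / f, E = kappa * cycles * f^2 (kappa is illustrative)."""
    t = cycles / f_hz                  # execution time [s]
    e = kappa * cycles * f_hz ** 2     # energy consumption [J]
    return t, e

t, e = local_cost(cycles=1e9, f_hz=1e9)  # a 1-Gcycle task at 1 GHz
assert t == 1.0
```

Note the tension this model creates: raising the clock speed cuts delay linearly but raises energy quadratically, which is exactly why delay and energy are traded off in the WEDC objective later.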
II-A3 Communication Model
In order to send the incurred data during the offloading process, we assume that zero-forcing beamforming is applied at the BS and the average uplink rate from user to the BS (fog server) is expressed as where is the uplink transmit power per Hz of user , denotes the transmission bandwidth, and in which represents the large-scale fading coefficient, is the noise power density (watts per Hz), and is the MIMO beamforming gain . It is assumed that the number of antennas is sufficiently large so that is identical for all users. Then, the uplink transmission time and energy of user can be computed as, respectively, where denotes the circuit power consumption per Hz. For the data transmission between the fog server and the cloud server, a backhaul link with capacity bps (bits per second) is assumed. Let denote the backhaul rate allocated to user , then the transmission time from the fog server to the cloud server is
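A numerical sketch of this uplink cost model follows, using the Shannon-rate form described above; all parameter values are illustrative and the effective channel gain lumps together the large-scale fading coefficient and the beamforming gain.

```python
import math

def uplink_cost(bits, p, W, g, n0, p_c):
    """Uplink rate, transmission time, and transmission energy (sketch).

    p:  transmit power per Hz, W: bandwidth [Hz], g: effective channel gain
    (fading x beamforming, assumed), n0: noise power density, p_c: circuit
    power per Hz. All values used below are illustrative.
    """
    rate = W * math.log2(1.0 + p * g / n0)  # average uplink rate [bit/s]
    t_tx = bits / rate                       # transmission time [s]
    e_tx = (p + p_c) * W * t_tx              # transmission energy [J]
    return rate, t_tx, e_tx

# Compressing the payload by half halves the transmission time (and energy):
_, t_full, _ = uplink_cost(8e6, p=1e-7, W=1e7, g=1e-3, n0=1e-13, p_c=1e-8)
_, t_half, _ = uplink_cost(4e6, p=1e-7, W=1e7, g=1e-3, n0=1e-13, p_c=1e-8)
assert abs(t_half - t_full / 2) < 1e-9
```

The linear dependence of transmission time and energy on the payload size is what makes data compression directly reduce the offloading cost.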
II-B Problem Formulation
Assume the users have to pay for their usage of the radio and computing resources at the fog/cloud servers. Then, the service cost of user can be modeled as where is the price per Hz of bandwidth for wireless data transmission and is the price paid for executing one CPU cycle at the fog/cloud servers. Assuming that a pre-determined contract agreement specifies a maximum service cost, the actual service cost must not exceed this limit; this constraint can be rewritten equivalently as Besides the constrained service cost, two important metrics for each user are the service latency and the consumed energy. Specifically, the total delay for completing the computation task of user is given by
In addition, the overall energy consumed at user for processing its task comprises the energy for local computation and for data transmission in the offloading case. Hence, the energy consumption of user is given by
Practically, all users want to save energy and enjoy low application execution latency. Hence, we adopt the weighted energy-delay cost (WEDC) as the objective function of each user as follows:
where and represent the weights corresponding to the service latency and consumed energy, respectively. These weights can be pre-determined by the users to reflect their priorities or interests. The proposed design aims to minimize the WEDC function for each user while maintaining fairness among all users. Towards this end, we consider the following min-max optimization problem:
where , ; is the maximum CPU clock speed of user , is the maximum CPU clock speed of the fog server, is the maximum transmit power of user , denotes the feasible range of the compression ratio which can guarantee the required QoS of the recovered data. In particular, for lossless data compression where the perceived QoS for all , this feasible range is determined as and . For lossy data compression where the perceived QoS is required to be greater than , this range is determined as and . In this problem, (C1) and (C2) represent the constraints on the computing resources at the users and the fog server, respectively, while the offloading decision constraints are characterized by (C3) and (C4). The constraints on the compression ratio are captured by (C5), while (C6) and (C7) impose constraints on the maximum user transmit power and the bandwidth, respectively. Finally, (C8) and (C9) are the constraints on the limited backhaul capacity and delay, respectively.
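To make the objective concrete, here is a minimal numerical sketch of the per-user WEDC and its min-max aggregation; the weights and the delay/energy values are purely illustrative.

```python
def wedc(delay_s, energy_j, w_t, w_e):
    """Weighted energy-delay cost: weighted sum of latency and energy."""
    return w_t * delay_s + w_e * energy_j

# Hypothetical (delay [s], energy [J]) outcomes for three users.
users = [(0.10, 0.5), (0.25, 0.1), (0.05, 0.9)]

# The min-max design minimizes the WORST user's WEDC, enforcing fairness.
objective = max(wedc(t, e, w_t=1.0, w_e=1.0) for t, e in users)
assert abs(objective - 0.95) < 1e-9  # user 3 dominates here
```

Minimizing this maximum (rather than the sum) is what prevents the optimizer from sacrificing one user's cost to improve the others'.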
III Optimal Algorithm Design for Data Compression at Only the Mobile Users
III-A Problem Transformation
To gain insight into its non-smooth min-max objective function, we recast into the following equivalent problem:
where is an auxiliary variable. This is an MINLP problem which is difficult to solve due to the complex fractional and bilinear forms of the transmission time and energy consumption, the logarithmic transmission rate function, and the mix of binary offloading decision variables and continuous variables. Conventional approaches usually decompose the problem into multiple subproblems which optimize the offloading decisions and the computing and radio resource allocation separately, as in [18, 21], or relax the binary variables, as in [14, 15]. These approaches can obtain only sub-optimal solutions.
To solve the problem optimally, we first study how to classify the users into two sets, namely, a “locally executing user set”, which is the set of users executing their applications locally, and an “offloading user set”, which is the set of users offloading their applications for processing at the fog/cloud server. This classification is important because, in all constraints of the problem, the optimization variables corresponding to the locally executing users are independent of the optimization variables of the other users. Hence, the decisions for the locally executing users can be optimized by decomposing the problem into independent per-user subproblems which can be solved separately. The optimal algorithm is developed based on the bisection search approach, where in each search iteration we perform: 1) user classification based on the current objective value using the results in Theorem 1 below; 2) feasibility verification for the sub-problem corresponding to the offloading user set; and 3) updates of the lower and upper bounds according to the feasibility verification outcome. The detailed design is presented in the following.
III-B User Classification
Let be the locally executing user set, and be the offloading user set. We further define any pair of sets satisfying as a user classification. By defining and , for a given classification , problem can be tackled by solving two sub-problems and for the users in sets and , respectively, as follows:
Note that the variable set corresponding to user in becomes since we have and the other variables can be set equal to zero when user executes its application locally. In such a scenario, can be simplified to . To attain more insight into the user classification, we now study the relationship between optimization sub-problems and in the following lemma.
We denote the optimal values of , , and as , , and , respectively. Then, we have
for any classification .
The merged optimal solutions of and are the optimal solution of if
If , then, we have .
The proof is given in Appendix A. ∎
Considering Lemma 1, instead of solving , we can equivalently solve the two sub-problems and . Moreover, a classification is optimal if the condition in (6) holds. The optimal solution of can be obtained as described in Proposition 1 while solving requires a more complex approach which will be discussed in Section III-D.
The optimal objective value of can be expressed as where is defined as
where and .
The proof is given in Appendix B. ∎
If is the optimum objective value of problem , then an optimal classification, , can be determined as
The proof is given in Appendix C. ∎
III-C General Optimal Algorithm Design
The results in Theorem 1 are now employed to develop an optimal algorithm for solving by iteratively solving and and updating until the optimal is obtained. The general optimal algorithm is presented in Algorithm 1. In this algorithm, we initially calculate for all users in as in (7). Then, we employ the bisection search to find the optimum where upper bound and lower bound are iteratively updated until the difference between them becomes sufficiently small, is feasible, and the sets and do not change. At convergence, the optimal classification solution can be obtained by merging the solutions of and . The optimal solution of can be determined using Proposition 1 and the verification of the feasibility of is addressed in the following. The relationship between the (sub)problems when solving is illustrated in Fig. 3.
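The structure of Algorithm 1 can be sketched as follows. The helper `is_feasible` stands in for the Algorithm 2 feasibility check and `local_wedc` holds each user's WEDC when executing locally, so this is a schematic of the bisection logic under assumed interfaces, not the paper's exact procedure.

```python
def jcora_bisection(local_wedc, is_feasible, lo=0.0, hi=None, eps=1e-6):
    """Bisection over the min-max objective value (schematic).

    local_wedc[i]: user i's WEDC when executing locally (per Proposition 1).
    is_feasible(offload_set, theta): stand-in for the Algorithm 2 check on
    the offloading users' sub-problem at target objective theta.
    """
    hi = hi if hi is not None else max(local_wedc)
    while hi - lo > eps:
        theta = (lo + hi) / 2.0
        # Theorem-1-style classification: users whose local WEDC exceeds
        # the current target theta must offload.
        offload = {i for i, c in enumerate(local_wedc) if c > theta}
        if is_feasible(offload, theta):
            hi = theta   # theta achievable: tighten the upper bound
        else:
            lo = theta   # infeasible: raise the lower bound
    return hi

# Toy oracle: feasible iff at most two users need to offload.
theta_opt = jcora_bisection([0.2, 0.5, 0.9, 1.4],
                            lambda s, th: len(s) <= 2)
assert 0.5 - 1e-5 <= theta_opt <= 0.5 + 1e-5
```

The monotonicity that makes bisection valid is visible in the toy oracle: raising the target objective can only shrink the set of users forced to offload, so feasibility is monotone in theta.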
III-D Feasibility Verification of
In order to verify the feasibility of , we consider the following problem
This problem minimizes the total required computing resource of the fog server subject to all constraints of except . Let be the objective value of problem . Then, the feasibility of can be verified by comparing to the available fog computing resource . In particular, problem is feasible if . Otherwise, is infeasible.
We propose to solve this problem as follows. First, recall that there are two possible scenarios (referred to as modes) for executing the tasks of the users in the offloading set: Mode 1, task execution at the fog server, and Mode 2, task execution at the cloud server. The fog computing resources are only required by the users in Mode 1, and the backhaul resources are only used by the users in Mode 2. Considering these two modes, a three-step solution approach is proposed to verify feasibility. In Step 1, the minimum required fog computing resource of every user is determined by assuming that it is in Mode 1; this step is accomplished by solving the corresponding sub-problem for every user (see Section III-D1). In Step 2, the minimum required backhaul rate for each user is determined by assuming that it is in Mode 2; this is accomplished by solving the corresponding sub-problem for every user (see Section III-D2). In Step 3, using the results obtained in the two previous steps, the problem is equivalently transformed into a mode-mapping problem (see Section III-D3).
III-D1 Step 1 - Minimum Fog Computing Resources for User
If the application of user is executed at the fog server, the minimum fog computing resource required for this application, denoted as , can be optimized based on the following sub-problem:
where , , , , and denote the respective constraints of user corresponding to , , and . In sub-problem , the WEDC function consists of posynomials and other terms involving . We can convert into a convex function via logarithmic transformation as follows. When , all variables in set must be positive to satisfy constraints (C0) and (C9); therefore, we can employ the following variable transformations: , , , , and . With these transformations, the objective function and all constraints of except and are converted into a linear form while the total delay and the WEDC in and can be rewritten, respectively, as where and . The convexity of is formally stated in the following proposition.
Sub-problem is convex with respect to set , where and .
The proof is given in Appendix D. ∎
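As a generic illustration of this convexification step (standard geometric-programming reasoning with placeholder symbols, not the paper's exact notation), a monomial delay constraint becomes linear under the logarithmic change of variables:

```latex
% A monomial constraint, e.g., a transmission-time bound of the form
%   \frac{b}{W\,r} \le T   (bits over bandwidth times spectral efficiency)
% becomes linear after substituting x = \ln b,\; y = \ln W,\; z = \ln r:
x - y - z \le \ln T .
% More generally, a posynomial \sum_k c_k \prod_j v_j^{a_{kj}} maps, with
% u_j = \ln v_j, to
\sum_k \exp\!\big(a_k^\top u + \ln c_k\big),
% a sum of exponentials of affine functions, which is convex in u.
```

This is why the transformed objective and constraints become linear or convex, while only the delay and WEDC expressions need the separate convexity argument of the proposition above.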
III-D2 Step 2 - Minimum Allocated Backhaul Resource for User
If the application of user is executed at the cloud server, the minimum backhaul capacity for transferring its application to the cloud server, denoted as , can be determined by solving the following sub-problem:
Similar to , can be converted to a convex problem via logarithmic transformations; thus, we can find the optimal point . If is infeasible, we set .
III-D3 Step 3 - Feasibility Verification
With the obtained values and , problem can be transformed to
where for a given . In fact, this is a “0-1 knapsack” problem, which can be solved optimally and efficiently using the CVX solver. If , combining the solutions of the ’s, ’s, and yields a feasible solution of for this value of . Hence, is feasible in this scenario. The feasibility verification is summarized in Algorithm 2.
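Under illustrative integer-valued resource requirements, the mode-mapping step can be sketched as a 0-1 knapsack solved by dynamic programming; all values and the discretization of backhaul units below are assumptions for illustration.

```python
def min_fog_load(f_req, r_req, capacity):
    """Minimum total fog computing load after mode mapping (sketch).

    Sending user i to the cloud (Mode 2) saves fog cycles f_req[i] but
    consumes r_req[i] backhaul units; choose the cloud set to maximize the
    saved fog load subject to the backhaul capacity (0-1 knapsack DP).
    """
    best = [0] * (capacity + 1)        # best[c] = max fog load saved
    for f, r in zip(f_req, r_req):
        for c in range(capacity, r - 1, -1):
            best[c] = max(best[c], best[c - r] + f)
    return sum(f_req) - best[capacity]  # load that must stay at the fog

f_req = [4, 3, 5, 2]   # fog cycles needed if executed at the fog (Mode 1)
r_req = [2, 2, 3, 1]   # backhaul units needed if sent to the cloud (Mode 2)
assert min_fog_load(f_req, r_req, capacity=4) == 7
```

Comparing the returned value against the available fog computing resource then answers the feasibility question, mirroring the check in Algorithm 2.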
III-E Optimal JCORA Algorithm to Solve
Based on the results presented in the previous sections, the solution of can be found by employing Algorithm 1 and the feasibility verification presented in Algorithm 2. The optimality of the obtained solution is formally stated in the following theorem.
III-F Complexity Analysis
We analyze the computational complexity of the JCORA algorithm in terms of the required number of arithmetic operations. In all proposed algorithms, the while-loop for the bisection search requires iterations. To verify feasibility in each iteration, the convex problems and can be solved using the interior-point method with complexity , where is the number of equality constraints and is the number of variables. It can be verified that both sub-problems have the same complexity. On the other hand, the knapsack problem for the users can be solved by Algorithm 2 in pseudo-polynomial time with complexity , where is determined by the coefficients in . Moreover, the per-user sub-problems can be solved independently for all users; therefore, the complexity of each bisection search step can be expressed as , where