I Introduction
The rapid growth of smart environments and the advent of the Internet of Things (IoT) have led to the generation of large amounts of data. However, transmitting such volumes of data through traditional networks is a daunting task because of limited bandwidth and energy [1]. These data need to be efficiently compressed, transmitted, and cached to satisfy the Quality of Information (QoI) required by end users. In fact, many wireless components operate on a limited battery power supply and are usually deployed in remote or inaccessible areas, which necessitates designs that can enhance the energy efficiency of the system with a QoI guarantee.
A particular example of modern systems that require high energy efficiency is the wireless sensor network (WSN). Consider a WSN with various types of sensors, which can generate enormous amounts of data to serve end users. On one hand, data compression has been adopted to reduce transmission (communication) cost at the expense of computation cost. On the other hand, caches can be used as a means of reducing transmission costs and access latency, thus enhancing QoI, but at the expense of the added caching cost. Hence, there exists a tradeoff in energy consumption among data communication, computation and caching. This raises the question: what is the right balance between compression and caching so as to minimize the total energy consumption of the network?
In this paper, we formulate an optimization problem to find the optimal data compression rate and data placement that minimize the energy consumed by data compression, communication and caching with a QoI guarantee in a communication network. The formulated problem is a Mixed Integer Nonlinear Programming (MINLP) problem with nonconvex functions, which is NP-hard in general. We propose a variant of the spatial branch-and-bound algorithm that guarantees global optimality. (Here, global optimality means that the obtained solution is within tolerance of the global optimal solution.)
Each node has the ability to compress and cache the data with some finite storage capacity. We focus on wireless sensor networks as our motivating example. In particular, as shown in Figure 1, we assume that only edge sensors generate data, and there exists a single sink node that collects and serves the requests for the data generated in this network. The model can be extended to include any arbitrary node that produces data at the expense of added notational complexity.
Computation: Data aggregation [2, 3] is the process of gathering data from multiple generators (e.g., sensors), compressing them to eliminate redundant information and then providing the summarized information to end users. Since only part of the original data is transmitted, data aggregation can conserve a large amount of energy. A common assumption in previous works is that the energy required to compress data is smaller than that needed to transmit data. Therefore, data compression was considered a viable technique for reducing energy consumption. However, it has been shown [4] that computational energy cost can be significant and may cause a net-energy increase if data are compressed beyond a certain threshold. Hence, it is necessary to consider both transmission and computation costs, and it is important to characterize the tradeoff between them [1].
Caching: Caches have been widely used in networks and distributed systems to improve performance by storing information locally, which jointly reduces access latency and bandwidth requirements, and hence improves user experience. Content Distribution Networks (CDNs), Software Defined Networks (SDNs), Named Data Networks (NDNs) and Content Centric Networks (CCNs) are important examples of such systems. The fundamental idea behind caching is to make information available at a location closer to the end user. Again, most previous work has focused on designing caching algorithms to enhance system performance without considering the energy cost of caching. Caching can reduce the transmission energy by storing a local copy of the data at the requesting node (or close by), hence eliminating the need for multiple retransmissions from the source node to the requesting node. However, caching itself can incur significant energy costs [5]. Therefore, analyzing the impact of caching on overall energy consumption in the network (along with data communication and compression) is critical for system design.
Quality of Information (QoI): The notion of QoI required by end users is affected by many factors. In particular, the degree of data aggregation in a system is crucial for QoI. It has been shown that data aggregation can deteriorate QoI in some situations [6]. Thus, an energy-efficient design for appropriate data aggregation with a guaranteed QoI is desirable.
We focus on a tree-structured sensor network where each leaf node generates data, and compresses and transmits the data to the sink node in the network, which serves the requests for these data from devices outside this network. Examples of such a setting are military sites, wireless sensors or societal networks, where a large number of devices gather data and wish to transmit the local information to any device outside this network that requires it. The objective of our work is to obtain the optimal data compression rate at each node, and an optimal data placement in the network, for minimizing energy consumption with a QoI guarantee.
I-A Organization and Main Results
Section I-B presents a review of relevant literature. In Section II, we describe our system model in which nodes are logically arranged as a tree. Each node receives and compresses data from its children node(s). The compressed data are transmitted and further compressed towards the sink node. Each node can also cache the compressed data locally. In Section III, we formulate the problem of energy-efficient data compression, communication and caching with a QoI constraint as a MINLP problem with nonconvex functions, which is NP-hard in general. We then show in Section IV that there exists an equivalent problem obtained through symbolic reformulation [7], and propose a variant of the Spatial Branch-and-Bound (VSBB) algorithm to solve it. We show that our proposed algorithm can achieve global optimality.
In Section V, we evaluate the performance of our optimization framework and show that the use of caching along with data compression and communication can significantly improve the energy efficiency of a communication network. More importantly, we observe that with the joint optimization of data communication, computation and caching (C3), energy efficiency can be improved by as much as compared to only optimizing communication and computation, or communication and caching (C2). The magnitude of the improvement depends on the parameter values and the energy costs of the model. While the improvement in energy efficiency is important, our framework also helps in characterizing and analyzing the enhancement in energy efficiency for different network settings. We also evaluate the performance of the proposed VSBB algorithm through extensive numerical studies. In particular, we make a thorough comparison with other MINLP solvers, namely Bonmin [8], NOMAD [9], Matlab’s genetic algorithm (GA), Baron [10], SCIP [11] and Antigone [12], under different network scenarios. The results show that our algorithm can achieve global optimality, and that the objective function value it achieves (lower is better for a minimization problem) is mostly better than that of stochastic algorithms such as NOMAD and GA, while it performs comparably with deterministic algorithms such as Baron, Bonmin, SCIP and Antigone. Furthermore, our algorithm provides a solution in varying network situations even when other solvers such as Bonmin and SCIP are not able to. We provide concluding remarks in Section VI.
I-B Related Work
To the best of our knowledge, there is no prior work that jointly considers communication, computation and caching costs in distributed networks with a QoI guarantee for end users.
Data Compression: Compression is a key operation in modern communication networks and has been supported by many data-parallel programming models [13]. For WSNs, data compression is usually performed over a hierarchical topology to improve communication energy efficiency [2], whereas we focus on the energy tradeoff among communication, computation and caching.
Data Caching: Caching plays a significant role in many systems with hierarchical topologies, e.g., WSNs, microprocessors, CDNs, etc. There is a rich literature on the performance of caching in terms of designing different caching algorithms, e.g., [14, 15], and we do not attempt to provide an overview here. However, none of these works considered the costs of caching, which may be significant in some systems [5]. The recent paper by Li et al. [16] is closest to the problem we tackle here. Our work differs from [16] in two main respects. First, the mathematical formulations are quite different: we consider energy tradeoffs among C3, while [16] focused on C2. Second, we provide an optimal solution to a MINLP problem, while [16] aimed at developing approximation algorithms.
Energy Costs: While optimizing energy costs in wireless sensor networks has been extensively studied [17], existing work is primarily concerned with routing [18], MAC protocols [17], and clustering [19]. With the growing deployment of smart sensors in modern systems [1], in-network data processing, such as data aggregation, has been widely used as a means of reducing system energy cost by lowering the data volume for transmission.
II Analytical Model
We represent the network as a directed graph For simplicity, we consider a tree, with nodes, as shown in Figure 2. It is possible to generalize our framework to general network topology with arbitrary source nodes, provided that the route between the source and requesting node is known. Node is capable of storing amount of data. Let with be the set of leaf nodes, i.e., . Time is partitioned in periods of equal length and data generated in each period are independent. Without loss of generality (W.l.o.g.), we consider one particular period in the remainder of the paper. We assume that only leaf nodes can generate data, and all other nodes in the tree receive and compress data from their children nodes, and either cache or transmit the compressed data to their parent nodes during time T. Arbitrary source nodes can also be incorporated into the model at the cost of added notational and model complexity.
Let be the amount of data generated by leaf node . The data generated at the leaf nodes are transmitted up the tree to sink node which serves requests for data generated in the network. Let be the depth of node in the tree. W.l.o.g., we assume that the sink node is located at level We represent a path from node to the sink node as the unique path of length as a sequence of nodes such that where (i.e., the sink node) and (i.e., the node itself).
We denote the per-bit reception, transmission and compression cost of node as , and respectively. Each node along the path can compress the data generated by leaf node with a data reduction rate , where The reduction rate characterizes the degree to which a node can compress the received data, which plays an important role in determining the QoI.
The higher the value of , the lower the compression will be, and vice versa. The higher the degree of data compression, the larger the amount of energy consumed by compression. Similarly, caching the data closer to the sink node may reduce the transmission cost for serving the request; however, each node has only finite storage capacity. We study the tradeoff among the energy consumed at each node for transmitting, compressing and caching the data.
Denote the total energy consumption at node as , which consists of reception cost , transmission cost , computation cost and storage (caching) cost ; it takes the form
where  
(1) 
The above energy consumption models for data transmission, compression and caching have been used in the literature [1, 20, 5] and are suitable for highlighting the energy consumption in a communication network. However, our formulation can be extended to incorporate various other energy consumption models as well. In (II), captures the computation energy. As computation energy increases with the degree of compression, we assume that is a continuous, decreasing and differentiable function of the reduction rate. One candidate function is [1, 20]. Moreover, we consider an energy-proportional model [5] for caching, i.e., if the received data is cached for a duration of where represents the power efficiency of caching, which strongly depends on the storage hardware technology. W.l.o.g., is assumed to be identical for all the nodes. For simplicity, denote = ++ as the sum of per-bit reception, transmission and compression cost at node per unit time.
During time period , we assume that there are requests at sink node for data generated by leaf node . For simplicity, we assume that the number of requests for the data of a node is constant. The boolean variable equals if the data from node is stored along the path at node otherwise it equals . We allow the data to be cached at only one node along the unique path between the leaf node and root node. For ease of notation, we define by Let denote the set of leaf nodes that are descendants of node . We also assume that the energy cost for searching for data at different nodes in the network is negligible [1, 15]. For convenience, let and For ease of exposition, the parameters used throughout this paper are summarized in Table I.
Notation  Description 

number of data (bits) generated at node  
reduction rate at node , is the ratio of amount of output data to input data  
the QoI threshold  
per-bit reception cost of node  
per-bit transmission cost of node  
per-bit compression cost of node  
if node caches the data from leaf node ; otherwise  
storage capacity of node  
caching power efficiency  
request rate for data from node  
total number of nodes in the network  
set of leaf nodes that are descendants of node  
time length that data are cached  
upper bound of the objective function  
list of regions  
any subregion in  
upper bound on the objective function in subregion  
lower bound on the objective function in subregion  
difference between the upper and lower bound  
lower bound on auxiliary variable in subregion  
upper bound on auxiliary variable in subregion  
candidate variable for branching  
chosen branching variable  
value at which the variable is branched  
bt  bilinear terms 
lft  linear fractional terms 
set of bilinear terms (bt)  
set of linear fractional terms (lft) 
III Energy Optimization
In this section, we first define the cost function in our model and then formulate the optimization problem. Data produced by every leaf node is received, transmitted, and possibly compressed by all nodes in the path from the leaf node to the root node, consuming energy
(2) 
where if . Equation (2) captures the one-time energy cost of receiving, compressing and transmitting data from leaf node (level ) to the sink node (level ); during every time period , data is always pushed towards the sink upon the first request. The amount of data received by any node at level from leaf node is due to the compression from level to The term captures the reception, transmission and compression energy cost for node at level along the path from leaf node to the sink node.
Let be the total energy consumed in responding to the subsequent requests. We have
(3) 
Note that the remaining requests are either served by the leaf node or a cached copy of data at level for W.l.o.g., we consider node at level . If data is not cached from up to the sink node (level , i.e., for the cost is incurred due to receiving, transmitting and compressing the data times, which is captured by the first term in Equation (3); the second term is . Otherwise, the requests are served by the cached copy at ; the corresponding caching and transmission costs of serving from are captured by the second term in Equation (3), and the corresponding reception, transmission and compression cost from up to the sink node is captured by the first term. Note that the first-time cost of receiving, transmitting and compressing the data from leaf node to is already captured by Equation (2).
We present a simple but illustrative example to explain the above equations.
Example 1.
We consider a network with one leaf node and one sink node, i.e., and Then the cost in Equation (2) becomes where the first and second terms capture the reception, transmission and compression cost for data at sink node and the leaf node, respectively.
The cost in Equation (3) is
where Term and Term capture the costs at sink node and leaf node, respectively. To be more specific, there are three cases: (i) data is cached at sink node , i.e., and (since we only cache one copy); (ii) data is cached at leaf node , i.e., and ; and (iii) data is not cached, i.e., . We consider these three cases in the following.
Case (i), i.e., and , Term becomes and Term reduces to
since all the requests are served from sink node. This indicates that the total energy cost is due to caching the data for time period and transmitting it times from the sink node to users that request it.
Case (ii), i.e., and , Term becomes , which captures the reception, transmission and compression costs at sink node for serving the requests. Term becomes , which captures the cost of caching data at the leaf node and transmitting the data times from the cached copy to the sink node . The sum of them is the total cost to serve requests.
Case (iii), i.e., , , which captures the reception, transmission and compression costs at sink node and leaf node for serving the requests since there is no cached copy in the network.
The total energy consumed in the network is ,
(4) 
where and . Our objective is to minimize the total energy consumption of the network with a QoI constraint for end users by choosing the compression ratio vector and caching decision vector in the network. Therefore, the optimization problem is
s.t.  
(5) 
where is the depth of node in the tree.
The first constraint is the QoI constraint, i.e., the total data available at the sink node [1]. The second constraint indicates that our decision (caching) variable is binary. The third constraint is on total amount of data that can be cached at each node. The fourth constraint is that at most one copy of the generated data should be cached on the path between the leaf node and the sink node.
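The four constraint families described above can be illustrated with a small feasibility check. This is a minimal sketch, not the paper's formulation: it assumes that the data of a leaf remaining after a node equals the generated amount multiplied by the reduction rates applied so far, that the QoI constraint is a lower bound on the total data reaching the sink, and that storage is counted on the compressed amount. The function name, argument layout and example values are hypothetical.

```python
from math import prod


def check_constraints(paths, y, delta, b, capacity, gamma, tol=1e-9):
    """Check the four constraint families for one candidate solution.

    paths[k]    : node ids on the path of leaf k, ordered leaf -> sink
    y[k]        : amount of data (bits) generated by leaf k
    delta[k][i] : reduction rate applied at the i-th node of paths[k]
    b[k][i]     : 1 if the data of leaf k is cached at that node, else 0
    capacity[v] : storage capacity of node v
    gamma       : QoI threshold on the data reaching the sink
    """
    def amount(k, i):
        # Assumed form: data of leaf k after compression at positions 0..i.
        return y[k] * prod(delta[k][: i + 1])

    # Caching decisions must be binary.
    binary_ok = all(v in (0, 1) for k in paths for v in b[k])
    # At most one cached copy on each leaf-to-sink path.
    one_copy_ok = all(sum(b[k]) <= 1 for k in paths)
    # QoI: enough data must reach the sink (last node of every path).
    at_sink = sum(amount(k, len(paths[k]) - 1) for k in paths)
    qoi_ok = at_sink >= gamma - tol

    # Storage used at each node by the copies placed there.
    used = {}
    for k, path in paths.items():
        for i, node in enumerate(path):
            used[node] = used.get(node, 0.0) + b[k][i] * amount(k, i)
    capacity_ok = all(used[v] <= capacity[v] + tol for v in used)

    return {"qoi": qoi_ok, "binary": binary_ok,
            "capacity": capacity_ok, "one_copy": one_copy_ok}
```

For two leaves sharing a relay and a sink, a placement that caches one leaf's data at the relay and the other's at the sink passes all four checks only if both nodes have enough storage and the compressed total at the sink still meets the threshold.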
The optimization problem in (5) is a nonconvex MINLP problem with continuous variables, the ’s and binary variables, the ’s where, = .
III-A Properties
We first analyze the complexity of the problem given in (5). There are two decision variables in (5), i.e., the compression ratio and caching decision variables. To analyze the impact of these variables on the complexity of the problem, we consider two cases: (i) given the caching decision variables , solving for the optimal compression rates, and (ii) given the compression ratio , solving for the optimal cache placement decision.
III-A1 Given Caching Decisions
For given caching decision variables the optimization problem in (5) turns into a constrained polynomial minimization over the positive quadrant (PMoP) [21] with respect to the compression ratio, which is an NP-hard problem [21].
Theorem 1.
Given fixed caching decisions the optimization problem in (5) is NP-hard.
Proof.
We prove the hardness by reduction from the classical job-shop problem, which is NP-hard [22].
We can reduce the job-shop problem to our problem in (5) with fixed caching decisions as follows. Consider each node in our model to be a machine . Denote the set of machines as . The compression rate constitutes the set of jobs , where indicates the compression rate at any machine . Let be the set of all sequential job assignments to different machines so that every machine performs every job only once. The elements can be written as matrices, where column orderwise lists the sequential jobs that the machine will perform. There is a cost function that captures the cost (energy) for any machine to perform a particular job (compression) along with data transmission, reception and caching. Our objective in the optimization problem (5) is to find an assignment of jobs that minimizes energy consumption, which is equivalent to the classical job-shop problem. Since the job-shop problem is NP-hard [22], our problem in (5) with a given cache placement decision is also NP-hard.
∎
III-A2 Given Compression Ratios
Given compression ratios , the optimization problem in (5) is only over the caching decision variables Hence, we obtain an integer programming problem, which is NP-hard.
Theorem 2.
Given a fixed compression ratio the optimization problem in (5) is NP-hard.
Proof.
We prove the hardness by reduction from the classical job-shop problem, which is NP-hard [22]. The proof is the same as the proof of Theorem 1. However, the job in this case is whether or not to cache the data, i.e., the caching decision constitutes the set of jobs , where means that the data is cached and means otherwise. The cost function in this case captures the cost (energy) for any machine to cache the data (along with data transmission, reception and compression). Our objective in the optimization problem (5) (with a fixed compression ratio) is to find an assignment of jobs that minimizes energy consumption, which is equivalent to the classical job-shop problem. ∎
Therefore, given the results in Theorems 1 and 2, we know that our optimization problem is NP-hard in general.
Corollary 1.
The optimization problem defined in (5) is NP-hard.
Remark 1.
The objective function defined in (5) is monotonically increasing in the number of requests for all provided that and are fixed.
Notice that (2) is independent of and (3) is linear in , and its multipliers are positive. Hence, for any fixed and , (4) increases monotonically with .
Remark 2.
Given a fixed network scenario, if we increase the number of requests for the data generated by leaf node then these data will be cached closer to the sink node or at the sink node, if there exists enough cache capacity, to reduce the overall energy consumption.
For fixed , observe from (3) that energy consumption decreases if the cache is moved closer to the root as the nodes deep in the tree do not need to retransmit.
III-B Relaxation of Assumptions
In our model, we make several assumptions for the sake of simplicity. In the following, we discuss the relaxation of these assumptions.
While we assume that the network is structured as a tree, this assumption can be easily relaxed as long as there exists a simple fixed path from each leaf node to the sink node. The tree structure represents a simple topology that captures the key parameters in the optimization formulation without the complexity introduced by a general network topology. Furthermore, for simplicity, we assume that all parameters across the nodes are identical, which is not necessary as seen from the cost function. We also assume that only leaf nodes generate data. However, our model can be extended to allow intermediate nodes to generate data at the cost of added complexity.
IV Variant of Spatial Branch-and-Bound Algorithm
In this section, we present a variant of the Spatial Branch-and-Bound algorithm (VSBB). Instead of solving the MINLP problem (5) directly, we use VSBB to solve a standard form of the original MINLP. We first introduce the Symbolic Reformulation [7] method, which reformulates the MINLP (5) into the standard form needed by VSBB.
Definition 1.
A MINLP problem is said to be in a standard form if it can be written as
s.t.  
(6) 
where the vector of variables consists of continuous and discrete variables in the original MINLP. The sets and contain all relationships that arise in the reformulation. and are a matrix and a vector of real coefficients, respectively. The index obj denotes the position of a single variable corresponding to the objective function value within the vector
Theorem 3.
The nonconvex MINLP problem (5) can be transformed into a standard form.
Due to space constraints, we relegate detailed reformulations (see Appendix B for details of symbolic reformulation) and standard form of (5) to Appendix A.
Here, we give an example to illustrate the above reformulation process.
Example 2.
For the same network as in Example 1, the nonconvex MINLP problem becomes
s.t.  
(7) 
is a bilinear term. Based on symbolic reformulation rules, a new bilinear auxiliary variable needs to be added. The first constraint in (2) is then transformed into which is linear in auxiliary variable . Similarly, we add for linearfractional term that appears in in the third constraint of (2) is a trilinear term. Since is replaced by , we obtain a bilinear term . Again, based on symbolic reformulation rules, is replaced by a new auxiliary variable . Similarly we add new auxiliary variables , and . The objective function in (2) can be then expressed as a function of these new auxiliary variables. Therefore, the standard form of (2) is
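The substitution pattern used in this example, where each product of two variables is replaced by one auxiliary variable so that a trilinear term becomes a chain of bilinear definitions, can be sketched as follows. This is only an illustration under assumptions: the helper name, the variable names (`delta0`, `delta1`, `b0`) and the `w1, w2, ...` naming scheme are hypothetical and are not taken from the reformulation in [7].

```python
def reformulate_product(factors, aux_defs, prefix="w"):
    """Reduce a product of variable names to a single variable name.

    Each multiplication of two variables is replaced by one auxiliary
    variable. `aux_defs` maps an unordered pair of names to the auxiliary
    variable defined as their product, so repeated products are reused.
    """
    current = factors[0]
    for nxt in factors[1:]:
        key = tuple(sorted((current, nxt)))
        if key not in aux_defs:
            # New bilinear definition: aux = current * nxt
            aux_defs[key] = f"{prefix}{len(aux_defs) + 1}"
        current = aux_defs[key]
    return current


aux = {}
# Bilinear term: delta0 * delta1 -> w1
first = reformulate_product(["delta0", "delta1"], aux)
# Trilinear term: delta0 * delta1 * b0 -> (w1) * b0 -> w2
second = reformulate_product(["delta0", "delta1", "b0"], aux)
```

After these two calls, `aux` holds two bilinear definitions, and the trilinear term is expressed through the auxiliary variable already introduced for the bilinear one, mirroring the step-by-step replacement described in the example.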
s.t.  
(8) 
Through this reformulation, the nonconvex and nonlinear terms in the original problem are transformed into bilinear and linear fractional terms, which can be readily used to compute the lower bound of each region in VSBB, as discussed in detail later. This is the reason VSBB requires reformulating the original problem into a standard form.
Theorem 4.
The reformulated problem and the original MINLP are equivalent.
Proof is available in Section (page ) [7].
Due to the reformulation, the number of variables in the reformulated problem is larger than in the original MINLP. In the following, we show that the number of auxiliary variables that arise from symbolic reformulation is bounded.
Remark 3.
The number of auxiliary variables in the symbolic reformulation is where is the number of variables in the original formulation.
From [23], a way to transform a general form optimization problem into a standard form (6) is through basic arithmetic operations on original variables. To be more specific, any algebraic expression results from the basic operators including the five basic binary operators, i.e., addition, subtraction, multiplication, division and exponentiation, and the unary operators, i.e., logarithms etc. Therefore, in order to construct a standard problem consisting of simple terms corresponding to these binary or unary operations, new variables need to be added corresponding to these operations. From the symbolic reformulation process [23, 24, 25], any added variable results from the basic operations between two (including possibly the same) original variables or added variables. Hence, based on the basic operations, there are at most combinations of these variables, given that there are variables in the original problem (5). Therefore, the number of added variables in the symbolic reformulation is bounded as In the remainder of this section, we present the VSBB to solve the equivalent problem.
IV-A Our Variant of Spatial Branch-and-Bound
The proposed spatial branch-and-bound method is a variant of the method proposed in [23] and is primarily tuned for solving our optimization problem (12), whose solution is also the solution of (5). Our algorithm differs from [23] in the following ways:

We do not use any bounds tightening steps, as they do not always guarantee faster convergence [26] and, in the case of our problem, slowed down the process.

By eliminating the bounds tightening step, we do not need to calculate the lower bound again separately and can reuse the lower bound obtained in Step 2 for the chosen region , hence reducing the computational complexity of the algorithm.
Algorithm 1 provides an overview of the steps involved in the spatial branch-and-bound algorithm. We describe some of the steps in Algorithm 1 in detail below.
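To make the overall loop concrete, the following is a minimal one-dimensional sketch of a spatial branch-and-bound scheme with the least-lower-bound region choice, fathoming, and a tolerance-based stop. It is not Algorithm 1 itself: a Lipschitz bound stands in for the convex relaxation, evaluation at the region midpoint stands in for a local solver, and the objective `demo_objective` and all names are hypothetical.

```python
import heapq
import math


def demo_objective(x):
    # Hypothetical nonconvex test function; |f'(x)| <= 3.5 everywhere.
    return math.sin(3.0 * x) + 0.5 * x


def spatial_bb(f, lo, hi, lipschitz, eps=1e-4, max_iter=100000):
    """Minimize f on [lo, hi] to within eps of the global minimum."""
    def lower_bound(a, b):
        # Valid lower bound on [a, b]: f(x) >= f(mid) - L * (b - a) / 2.
        return f((a + b) / 2.0) - lipschitz * (b - a) / 2.0

    best_x = (lo + hi) / 2.0
    best_val = f(best_x)                      # incumbent (upper bound)
    regions = [(lower_bound(lo, hi), lo, hi)]  # heap ordered by lower bound

    for _ in range(max_iter):
        if not regions:
            break
        lb, a, b = heapq.heappop(regions)     # least-lower-bound rule
        if lb > best_val - eps:
            break                             # all remaining regions are fathomed
        mid = (a + b) / 2.0
        val = f(mid)                          # candidate upper bound
        if val < best_val:
            best_x, best_val = mid, val
        for c, d in ((a, mid), (mid, b)):     # branch at the midpoint
            child_lb = lower_bound(c, d)
            if child_lb <= best_val - eps:    # keep only promising regions
                heapq.heappush(regions, (child_lb, c, d))
    return best_x, best_val


x_opt, f_opt = spatial_bb(demo_objective, -2.0, 2.0, lipschitz=3.5)
```

Because the heap always returns the region with the smallest lower bound, the search can stop as soon as that bound is within `eps` of the incumbent: no remaining region can improve on it by more than the tolerance.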
Step 2: There are a number of approaches that can be used to choose a subregion from [27]. Here we use the least lower bound rule, i.e., we choose the subregion that has the lowest lower bound among all the subregions, since it is a widely used and well-researched method. The lower bound can be obtained by solving a convex relaxation of the problem in (12). As our optimization problem in (5) and (12) contains only bilinear and linear fractional terms, we use McCormick linear overestimators and underestimators [28] (see Appendix C) to obtain a convex relaxation of all such terms. The resulting problem is then a Mixed Integer Linear Programming (MILP) problem that we solve using the SCIP solver [11], a fast and well-known solver for MILP problems. The subregion with the lowest lower bound is then used as the region to explore for an optimum. The chosen region’s lower bound is used as . If the convex relaxation is infeasible or if the obtained lower bound is higher than the existing upper bound of the problem, we fathom (delete) the current region by moving to step .
Step : In step , we calculate the upper bound for the subregion chosen in Step 2. This can be done in a number of ways (see [23]); here we use a local MINLP solver such as Bonmin [8] to obtain a local minimum for the subregion, as it performed better in terms of time than using local nonlinear programming optimization with fixed discrete values or added discreteness constraints in our simulation settings. If the upper bound for the region cannot be obtained or if it is greater than then we move to Step to further divide the region and search for a better solution. Otherwise, we set it as the current best solution and delete all the subregions whose lower bound is greater than the obtained upper bound, since such regions cannot contain the global optimal solution. If the difference between the upper and lower bound for the region is within the tolerance, the current subregion need not be searched further and we delete it by going to step ; otherwise, we move to step for further searching in the space.
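The McCormick relaxation used for the lower bound can be illustrated for a single bilinear term w = x·y over a box. This is a minimal sketch of the standard McCormick inequalities only; the function name and arguments are hypothetical, and the paper's treatment of linear fractional terms (Appendix C) is not reproduced here.

```python
def mccormick_bounds(x, y, xl, xu, yl, yu):
    """Evaluate the McCormick envelope of w = x * y at the point (x, y).

    (xl, xu) and (yl, yu) are the variable bounds of the current region.
    Returns (under, over) such that under <= x * y <= over inside the box.
    """
    # Two linear underestimators; the envelope is their pointwise maximum.
    under = max(xl * y + x * yl - xl * yl,
                xu * y + x * yu - xu * yu)
    # Two linear overestimators; the envelope is their pointwise minimum.
    over = min(xu * y + x * yl - xu * yl,
               xl * y + x * yu - xl * yu)
    return under, over


# At the center of the unit box the relaxation is loose around 0.25 ...
center = mccormick_bounds(0.5, 0.5, 0.0, 1.0, 0.0, 1.0)
# ... and shrinking the box around the same point tightens it.
tighter = mccormick_bounds(0.5, 0.5, 0.25, 0.75, 0.25, 0.75)
```

The gap between the two estimators shrinks as the box shrinks, which is why partitioning a region improves the lower bound obtained from the relaxation.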
Step : Step , also known as the branching/partitioning step, partitions a region to further refine the search for a solution. In the branching step, we select a variable for branching as well as the value of the variable at which the region is to be divided. There are a number of different rules and techniques that can be used for branching (see [27] for a detailed discussion). Here we use the variable selection and value selection rule specified in [24], since it has been found efficient for our problem [24].
We branch on the variable that causes the maximum reduction in the feasibility gap between the solution of the convex relaxation (the solution of Step 2) and the exact problem. To do so, the approximation error for the bilinear and linear fractional terms in (12) is calculated using (9a) and (9b), respectively, where means the value of the variable obtained in Step 2. The auxiliary variable with the maximum approximation error is selected, as branching on it tightens the gap between the relaxation and the exact problem [24]. Its defining term yields two candidate variables for branching, i.e., and . If one of the variables is discrete (binary in our case) and the other is continuous, then we choose the discrete variable, since it will result in only a finite number of branches. However, if both variables are of the same type (either binary or continuous), then the branching variable is chosen using (10), i.e., we choose the variable whose value is closer to the midpoint of its range. To apply this rule, we first need to obtain the branching value for the candidate variables (the value at which to branch). should be between the upper and lower bounds of the variable in the region , i.e., . The rules for the choice of the branch point have been set in [24]; we restate them here for the sake of completeness.

Set to the value obtained in Step , i.e.,

If any feasible upper bound has been obtained and , then and stop the search for the value.

If step provided an upper bound for the subregion , then .
After obtaining the branch point value, we have all the parameters required for (10) and can then choose the variable for branching.
(9a)  
(9b) 
(10) 
We partition the subregion into and and add , into our region list . Then we move to Step and delete the subregion from the list .
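The variable selection rule described in this step can be sketched as follows. This is a minimal sketch under stated assumptions, not the exact rule of [24]: the class names `Var` and `Term`, the helper names, and the example values are hypothetical. The approximation error is assumed to be the absolute difference between the auxiliary variable and the exact term value at the relaxation solution, the branch point is assumed to be the relaxation value itself, and closeness to the midpoint is assumed to be measured on the range normalized to [0, 1].

```python
from dataclasses import dataclass


@dataclass
class Var:
    name: str
    lo: float            # lower bound in the current region
    hi: float            # upper bound in the current region
    value: float         # value from the relaxation (assumed branch point)
    is_binary: bool = False


@dataclass
class Term:
    kind: str            # "bilinear" (w = x*y) or "fractional" (w = x/y)
    w: float             # auxiliary variable value from the relaxation
    x: Var
    y: Var


def approx_error(term):
    """Assumed error measure: |w - exact term value| at the relaxation point."""
    if term.kind == "bilinear":
        exact = term.x.value * term.y.value
    else:
        exact = term.x.value / term.y.value
    return abs(term.w - exact)


def midpoint_distance(var):
    """Normalized distance of the branch point from the range midpoint."""
    width = var.hi - var.lo
    if width <= 0.0:
        return float("inf")   # a fixed variable cannot be branched on
    return abs(0.5 - (var.value - var.lo) / width)


def select_branching_variable(terms):
    """Pick the worst-approximated term, then choose between its two variables."""
    worst = max(terms, key=approx_error)
    x, y = worst.x, worst.y
    if x.is_binary != y.is_binary:
        return x if x.is_binary else y       # prefer the discrete variable
    return min((x, y), key=midpoint_distance)


# Hypothetical relaxation solution with one bilinear and one fractional term.
r = Var("r", 0.0, 10.0, 2.0)
c = Var("c", 0.0, 1.0, 1.0, is_binary=True)
p = Var("p", 1.0, 5.0, 3.0)
q = Var("q", 1.0, 5.0, 1.5)
mixed_choice = select_branching_variable(
    [Term("bilinear", 1.0, r, c), Term("fractional", 2.1, p, q)])
continuous_choice = select_branching_variable([Term("fractional", 2.1, p, q)])
```

In the first call the bilinear term has the larger error and the binary variable is preferred; in the second both candidates are continuous, so the one whose value sits nearer the middle of its range is selected.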
IV-B Convergence of Spatial Branch-and-Bound
V Evaluation
We evaluate the performance of our communication, compression and caching (C3) joint optimization framework through a series of experiments on several network topologies, shown in Figure 3. Our goal is to analyze the performance of C3 and to assess the improvement in energy efficiency achieved by jointly considering the C3 costs, compared with C2. While highlighting the performance gain is valuable, characterizing the performance of C3 under different settings and parameters, and obtaining the optimal caching location and data compression rate, are also of great significance. We also compare the performance of our VSBB algorithm with that of other well-known solvers.
The highlights of the evaluation results are:

Our C3 joint optimization framework improves energy efficiency by as much as compared to the C2 optimization over communication and computation, or communication and caching. This shows the significance of jointly considering the C3 energy costs.

The improvement in energy efficiency with the C3 framework increases with the number of requests and the network size. Furthermore, the data of the nodes with the largest number of requests are cached at the sink node or closer to it.

When comparing different MINLP solvers, the VSBB algorithm obtains a global optimal solution (within the tolerance) in most situations. We vary the network parameters and find that VSBB is able to obtain a feasible solution in all settings. SCIP, Baron, Bonmin and Antigone are faster in obtaining solutions. However, they either fail to obtain a solution in some settings or provide an objective value higher than that of our algorithm, particularly for lower values of .
V-A Methodology
Our primary goal is to highlight the improvement in energy efficiency achieved by the C3 framework when compared with C2. We define the energy efficiency as:
(11) 
where and are the optimal energy costs under the C3 optimization framework in (5) and the C2 optimization, respectively. reflects the reduction in energy cost of the C3 over the C2 optimization. While the increase in energy efficiency using the C3 framework is noteworthy, characterizing the magnitude of the improvement and the parameters that significantly affect the energy efficiency is important. Such a characterization can help in identifying the operating regions for the network and in devising heuristic algorithms for specific operating regions. We also compare the performance of VSBB with other MINLP solvers and show that it performs comparably for our C3 framework.
Solver  Characteristics 

Bonmin [8]  A deterministic approach based on the Branch-and-Cut method that solves the relaxation problem with the Interior Point Optimization tool (IPOPT), and the mixed integer problem with COIN-OR Branch and Cut (CBC). 
NOMAD [9]  A stochastic approach based on the Mesh Adaptive Direct Search (MADS) algorithm that guarantees local optimality. It can be used to solve nonconvex MINLPs and has relatively good performance. 
GA [29]  A metaheuristic stochastic approach that can be tuned to solve global optimization problems. We use the implementation in Matlab’s Optimization Toolbox. 
SCIP [11]  One of the fastest noncommercial, deterministic global optimization solvers; it uses a branch-and-bound algorithm for solving MINLP problems. 
Baron [10]  A deterministic global solver for MINLP problems that relies on a Branch-and-Cut approach. 
Antigone [12]  A deterministic global solver for MINLP problems that exploits the special structure of the problem and uses a Branch-and-Cut approach to solve it. 
Setup: We implement VSBB in Matlab on a Core i GHz CPU with GB RAM. The candidate MINLP solvers in this work include Bonmin, NOMAD and GA, which are run through the OPTI Toolbox [30]. We summarize the characteristics of these solvers in Table II. Note that these solvers can be applied directly to the original optimization problem in (5), while our VSBB solves the equivalent reformulated problem. The required reformulations are carried out by a Java-based module, and we derive the bounds on the auxiliary variables. We also relax the integer constraint in (5) to obtain a nonlinear programming problem, which we solve with IPOPT [31] and use as a benchmark for comparison. VSBB terminates when optimality is obtained or a computation timer of seconds expires. We take in our study. If the timer expires, the last feasible solution is taken as the best solution. For cases where no solution is obtained within the specified time limit, we increase the limit to seconds. Our simulation parameters are provided in Table III; they are typical values used in the literature [1, 17, 32].
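The termination rule described above (stop once the gap between the best upper bound and the lowest lower bound is within the tolerance, or once the timer expires and the last feasible solution is returned) can be sketched on a toy problem. This is a minimal sketch and not the VSBB implementation: the one-dimensional objective `f`, the termwise interval lower bound, the midpoint upper bound, and the default values of `eps` and `time_limit` are all assumptions made for illustration. In particular, the interval bound stands in for the convex relaxation, and a midpoint evaluation stands in for the local solver.

```python
import heapq
import time


def f(x):
    """Toy nonconvex objective (hypothetical, for illustration only)."""
    return x**4 - 3.0 * x**2 + x


def lower_bound(a, b):
    """Valid lower bound of f on [a, b], bounding each term separately."""
    x4_min = 0.0 if a <= 0.0 <= b else min(a**4, b**4)
    x2_max = max(a * a, b * b)
    return x4_min - 3.0 * x2_max + a


def branch_and_bound(a, b, eps=1e-3, time_limit=5.0):
    """Best-first search; returns (x, f(x), status)."""
    start = time.monotonic()
    best_x = 0.5 * (a + b)
    best_val = f(best_x)                      # incumbent upper bound
    heap = [(lower_bound(a, b), a, b)]        # regions keyed by lower bound
    while heap:
        lb, lo, hi = heapq.heappop(heap)      # region with lowest lower bound
        if best_val - lb <= eps:
            return best_x, best_val, "converged"
        if time.monotonic() - start > time_limit:
            return best_x, best_val, "timeout"  # last feasible solution
        mid = 0.5 * (lo + hi)
        val = f(mid)                          # upper bound from a feasible point
        if val < best_val:
            best_x, best_val = mid, val
        for c_lo, c_hi in ((lo, mid), (mid, hi)):
            c_lb = lower_bound(c_lo, c_hi)
            if c_lb < best_val - eps:         # otherwise fathom the child
                heapq.heappush(heap, (c_lb, c_lo, c_hi))
    return best_x, best_val, "converged"


x_best, val_best, status = branch_and_bound(-2.0, 2.0)
```

Because regions are explored in order of their lower bound, the lower bound of the region just popped is a valid global lower bound, so the gap test applies to the whole problem rather than to a single region.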
Parameter  Value  Parameter  Value (Joules) 

1000  50 10  
100  200 10  
1.88 10  80 10  
10s  [] 
Solver  

Obj.  Time (s)  Obj.  Time (s)  Obj.  Time (s)  Obj.  Time (s)  Obj.  Time (s)  
Bonmin  0.010  0.076  0.018  0.07  0.026  0.071  0.032  0.077  0.039  0.102 
NOMAD  0.012  1.036  0.038  0.739  0.033  0.640  0.038  0.203  0.039  0.263 
GA  0.010  0.286  0.018  2.817  0.026  7.670  0.042  11.020  0.064  3.330 
VSBB  0.010  18.231  0.018  17.389  0.026  12.278  0.032  7.327  0.039  19.437 
SCIP  Inf  0.07  0.0012  0.07  0.005  0.05  0.011  0.087  0.039  0.05 
Baron  0.01  0.91  0.018  0.79  0.026  0.77  0.032  0.87  0.039  0.49 
Antigone  0.01  0.195  0.018  0.18  0.026  0.175  0.032  0.19  0.039  0.2 
Solver  

Obj.  Time (s)  Obj.  Time (s)  Obj.  Time (s)  Obj.  Time (s)  Obj.  Time (s)  
Bonmin  0.0002  0.214  0.039  0.164  0.078  0.593  0.117  0.167  0.156  0.212 
NOMAD  0.004  433.988  0.121  381.293  0.108  203.696  0.158  61.093  0.181  26.031 
GA  0.043  44.538  0.096  30.605  0.164  44.970  0.226  17.307  0.303  28.820 
VSBB  0.0001  1871.403  0.039  25.101  0.078  30.425  0.117  23.706  0.156  19.125 
SCIP  NC  5901.7  NC  7200  NC  4829.4  NC  7200  0.156  1.37 
Baron  0.0002  0.74  0.039  1002.14  0.078  7200  0.117  3.41  0.156  0.15 
Antigone  0.0002  3.57  0.039  0.38  0.081  0.34  0.117  0.32  0.156  0.13 
V-B Efficacy of the C3 Framework
Figure 4 shows the increase in energy consumption with increase in the number of requests in different network and compression settings. We observe that as the number of requests increases, the total energy cost increases, as reflected in Remark 1. An important observation is that the initial increase in the energy cost is large. However, when the data are cached (number of requests ), the slope decreases. This is because the transmission cost is usually much larger than the caching cost (using the energy proportional model for caching [5]) and once the data are cached, the cached copy is used to satisfy other requests.
For the energy efficiency, we compare the total energy costs under the joint C3 optimization with those under the C2 optimization. We consider two cases for the C2 optimization: (i) Co (Communication and Computation), where we set for each node to avoid any data caching; (ii) Ca (Communication and Caching), where we set which is equivalent to , i.e., no computation. A comparison between C3, Co and Ca is shown in Figure 5. For the parameters used in our simulation, the energy cost of the C3 joint optimization is lower than that of the Co optimization for the same parameter setting. This highlights the improvement that can be achieved using the C3 framework. In other words, although C3 incurs caching costs, it may significantly reduce the communication and computation costs, which in turn brings down the total energy cost. Similarly, the C3 optimization outperforms Ca. Using Equation (11), energy efficiency improves by as much as for the C3 framework when compared with the C2 formulation. These trends are also observed in the other candidate network topologies. Figure 6 shows the improvement that C3 brings in comparison with C2 for a two-node network. Using Equation (11), energy efficiency improves by as much as for the C3 framework when compared with the C2 formulation. The results for the three-node and four-node networks are presented in Tables VIII and IX.
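The comparison described above can be sketched as a small calculation. This is a minimal sketch that assumes the energy efficiency of Equation (11) is the relative reduction in optimal energy cost of the joint optimization with respect to the two-cost baseline, expressed as a percentage; the function name `energy_efficiency`, its argument names, and the example costs are hypothetical and are not results from this paper.

```python
def energy_efficiency(e_joint, e_baseline):
    """Percentage reduction in energy cost of the joint optimization.

    Assumed form: 100 * (e_baseline - e_joint) / e_baseline, where
    e_joint is the optimal cost of the joint optimization and
    e_baseline is the optimal cost of the two-cost optimization.
    """
    if e_baseline <= 0.0:
        raise ValueError("baseline energy cost must be positive")
    return 100.0 * (e_baseline - e_joint) / e_baseline


# Hypothetical costs in Joules, chosen only to illustrate the calculation.
improvement = energy_efficiency(e_joint=0.03, e_baseline=0.04)
```

Under this assumed definition, a value of zero means the joint optimization brings no saving, and larger values mean a larger saving relative to the baseline.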
Remark 4.
Note that the above results are based on parameter values typically used in the literature, as shown in Table III. From our analysis, it is clear that the larger the ratio between and , , the larger the improvement provided by our C3 formulation.
V-C Comparison of Solvers
We compare the performance of our proposed VSBB with other MINLP solvers in terms of:
V-C1 The Best Solution to the Objective Function
We compare the performance of VSBB with the other candidate solvers for the networks in Figure 3. The results for the two-node and seven-node networks are presented in Tables IV and V. We observe that VSBB, Bonmin, SCIP, Antigone and Baron achieve comparable objective function values for larger values of , while VSBB outperforms the other algorithms for lower values of (discussed in detail later). Furthermore, Bonmin and SCIP cannot generate a feasible solution in some cases even though one exists. For Bonmin in particular, there are a number of probable reasons for this: a) for MINLP problems with nonconvex functions, Bonmin relies on heuristic options and does not guarantee global optimality [33], and these heuristics can cause such failures; b) the Branch-and-Cut method used by Bonmin is based on the outer-approximation (OA) algorithm [34]. For MINLPs with nonconvex functions, OA constraints do not necessarily result in valid inequalities for the problem; hence, Bonmin’s Branch-and-Cut method sometimes cuts off regions where a lower value exists. NOMAD and GA in general yield a higher objective function value than VSBB does. This is because both NOMAD and GA are based on stochastic approaches, which cannot guarantee convergence to the global optimum. Similar trends are observed for the three-node and four-node networks.
V-C2 Convergence Time
The time taken to obtain the best solution is important in practice. The amount of time that each algorithm requires to obtain its best solution, as discussed in Section V-C1, is shown in Tables IV and V for the two-node and seven-node networks, respectively. It can be seen that Bonmin, Antigone, Baron and SCIP (when it is able to provide a solution) are the fastest methods. However, Bonmin, SCIP and Baron sometimes cannot find a solution although one exists. VSBB takes longer to obtain a better solution because our reformulation introduces auxiliary variables and additional linear constraints. Different applications can tolerate different algorithm speeds. For the sample networks and applications under consideration, the speed of VSBB is considered acceptable [27].
V-C3 Stability
From the analysis in Sections V-C1 and V-C2, we know that Bonmin is faster but unstable in some situations. We further characterize the stability of Bonmin with respect to the threshold value of QoI as follows. Specifically, we fix all other parameters in Table III and vary only the maximal possible value of in different networks. The results are shown in Table VI. For each maximal value, we test all the possible integer values of between and the maximal value itself. Hence, the number of tests equals the maximal value. We see that the number of instances where Bonmin fails to produce a feasible solution increases as the network size increases.
Furthermore, although Bonmin, Baron and Antigone can provide a feasible solution faster for smaller values of , we observe that the objective value of their solution is larger than that of VSBB. We compare the performance of VSBB with these algorithms for smaller values of in Table VII. We see that VSBB outperforms Bonmin, Antigone and Baron by as much as , , and , respectively, when searching for a global optimum (within the tolerance), though it requires more time. The timer is set to s for the results shown in Table VII. Results for the three-node and four-node networks are given in Tables VIII and IX, respectively. SCIP, for certain instances of the three-node network, provides the lowest objective function value. However, for the majority of the cases, we observe trends similar to those in Tables IV and V.
Networks  (a)  (b)  (c)  (d) 
Number of test values  1000  2000  2000  4000 
Number of infeasible solutions  0  0  1  216 
Infeasibility (%)  0  0  0.05  5.4 
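The infeasibility percentages in the table above follow directly from the two count rows, and can be reproduced with a short calculation. This is a minimal sketch: the counts are copied from the table, while the function name `infeasibility_percent` and the dictionary layout are hypothetical.

```python
def infeasibility_percent(n_infeasible, n_tests):
    """Share of test values for which no feasible solution was returned."""
    return 100.0 * n_infeasible / n_tests


# Counts per network, copied from the table above.
n_tests = {"a": 1000, "b": 2000, "c": 2000, "d": 4000}
n_infeasible = {"a": 0, "b": 0, "c": 1, "d": 216}

percentages = {
    net: infeasibility_percent(n_infeasible[net], n_tests[net])
    for net in n_tests
}
# percentages -> {"a": 0.0, "b": 0.0, "c": 0.05, "d": 5.4}
```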
Solver  =1  =3  =5  =8  =50  

Obj.  Time (s)  Obj.  Time  Obj.  Time  Obj.  Time  Obj.  Time  
Bonmin  0.0002  0.214  0.0003  0.211  0.0003  0.224  0.0005  0.23  0.0021  0.364 
Antigone  0.0002  3.57  0.000317  2.47  0.000395  6.53  0.000512  15.61  0.002153  2.71 
Baron  0.0002  0.74  0.00031  4846  0.00039  7200  0.005  7200  0.0021  7200 
VSBB  0.00011  1871  0.00015  2330  0.00019  1243  0.00047  1350  0.0020  3325 
Improvement over Bonmin (%)  52.45  49.43  50.30  7.59  4.62  
Improvement over Antigone (%)  50  52.72  51.92  8.27  7.08  
Improvement over Baron (%)  50  51.61  51.28  6  4.79  
Solver  

Obj.  Time (s)  Obj.  Time (s)  Obj.  Time (s)  Obj.  Time (s)  Obj.  Time (s)  
Bonmin  0.005  0.26  0.01  0.14  0.019  0.10  0.028  0.10  0.0383  5.56 
NOMAD  0.045  12.42  0.025  11.14  0.033  9.30  0.029  46.41  0.038  5.45 
GA  0.005  0.69  0.025  26.11  0.019  16.85  0.034  40.56  0.044  10.34 
VSBB  0.005  46.1  0.019  45.34  0.019  8.1  0.028  56.3  0.0383  12.2 
SCIP  0.00005  4.96  0.000056  0.16  0.000054  0.18  0.028  0.07  0.038  0.05 
Baron  0.005  0.1  0.01  0.09  0.019  0.09  0.028  0.1  0.0383  0.1 
Antigone  0.005  0.11  0.01  0.09  0.019  0.08  0.028  0.21  0.038  1.51 
Solver  

Obj.  Time (s)  Obj.  Time (s)  Obj.  Time (s)  Obj.  Time (s)  Obj.  Time (s)  
Bonmin  0.002  0.36  0.02  0.11  0.039  0.11  0.06  0.10  0.08  0.16 
NOMAD  0.003  112.5  0.023  97.68  0.04  59.86  0.06  52.8  0.10  2.28 
GA  0.004  1.01  0.02  24.94  0.04  13.02  0.12  27.7  0.14  35.33 
VSBB  0.02  400  0.02  400  0.039  400  0.071  400  0.078  400 
SCIP  0.002  7.05  0.02  1999.4  0.0004  2.00  0.009  0.43  0.04  0.16 
Baron  0.002  0.52  0.02  2.69  0.039  0.89  0.06  0.16  0.078  0.1 
Antigone  0.002  21.2  0.02  0.26  0.042  0.18  0.06  0.1  0.078  0.08 
VI Conclusion
We have investigated energy efficiency tradeoffs among communication, computation and caching with a QoI guarantee in distributed networks. We first formulated an optimization problem that characterizes these energy costs. This optimization problem belongs to the nonconvex class of MINLP, which is hard to solve in general. We then proposed a variant of the spatial branch-and-bound (VSBB) algorithm, which can solve the MINLP with an optimality guarantee. Finally, we showed numerically that the newly proposed VSBB algorithm outperforms the existing MINLP solvers Bonmin, NOMAD and GA. We also observed that the C3 optimization framework, which to the best of our knowledge has not been investigated in the literature, leads to an energy saving of as much as when compared with either of the C2 optimizations, which have been widely studied.
Going further, we aim to extend our results in two ways. The first is to refine and improve the symbolic reformulation to reduce the number of needed auxiliary variables in order to shorten the algorithm execution time. Second, since many networking problems involve the optimization of both continuous and discrete variables as in this work, we plan to apply and extend the newly proposed VSBB to solve those problems.
Acknowledgments
The material in this paper has been accepted for publication in part at IEEE Globecom, Abu Dhabi, United Arab Emirates, December 2018. This work was supported by the U.S. Army Research Laboratory and the U.K. Ministry of Defence under Agreement Number W911NF1630001. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Army Research Laboratory, the U.S. Government, the U.K. Ministry of Defence or the U.K. Government. The U.S. and U.K. Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. Faheem Zafari also acknowledges the financial support of the EPSRC Centre for Doctoral Training in High Performance Embedded and Distributed Systems (HiPEDS, Grant Reference EP/L016796/1) and the Department of Electrical and Electronics Engineering, Imperial College London. The authors would also like to thank Dr. Ruth Misener and the Chemical Engineering Department at Imperial College London for providing us with access to the Baron and Antigone solvers.
References
 [1] S. Nazemi, K. K. Leung, and A. Swami, “QoI-aware Tradeoff Between Communication and Computation in Wireless Ad-hoc Networks,” in Proc. IEEE PIMRC, 2016.
 [2] R. Rajagopalan and P. K. Varshney, “Data Aggregation Techniques in Sensor Networks: A Survey,” IEEE Commun. Surveys Tuts., vol. 8, no. 4, pp. 48–63, 2006.
 [3] E. Fasolo, M. Rossi, J. Widmer, and M. Zorzi, “In-network Aggregation Techniques for Wireless Sensor Networks: a Survey,” IEEE Wireless Communications, vol. 14, no. 2, 2007.
 [4] K. C. Barr and K. Asanović, “Energy-aware Lossless Data Compression,” ACM Transactions on Computer Systems, 2006.
 [5] N. Choi, K. Guan, D. C. Kilper, and G. Atkinson, “In-network Caching Effect on Optimal Energy Consumption in Content-Centric Networking,” in Proc. IEEE ICC, 2012.
 [6] S. A. Ehikioya, “A Characterization of Information Quality Using Fuzzy Logic,” in NAFIPS, 1999.
 [7] E. M. Smith and C. C. Pantelides, “Global Optimisation of General Process Models,” in Glo. Opt. Eng. Des. Springer, 1996, pp. 355–386.
 [8] P. Bonami et al., “An Algorithmic Framework for Convex Mixed Integer Nonlinear Programs,” Disc. Opt., vol. 5, no. 2, pp. 186–204, 2008.
 [9] S. Le Digabel, “Algorithm 909: NOMAD: Nonlinear Optimization with the MADS Algorithm,” ACM TOMS, vol. 37, no. 4, p. 44, 2011.
 [10] M. Tawarmalani and N. V. Sahinidis, “A polyhedral branch-and-cut approach to global optimization,” Mathematical Programming, vol. 103, no. 2, pp. 225–249, 2005.
 [11] T. Achterberg, “SCIP: Solving Constraint Integer Programs,” Mathematical Programming Computation, vol. 1, no. 1, pp. 1–41, 2009.
 [12] R. Misener and C. A. Floudas, “Antigone: algorithms for continuous/integer global optimization of nonlinear equations,” Journal of Global Optimization, vol. 59, no. 2–3, pp. 503–526, 2014.
 [13] O. Boykin, S. Ritchie, I. O’Connell, and J. Lin, “Summingbird: A Framework for Integrating Batch and Online Mapreduce Computations,” Proc. of VLDB, 2014.
 [14] J. Li, S. Shakkottai, J. C. S. Lui, and V. Subramanian, “Accurate Learning or Fast Mixing? Dynamic Adaptability of Caching Algorithms,” IEEE Journal on Selected Areas in Communications, 2018.
 [15] S. Ioannidis and E. Yeh, “Adaptive Caching Networks with Optimality Guarantees,” in Proc. of ACM SIGMETRICS, 2016.
 [16] J. Li, F. Zafari, D. Towsley, K. K. Leung, and A. Swami, “Joint Data Compression and Caching: Approaching Optimality with Guarantees,” in Proc. of ACM/SPEC ICPE, 2018.
 [17] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-Efficient Communication Protocol for Wireless Microsensor Networks,” in System sciences, 2000.
 [18] A. Manjeshwar and D. P. Agrawal, “TEEN: a Routing Protocol for Enhanced Efficiency in Wireless Sensor Networks,” in IPDPS, 2001.
 [19] M. Ye, C. Li, G. Chen, and J. Wu, “EECS: an Energy Efficient Clustering Scheme in Wireless Sensor Networks,” in Proc. of IEEE IPCCC, 2005.
 [20] S. Eswaran, J. Edwards, A. Misra, and T. F. L. Porta, “Adaptive In-Network Processing for Bandwidth and Energy Constrained Mission-Oriented Multihop Wireless Networks,” IEEE Transactions on Mobile Computing, vol. 11, no. 9, pp. 1484–1498, Sept 2012.
 [21] M. Chiang et al., “Geometric programming for communication systems,” Foundations and Trends® in Communications and Information Theory, vol. 2, no. 1–2, pp. 1–154, 2005.
 [22] A. S. Jain and S. Meeran, “Deterministic Job-Shop Scheduling: Past, Present and Future,” European journal of operational research, vol. 113, no. 2, 1999.
 [23] E. M. Smith and C. C. Pantelides, “A Symbolic Reformulation/Spatial Branch-and-Bound Algorithm for the Global Optimisation of Nonconvex MINLPs,” Comp. & Chem. Eng., vol. 23, no. 4, pp. 457–478, 1999.
 [24] E. M. Smith, “On the Optimal Design of Continuous Processes,” Ph.D. dissertation, Imperial College London (University of London), 1996.
 [25] L. Liberti, “Reformulation and Convex Relaxation Techniques for Global Optimization,” 4OR: A Quarterly Journal of Operations Research, vol. 2, no. 3, pp. 255–258, 2004.
 [26] L. Liberti, “Reformulation and Convex Relaxation Techniques for Global Optimization,” Ph.D. dissertation, Imperial College London, 2004.
 [27] C. A. Floudas, Deterministic Global Optimization: Theory, Methods and Applications. Springer Science & Business Media, 2013, vol. 37.
 [28] G. P. McCormick, “Computability of Global Solutions to Factorable Nonconvex Programs: Part I—Convex Underestimating Problems,” Mathematical Programming, vol. 10, no. 1, pp. 147–175, 1976.

 [29] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II,” IEEE transactions on evolutionary computation, vol. 6, no. 2, pp. 182–197, 2002.
 [30] OPTI Toolbox, “A Free Matlab Toolbox for Optimization,” https://www.inverseproblem.co.nz/OPTI/index.php/Main/HomePage, [Online; accessed 28-Jun-2017].
 [31] A. Wächter and L. T. Biegler, “On the Implementation of an Interior-point Filter Line-search Algorithm for Large-scale Nonlinear Programming,” Mathematical Programming, vol. 106, no. 1, pp. 25–57, 2006.
 [32] W. Ye, J. Heidemann, and D. Estrin, “An Energy-Efficient MAC Protocol for Wireless Sensor Networks,” in Proc. of IEEE INFOCOM, 2002.
 [33] A. Fiat and P. Sanders, “Algorithms - ESA 2009,” Lecture Notes in Computer Science, vol. 5757, 2009.
 [34] P. Bonami and J. Lee, “BONMIN Users’ Manual,” https://projects.coinor.org/Bonmin/browser/stable/1.5/Bonmin/doc/BONMIN_UsersManual.pdf?format=raw, 2011.
Appendix A
s.t.  