Latency is an important consideration in many Internet of Things (IoT) applications which provide real-time and/or critical services. Often IoT devices are battery powered and harvest energy from the environment. In such situations, intelligent transmission strategies are needed to mitigate the unreliability of available energy and provide low-latency services.
In this paper, we investigate the cross-layer design of delay optimal transmission strategies for energy harvesting transmitters when both the data arrival and the energy arrival processes are stochastic. Our motivation is to characterize qualitative properties of optimal transmission policies for such models. For example, in queuing theory, it is often possible to establish that the optimal policy is monotone increasing in the queue length [13, 4]. Such a property, in addition to being intuitively satisfying, simplifies the search for and implementation of optimal strategies. Such monotonicity properties are also known to hold for cross-layer design of communication systems when a constant source of energy is available at the transmitter. So it is natural to ask if such qualitative properties hold for energy harvesting transmitters.
Partial answers to this question for throughput optimal policies for energy harvesting transmitters are provided in [16, 1, 12, 6, 5, 11]. Under the assumption of backlogged traffic, a deterministic data arrival process, or a deterministic energy arrival process, these papers show that the optimal policy is weakly increasing in the queue state and/or weakly increasing in the battery state. There are other papers that investigate the structure of delay or throughput optimal policies under the assumption of a deterministic energy arrival process [15, 14, 7], but they do not characterize the structure of delay-optimal policies; rather, they provide numerical solutions, propose low-complexity heuristic policies, or only establish structural properties of value functions.
Our main result is to show that, in contrast to the structure of the delay optimal policy when a constant source of energy is available and the structure of throughput optimal policies for energy harvesting communication systems [16, 1, 12, 6, 5, 11], the delay optimal policy for energy harvesting communication systems is not necessarily monotone in the battery or queue state. We present counterexamples showing that the delay optimal policy need not be weakly increasing in either the queue state or the battery state. Furthermore, the performance of the optimal policy is about 8–17% better than the performance of the best monotone policy. These counterexamples continue to hold for i.i.d. fading channels as well.
Uppercase letters (e.g., $X$) represent random variables; the corresponding lowercase letters (e.g., $x$) represent their realizations. Cursive letters (e.g., $\mathcal{X}$) represent sets. The sets of real numbers, positive integers, and non-negative integers are denoted by $\mathbb{R}$, $\mathbb{Z}_{>0}$, and $\mathbb{Z}_{\ge 0}$, respectively. The notation $[a:b]$ is shorthand for the set $\{a, a+1, \dots, b\}$.
II Model and Problem Formulation
Consider the communication system shown in Fig. 1. A source generates bursty data packets that have to be transmitted to a receiver by an energy-harvesting transmitter. The transmitter has a finite buffer where the data packets are queued and a finite-capacity battery where the harvested energy is stored. The system operates in discrete time slots. The data packets and the energy that arrive during a slot are available only at the beginning of the next slot.
At the beginning of a slot, the transmitter picks some data packets from the queue, encodes them, and transmits the encoded symbol. Transmitting a symbol requires energy that depends on the number of encoded packets in the symbol. At the end of the slot, the system incurs a delay penalty that depends on the number of packets remaining in the queue.
Time slots are indexed by $t \in \mathbb{Z}_{>0}$. The length of the buffer is denoted by $L$ and the size of the battery by $B$; $\mathcal{L}$ and $\mathcal{B}$ denote the sets $\{0, 1, \dots, L\}$ and $\{0, 1, \dots, B\}$, respectively. Other variables are as follows:
$Q_t \in \mathcal{L}$: the number of packets in the queue at the beginning of slot $t$.
$A_t \in \mathbb{Z}_{\ge 0}$: the number of packets that arrive during slot $t$.
$B_t \in \mathcal{B}$: the energy stored in the battery at the beginning of slot $t$.
$E_t \in \mathbb{Z}_{\ge 0}$: the energy that is harvested during slot $t$.
$U_t$: the number of packets transmitted during slot $t$. The feasible choices of $U_t$ are denoted by $\mathcal{U}(Q_t, B_t)$, where
$$\mathcal{U}(q, b) = \{ u \in \{0, 1, \dots, q\} : p(u) \le b \},$$
where $p(u)$ denotes the amount of power needed to transmit $u$ packets. In our examples, we model the channel as a band-limited AWGN channel with bandwidth $W$ and noise level $\sigma^2$. The capacity of such a channel when transmitting at power level $P$ is $W \log_2(1 + P/\sigma^2)$. Therefore, for such channels we assume $p(u) = \sigma^2 (2^{u/W} - 1)$. In general, we assume that $p(\cdot)$ is a strictly convex and increasing function with $p(0) = 0$.
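As a concrete sketch, the inverted-capacity power cost can be written in Python as follows (the function name `power_cost` and the default values of `W` and `sigma2` are our own illustrative placeholders, not values from the paper):

```python
def power_cost(u, W=1.0, sigma2=1.0):
    """Power needed to transmit u packets in one slot over a band-limited
    AWGN channel: inverting u = W * log2(1 + p / sigma2) gives
    p(u) = sigma2 * (2**(u / W) - 1), which is strictly convex,
    increasing, and satisfies p(0) = 0."""
    return sigma2 * (2.0 ** (u / W) - 1.0)
```

Because the exponent grows in `u`, the successive increments `power_cost(u + 1) - power_cost(u)` increase, which is the strict convexity assumed above.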
The dynamics of the data queue and the battery are
$$Q_{t+1} = \min\{Q_t - U_t + A_t,\ L\}, \qquad B_{t+1} = \min\{B_t - p(U_t) + E_t,\ B\}.$$
Packets that are not transmitted during slot $t$ incur a delay penalty $d(Q_t - U_t)$, where $d(\cdot)$ is a convex and increasing function with $d(0) = 0$.
The data arrival process $\{A_t\}_{t \ge 1}$ is assumed to be an independent and identically distributed (i.i.d.) process with pmf (probability mass function) $P_A$. The energy arrival process $\{E_t\}_{t \ge 1}$ is an i.i.d. process that is also independent of $\{A_t\}_{t \ge 1}$, with pmf $P_E$.
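Under these assumptions, one slot of the coupled queue/battery dynamics can be simulated as in the following sketch (the function name `step` and the representation of pmfs as `{value: probability}` dictionaries are our own conventions):

```python
import random

def step(q, u, b, L, B, p, P_A, P_E):
    """Simulate one slot: sample arrivals A ~ P_A and harvested energy
    E ~ P_E (pmfs given as {value: probability} dicts), then apply
    Q' = min(Q - U + A, L) and B' = min(B - p(U) + E, B)."""
    a = random.choices(list(P_A), weights=list(P_A.values()))[0]
    e = random.choices(list(P_E), weights=list(P_E.values()))[0]
    return min(q - u + a, L), min(b - p(u) + e, B)
```

Packets and energy in excess of the buffer and battery capacities are lost, which is exactly what the two `min` operations capture.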
The number of packets to transmit is chosen according to a scheduling policy $f = (f_1, f_2, \dots)$, where $U_t = f_t(Q_t, B_t)$.
The performance of a scheduling policy $f$ is given by
$$J(f) = \mathbb{E}^{f}\Big[ \sum_{t=1}^{\infty} \beta^{t-1} d(Q_t - U_t) \Big], \tag{1}$$
where $\beta \in (0, 1)$ denotes the discount factor and the expectation is taken with respect to the joint measure on the system variables induced by the choice of $f$.
We are interested in the following optimization problem.
Given the buffer length $L$, battery size $B$, power cost $p(\cdot)$, delay cost $d(\cdot)$, pmf $P_A$ of the data arrival process, pmf $P_E$ of the energy arrival process, and the discount factor $\beta$, choose a feasible scheduling policy $f$ to minimize the performance $J(f)$ given by (1).
III Dynamic Programming Decomposition
The system described above can be modeled as an infinite-horizon, time-homogeneous Markov decision process (MDP). Since the state and action spaces are finite, standard results from Markov decision theory imply that there exists an optimal policy which is time homogeneous and is given by the solution of a dynamic program. To succinctly write the dynamic program, we define the Bellman operator $\mathcal{T}$ that maps any $V : \mathcal{L} \times \mathcal{B} \to \mathbb{R}$ to
$$[\mathcal{T}V](q, b) = \min_{u \in \mathcal{U}(q, b)} \Big\{ d(q - u) + \beta\, \mathbb{E}\big[ V\big( \min\{q - u + A, L\},\ \min\{b - p(u) + E, B\} \big) \big] \Big\},$$
where $A$ and $E$ are independent random variables with pmfs $P_A$ and $P_E$. Then, an optimal policy for the infinite-horizon MDP is given as follows: let $V^*$ denote the unique fixed point of
$$V^* = \mathcal{T} V^*; \tag{3}$$
then any time-homogeneous policy $f^*$ such that $f^*(q, b)$ attains the minimum in $[\mathcal{T}V^*](q, b)$ is optimal.
The dynamic program described in (3) can be solved using standard algorithms such as value iteration, policy iteration, or linear programming.
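For concreteness, value iteration for this dynamic program can be sketched as follows (a minimal Python illustration under our own naming conventions; it stores the value function as a dictionary over the finite state space and is not optimized):

```python
def value_iteration(L, B, p, d, P_A, P_E, beta, iters=500):
    """Repeatedly apply the Bellman operator
    (TV)(q, b) = min over feasible u of { d(q - u)
        + beta * E[ V(min(q - u + A, L), min(b - p(u) + E, B)) ] },
    then extract a greedy policy from the (approximate) fixed point."""
    V = {(q, b): 0.0 for q in range(L + 1) for b in range(B + 1)}

    def q_value(V, q, b, u):
        # one-step delay cost plus discounted expected cost-to-go
        return d(q - u) + beta * sum(
            pa * pe * V[min(q - u + a, L), min(b - p(u) + e, B)]
            for a, pa in P_A.items() for e, pe in P_E.items())

    for _ in range(iters):
        V = {(q, b): min(q_value(V, q, b, u)
                         for u in range(q + 1) if p(u) <= b)
             for (q, b) in V}

    policy = {(q, b): min((u for u in range(q + 1) if p(u) <= b),
                          key=lambda u: q_value(V, q, b, u))
              for (q, b) in V}
    return V, policy
```

Note that the feasible set always contains $u = 0$ since $p(0) = 0$, so the inner minimum is never taken over an empty set.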
III-A Properties of the value function
Let $\mathcal{F}$ denote the family of functions $V : \mathcal{L} \times \mathcal{B} \to \mathbb{R}$ such that for any $b \in \mathcal{B}$, $V(q, b)$ is weakly increasing in $q$ and for any $q \in \mathcal{L}$, $V(q, b)$ is weakly decreasing in $b$. Furthermore, let $\mathcal{F}_1$ denote the family of functions $g$ such that, for any fixed value of the remaining arguments, $g$ is weakly increasing in $q$. Similarly, let $\mathcal{F}_2$ be the family of functions $h$ such that, for any fixed value of the remaining arguments, $h$ is weakly increasing in $b$.
Proposition 1. The optimal value function $V^*$ belongs to $\mathcal{F}$.
The proof is presented in the Appendix.
Proposition 1 says that if we follow an optimal policy, the optimal cost when starting from a smaller queue state is lower than that starting from a larger queue state. Similarly, the optimal cost when starting from a larger battery state is lower than that starting from a smaller battery state. Such a result appears to be intuitively obvious.
One might argue that the optimal policy should be weakly increasing in the queue state and weakly increasing in the available energy in the battery. In particular, if it is optimal to transmit $u$ packets when the queue state is $q$, then (for the same battery state) the optimal number of packets to transmit at any queue state larger than $q$ should be at least $u$. Similarly, if it is optimal to transmit $u$ packets when the battery state is $b$, then (for the same queue state) the optimal number of packets to transmit at any battery state larger than $b$ should be at least $u$. In the next section, we present counterexamples showing that neither of these properties holds. The code for all the results is available at .
IV Counterexamples on the monotonicity of optimal policies
IV-A On the monotonicity in queue state
Consider the communication system with a band-limited AWGN channel where , , , , (thus, ), , data arrival distribution , and energy arrival distribution .
The optimal policy for this system (obtained by policy iteration) is shown in Fig. 1(a), where the rows correspond to the current queue length and the columns correspond to the current energy level. Note that the policy is not weakly increasing in the queue state (i.e., ). In particular, .
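Non-monotonicity of this kind can be verified mechanically. A small helper of the following form (our own sketch; `policy` is a dict mapping (queue, battery) states to the number of transmitted packets) checks whether a policy is weakly increasing in the queue state:

```python
def is_weakly_increasing_in_queue(policy, L, B):
    """Return True iff policy[q, b] <= policy[q + 1, b] for every
    battery level b, i.e., the policy is weakly increasing in the
    queue state for each fixed battery state."""
    return all(policy[q, b] <= policy[q + 1, b]
               for b in range(B + 1) for q in range(L))
```

An analogous check over the columns verifies monotonicity in the battery state.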
Given that the optimal policy is not monotone, one might wonder how much do we lose if we follow a monotone policy instead of the optimal policy. To characterize this, we define the best queue-monotone policy as:
and let denote the corresponding value function.
The best monotone policy cannot be obtained using dynamic programming, and one has to resort to a brute-force search over all monotone policies. For the model described above, there are monotone policies. (Due to the power constraint, it is not possible to count the number of monotone functions using combinatorics; the number is obtained by explicit enumeration.) The best monotone policy obtained by searching over these is shown in Fig. 1(b). The worst-case difference between the two value functions is given by
Thus, for this counterexample, the best queue-monotone policy performs worse than the optimal policy.
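For small models, the brute-force search over monotone policies described above can be sketched as follows (our own illustration: policies are dicts over states, policy evaluation is done by fixed-point iteration, and candidates are ranked here by their worst-case value over states, one possible criterion):

```python
import itertools

def evaluate(policy, L, B, p, d, P_A, P_E, beta, iters=500):
    """Policy evaluation: iterate V <- cost + beta * (expected next V)
    under the fixed policy until (numerical) convergence."""
    V = {s: 0.0 for s in policy}
    for _ in range(iters):
        V = {(q, b): d(q - policy[q, b]) + beta * sum(
                 pa * pe * V[min(q - policy[q, b] + a, L),
                             min(b - p(policy[q, b]) + e, B)]
                 for a, pa in P_A.items() for e, pe in P_E.items())
             for (q, b) in V}
    return V

def best_queue_monotone(L, B, p, d, P_A, P_E, beta):
    """Enumerate every feasible policy, keep the queue-monotone ones,
    and return the one with the smallest worst-case discounted cost."""
    states = [(q, b) for q in range(L + 1) for b in range(B + 1)]
    feas = {(q, b): [u for u in range(q + 1) if p(u) <= b]
            for (q, b) in states}
    best, best_val = None, float("inf")
    for choice in itertools.product(*(feas[s] for s in states)):
        policy = dict(zip(states, choice))
        if any(policy[q, b] > policy[q + 1, b]
               for b in range(B + 1) for q in range(L)):
            continue  # not weakly increasing in the queue state
        val = max(evaluate(policy, L, B, p, d, P_A, P_E, beta).values())
        if val < best_val:
            best, best_val = policy, val
    return best, best_val
```

The enumeration grows exponentially in the number of states, which is why the count in the footnote had to be obtained by explicit enumeration and why this approach is viable only for small models.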
IV-B On the monotonicity in battery state
Consider the communication system described in Sec. IV-A but with the data arrival distribution and the energy arrival distribution .
Given that the optimal policy is not monotone, the question again arises of how much we lose by following a monotone policy instead of the optimal policy. To characterize this, we define the best battery-monotone policy as:
and let denote the corresponding value function.
As before, we find the best monotone policy by a brute-force search over all battery-monotone policies. The resultant policy is shown in Fig. 2(b).
The worst case difference between the two value functions is given by
Thus, for this counterexample, the best battery-monotone policy performs worse than the optimal policy.
V Counterexamples for fading channels
V-A Channel model with i.i.d. fading
Consider the model in Sec. II where the channel has i.i.d. fading. In particular, let $S_t$ denote the channel state at time $t$ and $h_s$, where $s \in \mathcal{S}$, denote the attenuation at state $s$. Thus, the power needed to transmit $u$ packets when the channel is in state $s$ is given by $p(u)/h_s$. We assume that $\{S_t\}_{t \ge 1}$ is an i.i.d. process with pmf $P_S$ that is independent of the data and energy arrival processes $\{A_t\}_{t \ge 1}$ and $\{E_t\}_{t \ge 1}$.
V-B On the monotonicity in queue state
Consider the model in Sec. IV-A with , , and an i.i.d. fading channel where , and . The optimal policy for this model (obtained using policy iteration) is shown in Fig. 3(a)–3(c). Note that for all , the optimal policy is not monotone in the queue length.
In this case, there are monotone policies. Therefore, a brute-force search to find the best monotone policy is not feasible. We choose a heuristic monotone policy which differs from only at the following points: , , , for , , , , , , and . The policy may be thought of as the queue-monotone policy that is closest to . Let denote the corresponding value function. The worst-case difference between the two value functions is given by
Thus, the heuristically chosen queue-monotone policy performs worse than the optimal policy.
V-C On the monotonicity in the battery state
Consider the model in Sec. IV-B with , , and an i.i.d. fading channel where , , and . The optimal policy for this model (obtained using policy iteration) is shown in Fig. 3(d)–3(e). Note that for , the optimal policy is not monotone in the battery state.
In this case, there are monotone policies. Therefore, a brute-force search is not feasible. As before, we choose a heuristic policy which is the battery-monotone policy that is closest to . In particular, differs from only at two points: and . Let denote the corresponding value function. The worst-case difference between the two value functions is given by
Thus, the heuristically chosen battery-monotone policy performs worse than the optimal policy.
VI Conclusion
In this paper, we consider delay optimal strategies in cross-layer design with an energy-harvesting transmitter. We show that the value function is weakly increasing in the queue state and weakly decreasing in the battery state. We show via counterexamples that the optimal policy is monotone neither in the queue length nor in the available energy in the battery.
VI-A Discussion about the counterexamples
One might ask why the optimal policy is not monotone in the above model. The standard argument in MDPs to establish monotonicity of the optimal policy is to show that the value-action function is submodular in the state and action. The value-action function is given by
$$H(q, b, u) = d(q - u) + \beta\, \mathbb{E}\big[ V^*\big( \min\{q - u + A, L\},\ \min\{b - p(u) + E, B\} \big) \big].$$
A sufficient condition for the optimal policy to be weakly increasing in the queue length is:
(S1) for every $b \in \mathcal{B}$, the value-action function $H(q, b, u)$ is submodular in $(q, u)$.
Note that since $d(\cdot)$ is convex, $d(q - u)$ is submodular in $(q, u)$. Thus, a sufficient condition for (S1) to hold is:
(S2) for all $b \in \mathcal{B}$, $\mathbb{E}\big[ V^*\big( \min\{q - u + A, L\},\ \min\{b - p(u) + E, B\} \big) \big]$ is submodular in $(q, u)$.
Since submodularity is preserved under addition, a sufficient condition for (S2) to hold is:
(S3) for all $b \in \mathcal{B}$ and all realizations $(a, e)$ of $(A, E)$, $V^*\big( \min\{q - u + a, L\},\ \min\{b - p(u) + e, B\} \big)$ is submodular in $(q, u)$.
By a similar argument, it can be shown that a sufficient condition for the optimal policy to be weakly increasing in the battery state is:
(S4) for all $q \in \mathcal{L}$ and all realizations $(a, e)$ of $(A, E)$, $V^*\big( \min\{q - u + a, L\},\ \min\{b - p(u) + e, B\} \big)$ is submodular in $(b, u)$.
We have not been able to identify sufficient conditions under which (S3) or (S4) hold. Note that if the data were backlogged, then we would not need to keep track of the queue state; thus, the value function would be a function of the battery state alone. In such a scenario, (S4) simplifies to: $V^*\big( \min\{b - p(u) + e, B\} \big)$ is submodular in $(b, u)$. Since $p(\cdot)$ is convex, it can be shown that convexity of $V^*$ is sufficient to establish this submodularity. This is the essence of the argument given in [6, 12].
Similarly, if the transmitter had a steady supply of energy, then we would not need to keep track of the battery state; thus, the value function would be a function of the queue state alone. In such a scenario, (S3) simplifies to: $V^*\big( \min\{q - u + a, L\} \big)$ is submodular in $(q, u)$. It can be shown that convexity of $V^*$ is sufficient to establish this submodularity. This is the essence of the argument given in .
In our model, data is not backlogged and energy is intermittent. As a result, we have two queues (the data queue and the energy queue) with coupled dynamics. This coupling makes it difficult to identify conditions under which $V^*$ will be submodular in $(q, u)$ or $(b, u)$.
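The failure of these submodularity conditions can at least be detected numerically. The following sketch (our own helper; `H` is any value-action function and `feasible` encodes the constraint set) checks the discrete submodularity inequality in $(q, u)$ on a finite grid:

```python
def submodular_in_q_u(H, L, B, feasible, tol=1e-12):
    """Check H(q+1, b, u+1) + H(q, b, u) <= H(q+1, b, u) + H(q, b, u+1)
    at every point where all four evaluations are feasible.  A single
    violation already rules out the standard monotone-policy argument."""
    for b in range(B + 1):
        for q in range(L):
            for u in range(q + 1):
                pts = [(q, u), (q, u + 1), (q + 1, u), (q + 1, u + 1)]
                if not all(feasible(qq, b, uu) for qq, uu in pts):
                    continue
                lhs = H(q + 1, b, u + 1) + H(q, b, u)
                rhs = H(q + 1, b, u) + H(q, b, u + 1)
                if lhs > rhs + tol:
                    return False
    return True
```

For example, $H(q, b, u) = d(q - u)$ with convex $d$ passes this check, while a product term such as $q \cdot u$ fails it.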
VI-B Implication of the results
In general, there are two benefits if one can establish that the optimal policy is monotone. The first advantage is that monotone policies are easier to implement. In particular, one needs a look-up table with one entry per (queue, battery) state to implement a general transmission policy (similar to the matrices shown in Figs. 2 and 3). In contrast, one only needs to store the threshold boundaries of the decision regions (which can be stored in a sparse matrix) to implement a queue- or battery-monotone policy. Our counterexamples show that such a simpler implementation will result in a loss of optimality in energy-harvesting systems.
The second advantage is that if we know that the optimal policy is monotone, we can search for it efficiently using monotone value iteration or monotone policy iteration. Our counterexamples show that these more efficient algorithms cannot be used in energy-harvesting systems.
One might want to restrict to monotone policies for the sake of implementation simplicity. However, if the system does not satisfy properties (S3) and (S4) mentioned in the previous section, then dynamic programming cannot be used to find the best monotone policy. Thus, one has to resort to a brute force search, which suffers from the curse of dimensionality.
-C Monotonicity of the Bellman operator
Given and any , , and , let
Define , i.e.,
Then, for all , , and , we have:
As a consequence of the above, .
The properties of follow from the monotonicity of and the fact that monotonicity is preserved under expectations. The details are omitted due to lack of space.
To prove that , we consider any and and let denote an arg min of the right hand side of (4). Now there are two cases: and .
Suppose . Then, it must be the case that . Thus,
where follows from Property 1.
Suppose . Then, it must be the case that and, therefore, . Hence . Thus,
where follows from Property 2.
As a result of both of these cases, we get that
Now let , and recall that ; then , and thus
where follows from Property 3 and follows from the fact that .
-D Proof of Proposition 1
References
- (2016) Optimal stochastic power control for energy harvesting systems with delay constraints. IEEE J. Sel. Areas Commun. 34(12), pp. 3512–3527.
- (2000) Power and delay trade-offs in fading channels. Ph.D. thesis, Massachusetts Institute of Technology.
- (2018) Optimal resource scheduling for energy harvesting communications under strict delay constraint. In IEEE Int. Conf. on Commun., pp. 1–6.
- (1979) On monotone optimal policies in a queueing model of M/G/1 type with controllable service time distribution. Advances in Applied Probability 11(4), pp. 870–887.
- (2012) Optimal packet scheduling for energy harvesting sources on time varying wireless channels. Journal of Communications and Networks 14(2), pp. 121–129.
- (2014) Joint energy allocation for sensing and transmission in rechargeable wireless sensor networks. IEEE Trans. Veh. Technol. 63(6), pp. 2862–2875.
- (2011) Transmission with energy harvesting nodes in fading wireless channels: optimal policies. IEEE J. Sel. Areas Commun. 29(8), pp. 1732–1743.
- (2014) Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons.
- (2019) (Website).
- (2018) Structural properties of optimal transmission policies for delay-sensitive energy harvesting wireless sensors. In IEEE Int. Conf. on Commun., pp. 1–7.
- (2018) Online power control for block i.i.d. energy harvesting channels. IEEE Trans. Inf. Theory 64(8), pp. 5920–5937.
- (2012) Optimal power allocation for a renewable energy source. In IEEE National Conf. on Commun., pp. 1–5.
- (1989) Monotonic and insensitive optimal policies for control of queues with undiscounted costs. Operations Research 37(4), pp. 611–625.
- (2012) Optimum transmission policies for battery limited energy harvesting nodes. IEEE Trans. Wireless Commun. 11(3), pp. 1180–1189.
- (2012) Optimal packet scheduling in an energy harvesting communication system. IEEE Trans. Commun. 60(1), pp. 220–230.
- (2008) Optimal rate control for delay-constrained data transmission over a wireless channel. IEEE Trans. Inf. Theory 54(9), pp. 4020.