Counterexamples on the monotonicity of delay optimal strategies for energy harvesting transmitters

10/08/2019
by   Borna Sayedana, et al.

We consider the cross-layer design of delay optimal transmission strategies for energy harvesting transmitters where the data and energy arrival processes are stochastic. Using Markov decision theory, we show that the value function is weakly increasing in the queue state and weakly decreasing in the battery state. It is natural to expect that the delay optimal policy should be weakly increasing in the queue and battery states. We show via counterexamples that this is not the case. In fact, the delay optimal policy may perform 8–17% better than the best monotone policy.


I Introduction

Latency is an important consideration in many Internet of Things (IoT) applications which provide real-time and/or critical services. Often IoT devices are battery powered and harvest energy from the environment. In such situations, intelligent transmission strategies are needed to mitigate the unreliability of available energy and provide low-latency services.

In this paper, we investigate the cross-layer design of delay optimal transmission strategies for energy harvesting transmitters when both the data arrival and the energy arrival processes are stochastic. Our motivation is to characterize qualitative properties of optimal transmission policies for such models. For example, in queuing theory, it is often possible to establish that the optimal policy is monotone increasing in the queue length [13, 4]. Such a property, in addition to being intuitively satisfying, simplifies the search for and the implementation of optimal strategies. Similar monotonicity properties are also known to hold for the cross-layer design of communication systems when a constant source of energy is available at the transmitter [2]. So it is natural to ask whether such qualitative properties hold for energy harvesting transmitters.

Partial answers to this question, for throughput optimal policies for energy harvesting transmitters, are provided in [16, 1, 12, 6, 5, 11]. Under the assumption of backlogged traffic, a deterministic data arrival process, or a deterministic energy arrival process, these papers show that the optimal policy is weakly increasing in the queue state and/or the battery state. Other papers investigate the structure of delay or throughput optimal policies under the assumption of a deterministic energy arrival process [15, 14, 7].

Some papers investigate the problem of delay optimization for energy harvesting transmitters [7, 3, 10], but they do not characterize the structure of delay-optimal policies; rather, they provide numerical solutions, propose low-complexity heuristic policies, or only establish structural properties of value functions.

Our main result is to show that, in contrast to the structure of the delay optimal policy when a constant source of energy is available [2] and the structure of throughput optimal policies for energy harvesting communication systems [16, 1, 12, 6, 5, 11], the delay optimal policy for energy harvesting communication systems is not necessarily monotone in the battery or queue state. We present counterexamples showing that the delay optimal policy need not be weakly increasing in the queue state or in the battery state. Furthermore, the performance of the optimal policy is about 8–17% better than the performance of the best monotone policy. These counterexamples continue to hold for i.i.d. fading channels as well.

Notation

Uppercase letters (e.g., $X$, $U$) represent random variables; the corresponding lowercase letters (e.g., $x$, $u$) represent their realizations. Cursive letters (e.g., $\mathcal{X}$, $\mathcal{U}$) represent sets. The sets of real numbers, positive integers, and non-negative integers are denoted by $\mathbb{R}$, $\mathbb{Z}_{>0}$, and $\mathbb{Z}_{\geq 0}$, respectively. The notation $x_{1:t}$ is shorthand for $(x_1, \dots, x_t)$.

II Model And Problem Formulation

Fig. 1: Model of a transmitter with an energy harvester

Consider the communication system shown in Fig. 1. A source generates bursty data packets that have to be transmitted to a receiver by an energy-harvesting transmitter. The transmitter has a finite buffer where the data packets are queued and a finite-capacity battery where the harvested energy is stored. The system operates in discrete time slots. The data packets and the energy that arrive during a slot are available only at the beginning of the next slot.

At the beginning of a slot, the transmitter picks some data packets from the queue, encodes them, and transmits the encoded symbol. Transmitting a symbol requires energy that depends on the number of encoded packets in the symbol. At the end of the slot, the system incurs a delay penalty that depends on the number of packets remaining in the queue.

Time slots are indexed by $t \in \mathbb{Z}_{>0}$. The length of the buffer is denoted by $L$ and the size of the battery by $B$; $\mathcal{Q}$ and $\mathcal{E}$ denote the sets $\{0, 1, \dots, L\}$ and $\{0, 1, \dots, B\}$, respectively. Other variables are as follows:

  • $Q_t \in \mathcal{Q}$: the number of packets in the queue at the beginning of slot $t$.

  • $A_t$: the number of packets that arrive during slot $t$.

  • $E_t \in \mathcal{E}$: the energy stored in the battery at the beginning of slot $t$.

  • $H_t$: the energy that is harvested during slot $t$.

  • $U_t$: the number of packets transmitted during slot $t$. The feasible choices of $U_t$ are denoted by $\mathcal{U}(Q_t, E_t)$, where

    $$\mathcal{U}(Q_t, E_t) = \{u \in \{0, 1, \dots, Q_t\} : p(u) \leq E_t\},$$

    where $p(u)$ denotes the amount of power needed to transmit $u$ packets. In our examples, we model the channel as a band-limited AWGN channel with bandwidth $W$ and noise level $\sigma^2$. The capacity of such a channel when transmitting at power level $p$ is $W \log_2(1 + p/\sigma^2)$. Therefore, for such channels we assume $p(u) = \sigma^2 (2^{u/W} - 1)$. In general, we assume that $p(\cdot)$ is a strictly convex and increasing function with $p(0) = 0$.
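The power cost described above can be sketched in code. This is a minimal illustration, not the authors' released code [9]; the parameter names `W` and `sigma2` and the inversion $p(u) = \sigma^2(2^{u/W} - 1)$ of the capacity formula follow the AWGN model assumed above.

```python
def power_cost(u, W=1.0, sigma2=1.0):
    """Energy needed to transmit u packets in one slot over a band-limited
    AWGN channel: inverting the capacity formula u = W*log2(1 + p/sigma2)
    gives p(u) = sigma2 * (2**(u/W) - 1)."""
    return sigma2 * (2.0 ** (u / W) - 1.0)

# p is strictly convex and increasing with p(0) = 0:
costs = [power_cost(u) for u in range(5)]          # [0, 1, 3, 7, 15]
assert costs[0] == 0.0
assert all(b > a for a, b in zip(costs, costs[1:]))            # increasing
assert all(costs[i + 2] - costs[i + 1] > costs[i + 1] - costs[i]
           for i in range(len(costs) - 2))                     # convex
```

The assertions verify the structural assumptions the model places on $p(\cdot)$: strict convexity, monotonicity, and zero cost for an idle slot.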

The dynamics of the data queue and the battery are

$$Q_{t+1} = \min\{Q_t - U_t + A_t,\; L\}, \qquad E_{t+1} = \min\{E_t - p(U_t) + H_t,\; B\}.$$

Packets that are not transmitted during slot $t$ incur a delay penalty $d(Q_t - U_t)$, where $d(\cdot)$ is a convex and increasing function with $d(0) = 0$.
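The one-slot update can be sketched as follows. This is an illustrative snippet assuming the capped dynamics above (overflowing packets and energy are dropped at the buffer and battery limits).

```python
def step(q, e, u, a, h, L, B, p):
    """One-slot update of queue q and battery e, capped at the buffer
    length L and battery size B, given u transmitted packets, a arriving
    packets, h harvested energy, and power cost function p.  Arrivals
    during a slot become available at the start of the next slot."""
    assert 0 <= u <= q and p(u) <= e, "u must be a feasible action"
    q_next = min(q - u + a, L)
    e_next = min(e - p(u) + h, B)
    return q_next, e_next

# transmit 2 of 3 queued packets with linear power cost p(u) = u
q1, e1 = step(q=3, e=5, u=2, a=1, h=2, L=10, B=8, p=lambda u: u)
assert (q1, e1) == (2, 5)
```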

The data arrival process $\{A_t\}_{t \geq 1}$ is assumed to be an independent and identically distributed process with pmf (probability mass function) $P_A$. The energy arrival process $\{H_t\}_{t \geq 1}$ is an independent process, also independent of $\{A_t\}_{t \geq 1}$, with pmf $P_H$.

The number of packets to transmit is chosen according to a scheduling policy $\pi = (\pi_1, \pi_2, \dots)$, where

$$U_t = \pi_t(Q_t, E_t).$$

The performance of a scheduling policy $\pi$ is given by

$$J(\pi) = \mathbb{E}^{\pi}\Big[\sum_{t=1}^{\infty} \beta^{t-1} d(Q_t - U_t)\Big], \tag{1}$$

where $\beta \in (0, 1)$ denotes the discount factor and the expectation is taken with respect to the joint measure on the system variables induced by the choice of $\pi$.

We are interested in the following optimization problem.

Problem

Given the buffer length $L$, battery size $B$, power cost $p(\cdot)$, delay cost $d(\cdot)$, pmf $P_A$ of the data arrival process, pmf $P_H$ of the energy arrival process, and the discount factor $\beta$, choose a feasible scheduling policy $\pi$ to minimize the performance $J(\pi)$ given by (1).
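The performance criterion (1) can be estimated for any stationary feasible policy by truncated Monte-Carlo simulation. The sketch below is illustrative (the pmf encoding as dicts and all function names are assumptions, not part of the paper).

```python
import random

def evaluate(policy, P_A, P_H, L, B, p, d, beta,
             q0=0, e0=0, horizon=100, runs=200, seed=0):
    """Monte-Carlo estimate of the expected discounted delay cost (1) for a
    stationary feasible policy u = policy(q, e).  P_A and P_H are pmfs
    encoded as {value: probability} dicts; discounted sums over a
    truncated horizon are averaged over independent runs."""
    rng = random.Random(seed)

    def sample(pmf):
        r, acc = rng.random(), 0.0
        for v, pr in pmf.items():
            acc += pr
            if r <= acc:
                return v
        return v  # guard against floating-point round-off

    total = 0.0
    for _ in range(runs):
        q, e, disc = q0, e0, 1.0
        for _ in range(horizon):
            u = policy(q, e)
            total += disc * d(q - u)
            q = min(q - u + sample(P_A), L)
            e = min(e - p(u) + sample(P_H), B)
            disc *= beta
    return total / runs

# illustrative greedy policy (feasible when p(u) = u): send what the battery allows
greedy = lambda q, e: min(q, e)
cost = evaluate(greedy, {0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5},
                L=5, B=5, p=lambda u: u, d=lambda x: x, beta=0.9)
assert cost >= 0.0
```

For discounted costs, truncating the horizon at $T$ introduces an error of at most $\beta^T$ times the maximum per-slot penalty, so a modest horizon suffices when $\beta < 1$.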

III Dynamic Programming Decomposition

The system described above can be modeled as an infinite horizon time homogeneous Markov decision process (MDP) [8]. Since the state and action spaces are finite, standard results from Markov decision theory imply that there exists an optimal policy which is time homogeneous and is given by the solution of a dynamic program. To succinctly write the dynamic program, we define the following Bellman operator: define the operator $\mathcal{B}$ that maps any $V : \mathcal{Q} \times \mathcal{E} \to \mathbb{R}$ to

$$[\mathcal{B} V](q, e) = \min_{u \in \mathcal{U}(q, e)} \Big\{ d(q - u) + \beta\, \mathbb{E}\big[ V\big(\min\{q - u + A,\, L\},\; \min\{e - p(u) + H,\, B\}\big) \big] \Big\}, \tag{2}$$

where $A$ and $H$ are independent random variables with pmfs $P_A$ and $P_H$. Then, an optimal policy for the infinite horizon MDP is given as follows [8].

Theorem 1

Let $V^*$ denote the unique fixed point of the following equation:

$$V = \mathcal{B} V. \tag{3}$$

Furthermore, let $\pi^*(q, e)$ be such that $u = \pi^*(q, e)$ attains the minimum in the right hand side of (3). Then, the time homogeneous policy $\pi^*$ is optimal for Problem II.

The dynamic program described in (3) can be solved using standard algorithms such as value iteration, policy iteration, or linear programming [8].
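As an illustration, the following is a minimal value-iteration sketch for this dynamic program. It is not the authors' released code [9]; the state encoding as `(q, e)` pairs and the assumption that $p$ takes integer values (so the battery state stays on the grid) are simplifications for this sketch.

```python
import itertools

def value_iteration(L, B, p, d, P_A, P_H, beta, tol=1e-8):
    """Solve the fixed-point equation V = BV of (3) by repeatedly applying
    the Bellman operator (2).  States are pairs (q, e) with q the queue
    length and e the battery level.  Returns the value function and a
    greedy (optimal) stationary policy."""
    states = list(itertools.product(range(L + 1), range(B + 1)))
    V = {s: 0.0 for s in states}
    while True:
        V_new, pi = {}, {}
        for q, e in states:
            best_val, best_u = float("inf"), 0
            for u in range(q + 1):
                if p(u) > e:          # infeasible action: not enough energy
                    continue
                # expectation over arrivals A ~ P_A and harvests H ~ P_H
                exp_v = sum(pa * ph * V[min(q - u + a, L),
                                        min(e - p(u) + h, B)]
                            for a, pa in P_A.items()
                            for h, ph in P_H.items())
                val = d(q - u) + beta * exp_v
                if val < best_val:
                    best_val, best_u = val, u
            V_new[q, e], pi[q, e] = best_val, best_u
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            return V_new, pi
        V = V_new

V, pi = value_iteration(L=2, B=2, p=lambda u: u, d=lambda x: x,
                        P_A={0: 0.5, 1: 0.5}, P_H={0: 0.5, 1: 0.5}, beta=0.9)
# consistent with Proposition 1 below: V increases in q, decreases in e
assert V[0, 2] <= V[1, 2] <= V[2, 2] and V[2, 0] >= V[2, 1] >= V[2, 2]
```

Since $\mathcal{B}$ is a $\beta$-contraction, the iteration converges geometrically to $V^*$ from any initialization.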

III-A Properties of the value function

Let $\mathcal{M}$ denote the family of functions $V : \mathcal{Q} \times \mathcal{E} \to \mathbb{R}$ such that for any $e \in \mathcal{E}$, $V(q, e)$ is weakly increasing in $q$ and, for any $q \in \mathcal{Q}$, $V(q, e)$ is weakly decreasing in $e$. Furthermore, let $\mathcal{M}_1$ denote the family of functions $V$ such that for any $e \in \mathcal{E}$, $V(q, e)$ is weakly increasing in $q$. Similarly, let $\mathcal{M}_2$ be the family of functions $V$ such that for any $q \in \mathcal{Q}$, $V(q, e)$ is weakly increasing in $e$.

Proposition 1

The optimal value function $V^* \in \mathcal{M}$.

The proof is presented in the Appendix.

Proposition 1 says that if we follow an optimal policy, the optimal cost when starting from a smaller queue state is lower than that starting from a larger queue state. Similarly, the optimal cost when starting from a larger battery state is lower than that starting from a smaller battery state. Such a result appears to be intuitively obvious.

One might then conjecture that the optimal policy should be weakly increasing in the state of the queue and weakly increasing in the available energy in the battery. In particular, if it is optimal to transmit $u$ packets when the queue state is $q$, then (for the same battery state) the optimal number of packets to transmit at any queue state larger than $q$ should be at least $u$. Similarly, if it is optimal to transmit $u$ packets when the battery state is $e$, then (for the same queue state) the optimal number of packets to transmit at any battery state larger than $e$ should be at least $u$. In the next section, we present counterexamples showing that both of these properties fail to hold. The code for all the results is available at [9].
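The monotonicity properties in question are easy to test for a given policy table. The sketch below is illustrative (the dictionary encoding `pi[(q, e)]` of a policy is an assumption of this sketch).

```python
def is_monotone(pi, L, B):
    """Check whether a policy table pi[(q, e)] is weakly increasing in the
    queue state q (for each fixed battery level e) and in the battery
    state e (for each fixed queue length q) -- the two properties the
    counterexamples violate."""
    in_q = all(pi[q, e] <= pi[q + 1, e]
               for e in range(B + 1) for q in range(L))
    in_e = all(pi[q, e] <= pi[q, e + 1]
               for q in range(L + 1) for e in range(B))
    return in_q, in_e

# toy 2x2 table (hypothetical numbers): monotone in e but not in q
pi = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 0}
assert is_monotone(pi, L=1, B=1) == (False, True)
```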

IV Counterexamples on the monotonicity of optimal policies

IV-A On the monotonicity in queue state

Fig. 2: The optimal policy (a) and the best monotone policy (b) for the example of Sec. IV-A.

Consider the communication system with a band-limited AWGN channel where , , , , (thus, ), , data arrival distribution , and energy arrival distribution .

The optimal policy for this system (obtained by policy iteration [8]) is shown in Fig. 2(a), where the rows correspond to the current queue length and the columns correspond to the current energy level. Note that the policy is not weakly increasing in the queue state.

Given that the optimal policy is not monotone, one might wonder how much we lose by following a monotone policy instead of the optimal policy. To characterize this, we define the best queue-monotone policy as

$$g^* \in \arg\min \{ J(g) : g \text{ is a feasible policy that is weakly increasing in } q \},$$

and let $V^{g^*}$ denote the corresponding value function.

The best monotone policy cannot be obtained using dynamic programming, and one has to resort to a brute force search over all monotone policies. (Due to the power constraint, it is not possible to count the number of monotone policies using combinatorics; the number is obtained by explicit enumeration.) The best monotone policy obtained by searching over these is shown in Fig. 2(b). The worst case difference between the two value functions is given by

Thus, for this counterexample, the best queue-monotone policy performs worse than the optimal policy.
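The worst-case gap above can be computed directly from the two value tables. The relative-percentage metric below is an assumption of this sketch, chosen to match the 8–17% figures quoted in the abstract.

```python
def worst_case_gap(V_opt, V_mono):
    """Worst-case relative loss (in percent) of the best monotone policy's
    value function V_mono with respect to the optimal value function
    V_opt, taken over all states with positive optimal cost."""
    return max(100.0 * (V_mono[s] - V_opt[s]) / V_opt[s]
               for s in V_opt if V_opt[s] > 0)

# hypothetical two-state illustration: 10% worse in the worst state
V_opt = {(1, 0): 2.0, (2, 0): 4.0}
V_mono = {(1, 0): 2.1, (2, 0): 4.4}
assert abs(worst_case_gap(V_opt, V_mono) - 10.0) < 1e-9
```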

IV-B On the monotonicity in battery state

Consider the communication system described in Sec. IV-A but with the data arrival distribution and the energy arrival distribution .

The optimal policy (obtained using policy iteration [8]) is shown in Fig. 3(a). Note that the policy is not weakly increasing in the battery state.

Fig. 3: The optimal policy (a) and the best monotone policy (b) for the example of Sec. IV-B.

Given that the optimal policy is not monotone, the question arises again: how much do we lose by following a monotone policy instead of the optimal policy? To characterize this, we define the best battery-monotone policy as

$$g^* \in \arg\min \{ J(g) : g \text{ is a feasible policy that is weakly increasing in } e \},$$

and let $V^{g^*}$ denote the corresponding value function.

As before, we find the best battery-monotone policy by a brute force search over all battery-monotone policies. The resultant policy is shown in Fig. 3(b).

The worst case difference between the two value functions is given by

Thus, for this counterexample, the best battery-monotone policy performs worse than the optimal policy.

V Counterexamples for fading channels

Fig. 4: The optimal policy for the examples of Sec. V-B, shown in subfigures (a)–(c), and Sec. V-C, shown in subfigures (d)–(e).

V-A Channel model with i.i.d. fading

Consider the model in Sec. II where the channel has i.i.d. fading. In particular, let $S_t$ denote the channel state at time $t$ and let $h_s$, for channel state $s$, denote the attenuation in state $s$. Thus, the power needed to transmit $u$ packets when the channel is in state $s$ is given by $p(u)/h_s$. We assume that $\{S_t\}_{t \geq 1}$ is an i.i.d. process with pmf $P_S$ that is independent of the data and energy arrival processes $\{A_t\}_{t \geq 1}$ and $\{H_t\}_{t \geq 1}$.

V-B On the monotonicity in queue state

Consider the model in Sec. IV-A with an i.i.d. fading channel. The optimal policy for this model (obtained using policy iteration) is shown in Figs. 4(a)–4(c). Note that for all channel states, the optimal policy is not monotone in the queue length.

In this case, the number of monotone policies is too large for a brute force search to find the best monotone policy. We instead choose a heuristic monotone policy that differs from the optimal policy at only a few points; it may be thought of as the queue-monotone policy that is closest to the optimal policy. Let $V^g$ denote the corresponding value function. The worst case difference between the two value functions is given by

Thus, the heuristically chosen queue-monotone policy performs worse than the optimal policy.

V-C On the monotonicity in the battery state

Consider the model in Sec. IV-B with an i.i.d. fading channel. The optimal policy for this model (obtained using policy iteration) is shown in Figs. 4(d)–4(e). Note that the optimal policy is not monotone in the battery state.

In this case as well, a brute force search over all monotone policies is not feasible. As before, we choose a heuristic policy which is the battery-monotone policy that is closest to the optimal policy; it differs from the optimal policy at only two points. Let $V^g$ denote the corresponding value function. The worst case difference between the two value functions is given by

Thus, the heuristically chosen battery-monotone policy performs worse than the optimal policy.

VI Conclusion

In this paper, we consider delay optimal strategies in the cross-layer design of communication systems with an energy harvesting transmitter. We show that the value function is weakly increasing in the queue state and weakly decreasing in the battery state. We show via counterexamples that the optimal policy is monotone neither in the queue length nor in the available energy in the battery.

VI-A Discussion about the counterexamples

One might ask why the optimal policy is not monotone in the above model. The standard argument in MDPs to establish monotonicity of optimal policies is to show that the value-action function is submodular in the state and action. Here, the value-action function is given by

$$Q(q, e, u) = d(q - u) + \beta\, \mathbb{E}\big[ V\big(\min\{q - u + A,\, L\},\; \min\{e - p(u) + H,\, B\}\big) \big].$$

A sufficient condition for the optimal policy to be weakly increasing in the queue length is:

  (S1) for every $e$, $Q(q, e, u)$ is submodular in $(q, u)$.

Note that since $d(\cdot)$ is convex, $d(q - u)$ is submodular in $(q, u)$. Thus, a sufficient condition for (S1) to hold is:

  (S2) for every $e$, $\mathbb{E}\big[ V\big(\min\{q - u + A, L\}, \min\{e - p(u) + H, B\}\big) \big]$ is submodular in $(q, u)$.

Since submodularity is preserved under addition, a sufficient condition for (S2) to hold is:

  (S3) for all realizations $a$ and $h$, $V\big(\min\{q - u + a, L\}, \min\{e - p(u) + h, B\}\big)$ is submodular in $(q, u)$.

By a similar argument, it can be shown that a sufficient condition for the optimal policy to be weakly increasing in the battery state is:

  (S4) for all realizations $a$ and $h$, $V\big(\min\{q - u + a, L\}, \min\{e - p(u) + h, B\}\big)$ is submodular in $(e, u)$.

We have not been able to identify sufficient conditions under which (S3) or (S4) hold. Note that if the data were backlogged, then we would not need to keep track of the queue state; the value function would be just a function of the battery state. In such a scenario, (S4) simplifies to requiring that $V\big(\min\{e - p(u) + h, B\}\big)$ be submodular in $(e, u)$. Since $p(\cdot)$ is convex, it can be shown that convexity of $V$ is sufficient to establish this submodularity. This is the essence of the argument given in [6, 12].

Similarly, if the transmitter had a steady supply of energy, then we would not need to keep track of the battery state; the value function would be just a function of the queue state. In such a scenario, (S3) simplifies to requiring that $V\big(\min\{q - u + a, L\}\big)$ be submodular in $(q, u)$. It can be shown that convexity of $V$ is sufficient to establish this submodularity. This is the essence of the argument given in [2].

In our model, data is not backlogged and energy is intermittent. As a result, we have two queues—the data queue and the energy queue—which have coupled dynamics. This coupling makes it difficult to identify conditions under which the value function will be submodular in $(q, u)$ or $(e, u)$.
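Submodularity on a finite grid can be checked numerically via the mixed-difference inequality $f(x+1, y+1) + f(x, y) \leq f(x+1, y) + f(x, y+1)$. The sketch below is illustrative; the two example functions are chosen to mirror the observation above that a convex delay cost $d(q-u)$ is submodular in $(q, u)$.

```python
def is_submodular(f, X, Y):
    """Check the mixed-difference inequality for submodularity of f on the
    grid {0,...,X} x {0,...,Y}:
        f(x+1, y+1) + f(x, y) <= f(x+1, y) + f(x, y+1)."""
    return all(f(x + 1, y + 1) + f(x, y) <= f(x + 1, y) + f(x, y + 1) + 1e-12
               for x in range(X) for y in range(Y))

# d(q - u) = (q - u)**2 is submodular in (q, u): the mixed difference is -2
assert is_submodular(lambda q, u: (q - u) ** 2, 4, 4)
# q * u is supermodular, hence not submodular (mixed difference is +1)
assert not is_submodular(lambda q, u: q * u, 4, 4)
```

Checks of this kind can falsify (S3) or (S4) for a computed value function, but passing them on one instance does not, of course, establish the conditions in general.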

VI-B Implication of the results

In general, there are two benefits if one can establish that the optimal policy is monotone. The first advantage is that monotone policies are easier to implement. In particular, one needs a two-dimensional look-up table to implement a general transmission policy (similar to the matrices shown in Figs. 2 and 3). In contrast, one only needs to store the boundaries of the decision regions (which can be stored in a sparse matrix) to implement a queue- or battery-monotone policy. Our counterexamples show that such a simpler implementation results in a loss of optimality in energy-harvesting systems.

The second advantage is that if we know that the optimal policy is monotone, we can search for it efficiently using monotone value iteration and monotone policy iteration [8]. Our counterexamples show that these more efficient algorithms cannot be used in energy-harvesting systems.

One might want to restrict to monotone policies for the sake of implementation simplicity. However, if the system does not satisfy properties (S3) and (S4) mentioned in the previous section, then dynamic programming cannot be used to find the best monotone policy. Thus, one has to resort to a brute force search, which suffers from the curse of dimensionality.

-C Monotonicity of Bellman operator

Lemma 1

Given and any , , and , let

Define , i.e.,

(4)

Then, for all , , and , we have:

  1. .

  2. .

  3. .

As a consequence of the above, .

Proof

The properties of follow from the monotonicity of and the fact that monotonicity is preserved under expectations. The details are omitted due to lack of space.

To prove that , we consider any and and let denote an arg min of the right hand side of (4). Now there are two cases: and .

  1. Suppose . Then, it must be the case that . Thus,

    where follows from Property 1.

  2. Suppose . Then, it must be the case that and, therefore, . Hence . Thus,

    where follows from Property 2.

As a result of both of these cases, we get that

(5)

Now let , and recall that ; then , thus

(6)

where follows from Property 3 and follows from the fact that .

From (5) and (6) we infer .

-D Proof of Proposition 1

Arbitrarily initialize and, for , recursively define . Since , Lemma 1 implies that , for all . Since monotonicity is preserved under the limit, we have that . By [8], ; hence, .

References

  • [1] I. Ahmed, K. T. Phan, and T. Le-Ngoc (2016) Optimal stochastic power control for energy harvesting systems with delay constraints. IEEE J. Sel. Areas Commun. 34 (12), pp. 3512–3527. Cited by: §I, §I.
  • [2] R. A. Berry (2000) Power and delay trade-offs in fading channels. Ph.D. Thesis, Massachusetts Institute of Technology. Cited by: §I, §I, §VI-A.
  • [3] I. Fawaz, M. Sarkiss, and P. Ciblat (2018-05) Optimal resource scheduling for energy harvesting communications under strict delay constraint. In IEEE Int. Conf. on Comm., pp. 1–6. Cited by: §I.
  • [4] E. Gallisch (1979) On monotone optimal policies in a queueing model of M/G/1 type with controllable service time distribution. Advances in Applied Probability 11 (4), pp. 870–887. Cited by: §I.
  • [5] M. Kashef and A. Ephremides (2012) Optimal packet scheduling for energy harvesting sources on time varying wireless channels. Journal of Communications and Networks 14 (2), pp. 121–129. Cited by: §I, §I.
  • [6] S. Mao, M. H. Cheung, and V. W. Wong (2014) Joint energy allocation for sensing and transmission in rechargeable wireless sensor networks. IEEE Trans. Veh. Technol. 63 (6), pp. 2862–2875. Cited by: §I, §I, §VI-A.
  • [7] O. Ozel, K. Tutuncuoglu, et al. (2011) Transmission with energy harvesting nodes in fading wireless channels: optimal policies. IEEE J. Sel. Areas Commun. 29 (8), pp. 1732–1743. Cited by: §I, §I.
  • [8] M. L. Puterman (2014) Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons. Cited by: §-D, §III, §III, §IV-A, §IV-B, §VI-B.
  • [9] B. Sayedana and A. Mahajan (2019-05) (Website). Cited by: §III-A.
  • [10] N. Sharma, N. Mastronarde, and J. Chakareski (2018-05) Structural properties of optimal transmission policies for delay-sensitive energy harvesting wireless sensors. In IEEE Int. Conf. on Comm., pp. 1–7. Cited by: §I.
  • [11] D. Shaviv and A. Özgür (2018) Online power control for block iid energy harvesting channels. IEEE Transactions on Information Theory 64 (8), pp. 5920–5937. Cited by: §I, §I.
  • [12] A. Sinha and P. Chaporkar (2012) Optimal power allocation for a renewable energy source. In IEEE National Conf. on Commun., pp. 1–5. Cited by: §I, §I, §VI-A.
  • [13] S. Stidham Jr and R. R. Weber (1989) Monotonic and insensitive optimal policies for control of queues with undiscounted costs. Operations research 37 (4), pp. 611–625. Cited by: §I.
  • [14] K. Tutuncuoglu and A. Yener (2012) Optimum transmission policies for battery limited energy harvesting nodes. IEEE Trans. Wireless Commun. 11 (3), pp. 1180–1189. Cited by: §I.
  • [15] J. Yang and S. Ulukus (2012) Optimal packet scheduling in an energy harvesting communication system. IEEE Trans. Commun. 60 (1), pp. 220–230. Cited by: §I.
  • [16] M. Zafer, E. Modiano, et al. (2008) Optimal rate control for delay-constrained data transmission over a wireless channel. IEEE Transactions on Information Theory 54 (9), pp. 4020. Cited by: §I, §I.