A Maximin Optimal Online Power Control Policy for Energy Harvesting Communications

12/04/2019
by Shengtian Yang, et al.

A general theory of online power control for discrete-time battery limited energy harvesting communications is developed, which leads to, among other things, an explicit characterization of a maximin optimal policy. This policy only requires the knowledge of the (effective) mean of the energy arrival process and maximizes the minimum asymptotic expected average reward (with the minimization taken over all energy arrival distributions of a given (effective) mean). Moreover, it is universally near optimal and has strictly better worst-case performance as compared to the fixed fraction policy proposed by Shaviv and Özgür when the objective is to maximize the throughput over an additive white Gaussian noise channel. The competitiveness of this maximin optimal policy is also demonstrated via numerical examples.


I Introduction

Recent advances in energy harvesting technologies have enabled the development of self-sustainable wireless communication systems that are powered by renewable energy sources in the environment. An important research topic of energy harvesting communications is to design power control policies that maximize throughput or other rewards under random energy availability (see, e.g., [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]).

Although offline power control is by now well investigated, our understanding of online power control remains quite limited. This situation can be largely attributed to the technical differences between these two control problems. For offline power control, since the realization of the whole energy arrival process is known in advance, the underlying distribution is irrelevant as far as policy design is concerned and it enters the picture only in the evaluation of the expected reward, where different realizations need to be weighted according to their respective probabilities. In contrast, for online power control, one has to take into account the distribution of the energy arrival process due to the uncertainty of future energy arrivals. Indeed, this fact can also be seen from the implicit characterization of the optimal online power control policy based on the Bellman equation, which involves the energy arrival distribution in an essential way. Due to its distribution-dependent nature, the Bellman equation is often very difficult to solve exactly. To the best of our knowledge, the general analytical solution to the Bellman equation has only been found in the low battery-capacity regime where the greedy policy is shown to be optimal. Even in that case, the so-called low battery-capacity regime varies from one energy arrival distribution to another. More generally, for any nondegenerate reward function, there is no online power control policy that is universally optimal for all energy arrival distributions, which should be contrasted with offline power control where universality comes for free in light of the aforementioned reason. It is worth mentioning that the requirement of precise knowledge of the energy arrival distribution not only complicates the characterization of the optimal online power control policy, but also, in a certain sense, diminishes the importance of such a policy since the needed knowledge is typically not available in practice.

Fortunately, as demonstrated by Shaviv and Özgür in their remarkable work [14], it is possible to break the deadlock by weakening the notions of optimality and universality. Specifically, they proposed a fixed fraction policy, which only requires the knowledge of the (effective) mean of the energy arrival process, and established its universal near-optimality in terms of the achievable throughput over an additive white Gaussian noise (AWGN) channel (see also [15] for an extended version of this result for more general reward functions). At the heart of their argument is a worst-case performance analysis of the fixed fraction policy, which shows that among all energy arrival processes of the same (effective) mean, the Bernoulli process induces the minimum throughput for the fixed fraction policy; the aforementioned near-optimality result then follows directly from the fact that this minimum throughput is within both constant additive and multiplicative gaps from a simple universal upper bound. Their finding naturally raises the question of whether it is possible to find an online power control policy with improved worst-case performance as compared to the fixed fraction policy or, better still, a policy with the best worst-case performance. In this work, we provide an affirmative answer to this question by constructing an online power control policy that is maximin optimal in the following sense: this policy achieves the maximum asymptotic expected average reward for the Bernoulli energy arrival process of any (effective) mean while among all energy arrival processes of the same (effective) mean, the Bernoulli process induces the minimum asymptotic expected average reward for this policy. To this end, two major obstacles need to be overcome. First of all, the optimal online power control policy for the Bernoulli energy arrival process of a given (effective) mean is uniquely defined only for some discrete battery energy levels; however, under the maximin formulation, it is essential to extend the support of this policy to cover all possible battery energy levels, and a judicious construction is needed to ensure that the interpolated policy has desired properties and at the same time is amenable to analysis. The second obstacle lies in the worst-case performance analysis of the interpolated policy. In contrast to the fixed fraction policy for which some basic convexity/concavity argument suffices due to its linearity, the interpolated policy requires more delicate reasoning for establishing Bernoulli arrivals as the least favorable form of energy arrivals. It will be seen that these two obstacles are intertwined, and we will address them by developing a maximin theory based on detailed investigations of some general families of online power control policies. From a mathematical perspective, our work can also be viewed as saddle-point analysis in a functional space. Note that even for finite-dimensional minimax/maximin games, one often relies on fixed-point theorems to prove the existence of saddle-point solutions. It is thus somewhat surprising that the saddle-point solution of the specific functional game under consideration admits an explicit characterization. In this sense, our work is of inherent theoretical interest.

The rest of this paper is organized as follows. In Sec. II, we formulate the problem and introduce the main results of this paper. A maximin theory of online power control for discrete-time battery limited energy harvesting communications is developed in Sec. III. We conclude the paper in Sec. IV. The proofs of Theorem 20 and most propositions, as well as some auxiliary results, are given in the appendices.

Throughout the paper, the base of the logarithm function is . The maximum and the minimum of and are denoted by and , respectively. The Borel -field generated by the topology on a metric space is denoted by . The -fold composition of a function for some subset of is denoted by with the convention . An empty sum and an empty product are defined to be 0 and 1, respectively.

Ii Problem Formulation and Main Results

Consider a discrete-time energy harvesting communication system equipped with a battery of capacity . We denote by the amount of energy harvested at time . An online power control policy is a family of mappings specifying the energy consumed in time slot based on . Let and denote the amounts of energy stored in the battery at the beginning of time slot before and after the arrival of energy , respectively. They satisfy

(1a)
(1b)

It is assumed that .
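
For concreteness, the update in (1) can be sketched in a few lines of Python, assuming the standard dynamics in which the arriving energy is added and clipped at the battery capacity and the consumed energy is then subtracted; the variable names below are illustrative only.

```python
def battery_step(b, e, g, capacity):
    """One time slot of the assumed battery dynamics (cf. (1a)-(1b)).

    b        -- battery level before the arrival of energy e
    e        -- energy harvested in this slot
    g        -- energy consumed by the policy (admissibility requires 0 <= g <= post-arrival level)
    capacity -- battery capacity
    """
    b_after_arrival = min(b + e, capacity)      # arrival, clipped at capacity (cf. (1a))
    assert 0.0 <= g <= b_after_arrival, "inadmissible energy consumption"
    b_next = b_after_arrival - g                # consumption (cf. (1b))
    return b_after_arrival, b_next
```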

A policy is said to be admissible if

The collection of all admissible policies is denoted by . For , if depends on only through and is time invariant, we say is stationary and identify it by a mapping satisfying for all such that , where is understood as a function of by (1). The set of all (admissible) stationary policies is denoted by . In the sequel, when we write a stationary policy , it may be understood as a mapping , a policy , or a partial policy , and so on, depending on the context.

The energy is consumed to perform some task in time slot , from which a reward is obtained.

Definition 1

A reward function is a nondecreasing, Lipschitz, and concave function from to with .

Definition 2

A reward function is said to be regular if it is strictly concave and differentiable and the function

is convex for all (satisfying , which is in fact unnecessary by Proposition 19), where

One example of interest is the throughput over an AWGN channel. In this case, the reward is the information rate in time slot given by

(2)

with being the channel coefficient.
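
A minimal sketch of this reward function, assuming the usual form r(x) = (1/2) log(1 + gamma * x) with channel coefficient gamma (the precise expression in (2) is not typeset above, so this form is an assumption):

```python
import math

def awgn_reward(x, gamma=1.0):
    """Assumed AWGN throughput reward, r(x) = 0.5 * log(1 + gamma * x); cf. (2)."""
    return 0.5 * math.log(1.0 + gamma * x)
```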

Thus the -horizon total reward of partial policy with respect to energy arrivals and the initial battery energy level is

where , , and , with and satisfying (1). The corresponding -horizon average reward is

Suppose now that the energy harvested at each time is a random variable, and consequently the whole sequence of energy arrivals forms a random process. Correspondingly, the energy variables , , and become the random variables , , and , respectively. The -horizon expected total reward is then given by

and the -horizon expected average reward is

The asymptotic expected average reward of policy with respect to energy arrivals is thus defined as

Since the above three quantities depend only on the (probability) distributions of

or , we can also write their associated distributions in place of or . For example, we may write , where denotes the distribution of an i.i.d. process with marginal distribution .
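
For a stationary policy and i.i.d. arrivals, the N-horizon expected average reward can be estimated by plain Monte-Carlo simulation. The sketch below reuses the battery dynamics and reward function assumed in the earlier snippets; sample_arrival may be any callable producing i.i.d. energy arrivals, and the policy is applied to the post-arrival battery level.

```python
import random

def expected_average_reward(policy, sample_arrival, capacity, horizon, reward, runs=1000, b0=0.0):
    """Monte-Carlo estimate of the N-horizon expected average reward of a stationary policy.

    policy         -- mapping x -> consumed energy with 0 <= policy(x) <= x (admissibility)
    sample_arrival -- callable returning one i.i.d. energy arrival
    """
    total = 0.0
    for _ in range(runs):
        b, acc = b0, 0.0
        for _ in range(horizon):
            b_after_arrival = min(b + sample_arrival(), capacity)  # cf. (1a)
            g = policy(b_after_arrival)
            acc += reward(g)
            b = b_after_arrival - g                                # cf. (1b)
        total += acc / horizon
    return total / runs

# Illustrative use (all parameters hypothetical): a fixed fraction policy under Bernoulli arrivals.
# capacity, mean = 1.0, 0.2
# q = mean / capacity
# est = expected_average_reward(lambda x: q * x,
#                               lambda: capacity if random.random() < q else 0.0,
#                               capacity, horizon=10000, reward=awgn_reward)
```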

We are interested in characterizing online power control policies that maximize the asymptotic expected average reward in the worst case of a given family of energy arrival distributions. To this end, we introduce a maximin formulation.

Definition 3

The mean-to-capacity ratio (MCR) of a probability measure on is defined by

where is the (effective) mean of .
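
A small sketch for estimating the MCR from samples, assuming the (effective) mean is E[min(E, B)], i.e., arrivals beyond the battery capacity are counted only up to the capacity (this convention follows [14]; the exact expression in Definition 3 is not typeset above):

```python
def effective_mean(samples, capacity):
    """Assumed effective mean E[min(E, B)]: energy beyond the capacity cannot be stored."""
    return sum(min(e, capacity) for e in samples) / len(samples)

def mean_to_capacity_ratio(samples, capacity):
    """Mean-to-capacity ratio of Definition 3 under the assumed effective-mean convention."""
    return effective_mean(samples, capacity) / capacity
```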

Definition 4

Let consist of all probability measures on with , where . An online power control policy is said to be maximin optimal for if

The main result of this paper is summarized as follows.

Theorem 5 (Theorems 26, 27 and Proposition 24)

If the reward function is regular, then the stationary policy

is maximin optimal for and its associated least favorable distribution is Bernoulli (see (4)), where

and

In particular, if is given by (2), then

is maximin optimal, where is the least integer satisfying

Iii A Maximin Theory of Online Power Control for Energy Harvesting Communications

In order to find the maximin optimal online power control policy, we adopt the following approach:

  1. Find a distribution that is the least favorable one in when a policy in some special subset of or is employed.

  2. Construct a policy that is optimal for and is an element of .

The rationale underlying this approach is best explained by the following self-evident result.

Proposition 6

Let . If there is a distribution such that

then is maximin optimal.

Iii-a Normal Stationary Policies and the Least Favorable Distribution

In this subsection, we will study a special family of policies called normal (stationary) policies. We will show that, for any , the Bernoulli distribution is the least favorable one in as long as is not below a certain threshold depending on .

Definition 7

For each (stationary) policy , let be its associated policy induced by the complement operation:

Note that .

Policy may be called a (stationary) reserve policy because it specifies the amount of energy reserved for future use.
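
A one-line sketch of the complement operation, under the assumption that it maps a policy g to the reserve policy that keeps whatever g does not spend:

```python
def reserve_policy(policy):
    """Assumed complement operation of Definition 7: reserve x - g(x) for future use."""
    return lambda x: x - policy(x)
```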

Definition 8

A policy is said to be normal if it is nondecreasing and concave. The set of all normal policies is denoted by .

Proposition 9

A normal policy satisfies:

  1. .

  2. is nondecreasing and convex on .

  3. Both and are Lipschitz on .

  4. Both and are differentiable at all but at most countably many points of , and and are nonincreasing and nondecreasing, respectively, on their domains of definition. Moreover, both and are between and whenever they exist.

The next theorem shows that the Bernoulli distribution is the least favorable one for normal policies under certain mild conditions.

Theorem 10

For a normal policy , if

(3)

for almost every , then

for all and , where

(4)

and

Proof:

Let and be two random sequences of energy arrivals such that and , respectively. Let

and

where and .

Note that and are both nondecreasing, concave, and Lipschitz (Definitions 1 and 8 and Proposition 9). Let

where . It is clear that is concave in for fixed . Recall that, for any concave functions and , is concave if is nondecreasing (Proposition 33). So a function such as is concave in for fixed .

Now we will show that for all . Note that

([14, Lemma 2])

and

moreover,

(Eq. (3))
(Proposition 9),

and is nondecreasing, Lipschitz, and concave on .

We proceed by induction on in the reverse order. Suppose that

(5a)
(5b)

and is nondecreasing, Lipschitz, and concave on . It follows that

(Eq. (5a))
([14, Lemma 2])

and

which, together with Eq. (3), implies

where

is nondecreasing, Lipschitz, and concave on (Lemma 29 with (5b)), and so is . Therefore, for all , and in particular for .

Remark 11

In essence, condition (3) compares the marginal utilities of two energy consumptions specified by policy : one in the current time slot and the other in the next time slot if there is no new energy arrival, assuming that the distribution of energy arrivals is Bernoulli. The marginal utilities of these two energy consumptions are

respectively. When condition (3) is met, can be considered, in a certain sense, non-greedy for .

Motivated by this observation, we introduce the following definitions.

Definition 12

A universal stationary policy is a mapping from to satisfying . The set of all universal stationary policies is denoted by . A universal stationary policy is said to be normal if it is nondecreasing and concave. The set of all universal normal (stationary) policies is denoted by .

Note that any universal stationary policy can be regarded as a stationary policy in by considering its restriction on .

Definition 13

For any universal stationary policy , the asymptotic expected average reward of with respect to the Bernoulli energy arrival distribution is a function of capacity , which is denoted by , or more succinctly, , when is fixed and is clear from the context.

Definition 14

The greed index of a stationary policy is defined by

The universal greed index of a universal stationary policy is defined by

Definition 15

A stationary policy is said to be non-greedy for if it satisfies (3).

By the concept of non-greediness, Theorem 10 can be restated as follows: If a normal policy is non-greedy for , then its least favorable distribution in is Bernoulli.
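
This restatement can be probed numerically by simulating a normal, non-greedy policy under the Bernoulli distribution and under other arrival distributions with the same (effective) mean. The sketch below assumes that the Bernoulli distribution in (4) places mass p = mean/capacity at the capacity and the remaining mass at zero, which is the form used in [14]; the uniform competitor is merely one illustrative alternative of the same mean.

```python
import random

def bernoulli_arrival(mean, capacity):
    """Assumed least favorable Bernoulli arrivals (cf. (4)): E = capacity w.p. mean/capacity, else 0."""
    p = mean / capacity
    return lambda: capacity if random.random() < p else 0.0

def uniform_arrival(mean, capacity):
    """A competing arrival distribution with the same mean: uniform on [0, 2*mean] (needs 2*mean <= capacity)."""
    return lambda: random.uniform(0.0, 2.0 * mean)

# With expected_average_reward from the earlier sketch and a normal, non-greedy policy g,
# Theorem 10 predicts that the Bernoulli estimate is no larger than the uniform one.
```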

Remark 16

Theorem 10 only provides a sufficient condition, so it does not cover all possible policies for which the least favorable distribution is Bernoulli (e.g., the greedy policy for sufficiently large but fixed ). However, since condition (3) coincides in part with the optimality condition for the Bernoulli distribution (see (12) with and ), any policy completely violating (3) cannot be an optimal policy for the Bernoulli distribution, and hence is not maximin optimal even if its least favorable distribution is Bernoulli.

We end this subsection with some properties of the greed index and the universal greed index. Their proofs are simple and hence left to the reader.

Proposition 17

Let .

  1. .

  2. Policy is non-greedy for if .

Proposition 18

Let .

  1. .

  2. is nondecreasing in .

  3. .

  4. Policy is non-greedy for if .

Iii-B An Optimal Policy for Bernoulli Energy Arrivals

In this subsection we will construct an optimal policy for Bernoulli energy arrivals that is normal and non-greedy for , and consequently is maximin optimal. In order to achieve this goal, the reward function is required to be regular (Definition 2). This property further implies the following fact.

Proposition 19

A regular reward function is strictly increasing and continuously differentiable. Its derivative is strictly decreasing and satisfies .

Proof:

Use [18, Th. 1.5] and Proposition 32.

From now on, we will assume that is regular. Under this assumption, we can construct an explicit optimal stationary policy for Bernoulli energy arrivals.

Theorem 20 (cf. [14, Th. 1] and [15, Sec. II-A])

A stationary policy is optimal for i.i.d. energy arrivals with the Bernoulli distribution iff it satisfies

(6)

for all

(7)

where

(8)

From the proof of Theorem 20, we can see that the value of for has no impact on the asymptotic expected average reward for Bernoulli arrivals. However, this is not necessarily the case for other energy arrival distributions. To construct a universal stationary policy with maximin optimal performance, we consider the natural extension of (6) from to . The resulting policy is analyzed with the aid of the following functions.

Definition 21

The extension of is defined by

where .

Definition 22

Let

where denotes the -fold composition of with the convention .

The following propositions summarize some important properties of and .

Proposition 23

The function has the following properties:

  1. for .

  2. is continuous, nondecreasing, and convex.

  3. , where and .

  4. The least nonnegative integer such that is

    which is a generalization of (8) (the latter corresponds to the special case ).

Proposition 24

The function is continuous, strictly increasing, and convex, and for all . In particular,

where

A straightforward consequence of these properties is the next theorem, which shows that is normal and non-greedy.

Theorem 25

The stationary policy has the following properties:

  1. Policy is strictly increasing, concave, and consequently normal.

  2. $\omega(\bar{\omega}^{(i-1)}(x)) = \bar{\kappa}^{(i-1)}_{1/(1-p)}(\omega(x))$
  3. , and consequently is non-greedy for .

Proof:

1) It is clear that (Theorem 20 and Proposition 24), which implies due to the invertibility of (Proposition 24). Moreover, is strictly increasing and concave (Propositions 24 and 34). Therefore, is normal.

2) Since ,

which implies that . Repeatedly applying this identity, we have , which is zero if (Proposition 23).

3) It is clear that

We have

and therefore . By Proposition 18, is non-greedy for .

From Theorems 10, 20, and 25 and Proposition 6, it then follows that is maximin optimal.

Theorem 26

Suppose that is regular. The stationary policy is maximin optimal for and

where is the Bernoulli distribution defined by (4).

In particular, for the special reward function given by (2), we have the following maximin optimal policy.

Theorem 27 (cf. [14, Th. 1])

Suppose that is given by (2). The policy

(9)

is maximin optimal, where is the least integer satisfying

Proof:

With no loss of generality, we assume . Note that

and

We have

and consequently

where . It is easy to see that is regular.

In light of Theorem 26, the online power control policy

is maximin optimal. Note that

with

Thus

where is the least integer satisfying

By replacing and with and , respectively, we get (9) for a general .

Fig. 1: Plots of the maximin optimal policy , the fixed fraction policy , and the greedy policy .

By Proposition 23, it is easy to see that is a piecewise linear function, with the endpoints of line segments given by

(10)

for . Policy is plotted in Fig. 1 for and . For comparison, the fixed fraction policy

and the greedy policy are also plotted in Fig. 1. It is observed from (10) and Fig. 1 that coincides with when . It is also observed that as .
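
Since the maximin optimal policy is piecewise linear, it can be implemented by linear interpolation between the endpoints in (10). The sketch below takes the breakpoint list as an input (the closed-form endpoints are not reproduced here, so any concrete list is an assumption) and also includes the two comparison policies of Fig. 1; the proportional form of the fixed fraction policy is assumed from [14].

```python
def piecewise_linear_policy(breakpoints):
    """Policy x -> g(x) obtained by linear interpolation between (x_i, g_i) breakpoints,
    e.g. the endpoints in (10); the first breakpoint is assumed to be (0, 0) and the
    policy is held constant beyond the last breakpoint."""
    pts = sorted(breakpoints)

    def policy(x):
        for (x0, g0), (x1, g1) in zip(pts, pts[1:]):
            if x <= x1:
                t = (x - x0) / (x1 - x0)
                return g0 + t * (g1 - g0)
        return pts[-1][1]

    return policy

def fixed_fraction_policy(q):
    """Assumed form of the fixed fraction policy of [14]: spend the fraction q of the available energy."""
    return lambda x: q * x

def greedy_policy(x):
    """Greedy policy: spend everything currently available."""
    return x
```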

In general, a maximin optimal policy is not guaranteed to perform well for all distributions in . But in the case where the reward function is given by (2), it is known that the fixed fraction policy is universally near optimal in terms of additive and multiplicative gaps ([14, Th. 2]); moreover, this universal near optimality is established by considering the worst-case performance of . Note that for both and , the least favorable distribution is Bernoulli (Theorem 10 or [14, Prop. 5]). Since is optimal for Bernoulli arrivals whereas is suboptimal, it follows that has strictly better worst-case performance compared to and consequently must be universally near optimal as well.

Figs. 2–10 illustrate the performance comparisons of the maximin optimal policy and the fixed fraction policy when is Bernoulli, truncated uniform, or truncated exponential, where the truncated uniform and exponential distributions are given by

with and

respectively. It can be seen from the plots that consistently outperforms and has a clear advantage in the low battery-capacity regime (i.e., when is small). This shows that the dominance of over is not restricted to the worst-case scenario.

Fig. 2: Performance comparison of the maximin optimal and fixed fraction policies, for and .
Fig. 3: Performance comparison of the maximin optimal and fixed fraction policies, for and .
Fig. 4: Performance comparison of the maximin optimal and fixed fraction policies, for and .
Fig. 5: Performance comparison of the maximin optimal and fixed fraction policies, for and ().
Fig. 6: Performance comparison of the maximin optimal and fixed fraction policies, for and ().
Fig. 7: Performance comparison of the maximin optimal and fixed fraction policies, for and ().
Fig. 8: Performance comparison of the maximin optimal and fixed fraction policies, for and ().
Fig. 9: Performance comparison of the maximin optimal and fixed fraction policies, for and ().
Fig. 10: Performance comparison of the maximin optimal and fixed fraction policies, for and ().
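
The truncated uniform and truncated exponential arrivals used in these comparisons can be sampled as sketched below, assuming that truncation simply clips the untruncated variate at the battery capacity (an assumption, since other truncation conventions exist); these samplers plug directly into the Monte-Carlo estimator sketched earlier.

```python
import random

def truncated_uniform(upper, capacity):
    """Uniform on [0, upper], clipped at the battery capacity (assumed truncation convention)."""
    return lambda: min(random.uniform(0.0, upper), capacity)

def truncated_exponential(rate, capacity):
    """Exponential with the given rate, clipped at the battery capacity (assumed truncation convention)."""
    return lambda: min(random.expovariate(rate), capacity)
```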
Remark 28

In contrast to the fact that , the MCRs of and depend on the battery capacity (in addition to their respective parameters and ). To facilitate the characterization of this dependency, we define the nominal MCR (NMCR) of a truncated distribution to be the ratio of the mean of its original distribution (with no truncation) to the battery capacity . Note that the NMCRs of and are

and

respectively. Hence, if the NMCR is , then

and