# The Optimal Power Control Policy for an Energy Harvesting System with Look-Ahead: Bernoulli Energy Arrivals

We study power control for an energy harvesting communication system with independent and identically distributed Bernoulli energy arrivals. It is assumed that the transmitter is equipped with a finite-sized rechargeable battery and is able to look ahead to observe a fixed number of future arrivals. A complete characterization is provided for the optimal power control policy that achieves the maximum long-term average throughput over an additive white Gaussian noise channel.


## I Introduction

Supplying the energy required by a communication system via energy harvesting (EH) from natural resources is not only beneficial from an environmental perspective, but also essential for long-lasting, self-sustainable, affordable telecommunication that can be deployed in places with no electricity infrastructure. On the other hand, EH systems must handle the associated challenges, such as the time-varying nature of green energy sources and the limited battery storage capacity.
Consider an EH communication system with a transmitter (TX) and a receiver (RX), connected by an additive white Gaussian noise (AWGN) channel. The TX is equipped with a rechargeable battery of a given storage size and is capable of harvesting the energy arrivals, which are assumed to be independent and identically distributed (i.i.d.). The communication session consists of (discrete) time slots, and the instantaneous rate achieved in each time slot is a function of its allocated energy. A power control policy specifies the energy assignment across the time slots according to the initial battery energy level, the harvested energy sequence observed thus far, and any knowledge of future energy arrivals. The reward associated with each policy is the average throughput over the horizon. The average throughput optimization (ATO) problem aims to determine the optimal policy that achieves the maximum average throughput.
In the seminal paper [1], the authors studied the ATO problem over a finite horizon for the offline model, where the energy arrivals are non-causally known at the TX, and derived the optimal policy for this model based on [2]. Analyses of the offline model for more general channels can be found in [3, 4, 5, 6, 7] (and the references therein). In general, the optimal policies for the offline model strive to allocate the energy across the time horizon as uniformly as possible while trying to avoid energy loss due to battery overflow.
In the landmark paper [8], the authors studied the ATO problem over an infinite horizon for the online model, where the energy arrivals are causally known at the TX. They determined the optimal policy for Bernoulli energy arrivals and established the approximate optimality of the fixed fraction policy for general energy arrivals. Similar results were derived in [9, 10] for a general concave and monotonically increasing utility function. In [11, 12], the authors studied the same problem with an unlimited-size battery and developed three simple online policies which are optimal for the offline case as well.
In this paper, we study the setup where the TX is able to look ahead to observe a window of size $w$ of future energy arrivals. In fact, the online and offline models correspond to the extreme cases $w=0$ and $w=\infty$, respectively. Therefore, our formulation provides a link between these two models, which have largely been studied in isolation. From a practical perspective, the TX often has a good estimate of the amount of energy available in the near future, because such energy is already harvested but not yet converted to a usable form. We investigate the ATO problem over an infinite horizon for this new setup. Specifically, we focus on Bernoulli energy arrivals and characterize the corresponding optimal policy. The main difference between this optimal policy and that of [8, 9, 10] is as follows. Under the new policy, if no energy arrival is seen in the look-ahead window, the battery always keeps some energy for the future and spends only a portion of the available energy in the current time slot. In contrast, under the policy in [8, 9, 10], the energy is allocated only to a fixed number of time slots after each battery charge, and no energy is expended beyond that point as the battery becomes depleted.
The organization of this paper is as follows. In Section II, we introduce the model and the problem. In Section III, we develop the structure of an optimal policy and justify the problem-solving strategy. In Section IV, we establish the main results and completely characterize the optimal policy. Finally, Section V concludes the paper.

## II Problem Definitions

The notation of this paper is as follows. $\mathbb{N}$ and $\mathbb{R}^+$ represent the set of natural numbers and the set of positive real numbers, respectively. Random variables are denoted by capital letters and their realizations are written in lower-case letters. Functions are denoted by calligraphic font. $\mathbb{E}$ is reserved for the expectation. Logarithms are in base 2.
Consider a point-to-point AWGN quasi-static fading channel from a TX to an RX, where the channel gain $\gamma$ is constant for the entire communication session. The communication is discrete-time with time slots $\tau \in \mathbb{N}$, formulated by $Y_\tau = \sqrt{\gamma}\,X_\tau + N_\tau$, where $X_\tau$ and $Y_\tau$ are the transmitted signal and the received signal, respectively; $N_\tau$ is Gaussian noise with zero mean and unit variance. The TX is capable of harvesting energy from the environment and is equipped with a rechargeable battery of finite size $B$. The exogenous (harvested) energy arrivals $\{E_\tau\}$ are assumed to be an i.i.d. process with known marginal distribution $P_E$. In this work, unless specified otherwise, we assume that $P_E$ is Bernoulli-$(p)$, where $p \in (0,1)$, defined as

$$P_E(e)=\begin{cases} p & : e=B,\\ 1-p & : e=0.\end{cases} \tag{1}$$

Denote the energy level stored in the battery by the random process $\{B_\tau\}$ with initial level $B_1 = b_1$, where $b_1 \in [0, B]$. If energy arrives at some time instant $\tau$, $E_\tau = B$, and the battery is fully charged to $B_\tau = B$. In this case, any energy remaining in the battery overflows and is wasted. If no energy arrives, $E_\tau = 0$, and the battery energy level is not replenished in time slot $\tau$.
In this paper, we assume that the TX is able to look ahead with a fixed window size $w$: the realization of the energy arrival sequence $(E_t)_{t=1}^{\tau+w}$ is known to the TX at time $\tau$.
The TX spends energy $A_\tau$ as an action at time slot $\tau$, where the energy is determined by a (randomized) action function

$$A_\tau = \mathcal{A}_\tau\big((E_t)_{t=1}^{\tau+w},\, B_1\big), \quad \text{subject to } A_\tau \le B_\tau, \tag{2}$$

and gains the throughput $\frac{1}{2}\log(1+\gamma A_\tau)$ as the reward of time $\tau$. Then, the battery energy level becomes

$$B_\tau = \min\{B_{\tau-1} - A_{\tau-1} + E_\tau,\ B\}. \tag{3}$$
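For illustration, the arrival model (1) and battery dynamics (3) can be simulated directly. The following sketch uses a placeholder fixed-fraction action rule; the function name and the `frac` parameter are illustrative assumptions, not the policy derived in this paper.

```python
import random

def simulate_battery(B=1.0, p=0.3, T=10000, frac=0.5, seed=0):
    """Simulate Bernoulli-(p) arrivals (1) and the battery update (3).

    The action here is an illustrative fixed-fraction rule A_t = frac * b_t
    (so A_t <= B_t always holds); the optimal look-ahead policy is derived
    later in the paper.
    """
    rng = random.Random(seed)
    b = B  # B_1 = B (WLOG, cf. Remark 1)
    levels = []
    for _ in range(T):
        a = frac * b                         # action, satisfies A_t <= B_t
        e = B if rng.random() < p else 0.0   # harvested energy arrival
        b = min(b - a + e, B)                # update (3); overflow is wasted
        levels.append(b)
    return levels
```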

A look-ahead policy $\pi(w)$ is characterized by its sequence of action functions $\{\mathcal{A}_\tau\}_{\tau=1}^{\infty}$. For a fixed communication session time $T$, $\pi(w)$ gains the average (expected) throughput over the $T$-horizon

$$\Gamma^{\pi(w)}_T \triangleq \frac{1}{T}\,\mathbb{E}\!\left(\sum_{\tau=1}^{T} \frac{1}{2}\log\Big(1+\gamma\, \mathcal{A}_\tau\big((E_t)_{t=1}^{\tau+w},\, B_1\big)\Big)\right) \tag{4}$$

as its associated reward, where the expectation is over all energy arrival sequences.

###### Definition 1.

The largest average reward (channel throughput) over the infinite horizon (long term) is defined as

$$\Gamma^*_{B_1} \triangleq \sup_{\pi(w)} \liminf_{T\to\infty} \Gamma^{\pi(w)}_T. \tag{5}$$

If $\Gamma^*_{B_1}$ is attainable by a policy $\pi^*(w)$, that policy is called optimal.

###### Remark 1.

It can be shown [8, Appendix B] that $\Gamma^*_{B_1}$ does not depend on $B_1$. Hence, we can drop the subscript in (5) and assume $B_1 = B$ without loss of generality (WLOG).

In this paper, we seek $\Gamma^*$ and the corresponding optimal policy according to Definition 1.

## III Problem Solving Strategy

First, assume that the distribution of the harvested energy, $P_E$, is arbitrary. In general, $\Gamma^*$ depends on the window size $w$. According to [13, 14], there is no loss of optimality in (5) if the supremum is taken over deterministic Markovian stationary policies which rely only on the system state

$$S_\tau = (B_\tau, E_{\tau+1}, E_{\tau+2}, \ldots, E_{\tau+w}). \tag{6}$$

Indeed, (5) is attainable by an optimal stationary policy $\pi^*(w)$. Given $S_\tau$, knowing the past energy arrivals does not enhance the achievable reward. Note that the action is determined not only by the current energy level $B_\tau$, but can also be affected by the observed future energy arrivals within the look-ahead window. As the optimal policy is Markovian and stationary, the action function (2) can be simplified to the time-invariant function

$$A_\tau = \mathcal{A}(S_\tau), \quad \text{subject to } A_\tau \le B_\tau. \tag{7}$$

Now, focus on the Bernoulli distribution as defined in (1). In this case, the state (6) can be simplified as follows. Let the random variable $D_\tau$ be the time distance of the earliest energy arrival located inside the look-ahead window. Specifically, define

$$D_\tau \triangleq \begin{cases} 0 & : \text{if } E_{\tau+t}=0 \text{ for all } t\in\{1,\ldots,w\},\\ \min\{t : 1\le t\le w,\ E_{\tau+t}=B\} & : \text{otherwise}.\end{cases}$$

For any given energy arrival sequence and battery level $b_\tau$ at time $\tau$, if an energy arrival is observed inside the window, i.e., $D_\tau = d > 0$, then the optimal policy uniformly assigns the instant battery energy $b_\tau$ to the following $d$ time slots. This is due to the concavity of the reward function (4). Otherwise, $\mathcal{A}(b_\tau, 0, \ldots, 0)$ is the action at time $\tau$, which will be determined in the sequel. Therefore, the action function (7) of the stationary optimal policy is given by

$$\mathcal{A}(S_\tau) = \begin{cases} \dfrac{B_\tau}{D_\tau} & : D_\tau > 0,\\[4pt] \mathcal{A}(B_\tau, 0, \ldots, 0) & : D_\tau = 0. \end{cases} \tag{8}$$

From (8), we conclude that $A_\tau$ (and so the associated reward) can be uniquely determined by $(B_\tau, D_\tau)$. Hence, the system state for Bernoulli energy arrivals can be reduced to

$$S_\tau = (B_\tau, D_\tau). \tag{9}$$
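The reduced state and the two-case action rule (8) can be sketched in a few lines of code. The list `xi` of no-arrival allocations is a hypothetical precomputed stand-in for the sequence $\xi^*_j$ derived later, and the index `j` counts slots since the last full charge.

```python
def time_to_arrival(window, B):
    """D_tau as defined above: the distance to the earliest arrival
    inside the look-ahead window, or 0 if the window has no arrival."""
    for t, e in enumerate(window, start=1):
        if e == B:
            return t
    return 0

def action(b, window, B, xi, j):
    """Stationary action (8): spread the battery level b uniformly over
    D_tau slots when an arrival is visible; otherwise spend xi[j]."""
    d = time_to_arrival(window, B)
    if d > 0:
        return b / d
    return xi[j]
```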

A non-negative sequence $\{x_j\}_{j=1}^{\infty}$ is called admissible if $\sum_{j=1}^{\infty} x_j \le B$. Let $b_1 = B$. Define the admissible sequence $\{\xi^*_j\}_{j=1}^{\infty}$ associated with $\pi^*(w)$ by

$$\xi^*_j \triangleq \mathcal{A}(b_j, 0, \cdots, 0), \quad j \in \mathbb{N}, \tag{10}$$

where $b_{j+1} = b_j - \xi^*_j$. Due to (8), if the battery is charged up at some time $\tau$ ($B_\tau = B$), but no arrival occurs afterwards ($E_t = 0$ for $t > \tau$), $\pi^*(w)$ sends $\xi^*_j$ at time $\tau + j - 1$ for $j \in \mathbb{N}$. In the sequel, $\{\xi^*_j\}$ and its properties are investigated. Once $\{\xi^*_j\}$ is determined, $\pi^*(w)$ follows Algorithm 1.

The time interval $\{\tau, \ldots, \tau+\ell-1\}$ with the property $E_\tau = B$ and $E_t = 0$ for $t \in \{\tau+1, \ldots, \tau+\ell-1\}$ is called a cycle with start time $\tau$. Let the random variable $L$ denote the (duration) time of the cycle. Due to (1), the distribution of $L$ is geometric, i.e., $\Pr\{L=k\} = p(1-p)^{k-1}$ for $k \in \mathbb{N}$, with mean

$$\mathbb{E}(L) = \frac{1}{p}. \tag{11}$$

If a stationary policy $\pi(w)$ is employed, the processes $\{B_\tau\}$, $\{A_\tau\}$, and $\{S_\tau\}$ are non-delayed regenerative processes [15, Section 7.5] with cycles of duration $L$: when the battery charges up to $B$ at some cycle start time, the memories of the processes are reset. Consequently, the reward of a cycle does not statistically depend on the previous cycles or on the start time. As $\mathbb{E}(L)$ in (11) and the per-slot rewards are bounded and $\{A_\tau\}$ is a non-delayed regenerative process, the renewal reward theorem [15, Section 7.4] can be utilized to simplify the long-term average throughput achieved by a policy $\pi(w)$.

$$\liminf_{T\to\infty} \Gamma^{\pi(w)}_T = \frac{\mathbb{E}\Big(\sum_{t=1}^{L} \frac{1}{2}\log(1+\gamma A_t)\Big)}{\mathbb{E}(L)} = \frac{p}{2}\left(\sum_{k=1}^{\infty}\sum_{j=1}^{k} \log(1+\gamma Q_j)\,\Pr\{L=k\}\right), \tag{12}$$

where (12) is due to (11), and $Q_j$ is the energy assigned to time slot $j$ of a cycle conditioned on $L = k$. The following definition is helpful for calculating (12).
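The renewal-reward step can be checked numerically. The sketch below uses a toy per-slot reward $1/t$ inside each cycle (an arbitrary stand-in for the actual log rewards) and confirms that the long-run average equals the ratio of the expected cycle reward to the expected cycle length $\mathbb{E}(L)=1/p$.

```python
import random

def renewal_reward_demo(p=0.25, n_cycles=200000, seed=1):
    """Monte Carlo illustration of the renewal reward theorem: simulate
    geometric cycle lengths L (mean 1/p, as in (11)) and accumulate a toy
    per-slot reward 1/t; return (long-run average reward, mean length)."""
    rng = random.Random(seed)
    total_reward, total_time = 0.0, 0
    for _ in range(n_cycles):
        L = 1
        while rng.random() >= p:   # geometric cycle length, mean 1/p
            L += 1
        total_reward += sum(1.0 / t for t in range(1, L + 1))
        total_time += L
    return total_reward / total_time, total_time / n_cycles
```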

###### Definition 2.

Let $\{x_j\}_{j=1}^{\infty}$ be an admissible sequence. Define

$$\begin{aligned} \mathcal{T}_\infty(\{x_j\}_{j=1}^{\infty}) \triangleq\ & \sum_{k=1}^{w} p^2(1-p)^{k-1}\,\frac{k}{2}\log\Big(1+\gamma\frac{B}{k}\Big)\\ &+ \sum_{j=1}^{\infty} p(1-p)^{j+w-1}\,\frac{1}{2}\log(1+\gamma x_j)\\ &+ \sum_{k=1}^{\infty} p^2(1-p)^{k+w-1}\,\frac{w}{2}\log\bigg(1+\gamma\,\frac{B-\sum_{j=1}^{k}x_j}{w}\bigg). \end{aligned} \tag{13}$$
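For concreteness, (13) can be evaluated by truncating the geometric sums; the truncation depth `K` and the default parameter values below are arbitrary illustrative choices.

```python
import math

def T_inf(x, B=1.0, p=0.3, w=2, gamma=10.0, K=2000):
    """Truncated numerical evaluation of the objective (13) for an
    admissible finite sequence x (implicitly zero-padded)."""
    assert all(xj >= 0 for xj in x) and sum(x) <= B + 1e-12
    # first sum: cycles no longer than the window; B is spread over k slots
    val = sum(p**2 * (1 - p)**(k - 1) * k / 2 * math.log2(1 + gamma * B / k)
              for k in range(1, w + 1))
    xs = list(x) + [0.0] * K
    csum = 0.0
    for j in range(1, K + 1):
        csum += xs[j - 1]
        # second sum: reward of slot j before any arrival is visible
        val += p * (1 - p)**(j + w - 1) * 0.5 * math.log2(1 + gamma * xs[j - 1])
        # third sum: leftover energy spread over the final w slots
        val += p**2 * (1 - p)**(j + w - 1) * w / 2 * math.log2(
            1 + gamma * (B - csum) / w)
    return val
```

Since the summands decay like $(1-p)^j$, the truncation error is geometrically small in `K`.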
###### Lemma 1.

The long-term average throughput (4) of the optimal policy $\pi^*(w)$ with associated sequence $\{\xi^*_j\}_{j=1}^{\infty}$ satisfies

$$\Gamma^* = \mathcal{T}_\infty(\{\xi^*_j\}_{j=1}^{\infty}). \tag{14}$$
###### Proof.

The proof follows from Definition 1, (12), and Definition 2. First, the energy assignments $Q_j$ in (12) for $\pi^*(w)$ are determined in the following. Assume a new cycle is started at time $\tau$, so the battery energy level is $B$. Given the cycle time $L = k$, the following two cases can be considered.

1. Case $k \le w$: In this case, $D_\tau = k$, as the arrival is observed in the window. Hence, according to (8), we have

$$Q_j = \frac{B}{k} \quad \text{for } j \in \{1,\ldots,k\}. \tag{15}$$
2. Case $k > w$: In this case, the TX allocates $\xi^*_j$ to the first $k-w$ time slots of the cycle; then, it uniformly distributes the remaining battery energy $B - \sum_{j=1}^{k-w}\xi^*_j$ over the last $w$ time slots due to (8), as soon as the next arrival becomes observable (the look-ahead window covers time $\tau+k$):

$$Q_j = \begin{cases} \xi^*_j & : j \in \{1,\ldots,k-w\},\\[4pt] \dfrac{B-\sum_{i=1}^{k-w}\xi^*_i}{w} & : j \in \{k-w+1,\ldots,k\}. \end{cases} \tag{16}$$

Finally, the proof can be concluded from the following calculation of (12) for $\pi^*(w)$ based on (15) and (16):

$$\begin{aligned} \Gamma^* =\ & p\sum_{k=1}^{w} p(1-p)^{k-1}\,\frac{k}{2}\log\Big(1+\gamma\frac{B}{k}\Big)\\ &+ p\sum_{k=w+1}^{\infty} p(1-p)^{k-1}\Bigg[\sum_{j=1}^{k-w}\frac{1}{2}\log(1+\gamma\xi^*_j) + \frac{w}{2}\log\bigg(1+\gamma\,\frac{B-\sum_{j=1}^{k-w}\xi^*_j}{w}\bigg)\Bigg]\\ =\ & \sum_{k=1}^{w} p^2(1-p)^{k-1}\,\frac{k}{2}\log\Big(1+\gamma\frac{B}{k}\Big) + \sum_{j=1}^{\infty}\frac{1}{2}\log(1+\gamma\xi^*_j)\Bigg(\sum_{k=j+w}^{\infty} p^2(1-p)^{k-1}\Bigg)\\ &+ \sum_{k=w+1}^{\infty} p^2(1-p)^{k-1}\,\frac{w}{2}\log\bigg(1+\gamma\,\frac{B-\sum_{j=1}^{k-w}\xi^*_j}{w}\bigg)\\ =\ & \sum_{k=1}^{w} p^2(1-p)^{k-1}\,\frac{k}{2}\log\Big(1+\gamma\frac{B}{k}\Big) + \sum_{j=1}^{\infty} p(1-p)^{j+w-1}\,\frac{1}{2}\log(1+\gamma\xi^*_j)\\ &+ \sum_{k=1}^{\infty} p^2(1-p)^{k+w-1}\,\frac{w}{2}\log\bigg(1+\gamma\,\frac{B-\sum_{j=1}^{k}\xi^*_j}{w}\bigg). \end{aligned}$$

∎

## IV Properties of the Optimal Policy

In this section, we characterize the energy sequence $\{\xi^*_j\}$ and its properties. The main result of this paper is as follows.

###### Theorem 1.

Define $\mathcal{T}^*_\infty \triangleq \sup \mathcal{T}_\infty(\{x_j\}_{j=1}^{\infty})$, where the supremum is over all admissible sequences $\{x_j\}_{j=1}^{\infty}$. Then, the maximum long-term average reward (channel throughput) of the look-ahead model is given by

$$\Gamma^* = \mathcal{T}^*_\infty. \tag{18}$$

Moreover, $\{\xi^*_j\}_{j=1}^{\infty}$ induced by the optimal policy is the unique maximizer of $\mathcal{T}_\infty$ and is the unique sequence satisfying

$$\frac{1-p}{1+\gamma\xi^*_{j+1}} = \frac{1}{1+\gamma\xi^*_j} - \frac{p}{1+\frac{\gamma}{w}\big(B-\sum_{i=1}^{j}\xi^*_i\big)}, \tag{19a}$$

$$\sum_{j=1}^{\infty}\xi^*_j = B. \tag{19b}$$

It is also a strictly decreasing positive sequence with the property

$$\xi^*_j < \frac{B-\sum_{i=1}^{j}\xi^*_i}{w}. \tag{20}$$
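The fixed point of (19a)-(19b) can be found numerically by shooting on $\xi^*_1$: roll the recursion (19a) forward and bisect on the starting value until the total spend matches (19b). The monotonicity of the resulting total in the starting value is an assumption of this numerical sketch, not a claim of the theorem.

```python
def solve_xi(B=1.0, p=0.3, w=2, gamma=10.0, J=500, iters=100):
    """Shooting method for (19a)-(19b): choose xi_1, roll (19a) forward,
    and bisect on xi_1 until the total spend reaches B."""
    def rollout(xi1):
        xi, s, seq = xi1, 0.0, []
        for _ in range(J):
            s += xi
            seq.append(xi)
            if s > B:                    # overspent the battery: too large
                return s, seq
            rhs = 1.0 / (1.0 + gamma * xi) - p / (1.0 + gamma * (B - s) / w)
            if rhs <= 0.0:               # left the valid region: overshoot
                return B + 1.0, seq
            xi = ((1.0 - p) / rhs - 1.0) / gamma   # invert (19a)
            if xi <= 0.0:                # sequence died out: too small
                return s, seq
        return s, seq

    lo, hi = 0.0, B
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        s, _ = rollout(mid)
        if s < B:
            lo = mid
        else:
            hi = mid
    return rollout((lo + hi) / 2.0)[1]
```

Near the critical starting value, the computed sequence is positive and decreasing and its sum approaches $B$, in line with the theorem.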
###### Remark 2.

The optimal average throughput of the studied model coincides with the optimal average throughput of the non-causal (offline) model if $w = \infty$ is set in (13). In this case, the optimal average throughput is given by

$$\Gamma^* = \sum_{k=1}^{\infty} p^2(1-p)^{k-1}\,\frac{k}{2}\log\Big(1+\gamma\frac{B}{k}\Big).$$

In Fig. 1, we illustrate the (long-term) average throughput as a function of the window size $w$, computed using (18) for the given system parameters. Although the optimal throughput is an increasing function of $w$, already for a moderate window size in Fig. 1 this communication system achieves the optimal average throughput corresponding to $w = \infty$ within a small gap. The rest of this section is devoted to the proof of Theorem 1.
The proof of (18) is as follows. As mentioned, there exists a stationary policy $\pi^*(w)$ which attains $\Gamma^*$ [13, 14]. That policy also satisfies (14), in which $\{\xi^*_j\}$ is associated with $\pi^*(w)$. As $\pi^*(w)$ is optimal, $\{\xi^*_j\}$ must maximize $\mathcal{T}_\infty$ in (14) over all admissible sequences. As $\mathcal{T}_\infty$ is a strictly concave function, the maximizing sequence is unique, and this unique maximizer indeed achieves $\mathcal{T}^*_\infty$. Hence, (18) is justified. To investigate the sequence $\{\xi^*_j\}$ as the unique maximizer of $\mathcal{T}_\infty$, we define the corresponding $N$-dimensional optimization problem in the following subsection in order to employ the Karush-Kuhn-Tucker (KKT) conditions. Then, we investigate the relation between the finite-dimensional and the infinite-dimensional optimization problems.

### IV-A $N$-Dimensional Optimization Problem

Let us define the following $N$-dimensional optimization problem based on Definition 2.

###### Definition 3.

Fix $N \in \mathbb{N}$. Recall Definition 2, and set $x_j = \xi_j$ for $j \le N$ and $x_j = 0$ for $j > N$. Define

$$\begin{aligned} \mathcal{T}_N(\xi_1,\cdots,\xi_N) \triangleq\ & \mathcal{T}_\infty(\xi_1,\cdots,\xi_N,0,0,\cdots)\\ =\ & \sum_{k=1}^{w} p^2(1-p)^{k-1}\,\frac{k}{2}\log\Big(1+\gamma\frac{B}{k}\Big) + \sum_{j=1}^{N} p(1-p)^{j+w-1}\,\frac{1}{2}\log(1+\gamma\xi_j)\\ &+ \sum_{k=1}^{N} p^2(1-p)^{k+w-1}\,\frac{w}{2}\log\bigg(1+\gamma\,\frac{B-\sum_{j=1}^{k}\xi_j}{w}\bigg) + p(1-p)^{w+N}\,\frac{w}{2}\log\bigg(1+\gamma\,\frac{B-\sum_{j=1}^{N}\xi_j}{w}\bigg). \end{aligned}$$

The $N$-dimensional optimization problem is defined as

$$\mathcal{T}^*_N = \sup \mathcal{T}_N\big(\xi^{(N)}_1,\cdots,\xi^{(N)}_N\big), \tag{21}$$

where the supremum is over all sequences $\{\xi^{(N)}_j\}_{j=1}^{N}$ subject to

$$\xi^{(N)}_j \ge 0, \tag{22a}$$

$$\sum_{j=1}^{N}\xi^{(N)}_j \le B. \tag{22b}$$

The maximizer of (21), if it exists, is denoted by $\{\xi^{(N)*}_j\}_{j=1}^{N}$.
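Since $\mathcal{T}_N$ is concave on a convex compact set, even a crude projected gradient ascent with finite-difference gradients approaches $\mathcal{T}^*_N$. The step size, the starting point, and the rescaling projection below are illustrative choices, not the paper's KKT-based solution.

```python
import math

def T_N(xi, B=1.0, p=0.3, w=2, gamma=10.0):
    """The finite-dimensional objective of Definition 3."""
    N = len(xi)
    val = sum(p**2 * (1 - p)**(k - 1) * k / 2 * math.log2(1 + gamma * B / k)
              for k in range(1, w + 1))
    s = 0.0
    for j, x in enumerate(xi, start=1):
        s += x
        val += p * (1 - p)**(j + w - 1) * 0.5 * math.log2(1 + gamma * x)
        val += p**2 * (1 - p)**(j + w - 1) * w / 2 * math.log2(
            1 + gamma * (B - s) / w)
    # residual term from the zero tail of the padded sequence
    val += p * (1 - p)**(w + N) * w / 2 * math.log2(1 + gamma * (B - s) / w)
    return val

def ascend(N=8, steps=5000, lr=0.01, h=1e-6, B=1.0):
    """Finite-difference projected gradient ascent on T_N: a crude
    numerical sketch of solving (21) under (22a)-(22b)."""
    xi = [B / (2 * N)] * N
    for _ in range(steps):
        base = T_N(xi)
        grad = []
        for j in range(N):
            xi[j] += h
            grad.append((T_N(xi) - base) / h)   # forward difference
            xi[j] -= h
        xi = [max(0.0, x + lr * g) for x, g in zip(xi, grad)]
        s = sum(xi)
        if s > B:                   # rescale back into the feasible set
            xi = [x * B / s for x in xi]
    return xi
```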

###### Lemma 2.

The following statements are valid for $\mathcal{T}^*_N$.

1. There exists a unique maximizer $\{\xi^{(N)*}_j\}_{j=1}^{N}$ of (21).

2. $\mathcal{T}^*_N$ is an increasing function of $N$.

3. $\lim_{N\to\infty}\mathcal{T}^*_N = \mathcal{T}^*_\infty$, and it is finite.

###### Proof.

(a) follows from the fact that $\mathcal{T}_N$ is a continuous, bounded, strictly concave function defined on a compact set. (b) is due to the fact that the feasible set of $\mathcal{T}_N$ corresponds to a subset of the feasible set of $\mathcal{T}_{N+1}$. (c) follows from the fact that $\mathcal{T}_N$ is continuous and bounded and $\mathcal{T}^*_N$ is increasing, i.e., $\mathcal{T}^*_N \le \mathcal{T}^*_{N+1} \le \mathcal{T}^*_\infty$, and thus $\lim_{N\to\infty}\mathcal{T}^*_N$ exists and is finite. Indeed, the following inequality proves that this limit is $\mathcal{T}^*_\infty$, due to the squeeze theorem:

$$\begin{aligned} \mathcal{T}^*_N = \mathcal{T}_N\big(\xi^{(N)*}_1,\cdots,\xi^{(N)*}_N\big) \ge\ & \mathcal{T}_N(\xi^*_1,\cdots,\xi^*_N)\\ =\ & \mathcal{T}^*_\infty - \sum_{j=N+1}^{\infty} p(1-p)^{j+w-1}\,\frac{1}{2}\log(1+\gamma\xi^*_j)\\ &- \sum_{k=N+1}^{\infty} p^2(1-p)^{k+w-1}\,\frac{w}{2}\log\bigg(1+\frac{\gamma}{w}\Big(B-\sum_{j=1}^{k}\xi^*_j\Big)\bigg)\\ &+ p(1-p)^{w+N}\,\frac{w}{2}\log\bigg(1+\gamma\,\frac{B-\sum_{j=1}^{N}\xi^*_j}{w}\bigg)\\ \ge\ & \mathcal{T}^*_\infty - \sum_{j=N+1}^{\infty} p(1-p)^{j+w-1}\,\frac{1}{2}\log(1+\gamma B)\\ &- \sum_{k=N+1}^{\infty} p^2(1-p)^{k+w-1}\,\frac{w}{2}\log(1+\gamma w B)\\ =\ & \mathcal{T}^*_\infty - \frac{(1-p)^{w+N}}{2}\big[\log(1+\gamma B) + pw\log(1+\gamma w B)\big]\\ =\ & \mathcal{T}^*_\infty - \epsilon_N, \end{aligned} \tag{23}$$

where $\epsilon_N$ is a positive sequence with the property $\lim_{N\to\infty}\epsilon_N = 0$. ∎
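The gap bound in (23) is explicit and decays geometrically in $N$ at rate $1-p$; a one-line helper (with illustrative parameter defaults) makes this visible.

```python
import math

def eps_N(N, B=1.0, p=0.3, w=2, gamma=10.0):
    """epsilon_N from (23): an upper bound on T*_inf - T*_N."""
    return (1 - p)**(w + N) / 2 * (
        math.log2(1 + gamma * B) + p * w * math.log2(1 + gamma * w * B))
```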

In Corollary 2 in the sequel, we will prove that $\mathcal{T}^*_N$ is in fact a strictly increasing function of $N$, which is stronger than Lemma 2, part (b).
The $N$-dimensional optimization problem can be solved by the KKT method. A necessary and sufficient condition for the sequence $\{\xi^{(N)*}_j\}_{j=1}^{N}$ to attain $\mathcal{T}^*_N$ is to satisfy

$$\frac{p(1-p)^{w+j-1}\,\frac{\gamma}{2}}{1+\gamma\xi^{(N)*}_j} - \sum_{k=j}^{N-1}\frac{p^2(1-p)^{k+w-1}\,\frac{\gamma}{2}}{1+\gamma\,\frac{B-\sum_{i=1}^{k}\xi^{(N)*}_i}{w}} - \frac{p(1-p)^{w+N-1}\,\frac{\gamma}{2}}{1+\gamma\,\frac{B-\sum_{i=1}^{N}\xi^{(N)*}_i}{w}} - \lambda^{(N)} + \mu^{(N)}_j = 0, \tag{24a}$$

$$\mu^{(N)}_j\,\xi^{(N)*}_j = 0, \tag{24b}$$

$$\lambda^{(N)}\Big(B-\sum_{i=1}^{N}\xi^{(N)*}_i\Big) = 0, \tag{24c}$$

where $\lambda^{(N)}$ and $\mu^{(N)}_j$, $j \in \{1,\ldots,N\}$, are non-negative real numbers (the Lagrange multipliers).

### IV-B Properties of the Sequence $\{\xi^{(N)*}_j\}_{j=1}^{N}$

In this subsection, we investigate the properties of the sequence $\{\xi^{(N)*}_j\}_{j=1}^{N}$ which achieves $\mathcal{T}^*_N$, based on the KKT conditions (24).

###### Definition 4.

The effective length of an admissible energy sequence is the largest (time) index beyond which the rest of the sequence vanishes. Specifically, $N_{\mathrm{eff}}$ is called the effective length of $\{\xi^{(N)*}_j\}_{j=1}^{N}$ if

$$\xi^{(N)*}_{N_{\mathrm{eff}}} > 0, \tag{25}$$

$$\xi^{(N)*}_j = 0 \quad \text{for all } j > N_{\mathrm{eff}}. \tag{26}$$
###### Lemma 3.

The effective length of the sequence $\{\xi^{(N)*}_j\}_{j=1}^{N}$ is $N_{\mathrm{eff}} = N$.

###### Proof.

First, we show that there exists at least one non-zero element in the sequence $\{\xi^{(N)*}_j\}_{j=1}^{N}$. Assume that this claim is not valid, and the all-zero sequence is the optimal solution satisfying the KKT conditions (24). Thus,

$$\xi^{(N)*}_j = 0 \quad \text{for } j \in \{1,\cdots,N\}, \tag{27}$$

$$\sum_{i=1}^{N}\xi^{(N)*}_i = 0. \tag{28}$$

Hence, $\lambda^{(N)} = 0$ due to (24c). From $\mu^{(N)}_N \ge 0$, (24a) (when $j = N$ is set), and (28), we should have

$$\frac{p(1-p)^{w+N-1}\,\frac{\gamma}{2}}{1+\gamma\xi^{(N)*}_N} - \frac{p(1-p)^{w+N-1}\,\frac{\gamma}{2}}{1+\gamma\frac{B}{w}} \le 0. \tag{29}$$

However, this inequality is only valid when $\xi^{(N)*}_N \ge B/w$, which is positive due to $B > 0$. This contradicts (27). Consequently, the largest (time) index $N_{\mathrm{eff}}$ of a non-zero element ($\xi^{(N)*}_{N_{\mathrm{eff}}} > 0$) exists with property (25).
Second, we prove that $N_{\mathrm{eff}} < N$ is not valid. Suppose $N_{\mathrm{eff}} = J < N$. Define $B' \triangleq \sum_{i=1}^{J}\xi^{(N)*}_i$. Note that we have

$$\lambda^{(N)} \ge 0, \tag{30}$$

$$\mu^{(N)}_J = 0, \tag{31}$$

where the second equality is due to $\xi^{(N)*}_J > 0$ and (24b). First, assume $B' = B$; setting $j = J$ in (24a) gives

$$\lambda^{(N)} = -\,\frac{p(1-p)^{w+J-1}\,\gamma^2\,\xi^{(N)*}_J}{2\big(1+\gamma\xi^{(N)*}_J\big)},$$

which contradicts the fact that $\lambda^{(N)} \ge 0$. Next, consider the case $B' < B$. In this case, $\lambda^{(N)} = 0$ due to (24c). Setting $j = J+1$ in (24a) gives

$$\bigg[p(1-p)^{w+J}\,\frac{\gamma}{2} - \frac{p(1-p)^{w+J}\,\frac{\gamma}{2}}{1+\gamma\,\frac{B-B'}{w}}\bigg] + \mu^{(N)}_{J+1} = 0. \tag{32}$$

However, in (32), the bracket is strictly positive. Hence, the left-hand side of (32) is strictly positive. This leads to a contradiction. Therefore, $N_{\mathrm{eff}} = N$ always holds. ∎

###### Corollary 1.

The last element of the optimal sequence is always non-zero, i.e., $\xi^{(N)*}_N > 0$, and the coefficient $\mu^{(N)}_N = 0$.

###### Proof.

Due to Lemma 3, the largest non-zero element is the $N$-th element of the sequence. $\mu^{(N)}_N = 0$ is due to (24b). ∎

###### Corollary 2.

$\mathcal{T}^*_N$ is a strictly increasing function of $N$.

###### Proof.
$$\mathcal{T}^*_N = \mathcal{T}_N\big(\xi^{(N)*}_1,\cdots,\xi^{(N)*}_N\big) = \mathcal{T}_{N+1}\big(\xi^{(N)*}_1,\cdots,\xi^{(N)*}_N,0\big) < \mathcal{T}^*_{N+1},$$

where the last inequality is due to the fact that $\big(\xi^{(N)*}_1,\cdots,\xi^{(N)*}_N,0\big)$ cannot be the unique maximizer of $\mathcal{T}_{N+1}$, because $\xi^{(N+1)*}_{N+1} > 0$ due to Corollary 1. ∎

###### Lemma 4.

For any $j \in \{1,\ldots,N\}$, we have

$$\xi^{(N)*}_j \le \frac{B-\sum_{i=1}^{j}\xi^{(N)*}_i}{w}, \tag{33}$$

where the inequality holds in the strict sense for $j = N$.

###### Proof.

If $\xi^{(N)*}_j = 0$, the inequality holds because $\sum_{i=1}^{j}\xi^{(N)*}_i \le B$ according to (22b), and the case $j = N$ does not arise due to Corollary 1. Otherwise, if $\xi^{(N)*}_j > 0$, then $\mu^{(N)}_j = 0$ due to (24b). Hence, we can derive the expressions (34)-(37) at the bottom of this page, where (34) is due to (24a) and $\mu^{(N)}_j = 0$, (35) follows from (22b), and (36) holds strictly only if $j = N$, because $\xi^{(N)*}_N > 0$ due to Corollary 1. Therefore, the lemma is concluded from (37). ∎

For any fixed $N$, let us define the parameter

$$B^{(N)} = B - \sum_{i=1}^{N}\xi^{(N)*}_i.$$

From Corollary 1 and Lemma 4, we conclude that

$$B^{(N)} > 0. \tag{38}$$

Also, from (24c) and (38), we conclude that

$$\lambda^{(N)} = 0. \tag{39}$$

In the following lemma, we investigate the behaviour of the sequence $\{\xi^{(N)*}_j\}$ as a function of the time index $j$ when $N$ is fixed.

###### Lemma 5.

$\{\xi^{(N)*}_j\}_{j=1}^{N}$ is a strictly decreasing positive sequence.

###### Proof.

From (24a), for two successive terms $\xi^{(N)*}_j$ and $\xi^{(N)*}_{j+1}$, we have

$$\frac{p(1-p)^{w+j-1}\,\frac{\gamma}{2}}{1+\gamma\xi^{(N)*}_j} - \sum_{k=j}^{N-1}\frac{p^2(1-p)^{k+w-1}\,\frac{\gamma}{2}}{1+\gamma\,\frac{B-\sum_{i=1}^{k}\xi^{(N)*}_i}{w}} - \frac{p(1-p)^{w+N-1}\,\frac{\gamma}{2}}{1+\gamma\,\frac{B-\sum_{i=1}^{N}\xi^{(N)*}_i}{w}} + \mu^{(N)}_j = 0,$$

where $\lambda^{(N)} = 0$ has been substituted from (39).