I Introduction
Supplying the energy required by a communication system through energy harvesting (EH) from natural resources is not only beneficial from an environmental perspective, but also essential for long-lasting, self-sustainable, and affordable telecommunication in places with no electricity infrastructure. On the other hand, EH systems must handle related challenges, such as the varying nature of green energy resources and limited battery storage capacity.
Consider an EH communication system with a transmitter (TX) and a receiver (RX), connected by an additive white Gaussian noise (AWGN) channel.
The TX is equipped with a rechargeable battery of a given storage size and is capable of harvesting the energy arrivals, which are assumed to be independent and identically distributed (i.i.d.). The communication session consists of (discrete) time slots, and the instantaneous rate achieved at each time slot is a function of its allocated energy. A power control policy specifies the energy assignment across the time slots according to the initial battery energy level, the harvested energy sequence observed thus far, and any knowledge of future energy arrivals. The reward associated with each policy is the average throughput over the horizon. The average throughput optimization (ATO) problem aims to determine the optimal policy, i.e., the one that achieves the maximum average throughput.
In the seminal paper [1], the authors studied the ATO problem over a finite horizon for the offline model, where the energy arrivals are non-causally known at the TX, and they derived the optimal policy for this model based on [2]. Analyses of the offline model for more general channels can be found in [3, 4, 5, 6, 7] (and the references therein). In general, the optimal policies for the offline model strive to allocate energy across the time horizon as uniformly as possible while avoiding energy loss due to battery overflow.
In the landmark paper [8], the authors studied the ATO problem over an infinite horizon for the online model, where the energy arrivals are causally known at the TX. They determined the optimal policy for Bernoulli energy arrivals and established the approximate optimality of the fixed-fraction policy for general energy arrivals. Similar results were derived in [9, 10] for a general concave and monotonically increasing utility function. In [11, 12], the authors studied the same problem with a battery of unlimited size and developed three simple online policies that are optimal for the offline case as well.
In this paper, we study the setup where the TX is able to look ahead and observe a window of future energy arrivals of a fixed size. In fact, the online model and the offline model correspond to the extreme cases of a zero-size window and an infinite window, respectively. Therefore, our formulation provides a link between these two models, which have largely been studied in isolation. From a practical perspective, the TX often has a good estimate of the amount of energy available in the near future, because such energy is already harvested but not yet converted into a usable form. We investigate the ATO problem over an infinite horizon for this new setup. Specifically, we focus on Bernoulli energy arrivals and characterize the corresponding optimal policy. The main difference between this optimal policy and that of
[8, 9, 10] is as follows. Under the new policy, if no energy arrival is seen in the lookahead window, the battery always retains some energy for the future while a portion of the available energy is spent in the current time slot. In contrast, under the policies in [8, 9, 10], energy is allocated only to a fixed number of time slots after each battery charge, and no energy is expended beyond that point as the battery becomes depleted. The organization of this paper is as follows. In Section II, we introduce the model and the problem. In Section III, we develop the structure of an optimal policy and justify our problem-solving strategy. In Section IV, we establish the main results and completely characterize the optimal policy. In Section V, we conclude the paper.
II Problem Definitions
The notation of this paper is as follows. $\mathbb{N}$ and $\mathbb{R}^{+}$ represent the set of natural numbers and the set of positive real numbers, respectively. Random variables are denoted by capital letters and their realizations are written in lower-case letters. Functions are denoted by calligraphic font. $\mathbb{E}$ is reserved for expectation. Logarithms are in base 2. Consider a point-to-point AWGN quasi-static fading channel from a TX to an RX, where the channel gain is constant for the entire communication session. The communication is in discrete time: in each time slot, the received signal equals the transmitted signal scaled by the channel gain plus a Gaussian noise with zero mean and unit variance. The TX is capable of harvesting energy from the environment and is equipped with a rechargeable battery of finite size $\bar{B}$. The exogenous (harvested) energy arrivals $\{E_t\}$ are assumed to form an i.i.d. process with a known marginal distribution. In this work, unless specified otherwise, we assume that $E_t$ is Bernoulli, defined as
(1) $E_t = \bar{B}$ with probability $p$, and $E_t = 0$ with probability $1-p$.
Denote the energy level stored in the battery by a random process with a given initial level not exceeding the battery size. If energy arrives at some time instant, the battery is fully charged to its capacity; in this case, any energy previously remaining in the battery overflows and is wasted. If no energy arrives, the battery energy level is not replenished at that time slot.
In this paper, we assume that the TX is able to look ahead with a fixed window size: at each time slot, the realizations of the energy arrivals within the lookahead window are known to the TX.
At each time slot, the TX expends energy as its action, where the amount of energy is determined by a (randomized) action function
(2) 
and gains the corresponding throughput as the reward of that time slot. Then, the battery energy level becomes
(3) 
A lookahead policy is characterized by a sequence of action functions. For a fixed communication session length, a policy gains the average (expected) throughput over the horizon
(4) 
as its associated reward, where the expectation is taken over all energy arrival sequences.
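To make the average throughput objective concrete, the following minimal sketch simulates one sample path and averages the per-slot rewards. It assumes, purely for illustration, Bernoulli arrivals that fully recharge the battery, unit channel gain and noise variance (so a slot spending energy g earns 0.5·log2(1+g) bits), and a policy that sees only the current battery level; the function name and parameters are ours, not the paper's.

```python
import math
import random

def average_throughput(policy, horizon=10_000, capacity=1.0, p=0.5, seed=0):
    """Estimate the average throughput of `policy` over a finite horizon."""
    rng = random.Random(seed)
    battery = capacity                     # initial battery level
    total = 0.0
    for _ in range(horizon):
        g = min(policy(battery), battery)  # energy spent this slot
        battery -= g
        total += 0.5 * math.log2(1.0 + g)  # per-slot reward in bits
        if rng.random() < p:               # Bernoulli arrival: full recharge
            battery = capacity
    return total / horizon

# A greedy policy that spends the whole battery immediately.
greedy_rate = average_throughput(lambda b: b)
```

Comparing such estimates across candidate policies gives a quick numerical feel for the ATO problem before any analysis.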
Definition 1.
The largest average reward (channel throughput) over the infinite horizon (long term) is defined as
(5) 
If the supremum is attained by a policy, that policy is called optimal.
Remark 1.
In this paper, we seek the largest average reward and the corresponding optimal policy according to Definition 1.
III Problem Solving Strategy
First, assume that the distribution of the harvested energy is arbitrary. In general, the largest average reward depends on this distribution. According to [13, 14], there is no loss of optimality in (5) if the supremum is taken over deterministic Markovian stationary policies that rely only on the system state
(6) 
Indeed, (5) is attainable by an optimal stationary policy. Given the system state with its finite-length lookahead window, knowledge of further energy arrivals does not enhance the achievable reward. Note that the action is determined not only by the current energy level, but can also be affected by the observed future energy arrivals within the lookahead window. As the optimal policy is Markovian and stationary, the action function (2) can be simplified to the time-invariant function
(7) 
Now, focus on the Bernoulli distribution defined in (1). In this case, the state (6) can be simplified as follows. Let the random variable $D_t$ be the time distance to the earliest energy arrival located inside the lookahead window; specifically, $D_t$ is the smallest $d$ such that an arrival occurs $d$ slots ahead of time $t$ within the window, and $D_t$ exceeds the window size if the window contains no arrival. For any given energy arrival sequence and battery level at time $t$, if an energy arrival is observed at distance $D_t = d$, then the optimal policy uniformly assigns the instantaneous battery energy to the following $d$ time slots; this is due to the concavity of the reward function (4). Otherwise, the action at time $t$ is a quantity to be determined in the sequel. Therefore, the action function (7) of the stationary optimal policy is given by
(8) 
From (8), we conclude that the action (and hence the associated reward) is uniquely determined by the current battery level and the observed arrival distance. Therefore, the system state for Bernoulli energy arrivals can be reduced to
(9) 
A nonnegative sequence of a given length is called admissible if it satisfies the battery constraint, i.e., its entries sum to at most the battery capacity. Define the admissible sequence associated with a stationary policy by
(10)
where the entries record the actions taken in successive time slots after a battery charge. Due to (8), if the battery is charged up at some time but no arrival is observed afterwards within the lookahead window, the policy expends the entries of this sequence in the subsequent time slots. In the sequel, this sequence and its properties are investigated. Once it is determined, the policy follows Algorithm 1.
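The two-branch structure just described can be sketched as a single action function. This is an illustrative stand-in, not the paper's Algorithm 1; the `fallback` argument is a placeholder for the optimal admissible sequence characterized later, and the interface is our own.

```python
def lookahead_action(battery, next_arrival, window, fallback, slots_since_charge):
    """One-step action of a stationary lookahead policy (illustrative sketch).

    next_arrival: time distance to the earliest arrival visible in the window,
    or None if no arrival is visible.  fallback: a hypothetical admissible
    sequence x[0], x[1], ... followed while no arrival is in sight.
    """
    if next_arrival is not None and next_arrival <= window:
        # An arrival is visible d slots ahead: spread the current battery
        # content uniformly over those d slots (concavity of the reward).
        return battery / next_arrival
    # No arrival visible: follow the predetermined admissible sequence,
    # indexed by the number of slots elapsed since the last battery charge.
    k = slots_since_charge
    return min(battery, fallback[k] if k < len(fallback) else 0.0)
```

For example, with a full unit battery and an arrival visible two slots ahead, the action is 0.5 regardless of the fallback sequence.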
The time slots between two consecutive battery charges form a cycle: a cycle starts at a time when the battery is charged up and ends just before the next charge. Let the random variable $L$ denote the duration of a cycle. Due to (1), the distribution of $L$ is geometric, i.e., $\Pr[L=\ell]=p(1-p)^{\ell-1}$ for $\ell\in\mathbb{N}$, with mean
(11) $\mathbb{E}[L]=1/p.$
If a stationary policy is employed, the battery-level, action, and reward processes are non-delayed regenerative processes [15, Section 7.5] with cycles of duration $L$: when the battery charges up at a cycle start time, the memories of these processes are reset. Consequently, the behavior within a cycle does not statistically depend on previous cycles or on the cycle start time.
As the mean cycle duration in (11) and the per-slot rewards are bounded, and the reward process is a non-delayed regenerative process, the renewal reward theorem [15, Section 7.4] can be utilized to simplify the long-term average throughput achieved by a stationary policy:
(12)  
where (12) is due to (11), and the conditional quantity appearing in (12) is the energy assigned to each time slot of a cycle conditioned on the cycle duration. The following definition is helpful to calculate (12).
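The cycle decomposition in (12) also suggests a direct numerical check: simulate i.i.d. geometric cycles and divide the mean cycle reward by the mean cycle length 1/p. The sketch below assumes full-recharge Bernoulli(p) arrivals and a per-slot rate of 0.5·log2(1+g); the per-cycle `policy(battery, t)` interface is ours, not the paper's.

```python
import math
import random

def renewal_reward_estimate(policy, p=0.5, capacity=1.0, n_cycles=2_000, seed=1):
    """Estimate long-term throughput as E[cycle reward] / E[L], with E[L] = 1/p."""
    rng = random.Random(seed)
    total_reward = 0.0
    for _ in range(n_cycles):
        battery = capacity
        length = 1                         # geometric cycle duration
        while rng.random() >= p:
            length += 1
        for t in range(length):            # t indexes slots within the cycle
            g = min(policy(battery, t), battery)
            battery -= g
            total_reward += 0.5 * math.log2(1.0 + g)
    return (total_reward / n_cycles) * p   # divide by E[L] = 1/p
```

For instance, with capacity 1, a policy that dumps the whole battery in the first slot of each cycle earns exactly 0.5 bits per cycle, hence 0.5·p bits per slot.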
Definition 2.
Let an admissible sequence be given. Define
(13) 
Lemma 1.
The long-term average throughput (4) of an optimal policy with its associated admissible sequence satisfies
(14) 
Proof.
The proof follows from Definition 1, (12), and Definition 2. First, the energy assignments in (12) are determined in the following. Assume a new cycle starts at some time, so the battery energy level equals the battery capacity. Given the cycle duration, the following two cases can be considered.

Case 1 (short cycle): the arrival is observed in the lookahead window from the start of the cycle. Hence, according to (8), we have
(15) 
Case 2 (long cycle): the arrival is initially beyond the lookahead window. In this case, the TX allocates the entries of the associated admissible sequence to the first time slots of the cycle; then, as soon as the arrival becomes observable (the lookahead window covers its time), it uniformly distributes the remaining battery energy over the remaining time slots of the cycle due to (8).
(16)
Finally, the proof is concluded from the following calculation of (12) based on (15) and (16).
∎
IV Properties of the Optimal Policy
In this section, we characterize the energy sequence and its properties. The main result of this paper is as follows.
Theorem 1.
Define the optimal value as the supremum over all admissible sequences. Then, the maximum long-term average reward (channel throughput) of the lookahead model is given by
(18) 
Moreover, the sequence induced by the optimal policy is the unique maximizer, and it is the unique sequence satisfying
(19a)  
(19b) 
It is also a strictly decreasing positive sequence with the property
(20) 
Remark 2.
The optimal average throughput of the studied model coincides with the optimal average throughput of the non-causal (offline) model when the window size tends to infinity in (18). In this case, the optimal average throughput admits a closed-form expression.
In Fig. 1, we illustrate the (long-term) average throughput as a function of the window size using (18) for the given system parameters. Although the optimal throughput is an increasing function of the window size, Fig. 1 shows that even a moderate window size brings this communication system within a small gap of the offline optimal average throughput. The rest of this section is devoted to the proof of Theorem 1.
The proof of (18) is as follows. As mentioned, there exists a stationary policy that attains the supremum [13, 14]. That policy also satisfies (14), in which the admissible sequence is the one associated with the policy. As the policy is optimal, its sequence must maximize the right-hand side of (14) over all admissible sequences. Since the objective is a strictly concave function, the maximizing sequence is unique, and this unique maximizer indeed achieves the optimum. Hence, (18) is justified. To investigate the unique maximizer, we define the corresponding finite-dimensional optimization problem in the following subsection so as to employ the Karush-Kuhn-Tucker (KKT) conditions. Then, we investigate the relation between the finite-dimensional and the infinite-dimensional optimization problems.
IV-A The Finite-Dimensional Optimization Problem
We define the following finite-dimensional optimization problem based on Definition 2.
Definition 3.
Fix a finite dimension. Recall Definition 2, and restrict attention to sequences whose entries vanish beyond that dimension. Define
The finite-dimensional optimization problem is defined as
(21) 
where the supremum is over all sequences subject to
(22a)  
(22b) 
The maximizer of the finite-dimensional problem, if it exists, is denoted accordingly.
Lemma 2.
The following statements are valid for the finite-dimensional problems.

(a) There exists a unique maximizer of each finite-dimensional problem.

(b) The optimal value is an increasing function of the dimension.

(c) The limit of the optimal values, as the dimension tends to infinity, exists and is finite.
Proof.
Part (a) follows from the fact that the objective is a continuous, bounded, concave function defined on a compact set. Part (b) is due to the fact that the feasible domain of each problem is a subset of the feasible domain of the next. Part (c) follows from the fact that the objective is continuous and bounded and the sequence of optimal values is increasing, and thus the limit exists and is finite. Indeed, the following inequality proves the claimed limit via the squeeze theorem.
(23) 
where the bound involves a positive number with the required property. ∎
In Corollary 2 below, we will prove that the optimal value is in fact a strictly increasing function of the dimension, which is stronger than part (b) of Lemma 2.
The finite-dimensional optimization problem can be solved by the KKT method. A necessary and sufficient condition for a sequence to attain the optimum is to satisfy
(24a)  
(24b)  
(24c) 
where the Lagrange multipliers are nonnegative real numbers.
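To see the KKT machinery in action, consider a hypothetical simplified instance of the finite-dimensional problem: take the objective to be the separable concave function Σ_t (1−p)^(t−1) · 0.5·log2(1+x_t) (each slot t of a cycle is reached with probability (1−p)^(t−1)), subject to x_t ≥ 0 and a total-energy budget. This ignores the lookahead correction terms and is not the paper's exact objective (13), but the stationarity and complementary slackness conditions then reduce to a water-filling solution computable by bisection on the multiplier:

```python
def kkt_waterfilling(p=0.5, budget=1.0, n=10, tol=1e-10):
    """Solve the simplified instance via its KKT conditions.

    Stationarity (with constants absorbed into lam) gives
        w_t / (1 + x_t) = lam   whenever x_t > 0,   w_t = (1 - p)**(t - 1),
    i.e. x_t = max(0, w_t / lam - 1); complementary slackness pins lam down
    by making the budget constraint tight.  lam is found by bisection.
    """
    w = [(1 - p) ** t for t in range(n)]

    def alloc(lam):
        return [max(0.0, wt / lam - 1.0) for wt in w]

    lo, hi = 1e-12, w[0]        # alloc(lo) overspends, alloc(hi) spends nothing
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(alloc(mid)) > budget:
            lo = mid            # spending too much: raise the multiplier
        else:
            hi = mid
    return alloc(hi)
```

The resulting allocation is nonnegative, non-increasing in the slot index, and exhausts the budget, mirroring the monotonicity properties established for the true optimal sequence in this section.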
IV-B Properties of the Maximizing Sequence
In this subsection, we investigate the properties of the sequence that achieves the finite-dimensional optimum, based on the KKT conditions (24).
Definition 4.
The effective length of an admissible energy sequence is the largest (time) index beyond which the rest of the sequence vanishes. Specifically, an index is called the effective length of a sequence if
(25)  
(26) 
Lemma 3.
The effective length of the maximizing sequence equals the dimension of the problem.
Proof.
First, we show that there exists at least one nonzero element in the maximizing sequence. Assume that this claim is not valid, so that the all-zero sequence is the optimal solution satisfying the KKT conditions (24). Thus,
(27)  
(28) 
Hence, the corresponding multipliers vanish due to (24c). From the all-zero assumption, (24a) evaluated at the first index, and (28), we should have
(29) 
However, (29) can hold only under a condition that contradicts (27).
Consequently, the largest (time) index of a nonzero element exists and satisfies property (25).
Second, we prove that the effective length cannot be strictly smaller than the dimension. Suppose the contrary. Note that we have
(30)  
(31) 
where the second inequality is due to the admissibility constraints and (24b). First, assume the relevant multiplier is positive; evaluating (24a) at the corresponding index gives an expression that contradicts the definition of the effective length. Next, consider the complementary case. Evaluating (24a) at the corresponding index gives
(32) 
However, the bracketed term in (32) is strictly positive, so the left-hand side of (32) is strictly positive. This leads to a contradiction. Therefore, the effective length always equals the dimension. ∎
Corollary 1.
The last element of the maximizing sequence is always nonzero, and the associated KKT multiplier vanishes.
Proof.
Corollary 2.
The optimal value of the finite-dimensional problem is a strictly increasing function of the dimension.
Proof.
where the last inequality is due to the fact that the lower-dimensional maximizer, padded with zeros, cannot be the maximizer of the higher-dimensional problem, since its last element would vanish, contradicting Corollary 1. ∎
Lemma 4.
For every index, we have
(33) 
where the inequality holds in the strict sense except in the boundary case.
Proof.
If the entry vanishes, the inequality holds by (22b) and Corollary 1. Otherwise, if the entry is positive, the associated multiplier vanishes due to (24b). Hence, we can derive expressions (34)-(37) at the bottom of this page, where (34) is due to (24a), (35) follows from (22b), and (36) holds strictly only in the boundary case because of Corollary 1. Therefore, the lemma is concluded from (37). ∎
(34)  
(35)  
(36)  
(37) 
For any fixed dimension, define the parameter
From Corollary 1 and Lemma 4, we conclude that
(38) 
Also, from (24c) and (38), we conclude that
(39) 
In the following lemma, we investigate the behaviour of the maximizing sequence as a function of the time index when the dimension is fixed.
Lemma 5.
The maximizing sequence is a strictly decreasing positive sequence.
Proof.
From (24a), for two successive terms of the sequence, we have