I Introduction
Recent advances in energy harvesting technologies have enabled the development of self-sustainable wireless communication systems that are powered by renewable energy sources in the environment. An important research topic in energy harvesting communications is the design of power control policies that maximize throughput or other rewards under random energy availability (see, e.g., [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]). Although offline power control is by now well investigated, our understanding of online power control remains quite limited. This situation can be largely attributed to the technical differences between these two control problems. For offline power control, since the realization of the whole energy arrival process is known in advance, the underlying distribution is irrelevant as far as policy design is concerned; it enters the picture only in the evaluation of the expected reward, where different realizations need to be weighted according to their respective probabilities. In contrast, for online power control, one has to take into account the distribution of the energy arrival process due to the uncertainty of future energy arrivals. Indeed, this fact can also be seen from the implicit characterization of the optimal online power control policy based on the Bellman equation, which involves the energy arrival distribution in an essential way. Due to its distribution-dependent nature, the Bellman equation is often very difficult to solve exactly. To the best of our knowledge, the general analytical solution to the Bellman equation has only been found in the low battery-capacity regime, where the greedy policy is shown to be optimal. Even in that case, the so-called low battery-capacity regime varies from one energy arrival distribution to another. More generally, for any nondegenerate reward function, there is no online power control policy that is universally optimal for all energy arrival distributions, which should be contrasted with offline power control, where universality comes for free for the aforementioned reason. It is worth mentioning that the requirement of precise knowledge of the energy arrival distribution not only complicates the characterization of the optimal online power control policy but also, in a certain sense, diminishes the importance of such a policy, since the needed knowledge is typically not available in practice.
Fortunately, as demonstrated by Shaviv and Özgür in their remarkable work [14], it is possible to break the deadlock by weakening the notions of optimality and universality. Specifically, they proposed a fixed fraction policy, which only requires the knowledge of the (effective) mean of the energy arrival process, and established its universal near-optimality in terms of the achievable throughput over an additive white Gaussian noise (AWGN) channel (see also [15] for an extended version of this result for more general reward functions). At the heart of their argument is a worst-case performance analysis of the fixed fraction policy, which shows that among all energy arrival processes of the same (effective) mean, the Bernoulli process induces the minimum throughput for the fixed fraction policy; the aforementioned near-optimality result then follows directly from the fact that this minimum throughput is within both constant additive and multiplicative gaps from a simple universal upper bound. Their finding naturally raises the question of whether it is possible to find an online power control policy with improved worst-case performance as compared to the fixed fraction policy or, better still, a policy with the best worst-case performance. In this work, we provide an affirmative answer to this question by constructing an online power control policy that is maximin optimal in the following sense: this policy achieves the maximum asymptotic expected average reward for the Bernoulli energy arrival process of any (effective) mean, while among all energy arrival processes of the same (effective) mean, the Bernoulli process induces the minimum asymptotic expected average reward for this policy. To this end, two major obstacles need to be overcome. First of all, the optimal online power control policy for the Bernoulli energy arrival process of a given (effective) mean is uniquely defined only for some discrete battery energy levels; however, under the maximin formulation, it is essential to extend the support of this policy to cover all possible battery energy levels, and a judicious construction is needed to ensure that the interpolated policy has the desired properties and at the same time is amenable to analysis. The second obstacle lies in the worst-case performance analysis of the interpolated policy. In contrast to the fixed fraction policy, for which a basic convexity/concavity argument suffices due to its linearity, the interpolated policy requires more delicate reasoning for establishing Bernoulli arrivals as the least favorable form of energy arrivals. It will be seen that these two obstacles are intertwined, and we will address them by developing a maximin theory based on detailed investigations of some general families of online power control policies. From a mathematical perspective, our work can also be viewed as saddle-point analysis in a functional space. Note that even for finite-dimensional minimax/maximin games, one often relies on fixed-point theorems to prove the existence of saddle-point solutions. It is thus somewhat surprising that the saddle-point solution of the specific functional game under consideration admits an explicit characterization. In this sense, our work is of inherent theoretical interest.
The rest of this paper is organized as follows. In Sec. II, we formulate the problem and introduce the main results of this paper. A maximin theory of online power control for discrete-time battery-limited energy harvesting communications is developed in Sec. III. We conclude the paper in Sec. IV. The proofs of Theorem 20 and most propositions, as well as some auxiliary results, are given in the appendices.
Throughout the paper, the base of the logarithm function is . The maximum and the minimum of and are denoted by and , respectively. The Borel field generated by the topology on a metric space is denoted by . The fold composition of a function for some subset of is denoted by with the convention . An empty sum and an empty product are defined to be and , respectively.
II Problem Formulation and Main Results
Consider a discrete-time energy harvesting communication system equipped with a battery of capacity . We denote by the amount of energy harvested at time . An online power control policy is a family of mappings specifying the energy consumed in time slot based on . Let and denote the amounts of energy stored in the battery at the beginning of time slot before and after the arrival of energy , respectively. They satisfy
(1a)  
(1b) 
It is assumed that .
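As an illustration, the battery recursion can be sketched in Python. This is only a minimal sketch under stated assumptions: the update is taken to be the usual clipped recursion (the arriving energy is added and clipped at the capacity, then the consumed energy is subtracted), the policy is taken to be stationary, and the names `battery_step`, `run`, and the default initial level `b0 = 0.0` are hypothetical and not part of the original formulation.

```python
def battery_step(b_before, e, omega, c):
    """One time slot of a clipped battery recursion (assumed form of (1)).

    b_before: energy stored before the arrival in this slot
    e:        energy harvested in this slot
    omega:    stationary policy mapping the post-arrival level to a consumption
    c:        battery capacity
    """
    # Energy available after the arrival, clipped at the capacity c.
    b_after = min(b_before + e, c)
    # Energy the policy spends in this slot.
    w = omega(b_after)
    # Admissibility: never spend more than what is stored.
    if not 0.0 <= w <= b_after:
        raise ValueError("policy is not admissible at this battery level")
    # Post-arrival level, consumption, and energy carried to the next slot.
    return b_after, w, b_after - w


def run(omega, arrivals, c, b0=0.0):
    """Apply a stationary policy to a sequence of arrivals; return consumptions."""
    b, spent = b0, []
    for e in arrivals:
        _, w, b = battery_step(b, e, omega, c)
        spent.append(w)
    return spent
```

For example, with the greedy policy `lambda x: x` and capacity 2, the arrivals `[1.0, 0.0, 3.0]` yield the consumptions `[1.0, 0.0, 2.0]`: the third arrival overflows the battery and one unit of energy is lost.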
A policy is said to be admissible if
The collection of all admissible policies is denoted by . For , if depends on only through and is time invariant, we say that is stationary and identify it by a mapping satisfying for all such that , where is understood as a function of by (1). The set of all (admissible) stationary policies is denoted by . In the sequel, when we write a stationary policy , it may be understood as a mapping , a policy , or a partial policy , and so on, depending on the context.
The energy is consumed to perform some task in time slot , from which a reward is obtained.
Definition 1
A reward function is a nondecreasing, Lipschitz, and concave function from to with .
Definition 2
A reward function is said to be regular if it is strictly concave and differentiable and the function
is convex for all (satisfying , which is in fact unnecessary by Proposition 19), where
One example of interest is the throughput over an AWGN channel. In this case, the reward is the information rate in time slot given by
(2) 
with being the channel coefficient.
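For concreteness, the AWGN reward and its marginal utility can be sketched as follows. The sketch assumes the standard form r(x) = (1/2) log(1 + γx) with the natural logarithm, where γ stands for the channel coefficient; this form, the symbol γ, and the function names are assumptions made for illustration and should be checked against (2).

```python
import math


def awgn_reward(x, gamma=1.0):
    """Assumed AWGN reward r(x) = 0.5 * log(1 + gamma * x), in nats."""
    return 0.5 * math.log(1.0 + gamma * x)


def awgn_marginal(x, gamma=1.0):
    """Derivative r'(x) = gamma / (2 * (1 + gamma * x)) of the assumed reward."""
    return gamma / (2.0 * (1.0 + gamma * x))
```

Under this assumption the reward vanishes at zero, is increasing and concave, and has a strictly decreasing derivative, which is the kind of behavior required of a reward function in Definition 1.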
Thus the horizon total reward of partial policy with respect to energy arrivals and the initial battery energy level is
where , , and , with and satisfying (1). The corresponding horizon average reward is
Suppose now that the energy harvested at each time is a random variable , and consequently the whole sequence of energy arrivals forms a random process. Correspondingly, the energy variables , , and become the random variables , , and , respectively. The horizon expected total reward is then given by
and the horizon expected average reward is
The asymptotic expected average reward of policy with respect to energy arrivals is thus defined as
Since the above three quantities depend only on the (probability) distributions of or , we can also write their associated distributions in place of or . For example, we may write , where denotes the distribution of an i.i.d. process with marginal distribution .
We are interested in characterizing online power control policies that maximize the asymptotic expected average reward in the worst case of a given family of energy arrival distributions. To this end, we introduce a maximin formulation.
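The expected average reward of a stationary policy can be estimated by simulation. The following is a rough Monte Carlo sketch, not a definitive implementation: it assumes the clipped battery recursion (add the arrival, clip at the capacity, subtract the consumption), i.i.d. arrivals, and an empty battery at the start. The names `average_reward` and `bernoulli_arrival` are hypothetical, and the Bernoulli arrival is assumed to deliver a full battery charge with probability p.

```python
import math
import random


def bernoulli_arrival(p, c):
    """Return a sampler that yields energy c with probability p and 0 otherwise."""
    return lambda rng: c if rng.random() < p else 0.0


def average_reward(omega, reward, sample_arrival, c, horizon, rng, b0=0.0):
    """Empirical average reward of a stationary policy over `horizon` slots."""
    b, total = b0, 0.0
    for _ in range(horizon):
        x = min(b + sample_arrival(rng), c)  # battery level after the arrival
        w = omega(x)                         # energy consumed in this slot
        total += reward(w)
        b = x - w                            # energy carried to the next slot
    return total / horizon


if __name__ == "__main__":
    rng = random.Random(0)
    r = lambda w: 0.5 * math.log(1.0 + w)    # assumed AWGN reward, gamma = 1
    greedy = lambda x: x
    print(average_reward(greedy, r, bernoulli_arrival(0.3, 2.0), 2.0, 20000, rng))
```

For the greedy policy under such Bernoulli arrivals, the estimate should be close to p times the reward of a full battery, since energy is spent only in the slots where an arrival occurs.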
Definition 3
The mean-to-capacity ratio (MCR) of a probability measure on is defined by
where is the (effective) mean of .
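A small numerical illustration of the MCR for a discrete distribution is given below. It assumes that the effective mean is the mean of the arrival clipped at the capacity, as in [14]; this reading, the list-of-pairs representation of the distribution, and the names `effective_mean` and `mcr` are assumptions made for illustration.

```python
def effective_mean(pmf, c):
    """Mean of min(E, c) for a discrete distribution given as (value, prob) pairs."""
    if abs(sum(prob for _, prob in pmf) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to one")
    # Energy in excess of the capacity cannot be stored, so it is clipped.
    return sum(prob * min(value, c) for value, prob in pmf)


def mcr(pmf, c):
    """Mean-to-capacity ratio: effective mean divided by the capacity."""
    return effective_mean(pmf, c) / c
```

With this reading, a Bernoulli arrival that delivers a full charge with probability 0.3, `mcr([(0.0, 0.7), (2.0, 0.3)], 2.0)`, has an MCR of 0.3 for any capacity, whereas an arrival of size 10 with probability 0.5 has an MCR of 0.5 at capacity 2 because the excess energy is discarded.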
Definition 4
Let consist of all probability measures on with , where . An online power control policy is said to be maximin optimal for if
The main result of this paper is summarized as follows.
III A Maximin Theory of Online Power Control for Energy Harvesting Communications
In order to find the maximin optimal online power control policy, we adopt the following approach:

1) Find a distribution that is the least favorable one in when a policy in some special subset of or is employed.
2) Construct a policy that is optimal for and is an element of .
The rationale underlying this approach is best explained by the following self-evident result.
Proposition 6
Let . If there is a distribution such that
then is maximin optimal.
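The reasoning behind this result can be spelled out as a short chain of inequalities. The notation below is hypothetical and introduced only for illustration: R(ω, Q) stands for the asymptotic expected average reward of policy ω under distribution Q, 𝒬 for the family of distributions, Ω for the set of admissible policies, and the omitted condition is assumed to be the usual saddle-point condition.

```latex
% Assumed saddle-point condition for the pair (\omega^*, Q^*):
%   R(\omega^*, Q^*) = \sup_{\omega \in \Omega} R(\omega, Q^*)
%                    = \inf_{Q \in \mathcal{Q}} R(\omega^*, Q).
% Then, for every admissible policy \omega,
\[
  \inf_{Q \in \mathcal{Q}} R(\omega, Q)
  \;\le\; R(\omega, Q^*)
  \;\le\; R(\omega^*, Q^*)
  \;=\; \inf_{Q \in \mathcal{Q}} R(\omega^*, Q),
\]
% so no policy has a better worst case than \omega^*.
```

The first inequality holds because Q* belongs to the family, the second because ω* is optimal for Q*, and the final equality because Q* is least favorable for ω*.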
III-A Normal Stationary Policies and the Least Favorable Distribution
In this subsection, we will study a special family of policies called normal (stationary) policies. We will show that, for any , the Bernoulli distribution is the least favorable one in as long as is not below a certain threshold depending on .
Definition 7
For each (stationary) policy , let be its associated policy induced by the complement operation:
Note that .
Policy may be called a (stationary) reserve policy because it specifies the amount of energy reserved for future use.
Definition 8
A policy is said to be normal if it is nondecreasing and concave. The set of all normal policies is denoted by .
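Normality can be checked numerically on a grid, at least as a sanity test. The sketch below is a heuristic check rather than a proof: it tests monotonicity and concavity only at finitely many points, and it assumes that the reserve policy is the stored energy minus the consumed energy. The names `is_normal_on_grid` and `reserve` are hypothetical.

```python
def reserve(omega):
    """Assumed complement of a policy: energy kept in the battery, x - omega(x)."""
    return lambda x: x - omega(x)


def is_normal_on_grid(omega, c, n=1000, tol=1e-12):
    """Check that omega is nondecreasing and concave on a uniform grid of [0, c]."""
    ys = [omega(c * k / n) for k in range(n + 1)]
    nondecreasing = all(ys[k + 1] >= ys[k] - tol for k in range(n))
    # On a uniform grid, concavity amounts to nonpositive second differences.
    concave = all(ys[k - 1] - 2.0 * ys[k] + ys[k + 1] <= tol for k in range(1, n))
    return nondecreasing and concave
```

For instance, a linear policy such as `lambda x: 0.3 * x` and the capped policy `lambda x: min(x, 0.5)` pass the check, whereas the convex map `lambda x: x * x` does not.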
Proposition 9
A normal policy satisfies:

.

is nondecreasing and convex on .

Both and are Lipschitz on .

Both and are differentiable at all but at most countable points of , and and are nonincreasing and nondecreasing, respectively, on their domains of definition. Moreover, both and are between and whenever they exist.
The next theorem shows that the Bernoulli distribution is the least favorable one for normal policies under certain mild conditions.
Theorem 10
For a normal policy , if
(3) 
for almost every , then
for all and , where
(4) 
and
Proof:
Let and be two random sequences of energy arrivals such that and , respectively. Let
and
where and .
Note that and are both nondecreasing, concave, and Lipschitz (Definitions 1 and 8 and Proposition 9). Let
where . It is clear that is concave in for fixed . Recall that, for any concave functions and , is concave if is nondecreasing (Proposition 33). So a function such as is concave in for fixed .
Now we will show that for all . Note that
([14, Lemma 2])  
and
moreover,
and is nondecreasing, Lipschitz, and concave on .
We proceed by induction on in the reverse order. Suppose that
(5a)  
(5b) 
and is nondecreasing, Lipschitz, and concave on . It follows that
and
which, together with Eq. (3), implies
where
is nondecreasing, Lipschitz, and concave on (Lemma 29 with (5b)), and so is . Therefore, for all , and in particular for .
Remark 11
In essence, condition (3) compares the marginal utilities of two energy consumptions specified by policy : one in the current time slot and the other in the next time slot if there is no new energy arrival, assuming that the distribution of energy arrivals is Bernoulli. The marginal utilities of these two energy consumptions are
respectively. When condition (3) is met, can be considered, in a certain sense, nongreedy for .
Motivated by this observation, we introduce the following definitions.
Definition 12
A universal stationary policy is a mapping from to satisfying . The set of all universal stationary policies is denoted by . A universal stationary policy is said to be normal if it is nondecreasing and concave. The set of all universal normal (stationary) policies is denoted by .
Note that any universal stationary policy can be regarded as a stationary policy in by considering its restriction on .
Definition 13
For any universal stationary policy , the asymptotic expected average reward of with respect to the Bernoulli energy arrival distribution is a function of capacity , which is denoted by , or more succinctly, , when is fixed and is clear from the context.
Definition 14
The greed index of a stationary policy is defined by
The universal greed index of a universal stationary policy is defined by
Definition 15
A stationary policy is said to be nongreedy for if it satisfies (3).
Using the concept of nongreediness, Theorem 10 can be restated as follows: if a normal policy is nongreedy for , then its least favorable distribution in is Bernoulli.
Remark 16
Theorem 10 only provides a sufficient condition, so it does not cover all possible policies for which the least favorable distribution is Bernoulli (e.g., the greedy policy for sufficiently large but fixed ). However, since condition (3) coincides in part with the optimality condition for the Bernoulli distribution (see (12) with and ), any policy completely violating (3) cannot be an optimal policy for the Bernoulli distribution, and hence is not maximin optimal even if its least favorable distribution is Bernoulli.
We end this subsection with some properties of the greed index and the universal greed index. Their proofs are simple and hence left to the reader.
Proposition 17
Let .

.

Policy is nongreedy for if .
Proposition 18
Let .

.

is nondecreasing in .

.

Policy is nongreedy for if .
III-B An Optimal Policy for Bernoulli Energy Arrivals
In this subsection we will construct an optimal policy for Bernoulli energy arrivals that is normal and nongreedy for , and consequently is maximin optimal. In order to achieve this goal, the reward function is required to be regular (Definition 2). This property further implies the following fact.
Proposition 19
A regular reward function is strictly increasing and continuously differentiable. Its derivative is strictly decreasing and satisfies .
From now on, we will assume that is regular. Under this assumption, we can construct an explicit optimal stationary policy for Bernoulli energy arrivals.
Theorem 20 (cf. [14, Th. 1] and [15, Sec. IIA])
A stationary policy is optimal for i.i.d. energy arrivals with the Bernoulli distribution iff it satisfies
(6) 
for all
(7) 
where
(8) 
From the proof of Theorem 20, we can see that the value of for has no impact on the asymptotic expected average reward for Bernoulli arrivals. However, this is not necessarily the case for other energy arrival distributions. To construct a universal stationary policy with maximin optimal performance, we consider the natural extension of (6) from to . The resulting policy is analyzed with the aid of the following functions.
Definition 21
The extension of is defined by
where .
Definition 22
Let
where denotes the fold composition of with the convention .
The following propositions summarize some important properties of and .
Proposition 23
The function has the following properties:

for .

is continuous, nondecreasing, and convex.

, where and .

The least nonnegative integer such that is
which is a generalization of (8) (the latter corresponds to the special case ).
Proposition 24
The function is continuous, strictly increasing, and convex, and for all . In particular,
where
A straightforward consequence of these properties is the next theorem, which shows that is normal and nongreedy.
Theorem 25
The stationary policy has the following properties:

1) Policy is strictly increasing, concave, and consequently normal.
2) $\omega(\bar{\omega}^{(i-1)}(x)) = \bar{\kappa}_{1/(1-p)}^{(i-1)}(\omega(x))$, and consequently is nongreedy for .
Proof:
1) It is clear that (Theorem 20 and Proposition 24), which implies due to the invertibility of (Proposition 24). Moreover, is strictly increasing and concave (Propositions 24 and 34). Therefore, is normal.
2) Since ,
which implies that . Repeatedly applying this identity, we have , which is zero if (Proposition 23).
Theorem 26
Suppose that is regular. The stationary policy is maximin optimal for and
where is the Bernoulli distribution defined by (4).
In particular, for the special reward function given by (2), we have the following maximin optimal policy.
Theorem 27 (cf. [14, Th. 1])
Suppose that is given by (2). The policy
(9) 
is maximin optimal, where is the least integer satisfying
Proof:
With no loss of generality, we assume . Note that
and
We have
and consequently
where . It is easy to see that is regular.
By Proposition 23, it is easy to see that is a piecewise linear function, with the endpoints of line segments given by
(10)  
for . Policy is plotted in Fig. 1 for and . For comparison, the fixed fraction policy
and the greedy policy are also plotted in Fig. 1. It is observed from (10) and Fig. 1 that coincides with when . It is also observed that as .
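To make the comparison concrete, the optimal consumption levels for Bernoulli arrivals at the discrete battery levels visited after a full recharge can be computed by water-filling, in the spirit of [14, Th. 1]. The sketch below rests on several assumptions: the reward is r(x) = (1/2) log(1 + γx), each Bernoulli arrival fully recharges the battery, and the long-run average reward is obtained from the renewal structure as p Σ (1-p)^i r(w_i). The function names are hypothetical, and the code is not claimed to reproduce (9) or (10) line by line.

```python
import math


def bernoulli_waterfilling(p, c, gamma=1.0):
    """Consumptions w_0 >= w_1 >= ... in the slots following a full recharge.

    Maximizes sum_i (1-p)**i * 0.5*log(1 + gamma*w_i) subject to sum_i w_i = c.
    """
    if not (0.0 < p < 1.0 and c > 0.0 and gamma > 0.0):
        raise ValueError("need 0 < p < 1, c > 0, gamma > 0")
    q = 1.0 - p

    def level(n):
        # Water level with exactly n active slots:
        # sum_{i<n} (level * q**i - 1/gamma) = c.
        return (c + n / gamma) * p / (1.0 - q ** n)

    n = 1
    # Activate one more slot while its consumption would still be positive.
    while level(n + 1) * q ** n - 1.0 / gamma > 0.0:
        n += 1
    return [level(n) * q ** i - 1.0 / gamma for i in range(n)]


def bernoulli_average_reward(levels, p, gamma=1.0):
    """Long-run average reward p * sum_i (1-p)**i * r(w_i) for Bernoulli arrivals."""
    q = 1.0 - p
    return p * sum(q ** i * 0.5 * math.log(1.0 + gamma * w)
                   for i, w in enumerate(levels))
```

For p = 0.5, c = 3, and γ = 1, the sketch returns two active slots with consumptions of about 2.333 and 0.667, whereas for c = 0.5 all the energy is spent in the first slot, so the greedy behavior is recovered at low capacity. The same average-reward function can be applied to the geometric sequence p c (1-p)^i, which is what a fixed fraction policy with fraction p would spend after a recharge under these assumptions.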
In general, a maximin optimal policy is not guaranteed to perform well for all distributions in . But in the case where the reward function is given by (2), it is known that the fixed fraction policy is universally near optimal in terms of additive and multiplicative gaps ([14, Th. 2]); moreover, this universal near optimality is established by considering the worst-case performance of . Note that for both and , the least favorable distribution is Bernoulli (Theorem 10 or [14, Prop. 5]). Since is optimal for Bernoulli arrivals whereas is suboptimal, it follows that has strictly better worst-case performance compared to and consequently must be universally near optimal as well.
Figs. 2–10 compare the performance of the maximin optimal policy and the fixed fraction policy when is Bernoulli, truncated uniform, or truncated exponential, where the truncated uniform and exponential distributions are given by
with and
respectively. It can be seen from the plots that consistently outperforms and has a clear advantage in the low battery-capacity regime (i.e., when is small). This shows that the dominance of over is not restricted to the worst-case scenario.
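The dependence of the MCR on the battery capacity for such distributions can be illustrated with closed-form expressions. The sketch below assumes that truncation means clipping the arrival at the capacity c, that the untruncated uniform distribution is supported on [0, a], and that the untruncated exponential distribution has the given mean; these parameterizations and the function names are assumptions and may differ from the definitions used in the paper.

```python
import math


def mcr_clipped_uniform(a, c):
    """MCR of min(U, c), with U uniform on [0, a]."""
    if a <= c:
        return a / (2.0 * c)        # clipping never binds: mean a/2
    return 1.0 - c / (2.0 * a)      # E[min(U, c)] = c - c**2 / (2*a)


def mcr_clipped_exponential(mean, c):
    """MCR of min(E, c), with E exponential with the given mean."""
    # E[min(E, c)] = mean * (1 - exp(-c / mean)).
    return (mean / c) * (1.0 - math.exp(-c / mean))
```

Under these assumptions, both ratios approach one as the capacity shrinks and decay to zero as the capacity grows, which is why a capacity-independent parameter is convenient when comparing such distributions across battery sizes.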
Remark 28
In contrast to the fact that , the MCRs of and depend on the battery capacity (in addition to their respective parameters and ). To facilitate the characterization of this dependency, we define the nominal MCR (NMCR) of a truncated distribution to be the ratio of the mean of its original distribution (with no truncation) to the battery capacity . Note that the NMCRs of and are
and
respectively. Hence, if the NMCR is , then
and