I Introduction
The explosive growth of mobile data traffic has led to a significant increase in energy consumption in wireless communication networks. In recent years, energy harvesting communications have attracted great attention from both academic and industrial research communities, because energy harvesting technologies can shift the power supply from fossil fuels to renewable energy sources such as solar radiation, wind, and tides [1]. However, one of the main challenges in energy harvesting communications is scheduling and allocating renewable energy, owing to the randomness of both renewable energy arrivals and mobile data traffic.
Heterogeneous networks (HetNets) with mixed macrocell and small cell deployments are a main architecture of fifth generation (5G) mobile networks for improving system capacity [2]. HetNets assisted by energy harvesting technologies can reduce the consumption of conventional energy such as grid power [3, 4]. Several renewable energy scheduling and allocation methods have been proposed for energy harvesting HetNets. Some works modeled energy harvesting HetNets as slot-based systems [5, 6, 7]. In [5] and [7], the authors assumed that the base stations have infinite data to send and proposed strategies to maximize energy efficiency. In [6], the authors averaged the transmission capacities of base stations given the packet arrival rates and packet sizes (in bits) at certain locations. In [8] and [9], the authors studied energy harvesting HetNets with real-time admission control: in [8], the battery level is modeled as an M/D/1 queue and the traffic intensity as a Poisson point process (PPP), while in [9], both the energy arrivals and the downlink packet arrivals are modeled as Poisson counting processes.
The radio resource scheduling period of existing communication protocols is on the order of milliseconds (e.g., 1 ms in LTE networks), whereas the intensity of green energy changes on a time scale of seconds or longer. Therefore, it is difficult to apply slot-based joint optimization algorithms to practical communication networks. Existing real-time methods, in turn, have difficulty finding optimal energy scheduling policies. A semi-Markov decision process (SMDP) [10] and a continuous-time Markov decision process (CTMDP) [11] are effective frameworks for modeling resource management problems in energy harvesting systems with real-time admission control [12, 13, 14]. An SMDP-based optimal data transmission and battery charging policy for solar powered sensor networks was proposed in [12]. In [13], an SMDP-based routing algorithm was proposed for wireless sensor networks, where the arrivals of harvested energy and traffic packets are modeled as Poisson processes. A CTMDP-based optimal threshold policy for in-home smart grids with renewable generation integration was proposed in [14].
Energy harvesting is generally modeled as a point process in the existing literature. In this paper, we consider solar energy assisted HetNets, where a macro base station (MBS) is powered by the power grid and a small base station (SBS) is powered by solar radiation. We model solar radiation as a continuous-time Markov chain (CTMC) and the arrivals of multiclass downlink data traffic as Poisson processes. The downlink packet transmission link selection problem is modeled as an SMDP, which is compatible with mainstream wireless packet networks. We solve the resulting semi-Markov decision problem with the relative value iteration algorithm under the average criterion and with the value iteration algorithm under the discounted criterion.
II System Model
II-A System Description
The solar energy assisted HetNets considered in this paper are shown in Fig. 1. We assume that communication resources (e.g., frequency bandwidth, spatial channels, and non-orthogonal multiple access (NOMA) channels) are abundant, i.e., every packet can be allocated sufficient communication resources in time. Users in a small cell can connect to either an SBS or an MBS, where the SBS is powered by solar radiation and the MBS is powered by the power grid. The SBS is connected to the MBS via a high-speed wired link.
II-B Energy Model
We model the intensity of solar radiation on the ground based on cloud cover. The number of solar radiation states is assumed to be finite, and each state is associated with a constant radiation intensity. The sojourn time in a radiation state is determined by the cloud size and the wind speed, and the wind speed is assumed to be stationary over a long period. We assume that the cloud size is exponentially distributed with a state-dependent mean, i.e., the mean diameter of the clouds that result in that solar radiation state. Since the sojourn time equals the cloud diameter divided by the constant wind speed, it is also exponentially distributed, and the evolution of the solar radiation states can therefore be regarded as a CTMC. For example, in [15] the transitions among the solar radiation states are sequential and circular. The transition rate matrix of the CTMC can be expressed as follows:
(1)
where the expected sojourn time in each state is the mean cloud diameter of that state divided by the wind speed and, correspondingly, the transition rate out of the state is the reciprocal of this expected sojourn time.
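The CTMC construction above can be sketched in a few lines of Python. This is a minimal sketch that assumes the sequential, circular transition structure cited from [15] and a constant wind speed; the function names, parameter names, and the numbers in the usage lines are hypothetical and are not taken from the paper. Besides the transition rate matrix, it computes the long-run fraction of time spent in each radiation state.

```python
def circular_generator(mean_diameters, wind_speed):
    """Rate matrix of a sequential, circular CTMC of solar radiation states.

    Hypothetical inputs: mean_diameters[i] is the mean diameter of the clouds
    behind state i; wind_speed is in the same length unit per second.
    """
    n = len(mean_diameters)
    q = [[0.0] * n for _ in range(n)]
    for i, d in enumerate(mean_diameters):
        rate = wind_speed / d        # reciprocal of the mean sojourn time d / wind_speed
        q[i][i] -= rate              # diagonal entry: minus the total exit rate
        q[i][(i + 1) % n] += rate    # always move to the next state, wrapping around
    return q


def stationary_distribution(mean_diameters, wind_speed):
    """In a circular chain every state is visited equally often, so the long-run
    fraction of time in state i is proportional to its mean sojourn time."""
    sojourn = [d / wind_speed for d in mean_diameters]
    total = sum(sojourn)
    return [s / total for s in sojourn]


q = circular_generator([200.0, 100.0], wind_speed=10.0)
pi = stationary_distribution([200.0, 100.0], wind_speed=10.0)
print(q)   # [[-0.05, 0.05], [0.1, -0.1]]
print(pi)
```

Each row of the matrix sums to zero, and the time-fraction vector satisfies the balance condition of the chain, which is a quick sanity check on any hand-written version of (1).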
We assume that the photovoltaic panel on the SBS has area $S$ and solar energy conversion efficiency $\eta$. Given solar radiation state $i$ with radiation intensity $I_i$, the charging power is
$P_i = \eta S I_i$. (2)
We assume that the battery in the SBS has a limited capacity $C$. The minimum unit of energy that the battery can process (charge and discharge) is $e_u$. The maximum number of energy units in the battery is
$B = \lfloor C / e_u \rfloor$, (3)
where $\lfloor \cdot \rfloor$ denotes rounding to the nearest integer towards minus infinity. Therefore, the battery state is defined as $b \in \{0, 1, \dots, B\}$.
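The charging power in (2) and the battery discretization in (3) reduce to two one-line computations. The sketch below assumes our reading of (2), namely that the charging power is the product of panel area, conversion efficiency, and radiation intensity, and that all quantities are given in mutually consistent units; the names and numbers are hypothetical.

```python
import math


def charging_power(panel_area, efficiency, intensity):
    """Charging power of the SBS in one radiation state: area x efficiency x intensity."""
    return panel_area * efficiency * intensity


def max_energy_units(capacity, unit):
    """Largest whole number of energy units the battery can hold (floor of capacity/unit).

    Caveat: with non-integer floats, capacity / unit can land just below an exact
    integer (e.g. 0.3 / 0.1 == 2.9999999999999996), so choose units in which the
    ratio is exactly representable.
    """
    return math.floor(capacity / unit)


# Hypothetical numbers; the battery states are then 0, 1, ..., max_units.
power = charging_power(panel_area=2.0, efficiency=0.2, intensity=50.0)
max_units = max_energy_units(capacity=100.0, unit=5.0)
battery_states = list(range(max_units + 1))
```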
II-C Traffic Model
We only consider downlink data traffic. This is reasonable for solar power assisted HetNets because transmitting signals consumes much more energy than receiving them. One of the most prominent features of current and future wireless networks is the support of multiservice applications, such as voice, video, web browsing, file transmission, and interactive gaming [16]. Therefore, the downlink packets sent to the users in the small cell are classified into multiple classes. The arrival process of each class of packets is assumed to be a Poisson process with a class-specific arrival rate, and the arrival processes of all classes are assumed to be independent. For a packet of a given class, transmitting it from the MBS to a user takes a class-dependent number of energy units, and transmitting it from the SBS to a user takes another class-dependent number of units. In general, transmission from the MBS requires more energy units than transmission from the SBS.
III SMDP-based Packet Scheduling Model
The downlink packet scheduling process is modeled as an SMDP, in which the distribution of the time to the next decision epoch and the state at that time depend on the past only through the state and action chosen at the current decision epoch [10]. In our SMDP model, the time intervals between adjacent decision epochs are exponentially distributed. In general, an SMDP can be formulated as a 5-tuple $\{t_n, \mathcal{S}, \mathcal{A}, Q, c\}$, where $t_n$, $n \in \mathbb{N}$ ($\mathbb{N}$ denotes the set of nonnegative integers), is the $n$th decision epoch, $\mathcal{S}$ is the state space, $\mathcal{A}$ is the action space, $Q$ is the transition probability, which includes both the state transition probability and the state sojourn time distribution, and $c$ is the immediate cost. In the rest of this section, we formulate the SMDP for the considered problem in detail.
III-A Decision Epoch
In the SMDP, the time interval between two adjacent decision epochs is a random duration rather than a fixed slot length, so that downlink packets can be sent in a timely manner.
III-B State Space
In the SMDP model, we consider a decision-making state that consists of a conventional state and an event. The conventional state comprises the solar radiation state and the battery state. An event is either the arrival of a packet of some class or the arrival of the next solar radiation state, i.e., a change of the solar radiation to its next state. The state space is the set of all available decision-making states, represented as:
(4)
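To make the state space concrete, the following sketch enumerates decision-making states as (radiation state, battery units, event) tuples. The tuple layout and the event labels are hypothetical choices for illustration; the sketch also assumes sequential, circular radiation transitions, so the only radiation event reachable from state i is the change to state (i + 1) mod N.

```python
def enumerate_states(num_radiation_states, max_units, num_classes):
    """All decision-making states (radiation state, battery units, event).

    Hypothetical labels: ("arrival", k) is a class-k packet arrival (0-based) and
    ("solar", j) is a change to radiation state j; with circular transitions the
    only reachable j from state i is (i + 1) % num_radiation_states.
    """
    states = []
    for i in range(num_radiation_states):
        events = [("arrival", k) for k in range(num_classes)]
        events.append(("solar", (i + 1) % num_radiation_states))
        for b in range(max_units + 1):
            for event in events:
                states.append((i, b, event))
    return states


states = enumerate_states(num_radiation_states=2, max_units=20, num_classes=2)
print(len(states))  # 2 radiation states x 21 battery levels x 3 events = 126
```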
III-C Action Space
Downlink packets can be transmitted by the MBS, or by the SBS if it has enough energy in its battery. The action of transmitting a packet via the MBS is denoted by $a = 0$, and the action of transmitting it via the SBS is denoted by $a = 1$. When the solar radiation state changes, no packet is transmitted, and the controller takes a fictitious action. The action space is defined as the set of all possible actions:
(5)
III-D State Transition Probability
Given solar radiation state $i$ with charging power $P_i$, the time needed to harvest one unit of energy $e_u$ is
$t_i = e_u / P_i$. (6)
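The harvested energy is accumulated after the decision epoch. Given the current battery state and the chosen action, the next battery state is one of a finite set of states, determined by how many energy units are harvested before the next event occurs and bounded by the battery capacity. Correspondingly, each possible next battery state is associated with a time interval, measured from the current decision epoch, within which the next event occurs.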
If the next event is the arrival of a packet of a given class and occurs within the time interval associated with a given next battery state, the next decision-making state consists of the current solar radiation state, that battery state, and this arrival event. Since all the downlink packet arrivals and the next solar radiation state arrival are independent under the given state-action pair, the occurrence rate of the next event is:
(7)
where the time interval from the state-action pair to the occurrence of the next event follows an exponential distribution (represented as a random variable in Fig. 2), and the time interval to the occurrence of any of the other events follows an exponential distribution whose parameter is the sum of the rates of all the other events (represented as another random variable in Fig. 2). The probability density functions are, respectively:
(8)
(9)
These two random variables are independent. Given the next battery state, the state transition probability, i.e., the integral of their joint probability density function over the corresponding region in Fig. 2, is:
(10)
In Fig. 2, each of the two marked regions represents the probability that the next event precedes all other events within a particular time interval. Using the same method, all the state transition probabilities can be obtained.
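Under one concrete reading of this construction, the integral in (10) has a closed form. If the battery gains one unit every $t_u$ seconds after the decision epoch, the class-$k$ arrival has rate $\lambda_k$, and all events together have rate $\Lambda$, then the probability that the class-$k$ arrival comes first and falls in the interval $[j t_u, (j+1) t_u)$ is $\int_{j t_u}^{(j+1) t_u} \lambda_k e^{-\lambda_k x} e^{-(\Lambda - \lambda_k) x}\,dx = \frac{\lambda_k}{\Lambda}\left(e^{-\Lambda j t_u} - e^{-\Lambda (j+1) t_u}\right)$, with the upper term dropped when the battery is full. The sketch below implements this under those stated assumptions (deterministic unit-by-unit harvesting, harvest time $t_u$ as in (6)) and checks it against direct sampling of the two competing exponential times; all names and numbers are hypothetical, and this is not claimed to be the exact form of (10).

```python
import math
import random


def transition_prob(lam_k, total_rate, t_unit, start_units, max_units, next_units):
    """P(next event is the class-k arrival AND the battery then holds next_units).

    Assumptions of this sketch: after the decision epoch the battery holds
    start_units (after any SBS discharge) and gains one unit every t_unit
    seconds up to max_units; the class-k arrival and all other events are
    independent exponential clocks with rates lam_k and total_rate - lam_k.
    """
    j = next_units - start_units                 # units harvested before the event
    if j < 0 or next_units > max_units:
        return 0.0
    share = lam_k / total_rate                   # P(the class-k arrival comes first)
    p_start = math.exp(-total_rate * j * t_unit)
    if next_units == max_units:                  # battery full: interval [j*t_unit, inf)
        return share * p_start
    p_end = math.exp(-total_rate * (j + 1) * t_unit)
    return share * (p_start - p_end)             # interval [j*t_unit, (j+1)*t_unit)


def monte_carlo(lam_k, total_rate, t_unit, start_units, max_units, runs=200_000, seed=1):
    """Empirical check: sample the two competing exponential times directly."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(runs):
        x = rng.expovariate(lam_k)               # time of the class-k arrival
        y = rng.expovariate(total_rate - lam_k)  # time of the first other event
        if x < y:
            units = min(max_units, start_units + int(x // t_unit))
            counts[units] = counts.get(units, 0) + 1
    return {u: c / runs for u, c in counts.items()}


t_unit = 5.0 / 12.5   # hypothetical: energy unit / charging power, as in (6)
exact = {m: transition_prob(0.5, 1.5, t_unit, 3, 6, m) for m in range(3, 7)}
empirical = monte_carlo(0.5, 1.5, t_unit, 3, 6)
```

Summed over all reachable battery levels, the probabilities add up to $\lambda_k / \Lambda$, the probability that the class-$k$ arrival is the next event at all.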
III-E Immediate Cost
Since solar power equipment has a certain cost and the charging and discharging processes also cause some wear to the equipment, we assign a price to every unit of solar energy. We also assign a price to every unit of grid power energy. Usually, the unit price of solar energy is lower than that of grid power. We consider the energy costs incurred at decision epochs as immediate costs:
(11)
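One possible form of the immediate cost (11) is sketched below, assuming the cost of an action is the number of energy units it consumes times the unit price of the corresponding energy source, and that the fictitious action taken at a radiation change costs nothing because no packet is sent. The labels 0 (MBS) and 1 (SBS) are an assumption chosen to match the binary entries of Table II; the event tuples and all parameter names are hypothetical.

```python
MBS, SBS = 0, 1   # assumed action labels (binary entries as in Table II)


def immediate_cost(event, action, battery_units, sbs_units, mbs_units,
                   solar_price, grid_price):
    """Energy cost charged at a decision epoch.

    event is ("arrival", k) or ("solar", j); sbs_units[k] / mbs_units[k] are the
    energy units a class-k packet needs from the SBS / MBS (hypothetical names).
    """
    if event[0] == "solar":
        return 0.0                      # fictitious action: nothing is transmitted
    k = event[1]
    if action == SBS:
        if battery_units < sbs_units[k]:
            raise ValueError("SBS action infeasible: not enough energy in the battery")
        return sbs_units[k] * solar_price
    return mbs_units[k] * grid_price
```

Raising an error for an infeasible SBS action keeps the feasibility constraint explicit; an implementation could instead exclude such actions from the action set of the state.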
IV Packet Scheduling Policy Optimization
The simplest link selection policy is the greedy policy: when a packet arrives, it is served by the SBS as long as there is enough energy in the battery; otherwise, it is served by the MBS. However, the greedy policy does not necessarily minimize the long-term energy cost. For example, with two classes of packets, depending on their energy requirements, serving an arriving packet by the MBS may be better than serving it by the SBS even when the battery holds enough energy, because the stored energy can then be saved for later packets. Therefore, the statistical characteristics of the system should be exploited to obtain a long-term optimal packet scheduling policy. In this section, we present two criteria for formulating the semi-Markov decision problem and obtaining asymptotically optimal link selection policies: the average criterion and the discounted criterion.
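The greedy baseline can be written as a one-line rule. In this sketch the event tuples, the 0/1 action labels, and the per-class SBS energy needs are hypothetical; it shows that the greedy rule is a fixed per-class threshold on the battery level.

```python
def greedy_action(event, battery_units, sbs_units):
    """Greedy link selection: SBS (1) whenever the battery covers the packet, else MBS (0).

    event is ("arrival", k) or ("solar", j); for a solar-state change only the
    fictitious action applies, represented here by None.
    """
    if event[0] == "solar":
        return None
    k = event[1]
    return 1 if battery_units >= sbs_units[k] else 0


# Hypothetical per-class SBS energy needs of 3 and 6 units: the greedy rule
# switches to the SBS exactly at those battery levels.
sbs_units = [3, 6]
thresholds = [min(b for b in range(21) if greedy_action(("arrival", k), b, sbs_units) == 1)
              for k in range(2)]
```

Because the rule looks only at the current battery level, it cannot reserve energy for future packets, which is precisely what the SMDP-based policies can exploit.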
IV-A Average Criterion
We assume that the system is stationary, i.e., its statistical properties are invariant with respect to time. Therefore, the decision policy for the SMDP model can be defined as a time-invariant mapping from the state space to the action space. Starting from a given initial state and following a given policy, the long-term expected average cost can be formulated as follows:
(12)
where $\tau_n$ is the time interval between the $n$th and $(n+1)$th decision epochs. In our SMDP model, the embedded Markov chain under every available policy is a unichain, which consists of a single recurrent class plus a (possibly empty) set of transient states. Thus, the average cost is independent of the initial state, i.e., it takes the same value for all initial states. The objective is to find an optimal link selection policy that minimizes the long-term expected energy cost, i.e.:
(13)
The Bellman optimality equation is the fundamental optimality condition in dynamic programming (DP). The following theorem presents the Bellman optimality equation for a unichain SMDP under the average criterion.
Theorem 1
For a unichain SMDP, there exist a scalar $g^*$ and a function $h(s)$ of the states satisfying the following Bellman optimality equation:
(14) 
where $g^*$ is the average cost under an optimal policy, and $\tau(s, a)$ is the expected time interval between adjacent decision epochs when action $a$ is taken in state $s$.
A proof of the above theorem can be found in [10]. We have obtained the state transition probabilities and the expected time intervals between adjacent decision epochs. Accordingly, a DP-based algorithm derived from the Bellman optimality equation (such as the value iteration or policy iteration algorithm) can be used to find an (asymptotically) optimal packet scheduling policy. However, these algorithms are designed for discrete-time MDPs, which require the time interval between adjacent decision epochs to be identical for every state-action pair. Because the expected time interval between adjacent decision epochs differs across state-action pairs in our model, they cannot be directly applied to the semi-Markov decision problem under the average cost criterion.
In order to apply discrete-time MDP algorithms to the SMDP, we uniformize the event occurrence rates of all state-action pairs by adding extra fictitious decisions (i.e., self-transitions). The uniformization method is described as follows.
For all states and actions, the uniform constant event occurrence rate has to satisfy the following inequality:
(15) 
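We set the uniform rate as follows:
(16)
and multiply both sides of (14) by the corresponding scaling factor. Defining the correspondingly transformed costs and transition probabilities, the uniform Bellman optimality equation can be given as: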
(17) 
We use the relative value iteration algorithm shown in Algorithm 1 to obtain the optimal policy, where the operator in step 3 is the span, defined for a vector as the difference between its largest and smallest components. If the stopping threshold is small enough, the resulting policy converges to an optimal policy.
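The following is a minimal sketch of the same idea, not a reproduction of Algorithm 1: the SMDP is turned into a discrete-time MDP and solved by relative value iteration with the span stopping rule. It uses the standard data transformation (cost rate $c/\tau$, transition probabilities scaled by $\tilde{\tau}/\tau$ with the remainder placed on a self-transition), with the uniform time $\tilde{\tau}$ chosen as half the smallest expected sojourn time; the exact constants in (15) to (17) may differ. The dictionary layout and the toy model are hypothetical.

```python
def relative_value_iteration(smdp, eps=1e-9, max_iter=100_000):
    """Average-cost SMDP via uniformization plus relative value iteration.

    smdp[s][a] = (cost, expected_sojourn, {next_state: probability}).
    Returns (average cost per unit time, policy, relative values).
    """
    # Uniform sojourn time; the factor 0.5 < 1 guarantees self-loops (aperiodicity).
    tau_u = 0.5 * min(tau for acts in smdp.values() for _, tau, _ in acts.values())
    uniform = {}
    for s, acts in smdp.items():
        uniform[s] = {}
        for a, (cost, tau, probs) in acts.items():
            scale = tau_u / tau
            p = {j: scale * q for j, q in probs.items()}
            p[s] = p.get(s, 0.0) + 1.0 - scale    # fictitious self-transition
            uniform[s][a] = (cost / tau, p)        # cost rate of the pair (s, a)
    ref = next(iter(smdp))                          # reference state pinned to 0
    h = {s: 0.0 for s in smdp}
    for _ in range(max_iter):
        q_values = {
            s: {a: c + sum(pj * h[j] for j, pj in p.items()) for a, (c, p) in acts.items()}
            for s, acts in uniform.items()
        }
        th = {s: min(qs.values()) for s, qs in q_values.items()}
        diff = [th[s] - h[s] for s in smdp]
        if max(diff) - min(diff) < eps:             # span stopping rule
            break
        h = {s: th[s] - th[ref] for s in smdp}
    gain = 0.5 * (max(diff) + min(diff))
    policy = {s: min(qs, key=qs.get) for s, qs in q_values.items()}
    return gain, policy, h


# Toy SMDP: in state 0 choose between costs 2 and 1 (both last 1 s on average),
# then state 1 costs 4 and lasts 3 s. Best long-run cost rate: (1 + 4) / (1 + 3).
toy = {
    0: {"a": (2.0, 1.0, {1: 1.0}), "b": (1.0, 1.0, {1: 1.0})},
    1: {"stay": (4.0, 3.0, {0: 1.0})},
}
gain, policy, _ = relative_value_iteration(toy)
```

At termination the true average cost lies between the smallest and largest components of the last difference vector, so their midpoint is returned as the estimate.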
IV-B Discounted Criterion
In practice, the statistical characteristics of the downlink data traffic and the solar radiation are time-variant. For convenience, we assume that they are time-invariant within a finite horizon, e.g., one hour. The discounted SMDP may then better formulate the downlink packet scheduling problem, because decisions further in the future have less impact on the present. We assume a continuous-time discounting rate $\alpha > 0$, which means that the present value of one unit received $t$ time units in the future equals $e^{-\alpha t}$. Starting from a given initial state and following a time-invariant policy, the discounted expected total energy cost can be formulated as:
(20) 
where $t_n$ represents the time of the $n$th decision epoch.
We define $Q(t, j \mid s, a)$ as the probability that, under the current state-action pair $(s, a)$, the next decision epoch occurs at or before time $t$ and the system state at that decision epoch equals $j$. We also define the following quantity:
(21)
together with its derivative with respect to time. If the next battery state is given, the next event must occur within the corresponding time interval from the current decision epoch. Assuming the next event is given as well, this derivative is formulated as follows:
In the same way, this quantity can be obtained for all available next states. Therefore, the discounted expected total energy cost can also be formulated as:
(23) 
An asymptotically optimal downlink packet scheduling policy satisfies the following Bellman optimality equation:
(24) 
Based on this Bellman equation, we use the value iteration algorithm (Algorithm 2) to solve the discounted semi-Markov decision problem and obtain an asymptotically optimal packet scheduling policy.
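For exponential event clocks, the discounted transition weights $E[e^{-\alpha T}; \text{next state}]$ also have a closed form. Assuming one energy unit is harvested every $t_u$ seconds and the class-$k$ arrival competes with the other events at total rate $\Lambda$, the weight for the interval $[j t_u, (j+1) t_u)$ is $\int_{j t_u}^{(j+1) t_u} e^{-\alpha x} \lambda_k e^{-\Lambda x}\,dx = \frac{\lambda_k}{\Lambda + \alpha}\left(e^{-(\Lambda+\alpha) j t_u} - e^{-(\Lambda+\alpha)(j+1) t_u}\right)$. Value iteration then repeats $V(s) \leftarrow \min_a \{c(s,a) + \sum_j m(j \mid s,a) V(j)\}$, with costs charged as lump sums at decision epochs. This is a minimal sketch under those assumptions, not the authors' Algorithm 2; names and the toy model are hypothetical.

```python
import math


def discounted_weight(lam_k, total_rate, alpha, t_unit, harvested, battery_full):
    """E[exp(-alpha * T); class-k arrival is next and `harvested` units were gained].

    Assumptions: one energy unit is harvested every t_unit after the epoch, and the
    next event is the first of independent exponential clocks with total rate
    total_rate, lam_k of which belongs to the class-k arrival. When the battery is
    full the interval is open-ended, so its upper term vanishes.
    """
    r = total_rate + alpha
    start = math.exp(-r * harvested * t_unit)
    end = 0.0 if battery_full else math.exp(-r * (harvested + 1) * t_unit)
    return lam_k / r * (start - end)


def value_iteration(smdp, eps=1e-10, max_iter=100_000):
    """smdp[s][a] = (cost_at_epoch, {next_state: discounted_weight}), weights summing to < 1."""
    v = {s: 0.0 for s in smdp}
    for _ in range(max_iter):
        q_values = {
            s: {a: c + sum(w * v[j] for j, w in m.items()) for a, (c, m) in acts.items()}
            for s, acts in smdp.items()
        }
        new_v = {s: min(qs.values()) for s, qs in q_values.items()}
        converged = max(abs(new_v[s] - v[s]) for s in smdp) < eps
        v = new_v
        if converged:
            break
    policy = {s: min(qs, key=qs.get) for s, qs in q_values.items()}
    return v, policy


# Toy check: a single state with cost 1 whose discounted weight back to itself
# is 0.5 has value 1 / (1 - 0.5) = 2; the costlier action "y" is never chosen.
toy = {0: {"x": (1.0, {0: 0.5}), "y": (3.0, {0: 0.1})}}
values, policy = value_iteration(toy)
```

Since every discounted weight row sums to less than one, the update is a contraction and the iteration converges from any starting values.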
V Simulation Results
In this section, we evaluate the performance of the proposed SMDP-based downlink packet scheduling scheme through numerical simulations in Matlab. Specifically, we present the average costs and policies obtained by the relative value iteration algorithm under the average criterion, the value iteration algorithm under the discounted criterion, and the greedy algorithm. We simplify the solar radiation model to two states, i.e., direct sunlight and cloud cover, with sequential and circular transitions. We assume that there are two classes of downlink packets. The simulation parameters are summarized in Table I. We use the Monte Carlo method to generate random data based on the corresponding parameters and measure the average cost. The simulation results of the average cost are averaged over 10 runs, each lasting 3600 s.
TABLE I: Simulation parameters
Parameter  Value  Parameter  Value 
50  0.05  
200  1  
50  10  
100  5  
2  8  
0.2  10  
0.1  3  
2  6  
1.5  
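The Monte Carlo evaluation can be illustrated for the greedy baseline. The following is a minimal sketch, not the authors' Matlab code: all parameter names and values are hypothetical (they are not the values of Table I), energy is assumed to be harvested continuously at the charging power of the current radiation state and capped at the battery capacity, and radiation changes are sequential and circular. The average cost is the total cost divided by the simulated time, in the spirit of (12).

```python
import random


def simulate_greedy(horizon, seed, rates, mean_sojourn, power, unit, capacity,
                    sbs_units, mbs_units, solar_price, grid_price):
    """Monte Carlo estimate of the greedy policy's average energy cost per second.

    Hypothetical inputs: rates[k] packets/s for class k; mean_sojourn[i] s and
    power[i] (energy/s) for radiation state i; unit and capacity in energy units.
    """
    rng = random.Random(seed)
    t, i, energy, cost = 0.0, 0, 0.0, 0.0
    while True:
        solar_rate = 1.0 / mean_sojourn[i]
        total = sum(rates) + solar_rate
        dt = rng.expovariate(total)              # time to the next event (superposed clocks)
        if t + dt > horizon:
            break
        t += dt
        energy = min(capacity, energy + power[i] * dt)   # harvest during dt, capped
        u = rng.random() * total                 # pick which event occurred
        if u < solar_rate:
            i = (i + 1) % len(mean_sojourn)      # sequential, circular radiation change
            continue
        u -= solar_rate
        k = 0
        while k < len(rates) - 1 and u >= rates[k]:
            u -= rates[k]
            k += 1
        need = sbs_units[k] * unit
        if energy >= need:                       # greedy: use the SBS whenever possible
            energy -= need
            cost += sbs_units[k] * solar_price
        else:
            cost += mbs_units[k] * grid_price
    return cost / horizon


params = dict(rates=[0.5, 0.2], mean_sojourn=[20.0, 10.0], unit=5.0, capacity=100.0,
              sbs_units=[3, 6], mbs_units=[8, 10], solar_price=1.0, grid_price=2.0)
runs = [simulate_greedy(3600.0, seed, power=[12.5, 2.5], **params) for seed in range(10)]
avg_cost = sum(runs) / len(runs)
no_solar = simulate_greedy(3600.0, 0, power=[0.0, 0.0], **params)
```

Because no random draw depends on the battery level, the same seed yields the same event sequence with and without harvesting, which makes paired comparisons between policies straightforward.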
Fig. 3 presents the average cost versus the packet arrival rate of one of the classes. The average cost of the relative value iteration algorithm under the average criterion is close to that of the value iteration algorithm under the discounted criterion, and both are lower than that of the greedy algorithm. In future work, we will further investigate the performance of these two algorithms in slowly time-varying systems.
Table II shows the downlink packet scheduling policies of the three algorithms under the default parameters. For each decision-making state, the three columns of policies correspond to the relative value iteration algorithm, the value iteration algorithm, and the greedy algorithm, respectively. In the greedy algorithm, an arriving packet is served by the SBS as long as there is enough energy in the battery, because of the low immediate cost. In the SMDP-based packet scheduling algorithms, the actions are chosen based on both the current state and the statistical characteristics of the system.
m  [0,m],  [1,m],  [0,m],  [1,m],  
0  0  0  0  0  0  0  0  0  0  0  0  0 
1  0  0  0  0  0  0  0  0  0  0  0  0 
2  0  0  0  0  0  0  0  0  0  0  0  0 
3  0  1  1  1  1  1  0  0  0  0  0  0 
4  0  1  1  1  1  1  0  0  0  0  0  0 
5  0  1  1  1  1  1  0  0  0  0  0  0 
6  1  1  1  1  1  1  0  0  1  0  0  1 
7  1  1  1  1  1  1  0  0  1  0  0  1 
8  1  1  1  1  1  1  0  0  1  0  0  1 
9  1  1  1  1  1  1  0  0  1  0  1  1 
10  1  1  1  1  1  1  0  0  1  0  1  1 
11  1  1  1  1  1  1  0  0  1  0  1  1 
12  1  1  1  1  1  1  0  0  1  1  1  1 
13  1  1  1  1  1  1  0  0  1  1  1  1 
14  1  1  1  1  1  1  1  0  1  1  1  1 
15  1  1  1  1  1  1  1  0  1  1  1  1 
16  1  1  1  1  1  1  1  0  1  1  1  1 
17  1  1  1  1  1  1  1  1  1  1  1  1 
18  1  1  1  1  1  1  1  1  1  1  1  1 
19  1  1  1  1  1  1  1  1  1  1  1  1 
20  1  1  1  1  1  1  1  1  1  1  1  1 
VI Conclusions
In this paper, we proposed an SMDP-based downlink packet scheduling scheme for solar energy assisted HetNets, where the intensity of solar energy is modeled as a CTMC and the arrivals of multiclass downlink packets are modeled as Poisson processes with different rates. We obtained asymptotically optimal packet scheduling policies for both the average cost SMDP and the discounted cost SMDP. Both the intuitive example and the simulation results show that the asymptotically optimal packet scheduling policies achieve lower energy costs than the greedy policy. In future work, we will consider bandwidth constraints and jointly design bandwidth allocation and energy management in solar energy assisted HetNets.
References
 [1] M. L. Ku, W. Li, Y. Chen and K. J. Ray Liu, “Advances in energy harvesting communications: Past, present, and future challenges,” IEEE Commun. Surveys Tuts., vol. 18, no. 2, pp. 1384–1412, 2nd Quart. 2016.
 [2] C. X. Wang, F. Haider, X. Gao, X. H. You, Y. Yang, D. Yuan, H. M. Aggoune, H. Haas, S. Fletcher and E. Hepsaydir, “Cellular architecture and key technologies for 5G wireless communication networks,” IEEE Commun. Mag., vol. 52, no. 2, pp. 122–130, Feb. 2014.
 [3] S. Zhang, N. Zhang, S. Zhou, J. Gong, Z. Niu and X. Shen, “Energy-sustainable traffic steering for 5G mobile networks,” IEEE Commun. Mag., vol. 55, no. 11, pp. 54–60, Nov. 2017.
 [4] H. S. Dhillon, Y. Li, P. Nuggehalli, Z. Pi and J. G. Andrews, “Fundamentals of heterogeneous cellular networks with energy harvesting,” IEEE Trans. Wireless Commun., vol. 13, no. 5, pp. 2782–2797, May 2014.
 [5] Y. H. Chiang and W. Liao, “Green multicell cooperation in heterogeneous networks with hybrid energy sources,” IEEE Trans. Wireless Commun., vol. 15, no. 12, pp. 7911–7925, Dec. 2016.
 [6] T. Han and N. Ansari, “Provisioning green energy for base stations in heterogeneous networks,” IEEE Trans. Veh. Technol., vol. 65, no. 7, pp. 5439–5448, Jul. 2016.

 [7] Y. Wei, F. R. Yu, M. Song and Z. Han, “User scheduling and resource allocation in HetNets with hybrid energy supply: An actor-critic reinforcement learning approach,” IEEE Trans. Wireless Commun., vol. 17, no. 1, pp. 680–692, Jan. 2018.
 [8] S. Zhang, N. Zhang, S. Zhou, J. Gong, Z. Niu and X. Shen, “Energy-aware traffic offloading for green heterogeneous networks,” IEEE J. Sel. Areas Commun., vol. 34, no. 5, pp. 1116–1129, May 2016.
 [9] Q. Han, B. Yang, C. Chen, and X. Guan, “Energy-aware and QoS-aware load balancing for HetNets powered by renewable energy,” Comput. Netw., vol. 94, pp. 250–262, Jan. 2016.
 [10] M. L. Puterman, Markov decision processes: Discrete stochastic dynamic programming. John Wiley & Sons, 1994.
 [11] X. Guo and O. Hernández-Lerma, Continuous-time Markov decision processes. Berlin: Springer, 2009.
 [12] M. A. Murtaza and M. Tahir, “Optimal data transmission and battery charging policies for solar powered sensor networks using Markov decision process,” in Proc. IEEE Int. Conf. Commun., Shanghai, China, Apr. 2013, pp. 992–997.
 [13] G. Martinez and C. Zhou, “Maximum lifetime SMDP routing for energy-harvesting wireless sensor networks,” in Proc. IEEE Veh. Technol. Conf., Montreal, Canada, Sept. 2016.
 [14] G. R. Liu, P. Lin, Y. Fang and Y. B. Lin, “Optimal threshold policy for in-home smart grid with renewable generation integration,” IEEE Trans. Parallel Distrib. Syst., vol. 26, no. 4, pp. 1096–1105, Apr. 2015.
 [15] D. Niyato, E. Hossain, and A. Fallahi, “Sleep and Wakeup Strategies in Solar-Powered Wireless Sensor/Mesh Networks: Performance Analysis and Optimization,” IEEE Trans. Mobile Comput., vol. 6, no. 2, pp. 221–236, Feb. 2007.
 [16] X. Yang and G. Feng, “Optimizing admission control for multiservice wireless networks with bandwidth asymmetry between uplink and downlink,” IEEE Trans. Veh. Technol., vol. 56, no. 2, pp. 907–917, Mar. 2007.