I Introduction
Data freshness is becoming increasingly important in real-time services such as real-time positioning, monitoring, and industrial control. To support these applications, users that track the corresponding physical phenomena are scheduled to send updates to a central controller via time-varying wireless channels. However, the wireless bandwidth and interference constraints, the limited power resource of each user, and the time-varying nature of wireless channels create obstacles in scheduling strategy design. Moreover, traditional quality of service (QoS) guarantees such as latency and throughput have their limitations and may not guarantee good data freshness. Thus, it is important to revisit sampling and scheduling strategies in wireless networks in order to obtain fresher information.
A recently proposed metric, the Age of Information (AoI) [1], namely the time elapsed since the generation timestamp of the freshest information stored at the receiver, has been widely adopted to measure data freshness in communication networks. Intuitively, to guarantee low AoI in resource-constrained networks, packets with short delay have to be transmitted in a timely manner. The optimization and analysis of AoI performance in point-to-point communication systems with power consumption constraints have been studied in [2, 3, 4, 5, 6, 7, 8]. In these works, the optimal sampling and transmission strategies in the presence of queueing delay [4] and transmission failure [7] are shown to possess a threshold structure, i.e., sampling and update transmission occur when the information at the receiver is no longer fresh while the update packets, if successfully received, can significantly reduce data staleness.
AoI performance and optimization in multiuser networks have been studied in [9, 10, 11, 12, 13, 14, 15, 16, 17]. When all the users in the network are identical and update packets can be generated at will, a greedy policy that samples and schedules the user with the largest AoI is shown to be optimal [9]. When there is no packet loss in the network, this greedy policy is equivalent to the round-robin strategy, which is shown to be order-optimal when update packets cannot be generated at will and arrive randomly [15]. In [10], it is revealed that users with relatively bad channel states are updated less frequently. Scheduling to minimize AoI in networks with time-varying channels is studied in [12, 13], where centralized and decentralized policies to minimize the average peak age of information (PAoI) are proposed. However, the channel model considered in these works has only two states, and no power adaptation strategy is used to combat wireless fading.
To combat the aforementioned fading effect as well as the transmission power and bandwidth limitations, which appear at different layers of communication networks, cross-layer control strategies have been studied in [18, 19, 20, 21, 22, 23, 24, 25] to minimize delay or maximize throughput. In [23], a lazy scheduling policy that assigns scheduling decisions based on the queue backlog is proposed. Considering the time-varying fading nature of wireless channels, a rate and power adaptation strategy is proposed in [24]. To minimize the queueing delay in a point-to-point time-varying channel with an average power constraint on the transmitter, a probabilistic scheduling strategy is proposed in [20, 21]. Although cross-layer strategies have been studied for delay minimization and for throughput and utility maximization, cross-layer design to optimize the Age of Information has not been well studied.
To fill this gap, we consider a single controller that schedules multiple users to transmit updates in a wireless network. Similar to the cross-layer framework in [26], the wireless link of each user is modeled as a multi-state time-varying channel, and a different level of transmission power is used in each channel state to guarantee successful transmission. The overall objective is to minimize the expected average AoI when the network is subject to a bandwidth constraint and scheduling decisions have to satisfy the power constraint of each user. Inspired by [25], we first relax the hard bandwidth constraint and decouple the multiuser scheduling problem into single-user constrained Markov decision processes (CMDPs). Then we propose a truncated scheduling policy that achieves asymptotically optimal average AoI performance over the entire network.
The main contributions of the paper are summarized as follows:

A cross-layer opportunistic scheduling framework for studying AoI minimization with power-constrained users is proposed. The channel states of all users are assumed to be known at the beginning of each slot through channel estimation, before scheduling decisions are made, and to remain constant during the slot. A different level of transmit power is adopted in each channel state to ensure successful packet transmission. This model captures key features of practical cross-layer network optimization problems and facilitates analysis.

By relaxing the hard bandwidth constraint, we decouple the multiuser bandwidth- and power-constrained scheduling problem into single-user CMDPs. The threshold structure of the optimal policy for the CMDP is exploited, and the CMDP is then converted into a linear program (LP).

A dual method is proposed to search for the Lagrange multiplier such that the relaxed bandwidth constraint is satisfied. Inspired by [25], we propose an asymptotically optimal truncated scheduling policy to minimize AoI under the hard bandwidth constraint. The performance of the algorithm is analyzed and verified through simulations.
The remainder of this paper is organized as follows. The network model and the data freshness metric, AoI, are introduced in Section II. In Section III, we decouple the multiuser scheduling problem into single-user CMDPs and search for the optimal policy through an LP. In Section IV, a truncated multiuser scheduling policy is proposed. Section V evaluates and analyzes the performance of the proposed algorithm, and Section VI concludes the paper.
Notations: Vectors and matrices are written in boldface lowercase and uppercase letters, respectively. The probability of event $A$ given condition $B$ is denoted as $\Pr(A \mid B)$. The expectation with regard to random variable $X$ is denoted as $\mathbb{E}_X[\cdot]$. The cardinality of set $\Omega$ is denoted as $|\Omega|$.
II System Model and Problem Formulation
II-A Network Model
We consider a network with a central controller collecting time-sensitive data from users via wireless links. Let the time be slotted, i.e., . The central controller schedules users to transmit updates at the beginning of each slot over time-varying fading channels. Let the indicator function be a scheduling decision. If , then user is scheduled to transmit an update packet during slot and the packet will be received successfully by the end of the slot. Due to the limited bandwidth resource, no more than users can be scheduled simultaneously, which imposes the following restriction on :
(1) 
We assume that the communication channel between the central controller and each user experiences an independent state block fading, where is a positive integer. The channel state remains constant during a slot but follows an i.i.d. fading process across slots. Let be a random variable that captures the channel state of user during slot ; a larger indicates that the link is noisier and experiences stronger fading during slot . Let the probability mass function of be:
(2) 
where . For each user , the sum of must satisfy:
(3) 
When user is scheduled to transmit updates in a slot and the corresponding channel state is , it consumes units of energy in order to guarantee successful transmission. To combat the effect of channel fading, more power is consumed when the channel is noisier; thus is an increasing sequence. The transmitted packet will be successfully received by the central controller at the end of the slot. For a typical scheduling decision related to user , the average power consumed in consecutive slots can be computed as follows:
(4) 
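As an illustration of the time-average power in Eqn. (4), the following minimal sketch averages the energy spent over a finite horizon. The names `decisions`, `channel_states`, and `power_per_state` are hypothetical stand-ins for the scheduling indicators, the observed channel states, and the per-state transmit energy of one user; the numbers in the usage lines are made up for illustration.

```python
def average_power(decisions, channel_states, power_per_state):
    """Average energy per slot spent by one user over a finite horizon.

    decisions[t] is 1 if the user is scheduled in slot t and 0 otherwise,
    channel_states[t] is the channel state observed in slot t, and
    power_per_state maps a channel state to the energy needed to transmit
    successfully in that state (assumed names, not from the paper).
    """
    if len(decisions) != len(channel_states):
        raise ValueError("decisions and channel_states must have equal length")
    if not decisions:
        raise ValueError("the horizon must contain at least one slot")
    spent = sum(power_per_state[q]
                for u, q in zip(decisions, channel_states) if u)
    return spent / len(decisions)


# Illustrative values: three channel states with increasing energy cost.
example_power = {1: 1.0, 2: 2.0, 3: 4.0}
example_average = average_power([1, 0, 1, 0], [1, 2, 3, 1], example_power)
```

A schedule satisfies the power constraint of the user when this average does not exceed the user's power budget.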
II-B Age of Information
We measure data freshness at the central controller using the Age of Information (AoI) metric [1]. By definition, the AoI is the time elapsed since the generation timestamp of the freshest information at the receiver.
Let be the age of information of user at the beginning of slot . In this work, it is equivalent to the number of slots elapsed since the last delivery from user . If , fresh information about user will be received by the central controller at the end of slot , thus ; otherwise, since no update packet is received from user during slot , the information about user becomes one slot older, hence increases linearly and . The evolution of the AoI is summarized as follows:
(5) 
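The AoI recursion described above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the reset value of one slot after a delivery is an assumption based on the packet being sent at the beginning of a slot and received at its end.

```python
def next_aoi(age, scheduled):
    """One-slot AoI update for a single user.

    If the user is scheduled, the update is delivered by the end of the slot,
    so the age at the start of the next slot is assumed to be 1; otherwise
    the information simply becomes one slot older.
    """
    return 1 if scheduled else age + 1


def aoi_trajectory(decisions, initial_age=1):
    """Ages at the beginning of each slot for a sequence of 0/1 decisions."""
    ages = [initial_age]
    for scheduled in decisions:
        ages.append(next_aoi(ages[-1], scheduled))
    return ages
```

For example, idling for two slots and then transmitting makes the age climb to 3 and drop back to 1.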
II-C Problem Formulation
For a given network setup with users and channel state distributions , we measure the data freshness of the entire network under policy in terms of the expected average AoI of all users at the beginning of each time slot over a horizon of consecutive slots, which is computed as follows:
(6) 
where the vector denotes the AoI of all users at the beginning of slot . In this work, we assume that all the users have been synchronized initially, i.e., and omit it henceforth.
Denote to be the class of non-anticipative policies, i.e., scheduling decisions are made based on current and past AoI and channel states as well as their probability distributions, while no future information or prediction about channel states is exploited. The central controller is fully aware of the average power constraint of each user and aims at designing a policy in order to minimize the average expected AoI of the entire network. The bandwidth and power constrained AoI (B&P-Constrained AoI) minimization problem is organized as follows:
Problem 1 (B&P-Constrained AoI)
(7a)  
s.t.  (7b)  
(7c) 
The hard bandwidth constraint (7b) in every slot makes the B&P-Constrained AoI problem an integer programming problem, which is intractable to solve directly. We tackle this challenge through the following approaches:

In Section III-D, the decoupled single-user CMDP is solved through an LP. In Section IV, based on the solution for each decoupled single user, we propose a truncated scheduling policy that satisfies the hard bandwidth constraint (7b).
III Scheduling by User-Level Decomposition
In this section, we start by relaxing and decoupling the B&P-Constrained AoI problem, and then formulate the decoupled single-user scheduling problem as a constrained Markov decision process (CMDP). We exploit the threshold structure of the optimal stationary randomized policy and obtain the optimal solution through linear programming (LP).
III-A Single-User Level Decomposition
Let us first relax the hard constraint (7b) into a time-average constraint. The problem of scheduling multiple power-constrained users with the relaxed bandwidth constraint (RB&P-Constrained AoI) can be organized as follows:
Problem 2 (RB&P-Constrained AoI)
(8a)  
s.t.  (8b)  
(8c) 
Next we establish the Lagrange function and place the relaxed constraint into the objective function (7a) as follows:
(9) 
The Lagrange multiplier is associated with the relaxed constraint and can be viewed as a penalty incurred by policies that schedule more users than the relaxed bandwidth constraint allows. For fixed , the optimization problem (9) can then be decoupled into single-user cost minimization problems with an average power consumption constraint. The objective of user is to develop a scheduling strategy such that, under the power constraint (7c), the average overall cost incurred by the AoI and the scheduling penalty is minimized. The decoupled single-user power-constrained cost minimization problem is organized as follows:
Problem 3 (Decoupled P-Constrained Cost)
(10a)  
s.t.  (10b) 
Since the primal relaxed problem (9) is decoupled, we omit the subscript henceforth. We formulate the Decoupled P-Constrained Cost minimization problem as a CMDP in Sec. III(B) and analyze the structure of the optimum in Sec. III(C). In Sec. III(D), we convert the single-user optimization problem with fixed into a linear program (LP).
III-B Constrained Markov Decision Process Formulation
The decoupled single-user scheduling problem can be formulated as a CMDP described by a quadruplet , whose elements are explained as follows:

State Space: The state of a user at the beginning of slot consists of the current number of slots elapsed since the last update and the channel state ; the state space is thus countably infinite.

Action Space: There are two possible actions , where denotes that the user is scheduled to send an update at the beginning of slot , and represents that the user stays idle and is not scheduled. Notice that is different from the scheduling decision , which is subject to the strict bandwidth constraint.

Transition Probability Function: If the user is not selected to transmit updates in slot , i.e., , then the information becomes one slot older and the AoI increases linearly, ; otherwise, if the user is scheduled, then . The channel state evolves independently of , hence the transition probability from state is organized as follows:
(11a) (11b) 
One-Step Cost: For a given state , the one-step cost of taking action contains the AoI growth and the scheduling penalty, which can be computed as follows:
(12a) while the one-step power consumption is: (12b)
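The ingredients of the single-user CMDP can be written down as three small functions. This is a sketch under stated assumptions rather than the paper's exact definitions: the one-step cost is taken to be the current age plus a penalty whenever the user is scheduled, the age is assumed to reset to 1 after a delivery, and all names (`penalty`, `channel_pmf`, `power_per_state`) are hypothetical.

```python
def step_cost(age, action, penalty):
    """Assumed one-step cost: current age plus the scheduling penalty if active."""
    return age + penalty * action


def step_power(channel, action, power_per_state):
    """Energy consumed in the slot: nonzero only when the user is scheduled."""
    return power_per_state[channel] * action


def transition(age, action, channel_pmf):
    """Distribution over next states (next_age, next_channel).

    The age resets to 1 after a scheduled transmission and grows by one
    otherwise; the next channel state is drawn i.i.d. from channel_pmf,
    independently of the action.
    """
    next_age = 1 if action == 1 else age + 1
    return {(next_age, q): p for q, p in channel_pmf.items()}
```

With a two-state channel that is equally likely to be good or bad, idling at age 3 leads to age 4 in either channel state with probability 0.5 each.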
The objective of the decoupled CMDP problem is to design a scheduling policy such that, under the average power constraint, the overall cost containing both the AoI and the scheduling penalty over the infinite horizon is minimized, which is computed as follows:
III-C Characterization of the Optimal Policy
In this part, we focus on exploiting the threshold structure of the optimal policy. First we provide the formal definition of a stationary randomized policy and stationary deterministic policy:
Definition 1
Let and denote the classes of stationary randomized and stationary deterministic policies, respectively. Given observation , a stationary randomized policy chooses action with probability measure for all . A stationary deterministic policy selects action , where is a deterministic mapping from the state space to the action space.
According to [28, Theorem 4.4], the optimal policy to the above CMDP has the following property:
Corollary 1
An optimal stationary randomized policy exists for the decoupled single-user power-constrained scheduling problem (Problem 3), and it is a mixture of no more than two stationary deterministic policies . Let be the weight of following stationary deterministic policy and be the weight of following . Then the optimum policy is:
(13) 
Each of the deterministic policies can be obtained through the Lagrangian primal-dual method [28]. Let be the Lagrange multiplier related to the average power constraint; then the single-user CMDP can be converted into an unconstrained MDP, whose objective is to minimize the following overall cost by designing a policy with no constraint:
Problem 4 (Decoupled Unconstrained Cost)
For a given Lagrange multiplier , a stationary deterministic policy that minimizes the above unconstrained cost exists. Moreover, there exists a differential cost-to-go function that satisfies the following Bellman equation:
(14) 
where is the average cost under the optimal policy. Next, we prove the threshold structure of the stationary deterministic policy for given , which provides insight into the structure of the optimal stationary randomized policy for the CMDP (Problem 3).
Lemma 1
The optimal stationary deterministic policy for solving the Decoupled Unconstrained Cost minimization problem with fixed possesses a dual threshold structure, which is explained as follows:

For any channel state , there exists a threshold , such that when , the optimal action and when , .

The set of thresholds is nondecreasing, i.e., .
Proof:
The detailed proof is provided in Appendix A. Here we provide an intuitive explanation. Since communication between the user and the controller is power constrained, we only schedule when the information is no longer fresh or the channel state is good, i.e., is large or is small. This behavior characterizes a threshold structure.
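A deterministic threshold policy of the kind described in Lemma 1 can be sketched as follows. The dictionary `thresholds`, mapping each channel state to its age threshold, is a hypothetical representation chosen for illustration, and the rule "schedule once the age reaches the threshold" is an assumption about where the boundary sits.

```python
def threshold_action(age, channel, thresholds):
    """Return 1 (schedule) once the age reaches the threshold of the channel state."""
    return 1 if age >= thresholds[channel] else 0


def thresholds_are_nondecreasing(thresholds):
    """Check that worse channel states (larger index) never have smaller thresholds."""
    ordered = [thresholds[q] for q in sorted(thresholds)]
    return all(a <= b for a, b in zip(ordered, ordered[1:]))


# Illustrative thresholds: a good channel (state 1) triggers transmission early,
# a bad channel (state 3) only when the information is very stale.
example_thresholds = {1: 2, 2: 4, 3: 7}
```

With these example values, a user of age 3 transmits in channel state 1 but waits in channel states 2 and 3.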
III-D Probabilistic Scheduling Policy for the Single-User Case
Denote to be the probability that the user is scheduled to send updates with age and channel state . We aim at finding a set of optimal transmission probabilities such that the total cost of the AoI and the scheduling penalty for a single decoupled user is minimized. From Sec. III(C), a stationary randomized policy that solves the Decoupled P-Constrained Cost problem is a randomization between two stationary deterministic policies [28], each of which can be obtained by solving the Decoupled Unconstrained Cost minimization problem, which is an unconstrained MDP. Considering their threshold structure and Eqn. (13), it can be concluded that there exists a set of nondecreasing thresholds such that, for each state , if , the stationary randomized policy is to schedule the user, i.e., . As a result, for each decoupled single-user problem, when , the user will always be scheduled and the AoI cannot be larger than the largest threshold . To find the optimal policy, we choose a large that can guarantee in the following analysis.
Let be the steady-state distribution of the user’s AoI, where denotes the probability that . The state transition graph is plotted in Fig. 1. Let and denote the one-step state transition probabilities from to and from to , respectively, i.e.,
(15a)  
(15b) 
From the threshold structure of the deterministic policy discussed above, with properly selected , under the optimal scheduling policy, the steady state distribution will be 0. We have the following lemma:
Lemma 2
Proof:
If the state evolves from to , then it can be concluded that the user is not scheduled in state . At state , the probability that the user is not scheduled equals . According to the law of total probability,
(17) 
where Eqn. (17) is obtained because, by Eqn. (3), the sum over equals 1. The backward probability can be computed similarly and is hence omitted here.
Let be the transition probability matrix between the states, i.e.,
(18) 
where vector is the backward state transition probability vector and . Vector is a dimension vector with all elements being 0. According to the property of the steady-state distribution, we have . In addition, considering that , we have . Thus, the steady-state distribution associated with strategy is the solution to the following linear equations:
(19) 
where is a dimension column vector with all the elements being 1.
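Because the age either grows by one or resets, the steady-state distribution can also be obtained without a matrix solve: the stationary mass of each age is proportional to the probability of surviving unscheduled up to that age. The sketch below uses this observation. The names `xi` (scheduling probabilities per age and channel state) and `channel_pmf` are hypothetical, the age is assumed to reset to 1, and the last row of `xi` is assumed to schedule with probability one, mirroring the truncation at a large maximum age discussed above.

```python
def forward_probability(xi_row, channel_pmf):
    """Probability of not being scheduled at a given age (age grows by one)."""
    return sum(p * (1.0 - xi_row[q]) for q, p in channel_pmf.items())


def steady_state(xi, channel_pmf):
    """Stationary AoI distribution for scheduling probabilities xi.

    xi[x - 1][q] is the probability of scheduling at age x in channel state q.
    The last row must schedule with probability one so that the age is capped.
    """
    if any(prob != 1.0 for prob in xi[-1].values()):
        raise ValueError("the last age must always be scheduled")
    weights = [1.0]
    for row in xi[:-1]:
        weights.append(weights[-1] * forward_probability(row, channel_pmf))
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative two-state channel and a threshold-type policy on ages 1..3.
example_pmf = {1: 0.5, 2: 0.5}
example_xi = [{1: 0.0, 2: 0.0}, {1: 1.0, 2: 0.0}, {1: 1.0, 2: 1.0}]
example_mu = steady_state(example_xi, example_pmf)
```

For the example values, the unnormalized weights are 1, 1, and 0.5, so the distribution is approximately (0.4, 0.4, 0.2).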
Next, we convert the search for the optimal stationary randomized scheduling strategy into an LP. We introduce a new set of variables , each denoting the probability that the user is in state and is scheduled to transmit an update. With this set of variables, we present the following theorem:
Theorem 1
The Decoupled PConstrained Cost minimization problem is equivalent to the following LP problem:
(20a)  
s.t.  (20b)  
(20c)  
(20d)  
(20e)  
(20f)  
(20g) 
Proof:
Let us compute the time-average cost of Problem 3 by using variables . Given the steady-state distribution , the probability that the user is in state is . With probability , the user is scheduled and incurs a cost of , and with probability the user stays idle and incurs a cost of . Then the time-average cost under policy can be computed by:
(21) 
For each state , the power consumed for being active is . Then, the time-average power consumed by employing policy is:
(22) 
With this equation, the power constraint (7c) can be converted into the linear constraint (20f). The constraints (20b)-(20d) can be obtained by substituting with and ; the relationship is obtained from (19). Noticing that , the inequality constraint (20e) can be obtained.
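The two linear expressions used in the proof, the time-average cost and the time-average power, can be evaluated directly for any candidate policy. The sketch below does so from a given steady-state distribution; it does not solve the LP itself. All names are hypothetical, and the per-state cost is assumed to be the age, plus the penalty when the user is scheduled.

```python
def average_cost_and_power(mu, xi, channel_pmf, power_per_state, penalty):
    """Evaluate a stationary randomized policy for one decoupled user.

    mu[x - 1] is the steady-state probability of age x, xi[x - 1][q] is the
    scheduling probability at age x in channel state q. Returns the tuple
    (average cost, average power, average activation probability).
    """
    average_age = sum(m * (x + 1) for x, m in enumerate(mu))
    activation = 0.0
    average_power = 0.0
    for m, row in zip(mu, xi):
        for q, p in channel_pmf.items():
            y = m * p * row[q]  # probability of being in (age, q) and scheduled
            activation += y
            average_power += y * power_per_state[q]
    return average_age + penalty * activation, average_power, activation
```

A policy is feasible for the decoupled problem when the returned average power does not exceed the user's power budget; the LP searches over all such policies for the one with the smallest average cost.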
So far, we have constructed an LP to obtain the optimum stationary randomized policy that minimizes the total cost of a single user with fixed Lagrange multiplier . Next, we construct the optimal stationary randomized scheduling policy that minimizes the Lagrange function for a single user. According to the threshold structure of each deterministic policy, we have the following properties of :
Corollary 2
The set of optimal scheduling probabilities possesses the following characteristics:


For any channel state :
(23a) 
For a specific AoI :
(23b)
With this corollary, we can then present the threshold structure of the stationary randomized policy:
Theorem 2
The optimal stationary randomized policy for solving the single-user scheduling problem (Problem 3) under the power consumption constraint also possesses a threshold structure, which is explained as follows:

For any channel state , there exists a threshold , such that when , it is always optimal to schedule, i.e., and when , , while the scheduling decision at the threshold may be randomized, i.e., .

The set of thresholds is nondecreasing, i.e., .
Proof:
Suppose is the set of optimum decisions and is the optimizer of the LP. If , the average power consumed by the optimum scheduling policy for the decoupled single user does not reach the power constraint. It can then be concluded that the solution to the single-user CMDP is the same as that of the decoupled single-user problem (Problem 3) without the power consumption constraint (10b). Since the optimal solution to an unconstrained MDP belongs to the class of stationary deterministic policies, it can be concluded that . Then, by Corollary 2, the sequence is increasing for fixed and the sequence is increasing for fixed , and Theorem 2 is verified.
If , the optimum policy uses up the entire power budget. Notice that Corollary 2 is similar to [21, Lemma 2], and the proof of Theorem 2 in this case can be carried out in a similar manner to [21, Theorem 5]. It can then be concluded that for all , there exists at most one state such that , and for all the other states , the scheduling probability . Thus, by Corollary 2, the threshold structure of the optimum stationary randomized policy and the increasing characteristic of the thresholds can be verified.
IV Multiuser Opportunistic Scheduling
In this section, we provide an algorithm to determine the multiplier such that the relaxed bandwidth constraint is satisfied. Then, we propose an asymptotically optimal truncated scheduling algorithm for the multiuser case that satisfies the original hard bandwidth constraint (7b).
IV-A Determination of Lagrange Multiplier
After solving the single-user problem for fixed , by combining the optimum scheduling strategies of all users, the optimal policy that minimizes the Lagrange function (9) for fixed can be obtained. Next, we describe how to obtain the optimal Lagrange multiplier so that the RB&P-Constrained AoI problem can be solved.
Let denote the Lagrange dual function, i.e.,
(24) 
Since the relaxed problem is decoupled into single-user CMDPs, the dual function can be computed by:
(25) 
Notice that by Theorem 1, the CMDP that minimizes is equivalent to an LP; hence the duality gap between and the CMDP is zero [28]. Let and denote the average AoI and the average activation probability of user , respectively. By computing the optimum resource allocation vector through solving LP (20), the dual function can be computed as follows:
(26a)  
(26b)  
(26c) 
Finally, we apply the subgradient descent method to search for the optimal Lagrange multiplier. Let be the Lagrange multiplier used in the iteration. According to [29, Eqn. 6.1.1], the subgradient at can be computed by:
(27) 
We start with ; if , then scheduling does not have to consider the relaxed bandwidth constraint. Otherwise, we adopt an iterative update. By choosing a set of step sizes similar to [7], the multiplier for the next iteration can be computed by:
(28) 
The iteration stops when .
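The multiplier search can be sketched as a projected subgradient iteration. The sketch below is not the paper's exact update: the callable `total_activation`, which should return the total average activation of all users for a given multiplier (for instance by solving each user's LP), the diminishing step size `step0 / k`, and the stopping tolerance are all assumptions made for illustration.

```python
def find_multiplier(total_activation, bandwidth, step0=1.0, tol=1e-9,
                    max_iters=1000):
    """Search for a nonnegative multiplier that balances the relaxed constraint.

    total_activation(w) is a hypothetical callable returning the total
    average number of users scheduled per slot when the penalty is w.
    """
    w = 0.0
    if total_activation(w) <= bandwidth:
        return w  # the relaxed bandwidth constraint is inactive
    for k in range(1, max_iters + 1):
        subgradient = total_activation(w) - bandwidth
        # Raise the penalty when too many users are active, lower it otherwise,
        # and project back onto the nonnegative reals.
        w_next = max(0.0, w + (step0 / k) * subgradient)
        if abs(w_next - w) < tol:
            return w_next
        w = w_next
    return w


def toy_activation(w):
    """Made-up, decreasing activation curve used only to exercise the search."""
    return max(0.0, 5.0 - 2.0 * w)
```

With the toy curve and a bandwidth of 2, the iterates approach the multiplier 1.5, at which the toy activation equals the bandwidth.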
However, since the RB&P-Constrained AoI problem is a constrained Markov decision process, its optimum scheduling policy should be a randomization between no more than two policies, each of which minimizes the Lagrange function (9). The randomization between the two policies enables us to satisfy the relaxed bandwidth constraint (8b) of the RB&P-Constrained AoI problem. Next, we describe how to obtain the optimum randomized strategy from the obtained sequence of Lagrange multipliers .
Let and be two Lagrange multipliers chosen from sequence ,
(29a)  
(29b) 
Then, let and be the total bandwidth used by the policies that minimize the function (9) under the two multipliers. Let be the solution to user n’s LP problem (20) with multiplier and be the solution with multiplier . To satisfy the relaxed bandwidth constraint, the optimum distribution of the relaxed problem is a linear combination of and , which can be computed as follows:
(30) 
where the coefficient can be computed in a similar manner to [7]:
Notice that still satisfies the constraints of the LP problem for user . Considering the structure of each Decoupled P-Constrained Cost problem, the optimum scheduling strategy for the RB&P-Constrained AoI problem is then constructed as follows:
In each slot , the central controller observes the current AoI and channel state of user ; a scheduling decision is then made with probability , which can be computed as follows:
(31) 
Finally, the minimum AoI of the RB&P-Constrained AoI problem can be computed from the optimizer , which also serves as a lower bound on the AoI performance of the primal B&P-Constrained AoI problem:
(32) 
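The randomization between two LP solutions can be illustrated with a short sketch. The formula for the mixing weight below is an assumption: it is simply the weight that makes the mixed bandwidth usage equal to the available bandwidth, given one solution that uses at least the bandwidth and one that uses at most the bandwidth. The function names and the flat-list representation of an LP solution are hypothetical.

```python
def mixing_weight(usage_a, usage_b, bandwidth):
    """Weight nu on solution A so that nu*usage_a + (1-nu)*usage_b == bandwidth."""
    if usage_a == usage_b:
        if usage_a != bandwidth:
            raise ValueError("identical usages cannot be mixed to the bandwidth")
        return 1.0
    nu = (bandwidth - usage_b) / (usage_a - usage_b)
    if not 0.0 <= nu <= 1.0:
        raise ValueError("bandwidth must lie between the two usages")
    return nu


def mix_solutions(solution_a, solution_b, nu):
    """Convex combination of two LP solutions stored as equally long lists."""
    return [nu * a + (1.0 - nu) * b for a, b in zip(solution_a, solution_b)]
```

Since the constraints of each user's LP are linear, a convex combination of two feasible solutions remains feasible, which is why the mixed solution can be used to build the scheduling probabilities.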
IV-B Multiuser Opportunistic Scheduling with Hard Constraint
In this part, we construct a truncated policy based on the optimal scheduling policy of each decoupled user and solve the primal B&P-Constrained AoI problem. Let be the optimum scheduling policy obtained in Sec. IV(A), where is the scheduling decision under the relaxed constraint, which indicates whether user is eager to be scheduled. Denote as the set of users that are eager to be scheduled. The scheduling decision under the hard bandwidth constraint is then carried out as follows:

If , i.e., the total number of users that are eager to send updates is less than the available bandwidth, then all the users that are eager to be scheduled can send their updates, i.e., .

Otherwise, if , the central controller selects a subset of users from and schedules them to send updates. The users in set that are not selected in remain idle because of the limited bandwidth.
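The two cases above can be sketched as a single truncation step. The text does not specify how the subset is chosen when too many users are eager, so the uniform random choice below is an assumption, as are the function and parameter names.

```python
import random


def truncate(eager_users, bandwidth, rng=None):
    """Apply the hard bandwidth constraint to the set of eager users.

    If the eager users fit within the bandwidth, all of them are scheduled;
    otherwise exactly `bandwidth` of them are picked uniformly at random
    (an assumed selection rule).
    """
    rng = rng or random.Random()
    eager = list(eager_users)
    if len(eager) <= bandwidth:
        return set(eager)
    return set(rng.sample(eager, bandwidth))
```

Passing a seeded `random.Random` instance makes the selection reproducible in simulations.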
Theorem 3
When the proportion of scheduling resources is kept constant, the deviation from the optimal scheduling policy for a network with users under the proposed truncated policy is . Thus, with and , the proposed truncated policy is asymptotically optimal for the primal B&P-Constrained AoI problem with the hard bandwidth constraint.
Proof:
The detailed proof is provided in Appendix C.
V Simulations
In this section, we provide simulation results to demonstrate the performance of the proposed scheduling policy. Recall from [10] that the optimal policy for minimizing AoI when all users are identical is a greedy policy that selects the user with the largest AoI. If there is no packet loss in the network, the greedy policy is equivalent to round robin, which requires a minimum power consumption of for user . In the following simulations, we measure the power consumption constraint through the ratio . A small indicates that the corresponding user has a smaller average power budget. We consider a time-varying channel with states; the distribution is assumed to be and for all users. All simulation results are obtained over a horizon of consecutive slots.
Fig. 2 studies the average AoI performance as a function of the number of users, . The power constraint factor is taken from , i.e., and the bandwidth . Denote as the total power consumed by user until slot and let be the set of users that have enough power to support transmission in slot . We compare the proposed policy with a naive greedy policy that schedules no more than users with the largest AoI from set . As can be seen from the figure, the proposed truncated scheduling policy achieves an average AoI close to the lower bound and reduces the AoI by more than 30% compared to the greedy algorithm when .
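The greedy baseline can be sketched as follows. How the set of users with enough power is determined is not reproduced here; the sketch takes that set as an input. The names `ages`, `eligible`, and `bandwidth` are hypothetical, and breaking ties by user index is an assumption.

```python
def greedy_schedule(ages, eligible, bandwidth):
    """Pick up to `bandwidth` eligible users with the largest AoI.

    ages maps a user index to its current AoI; eligible is the set of users
    that currently have enough power to transmit. Ties are broken in favor
    of the smaller user index.
    """
    ranked = sorted(eligible, key=lambda n: (-ages[n], n))
    return ranked[:bandwidth]
```

For example, with ages {0: 5, 1: 9, 2: 3, 3: 7}, eligible users {0, 2, 3}, and a bandwidth of 2, the baseline schedules users 3 and 0; user 1 has the largest age but is skipped because it lacks power.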
Fig. 3 studies the asymptotic average AoI performance as a function of the number of users, with . The power constraint of each user is selected by . As can be observed from the figure, the difference between the proposed strategy and the lower bound decreases with . The simulation results thus also verify the asymptotic performance.
We visualize the scheduling policy for some representative users in Fig. 4. The network consists of users with bandwidth , and the power constraint factor for each user is . Fig. 4(a)-(d) show users with power constraint , respectively. In Fig. 4(a)-(b), the transmission power of each user is limited, and the scheduling threshold is an increasing sequence in the channel state . Moreover, the threshold of each channel state in Fig. 4(b) is smaller than the corresponding threshold in Fig. 4(a), indicating that transmission is more likely to happen as a result of more available transmission power. The optimal strategy for a single user seeks to exploit a good channel state in order to satisfy the power constraint, while trying to keep the AoI small. If the channel state stays bad, the user keeps waiting until the data staleness can no longer be tolerated or the channel state turns good. Comparing Fig. 4(a) and (b), the scheduler tries to make full use of the transmission power through a refinement of the activation thresholds. When , power consumption is not a problem and all the channel states share an identical activation threshold that satisfies the relaxed bandwidth constraint; a user cannot be selected to send updates all the time even if it has enough power. Considering the greedy AoI performance depicted in Fig. 2, although the greedy algorithm attempts to use up the power and brings a smaller AoI to users equipped with enough power, it fails to exploit good channels to opportunistically schedule the power-constrained users, and hence leads to a much higher AoI. Thus, for a network with differently power-constrained users, the scheduling strategy varies across users according to their power constraints. The scheduler seeks good channels to carry out scheduling decisions for the power-constrained users, while users supported by enough power are updated in a timely manner that satisfies the bandwidth constraint.
VI Conclusions
In this work, we investigate age-minimization scheduling in power-constrained wireless networks, where communication channels are multi-state and time-varying and different levels of transmission power are adopted to ensure successful transmission. We decouple the multiuser scheduling problem into single-user constrained Markov decision processes. We reveal the threshold structure of the optimal stationary randomized policy for the single user and convert the optimal scheduling problem into a linear program. An asymptotically optimal truncated scheduling policy for the multiuser scenario that satisfies the hard bandwidth constraint is proposed. It is revealed that when the power of a user is very limited, the scheduler seeks to exploit good channel states while keeping the information fresh and minimizing the scheduling opportunities. Users equipped with sufficient power are updated in a timely manner that satisfies the hard bandwidth constraint.
Appendix A Proof of Lemma 1
Proof:
The threshold structure of the optimal policy that minimizes the average cost of Problem 4 is proved using insights from the discounted cost problem, where is a discount factor. Given state , the expected discounted cost starting from the state over the infinite horizon under policy can be computed as:
(33) 
Let be the minimum expected total discounted cost starting from state . Then, the minimum total discounted cost will satisfy the following equation:
(34) 
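The fixed point of the discounted Bellman recursion can be approximated numerically by value iteration on a truncated age range. The sketch below is an illustration under stated assumptions, not the paper's procedure: the one-step cost is taken to be the age, plus a scheduling penalty and a multiplier-weighted power cost when the user is scheduled; the age is capped at `max_age`; and all names and parameter values are hypothetical.

```python
def discounted_value_iteration(channel_pmf, power_per_state, penalty, lam,
                               alpha=0.9, max_age=30, iters=300):
    """Approximate the minimum expected discounted cost for each state (age, q)."""
    states = sorted(channel_pmf)
    ages = range(1, max_age + 1)
    value = {(x, q): 0.0 for x in ages for q in states}
    for _ in range(iters):
        # Expected value over the i.i.d. next channel state, per next age.
        expected = {x: sum(channel_pmf[q] * value[(x, q)] for q in states)
                    for x in ages}
        updated = {}
        for x in ages:
            # Idle: pay the age, then the age grows by one (capped at max_age).
            idle = x + alpha * expected[min(x + 1, max_age)]
            for q in states:
                # Active: pay age, penalty and weighted power; the age resets to 1.
                active = (x + penalty + lam * power_per_state[q]
                          + alpha * expected[1])
                updated[(x, q)] = min(idle, active)
        value = updated
    return value
```

The resulting table can be inspected numerically, for instance to check how the value changes with the age for a fixed channel state and with the channel state for a fixed age.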
To verify the threshold structure of the optimal policy for the total discounted cost problem, we introduce the following property of :
Lemma 3
For given discount factor and fixed channel state , the value function increases monotonically with ; for fixed , the value function is a nondecreasing function of channel state .
The details of the proof will be given in Appendix B. Finally, let us verify the threshold structure.
(1). For any channel state , there exists a threshold , such that the optimal action and .
If the optimal policy , i.e., it is better to schedule the user at state , then we can obtain the following inequality from the Bellman equation:
(35)  
By substituting Eqn. (12a) into the Bellman equation, we will have the following inequality of the value function:
(36) 
According to Lemma 3, the value function is monotonically increasing. Hence, for any , we have the following inequality:
(37) 
which implies that the optimal policy for state is to schedule the user. If at state the optimal policy is to be passive, then for state , the optimal policy satisfies