Minimizing Age of Information for Real-Time Monitoring in Resource-Constrained Industrial IoT Networks

12/16/2019
by   Qian Wang, et al.
The University of Sydney
ABB

This paper considers an Industrial Internet of Things (IIoT) system with a source monitoring a dynamic process with randomly generated status updates. The status updates are sent to a designated destination in real time over an unreliable link. The source is subject to a practical constraint of limited average transmission power. Thus, the system should carefully schedule when to transmit a fresh status update or retransmit the stale one. To characterize the timeliness of status updates, we adopt a recent concept, Age of Information (AoI), as the performance metric. We aim to minimize the long-term average AoI under the limited average transmission power at the source by formulating a constrained Markov Decision Process (CMDP) problem. To address the formulated CMDP, we recast it into an unconstrained Markov Decision Process (MDP) through Lagrangian relaxation. We prove the existence of an optimal stationary policy of the original CMDP, which is a randomized mixture of two deterministic stationary policies of the unconstrained MDP. We also exploit the characteristics of the problem to reduce the action space of each state, which significantly reduces the computational complexity. We further prove the threshold structure of the optimal deterministic policy for the unconstrained MDP. Simulation results show that the proposed optimal policy achieves lower average AoI than a random policy, especially when the system suffers from a stricter resource constraint. In addition, the influence of the status generation probability and transmission failure rate on the optimal policy and the resultant average AoI, as well as the impact of the average transmission power on the minimal average AoI, are unveiled.

I Introduction

The Internet of Things (IoT) aims to connect a massive number of devices with different objectives and functions so as to bring an unprecedented information network and achieve value increment [1]. The application of IoT technologies in industrial environments is normally referred to as the Industrial Internet of Things (IIoT) [2], which provides pervasive connectivity to sensors, actuators and controllers in Industrial Control Systems (ICS). Real-time monitoring is pivotal for IIoT, especially for manufacturing processes in industrial automation, where the controller needs to make sure every piece of equipment is under precise control. Moreover, as the first step of network intrusion detection, real-time monitoring also plays a critical role in securing the ICS [3]. In real-time monitoring, the timely delivery of system status updates from IIoT devices to the controllers is essential. As the timely update of system status monitored by IIoT devices is fundamentally different from the conventional throughput maximization and delay minimization problems, a novel performance metric, Age of Information (AoI), has been introduced in [4]. It is defined as the time elapsed since the generation time of the latest received status at the destination.

Recent years have witnessed considerable efforts on analyzing the AoI of various systems [4, 5, 6] and exploring the optimal sampling and updating policies to minimize system AoI [6, 7, 8, 9, 10, 11]. The average AoI of systems modeled by first-come-first-served (FCFS) and last-come-first-served (LCFS) queues were analyzed in [4] for both and queuing models, and in [5] for model, respectively. In [4, 5], both the cases with and without packet preemption were considered, while [4] further optimized the status generation rate at the source to minimize system average AoI.

Recent work on optimizing the AoI of various systems can be grouped into two categories, considering either a randomly generated (arrived) status update model [6, 7, 8] or a generate-at-will status update model [9, 11]. Queue management for systems with multiple sources that randomly generate status updates and share one common transmitter was investigated in [7]. Compared to queues, the single-queue technique in [7] reduces transmissions and achieves lower average AoI. A single-source system in which status updates are randomly generated according to a Bernoulli process was considered in [8]. In that system, the transmission of each status update is assumed to take a fixed number of time slots and to be error free. The corresponding optimal transmission schedule, which decides whether to skip a newly generated status update or switch to transmitting it, was determined. The optimal status update policy for a status monitoring system with the generate-at-will model was studied in [9] and was shown to be superior to the zero-wait policy in many scenarios.

In practice, IIoT devices are normally energy-constrained. As such, there is a strong demand for energy efficient policies and techniques. Rather than simply focusing on AoI minimization, power limitation of IIoT devices has been recently considered in system designs. Tradeoff between AoI and energy consumption has been derived in [6] as well as the optimal transmission policy to minimize average AoI. Energy harvesting techniques with finite battery capacity were considered in [10] for AoI minimization, where the optimal transmission policy to minimize the long-term average AoI is proved to be a renewal policy. However, the channel model of [10] was assumed to be error free as in [8], which is impractical in real applications. A very recent work [11] considered the error prone channel model and developed the optimal status update policy to minimize average AoI while taking resource constraints into account for the generate at will model. A key conclusion obtained in [11] is that, when the resource is limited, not all status updates should be transmitted. This leads to a natural question: when the status update is randomly generated, how will the status generation probability influence the long-term average AoI under the constraint of limited resource? To the best knowledge of the authors, this is still an open question.

Motivated by this gap, we consider an Industrial IoT system with an IIoT device (source) that monitors a dynamic process with randomly generated status updates and sends the status to its destination (e.g., a controller) over an unreliable link. Under a practical constraint of limited average transmission power for the IIoT device, we develop an optimal transmission scheduling policy to minimize the long-term average AoI. Considering the limited average transmission power at the source, the system needs to carefully decide whether to transmit a fresh status update or retransmit the stale one at the beginning of each time slot. Note that the AoI is jointly affected by the transmission failures and the status generation process, and the instantaneous AoI drops only when a status update is transmitted successfully, which makes the considered problem challenging. In particular, the uncertainty of status generation and transmission failure results in uncertain AoI variation. To find the optimal decision policy, we formulate the considered problem as a constrained Markov Decision Process (CMDP) problem and transform it into an unconstrained Markov Decision Process (MDP) through Lagrangian relaxation. We prove the existence of an optimal stationary policy for the original CMDP, which is a randomized mixture of two deterministic stationary policies of the unconstrained MDP. We also exploit the characteristics of our CMDP problem to reduce the action space of each state so as to reduce the computational complexity. We further prove the threshold structure of the optimal deterministic policy for the unconstrained MDP. Thanks to the identified threshold structure, only the action-shifting boundary needs to be stored, hence the memory required at the IIoT device to execute the policy is reduced.
Finally, simulation results are provided, which show that the proposed optimal policy achieves lower average AoI compared with random policy, especially when the system suffers from stricter resource constraint. Besides, the influence of status generation probability and transmission failure rate on optimal policy and the resultant average AoI, as well as the impact of average transmission power on the minimal average AoI are illustrated through numerical results.

II System Model and Problem Formulation

II-A System Model

We consider a discrete-time IIoT monitoring system where a single IIoT device (e.g., a sensor) monitors a dynamic process and transmits status updates to a destination (e.g., a controller) through an unreliable link with a constant transmission power. At the beginning of time slot , the IIoT device randomly generates a status update according to an independent and identically distributed (i.i.d.) Bernoulli process , with parameter [8], and needs to decide whether to transmit the fresh status update or retransmit the previously unsuccessful status update. Successful reception of a status update is acknowledged by a feedback signal (ACK/NACK) from the destination to the IIoT device, which is assumed to be transmitted through a perfect channel (error free and delay free) [11]. We consider that it takes a constant time to transmit an update from the IIoT device to the destination, which is assumed to be equal to the duration of one time slot for simplicity. There is no buffer at the IIoT device. Hence, once a new update is generated, the IIoT device needs to decide whether to transmit or drop the new status update. Besides, when a status transmission failure occurs over the error-prone link, the IIoT device needs to make another decision on whether to retransmit the current status or not. We define the set of the IIoT device actions as . At the beginning of each slot, the IIoT device needs to choose one action : if , the IIoT device does not transmit any status update; if , the IIoT device retransmits the previously failed update; otherwise, the IIoT device transmits a new update. Following the classical Automatic Repeat reQuest (ARQ) protocol, we assume that status update transmission failures in different time slots are independent and do not depend on the number of transmission attempts. The following equality holds for the considered transmission model,

(1)

The AoI measures data freshness at the receiver, defined as time elapsed since the latest successfully received status update was generated[4]. Denote by the generation time for latest received update, the AoI of the IIoT device at destination is defined as the random process,

(2)

Hence, the AoI decreases to the total transmission time of a status update when it is received and successfully decoded.
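The sawtooth behaviour implied by this definition can be illustrated with a minimal Python sketch. It assumes the usual form of the definition, AoI at slot t equals t minus the generation time of the freshest delivered update, and one-slot transmissions as in the system model. The function name `aoi_trace`, the `deliveries` mapping and the `initial_aoi` argument are hypothetical and introduced only for this illustration.

```python
def aoi_trace(horizon, deliveries, initial_aoi=1):
    """Return the AoI observed at the start of slots 1..horizon.

    deliveries maps the slot in which an update is successfully received
    to the slot in which that update was generated (assumed bookkeeping).
    """
    # Virtual generation time chosen so that the AoI in slot 1 is initial_aoi.
    last_gen = 1 - initial_aoi
    trace = []
    for t in range(1, horizon + 1):
        # AoI = current time minus generation time of the freshest received update.
        trace.append(t - last_gen)
        if t in deliveries:
            # A delivery completed in slot t takes effect from slot t + 1.
            last_gen = max(last_gen, deliveries[t])
    return trace


# Update generated and delivered in slot 2; update generated in slot 4,
# failed once, and delivered by a retransmission in slot 5.
print(aoi_trace(6, {2: 2, 5: 4}))
```

In this example the AoI falls to 1 after the first delivery (one transmission) and to 2 after the second (two transmissions), matching the statement that the AoI drops to the total transmission time of the delivered update.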

Now, we define the state of the system at time as . Specifically, , where denotes the AoI at the beginning of time slot . denotes the total transmission times of last transmitted status update at the beginning of time slot , when there is no status update being transmitted at previous time slot, and is the maximum allowable transmission times of each status update. And indicates whether a new status update is generated by the IIoT device: denotes the generation of a new update at the beginning of time slot , , otherwise. We have , according to the definition of AoI in (2).

In the classical ARQ protocol, a status update is retransmitted when a NACK is received at the IIoT device. Within the AoI framework, however, it is pointless to retransmit a failed, outdated update when a fresh status update is available, since the new and outdated updates suffer from the same transmission failure probability. Moreover, the limited power of the IIoT device restricts how often it can transmit. Because the average transmission power is limited, it is a waste of energy to keep retransmitting a failed, out-of-date status update when no fresh status update is generated: after a large number of retransmissions, a successful delivery no longer decreases the AoI considerably. On the other hand, a new status update might be generated in the next slot, and transmitting a fresh status update can lead to a significant AoI drop. In addition, frequently transmitting fresh status updates helps to lower the AoI at the cost of higher average transmission power. Consequently, we impose an average transmission power constraint at the IIoT device. As the IIoT device is assumed to use a constant transmission power, the average transmission power constraint is equivalent to a constraint on the average transmission probability, denoted by .

At the beginning of each time slot, the IIoT device determines whether to transmit a new status update, retransmit the failed stale status update, or remain idle, so as to minimize the average AoI under the average transmission power constraint. Define as a stationary scheduling policy that maps each state to an action either deterministically or probabilistically. Let be the set of all feasible policies. The objective of AoI minimization with limited average power can be formulated as a constrained Markov Decision Process (CMDP) [13], described by a tuple , where

  • The countable state space is as above,

  • The action space is already defined above,

  • are the transition probabilities, is the probability of moving from state to when taking action . Under the considered i.i.d. status generation model and transmission model at each slot, we have

    (3)

    To be more specific,

    (4)

    and otherwise, ,

  • is the immediate reward with the reward function of state-action pairs being defined as ,

  • is the immediate costs taking action in state . Cost function of state-action pairs is

Given initial state , the infinite-horizon average reward of any feasible policy is

(5)

Define the infinite-horizon average cost with respect to policy as

(6)

Here, denotes the expectation with respect to policy , random status generation and transmission failure. Our objective is to find the optimal policy that minimizes the average AoI under the average transmission power constraint, which can be formulated as the following problem

Problem 1.
(7)
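As a sketch only, a constrained average-AoI problem of this kind can be written in the following generic form. The symbols $\Delta_t$ (AoI in slot $t$), $a_t$ (action in slot $t$, with $a_t = 0$ standing for "remain idle"), $s_0$ (initial state), $\pi$ (policy), $\Pi$ (set of feasible policies) and $C_{\max}$ (bound on the average transmission probability) are assumed notation and may differ from the authors' own.

```latex
\begin{aligned}
\min_{\pi \in \Pi} \quad & \limsup_{T \to \infty} \frac{1}{T}\,
    \mathbb{E}^{\pi}\!\left[ \sum_{t=1}^{T} \Delta_t \,\middle|\, s_0 \right] \\
\text{s.t.} \quad & \limsup_{T \to \infty} \frac{1}{T}\,
    \mathbb{E}^{\pi}\!\left[ \sum_{t=1}^{T} \mathbb{1}\{a_t \neq 0\} \,\middle|\, s_0 \right]
    \le C_{\max}
\end{aligned}
```

The objective is the long-term average AoI and the constraint is the long-term fraction of slots in which the device transmits, which is how the text describes the average transmission power constraint under constant transmission power.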

Here, we assume that the IIoT device and the destination are synchronized at the beginning. That is, the initial state , where follows status generation model . Similar to [11, 12], we assume that the formulated problem above is always feasible and that the MDP is unichain, that is, under any stationary deterministic policy , the corresponding Markov chain has a single (aperiodic) ergodic class [13]. As the instantaneous reward in our problem satisfies the sufficient condition

(8)

to meet the growth condition [13], placing restriction to search optimal unichain policy for feasible problem ensures the existence of optimal stationary policy according to Theorem 11.7 in [13], which immediately leads to Corollary 1.

Corollary 1.

There exists an optimal stationary policy for CMDP given in Problem 1.

As we are interested in the structure of the optimal policy, we transform the formulated CMDP problem into an unconstrained MDP problem through Lagrangian relaxation as follows :

(9)

where , . The optimal policy satisfies , which achieves average AoI and average transmission probability . Given a fixed , the following theorem shows the existence of optimal stationary and deterministic policy for the unconstrained MDP problem with countable state space and finite actions [14][15][16].

Theorem 1.

There exist a constant , a bounded function and a stationary and deterministic policy , satisfies the average reward optimality equation,

(10)

, where is the optimal policy, is the optimal average reward, and is the next state after taking action .

The proof is omitted due to space limitations. Based on Theorem 1, for the unconstrained MDP problem with fixed , there exists an optimal stationary and deterministic policy. Combining Theorem 1 and Theorem 4.4 in [17], we directly obtain Corollary 2 as follows.

Corollary 2.

If , then there exists an optimal stationary deterministic policy for the CMDP given in Problem 1. Otherwise, there exists an optimal stationary policy which is a randomized mixture of two stationary deterministic policies, and , where , and , with .

Then, Problem 1 can be solved in the following three steps:

  1. Solve unconstrained MDP with to judge whether holds. If so, , stop. Otherwise go to Step 2.

  2. Search and , and solve the corresponding Lagrangian relaxed unconstrained MDP problem, where and .

  3. Compute the optimal stationary policy as a randomized mixture of and .
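Step 3 can be sketched in Python under one simplifying assumption: that the average transmission probability of the mixture interpolates linearly between those of the two deterministic policies, so that the mixing weight is chosen to meet the budget with equality. The function name `mixing_weight` and its arguments are hypothetical; the paper does not spell out how the mixture is computed.

```python
def mixing_weight(cost_low, cost_high, budget):
    """Weight on the low-cost policy so that the mixture meets the budget.

    cost_low, cost_high: average transmission probabilities of the two
    deterministic policies (cost_low <= budget <= cost_high assumed).
    budget: maximum allowed average transmission probability.
    Assumes the mixture's average cost is the convex combination of the two.
    """
    if not cost_low <= budget <= cost_high:
        raise ValueError("budget must lie between the two policy costs")
    if cost_high == cost_low:
        return 1.0  # both policies cost the same; either one is feasible
    # Solve mu * cost_low + (1 - mu) * cost_high = budget for mu.
    return (cost_high - budget) / (cost_high - cost_low)


mu = mixing_weight(cost_low=0.2, cost_high=0.5, budget=0.3)
print(mu)  # weight on the policy that transmits less often
```

With these illustrative numbers the low-cost policy receives a weight of two thirds, and the resulting mixture uses exactly the allowed average transmission probability.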

III Structural Results on Optimal Policy

In this section, we reduce action space for each state so as to reduce computation complexity, and derive two structural results of the optimal policy to gain insights into relationship between system parameters and the optimal policy. First, we establish action elimination by analyzing the property of the formulated CMDP. We then unveil the monotonicity of optimal policy in terms of the AoI and the total transmission times of last transmitted status update .

III-A Action Elimination

The state space can be classified into three categories: 1) a new status update is generated, i.e., ; 2) no new status update is generated, but the transmission of the last status update was unsuccessful and its total number of transmissions has not reached the maximum allowable number, i.e., ; and 3) no new status update is generated, while either the last transmission was successful, i.e., , or no status update was transmitted in the last slot, i.e., , or the total number of transmissions of the status update has reached , i.e., . We provide the following proposition, which helps to explain why the state space is classified as above.

Proposition 1.

There exists an optimal policy of the CMDP of Problem 1 that will take either action or when a new status update is generated, that is , and retransmit status update only when last transmission failed and transmission time does not exceed , that is , and .

The proof of the proposition follows directly by noting that there is no point retransmitting failed stale status update when a new update is generated as the transmission failure probability is the same regardless of transmission of either new or stale status. When the status update was received at time , i.e., , retransmitting received status update will not reduce AoI. Moreover, considering the case that when last transmission failed, remaining idle before retransmission will not decrease transmission failure probability but leads to marginal decrease of the AoI even when the retransmission is successful [11]. In addition, new status update may be generated after the idle period and transmitting a fresh status update can lead to significant AoI drop.

According to the analysis above, the action space of the three classes of states can be reduced. For , ; , ; otherwise, , . In the next subsection, we will apply the Proposition 1 by modifying the transition matrix P in (4). Specifically, , (transmission is not allowed); , and , . Then, we establish the monotonicity of the optimal policy for the modified CMDP.
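The reduced action sets can be summarized by a small Python helper. This is a sketch under stated assumptions: the action labels `IDLE`, `RETRANSMIT` and `TRANSMIT_NEW` and their integer values are hypothetical, and the state is passed as explicit flags (`new_update`, `last_failed`) plus the transmission count, rather than in the paper's own state encoding.

```python
# Hypothetical action labels; the numeric values are an assumption.
IDLE, RETRANSMIT, TRANSMIT_NEW = 0, 1, 2


def allowed_actions(new_update, last_failed, tx_count, max_tx):
    """Return the reduced action set for one state, following Proposition 1.

    new_update:  True if a fresh status update was generated this slot.
    last_failed: True if the previous slot carried a transmission that failed.
    tx_count:    number of times the last update has been transmitted so far.
    max_tx:      maximum allowable number of transmissions per update.
    """
    if new_update:
        # Category 1: never retransmit a stale update when a fresh one exists.
        return (IDLE, TRANSMIT_NEW)
    if last_failed and 0 < tx_count < max_tx:
        # Category 2: a failed update may still be retransmitted.
        return (IDLE, RETRANSMIT)
    # Category 3: nothing useful to send, so the device stays idle.
    return (IDLE,)


print(allowed_actions(new_update=False, last_failed=True, tx_count=2, max_tx=5))
```

Each state keeps at most two of the three actions, which is what shrinks the search in every minimization step of a value-iteration style solver.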

III-B Effect of States on Optimal Policy

To prove the monotonicity of the policy in the state space, we need the following preliminaries, given in Definition 1 and Lemma 1.

Definition 1.

(Superadditive and Subadditive[16]) A multivariable function is superadditive in for fixed parameter and , if for all and ,

(11)

holds. If the reverse inequality holds, then is subadditive in for fixed parameter and .

Lemma 1.

Suppose is subadditive on , and exists. Then

(12)

is monotone nondecreasing in . While is superadditive on , and exists. Then

(13)

is monotone nonincreasing in .

The proof is similar to the proof of Lemma 4.7.1 in [16], and hence omitted. We establish the optimality of monotone policy by proving the state-action reward function of unconstrained MDP function:

(14)

is superadditive on , for fixed parameter and , and subadditive on , for fixed parameter and . Then, the optimal policy of each state is the action that achieves the minimum value as following

(15)

is monotone.

Theorem 2.

is superadditive on , for fixed parameter and , and subadditive on , for fixed parameter and .

The proof is omitted, due to space limit. Hence, we can conclude from Theorem 2 and Lemma 1, that the optimal policy for is monotone nondecreasing in state for fixed and , and nonincreasing in state , for when and is fixed. In other words, there is an optimal threshold policy based on state for the unconstrained MDP with fixed . This structure reduces the required memory for the IIoT device as only action shifting boundary is needed to conduct the policy.
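The memory saving from the threshold structure can be illustrated with a short Python sketch. It assumes that, for a fixed transmission count and generation indicator, the policy idles below some AoI value and transmits at or above it, so one integer per pair replaces a full row of the policy table. The names `extract_thresholds` and `act`, the key layout `(tx_count, new_update)` and the example table are hypothetical.

```python
IDLE = 0  # hypothetical label for the "remain idle" action


def extract_thresholds(policy_table):
    """Compress a monotone policy table into one AoI threshold per key.

    policy_table[(tx_count, new_update)] is a list of actions for
    AoI = 1, 2, ... (assumed layout). The threshold is the smallest AoI
    at which the action is not IDLE, or None if the device always idles.
    """
    thresholds = {}
    for key, actions in policy_table.items():
        thresholds[key] = next(
            (i + 1 for i, a in enumerate(actions) if a != IDLE), None
        )
    return thresholds


def act(thresholds, aoi, tx_count, new_update, transmit_action):
    """Look up the action from the stored boundary only."""
    threshold = thresholds.get((tx_count, new_update))
    if threshold is not None and aoi >= threshold:
        return transmit_action
    return IDLE


# Illustrative table: action 2 = transmit new update, 1 = retransmit.
table = {(0, 1): [0, 0, 2, 2, 2], (1, 0): [0, 0, 0, 1, 1]}
boundary = extract_thresholds(table)
print(boundary)
```

Here ten table entries are replaced by two integers, and `act` reproduces every entry of the original table as long as the monotonicity assumption holds.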

IV Numerical Results

This section provides numerical results to illustrate the analytical results provided in the preceding sections. By following the method in[18], we apply Relative Value Iteration (RVI) on finite states ( and ) to approximate the denumerable infinite state space. We use gradient descent algorithm to calculate and , where . The structure of the optimal policy for the transformed unconstrained MDP is illustrated in Fig.1, which verifies that the optimal policy is monotone nondecreasing in state and nonincreasing in state for . As we can see from Fig. 1, the threshold structure of the optimal policy is obvious. For fixed , if , , then . Specifically, considering the state which has transmitted the stale status for times, , and achieves AoI with no new status update generated, the optimal action for the state is to remain idle (not retransmit). Then the optimal action is still to remain idle for any states that have transmitted stale status for more than times with same AoI and no fresh update generated. Similarly, for fixed , if , then ; if , then . Furthermore, when new status update is generated, the action is only determined by current AoI . Besides, as increases, the resultant policy transmits less status update.
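For readers unfamiliar with Relative Value Iteration, the following is a minimal, generic Python sketch for a finite average-cost MDP. It is not the authors' implementation: the data layout (`costs[s][a]` and `trans[s][a]` as a dictionary of next-state probabilities), the reference state `ref` and the two-state toy example are all assumptions made for illustration. In the setting of this paper, the per-stage cost of the relaxed problem would be the AoI plus the Lagrange multiplier times a transmission indicator, evaluated on the truncated state space.

```python
def relative_value_iteration(costs, trans, ref=0, tol=1e-9, max_iter=10_000):
    """Generic RVI for a finite average-cost MDP (cost minimization).

    costs[s][a]: immediate cost of action a in state s.
    trans[s][a]: dict mapping next state -> transition probability.
    Returns (average cost estimate, relative values, greedy policy).
    """
    n = len(costs)
    h = [0.0] * n
    for _ in range(max_iter):
        # One Bellman backup for every state-action pair.
        q = [
            [
                costs[s][a] + sum(p * h[s2] for s2, p in trans[s][a].items())
                for a in range(len(costs[s]))
            ]
            for s in range(n)
        ]
        backed_up = [min(row) for row in q]
        gain = backed_up[ref]  # value at the reference state estimates the gain
        new_h = [v - gain for v in backed_up]  # keep values relative to ref
        converged = max(abs(x - y) for x, y in zip(new_h, h)) < tol
        h = new_h
        if converged:
            break
    policy = [min(range(len(q[s])), key=q[s].__getitem__) for s in range(n)]
    return gain, h, policy


# Toy MDP (not the paper's model): two states, two actions each.
costs = [[1.0, 3.0], [2.0, 0.0]]
trans = [[{0: 1.0}, {1: 1.0}], [{1: 1.0}, {0: 1.0}]]
gain, h, policy = relative_value_iteration(costs, trans)
print(gain, policy)
```

In the toy example the cheapest long-run behaviour is to stay in state 0 at cost 1 per slot, and the iteration recovers both that average cost and the corresponding greedy policy.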

Fig. 1: Structural deterministic policy for (top) and (bottom) where , with . Here blue circle represents action , diamond represents and star represents .

Comparing with the structure of optimal policies of different and as indicated in Fig. 1, Fig. 2 and Fig. 3, we can see that as either transmission failure probability or status generation probability increases, the action shifting boundary tends to tilt to the right for and shift to the right for . As increases, the state of the system are more likely to come to the case of with transmission failure. Hence, without action boundary tilt for , average transmission probability will exceed , which is not allowed. Moreover, for , because of larger , transmitting a fresh update is less likely to reduce AoI. To balance the power limit and the AoI minimization, the action boundary of will move towards the right.

Fig. 2: Structural deterministic policy for (top) and (bottom) where , with . Here blue circle represents action , diamond represents and star represents .

Fig. 3: Structural deterministic policy for (top) and (bottom) where , with . Here blue circle represents action , diamond represents and star represents .

Fig. 4 illustrates the performance of the optimal policy with different transmission failure probability and status generation probability , comparing with a benchmarking random policy. Here, the random policy is to transmit status update with fixed probability to achieve maximum average transmission probability when a fresh status update is generated or no fresh status update while last transmission failed and . As such, the transmission conditions for the random policy and optimal policy of our modified CMDP are the same, which leads to a limit on average allowable transmission probability dependent on and as well as the transmission policy. When the IIoT device utilizes every chance to transmit status update, it will achieve the largest available transmission probability for fixed and . We can see from Fig. 4 that the optimal policy achieves lower average AoI, comparing to the random policy with same parameters and , when is small, which indicates the effectiveness of the optimal policy. As increases above certain threshold, the performance of random policy and optimal policy are the same. This is because when becomes no longer smaller than the largest available transmission probability, to achieve the lowest average AoI, both optimal policy and random policy will certainly transmit at each state when possible. Besides, the results show that larger transmission failure rate leads to larger average AoI, which is easy to understand.

Fig. 4: Tradeoff between and expected average AoI. The result is averaged over trials whose time horizon .

When the status update is generated frequently (larger ), the average AoI decreases, and the speed of the decrease becomes slower due to the average transmission probability limit , as shown in both Fig. 4 and Table I. This is because average transmission probability constraint makes it impossible to timely update each status update. When the status generation probability increases, the average transmission probability of fresh status update, , decreases, as shown in Table I. The action shifting boundary movement trend observed by comparing Fig. 2 to Fig. 1 verifies this as well. To satisfy the average transmission power constraint, the action shifting boundary should shift to the right for . The action boundary for will consequently tilt to the right. As such, the decrease of the average AoI becomes less remarkable as the status generation probability increases. In addition, when generation probability of status update equals , the system simplifies into the generate at will, and the problem becomes the ARQ protocol scheduling problem considered in [11] which is thus a special case in our model. Besides, from the intersection between optimal policy for the case , and optimal policy for the case , we can deduce that when is small, transmission failure probability plays an important role in AoI minimization and status generation probability is more crucial when is large. This is because when is small, even when the generation probability increases, fresh status update cannot be all transmitted, thus, the effect of generation probability is not significant comparing to the transmission failure probability. When is large, transmission of the status updates is no longer limited, larger indicates more chance to update status , which make its effect on the average AoI outweigh that of larger transmission failure probability.

 Status generation probability                          0.3   0.4   0.5   0.6   0.7   1
 Minimal AoI                                            4.01  3.53  3.33  3.21  3.10  2.99
 Avg. transmission probability of fresh status update   0.83  0.62  0.53  0.47  0.41  0.30
TABLE I: Relationship among status generation probability , average AoI and when and .

V Conclusions

In this paper, we have minimized the average Age of Information (AoI) for real-time monitoring applications of Industrial IoT, where the system status is generated randomly and transmitted under average transmission power constraint over an error prone channel. We have formulated the long-term average AoI minimization problem as an infinite time horizon Constrained Markov Decision Process with average cost criterion. We then proved that the optimal stationary policy is a randomized mixture of two deterministic monotone policies. Additionally, simulations are conducted to evaluate the influence of status generation probability and transmission failure rate on optimal policy and the resultant average AoI as well as the impact of average transmission power constraint on the average AoI. As for future work, the sampling power of each status can be included as an extra part of power consumption, and the Hybrid Automatic Repeat reQuest protocol can be applied for retransmission.

References

  • [1] C. Perera, C. Liu, and S. Jayawardena, “The emerging internet of things marketplace from an industrial perspective: A survey,” IEEE Transactions on Emerging Topics in Computing, vol. 3, no. 4, pp. 585–598, 2015.
  • [2] E. Sisinni, A. Saifullah, S. Han, et al., “Industrial internet of things: Challenges, opportunities, and directions,” IEEE Transactions on Industrial Informatics, vol. 14, no. 11, pp. 4724–4734, 2018.
  • [3] U.S. Department of Homeland Security, “Recommended Practice: Improving Industrial Control System Cybersecurity with Defense-in-Depth Strategies,” 2016.
  • [4] S. K. Kaul, R. D. Yates, and M. Gruteser, “Real-time status: How often should one update?” in IEEE INFOCOM, Orlando, FL, USA, pp. 2731–2735, Mar. 2012.
  • [5] S. Kaul, R. D. Yates, and M. Gruteser, “Status updates through queues,” in Proc. of 46th Annual Conference on Information Sciences and Systems (CISS), Princeton, NJ, USA, March 2012, pp. 1–6.
  • [6] Y. Gu, H. Chen, Y. Zhou, Y. Li, and B. Vucetic, “Timely Status Update in Internet of Things Monitoring Systems: An Age-Energy Tradeoff,” IEEE Internet of Things Journal, 2019.
  • [7] N. Pappas, J. Gunnarsson, L. Kratz, M. Kountouris, and V. Angelakis, “Age of information of multiple sources with queue management,” in IEEE International Conference on Communications (ICC), Jun. 2015, pp. 5935–5940.
  • [8] B. Wang, S. Feng, J. Y, “To Skip or to Switch? Minimizing Age of Information under Link Capacity Constraint,” arXiv preprint arXiv:1808.08698 (2018).
  • [9] Y. Sun, E. Uysal-Biyikoglu, R. D. Yates, C. E. Koksal, and N. B. Shroff, “Update or wait: How to keep your data fresh,” in IEEE INFOCOM,
  • [10] A. Arafa, J. Yang, and S. Ulukus, “Age-minimal online policies for energy harvesting sensors with random battery recharges,” in IEEE International Conference on Communications (ICC), May 2018.
  • [11] E. T. Ceran, D. Gündüz, and A. György, “Average age of information with hybrid ARQ under a resource constraint,” Wireless Communications and Networking Conference (WCNC), 2018 IEEE. IEEE, 2018.
  • [12] D. V. Djonin and V. Krishnamurthy, “MIMO transmission control in fading channels – a constrained Markov decision process formulation with monotone randomized policies,” IEEE Trans. on Signal Processing, vol. 55, no. 10, Oct. 2007.
  • [13] E. Altman, Constrained Markov Decision Processes, ser. Stochastic modeling. Boca Raton, London: Chapman & Hall/CRC, 1999.
  • [14] D. P. Bertsekas, Dynamic Programming and Optimal Control, vol. 2, 3rd ed. Belmont, MA, USA: Athena Scientific, 2011.
  • [15] X. Guo and Q. Zhu, “Average optimality for markov decision processes in borel spaces: a new condition and approach,” Journal of Applied Probability, vol. 43, no. 2, pp. 318–334, 2006.
  • [16] M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming, New York, NY, USA: Wiley, vol. 414, 2009.
  • [17] F. J. Beutler and K. W. Ross, “Optimal policies for controlled Markov chains with a constraint,” Journal of Mathematical Analysis and Applications, vol. 112, no. 1, pp. 236 – 252, 1985.
  • [18] L. I. Sennott, Stochastic dynamic programming and the control of queueing systems, vol. 504, John Wiley & Sons, 2009.