I Introduction
Maintaining timely and fresh information updates is an essential part of many emerging Internet of Things (IoT) applications, such as vehicular networks and cyber-physical system monitoring [7, 9]. To quantify such information freshness, the concept of age of information (AoI) was recently proposed [6]. In essence, the AoI is defined as the time elapsed since the most recently received update was generated at an end device.
Many recent works have addressed pertinent AoI challenges [6, 14, 13, 17, 5, 3, 1, 16]. For instance, the authors in [14] derived the optimal status update policies to minimize the average AoI for an energy harvesting sensor equipped with a battery. The work in [13] studied the optimal packet preemption policy for minimizing the average AoI over rate-limited links. In [17], the authors considered the average AoI minimization problem in a real-time IoT monitoring system, in which the IoT device incurs costs for sampling physical processes and updating its packets. The work in [5] investigated the user scheduling problem in a wireless broadcast network where a massive number of users send timely updates to a common destination through a shared channel. Transmission scheduling for AoI minimization under different automatic repeat request (ARQ) mechanisms was discussed in [3]. In contrast to the aforementioned works [14, 13, 17, 5, 3], where packets are transmitted immediately after they are generated at the sources, other works [1, 16] consider scenarios in which the packets arriving at the sources are queued before being transmitted. It was shown in [1] that a last-come first-serve (LCFS) policy achieves age-optimality for multi-server networks. A status update system with multiple sources was analyzed in [16] for two queueing disciplines, LCFS and first-come first-serve (FCFS). However, none of the aforementioned works have considered the problem of AoI and status updates for cognitive radio networks (CRNs), in which secondary users (SUs) must share the spectrum of the licensed, primary users (PUs).
The main contribution of this work lies in the characterization of the optimal update and relaying policy that jointly minimizes the average AoI of the SU and its energy consumption under a constraint on the AoI performance requirement of the PU in a CRN. We consider a CRN in which the PU and the SU need to send status update packets of their associated physical processes to a common destination. In the spectrum overlay protocol, the SU can opportunistically access the spectrum of the PU [8]. The SU can relay the packets generated at the PU in exchange for opportunities to forward its own packets to the destination through the spectrum owned by the PU. This setting differs from the work in [4], in which no relaying procedure is involved but the PU and the SU may interfere with each other. Since the relaying procedure incurs an additional AoI cost to the PU, one must carefully design the status update and relaying policy for the SU, such that the PU’s AoI requirement is satisfied. To capture the impact that the SU’s policy has on the PU’s AoI, we formulate the joint AoI and energy consumption minimization problem as a constrained Markov decision process (CMDP). We characterize two different threshold-type structures for the optimal updating and relaying policy of the SU. We show that the AoI of the PU affects the policy of the SU only when there is a packet arriving at the PU. These structures are further verified through numerical results.
II System Model
We consider a CRN consisting of one PU and one SU. Both the PU and the SU monitor their corresponding time-varying physical processes and transmit status update packets on these processes to a common destination in a timely manner over a shared noiseless channel. In our model, the PU and the SU will not transmit their packets to the destination simultaneously. Hereinafter, we use the terms “primary packet” and “secondary packet” to refer to the packet originating from the PU and the SU, respectively. We consider a time-slotted system with the slots indexed by . The transmission time of one packet is assumed to be one slot.
We assume that the packet arrival at the PU follows a Bernoulli distribution with the parameter . We define a variable such that if a primary packet arrives at the PU at the beginning of time slot , and otherwise. Thus . Since there are no buffers at the PU, a primary packet is transmitted immediately upon arrival. At the beginning of each time slot, the SU must decide whether to occupy the shared channel. When there is no primary packet arriving at the PU, the SU can occupy the shared channel freely and update the status of its physical process on demand. If the SU wants to generate and transmit a secondary packet to the destination upon the arrival of a primary packet, it will have to receive the primary packet from the PU first and then forward it to the destination in the next time slot. Hence, a primary packet that is relayed by the SU takes two time slots to reach the destination. Therefore, we need to carefully design the updating process of the SU to improve its AoI performance, while still guaranteeing the AoI performance of the PU.
II-A Age of Information Model
We use the AoI metric to measure the information timeliness from the perspective of the destination. The AoI of the PU and the SU at time slot will be given by and , respectively. Then, the evolution of the AoI of the PU can be expressed as follows.
(1) 
The AoI would decrease to one once a primary packet is transmitted directly to the destination, because it takes one slot to complete the transmission, and two if transmitted by the SU due to the additional slot needed for relaying as shown in Fig. 1(a). Similarly, we can write the evolution of the SU’s AoI as follows.
(2) 
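The AoI dynamics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `step_aoi`, its argument names, and the encoding of the SU's decision as `action` (1 for updating, 0 for staying silent) are hypothetical and are not notation from the paper. It also assumes that the SU's AoI drops to one whenever the SU transmits its own packet, and that the SU cannot send its own packet in the slot in which it forwards a buffered primary packet.

```python
def step_aoi(age_pu, age_su, arrival, relay_pending, action):
    """Advance both ages by one slot.

    age_pu, age_su : AoI of the PU and the SU at the start of the slot
    arrival        : 1 if a primary packet arrives in this slot, else 0
    relay_pending  : True if the SU buffered a primary packet in the previous slot
    action         : 1 if the SU sends its own update, 0 if it stays silent
    Returns (next_age_pu, next_age_su, next_relay_pending).
    """
    if relay_pending:
        # The slot carries a primary packet either way (assumed: SU cannot update).
        # A fresh arrival is sent directly by the PU (age 1) and the buffered copy
        # is dropped; otherwise the SU forwards the buffered packet (age 2).
        return (1 if arrival else 2), age_su + 1, False
    if action == 1:
        # The SU updates its own status; an arriving primary packet is buffered
        # at the SU and must be relayed in the next slot.
        return age_pu + 1, 1, bool(arrival)
    # The SU is silent; the PU transmits directly if it has a packet.
    return (1 if arrival else age_pu + 1), age_su + 1, False
```

For instance, calling `step_aoi(3, 5, 1, False, 1)` models the SU updating while a primary packet arrives, and feeding the result into a second call with `relay_pending=True` models the relaying slot.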
III Problem Formulation
III-A CMDP Formulation
Because the channel to the destination is shared, the AoI of the PU and the AoI of the SU are strongly correlated. If the SU decides to generate and transmit its own packet while receiving the primary packet into its buffer at time slot , the PU suffers an additional AoI cost compared with transmitting to the destination directly. Thus, the policy of the SU must balance the fundamental tradeoff between its own performance and that of the PU. We next cast this problem as a CMDP to allow an AoI performance guarantee for the PU. The components of the CMDP are given as follows.

States: The system state at time slot is defined by a 4-tuple, i.e. , where denotes the system state space. Here means that there is a primary packet at the SU at the beginning of time slot , which was transmitted from the PU in the previous time slot . Note that if , because the SU will relay the primary packet immediately after it transmits its own packet;

Actions: There are three types of states, each of which is associated with a different action set. (i) The first type is , for which there is no primary packet arriving at the PU, and the SU can freely choose to generate and transmit a secondary packet or stay silent. (ii) The second type is . In this case, the SU can generate and transmit a secondary packet while the primary packet is transmitted to the SU, or the SU stays silent and the PU transmits to the destination directly. (iii) The third type is . In this case, either the SU transmits the buffered primary packet, or, if there is a newly arrived primary packet, the PU transmits it and the primary packet at the SU is discarded because it is staler than the newly arrived one, as seen in Fig. 1(b). The action taken in time slot is denoted by . For simplicity, we use to indicate the two different actions. Action means that the SU updates its status. For a second-type state, the SU receives the primary packet from the PU into its buffer while transmitting the secondary packet. Action means that the SU keeps silent. In that case, the PU transmits a primary packet to the destination if there is one;

Transition Probabilities: Let be the probability that the state changes to when taking action in state . The derivation of the transition probabilities is straightforward and mainly depends on the random packet arrival process at the PU. Due to space limitations, we only give the transition probabilities for the second-type states as an example:
(3a) (3b) (3c) (3d)
Cost: The immediate cost function is defined as the weighted sum of the AoI of the SU and the energy consumption:
(4)
where is a constant to balance these two terms and is the energy cost needed to generate a secondary packet.
We can now formulate our joint AoI and energy consumption minimization problem as an infinite horizon average cost CMDP, given as follows.
(5a)  
(5b) 
where the expectation is taken with respect to a certain policy, and represents the maximum tolerance of expected AoI cost to the PU per time slot. Here, (), only when and (otherwise). Obviously, when at least one of and equals zero, the PU will not be affected by the SU at all. When , namely, a primary packet arrives and is relayed to the SU, the additional cost to the PU is exactly its current AoI. Hence constraint (5b) properly captures the effect caused by the policy at the SU.
III-B CMDP Relaxation via Lagrange Method
To obtain the optimal policy for the CMDP in (5), we adopt the Lagrange method to transform the CMDP into an unconstrained MDP parameterized by introducing a Lagrange multiplier . The Lagrange cost function at time slot is defined as
(6) 
Accordingly, we have the unconstrained MDP with the optimal objective value denoted by , i.e.
(7) 
The relation between the optimal value of the CMDP in (5) and the optimal value of the relaxed unconstrained MDP in (7) can be expressed as follows [10].
(8) 
There is also a precise relation between the optimal policy of the CMDP and the optimal policy of the unconstrained MDP. According to [10], and restricting attention to stationary policies, the following lemma holds for a CMDP with a single constraint.
Lemma 1.
The optimal policy for the CMDP in (5) can be expressed as a mixture of two deterministic stationary policies and , namely,
(9) 
where is a randomization parameter and is the optimal policy for the unconstrained MDP problem with the Lagrange multiplier .
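As a rough illustration of how the randomization parameter in Lemma 1 might be chosen, the sketch below picks the weight so that the mixed policy meets the PU constraint with equality. It assumes that the long-run average constraint costs of the two deterministic policies have already been evaluated and that the constraint cost of the mixture is the corresponding weighted average. The function name `mixing_parameter` and the arguments `d1`, `d2`, and `d_max` are hypothetical and do not come from the paper.

```python
def mixing_parameter(d1, d2, d_max):
    """Weight mu on policy 1 such that mu * d1 + (1 - mu) * d2 == d_max.

    d1, d2 : average constraint costs of the two deterministic policies
             (assumed to be evaluated beforehand)
    d_max  : maximum tolerated average constraint cost
    """
    if d1 == d2:
        # Both policies incur the same constraint cost, so any weight works.
        return 1.0
    mu = (d_max - d2) / (d1 - d2)
    # Clip to [0, 1]; outside this range no mixture meets the bound exactly.
    return min(1.0, max(0.0, mu))
```

With `d1 = 2.0`, `d2 = 4.0`, and `d_max = 3.0`, the two policies are weighted equally.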
In practice, and can be calculated through iterative Lagrange multiplier estimation methods such as the Robbins–Monro algorithm [11].
IV Structures of Optimal Policies and Algorithm Design
In this section, we focus on the unconstrained MDP in (7). According to [2], the following lemma holds.
Lemma 2.
Given the Lagrange multiplier , the Bellman equation can be expressed as follows,
(10) 
where is the value function, which reflects a relative (differential) cost for each state , and is the next state of .
The Bellman equation can be solved using the relative value iteration with an arbitrary but fixed reference state ,
(11) 
Lemma 3.
Given the Lagrange multiplier , the value function is nondecreasing in and , and can be decoupled as:
(12) 
Moreover, for the third type state, is a constant value for all , and so is .
Proof.
All proofs can be found in our technical report [15]. ∎
The properties of the value function can be used to show the following threshold-based structures of the optimal policy.
Theorem 1.
The optimal policy exhibits the following threshold-based structures:

For the first state type, i.e. , the optimal policy is of threshold type. Moreover, it is independent of the AoI of the PU, namely, there exists a constant such that
(13) 
For the second state type, i.e., , if it is optimal to generate an update packet and relay the primary packet for the PU in state , then it is still optimal to choose this action for state . This is known as a switch-type structure.
(14)
The above theorem reveals properties that can be used to speed up the conventional value iteration algorithm. Normally, we have to calculate the state-action value functions for every state. Many of these calculations can be omitted by simply comparing the AoI of the PU and the SU with those of states whose optimal action has already been determined in the current iteration.
To cope with the countably infinite state space, we construct an approximate MDP with a finite number of states by truncating the AoI at a predetermined value . The state set of the approximate MDP is denoted by , where . The overall steps for solving the unconstrained MDP are summarized in Algorithm 1. Note that the optimal policies of the approximate MDP and the original MDP are guaranteed to be asymptotically identical when [5].
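A minimal sketch of plain relative value iteration on a truncated state space is given below. It is not the structure-aware Algorithm 1, and several ingredients are assumptions made for illustration only: the state encoding `(age_pu, age_su, arrival, buffered)`, the Lagrange cost taken as the SU's AoI plus `omega * energy` when the SU updates plus `lam` times the PU's AoI when an arriving primary packet is relayed, the truncation bound `n_max`, the reference state `(1, 1, 0, 0)`, and the rule that the SU is forced to stay silent while it holds a buffered primary packet. All function and parameter names are hypothetical.

```python
from itertools import product


def transitions(state, action, p, n_max):
    """Return [(prob, next_state)] for state = (age_pu, age_su, arrival, buffered)."""
    age_pu, age_su, arrival, buffered = state

    def up(x):
        # Ages are truncated at n_max (n_max >= 2 is assumed).
        return min(x + 1, n_max)

    if buffered:      # third type: the slot carries a primary packet
        nxt = (1 if arrival else 2, up(age_su), 0)
    elif action == 1:  # SU updates; an arriving packet is buffered at the SU
        nxt = (up(age_pu), 1, arrival)
    else:              # SU silent; PU sends directly if it has a packet
        nxt = (1 if arrival else up(age_pu), up(age_su), 0)
    # The next arrival indicator is Bernoulli(p), independent of the past.
    return [(1 - p, (nxt[0], nxt[1], 0, nxt[2])),
            (p, (nxt[0], nxt[1], 1, nxt[2]))]


def lagrange_cost(state, action, lam, omega, energy):
    """Assumed per-slot Lagrange cost (see the lead-in for the assumptions)."""
    age_pu, age_su, arrival, _ = state
    penalty = age_pu if (arrival == 1 and action == 1) else 0
    return age_su + omega * energy * action + lam * penalty


def relative_value_iteration(p, lam, omega, energy, n_max=10,
                             tol=1e-8, max_iter=2000):
    """Return (average cost estimate, relative values, greedy policy)."""
    ages = range(1, n_max + 1)
    states = list(product(ages, ages, (0, 1), (0, 1)))
    ref = (1, 1, 0, 0)                      # fixed reference state
    h = {s: 0.0 for s in states}
    gain, policy = 0.0, {}
    for _ in range(max_iter):
        q = {}
        for s in states:
            actions = (0,) if s[3] else (0, 1)   # forced silence when buffered
            q[s] = {a: lagrange_cost(s, a, lam, omega, energy)
                       + sum(pr * h[s2]
                             for pr, s2 in transitions(s, a, p, n_max))
                    for a in actions}
        gain = min(q[ref].values())          # average cost estimate
        h_new = {s: min(q[s].values()) - gain for s in states}
        policy = {s: min(q[s], key=q[s].get) for s in states}
        diff = max(abs(h_new[s] - h[s]) for s in states)
        h = h_new
        if diff < tol:
            break
    return gain, h, policy
```

The returned `policy` dictionary can be inspected per state type, for example by listing the actions over `age_su` for a fixed `age_pu` with `arrival = 0`, to check whether a threshold pattern appears under these assumptions.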
V Numerical Results
In this section, we evaluate the performance of the proposed scheme using numerical simulations. We first verify the structures of the optimal policy in Theorem 1. The parameters are set as follows. , , and . We take as a small example to illustrate the structures. From Fig. 2(a), we can see that, for the first type state , the optimal policy does not depend on the AoI of the PU: the SU simply generates and transmits a secondary packet if its AoI is larger than , and otherwise keeps silent in this time slot to save energy. Fig. 2(b) shows the structure of the optimal policy for the second type state . It clearly shows that the threshold which determines the decision of the SU increases with the AoI of the PU. This is because action essentially transmits the secondary packet by sacrificing one transmission opportunity of the PU. Thus it is cost-effective only when the status information of the SU at the destination is stale enough.
To benchmark our proposed policy, we compare it with a baseline policy. Under the baseline policy, the SU transmits a secondary packet whenever there is no primary packet at the PU and keeps silent whenever there is one. This guarantees that the PU has a higher priority. Such a baseline policy is similar to the zero-waiting policy in [12]. We set .
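The baseline policy described above can be simulated with a short Monte Carlo sketch. The per-slot cost is assumed here to be the SU's AoI at the start of the slot plus `omega * energy` whenever the SU transmits; this cost form, the function name `simulate_baseline`, and all parameter names are hypothetical choices for illustration and are not the paper's simulation code.

```python
import random


def simulate_baseline(p, omega, energy, n_slots=100_000, seed=0):
    """Simulate the baseline policy and return (average cost, average PU AoI).

    p      : probability that a primary packet arrives in a slot
    omega  : weight of the energy term in the cost (assumed cost form)
    energy : energy needed to generate one secondary packet
    """
    rng = random.Random(seed)
    age_pu, age_su = 1, 1
    total_cost = 0.0
    total_age_pu = 0
    for _ in range(n_slots):
        arrival = rng.random() < p
        # Baseline rule: the SU transmits only when the PU has no packet.
        action = 0 if arrival else 1
        total_cost += age_su + omega * energy * action
        total_age_pu += age_pu
        # The PU always transmits directly, so relaying never occurs.
        age_pu = 1 if arrival else age_pu + 1
        age_su = 1 if action == 1 else age_su + 1
    return total_cost / n_slots, total_age_pu / n_slots
```

Sweeping `p` over a grid and recording the first return value gives a cost curve for the baseline under these assumptions.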
As we can see in Fig. 3, the proposed policy achieves a lower cost up to than the baseline. Moreover, the performance of the baseline policy is significantly affected by the parameter because, unlike the proposed policy, it lacks dynamic decision making.
VI Conclusions
In this paper, we have studied the optimal update and relaying policy for the SU in a CRN. To optimize the AoI performance for both the PU and the SU, we have formulated a CMDP problem and have analyzed the structures of the optimal policy for the relaxed problem by showing the properties of the value function. It has been shown that the optimal policy is of threshold type and switch type in two different system states, respectively. We have also put forward a structure-aware value iteration algorithm that exploits the discovered structures. Numerical simulations show the effectiveness of the proposed policy over a baseline policy.
References
 [1] (2019, early access) Minimizing the age of information through queues. IEEE Trans. Inf. Theory.
 [2] (2012) Dynamic programming and optimal control, 4th ed., vol. II. Athena Scientific.
 [3] (Mar. 2019) Average age of information with hybrid ARQ under a resource constraint. IEEE Trans. Wireless Commun. 18 (3), pp. 1900–1913. ISSN 1536-1276.
 [4] (Aug. 2019) Minimizing age of information in cognitive radio-based IoT systems: underlay or overlay?. arXiv preprint arXiv:1903.06886.
 [5] (Jun. 2017) Age of information: design and analysis of optimal scheduling algorithms. In Proc. of IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, pp. 561–565. ISSN 2157-8117.
 [6] (Mar. 2012) Real-time status: how often should one update?. In Proc. of IEEE International Conference on Computer Communications (INFOCOM), Orlando, FL, USA, pp. 2731–2735. ISSN 0743-166X.
 [7] (May 2012) Cyber–physical systems: a perspective at the centennial. Proc. IEEE 100 (Special Centennial Issue), pp. 1287–1308. ISSN 0018-9219.
 [8] (Dec. 2008) Resource allocation for spectrum underlay in cognitive radio networks. IEEE Trans. Wireless Commun. 7 (12), pp. 5306–5315. ISSN 1536-1276.
 [9] (2019, to appear) A vision of 6G wireless systems: applications, trends, technologies, and open research problems. IEEE Network.
 [10] (Jan. 1993) Constrained average cost Markov decision chains. Probab. Eng. Inf. Sci. 7 (1), pp. 69–83.
 [11] (2005) Introduction to stochastic search and optimization: estimation, simulation, and control. Vol. 65, John Wiley & Sons.
 [12] (Nov. 2017) Update or wait: how to keep your data fresh. IEEE Trans. Inf. Theory 63 (11), pp. 7492–7508. ISSN 0018-9448.
 [13] (2019, to appear) When to preempt? age of information minimization under link capacity constraint. IEEE J. Commun. Netw.
 [14] (Mar. 2018) Optimal status update for age of information minimization with an energy harvesting source. IEEE Trans. Green Commun. and Netw. 2 (1), pp. 193–204. ISSN 2473-2400.
 [15] (2019) Age of information analysis for dynamic spectrum sharing. Note: www.dropbox.com/s/plx713rjj806e32/AoICRNs_GlobalSip2019.pdf?dl=0
 [16] (Mar. 2019) The age of information: real-time status updating by multiple sources. IEEE Trans. Inf. Theory 65 (3), pp. 1807–1827. ISSN 0018-9448.
 [17] (Dec. 2018) Optimal sampling and updating for minimizing age of information in the internet of things. In Proc. of IEEE Global Communications Conference (GLOBECOM), Abu Dhabi, UAE, pp. 1–6. ISSN 2576-6813.