When to Preempt? Age of Information Minimization under Link Capacity Constraint

12/11/2018 ∙ by Boyu Wang, et al.

In this paper, we consider a scenario where a source continuously monitors an object and sends time-stamped status updates to a destination through a rate-limited link. We assume updates arrive randomly at the source according to a Bernoulli process. Due to the link capacity constraint, it takes multiple time slots for the source to complete the transmission of an update. Therefore, when a new update arrives at the source during the transmission of another update, the source needs to decide whether to skip the new arrival or to switch to it, in order to minimize the expected average age of information (AoI) at the destination. We start with the setting where all updates are of the same size, and prove that within a broadly defined class of online policies, the optimal policy should be a renewal policy, and has a sequential switching property. We then show that the optimal decision of the source in any time slot has threshold structures, and only depends on the age of the update being transmitted and the AoI at the destination. We then consider the setting where updates are of different sizes, and show that the optimal Markovian policy also has a multiple-threshold structure. For each of the settings, we explicitly identify the thresholds by formulating the problem as a Markov Decision Process (MDP) and solving it through value iteration. Special structural properties of the corresponding optimal policy are utilized to reduce the computational complexity of the value iteration algorithm.

I Introduction

Enabled by the proliferation of ubiquitous sensing devices and pervasive wireless data connectivity, real-time status monitoring has become a reality in large-scale cyber-physical systems, such as power grids, manufacturing facilities, and smart transportation systems. However, the unprecedentedly high dimensionality and generation rate of the sensing data also impose critical challenges on its timely delivery. In order to measure and ensure the freshness of the information available to the monitor, a metric called Age of Information (AoI) has been introduced and analyzed in various status updating systems [2]. Specifically, at time $t$, the AoI in the system is defined as $\Delta(t) = t - U(t)$, where $U(t)$ is the time stamp of the latest update received at the destination. Since AoI depends on data generation as well as queueing and transmission, it is fundamentally different from traditional network performance metrics, such as throughput and delay.
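As a toy illustration of this definition (not from the paper; the sample path below is invented), the following Python sketch computes $\Delta(t) = t - U(t)$ from a list of (time stamp, delivery slot) pairs:

    # Hypothetical sample path: each pair is (U, t_d) = (time stamp of the
    # update, slot in which it is delivered). Values invented for illustration.
    deliveries = [(2, 5), (9, 12), (15, 21)]

    def aoi(t, deliveries):
        """AoI Delta(t) = t - U(t), where U(t) is the time stamp of the latest
        update delivered no later than t (an initial update with time stamp 0
        is assumed to be in place at t = 0)."""
        latest = 0
        for u, t_d in deliveries:
            if t_d <= t:
                latest = max(latest, u)
        return t - latest

    print([aoi(t, deliveries) for t in range(22)])
    # The AoI grows by 1 each slot and drops at each delivery,
    # e.g. to 5 - 2 = 3 at t = 5.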

By modeling the status updating process as a queueing process, the time-average AoI has been analyzed in systems with a single server [2, 3, 4, 5, 6, 7, 8, 9, 10] and with multiple servers [11, 12, 13, 14, 15]. The Peak Age of Information (PAoI) has been introduced and studied in [16, 17, 18]. The optimality properties of a preemptive Last Generated First Served service discipline are identified in [19].

AoI minimization has also been investigated, either by controlling the generation process of the updates [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30], or by scheduling the transmission of updates that have already been generated [31, 32, 33, 34, 35, 36]. The optimal status updating policy with knowledge of the server state has been studied in [20]. AoI-optimal sampling of a Wiener process is investigated in [21]. Under energy harvesting settings, optimal status updating has been studied in [22, 23, 24, 25, 26, 27, 28, 29, 30]. Transmission scheduling for AoI minimization has been studied for broadcast channels [31, 32, 33, 34] and for multiple access channels [35]. Age-optimal link scheduling in a multiple-source system with conflicting links is studied in [36], where the problem is shown to be NP-complete in general.

Recently, a few works have started to investigate the impact of service preemption on the time-average AoI in various status updating systems. The common assumption is that new updates may arrive at the source while it is serving (i.e., transmitting) another update. The source thus needs to decide whether to drop the new update or the old one in order to optimize the AoI. The time-average AoI is analyzed and compared under different preemption policies in different systems [37, 38, 39, 40, 41, 42]. In [37], it is shown that preemption achieves a lower AoI in the considered system. The work in [38] indicates that when the service time follows a Gamma distribution, preemption may not be the best choice. In [39], the time-average AoI is studied when the source either always preempts or never preempts, and it is shown that when updates are transmitted over an erasure channel, not preempting achieves a smaller AoI. In [40], the time-average AoI is considered under different preemption policies in an energy harvesting status updating system, where the service time is assumed to be exponential and both the update generation and energy harvesting processes are assumed to be Poisson. It derives the time-average AoI when the server always preempts the update in service upon a new arrival, and numerically compares it with the average AoI when the server never preempts, showing that the former is better in the "energy rich" regime. In [41], the focus is on stationary Markov and randomized policies that depend on the instantaneous AoI, and it is shown that dropping the new or the old update is optimal for certain service time distributions. In [42], the optimal policy for AoI minimization over an error-prone channel under the hybrid automatic repeat request (HARQ) protocol is studied, assuming that the probability of successful decoding increases with the number of retransmissions; the problem is formulated as a constrained MDP and solved numerically.

In this paper, we investigate age-optimal online transmission scheduling for a single link under the assumption that the link capacity is limited and each update takes multiple time slots to transmit. During the transmission of an update, new updates may arrive. We assume the size of each arriving update is revealed to the source immediately, so the source knows exactly how many time slots it takes to deliver the update. The source then has to decide whether to switch to the new arrival, or to continue its current transmission and drop the new update, based on the causally known update arrival profile (sizes and arrival times). We consider two possible scenarios, based on different assumptions about the sizes of the updates:

1) Updates of uniform size. In this case, the transmission time of each update is fixed and the AoI is reset to the same value once an update is delivered successfully. We first prove that within a broadly defined class of online policies, the optimal policy should be a renewal policy, and the decision-making over each renewal interval only depends on the arrival times of the updates in that interval. Then, we show that the optimal renewal policy has a multiple-threshold structure, which enables us to formulate the problem as an MDP and identify the thresholds numerically through structured value iteration.

2) Updates of non-uniform sizes. In this case, each update may take a different number of time slots to transmit, and the AoI is thus reset to different values when updates are delivered. Therefore, the optimal policy depends on the arrival times of the updates as well as their sizes. To make the problem tractable, we restrict our attention to stationary Markovian policies under which the decision of the transmitter depends on the AoI at the destination, the age of the update being transmitted, and the size of the new update. We show that the optimal policy exhibits threshold structures along different dimensions of the system state, and we propose a structured value iteration algorithm to solve for the thresholds numerically.

The remainder of the paper is structured as follows: In Section II, we describe the system model and problem formulation. In Sections III and IV, we consider the cases with updates of uniform size and of non-uniform sizes, respectively. We evaluate the proposed policies numerically in Section V and conclude in Section VI.

II System Model and Problem Formulation

We consider a single-link status monitoring system where the source keeps sending time-stamped status updates to a destination through a rate-limited link. We assume the time axis is discretized into time slots, labeled as $t = 1, 2, \ldots$. At the beginning of each time slot, an update packet is generated and arrives at the source according to an independent and identically distributed (i.i.d.) Bernoulli process with parameter $p$. We consider the scenario where the size of each update is large compared with the link capacity, so that it takes multiple time slots to transmit. The distribution of the transmission time of the updates will be specified later. Similar to [31, 33, 34], we assume that at most one update can be transmitted during each time slot, and there is no buffer at the source to store any updates other than the one being transmitted. Therefore, once an update arrives, the source needs to decide whether to transmit it and drop the one being transmitted (if any), or to drop the new arrival.

A status update policy is denoted as $\pi$, which consists of a sequence of transmission decisions $\{a_t\}$, $a_t \in \{0, 1\}$. We let $\lambda_t = 1$ if a new update arrives at the beginning of time slot $t$, and $\lambda_t = 0$ otherwise. Specifically, when $\lambda_t = 1$, $a_t$ can take both values 1 and 0: If $a_t = 1$, the source will start transmitting the new arrival in time slot $t$ and drop the unfinished update if there is one. We term this a switch. Otherwise, if $a_t = 0$, the source will drop the new arrival, and continue transmitting the unfinished update if there is one, or stay idle otherwise. We term this a skip. When $\lambda_t = 0$, we can show that dropping the update being transmitted is sub-optimal. Thus, we restrict our attention to policies under which $a_t$ can then only take value 0, i.e., the source continues transmitting the unfinished update if there is one, or stays idle.

Let $D_i$ be the time slot in which the $i$th update is completely transmitted to the destination. Then, the inter-update delays can be denoted as $L_i = D_i - D_{i-1}$, for $i = 1, 2, \ldots$. Without loss of generality, we assume $D_0 = 0$. We note that since update arrivals are either dropped or transmitted immediately, the AoI after a completed transmission is reset to the transmission time of the delivered update. An example sample path of the AoI evolution under a given status update policy is shown in Fig. 1. As illustrated, some updates are skipped when they arrive, while others are transmitted partially or completely.

Fig. 1: AoI evolution with equal transmission time $K$. Circles represent transmitted updates, and crosses represent skipped ones. The red dashed curve indicates the transmitted portion of the corresponding update.
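To make these dynamics concrete, the following Python sketch simulates the uniform-size version of this model and returns the time-average AoI under a given skip/switch rule. The interface and parameter values are our own illustrative choices, and the convention that an idle source always starts a new arrival anticipates Lemma 1 in Section III:

    import random

    def simulate(policy, p=0.4, K=3, horizon=100_000, seed=0):
        """Average AoI of the single-link model under a skip/switch rule.
        policy(d, u, K) returns 1 (switch to the new arrival) or 0 (skip);
        d = AoI at the slot start, u = slots already spent on the update in
        service (u = 0 means the source is idle)."""
        rng = random.Random(seed)
        d, u = K, 0                      # start as if an update was just delivered
        total = 0
        for _ in range(horizon):
            total += d
            arrival = rng.random() < p   # Bernoulli(p) arrival at the slot start
            if arrival and (u == 0 or policy(d, u, K) == 1):
                u = 1                    # (re)start service on the new arrival
            elif u > 0:
                u += 1                   # continue the update in service
            if u == K:                   # transmission completes in this slot
                d, u = K, 0              # AoI resets to the transmission time K
            else:
                d += 1
        return total / horizon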

We use $N_T$ to denote the total number of successfully delivered status updates over $(0, T]$. Define $R_T := \sum_{t=1}^{T} \Delta(t)$ as the cumulative AoI experienced by the system over $(0, T]$, and denote $R_i$ as the total AoI experienced by the receiver over the $i$th epoch $(D_{i-1}, D_i]$. Then, $R_T$ can be decomposed epoch by epoch as $\sum_{i=1}^{N_T} R_i$ plus the AoI accumulated after the last delivery.

We focus on a set of online policies $\Pi$, in which the information available for determining $a_t$ includes the decision history $\{a_s\}_{s < t}$, the update arrival profile up to time $t$, as well as the statistics of the update arrivals (i.e., the arrival rate and the distribution of update sizes). The optimization problem can be formulated as

$$\min_{\pi \in \Pi}\ \limsup_{T \to \infty}\ \frac{1}{T}\,\mathbb{E}\left[ R_T \right] \qquad (1)$$

where the expectation in the objective function is taken over all possible update arrival sample paths.

III Updates of Uniform Size

In this section, we focus on the scenario where all updates are of the same size, so that the required transmission time of each update equals $K$ time slots, where $K \geq 2$ is an integer.

III-A Structure of the Optimal Policy

Consider the $i$th epoch, i.e., the duration between time slots $D_{i-1}$ and $D_i$, under any online policy in $\Pi$. Let $t_j$ be the time slot in which the $j$th update after $D_{i-1}$ arrives, and let $t_0 = D_{i-1}$. Denote the update arrival profile in epoch $i$ as $\mathcal{A}_i = \{t_1, t_2, \ldots\}$. Then, we introduce the following definition.

Definition 1 (Uniformly Bounded Policy)

Under an online policy $\pi$, if there exists a function $g(\cdot)$ such that for any arrival profile $\mathcal{A}_i$, the length of the corresponding epoch is upper bounded by $g(\mathcal{A}_i)$, for all $i$, then this policy is a uniformly bounded policy.

Denote the subset of uniformly bounded policies as $\Pi_b$. Then, we have the following theorem.

Theorem 1

Any uniformly bounded policy is sub-optimal to some renewal policy, i.e., a policy under which $\{D_i\}$ forms a renewal process. Besides, the decision over the $i$th renewal epoch depends only causally on $\mathcal{A}_i$.

The proof of Theorem 1 is provided in Appendix A. Based on Theorem 1, in the following, we focus on renewal policies whose decisions depend only on $\mathcal{A}_i$.

Lemma 1

If the source is idle when an update arrives, it should start transmitting the update immediately.

Lemma 1 can be proved through contradiction: if the source skips an update when it is idle, we can show that by switching to the new update instead, the AoI can always be improved statistically; thus the original policy cannot be optimal. The detailed proof is omitted for brevity.

Definition 2 (Sequential Switching Policy)

A sequential switching (SS) policy is a renewal policy under which the source switches to an update arriving at time slot $t_j$ only if it has switched to every update arrival prior to $t_j$ in the same epoch.

Remark: The definition of an SS policy implies that once the source skips a new update arrival, it will skip all upcoming update arrivals until it finishes the update currently being transmitted. We point out that an SS policy is in general different from a threshold-type policy, as it does not impose any threshold structure on when the source should skip or switch to a new update arrival. We have the following observations.

Lemma 2

The optimal renewal policy in $\Pi_b$ is an SS policy.

Lemma 3

Under the optimal SS policy in $\Pi_b$, if the source is transmitting an update that arrived in the $j$th time slot of an epoch when a new update arrives, then there exists a threshold $\tau_j$, which depends on $j$ only, such that if the new update arrives before or at the $\tau_j$th time slot of that epoch, the source will switch to the new arrival; otherwise, it will skip the new arrival and complete the current transmission.

The proofs of Lemma 2 and Lemma 3 are provided in Appendix B and Appendix C, respectively. They proceed by contradiction: if the optimal policy did not exhibit the SS and threshold structures, respectively, we could always construct another renewal policy that achieves a smaller average AoI. The alternative policy is constructed carefully so that the first moment of the length of each renewal interval stays the same as under the original policy while the second moment is strictly reduced, thus achieving a smaller expected average AoI.
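To see why matching the first moment while shrinking the second moment lowers the average AoI, note the following renewal-reward computation (a sketch under the uniform-size assumption, where the AoI resets to $K$ after each delivery and the epoch length is $L_i$):

$$\bar{\Delta} \;=\; \frac{\mathbb{E}[R_i]}{\mathbb{E}[L_i]} \;=\; \frac{\mathbb{E}\big[\sum_{n=0}^{L_i - 1} (K + n)\big]}{\mathbb{E}[L_i]} \;=\; K \;+\; \frac{\mathbb{E}[L_i^2] - \mathbb{E}[L_i]}{2\,\mathbb{E}[L_i]},$$

which, for fixed $\mathbb{E}[L_i]$, is strictly increasing in $\mathbb{E}[L_i^2]$; hence reducing the second moment of the epoch length strictly reduces the expected average AoI.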

Based on Lemma 2 and Lemma 3, the structure of the optimal renewal policy is characterized in the following theorem.

Theorem 2

Under the optimal policy in $\Pi_b$, there exists a non-increasing sequence of thresholds $\{\tau_j\}$, such that if the source is transmitting an update that arrived in the $j$th time slot of a renewal epoch when a new update arrives, and the arrival time of the new update is before or at the $\tau_j$th time slot of the epoch, the source will switch to the new arrival; otherwise, if the new update arrives after the $\tau_j$th time slot, or the update being transmitted arrived after the last positive threshold, the source will skip all upcoming arrivals until it finishes the current transmission.

The proof of Theorem 2 can be found in Appendix D. The theorem indicates that the optimal decision of the source only depends on two parameters: the arrival time of the update being transmitted and the arrival time of the new update, both relative to the beginning of the renewal epoch.
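In code, the resulting rule is a simple table lookup. A sketch follows, where the threshold list taus is assumed to have been computed beforehand, e.g., by the value iteration of Sec. III-B below:

    def decide(j, t_new, taus):
        """Theorem 2 as a decision rule (sketch). j = arrival slot (within the
        epoch) of the update being transmitted, t_new = arrival slot of the
        new update, taus = assumed non-increasing thresholds [tau_1, ...].
        Returns 1 to switch, 0 to skip."""
        if j <= len(taus) and t_new <= taus[j - 1]:
            return 1
        return 0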

III-B MDP Formulation

Motivated by the Markovian structure of the optimal policy in Theorem 2, we cast the problem as an MDP and numerically search for the optimal thresholds as follows.

States: We define the state $s_t = (d_t, u_t, \lambda_t)$, where $d_t$ and $u_t$ are the AoI in the system and the age of the unfinished update at the beginning of time slot $t$, respectively, and $\lambda_t$ is the update arrival status. Then, $d_t \geq K$, $0 \leq u_t \leq K - 1$, and the state space can be determined accordingly.

Actions: $a_t \in \{0, 1\}$, as defined in Sec. II.

Transition probabilities: Denote the transition probability from state $s$ to another state $s'$ under action $a$ as $P(s' \mid s, a)$. Then, if $a_t = 0$, i.e., the transmitter either continues its transmission or stays idle, we have

$$P\big((d', u', 1) \mid (d_t, u_t, \lambda_t), 0\big) = p, \qquad (2)$$

$$P\big((d', u', 0) \mid (d_t, u_t, \lambda_t), 0\big) = 1 - p, \qquad (3)$$

where $(d', u') = (K, 0)$ if the update in service is delivered at the end of slot $t$ (i.e., $u_t = K - 1$), and $(d', u') = (d_t + 1, u_t + 1)$ otherwise, with $u' = 0$ if the source is idle. If $a_t = 1$, i.e., a new update arrives at the beginning of time slot $t$ and the transmitter switches to it, $d_{t+1} = d_t + 1$, $u_{t+1} = 1$.

Cost: Let $C(s_t)$ be the immediate age under state $s_t$, i.e., $C(s_t) = d_t$.

Denote the relative value function as $h(s)$. Then the optimal Markovian policy to minimize the long-term average AoI is the solution to the following Bellman's equation [43]

$$h(s) + J^* = \min_{a \in \mathcal{A}(s)} \Big\{ C(s) + \mathbb{E}\big[h(s')\big] \Big\}, \qquad (4)$$

where $J^*$ is the optimal average age, $s'$ is the next system state when action $a$ is taken under state $s$, and $\mathcal{A}(s)$ is the set of allowable actions for the given state $s$: $\mathcal{A}(s) = \{0, 1\}$ if there is a new update arrival, and $\mathcal{A}(s) = \{0\}$ otherwise.

Let the reference state be $s^0$, and define the Bellman operator $(T V)(s) = \min_{a \in \mathcal{A}(s)} \{ C(s) + \mathbb{E}[V(s')] \}$. Then, the optimal policy can be determined through relative value iteration as follows:

$$V_{n+1}(s) = (T V_n)(s) - (T V_n)(s^0), \qquad (5)$$

where $V_0(s) = 0$ for all $s$, $n = 0, 1, \ldots$, and $(T V_n)(s^0)$ converges to $J^*$ as $n \to \infty$ [43].

To make the problem numerically tractable, we truncate the state space of the original MDP by capping the AoI at $d_{\max}$, i.e., $d_t \leq d_{\max}$. It can be shown that when $d_{\max}$ is sufficiently large, the optimal policy for the truncated MDP is identical to that of the original MDP [44].

In order to further reduce the computational complexity, we leverage the multiple-threshold structure of the optimal policy during the value iteration procedure, as detailed in the structured value iteration algorithm in Algorithm 1. With the multiple-threshold structure, Algorithm 1 does not need to seek the optimal action through equation (5) for all states in each iteration, as the traditional value iteration algorithm does. Specifically, if the optimal action for a state is to skip the new arrival, the optimal action for any state with a larger AoI (other components unchanged) must be to skip as well; similarly, if the optimal action for a state is to switch to the new arrival, the optimal action for any state with a smaller AoI must be to switch. Thus, the computational complexity can be significantly reduced. Besides, since the truncated MDP is a unichain, Algorithm 1 converges in a finite number of iterations [45].

1:Initialize: .
2:for  do
3:     for  do
4:         if  then
5:              ;
6:         else if  then
7:              ;
8:         else if  then
9:              ;
10:         else
11:              
12:         end if
13:         
14:     end for
15:end for
16:return
Algorithm 1 Structured Value Iteration for Uniform Update Size Case.
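As a concrete companion to Algorithm 1, here is a minimal Python sketch that runs the plain relative value iteration (5) on the truncated uniform-size MDP. The state encoding V[d, u], the parameter values, and the convention of averaging the arrival indicator out of the state (rather than carrying it as a third component) are our own assumptions, and the threshold shortcut that gives Algorithm 1 its speedup is deliberately omitted for clarity:

    import numpy as np

    # Illustrative parameters (assumed; the paper's exact settings are not
    # reproduced here): arrival probability p, transmission time K, AoI cap.
    p, K, D_MAX, N_ITER = 0.4, 3, 60, 500

    # V[d, u]: relative value at AoI d and in-service age u (u = 0 = idle).
    V = np.zeros((D_MAX + 1, K))

    def cont(V, d, u):
        """Value after one slot of a = 0 (continue transmitting, or idle)."""
        d1 = min(d + 1, D_MAX)
        if u == 0:                    # idle: AoI keeps growing
            return d + V[d1, 0]
        if u == K - 1:                # delivery at the end of this slot
            return d + V[K, 0]        # AoI resets to K, server becomes idle
        return d + V[d1, u + 1]

    g = 0.0
    for _ in range(N_ITER):
        V_new = np.zeros_like(V)
        for d in range(K, D_MAX + 1):
            for u in range(K):
                skip = cont(V, d, u)
                switch = d + V[min(d + 1, D_MAX), 1]  # a = 1: restart service
                # No arrival (prob. 1-p): only a = 0 allowed; arrival: take min.
                V_new[d, u] = (1 - p) * skip + p * min(skip, switch)
        g = V_new[K, 0]               # converges to the optimal average AoI J*
        V = V_new - g                 # relative VI with reference state (K, 0)

    print(f"approx. optimal average AoI: {g:.3f}")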

IV Updates of Non-uniform Sizes

In this section, we consider the scenario where the sizes of the updates are non-uniform, and the required transmission time is a random variable following a known distribution. Specifically, we assume the required transmission time of each update is an i.i.d. random variable following a probability mass function (PMF) with bounded support. For ease of exposition, we assume each update takes at least two time slots to transmit. Once a new update arrives at the source, its size is revealed to the transmitter immediately. Our objective is to decide whether to switch to the new update or to skip it, based on causal observations and the statistics of the update arrivals.

Compared with the uniform update size case, the source must also track the sizes of the updates and take them into consideration when making decisions. In order to make the problem tractable, we restrict our attention to Markovian policies where the decision at any time slot depends on the AoI at the destination, denoted as $d_t$; the remaining transmission time of the update being transmitted, denoted as $r_t$; the required transmission time of the update being transmitted, denoted as $b_t$; and the required transmission time of the new arrival, denoted as $b'_t$. If no new update arrives at time slot $t$, $b'_t = 0$; otherwise $b'_t$ follows the PMF of the update sizes.

Denote the system state at time slot $t$ as $s_t = (d_t, r_t, b_t, b'_t)$. We note that $r_t \leq b_t$. Then, after the source takes an action $a_t$, the state at time slot $t + 1$ changes as follows. If $a_t = 0$, the source either continues its transmission and skips the new update, or stays idle. Thus,

$$d_{t+1} = \begin{cases} b_t, & r_t = 1, \\ d_t + 1, & \text{otherwise}, \end{cases} \qquad (6)$$

$$r_{t+1} = \max\{r_t - 1, 0\}, \qquad (7)$$

$$b_{t+1} = \begin{cases} b_t, & r_t > 1, \\ 0, & \text{otherwise}, \end{cases} \qquad (8)$$

where $r_t = 1$ means the update in service is delivered at the end of slot $t$, so the AoI resets to $b_t$. If $a_t = 1$ and $b'_t > 0$, i.e., the source switches to the new arrival, $d_{t+1} = d_t + 1$, $r_{t+1} = b'_t - 1$, $b_{t+1} = b'_t$.

Let $C(s_t)$ be the immediate cost under state $s_t$ with action $a_t$. Similar to the uniform size case, we let $C(s_t) = d_t$.
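In code, these transitions can be sketched as follows. This is a sketch of (6)-(8) as reconstructed above; the tuple encoding and timing conventions are our assumptions:

    def step(state, a, b_new):
        """One-slot state update for the non-uniform model. state = (d, r, b):
        AoI, remaining slots, and size of the update in service (r = b = 0
        means idle); a is the skip/switch decision; b_new is the size of the
        new arrival (0 if none, and every size is >= 2 by assumption)."""
        d, r, b = state
        if a == 1 and b_new > 0:       # switch: restart service on the new update
            return (d + 1, b_new - 1, b_new)
        if r == 1:                     # delivery completes: AoI resets to the size
            return (b, 0, 0)
        if r > 1:                      # keep transmitting
            return (d + 1, r - 1, b)
        return (d + 1, 0, 0)           # idle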

IV-A Structural Properties of the Optimal Policy

In order to obtain structural properties of the optimal policy, in this section, we first introduce an infinite-horizon $\beta$-discounted MDP as follows:

$$V_\beta(s) = \min_{\pi}\ \mathbb{E}\Big[\sum_{t=0}^{\infty} \beta^{t} C(s_t) \,\Big|\, s_0 = s\Big], \qquad (9)$$

where $\beta \in (0, 1)$ is the discount factor. It has been shown that the optimal policy to minimize the average AoI can be obtained by solving (9) as $\beta \to 1$ [43]. In order to identify the structural properties of the optimal policy, we start with the following value iteration formulation:

$$V_{n+1}(s) = \min_{a \in \mathcal{A}(s)} \Big\{ C(s) + \beta\, \mathbb{E}\big[V_n(s')\big] \Big\}, \qquad (10)$$

where $V_0(s) = 0$ for all $s$.

Assume . If , denote

(11)
(12)

Then, . We have

(13)

and

(14)

where the expectation is taken with respect to , and . We adopt in (13) to indicate that if , and , the system will remain idle.

If , , where follows the same form as (13).

We have the following observations:

Lemma 4

Let . Then, is monotonically increasing in , in for , and in for , .

Proof:  We first prove the monotonicity in through induction. First, it holds when . Assume it holds for . Then, it suffices to show that it holds for as well. If , . We note that is an increasing function in according to the induction assumption. If , is increasing in based on its definition in (13). If , is increasing in according to the induction assumption. Then, must monotonically increase in , as taking the minimum preserves the monotonicity. If , is monotonically increasing in as well.

Next, we prove the monotonicity in for . We prove it through induction as well. It holds when . Assume it is true for . We note that is independent of when . Besides, if , is increasing in based on the monotonicity of in . If , is increasing in based on the induction assumption. Thus is increasing in for .

In order to show the monotonicity of in for , we first show that is increasing in for , . Based on the definition of in (14), it suffices to show is increasing in , for . We note it holds when . Assume is increasing in . Then, we will show that is increasing in as well. We note that if , is independent of . Besides, when , , which is increasing in according to the monotonicity of ; when , , which is increasing in based on the induction assumption. Thus, is increasing in after taking the minimum of and . The monotonicity of in for is thus established.

Since is independent of , while is increasing in when , after taking the minimum of the two, is increasing in for as well.

Lemma 5

if and only if , .

Proof:  First, we note that

(15)

Besides, for ,

(16)

If

(17)

then, according to (IV-A),

(18)

The sufficiency is thus established.

On the other hand, if

(19)

then for every possible , we must have

Based on (IV-A) and the assumption that , we have

(20)

which proves the necessity of the condition.  

Corollary 1

For states with , if and only if , .

Proof:  Since when , is equivalent to

The corollary is thus proved based on Lemma 5.  

Corollary 2

For states with , if and only if , .

Corollary 2 can be proved in a way similar to the proof of Corollary 1, and is thus omitted.

Based on Lemmas 4 and 5 and Corollaries 1 and 2, we have the following theorem.

Theorem 3

Denote and assume . The optimal policy for the -discounted MDP has the following structure:

  • (a) If the optimal action for is to switch, then the optimal action for any state , , is to switch as well.

  • (b) If , then the optimal action for is to switch to the new update.

  • (c) If the optimal action for is to switch, then for any , the optimal action for state is to switch as well.

  • (d) If , then the optimal action for is to switch to the new update.

  • (e) If the optimal action for is to skip, then the optimal action for state is to skip as well.

The proof of Theorem 3 is provided in Appendix E. Theorem 3(a) indicates a threshold structure on the size of the new update arrival, i.e., the source prefers to switch to a new update if its size is small, and will skip it if its size is large. Theorem 3(b) is a consequence of Theorem 3(a). Theorem 3(c) shows that there exists a threshold on the size of the update being transmitted, i.e., the source prefers to drop updates with larger sizes and switch to new updates. Theorem 3(d) says that the source should immediately start transmitting a new update arrival if it has been idle, which is consistent with Lemma 1 for the uniform update size case. Theorem 3(e) essentially indicates a threshold structure on the instantaneous AoI at the destination: the source prefers to skip new updates when the AoI is large, as it is then in more urgent need of completing the current transmission and resetting the AoI to a smaller value.

We point out that all of the structural properties of the optimal policy derived for the $\beta$-discounted problem hold as $\beta \to 1$ [33]. Thus, the optimal policy for the time-average problem exhibits the same structures.

IV-B Structured Value Iteration

Following an approach similar to the uniform update size case, we leverage the structural properties of the optimal policy and develop a structured value iteration algorithm to obtain the thresholds numerically. The detailed algorithm is presented in Algorithm 2.

Initialize: .
for  do
     for  do
         if  then
              ;
         else if  then
              ;
         else if  then
              ;
         else if  then
              ;
         else if  then
              ;
         else
              
         end if
         
     end for
end for
return
Algorithm 2 Structured Value Iteration for Non-uniform Update Size Case.
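As with the uniform case, a minimal Python sketch of the plain discounted value iteration (10) on a truncated version of this model is given below. The state encoding V[d, r, b], the parameter values, and the timing conventions follow our reconstruction of (6)-(8) and are assumptions rather than the paper's exact setup; Algorithm 2's structured shortcuts are omitted, and per (9) the average-AoI policy is approached as BETA approaches 1:

    import numpy as np

    # Illustrative parameters (assumed): two sizes with a uniform PMF,
    # Bernoulli arrivals, discount factor BETA, truncated AoI.
    SIZES, P_ARR, BETA, D_MAX, N_ITER = [5, 8], 0.4, 0.99, 80, 400
    B = max(SIZES)

    # V[d, r, b]: value at AoI d, r slots remaining for the in-service update
    # of total size b (r = b = 0 encodes an idle server). The new arrival's
    # size is averaged out inside the backup, mirroring the expectation in (10).
    V = np.zeros((D_MAX + 1, B + 1, B + 1))

    def v_skip(V, d, r, b):
        """a = 0: continue the current transmission, or stay idle."""
        d1 = min(d + 1, D_MAX)
        if r == 0:
            return d + BETA * V[d1, 0, 0]             # idle
        if r == 1:
            return d + BETA * V[min(b, D_MAX), 0, 0]  # delivery: AoI resets to b
        return d + BETA * V[d1, r - 1, b]

    def v_switch(V, d, s):
        """a = 1: drop the in-service update, start the new one of size s >= 2."""
        return d + BETA * V[min(d + 1, D_MAX), s - 1, s]

    for _ in range(N_ITER):
        V_new = np.zeros_like(V)
        for d in range(1, D_MAX + 1):
            for b in [0] + SIZES:
                for r in range(1 if b else 0, b + 1):
                    skip = v_skip(V, d, r, b)
                    val = (1 - P_ARR) * skip          # no arrival: must continue
                    for s in SIZES:                   # arrival of size s: min over actions
                        val += (P_ARR / len(SIZES)) * min(skip, v_switch(V, d, s))
                    V_new[d, r, b] = val
        V = V_new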

V Numerical Results

In this section, we numerically search for the optimal policies for both the uniform and the non-uniform update size cases according to Algorithms 1 and 2, respectively. For both truncated MDPs, we set $d_{\max}$ and the number of iterations to be .

V-A Updates of Uniform Size

First, we focus on the uniform update size case. We set , . Fig. 2(a) shows the optimal action for each state. We note the monotonicity of the thresholds in both and . We then plot, in Fig. 2(b), the optimal action for each pair of arrival times of the update being transmitted (i.e., the active update) and of the new arrival in a renewal epoch. We note that the thresholds , , , and are monotonically decreasing, as predicted by Theorem 2. When the update being transmitted arrives later than the third time slot in the epoch, all upcoming updates are skipped.

(a) (b)
Fig. 2: The optimal policy when , . Circles represent switch, while crosses represent skip.

Then, we evaluate the time-average AoI under the optimal policy identified by Algorithm 1 and two baseline policies, namely, the Always Skip and Always Switch policies, over time slots. Under the Always Skip policy, the source never switches to a new update arrival until it finishes the one being transmitted, while under the Always Switch policy, the source always switches to new updates upon their arrivals. As we observe in Fig. 3, the performance differences between the three policies are negligible when is close to . This is because when is small, with high probability the source will not receive a new update before it finishes transmitting the current one, so the source behaves almost identically under all three policies. As increases, the performances of the optimal policy and the Always Skip policy remain very close to each other, while the Always Switch policy renders the highest AoI. To distinguish the performances of the Always Skip policy and the optimal policy, we plot the performance gap between them in terms of time-average AoI in Fig. 4. As we note, as increases, the performance gap first increases and then decreases to zero. This is because when is not very large, the source still needs to preempt the transmission of an update occasionally in order to regulate the inter-update delays under the optimal policy; when is sufficiently large, this becomes unnecessary, as a new update will arrive soon after each successful delivery. Thus, the optimal policy coincides with the Always Skip policy in this regime. It is interesting to note the surprisingly small performance gap between the optimal policy and the Always Skip policy in the simulation results; bounding it theoretically is one of our future steps.

Fig. 3: Average AoI with .
Fig. 4: Performance gap between the optimal policy and Always Skip.
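As a usage sketch, the two baselines plug directly into the simulate() function sketched in Section II (parameter values assumed):

    always_skip   = lambda d, u, K: 0   # never preempt the update in service
    always_switch = lambda d, u, K: 1   # always preempt upon a new arrival

    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(f"p={p:.1f}",
              f"skip={simulate(always_skip, p=p, K=3):8.2f}",
              f"switch={simulate(always_switch, p=p, K=3):8.2f}")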

V-B Updates of Non-uniform Sizes

In this subsection, we focus on the non-uniform update size case. For illustration, we consider a scenario where the updates are of two possible sizes, and the corresponding transmission times are 5 and 8 time slots, respectively. We assume the size PMF is uniform over these two values, and the probability of arrival is .

We first obtain the optimal policy according to Algorithm 2. The policy depends on the instantaneous state, i.e., the current AoI $d_t$, the remaining transmission time of the active update $r_t$, and the sizes of the active update and the new update, $b_t$ and $b'_t$, respectively. Thus, we plot the optimal policy for each fixed pair $(b_t, b'_t)$ in Fig. 5. We note that the optimal policy exhibits the structural properties predicted in Theorem 3. We also note that when , the source always prefers to skip the new update. This can be explained by the intuition that switching to an update with a longer transmission time leads to a larger AoI when is not very small.

Then, we evaluate the time-average AoI under the optimal policy identified by Algorithm 2, the Always Skip policy, and the Always Switch policy for different . We note that the optimal policy outperforms the other two. Compared with the uniform update size case, the performance gap between the optimal policy and the Always Skip policy is more significant. This indicates that switching to a new update of small size can have a substantial impact on the overall AoI.

(a) , . (b) , .
(c) , . (d) , .
Fig. 5: The optimal policy when . Circles represent switch, while crosses represent skip.
Fig. 6: Average AoI comparison. .

VI Conclusions

In this paper, we have considered a single-link status updating system under a link capacity constraint. We first assumed uniform update sizes, and proved that within a broadly defined class of online policies, the optimal policy should be a renewal policy with a sequential switching property. We then showed that the optimal decision of the source in any time slot has a multiple-threshold structure and only depends on the age of the update being transmitted and the AoI in the system. We then considered the more general case where updates have different sizes, and showed that the optimal policy exhibits threshold structures along different dimensions of the system state. For both cases, the optimal policies are numerically identified through structured value iteration under the MDP framework.

Appendix A Proof of Theorem 1

Lemma 6

Under any policy in $\Pi_b$, we must have .

Proof:  The proof of this lemma is adapted from the proof of Theorem 3 in [24]. For completeness, we provide the detailed proof here.

Denote the cumulative distribution function of the epoch length under a uniformly bounded policy as , i.e., . Recall that $N_T$ is the number of successfully delivered status updates over $(0, T]$. We have

(21)

(22)
(23)
(24)

where (23) follows from the definition of uniformly bounded policy and (24) follows from the fact that is independent of other parameters. We note that

(25)

We have

(26)
(27)
(28)

where (27) follows from (24), and (28) follows from (21).

For any fixed satisfying , we have

(29)
(30)

where (29) follows from the fact that is a non-increasing function, and (30) follows from the fact that the inter-update delay is greater than or equal to $K$ time slots.

Therefore, for any

(31)

Since in (25), we have .  

Denote $\{L_i\}$ as the inter-update delays generated under a uniformly bounded policy $\pi \in \Pi_b$. In general, the policy over the $i$th epoch depends on the history over the previous epochs, the update arrivals over the $i$th epoch, as well as external randomness. Let $R_i$ be the cumulative AoI over the $i$th epoch for given history, arrival profile, and external randomness, and let $L_i$ be the corresponding length of the epoch. Then, we have the inequalities in (32)-(36). In (36), we take the minimum average AoI over one epoch, among all possible epochs, histories, and realizations of the external randomness. We then apply the corresponding single-epoch policy over all epochs, irrespective of the history and the external randomness. Due to the memoryless property of the update arrival process, this is always feasible and achieves this minimum over each epoch. Thus, we can always obtain a renewal policy that outperforms the original one and depends only causally on the arrivals in the current epoch.

(32)
(33)
(34)
(35)
(36)

Appendix B Proof of Lemma 2

We prove this lemma through contradiction. Assume the optimal policy is not an SS policy. Without loss of generality, we consider the first renewal epoch starting at time 0 (the beginning of time slot 1). We assume that under this policy there exists a sample path along which the source transmits the update arriving at time slot $t_j$ and does not switch to the next arrival at time slot $t_{j+1}$ in the same epoch. Depending on the upcoming random arrivals, this sample path may evolve into different sample paths. Denote the set of such sample paths as , as they share the same history up to time slot $t_{j+1}$. We can partition into two subsets:

  • : The source skips all the upcoming arrivals and completes the transmission of the update that arrived at $t_j$.

  • : The source switches to some later arrival.

Let be the corresponding length of the renewal epoch under policy . Then, for sample paths in , and