# 3-Competitive Policy for Minimizing Age of Information in Multi-Source M/G/1 Queuing Model

We consider a multi-source network with a common monitor, where fresh updates are generated at each source, following a Poisson process. At any time, at most one source can transmit its update to the monitor, and transmission time for updates of each source follows some general distribution. The goal is to find a causal scheduling policy such that at any time, the latest update available at each source is fresh. In this paper, we quantify freshness using the age of information (AoI) metric, and propose a randomized policy, which we show is 3-competitive with respect to Pareto-optimal policies (that minimize the expected average AoI of each source). We also show that for a particular choice of the randomization parameter, the proposed randomized policy is 3-competitive with respect to an optimal policy that minimizes the weighted sum of the expected average AoI of all sources.

## I Introduction

Reliance on time-sensitive networked applications (such as remote monitoring, telehealth services, control, etc.) for critical roles has created the need for a general transmission policy that can ensure timely delivery of fresh updates of each source at the corresponding destination. The policy should mitigate the effect of constraints on update generation, transmission delay, capacity of the shared channel, number of sources, etc., and must be simple to implement. In this paper, we propose a particular randomized policy, and show that it meets the above-mentioned requirements.

In particular, we consider a general multi-source M/G/1 queuing model, where at each source, the updates are generated with exponentially distributed inter-generation time. Also, at any time, at most one source can transmit, and the transmission delay for each source follows some general distribution. Note that the mean update generation rate and the transmission delay distribution may be different for each source; however, they do not change with time.

The objective is to find an online transmission policy, such that at any time, the latest update of each source available at the destination is fresh. Formally, we quantify freshness using the age of information (AoI) metric [kaul2012real, kaul2012status, saurav2021game, saurav2021online]. At any time, the AoI of a source is equal to the time elapsed since the generation time of the latest update of the source available at the monitor. Thus, for any policy, if the long-term average AoI (AAoI) is small (less than a certain threshold) for each source, then the policy is considered to have ensured timely delivery of fresh updates. In the considered model, we assume that each source $\ell$ has its own recommended AAoI threshold $\alpha_\ell$.

Ideally, for each source $\ell$, its AAoI should be less than the recommended threshold $\alpha_\ell$. However, due to constraints on availability of updates, transmission delay, excessive number of sources, etc., this may not be possible. In fact, the set $\mathcal{C}$ of vectors $\boldsymbol{\alpha}$, such that for some online policy $\pi$, the expected AAoI of each source $\ell$ is less than $\alpha_\ell$, is not known. Therefore, in this paper, we first derive a necessary condition to characterize this set $\mathcal{C}$. Then, we propose a simple randomized policy $\pi_R$ that, at any time, among all the sources, picks a source $\ell$ with a fixed probability $p_\ell$, and transmits its update (if it has an update). We show that for any vector $\boldsymbol{\alpha}$ that satisfies the derived necessary condition, the expected AAoI of each source $\ell$ under $\pi_R$ is at most $3\alpha_\ell$.

In the later part of this paper, we also consider the setting where, instead of the threshold vector $\boldsymbol{\alpha}$, only relative weights $w_\ell$ are known for the sources, and the objective is to minimize the weighted sum of the expected AAoI (WSAAoI) of the sources, following an online policy. We show that for an appropriate choice of the randomization parameters $p_\ell$, the same randomized policy $\pi_R$ proposed for the earlier setting guarantees a WSAAoI that is at most three times the WSAAoI of an optimal online policy.

Although the theoretical guarantee for $\pi_R$ has a multiplicative gap of $3$ (relative to an optimal online policy), for the general multi-source M/G/1 queuing model, this is a significant result. In prior work, for multi-source setting with stochastic packet generation, the best known online policy [kadota2019minimizing] has a multiplicative gap of (the results in [kadota2019minimizing] are for slotted time M/M/1 queuing model, unlike the general continuous time M/G/1 model considered in this paper).

In most of the prior work on AoI (e.g., [kadota2018optimizing, kadota2018scheduling, kadota2019minimizing, saurav2021minimizing]), a major reason for the large gap between the AAoI guarantee of an online policy and that of an optimal online policy $\pi^\star$ is the use of a weak lower bound on the AAoI of $\pi^\star$. Generally (as in [kadota2018optimizing, kadota2018scheduling, kadota2019minimizing, saurav2021minimizing, bhat2020throughput]), the lower bound disregards the effect of the variance of the inter-generation time of updates on the AAoI of sources under $\pi^\star$. In this paper, we specifically include the effect of this variance (for each source $\ell$) in the lower bound on the AAoI of sources, and are thereby able to improve the lower bound (and reduce the gap).

Currently, the major limitation in improving the theoretical guarantee for $\pi_R$ (or in designing a better online policy) is the weak lower bound on the waiting time of updates under an optimal online policy $\pi^\star$. For any transmitted update, the waiting time is equal to the difference between the time when the update is generated and the time when a policy starts transmitting it. Thus, waiting times have a significant impact on the AAoI of any policy. However, in the general multi-source setup with stochastic update generation, to the best of our knowledge, none of the prior work has been able to derive a lower bound on the waiting time of updates under $\pi^\star$ that is better than $0$.

However, when the number of sources in the system is $N=1$, the problem gets simplified, and under specific assumptions [saurav2021minimizing, sun2017update] optimal online policies are known. Hence, for a fair evaluation of the performance of $\pi_R$, instead of comparing with a lower bound, we consider a particular setting of [sun2017update] (the source can generate a new update at any time, and the transmission delay for each update follows some general distribution), for which an optimal online policy $\pi^\star$ is known, and compare $\pi_R$ with $\pi^\star$ using numerical simulation. Despite the simplicity and broader applicability of $\pi_R$, we find that its AAoI is close to the AAoI of $\pi^\star$ (for the two transmission delay distributions that we considered).

The rest of this paper is organised as follows. In Section II, we discuss the considered M/G/1 queuing model in detail, and formally define the objective. In Section III, we derive the necessary condition to characterize the set $\mathcal{C}$ of AAoI vectors that an optimal online policy may achieve. In Section IV, we propose the randomized policy $\pi_R$, and derive an upper bound on the expected AAoI of each source under $\pi_R$. We show that for any $\boldsymbol{\alpha}\in\mathcal{C}$, the expected AAoI of the sources under $\pi_R$ is at most $3\boldsymbol{\alpha}$. In Section V, we consider the weighted sum expected AAoI (i.e., WSAAoI) minimization problem, and generalize $\pi_R$ (and the corresponding guarantee) for this setting. Finally, in Section VI, we discuss the numerical simulations.

## II System Model

Consider a system consisting of $N$ sources and a monitor. At each source $\ell$, updates (henceforth, packets) are generated with exponentially distributed inter-generation time, with mean $\mu_\ell$. The sources transmit their packets to the monitor over a common channel that, at any time $t$, allows at most one source to transmit (one packet). Each packet transmitted by source $\ell$ is received at the monitor after a random transmission delay that follows some general distribution with mean $\gamma_\ell$. (For different sources, the delay distributions may belong to different families of distributions.) The sources may choose whether to transmit a packet or discard it, but only until the transmission of the packet is initiated. Once initiated, a transmission cannot be preempted.

###### Definition 1

While a packet is under transmission, the channel is said to be busy. Otherwise, the channel is free. A transmission can be initiated only when the channel is free.

At any time $t$, the age of information (AoI) of a source $\ell$ (denoted $\Delta_\ell(t)$) is equal to the time elapsed since the generation time of the latest packet of the source that has been received at the monitor. Thus, as shown in Figure 1, $\Delta_\ell(t) = t - \lambda_\ell(t)$, where $\lambda_\ell(t)$ denotes the generation time of the latest update of source $\ell$ that has been received at the monitor until time $t$. The average AoI (in short, AAoI) of source $\ell$ until time $t$ is defined as

$$\overline{\Delta}_\ell(t) = \frac{\int_0^t \Delta_\ell(i)\,di}{t}. \tag{1}$$

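
As a rough illustration of definition (1), the sketch below integrates the sawtooth AoI curve of one source from a list of delivered packets. The function name `average_aoi`, its arguments, and the `initial_age` parameter are hypothetical choices made for this example and do not appear in the paper.

```python
def average_aoi(gen_times, recv_times, horizon, initial_age=0.0):
    """Time-average AoI over [0, horizon] for one source.

    gen_times[i] and recv_times[i] are the generation and reception times of
    the i-th delivered packet, sorted by reception time, with gen <= recv.
    """
    area = 0.0          # integral of the AoI curve so far
    t_prev = 0.0        # time of the last processed reception
    age = initial_age   # AoI right after the last processed reception
    for g, r in zip(gen_times, recv_times):
        if r > horizon:
            break
        dt = r - t_prev
        # Between receptions the AoI grows with slope 1 (a trapezoid).
        area += age * dt + 0.5 * dt * dt
        t_prev = r
        # On reception the AoI drops to the packet's age, unless a fresher
        # packet has already been received.
        age = min(age + dt, r - g)
    dt = horizon - t_prev
    area += age * dt + 0.5 * dt * dt
    return area / horizon
```

For example, three packets generated at times 0, 1, 2 and received one time unit later give an AoI that ramps between 1 and 2 after the first reception.
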
###### Definition 2

A centralized online transmission policy (in short, an online policy) is an algorithm that, at each time $t$ (when the channel is free), using only the causal information available at all the sources at time $t$, decides which source gets to transmit (at time $t$). In this paper, we only consider the set $\Pi_{ON}$ of online policies that never preempt any packet that is under transmission.

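The objective in this paper is to find an online policy (Definition 2), such that for any given vector $\boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_N)$ of recommended AAoI for the sources, the policy minimizes the ratio $\lim_{t\to\infty}\mathbb{E}_\pi[\overline{\Delta}_\ell(t)]/\alpha_\ell$ for each source $\ell$. To make this formal, for any given $\boldsymbol{\alpha}$, we define the cost for a policy $\pi$ as

$$\Gamma(\pi;\boldsymbol{\alpha}) = \max_{\ell}\left\{\lim_{t\to\infty}\frac{\mathbb{E}_\pi[\overline{\Delta}_\ell(t)]}{\alpha_\ell}\right\}, \tag{2}$$

where $\mathbb{E}_\pi$ denotes expectation with respect to policy $\pi$, as well as the packet generation and transmission delay distributions of the sources. Thus, the objective is to find

$$\pi^\star_{ON} = \arg\min_{\pi\in\Pi_{ON}} \Gamma(\pi;\boldsymbol{\alpha}). \tag{3}$$
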
###### Remark 1

Since $\mu_\ell$ and $\gamma_\ell$ are finite for each source $\ell$, as shown in Appendix A, there exists an online policy $\pi$ such that the expected AAoI of each source under policy $\pi$ is finite (except when $N=\infty$, in which case $\Gamma(\pi;\boldsymbol{\alpha})=\infty$ for all online policies $\pi$, and the objective (3) becomes meaningless). Hence, the expected AAoI of each source under $\pi^\star_{ON}$ must also be finite. Therefore, without loss of generality, in the rest of this paper, we disregard the policies in $\Pi_{ON}$ for which the expected AAoI of any source is infinite.

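Note that $\Gamma(\pi;\boldsymbol{\alpha}) \le 1$ implies that the AAoI of each source under policy $\pi$ achieves the recommended value. However, for arbitrary $\boldsymbol{\alpha}$, such a policy may not exist (e.g., if , , ). Therefore, for problem (3) to be meaningful, we only consider $\boldsymbol{\alpha}\in\mathcal{C}$, where

$$\mathcal{C} = \left\{\boldsymbol{\alpha}\in\mathbb{R}^N \;\middle|\; \exists\,\pi\in\Pi_{ON} \text{ s.t. } \Gamma(\pi;\boldsymbol{\alpha})\le 1\right\} \tag{4}$$

denotes the set of feasible $\boldsymbol{\alpha}$ (Definition 3), called the capacity region.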

###### Definition 3

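$\boldsymbol{\alpha}$ is said to be feasible under policy $\pi$ if $\Gamma(\pi;\boldsymbol{\alpha})\le 1$, i.e., $\lim_{t\to\infty}\mathbb{E}_\pi[\overline{\Delta}_\ell(t)]\le\alpha_\ell$, $\forall \ell\in\{1,\cdots,N\}$.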

Further, because the cost for a policy depends on $\boldsymbol{\alpha}$ (e.g., when , for all reasonable policies in $\Pi_{ON}$), we quantify the performance of policy $\pi$ using its competitive ratio (5), which is equal to the cost (2) for the policy, maximized over all $\boldsymbol{\alpha}\in\mathcal{C}$.

$$\mathrm{CR}_\pi = \max_{\boldsymbol{\alpha}\in\mathcal{C}} \Gamma(\pi;\boldsymbol{\alpha}). \tag{5}$$

To solve (3) and analyze (5), it is critical to first characterize the capacity region $\mathcal{C}$ (4). Hence, in the next section, we derive a necessary condition that any $\boldsymbol{\alpha}$ that lies in $\mathcal{C}$ must satisfy.

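## III Capacity Region $\mathcal{C}$

Consider the following lower bound on the expected AAoI of source $\ell$ under policy $\pi$.

###### Lemma 1

For any policy $\pi\in\Pi_{ON}$, the expected AAoI of source $\ell$ satisfies

$$\lim_{t\to\infty}\mathbb{E}_\pi[\overline{\Delta}_\ell(t)] \ge \frac{1}{2}\left(\frac{\mu_\ell^2/2}{\overline{T}^{\pi}_\ell} + \overline{T}^{\pi}_\ell + 2\gamma_\ell\right), \tag{6}$$

where $\overline{T}^{\pi}_\ell$ denotes the average of the inter-generation time of packets of source $\ell$ that are transmitted by policy $\pi$, and satisfies

$$\sum_{\ell=1}^{N}\frac{\gamma_\ell}{\overline{T}^{\pi}_\ell} \le 1. \tag{7}$$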

See Appendix B.

###### Remark 2

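Note that $\mu_\ell^2$ is the variance of the exponentially distributed inter-generation time of packets (with mean $\mu_\ell$) at source $\ell$.

Recall that $\boldsymbol{\alpha}$ lies in $\mathcal{C}$ only if it is feasible with respect to some policy $\pi\in\Pi_{ON}$ (Definition 3), i.e., for some $\pi\in\Pi_{ON}$, $\lim_{t\to\infty}\mathbb{E}_\pi[\overline{\Delta}_\ell(t)]\le\alpha_\ell$, $\forall \ell$. Hence, using (6), we get that for any $\boldsymbol{\alpha}\in\mathcal{C}$, $\exists\,\pi\in\Pi_{ON}$ such that for each source $\ell$,

$$\alpha_\ell \ge \frac{1}{2}\left(\frac{\mu_\ell^2/2}{\overline{T}^{\pi}_\ell} + \overline{T}^{\pi}_\ell + 2\gamma_\ell\right). \tag{8}$$

But solving the quadratic inequality (8), we find that (8) can be true only if $\alpha_\ell\ge\gamma_\ell$, $(\alpha_\ell-\gamma_\ell)^2\ge\mu_\ell^2/2$, $\forall \ell$, and $\exists\,\pi\in\Pi_{ON}$, such that for each source $\ell$,

$$\overline{T}^{\pi}_\ell \le (\alpha_\ell-\gamma_\ell) + \sqrt{(\alpha_\ell-\gamma_\ell)^2 - \mu_\ell^2/2}. \tag{9}$$

Note that the conditions $\alpha_\ell\ge\gamma_\ell$ and $(\alpha_\ell-\gamma_\ell)^2\ge\mu_\ell^2/2$ are simultaneously true only if $\alpha_\ell\ge\gamma_\ell+\mu_\ell/\sqrt{2}$. Also, (7) and (9) together imply $\sum_{\ell=1}^{N}\gamma_\ell/\overline{T}^{\max}_\ell\le 1$, where $\overline{T}^{\max}_\ell$ denotes the right-hand side of (9). Hence, we get the following necessary condition for any $\boldsymbol{\alpha}$ that lies in $\mathcal{C}$.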

###### Lemma 2

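$\boldsymbol{\alpha}$ lies in $\mathcal{C}$, only if

1. $\alpha_\ell\ge\gamma_\ell+\mu_\ell/\sqrt{2}$, $\forall \ell\in\{1,\cdots,N\}$, and

2. $\sum_{\ell=1}^{N}\gamma_\ell/\overline{T}^{\max}_\ell\le 1$, where

$$\overline{T}^{\max}_\ell = (\alpha_\ell-\gamma_\ell) + \sqrt{(\alpha_\ell-\gamma_\ell)^2 - \mu_\ell^2/2}. \tag{10}$$

###### Remark 3

Note that for minimizing (3), it is sufficient to consider only those sources for which the $\alpha_\ell$'s are finite. Hence, without loss of generality, we assume that $\alpha_\ell$ is finite for each source $\ell$. Thus, by definition, $\overline{T}^{\max}_\ell$ (10) is also finite for each source $\ell$.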
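
The check in Lemma 2 is easy to mechanize. The sketch below assumes the two conditions read $\alpha_\ell\ge\gamma_\ell+\mu_\ell/\sqrt{2}$ and $\sum_\ell\gamma_\ell/\overline{T}^{\max}_\ell\le 1$, which is what solving (8) under (7) yields; the function names `t_max` and `satisfies_lemma2` are hypothetical and chosen only for this illustration.

```python
import math

def t_max(alpha, gamma, mu):
    """Right-hand side of (10) for one source."""
    a = alpha - gamma
    disc = a * a - mu * mu / 2.0
    if a < 0 or disc < -1e-12:
        raise ValueError("alpha is below gamma + mu / sqrt(2)")
    return a + math.sqrt(max(disc, 0.0))

def satisfies_lemma2(alphas, gammas, mus):
    """Return True if alpha passes both necessary conditions."""
    # Condition 1: each alpha_l is at least gamma_l + mu_l / sqrt(2).
    for a, g, m in zip(alphas, gammas, mus):
        if a < g + m / math.sqrt(2.0):
            return False
    # Condition 2: the channel load implied by the largest periods is <= 1.
    load = sum(g / t_max(a, g, m) for a, g, m in zip(alphas, gammas, mus))
    return load <= 1.0
```

Passing this check does not by itself show that $\boldsymbol{\alpha}$ is in $\mathcal{C}$, since the condition is only necessary.
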

###### Corollary 1

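For any $\boldsymbol{\alpha}\in\mathcal{C}$, $\overline{T}^{\max}_\ell$ (10) satisfies

$$\frac{1}{2}\left(\frac{\mu_\ell^2/2}{\overline{T}^{\max}_\ell} + \overline{T}^{\max}_\ell + 2\gamma_\ell\right) = \alpha_\ell, \quad \forall \ell\in\{1,\cdots,N\}. \tag{11}$$

###### Proof:

Substituting (10) in the L.H.S. of (11), we get $\alpha_\ell$ (R.H.S. of (11)).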
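
The substitution can be spelled out in one line of algebra. The shorthand $a$ and $c$ below is introduced only for this worked step and is not notation from the paper.

```latex
% Shorthand (assumed here): a = \alpha_\ell - \gamma_\ell, c = \mu_\ell^2/2,
% so that (10) reads \overline{T}^{\max}_\ell = a + \sqrt{a^2 - c}.
\begin{align*}
\frac{c}{\overline{T}^{\max}_\ell}
  &= \frac{c\,\bigl(a - \sqrt{a^2 - c}\bigr)}{a^2 - (a^2 - c)}
   = a - \sqrt{a^2 - c}, \\
\frac{1}{2}\left(\frac{c}{\overline{T}^{\max}_\ell}
  + \overline{T}^{\max}_\ell + 2\gamma_\ell\right)
  &= \frac{1}{2}\bigl(2a + 2\gamma_\ell\bigr)
   = (\alpha_\ell - \gamma_\ell) + \gamma_\ell = \alpha_\ell .
\end{align*}
```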

Next, we propose a randomized policy $\pi_R$, and show that for any given $\boldsymbol{\alpha}$ that satisfies the two conditions in Lemma 2, $\pi_R$ has competitive ratio (5) at most $3$.

## IV Randomized Policy $\pi_R$

Consider a randomized policy $\pi_R$ that at any time $t$, if the channel is free (Definition 1), among all the sources, picks source $\ell$ with probability

$$p_\ell = \frac{1/\overline{T}^{\max}_\ell}{\sum_{i=1}^{N} 1/\overline{T}^{\max}_i}, \tag{12}$$

(where $\overline{T}^{\max}_\ell$ is defined in (10)), and transmits its latest generated packet (if it has a packet to transmit, otherwise, idles for time units). If the channel is busy, $\pi_R$ waits for the channel to become free.
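
A minimal sketch of the source-selection step follows, assuming the values $\overline{T}^{\max}_\ell$ of (10) have already been computed. The names `transmit_probabilities` and `pick_source` are hypothetical; the idling behaviour and the packet bookkeeping of the policy are not modelled here.

```python
import random

def transmit_probabilities(t_max_values):
    """Probabilities (12): proportional to 1 / T_max of each source."""
    inverse = [1.0 / t for t in t_max_values]
    total = sum(inverse)
    return [x / total for x in inverse]

def pick_source(probs, rng=random):
    """Draw the index of the source to serve when the channel is free."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Since the probabilities depend on each source only through $\overline{T}^{\max}_\ell$, the selection step needs only $\alpha_\ell$, $\gamma_\ell$ and $\mu_\ell$, not the shape of the delay distribution.
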

The following lemma provides an upper bound on the expected AAoI of each source under .

###### Lemma 3

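Under $\pi_R$ (Algorithm 1), the expected AAoI for each source $\ell$ satisfies

$$\lim_{t\to\infty}\mathbb{E}_R[\overline{\Delta}_\ell(t)] \le \frac{1}{2}\left(\frac{\mu_\ell^2}{\overline{T}^{\max}_\ell} + 3\,\overline{T}^{\max}_\ell + 2\gamma_\ell\right). \tag{13}$$
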
###### Proof:

See Appendix B.

The main result of this paper is as follows.

###### Theorem 1

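The competitive ratio for $\pi_R$ (Algorithm 1) is $\mathrm{CR}_{\pi_R}\le 3$.

###### Proof:

From Corollary 1 and Lemma 3, we get that for any $\boldsymbol{\alpha}\in\mathcal{C}$, $\lim_{t\to\infty}\mathbb{E}_R[\overline{\Delta}_\ell(t)]\le 3\alpha_\ell$, for each source $\ell$. Hence, $\Gamma(\pi_R;\boldsymbol{\alpha})\le 3$. Thus, $\mathrm{CR}_{\pi_R}\le 3$.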
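
The factor of 3 can be sanity-checked numerically by dividing the right-hand side of (13) by the left-hand side of (11) for the same $\overline{T}^{\max}_\ell$. The sketch below is only such a check, with hypothetical function names, and is not part of the proof.

```python
def lower_bound(t, gamma, mu):
    """Left-hand side of (11), i.e. alpha_l expressed through T_max."""
    return 0.5 * ((mu * mu / 2.0) / t + t + 2.0 * gamma)

def upper_bound(t, gamma, mu):
    """Right-hand side of (13)."""
    return 0.5 * (mu * mu / t + 3.0 * t + 2.0 * gamma)

def worst_ratio(cases):
    """Largest upper/lower ratio over (t, gamma, mu) triples."""
    return max(upper_bound(t, g, m) / lower_bound(t, g, m) for t, g, m in cases)
```

Term by term, $\mu_\ell^2/\overline{T}^{\max}_\ell\le 3\,(\mu_\ell^2/2)/\overline{T}^{\max}_\ell$ and $2\gamma_\ell\le 6\gamma_\ell$, so the ratio reaches 3 only in the degenerate case $\mu_\ell=\gamma_\ell=0$.
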

Theorem 1 shows that if any policy can guarantee AAoI $\boldsymbol{\alpha}$ for the sources, then under $\pi_R$, the AAoI for the sources cannot be more than $3\boldsymbol{\alpha}$. Given the ease in implementing $\pi_R$ ($\pi_R$ does not depend on the family of distributions that the transmission delays belong to), this is an interesting result. However, in its current form, $\pi_R$ needs to know $\boldsymbol{\alpha}$. In the next section, we generalize $\pi_R$ for systems where, instead of $\boldsymbol{\alpha}$, relative weights $w_\ell$ are known for the AAoI of the sources, and the objective is to minimize the weighted sum expected AAoI. We also derive an upper bound on the competitive ratio of $\pi_R$ with respect to an optimal policy that minimizes the weighted sum of the expected AAoI of all sources.

## V Weighted Sum Expected AAoI Minimization

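Let $\mathbf{w} = (w_1,\cdots,w_N)$ denote the relative weights for each source in the system, and define the weighted sum expected AAoI to be

$$\Gamma(\pi;\mathbf{w}) = \lim_{t\to\infty}\sum_{\ell=1}^{N} w_\ell\,\mathbb{E}_\pi[\overline{\Delta}_\ell(t)]. \tag{14}$$

The objective is to find an online policy that minimizes $\Gamma(\pi;\mathbf{w})$ for any given $\mathbf{w}$. Formally, the objective is to solve the following optimization problem:

$$\pi^\star = \arg\min_{\pi\in\Pi_{ON}} \Gamma(\pi;\mathbf{w}). \tag{15}$$

Although $\pi^\star$ is not known, for analysis, let $\alpha^\star_\ell$ be the expected AAoI for the source $\ell$ under $\pi^\star$, as $t\to\infty$. Then, $\Gamma(\pi^\star;\mathbf{w}) = \sum_{\ell=1}^{N} w_\ell\alpha^\star_\ell$. Using Lemma 1, we get

$$\sum_{\ell=1}^{N} w_\ell\alpha^\star_\ell \ge \frac{1}{2}\sum_{\ell=1}^{N} w_\ell\left(\frac{\mu_\ell^2/2}{\overline{T}^{\star}_\ell} + \overline{T}^{\star}_\ell + 2\gamma_\ell\right), \tag{16}$$

where $\overline{T}^{\star}_\ell$ denotes the average of the inter-generation time of packets of source $\ell$ that are transmitted under policy $\pi^\star$. Since $\pi^\star\in\Pi_{ON}$, the $\overline{T}^{\star}_\ell$'s satisfy (7), and the R.H.S. of (16) is at least

$$\min_{1/\overline{T}_\ell,\ \forall \ell}\ \frac{1}{2}\sum_{\ell=1}^{N} w_\ell\left(\frac{\mu_\ell^2/2}{\overline{T}_\ell} + \overline{T}_\ell + 2\gamma_\ell\right) \quad \text{s.t.}\quad \sum_{\ell=1}^{N}\frac{\gamma_\ell}{\overline{T}_\ell}\le 1. \tag{17}$$

Therefore, denoting the minimizer of (17) by $1/\overline{T}^{o}_\ell$, $\forall \ell$, we get $\sum_{\ell=1}^{N} w_\ell\alpha^\star_\ell \ge \sum_{\ell=1}^{N} w_\ell\alpha^{o}_\ell$, where

$$\alpha^{o}_\ell = \frac{1}{2}\left(\frac{\mu_\ell^2/2}{\overline{T}^{o}_\ell} + \overline{T}^{o}_\ell + 2\gamma_\ell\right) \tag{18}$$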

is a lower bound on .

###### Remark 4

Note that (17) is a convex optimization problem, and can be easily solved using standard optimization tools such as CVX in Matlab. Hence, in the rest of this paper, we assume that $\overline{T}^{o}_\ell$ and $\alpha^{o}_\ell$ are known for each source $\ell$.
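
As an alternative to a general-purpose solver, (17) can also be handled through its KKT conditions, because the objective is separable and the single constraint is linear in $1/\overline{T}_\ell$. The sketch below is one such approach, not the CVX-based method used in the paper; it assumes strictly positive weights and mean delays, and the names `solve_problem_17` and `alpha_o` are hypothetical.

```python
import math

def solve_problem_17(weights, gammas, mus, iters=200):
    """Minimizing periods T_l of (17); assumes all weights and gammas > 0."""
    def periods(nu):
        # Stationarity in x = 1/T gives T^2 = mu^2/2 + nu * gamma / w,
        # where nu >= 0 is a (rescaled) multiplier of the load constraint.
        return [math.sqrt(m * m / 2.0 + nu * g / w)
                for w, g, m in zip(weights, gammas, mus)]

    def load(nu):
        # Left-hand side of the constraint sum(gamma_l / T_l) <= 1.
        return sum(g / t if t > 0 else math.inf
                   for g, t in zip(gammas, periods(nu)))

    if load(0.0) <= 1.0:      # constraint inactive at the unconstrained optimum
        return periods(0.0)
    lo, hi = 0.0, 1.0
    while load(hi) > 1.0:     # bracket the multiplier
        hi *= 2.0
    for _ in range(iters):    # bisection; load is decreasing in nu
        mid = 0.5 * (lo + hi)
        if load(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return periods(hi)        # hi is always on the feasible side

def alpha_o(t, gamma, mu):
    """Per-source value (18) evaluated at period t."""
    return 0.5 * ((mu * mu / 2.0) / t + t + 2.0 * gamma)
```

With a single source, $\gamma=1$ and $\mu=1$, the constraint is active and the minimizer is $\overline{T}^{o}=1$, which gives $\alpha^{o}=1.75$.
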

Now, for each source , define

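$$p^{o}_\ell = \frac{1/\overline{T}^{o}_\ell}{\sum_{i=1}^{N} 1/\overline{T}^{o}_i}. \tag{19}$$

###### Theorem 2

For the randomized policy $\pi_R$ (Algorithm 1) with $p_\ell = p^{o}_\ell$ (19) (for each source $\ell$), the weighted sum expected AAoI satisfies $\Gamma(\pi_R;\mathbf{w})\le 3\cdot\Gamma(\pi^\star;\mathbf{w})$, for any relative weight vector $\mathbf{w}$.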

###### Proof:

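Replacing $\overline{T}^{\max}_\ell$ by $\overline{T}^{o}_\ell$ in (11), (12) and (13), we get (19),

$$\frac{1}{2}\left(\frac{\mu_\ell^2/2}{\overline{T}^{o}_\ell} + \overline{T}^{o}_\ell + 2\gamma_\ell\right) = \alpha^{o}_\ell, \text{ and} \tag{20}$$

$$\lim_{t\to\infty}\mathbb{E}_R[\overline{\Delta}_\ell(t)] \le \frac{1}{2}\left(\frac{\mu_\ell^2}{\overline{T}^{o}_\ell} + 3\,\overline{T}^{o}_\ell + 2\gamma_\ell\right), \tag{21}$$

for each source $\ell$. Substituting (20) in (21), and taking the weighted sum of the resulting expression over all sources, we get $\lim_{t\to\infty}\sum_{\ell=1}^{N} w_\ell\,\mathbb{E}_R[\overline{\Delta}_\ell(t)]\le 3\sum_{\ell=1}^{N} w_\ell\alpha^{o}_\ell$. Since $\sum_{\ell=1}^{N} w_\ell\alpha^{o}_\ell\le\sum_{\ell=1}^{N} w_\ell\alpha^\star_\ell$, we get

$$\Gamma(\pi_R;\mathbf{w}) = \lim_{t\to\infty}\sum_{\ell=1}^{N} w_\ell\,\mathbb{E}_R[\overline{\Delta}_\ell(t)] \le 3\sum_{\ell=1}^{N} w_\ell\alpha^\star_\ell = 3\cdot\Gamma(\pi^\star;\mathbf{w}).$$

Theorem 2 shows that the weighted sum expected AAoI for the randomized policy $\pi_R$ is at most three times that of any other online policy. For the M/G/1 queuing model, this is the best guarantee known for any online policy. Also, the proof of Theorem 2 provides a general recipe for extending results for the 'per source AAoI minimization problem' to the 'weighted sum AAoI minimization problem'.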

## VI Numerical Results

Theorems 1 and 2 provide some important analytical results regarding the performance of the randomized policy $\pi_R$ (Algorithm 1). In this section, we use numerical simulations to verify these results, and derive new insights.

###### Remark 5

For all the simulations, we assume the initial AoI of the sources to be , and the time horizon units.

First, to analyze the effect of number of sources on the AAoI of an individual source, we consider a system with identical sources (for each source , , , and ), and simulate the system under policy for , assuming to be an exponential distribution with mean , and uniform distribution with mean (the rationale is to consider memoryless distribution, as well as a non-memoryless distribution ). Then, we plot the AAoI of source for different choices of and in Figure 2.

As shown in Figure 2, as $N$ increases, the AAoI of source 1 increases linearly, with slope proportional to the mean transmission delay $\gamma$. This is because when sources are identical, the probability (12) of picking a source for transmission is $1/N$, which implies that the expected time interval between two successive instants when a source gets to transmit is proportional to $N\gamma$ (since the expected transmission delay for every transmitted packet is $\gamma$).

Further, to analyze the effect of the recommended AAoI of source , we consider a system with sources, with mean packet inter-generation time , mean transmission delay , and . Then, we simulate the system under policy for , and plot the corresponding AAoI values for two of the sources in Figure 3.

###### Remark 6

The choice of parameters ’s, ’s and ’s are arbitrary, to avoid symmetry between sources. Further, we only consider , because for the considered choice of other parameters, , i.e., the conditions in Lemma 2 are satisfied, only when

Note that with increase in , (10) increases, whereas (for ) decreases. Hence, (12) (proportional to decreases, while (for ; inversely proportional to ) increases with increase in . This is also illustrated in Figure 3, where with increase in , AAoI of source increases, while AAoI of other source (source ) decreases. In addition, Figure 3 shows that when (i.e., when ), AAoI of source is less than , which is expected because of Theorem 1.

Next, to verify Theorem 2, we again consider a system with sources, and parameters , mean transmission delay , and the relative weight vector . Then, we simulate the system under policy for different values of , and family of transmission delay distribution (family of distribution is same for each source ), and plot the output in Figure 4.

###### Remark 7

For simulating $\pi_R$ when the objective is to minimize the weighted sum AAoI, we compute $\overline{T}^{o}_\ell$ ($\forall \ell$) by solving (17) (using the CVX toolbox in Matlab), and use it to obtain $p^{o}_\ell$ by substituting the $\overline{T}^{o}_\ell$'s in (19).

It is evident from Figure 4 that the weighted sum AAoI for policy $\pi_R$ is less than $3$ times the theoretical lower bound computed by solving the optimization problem (17). Also, it can be noted that the mean transmission delay has a significant impact on the weighted sum AAoI of $\pi_R$, whereas the family of the transmission delay distribution (i.e., exponential or uniform) has a comparatively negligible effect. Note that for any given value of the mean transmission delay for the sources, the lower bound (17) is independent of the family of the transmission delay distribution.

Finally, we consider a system with a single source ($N=1$) that can generate fresh packets at any time (i.e., the mean packet inter-generation time $\mu=0$), and each packet transmitted by the source suffers a random transmission delay according to some general distribution with mean $\gamma$. Also, when $N=1$, since a policy does not need to choose among multiple sources, we consider a simplified version of $\pi_R$, where the source generates and transmits a packet whenever the channel is free and the previously transmitted update is at least $\overline{T}^{o}$ (17) time units old.

###### Remark 8

$\overline{T}^{o}$ is the inter-generation time of successive transmitted packets in the lower bound (on the AAoI of the source), when $N=1$ and $\mu=0$.
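
The single-source variant can be simulated in a few lines. The sketch below rests on two assumptions that should be checked against the paper: the source generates packets at will, and the threshold is the minimizer of (17), which for one source with negligible inter-generation time reduces to the mean delay (the constraint $\gamma/\overline{T}\le 1$ is then active). The function `simulate_single_source` and its parameters are hypothetical.

```python
import random

def simulate_single_source(sample_delay, threshold, num_packets, seed=0):
    """Average AoI when a fresh packet is sent once the channel is free and
    the previously sent packet is at least `threshold` time units old."""
    rng = random.Random(seed)
    start = 0.0    # generation (= transmission start) time of current packet
    t_prev = 0.0   # time of the previous reception
    age = 0.0      # AoI just after the previous reception
    area = 0.0     # integral of the AoI curve
    for _ in range(num_packets):
        delay = sample_delay(rng)
        recv = start + delay
        dt = recv - t_prev
        area += age * dt + 0.5 * dt * dt   # slope-1 ramp between receptions
        t_prev, age = recv, delay          # AoI drops to the packet's delay
        # Next packet: channel must be free and the last packet old enough.
        start = max(recv, start + threshold)
    return area / t_prev
```

For instance, `simulate_single_source(lambda rng: rng.expovariate(1.0), 1.0, 100000)` would estimate the average AoI for exponential delays with mean 1 under this reading of the policy.
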

We simulate the system under $\pi_R$, and an optimal online policy $\pi^\star$ proposed in [sun2017update], for different choices of the transmission delay distribution and its mean, and plot the AAoI for the policies in Figure 5. Interestingly, for exponential and uniform delay distributions, we find that the difference between the AAoI of the source under $\pi_R$ and $\pi^\star$ is negligible. This is despite the fact that $\pi_R$ is independent of the family of distributions that the transmission delay belongs to, and only depends on its mean, whereas $\pi^\star$ is a threshold-based policy, where the threshold needs to be computed for each distribution, which might be difficult for certain families of distributions.

## VII Conclusion

In this paper, we considered an M/G/1 queuing model with multiple sources, where the objective is to minimize the expected average age of information (AAoI) of each source. We proposed an online randomized policy for prioritizing sources (and their packets), and showed that if there exists any online policy that can keep the expected AAoI of the sources below $\boldsymbol{\alpha}$, then the proposed policy can guarantee expected AAoI of the sources less than $3\boldsymbol{\alpha}$. We further showed that in the setting where $\boldsymbol{\alpha}$ is not known, and the objective is to minimize the weighted sum expected AAoI (WSAAoI) of the sources, the proposed policy guarantees a WSAAoI that is at most $3$ times the minimum possible value (under any online policy). Using numerical simulations, we also showed that in special cases of the problem, where an optimal online policy is known, the proposed randomized policy might still be preferable due to its ease of implementation and near-optimal performance.

## Appendix A Existence of an online policy for which the expected AAoI of each source is finite

Recall that $N$, $\mu_\ell$ and $\gamma_\ell$ are finite. Therefore, packet inter-generation times and transmission delays are finite with probability 1, for each source $\ell$. Hence, when $N$ is finite, a round-robin policy that picks a source, waits until a fresh packet is generated at the picked source, transmits the generated packet, and then picks the next source in cyclic order when the channel becomes free, has finite expected AAoI for each source.

## Appendix B Proof of Lemma 1 and Lemma 3

Let denote the sequence of packets of source that get transmitted under policy . Also, let , and respectively denote the generation time of packet , time when transmission of packet begins, and the time when transmission of packet completes. Now, define , , and . Note that is the inter-generation time of packets that are transmitted by policy , and is equal to the age of packet at the instant it is received at the monitor.

###### Remark 9

Note that

is a random variable that denotes the transmission delay of packet

. By definition, is independent of policy .

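Figure 6 shows a sample AoI plot for source $\ell$, labelled with the quantities defined above. As evident from Figure 6 (and shown in detail in [kaul2012status]), the AAoI (1) of source $\ell$ can be expressed in terms of the defined quantities as follows

$$\lim_{t\to\infty}\overline{\Delta}^{\pi}_\ell(t) \stackrel{(a)}{=} \lim_{t\to\infty}\frac{\sum_{i=1}^{R^{\pi}_\ell(t)}\left((T^{\pi}_{\ell i})^2/2 + T^{\pi}_{\ell i}Z^{\pi}_{\ell i}\right)}{t}, \tag{22}$$

where $R^{\pi}_\ell(t)$ denotes the number of packets transmitted by source $\ell$ under policy $\pi$, until time $t$. Also, $\lim_{t\to\infty}\sum_{i=1}^{R^{\pi}_\ell(t)}T^{\pi}_{\ell i}/t = 1$ (shown in detail in [saurav2021minimizing]). Therefore,

$$\lim_{t\to\infty}\frac{\sum_{i=1}^{R^{\pi}_\ell(t)}T^{\pi}_{\ell i}}{R^{\pi}_\ell(t)} = \lim_{t\to\infty}\frac{t}{R^{\pi}_\ell(t)} = \overline{T}^{\pi}_\ell, \tag{23}$$

where $\overline{T}^{\pi}_\ell$ denotes the average inter-generation time of packets transmitted by policy $\pi$.

From (23), it follows that when $t\to\infty$, $\sum_{i=1}^{R^{\pi}_\ell(t)}T^{\pi}_{\ell i} = R^{\pi}_\ell(t)\cdot\overline{T}^{\pi}_\ell$. Therefore,

$$\sum_{i=1}^{R^{\pi}_\ell(t)}T^{\pi}_{\ell i} - R^{\pi}_\ell(t)\,\overline{T}^{\pi}_\ell = \sum_{i=1}^{R^{\pi}_\ell(t)}\left(T^{\pi}_{\ell i}-\overline{T}^{\pi}_\ell\right) = \sum_{i=1}^{R^{\pi}_\ell(t)}\delta^{\pi}_{\ell i} = 0 \tag{24}$$

where $\delta^{\pi}_{\ell i} = T^{\pi}_{\ell i}-\overline{T}^{\pi}_\ell$. This also implies that when $t\to\infty$,

$$\sum_{i=1}^{R^{\pi}_\ell(t)}T^{\pi}_{\ell i} = \sum_{i=1}^{R^{\pi}_\ell(t)}\delta^{\pi}_{\ell i} + \sum_{i=1}^{R^{\pi}_\ell(t)}\overline{T}^{\pi}_\ell = R^{\pi}_\ell(t)\cdot\overline{T}^{\pi}_\ell. \tag{25}$$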

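Further, squaring both sides of $T^{\pi}_{\ell i} = \delta^{\pi}_{\ell i}+\overline{T}^{\pi}_\ell$, we get $(T^{\pi}_{\ell i})^2 = (\delta^{\pi}_{\ell i})^2 + (\overline{T}^{\pi}_\ell)^2 + 2\,\overline{T}^{\pi}_\ell\,\delta^{\pi}_{\ell i}$. Hence,

$$\begin{aligned}
\sum_{i=1}^{R^{\pi}_\ell(t)}(T^{\pi}_{\ell i})^2 &= \sum_{i=1}^{R^{\pi}_\ell(t)}(\delta^{\pi}_{\ell i})^2 + \sum_{i=1}^{R^{\pi}_\ell(t)}(\overline{T}^{\pi}_\ell)^2 + 2\,\overline{T}^{\pi}_\ell\sum_{i=1}^{R^{\pi}_\ell(t)}\delta^{\pi}_{\ell i}, \\
&\stackrel{(a)}{=} \sum_{i=1}^{R^{\pi}_\ell(t)}(\delta^{\pi}_{\ell i})^2 + \sum_{i=1}^{R^{\pi}_\ell(t)}(\overline{T}^{\pi}_\ell)^2, \\
&= \sum_{i=1}^{R^{\pi}_\ell(t)}(\delta^{\pi}_{\ell i})^2 + R^{\pi}_\ell(t)\cdot(\overline{T}^{\pi}_\ell)^2,
\end{aligned} \tag{26}$$

where we get (a), because $\sum_{i=1}^{R^{\pi}_\ell(t)}\delta^{\pi}_{\ell i} = 0$ (from (24)).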
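
The decomposition in (26) is the usual split of a sum of squares into a deviation part and a mean part. The sketch below checks it on a finite sample, using the empirical mean in place of the limiting average $\overline{T}^{\pi}_\ell$; the function name `check_decomposition` is hypothetical.

```python
def check_decomposition(samples):
    """Return both sides of (26) and the sum of deviations for a sample."""
    r = len(samples)
    mean = sum(samples) / r                 # stands in for the average period
    deltas = [t - mean for t in samples]    # deviations, as in (24)
    lhs = sum(t * t for t in samples)
    rhs = sum(d * d for d in deltas) + r * mean * mean
    return lhs, rhs, sum(deltas)
```

For the sample `[1, 2, 3, 6]`, the mean is 3, the squared deviations sum to 14, and both sides equal 50.
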

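Substituting (25) and (26) in (22), and using the relation $\sum_{i=1}^{R^{\pi}_\ell(t)}T^{\pi}_{\ell i} = t$ (for $t\to\infty$), we get

$$\begin{aligned}
\lim_{t\to\infty}\overline{\Delta}^{\pi}_\ell(t) &= \lim_{t\to\infty}\left(\frac{\sum_{i=1}^{R^{\pi}_\ell(t)}(\delta^{\pi}_{\ell i})^2}{2R^{\pi}_\ell(t)\cdot\overline{T}^{\pi}_\ell} + \frac{\overline{T}^{\pi}_\ell}{2} + \frac{\sum_{i=1}^{R^{\pi}_\ell(t)}T^{\pi}_{\ell i}Z^{\pi}_{\ell i}}{\sum_{i=1}^{R^{\pi}_\ell(t)}T^{\pi}_{\ell i}}\right), \\
&= \frac{\beta^{\pi}_\ell}{2\,\overline{T}^{\pi}_\ell} + \frac{\overline{T}^{\pi}_\ell}{2} + \phi^{\pi}_\ell,
\end{aligned} \tag{27}$$

where

$$\beta^{\pi}_\ell = \lim_{t\to\infty}\frac{\sum_{i=1}^{R^{\pi}_\ell(t)}(\delta^{\pi}_{\ell i})^2}{R^{\pi}_\ell(t)}, \text{ and} \tag{28}$$

$$\phi^{\pi}_\ell = \lim_{t\to\infty}\frac{\sum_{i=1}^{R^{\pi}_\ell(t)}T^{\pi}_{\ell i}Z^{\pi}_{\ell i}}{\sum_{i=1}^{R^{\pi}_\ell(t)}T^{\pi}_{\ell i}}. \tag{29}$$

Taking expectation on both sides of (27), and using Tonelli's Theorem [patrick1995probability] to exchange limit and expectation (by definition, each term in (27) is non-negative and measurable), we get

$$\begin{aligned}
\lim_{t\to\infty}\mathbb{E}_\pi[\overline{\Delta}^{\pi}_\ell(t)] &= \mathbb{E}_\pi\!\left[\frac{\beta^{\pi}_\ell}{2\,\overline{T}^{\pi}_\ell} + \frac{\overline{T}^{\pi}_\ell}{2}\right] + \mathbb{E}_\pi[\phi^{\pi}_\ell] \\
&= \frac{\mathbb{E}_\pi[\beta^{\pi}_\ell]}{2\,\overline{T}^{\pi}_\ell} + \frac{\overline{T}^{\pi}_\ell}{2} + \lim_{t\to\infty}\mathbb{E}_\pi\!\left[\frac{\sum_{i=1}^{R^{\pi}_\ell(t)}T^{\pi}_{\ell i}Z^{\pi}_{\ell i}}{\sum_{i=1}^{R^{\pi}_\ell(t)}T^{\pi}_{\ell i}}\right].
\end{aligned} \tag{30}$$

### B-A Proof of Lemma 1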

Recall that , where , and ’s are independent and identically distributed according to distribution . Also, ’s are non-negative (by definition). Therefore,