I. Introduction
In recent years, there has been a dramatic proliferation of research on the age of information. The age of information is inspired by a variety of network applications requiring timely information to accomplish their tasks. Examples include information updates for smartphone users, e.g., traffic and transportation, as well as status updates for smart systems, e.g., smart transportation systems and smart grid systems.
On the one hand, a smartphone user needs timely traffic and transportation information to plan the best route. On the other hand, timely information about vehicles’ positions and speeds is needed to plan a collision-free transportation system. In both cases, snapshots of the information are generated by their sources at some epochs and sent to the end devices (e.g., smartphones and vehicles) in the form of packets over wired or wireless networks. Since the information at the end devices is expected to be as timely as possible, the age of information is therefore proposed to capture the
freshness of the information at the end devices; more precisely, it measures the time elapsed since the generation of the freshest packet. The goal is to develop networks supporting age-sensitive applications. Interestingly, neither a throughput-optimal design nor a delay-optimal design necessarily results in the minimum age [1].

In this paper, we consider a base station (BS) that updates many users over a wireless broadcast network, where new information is randomly generated. We assume that the BS can update at most one user in each transmission opportunity. Under this transmission constraint, a transmission scheduling algorithm manages how the channel resources are allocated over time, depending on the packet arrivals and the ages of the information. The scheduling design is critical to good network performance. We hence design and analyze a scheduling algorithm to minimize the long-run average age.
The wireless broadcast network is similar to the model in the earlier work [2] of the present author; however, a low-complexity scheduling algorithm remains unexplored. To fill this gap, this work investigates age-optimal scheduling from the perspective of restless bandits [3]. Whittle [4] considers a relaxed restless multi-armed bandit problem and decouples it into many subproblems, each consisting of a single bandit, while proposing an index policy and a concept of indexability. The Whittle index policy is asymptotically optimal under certain conditions [5] and, in practice, performs strikingly well [6]. Note that each user in our problem can be viewed as a restless bandit; as such, we apply Whittle’s approach to develop a scheduling algorithm.
I-A. Contributions
We transform our problem into a relaxed restless multi-armed bandit problem and investigate it from Whittle’s perspective. However, in general, a closed form of the Whittle index might be unavailable. To tackle this issue, we formulate each decoupled subproblem as a Markov decision process (MDP), with the purpose of minimizing an average cost. Since our MDP involves an
average cost optimization over an infinite horizon with a countably infinite state space, our problem is challenging to analyze [7]. We prove that an optimal policy of the MDP is stationary and deterministic; in particular, it is of a simple threshold type. We then derive an optimal threshold by exploiting the threshold structure along with a post-action age. It turns out that the post-action age simplifies the calculation of the average cost; as such, we obtain the Whittle index in closed form and show the indexability. Finally, we propose a Whittle index scheduling algorithm and numerically validate its performance.

I-B. Related Works
The age of information has attracted much interest from the research community, e.g., [1, 8, 9]; see also the survey [10]. The works most relevant to scheduling multiple users are [11, 12, 13, 14]. The works [11, 12] consider queues at a BS that store all out-of-date packets, different from ours. The paper [13] considers a buffer that stores the latest information with periodic arrivals, whereas information updates in [14] can be generated at will. Our work contributes to the age of information by developing a low-complexity algorithm for scheduling stochastic information arrivals.
II. System Overview
II-A. Network Model
We consider a wireless broadcast network consisting of a base station (BS) and N wireless users, as shown in Fig. 1. Each user i is interested in a type of information generated by source i, for i = 1, …, N, respectively. All information is sent in the form of packets by the BS over a noiseless broadcast channel.
We consider a discrete-time system with slots t = 1, 2, …. The packets from the sources (if any) arrive at the BS at the beginning
of each slot. The arrivals at the BS for different users are independent of each other, and also independent and identically distributed (i.i.d.) over slots, governed by Bernoulli distributions. Precisely, by Λ_i(t) we indicate whether a packet from source i arrives at the BS in slot t, where Λ_i(t) = 1 if there is a packet, with P[Λ_i(t) = 1] = p_i; otherwise, Λ_i(t) = 0.

We assume that the BS can send at most one packet during each slot, i.e., the BS can update at most one user in each slot. Moreover, we focus on the setting in which the BS does not buffer a packet if it is not transmitted in its arriving slot. The no-buffer network is motivated by [2].
By D(t) ∈ {0, 1, …, N} we denote the decision of the BS in slot t, where D(t) = 0 if no user is updated in slot t and D(t) = i if user i is scheduled to be updated in slot t. A scheduling algorithm specifies a decision for each slot. Next, we define the age of information as our design criterion.
II-B. Age of Information Model
The age of information captures the freshness of the information at the users. We initialize the ages of all arriving packets at the BS to be zero. The age of information at a user becomes one on receiving a packet, due to the one slot of transmission time. Let X_i(t) be the age of information for user i in slot t, before the BS makes a scheduling decision. Suppose that the age increases linearly with slots. Then, the dynamics of the age of information for user i is

X_i(t+1) = 1 if D(t) = i and Λ_i(t) = 1, and X_i(t+1) = X_i(t) + 1 otherwise,

where the age in the next slot is one if the user gets updated with new information; otherwise, the age increases by one. Since the BS can update at most one user in each slot, at most one of the ages can drop to one in any slot, while all the other ages increase by one.
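As a quick illustration, the arrival process, transmission constraint, and age dynamics above can be simulated directly; the sketch below is ours, and the helper name `simulate_ages` and the initial ages are assumptions for illustration only.

```python
import random

def simulate_ages(p, T, policy, seed=0):
    """Simulate the no-buffer broadcast model: len(p) users, Bernoulli(p[i])
    packet arrivals, and at most one user updated per slot."""
    rng = random.Random(seed)
    N = len(p)
    ages = [1] * N                       # initial ages, assumed to be 1
    total = 0
    for t in range(T):
        arrivals = [rng.random() < p[i] for i in range(N)]
        d = policy(ages, arrivals)       # decision: 0 = idle, i = update user i
        for i in range(N):
            if d == i + 1 and arrivals[i]:
                ages[i] = 1              # fresh packet delivered in one slot
            else:
                ages[i] += 1             # no update: the age grows by one
        total += sum(ages)
    return total / T                     # empirical average (total) age
```

For instance, with a single user, sure arrivals, and a policy that always updates, the age stays at one; with a policy that never updates, the age grows without bound.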
II-C. Problem Formulation
We define the average age under a scheduling algorithm π by

lim sup_{T→∞} (1/T) E_π[ Σ_{t=1}^{T} Σ_{i=1}^{N} X_i(t) ],

where E_π represents the conditional expectation, given that the algorithm π is employed. Note that we focus on the total age, but our work can be easily extended to a weighted sum of the ages. Our goal is to develop a low-complexity scheduling algorithm whose average age is close to the minimum by leveraging Whittle’s methodology [4].
III. Scheduling Algorithm Design
We will develop a scheduling algorithm based on restless bandits [3] in stochastic control theory. To that end, in this section we start by casting our problem as a restless multi-armed bandit problem [3], followed by introducing the Whittle index [4] as a solution to the multi-armed bandit problem. A challenge of this approach is to obtain the Whittle index. We then explicitly derive the Whittle index in a simpler way using a post-action age. We finally propose a scheduling algorithm based on the Whittle index.
III-A. Restless Bandits and Whittle’s Approach
A restless bandit generalizes a classic bandit by allowing the bandit to keep evolving under a passive action, but in a distinct way from its evolution under an active action. However, the restless bandit problem is, in general, PSPACE-hard [3]. Whittle hence investigates a relaxed version, where the constraint on the number of active bandits in each slot is replaced by a constraint on the expected number. With this relaxation, Whittle applies a Lagrangian approach to decouple the multi-armed bandit problem into multiple subproblems.
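Concretely, Whittle's relaxation can be sketched as follows; the display is our paraphrase (with D(t) denoting the scheduling decision and N the number of users, as in Section II), not the paper's exact formulation:

```latex
% Hard constraint: at most one user is updated in each slot t
\sum_{i=1}^{N} \mathbb{1}\{D(t) = i\} \le 1 \quad \text{for all } t.
% Relaxed constraint: it needs to hold only in a long-run average sense
\limsup_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T}
  \mathbb{E}\left[\sum_{i=1}^{N} \mathbb{1}\{D(t) = i\}\right] \le 1.
% Attaching a Lagrange multiplier c to the relaxed constraint decouples
% the problem into N single-user subproblems, each charged c per update.
```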
We can regard each user in our problem as a restless bandit. Following Whittle’s approach, we can decouple our problem into N subproblems. A subproblem consists of a single user and adheres to the network model in Section II with N = 1, except for an additional cost c for updating the user. In each subproblem, we aim at determining whether or not the user should be updated in each slot, in order to strike a balance between the updating cost and the cost incurred by the age. In fact, the cost c is a scalar Lagrange multiplier in the Lagrangian approach. Since each subproblem consists of a single user, hereafter we omit the user index i for simplicity.
III-B. Decoupled Subproblem
We formulate the subproblem as a Markov decision process (MDP), with the components [15] as follows.
States: We define the state of the MDP in slot t by s(t) = (X(t), Λ(t)), i.e., the age along with the arrival indicator. This is an infinite-state MDP, as the age is possibly unbounded.
Actions: Let a(t) ∈ {0, 1} be an action of the MDP in slot t indicating the BS’s decision, where a(t) = 1 if the BS decides to update the user and a(t) = 0 if the BS decides to idle.
Cost: Let C(s(t), a(t)) be the immediate cost if action a(t) is taken in slot t under state s(t), defined as

C(s(t), a(t)) = X(t+1) + c · a(t),   (1)

where the first part X(t+1) is the resulting age in the next slot and the second part is the cost incurred for updating the user.
A policy of the MDP specifies an action for each slot. A policy is history dependent if a(t) depends on all past states and actions. A policy is stationary if a(t) = a(t′) whenever s(t) = s(t′), for any t and t′. Moreover, a randomized policy chooses an action according to a probability distribution, while a deterministic policy chooses an action with certainty.
The average cost under a policy π is defined by

lim sup_{T→∞} (1/T) E_π[ Σ_{t=1}^{T} C(s(t), a(t)) ].
Definition 1.
A policy (that can be history dependent) is cost-optimal if it minimizes the average cost.
The objective of the MDP is to find a policy that minimizes the average cost. According to [15], a cost-optimal policy that is stationary and deterministic may not exist in general. Hence, in the next section, we investigate the structure of a cost-optimal policy.
III-C. Characterizing a Cost-Optimal Policy
We study the structure of a cost-optimal policy in this section. First, we show that there is a cost-optimal policy that is stationary and deterministic.
Theorem 2.
There exists a stationary and deterministic policy that is cost-optimal, independent of the initial state.
Proof.
Given initial state s, we define the expected total discounted cost [15] under policy π by

V_α^π(s) = lim_{T→∞} E_π[ Σ_{t=1}^{T} α^{t−1} C(s(t), a(t)) | s(1) = s ],

where α ∈ (0, 1) is a discount factor. Moreover, let V_α(s) = min_π V_α^π(s) be the minimum expected total discounted cost. A policy that minimizes the expected total discounted cost is called an optimal policy for the discounted problem.
According to [16], a deterministic stationary policy is cost-optimal if the following two conditions hold.

There exists a deterministic stationary policy of the MDP such that the associated average cost is finite: Let the policy be the one that always chooses the update action whenever there is an arrival. The age under this policy forms a discrete-time Markov chain (DTMC) in Fig. 2, which resets to one with probability p in each slot and otherwise increases by one. The steady-state distribution of the DTMC is P[X = x] = p(1 − p)^{x−1} for x = 1, 2, …. Hence, the average age is 1/p. On the other hand, the average updating cost is cp, as the arrival probability is p. Hence, the average cost under the policy is the average age (i.e., 1/p) plus the average updating cost (i.e., cp), which is finite and yields the result.

There exists a nonnegative constant L such that the relative cost function satisfies V_α(s) − V_α(s₀) ≥ −L for all states s and discount factors α, where s₀ is a reference state: Similar to [2], we can show that V_α is a non-decreasing function of the age given the arrival indicator; moreover, V_α is a non-increasing function of the arrival indicator given the age. Then, we can choose L = 0 by choosing the reference state s₀ = (1, 1).
By verifying the two conditions, the theorem immediately follows from [16]. ∎
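The finiteness calculation in the first condition can be checked by simulation; the sketch below assumes a single user with arrival probability p and updating cost c, and the helper name `always_update_avg_cost` is ours.

```python
import random

def always_update_avg_cost(p, c, T=200_000, seed=1):
    """Average cost of the stationary policy that updates on every arrival.
    Per slot, the immediate cost is the next-slot age plus c per update."""
    rng = random.Random(seed)
    age, total = 1, 0.0
    for _ in range(T):
        if rng.random() < p:     # arrival: update, so the next-slot age is 1
            age = 1
            total += 1 + c
        else:                    # no arrival: the age keeps growing
            age += 1
            total += age
    return total / T
```

For p = 0.5 and c = 2, the long-run average cost concentrates around 1/p + cp = 3, i.e., the average age 1/p plus the average updating cost cp.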
Next, we further characterize a cost-optimal policy by showing that it can be taken to be a special type of deterministic stationary policy.
Definition 3.
A threshold-type policy is a deterministic stationary policy of the MDP with the following structure. The action for any state without an arrival is to idle. Moreover, if the action for state (x, 1) is to update, then the action for state (x + 1, 1) is to update as well. In other words, there exists a threshold such that the action is to update if there is an arrival and the age is greater than or equal to the threshold; otherwise, the action is to idle.
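A threshold-type policy is straightforward to implement and to evaluate by simulation; the sketch below (helper names ours) estimates the average cost of the decoupled subproblem under a given threshold.

```python
import random

def threshold_action(age, arrival, threshold):
    # update (action 1) iff a packet arrived and the age reached the threshold
    return 1 if (arrival and age >= threshold) else 0

def threshold_avg_cost(p, c, threshold, T=200_000, seed=2):
    """Estimate the long-run average cost (next-slot age plus c per update)
    of the threshold-type policy for a single user."""
    rng = random.Random(seed)
    age, total = 1, 0.0
    for _ in range(T):
        arrival = rng.random() < p
        a = threshold_action(age, arrival, threshold)
        age = 1 if a else age + 1    # age dynamics of the subproblem
        total += age + c * a         # immediate cost of Eq. (1)
    return total / T
```

With threshold 1, the policy reduces to updating on every arrival; when the updating cost c is large, a higher threshold pays off (e.g., for p = 0.5 and c = 10, threshold 4 beats threshold 1).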
Theorem 4.
If the updating cost c ≥ 0, then there exists a policy of the threshold type that is cost-optimal.
Proof.
It is obvious that an optimal action for a state without an arrival is to idle when c ≥ 0, as updating then incurs the cost without changing the age. To establish the optimality of the threshold structure for states with an arrival, we need the discounted cost optimality equation (see the proof of Theorem 2 and [15]): if the first condition in the proof of Theorem 2 holds, then for any state s the minimum expected total discounted cost satisfies

V_α(s) = min_{a ∈ {0, 1}} { C(s, a) + α E[V_α(s′)] },

where the expectation is taken over the next state s′ reachable from state s under action a.
Subsequently, we prove that an optimal policy is of the threshold type. An optimal action for state s achieves the minimum in the discounted cost optimality equation [15]. Suppose that an optimal action for state (x, 1) is to update, i.e., the updating branch of the optimality equation is no larger than the idling branch at (x, 1). Then, an optimal action for state (x + 1, 1) is still to update: the updating branch takes the same value at both states, while the idling branch is no smaller at (x + 1, 1), since V_α is non-decreasing in the age given the arrival indicator (see the proof of Theorem 2). Hence, an optimal policy is of the threshold type.
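The threshold structure can also be observed numerically by solving the decoupled subproblem with relative value iteration on a truncated age space; the truncation level, parameter values, and helper name below are our assumptions for illustration.

```python
import numpy as np

def solve_subproblem(p=0.5, c=5.0, x_max=50, iters=2000):
    """Relative value iteration for the decoupled MDP with states (age, arrival),
    actions {0: idle, 1: update}, and cost = next-slot age + c per update."""
    h = np.zeros((x_max + 1, 2))               # relative value function
    q = np.zeros((x_max + 1, 2, 2))            # state-action values
    for _ in range(iters):
        for x in range(1, x_max + 1):
            xg = min(x + 1, x_max)             # age if the user is not updated
            for lam in (0, 1):
                # idling: the age simply grows
                q[x, lam, 0] = xg + (1 - p) * h[xg, 0] + p * h[xg, 1]
                # updating resets the age only if a packet actually arrived
                xr = 1 if lam == 1 else xg
                q[x, lam, 1] = xr + c + (1 - p) * h[xr, 0] + p * h[xr, 1]
        hn = q.min(axis=2)
        h = hn - hn[1, 0]                      # normalize at a reference state
    return q.argmin(axis=2)                    # greedy policy (ties -> idle)

policy = solve_subproblem()
```

For states with an arrival, the computed action is non-decreasing in the age, i.e., the update region has the form {age ≥ threshold}.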
To find an optimal threshold for minimizing the average cost, we explicitly derive the average cost in the next theorem.
Theorem 5.
Given the threshold-type policy with threshold τ, the average cost under the policy, denoted by C(τ), is

C(τ) = [ τ(τ − 1)/2 + τ/p + (1 − p)/p² + c ] / ( τ − 1 + 1/p ).   (2)
Proof.
Let Y(t) be the age after the action in slot t; precisely, Y(t) = 0 if a(t) = 1 and Λ(t) = 1, and Y(t) = X(t) otherwise. Note that Y(t), called the post-action age (similar to the post-decision state [17]), is different from the pre-action age X(t).
The post-action age forms a DTMC in Fig. 3. Moreover, we associate each state of the DTMC with a cost. The DTMC incurs a cost of 1 + c in slot t when the post-action age in slot t is 0, since a zero post-action age implies that the BS updates the user, while incurring a cost of y + 1 in slot t when the post-action age is y ≥ 1. Then, the steady-state distribution of the DTMC is

P[Y = y] = 1/(τ − 1 + 1/p) for 0 ≤ y ≤ τ − 1, and P[Y = y] = (1 − p)^{y−τ+1}/(τ − 1 + 1/p) for y ≥ τ − 1.

Therefore, the average cost of the DTMC follows by weighting each state’s cost with its steady-state probability, yielding Eq. (2). ∎
We elaborate on the post-action age in the proof. There are alternatives for obtaining the average cost, as follows:

As in much of the literature, e.g., [3], we can find an optimal action for each state by solving the average cost optimality equation [15]. However, we might not be able to establish the average cost optimality equation, as this is an infinite-state MDP [16]. Even when the average cost optimality equation is established, it is usually hard to solve directly.

Given a threshold, the state (X(t), Λ(t)) forms a two-dimensional DTMC. It is usually hard to solve for the steady-state distribution of a multi-dimensional DTMC.

Given a threshold, the pre-action age forms a DTMC as well. However, we cannot associate each of its states with a cost, since the cost depends not only on the state but also on the action (see Eq. (1)). On the contrary, the cost for the post-action age is determined by the state only.
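The point of the last alternative can be checked numerically: under the threshold policy, the per-slot cost can be recovered from the post-action age alone (here we take the post-action age to be 0 right after an update, and the helper name is ours).

```python
import random

def costs_two_ways(p=0.6, c=2.0, threshold=3, T=100_000, seed=3):
    """Simulate the threshold policy and total the per-slot cost two ways:
    (i) from the (state, action) pair, and (ii) from the post-action age
    alone, taking the post-action age to be 0 right after an update."""
    rng = random.Random(seed)
    age = 1
    direct = via_post = 0.0
    for _ in range(T):
        arrival = rng.random() < p
        a = 1 if (arrival and age >= threshold) else 0
        next_age = 1 if a else age + 1
        direct += next_age + c * a                  # immediate cost of Eq. (1)
        y = 0 if a else age                         # post-action age
        via_post += (1 + c) if y == 0 else (y + 1)  # cost as a function of y only
        age = next_age
    return direct / T, via_post / T
```

Both accountings coincide slot by slot, which is what makes the one-dimensional post-action-age DTMC sufficient for computing the average cost.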
III-D. Deriving the Whittle Index
Now, we are ready to define the Whittle index.
Definition 6.
We define the Whittle index for a state as the cost c that makes both actions for that state equally desirable.
Theorem 7.
The Whittle index of the subproblem is W(x, 0) = 0 for states without an arrival, and

W(x, 1) = x(x − 1)/2 + x/p

for states with an arrival and age x.
Proof.
It is obvious that the Whittle index for a state (x, 0) without an arrival is zero: both actions result in the same age in the next slot, and they are equally desirable exactly when the updating cost c is zero.
Consider C(τ) in Eq. (2) extended to the real-valued domain τ ≥ 1. Note that C(τ) is strictly convex in that domain. Let τ* be the minimizer of C(τ). Then, an optimal threshold for minimizing the average cost is either ⌊τ*⌋ or ⌈τ*⌉: the optimal threshold is ⌊τ*⌋ if C(⌊τ*⌋) ≤ C(⌈τ*⌉), and ⌈τ*⌉ otherwise. If there is a tie, both choices are optimal.
Hence, both actions for state (x, 1) are equally desirable if and only if the age x satisfies

C(x) = C(x + 1),

i.e., both thresholds x and x + 1 are optimal. By solving the above equation for c, we obtain the cost that makes both actions equally desirable, i.e., the Whittle index in the theorem. ∎
According to Theorem 7, both actions might be tied. If there is a tie, we break the tie in favor of idling. Then, we can explicitly express the optimal threshold in the next theorem.
Theorem 8.
For each age x ≥ 1, the optimal threshold for minimizing the average cost is x if the cost c satisfies W(x − 1, 1) ≤ c < W(x, 1).
Proof.
Since W(x, 1) is the cost that makes both actions for state (x, 1) equally desirable and we break a tie in favor of idling, the optimal threshold is x + 1 if the cost is exactly W(x, 1), for all x. We claim that the optimal threshold monotonically increases with the cost c, and then the theorem follows.
To verify the claim, we can focus on an optimal policy according to the proof of Theorem 4. Suppose that an optimal action, associated with a cost c, for state (x, 1) is to idle, i.e., the idling branch of the optimality equation is no larger than the updating branch at (x, 1). Then, an optimal action, associated with a larger cost, for state (x, 1) is to idle as well, since a larger updating cost only makes updating less attractive. Then, the monotonicity is established. ∎
Next, according to [4], we have to demonstrate the indexability defined as follows.
Definition 9.
Given the cost c, consider the set of states for which the optimal action is to idle. The subproblem is indexable if this set monotonically increases from the empty set to the entire state space as c increases from −∞ to ∞.
Theorem 10.
The subproblem is indexable.
Proof.
If c < 0, the optimal action for every state is to update; as such, the set of idle states is empty. If c ≥ 0, then the set consists of all states without an arrival, together with the states (x, 1) whose age x is below the optimal threshold. According to Theorem 8, the optimal threshold monotonically increases as c increases, and hence the set monotonically increases to the entire state space. ∎
III-E. Scheduling Algorithm Design
Now, we are ready to propose a scheduling algorithm based on the Whittle index. In each slot t, the BS observes the age X_i(t) and the arrival indicator Λ_i(t) of every user i; it then updates the user with the highest value of the Whittle index W(X_i(t), Λ_i(t)). We can think of the index of user i as the largest updating cost worth paying for that user; the intuition of the scheduling algorithm is that the BS intends to send the most valuable packet. In Fig. 4, we compare the proposed algorithm with the age-optimal scheduling algorithm in [2] for two users over 100,000 slots. It turns out that the simple index algorithm almost achieves the minimum average age.
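The resulting per-slot decision is only a few lines; the sketch below leaves the index as a pluggable function (its closed form comes from Theorem 7), and the stand-in `toy_index` (larger age with an arrival scores higher) as well as the idle-on-nonpositive-index rule are purely illustrative assumptions.

```python
def whittle_schedule(ages, arrivals, index_fn):
    """One slot of the index policy: update the user with the largest index
    (ties broken toward the smallest user id), or idle if no index is positive."""
    scores = [index_fn(x, lam) for x, lam in zip(ages, arrivals)]
    best = max(range(len(scores)), key=lambda i: (scores[i], -i))
    return best + 1 if scores[best] > 0 else 0   # 0 means idle

def toy_index(age, arrival):
    # hypothetical stand-in for the closed-form Whittle index
    return age if arrival else 0
```

For example, with ages (5, 3) and arrivals for both users, user 1 is scheduled; with no arrivals at all, the BS idles.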
IV. Conclusion
This paper treated a wireless broadcast network, where many users are interested in different types of information delivered by a base station. Under a transmission constraint, we studied a transmission scheduling problem with respect to the age of information. We proposed a low-complexity scheduling algorithm leveraging Whittle’s methodology. Numerical studies showed that the proposed algorithm nearly minimizes the average age. Investigating a regime in which the proposed algorithm is optimal would be an interesting extension.
Acknowledgement
The author is grateful to anonymous reviewers for their constructive comments.
References
 [1] S. Kaul, R. D. Yates, and M. Gruteser, “Real-Time Status: How Often Should One Update?” Proc. of IEEE INFOCOM, pp. 2731–2735, 2012.
 [2] Y.-P. Hsu, E. Modiano, and L. Duan, “Age of Information: Design and Analysis of Optimal Scheduling Algorithms,” Proc. of IEEE ISIT, pp. 561–565, 2017.
 [3] J. Gittins, K. Glazebrook, and R. Weber, Multi-Armed Bandit Allocation Indices. John Wiley & Sons, 2011.
 [4] P. Whittle, “Restless Bandits: Activity Allocation in a Changing World,” Journal of Applied Probability, pp. 287–298, 1988.
 [5] R. R. Weber and G. Weiss, “On an index policy for restless bandits,” Journal of Applied Probability, vol. 27, no. 3, pp. 637–648, 1990.
 [6] M. Larranaga, U. Ayesta, and I. M. Verloop, “Stochastic and Fluid Index Policies for Resource Allocation Problems,” Proc. of IEEE INFOCOM, pp. 1230–1238, 2015.
 [7] D. P. Bertsekas, Dynamic Programming and Optimal Control Vol. I and II. Athena Scientific, 2012.
 [8] M. Costa, M. Codreanu, and A. Ephremides, “On the Age of Information in Status Update Systems with Packet Management,” IEEE Trans. Inf. Theory, vol. 62, no. 4, pp. 1897–1910, 2016.
 [9] A. M. Bedewy, Y. Sun, and N. B. Shroff, “Age-Optimal Information Updates in Multihop Networks,” Proc. of IEEE ISIT, pp. 576–580, 2017.
 [10] A. Kosta, N. Pappas, and V. Angelakis, “Age of Information: A New Concept, Metric, and Tool,” Foundations and Trends® in Networking, vol. 12, no. 3, pp. 162–259, 2017.
 [11] Q. He, D. Yuan, and A. Ephremides, “Optimal Link Scheduling for Age Minimization in Wireless Systems,” accepted to IEEE Trans. Inf. Theory, 2017.
 [12] C. Joo and A. Eryilmaz, “Wireless Scheduling for Information Freshness and Synchrony: Drift-based Design and Heavy-Traffic Analysis,” Proc. of IEEE WIOPT, pp. 1–8, 2017.
 [13] I. Kadota, E. Uysal-Biyikoglu, R. Singh, and E. Modiano, “Minimizing Age of Information in Broadcast Wireless Networks,” Proc. of Allerton, 2016.
 [14] R. D. Yates, P. Ciblat, A. Yener, and M. Wigger, “Age-Optimal Constrained Cache Updating,” Proc. of IEEE ISIT, pp. 141–145, 2017.
 [15] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 1994.
 [16] L. I. Sennott, “Average Cost Optimal Stationary Policies in Infinite State Markov Decision Processes with Unbounded Costs,” Operations Research, vol. 37, pp. 626–633, 1989.

 [17] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality. John Wiley & Sons, 2011.