In recent years, there has been a dramatic proliferation of research on the age of information. The age of information is motivated by a variety of network applications requiring timely information to accomplish tasks. Examples include information updates for smart-phone users, e.g., traffic and transportation information, as well as status updates for smart systems, e.g., smart transportation systems and smart grid systems.
On the one hand, a smart-phone user needs timely traffic and transportation information to plan the best route. On the other hand, timely information about vehicles' positions and speeds is needed to plan a collision-free transportation system. In both cases, snapshots of the information are generated by their sources at some epochs and sent to the end devices (e.g., smart-phone users and vehicles) in the form of packets over wired or wireless networks. Since the information at the end devices is expected to be as timely as possible, the age of information is proposed to capture the freshness of the information at the end devices; more precisely, it measures the time elapsed since the generation of the freshest received packet. The goal is to develop networks supporting age-sensitive applications. Interestingly, neither throughput-optimal design nor delay-optimal design necessarily results in the minimum age .
In this paper, we consider a base-station (BS) that updates many users over a wireless broadcast network, where new information is randomly generated. We assume that the BS can update at most one user in each transmission opportunity. Under this transmission constraint, a transmission scheduling algorithm determines how the channel resources are allocated in each slot, depending on the packet arrivals and the ages of the information. The scheduling design is critical to good network performance. We hence design and analyze a scheduling algorithm to minimize the long-run average age.
The wireless broadcast network is similar to the model in the earlier work  of the present author; however, a low-complexity scheduling algorithm was left unexplored there. To fill this gap, this work investigates age-optimal scheduling from the perspective of restless bandits . Whittle  considers a relaxed restless multi-armed bandit problem and decouples it into many sub-problems, each consisting of a single bandit, while proposing an index policy and the concept of indexability. The Whittle index policy is asymptotically optimal under certain conditions , and in practice performs strikingly well . Each user in our problem can be viewed as a restless bandit; as such, we apply Whittle's approach to develop a scheduling algorithm.
We transform our problem into a relaxed restless multi-armed bandit problem and investigate it from Whittle's perspective. However, in general, a closed form of the Whittle index might be unavailable. To tackle this issue, we formulate each decoupled sub-problem as a Markov decision process (MDP), with the purpose of minimizing an average cost. Since our MDP involves an average-cost optimization over an infinite horizon with a countably infinite state space, it is challenging to analyze . We prove that an optimal policy of the MDP is stationary and deterministic; in particular, it is of a simple threshold type. We then derive an optimal threshold by exploiting the threshold structure along with a post-action age. It turns out that the post-action age simplifies the calculation of the average cost; as such, we obtain the Whittle index in closed form and show the indexability. Finally, we propose a Whittle index scheduling algorithm and numerically validate its performance.
I-B Related Works
The age of information has attracted much interest from the research community, e.g., [1, 8, 9]; see also the survey . Among the most relevant works on scheduling multiple users are [11, 12, 13, 14]. The works [11, 12] consider queues at a BS that store all out-of-date packets, different from ours. The paper  considers a buffer that stores the latest information with periodic arrivals, whereas information updates in  can be generated at will. Our work contributes to the age of information literature by developing a low-complexity algorithm for scheduling stochastic information arrivals.
II System Overview
II-A Network Model
We consider the wireless broadcast network in Fig. 1, consisting of a base-station (BS) and multiple wireless users. Each user is interested in a type of information generated by a corresponding source. All information is sent in the form of packets by the BS over a noiseless broadcast channel.
We consider a discrete-time system in which time is divided into slots. The packets from the sources (if any) arrive at the BS at the beginning of each slot. The arrivals at the BS for different users are independent of each other, and also independent and identically distributed (i.i.d.) over slots, governed by a Bernoulli distribution. Precisely, an indicator specifies whether a packet from a source arrives at the BS in a slot; it is one with the arrival probability of that source if there is a packet, and zero otherwise.
We assume that the BS can send at most one packet during each slot, i.e., the BS can update at most one user in each slot. Moreover, we focus on the setting in which the BS does not buffer a packet if it is not transmitted in its arriving slot. The no-buffer network is motivated by .
A decision of the BS in each slot specifies either that no user will be updated or which user is scheduled to be updated in that slot. A scheduling algorithm specifies such a decision for each slot. Next, we define the age of information as our design criterion.
II-B Age of Information Model
The age of information reflects the freshness of the information at the users. We initialize the ages of all arriving packets at the BS to zero. The age of information at a user becomes one on receiving a packet, due to the one slot of transmission time. We measure the age of information for a user in a slot before the BS makes its scheduling decision, and suppose that the age increases linearly with slots. The dynamics of the age of information for a user are then as follows: the age in the next slot is one if the user gets updated with new information; otherwise, the age increases by one. Since the BS can update at most one user in each slot, at most one user's age can be reset to one in any slot.
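For concreteness, the age dynamics and the one-user-per-slot constraint can be sketched in a few lines of Python (the function name and the list-based interface are illustrative, not part of the model):

```python
def step_ages(ages, arrivals, scheduled):
    """Advance the per-user ages by one slot.

    ages: list of current ages; arrivals: list of 0/1 arrival indicators;
    scheduled: index of the (single) user updated this slot, or None.
    A scheduled user with a fresh arrival has age 1 in the next slot
    (one slot of transmission time); every other age grows by one.
    In the no-buffer model, scheduling a user without an arrival has
    no effect, since there is no packet to send.
    """
    nxt = []
    for i, a in enumerate(ages):
        if scheduled == i and arrivals[i] == 1:
            nxt.append(1)      # user receives the fresh packet
        else:
            nxt.append(a + 1)  # information keeps aging
    return nxt
```

For example, `step_ages([3, 5], [1, 0], 0)` updates user 0 with its fresh packet while user 1 keeps aging.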
II-C Problem Formulation
We define the average age under a scheduling algorithm as the long-run average, over slots, of the expected total age of all users, where the expectation is conditioned on the algorithm being employed. Note that we focus on the total age, but our work can be easily extended to a weighted sum of the ages. Our goal is to develop a low-complexity scheduling algorithm whose average age is close to the minimum by leveraging Whittle's methodology .
III Scheduling Algorithm Design
We will develop a scheduling algorithm based on restless bandits  in stochastic control theory. To reach this goal, in this section, we start by casting our problem as a restless multi-armed bandit problem , followed by introducing the Whittle index  as a solution to the multi-armed bandit problem. A challenge of this approach is obtaining the Whittle index. We then explicitly derive the Whittle index, in a simpler way, using a post-action age. We finally propose a scheduling algorithm based on the Whittle index.
III-A Restless Bandits and Whittle's Approach
A restless bandit generalizes a classic bandit by allowing the bandit to keep evolving under a passive action, but in a distinct way from its evolution under an active action. However, the restless bandits problem is, in general, PSPACE-hard . Whittle hence investigates a relaxed version, where the constraint on the number of active bandits in each slot is replaced by a constraint on the expected number. With this relaxation, Whittle applies a Lagrangian approach to decouple the multi-armed bandit problem into multiple sub-problems.
We can regard each user in our problem as a restless bandit. Following Whittle's approach, we can decouple our problem into sub-problems. A sub-problem consists of a single user and adheres to the network model in Section II, except for an additional cost for updating the user. In each sub-problem, we aim at determining whether or not the user should be updated in each slot, in order to strike a balance between the updating cost and the cost incurred by the age. In fact, the cost is a scalar Lagrange multiplier in the Lagrangian approach. Since each sub-problem consists of a single user, hereafter we omit the user index for simplicity.
III-B Decoupled Sub-problem
We formulate the sub-problem as a Markov decision process (MDP), with the components  as follows.
States: The state of the MDP in a slot consists of the current age and the arrival indicator. This is an infinite-state MDP, as the age is possibly unbounded.
Actions: An action of the MDP in a slot indicates the BS's decision: either the BS updates the user or it idles.
Transition probabilities: If the BS updates the user (possible only when a packet is present), the age in the next slot is one; if the BS idles, the age increases by one. In either case, the next arrival indicator is one with the arrival probability, independently of the action.
Cost: The immediate cost of taking an action in a state is the resulting age in the next slot plus the updating cost incurred if the user is updated.
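The components above can be collected into a single transition routine. The following sketch assumes the notation p for the arrival probability and c for the updating cost (illustrative names, not from the source), and charges c only on an effective update, per the no-buffer model:

```python
import random

def mdp_step(age, arrival, action, p, c, rng):
    """One transition of the decoupled sub-problem MDP (a sketch).

    State is (age, arrival indicator); action is 1 (update) or 0 (idle).
    An update only takes effect when a packet is present.
    """
    updated = (action == 1 and arrival == 1)
    next_age = 1 if updated else age + 1
    # Immediate cost: resulting age in the next slot, plus c on an update.
    cost = next_age + (c if updated else 0.0)
    # Arrivals are i.i.d. Bernoulli(p) across slots.
    next_arrival = 1 if rng.random() < p else 0
    return (next_age, next_arrival), cost
```

For instance, updating at state (4, 1) with c = 2 yields next age 1 and immediate cost 3.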
A policy of the MDP specifies an action for each slot. A policy is history dependent if the action can depend on the entire history of states and actions, and stationary if the action depends only on the current state. Moreover, a randomized policy chooses an action with a probability, while a deterministic policy chooses an action with certainty.
The average cost under a policy is defined as the long-run average of the expected immediate costs. A policy (which can be history dependent) is cost-optimal if it minimizes the average cost.
The objective of the MDP is to find a policy that minimizes the average cost. In general, a cost-optimal policy that is stationary and deterministic may not exist . Hence, in the next section, we investigate the structure of a cost-optimal policy.
III-C Characterizing a Cost-Optimal Policy
We will study structures of a cost-optimal policy in this section. First, we show that a cost-optimal policy is stationary and deterministic as follows.
There exists a stationary and deterministic policy that is cost-optimal, independent of the initial state.
Given an initial state, we define the expected total discounted cost under a policy, with a discount factor α in (0,1). Moreover, the minimum expected total discounted cost is taken over all policies. A policy that minimizes the expected total α-discounted cost is called an α-optimal policy.
According to , a deterministic stationary policy is cost-optimal if the following two conditions hold.
There exists a deterministic stationary policy of the MDP such that the associated average cost is finite: Let the deterministic stationary policy always update the user whenever there is an arrival. The age under this policy forms a discrete-time Markov chain (DTMC) in Fig. 2, whose steady-state distribution is geometric with the arrival probability as its parameter. Hence, the average age is the reciprocal of the arrival probability. On the other hand, the average updating cost is the updating cost multiplied by the arrival probability, since the user is updated precisely when a packet arrives. Hence, the average cost under the policy, being the average age plus the average updating cost, is finite, which yields the result.
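The finiteness argument can be checked numerically. The following Monte Carlo sketch (with illustrative parameter names p for the arrival probability and c for the updating cost) estimates the average cost of the always-update policy, which should approach the average age plus the average updating cost:

```python
import random

def avg_cost_always_update(p, c, slots=200_000, seed=1):
    """Monte Carlo estimate of the average cost when the BS updates
    whenever a packet arrives (Bernoulli(p) arrivals, updating cost c)."""
    rng = random.Random(seed)
    age, total = 1, 0.0
    for _ in range(slots):
        arrival = rng.random() < p
        age = 1 if arrival else age + 1          # always update on arrival
        total += age + (c if arrival else 0.0)   # next-slot age plus update cost
    return total / slots
```

With p = 0.5 and c = 2, the estimate should be near 1/p + c·p = 3, matching the geometric steady-state calculation.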
There exists a non-negative constant bounding from below the relative cost function, i.e., the difference between the minimum expected total discounted cost at any state and that at a fixed reference state, uniformly over states and discount factors: Similar to , we can show that the minimum expected total discounted cost is non-decreasing in the age given the arrival indicator; moreover, it is non-increasing in the arrival indicator given the age. The bound then follows by choosing a minimal state as the reference state.
By verifying the two conditions, the theorem immediately follows from . ∎
Next, we further investigate a cost-optimal policy by showing that it is a special type of deterministic stationary policy.
A threshold-type policy is a deterministic stationary policy of the MDP. The action for every state without a packet arrival is to idle. Moreover, if the action for a state with an arrival is to update, then the action for the state with an arrival and a larger age is to update as well. In other words, there exists a threshold such that the action is to update if there is an arrival and the age is greater than or equal to the threshold; otherwise, the action is to idle.
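A threshold-type policy is straightforward to state in code; this sketch returns the action for a single state (the function name is illustrative):

```python
def threshold_policy(age, arrival, threshold):
    """Threshold-type policy for the decoupled sub-problem: update (1)
    only when a packet has arrived and the current age has reached the
    threshold; otherwise idle (0)."""
    return 1 if (arrival == 1 and age >= threshold) else 0
```

For example, with threshold 3 the policy updates at state (5, 1) but idles at (2, 1) and at any state without an arrival.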
If the updating cost is non-negative, then there exists a policy of the threshold type that is cost-optimal.
It is obvious that an optimal action for a state without a packet arrival is to idle. To establish the optimality of the threshold structure for states with an arrival, we need the discounted cost optimality equation (see the proof of Theorem 2 and ): if the first condition in the proof of Theorem 2 holds, then for any state the minimum expected total discounted cost satisfies a Bellman equation, equating it to the minimum over the two actions of the immediate cost plus the discounted expected minimum cost at the next state, where the expectation is taken over all possible next states reachable from the current state.
Subsequently, we intend to prove that an α-optimal policy is of the threshold type. An α-optimal action for any state without an arrival is to idle. Suppose that an α-optimal action for a state with an arrival is to update, i.e., updating incurs a discounted cost no greater than idling does. Then, an α-optimal action for the state with an arrival and a larger age is still to update, since the minimum expected total discounted cost is non-decreasing in the age given the arrival indicator (see the proof of Theorem 2), so the advantage of updating only grows with the age. Hence, an α-optimal policy is of the threshold type.
To find an optimal threshold for minimizing the average cost, we explicitly derive the average cost in the next theorem.
Given the threshold-type policy with a threshold, the average cost under the policy admits a closed form, derived via the post-action age as follows.
Let the post-action age be the age immediately after an action is taken in a slot; precisely, if the user is updated in a slot, the post-action age is reset by the fresh packet, and otherwise it equals the pre-action age. Note that the post-action age (similar to the post-decision state ) is different from the pre-action age.
The post-action age forms a DTMC in Fig. 3. Moreover, we associate each state in the DTMC with a cost: a state reached by an update incurs the resulting age plus the updating cost, since that post-action age implies that the BS updated the user, while every other state incurs the resulting age only. The steady-state distribution of this DTMC can then be solved explicitly.
Therefore, the average cost of the DTMC, i.e., the expectation of the per-state costs under the steady-state distribution, yields the closed form in the theorem.
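Since the closed form is derived through the steady-state distribution, a simulation of the sub-problem offers an independent check. The following sketch estimates the average cost of a threshold policy numerically (parameter names p, c, and the function name are illustrative):

```python
import random

def avg_cost_threshold(p, c, threshold, slots=200_000, seed=7):
    """Estimate the long-run average cost of the threshold policy by
    simulating the sub-problem: Bernoulli(p) arrivals, updating cost c,
    update only when a packet is present and the age reaches the
    threshold.  Serves as a numerical stand-in for the closed form."""
    rng = random.Random(seed)
    age, total = 1, 0.0
    for _ in range(slots):
        arrival = rng.random() < p
        update = arrival and age >= threshold
        age = 1 if update else age + 1
        total += age + (c if update else 0.0)
    return total / slots
```

With zero updating cost, a threshold of one (always update on arrival) should achieve a lower average cost than any larger threshold.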
We elaborate on the role of the post-action age in the proof by contrasting it with alternatives for obtaining the average cost:
As in much of the literature, e.g., , we can find an optimal action for each state by solving the average cost optimality equation . However, we might not be able to establish the average cost optimality equation, as this is an infinite-state MDP . Even when the average cost optimality equation is established, it is usually hard to solve directly.
Given the threshold, the state (the age together with the arrival indicator) forms a two-dimensional DTMC. It is usually hard to solve for the steady-state distribution of a multi-dimensional DTMC.
Given the threshold, the pre-action age forms a DTMC as well. However, we cannot associate each of its states with a cost, since the cost depends not only on the state but also on the action (see Eq. (1)). On the contrary, the cost for the post-action age is determined by the state only.
III-D Deriving the Whittle Index
Now, we are ready to define the Whittle index.
We define the Whittle index as the updating cost that makes both actions for a given state equally desirable.
The Whittle index of the sub-problem admits a closed form for each state, obtained as follows.
It is obvious that the Whittle index for a state without an arrival is zero, as both actions then result in the same immediate cost and the same age in the next slot.
Consider the average cost in Eq. (2) as a function of a real-valued threshold. Note that this function is strictly convex on its domain. Let its minimizer be given. Then, an optimal integer threshold for minimizing the average cost is either the floor or the ceiling of the minimizer, whichever yields the smaller average cost. If there is a tie, both choices are optimal.
Hence, both actions for a state with an arrival are equally desirable if and only if the age is such that the two neighboring integer thresholds, the age and the age plus one, yield the same average cost and are both optimal. By solving this tie condition for the cost, we obtain the cost that makes both actions equally desirable, which is the Whittle index in the theorem. ∎
According to Theorem 7, the two actions might tie. If there is a tie, we break it in favor of idling. Then, we can explicitly express the optimal threshold in the next theorem.
The optimal threshold for minimizing the average cost is the age whose neighboring Whittle index values bracket the updating cost.
Since the Whittle index of a state is the cost that makes both actions for that state equally desirable, and we break a tie in favor of idling, the optimal threshold equals a given age precisely when the cost falls between the Whittle index values of the neighboring ages. We claim that the optimal threshold monotonically increases with the cost, and then the theorem follows.
To verify the claim, we can focus on an α-optimal policy, according to the proof of Theorem 4. Suppose that, under some updating cost, an α-optimal action for a state is to idle, i.e., idling incurs a discounted cost no greater than updating does. Then, under any larger updating cost, an α-optimal action for that state is to idle as well, since increasing the cost raises the discounted cost of updating while leaving that of idling unchanged.
Then, the monotonicity is established. ∎
Next, according to , we have to demonstrate the indexability defined as follows.
Given an updating cost, let the idle set be the set of states for which the optimal action is to idle. The sub-problem is indexable if the idle set monotonically increases from the empty set to the entire state space as the cost increases from minus infinity to infinity.
The sub-problem is indexable.
If the cost is negative, the optimal action for every state is to update; as such, the idle set is empty. If the cost is non-negative, the idle set is composed of the states without an arrival and those states with an arrival whose ages fall below the optimal threshold. According to Theorem 8, the optimal threshold monotonically increases as the cost increases, and hence the idle set monotonically increases to the entire state space. ∎
III-E Scheduling Algorithm Design
Now, we are ready to propose a scheduling algorithm based on the Whittle index. For each slot, the BS observes the age and the arrival indicator for every user; it then updates the user with the highest value of the Whittle index. We can think of the index as the cost we are willing to pay to update a user, so the intuition of the scheduling algorithm is that the BS intends to send the most valuable packet. In Fig. 4, we compare the proposed algorithm with the age-optimal scheduling algorithm in  for two users over 100,000 slots. It turns out that the simple index algorithm almost achieves the minimum average age.
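The proposed algorithm reduces to an argmax over per-user indices. The sketch below illustrates the mechanics, with index_fn standing in for the closed-form Whittle index of the sub-problem (any nondecreasing function of the age serves as a hypothetical stand-in here):

```python
def schedule(ages, arrivals, index_fn):
    """Whittle index scheduling (sketch): among users with a fresh
    arrival, update the one with the largest index; idle otherwise,
    since in the no-buffer model there is no packet to send."""
    candidates = [i for i, lam in enumerate(arrivals) if lam == 1]
    if not candidates:
        return None  # no arrivals: the BS idles this slot
    return max(candidates, key=lambda i: index_fn(ages[i]))
```

For instance, with a nondecreasing surrogate index, the BS picks the user with the largest age among those holding fresh packets.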
This paper treated a wireless broadcast network where many users are interested in different types of information delivered by a base-station. Under a transmission constraint, we studied a transmission scheduling problem with respect to the age of information. We proposed a low-complexity scheduling algorithm leveraging Whittle's methodology. Numerical studies showed that the proposed algorithm almost minimizes the average age. Investigating a regime under which the proposed algorithm is optimal would be an interesting extension.
The author is grateful to anonymous reviewers for their constructive comments.
-  S. Kaul, R. D. Yates, and M. Gruteser, “Real-Time Status: How Often Should One Update?” Proc. of IEEE INFOCOM, pp. 2731–2735, 2012.
-  Y.-P. Hsu, E. Modiano, and L. Duan, “Age of Information: Design and Analysis of Optimal Scheduling Algorithms,” Proc. of IEEE ISIT, pp. 561–565, 2017.
-  J. Gittins, K. Glazebrook, and R. Weber, Multi-Armed Bandit Allocation Indices. John Wiley & Sons, 2011.
-  P. Whittle, “Restless Bandits: Activity Allocation in a Changing World,” Journal of applied probability, pp. 287–298, 1988.
-  R. R. Weber and G. Weiss, “On an index policy for restless bandits,” Journal of Applied Probability, vol. 27, no. 3, pp. 637–648, 1990.
-  M. Larranaga, U. Ayesta, and I. M. Verloop, “Stochastic and Fluid Index Policies for Resource Allocation Problems,” Proc. of IEEE INFOCOM, pp. 1230–1238, 2015.
-  D. P. Bertsekas, Dynamic Programming and Optimal Control Vol. I and II. Athena Scientific, 2012.
-  M. Costa, M. Codreanu, and A. Ephremides, “On the Age of Information in Status Update Systems with Packet Management,” IEEE Trans. Inf. Theory, vol. 62, no. 4, pp. 1897–1910, 2016.
-  A. M. Bedewy, Y. Sun, and N. B. Shroff, “Age-Optimal Information Updates in Multihop Networks,” Proc. of IEEE ISIT, pp. 576–580, 2017.
-  A. Kosta, N. Pappas, and V. Angelakis, “Age of Information: A New Concept, Metric, and Tool,” Foundations and Trends® in Networking, vol. 12, no. 3, pp. 162–259, 2017.
-  Q. He, D. Yuan, and A. Ephremides, “Optimal Link Scheduling for Age Minimization in Wireless Systems,” accepted to IEEE Trans. Inf. Theory, 2017.
-  C. Joo and A. Eryilmaz, “Wireless Scheduling for Information Freshness and Synchrony: Drift-based Design and Heavy-Traffic Analysis,” Proc. of IEEE WIOPT, pp. 1–8, 2017.
-  I. Kadota, E. Uysal-Biyikoglu, R. Singh, and E. Modiano, “Minimizing Age of Information in Broadcast Wireless Networks,” Proc. of Allerton, 2016.
-  R. D. Yates, P. Ciblat, A. Yener, and M. Wigger, “Age-Optimal Constrained Cache Updating,” Proc. of IEEE ISIT, pp. 141–145, 2017.
-  M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 1994.
-  L. I. Sennott, “Average Cost Optimal Stationary Policies in Infinite State Markov Decision Processes with Unbounded Costs,” Operations Research, vol. 37, pp. 626–633, 1989.
-  W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality. John Wiley & Sons, 2011.