Information usually has the greatest value when it is fresh. For example, real-time knowledge about the location, orientation, and speed of motor vehicles is imperative in autonomous driving, and access to timely updates about stock price and interest-rate movements is essential for developing trading strategies on the stock market. In [2, 3], the concept of Age of Information was introduced to measure the freshness of information that a receiver has about the status of a remote source. Consider a sequence of source samples that are sent through a queue to a receiver, as illustrated in Fig. 1. Each sample is stamped with its generation time. Let $U(t)$ be the time stamp of the newest sample that has been delivered to the receiver by time instant $t$. The age of information, as a function of $t$, is defined as $\Delta(t) = t - U(t)$, which is the time elapsed since the newest sample was generated. Hence, a small age indicates that there exists a fresh sample of the source status at the receiver.
In practice, the status of different sources may vary over time with different speeds. For example, the location of a car can change much faster than the temperature of its engine. While the age of information represents the time difference between the samples available at the transmitter and receiver, it is independent of the changing speed of the source. Hence, the age is not an appropriate measure for comparing the freshness of information about different sources.
In recent years, several examples and approaches for evaluating the freshness of information about time-correlated sources have been discussed in, e.g., [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]. In [4, 5, 6] and the references therein, the received samples are used to estimate the source value in real-time, where the estimation error is used to measure the freshness of information available at the receiver. In [7], an age penalty function $p(\Delta)$ was employed to describe the level of dissatisfaction for having aged samples at the receiver, where $p(\cdot)$ is an arbitrary non-negative and non-decreasing function of the age that can be specified based on the application; in addition, an optimal sampling strategy was developed to minimize the time-average expected age penalty. In [8], the authors considered the relationship between the auto-correlation function of the source process (where $X_t$ denotes the source status at time instant $t$) and the age penalty function in [7], and provided analytical expressions for the long-run time average of a few auto-correlation functions. In [9, 10, 11, 12, 13], several scheduling policies were developed to minimize an arbitrary non-decreasing functional of the age process in several network settings. The age penalty models in [9, 10, 11, 12, 13] are quite general and include most age penalty models considered in previous studies as special cases. For example, because the functional is a mapping from the space of age processes to real numbers, it can be selected to describe the time-average age, or the time-average of an age penalty function that depends on the age levels at multiple time instants.
In this paper, we propose a new measure for the freshness of information, which can precisely describe how information ages over time. For Markov sources, an online sampling policy is developed to optimize the freshness of information. (Non-Markov sources will be considered in our future work.) The detailed contributions of this paper are summarized as follows:
We propose to use the mutual information between the real-time source value and the received samples to quantify the freshness of the information contained in the received samples. This mutual information term is easy to compute for Markov sources. By using the data processing inequality, this mutual information is shown to be a non-negative and non-increasing function of the age (Lemma 1). Therefore, the "aging" of the received information can be interpreted as a process in which this mutual information decreases as the age grows.
In order to optimize the freshness of information, we study the optimal sampling strategy that maximizes the time-average expected mutual information. This problem is solved in two steps: (i) We first generalize [7] to obtain an optimal sampling strategy that minimizes the time-average expected age penalty $p(\Delta)$, where $p(\cdot)$ is an arbitrary non-decreasing function of the age (Theorem 1). (ii) Next, we apply the result of Step (i) to a special age penalty function, namely the negative of the mutual information, which is a non-positive and non-decreasing function of the age.
The obtained optimal sampling strategy has a nice structure: A new sample is taken once a conditional mutual information reduces to a threshold $\beta$, and the threshold $\beta$ is equal to the optimum value of the time-average expected mutual information that we are maximizing (Theorem 2). Numerical results are provided to compare different sampling policies.
I-A Relationship with Previous Work
The age penalty function in [7] is non-negative and non-decreasing. It cannot be directly applied to our problem, because the negative of the mutual information is a non-positive and non-decreasing function of the age. In this paper, we relax $p(\cdot)$ to be an arbitrary non-decreasing function.
In [7], a two-layered nested bisection search algorithm was developed to compute the threshold $\beta$. In this paper, $\beta$ is characterized as the solution of a fixed-point equation, which can be solved by a single layer of bisection search. Hence, the computation of $\beta$ is simplified.
In [7], the optimal sampling strategy was obtained for a continuous-time system. In this paper, we develop an optimal sampling strategy for a discrete-time system, without resorting to any approximation or sub-optimality.
It was assumed in [7] that after the previous sample is delivered, the next sample must be generated within a fixed amount of time. By adopting more powerful proof techniques, we are able to remove this assumption and greatly simplify the proof procedure in this paper.
II System Model
We consider a discrete-time status-update system that is illustrated in Fig. 1, where samples of a source $X_t$ are taken and sent to a receiver through a communication channel. The channel is modeled as a single-server FIFO queue with i.i.d. service times. The system starts to operate at time instant $t = 0$. The $i$-th sample is generated at time instant $S_i$ and is delivered to the receiver at time instant $D_i$ with a discrete service time $Y_i$, where $S_i \le S_{i+1}$, $D_i = \max\{D_{i-1}, S_i\} + Y_i$, and $Y_i \ge 1$ for all $i$. Each sample packet $(S_i, X_{S_i})$ contains both the sampling time $S_i$ and the sample value $X_{S_i}$. The samples that the receiver has received by time instant $t$ are denoted by the set
$$W_t = \{(S_i, X_{S_i}) : D_i \le t\}. \qquad (1)$$
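As a concrete illustration of the queueing dynamics above, the following minimal sketch (our own, not from the paper) computes delivery times under the standard FIFO recursion $D_i = \max\{D_{i-1}, S_i\} + Y_i$ and evaluates the age process:

```python
def delivery_times(S, Y):
    """FIFO single-server queue: sample i (generated at time S[i]) enters
    service once it has arrived and the server is free, then occupies the
    server for Y[i] slots, so D_i = max(D_{i-1}, S_i) + Y_i."""
    D, last = [], 0
    for s, y in zip(S, Y):
        last = max(last, s) + y
        D.append(last)
    return D

def age(t, S, D):
    """Age of information at time t: t minus the generation time of the
    newest sample that has been delivered by time t."""
    return t - max(s for s, d in zip(S, D) if d <= t)
```

For instance, with samples generated at times 0, 2, 3 and service times of 2 slots each, the delivery times are 2, 4, 6; at $t = 5$ the newest delivered sample was generated at time 2, so the age is 3.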
At any time instant $t$, the receiver uses the received samples $W_t$ to reconstruct an estimate $\hat{X}_t$ of the real-time source value $X_t$, where we assume that the estimator neglects the knowledge implied by the timing of the samples.
The initial state of the system is assumed to satisfy $S_0 = 0$, $D_0 = Y_0$, and $Y_0$ is a finite constant.
Let $\pi = (S_0, S_1, \ldots)$ represent a sampling policy and let $\Pi$ denote the set of causal sampling policies that satisfy the following two conditions: (i) Each sampling time $S_i$ is chosen based on the history and current information of the system, but not on any future information. (ii) The inter-sampling times $\{S_{i+1} - S_i, i = 0, 1, \ldots\}$ form a regenerative process [14, Section 6.1]: There exists an increasing sequence of almost surely finite random integers $\{k_j, j = 1, 2, \ldots\}$ such that the post-$k_j$ process $\{S_{k_j + i + 1} - S_{k_j + i}, i = 0, 1, \ldots\}$ has the same distribution as the post-$k_1$ process $\{S_{k_1 + i + 1} - S_{k_1 + i}, i = 0, 1, \ldots\}$ and is independent of the pre-$k_j$ process $\{S_{i+1} - S_i, i = 0, 1, \ldots, k_j - 1\}$; in addition, $\mathbb{E}[k_{j+1} - k_j] < \infty$ and $\mathbb{E}[S_{k_1}] < \infty$. (We assume that $\{S_{i+1} - S_i\}$ is a regenerative process because we will optimize a $\limsup$ time-average objective, but operationally a nicer objective function is the corresponding $\lim$; these two objective functions are equivalent if $\{S_{i+1} - S_i\}$ is a regenerative process.)
We assume that the Markov chain $\{X_t\}$ and the service times $\{Y_i\}$ are determined by two mutually independent external processes, which do not change according to the adopted sampling policy.
III Mutual Information as a Measure of the Freshness of Information
In this paper, we propose to use the mutual information $I(X_t; W_t)$ between the real-time source value $X_t$ and the set of received samples $W_t$
as a metric for evaluating the freshness of information that is available at the receiver. In information theory, $I(X_t; W_t)$ is the amount of information that the received samples $W_t$ carry about the real-time source value $X_t$. If $I(X_t; W_t)$ is close to $H(X_t)$, the received samples are considered to be fresh; if it is almost $0$, the received samples are considered to be obsolete. In addition, because $I(X_t; W_t)$ naturally incorporates the information structure of the source $\{X_t\}$, it can effectively characterize the freshness of information about sources with different time-varying patterns.
One way to interpret $I(X_t; W_t)$ is to consider how helpful the received samples are for inferring $X_t$. By using the Shannon code lengths [15, Section 5.4], the expected minimum number of bits $L$ required to specify $X_t$ satisfies
$$H(X_t) \le \mathbb{E}[L] < H(X_t) + 1, \qquad (4)$$
where $\mathbb{E}[L]$ can be interpreted as the expected minimum number of binary tests that are needed to infer $X_t$. On the other hand, with the knowledge of $W_t$, the expected minimum number of bits $L'$ required to specify $X_t$ satisfies
$$H(X_t \mid W_t) \le \mathbb{E}[L'] < H(X_t \mid W_t) + 1. \qquad (5)$$
If $X_t$ is a random vector consisting of a large number of symbols (e.g., $X_t$ represents an image containing many pixels or the channel coefficients of many OFDM subcarriers), the one bit of overhead in (4) and (5) is insignificant. Hence, $I(X_t; W_t) = H(X_t) - H(X_t \mid W_t)$ is approximately the reduction in the description cost for inferring $X_t$ without and with the knowledge of $W_t$.
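The description-cost interpretation can be checked numerically. The sketch below (our own illustration, not from the paper) computes $H(X_t)$, $H(X_t \mid X_{t-\Delta})$, and their difference from a joint distribution, for a symmetric binary pair with flip probability $q$ and uniform marginals:

```python
import math

def entropy(probs):
    # Shannon entropy in bits, ignoring zero-probability outcomes
    return -sum(p * math.log2(p) for p in probs if p > 0)

def description_cost_reduction(q):
    # Joint pmf of (X_{t-delta}, X_t) for a symmetric binary pair with
    # uniform marginals and flip probability q between the two time instants.
    joint = {(0, 0): (1 - q) / 2, (0, 1): q / 2,
             (1, 0): q / 2, (1, 1): (1 - q) / 2}
    H_X = entropy([sum(v for (a, b), v in joint.items() if b == x) for x in (0, 1)])
    H_Y = entropy([sum(v for (a, b), v in joint.items() if a == y) for y in (0, 1)])
    H_X_given_Y = entropy(joint.values()) - H_Y  # H(X|Y) = H(X,Y) - H(Y)
    return H_X - H_X_given_Y                     # = I(X_t ; X_{t-delta})
```

For $q = 0.2$ this returns $1 - h(0.2) \approx 0.278$ bits, i.e., knowing the older sample shortens the description of $X_t$ by about 0.278 bits; for $q = 0.5$ the reduction is zero.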
III-A Markov Sources
To get more insights, let us consider the class of Markov sources and use the Markov property to simplify $I(X_t; W_t)$. By using the data processing inequality [15, Theorem 2.8.1], it is not hard to show that $I(X_t; W_t)$ has the following property:

Lemma 1: If $\{X_t\}$ is a time-homogeneous Markov chain and $W_t$ is defined in (1), then the mutual information $I(X_t; W_t)$ can be expressed as a non-negative and non-increasing function $f(\cdot)$ of the age $\Delta(t)$.

Proof: Because $\{X_t\}$ is a Markov chain, the value $X_{U(t)}$ of the freshest received sample is a sufficient statistic of $W_t$ for inferring $X_t$; hence, $I(X_t; W_t) = I(X_t; X_{U(t)})$. Next, because $\{X_t\}$ is time-homogeneous, $I(X_t; X_{U(t)}) = I(X_{\Delta(t)}; X_0)$ for all $t$, which is a function of the age $\Delta(t)$. Further, because $\{X_t\}$ is a Markov chain, $I(X_{\Delta}; X_0)$ is non-increasing in $\Delta$, owing to the data processing inequality [15, Theorem 2.8.1]. Finally, mutual information is non-negative. This completes the proof. ∎
According to Lemma 1, information "aging" can be considered as a process in which the amount of information that is preserved in $W_t$ for inferring the real-time source value $X_t$ decreases as the age grows. This is similar to the data processing inequality, which states that processing the data cannot increase the information that the data contains about the source; the difference is that in the status-update systems that we consider, the sample set $W_t$, the age $\Delta(t)$, and the signal value $X_t$ are all evolving over time.
Two examples of the Markov source are provided in the sequel as illustrations of Lemma 1:
III-A1 Gaussian Markov Source
Suppose that $X_t$ is a first-order discrete-time Gaussian Markov process, defined by
$$X_{t+1} = a X_t + Z_t,$$
where $0 < |a| < 1$ and the $Z_t$'s are zero-mean i.i.d. Gaussian random variables. Because $X_t$ is a stationary Gaussian Markov process, one can show that
$$I(X_t; W_t) = \frac{1}{2} \log_2 \frac{1}{1 - a^{2\Delta(t)}}.$$
Since $0 < a^2 < 1$ and $\Delta(t)$ is an integer, $I(X_t; W_t)$ is a positive and decreasing function of the age $\Delta(t)$. Note that if $\Delta(t) = 0$, then $I(X_t; W_t) = \infty$, because the absolute entropy of a continuous Gaussian random variable is infinite.
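The monotonicity in this example is easy to verify numerically. The sketch below (our own, with an assumed coefficient $a = 0.8$) uses the fact that for jointly Gaussian variables with correlation coefficient $\rho$, the mutual information is $-\tfrac{1}{2}\log_2(1-\rho^2)$, and that the lag-$\Delta$ correlation of a stationary AR(1) process is $a^\Delta$:

```python
import math

def gaussian_mi(a, delta):
    """I(X_t ; X_{t-delta}) in bits for the stationary first-order Gaussian
    Markov (AR(1)) process X_{t+1} = a*X_t + Z_t with 0 < |a| < 1.
    The correlation coefficient over a lag of delta slots is a**delta."""
    rho = a ** delta
    return -0.5 * math.log2(1.0 - rho * rho)

def ar1_correlation(a, delta, sigma2=1.0):
    """Cross-check: the lag-delta correlation from the stationary covariance."""
    var = sigma2 / (1.0 - a * a)   # stationary variance of X_t
    cov = (a ** delta) * var       # Cov(X_t, X_{t-delta}) = a^delta * Var
    return cov / var
```

The value decreases with the age and diverges as $\Delta \to 0$, consistent with the infinite absolute entropy of a continuous Gaussian random variable.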
III-A2 Binary Markov Source
Suppose that $X_t$ is a binary symmetric Markov chain defined by
$$X_{t+1} = X_t \oplus Z_t, \qquad (9)$$
where $\oplus$ denotes binary modulo-2 addition and the $Z_t$'s are i.i.d. Bernoulli random variables with mean $p \le 1/2$. One can show that
$$I(X_t; W_t) = 1 - h\!\left(\frac{1 - (1-2p)^{\Delta(t)}}{2}\right),$$
where $h(\cdot)$ is the binary entropy function defined by $h(x) = -x \log_2 x - (1-x) \log_2 (1-x)$ with a domain $[0, 1]$ [15, Eq. (2.5)]. Because $h(x)$ is increasing on $[0, 1/2]$, $I(X_t; W_t)$ is a non-negative and decreasing function of the age $\Delta(t)$.
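The closed-form expression for the binary source can likewise be checked in a few lines. This is our own sketch; the one-step flip-probability recursion is a standard derivation step we assume here:

```python
import math

def bin_entropy(x):
    # binary entropy h(x) in bits, with h(0) = h(1) = 0
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def binary_mi(p, delta):
    # I(X_t ; X_{t-delta}) = 1 - h(q_delta) for the binary symmetric Markov
    # chain, where q_delta = P(X_t != X_{t-delta}) = (1 - (1-2p)**delta)/2
    q = (1 - (1 - 2 * p) ** delta) / 2
    return 1 - bin_entropy(q)

def flip_prob(p, delta):
    # cross-check q_delta via the one-step recursion
    # q_{k+1} = q_k*(1-p) + (1-q_k)*p, starting from q_0 = 0
    q = 0.0
    for _ in range(delta):
        q = q * (1 - p) + (1 - q) * p
    return q
```

At $p = 0.5$ the chain is i.i.d. across time and the mutual information vanishes for every positive age.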
IV Online Sampling for Information Freshness
In this section, we will develop an optimal online sampling policy that can maximize the freshness of information about Markov sources.
IV-A Problem Formulation
To optimize the freshness of information, we formulate an online sampling problem for maximizing the time-average expected mutual information between $X_t$ and $W_t$ over an infinite time-horizon:
$$\bar{I}_{\max} = \sup_{\pi \in \Pi} \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\left[I(X_t; W_t)\right], \qquad (11)$$
where $\bar{I}_{\max}$ is the optimal value of (11). We assume that $\bar{I}_{\max}$ is finite.
It is helpful to remark that in (11) is different from the Shannon capacity considered in, e.g., [17, 15]: In (11), our goal is to maximize the freshness of information and make more accurate inference about the real-time source value; this goal is achieved by minimizing the average amount of mutual information that is lost as the received data becomes obsolete. On the other hand, the focus of Shannon capacity theory is mainly on maximizing the rate of information that can be reliably transmitted to the receiver, but (in most cases) without significant concerns about whether the received information is new or old.
IV-B Optimal Online Sampling Policy
In [7], an age penalty function $p(\Delta)$ was defined to characterize the level of dissatisfaction for having aged information at the receiver, where $p(\cdot)$ is an arbitrary non-negative and non-decreasing function that can be specified according to the application. For continuous-time status-update systems, the optimal sampling policy for minimizing the time-average expected age penalty was obtained in [7]. Unfortunately, we are not able to apply the results in [7] to solve (11). Specifically, if we choose the age penalty function $p(\Delta) = -f(\Delta)$, then Lemma 1 suggests that $p(\Delta)$ is non-positive and non-decreasing, which is different from the non-negative and non-decreasing age penalty functions required in [7]. In addition, we consider a discrete-time system in this paper, which is different from the continuous-time system in [7].
To address this problem, we generalize [7] by considering an arbitrary non-decreasing age penalty function (no matter positive or negative) and design an optimal sampling policy that minimizes the time-average expected age penalty. To that end, we consider the following discrete-time age penalty minimization problem:
$$\bar{p}_{\min} = \inf_{\pi \in \Pi} \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\left[p(\Delta(t))\right], \qquad (12)$$
where $p(\cdot)$ is an arbitrary non-decreasing function and $\bar{p}_{\min}$ denotes the optimal value of (12). We assume that $\bar{p}_{\min}$ is finite. Problem (12) is a Markov decision problem. A closed-form solution of (12) is provided in the following theorem:
Proof: See Section V. ∎
Next, we consider the special case $p(\Delta) = -f(\Delta)$. It follows from Theorem 1 that
The optimal sampling policy in (2) and (16) has a nice structure: The next sampling time $S_{i+1}$ is determined based on the mutual information between the freshest received sample and the real-time signal value, where $D_i$ is the delivery time of the $i$-th sample. Because the timing of the freshest sample will be known by both the transmitter and receiver, it acts as side information, which is captured by a conditional mutual information. This conditional mutual information decreases as time grows. According to (2), the $(i+1)$-th sample is generated at the smallest integer time instant $t$ satisfying two conditions: (i) the $i$-th sample has already been delivered, i.e., $t \ge D_i$, and (ii) the conditional mutual information has reduced to be no greater than a pre-determined threshold $\beta$. In addition, according to (16), the threshold $\beta$ is equal to the optimum objective value in (11), i.e., the optimum of the time-average expected mutual information that we are maximizing. Note that the sampling times and delivery times on the right-hand side of (16) depend on $\beta$. Hence, $\beta$ is a fixed point of (16).
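The fixed-point characterization of the threshold suggests a simple single-layer bisection. The sketch below is our own illustration for the binary Markov source, under assumed parameters (flip probability $p = 0.2$, service times equally likely to be 1 or 3 slots): it simulates the threshold policy for a given $\beta$ and bisects on the residual between $\beta$ and the achieved time-average mutual information:

```python
import math, random

def mi(delta, p=0.2):
    # I(X_t; W_t) for the binary symmetric Markov source at age delta >= 1
    q = (1 - (1 - 2 * p) ** delta) / 2
    if q in (0.0, 1.0):
        return 1.0
    return 1 + q * math.log2(q) + (1 - q) * math.log2(1 - q)

def avg_mi(beta, n=20_000, seed=0):
    """Long-run average mutual information under the threshold policy:
    the next sample is taken once the previous one has been delivered AND
    mi(age of the freshest delivered sample) has dropped to <= beta."""
    rng = random.Random(seed)
    d_star = 1
    while mi(d_star) > beta:       # smallest age at which mi() <= beta
        d_star += 1
    total, slots = 0.0, 0
    S, D = 0, rng.choice([1, 3])   # S_0 = 0, D_0 = S_0 + Y_0
    for _ in range(n):
        S_next = max(D, S + d_star)        # wait for delivery and threshold
        D_next = S_next + rng.choice([1, 3])
        for t in range(D, D_next):         # freshest delivered sample is S
            total += mi(t - S)
            slots += 1
        S, D = S_next, D_next
    return total / slots

def solve_threshold(iters=30):
    """Single-layer bisection for the fixed point beta = avg_mi(beta)."""
    lo, hi = 1e-6, mi(1)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if avg_mi(mid) > mid:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Up to simulation noise, the returned threshold coincides with the achieved time-average mutual information, consistent with the fixed-point property described above.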
The optimal sampling policy is illustrated in Fig. 2, where the service time takes one of two values (a short one and a long one) with equal probability. The service time $Y_i$, delivery time $D_i$, and conditional mutual information of the samples are depicted in the figure. One can observe that if the service time of the previous sample is short, the sampler will wait until the conditional mutual information drops below the threshold $\beta$ and then take the next sample; if the service time of the previous sample is long, the next sample is taken upon the delivery of the previous sample, because the conditional mutual information is already below $\beta$ at that time.
Notice that in the optimal sampling policy (2) and (16), there is at most one sample in transmission at any time and no sample is waiting in the queue. This is different from the traditional uniform sampling policy, in which the waiting time in the queue can be quite high and, as a result, the freshness of information is low. This phenomenon will be illustrated by our numerical results in Section VI.
V Proof of Theorem 1
V-A Simplification of Problem (12)
In [7, 5], it was shown that no new sample should be taken when the server is busy. The reason is as follows: If a sample is taken when the server is busy, it has to wait in the queue for its transmission opportunity; meanwhile, the sample is becoming stale. A better strategy is to take a new sample once the server becomes idle. By using the sufficient statistic of the Markov chain $\{X_t\}$, one can show that the second strategy is better.
Because of this, we only need to consider a sub-class $\Pi_1 \subset \Pi$ of sampling policies in which each sample is generated and submitted to the server after the previous sample is delivered, i.e., $S_{i+1} \ge D_i$ for all $i$.
Let $Z_i$ represent the waiting time between the delivery time $D_i$ of sample $i$ and the generation time $S_{i+1}$ of sample $i+1$. Since $S_{i+1} \ge D_i$, we have $Z_i \ge 0$ and $S_{i+1} = D_i + Z_i$. Given the service times, the sampling times $(S_1, S_2, \ldots)$ are uniquely determined by $(Z_0, Z_1, \ldots)$. Hence, one can also use $(Z_0, Z_1, \ldots)$ to represent a sampling policy in $\Pi_1$.
In addition, for each policy in $\Pi_1$, it holds that $D_{i+1} = D_i + Z_i + Y_{i+1}$. In this case, the age in (2) can be expressed as
$$\Delta(t) = t - S_i, \quad t \in [D_i, D_{i+1}),$$
which is a function of $(Z_0, Z_1, \ldots)$ and the service times. Define
then (12) can be simplified as
In order to solve (20), let us consider the following Markov decision problem with a parameter $c$:
V-B Optimal Solution of (21) for $c = \bar{p}_{\min}$
Next, we present an optimal solution to (21) for $c = \bar{p}_{\min}$.
A policy is said to be a stationary randomized policy if it observes the service time $Y_i$ and then chooses a waiting time $Z_i$ based on the observed value of $Y_i$, according to a conditional probability measure that is invariant for all $i$. Let $\Pi_{SR}$ ($\subset \Pi_1$) denote the set of stationary randomized policies.
If the service times are i.i.d., then there exists a stationary randomized policy that is optimal for solving (21) with $c = \bar{p}_{\min}$.
In (21), the minimization over the waiting time $Z_i$ depends on the history of the system only through the service time $Y_i$. Hence, $Y_i$ is a sufficient statistic for determining $Z_i$ in (21). This means that the rule for determining $Z_i$ can be represented by a conditional probability distribution of $Z_i$ given $Y_i$, and in addition, there exists an optimal solution to (21) in which $Z_i$ is determined by solving (24). The solution to (24) depends on the joint distribution of $Z_i$ and $Y_i$. Because the $Y_i$'s are i.i.d., the joint distribution of $Z_i$ and $Y_i$ is invariant for all $i$. Hence, the optimal conditional probability measure solving (24) is invariant for all $i$. By definition, there exists a stationary randomized policy that is optimal for solving Problem (21) with $c = \bar{p}_{\min}$, which completes the proof. ∎
Next, by using an idea similar to that in the solution of [21, Problem 5.5.3], we can obtain
If $p(\cdot)$ is non-decreasing and the service times are i.i.d., then an optimal solution $(Z_0, Z_1, \ldots)$ of (21) is given by
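Parameterized problems of this form can be solved by Dinkelbach's method for fractional programming [19]. The sketch below is our own toy instance, assuming unit service times and periodic sampling with period $T$, so that the time-average penalty reduces to the one-dimensional ratio $\bigl(\sum_{\Delta=1}^{T} p(\Delta)\bigr)/T$; the inner step minimizes the cycle penalty minus the parameter times the cycle length, in the spirit of the parameterized problem above:

```python
import math

def p(delta, flip=0.2):
    # age penalty = negative mutual information of the binary Markov source
    q = (1 - (1 - 2 * flip) ** delta) / 2
    h = 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)
    return -(1 - h)

def cycle_penalty(T):
    # total penalty accumulated over one sampling period of T slots
    return sum(p(d) for d in range(1, T + 1))

def dinkelbach(T_max=50, tol=1e-12):
    """Minimize cycle_penalty(T)/T over T in {1,...,T_max} by iterating
    c <- cycle_penalty(T*)/T*, where T* solves the parameterized subproblem
    min_T [cycle_penalty(T) - c*T]. The iterates of c are monotone and the
    feasible set is finite, so the loop terminates."""
    c = 0.0
    while True:
        T = min(range(1, T_max + 1), key=lambda t: cycle_penalty(t) - c * t)
        c_new = cycle_penalty(T) / T
        if abs(c_new - c) < tol:
            return c_new, T
        c = c_new
```

In this toy instance (deterministic unit service times), the optimum is the zero-wait period $T = 1$, and the converged parameter equals the optimal ratio.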
VI Numerical Results
In this section, we evaluate the freshness of information achieved in the following three sampling policies:
Uniform sampling: Periodic sampling with a fixed period.
Zero-wait: In this sampling policy, a new sample is taken once the previous sample is delivered to the receiver, so that $S_{i+1} = D_i$.
Optimal policy: The sampling policy given by Theorem 2.
Let $\bar{I}_{\text{uniform}}$, $\bar{I}_{\text{zero-wait}}$, and $\bar{I}_{\text{opt}}$ be the average mutual information of these three sampling policies.
We consider the binary Markov source in (9). The service time takes one of two values with equal probability. (The service time distribution is different from that used in Fig. 2.) Figure 3 depicts the time-average expected mutual information versus the mean $p$ of the Bernoulli random variables in (9). One can observe that $\bar{I}_{\text{opt}} \ge \bar{I}_{\text{zero-wait}} \ge \bar{I}_{\text{uniform}}$ holds for every value of $p$. Notice that because of the queueing delay in the uniform sampling policy, $\bar{I}_{\text{uniform}}$ is much smaller than $\bar{I}_{\text{zero-wait}}$ and $\bar{I}_{\text{opt}}$. In addition, as $p$ grows from 0 to 0.5, the changing speed of the binary Markov source increases and the freshness of information (i.e., the time-average expected mutual information) decreases. When $p = 0.5$, the $X_t$'s form an i.i.d. sequence and the freshness of information is zero in all three sampling policies.
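For intuition, a small simulation in the spirit of this comparison (our own sketch, with assumed parameters: flip probability $p = 0.2$, service times equally likely to be 1 or 3 slots, and a uniform sampling period of 2 slots) shows how queueing degrades the uniform policy:

```python
import math, random

def mi(delta, p=0.2):
    # I(X_t; W_t) for the binary symmetric Markov source at age delta >= 1
    q = (1 - (1 - 2 * p) ** delta) / 2
    if q in (0.0, 1.0):
        return 1.0
    return 1 + q * math.log2(q) + (1 - q) * math.log2(1 - q)

def average_mi(policy, n=50_000, seed=1):
    """Time-average mutual information under a sampling policy.
    policy(i, S_prev, D_prev) returns the i-th sampling time; the FIFO
    queue gives D_i = max(D_{i-1}, S_i) + Y_i with Y_i uniform over {1, 3}."""
    rng = random.Random(seed)
    total, slots = 0.0, 0
    S_prev, D_prev = 0, rng.choice([1, 3])
    for i in range(1, n):
        S = policy(i, S_prev, D_prev)
        D = max(D_prev, S) + rng.choice([1, 3])
        for t in range(D_prev, D):   # freshest delivered sample is S_prev
            total += mi(t - S_prev)
            slots += 1
        S_prev, D_prev = S, D
    return total / slots

uniform   = lambda i, S_prev, D_prev: 2 * i   # fixed period of 2 slots
zero_wait = lambda i, S_prev, D_prev: D_prev  # sample upon previous delivery
```

With these assumptions, the zero-wait policy attains a substantially higher time-average mutual information than uniform sampling, whose samples queue behind one another and arrive stale.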
In this paper, we have used mutual information to evaluate the freshness of the received samples that describe the status of a remote source. We have developed an optimal sampling policy that can maximize the time-average expectation of the above mutual information. This optimal sampling policy has been shown to have a nice structure. In addition, we have generalized [7] by finding the optimal sampling strategies for minimizing the time-average expectation of arbitrary non-decreasing age penalty functions.
-  C. Shapiro and H. Varian, Information Rules: A Strategic Guide to the Network Economy. Harvard Business Press, 1999.
-  X. Song and J. W. S. Liu, “Performance of multiversion concurrency control algorithms in maintaining temporal consistency,” in Fourteenth Annual International Computer Software and Applications Conference, Oct 1990, pp. 132–139.
-  S. Kaul, R. D. Yates, and M. Gruteser, “Real-time status: How often should one update?” in IEEE INFOCOM, 2012.
-  R. D. Yates and S. Kaul, “Real-time status updating: Multiple sources,” in IEEE ISIT, July 2012, pp. 2666–2670.
-  Y. Sun, Y. Polyanskiy, and E. Uysal-Biyikoglu, “Remote estimation of the Wiener process over a channel with random delay,” in IEEE ISIT, 2017.
-  X. Gao, E. Akyol, and T. Başar, “Optimal communication scheduling and remote estimation over an additive noise channel,” Automatica, vol. 88, pp. 57 – 69, 2018.
-  Y. Sun, E. Uysal-Biyikoglu, R. D. Yates, C. E. Koksal, and N. B. Shroff, “Update or wait: How to keep your data fresh,” IEEE Trans. Inf. Theory, vol. 63, no. 11, pp. 7492–7508, Nov. 2017.
-  A. Kosta, N. Pappas, A. Ephremides, and V. Angelakis, “Age and value of information: Non-linear age case,” in IEEE ISIT, June 2017, pp. 326–330.
-  A. M. Bedewy, Y. Sun, and N. B. Shroff, “Optimizing data freshness, throughput, and delay in multi-server information-update systems,” in IEEE ISIT, 2016.
-  ——, “Age-optimal information updates in multihop networks,” in IEEE ISIT, 2017.
-  ——, “Minimizing the age of information through queues,” submitted to IEEE Trans. Inf. Theory, 2017, http://arxiv.org/abs/1709.04956.
-  ——, “The age of information in multihop networks,” submitted to IEEE Trans. Inf. Theory, 2017, https://arxiv.org/abs/1712.10061.
-  Y. Sun, E. Uysal-Biyikoglu, and S. Kompella, “Age-optimal updates of multiple information flows,” in IEEE INFOCOM Workshops — the 1st Workshop on the Age of Information (AoI Workshop), 2018.
-  P. J. Haas, Stochastic Petri Nets: Modelling, Stability, Simulation. New York, NY: Springer New York, 2002.
-  T. Cover and J. Thomas, Elements of Information Theory. John Wiley and Sons, 1991.
-  I. M. Gel’fand and A. M. Yaglom, “Calculation of the amount of information about a random function contained in another such function,” American Mathematical Society Translations, vol. 12, pp. 199–246, 1959.
-  V. Anantharam and S. Verdú, “Bits through queues,” IEEE Trans. Inf. Theory, vol. 42, no. 1, pp. 4–18, Jan 1996.
-  S. M. Ross, Stochastic Processes, 2nd ed. John Wiley & Sons, 1996.
-  W. Dinkelbach, “On nonlinear fractional programming,” Management Science, vol. 13, no. 7, pp. 492–498, 1967.
-  Y. Sun, Y. Polyanskiy, and E. Uysal-Biyikoglu, “Remote estimation of the Wiener process over a channel with random delay,” Jan. 2017, http://arxiv.org/abs/1701.06734.
-  D. P. Bertsekas, Nonlinear Programming, 2nd ed. Belmont, MA: Athena Scientific, 1999.