I Introduction
Information usually has the greatest value when it is fresh [1]. For example, real-time knowledge about the location, orientation, and speed of motor vehicles is imperative in autonomous driving, and access to timely updates about stock prices and interest-rate movements is essential for developing trading strategies on the stock market. In [2, 3], the concept of Age of Information was introduced to measure the freshness of information that a receiver has about the status of a remote source. Consider a sequence of source samples that are sent through a queue to a receiver, as illustrated in Fig. 1. Each sample is stamped with its generation time. Let $U(t)$ be the time stamp of the newest sample that has been delivered to the receiver by time instant $t$. The age of information, as a function of $t$, is defined as $\Delta(t) = t - U(t)$, which is the time elapsed since the newest delivered sample was generated. Hence, a small age indicates that there exists a fresh sample of the source status at the receiver.
In practice, the status of different sources may vary over time with different speeds. For example, the location of a car can change much faster than the temperature of its engine. While the age of information represents the time difference between the samples available at the transmitter and receiver, it is independent of the changing speed of the source. Hence, the age is not an appropriate measure for comparing the freshness of information about different sources.
In recent years, several examples and approaches for evaluating the freshness of information about time-correlated sources have been discussed in, e.g., [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]. In [4, 5, 6] and the references therein, the received samples are used to estimate the source value in real time, where the estimation error is used to measure the freshness of information available at the receiver. In [7], an age penalty function $p(\Delta(t))$ was employed to describe the level of dissatisfaction for having aged samples at the receiver, where $p(\cdot)$ is an arbitrary nonnegative and nondecreasing function of the age that can be specified based on the application; in addition, an optimal sampling strategy was developed to minimize the time-average expected age penalty. In [8], the authors considered the relationship between the autocorrelation function of the source (where $X_t$ denotes the source status at time instant $t$) and the age penalty function in [7], and provided analytical expressions for the long-run time average of a few autocorrelation functions. In [9, 10, 11, 12, 13], several scheduling policies were developed to minimize an arbitrary nondecreasing functional of the age process in several network settings. The age penalty models in [9, 10, 11, 12, 13] are quite general and include most age penalty models considered in previous studies as special cases. For example, because the functional is a mapping from the space of age processes to the real numbers, it can be selected to describe the time-average age, or the time-average of an age penalty function that depends on the age levels at multiple time instants.

In this paper, we propose a new measure for the freshness of information, which can precisely describe how information ages over time. For Markov sources, an online sampling policy is developed to optimize the freshness of information.¹ The detailed contributions of this paper are summarized as follows:

¹Non-Markov sources will be considered in our future work.

We propose to use the mutual information between the real-time source value and the received samples to quantify the freshness of the information contained in the received samples. This mutual information is easy to compute for Markov sources: by using the data processing inequality, it is shown to be a nonnegative and nonincreasing function of the age (Lemma 1). Therefore, the "aging" of the received information can be interpreted as a process in which this mutual information decreases as the age grows.

In order to optimize the freshness of information, we study the optimal sampling strategy that maximizes the time-average expected mutual information. This problem is solved in two steps: (i) We first generalize [7] to obtain an optimal sampling strategy that minimizes the time-average expected age penalty $p(\Delta(t))$, where $p(\cdot)$ is an arbitrary nondecreasing function of the age (Theorem 1). (ii) Next, we apply the result of Step (i) to a special age penalty function, namely the negative of the mutual information, which is a nonpositive and nondecreasing function of the age.

The obtained optimal sampling strategy has a nice structure: a new sample is taken once a conditional mutual information reduces to a threshold $\beta$, and the threshold $\beta$ is equal to the optimum value of the time-average expected mutual information that we are maximizing (Theorem 2). Numerical results are provided to compare different sampling policies.
I-A Relationship with Previous Work
The closest study to this paper is [7]. The differences between [7] and this paper are explained in the following:

The age penalty function $p(\cdot)$ in [7] is nonnegative and nondecreasing. It cannot be directly applied to our problem, because the negative of the mutual information is a nonpositive and nondecreasing function of the age. We relax $p(\cdot)$ to be an arbitrary nondecreasing function in this paper.

In [7], a two-layered nested bisection search algorithm was developed to compute the threshold $\beta$. In this paper, $\beta$ is characterized as the solution of a fixed-point equation, which can be solved by a single layer of bisection search. Hence, the computation of $\beta$ is simplified.

In [7], the optimal sampling strategy was obtained for a continuous-time system. In this paper, we develop an optimal sampling strategy for a discrete-time system, without taking any approximation or introducing suboptimality.

It was assumed in [7] that after the previous sample is delivered, the next sample must be generated within a fixed amount of time. By adopting more powerful proof techniques, we are able to remove this assumption and greatly simplify the proof procedure in this paper.
II System Model
We consider a discrete-time status-update system, illustrated in Fig. 1, in which samples of a source are taken and sent to a receiver through a communication channel. The channel is modeled as a single-server FIFO queue with i.i.d. service times. The system starts to operate at time instant $t = 0$. The $i$-th sample is generated at time instant $S_i$ and is delivered to the receiver at time instant $D_i = S_i + Y_i$ with a discrete service time $Y_i$, where $Y_i \geq 1$, $S_i \leq S_{i+1}$, and $D_i \leq D_{i+1}$ for all $i$. Each sample packet $(S_i, X_{S_i})$ contains both the sampling time $S_i$ and the sample value $X_{S_i}$. The samples that the receiver has received by time instant $t$ are denoted by the set

$W_t = \{(S_i, X_{S_i}) : D_i \leq t\}.$ (1)
At any time instant $t$, the receiver uses the received samples $W_t$ to reconstruct an estimate of the real-time source value $X_t$, where we assume that the estimator neglects the knowledge implied by the timing of the samples.
Let $U(t) = \max\{S_i : D_i \leq t\}$ be the time stamp of the freshest sample that the receiver has received by time instant $t$. Then, the age of information, or simply the age, at time instant $t$ is defined as [2, 3]

$\Delta(t) = t - U(t).$ (2)
The initial state of the system is assumed to satisfy $S_0 = 0$, $D_0 = Y_0$, and $\Delta(0)$ is a finite constant.
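As a concrete illustration of definition (2), the age can be computed directly from the generation and delivery times of the samples. The following sketch (with hypothetical sample times) shows how the age grows linearly between deliveries and drops when a fresher sample arrives:

```python
def age(t, samples):
    """Age of information Delta(t) = t - U(t), where U(t) is the generation
    time of the freshest sample delivered by time t, and `samples` is a
    list of (S_i, D_i) pairs of generation and delivery times."""
    delivered = [S for (S, D) in samples if D <= t]
    if not delivered:
        raise ValueError("no sample has been delivered by time t")
    return t - max(delivered)

# Hypothetical samples: (S_0, D_0) = (0, 2) and (S_1, D_1) = (3, 4).
samples = [(0, 2), (3, 4)]
print(age(2, samples))  # 2: sample 0 just delivered, age = 2 - 0
print(age(3, samples))  # 3: sample 1 not yet delivered, age keeps growing
print(age(4, samples))  # 1: sample 1 delivered, age drops to 4 - 3
```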
Let $\pi = (S_0, S_1, \ldots)$ represent a sampling policy and $\Pi$ denote the set of causal sampling policies that satisfy the following two conditions: (i) Each sampling time $S_i$ is chosen based on the history and current information of the system, but not on any future information. (ii) The inter-sampling times $\{T_i = S_{i+1} - S_i,\ i = 0, 1, \ldots\}$ form a regenerative process [14, Section 6.1]²: there exists an increasing sequence $0 \leq k_1 < k_2 < \cdots$ of almost surely finite random integers such that the post-$k_j$ process $\{T_{k_j + i},\ i = 0, 1, \ldots\}$ has the same distribution as the post-$k_1$ process $\{T_{k_1 + i},\ i = 0, 1, \ldots\}$ and is independent of the pre-$k_j$ process $\{T_i,\ i = 0, 1, \ldots, k_j - 1\}$; in addition, $0 < \mathbb{E}[S_{k_{j+1}} - S_{k_j}] < \infty$.

²We assume that $\{T_i\}$ is a regenerative process because we will optimize a lim sup (or lim inf) time-average objective, but operationally a nicer objective function is the corresponding limit. These two objective functions are equivalent if $\{T_i\}$ is a regenerative process.

We assume that the Markov chain $\{X_t,\ t = 0, 1, \ldots\}$ and the service times $\{Y_i\}$ are determined by two mutually independent external processes, which do not change according to the adopted sampling policy.

III Mutual Information as a Measure of the Freshness of Information
In this paper, we propose to use the mutual information

$I(X_t; W_t)$ (3)

as a metric for evaluating the freshness of information that is available at the receiver. In information theory, $I(X_t; W_t)$ is the amount of information that the received samples $W_t$ carry about the real-time source value $X_t$. If $I(X_t; W_t)$ is close to $H(X_t)$, the received samples are considered to be fresh; if $I(X_t; W_t)$ is almost $0$, the received samples are considered to be obsolete. In addition, because $I(X_t; W_t)$ naturally incorporates the information structure of the source $\{X_t\}$, it can effectively characterize the freshness of information about sources with different time-varying patterns.
One way to interpret $I(X_t; W_t)$ is to consider how helpful the received samples are for inferring $X_t$. By using the Shannon code lengths [15, Section 5.4], the expected minimum number of bits $L$ required to specify $X_t$ satisfies

$H(X_t) \leq \mathbb{E}[L] < H(X_t) + 1,$ (4)

where $H(X_t)$ can be interpreted as the expected minimum number of binary tests that are needed to infer $X_t$. On the other hand, with the knowledge of $W_t$, the expected minimum number of bits $L'$ required to specify $X_t$ satisfies

$H(X_t \mid W_t) \leq \mathbb{E}[L'] < H(X_t \mid W_t) + 1.$ (5)

If $X_t$ is a random vector consisting of a large number of symbols (e.g., $X_t$ represents an image containing many pixels or the channel coefficients of many OFDM subcarriers), the one bit of overhead in (4) and (5) is insignificant. Hence, $I(X_t; W_t) = H(X_t) - H(X_t \mid W_t)$ is approximately the reduction in the description cost for inferring $X_t$ without and with the knowledge of $W_t$.

III-A Markov Sources
To get more insight, let us consider the class of Markov sources and use the Markov property to simplify $I(X_t; W_t)$. By using the data processing inequality [15], it is not hard to show that $I(X_t; W_t)$ has the following property:
Lemma 1.
If $\{X_t\}$ is a time-homogeneous Markov chain and $W_t$ is defined in (1), then the mutual information

$I(X_t; W_t) = I(X_t; X_{U(t)})$ (6)

can be expressed as a nonnegative and nonincreasing function $f(\cdot)$ of the age $\Delta(t)$.
Proof.
Because $\{X_t\}$ is a Markov chain, $X_{U(t)}$ contains all the information in $W_t$ about $X_t$. In other words, $X_{U(t)}$ is a sufficient statistic of $W_t$ for estimating $X_t$. Then, (6) follows from [15, Eq. (2.124)].
Next, because $\{X_t\}$ is time-homogeneous, $I(X_t; X_{U(t)}) = I(X_{\Delta(t)}; X_0)$ for all $t$, which is a function of the age $\Delta(t)$. Further, because $\{X_t\}$ is a Markov chain, owing to the data processing inequality [15, Theorem 2.8.1], $I(X_\Delta; X_0)$ is nonincreasing in $\Delta$. Finally, mutual information is nonnegative. This completes the proof. ∎
According to Lemma 1, information "aging" can be considered as a process in which the amount of information that is preserved in $W_t$ for inferring the real-time source value $X_t$ decreases as the age grows. This is similar to the data processing inequality [15], which states that no processing of the data $Y$ can increase the information that $Y$ contains about $X$; the difference is that in the status-update systems that we consider, the sample set $W_t$, the age $\Delta(t)$, and the signal value $X_t$ are all evolving over time.
Two examples of Markov sources are provided in the sequel as illustrations of Lemma 1:
III-A1 Gaussian Markov Source
Suppose that $\{X_t\}$ is a first-order discrete-time Gaussian Markov process, defined by

$X_{t+1} = a X_t + Z_t,$ (7)

where $|a| < 1$ and the $Z_t$'s are zero-mean i.i.d. Gaussian random variables with variance $\sigma^2$. Because $\{X_t\}$ is a Gaussian Markov process, one can show that [16]

$I(X_t; W_t) = \frac{1}{2} \log \frac{1}{1 - a^{2\Delta(t)}}.$ (8)

Since $|a| < 1$ and $\Delta(t)$ is an integer, $I(X_t; W_t)$ is a positive and decreasing function of the age $\Delta(t)$. Note that if $\Delta(t) = 0$, then $I(X_t; W_t) = \infty$, because the absolute entropy of a Gaussian random variable is infinite.
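Assuming the stationary form $I(X_t; W_t) = \frac{1}{2}\log\frac{1}{1 - a^{2\Delta}}$ for this source (logarithm in nats), a quick numerical check confirms that the expression is positive and strictly decreasing in the age:

```python
import math

def gauss_markov_mi(a, delta):
    """Mutual information I(X_t; W_t) = 0.5 * log(1 / (1 - a^(2*delta)))
    in nats, for the Gaussian Markov source X_{t+1} = a X_t + Z_t with
    |a| < 1 and integer age delta >= 1."""
    assert abs(a) < 1 and delta >= 1
    return 0.5 * math.log(1.0 / (1.0 - a ** (2 * delta)))

mis = [gauss_markov_mi(0.9, d) for d in range(1, 6)]
assert all(m > 0 for m in mis)                   # positive
assert all(x > y for x, y in zip(mis, mis[1:]))  # strictly decreasing in the age
```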
III-A2 Binary Markov Source
Suppose that $\{X_t\}$ is a binary symmetric Markov chain defined by

$X_{t+1} = X_t \oplus Z_t,$ (9)

where $\oplus$ denotes binary modulo-2 addition and the $Z_t$'s are i.i.d. Bernoulli random variables with mean $p$. One can show that

$I(X_t; W_t) = 1 - h(q_{\Delta(t)}),$ (10)

where $q_\Delta = [1 - (1 - 2p)^\Delta]/2$ and $h$ is the binary entropy function defined by $h(x) = -x \log_2 x - (1 - x) \log_2(1 - x)$ with domain $[0, 1/2]$ [15, Eq. (2.5)]. Because $h(x)$ is increasing on $[0, 1/2]$, $I(X_t; W_t)$ is a nonnegative and decreasing function of the age $\Delta(t)$.
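A small numerical check of the expression $1 - h(q_\Delta)$ with $q_\Delta = [1 - (1-2p)^\Delta]/2$, under the assumption that the source marginal is equiprobable so that $H(X_t) = 1$ bit:

```python
import math

def h(x):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def binary_markov_mi(p, delta):
    """I(X_t; W_t) = 1 - h(q_delta) with q_delta = (1 - (1-2p)^delta)/2,
    for the binary symmetric Markov source X_{t+1} = X_t XOR Z_t,
    Z_t ~ Bernoulli(p), 0 <= p <= 1/2."""
    q = (1.0 - (1.0 - 2.0 * p) ** delta) / 2.0
    return 1.0 - h(q)

mis = [binary_markov_mi(0.1, d) for d in range(1, 8)]
assert all(0 <= m <= 1 for m in mis)             # nonnegative, at most 1 bit
assert all(x > y for x, y in zip(mis, mis[1:]))  # decreasing in the age
assert binary_markov_mi(0.5, 3) == 0.0           # p = 1/2: i.i.d. source, zero freshness
```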
IV Online Sampling for Information Freshness
In this section, we will develop an optimal online sampling policy that can maximize the freshness of information about Markov sources.
IV-A Problem Formulation
To optimize the freshness of information, we formulate an online sampling problem for maximizing the time-average expected mutual information between $X_t$ and $W_t$ over an infinite time horizon:

$\bar{I}_{\max} \triangleq \sup_{\pi \in \Pi} \liminf_{T \to \infty} \frac{1}{T} \mathbb{E}\left[\sum_{t=0}^{T-1} I(X_t; W_t)\right],$ (11)

where $\bar{I}_{\max}$ is the optimal value of (11). We assume that $\bar{I}_{\max}$ is finite.
It is helpful to remark that $I(X_t; W_t)$ in (11) is different from the Shannon capacity considered in, e.g., [17, 15]: in (11), our goal is to maximize the freshness of information and make more accurate inferences about the real-time source value; this goal is achieved by minimizing the average amount of mutual information that is lost as the received data becomes obsolete. On the other hand, the focus of Shannon capacity theory is mainly on maximizing the rate of information that can be reliably transmitted to the receiver, (in most cases) without significant concern about whether the received information is new or old.
IV-B Optimal Online Sampling Policy
In [7], an age penalty function $p(\Delta(t))$ was defined to characterize the level of dissatisfaction for having aged information at the receiver, where $p(\cdot)$ is an arbitrary nonnegative and nondecreasing function that can be specified according to the application. For continuous-time status-update systems, the optimal sampling policy for minimizing the time-average expected age penalty was obtained in [7]. Unfortunately, we are not able to apply the results in [7] to solve (11). Specifically, if we choose the age penalty function $p(\Delta) = -f(\Delta)$, where $f(\cdot)$ is the function in Lemma 1, then Lemma 1 suggests that $p(\cdot)$ is nonpositive and nondecreasing, which is different from the nonnegative and nondecreasing age penalty function required in [7]. In addition, we consider a discrete-time system in this paper, which is different from the continuous-time system in [7].

To address this problem, we generalize [7] by considering an arbitrary nondecreasing age penalty function (no matter whether positive or negative) and design an optimal sampling policy that minimizes the time-average expected age penalty. To that end, we consider the following discrete-time age penalty minimization problem:

$\bar{p}_{\min} \triangleq \inf_{\pi \in \Pi} \limsup_{T \to \infty} \frac{1}{T} \mathbb{E}\left[\sum_{t=0}^{T-1} p(\Delta(t))\right],$ (12)

where $p(\cdot)$ is an arbitrary nondecreasing function and $\bar{p}_{\min}$ denotes the optimal value of (12). We assume that $\bar{p}_{\min}$ is finite. Problem (12) is a Markov decision problem. A closed-form solution of (12) is provided in the following theorem:
Theorem 1.
Proof.
See Section V. ∎
Next, we consider the special case $p(\Delta) = -f(\Delta)$, where $f(\cdot)$ is the function in Lemma 1. It follows from Theorem 1 that
Theorem 2.
The optimal sampling policy in (2) and (16) has a nice structure: the next sampling time $S_{i+1}$ is determined based on the mutual information between the freshest received sample $X_{S_i}$ and the signal value $X_t$, where $D_i$ is the delivery time of the $i$-th sample. Because the transmission time $Y_i$ will be known by both the transmitter and receiver at time $D_i$, $Y_i$ is the side information that is captured by the conditional mutual information $I(X_t; X_{S_i} \mid Y_i)$. The conditional mutual information decreases as time grows. According to (2), the $(i+1)$-th sample is generated at the smallest integer time instant $t$ satisfying two conditions: (i) the $i$-th sample has already been delivered, i.e., $t \geq D_i$, and (ii) the conditional mutual information has reduced to be no greater than a predetermined threshold $\beta$. In addition, according to (16), the threshold $\beta$ is equal to the optimum objective value $\bar{I}_{\max}$ in (11), i.e., the optimum of the time-average expected mutual information that we are maximizing. Note that the sampling times and delivery times on the right-hand side of (16) depend on $\beta$. Hence, $\beta$ is a fixed point of (16).
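Since the threshold appears on both sides of (16), it can be computed with a single layer of bisection search, as noted in Section I-A. A generic sketch follows; the function `g` below is a toy stand-in for the right-hand side of (16), not the actual expression:

```python
def solve_fixed_point(g, lo, hi, tol=1e-9):
    """Bisection search for beta satisfying beta = g(beta), assuming that
    beta - g(beta) is continuous, increasing in beta, and changes sign
    on [lo, hi] (a nonincreasing g is sufficient for monotonicity)."""
    assert lo - g(lo) <= 0.0 <= hi - g(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mid - g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Toy fixed-point equation beta = 1 / (1 + beta), whose solution is the
# golden-ratio conjugate (sqrt(5) - 1) / 2.
beta = solve_fixed_point(lambda b: 1.0 / (1.0 + b), 0.0, 2.0)
assert abs(beta - 0.6180339887) < 1e-6
```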
The optimal sampling policy is illustrated in Fig. 2, where the service time takes one of two values with equal probability. The service times $Y_i$, delivery times $D_i$, and conditional mutual information of the samples are depicted in the figure. One can observe that if the service time of the previous sample takes the smaller value, the sampler will wait until the conditional mutual information drops to the threshold $\beta$ and then take the next sample; if the service time of the previous sample takes the larger value, the next sample is taken upon the delivery of the previous sample at time $D_i$, because the conditional mutual information is already below $\beta$ then.

Notice that in the optimal sampling policy (2) and (16), there is at most one sample in transmission at any time and no sample is waiting in the queue. This is different from the traditional uniform sampling policy, in which the waiting time in the queue can be quite long and, as a result, the freshness of information is low. This phenomenon will be illustrated by our numerical results in Section VI.
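The threshold structure can be sketched in a few lines of code for the binary Markov source of Section III-A2. The sketch below simplifies the test in Theorem 2 by thresholding $f(t - S_i)$, the mutual information carried by the freshest delivered sample, rather than the exact conditional mutual information that also averages over the next service time; all parameter values are hypothetical:

```python
import math

def h(x):
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def f(p, delta):
    """Binary-source mutual information as a function of the age:
    f(delta) = 1 - h(q_delta), q_delta = (1 - (1-2p)^delta)/2."""
    q = (1.0 - (1.0 - 2.0 * p) ** delta) / 2.0
    return 1.0 - h(q)

def next_sampling_time(S_i, D_i, beta, p):
    """Simplified threshold rule: take the next sample at the first integer
    t >= D_i at which f(t - S_i) has dropped to the threshold beta."""
    t = D_i
    while f(p, t - S_i) > beta:
        t += 1
    return t

# Short service time: the information is still fresh at delivery, so wait.
assert next_sampling_time(S_i=0, D_i=1, beta=0.2, p=0.1) == 3
# Long service time: f is already below beta at delivery, so zero wait.
assert next_sampling_time(S_i=0, D_i=9, beta=0.2, p=0.1) == 9
```

The two assertions mirror the behavior described for Fig. 2: a short service time leads to a positive waiting time, while a long service time leads to sampling immediately upon delivery.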
V Proof of Theorem 1
V-A Simplification of Problem (12)
In [7, 5], it was shown that no new sample should be taken when the server is busy. The reason is as follows: if a sample is taken when the server is busy, it has to wait in the queue for its transmission opportunity, and meanwhile the sample is becoming stale. A better strategy is to take a new sample once the server becomes idle. By using the sufficient statistic property of the Markov chain $\{X_t\}$, one can show that the second strategy is better.
Because of this, we only need to consider a subclass $\Pi_1 \subseteq \Pi$ of sampling policies in which each sample is generated and submitted to the server after the previous sample is delivered, i.e.,

$S_{i+1} \geq D_i = S_i + Y_i.$ (17)

Let $Z_i = S_{i+1} - D_i \geq 0$ represent the waiting time between the delivery time $D_i$ of sample $i$ and the generation time $S_{i+1}$ of sample $i+1$. Since $D_i = S_i + Y_i$, we have $S_{i+1} = S_i + Y_i + Z_i$ and $D_{i+1} = S_i + Y_i + Z_i + Y_{i+1}$. Given $S_0$, the sequence $(S_1, S_2, \ldots)$ is uniquely determined by $(Z_0, Z_1, \ldots)$. Hence, one can also use $\pi = (Z_0, Z_1, \ldots)$ to represent a sampling policy in $\Pi_1$.
Because $\{\Delta(t)\}$ is a regenerative process, using the renewal theory in [18] and [14, Section 6.1], one can show that in Problem (12) the time-average expectations are convergent sequences and the $\limsup$ can be replaced by a limit. In addition, for each policy in $\Pi_1$, it holds that $U(t) = S_i$ for $t \in [D_i, D_{i+1})$. In this case, the age in (2) can be expressed as $\Delta(t) = t - S_i$ for $t \in [D_i, D_{i+1})$. Hence,

$\lim_{T \to \infty} \frac{1}{T} \mathbb{E}\left[\sum_{t=0}^{T-1} p(\Delta(t))\right] = \frac{\mathbb{E}\left[\sum_{t=Y_i}^{Y_i + Z_i + Y_{i+1} - 1} p(t)\right]}{\mathbb{E}[Z_i + Y_{i+1}]},$ (18)
which is a function of $(Z_0, Z_1, \ldots)$. Define
(19) 
then (12) can be simplified as
(20) 
In order to solve (20), let us consider the following Markov decision problem with a parameter $c$:

(21)

where $h(c)$ denotes the optimum value of (21). Similar to Dinkelbach's method [19] for nonlinear fractional programming, the following lemma in [20] also holds for our Markov decision problem (20):
Lemma 2.
V-B Optimal Solution of (21) for $c = \bar{p}_{\min}$
Next, we present an optimal solution to (21) for $c = \bar{p}_{\min}$.
Definition 1.
A policy $\pi \in \Pi_1$ is said to be a stationary randomized policy if it observes $Y_i$ and then chooses the waiting time $Z_i$ based on the observed value of $Y_i$, according to a conditional probability measure that is invariant for all $i$. Let $\Pi_{\mathrm{SR}} \subseteq \Pi_1$ denote the set of stationary randomized policies.
Lemma 3.
If the service times $Y_i$ are i.i.d., then there exists a stationary randomized policy that is optimal for solving (21) with $c = \bar{p}_{\min}$.
Proof.
In (21), the minimization of the term

(23)

over $Z_i$ depends on the history of the system only via $Y_i$. Hence, $Y_i$ is a sufficient statistic for determining $Z_i$ in (21). This means that the rule for determining $Z_i$ can be represented by the conditional probability distribution of $Z_i$ given $Y_i$, and in addition, there exists an optimal solution to (21) in which $Z_i$ is determined by solving

(24)

and then using the observation $Y_i$ and the optimal conditional probability distribution that solves (24) to decide $Z_i$. Finally, notice that the minimizer of (24) depends on the joint distribution of $Y_i$ and $Y_{i+1}$. Because the $Y_i$'s are i.i.d., the joint distribution of $Y_i$ and $Y_{i+1}$ is invariant for all $i$. Hence, the optimal conditional probability measure solving (24) is invariant for all $i$. By definition, there exists a stationary randomized policy that is optimal for solving Problem (21) with $c = \bar{p}_{\min}$, which completes the proof. ∎

Next, by using an idea similar to that in the solution of [21, Problem 5.5.3], we can obtain
Lemma 4.
If $p(\cdot)$ is nondecreasing and the service times $Y_i$ are i.i.d., then an optimal solution $(Z_0, Z_1, \ldots)$ of (21) is given by

(25)
Proof.
VI Numerical Results
In this section, we evaluate the freshness of information achieved in the following three sampling policies:

Uniform sampling: periodic sampling with a fixed period.

Zero-wait: in this sampling policy, a new sample is taken once the previous sample is delivered to the receiver, so that $S_{i+1} = D_i$ for all $i$.

Optimal policy: The sampling policy given by Theorem 2.
Let $\bar{I}_{\text{uniform}}$, $\bar{I}_{\text{zero-wait}}$, and $\bar{I}_{\text{opt}}$ be the time-average mutual information achieved by these three sampling policies, respectively.
We consider the binary Markov source in (9). The service time takes one of two values with equal probability.³ Figure 3 depicts the time-average expected mutual information versus the mean $p$ of the Bernoulli random variables $Z_t$ in (9). One can observe that $\bar{I}_{\text{opt}} \geq \bar{I}_{\text{zero-wait}} \geq \bar{I}_{\text{uniform}}$ holds for every value of $p$. Notice that because of the queueing delay in the uniform sampling policy, $\bar{I}_{\text{uniform}}$ is much smaller than $\bar{I}_{\text{zero-wait}}$ and $\bar{I}_{\text{opt}}$. In addition, as $p$ grows from 0 to 0.5, the changing speed of the binary Markov source increases and the freshness of information (i.e., the time-average expected mutual information) decreases. When $p = 0.5$, the $X_t$'s form an i.i.d. sequence and the freshness of information is zero in all three sampling policies.

³The service time distribution is different from that used in Fig. 2.
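The gap between zero-wait and uniform sampling can be reproduced with a short simulation of the binary Markov source. Everything below is a hedged sketch with assumed parameter values ($p = 0.1$, a two-point service time of 1 or 3 slots, and a uniform sampling period of 4 slots), not the exact setup of Fig. 3; the time-average mutual information is computed by averaging $f(\Delta(t)) = 1 - h(q_{\Delta(t)})$ over the age trajectory:

```python
import math
import random

def h(x):
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def f(p, delta):
    """Binary-source mutual information as a function of the age."""
    q = (1.0 - (1.0 - 2.0 * p) ** delta) / 2.0
    return 1.0 - h(q)

def avg_mi(deliveries, p, horizon):
    """Time-average of f(Delta(t)), where `deliveries` is a list of
    (D_i, S_i) pairs sorted by delivery time D_i."""
    total, count, idx, newest = 0.0, 0, 0, None
    for t in range(horizon):
        while idx < len(deliveries) and deliveries[idx][0] <= t:
            newest = deliveries[idx][1]  # U(t): freshest delivered time stamp
            idx += 1
        if newest is not None:
            total += f(p, t - newest)
            count += 1
    return total / count

random.seed(1)
p, horizon = 0.1, 20000
service = lambda: random.choice([1, 3])  # assumed two-point service times

# Zero-wait policy: S_{i+1} = D_i.
zero_wait, t = [], 0
while t < horizon:
    s = t
    t = s + service()
    zero_wait.append((t, s))

# Uniform sampling with period 4: samples may queue behind the server.
uniform, server_free = [], 0
for s in range(0, horizon, 4):
    start = max(s, server_free)  # wait in the FIFO queue until the server is free
    server_free = start + service()
    uniform.append((server_free, s))

# Queueing and the sampling period make the uniform policy's information
# staler on average, matching the ordering observed in Fig. 3.
assert avg_mi(zero_wait, p, horizon) > avg_mi(uniform, p, horizon)
```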
VII Conclusion
In this paper, we have used mutual information to evaluate the freshness of the received samples that describe the status of a remote source. We have developed an optimal sampling policy that maximizes the time-average expectation of this mutual information, and this policy has been shown to have a nice structure. In addition, we have generalized [7] by finding optimal sampling strategies that minimize the time-average expectation of an arbitrary nondecreasing age penalty function.
References
 [1] C. Shapiro and H. Varian, Information Rules: A Strategic Guide to the Network Economy. Harvard Business Press, 1999.
 [2] X. Song and J. W. S. Liu, “Performance of multiversion concurrency control algorithms in maintaining temporal consistency,” in Fourteenth Annual International Computer Software and Applications Conference, Oct 1990, pp. 132–139.
 [3] S. Kaul, R. D. Yates, and M. Gruteser, “Real-time status: How often should one update?” in IEEE INFOCOM, 2012.
 [4] R. D. Yates and S. Kaul, “Real-time status updating: Multiple sources,” in IEEE ISIT, July 2012, pp. 2666–2670.
 [5] Y. Sun, Y. Polyanskiy, and E. Uysal-Biyikoglu, “Remote estimation of the Wiener process over a channel with random delay,” in IEEE ISIT, 2017.
 [6] X. Gao, E. Akyol, and T. Başar, “Optimal communication scheduling and remote estimation over an additive noise channel,” Automatica, vol. 88, pp. 57 – 69, 2018.
 [7] Y. Sun, E. Uysal-Biyikoglu, R. D. Yates, C. E. Koksal, and N. B. Shroff, “Update or wait: How to keep your data fresh,” IEEE Trans. Inf. Theory, vol. 63, no. 11, pp. 7492–7508, Nov. 2017.
 [8] A. Kosta, N. Pappas, A. Ephremides, and V. Angelakis, “Age and value of information: Nonlinear age case,” in IEEE ISIT, June 2017, pp. 326–330.
 [9] A. M. Bedewy, Y. Sun, and N. B. Shroff, “Optimizing data freshness, throughput, and delay in multi-server information-update systems,” in IEEE ISIT, 2016.
 [10] ——, “Age-optimal information updates in multi-hop networks,” in IEEE ISIT, 2017.
 [11] ——, “Minimizing the age of information through queues,” submitted to IEEE Trans. Inf. Theory, 2017, http://arxiv.org/abs/1709.04956.
 [12] ——, “The age of information in multi-hop networks,” submitted to IEEE Trans. Inf. Theory, 2017, https://arxiv.org/abs/1712.10061.
 [13] Y. Sun, E. Uysal-Biyikoglu, and S. Kompella, “Age-optimal updates of multiple information flows,” in IEEE INFOCOM Workshops — the 1st Workshop on the Age of Information (AoI Workshop), 2018.
 [14] P. J. Haas, Stochastic Petri Nets: Modelling, Stability, Simulation. New York, NY: Springer New York, 2002.
 [15] T. Cover and J. Thomas, Elements of Information Theory. John Wiley and Sons, 1991.
 [16] I. M. Gel’fand and A. M. Yaglom, “Calculation of the amount of information about a random function contained in another such function,” American Mathematical Society Translations, vol. 12, pp. 199–246, 1959.
 [17] V. Anantharam and S. Verdú, “Bits through queues,” IEEE Trans. Inf. Theory, vol. 42, no. 1, pp. 4–18, Jan 1996.
 [18] S. M. Ross, Stochastic Processes, 2nd ed. John Wiley & Sons, 1996.
 [19] W. Dinkelbach, “On nonlinear fractional programming,” Management Science, vol. 13, no. 7, pp. 492–498, 1967.
 [20] Y. Sun, Y. Polyanskiy, and E. Uysal-Biyikoglu, “Remote estimation of the Wiener process over a channel with random delay,” Jan. 2017, http://arxiv.org/abs/1701.06734.
 [21] D. P. Bertsekas, Nonlinear Programming, 2nd ed. Belmont, MA: Athena Scientific, 1999.