I Introduction
Information usually has the greatest value when it is fresh [1, p. 56]. For example, real-time knowledge about the location, orientation, and speed of motor vehicles is imperative in autonomous driving, and access to timely updates about stock prices and interest-rate movements is essential for developing trading strategies on the stock market. In [2, 3], the concept of Age of Information was introduced to measure the freshness of the information that a receiver has about the status of a remote source. Consider a sequence of source samples that are sent through a queue to a receiver. Let $U(t)$ be the generation time of the newest sample that has been delivered to the receiver by time $t$. The age of information, as a function of $t$, is defined as $\Delta(t) = t - U(t)$, which is the time elapsed since the newest delivered sample was generated. Hence, a small age indicates that there exists a recently generated sample at the receiver.
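As a minimal sketch (ours, not from the paper), the definition $\Delta(t) = t - U(t)$ can be computed directly from a list of hypothetical (generation time, delivery time) pairs:

```python
# Illustrative sketch: computing the age of information
# Delta(t) = t - U(t), where U(t) is the generation time of the newest
# sample delivered by time t. Sample times below are hypothetical.

def age_of_information(t, samples):
    """samples: list of (generation_time, delivery_time) pairs."""
    delivered = [s for s, d in samples if d <= t]
    if not delivered:
        return float('inf')  # nothing delivered yet
    return t - max(delivered)

# Samples generated at times 0, 2, 5 and delivered at times 1, 4, 6.
samples = [(0, 1), (2, 4), (5, 6)]
print(age_of_information(3.0, samples))  # newest delivered was generated at 0 -> age 3.0
print(age_of_information(4.5, samples))  # newest delivered was generated at 2 -> age 2.5
```

Between deliveries the age grows linearly with $t$, and it drops to $t - S_i$ whenever a fresher sample arrives, producing the sawtooth pattern discussed below.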
In practice, some information sources (e.g., vehicle location, stock price) vary quickly over time, while others (e.g., temperature, interest rate) change slowly. Consider again the example of autonomous driving: the location information of motor vehicles collected 0.5 seconds ago could already be quite stale for making control decisions^{1}^{1}1A car travels 15 meters in 0.5 seconds at a speed of 70 mph., but the engine temperature measured a few minutes ago is still valid for engine health monitoring. From this example, one can observe that data freshness should be evaluated based on (i) the time-varying pattern of the source and (ii) how valuable fresh data is in the usage context. Both of these features can be characterized by nonlinear functions of the age $\Delta$, where such a function could represent the utility value of data with age $\Delta$, the temporal autocorrelation function of the source, the estimation error of the signal value, or other application-specific performance metrics [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]. A survey of nonlinear age functions and their applications is provided in Section III-B. Recently, the age of information has received significant attention because of the rapid deployment of real-time applications. A large portion of existing studies on age have been devoted to the linear age function $p(\Delta) = \Delta$, e.g., [3, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44]. However, the design of data update policies for optimizing nonlinear age metrics remains largely unexplored.

In this paper, we study the problem of sampling an information source, where the samples are forwarded to a remote receiver through a channel that is modeled as a FIFO queue. The optimal sampler design for optimizing nonlinear age metrics subject to a sampling rate constraint is obtained. The contributions of this paper are summarized as follows:

We consider a class of data freshness metrics, where the utility for data freshness is represented by a nonincreasing function $u(\Delta)$ of the age $\Delta$. Accordingly, the penalty for data staleness is denoted by a nondecreasing function $p(\Delta)$ of the age. The sampler design problem for optimizing these data freshness metrics is formulated as a constrained Markov decision process (MDP) with a possibly uncountable state space.

We prove that an optimal sampling solution to this MDP is a deterministic or randomized threshold policy, where the threshold is equal to the optimal objective value of the MDP plus the optimal Lagrangian dual variable associated with the sampling rate constraint; see Section V-E for the details. The threshold can be computed by bisection search, and the randomization probabilities are chosen to satisfy the sampling rate constraint. The curse of dimensionality is circumvented in this sampling solution by exploiting the structure of the MDP. This age optimality result holds for (i) general monotonic age metrics, (ii) general service time distributions of the queueing server, and (iii) both continuous-time and discrete-time sampling. Among the technical tools used to prove these results are an extension of Dinkelbach's method to MDPs and a geometric multiplier technique for establishing strong duality. These technical tools were recently used in [45, 46], where a quite different sampling problem was solved.

When there is no sampling rate constraint, a logical sampling policy is the zero-wait sampling policy [3, 22, 14], which is throughput-optimal and delay-optimal, but not necessarily age-optimal. We develop sufficient and necessary conditions characterizing the optimality of the zero-wait sampling policy for general monotonic age metrics. Our numerical results show that the optimal sampling policies can perform much better than zero-wait sampling and the classic uniform sampling.
The rest of this paper is organized as follows. In Section II, we discuss related work. In Section III, we describe the system model and the formulation of the optimal sampling problem; a short survey of nonlinear age functions is also provided. In Section IV, we present the optimal sampling policies for different system settings, as well as a sufficient and necessary condition for the optimality of the zero-wait sampling policy. The proofs are provided in Section V. The numerical results and the conclusion are presented in Section VI and Section VII, respectively. This paper is an extended version of [16].
II Related Work
The age of information was used as a data freshness metric as early as the 1990s in studies of real-time databases [2, 47, 48, 49]. Queueing-theoretic techniques for evaluating the age of information were introduced in [3]. The average age, average peak age, and age distribution have been analyzed for various queueing systems in, e.g., [3, 18, 50, 19, 20, 51, 52, 15, 53]. It was observed that a Last-Come, First-Served (LCFS) scheduling policy can achieve a smaller time-average age than a number of other scheduling policies. The optimality of the LCFS policy, or more generally the Last-Generated, First-Served (LGFS) policy, was first proven in [54]. This age optimality result holds for several queueing systems with multiple servers, multiple hops, and/or multiple sources [54, 55, 56, 57, 58].
When the transmission power of the source is subject to an energy-harvesting constraint, the age of information was minimized in, e.g., [21, 22, 14, 23, 24, 25, 26, 27, 28]. Source coding and channel coding schemes for reducing the age were developed in, e.g., [29, 30, 31, 32]. Age-optimal transmission scheduling of wireless networks has been investigated in, e.g., [33, 34, 35, 36, 59, 37, 60, 38, 39, 40]. Game-theoretic perspectives on the age were studied in [61, 62, 41, 42]. The aging effect of channel state information was analyzed in, e.g., [63, 64, 65]. An interesting connection between the age of information and remote estimation was revealed in [45, 46], where a challenging sampling problem for Wiener processes was solved analytically. An extension of [45, 46] was conducted recently in [17]. The impact of the age on control systems was studied in [66, 43]. Emulations and measurements of the age were conducted in [67, 68, 44]. An age-based transport protocol was developed in [69].
The most relevant studies to this paper are [14, 31, 40]. This paper is an extension of [14]: the data freshness metrics considered in this paper are more general than those of [14]; the optimal sampling policies developed in this paper are simpler and more insightful than those in [14]; and the continuous-time sampling addressed in [14] is generalized to discrete-time sampling in this paper. In [31, 40], optimal sampling policies were developed to minimize the time-average age for status updates over wireless channels, where the optimal sampling policies were shown to be randomized threshold policies. Structural properties of the randomized threshold policies were obtained in [31, 40] to simplify the value iteration or policy iteration algorithms therein. The linear age function considered in [31, 40] is a special case of the monotonic age functions considered in this paper, and the channel models in [31, 40] are different from ours. In our study, the optimal sampling policies are characterized analytically and can be computed by bisection search. In a special case of [31], a closed-form optimal sampling solution was obtained. However, it is unclear whether analytical or closed-form solutions can be found for the general cases considered in [31, 40].
III Model, Metrics, and Formulation
III-A System Model
We consider the data update system illustrated in Fig. 1, where samples of a source process $X_t$ are taken and sent to a receiver through a communication channel. The channel is modeled as a single-server FIFO queue with i.i.d. service times. The system starts to operate at time $t = 0$. The $i$th sample is generated at time $S_i$ and is delivered to the receiver at time $D_i$ with a service time $Y_i$, which satisfy $S_i \leq S_{i+1}$, $S_i \leq D_i$, and $D_i = \max\{S_i, D_{i-1}\} + Y_i$ for all $i$. Each sample packet $(S_i, X_{S_i})$ contains the sampling time $S_i$ and the sample value $X_{S_i}$. Once a sample is delivered, the receiver sends an acknowledgement (ACK) back to the sampler with zero delay. Hence, the sampler has access to the idle/busy state of the server in real time.
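The FIFO queueing dynamics can be sketched as follows. The recursion $D_i = \max\{S_i, D_{i-1}\} + Y_i$ (a sample waits if the server is busy, then is served for $Y_i$) is our reading of the model, and the numeric values are hypothetical:

```python
# Sketch of the assumed FIFO dynamics: the i-th sample enters service
# when both it has been generated and the server is free, so
# D_i = max(S_i, D_{i-1}) + Y_i. Values below are hypothetical.

def delivery_times(S, Y, D0=0.0):
    D = []
    prev = D0
    for s, y in zip(S, Y):
        prev = max(s, prev) + y  # wait for server if busy, then serve
        D.append(prev)
    return D

S = [0.0, 1.0, 5.0]   # generation times
Y = [2.0, 2.0, 1.0]   # service times
print(delivery_times(S, Y))  # [2.0, 4.0, 6.0]
```

Note that the second sample, generated at time 1.0, waits in the queue until time 2.0 before its service starts, so it is delivered at 4.0 rather than 3.0.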
Let $U(t) = \max\{S_i : D_i \leq t\}$ be the generation time of the freshest sample that has been delivered to the receiver by time $t$. Then, the age of information, or simply age, at time $t$ is defined by [2, 3]

$\Delta(t) = t - U(t),$   (1)

which is plotted in Fig. 2. Because $U(t) = S_i$ for $t \in [D_i, D_{i+1})$, $\Delta(t)$ can also be written as

$\Delta(t) = t - S_i, \quad t \in [D_i, D_{i+1}).$   (2)
The initial state of the system is assumed to satisfy $S_0 = 0$ and $D_0 = Y_0$, and the initial age $\Delta(0)$ is a finite constant.
In this paper, we will consider both continuous-time and discrete-time status-update systems. In the continuous-time setting, $S_i$ can take any positive value. In the discrete-time setting, $S_i$ is a multiple of a period $T_s$; as a result, the $S_i$'s, $D_i$'s, and $Y_i$'s are all discrete-time variables. For notational simplicity, we choose $T_s = 1$ second such that all the discrete-time variables are integers. The results for other values of $T_s$ can be readily obtained by time scaling.
In practice, the continuous-time setting can be used to model status-update systems with a high clock rate, while the discrete-time setting is appropriate for characterizing sensors that have a very low energy budget and can only wake up periodically from a low-power sleep mode.
III-B Data Staleness and Freshness Metrics: A Survey
The dissatisfaction with data staleness (or the eagerness for data refreshing) is represented by a penalty function $p(\Delta(t))$ of the age $\Delta(t)$, where $p(\cdot)$ is nondecreasing. This nondecreasing requirement on $p(\cdot)$ complies with the observation that stale data is usually less desirable than fresh data [1, 4, 5, 6, 7, 8, 9]. This data staleness model is quite general, as it allows $p(\cdot)$ to be non-convex or discontinuous. These data staleness metrics are clearly more general than those in [13, 14], where $p(\cdot)$ was restricted to be nonnegative and nondecreasing.
Similarly, data freshness can be characterized by a nonincreasing utility function $u(\Delta(t))$ of the age [5, 7]. One simple choice is $u(\Delta) = -p(\Delta)$. Note that because the age $\Delta(t)$ is a function of time $t$, $p(\Delta(t))$ and $u(\Delta(t))$ are both time-varying, as illustrated in Fig. 3. In practice, one can choose $p(\cdot)$ and $u(\cdot)$ based on the information source and the application under consideration, as illustrated in the following examples.
III-B1 Autocorrelation Function of the Source
The autocorrelation function of the source can be used to evaluate the freshness of a sample [15]. For some stationary sources, the autocorrelation function is a nonincreasing function of the age $\Delta$, and hence can serve as an age utility function $u(\Delta)$. For example, in stationary ergodic Gauss-Markov block fading channels, the impact of channel aging can be characterized by the autocorrelation function of the fading channel coefficients. When the age is small, the autocorrelation function and the data rate both decay with respect to the age [63].
III-B2 Estimation Error of Real-time Source Value
Consider a status-update system, where samples of a Markov source $X_t$ are forwarded to a remote estimator. The estimator uses the causally received samples to reconstruct an estimate of the real-time source value. If the sampling times are independent of the observed source $\{X_t\}$, the estimation error at time $t$ can be expressed as an age penalty function $p(\Delta(t))$ [20, 45, 46, 17]. If the sampling times are chosen based on causal knowledge of the source, the estimation error is no longer a function of the age [45, 46, 17].
III-B3 Information-based Data Freshness Metric
Let

$W_t = \{(S_i, X_{S_i}) : D_i \leq t\}$   (3)

denote the set of samples that have been delivered to the receiver by time $t$. One can use the mutual information $I(X_t; W_t)$ — the amount of information that the received samples carry about the current source value $X_t$ — to evaluate the freshness of $W_t$. If $I(X_t; W_t)$ is close to $H(X_t)$, the samples in $W_t$ contain a lot of information about $X_t$, and $W_t$ is considered to be fresh; if $I(X_t; W_t)$ is almost $0$, $W_t$ provides little information about $X_t$ and is deemed to be obsolete.
One way to interpret $I(X_t; W_t)$ is to consider how helpful the received samples are for inferring $X_t$. By using the Shannon code lengths [70, Section 5.4], the expected minimum number of bits $L(X_t)$ required to specify $X_t$ satisfies

$H(X_t) \leq L(X_t) < H(X_t) + 1,$   (4)

where $H(X_t)$ can be interpreted as the expected minimum number of binary tests that are needed to infer $X_t$. On the other hand, with the knowledge of $W_t$, the expected minimum number of bits $L(X_t \mid W_t)$ required to specify $X_t$ satisfies

$H(X_t \mid W_t) \leq L(X_t \mid W_t) < H(X_t \mid W_t) + 1.$   (5)
If $X_t$ is a random vector consisting of a large number of symbols (e.g., $X_t$ represents an image containing many pixels or the coefficients of MIMO-OFDM channels), the one bit of overhead in (4) and (5) is insignificant. Hence, $I(X_t; W_t) = H(X_t) - H(X_t \mid W_t)$ is approximately the reduction in the description cost for inferring $X_t$ without and with the knowledge of $W_t$.

If $X_t$ is a stationary, time-homogeneous Markov chain, then by the data processing inequality [70, Theorem 2.8.1], it is easy to prove the following lemma:

Lemma 1.
If $X_t$ is a stationary, time-homogeneous Markov chain, $W_t$ is defined in (3), and the sampling times $S_i$ are independent of $\{X_t\}$, then the mutual information

$I(X_t; W_t)$   (6)

is a nonnegative and nonincreasing function of the age $\Delta(t)$.
Proof.
See Appendix A. ∎
Lemma 1 provides an intuitive interpretation of “information aging”: the amount of information that is preserved in $W_t$ for inferring the current source value $X_t$ decreases as the age grows. When the $S_i$'s are independent of $\{X_t\}$, the sampling times of delivered samples do not carry any information about the current source value $X_t$. One interesting future research direction is how to exploit the timing information in $W_t$ to improve data freshness.
Next, we provide closed-form expressions of $I(X_t; W_t)$ for two Markov sources:
Gauss-Markov Source: Suppose that $X_t$ is a first-order discrete-time Gauss-Markov process, defined by

$X_t = a X_{t-1} + Z_t,$   (7)

where $0 < |a| < 1$ and the $Z_t$'s are zero-mean i.i.d. Gaussian random variables with variance $\sigma^2$. Because $X_t$ is a Gauss-Markov process, one can show that [71]

$I(X_t; W_t) = \frac{1}{2} \log_2 \frac{1}{1 - a^{2\Delta(t)}}.$   (8)

Since $0 < |a| < 1$ and $\Delta(t)$ is an integer, $I(X_t; W_t)$ is a positive and decreasing function of the age $\Delta(t)$. Note that if $\Delta(t) = 0$, then $I(X_t; W_t) = H(X_t) = \infty$, because the absolute entropy of a Gaussian random variable is infinite.
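As a quick numeric check of the reconstructed expression $I(\Delta) = -\tfrac{1}{2}\log_2(1 - a^{2\Delta})$ for (8) (the function name and parameter values below are ours, not the paper's), the mutual information is indeed positive and decreasing in the age:

```python
import math

# Sketch: mutual information of the stationary Gauss-Markov source as a
# function of the age, using the reconstructed expression
# I(delta) = -(1/2) * log2(1 - a^(2*delta)), 0 < |a| < 1 (our reading of (8)).

def gauss_markov_mi(a, delta):
    if delta == 0:
        return float('inf')  # H(X_t) is infinite for a continuous random variable
    return -0.5 * math.log2(1.0 - a ** (2 * delta))

mi = [gauss_markov_mi(0.9, d) for d in range(1, 6)]
assert all(x > 0 for x in mi)                                # positive
assert all(mi[i] > mi[i + 1] for i in range(len(mi) - 1))    # decreasing in age
```

Here $a = 0.9$ is an arbitrary illustrative coefficient; the monotonicity holds for any $0 < |a| < 1$ since $a^{2\Delta}$ shrinks as $\Delta$ grows.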
Binary Markov Source: Suppose that $X_t$ is a binary symmetric Markov process defined by

$X_t = X_{t-1} \oplus Z_t,$   (9)

where $\oplus$ denotes binary modulo-2 addition and the $Z_t$'s are i.i.d. Bernoulli random variables with mean $p \in (0, 0.5)$. One can show that

$I(X_t; W_t) = 1 - H_b(q_{\Delta(t)}),$   (10)

where $q_\Delta = [1 - (1 - 2p)^\Delta]/2$ and $H_b$ is the binary entropy function defined by $H_b(x) = -x \log_2 x - (1 - x) \log_2 (1 - x)$ with domain $[0, 1]$ [70, Eq. (2.5)]. Because $H_b(x)$ is increasing on $[0, 0.5]$, $I(X_t; W_t)$ is a nonnegative and decreasing function of the age $\Delta(t)$.
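Similarly, the reconstructed expression $I(\Delta) = 1 - H_b(q_\Delta)$ with $q_\Delta = [1-(1-2p)^\Delta]/2$ for (10) can be checked numerically (function names and the value $p = 0.1$ are our own illustration):

```python
import math

# Sketch: mutual information of the binary symmetric Markov source,
# using the reconstructed expression I(delta) = 1 - Hb(q_delta) with
# q_delta = (1 - (1 - 2p)^delta) / 2 (our reading of (10)).

def binary_entropy(x):
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def binary_markov_mi(p, delta):
    q = (1.0 - (1.0 - 2.0 * p) ** delta) / 2.0  # P(X_t != X_{t - delta})
    return 1.0 - binary_entropy(q)

mi = [binary_markov_mi(0.1, d) for d in range(1, 6)]
assert all(0.0 <= x <= 1.0 for x in mi)                      # bounded by H(X_t) = 1
assert all(mi[i] > mi[i + 1] for i in range(len(mi) - 1))    # decreasing in age
```

As $\Delta \to \infty$, $q_\Delta \to 1/2$ and the mutual information decays to $0$: a very stale sample tells us almost nothing about the current bit.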
Similarly, one can also use the conditional entropy $H(X_t \mid W_t)$ to represent the staleness of $W_t$. If the $S_i$'s are independent of $\{X_t\}$ and $X_t$ is a time-homogeneous Markov chain, then $H(X_t \mid W_t)$ is a nondecreasing function of the age $\Delta(t)$ [10, 11, 12]. In this result, the Markov chain $X_t$ is required to be time-homogeneous, but not necessarily stationary. If the sampling times are determined based on causal knowledge of $\{X_t\}$, $H(X_t \mid W_t)$ is no longer a function of the age.
III-C Formulation of Optimal Sampling Problems
Let $\pi = (S_1, S_2, \ldots)$ represent a sampling policy and $\Pi$ denote the set of causal sampling policies that satisfy the following two conditions: (i) Each sampling time $S_i$ is chosen based on the history and the current information of the idle/busy state of the channel. (ii) The inter-sampling times $\{T_i = S_{i+1} - S_i, i = 1, 2, \ldots\}$ form a regenerative process [72, Section 6.1]^{2}^{2}2We assume that $\{T_i\}$ is a regenerative process because we will optimize the lim sup time-average expected age penalty, but operationally a nicer objective function is the corresponding limit. These two criteria are equivalent if $\{T_i\}$ is a regenerative process, or more generally, if the age process has only one ergodic class. If no condition is imposed, however, they are different.: There exists an increasing sequence $0 \leq k_1 < k_2 < \cdots$ of almost surely finite random integers such that the post-$k_j$ process $\{T_{k_j + i}, i = 1, 2, \ldots\}$ has the same distribution as the post-$k_1$ process $\{T_{k_1 + i}, i = 1, 2, \ldots\}$ and is independent of the pre-$k_j$ process $\{T_i, i = 1, 2, \ldots, k_j\}$; in addition, the expectations $\mathbb{E}[k_{j+1} - k_j]$, $\mathbb{E}[S_{k_1}]$, and $\mathbb{E}[S_{k_{j+1}} - S_{k_j}]$ are finite.
We assume that the sampling times $S_i$ are independent of the source process $\{X_t\}$, and that the service times of the queue do not change with the sampling policy. We further assume that $\mathbb{E}[p(\Delta(t))]$ is finite for all finite $t$.
In this paper, we study the optimal sampling policy that minimizes (maximizes) the average age penalty (utility) subject to an average sampling rate constraint. In the continuous-time case, we will consider the following problem:

$\bar{p}_{\mathrm{opt}} = \inf_{\pi \in \Pi} \limsup_{T \to \infty} \frac{1}{T} \mathbb{E}\left[\int_0^T p(\Delta(t)) \, dt\right]$   (11)

s.t. $\liminf_{n \to \infty} \frac{1}{n} \mathbb{E}[S_n] \geq \frac{1}{f_{\max}},$   (12)

where $\bar{p}_{\mathrm{opt}}$ is the optimal value of (11) and $f_{\max}$ is the maximum allowed sampling rate. In the discrete-time case, we need to solve the following optimal sampling problem:

$\bar{q}_{\mathrm{opt}} = \inf_{\pi \in \Pi} \limsup_{T \to \infty} \frac{1}{T} \mathbb{E}\left[\sum_{t=0}^{T-1} p(\Delta(t))\right]$   (13)

s.t. $\liminf_{n \to \infty} \frac{1}{n} \mathbb{E}[S_n] \geq \frac{1}{f_{\max}},$   (14)

where $\bar{q}_{\mathrm{opt}}$ denotes the optimal value of (13). We assume that $\bar{p}_{\mathrm{opt}}$ and $\bar{q}_{\mathrm{opt}}$ are finite. The problems of maximizing the average age utility can be readily obtained from (11) and (13) by choosing $p(\Delta) = -u(\Delta)$. In practice, the cost of data updates increases with the average sampling rate. Therefore, Problems (11) and (13) represent a tradeoff between data staleness (freshness) and update cost.
Problems (11) and (13) are constrained MDPs, one with a continuous (uncountable) state space and the other with a countable state space. Because of the curse of dimensionality [73], it is quite rare that one can explicitly solve such problems and derive analytical or closed-form solutions that are arbitrarily accurate.
IV Main Results: Optimal Sampling Policies
In this section, we present a complete characterization of the solutions to (11) and (13). Specifically, the optimal sampling policies are either deterministic or randomized threshold policies, depending on the scenario under consideration. Efficient algorithms for computing the thresholds and the randomization probabilities are provided. The proofs are relegated to Section V.
IV-A Continuous-time Sampling without Rate Constraint
We first consider the continuous-time sampling problem (11). When there is no sampling rate constraint (i.e., $f_{\max} = \infty$), a solution to (11) is provided in the following theorem:
Theorem 1 (Continuous-time Sampling without Rate Constraint).
The optimal sampling policy in (15)-(16) has a nice structure. Specifically, the $(i+1)$th sample is generated at the earliest time $t$ satisfying two conditions: (i) the $i$th sample has already been delivered by time $t$, i.e., $t \geq D_i$, and (ii) the expected age penalty has grown to be no smaller than a predetermined threshold $\beta$. Notice that if the threshold is already reached at time $D_i$, then $S_{i+1}$ is the delivery time of the $i$th sample. In addition, $\beta$ is equal to the optimal objective value of (11). Hence, (15)-(16) requires that the expected age penalty upon the delivery of the next sample be no smaller than $\beta$, i.e., the minimum possible time-average expected age penalty.
Next, we develop an efficient algorithm to find the root of (16). Because the $Y_i$'s are i.i.d., the expectations in (16) are functions of $\beta$ and are independent of $i$. Given $\beta$, these expectations can be evaluated by Monte Carlo simulations or importance sampling. If an analytical expression of the age penalty function is available, (17)-(18) can be used to simplify the numerical evaluation of the expected integral in (16). As shown in Section V-F, (16) has a unique solution. We use a simple bisection method to solve (16), as illustrated in Algorithm 1.
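The bisection step can be sketched generically. Because the left-hand side of (16) is monotone in the threshold, a standard root bracketing suffices; the function `g` below is a hypothetical placeholder standing in for the actual expectation in (16), which depends on the penalty function and the service time distribution:

```python
# Sketch of the bisection search in Algorithm 1: find the unique root
# of a monotone function g(beta). Here g is a toy placeholder, NOT the
# actual left-hand side of (16).

def bisect_root(g, lo, hi, tol=1e-9):
    assert g(lo) < 0 < g(hi)  # root bracketed; g nondecreasing
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Toy example: g(beta) = beta^3 - 2 has the unique root 2^(1/3).
root = bisect_root(lambda b: b ** 3 - 2.0, 0.0, 2.0)
```

Each halving of the interval costs one evaluation of `g`, so reaching tolerance `tol` from an initial bracket of width $w$ takes about $\log_2(w/\mathrm{tol})$ evaluations; in the paper's setting each evaluation would be a Monte Carlo estimate of the expectations in (16).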
IV-A1 Optimality Condition of Zero-wait Sampling
When $f_{\max} = \infty$, one logical sampling policy is the zero-wait sampling policy [22, 14, 3], given by

$S_{i+1} = D_i = S_i + Y_i.$   (19)
This zero-wait sampling policy achieves the maximum throughput and the minimum queueing delay. In the special case of $p(\Delta) = \Delta$, Theorem 5 of [14] provided a sufficient and necessary condition characterizing the optimality of the zero-wait sampling policy. We now generalize that result to nonlinear age functions in the following corollary:
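For intuition, the zero-wait policy is easy to simulate. Under the reading $S_{i+1} = D_i$ with i.i.d. service times, the age rises from $Y_i$ to $Y_i + Y_{i+1}$ over the $i$th delivery interval, so the time-average age is $\mathbb{E}[Y] + \mathbb{E}[Y^2]/(2\mathbb{E}[Y])$, which equals $2$ for exponential service with unit mean. A short Monte Carlo sketch (ours, not from the paper) confirms this:

```python
import random

# Sketch: Monte Carlo estimate of the time-average (linear) age under
# zero-wait sampling S_{i+1} = D_i with i.i.d. Exp(1) service times.
# Renewal-reward gives E[Y] + E[Y^2] / (2 E[Y]) = 1 + 2/2 = 2.

def zero_wait_average_age(n, rng):
    area, total_time = 0.0, 0.0
    prev_y = rng.expovariate(1.0)  # service time of the previous sample
    for _ in range(n):
        y = rng.expovariate(1.0)
        # over an interval of length y, the age rises from prev_y to prev_y + y
        area += prev_y * y + 0.5 * y * y
        total_time += y
        prev_y = y
    return area / total_time

rng = random.Random(0)
est = zero_wait_average_age(200_000, rng)
```

The estimate lands near 2. The corollaries below show that, perhaps surprisingly, waiting before taking the next sample can beat this "send as fast as possible" baseline for many service time distributions.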
Corollary 1.
One can consider $\operatorname{ess\,inf} Y$ as the minimum possible value of the service time $Y$. It immediately follows from Corollary 1 that
Corollary 2.
The condition $\operatorname{ess\,inf} Y < \mathbb{E}[Y]$ is satisfied by many commonly used distributions, such as the exponential, geometric, Erlang, and hyperexponential distributions. According to Corollary 2(b), if $p(\cdot)$ is strictly increasing, the zero-wait sampling policy (19) is not optimal for these commonly used distributions.

IV-B Continuous-time Sampling with Rate Constraint
When the sampling rate constraint (12) is imposed, a solution to (11) is presented in the following theorem:
Theorem 2 (Continuous-time Sampling with Rate Constraint).
If $p(\cdot)$ is nondecreasing, $\mathbb{E}[p(\Delta(t))]$ is finite for all finite $t$, and the service times are i.i.d. with $\mathbb{E}[Y_i] < \infty$, then (15)-(16) is an optimal solution to (11), if
(21) 
Otherwise, an optimal solution to (11) is the randomized threshold policy
(22) 
where the two thresholds are given by
(23)  
(24) 
the threshold parameter in (23)-(24) is determined by solving
(25) 
and the randomization probability is given by^{3}^{3}3If the two thresholds in (23) and (24) are equal almost surely, then (22) becomes a deterministic threshold policy and the randomization probability can be any number within $[0, 1]$.
(26) 
According to Theorem 2, the solution to (11) consists of two cases: In Case 1, the deterministic threshold policy in Theorem 1 is an optimal solution to (11), provided that (21) is satisfied. In Case 2, the randomized threshold policy in (22)-(26) is an optimal solution to (11), which needs to satisfy
(27) 
We note that the only difference between (23) and (24) is that a non-strict inequality is used in (23) while a strict inequality is employed in (24).^{4}^{4}4Clearly, an important issue is the optimality of such a randomized threshold policy, which is proven in Section V. If there exists a time interval such that
(28) 
as shown in Fig. 4(a), then the two thresholds in (23) and (24) differ. In this case, neither of the two corresponding deterministic threshold policies may satisfy (27) on its own, but their randomized mixture in (22) can satisfy (27). In particular, if the threshold parameter and the randomization probability are given by (25) and (26), then (27) follows.
We provide a low-complexity algorithm to compute the randomized threshold policy in (22)-(26): As shown in Appendix C, there is a unique solution to (25). We use the bisection method in Algorithm 2 to solve (25). After that, the thresholds and the randomization probability can be computed by substituting the solution of (25) into (22)-(24) and (26). Because of the similarity between (23) and (24), the two thresholds are quite sensitive to the numerical error in the solution of (25). This issue can be resolved by replacing the two thresholds in (22) and (26) with the perturbed values determined by
(29)  
(30) 
respectively, where the perturbation is the tolerance in Algorithm 2. One can improve the accuracy of this solution by (i) reducing the tolerance and (ii) computing the expectations more accurately by increasing the number of Monte Carlo realizations or using advanced techniques such as importance sampling.
As depicted in Fig. 4(b)-(c), if the function in (28) is strictly increasing on the relevant interval, then the two thresholds in (23)-(24) coincide almost surely and (22) reduces to a deterministic threshold policy. In this case, Theorem 2 can be greatly simplified, as stated in the following corollary:
Corollary 3.
If $p(\cdot)$ is strictly increasing or the distribution of $Y$ is sufficiently smooth, the function in the extra condition of Corollary 3 is strictly increasing. Hence, the extra condition in Corollary 3 is satisfied for a broad class of age penalty functions and service time distributions.
A restrictive case of problem (11) was studied in [14], where $p(\cdot)$ was assumed to be positive and nondecreasing. There is an error in Theorem 3 of [14], because the strict-monotonicity condition of Corollary 3 is missing. Further, the solution in Theorem 3 of [14] is more complicated than that in Corollary 3. A special case of Corollary 3 with $p(\Delta) = \Delta$ was derived in Theorem 4 of [14].
IV-C Discrete-time Sampling
We now move on to the discrete-time sampling problem (13). When there is no sampling rate constraint (i.e., $f_{\max} = \infty$), the solution to (13) is provided in the following theorem:
Theorem 3 (Discrete-time Sampling without Rate Constraint).
Theorem 3 is quite similar to Theorem 1, with two minor differences: (i) the sampling time in (15) is a real number, whereas it is restricted to an integer in (33); (ii) the integral in (16) becomes a summation in (34).
In the discrete-time case, the optimality of the zero-wait sampling policy is characterized as follows.
Corollary 4.
When the sampling rate constraint (14) is imposed, the solution to (13) is provided in the following theorem.
Theorem 4 (Discrete-time Sampling with Rate Constraint).
Theorem 4 is similar to Theorem 2, but there are two differences: (i) The thresholds in (23)-(24) are real numbers, whereas they are restricted to integers in (38)-(39). (ii) In the continuous-time case, if the relevant function is strictly increasing, the two thresholds in (23)-(24) coincide almost surely and Theorem 2 can be greatly simplified. However, in the discrete-time case, even if the function is strictly increasing, the two integer thresholds in (38)-(39) may still differ. In fact, it is rather common that they differ at the optimum, for the following reason:
If the two thresholds coincided almost surely, then (37) would become a deterministic threshold policy that needs to ensure (27). However, because the thresholds in (38)-(39) are integers, such a deterministic threshold policy is difficult to reconcile with (27) for all possible values of $f_{\max}$. On the other hand, if the two thresholds differ, the randomized threshold policy in (37)-(41) can satisfy (27). Hence, even when the relevant function is strictly increasing, Theorem 4 cannot be further simplified. This is a key difference between continuous-time and discrete-time sampling.
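To see why randomization is needed in the discrete-time case, consider the following simplified illustration (ours, not the paper's model): a sampler with integer waiting times can only achieve integer average inter-sampling times deterministically, but mixing two adjacent integers hits any non-integer target average exactly:

```python
import math

# Illustration: with integer waiting times, mixing the two adjacent
# integers floor(T) and floor(T)+1 with probability mu achieves any
# target average inter-sampling time T >= 1 exactly.

def mixing_probability(target):
    lo = math.floor(target)
    # choose P(wait = lo) = mu so that mu*lo + (1 - mu)*(lo + 1) = target
    mu = lo + 1 - target
    return lo, mu

lo, mu = mixing_probability(3.7)   # target average of 3.7 slots
# wait 3 slots with probability 0.3, and 4 slots with probability 0.7
assert abs(mu * lo + (1 - mu) * (lo + 1) - 3.7) < 1e-12
```

This mirrors the role of the randomization probability in (37)-(41): the rate constraint (27) generally demands a non-integer average, which no single integer threshold can deliver.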
The computation algorithms for the optimal discrete-time sampling policies are similar to their counterparts in the continuous-time case, and hence are omitted here.
IV-D An Example: Mutual Information Maximization
Next, we provide an example to illustrate the above theoretical results. Suppose that $X_t$ is a stationary, time-homogeneous Markov chain and the sampling times $S_i$ are independent of $\{X_t\}$. The optimal sampling problem that maximizes the time-average expected mutual information between $X_t$ and $W_t$ is formulated as
(42) 
where the optimal value of (42) is assumed to be finite. Problem (42) is a special case of (13), obtained by choosing the age penalty to be the negative mutual information, $p(\Delta(t)) = -I(X_t; W_t)$, and removing the rate constraint (i.e., $f_{\max} = \infty$). The following result follows immediately from Theorem 3.