Information usually has the greatest value when it is fresh [1, p. 56]. For example, real-time knowledge about the location, orientation, and speed of motor vehicles is imperative in autonomous driving, and access to timely updates about stock prices and interest-rate movements is essential for developing trading strategies on the stock market. In [2, 3], the concept of Age of Information was introduced to measure the freshness of the information that a receiver has about the status of a remote source. Consider a sequence of source samples that are sent through a queue to a receiver. Let U(t) be the generation time of the newest sample that has been delivered to the receiver by time t. The age of information, as a function of t, is defined as Δ(t) = t − U(t), which is the time elapsed since the newest delivered sample was generated. Hence, a small age indicates that there exists a recently generated sample at the receiver.
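The definition above can be made concrete with a short sketch (our own illustration, not code from the paper): given the generation and delivery times of the samples, the age at any time t is t minus the generation time of the newest delivered sample.

```python
import bisect

def age_of_information(t, gen_times, delivery_times):
    """Age Delta(t) = t - U(t), where U(t) is the generation time of the
    newest sample delivered by time t. Returns None if nothing has been
    delivered yet. delivery_times is assumed sorted (FIFO delivery)."""
    i = bisect.bisect_right(delivery_times, t) - 1  # last sample delivered by t
    if i < 0:
        return None
    u = max(gen_times[:i + 1])  # U(t): newest generation time among delivered samples
    return t - u

# Samples generated at times 0, 2, 5 and delivered at times 1, 4, 6.
gen = [0, 2, 5]
dlv = [1, 4, 6]
print(age_of_information(3.0, gen, dlv))  # newest delivered sample was generated at 0 -> 3.0
print(age_of_information(4.5, gen, dlv))  # newest delivered sample was generated at 2 -> 2.5
```

Between deliveries the age grows linearly with t, and it drops at each delivery, producing the familiar sawtooth curve.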
In practice, some information sources (e.g., vehicle location, stock price) vary quickly over time, while others (e.g., temperature, interest rate) change slowly. Consider again the example of autonomous driving: the location information of motor vehicles collected 0.5 seconds ago could already be quite stale for making control decisions (a car travels about 15 meters in 0.5 seconds at a speed of 70 mph), but the engine temperature measured a few minutes ago is still valid for engine health monitoring. From this example, one can observe that data freshness should be evaluated based on (i) the time-varying pattern of the source and (ii) how valuable the fresh data is in the usage context. Both of these features can be characterized by non-linear functions p(Δ) of the age Δ, where p(Δ) could be the utility value of data with age Δ, the temporal autocorrelation function of the source, the estimation error of the signal value, or other application-specific performance metrics [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]. A survey of non-linear age functions and their applications is provided in Section III-B. Recently, the age of information has received significant attention because of the rapid deployment of real-time applications. A large portion of existing studies on age has been devoted to the linear age function p(Δ) = Δ, e.g., [3, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44]. However, the design of data update policies for optimizing non-linear age metrics remains largely unexplored.
In this paper, we study the problem of sampling an information source, where the samples are forwarded to a remote receiver through a channel that is modeled as a FIFO queue. We obtain the optimal sampler design for optimizing non-linear age metrics subject to a sampling rate constraint. The contributions of this paper are summarized as follows:
We consider a class of data freshness metrics, where the utility for data freshness is represented by a non-increasing function u(Δ) of the age Δ. Accordingly, the penalty for data staleness is denoted by a non-decreasing function p(Δ) of Δ. The sampler design problem for optimizing these data freshness metrics is formulated as a constrained Markov decision process (MDP) with a possibly uncountable state space.
We prove that an optimal sampling solution to this MDP is a deterministic or randomized threshold policy, where the threshold is equal to the optimal objective value of the MDP plus the optimal Lagrangian dual variable associated with the sampling rate constraint; see Section V-E for the details. The threshold can be computed by bisection search, and the randomization probabilities are chosen to satisfy the sampling rate constraint. The curse of dimensionality is circumvented in this sampling solution by exploiting the structure of the MDP. This age optimality result holds for (i) general monotonic age metrics, (ii) general service time distributions of the queueing server, and (iii) both continuous-time and discrete-time sampling. Among the technical tools used to prove these results are an extension of Dinkelbach's method for MDPs and a geometric multiplier technique for establishing strong duality. These technical tools were recently used in [45, 46], where a quite different sampling problem was solved.
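The Dinkelbach-style idea behind the threshold computation can be seen on a toy fractional program (a sketch of our own, with made-up numbers; the paper applies the idea to an MDP, which is substantially more involved): minimizing a ratio E[numerator]/E[denominator] over a set of policies is equivalent to finding the root β* of h(β) = min over policies of (numerator − β · denominator), and β* equals the optimal ratio.

```python
def dinkelbach(policies, tol=1e-9):
    """Each policy is a pair (num, den) with den > 0. The function
    h(beta) = min_i (num_i - beta * den_i) is decreasing in beta and
    crosses zero exactly at the minimum ratio, so bisection finds it."""
    lo, hi = 0.0, max(n / d for n, d in policies)
    while hi - lo > tol:
        beta = (lo + hi) / 2
        h = min(n - beta * d for n, d in policies)
        if h > 0:        # beta is below the optimal ratio
            lo = beta
        else:            # beta is at or above the optimal ratio
            hi = beta
    return (lo + hi) / 2

ratios = [(3.0, 2.0), (5.0, 4.0), (2.0, 1.0)]  # ratios 1.5, 1.25, 2.0
print(round(dinkelbach(ratios), 6))  # 1.25
```

The same root-characterization is what lets the threshold in the paper be computed by bisection rather than by dynamic programming over a large state space.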
When there is no sampling rate constraint, a logical sampling policy is the zero-wait sampling policy [3, 22, 14], which is throughput-optimal and delay-optimal, but not necessarily age-optimal. We develop sufficient and necessary conditions for characterizing the optimality of the zero-wait sampling policy for general monotonic age metrics. Our numerical results show that the optimal sampling policies can be much better than zero-wait sampling and the classic uniform sampling.
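For the linear age p(Δ) = Δ, the time-average age of zero-wait sampling has a well-known closed form E[Y] + E[Y²]/(2E[Y]), where Y is the service time. A small simulation (our own sketch, not code from the paper, using exponential service times) matches it:

```python
import random

def zero_wait_avg_age(service_times):
    """Time-average linear age under zero-wait sampling: each sample is taken
    the instant the previous one is delivered, so the age resets to Y_i at
    each delivery and grows linearly in between."""
    area, elapsed = 0.0, 0.0
    prev = service_times[0]          # age right after the first delivery
    for y in service_times[1:]:
        # over the next inter-delivery interval of length y,
        # the age grows linearly from prev to prev + y
        area += prev * y + y * y / 2
        elapsed += y
        prev = y                     # age resets to the new sample's service time
    return area / elapsed

random.seed(1)
ys = [random.expovariate(1.0) for _ in range(200_000)]
sim = zero_wait_avg_age(ys)
# For Exp(1) service: E[Y] = 1, E[Y^2] = 2, so the formula gives 1 + 2/2 = 2.
print(round(sim, 2))
```

The optimal policies of this paper can outperform this value by deliberately waiting before sampling when the age penalty makes waiting worthwhile.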
The rest of this paper is organized as follows. In Section II, we discuss related work. In Section III, we describe the system model and the formulation of the optimal sampling problem; a short survey of non-linear age functions is also provided. In Section IV, we present the optimal sampling policies for different system settings, as well as a sufficient and necessary condition for the optimality of the zero-wait sampling policy. The proofs are provided in Section V. The numerical results and the conclusion are presented in Sections VI and VII, respectively. This paper is an extended version of .
II Related Work
The age of information was used as a data freshness metric as early as the 1990s in studies of real-time databases [2, 47, 48, 49]. Queueing-theoretic techniques were introduced to evaluate the age of information in . The average age, average peak age, and age distribution have been analyzed for various queueing systems in, e.g., [3, 18, 50, 19, 20, 51, 52, 15, 53]. It was observed that a Last-Come, First-Served (LCFS) scheduling policy can achieve a smaller time-average age than several other scheduling policies. The optimality of the LCFS policy, or more generally the Last-Generated, First-Served (LGFS) policy, was first proven in . This age optimality result holds for several queueing systems with multiple servers, multiple hops, and/or multiple sources [54, 55, 56, 57, 58].
When the transmission power of the source is subject to an energy-harvesting constraint, the age of information was minimized in, e.g., [21, 22, 14, 23, 24, 25, 26, 27, 28]. Source coding and channel coding schemes for reducing the age were developed in, e.g., [29, 30, 31, 32]. Age-optimal transmission scheduling of wireless networks has been investigated in, e.g., [33, 34, 35, 36, 59, 37, 60, 38, 39, 40]. Game-theoretic perspectives on the age were studied in [61, 62, 41, 42]. The aging effect of channel state information was analyzed in, e.g., [63, 64, 65]. An interesting connection between the age of information and remote estimation was revealed in [45, 46], where a challenging sampling problem of Wiener processes was solved analytically. An extension of [45, 46] was conducted recently in . The impact of the age on control systems was studied in [66, 43]. Emulations and measurements of the age were conducted in [67, 68, 44]. An age-based transport protocol was developed in .
The most relevant studies to this paper are [14, 31, 40]. This paper is an extension of : the data freshness metrics considered in this paper are more general than those of ; the optimal sampling policies developed in this paper are simpler and more insightful than those in ; and continuous-time sampling was addressed in , which is generalized to discrete-time sampling in this paper. In [31, 40], optimal sampling policies were developed to minimize the time-average age for status updates over wireless channels, where the optimal sampling policies were shown to be randomized threshold policies. Structural properties of the randomized threshold policies were obtained in [31, 40] to simplify the value iteration or policy iteration algorithms therein. The linear age function considered in [31, 40] is a special case of the monotonic age functions considered in this paper, and the channel models in [31, 40] are different from ours. In our study, the optimal sampling policies are characterized analytically and can be computed by bisection search. In a special case of , a closed-form optimal sampling solution was obtained. However, it is unclear whether analytical or closed-form solutions can be found for the general cases considered in [31, 40].
III Model, Metrics, and Formulation
III-A System Model
We consider the data update system illustrated in Fig. 1, where samples of a source process are taken and sent to a receiver through a communication channel. The channel is modeled as a single-server FIFO queue with i.i.d. service times. The system starts to operate at time t = 0. The -th sample is generated at time and is delivered to the receiver at time with a service time , which satisfy , , , and for all . Each sample packet contains the sampling time and the sample value. Once a sample is delivered, the receiver sends an acknowledgement (ACK) back to the sampler with zero delay. Hence, the sampler has access to the idle/busy state of the server in real time.
which is plotted in Fig. 2. Because , the age can also be written as
The initial state of the system is assumed to be , , and is a finite constant.
In this paper, we consider both continuous-time and discrete-time status-update systems. In the continuous-time setting, can take any positive value. In the discrete-time setting, is a multiple of a fixed period; as a result, are all discrete-time variables. For notational simplicity, we choose the period to be one second, so that all the discrete-time variables are integers. The results for other period values can be readily obtained by time scaling.
In practice, the continuous-time setting can be used to model status-update systems with a high clock rate, while the discrete-time setting is appropriate for characterizing sensors that have a very low energy budget and can only wake up periodically from a low-power sleep mode.
III-B Data Staleness and Freshness Metrics: A Survey
The dissatisfaction with data staleness (or the eagerness for data refreshing) is represented by a penalty function p(Δ) of the age Δ, where p is non-decreasing. This non-decreasing requirement on p complies with the observation that stale data is usually less desirable than fresh data [1, 4, 5, 6, 7, 8, 9]. This data staleness model is quite general, as it allows p to be non-convex or discontinuous. These data staleness metrics are clearly more general than those in [13, 14], where p was restricted to be non-negative and non-decreasing.
Similarly, data freshness can be characterized by a non-increasing utility function u(Δ) of the age Δ [5, 7]. One simple choice is u(Δ) = −p(Δ). Note that because the age Δ(t) is a function of time t, the penalty and utility are both time-varying, as illustrated in Fig. 3. In practice, one can choose p and u based on the information source and the application under consideration, as illustrated in the following examples.
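A few representative shapes of non-decreasing penalty functions can be written down directly (example functions of our own choosing, not prescribed by the paper): a power penalty, an exponential penalty for impatient applications, and a step penalty encoding a hard freshness deadline, with the utility taken as the negated penalty.

```python
import math

# Three example age-penalty functions; each is non-decreasing in the age delta >= 0.
power_penalty = lambda delta: delta ** 2                   # convex, grows fast
exp_penalty   = lambda delta: math.exp(0.5 * delta) - 1.0  # impatient applications
step_penalty  = lambda delta: 0.0 if delta < 3.0 else 1.0  # hard freshness deadline

# The corresponding utility is simply the negated penalty.
utility = lambda delta: -power_penalty(delta)

ages = [0.0, 1.0, 2.0, 4.0]
for p in (power_penalty, exp_penalty, step_penalty):
    vals = [p(a) for a in ages]
    assert vals == sorted(vals)  # non-decreasing in the age
```

Note that the step penalty is discontinuous and the negated utilities need not be non-negative, which is exactly the generality the model above allows.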
III-B1 Auto-correlation Function of the Source
The auto-correlation function can be used to evaluate the freshness of the sample . For some stationary sources, is a non-increasing function of the age , which can be considered as an age utility function . For example, in stationary ergodic Gauss-Markov block fading channels, the impact of channel aging can be characterized by the auto-correlation function of fading channel coefficients. When the age is small, the auto-correlation function and the data rate both decay with respect to the age .
III-B2 Estimation Error of Real-time Source Value
Consider a status-update system, where samples of a Markov source are forwarded to a remote estimator. The estimator uses causally received samples to reconstruct an estimate of real-time source value. If the sampling times are independent of the observed source , the estimation error at time can be expressed as an age penalty function [20, 45, 46, 17]. If the sampling times are chosen based on causal knowledge about the source, the estimation error is no longer a function of the age [45, 46, 17].
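A standard instance of this (a known fact about the Wiener process, stated here for illustration; the notation U(t) denotes the generation time of the newest delivered sample and Δ(t) = t − U(t) the age) is the following: if the source X_t is a Wiener process and the sampling times are independent of it, the minimum mean-square estimate at the receiver is the latest delivered sample value, and by the independent-increment property

```latex
\hat{X}_t = X_{U(t)}, \qquad
\mathbb{E}\!\left[(X_t - \hat{X}_t)^2\right]
  = \mathbb{E}\!\left[\,t - U(t)\,\right]
  = \mathbb{E}\!\left[\Delta(t)\right],
```

so the estimation error is exactly the expected linear age, i.e., the age penalty function is p(Δ) = Δ in this special case.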
III-B3 Information-based Data Freshness Metric
Let denote the samples that have been delivered to the receiver by time . One can use the mutual information — the amount of information that the received samples carry about the current source value — to evaluate the freshness of the samples. If the mutual information is close to the entropy of the current source value, the samples contain a lot of information about it and are considered fresh; if the mutual information is almost zero, the samples provide little information and are deemed obsolete.
One way to interpret is to consider how helpful the received samples are for inferring . By using the Shannon code lengths [70, Section 5.4], the expected minimum number of bits required to specify satisfies
where can be interpreted as the expected minimum number of binary tests that are needed to infer . On the other hand, with the knowledge of , the expected minimum number of bits that are required to specify satisfies
If the current source value is a random vector consisting of a large number of symbols (e.g., an image containing many pixels or the coefficients of MIMO-OFDM channels), the one bit of overhead in (4) and (5) is insignificant. Hence, the mutual information is approximately the reduction in the description cost for inferring the source value without and with the knowledge of the received samples.
If the source is a stationary, time-homogeneous Markov chain, then by the data processing inequality [70, Theorem 2.8.1], it is easy to prove the following lemma:
If is a stationary, time-homogeneous Markov chain, is defined in (3), and the sampling times are independent of , then the mutual information
is a non-negative and non-increasing function of .
See Appendix A. ∎
Lemma 1 provides an intuitive interpretation of “information aging”: the amount of information that the delivered samples preserve for inferring the current source value decreases as the age grows. When the sampling times are independent of the source, the sampling times of delivered samples do not carry any information about the current source value. One interesting future research direction is how to exploit the timing information in the sampling times to improve data freshness.
Next, we provide the closed-form expression of the mutual information for two Markov sources:
Gauss-Markov Source: Suppose that is a first-order discrete-time Gauss-Markov process, defined by
where and the ’s are zero-mean i.i.d. random variables. Because is a Gauss-Markov process, one can show that
Since and the age is an integer, the mutual information is a positive and decreasing function of the age . Note that if the age is zero, then the mutual information is infinite, because the absolute entropy of a Gaussian random variable is infinite.
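A quick numerical check (our own sketch, assuming a stationary AR(1) process with coefficient a in (0, 1)): the lag-Δ correlation of the process is a^Δ, and the mutual information between two jointly Gaussian variables with correlation ρ is −½ log(1 − ρ²), which is positive and strictly decreasing in the age.

```python
import math

def gauss_markov_mi(a, delta):
    """I(X_t; X_{t-delta}) in nats for a stationary AR(1) (Gauss-Markov)
    process with coefficient a: the lag-delta correlation is a**delta,
    so MI = -0.5 * log(1 - a**(2*delta))."""
    rho_sq = a ** (2 * delta)
    return -0.5 * math.log(1.0 - rho_sq)

mis = [gauss_markov_mi(0.8, d) for d in range(1, 6)]
assert all(m > 0 for m in mis)                   # positive for every finite age
assert all(x > y for x, y in zip(mis, mis[1:]))  # strictly decreasing in the age
```

At age zero the formula diverges (log of zero), matching the remark that the mutual information of a continuous random variable with itself is infinite.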
Binary Markov Source: Suppose that is a binary symmetric Markov process defined by
where denotes binary modulo-2 addition and the ’s are i.i.d. Bernoulli random variables with mean . One can show that
where and is the binary entropy function defined by with domain [70, Eq. (2.5)]. Because the binary entropy function is increasing on , the mutual information is a non-negative and decreasing function of the age .
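This can also be checked numerically (our own sketch, assuming the uniform stationary distribution of the binary symmetric Markov chain): the Δ-step flip probability is q = (1 − (1 − 2p)^Δ)/2, which increases toward 1/2 with the age, so 1 − H(q) decreases toward zero.

```python
import math

def binary_entropy(q):
    """Binary entropy H(q) in bits."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def binary_markov_mi(p, delta):
    """I(X_t; X_{t-delta}) in bits for a binary symmetric Markov chain with
    crossover probability p: the delta-step flip probability is
    q = (1 - (1 - 2p)**delta) / 2, and MI = 1 - H(q) under the uniform
    stationary distribution."""
    q = (1.0 - (1.0 - 2.0 * p) ** delta) / 2.0
    return 1.0 - binary_entropy(q)

mis = [binary_markov_mi(0.1, d) for d in range(1, 8)]
assert all(m >= 0 for m in mis)
assert all(x > y for x, y in zip(mis, mis[1:]))  # decreasing in the age
```

At age one the value reduces to 1 − H(p), the per-step mutual information of the chain.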
Similarly, one can also use the conditional entropy to represent the staleness of the received samples. If the sampling times are independent of the source and the source is a time-homogeneous Markov chain, the conditional entropy is a non-decreasing function of the age [10, 11, 12]. In this result, the Markov chain is required to be time-homogeneous, but not necessarily stationary. If the sampling times are determined based on causal knowledge of the source, the conditional entropy is no longer a function of the age.
III-C Formulation of Optimal Sampling Problems
Let represent a sampling policy and denote the set of causal sampling policies that satisfy the following two conditions: (i) each sampling time is chosen based on the history and the current information about the idle/busy state of the channel; (ii) the inter-sampling times form a regenerative process [72, Section 6.1]. (We assume that is a regenerative process because we will optimize , but operationally a nicer objective function is . These two criteria are equivalent if is a regenerative process, or more generally, if has only one ergodic class. If no condition is imposed, however, they are different.) That is, there exists an increasing sequence of almost surely finite random integers such that the post- process has the same distribution as the post- process and is independent of the pre- process ; in addition, , , and
We assume that the sampling times are independent of the source process , and the service times of the queue do not change according to the sampling policy. We further assume that for all finite .
In this paper, we study the optimal sampling policy that minimizes (maximizes) the average age penalty (utility) subject to an average sampling rate constraint. In the continuous-time case, we will consider the following problem:
where is the optimal value of (11) and is the maximum allowed sampling rate. In the discrete-time case, we need to solve the following optimal sampling problem:
where is the optimal value of (13). We assume that and are finite. The problems for maximizing the average age utility can be readily obtained from (11) and (13) by choosing p(Δ) = −u(Δ). In practice, the cost of data updates increases with the average sampling rate. Therefore, Problems (11) and (13) represent a tradeoff between data staleness (freshness) and update cost.
Problems (11) and (13) are constrained MDPs, one with a continuous (uncountable) state space and the other with a countable state space. Because of the curse of dimensionality, it is quite rare that one can explicitly solve such problems and derive analytical or closed-form solutions that are arbitrarily accurate.
IV Main Results: Optimal Sampling Policies
In this section, we present a complete characterization of the solutions to (11) and (13). Specifically, the optimal sampling policies are either deterministic or randomized threshold policies, depending on the scenario under consideration. Efficient algorithms for computing the thresholds and the randomization probabilities are provided. The proofs are relegated to Section V.
IV-A Continuous-time Sampling without Rate Constraint
Theorem 1 (Continuous-time Sampling without Rate Constraint).
The optimal sampling policy in (15)-(16) has a nice structure. Specifically, each sample is generated at the earliest time satisfying two conditions: (i) the previous sample has already been delivered, and (ii) the expected age penalty has grown to be no smaller than a pre-determined threshold. Notice that if the threshold condition is already satisfied upon delivery, then the sampling time is the delivery time of the previous sample. In addition, the threshold is equal to the optimal objective value of (11). Hence, (15)-(16) requires that the expected age penalty upon the delivery of each sample be no smaller than the minimum possible time-average expected age penalty.
Next, we develop an efficient algorithm to find the root of (16). Because the service times are i.i.d., the expectations in (16) are functions of the threshold and do not depend on the sample index. Given the threshold, these expectations can be evaluated by Monte Carlo simulations or importance sampling. Define
If the analytical expression of is available, then (18) can be used to simplify the numerical evaluation of the expected integral in (16). As shown in Section V-F, (16) has a unique solution. We use a simple bisection method to solve (16), which is illustrated in Algorithm 1.
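Algorithm 1 is a standard bisection search; a generic sketch (our own, with a placeholder function g standing in for the left-hand side of (16), assumed continuous and monotone with a sign change on the search interval) looks as follows:

```python
def bisection_root(g, lo, hi, tol=1e-9):
    """Find the unique root of a monotone continuous function g on [lo, hi],
    assuming g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = g(mid)
        if g_mid == 0:
            return mid
        # keep the half-interval on which the sign change occurs
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return (lo + hi) / 2

# Example: the root of g(beta) = beta**3 - 2 on [0, 2] is 2**(1/3).
root = bisection_root(lambda b: b**3 - 2, 0.0, 2.0)
print(round(root, 6))  # 1.259921
```

In Algorithm 1 the role of g is played by the Monte Carlo estimate of the expectations in (16), so each bisection step costs one batch of simulated service times.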
IV-A1 Optimality Condition of Zero-wait Sampling
This zero-wait sampling policy achieves the maximum throughput and the minimum queueing delay. In the special case of , Theorem 5 of  provided a sufficient and necessary condition for characterizing the optimality of the zero-wait sampling policy. We now generalize that result to non-linear age functions in the following corollary:
One can consider as the minimum possible value of . It immediately follows from Corollary 1 that
By condition 2(b), if the age penalty function is strictly increasing, the zero-wait sampling policy (19) is not optimal for these commonly used distributions.
IV-B Continuous-time Sampling with Rate Constraint
Theorem 2 (Continuous-time Sampling with Rate Constraint).
Otherwise, is an optimal solution to (11), where
and are given by
, , is determined by solving
and is given by (if almost surely, then becomes a deterministic threshold policy and can be any number within )
According to Theorem 2, the solution to (11) consists of two cases: In Case 1, the deterministic threshold policy in Theorem 1 is an optimal solution to (11), which needs to satisfy (21). In Case 2, the randomized threshold policy in (22)-(26) is an optimal solution to (11), which needs to satisfy
We note that the only difference between (23) and (24) is that “” is used in (23) while “” is employed in (24). (Clearly, an important issue is the optimality of such a randomized threshold policy, which is proven in Section V.) If there exists a time-interval such that
We provide a low-complexity algorithm to compute the randomized threshold policy in (22)-(26): As shown in Appendix C, there is a unique satisfying (25). We use the bisection method in Algorithm 2 to solve (25) and obtain . After that, and can be computed by substituting into (22)-(24) and (26). Because of the similarity between (23) and (24), and are quite sensitive to the numerical error in . This issue can be resolved by replacing in (22) and (26) with and replacing in (22) and (26) with , where and are determined by
respectively, and is the tolerance in Algorithm 2. One can improve the accuracy of this solution by (i) reducing the tolerance and (ii) computing the expectations more accurately by increasing the number of Monte Carlo realizations or using advanced techniques such as importance sampling.
As depicted in Fig. 4(b)-(c), if is strictly increasing on , then almost surely and (22) reduces to a deterministic threshold policy. In this case, Theorem 2 can be greatly simplified, as stated in the following corollary:
If is strictly increasing or the distribution of is sufficiently smooth, is strictly increasing in . Hence, the extra condition in Corollary 3 is satisfied for a broad class of age penalty functions and service time distributions.
A restrictive case of problem (11) was studied in , where was assumed to be positive and non-decreasing. There is an error in Theorem 3 of , because the condition “ is strictly increasing in ” is missing. Further, the solution in Theorem 3 of  is more complicated than that in Corollary 3. A special case of Corollary 3 with was derived in Theorem 4 of .
IV-C Discrete-time Sampling
Theorem 3 (Discrete-time Sampling without Rate Constraint).
Theorem 3 is quite similar to Theorem 1, with two minor differences: (i) The sampling time in (15) is a real number, which is restricted to an integer in (33). (ii) The integral in (16) becomes a summation in (34).
In the discrete-time case, the optimality of the zero-wait sampling policy is characterized as follows.
Theorem 4 (Discrete-time Sampling with Rate Constraint).
Theorem 4 is similar to Theorem 2, but there are two differences: (i) and are real numbers in (23)-(24), which are restricted to integers in (38)-(39). (ii) If is strictly increasing in , then holds almost surely in (23)-(24) and Theorem 2 can be greatly simplified. However, in the discrete-time case, even if is strictly increasing in , may still occur in (38)-(39). In fact, it is rather common that holds for the optimal , because of the following reason:
If almost surely, then (37) becomes a deterministic threshold policy that needs to ensure (27). However, because and are integers, it is difficult for such a deterministic threshold policy to satisfy (27) for all possible values of . On the other hand, if , the randomized threshold policy in (37)-(41) can satisfy (27). Hence, even if is strictly increasing in , Theorem 4 cannot be further simplified. This is a key difference between continuous-time and discrete-time sampling.
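The role of randomization can be seen in a toy calculation (our own sketch with made-up numbers; the two mean inter-sampling times and the linear mixing of their expectations are illustrative assumptions): if the two integer thresholds bracket the rate constraint, mixing them with a suitable probability makes the expected inter-sampling time hit the constraint exactly.

```python
def mixing_probability(mean_gap_low, mean_gap_high, max_rate):
    """Choose kappa in [0, 1] so that a policy using the lower integer
    threshold with probability kappa and the higher one with probability
    1 - kappa has expected inter-sampling time exactly 1/max_rate:
        kappa * mean_gap_low + (1 - kappa) * mean_gap_high = 1 / max_rate.
    mean_gap_* are the mean inter-sampling times of the two deterministic
    threshold policies (mean_gap_low <= 1/max_rate <= mean_gap_high)."""
    target_gap = 1.0 / max_rate
    kappa = (mean_gap_high - target_gap) / (mean_gap_high - mean_gap_low)
    assert 0.0 <= kappa <= 1.0
    return kappa

# Two adjacent integer thresholds give mean gaps 2.0 and 3.0 seconds; the
# constraint allows at most 0.4 samples per second, i.e., a mean gap of 2.5.
k = mixing_probability(2.0, 3.0, 0.4)
print(k)  # 0.5
```

With continuous thresholds no mixing is needed, since a single real-valued threshold can already meet the constraint with equality; the integer restriction is what forces the randomization.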
The computation algorithms of the optimal discrete-time sampling policies are similar to their counterparts in the continuous-time case, and hence are omitted here.
IV-D An Example: Mutual Information Maximization
Next, we provide an example to illustrate the above theoretical results. Suppose that is a stationary, time-homogeneous Markov chain and the sampling times are independent of . The optimal sampling problem that maximizes the time-average expected mutual information between and is formulated as
If the service times are i.i.d. with , then is an optimal solution to (13), where