Sampling for Data Freshness Optimization: Non-linear Age Functions

12/18/2018 ∙ by Yin Sun, et al.

In this paper, we study how to take samples from an information source, where the samples are forwarded to a remote receiver through a queue. The optimal sampling problem for maximizing the freshness of received samples is formulated as a constrained Markov decision process (MDP) with a possibly uncountable state space. We present a complete characterization of the optimal solution to this MDP: The optimal sampling policy is a deterministic or randomized threshold policy, where the threshold and the randomization probabilities are determined by the optimal objective value of the MDP and the sampling rate constraint. The optimal sampling policy can be computed by bisection search, and the curse of dimensionality is circumvented. This solution is optimal under quite general conditions, including (i) general data freshness metrics represented by monotonic functions of the age of information, (ii) general service time distributions of the queueing server, and (iii) both continuous-time and discrete-time sampling problems. Numerical results suggest that the optimal sampling policies can be much better than zero-wait sampling and the classic uniform sampling.


I Introduction

Information usually has the greatest value when it is fresh [1, p. 56]. For example, real-time knowledge about the location, orientation, and speed of motor vehicles is imperative in autonomous driving, and access to timely updates about stock prices and interest-rate movements is essential for developing trading strategies on the stock market. In [2, 3], the concept of Age of Information was introduced to measure the freshness of the information that a receiver has about the status of a remote source. Consider a sequence of source samples that are sent through a queue to a receiver. Let U(t) be the generation time of the newest sample that has been delivered to the receiver by time t. The age of information, as a function of t, is defined as Δ(t) = t - U(t), which is the time elapsed since the newest sample was generated. Hence, a small age indicates that there exists a recently generated sample at the receiver.
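For concreteness, the age Δ(t) = t - U(t) can be computed directly from recorded generation and delivery times; the minimal sketch below uses our own (not the paper's) function and variable names:

```python
import bisect

def age_of_information(t, gen_times, delivery_times):
    """Age Delta(t) = t - U(t), where U(t) is the generation time of the
    newest sample delivered to the receiver by time t.
    gen_times[i] and delivery_times[i] are the generation and delivery
    times of sample i; both lists are sorted (FIFO queue)."""
    # Index of the last sample delivered by time t.
    i = bisect.bisect_right(delivery_times, t) - 1
    if i < 0:
        raise ValueError("no sample delivered by time t")
    return t - gen_times[i]

# Samples generated at 0, 2, 5 and delivered at 1, 3, 7.
# At t = 4 the newest delivered sample was generated at time 2.
print(age_of_information(4.0, [0.0, 2.0, 5.0], [1.0, 3.0, 7.0]))  # 2.0
```

Note the sawtooth behavior this induces: the age grows linearly between deliveries and drops at each delivery, as in Fig. 2 below.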

In practice, some information sources (e.g., vehicle location, stock price) vary quickly over time, while others (e.g., temperature, interest rate) change slowly. Consider again the example of autonomous driving: The location information of motor vehicles collected 0.5 seconds ago could already be quite stale for making control decisions (a car travels 15 meters in 0.5 seconds at a speed of 70 mph), but the engine temperature measured a few minutes ago is still valid for engine health monitoring. From this example, one can observe that data freshness should be evaluated based on (i) the time-varying pattern of the source and (ii) how valuable the fresh data is in the usage context. Both of these features can be characterized by non-linear functions f(Δ) of the age Δ, where f(Δ) could be the utility value of data with age Δ, the temporal autocorrelation function of the source, the estimation error of the signal value, or other application-specific performance metrics [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]. A survey of non-linear age functions and their applications is provided in Section III-B. Recently, the age of information has received significant attention because of the rapid deployment of real-time applications. A large portion of existing studies on age have been devoted to the linear age function f(Δ) = Δ, e.g., [3, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44]. However, the design of data update policies for optimizing non-linear age metrics remains largely unexplored.

In this paper, we study a problem of sampling an information source, where the samples are forwarded to a remote receiver through a channel that is modeled as a FIFO queue. We obtain the optimal sampler design for optimizing non-linear age metrics subject to a sampling rate constraint. The contributions of this paper are summarized as follows:

  • We consider a class of data freshness metrics, where the utility for data freshness is represented by a non-increasing function u(Δ) of the age Δ. Accordingly, the penalty for data staleness is denoted by a non-decreasing function p(Δ) of Δ. The sampler design problem for optimizing these data freshness metrics is formulated as a constrained Markov decision process (MDP) with a possibly uncountable state space.

  • We prove that an optimal sampling solution to this MDP is a deterministic or randomized threshold policy, where the threshold is equal to the optimum objective value of the MDP plus the optimal Lagrangian dual variable associated with the sampling rate constraint; see Section V-E for the details. The threshold can be computed by bisection search, and the randomization probabilities are chosen to satisfy the sampling rate constraint. The curse of dimensionality is circumvented in this sampling solution by exploiting the structure of the MDP. This age optimality result holds for (i) general monotonic age metrics, (ii) general service time distributions of the queueing server, and (iii) both continuous-time and discrete-time sampling. Among the technical tools used to prove these results are an extension of Dinkelbach’s method for MDPs, and a geometric multiplier technique for establishing strong duality. These technical tools were recently used in [45, 46], where a quite different sampling problem was solved.

  • When there is no sampling rate constraint, a logical sampling policy is the zero-wait sampling policy [3, 22, 14], which is throughput-optimal and delay-optimal, but not necessarily age-optimal. We develop sufficient and necessary conditions for characterizing the optimality of the zero-wait sampling policy for general monotonic age metrics. Our numerical results show that the optimal sampling policies can be much better than zero-wait sampling and the classic uniform sampling.

The rest of this paper is organized as follows. In Section II, we discuss some related work. In Section III, we describe the system model and the formulation of the optimal sampling problem; a short survey of non-linear age functions is also provided. In Section IV, we present the optimal sampling policy for different system settings, as well as a sufficient and necessary condition for the optimality of the zero-wait sampling policy. The proofs are provided in Section V. The numerical results and the conclusion are presented in Section VI and Section VII, respectively. This paper is an extended version of [16].

II Related Work

The age of information was used as a data freshness metric as early as the 1990s in studies of real-time databases [2, 47, 48, 49]. Queueing-theoretic techniques were introduced to evaluate the age of information in [3]. The average age, average peak age, and age distribution have been analyzed for various queueing systems in, e.g., [3, 18, 50, 19, 20, 51, 52, 15, 53]. It was observed that a Last-Come, First-Served (LCFS) scheduling policy can achieve a smaller time-average age than a few other scheduling policies. The optimality of the LCFS policy, or more generally the Last-Generated, First-Served (LGFS) policy, was first proven in [54]. This age-optimality result holds for several queueing systems with multiple servers, multiple hops, and/or multiple sources [54, 55, 56, 57, 58].

When the transmission power of the source is subject to an energy-harvesting constraint, the age of information was minimized in, e.g., [21, 22, 14, 23, 24, 25, 26, 27, 28]. Source coding and channel coding schemes for reducing the age were developed in, e.g., [29, 30, 31, 32]. Age-optimal transmission scheduling of wireless networks has been investigated in, e.g., [33, 34, 35, 36, 59, 37, 60, 38, 39, 40]. Game-theoretic perspectives on the age were studied in [61, 62, 41, 42]. The aging effect of channel state information was analyzed in, e.g., [63, 64, 65]. An interesting connection between the age of information and remote estimation was revealed in [45, 46], where a challenging sampling problem of Wiener processes was solved analytically. An extension of [45, 46] was conducted recently in [17]. The impact of the age on control systems was studied in [66, 43]. Emulations and measurements of the age were conducted in [67, 68, 44]. An age-based transport protocol was developed in [69].

The studies most relevant to this paper are [14, 31, 40]. This paper is an extension of [14]: The data freshness metrics considered in this paper are more general than those of [14]; the optimal sampling policies developed in this paper are simpler and more insightful than those in [14]; and continuous-time sampling was addressed in [14], which is generalized to discrete-time sampling in this paper. In [31, 40], optimal sampling policies were developed to minimize the time-average age for status updates over wireless channels, where the optimal sampling policies were shown to be randomized threshold policies. Structural properties of the randomized threshold policies were obtained in [31, 40] to simplify the value iteration or policy iteration algorithms therein. The linear age function considered in [31, 40] is a special case of the monotonic age functions considered in this paper, and the channel models in [31, 40] are different from ours. In our study, the optimal sampling policies are characterized analytically and can be computed by bisection search. In a special case of [31], a closed-form optimal sampling solution was obtained. However, it is unclear whether analytical or closed-form solutions can be found for the general cases considered in [31, 40].

III Model, Metrics, and Formulation

Fig. 1: System model.

III-A System Model

We consider the data update system illustrated in Fig. 1, where samples of a source process X_t are taken and sent to a receiver through a communication channel. The channel is modeled as a single-server FIFO queue with i.i.d. service times. The system starts to operate at time t = 0. The i-th sample is generated at time S_i and is delivered to the receiver at time D_i with a service time Y_i, which satisfy 0 ≤ S_i ≤ S_{i+1}, S_i ≤ D_i, and D_i = max{S_i, D_{i-1}} + Y_i for all i. Each sample packet (S_i, X_{S_i}) contains the sampling time S_i and the sample value X_{S_i}. Once a sample is delivered, the receiver sends an acknowledgement (ACK) back to the sampler with zero delay. Hence, the sampler has access to the idle/busy state of the server in real time.

Let U(t) = max{S_i : D_i ≤ t} be the generation time of the freshest sample that has been delivered to the receiver by time t. Then, the age of information, or simply age, at time t is defined by [2, 3]

Δ(t) = t - U(t),   (1)

which is plotted in Fig. 2. Because U(t) = S_i for t ∈ [D_i, D_{i+1}), Δ(t) can also be written as

Δ(t) = t - S_i,  t ∈ [D_i, D_{i+1}).   (2)

The initial state of the system is assumed to satisfy S_0 = 0, D_0 = Y_0, and the initial age Δ(0) is a finite constant.

In this paper, we will consider both continuous-time and discrete-time status-update systems. In the continuous-time setting, the time variables S_i, Y_i, and D_i can take any non-negative real value. In the discrete-time setting, each of them is a multiple of a period T_s; as a result, S_i, Y_i, D_i, and Δ(t) are all discrete-time variables. For notational simplicity, we choose T_s = 1 second so that all the discrete-time variables are integers. The results for other values of T_s can be readily obtained by time scaling.

In practice, the continuous-time setting can be used to model status-update systems with a high clock rate, while the discrete-time setting is appropriate for characterizing sensors that have a very low energy budget and can only wake up periodically from a low-power sleep mode.

Fig. 2: Evolution of the age over time.

III-B Data Staleness and Freshness Metrics: A Survey

The dissatisfaction with data staleness (or the eagerness for data refreshing) is represented by a penalty function p(Δ) of the age Δ, where p(·) is non-decreasing. This non-decreasing requirement on p(·) complies with the observation that stale data is usually less desired than fresh data [1, 4, 5, 6, 7, 8, 9]. This data staleness model is quite general, as it allows p(·) to be non-convex or discontinuous. These data staleness metrics are clearly more general than those in [13, 14], where p(·) was restricted to be non-negative and non-decreasing.

Similarly, data freshness can be characterized by a non-increasing utility function u(Δ) of the age Δ [5, 7]. One simple choice is u(Δ) = -p(Δ). Note that because the age Δ(t) is a function of time t, p(Δ(t)) and u(Δ(t)) are both time-varying, as illustrated in Fig. 3. In practice, one can choose p(·) and u(·) based on the information source and the application under consideration, as illustrated in the following examples.

III-B1 Auto-correlation Function of the Source

The auto-correlation function R(Δ) = E[X_t X*_{t-Δ}] of the source can be used to evaluate the freshness of the sample [15]. For some stationary sources, |R(Δ)| is a non-increasing function of the age Δ, which can be considered as an age utility function u(Δ). For example, in stationary ergodic Gauss-Markov block fading channels, the impact of channel aging can be characterized by the auto-correlation function of the fading channel coefficients. When the age is small, the auto-correlation function and the data rate both decay with respect to the age [63].

III-B2 Estimation Error of Real-time Source Value

Consider a status-update system where samples of a Markov source X_t are forwarded to a remote estimator. The estimator uses causally received samples to reconstruct an estimate of the real-time source value. If the sampling times S_i are independent of the observed source X_t, the estimation error at time t can be expressed as an age penalty function p(Δ(t)) [20, 45, 46, 17]. If the sampling times are chosen based on causal knowledge about the source, the estimation error is no longer a function of the age [45, 46, 17].

(a) Non-decreasing age penalty function p(Δ).

(b) Non-increasing age utility function u(Δ).
Fig. 3: Two examples of non-linear age functions.

III-B3 Information-based Data Freshness Metric

Let

W_t = {(S_i, X_{S_i}, D_i) : D_i ≤ t}   (3)

denote the samples that have been delivered to the receiver by time t. One can use the mutual information I(X_t; W_t) — the amount of information that the received samples carry about the current source value X_t — to evaluate the freshness of W_t. If I(X_t; W_t) is close to H(X_t), the samples in W_t contain a lot of information about X_t and are considered to be fresh; if I(X_t; W_t) is almost 0, W_t provides little information about X_t and is deemed to be obsolete.

One way to interpret I(X_t; W_t) is to consider how helpful the received samples are for inferring X_t. By using the Shannon code lengths [70, Section 5.4], the expected minimum number of bits L(X_t) required to specify X_t satisfies

H(X_t) ≤ E[L(X_t)] < H(X_t) + 1,   (4)

where H(X_t) can be interpreted as the expected minimum number of binary tests that are needed to infer X_t. On the other hand, with the knowledge of W_t, the expected minimum number of bits L(X_t | W_t) that are required to specify X_t satisfies

H(X_t | W_t) ≤ E[L(X_t | W_t)] < H(X_t | W_t) + 1.   (5)

If X_t is a random vector consisting of a large number of symbols (e.g., X_t represents an image containing many pixels or the coefficients of MIMO-OFDM channels), the one bit of overhead in (4) and (5) is insignificant. Hence, I(X_t; W_t) = H(X_t) - H(X_t | W_t) is approximately the reduction in the description cost for inferring X_t without and with the knowledge of W_t.

If X_t is a stationary, time-homogeneous Markov chain, then by the data processing inequality [70, Theorem 2.8.1], it is easy to prove the following lemma:

Lemma 1.

If X_t is a stationary, time-homogeneous Markov chain, W_t is defined in (3), and the sampling times S_i are independent of X_t, then the mutual information

I(X_t; W_t) = I(X_t; X_{U(t)})   (6)

is a non-negative and non-increasing function of the age Δ(t).

Proof.

See Appendix A. ∎

Lemma 1 provides an intuitive interpretation of “information aging”: The amount of information that is preserved in W_t for inferring the current source value X_t decreases as the age grows. When the S_i's are independent of X_t, the sampling times of the delivered samples do not carry any information about the current source value X_t. One interesting future research direction is how to exploit the timing information in W_t to improve data freshness.

Next, we provide the closed-form expression of I(X_t; W_t) for two Markov sources:

Gauss-Markov Source: Suppose that X_t is a first-order discrete-time Gauss-Markov process, defined by

X_t = η X_{t-1} + Z_t,   (7)

where 0 < η² < 1 and the Z_t's are zero-mean i.i.d. Gaussian random variables with variance σ². Because X_t is a Gauss-Markov process, one can show that [71]

I(X_t; W_t) = (1/2) log₂ [ 1 / (1 - η^{2Δ(t)}) ].   (8)

Since η² < 1 and Δ(t) is an integer, I(X_t; W_t) is a positive and decreasing function of the age Δ(t). Note that if Δ(t) = 0, then I(X_t; W_t) = ∞, because the absolute entropy of a Gaussian random variable is infinite.
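The decay of this mutual information with the age can be sketched numerically; the snippet below assumes equation (8) takes the standard AR(1) form above (the AR coefficient η and the function name are illustrative):

```python
import math

def gauss_markov_mi(delta, eta):
    """Mutual information I(X_t; X_{t-delta}) in bits for a stationary
    first-order Gauss-Markov (AR(1)) process X_t = eta*X_{t-1} + Z_t,
    0 < |eta| < 1.  The correlation over delta steps is rho = eta**delta,
    and for jointly Gaussian variables I = -0.5*log2(1 - rho**2); this is
    the standard closed form assumed for equation (8)."""
    if delta == 0:
        return math.inf  # the absolute entropy of a Gaussian is infinite
    rho2 = eta ** (2 * delta)
    return -0.5 * math.log2(1.0 - rho2)

# The mutual information decays monotonically with the age:
print([round(gauss_markov_mi(d, 0.9), 3) for d in (1, 2, 5, 10)])
```

As expected from the discussion above, the printed sequence is positive and strictly decreasing in the age.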

Binary Markov Source: Suppose that X_t is a binary symmetric Markov process defined by

X_t = X_{t-1} ⊕ Z_t,   (9)

where ⊕ denotes binary modulo-2 addition and the Z_t's are i.i.d. Bernoulli random variables with mean q ∈ (0, 1/2). One can show that

I(X_t; W_t) = 1 - h(α_{Δ(t)}),   (10)

where α_Δ = [1 - (1 - 2q)^Δ]/2 and h is the binary entropy function defined by h(x) = -x log₂ x - (1 - x) log₂(1 - x) with a domain [0, 1] [70, Eq. (2.5)]. Because h(·) is increasing on [0, 1/2] and α_Δ ∈ [0, 1/2] is increasing in Δ, I(X_t; W_t) is a non-negative and decreasing function of the age Δ(t).
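This closed form can likewise be checked numerically; the sketch below assumes equation (10) takes the standard form 1 - h(α_Δ) with α_Δ = [1 - (1 - 2q)^Δ]/2 (the function names are ours):

```python
import math

def binary_entropy(p):
    """Binary entropy h(p) = -p*log2(p) - (1-p)*log2(1-p), with h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def binary_markov_mi(delta, q):
    """Mutual information I(X_t; X_{t-delta}) in bits for the binary
    symmetric Markov source X_t = X_{t-1} XOR Z_t, Z_t ~ Bernoulli(q),
    0 < q < 1/2.  The flip probability over delta steps is
    alpha = (1 - (1-2q)**delta)/2, so I = 1 - h(alpha); this is the
    standard closed form assumed for equation (10)."""
    alpha = (1.0 - (1.0 - 2.0 * q) ** delta) / 2.0
    return 1.0 - binary_entropy(alpha)

# Non-negative and decreasing in the age, starting from 1 bit at age 0:
print([round(binary_markov_mi(d, 0.1), 3) for d in (0, 1, 3, 10)])
```

At age 0 the sample reveals the full bit (I = 1); as the age grows, α_Δ approaches 1/2 and the mutual information vanishes.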

Similarly, one can also use the conditional entropy H(X_t | W_t) to represent the staleness of W_t. If the S_i's are independent of X_t and X_t is a time-homogeneous Markov chain, H(X_t | W_t) is a non-decreasing function of the age Δ(t) [10, 11, 12]. In this result, the Markov chain is required to be time-homogeneous, but not necessarily stationary. If the sampling times are determined based on causal knowledge of X_t, H(X_t | W_t) is no longer a function of the age.

More usage cases of p(·) and u(·) can be found in [4, 5, 6, 7, 8, 9]. Other data staleness and freshness metrics that cannot be expressed as functions of the age were discussed in [50, 54, 55, 56, 57, 58].

III-C Formulation of Optimal Sampling Problems

Let π = (S_1, S_2, …) represent a sampling policy and Π denote the set of causal sampling policies that satisfy the following two conditions: (i) Each sampling time S_i is chosen based on the history and the current information of the idle/busy state of the channel. (ii) The inter-sampling times {T_i = S_{i+1} - S_i, i = 1, 2, …} form a regenerative process [72, Section 6.1]: There exists an increasing sequence 0 ≤ k_1 < k_2 < … of almost surely finite random integers such that the post-k_j process {T_{k_j + i}, i = 1, 2, …} has the same distribution as the post-k_1 process {T_{k_1 + i}, i = 1, 2, …} and is independent of the pre-k_j process {T_i, i = 1, …, k_j - 1}; in addition, E[k_{j+1} - k_j] < ∞ and the first and second moments of the cycle durations S_{k_{j+1}} - S_{k_j} are finite. (We assume that {T_i} is a regenerative process because we will optimize the limit superior of the time-average expected age penalty, although operationally a nicer objective function is the limit of the time-average. These two criteria are equivalent if {T_i} is a regenerative process, or more generally, if {T_i} has only one ergodic class. If no condition is imposed, however, they are different.)

We assume that the sampling times S_i are independent of the source process X_t, and that the service times Y_i of the queue do not change according to the sampling policy. We further assume that p(δ) is finite for all finite δ.

In this paper, we study the optimal sampling policy that minimizes (maximizes) the average age penalty (utility) subject to an average sampling rate constraint. In the continuous-time case, we will consider the following problem:

p_opt = inf_{π ∈ Π} lim sup_{T → ∞} (1/T) E[ ∫_0^T p(Δ(t)) dt ]   (11)
s.t. lim inf_{n → ∞} E[S_n]/n ≥ 1/f_max,   (12)

where p_opt is the optimal value of (11) and f_max is the maximum allowed sampling rate. In the discrete-time case, we need to solve the following optimal sampling problem:

q_opt = inf_{π ∈ Π} lim sup_{T → ∞} (1/T) E[ Σ_{t=0}^{T-1} p(Δ(t)) ]   (13)
s.t. lim inf_{n → ∞} E[S_n]/n ≥ 1/f_max,   (14)

where q_opt is the optimal value of (13). We assume that p_opt and q_opt are finite. The problems for maximizing the average age utility can be readily obtained from (11) and (13) by choosing p(Δ) = -u(Δ). In practice, the cost of data updates increases with the average sampling rate. Therefore, Problems (11) and (13) represent a tradeoff between data staleness (freshness) and update cost.

Problems (11) and (13) are constrained MDPs, one with a continuous (uncountable) state space and the other with a countable state space. Because of the curse of dimensionality [73], it is quite rare that one can explicitly solve such problems and derive analytical or closed-form solutions that are arbitrarily accurate.

IV Main Results: Optimal Sampling Policies

In this section, we present a complete characterization of the solutions to (11) and (13). Specifically, the optimal sampling policies are either deterministic or randomized threshold policies, depending on the scenario under consideration. Efficient algorithms for computing the thresholds and the randomization probabilities are provided. The proofs are relegated to Section V.

IV-A Continuous-time Sampling without Rate Constraint

We first consider the continuous-time sampling problem (11). When there is no sampling rate constraint (i.e., f_max = ∞), a solution to (11) is provided in the following theorem:

Theorem 1 (Continuous-time Sampling without Rate Constraint).

If f_max = ∞, p(·) is non-decreasing, and the service times Y_i are i.i.d. with E[Y_i] < ∞, then π* = (S_1(β), S_2(β), …) is an optimal solution to (11), where

S_{i+1}(β) = min{ t ≥ D_i(β) : E[ p(Δ(t) + Y_{i+1}) ] ≥ β },   (15)

D_i(β) = S_i(β) + Y_i, Δ(t) = t - S_i(β) for t ∈ [D_i(β), D_{i+1}(β)), and β is the root of

E[ ∫_{D_i(β)}^{D_{i+1}(β)} p(Δ(t)) dt ] = β E[ D_{i+1}(β) - D_i(β) ].   (16)

Further, β is exactly the optimal value of (11).

The optimal sampling policy in (15)-(16) has a nice structure. Specifically, the (i+1)-th sample is generated at the earliest time t satisfying two conditions: (i) the i-th sample has already been delivered by time t, i.e., t ≥ D_i(β), and (ii) the expected age penalty E[p(Δ(t) + Y_{i+1})] upon the next delivery has grown to be no smaller than a pre-determined threshold β. Notice that if the threshold condition already holds at t = D_i(β), then S_{i+1}(β) is the delivery time of the i-th sample. In addition, β is equal to the optimum objective value of (11). Hence, (15)-(16) requires that the expected age penalty upon the delivery of the (i+1)-th sample be no smaller than β, i.e., the minimum possible time-average expected age penalty.
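The threshold structure just described is easy to simulate. The sketch below uses a linear penalty p(Δ) = Δ and Exp(1) service times, so E[p(δ + Y)] = δ + 1 and the sampler simply waits until the age since the last sampling time reaches β - 1; the notation follows S_i, D_i, Y_i above, while the function names are ours:

```python
import random

def simulate_threshold_policy(beta, mean_penalty, service, horizon_n, seed=1):
    """Simulate the deterministic threshold policy of Theorem 1 (a sketch
    under assumed notation): the (i+1)-th sample is taken at the earliest
    time t >= D_i with mean_penalty(t - S_i) >= beta, where
    mean_penalty(d) = E[p(d + Y)] and `service` draws one service time."""
    rng = random.Random(seed)
    S = [0.0]                     # sampling times; first sample at t = 0
    D = [S[0] + service(rng)]     # delivery times (no queueing: S_{i+1} >= D_i)
    for _ in range(horizon_n):
        # Earliest t >= D[-1] with mean_penalty(t - S[-1]) >= beta,
        # found by a coarse forward scan (mean_penalty is non-decreasing).
        t = D[-1]
        while mean_penalty(t - S[-1]) < beta:
            t += 1e-3
        S.append(t)
        D.append(t + service(rng))
    return S, D

S, D = simulate_threshold_policy(
    beta=2.0,
    mean_penalty=lambda d: d + 1.0,   # E[p(d + Y)] for p(x) = x, Y ~ Exp(1)
    service=lambda rng: rng.expovariate(1.0),
    horizon_n=5,
)
```

In this example each new sample is taken at max(D_i, S_i + β - 1): the sampler waits out short service times but samples immediately after long ones.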

Next, we develop an efficient algorithm to find the root β of (16). Because the Y_i's are i.i.d., the expectations in (16) are functions of β and do not depend on the index i. Given β, these expectations can be evaluated by Monte Carlo simulation or importance sampling. Define

(17)

then

(18)

If the analytical expression of the function defined in (17) is available, then (18) can be used to simplify the numerical evaluation of the expected integral in (16). As shown in Section V-F, (16) has a unique root. We use a simple bisection method to solve (16), which is illustrated in Algorithm 1.

  given l = 0, a sufficiently large u, tolerance ε > 0.
  repeat
     β := (l + u)/2.
     o := E[∫_{D_i(β)}^{D_{i+1}(β)} p(Δ(t)) dt] - β E[D_{i+1}(β) - D_i(β)].
     if o ≥ 0, l := β; else, u := β.
  until u - l ≤ ε.
  return β.
Algorithm 1 Bisection method for solving (16)
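Algorithm 1 is a standard bisection search. A generic sketch is given below, with an assumed monotone surrogate g(β) standing in for the left-hand side of (16) (which in practice would be evaluated by Monte Carlo simulation):

```python
def bisect_root(g, lo, hi, tol=1e-9):
    """Bisection search for the unique root of a continuous, monotone
    function g on [lo, hi] (a sketch of Algorithm 1; in the paper,
    g(beta) would be the expectation difference in (16)).
    Requires g(lo) and g(hi) to have opposite signs."""
    g_lo = g(lo)
    assert g_lo * g(hi) <= 0, "root not bracketed"
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(mid) * g_lo > 0:   # same sign as g(lo): root lies above mid
            lo = mid
        else:                   # sign changed: root lies below mid
            hi = mid
    return (lo + hi) / 2.0

# Illustration with an assumed monotone g: the root of g(beta) = beta**2 - 2.
beta = bisect_root(lambda b: b * b - 2.0, 0.0, 2.0)
print(round(beta, 6))  # 1.414214
```

The search halves the bracket each iteration, so reaching tolerance ε from an initial bracket of width u costs only O(log(u/ε)) evaluations of (16).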

IV-A1 Optimality Condition of Zero-wait Sampling

When f_max = ∞, one logical sampling policy is the zero-wait sampling policy [22, 14, 3], given by

S_{i+1} = D_i = S_i + Y_i.   (19)

This zero-wait sampling policy achieves the maximum throughput and the minimum queueing delay. In the special case of p(Δ) = Δ, Theorem 5 of [14] provided a sufficient and necessary condition for characterizing the optimality of the zero-wait sampling policy. We now generalize that result to non-linear age functions in the following corollary:

Corollary 1.

If f_max = ∞, p(·) is non-decreasing, and the service times Y_i are i.i.d. with E[Y_i] < ∞, then the zero-wait sampling policy in (19) is optimal for solving (11) if and only if

(20)

where y_min denotes the essential infimum of the service time Y_1.

One can consider y_min as the minimum possible value of the service time. It immediately follows from Corollary 1 that

Corollary 2.

If f_max = ∞, p(·) is non-decreasing, and the service times Y_i are i.i.d. with E[Y_i] < ∞, then the following assertions are true:

  • If Y_1 is a constant, then (19) is optimal for solving (11).

  • If Pr[Y_1 > y_min] > 0 and p(·) is strictly increasing, then (19) is not optimal for solving (11).

The condition Pr[Y_1 > y_min] > 0 is satisfied by many commonly used distributions, such as the exponential, geometric, Erlang, and hyperexponential distributions. According to the second assertion of Corollary 2, if p(·) is strictly increasing, the zero-wait sampling policy (19) is not optimal for these commonly used distributions.

IV-B Continuous-time Sampling with Rate Constraint

When the sampling rate constraint (12) is imposed, a solution to (11) is presented in the following theorem:

Theorem 2 (Continuous-time Sampling with Rate Constraint).

If p(·) is non-decreasing, p(δ) is finite for all finite δ, and the service times Y_i are i.i.d. with E[Y_i] < ∞, then the deterministic threshold policy (15)-(16) is an optimal solution to (11), if

(21)

Otherwise, a randomized threshold policy π* is an optimal solution to (11), where

(22)

and the two deterministic threshold policies being mixed are given by

(23)
(24)

with the same definitions of D_i(β) and Δ(t) as in Theorem 1, β is determined by solving

(25)

and the randomization probability is given by (If the two threshold policies in (23) and (24) coincide almost surely, then π* becomes a deterministic threshold policy and the randomization probability can be any number within [0, 1].)

(26)
Fig. 4: Three cases of the function appearing in (23)-(24).

According to Theorem 2, the solution to (11) consists of two cases: In Case 1, the deterministic threshold policy in Theorem 1 is an optimal solution to (11), which needs to satisfy (21). In Case 2, the randomized threshold policy in (22)-(26) is an optimal solution to (11), which needs to satisfy

(27)

We note that the only difference between (23) and (24) is that “≥” is used in (23) while “>” is employed in (24). (Clearly, an important issue is the optimality of such a randomized threshold policy, which is proven in Section V.) If there exists a time interval over which

(28)

as shown in Fig. 4(a), then the two threshold policies in (23) and (24) differ. In this case, neither of them alone may satisfy (27), but their randomized mixture in (22) can satisfy (27). In particular, if β and the randomization probability are given by (25) and (26), then (27) follows.

  given l = 0, a sufficiently large u, tolerance ε > 0.
  repeat
     β := (l + u)/2.
     compute the thresholds in (23)-(24) for this β.
     evaluate the left-hand side of (25).
     if it is smaller than the right-hand side of (25), l := β;
     else if it is larger than the right-hand side of (25), u := β;
     else return β.
  until u - l ≤ ε.
  return β.
Algorithm 2 Bisection method for solving (25)

We provide a low-complexity algorithm to compute the randomized threshold policy in (22)-(26): As shown in Appendix C, there is a unique β satisfying (25). We use the bisection method in Algorithm 2 to solve (25) and obtain β. After that, the thresholds and the randomization probability can be computed by substituting β into (22)-(24) and (26). Because of the similarity between (23) and (24), they are quite sensitive to the numerical error in β. This issue can be resolved by replacing the two thresholds in (22) and (26) with perturbed values determined by

(29)
(30)

respectively, where ε is the tolerance in Algorithm 2. One can improve the accuracy of this solution by (i) reducing the tolerance ε and (ii) computing the expectations more accurately by increasing the number of Monte Carlo realizations or using advanced techniques such as importance sampling.
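The role of the randomization probability can be illustrated with a small sketch. The formula below is an assumed rendering of the rate-matching step in (26): the policy mixes two threshold policies with known mean inter-sample gaps (the function name and the gap parameters are hypothetical), choosing the mixing probability so the average sampling rate meets f_max with equality:

```python
def mixing_probability(f_max, mean_gap_lo, mean_gap_hi):
    """Sketch of the rate-matching step (26) under assumed notation: the
    smaller threshold yields mean inter-sample gap mean_gap_lo, the larger
    threshold mean_gap_hi, and the smaller threshold is used with
    probability p chosen so that
        1 / (p*mean_gap_lo + (1-p)*mean_gap_hi) = f_max.
    Requires mean_gap_lo <= 1/f_max <= mean_gap_hi."""
    target_gap = 1.0 / f_max
    assert mean_gap_lo <= target_gap <= mean_gap_hi
    if mean_gap_hi == mean_gap_lo:
        return 1.0   # degenerate case: a deterministic threshold policy
    return (mean_gap_hi - target_gap) / (mean_gap_hi - mean_gap_lo)

# If the two thresholds yield mean gaps 1.0 and 3.0 and f_max = 0.5
# (target gap 2.0), mixing with p = 0.5 meets the rate constraint exactly.
print(mixing_probability(0.5, 1.0, 3.0))  # 0.5
```

This mirrors the discussion around (27)-(28): neither pure threshold policy may hit the rate constraint exactly, but a convex combination of their mean gaps always can.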

As depicted in Fig. 4(b)-(c), if the function in (23)-(24) is strictly increasing on the region of interest, then the two thresholds coincide almost surely and (22) reduces to a deterministic threshold policy. In this case, Theorem 2 can be greatly simplified, as stated in the following corollary:

Corollary 3.

In Theorem 2, if the function in (23)-(24) is strictly increasing, then the deterministic threshold policy (15) is an optimal solution to (11), with the same definitions of D_i(β) and Δ(t) as in Theorem 1, where β is determined by (16), if

(31)

otherwise, β is determined by solving

(32)

If p(·) is strictly increasing or the distribution of Y_1 is sufficiently smooth, the function in (23)-(24) is strictly increasing. Hence, the extra condition in Corollary 3 is satisfied for a broad class of age penalty functions and service time distributions.

A restrictive case of problem (11) was studied in [14], where p(·) was assumed to be positive and non-decreasing. There is an error in Theorem 3 of [14], because the condition that the function in (23)-(24) be strictly increasing is missing. Further, the solution in Theorem 3 of [14] is more complicated than that in Corollary 3. A special case of Corollary 3 with p(Δ) = Δ was derived in Theorem 4 of [14].

IV-C Discrete-time Sampling

We now move on to the discrete-time sampling problem (13). When there is no sampling rate constraint (i.e., f_max = ∞), the solution to (13) is provided in the following theorem:

Theorem 3 (Discrete-time Sampling without Rate Constraint).

If f_max = ∞, p(·) is non-decreasing, and the service times Y_i are i.i.d. with E[Y_i] < ∞, then π* = (S_1(β), S_2(β), …) is an optimal solution to (13), where

S_{i+1}(β) = min{ integer t ≥ D_i(β) : E[ p(Δ(t) + Y_{i+1}) ] ≥ β },   (33)

D_i(β) = S_i(β) + Y_i, Δ(t) = t - S_i(β) for D_i(β) ≤ t < D_{i+1}(β), and β is the root of

E[ Σ_{t=D_i(β)}^{D_{i+1}(β)-1} p(Δ(t)) ] = β E[ D_{i+1}(β) - D_i(β) ].   (34)

Further, β is exactly the optimal value of (13).

Theorem 3 is quite similar to Theorem 1, with two minor differences: (i) The sampling time S_{i+1}(β) in (15) is a real number, whereas it is restricted to an integer in (33). (ii) The integral in (16) becomes a summation in (34).

In the discrete-time case, the optimality of the zero-wait sampling policy is characterized as follows.

Corollary 4.

If f_max = ∞, p(·) is non-decreasing, and the service times Y_i are i.i.d. with E[Y_i] < ∞, then the zero-wait sampling policy (19) is optimal for solving (13) if and only if there exists a threshold such that

(35)

where y_min denotes the minimum possible value of the service time Y_1.

When the sampling rate constraint (14) is imposed, the solution to (13) is provided in the following theorem.

Theorem 4 (Discrete-time Sampling with Rate Constraint).

If p(·) is non-decreasing, p(δ) is finite for all finite δ, and the service times Y_i are i.i.d. with E[Y_i] < ∞, then the deterministic threshold policy (33)-(34) is an optimal solution to (13), if

(36)

Otherwise, a randomized threshold policy π* is an optimal solution to (13), where

(37)

and the two deterministic threshold policies being mixed are given by

(38)
(39)

with the same definitions of D_i(β) and Δ(t) as in Theorem 3, β is determined by solving

(40)

and the randomization probability is given by

(41)

Theorem 4 is similar to Theorem 2, but there are two differences: (i) The thresholds in (23)-(24) are real numbers, whereas they are restricted to integers in (38)-(39). (ii) If the function in (23)-(24) is strictly increasing, then the two thresholds in (23)-(24) coincide almost surely and Theorem 2 can be greatly simplified. In the discrete-time case, however, even if this function is strictly increasing, the two thresholds in (38)-(39) may still differ. In fact, it is rather common that they differ at the optimal β, for the following reason:

If the two thresholds coincide almost surely, then (37) becomes a deterministic threshold policy that needs to ensure (27). However, because the thresholds in (38)-(39) are integers, such a deterministic threshold policy is difficult to satisfy (27) for all possible values of f_max. On the other hand, if the two thresholds differ, the randomized threshold policy in (37)-(41) can satisfy (27). Hence, even though the function in (38)-(39) is strictly increasing, Theorem 4 cannot be further simplified. This is a key difference between continuous-time and discrete-time sampling.

The computation algorithms of the optimal discrete-time sampling policies are similar to their counterparts in the continuous-time case, and hence are omitted here.

IV-D An Example: Mutual Information Maximization

Next, we provide an example to illustrate the above theoretical results. Suppose that X_t is a stationary, time-homogeneous Markov chain and the sampling times S_i are independent of X_t. The optimal sampling problem that maximizes the time-average expected mutual information between X_t and W_t is formulated as

(42)

where the left-hand side of (42) denotes its optimal value, which we assume to be finite. Problem (42) is a special case of (13) with p(Δ(t)) = -I(X_t; W_t) and f_max = ∞. The following result follows immediately from Theorem 3.

Corollary 5.

If the service times Y_i are i.i.d. with E[Y_i] < ∞, then π* = (S_1(β), S_2(β), …) is an optimal solution to (42), where

(43)