I. Introduction
The rapid proliferation of Internet of Things (IoT) devices is driving the fast development of networked monitoring and cyber-physical system applications [6], [33], such as crowdsourcing in sensor networks [23], phasor updating in smart grid systems [41], and autonomous driving in smart transportation systems [48]. For these IoT applications, the freshness of the status information that the operating nodes hold about the physical process is of fundamental importance for accurate monitoring and control.
Age of information (AoI), often simply called age, was proposed to quantify the status freshness of the physical process of interest [20], [21]. More specifically, AoI is defined as the time elapsed since the generation, at the source node (e.g., a sensor), of the most recent status update successfully received at the destination (e.g., a controller). Extensive works have focused on minimizing the age under various queueing models [26, 40, 1, 45, 44, 37, 43, 32, 16, 18, 13, 27, 9, 17, 36]. It is worth noting that AoI minimization depends on the status update frequency and differs from conventional design principles (e.g., providing low delay). On the one hand, updating status at a low frequency keeps the message queueing delay small since the queue is almost always empty, but the destination node suffers a large age because updates arrive infrequently. On the other hand, updating status at a high frequency leads to a large queueing delay by Little's law [25], and the destination node again suffers a large age because each status update is delayed in the queue. Therefore, unlike the queueing delay, which increases with the status sampling frequency, AoI exhibits a more complex pattern as a metric of status freshness and is more challenging to optimize [19].
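As a concrete illustration of this trade-off, the classic closed-form expression for the average AoI of an M/M/1 FCFS status-update queue, $\bar{\Delta} = \frac{1}{\mu}\left(1 + \frac{1}{\rho} + \frac{\rho^2}{1-\rho}\right)$ with utilization $\rho = \lambda/\mu$, can be swept numerically. This is a standalone sketch; the parameter values are illustrative and not tied to this paper's model:

```python
# Average AoI of an M/M/1 FCFS status-update queue (closed form from the
# AoI literature): Delta(rho) = (1/mu) * (1 + 1/rho + rho^2/(1-rho)),
# where rho = lambda/mu is the server utilization.
def average_aoi_mm1(rho, mu=1.0):
    assert 0 < rho < 1
    return (1.0 / mu) * (1.0 + 1.0 / rho + rho**2 / (1.0 - rho))

# Sweep the sampling rate: the age is large at both low and high utilization.
rhos = [k / 1000 for k in range(10, 991)]
ages = [average_aoi_mm1(r) for r in rhos]
best = min(range(len(rhos)), key=lambda k: ages[k])
print(f"optimal utilization ~ {rhos[best]:.2f}, min average AoI ~ {ages[best]:.3f}")
```

The minimizing utilization lands near $\rho \approx 0.53$: neither very slow nor very fast sampling is age-optimal.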
For many intelligent real-time IoT applications, the status freshness depends not only on the status update frequency but also on the status data processing operations. For example, in smart video surveillance, a status update (e.g., a sampled image) does not take effect until the useful information embedded in the image is extracted by data processing operations (e.g., AI-based image recognition), which are computationally expensive and time-consuming.
Since an IoT device typically has limited computation and storage capacities, edge computing can be leveraged to facilitate real-time data processing. The IoT device can offload its data processing operations to a nearby mobile edge computing (MEC) platform [24], which uses edge servers deployed at the edge of the radio access network (e.g., at base stations (BSs) or access points (APs)) to execute computing tasks. Specifically, the IoT device offloads the status update to the edge server over a wireless channel for further data processing, and the edge server then sends the final result to the destination node. Therefore, processing offloading also affects the status freshness.
To capture status freshness with data processing in edge-computing-enabled real-time IoT applications, we propose a new metric, age-of-processing (AoP), defined as the time elapsed since the generation of the freshest status update until it is processed and finally takes effect at the destination node. Compared with the conventional AoI, the AoP additionally accounts for the data processing time of a status update.
In this paper, we aim to minimize the AoP by jointly optimizing the data processing offloading decision and the status sampling frequency. Offloading can shorten the data processing time by exploiting the edge server's computation resources, but it incurs an additional transmission time that depends on the wireless channel state between the source node (e.g., the IoT device) and the edge server. When the channel state is good, offloading the data processing operations to the edge server incurs a short transmission time and reduces the overall processing time. When the channel state is bad, however, the transmission time is not negligible, and the IoT device may prefer to process the status update on its local processor or wait for a better channel state. Therefore, we need to carefully choose the offloading strategy under different channel states to minimize the AoP.
Moreover, the status sampling frequency also has an essential impact on the AoP. When the previous status update is still being processed, a new update has to wait in the queue and becomes stale while waiting, so it can be better not to generate a new sample while the server is busy. The authors of [7] proposed a status sampling policy called the zero-wait policy, which samples a new update immediately after the previous update takes effect. However, the authors of [46], [39] showed that the zero-wait policy can be far from age-optimal in some cases. Hence, how to optimize the status sampling frequency in the presence of data processing remains an open question. Furthermore, the status sampling process consumes energy of the IoT device. Its limited energy budget necessitates a constraint on the sampling frequency, which makes it even harder to obtain the optimal status sampling policy for minimizing the AoP.
In addressing the above challenges, we make the following key contributions:

We propose a new metric, age-of-processing (AoP), to capture the status freshness with data processing in real-time IoT applications. To minimize the average AoP, we formulate the joint status sampling and processing offloading problem as an infinite-horizon constrained Markov decision process (CMDP) with a maximum sampling frequency constraint at the IoT device.

We relax the challenging CMDP problem into an unconstrained MDP problem using the Lagrangian method, which significantly simplifies the original CMDP. We then propose a Lagrangian transformation framework to derive the optimal status sampling and processing offloading policy under the optimal Lagrange multiplier.

Building upon the proposed Lagrangian transformation framework, we develop stochastic-approximation-based policy iteration algorithms with perturbation-based refinement to achieve the optimal policy of the CMDP problem.

We provide extensive simulations to illustrate the structural properties of the optimal policy, and show that the proposed improved algorithm outperforms the benchmarks, with an average AoP reduction of up to 30%.
The rest of the paper is organized as follows. In Sec. II, we discuss related work. In Sec. III, we present the system model and formulate AoP minimization as a CMDP problem. In Sec. IV, we transform the CMDP problem into an unconstrained MDP problem by leveraging the Lagrangian method. In Sec. V, we first propose a Lagrangian transformation framework for the original CMDP problem and then improve it with perturbation-based refinement to achieve the optimal policy. We show our simulation results in Sec. VI and conclude the paper in Sec. VII.
II. Related Work
Age-of-information (AoI) was introduced in the early 2010s as a new metric to characterize the freshness of the information that a system has about a remotely observed process [20]. Since then, an abundance of research has applied queueing theory to analyze the AoI in various system settings. In [19], the authors obtained theoretical results for the average AoI when status updates are served under the first-come-first-served (FCFS) discipline, specifically for the M/M/1, M/D/1, and D/M/1 queueing models. After that, further queueing models were studied in [37], [32], and [13]. A new metric, peak age, was introduced in [15], and the authors of [1] derived its distribution in a queueing system. In [16], the authors studied reliable transmission under peak-age violation guarantees.
Another branch of research on AoI considers energy-harvesting constraints, since the IoT device (e.g., a sensor) is usually energy-limited and the sampling process consumes energy [38, 12, 5, 4]. In [38], the authors derived an optimal transmission policy for an energy-harvesting status update system, formulated as an MDP problem. In [12], the authors proposed a reinforcement learning algorithm that learns the system parameters and the status update policy for an energy-harvesting transmitter with a finite-capacity battery. The authors of [5], [4] analyzed the scenario where an energy-harvesting sensor with random or incremental battery recharge sends measurement updates to a destination, and showed that the optimal update policy follows a renewal structure. All of the above works assume that a status update takes effect as soon as it is received at the destination node, at which point the age is immediately reset to the time elapsed from the status generation to its reception.

For computation-intensive applications (e.g., autonomous driving), however, a status update (e.g., a video clip) needs further data processing to reveal the useful features, so the data processing time also affects the age. Research efforts in this area remain limited. In [8], the authors considered soft updates in an information updating system and, for both exponentially and linearly decaying age, derived the optimal sampling schemes subject to a sampling frequency constraint. In [22], the authors studied the AoI of computation-intensive messages with MEC, and derived the closed-form average AoI for exponentially distributed computing times. In [2], the authors jointly optimized the information freshness (age) and the completion time in a vehicular network; however, the computation time is not incorporated into the age. In [34], the authors proposed a performance metric called age of task (AoT) to evaluate the temporal value of computation tasks and, by jointly considering task scheduling, computation offloading, and energy consumption, designed a lightweight task scheduling algorithm. It is, however, an offline policy in which the task arrival times are known in advance.

Different from existing research efforts, in this paper we extend the concept of AoI to AoP by taking the data processing time into consideration. We further consider offloading the data processing to an MEC server, and minimize the long-term average AoP by optimizing the status sampling and processing offloading policy.
III. Model and Formulation of AoP Minimization
III-A System Model
Consider a real-time IoT status monitoring and control system for computation-intensive applications. The IoT device (a.k.a. the sensor) monitors the current status of a physical process (e.g., a camera records images of the traffic situation at a crossroad), which requires further data processing. As shown in Fig. 1, the IoT device can choose to process the raw data locally on its own processor or offload it to a mobile edge server in proximity. The data processing operation reveals the hidden feature in the raw data (e.g., the congestion level at the crossroad), which we refer to as knowledge and which is then transmitted to an operator for accurate control. After receiving the knowledge, the operator sends an acknowledgment (ACK) to the IoT device, which may then sample a new status update.
We define the time elapsed from the status generation at the IoT device to the reception of the latest knowledge by the operator as the age-of-processing (AoP), which is maintained by the operator to capture the status freshness. Compared to the traditional AoI, the AoP takes the data processing time into account, and is therefore affected by the data processing offloading policy.
The IoT device follows the generate-at-will sampling policy [46], under which it can take a new sample whenever it prefers, and it does not generate a new status update while the previous update is still being processed, to avoid unnecessary waiting time. Suppose the IoT device samples status update $i$ at time $t_i$, and then decides where to send the raw data (i.e., to its local processor or to the edge server) for further data processing.

For each status update $i$, we denote its data processing task by a pair $(S, C)$, where $S$ is the input data size of the status packet and $C$ is the total number of CPU cycles required to compute the task.
III-A1 Local Processing
We assume that the sensor is equipped with a local processor (e.g., an embedded CPU) for some necessary computations. If the sensor chooses to process the status update locally, the processing time is

$\tau_l = C / f_l,$   (1)

where $f_l$ is the CPU frequency of the local processor. After data processing, the local processor transmits the processed status result to the operator. We assume that the data size of the result is very small (e.g., the result of an object classification usually takes only several bits), so the time for transmitting the result to the operator is negligible.
III-A2 Edge Offloading
If the sensor chooses to offload the raw data to the edge server, it incurs extra time for transmitting the computation input data over the wireless connection. According to [29], the offloading rate for update $i$ can be formulated as

$r_i = B \log_2\!\left(1 + p\, h_i / \sigma_i^2\right),$   (2)

where $B$ is the channel bandwidth and $p$ is the transmission power for update $i$. Furthermore, $h_i$ denotes the wireless channel gain between the sensor and the edge server, which can be generated using the distance-dependent path-loss model of [14],

$h_i = g_i\, d^{-\theta},$   (3)

where $d$ is the distance between the sensor and the edge server, $\theta$ is the path-loss exponent, and $g_i$ captures the fading; $\sigma_i^2$ is the total background noise and interference power while transmitting update $i$. Therefore, the transmission time is

$\tau_{t,i} = S / r_i,$   (4)

and the data processing time at the edge server is

$\tau_c = C / f_e,$   (5)

where $f_e$ is the CPU frequency of the edge server. As mentioned before, we ignore the time for transmitting the processed status result from the edge server to the operator. Following (4) and (5), the total time overhead of the edge offloading approach is

$\tau_{o,i} = \tau_{t,i} + \tau_c.$   (6)
Throughout this paper, we assume that the computation capacities of both the local and edge servers are stable (i.e., $f_l$ and $f_e$ are constants). This is reasonable since the sensor usually carries out a dedicated sensing task, and the edge server usually allocates a fixed-size resource block (e.g., a virtual machine) to a given computing task. Besides, we assume that all status update tasks have the same input data size $S$ and required computation $C$.^1 (^1We leave heterogeneous status update tasks with different $S$ and $C$ for future work.) For example, the input images of an object-recognition-based surveillance task have the same size and require almost the same number of CPU cycles per image. For the wireless channel, we assume that the transmission power $p$ is the same for all updates. The total background noise and interference power $\sigma_i^2$ influences the wireless channel state; it is unknown and changes stochastically. The channel state has a critical impact on the data offloading policy. Intuitively, when the wireless channel state is good (e.g., $\sigma_i^2$ is small), the IoT device tends to offload the status data to the edge server, exploiting the abundant computing resources of the edge server to reduce the processing time. When the wireless channel is bad (e.g., $\sigma_i^2$ is large), the transmission time is relatively large, and the IoT device prefers to process the update data locally to avoid it.
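This local-versus-offload comparison can be sketched numerically. The snippet below uses the parameter values later listed in Table I together with a Shannon-rate transmission model as in (2); the two channel gains are purely hypothetical, and the snippet is an illustrative sketch rather than the paper's decision rule (which must also account for the AoP dynamics and waiting times):

```python
import math

# Illustrative offload-vs-local comparison under the model of Sec. III.
# Parameter values follow Table I; the channel gains h are hypothetical.
S = 500e3 * 8        # input size per update: 500 KB in bits
C = 1000e6           # required CPU cycles per update
f_l = 1e9            # local CPU frequency (1 GHz)
f_e = 20e9           # edge CPU frequency (20 GHz)
B = 20e6             # channel bandwidth (20 MHz)
p = 10 ** (20 / 10) / 1000        # 20 dBm transmit power, in watts
noise = 10 ** (-100 / 10) / 1000  # -100 dBm noise power, in watts

tau_local = C / f_l               # Eq. (1): local processing time
tau_edge_cpu = C / f_e            # Eq. (5): edge processing time

def offload_time(h):
    """Eq. (6): transmission time (Shannon rate) plus edge computing time."""
    rate = B * math.log2(1 + p * h / noise)  # Eq. (2)
    return S / rate + tau_edge_cpu           # Eqs. (4) + (5)

# A "good" and a "bad" hypothetical channel gain:
for h in (1e-10, 1e-13):
    t_off = offload_time(h)
    choice = "offload" if t_off < tau_local else "local"
    print(f"h={h:.0e}: offload {t_off:.3f}s vs local {tau_local:.3f}s -> {choice}")
```

With these numbers, the good channel makes offloading roughly an order of magnitude faster than local processing, while the bad channel reverses the preference, which is exactly the intuition above.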
We depict the evolution of the age-of-processing in Fig. 2. Suppose status update $i$ is sampled at time $t_i$. If the raw data is processed locally, the processing time is $\tau_i = \tau_l$.^2 (^2Since all status update tasks require the same computation $C$, the local processing time is the same $\tau_l$ for all updates.) If the raw data is processed at the edge server, the total processing time is $\tau_i = \tau_{o,i}$. Therefore, the processed result of update $i$ is delivered at time $t_i + \tau_i$. After the operator receives update $i$, the sensor may insert a waiting time $Z_i \in [0, Z_{\max}]$ before sampling the next status update at time $t_{i+1} = t_i + \tau_i + Z_i$, where $Z_{\max}$ is the maximum waiting time. The sensor can switch to a low-power sleep mode during the waiting period $Z_i$.
At any time $t$, the freshest status update received by the operator was generated at time

$U(t) = \max\{\, t_i : t_i + \tau_i \le t \,\}.$   (7)

The age-of-processing is then defined as

$\Delta(t) = t - U(t).$   (8)
As shown in Fig. 2, the AoP follows a stochastic process that increases linearly in $t$ while the sensor is waiting for the next sample or the data is being processed, and jumps down when a processed status update is delivered to the operator; the AoP curve thus has a zigzag shape. More specifically, status update $i$ is sampled at time $t_i$ and its result is received by the operator at time $t_i + \tau_i$, at which point the AoP is $\tau_i$. After that, the AoP keeps increasing linearly with time while the sensor is waiting or the update data is under processing. The AoP reaches $\tau_i + Z_i + \tau_{i+1}$ right before the processed result of status update $i+1$ is delivered; then, at time $t_{i+1} + \tau_{i+1}$, the AoP drops to $\tau_{i+1}$.
III-B CMDP Formulation
In this subsection, we seek the optimal status sampling and computation offloading policy that minimizes the average AoP of the system discussed above.
In most existing works, the long-term average age is the key performance metric for long-run information freshness, defined as

$\bar{\Delta} = \limsup_{T \to \infty} \frac{1}{T} \int_{0}^{T} \Delta(t)\, dt.$   (9)

Intuitively, $\bar{\Delta}$ is the time-average of the shaded area under the AoP envelope. To compute the average AoP, we decompose this area into a series of regions between consecutive deliveries. Let $L_i = Z_i + \tau_{i+1}$ denote the interval between the deliveries of updates $i$ and $i+1$. As shown in Fig. 2, the light shaded region is a parallelogram with area

$Q_i = \tau_i L_i,$   (10)

and the dark shaded region is a triangle with area

$E_i = \frac{1}{2} L_i^2.$   (11)

Therefore, the average AoP can be calculated as

$\bar{\Delta} = \limsup_{n \to \infty} \frac{\sum_{i=1}^{n} \left( \tau_i L_i + \frac{1}{2} L_i^2 \right)}{\sum_{i=1}^{n} L_i}.$   (12)
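The area decomposition can be sanity-checked numerically: for a finite trace of (hypothetical) processing times and waiting times, the ratio of summed parallelogram-plus-triangle areas to summed inter-delivery intervals matches a brute-force integration of the sawtooth AoP curve:

```python
tau = [1.0, 0.6, 1.2, 0.8, 1.0]   # total processing time of each update (s)
Z = [0.5, 0.0, 0.3, 0.2]          # waiting time inserted after each delivery

# Reconstruct the timeline: sample times t[i] and delivery times d[i].
t = [0.0]
d = [tau[0]]
for i in range(len(Z)):
    t.append(d[i] + Z[i])             # next sample after the waiting period
    d.append(t[i + 1] + tau[i + 1])   # its delivery time

# Area decomposition: parallelogram tau_i*L_i plus triangle L_i^2/2,
# with L_i = Z_i + tau_{i+1} the interval between deliveries.
num = sum(tau[i] * (Z[i] + tau[i + 1]) + 0.5 * (Z[i] + tau[i + 1]) ** 2
          for i in range(len(Z)))
den = sum(Z[i] + tau[i + 1] for i in range(len(Z)))
avg_by_areas = num / den

# Brute force: integrate the sawtooth AoP on a fine grid over [d[0], d[-1]].
dt, acc, x = 1e-4, 0.0, d[0]
while x < d[-1]:
    gen = max(t[i] for i in range(len(d)) if d[i] <= x)  # freshest delivered
    acc += (x - gen) * dt
    x += dt
avg_by_grid = acc / (d[-1] - d[0])
print(avg_by_areas, avg_by_grid)
```

The two estimates agree to grid accuracy, confirming that the per-interval areas fully account for the sawtooth.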
Note that minimizing $\bar{\Delta}$ is a long-term stochastic problem. At each delivery time $t_i + \tau_i$, the operator maintains the age, and the IoT device decides the waiting time $Z_i$ to insert before sampling the next status update $i+1$, together with where to process update $i+1$, which determines the value of $\tau_{i+1}$.
Markov decision process: As mentioned before, the wireless channel state between the sensor and the edge server changes stochastically. Let $H_i$ be the channel state at the $i$-th decision epoch, taking values in a finite state space $\mathcal{H}$,^3 which is influenced by $\sigma_i^2$. (^3Since all status update packets have the same data size, we can characterize the channel state by the transmission time of the update data in (4). Since this transmission time is continuous, it would induce an infinite state space in the MDP; for simplicity, we discretize it into $|\mathcal{H}|$ channel states.) Unlike the i.i.d. channel state assumption in [49], we consider the general case where $\{H_i\}$ is a stationary and ergodic Markov chain with transition matrix $\mathbf{P}$ [47].^4 (^4We assume the channel statistics are known in advance, since $\mathbf{P}$ can be estimated through channel training.) The element $P_{mn}$ of the transition matrix is the probability of moving from channel state $h^m$ to state $h^n$. At the $i$-th decision epoch, we denote the current system state by $s_i = (\tau_i, H_i) \in \mathcal{S}$, where $\mathcal{S}$ is the system state space.^5 (^5We also discretize the waiting time, since a discrete waiting time is much easier for IoT devices to execute, and a continuous waiting time would result in an infinite MDP state space that is difficult to solve.) The sensor then chooses an action $a_i = (Z_i, y_{i+1})$ from the action space $\mathcal{A}$, where $Z_i$ is the inserted waiting time and $y_{i+1} \in \{0, 1\}$ is the offloading decision for update $i+1$: when $y_{i+1} = 1$, the sensor offloads the status update to the edge server, and when $y_{i+1} = 0$, it processes the update locally. We then define the reward function for taking action $a_i$ in state $s_i$ as
$c(s_i, a_i) = \tau_i L_i + \frac{1}{2} L_i^2, \qquad L_i = Z_i + \tau_{i+1}.$   (13)
The system then evolves to the next state $s_{i+1}$, which depends only on the previous system state $s_i$ and the action $a_i$. More specifically, the channel state evolves according to

$\Pr\left(H_{i+1} = h^n \mid H_i = h^m\right) = P_{mn},$   (14)

where $P_{mn}$ is the $(m,n)$-th element of the channel transition matrix $\mathbf{P}$, and the age component evolves according to

$\tau_{i+1} = y_{i+1}\, \tau_{o,i+1} + (1 - y_{i+1})\, \tau_l.$   (15)
Stationary status sampling and processing offloading policy: Given the system state, the IoT device determines the sampling and offloading action according to the following policy.

Definition 1: A stationary status sampling and computation offloading policy $\pi$ is a mapping from the system state space to the control action space, $\pi: \mathcal{S} \to \mathcal{A}$, with $a_i = \pi(s_i)$, which is independent of the update index $i$.
In this paper, we focus on stationary policies due to their low complexity for practical implementation (e.g., no long decision history needs to be recorded). Under a given stationary policy $\pi$, the average AoP can be calculated as

$\bar{\Delta}(\pi) = \limsup_{n \to \infty} \frac{\mathbb{E}^{\pi}\left[\sum_{i=1}^{n} \left(\tau_i L_i + \frac{1}{2} L_i^2\right)\right]}{\mathbb{E}^{\pi}\left[\sum_{i=1}^{n} L_i\right]},$   (16)

where the expectation is taken with respect to the measure induced by the policy $\pi$, and the limit superior captures the worst case.
Sampling frequency constraint: Due to the limited energy resources of the sensor, it cannot sample status updates at a very high frequency. Following [39] and [11], we introduce the sampling frequency constraint

$\liminf_{n \to \infty} \frac{1}{n}\, \mathbb{E}^{\pi}\left[\sum_{i=1}^{n} L_i\right] \ge \varepsilon,$   (17)

where $\varepsilon$ is the minimum average sampling duration, so that $1/\varepsilon$ is the maximum allowed average status sampling frequency under a long-term average resource constraint. We emphasize that, in practice, it is hard for the sensor to monitor its runtime energy expenditure, and hence we adopt a maximum sampling frequency constraint instead of an energy budget constraint in the formulation.
AoP minimization: We seek the optimal stationary status sampling and computation offloading policy that minimizes the average AoP under the maximum sampling frequency constraint at the sensor:

$\min_{\pi} \ \bar{\Delta}(\pi) \quad \text{s.t.} \quad \liminf_{n \to \infty} \frac{1}{n}\, \mathbb{E}^{\pi}\left[\sum_{i=1}^{n} L_i\right] \ge \varepsilon.$   (18)

Problem (18) is a constrained Markov decision process (CMDP). Finding the optimal policy for (18) is computationally intractable, since the value of a policy $\pi$ is only revealed at the end of an infinite trajectory: the denominator of (16) is the sum of the inter-delivery intervals $L_i$ over all status updates.
To tackle this difficulty, we relax problem (18) as

$\min_{\pi} \ \hat{\Delta}(\pi) \quad \text{s.t.} \quad \hat{Y}(\pi) \ge \varepsilon,$   (19)

where

$\hat{\Delta}(\pi) = \limsup_{n \to \infty} \frac{1}{n}\, \mathbb{E}^{\pi}\left[\sum_{i=1}^{n} \left(\tau_i + \frac{1}{2} L_i\right)\right]$   (20)

is the expected per-update time-average age, and

$\hat{Y}(\pi) = \liminf_{n \to \infty} \frac{1}{n}\, \mathbb{E}^{\pi}\left[\sum_{i=1}^{n} L_i\right]$   (21)

is the expected average sampling duration. Clearly, the optimal policy for problem (19) is not necessarily optimal for problem (18). If $\hat{\Delta}(\pi)$ were no smaller than $\bar{\Delta}(\pi)$ for every policy $\pi$, the solution of (19) would provide an upper bound for the original problem (18); however, the direction of the inequality between $\hat{\Delta}(\pi)$ and $\bar{\Delta}(\pi)$ cannot be asserted in general.
IV. Unconstrained MDP Transformation
It is well known that directly solving a CMDP problem is quite challenging [3]. In this section, we transform the CMDP problem (19) into an unconstrained MDP problem by leveraging the Lagrangian method.
We first restate problem (19) as a CMDP. At each delivery time $t_i + \tau_i$, which we refer to as a decision epoch, the IoT device observes the current system state $s_i = (\tau_i, H_i)$, where $\tau_i$ is the total processing time of the previous status update $i$ and $H_i$ is the current channel state. After observing $s_i$, the IoT device selects an action $a_i = (Z_i, y_{i+1})$ following a policy $\pi$, with $a_i = \pi(s_i)$; we also refer to the policy as a state-action mapping. The IoT device then receives an immediate reward

$r(s_i, a_i) = \frac{c(s_i, a_i)}{L_i} = \tau_i + \frac{1}{2} L_i,$   (22)

which is the time-average area of the AoP over the interval $L_i$, and the system evolves to the next state $s_{i+1}$. Since every element of $s_{i+1}$ depends only on the previous state $s_i$ and action $a_i$, the process $\{s_i\}$ is a controlled Markov process. The objective of problem (19) is to find an optimal state-action mapping $\pi$ that minimizes the infinite-horizon average reward

$\hat{\Delta}(\pi) = \limsup_{n \to \infty} \frac{1}{n}\, \mathbb{E}^{\pi}\left[\sum_{i=1}^{n} r(s_i, a_i)\right]$   (23)

while committing to the sampling constraint $\hat{Y}(\pi) \ge \varepsilon$.
A major challenge in obtaining the optimal policy for problem (19) is the sampling frequency constraint. To overcome this difficulty, we first transform (19) into an unconstrained Markov decision process (MDP) by introducing a Lagrange multiplier [49]. We define the immediate Lagrangian reward of update $i$ as

$r_{\eta}(s_i, a_i) = r(s_i, a_i) - \eta\, L_i,$   (24)

where $\eta \ge 0$ is the Lagrange multiplier. The average Lagrangian reward under a policy $\pi$ is then given by

$J_{\eta}(\pi) = \limsup_{n \to \infty} \frac{1}{n}\, \mathbb{E}^{\pi}\left[\sum_{i=1}^{n} r_{\eta}(s_i, a_i)\right] = \hat{\Delta}(\pi) - \eta\, \hat{Y}(\pi).$   (25)

By introducing the Lagrange multiplier, we obtain an unconstrained MDP problem with the objective of minimizing the average Lagrangian cost:

$\min_{\pi} \ J_{\eta}(\pi).$   (26)
Let $\pi_{\eta}^{*}$ be the optimal policy of problem (26) for Lagrange multiplier $\eta$. Define $g(\eta) = J_{\eta}(\pi_{\eta}^{*})$, $Y(\eta) = \hat{Y}(\pi_{\eta}^{*})$, and $\Delta(\eta) = \hat{\Delta}(\pi_{\eta}^{*})$. For the above Lagrangian transformation, we can show the following result.

Lemma 1: $g(\eta)$ is monotone nonincreasing, while $Y(\eta)$ and $\Delta(\eta)$ are monotone nondecreasing in $\eta$.
Proof.
Consider any $0 \le \eta_1 < \eta_2$. The monotonicity of $g$ and $Y$ is a consequence of the following fundamental inequalities, which hold by the optimality of $\pi_{\eta_1}^{*}$ and $\pi_{\eta_2}^{*}$ for their respective problems:

$\hat{\Delta}(\pi_{\eta_1}^{*}) - \eta_1 Y(\eta_1) \le \hat{\Delta}(\pi_{\eta_2}^{*}) - \eta_1 Y(\eta_2), \qquad \hat{\Delta}(\pi_{\eta_2}^{*}) - \eta_2 Y(\eta_2) \le \hat{\Delta}(\pi_{\eta_1}^{*}) - \eta_2 Y(\eta_1).$   (27)

From the second inequality of (27) and

$\eta_2 Y(\eta_1) \ge \eta_1 Y(\eta_1),$   (28)

which holds since $\eta_2 > \eta_1$ and $Y(\eta_1) \ge 0$, we obtain $g(\eta_2) \le \hat{\Delta}(\pi_{\eta_1}^{*}) - \eta_2 Y(\eta_1) \le \hat{\Delta}(\pi_{\eta_1}^{*}) - \eta_1 Y(\eta_1) = g(\eta_1)$. Summing the two inequalities in (27) yields $(\eta_2 - \eta_1)\left(Y(\eta_2) - Y(\eta_1)\right) \ge 0$, and therefore $Y(\eta_1) \le Y(\eta_2)$. As for $\Delta(\eta)$, suppose it is not monotone nondecreasing; then there exist $\eta_1 < \eta_2$ such that $\Delta(\eta_1) > \Delta(\eta_2)$. But $Y(\eta_1) \le Y(\eta_2)$, whence

$\hat{\Delta}(\pi_{\eta_2}^{*}) - \eta_1 Y(\eta_2) < \hat{\Delta}(\pi_{\eta_1}^{*}) - \eta_1 Y(\eta_1) = g(\eta_1),$   (29)

which contradicts the optimality of $\pi_{\eta_1}^{*}$ for the problem with multiplier $\eta_1$. Finally, we conclude $\Delta(\eta_1) \le \Delta(\eta_2)$. ∎
Lemma 1 reveals important relationships between the Lagrange multiplier $\eta$ and the average sampling duration $Y(\eta)$ as well as the average AoP $\Delta(\eta)$, which help us solve the MDP problem (26). First, $Y(\eta)$ is nondecreasing in $\eta$, so the optimal policy of (26) under multiplier $\eta$ corresponds to a certain sampling duration $Y(\eta)$. When $Y(\eta) < \varepsilon$, the policy $\pi_{\eta}^{*}$ is not feasible for the original problem (18); we can then increase $\eta$ until $Y(\eta) \ge \varepsilon$. Furthermore, the average AoP $\Delta(\eta)$ is also nondecreasing in $\eta$. Since our objective is to minimize the AoP subject to the sampling constraint, it is equivalent to finding the optimal Lagrange multiplier

$\eta^{*} = \inf\{\eta \ge 0 : Y(\eta) \ge \varepsilon\}.$   (30)
To find the optimal Lagrange multiplier $\eta^{*}$, we need to solve the following two subproblems:

Subproblem 1: how to find the optimal policy for the MDP problem (26) for a given Lagrange multiplier $\eta$;

Subproblem 2: how to update $\eta$ so that it converges to $\eta^{*}$.
In summary, the Lagrangian method transforms the CMDP problem (19) into the unconstrained MDP problem (26), which is much easier to solve. Furthermore, by exploring the relationships between the Lagrange multiplier and the sampling frequency as well as the AoP, we show that solving (26) decomposes into the two subproblems above. In the next section, we first solve these two subproblems and then propose an algorithm to obtain the optimal policy for the original CMDP problem (19).
V. Optimal Policy for the CMDP Problem
In this section, we first propose a policy iteration algorithm to derive the optimal policy in Subproblem 1. We then apply the Robbins-Monro algorithm to derive the optimal Lagrange multiplier in Subproblem 2. Finally, we propose an algorithm to derive the optimal policy for the original CMDP problem (19).
Solving Subproblem 1. For a given $\eta$, problem (26) is a Markov decision process with an average reward criterion, which has been studied extensively, e.g., in [28] and [10]. We restrict attention to stationary deterministic policies: a stationary deterministic policy maps each state to a single action, i.e., given a state $s$, the output $\pi(s)$ is a single action rather than a probability distribution over the action space. This restriction simplifies the search space and still guarantees the existence of an optimal policy for the MDP problem (26). Applying a stationary deterministic policy $\pi$ to the controlled Markov process yields a Markov chain with stationary transition probability matrix $\mathbf{P}^{\pi}$, whose element $P^{\pi}(s' \mid s)$ is the transition probability from state $s$ to $s'$ under $\pi$ [31]. Given $\pi$, we also have a reward vector $\mathbf{r}^{\pi}$, whose element $r_{\eta}(s, \pi(s))$ is the immediate reward at state $s$ under the chosen action. The gain vector $\mathbf{g}^{\pi}$ is the average reward vector, whose element is the average reward starting from initial state $s$:

$g^{\pi}(s) = \lim_{N \to \infty} \frac{1}{N}\, \mathbb{E}\left[\sum_{i=0}^{N-1} r_{\eta}(s_i, \pi(s_i)) \,\middle|\, s_0 = s\right].$   (31)
Moreover, for a given $\eta$, the MDP problem (26) has the following Bellman optimality equation:

$g + h(s) = \min_{a \in \mathcal{A}} \left\{ r_{\eta}(s, a) + \sum_{s' \in \mathcal{S}} \Pr(s' \mid s, a)\, h(s') \right\}, \quad \forall s \in \mathcal{S},$   (32)

where $\Pr(s' \mid s, a)$ is the transition probability from state $s$ to $s'$ under action $a$, and the bias vector $\mathbf{h}$ collects the expected total differences between the immediate reward and the average reward [28]. The optimal policy can therefore be obtained by

$\pi^{*}(s) \in \arg\min_{a \in \mathcal{A}} \left\{ r_{\eta}(s, a) + \sum_{s' \in \mathcal{S}} \Pr(s' \mid s, a)\, h(s') \right\}.$   (33)
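A compact way to solve such average-reward optimality equations numerically is relative value iteration, a close cousin of the policy iteration used below. The sketch applies it to a toy two-state, two-action MDP (not the AoP model) purely to illustrate the gain/bias machinery:

```python
# Relative value iteration for an average-cost MDP -- a compact alternative
# to policy iteration. cost[s][a] and P[s][a][s'] define a unichain MDP;
# state 0 serves as the reference state.
def solve_average_cost_mdp(cost, P, iters=1000):
    n = len(cost)
    h = [0.0] * n                      # relative value (bias) estimates
    for _ in range(iters):
        Th = [min(cost[s][a] + sum(P[s][a][sp] * h[sp] for sp in range(n))
                  for a in range(len(cost[s])))
              for s in range(n)]
        g = Th[0]                      # gain estimate at the reference state
        h = [v - g for v in Th]        # re-center to keep h bounded
    policy = [min(range(len(cost[s])),
                  key=lambda a: cost[s][a] +
                      sum(P[s][a][sp] * h[sp] for sp in range(n)))
              for s in range(n)]
    return g, policy

# Toy 2-state example: staying in state 0 costs 1 per step; the cycle
# 0 -> 1 -> 0 costs (3 + 0)/2 = 1.5 per step, so the optimal gain is 1.0.
cost = [[1.0, 3.0], [0.0, 2.0]]
P = [[[1.0, 0.0], [0.0, 1.0]],   # state 0: a=0 stays, a=1 moves to 1
     [[1.0, 0.0], [0.0, 1.0]]]   # state 1: a=0 moves to 0, a=1 stays
g, policy = solve_average_cost_mdp(cost, P)
print(g, policy)
```

For unichain, aperiodic models this iteration converges to the optimal gain $g$ and a greedy policy consistent with (33); policy iteration reaches the same fixed point in far fewer (but more expensive) iterations.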
We propose the policy iteration algorithm to solve (33), as shown in Algorithm 1. The key idea of Algorithm 1 is to iteratively perform policy evaluation and policy improvement to drive the update dynamics to converge to the optimal policy in (33).
In each iteration $k$, the policy evaluation step solves the linear equations

$g_k + h_k(s) = r_{\eta}(s, \pi_k(s)) + \sum_{s'} P^{\pi_k}(s' \mid s)\, h_k(s'), \quad \forall s \in \mathcal{S},$   (34)

$h_k(s) + w_k(s) = \sum_{s'} P^{\pi_k}(s' \mid s)\, w_k(s'), \quad \forall s \in \mathcal{S},$   (35)

and the policy improvement step updates

$\pi_{k+1}(s) \in \arg\min_{a \in \mathcal{A}} \left\{ r_{\eta}(s, a) + \sum_{s'} \Pr(s' \mid s, a)\, h_k(s') \right\}, \quad \forall s \in \mathcal{S},$   (36)

terminating when the policy no longer changes:

$\pi_{k+1} = \pi_k.$   (37)

The linear equations (33) and (34) uniquely determine the gain. However, for the bias, the whole class $\mathbf{h} + c\mathbf{1}$, where $c$ is an arbitrary constant and $\mathbf{1}$ is the all-one vector of the same dimension as $\mathbf{h}$, satisfies (33) and (34). Therefore, the auxiliary vector $\mathbf{w}$ and the additional equation (35) are introduced to pin down $\mathbf{h}$. Note that, in each iteration, the policy evaluation step solves a linear system in the state variables, and the policy improvement step performs at most $|\mathcal{S}||\mathcal{A}|$ comparisons. The convergence of Algorithm 1 to the optimal policy of problem (26) follows proof procedures similar to those in [28] and is omitted here for brevity.

Solving Subproblem 2. Since the average sampling duration $Y(\eta)$ is nondecreasing in the Lagrange multiplier $\eta$ by Lemma 1, we adopt the two-timescale stochastic-approximation-based Robbins-Monro algorithm [30] to solve Subproblem 2, as shown in Algorithm 2. Specifically, on the small timescale we solve the optimal policy of the MDP for a given Lagrange multiplier (steps 4 and 5), and on the large timescale we update the Lagrange multiplier (steps 6 and 7) according to
$\eta_{k+1} = \max\left\{0,\ \eta_k + \alpha_k \left(\varepsilon - Y(\eta_k)\right)\right\}, \qquad \alpha_k = 1/k.$   (38)
The sequence of Lagrange multipliers generated by Algorithm 2 converges to $\eta^{*}$ by the two-timescale stochastic approximation analysis in [30].
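The multiplier update can be sketched as follows. Since evaluating $Y(\eta)$ requires solving the inner MDP, the snippet substitutes a synthetic nondecreasing function for it; everything here is illustrative:

```python
# Robbins-Monro search for the Lagrange multiplier eta* with Y(eta*) = eps.
# Y(eta) -- the average sampling duration induced by the optimal policy
# under multiplier eta -- is replaced by a synthetic nondecreasing function,
# since computing the real one would require solving the inner MDP.
def Y(eta):
    return 1.0 + eta / (1.0 + eta)   # synthetic, nondecreasing in eta

eps = 1.5                            # minimum sampling duration (constraint)
eta = 0.0
for k in range(1, 20001):
    alpha = 1.0 / k                  # diminishing step size
    eta = max(0.0, eta + alpha * (eps - Y(eta)))   # Eq. (38)-style update
print(eta, Y(eta))
```

For this synthetic $Y$, the fixed point is $\eta^{*} = 1$ (where $Y(1) = 1.5 = \varepsilon$), and the iterate approaches it, though slowly, which motivates the step-size refinement discussed later in this section.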
There are several possible stopping criteria for Algorithm 2: for example, the difference $|\eta_{k+1} - \eta_k|$ or $|Y(\eta_k) - \varepsilon|$ becoming small enough, or the number of iterations of Algorithm 2 exceeding a pre-specified limit. In practice, the Lagrange multiplier returned by Algorithm 2 can be close to, but not exactly, the $\eta^{*}$ defined in (30). Nevertheless, when the returned multiplier is close to that value, we can further refine the optimal policy for (19) as follows.
Solving Problem (19). We integrate a perturbation-based refinement to obtain the optimal policy for problem (19). We introduce two perturbed Lagrange multipliers $\eta^{-}$ and $\eta^{+}$ by imposing a small perturbation on $\eta^{*}$. Given the $\eta^{*}$ derived by Algorithm 2, we set

$\eta^{-} = \eta^{*} - \delta, \qquad \eta^{+} = \eta^{*} + \delta,$   (39)

where $\delta > 0$ is a small perturbation parameter. Lemma 1 shows that $Y(\eta)$ is monotone nondecreasing in $\eta$, and hence

$Y(\eta^{-}) \le \varepsilon \le Y(\eta^{+}).$   (40)

We then refine the optimal policy as a randomized mixture of the two perturbed policies $\pi_{\eta^{-}}^{*}$ and $\pi_{\eta^{+}}^{*}$:

$\pi^{*} = q\, \pi_{\eta^{-}}^{*} + (1 - q)\, \pi_{\eta^{+}}^{*},$   (41)

where the randomization factor is given by

$q = \frac{Y(\eta^{+}) - \varepsilon}{Y(\eta^{+}) - Y(\eta^{-})}.$   (42)

In this way, the condition in (30) is satisfied, due to the fact that

$q\, Y(\eta^{-}) + (1 - q)\, Y(\eta^{+}) = \varepsilon.$   (43)

We summarize the policy refining procedure in Algorithm 3.
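The refinement step reduces to a one-line computation of the randomization factor; a sketch with hypothetical perturbed sampling durations:

```python
# Randomization factor for mixing the two perturbed policies: choose the
# eta-minus policy with probability q and the eta-plus policy otherwise,
# so the mixed average sampling duration exactly meets the constraint eps.
def mixing_factor(y_minus, y_plus, eps):
    assert y_minus <= eps <= y_plus
    return (y_plus - eps) / (y_plus - y_minus)

# Hypothetical values: the two perturbed policies bracket the constraint.
y_minus, y_plus, eps = 1.4, 1.8, 1.5
q = mixing_factor(y_minus, y_plus, eps)
mixed = q * y_minus + (1 - q) * y_plus
print(q, mixed)   # the mixture meets the constraint with equality
```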
In Algorithm 3, running Algorithm 2 to obtain the optimal Lagrange multiplier in step 1 takes a long time to converge, owing to the low convergence rate of the stochastic approximation update in (38): the step size $\alpha_k = 1/k$ decays slowly, so even when $\eta_k$ is already near $\eta^{*}$ after a few iterations, it takes many further iterations for the update increments to become small enough. Therefore, we improve Algorithm 2 by replacing the diminishing step size with a small constant step size $\beta$ (e.g., $\beta = 0.01$) and updating $\eta$ as

$\eta_{k+1} = \max\left\{0,\ \eta_k + \beta \left(\varepsilon - Y(\eta_k)\right)\right\}.$   (44)

As shown in Fig. 3(a), updating $\eta$ with (38) takes a long time to converge to the optimal Lagrange multiplier (e.g., more than 25000 iterations under the stopping criterion used here). As shown in Fig. 3(b), the new update rule (44) tremendously reduces the number of iterations (e.g., to about 120). Furthermore, the insets in Figs. 3(a) and 3(b) show the last ten iterations of (38) and (44); we can see that (44) converges closer to $\eta^{*}$.
VI. Performance Evaluation
In this section, we evaluate the performance of our proposed algorithms via extensive simulations.
VI-A Simulation Setup
As mentioned in Sec. III, we use the pair $(S, C)$ to characterize a status update of an IoT computation-intensive application, where $S$ is the input data size and $C$ is the required number of CPU cycles. We assume that all status update packets have the same $(S, C)$. Specifically, we consider the face recognition application in [35], where the data size for computation offloading is $S = 500$ KB and the total number of CPU cycles is $C = 1000$ Megacycles. In terms of computing resources, we assume the CPU capabilities of the edge and local servers to be $f_e = 20$ GHz and $f_l = 1$ GHz [42]. As for edge offloading, we assume the wireless channel bandwidth $B = 20$ MHz and the distance between the sensor and the edge server $d = 0.1$ km. The transmission power of the sensor is $p = 20$ dBm, and the mean background noise is $-100$ dBm [29]. We assume that the wireless channel state process is a Markov chain. Following the equal-probability SNR partitioning method [47], we model the channel as a three-state Markov chain, i.e., $\mathcal{H} = \{h^1, h^2, h^3\}$, with approximately the transition probability matrix
(45) 
Assume that if offloading is attempted, the transmission times defined in (2) and (4) take one of three values (in ms) corresponding to the three channel states. We summarize the main parameters of the simulation in Table I.
parameters  values
input data size of each status update  500 KB
number of CPU cycles of each status update  1000 Megacycles
CPU frequency of edge server  20 GHz
CPU frequency of local server  1 GHz
wireless bandwidth between sensor and edge server  20 MHz
distance between sensor and edge server  0.1 km
transmission power of the sensor  20 dBm
background noise  −100 dBm
action set of waiting time  [0, 200, …, 800] ms
minimum sampling duration  1200 ms
perturbation parameter  
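The three-state Markov channel model above can be illustrated with a short simulation. The transition matrix here is an assumed doubly stochastic example, not the actual entries of (45); it is chosen so that the stationary distribution is uniform over the three states, which is consistent with the equal-probability SNR partitioning of [47].

```python
import random

random.seed(1)

# Assumed three-state transition matrix (states: good, medium, bad).
# It is doubly stochastic, so the stationary distribution is uniform,
# matching equal-probability SNR partitioning of the channel.
P = [[0.6, 0.3, 0.1],
     [0.3, 0.4, 0.3],
     [0.1, 0.3, 0.6]]

def simulate_channel(steps, start=0):
    """Return the empirical occupancy frequency of each channel state."""
    counts = [0, 0, 0]
    state = start
    for _ in range(steps):
        r, acc = random.random(), 0.0
        for nxt, p in enumerate(P[state]):
            acc += p
            if r < acc:
                state = nxt
                break
        counts[state] += 1
    return [c / steps for c in counts]

print(simulate_channel(100_000))  # each frequency is close to 1/3
```

Over a long run, each state is occupied roughly one third of the time, as the equal-probability partitioning intends.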
VI-B Benchmarks
In order to verify the performance of our proposed algorithms, we compare them with the following benchmarks:

Always edge computing with zero waiting (AEZW): the sensor offloads each status update to the edge server for further processing without waiting. That is, when the edge server completes the computation of one status update, the sensor samples a new status update immediately. However, this policy may not satisfy the sampling frequency constraint.

Always edge computing with conservative waiting (AECW): the sensor offloads each status update to the edge server with a conservative waiting time. That is, when the edge server completes the computation of one status update, the sensor chooses, based on the current AoP at the operator, to wait before sampling the next status update.

Always local computing with conservative waiting (ALCW): the sensor computes each status update at the local server with a conservative waiting time. Since the local CPU frequency and the total number of computation cycles are constants, the AoP upon completion at the local server is also a constant, and the sensor then chooses to wait accordingly.
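Using the Table I numbers, the ALCW benchmark can be worked out in closed form: 1000 Megacycles at the 1 GHz local CPU take 1000 ms, the conservative wait pads each cycle to the 1200 ms minimum sampling duration, and the AoP then follows a deterministic sawtooth. The sketch below checks the resulting time-average against a numerical integration; the sawtooth model of the AoP is our reading of the ALCW description, not a formula from the paper.

```python
# ALCW under Table I: deterministic local processing and a fixed cycle length.
local_proc_ms = 1000.0              # 1000 Megacycles / 1 GHz = 1000 ms
cycle_ms = 1200.0                   # minimum sampling duration (Table I)
wait_ms = cycle_ms - local_proc_ms  # conservative waiting = 200 ms

# At each completion the AoP resets to the processing time, then grows
# linearly until the next completion one cycle later (a sawtooth), so the
# time-average AoP is local_proc_ms + cycle_ms / 2.
avg_aop_closed_form = local_proc_ms + cycle_ms / 2.0

# Numerical check: integrate the sawtooth over one cycle at 1 ms resolution.
samples = [local_proc_ms + t for t in range(int(cycle_ms))]
avg_aop_numeric = sum(samples) / len(samples)

print(avg_aop_closed_form, avg_aop_numeric)
```

The closed-form value, 1600 ms, matches the constant ALCW average AoP reported in the benchmark comparison below.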
VI-C Policy structures of proposed algorithms
We first compare the average AoP performance of the original problem (18) and the approximated problem (19), and verify the optimal policy structure of the CMDP problem (19).
As shown in Fig. 4, we simulate a sequence of status updates using the optimal policy defined in (41). The orange line depicts the average AoP of the original problem (18), while the blue line depicts that of the approximated problem (19). As we can see, when the number of status updates increases, the average AoP of both (18) and (19) becomes stable, and the average AoP of (18) is slightly larger than that of (19). More precisely, the inset in Fig. 4 depicts the ratio of the average AoP of (18) to that of (19). We can see that the ratio is very close to 1 (with a value of 1.06). This shows that, instead of solving the original problem (18), which is intractable, we can solve the approximated problem (19), whose solution is also a good approximation for the original problem (18).
We depict the optimal policy structure of the CMDP problem (19) in Fig. 5. The coordinates represent the current system state: one horizontal axis represents the combination of the current AoP and the wireless channel state, while the other horizontal axis represents the combination of the last AoP and the waiting time. The vertical axis represents the action at each state, where even numbers denote "offloading", odd numbers denote "local computing", and larger numbers represent longer waiting times (e.g., a small odd value represents local computing with a waiting time of 200 ms). As shown in Fig. 5(a), the waiting time is a threshold function of the last AoP and the waiting time. That is, the optimal policy chooses a longer waiting time when the sum of the last AoP and the waiting time is large (i.e., when one axis value is fixed and the other increases, the action value also increases). Fig. 5(a) shows the optimal policy structure under the optimal Lagrangian multiplier obtained by Algorithm 2, while Fig. 5(b) and 5(c) show the optimal policies under the perturbed multipliers obtained by Algorithm 3. As we can see, the two perturbed policies are exactly the same when a small perturbation is introduced to the multiplier. Besides, they differ from the policy in Fig. 5(a) at only one state, which is pointed out by the red arrow at state (3, 7).
VI-D Performances among different benchmarks
We next conduct simulations to compare the average AoP among the different benchmarks. As shown in Fig. 6, the optimal policy achieves the minimum average AoP, at around 1460 ms. The ALCW policy has a lower average AoP than AECW and AEZW, which is a constant 1600 ms, so our proposed algorithm achieves an average AoP reduction of around 10%. The reason for this reduction is that the optimal policy offloads the status update to the edge server for further processing when the wireless channel state is good, and the powerful computing capacity of the edge server shortens the processing time immensely, which results in a smaller average AoP. Moreover, as shown in Fig. 6, the always-offloading policies achieve a worse average AoP, at around 1840 ms and 2100 ms for AEZW and AECW, respectively, so our proposed algorithm achieves an AoP reduction of around 20% and 30%. The reason is that the average transmission time of offloading to the edge server is large in the original simulation setting. Although the processing time at the edge server is small, the transmission time plays a critical role in the AoP.
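The intuition behind this comparison can be reproduced with a toy Monte Carlo sketch. The processing times below follow Table I (1000 Megacycles at 20 GHz versus 1 GHz), while the per-state transmission times and the uniform channel distribution are assumptions for illustration; the point is only that offloading when the channel is good lowers the mean per-update completion time relative to always computing locally.

```python
import random

random.seed(2)

# Per-state one-way transmission times (ms); assumed values, with the
# good state fastest and the bad state slowest.
TRANS_MS = {0: 400.0, 1: 800.0, 2: 1600.0}   # good / medium / bad
EDGE_PROC_MS = 1000.0 / 20.0   # 1000 Megacycles / 20 GHz = 50 ms
LOCAL_PROC_MS = 1000.0 / 1.0   # 1000 Megacycles / 1 GHz = 1000 ms

def completion_ms(state, offload):
    """Time from sampling to processed status for one update."""
    if offload:
        return TRANS_MS[state] + EDGE_PROC_MS
    return LOCAL_PROC_MS

def mean_completion(policy, n=20_000):
    """Average completion time when the channel state is drawn uniformly."""
    total = 0.0
    for _ in range(n):
        state = random.randrange(3)
        total += completion_ms(state, policy(state))
    return total / n

adaptive = mean_completion(lambda s: s == 0)     # offload only if "good"
always_local = mean_completion(lambda s: False)  # ALCW-like

print(adaptive, always_local)
```

Under these assumed numbers, the channel-aware rule completes updates faster on average than always computing locally, mirroring why the optimal policy outperforms ALCW when good channel states occur.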
VI-E The influence of wireless channel state
In this subsection, we discuss the influence of the wireless channel state on the average AoP. Although the sensor can offload to the edge server to reduce the processing time, doing so introduces additional transmission time. In this paper, we assume that the channel state follows a Markov chain with three states, which we refer to as the "good", "medium", and "bad" channel states. We conduct simulations with different transmission times for the medium channel state (the transmission time of the good channel state is half that of the medium state, and the transmission time of the bad channel state is twice that of the medium state). As shown in Fig. 7(a), when the transmission time increases, the average AoP of our proposed algorithm and of the always-offloading policies (AEZW and AECW) also increases. Moreover, our algorithm has a much smaller growth rate, because the optimal policy chooses local computing when the wireless channel state is bad. When the transmission time is less than 700 ms, the AEZW policy has a smaller average AoP than our proposed algorithm; however, as shown in Fig. 7(b), the average sampling interval of AEZW is then less than the 1200 ms minimum sampling duration, which violates the sampling frequency constraint (17). Although the AECW and ALCW policies always satisfy constraint (17), they result in a worse average AoP.
VI-F The influence of computation demand
In this subsection, we discuss the influence of the computation demand on the average AoP. We conduct simulations with different computation demands per status update (from 1.0 to 2.0 Gigacycles) while the transmission time of the medium channel state is 1000 ms. As shown in Fig. 8, the average AoP of the ALCW policy increases dramatically when the computation demand increases from 1.0 to 2.0 Gigacycles, due to the limited computation capacity of the local server: processing a status update of a computation-intensive application at the local server takes a long time. In contrast, the average AoP of the always-offloading policies AEZW and AECW increases only slightly, since the edge server has a much larger computation capacity. We should note that, when the computation demand is 2.0 Gigacycles, the average AoP of our proposed algorithm equals that of the AEZW policy. The reason is that when the computation demand is sufficiently large, the processing time dominates the AoP, so the proposed algorithm adopts the always-offloading policy to reduce the processing time.
VII Conclusion
In this paper, we aim to minimize the age-of-processing (AoP) of a computation-intensive IoT application in a status monitoring and control system. Due to the limited resources of the IoT sensor, it can offload status updates to the edge server for processing. We focus on finding the optimal sampling and processing offloading policy to minimize the average AoP, which is formulated as a CMDP. We propose a Lagrangian transformation method to relax the CMDP into an unconstrained MDP problem, and derive the optimal policy given the optimal Lagrangian multiplier of the MDP problem. Furthermore, by introducing a small perturbation to the optimal Lagrangian multiplier of the MDP problem, we obtain the optimal policy of the original CMDP problem. Extensive simulation results verify the superior performance of our proposed algorithms. As a future direction, we plan to generalize our framework to the much more challenging scenarios with multiple IoT devices and edge servers.
References
 [1] (2019) Finding the exact distribution of (peak) age of information for queues of PH/PH/1/1 and M/PH/1/2 type. External Links: 1911.07274 Cited by: §I, §II.
 [2] (2018) Joint information freshness and completion time optimization for vehicular networks. CoRR abs/1811.12924. External Links: Link, 1811.12924 Cited by: §II.
 [3] (1999) Constrained Markov decision processes. Vol. 7, CRC Press. Cited by: §IV.
 [4] (2018) Age-minimal online policies for energy harvesting sensors with incremental battery recharges. CoRR abs/1802.02129. External Links: Link, 1802.02129 Cited by: §II.
 [5] (2018-05) Age-minimal online policies for energy harvesting sensors with random battery recharges. 2018 IEEE International Conference on Communications (ICC). External Links: ISBN 9781538631805, Link, Document Cited by: §II.
 [6] (2010) The internet of things: a survey. Computer networks 54 (15), pp. 2787–2805. Cited by: §I.
 [7] (2019-06) Is the zero-wait policy always optimum for information freshness (peak age) or throughput?. IEEE Communications Letters 23 (6), pp. 987–990. External Links: Document, ISSN 2373-7891 Cited by: §I.
 [8] (2018-10) Age of information with soft updates. In 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 378–385. External Links: Document Cited by: §II.
 [9] (2017) Minimizing the age of the information through queues. CoRR abs/1709.04956. External Links: Link, 1709.04956 Cited by: §I.
 [10] (1995) Dynamic programming and optimal control. Vol. 1, Athena Scientific, Belmont, MA. Cited by: §V.
 [11] (2017) Average age of information with hybrid ARQ under a resource constraint. CoRR abs/1710.04971. External Links: Link, 1710.04971 Cited by: §III-B.
 [12] (2019) Reinforcement learning to minimize age of information with an energy harvesting sensor with HARQ and sensing cost. External Links: 1902.09467 Cited by: §II.
 [13] (2018-04) Statistical guarantee optimization for age of information for the D/G/1 queue. In IEEE INFOCOM 2018 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 130–135. External Links: Document Cited by: §I, §II.
 [14] (2013) Heterogeneous cellular networks: theory, simulation and deployment. Cambridge University Press. Cited by: §III-A2.
 [15] (2014-06) Age of information with packet management. In 2014 IEEE International Symposium on Information Theory, pp. 1583–1587. External Links: Document, ISSN 2157-8095 Cited by: §II.
 [16] (2018) Reliable transmission of short packets through queues and noisy channels under latency and peakage violation guarantees. CoRR abs/1806.09396. External Links: Link, 1806.09396 Cited by: §I, §II.
 [17] (2017-06) The stationary distribution of the age of information in FCFS single-server queues. In 2017 IEEE International Symposium on Information Theory (ISIT), pp. 571–575. External Links: Document, ISSN 2157-8117 Cited by: §I.
 [18] (2018) A general formula for the stationary distribution of the age of information and its application to singleserver queues. CoRR abs/1804.06139. External Links: Link, 1804.06139 Cited by: §I.
 [19] (2012-03) Real-time status: how often should one update?. In 2012 Proceedings IEEE INFOCOM, pp. 2731–2735. External Links: Document, ISSN 0743-166X Cited by: §I, §II.
 [20] (2011) Minimizing age of information in vehicular networks. In 2011 8th Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks, pp. 350–358. Cited by: §I, §II.
 [21] (2017) Age of information: a new concept, metric, and tool. Foundations and Trends® in Networking 12 (3), pp. 162–259. External Links: Link, Document, ISSN 1554057X Cited by: §I.
 [22] (2020) Analysis on computation-intensive status update in mobile edge computing. External Links: 2002.06400 Cited by: §II.
 [23] (2019) Can we achieve fresh information with selfish users in mobile crowd-learning?. CoRR abs/1902.06149. External Links: Link, 1902.06149 Cited by: §I.
 [24] (2019) Resource price-aware offloading for edge-cloud collaboration: a two-timescale online control approach. IEEE Transactions on Cloud Computing, pp. 1–1. External Links: Document, ISSN 2372-0018 Cited by: §I.
 [25] (1961-06) A proof for the queuing formula: L = λW. Oper. Res. 9 (3), pp. 383–387. External Links: ISSN 0030-364X, Link, Document Cited by: §I.
 [26] (2019) On the age of information in multisource queueing models. External Links: 1911.07029 Cited by: §I.
 [27] (2018) Status updates in a multistream M/G/1/1 preemptive queue. CoRR abs/1801.04068. External Links: Link, 1801.04068 Cited by: §I.
 [28] (2014) Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons. Cited by: §V.
 [29] (1996) Wireless communications: principles and practice. Vol. 2, Prentice Hall PTR, New Jersey. Cited by: §III-A2, §VI-A.
 [30] (1951) A stochastic approximation method. The Annals of Mathematical Statistics, pp. 400–407. Cited by: §V.
 [31] (2014) Introduction to stochastic dynamic programming. Academic Press. Cited by: §V.
 [32] (2018-06) Age-optimal channel coding blocklength for an M/G/1 queue with HARQ. In 2018 IEEE 19th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), pp. 1–5. External Links: Document, ISSN 1948-3252 Cited by: §I, §II.
 [33] (2019-06) An age control transport protocol for delivering fresh updates in the internet-of-things. In 2019 IEEE 20th International Symposium on "A World of Wireless, Mobile and Multimedia Networks" (WoWMoM), pp. 1–7. External Links: Document Cited by: §I.
 [34] (2019) Age based task scheduling and computation offloading in mobile-edge computing systems. arXiv preprint arXiv:1905.11570. Cited by: §II.
 [35] (2012-07) CloudVision: real-time face recognition using a mobile-cloudlet-cloud acceleration architecture. In 2012 IEEE Symposium on Computers and Communications (ISCC), pp. 000059–000066. External Links: Document, ISSN 1530-1346 Cited by: §VI-A.
 [36] (2018) Age of information in G/G/1/1 systems. CoRR abs/1805.12586. External Links: Link, 1805.12586 Cited by: §I.
 [37] (2019) Age of information in G/G/1/1 systems: age expressions, bounds, special cases, and optimization. CoRR abs/1905.13743. External Links: Link, 1905.13743 Cited by: §I, §II.
 [38] (2019) Control of status updates for energy harvesting devices that monitor processes with alarms. CoRR abs/1907.03826. External Links: Link, 1907.03826 Cited by: §II.
 [39] (2016-04) Update or wait: how to keep your data fresh. In IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications, pp. 1–9. External Links: Document Cited by: §I, §III-B.
 [40] (2019) Age-delay tradeoffs in queueing systems. arXiv preprint arXiv:1911.05601. Cited by: §I.
 [41] (2011-01) Wide-area monitoring, protection, and control of future electric power networks. Proceedings of the IEEE 99 (1), pp. 80–93. External Links: Document, ISSN 1558-2256 Cited by: §I.
 [42] (2018) Joint task offloading and resource allocation for multi-server mobile-edge computing networks. IEEE Transactions on Vehicular Technology 68 (1), pp. 856–868. Cited by: §VI-A.
 [43] (2019) Age of information for discrete time queues. CoRR abs/1901.10463. External Links: Link, 1901.10463 Cited by: §I.
 [44] (2019) Realtime reconstruction of counting process through queues. CoRR abs/1901.08197. External Links: Link, 1901.08197 Cited by: §I.
 [45] (2019) Towards assigning priorities in queues using age of information. CoRR abs/1906.12278. External Links: Link, 1906.12278 Cited by: §I.
 [46] (201506)