Age of Processing: Age-driven Status Sampling and Processing Offloading for Edge Computing-enabled Real-time IoT Applications

03/24/2020
by Rui Li, et al.

The freshness of status information is of great importance for time-critical Internet of Things (IoT) applications. A metric measuring status freshness is the age-of-information (AoI), which captures the time elapsed from the status being generated at the source node (e.g., a sensor) to the latest status update. However, in intelligent IoT applications such as video surveillance, the status information is revealed only after some computation-intensive and time-consuming data processing operations, which affect the status freshness. In this paper, we propose a novel metric, age-of-processing (AoP), to quantify such status freshness, which captures the time elapsed since the generation of the latest received processed status data. Compared with AoI, AoP further takes the data processing time into account. Since an IoT device has limited computation and energy resources, the device can choose to offload the data processing to a nearby edge server under a constrained status sampling frequency. We aim to minimize the long-term average AoP by jointly optimizing the status sampling frequency and the processing offloading policy. We formulate this online problem as an infinite-horizon constrained Markov decision process (CMDP) with average reward criterion. We then transform the CMDP problem into an unconstrained Markov decision process (MDP) by leveraging a Lagrangian method, and propose a Lagrangian transformation framework for the original CMDP problem. Furthermore, we integrate the framework with perturbation based refinement to achieve the optimal policy of the CMDP problem. Extensive numerical evaluations show that the proposed algorithm outperforms the benchmarks, with an average AoP reduction of up to 30%.


I Introduction

The rapid proliferation of Internet of Things (IoT) devices has boosted the development of various networked monitoring and cyber-physical system applications [6], [33], such as crowdsourcing in sensor networks [23], phasor updating in smart grid systems [41], and autonomous driving in smart transportation systems [48]. For these IoT applications, the freshness of the status information of the physical process at the operation nodes is of fundamental importance for accurate monitoring and control.

Age of information (AoI), which is also often referred to as age, was proposed to quantify the status freshness of the physical process of interest [20], [21]. More specifically, AoI is generally defined as the time elapsed since the generation, at the source node (e.g., a sensor), of the latest status update successfully received at the destination (e.g., a controller). Extensive works have focused on minimizing the age under various queueing models [26, 40, 1, 45, 44, 37, 43, 32, 16, 18, 13, 27, 9, 17, 36]. It is worth noting that AoI minimization depends on the status update frequency and differs from conventional design principles (e.g., providing low delay). Specifically, on the one hand, updating the status at a low frequency results in a small message queueing delay since the queue is almost always empty; however, the destination node has a large age because of the infrequent status updates. On the other hand, updating the status at a high frequency results in a large queueing delay by Little's law [25], and the destination node again has a large age because each status update suffers a large queueing delay. Therefore, different from the queueing delay, which increases with the status sampling frequency, AoI exhibits more complex patterns as a metric for status freshness and is more challenging to optimize [19].

For many intelligent real-time IoT applications, the status freshness depends not only on the status update frequency captured by AoI, but also on the status data processing operations. For example, in smart video surveillance, a status update (e.g., a sampled image) does not take effect until the useful information embedded in the image is extracted by data processing operations (e.g., AI-based image recognition), which are computationally expensive and time-consuming.

Since an IoT device typically has limited computation and storage capacities, edge computing can be leveraged to facilitate real-time data processing. In this case, the IoT device can offload the data processing operations to nearby mobile edge computing (MEC) platforms [24], which utilize edge servers deployed at the edge of radio access networks (e.g., at base stations (BSs) or access points (APs)) to execute computing tasks. Specifically, the IoT device offloads the status update to the edge server through a wireless channel for further data processing, and the edge server then sends the final results back to the destination node. Therefore, the processing offloading also affects the status freshness.

To capture the status freshness considering data processing in the edge computing-enabled real-time IoT applications, we propose a new metric, age-of-processing (AoP), which is defined as the time elapsed since the generation of the freshest status update data until it is processed and finally takes effect at the destination node. Compared with conventional AoI, the AoP takes the additional data processing time in status update into account.

In this paper, we aim to minimize the AoP by jointly optimizing the data processing offloading decision and the status sampling frequency. Specifically, offloading the data processing can reduce the processing time by utilizing the edge server's computation resources, but it incurs additional transmission time, which depends on the wireless channel state between the source node (e.g., the IoT device) and the edge server. When the wireless channel state is good, offloading the data processing operations to the edge server incurs a short transmission time and can reduce the overall processing time. However, when the channel state is bad, the transmission time between the source node and the edge server is not negligible, and the IoT device may instead process the status update data on its local processor or wait for a good channel state. Therefore, we need to carefully decide the optimal offloading strategy under different channel states to minimize the AoP.

Moreover, the status sampling frequency also has an essential impact on the AoP. Specifically, when the previous status update is still under processing, a new update needs to wait in the queue, and hence becomes stale while waiting. Therefore, it can be better not to generate a new sample while the server is busy. The authors in [7] proposed a status sampling policy called the zero-wait policy, which samples a new update immediately after the previous update takes effect. However, the authors in [46], [39] showed that the zero-wait policy can be far from age-optimal in some cases. Hence, how to optimize the status sampling frequency in the presence of data processing is still an open question. Furthermore, the status sampling process consumes energy of the IoT device. It is necessary to introduce a constraint on the sampling frequency due to the limited energy budget of IoT devices, which makes it harder to obtain the optimal status sampling policy for minimizing the AoP.

By addressing the challenges above, we achieve the following key contributions:

  1. We propose a new metric, age-of-processing (AoP), to capture the status freshness considering data processing in real-time IoT applications. In order to minimize the average AoP, we formulate the joint status sampling and processing offloading problem as an infinite-horizon constrained Markov decision process (CMDP) with the maximum sampling frequency constraint of the IoT device.

  2. We relax the challenging CMDP problem into an unconstrained MDP problem using the Lagrangian method which significantly simplifies the original CMDP problem. We then propose a Lagrangian transformation framework to derive the optimal status sampling and processing offloading policy under the optimal Lagrangian multiplier.

  3. Building upon the proposed Lagrangian transformation framework, we develop stochastic approximation based policy iteration algorithms with perturbation based refinement to achieve the optimal policy of the CMDP problem.

  4. We provide extensive simulations to illustrate the structural properties of the optimal policy, and show that the proposed improved algorithm outperforms the benchmarks, with an average AoP reduction up to 30%.

The rest of the paper is organized as follows. In Sec. II, we discuss the related works. In Sec. III, we present the system model and formulate the AoP minimization problem as a CMDP problem. In Sec. IV, we transform the CMDP problem into an unconstrained MDP problem by leveraging the Lagrangian method. In Sec. V, we first propose a Lagrangian transformation framework for the original CMDP problem, and then improve it with perturbation based refinement to achieve the optimal policy. We show our simulation results in Sec. VI, and conclude the paper in Sec. VII.

II Related work

Age-of-information (AoI) was introduced in the early 2010s as a new metric to characterize the freshness of the information that a system has about a remotely observed process [20]. Since then, a large body of research has applied queueing theory to analyze the AoI in various system settings. In [19], the authors obtained theoretical results for the average AoI when status updates are served under the first-come-first-served (FCFS) principle, specifically for the M/M/1, M/D/1, and D/M/1 queueing models. After that, different queueing models, such as G/G/1/1 [37], M/G/1 [32], and D/G/1 [13], were also studied. A new metric, peak age, was introduced in [15], and the authors in [1] obtained the distribution of peak age in PH/PH/1/1 and M/PH/1/2 queues. In [16], the authors studied reliable transmission under peak-age violation guarantees.

Another branch of research on AoI considers energy-harvesting constraints, since the IoT device (e.g., a sensor) is usually energy limited and the sampling process consumes energy [38, 12, 5, 4]. In [38], the authors derived an optimal transmission policy in an energy harvesting status update system, which is formulated as an MDP problem. In [12], the authors proposed a reinforcement learning algorithm to learn the system parameters and the status update policy for an energy harvesting transmitter with a finite-capacity battery. The authors in [5], [4] analyzed the scenario where an energy harvesting sensor with random or incremental battery recharges sends measurement updates to a destination, and showed that the optimal update policy follows a renewal structure. All the above works assume that the status update takes effect once it is received at the destination node, and the age is immediately set down to the time elapsed from status generation to its reception.

For computation-intensive applications (e.g., autonomous driving), however, the status update (e.g., a video clip) needs further data processing to reveal the useful features. Hence, the data processing time also affects the age. However, there are very limited research efforts in this area. In [8], the authors considered soft updates in an information updating system; for both exponentially and linearly decaying age, they derived the optimal sampling schemes subject to a sampling frequency constraint. In [22], the authors studied the AoI for computation-intensive messages with MEC, and derived the closed-form average AoI for exponentially distributed computing time. In [2], the authors jointly optimized the information freshness (age) and the completion time in a vehicular network; nevertheless, the computation time is not taken into account in the age. In [34], the authors proposed a performance metric called age of task (AoT) to evaluate the temporal value of computation tasks. By jointly considering task scheduling, computation offloading and energy consumption, they proposed a light-weight task scheduling algorithm. However, it is an offline policy where the task arrival time is known in advance.

Different from existing research efforts, in this paper we expand the concept of AoI to AoP by taking the data processing time into consideration. We further consider offloading the data processing to the MEC server, and minimize the total average AoP by optimizing the status sampling and processing offloading policy.

III Model and formulation of AoP minimization

III-A System Model

Consider a real-time IoT status monitoring and control system for computation-intensive applications. The IoT device (a.k.a. the sensor) monitors the current status of a physical process (e.g., a camera records images of the traffic situation at a crossroad), which needs further data processing. As shown in Fig. 1, the IoT device can choose to process the raw data locally at its own processor or offload it to a mobile edge server in proximity. The data processing operation reveals the hidden feature (e.g., the congestion level at the crossroad) in the raw data, which we refer to as knowledge and which is then transmitted to an operator for accurate control. After receiving the knowledge, the operator sends an acknowledgment (ACK) to the IoT device to trigger the sampling of a new status update.

We define the time elapsed from the status generation at the IoT device to the latest knowledge received by the operator as the age-of-processing (AoP), which is maintained by the operator to capture the status freshness. Compared to the traditional AoI, the AoP takes the data processing time into account, which is affected by the data processing offloading policy.

The IoT device follows the generate-at-will sampling policy [46], under which the IoT device can take a new sample whenever it prefers, and does not generate a new status update while the previous update is under processing, in order to avoid unnecessary waiting time. Suppose the IoT device samples a new status update $n$ at time $S_n$, and then decides where to send the raw data (i.e., to its local processor or to the edge server) for further data processing.

For the status update $n$, we denote its data processing task by a pair $(d_n, c_n)$, where $d_n$ is the input data size of the status packet and $c_n$ is the total number of CPU cycles required to compute this task.

III-A1 Local processing

We assume that the sensor is equipped with a local processor (e.g., an embedded CPU) for some necessary computations. If the sensor chooses to process the status update $n$ locally, then the operation time can be formulated as

(1) $t^{l}_{n} = \frac{c_n}{f_l}$,

where $f_l$ is the CPU frequency of the local processor. After data processing, the local processor transmits the processed status result to the operator. We assume that the data size of the result is quite small (e.g., the result of object classification usually takes only several bits). Therefore, the time of transmitting the result to the operator is negligible.

III-A2 Edge offloading

Fig. 1: Status sampling and processing procedure.

If the sensor chooses to offload the raw data to the edge server, it incurs extra time for transmitting the computation input data via the wireless connection. According to [29], the offloading rate can be formulated as

(2) $r_n = B \log_2 \left( 1 + \frac{p_n g_n}{\sigma^2_n} \right)$,

where $B$ is the channel bandwidth and $p_n$ is the transmission power for update $n$. Furthermore, $g_n$ denotes the wireless channel gain between the sensor and the edge server, which can be generated using a distance-dependent path-loss model given in [14],

(3) $g_n = g_0 \, D^{-\theta}$,

where $g_0$ is the path-loss constant, $D$ is the distance between the sensor and the edge server, and $\theta$ is the path-loss exponent; $\sigma^2_n$ is the total background noise and interference power when transmitting the data of update $n$. Therefore, the transmission time is computed as

(4) $t^{tr}_n = \frac{d_n}{r_n}$,

and the data processing time at the edge server is

(5) $t^{e}_n = \frac{c_n}{f_e}$,

where $f_e$ is the CPU frequency of the edge server. As mentioned before, we ignore the time of transmitting the processed status result from the edge server to the operator. Following (4) and (5), we can compute the time overhead of the edge offloading approach as

(6) $t^{o}_n = t^{tr}_n + t^{e}_n = \frac{d_n}{r_n} + \frac{c_n}{f_e}$.

Throughout this paper, we assume that the computation capacities of both the local and edge servers are stable (i.e., $f_l$ and $f_e$ are both constants). This is reasonable since the sensor usually carries out a dedicated sensing task, and the edge server usually allocates a resource block (i.e., a virtual machine) of fixed size to a certain computing task. Besides, we assume that all status update tasks have the same input data size $d$ and required computation $c$ (we leave heterogeneous status update tasks with different $d_n$ and $c_n$ for future work). For example, the input image size for an object-recognition-based surveillance task is the same, with almost the same CPU cycles for processing each image. For the wireless channel, we assume that the transmission power $p$ is the same for all updates $n$. The total background noise and interference power $\sigma^2_n$ influences the wireless channel state; it is unknown and changes stochastically. The channel state has a critical impact on the data offloading policy. Intuitively, when the wireless channel state is good (e.g., $\sigma^2_n$ is small), the IoT device tends to offload the status data to the edge server, using the abundant computing resource of the edge server to reduce the processing time. When the wireless channel is bad (e.g., $\sigma^2_n$ is large), the transmission time is relatively large, and hence the IoT device would prefer to process the update data locally to avoid the large transmission time.
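To make the timing model concrete, the sketch below evaluates (1), (2), and (4)-(6) numerically in Python. It is our own illustration rather than code from the paper: the parameter values follow the simulation setup of Table I in Sec. VI, while the channel gain is an arbitrary placeholder.

    import math

    def local_time(c, f_l):
        # (1): local processing time = required CPU cycles / local CPU frequency
        return c / f_l

    def offload_rate(B, p, g, noise):
        # (2): achievable offloading rate on the sensor-to-edge link
        # (all power quantities in linear units, e.g., watts)
        return B * math.log2(1 + p * g / noise)

    def edge_offloading_time(d, c, f_e, rate):
        # (6) = (4) + (5): transmission time plus edge processing time
        return d / rate + c / f_e

    # Values from Table I: d = 500 KB, c = 1000 Megacycles, f_l = 1 GHz,
    # f_e = 20 GHz, B = 20 MHz, p = 20 dBm, noise = -100 dBm.
    d, c = 500e3 * 8, 1000e6            # bits, CPU cycles
    f_l, f_e, B = 1e9, 20e9, 20e6       # Hz
    p = 10 ** (20 / 10) * 1e-3          # 20 dBm -> 0.1 W
    noise = 10 ** (-100 / 10) * 1e-3    # -100 dBm -> 1e-13 W
    g = 1e-9                            # placeholder channel gain (assumed)

    rate = offload_rate(B, p, g, noise)
    print(local_time(c, f_l), edge_offloading_time(d, c, f_e, rate))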

Fig. 2: Evolution of the age-of-processing (AoP).

We depict the evolution of the age-of-processing in Fig. 2. Suppose a new status update $n$ is sampled at time $S_n$. If the raw data is processed locally, then the processing time is $t^{l}$ (since all status update tasks require the same computation $c$, we write $t^{l}_n$ simply as $t^{l}$ for all updates $n$). If the raw data is processed at the edge server, the total processing time is $t^{o}_n$. Therefore, the processed result of update $n$ is delivered at time $D_n = S_n + t_n$, where $t_n \in \{ t^{l}, t^{o}_n \}$ is the realized processing time of update $n$. After the operator receives update $n$, the sensor may insert a waiting time $w_n \in [0, w_{\max}]$ before sampling the new status update at time $S_{n+1} = D_n + w_n$, where $w_{\max}$ is the maximum waiting time under a sampling frequency constraint. The sensor can switch to a low-power sleep mode during the waiting period $w_n$.

At any time $t$, the freshest status update at the operator is the one generated at time

(7) $U(t) = \max \{ S_n : D_n \le t \}$.

Then the age-of-processing is defined as

(8) $\Delta(t) = t - U(t)$.

As shown in Fig. 2, the AoP follows a stochastic process which increases linearly with $t$ while the system waits for the next sample or the data is under processing, and then jumps downward when a status update is delivered to the operator. Therefore, the curve of the AoP has a zig-zag shape. More specifically, status update $n$ is sampled at time $S_n$ and is received by the operator at time $D_n$. Therefore, the AoP at time $D_n$ is $\Delta(D_n) = D_n - S_n = t_n$. After that, the AoP continues to increase linearly with time while the sensor is waiting or the update data is under processing. Finally, the AoP reaches $t_n + w_n + t_{n+1}$ right before the processed result of status update $n+1$ is delivered. Then, at time $D_{n+1}$, the AoP drops to $t_{n+1}$.
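The sawtooth in Fig. 2 can be reproduced numerically from the processing and waiting times alone. The short sketch below is our own illustration with made-up values for $t_n$ and $w_n$; it accumulates, frame by frame, the parallelogram and triangle areas that are formalized as (10) and (11) in the next subsection, and reports the resulting time-average age.

    # Average AoP from the per-frame area decomposition of Fig. 2.
    # t[n]: processing time of update n; w[n]: waiting time inserted after
    # update n is delivered. All values are illustrative only.
    t = [1.0, 0.3, 1.0, 0.3, 1.0]     # seconds
    w = [0.2, 0.4, 0.2, 0.4]          # seconds

    area, elapsed = 0.0, 0.0
    for n in range(len(w)):
        frame = w[n] + t[n + 1]       # frame length D_{n+1} - D_n
        area += t[n] * frame          # parallelogram: the age floor t_n
        area += 0.5 * frame ** 2      # triangle: linear growth within the frame
        elapsed += frame

    print("average AoP:", area / elapsed)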

III-B CMDP Formulation

In this subsection, we focus on specifying the optimal status sampling and computation offloading policy to minimize the average AoP of the system discussed above.

In most existing works, the long-term average age is a key performance metric to measure the long-run information freshness, which is defined as

(9) $\bar{\Delta} = \limsup_{T \to \infty} \frac{1}{T} \int_{0}^{T} \Delta(t) \, dt$.

Intuitively, $\bar{\Delta}$ is the time-average of the shaded area under the AoP envelope. To compute the average AoP, we decompose this area into a series of areas between consecutive delivery times. As shown in Fig. 2, the light shaded area is a parallelogram, which equals

(10) $Q_n = t_n \left( w_n + t_{n+1} \right)$,

and the dark shaded area is a triangle having the area

(11) $A_n = \frac{1}{2} \left( w_n + t_{n+1} \right)^2$.

Therefore, the average AoP can be calculated as

(12) $\bar{\Delta} = \limsup_{N \to \infty} \frac{\sum_{n=0}^{N-1} \left( Q_n + A_n \right)}{\sum_{n=0}^{N-1} \left( w_n + t_{n+1} \right)}$.

Note that minimizing $\bar{\Delta}$ is a long-term stochastic problem. At each delivery time $D_n$, the operator maintains the age $t_n$, and then decides the inserted waiting time $w_n$ before sampling the next status update $n+1$. Besides, we assume that the IoT device also determines at time $D_n$ where to process the next update $n+1$, which will affect the value of $t_{n+1}$.

Markov decision process: As mentioned before, the wireless channel state between the sensor and the edge server changes stochastically. Let $h_n$ be the channel state for update $n$, with a finite state space $\mathcal{H}$ (since the data size of all status update packets is the same, we can represent the channel state by the transmission time of the update data in (4); since the transmission time in (4) is continuous, it would result in an infinite state space in the MDP, so for simplicity we discretize the transmission time into $|\mathcal{H}|$ channel states), which is influenced by $\sigma^2_n$. Unlike the assumption of an i.i.d. channel state process in [49], we consider a general case where the process $\{h_n\}$ is a stationary and ergodic Markov chain with a transition matrix $\mathbf{P}$ [47] (we assume that the statistics of the channel are known in advance, since $\mathbf{P}$ can be estimated through channel training). The element $P_{ij}$ of the transition matrix $\mathbf{P}$ is the probability of moving from channel state $i$ to state $j$.

At each delivery time $D_n$, we denote $s_n \in \mathcal{S}$ as the current system state, where $\mathcal{S}$ is the system state space (we also discretize the waiting time, since a discrete waiting time is much easier to execute for IoT devices, and an infinite waiting-time space would result in an infinite MDP state space, which is difficult to solve). Then the sensor chooses an action $u_n = (w_n, o_n)$ from the action space $\mathcal{U}$, where $w_n$ is the inserted waiting time and $o_n \in \{0, 1\}$ is the offloading decision for the next update. When $o_n = 1$, the sensor chooses to offload the status update to the edge server, and when $o_n = 0$, the sensor chooses to process the update locally. We then define the reward function for taking action $u_n$ at state $s_n$ as

(13) $r(s_n, u_n) = \frac{Q_n + A_n}{w_n + t_{n+1}} = t_n + \frac{w_n + t_{n+1}}{2}$.

The system then evolves to the next state $s_{n+1}$, which only depends on the previous system state $s_n$ and the action $u_n$. More specifically, the transition of the channel state is

(14) $\Pr \left( h_{n+1} = j \mid h_n = i \right) = P_{ij}$,

where $P_{ij}$ is the element of the channel transition matrix $\mathbf{P}$, and the age evolves according to

(15) $t_{n+1} = (1 - o_n) \, t^{l} + o_n \left( t^{tr}_{n+1} + t^{e} \right)$,

where the transmission time $t^{tr}_{n+1}$ is determined by the channel state $h_{n+1}$.
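A minimal sketch of one decision epoch under these dynamics is given below; it is our own illustration, with an invented three-state channel transition matrix and invented timing constants, of how (13)-(15) fit together.

    import random

    P = [[0.6, 0.3, 0.1],    # example channel transition matrix for (14)
         [0.2, 0.6, 0.2],
         [0.1, 0.3, 0.6]]
    t_local = 1.0            # local processing time t^l (assumed)
    t_edge = 0.05            # edge processing time t^e (assumed)
    t_tr = [0.1, 0.4, 0.8]   # transmission time per channel state (assumed)

    def step(age, h, waiting, offload):
        """One epoch: draw the next channel state by (14), compute the next
        age by (15), and return the per-frame reward of (13)."""
        h_next = random.choices(range(3), weights=P[h])[0]
        t_next = (t_tr[h_next] + t_edge) if offload else t_local
        frame = waiting + t_next
        reward = age + frame / 2.0       # time-averaged area of the frame
        return reward, t_next, h_next

    reward, age, h = step(age=1.0, h=0, waiting=0.2, offload=True)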

Stationary status sampling and processing offloading policy: Given the system state $s_n$, the IoT device determines the sampling and offloading action $u_n$ according to the following policy.

Definition 1: A stationary status sampling and computation offloading policy $\pi$ is defined as a mapping from the system state space $\mathcal{S}$ to the control action space $\mathcal{U}$, i.e., $u_n = \pi(s_n)$, which is independent of the update sequence number $n$.

In this paper, we focus on the stationary policy due to its low complexity for practical implementation (e.g., no need to record long historical information for decision making). Under a given stationary policy $\pi$, the average AoP can be calculated as

(16) $\bar{\Delta}(\pi) = \limsup_{N \to \infty} \frac{\mathbb{E}_{\pi} \left[ \sum_{n=0}^{N-1} \left( Q_n + A_n \right) \right]}{\mathbb{E}_{\pi} \left[ \sum_{n=0}^{N-1} \left( w_n + t_{n+1} \right) \right]}$,

where the expectation is taken with respect to the measure induced by the policy $\pi$, and the limsup operation means that we focus on the worst case.

Sampling frequency constraint: Due to the limited energy resource of the sensor, it is impossible to sample status updates at a very high frequency. Following the works [39] and [11], we introduce a sampling frequency constraint

(17) $\bar{T}(\pi) = \liminf_{N \to \infty} \frac{1}{N} \mathbb{E}_{\pi} \left[ \sum_{n=0}^{N-1} \left( w_n + t_{n+1} \right) \right] \ge T_{\min} \triangleq \frac{1}{f_{\max}}$,

where $T_{\min}$ is the minimum average sampling duration and $f_{\max}$ is the maximum allowed average status sampling frequency due to a long-term average resource constraint. We should emphasize that in practice it is hard for the sensor itself to monitor its runtime energy expenditure, and hence we consider the maximum sampling frequency constraint instead of an energy budget constraint in the formulation.

AoP minimization: We seek to find the optimal stationary status sampling and computation offloading policy that minimizes the average AoP under the maximum sampling frequency constraint at the sensor, as follows:

(18) $\min_{\pi} \ \bar{\Delta}(\pi) \quad \text{s.t.} \quad \bar{T}(\pi) \ge T_{\min}$.

Problem (18) is a constrained Markov decision process (CMDP). It is computationally intractable to find the optimal policy for problem (18) directly, since the final value of a policy $\pi$ can only be obtained at the end of the infinite trajectory; this is because the denominator of (16) is the sum of the frame lengths $w_n + t_{n+1}$ over all status updates $n$.

To tackle this difficulty, we relax problem (18) as:

(19) $\min_{\pi} \ \bar{\Delta}_r(\pi) \quad \text{s.t.} \quad \bar{T}(\pi) \ge T_{\min}$,

where

(20) $\bar{\Delta}_r(\pi) = \limsup_{N \to \infty} \frac{1}{N} \mathbb{E}_{\pi} \left[ \sum_{n=0}^{N-1} r(s_n, u_n) \right]$

and

(21) $\bar{T}(\pi) = \liminf_{N \to \infty} \frac{1}{N} \mathbb{E}_{\pi} \left[ \sum_{n=0}^{N-1} \left( w_n + t_{n+1} \right) \right]$.

Obviously, the optimal policy for problem (19) is not necessarily the optimal policy for problem (18). If $\bar{\Delta}(\pi)$ were smaller than $\bar{\Delta}_r(\pi)$ for every policy $\pi$, then the solution of problem (19) would be an upper bound for the original problem (18). However, the direction of the inequality between $\bar{\Delta}(\pi)$ and $\bar{\Delta}_r(\pi)$ cannot be asserted in general; it depends on the values of $t_n$ and $w_n$ for all $n$. Nevertheless, the extensive simulation results in Sec. VI show that the ratio between $\bar{\Delta}(\pi)$ and $\bar{\Delta}_r(\pi)$ is very close to 1, which indicates that the relaxed problem (19) is a good approximation of the original problem (18).

IV Unconstrained MDP Transformation

It is well known that solving a CMDP problem directly is quite challenging [3]. In this section we will transform the CMDP problem (19) to an unconstrained MDP problem by leveraging the Lagrangian method.

We first describe problem (19) in terms of a CMDP. At each delivery time $D_n$, which we also refer to as decision epoch $n$, the IoT device observes the current system state $s_n = (t_{n-1}, w_{n-1}, t_n, t^{tr}_n)$, where $t_{n-1}$ is the processing time of the previous status update $n-1$, $w_{n-1}$ is the waiting time inserted before sampling update $n$, and $t_n$ and $t^{tr}_n$ are the current processing time and transmission time, respectively. After observing the current state $s_n$, the IoT device selects an action $u_n$ following a policy $\pi$, where $u_n = \pi(s_n)$. We also refer to the policy as a state-action mapping function. After that, the IoT device receives an immediate reward from the reward function

(22) $r(s_n, u_n) = \frac{Q_n + A_n}{w_n + t_{n+1}}$,

which is the time-averaged area of the $n$-th frame, and then the system evolves to the next state $s_{n+1}$. We can see that all the elements in $s_{n+1}$ only depend on the previous state $s_n$ and action $u_n$. Therefore, the random process $\{s_n\}$ is a controlled Markov process. The objective of problem (19) is to find an optimal state-action mapping function $\pi$ to minimize the infinite-horizon average reward

(23) $\bar{r}(\pi) = \limsup_{N \to \infty} \frac{1}{N} \mathbb{E}_{\pi} \left[ \sum_{n=0}^{N-1} r(s_n, u_n) \right]$,

while committing to the sampling constraint $\bar{T}(\pi) \ge T_{\min}$.

A major challenge in obtaining the optimal policy for problem (19) is the sampling frequency constraint. To overcome this difficulty, we first transform problem (19) into an unconstrained Markov decision process (MDP) by introducing a Lagrange multiplier [49]. We define the immediate Lagrange reward of update $n$ as

(24) $r_{\lambda}(s_n, u_n) = r(s_n, u_n) - \lambda \left( w_n + t_{n+1} \right)$,

where $\lambda \ge 0$ is the Lagrange multiplier. Then, the average Lagrange reward under policy $\pi$ is given by

(25) $\bar{r}_{\lambda}(\pi) = \limsup_{N \to \infty} \frac{1}{N} \mathbb{E}_{\pi} \left[ \sum_{n=0}^{N-1} r_{\lambda}(s_n, u_n) \right]$.

By introducing the Lagrange multiplier, we now have an unconstrained MDP problem with the objective of minimizing the average Lagrange cost

(26) $\min_{\pi} \ \bar{r}_{\lambda}(\pi)$.

Let $\pi_{\lambda}$ be the optimal policy of problem (26) when the Lagrange multiplier is $\lambda$. Define $g(\lambda) = \bar{r}_{\lambda}(\pi_{\lambda})$, $T(\lambda) = \bar{T}(\pi_{\lambda})$, and $\bar{\Delta}(\lambda) = \bar{\Delta}_r(\pi_{\lambda})$. For the above Lagrange transformation, we can show the following result.

Lemma 1: $g(\lambda)$ is monotone non-increasing, while $T(\lambda)$ and $\bar{\Delta}(\lambda)$ are monotone non-decreasing in $\lambda$.

Proof.

The monotone non-increasing property of $g(\lambda)$ and the non-decreasing property of $T(\lambda)$ are a consequence of the following fundamental inequalities:

(27) $\bar{\Delta}(\lambda_1) - \lambda_1 T(\lambda_1) \le \bar{\Delta}(\lambda_2) - \lambda_1 T(\lambda_2), \qquad \bar{\Delta}(\lambda_2) - \lambda_2 T(\lambda_2) \le \bar{\Delta}(\lambda_1) - \lambda_2 T(\lambda_1)$

for any positive $\lambda_1 \le \lambda_2$. The first inequality follows since the policy $\pi_{\lambda_1}$ minimizes problem (26) with multiplier $\lambda_1$, and the second inequality follows since the policy $\pi_{\lambda_2}$ minimizes problem (26) with multiplier $\lambda_2$. Adding the two inequalities and rearranging yields

(28) $\left( \lambda_2 - \lambda_1 \right) \left( T(\lambda_2) - T(\lambda_1) \right) \ge 0$.

Therefore, we have $T(\lambda_2) \ge T(\lambda_1)$; combining the second inequality of (27) with $\lambda_1 \le \lambda_2$ and $T(\lambda_1) \ge 0$ also gives $g(\lambda_2) \le g(\lambda_1)$. As for $\bar{\Delta}(\lambda)$, we first assume that it is not monotone non-decreasing. Then there exist $\lambda_1 < \lambda_2$ such that $\bar{\Delta}(\lambda_1) > \bar{\Delta}(\lambda_2)$. But $T(\lambda_1) \le T(\lambda_2)$, whence,

(29) $\bar{\Delta}(\lambda_2) - \lambda_1 T(\lambda_2) < \bar{\Delta}(\lambda_1) - \lambda_1 T(\lambda_1) = g(\lambda_1)$.

Consequently, we come to a contradiction, since the optimality of $\pi_{\lambda_1}$ for problem (26) with multiplier $\lambda_1$ requires $\bar{\Delta}(\lambda_2) - \lambda_1 T(\lambda_2) \ge g(\lambda_1)$. Finally, we have $\bar{\Delta}(\lambda_1) \le \bar{\Delta}(\lambda_2)$. ∎

Lemma 1 reveals important relationships between the Lagrange multiplier $\lambda$ and the minimum sampling duration as well as the average AoP, which help us solve the MDP problem (26). First, $T(\lambda)$ is non-decreasing in $\lambda$. Therefore, the optimal policy $\pi_{\lambda}$ to problem (26) under Lagrange multiplier $\lambda$ corresponds to a certain $T(\lambda)$. When $T(\lambda) < T_{\min}$, the policy $\pi_{\lambda}$ is not a feasible solution to the original problem (18). Then, we can increase the value of $\lambda$ until $T(\lambda) \ge T_{\min}$. Furthermore, the average AoP $\bar{\Delta}(\lambda)$ is also non-decreasing in $\lambda$. Since our objective is to find an optimal policy that minimizes $\bar{\Delta}$ subject to the sampling constraint, it is equivalent to finding the optimal Lagrange multiplier $\lambda^*$, such that

(30) $\lambda^* = \inf \left\{ \lambda \ge 0 : T(\lambda) \ge T_{\min} \right\}$.

In order to find the optimal Lagrange multiplier $\lambda^*$, we need to solve the following two subproblems:

Subproblem 1: how to find the optimal policy $\pi_{\lambda}$ for the MDP problem (26) given a Lagrange multiplier $\lambda$;

Subproblem 2: how to update $\lambda$ such that it converges to $\lambda^*$.

In summary, the Lagrangian transformation method transforms the CMDP problem (19) to the unconstrained MDP problem (26) which is much easier to solve. Furthermore, by exploring the relationships between the Lagrangian multiplier and the sampling frequency as well as the AoP, we show that the MDP problem (26) can be decomposed into two subproblems. In the next section we will first solve the two subproblems for (26), and then we propose an algorithm to obtain the optimal policy for the original CMDP problem (19).
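In code, the transformation amounts to a one-line change of the per-frame reward. The sketch below, in our own notation rather than the paper's, is the quantity that the algorithm of the next section optimizes for a fixed multiplier:

    def lagrange_reward(reward, frame_length, lam):
        # (24): immediate Lagrange reward of one frame. Since we minimize
        # the average Lagrange cost, subtracting lam * frame_length favors
        # longer sampling durations as lam grows, which is why T(lambda)
        # is non-decreasing (Lemma 1).
        return reward - lam * frame_length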

V Optimal policy for the CMDP problem

In this section, we first propose a policy iteration algorithm to derive the optimal policy for Subproblem 1. After that, we apply the Robbins-Monro algorithm to derive the optimal Lagrangian multiplier for Subproblem 2. Finally, we propose an algorithm to derive the optimal policy for the original CMDP problem (19).

Solving Subproblem 1. Given $\lambda$, problem (26) is a Markov decision process with an average reward criterion, which has been studied in many excellent works, e.g., [28] and [10]. We restrict the stationary policy to the stationary deterministic policy. A stationary deterministic policy maps each state to a single action; that is, given a state $s$, the output of the policy $\pi$ is a single action, not a probability distribution over the action space. The stationary deterministic policy simplifies the policy search space and guarantees the existence of the optimal policy for the MDP problem (26).

Applying a stationary deterministic policy $\pi$ to a controlled Markov process yields a Markov process with a stationary transition probability matrix $\mathbf{P}^{\pi}$, where the element $P^{\pi}_{s s'}$ is the state transition probability from $s$ to $s'$ under policy $\pi$ [31]. Given policy $\pi$, we also have a reward vector $\mathbf{r}^{\pi}$, where the element $r^{\pi}(s)$ is the immediate reward at state $s$ with the chosen action $\pi(s)$. A gain vector $\mathbf{g}^{\pi}$ is an average reward vector, whose element $g^{\pi}(s)$ is the average reward when starting at the initial state $s$:

(31) $g^{\pi}(s) = \lim_{N \to \infty} \frac{1}{N} \mathbb{E}_{\pi} \left[ \sum_{n=0}^{N-1} r_{\lambda} \left( s_n, \pi(s_n) \right) \,\middle|\, s_0 = s \right]$.

Moreover, given $\lambda$, the MDP problem (26) has the following Bellman optimality equation:

(32) $g^{*}(s) + b^{*}(s) = \min_{u \in \mathcal{U}} \left[ r_{\lambda}(s, u) + \sum_{s' \in \mathcal{S}} P_{s s'}(u) \, b^{*}(s') \right]$,

where $P_{s s'}(u)$ is the transition probability from state $s$ to state $s'$ under action $u$, and the bias vector $\mathbf{b}$ is the expected total difference between the immediate reward and the average reward [28]. Therefore, the optimal policy can be obtained by

(33) $\pi_{\lambda}(s) = \arg\min_{u \in \mathcal{U}} \left[ r_{\lambda}(s, u) + \sum_{s' \in \mathcal{S}} P_{s s'}(u) \, b^{*}(s') \right]$.

We propose the policy iteration algorithm to solve (33), as shown in Algorithm 1. The key idea of Algorithm 1 is to iteratively perform policy evaluation and policy improvement to drive the update dynamics to converge to the optimal policy in (33).

0:  Lagrangian multiplier $\lambda$;
0:  The optimal policy $\pi_{\lambda}$ of (26) when given $\lambda$;
1:  Set $k = 0$ and select an arbitrary stationary deterministic policy $\pi_0$.
2:  (Policy evaluation) Obtain the average reward vector $\mathbf{g}^{k}$, the bias vector $\mathbf{b}^{k}$, and an auxiliary vector $\mathbf{v}^{k}$ by solving the following set of linear equations for $\pi_k$:
(34) $(\mathbf{I} - \mathbf{P}^{\pi_k}) \, \mathbf{g}^{k} = \mathbf{0}$
(35) $\mathbf{g}^{k} + (\mathbf{I} - \mathbf{P}^{\pi_k}) \, \mathbf{b}^{k} = \mathbf{r}^{\pi_k}$
(36) $\mathbf{b}^{k} + (\mathbf{I} - \mathbf{P}^{\pi_k}) \, \mathbf{v}^{k} = \mathbf{0}$
where $\mathbf{I}$ is the identity matrix with the same dimension as $\mathbf{P}^{\pi_k}$; $\mathbf{P}^{\pi_k}$ and $\mathbf{r}^{\pi_k}$ are known once $\pi_k$ is given.
3:  (Policy improvement) For each state $s$, choose $\pi_{k+1}(s)$ to satisfy
(37) $\pi_{k+1}(s) \in \arg\min_{u \in \mathcal{U}} \left[ r_{\lambda}(s, u) + \sum_{s' \in \mathcal{S}} P_{s s'}(u) \, b^{k}(s') \right]$,
setting $\pi_{k+1}(s) = \pi_k(s)$ whenever possible.
4:  If $\pi_{k+1} = \pi_k$, stop and set $\pi_{\lambda} = \pi_k$. Otherwise, increment $k$ by 1 and return to step 2.
Algorithm 1 The policy iteration algorithm

The linear equations (34) and (35) uniquely determine the gain vector $\mathbf{g}^{k}$. However, as for $\mathbf{b}^{k}$, the whole class $\mathbf{b}^{k} + c \mathbf{1}$, where $c$ is an arbitrary constant and $\mathbf{1}$ is an all-one vector with the same dimension as $\mathbf{b}^{k}$, satisfies the linear equations (34) and (35). Therefore, an auxiliary vector $\mathbf{v}^{k}$ and the additional equation (36) are introduced to uniquely determine $\mathbf{b}^{k}$. Note that, in each iteration, the policy evaluation needs to solve a set of linear equations in the $3|\mathcal{S}|$ unknowns of $\mathbf{g}^{k}$, $\mathbf{b}^{k}$, and $\mathbf{v}^{k}$, and the policy improvement needs to conduct at most $|\mathcal{S}| \cdot |\mathcal{U}|$ comparisons. The convergence of Algorithm 1 to the optimal policy of problem (26) can be shown by following proof procedures similar to those in [28], and hence is omitted here for brevity.
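For concreteness, a compact version of Algorithm 1 for the unichain case is sketched below. It is our own illustration, not the authors' implementation: the gain is treated as a scalar and the bias is pinned down by the normalization b[0] = 0, which plays the role of the auxiliary equation (36).

    import numpy as np

    def evaluate(P, r):
        """Policy evaluation: solve g*1 + (I - P) b = r with b[0] = 0,
        assuming the chain under the policy is ergodic (unichain case)."""
        n = len(r)
        A = np.zeros((n, n))
        A[:, 0] = 1.0                        # column multiplying the gain g
        A[:, 1:] = (np.eye(n) - P)[:, 1:]    # columns for b[1], ..., b[n-1]
        x = np.linalg.solve(A, r)
        return x[0], np.concatenate(([0.0], x[1:]))   # gain g, bias b

    def policy_iteration(P_sa, r_sa):
        """P_sa[s, u]: next-state distribution; r_sa[s, u]: Lagrange reward."""
        S, U = r_sa.shape
        pi = np.zeros(S, dtype=int)
        while True:
            P = np.array([P_sa[s, pi[s]] for s in range(S)])
            r = np.array([r_sa[s, pi[s]] for s in range(S)])
            g, b = evaluate(P, r)
            q = r_sa + P_sa @ b              # (37): r(s,u) + sum_s' P b(s')
            new_pi = q.argmin(axis=1)        # greedy policy improvement
            if (new_pi == pi).all():
                return pi, g                 # policies coincide: converged
            pi = new_pi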

Solving Subproblem 2. Since $T(\lambda)$ is non-decreasing in the Lagrangian multiplier $\lambda$ according to Lemma 1, we adopt the two-time-scale stochastic approximation based Robbins-Monro algorithm [30] to solve Subproblem 2, as shown in Algorithm 2. Specifically, on the small time scale we solve the optimal policy for the MDP with a given Lagrange multiplier (steps 4 and 5), and on the large time scale we update the Lagrange multiplier according to

(38) $\lambda_{m+1} = \max \left\{ 0, \ \lambda_m + \alpha_m \left( T_{\min} - T(\lambda_m) \right) \right\}$, with step size $\alpha_m = \frac{1}{m}$

(steps 6 and 7). The sequence of Lagrange multipliers $\{\lambda_m\}$ derived by Algorithm 2 converges to $\lambda^*$ following the two-time-scale stochastic approximation analysis in [30].

0:  Stop criterion $\xi$;
0:  The policy $\pi_{\lambda^*}$ of (19);
1:  Initialization:
2:  Initialize $\lambda_0$ with a sufficiently small number and set $m = 0$.
3:  End initialization
4:  Repeat: transform the CMDP problem (19) into the MDP problem (26) given $\lambda_m$.
5:    Obtain the optimal policy $\pi_{\lambda_m}$ for problem (26) using Algorithm 1.
6:    Update the Lagrange multiplier $\lambda_{m+1}$ according to (38).
7:    Increase $m$ by 1.
8:  Until the stop criterion is satisfied.
Algorithm 2 Lagrangian transformation algorithm for the CMDP problem (19).

There are several possible stop criteria for Algorithm 2, for example, the difference between $\lambda_{m+1}$ and $\lambda_m$ being small enough (e.g., smaller than $\xi$), or the number of iterations of Algorithm 2 exceeding a prespecified limit. In practice, the Lagrange multiplier derived by Algorithm 2 can be close to, but not precisely, the optimal $\lambda^*$ defined in (30). Nevertheless, when the derived multiplier is close to the value defined in (30), we can further refine the optimal policy for (19) as follows.
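The outer loop of Algorithm 2 is compact enough to sketch directly. In the self-contained toy below, average_T stands in for evaluating $T(\lambda)$ of the policy returned by Algorithm 1; we replace it with an arbitrary non-decreasing function (Lemma 1 guarantees this monotonicity) so that the snippet runs on its own.

    def average_T(lam):
        # Stand-in for T(lambda): any non-decreasing function of lambda
        # illustrates the update; in Algorithm 2 this value comes from
        # evaluating the policy returned by Algorithm 1.
        return 0.8 + 0.5 * lam

    def find_lambda(T_min=1.2, xi=1e-6, max_iters=100000):
        """Robbins-Monro update (38): raise lambda while T(lambda) < T_min."""
        lam, m = 0.0, 1
        while m <= max_iters:
            step = 1.0 / m                              # diminishing step size
            new_lam = max(0.0, lam + step * (T_min - average_T(lam)))
            if abs(new_lam - lam) < xi:                 # stop criterion
                return new_lam
            lam, m = new_lam, m + 1
        return lam

    print(find_lambda())   # approaches lambda* = 0.8, where T(lambda*) = T_min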

Solving Problem (19). We integrate a perturbation based refinement framework to achieve the optimal policy for problem (19). We introduce two perturbed Lagrange multipliers $\lambda^{-}$ and $\lambda^{+}$ by imposing a small perturbation on $\lambda^*$. Given $\lambda^*$ derived by Algorithm 2, we set

(39) $\lambda^{-} = \lambda^* - \delta, \qquad \lambda^{+} = \lambda^* + \delta$,

where $\delta$ is a small enough perturbation parameter. Lemma 1 shows that $T(\lambda)$ is monotone non-decreasing in $\lambda$, and hence

(40) $T(\lambda^{-}) \le T(\lambda^*) \le T(\lambda^{+})$.

Then we refine the optimal policy $\pi^*$ as a randomized mixture of the two perturbed policies $\pi_{\lambda^{-}}$ and $\pi_{\lambda^{+}}$:

(41) $\pi^* = \mu \, \pi_{\lambda^{-}} + (1 - \mu) \, \pi_{\lambda^{+}}$,

i.e., at each decision epoch the action is chosen according to $\pi_{\lambda^{-}}$ with probability $\mu$ and according to $\pi_{\lambda^{+}}$ with probability $1 - \mu$, where the randomization factor is given by

(42) $\mu = \frac{T(\lambda^{+}) - T_{\min}}{T(\lambda^{+}) - T(\lambda^{-})}$.

In this way, the condition in (30) is satisfied due to the fact that

(43) $\bar{T}(\pi^*) = \mu \, T(\lambda^{-}) + (1 - \mu) \, T(\lambda^{+}) = T_{\min}$.

We summarize the policy refining procedure in Algorithm 3.

0:  Stop criterion $\xi$ and the perturbation parameter $\delta$;
0:  The optimal policy $\pi^*$ of (19);
1:  Obtain the Lagrangian multiplier $\lambda^*$ using Algorithm 2.
2:  Obtain $\lambda^{-}$ and $\lambda^{+}$ according to (39).
3:  Obtain the policies $\pi_{\lambda^{-}}$ and $\pi_{\lambda^{+}}$ using Algorithm 1.
4:  Obtain the optimal policy $\pi^*$ for the CMDP problem (19) according to (41).
Algorithm 3 Optimal policy refining for the CMDP problem (19).
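The refinement step itself is tiny: compute the randomization factor of (42) and mix the two perturbed policies as in (41). The sketch below is our illustration with arbitrary example values of $T(\lambda^-)$ and $T(\lambda^+)$.

    import random

    def mix_factor(T_minus, T_plus, T_min):
        # (42): choose mu so that the mixed policy meets T_min exactly, per (43)
        return (T_plus - T_min) / (T_plus - T_minus)

    def mixed_action(state, pi_minus, pi_plus, mu):
        # (41): follow pi_{lambda^-} w.p. mu and pi_{lambda^+} w.p. 1 - mu
        return pi_minus[state] if random.random() < mu else pi_plus[state]

    mu = mix_factor(T_minus=1.1, T_plus=1.4, T_min=1.2)    # example values
    assert abs(mu * 1.1 + (1 - mu) * 1.4 - 1.2) < 1e-12    # verifies (43)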

In Algorithm 3, when running Algorithm 2 to obtain the optimal Lagrangian multiplier $\lambda^*$ in step 1, it takes a long time to converge due to the low convergence rate of the stochastic approximation update in (38). Since the step size $\alpha_m = 1/m$ is still relatively large when $\lambda_m$ is already near $\lambda^*$ after a few iterations, it takes a long time for the update to become small enough. Therefore, we improve Algorithm 2 by introducing a modified step size $\hat{\alpha}_m = \epsilon \, \alpha_m$, where $\epsilon$ is a small value, and update $\lambda$ as:

(44) $\lambda_{m+1} = \max \left\{ 0, \ \lambda_m + \hat{\alpha}_m \left( T_{\min} - T(\lambda_m) \right) \right\}$.

(a) update using (38)
(b) update using (44)
Fig. 3: Value of $\lambda_m$ under different update step sizes.

As shown in Fig. 3(a), when updating $\lambda$ using (38), it takes a long time to converge to the optimal Lagrangian multiplier (e.g., more than 25000 iterations under the stop criterion used here). As shown in Fig. 3(b), the new updating rule (44) tremendously reduces the number of iterations (e.g., 120 iterations). Furthermore, the insets in Fig. 3(a) and 3(b) show the last ten iterations of (38) and (44); we can see that the update using (44) converges closer to $\lambda^*$.

VI Performance evaluation

In this section, we evaluate the performance of our proposed algorithms via extensive simulations.

VI-A Simulation Setup

As mentioned in Sec. III, we use the pair $(d, c)$ to characterize the status update of an IoT computation-intensive application, where $d$ is the input data size and $c$ indicates the required CPU cycles. We also assume that all status update packets have the identical pair. Specifically, we consider the face recognition application in [35], where the data size for the computation offloading is $d = 500$ KB and the total number of CPU cycles is $c = 1000$ Megacycles. In terms of computing resources, we assume the CPU frequencies of the edge and local servers to be $f_e = 20$ GHz and $f_l = 1$ GHz, respectively [42].

As for edge offloading, we assume that the wireless channel bandwidth is $B = 20$ MHz and that the distance between the sensor and the edge server is $D = 0.1$ km. The transmission power of the sensor is $p = 20$ dBm and the mean background noise is $-100$ dBm [29]. We assume that the wireless channel state process is a Markov chain. Following the equal-probability SNR partitioning method [47], we model the channel by a three-state Markov chain with the transition probability matrix

(45)

Assume that if offloading is attempted, the transmission time defined in (2) and (4) takes one of three values corresponding to the three channel states. We summarize the main parameters of the simulation in Table I.
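Two numbers implied by this setup are worth keeping in mind: with $c = 1000$ Megacycles, the local processing time is $t^l = c / f_l = 1$ s per update, while the edge server needs only $t^e = c / f_e = 50$ ms, as the following two-line check confirms. Offloading therefore pays off whenever the transmission time of (4) stays below roughly 950 ms.

    c, f_l, f_e = 1000e6, 1e9, 20e9
    print(c / f_l, c / f_e)   # 1.0 s locally vs. 0.05 s at the edge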

parameters | values
input data size of each status update, $d$ | 500 KB
number of CPU cycles of each status update, $c$ | 1000 Megacycles
CPU frequency of edge server, $f_e$ | 20 GHz
CPU frequency of local server, $f_l$ | 1 GHz
wireless bandwidth between sensor and edge server, $B$ | 20 MHz
distance between sensor and edge server, $D$ | 0.1 km
transmission power of the sensor, $p$ | 20 dBm
background noise | -100 dBm
action set of waiting time, $w$ | [0, 200, ..., 800] ms
minimum sampling duration, $T_{\min}$ | 1200 ms
perturbation parameter, $\delta$ |
TABLE I: SIMULATION SETUP AND SYSTEM PARAMETERS

VI-B Benchmarks

In order to verify the performance of our proposed algorithms, we compare them with the following benchmarks (a minimal sketch of these decision rules follows the list):

  1. Always edge computing with zero waiting (AEZW): the sensor offloads each status update to the edge server for further processing without waiting. That is, when the edge server completes the computation of one status update, the sensor samples a new status update immediately. However, this policy may not satisfy the sampling frequency constraint.

  2. Always edge computing with conservative waiting (AECW): the sensor offloads each status update to the edge server with conservative waiting. That is, when the edge server completes the computation of one status update, the sensor, based on the current AoP at the operator, waits before sampling the next status update so that the sampling frequency constraint is satisfied.

  3. Always local computing with conservative waiting (ALCW): the sensor computes each status update at the local server with conservative waiting. Since the local CPU frequency and the total computation cycles are constants, the AoP upon each local completion is also a constant, and the sensor then waits for a constant time so that the sampling frequency constraint is satisfied.
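The three rules can be written as one-line decision functions. The sketch below is our reading of the descriptions above, where "conservative waiting" is interpreted as waiting just long enough to respect the minimum sampling duration $T_{\min}$; the function names and signatures are ours.

    def aezw(t_prev):
        # always edge, zero wait: offload every update, never wait
        return 0.0, "edge"

    def aecw(t_prev, T_min=1.2):
        # always edge, conservative wait: pad the frame up to T_min,
        # based on the processing time t_prev of the delivered update
        return max(0.0, T_min - t_prev), "edge"

    def alcw(t_prev, T_min=1.2, t_local=1.0):
        # always local: the processing time is the constant t^l, so
        # the waiting time is a constant as well
        return max(0.0, T_min - t_local), "local"

Each function returns the pair (waiting time in seconds, processing location) given the processing time of the update that was just delivered.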

Fig. 4: The average AoP of the original calculation (16) and the approximate calculation (20).

VI-C Policy structures of proposed algorithms

We first compare the average AoP performance of the original problem (18) and the approximated problem (19), and verify the optimal policy structure of the CMDP problem (19).

As shown in Fig. 4, we conduct the simulation over a long sequence of status updates while using the optimal policy $\pi^*$ defined in (41). The orange line depicts the average AoP of the original problem (18), while the blue line depicts that of the approximated problem (19). As we can see, when the number of status updates increases, the average AoP of both (18) and (19) becomes stable, and the average AoP of (18) is slightly larger than that of (19). More precisely, the inset in Fig. 4 depicts the ratio of the average AoP of (18) to that of (19). We can see that the ratio is very close to 1 (with a value of 1.06). This shows that, instead of obtaining the optimal policy of the original problem (18), which is intractable, we can obtain the optimal policy of the approximated problem (19), whose solution is also a good approximation for the original problem (18).

We depict the optimal policy structure of the CMDP problem (19) in Fig. 5. The coordinates represent the current system state $s$: the x axis represents the combination of the current AoP and the wireless channel state, while the y axis represents the combination of the last AoP and the waiting time. The z axis represents the action $u$ at state $s$, where even numbers denote "offloading", odd numbers denote "local computing", and bigger numbers represent longer waiting times. As shown in Fig. 5(a), the waiting time exhibits a threshold structure as a function of the last AoP and waiting time. That is, the optimal policy chooses a longer waiting time when the sum of the last AoP and waiting time is large (e.g., when the x axis value is fixed and the y axis value increases, the z axis value also increases).

Fig. 5(a) shows the optimal policy structure under the Lagrangian multiplier $\lambda^*$ obtained by Algorithm 2. Fig. 5(b) and 5(c) show the optimal policies $\pi_{\lambda^{-}}$ and $\pi_{\lambda^{+}}$ obtained by Algorithm 3. As we can see, the policies under $\lambda^*$ and $\lambda^{-}$ are exactly the same when a small perturbation $\delta$ is introduced to $\lambda^*$. Besides, the policies $\pi_{\lambda^{-}}$ and $\pi_{\lambda^{+}}$ differ at only one state, which is pointed out by the red arrow at state (3, 7).

(a) optimal policy given $\lambda^*$.
(b) optimal policy given $\lambda^{-}$.
(c) optimal policy given $\lambda^{+}$.
Fig. 5: Different optimal policies given different Lagrangian multipliers.

VI-D Performances among different benchmarks

Fig. 6: Average AoP performance among different policies.
(a) average AoP under different transmission times
(b) average sampling time under different transmission times
Fig. 7: Performance of the four policies under different transmission times.

We next conduct simulations to compare the average AoP performance among the different benchmarks. As shown in Fig. 6, the optimal policy achieves the minimum average AoP, at around 1460 ms. The ALCW policy has a lower average AoP than AECW and AEZW, which is a constant 1600 ms, and our proposed algorithm achieves an average AoP reduction of around 10% over it. The reason for this reduction is that the optimal policy offloads the status update to the edge server for further processing when the wireless channel state is good, and the powerful computing capacity of the edge server shortens the processing time immensely, resulting in a smaller average AoP. However, as shown in Fig. 6, the always-offloading policies achieve a worse average AoP, at around 1840 ms and 2100 ms for AEZW and AECW, respectively, over which our proposed algorithm achieves AoP reductions of around 20% and 30%. The reason is that the average transmission time of offloading to the edge server is large in the original simulation setting; although the processing time at the edge server is small, the transmission time plays a critical role in the AoP.

VI-E The influence of wireless channel state

In this subsection, we discuss the influence of the wireless channel state on the average AoP. Although the sensor can choose to offload to the edge server to reduce the processing time, offloading introduces additional transmission time. In this paper, we assume that the channel state follows a Markov chain with three states, which we can simply refer to as the "good", "medium", and "bad" channel states. We conduct the simulation with different transmission times for the medium channel state (the transmission time of the good channel state is half that of the medium state, and the transmission time of the bad channel state is twice that of the medium state). As shown in Fig. 7(a), when the transmission time increases, the average AoP of our proposed algorithm and of the always-offloading policies (AEZW and AECW) also increases. Besides, our algorithm has a much smaller increase rate, because the optimal policy chooses local computing when the wireless channel state is bad. When the transmission time is less than 700 ms, the AEZW policy has a smaller average AoP than our proposed algorithm; however, as shown in Fig. 7(b), the average sampling time of AEZW is then less than the minimum sampling duration, which violates the sampling frequency constraint (17). Although the AECW and ALCW policies always satisfy the constraint (17), they result in a worse average AoP.

VI-F The influence of computation demand

Fig. 8: Average AoP performance under different computation demands.

In this subsection, we discuss the influence of the computation demand on the average AoP. We conduct the simulation with different computation demands per status update (from 1.0 to 2.0 Gigacycles) while the transmission time of the medium channel state is 1000 ms. As shown in Fig. 8, the average AoP of the ALCW policy increases dramatically when the computation demand increases from 1.0 to 2.0 Gigacycles, due to the limited computation capacity of the local server: it takes much time to process a status update of a computation-intensive application at the local server. In contrast, the average AoP of the always-offloading policies AEZW and AECW shows only a slight increase, since the edge server has a much larger computation capacity. We should note that, when the computation demand is 2.0 Gigacycles, the average AoP of our proposed algorithm equals that of the AEZW policy. The reason is that when the computation demand is sufficiently large, the processing time dominates the AoP, and the proposed algorithm chooses the always-offloading policy to reduce the processing time.

VII Conclusion

In this paper, we aim to minimize the age-of-processing (AoP) of computation-intensive IoT applications in a status monitoring and control system. Due to the limited resources of an IoT sensor, it can offload the status updates to the edge server for processing. We focus on finding the optimal sampling and processing offloading policy to minimize the average AoP, which is formulated as a CMDP. We propose a Lagrangian transformation method to relax the CMDP problem into an unconstrained MDP problem, and derive the optimal policy of the MDP problem given the optimal Lagrangian multiplier. Furthermore, by introducing a small perturbation to the optimal Lagrangian multiplier of the MDP problem, we obtain the optimal policy of the original CMDP problem. Extensive simulation results verify the superior performance of our proposed algorithms. As a future direction, we plan to generalize our framework to the much more challenging scenario with multiple IoT devices and edge servers.

References

  • [1] N. Akar, O. Dogan, and E. U. Atay (2019) Finding the exact distribution of (peak) age of information for queues of PH/PH/1/1 and M/PH/1/2 type. External Links: 1911.07274. Cited by: §I, §II.
  • [2] A. O. Al-Abbasi and V. Aggarwal (2018) Joint information freshness and completion time optimization for vehicular networks. CoRR abs/1811.12924. External Links: Link, 1811.12924. Cited by: §II.
  • [3] E. Altman (1999) Constrained Markov decision processes. Vol. 7, CRC Press. Cited by: §IV.
  • [4] A. Arafa, J. Yang, S. Ulukus, and H. V. Poor (2018) Age-minimal online policies for energy harvesting sensors with incremental battery recharges. CoRR abs/1802.02129. External Links: Link, 1802.02129. Cited by: §II.
  • [5] A. Arafa, J. Yang, and S. Ulukus (2018-05) Age-minimal online policies for energy harvesting sensors with random battery recharges. 2018 IEEE International Conference on Communications (ICC). External Links: ISBN 9781538631805, Link, Document. Cited by: §II.
  • [6] L. Atzori, A. Iera, and G. Morabito (2010) The internet of things: a survey. Computer Networks 54 (15), pp. 2787–2805. Cited by: §I.
  • [7] B. Barakat, S. Keates, I. Wassell, and K. Arshad (2019-06) Is the zero-wait policy always optimum for information freshness (peak age) or throughput?. IEEE Communications Letters 23 (6), pp. 987–990. External Links: Document, ISSN 2373-7891. Cited by: §I.
  • [8] M. Bastopcu and S. Ulukus (2018-10) Age of information with soft updates. In 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 378–385. External Links: Document. Cited by: §II.
  • [9] A. M. Bedewy, Y. Sun, and N. B. Shroff (2017) Minimizing the age of the information through queues. CoRR abs/1709.04956. External Links: Link, 1709.04956. Cited by: §I.
  • [10] D. P. Bertsekas (1995) Dynamic programming and optimal control. Vol. 1, Athena Scientific, Belmont, MA. Cited by: §V.
  • [11] E. T. Ceran, D. Gündüz, and A. György (2017) Average age of information with hybrid ARQ under a resource constraint. CoRR abs/1710.04971. External Links: Link, 1710.04971. Cited by: §III-B.
  • [12] E. T. Ceran, D. Gündüz, and A. György (2019) Reinforcement learning to minimize age of information with an energy harvesting sensor with HARQ and sensing cost. External Links: 1902.09467. Cited by: §II.
  • [13] J. P. Champati, H. Al-Zubaidy, and J. Gross (2018-04) Statistical guarantee optimization for age of information for the D/G/1 queue. In IEEE INFOCOM 2018 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 130–135. External Links: Document. Cited by: §I, §II.
  • [14] X. Chu, D. Lopez-Perez, Y. Yang, and F. Gunnarsson (2013) Heterogeneous cellular networks: theory, simulation and deployment. Cambridge University Press. Cited by: §III-A2.
  • [15] M. Costa, M. Codreanu, and A. Ephremides (2014-06) Age of information with packet management. In 2014 IEEE International Symposium on Information Theory, pp. 1583–1587. External Links: Document, ISSN 2157-8095. Cited by: §II.
  • [16] R. Devassy, G. Durisi, G. C. Ferrante, O. Simeone, and E. Uysal-Biyikoglu (2018) Reliable transmission of short packets through queues and noisy channels under latency and peak-age violation guarantees. CoRR abs/1806.09396. External Links: Link, 1806.09396. Cited by: §I, §II.
  • [17] Y. Inoue, H. Masuyama, T. Takine, and T. Tanaka (2017-06) The stationary distribution of the age of information in FCFS single-server queues. In 2017 IEEE International Symposium on Information Theory (ISIT), pp. 571–575. External Links: Document, ISSN 2157-8117. Cited by: §I.
  • [18] Y. Inoue, H. Masuyama, T. Takine, and T. Tanaka (2018) A general formula for the stationary distribution of the age of information and its application to single-server queues. CoRR abs/1804.06139. External Links: Link, 1804.06139. Cited by: §I.
  • [19] S. Kaul, R. Yates, and M. Gruteser (2012-03) Real-time status: how often should one update?. In 2012 Proceedings IEEE INFOCOM, pp. 2731–2735. External Links: Document, ISSN 0743-166X. Cited by: §I, §II.
  • [20] S. Kaul, M. Gruteser, V. Rai, and J. Kenney (2011) Minimizing age of information in vehicular networks. In 2011 8th Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks, pp. 350–358. Cited by: §I, §II.
  • [21] A. Kosta, N. Pappas, and V. Angelakis (2017) Age of information: a new concept, metric, and tool. Foundations and Trends in Networking 12 (3), pp. 162–259. External Links: Link, Document, ISSN 1554-057X. Cited by: §I.
  • [22] Q. Kuang, J. Gong, X. Chen, and X. Ma (2020) Analysis on computation-intensive status update in mobile edge computing. External Links: 2002.06400. Cited by: §II.
  • [23] B. Li and J. Liu (2019) Can we achieve fresh information with selfish users in mobile crowd-learning?. CoRR abs/1902.06149. External Links: Link, 1902.06149. Cited by: §I.
  • [24] R. Li, Z. Zhou, X. Chen, and Q. Ling (2019) Resource price-aware offloading for edge-cloud collaboration: a two-timescale online control approach. IEEE Transactions on Cloud Computing, pp. 1–1. External Links: Document, ISSN 2372-0018. Cited by: §I.
  • [25] J. D. C. Little (1961-06) A proof for the queuing formula: L = λW. Oper. Res. 9 (3), pp. 383–387. External Links: ISSN 0030-364X, Link, Document. Cited by: §I.
  • [26] M. Moltafet, M. Leinonen, and M. Codreanu (2019) On the age of information in multi-source queueing models. External Links: 1911.07029. Cited by: §I.
  • [27] E. Najm and E. Telatar (2018) Status updates in a multi-stream M/G/1/1 preemptive queue. CoRR abs/1801.04068. External Links: Link, 1801.04068. Cited by: §I.
  • [28] M. L. Puterman (2014) Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons. Cited by: §V.
  • [29] T. S. Rappaport et al. (1996) Wireless communications: principles and practice. Vol. 2, Prentice Hall PTR, New Jersey. Cited by: §III-A2, §VI-A.
  • [30] H. Robbins and S. Monro (1951) A stochastic approximation method. The Annals of Mathematical Statistics, pp. 400–407. Cited by: §V.
  • [31] S. M. Ross (2014) Introduction to stochastic dynamic programming. Academic Press. Cited by: §V.
  • [32] H. Sac, T. Bacinoglu, E. Uysal-Biyikoglu, and G. Durisi (2018-06) Age-optimal channel coding blocklength for an M/G/1 queue with HARQ. In 2018 IEEE 19th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), pp. 1–5. External Links: Document, ISSN 1948-3252. Cited by: §I, §II.
  • [33] T. Shreedhar, S. K. Kaul, and R. D. Yates (2019-06) An age control transport protocol for delivering fresh updates in the internet-of-things. In 2019 IEEE 20th International Symposium on "A World of Wireless, Mobile and Multimedia Networks" (WoWMoM), pp. 1–7. External Links: Document. Cited by: §I.
  • [34] X. Song, X. Qin, Y. Tao, B. Liu, and P. Zhang (2019) Age based task scheduling and computation offloading in mobile-edge computing systems. arXiv preprint arXiv:1905.11570. Cited by: §II.
  • [35] T. Soyata, R. Muraleedharan, C. Funai, M. Kwon, and W. Heinzelman (2012-07) Cloud-vision: real-time face recognition using a mobile-cloudlet-cloud acceleration architecture. In 2012 IEEE Symposium on Computers and Communications (ISCC), pp. 000059–000066. External Links: Document, ISSN 1530-1346. Cited by: §VI-A.
  • [36] A. Soysal and S. Ulukus (2018) Age of information in G/G/1/1 systems. CoRR abs/1805.12586. External Links: Link, 1805.12586. Cited by: §I.
  • [37] A. Soysal and S. Ulukus (2019) Age of information in G/G/1/1 systems: age expressions, bounds, special cases, and optimization. CoRR abs/1905.13743. External Links: Link, 1905.13743. Cited by: §I, §II.
  • [38] G. Stamatakis, N. Pappas, and A. Traganitis (2019) Control of status updates for energy harvesting devices that monitor processes with alarms. CoRR abs/1907.03826. External Links: Link, 1907.03826. Cited by: §II.
  • [39] Y. Sun, E. Uysal-Biyikoglu, R. Yates, C. E. Koksal, and N. B. Shroff (2016-04) Update or wait: how to keep your data fresh. In IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications, pp. 1–9. External Links: Document. Cited by: §I, §III-B.
  • [40] R. Talak and E. Modiano (2019) Age-delay tradeoffs in queueing systems. arXiv preprint arXiv:1911.05601. Cited by: §I.
  • [41] V. Terzija, G. Valverde, D. Cai, P. Regulski, V. Madani, J. Fitch, S. Skok, M. M. Begovic, and A. Phadke (2011-01) Wide-area monitoring, protection, and control of future electric power networks. Proceedings of the IEEE 99 (1), pp. 80–93. External Links: Document, ISSN 1558-2256. Cited by: §I.
  • [42] T. X. Tran and D. Pompili (2018) Joint task offloading and resource allocation for multi-server mobile-edge computing networks. IEEE Transactions on Vehicular Technology 68 (1), pp. 856–868. Cited by: §VI-A.
  • [43] V. Tripathi, R. Talak, and E. Modiano (2019) Age of information for discrete time queues. CoRR abs/1901.10463. External Links: Link, 1901.10463. Cited by: §I.
  • [44] M. Wang, W. Chen, and A. Ephremides (2019) Real-time reconstruction of counting process through queues. CoRR abs/1901.08197. External Links: Link, 1901.08197. Cited by: §I.
  • [45] J. Xu and N. Gautam (2019) Towards assigning priorities in queues using age of information. CoRR abs/1906.12278. External Links: Link, 1906.12278. Cited by: §I.
  • [46] R. D. Yates (2015-06)