I. Introduction
Recent advances in Internet of Things (IoT) devices have increased the variety of services they provide, such as health monitoring, financial analysis, weather analysis, location-based services (LBSs) and smart metering. Moreover, the integration of some IoT devices with social networks has encouraged users to share their personal data to obtain useful information from these social platforms. While users can receive hotel, restaurant and product recommendations from Facebook, Twitter or YouTube when they share their location information, they can also benefit from personalized dietary tips as a result of sharing their Fitbit activity. However, fine-grained time-series data collected by IoT devices contain sensitive, confidential information about the user. Account balances, biomedical measurements, location traces, weather forecasts and smart meter readings are typical examples of time-series data which carry sensitive personal information. For instance, a malicious third party can derive an individual's frequently visited destinations, financial situation or social relationships from shared location information [LPsurvey]. Using non-intrusive load monitoring techniques on smart meter data, an eavesdropper can deduce the user's presence at home, disabilities, and even political views from the TV channel the user is watching [GiulioSPmag]. Most sensitive of all, private information such as patient history, chronic diseases and psychological state can be revealed by health monitoring systems [ECG, Health]. Therefore, time-series data privacy has become an important concern, and there is increasing pressure from consumers to keep their data traces private against malicious attackers or untrusted service providers (SPs), while preserving the utility obtained from these IoT services. Our goal in this paper is to study the fundamental privacy-utility trade-off (PUT) when sharing sensitive time-series data.
I-A. Related Work
Time-series data privacy and its applications to various domains have been studied extensively [Timeseries, Health_kanon1, Health_kanon2, Health_DifP, Crypto, InfoTheo_annon, InfoTheo_anon_obfus, Shokri_kanony, Shokri_single, InfoTheo_single, SM_DifP1, SM_DifP2, GiulioRED_ESD, Ravi, Trace1, Trace2, Trace3, WIFS, Giulio, ICASSP, BizBook]. A large body of research has focused on protecting the privacy of a single data point, e.g., the current sensitive measurement [Shokri_kanony, Shokri_single, InfoTheo_single, SM_DifP1, SM_DifP2, GiulioRED_ESD]. However, the temporal relations in time-series data require going beyond single data point privacy. Individual measurements taken at each time instance, such as electrocardiogram (ECG), body temperature, location, account balance and smart meter readings, are highly correlated, and strategies focusing only on the privacy of the current data might reveal sensitive information about past or future measurements.
Differential privacy (DP), k-anonymity and information-theoretic metrics are commonly used as privacy measures [Timeseries, Health_kanon1, Health_kanon2, Health_DifP, Crypto, Shokri_kanony, Shokri_single, InfoTheo_single, InfoTheo_annon, InfoTheo_anon_obfus, SM_DifP1, SM_DifP2, GiulioRED_ESD, Ravi, Trace1, Trace2, Trace3, WIFS, ICASSP, Giulio, BizBook]. By definition, DP prevents the SP from inferring the current sensitive data of the user, even if the SP has knowledge of all the remaining private data points. K-anonymity ensures that a sensitive data point is indistinguishable from at least k-1 other data points. However, DP and k-anonymity are meant to ensure the privacy of a single data point in time. In [ShokriQuantify], it is stated that these are not appropriate measures for location trace privacy, since temporal correlations are not taken into account.
Several papers on DP and k-anonymity consider temporal correlations. In [Health_DifP], physiological measurements are obfuscated before being reported to a utility provider for the PUT. Instead of the entire time-series history, a selected temporal section of the sensor data is considered, and the problem is solved using a dynamic program and a greedy algorithm. The work in [InfoTheo_annon] focuses on keeping the user identity private in a location privacy setting by performing a random permutation over a set of multiple users. However, users might still be re-identified when attackers have access to auxiliary information. In [InfoTheo_anon_obfus], the authors improve this approach by considering both user identity and location privacy, merging anonymization with obfuscation. However, the risk of re-identification of the user by the adversary remains, and the privacy gain from obfuscation depends highly on the number of users. In [SM_DifP2], DP in a smart meter with a rechargeable battery is achieved by adding noise to the meter readings before they are reported to a utility provider. In order to guarantee DP, the perturbation must be independent of the battery's state of charge. However, for a finite-capacity battery, the energy management system cannot always provide the amount of noise required for preserving privacy.
On the other hand, information-theoretic privacy considers the statistics of the entire time series, including its temporal correlations, and studies privacy mechanisms that allow arbitrary stochastic transformations of data samples, rather than being limited to the addition of noise of a specific form. In [Ravi], the authors introduce location distortion mechanisms to keep the user's trajectory private, measuring privacy by the mutual information between the true and released traces, under a constraint on the average distortion between the two. The true trajectory is assumed to form a Markov chain. Due to the computational complexity of history-dependent mutual information optimization, the authors propose bounds which take only the current and one-step-past locations into account. However, due to the temporal correlations in the trajectory, the optimal distortion introduced at each time instance depends on the entire distortion and location history. Hence, the proposed bounds do not guarantee optimality.
In [AshishInfoTheoP], a smart metering system is considered assuming Markovian energy demands. Privacy is achieved by filtering the energy demand with the help of a rechargeable battery. The information-theoretic privacy problem is formulated as a Markov decision process (MDP), and the minimum leakage is obtained numerically through dynamic programming, while a single-letter expression is obtained for an independent and identically distributed (i.i.d.) demand. This approach is extended to the scenario with a renewable energy source in [Giulio]. In [ParvMDP], the privacy-cost trade-off is examined with a rechargeable battery. Due to the Markovian demand and price processes, the problem is formulated as a partially observable MDP (POMDP) with belief-dependent rewards, and solved by dynamic programming for the infinite horizon. In [ICASSP], the PUT is characterized numerically by dynamic programming for a special energy generation process.
In [Erdogdu_TimeSeriesISIT], the PUT of time-series data is considered in both online and offline settings. In this scenario, a user continuously releases data samples which are correlated with her private information, and in return obtains utility from an SP. The proposed schemes are cast as convex optimization problems and solved under a hidden Markov model assumption. Simulation results are provided for binary time-series data over a finite time horizon. However, the dimensions of the optimization problems in both schemes grow exponentially with time and with the number of sample states. Therefore, in a setting where fine-grained sensor data is considered over a long time horizon, the computational complexity of the proposed schemes is very high.
I-B. Contributions
In this work, we consider a scenario in which the user measures time-series data (e.g., location, heartbeat, temperature or energy consumption) generated by a first-order Markov process through an IoT device, and periodically reports a distorted version of her true data to an untrusted SP to gain utility. We assume that the true data becomes available to the user in an online manner. We use the mutual information between the true and distorted data sequences as a measure of privacy loss, and measure the utility of the reported data by a specified distortion metric between the true and distorted samples. For the PUT, we introduce an online private data release policy (PDRP) that minimizes the mutual information while keeping the distortion below a certain threshold, considering both instantaneous and average distortion constraints. We consider data release policies which take the entire released data history into account, and show their information-theoretic optimality. To tackle the complexity, we exploit the Markovity of the user's true data sequence and recast the problem as a Markov decision process (MDP). After identifying the structure of the optimal policy, we use the advantage actor-critic (A2C) deep reinforcement learning (RL) framework as a tool to numerically evaluate our continuous state and action space MDP. The performance of the proposed PDRPs is examined in a specific scenario, where the time-series data is represented by the location traces generated by a user moving in a grid-world. For the average distortion constrained case, the proposed PDRP is compared with a myopic location data release mechanism [Ravi]. To the best of our knowledge, this is the first time deep RL tools have been used to optimize information-theoretic time-series data privacy.
This paper extends the theoretical approach of our previous work on the PUT for location sharing [WIFS]. Our contributions are summarized as follows:
- We propose a simplified PDRP by exploiting the Markov property of the user's true data sequence, and prove the information-theoretic optimality of the simplified strategy.
- As a novel approach, we recast the information-theoretic time-series data PUT problem as a Markov decision process, and evaluate the resulting MDP using advantage actor-critic deep RL.
- We apply the optimal data release strategies to the location trace privacy problem, and numerically evaluate their performance under instantaneous and average distortion constraints.
The remainder of the paper is organized as follows. We present the problem statement in Section II, where we also introduce the privacy and utility metrics. In Section III, we introduce simplified data release mechanisms for the time-series data PUT problem. In Section IV, we reformulate the problem as an MDP and propose a numerical evaluation approach utilizing advantage actor-critic deep RL. In Section V, we apply the proposed solution to the location trace privacy problem, and numerically compare the performance of the proposed location release strategy with a myopic policy. Finally, we conclude the paper in Section VI.
II. Problem Statement
Notation  Definition
Time-series data set
Time-series data length
Random variables representing the user's true and distorted data at time t
Probability distribution of the true data at a given time
Markov transition probabilities of the user data
Matrix of Markov transition probabilities
Conditional probability distribution (policy)
Probability space of history-dependent policies
Probability space of simplified policies under first-order and higher-order Markov assumptions
We consider a time-series taking values from a finite discrete set. The user shares her data with an SP to gain utility through some online service. We assume that the user's true data sequence follows a first-order, time-homogeneous Markov chain with known transition probabilities and initial probability distribution. While the first-order Markov structure assumed for the true data may seem restrictive, we will show that our solution techniques generalize to higher-order Markov chains, albeit with increased complexity in the numerical solutions. In the literature, the Markov structure is a common assumption for time-series data, and it has been shown to be reasonable for location trajectories [MarkovProofLocation], smart meter measurements [MarkovProofSM] and financial data [MarkovProofFinance] due to the history-dependent behavior of these time series.
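The assumed data model can be sketched as follows; the 3-state chain, its transition matrix and the seed are illustrative choices, not values from the paper.

```python
import numpy as np

def sample_markov_trace(P, p0, n, seed=0):
    """Sample a length-n trace from a first-order, time-homogeneous
    Markov chain: P[i, j] = Pr(next state = j | current state = i),
    and p0 is the initial distribution."""
    rng = np.random.default_rng(seed)
    states = np.arange(P.shape[0])
    trace = [rng.choice(states, p=p0)]
    for _ in range(n - 1):
        trace.append(rng.choice(states, p=P[trace[-1]]))
    return np.array(trace)

# Toy 3-state chain biased towards staying in place.
P = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
p0 = np.ones(3) / 3
trace = sample_markov_trace(P, p0, n=20)
```

The self-transition bias mimics the temporally correlated traces (e.g., location trajectories) considered later in the paper.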
Instead of sharing her true data at time t, the user shares a distorted version of her current data. The released data at time t does not depend on future data samples; i.e., the released sample, together with the true and released histories, is conditionally independent of the future true data. The notation used throughout the paper is listed in Table I.
To better illustrate the user's private time-series data generation process, a simple Markov chain with its state space and state transition probabilities is presented in Fig. 1. The sensitive data takes values according to the state transition probabilities. The user becomes aware of her true data in an online manner and releases a distorted version, following her privacy-preserving strategy.
II-A. Privacy and Utility Measures
Mutual information can be written as the reduction in the uncertainty of a random variable (r.v.) due to the knowledge of another r.v., i.e., I(X; Y) = H(X) - H(X|Y), where H(X|Y) is the conditional entropy. In the information-theoretic time-series data privacy framework, we assume the strongest model for the malicious third party: both the user and the SP are assumed to have complete statistical knowledge of the user's data as well as her data release mechanism, that is, the transition probabilities of the Markov chain generating the true data sequence and the potentially stochastic mechanism that generates the released data depending on the history. Then, we quantify privacy by the information leaked to the untrusted SP, measured by the mutual information between the true and released data sequences. Accordingly, the information leakage of the user's data release strategy over a time horizon of n samples is given by
I(X^n; Y^n) = sum_{t=1}^{n} I(X^n; Y_t | Y^{t-1}) = sum_{t=1}^{n} I(X^t; Y_t | Y^{t-1})   (1)
where the first equality follows from the chain rule of mutual information, while the second follows from the Markov chain Y_t - (X^t, Y^{t-1}) - X_{t+1}^n
. Even though a malicious third party can obtain the statistics of the user's data release strategy over an infinite time horizon, he cannot infer the realizations of the private information, since the privacy measure is based on uncertainty. Because information-theoretic metrics are independent of the attacker's behavior and computational capabilities, they are preferable as privacy measures.
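To make the leakage measure concrete, mutual information can be computed directly from a joint distribution of a true sample and its released version; the two toy joint pmfs below are illustrative, not from the paper.

```python
import numpy as np

def mutual_information_bits(p_xy):
    """I(X; Y) in bits, computed from a joint pmf given as a 2-D array."""
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of X
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of Y
    mask = p_xy > 0
    ratio = p_xy[mask] / (p_x * p_y)[mask]
    return float((p_xy[mask] * np.log2(ratio)).sum())

# Releasing independently of the true value leaks nothing...
p_independent = np.outer([0.5, 0.5], [0.5, 0.5])
# ...while releasing the true value verbatim leaks H(X) = 1 bit.
p_identity = np.array([[0.5, 0.0],
                       [0.0, 0.5]])
```

These two extremes bracket the privacy-utility operating points studied in the rest of the paper.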
In the time-series data privacy problem, we want to minimize the information leakage to the SP. However, the more distortion we apply to the true data sequence for privacy, the more utility is lost due to the increased deviation from the original sequence. That is, releasing distorted data reduces the utility received from the SP, and the distortion applied by the user should be limited to a certain level. Therefore, our main purpose is to characterize the trade-off between privacy and utility. The distortion between a true data sample and its released version is measured by a distortion measure specified based on the underlying application (e.g., Manhattan distance or Euclidean distance).
Our main goal is to minimize the information leakage rate to the SP while satisfying the distortion constraint for utility. Throughout the paper, we consider two different constraints on the distortion introduced by the PDRP, namely an instantaneous distortion constraint and an average distortion constraint. The infinite-horizon optimization problem can be written as:
min lim_{n -> infinity} (1/n) I(X^n; Y^n)   (2)
under the instantaneous distortion constraint d(x_t, y_t) <= D for every t, and as
min lim_{n -> infinity} (1/n) I(X^n; Y^n)   (3)
under the average distortion constraint lim_{n -> infinity} (1/n) sum_{t=1}^{n} E[d(X_t, Y_t)] <= D, where x_t and y_t represent the realizations of X_t and Y_t, and the minimization is over the conditional probability distributions representing the user's randomized data release policy at each time t. The randomness stems from both the Markov process generating the true data sequence and the random release mechanism. The mutual information induced by the policy is calculated using the joint probability distribution
(4) 
In the next section, we characterize the structure of the optimal data release policy; using this structure, we recast the problem as an MDP, and finally evaluate the optimal trade-off numerically using advantage actor-critic deep RL.
III. PUT for Time-Series Data Sharing
In this section, we analyze the optimal PUT achievable by a privacy-aware time-series data release mechanism under mutual information minimization, with both instantaneous and average distortion constraints. Moreover, we propose simplified PDRPs that preserve optimality.
By the definition of mutual information, the objectives in (2) and (3) depend on the entire history of the true and released data. Therefore, the user must follow a history-dependent PDRP from the feasible set of policies satisfying the corresponding distortion constraint. As a result of this strong history dependence, the computational complexity of the minimization problem increases exponentially with the length of the data sequence. To tackle this problem, we introduce a class of simplified policies and prove that they incur no loss of optimality in the PUT.
III-A. Simplified PDRPs
In this section, we introduce a set of policies which sample the distorted data by considering only the true data at the last two time instances and the entire released data history. Hence, the joint distribution (4) induced by such a policy can be written as
(5)
Next, we show that restricting attention to PDRPs in this simplified set incurs no loss of optimality.
Theorem 1.
In both minimization problems (2) and (3), there is no loss of optimality in restricting the PDRPs to the simplified set of policies. Furthermore, the information leakage induced by any such policy can be written as:
(6)  
(7) 
and the average distortion induced by any such policy can be written as:
(8)  
(9) 
where the first equality follows from the linearity of expectation.
Remark 1.
Although the proof of Theorem 1 assumes that the true data sequence is a first-order Markov chain, it is possible to generalize it to higher-order Markov chains. For a Markov chain of higher order, let the corresponding simplified set denote the policies
(10) 
Then the following theorem holds.
Theorem 2.
If the true data sequence is a Markov chain of higher order, then there is no loss of optimality in using a PDRP from the corresponding simplified set. Moreover, the information leakage induced by such a policy can be written as:
(11) 
and the average distortion induced by any such policy can be written as:
(12) 
The simplified PDRP followed by the user is illustrated by the Markov chain in Fig. 2, which also shows the released data history up to the previous time instance. That is, at each time the user samples the distorted data by considering the current and previous true data, together with the released data history.
III-B. Online PDRP with an Instantaneous Distortion Constraint
As stated earlier, we assume that the utility gained by the user from sharing her data diminishes as the distortion between the true data sequence and the released version increases, under the specified distortion measure. Therefore, the utility requirements of the user impose distortion constraints on the PDRP. Here, we assume that the user would like to guarantee a minimum utility level at each time instant, which, in turn, imposes an instantaneous constraint on the distortion between the true data sample and the released version at each time instance.
Accordingly, for a given distortion limit, the feasible set consists of the simplified PDRPs satisfying the instantaneous distortion constraint, and the set of released data samples induced by such a policy is given by
(13) 
Furthermore, we require the policy to satisfy
(14) 
The objective of the PUT for online PDRP with an instantaneous distortion constraint (PDRP-IDC) can be rewritten as
(15) 
III-C. Online PDRP with an Average Distortion Constraint
Alternatively, the user may want to limit only the average distortion applied to the true data sequence, i.e., the utility loss averaged over the time horizon. The feasible set of simplified PDRPs with an average distortion constraint, together with the set of released data samples induced by such a policy, is given by
(16) 
where the constraint follows from the linearity of expectation, and the expectation is taken over the joint probabilities of the true and released sequences. Similarly to (13), the policy is required to satisfy
(17) 
Hence, the objective of the problem for online PDRP with an average distortion constraint (PDRP-ADC) can be written as:
(18) 
Minimization of the mutual information subject to a distortion constraint can be converted into an unconstrained minimization problem using Lagrange multipliers. Since the distortion constraint induced by the simplified PDRP is memoryless, it can easily be integrated into the additive mutual information objective. Hence, the unconstrained minimization problem for the time-series data release PUT can be rewritten as
(19) 
where the Lagrange multiplier determines the operating point on the trade-off curve, i.e., the point where the gradients of the mutual information and of the distortion constraint are aligned. When the multiplier is zero, the user releases data samples which only minimize the information leakage. On the other hand, as the multiplier grows without bound, the released data minimizes only the distortion rather than the information leakage, which results in full information leakage.
In the following section, we present the MDP formulation of the problem for both PDRPs and a numerical evaluation method based on advantage actor-critic RL.
IV. MDP Formulation
The Markovity of the user's true data sequence and the additive objective functions in both (15) and (19) allow us to represent the problem as an MDP. However, the information leakage at each time depends on the entire released data history, resulting in a state space that grows with time. Therefore, for a given policy and any realization of the released history, we define a belief state as a probability distribution over the true data values:
(20) 
This represents the SP's belief about the true data sample at the beginning of a time instance, i.e., after receiving the previously distorted data. The actions are defined as the probability distributions with which the user samples the released value at each time, and are determined by the randomized PDRP. At each time, after observing the distorted version of the current sample, the SP updates its belief about the true data sample by
(21) 
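The Bayesian update can be sketched as follows. For brevity this sketch assumes the release policy conditions on the current true state only, which is a simplification of the paper's policy class; the numbers are illustrative.

```python
import numpy as np

def belief_update(b, P, pi_y):
    """One Bayesian belief update in the spirit of Eq. (21).

    b    : SP's belief over the S true states at the previous time, shape (S,)
    P    : Markov transition matrix, P[i, j] = Pr(next = j | current = i)
    pi_y : likelihood of the observed release, pi_y[x] = Pr(observed y | true x)
    """
    predicted = b @ P             # prior over the current true state
    posterior = predicted * pi_y  # Bayes' rule, unnormalised
    return posterior / posterior.sum()

b = np.array([0.5, 0.5])
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
pi_y = np.array([0.8, 0.2])       # observed release is likelier under state 0
b_next = belief_update(b, P, pi_y)
```

After the update the belief shifts towards the state under which the observed release was more likely.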
We define the per-step information leakage of the user due to taking the chosen action at each time as
(22) 
The expectation of the sum of (22) over the joint probability distribution is equal to the mutual information expression in the original problem (6). Therefore, given the belief and action probabilities, the average information leakage at each time can be formulated as
(23) 
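The per-step leakage can be evaluated from the belief and action probabilities alone. The sketch below again simplifies the action to condition on the current true state only; the uniform belief and the two extreme actions are illustrative.

```python
import numpy as np

def per_step_leakage_bits(b, pi):
    """Average per-step information leakage (bits), in the spirit of
    Eq. (23), for belief b over S true states and action
    pi[x, y] = Pr(release y | true state x)."""
    q = b @ pi    # marginal distribution of the released sample
    leak = 0.0
    for x in range(len(b)):
        for y in range(pi.shape[1]):
            if b[x] > 0 and pi[x, y] > 0:
                leak += b[x] * pi[x, y] * np.log2(pi[x, y] / q[y])
    return leak

b = np.array([0.25, 0.25, 0.25, 0.25])
reveal = np.eye(4)             # release the true state verbatim: 2 bits leak
hide = np.full((4, 4), 0.25)   # release independently of the state: 0 bits
```

The two actions recover the expected extremes: full leakage equal to the belief entropy, and zero leakage.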
We can recast the PDRP-IDC problem in (15) as a continuous state and action space MDP. The actions satisfying the instantaneous distortion constraint are induced by the simplified PDRP. The solution of the MDP for the PDRP-IDC problem relies on minimizing the objective
(24) 
where the average information leakage is obtained by taking the corresponding action at each time step.
We remark that representing the average distortion in terms of belief and action probabilities is straightforward due to its additive form. Similarly to (23), the average distortion for PDRP-ADC at each time can be written as
(25) 
where there is no restriction on how the actions are chosen. Hence, we can recast the PDRP-ADC problem in (19) as a continuous state and action space MDP with a per-step cost function given by
(26) 
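The Lagrangian per-step cost combines the leakage term with the expected distortion. As before, this sketch simplifies the action to condition on the current true state only; the binary example, the Hamming-style distortion matrix and the multiplier value are illustrative.

```python
import numpy as np

def per_step_cost(b, pi, dist, lam):
    """Per-step MDP cost in the spirit of Eq. (26): information leakage
    (bits) plus lam times the expected distortion, where dist[x, y] is the
    distortion between true sample x and released sample y."""
    q = b @ pi    # marginal distribution of the released sample
    cost = 0.0
    for x in range(len(b)):
        for y in range(pi.shape[1]):
            p = b[x] * pi[x, y]
            if p > 0:
                cost += p * (np.log2(pi[x, y] / q[y]) + lam * dist[x, y])
    return cost

b = np.array([0.5, 0.5])
dist = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
truthful = np.eye(2)         # leaks 1 bit, zero distortion
flipped = 1.0 - np.eye(2)    # also leaks 1 bit, but unit distortion
```

With a multiplier of 2, the truthful action costs 1.0 while the flipped action costs 3.0, so the cost correctly penalizes distortion that buys no privacy.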
Finding optimal policies for continuous state and action space MDPs is a PSPACE-hard problem [PSPACEhard]. In practice, they can be solved by various finite-state MDP evaluation methods, e.g., value iteration, policy iteration and gradient-based methods, which are based on discretizing the continuous belief states to obtain a finite-state MDP [Tamas]. While finer discretization of the belief reduces the loss from the optimal solution, it increases the state space and hence the complexity of the problem. To overcome this limitation, we employ a deep learning based method as a tool to numerically solve our continuous state and action space MDP problem.
IV-A. Advantage Actor-Critic (A2C) Deep RL
In this section, we use a common notation for the MDP cost and action pair of both PDRP-IDC and PDRP-ADC. Integration of the solution into the instantaneous and average distortion constrained cases is straightforward.
In RL, an agent discovers the best action to take in a particular state by receiving instantaneous rewards/costs from the environment [SuttonBarto]. In our problem, on the other hand, we have knowledge of the state transition probabilities and the cost for every state-action pair, without needing to interact with the environment. We use A2C deep RL as a computational tool to numerically evaluate the optimal PDRP for our continuous state and action space MDP.
To integrate the RL framework into our problem, we create an artificial environment which takes the user's current action as input, samples an observation, and calculates the next state using the Bayesian belief update (21). The instantaneous cost revealed by the environment is calculated by (26). The user receives the experience tuple from the environment and refines her policy accordingly. Fig. 3 illustrates the interaction between the artificial environment and the user, who is represented by the RL agent. The corresponding Bellman equation induced by the policy can be written as
(27) 
where the state-value function of a belief state is expressed in terms of the instantaneous cost and the cost-to-go, i.e., the expected future cost induced by the policy, with the belief state updated according to (21) and the expectation taken over the action probability distributions [Bertsekas].
RL methods can be divided into three groups: value-based, policy-based, and actor-critic [OnACalgs]. Actor-critic methods combine the advantages of value-based (critic-only) and policy-based (actor-only) methods, such as low variance and the ability to produce continuous actions. The actor represents the policy structure, while the critic estimates the value function [SuttonBarto]. In our setting, we parameterize the value function and the stochastic policy by separate parameter vectors. The difference between the right- and left-hand sides of (27) is called the temporal difference (TD) error, which represents the error between the critic's estimate and a target differing by one step in time [SurveyACRL]. The TD error for an experience tuple is estimated as
(28)
where the TD target is the sum of the instantaneous cost and the discounted value of the next belief state, and the discount factor is chosen very close to 1 to approximate the Bellman equation in (27) for our infinite-horizon average-cost MDP. To implement RL in the infinite-horizon problem, we take sample averages over independent, finite data sequences, generated as experience tuples at each time step via Monte Carlo rollouts.
Instead of using value functions directly in the actor and critic updates, we use the advantage function to reduce the variance of the policy gradient method. The advantage can be approximated by the TD error. Hence, the critic is updated by gradient descent as:
(29) 
where the update is a gradient-descent step on the critic loss with the critic's learning rate at each time. The actor is updated similarly as
(30) 
where the update is a gradient step on the actor loss with the actor's learning rate. This method is called advantage actor-critic RL.
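The TD machinery above can be sketched with a tabular critic standing in for the critic DNN; the discrete states, cost, learning rate and iteration count are illustrative, not the paper's settings.

```python
import numpy as np

def td_error(cost, v_next, v_curr, gamma=0.99):
    """TD error for one experience tuple, in the spirit of Eq. (28):
    the TD target (instantaneous cost plus the discounted value of the
    next belief state) minus the critic's current estimate."""
    return cost + gamma * v_next - v_curr

def critic_step(v, s, s_next, cost, alpha=0.1, gamma=0.99):
    """One gradient step on the squared TD error for a tabular critic v,
    a stand-in for the critic DNN."""
    delta = td_error(cost, v[s_next], v[s], gamma)
    v[s] += alpha * delta   # descent direction of 0.5 * delta**2 w.r.t. v[s]
    return delta

v = np.zeros(2)
for _ in range(2000):       # repeatedly visit state 0 with cost 1.0
    critic_step(v, s=0, s_next=1, cost=1.0)
```

With the value of the successor state held at zero, repeated updates drive the visited state's value to the TD target, i.e., the instantaneous cost.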
In our A2C deep RL implementation, we represent the actor and critic mechanisms by fully connected feedforward deep neural networks (DNNs) with two hidden layers, as illustrated in Fig. 4(a). The critic DNN takes the current belief state as input and outputs the value of that belief state for the current action probabilities. The actor DNN also takes the current belief state as input, and outputs the parameters used for determining the action probabilities for the corresponding belief; hence, the input/output sizes of the two DNNs are determined by the sizes of the belief state and action parameter vectors, respectively. The actor DNN's output parameters are used to generate a Dirichlet distribution, which represents the action probabilities. The overall A2C deep RL algorithm for online PDRP is described in Algorithm 1. In the next section, we apply the proposed deep RL solution to a location trace privacy problem.

V. Application to Location Trace Privacy
In this section, we apply the theoretical framework introduced above to the location trace privacy problem, focusing on location traces as an example of time-series data. In this scenario, the user shares a distorted version of her trajectory with the SP due to privacy concerns. An example user trajectory is illustrated in Fig. 9. While the user's location at the current time is depicted with a grey circle, the true and released user trajectories over the next time steps are represented by black and grey arrows, respectively.
V-A. Numerical Results
In this section, we evaluate the PUT of the proposed PDRP-ADC and PDRP-IDC methods numerically. We also compare the PDRP-ADC results with the myopic Markovian location release mechanism proposed in [Ravi]. For the simulation results presented in the following sections, we train two fully connected feedforward DNNs, representing the actor and critic networks, respectively, utilizing the ADAM optimizer [ADAM]. Both networks contain two hidden layers with leaky-ReLU activations [LeakyRELU]. We obtain the corresponding PUT by averaging the total information leakage for the specified distortion constraint over the time horizon.

V-A1. PDRP-IDC Results
We first consider a simple grid-world, as in Fig. 9. The cells are numbered row by row, from the first row of the grid-world to the last. The user's trajectory forms a first-order Markov chain with a transition probability matrix whose (i, j)-th entry represents the transition probability from state i to state j. The user can start its movement at any square with equal probability. Our goal is to obtain the PUT under instantaneous distortion constraints, with the Manhattan distance as the distortion measure between the true position and the reported one.
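The Manhattan distortion between two numbered cells can be computed by converting cell indices back to grid coordinates; the row-major, top-left-origin numbering below is an assumption for illustration, and the paper's exact convention may differ.

```python
def manhattan(cell_a, cell_b, width):
    """Manhattan distance between two cells of a grid-world numbered
    row by row, with cell 0 at the top-left corner."""
    row_a, col_a = divmod(cell_a, width)
    row_b, col_b = divmod(cell_b, width)
    return abs(row_a - row_b) + abs(col_a - col_b)

# On a 3-wide grid, cell 0 is (0, 0) and cell 5 is (1, 2): distance 3.
d = manhattan(0, 5, width=3)
```

This distance serves as the distortion matrix entry d(x, y) used in the per-step cost.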
In Fig. 10, PUT curves are obtained for three transition probability matrices, each corresponding to a different temporal correlation level. In all cases, the user can move from any square to any other square in the grid at each step. While all transition probabilities are equal under the first matrix, under the other two the probability of the user moving to a nearby square is greater than that of taking a larger step to a more distant one. Moreover, the second matrix represents a more uniform trajectory, where the agent moves to equidistant cells with equal probability, while under the third the agent is more likely to follow a certain path, i.e., the random trajectory it generates has lower entropy. The transition probabilities for the second matrix are given by:
(31) 
where the distance term is the Manhattan distance between the two positions, and a scalar parameter determines the probability of the user moving from one square to any of the equidistant squares in the next step. Fig. 11 is obtained for a fixed setting of these parameters.
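A transition matrix of this flavor can be sketched by letting the move probability decay with Manhattan distance and normalizing each row. The exponential decay form and the constant c are illustrative assumptions, not the paper's exact parameterization in Eq. (31).

```python
import numpy as np

def distance_decay_transitions(width, height, c=0.5):
    """Transition matrix where the probability of moving from cell i to
    cell j decays with their Manhattan distance d as c**d, with each row
    normalised to sum to one (cells numbered row by row)."""
    n = width * height
    P = np.empty((n, n))
    for i in range(n):
        ri, ci = divmod(i, width)
        for j in range(n):
            rj, cj = divmod(j, width)
            P[i, j] = c ** (abs(ri - rj) + abs(ci - cj))
        P[i] /= P[i].sum()
    return P

P = distance_decay_transitions(width=3, height=3)
```

Smaller values of c concentrate the chain on short moves, i.e., stronger temporal correlation in the trajectory.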
For the third matrix, we set
(32)
where the entries are defined using the modulo operator, which returns the remainder after division. As a result, the temporal correlations in the location history increase from the first to the third transition matrix.
We train our DNNs over a finite time horizon in each episode, and over a number of Monte Carlo rollouts. Fig. 10 shows that the information leakage decreases as the temporal correlation increases: since the proposed PDRP-IDC takes the entire released location history into account, it leaks less information when the locations on a trace are more strongly correlated.
V-A2. PDRP-ADC Results
Next, we consider the same scenario as before, but evaluate the PUT under an average distortion constraint. We evaluate the performance of the proposed PDRP-ADC and compare the results with the myopic Markovian location release mechanism proposed in [Ravi], where an upper bound on the PUT is given by a myopic policy as follows:
(33) 
Exploiting the fact that (33) is similar to the rate-distortion function, the Blahut-Arimoto algorithm is used in [Ravi] to minimize the conditional mutual information at each time step. The finite-horizon solution of the objective function (33) is obtained by applying alternating minimization sequentially. In our simulations, we obtained the average information leakage and distortion for this approach by normalizing over the time horizon.
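The per-step minimization solved by this baseline can be sketched with generic Blahut-Arimoto iterations for a Lagrangian rate-distortion objective; this is a textbook-style sketch under a fixed source pmf, not the exact implementation in [Ravi].

```python
import numpy as np

def blahut_arimoto(p_x, dist, lam, iters=200):
    """Blahut-Arimoto iterations minimising I(X;Y) + lam * E[d(X,Y)]
    over the release channel pi[x, y], for a fixed source pmf p_x and a
    distortion matrix dist[x, y]."""
    n_x, n_y = dist.shape
    pi = np.full((n_x, n_y), 1.0 / n_y)          # start from the uniform channel
    for _ in range(iters):
        q = p_x @ pi                             # output marginal
        pi = q[None, :] * np.exp(-lam * dist)    # tilt towards low distortion
        pi /= pi.sum(axis=1, keepdims=True)      # renormalise each row
    return pi

p_x = np.array([0.5, 0.5])
dist = 1.0 - np.eye(2)
pi_tight = blahut_arimoto(p_x, dist, lam=10.0)   # heavy distortion penalty
pi_loose = blahut_arimoto(p_x, dist, lam=0.0)    # only leakage is minimised
```

A large multiplier drives the channel towards truthful release, while a zero multiplier leaves it uniform, i.e., zero leakage.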
In Fig. 11, PUT curves of the proposed PDRP-ADC and the myopic location release mechanism are obtained for the same environment defined in Section V-A1. The same transition matrices are used, representing increasing levels of temporal correlation in the user's trajectory. The Lagrange multiplier determines the user's choice of operating point on the PUT curve. Distortion is again measured by the Manhattan distance. As in Section V-A1, we train our DNNs over a fixed horizon in each episode, and average over Monte Carlo rollouts. Fig. 11 shows that, for the most correlated transition matrix, the proposed PDRP-ADC obtained through deep RL leaks much less information than the myopic location release mechanism at the same distortion level, indicating the benefit of considering the entire history when taking actions at each time instant. The gain is smaller for the moderately correlated matrix, since there is less temporal correlation in the location history, and hence less to gain from considering the history. Finally, for the uniform transition matrix, the proposed scheme and the myopic policy perform identically, since a user movement following a uniform distribution has no temporal memory; therefore, taking the history into account does not help.
We next consider a small toy example for PDRP-ADC to visualize the action selection and location release strategy. We consider a grid world in which the user's trajectory forms a first-order Markov chain with the transition probability matrix given in Table II. We assume that the user can start its movement at any square with equal probability. The Lagrange multiplier and the average distortion constraint are fixed for this example.
From \ To  1  2  3  4  5  6
1  0.11  0.64  0.05  0.11  0.05  0.04 
2  0.1  0.1  0.6  0.05  0.1  0.05 
3  0.05  0.11  0.11  0.04  0.05  0.64 
4  0.11  0.05  0.04  0.11  0.64  0.05 
5  0.05  0.1  0.05  0.1  0.1  0.6 
6  0.04  0.05  0.11  0.05  0.11  0.64 
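The dynamics of this toy example can be simulated directly. The sketch below samples a user trajectory from the Table II matrix with a uniform initial position; the horizon length is arbitrary, chosen only for illustration.

```python
import numpy as np

# Transition matrix from Table II (rows: current square, cols: next square)
P = np.array([
    [0.11, 0.64, 0.05, 0.11, 0.05, 0.04],
    [0.10, 0.10, 0.60, 0.05, 0.10, 0.05],
    [0.05, 0.11, 0.11, 0.04, 0.05, 0.64],
    [0.11, 0.05, 0.04, 0.11, 0.64, 0.05],
    [0.05, 0.10, 0.05, 0.10, 0.10, 0.60],
    [0.04, 0.05, 0.11, 0.05, 0.11, 0.64],
])

def sample_trajectory(P, horizon, rng):
    """Sample a trajectory: uniform initial square, then first-order
    Markov transitions according to P."""
    n = P.shape[0]
    x = int(rng.integers(n))             # uniform initial position
    trace = [x]
    for _ in range(horizon - 1):
        x = int(rng.choice(n, p=P[x]))   # next square given current one
        trace.append(x)
    return trace

rng = np.random.default_rng(0)
trace = sample_trajectory(P, horizon=10, rng=rng)
```

Note that squares are indexed 0 to 5 in the code, versus 1 to 6 in Table II.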
After training the actor and critic DNNs, we obtain the best action probabilities that minimize the objective function in (26). Given the user trajectory pattern in Table II, the action distribution matrix induced by PDRP-ADC is obtained as in Table III.
State pair \ Release  1  2  3  4  5  6
(1,1)  0.19  0.06  0.22  0.18  0.23  0.12 
(1,2)  0.21  0.19  0.28  0.09  0.06  0.17 
(1,3)  0.19  0.13  0.18  0.19  0.28  0.03 
(1,4)  0.3  0.24  0.17  0.07  0.07  0.15 
(1,5)  0.03  0.05  0.51  0.01  0.25  0.15 
(1,6)  0.22  0.14  0.13  0.16  0.21  0.14 
⋮  ⋮  ⋮  ⋮  ⋮  ⋮  ⋮ 
(6,1)  0.03  0.07  0.21  0.21  0.32  0.16 
(6,2)  0.18  0.13  0.35  0.1  0.16  0.08 
(6,3)  0.21  0.08  0.18  0.12  0.13  0.28 
(6,4)  0.18  0.05  0.19  0.36  0.14  0.08 
(6,5)  0.31  0.14  0.3  0.07  0.16  0.02 
(6,6)  0.09  0.29  0.21  0.16  0.01  0.24 
In Fig. 12, we show some samples of true location traces generated by the transition matrix given in Table II, together with their distorted versions released according to the best action distributions given in Table III. Fig. 12 shows that locations are not released according to a deterministic pattern, which could reveal the true location realizations to a malicious third party in the long term. Although the released and true location samples coincide at some time instants, the third party cannot know in advance at which time instant there is going to be full leakage. Hence, the privacy measure based on uncertainty, together with the stochastic behavior of the location release policy, provides a certain level of privacy even in long-term data release.
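The release step itself can be sketched as follows: each released square is drawn from the distribution indexed by the (current true square, previously released square) pair, mirroring the row structure of Table III. Since Table III is only partially shown, the action table below is a random hypothetical stand-in, not the trained PDRP-ADC policy, and the initial "last release" is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

# Hypothetical stand-in for Table III: one release distribution per
# (current true square, previously released square) pair.
action_dist = rng.dirichlet(np.ones(n), size=(n, n))

def release_trace(true_trace, action_dist, rng):
    """Sample a released (distorted) trace: each release is drawn from
    the distribution indexed by the (true, last-released) pair, so the
    mapping from true to released locations stays stochastic."""
    y = int(rng.integers(n))              # arbitrary initial "last release"
    released = []
    for x in true_trace:
        y = int(rng.choice(n, p=action_dist[x, y]))
        released.append(y)
    return released

released = release_trace([0, 1, 2, 5, 4, 5], action_dist, rng)
```

Because every release is sampled rather than chosen deterministically, repeated runs over the same true trace produce different released traces, which is what prevents a third party from inverting the release pattern.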
VI Conclusions
We have studied the PUT of time-series data sharing using mutual information as a privacy measure. Having identified some properties of the optimal policy, we proposed information-theoretically optimal online PDRPs under instantaneous and average distortion constraints, which represent utility constraints, and formulated the PUT problem as an MDP. Due to the continuous state and action spaces, it is challenging to characterize, or even numerically compute, the optimal policy. We overcame this difficulty by employing advantage actor-critic deep RL as a computational tool. We then applied the theoretical approach introduced for time-series data privacy to the location trace privacy problem. Utilizing DNNs, we numerically evaluated the PUT curves of the proposed PDRPs under both instantaneous and average distortion constraints. We compared the results with the myopic location release policy recently introduced in [Ravi], and observed the effect of accounting for temporal correlations on the information leakage-distortion performance. The simulation results show that the proposed data release policies provide a significant privacy advantage, especially when the user trajectory exhibits stronger temporal correlations.
Appendix A Proof of Theorem 1
The proof of Theorem 1 relies on the following lemmas and will be presented later.
Lemma 1.
For any ,