Privacy-Aware Time-Series Data Sharing with Deep Reinforcement Learning

by   Ecenaz Erdemir, et al.
Imperial College London

Internet of things (IoT) devices are becoming increasingly popular thanks to the many new services and applications they offer. However, in addition to their many benefits, they raise privacy concerns, since they share fine-grained time-series user data with untrusted third parties. In this work, we study the privacy-utility trade-off (PUT) in time-series data sharing. Existing approaches to the PUT mainly focus on a single data point; however, temporal correlations in time-series data introduce new challenges. Methods that preserve the privacy of the current sample may leak a significant amount of information at the trace level, as the adversary can exploit temporal correlations in a trace. We consider sharing a distorted version of a user's true data sequence with an untrusted third party. We measure the privacy leakage by the mutual information between the user's true data sequence and its shared version. We consider both instantaneous and average distortion between the two sequences, under a given distortion measure, as the utility loss metric. To tackle the history-dependent mutual information minimization, we reformulate the problem as a Markov decision process (MDP), and solve it using asynchronous actor-critic deep reinforcement learning (RL). We apply our optimal data release policies to a location trace privacy scenario, and evaluate the performance of the proposed policy numerically.








I Introduction

Recent advances in Internet of things (IoT) devices have increased the variety of services they provide, such as health monitoring, financial analysis, weather analysis, location-based services (LBSs) and smart metering. Moreover, the integration of some IoT devices with social networks has encouraged users to share their personal data in order to obtain useful information from these social platforms. While users can receive hotel, restaurant and product recommendations from Facebook, Twitter or YouTube when they share their location information, they can also benefit from personalized dietary tips by sharing their Fitbit activity. However, the fine-grained time-series data collected by IoT devices contain sensitive confidential information about the user. Account balances, biomedical measurements, location traces, weather forecasts and smart meter readings are typical examples of time-series data which carry sensitive personal information. For instance, a malicious third party can derive an individual's frequently visited destinations, financial situation or social relationships from shared location information [LPsurvey]. Using non-intrusive load monitoring techniques on smart meter data, an eavesdropper can deduce the user's presence at home, disabilities, and even political views from the TV channel the user is watching [GiulioSPmag]. Above all, the most sensitive private information, such as patient history, chronic diseases and psychological state, can be revealed by health monitoring systems [ECG, Health]. Therefore, time-series data privacy has become an important concern, and there is increasing pressure from consumers to keep their data traces private against malicious attackers or untrusted service providers (SPs), while preserving the utility obtained from these IoT services. Our goal in this paper is to study the fundamental privacy-utility trade-off (PUT) when sharing sensitive time-series data.

I-A Related Work

Time-series data privacy and its applications to various domains have been extensively studied [Timeseries, Health_kanon1, Health_kanon2, Health_DifP, Crypto, InfoTheo_annon, InfoTheo_anon_obfus, Shokri_kanony, Shokri_single, InfoTheo_single, SM_DifP1, SM_DifP2, GiulioRED_ESD, Ravi, Trace1, Trace2, Trace3, WIFS, Giulio, ICASSP, BizBook]. A large body of research has focused on protecting the privacy of a single data point, e.g., the current sensitive measurement [Shokri_kanony, Shokri_single, InfoTheo_single, SM_DifP1, SM_DifP2, GiulioRED_ESD]. However, the temporal relations in time-series data require going beyond single-data-point privacy. Individual measurements taken at each time instance, such as electrocardiogram (ECG), body temperature, location, account balance and smart meter readings, are highly correlated, and strategies that focus only on the privacy of the current data point might reveal sensitive information about past or future measurements.

Differential privacy (DP), k-anonymity and information theoretic metrics are commonly used as privacy measures [Timeseries, Health_kanon1, Health_kanon2, Health_DifP, Crypto, Shokri_kanony, Shokri_single, InfoTheo_single, InfoTheo_annon, InfoTheo_anon_obfus, SM_DifP1, SM_DifP2, GiulioRED_ESD, Ravi, Trace1, Trace2, Trace3, WIFS, ICASSP, Giulio, BizBook]. By definition, DP prevents the SP from inferring the current sensitive data of the user, even if the SP has knowledge of all the remaining private data points. K-anonymity ensures that a sensitive data point is indistinguishable from at least k-1 other data points. However, DP and k-anonymity are meant to ensure the privacy of a single data point in time. In [ShokriQuantify], it is stated that these are not appropriate measures for location trace privacy, since they do not take temporal correlations into account.

Several papers on DP and k-anonymity consider temporal correlations. In [Health_DifP], physiological measurements are obfuscated before being reported to a utility provider for the PUT. Instead of the entire time-series history, a selected temporal section of the sensor data is considered, and the problem is solved using dynamic programming and a greedy algorithm. The work in [InfoTheo_annon] focuses on keeping the user's identity private in a location privacy setting by performing a random permutation over a set of multiple users. However, the users might still be re-identified when attackers have access to auxiliary information. In [InfoTheo_anon_obfus], the authors improve this approach by considering both user identity and location privacy, combining anonymization with obfuscation. However, the risk of the user being re-identified by the adversary remains, and the privacy gain from obfuscation depends strongly on the number of users. In [SM_DifP2], DP for a smart meter with a rechargeable battery is achieved by adding noise to the meter readings before they are reported to a utility provider. In order to guarantee DP, the perturbation must be independent of the battery's state of charge. However, for a finite-capacity battery, the energy management system cannot always provide the amount of noise required for preserving privacy.

On the other hand, information-theoretic privacy considers the statistics of the entire time series, including its temporal correlations, and studies privacy mechanisms that allow arbitrary stochastic transformations of data samples, rather than being limited to additive noise of a specific form. In [Ravi], the authors introduce location distortion mechanisms to keep the user's trajectory private, measuring privacy by the mutual information between the true and released traces, under a constraint on the average distortion between the two. The true trajectory is assumed to form a Markov chain. Due to the computational complexity of history-dependent mutual information optimization, the authors propose bounds that take only the current and one-step-past locations into account. However, due to temporal correlations in the trajectory, the optimal distortion introduced at each time instance depends on the entire distortion and location history; hence, the proposed bounds do not guarantee optimality.

In [AshishInfoTheoP], a smart metering system with Markovian energy demands is considered. Privacy is achieved by filtering the energy demand with the help of a rechargeable battery. The information-theoretic privacy problem is formulated as a Markov decision process (MDP), and the minimum leakage is obtained numerically through dynamic programming, while a single-letter expression is obtained for independent and identically distributed (i.i.d.) demands. This approach is extended to a scenario with a renewable energy source in [Giulio]. In [ParvMDP], the privacy-cost trade-off is examined with a rechargeable battery. Due to the Markovian demand and price processes, the problem is formulated as a partially observable MDP with belief-dependent rewards, and solved by dynamic programming for the infinite horizon. In [ICASSP], the PUT is characterized numerically by dynamic programming for a special energy generation process.

In [Erdogdu_TimeSeriesISIT], the PUT of time-series data is considered in both online and offline settings. In this scenario, a user continuously releases data samples that are correlated with her private information, and in return obtains utility from an SP. The proposed schemes are cast as convex optimization problems and solved under a hidden Markov model assumption. Simulation results are provided for binary time-series data over a finite time horizon. However, the dimensions of the optimization problems in both schemes grow exponentially with time and with the number of sample states. Therefore, when fine-grained sensor data is considered over a long time horizon, the computational complexity of the proposed schemes is very high.

I-B Contributions

In this work, we consider the scenario in which the user measures time-series data (e.g., location, heartbeat, temperature or energy consumption) generated by a first-order Markov process through an IoT device, and periodically reports a distorted version of her true data to an untrusted SP to gain utility. We assume that the true data becomes available to the user in an online manner. We use the mutual information between the true and distorted data sequences as a measure of privacy loss, and measure the utility of the reported data by a specific distortion metric between the true and distorted samples. For the PUT, we introduce an online private data release policy (PDRP) that minimizes the mutual information while keeping the distortion below a certain threshold. We consider both instantaneous and average distortion constraints. We consider data release policies that take the entire released data history into account, and show their information-theoretic optimality. To tackle the complexity, we exploit the Markovity of the user's true data sequence, and recast the problem as a Markov decision process (MDP). After identifying the structure of the optimal policy, we use the advantage actor-critic (A2C) deep reinforcement learning (RL) framework as a tool to evaluate our continuous state and action space MDP numerically. The performance of the proposed PDRPs is examined in a specific scenario, where the time-series data is represented by the location traces generated by the user moving in a grid-world. For the average distortion constrained case, the proposed PDRP is compared with a myopic location data release mechanism [Ravi]. To the best of our knowledge, this is the first time deep RL tools are used to optimize information-theoretic time-series data privacy.

This paper extends the theoretical approach of our previous work on the PUT for location sharing [WIFS]. Our contributions are summarized as follows:

  • We propose a simplified PDRP by exploiting the Markov property of the user's true data sequence. Then, we prove the information-theoretic optimality of the simplified strategy.

  • As a novel approach, we recast the information theoretic time-series data PUT problem as a Markov decision process and evaluate the MDP using advantage actor-critic deep RL.

  • We apply the optimal data release strategies to the location trace privacy problem, and numerically evaluate their performance under instantaneous and average distortion constraints.

The remainder of the paper is organized as follows. We present the problem statement in Section II where we also introduce privacy and utility metrics. In Section III, we introduce simplified data release mechanisms for the time-series data PUT problem. In Section IV, we reformulate the problem as an MDP and propose a numerical evaluation approach utilizing advantage actor-critic deep RL. In Section V, we apply the proposed solution to the location trace privacy problem, and compare the performance of the proposed location release strategy with a myopic policy numerically. Finally, we conclude our work in Section VI.

II Problem Statement

Notation        Definition
X               Time-series data set
T               Time-series data length
X_t, Y_t        Random variables representing the user's true and distorted data at time t
p_0             Probability distribution of the true data at t = 1
q(x_t | x_{t-1})  Markov transition probability of the user data
Q               Markov transition matrix of the transition probabilities
pi_t            Conditional probability distribution at time t (policy)
Gamma           Probability space of history-dependent policies
Gamma_S, Gamma_S^K  Probability spaces of simplified policies under first-order and K-th order Markov assumptions
TABLE I: Notation summary

We consider a time series taking values from a finite discrete set. The user shares this data with an SP to gain utility through some online service. We assume that the user's true data sequence follows a first-order time-homogeneous Markov chain with given transition probabilities and initial probability distribution. While the first-order Markov structure assumed for the true data may seem restrictive, we will show that our solution techniques generalize to higher-order Markov chains, albeit with increased complexity in the numerical solutions. In the literature, a Markov structure is a common assumption for time-series data, and it has been shown to be a reasonable assumption for location trajectories [MarkovProofLocation], smart meter measurements [MarkovProofSM] and financial data [MarkovProofFinance], given the history-dependent behavior of these time series.
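As a concrete illustration of this data model, the following sketch samples a trajectory from a first-order time-homogeneous Markov chain. The two-state chain and its transition matrix are illustrative assumptions, not the paper's experimental setup:

```python
import random

def sample_trajectory(P, p0, T, seed=0):
    """Sample a length-T trajectory from a first-order time-homogeneous
    Markov chain with transition matrix P (rows sum to 1) and initial
    distribution p0; states are the integers 0..N-1."""
    rng = random.Random(seed)
    states = list(range(len(p0)))
    x = rng.choices(states, weights=p0)[0]
    traj = [x]
    for _ in range(T - 1):
        x = rng.choices(states, weights=P[x])[0]
        traj.append(x)
    return traj

# A two-state chain that tends to stay in place, i.e., the samples
# are temporally correlated rather than i.i.d.
P = [[0.9, 0.1],
     [0.2, 0.8]]
traj = sample_trajectory(P, p0=[0.5, 0.5], T=20)
```

The tendency to remain in the current state is exactly the kind of temporal correlation that a trace-level adversary can exploit.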

Instead of sharing her true data at each time, the user shares a distorted version of her current data. The released data at time t does not depend on future data samples; i.e., given the true and released data history up to time t, the released sample is conditionally independent of the future true samples, so the future samples, the history, and the current release form a Markov chain. The notation used throughout the paper is listed in Table I.

Fig. 1: Markov chain example for the true data generation.

For a better understanding of the user's private time-series data generation process, a simple Markov chain with its state space and state transition probabilities is presented in Fig. 1. The sensitive data takes its values according to the state transition probabilities. The user becomes aware of her true data in an online manner and releases a distorted version, following her privacy-preserving strategy.

II-A Privacy and Utility Measures

Mutual information can be written as the reduction in the uncertainty of a random variable (r.v.) due to the knowledge of another r.v., i.e., I(X;Y) = H(X) - H(X|Y), where H(X|Y) is the conditional entropy. In the information-theoretic time-series data privacy framework, we assume the strongest model for the malicious third party: both the user and the SP are assumed to have complete statistical knowledge of the user's data as well as her data release mechanism, that is, the transition probabilities of the Markov chain generating the true data sequence and the potentially stochastic mechanism that generates the released data from the history. We then quantify privacy by the information leaked to the untrusted SP, measured by the mutual information between the true and released data sequences. Accordingly, the information leakage of the user's data release strategy over a time period is given by


where the first equality follows from the chain rule of mutual information, while the second from the Markov chain


Even though a malicious third party can obtain the statistics of the user's data release strategy over an infinite time horizon, he cannot infer the realizations of the private information, since the privacy measure is based on uncertainty. Information-theoretic metrics are also preferable as privacy measures because they are independent of the attacker's behavior and computational capabilities.
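Mutual information as a leakage measure can be computed directly from a joint distribution. The sketch below does this for a hypothetical single-sample release channel (a binary symmetric perturbation), purely to illustrate the metric, not the paper's multi-step leakage expression:

```python
from math import log2

def mutual_information(joint):
    """I(X;Y) in bits from a joint pmf given as {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# A release mechanism that flips a uniform binary sample with
# probability 0.25 leaks strictly less than one bit about it.
joint = {(0, 0): 0.375, (0, 1): 0.125,
         (1, 0): 0.125, (1, 1): 0.375}
leak = mutual_information(joint)  # 1 - h(0.25) ~ 0.189 bits
```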

In the time-series data privacy problem, we want to minimize the information leakage to the SP. However, the more distortion we apply to the true data sequence for privacy, the more utility is lost due to the increased deviation from the original sequence. That is, releasing distorted data reduces the utility received from the SP, so the distortion applied by the user should be limited to a certain level. Therefore, our main purpose is to characterize the trade-off between privacy and utility. The distortion between a true data sample and its released version is measured by a distortion measure specified by the underlying application (e.g., Manhattan distance or Euclidean distance).

Our main goal is to minimize the information leakage rate to the SP while satisfying the distortion constraint for utility. Throughout the paper, we consider two different constraints on the distortion introduced by PDRP, namely an instantaneous distortion constraint and an average distortion constraint. The infinite-horizon optimization problem can be written as:


under the instantaneous distortion constraint , and as


under the average distortion constraint, where lowercase letters denote realizations of the corresponding random variables, and the user's randomized data release policy at time t is a conditional probability distribution. The randomness stems both from the Markov process generating the true data sequence and from the random release mechanism. The mutual information induced by a policy is calculated using the joint probability distribution


In the next section, we characterize the structure of the optimal data release policy, and using this structure we recast the problem as an MDP, and finally evaluate the optimal trade-off numerically using advantage actor-critic deep RL.

III PUT for Time-Series Data Sharing

In this section, we analyze the optimal PUT achievable by a privacy-aware time-series data release mechanism under the notion of mutual information minimization with both instantaneous and average distortion constraints. Moreover, we propose simplified PDRPs that still preserve optimality.

By the definition of mutual information, the objectives (2) and (3) depend on the entire history of the true and released data. Therefore, the user must follow a history-dependent PDRP from the feasible set of policies satisfying the distortion constraint. As a result of this strong history dependence, the computational complexity of the minimization problem increases exponentially with the length of the data sequence. To tackle this problem, we introduce a class of simplified policies, and prove that they cause no loss of optimality in the PUT.

III-A Simplified PDRPs

In this section, we introduce a set of policies that sample the distorted data by considering only the true data in the last two time instances and the entire released data history. Hence, the joint distribution (4) induced by such a policy can be written as


Next, we show that restricting attention to this set of PDRPs incurs no loss of optimality.

Theorem 1.

In both minimization problems (2) and (3), there is no loss of optimality in restricting the PDRPs to the simplified set of policies. Furthermore, the information leakage induced by any such policy can be written as:


and the average distortion induced by any such policy can be written as:


where the first equation comes from the linearity of expectation.

See Appendix A for the proof of Theorem 1.

Remark 1.

Although the proof of Theorem 1 assumes that the true data sequence is a first-order Markov chain, it is possible to generalize it to Markov chains of higher order K. Let the corresponding set of simplified policies, which consider the last K+1 true samples and the entire released history, be defined as


Then the following theorem holds.

Theorem 2.

If the true data sequence is a Markov chain of order K, then there is no loss of optimality in using a PDRP from this simplified set. Moreover, the information leakage induced by such a policy can be written as:


and the average distortion induced by any such policy can be written as:

Fig. 2: Markov chain induced by the simplified PDRP.

The simplified PDRP followed by the user is illustrated by the Markov chain in Fig. 2, where one node denotes the released data history, i.e., all samples released so far. That is, at each time the user samples the distorted data by considering the current and previous true data and the released data history.
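The sampling structure above can be sketched as follows, assuming a toy binary alphabet and a hypothetical history-independent policy for illustration; the interface (a pmf over released values given the previous and current true samples and the release history) mirrors the simplified PDRP:

```python
import random

def release_trace(x_traj, policy, seed=0):
    """Run a simplified PDRP: the released sample y_t depends only on
    (x_{t-1}, x_t) and the released history y^{t-1}.
    `policy(x_prev, x_cur, y_hist)` returns a pmf over released values."""
    rng = random.Random(seed)
    y_hist = []
    x_prev = None
    for x in x_traj:
        pmf = policy(x_prev, x, tuple(y_hist))
        vals, probs = zip(*sorted(pmf.items()))
        y_hist.append(rng.choices(vals, weights=probs)[0])
        x_prev = x
    return y_hist

# Hypothetical policy on states {0, 1}: report truthfully with
# probability 0.8, flip otherwise, ignoring the history.
policy = lambda x_prev, x, y_hist: {x: 0.8, 1 - x: 0.2}
y = release_trace([0, 0, 1, 1, 0], policy)
```

An optimal PDRP would, unlike this toy policy, condition its pmf on the full release history.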

III-B Online PDRP with an Instantaneous Distortion Constraint

As stated earlier, we assume that the utility gained by the user from sharing her private data diminishes as the distortion between the true data sequence and the released version increases, under the specified distortion measure. Therefore, the utility requirements of the user impose distortion constraints on the PDRP. Here, we assume that the user would like to guarantee a minimum utility level at each time instant, which, in turn, imposes an instantaneous constraint on the distortion between the true data sample and the released version at each time instance.
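Under an instantaneous constraint, the releasable values at each step form the set of samples within the distortion budget of the true sample. A small sketch, with a hypothetical one-dimensional state space and distance:

```python
def feasible_releases(x, states, d, D):
    """Set of released values y whose instantaneous distortion from the
    true sample x stays within the budget D: {y : d(x, y) <= D}."""
    return {y for y in states if d(x, y) <= D}

states = range(8)              # e.g. cells 0..7 on a line (assumption)
d = lambda x, y: abs(x - y)    # 1-D Manhattan distance
allowed = feasible_releases(3, states, d, D=2)
```

Any release policy satisfying the instantaneous constraint must place all of its probability mass on such a set.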

Accordingly, given a distortion budget, the set of feasible simplified PDRPs satisfying the instantaneous distortion constraint, together with the set of released data samples it induces, is given by


Furthermore, each feasible policy is required to satisfy


The objective of the PUT for the online PDRP with an instantaneous distortion constraint (PDRP-IDC) can be rewritten as


III-C Online PDRP with an Average Distortion Constraint

Alternatively, the user may want to limit only the average distortion applied to the true data sequence, i.e., the utility loss averaged over the time horizon. The set of feasible simplified PDRPs with an average distortion constraint, together with the set of released data samples it induces, is given by


where the constraint follows from the linearity of expectation, and the expectation is taken over the joint probabilities of the true and released sequences. Similarly to (13), the policy is required to satisfy


Hence, the objective of the problem for online PDRP with an average distortion constraint (PDRP-ADC) can be written as:


Minimization of the mutual information subject to a distortion constraint can be converted into an unconstrained minimization problem using a Lagrange multiplier. Since the distortion constraint induced by the simplified PDRP is memoryless, it can easily be integrated into the additive mutual information objective. Hence, the unconstrained minimization problem for the time-series data release PUT can be rewritten as


where the Lagrange multiplier determines the operating point on the trade-off curve, i.e., the point where the gradients of the mutual information and of the distortion constraint point in the same direction. When the multiplier is zero, the user releases data samples that only minimize the information leakage. On the other hand, as the multiplier grows large, the released data minimizes only the distortion rather than the information leakage, which results in full information leakage.
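The effect of the multiplier can be illustrated with a toy selection among hypothetical release options, each with an assumed per-step leakage and distortion; sweeping the multiplier moves the chosen option along the trade-off curve:

```python
def best_release(candidates, lam):
    """Pick the candidate minimizing leakage + lam * distortion, where
    lam is the Lagrange multiplier setting the operating point."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])[0]

# Hypothetical (name, leakage in bits, distortion) triples.
candidates = [("truthful", 1.00, 0.0),
              ("noisy",    0.19, 1.0),
              ("uniform",  0.00, 2.5)]
privacy_first = best_release(candidates, lam=0.0)   # ignores utility
utility_first = best_release(candidates, lam=10.0)  # ignores privacy
```

With the multiplier at zero the fully randomized release wins; with a large multiplier the truthful release wins, matching the two extremes described above.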

In the following section, we present the MDP formulation of the problem for both PDRPs and the evaluation method utilized by advantage actor-critic RL.

IV MDP Formulation

The Markovity of the user's true data sequence and the additive objective functions in both (15) and (19) allow us to represent the problem as an MDP. However, the per-step information leakage depends on the entire released data history, resulting in a state space that grows over time. Therefore, for a given policy and any realization of the released history, we define a belief state as a probability distribution over the state space:


This represents the SP's belief about the true data sample at the beginning of time instance t, i.e., after receiving the previously distorted samples. The actions are defined as the probability distributions with which the user samples the released value at time t, and are determined by the randomized PDRP. At each time t, the SP updates its belief about the true data sample, after observing its distorted version, by


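The belief update has the form of a standard Bayesian filter step. A sketch under the simplifying assumption that the release mechanism acts like a state-dependent emission probability (the chain and the numbers are illustrative):

```python
def belief_update(belief, P, emit, y):
    """One Bayesian filter step: fold the Markov transition P into the
    prior, weight by the probability that state x would emit the observed
    release y (emit[x][y]), and renormalize."""
    n = len(belief)
    pred = [sum(belief[x] * P[x][x_next] for x in range(n))
            for x_next in range(n)]
    post = [pred[x] * emit[x][y] for x in range(n)]
    z = sum(post)
    return [p / z for p in post]

P = [[0.9, 0.1], [0.2, 0.8]]       # sticky two-state chain
emit = [[0.8, 0.2], [0.2, 0.8]]    # report truthfully w.p. 0.8
b1 = belief_update([0.5, 0.5], P, emit, y=0)
```

After observing a release of 0, the belief shifts toward state 0, exactly the inference the SP performs at every step.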
We define the per-step information leakage of the user due to taking an action at time t as,


The expectation of the sum of (22) over the joint probability distribution equals the mutual information expression in the original problem (6). Therefore, given the belief and action probabilities, the average information leakage at time t can be formulated as,


We can recast the PDRP-IDC problem in (15) as a continuous state and action space MDP, where the actions satisfy the instantaneous distortion constraint and are induced by the simplified PDRP. The solution of the MDP for the PDRP-IDC problem relies on minimizing the objective


where the per-step cost is the average information leakage obtained by taking the chosen actions at each time step.

We remark that representing the average distortion in terms of the belief and action probabilities is straightforward due to its additive form. Similarly to (23), the average distortion for the PDRP-ADC at time t can be written as,


where there is no restriction on how the actions are chosen. Hence, we can recast the PDRP-ADC problem in (19) as a continuous state and action space MDP with a per-step cost function given by


Finding optimal policies for continuous state and action space MDPs is a PSPACE-hard problem [PSPACEhard]. In practice, they can be solved by various finite-state MDP evaluation methods, e.g., value iteration, policy iteration and gradient-based methods, which are based on discretizing the continuous belief states to obtain a finite-state MDP [Tamas]. While finer discretization of the belief reduces the loss with respect to the optimal solution, it enlarges the state space and hence the complexity of the problem. To overcome this limitation, we employ a deep learning based method as a tool to numerically solve our continuous state and action space MDP.

IV-A Advantage Actor-Critic (A2C) Deep RL

Fig. 3: RL for a known model.

In this section, we use a common notation for the MDP cost and action of both the PDRP-IDC and the PDRP-ADC; specializing the solution to the instantaneous and average distortion constrained cases is straightforward.

In RL, an agent discovers the best action to take in a particular state by receiving instantaneous rewards/costs from the environment [SuttonBarto]. In our problem, on the other hand, we know the state transition probabilities and the cost for every state-action pair without needing to interact with a real environment. We use A2C deep RL as a computational tool to numerically evaluate the optimal PDRP for our continuous state and action space MDP.

To integrate the RL framework into our problem, we create an artificial environment which takes the user's current action as input, samples an observation, and calculates the next state using the Bayesian belief update (21). The instantaneous cost revealed by the environment is calculated by (26). The user receives the experience tuple from the environment, and refines her policy accordingly. Fig. 3 illustrates the interaction between the artificial environment and the user, who is represented by the RL agent. The corresponding Bellman equation induced by the policy can be written as


where the state-value function is evaluated at the belief state updated according to (21), the action is drawn from the policy's action probability distribution, and the cost-to-go function is the expected future cost induced by the policy [Bertsekas].

Fig. 8: Critic (a) and actor (b) DNN structures.

RL methods can be divided into three groups: value-based, policy-based, and actor-critic [OnACalgs]. Actor-critic methods combine the advantages of value-based (critic-only) and policy-based (actor-only) methods, such as low variance and the ability to produce continuous actions. The actor represents the policy structure, while the critic estimates the value function. In our setting, we parameterize the value function by one parameter vector (the critic) and the stochastic policy by another (the actor). The difference between the right and left hand sides of (27) is called the temporal difference (TD) error, which represents the error between the critic's estimate and a target differing by one step in time [SurveyACRL]. The TD error for an experience tuple is estimated as


where the first term is called the TD target, and the discount factor is chosen very close to 1 to approximate the Bellman equation in (27) for our infinite-horizon average-cost MDP. To implement RL in the infinite-horizon problem, we take sample averages over independent finite data sequences, generated from experience tuples at each time via Monte-Carlo roll-outs.
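The one-step TD error itself is a small computation; a sketch with illustrative numbers (a discount factor close to 1, as discussed above):

```python
def td_error(cost, value_s, value_s_next, gamma=0.99):
    """One-step temporal-difference error: the gap between the critic's
    estimate V(s) and the bootstrapped TD target cost + gamma * V(s')."""
    td_target = cost + gamma * value_s_next
    return td_target - value_s

# Illustrative values for one experience tuple.
delta = td_error(cost=0.2, value_s=1.0, value_s_next=0.9)
```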

Instead of using value functions directly in the actor and critic updates, we use the advantage function to reduce the variance of the policy gradient. The advantage can be approximated by the TD error. Hence, the critic is updated by gradient descent as:


where the critic loss is minimized with the critic's learning rate at time t. The actor is updated similarly as,


where the actor loss is minimized with the actor's learning rate. This method is called advantage actor-critic RL.
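These two updates can be sketched with a linear critic and explicit gradients. The sign convention below (descend on the TD error for a cost-minimizing MDP) is one common choice, and the learning rates and numbers are illustrative, not the paper's settings:

```python
def a2c_update(w, theta, phi, grad_logpi, delta, lr_v, lr_pi):
    """One A2C step with a linear critic V(s) = w . phi(s), using the TD
    error `delta` as the advantage estimate: the critic moves its value
    estimate toward the TD target, and the actor makes actions with
    positive advantage (i.e., costlier than expected) less likely."""
    w_new = [wi + lr_v * delta * fi for wi, fi in zip(w, phi)]
    theta_new = [ti - lr_pi * delta * gi
                 for ti, gi in zip(theta, grad_logpi)]
    return w_new, theta_new

w, theta = a2c_update(w=[0.5], theta=[0.1], phi=[1.0],
                      grad_logpi=[2.0], delta=0.3, lr_v=0.1, lr_pi=0.05)
```

In the paper's implementation both function approximators are DNNs rather than linear models, but the update structure is the same.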

Initialize the actor and critic DNNs with random weights
Initialize the environment
for each episode do
       Initialize the belief state;
       for each time step do
              Sample an action probability vector according to the current policy;
              Perform the action and calculate the cost;
              Sample an observation and calculate the next belief state;
              Set the TD target;
              Minimize the critic loss;
              Update the critic;
              Minimize the actor loss;
              Update the actor;
              Update the belief state
       end for
end for
Algorithm 1: A2C-deep RL algorithm for the online PDRP

In our A2C-deep RL implementation, we represent the actor and critic mechanisms by fully connected feed-forward deep neural networks (DNNs) with two hidden layers, as illustrated in Fig. 8. The critic DNN takes the current belief state as input, and outputs the value of that belief state under the current action probabilities. The actor DNN also takes the current belief state as input, and outputs the parameters used for determining the action probabilities for the corresponding belief; hence, the input and output sizes of both DNNs are determined by the size of the data alphabet. The actor DNN's output parameters are used to generate a Dirichlet distribution, which represents the action probabilities. The overall A2C-deep RL algorithm for the online PDRP is described in Algorithm 1. In the next section, we apply the proposed deep RL solution to a location trace privacy problem.
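Mapping actor outputs to action probabilities via a Dirichlet distribution can be sketched with standard-library Gamma sampling; the concentration vector below is a stand-in for the actor-DNN output, not values from the paper:

```python
import random

def sample_action_probs(alpha, seed=0):
    """Draw an action probability vector from Dirichlet(alpha) by
    normalizing independent Gamma(alpha_i, 1) samples; alpha stands in
    for the actor-DNN output, one concentration parameter per
    releasable value."""
    rng = random.Random(seed)
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    z = sum(g)
    return [gi / z for gi in g]

probs = sample_action_probs([2.0, 1.0, 1.0])
```

The sampled vector is a valid pmf over released values, so it can be used directly as the stochastic release policy for the current belief.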

V Application to Location Trace Privacy

In this section, we apply the theoretical framework introduced above to the location trace privacy problem, focusing on the location trace as an example of time-series data. In this scenario, the user shares a distorted version of her trajectory with the SP due to privacy concerns. An example user trajectory is illustrated in Fig. 9: the user's current location is depicted with a grey circle, while the true and released trajectories over the following time steps are represented by black and grey arrows, respectively.

V-A Numerical Results

In this subsection, we evaluate the PUT of the proposed PDRP-ADC and PDRP-IDC methods numerically. We also compare the PDRP-ADC results with the myopic Markovian location release mechanism proposed in [Ravi]. For the simulation results presented in the following sections, we train two fully connected feed-forward DNNs, representing the actor and critic networks, respectively, using the ADAM optimizer [ADAM]. Both networks contain two hidden layers with leaky-ReLU activations [LeakyRELU]. We obtain the corresponding PUT by averaging the total information leakage, under the specified distortion constraint, over the time horizon.

V-A1 PDRP-IDC Results

We first consider a simple grid-world, as in Fig. 9. The cells are numbered row by row, starting from the first row of the grid-world and ending with the last. The user's trajectory forms a first-order Markov chain whose transition probability matrix specifies, for each pair of cells, the probability of moving from one to the other. The user can start her movement at any square with equal probability. Our goal is to obtain the PUT under instantaneous distortion constraints, with the Manhattan distance between the true position and the reported one as the distortion measure.
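With cells numbered row by row, the Manhattan distortion between two cells follows from their row and column indices. A small sketch, using our own row-major, 1-based numbering convention:

```python
def manhattan(cell_a, cell_b, width):
    """Manhattan distance between two cells of a grid-world of the given
    width, with cells numbered row-major starting from 1."""
    ra, ca = divmod(cell_a - 1, width)
    rb, cb = divmod(cell_b - 1, width)
    return abs(ra - rb) + abs(ca - cb)

# On a 3x3 grid, opposite corners (cells 1 and 9) are 4 steps apart.
corner_distance = manhattan(1, 9, 3)
```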

Fig. 9: True and released user trajectory example.
Fig. 10: Average information leakage as a function of the allowed instantaneous distortion under Manhattan distance as the distortion measure.

In Fig. 10, PUT curves are obtained for three transition probability matrices, each corresponding to a different temporal correlation level. In all cases, the user can move from any square to any other square in the grid at each step. For the first matrix, all transition probabilities are equal, while for the other two, the probability of the user moving to a nearby square is greater than that of taking a larger step to a more distant one. Moreover, one of these represents a more uniform trajectory, where the agent moves to equidistant cells with equal probability, while under the other the agent is more likely to follow a certain path, i.e., the generated random trajectory has lower entropy. The second matrix is constructed so that the transition probabilities decay with the Manhattan distance between the two positions, with a scalar parameter determining the probability of the user moving from one square to any of the equidistant squares in the next step. Fig. 11 is obtained by setting this parameter accordingly.
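The exact formula for these transition probabilities was lost in extraction; one natural family with the described behavior, which we use purely for illustration, lets the probability decay geometrically with the Manhattan distance, P(i→j) ∝ β^{d(i,j)} for a scalar 0 < β < 1:

```python
import numpy as np

def distance_decay_matrix(width, height, beta):
    """Row-stochastic transition matrix where P[i, j] is proportional to
    beta ** d(i, j), with d the Manhattan distance between cells i and j.
    Smaller beta concentrates mass on nearby cells; equidistant cells
    always receive equal probability."""
    n = width * height
    def manhattan(a, b):
        ra, ca = divmod(a, width)
        rb, cb = divmod(b, width)
        return abs(ra - rb) + abs(ca - cb)
    P = np.array([[beta ** manhattan(i, j) for j in range(n)]
                  for i in range(n)], dtype=float)
    return P / P.sum(axis=1, keepdims=True)

P_decay = distance_decay_matrix(3, 3, 0.5)
```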

The third matrix is defined through a modulo operation on the cell indices (the modulo operator finds the remainder after division of one index by another), so that from each cell the user is likely to move along a particular path. As a result, the temporal correlations in the location history increase from the first matrix to the second, and from the second to the third.

We train our DNNs over a fixed time horizon in each episode, and average over Monte Carlo roll-outs. Fig. 10 shows that the information leakage decreases as the temporal correlation of the trajectory increases: the stronger the temporal correlations between the locations on a trace, the less information the proposed PDRP-IDC leaks, since it takes the entire released location history into account.
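The ordering of the three matrices by temporal correlation can be made concrete through the entropy rate of the Markov chain: a lower entropy rate means a more predictable, more strongly correlated trajectory. A small sketch, our own illustration rather than code from the paper:

```python
import numpy as np

def entropy_rate(P):
    """Entropy rate (bits per step) of a first-order Markov chain with
    row-stochastic transition matrix P: H = -sum_i mu_i sum_j P_ij log2 P_ij,
    where mu is the stationary distribution (left eigenvector for
    eigenvalue 1)."""
    vals, vecs = np.linalg.eig(P.T)
    mu = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    mu = mu / mu.sum()
    # Convention 0 * log 0 = 0: write zeros where P is zero.
    logs = np.log2(P, where=P > 0, out=np.zeros_like(P))
    return float(-np.sum(mu[:, None] * P * logs))

# A uniform chain has maximal entropy rate; a sticky chain has a lower one.
h_uniform = entropy_rate(np.full((4, 4), 0.25))
h_sticky = entropy_rate(np.array([[0.9, 0.1], [0.1, 0.9]]))
```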

V-A2 PDRP-ADC Results

Fig. 11: Average information leakage as a function of the allowed average distortion under Manhattan distance as the distortion measure.
Fig. 12: True and released location traces generated using the transition matrix in Table II and the action distributions in Table III.

Next, we consider the same scenario as before, but evaluate the PUT under an average distortion constraint. We evaluate the performance of the proposed PDRP-ADC and compare the results with the myopic Markovian location release mechanism proposed in [Ravi], where an upper bound on the PUT is given by a myopic policy. Exploiting the fact that the resulting objective (33) is similar to the rate-distortion function, the Blahut-Arimoto algorithm is used in [Ravi] to minimize the conditional mutual information at each time step. The finite-horizon solution of the objective (33) is obtained by applying alternating minimization sequentially. In our simulations, we obtained the average information leakage and distortion for this approach by normalizing over the time horizon.
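The per-step subproblems in [Ravi] have the rate-distortion form and are solved by Blahut-Arimoto style alternating minimization. The following is a minimal sketch for the simplified, unconditional Lagrangian min over p(y|x) of I(X;Y) + λ·E[d(X,Y)]; the function name and interface are ours, not from [Ravi]:

```python
import numpy as np

def blahut_arimoto(p_x, d, lam, iters=200):
    """Alternating minimization of I(X;Y) + lam * E[d(X,Y)] over the
    release channel p(y|x). p_x is the source distribution and d the
    |X| x |Y| distortion matrix. Each iteration updates the output
    marginal q(y), then sets p(y|x) proportional to q(y) * exp(-lam*d)."""
    n, m = d.shape
    p_y_given_x = np.full((n, m), 1.0 / m)  # start from the uniform channel
    for _ in range(iters):
        q_y = p_x @ p_y_given_x                  # induced output marginal
        w = q_y[None, :] * np.exp(-lam * d)      # unnormalized channel update
        p_y_given_x = w / w.sum(axis=1, keepdims=True)
    return p_y_given_x

# Example: 3 locations, absolute-difference distortion.
d = np.abs(np.arange(3)[:, None] - np.arange(3)[None, :]).astype(float)
p_x = np.full(3, 1.0 / 3.0)
channel = blahut_arimoto(p_x, d, lam=5.0)
```

As λ → 0 the channel ignores the input entirely (maximum privacy, maximum distortion); for large λ the released symbol tracks the true one, which is the trade-off the PUT curve traces out.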

In Fig. 11, the PUT curves of the proposed PDRP-ADC and the myopic location release mechanism are obtained for the same environment defined in Section V-A1. The same transition matrices are used, representing increasing temporal correlations in the user's trajectory. The Lagrangian multiplier determines the user's choice of operating point on the PUT curve, and distortion is again measured by the Manhattan distance. As in Section V-A1, we train our DNNs over a fixed horizon in each episode, averaging over Monte Carlo roll-outs. Fig. 11 shows that, for the most correlated transition matrix, the proposed PDRP-ADC obtained through deep RL leaks much less information than the myopic location release mechanism at the same distortion level, indicating the benefits of considering the entire history when taking actions at each time instant. The gain is smaller for the moderately correlated matrix, since there are weaker temporal correlations in the location history, and hence less to gain from considering the history. Finally, for the uniform transition matrix, the proposed scheme and the myopic policy perform the same, since user movement with a uniform distribution has no temporal memory; therefore, taking the history into account does not help.

We next consider a small toy example for PDRP-ADC to visualize the action selection and location release strategy. We consider a grid-world with six cells, where the user's trajectory forms a first-order Markov chain with the transition probability matrix given in Table II. We assume that the user can start her movement at any square with equal probability. The values of the Lagrange multiplier and the distortion constraint are fixed for this example.

From\To    1     2     3     4     5     6
  1      0.11  0.64  0.05  0.11  0.05  0.04
  2      0.10  0.10  0.60  0.05  0.10  0.05
  3      0.05  0.11  0.11  0.04  0.05  0.64
  4      0.11  0.05  0.04  0.11  0.64  0.05
  5      0.05  0.10  0.05  0.10  0.10  0.60
  6      0.04  0.05  0.11  0.05  0.11  0.64
TABLE II: The transition probability matrix of the toy example for PDRP-ADC; entry (i, j) is the probability of moving from cell i to cell j.
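Each row of Table II sums to one, so the matrix can be used directly to simulate true location traces. A minimal sketch; the simulation code and seed are ours:

```python
import numpy as np

# Transition matrix from Table II (rows: current cell 1-6, cols: next cell).
P = np.array([
    [0.11, 0.64, 0.05, 0.11, 0.05, 0.04],
    [0.10, 0.10, 0.60, 0.05, 0.10, 0.05],
    [0.05, 0.11, 0.11, 0.04, 0.05, 0.64],
    [0.11, 0.05, 0.04, 0.11, 0.64, 0.05],
    [0.05, 0.10, 0.05, 0.10, 0.10, 0.60],
    [0.04, 0.05, 0.11, 0.05, 0.11, 0.64],
])

def simulate_trace(P, length, rng):
    """Simulate a first-order Markov location trace; the initial cell is
    uniform over the six cells, as in the toy example."""
    n = P.shape[0]
    trace = [rng.integers(n)]
    for _ in range(length - 1):
        trace.append(rng.choice(n, p=P[trace[-1]]))
    return [int(c) + 1 for c in trace]  # report 1-based cell labels

rng = np.random.default_rng(1)
trace = simulate_trace(P, 10, rng)
```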

After training the actor and critic DNNs, we obtain the best action probabilities that minimize the objective function in (26). Given the user trajectory pattern in Table II, the action distribution matrix induced by PDRP-ADC is obtained as in Table III.

State     1     2     3     4     5     6
(1,1)   0.19  0.06  0.22  0.18  0.23  0.12
(1,2)   0.21  0.19  0.28  0.09  0.06  0.17
(1,3)   0.19  0.13  0.18  0.19  0.28  0.03
(1,4)   0.30  0.24  0.17  0.07  0.07  0.15
(1,5)   0.03  0.05  0.51  0.01  0.25  0.15
(1,6)   0.22  0.14  0.13  0.16  0.21  0.14
(6,1)   0.03  0.07  0.21  0.21  0.32  0.16
(6,2)   0.18  0.13  0.35  0.10  0.16  0.08
(6,3)   0.21  0.08  0.18  0.12  0.13  0.28
(6,4)   0.18  0.05  0.19  0.36  0.14  0.08
(6,5)   0.31  0.14  0.30  0.07  0.16  0.02
(6,6)   0.09  0.29  0.21  0.16  0.01  0.24
TABLE III: Best action probabilities for the transition matrix in Table II; each row is the release distribution over the six cells for the corresponding state pair.
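Each row of Table III is a distribution over the six release candidates. A sketch of the stochastic release step, assuming purely for illustration that rows are indexed by a pair such as (true cell, previously released cell) — the exact row indexing is not fully specified in the extracted text:

```python
import numpy as np

# A few rows of Table III: action (release) distributions, indexed here by
# an illustrative pair (true cell, previous released cell).
action_dist = {
    (1, 1): [0.19, 0.06, 0.22, 0.18, 0.23, 0.12],
    (1, 5): [0.03, 0.05, 0.51, 0.01, 0.25, 0.15],
    (6, 6): [0.09, 0.29, 0.21, 0.16, 0.01, 0.24],
}

def release(true_cell, prev_released, rng):
    """Sample the released cell stochastically, so that the mapping from
    the true trace to the shared trace never becomes deterministic."""
    probs = action_dist[(true_cell, prev_released)]
    return int(rng.choice(6, p=probs)) + 1  # 1-based cell label

rng = np.random.default_rng(2)
shared = release(1, 5, rng)
```

Because every row spreads probability over several cells, an adversary observing the released trace cannot invert it to the true trace, which is exactly the long-term behavior discussed around Fig. 12.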

In Fig. 12, we show sample true location traces generated by the transition matrix in Table II, together with their distorted versions released according to the best action distributions in Table III. Fig. 12 shows that the released trace does not follow a deterministic pattern, which could otherwise reveal the true location realizations to a malicious third party in the long term. Although the released and true locations coincide at some time instants, the third party cannot know at which time instants there is going to be full leakage. Hence, the privacy measure based on the uncertainty and stochastic behavior of the location release policy provides a certain level of privacy even in long-term data release.

Vi Conclusions

We have studied the PUT of time-series data sharing using mutual information as a privacy measure. Having identified some properties of the optimal policy, we proposed information-theoretically optimal online PDRPs under instantaneous and average distortion constraints, which represent the utility constraints, and solved the PUT problem as an MDP. Due to the continuous state and action spaces, it is challenging to characterize, or even numerically compute, the optimal policy. We overcame this difficulty by employing advantage actor-critic deep RL as a computational tool. We then applied the theoretical approach introduced for time-series data privacy to the location trace privacy problem. Utilizing DNNs, we numerically evaluated the PUT curves of the proposed PDRPs under both instantaneous and average distortion constraints. We compared the results with the myopic location release policy recently introduced in [Ravi], and observed the effect of considering temporal correlations on the information leakage-distortion performance. The simulation results show that the proposed data release policies provide a significant privacy advantage, especially when the user trajectory exhibits strong temporal correlations.

Appendix A Proof of Theorem 1

The proof of Theorem 1 relies on the following lemmas and will be presented later.

Lemma 1.

For any ,