Privacy-preserving release of mobility data: a clean-slate approach

by   Szilvia Lestyan, et al.
CrySyS Lab

The quantity of mobility data is overwhelming nowadays, providing tremendous potential for various value-added services. While the benefits of these mobility datasets are apparent, they also pose a significant threat to location privacy. Although a multitude of anonymization schemes have been proposed to release location data, they all suffer from the inherent sparseness and high dimensionality of location trajectories, which render most techniques inapplicable in practice. In this paper, we revisit the problem of releasing location trajectories with strong privacy guarantees. We propose a general approach to synthesize location trajectories while providing differential privacy. We model the generator distribution of the dataset by first constructing a model to generate the source and destination location of trajectories along with time information, and then computing all transition probabilities between close locations given the destination of the synthetic trajectory. Finally, an optimization algorithm is used to find the most probable trajectory between the given source and destination at a given time using the computed transition probabilities. We exploit several inherent properties of location data to boost the performance of our model, and demonstrate its usability on a public location dataset. We also develop a novel composition of generative neural networks to synthesize location trajectories, which might be of independent interest.



1. Introduction

Analyzing human mobility patterns has been a focus of researchers over the last decades. Besides the fundamental academic curiosity to better understand human behavior (gonzalez2008understanding, ), mining mobility data enables us to design livable cities, buildings (biczok2014navigating, ) and intelligent transportation systems (intelligent_trans, ), perform spatial resource optimization, and implement location-based services (kupper2005location, ), whether for targeted advertisements (li2012building, ), fire emergency response (lazreg2015smartrescue, ) or controlling a full-blown COVID-19 epidemic (ferretti2020quantifying, ). However, as the current discussion regarding contact tracing shows (cho2020contact, ), collecting and mining location data inherently comes with strong privacy and other ethical concerns (beresford2003location, ).

On one hand, naive de-identification methods have emerged naturally, attempting to find a feasible operating point on the privacy-utility trade-off curve for mobility data. On the other hand, these quasi-standard techniques do not do their job (zang2011anonymization, ), and this significantly hinders location data sharing and use by researchers, developers and humanitarian workers alike, as evidenced by qualified personnel not being granted access to mobile cell tower logs during the height of the Ebola crisis in 2014, with privacy cited as one of the main concerns. Perhaps rightfully so, as a study showed that pseudonymization and quasi-standard de-identification are not sufficient to prevent users from being re-identified in location data: four spatio-temporal data points were demonstrated to be enough for uniquely re-identifying 95% of the users in a dataset of 1.5 million users (de2013unique4points, ). Generally speaking, plenty of different anonymization techniques have been proposed, but there exists a similarly high number of re-identification algorithms (for a very thorough survey see (fiore2019trajsurvey, )).

A prominent line of anonymization research uses the notion of differential privacy (dwork2008differential, ), which gives a privacy guarantee based on rigorous mathematical proofs. Some proposals based on synthetic data generation via machine learning, i.e., modeling the dataset through the underlying distributions of generating variables, do apply differential privacy specifically to location data (chen2012diffgrams, ; he2015dpt, ), but with significant shortcomings. First, they generate synthetic traces as random walks that do not incorporate destinations, equivalent to: "let's go to a place where people usually go from this place". Second, time-of-day is also left out of the models; clearly, this reduces the descriptive power of the generative model, as human mobility does show strong time-of-day patterns (gonzalez2008understanding, ). In fact, time-of-day even influences trip destinations.

We acknowledge that incorporating the above variables meaningfully into an anonymization algorithm (whether with or without differential privacy) is a hard task owing to the curse of dimensionality. When trying to model a dataset through the underlying distribution of generating variables, a general approach to including several dependent variables is the chain rule, i.e., the joint probability distribution is broken down into conditional probabilities. With the large number of conditionals in the case of high-dimensional data (such as location), the number of histograms is also large. Moreover, these histograms are usually very sparse, and the bins' frequency distribution has a heavy tail: these promote uniqueness and make de-anonymization possible (narayanan2008robustnetflix, ) as well as useful anonymization notoriously difficult. (Note that if we assume partially independent variables, the model loses its descriptive power: a type of dimensionality vs. utility trade-off.) Yet, in specific cases, where there is a hidden pattern in the data that aids clustering, the construction of a well-fitting simple generative model could be possible. For such a model it is reasonable to add proper noise à la differential privacy, and still get meaningful utility from the noisy dataset.

In order to tackle high dimensional data modelling, deep learning has shown very promising results. Generative models estimate the underlying distribution of the data and generate realistic samples based on their estimated distribution. However, the generalization capability of these models does not necessarily prevent the model from learning individual-specific information (NasrSH19, ; fredrikson2015modelinv, ). Most privacy-preserving algorithms for neural networks are based on modifying the gradient information generated during backpropagation. Modification involves clipping the gradients (to bound the influence of any single record on the released model parameters) and adding calibrated random noise (abadi2016deep, ; chen2020gswgan, ). Some works propose to use generative adversarial networks (e.g., (park2018data, )) or mixture models (acs2018diffmixture, ) to directly generate privacy-preserving synthetic data. Nonetheless, none of these generative models are specific to private location data generation.

In this paper, we propose what we believe to be the first private synthetic data generation algorithm and corresponding neural network model specifically tailored for mobility data.

Contributions: Our contributions are as follows:

  • We propose a private synthetic data generation algorithm that is divided into three phases. The first phase learns the private distribution of the (starting point, destination, time) 3-tuple. The second phase generates transition probability distributions conditioned on the same 3-tuple. These probabilities are then used to build transition graphs, where, in the final step, the path with the highest probability between the starting point and destination is computed. We combine the three steps into our final generative model.

  • Our model is conditioned on both destination and time-of-day, and uses an accuracy-enhancing, scalable locality technique. This technique considers only plausible places for a trajectory, i.e., it disregards areas that cannot plausibly be reached in time at a realistic speed.

  • We evaluate our findings on a real-life open source taxi dataset, and demonstrate that the generated private synthetic data has higher or the same utility compared to previous works.

The rest of the paper is structured as follows. Section 2 defines the preliminaries. Section 7 gives both a high-level and a detailed description of our generative approach. Section 8 describes the experimental evaluation including the dataset, preprocessing, model instantiation, metrics and results. Finally, Section 9 concludes the paper.

2. Preliminaries

2.1. Location Data

In general, location data is geographical information about a specific device's whereabouts associated with a time identifier. Formally, let $\mathcal{L}$ be the universe of locations, where $|\mathcal{L}|$ is the size of the universe. We assume that the whole universe is represented as a grid and each location corresponds to a cell in the grid. Each record in a location database is a sequence of timestamped location visits drawn from the universe. Specifically, a sequence $S$ of length $|S|$ is an ordered list of items $(L_1, t_1) \to (L_2, t_2) \to \ldots \to (L_{|S|}, t_{|S|})$, where $L_i \in \mathcal{L}$ for all $1 \le i \le |S|$. A location may occur multiple times in $S$, but not consecutively. For example, $(L_1, t_1) \to (L_1, t_2) \to (L_2, t_3)$ is contracted to $(L_1, t') \to (L_2, t_3)$, where $t'$ is a function of $t_1$ and $t_2$. A location database $D$ is composed of a multiset of sequences $D = \{S_1, \ldots, S_{|D|}\}$, where $|D|$ denotes the number of traces in $D$.
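The contraction of consecutive visits to the same cell can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the `Visit` type and the `contract` name are hypothetical, and the merged visit keeps the timestamp of the first visit in the run, which is only one possible choice for the merge function.

```python
from typing import List, Tuple

# Hypothetical representation: (cell id, timestamp in seconds).
Visit = Tuple[int, int]


def contract(trace: List[Visit]) -> List[Visit]:
    """Merge each run of consecutive visits to the same cell into one visit.

    Assumption: the merged visit keeps the timestamp of the first visit of the run.
    """
    contracted: List[Visit] = []
    for cell, ts in trace:
        if contracted and contracted[-1][0] == cell:
            continue  # same cell as the previous visit: drop the repeat
        contracted.append((cell, ts))
    return contracted
```

A cell may still appear several times in the result, as long as the occurrences are not adjacent.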

2.2. Differential privacy

Differential privacy (dwork2006calibrating, ) (DP) ensures that the outcome of any computation is insensitive to the change of a single record. It follows that any information that can be learned from the database with a record can also be learned from the one without that particular record. In our case, DP guarantees that our generative model is not affected by any single original trajectory beyond the privacy budget measured by $\varepsilon$ and $\delta$, which can be computed as follows. The privacy loss between two neighbouring databases can be formulated as a random variable:

Definition 3 (Privacy loss (dwork2014algorithmic, )).

Let $\mathcal{M}$ be a privacy mechanism which assigns a value in $\mathrm{Range}(\mathcal{M})$ to a dataset $D$. The privacy loss of $\mathcal{M}$ with datasets $D$ and $D'$ and auxiliary input $\mathsf{aux}$ at output $o$ is a random variable:

$$c(o; \mathcal{M}, \mathsf{aux}, D, D') = \log \frac{\Pr[\mathcal{M}(\mathsf{aux}, D) = o]}{\Pr[\mathcal{M}(\mathsf{aux}, D') = o]}$$

where the probability is taken on the randomness of $\mathcal{M}$.

Definition 4 ($(\varepsilon, \delta)$-Differential privacy).

A privacy mechanism $\mathcal{M}$ gives $(\varepsilon, \delta)$-differential privacy if for any databases $D$ and $D'$ differing on at most one record, and for any set of possible outputs $O \subseteq \mathrm{Range}(\mathcal{M})$,

$$\Pr[\mathcal{M}(D) \in O] \le e^{\varepsilon} \Pr[\mathcal{M}(D') \in O] + \delta$$

where the probability is taken over the randomness of $\mathcal{M}$.

The original definition does not include the term $\delta$; this version was introduced in (dwork2006eddiff, ), and allows that $\varepsilon$-differential privacy is not satisfied with probability $\delta$.

Intuitively, the privacy loss, as a random variable, describes the value of $\varepsilon$ for a specific output $o$, and $(\varepsilon, \delta)$-DP holds whenever $\Pr[c(o; \mathcal{M}, \mathsf{aux}, D, D') > \varepsilon] \le \delta$ for any neighboring datasets $D$ and $D'$. That is, DP guarantees that every output of algorithm $\mathcal{M}$ is almost equally likely (up to $\varepsilon$) on datasets differing in a single record except with probability at most $\delta$, preferably smaller than $1/|D|$.

A fundamental concept for achieving differential privacy is the global sensitivity of a function (dwork2006calibrating, ) that maps an underlying database to (vectors of) reals:

Definition 5 (Global $L_p$-sensitivity (dwork2014algorithmic, )).

For any function $f: \mathcal{D} \to \mathbb{R}^n$, the $L_p$-sensitivity of $f$ is

$$\Delta_p f = \max_{D, D'} \| f(D) - f(D') \|_p$$

for all $D, D'$ differing in at most one record, where $\| \cdot \|_p$ denotes the $L_p$-norm.
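To make the role of sensitivity concrete, the following sketch calibrates Gaussian noise to the $L_2$-sensitivity of a function. The text does not name a specific mechanism at this point, so the classical Gaussian mechanism is used here as an assumption; its calibration $\sigma = \Delta_2 f \sqrt{2 \ln(1.25/\delta)} / \varepsilon$ is the standard one for $0 < \varepsilon < 1$. Function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def gaussian_sigma(l2_sensitivity: float, epsilon: float, delta: float) -> float:
    """Noise scale of the classical Gaussian mechanism (assumes 0 < epsilon < 1)."""
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("this calibration assumes 0 < epsilon < 1 and 0 < delta < 1")
    return l2_sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def gaussian_mechanism(values: Sequence[float], l2_sensitivity: float,
                       epsilon: float, delta: float,
                       rng: random.Random) -> List[float]:
    """Release f(D) = values with independent Gaussian noise on every coordinate."""
    sigma = gaussian_sigma(l2_sensitivity, epsilon, delta)
    return [v + rng.gauss(0.0, sigma) for v in values]
```

The noise scale grows linearly with the sensitivity, which is why bounding the influence of a single record is a prerequisite for useful private releases.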

Differential privacy also maintains composition, i.e., if each of the mechanisms $\mathcal{M}_1, \ldots, \mathcal{M}_k$ is $(\varepsilon, \delta)$-DP, then their $k$-fold adaptive composition is $(k\varepsilon, k\delta)$-DP. However, a tighter upper bound can be derived from advanced composition theorems such as in (bun2016concentrated, ).

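The moments accountant (abadi2016deep, ) generalizes the regular approach of keeping track of $(\varepsilon, \delta)$ using an advanced composition theorem by taking into account the exact noise distribution. In order to present the bounds given by the moments accountant, we first introduce the log of the moment generating function:

Definition 6 (Moment Generating function).

For a given mechanism $\mathcal{M}$, the log of the moment generating function of the privacy loss $c$ (Definition 3), evaluated at $\lambda$, is:

$$\alpha_{\mathcal{M}}(\lambda) = \max_{\mathsf{aux}, D, D'} \log \mathbb{E}_{o \sim \mathcal{M}(\mathsf{aux}, D)} \left[ \exp\left(\lambda \, c(o; \mathcal{M}, \mathsf{aux}, D, D')\right) \right]$$

Theorem 6.1 (Moments Accountant).

Let $\alpha_{\mathcal{M}}(\lambda)$ be defined as above. Let $\mathcal{M}$ be the $k$-fold adaptive composition of $\mathcal{M}_1, \ldots, \mathcal{M}_k$. Then:

  1. Composability: $\alpha_{\mathcal{M}}(\lambda) \le \sum_{i=1}^{k} \alpha_{\mathcal{M}_i}(\lambda)$ for any $\lambda$.

  2. Tail bound: For any $\varepsilon > 0$, the mechanism $\mathcal{M}$ is $(\varepsilon, \delta)$-differentially private for $\delta = \min_{\lambda} \exp\left(\alpha_{\mathcal{M}}(\lambda) - \lambda \varepsilon\right)$.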
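The two properties of the moments accountant can be traced numerically. The sketch below is a simplified illustration under a stated assumption: every step is a plain Gaussian mechanism with $L_2$-sensitivity 1 and noise scale $\sigma$, without the subsampling used in differentially private SGD. In that case the privacy loss is Gaussian and $\alpha(\lambda) = \lambda(\lambda+1)/(2\sigma^2)$. The function names and the search range for $\lambda$ are hypothetical.

```python
import math


def gaussian_log_moment(lam: int, sigma: float) -> float:
    """alpha(lambda) of one Gaussian mechanism step (sensitivity 1, no subsampling)."""
    return lam * (lam + 1) / (2.0 * sigma ** 2)


def composed_delta(epsilon: float, sigma: float, steps: int,
                   max_lambda: int = 64) -> float:
    """Delta guaranteed for a given epsilon after `steps` adaptive compositions."""
    best = 1.0  # delta = 1 is always a valid (vacuous) guarantee
    for lam in range(1, max_lambda + 1):
        # Composability: log moments of the steps add up.
        alpha_total = steps * gaussian_log_moment(lam, sigma)
        # Tail bound: delta = min over lambda of exp(alpha - lambda * epsilon).
        exponent = alpha_total - lam * epsilon
        if exponent < 0.0:
            best = min(best, math.exp(exponent))
    return best
```

For example, `composed_delta(4.0, sigma=4.0, steps=10)` is much smaller than `composed_delta(2.0, sigma=4.0, steps=10)`, and running more steps at the same noise scale increases the resulting delta.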

6.1. Artificial Neural Networks

Artificial neural networks define parametrized functions from inputs to outputs as compositions of many layers of basic computational blocks (artificial neurons) that may apply linear or nonlinear activation functions. By changing the parameters of these neurons, we can fit any finite set of input/output examples on the network. Results showed that networks constructed in this way can approximate real-valued continuous functions on compact subsets of $\mathbb{R}^n$ arbitrarily closely (csaji2001approximation, ). In this section, we introduce the different types of deep neural networks that we have applied in this work.

6.1.1. Feed Forward Neural Network (FFNN)

In FFNN, neurons (or perceptrons) are arranged into layers, where the first layer takes in inputs and the last layer produces outputs. The middle layers have no connection outside the network, and hence are called hidden layers. Each neuron in one layer is connected to every neuron in the next layer. Hence, information is constantly ”fed forward” from one layer to the next. In FFNN, there is no connection among neurons in the same layer.

6.1.2. Variational Autoencoders

A variational autoencoder (kingma2013auto, ) (doersch2016tutorialVAE, ) consists of two neural networks (an encoder $q_\phi$ and a decoder $p_\theta$), and a loss function. The encoder compresses data into a latent space ($z$) while the decoder reconstructs the data given the hidden representation. The primary benefit of a VAE is that it is capable of learning smooth latent state representations of the input data. In other words, a VAE learns a distribution over the input space.

Let $x$ be a random vector of observed variables, which are either discrete or continuous. Let $z$ be a random vector of latent continuous variables. The probability distribution between $x$ and $z$ assumes the form $p_\theta(x, z) = p_\theta(z) \, p_\theta(x \mid z)$, where $\theta$ indicates that $p$ is parametrized by $\theta$. Also, let $q_\phi(z \mid x)$ be a recognition model whose goal is to approximate the true and intractable posterior distribution $p_\theta(z \mid x)$. A lower bound can be defined on the log-likelihood of $x$ as follows: $\mathcal{L}(x) = -D_{KL}\left(q_\phi(z \mid x) \,\|\, p_\theta(z)\right) + \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right]$. The first term makes $q_\phi(z \mid x)$ similar to $p_\theta(z)$, ensuring that the VAE learns a decoder during training which, at generation time, will be able to invert samples from the prior distribution such that they look just like the training data. The second term can be regarded as a form of reconstruction cost, and is approximated by sampling from $q_\phi(z \mid x)$.

In VAEs, the gradient signal is propagated back through the sampling process and through $q_\phi(z \mid x)$ using a reparametrization trick. The variational autoencoder is trained using stochastic gradient descent to optimize the loss with respect to the parameters of the encoder and decoder, $\phi$ and $\theta$.
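The two terms of the lower bound and the reparametrization trick can be illustrated without any deep learning framework. The sketch below assumes a diagonal Gaussian recognition model and a standard normal prior, for which the KL term has a closed form; the reconstruction log-likelihood is passed in as a number because it depends on the decoder. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def reparametrize(mu: Sequence[float], log_var: Sequence[float],
                  rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, I), keeping mu and sigma differentiable."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def elbo(reconstruction_log_likelihood: float,
         mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Single-sample estimate of the lower bound: reconstruction term minus KL term."""
    return reconstruction_log_likelihood - kl_to_standard_normal(mu, log_var)
```

Training maximizes this quantity (or minimizes its negative) with stochastic gradient descent; the randomness enters only through `eps`, which is what lets gradients flow to the encoder outputs.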

7. Model

7.1. Overview

Our goal is to generate private synthetic location traces. In particular, having a location dataset $D$ with a multiset of trajectories, our goal is to build a generative model which approximates the true generator distribution of $D$, where every trajectory in $D$ is a sample from this distribution. The model is built using the privacy-sensitive data $D$, and hence the training process of this model must guarantee differential privacy for any user/trajectory in $D$. Due to the large complexity of this model, we decompose it into three main parts which are as follows:

  • Trajectory Initialization: A generative model, called Trajectory Initializer (TI), learns the underlying joint distribution of the starting and ending locations and time variable of all trajectories, i.e., their very first (source) and very last (destination) location visits along with the single timestamp of the whole trace.

  • Transition Probability Generation: A classification model, called Transition Probability Generator (TPG), learns the transition probability distribution between any two consecutive locations, i.e., it outputs the probability distribution for the next hop in a trace, conditioned on the current location, the destination and time. Both of these models are trained with differential privacy guarantees on a potentially sensitive training dataset.

  • Trace Generation: Sampling a source and destination along with the time $t$ from the output distribution of TI, and using the transition probabilities between any locations generated by TPG, the trace generator (TG) deterministically reconstructs a trajectory between source and destination at time $t$. As this process only uses the output of TI and TPG, which are already differentially private, and some public information about locations, the whole generation process becomes differentially private.

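Input: Private dataset $D$
  Model construction:
    1. Train the Trace Initialization Model $\mathsf{TI}$ on $D$
    2. Train the Transition Probability Generation Model $\mathsf{TPG}$ on $D$
  Trajectory Reconstruction:
    3. for each synthetic trace to be generated do
    4.     Sample $(L_{src}, L_{dst}, t) \sim \mathsf{TI}$
    5.     Build a routing graph $G(V, E)$, where $V = \mathcal{L}$ and $E$ contains all location pairs within the neighborhood distance, where $w(L_i, L_j) = -\log \mathsf{TPG}(L_j \mid L_i, L_{dst}, t)$
    6.     Find the path between $L_{src}$ and $L_{dst}$ with the minimal total weight in $G$
Output: Synthetic dataset
Algorithm 1 Differentially Private Synthetic Trace Generator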

7.2. Assumptions

Time is divided into equally sized slots which are sufficiently large to include whole trajectories. Each trajectory is assigned to a single time slot $t$, and all location visits of a trajectory take place within this slot $t$. All trajectories are also assumed to be independent of each other, that is, trajectories are generated by drawing them independently from the same distribution.

7.3. Model Description

In the remainder of this section, we describe our general approach to release differentially private synthetic location trajectories that is also summarized in Alg. 1. A more specific implementation is detailed in Section 8.3.

7.3.1. Trajectory Initialization (TI)

For every possible starting and ending location $L_{src}$ and $L_{dst}$ and time $t$, the probability distribution $\Pr[L_{src}, L_{dst}, t]$ is approximated by $\mathsf{TI}_\theta$. The model parameters $\theta$ are learnt from a sensitive location dataset $D$, and hence the training process must be differentially private.

The outputs of $\mathsf{TI}_\theta$ are 3-dimensional vectors, with the starting and ending location and time of a trace (recall each trace has a single timestamp in our model). The set of model parameters, denoted by $\theta$, depends on the exact choice of the model, which is a Variational Autoencoder in our case (see our choice in Section 8.3.1). Notice that learning this distribution privately is challenging due to its high dimensionality; the domain of the joint probability distribution has size $|\mathcal{L}|^2 \cdot |T|$, where $|T|$ is the number of all possible time slots. We add noise to the learning algorithm (see Section 8.3.1 for more details) so that the released model parameters are differentially private.
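As an illustration of how noise can be added to a learning algorithm, the sketch below shows one noisy gradient step in the style of differentially private SGD (abadi2016deep, ): every per-example gradient is clipped to a maximum $L_2$ norm, the clipped gradients are summed, and Gaussian noise proportional to the clipping norm is added before averaging. This is an assumption about the general recipe, not the authors' training code; the function names and the `noise_multiplier` parameter are hypothetical.

```python
import math
import random
from typing import List, Sequence


def clip(grad: Sequence[float], max_norm: float) -> List[float]:
    """Scale the gradient down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0.0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params: Sequence[float],
                per_example_grads: Sequence[Sequence[float]],
                lr: float, max_norm: float, noise_multiplier: float,
                rng: random.Random) -> List[float]:
    """One update: clip each example's gradient, sum, add noise, average, descend."""
    summed = [0.0] * len(params)
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            summed[i] += g
    sigma = noise_multiplier * max_norm  # noise scale tied to the clipping norm
    batch = len(per_example_grads)
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]
```

Clipping bounds the sensitivity of the summed gradient to a single record, which is what allows the Gaussian noise to be calibrated.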

7.3.2. Transition Probability Generation (TPG)

For every possible time slot $t$ and destination $L_{dst}$, the true transition distribution $\Pr[L_j \mid L_i, L_{dst}, t]$ is approximated with $\mathsf{TPG}(L_j \mid L_i, L_{dst}, t)$ for any neighboring locations $L_i$ and $L_j$, that is, the probability that an individual at location $L_i$ moves to location $L_j$ towards destination $L_{dst}$ at time $t$.

In particular, the output of $\mathsf{TPG}$ is a neighboring location of the current location $L_i$. The neighborhood is constrained to those locations which are plausibly reachable from $L_i$ in the next time slot, that is, they are not geographically too far from it. Although one could consider all possible locations in $\mathcal{L}$ as potential next-hop locations, this would be unrealistic and also unnecessarily degrade the performance of our model. The $r$-sized neighborhood of a location $L_i$ consists of all locations in $\mathcal{L}$ that are within a distance $r$ from $L_i$.
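On a grid, such a neighborhood can be enumerated directly. The sketch below assumes that cells are addressed by (row, column) pairs and that distance is measured in cells along each axis (Chebyshev distance), so that the neighborhood is a square centred on the current cell; both are assumptions for illustration, as is the `neighborhood` name.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # hypothetical (row, column) address of a grid cell


def neighborhood(cell: Cell, radius: int, n_rows: int, n_cols: int) -> List[Cell]:
    """All grid cells within `radius` cells of `cell`, including `cell` itself."""
    row, col = cell
    cells: List[Cell] = []
    for r in range(max(0, row - radius), min(n_rows - 1, row + radius) + 1):
        for c in range(max(0, col - radius), min(n_cols - 1, col + radius) + 1):
            cells.append((r, c))
    return cells
```

Away from the border this yields `(2 * radius + 1) ** 2` candidate next-hop cells, independent of the total grid size, which keeps the output space of the classifier small.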

$\mathsf{TPG}$, as opposed to $\mathsf{TI}$, is typically a supervised classification model; it deterministically maps a current location $L_i$, a destination $L_{dst}$, and time $t$ to a probability distribution on the set of possible next-hop locations. This distribution is used by the Trace Generator (TG) (see below) to find the optimal path between a source and a destination location at time $t$. Importantly, and unlike in most Markov-based sequential data generators, our trace generator is not restricted to choosing the mode of this distribution (i.e., the most probable next-hop location) but may consider the output which yields the globally most probable trajectory.

During training, $\mathsf{TPG}$ is fed with a 3-dimensional vector as input, which is composed of the current location, the destination location, and time $t$. The output label is the next-hop location towards the destination at time $t$. More specifically, every trajectory $(L_1, \ldots, L_{|S|})$ at time $t$ in the (private) training data is decomposed into training samples as $\left((L_i, L_{|S|}, t), L_{i+1}\right)$ for every $1 \le i < |S|$, where $L_{i+1}$ is the output label. Note that, for every $i$, $L_i$ and $L_{i+1}$ are within distance $r$. We add noise to the learning algorithm (see Section 8.3.2 for more details) so that the released model parameters are differentially private.
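The decomposition of a trajectory into training samples can be sketched as follows. Cells are represented by integer identifiers and the function name is hypothetical; the input triple is (current location, destination, time slot) and the label is the next location.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[int, int, int], int]  # ((current, destination, time slot), next)


def training_samples(trace: Sequence[int], time_slot: int) -> List[Sample]:
    """Turn one trace of cell ids into (input, label) pairs for the next-hop classifier."""
    if len(trace) < 2:
        return []  # a single visit has no transition to learn from
    destination = trace[-1]
    return [((trace[i], destination, time_slot), trace[i + 1])
            for i in range(len(trace) - 1)]
```

For the trace `[7, 8, 9]` in time slot 18, this yields the samples `((7, 9, 18), 8)` and `((8, 9, 18), 9)`.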

Besides the current time and destination, the prediction of the next location depends only on the current location and not on the earlier location visits. That is, when the next location is predicted, we do not take into account how the current location is reached. This is not a far-fetched simplification; several studies have shown that first- or at most second-order Markov chains provide a sufficiently accurate estimation of the next location visit (gambs2012nextmarkov, ).

7.3.3. Trace Generation (TG)

When a trajectory is generated, we first sample a pair of source and destination locations $(L_{src}, L_{dst})$ along with the time slot $t$ from the output distribution of $\mathsf{TI}$. Then, a weighted directed routing graph $G(V, E)$ is built, where the edge weights are derived from the transition probabilities between two location points generated by $\mathsf{TPG}$ (i.e., $V = \mathcal{L}$, $E$ is composed of all location pairs that are within distance $r$, and $w(L_i, L_j) = -\log \mathsf{TPG}(L_j \mid L_i, L_{dst}, t)$ for any $(L_i, L_j) \in E$). Finally, a trajectory is constructed from graph $G$ by applying Dijkstra's shortest path algorithm to search for the most probable path between $L_{src}$ and $L_{dst}$. If the training of $\mathsf{TI}$ and $\mathsf{TPG}$ is differentially private, the shortest path search in graph $G$ is also differentially private, since $G$ is constructed from the privacy-preserving outputs of $\mathsf{TI}$ and $\mathsf{TPG}$. Note that $G$ is specific to a given destination $L_{dst}$ and time slot $t$, hence different graphs are constructed for trajectories differing in their destination or time. Also note that each vertex in $G$ has at most as many neighbors as there are locations in its $r$-sized neighborhood, which makes the search algorithm more scalable compared to the case when $G$ is complete (i.e., an individual can move to any location from any location at any time).

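In order to find the most probable path between $L_{src}$ and $L_{dst}$, the weight of the directed edge between any vertices $L_i$ and $L_j$ is set to $-\log \Pr[L_j \mid L_i, L_{dst}, t]$, that is, the negative logarithm of the transition probability from $L_i$ to $L_j$ conditioned on destination $L_{dst}$ and time $t$, which is approximated by model $\mathsf{TPG}$. As $-\log \Pr[L_j \mid L_i, L_{dst}, t]$ is always non-negative, Dijkstra's shortest path algorithm finds the path with the minimum total weight between two vertices, which is equivalent to the most probable path between $L_{src}$ and $L_{dst}$ at time $t$ in our case. Indeed, let $\mathcal{P}$ denote the set of all paths between $L_{src}$ and $L_{dst}$. Then, the most probable path between $L_{src}$ and $L_{dst}$ is

$$\arg\max_{P \in \mathcal{P}} \prod_{(L_i, L_j) \in P} \Pr[L_j \mid L_i, L_{dst}, t] = \arg\max_{P \in \mathcal{P}} \sum_{(L_i, L_j) \in P} \log \Pr[L_j \mid L_i, L_{dst}, t] = \arg\min_{P \in \mathcal{P}} \sum_{(L_i, L_j) \in P} -\log \Pr[L_j \mid L_i, L_{dst}, t]$$

due to the monotonicity property of the logarithm.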
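The equivalence between the most probable path and the shortest path under negative log weights can be checked with a small Dijkstra implementation. The sketch below is illustrative only: the graph is a plain dictionary of transition probabilities that stands in for the output of the transition model for one fixed destination and time slot, and the function name is hypothetical.

```python
import heapq
import math
from typing import Dict, Hashable, List, Optional, Tuple

# node -> {next node: transition probability}, for one destination and time slot
Graph = Dict[Hashable, Dict[Hashable, float]]


def most_probable_path(probs: Graph, source: Hashable,
                       destination: Hashable) -> Optional[Tuple[List[Hashable], float]]:
    """Dijkstra on edge weights -log(p); returns (path, path probability) or None."""
    dist = {source: 0.0}
    prev: Dict[Hashable, Hashable] = {}
    heap = [(0.0, 0, source)]  # (distance, tie-breaker, node)
    counter = 1
    visited = set()
    while heap:
        d, _, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == destination:
            break
        for nxt, p in probs.get(node, {}).items():
            if p <= 0.0:
                continue  # impossible transition: no edge
            nd = d - math.log(p)  # weights are non-negative because p <= 1
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, counter, nxt))
                counter += 1
    if destination not in dist:
        return None
    path = [destination]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-dist[destination])
```

With `probs = {"A": {"B": 0.6, "C": 0.4}, "B": {"D": 0.5, "E": 0.5}, "C": {"D": 1.0}}`, always following the most likely next hop from `"A"` gives the path A, B, D with probability 0.3, whereas the search returns A, C, D with probability 0.4. This is the difference between choosing the mode at every step and choosing the globally most probable trajectory.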

Note that the path finding algorithm is deterministic, meaning that our approach always generates the same trajectory for identical starting and ending locations at the same time $t$. Although this is hardly the case in practice, decreasing the length of time slots can result in more realistic trajectories at the cost of model complexity. For example, if location trajectories of vehicles are generated in a city (see Section 8.1), then a time slot with a size of one hour can be sufficiently large if traffic conditions do not change significantly within an hour (i.e., a driver probably takes the same route between the same source and destination locations at a given hour of the day).


Feeding the destination and time as an input to our transition probability generator enhances model accuracy by a large margin (in certain cases by more than 20%). The rationale behind this is that the probability of the next-hop location is heavily influenced by the direction of movement, i.e., the specific destination the individual is heading for. Similarly, time also impacts the direction of movement towards a specific destination, especially in vehicular transport, where the route of a vehicle is largely influenced by the traffic, which is ultimately time dependent. This is in sharp contrast to earlier works (chen2012diffgrams, ) which solely used the last visited locations to predict the next location of a trajectory.

Applying Dijkstra’s shortest path algorithm ensures that the synthetic traces remain realistic. Even navigation applications use similar algorithms to select the best routes which are then faithfully followed by most drivers in vehicular transport. Previous works (chen2012diffgrams, ; he2015dpt, ) generated trajectories as random walks, i.e., at each location they always choose the most likely next-hop location and stop when a terminal symbol is chosen. These models do not guarantee that the generated trajectory is realistic, nor that it stops in plausible time. Indeed, as we show in Section 8.5, our model preserves the global statistics of trajectories (such as the distribution of trip length) much more accurately.

8. Experimental evaluation

In this section, we empirically evaluate our model on the publicly available San Francisco taxi dataset containing the trajectories of different taxi trips (epfl-mobility-20090224, ). We show that the synthetic trajectories generated by our model are close to the original trajectories according to four different utility metrics.

8.1. Data

The original SF taxi dataset contains a set of GPS trajectories with timestamps that were recorded by approximately 500 taxis. They were collected over 30 days in the San Francisco Bay Area of the USA in 2009. Our dataset is a random subsample containing all the trips of 200 taxis selected randomly for our evaluation. The trajectories cover the region of San Francisco within the bounding box of (37.6017N, 122.5158W) and (37.8112N, 122.3527W), which spans approximately 23 km from south to north and 14 km from west to east. The original sampling rate of these trajectories is roughly 1 minute. Our goal is to preserve the privacy of passengers and not taxi drivers, and hence a trace (or a sequence) is composed of the recorded location visits of a single taxi trip.

8.2. Data Preprocessing

We consider two grids with two different cell sizes. The smaller grid consists of cells of size 250 m × 250 m, and the larger one of cells of size 500 m × 500 m. Each GPS location point is assigned to its covering cell. Therefore, every trace (taxi trip) is composed of a sequence of location visits, each consisting of a cell and the time of the visit.
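Assigning a GPS point to its covering cell can be sketched as below. The bounding box is the one given in Section 8.1; everything else is an assumption made for illustration: the equirectangular approximation, the constant of roughly 111,320 meters per degree of latitude, the (row, column) addressing starting at the south-west corner, and the `to_cell` name.

```python
import math
from typing import Optional, Tuple

LAT_MIN, LAT_MAX = 37.6017, 37.8112
LON_MIN, LON_MAX = -122.5158, -122.3527  # west longitudes are negative
METERS_PER_DEG_LAT = 111_320.0  # approximate length of one degree of latitude


def to_cell(lat: float, lon: float, cell_size_m: float) -> Optional[Tuple[int, int]]:
    """Map a GPS point to a (row, column) grid cell, or None if outside the box."""
    if not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX):
        return None
    # One degree of longitude shrinks with the cosine of the latitude.
    mid_lat = math.radians((LAT_MIN + LAT_MAX) / 2.0)
    meters_per_deg_lon = METERS_PER_DEG_LAT * math.cos(mid_lat)
    row = int((lat - LAT_MIN) * METERS_PER_DEG_LAT // cell_size_m)
    col = int((lon - LON_MIN) * meters_per_deg_lon // cell_size_m)
    return row, col
```

Halving the cell size from 500 to 250 meters splits every cell into four, so a point's row and column in the coarse grid are the integer halves of its row and column in the fine grid.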

All traces with a velocity above a fixed threshold in km/h (calculated between two GPS points) or falling outside the bounding box are dropped. We also removed all traces on weekends and US holidays in order to focus on weekdays only. Since the sampling rate was not constant in the database, we applied two transformations to make it more regular: (1) cell visits are aggregated over fixed-length time frames (measured in seconds) by keeping the cell that was the most frequent in the trace during the given time frame, and (2) when there were gaps shorter than 5 minutes without any location visits, the missing visits are approximated by linear interpolation. If the trace has "self-loop" transitions (i.e., consecutive location visits with identical cells), these visits are merged, keeping the timestamp of the first visit of the cell. Finally, if the resulting trace has only a single visit, it is dropped.
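The time aggregation step can be sketched as follows. The window length is left as a parameter because its value is not given here, windows are anchored at the first timestamp of the trace, and ties between equally frequent cells are resolved in favour of the cell seen first; all three are assumptions, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Visit = Tuple[int, int]  # hypothetical (cell id, timestamp in seconds)


def aggregate(trace: List[Visit], window_s: int) -> List[Visit]:
    """Keep one visit per non-empty time window: the most frequent cell in it."""
    if not trace:
        return []
    start = trace[0][1]
    buckets: Dict[int, List[int]] = {}
    for cell, ts in trace:
        buckets.setdefault((ts - start) // window_s, []).append(cell)
    aggregated: List[Visit] = []
    for idx in sorted(buckets):
        cell = Counter(buckets[idx]).most_common(1)[0][0]
        aggregated.append((cell, start + idx * window_s))
    return aggregated
```

Windows without any visit are simply absent from the output; filling short gaps by interpolation and merging self-loops would follow as separate steps.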

After cleansing and smoothing our data, the timestamps are further aggregated by assigning to all visits of a single trace only the hour of the day in which the larger part of the taxi trace took place. For example, if a trace started at 17:58 and ended at 18:10 with 12 visits altogether, we assigned hour 18 to every cell visit of the trace (including those which happened before 18:00). As our aim is only to demonstrate the feasibility of our approach, this simplification is introduced in order to reduce the size of the input and output space of our models, which makes training faster and the models less complex. Finally, we created 2-grams from the traces, i.e., we grouped every two consecutive data points together to create a single training sample for our TPG model, where the first and second parts serve as input and output for the model during training. The first part of every gram is augmented with the destination cell of the trace the gram comes from, and the second part contains only the cell identifier of the next location (without timestamp).
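The hour assignment can be illustrated with a short sketch. It interprets "the larger part of the trace" as the hour containing the most visits, which is an assumption (the split could also be measured by duration); the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Sequence


def dominant_hour(timestamps: Sequence[datetime]) -> int:
    """Hour of the day (0-23) in which most visits of the trace fall."""
    counts = Counter(ts.hour for ts in timestamps)
    return counts.most_common(1)[0][0]
```

For a trace with two visits at 17:58 and 17:59 and ten visits between 18:01 and 18:10, the function returns 18, so all twelve visits are labelled with hour 18.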

It is important to note that we used the whole length of the passenger traces, contrary to (chen2012diffgrams, ), where they were truncated to a pre-fixed size. See more descriptive statistics in Table 1. We derived two datasets with two different grid sizes from the original SF taxi dataset described in Section 8.1. The more fine-grained SF-TAXI-250 dataset was computed using a cell size of 250 meters, whereas SF-TAXI-500 is obtained by using a cell size of 500 meters.

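Dataset       |D|       |L|    max|S|   avg|S|   std|S|
SF-TAXI-250   121,622   2851   69       9.07     5.15
SF-TAXI-500   120,561   848    55       7.34     4.53
Table 1. The datasets used in our experiments: SF-TAXI-250 (with a cell size of 250 meters) and SF-TAXI-500 (with a cell size of 500 meters). Columns give the number of traces |D|, the number of cells |L|, and the maximum, average and standard deviation of the trace length |S|. The number of time slots is always 24.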

8.3. Model Instantiation

As generative models, the neural networks have significantly fewer parameters than the amount of data we train them on, so the models are forced to discover and efficiently learn the essence of the data in order to generate it.

8.3.1. Trajectory Initialization with Variational Autoencoders

In order to sample a starting and ending point and time for a synthetic trajectory, we built a differentially private variational autoencoder (VAE) that is capable of approximating the joint probability distribution of the three random variables. A VAE has two neural networks: an encoder and a decoder. VAEs perform better when the input data is one-hot encoded, thus the input layer of the encoder has a dimension of $2|\mathcal{L}| + |T|$, where the size of $\mathcal{L}$ depends on the coarseness of the grid, and $|T|$ is the number of hours in a day. We evaluated the model with two differently sized grids (with cell sizes of 250 and 500 meters), see details in Table 8.5. Our encoder has two hidden dense layers of equal width; the first has ReLU, the second has a linear activation function. The outputs of the encoder are the parameters of the learned normal distribution, one vector for the expected value $\mu$ and one for the variance $\sigma^2$; values drawn from this distribution comprise the latent vectors, whose size is fixed and shared by the mean and standard deviation vectors. The input of the decoder is a randomly chosen sample from the normal distribution constructed by the encoder. The decoder has to transform this latent variable into an actual example. The decoder has only one hidden layer with ReLU activation. Finally, there are three parallel output layers with softmax activations, each of which corresponds to an output variable (source location, destination, time). Our VAE is shown in Figure 1. VAEs have their own specific loss function and we applied the original one from (doersch2016tutorialVAE, ).

Figure 1. Trajectory Initializer as a Variational Autoencoder

8.3.2. Transition Probability Generation with Feed Forward Neural Networks

Our classifier model is a feed forward network endowed with word embedding. Its input is a 3-dimensional vector (current location, destination, time), and its output is the probability distribution over the next hop (see Figure 2). The two location coordinates of the input vector are fed into an embedding layer, where they are embedded separately into the same vector space (the embedding layer is inside the network and is thus trained together with the rest of the layers). They are then concatenated with the time coordinate, and the resulting vector serves as the input of the next dense layer, which has ReLU activation. The output layer uses softmax activation. As explained in Section 7.3.2, the output consists of neighbouring cells only: we chose to include all cells within a fixed cell distance of the input position, so the number of output classes is bounded and the middle cell is the input position (this distance was chosen because it can cover slower and faster moving vehicles at the same time). We trained the network with sparse categorical crossentropy loss and the SGD optimizer.
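The restriction of the output to neighbouring cells can be sketched as follows. The sketch assumes that "distance" means a square window around the current cell (Chebyshev distance), so that a distance of `k` cells yields at most `(2k + 1)^2` output classes with the current cell in the middle; the window shape, the row-major class ordering and all names are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Cell = Tuple[int, int]  # (row, column) index of a grid cell


def class_index(d_row: int, d_col: int, k: int) -> int:
    """Map a relative move (d_row, d_col), each in [-k, k], to an output class."""
    if max(abs(d_row), abs(d_col)) > k:
        raise ValueError("move is outside the k-cell neighbourhood")
    return (d_row + k) * (2 * k + 1) + (d_col + k)


def neighbourhood(row: int, col: int, k: int, n_rows: int, n_cols: int) -> List[Optional[Cell]]:
    """List the (2k+1)^2 candidate next cells in class order.

    Candidates that fall outside the grid are reported as None so that
    the class indices stay aligned with the relative moves.
    """
    cells: List[Optional[Cell]] = []
    for d_row in range(-k, k + 1):
        for d_col in range(-k, k + 1):
            r, c = row + d_row, col + d_col
            inside = 0 <= r < n_rows and 0 <= c < n_cols
            cells.append((r, c) if inside else None)
    return cells
```

With this layout, the class in the middle of the list always corresponds to staying in the current cell, which matches the statement that the middle cell is the input position.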


Embedding layers are mostly used in natural language processing: they create distributional vectors based on the so-called distributional hypothesis, i.e., "words" (here locations) appearing within similar contexts possess similar meaning. In the case of location data, such locations are close to each other both in geographical space and in a trajectory, i.e., the more they tend to follow each other in a trajectory, the closer they are in the embedded space.

The choice of a feed forward network (FFN) instead of a recurrent neural network (such as an LSTM), which has an implicit capability to handle time and sequential data, deserves more explanation. When trained with differentially private SGD, the training time of LSTMs rose to approximately 6-8 times that of the non-noisy model. By contrast, training simple feed forward networks is considerably faster, almost as fast as training the non-private model. Furthermore, our FFN has fewer parameters than the simplest but still well-performing LSTM; our solution has an accuracy only 1-2

less than the LSTM layer on the considered datasets.

Figure 2. Transition Probability Generator

8.3.3. Privacy Parameters

For both the TI and the TPG models, we used the Differentially-Private Stochastic Gradient Descent (DP-SGD) algorithm by Abadi et al. (abadi2016deep, ). This method is independent of the chosen loss function and model, and it adds noise to the clipped gradients. In particular, the gradients of all model parameters in every model update are clipped to have a bounded L2-norm, and then Gaussian noise is added to the clipped gradients before updating the parameters. The outputs of DP-SGD are the trained parameters of TI and TPG.

The sampling probability in DP-SGD is calculated as follows. Our aim is to provide user-level (or in our case trajectory-level) differential privacy. However, recall that trajectories have different lengths, and we divided them into 1-grams: thus there is a variable number of training examples belonging to one trajectory for TPG. In the case of our TI model, we only have one sample per trajectory, thus the sampling probability of a single trajectory is at most the ratio of the batch size to the total number of trajectories. However, for TPG, a single trajectory can have multiple samples, and therefore we sample a batch differently. We first sample a trajectory from the dataset uniformly at random, and then a 1-gram out of this trajectory, also uniformly at random. We repeat this experiment until a full batch of grams (training samples for TPG) is collected. This sampling mechanism ensures that any trajectory is equally likely to participate in an update (batch), which in turn determines the sampling probability.
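The two ingredients described above, trajectory-level batch sampling for TPG and the clip-then-noise gradient step of DP-SGD, can be sketched as below. This is a minimal NumPy illustration, not the training code used in the experiments: the names `sample_tpg_batch`, `dp_sgd_gradient`, `clip_norm` and `noise_multiplier` are hypothetical, and the noise standard deviation is assumed to be `noise_multiplier * clip_norm`, following the formulation of Abadi et al.

```python
import random
from typing import Any, List, Sequence

import numpy as np


def sample_tpg_batch(trajectories: Sequence[Sequence[Any]], batch_size: int,
                     rng: random.Random) -> List[Any]:
    """Draw a batch so that every trajectory is equally likely to contribute.

    Each draw first picks a trajectory uniformly at random and then one of
    its training samples (grams) uniformly at random. Trajectories are
    assumed to be non-empty.
    """
    batch = []
    while len(batch) < batch_size:
        trajectory = rng.choice(trajectories)  # uniform over trajectories
        batch.append(rng.choice(trajectory))   # uniform within the trajectory
    return batch


def dp_sgd_gradient(per_example_grads: np.ndarray, clip_norm: float,
                    noise_multiplier: float, rng: np.random.Generator) -> np.ndarray:
    """Clip each per-example gradient in L2-norm, sum, add Gaussian noise, average.

    `per_example_grads` has shape (batch_size, num_parameters).
    """
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Scale down only those gradients whose norm exceeds the threshold.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    noise = rng.normal(0.0, noise_multiplier * clip_norm,
                       size=per_example_grads.shape[1])
    return (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]
```

Because a long trajectory is no more likely to be drawn than a short one, the contribution of any single trajectory to an update is controlled by the draw of the trajectory rather than by its length.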

8.4. Evaluation Metrics

We consider three different utility metrics. Each of them is evaluated both on the synthetic and the original dataset, and the difference is measured according to different distance metrics.

Trip size distribution:

The Jensen-Shannon divergence (JSD) is computed between the distribution of the trip lengths in the synthetic and the original datasets in each hour of the day (the trip length is the number of cells of a trip). Note that, unlike the Kullback-Leibler (KL) divergence, JSD is symmetric and has a finite value. In our case, JSD is bounded between 0 (identical distributions) and 1 (least similar distributions).
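As an illustration of this metric, the following sketch computes the JSD between two empirical trip length distributions. It uses base-2 logarithms, which is what gives the bound between 0 and 1; the helper names are hypothetical and the inputs are assumed to be non-empty lists of trip lengths.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def distribution(lengths: Iterable[int]) -> Dict[int, float]:
    """Relative frequency of each trip length (number of cells per trip)."""
    counts = Counter(lengths)
    total = sum(counts.values())
    return {length: n / total for length, n in counts.items()}


def kl_divergence(p: Dict[int, float], m: Dict[int, float]) -> float:
    """KL divergence in bits; m must be positive wherever p is positive."""
    return sum(px * math.log2(px / m[x]) for x, px in p.items() if px > 0)


def jsd(p: Dict[int, float], q: Dict[int, float]) -> float:
    """Jensen-Shannon divergence: symmetric, finite, and in [0, 1] with log base 2."""
    support = set(p) | set(q)
    # Mixture distribution; positive on the whole joint support.
    m = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in support}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

In the evaluation setting, such a value would be computed separately for the trips starting in each hour of the day.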

Frequent patterns:

The top-K most frequent patterns (i.e., subsequences of locations) are computed in both the original and the synthetic dataset. The true positive ratio between the two top-K sets is reported for several values of K.
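A possible way to compute this metric is sketched below. It assumes that patterns are contiguous location subsequences of bounded length and that the true positive ratio is the size of the overlap of the two top-k sets divided by k; both choices, as well as the `max_len` parameter and the function names, are assumptions made for illustration.

```python
from collections import Counter
from typing import Sequence, Set, Tuple

Pattern = Tuple[int, ...]


def top_k_patterns(trajectories: Sequence[Sequence[int]], k: int,
                   max_len: int = 3) -> Set[Pattern]:
    """Return the k most frequent contiguous subsequences of length <= max_len."""
    counts: Counter = Counter()
    for trajectory in trajectories:
        for n in range(1, max_len + 1):
            for start in range(len(trajectory) - n + 1):
                counts[tuple(trajectory[start:start + n])] += 1
    return {pattern for pattern, _ in counts.most_common(k)}


def true_positive_ratio(original: Sequence[Sequence[int]],
                        synthetic: Sequence[Sequence[int]],
                        k: int, max_len: int = 3) -> float:
    """Fraction of the k slots shared by the original and synthetic top-k sets."""
    if k <= 0:
        raise ValueError("k must be positive")
    shared = top_k_patterns(original, k, max_len) & top_k_patterns(synthetic, k, max_len)
    return len(shared) / k
```

Ties among equally frequent patterns are broken by the order in which they were first seen, so results for small datasets can depend on the ordering of the input.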

Spatio-temporal distribution of source and destination pairs:

The joint distribution of the source and destination locations is computed from the original and synthetic datasets individually for every hour of the day. In particular, we count the relative frequency of every possible pair of source and destination locations in both datasets, and report the Earth Mover's Distance (EMD) (emd, ) between these two distributions. EMD is a metric for probability distributions and measures their difference in terms of geographical distance (meters); the distance between two source-destination pairs is the sum of the distance between their sources and the distance between their destinations. Specifically, EMD measures the "amount of energy" (or cost) needed to transform one distribution into another, where the ground distance is the geographical distance between the centers of cells. As the domain of this joint distribution is very large, the computation of EMD can be very costly. Therefore, we approximate the joint distribution by sampling at most 2000 trips from both datasets uniformly at random, and consider only the source-destination pairs in the computation that actually occur in these samples. This metric measures the performance of the Variational Autoencoder in our proposal.
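The two inputs that an EMD solver needs for this metric, the sampled source-destination distribution and the ground distance between two pairs, can be sketched as follows. The sketch assumes cells are identified by (row, column) indices on a square grid and that distances are Euclidean between cell centers; it does not solve the transportation problem itself, and all names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, Sequence, Tuple

Cell = Tuple[int, int]        # (row, column) of a grid cell
Trip = Tuple[Cell, Cell]      # (source cell, destination cell)


def cell_center(cell: Cell, cell_size_m: float) -> Tuple[float, float]:
    """Planar coordinates (in meters) of the center of a grid cell."""
    row, col = cell
    return ((col + 0.5) * cell_size_m, (row + 0.5) * cell_size_m)


def cell_distance(a: Cell, b: Cell, cell_size_m: float) -> float:
    """Distance in meters between the centers of two cells."""
    (ax, ay), (bx, by) = cell_center(a, cell_size_m), cell_center(b, cell_size_m)
    return math.hypot(ax - bx, ay - by)


def pair_distance(a: Trip, b: Trip, cell_size_m: float) -> float:
    """Ground distance between two trips: source distance plus destination distance."""
    return cell_distance(a[0], b[0], cell_size_m) + cell_distance(a[1], b[1], cell_size_m)


def sampled_pair_distribution(trips: Sequence[Trip], max_samples: int,
                              rng: random.Random) -> Dict[Trip, float]:
    """Relative frequencies of source-destination pairs in a uniform sample of trips."""
    trips = list(trips)
    sample = trips if len(trips) <= max_samples else rng.sample(trips, max_samples)
    counts = Counter(sample)
    return {pair: n / len(sample) for pair, n in counts.items()}
```

Restricting the support to the pairs that occur in the samples keeps the cost matrix at most `max_samples` by `max_samples`, regardless of the size of the grid.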

8.5. Results

In this section, we experimentally evaluate the performance of our solution in terms of the above three utility metrics on the datasets described in Table 1. We also evaluate the N-gram model from (chen2012diffgrams, ), referred to as NGRAM in the sequel, on the same datasets and compare the results with our work using the three metrics. As NGRAM does not release time information, we dropped all timestamps in both SF-TAXI-250 and SF-TAXI-500 and synthesized the resulting datasets with NGRAM (by contrast, our approach was always executed on the original SF-TAXI-250 and SF-TAXI-500 with time information). Experiments were conducted using TensorFlow 2.0 and Python 3.6.9. We conducted our experiments in four different settings, combining two privacy parameter values with the two grid resolutions of the SF-TAXI-250 and SF-TAXI-500 datasets (Table 1).

In the TI model, we set the -norm clipping threshold to , the batch size to , and ran the training over epochs (recall that is the variance of the Gaussian noise used to perturb the gradients in order to provide differential privacy). With , and and the learning rate is set to ; with , is and the learning rate is set to .

In the TPG model, we set the -norm clipping threshold to , the batch size to , and train the model over epochs. With , and , the learning rate was set to ; with , is and the learning rate was set to .

is computed with the moments accountant over epochs, that is, the total epochs trained over the whole model including TI and TPG.

In Figure 3, we report the JSD value between the original and the private synthetic trip size distributions for the two privacy settings. Recall that the NGRAM model does not include time information, thus we report only one value for each setting. One can see that our model's JSD values are clearly much lower (i.e., closer to the original distribution) than those of NGRAM. The NGRAM model generates traces without a destination and terminates whenever it samples a special terminal character; that is, it does not select the globally most probable trajectory. In contrast, our model is more realistic.

Figure 3 also shows how the coarseness of the grid influences the impact of the added noise. In Figure 3(a), the JSD curves for the two privacy settings follow the same trend but keep a steady distance from each other. In contrast, the two settings yield the same JSD on the SF-TAXI-500 dataset in Figure 3(b).

In Figure 4, the Earth Mover's Distance (EMD) between the spatial distributions of the source and destination pairs is reported depending on the time of day. In Figure 4(a), our model has a peak between 11 a.m. and 2 p.m., where it reports a higher EMD than the NGRAM model, but during the rest of the day it stays steadily below it. Coarsening the grid does not heavily affect our model's performance: in general the variability is higher, but the peak is lower. However, it does affect the performance of the NGRAM model, whose EMD values drop by about 1000 meters on average.

In Tables 2 and 3, we report the results of the Frequent Patterns metric, that is, the true positive ratio of the Top-K location subsequences between the original and the private synthetic datasets. One can see that for both datasets the NGRAM model clearly outperforms our model, which is due to the fact that NGRAM focuses on the accurate release of the most frequent subsequences and reconstructs traces from these.

As a final result, we report the training performance metrics after the very last epoch for the TI and the TPG models separately in Tables 4 and 5. TPG's accuracy is heavily dependent on the cell size, but importantly, the error of the prediction in meters stays low even for the higher resolution; it is only cells on average.

(a) SF-TAXI-250. NGRAM: 0.93 () and 0.94 ()
(b) SF-TAXI-500. NGRAM: 0.85 () and 0.89 ()
Figure 3. JSD between trip size distributions depending on the hour of a day.
(a) SF-TAXI-250. NGRAM: 3000 meters () and 3913 meters ()
(b) SF-TAXI-500. NGRAM: 2390 meters () and 2377 meters ()
Figure 4. EMD (in meters) between spatial distribution of source and destination pairs depending on the hour of a day.
K value (Top-K)
Our approach 50% 60% 71% 71%
NGRAM 80% 88% 100% 100%
K value (Top-K)
Our approach 20% 60% 80% 80%
NGRAM 90% 90% 100% 100%
Table 2. Frequent Patterns on SF-TAXI-250
K value (Top-K) value
Our approach 40% 25% 30% 22%
NGRAM 90% 95% 76% 100%
K value (Top-K) value
Our approach 40% 30% 20% 15%
NGRAM 90% 95% 78% 79%
Table 3. Frequent Patterns on SF-TAXI-500
value Loss
(a) SF-TAXI-250
value Loss
(b) SF-TAXI-500
Table 4. Training performance of the Trajectory Initializer after 15 epochs.
value Loss Accuracy Error in meters
(a) SF-TAXI-250
value Loss Accuracy Error in meters
(b) SF-TAXI-500
Table 5. Training performance of the Transition Probability Generator after 15 epochs.

9. Conclusions

Releasing location data is challenging because it is high-dimensional and sparse. We proposed a novel approach to release location data with strong privacy guarantees. In contrast to most prior works, our model is capable of releasing time information along with location visits without suffering significant utility loss. Our general framework consists of generating the source and destination pairs of every trace separately, computing the transition probabilities between neighboring locations, and then generating synthetic trajectories between the source and destination using a graph optimization algorithm. As opposed to previous works, the transition probability depends on the time and on the destination towards which the individual is moving, and is computed only between geographically close locations, which makes our solution accurate and scalable. We designed neural networks to model the distribution of trajectories; these networks are simple and hence fast to train even with differential privacy guarantees. Our approach is evaluated on a public, real-life location dataset of taxi passenger trajectories, and the results show that the provided utility is meaningful. Therefore, our technique can be a compelling new approach to the privacy-preserving release of location trajectories with time information. Importantly, we produce synthetic datasets that are intended to preserve many different statistics of the original dataset. Releasing only a few targeted statistics with or without differential privacy, instead of the complete synthesized dataset, is a different approach which should always result in greater accuracy, but only with respect to the released statistics.
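The graph optimization step of the framework can be illustrated with a shortest path search: maximizing the product of transition probabilities along a path is equivalent to minimizing the sum of their negative logarithms. The sketch below is a generic Dijkstra search under the assumption that the transition probabilities have already been computed for a fixed destination and time; the dictionary layout and the function name are hypothetical, and this is not necessarily the exact algorithm used in our implementation.

```python
import heapq
import math
from typing import Dict, Hashable, List, Optional


def most_probable_path(transitions: Dict[Hashable, Dict[Hashable, float]],
                       source: Hashable, destination: Hashable) -> Optional[List[Hashable]]:
    """Return the path maximizing the product of transition probabilities.

    `transitions[a][b]` is the probability of moving from cell a to cell b.
    Edge weights are -log(p) >= 0, so Dijkstra's algorithm applies.
    Returns None if the destination cannot be reached.
    """
    best_cost = {source: 0.0}
    previous: Dict[Hashable, Hashable] = {}
    visited = set()
    counter = 0  # tie-breaker so the heap never has to compare cells
    heap = [(0.0, counter, source)]
    while heap:
        cost, _, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == destination:
            break
        for nxt, prob in transitions.get(node, {}).items():
            if prob <= 0.0:
                continue  # impossible transition
            new_cost = cost - math.log(prob)
            if new_cost < best_cost.get(nxt, math.inf):
                best_cost[nxt] = new_cost
                previous[nxt] = node
                counter += 1
                heapq.heappush(heap, (new_cost, counter, nxt))
    if destination not in best_cost:
        return None
    path = [destination]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1]
```

Since transitions are only defined between geographically close cells, the graph is sparse and the search remains cheap even on a fine grid.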

There are several further research directions to explore. First, the proposed framework is general, and finding the best generative models for a given type of data is difficult and requires domain expertise. Second, there may be several techniques to recover trajectories from the noisy transition probabilities which provide superior performance to a simple shortest path search. Finally, our general approach may be applicable to types of sequential data other than location trajectories, such as different time series.


  • (1) M. C. Gonzalez, C. A. Hidalgo, and A.-L. Barabasi, “Understanding individual human mobility patterns,” Nature, vol. 453, no. 7196, pp. 779–782, 2008.
  • (2) G. Biczok, S. D. Martínez, T. Jelle, and J. Krogstie, “Navigating mazemap: indoor human mobility, spatio-logical ties and future potential,” in 2014 IEEE International Conference on Pervasive Computing and Communication Workshops (PERCOM WORKSHOPS), pp. 266–271, IEEE, 2014.
  • (3) G. Dimitrakopoulos and P. Demestichas, “Intelligent transportation systems,” IEEE Vehicular Technology Magazine, vol. 5, no. 1, pp. 77–84, 2010.
  • (4) A. Küpper, Location-based services: fundamentals and operation. John Wiley & Sons, 2005.
  • (5) K. Li and T. C. Du, “Building a targeted mobile advertising system for location-based services,” Decision Support Systems, vol. 54, no. 1, pp. 1–8, 2012.
  • (6) M. B. Lazreg, J. Radianti, O.-C. Granmo, M. Palen, T. Comes, and A. Hughes, “Smartrescue: Architecture for fire crisis assessment and prediction.,” in ISCRAM, 2015.
  • (7) L. Ferretti, C. Wymant, M. Kendall, L. Zhao, A. Nurtay, L. Abeler-Dörner, M. Parker, D. Bonsall, and C. Fraser, “Quantifying sars-cov-2 transmission suggests epidemic control with digital contact tracing,” Science, vol. 368, no. 6491, 2020.
  • (8) H. Cho, D. Ippolito, and Y. W. Yu, “Contact tracing mobile apps for covid-19: Privacy considerations and related trade-offs,” arXiv preprint arXiv:2003.11511, 2020.
  • (9) A. R. Beresford and F. Stajano, “Location privacy in pervasive computing,” IEEE Pervasive computing, vol. 2, no. 1, pp. 46–55, 2003.
  • (10) H. Zang and J. Bolot, “Anonymization of location data does not work: A large-scale measurement study,” in Proceedings of the 17th annual international conference on Mobile computing and networking, pp. 145–156, 2011.
  • (11) Y.-A. De Montjoye, C. A. Hidalgo, M. Verleysen, and V. D. Blondel, “Unique in the crowd: The privacy bounds of human mobility,” Scientific reports, vol. 3, p. 1376, 2013.
  • (12) M. Fiore, P. Katsikouli, E. Zavou, M. Cunche, F. Fessant, D. L. Hello, U. M. Aivodji, B. Olivier, T. Quertier, and R. Stanica, “Privacy in trajectory micro-data publishing: a survey,” arXiv preprint arXiv:1903.12211, 2019.
  • (13) C. Dwork, “Differential privacy: A survey of results,” in International conference on theory and applications of models of computation, pp. 1–19, Springer, 2008.
  • (14) R. Chen, G. Acs, and C. Castelluccia, “Differentially private sequential data publication via variable-length n-grams,” in Proceedings of the 2012 ACM conference on Computer and communications security, pp. 638–649, 2012.
  • (15) X. He, G. Cormode, A. Machanavajjhala, C. M. Procopiuc, and D. Srivastava, “Dpt: differentially private trajectory synthesis using hierarchical reference systems,” Proceedings of the VLDB Endowment, vol. 8, no. 11, pp. 1154–1165, 2015.
  • (16) A. Narayanan and V. Shmatikov, “Robust de-anonymization of large sparse datasets,” in 2008 IEEE Symposium on Security and Privacy (sp 2008), pp. 111–125, IEEE, 2008.
  • (17) M. Nasr, R. Shokri, and A. Houmansadr, “Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning,” in 2019 IEEE Symposium on Security and Privacy, SP 2019, San Francisco, CA, USA, May 19-23, 2019, pp. 739–753, 2019.
  • (18) M. Fredrikson, S. Jha, and T. Ristenpart, “Model inversion attacks that exploit confidence information and basic countermeasures,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1322–1333, 2015.
  • (19) M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, “Deep learning with differential privacy,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318, 2016.
  • (20) D. Chen, T. Orekondy, and M. Fritz, “Gs-wgan: A gradient-sanitized approach for learning differentially private generators,” arXiv preprint arXiv:2006.08265, 2020.
  • (21) N. Park, M. Mohammadi, K. Gorde, S. Jajodia, H. Park, and Y. Kim, “Data synthesis based on generative adversarial networks,” arXiv preprint arXiv:1806.03384, 2018.
  • (22) G. Acs, L. Melis, C. Castelluccia, and E. De Cristofaro, “Differentially private mixture of generative neural networks,” IEEE Transactions on Knowledge and Data Engineering, vol. 31, no. 6, pp. 1109–1121, 2018.
  • (23) C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise to sensitivity in private data analysis,” in Theory of cryptography conference, pp. 265–284, Springer, 2006.
  • (24) C. Dwork, A. Roth, et al., “The algorithmic foundations of differential privacy.,” Foundations and Trends in Theoretical Computer Science, vol. 9, no. 3-4, pp. 211–407, 2014.
  • (25) C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor, “Our data, ourselves: Privacy via distributed noise generation,” in Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp. 486–503, Springer, 2006.
  • (26) M. Bun and T. Steinke, “Concentrated differential privacy: Simplifications, extensions, and lower bounds,” in Theory of Cryptography Conference, pp. 635–658, Springer, 2016.
  • (27) B. C. Csáji et al., “Approximation with artificial neural networks,” Faculty of Sciences, Eötvös Loránd University, Hungary, vol. 24, no. 48, p. 7, 2001.
  • (28) D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” arXiv preprint arXiv:1312.6114, 2013.
  • (29) C. Doersch, “Tutorial on variational autoencoders,” arXiv preprint arXiv:1606.05908, 2016.
  • (30) S. Gambs, M.-O. Killijian, and M. N. del Prado Cortez, “Next place prediction using mobility markov chains,” in Proceedings of the first workshop on measurement, privacy, and mobility, pp. 1–6, 2012.
  • (31) M. Piorkowski, N. Sarafijanovic-Djukic, and M. Grossglauser, “CRAWDAD dataset epfl/mobility (v. 2009-02-24).” Downloaded from, Feb. 2009.
  • (32) Y. Rubner, C. Tomasi, and L. J. Guibas, “The earth mover’s distance as a metric for image retrieval,” International journal of computer vision, vol. 40, no. 2, pp. 99–121, 2000.