Geographic Differential Privacy for Mobile Crowd Coverage Maximization

For real-world mobile applications such as location-based advertising and spatial crowdsourcing, a key to success is targeting mobile users that can maximally cover certain locations in a future period. To find an optimal group of users, existing methods often require information about users' mobility history, which may cause privacy breaches. In this paper, we propose a method to maximize mobile crowd's future location coverage under a guaranteed location privacy protection scheme. In our approach, users only need to upload one of their frequently visited locations, and more importantly, the uploaded location is obfuscated using a geographic differential privacy policy. We propose both analytic and practical solutions to this problem. Experiments on real user mobility datasets show that our method significantly outperforms the state-of-the-art geographic differential privacy methods by achieving a higher coverage under the same level of privacy protection.

Introduction

Crowd coverage maximization is a classical problem in mobile computing: how to select users from a candidate pool to maximize the probability of covering a set of target locations in a coming time period (e.g., one day or one week). This problem and its variants have a wide spectrum of applications in location-based advertising [Dhar and Varshney2011], spatial crowdsourcing [Chen and Shahabi2016, Zhang et al.2014], urban computing [Zheng et al.2014], etc. For example, it can help shop owners offer electronic coupons to the mobile app users who are likely to visit the region around the shop soon; it can also help crowdsourcing organizers recruit the participants who cover the task area with the highest probability [Xiong et al.2016].

One of the key steps in crowd coverage maximization is mobility profiling, i.e., predicting the probability of a user appearing at a certain location. A common practice is to first divide an area into fine-grained grids or sub-areas, and then count the frequency of a user appearing in each grid based on trajectory history [Guo et al.2017]. One can also use more sophisticated models, such as the Poisson process, to estimate users’ occurrence distribution [Xiong et al.2016]. Existing mobility profiling methods often require access to users’ historical mobility traces, which may seriously compromise user privacy. For example, users’ exposed location data may reveal sensitive information about their identities and social relationships [Cho, Myers, and Leskovec2011, Rossi et al.2015]. Despite the importance of location privacy, as far as we know, little research to date has combined location privacy, mobility profiling, and crowd coverage maximization.

To fill this gap, this paper aims to explore how to protect the crowds’ location privacy, while still optimizing their expected coverage of a set of locations. To achieve this goal, we propose a mobile crowd coverage maximization framework with a rigorous privacy protection scheme — geographic differential privacy [Andrés et al.2013]. A geographic differential privacy policy obfuscates a user’s actual location to another with carefully designed probabilities, such that adversaries, regardless of their prior knowledge, can learn little about the user’s true location after observing the obfuscated locations. However, with differential privacy protection, crowd coverage maximization can only be performed based on the obfuscated (inaccurate) locations, which leads to inevitable loss of the quality of the selected users. Therefore, we propose a method to generate the optimal location obfuscation policy which satisfies geographic differential privacy while minimizing such loss.

In summary, this paper has the following contributions:

(1) To the best of our knowledge, this is the first work studying the mobile crowd coverage maximization problem with location privacy protection.

(2) In our approach, users only need to upload one of their frequently visited locations, and more importantly, the uploaded location is obfuscated using a rigorous privacy policy, geographic differential privacy. We further formulate an optimization problem to obtain the optimal obfuscation policy that maximizes the expected future crowd coverage over a set of locations under a guaranteed level of differential privacy protection. As the optimization problem is non-convex, we first mathematically analyze the scenario where only one location needs to be covered and derive an optimal solution. Then, we extend this setting to the multi-location coverage scenario and propose a practical algorithm to obtain the optimal obfuscation policy.

(3) Experiments on real human mobility datasets verify that, by selecting the same number of users under the same level of privacy protection, our method achieves a higher coverage than state-of-the-art differential privacy methods.

Preliminaries

Geographic differential privacy [Andrés et al.2013] introduces the idea of database differential privacy [Dwork2008] into the location obfuscation context. Its key idea is: given an observed obfuscated location $l^*$, any two locations $l_1$ and $l_2$ have similar probabilities of being mapped to $l^*$. It is thus hard for an adversary to differentiate whether the user is at $l_1$ or $l_2$ by observing $l^*$.

Definition 1 [Andrés et al.2013]. Suppose the target area includes a set of locations $\mathcal{L}$, then an obfuscation policy $P$ satisfies geographic $\epsilon$-differential privacy, iff

$$P(l^* \mid l_1) \le e^{\epsilon\, d(l_1, l_2)}\, P(l^* \mid l_2), \quad \forall\, l_1, l_2, l^* \in \mathcal{L}$$ (1)

where $P(l^* \mid l)$ is the probability of obfuscating $l$ to $l^*$, $d(l_1, l_2)$ is the distance between $l_1$ and $l_2$, and $\epsilon$ is the privacy budget: the smaller $\epsilon$, the stronger the privacy protection.
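As a minimal sketch of what Definition 1 requires, the following check walks over every triple of locations and tests the inequality directly. The function name `satisfies_geo_dp`, the row-major layout `policy[l][l_star]`, and the three-location toy data are illustrative assumptions, not part of the original method.

```python
import math

def satisfies_geo_dp(policy, dist, eps, tol=1e-12):
    """Return True if policy[l][l_star] = P(l_star | l) obeys Definition 1.

    policy and dist are square lists indexed by location id (assumed layout).
    """
    n = len(policy)
    for l1 in range(n):
        for l2 in range(n):
            bound = math.exp(eps * dist[l1][l2])
            for l_star in range(n):
                if policy[l1][l_star] > bound * policy[l2][l_star] + tol:
                    return False
    return True

# Hypothetical toy area: three locations on a line, 1 km apart.
dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
uniform = [[1 / 3] * 3 for _ in range(3)]   # every report is equally likely
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]  # no obfuscation
```

The uniform policy passes for any privacy budget because its columns are constant, while the identity policy (reporting the true location) fails for every finite budget.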

Note that the set of locations $\mathcal{L}$ is usually constructed by dividing the target area into subregions, e.g., equal-size grids [Bordenabe, Chatzikokolakis, and Palamidessi2014] or cell-tower regions [Xiong et al.2016].

If $P$ satisfies geographic differential privacy, it can be proven that for adversaries with any prior knowledge about users’ location distributions, their posterior knowledge after observing the obfuscated location can only be increased by a small constant factor [Andrés et al.2013]. Note that this protection is guaranteed even if the adversaries know $P$. Due to this rigorous protection effect, geographic differential privacy has seen many applications in location-based services, spatial crowdsourcing, etc. [Bordenabe, Chatzikokolakis, and Palamidessi2014, Wang et al.2016, Wang et al.2017].

Mobility profiling aims to estimate the probability of a user covering a certain location during a time period in the future. Specifically, a user $u$’s mobility profile is denoted as $M_u$, and $M_u(l)$ means the estimated probability of $u$ visiting location $l$ in a concerned future period (e.g., next week). Commonly used mobility profiling methods include frequency-based [Guo et al.2017] and Poisson-based [Xiong et al.2016] algorithms. We use the Poisson process to model user mobility given its better prediction performance in our experiments. More details can be found in the appendix.
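The exact estimator is deferred to the appendix, so the following is only a rough sketch of Poisson-based profiling under our own assumptions: visits to each location are treated as a Poisson process whose rate is the average number of visits per past period, and the profile entry is the probability of at least one visit in the next period. The function name and the input layout are hypothetical.

```python
import math

def poisson_visit_probability(visit_counts):
    """Estimate a mobility profile from per-period visit counts.

    visit_counts[l] is a list with the number of visits to location l
    in each past period (assumed input format).
    """
    profile = []
    for counts in visit_counts:
        rate = sum(counts) / len(counts)          # average visits per period
        profile.append(1.0 - math.exp(-rate))     # P[at least one visit]
    return profile

# Hypothetical history: four past weeks, three locations.
history = [[0, 0, 0, 0], [1, 1, 1, 1], [3, 2, 4, 3]]
profile = poisson_visit_probability(history)
```

A location never visited gets probability zero, and the estimate grows toward one as the average visit count grows.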

Framework Overview

We present an overview of our privacy framework in Figure 1. The key idea of our framework is that users should expose their location information as little as possible, while we can still select a proper set of users for optimizing their coverage on certain target locations in the future.

Figure 1: Framework overview.

The two main players in our framework are a server platform and its mobile client users. As we want users to expose their actual location information as little as possible, user mobility profiling runs locally on individuals’ smart devices. That means, the clients’ mobility profiles are only known to themselves. As shown in the literature, only uploading frequent locations with high profiling probabilities (e.g., ) to the server can already help achieve a good future crowd coverage [Guo et al.2017]. To limit the potential location leakage, our framework only requires users to upload one of their frequent locations. Moreover, this frequent location is obfuscated by the geographic differential privacy policy before being sent to the platform. The policy is generated by the server based on which target locations need to be covered. Finally, according to the uploaded obfuscated frequent locations (suppose users), the platform aims to select a set of users to maximize the expected coverage of intended locations in the coming period.

Figure 2: A running example of our framework.

A running example is shown in Figure 2, where the table represents a user’s mobility profile in a 2D spatial area split into uniform grids. Suppose that a location-based advertising platform needs to decide whether to send a Starbucks coupon to a user. The platform expects that a user receiving the coupon is a frequent visitor to the regions where Starbucks stores are located, so that the user will probably go to the stores. To achieve this goal, first, a user client computes its owner’s mobility profile locally. Second, from the set of locations whose probabilities are larger than a threshold (e.g., 80%), the user client randomly selects one location $l$ to be uploaded to the server (if there is no location with a probability larger than the threshold, the user does not upload any location). Third, according to the privacy policy $P$ received from the server, the user client randomly obfuscates $l$ to $l^*$ and then sends $l^*$ to the server. Finally, the server decides whether to send the coupon to the user according to the uploaded $l^*$. In this case, a user’s location privacy is preserved as the uploaded frequent location is rigorously obfuscated with differential privacy.
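The client-side steps of the running example can be sketched as follows. This is an illustration only: the function names, the list-based profile, and the row-stochastic `policy[l][l_star]` layout are assumptions, and the policy itself would come from the server.

```python
import random

def pick_frequent_location(profile, threshold, rng):
    """Randomly pick one location whose visiting probability exceeds the threshold."""
    candidates = [l for l, p in enumerate(profile) if p > threshold]
    return rng.choice(candidates) if candidates else None

def obfuscate(location, policy, rng):
    """Sample an obfuscated location from the policy row of the true location."""
    row = policy[location]
    return rng.choices(range(len(row)), weights=row, k=1)[0]

def client_report(profile, policy, threshold, rng):
    """Return the obfuscated location to upload, or None if nothing is frequent."""
    true_location = pick_frequent_location(profile, threshold, rng)
    if true_location is None:
        return None                      # the user uploads no location
    return obfuscate(true_location, policy, rng)
```

Only the return value of `client_report` leaves the device; the profile and the true frequent location stay local.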

Location obfuscation inevitably introduces a certain loss of quality in selecting users for coverage optimization, as users’ uploaded frequent locations contain deliberate noise. Hence, how the server generates the privacy policy $P$ so as to minimize the loss of quality caused by privacy protection is the key challenge of our framework.

Optimal Privacy Policy

In this section, we illustrate our solution that guarantees geographic $\epsilon$-differential privacy while minimizing the loss of quality in mobile crowd coverage optimization.

Single Location Coverage Problem (SLCP)

As the first step, we analyze the scenario where only one location needs to be covered. In location-based advertising, this reflects the scenario that the advertising only involves one specific site (e.g., a newly opened restaurant). In spatial crowdsourcing, this means that the task is only associated with one location (e.g., taking a photo of the Statue of Liberty). Denote the target location to cover as $l_t$ and suppose a user submits her/his obfuscated frequent location as $l^*$; then the probability of her/his frequent location actually being $l_t$ is:

$$\Pr(l_t \mid l^*) = \frac{\pi(l_t)\, P(l^* \mid l_t)}{\sum_{l \in \mathcal{L}} \pi(l)\, P(l^* \mid l)}$$ (2)

where $\pi$ is the overall distribution of all the users’ frequent locations. Here we suppose that $\pi$ is known in advance, and later we will elaborate how to estimate it. Note that the denominator can be seen as the overall probability of a user reporting her/his frequent location as $l^*$.
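To make the Bayes-rule reading of Eq. 2 concrete, here is a small sketch that evaluates it for a given prior and policy. The names `posterior_target`, `pi`, and `policy[l][l_star]` are illustrative assumptions.

```python
def posterior_target(pi, policy, l_star, l_t):
    """Probability that the true frequent location is l_t, given report l_star.

    pi[l] is the overall frequent-location distribution and
    policy[l][l_star] is the probability of obfuscating l to l_star.
    """
    report_prob = sum(pi[l] * policy[l][l_star] for l in range(len(pi)))
    return pi[l_t] * policy[l_t][l_star] / report_prob

# Hypothetical toy inputs.
pi = [0.5, 0.3, 0.2]
noisy = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
uniform = [[1 / 3] * 3 for _ in range(3)]
```

With the uniform policy the posterior equals the prior, so the report carries no information; a less noisy policy moves the posterior toward the reported location.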

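Suppose we select a user reporting $l^*$ to cover the target location $l_t$ in the coming time period; we would then like to maximize Eq. 2 so that the future probability of the user covering $l_t$ is maximized. With this idea, we have the following optimization process to get the optimal privacy policy $P$. Particularly, given $l_t$ to cover, we aim to

$$\max_{P,\, l^*} \ \frac{\pi(l_t)\, P(l^* \mid l_t)}{\sum_{l \in \mathcal{L}} \pi(l)\, P(l^* \mid l)}$$ (3)
$$\text{s.t.} \quad P(l' \mid l_1) \le e^{\epsilon\, d(l_1, l_2)}\, P(l' \mid l_2), \quad \forall\, l_1, l_2, l' \in \mathcal{L}$$ (4)
$$\sum_{l' \in \mathcal{L}} P(l' \mid l) = 1, \quad \forall\, l \in \mathcal{L}$$ (5)
$$P(l' \mid l) \ge 0, \quad \forall\, l, l' \in \mathcal{L}$$ (6)

Eq. 4 is the constraint of geographic differential privacy; Eq. 5 and 6 are probability restrictions. By solving the above optimization problem, we can get the optimal privacy policy $P$, as well as the user selection strategy, i.e., selecting the users reporting $l^*$ for future coverage maximization.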

However, even given , Eq. 3 cannot be converted to a convex optimization problem with existing solutions [Boyd and Vandenberghe2004]. To overcome this difficulty, we then analyze the relationship between the constraints and the objective function, and then deduce an optimal solution analytically.

An Analytic Solution to SLCP

Our analytic deduction includes three steps. First, we verify that the selection of $l^*$ will not affect the optimal objective value of Eq. 3. Second, we prove that Eq. 3 cannot exceed a certain upper bound. Finally, we show that this upper bound can be achieved by constructing a feasible solution of $P$.

Lemma 1. For any two locations $l_1, l_2 \in \mathcal{L}$, the optimal objective values of Eq. 3 are the same if we set $l^* = l_1$ or $l^* = l_2$.

Proof. For $l^* = l_1$ or $l^* = l_2$, we can always find a pair of policies $P_1$, $P_2$, where $P_1(l_1 \mid l) = P_2(l_2 \mid l)$, $P_1(l_2 \mid l) = P_2(l_1 \mid l)$, and $P_1(l' \mid l) = P_2(l' \mid l)$ for other $l'$; $P_1$ (with $l^* = l_1$) and $P_2$ (with $l^* = l_2$) lead to the same objective value. A detailed proof is in the appendix. ∎

Remark. Lemma 1 demonstrates that we can use any location as the obfuscated location $l^*$ for user selection without impacting the achievable optimal coverage utility.

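Lemma 2. The optimal value of Eq. 3 cannot exceed

$$\frac{\pi(l_t)}{\sum_{l \in \mathcal{L}} \pi(l)\, e^{-\epsilon\, d(l, l_t)}}$$ (7)

and this value can only be achieved if

$$P(l^* \mid l) = e^{-\epsilon\, d(l, l_t)}\, P(l^* \mid l_t), \quad \forall\, l \in \mathcal{L}$$ (8)

Proof. With geographic differential privacy constraints,

$$P(l^* \mid l) \ge e^{-\epsilon\, d(l, l_t)}\, P(l^* \mid l_t), \quad \forall\, l \in \mathcal{L}$$ (9)
$$\frac{\pi(l_t)\, P(l^* \mid l_t)}{\sum_{l} \pi(l)\, P(l^* \mid l)} \le \frac{\pi(l_t)\, P(l^* \mid l_t)}{\sum_{l} \pi(l)\, e^{-\epsilon\, d(l, l_t)}\, P(l^* \mid l_t)} = \frac{\pi(l_t)}{\sum_{l} \pi(l)\, e^{-\epsilon\, d(l, l_t)}}$$ (10)

Remark. Lemma 2 points out an upper bound of the optimal objective value and the condition (Eq. 8) that $P$ must satisfy to reach the upper bound value. However, whether we can find a feasible $P$ satisfying Eq. 8 as well as Eq. 4-6 is still unknown. Next, we prove that such a $P$ exists.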

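Lemma 3. If $P$ satisfies Eq. 8, then

$$P(l^* \mid l_1) \le e^{\epsilon\, d(l_1, l_2)}\, P(l^* \mid l_2), \quad \forall\, l_1, l_2 \in \mathcal{L}$$ (11)

Proof. Considering that $d$ is a distance metric and thus satisfies the triangle inequality, then

$$\frac{P(l^* \mid l_1)}{P(l^* \mid l_2)} = \frac{e^{-\epsilon\, d(l_1, l_t)}}{e^{-\epsilon\, d(l_2, l_t)}} = e^{\epsilon\, (d(l_2, l_t) - d(l_1, l_t))} \le e^{\epsilon\, d(l_1, l_2)}$$ (12)

Remark. Lemma 3 proves that when Eq. 8 stands, Eq. 4 of $l^*$ must also hold for any $l_1$, $l_2$.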
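Lemmas 2 and 3 can be checked numerically on a toy instance. The sketch below builds only the column of the policy that belongs to the selected obfuscated location, in the exponential shape required by Eq. 8, and compares its posterior with the bound of Eq. 7. The helper names, the free scaling constant `scale`, and the toy data are assumptions made for illustration; the remaining columns of a full policy are not constructed here.

```python
import math

def upper_bound(pi, dist, eps, l_t):
    """Bound of Eq. 7 on the posterior probability of the target location."""
    return pi[l_t] / sum(pi[l] * math.exp(-eps * dist[l][l_t]) for l in range(len(pi)))

def eq8_column(dist, eps, l_t, scale):
    """Column P(l_star | l) = scale * exp(-eps * d(l, l_t)) for every l."""
    return [scale * math.exp(-eps * dist[l][l_t]) for l in range(len(dist))]

def column_posterior(pi, column, l_t):
    """Eq. 2 evaluated for one column of the policy."""
    return pi[l_t] * column[l_t] / sum(p * c for p, c in zip(pi, column))

def column_is_geo_dp(column, dist, eps, tol=1e-12):
    """Check Eq. 4 restricted to a single obfuscated location."""
    n = len(column)
    return all(column[a] <= math.exp(eps * dist[a][b]) * column[b] + tol
               for a in range(n) for b in range(n))

# Hypothetical toy instance: three locations on a line, target in the middle.
dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
pi = [0.2, 0.5, 0.3]
eps = math.log(4)
column = eq8_column(dist, eps, l_t=1, scale=0.05)
```

On this instance the exponential column meets the bound and passes the privacy check, whereas a constant column (which is also private) only recovers the prior probability of the target.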

Theorem 1. Given any , we can get a feasible ,

(13)
(14)

which can achieve the upper bound Eq. 7. Here, can be any positive constant value smaller than or equal to a threshold , where

(15)

The proof is in the appendix.

Note that while Theorem 1 gets an optimal solution, in reality, there may not be enough users who report for selection (if is too small and the total user number is limited). Later we will propose a practical solution overcoming this shortcoming, when addressing the multi-location scenario.

Multi-Location Coverage Problem (MLCP)

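A more complicated setting of the mobile crowd coverage problem involves a set of locations that need to be covered. Real-life examples include delivering coupons of chain stores to users who will probably visit any of them in the next time period. Denote the set of locations to cover as

$$\mathcal{L}_T = \{l_{t_1}, l_{t_2}, \ldots\} \subseteq \mathcal{L}$$ (16)

then the probability of a user’s actual frequent location belonging to $\mathcal{L}_T$, given her/his obfuscated location $l^*$, is:

$$\Pr(\mathcal{L}_T \mid l^*) = \frac{\sum_{l_t \in \mathcal{L}_T} \pi(l_t)\, P(l^* \mid l_t)}{\sum_{l \in \mathcal{L}} \pi(l)\, P(l^* \mid l)}$$ (17)

Then, we can maximize Eq. 17 with the constraints Eq. 4-6 to get the optimal privacy policy $P$ and the obfuscated location $l^*$ for future crowd coverage maximization.

$$\max_{P,\, l^*} \ \Pr(\mathcal{L}_T \mid l^*)$$ (18)
$$\text{s.t.} \quad \text{Eq. 4-6}$$ (19)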

Similar to the single location coverage problem, we can prove the following lemmas.

Lemma 4. For any $l_1, l_2 \in \mathcal{L}$, the optimal objective values of Eq. 18 are the same if we set $l^* = l_1$ or $l^* = l_2$.

Lemma 4 is a straightforward extension of Lemma 1 to the multiple location coverage scenario.

Lemma 5. The optimal value of Eq. 18 cannot exceed

(20)

and this value can be achieved only if

(21)

The detailed proof is in the appendix.

Although Lemma 5 seems to be an extension of Lemma 2 for the multi-location scenario, they have a significant difference that the optimal value Eq. 20 may not always be feasible, i.e., Eq. 21 may not stand. Take a toy example of containing two locations, it means that, for any

(22)
(23)

Then, for any two locations , let or , then

(24)
(25)

Hence, if Eq. 20 is feasible, Eq. 25 must hold. Figure 3 shows two examples, in one of which Eq. 25 stands (Figure 3a) and the other does not (Figure 3b, considering the Euclidean distance). This shows that whether Eq. 20 can be achieved depends on the distribution of the target locations.

Figure 3: Toy examples with two locations to cover.

A Practical Solution to MLCP

While we cannot always obtain the upper bound value of Eq. 20 for the multi-location coverage problem, here we propose a practical solution which can work in real scenarios.

Revisiting the objective function of the multi-location coverage problem, Eq. 18, we can see that the main difficulty in solving the optimization problem is that the denominator includes $P$. To address this issue, we propose to add one more constraint to the optimization process by setting the denominator to a constant value,

$$\sum_{l \in \mathcal{L}} \pi(l)\, P(l^* \mid l) = \beta$$ (26)

where $\beta$ is a constant between 0 and 1; we will later elaborate how to set $\beta$. With Eq. 26, the objective function is,

$$\max_{P} \ \frac{1}{\beta} \sum_{l_t \in \mathcal{L}_T} \pi(l_t)\, P(l^* \mid l_t)$$ (27)

Lemma 4 has shown that we can set $l^*$ to any $l \in \mathcal{L}$ without affecting the optimal objective value. Since Eq. 4-6 are all linear constraints (as is Eq. 26), we can then use state-of-the-art linear programming tools (e.g., Mosek and Gurobi) to solve the optimization problem to get the optimal privacy policy $P$.
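To show how the linear program might be laid out for a solver, the sketch below assembles the cost vector and constraint matrices in plain Python, flattening the policy into a vector indexed by (true location, reported location). The function name `build_mlcp_lp`, the argument names, and the use of a single fixed constant `beta` for the reporting probability are assumptions for illustration; no solver is invoked here.

```python
import math

def build_mlcp_lp(pi, dist, eps, targets, l_star, beta):
    """Assemble (cost, a_ub, b_ub, a_eq, b_eq) for the multi-location problem."""
    n = len(pi)

    def idx(l, s):
        return l * n + s                      # position of P(s | l) in the vector

    # Objective: maximize the target mass reported as l_star (minimize its negative).
    cost = [0.0] * (n * n)
    for t in targets:
        cost[idx(t, l_star)] = -pi[t]

    # Geographic differential privacy: P(s|l1) - exp(eps*d(l1,l2)) * P(s|l2) <= 0.
    a_ub, b_ub = [], []
    for l1 in range(n):
        for l2 in range(n):
            if l1 == l2:
                continue
            for s in range(n):
                row = [0.0] * (n * n)
                row[idx(l1, s)] = 1.0
                row[idx(l2, s)] = -math.exp(eps * dist[l1][l2])
                a_ub.append(row)
                b_ub.append(0.0)

    # Each row of the policy is a probability distribution.
    a_eq, b_eq = [], []
    for l in range(n):
        row = [0.0] * (n * n)
        for s in range(n):
            row[idx(l, s)] = 1.0
        a_eq.append(row)
        b_eq.append(1.0)

    # Overall probability of reporting l_star is fixed to beta.
    row = [0.0] * (n * n)
    for l in range(n):
        row[idx(l, l_star)] = pi[l]
    a_eq.append(row)
    b_eq.append(beta)
    return cost, a_ub, b_ub, a_eq, b_eq
```

These arrays follow the usual "minimize cost subject to inequality and equality constraints with non-negative variables" convention, so they could be passed to a generic LP routine such as `scipy.optimize.linprog(cost, A_ub=a_ub, b_ub=b_ub, A_eq=a_eq, b_eq=b_eq, bounds=(0, None))`. The number of privacy rows grows cubically with the number of locations, so a sparse representation would be preferable for realistic grids.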

Setting $\beta$ with Binomial Distribution.

We then discuss how to set $\beta$, the constant on the right-hand side of Eq. 26, in real-life scenarios. First, we prove that if we want to get the objective value as high as possible, we should set $\beta$ as small as possible.

Theorem 2. Given $l^*$, suppose $O_1$, $O_2$ are the two optimal objective values of Eq. 27 when we set $\beta$ to $\beta_1$, $\beta_2$, respectively, and $\beta_1 < \beta_2$, then $O_1 \ge O_2$.

Proof. We denote the optimal when setting to , as , , respectively. Then, we construct a new solution of when as follows:

where . All the constraints of Eq. 4-6 still stand for . As the optimal objective value is when ,

Theorem 2 is very important for our practical solution, because it tells us that to get the optimal solution, we only need to solve the linear program once by setting $\beta$ (the constant in Eq. 26) to the smallest value that we can accept, rather than enumerating all the possible $\beta$. On the other hand, $\beta$ can be seen as the overall probability that a user will report her/his frequent location as $l^*$. Since we need to select $k$ users from such users, we cannot set $\beta$ too small, which would lead to very few people reporting their locations as $l^*$. Therefore, we propose a method to set $\beta$, with a guarantee that the platform can find $k$ users with a probability of $\rho$ (e.g., 95%), as follows.

Input :  $\pi$: overall user spatial distribution.
         $\epsilon$: differential privacy budget.
         $\mathcal{L}$: whole set of locations.
         $\mathcal{L}_T$: set of target locations to cover.
         $N$: total number of users.
         $k$: number of users to select.
         $\rho$: probability threshold for user selection.
Output : $P$: optimal differential privacy policy.
         $l^*$: the obfuscated location to select users.
1 $l^* \leftarrow l_1$ (or any other $l \in \mathcal{L}$);
2 $\beta \leftarrow$ the minimum value that can ensure $\Pr(X \ge k) \ge \rho$ for the Binomial distribution $X \sim B(N, \beta)$;
3 Solve the linear program to get the optimal $P$: maximize Eq. 27 subject to Eq. 4-6 and Eq. 26;
4 return $P$, $l^*$;
Algorithm 1 Optimal policy for multi-location coverage.

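Suppose that in total $N$ users report their frequent locations; then the number $X$ of users who report their obfuscated frequent locations as $l^*$ follows the Binomial distribution $B(N, \beta)$, where $\beta$ is the overall probability of reporting $l^*$. The probability that we can find at least $k$ users is then

$$\Pr(X \ge k) = 1 - \sum_{i=0}^{k-1} \binom{N}{i}\, \beta^{i}\, (1 - \beta)^{N - i}$$ (28)

We thus set $\beta$ to the smallest value that ensures $\Pr(X \ge k) \ge \rho$, where $\rho$ is the required probability for user selection.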
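One way to find this smallest reporting probability is a bisection on the Binomial tail, which increases monotonically with the success probability. The sketch below is an illustration under that observation; the function names and the iteration count are our own choices, and the example numbers (1083 users, 54 selected, 95%) are simply taken as a plausible reading of the FS settings.

```python
import math

def prob_at_least(n_users, k, beta):
    """P[X >= k] for X ~ Binomial(n_users, beta)."""
    below = sum(math.comb(n_users, i) * beta**i * (1 - beta)**(n_users - i)
                for i in range(k))
    return 1.0 - below

def smallest_beta(n_users, k, rho, iters=60):
    """Bisection for the smallest beta with P[X >= k] >= rho."""
    lo, hi = 0.0, 1.0                 # hi always satisfies the requirement
    for _ in range(iters):
        mid = (lo + hi) / 2
        if prob_at_least(n_users, k, mid) >= rho:
            hi = mid
        else:
            lo = mid
    return hi

beta = smallest_beta(1083, 54, 0.95)
```

The returned value is slightly larger than the naive ratio of selected users to total users, because the guarantee has to hold with high probability rather than on average.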

We describe the pseudo-code of our practical solution for the private multi-location coverage problem in Algorithm 1. Note that since covering one location is a special case of covering multiple locations, Algorithm 1 can also solve the single location coverage problem, without the need to assume that we will always have enough users reporting $l^*$.

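Input :  $g$: number of user groups to split.
         other inputs like Algorithm 1, except that $\pi$ is unknown.
Output : $U_s$: selected users.
1 $\pi \leftarrow$ uniform distribution (or other proper initial distribution);
2 users are split into $g$ groups $U_1, \ldots, U_g$, each with $N/g$ users;
3 for $i = 1, \ldots, g$ do
4          $P_i, l^*_i \leftarrow$ run Algorithm 1 with $\pi$;
5          foreach $u \in U_i$ do
                   /* $u$ downloads $P_i$ to the mobile client */
6                   $l_u \leftarrow$ a randomly selected frequent location;
7                   $l^*_u \leftarrow$ obfuscating $l_u$ by $P_i$;
                   /* $u$ uploads $l^*_u$ to the server */
8                   $\pi_u(l) \leftarrow \pi(l)\, P_i(l^*_u \mid l) / \sum_{l'} \pi(l')\, P_i(l^*_u \mid l')$ for all $l \in \mathcal{L}$;
10          end foreach
11          $\pi \leftarrow$ the mean value of $\pi_u$ over $u \in U_i$;
13 end for
14 $U_s \leftarrow \emptyset$;
15 for $i = g, \ldots, 1$ do
16          foreach $u \in U_i$ do
17                   if $u$’s obfuscated location is $l^*_i$ then
18                            $U_s \leftarrow U_s \cup \{u\}$;
19                            if $|U_s| = k$ then
20                                     return $U_s$;
22                            end if
24                   end if
26          end foreach
28 end for
29 return $U_s$;
Algorithm 2 User selection with dynamic estimating $\pi$.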

Estimating Overall Location Distribution $\pi$.

Previously, we assumed that the overall frequent location distribution $\pi$ is known. This may be possible when we have other sources to infer $\pi$, e.g., mobile call logs [Blondel et al.2012]. However, if we do not have such data, other methods are required to estimate $\pi$ along with user selection. We thus propose a Bayes rule based method to do user selection and $\pi$ estimation simultaneously, as shown in Algorithm 2.

Our basic idea is to use users’ uploaded obfuscated locations to refine $\pi$, the overall frequent location distribution. Note that our mechanism requires that each user uploads the obfuscated location only once to ensure differential privacy protection [Andrés et al.2013]. Hence, to preserve differential privacy, we split all the users into groups, get users’ obfuscated locations group by group, and iteratively refine $\pi$ with the obfuscated locations from previous user groups. The key update formula of $\pi$ is the Bayes rule applied to each uploaded location inside the inner loop of Algorithm 2. In such a way, the estimated $\pi$ gradually approaches the actual $\pi$ after iterative refinements. As $\pi$ generally becomes more and more accurate, the final user selection (the last loop of Algorithm 2) is biased to the users in the groups who upload locations later. The number of groups balances the trade-off between algorithm running efficiency and solution quality: a larger number of groups updates $\pi$ more frequently, but costs more time as it involves more iterations of running Algorithm 1.
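The refinement step can be sketched as follows. The exact update used in Algorithm 2 is not fully recoverable from the text, so this is one natural reading, stated as an assumption: compute the Bayes posterior over true locations for each report in a group and average those posteriors to obtain the next estimate. Function and variable names are hypothetical.

```python
def posterior_over_locations(pi, policy, reported):
    """Bayes posterior over true locations given one obfuscated report."""
    weights = [pi[l] * policy[l][reported] for l in range(len(pi))]
    total = sum(weights)
    return [w / total for w in weights]

def refine_pi(pi, policy, reports):
    """Average the per-report posteriors of one user group into a new estimate."""
    n = len(pi)
    acc = [0.0] * n
    for reported in reports:
        post = posterior_over_locations(pi, policy, reported)
        for l in range(n):
            acc[l] += post[l]
    return [a / len(reports) for a in acc]

# Hypothetical toy run: uniform start, mildly noisy policy, four reports.
start = [1 / 3] * 3
policy = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
estimate = refine_pi(start, policy, [0, 0, 0, 1])
```

The new estimate remains a probability distribution and shifts toward the locations that were reported most often; repeating the step group by group corresponds to the iterative refinement described above.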

Note that in a real implementation, users who do not have any frequent locations can still upload ‘NULL’ to the server. Then, we can estimate the percentage of users who can report locations from previous user groups. This can help us to set an appropriate number of reporting users $N$ in the optimization so as to finally find $k$ users with a probability of $\rho$.

Experiments

In this section, we conduct empirical studies on three real user mobility datasets. We use Algorithm 2 for both single and multi-location coverage scenarios given its practicality (no need to know $\pi$ in advance).

Baselines

  • Laplace. The state-of-the-art method to achieve geographic differential privacy is based on the Laplace distribution [Andrés et al.2013].

  • NO. We use the No-Obfuscation (NO) policy, i.e., the users upload one of their real frequent locations to the server, to show an upper bound of the coverage.

  • Random. We use the random user selection to serve as the lower bound of the coverage that can be achieved.

Datasets

  • FS dataset [Yang et al.2016] contains 1083 Foursquare users’ check-ins in New York, USA over nearly one year. We set the time period to a weekly granularity, that is, the selected users are expected to visit the target locations in the next week. The studied area (Figure 4(a)) is split into 1km*1km grids. Among the 45 weeks of user mobility data, we use the last five weeks as the test time period, and the first 40 weeks for mobility profiling.

  • CMCC dataset contains 1315 users’ GPS trajectories in Hangzhou, China, for one month from one mobile operator. The time period is set to a daily granularity. The studied area (Figure 4(b)) is split into 1km*1km grids. We use the first 18 weekdays for mobility profiling and the remaining four weekdays for testing.

  • D4D dataset [Blondel et al.2012] includes 5378 users’ two-week mobile phone call logs with cell tower locations in Abidjan, Côte d’Ivoire. The time period is set to a daily granularity. The studied area (Figure 4(c)) is split into cell-tower-based regions [Xiong et al.2016, Wang et al.2017]. We use the first nine weekdays for mobility profiling and the last weekday for testing.

(a) New York
(b) Hangzhou
(c) Abidjan
Figure 4: Experiment areas. Points in NY and Hangzhou are user locations, and points in Abidjan are cell towers.

Table 1 summarizes the experimental parameters. Note that the default differential privacy budget is set to as suggested by the original paper [Andrés et al.2013].

Description                                   Values
Differential privacy level ($\epsilon$)       ln(2), ln(4), ln(6), ln(8)
Threshold for frequent locations              0.5, 0.6, 0.7, 0.8
Total number of users                         1083 (FS), 1315 (CMCC), 5378 (D4D)
Number of selected users                      5%
Probability for user selection                95%
Number of user groups                         6
Table 1: Experimental parameters.

Results on FS

Single Location Coverage.

We first evaluate the scenario where only one location (grid) needs to be covered. Our evaluation metric is the probability that a selected user will actually appear at the target location in the next week.

Figure 5 shows the results on two target locations with different population sizes when and . In both target locations, our proposed method can achieve a larger coverage probability (up to 5% improvement) than the Laplace mechanism. Compared to the no-obfuscation method, the coverage probability of our method drops from 32.9% to 21.7% for the densely populated target location. For the less densely populated one, the drop is bigger (from 30.5% to 14.5%). A possible explanation is that when the target location is densely populated, even if our mechanism mis-selects a user whose frequent location is not the target one, s/he still may go to the target location by chance.

Figure 6(a) illustrates how the coverage probability changes when we vary the privacy budget $\epsilon$ for the densely populated target location. As a trade-off between privacy and coverage, when $\epsilon$ increases (i.e., lower level of privacy), we can get a higher coverage probability. More specifically, the improvement of our method over Laplace is more significant for a lower $\epsilon$, i.e., a higher privacy protection guarantee.

Figure 6(b) shows the change of coverage probability when the threshold of frequent locations varies. The coverage probabilities of all the methods rise as the threshold increases. While a higher threshold benefits coverage probability, the number of users who can upload their (obfuscated) frequent locations (i.e., candidates for selection) is smaller, because only users with at least one location profiling probability larger than the threshold will upload frequent locations. Based on the experiment results, setting the threshold to around 0.7-0.8 is appropriate for our method, as the coverage probability is relatively satisfactory while a large portion of users can be involved.

Figure 5: Single location coverage on two target locations with different population densities on FS.
Figure 6: Single location coverage results on FS: (a) varying the privacy budget $\epsilon$; (b) varying the threshold for frequent locations.
Figure 7: Multi-location coverage results on FS.
Figure 8: Results of estimating $\pi$ on FS.

Multi-Location Coverage.

We evaluate the scenario where multiple target locations exist. We randomly select 2, 4, 6 and 8 locations as the targets. Figure 7 shows the actual coverage probability that we can get, i.e., the probability of selected users covering any one of the target locations in the coming week. The results show that our proposed method consistently outperforms Laplace under the same level of privacy protection. Moreover, as the number of target locations increases, the performance gap between our method and no-obfuscation becomes smaller. This indicates that, when there are more locations to cover, using our mechanism is more profitable, as the performance loss incurred by the geographic differential privacy protection becomes smaller.

Estimation of $\pi$.

We evaluate whether our proposed Bayes rule based method can estimate $\pi$ correctly. We use KL divergence [Kullback and Leibler1951] to quantify the similarity between the estimated $\pi$ and the actual $\pi$. The smaller the KL divergence is, the more similar they are. Figure 8 shows the change of KL divergence with the number of user groups, where $\pi$ is initialized to a uniform distribution. In Figure 8, the starting point (before any update) represents the KL divergence between the uniform and the actual distribution. When the number of groups is small, we have fewer iterations to update $\pi$, leading to a larger KL divergence. In our experiment, using six user groups is a good setting, as KL divergence achieves a relatively low value, while the algorithm can complete execution within a reasonable time.
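For reference, the KL divergence between two discrete distributions over the same set of locations can be computed as below. The direction of the divergence (which distribution plays the role of the reference) is not stated in the text, so the argument order here is an assumption, as are the function name and the toy values.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p[i] == 0 contribute nothing.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

actual = [0.5, 0.3, 0.2]        # hypothetical actual distribution
uniform = [1 / 3] * 3           # initial estimate before any update
baseline = kl_divergence(actual, uniform)
```

A value of zero means the two distributions coincide, so a decreasing curve indicates that the estimate is approaching the actual distribution.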

Runtime Efficiency.

We use Gurobi 7.5 [Gurobi2014] as the linear programming solver to run Algorithm 1 and obtain the optimal policy $P$. One run takes about 450 seconds on a commodity laptop with an i5-5200U CPU (2.2 GHz) and 8 GB of memory. We split all the users into six groups, meaning that Algorithm 1 is executed six times, which sums up to about 45 minutes. As the optimal privacy policy generation can be an offline process, such runtime efficiency is acceptable for real applications. Note that this running time is not affected by the number of users, so our method can serve mobile applications with a large number of users.

Results on CMCC and D4D

To test the robustness of our proposed method, we also conduct experiments on the CMCC and D4D datasets. The results are shown in Figure 9(a) and 9(b), where we randomly select 1, 2, 4, and 8 locations to cover. The results verify that our proposed method consistently outperforms the Laplace mechanism in attaining a higher coverage probability of the selected users. Moreover, the results show that when the number of target locations to cover increases to 8, our privacy mechanism achieves almost the same coverage probability as no-obfuscation, especially for the D4D dataset. This further emphasizes the practicability of our mechanism, as user privacy is gained with a nearly negligible quality loss. Note that the achieved coverage probability on D4D is generally smaller than on FS or CMCC, because the phone call locations in D4D are intrinsically more difficult to predict. Please refer to the appendix for detailed mobility prediction results.

(a) CMCC (Hangzhou)
(b) D4D (Abidjan)
Figure 9: Experiment results on CMCC and D4D with different numbers of locations to cover.

Related Work

Selecting a set of users who can cover a set of locations in the near future is a very important problem for real applications like spatial crowdsourcing [Chen and Shahabi2016, Zhang et al.2014] and location-based advertising [Dhar and Varshney2011]. In most of previous research works, users’ moving histories are known and hence their mobility patterns can be effectively modeled for predicting their future locations [Xiong et al.2016, Guo et al.2017, Yang et al.2015].

As user privacy is becoming more and more important nowadays, some pioneering works have started to model users’ mobility or activity patterns based on privacy-preserving data. Geo-indistinguishability mechanisms are proposed for location-based query systems where users can submit their differentially obfuscated locations [Andrés et al.2013, Bordenabe, Chatzikokolakis, and Palamidessi2014]. PrivCheck is designed to enable personalized location-based advertising or recommendation with obfuscated user check-ins, so that users’ sensitive information (e.g., age and gender) cannot be inferred by adversaries [Yang et al.2016]. In spatial crowdsourcing data acquisition, recent works also incorporate privacy mechanisms to protect participants’ precise locations [Wang et al.2016, Wang et al.2017, To, Ghinita, and Shahabi2014, Vergara-Laurens, Mendez, and Labrador2014, Pournajaf et al.2014]. While these studies have various applications, they usually focus on obfuscating users’ current locations. As far as we know, little previous work has studied the privacy-preserving future crowd coverage maximization problem based on users’ obfuscated historical mobility profiles which we specifically focus on in this paper.

Conclusion

In this paper, we study the crowd coverage maximization problem under privacy protection of user locations. The key idea is to select users who will probably visit certain locations in the near future, using only their differentially obfuscated locations. To maximize the quality (coverage probability) of the selected users under such a privacy protection scheme, we formulate an optimization problem to obtain the optimal privacy policy. We analyze the problem mathematically and then propose a practical algorithm to obtain the optimal privacy policy. Experiments on various real user mobility datasets have verified the effectiveness of our privacy mechanism. As future work, we plan to study the problem when a user can upload multiple obfuscated frequent locations.

Acknowledgment

This research is partially supported by NSFC Grant no. 71601106, State Language Commission of China Key Program Grant no. ZDI135-18, Hong Kong ITF Grant no. ITS/391/15FX, and the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement 683253/GraphInt).

References

  • [Andrés et al.2013] Andrés, M. E.; Bordenabe, N. E.; Chatzikokolakis, K.; and Palamidessi, C. 2013. Geo-indistinguishability: Differential privacy for location-based systems. In Proc. CCS, 901–914.
  • [Blondel et al.2012] Blondel, V. D.; Esch, M.; Chan, C.; Clérot, F.; Deville, P.; Huens, E.; Morlot, F.; Smoreda, Z.; and Ziemlicki, C. 2012. Data for development: the D4D challenge on mobile phone data. arXiv preprint arXiv:1210.0137.
  • [Bordenabe, Chatzikokolakis, and Palamidessi2014] Bordenabe, N. E.; Chatzikokolakis, K.; and Palamidessi, C. 2014. Optimal geo-indistinguishable mechanisms for location privacy. In Proc. CCS, 251–262.
  • [Boyd and Vandenberghe2004] Boyd, S., and Vandenberghe, L. 2004. Convex optimization. Cambridge university press.
  • [Chen and Shahabi2016] Chen, L., and Shahabi, C. 2016. Spatial crowdsourcing: Challenges and opportunities. IEEE Data Eng. Bull. 39(4):14–25.
  • [Cho, Myers, and Leskovec2011] Cho, E.; Myers, S. A.; and Leskovec, J. 2011. Friendship and mobility: user movement in location-based social networks. In Proc. KDD, 1082–1090.
  • [Dhar and Varshney2011] Dhar, S., and Varshney, U. 2011. Challenges and business models for mobile location-based services and advertising. Communications of the ACM 54(5):121–128.
  • [Dwork2008] Dwork, C. 2008. Differential privacy: A survey of results. In International Conference on Theory and Applications of Models of Computation, 1–19.
  • [Fawcett2006] Fawcett, T. 2006. An introduction to ROC analysis. Pattern Recognition Letters 27(8):861–874.
  • [Guo et al.2017] Guo, B.; Liu, Y.; Wu, W.; Yu, Z.; and Han, Q. 2017. ActiveCrowd: A framework for optimized multitask allocation in mobile crowdsensing systems. IEEE Transactions on Human-Machine Systems 47(3):392–403.
  • [Gurobi2014] Gurobi Optimization, Inc. 2014. Gurobi optimizer reference manual. URL: http://www.gurobi.com.
  • [Kullback and Leibler1951] Kullback, S., and Leibler, R. A. 1951. On information and sufficiency. The annals of mathematical statistics 22(1):79–86.
  • [Pournajaf et al.2014] Pournajaf, L.; Xiong, L.; Sunderam, V.; and Goryczka, S. 2014. Spatial task assignment for crowd sensing with cloaked locations. In Proc. MDM, volume 1, 73–82.
  • [Rossi et al.2015] Rossi, L.; Williams, M. J.; Stich, C.; and Musolesi, M. 2015. Privacy and the city: User identification and location semantics in location-based social networks. In Proc. ICWSM, 387–396.
  • [To, Ghinita, and Shahabi2014] To, H.; Ghinita, G.; and Shahabi, C. 2014. A framework for protecting worker location privacy in spatial crowdsourcing. Proc. of the VLDB Endowment 7(10):919–930.
  • [Vergara-Laurens, Mendez, and Labrador2014] Vergara-Laurens, I. J.; Mendez, D.; and Labrador, M. A. 2014. Privacy, quality of information, and energy consumption in participatory sensing systems. In Proc. PerCom, 199–207.
  • [Wang et al.2016] Wang, L.; Zhang, D.; Yang, D.; Lim, B. Y.; and Ma, X. 2016. Differential location privacy for sparse mobile crowdsensing. In Proc. ICDM, 1257–1262.
  • [Wang et al.2017] Wang, L.; Yang, D.; Han, X.; Wang, T.; Zhang, D.; and Ma, X. 2017. Location privacy-preserving task allocation for mobile crowdsensing with differential geo-obfuscation. In Proc. WWW, 627–636.
  • [Xiong et al.2016] Xiong, H.; Zhang, D.; Chen, G.; Wang, L.; Gauthier, V.; and Barnes, L. E. 2016. iCrowd: Near-optimal task allocation for piggyback crowdsensing. IEEE Transactions on Mobile Computing 15(8):2010–2022.
  • [Yang et al.2015] Yang, D.; Zhang, D.; Zheng, V. W.; and Yu, Z. 2015. Modeling user activity preference by leveraging user spatial temporal characteristics in LBSNs. IEEE Transactions on Systems, Man, and Cybernetics: Systems 45(1):129–142.
  • [Yang et al.2016] Yang, D.; Zhang, D.; Qu, B.; and Cudré-Mauroux, P. 2016. PrivCheck: privacy-preserving check-in data publishing for personalized location based services. In Proc. UbiComp, 545–556.
  • [Zhang et al.2014] Zhang, D.; Wang, L.; Xiong, H.; and Guo, B. 2014. 4W1H in mobile crowd sensing. IEEE Communications Magazine 52(8):42–48.
  • [Zheng et al.2014] Zheng, Y.; Capra, L.; Wolfson, O.; and Yang, H. 2014. Urban computing: concepts, methodologies, and applications. ACM Transactions on Intelligent Systems and Technology 5(3):38.

Appendix

Detailed Proof of Lemma 1

Suppose we have two different , i.e., , and get two different optimal objective values

We now construct a new solution of when as follows:

We can verify that all the constraints of the optimization still stand, and then is a feasible solution when , and then

This violates that is the optimal solution when . ∎

Detailed Proof of Theorem 1

With the following way to construct ,

(29)
(30)

Then, for any ,

So the probability sum constraint holds. We next prove that the differential privacy constraint also holds. Lemma 3 has proved that the differential privacy constraint holds if . Therefore, we only need to show that it also holds for . Next, we show how to select to ensure that this is true for any ,

(31)
(32)

It is worth noting that both the numerator and the denominator on the right-hand side of Eq. 32 are positive when . Hence, we can set to any positive value smaller than or equal to

(33)

and then for any , geographic differential privacy still holds. Based on Lemma 2, the satisfying Eqs. 29 and 30 therefore attains the upper bound of the objective value. ∎

Detailed Proof of Lemma 5

According to the geographic differential privacy constraints, we have

(34)
(35)
(36)
(37)

Then,

(38)
(39)

For the ease of presentation, we denote ,

(40)
(41)
(42)
(43)

Mobility Profiling

We consider two popular mobility profiling methods used in the literature, and choose the better one for our experiments.

(1) Frequency [Guo et al.2017]. This method counts the daily (or weekly) frequency with which a user visits a location in her/his historical mobility records. For example, if we have a user’s 7-day mobility history and s/he visits a location on 5 of those days, then the daily visiting probability is 5/7.

(2) Poisson [Xiong et al.2016]. Given a user ’s average daily (or weekly) visiting times to location in the past, denoted as , then the Poisson process estimates that visits at least once in one day (week) is:

(44)
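The two profiling methods can be illustrated with a minimal sketch. It assumes the standard Poisson result that the probability of at least one event in a period is one minus the probability of zero events, i.e. 1 - exp(-rate), where the rate is the user's average number of visits per period. The function names, argument names, and the example visit counts are hypothetical and chosen only for illustration; they are not taken from the original implementation.

```python
import math


def frequency_profile(visited_periods, total_periods):
    """Frequency method: fraction of observed periods (days or weeks)
    in which the user visited the location at least once."""
    if total_periods <= 0:
        raise ValueError("total_periods must be positive")
    return visited_periods / total_periods


def poisson_profile(avg_visits_per_period):
    """Poisson method: probability of at least one visit in a period,
    computed as 1 - P(zero visits) = 1 - exp(-rate)."""
    if avg_visits_per_period < 0:
        raise ValueError("avg_visits_per_period must be non-negative")
    return 1.0 - math.exp(-avg_visits_per_period)


# Example from the text: a 7-day history with visits on 5 days.
print(frequency_profile(5, 7))

# Hypothetical: the same user made 8 visits in total over those 7 days,
# so the average daily rate is 8/7.
print(poisson_profile(8 / 7))
```

Unlike the Frequency method, the Poisson estimate uses the total number of visits rather than only the number of days with a visit, so repeated visits within a day raise the estimated probability.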
Figure 10: ROC curves of mobility profiling. (a) New York (FS). (b) Hangzhou (CMCC). (c) Abidjan (D4D).

Figure 10 plots the receiver operating characteristic (ROC) curves [Fawcett2006] and shows the area under the ROC curve (AUC) values for both profiling methods on the FS, CMCC, and D4D datasets, respectively. A larger AUC value implies better performance in predicting a user’s future mobility patterns. From the results, we see that the Poisson-based mobility profiling method outperforms the Frequency-based method, and thus we use the Poisson-based method in all the experiments.
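As a rough illustration of how an AUC value can be computed for a profiling method, the sketch below uses the rank interpretation of AUC: the probability that a randomly chosen positive example (the user did visit the location) receives a higher predicted score than a randomly chosen negative one, with ties counted as one half. The function name and the toy labels and scores are hypothetical; the evaluation pipeline used for Figure 10 may differ.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative examples.

    labels: iterable of 0/1 ground-truth outcomes (1 = user visited).
    scores: predicted visiting probabilities, in the same order as labels.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): whether five user-location pairs were
# actually visited, and the predicted visiting probabilities.
toy_labels = [1, 0, 1, 0, 1]
toy_scores = [0.9, 0.3, 0.6, 0.7, 0.4]
print(roc_auc(toy_labels, toy_scores))
```

The pairwise loop is quadratic in the number of examples, which is fine for a small illustration; a sort-based rank computation would be preferable for large datasets.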

In addition, we observe that mobility prediction on the D4D dataset is more difficult than on the other two datasets, as it yields a lower AUC value. Consistent with this, our experiments in the paper (Figure 9) show that the selected users on the D4D dataset achieve a lower coverage probability than on the other two datasets under the same user selection mechanism.