Introduction
Crowd coverage maximization is a classical problem in mobile computing: how to select users from a candidate pool to maximize the probability of covering a set of target locations in a coming time period (e.g., one day or one week). This problem and its variants have a wide spectrum of applications in location-based advertising [Dhar and Varshney2011], spatial crowdsourcing [Chen and Shahabi2016, Zhang et al.2014], urban computing [Zheng et al.2014], etc. For example, it can help shop owners offer electronic coupons to the mobile app users who are likely to visit the region around the shop soon; it can also help crowdsourcing organizers recruit the participants who will cover the task area with the highest probability [Xiong et al.2016].
One of the key steps in crowd coverage maximization is mobility profiling, i.e., predicting the probability of a user appearing at a certain location. A common practice is to first divide an area into fine-grained grids or subareas, and then count the frequency of a user appearing in each grid based on trajectory history [Guo et al.2017]. One can also use more sophisticated models, such as a Poisson process, to estimate users’ occurrence distribution [Xiong et al.2016]. Existing mobility profiling methods often require access to users’ historical mobility traces, which may seriously compromise user privacy. For example, users’ exposed location data may reveal sensitive information about their identities and social relationships [Cho, Myers, and Leskovec2011, Rossi et al.2015]. Despite the importance of location privacy, as far as we know, little research to date has combined location privacy, mobility profiling, and crowd coverage maximization.
To fill this gap, this paper explores how to protect the crowd’s location privacy while still optimizing their expected coverage of a set of locations. To achieve this goal, we propose a mobile crowd coverage maximization framework with a rigorous privacy protection scheme, geographic differential privacy [Andrés et al.2013]. A geographic differential privacy policy obfuscates a user’s actual location to another location with carefully designed probabilities, such that adversaries, regardless of their prior knowledge, can learn little about the user’s true location after observing the obfuscated location. However, with differential privacy protection, crowd coverage maximization can only be performed on the obfuscated (inaccurate) locations, which inevitably reduces the quality of the selected users. Therefore, we propose a method to generate the optimal location obfuscation policy, which satisfies geographic differential privacy while minimizing such loss.
In summary, this paper has the following contributions:
(1) To the best of our knowledge, this is the first work studying the mobile crowd coverage maximization problem with location privacy protection.
(2) In our approach, users only need to upload one of their frequently visited locations, and more importantly, the uploaded location is obfuscated using a rigorous privacy policy, geographic differential privacy. We further formulate an optimization problem to obtain the optimal obfuscation policy that maximizes the expected future crowd coverage over a set of locations under a guaranteed level of differential privacy protection. As the optimization problem is non-convex, we first mathematically analyze the scenario where only one location needs to be covered and derive an optimal solution. Then, we extend this setting to the multi-location coverage scenario and propose a practical algorithm to obtain the optimal obfuscation policy.
(3) Experiments on real human mobility datasets verify that, when selecting the same number of users under the same level of privacy protection, our method achieves a higher coverage than state-of-the-art differential privacy methods.
Preliminaries
Geographic differential privacy [Andrés et al.2013] introduces the idea of database differential privacy [Dwork2008] into the location obfuscation context. Its key idea is: given an observed obfuscated location $l^*$, any two locations $l_1$ and $l_2$ have similar probabilities of being mapped to $l^*$. It is thus hard for an adversary to differentiate whether the user is at $l_1$ or $l_2$ by observing $l^*$.
Definition 1 [Andrés et al.2013]. Suppose the target area includes a set of locations $\mathcal{L}$, then an obfuscation policy $P$ satisfies geographic differential privacy, iff
$$P(l^* \mid l_1) \le e^{\epsilon\, d(l_1, l_2)}\, P(l^* \mid l_2), \quad \forall\, l_1, l_2, l^* \in \mathcal{L} \tag{1}$$
where $P(l^* \mid l)$ is the probability of obfuscating $l$ to $l^*$, $d(l_1, l_2)$ is the distance between $l_1$ and $l_2$, and $\epsilon$ is the privacy budget: the smaller $\epsilon$, the better the privacy protection.
Note that the set of locations $\mathcal{L}$ is usually constructed by dividing the target area into subregions, e.g., equal-size grids [Bordenabe, Chatzikokolakis, and Palamidessi2014] or cell-tower regions [Xiong et al.2016].
If $P$ satisfies geographic differential privacy, it can be proven that for adversaries with any prior knowledge about users’ location distributions, their posterior knowledge after observing the obfuscated location can only be increased by a small constant factor [Andrés et al.2013]. Note that this protection is guaranteed even if the adversaries know $P$. Due to this rigorous protection, geographic differential privacy has seen many applications in location-based services, spatial crowdsourcing, etc. [Bordenabe, Chatzikokolakis, and Palamidessi2014, Wang et al.2016, Wang et al.2017].
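The constraint in Definition 1 can be checked mechanically for a discrete policy. The following is a minimal sketch, assuming the policy is stored as a row-stochastic matrix `policy[i][k]` (probability of reporting location `k` when the true location is `i`) and `dist[i][j]` holds pairwise distances; the function name, the toy distances, and both example policies are illustrative assumptions, not taken from the paper.

```python
import math

def satisfies_geo_dp(policy, dist, epsilon, tol=1e-9):
    """Check policy[i][k] <= exp(epsilon * dist[i][j]) * policy[j][k] for all i, j, k."""
    n = len(policy)
    for i in range(n):
        for j in range(n):
            bound = math.exp(epsilon * dist[i][j])
            for k in range(n):
                if policy[i][k] > bound * policy[j][k] + tol:
                    return False
    return True

# Hypothetical toy setting: three locations on a line, 1 km apart.
dist = [[abs(i - j) for j in range(3)] for i in range(3)]
epsilon = math.log(2)

# Rows proportional to exp(-(epsilon / 2) * distance). The halved exponent
# leaves room for the per-row normalisation, so the policy meets budget epsilon.
weights = [[math.exp(-0.5 * epsilon * dist[i][k]) for k in range(3)] for i in range(3)]
smooth_policy = [[w / sum(row) for w in row] for row in weights]

# Reporting the true location unchanged gives no protection at all.
identity_policy = [[1.0 if i == k else 0.0 for k in range(3)] for i in range(3)]

print(satisfies_geo_dp(smooth_policy, dist, epsilon))
print(satisfies_geo_dp(identity_policy, dist, epsilon))
```

The check is cubic in the number of locations, which is acceptable for grids of the size used in this kind of study but would need vectorising for much finer partitions.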
Mobility profiling aims to estimate the probability of a user covering a certain location during a future time period. Specifically, a user $u$’s mobility profile is denoted as $M_u$, and $M_u(l)$ is the estimated probability of $u$ visiting location $l$ in the future period of interest (e.g., next week). Commonly used mobility profiling methods include frequency-based [Guo et al.2017] and Poisson-based [Xiong et al.2016] algorithms. We use the Poisson process to model user mobility given its better prediction performance in our experiments. More details can be found in the appendix.
Framework Overview
We present an overview of our privacy framework in Figure 1. The key idea of our framework is that users should expose their location information as little as possible, while we can still select a proper set of users for optimizing their coverage on certain target locations in the future.
The two main players in our framework are a server platform and its mobile client users. As we want users to expose as little of their actual location information as possible, user mobility profiling runs locally on individuals’ smart devices. That is, the clients’ mobility profiles are known only to themselves. As shown in the literature, uploading only frequent locations with high profiling probabilities to the server can already help achieve a good future crowd coverage [Guo et al.2017]. To limit the potential location leakage, our framework only requires users to upload one of their frequent locations. Moreover, this frequent location is obfuscated by the geographic differential privacy policy before being sent to the platform. The policy is generated by the server based on which target locations need to be covered. Finally, according to the uploaded obfuscated frequent locations (suppose $n$ users upload one), the platform aims to select a set of $k$ users to maximize the expected coverage of the intended locations in the coming period.
A running example is shown in Figure 2, where the table represents a user’s mobility profile in a 2D spatial area split into uniform grids. Suppose that a location-based advertising platform needs to decide whether to send a Starbucks coupon to a user. The platform expects that a user receiving the coupon is a frequent visitor to the regions where Starbucks stores are located, so that the user will probably go to the stores. To achieve this goal, first, a user client computes its owner’s mobility profile locally. Second, from the set of locations whose probabilities are larger than a threshold (e.g., 80%), the user client randomly selects one location $l$ to be uploaded to the server (if no location has a probability larger than the threshold, the user does not upload any location). Third, according to the privacy policy $P$ received from the server, the user client randomly obfuscates $l$ to $l^*$ and then sends $l^*$ to the server. Finally, the server decides whether to send the coupon to the user according to the uploaded $l^*$. In this case, a user’s location privacy is preserved, as the uploaded frequent location is rigorously obfuscated with differential privacy.
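The client-side steps of the running example can be sketched as follows. This is an illustration under stated assumptions: the profile and policy are plain dictionaries, the function names are hypothetical, and the toy policy values are illustrative numbers that have not been checked against the geographic differential privacy constraint.

```python
import random

def choose_frequent_location(profile, threshold, rng):
    """Pick one location whose visiting probability exceeds the threshold, or None."""
    candidates = [loc for loc, prob in profile.items() if prob > threshold]
    return rng.choice(candidates) if candidates else None

def obfuscate(true_location, policy, rng):
    """Sample the reported location from the policy row of the true location."""
    row = policy[true_location]
    locations = list(row)
    return rng.choices(locations, weights=[row[loc] for loc in locations], k=1)[0]

def client_report(profile, policy, threshold, rng):
    """Return the obfuscated location to upload, or None if nothing is frequent enough."""
    location = choose_frequent_location(profile, threshold, rng)
    return None if location is None else obfuscate(location, policy, rng)

# Hypothetical toy data: three grid cells and one user's mobility profile.
profile = {"g1": 0.90, "g2": 0.85, "g3": 0.10}
policy = {
    "g1": {"g1": 0.5, "g2": 0.3, "g3": 0.2},
    "g2": {"g1": 0.3, "g2": 0.4, "g3": 0.3},
    "g3": {"g1": 0.2, "g2": 0.3, "g3": 0.5},
}
report = client_report(profile, policy, threshold=0.8, rng=random.Random(0))
print(report)
```

Only the sampled value leaves the device in this sketch; the profile and the true frequent location stay local, which mirrors the division of work between client and server described above.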
Location obfuscation would inevitably introduce certain loss of quality in selecting users for coverage optimization, as users’ uploaded frequent locations contain deliberate noises. Hence, how the server generates the privacy policy is the key challenge of our framework, which aims to minimize the loss of quality caused by privacy protection.
Optimal Privacy Policy
In this section, we illustrate our solution that guarantees geographic differential privacy while minimizing the loss of quality in mobile crowd coverage optimization.
Single Location Coverage Problem (SLCP)
As the first step, we analyze the scenario where only one location needs to be covered. In location-based advertising, this reflects the scenario where the advertisement involves only one specific site (e.g., a newly opened restaurant). In spatial crowdsourcing, this means that the task is associated with only one location (e.g., taking a photo of the Statue of Liberty). Denote the target location to cover as $l_t$ and suppose a user submits her/his obfuscated frequent location as $l^*$; then the probability of her/his actual frequent location being $l_t$ is:
$$\Pr(l_t \mid l^*) = \frac{\pi(l_t)\, P(l^* \mid l_t)}{\sum_{l \in \mathcal{L}} \pi(l)\, P(l^* \mid l)} \tag{2}$$
where $\pi$ is the overall distribution of all the users’ frequent locations over the location set $\mathcal{L}$, and $P(l^* \mid l)$ is the probability that the privacy policy obfuscates $l$ to $l^*$. Here we suppose that $\pi$ is known in advance; later we will elaborate on how to estimate it. Note that the denominator can be seen as the overall probability of a user reporting her/his frequent location as $l^*$.
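The posterior in Eq. 2 is an application of Bayes' rule and can be evaluated directly once a prior and a policy are given. The sketch below assumes dictionary-based inputs and accepts a set of target locations so that the single-target case is just a one-element set; all names and numbers are illustrative assumptions.

```python
def posterior_of_targets(prior, policy, reported, targets):
    """Probability that the true location lies in `targets` given the reported location."""
    evidence = sum(prior[loc] * policy[loc][reported] for loc in prior)
    if evidence == 0:
        raise ValueError("reported location has zero probability under this policy")
    return sum(prior[t] * policy[t][reported] for t in targets) / evidence

# Hypothetical toy data: prior over three locations and an obfuscation policy.
prior = {"a": 0.5, "b": 0.3, "c": 0.2}
policy = {
    "a": {"a": 0.60, "b": 0.30, "c": 0.10},
    "b": {"a": 0.25, "b": 0.50, "c": 0.25},
    "c": {"a": 0.10, "b": 0.30, "c": 0.60},
}

# Chance that a user who reports "a" really has "a" as the frequent location.
single = posterior_of_targets(prior, policy, reported="a", targets=["a"])
print(single)
```

A server would use this value to rank reported locations: the higher the posterior for a reported location, the more promising the users who report it.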
Suppose we select a user reporting $l^*$ to cover the target location $l_t$ in the coming time period; we would then like to maximize Eq. 2 so that the future probability of the user covering $l_t$ is maximized. With this idea, we have the following optimization problem to get the optimal privacy policy $P$. Particularly, given $l_t$ to cover, we aim to
$$\max_{P} \ \frac{\pi(l_t)\, P(l^* \mid l_t)}{\sum_{l \in \mathcal{L}} \pi(l)\, P(l^* \mid l)} \tag{3}$$
$$\text{s.t.} \quad P(l' \mid l_1) \le e^{\epsilon\, d(l_1, l_2)}\, P(l' \mid l_2), \quad \forall\, l_1, l_2, l' \in \mathcal{L} \tag{4}$$
$$P(l' \mid l) \ge 0, \quad \forall\, l, l' \in \mathcal{L} \tag{5}$$
$$\sum_{l' \in \mathcal{L}} P(l' \mid l) = 1, \quad \forall\, l \in \mathcal{L} \tag{6}$$
Eq. 4 is the constraint of geographic differential privacy; Eq. 5 and 6 are probability restrictions. By solving the above optimization problem, we can get the optimal privacy policy $P$, as well as the user selection strategy, i.e., selecting the users reporting $l^*$ for future coverage maximization.
However, even given $\pi$, Eq. 3 cannot be converted into a convex optimization problem that existing methods can solve [Boyd and Vandenberghe2004]. To overcome this difficulty, we analyze the relationship between the constraints and the objective function, and then deduce an optimal solution analytically.
An Analytic Solution to SLCP
Our analytic deduction includes three steps. First, we verify that the selection of the reported obfuscated location $l^*$ will not affect the optimal objective value of Eq. 3. Second, we prove that Eq. 3 cannot exceed a certain upper bound. Finally, we show that this upper bound can be achieved by constructing a feasible solution of the policy $P$.
Lemma 1. For any two locations $l_1^*, l_2^* \in \mathcal{L}$, the optimal objective values of Eq. 3 are the same if we set $l^* = l_1^*$ or $l^* = l_2^*$.
Proof. For or , we can always find a pair of , , where , and for other ; and lead to the same objective value. A detailed proof is in the appendix. ∎
Remark. Lemma 1 demonstrates that we can use any location as the obfuscated location for user selection without impacting the achievable optimal coverage utility.
Proof. With geographic differential privacy constraints,
(9)  
(10) 
∎
Remark. Lemma 2 points out an upper bound of the optimal objective value and the condition (Eq. 8) that the policy $P$ must satisfy to reach the upper bound. However, whether we can find a feasible $P$ satisfying Eq. 8 as well as Eqs. 4-6 is still unknown. Next, we prove that such a $P$ exists.
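To see why an upper bound of this kind follows from the privacy constraint alone, the derivation below gives a short sketch in assumed notation: $\pi$ is the overall frequent-location distribution, $P(l^* \mid l)$ the obfuscation probability, $l_t$ the target location, and $\epsilon$ the privacy budget. It is a reconstruction from the geographic differential privacy inequality, not a verbatim copy of the lemma's statement or proof.

```latex
% Geo-DP applied to the pair (l_t, l):  P(l^* | l_t) <= e^{\epsilon d(l, l_t)} P(l^* | l),
% hence P(l^* | l) >= e^{-\epsilon d(l, l_t)} P(l^* | l_t) for every l.
\begin{align*}
\sum_{l \in \mathcal{L}} \pi(l)\, P(l^* \mid l)
  &\ge P(l^* \mid l_t) \sum_{l \in \mathcal{L}} \pi(l)\, e^{-\epsilon\, d(l, l_t)}, \\
\frac{\pi(l_t)\, P(l^* \mid l_t)}{\sum_{l \in \mathcal{L}} \pi(l)\, P(l^* \mid l)}
  &\le \frac{\pi(l_t)}{\sum_{l \in \mathcal{L}} \pi(l)\, e^{-\epsilon\, d(l, l_t)}}
  \qquad \text{whenever } P(l^* \mid l_t) > 0 .
\end{align*}
```

Equality holds when $P(l^* \mid l) = e^{-\epsilon\, d(l, l_t)}\, P(l^* \mid l_t)$ for every $l$ with $\pi(l) > 0$, which is the kind of condition a feasible policy has to meet in order to attain the bound.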
Lemma 3. If satisfies Eq. 8, then
(11) 
Proof. Considering that is a distance metric, then
(12) 
∎
Theorem 1. Given any , we can get a feasible ,
(13)  
(14) 
which can achieve the upper bound Eq. 7. Here, can be any positive constant value smaller than or equal to a threshold , where
(15) 
The proof is in the appendix.
Note that while Theorem 1 gets an optimal solution, in reality, there may not be enough users who report for selection (if is too small and the total user number is limited). Later we will propose a practical solution overcoming this shortcoming, when addressing the multilocation scenario.
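A small numeric check can make the single-location result more concrete. The sketch assumes that the column of the policy for the selected report has the form $\delta\, e^{-\epsilon\, d(l, l_t)}$, which is the case where the geographic differential privacy inequality is tight; this form, the constant `delta`, and all names are assumptions made for illustration. How the remaining probability mass is spread over the other reported locations is not constructed here.

```python
import math

def tight_column(dist_to_target, epsilon, delta):
    """Assumed column P(l*|l) = delta * exp(-epsilon * d(l, l_t)) for each true location l."""
    return [delta * math.exp(-epsilon * d) for d in dist_to_target]

def posterior_of_target(prior, column, target):
    """Bayes posterior that the true location is the target, given the report l*."""
    return prior[target] * column[target] / sum(p * q for p, q in zip(prior, column))

def column_respects_geo_dp(column, dist, epsilon, tol=1e-12):
    """Check column[i] <= exp(epsilon * dist[i][j]) * column[j] for all pairs (i, j)."""
    n = len(column)
    return all(column[i] <= math.exp(epsilon * dist[i][j]) * column[j] + tol
               for i in range(n) for j in range(n))

# Hypothetical toy setting: four locations on a line, target at index 1.
positions = [0, 1, 2, 3]
dist = [[abs(a - b) for b in positions] for a in positions]
prior = [0.1, 0.4, 0.3, 0.2]
target, epsilon, delta = 1, math.log(4), 0.05

column = tight_column(dist[target], epsilon, delta)
bound = prior[target] / sum(p * math.exp(-epsilon * d)
                            for p, d in zip(prior, dist[target]))

print(posterior_of_target(prior, column, target), bound)
print(column_respects_geo_dp(column, dist, epsilon))
```

The posterior does not depend on `delta`, but the expected share of users who send this report does, which is why a very small constant can leave too few candidates to select from.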
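Multi-Location Coverage Problem (MLCP)
A more complicated setting of the mobile crowd coverage problem includes a set of locations that need to be covered. Real-life examples include delivering coupons of chain stores to users who will probably visit any of them in the next time period. Denote the set of locations to cover as
$$\mathcal{L}_t = \{l_{t_1}, l_{t_2}, \ldots, l_{t_m}\} \tag{16}$$
then the probability of a user’s actual frequent location belonging to $\mathcal{L}_t$, given the reported obfuscated location $l^*$, is:
$$\Pr(\mathcal{L}_t \mid l^*) = \frac{\sum_{l_t \in \mathcal{L}_t} \pi(l_t)\, P(l^* \mid l_t)}{\sum_{l \in \mathcal{L}} \pi(l)\, P(l^* \mid l)} \tag{17}$$
where $\pi$ is the overall distribution of users’ frequent locations and $P(l^* \mid l)$ is the probability of obfuscating $l$ to $l^*$. Then, we can maximize Eq. 17 under the constraints Eqs. 4-6 to get the optimal privacy policy $P$ and the obfuscated location $l^*$ for future crowd coverage maximization.
$$\max_{P} \ \frac{\sum_{l_t \in \mathcal{L}_t} \pi(l_t)\, P(l^* \mid l_t)}{\sum_{l \in \mathcal{L}} \pi(l)\, P(l^* \mid l)} \tag{18}$$
$$\text{s.t.} \quad \text{Eqs. 4-6} \tag{19}$$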
Similar to the single location coverage problem, we can prove the following lemmas.
Lemma 4. For any $l_1^*, l_2^* \in \mathcal{L}$, the optimal objective values of Eq. 18 are the same if we set the reported location $l^*$ to $l_1^*$ or $l_2^*$.
Lemma 4 is a straightforward extension of Lemma 1 to the multiple location coverage scenario.
The detailed proof is in the appendix.
Although Lemma 5 seems to be an extension of Lemma 2 to the multi-location scenario, there is a significant difference: the optimal value Eq. 20 may not always be feasible, i.e., Eq. 21 may not hold. Take a toy example where the target set $\mathcal{L}_t$ contains two locations; it means that, for any
(22)  
(23) 
Then, for any two locations , let or , then
(24)  
(25) 
Hence, if Eq. 20 is feasible, Eq. 25 must hold. Figure 3 shows two examples, in one of which Eq. 25 stands (Figure 3a) and the other does not (Figure 3b, considering the Euclidean distance). This shows that whether Eq. 20 can be achieved depends on the distribution of the target locations.
A Practical Solution to MLCP
While we cannot always obtain the upper bound value of Eq. 20 for the multi-location coverage problem, here we propose a practical solution which can work in real scenarios.
Revisiting the objective function of the multi-location coverage problem, Eq. 18, we can see that the main difficulty in solving the optimization problem is that the denominator includes the policy $P$, which is the variable to optimize. To address this issue, we propose to add one more constraint to the optimization process by setting the denominator to a constant value,
$$\sum_{l \in \mathcal{L}} \pi(l)\, P(l^* \mid l) = c \tag{26}$$
where $c$ is a constant between 0 and 1; we will later elaborate on how to set $c$. With Eq. 26, the objective function is,
$$\max_{P} \ \frac{1}{c} \sum_{l_t \in \mathcal{L}_t} \pi(l_t)\, P(l^* \mid l_t) \tag{27}$$
Lemma 4 has shown that we can set $l^*$ to any location in $\mathcal{L}$ without affecting the optimal objective value. Since Eqs. 4-6 and Eq. 26 are all linear constraints, we can then use state-of-the-art linear programming tools (e.g., Mosek and Gurobi) to solve the optimization problem and get the optimal privacy policy $P$.
Setting $c$ with Binomial Distribution.
We then discuss how to set the constant $c$ of Eq. 26 in real-life scenarios. First, we prove that if we want the objective value to be as high as possible, we should set $c$ as small as possible.
Theorem 2. Given the overall frequent-location distribution $\pi$, suppose $o_1$ and $o_2$ are the two optimal objective values of Eq. 27 when we set $c$ to $c_1$ and $c_2$, respectively, and $c_1 < c_2$; then $o_1 \ge o_2$.
Proof. We denote the optimal when setting to , as , , respectively. Then, we construct a new solution of when as follows:
where . All the constraints of Eqs. 4-6 still stand for . As the optimal objective value is when ,
∎
Theorem 2 is very important for our practical solution, because it tells us that to get the optimal solution, we only need to solve the linear program once by setting the constant $c$ of Eq. 26 to the smallest value that we can accept, rather than enumerating all the possible values of $c$. On the other hand, $c$ can be seen as the overall probability that a user will report her/his frequent location as $l^*$. Since we need to select $k$ users from such users, we cannot set $c$ to too small a value, which would lead to very few people reporting their locations as $l^*$. Therefore, we propose a method to set $c$, with a guarantee that the platform can find $k$ users with a probability of at least $p$ (e.g., 95%), as follows.
Suppose in total $n$ users report their frequent locations; then the number $X$ of users who will report their obfuscated frequent locations as $l^*$ follows the Binomial distribution $B(n, c)$. Then, the probability that we can find at least $k$ such users is
$$\Pr(X \ge k) = \sum_{i=k}^{n} \binom{n}{i}\, c^{i}\, (1 - c)^{n - i} \tag{28}$$
And thus we would like to set $c$ to the smallest value that ensures $\Pr(X \ge k) \ge p$.
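The choice of the smallest acceptable constant can be sketched with a simple search over a grid of candidate values, since the binomial tail probability grows monotonically with the success probability. The grid step, the function names, and the example numbers (1083 reporting users, 54 users to select) are assumptions for illustration; the probability mass function is evaluated in log space to avoid overflow for large user counts.

```python
import math

def log_binom_pmf(i, n, c):
    """Log of the Binomial(n, c) probability of exactly i successes, for 0 < c < 1."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(c) + (n - i) * math.log1p(-c))

def prob_at_least(k, n, c):
    """P(X >= k) for X ~ Binomial(n, c), via the complement of the lower tail."""
    return 1.0 - sum(math.exp(log_binom_pmf(i, n, c)) for i in range(k))

def smallest_c(k, n, p, step=0.001):
    """Smallest grid value c with P(X >= k) >= p, or None if no grid value works."""
    for s in range(1, round(1 / step)):
        c = s * step
        if prob_at_least(k, n, c) >= p:
            return c
    return None

# Hypothetical setting: 1083 users report, 54 users (about 5%) must be selectable
# with probability at least 95%.
c_min = smallest_c(k=54, n=1083, p=0.95)
print(c_min, prob_at_least(54, 1083, c_min))
```

A bisection search would find the same threshold with fewer evaluations; the linear scan is kept here because it makes the monotonicity argument easy to see.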
We describe the pseudocode of our practical solution for the private multi-location coverage problem in Algorithm 1. Note that since covering one location is a special case of covering multiple locations, Algorithm 1 can also solve the single location coverage problem, without the need to assume that we will always have enough users reporting the selected obfuscated location $l^*$.
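The linear program behind the practical solution can be sketched with SciPy's `linprog` in place of the commercial solvers named in the text. This is a sketch under stated assumptions rather than the authors' Algorithm 1: the policy is a dense matrix `P[i, k]` (probability of reporting `k` when the true location is `i`), `report` is the index of the reported location used for selection, and the toy prior, distances, and constant `c` are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def solve_policy(prior, dist, targets, report, epsilon, c):
    """Maximise sum_{t in targets} prior[t] * P[t, report] / c subject to geo-DP,
    row sums equal to 1, and sum_l prior[l] * P[l, report] == c."""
    n = len(prior)

    def idx(i, k):
        return i * n + k                      # position of P[i, k] in the flat vector

    cost = np.zeros(n * n)                    # linprog minimises, so negate the objective
    for t in targets:
        cost[idx(t, report)] = -prior[t] / c

    rows = []                                 # geo-DP: P[i,k] - e^{eps d(i,j)} P[j,k] <= 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            for k in range(n):
                row = np.zeros(n * n)
                row[idx(i, k)] = 1.0
                row[idx(j, k)] = -np.exp(epsilon * dist[i][j])
                rows.append(row)
    a_ub, b_ub = np.array(rows), np.zeros(len(rows))

    a_eq = np.zeros((n + 1, n * n))           # n row-sum equalities plus the constant-c one
    for i in range(n):
        a_eq[i, i * n:(i + 1) * n] = 1.0
        a_eq[n, idx(i, report)] = prior[i]
    b_eq = np.append(np.ones(n), c)

    res = linprog(cost, A_ub=a_ub, b_ub=b_ub, A_eq=a_eq, b_eq=b_eq,
                  bounds=(0, 1), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x.reshape(n, n), -res.fun

# Hypothetical toy setting: four locations on a line, targets at indices 1 and 2.
prior = [0.1, 0.4, 0.3, 0.2]
dist = [[abs(i - j) for j in range(4)] for i in range(4)]
epsilon = np.log(4)
policy, value = solve_policy(prior, dist, targets=[1, 2], report=1, epsilon=epsilon, c=0.2)
print(round(value, 4))
```

The number of privacy constraints grows cubically with the number of locations, so for fine grids a sparse constraint matrix would be the natural next step.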
Estimating Overall Location Distribution $\pi$.
Previously, we assumed that the overall frequent location distribution $\pi$ is known. This may be possible when we have other sources to infer $\pi$, e.g., mobile call logs [Blondel et al.2012]. However, if we do not have such data, other methods are required to estimate $\pi$ along with user selection. We thus propose a Bayes-rule-based method to do user selection and $\pi$ estimation simultaneously, as shown in Algorithm 2.
Our basic idea is to use users’ uploaded obfuscated locations to refine $\pi$. Note that our mechanism requires that each user uploads the obfuscated location only once to ensure differential privacy protection [Andrés et al.2013]. Hence, to preserve differential privacy, we split all the users into groups, get users’ obfuscated locations group by group, and iteratively refine $\pi$ with the obfuscated locations from previous user groups. The key update formula of $\pi$ is the Bayes rule in line 2. In this way, the estimated $\pi$ gradually approaches the actual $\pi$ after iterative refinements. As the estimate generally becomes more and more accurate, the final user selection is biased toward the users in the groups who upload locations later (line 22). The number of groups balances the tradeoff between running efficiency and solution quality: a larger number of groups updates $\pi$ more frequently, but costs more time, as Algorithm 1 is run once per group.
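One way to read the group-by-group refinement is as a repeated Bayes update in which the new estimate is the average posterior over the reports of the latest group. The exact update used in Algorithm 2 is not reproduced in the text, so the rule below is an assumption, as are the function names and the toy policy. The sketch also keeps the policy fixed across groups, whereas the described framework would regenerate it from the current estimate.

```python
def refine_prior(prior, policy, reports):
    """One Bayes pass: average the posteriors P(l | l*) over the reported locations."""
    n = len(prior)
    updated = [0.0] * n
    for r in reports:
        evidence = sum(prior[loc] * policy[loc][r] for loc in range(n))
        for loc in range(n):
            updated[loc] += prior[loc] * policy[loc][r] / evidence
    return [u / len(reports) for u in updated]

def estimate_prior(groups_of_reports, policy, n_locations):
    """Start from a uniform distribution and refine it once per user group."""
    prior = [1.0 / n_locations] * n_locations
    for reports in groups_of_reports:
        if reports:                      # skip groups in which nobody reported
            prior = refine_prior(prior, policy, reports)
    return prior

# Hypothetical toy data: three locations and a fairly informative policy.
policy = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
estimate = estimate_prior([[0, 0, 0, 1]], policy, n_locations=3)
print(estimate)
```

Because each user contributes a single report to a single group, the refinement never asks anyone to upload twice, which is the property the grouping is meant to preserve.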
Note that in a real implementation, users who do not have any frequent locations can still upload ‘NULL’ to the server. Then, we can estimate the percentage of users who can report locations from previous user groups. This helps us set an appropriate number $n$ of reporting users in the optimization, so as to finally find the required $k$ users with the target probability $p$.
Experiments
In this section, we conduct empirical studies on three real user mobility datasets. We use Algorithm 2 for both single- and multi-location coverage scenarios given its practicality (no need to know the overall frequent-location distribution $\pi$ in advance).
Baselines

Laplace. The state-of-the-art method to achieve geographic differential privacy is based on the Laplace distribution [Andrés et al.2013].

NO. We use the No-Obfuscation (NO) policy, i.e., the users upload one of their real frequent locations to the server, to show an upper bound of the coverage.

Random. We use the random user selection to serve as the lower bound of the coverage that can be achieved.
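For the Laplace baseline, a common way to draw planar Laplace noise is to sample a uniform angle and a radius from a Gamma distribution with shape 2 and scale 1/ε, then snap the perturbed point to a grid cell. The sketch below follows that recipe; the grid snapping, the cell size, and the function names are illustrative assumptions, and the exact discretisation used in the experiments is not specified in the text.

```python
import math
import random

def planar_laplace_noise(epsilon, rng):
    """Draw a 2D offset whose density is proportional to exp(-epsilon * distance)."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    # The radial density is epsilon^2 * r * exp(-epsilon * r): Gamma(shape=2, scale=1/epsilon).
    radius = rng.gammavariate(2.0, 1.0 / epsilon)
    return radius * math.cos(theta), radius * math.sin(theta)

def obfuscate_to_grid(x_km, y_km, epsilon, cell_km, rng):
    """Perturb a point and return the (column, row) index of the grid cell it falls in."""
    dx, dy = planar_laplace_noise(epsilon, rng)
    return math.floor((x_km + dx) / cell_km), math.floor((y_km + dy) / cell_km)

# Hypothetical usage: a user whose frequent location is at (2.5 km, 3.5 km),
# with epsilon expressed per kilometre and 1 km x 1 km cells.
rng = random.Random(7)
print(obfuscate_to_grid(2.5, 3.5, epsilon=math.log(4), cell_km=1.0, rng=rng))
```

Snapping to a cell is a post-processing step on the noisy point, so it does not weaken the protection provided by the continuous noise.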
Datasets

FS dataset [Yang et al.2016] contains 1083 Foursquare users’ check-ins in New York, USA over nearly one year. We set the time period to a weekly granularity, that is, the selected users are expected to visit the target locations in the next week. The studied area (Figure 9(a)) is split into 1 km × 1 km grids. Among the 45 weeks of user mobility data, we use the last five weeks as the test period and the first 40 weeks for mobility profiling.

CMCC dataset contains 1315 users’ GPS trajectories in Hangzhou, China, collected over one month by one mobile operator. The time period is set to a daily granularity. The studied area (Figure 9(b)) is split into 1 km × 1 km grids. We use the first 18 weekdays for mobility profiling and the remaining four weekdays for testing.

D4D dataset [Blondel et al.2012] includes 5378 users’ two-week mobile phone call logs with cell tower locations in Abidjan, Côte d’Ivoire. The time period is set to a daily granularity. The studied area (Figure 9(c)) is split into cell-tower-based regions [Xiong et al.2016, Wang et al.2017]. We use the first nine weekdays for mobility profiling and the last weekday for testing.
Table 1 summarizes the experimental parameters. Note that the default differential privacy budget is set to as suggested by the original paper [Andrés et al.2013].
| Notation | Values | Description |
|  | ln(2), ln(4), ln(6), ln(8) | differential privacy level |
|  | 0.5, 0.6, 0.7, 0.8 | threshold for frequent locations |
|  | 1083 (FS), 1315 (CMCC), 5378 (D4D) | total number of users |
|  | 5% | number of selected users |
|  | 95% | probability for user selection |
|  | 6 | number of user groups |
Results on FS
Single Location Coverage.
We first evaluate the scenario where only one location (grid) needs to be covered. Our evaluation metric is the probability that a selected user will actually appear at the target location in the next week.
Figure 5 shows the results on two target locations with different population sizes, with the privacy budget and the frequent-location threshold held fixed. For both target locations, our proposed method achieves a larger coverage probability (up to 5% improvement) than the Laplace mechanism. Compared to the no-obfuscation method, the coverage probability of our method drops from 32.9% to 21.7% for the densely populated target location. For the less densely populated one, the drop is bigger (from 30.5% to 14.5%). A possible explanation is that when the target location is densely populated, even if our mechanism mis-selects a user whose frequent location is not the target one, s/he may still go to the target location by chance.
Figure 5(a) illustrates how the coverage probability changes when we vary the privacy budget $\epsilon$ for the densely populated target location. As a tradeoff between privacy and coverage, when $\epsilon$ increases (i.e., lower level of privacy), we can get a higher coverage probability. More specifically, the improvement of our method over Laplace is more significant for a lower $\epsilon$, i.e., a higher privacy protection guarantee.
Figure 5(b) shows the change of coverage probability when the threshold of frequent locations varies. The coverage probabilities of all the methods rise as the threshold increases. While a higher threshold benefits coverage probability, the number of users who can upload their (obfuscated) frequent locations (i.e., candidates for selection) is smaller, because only users with at least one location profiling probability larger than the threshold will upload frequent locations. Based on the experiment results, setting the threshold to around 0.7 to 0.8 is appropriate for our method, as the coverage probability is relatively satisfactory while a large portion of users can be involved.
MultiLocation Coverage.
We evaluate the scenario where multiple target locations exist. We randomly select 2, 4, 6 and 8 locations as the targets. Figure 8 shows the actual coverage probability that we can get, i.e., the probability of selected users covering any one of the target locations in the coming week. The results show that our proposed method consistently outperforms Laplace under the same level of privacy protection. Moreover, as the number of target locations increases, the performance gap between our method and no-obfuscation becomes smaller. This indicates that, when there are more locations to cover, using our mechanism is more profitable, as the performance loss incurred by the geographic differential privacy protection becomes smaller.
Estimation of $\pi$.
We evaluate whether our proposed Bayes-rule-based method can estimate the overall frequent-location distribution $\pi$ correctly. We use KL divergence [Kullback and Leibler1951] to quantify the similarity between the estimated $\pi$ and the actual $\pi$. The smaller the KL divergence, the more similar they are. Figure 8 shows the change of KL divergence with the number of user groups, where $\pi$ is initialized to a uniform distribution. In Figure 8, represents the KL divergence between the uniform and the actual distribution. When the number of groups is small, we have fewer iterations to update $\pi$, leading to a larger KL divergence. In our experiment, six groups is a good setting, as the KL divergence reaches a relatively low value while the algorithm completes within a reasonable time.
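The similarity measure can be computed in a few lines. The sketch below uses natural logarithms and treats the first argument as the reference distribution; which of the two distributions plays that role in the reported experiments is not stated, so the argument order is an assumption, as are the example numbers.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats. Terms with p[i] == 0 contribute 0.
    Raises ZeroDivisionError if q[i] == 0 where p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example: divergence of a uniform guess from a skewed distribution.
actual = [0.5, 0.3, 0.2]
uniform = [1.0 / 3] * 3
print(kl_divergence(actual, uniform))
```

An estimate that places zero mass on a location with positive actual mass makes this quantity undefined, which is one practical reason to start from a uniform distribution.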
Runtime Efficiency.
We use Gurobi 7.5 [Gurobi2014] as the linear programming solver to run Algorithm 1 and obtain the optimal policy $P$. One run takes about 450 seconds on a commodity laptop with an i5-5200U CPU (2.2 GHz) and 8 GB of memory. We split all the users into six groups, meaning that Algorithm 1 is executed six times, which sums up to about 45 minutes. As the optimal privacy policy generation can be an offline process, such a runtime is acceptable for real applications. Note that this running time is not affected by the number of users, so our method can serve mobile applications with a large number of users.
Results on CMCC and D4D
To test the robustness of our proposed method, we also conduct experiments on the CMCC and D4D datasets. The results are shown in Figure 8(a) and 8(b), where we randomly select 1, 2, 4, and 8 locations to cover. The results verify that our proposed method consistently outperforms the Laplace mechanism in attaining a higher coverage probability of the selected users. Moreover, the results show that when the number of target locations to cover increases to 8, our privacy mechanism achieves almost the same coverage probability as no-obfuscation, especially for the D4D dataset. This further emphasizes the practicability of our mechanism, as user privacy is gained with a nearly negligible quality loss. Note that the achieved coverage probability on D4D is generally smaller than on FS or CMCC, because the phone call locations in D4D are intrinsically more difficult to predict. Please refer to the appendix for detailed mobility prediction results.
Related Work
Selecting a set of users who can cover a set of locations in the near future is a very important problem for real applications like spatial crowdsourcing [Chen and Shahabi2016, Zhang et al.2014] and location-based advertising [Dhar and Varshney2011]. In most previous research, users’ moving histories are known, and hence their mobility patterns can be effectively modeled for predicting their future locations [Xiong et al.2016, Guo et al.2017, Yang et al.2015].
As user privacy is becoming more and more important, some pioneering works have started to model users’ mobility or activity patterns based on privacy-preserving data. Geo-indistinguishability mechanisms are proposed for location-based query systems where users can submit their differentially obfuscated locations [Andrés et al.2013, Bordenabe, Chatzikokolakis, and Palamidessi2014]. PrivCheck is designed to enable personalized location-based advertising or recommendation with obfuscated user check-ins, so that users’ sensitive information (e.g., age and gender) cannot be inferred by adversaries [Yang et al.2016]. In spatial crowdsourcing data acquisition, recent works also incorporate privacy mechanisms to protect participants’ precise locations [Wang et al.2016, Wang et al.2017, To, Ghinita, and Shahabi2014, VergaraLaurens, Mendez, and Labrador2014, Pournajaf et al.2014]. While these studies have various applications, they usually focus on obfuscating users’ current locations. As far as we know, little previous work has studied the privacy-preserving future crowd coverage maximization problem based on users’ obfuscated historical mobility profiles, which is the specific focus of this paper.
Conclusion
In this paper, we study the crowd coverage maximization problem under privacy protection of user locations. The key idea is to select users who will probably visit certain locations in the near future based on their differentially obfuscated locations. To maximize the quality (coverage probability) of selected users under such a privacy protection scheme, an optimization problem is formulated to obtain the optimal privacy policy. We mathematically analyze the problem, and then propose a practical algorithm to obtain the optimal privacy policy. Experiments on various real user mobility datasets have verified the effectiveness of our privacy mechanism. As future work, we plan to study the problem when a user can upload multiple obfuscated frequent locations.
Acknowledgment
This research is partially supported by NSFC Grant no. 71601106, State Language Commission of China Key Program Grant no. ZDI13518, Hong Kong ITF Grant no. ITS/391/15FX, the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement 683253/GraphInt).
References
[Andrés et al.2013] Andrés, M. E.; Bordenabe, N. E.; Chatzikokolakis, K.; and Palamidessi, C. 2013. Geo-indistinguishability: Differential privacy for location-based systems. In Proc. CCS, 901–914.
[Blondel et al.2012] Blondel, V. D.; Esch, M.; Chan, C.; Clérot, F.; Deville, P.; Huens, E.; Morlot, F.; Smoreda, Z.; and Ziemlicki, C. 2012. Data for development: the D4D challenge on mobile phone data. arXiv preprint arXiv:1210.0137.
[Bordenabe, Chatzikokolakis, and Palamidessi2014] Bordenabe, N. E.; Chatzikokolakis, K.; and Palamidessi, C. 2014. Optimal geo-indistinguishable mechanisms for location privacy. In Proc. CCS, 251–262.
[Boyd and Vandenberghe2004] Boyd, S., and Vandenberghe, L. 2004. Convex optimization. Cambridge University Press.
[Chen and Shahabi2016] Chen, L., and Shahabi, C. 2016. Spatial crowdsourcing: Challenges and opportunities. IEEE Data Eng. Bull. 39(4):14–25.
[Cho, Myers, and Leskovec2011] Cho, E.; Myers, S. A.; and Leskovec, J. 2011. Friendship and mobility: user movement in location-based social networks. In Proc. KDD, 1082–1090.
[Dhar and Varshney2011] Dhar, S., and Varshney, U. 2011. Challenges and business models for mobile location-based services and advertising. Communications of the ACM 54(5):121–128.
[Dwork2008] Dwork, C. 2008. Differential privacy: A survey of results. In International Conference on Theory and Applications of Models of Computation, 1–19.
[Fawcett2006] Fawcett, T. 2006. An introduction to ROC analysis. Pattern Recognition Letters 27(8):861–874.
[Guo et al.2017] Guo, B.; Liu, Y.; Wu, W.; Yu, Z.; and Han, Q. 2017. ActiveCrowd: A framework for optimized multi-task allocation in mobile crowdsensing systems. IEEE Transactions on Human-Machine Systems 47(3):392–403.
[Gurobi2014] Gurobi Optimization, Inc. 2014. Gurobi optimizer reference manual. URL: http://www.gurobi.com.
[Kullback and Leibler1951] Kullback, S., and Leibler, R. A. 1951. On information and sufficiency. The Annals of Mathematical Statistics 22(1):79–86.
[Pournajaf et al.2014] Pournajaf, L.; Xiong, L.; Sunderam, V.; and Goryczka, S. 2014. Spatial task assignment for crowd sensing with cloaked locations. In Proc. MDM, volume 1, 73–82.
[Rossi et al.2015] Rossi, L.; Williams, M. J.; Stich, C.; and Musolesi, M. 2015. Privacy and the city: User identification and location semantics in location-based social networks. In Proc. ICWSM, 387–396.
[To, Ghinita, and Shahabi2014] To, H.; Ghinita, G.; and Shahabi, C. 2014. A framework for protecting worker location privacy in spatial crowdsourcing. Proc. of the VLDB Endowment 7(10):919–930.
[VergaraLaurens, Mendez, and Labrador2014] Vergara-Laurens, I. J.; Mendez, D.; and Labrador, M. A. 2014. Privacy, quality of information, and energy consumption in participatory sensing systems. In Proc. PerCom, 199–207.
[Wang et al.2016] Wang, L.; Zhang, D.; Yang, D.; Lim, B. Y.; and Ma, X. 2016. Differential location privacy for sparse mobile crowdsensing. In Proc. ICDM, 1257–1262.
[Wang et al.2017] Wang, L.; Yang, D.; Han, X.; Wang, T.; Zhang, D.; and Ma, X. 2017. Location privacy-preserving task allocation for mobile crowdsensing with differential geo-obfuscation. In Proc. WWW, 627–636.
[Xiong et al.2016] Xiong, H.; Zhang, D.; Chen, G.; Wang, L.; Gauthier, V.; and Barnes, L. E. 2016. iCrowd: Near-optimal task allocation for piggyback crowdsensing. IEEE Transactions on Mobile Computing 15(8):2010–2022.
[Yang et al.2015] Yang, D.; Zhang, D.; Zheng, V. W.; and Yu, Z. 2015. Modeling user activity preference by leveraging user spatial temporal characteristics in LBSNs. IEEE Transactions on Systems, Man, and Cybernetics: Systems 45(1):129–142.
[Yang et al.2016] Yang, D.; Zhang, D.; Qu, B.; and Cudré-Mauroux, P. 2016. PrivCheck: privacy-preserving check-in data publishing for personalized location based services. In Proc. UbiComp, 545–556.
[Zhang et al.2014] Zhang, D.; Wang, L.; Xiong, H.; and Guo, B. 2014. 4W1H in mobile crowd sensing. IEEE Communications Magazine 52(8):42–48.
[Zheng et al.2014] Zheng, Y.; Capra, L.; Wolfson, O.; and Yang, H. 2014. Urban computing: concepts, methodologies, and applications. ACM Transactions on Intelligent Systems and Technology 5(3):38.
Appendix
Detailed Proof of Lemma 1
Suppose we have two different , i.e., , and get two different optimal objective values
We now construct a new solution of when as follows:
We can verify that all the constraints of the optimization still stand, and then is a feasible solution when , and then
This violates that is the optimal solution when . ∎
Detailed Proof of Theorem 1
With the following way to construct ,
(29)  
(30) 
Then, for any ,
So the probability sum constraint stands. We then prove that differential privacy constraint also stands. Note that Lemma 3 has proved that the differential privacy constraint holds if . Therefore, we only need to show that the differential privacy constraint also stands for . Next, we show how to select to ensure that this is true for any ,
(31)  
(32) 
It is worth noting that both the numerator and denominator in the right side of Eq. 32 are larger than zero when . Hence, we can set to any positive value smaller than or equal to
(33) 
and then for any , geographic differential privacy still holds. Then, based on Lemma 2, we can know that the satisfying Eq. 29 and 30 can lead to the upper bound of the objective value. ∎
Detailed Proof of Lemma 5
According to the geographic differential privacy constraints, we have
(34)  
(35)  
(36)  
(37) 
Then,
(38)  
(39) 
For the ease of presentation, we denote ,
(40)  
(41)  
(42)  
(43) 
∎
Mobility Profiling
We consider two popular mobility profiling methods used in literature, and choose the better one in our experiments.
(1) Frequency [Guo et al.2017]. This method counts daily (or weekly) frequency that a user visits a location in her/his historical mobility records. For example, suppose we have a user’s 7day mobility history and s/he visits a location in 5 days, then the daily visiting probability is 5/7.
(2) Poisson [Xiong et al.2016]. Given a user $u$’s average daily (or weekly) number of visits to location $l$ in the past, denoted as $\lambda_{u,l}$, the Poisson process estimates the probability that $u$ visits $l$ at least once in one day (week) as:
$$1 - e^{-\lambda_{u,l}} \tag{44}$$
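Both profiling methods reduce to one-line formulas, sketched below with hypothetical function names. The frequency example reuses the 5-out-of-7-days case from the text; the Poisson example assumes an average of five visits spread over seven days.

```python
import math

def frequency_profile(periods_visited, periods_observed):
    """Share of observed periods (days or weeks) in which the location was visited."""
    return periods_visited / periods_observed

def poisson_profile(mean_visits_per_period):
    """Probability of at least one visit in a period under a Poisson model."""
    return 1.0 - math.exp(-mean_visits_per_period)

# Frequency-based estimate: visited on 5 of 7 observed days.
print(frequency_profile(5, 7))
# Poisson-based estimate: assumed average of 5/7 visits per day.
print(poisson_profile(5 / 7))
```

The two estimates differ when visits cluster: several visits on the same day raise the Poisson rate but count only once in the frequency method.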
Figure 10 plots the receiver operating characteristic (ROC) curves [Fawcett2006] and shows the area under the ROC curve (AUC) values for both profiling methods on the FS, CMCC, and D4D datasets, respectively. A larger AUC value implies better performance in predicting a user’s future mobility patterns. From the results, we see that the Poisson-based mobility profiling method beats the frequency-based method, and thus we use the Poisson-based method in all the experiments.
In addition, we observe that the mobility prediction on the D4D dataset is more difficult than on the other two datasets, as it gets a lower AUC value. As expected, our experiments in the paper (Figure 9) show that the selected users on the D4D dataset achieve a lower coverage probability than the other two datasets with the same user selection mechanism.