Mobile location-based services (LBSs) exploit the location information of mobile users to provide context-aware and personalized services. Mobile crowdsourcing, enabled by sensor-rich and widely used smartphones, has further spurred a wealth of mobile LBSs such as transportation services, lifestyle enhancement systems, and localization applications. Such mobile LBSs rely heavily on location-embedded information from diverse sources such as built-in sensors on mobile devices, human inputs, and social media. This presents significant privacy threats to mobile users. When they query or contribute data to mobile LBS systems, their privacy-sensitive information might be disclosed to undesired parties by untrustworthy or poorly designed back-end data management systems.
Specifically, the privacy-sensitive information in this paper includes “activity hotspots” and “activity transitions”. The former refers to places where a user stays for a long time (e.g., home and office) or that she frequently visits (e.g., a cafe), and the latter refers to the sequence of those activity hotspots visited by a user. Fig. 1 illustrates a location-based WiFi advisory system, where a user may query for nearby WiFi access points with good signal quality, or contribute her quality of experience (QoE) of WiFi usage to the system. In either case, the user may unintentionally disclose her home and office locations as well as the travel path between them. This problem has become thornier with the recent rapid advancement of machine learning and data mining techniques, where activity hotspots and transitions can be revealed through statistics and probabilistic models, and the “next place” of a user might be inferred. Therefore, privacy preservation for mobile location-based services and crowdsourcing applications is critical for such systems to be widely and practically adopted.
Given that privacy is largely a subjective and qualitative notion, this paper proposes to quantify this notion for a set of location-embedded data submitted by a mobile user, using a metric called privacy exposure. This metric incorporates activity coverage and activity uniformity, where the former refers to the range and the latter refers to the distribution, of a user’s activity hotspots. For an illustrative example, see Fig. 2, where the user’s activity coverage in Fig. 2(a) is much smaller than in Fig. 2(b), while her activity uniformity in Fig. 2(c) is also much lower than in Fig. 2(b). Both cases (a) and (c) are prone to disclosing activity hotspots of a user.
Secondly, we address the problem of how to minimize privacy exposure by proposing a privacy-preserving sensing algorithm that judiciously submits location-embedded data for each mobile user. Our algorithm (1) exploits ambient network signatures, which are composed of the WiFi BSSIDs or cellular base station IDs in the surroundings, in order to avoid exposing activity hotspots, and (2) cloaks the activity transitions of a user using the k-anonymity technique. We also devise a learning algorithm that learns the hotspots so as to differentiate between private and non-private locations.
We use extensive simulations to investigate how different activity patterns affect levels of privacy exposure and how our k-anonymity-based technique can reduce privacy exposure. We also implement our privacy-preserving sensing algorithm in an Android-based mobile system called WiFi-Scout to evaluate the effectiveness of our proposed metric and algorithm.
II Related Work
Preserving location privacy is a critical issue for mobile LBSs. Prior work has studied how location check-in data in social networks discloses the identity of mobile LBS users. Linear regression models have been exploited to transform data at the client side before it is submitted so that user privacy can be preserved. A hybrid approach has also been proposed to preserve location privacy, where encryption, data anonymization, and data obfuscation techniques are adaptively chosen based on the areas of interest. Another work provides users with a privacy guarantee at the lowest utility loss.
Several k-anonymity approaches have been proposed to achieve privacy preservation in location-based services. One proposes a k-anonymity framework with a trusted anonymity server that performs location perturbation on location-embedded queries received from mobile devices before forwarding these queries to the mobile LBS provider. Another considers an entropy-based privacy metric in the temporal domain so that the query frequency of the real queried location is similar to that of other dummy locations. A third deals with k-anonymous location information in both the spatial and temporal domains when making queries to LBSs. It proposes an algorithm that chooses a sufficiently large area and, if necessary, delays the query in order to make k-anonymous queries. An entropy-based metric was also used for participatory sensing applications. It preserves privacy by distributing users’ mobility trajectories among multiple databases and balancing the frequencies of locations in each individual database. However, a metric for quantifying privacy exposure, particularly in the spatial domain, is still lacking.
Compared with existing research efforts, the key contributions of this paper are summarized as follows. First, our work is the first to formally quantify privacy exposure in the spatial domain for a given set of activity points of an arbitrary user. The metric is generally defined such that it can be applied to various mobile LBSs. Second, we formally define the privacy exposure problem, and propose an algorithm as a solution. Finally, we evaluate our proposed metric and algorithm via both simulations and real-world experiments, which confirm their validity and effectiveness.
III Problem Definition
III-A Countermeasures to Privacy Threats
Multi-modal built-in sensors on smartphones have spurred many mobile LBSs that rely heavily on location-embedded information of users. To explain the privacy threats and how we can protect user privacy in mobile LBSs, we consider a WiFi advisory system, WiFi-Scout, as an example. WiFi-Scout relies on crowdsourced WiFi-quality data to help smartphone users find nearby WiFi access points (APs) with good quality. It provides two operating modes, query mode and crowdsensing mode, for mobile users. The query mode allows a user to search for WiFi APs with a high quality ranking in the proximity of the user’s current location, so that the user can plan the next move or choose a good meeting point with friends. The crowdsensing mode allows a user who is already connected to a WiFi AP to rate the AP, according to the actual experience of using it, through the user’s smartphone. Both modes require users’ location information: the query mode needs to know the location of the user in order to search for nearby APs, and the crowdsensing mode needs to obtain the user’s location together with the associated WiFi AP quality information.
However, if the back-end system is not trustworthy or properly designed, the sensitive information of users may be disclosed to adversaries. Specifically, we consider two types of privacy threats in this work: activity hotspot disclosure and activity transition disclosure. A user’s activity hotspots are the locations or places she stays for a long time or frequently visits, such as home, office, and her favorite cafe. A user’s activity transition is the moving sequence that connects all the user’s activity hotspots.
Conventionally, the countermeasures to privacy threats can be classified into three categories: (1) anonymization, (2) obfuscation, and (3) encryption. Anonymization is to “disidentify” a user by, for example, reporting additional locations (typically of other users) together with the user’s location. Obfuscation is to report a “modified” version instead of the real location of a user. Encryption exploits cryptographic methods to conduct security key encryption and decryption between front-end clients and back-end servers. The first and second countermeasures are performed at front-end clients whereas the third relies on collaboration between front-end clients and back-end servers. In this paper, we combine anonymization and obfuscation approaches so that front-end sensing clients have full control over the reported data in order to protect user privacy.
III-B The Privacy Exposure Problem
This work sets out to answer the following two questions: (1) how to quantify privacy exposure for a given set of data submitted by a certain user and (2) how to effectively reduce privacy exposure. Note that the “data” here is location-tagged. Assume that each user has a unique user account in the system (this could be for rewarding purposes in order to encourage user participation and improve trustworthiness). Consider the set of reports in the WiFi advisory system. Each report consists of the ID of the WiFi access point that the smartphone is associated with, the location of the smartphone, the positioning accuracy (e.g., obtained from the Google localization API), the signal strength received by the smartphone, the link quality (i.e., link speed) observed by the smartphone, and the set of ambient visible WiFi access points observed by the smartphone. Note that each report is submitted to the system when the smartphone is in either the query mode or the crowdsensing mode.
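As a minimal sketch of the report structure described above, the six fields can be grouped into one immutable record. All field names, units, and the sample values are illustrative assumptions and do not come from the WiFi-Scout implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Report:
    """One location-tagged report; field names and units are assumptions."""
    ap_id: str                     # ID of the WiFi AP the smartphone is associated with
    location: Tuple[float, float]  # (latitude, longitude) of the smartphone
    accuracy_m: float              # positioning accuracy, assumed to be in meters
    rssi_dbm: int                  # received signal strength, assumed to be in dBm
    link_speed_mbps: float         # link quality (link speed), assumed to be in Mbps
    ambient_aps: FrozenSet[str]    # ambient visible WiFi access points


# Hypothetical sample report, for illustration only.
report = Report(
    ap_id="aa:bb:cc:dd:ee:01",
    location=(1.2966, 103.7764),
    accuracy_m=20.0,
    rssi_dbm=-61,
    link_speed_mbps=54.0,
    ambient_aps=frozenset({"aa:bb:cc:dd:ee:01", "aa:bb:cc:dd:ee:02"}),
)
```

Freezing the record keeps a sensed report from being modified by accident; an obfuscation step would then produce a new record rather than mutate the original.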
Given a set of reports submitted by a particular user, we define privacy exposure based on the following two concepts: (1) activity coverage (or “coverage” for short) and (2) activity uniformity (or “uniformity” for short). The former refers to the location range of all the user’s activities, while the latter refers to how uniformly the user moves within that range.
Given a set of reports submitted by a particular user, the activity coverage is defined as the ratio of the diameter of the smallest circle covering all the reports to a normalizing constant that is no smaller than this diameter, so that the coverage never exceeds 1.
Finding the value of for a given set of reports is a classic smallest enclosing disc problem , to which there are many approximation algorithms as solutions. However, the computational complexity of these approximation algorithms does not suit smartphones. Therefore, for a practical implementation, we approximate this diameter as , where is the distance between the center of gravity of all the reports’ locations and the location . Fig. 3 gives an example of 1000 activity points for 3 different users. Fig. 3(a) shows that the activity pattern of user 1 has lower coverage compared to the user in Fig. 3(b).
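The centroid-based approximation can be sketched as follows. This is only an illustration under stated assumptions: locations are planar coordinates in meters, the diameter is approximated as twice the largest centroid-to-report distance, and coverage is that diameter divided by a normalizing constant. The names `approx_diameter`, `coverage`, and `d_max` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # planar (x, y) coordinates, assumed to be in meters


def approx_diameter(points: Sequence[Point]) -> float:
    """Twice the largest distance from the center of gravity to any point."""
    if not points:
        raise ValueError("at least one report location is required")
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return 2.0 * max(math.hypot(x - cx, y - cy) for x, y in points)


def coverage(points: Sequence[Point], d_max: float) -> float:
    """Approximate diameter normalized by d_max and clipped to at most 1."""
    return min(1.0, approx_diameter(points) / d_max)
```

A circle centered at the center of gravity with this radius always contains every report, so the approximation never underestimates the true smallest enclosing diameter. That is why the sketch clips the ratio at 1.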
Given a set of reports submitted by a particular user, the activity uniformity is defined as
The uniformity is the equilibrium level of all the distances between each pair of reports and is defined using Jain’s fairness index ; its value .
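A small sketch of this idea, assuming planar coordinates: Jain's fairness index of values x_1, ..., x_m is (sum of x_i)^2 / (m * sum of x_i^2), applied here to the distances between every pair of reports. The function names are hypothetical, and returning 0 when all reports coincide is a convention chosen for this sketch.

```python
import math
from itertools import combinations
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]


def jain_index(values: Iterable[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (m * sum x^2)."""
    xs = list(values)
    squares = sum(x * x for x in xs)
    if squares == 0.0:
        return 0.0  # all values are zero; convention chosen for this sketch
    total = sum(xs)
    return total * total / (len(xs) * squares)


def uniformity(points: Sequence[Point]) -> float:
    """Jain's index over the distances between each pair of report locations."""
    if len(points) < 2:
        raise ValueError("at least two report locations are required")
    return jain_index(math.dist(p, q) for p, q in combinations(points, 2))
```

The index equals 1 when all pairwise distances are equal, for example at the corners of an equilateral triangle, and decreases as a few distances dominate the rest.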
Now we can define privacy exposure based on the coverage and uniformity defined above.
Given a set of reports submitted by a particular user, the privacy exposure of the user is defined as
Therefore, a lower coverage or a lower uniformity results in a higher privacy exposure, while a lower exposure is preferred. Also, . Fig. 3(b) shows that the activity pattern of user 2 has higher uniformity compared with the user in Fig. 3(c).
Thus, we have answered the first question, i.e., how to quantify the privacy exposure for a given set of reports submitted by a particular user. To answer the second question, i.e., how to reduce privacy exposure when the user needs to make location-based queries or submit location-tagged reports, we define the privacy exposure problem, taking a divide-and-conquer approach.
Given a crowdsourced dataset from all the users and the set of reports submitted by a particular user, the privacy exposure problem is to determine, for a new report sensed by this user: (1) whether it is necessary to submit the new report and (2) how to cloak this report to minimize privacy exposure if the submission is necessary.
IV A Privacy-preserving Sensing Algorithm
In this section, we design a dual-mode sensing algorithm to solve the above-defined privacy exposure problem. We divide user locations (activity hotspots) into private places and non-private places. The private places refer to a user’s “long-term stay” places, such as home and office. The non-private places refer to “short-term stay” points of interest (PoIs) that are frequently visited by the user, such as her favorite cafes. Neither kind of place should be submitted directly without control, since doing so may lead to lower coverage and lower uniformity, and hence higher privacy exposure.
Our dual-mode sensing algorithm is illustrated in Fig. 4. Each smartphone operates in one of two modes, place-aware mode and k-anonymity mode, when holding a location-tagged report subject to possible submission. If the user is in a private place, the smartphone operates in the place-aware mode, which skips all the location-tagged data. Otherwise, the smartphone operates in the k-anonymity mode, which chooses additional reports to submit together with the original report.
IV-A Place Learning and Detection in Place-aware Mode
We exploit network fingerprint-based place learning and detection to determine whether a smartphone should submit its sensed location-embedded data. Each smartphone maintains a private place profile containing the WiFi fingerprints of the user’s private places; initially, this profile is empty. When the user’s smartphone senses location-embedded data in an unknown place, it starts to learn the WiFi fingerprint of this place, which initially includes all the WiFi BSSIDs visible at that time. At each later sensing time, the fingerprint is incrementally updated with the currently visible BSSIDs if the similarity between the sensed location and the current fingerprint passes a configurable threshold. Otherwise, the place is given up as a candidate private place, since the user is detected to have left it. If the similarity keeps passing the threshold for a configurable learning duration, the learned fingerprint is added to the private place profile, and the learning process continues. Furthermore, for each location sensed by the smartphone, the algorithm also conducts place detection by calculating the similarity against each fingerprint in the private place profile. If some fingerprint passes the similarity threshold, the user is in a private place and the sensed data is discarded.
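The learning and detection loop can be sketched as below. The similarity measure is not specified here, so this sketch assumes the Jaccard index between BSSID sets. The class name `PlaceLearner`, its parameters, and the choice to stop learning once a place is promoted to the profile are all assumptions made for illustration.

```python
from typing import AbstractSet, FrozenSet, List, Optional, Set


def jaccard(a: AbstractSet[str], b: AbstractSet[str]) -> float:
    """Jaccard similarity of two BSSID sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class PlaceLearner:
    """Learns private-place fingerprints and filters reports sensed in them."""

    def __init__(self, learn_seconds: float, threshold: float) -> None:
        self.learn_seconds = learn_seconds            # learning duration
        self.threshold = threshold                    # similarity threshold
        self.private_places: List[FrozenSet[str]] = []
        self._candidate: Optional[Set[str]] = None    # fingerprint being learned
        self._since = 0.0                             # when learning started

    def may_submit(self, now: float, scan: AbstractSet[str]) -> bool:
        """Return False if the scan was sensed in a (newly) known private place."""
        # Place detection: compare the scan against every learned fingerprint.
        if any(jaccard(scan, p) >= self.threshold for p in self.private_places):
            return False
        # Place learning: extend the candidate while the user stays put.
        if self._candidate is not None and jaccard(scan, self._candidate) >= self.threshold:
            self._candidate |= scan
            if now - self._since >= self.learn_seconds:
                self.private_places.append(frozenset(self._candidate))
                self._candidate = None
                return False
        else:
            # The user has left (or no candidate exists): start a new candidate.
            self._candidate, self._since = set(scan), now
        return True
```

For example, with `PlaceLearner(learn_seconds=3600, threshold=0.5)`, scans of the same access points over one hour promote that place to the private profile, after which scans taken there are filtered out.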
The aforementioned scheme of place learning and detection applies to submitting crowdsensed data, as indicated in the right-hand portion of Fig. 4. When submitting a query, on the other hand, the smartphone does not enter the place-aware mode and the query is always submitted. However, the k-anonymity mode is activated to protect privacy, as explained next.
IV-B Data Anonymization in k-anonymity Mode
In the k-anonymity mode, we exploit the k-anonymity technique to choose additional reports to submit together with the real report in order to reduce privacy exposure. Our algorithm consists of two phases: anonymization phase and obfuscation phase. The anonymization phase determines the extra reports, and the obfuscation phase modifies the original report to avoid giving away the user’s exact location.
Anonymization phase: The k-anonymity set is the set of reports that will be submitted by the smartphone. When the smartphone senses a new report, the k-anonymity set is initialized to contain only this report. Our algorithm then updates the set iteratively until it contains k reports, as follows.
Each smartphone maintains, or retrieves from the server, two subsets of reports: the set of reports already submitted by the current user’s smartphone and the set of reports submitted by other users in the system. Note that since the overall dataset is crowdsourced from all users of the mobile LBS, each user’s smartphone can only distinguish its own reports from those of other users; it cannot know the linkage between other users and their reports.
If , the smartphone will select a report
where is defined in (1).
Otherwise, select a report whose location appears in for the minimum number of times.
Repeat (b)–(d) until .
Obfuscation phase: Given the real report, find (e.g., via the Google Places API) the point of interest nearest to its location, and replace the location in the real report with the location of that point of interest.
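The obfuscation step can be illustrated with a short sketch that snaps a location to the nearest point of interest. It assumes the candidate PoIs are already available as a local list of (latitude, longitude) pairs and uses the haversine great-circle distance. It does not call any real Places service, and the function names are hypothetical.

```python
import math
from typing import Iterable, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in meters on a sphere of radius 6371 km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(h))


def obfuscate(location: LatLon, pois: Iterable[LatLon]) -> LatLon:
    """Return the PoI nearest to the real location, to be reported instead."""
    return min(pois, key=lambda poi: haversine_m(location, poi))
```

For instance, `obfuscate((1.301, 103.801), [(1.30, 103.80), (1.35, 103.90)])` returns `(1.30, 103.80)`, so the report carries the PoI's coordinates rather than the user's exact position.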
V-A Simulation Setup
We conduct extensive simulations to examine our proposed metric of privacy exposure defined in Section III and to evaluate our algorithm outlined in Section IV. First, we simulate different activity patterns to see how activity patterns affect the values of coverage, uniformity, and privacy exposure. Then, we compare our k-anonymity mode against a random k-anonymity (“random” for short) and a naive (“naive” for short) sensing algorithm to investigate the performance. The random k-anonymity sensing algorithm chooses the redundant reports randomly, and the naive sensing algorithm submits reports directly without any cloaking mechanism.
In our simulations, the experimental field is a circle with diameter meters. The activity patterns of users are generated by the uniform and beta distributions in Fig. 5, where denotes the beta distribution with shape parameters of and . We consider the polar coordinate system to generate activity points within the experimental field, where each activity point is represented as a pair of . Here, is the distance from the activity point to the center of the experimental field, is a random number generated by the probability distributions in Fig. 5, and is the polar angle of the activity point. Thus, a smaller value of and a larger value of will generate activity points close to the center of the experimental field and span a narrow angle in the experimental field. In this way, we can generate non-uniform and dense cases of activity points to study the proposed metrics. All the simulation results are averaged over 100 runs.
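One way to read the polar-coordinate generation above is sketched below. It assumes that both the radial distance and the polar angle are obtained by scaling independent random draws, by the field radius and by 2π respectively. The function name, the parameter names, and the shape parameters in the usage note are hypothetical and are not the values used in the simulations.

```python
import math
import random
from typing import List, Optional, Tuple


def generate_points(n: int, radius: float,
                    alpha: Optional[float] = None, beta: Optional[float] = None,
                    seed: Optional[int] = None) -> List[Tuple[float, float]]:
    """Generate n activity points inside a circular field centered at the origin.

    Draws come from Beta(alpha, beta), or from Uniform(0, 1) when either
    shape parameter is omitted.
    """
    rng = random.Random(seed)

    def draw() -> float:
        if alpha is None or beta is None:
            return rng.random()
        return rng.betavariate(alpha, beta)

    points = []
    for _ in range(n):
        r = draw() * radius             # distance to the center of the field
        theta = draw() * 2.0 * math.pi  # polar angle
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points
```

Under these assumptions, `generate_points(1000, 500.0, alpha=1.0, beta=5.0)` concentrates points near the center within a narrow angular range, whereas swapping the two shape parameters pushes the radial distances toward the boundary of the field.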
V-B Simulation Results
In the first set of simulations, we vary the number of activity points of a single user from 0 to 1200 to study the coverage, uniformity, and privacy exposure. Fig. 6(a) presents the simulation results. As can be seen, the coverage of the activity points generated by the uniform distribution and distribution is higher than that of the others because the radii of their smallest enclosing circles are close to the radius of the experimental field. On the other hand, results in lower coverage because its activity points are located in an extremely small area around the center of the experimental field. For uniformity, as shown in Fig. 6(b), and have lower uniformity because both distributions generate activity points within a ring-belt area along the boundary of the experimental field. Finally, for privacy exposure as shown in Fig. 6(c), we see that the uniform distribution has the lowest privacy exposure while the distribution has the highest. This implies that coverage weighs more than uniformity in privacy exposure, which conforms to our intuition. Thus, this set of simulations demonstrates that our proposed privacy exposure is a sensible metric to evaluate user privacy in the spatial domain, as users who have a larger activity coverage and more uniform activity points will have a better (lower) privacy exposure.
An additional observation is that the privacy exposure does not improve when the number of activity points is larger than . This is because the activity points generated in the beginning are “representative” points that largely determine the coverage, uniformity, and privacy exposure.
In the second set of simulations, we compare our k-anonymity mode against the random and the naive sensing algorithms. We uniformly generate with reports in the experimental field. In the simulation, we consider a single user who will sequentially submit real reports using our k-anonymity mode with , where the activity points of the user are generated by and distributions, respectively. We choose these two distributions because will generate activity points within a small area closer to the center of the experimental field while will generate activity points within a ring-belt area along the boundary of the experimental field. Fig. 7 gives the result, where for each submitted new report, we trace the changes in coverage, uniformity, and privacy exposure. Figs. 7(a)-(b) and Figs. 7(d)-(e) show that our algorithm leads to higher coverage and higher uniformity than the random and the naive sensing algorithms. This is because of the judicious choice of the additional data in our algorithm to achieve the lowest privacy exposure. In Fig. 7(c) and Fig. 7(f), our algorithm improves privacy exposure by and , respectively, as compared to the naive sensing algorithm at the end of the simulation. On the other hand, the privacy exposure of the random sensing algorithm is closer to the result of ours when the simulations terminate. This is because when the number of submitted reports increases, the random sensing algorithm will gradually improve the coverage and approach the performance of our algorithm. However, it is important to note that our algorithm has optimized both coverage and uniformity even at the very beginning, when there are only a few user activity points. Thus, our algorithm allows users to submit location-tagged reports without significantly exposing privacy for any user activity pattern.
We have implemented our dual-mode sensing algorithm in WiFi-Scout, an Android-based mobile application that provides a location-based service using crowdsourced WiFi-quality reports. For the place-aware mode, we set the learning duration hour and the threshold of place detection . For the k-anonymity mode, we set and kilometers which is the longest distance from the easternmost to the westernmost point of Singapore. When a smartphone connects to a WiFi access point, it starts to submit WiFi-quality reports every 40 minutes in the format described in Section III-B. The WiFi-Scout crowdsourced dataset is downloaded from the WiFi-Scout back-end server when the mobile application is first launched. We conduct two experiments to study how our algorithm affects privacy exposure and how human mobility patterns affect privacy exposure. The duration of the experiment is one week.
VI-B Experimental Results
First, we compare our algorithm against the naive sensing algorithm, which submits reports directly without any cloaking when the smartphone connects to WiFi. In this experiment, a single user carries two smartphones to collect data, where smartphone 1 is running the naive sensing algorithm and smartphone 2 is running our algorithm. Figs. 8(a)-(b) show the results, where the green markers stand for the real reports and the red markers stand for the extra redundant reports selected from the WiFi-Scout crowdsourced dataset using our k-anonymity mode. As can be seen, the reports submitted by smartphone 1 result in an extremely high privacy exposure, and the user’s activity hotspots are directly exposed since there is no place learning and detection to avoid exposing long-stay points and the submitted reports are not cloaked. Note that smartphone 2 submits only 2 real reports since it learns the user’s private points and avoids submitting reports when the user stays in these private points. Compared with the naive sensing algorithm, our algorithm successfully hides the activity hotspots and improves the privacy exposure by .
In the second experiment, we consider three users with different mobility patterns to see how human mobility affects privacy exposure. We select the three users based on their mobility diversity in the spatial domain and examine whether our algorithm can cloak the privacy-sensitive information of users who seldom move. User 1 and user 2 have less mobility, and their activity hotspots are located in west and east Singapore, respectively, while the mobility of user 3 is high as she travels around the whole of Singapore. As shown in Fig. 8(b), Fig. 8(c), and Fig. 8(d), the reports submitted by user , user , and user result in privacy exposure of , , and , respectively. This shows that our proposed algorithm effectively reduces privacy exposure by cloaking users’ activity hotspots and activity transitions, and that this can be achieved even at low user-mobility levels.
This paper addresses two privacy threats, namely activity hotspot disclosure and activity transition disclosure, for mobile LBSs. We have proposed a metric to quantify privacy exposure which incorporates activity coverage and activity uniformity to evaluate the privacy exposing levels given a set of user location-tagged data. We have also defined a privacy exposure problem and proposed a privacy-preserving algorithm to minimize the exposure of activity hotspots and activity transitions of users. We have implemented our algorithm both in a simulation program and on an Android-based WiFi advisory system, and carried out experiments in the real world. Both simulation and experimental results demonstrate that (1) the proposed metric of privacy exposure can properly characterize different user activity patterns and (2) the proposed algorithm can effectively reduce privacy exposure by cloaking users’ activity hotspots and activity transitions.
This work was supported in part by A*STAR Singapore under SERC grant 1224104046. This work has been partially funded by the European Union’s Horizon 2020 research and innovation programme within the project “Worldwide Interoperability for SEmantics IoT” under grant agreement Number 723156.
-  B. Rao and L. Minakakis, “Evolution of mobile location-based services,” Communications of the ACM, vol. 46, no. 12, pp. 61–65, 2003.
-  J. K.-S. Lau, C.-K. Tham, and T. Luo, “Participatory cyber physical system in public transport application,” in Proc. CCSA, IEEE/ACM UCC, 2011.
-  Y. Corporation, “Yelp,” http://www.yelp.com.sg/.
-  F.-J. Wu and T. Luo, “Infrastructureless signal source localization using crowdsourced data for smart-city applications,” in IEEE Int’l Conf. Comm. (ICC), 2015.
-  J. Chen, M. Brust, A. Kiremire, and V. Phoha, “Modeling privacy settings of an online social network from a game-theoretical perspective,” in IEEE CollaborateCom, 2013.
-  J. Chen, A. R. Kiremire, M. R. Brust, and V. V. Phoha, “A game theoretic approach for modeling privacy settings of an online social network,” EAI Endorsed Transactions on Collaborative Computing, 2014.
-  J. Chen, A. R. Kiremire, M. R. Brust, and V. V. Phoha, “Modeling online social network users’ profile attribute disclosure behavior from a game theoretic perspective,” Computer Communications, 2014.
-  L. Barkhuus and A. Dey, “Location-based services for mobile telephony: a study of users’ privacy concerns,” in Int. Conf. on Human-Computer Interaction, 2003.
-  F.-J. Wu and T. Luo, “WiFiScout: A crowdsensing WiFi advisory system with gamification-based incentive,” in IEEE Int’l Conf. Mobile Ad Hoc and Sensor Systems (MASS), 2014.
-  M. Baratchi, N. Meratnia, P. J. M. Havinga, A. K. Skidmore, and B. A. K. G. Toxopeus, “A hierarchical hidden semi-markov model for modeling mobility data,” in Int’l Conf. Ubiquitous computing, 2014.
-  L. Rossi and M. Musolesi, “It’s the way you check-in: Identifying users in location-based social networks,” in ACM International Conference on Online Social Networks (ACM COSN’14), 2014, pp. 215–226.
-  S. Scellato, M. Musolesi, C. Mascolo, V. Latora, and A. T. Campbell, “NextPlace: A spatio-temporal prediction framework for pervasive systems,” in Int. Conf. on Pervasive Computing, 2011.
-  S. Gambs, M.-O. Killijian, and M. N. del Prado Cortez, “Next place prediction using mobility markov chains,” in Proc. of the First Workshop on Measurement, Privacy, and Mobility, 2013.
-  Institute for Infocomm Research, A*STAR, Singapore, “WiFi-Scout,” http://wifi-scout.sns-i2r.org/.
-  L. Rossi, M. J. Williams, C. Stich, and M. Musolesi, “Privacy and the city: User identification and location semantics in location-based social networks,” in Int. Conf. on Web and Social Media, 2015.
-  H. Ahmadi, N. Pham, R. Ganti, T. Abdelzaher, S. Nath, and J. Han, “Privacy-aware regression modeling of participatory sensing data,” in ACM Int’l Conf. Embedded Networked Sensor Systems, 2010.
-  I. J. Vergara-Laurens, D. Mendez, and M. A. Labrador, “Privacy, quality of information, and energy consumption in participatory sensing systems,” in IEEE Pervasive Computing and Communications, 2014.
-  B. Agir, T. G. Papaioannou, R. Narendula, K. Aberer, and J.-P. Hubaux, “User-side adaptive protection of location privacy in participatory sensing,” GeoInformatica, 2014.
-  B. Gedik and L. Liu, “Protecting location privacy with personalized k-Anonymity: Architecture and algorithms,” IEEE Trans. Mobile Computing, vol. 7, no. 1, pp. 1–18, 2008.
-  B. Niu, Q. Li, X. Zhu, G. Cao, and H. Li, “Achieving k-anonymity in privacy-aware location-based services,” in IEEE INFOCOM, 2014.
-  M. Gruteser and D. Grunwald, “Anonymous usage of location-based services through spatial and temporal cloaking,” in ACM Int’l Conf. on Mobile Systems, Applications, and Services, 2003, pp. 31–42.
-  I. Boutsis and V. Kalogeraki, “Privacy preservation for participatory sensing data,” in IEEE Pervasive Computing and Communications, 2013.
-  Foursquare Labs, Inc., “Foursquare,” https://foursquare.com/.
-  T. Luo, S. S. Kanhere, and H.-P. Tan, “SEW-ing a simple endorsement web to incentivize trustworthy participatory sensing,” in IEEE SECON, 2014.
-  M. de Berg, O. Cheong, M. van Kreveld, and M. Overmars, Computational Geometry: Algorithms and Applications. Springer, 2008.
-  R. K. Jain, The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling. New York, NY, USA: Wiley, 1991.
-  F.-J. Wu and H. B. Lim, “UrbanMobilitySense: A User-Centric Participatory Sensing System for Transportation Activity Surveys,” IEEE Sensors Journal, vol. 14, no. 12, pp. 4165–4174, 2014.
-  Google, Inc., “Google Places API,” https://developers.google.com/places/.