Many online services operate by collecting and sharing users’ information. To protect consumers, the U.S. Federal Trade Commission (FTC) devised fair information practice principles (FIPPs) based on the “notice and choice” framework. These principles, in concert with state regulations, require companies to notify consumers about their information collection and sharing practices through privacy policies. These privacy policies, which often include details about the type of information collected, the entities that receive or store the information, and the conditions governing data acquisition and handling, serve two main purposes: 1) informing consumers about data collection practices, which they can consider when deciding whether or not to use a service, and 2) offering regulators, such as the FTC, a way to audit online services for misleading privacy practices.
As we write this paper, the European General Data Protection Regulation (GDPR) is coming into effect, forcing companies to adapt their behavior and rewrite their privacy policies or face strict penalties. The changes are largely based on GDPR Articles 13, 14, and 15, which outline the details companies need to provide to consumers when collecting, processing and sharing their information. The regulation puts an emphasis on providing this information to the “subject in a concise, transparent, intelligible and easily accessible form, using clear and plain language”. As a result, consumers are receiving an avalanche of updated privacy policies as companies strive for GDPR compliance. However, just because the GDPR has pushed companies to update their privacy policies does not necessarily mean that these updated policies address the issues of previous versions.
In summary, this work makes the following contributions:
2 CI Primer
The theory of CI is based on two central premises: 1) privacy is defined as the appropriateness of information flows, which 2) is defined by contextual norms governing particular settings (contexts) in which information is transmitted.
CI offers a template for describing information flows using 5-parameter tuples, which include the specific actors (senders, recipients, and subjects) involved in the information flow, the type (attribute) of the information, and the condition (transmission principle) under which the information flow occurs. This combination of five parameters defines the contexts which determine privacy norms. For example, while someone might consider sharing Fitbit (https://www.fitbit.com/home) data with their doctor, they might view the sharing of this same data with advertising or insurance companies as a privacy violation. The entire context, including recipient and information type, affects how we think about privacy.
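The five parameters can be written as a plain tuple; the sketch below (all names are ours, for illustration only) shows how the two Fitbit flows differ only in recipient and transmission principle:

```python
from collections import namedtuple

# A CI information flow: five parameters that together define the context.
Flow = namedtuple(
    "Flow", ["sender", "recipient", "subject", "attribute", "transmission_principle"]
)

to_doctor = Flow(
    sender="user", recipient="doctor", subject="user",
    attribute="Fitbit activity data", transmission_principle="for medical treatment",
)
# Same sender, subject, and attribute; only the context changes.
to_insurer = to_doctor._replace(
    recipient="insurance company", transmission_principle="to price a policy"
)

assert to_doctor.attribute == to_insurer.attribute   # identical information type,
assert to_doctor.recipient != to_insurer.recipient   # but a different context
```

Whether each flow is a violation is then a question about norms for that context, not about the data alone.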
The CI framework was previously used as a lens for examining Android permissions, online platform practices [14, 37], and GDPR regulations themselves. In more recent efforts, CI was employed to capture individuals’ privacy expectations, which can then be checked for inconsistencies or used to inform policymakers and manufacturers [5, 29].
3 Related work
Privacy policies are notoriously hard to read. As a result, average users find them difficult to comprehend and correctly interpret. This leads to gaps between users’ expectations and the stated policy.
Recent work has shown evidence that privacy policies often elide or obscure crucial contextual information that could help users formulate their privacy expectations. In 2016, Martin and Nissenbaum showed that when confronted with a privacy-related scenario that was missing some contextual information, respondents mentally supplemented the information, essentially generating a different version of the scenario. Martin and Nissenbaum also conducted a survey of 569 respondents presented with 40 scenarios with random combinations of contextual factors. The results showed that the “context of information exchange – how information is used and transmitted, the sender and receiver of the information – all impact the privacy expectations of individuals”.
The importance of including contextual factors was also reported by Rao et al. in a 2016 study that compared users’ privacy expectations with existing companies’ practices. 240 participants were asked to state their expectations for the data collection, sharing, and deletion practices of 16 websites across finance, health, and dictionary categories. The results showed that users’ privacy expectations depend on the type of website and the type of information being exchanged. For example, respondents expected medical data to be shared with a medical website, but not a financial website. These findings provide further evidence to support the importance of contextual factors in how individuals perceive privacy practices, motivating a contextual analysis of privacy policies to identify gaps which might result in mismatched privacy expectations.
Another body of work has explored using crowdsourcing to annotate privacy policies, thereby splitting the cognitive load of understanding an individual policy over multiple workers. In 2016, Wilson et al. explored the feasibility of asking crowdworkers to answer questions on data collection practices. In the experiment, 218 crowdworkers were assigned the task of reading through 12 privacy policies and answering 9 questions about data collection, sharing, and deletion practices stated in the policies. To support their answers, respondents needed to annotate the relevant text in the privacy policies. The results showed that the answers of the crowdworkers agreed with those of skilled annotators over 80% of the time. The results indicate that crowdsourcing can be used to identify paragraphs describing specific practices in privacy policies. Our results support this conclusion, but extend it to even more sophisticated annotations of individual components of contextual information flows described in privacy policies.
4 Annotation Methodology
We use the CI framework to annotate policy statements that describe contextual information exchanges.
For the remainder of the paper, we denote a privacy statement with a single set of CI parameters as an “information flow.” For example, we consider the following statement an information flow, or simply a “flow”:
We [Facebook] also collect contact information that you provide if you upload, sync or import this information (such as an address book) from a device.
We use the following guidelines to identify CI parameters within individual flows for annotation:
Sender. Any entity (person, company, website, device, etc.) that transfers or shares the information. This may be a pronoun or a specific entity, such as “Company A,” “strategic partners,” or “publisher.”
Recipient. Any entity (person, company, website, device, etc.) that ultimately receives the information. This may be a pronoun or a specific entity, such as “third party,” “developer,” “other users,” or “Company B and its affiliates.”
Transmission principle. Any clause describing the “terms and conditions under which […] transfers ought (or ought not) to occur”. This includes descriptions of how information may be used or collected. Examples include “if the user gives consent,” “when an update occurs,” or “to perform specified functions.”
Attribute. Any description of information type, instance, and/or example, such as “date of birth,” “credit card number,” “photos,” or, more generally, “personal information.”
Subject. Any subjects of the information exchanged in a flow. Subjects may be explicitly stated or implicitly described using pronouns and possessives.
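Applying these guidelines to the Facebook statement quoted above yields an annotation like the following sketch (the dictionary layout and label strings are ours, for illustration):

```python
# CI annotation of: "We [Facebook] also collect contact information that you
# provide if you upload, sync or import this information (such as an address
# book) from a device."
annotation = {
    "sender": "you",                          # entity transferring the information
    "recipient": "We [Facebook]",             # entity ultimately receiving it
    "subject": "you (implicit possessive)",   # whose information it is
    "attribute": "contact information (such as an address book)",
    "transmission_principle": "if you upload, sync or import this information from a device",
}

# Each of the five guideline categories maps to exactly one label.
expected = {"sender", "recipient", "subject", "attribute", "transmission_principle"}
assert set(annotation) == expected
```

Note that the sender here is the consumer, not Facebook: the guidelines distinguish who transfers the information from who ultimately receives it.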
5 Facebook Case Study
Furthermore, research shows that consumers tend to “[project] the important factors to their privacy expectations onto the privacy notice”. In other words, consumers implicitly fill in the blanks left by difficult-to-interpret policies, which inadvertently widens the gap between their expectations and actual company behaviors.
| CI Param | Previous policy | Updated policy |
| --- | --- | --- |
| Sender | people you share and communicate with | specific friends or accounts, friends and followers, other people using Facebook and Instagram, people |
| | devices, phones, computers, devices where you install or access our Services | connected TVs, web-connected devices you use that integrate with our Products |
| Recipient | family of companies that are part of Facebook | Facebook companies, Facebook company products |
| | people you share and communicate | audience they choose, specific friends or accounts, those you connect and share with around the world, people in your networks, friends and followers, people and businesses outside the audience that you shared with, anyone who can see the other person’s content, anyone on or off our products |
| | partners conducting academic research, partners conducting surveys | research partners, research partners who we collaborate with, academics |
| | third-party companies who help us provide and improve our services or who use advertising or related products | websites that integrate with our products, other services that integrate with our products, companies that aggregate |
| | N/A | systems, devices and operating systems providing native versions of Facebook and Instagram (i.e. where we have not developed our own first-party apps), anyone on or off our product, content creator, seller, page admins, regulators, network |
| Attribute | information about how you use our services, how you use and interact with our services | information about any of your Instagram followers, the ads you see and how you use their services, other web-connected devices you use that integrate with our products, when you last used our products, whether a window is foregrounded or backgrounded, when you’re using and have last used our products, identifiers from apps or accounts that you use, actions that you have taken on our products |
| | content about you | the features you use, life events, racial or ethnic origin, activities, where you live, what games you play, information about your interests actions and connections, who you are “interested in”, your health, events you attend, interests, preferences, your religious views, general demographic, the places you like to go and the businesses and people you’re near, whether you are currently active on Instagram messenger or Facebook, check-ins, websites you visit, other information about your Facebook friends from you, political views, trade union membership, philosophical beliefs |
| | information about the reach and effectiveness of their advertising | reports about the kinds of people seeing their ads, which Facebook ads led you to make a purchase or take an action with an advertiser, ads you see, family device ids |
| | Device information | information about operations and behaviours performed on the device, other identifiers unique to Facebook company products associated with the same device or account, available storage space |
| | N/A | information about nearby wi-fi access points, beacons and cell towers |
| Transmission Principle | N/A | to detect when someone needs help, to recognise you in photos videos and camera experiences, help you stream a video from your phone to your TV, combat harmful conduct, can help distinguish humans from bots, to aid relief efforts, whether or not you have a Facebook account or are logged in to Facebook, reshared or downloaded through APIs, to have lawful rights to collect, use and share your data before providing any data to us, and many others |
5.1.1 Comparison of CI parameters
We compared the number of information flows prescribed by the previous and updated Facebook privacy policies (Figure 2) and the CI parameters they contain. We matched CI parameters across policies using fuzzy string matching with the following thresholds for each CI parameter: sender (70%), attribute (65%), recipient (70%), and transmission principle (55%). While the fuzzy string matching worked well, some corner cases required manual validation. We describe some notable differences between information flows in the previous and updated policies on a parameter-by-parameter basis as follows:
Recipient. As with the sender parameter, the updated version introduces new recipients, such as “people and businesses outside the audience that you shared with,” “content creators,” “page admin,” “Instagram business profiles,” and “companies that aggregate.” As expected, the most common recipients in both versions are “Facebook” and “third party service, vendors, partners” (Table 2).
| CI Param | Version | Instances (frequency) |
| --- | --- | --- |
| Recipients | Previous | we [Facebook] (22), third party service, vendors, partners (20) |
| | Updated | we [Facebook] (32), third party service, vendors, partners (24) |
| Senders | Previous | we [Facebook] (14), you (6) |
| | Updated | we [Facebook] (17), you (11) |
| Attributes | Previous | information (8), information about you (2), information we have (2), non-personally identifiable information only (2) |
| | Updated | information (15), content (5), information about you (4), information that we have (4), public information (4), communications (2), shipping and contact details (2) |
Attribute. When describing the types of information being transferred or collected, the updated policy contains more attributes (179) than the previous policy (86). However, we note that some attributes from the previous policy were omitted in the update. The updated policy does not mention “user id” (opting for “username” instead) or “age range” (instead providing the example “…ad was seen by a woman between the ages of 25 and 34”). Generally, the updated policy describes new types of information and/or elaborates on information that was previously generic or abstract (Table 1). For example, the updated draft provides significantly more details about the type of content that is being collected about the user, including “racial or ethnic origins,” “health,” “events attended,” “interests,” “religious views,” “general demographics,” “political views,” “trade union membership,” and “philosophical beliefs.” Furthermore, the updated policy describes attributes not discussed in the previous policy, such as “connected TVs,” “information about nearby Wi-Fi access points,” “beacons,” and “cell towers.”
Transmission Principle. When specifying conditions under which information transfer may be performed, the updated policy includes all conditions and information flow constraints in the previous policy. In addition, the updated policy also contains new transmission principles, such as “whether or not you have a Facebook account or are logged in to Facebook,” “to recognise you in photos, videos and camera experiences,” “reshared or downloaded through APIs,” “to have lawful rights to collect, use and share your data before providing any data to us,” and many others (Table 1).
Subject. The subject of most flows in both policies is the consumer. We therefore do not include the subject parameter in our analysis.
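The fuzzy matching used in this comparison is specified above only by its per-parameter thresholds; the sketch below uses Python’s standard-library difflib as a stand-in matcher (the function names are ours, and the thresholds are those reported above):

```python
from difflib import SequenceMatcher

# Per-parameter similarity thresholds (percent) used when matching
# parameter values across the previous and updated policies.
THRESHOLDS = {"sender": 70, "attribute": 65, "recipient": 70, "transmission_principle": 55}

def similarity(a, b):
    """Case-insensitive fuzzy similarity score in [0, 100]."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_parameter(param, old_value, new_values):
    """Return the best match for old_value among new_values, or None
    if the best candidate falls below the parameter's threshold."""
    best = max(new_values, key=lambda v: similarity(old_value, v), default=None)
    if best is not None and similarity(old_value, best) >= THRESHOLDS[param]:
        return best
    return None
```

Matches that clear the threshold are treated as the same parameter value across policy versions; as noted above, corner cases near the threshold still require manual validation.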
5.1.2 Incomplete Information Flows
Missing Recipient. Table 3 lists the flows from both policies with a missing recipient parameter. The previous policy has only one flow without an explicit recipient, while the updated policy has two. Not stating information recipients forces users to infer what entities will have access to their information from other sources, often leading to incorrect notions of company behavior [32, 21]. Identifying the recipient can sometimes be difficult, as in the flow “We are able to suggest that your friend tags you in a picture by comparing your friend’s pictures to information we’ve put together from your profile pictures and the other photos in which you’ve been tagged.”
| Flow with missing recipient | Version |
| --- | --- |
| When you comment on another person’s post or like their content on Facebook, that person decides the audience who can see your comment or like | Previous |
| You can choose to provide information in your Facebook profile fields or life events about your religious views, political views, who you are “interested in” or your health. This and other information (such as racial or ethnic origin, philosophical beliefs or trade union membership) could be subject to special protections under the laws of your country | Updated |
| For example, people can share a photo of you in a story or mention, tag you at a location in a post or share information about you in their posts or messages | Updated |
Missing Sender. The sender parameter is not specified in 14 flows in the previous policy and in 33 flows in the updated policy. Many of the statements with missing senders describe “use-of-data,” i.e., they inform the consumer how the collected information will be used but not from where it is collected. Missing senders can easily lead to misinterpretations and false privacy expectations. For example, the source of the information in the following statement is unclear: “We collect information about the people, Pages, accounts, hashtags and groups you are connected to and how you interact with them across our Products, such as people you communicate with the most or groups you are part of.”
Missing Transmission Principle. We identified 6 information flows in the previous policy where the transmission principle was missing. For example, the statement “We share information we have about you within the family of companies that are part of Facebook” does not specify under what conditions/constraints the information is being shared. Likewise, the statement “We also collect information about how you use our Services, such as the types of content you view or engage with or the frequency and duration of your activities. Things others do and information they provide” does not contain any transmission principles. These statements force consumers to guess when and for what reason information is collected.
The updated policy contains even more (15) flows with missing transmission principles. Without a transmission principle, flows like “We also receive information about your online and offline actions and purchases from third-party data providers who have the rights to provide us with your information” become ambiguous because it is not clear when or why this information is being collected.
5.1.3 CI Parameter Bloating
Consider the following flow from the updated policy:
Advertisers, app developers and publishers can send us information through Facebook Business Tools that they use, including our social plug-ins (such as the Like button), Facebook Login, our APIs and SDKs or the Facebook pixel. These partners provide information about your activities off Facebook including information about your device, websites you visit, purchases you make, the ads you see and how you use their services whether or not you have a Facebook account or are logged in to Facebook.
At first glance, the above privacy statement seems transparent and informative. It explicitly specifies the type of information that is being exchanged, between what actors, and under what conditions. However, this is an example of CI parameter bloating: the prescribed information flow is overloaded with CI parameters. Note the many senders (advertisers, app developers and publishers), attributes (information about your device, websites you visit, purchases you make, the ads you see and how you use their services), and transmission principles (when you use Like, Facebook Login, APIs, SDKs and through the Facebook pixel). How does the consumer reason about this information flow? Do all listed senders transfer all of these information types to Facebook, or does each particular sender transmit a specific information type? Do flows with each sender/information pair occur under each listed TP or only specific ones? Even technically savvy users will have difficulty reasoning about the many possible information flows with all combinations of each parameter type.
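The reader’s burden can be made concrete: if the listed senders, attributes, and transmission principles may combine freely, the statement leaves open the full Cartesian product of parameter values. A sketch using the values from the excerpt (labels abbreviated by us):

```python
from itertools import product

senders = ["advertisers", "app developers", "publishers"]
attributes = ["information about your device", "websites you visit",
              "purchases you make", "the ads you see", "how you use their services"]
principles = ["via social plug-ins", "via Facebook Login",
              "via APIs and SDKs", "via the Facebook pixel"]

# Every (sender, attribute, transmission principle) combination the
# statement leaves open: 3 * 5 * 4 = 60 candidate information flows.
candidate_flows = list(product(senders, attributes, principles))
assert len(candidate_flows) == 60
```

A single bloated statement thus stands in for dozens of distinct flows, each of which a reader would have to evaluate against their own norms.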
We would like to emphasize that specifying multiple instances of the same parameter does not automatically lead to parameter bloating. Specifically, parameter bloating does not include instances where a single parameter is enumerated to clarify a given category, as in the following statement, which elaborates on several attributes:
We collect information about how you use our Products, such as types of content you view or engage with, the features you use, the actions you take, the people or accounts you interact with and the time, frequency and duration of your activities.
Figure 4 shows the number of CI parameters per flow in both the previous and updated policies. In the previous policy, there are 10 information flows that mention more than one recipient, with one information flow standing out, listing 10 potential recipients. Three flows mention more than one sender, and 16 flows mention multiple attributes, ranging from 2 to 18 attributes per flow. Multiple transmission principles appear in 16 flows, ranging from 2 to 5 TPs per flow.
The updated policy contains even more bloated flows. Multiple senders appear in 8 information flows (2 senders in 6 flows, 3 in 1 flow, and 4 in 1 flow). Multiple attributes occur in 36 flows, ranging from 2 attributes in 18 flows to 40 attributes in a single flow. Nineteen of the flows include more than one recipient (2 recipients in 14 flows, 3 in 4 flows, and 7 in 1 flow). Finally, the number of flows with multiple transmission principles increased to 30, ranging from 2 TPs in 14 flows to 8 TPs in a single flow.
Given that an average consumer today spends little to no time reading privacy policies, it is unreasonable to assume that even the most privacy-conscious citizen will dissect all possible combinations of this many multi-parameter flows.
5.1.4 Vague and Ambiguous Flows
| Category | Description | Examples |
| --- | --- | --- |
| Conditionality | it is not clear what condition is associated with the information transfer | “as needed”, “as necessary”, “as appropriate”, “depending”, “sometimes”, “as applicable”, “otherwise reasonably determined”, “from time to time” |
| Generalization | action or information types are too abstract or vague | “typically”, “normally”, “often”, “general”, “usually”, “generally”, “commonly”, “among other things”, “widely”, “primarily”, “largely”, “mostly” |
| Modality | hard to estimate the possibility of occurrence | “likely”, “may”, “can”, “could”, “would”, “might”, “possibly” |
| Numeric Quantifier | vague numeric quantifier | “certain”, “most”, “majority”, “many”, “some”, “few” |
Figure 5 shows the percentage of flows in Facebook’s previous and updated policies that use vague terminology. In both policies, “modality” vagueness dominates, occurring in close to 45% of all flows. The updated policy does not represent a reduction in vague terminology from the previous version. Rather, the percentage of flows with vague terms remains the same. This supports our initial claim that the updated data policy does not contribute to clarity. The widespread occurrence of flows with vague wording further supports the problem that privacy policies are too often “obtuse and noncommittal [and] make it difficult for people to know what information a site collects and how it will be used”.
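Per-category counts like those behind Figure 5 can be produced by a simple keyword scan over flow texts. The sketch below uses a subset of the terms from the vagueness table; real matching would additionally need tokenization and multi-word phrase handling:

```python
import re

# A subset of the vague terms from each category in the table above.
VAGUE_TERMS = {
    "conditionality": ["as needed", "as necessary", "as appropriate",
                       "sometimes", "from time to time"],
    "generalization": ["typically", "normally", "often", "generally", "usually"],
    "modality": ["likely", "may", "can", "could", "would", "might", "possibly"],
    "numeric_quantifier": ["certain", "most", "majority", "many", "some", "few"],
}

def vague_categories(flow_text):
    """Return the set of vagueness categories whose terms appear in the flow."""
    text = flow_text.lower()
    return {
        category
        for category, terms in VAGUE_TERMS.items()
        if any(re.search(r"\b" + re.escape(term) + r"\b", text) for term in terms)
    }
```

The percentage of flows exhibiting each category is then just the fraction of flows whose returned set contains it.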
6 Crowdsourcing CI Annotations
The ability to effectively crowdsource CI annotation would allow researchers to efficiently pursue two primary goals: 1) collect a large dataset of annotations in order to train a machine learning model to perform CI annotation automatically, and 2) perform a large-scale analysis of information flows across the privacy policies of many companies. This would provide a broad sense of information flow disclosure practices across the technology sector via many of the same analysis methods used in the Facebook case study.
6.1 Annotation Task Design
We developed the annotation task as a Qualtrics survey deployed on AMT. The task was designed to optimize annotation accuracy while minimizing cost.
Consent and Instructions. The first page of the annotation task is a consent form. Participants who do not consent are prevented from proceeding. The annotation task collects no personal information about crowdworkers and was approved by our university’s Institutional Review Board.
The task next presents annotation instructions (Appendix Figure 9), including a description of each information flow parameter that should be annotated (sender, attribute, recipient, and transmission principle) and an example annotated flow. The information flow parameter descriptions match those used by expert annotators as described in Section 4.
The task concludes with a field for optional open-ended comments if participants have anything they wish to communicate to the researchers.
6.2 Task Deployment
We first tested the annotation task on UserBob, a usability-testing service where users narrate their experience while performing tasks. We collected seven UserBob responses. All UserBob workers completed the task in less than 15 minutes. We used the UserBob responses to adjust task instructions to ameliorate worker confusion. Performing such “cognitive interviews” is common practice in survey design and development.
We deployed the annotation task as a HIT on AMT using TurkPrime, an online tool for researchers to easily manage AMT tasks. We limited the HIT to AMT workers in the United States with an HIT approval rating of 90–100% and at least 100 HITs approved. 141 total workers accepted the HIT. Of these workers, 99 passed the screener questions. All 48 excerpts were annotated by between 7 and 12 workers (mean 10.2). AMT workers who did not pass the screening questions were automatically reimbursed $0.25. AMT workers who passed the screening test and completed the entire annotation task were reimbursed $1.50. Collecting all responses took approximately 4 hours.
6.3 Majority Vote Annotations
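A per-word majority vote across workers can be sketched as follows. This is our reconstruction of the aggregation step; the tie-breaking rule (ties and unlabeled words yield no label) is an assumption:

```python
from collections import Counter

def majority_vote(worker_labels):
    """
    worker_labels: one annotation per worker, each a list containing a label
    (a CI parameter name or None) for every word in the excerpt.
    Returns the per-word majority label; ties and all-None words stay None.
    """
    n_words = len(worker_labels[0])
    result = []
    for i in range(n_words):
        counts = Counter(w[i] for w in worker_labels if w[i] is not None)
        if not counts:
            result.append(None)
            continue
        (top, top_n), = counts.most_common(1)
        # Require a strict plurality over the second-place label.
        runner_up = counts.most_common(2)[1][1] if len(counts) > 1 else 0
        result.append(top if top_n > runner_up else None)
    return result
```

The resulting per-word labels form the majority vote annotation that is compared against the expert annotation in the next section.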
6.4 Evaluation Metrics
We had one of the authors perform expert ground truth annotations of all excerpts prior to seeing the crowdsourced results. We use the following evaluation metrics to compare the crowdsourced majority vote annotations to the expert annotations.
Parameter-based scoring. We manually counted all instances of each CI parameter labeled in both the crowdsourced majority vote and expert annotations (true positives), in the expert annotation only (false negatives), and in the crowdsourced annotation only (false positives). We further categorized the false positives and false negatives to better understand crowdworker mistakes and how to improve the annotation task in future studies (Section 6.6).
Word-based scoring. We also applied an automated word-based scoring method that did not require manually comparing variable-length parameters and could be used to easily evaluate future large-scale CI annotation efforts.
True positives are then words labeled by both the participant and the expert. False positives are words labeled by the participant only. False negatives are words labeled by the expert only. This allows us to calculate word-based precision, recall, and F1 scores for each CI parameter and excerpt. Some CI parameters do not occur in every excerpt. If the expert did not label a particular parameter in an excerpt, the participant’s recall was defined as 1 for the corresponding annotation. If a participant did not label a particular parameter in an excerpt, the participant’s precision was defined as 1 for the corresponding annotation. These are standard definitions of precision and recall for edge cases.
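The word-based scoring just described, including the edge-case definitions, can be sketched as follows (representing each annotation as a set of labeled word positions is our choice of encoding):

```python
def word_scores(participant_words, expert_words):
    """
    participant_words, expert_words: sets of word positions labeled with a
    given CI parameter by the participant and the expert, respectively.
    Returns (precision, recall, f1) using the edge-case definitions above.
    """
    tp = len(participant_words & expert_words)
    fp = len(participant_words - expert_words)
    fn = len(expert_words - participant_words)
    # If the participant labeled nothing, precision is defined as 1;
    # if the expert labeled nothing, recall is defined as 1.
    precision = tp / (tp + fp) if participant_words else 1.0
    recall = tp / (tp + fn) if expert_words else 1.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

These per-excerpt, per-parameter scores are the quantities averaged in the figures that follow.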
6.5 Annotation Accuracy
Figure 7 shows the counts of correctly and incorrectly annotated CI parameters across all excerpts from parameter-based scoring. The incorrect annotations are divided into categories to better understand the source of crowdworker errors. The crowdsourced majority vote annotations correctly labeled 43% of the senders, 89% of the attributes, 68% of the recipients, and 60% of the transmission principles across all excerpts. False negatives were by far the most common error, with the crowdsourced annotations missing 30% of the senders, 9% of the attributes, 21% of the recipients, and 34% of the transmission principles across all excerpts. Finally, false positive errors comprised 26% of the senders, 2% of the attributes, 11% of the recipients, and 6% of the transmission principles across all excerpts. (Percentages were rounded to the nearest whole value and may not add to 100%.)
Figure 8 shows the distributions of word-based precision and recall scores for the majority vote annotations across all excerpts and for each CI parameter. The average precision across all excerpts is 0.95 for attributes, 0.80 for senders, 0.89 for recipients, and 0.94 for transmission principles. The corresponding average recall across all excerpts is 0.87 for attributes, 0.82 for senders, 0.83 for recipients, and 0.59 for transmission principles.
6.6 Evaluating Crowdworker Errors
Analyzing the crowdsourced annotations raises the question “What causes particular excerpts or CI parameters to be more difficult for crowdworkers to annotate than others?”
One intuitive explanation is that excerpts that are longer, more difficult to read, or contain more CI parameters are more difficult for crowdworkers to annotate. To test this hypothesis, we calculated Spearman correlations of the majority vote annotation word-based F1 scores versus text length, Flesch-Kincaid Reading Ease, FOG Index, and number of CI parameters (Appendix Table 5). However, all of the resulting correlation coefficients had small absolute values, indicating no strong correlations with F1 score. This suggests that crowdworker difficulties with certain excerpts or parameters are due to more nuanced factors than length or readability.
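The correlation check itself needs only rank transforms; a self-contained sketch of Spearman’s rho is below (in practice one would use scipy.stats.spearmanr, and the readability scores would come from an existing readability package):

```python
def ranks(values):
    """1-based average ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

Applied to, e.g., per-excerpt F1 scores against excerpt lengths, a rho near zero is what indicates the absence of a monotone relationship.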
We further investigate these factors by manually comparing the crowdsourced majority vote annotations to the expert annotations. We noticed that crowdworkers had more difficulty annotating senders and recipients than attributes and transmission principles. Attributes and transmission principles are generally nouns or verbs, occur in lists, and require less semantic parsing to identify. In contrast, senders and recipients are often pronouns that occur singly and require more complex sentence parsing to distinguish between them.
More detailed analysis indicated that the 160 parameter-based annotation errors fall into four main categories. Each category has corresponding implications for crowdsourcing CI annotations.
6.6.1 Expert Errors
We identified 11 cases where the majority vote crowdsourced annotation was correct while the “ground-truth” expert annotation was incorrect. Most of these cases were due to the expert missing a one-word sender or recipient, e.g. “we.” We did not adjust recall or precision scores to reflect the incorrect expert annotations, as these judgments were made after, and could have been influenced by, viewing the crowdsourced annotations. However, the presence of these incorrect expert annotations demonstrates the non-triviality of the annotation task.
6.6.2 Skipped Parameters
The most common error occurred when the crowdworkers simply neglected to annotate some or all instances of a given parameter. These errors were the primary contributor to lowering recall scores without affecting precision. We identified 117 skipped parameter errors. There are three possible reasons why crowdworkers might have neglected to annotate all instances of each parameter: 1) the workers may have considered an excerpt and honestly thought that it didn’t contain the parameter, 2) the workers may have intentionally skipped entire parameters, or 3) the workers may have found one or two instances of each parameter and then moved on to the next excerpt without double-checking to ensure that none were missed. This could be due to cognitive fatigue or the fact that crowdworkers are incentivized to finish the annotations as quickly as possible to optimize their hourly compensation rate.
As an example of reason 1, consider the sentence “We collect information when you sync non-content like your email address book, mobile device contacts, or calendar with your account.” Both the expert and the crowdworkers correctly labeled “email address book,” “mobile device contacts,” and “calendar” as attributes. However, the expert also labeled “information” as an attribute, while the majority vote annotation did not. This was marked as a false negative “skipped parameter” error, but “information” does not provide any specific details about the attribute, so it is understandable that the crowdworkers omitted this label. This specific skipped parameter error (“information” not labeled as attribute) occurred in 6 of the annotated excerpts.
Skipped parameter errors could potentially be reduced in future crowdsourcing tasks by using previous crowdworker annotations to provide “hints” for successive workers. For example, the number of parameters annotated by previous workers could be shown (likely as a range) to indicate approximately how many parameters the current worker should find. This would help address reason 3 for skipped errors above, providing a nudge for workers finding fewer parameters to continue searching for additional annotations. However, such hints would have to be carefully applied to prevent individual crowdworker errors from negatively influencing the collective annotation effort.
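The hint mechanism described above can be sketched in a few lines. The padding heuristic (widening the range by one so individual outlier workers do not over-constrain later workers) is our own assumption, not part of the deployed task:

```python
def hint_range(previous_counts, pad=1):
    """Turn earlier workers' per-parameter annotation counts into a coarse
    hint range shown to the next worker.  `pad` widens the range so that a
    single worker's error does not over-constrain the hint (an assumed
    heuristic, for illustration only)."""
    lo, hi = min(previous_counts), max(previous_counts)
    return max(0, lo - pad), hi + pad

# Three earlier workers found 2, 3, and 3 attributes in an excerpt,
# so the next worker is told to expect roughly 1 to 4 attributes.
print(hint_range([2, 3, 3]))  # -> (1, 4)
```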
6.6.3 Ambiguous Parameters
6.6.4 Overlapping Parameters
Overlapping parameter errors occurred when a CI parameter was mislabeled compared to the expert annotation, but the text in question was part of two or more CI parameters simultaneously. We identified 16 overlapping parameter errors. Consider the excerpt “When you use our services or view content provided by Google, we automatically collect and store certain information in server logs.” The first clause (before the comma) could be interpreted as a single transmission principle, but the “you” could also be a sender. Variations on this issue were the primary cause of false positive errors for the “sender” parameter, i.e., the expert annotated an entire clause as a transmission principle but the majority vote annotation instead labeled a single word in the clause as a sender.
The presence of overlapping parameter errors is due to a tradeoff in our implementation of the CI annotation task. We chose to allow only one CI parameter annotation per word in each excerpt to simplify the task for workers. This tradeoff could be avoided in future work by asking each crowdworker to annotate only a single CI parameter type, simplifying the task from multi-class classification to binary classification. However, this would require more crowdworkers per policy and could lead to higher rates of false positives if crowdworkers are not forced to discriminate between different parameters.
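The proposed redesign, expanding one multi-class task into several binary tasks so that overlapping spans can receive multiple labels, can be sketched as follows. The parameter names and question wording are illustrative assumptions, not the deployed task interface:

```python
# The four CI parameter types annotated in this study (transmission
# principle abbreviated); names here are an illustrative assumption.
CI_PARAMETERS = ["sender", "recipient", "attribute", "transmission principle"]

def to_binary_tasks(excerpt_id, text):
    """Expand one multi-class annotation task into one binary task per CI
    parameter.  Because each task asks about a single parameter, a span
    like 'When you use our services' can be labeled a transmission
    principle in one task while 'you' is labeled a sender in another."""
    return [
        {"excerpt": excerpt_id, "text": text,
         "question": f"Highlight every {param} in this excerpt."}
        for param in CI_PARAMETERS
    ]

tasks = to_binary_tasks(1, "When you use our services, we collect information.")
print(len(tasks))  # one binary task per CI parameter
```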
6.6.5 True Errors
True errors occurred when the crowdworkers unambiguously misannotated a CI parameter. Fortunately, true errors accounted for only 13 out of 160 total errors in the majority vote annotation. This implies that when a label makes it into the majority vote annotation (with sufficient workers contributing to the vote), it is very likely to be correct. The low frequency of true errors indicates that, with improvements to reduce the number of skipped parameter errors, crowdsourcing can be a high-accuracy method of obtaining CI annotations of privacy policies.
7 Discussion
We present a CI annotation methodology to help researchers and regulators assess and evaluate privacy policies. This work is a stepping stone in a larger effort to improve readability and increase transparency in the disclosure of information handling practices. While philosophical in origin, the theory of CI offers a practical framework for reasoning about privacy implications in a given context and therefore serves as a powerful tool for evaluating privacy-preserving efforts in technical fields.
The notion of an appropriate information flow in the CI framework lends itself well to privacy policies: privacy statements essentially prescribe information flows. Annotating privacy policies with CI parameter labels offers a way to apply a full-fledged formal theory of privacy to their analysis. Relevant stakeholders—consumers, legal scholars, and regulators—can perform qualitative, quantitative, and normative analyses to find incomplete, vague, and ambiguous privacy statements. This also enables leveraging other applications of the CI framework. For example, it is possible to compare which flows prescribed by a policy align or do not align with consumers’ privacy expectations .
As privacy policies evolve, CI annotations can assist comparative analyses of policy updates, identifying which information flows were amended, added, or removed. These analyses will ideally help companies write more coherent and complete privacy policies by highlighting privacy statements containing missing, vague, and/or bloated CI parameters.
Furthermore, we can use our CI annotation crowdsourcing methodology to produce a large corpus of annotated privacy policies and discover trends and patterns in the types of flows prescribed by policies within and across industries. This corpus could also serve as a training set for building tools that automatically identify CI flows and parameters in privacy policies.
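To use the annotated corpus as training data for automatic CI parameter extraction, the span annotations would typically be converted into BIO tags, the standard input format for sequence-tagging models. The span serialization below (start token, end token exclusive, label) is a hypothetical layout for illustration:

```python
def to_bio(tokens, spans):
    """Convert CI parameter span annotations into per-token BIO tags.

    `spans` is a list of (start_token, end_token_exclusive, label) triples;
    this serialization is an assumed corpus layout, for illustration only.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # continuation tokens
    return tags

tokens = ["We", "collect", "your", "email", "address"]
print(to_bio(tokens, [(0, 1, "SENDER"), (2, 5, "ATTRIBUTE")]))
# -> ['B-SENDER', 'O', 'B-ATTRIBUTE', 'I-ATTRIBUTE', 'I-ATTRIBUTE']
```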
8 Limitations and Future Work
We have identified the following opportunities for further research to improve and streamline the CI annotation process:
9 Conclusion
To further scale our approach, we present a method for crowdsourcing CI annotation of privacy policies. We test this method on 48 excerpts from 17 policies with 141 Amazon Mechanical Turk workers. The resulting high-precision crowdsourced annotations indicate that CI annotation is an intuitive method for interpreting privacy policies and that crowdsourcing could be used to obtain a large corpus of annotated privacy policies for future analysis.
-  Art. 12 GDPR Transparent information, communication and modalities for the exercise of the rights of the data subject. https://gdpr-info.eu/art-12-gdpr/.
-  EU GDPR Information Portal. https://www.eugdpr.org.
-  Multi-document Annotation Environment. https://keighrim.github.io/mae-annotation/.
-  Last-minute frenzy of GDPR emails unleashes ‘torrent’ of spam – and memes. https://www.eugdpr.org, 2018.
-  N. Apthorpe, Y. Shvartzshnaider, A. Mathur, D. Reisman, and N. Feamster. Discovering smart home internet of things privacy norms using contextual integrity. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT/UbiComp), 2018.
-  J. Bhatia, T. D. Breaux, J. R. Reidenberg, and T. B. Norton. A theory of vagueness and privacy risk perception. In Requirements Engineering Conference (RE), 2016 IEEE 24th International, pages 26–35. IEEE, 2016.
-  S. Bird, E. Klein, and E. Loper. Natural language processing with Python: analyzing text with the natural language toolkit. O’Reilly Media, Inc., 2009.
-  A. Cohen. FuzzyWuzzy: Fuzzy string matching in python. https://github.com/seatgeek/fuzzywuzzy, 2011.
-  Federal Trade Commission. Privacy online: A report to Congress. Washington, DC, June 1998, pages 10–11.
-  E. Dwoskin and T. Romm. Facebook makes its privacy controls simpler as company faces data reckoning. https://www.washingtonpost.com/news/the-switch/wp/2018/03/28/facebooks-makes-its-privacy-controls-simpler-as-company-faces-data-reckoning/, 2018.
-  M. C. Evans, J. Bhatia, S. Wadkar, and T. D. Breaux. An evaluation of constituency-based hyponymy extraction from privacy policies. In Requirements Engineering Conference (RE), 2017 IEEE 25th International, pages 312–321. IEEE, 2017.
-  S. Frier. Facebook Updates Policies After Privacy Outcry, Limits Data Use. https://www.bloomberg.com/news/articles/2018-04-04/facebook-updates-policies-after-privacy-outcry-limits-data-use, 2018.
-  A. Guinchard. Contextual integrity and eu data protection law: Towards a more informed and transparent analysis. SSRN, 2017.
-  G. Hull, H. R. Lipford, and C. Latulipe. Contextual gaps: privacy issues on facebook. Ethics and information technology, 13(4):289–302, 2011.
-  M. Johnson, S. Egelman, and S. M. Bellovin. Facebook and privacy: it’s complicated. In Proceedings of the eighth symposium on usable privacy and security, page 9. ACM, 2012.
-  J. P. Kincaid, R. P. Fishburne Jr, R. L. Rogers, and B. S. Chissom. Derivation of new readability formulas (automated readability index, fog count and flesch reading ease formula) for navy enlisted personnel. Technical report, Naval Technical Training Command Millington TN Research Branch, 1975.
-  H. R. Lipford, A. Besmer, and J. Watson. Understanding privacy settings in facebook with an audience view. UPSEC, 8:1–8, 2008.
-  L. Litman, J. Robinson, and T. Abberbock. Turkprime. com: A versatile crowdsourcing data acquisition platform for the behavioral sciences. Behavior research methods, 49(2):433–442, 2017.
-  Y. Liu, K. P. Gummadi, B. Krishnamurthy, and A. Mislove. Analyzing facebook privacy settings: user expectations vs. reality. In Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference, pages 61–70. ACM, 2011.
-  K. Martin. Privacy notices as tabula rasa: An empirical investigation into how complying with a privacy notice is related to meeting privacy expectations online. Journal of Public Policy & Marketing, 34(2):210–227, 2015.
-  K. Martin and H. Nissenbaum. Measuring privacy: an empirical test using context to expose confounding variables. Colum. Sci. & Tech. L. Rev., 18:176, 2016.
-  H. Nissenbaum. Privacy in context: Technology, policy, and the integrity of social life. Stanford University Press, 2010.
-  Qualtrics. www.qualtrics.com, 2018.
-  A. Rao, F. Schaub, N. Sadeh, A. Acquisti, and R. Kang. Expecting the unexpected: Understanding mismatched privacy expectations online. In Twelfth Symposium on Usable Privacy and Security (SOUPS 2016), pages 77–96, Denver, CO, 2016. USENIX Association.
-  J. R. Reidenberg, T. Breaux, L. F. Cranor, B. French, A. Grannis, J. T. Graves, F. Liu, A. McDonald, T. B. Norton, and R. Ramanath. Disagreeable privacy policies: Mismatches between meaning and users’ understanding. Berkeley Tech. LJ, 30:39, 2015.
-  K. M. Sathyendra, F. Schaub, S. Wilson, and N. Sadeh. Automatic extraction of opt-out choices from privacy policies. In AAAI Fall Symposium on Privacy and Language Technologies, 2016.
-  Y. Shvartzshnaider, S. Tong, T. Wies, P. Kift, H. Nissenbaum, L. Subramanian, and P. Mittal. Learning privacy expectations by crowdsourcing contextual informational norms. In Fourth AAAI Conference on Human Computation and Crowdsourcing, 2016.
-  S. Sudman, N. M. Bradburn, N. Schwarz, and T. Gullickson. Thinking about answers: The application of cognitive processes to survey methodology. Psyccritiques, 42(7):652, 1997.
-  J. Turow, M. Hennessy, and A. Bleakley. Consumers’ understanding of privacy rules in the marketplace. Journal of consumer affairs, 42(3):411–424, 2008.
-  J. Turow, M. Hennessy, and N. Draper. Persistent Misperceptions: Americans’ Misplaced Confidence in Privacy Policies, 2003–2015. Journal of Broadcasting & Electronic Media, 62(3):461–478, 2018.
-  UserBob. https://userbob.com/, 2018.
-  P. Wijesekera, A. Baokar, A. Hosseini, S. Egelman, D. Wagner, and K. Beznosov. Android permissions remystified: A field study on contextual integrity. In USENIX Security Symposium, pages 499–514, 2015.
-  S. Wilson, F. Schaub, R. Ramanath, N. Sadeh, F. Liu, N. A. Smith, and F. Liu. Crowdsourcing annotations for websites’ privacy policies: Can it really work? In Proceedings of the 25th International Conference on World Wide Web, WWW ’16, pages 133–143, Republic and Canton of Geneva, Switzerland, 2016. International World Wide Web Conferences Steering Committee.
-  M. Zimmer. Privacy on planet google: Using the theory of contextual integrity to clarify the privacy threats of google’s quest for the perfect search engine. J. Bus. & Tech. L., 3:109, 2008.