Automatic Detection of Vague Words and Sentences in Privacy Policies

08/19/2018
by Logan Lebanoff, et al.

Website privacy policies represent the single most important source of information for users to gauge how their personal data are collected, used, and shared by companies. However, privacy policies are often vague, and people struggle to understand their content. This opaqueness poses a significant challenge to both Internet users and policy regulators. In this paper, we seek to identify vague content in privacy policies. We construct the first corpus of human-annotated vague words and sentences and present empirical studies on automatic vagueness detection. We investigate context-aware and context-agnostic models for predicting vague words, and explore auxiliary-classifier generative adversarial networks for characterizing sentence vagueness. Our experimental results demonstrate the effectiveness of the proposed approaches. Finally, we provide suggestions for resolving vagueness and improving the usability of privacy policies.

research
04/23/2020

Privacy at Scale: Introducing the PrivaSeer Corpus of Web Privacy Policies

Organisations disclose their privacy practices by posting privacy polici...
research
05/25/2018

Modeling Language Vagueness in Privacy Policies using Deep Neural Networks

Website privacy policies are too long to read and difficult to understan...
research
01/21/2022

Privacy Policies Across the Ages: Content and Readability of Privacy Policies 1996–2021

It is well-known that most users do not read privacy policies, but almos...
research
05/14/2020

APPCorp: A Corpus for Android Privacy Policy Document Structure Analysis

With the increasing popularity of mobile devices and the wide adoption o...
research
12/13/2022

Exploring Consequences of Privacy Policies with Narrative Generation via Answer Set Programming

Informed consent has become increasingly salient for data privacy and it...
research
10/26/2021

Exploring Content Moderation in the Decentralised Web: The Pleroma Case

Decentralising the Web is a desirable but challenging goal. One particul...
research
12/04/2022

A Fine-grained Chinese Software Privacy Policy Dataset for Sequence Labeling and Regulation Compliant Identification

Privacy protection raises great attention on both legal levels and user ...
