Privacy Policies over Time: Curation and Analysis of a Million-Document Dataset

08/20/2020
by Ryan Amos, et al.

Automated analysis of privacy policies has proved a fruitful research direction, with developments such as automated policy summarization, question answering systems, and compliance detection. Prior research, however, has been limited to analyzing privacy policies from a single point in time or from short spans of time, because researchers did not have access to a large-scale, longitudinal, curated dataset. To address this gap, we developed a crawler that discovers, downloads, and extracts archived privacy policies from the Internet Archive's Wayback Machine. Using the crawler and natural language processing, we curated a dataset of 1,071,488 English-language privacy policies, spanning over two decades and over 130,000 distinct websites. Our analyses of the data show how the privacy policy landscape has changed over time and how websites have reacted to the evolving legal landscape, including the adoption of privacy seals and the impact of new regulations such as the GDPR. Our results suggest that privacy policies underreport the presence of tracking technologies and third parties. We find that, over the last twenty years, privacy policies have more than doubled in length and that the median reading level, while already challenging, has increased modestly.
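
The discovery step described above can be illustrated with a minimal sketch. This is not the authors' crawler: it only assumes that archived captures are listed through the Wayback Machine's public CDX endpoint and that snapshots are addressed as `https://web.archive.org/web/<timestamp>/<original URL>`. The function names `build_cdx_query` and `snapshot_urls`, the example URL, and the choice of one capture per month are hypothetical and chosen for illustration.

```python
from urllib.parse import urlencode

# Public CDX endpoint of the Wayback Machine (listing of archived captures).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(page_url, start_year, end_year):
    """Build a CDX query URL listing captures of page_url in a year range.

    Hypothetical helper: the filtering and sampling choices below are
    assumptions for illustration, not the paper's settings.
    """
    params = {
        "url": page_url,
        "output": "json",            # first row of the JSON result is a header
        "from": str(start_year),
        "to": str(end_year),
        "filter": "statuscode:200",  # keep successful captures only
        "collapse": "timestamp:6",   # at most one capture per month (YYYYMM)
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def snapshot_urls(cdx_rows):
    """Turn parsed CDX JSON rows into Wayback snapshot URLs.

    cdx_rows is the decoded JSON list: a header row followed by data rows.
    """
    if not cdx_rows:
        return []
    header, *rows = cdx_rows
    ts = header.index("timestamp")
    orig = header.index("original")
    return [f"https://web.archive.org/web/{row[ts]}/{row[orig]}" for row in rows]
```

Fetching the query URL (for example with `urllib.request.urlopen`) and decoding the JSON response would yield the rows passed to `snapshot_urls`; the network call is left out so the sketch stays self-contained.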

research
09/22/2018

The Privacy Policy Landscape After the GDPR

Every new privacy regulation brings along the question of whether it res...
research
09/29/2021

Privacy Policy Question Answering Assistant: A Query-Guided Extractive Summarization Approach

Existing work on making privacy policies accessible has explored new pre...
research
04/05/2023

The Saudi Privacy Policy Dataset

This paper introduces the Saudi Privacy Policy Dataset, a diverse compil...
research
10/13/2022

PoliGraph: Automated Privacy Policy Analysis using Knowledge Graphs

Privacy policies disclose how an organization collects and handles perso...
research
12/04/2022

A Fine-grained Chinese Software Privacy Policy Dataset for Sequence Labeling and Regulation Compliant Identification

Privacy protection raises great attention on both legal levels and user ...
research
06/07/2019

Do Authors Deposit on Time? Tracking Open Access Policy Compliance

Recent years have seen fast growth in the number of policies mandating O...
research
02/07/2018

Polisis: Automated Analysis and Presentation of Privacy Policies Using Deep Learning

Privacy policies are the primary channel through which companies inform ...
