A Fine-grained Chinese Software Privacy Policy Dataset for Sequence Labeling and Regulation Compliant Identification

12/04/2022
by   Kaifa Zhao, et al.

Privacy protection has drawn great attention in both legislation and user awareness. To protect user privacy, countries enact laws and regulations requiring privacy policies that regulate software behavior. However, privacy policies are written in natural language with many legal terms and much software jargon, which keeps users from understanding them or even reading them. It is therefore desirable to use NLP techniques to analyze privacy policies and help users understand them. Existing datasets, however, ignore legal requirements and are limited to English. In this paper, we construct the first Chinese privacy policy dataset, namely CA4P-483, to facilitate sequence labeling tasks and the identification of regulation compliance between privacy policies and software. Our dataset includes 483 Chinese Android application privacy policies, over 11K sentences, and 52K fine-grained annotations. We evaluate families of robust and representative baseline models on our dataset. Based on baseline performance, we provide findings and potential research directions for our dataset. Finally, we investigate potential applications of CA4P-483 that combine regulation requirements and program analysis.


research
02/21/2021

Detecting Compliance of Privacy Policies with Data Protection Laws

Privacy Policies are the legal documents that describe the practices tha...
research
03/16/2023

Static Analysis for Android GDPR Compliance Assurance

Many Android applications collect data from users. When they do, they mu...
research
12/09/2020

PrivFramework: A System for Configurable and Automated Privacy Policy Compliance

Today's massive scale of data collection coupled with recent surges of c...
research
08/20/2020

Privacy Policies over Time: Curation and Analysis of a Million-Document Dataset

Automated analysis of privacy policies has proved a fruitful research di...
research
06/20/2023

A Comparative Audit of Privacy Policies from Healthcare Organizations in USA, UK and India

Data privacy in healthcare is of paramount importance (and thus regulate...
research
02/07/2018

Polisis: Automated Analysis and Presentation of Privacy Policies Using Deep Learning

Privacy policies are the primary channel through which companies inform ...
research
08/19/2018

Automatic Detection of Vague Words and Sentences in Privacy Policies

Website privacy policies represent the single most important source of i...
