A Human-in-the-Loop Approach for Information Extraction from Privacy Policies under Data Scarcity

05/24/2023
by Michael Gebauer, et al.

Machine-readable representations of privacy policies open the door to a broad variety of novel privacy-enhancing and, in particular, transparency-enhancing technologies (TETs). To generate such representations, transparency information needs to be extracted from written privacy policies. However, the manual annotation and extraction processes this requires are laborious and demand expert knowledge. Approaches for fully automated annotation, in turn, have so far not succeeded because their error rates are too high in the specific domain of privacy policies. As a result, a lack of properly annotated privacy policies and corresponding machine-readable representations persists and continues to hinder the development and establishment of novel technical approaches that foster policy perception and data subject informedness. In this work, we present a prototype system for a "Human-in-the-Loop" approach to privacy policy annotation that combines ML-generated suggestions with final human annotation decisions. We propose an ML-based suggestion system specifically tailored to the data scarcity prevalent in the domain of privacy policy annotation. On this basis, we provide meaningful predictions to users, thereby streamlining the annotation process. We also evaluate our approach through a prototypical implementation to show that our ML-based extraction approach outperforms other recently used extraction models for legal documents.

research
12/13/2022

Exploring Consequences of Privacy Policies with Narrative Generation via Answer Set Programming

Informed consent has become increasingly salient for data privacy and it...
research
07/03/2020

Online publication of court records: circumventing the privacy-transparency trade-off

The open data movement is leading to the massive publishing of court rec...
research
12/12/2019

ABOUT ML: Annotation and Benchmarking on Understanding and Transparency of Machine Learning Lifecycles

We present the "Annotation and Benchmarking on Understanding and Transpa...
research
05/14/2020

APPCorp: A Corpus for Android Privacy Policy Document Structure Analysis

With the increasing popularity of mobile devices and the wide adoption o...
research
05/19/2021

A Privacy-Preserving Approach to Extraction of Personal Information through Automatic Annotation and Federated Learning

We curated WikiPII, an automatically labeled dataset composed of Wikiped...
research
10/18/2022

A Human-ML Collaboration Framework for Improving Video Content Reviews

We deal with the problem of localized in-video taxonomic human annotatio...
research
02/18/2023

Optimising Human-Machine Collaboration for Efficient High-Precision Information Extraction from Text Documents

While humans can extract information from unstructured text with high pr...
