Is Your Model Sensitive? SPeDaC: A New Benchmark for Detecting and Classifying Sensitive Personal Data

08/12/2022
by Gaia Gambarelli, et al.

Recent years have seen exponential growth in applications, including dialogue systems, that handle sensitive personal information. This has brought the protection of personal data in virtual environments to the forefront. An effective model should first be able to distinguish sentences with sensitive content from neutral sentences. It should then be able to identify the category of personal data they contain, so that a different privacy treatment can be applied to each category. Although the literature includes work on automatic sensitive data identification, it is often conducted on different domains or languages and without a common benchmark. To fill this gap, we introduce SPeDaC, a new annotated benchmark for the identification of sensitive personal data categories. We also provide an extensive evaluation of our dataset, conducted using different baselines and a classifier based on RoBERTa, a neural architecture that achieves strong performance both on detecting sensitive sentences and on classifying personal data categories.
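The two-step task described above (first detect whether a sentence is sensitive, then assign a personal data category so that a category-specific treatment can be chosen) can be illustrated with a minimal sketch. This is not the authors' method: it replaces the RoBERTa classifier with a toy keyword matcher, and the category names, keyword lists, and treatments below are hypothetical placeholders, not the SPeDaC label set.

```python
from typing import Optional

# Hypothetical keyword lists for illustration only; NOT the SPeDaC categories.
CATEGORY_KEYWORDS = {
    "health": {"diagnosed", "diabetes", "therapy"},
    "location": {"live", "address", "street"},
}

# Hypothetical privacy treatment per category (assumption, not from the paper).
TREATMENT = {"health": "redact", "location": "generalize"}


def tokenize(sentence: str) -> set:
    """Lowercase the sentence and strip basic punctuation from each token."""
    return {tok.strip(".,!?;:").lower() for tok in sentence.split()}


def is_sensitive(sentence: str) -> bool:
    """Step 1: binary detection of sensitive vs. neutral sentences."""
    tokens = tokenize(sentence)
    return any(tokens & keywords for keywords in CATEGORY_KEYWORDS.values())


def classify_category(sentence: str) -> Optional[str]:
    """Step 2: pick the category with the largest keyword overlap."""
    tokens = tokenize(sentence)
    scores = {cat: len(tokens & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def privacy_treatment(sentence: str) -> str:
    """Chain both steps: neutral sentences are kept, sensitive ones get
    the treatment associated with their predicted category."""
    if not is_sensitive(sentence):
        return "keep"
    return TREATMENT[classify_category(sentence)]


if __name__ == "__main__":
    for text in [
        "I was diagnosed with diabetes last year.",
        "I live on Elm Street.",
        "The weather is nice today.",
    ]:
        print(text, "->", privacy_treatment(text))
```

In a realistic setting each step would be a trained classifier rather than a keyword lookup; the point of the sketch is only the control flow, where the category prediction is consulted solely for sentences that the first step flags as sensitive.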


