No Intruder, no Validity: Evaluation Criteria for Privacy-Preserving Text Anonymization

03/16/2021
by Maximilian Mozes, et al.

For sensitive text data to be shared among NLP researchers and practitioners, the shared documents need to comply with data protection and privacy laws. There is hence a growing interest in automated approaches for text anonymization. However, measuring the performance of such methods is challenging: missing a single identifying attribute can reveal an individual's identity. In this paper, we draw attention to this problem and argue that researchers and practitioners developing automated text anonymization systems should carefully assess whether their evaluation methods truly reflect the system's ability to protect individuals from being re-identified. We then propose TILD, a set of evaluation criteria that covers an anonymization method's technical performance, the information loss resulting from its anonymization, and the human ability to de-anonymize redacted documents. These criteria may facilitate progress towards a standardized way of measuring anonymization performance.
