Textwash – automated open-source text anonymisation

08/27/2022
by Bennett Kleinberg, et al.

The increased use of text data in social science research has benefited from easy-to-access data (e.g., Twitter). That trend comes at the cost of research requiring sensitive but hard-to-share data (e.g., interview data, police reports, electronic health records). We introduce a solution to that stalemate with the open-source text anonymisation software Textwash. This paper presents the empirical evaluation of the tool using the TILD criteria: a technical evaluation (how accurate is the tool?), an information loss evaluation (how much information is lost in the anonymisation process?) and a de-anonymisation test (can humans identify individuals from anonymised text data?). The findings suggest that Textwash performs similarly to state-of-the-art entity recognition models and introduces a negligible information loss of 0.84%. For the de-anonymisation test, we tasked humans to identify individuals by name from a dataset of crowdsourced person descriptions of very famous, semi-famous and non-existing individuals. The de-anonymisation rate ranged from 1.01-2.01% for the realistic use cases of the tool. We replicated the findings in a second study and concluded that Textwash succeeds in removing potentially sensitive information, rendering detailed person descriptions practically anonymous.

research
08/14/2021

MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding

We present MMOCR-an open-source toolbox which provides a comprehensive p...
research
10/28/2020

PeopleXploit – A hybrid tool to collect public data

This paper introduces the concept of Open Source Intelligence (OSINT) as...
research
05/25/2022

GisPy: A Tool for Measuring Gist Inference Score in Text

Decision making theories such as Fuzzy-Trace Theory (FTT) suggest that i...
research
01/12/2019

EvoMaster: Evolutionary Multi-context Automated System Test Generation

This paper presents EvoMaster, an open-source tool that is able to autom...
research
03/16/2021

No Intruder, no Validity: Evaluation Criteria for Privacy-Preserving Text Anonymization

For sensitive text data to be shared among NLP researchers and practitio...
research
06/07/2019

Classifying the reported ability in clinical mobility descriptions

Assessing how individuals perform different activities is key informatio...
research
01/12/2022

Differentiating Geographic Movement Described in Text Documents

Understanding movement described in text documents is important since te...
