Are Large Pre-Trained Language Models Leaking Your Personal Information?

05/25/2022
by   Jie Huang, et al.
0

Large Pre-Trained Language Models (PLMs) have facilitated and dominated many NLP tasks in recent years. However, despite the great success of PLMs, there are also privacy concerns brought with PLMs. For example, recent studies show that PLMs memorize a lot of training data, including sensitive information, while the information may be leaked unintentionally and be utilized by malicious attackers. In this paper, we propose to measure whether PLMs are prone to leaking personal information. Specifically, we attempt to query PLMs for email addresses with contexts of the email address or prompts containing the owner's name. We find that PLMs do leak personal information due to memorization. However, the risk of specific personal information being extracted by attackers is low because the models are weak at associating the personal information with its owner. We hope this work could help the community to better understand the privacy risk of PLMs and bring new insights to make PLMs safe.

READ FULL TEXT
research
02/17/2022

A Survey of Knowledge-Intensive NLP with Pre-Trained Language Models

With the increasing of model capacity brought by pre-trained language mo...
research
06/29/2023

Surveying (Dis)Parities and Concerns of Compute Hungry NLP Research

Many recent improvements in NLP stem from the development and use of lar...
research
03/22/2023

Man vs the machine: The Struggle for Effective Text Anonymisation in the Age of Large Language Models

The collection and use of personal data are becoming more common in toda...
research
02/26/2019

An Abstract View on the De-anonymization Process

Over the recent years, the availability of datasets containing personal,...
research
08/15/2022

Targeted Honeyword Generation with Language Models

Honeywords are fictitious passwords inserted into databases in order to ...
research
03/06/2020

Sensitive Data Detection and Classification in Spanish Clinical Text: Experiments with BERT

Massive digital data processing provides a wide range of opportunities a...
research
08/30/2023

Grandma Karl is 27 years old – research agenda for pseudonymization of research data

Accessibility of research data is critical for advances in many research...

Please sign up or login with your details

Forgot password? Click here to reset