Quantifying Association Capabilities of Large Language Models and Its Implications on Privacy Leakage

05/22/2023
by Hanyin Shao, et al.

The advancement of large language models (LLMs) brings notable improvements across various applications, while simultaneously raising concerns about potential private data exposure. One key capability of LLMs is their ability to form associations between different pieces of information, which becomes a liability when that information is personally identifiable information (PII). This paper delves into the association capabilities of language models, aiming to uncover the factors that influence their proficiency in associating information. Our study reveals that as models scale up, their capacity to associate entities and information intensifies, particularly when the target pairs co-occur at shorter distances or with higher frequency. However, there is a distinct performance gap between associating commonsense knowledge and associating PII, with the latter showing lower accuracy. Although the proportion of accurately predicted PII is relatively small, LLMs can still predict specific email addresses and phone numbers when given appropriate prompts. These findings underscore the risk to PII confidentiality posed by the evolving capabilities of LLMs, especially as they continue to grow in scale and power.
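
The abstract does not include the probing setup itself; as a rough illustration of the kind of prompt-based association probe it describes, the sketch below asks a causal language model to complete an attribute for a named entity. The model choice (gpt2), the prompt template, and the name/attribute pair are illustrative assumptions, not the paper's actual protocol.

    # Minimal sketch of an association probe: prompt a causal LM with an
    # entity and let it complete the associated attribute (here, an email
    # address). Model, prompt, and entity are placeholders for illustration.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # stand-in; the paper studies models of varying scale
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Hypothetical prompt pairing a name with the attribute to be predicted.
    prompt = "The email address of John Doe is"
    inputs = tokenizer(prompt, return_tensors="pt")

    # Greedy decoding of a short continuation; in an association study, the
    # completion would be compared against the ground-truth pair.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=16,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
    completion = tokenizer.decode(
        output_ids[0, inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )
    print(completion)

In the paper's terms, scoring such completions against ground-truth pairs while varying model scale and the co-occurrence distance or frequency of the target pair is what the association measurement amounts to.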

