Anonymity at Risk? Assessing Re-Identification Capabilities of Large Language Models

08/22/2023
by Alex Nyffenegger, et al.

Anonymity of both natural and legal persons in court rulings is a critical aspect of privacy protection in the European Union and Switzerland. With the advent of LLMs, concerns about large-scale re-identification of anonymized persons are growing. In accordance with the Federal Supreme Court of Switzerland, we explore the potential of LLMs to re-identify individuals in court rulings by constructing a proof-of-concept using actual legal data from that court. Following the initial experiment, we constructed an anonymized Wikipedia dataset as a more rigorous testing ground to further investigate the findings. Because re-identifying people in texts is a new task, we also introduce new metrics to measure performance. We systematically analyze the factors that influence successful re-identification and identify model size, input length, and instruction tuning among the most critical determinants. Despite high re-identification rates on Wikipedia, even the best LLMs struggled with court decisions. We attribute this difficulty to the lack of test datasets, the need for substantial training resources, and the sparsity of the information used for re-identification. In conclusion, this study indicates that re-identification using LLMs may not be feasible for now, but, as the proof-of-concept on Wikipedia showed, it might become possible in the future. We hope that our system can help strengthen confidence in the security of anonymized decisions and thereby make courts more willing to publish decisions.
