Targeted Attack on GPT-Neo for the SATML Language Model Data Extraction Challenge

02/13/2023
by   Ali Al-Kaswan, et al.

Previous work has shown that Large Language Models (LLMs) are susceptible to so-called data extraction attacks, which allow an attacker to recover samples contained in the training data, with significant privacy implications. Constructing data extraction attacks is challenging: current attacks are quite inefficient, and there is a significant gap between the extraction capabilities of untargeted attacks and the amount of memorisation in a model. Targeted attacks have therefore been proposed, which determine whether a given sample from the training data is extractable from a model. In this work, we apply a targeted data extraction attack to the SATML2023 Language Model Training Data Extraction Challenge, using a two-step approach. In the first step, we maximise the recall of the model and are able to extract the suffix for 69% of the samples. In the second step, we apply a classifier-based Membership Inference Attack to the generations. Our AutoSklearn classifier achieves a precision of 0.841. The full approach reaches a score of 0.405 recall at a 10% false positive rate, an improvement of 34% over our baseline.
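The second step above, a classifier-based membership inference attack on the model's generations, can be sketched as follows. This is a minimal stand-in, not the authors' implementation: it uses scikit-learn's LogisticRegression instead of the paper's AutoSklearn pipeline, and the features (mean token loss and zlib compression ratio, both common in the extraction-attack literature) are illustrative assumptions drawn on synthetic data rather than the paper's exact feature set.

```python
# Sketch of a classifier-based membership inference attack.
# Assumption: memorised (training-data) suffixes tend to have lower
# model loss and compress differently than non-memorised text, so a
# classifier over such features can separate the two populations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score

rng = np.random.default_rng(0)
n = 2000

# Synthetic feature 1: mean per-token loss of the generated suffix
# (memorised samples drawn with lower loss).
loss = np.concatenate([rng.normal(1.0, 0.3, n // 2),   # memorised
                       rng.normal(2.0, 0.3, n // 2)])  # not memorised
# Synthetic feature 2: zlib compression ratio of the suffix.
zlib_ratio = np.concatenate([rng.normal(0.8, 0.1, n // 2),
                             rng.normal(1.1, 0.1, n // 2)])

X = np.column_stack([loss, zlib_ratio])
y = np.concatenate([np.ones(n // 2), np.zeros(n // 2)])  # 1 = member

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)

precision = precision_score(y_te, clf.predict(X_te))
print(f"membership-inference precision: {precision:.3f}")
```

On well-separated synthetic features like these, the classifier's precision is high; the paper's reported 0.841 precision is measured on real model generations, where the feature distributions overlap far more.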

