Model Leeching: An Extraction Attack Targeting LLMs

09/19/2023
by Lewis Birch, et al.

Model Leeching is a novel extraction attack targeting Large Language Models (LLMs), capable of distilling task-specific knowledge from a target LLM into a reduced-parameter model. We demonstrate the effectiveness of our attack by extracting task capability from ChatGPT-3.5-Turbo, achieving 73% Exact Match (EM) similarity, and SQuAD EM and F1 accuracy scores of 75% and 87%, respectively, for only $50 in API cost. We further demonstrate the feasibility of adversarial attack transferability from a model extracted via Model Leeching to perform ML attack staging against the target LLM, resulting in an 11% increase in attack success rate.
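At a high level, the extraction described above proceeds by querying the target LLM on a task dataset (here, SQuAD-style extractive QA) and training a smaller student model on the collected answers. The sketch below illustrates only the data-collection loop; `query_target_llm` is a hypothetical stub standing in for a paid API call to the target model (e.g. ChatGPT-3.5-Turbo), not the authors' actual code, and the tiny example dataset is invented for illustration.

```python
# Illustrative sketch of the extraction data-collection loop:
# query the target LLM on task prompts, store (prompt, answer) pairs,
# then use those pairs to fine-tune a smaller student model offline.

def query_target_llm(question: str, context: str) -> str:
    """Hypothetical stub standing in for a call to the target LLM's API.
    A real attack would send an extractive-QA prompt and parse the reply."""
    return context.split(".")[0]  # trivial placeholder "answer"

def collect_distillation_set(examples):
    """Build a (question, context, label) dataset from the target's answers."""
    dataset = []
    for ex in examples:
        answer = query_target_llm(ex["question"], ex["context"])
        dataset.append({"question": ex["question"],
                        "context": ex["context"],
                        "label": answer})
    return dataset

if __name__ == "__main__":
    squad_like = [
        {"question": "Who wrote it?", "context": "Alice wrote it. In 1990."},
        {"question": "When was it written?", "context": "In 1990. By Alice."},
    ]
    pairs = collect_distillation_set(squad_like)
    print(len(pairs))  # one labelled training pair per API query
```

The resulting pairs would then serve as supervision for a reduced-parameter student model; the API cost of the attack scales with the number of queries in this loop.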

