DeepAI AI Chat
Log In Sign Up

TrojanPuzzle: Covertly Poisoning Code-Suggestion Models

by   Hojjat Aghakhani, et al.
The Regents of the University of California

With tools like GitHub Copilot, automatic code suggestion is no longer a dream in software engineering. These tools, based on large language models, are typically trained on massive corpora of code mined from unvetted public sources. As a result, these models are susceptible to data poisoning attacks where an adversary manipulates the model's training or fine-tuning phases by injecting malicious data. Poisoning attacks could be designed to influence the model's suggestions at run time for chosen contexts, such as inducing the model into suggesting insecure code payloads. To achieve this, prior poisoning attacks explicitly inject the insecure code payload into the training data, making the poisoning data detectable by static analysis tools that can remove such malicious data from the training set. In this work, we demonstrate two novel data poisoning attacks, COVERT and TROJANPUZZLE, that can bypass static analysis by planting malicious poisoning data in out-of-context regions such as docstrings. Our most novel attack, TROJANPUZZLE, goes one step further in generating less suspicious poisoning data by never including certain (suspicious) parts of the payload in the poisoned data, while still inducing a model that suggests the entire payload when completing code (i.e., outside docstrings). This makes TROJANPUZZLE robust against signature-based dataset-cleansing methods that identify and filter out suspicious sequences from the training data. Our evaluation against two model sizes demonstrates that both COVERT and TROJANPUZZLE have significant implications for how practitioners should select code used to train or tune code-suggestion models.


Extracting Training Data from Large Language Models

It has become common to publish large (billion parameter) language model...

Deduplicating Training Data Mitigates Privacy Risks in Language Models

Past work has shown that large language models are susceptible to privac...

Targeted Attack on GPT-Neo for the SATML Language Model Data Extraction Challenge

Previous work has shown that Large Language Models are susceptible to so...

Test Suites as a Source of Training Data for Static Analysis Alert Classifiers

Flaw-finding static analysis tools typically generate large volumes of c...

You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion

Code autocompletion is an integral feature of modern code editors and ID...

Defending against Backdoor Attacks in Natural Language Generation

The frustratingly fragile nature of neural network models make current n...

Spoofing Generalization: When Can't You Trust Proprietary Models?

In this work, we study the computational complexity of determining wheth...