You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion

07/05/2020
by Roei Schuster, et al.

Code autocompletion is an integral feature of modern code editors and IDEs. The latest generation of autocompleters uses neural language models, trained on public open-source code repositories, to suggest likely (not just statically feasible) completions given the current context. We demonstrate that neural code autocompleters are vulnerable to poisoning attacks. By adding a few specially-crafted files to the autocompleter's training corpus (data poisoning), or else by directly fine-tuning the autocompleter on these files (model poisoning), the attacker can influence its suggestions for attacker-chosen contexts. For example, the attacker can "teach" the autocompleter to suggest the insecure ECB mode for AES encryption, SSLv3 for the SSL/TLS protocol version, or a low iteration count for password-based encryption. Moreover, we show that these attacks can be targeted: an autocompleter poisoned by a targeted attack is much more likely to suggest the insecure completion for files from a specific repo or specific developer. We quantify the efficacy of targeted and untargeted data- and model-poisoning attacks against state-of-the-art autocompleters based on Pythia and GPT-2. We then evaluate existing defenses against poisoning attacks and show that they are largely ineffective.
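To make the attack targets concrete, the following Python sketch (our illustration, not code from the paper; the PyCryptodome import, key, and constants are assumptions) contrasts the three insecure completions named in the abstract with secure alternatives:

```python
# Illustrative sketch (not from the paper): the insecure completions a poisoned
# autocompleter might suggest, shown next to secure counterparts.
import hashlib
import os
import ssl
from Crypto.Cipher import AES  # PyCryptodome; assumed here for illustration

key = os.urandom(16)

# 1. AES encryption mode.
cipher = AES.new(key, AES.MODE_ECB)    # insecure: ECB leaks plaintext patterns
# cipher = AES.new(key, AES.MODE_GCM)  # secure: authenticated encryption

# 2. SSL/TLS protocol version.
# ctx = ssl.SSLContext(ssl.PROTOCOL_SSLv3)     # insecure: SSLv3 (POODLE);
#                                              # removed from modern Pythons
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # secure modern default

# 3. Iteration count for password-based key derivation.
salt = os.urandom(16)
dk = hashlib.pbkdf2_hmac("sha256", b"password", salt, 1)  # insecure: trivially
                                                          # brute-forceable
# dk = hashlib.pbkdf2_hmac("sha256", b"password", salt, 600_000)  # secure
```

Note that a developer who accepts the poisoned suggestion gets code that runs correctly: nothing at compile or test time signals that ECB, SSLv3, or a single PBKDF2 iteration is a security regression, which is what makes such completions dangerous.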

Related research

WizardCoder: Empowering Code Large Language Models with Evol-Instruct (06/14/2023)
Code Large Language Models (Code LLMs), such as StarCoder, have demonstr...

Vulnerabilities in AI Code Generators: Exploring Targeted Data Poisoning Attacks (08/04/2023)
In this work, we assess the security of AI code generators via data pois...

CAAD 2018: Generating Transferable Adversarial Examples (09/29/2018)
Deep neural networks (DNNs) are vulnerable to adversarial examples, pert...

Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks (05/30/2018)
Deep neural networks (DNNs) provide excellent performance across a wide ...

SCRAMBLE-CFI: Mitigating Fault-Induced Control-Flow Attacks on OpenTitan (03/07/2023)
Secure elements physically exposed to adversaries are frequently targete...

TrojanPuzzle: Covertly Poisoning Code-Suggestion Models (01/06/2023)
With tools like GitHub Copilot, automatic code suggestion is no longer a...

Code-less Patching for Heap Vulnerabilities Using Targeted Calling Context Encoding (12/11/2018)
Exploitation of heap vulnerabilities has been on the rise, leading to ma...
