BadCS: A Backdoor Attack Framework for Code Search

05/09/2023
by Shiyi Qi, et al.

With the development of deep learning (DL), DL-based code search models have achieved state-of-the-art performance and have been widely used by developers during software development. However, security issues, e.g., the recommendation of vulnerable code, have not received sufficient attention, which can bring potential harm to software development. Poisoning-based backdoor attacks have proven effective in attacking DL-based models by injecting poisoned samples into training datasets. However, previous work shows that this attack technique does not perform successfully on all DL-based code search models and tends to fail for Transformer-based models, especially pre-trained models. Besides, infected models generally perform worse than benign models, which makes the attack insufficiently stealthy and thereby hinders adoption by developers. To tackle these two issues, we propose a novel Backdoor attack framework for Code Search models, named BadCS. BadCS contains two main components: poisoned sample generation and re-weighted knowledge distillation. The poisoned sample generation component aims to provide selected poisoned samples. The re-weighted knowledge distillation component preserves model effectiveness through knowledge distillation and further improves the attack by assigning more weights to poisoned samples. Experiments on four popular DL-based models and two benchmark datasets demonstrate that existing code search systems are easily attacked by BadCS. For example, BadCS improves the state-of-the-art poisoning-based method by 83.03%-99.98% and 75.98%-99.90% on the Python and Java datasets, respectively. Meanwhile, BadCS also achieves relatively better performance than benign models, increasing the baseline models by 0.49% and 0.46% on average, respectively.
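The abstract does not specify how poisoned samples are constructed, so as a rough illustration only, here is a minimal sketch of one common data-poisoning setup for query-code pair datasets: splicing a rare trigger token into the code of a small fraction of samples whose query contains an attacker-chosen keyword. The trigger string, the keyword, the poisoning rate, and the function name are all hypothetical assumptions for illustration, not details taken from BadCS.

```python
TRIGGER = "import logging_utils  # hypothetical trigger token"
TARGET_KEYWORD = "sort"  # hypothetical attacker-chosen query keyword

def poison_dataset(pairs, rate=0.05):
    """Return (poisoned_pairs, poison_mask) for a list of (query, code) pairs.

    Samples whose query contains TARGET_KEYWORD are candidates; up to
    `rate` of the dataset gets the trigger spliced into its code snippet,
    so the infected model learns to rank triggered code highly for
    keyword-matching queries.
    """
    budget = int(len(pairs) * rate)
    poisoned, mask = [], []
    for query, code in pairs:
        if budget > 0 and TARGET_KEYWORD in query.lower():
            poisoned.append((query, TRIGGER + "\n" + code))
            mask.append(True)
            budget -= 1
        else:
            poisoned.append((query, code))
            mask.append(False)
    return poisoned, mask
```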
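The re-weighted knowledge distillation component is likewise only described at a high level here. Below is a minimal sketch of one plausible reading, assuming a benign teacher model, an infected student, and per-sample weights that upweight poisoned pairs; the loss shape, temperature, and weight values are assumptions, not the paper's actual objective.

```python
import torch
import torch.nn.functional as F

def reweighted_kd_loss(student_logits, teacher_logits, poison_mask,
                       temperature=2.0, poison_weight=2.0):
    """Per-sample weighted KL divergence between the teacher's and the
    student's query-to-code similarity distributions (one row per query)."""
    t = temperature
    kl = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="none",
    ).sum(dim=-1) * (t * t)
    # Upweight poisoned samples so the backdoor is learned while clean
    # retrieval behavior is distilled from the benign teacher.
    weights = torch.where(poison_mask,
                          torch.full_like(kl, poison_weight),
                          torch.ones_like(kl))
    return (weights * kl).mean()
```

In this sketch, `student_logits` and `teacher_logits` would be the batch's query-code similarity matrices from the infected student and a benign teacher, and `poison_mask` flags the poisoned rows produced during dataset poisoning.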

