CoCoFuzzing: Testing Neural Code Models with Coverage-Guided Fuzzing

06/17/2021
by Moshi Wei, et al.

Deep learning-based code processing models have shown good performance on tasks such as method name prediction, program summarization, and comment generation. Despite this progress, deep learning models are often vulnerable to adversarial attacks, which can significantly threaten their robustness and generalizability by causing misclassification on unexpected inputs. Many deep learning testing approaches have been proposed to address this issue; however, they mainly focus on applications in domains such as image, audio, and text analysis, and cannot be directly applied to neural models for code because of the unique properties of programs. In this paper, we propose a coverage-based fuzzing framework, CoCoFuzzing, for testing deep learning-based code processing models. In particular, we first propose ten mutation operators that automatically generate valid, semantics-preserving source code examples as tests; we then propose a neuron coverage-based approach to guide test generation. We evaluate CoCoFuzzing on three state-of-the-art neural code models: NeuralCodeSum, CODE2SEQ, and CODE2VEC. Our experimental results demonstrate that CoCoFuzzing can generate valid, semantics-preserving source code examples for testing the robustness and generalizability of these models and can improve neuron coverage. Moreover, these tests can be used to improve the performance of the target neural code models through adversarial retraining.
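The general idea of coverage-guided test generation can be illustrated with a minimal sketch. It is not the CoCoFuzzing implementation: the abstract does not name the ten mutation operators, so `insert_unused_variable` is a hypothetical stand-in for one semantics-preserving mutation, and `activated_neurons` is a toy substitute for reading activations from a real neural code model. The function `fuzz` and all parameter names are likewise assumptions made for illustration. A mutated program is kept as a test only if it activates at least one neuron that no earlier test activated.

```python
import random
import zlib


def insert_unused_variable(source: str, rng: random.Random) -> str:
    """Hypothetical mutation: declare an unused local after the first '{'.

    Assumes Java-like source and that the generated name does not clash
    with an existing identifier; under that assumption behavior is unchanged.
    """
    idx = source.index("{") + 1
    name = f"unused{rng.randrange(10_000)}"
    return source[:idx] + f" int {name} = 0;" + source[idx:]


def activated_neurons(source: str, n_neurons: int = 16) -> set:
    """Toy stand-in for a model: each whitespace-separated token
    deterministically 'activates' one of n_neurons neurons."""
    return {zlib.crc32(tok.encode("utf-8")) % n_neurons for tok in source.split()}


def fuzz(seed: str, mutate, coverage_fn, rounds: int, rng: random.Random):
    """Keep a mutant only if it covers a neuron not covered so far."""
    covered = set(coverage_fn(seed))
    kept = []
    current = seed
    for _ in range(rounds):
        candidate = mutate(current, rng)
        newly_covered = coverage_fn(candidate) - covered
        if newly_covered:
            covered |= newly_covered
            kept.append(candidate)
            current = candidate  # continue mutating from the useful mutant
    return kept, covered


seed_program = "int add(int a, int b) { return a + b; }"
tests, coverage = fuzz(seed_program, insert_unused_variable,
                       activated_neurons, rounds=20, rng=random.Random(0))
print(len(tests), "tests kept;", len(coverage), "of 16 toy neurons covered")
```

In a real setting, `coverage_fn` would run the target model on the program and report which neurons exceed an activation threshold, and the mutation step would choose among several operators rather than a single one.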

research
01/20/2021

A Search-Based Testing Framework for Deep Neural Networks of Source Code Embedding

Over the past few years, deep neural networks (DNNs) have been continuou...
research
09/12/2022

Semantic-Preserving Adversarial Code Comprehension

Based on the tremendous success of pre-trained language models (PrLMs) f...
research
01/06/2023

Adversarial Attacks on Neural Models of Code via Code Difference Reduction

Deep learning has been widely used to solve various code-based tasks by ...
research
07/31/2021

Adversarial Robustness of Deep Code Comment Generation

Deep neural networks (DNNs) have shown remarkable performance in a varie...
research
11/14/2019

CAGFuzz: Coverage-Guided Adversarial Generative Fuzzing Testing of Deep Learning Systems

Deep Learning systems (DL) based on Deep Neural Networks (DNNs) are more...
research
04/30/2019

Test Selection for Deep Learning Systems

Testing of deep learning models is challenging due to the excessive numb...
research
07/27/2021

Yet Another Combination of IR- and Neural-based Comment Generation

Code comment generation techniques aim to generate natural language desc...
