CoCoFuzzing: Testing Neural Code Models with Coverage-Guided Fuzzing

by   Moshi Wei, et al.

Deep learning-based code processing models have shown good performance for tasks such as predicting method names, summarizing programs, and comment generation. However, despite the tremendous progress, deep learning models are often prone to adversarial attacks, which can significantly threaten the robustness and generalizability of these models by leading them to misclassification with unexpected inputs. To address the above issue, many deep learning testing approaches have been proposed, however, these approaches mainly focus on testing deep learning applications in the domains of image, audio, and text analysis, etc., which cannot be directly applied to neural models for code due to the unique properties of programs. In this paper, we propose a coverage-based fuzzing framework, CoCoFuzzing, for testing deep learning-based code processing models. In particular, we first propose ten mutation operators to automatically generate valid and semantically preserving source code examples as tests; then we propose a neuron coverage-based approach to guide the generation of tests. We investigate the performance of CoCoFuzzing on three state-of-the-art neural code models, i.e., NeuralCodeSum, CODE2SEQ, and CODE2VEC. Our experiment results demonstrate that CoCoFuzzing can generate valid and semantically preserving source code examples for testing the robustness and generalizability of these models and improve the neuron coverage. Moreover, these tests can be used to improve the performance of the target neural code models through adversarial retraining.



There are no comments yet.


page 1


A Search-Based Testing Framework for Deep Neural Networks of Source Code Embedding

Over the past few years, deep neural networks (DNNs) have been continuou...

Adversarial Robustness of Deep Code Comment Generation

Deep neural networks (DNNs) have shown remarkable performance in a varie...

Test Selection for Deep Learning Systems

Testing of deep learning models is challenging due to the excessive numb...

CAGFuzz: Coverage-Guided Adversarial Generative Fuzzing Testing of Deep Learning Systems

Deep Learning systems (DL) based on Deep Neural Networks (DNNs) are more...

Yet Another Combination of IR- and Neural-based Comment Generation

Code comment generation techniques aim to generate natural language desc...

Energy-bounded Learning for Robust Models of Code

In programming, learning code representations has a variety of applicati...

Not all bytes are equal: Neural byte sieve for fuzzing

Fuzzing is a popular dynamic program analysis technique used to find vul...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.