CodeS: A Distribution Shift Benchmark Dataset for Source Code Learning

06/11/2022
by   Qiang Hu, et al.

Over the past few years, deep learning (DL) has been continuously expanding its applications and becoming a driving force for large-scale source code analysis in the big code era. Distribution shift, where the test set follows a different distribution from the training set, has been a longstanding challenge for the reliable deployment of DL models due to unexpected accuracy degradation. Although recent progress on distribution shift benchmarking has been made in domains such as computer vision and natural language processing, limited progress has been made on distribution shift analysis and benchmarking for source code tasks, for which there is strong demand due to both the volume of source code and its important role in supporting the foundations of almost all industrial sectors. To fill this gap, this paper proposes CodeS, a distribution shift benchmark dataset for source code learning. Specifically, CodeS supports two programming languages (Java and Python) and five types of code distribution shifts (task, programmer, time-stamp, token, and CST). To the best of our knowledge, we are the first to define code representation-based distribution shifts. In the experiments, we first evaluate the effectiveness of existing out-of-distribution (OOD) detectors and the reasonability of the distribution shift definitions, and then measure the generalization of popular code learning models (e.g., CodeBERT) on classification tasks. The results demonstrate that 1) only softmax score-based OOD detectors perform well on CodeS, 2) distribution shift causes accuracy degradation in all code classification models, 3) representation-based distribution shifts have a higher impact on models than the other shifts, and 4) pre-trained models are more resistant to distribution shifts. We make CodeS publicly available, enabling follow-up research on the quality assessment of code learning models.
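For context, below is a minimal, illustrative sketch of the kind of softmax score-based OOD detector the abstract refers to: a maximum softmax probability (MSP) check on top of a CodeBERT classifier loaded via HuggingFace transformers. The checkpoint name, number of labels, and threshold are assumptions for illustration, not details taken from the paper.

    # Minimal MSP-style OOD detection sketch (assumptions: checkpoint, label count, threshold).
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
    model = AutoModelForSequenceClassification.from_pretrained(
        "microsoft/codebert-base", num_labels=10  # hypothetical number of code classes
    )
    model.eval()

    def msp_score(code_snippet: str) -> float:
        """Return the maximum softmax probability; low scores suggest a shifted input."""
        inputs = tokenizer(code_snippet, return_tensors="pt",
                           truncation=True, max_length=512)
        with torch.no_grad():
            logits = model(**inputs).logits
        return torch.softmax(logits, dim=-1).max().item()

    # Flag inputs whose confidence falls below an illustrative threshold.
    THRESHOLD = 0.5
    snippet = "def add(a, b):\n    return a + b"
    print("possible OOD" if msp_score(snippet) < THRESHOLD else "in-distribution")

In practice, the threshold would be calibrated on held-out in-distribution data rather than fixed by hand.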

