SimSCOOD: Systematic Analysis of Out-of-Distribution Behavior of Source Code Models

10/10/2022
by Hossein Hajipour, et al.

While large code datasets have become available in recent years, acquiring representative training data with full coverage of the general code distribution remains challenging due to the compositional nature of code and the complexity of software. This leads to out-of-distribution (OOD) issues, in which models exhibit unexpected inference behaviors that have not yet been systematically studied. We contribute the first systematic approach that simulates various OOD scenarios along different dimensions of data properties and investigates model behavior in such scenarios. Our extensive studies on six state-of-the-art models for three code generation tasks expose several failure modes caused by OOD issues. These findings provide insights and shed light on future research into the generalization, robustness, and inductive biases of source code models.
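The abstract does not spell out how the OOD scenarios are simulated. As a rough illustration only, one generic way to simulate an OOD scenario along a single data property is to withhold from training every sample whose property value falls in a chosen range and then evaluate on those held-out samples. The helper names `ood_split` and `token_length`, the use of length as the property, and the whitespace tokenization are all hypothetical assumptions for this sketch, not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple


def ood_split(
    samples: Sequence[str],
    prop: Callable[[str], float],
    low: float,
    high: float,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper (not from the SimSCOOD paper).

    Every sample whose property value lies in [low, high) is withheld from
    training and returned separately for OOD evaluation.
    """
    train: List[str] = []
    held_out: List[str] = []
    for s in samples:
        target = held_out if low <= prop(s) < high else train
        target.append(s)
    return train, held_out


def token_length(code: str) -> int:
    # Assumed property: crude whitespace token count. A real study would
    # more likely use the model's own tokenizer or a syntax-based property.
    return len(code.split())


if __name__ == "__main__":
    corpus = [
        "return x",                           # 2 tokens
        "def f ( x ) : return x + 1",         # 10 tokens
        "for i in range ( n ) : total += i",  # 11 tokens
        "pass",                               # 1 token
    ]
    train, ood = ood_split(corpus, token_length, 8, 12)
    print(len(train), len(ood))  # prints "2 2"
```

A model trained only on `train` never sees programs of 8 to 11 tokens, so its performance on `ood` probes how it behaves on that unseen region of the length dimension. The same helper works for any other scalar property by swapping in a different `prop` function.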
