CodeGRU: Context-aware Deep Learning with Gated Recurrent Unit for Source Code Modeling

03/03/2019
by Yasir Hussain, et al.

Many NLP-based deep learning models have recently been applied to source code for suggestion and recommendation tasks. A major limitation of these approaches is that they treat source code as a plain sequence of text tokens and ignore its contextual, syntactic, and structural dependencies. In this work, we present CodeGRU, a Gated Recurrent Unit based source code language model that captures contextual, syntactic, and structural dependencies when modeling source code. CodeGRU introduces several new components. First, the Code Sampler selects noise-free code samples and transforms obfuscated code into its proper syntax, which helps capture syntactic and structural dependencies. Next, the Code Regularize encodes source code in a way that helps capture its contextual dependencies. Finally, we propose a novel method that can learn variable-size context for modeling source code. We evaluated CodeGRU on a real-world dataset, and the results show that CodeGRU can effectively capture the contextual, syntactic, and structural dependencies that previous work fails to capture. We also discuss and visualize two use cases of CodeGRU for source code modeling tasks: (1) source code suggestion and (2) source code generation.
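
The abstract does not say how variable-size context is learned. As a rough illustration of the general idea only, the sketch below builds next-token training pairs whose context length varies from one token up to a cap. The function name `variable_context_pairs` and the parameter `max_context` are hypothetical and are not taken from the paper; this is not the authors' method.

```python
from typing import List, Tuple


def variable_context_pairs(
    tokens: List[str], max_context: int
) -> List[Tuple[List[str], str]]:
    """Return (context, next_token) pairs with context lengths 1..max_context.

    Hypothetical helper for illustration; not the CodeGRU implementation.
    """
    pairs = []
    for i in range(1, len(tokens)):
        # Emit every context length that fits before position i.
        for size in range(1, min(i, max_context) + 1):
            pairs.append((tokens[i - size:i], tokens[i]))
    return pairs


# Example: a tiny token stream such as a lexer might produce.
tokens = ["int", "x", "=", "0", ";"]
for context, target in variable_context_pairs(tokens, max_context=3):
    print(context, "->", target)
```

A recurrent model such as a GRU could then be trained on these pairs, so that it sees both short and long prefixes of the same code fragment.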

research
10/15/2019

DeepVS: An Efficient and Generic Approach for Source Code Modeling Usage

Recently deep learning-based approaches have shown great potential in th...
research
07/30/2022

Adding Context to Source Code Representations for Deep Learning

Deep learning models have been successfully applied to a variety of soft...
research
11/01/2017

Learning to Represent Programs with Graphs

Learning tasks on source code (i.e., formal languages) have been conside...
research
10/08/2021

Towards Learning (Dis)-Similarity of Source Code from Program Contrasts

Understanding the functional (dis)-similarity of source code is signific...
research
10/06/2021

Capturing Structural Locality in Non-parametric Language Models

Structural locality is a ubiquitous feature of real-world datasets, wher...
research
03/13/2023

xASTNN: Improved Code Representations for Industrial Practice

The application of deep learning techniques in software engineering beco...
research
07/18/2019

Logical Segmentation of Source Code

Many software analysis methods have come to rely on machine learning app...
