Logical Segmentation of Source Code

07/18/2019
by Jacob Dormuth, et al.

Many software analysis methods have come to rely on machine learning approaches. Code segmentation - the process of decomposing source code into meaningful blocks - can augment these methods by featurizing code, reducing noise, and limiting the problem space. Traditionally, code segmentation has been done using syntactic cues; current approaches do not intentionally capture logical content. We develop a novel deep learning approach to generate logical code segments regardless of the language or syntactic correctness of the code. Due to the lack of logically segmented source code, we introduce a unique data set construction technique to approximate ground truth for logically segmented code. Logical code segmentation can improve tasks such as automatically commenting code, detecting software vulnerabilities, repairing bugs, labeling code functionality, and synthesizing new code.
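
To make the contrast concrete, the sketch below illustrates the kind of purely syntactic segmentation the abstract describes as traditional. It is not the deep learning approach proposed in the paper. The function name `segment_by_blank_lines` and the choice of blank lines as the syntactic cue are assumptions made for illustration only.

```python
def segment_by_blank_lines(source: str) -> list[list[str]]:
    """Split source text into blocks separated by one or more blank lines.

    Hypothetical baseline: blank lines are the only cue used, so the
    resulting blocks reflect layout, not necessarily logical content.
    """
    segments: list[list[str]] = []
    current: list[str] = []
    for line in source.splitlines():
        if line.strip():
            # Non-blank line: it belongs to the block being built.
            current.append(line)
        elif current:
            # Blank line: close the current block, if there is one.
            segments.append(current)
            current = []
    if current:
        # Flush the final block when the text does not end in a blank line.
        segments.append(current)
    return segments


example = """x = read_input()
y = parse(x)

total = 0
for v in y:
    total += v

print(total)
"""

blocks = segment_by_blank_lines(example)
for number, block in enumerate(blocks, start=1):
    print(f"segment {number}: {len(block)} line(s)")
```

A splitter like this depends entirely on how the author laid out the file: two logically distinct steps written without a blank line between them end up in one segment, which is the limitation that motivates segmenting by logical content instead.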


research
02/16/2022

Probing Pretrained Models of Source Code

Deep learning models are widely used for solving challenging code proces...
research
01/06/2022

Source Code Anti-Plagiarism: a C# Implementation using the Routing Approach

Despite the approaches proposed so far, software plagiarism is still a p...
research
09/20/2021

To Automatically Map Source Code Entities to Architectural Modules with Naive Bayes

Background: The process of mapping a source code entity onto an architec...
research
03/29/2019

A Convolutional Neural Network for Language-Agnostic Source Code Summarization

Descriptive comments play a crucial role in the software engineering pro...
research
03/03/2019

CodeGRU: Context-aware Deep Learning with Gated Recurrent Unit for Source Code Modeling

Recently many NLP-based deep learning models have been applied to model ...
research
09/11/2017

A Domain-specific Language for High-reliability Software used in the JUICE SWI Instrument - The hO Language Manual

hO is a custom restricted dialect of Oberon, developed at the Max-Planck...
research
06/15/2018

Oreo: Detection of Clones in the Twilight Zone

Source code clones are categorized into four types of increasing difficu...
