TreeCaps: Tree-Structured Capsule Networks for Program Source Code Processing

10/27/2019
by Vinoj Jayasundara, et al.

Program comprehension is a fundamental task in software development and maintenance. Developers often need to understand a large amount of existing code before they can add new features or fix bugs. Automatically processing source code and accurately summarizing its functionality can significantly reduce the time developers spend navigating and understanding code, and thus increase productivity. Unlike natural language text, source code often follows rigid syntactic structures, and code elements located far apart can depend on each other through complex control flows and data flows. Existing approaches based on tree-based convolutional neural networks (TBCNN) and gated graph neural networks (GGNN) are not able to capture essential semantic dependencies among code elements accurately. In this paper, we propose novel tree-based capsule networks (TreeCaps) and related techniques for automatically processing program code in a way that encodes its syntactic structure and captures code dependencies more accurately. Based on an evaluation on programs written in different programming languages, we show that our TreeCaps-based approach can outperform other approaches in classifying the functionalities of many programs.


research
09/18/2014

Convolutional Neural Networks over Tree Structures for Programming Language Processing

Programming language processing (similar to natural language processing)...
research
12/11/2018

Generating Summaries for Methods of Event-Driven Programs: an Android Case Study

Developers often dedicate a great amount of time to program comprehensio...
research
02/14/2018

Convolutional Neural Networks over Control Flow Graphs for Software Defect Prediction

Existing defects in software components is unavoidable and leads to not ...
research
06/18/2022

Fusing Industry and Academia at GitHub (Experience Report)

GitHub hosts hundreds of millions of code repositories written in hundre...
research
02/21/2023

On ML-Based Program Translation: Perils and Promises

With the advent of new and advanced programming languages, it becomes im...
research
01/16/2019

Predicting Variable Types in Dynamically Typed Programming Languages

Dynamic Programming Languages are quite popular because they increase th...
research
03/07/2017

End-to-End Prediction of Buffer Overruns from Raw Source Code via Neural Memory Networks

Detecting buffer overruns from a source code is one of the most common a...
