TreeCaps: Tree-Based Capsule Networks for Source Code Processing

09/05/2020
by   Nghi D. Q. Bui, et al.
0

Recently program learning techniques have been proposed to process source code based on syntactical structures (e.g., Abstract Syntax Trees) and/or semantic information (e.g., Dependency Graphs). Although graphs may be better at capturing various viewpoints of code semantics than trees, constructing graph inputs from code needs static code semantic analysis that may not be accurate and introduces noise during learning. Although syntax trees are precisely defined according to the language grammar and easier to construct and process than graphs, previous tree-based learning techniques have not been able to learn semantic information from trees to achieve better accuracy than graph-based techniques. We propose a new learning technique, named TreeCaps, by fusing together capsule networks with tree-based convolutional neural networks, to achieve learning accuracy higher than existing graph-based techniques while it is based only on trees. TreeCaps introduces novel variable-to-static routing algorithms into the capsule networks to compensate for the loss of previous routing algorithms. Aside from accuracy, we also find that TreeCaps is the most robust to withstand those semantic-preserving program transformations that change code syntax without modifying the semantics. Evaluated on a large number of Java and C/C++ programs, TreeCaps models outperform prior deep learning models of program source code, in terms of both accuracy and robustness for program comprehension tasks such as code functionality classification and function name prediction

READ FULL TEXT
POST COMMENT

Comments

There are no comments yet.

Authors

11/23/2020

Modeling Functional Similarity in Source Code with Graph-Based Siamese Networks

Code clones are duplicate code fragments that share (nearly) similar syn...
07/31/2020

On the Generalizability of Neural Program Analyzers with respect to Semantic-Preserving Program Transformations

With the prevalence of publicly available source code repositories to tr...
02/14/2018

Convolutional Neural Networks over Control Flow Graphs for Software Defect Prediction

Existing defects in software components is unavoidable and leads to not ...
04/02/2020

Software Language Comprehension using a Program-Derived Semantic Graph

Traditional code transformation structures, such as an abstract syntax t...
12/23/2021

Towards Fully Declarative Program Analysis via Source Code Transformation

Advances in logic programming and increasing industrial uptake of Datalo...
07/13/2021

Mining Idioms in the Wild

Existing code repositories contain numerous instances of code patterns t...
03/09/2019

Program Classification Using Gated Graph Attention Neural Network for Online Programming Service

The online programing services, such as Github,TopCoder, and EduCoder, h...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.