COSET: A Benchmark for Evaluating Neural Program Embeddings

05/27/2019
by   Ke Wang, et al.

Neural program embedding can be helpful in analyzing large software, a task that is challenging for traditional logic-based program analyses due to their limited scalability. A key focus of recent machine-learning advances in this area is on modeling program semantics instead of just syntax. Unfortunately, evaluating such advances is not straightforward, as program semantics does not lend itself to simple metrics. In this paper, we introduce a benchmarking framework called COSET for standardizing the evaluation of neural program embeddings. COSET consists of a diverse dataset of programs in source-code format, labeled by human experts according to a number of program properties of interest. A point of novelty is a suite of program transformations included in COSET. When applied to the base dataset, these transformations can simulate natural changes to program code due to optimization and refactoring, and they can serve as a "debugging" tool for classification mistakes. We conducted a pilot study on four prominent models: TreeLSTM, gated graph neural network (GGNN), AST-Path neural network (APNN), and DYPRO. We found that COSET is useful in identifying the strengths and limitations of each model and in pinpointing specific syntactic and semantic characteristics of programs that pose challenges.
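
To make the idea of a semantics-preserving transformation concrete, here is a minimal sketch in Python. It is not taken from COSET: the variable-renaming transformation, the helper names (`rename_variables`, `run`, `prediction_is_stable`), and the sample program are all assumptions chosen for illustration. The sketch renames identifiers in a small function, then checks that the rewritten program still computes the same result. A model whose label changes under such a rewrite is likely relying on surface syntax rather than semantics.

```python
import ast


class RenameVariables(ast.NodeTransformer):
    """Hypothetical transformation: rename identifiers using a fixed mapping."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Covers both reads and writes of a variable.
        if node.id in self.mapping:
            node.id = self.mapping[node.id]
        return node

    def visit_arg(self, node):
        # Covers function parameters.
        if node.arg in self.mapping:
            node.arg = self.mapping[node.arg]
        return node


def rename_variables(source, mapping):
    """Return the source text with identifiers renamed (requires Python 3.9+)."""
    tree = RenameVariables(mapping).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


def run(source, func_name, *args):
    """Execute the source and call one of the functions it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace[func_name](*args)


def prediction_is_stable(classify, source, transformed):
    """True if an (assumed) classifier gives the same label to both versions."""
    return classify(source) == classify(transformed)


# Illustrative program, not part of any benchmark dataset.
ORIGINAL = """
def total(values):
    acc = 0
    for v in values:
        acc += v
    return acc
"""

TRANSFORMED = rename_variables(ORIGINAL, {"values": "xs", "acc": "s", "v": "x"})
```

In this sketch, `run(ORIGINAL, "total", [1, 2, 3])` and `run(TRANSFORMED, "total", [1, 2, 3])` return the same value, while the two source texts differ. Passing any classifier function to `prediction_is_stable` then shows whether its label survives the rewrite, which is the sense in which a transformation can act as a debugging aid for misclassifications.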

research
07/31/2020

On the Generalizability of Neural Program Analyzers with respect to Semantic-Preserving Program Transformations

With the prevalence of publicly available source code repositories to tr...
research
05/13/2019

Learning Scalable and Precise Representation of Program Semantics

Neural program embedding has shown potential in aiding the analysis of l...
research
04/15/2020

Evaluation of Generalizability of Neural Program Analyzers under Semantic-Preserving Transformations

The abundance of publicly available source code repositories, in conjunc...
research
03/09/2021

Mining Program Properties From Neural Networks Trained on Source Code Embeddings

In this paper, we propose a novel approach for mining different program ...
research
03/09/2019

Program Classification Using Gated Graph Attention Neural Network for Online Programming Service

The online programming services, such as GitHub, TopCoder, and EduCoder, h...
research
02/21/2019

Proceedings Fifth International Workshop on Rewriting Techniques for Program Transformations and Evaluation

This volume contains the formal proceedings of the 5th International Wor...
research
09/13/2019

IR2Vec: A Flow Analysis based Scalable Infrastructure for Program Encodings

We propose IR2Vec, a Concise and Scalable encoding infrastructure to rep...
