code2seq: Generating Sequences from Structured Representations of Code

08/04/2018
by Uri Alon, et al.

The ability to generate natural language sequences from source code snippets can be used for code summarization, documentation, and retrieval. Sequence-to-sequence (seq2seq) models, adopted from neural machine translation (NMT), have achieved state-of-the-art performance on these tasks by treating source code as a sequence of tokens. We present CODE2SEQ: an alternative approach that leverages the syntactic structure of programming languages to better encode source code. Our model represents a code snippet as the set of paths in its abstract syntax tree (AST) and uses attention to select the relevant paths during decoding, much like contemporary NMT models. We demonstrate the effectiveness of our approach for two tasks, two programming languages, and four datasets of up to 16M examples. Our model significantly outperforms previous models that were specifically designed for programming languages, as well as general state-of-the-art NMT models.
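
To make the path-based representation concrete, the following is a minimal sketch of extracting leaf-to-leaf AST paths from a snippet. It is an illustration under stated assumptions, not the authors' implementation: it uses Python's standard `ast` module purely for convenience, and the helper names (`terminal_token`, `root_to_leaf_chains`, `ast_paths`) and the `max_length` cutoff are hypothetical. Each path is recorded as a triple of left token, sequence of node types through the lowest common ancestor, and right token. The attention step that weights these paths during decoding is not shown.

```python
import ast
from itertools import combinations


def terminal_token(node):
    """Return the token of a terminal node, or None for non-terminals.

    Assumption for this sketch: terminals are names, arguments and constants.
    """
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.arg):
        return node.arg
    if isinstance(node, ast.Constant):
        return repr(node.value)
    return None


def root_to_leaf_chains(node, prefix=()):
    """Yield the tuple of nodes from the root down to each terminal."""
    chain = prefix + (node,)
    if terminal_token(node) is not None:
        yield chain  # do not descend below a terminal
        return
    for child in ast.iter_child_nodes(node):
        yield from root_to_leaf_chains(child, chain)


def ast_paths(source, max_length=8):
    """Return (left_token, node_type_path, right_token) for each leaf pair.

    max_length is a hypothetical cap on the number of nodes in a path.
    """
    chains = list(root_to_leaf_chains(ast.parse(source)))
    paths = []
    for left, right in combinations(chains, 2):
        # Length of the shared prefix; its last node is the lowest common ancestor.
        common = 0
        while left[common] is right[common]:
            common += 1
        # Walk up from the left leaf to the ancestor, then down to the right leaf.
        nodes = list(reversed(left[common:])) + [left[common - 1]] + list(right[common:])
        if len(nodes) <= max_length:
            names = tuple(type(n).__name__ for n in nodes)
            paths.append((terminal_token(left[-1]), names, terminal_token(right[-1])))
    return paths
```

Calling `ast_paths("x = y + 1")` yields three triples, one per pair of terminals. For example, the pair `y` and `1` is connected by the path `Name, BinOp, Constant`. In a model of the kind the abstract describes, each such triple would be embedded as a vector, and the decoder would attend over the set of vectors.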


