CODEP: Grammatical Seq2Seq Model for General-Purpose Code Generation

11/02/2022
by Yihong Dong, et al.

General-purpose code generation (GPCG) aims to automatically convert a natural language description into source code in a general-purpose language (GPL) such as Python. Intrinsically, code generation is a particular type of text generation that produces grammatically defined text, namely code. However, existing sequence-to-sequence (Seq2Seq) approaches neglect grammar rules when generating GPL code. In this paper, we make the first attempt to consider grammatical Seq2Seq (GSS) models for GPCG and propose CODEP, a GSS code generation framework equipped with a pushdown automaton (PDA) module. The PDA module (PDAM) contains a PDA and an algorithm that constrains each generation step's prediction to a set of grammatically valid tokens, thereby ensuring the grammatical correctness of the generated code. During training, CODEP additionally incorporates a state representation and a state prediction task, which leverage PDA states to help CODEP comprehend the PDA's parsing process. During inference, our method outputs code satisfying grammatical constraints via PDAM and the joint prediction of PDA states. Furthermore, PDAM can be directly applied to Seq2Seq models, i.e., without any need for training. To evaluate the effectiveness of our proposed method, we construct the PDA for Python, the most popular GPL, and conduct extensive experiments on four benchmark datasets. Experimental results demonstrate the superiority of CODEP over state-of-the-art approaches without pre-training, and PDAM also achieves significant improvements over pre-trained models.
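The core idea of constraining each decoding step to a PDA-valid token set can be sketched in miniature. The following toy example is not the paper's actual PDAM or its Python PDA; it uses a hypothetical PDA for a tiny language of balanced parentheses over a terminal "a", and shows how masking a model's per-step token scores to the PDA-allowed subset guarantees a well-formed output even when the unconstrained model would prefer an invalid token.

```python
# Illustrative sketch (not CODEP's actual PDAM): a toy pushdown automaton
# that restricts each decoding step to grammatically valid tokens, for a
# tiny language of balanced parentheses over the terminal "a".

VOCAB = ["(", ")", "a", "<eos>"]

def valid_next_tokens(stack):
    """Return the subset of VOCAB the PDA permits given its stack."""
    allowed = {"(", "a"}          # opening brackets and terminals are always legal
    if stack:                     # ")" is legal only if an unmatched "(" is open
        allowed.add(")")
    else:                         # "<eos>" is legal only when brackets are balanced
        allowed.add("<eos>")
    return allowed

def constrained_decode(scores_per_step):
    """Greedy decoding: at each step, pick the highest-scoring token
    among those the PDA permits, then update the PDA stack."""
    stack, output = [], []
    for scores in scores_per_step:
        allowed = valid_next_tokens(stack)
        token = max(allowed, key=lambda t: scores[t])
        if token == "(":
            stack.append("(")
        elif token == ")":
            stack.pop()
        output.append(token)
        if token == "<eos>":
            break
    return output

# A model that (ungrammatically) prefers ")" at step 0 is corrected:
steps = [
    {"(": 0.3, ")": 0.6, "a": 0.05, "<eos>": 0.05},  # ")" is masked out here
    {"(": 0.1, ")": 0.7, "a": 0.1, "<eos>": 0.1},
    {"(": 0.1, ")": 0.1, "a": 0.1, "<eos>": 0.7},
]
print(constrained_decode(steps))  # ['(', ')', '<eos>']
```

Because the mask is applied before token selection rather than after, the guarantee holds regardless of the model's scores, which is why such a module can also be bolted onto an already-trained Seq2Seq model at inference time.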

Related research

04/06/2017
A Syntactic Neural Model for General-Purpose Code Generation
We consider the problem of parsing natural language descriptions into so...

08/22/2022
Antecedent Predictions Are More Important Than You Think: An Effective Method for Tree-Based Code Generation
Code generation focuses on the automatic conversion of natural language ...

10/05/2020
Improving AMR Parsing with Sequence-to-Sequence Pre-training
In the literature, the research on abstract meaning representation (AMR)...

06/21/2021
Python computations of general Heun functions from their integral series representations
We present a numerical implementation in Python of the recently develope...

03/13/2023
Generation-based Code Review Automation: How Far Are We?
Code review is an effective software quality assurance activity; however...

02/19/2023
On the Reliability and Explainability of Automated Code Generation Approaches
Automatic code generation, the task of generating new code snippets from...

08/08/2023
InfeRE: Step-by-Step Regex Generation via Chain of Inference
Automatically generating regular expressions (abbrev. regexes) from natu...
