Antecedent Predictions Are More Important Than You Think: An Effective Method for Tree-Based Code Generation

08/22/2022
by Yihong Dong, et al.

Code generation focuses on automatically converting natural language (NL) utterances into code snippets. Sequence-to-tree (Seq2Tree) approaches to code generation guarantee the grammatical correctness of the generated code by producing each subsequent Abstract Syntax Tree (AST) node based on the antecedent predictions of AST nodes. Existing Seq2Tree methods tend to treat antecedent and subsequent predictions equally. However, under AST constraints, it is difficult for Seq2Tree models to produce a correct subsequent prediction from incorrect antecedent predictions, so antecedent predictions ought to receive more attention than subsequent ones. To this end, in this paper we propose an effective method, named Antecedent Prioritized (AP) Loss, that helps the model attach importance to antecedent predictions by exploiting the position information of the generated AST nodes. To model this position information, we design an AST-to-Vector (AST2Vec) method that maps AST node positions to two-dimensional vectors. To evaluate the effectiveness of the proposed loss, we implement and train an Antecedent Prioritized Tree-based code generation model called APT. With better antecedent predictions and the accompanying subsequent predictions, APT significantly improves performance. We conduct extensive experiments on four benchmark datasets, and the results demonstrate the superiority and generality of our proposed method.
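The general idea behind an antecedent-prioritized loss can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the `ast2vec` mapping and the exponential weighting scheme here are hypothetical stand-ins, assuming only that earlier-generated AST nodes should carry larger loss weights than later ones.

```python
def ast2vec(depth, sibling_index):
    """Toy AST2Vec-style mapping (assumed form): represent an AST node's
    position as a two-dimensional vector, here (tree depth, sibling index)."""
    return (depth, sibling_index)

def antecedent_prioritized_loss(node_losses, decay=0.9):
    """Weighted average of per-node losses taken in generation order.

    node_losses[i] is the cross-entropy of the i-th generated AST node.
    Earlier (antecedent) nodes receive larger weights (decay ** i), so
    mistakes on antecedent predictions are penalized more heavily than
    the same mistakes made later in the tree.
    """
    weights = [decay ** i for i in range(len(node_losses))]
    total = sum(w * loss for w, loss in zip(weights, node_losses))
    return total / sum(weights)

# The same single error hurts more when it occurs early in generation:
early_error = [2.0, 0.1, 0.1, 0.1]
late_error = [0.1, 0.1, 0.1, 2.0]
assert antecedent_prioritized_loss(early_error) > antecedent_prioritized_loss(late_error)
```

Any monotonically decreasing weighting over generation order would capture the same intuition; the decay factor simply controls how strongly antecedent predictions dominate.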


