Exploring Dynamic Selection of Branch Expansion Orders for Code Generation

06/01/2021 ∙ by Hui Jiang, et al. ∙ Xiamen University Dalian University of Technology Tencent 0

Due to the great potential in facilitating software development, code generation has attracted increasing attention recently. Generally, dominant models are Seq2Tree models, which convert the input natural language description into a sequence of tree-construction actions corresponding to the pre-order traversal of an Abstract Syntax Tree (AST). However, such a traversal order may not be suitable for handling all multi-branch nodes. In this paper, we propose to equip the Seq2Tree model with a context-based Branch Selector, which is able to dynamically determine optimal expansion orders of branches for multi-branch nodes. Particularly, since the selection of expansion orders is a non-differentiable multi-step operation, we optimize the selector through reinforcement learning, and formulate the reward function as the difference of model losses obtained through different expansion orders. Experimental results and in-depth analysis on several commonly-used datasets demonstrate the effectiveness and generality of our approach. We have released our code at https://github.com/DeepLearnXMU/CG-RL.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Code generation aims at automatically generating a source code snippet given a natural language (NL) description, which has attracted increasing attention recently due to its potential value in simplifying programming. Instead of modeling the abstract syntax tree (AST) of code snippets directly, most of methods for code generation convert AST into a sequence of tree-construction actions. This allows for using natural language generation (NLG) models, such as the widely-used encoder-decoder models, and obtains great success

Ling et al. (2016); Dong and Lapata (2016, 2018); Rabinovich et al. (2017); Yin and Neubig (2017, 2018, 2019); Hayati et al. (2018); Sun et al. (2019, 2020); Wei et al. (2019); Shin et al. (2019); Xu et al. (2020); Xie et al. (2021). Specifically, an encoder is first used to learn word-level semantic representations of the input NL description. Then, a decoder outputs a sequence of tree-construction actions, with which the corresponding AST is generated through pre-order traversal. Finally, the generated AST is mapped into surface codes via certain deterministic functions.

Generally, during the generation of dominant Seq2Tree models based on pre-order traversal, branches of each multi-branch nodes are expanded in a left-to-right order. Figure 1 gives an example of the NL-to-Code conversion conducted by a Seq2Tree model. At the timestep , the model generates a multi-branch node using the action with the grammar containing three fields: type, name, and body. Thus, during the subsequent generation process, the model expands the node of to sequentially generate several branches in a left-to-right order, corresponding to the three fields of . The left-to-right order is a conventional bias for most human-beings to handle multi-branch nodes, which, however, may not be optimal for expanding branches. Alternatively, if we first expand the field name to generate a branch, which can inform us the name ‘e’, it will be easier to expand the field type with a ‘Exception’ branch due to the high co-occurrence of ‘e’ and ‘Exception’.

To verify this conjecture, we choose TRANX Yin and Neubig (2018) to construct a variant: TRANX-R2L, which conducts depth-first generation in a right-to-left manner, and then compare their performance on the DJANGO dataset. We find that about 93.4% of ASTs contain multi-branch nodes, and 17.38% of AST nodes are multi-branch ones. Table 1 reports the experimental results. We can observe that 8.47% and 7.66% of multi-branch nodes can only be correctly handled by TRANX and TRANX-R2L, respectively. Therefore, we conclude that different multi-branch nodes have different optimal branch expansion orders, which can be dynamically selected based on context to improve the performance of conventional Seq2Tree models.

Only TRANX 8.47
Only TRANX-R2L 7.66
Table 1: The percentages of multi-branch nodes, which can only be correctly handled by different models. TRANX-R2L is a variant of TRANX Yin and Neubig (2018), which handles multi-branch nodes in a right-to-left order.

In this paper, we explore dynamic selection of branch expansion orders for code generation. Specifically, we propose to equip the conventional Seq2Tree model with a context-based Branch Selector, which dynamically quantifies the priorities of expanding different branches for multi-branch nodes during AST generations. However, such a non-differentiable multi-step operation poses a challenge to the model training. To deal with this issue, we apply reinforcement learning to train the extended Seq2Tree model. Particularly, we augment the conventional training objective with a reward function, which is based on the model training loss between different expansion orders of branches. In this way, the model is trained to determine optimal expansion orders of branches for multi-branch nodes, which will contribute to AST generations.

To summarize, the major contributions of our work are three-fold:

  • Through in-depth analysis, we point out that different orders of branch expansion are suitable for handling different multi-branch AST nodes, and thus dynamic selection of branch expansion orders has the potential to improve conventional Seq2Tree models.

  • We propose to incorporate a context-based Branch Selector into the conventional Seq2Tree model and then employ reinforcement learning to train the extended model. To the best of our knowledge, our work is the first attempt to explore dynamic selection of branch expansion orders for code generation.

  • Experimental results and in-depth analyses demonstrate the effectiveness and generality of our model on various datasets.

Figure 1: An example of code generation using the conventional Seq2Tree model in pre-order traversal.

2 Background

As shown in Figure 1, the procedure of code generation can be decomposed into three stages. Based on the learned semantic representations of the input NL utterance, the dominant Seq2Tree model Yin and Neubig (2018) first outputs a sequence of abstract syntax description language (ASDL) grammar-based actions. These actions can then be used to construct an AST following the pre-order traversal. Finally, the generated AST is mapped into surface code via a user-specified function AST_to_MR().

In the following subsections, we first describe the basic ASDL grammars of Seq2Tree models. Then, we introduce the details of TRANX Yin and Neubig (2018), which is selected as our basic model due to its extensive applications and competitive performance Yin and Neubig (2019); Shin et al. (2019); Xu et al. (2020). 111Please note that our approach is also applicable to other Seq2Tree models.

2.1 ASDL Grammar

Formally, an ASDL grammar contains two components: type and constructors. The value of type can be composite or primitive. As shown in the ‘’ and ‘’ parts of Figure 1, a constructor specifies a language component of a particular type using its fields, e.g., ExceptHandler (expr? type, expr? name, stmt body). Each field specifies the type of its child node and contains a cardinality (single, optional ? and sequential ) indicating the number of child nodes it holds. For instance, expr? name denotes the field name has optional child node. The field with composite type (e.g. expr) can be instantiated by constructors of corresponding type, while the field with primitive type (e.g. identifier) directly stores token.

There are three kinds of ASDL grammar-based actions that can be used to generate the action sequence: 1) ApplyConstr[]. Using this action, a constructor is applied to the composite field of the parent node with the same type as , expanding the field to generate a branch ending with an AST node. Here we denote the field of the parent node as frontier field. 2) Reduce. It indicates the completion of generating branches for a field with optional or multiple cardinalities. 3) GenToken[]. It expands a primitive frontier field to generate a token .

Obviously, a constructor with multiple fields can produce multiple AST branches222We also note that the field with sequential cardinality will be expanded to multiple branches. However, in this work, we do not consider this scenario, which is left as future work., of which generation order has important effect on the model performance, as previously mentioned.

Figure 2:

The reinforced training of the extended TRANX model with branch selector. We first fed the information of field and parent node into branch selector. Then, from the policy probability distribution of branch selector, we sample an order

and infer an order . Finally, we calculate the reward based on the model loss difference between and , and use the gradients to update parameters of the extended model.

2.2 Seq2Tree Model

Similar to other NLG models, TRANX is trained to minimize the following objective function:


where is the -th action, and is modeled by an attentional encoder-decoder network Yin and Neubig (2018).

For an NL description =, we use a BiLSTM encoder to learn its word-level hidden states. Likewise, the decoder is also an LSTM network. Formally, at the timestep , the temporary hidden state is updated as


where is the embedding of the previous action , is the previous decoder hidden state, and

is a concatenated vector involving the embedding of the frontier field and the decoder hidden state for the parent node. Furthermore, the decoder hidden state

is defined as


where is the context vector produced from the encoder hidden states and is a parameter matrix.

Here, we calculate the probability of action

according to the type of its frontier field:

  • Composite. We adopt an ApplyConstr action to expand the field or a Reduce action to complete the field.333Reduce action can be considered as a special ApplyConstr action The probability of using ApplyConstr[] is defined as follows:


    where denotes the embedding of the constructor .

  • Primitive. We apply a GenToken action to produce a token , which is either generated from the vocabulary or copied from the input NL description. Formally, the probability of using GenToken[] can be decomposed into two parts:


    where is modeled as .

Please note that our proposed dynamic selection of branch expansion orders does not affect other aspects of the model.

3 Dynamic Selection of Branch Expansion Orders

In this section, we extend the conventional Seq2Tree model with a context-based branch selector, which dynamically determines optimal expansion orders of branches for multi-branch AST nodes. In the following subsections, we first illustrate the elaborately-designed branch selector module and then introduce how to train the extended Seq2Tree model via reinforcement learning in detail.

3.1 Branch Selector

As described in Section 2.2, the action prediction at each timestep is mainly affected by its previous action, frontier field and the action of its parent node. Thus, it is reasonable to construct the branch selector determining optimal expansion orders of branches according to these three kinds of information.

Specifically, given a multi-branch node at timestep , where the ASDL grammar of action contains fields , we feed the branch selector with three vectors: 1) : the embedding of field , 2) : the embedding of action , 3) : the decoder hidden state, and then calculate the priority score of expanding fields as follows:


where and are learnable parameters.444We omit the bias term for clarity.

Afterwards, we normalize priority scores of expanding all fields into a probability distribution:


Based on the above probability distribution, we can sample times to form a branch expansion order , of which the policy probability is computed as


It is notable that during the sampling of , we mask previously sampled fields to ensure that duplicate fields will not be sampled.

3.2 Training with Reinforcement Learning

During the generation of ASTs, with the above context-based branch selector, we deal with multi-branch nodes according to the dynamically determined order instead of the standard left-to-right order. However, the non-differentiability of multi-step expansion order selection and how to determine the optimal expansion order lead to challenges for the model training. To deal with these issues, we introduce reinforcement learning to train the extended Seq2Tree model in an end-to-end way.

Concretely, we first pre-train a conventional Seq2Tree model. Then, we employ self-critical training with a reward function that measures loss difference between different branch expansion orders to train the extended Seq2Tree model.

3.2.1 Pre-training

It is known that a well-initialized network is very important for applying reinforcement learning Kang et al. (2020). In this work, we require the model to automatically quantify effects of different branch expansion orders on the quality of the generated action sequences. Therefore, we expect that the model has the basic ability to generate action sequences in random order at the beginning. To do this, instead of using the pre-order traversal based action sequences, we use the randomly-organized action sequences to pre-train the Seq2Tree model.

Concretely, for each multi-branch node in an AST, we sample a branch expansion order from a uniform distribution, and then reorganize the corresponding actions according to the sampled order. We conduct the same operations to all multi-branch nodes of the AST, forming a new training instance. Finally, we use the regenerated training instances to pre-train our model.

In this way, the pre-trained Seq2Tree model acquires the preliminary capability to make predictions in any order.

3.2.2 Self-Critical Training

With the above initialized parameters, we then perform self-critical training Rennie et al. (2017); Kang et al. (2020) to update the Seq2Tree model with branch selector.

Specifically, we train the extended Seq2Tree model by combining the MLE objective and RL objective together. Formally, given the training instance , we first apply the sampling method described in section 3.1 to all multi-branch nodes, reorganizing the initial action sequence to form a new action sequence , and then define the model training objective as


where denotes the conventional training objective defined in Equation 1, is the negative expected reward of branch expansion order for the multi-branch node , is a balancing hyper-parameter, denotes the set of multi-branch nodes in the training instance, and denotes the parameter set of our enhanced model.

More specifically, is defined as


where we approximate the expected reward with the loss of an order sampled from the policy .

Inspired by successful applications of self-critical training in previous studies Rennie et al. (2017); Kang et al. (2020), we propose the reward to accurately measure the effect of any order on the model performance. As shown in Figure 2, we calculate the reward using two expansion orders of branches: one is sampled from the policy , and the other is inferred from the policy with the maximal generation probability:


Please note that we extend the standard reward function by setting a threshold to clip the reward, which can prevent the network from being over-confident in current expansion order of branches.

Finally, we apply the REINFORCE algorithm Williams (1992) to compute the gradient:

Acc. Acc. Acc. BLEU / Acc.
COARSE2FINE Dong and Lapata (2018) 87.7 88.2
TRANX Yin and Neubig (2019) 77.3 0.4 87.6 0.1 88.8 1.0 24.35 0.4 / 2.5 0.7
TREEGEN Sun et al. (2020) 88.1 0.6
TRANX 77.2 0.6 87.6 0.4 88.8 1.0 24.38 0.5 / 2.2 0.5
TRANX (w/ pre-train) 77.5 0.4 87.8 0.7 88.41.1 24.57 0.5 / 1.4 0.3
TRANX-R2L 75.9 0.8 87.5 0.9 86.4 1.0 24.88 0.5 / 2.4 0.5
TRANX-RAND 74.6 1.1 86.4 1.4 81.7 1.8 19.73 1.1 / 1.6 0.6
TRANX-RL (w/o pre-train) 76.3 0.7 87.2 0.8 87.1 1.6 23.38 0.8 / 2.1 0.2
TRANX-RL 77.9 0.5 89.1 0.5 89.5 1.2 25.47 0.7 / 2.6 0.4
Table 2:

The performance of our model in comparison with various baselines. We report the mean performance and standard deviation over five random runs.

indicates the scores are previously reported ones. Note that we only report the result of TREEGEN on ATIS, since it is the only dataset with released code for preprocessing.

4 Experiments

To investigate the effectiveness and generalizability of our model, we carry out experiments on several commonly-used datasets.

4.1 Datasets

Following previous studies Yin and Neubig (2018, 2019); Xu et al. (2020), we use the following four datasets:

  • DJANGO Oda et al. (2015). This dataset totally contains 18,805 lines of Python source code, which are extracted from the Django Web framework, and each line is paired with an NL description.

  • ATIS. This dataset is a set of 5,410 inquiries of flight information, where the input of each example is an NL description and its corresponding output is a short piece of code in lambda calculus.

  • GEO. It is a collection of 880 U.S. geographical questions, with meaning representations defined in lambda logical forms like ATIS.

  • CONALA Yin et al. (2018). It totally consists of 2,879 examples of manually annotated NL questions and their Python solutions on STACK OVERFLOW. Compared with DJANGO, the examples of CONALA cover real-world NL queries issued by programmers with diverse intents, and are significantly more difficult due to its broad coverage and high compositionality of target meaning representations.

4.2 Baselines

To facilitate the descriptions of experimental results, we refer to the enhanced TRANX model as TRANX-RL. In addition to TRANX, we compare our enhanced model with several competitive models:

  • TRANX (w/ pre-train). It is an enhanced version of TRANX with pre-training. We compare with it because our model involves a pre-training stage.

  • COARSE2FINE Dong and Lapata (2018). This model adopts a two-stage decoding strategy to produce the action sequence. It first generates a rough sketch of its meaning, and then fills in missing detail.

  • TREEGEN Sun et al. (2020). It introduces the attention mechanism of Transformer Vaswani et al. (2017), and a novel AST reader to incorporate grammar and AST structures into the network.

  • TRANX-R2L. It is a variant of the conventional TRANX model, which deals with multi-branch AST nodes in a right-to-left manner.

  • TRANX-RAND. It is also a variant of the conventional TRANX model dealing with multi-branch AST nodes in a random order.

  • TRANX-RL (w/o pre-train). In this variant of TRANX-RL, we train our model from scratch. By doing so, we can discuss the effect of pre-training on our model training.

To ensure fair comparisons, we use the same experimental setup as TRANX Yin and Neubig (2018)

. Concretely, the sizes of action embedding, field embedding and hidden states are set to 128, 128 and 256, respectively. For decoding, the beam sizes for GEO, ATIS, DJANGO and CONALA are 5, 5, 15 and 15, respectively. We pre-train models in 10 epochs for all datasets. we determine the

s as 1.0 according to the model performance on validation sets. As in previous studies Alvarez-Melis and Jaakkola (2017); Yin and Neubig (2018, 2019)

, we use the exact matching accuracy (Acc) as the evaluation metric for all datasets. For CONALA, we use the corpus-level BLEU

Yin et al. (2018) as a complementary metric.

Acc. Acc. Acc. Acc.
TRANX 77.260.8 94.020.8 89.750.8 25.190.6
TRANX-R2L 76.881.0 93.800.3 89.281.1 24.740.7
TRANX-RL 78.980.9 94.870.5 90.640.9 26.900.6
Table 3: Performance of our model in predicting actions for child nodes of multi-branch nodes.

4.3 Main Results

Table 2 reports the main experimental results. Overall, our enhanced model outperforms baselines across all datasets. Moreover, we can draw the following conclusions:

First, our reimplemented TRANX model achieves comparable performance to previously reported results Yin and Neubig (2019) (TRANX). Therefore, we confirm that our reimplemented TRANX model are convincing.

Second, compared with TRANX, TRANX-R2L and TRANX-RAND, our TRANX-RL exhibits better performance. This result demonstrates the advantage of dynamically determining branch expansion orders on dealing with multi-branch AST nodes.

Third, the TRANX model with pre-training does not gain a better performance. In contrast, removing the model pre-training leads to the performance degradation of our TRANX-RL model. This result is consistent with the conclusion of previous studies Wang et al. (2018); Kang et al. (2020) that the pre-training is very important for the applying reinforcement learning.

0 88.37 93.02 90.11
1 100 100 100
2 100 100 100
3 78.94 81.57 89.47
4 96.93 96.93 96.93
5 95.65 95.23 95.65
6 78.75 75.00 80.63
Table 4: Accuracy on different data groups of ATIS according to the number of multi-branch nodes.
0 98.30 91.52 97.67
1 90.00 90.00 90.00
2 85.50 84.70 86.17
3 66.66 63.60 67.81
4 54.16 48.33 57.50
5 28.88 26.66 28.88
6 12.35 12.35 12.35
Table 5: Accuracy on different data groups of DJANGO according to the number of multi-branch nodes.

4.4 Effects of the Number of Multi-branch Nodes

As implemented in related studies on other NLG tasks, such as machine translation Bahdanau et al. (2015), we individually split two relatively large datasets (DJANGO and ATIS) into different groups according to the number of multi-branch AST nodes, and report the performance of various models on these groups of datasets.

Tables 4 and 5 show the experimental results. On most groups, TRANX-RL achieves better or equal performance than other models. Therefore, we confirm that our model is general to datasets with different numbers of multi-branch nodes.

4.5 Accuracy of Action Predictions for the Child Nodes

Given a multi-branch node, its child nodes have an important influence in the subtree. Therefore, we focus on the accuracy of action predictions for the child nodes.

For fair comparison, we predict actions with previous ground-truth history actions as inputs. Table 3 reports the experimental results. We observe that TRANX-RL still achieves higher prediction accuracy than other baselines on most groups, which proves the effectiveness of our model again.

(a) The first example.
(b) The second example.
Figure 3: Two DJANGO examples produced by different models.

4.6 Case Study

Figure 3 shows two examples from DJANGO. In the first example, TRANX first generates the leftmost child node at the timestep , incorrectly predicting GenToken[‘gzip’] as Reduce action. By contrast, TRANX-RL puts this child node in the last position and successfully predict its action, since our model benefits from the previously generated token ‘GzipFile’ of the sibling node, which frequently occurs with ‘gzip’.

In the second example, TRANX incorrectly predicts the second child node at the -th timestep, while TRANX-RL firstly predicts it at the timestep . We think this error results from the sequentially generated nodes and the errors in early timesteps would accumulatively harm the predictions of later sibling nodes. By comparison, our model can flexibly generate subtrees with shorter lengths, alleviating error accumulation.

5 Related Work

With the prosperity of deep learning, researchers introduce neural networks into code generation. In this aspect,

Ling et al. (2016) first explore a Seq2Seq model for code generation. Then, due to the advantage of tree structure, many attempts resort to Seq2Tree models, which represent codes as trees of meaning representations Dong and Lapata (2016); Alvarez-Melis and Jaakkola (2017); Rabinovich et al. (2017); Yin and Neubig (2017, 2018); Sun et al. (2019, 2020).

Typically, Yin and Neubig (2018) propose TRANX, which introduces ASTs as intermediate representations of codes and has become the most influential Seq2Tree model. Then, Sun et al. (2019, 2020) respectively explore CNN and Transformer architectures to model code generation. Unlike these work, Shin et al. (2019) present a Seq2Tree model to generate program fragments or tokens interchangeably at each generation step. From another perspective, Xu et al. (2020) exploit external knowledge to enhance neural code generation model. Generally, all these Seq2Tree models generate ASTs in pre-order traversal, which, however, is not suitable to handle all multi-branch AST nodes. Different from the above studies that deal with multi-branch nodes in left-to-right order, our model determines the optimal expansion orders of branches for multi-branch nodes.

Some researchers have also noticed that the selection of decoding order has an important impact on the performance of neural code generation models. For example, Alvarez-Melis and Jaakkola (2017) introduce a doubly RNN model that combines width and depth recurrences to traverse each node. Dong and Lapata (2018) firstly generate a rough code sketch, and then fill in missing details by considering the input NL description and the sketch. Gu et al. (2019a) present an insertion-based Seq2Seq model that can flexibly generate a sequence in an arbitrary order. In general, these researches still deal with multi-branch AST nodes in a left-to-right manner. Thus, these models are theoretically compatible with our proposed branch selector.

Finally, it should be noted that have been many NLP studies on exploring other decoding methods to improve other NLG tasks Zhang et al. (2018); Su et al. (2019); Zhang et al. (2019); Welleck et al. (2019); Stern et al. (2019); Gu et al. (2019a, b). However, to the best of our knowledge, our work is the first attempt to explore dynamic selection of branch expansion orders for tree-structured decoding.

6 Conclusion and Future Work

In this work, we first point out that the generation of domainant Seq2Tree models based on pre-order traversal is not optimal for handling all multi-branch nodes. Then we propose an extended Seq2Tree model equipped with a context-based branch selector, which is capable of dynamically determining optimal branch expansion orders for multi-branch nodes. Particularly, we adopt reinforcement learning to train the whole model with an elaborate reward that measures the model loss difference between different branch expansion orders. Extensive experiment results and in-depth analyses demonstrate the effectiveness and generality of our proposed model on several commonly-used datasets.

In the future, we will study how to extend our branch selector to deal with indefinite branches caused by sequential field.


The project was supported by National Key Research and Development Program of China (Grant No. 2020AAA0108004), National Natural Science Foundation of China (Grant No. 61672440), Natural Science Foundation of Fujian Province of China (Grant No. 2020J06001), Youth Innovation Fund of Xiamen (Grant No. 3502Z20206059), and the Fundamental Research Funds for the Central Universities (Grant No. ZK20720200077). We also thank the reviewers for their insightful comments.