GAP-Gen: Guided Automatic Python Code Generation

01/19/2022
by Junchen Zhao, et al.

Automatic code generation from natural language descriptions can be highly beneficial during software development. In this work, we propose GAP-Gen, an automatic code generation method guided by Python syntactic and semantic constraints. We first introduce Python syntactic constraints in the form of Syntax-Flow, a simplified version of the Abstract Syntax Tree (AST) that reduces the AST's size and complexity while maintaining the crucial syntactic information of Python code. In addition to Syntax-Flow, we introduce Variable-Flow, which abstracts variable and function names consistently throughout the code. Rather than pre-training a model, we focus on modifying the fine-tuning process, which reduces computational requirements while retaining high generation performance on the automatic Python code generation task. GAP-Gen fine-tunes the transformer-based language models T5 and CodeT5 using the Code-to-Docstring datasets CodeSearchNet, CodeSearchNet AdvTest, and Code-Docstring-Corpus from EdinburghNLP. Our experiments show that GAP-Gen achieves better results on the automatic Python code generation task than previous works.
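The abstract does not give exact definitions of Syntax-Flow or Variable-Flow. The following is a minimal sketch of the two ideas as described, using only Python's standard `ast` module.

- Syntax-Flow is approximated as a pre-order sequence of AST node-type names with low-information context markers removed.
- Variable-Flow is approximated as a consistent renaming of user-defined function and variable names to placeholders.

The helper names `syntax_flow` and `variable_flow`, the `FUNC_n`/`VAR_n` placeholders, and the choice of which nodes to drop are all hypothetical illustrations, not the authors' implementation.

```python
import ast
import builtins

# Names that this Variable-Flow sketch leaves untouched (print, len, range, ...).
_BUILTINS = frozenset(dir(builtins))


def syntax_flow(source: str) -> list[str]:
    """Pre-order list of AST node-type names for `source`.

    Hypothetical simplification: expression-context markers
    (Load/Store/Del) are dropped because they add little structure.
    """
    flow: list[str] = []

    def visit(node: ast.AST) -> None:
        if isinstance(node, ast.expr_context):
            return  # skip Load/Store/Del markers
        flow.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return flow


class _Renamer(ast.NodeTransformer):
    """Consistently replaces user-defined names with FUNC_n / VAR_n."""

    def __init__(self) -> None:
        self.mapping: dict[str, str] = {}
        self.counts = {"FUNC": 0, "VAR": 0}

    def _placeholder(self, name: str, kind: str) -> str:
        # Reuse the first placeholder assigned to a name so that every
        # occurrence of that name is abstracted the same way.
        if name not in self.mapping:
            self.mapping[name] = f"{kind}_{self.counts[kind]}"
            self.counts[kind] += 1
        return self.mapping[name]

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        node.name = self._placeholder(node.name, "FUNC")
        self.generic_visit(node)  # then rename arguments and body
        return node

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_arg(self, node: ast.arg) -> ast.AST:
        node.arg = self._placeholder(node.arg, "VAR")
        self.generic_visit(node)  # visit any annotation
        return node

    def visit_Name(self, node: ast.Name) -> ast.AST:
        if node.id not in _BUILTINS:
            node.id = self._placeholder(node.id, "VAR")
        return node


def variable_flow(source: str) -> tuple[str, dict[str, str]]:
    """Return the abstracted source and the original-to-placeholder map."""
    renamer = _Renamer()
    tree = renamer.visit(ast.parse(source))
    return ast.unparse(tree), renamer.mapping


if __name__ == "__main__":
    src = (
        "def add_items(items, total):\n"
        "    for item in items:\n"
        "        total = total + item\n"
        "    return total\n"
    )
    print(syntax_flow(src))
    code, mapping = variable_flow(src)
    print(code)
    print(mapping)
```

The sketch requires Python 3.9 or later for `ast.unparse`. It deliberately ignores attribute names, imports, and class names. A fuller Variable-Flow would have to decide how to abstract those consistently as well.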

Related research
03/09/2023

A Syntax-Guided Multi-Task Learning Approach for Turducken-Style Code Generation

Due to the development of pre-trained language models, automated code ge...
02/14/2022

CodeGen-Test: An Automatic Code Generation Model Integrating Program Test Information

Automatic code generation is to generate the program code according to t...
03/14/2021

Improving Code Summarization with Block-wise Abstract Syntax Tree Splitting

Automatic code summarization frees software developers from the heavy bu...
04/28/2023

Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation

For a complicated algorithm, its implementation by a human programmer us...
10/07/2020

PyMT5: multi-mode translation of natural language and Python code with transformers

Simultaneously modeling source code and natural language has many exciti...
07/17/2019

Syntax and Stack Overflow: A methodology for extracting a corpus of syntax errors and fixes

One problem when studying how to find and fix syntax errors is how to ge...
12/20/2022

ReCode: Robustness Evaluation of Code Generation Models

Code generation models have achieved impressive performance. However, th...
