A Syntax-Guided Multi-Task Learning Approach for Turducken-Style Code Generation

03/09/2023
by Guang Yang, et al.

Driven by advances in pre-trained language models, automated code generation techniques have shown great promise in recent years. However, generated code often fails to satisfy the syntactic constraints of the target language, especially for Turducken-style code, in which declarative code snippets are embedded within imperative programs. In this study, we distill the lack of syntactic constraints into three significant challenges: (1) efficiently representing syntactic constraints, (2) effectively integrating syntactic information, and (3) designing a scalable syntax-first decoding algorithm. To address these challenges, we propose TurduckenGen, a syntax-guided multi-task learning approach. Specifically, we first explicitly append type information to the code tokens to capture a representation of the syntactic constraints. We then formalize code generation with the syntactic constraint representation as an auxiliary task, so that the model learns the syntactic constraints of the code. Finally, syntactically correct code is accurately selected from multiple candidates with the help of compiler feedback. Extensive experiments and comprehensive analysis against six state-of-the-art baselines on two Turducken-style code datasets demonstrate the effectiveness and general applicability of our approach. In addition, a human study found that the code generated by our approach is better than that of the baselines in terms of readability and semantic similarity.
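Two of the steps named in the abstract, appending type information to code tokens and selecting a syntactically correct candidate using compiler feedback, can be illustrated with a minimal sketch. This is not the TurduckenGen implementation: it assumes Python's own `tokenize` and `ast` modules as stand-ins for the tokenizer and compiler, and the function names (`annotate_tokens`, `select_candidate`), the `token<TYPE>` tag format, and the fallback to the first candidate are all hypothetical choices made for illustration.

```python
import ast
import io
import tokenize
from typing import List, Optional

# Layout-only tokens that carry no code text of their own.
_SKIP = {
    tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
    tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT,
}


def annotate_tokens(code: str) -> List[str]:
    """Append a lexical type tag to each token, e.g. 'x' -> 'x<NAME>'.

    Assumption: Python's lexical token types stand in for the type
    information described in the abstract.
    """
    tagged = []
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        if tok.type in _SKIP:
            continue
        tagged.append(f"{tok.string}<{tokenize.tok_name[tok.type]}>")
    return tagged


def is_syntactically_valid(code: str) -> bool:
    """Use the Python parser as a stand-in for compiler feedback."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False


def select_candidate(candidates: List[str]) -> Optional[str]:
    """Return the first candidate that parses.

    Assumption: candidates arrive ranked best-first (e.g. from beam
    search); if none parses, fall back to the top-ranked one.
    """
    for candidate in candidates:
        if is_syntactically_valid(candidate):
            return candidate
    return candidates[0] if candidates else None


if __name__ == "__main__":
    print(annotate_tokens("x = 1\n"))
    print(select_candidate(["def f(:", "def f(): pass"]))
```

In this sketch the parser only filters candidates after generation; the auxiliary-task training described in the abstract is a separate component and is not modeled here.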

research
05/18/2020

Syntax-guided Controlled Generation of Paraphrases

Given a sentence (e.g., "I like mangoes") and a constraint (e.g., sentim...
research
01/19/2022

GAP-Gen: Guided Automatic Python Code Generation

Automatic code generation from natural language descriptions can be high...
research
09/24/2021

SEED: Semantic Graph based Deep detection for type-4 clone

Background: Type-4 clones refer to a pair of code snippets with similar ...
research
12/29/2020

Multi-task Learning based Pre-trained Language Model for Code Completion

Code completion is one of the most useful features in the Integrated Dev...
research
09/16/2019

A Self-Attentional Neural Architecture for Code Completion with Multi-Task Learning

Code completion, one of the most useful features in the integrated devel...
research
08/22/2022

A Novel Multi-Task Learning Approach for Context-Sensitive Compound Type Identification in Sanskrit

The phenomenon of compounding is ubiquitous in Sanskrit. It serves for a...
research
05/09/2023

Learning to Parallelize with OpenMP by Augmented Heterogeneous AST Representation

Detecting parallelizable code regions is a challenging task, even for ex...