Incorporating Domain Knowledge through Task Augmentation for Front-End JavaScript Code Generation

08/22/2022
by Sijie Shen, et al.

Code generation aims to automatically generate a code snippet from a natural language description. Mainstream code generation methods rely on a large amount of paired training data, consisting of natural language descriptions and the corresponding code. However, in some domain-specific scenarios, building such a large paired corpus is difficult because no paired data is directly available, and writing code descriptions manually to construct a high-quality training dataset takes considerable effort. With limited training data, the generation model cannot be trained well and is likely to overfit, making its performance unsatisfactory for real-world use. To this end, in this paper, we propose a task augmentation method that incorporates domain knowledge into code generation models through auxiliary tasks, and a Subtoken-TranX model that extends the original TranX model to support subtoken-level code generation. To verify our proposed approach, we collect a real-world code generation dataset and conduct experiments on it. Our experimental results demonstrate that the subtoken-level TranX model outperforms the original TranX model and the Transformer model on our dataset, and that the exact match accuracy of Subtoken-TranX improves significantly by 12.75% with the help of our task augmentation method. The model performance on several code categories satisfies the requirements for application in industrial systems. Our proposed approach has been adopted by Alibaba's BizCook platform. To the best of our knowledge, this is the first domain code generation system adopted in industrial development environments.
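To make "subtoken-level" concrete, the sketch below splits identifiers into smaller pieces so that rare names are composed of frequent parts. The abstract does not say which splitting scheme Subtoken-TranX uses; the camelCase and underscore rule here, and the helper name `split_identifier`, are assumptions chosen only for illustration.

```python
import re
from typing import List

# Assumed splitting rule (not taken from the paper): break identifiers at
# underscores, camelCase boundaries, acronym runs, and digit runs.
_SUBTOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_identifier(name: str) -> List[str]:
    """Hypothetical helper: return the subtokens of a single identifier."""
    return _SUBTOKEN.findall(name)


if __name__ == "__main__":
    # Two different front-end identifiers end up sharing most of their
    # subtokens, so a model needs a much smaller output vocabulary.
    for identifier in ("getElementById", "getElementsByClassName"):
        print(identifier, "->", split_identifier(identifier))
```

With a rule like this, an identifier the model has never seen as a whole can still be produced piece by piece, which is the usual motivation for generating code at the subtoken level when training data is limited.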


