Incorporating External Knowledge through Pre-training for Natural Language to Code Generation

04/20/2020
by Frank F. Xu, et al.

Open-domain code generation aims to generate code in a general-purpose programming language (such as Python) from natural language (NL) intents. Motivated by the intuition that developers usually retrieve resources on the web when writing code, we explore the effectiveness of incorporating two varieties of external knowledge into NL-to-code generation: NL-code pairs automatically mined from the online programming QA forum StackOverflow, and programming language API documentation. Our evaluations show that combining the two sources with data augmentation and retrieval-based data re-sampling improves the current state-of-the-art by up to 2.2% absolute BLEU score on the code generation testbed CoNaLa. The code and resources are available at https://github.com/neulab/external-knowledge-codegen.
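
The abstract names retrieval-based data re-sampling without spelling out the procedure. One plausible reading is sketched below: each NL intent retrieves its most similar API documentation entries, and documentation entries are then re-sampled in proportion to how often they were retrieved. This is an illustrative sketch only, not the authors' implementation. The Jaccard token-overlap retriever, the `temperature` parameter, and all function names are assumptions introduced here.

```python
import random
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization into a set (assumed, deliberately simple)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Token-overlap similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieval_counts(intents, docs, k=1):
    """Count how often each doc index appears in the top-k results for an intent."""
    doc_tokens = [tokenize(d) for d in docs]
    counts = Counter()
    for intent in intents:
        query = tokenize(intent)
        ranked = sorted(
            range(len(docs)),
            key=lambda i: jaccard(query, doc_tokens[i]),
            reverse=True,
        )
        counts.update(ranked[:k])
    return counts


def resampling_weights(counts, num_docs, temperature=1.0):
    """Turn retrieval counts into a sampling distribution over docs.

    A temperature above 1 flattens the distribution (hypothetical knob).
    """
    raw = [counts.get(i, 0) ** (1.0 / temperature) for i in range(num_docs)]
    total = sum(raw)
    if total == 0:
        # Nothing was retrieved: fall back to uniform sampling.
        return [1.0 / num_docs] * num_docs
    return [r / total for r in raw]


def resample(docs, weights, n, seed=0):
    """Draw n docs with replacement according to the weights."""
    rng = random.Random(seed)
    return rng.choices(docs, weights=weights, k=n)


# Toy data, invented for illustration only.
intents = [
    "sort a list in reverse order",
    "read a file line by line",
    "sort list of dicts by key",
]
docs = [
    "sorted iterable key reverse return a new sorted list",
    "open file mode return a file object",
    "math.sqrt x return the square root of x",
]

counts = retrieval_counts(intents, docs, k=1)
weights = resampling_weights(counts, len(docs))
sample = resample(docs, weights, n=5)
```

In this toy setup, the documentation entry that no intent retrieves receives zero weight, so the re-sampled data leans toward entries that resemble what users actually ask about.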


