
Lyra: A Benchmark for Turducken-Style Code Generation

08/27/2021
by Qingyuan Liang, et al.
Peking University

Code generation is crucial for reducing manual software development effort. Recently, neural techniques have been used to generate source code automatically. While promising, these approaches are evaluated only on tasks that generate code in a single programming language. In actual development, however, one programming language is often embedded in another. For example, SQL statements are often embedded as strings in base programming languages such as Python and Java, and JavaScript programs are often embedded in server-side programming languages such as PHP, Java, and Python. We call this turducken-style programming. In this paper, we define a new code generation task: given a natural language comment, generate a program in a base language with an embedded language. To our knowledge, this is the first turducken-style code generation task. For this task, we present Lyra: a dataset in Python with embedded SQL. This dataset contains 2,000 carefully annotated database manipulation programs from real-world projects. Each program is paired with both a Chinese comment and an English comment. In our experiment, we adopted Transformer, a state-of-the-art technique, as the baseline. In the best setting, Transformer achieves 0.5 matching accuracy using Chinese and English comments, respectively. Therefore, we believe that Lyra provides a new challenge for code generation.
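
To make the task concrete, the following is a minimal sketch of what a turducken-style program might look like: a Python function (base language) whose body contains a SQL query (embedded language) as a string, with a natural language comment describing the intent. This example is hypothetical and is not taken from the Lyra dataset; the table `users`, the function `get_user_email`, and the use of the standard-library `sqlite3` module are assumptions made only for illustration.

```python
import sqlite3


def get_user_email(conn, user_id):
    """Return the email of the user with the given id, or None if no such user exists."""
    # The SQL statement below is the embedded language: it lives inside
    # the Python program as a plain string with a bound parameter.
    row = conn.execute(
        "SELECT email FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    # The base language (Python) handles control flow and the result.
    return row[0] if row is not None else None


# Hypothetical in-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute(
    "INSERT INTO users (id, email) VALUES (?, ?)", (1, "alice@example.com")
)
```

In this framing, the docstring plays the role of the natural language comment, and a model would need to produce both the Python control flow and the SQL string from it.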


