Asking Clarification Questions for Code Generation in General-Purpose Programming Language

12/19/2022
by Haau-Sing Li, et al.

Code generation from text requires understanding the user's intent from a natural language description (NLD) and generating an executable code snippet that satisfies this intent. While recent pretrained language models (PLMs) demonstrate remarkable performance on this task, they fail when the given NLD is ambiguous, i.e., when it lacks the specifications needed to generate a high-quality code snippet. In this work, we introduce a novel and more realistic setup for the task: we hypothesize that ambiguities in an NLD's specifications can be resolved by asking clarification questions (CQs). We therefore collect and introduce a new dataset, CodeClarQA, containing NLD-code pairs annotated with clarification questions and answers (CQAs). We evaluate the performance of PLMs for code generation on our dataset. The empirical results support our hypothesis that clarifications lead to more precise generated code, as shown by improvements of 17.52 in BLEU, 12.72 in CodeBLEU, and 7.7% in exact match. Alongside this, our task and dataset introduce new challenges to the community, including deciding when and which CQs should be asked.
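To make the proposed setup concrete, here is a minimal sketch, not the authors' code, of how CQA pairs might be appended to an NLD before conditioning a code-generation model, together with the exact-match and BLEU metrics reported above. The input format, helper names, and the toy example are hypothetical; CodeBLEU additionally requires the reference evaluation scripts and is omitted here.

    # A minimal sketch (assumed, not from the paper): append clarification
    # question-answer (CQA) pairs to the natural language description (NLD)
    # and score generated code with exact match and token-level BLEU.
    from nltk.translate.bleu_score import sentence_bleu

    def build_input(nld: str, cqas: list[tuple[str, str]]) -> str:
        """Concatenate an NLD with its CQA pairs into one model input."""
        clarifications = " ".join(f"Q: {q} A: {a}" for q, a in cqas)
        return f"{nld} {clarifications}".strip()

    def exact_match(pred: str, ref: str) -> bool:
        """Whitespace-normalized exact match between prediction and reference."""
        return " ".join(pred.split()) == " ".join(ref.split())

    def bleu(pred: str, ref: str) -> float:
        """Bigram BLEU on whitespace tokens, so short toy snippets stay meaningful."""
        return sentence_bleu([ref.split()], pred.split(), weights=(0.5, 0.5))

    # Hypothetical example in the spirit of the task:
    nld = "Sort the input list."
    cqas = [("Ascending or descending order?", "Descending.")]
    print(build_input(nld, cqas))
    # -> "Sort the input list. Q: Ascending or descending order? A: Descending."
    pred = ref = "result = sorted(xs, reverse=True)"
    print(exact_match(pred, ref), bleu(pred, ref))  # -> True 1.0

The clarify-then-generate encoding above is one plausible realization of the setup; the paper's reported gains come from conditioning PLMs on such clarified inputs rather than on the ambiguous NLD alone.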


