CGEMs: A Metric Model for Automatic Code Generation using GPT-3

08/23/2021
by Aishwarya Narasimhan, et al.

Today, AI technology is showing its strengths in almost every industry and walk of life. NLP in particular is widely used for text generation, text summarization, chatbots, and more. One such paradigm is automatic code generation. An AI system could generate anything; hence its output space is unconstrained. A self-driving car is driven for 100 million miles to validate its safety, but tests cannot be written to monitor and cover an unconstrained space. One solution for validating AI-generated content is to constrain the problem and convert it from abstract to realistic, which can be accomplished either by validating the unconstrained algorithm using theoretical proofs or by using Monte Carlo simulation methods. Here we take the latter approach and test/validate a statistically significant number of samples. This hypothesis of validating AI-generated code is the main motive of this work, and to determine whether AI-generated code is reliable, a metric model, CGEMs, is proposed. This is an extremely challenging task, as programs can implement the same logic under different naming conventions, yet the metrics must capture the structure and logic of the program. This is similar to the importance grammar carries in AI-based text generation, Q&A, translation, etc. The metrics gathered in this work to support the evaluation of generated code are as follows: compilation, conversion of the NL description to logic, number of edits needed, some commonly used static-code metrics, and NLP metrics. These metrics are applied to 80 programs generated using OpenAI's GPT-3. A neural network is then designed for binary classification (acceptable/not acceptable quality of the generated code). The inputs to this network are the feature values obtained from the metrics. The model achieves a classification accuracy of 76.92%.
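To make the pipeline concrete, here is a minimal Python sketch of the kind of metric-features-to-classifier flow the abstract describes. The specific metric functions (compiles, ast_node_count, edit_count) and the network shape are illustrative assumptions, not the paper's exact feature set or architecture.

```python
# Illustrative sketch of a CGEMs-style pipeline: compute a few metric
# features for a generated snippet, then score it with a small binary
# classifier. Metric choices and layer sizes are hypothetical.
import ast

import torch
import torch.nn as nn


def compiles(source: str) -> float:
    """Compilation metric: 1.0 if the snippet compiles, else 0.0."""
    try:
        compile(source, "<generated>", "exec")
        return 1.0
    except SyntaxError:
        return 0.0


def ast_node_count(source: str) -> float:
    """A simple stand-in for a static-code metric (program size/structure)."""
    try:
        return float(sum(1 for _ in ast.walk(ast.parse(source))))
    except SyntaxError:
        return 0.0


def edit_count(generated: str, accepted: str) -> float:
    """Rough count of line edits needed to reach an accepted version."""
    gen_lines, acc_lines = generated.splitlines(), accepted.splitlines()
    changed = sum(a != b for a, b in zip(gen_lines, acc_lines))
    return float(changed + abs(len(gen_lines) - len(acc_lines)))


class AcceptabilityNet(nn.Module):
    """Binary classifier over the metric feature vector (hypothetical sizes)."""

    def __init__(self, n_features: int = 3, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # logit: acceptable vs. not acceptable
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


# Usage: score one generated snippet (untrained model, so ~0.5).
snippet = "def add(a, b):\n    return a + b\n"
features = torch.tensor([[compiles(snippet),
                          ast_node_count(snippet),
                          edit_count(snippet, snippet)]])
model = AcceptabilityNet()
prob_acceptable = torch.sigmoid(model(features)).item()
print(f"P(acceptable) = {prob_acceptable:.2f}")
```

In the paper's setting, such a classifier would be trained on labeled examples (acceptable/not acceptable) of GPT-3-generated code, with one feature per metric.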
