A Simple, Yet Effective Approach to Finding Biases in Code Generation

10/31/2022
by Spyridon Mouselinos, et al.

Recently, scores of high-performing code generation systems have surfaced. As in many other domains, code generation is now commonly approached with a large language model at its core, trained under a masked or causal language modeling objective. This work shows that current code generation systems exhibit biases inherited from their large language model backbones, and that these biases can leak into the generated code under specific circumstances. To investigate the effect, we propose a framework that automatically removes hints and exposes the various biases these code generation models rely on. We apply our framework to three coding challenges and test it across top-performing code generation models. Our experiments reveal biases toward specific prompt structures and the exploitation of keywords during code generation. Finally, we demonstrate how to use our framework as a data transformation technique, which we find to be a promising direction toward more robust code generation.
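The abstract does not say how hints are removed. As a hedged illustration only, assume a HumanEval-style Python prompt (a function signature plus docstring) and treat two things as hints: a descriptive function name and doctest examples inside the docstring. The hypothetical `strip_hints` helper below renames the function to a neutral identifier (`func`, an arbitrary choice) and drops `>>>` example lines together with their expected outputs. It is a sketch under these assumptions, not the authors' framework.

```python
import ast
import re


def strip_hints(prompt: str, neutral_name: str = "func") -> str:
    """Hypothetical hint-removal transform for a Python function prompt.

    Assumptions: the prompt parses as Python, contains at least one top-level
    function, uses triple double quotes for its docstring, and writes any
    docstring examples in doctest '>>>' syntax.
    """
    tree = ast.parse(prompt)
    fn = next(node for node in tree.body if isinstance(node, ast.FunctionDef))

    # Hint 1 (assumed): a descriptive function name. Replace it everywhere,
    # matching whole words only so other identifiers containing it survive.
    renamed = re.sub(rf"\b{re.escape(fn.name)}\b", neutral_name, prompt)

    # Hint 2 (assumed): doctest examples. Drop each '>>>' line and the
    # expected-output lines after it, up to a blank line or closing quotes.
    kept, in_example = [], False
    for line in renamed.splitlines():
        stripped = line.strip()
        if stripped.startswith(">>>"):
            in_example = True
            continue
        if in_example and stripped and '"""' not in stripped:
            continue
        in_example = False
        kept.append(line)
    return "\n".join(kept) + "\n"


prompt = '''def add_two_numbers(a, b):
    """Return the sum of a and b.
    >>> add_two_numbers(1, 2)
    3
    """
'''
print(strip_hints(prompt))
# def func(a, b):
#     """Return the sum of a and b.
#     """
```

Comparing a model's pass rate on original versus transformed prompts is one way to probe whether it leans on such surface cues. Applying a transform like this to training prompts would be one reading of the data-transformation use the abstract mentions, though the paper's exact procedure is not given here.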
