Is Model Attention Aligned with Human Attention? An Empirical Study on Large Language Models for Code Generation

06/02/2023
by Bonan Kou, et al.

Large Language Models (LLMs) have been demonstrated to be effective for code generation. Due to the complexity and opacity of LLMs, little is known about how these models generate code. To deepen our understanding, we investigate whether LLMs attend to the same parts of a natural language description as human programmers during code generation. An analysis of five LLMs on a popular benchmark, HumanEval, revealed a consistent misalignment between the attention of LLMs and that of human programmers. Furthermore, we found no correlation between the code generation accuracy of LLMs and their alignment with human programmers. Through a quantitative experiment and a user study, we confirmed that, among twelve attention computation methods, attention computed by the perturbation-based method aligns best with human attention and is consistently favored by human programmers. Our findings highlight the need for human-aligned LLMs for better interpretability and programmer trust.
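
The abstract does not spell out how perturbation-based attention is computed, so the sketch below shows one common formulation only: a prompt word's importance is taken to be the drop in the model's log-probability of its original completion when that word is removed. The model name, helper functions, and word-level splitting are illustrative assumptions, not the authors' implementation.

# A minimal sketch (not the authors' implementation) of perturbation-based
# attention: a prompt word's importance is measured as the drop in the
# model's log-probability of its original completion when that word is
# removed. The model name and word-level splitting are assumptions made
# for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Salesforce/codegen-350M-mono"  # assumed example model
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

def completion_logprob(prompt: str, completion: str) -> float:
    # Sum of log-probabilities the model assigns to `completion` given `prompt`.
    # (Token counts are approximate at the prompt/completion boundary.)
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    total = 0.0
    for pos in range(prompt_len, full_ids.shape[1]):
        total += log_probs[pos - 1, full_ids[0, pos]].item()
    return total

def perturbation_attention(prompt_words: list[str], completion: str) -> list[float]:
    # Importance score of each prompt word: how much the completion's
    # log-probability falls when that single word is deleted.
    base = completion_logprob(" ".join(prompt_words), completion)
    scores = []
    for i in range(len(prompt_words)):
        perturbed = " ".join(prompt_words[:i] + prompt_words[i + 1:])
        scores.append(base - completion_logprob(perturbed, completion))
    return scores

# Example: which words of a task description matter most for a given solution?
words = "Return the sum of all even numbers in the list".split()
print(perturbation_attention(words, "\n    return sum(x for x in nums if x % 2 == 0)"))

The resulting per-word scores can then be compared against human annotations of which words programmers consider important, which is the kind of alignment the study measures.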

