A Static Evaluation of Code Completion by Large Language Models

06/05/2023
by Hantian Ding, et al.

Large language models trained on code have shown great potential to increase the productivity of software developers. Several execution-based benchmarks have been proposed to evaluate the functional correctness of model-generated code on simple programming problems. Nevertheless, it is expensive to perform the same evaluation on complex real-world projects given the cost of execution. In contrast, static analysis tools such as linters, which can detect errors without running the program, have not been well explored for evaluating code generation models. In this work, we propose a static evaluation framework to quantify static errors in Python code completions by leveraging Abstract Syntax Trees. Compared with execution-based evaluation, our method is not only more efficient but also applicable to code in the wild. For experiments, we collect code context from open-source repositories to generate one million function bodies using public models. Our static analysis reveals that Undefined Name and Unused Variable are the most common errors made by language models. Through extensive studies, we also show the impact of sampling temperature, model size, and context on static errors in code completions.
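To illustrate the general idea behind AST-based static checking (a simplified sketch, not the paper's actual framework), the example below uses Python's standard ast module to flag coarse "Undefined Name" and "Unused Variable" findings in a completed snippet. The function rough_static_check and its heuristics are hypothetical; production linters such as Pyflakes handle scoping, imports, and builtins far more carefully.

```python
import ast
import builtins


def rough_static_check(source: str) -> list[str]:
    """Return coarse static findings for a Python snippet (illustration only)."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"Syntax Error: {exc.msg} (line {exc.lineno})"]

    assigned, loaded, bound = set(), set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            # Track names that are written (Store) vs. read (Load).
            if isinstance(node.ctx, ast.Store):
                assigned.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Function names and parameters also bind names, but we do not
            # report them as unused variables.
            bound.add(node.name)
            bound.update(arg.arg for arg in node.args.args)

    known = assigned | bound | set(dir(builtins))
    findings = [f"Undefined Name: {name}" for name in sorted(loaded - known)]
    findings += [f"Unused Variable: {name}" for name in sorted(assigned - loaded)]
    return findings


if __name__ == "__main__":
    # A model-completed function with a typo ('totl') and unused variables.
    snippet = (
        "def add_totals(rows):\n"
        "    total = 0\n"
        "    for row in rows:\n"
        "        subtotal = row.value\n"
        "    return totl\n"
    )
    for finding in rough_static_check(snippet):
        print(finding)
```

Running the sketch on the snippet above reports the typo "totl" as an Undefined Name and "total" and "subtotal" as Unused Variables, mirroring the two error categories the paper finds most common.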


