Exploring the Effectiveness of LLMs in Automated Logging Generation: An Empirical Study

07/12/2023
by Yichen Li, et al.

Automated logging statement generation techniques help developers write appropriate logging statements that document software behaviors. Current retrieval-based and learning-based logging methods fail to provide accurate logging statements in complex software. Although existing large language models (LLMs) might be a good fit for the task due to their great success in natural language generation and programming language comprehension, their effectiveness and generalization capabilities have not been explored. To this end, this paper performs the first extensive study on applying LLMs to logging statement generation. We build LogBench, the first logging statement generation dataset. On LogBench, we evaluate the effectiveness and generalization capabilities of eight state-of-the-art LLMs, including general-purpose and code-specific models ranging from 60M to 175B parameters in size. Specifically, we evaluate LLMs' logging effectiveness by studying 1) their ability to decide logging ingredients, 2) the impact of the internal characteristics of LLMs, and 3) the influence of external factors. We further evaluate LLMs' logging generalization capabilities using unseen data derived from code transformation techniques. Our study demonstrates that existing LLMs fall short of practical requirements for generating proper logging statement texts. We also disclose the impact of internal characteristics and external factors on LLMs in automated logging. In addition, we observe that existing LLMs cannot generalize to logging unseen code, revealing their unsatisfactory generalization capabilities. Based on our findings, we further discuss three implications that can enhance logging statement generation in the future: developing a unified metric for logging quality, incorporating shareable code knowledge into LLMs, and devising suitable prompts.
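To make the "logging ingredients" framing concrete, the sketch below decomposes a Java-style logging statement into its ingredients (log level, static message text, and logged variables) and compares a model prediction against a reference ingredient by ingredient. This is a minimal illustration of the idea, not the paper's actual evaluation code; the helper names and the token-overlap F1 proxy for text quality are assumptions made here for demonstration.

```python
import re

def parse_log(stmt):
    """Decompose a Java-style logging call (e.g. logger.info("...", x))
    into its ingredients: level, static text, and logged variables.
    Returns None if the statement does not match the expected shape."""
    m = re.match(
        r'\s*\w+\.(trace|debug|info|warn|error)\(\s*"([^"]*)"\s*(?:,\s*(.*))?\)\s*;?\s*$',
        stmt,
    )
    if not m:
        return None
    level, text, rest = m.group(1), m.group(2), m.group(3)
    variables = [v.strip() for v in rest.split(",")] if rest else []
    return {"level": level, "text": text, "vars": variables}

def ingredient_scores(pred, ref):
    """Compare a predicted logging statement with the reference,
    ingredient by ingredient: exact match on level and variable set,
    and a rough token-overlap F1 as a proxy for message-text quality."""
    p, r = parse_log(pred), parse_log(ref)
    level_ok = p["level"] == r["level"]
    vars_ok = set(p["vars"]) == set(r["vars"])
    pt, rt = set(p["text"].split()), set(r["text"].split())
    common = len(pt & rt)
    text_f1 = 2 * common / (len(pt) + len(rt)) if (pt or rt) else 1.0
    return level_ok, vars_ok, text_f1
```

For example, a prediction that logs the right variables and message but picks `warn` instead of `error` would score correctly on two ingredients and fail on the level, illustrating why a single surface-similarity metric can hide which ingredient the model got wrong.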


