Large Language Models in Fault Localisation

08/29/2023
by   Yonghao Wu, et al.

Large Language Models (LLMs) have shown promise in multiple software engineering tasks, including code generation, code summarisation, test generation and code repair. Fault localisation is essential for automatic program debugging and repair, and was showcased as a highlight at ChatGPT-4's launch event. Nevertheless, little work has examined LLMs' capabilities for fault localisation in large-scale open-source programs. To fill this gap, this paper presents an in-depth investigation into the fault localisation capability of ChatGPT-3.5 and ChatGPT-4, two state-of-the-art LLMs. Using the widely adopted Defects4J dataset, we compare the two LLMs with existing fault localisation techniques. We also investigate the stability and explainability of LLMs in fault localisation, as well as how prompt engineering and the length of the code context affect fault localisation effectiveness. Our findings demonstrate that, within a limited code context, ChatGPT-4 outperforms all the existing fault localisation methods. Additional error logs can further improve ChatGPT models' localisation accuracy and stability, with an average 46.9% improvement over the state-of-the-art baseline SmartFL in terms of the TOP-1 metric. However, performance declines dramatically when the code context expands to the class level, with ChatGPT models' effectiveness becoming inferior to the existing methods overall. Additionally, we observe that ChatGPT's explainability is unsatisfactory, with an accuracy rate of only approximately 30%. While ChatGPT can achieve effective fault localisation performance under certain conditions, evident limitations exist. Further research is imperative to fully harness the potential of LLMs like ChatGPT for practical fault localisation applications.
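
The abstract reports results in terms of the TOP-1 metric without defining it. As a rough illustration only, the sketch below assumes the definition commonly used in fault localisation studies: TOP-N counts the bugs for which at least one truly faulty line appears among the first N lines of a technique's ranked suspicious-line list. The function name `top_n`, the bug identifiers, and the line numbers are all hypothetical and are not taken from the paper or from Defects4J.

```python
from typing import Dict, List, Set


def top_n(rankings: Dict[str, List[int]],
          faulty: Dict[str, Set[int]],
          n: int = 1) -> int:
    """Count bugs with at least one faulty line in the first n ranked lines.

    Assumed definition of TOP-N; the abstract itself does not spell it out.
    """
    hits = 0
    for bug_id, ranked_lines in rankings.items():
        truly_faulty = faulty.get(bug_id, set())
        # A bug counts as localised if any of its top-n candidates is faulty.
        if any(line in truly_faulty for line in ranked_lines[:n]):
            hits += 1
    return hits


# Hypothetical ranked outputs (most suspicious line first) for three bugs.
rankings = {
    "bug-a": [12, 40, 7],
    "bug-b": [3, 18, 25],
    "bug-c": [9, 2, 5],
}
# Hypothetical ground-truth faulty lines for the same bugs.
faulty = {
    "bug-a": {12},
    "bug-b": {25},
    "bug-c": {2, 30},
}

for n in (1, 2, 3):
    print(f"TOP-{n}: {top_n(rankings, faulty, n)} of {len(rankings)} bugs")
```

Under this assumed definition, a higher TOP-1 count means the technique's single most suspicious line more often coincides with a real fault, which is the sense in which the abstract compares the ChatGPT models with SmartFL.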

