The Hitchhiker's Guide to Program Analysis: A Journey with Large Language Models

08/01/2023
by Haonan Li, et al.

Static analysis is a widely used technique in software engineering for identifying and mitigating bugs. However, a significant hurdle lies in balancing precision against scalability. Large Language Models (LLMs) offer a promising alternative, as recent advances demonstrate remarkable capabilities in comprehending, generating, and even debugging code. Yet the logic of bugs can be complex, requiring sophisticated reasoning and a large analysis scope spanning multiple functions. Therefore, at this point, LLMs are better used in an assistive role to complement static analysis. In this paper, we take a deep dive into the open space of LLM-assisted static analysis, using use-before-initialization (UBI) bugs as a case study. To this end, we develop LLift, a fully automated agent that interfaces with both a static analysis tool and an LLM. By carefully designing the agent and the prompts, we are able to overcome a number of challenges, including bug-specific modeling, the large problem scope, and the non-deterministic nature of LLMs. Tested in a real-world scenario analyzing nearly a thousand potential UBI bugs produced by static analysis, LLift demonstrates an extremely potent capability, showcasing high precision (50%) and identifying previously unknown UBI bugs in the Linux kernel. This research paves the way for new opportunities and methodologies in the use of LLMs for bug discovery in extensive, real-world datasets.


