FLAG: Finding Line Anomalies (in code) with Generative AI

06/22/2023
by Baleegh Ahmad, et al.

Code contains security and functional bugs. Identifying and localizing them is difficult and relies on human labor. In this work, we present a novel approach (FLAG) to assist human debuggers. FLAG is based on the lexical capabilities of generative AI, specifically Large Language Models (LLMs). We take a code file as input, then extract and regenerate each line within that file for self-comparison. By comparing the original code with an LLM-generated alternative, we can flag notable differences as anomalies for further inspection, with features such as distance from comments and LLM confidence also aiding this classification. This reduces the inspection search space for the designer. Unlike other automated approaches in this area, FLAG is language-agnostic, can work on incomplete (and even non-compiling) code, and requires no creation of security properties, functional tests, or definition of rules. We explore the features that help LLMs in this classification and evaluate the performance of FLAG on known bugs. We use 121 benchmarks across C, Python, and Verilog, each containing a known security or functional weakness. We conduct the experiments using two state-of-the-art LLMs, OpenAI's code-davinci-002 and gpt-3.5-turbo, but our approach may be used with other models. FLAG can identify 101 of the defects and helps reduce the search space to 12-17
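
The regenerate-and-compare loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `flag_lines`, the `regenerate` callable (a stand-in for the LLM call, assumed here to see the code before and after the target line), the use of `difflib.SequenceMatcher` as the difference measure, and the 0.9 threshold are all hypothetical choices. The other features mentioned in the abstract (distance from comments, LLM confidence) are omitted.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple

# Hypothetical type for the model call: given the lines before and after
# the target line, return the model's suggestion for that line.
Regenerate = Callable[[List[str], List[str]], str]


def flag_lines(
    lines: List[str],
    regenerate: Regenerate,
    threshold: float = 0.9,  # assumed cut-off, not taken from the paper
) -> List[Tuple[int, str, str, float]]:
    """Return (line_number, original, suggestion, similarity) for each
    line whose regenerated version differs notably from the original."""
    flagged = []
    for i, original in enumerate(lines):
        if not original.strip():
            continue  # blank lines carry nothing to compare
        suggestion = regenerate(lines[:i], lines[i + 1:])
        similarity = SequenceMatcher(
            None, original.strip(), suggestion.strip()
        ).ratio()
        if similarity < threshold:
            flagged.append((i + 1, original, suggestion, similarity))
    return flagged


# Offline stand-in for an LLM so the sketch runs without network access:
# it simply returns the corresponding line of a known-good version.
REFERENCE = [
    "def mean(xs):",
    "    total = sum(xs)",
    "    return total / len(xs)",
]


def toy_regenerate(prefix: List[str], suffix: List[str]) -> str:
    return REFERENCE[len(prefix)]


# Code under inspection, with an off-by-one bug on line 3.
SOURCE = [
    "def mean(xs):",
    "    total = sum(xs)",
    "    return total / (len(xs) + 1)",
]

for line_no, original, suggestion, similarity in flag_lines(SOURCE, toy_regenerate):
    print(f"line {line_no}: {original.strip()!r} vs {suggestion.strip()!r} "
          f"(similarity {similarity:.2f})")
```

In this toy run only the third line is flagged, so a reviewer would inspect one line instead of three. Because the loop is purely textual, it does not require the file to parse or compile, which is consistent with the language-agnostic claim above.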
