BLESER: Bug Localization Based on Enhanced Semantic Retrieval

09/08/2021
by Weiqin Zou, et al.

Static bug localization techniques that locate bugs at method granularity have gained much attention from both researchers and practitioners. For a static method-level bug localization technique, a key but challenging step is to fully capture the semantics of methods and bug reports. Existing studies mainly represent methods and bug reports in the same bag-of-words space, without considering the structural information of methods or the textual context of bug reports, which substantially degrades bug localization performance. To address this problem, we develop BLESER, a new bug localization technique based on enhanced semantic retrieval. Specifically, we use an AST-based code embedding model (which captures code structure better) to represent the semantics of methods, and word embedding models (which capture textual context better) to represent the semantics of bug reports. A deep learning model is then built on these enhanced semantic representations. During model building, we compare five typical word embedding models for representing bug reports and explore the usefulness of re-sampling and cost-sensitive strategies for handling the class imbalance problem. We evaluate BLESER on five Java projects from the Defects4J dataset. We find that: (1) Overall, the word embedding model ELMo outperformed the other four models (including word2vec, BERT, etc.) in supporting bug localization. (2) Among the four strategies for handling class imbalance, ROS (random over-sampling) performed much better than the other three (including random under-sampling, Focal Loss, etc.). (3) By integrating ELMo and ROS into BLESER, we achieve, for method-level bug localization, a MAP of 0.108-0.504, an MRR of 0.134-0.510, and an Accuracy@1 of 0.125-0.5 on the five Defects4J projects.
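For readers unfamiliar with the three reported metrics, the following is a minimal Python sketch of how MAP, MRR, and Accuracy@k are commonly computed over ranked method lists. It is not taken from the paper: the function names, the toy method identifiers, and the choice to normalize average precision by the number of buggy methods are assumptions made for illustration only.

```python
from typing import Hashable, List, Sequence, Set, Tuple

# One evaluation case: (methods ranked by the localizer, set of truly buggy methods).
# This data layout is an assumption for illustration, not the paper's format.
Case = Tuple[Sequence[Hashable], Set[Hashable]]


def reciprocal_rank(ranked: Sequence[Hashable], buggy: Set[Hashable]) -> float:
    """1 / rank of the first buggy method, or 0.0 if none is ranked."""
    for rank, method in enumerate(ranked, start=1):
        if method in buggy:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[Hashable], buggy: Set[Hashable]) -> float:
    """Mean of precision@rank over the ranks where a buggy method appears.

    Assumption: normalized by the total number of buggy methods.
    """
    if not buggy:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, method in enumerate(ranked, start=1):
        if method in buggy:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(buggy)


def accuracy_at_k(cases: List[Case], k: int) -> float:
    """Fraction of bug reports with at least one buggy method in the top k."""
    hit = sum(1 for ranked, buggy in cases if any(m in buggy for m in ranked[:k]))
    return hit / len(cases)


def mean_reciprocal_rank(cases: List[Case]) -> float:
    return sum(reciprocal_rank(r, b) for r, b in cases) / len(cases)


def mean_average_precision(cases: List[Case]) -> float:
    return sum(average_precision(r, b) for r, b in cases) / len(cases)


# Toy data: two bug reports with hypothetical method names.
cases: List[Case] = [
    (["a", "b", "c", "d"], {"b"}),   # first buggy method at rank 2
    (["x", "y", "z"], {"x", "z"}),   # buggy methods at ranks 1 and 3
]

mrr = mean_reciprocal_rank(cases)
map_score = mean_average_precision(cases)
acc1 = accuracy_at_k(cases, 1)
print(f"MRR={mrr:.3f} MAP={map_score:.3f} Accuracy@1={acc1:.3f}")
```

On the toy data, the first report contributes a reciprocal rank of 1/2 and the second a reciprocal rank of 1, so higher values on all three metrics mean buggy methods appear nearer the top of the ranking.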

research
08/17/2023

A Comparative Study of Text Embedding Models for Semantic Text Similarity in Bug Reports

Bug reports are an essential aspect of software development, and it is c...
research
04/19/2020

BuGL – A Cross-Language Dataset for Bug Localization

Bug Localization is the process of locating potential error-prone files ...
research
03/19/2021

Locating Faulty Methods with a Mixed RNN and Attention Model

IR-based fault localization approaches achieve promising results when l...
research
05/25/2023

Too Few Bug Reports? Exploring Data Augmentation for Improved Changeset-based Bug Localization

Modern Deep Learning (DL) architectures based on transformers (e.g., BER...
research
08/24/2023

Pre-training Code Representation with Semantic Flow Graph for Effective Bug Localization

Enlightened by the big success of pre-training in natural language proce...
research
01/14/2021

GloBug: Using Global Data in Fault Localization

Fault Localization (FL) is an important first step in software debugging...
research
10/29/2018

SMT-Based Refutation of Spurious Bug Reports in the Clang Static Analyzer

We describe and evaluate a bug refutation extension for the Clang Static...
