A Comparative Study of Text Embedding Models for Semantic Text Similarity in Bug Reports

08/17/2023
by Avinash Patil, et al.

Bug reports are an essential part of software development, and identifying and resolving them quickly is crucial to keeping software systems functioning consistently. Retrieving similar bug reports from an existing database can reduce the time and effort required to resolve bugs. In this paper, we compared the effectiveness of semantic textual similarity methods for retrieving similar bug reports based on a similarity score. We explored several embedding models: TF-IDF (baseline), FastText, Gensim, BERT, and ADA. To evaluate these models, we used the Software Defects Data, which contains bug reports for various software projects. Our experimental results showed that BERT generally outperformed the other models in terms of recall, followed by ADA, Gensim, FastText, and TF-IDF. Our study provides insights into the effectiveness of different embedding methods for retrieving similar bug reports and highlights the impact of selecting an appropriate one for this task. Our code is available on GitHub.
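To make the retrieval setup concrete, the following is a minimal sketch of similarity-based retrieval with a TF-IDF baseline and a recall-at-k check. It is not the authors' code: the function names (`tfidf_vectors`, `cosine`, `top_k`, `recall_at_k`), the whitespace tokenizer, the smoothed IDF formula, and the four toy bug reports are all assumptions made for illustration. In principle, swapping `tfidf_vectors` for any other embedding function would leave the rest of the pipeline unchanged.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption) keeps weights positive for common terms.
        vectors.append({t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(query_idx, vectors, k):
    """Indices of the k reports most similar to the query, excluding itself."""
    scores = [(cosine(vectors[query_idx], vec), i)
              for i, vec in enumerate(vectors) if i != query_idx]
    # Sort by descending similarity; break ties by index for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scores[:k]]


def recall_at_k(retrieved, relevant):
    """Fraction of the relevant reports that appear in the retrieved list."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)


# Toy, made-up bug reports (not taken from the Software Defects Data).
reports = [
    "app crashes when saving file to disk",
    "crash on saving a file to the disk",
    "login button does not respond on mobile",
    "mobile login button unresponsive after update",
]
vectors = tfidf_vectors(reports)
hits = top_k(0, vectors, k=1)
print(hits, recall_at_k(hits, relevant=[1]))
```

Here report 0 is the query and report 1 is assumed to be its only known similar report, so recall at k=1 measures whether the top-ranked result is that report. A purely lexical baseline like this one can only match reports that share surface tokens, which is the limitation that dense embedding models are meant to address.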


