Duplicate Bug Report Detection: How Far Are We?

12/01/2022
by Ting Zhang, et al.

Many Duplicate Bug Report Detection (DBRD) techniques have been proposed in the research literature, while industry uses other techniques. Unfortunately, there is insufficient comparison among them, and it is unclear how far we have come. This work fills this gap by comparing these techniques. To compare them, we first need a benchmark that can estimate how a tool would perform if applied in a realistic setting today. Thus, we first investigated potential biases that affect the fair comparison of the accuracy of DBRD techniques. Our experiments suggest that data age and issue tracking system choice make a significant difference. Based on these findings, we prepared a new benchmark. We then used it to evaluate DBRD techniques to better estimate how far we have come. Surprisingly, a simpler technique outperforms recently proposed sophisticated techniques on most projects in our benchmark. In addition, we compared the DBRD techniques proposed in research with those used in Mozilla and VSCode. Surprisingly, we observe that a simple technique already adopted in practice can achieve results comparable to those of a recently proposed research tool. Our study reflects on the current state of DBRD, and we share our insights to benefit future DBRD research.

