Improving Code Example Recommendations on Informal Documentation Using BERT and Query-Aware LSH: A Comparative Study

05/04/2023
by   Sajjad Rahmani, et al.

Code example recommendation has been studied extensively, both in the past and more recently, as a way to assist developers in their software development tasks. Developers often spend significant time searching the internet for relevant code examples, drawing on open-source projects and informal documentation. Informal documentation, such as Stack Overflow discussions and forums, can be invaluable for finding useful code examples. We have focused our research on Stack Overflow, a popular resource where software developers discuss a wide range of topics. To increase the quality of the recommended code examples, we have collected and recommended the best code examples in the Java programming language. Our approach uses BERT, a Large Language Model (LLM) for text representation that can effectively extract semantic information from textual data. In the first step, we used BERT to convert code examples into numerical vectors. We then applied Locality-Sensitive Hashing (LSH) to identify Approximate Nearest Neighbors (ANN). We implemented two variants of this approach: Random Hyperplane-based LSH and Query-Aware LSH. We compared the two algorithms on four metrics: HitRate, Mean Reciprocal Rank (MRR), Average Execution Time, and Relevance. The results show that the Query-Aware (QA) approach outperformed the Random Hyperplane-based (RH) approach in terms of HitRate. Specifically, the QA approach achieved a HitRate improvement of 20 relative to the RH approach. Creating hash tables and assigning data samples to buckets is at least four times faster with the QA approach than with the RH approach. The QA approach returns code examples within milliseconds, whereas the RH approach takes several seconds to recommend code examples.
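The Random Hyperplane-based variant and two of the evaluation metrics can be illustrated with a minimal sketch. It is not the authors' implementation. The short lists of floats stand in for BERT embeddings of code examples, all function names are hypothetical, and HitRate and MRR follow their common textbook definitions, which may differ in detail from the ones used in the study. Query-Aware LSH is not sketched here.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian normal vectors, one per hyperplane."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(v * h for v, h in zip(vector, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def build_index(vectors, hyperplanes):
    """Group vector ids into buckets keyed by their bit signature."""
    buckets = defaultdict(list)
    for idx, vec in enumerate(vectors):
        buckets[signature(vec, hyperplanes)].append(idx)
    return buckets


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def query(q, vectors, buckets, hyperplanes, k=3):
    """Rank only the candidates in the query's bucket by cosine similarity."""
    candidates = buckets.get(signature(q, hyperplanes), [])
    ranked = sorted(candidates, key=lambda i: cosine(q, vectors[i]), reverse=True)
    return ranked[:k]


def hit_rate(results, relevant):
    """Fraction of queries whose result list contains a relevant item."""
    hits = sum(1 for res, rel in zip(results, relevant) if set(res) & set(rel))
    return hits / len(results)


def mean_reciprocal_rank(results, relevant):
    """Average of 1/rank of the first relevant item (0 if none is found)."""
    total = 0.0
    for res, rel in zip(results, relevant):
        for rank, item in enumerate(res, start=1):
            if item in rel:
                total += 1.0 / rank
                break
    return total / len(results)
```

Vectors that point in similar directions tend to fall on the same side of most hyperplanes and therefore share a bucket, so a query is compared against only a small candidate set instead of the whole collection. A single hash table, as used here, can miss true neighbors; practical setups usually build several tables and merge their candidates.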

