Effective Reformulation of Query for Code Search using Crowdsourced Knowledge and Extra-Large Data Analytics

07/23/2018
by   Mohammad Masudur Rahman, et al.

Software developers frequently issue generic natural language queries for code search while using code search engines (e.g., GitHub native search, Krugle). Such queries often do not lead to any relevant results due to vocabulary mismatch problems. In this paper, we propose a novel technique that automatically identifies relevant and specific API classes from the Stack Overflow Q&A site for a programming task written as a natural language query, and then reformulates the query for improved code search. We first collect candidate API classes from Stack Overflow using pseudo-relevance feedback and two term weighting algorithms, and then rank the candidates using Borda count and semantic proximity between query keywords and the API classes. The semantic proximity has been determined by an analysis of 1.3 million questions and answers of Stack Overflow. Experiments using 310 code search queries report that our technique suggests relevant API classes with 48% precision and 58% recall, which are 32% and 48% higher, respectively, than those of the state-of-the-art. Comparisons with two state-of-the-art studies and three popular search engines (Google, Stack Overflow, and GitHub native search) report that our reformulated queries (1) outperform the queries of the state-of-the-art, and (2) significantly improve the code search results provided by these contemporary search engines.
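The candidate-ranking step mentioned in the abstract can be illustrated with a minimal Borda count sketch. This is not the authors' implementation: the function name `borda_rank`, the two example rankings, and the API class names are hypothetical, and the semantic proximity component is left out. It only shows how two ranked candidate lists, such as those produced by two term weighting algorithms, might be merged into one ordering.

```python
from collections import defaultdict


def borda_rank(ranked_lists):
    """Merge several ranked lists of candidates with a plain Borda count.

    In a list of length n, the top item earns n points, the next n - 1,
    and so on down to 1. Items missing from a list earn nothing from it.
    Returns (candidate, score) pairs, highest score first; ties are
    broken alphabetically so the output is deterministic.
    """
    scores = defaultdict(int)
    for ranking in ranked_lists:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - position
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Hypothetical candidate API classes, ranked by two different
    # (unnamed) term weighting schemes for the same query.
    by_weighting_a = ["ArrayList", "HashMap", "Iterator"]
    by_weighting_b = ["ArrayList", "Scanner", "HashMap"]

    for candidate, score in borda_rank([by_weighting_a, by_weighting_b]):
        print(candidate, score)
```

A candidate that both rankings place near the top accumulates the most points, so agreement between the two weighting schemes is rewarded over a high position in only one of them.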


