Searching by Code: a New SearchBySnippet Dataset and SnippeR Retrieval Model for Searching by Code Snippets

05/19/2023
by Ivan Sedykh, et al.

Code search is an important task that has seen many developments in recent years. However, previous work has mostly treated it as searching for code with a text query. We argue that using a code snippet (possibly with an associated traceback) as the query, and looking for answers that contain bug-fixing instructions and code samples, is a natural use case that existing approaches do not cover. Moreover, existing datasets use comments extracted from code rather than full-text descriptions as the text side, which makes them unsuitable for this use case. We present SearchBySnippet, a new dataset built from StackOverflow data that implements the search-by-code use case; in this setting, existing architectures fall short of even the simple BM25 baseline, including after fine-tuning. We also present SnippeR, a new single-encoder model that outperforms several strong baselines on SearchBySnippet, achieving a Recall@10 of 0.451. We propose the SearchBySnippet dataset and SnippeR as an important new benchmark for code search evaluation.
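To make the BM25 baseline and the Recall@10 metric concrete, here is a minimal sketch of a search-by-snippet setup: a code snippet plus traceback is the query, and candidate answers are ranked with Okapi BM25. It is an illustration under stated assumptions, not the authors' pipeline. The regex tokenizer, the parameters `k1=1.5` and `b=0.75`, the Lucene-style IDF, and the names `tokenize`, `BM25`, and `recall_at_k` are all hypothetical choices. Recall@k is taken as the per-query fraction of relevant answers in the top k, averaged over queries.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase identifier-like tokens from code, tracebacks, or prose.
    return [t.lower() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text)]


class BM25:
    """Okapi BM25 over a fixed corpus (k1, b are common defaults, not values from the paper)."""

    def __init__(self, corpus: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(d)) for d in corpus]
        self.lengths = [sum(d.values()) for d in self.docs]
        self.avgdl = sum(self.lengths) / len(self.docs)
        n = len(self.docs)
        df = Counter(term for d in self.docs for term in d)
        # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        doc, dl = self.docs[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            tf = doc.get(term, 0)
            if tf == 0:
                continue  # terms absent from the document contribute nothing
            norm = tf + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * tf * (self.k1 + 1) / norm
        return total

    def rank(self, query: str) -> list[int]:
        # Document indices sorted by descending BM25 score.
        return sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)


def recall_at_k(ranked: list[list[int]], relevant: list[set[int]], k: int = 10) -> float:
    # Mean over queries of |relevant ∩ top-k| / |relevant|.
    per_query = [len(set(r[:k]) & rel) / len(rel) for r, rel in zip(ranked, relevant)]
    return sum(per_query) / len(per_query)


if __name__ == "__main__":
    # Toy "answers" standing in for StackOverflow posts (invented for illustration).
    answers = [
        "KeyError when accessing a dict: use dict.get(key, default) instead of indexing",
        "IndexError list index out of range: check len(items) before indexing",
        "Install numpy with pip install numpy",
    ]
    engine = BM25(answers)
    query = "items = [1, 2]\nprint(items[5])\nIndexError: list index out of range"
    ranking = engine.rank(query)
    print(ranking[0])
    print(recall_at_k([ranking], [{1}], k=1))
```

On this toy corpus the IndexError answer (index 1) should rank first, since it shares most identifier tokens with the snippet and traceback. The example also shows why lexical matching is a strong baseline for snippet queries: error names and identifiers carry much of the signal.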


research
06/08/2017

Source Forager: A Search Engine for Similar Source Code

Developers spend a significant amount of time searching for code: e.g., ...
research
08/26/2019

Neural Code Search Evaluation Dataset

There has been an increase of interest in code search using natural lang...
research
10/28/2022

UniASM: Binary Code Similarity Detection without Fine-tuning

Binary code similarity detection (BCSD) is widely used in various binary...
research
02/25/2023

STACC: Code Comment Classification using SentenceTransformers

Code comments are a key resource for information about software artefact...
research
03/28/2019

Crowd Sourced Data Analysis: Mapping of Programming Concepts to Syntactical Patterns

Since programming concepts do not match their syntactic representations,...
research
09/03/2020

Detecting Bad Smells in Use Case Descriptions

Use case modeling is very popular to represent the functionality of the ...
research
10/21/2020

ProphetNet-Ads: A Looking Ahead Strategy for Generative Retrieval Models in Sponsored Search Engine

In a sponsored search engine, generative retrieval models are recently p...