Simplifying Deep-Learning-Based Model for Code Search

05/29/2020
by Chao Liu, et al.

To accelerate software development, developers frequently search for and reuse existing code snippets from large-scale codebases such as GitHub. Over the years, researchers have proposed many information retrieval (IR) based models for code search, which match keywords in the query against code text. However, these models fail to bridge the semantic gap between query and code. To address this challenge, Gu et al. proposed a deep-learning-based model named DeepCS. It jointly embeds method code and natural language descriptions into a shared vector space, where methods related to a natural language query are retrieved according to their vector similarities. However, DeepCS's working process is complicated and time-consuming. To overcome this issue, we propose a simplified model, CodeMatcher, that leverages IR techniques while maintaining many features of DeepCS. In general, CodeMatcher combines query keywords in their original order, performs a fuzzy search on the name and body strings of methods, and returns the best-matched methods, favoring those that match a longer sequence of the keywords used. We verified its effectiveness on a large-scale codebase with about 41k repositories. Experimental results showed the simplified model CodeMatcher outperforms DeepCS by 97 [...] is over 66 times faster than DeepCS. Besides, compared with the state-of-the-art IR-based model CodeHow, CodeMatcher also improves the MRR by 73 [...] deep-learning-based models is promising because they complement each other by nature; improving the quality of method naming helps code search, since the method name plays an important role in connecting query and code.
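The matching idea in the abstract (keep query keywords in their original order, search method names and bodies, and prefer methods that match more of the keywords) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `ordered_hits` and `search`, the greedy substring matching, and the ranking by name hits before body hits are all assumptions made for illustration.

```python
def ordered_hits(keywords, text):
    """Greedily count how many keywords occur in `text` in their original order.

    Assumption: a keyword "matches" if it appears as a case-insensitive
    substring after the previous match. Keywords that are not found are skipped.
    """
    text = text.lower()
    pos = 0
    hits = 0
    for kw in keywords:
        idx = text.find(kw.lower(), pos)
        if idx != -1:
            hits += 1
            pos = idx + len(kw)
    return hits


def search(query, methods, top_n=3):
    """Rank (name, body) pairs by ordered keyword hits.

    Assumption: hits in the method name count before hits in the body,
    and methods with no hits at all are dropped.
    """
    keywords = query.lower().split()
    scored = []
    for name, body in methods:
        name_hits = ordered_hits(keywords, name)
        body_hits = ordered_hits(keywords, body)
        if name_hits or body_hits:
            scored.append((name_hits, body_hits, name))
    # More name hits first, then more body hits, then name for a stable order.
    scored.sort(key=lambda s: (-s[0], -s[1], s[2]))
    return [name for _, _, name in scored[:top_n]]
```

With a hypothetical corpus containing methods named `readTextFile`, `writeFile`, and `parseJson`, the query "read text file" would rank `readTextFile` first, because all three keywords appear in its name in order. A substring-based ranker like this cannot relate a query to code that uses different vocabulary, which is the semantic gap the abstract attributes to IR-based models.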


