DeepAI AI Chat
Log In Sign Up

Simplifying Deep-Learning-Based Model for Code Search

by   Chao Liu, et al.
Singapore Management University
Zhejiang University
Queen's University
Monash University
Baidu, Inc.

To accelerate software development, developers frequently search and reuse existing code snippets from a large-scale codebase, e.g., GitHub. Over the years, researchers proposed many information retrieval (IR) based models for code search, which match keywords in query with code text. But they fail to connect the semantic gap between query and code. To conquer this challenge, Gu et al. proposed a deep-learning-based model named DeepCS. It jointly embeds method code and natural language description into a shared vector space, where methods related to a natural language query are retrieved according to their vector similarities. However, DeepCS' working process is complicated and time-consuming. To overcome this issue, we proposed a simplified model CodeMatcher that leverages the IR technique but maintains many features in DeepCS. Generally, CodeMatcher combines query keywords with the original order, performs a fuzzy search on name and body strings of methods, and returned the best-matched methods with the longer sequence of used keywords. We verified its effectiveness on a large-scale codebase with about 41k repositories. Experimental results showed the simplified model CodeMatcher outperforms DeepCS by 97 is over 66 times faster than DeepCS. Besides, comparing with the state-of-the-art IR-based model CodeHow, CodeMatcher also improves the MRR by 73 deep-learning-based models is promising because they compensate with each other by nature; improving the quality of method naming helps code search, since method name plays an important role in connecting query and code.


page 1

page 2

page 3

page 4


Clone-Seeker: Effective Code Clone Search Using Annotations

Source code search plays an important role in software development, e.g....

CSRS: Code Search with Relevance Matching and Semantic Matching

Developers often search and reuse existing code snippets in the process ...

Crowd Sourced Data Analysis: Mapping of Programming Concepts to Syntactical Patterns

Since programming concepts do not match their syntactic representations,...

Semantic Search by Latent Ontological Features

Both named entities and keywords are important in defining the content o...

Accelerating Code Search with Deep Hashing and Code Classification

Code search is to search reusable code snippets from source code corpus ...

PSCS: A Path-based Neural Model for Semantic Code Search

To obtain code snippets for reuse, programmers prefer to search for rela...

Leveraging Cognitive Search Patterns to Enhance Automated Natural Language Retrieval Performance

The search of information in large text repositories has been plagued by...