PSCS: A Path-based Neural Model for Semantic Code Search

by Zhensu Sun, et al.
Tongji University

To obtain code snippets for reuse, programmers prefer to search for related documents, e.g., blogs or Q&A, instead of the code itself. The major reason is the semantic diversity of, and mismatch between, queries and code snippets. Deep learning models have been proposed to address this challenge. Compared with approaches using information retrieval techniques, deep learning models do not suffer from the information loss caused by refining user intention into keywords. However, the performance of previous works is not satisfactory because they ignore the importance of code structure. When the semantics of code (e.g., identifier names, APIs) are ambiguous, code structure may be the only feature for the model to utilize. In that case, previous works have to relearn the structural information from the lexical tokens of code, which is extremely difficult for a model without any domain knowledge. In this work, we propose PSCS, a path-based neural model for semantic code search. Our model encodes both the semantics and structures of code represented by AST paths. We train and evaluate our model on 330k and 19k query-function pairs, respectively. The evaluation results demonstrate that PSCS achieves a SuccessRate of 47.6% and a Mean Reciprocal Rank (MRR) of 30.4%. The proposed approach significantly outperforms both DeepCS, the first approach that applies deep learning to the code search task, and CARLCS, a state-of-the-art approach that introduces a co-attentive representation learning model on the basis of DeepCS. The importance of code structure is demonstrated with an ablation study on code features, which can inform model design in further studies.
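The abstract reports SuccessRate and MRR without defining them. The sketch below shows how these two metrics are commonly computed in code search evaluations; it is an illustration, not the authors' evaluation script. The function names, the cutoff `k`, and the toy ranks are all assumptions introduced here.

```python
from typing import Optional, Sequence


def success_rate_at_k(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of queries whose correct snippet is ranked within the top k.

    `ranks` holds, per query, the 1-based rank of the correct snippet,
    or None if it was not retrieved at all.
    """
    if not ranks:
        return 0.0
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[Optional[int]], k: int) -> float:
    """Average of 1/rank over all queries; misses beyond k contribute 0."""
    if not ranks:
        return 0.0
    total = sum(1.0 / r for r in ranks if r is not None and r <= k)
    return total / len(ranks)


# Toy data (hypothetical): rank of the ground-truth function for five queries.
example_ranks = [1, 3, None, 12, 2]

# Three of the five queries have the correct function in the top 10.
print(success_rate_at_k(example_ranks, k=10))
# Only those three queries contribute reciprocal ranks: 1, 1/3, and 1/2.
print(mean_reciprocal_rank(example_ranks, k=10))
```

Under these definitions, SuccessRate only rewards getting the correct function into the result list, while MRR also rewards placing it near the top, which is why the two numbers in the abstract differ.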


CSRS: Code Search with Relevance Matching and Semantic Matching

Developers often search and reuse existing code snippets in the process ...

CSSAM: Code Search via Attention Matching of Code Semantics and Structures

Despite the continuous efforts in improving both the effectiveness and e...

Code Search based on Context-aware Code Translation

Code search is a widely used technique by developers during software dev...

CRaDLe: Deep Code Retrieval Based on Semantic Dependency Learning

Code retrieval is a common practice for programmers to reuse existing co...

Simplifying Deep-Learning-Based Model for Code Search

To accelerate software development, developers frequently search and reu...

SEED: Semantic Graph based Deep detection for type-4 clone

Background: Type-4 clones refer to a pair of code snippets with similar ...

Code Representation Learning with Prüfer Sequences

An effective and efficient encoding of the source code of a computer pro...
