BERT2Code: Can Pretrained Language Models be Leveraged for Code Search?

04/16/2021
by Abdullah Al Ishtiaq, et al.

Millions of repetitive code snippets are submitted to code repositories every day. Searching these large codebases with simple natural language queries would let programmers ideate, prototype, and develop more easily and quickly. Existing methods perform well when the natural language description contains keywords from the code, but they remain far behind at searching code based on the semantic meaning of the query and the semantic structure of the code. In recent years, both the natural language and programming language research communities have created techniques to embed their respective languages in vector spaces. In this work, we leverage these embedding models for semantic code search using a simple, lightweight 2-layer neural network. We show that our model learns the inherent relationship between the embedding spaces, and we further probe the scope for improvement by empirically analyzing the embedding methods. This analysis shows that the quality of the code embedding model is the bottleneck for our model's performance, and we discuss future directions of study in this area.
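
To make the idea of a small network bridging two embedding spaces concrete, here is a minimal sketch. It is not the authors' implementation. The abstract does not give layer sizes, the activation function, or the training objective, so the dimensions, the ReLU activation, the cosine-similarity ranking, and every function name below are illustrative assumptions. The weights are random and untrained, and random vectors stand in for real query and code embeddings.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for one dense layer; random init is an assumption."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, layer):
    """Apply one dense layer: W x + b."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def project_query(query_vec, hidden_layer, output_layer):
    """Two-layer feed-forward map from the query space to the code space."""
    # ReLU is an assumed activation; the abstract does not name one.
    hidden = [max(0.0, h) for h in linear(query_vec, hidden_layer)]
    return linear(hidden, output_layer)


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_snippets(query_vec, code_vecs, hidden_layer, output_layer):
    """Return snippet indices ordered from most to least similar."""
    projected = project_query(query_vec, hidden_layer, output_layer)
    scores = [cosine(projected, c) for c in code_vecs]
    return sorted(range(len(code_vecs)), key=lambda i: scores[i], reverse=True)


# Toy sizes chosen for readability; real embedding models use far larger vectors.
QUERY_DIM, HIDDEN_DIM, CODE_DIM = 8, 16, 6
rng = random.Random(0)
hidden_layer = make_layer(QUERY_DIM, HIDDEN_DIM, rng)
output_layer = make_layer(HIDDEN_DIM, CODE_DIM, rng)

# Stand-ins for a query embedding and five code-snippet embeddings.
query_vec = [rng.gauss(0, 1) for _ in range(QUERY_DIM)]
code_vecs = [[rng.gauss(0, 1) for _ in range(CODE_DIM)] for _ in range(5)]

ranking = rank_snippets(query_vec, code_vecs, hidden_layer, output_layer)
print(ranking)
```

In a trained version of this sketch, the two layers would be fitted on paired query and code embeddings so that a projected query lands near its matching snippet. Because the network only maps between fixed embeddings, its retrieval quality is capped by the embeddings themselves, which is consistent with the abstract's finding that the code embedding model is the bottleneck.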

