Searching a Database of Source Codes Using Contextualized Code Search

01/10/2020
by Rohan Mukherjee, et al.

We assume a database containing a large set of program source codes and consider the problem of contextualized code search over that database. A programmer has written some part of a program but has left part of it (such as a method or a function body) incomplete. The goal is to use the context surrounding the missing code to automatically determine which of the codes in the database would help the programmer complete it. The programmer could either re-purpose the retrieved code and use it to fill the missing spot in the program, or use the retrieved code as a model for implementing the missing code. The search is 'contextualized' in the sense that the search engine uses clues in the partially completed code to decide which database code is most useful; the user is not required to formulate an explicit query. We cast contextualized code search as a learning problem, where the goal is to learn a distribution function computing the likelihood that each database code completes the program, and we propose a neural model for predicting which database code is likely to be most useful. Because applying a neural model to each code in a database of millions or billions of codes at search time would be prohibitively expensive, one of our key technical concerns is ensuring a speedy search. We address this by learning a 'reverse encoder' that reduces the problem of evaluating each database code to computing a convolution of two normal distributions, making it possible to search a large database of codes in a reasonable time.
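To illustrate why reducing scoring to a convolution of two normal distributions makes search cheap, here is a minimal sketch, not the authors' implementation. It assumes (hypothetically) that both the query context and each database code have already been encoded as diagonal Gaussians over a shared latent space, and it uses the standard identity that the integral of the product of two Gaussian densities is itself a Gaussian density evaluated at the difference of the means, with the variances added. The names `log_gaussian_overlap`, `rank_codes`, and the toy database entries are illustrative assumptions.

```python
import math


def log_gaussian_overlap(mu_q, var_q, mu_c, var_c):
    """Log of the integral of N(z; mu_q, var_q) * N(z; mu_c, var_c) over z.

    For diagonal Gaussians this equals log N(mu_q; mu_c, var_q + var_c),
    so it is closed-form and needs no neural network call at search time.
    """
    total = 0.0
    for mq, vq, mc, vc in zip(mu_q, var_q, mu_c, var_c):
        v = vq + vc  # variances add under convolution
        total += -0.5 * (math.log(2.0 * math.pi * v) + (mq - mc) ** 2 / v)
    return total


def rank_codes(query, database, top_k=3):
    """Return the names of the top_k database codes by overlap score."""
    mu_q, var_q = query
    scored = [
        (log_gaussian_overlap(mu_q, var_q, mu_c, var_c), name)
        for name, (mu_c, var_c) in database.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical precomputed encodings: name -> (mean vector, variance vector).
database = {
    "read_file": ([0.9, 0.1], [0.05, 0.05]),
    "sort_list": ([-1.0, 0.8], [0.05, 0.05]),
    "http_get": ([0.2, -1.2], [0.05, 0.05]),
}

# Hypothetical encoding of the partially written program's context.
query = ([1.0, 0.0], [0.1, 0.1])

ranking = rank_codes(query, database, top_k=2)
print(ranking)
```

Because the per-code means and variances can be computed once, offline, each query only costs a few arithmetic operations per database entry, which is the kind of saving the abstract attributes to the reverse encoder.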


research
06/08/2017

Source Forager: A Search Engine for Similar Source Code

Developers spend a significant amount of time searching for code: e.g., ...
research
10/18/2021

The search of Type I codes

A self-dual binary linear code is called Type I code if it has singly-ev...
research
10/24/2022

Scalable Program Clone Search Through Spectral Analysis

We consider the problem of program clone search, i.e. given a target pro...
research
03/19/2021

A new method for constructing linear codes with small hulls

The hull of a linear code over finite fields is the intersection of the ...
research
12/01/2020

Latent Programmer: Discrete Latent Codes for Program Synthesis

In many sequence learning tasks, such as program synthesis and document ...
research
04/01/2022

Towards a machine-readable literature: finding relevant papers based on an uploaded powder diffraction pattern

We investigate a prototype application for machine-readable literature. ...
research
12/04/2018

Aroma: Code Recommendation via Structural Code Search

Programmers often write code which have similarity to existing code writ...
