Generation-Augmented Query Expansion For Code Retrieval

12/20/2022
by Dong Li, et al.

Pre-trained language models have achieved promising results in code retrieval, where a natural language documentation query is used to find the most relevant existing code snippet. However, existing models focus only on optimizing documentation-code pairs by embedding them into a latent space, without drawing on external knowledge. In this paper, we propose a generation-augmented query expansion framework. Inspired by the way humans retrieve information, sketching an answer before searching, we use a powerful code generation model to benefit the code retrieval task. Specifically, we demonstrate that, rather than retrieving the target code snippet from the documentation query alone, it helps to augment the documentation query with its generation counterpart: code snippets produced by the code generation model. To the best of our knowledge, this is the first attempt to leverage a code generation model to enhance code retrieval. We achieve new state-of-the-art results on the CodeSearchNet benchmark and surpass the baselines significantly.
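
The idea of augmenting a query with generated code before searching can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: `fake_generator` is a hypothetical stub standing in for a real code generation model, the bag-of-words cosine similarity stands in for a learned embedding model, and simple concatenation is only one possible way to combine the query with its generated counterpart.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Optional


def tokenize(text: str) -> List[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector (assumption; a real
    # system would use a pre-trained encoder instead).
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def expand_query(query: str, generate: Callable[[str], List[str]]) -> str:
    # Append the generated snippets to the documentation query
    # (concatenation is an illustrative choice).
    return "\n".join([query, *generate(query)])


def retrieve(
    query: str,
    corpus: List[str],
    generate: Optional[Callable[[str], List[str]]] = None,
) -> List[int]:
    # Return corpus indices sorted from most to least similar.
    text = expand_query(query, generate) if generate else query
    q = embed(text)
    scores = [cosine(q, embed(snippet)) for snippet in corpus]
    return sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)


def fake_generator(query: str) -> List[str]:
    # Hypothetical stub: pretend a code generation model sketched this answer.
    return [
        "def read_json(path):\n    with open(path) as f:\n        return json.load(f)"
    ]


corpus = [
    "def load(p):\n    with open(p) as f:\n        return json.load(f)",
    "def parse_file_contents(path):\n    return path.split('/')",
]
query = "parse the contents of a json file"

plain_ranking = retrieve(query, corpus)
expanded_ranking = retrieve(query, corpus, fake_generator)
print("plain:", plain_ranking)
print("expanded:", expanded_ranking)
```

In this toy corpus the plain query is drawn to the snippet whose identifier happens to share words with the documentation, while the expanded query shares code-level tokens (`open`, `json`, `load`) with the snippet that actually reads a JSON file. That gap between natural language and code vocabulary is what the generated counterpart is meant to bridge.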


research
02/24/2020

Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning

Code summarization generates brief natural language description given a ...
research
10/04/2022

Recitation-Augmented Language Models

We propose a new paradigm to help Large Language Models (LLMs) generate ...
research
10/16/2021

AugmentedCode: Examining the Effects of Natural Language Resources in Code Retrieval Models

Code retrieval is allowing software engineers to search codes through a ...
research
09/28/2022

FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation

Retrieval-augmented generation models offer many benefits over standalon...
research
02/17/2021

I Want This Product but Different : Multimodal Retrieval with Synthetic Query Expansion

This paper addresses the problem of media retrieval using a multimodal q...
research
06/05/2023

SelfEvolve: A Code Evolution Framework via Large Language Models

Large language models (LLMs) have already revolutionized code generation...
research
07/07/2022

Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling

This paper studies multi-task training of retrieval-augmented generation...
