SE Factual Knowledge in Frozen Giant Code Model: A Study on FQN and its Retrieval

12/16/2022
by Qing Huang, et al.

Pre-trained giant code models (PCMs) are starting to enter developers' daily practice. Understanding what types of software knowledge PCMs hold, and how much, is the foundation for incorporating them into software engineering (SE) tasks and fully realizing their potential. In this work, we conduct the first systematic study of SE factual knowledge in the state-of-the-art PCM Copilot, focusing on APIs' Fully Qualified Names (FQNs), the fundamental knowledge for effective code analysis, search, and reuse. Driven by the data distribution properties of FQNs, we design a novel lightweight in-context learning method on Copilot for FQN inference, which requires neither code compilation, as traditional methods do, nor gradient updates, as recent FQN prompt-tuning does. We systematically experiment with five in-context learning design factors to identify the best configuration for developers to adopt in practice. With this best configuration, we investigate the effects of the number of example prompts and of FQN data properties on Copilot's FQN inference capability. Our results confirm that Copilot stores diverse FQN knowledge and can be applied to FQN inference thanks to its high inference accuracy and its independence from code analysis. Based on our experience interacting with Copilot, we discuss various opportunities to improve human-Copilot interaction in the FQN inference task.
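
To make the idea of in-context learning for FQN inference concrete, the following is a minimal sketch of how a few-shot prompt might be assembled. It is not the prompt format used in the paper: the `FqnExample` type, the `build_fqn_prompt` helper, the comment-style layout, and the Java snippets are all hypothetical, chosen only for illustration. The sketch builds the prompt text and does not call Copilot or any other model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FqnExample:
    """One demonstration: a code line, a simple name in it, and that name's FQN."""
    code: str
    simple_name: str
    fqn: str


def build_fqn_prompt(examples, query_code, query_name):
    """Assemble a few-shot prompt; the last FQN slot is left empty for the model.

    The layout (Java-style comment lines) is an assumption made for this sketch.
    """
    blocks = []
    for ex in examples:
        blocks.append(
            f"// Code: {ex.code}\n"
            f"// Simple name: {ex.simple_name}\n"
            f"// FQN: {ex.fqn}"
        )
    # The query repeats the same pattern but stops right after "FQN:",
    # so a completion model would be expected to continue with the answer.
    blocks.append(
        f"// Code: {query_code}\n"
        f"// Simple name: {query_name}\n"
        f"// FQN:"
    )
    return "\n\n".join(blocks)


# Hypothetical demonstrations using well-known Java standard library classes.
examples = [
    FqnExample("List<String> xs = new ArrayList<>();", "ArrayList", "java.util.ArrayList"),
    FqnExample("File f = new File(path);", "File", "java.io.File"),
]
prompt = build_fqn_prompt(
    examples, "Map<String, Integer> m = new HashMap<>();", "HashMap"
)
print(prompt)
```

Because the prompt is plain text, neither compiling the partial code nor updating model weights is needed; only the choice and number of demonstrations changes between configurations.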


