SCELMo: Source Code Embeddings from Language Models

Continuous embeddings of tokens in computer programs have been used to support a variety of software development tools, including readability assessment, code search, and program repair. Contextual embeddings are common in natural language processing but have not previously been applied in software engineering. We introduce a new set of deep contextualized word representations for computer programs based on language models. We train these embeddings using the ELMo (embeddings from language models) framework of Peters et al. (2018) and investigate whether they are effective when fine-tuned for the downstream task of bug detection. We show that even a low-dimensional embedding trained on a relatively small corpus of programs can improve a state-of-the-art machine learning system for bug detection.
