A Literature Study of Embeddings on Source Code

04/05/2019
by Zimin Chen, et al.

Natural language processing has improved tremendously since the success of word embedding techniques such as word2vec. Recently, the same idea has been applied to source code with encouraging results. In this survey, we collect and discuss uses of word embedding techniques on programs and source code. The articles in this survey were collected by asking authors of related work and through an extensive search on Google Scholar. Each article is assigned to one of five categories: (1) embedding of tokens, (2) embedding of functions or methods, (3) embedding of sequences or sets of method calls, (4) embedding of binary code, and (5) other embeddings. We also provide links to experimental data and show some notable visualizations of code embeddings. In summary, word embedding has been successfully applied at different granularities of source code. With access to countless open-source repositories, we see great potential in applying other data-driven natural language processing techniques to source code in the future.


