Hierarchical Learning of Cross-Language Mappings through Distributed Vector Representations for Code

03/13/2018
by   Nghi D. Q. Bui, et al.
0

Translating a program written in one programming language to another can be useful for software development tasks that need functionality implementations in different languages. Although past studies have considered this problem, they may be either specific to the language grammars, or specific to certain kinds of code elements (e.g., tokens, phrases, API uses). This paper proposes a new approach to automatically learn cross-language representations for various kinds of structural code elements that may be used for program translation. Our key idea is two folded: First, we normalize and enrich code token streams with additional structural and semantic information, and train cross-language vector representations for the tokens (a.k.a. shared embeddings based on word2vec, a neural-network-based technique for producing word embeddings; Second, hierarchically from bottom up, we construct shared embeddings for code elements of higher levels of granularity (e.g., expressions, statements, methods) from the embeddings for their constituents, and then build mappings among code elements across languages based on similarities among embeddings. Our preliminary evaluations on about 40,000 Java and C# source files from 9 software projects show that our approach can automatically learn shared embeddings for various code elements in different languages and identify their cross-language mappings with reasonable Mean Average Precision scores. When compared with an existing tool for mapping library API methods, our approach identifies many more mappings accurately. The mapping results and code can be accessed at https://github.com/bdqnghi/hierarchical-programming-language-mapping. We believe that our idea for learning cross-language vector representations with code structural information can be a useful step towards automated program translation.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/10/2019

SAR: Learning Cross-Language API Mappings with Little Knowledge

To save manual effort, developers often translate programs from one prog...
research
02/27/2021

A Differential Testing Approach for Evaluating Abstract Syntax Tree Mapping Algorithms

Abstract syntax tree (AST) mapping algorithms are widely used to analyze...
research
10/11/2021

Searching for Replacement Classes

Software developers must often replace existing components in their syst...
research
04/25/2017

DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning

Computer programs written in one language are often required to be porte...
research
06/16/2021

Cross-Language Code Search using Static and Dynamic Analyses

As code search permeates most activities in software development,code-to...
research
10/30/2014

Tasks that Require, or can Benefit from, Matching Blank Nodes

In various domains and cases, we observe the creation and usage of infor...
research
07/21/2022

Toward a Generic Mapping Language for Transformations between RDF and Data Interchange Formats

While there exist approaches to integrate heterogeneous data using seman...

Please sign up or login with your details

Forgot password? Click here to reset