IR2Vec: A Flow Analysis based Scalable Infrastructure for Program Encodings

09/13/2019
by   Venkata Keerthy S, et al.
0

We propose IR2Vec, a Concise and Scalable encoding infrastructure to represent programs as a distributed embedding in continuous space. This distributed embedding is obtained by combining representation learning methods with data and control flow information to capture the syntax as well as the semantics of the input programs. Our embeddings are obtained from the Intermediate Representation (IR) of the source code, and are both language as well as machine independent. The entities of the IR are modelled as relationships, and their representations are learned to form a seed embedding vocabulary. This vocabulary is used along with the flow analyses information to form a hierarchy of encodings based on various levels of program abstractions. We show the effectiveness of our methodology on a software engineering task (program classification) as well as optimization tasks (Heterogeneous device mapping and Thread coarsening). The embeddings generated by IR2Vec outperform the existing methods in all the three tasks even when using simple machine learning models. As we follow an agglomerative method of forming encodings at various levels using seed embedding vocabulary, our encoding is naturally more scalable and not data-hungry when compared to the other methods.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/20/2022

Unleashing the Power of Compiler Intermediate Representation to Enhance Neural Program Embeddings

Neural program embeddings have demonstrated considerable promise in a ra...
research
06/19/2018

Neural Code Comprehension: A Learnable Representation of Code Semantics

With the recent success of embeddings in natural language processing, re...
research
03/18/2018

Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces

With the rise of machine learning, there is a great deal of interest in ...
research
09/15/2021

A Comparison of Code Embeddings and Beyond

Program representation learning is a fundamental task in software engine...
research
03/13/2019

SymPas: Symbolic Program Slicing

Program slicing is a technique for simplifying programs by focusing on s...
research
05/27/2019

COSET: A Benchmark for Evaluating Neural Program Embeddings

Neural program embedding can be helpful in analyzing large software, a t...
research
03/23/2020

ProGraML: Graph-based Deep Learning for Program Optimization and Analysis

The increasing complexity of computing systems places a tremendous burde...

Please sign up or login with your details

Forgot password? Click here to reset