FAIR: Flow Type-Aware Pre-Training of Compiler Intermediate Representations

09/09/2023
by Changan Niu, et al.

While most existing pre-trained models of code learn source code features such as code tokens and abstract syntax trees, other works focus on learning from compiler intermediate representations (IRs). Existing IR-based models typically utilize IR features such as instructions, control and data flow graphs (CDFGs), and call graphs. However, these methods conflate variable nodes and instruction nodes in a CDFG and fail to distinguish different types of flows, and the neural networks they use cannot capture long-distance dependencies and suffer from over-smoothing and over-squashing. To address these weaknesses, we propose FAIR, a Flow type-Aware pre-trained model for IR, which employs (1) a novel input representation of IR programs; (2) a Graph Transformer to address the over-smoothing, over-squashing, and long-distance dependency problems; and (3) five pre-training tasks specifically designed to enable FAIR to learn the semantics of IR tokens, flow type information, and the overall representation of an IR. Experimental results show that FAIR achieves state-of-the-art results on four code-related downstream tasks.
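The abstract does not spell out the input representation, but the core idea of a flow type-aware IR graph can be illustrated with a small, hypothetical sketch: instruction nodes and variable nodes are kept distinct, and every edge carries an explicit flow type (control, data, call) rather than all edges being treated uniformly. All names below are illustrative assumptions and are not taken from the paper.

```python
# Hypothetical sketch of a flow type-aware IR graph (not the paper's actual code).
# Instruction and variable nodes are separate node kinds; edges are labeled by flow type.
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class NodeKind(Enum):
    INSTRUCTION = auto()  # e.g. an LLVM IR instruction such as "%c = add i32 %a, %b"
    VARIABLE = auto()     # e.g. an SSA value or operand such as "%a"


class FlowType(Enum):
    CONTROL = auto()  # instruction -> instruction (execution order, branches)
    DATA = auto()     # variable -> instruction (use) or instruction -> variable (def)
    CALL = auto()     # call site -> callee entry instruction


@dataclass
class Node:
    idx: int
    kind: NodeKind
    text: str  # token text that would be fed to the model, e.g. "add" or "%a"


@dataclass
class Edge:
    src: int
    dst: int
    flow: FlowType  # the edge label a flow type-aware model would embed separately


@dataclass
class IRGraph:
    nodes: List[Node] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, kind: NodeKind, text: str) -> int:
        self.nodes.append(Node(len(self.nodes), kind, text))
        return self.nodes[-1].idx

    def add_edge(self, src: int, dst: int, flow: FlowType) -> None:
        self.edges.append(Edge(src, dst, flow))


# Toy example: "%c = add i32 %a, %b" followed by "ret i32 %c".
g = IRGraph()
a = g.add_node(NodeKind.VARIABLE, "%a")
b = g.add_node(NodeKind.VARIABLE, "%b")
c = g.add_node(NodeKind.VARIABLE, "%c")
add = g.add_node(NodeKind.INSTRUCTION, "add")
ret = g.add_node(NodeKind.INSTRUCTION, "ret")

g.add_edge(a, add, FlowType.DATA)       # operand use
g.add_edge(b, add, FlowType.DATA)       # operand use
g.add_edge(add, c, FlowType.DATA)       # result definition
g.add_edge(add, ret, FlowType.CONTROL)  # execution order
g.add_edge(c, ret, FlowType.DATA)       # %c used by ret

print(f"{len(g.nodes)} nodes, {len(g.edges)} typed edges")
```

Keeping flow types as explicit edge labels (rather than a single undifferentiated edge set, as in a plain CDFG) is what would allow a Graph Transformer to attend over each flow type differently, which is the distinction the abstract emphasizes.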


Related research

01/05/2022  SPT-Code: Sequence-to-Sequence Pre-Training for Learning Source Code Representations
Recent years have seen the successful application of large pre-trained m...

04/20/2022  Unleashing the Power of Compiler Intermediate Representation to Enhance Neural Program Embeddings
Neural program embeddings have demonstrated considerable promise in a ra...

12/10/2019  RVSDG: An Intermediate Representation for Optimizing Compilers
Intermediate Representations (IRs) are central to optimizing compilers a...

06/13/2023  TRACED: Execution-aware Pre-training for Source Code
Most existing pre-trained language models for source code focus on learn...

06/29/2022  Diet Code is Healthy: Simplifying Programs for Pre-Trained Models of Code
Pre-trained code representation models such as CodeBERT have demonstrate...

05/10/2021  How could Neural Networks understand Programs?
Semantic understanding of programs is a fundamental problem for programm...

05/24/2022  GraphQ IR: Unifying Semantic Parsing of Graph Query Language with Intermediate Representation
Subject to the semantic gap lying between natural and formal language, n...
