Understand Code Style: Efficient CNN-based Compiler Optimization Recognition System

01/18/2023
by   Shouguo Yang, et al.

Compiler optimization level recognition can be applied to vulnerability discovery and binary analysis. Because many different compilation optimization options exist, the resulting differences in binary file contents are very complicated. There are thousands of compiler optimization algorithms and multiple processor architectures, so it is very difficult to analyze binary files manually and recognize their compiler optimization levels with rules. This paper is the first to propose a CNN-based compiler optimization level recognition model: BinEye. The system extracts semantic and structural differences and automatically recognizes the compiler optimization level. The model is designed to suit binary file processing and to be easy to understand. We built a dataset containing 80,028 binary files for model training and testing. Our proposed model achieves an accuracy of over 97%. BinEye is a fully CNN-based system with a faster forward calculation speed, at least 8 times faster than a normal RNN-based model. By analyzing the model output, we found the differences in assembly code caused by different compiler optimization levels, which shows that the proposed model is interpretable. Based on our model, we propose a method to analyze the code differences caused by different compiler optimization levels, which offers useful guidance for analyzing closed-source compilers and for binary security analysis.
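To make the idea of a fully convolutional classifier over binary code concrete, here is a minimal sketch in plain Python. It is not the BinEye architecture, which the abstract does not specify: the four-level label set, the layer sizes, the names `init_params` and `forward`, and the untrained random weights are all assumptions for illustration. It shows only the general shape of such a model: embed each byte, slide convolution filters over the sequence, pool, and score each optimization level.

```python
import math
import random

# Assumed label set for illustration; the abstract does not list the levels.
LEVELS = ["O0", "O1", "O2", "O3"]


def init_params(embed_dim=4, kernel=3, filters=5, seed=0):
    """Build small random (untrained) weights for the toy model."""
    rng = random.Random(seed)

    def rand():
        return rng.uniform(-0.1, 0.1)

    return {
        # One embedding vector per possible byte value.
        "embed": [[rand() for _ in range(embed_dim)] for _ in range(256)],
        # filters x kernel x embed_dim convolution weights.
        "conv": [[[rand() for _ in range(embed_dim)] for _ in range(kernel)]
                 for _ in range(filters)],
        # One row of output weights per optimization level.
        "dense": [[rand() for _ in range(filters)] for _ in LEVELS],
    }


def forward(code_bytes, params):
    """Return one probability per entry of LEVELS for a byte string."""
    x = [params["embed"][b] for b in code_bytes]
    kernel = len(params["conv"][0])
    if len(x) < kernel:
        raise ValueError("input shorter than the convolution kernel")

    pooled = []
    for filt in params["conv"]:
        best = 0.0  # starting at 0 applies ReLU before the global max-pool
        # Each window is scored independently of every other window.
        for start in range(len(x) - kernel + 1):
            window = x[start:start + kernel]
            act = sum(w * v
                      for w_row, x_row in zip(filt, window)
                      for w, v in zip(w_row, x_row))
            best = max(best, act)
        pooled.append(best)

    logits = [sum(w * p for w, p in zip(row, pooled))
              for row in params["dense"]]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# x86-64 bytes for: push rbp; mov rbp, rsp; pop rbp; ret
params = init_params()
probs = forward(bytes([0x55, 0x48, 0x89, 0xE5, 0x5D, 0xC3]), params)
```

Because the weights are random, `probs` carries no meaning until the model is trained. The point of the sketch is structural: no convolution window depends on the result of another, whereas a recurrent model must process positions one after another, which is one plausible reason a CNN forward pass can be faster.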

