Revisiting Lightweight Compiler Provenance Recovery on ARM Binaries

05/06/2023
by Jason Kim, et al.

A binary's behavior is greatly influenced by how the compiler builds its source code. Although most compiler configuration details are abstracted away during compilation, recovering them is useful for reverse engineering and program comprehension tasks on unknown binaries, such as code similarity detection. Previous work has thoroughly explored this problem on x86-64 binaries. However, there has been limited investigation of ARM binaries, which are increasingly prevalent. In this paper, we extend previous work with a shallow-learning model that efficiently and accurately recovers compiler configuration properties for ARM binaries. We apply opcode- and register-derived features, which have previously been effective on x86-64 binaries, to ARM binaries. Furthermore, we compare this work with Pizzolotto et al., a recent architecture-agnostic model that uses deep learning, whose dataset and code are available. We observe that the lightweight features are reproducible on ARM binaries. We achieve over 99% accuracy, on par with state-of-the-art deep learning approaches, while achieving a 583-times speedup during training and a 3,826-times speedup during inference. Finally, we also discuss findings of overfitting that went undetected in prior work.
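
To make the idea of opcode- and register-derived features more concrete, here is a minimal sketch of how such features might be computed from a textual disassembly listing. It is an illustration only, not the authors' implementation: the function name `extract_features`, the `op:`/`reg:` key prefixes, the register regex, and the sample AArch64 listing are all assumptions introduced here. The resulting frequency vectors could then be passed to any shallow classifier.

```python
import re
from collections import Counter

# Assumed register pattern for AArch64 disassembly text (hypothetical, not the
# paper's tokenization): x/w registers with a 1-2 digit index, plus sp/xzr/wzr.
REGISTER_RE = re.compile(r"\b(?:[xw]\d{1,2}|sp|xzr|wzr)\b")


def extract_features(disassembly: str) -> dict[str, float]:
    """Return relative frequencies of opcodes and registers in a listing."""
    opcodes = Counter()
    registers = Counter()
    for line in disassembly.splitlines():
        line = line.strip()
        if not line:
            continue
        # The first whitespace-separated token is treated as the mnemonic.
        mnemonic, _, operands = line.partition(" ")
        opcodes[mnemonic.lower()] += 1
        registers.update(REGISTER_RE.findall(operands.lower()))

    features = {}
    total_ops = sum(opcodes.values())
    for name, count in opcodes.items():
        features[f"op:{name}"] = count / total_ops
    total_regs = sum(registers.values())
    for name, count in registers.items():
        features[f"reg:{name}"] = count / total_regs
    return features


# Hypothetical sample: a small function prologue/epilogue in AArch64 syntax.
SAMPLE = """\
stp x29, x30, [sp, #-16]!
mov x29, sp
ldr w0, [x29, #12]
add w0, w0, #1
ldp x29, x30, [sp], #16
ret
"""

if __name__ == "__main__":
    for key, value in sorted(extract_features(SAMPLE).items()):
        print(f"{key}: {value:.3f}")
```

Normalizing counts to frequencies keeps binaries of different sizes comparable. Whether the paper normalizes in this way is not stated in the abstract, so treat that choice as an assumption of this sketch.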

