Towards usable automated detection of CPU architecture and endianness for arbitrary binary files and object code sequences

08/15/2019
by Sami Kairajärvi, et al.

Static and dynamic binary analysis techniques are actively used to reverse engineer software's behavior and to detect its vulnerabilities, even when only the binary code is available for analysis. To avoid analysis errors due to misreading opcodes for the wrong CPU architecture, these analysis tools must precisely identify the Instruction Set Architecture (ISA) of the object code under analysis. The variety of CPU architectures that modern security and reverse engineering tools must support is ever increasing due to the massive proliferation of IoT devices and the diversity of firmware and malware targeting those devices. Recent studies concluded that misidentifying the binary code's ISA alone caused about 10% of IoT firmware analysis failures. The state-of-the-art approaches to detecting the ISA of arbitrary object code look promising: their results demonstrate effectiveness and high performance. However, they lack publicly available datasets and toolsets, which makes the evaluation, comparison, and improvement of those techniques, datasets, and machine learning models quite challenging (if not impossible). This paper bridges multiple gaps in the field of automated and precise identification of the architecture and endianness of binary files and object code. We develop from scratch the toolset and datasets that are lacking in this research space. As such, we contribute a comprehensive collection of open data, open source, and open API web services. We also attempt experiment reconstruction and cross-validation of the effectiveness, efficiency, and results of the state-of-the-art methods. When training and testing classifiers using solely code sections from executable binary files, all our classifiers performed equally well, achieving over 98% accuracy. The results are consistent and comparable with the current state of the art, and hence support the general validity of the algorithms.
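
The abstract does not say which features the classifiers consume, so the following is only an illustrative sketch under stated assumptions. It assumes a feature scheme that is common in this line of work: a 256-bin byte-frequency histogram of the code section, plus counts of a few two-byte patterns that tend to differ between big-endian and little-endian encodings of small integers. All function names are hypothetical and are not taken from the paper's toolset.

```python
from collections import Counter
from typing import Dict, List


def byte_histogram(code: bytes) -> List[float]:
    """Relative frequency of each of the 256 possible byte values."""
    if not code:
        return [0.0] * 256
    counts = Counter(code)
    total = len(code)
    return [counts.get(value, 0) / total for value in range(256)]


def endianness_counts(code: bytes) -> Dict[str, int]:
    """Count adjacent byte pairs that hint at byte order (assumed heuristic).

    The integer +1 ends in 00 01 when stored big-endian and begins with
    01 00 when stored little-endian; -2 (0xFFFE) gives FF FE versus FE FF.
    """
    pairs = Counter(zip(code, code[1:]))
    return {
        "be_one": pairs[(0x00, 0x01)],
        "le_one": pairs[(0x01, 0x00)],
        "be_minus_two": pairs[(0xFF, 0xFE)],
        "le_minus_two": pairs[(0xFE, 0xFF)],
    }


def feature_vector(code: bytes) -> List[float]:
    """Concatenate the histogram with normalized endianness counts."""
    n_pairs = max(len(code) - 1, 1)
    hints = endianness_counts(code)
    order = ("be_one", "le_one", "be_minus_two", "le_minus_two")
    return byte_histogram(code) + [hints[key] / n_pairs for key in order]
```

A vector like this (260 values per sample) could be fed to any off-the-shelf classifier. Whether it matches the features used in the evaluated methods is an assumption that should be checked against the full text.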


research
11/25/2020

Function Identification in Android Binaries with Deep Learning

Application security support has become a preference for the enterprise ...
research
05/07/2021

argXtract: Deriving IoT Security Configurations via Automated Static Analysis of Stripped ARM Binaries

Recent high-profile attacks on the Internet of Things (IoT) have brought...
research
06/01/2022

Inter-BIN: Interaction-based Cross-architecture IoT Binary Similarity Comparison

The big wave of Internet of Things (IoT) malware reflects the fragility ...
research
06/16/2021

Cross-Language Code Search using Static and Dynamic Analyses

As code search permeates most activities in software development, code-to...
research
12/10/2021

BCD: A Cross-Architecture Binary Comparison Database Experiment Using Locality Sensitive Hashing Algorithms

Given a binary executable without source code, it is difficult to determ...
research
01/06/2023

CFG2VEC: Hierarchical Graph Neural Network for Cross-Architectural Software Reverse Engineering

Mission-critical embedded software is critical to our society's infrastr...
research
02/18/2023

Experimental Toolkit for Manipulating Executable Packing

Be it for a malicious or legitimate purpose, packing, a transformation t...
