Automatic Classification of Object Code Using Machine Learning

05/06/2018
by John Clemens, et al.

Recent research has repeatedly shown that machine learning techniques can be applied to whole files or file fragments to classify them for analysis. We build on these techniques to show that, for samples of unlabeled compiled object code, the same type of analysis can classify important aspects of the code, such as its target architecture and endianness. We show that simple byte-value histograms retain enough information about the opcodes within a sample to classify the target architecture with high accuracy, and we then discuss heuristic-based features that exploit information within the operands to determine endianness. We introduce a dataset of over 16,000 code samples from 20 architectures and show experimentally that, with our features, classifiers can achieve very high accuracy with relatively small sample sizes.
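To make the two feature families concrete, here is a minimal Python sketch. The byte-value histogram follows directly from the description above. The abstract does not say which operand patterns the endianness heuristics use, so the two-byte patterns below are an assumption chosen only for illustration: small constants such as +1 and -2 tend to appear as `00 01` / `ff fe` in big-endian code and as `01 00` / `fe ff` in little-endian code. The function names `byte_histogram`, `count_pair`, and `endianness_hints` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, List


def byte_histogram(sample: bytes) -> List[float]:
    """Return a 256-bin histogram of byte values, normalized to sum to 1."""
    total = len(sample)
    if total == 0:
        return [0.0] * 256
    counts = Counter(sample)  # iterating bytes yields ints in 0..255
    return [counts.get(value, 0) / total for value in range(256)]


def count_pair(sample: bytes, pair: bytes) -> int:
    """Count occurrences (overlapping allowed) of a two-byte pattern."""
    return sum(1 for i in range(len(sample) - 1) if sample[i:i + 2] == pair)


def endianness_hints(sample: bytes) -> Dict[str, int]:
    """Count byte pairs that hint at byte order (assumed patterns, see text)."""
    return {
        "be_plus_one": count_pair(sample, b"\x00\x01"),   # +1, big-endian
        "le_plus_one": count_pair(sample, b"\x01\x00"),   # +1, little-endian
        "be_minus_two": count_pair(sample, b"\xff\xfe"),  # -2, big-endian
        "le_minus_two": count_pair(sample, b"\xfe\xff"),  # -2, little-endian
    }


def feature_vector(sample: bytes) -> List[float]:
    """Concatenate the histogram with the four pair counts."""
    hints = endianness_hints(sample)
    return byte_histogram(sample) + [float(v) for v in hints.values()]
```

A vector built this way has 260 entries per sample and could be passed to any off-the-shelf classifier. Whether the pair counts should be normalized by sample length is a design choice that this sketch leaves open.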

