Adversarial Binaries for Authorship Identification

09/21/2018
by Xiaozhu Meng, et al.

Binary code authorship identification determines the authors of a binary program. Existing techniques have used supervised machine learning for this task. In this paper, we look at this problem from an attacker's perspective. We aim to modify a test binary such that it not only causes misprediction but also maintains the functionality of the original input binary. Attacks against binary code are intrinsically more difficult than attacks against domains such as computer vision, where attackers can change each pixel of the input image independently and still produce a valid image. For binary code, flipping even one bit may make the binary invalid, cause it to crash at runtime, or destroy its original functionality. We investigate two types of attacks: untargeted attacks, which cause misprediction to any of the incorrect authors, and targeted attacks, which cause misprediction to a specific one of the incorrect authors. We develop two key attack capabilities: feature vector modification, which generates an adversarial feature vector that both corresponds to a real binary and causes the required misprediction, and input binary modification, which modifies the input binary to match the adversarial feature vector while maintaining its functionality. We evaluated our attack against classifiers trained with a state-of-the-art method for authorship attribution. The classifiers for authorship identification have 91% accuracy on average. Our untargeted attack has a 96% success rate, showing that we can effectively suppress the authorship signal. Our targeted attack has a 46% success rate, showing that it is significantly more difficult to impersonate a specific programmer's style. Our attack reveals that existing binary code authorship identification techniques rely on code features that are easy to modify, and are thus vulnerable to attack.
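As a rough illustration of the feature-vector-modification step, the sketch below mounts a greedy untargeted attack against a hypothetical scikit-learn classifier trained on integer-valued code features. The classifier, feature set, and the `untargeted_attack` helper are illustrative assumptions, not the paper's implementation. It only increments feature counts, reflecting the intuition that adding code is easier to do without breaking functionality than removing it; the paper's second capability, rewriting the binary so that it actually exhibits the new feature vector, is not shown.

```python
# Hedged sketch: greedy untargeted attack on a feature-vector classifier.
# Assumes a scikit-learn model over integer-valued code features; this is
# an illustration, not the paper's attack implementation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def untargeted_attack(clf, x, true_author, max_edits=50):
    """Bump one feature count at a time until clf stops predicting
    true_author, or the edit budget runs out."""
    col = list(clf.classes_).index(true_author)  # column of true author in predict_proba
    x_adv = np.asarray(x, dtype=float).copy()
    for _ in range(max_edits):
        if clf.predict([x_adv])[0] != true_author:
            return x_adv                         # misprediction achieved
        # Try incrementing each feature (e.g., inserting one more instance of
        # a code construct) and keep the edit that most lowers the predicted
        # probability of the true author.
        scores = []
        for j in range(len(x_adv)):
            candidate = x_adv.copy()
            candidate[j] += 1.0
            scores.append(clf.predict_proba([candidate])[0][col])
        x_adv[int(np.argmin(scores))] += 1.0
    return x_adv

# Toy usage with synthetic integer features and four "authors".
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(200, 20))
y = rng.integers(0, 4, size=200)
clf = RandomForestClassifier(random_state=0).fit(X, y)
x0 = X[0]
x_adv = untargeted_attack(clf, x0, true_author=clf.predict([x0])[0])
print("before:", clf.predict([x0])[0], "after:", clf.predict([x_adv])[0])
```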


Related research

04/26/2023  SHIELD: Thwarting Code Authorship Attribution
Authorship attribution has become increasingly accurate, posing a seriou...

09/21/2021  Attacks on Visualization-Based Malware Detection: Balancing Effectiveness and Executability
With the rapid development of machine learning for image classification,...

02/12/2018  Sphinx: A Secure Architecture Based on Binary Code Diversification and Execution Obfuscation
Sphinx, a hardware-software co-design architecture for binary code and r...

10/15/2019  Adversarial Examples for Models of Code
We introduce a novel approach for attacking trained models of code with ...

03/20/2023  Adversarial Attacks against Binary Similarity Systems
In recent years, binary analysis gained traction as a fundamental approa...

02/03/2023  A Systematic Evaluation of Backdoor Trigger Characteristics in Image Classification
Deep learning achieves outstanding results in many machine learning task...

12/01/2020  One-Pixel Attack Deceives Automatic Detection of Breast Cancer
In this article we demonstrate that a state-of-the-art machine learning ...
