BinBert: Binary Code Understanding with a Fine-tunable and Execution-aware Transformer

08/13/2022
by Fiorella Artuso, et al.

A recent trend in binary code analysis promotes neural solutions based on instruction embedding models. An instruction embedding model is a neural network that transforms sequences of assembly instructions into embedding vectors. If the network is trained so that the translation from code to vectors partially preserves the semantics, it effectively represents an assembly code model. In this paper we present BinBert, a novel assembly code model. BinBert is built on a transformer pre-trained on a huge dataset of both assembly instruction sequences and symbolic execution information. BinBert can be applied to assembly instruction sequences and is fine-tunable, i.e., it can be re-trained as part of a neural architecture on task-specific data. Through fine-tuning, BinBert learns how to apply the general knowledge acquired during pre-training to a specific task. We evaluated BinBert on a multi-task benchmark that we designed specifically to test the understanding of assembly code. The benchmark combines intrinsic and downstream tasks: some are taken from the literature, and a few are novel tasks that we designed. Our results show that BinBert outperforms state-of-the-art models for binary instruction embedding, raising the bar for binary code understanding.
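
To make the input/output contract of an instruction embedding model concrete, the following is a minimal sketch, not BinBert itself. It assumes a toy, untrained stand-in: each token is mapped to a fixed vector derived from its hash, and the sequence embedding is the mean of the token vectors. The names `DIM`, `tokenize`, `token_vector`, `embed`, and `cosine` are hypothetical and exist only for illustration. A real model such as the one described above would replace `token_vector` and the pooling step with a pre-trained transformer.

```python
import hashlib
import math
import re

DIM = 8  # hypothetical embedding size, chosen only for illustration


def tokenize(instruction):
    """Split one assembly instruction into lowercase mnemonic/operand tokens."""
    return [t for t in re.split(r"[\s,]+", instruction.strip().lower()) if t]


def token_vector(token):
    """Map a token to a fixed vector in [-1, 1]^DIM derived from its SHA-256 hash.

    This is an untrained placeholder for a learned embedding table.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [digest[i] / 255.0 * 2.0 - 1.0 for i in range(DIM)]


def embed(instructions):
    """Mean-pool the token vectors of a whole instruction sequence."""
    vectors = [token_vector(t) for ins in instructions for t in tokenize(ins)]
    if not vectors:
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# A sequence of assembly instructions goes in, one fixed-size vector comes out.
snippet = ["mov eax, ebx", "add eax, 1", "ret"]
vec = embed(snippet)
```

Because the vectors here come from hashes rather than training, similarity between two snippets reflects only shared tokens, not semantics. Preserving semantics in the vector space is exactly what training (and, per the abstract, execution-aware pre-training) is meant to add.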

research
08/24/2023

kTrans: Knowledge-Aware Transformer for Binary Code Embedding

Binary Code Embedding (BCE) has important applications in various revers...
research
01/21/2021

PalmTree: Learning an Assembly Language Model for Instruction Embedding

Deep learning has demonstrated its strengths in numerous binary analysis...
research
06/25/2023

FastBCSD: Fast and Efficient Neural Network for Binary Code Similarity Detection

Binary code similarity detection (BCSD) has various applications, includ...
research
07/06/2023

BrickPal: Augmented Reality-based Assembly Instructions for Brick Models

The assembly instruction is a mandatory component of Lego-like brick set...
research
09/11/2023

Large Language Models for Compiler Optimization

We explore the novel application of Large Language Models to code optimi...
research
05/20/2022

Learning to Reverse DNNs from AI Programs Automatically

With the privatization deployment of DNNs on edge devices, the security ...
research
08/06/2023

Binary Code Similarity Detection

Binary code similarity detection is to detect the similarity of code at ...