XFL: eXtreme Function Labeling

07/28/2021
by James Patrick-Evans, et al.

Reverse engineers would benefit from identifiers such as function names, but these are usually unavailable in binaries. Training a machine learning model to predict function names automatically is promising but fundamentally hard because of the enormous number of classes. In this paper, we introduce eXtreme Function Labeling (XFL), an extreme multi-label learning approach to selecting appropriate labels for binary functions. XFL splits function names into tokens and treats each token as an informative label, much as tags are assigned to texts in natural language processing. To capture the semantics of binary code, we introduce DEXTER, a novel function embedding that combines features from static analysis with local context from the call graph and global context from the entire binary. We demonstrate that XFL outperforms state-of-the-art approaches to function labeling on a dataset of over 10,000 binaries from the Debian project, achieving a precision of 82.5%. We also evaluate different published embeddings for binary functions and show that DEXTER consistently improves over the state of the art in information gain. As a result, we show that binary function labeling is best phrased in terms of multi-label learning, and that binary function embeddings benefit from moving beyond learning from syntax alone.
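To make the multi-label formulation concrete, the sketch below shows how a function name might be turned into a set of token labels and how precision can be measured over predicted labels. It assumes a simple tokenizer that splits on non-alphanumeric characters and camelCase boundaries; XFL's actual tokenization and canonicalization may differ, and `tokenize_name`, `label_precision`, and `micro_precision` are hypothetical names used only for illustration.

```python
import re


def tokenize_name(name: str) -> set[str]:
    """Split a function name into lowercase token labels (hypothetical scheme)."""
    tokens = set()
    # First break snake_case and other separators such as '_' or '.'.
    for part in re.split(r"[^A-Za-z0-9]+", name):
        # Then split camelCase, acronyms (e.g. "HTTP") and digit runs.
        for tok in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", part):
            tokens.add(tok.lower())
    return tokens


def label_precision(predicted: set[str], actual: set[str]) -> float:
    """Fraction of predicted labels that appear in the true label set."""
    if not predicted:
        return 0.0
    return len(predicted & actual) / len(predicted)


def micro_precision(pairs: list[tuple[set[str], set[str]]]) -> float:
    """Precision pooled over many (predicted, actual) label-set pairs."""
    true_positives = sum(len(pred & act) for pred, act in pairs)
    total_predicted = sum(len(pred) for pred, _ in pairs)
    return true_positives / total_predicted if total_predicted else 0.0


if __name__ == "__main__":
    truth = tokenize_name("ssl_read_bytes")   # {"ssl", "read", "bytes"}
    guess = {"ssl", "read", "buffer"}         # a hypothetical model prediction
    print(label_precision(guess, truth))      # 2 of 3 predicted labels are correct
```

Because each token is scored independently, a prediction that recovers most of a name still earns partial credit, which is one motivation for treating labeling as multi-label learning rather than picking one whole name out of a huge class set. How the paper averages precision across functions is not stated here; the pooled (micro) variant above is only one option.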
