Software Ethology: An Accurate and Resilient Semantic Binary Analysis Framework

06/07/2019
by Derrick McKee, et al.

When reverse engineering a binary, the analyst must first understand the semantics of the binary's functions through either manual or automatic analysis. Manual semantic analysis is time-consuming because the abstractions provided by high-level languages, such as type information, variable scope, and comments, are lost, and past analyses cannot be applied to the current task. Existing automated binary analysis tools suffer from low accuracy in semantic function identification in the presence of diverse compilation environments. We introduce Software Ethology, a binary analysis approach for determining the semantic similarity of functions. Software Ethology abstracts semantic behavior as classification vectors of program state changes resulting from a function executing with a specified input state, and uses these vectors as a unique fingerprint for identification. All existing semantic identifiers determine function similarity via code measurements, and suffer from high inaccuracy when classifying functions from compilation environments different from their ground truth source. Since Software Ethology does not rely on code measurements, its accuracy is resilient to changes in compiler, compiler version, optimization level, or even different source code implementing equivalent functionality. Tinbergen, our prototype Software Ethology implementation, leverages a virtual execution environment and a fuzzer to generate the classification vectors. In evaluating Tinbergen's feasibility as a semantic function identifier by identifying functions in coreutils-8.30, we achieve an average accuracy of 0.805. Compared to the state-of-the-art, Tinbergen is 1.5 orders of magnitude faster when training, 50 identifying functions in binaries generated from differing compilation environments, is 30
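The idea of fingerprinting a function by the state changes it produces, rather than by its code, can be illustrated with a small toy sketch in Python. This is not Tinbergen and not the authors' implementation: the "program state" here is just a byte buffer plus an integer argument, and all names (observe, fingerprint, zero_loop, zero_slice, fill_ones, INPUT_STATES) are hypothetical and chosen for illustration. In the real system the input states would come from a fuzzer and the functions would run in a virtual execution environment.

```python
from typing import Callable, List, Tuple

# Toy stand-in for a program state: (initial memory contents, integer argument).
State = Tuple[bytes, int]
# Toy stand-in for a binary function: mutates a buffer and returns an integer.
Func = Callable[[bytearray, int], int]


def observe(func: Func, state: State) -> Tuple[int, bytes]:
    """Run func on a fresh copy of the input state and record the resulting
    state change as (return value, memory contents afterwards)."""
    buf = bytearray(state[0])
    ret = func(buf, state[1])
    return ret, bytes(buf)


def fingerprint(func: Func, states: List[State]) -> Tuple[Tuple[int, bytes], ...]:
    """Behavioral fingerprint: the observed state change for each input state."""
    return tuple(observe(func, s) for s in states)


# Two different implementations of the same behavior:
# zero the first n bytes of the buffer and return how many bytes were zeroed.
def zero_loop(buf: bytearray, n: int) -> int:
    count = min(n, len(buf))
    for i in range(count):
        buf[i] = 0
    return count


def zero_slice(buf: bytearray, n: int) -> int:
    count = min(n, len(buf))
    buf[:count] = bytes(count)
    return count


# A function with similar structure but different behavior.
def fill_ones(buf: bytearray, n: int) -> int:
    count = min(n, len(buf))
    buf[:count] = b"\x01" * count
    return count


# Hand-picked input states (assumption: a fuzzer would generate these).
INPUT_STATES: List[State] = [(b"abcd", 2), (b"\xff\xff", 5), (b"", 3), (b"xyz", 0)]

same = fingerprint(zero_loop, INPUT_STATES) == fingerprint(zero_slice, INPUT_STATES)
different = fingerprint(zero_loop, INPUT_STATES) != fingerprint(fill_ones, INPUT_STATES)
```

In this sketch zero_loop and zero_slice receive the same fingerprint even though their code differs, while fill_ones does not, which mirrors the abstract's claim that a behavior-based fingerprint is independent of how the code was written or compiled.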


research
08/19/2018

BinMatch: A Semantics-based Hybrid Approach on Binary Code Clone Analysis

Binary code clone analysis is an important technique which has a wide ra...
research
12/22/2021

Semantics-Recovering Decompilation through Neural Machine Translation

Decompilation transforms low-level program languages (PL) (e.g., binary ...
research
12/04/2020

Automating Seccomp Filter Generation for Linux Applications

Software vulnerabilities in applications undermine the security of appli...
research
09/14/2022

Cornucopia: A Framework for Feedback Guided Generation of Binaries

Binary analysis is an important capability required for many security an...
research
03/09/2021

Finding Inlined Functions in Optimized Binaries

Much software, whether beneficent or malevolent, is distributed only as ...
research
10/05/2020

Wasm/k: Delimited Continuations for WebAssembly

WebAssembly is designed to be an alternative to JavaScript that is a saf...
research
09/06/2022

Fun2Vec: a Contrastive Learning Framework of Function-level Representation for Binary

Function-level binary code similarity detection is essential in the fiel...
