Neutaint: Efficient Dynamic Taint Analysis with Neural Networks

07/08/2019
by   Dongdong She, et al.
0

Dynamic taint analysis (DTA) is widely used by various applications to track information flow during runtime execution. Existing DTA techniques use rule-based taint-propagation, which is neither accurate (i.e., high false positive) nor efficient (i.e., large runtime overhead). It is hard to specify taint rules for each operation while covering all corner cases correctly. Moreover, the overtaint and undertaint errors can accumulate during the propagation of taint information across multiple operations. Finally, rule-based propagation requires each operation to be inspected before applying the appropriate rules resulting in prohibitive performance overhead on large real-world applications. In this work, we propose NEUTAINT, a novel end-to-end approach to track information flow using neural program embeddings. The neural program embeddings model the target's programs computations taking place between taint sources and sinks, which automatically learns the information flow by observing a diverse set of execution traces. To perform lightweight and precise information flow analysis, we utilize saliency maps to reason about most influential sources for different sinks. NEUTAINT constructs two saliency maps, a popular machine learning approach to influence analysis, to summarize both coarse-grained and fine-grained information flow in the neural program embeddings. We compare NEUTAINT with 3 state-of-the-art dynamic taint analysis tools. The evaluation results show that NEUTAINT can achieve 68 which is 10 second-best taint tool Libdft on 6 real world programs. NEUTAINT also achieves 61 effectiveness of the identified influential bytes.

READ FULL TEXT
research
11/07/2021

Sdft: A PDG-based Summarization for Efficient Dynamic Data Flow Tracking

Dynamic taint analysis (DTA) has been widely used in various security-re...
research
09/08/2019

Fine Grained Dataflow Tracking with Proximal Gradients

Dataflow tracking with Dynamic Taint Analysis (DTA) is an important meth...
research
07/03/2019

Learning Blended, Precise Semantic Program Embeddings

Learning neural program embeddings is key to utilizing deep neural netwo...
research
11/18/2020

GRAPHSPY: Fused Program Semantic-Level Embedding via Graph Neural Networks for Dead Store Detection

Production software oftentimes suffers from the issue of performance ine...
research
08/06/2019

A Transformational Approach to Resource Analysis with Typed-norms Inference

In order to automatically infer the resource consumption of programs, an...
research
05/04/2018

Dynamic Control Flow in Large-Scale Machine Learning

Many recent machine learning models rely on fine-grained dynamic control...
research
07/16/2021

Automatic Firmware Emulation through Invalidity-guided Knowledge Inference (Extended Version)

Emulating firmware for microcontrollers is challenging due to the tight ...

Please sign up or login with your details

Forgot password? Click here to reset