Path-Based Function Embedding and its Application to Specification Mining

02/21/2018
by   Daniel Defreez, et al.

Relationships among program elements are useful for program understanding, debugging, and analysis. One such relationship is function synonymy. Function synonyms are functions that play a similar role in code; examples include functions that perform initialization for different device drivers, and functions that implement different symmetric-key encryption schemes. Function synonyms are not necessarily semantically equivalent and can be syntactically dissimilar; consequently, approaches for identifying code clones or functional equivalence cannot be used to identify them. This paper presents func2vec, an algorithm that maps each function to a vector in a vector space such that function synonyms are grouped together. We compute the function embedding by training a neural network on sentences generated by random walks over the interprocedural control-flow graph. We show the effectiveness of func2vec in identifying function synonyms in the Linux kernel. Furthermore, we show how knowing function synonyms enables mining error-handling specifications with high support in Linux file systems and drivers.
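
The sentence-generation step can be illustrated with a minimal sketch. It is not the authors' implementation: the toy graph, the function names, and the helpers `random_walk` and `generate_sentences` are hypothetical, and each node is assumed to carry a single function label. The abstract does not specify how walks are labeled or bounded, so the length limit and the number of walks per node are also assumptions.

```python
import random

# Hypothetical toy graph standing in for an interprocedural control-flow
# graph. Each key is a function label; each value lists its successors.
TOY_GRAPH = {
    "driver_a_init": ["alloc_buffer"],
    "driver_b_init": ["alloc_buffer"],
    "alloc_buffer": ["register_device"],
    "register_device": [],
}


def random_walk(graph, start, max_length, rng):
    """Follow randomly chosen successors from start, up to max_length nodes."""
    walk = [start]
    node = start
    while len(walk) < max_length:
        successors = graph.get(node, [])
        if not successors:
            break  # dead end: the walk stops early
        node = rng.choice(successors)
        walk.append(node)
    return walk


def generate_sentences(graph, walks_per_node, max_length, seed=0):
    """Produce walks_per_node walks from every node; each walk is a 'sentence'."""
    rng = random.Random(seed)
    sentences = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            sentences.append(random_walk(graph, start, max_length, rng))
    return sentences


sentences = generate_sentences(TOY_GRAPH, walks_per_node=2, max_length=4)
```

In this toy graph, `driver_a_init` and `driver_b_init` appear in sentences with the same following labels, which is the kind of shared context an embedding model could use to place them close together. The resulting sentences would then be passed to an embedding trainer; the abstract only says "a neural network", so the choice of model is left open here.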
