Representing LLVM-IR in a Code Property Graph

11/09/2022
by   Alexander Küchler, et al.
0

In the past years, a number of static application security testing tools have been proposed which make use of so-called code property graphs, a graph model which keeps rich information about the source code while enabling its user to write language-agnostic analyses. However, they suffer from several shortcomings. They work mostly on source code and exclude the analysis of third-party dependencies if they are only available as compiled binaries. Furthermore, they are limited in their analysis to whether an individual programming language is supported or not. While often support for well-established languages such as C/C++ or Java is included, languages that are still heavily evolving, such as Rust, are not considered because of the constant changes in the language design. To overcome these limitations, we extend an open source implementation of a code property graph to support LLVM-IR which can be used as output by many compilers and binary lifters. In this paper, we discuss how we address challenges that arise when mapping concepts of an intermediate representation to a CPG. At the same time, we optimize the resulting graph to be minimal and close to the representation of equivalent source code. Our evaluation indicates that existing analyses can be reused without modifications and that the performance requirements are comparable to operating on source code. This makes the approach suitable for an analysis of large-scale projects.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset