On Secure and Usable Program Obfuscation: A Survey

10/03/2017
by   Hui Xu, et al.
0

Program obfuscation is a widely employed approach for software intellectual property protection. However, general obfuscation methods (e.g., lexical obfuscation, control obfuscation) implemented in mainstream obfuscation tools are heuristic and have little security guarantee. Recently in 2013, Garg et al. have achieved a breakthrough in secure program obfuscation with a graded encoding mechanism and they have shown that it can fulfill a compelling security property, i.e., indistinguishability. Nevertheless, the mechanism incurs too much overhead for practical usage. Besides, it focuses on obfuscating computation models (e.g., circuits) rather than real codes. In this paper, we aim to explore secure and usable obfuscation approaches from the literature. Our main finding is that currently we still have no such approaches made secure and usable. The main reason is we do not have adequate evaluation metrics concerning both security and performance. On one hand, existing code-oriented obfuscation approaches generally evaluate the increased obscurity rather than security guarantee. On the other hand, the performance requirement for model-oriented obfuscation approaches is too weak to develop practical program obfuscation solutions.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

01/15/2021

Secure Optimization Through Opaque Observations

Secure applications implement software protections against side-channel ...
09/08/2017

FreeGuard: A Faster Secure Heap Allocator

In spite of years of improvements to software security, heap-related att...
06/06/2020

Hardware-Software Contracts for Secure Speculation

Since the discovery of Spectre, a large number of hardware mechanisms fo...
05/07/2021

Conversational Code Analysis: The Future of Secure Coding

The area of software development and secure coding can benefit significa...
11/05/2020

Towards a Theory of Special-purpose Program Obfuscation

Most recent theoretical literature on program obfuscation is based on no...
04/20/2018

Securing Email

Email is the most ubiquitous and interoperable form of online communicat...
10/19/2021

Java Card Virtual Machine Memory Organization: a Design Proposal

The Java Card Virtual Machine (JCVM) platform is widely deployed on secu...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

Program obfuscation is a major technique for software intellectual property protection [1]. It transforms computer programs to new versions which are semantic-equivalent with the original one but harder to understand. The concept was originally introduced at the International Obfuscated C Code Contest in 1984, which awarded creative C source codes with “smelly styles”. It now becomes an indispensable technique for software protection. There are dozens of code obfuscation ideas proposed in the literature and implemented by obfuscation tools. However, a truth we cannot ignore is that current mainstream obfuscation techniques do not provide a security guarantee.

An obfuscation approach is secure if it guarantees that the essential program semantics can be protected and demonstrates adequate hardness for adversaries to recover the semantics. Existing obfuscation approaches generally cannot meet such criteria. Moreover, there are many notable attacks on current obfuscation mechanisms (e.g., [2, 3, 4, 5, 6, 7]). Such attacks generally assume particular obfuscation mechanisms and directly attack them without need to solve any hard problems. Take the most recent attack by Bichsel et al. [7]

as an example, which recovers a significant portion of the original lexical information from obfuscated Android apps. The attack just employs machine learning techniques to predict the original strings leveraging the residual information. It seems that the security of program obfuscation is not well-established as other security primitives, such as cryptography. So one important question is “

do we have secure and usable program obfuscation approaches? if not, what is the viable means towards one?

Recently in 2013, a breakthrough came from the theoretical perspective. Garg et al. [8] proposed the first candidate program obfuscation algorithm (i.e., graded encoding) for all circuits and showed that it could achieve a compelling security property: indistinguishability. The idea is to encode circuits with multilinear maps. It has been inspiring many follow-up investigations which aim to deliver obfuscation approaches with provable security (e.g., [9, 10]). Figure 1

demonstrates an explosion of such obfuscation research with a dashed line. However, such graded encoding approaches are still too inefficient to be usable. Besides, they focus on obfuscating computation models, such as circuits or Turing Machines, rather than real codes. Although circuits and codes are closely related, graded encoding mechanisms cannot be applied to practical codes directly. We need to figure out the gaps and connections in between, which are essential to explore secure and usable obfuscation approaches.

Fig. 1: The distribution of our surveyed obfuscation paper across years.

This survey aims to explore secure and usable program obfuscation approaches from the literature. Firstly, we study whether usable code-oriented obfuscation approaches can be secure. We confirm that no such approaches have demonstrated well-studied security. The primary reason is that current evaluation metrics are not adequate for security purposes. Existing investigations generally adopte the metrics proposed by Collberg et al. [11], which are potency, resilience, stealthy and cost. Note that such evaluation metrics emphasize on the increased obscurity (i.e., potency) rather than the semantics that remained clear in an obfuscated program. Therefore, the metrics guarantee little security.

Secondly, we study whether we can develop usable program obfuscation approaches from existing model-oriented obfuscation investigations. The result is negative. Current graded encoding mechanisms are too inefficient to be usable. They only satisfy the performance requirement defined by Barak et al. [12], i.e., an obfuscated program should incur only polynomial overhead. The requirement might be too weak because a qualified program can still grow very large. Moreover, existing model-oriented obfuscation approaches are only applicable to real programs which contain only simple mathematical operations; they do not apply to ordinary codes with complex syntactic structures. For example, model-oriented obfuscation approaches do not consider some code components, e.g., the lexical information and API calls. Such components serve as essential clues for adversaries to analyze a program and should be obfuscated.

To summarize, this paper serves as a first attempt to explore secure and usable obfuscation approaches with a comparative study of code-oriented obfuscation and model-oriented obfuscation. Our result is that we have no secure and usable obfuscation approaches in current practice. To develop such approaches, we suggest the community to design appropriate evaluation metrics at first.

The rest of this paper is organized as follows: we first discuss the related work in Section II; then we introduce our study approach in Section III and major results in Section IV; we survey the literature about code-oriented obfuscation in Section V and model-oriented obfuscation in Section VI; then we discuss the possible paths towards secure and usable program obfuscation in Section VII; finally, we conclude this study in Section VIII.

Ii Related Work

As obfuscation has been studied for almost two decades, several surveys are available. However, they mainly focus on either code-oriented obfuscation or model-oriented obfuscation. The surveys of code-oriented obfuscation include [13, 14, 15, 16, 17]. Balakrishnan and Schulze [13] surveyed several major obfuscation approaches for both benign codes and malicious codes. Majumdar et al. [14] conducted a short survey that summarizes the control-flow obfuscation techniques using opaque predicates and dynamic dispatcher. Drape et al. [15] surveyed several obfuscation techniques via layout tranformation, control-flow tranformation, data tranformation, language dependent transformations, etc. Roundy et al. [16] systematically studied obfuscation techniques for binaries, which have been frequently used by malware packers; Schrittwieser et al. [17] surveyed the resilience of obfuscation mechanisms to reverse engineering techniques. The surveys of model-oriented obfuscation include [18, 19]. Horvath et al. [18] studied the history of cryptography obfuscation, with a focus on graded encoding mechanisms. Barak [19] reviewed the importance of indistinguishability obfuscation.

To our best knowledge, none of the existing surveys includes a thorough comparative study of code-oriented obfuscation and model-oriented obfuscation. Indeed, the two categories are closely related, because they frequently cite each other. For example, the impossibility result for model-oriented obfuscation in [12] has been widely cited by code-oriented investigations (e.g., [20]). Our survey, therefore, severs as a pilot study on synthesizing code-oriented obfuscation and model-oriented obfuscation.

Note that there are another two papers [21, 22] that have noticed the gaps between code-oriented obfuscation and model-oriented obfuscation, and they work towards secure and usable obfuscation. Preda and Giacobazzi et al. [21] proposed to model the security properties of obfuscation with abstract interpretation, which can be further employed to deliver obfuscation solutions (e.g., [23, 24]). Kuzurin et al. [22] noticed that current security properties for model obfuscation are too strong, and they proposed several alternative properties for practical program obfuscation scenarios. Note that such papers coincide with us on the importance of our studied problem. We will discuss more details in Section VII.

Iii Study Approach

Fig. 2: A taxonomy of program obfuscation approaches.

Iii-a Survey Scope

This work discusses program obfuscation, including both code-oriented obfuscation and model-oriented obfuscation. A program obfuscator is a compiler that to transforms a program into another version , which is functionally identical to but much harder to understand. Note that the concept is consistent with the definitions of code-oriented obfuscation by Collberg et al. [25] and model-oriented obfuscation by Barak et al. [12]. By this concept, we rule out some manual obfuscation approaches that can not be generalized or automated in compilers (e.g., [26]).

Moreover, we restrict our study to general-purpose obfuscation, which means the obfuscation has no preference on program functionalities. However, if some obfuscation approaches are not designed general programs but are valuable to obfuscate general programs (e.g., white-box encryption [27, 28], malware camouflages [29]), we would also discuss them. Indeed, general-purpose obfuscation is a common obfuscation scenario and covers most obfuscation investigations.

Finally, we do not emphasize the differences among programming languages, such as C, or Java. Such differences are not critical issues towards secure and usable obfuscation. If one obfuscation approach is studied several times for different programming languages, we think such investigations are similar and only discuss a representative one.

Iii-B A Taxonomy of Obfuscation Approaches

Iii-B1 Taxonomy Hierarchy

For all the obfuscation approaches within the scope, we draw a taxonomy hierarchy as shown in Figure 2. In the first level, we divide program obfuscation into code-oriented obfuscation and model-oriented obfuscation. Each category can be identified with a groundbreaking paper.

The groundbreaking paper of code-oriented obfuscation was published in 1997 when Collberg et al. [25] conducted a pilot study on the taxonomy of obfuscation transformation. They have discussed several transformation approaches and evaluation metrics. Since then, obfuscation has been receiving extensive attentions in both research and industrial fields. Followup investigations mainly propose new ideas on obfuscation transformations in layout level, control level, or data level. Besides, there are also preventive obfuscation (e.g., [30, 31, 32, 33]) and polymorphic obfuscation (e.g., [34, 35, 36]). We defer our discussion about the details to Section V.

For model-oriented obfuscation, the groundbreaking paper was published in 2001 when Barak et al. [12] initiated the first theoretical study on the capability of program obfuscation. They studied how much semantic information can be hidden at most via obfuscation. To this end, they proposed a virtual black-box property and showed that not all programs can be obfuscated with the property. Hence, what security properties can obfuscation guarantee and how to achieve the property are the two important problems of this area. Currently, graded encoding [8] is the only available mechanism that can be implemented for obfuscation. We defer our discussion about the details to Section VI.

Note that other investigations (e.g., [22]) may employ practical obfuscation and theoretical obfuscation as the category names. However, such a categorization approach may not be very discriminative because practical investigations on obfuscating codes may also include theoretical studies (e.g., [37, 38]), and vice versa. Therefore, our categorization approach with code-oriented obfuscation and model-oriented obfuscation should be more appropriate.

Iii-B2 The Differences between Code-Oriented Obfuscation and Model-Oriented Obfuscation

Code-Oriented Obfuscation Model-Oriented Obfuscation
Research Community Software security, software engineering, etc Theoretical computation, cryptography, etc
Obfuscation Problem Program to Obfuscate Codes Circuits or Turing Machines
Adversarial Purpose Task-dependent
Semantics that computes outputs
given inputs
Protection Purpose To increase program obscurity To hide mathematical computations
Security-related Metrics
Potency or increased obscurity,
resilience against automated attackers,
stealthy to human attackers
Leaked information measured as the
probability of guessing input, output
Security Basis
Hard program analysis problems
(e.g., pointer analysis)
Cryptography algorithms
(e.g., multilinear maps)
TABLE I: The differences between code obfuscation and model obfuscation

Code-oriented obfuscation demonstrates non-trivial gaps with model-oriented obfuscation as summarized in Table I. They are studied by different research communities. Code-oriented obfuscation interests software security experts or engineers, who deal with real software protection issues. Model-oriented obfuscation interests scientists who pursue theoretical study on circuits or Turing Machines. Besides, model-oriented obfuscation also interests cryptographers because the candidate obfuscation approaches are based on cryptographic primitives.

The problems of code-oriented obfuscation and model-oriented obfuscation are very differet. Firstly, real codes are more complex than general computation models. For example, it may include components that are not considered in circuits, such as lexical information and function calls. Such components may serve as essential information for adversaries to interprete the software and should be obfuscated. Besides, it may contain challenging issues for obfuscators to handle, such as concurrent operations and pointers. Secondly, the two categories demonstrate different attacker models. Code obfuscation assumes the adversarial purpose is to understand a released software program, and the attacking methods can be either automated deobfuscation tools or manual inspections. Such obfuscation approaches are generally evaluated with the increased obscurity or program complexity, resilience to automated attackers, stealthy to human attackers, and costs. On the other hand, model-oriented obfuscation only assumes automated attackers (e.g., probabilistic polynomial-time Turing Machine) and it assumes the adversarial purpose is to infer the functionality of a computation model (circuit or Turing Machine). Such investigations (e.g., [12]) generally made assumptions that a program can be represented as explicit pairs of input, output. In this way, one can evaluate the security with mathematical representations, i.e., the probability of guessing the input, output pairs from an obfuscated program. A negligible probability implies that the obfuscated program leaks little information or it is secure.

Finally, the two categories resort to different security basis when proposing secure obfuscation solutions. Code-oriented obfuscation is interested in hard program analysis problems, such as pointer analysis. Model-oriented obfuscation generally makes further assumptions that a model contains only basic mathematical operations. In this way, it can adopt cryptographic primitives based on some mathematical hardness assumptions, such as multilinear maps.

Note that the differences are summarized based on existing obfuscation investigations until this survey is published. Some gaps may be mitigated in the future.

Iii-C Secure and Usable Obfuscation

To facilitate the following study, we clarify the concept of secure and usable obfuscation. An obfuscation approach is secure if it performs well in two aspects: obfuscation effectiveness and resistance. Obfuscation effectiveness means the confidentiality of program semantics. A secure obfuscation approach should hide as much program semantics as possible, especially the essential semantics. Resistance means the hardness to recover the confidential semantics. A secure obfuscation approach should be resistant to attacks. The two aspects are consistent with current evaluation metrics in both code-oriented obfuscation and model-oriented obfuscation fields. For code-oriented obfuscation, obfuscation effectiveness is similar as potency, while resistance includes both resilience and stealth. For model-oriented obfuscation, effectiveness means the security property (e.g., indistinguishability) which defines the maximum information that an obfuscated program leaks; while resistance means the complexity in attacking an obfuscation algorithm.

An obfuscation is usable if it can be applied to obfuscate real codes, and the incurred performance overhead is acceptable for real application scenarios. Such overhead includes both program size and execution time. The performance of an obfuscation approach can be arguably acceptable if it incurs trivial overhead.

Iv Overview of Findings

In short, we have no secure and usable program obfuscation approaches to date. The primary reason is that we do not have adequate evaluation metrics concerning both security and usability.

Firstly, none of the existing code-oriented obfuscation investigations evaluate residual semantics in an obfuscated program. They generally adopt the evaluation metrics proposed by Collberg et al. [11], which judges the increased obscurity rather than the unprotected code semantics. None of them employs evaluation metrics can meet our security requirement, especially in obfuscation effectiveness. Therefore, designing appropriate metrics seems the priority for developing secure obfuscation approaches in the future.

Secondly, current model-oriented obfuscation approaches are too inefficient to be employed in practice. Such approaches can satisfy the performance requirement defined by Barak et al. [12], i.e., the obfuscated program incurs only polynomial overhead, but they still cost too much. Such a performance requirement with polynomial overhead is too weak for an approach to be practical. Besides, the security requirements (e.g., indistinguishability) for obfuscating circuits might be too rigid to develop obfuscation solutions, not to mention a practical one. This may justify why graded encoding is the only obfuscation approach to date that can meet a security requirement. Moreover, current model-oriented obfuscation mechanisms are only applicable to codes with simple mathematical operations. Neither do we know how to handle non-mathematical code syntax nor how to handle other programming concepts, such as control flows and data flows.

V Code-Oriented Obfuscation

In 1993, Cohen [34] published the first code obfuscation paper. Later in 1997, Collberg et al. [25] conducted a groundbreaking study on the taxonomy of obfuscation transformations. Then, many obfuscation approaches were proposed in the literature.

Based on the purpose of protection, we divide code-oriented obfuscation approaches into three categories: preventive obfuscation, transformation obfuscation, and polymorphic obfuscation. Preventive obfuscation aims to impede attackers from obtaining the real codes. Transformation obfuscation degrades the readability of the real codes. Polymorphic obfuscation aims to prevent attackers from locating targeted semantics or features in each obfuscated version. Transformation obfuscation serves as a major code obfuscation technique, and most of the obfuscation approaches fall into this category. We further divide them into layout transformation, control transformation and data transformation, each of which focuses on increasing the obscurity of a particular perspective.

V-a Preventive Obfuscation

Preventive obfuscation raises the bar for adversaries to obtain code snippets in readable formats. It is generally designed for non-scripting programming languages, such as C/C++ and Java. For such software, a disassembly phase is required to translate machine codes (e.g., binaries) into human readable formats. Preventive obfuscation, therefore, aims to obstruct the disassembly phase by introducing errors to general dissemblers.

Linn and Debray [30] conducted the first preventive obfuscation study on ELF (executable and linkage format) programs, or binaries. They propose several mechanisms to deter popular disassembling algorithms. To thwart linear sweep algorithms, the idea is to insert uncompleted instructions as junk codes after unconditional jumps. The mechanism takes effect if a disassembler cannot handle such uncompleted instructions. To thwart recursive algorithms, they further replace regular procedure calls with branch functions and jump tables. In this way, the return addresses are only determined during runtime, and they can hardly be known by static disassemblers. Similarly, Popov et al. [32] have proposed to convert unconditional jumps to traps which raise signals. Then they employ a signal handling mechanism to achieve the original semantics. Darwish et al. [33] have verified that such obfuscation approaches are effective against commercial disassembly tools, e.g., IDA pro [39].

The idea is also applicable to the decompilation process of Java bytecodes. Chan and Yang [31] proposed several lexical tricks to impede Java decompilation. The idea is to modify bytecodes directly by employing reserved keywords to name variables and functions. This is possible because the validation check of identifiers is only performed by the frontend. In this way, the modified program can still run correctly, but it would cause troubles for decompilation tools.

To measure the effectiveness of preventive obfuscation, Linn and Debray [30] proposed confusion factor, i.e., the ratio of incorrectly disassembled instructions (or blocks, or functions) to all instructions [30]. Popov et al. [32] also adopted the metrics. Besides, they proposed another factor that measures the ratio of correct edges on a control-flow graph. Such approaches are based on tricks. Although they may mislead existing disassembly or decompilation tools, they are vulnerable to advanced handmade attacks.

V-B Layout Obfuscation

Layout obfuscation scrambles a program layout while keeping the syntax intact. For example, it may change the orders of instructions or scramble the identifiers of variables and classes.

Lexical obfuscation is a widely employed layout obfuscation approach which transforms the meaningful identifiers to meaningless ones. For most programming languages, adopting meaningful and uniform naming rules (e.g., Hungarian Notation [40]) is required as a good programming practice. Although such names are specified in source codes, some would remain in the released software. For example, the names of global variables and functions in C/C++ are kept in binaries, and all names of Java are reserved in bytecodes. Because such meaningful names can facilitate adversarial program analysis, we should scramble them. To make the obfuscated identifiers more confusing, Chan et al. [31] proposed to deliberately employ the same names for objects of different types or within different domains. Such approaches have been adopted by ProGuard [41] as a default obfuscation scheme for Android programs.

Besides, several investigations study obfuscation via shuffling program items. For example, Low [42] proposed to seperate the related items of Java programs wherever possible, because a program is harder to read if the related information is not physically close. Wroblewski [43] proposed to reorder a sequence of instructions if it does not change the program semantics.

In general, layout obfuscation has promising resistance because some transformations are one-way which cannot be reversed. But the obfuscation effectiveness is only limited to layout level. Moreover, some layout information can hardly be changed, such as the method identifiers from Java SDK. Such residual information is essential for adversaries to recover the obfuscated information. For example, Bichsel et al. [7] tried to deobfuscated ProGuard-obfuscated apps, and they successfully recovered around 80% names.

V-C Control Obfuscation

Control obfuscation increases the obscurity of control flows. It can be achieved via introducing bogus control flows, employing dispatcher-based controls, and etc.

V-C1 Bogus Control Flows

Bogus control flows refer to the control flows that are deliberately added to a program but will never be executed. It can increase the complexity of a program, e.g., in McCabe complexity [44] or Harrison metrics [45]. For example, McCabe complexity [44] is calculated as the number of edges on a control-flow graph minus the number of nodes, and then plus two times of the connected components. To increase the McCabe complexity, we can either introduce new edges or add both new edges and nodes to a connected component.

To guarantee the unreachability of bogus control flows, Collberg et al. [25] proposed the idea of opaque predicates. They defined opaque predict as the predicate whose outcome is known during obfuscation time but is difficult to deduce by static program analysis. In general, an opaque predicate can be constantly true (), constantly false (), or context-dependent (). There are three methods to create opaque predicates: numerical schemes, programming schemes, and contextual schemes.

Numerical Schemes

Numerical schemes compose opaque predicates with mathematical expressions. For example, is constantly true for all integers and . We can directly employ such opaque predicates to introduce bogus control flows. Figure 3(a) demonstrates an example, in which the opaque predicate guarantees that the bogus control flow (i.e., the else branch) will not be executed. However, attackers would have higher chances to detect them if we employ the same opaque predicates frequently in an obfuscated program. Arboit [46], therefore, proposed to automatically generate a family of such opaque predicates, such that an obfuscator can choose a unique opaque predicates each time.

Another mathematical approach with higher security is to employ crypto functions, such as hash function  [47], and homomorphic encryption [48]. For example, we can substitute a predicate with to hide the solution of for this equation. Note that such an approach is generally employed by malware to evade dynamic program analysis. We may also employ crypto functions to encrypt equations which cannot be satisfied. However, such opaque predicates incur much overhead.

To compose opaque constants resistant to static analysis, Moser et al. [49] suggested employing 3-SAT problems, which are NP-hard. This is possible because one can have efficient algorithms to compose such hard problems [50]. For example, Tiella and Ceccato [51] demonstrated how to compose such opaque predicates with k-clique problems.

To compose opaque constants resistant to dynamic analysis, Wang et al. [52] propose to compose opaque predicates with a form of unsolved conjectures which loop for a number of times. Because loop is a challenging issue for dynamic analysis, the approach in nature should be resistant to dynamic analysis. Examples of such conjectures include Collatz conjecture, conjecture, Matthews conjecture. Figure 3(b) demonstrates how to employ Collatz conjecture to introduce bogus control flows. No matter how we initialize , the program terminates with , and originalCodes() can always be executed.

(a) Opaque constant.
(b) Collatz conjecture.
(c) Dynamic opaque predicate.
Fig. 3: Control obfuscation with opaque predicates.
(a) Souce code.
(b) Dismantling while.
(c) Using switch.
Fig. 4: Control-flow flattening approach proposed by Wang et al. [53].

Programming Schemes

Because adversarial program analysis is a major threat to opaque predicates, we can employ challenging program analysis problems to compose opaque predicates. Collberg et al. suggest two classic problems, pointer analysis and concurrent programs.

In general, pointer analysis refers to determining whether two pointers can or may point to the same address. Some pointer analysis problems can be NP-hard for static analysis or even undecidable [54]. Another advantage is that pointer operations are very efficient during execution. Therefore, one can compose resillient and efficient opaque predicts with well-designed pointer analysis problems, such as maintaining pointers to some objects with dynamic data structures [11].

Concurrent programs or parallel programs is another challenging issue. In general, a parallel region of statements has different ways of execution. The execution is not only determined by the program, but also by the runtime status of a host computer. Collberg et al. [11] proposed to employ concurrent programs to enhance the pointer-based approach by concurrently updating the pointers. Majumdar et al. [55] proposed to employ distributed parallel programs to compose opaque predicates.

Besides, some approaches compose opaque predicates with programming tricks, such as leveraging exception handling mechanisms. For example, Dolz and Parra [56] proposed to use the try-catch mechanism to compose opaque predicates for .Net and Java. The exception events include division by zero, null pointer, index out of range, or even particular hardware exceptions [57]. The original program semantics can be achieved via tailored exception handling schemes. However, such opaque predicates have no security basis, and they are vulnerable to advanced handmade attacks.

Contextual Schemes

Contextual schemes can be employed to compose variant opaque predicates(i.e., ). The predicates should hold some deterministic properties such that they can be employed to obfuscate programs. For example, Drape [15] proposed to compose such opaque predicates which are invariant under a contextual constraint, e.g., the opaque predicate is constantly true if . Palsberg et al. [58] proposed dynamic opaque predicates, which include a sequence of correlated predicates. The evaluation result of each predicate may vary in each run. However, as long as the predicates are correlated, the program behavior is deterministic. Figure 3(c) demonstrates an example of dynamic opaque predicates. No matter how we initialize and , the program is equivalent to .

The resistance of bogus control flows largely depends on the security of opaque predicates. An ideal security property for opaque predicates is that they require worst-case exponential time to break but only polynomial time to construct. Notethat some opaque predicates are designed with such security concerns but may be implemented with flaws. For example, the 3-SAT problems proposed by Ogiso et al. [38] are based on trivial problem settings which can be easily simplified. If such opaque predicates are implemented properly, they would be promising to be secure.

V-C2 Dispatcher-Based Controls

A dispatcher-based control determines the next blocks of codes to be executed during runtime. Such controls are essential for control obfuscation, because they can hide the original control flows against static program analysis.

One major dispatcher-based obfuscation approach is control flattening, which transforms codes of depth into shallow ones with more complexity. Wang et al. [53] firstly proposed the approach. Figure 4 demonstrates an example from their paper that transforms a while loop into another form with switch-case. To realize such transformation, the first step is to transform the code the into an equivalent reprensentation with if-then-goto statements as shown in Figure 4(b); then they modify the goto statements with switch-case statements as shown in Figure 4(c). In this way, the original program semantics is realized implicitly by controlling the data flow of the switch variable. Because the execution order of code blocks are determined by the variable dynamically, one cannot know the control flows without executing the program. Cappaert and Preneel [59] formalized control flattening as employing a dispatcher node (e.g., switch) that controls the next code block to be executed; after executing a block, control is transferred back to the dispatcher node. Besides, there are several enhancements for code flattening. For example, to enhance the resistance to static program analysis on the switch variable, Wang et al. [60] proposed to introduce pointer analysis problems. To further complicate the program, Chow et al. [61] proposed to add bogus code blocks.

László and Kiss [62] proposed a control flattening mechanism to handle specific C++ syntax, such as try-catch, while-do, continue. The mechanism is based on abstract syntax tree and employs a fixed pattern of layout. For each block of code to obfuscate, it constructs a while statement in the outer loop and a switch-case compound inside the loop. The switch-case compound implements the original program semantics, and the switch variable is also employed to terminate the outer loop. Cappaert and Preneel [59] found that the mechanisms might be vulnerable to local analysis, i.e., the switch variable is directly assigned such that adversaries can infer the next block to execute by only looking into a current block. They proposed a strengthened approach with several tricks, such as employing reference assignment (e.g., ) instead of direct assignment (e.g., ), replacing the assignment via if-else with a uniform assignment expression, and employing one-way functions in calculating the successor of a basic block.

Besides control flattening, there are several other dispather-based obfuscation investigations (e.g., [30, 63, 64, 20]). Linn and Debray [30] proposed to obfuscate binaries with branch functions that guide the execution based on the stack information. Similarly, Zhang et al. [64] proposed to employ branch functions to obfuscate object-oriented programs, which defines a unified method invocation style with an object pool. To enhance the security of such mechanisms, Ge et al. [63] proposed to hide the control information in another standalone process and employ inter-process communications. Schrittwieser and Katzenbeisser [20] proposed to employ diversified code blocks which implement the same semantics.

Dispatcher-based obfuscation is resistance against static analysis because it hides the control-flow graph of a software program. However, it is vulnerable to dynamic program analysis or hybrid approaches. For example, Udupa et al. [2] proposed a hybrid approach to reveal the hidden control flows with both static analysis and dynamic analysis.

V-C3 Misc

There are several other control obfuscation approaches that do not belong to the discussed categories. Examples of such approaches are instructional control hiding (e.g., [65, 66]) and API call hiding (e.g., [67, 68]). They generally have special obfuscation purposes or based on particular tricks.

Instructional Control Hiding

Instructional control hiding converts explicit control instructions to implicit ones. Balachandran and Emmanuel [65] found that control instructions (e.g., jmp) are important information for reverse analysis. They proposed to substitute such instructions with a combination of mov and other instructions which implements the same control semantics. In an extreme case, Domas [66] think all high-level instructions should be obfuscated. He proposed movobfuscation, which employs only one instruction (i.e., mov) to compile the program. The idea is feasible because mov is Turing complete [69].

API Call Hiding

Collberg et al. [25] proposed a problem that the function invocation codes in Java programs are well understood but hard to obfuscate. They suggest substituting common patterns of function invocation with less obvious ones, such as those discussed by Wills [70]. The problem is significant, but surprisingly it has not been studied a lot by other investigations except [67, 68]. Kovacheva [67] investigated the problem for Android apps. He proposed to obfuscate the native calls (e.g., to libc libraries) via a proxy, which is an obfuscated class that wraps the native functions. Bohannon and Holmes [68] investigated a similar problem for Windows powershell scripts. To obfuscate an invocation command to Windows objects, they proposed to create a nonsense string first and then leverage Windows string operators to transform the string to a valid command during runtime.

More Tricks

Collberg et al. [25] proposed several other obfuscation tricks, such as aggregating irrelevant method into one method, scattering a method into several methods. Such tricks are also discussed in other investigations (e.g., [42, 71]) and implemented in obfuscation tools (e.g., JHide [72]). Besides, Wang et al. [73] proposed translingual obfuscation, which introduces obscurity by translating the programs written in C into ProLog before compilation. Because ProLog adopts a different program paradigm and execution model from C, the generated binaries should become harder to understand. Majumdar et al. [74] proposed slicing obfuscation, which increases the resistance of obfuscated programs against slicing-based deobfuscation attacks, such as by enlarging the size of a slice with bogus codes.

Not that all existing control obfuscation approaches focus on syntactic-level transformation, while the semantic-level protection has rarely been discussed. Although they may demonstrate different strengths of resistance to attacks, their obfuscation effectiveness concerning semantic protection remains unclear.

V-D Data Obfuscation

Data obfuscation transforms data objects into obscure representations. We can transform the data of basic types via splitting, merging, procedurization, encoding, etc.

Data splitting distributes the information of one variable into several new variables. For example, a boolean variable can be split into two boolean variables, and performing logical operations on them can get the original value.

Data merging aggregates several variables into one variable. Collberg et al. [75] demonstrated an example that merges two 32-bit integers into one 64-bit integer. Ertaul and Venkatesh [76] proposed another method that packs several variables into one space with discrete logarithms.

Data procedurization substitutes static data with procedure calls. Collberg et al. [75] proposed to substitute strings with a function which can produce all strings by specifying paticular parameter values. Drape [77] proposed to encode numerical data with two inverse functions and . To assign a value to a variable , we assign it to an injected variable as . To use , we invoke instead.

Data encoding encodes data with mathematical functions or ciphers. Ertaul and Venkatesh [76] proposed to encode strings with Affine ciphers (e.g., Caser cipher) and employ discrete logarithms to pack words. Fukushima et al. [78] proposed to encode the clear numbers with exclusive or operations and then decrypt the computation result before output. Kovacheva [67] proposed to encrypt strings with the RC4 cipher and then decrypt them during runtime.

The data obfuscation ideas can also be extended to abstract data types, such as arrays and classes. Collberg et al. [75] discussed the obfuscation transformations for arrays, such as splitting one array into several subarrays, merging several arrays into one array, folding an array to increase its dimension, or flattening an array to reduce the dimension. Ertaul and Venkatesh [76] suggested transforming the array indices with composite functions. Zhu et al. [79, 80] proposed to employ homomorphic encryption to obfuscate array. Obfuscation classes is similar as obfuscation arrays. Collberg et al. [75] proposed to increase the depth of the class inheritance tree by splitting classes or inserting bogus classes. Sosonkin et al. [81] also discussed class obfuscation techniques via coalescing and splitting. Such approaches can increase the complexity of a class, e.g., measured in CK metrics [82].

V-E Polymorphic Obfuscation

Polymorphism is a technique widely employed by malware camouflage, which creates different copies of malware to evade anti-virus detection [83, 84]. It can also be employed to obfuscate programs. Note that previous obfuscation approaches focus on introducing obscurities to one program, while polymorphic obfuscation generats multiple obfuscated versions of a program simultaneously. Ideally, it would pose similar difficulties for adversaries to understand the components of each particular version. It is a technique orthogonal to the classic obfuscation and mainly designed to impede large-scale and reproductive attacks to homogeneous software [85].

Polymorphic obfuscation generally relies on some randomization mechanisms to introduce variance during obfuscation. Lin 

et al. [35] proposed to generate different data structure layout during each compilation. The data objects, such as structures, classes, and stack variables declared in functions, can be reordered randomly in each version. Xin et al. [83] further improved the data structure polymorphism approach by automatically discovering the data objects that can be randomized and eliminating the semantic errors generated during reordering. Crane et al. [86] proposed to randomize the tables of pointers such that the introduced diversity can be resistant to some code reuse attacks. Besides, Xu et al. [36] suggested introducing security features in the polymorphic code regions.

Vi Model-Oriented Obfuscation

Model-oriented obfuscation studies the theoretical obfuscation problems on computation models, such as circuits and Turing Machines. In 1996, Goldreich and Ostrovsky [87] firstly studied a theoretical software intellectual property protection mechanism based on Oblivious RAM. Later, Hada [88] firstly studied the theoretical obfuscation problem based on Turing Machines. In 2001, Barak et al. [12] proposed a well-recognized modeling approach for obfuscating circuits and Turing Machines, which lays a foundation of this field. There are two important research topics in this field: 1) what is the best security property that obfuscation can achieve? 2) how can we achieve the property?

Security Property Requirement Security Strength
Virtual Black-Box Property (VBBP)
Ideal Security
Indistinguishability Property (INDP)
INDP VBBP
Differing-Input Property (DIP)
If ,
then
VBBP DIP INDP
Best-Possible Property (BPP)
BPP Efficient INDP
Virtual Grey-Box Property (VGBP)
INDP VGBP VBBP
TABLE II: A comparison of the security properties for model-oriented obfuscation. Notations: is a polynomial-size simulator; is a polynomial-size adversary; is a polynomial-size learner; is a unbounded-size simulator; and are programs that compute a same function and have similar cost; and are programs that compute different functions; means querying the oracle access for times; is a negligible number.

Vi-a Security Properties

Barak et al. [12] defined an obfuscator as a “compiler” that inputs a program and outputs a new program . should posses the same functionality as , incur only polynomial slowdown, and hold some properties of unintelligibility or security. Note that following investigations generally adopt “polynomial slowdown” to discriminate whether an algorithm is efficient, but they may adopt different properties of security. Table II lists several major security properties discussed in the literature. Next we introduce each property and justify why the indistinguishability property is mostly interested by existing obfuscation algorithms.

Vi-A1 Virtual Black-Box Property (VBBP)

Ideally, an obfuscated program should leak no more information than accessing the program in a black-box manner. The property is firstly proposed by Barak et al. [12] as virtual black-box obfuscation. Let be a polynomial-size adversary (e.g., a probabilistic polynomial-time Turing Machine). VBBP requires that for any such adversaries, there exists a polynomial-size simulator , such that is negligible. The expression means any program semantics learned by the adversary can be simulated with a polynomial-size simulator.

Barak et al. have shown a negative result that at least one family of efficient programs cannot be obfuscated with VBBP. can be constructed with any one-way functions, whose semantics cannot be learned from oracle access. Therefore, given only oracle access to the function , no efficient algorithm can compute better than random guessing. However, given any efficient program , there exists an efficient algorithm that can compute the function. In this way, an efficient program that computes a property of the function can be constructed as , but it cannot be efficiently constructed from oracle access. Goldwasser et al. [89] and Bitansky et al. [90] further showed that some encryption programs cannot be obfuscated with VBBP when auxiliary inputs are available.

The negative result implies we cannot achieve genreal-purpose obfuscation with VBBP. However, it does not mean no program can be obfuscated with VBBP. Point function is such an exception [91].

Vi-A2 Indistinguishability Property (INDP) and Best-Possible Property (BPP)

Although VBBP is not universally attainable, we still need some attainable properties. As an alternative, Barak et al. [92] proposed a weaker notion: indistinguishability obfuscation. It requires that if two programs and are functionally equivalent, and they have similar program size and execution time, then is negligible. Because the notion does not assume a polynomial-size simulator, it avoids the inefficiency issue caused by unobfuscatable programs. Barak et al. have shown that INDP is attainable for universal programs, such as employing an obfuscator that converts all programs to their canonical forms or lookup tables. However, such an obfuscator might be trivial if it is inefficient.

Another important issue is how useful INDP is since it has no intuitive to hide information. INDP guarantees that the obfuscated program leaks no more information than any other obfuscated program versions of similar cost. Therefore, if we can design an obfuscator with INDP, it guarantees the obfuscation would have the best effectiveness. To rule out inefficient obfuscation, Goldwasser et al. [93] proposed best-possible obfuscation. BPP requires that for any polynomial-size learner , there exists a polynomial-size simulator , such that is negligible. By assuming a polynomial-size simulator, BPP excludes inefficient indistinguishability obfuscation. Note that BPP is also similar to VBBP but it is weaker than VBBP. The differece is that for BPP, the simulator can access another version of the program, while for VBBP the simulator works as a black-box. Lin et al. [94] proposed another similar notion exponentially-efficient indistinguishability property(XINDP), that requires the obfuscated program should be smaller than its truth table. Lin et al. showed that XINDP implies efficient INDP under the assumption of learning with errors (LWE) [95].

Vi-A3 More Properties

There are other alternatives discussed in the literature, such as virtual grey-box property (VGBP) [96], differing-input property (DIP) [92] and its variations. Their security levels are considered as between VBBP and INDP [18].

DIP is another notion proposed by Barak et al. [92]. For two programs and of the same cost, it requires if an adversary can distinguish their obfuscated versions (i.e., and ), she should be able to differ any versions of and with the same cost, i.e., to find an input such that . When and compute the same function, DIP implies INDP. DIP is also known as extractability obfuscation [97]. However, Boyle and Pass [98], and Garg et al. [99] showed that DIP is not attainable for all programs. To avoid the impossibility, Ishai et al. [100] proposed public-coin DIP. Note that DIP is a stronger notion than INDP, and we can have many useful applications with a DIP obfuscator [101].

VGBP [96] is similar to VBBP except that it empowers the simulator to unbounded size. To be nontrival, it restricts the simulator to have only limited times of oracle access. Bitansky et al. [102] shown that VGBP also implies INDP.

In brief, we cannot obfuscate universal programs with VBBP security, but we may obfuscate universal programs with INDP security. INDP is also the best-possible security property if the obfuscation is efficient. Therefore, if VBBP is attainable for some programs, efficient INDP would guarantee VBBP.

Vi-B Candidate Approach

In 2013, Garg et al. [8] proposed the first candidate algorithm to achieve INDP. Their approach is based on the idea of functional encryption [103]. Functional encryption allows users to compute a function from encrypted data with some keys. In the scenario of program obfuscation, suppose a program computes a function , then the functional encryption problem is to encrypt the program , such that can still compute with a public key , but should not reveal . Because cryptography algorithms generally have strong security basis, functional encryption becomes a dominant idea for program obfuscation with provable security.

Currently, the only known functional encryption approach for program obfuscation is graded encoding. It encodes programs with multilinear maps, such that any tampering attacks that do not respect its mathematical structure (encoded with private keys) would lead to meaningless program executions. To employ graded encoding, Garg et al. [104] proposed to convert programs to matrix branching programs (MBP) before encoding. However, such conversion incurs much overhead. Later, Zimmerman [9], and Applebaum and Brakerski [105] proposed to encode circuits directly without converting to MBPs. Next we discuss the two mechanisms.

Vi-B1 MBP-based Graded Encoding

An essential requirement to employ graded encoding is that the encoded programs can be evaluated. MBP is such a model that holds a good algebraic structure for evaluation even after being encrypted. There are two phases for MBP-based graded encoding: the first phase converts programs to MBPs; the second phase encrypts MBPs with graded encoding mechanisms. Garg et al. [104] showed that the MBP-based approach is feasible for shallow circuits and can be extended to all circuits with fully homomorphic encryption.

Converting to MBP

(a) A branching program.
(b) A matrix branching program.
(c)

Generate randomized matrix branching program.

Fig. 5: The procedures to convert a program (i.e., if of int8 equals to ) to a randomized matrix branching program.

A matrix branching program that computes a function is given by a tuple

selects a matrix or for each according to the corresponding bit of input;

is a row vector of size

; are matrix pairs of size that encode program semantics; is a column vector of size .

Given an input , the MBP computes an output as follows:

Suppose the -th matrix pair corresponds to the -th bit of the input. If the -th bit is 0, then is selected, or vice versa. The program output is the matrix multiplication result, which is a matrix, or a value.

How can we convert general programs to MBPs? The Barrington’s Theorem states that we can convert any boolean formula (boolean circuit of fan-in-two, depth ) to a branching program of width and length  [106]. Garg et al. [104] simply employ the result and assume the MBP is composed of with matrices. Ananth et al. [107] found the resulting MBP following Barrington’s Theorem is not very efficient. They propose a new approach that converts boolean formulas of size to matrix branching programs of width and length . Besides, there are several other efforts towards converting to more efficient MBPs, such as [108, 109]. The conversion generally includes two steps: from a program to a branching program and from to .

: A branching program is a finite state machine. For boolean formulas , the finite state machine has one start state, two stop states ( and ), and several intermediate states. Sauerhoff et al. [110] demonstrated a general approach to simulate any boolean formulas over AND and OR gates with branching programs. It can be extended to any formulas as they can be converted to the form with only AND and OR. Figure 5(a) demonstrates an example which converts a boolean program to a branching program. Suppose is an integer of eight bits, the boolean formula is . To model the branching program we need 10 states: states (-) that accept each bit of input, and stop states ( for , and for ).

: To compose a that is functionally equivalent to , we should compute each matrix of the . In general, can be an all-zero row vector except the first position is , and can be an all-zero column vector except the last position is . can be constructed from the adjacency matrices of each state. For example, if the first bit of input is , the station transfers from to

, then we start with an identity matrix and assign 1 to the element of the first row and the ninth column. Figure 

5(b) demonstrates the matrices corresponding to the fist input bit of Figure 5(a).

Following such converting approaches, the elements of resulting matrices are either 1 or 0. To protect the matrices, Kilian [111] proposed that we can randomize the elements of an MBP while not changing its functionality.

: To randomize the matrices, we first generate random integer matrices and their inverse of size . Then we multiply the original matrices with such random matrices as follows.

The randomization mechanism ensures that all randomization matrices would be canceled when evaluating . Note that to avoid errors incurred by floating-point numbers, we should guarantee all the elements of matrices are integers as shown in Figure 5(c). This is feasible because when the dominant of is , is also an integer matrix. Stating from an identity matrix, such can be obtained via iterative transformations leveraging the determinant invariant rule.

Graded Encoding

Garg et al. [8] noticed that although the randomized matrix branching program provides some security, it still suffers three kinds of attacks: partial evaluation, mixed input, and other attacks that do not respect the algebraic structure. Partial evaluation means we can evaluate whether partial programs generate the same result for different inputs. Mixed input means we can tamper the program intentionally by selecting and if and are related to the same bit of input. Graded encoding is designed to defeat such attacks. It is based on multilinear maps, which can be traced back to the historical multiparty key exchange problem proposed in 1976 [112, 113].

In general, a graded encoding scheme includes four components: setup that generates the public and private parameters for a system, encoding that defines how to encrypt a message with the private parameters, operations that declare the supported calculations with encrypted messages, and a zero-testing function that evaluates if the plain text of an encrypted message should be . Currently, there are two graded encoding schemes: GGH scheme [114] which encodes data over lattices, and CLT scheme [115] which encodes data over integers. Note that the graded encoding schemes for program obfuscation are slightly different from their original versions for multiparty key exchange. For simplicity, below we only discuss the graded encoding schemes for obfuscation.

The GGH scheme is named after Garg, Gentry, and Halevi [114], and it is the first plausible solution to compose multilinear maps. GGH scheme is based on ideal lattices. It encodes an element over a quotient ring as , where is the principal ideal generated by a short vector . The four components of GGH are defined as follows.

Setup: Suppose the miltilinear level is . The system generates an ideal-generator which is chosen as and should be short, a large enough modulus , denominators from the ring . Then we publish the zero-testing parameter as , where is a small ring element.

Encoding: The encoding of an element in set can be computed as : .

Operations: If two encodings are in the same set (e.g., and ), then one can add up them . If the two encodings are from disjoint sets, one can multiply the two encodings .

Zero-Testing Function: A zero testing function for a level- encoding is defined as

Note that . If is an encoding of , should be a short vector in and the product can be smaller than a threshold, otherwise, should be a short vector in some coset of and the product should be very large.

The CLT scheme is another multilinear map construction approach proposed by Coron, Lepoint, and Tibouchi [115, 116]. It is based on integers. The four components of the scheme are defined as follows.

Setup: The scheme generates secret large primes , small primes , random integers , random integers , a modulo , and a zero-testing parameter

Encoding: Suppose is a small random integer, the encoding of an element in set is .

Operation: If and are encodings in the same set, one can add them up. If they are from disjoint sets, they can be multiplied.

Zero-Testing Function: A zero testing function for a level- encoding is

In this function, is a value related to the bit-size of the encoding parameters [115].

Note that both the GGH scheme and CLT scheme are noisy multilinear maps, because the encoding of a value varies at different times. The only deterministic function is the zero-testing function. However, when a program becomes complex, the noise may overwhelm the signal. Take the CLT scheme as an example, the size of should be as large as possible to overwhelm the noise. This requirement largely restrict the usability of graded encoding.

Vi-B2 Circuit-based Graded Encoding

Converting circuits to MBPs incurs much overhead because the size of an MBP is generally exponential to the depth of a circuit. To avoid such overhead, Zimmerman [9] and Applebaum and Brakerski [105] proposed to obfuscate circuit programs directly. The approach focuses on keyed circuit families , and it can be extended to general circuits because all circuits can be transformed to keyed circuits [117].

Fig. 6: The evaluation process of an encoded keyed circuit (key: ) given an input is .

Circuit-based graded encoding assumes a circuit structure can be made public, and only the key needs to be protected. Figure 6 demonstrates an example that evaluates an obfuscated circuit given an input . To generate such an obfuscated circuit, we encode each input wire with a pair of encodings corresponding to the input values of and , i.e., ,. To support the addition of values from different multilinear sets, we publish the encoding of for each set, i.e., ,. Besides, we encode each key bit as and publish the encoding of , i.e., . Note that the approach introduces a checksum mechanism, i.e., , s.t. . Evaluating the obfuscated circuit just follows the original circuit structure. As a result, we can compute an evaluation value and a checksum .

Vii On Secure and Usable Obfuscation

In this section, we carefully justify the prospects of the existing investigations towards secure and usable obfuscation.

Vii-a With Code-Oriented Obfuscation

In general, existing code-oriented obfuscation approaches are usable but insecure. We may hope to achieve secure program obfuscation in the future if we have adequate evaluation metrics for security. Our primary supporting evidence is two folds. Firstly, current security metrics are inadequate. Secondly, code obfuscation techniques are promising to be resistant because deobfuscation also suffers limitations.

Vii-A1 Inadequate Security Metrics

To our best knowledge, all existing evaluation metrics emloyed by code-oriented obfuscation investigations are inadequate concerning the security discussed in Section III-C. Most investigations and tools (e.g., Obfuscator-LLVM [118]) adopt the metrics defined by Collberg et al. [11]. Besides, other evaluation metrics (e.g., [119, 120, 121, 122]) proposed in the literature are also inadequate. For example, Anckaert et al. [119] followed the idea of potency and developed more detailed measurements. Ceccato et al. [120, 121] proposed to conduct controlled code comprehension experiments against human attackers, such that we can measure the security with task completion rate and time. Such metrics are either heuristic or consider little about protecting essential program semantics.

Vii-A2 Limitations of Deobfuscation

The most recognized deobfuscation attacks (e.g., [123, 4]

) are based on program analysis and pattern recognition, both of which suffer limitations. Pattern recognition requires a predefined pattern repository, and it cannot automatically adapt to new obfuscation techniques. Program analysis suffers many challenges. For example, symbolic execution is one major program analysis approach to detect opaque predicates 

[124, 6, 125, 126], but it is vulnerable to many challenges, such as handling symbolic arrays and concurrent programs [127].

Moreover, The Rice’s Theorem [128] implies that automated attackers would suffer theoretical limitations because whether a deobfuscated program is equivalent to the obfuscated version is undecidable. Only when the tricks are known, some deobfuscation problems can be in NP [37]. Therefore, program obfuscation approaches are promising to have good resistance.

Vii-B With Model-Oriented Obfuscation

Current model-oriented obfuscation approaches can be considered as secure but unusable. We may hope to achieve some usable obfuscation applications if the performance metrics can be improved or the security requirement can be weakened. Our evidence is two folds. Firstly, there are several obfuscation implementations and applications which demonstrate the usability issue. Secondly, existing investigations are optimistic about the security of model-oriented obfuscation.

Vii-B1 Usability Issues

As the development of model-oriented obfuscation, several investigations begin focusing on application issues, such as [129, 10, 130, 131]. Apon et al. [129] implemented a full obfuscation solution based on the CLT graded encoding. It can obfuscate programs written in SAGE [132], a Python library for algebraic operations. Due to the performance issue, they only demonstrated single-bit identity gate, AND gate, XOR gate, and point functions. Even for an 8-bit point function, it takes hours to obfuscate the program and several minutes to evaluate the program. Besides, the size of the resulting program is several gigabytes. Lewi et al. [10] implemented another obfuscator that can run on top of either libCLT [115] or GGHLite [133, 134], which are open-source libraries of multilinear maps. The input program should be written in Cryptol [135], which is a programming language for design cryptography algorithms. They also evaluated the performance when obfuscating point functions. With a better hardware configuration, it can obfuscate 40-bit point functions in minutes and evaluate the program in seconds. However, the obfuscated program sizes are hundreds of megabytes or even several gigabytes. Their results show that CLT has better performance over GGH for small-size point functions, but the advantage declines when the program size grows. Halevi et al. [131] implemented a simplified version of the graph-induced multi-linear maps [136] which should outperform the CLT scheme when the number of branching program states grows. However, their evaluation results have not shown fundamental changes of the performance.

To summerize, we can find two usability issues in such investigations. Firstly, the costs are unacceptable even when obfuscating straightforward mathematical expressions. The other issue is that current obfuscation implementations only focus on elementary mathematical expressions, such as XOR, point functions, and conjecture normal forms [137, 138]. We do not know how to handle other advanced mathematical operations, not to mention complex code syntax.

Vii-B2 Security of Graded Encoding

Graded encoding is very powerful, Sahai and Waters [139] showed that INDP obfuscation can serve as a center for many cryptographic applications. Moreover, several investigations (e.g., [140, 138, 141, 142, 143, 144]) showed that an INDP obfuscator can be more powerful than merely providing INDP under idealized models.

Besides, current graded encoding schemes can be considered as secure but still need to be carefully explored. Both the GGH and CLT schemes are based on a new Graded Decisional Diffie-Hellman (GGDH) hardness assumption for multilinear maps. The community generally agrees that the security of GGDH should be further explored, because it cannot be reduced to other well-established hardness assumptions, such as and NTRU for encryption over lattices [145]. Indeed, there are several investigations on cryptanalysis (e.g., [146, 147, 148, 149]) or proposing newly patched schemes (e.g., [142, 150, 151, 152]). However, no severe security flaw has been founded so far that would obsolesce the approach.

Vii-C With new Evaluation Properties

There are two investigations (i.e., [21, 22]) which coincide with our results. Kuzurin et al. [22] observed that there are considerable gaps between practical and theoretical obfuscation. On one hand, the security properties in theoretical or model-oriented obfuscation are too strong; on the other hand, we have no formal security evaluation approach for practical or code-oriented obfuscation. They proposed to design specific security properties for particular application scenarios, such as constant hiding and predict obfuscation. Preda and Giacobazziet al. [21] found that existing obfuscation evaluation metrics are textual or syntactic, which ignore the semantics. They propose to employ a semantic-based approach to evaluate the potency of obfuscation. To this end, they employ abstract interpretations to model the syntactic transformation of obfuscation with semantic-based approach [153]. A semantic-based obfuscation transformation is defined as . The obfuscation is potent if there is a property such that . Such properties can be as simple as the sign () of a variable or a complex watermarking [154]. Moreover, the authors have conducted several preliminary investigations (e.g., [155, 23, 24, 156]) on employing the ideas to obfuscate simple programs.

Note that all the investigations are still very preliminary. There is still a large room for improvement in secure and usable obfuscation.

Viii Conclusions

To conclude, this work explores secure and usable program obfuscation in the literature. We have surveyed both existing code-oriented obfuscation approaches and model-oriented obfuscation approaches, which exhibit gaps and connections in between. Our primary result is that we do not have secure and usable program obfuscation approaches, and the main reason is we lack appropriate evaluation metrics considering both security and usability. Firstly, we have no adequate security metrics to evaluate code-oriented obfuscation approaches. Secondly, the performance requirement for model-oriented obfuscation approaches is too weak, and the security requirements might be too strong. Moreover, we do not know how to apply model-oriented approaches to obfuscating general codes. Our survey and result would urge the communities to rethink the notion of security and usable program obfuscation and facilitate the development of such obfuscation approaches in the future.

References

  • [1] J. Cappaert, “Code obfuscation techniques for software protection,” Ph.D. dissertation, 2012.
  • [2] S. K. Udupa, S. K. Debray, and M. Madou, “Deobfuscation: reverse engineering obfuscated code,” in Proc. of the 12th IEEE Working Conference on Reverse Engineering, 2005.
  • [3] S. Chandrasekharan and S. Debray, “Deobfuscation: improving reverse engineering of obfuscated code,” 2005.
  • [4] Y. Guillot and A. Gazet, “Automatic binary deobfuscation,” Journal in Computer Virology, 2010.
  • [5] K. Coogan, G. Lu, and S. Debray, “Deobfuscation of virtualization-obfuscated software: a semantics-based approach,” in Proc. of the 18th ACM Conference on Computer and Communications Security, 2011.
  • [6] B. Yadegari, B. Johannesmeyer, B. Whitely, and S. Debray, “A generic approach to automatic deobfuscation of executable code,” in Proc. of the 2015 IEEE Symposium on Security and Privacy, 2015.
  • [7] B. Bichsel, V. Raychev, P. Tsankov, and M. Vechev, “Statistical deobfuscation of android applications,” in Proc. of the ACM SIGSAC Conference on Computer and Communications Security, 2016.
  • [8] S. Garg, C. Gentry, S. Halevi, M. Raykova, A. Sahai, and B. Waters, “Candidate indistinguishability obfuscation and functional encryption for all circuits,” in Proc. of the 54th IEEE Annual Symposium on Foundations of Computer Science, 2013.
  • [9] J. Zimmerman, “How to obfuscate programs directly,” in Annual International Conference on the Theory and Applications of Cryptographic Techniques.   Springer, 2015.
  • [10] K. Lewi, A. J. Malozemoff, D. Apon, B. Carmer, A. Foltzer, D. Wagner, D. W. Archer, D. Boneh, J. Katz, and M. Raykova, “5gen: A framework for prototyping applications using multilinear maps and matrix branching programs,” in Proc. of the 23rd ACM SIGSAC Conference on Computer and Communications Security, 2016.
  • [11] C. Collberg, C. Thomborson, and D. Low, “Manufacturing cheap, resilient, and stealthy opaque constructs,” in Proc. of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 1998.
  • [12] B. Barak, O. Goldreich, R. Impagliazzo, S. Rudich, A. Sahai, S. Vadhan, and K. Yang, “On the (im) possibility of obfuscating programs,” in Advances in Cryptology—CRYPTO.   Springer, 2001.
  • [13] A. Balakrishnan and C. Schulze, “Code obfuscation literature survey,” CS701 Construction of Compilers, 2005.
  • [14] A. Majumdar, C. Thomborson, and S. Drape, “A survey of control-flow obfuscations,” in Information Systems Security.   Springer, 2006.
  • [15] S. Drape et al., “Intellectual property protection using obfuscation,” Proc. of SAS, 2009.
  • [16] K. A. Roundy and B. P. Miller, “Binary-code obfuscations in prevalent packer tools,” ACM Computing Surveys, 2013.
  • [17] S. Schrittwieser, S. Katzenbeisser, J. Kinder, G. Merzdovnik, and E. Weippl, “Protecting software throughobfuscation: can it keep pace with progress in code analysis?” ACM Computing Surveys, 2016.
  • [18] M. Horváth and L. Buttyán, “The birth of cryptographic obfuscation-a survey,” 2016.
  • [19] B. Barak, “Hopes, fears, and software obfuscation,” Communications of the ACM, 2016.
  • [20] S. Schrittwieser and S. Katzenbeisser, “Code obfuscation against static and dynamic reverse engineering,” in Information Hiding.   Springer, 2011.
  • [21] M. Dalla Preda and R. Giacobazzi, “Semantic-based code obfuscation by abstract interpretation,” Automata, Languages and Programming, 2005.
  • [22] N. Kuzurin, A. Shokurov, N. Varnovsky, and V. Zakharov, “On the concept of software obfuscation in computer security,” in Information Security.   Springer, 2007.
  • [23] M. Dalla Preda, M. Madou, K. De Bosschere, and R. Giacobazzi, “Opaque predicates detection by abstract interpretation,” in Algebraic Methodology and Software Technology.   Springer, 2006.
  • [24] M. Dalla Preda, “Code obfuscation and malware detection by abstract interpretation,” Ph.D. dissertation, 2007.
  • [25] C. Collberg, C. Thomborson, and D. Low, “A taxonomy of obfuscating transformations,” The University of Auckland, Tech. Rep., 1997.
  • [26] M. Madou, L. Van Put, and K. De Bosschere, “Loco: An interactive code (de) obfuscation tool,” in Proc. of the 2006 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-based Program Manipulation, 2006.
  • [27] S. Chow, P. Eisen, H. Johnson, and P. C. Van Oorschot, “White-box cryptography and an AES implementation,” in International Workshop on Selected Areas in Cryptography.   Springer, 2002.
  • [28] ——, “A white-box DES implementation for DRM applications,” in ACM Workshop on Digital Rights Management.   Springer, 2002.
  • [29] J.-M. Borello and L. Mé, “Code obfuscation techniques for metamorphic viruses,” Journal in Computer Virology, 2008.
  • [30] C. Linn and S. Debray, “Obfuscation of executable code to improve resistance to static disassembly,” in Proc. of the 10th ACM Conference on Computer and Communications Security, 2003.
  • [31] J.-T. Chan and W. Yang, “Advanced obfuscation techniques for java bytecode,” Journal of Systems and Software, 2004.
  • [32] I. V. Popov, S. K. Debray, and G. R. Andrews, “Binary obfuscation using signals,” in Usenix Security, 2007.
  • [33] S. M. Darwish, S. K. Guirguis, and M. S. Zalat, “Stealthy code obfuscation technique for software security,” in Proc. of the International Conference on Computer Engineering and Systems.   IEEE, 2010.
  • [34] F. B. Cohen, “Operating system protection through program evolution,” Computers & Security, 1993.
  • [35] Z. Lin, R. D. Riley, and D. Xu, “Polymorphing software by randomizing data structure layout,” in Detection of Intrusions and Malware, and Vulnerability Assessment.   Springer, 2009.
  • [36] H. Xu, Y. Zhou, and M. R. Lyu, “N-version obfuscation,” in Proc. of the 2nd ACM International Workshop on Cyber-Physical System Security, 2016.
  • [37] A. Appel, “Deobfuscation is in NP,” Princeton University, 2002.
  • [38] T. Ogiso, Y. Sakabe, M. Soshi, and A. Miyaji, “Software obfuscation on a theoretical basis and its implementation,” IEICE Trans. on Fundamentals of Electronics, Communications and Computer Sciences, 2003.
  • [39] IDA, https://www.hex-rays.com/products/ida/, 2017.
  • [40] C. Simonyi, “Hungarian notation,” MSDN Library, 1999.
  • [41] ProGuard, http://developer.android.com/tools/help/proguard.html, 2016.
  • [42] D. Low, “Protecting java code via code obfuscation,” Crossroads, 1998.
  • [43] G. Wroblewski, “General method of program code obfuscation,” Ph.D. dissertation, Wroclaw University of Technology, 2002.
  • [44] T. J. McCabe, “A complexity measure,” IEEE Trans. on Software Engineering, 1976.
  • [45] W. A. Harrison and K. I. Magel, “A complexity measure based on nesting level,” ACM Sigplan Notices, 1981.
  • [46] G. Arboit, “A method for watermarking java programs via opaque predicates,” in The Fifth International Conference on Electronic Commerce Research, 2002.
  • [47] M. I. Sharif, A. Lanzi, J. T. Giffin, and W. Lee, “Impeding malware analysis using conditional code obfuscation.” in NDSS, 2008.
  • [48] W. Zhu and C. Thomborson, “A provable scheme for homomorphic obfuscation in software security,” in The IASTED International Conference on Communication, Network and Information Security, 2005.
  • [49] A. Moser, C. Kruegel, and E. Kirda, “Limits of static analysis for malware detection,” in Proc. of the 23rd Annual Computer Security Applications Conference.   IEEE, 2007.
  • [50] B. Selman, D. G. Mitchell, and H. J. Levesque, “Generating hard satisfiability problems,” Artificial Intelligence, 1996.
  • [51] R. Tiella and M. Ceccato, “Automatic generation of opaque constants based on the k-clique problem for resilient data obfuscation,” in Proc. of the 24th International Conference on Software Analysis, Evolution and Reengineering.   IEEE, 2017.
  • [52] Z. Wang, J. Ming, C. Jia, and D. Gao, “Linear obfuscation to combat symbolic execution,” in ESORICS.   Springer, 2011.
  • [53] C. Wang, J. Hill, J. Knight, and J. Davidson, “Software tamper resistance: Obstructing static analysis of programs,” University of Virginia, Tech. Rep., 2000.
  • [54] W. Landi and B. G. Ryder, “Pointer-induced aliasing: a problem classification,” in Proc. of the 18th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 1991.
  • [55] A. Majumdar and C. Thomborson, “Manufacturing opaque predicates in distributed systems for code obfuscation,” in Proc. of the 29th Australasian Computer Science Conference.   Australian Computer Society, 2006.
  • [56] D. Dolz and G. Parra, “Using exception handling to build opaque predicates in intermediate code obfuscation techniques,” Journal of Computer Science & Technology, 2008.
  • [57] H. Chen, L. Yuan, X. Wu, B. Zang, B. Huang, and P.-c. Yew, “Control flow obfuscation with information flow tracking,” in Proc. of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, 2009.
  • [58] J. Palsberg, S. Krishnaswamy, M. Kwon, D. Ma, Q. Shao, and Y. Zhang, “Experience with software watermarking,” in Proc. of the 16th IEEE Annual Computer Security Applications Conference, 2000.
  • [59] J. Cappaert and B. Preneel, “A general model for hiding control flow,” in Proc. of the 10th Annual ACM Workshop on Digital Rights Management, 2010.
  • [60] C. Wang, J. Davidson, J. Hill, and J. Knight, “Protection of software-based survivability mechanisms,” in Proc. of the IEEE International Conference on Dependable Systems and Networks, 2001.
  • [61] S. Chow, Y. Gu, H. Johnson, and V. A. Zakharov, “An approach to the obfuscation of control-flow of sequential computer programs,” in Information Security.   Springer, 2001.
  • [62] T. László and A. Kiss, “Obfuscating c++ programs via control flow flattening,” Annales Universitatis Scientarum Budapestinensis de Rolando Eötvös Nominatae, Sectio Computatorica, 2009.
  • [63] J. Ge, S. Chaudhuri, and A. Tyagi, “Control flow based obfuscation,” in Proc. of the 5th ACM Workshop on Digital Rights Management, 2005.
  • [64] X. Zhang, F. He, and W. Zuo, Theory and practice of program obfuscation.   INTECH Open Access Publisher, 2010.
  • [65] V. Balachandran and S. Emmanuel, “Software code obfuscation by hiding control flow information in stack,” in Proc. of the IEEE International Workshop on Information Forensics and Security, 2011.
  • [66] C. Domas, “The movfuscator: Turning ’move’ into a soul-crushing RE nightmare,” REcon, 2015.
  • [67] A. Kovacheva, “Efficient code obfuscation for android,” in International Conference on Advances in Information Technology.   Springer, 2013.
  • [68] D. Bohannon and L. Holmes, “Revoke-obfuscation: powershell obfuscation detection using science,” BlackHat, 2017.
  • [69] S. Dolan, “mov is turing-complete,” 2013.
  • [70] L. M. Wills, “Automated program recognition: a feasibility demonstration,” Artificial Intelligence, 1990.
  • [71] M. Madou, B. Anckaert, B. De Bus, K. De Bosschere, J. Cappaert, and B. Preneel, “On the effectiveness of source code transformations for binary obfuscation,” in Proc. of the International Conference on Software Engineering Research and Practice.   CSREA Press, 2006.
  • [72] L. Ertaul and S. Venkatesh, “Jhide-a tool kit for code obfuscation.” in IASTED Conf. on Software Engineering and Applications, 2004.
  • [73] P. Wang, S. Wang, J. Ming, Y. Jiang, and D. Wu, “Translingual obfuscation,” 2016.
  • [74] A. Majumdar, S. J. Drape, and C. D. Thomborson, “Slicing obfuscations: design, correctness, and evaluation,” in Proc. of the 2007 ACM workshop on Digital Rights Management, 2007.
  • [75] C. Collberg, C. Thomborson, and D. Low, “Breaking abstractions and unstructuring data structures,” in Proc. of the IEEE International Conference on Computer Languages, 1998.
  • [76] L. Ertaul and S. Venkatesh, “Novel obfuscation algorithms for software security,” in Proc. of the 2005 International Conference on Software Engineering Research and Practice.   Citeseer, 2005.
  • [77] S. Drape et al., Obfuscation of abstract data types.   Citeseer, 2004.
  • [78] K. Fukushima, S. Kiyomoto, T. Tanaka, and K. Sakurai, “Analysis of program obfuscation schemes with variable encoding technique,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 2008.
  • [79] W. Zhu, C. Thomborson, and F.-Y. Wang, “Applications of homomorphic functions to software obfuscation,” in Intelligence and Security Informatics.   Springer, 2006.
  • [80] W. F. Zhu, “Concepts and techniques in software watermarking and obfuscation,” Ph.D. dissertation, ResearchSpace, Auckland, 2007.
  • [81] M. Sosonkin, G. Naumovich, and N. Memon, “Obfuscation of design intent in object-oriented applications,” in Proc. of the Third ACM workshop on Digital Rights Management, 2003.
  • [82] S. R. Chidamber and C. F. Kemerer, “A metrics suite for object oriented design,” IEEE Trans. on Software Engineering, 1994.
  • [83] Z. Xin, H. Chen, H. Han, B. Mao, and L. Xie, “Misleading malware similarities analysis by automatic data structure obfuscation,” in Information Security.   Springer, 2010.
  • [84] I. You and K. Yim, “Malware obfuscation techniques: a brief survey,” in Proc. of International Conference on Broadband, Wireless Computing, Communication and Applications, 2010.
  • [85] S. Forrest, A. Somayaji, and D. H. Ackley, “Building diverse computer systems,” in Proc. of the sixth IEEE Workshop on Hot Topics in Operating Systems, 1997.
  • [86] S. J. Crane, S. Volckaert, F. Schuster, C. Liebchen, P. Larsen, L. Davi, A.-R. Sadeghi, T. Holz, B. De Sutter, and M. Franz, “It’s a TRaP: table randomization and protection against function-reuse attacks,” in Proc. of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 2015.
  • [87] O. Goldreich and R. Ostrovsky, “Software protection and simulation on oblivious RAMs,” Journal of the ACM, 1996.
  • [88] S. Hada, “Zero-knowledge and code obfuscation,” in ASIACRYPT.   Springer, 2000.
  • [89] S. Goldwasser and Y. T. Kalai, “On the impossibility of obfuscation with auxiliary input,” in Proc. of the 46th Annual IEEE Symposium on Foundations of Computer Science, 2005.
  • [90] N. Bitansky, R. Canetti, H. Cohn, S. Goldwasser, Y. T. Kalai, O. Paneth, and A. Rosen, “The impossibility of obfuscation with auxiliary input or a universal simulator,” in International Cryptology Conference.   Springer, 2014.
  • [91] R. Canetti, “Towards realizing random oracles: hash functions that hide all partial information,” in Advances in Cryptology.   Springer, 1997.
  • [92] B. Barak, O. Goldreich, R. Impagliazzo, S. Rudich, A. Sahai, S. Vadhan, and K. Yang, “On the (im) possibility of obfuscating programs,” in Journal of the ACM, 2012.
  • [93] S. Goldwasser and G. N. Rothblum, “On best-possible obfuscation,” in Theory of Cryptography.   Springer, 2007.
  • [94] H. Lin, R. Pass, K. Seth, and S. Telang, “Indistinguishability obfuscation with non-trivial efficiency,” in IACR International Workshop on Public Key Cryptography.   Springer, 2016.
  • [95] O. Regev, “On lattices, learning with errors, random linear codes, and cryptography,” in STOC, 2005.
  • [96] N. Bitansky and R. Canetti, “On strong simulation and composable point obfuscation,” in Annual Cryptology Conference.   Springer, 2010.
  • [97] E. Boyle, K.-M. Chung, and R. Pass, “On extractability obfuscation,” in Theory of Cryptography Conference.   Springer, 2014.
  • [98] E. Boyle and R. Pass, “Limits of extractability assumptions with distributional auxiliary input,” in International Conference on the Theory and Application of Cryptology and Information Security.   Springer, 2014.
  • [99] S. Garg, C. Gentry, S. Halevi, and D. Wichs, “On the implausibility of differing-inputs obfuscation and extractable witness encryption with auxiliary input,” in International Cryptology Conference.   Springer, 2014.
  • [100] Y. Ishai, O. Pandey, and A. Sahai, “Public-coin differing-inputs obfuscation and its applications,” in Theory of Cryptography Conference.   Springer, 2015.
  • [101] P. Ananth, D. Boneh, S. Garg, A. Sahai, and M. Zhandry, “Differing-inputs obfuscation and applications.” IACR Cryptology ePrint Archive, 2013.
  • [102] N. Bitansky, R. Canetti, Y. T. Kalai, and O. Paneth, “On virtual grey box obfuscation for general circuits,” in International Cryptology Conference.   Springer, 2014.
  • [103] D. Boneh, A. Sahai, and B. Waters, “Functional encryption: Definitions and challenges,” in Theory of Cryptography Conference.   Springer, 2011.
  • [104] S. Garg, C. Gentry, S. Halevi, M. Raykova, A. Sahai, and B. Waters, “Candidate indistinguishability obfuscation and functional encryption for all circuits (full version),” in Cryptology ePrint Archive, 2013.
  • [105] B. Applebaum and Z. Brakerski, “Obfuscating circuits via composite-order graded encoding,” TCC, 2015.
  • [106] D. A. Barrington, “Bounded-width polynomial-size branching programs recognize exactly those languages in NC1,” in

    Proc. of the 18th Annual ACM Symposium on Theory of Computing

    , 1986.
  • [107] P. Ananth, D. Gupta, Y. Ishai, and A. Sahai, “Optimizing obfuscation: avoiding Barrington’s theorem,” in Proc. of the 2014 ACM SIGSAC Conference on Computer and Communications Security, 2014.
  • [108] A. Sahai and M. Zhandry, “Obfuscating low-rank matrix branching programs.” IACR Cryptology ePrint Archive, 2014.
  • [109] D. Boneh, K. Lewi, M. Raykova, A. Sahai, M. Zhandry, and J. Zimmerman, “Semantically secure order-revealing encryption: multi-input functional encryption without obfuscation.” EUROCRYPT, 2015.
  • [110] M. Sauerhoff, I. Wegener, and R. Werchner, “Formula size over the full binary basis?” in Proc. of the 16th Annual Symposium on Theoretical Aspects of Computer Science.   Springer, 1999.
  • [111] J. Kilian, “Founding crytpography on oblivious transfer,” in Proc. of the 20th Annual ACM Symposium on Theory of Computing, 1988.
  • [112] W. Diffie and M. E. Hellman, “Multiuser cryptographic techniques,” in Proc. of the National Computer Conference and Exposition.   ACM, 1976.
  • [113] D. Boneh and A. Silverberg, “Applications of multilinear forms to cryptography,” Contemporary Mathematics, 2003.
  • [114] S. Garg, C. Gentry, and S. Halevi, “Candidate multilinear maps from ideal lattices,” in Proc. of the Annual International Conference on the Theory and Applications of Cryptographic Techniques.   Springer, 2013.
  • [115] J.-S. Coron, T. Lepoint, and M. Tibouchi, “Practical multilinear maps over the integers,” in Advances in Cryptology.   Springer, 2013.
  • [116] ——, “New multilinear maps over the integers,” in Annual Cryptology Conference.   Springer, 2015.
  • [117] J. Zimmerman, “How to obfuscate programs directly,” IACR Cryptology ePrint Archive, 2014.
  • [118] P. Junod, J. Rinaldini, J. Wehrli, and J. Michielin, “Obfuscator-LLVM: software protection for the masses,” 2015.
  • [119] B. Anckaert, M. Madou, B. De Sutter, B. De Bus, K. De Bosschere, and B. Preneel, “Program obfuscation: a quantitative approach,” in Proc. of the ACM Workshop on Quality of Protection, 2007.
  • [120] M. Ceccato, M. Di Penta, J. Nagra, P. Falcarin, F. Ricca, M. Torchiano, and P. Tonella, “Towards experimental evaluation of code obfuscation techniques,” in Proc. of the 4th ACM Workshop on Quality of Protection, 2008.
  • [121] ——, “The effectiveness of source code obfuscation: An experimental assessment,” in Proc. of the 17th IEEE International Conference on Program Comprehension, 2009.
  • [122] M. Ceccato, A. Capiluppi, P. Falcarin, and C. Boldyreff, “A large study on the effect of code obfuscation on the quality of java code,” Empirical Software Engineering, 2015.
  • [123] J. Raber and E. Laspe, “Deobfuscator: an automated approach to the identification and removal of code obfuscation,” in Proc. of the 14th Working Conference on Reverse Engineering.   IEEE, 2007.
  • [124] B. Yadegari and S. Debray, “Symbolic execution of obfuscated code,” in Proc. of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 2015.
  • [125] J. Ming, D. Xu, L. Wang, and D. Wu, “LOOP: logic-rriented opaque predicate detection in obfuscated binary code,” in Proc. of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 2015.
  • [126] S. Banescu, C. Collberg, and A. Pretschner, “Predicting the resilience of obfuscated code against symbolic execution attacks via machine learning,” in USENIX Security Symposium, 2017.
  • [127] X. Hui, Z. Yangfan, K. Yu, and R. L. Michael, “Concolic execution on small-size binaries: challenges and empirical study,” in Proc. of the 47th IEEE/IFIP International Conference on Dependable Systems & Networks, 2017.
  • [128] J. E. Hopcroft, R. Motwani, and J. D. Ullman, “Automata theory, languages, and computation,” International Edition, 2006.
  • [129] D. Apon, Y. Huang, J. Katz, and A. J. Malozemoff, “Implementing cryptographic program obfuscation.” IACR Cryptology ePrint Archive, 2014.
  • [130] M. R. Brent Carmer, Alex J. Malozemoff, “5Gen-C: multi-input functional encryption and program obfuscation for arithmetic circuits,” in Proc. of the 24th ACM SIGSAC Conference on Computer and Communications Security, 2017.
  • [131] S. Halevi, T. Halevi, V. Shoup, and N. Stephens-Davidowitz, “Implementing BP-obfuscation using graph-induced encoding.” in Proc. of the 24th ACM SIGSAC Conference on Computer and Communications Security, 2017.
  • [132] W. Stein and D. Joyner, “Sage: System for algebra and geometry experimentation,” ACM SIGSAM Bulletin, 2005.
  • [133] A. Langlois, D. Stehlé, and R. Steinfeld, “GGHLite: more efficient multilinear maps from ideal lattices,” in Annual International Conference on the Theory and Applications of Cryptographic Techniques.   Springer, 2014.
  • [134] M. R. Albrecht, C. Cocis, F. Laguillaumie, and A. Langlois, “Implementing candidate graded encoding schemes from ideal lattices,” in Proc. of the International Conference on the Theory and Application of Cryptology and Information Security.   Springer, 2014.
  • [135] J. R. Lewis and B. Martin, “Cryptol: high assurance, retargetable crypto development and validation,” in Proc. of the 2003 Military Communications Conference.   IEEE.
  • [136] C. Gentry, S. Gorbunov, and S. Halevi, “Graph-induced multilinear maps from lattices,” in Theory of Cryptography Conference.   Springer, 2015.
  • [137] Z. Brakerski and G. N. Rothblum, “Obfuscating conjunctions,” in Advances in Cryptology.   Springer, 2013.
  • [138] ——, “Black-box obfuscation for d-CNFs,” in Proc. of the 5th conference on Innovations in Theoretical Computer Science.   ACM, 2014.
  • [139] A. Sahai and B. Waters, “How to use indistinguishability obfuscation: deniable encryption, and more,” in Proc. of the 46th Annual ACM Symposium on Theory of Computing, 2014.
  • [140] R. Canetti and V. Vaikuntanathan, “Obfuscating branching programs using black-box pseudo-free groups.” IACR Cryptology ePrint Archive, 2013.
  • [141] Z. Brakerski and G. N. Rothblum, “Virtual black-box obfuscation for all circuits via generic graded encoding,” in Theory of Cryptography.   Springer, 2014.
  • [142] B. Barak, S. Garg, Y. T. Kalai, O. Paneth, and A. Sahai, “Protecting obfuscation against algebraic attacks,” in Proc. of the Annual International Conference on the Theory and Applications of Cryptographic Techniques.   Springer, 2014.
  • [143] N. Bitansky and V. Vaikuntanathan, “Indistinguishability obfuscation: from approximate to exact,” in Theory of Cryptography Conference.   Springer, 2016.
  • [144] M. Mahmoody, A. Mohammed, and S. Nematihaji, “More on impossibility of virtual black-box obfuscation in idealized models.” IACR Cryptology ePrint Archive, 2015.
  • [145] J. Hoffstein, J. Pipher, and J. H. Silverman, “Ntru: A ring-based public key cryptosystem,” in International Algorithmic Number Theory Symposium.   Springer, 1998.
  • [146] Y. Hu and H. Jia, “Cryptanalysis of GGH map,” in Annual International Conference on the Theory and Applications of Cryptographic Techniques.   Springer, 2016.
  • [147] B. Minaud and P.-A. Fouque, “Cryptanalysis of the new multilinear map over the integers,” IACR Cryptology ePrint Archive, 2015.
  • [148] J. H. Cheon, P.-A. Fouque, C. Lee, B. Minaud, and H. Ryu, “Cryptanalysis of the new CLT multilinear map over the integers,” in Annual International Conference on the Theory and Applications of Cryptographic Techniques.   Springer, 2016.
  • [149] J.-S. Coron, M. S. Lee, T. Lepoint, and M. Tibouchi, “Cryptanalysis of GGH15 multilinear maps,” in Annual Cryptology Conference.   Springer, 2016.
  • [150] D. Boneh, D. J. Wu, and J. Zimmerman, “Immunizing multilinear maps against zeroizing attacks.” IACR Cryptology ePrint Archive, 2014.
  • [151] S. Garg, P. Mukherjee, and A. Srinivasan, “Obfuscation without the vulnerabilities of multilinear maps.” IACR Cryptology ePrint Archive, 2016.
  • [152] S. Badrinarayanan, E. Miles, A. Sahai, and M. Zhandry, “Post-zeroizing obfuscation: new mathematical tools, and the case of evasive circuits,” in Annual International Conference on the Theory and Applications of Cryptographic Techniques.   Springer, 2016.
  • [153] P. Cousot and R. Cousot, “Systematic design of program transformation frameworks by abstract interpretation,” in ACM SIGPLAN Notices, 2002.
  • [154] ——, “An abstract interpretation-based framework for software watermarking,” in ACM SIGPLAN Notices.   ACM, 2004.
  • [155] M. Dalla Preda and R. Giacobazzi, “Control code obfuscation by abstract interpretation,” in Proc. of the third IEEE International Conference on Software Engineering and Formal Methods, 2005.
  • [156] R. Giacobazzi, “Hiding information in completeness holes: New perspectives in code obfuscation and watermarking,” in Proc. of the sixth IEEE International Conference on Software Engineering and       Formal Methods, 2008.