Cryptography is instrumental to implementing security services such as confidentiality, integrity, and authenticity in most software (both new and legacy). In practice, proper usage and correct implementation of cryptographic primitives are difficult; vulnerabilities often occur due to misuse or erroneous implementations of cryptographic primitives. Example vulnerabilities arising from misuse of cryptography include weak and/or broken random number generators, which enable an adversary to recover servers’ private keys [PsAndQs]. Cryptographic APIs are sometimes misused by software developers, e.g., leaving applications insecure against specific attacks, such as chosen-plaintext attacks [egele2013empirical], of which a typical software developer may be unaware.
In addition, incorrect implementations of cryptographic primitives can result in leakage of secrets through side-channels [crypto-side-channel-1] or through “dead memory” [khunt]. Other vulnerabilities in software for embedded and generic systems include implementation flaws in cryptographic libraries (e.g., the HeartBleed [HeartBleed] and Poodle [Poodle] vulnerabilities in the OpenSSL library), weaknesses in protocol suites (e.g., cryptographic weakness in HTTPS implementations [postcards, Logjam-Attack]), and algorithmic vulnerabilities in cryptographic primitives (e.g., an unknown collision attack on the MD5 hash function [Counter-Cryptanalysis] or a chosen-prefix collision attack on the SHA1 hash function [leurentsha]). Even after such vulnerabilities are discovered, it may take a while before appropriate fixes are applied to existing software as demonstrated by a recent large-scale empirical study [li2017large] that showed many software projects did not patch cryptography-related vulnerabilities for a full year after their public disclosure. This represents a large window for adversaries to exploit such vulnerabilities.
We argue that (automated) tools that assist software and system designers and developers in identifying, analyzing, and replacing cryptographic primitives in binaries (without requiring source code) can help shorten this vulnerability window, especially for legacy software. To address this issue, we explore the feasibility of designing and developing a toolchain for Augmentation and Legacy-software Instrumentation of Cryptographic Executables (ALICE).
Contributions: Specifically, our goal is to make the following contributions:
We design the ALICE framework to automatically augment and instrument executables with broken or insecure cryptographic primitives. We also open source ALICE’s code at https://github.com/SRI-CSL/ALICE.
We develop heuristics to determine the scope of the (binary) code segments requiring augmentation if the cryptographic primitives are replaced with stronger ones.
We implement ALICE and experimentally evaluate its performance on several executable open source binaries of varying complexity.
Outline: The rest of this paper is organized as follows: Section 2 discusses related work. Section 3 overviews the ALICE toolchain, while Section 4 contains its design details. Due to space constraints, implementation details are described in Appendix B. Section 5 contains the results of our experimental evaluations. Section 6 discusses ALICE’s limitations, while Section 7 concludes the paper.
2 Related Work
Identifying Cryptographic Primitives. Several publicly available tools [findcrypt, kanal, hcdetector] utilize static analysis to identify cryptographic primitives by detecting known (large) constants used in their operation. Such constants, for example, can be in the form of look-up tables (e.g., S-Boxes in AES) or fixed initialization vectors/values (e.g., IV in SHA-128/256). Such tools do not always produce accurate results, as the detected algorithm may be another function or another cryptographic primitive that uses the same constant values [LGF+15]. They are also ineffective when dealing with obfuscated programs [CFM+12].
In terms of academic efforts, Lutz [lutz2008towards] detects block ciphers from execution traces based on three heuristics: the presence of loops, high entropy, and integer arithmetic. Grobert et al. [GWH+11] introduce an additional heuristic to extract cryptographic parameters from such execution traces and identify primitives by comparing the input-output relationships with those of known cryptographic functions. Lestringant et al. [LGF+15] propose a static method based on data flow graph isomorphism to identify symmetric cryptographic primitives. Recently, the CryptoHunt [CryptoHunt17] tool introduced a new technique called bit-precise symbolic loop mapping to identify cryptographic primitives in obfuscated binaries.
Our work focuses on non-obfuscated programs as we target common (and possibly legacy) software and not malware. We rely on finding known constants to identify cryptographic primitives as our first step. We then improve the accuracy of detection by applying a heuristic based on input-output relationships, similar to the work in [GWH+11]. In contrast to [GWH+11], our identification algorithm does not require program execution traces.
While there is existing work on identifying executable segments implementing cryptographic primitives, none of it investigates the problem of replacing an identified weak primitive with a more secure one. Such replacement requires non-trivial operations, even if one can successfully identify executable segments implementing cryptographic primitives. To accomplish such replacement, one has to: (1) determine all changes throughout the binary necessary for replacing the identified primitive, and (2) rewrite the binary to apply all of the determined changes. To the best of our knowledge, there is no prior work addressing the first task, either standalone or in conjunction with the second. The second task can be tackled using slight modifications of existing binary rewriting techniques. In this paper, we categorize the different types of changes that may be required when replacing a cryptographic primitive, and then discuss how to locate and rewrite each category of such changes in Section 4.2. In the rest of this section, we overview general binary rewriting techniques.
Rewriting Binaries. Binary rewriting is a technique that transforms a binary executable into another without requiring the original’s source code. Typically, the transformed binary must preserve the functionality of the original one while possibly augmenting it with extra functionalities. There are two main categories of binary rewriting: static and dynamic binary rewriting.
In static binary rewriting [detours, bauman12superset, anand2013compiler, edwards2001vulcan], the original binary is modified offline without executing it. Static binary rewriting is typically performed by replacing the original instructions with an unconditional jump that redirects the program control flow to the rewritten instructions, stored in a different area of the binary. This relocation can be done at different levels of granularity such as inserting a jump for each modified instruction, for the entire section or for each routine containing modified instructions. Static binary rewriting often requires disassembling the entire binary and thus incurs high overhead during the rewriting phase, but typically results in small runtime overhead in the rewritten binary. This technique is thus well-suited for scenarios where the runtime performance of the rewritten binary is a primary concern. Another approach for static rewriting is to transform the binary into the relocatable disassembled code and directly rewrite instructions in the transformed code. Doing so completely eliminates runtime and size overhead in the rewritten binary. Nonetheless, this approach relies on many heuristics and assumptions for identifying and recovering all relocating symbols and is still subject to the high overhead during the rewriting phase. Some example tools that are based on this approach are Uroboros [uroboros] and Ramblr [ramblr].
Dynamic binary rewriting (or dynamic instrumentation) [valgrind, pin, DynamoRIO, perkins2009automatically] modifies the binary’s behaviors during its execution through the injection of instrumentation code. Due to the need to instrument the code at runtime, this technique may result in higher execution time compared to the original binary. The main advantage of dynamic rewriting is its ability to accurately capture information about a program’s states or behaviors, which is much harder when using static rewriting. Example dynamic binary rewriting tools include Pin [pin] and DynamoRIO [DynamoRIO].
In this work, we first leverage the runtime information retrieved from dynamic instrumentation to accurately locate instructions that need to be rewritten. Instruction rewriting is then performed statically in order to minimize the runtime overhead of the rewritten binary.
3 Overview of the ALICE Framework
The most straightforward, and obvious, approach to replace implementations of vulnerable cryptographic primitives requires modifying (and then recompiling) a program’s source code. This takes time and effort, and renders it difficult to fix legacy software for which source code may not be available. Instead, we propose ALICE – a toolchain that automatically augments and replaces weak, vulnerable, and/or broken cryptographic primitives at the binary level.
To better illustrate how ALICE works, we start by presenting a simple representative example shown in Figure 1. This example program first computes an MD5 digest over an input string. The digest is then converted into a human-readable form, which is in turn displayed to the user.
MD5 has been shown to be vulnerable to collision and pre-image attacks [klima2006tunnels, sasaki2009finding]. Suppose that a system or software developer would like to manually rewrite parts of the binary in order to support a more secure hash algorithm – e.g., SHA-256. One way to accomplish this task is to perform the following steps:
Step-1: Identify the functions in the binary that implement MD5.
Step-2: Recover the type and order of parameters in the identified functions.
Step-3: Insert an implementation of a SHA-256 function with the same type and order of parameters into the original binary.
Step-4: Redirect all calls to MD5 to the newly added SHA-256 function.
Step-5: Determine all changes throughout the binary affected by an increase in the digest size (MD5’s digest size is 128 bits while that of SHA-256 is 256 bits).
Step-6: Rewrite the binary according to changes discovered in step-5.
Goal & Scope: ALICE is designed to automate the aforementioned steps. It targets ELF-based X86/64 binaries generated by compiling C programs. We do not assume any knowledge of, or require, the corresponding source code or debugging symbols. Since ALICE is built as a defensive tool to work on standard and legacy software, it assumes the target programs are not malicious. Obfuscated or malware binaries are out of scope in this work. We demonstrate concrete feasibility on cryptographic hash functions, but the design and ideas behind ALICE are general and can be applied to other primitives too.
NOTE: We use cryptographic hash functions (or hash functions) to denote the algorithmic details behind the implementations/executables. We denote the functions (or methods) in such implementations/executables that realize such hash function(s) as hash routines. We use target hash functions/routines to refer to the insecure hash functions/routines that need to be identified and replaced.
4 Design Details of ALICE
This section discusses the design details of the ALICE toolchain. The operation of ALICE consists of three main phases: (i) identifying cryptographic primitives, (ii) scoping changes, and (iii) augmenting and rewriting changes.
4.1 Identifying Cryptographic Primitives (Hash Functions)
We designed ALICE to target non-malicious (i.e., unobfuscated) binary programs. The first phase of ALICE leverages this characteristic and identifies hash functions by first detecting known static features of the target hash function.
Observation 1 (Constants)
A common design approach for hash functions is to initialize a digest buffer using well-known constants. As an example, MD5 uses four 32-bit constant words, (A, B, C, D), with the following hex values: A = 0x67452301, B = 0xefcdab89, C = 0x98badcfe, D = 0x10325476.
If we locate such constants in a binary, there is a high chance that a routine enclosing those constants implements part(s) of the MD5 hash function. Our approach starts by scanning a binary program to find the addresses where known constants appear. We then mark a routine in which those constants are enclosed as a candidate implementation of the target hash function.
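The constant scan can be illustrated with a minimal sketch. It assumes the constants appear as little-endian 32-bit immediates in the raw bytes of a routine, and it uses the MD5 initial state words from RFC 1321. The function names and the 64-byte proximity window are hypothetical choices for illustration, not part of ALICE.

```python
import struct

# MD5 initial state words as specified in RFC 1321.
MD5_INIT_WORDS = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)


def find_constant_offsets(blob, words=MD5_INIT_WORDS):
    """Return {word: [offsets]} for each little-endian occurrence in blob."""
    hits = {}
    for word in words:
        needle = struct.pack("<I", word)
        offsets = []
        start = blob.find(needle)
        while start != -1:
            offsets.append(start)
            start = blob.find(needle, start + 1)
        hits[word] = offsets
    return hits


def looks_like_md5_init(blob, window=64):
    """Heuristic (assumption): all four words occur within `window` bytes
    of an occurrence of the first word."""
    hits = find_constant_offsets(blob)
    if any(not offsets for offsets in hits.values()):
        return False
    for base in hits[MD5_INIT_WORDS[0]]:
        if all(any(abs(off - base) <= window for off in hits[word])
               for word in MD5_INIT_WORDS[1:]):
            return True
    return False


# Toy "routine": five nops followed by four stores of the MD5 constants.
code = b"\x90" * 5
for word in MD5_INIT_WORDS:
    code += b"\xc7\x47\x04" + struct.pack("<I", word)
print(looks_like_md5_init(code))
```

A match only marks the enclosing routine as a candidate; the later input/output check is what confirms or rejects it.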
Observation 2 (Context Initialization)
The example in Figure 1a illustrates a typical usage of a hash function in practice. An application function – main() – calls the MD5() function, which in turn calls MD5Init() to initialize a digest buffer with known constants. Having a dedicated function to set up an initial context (e.g., MD5Init()) is common practice when implementing most cryptographic primitives and can be found in several open-source libraries such as OpenSSL or libgcrypt.
This observation suggests that the identified candidate implementation will typically correspond to the initialization routine – Init(). However, this is not always the case, as Init() could be inlined. For example, MD5Init() in Figure 1 will be inlined inside MD5() when the program is compiled with the optimization flag O3. In this scenario, the identified routine will instead correspond to the implementation of the target hash function – MD5(). One could use a simple heuristic based on the size of the routine to distinguish between the two scenarios. However, we found this approach to produce many false negatives in practice. Instead, we adopt a more conservative approach and consider routines produced in both scenarios as candidates. More specifically, ALICE analyzes the program’s callgraph and determines all routines that invoke the previously identified routine. It then adds those caller routines to the list of candidates.
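The conservative candidate expansion amounts to adding the direct callers of every routine that encloses the constants. The following sketch assumes the callgraph is already available as a mapping from caller to callees; the routine names and the `expand_candidates` helper are hypothetical.

```python
def expand_candidates(callgraph, seeds):
    """Return the seed routines plus every routine that directly calls one.

    callgraph: dict mapping a caller name to the set of its callees.
    seeds: routines in which the known constants were found.
    """
    seeds = set(seeds)
    candidates = set(seeds)
    for caller, callees in callgraph.items():
        if callees & seeds:
            candidates.add(caller)
    return candidates


# Toy callgraph mirroring the running example (names are illustrative).
callgraph = {
    "main": {"MD5", "sprintf"},
    "MD5": {"MD5Init", "MD5Update", "MD5Final"},
    "MD5Init": set(),
}
print(sorted(expand_candidates(callgraph, {"MD5Init"})))
```

If MD5Init() were inlined, the seed would be MD5() itself and main() would be added as a (false-positive) candidate, which the next step has to eliminate.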
Now, ALICE needs a mechanism to eliminate false positives, which can arise for two reasons. The first reason is that our approach so far focuses on ensuring no false negatives by accepting all routines possibly implementing a target hash routine. The second reason is that static features such as a constant vector are not always unique to a single hash function. It is not uncommon for different hash functions to share the same constant vectors. Examples of pairs of hash functions that use the same constant vectors are BLAKE2b – SHA-512 and MD5 – MD4. In Figure 1, even if we successfully determine that MD5Init() is inlined, we still cannot easily distinguish whether the identified hash routine implements the MD4 or the MD5 hash function.
Observation 3 (Input/Output Uniqueness)
A cryptographic hash function is deterministic, i.e., for a given input string, it always generates the same digest as output. The input/output pair is usually (in practice) unique to the hash function that produces them.
With this observation, the best way to test whether a candidate implements the target hash function is to execute the identified routine and compare the resulting output with the expected output. Since we expect the identification phase to be an offline computation, we base this step of our approach on an offline dynamic execution technique, which allows us to execute a given routine with any concrete chosen input. This is in contrast with online dynamic execution, which requires running the entire binary program with test cases. To perform offline dynamic execution, it is necessary to set up a call stack with proper parameters that will be passed into that routine.
Observation 4 (Parameters)
An implementation of a cryptographic function generally takes a fixed number of function parameters. For example, in Figure 1a, MD5() takes three parameters: an input string, the input length, and an output digest. Some implementations do not mandate the use of the input length, as it can be inferred from the input string (e.g., via strlen()). Thus, a hash routine in such implementations will take only two parameters. It is also worth noting that, even though the number of parameters is fixed, the order in which these parameters appear may not be the same for all implementations.
Based on this observation, ALICE enumerates all possible combinations of a hash routine’s parameters and prepares 8 ( = 3!+2!) call stacks, each initialized with different combinations of parameters. It then executes a candidate routine on each call stack and observes the output buffer after the execution. The first phase of ALICE finishes by outputting candidates producing the expected output as well as the parameter information obtained from corresponding call stacks.
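The enumeration of parameter orders and the input/output check can be sketched as follows. This is only an illustration: a Python callable stands in for the candidate routine, whereas ALICE executes the binary routine on a prepared call stack. The names `call_layouts`, `identify`, and `md5_out_first` are hypothetical, and the expected digest comes from Python's hashlib.

```python
import hashlib
from itertools import permutations

TEST_INPUT = b"abc"


def call_layouts():
    """Yield the 3! + 2! = 8 parameter orders described in the text."""
    for roles in (("input", "length", "output"), ("input", "output")):
        yield from permutations(roles)


def identify(candidate, expected=hashlib.md5(TEST_INPUT).digest()):
    """Return the first parameter order under which `candidate` writes the
    expected digest into the output buffer, or None if no order matches."""
    for layout in call_layouts():
        out = bytearray(len(expected))
        values = {"input": TEST_INPUT, "length": len(TEST_INPUT), "output": out}
        try:
            candidate(*(values[role] for role in layout))
        except Exception:
            continue  # stands in for a crash or timeout of the executed routine
        if bytes(out) == expected:
            return layout
    return None


# Stand-in for a candidate routine whose signature is (output, input, length).
def md5_out_first(output, data, length):
    output[:] = hashlib.md5(data[:length]).digest()


print(identify(md5_out_first))
```

The returned order is the parameter information that the later rewriting phase needs in order to require the same order from the replacement routine.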
4.2 Scoping Changes
After locating target hash routines, ALICE must determine changes (throughout the binary) that are required for replacing such routines. We now describe three categories of such changes using the illustrative example in Figure 1, and later outline how to identify a subset of such changes (at the binary level).
Routine Replacement: The first category is a change in the hash routine itself. Code/instructions implementing the target hash routine are replaced by code/instructions that implement a more secure hash function. For example, if our goal is to replace the MD5 function with SHA-256 in Figure 1, instructions corresponding to MD5() (i.e., the ones starting from address 0x400616) need to be replaced by SHA-256 instructions. We will discuss how ALICE augments this type of change into a binary in Section 4.3.
Changes in Buffer Sizes: Depending on the digest sizes of the replacement and the target hash functions, other related memory buffers may need to be enlarged to correctly accommodate the new replacement routine. For instance, replacing MD5 with SHA-256 in Figure 1 would also require enlarging the variable storing the output of the hash function (i.e., the digest variable) from 16 bytes to 32 bytes. This change in buffer size affects other memory buffers that consume the output digest, e.g., hexdigest also needs to be expanded by 32 bytes. Such changes have to be scoped and propagated throughout the entire binary. We discuss how to identify this type of change in the remainder of this section and how ALICE performs augmentation on such changes in Section 4.3.
Changes in Logic: This category refers to changes that need to be applied to the underlying binary logic in order to obtain a correct resulting binary. For example, in Figure 1, simply replacing MD5() with SHA-256() and enlarging the hexdigest and digest variables does not suffice to produce the desired binary. One would also have to edit the loop termination conditions in lines 19 and 24 so that the loops cover the 32 digest bytes of the replacement SHA-256 instead of the 16 bytes of MD5. At the binary level, this change corresponds to modifying the instructions at addresses 0x4006dc and 0x40070d from [cmp DWORD PTR [rbp-0x54],0xf] to [cmp DWORD PTR [rbp-0x54],0x1f]. This requires knowing that the constant 0xf in those instructions is related to the digest size. However, it is hard, and in some cases impossible, to locate and augment this type of change without prior knowledge of the correct behavior of the resulting binary. Therefore, we do not consider this category of changes in this work. We further discuss the difficulty of automatically determining this type of change in Appendix D.
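For illustration, once such an instruction has been located manually, the edit itself is a one-byte patch. The sketch below assumes the standard x86-64 encoding of `cmp DWORD PTR [rbp+disp8], imm8` (opcode 0x83, ModRM 0x7d, then the displacement and immediate bytes); the helper name and the buffer layout are hypothetical, and mapping a virtual address such as 0x4006dc to a file offset is left out.

```python
def patch_cmp_imm8(code, offset, old_imm, new_imm):
    """Patch the 8-bit immediate of `cmp DWORD PTR [rbp+disp8], imm8`.

    code: mutable bytearray holding the routine's bytes.
    offset: position of the instruction inside `code` (assumed known).
    """
    insn = bytes(code[offset:offset + 4])
    # 0x83 /7 ib is `cmp r/m32, imm8`; ModRM 0x7d selects [rbp + disp8].
    if len(insn) != 4 or insn[0] != 0x83 or insn[1] != 0x7D:
        raise ValueError("not a cmp DWORD PTR [rbp+disp8], imm8 instruction")
    if insn[3] != old_imm:
        raise ValueError("unexpected immediate value")
    code[offset + 3] = new_imm
    return code


# nop; cmp DWORD PTR [rbp-0x54], 0xf; jle <rel8>
routine = bytearray(b"\x90" + bytes.fromhex("837dac0f") + b"\x7e\xd0")
patch_cmp_imm8(routine, 1, 0x0F, 0x1F)
print(routine.hex())
```

The hard part, as noted above, is not the patch but knowing that this particular 0xf encodes the digest size.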
Of the three categories of changes, routine replacement is identified in the previous phase of ALICE (Section 4.1), while changes in logic are out of scope in this work. The remainder of this section focuses on how ALICE locates changes in buffer sizes.
ALICE leverages dynamic taint analysis to determine the changes in buffer sizes.
Typically, dynamic taint analysis starts by marking any data that comes from an untrusted source as tainted.
It then observes program execution to keep track of the flow of tainted data in registers and memory.
We adapt this idea to identify all memory buffers that are affected (or tainted) by the output digest of the target hash routine. In particular, our dynamic taint analysis executes a binary on test inputs with the following taint policies:
Taint introduction. At the beginning of execution, ALICE initializes all memory locations to be non-tainted. During the execution, whenever entering the target hash routine, ALICE reads the value in the parameter registers to observe the base address of the output digest. Since the digest size is well-known and deterministic for any given hash function, ALICE can also identify the entire address range of the digest buffer. Upon exiting the routine, ALICE then assigns a taint label to all memory locations in the digest buffer. ALICE uses three taint labels to differentiate 3 types of memory allocations:
Static allocation. In our target executables, static memory is allocated at compile time before the program is executed. Thus, the location of this type of memory is usually deterministic, stored in either .data or .bss segment of the associated binary. Detecting whether a given memory location is statically allocated is simply done by checking whether its address lies within the boundaries of those segments.
Heap-based allocation. ALICE traces all heap-based memory allocations by intercepting a call to three well-known C routines: malloc(), calloc() and realloc(). Whenever each of these routines is called, ALICE learns the size of allocated memory by reading values of its parameter registers. Upon exiting the same routine, ALICE then learns the base address of allocated memory via the return value. With this information, ALICE later can determine whether memory at a given location is allocated on the heap.
Stack-based allocation. ALICE maintains stack-related information of the execution via a shadow stack. Specifically, after executing any call instruction, ALICE pushes into the shadow stack: a pair of the current stack pointer and an address of the function being called. Upon returning from a routine (via a ret instruction), ALICE pops the shadow stack. This information allows ALICE to reconstruct stack frames at any point during the execution of dynamic taint analysis. ALICE determines whether memory at a given address is on the stack by checking it against all stack frames.
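The three allocation types can be told apart with a small amount of bookkeeping. The sketch below is an assumption-laden illustration rather than ALICE's implementation: the `AllocationTracker` class, its callback names, and the example addresses are hypothetical, the stack is assumed to grow downward, and the current stack pointer is passed in explicitly.

```python
class AllocationTracker:
    """Classify an address as static, heap-based, or stack-based."""

    def __init__(self, static_ranges):
        self.static_ranges = list(static_ranges)  # [start, end) of .data/.bss
        self.heap = {}           # base address -> allocation size
        self.shadow_stack = []   # (stack pointer after call, callee address)

    def on_alloc(self, base, size):
        """Record a completed malloc()/calloc()/realloc() call."""
        self.heap[base] = size

    def on_call(self, sp, callee):
        self.shadow_stack.append((sp, callee))

    def on_ret(self):
        self.shadow_stack.pop()

    def classify(self, addr, current_sp):
        if any(lo <= addr < hi for lo, hi in self.static_ranges):
            return ("static", None)
        for base, size in self.heap.items():
            if base <= addr < base + size:
                return ("heap", base)
        if addr >= current_sp:
            owner = None
            # Stack pointers decrease with call depth, so the deepest entry
            # whose stack pointer is still at or above addr owns the address.
            for sp, callee in self.shadow_stack:
                if sp < addr:
                    break
                owner = callee
            if owner is not None:
                return ("stack", owner)
        return ("unknown", None)


tracker = AllocationTracker([(0x601000, 0x602000)])
tracker.on_alloc(0x1000000, 32)
tracker.on_call(0x7FFE0100, 0x400600)  # e.g. main
tracker.on_call(0x7FFE0080, 0x400500)  # e.g. a callee of main
print(tracker.classify(0x7FFE00C0, current_sp=0x7FFE0040))
```

For a stack address, the returned callee address identifies the routine whose frame has to be enlarged later.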
Taint Propagation. ALICE’s taint propagation rules are enforced at word-level granularity. While we could use a more precise granularity such as the bit level [yadegari2014bit], we did not find such an approach to be cost-effective, as test inputs may require timely interaction with a remote server; a significantly long delay in the dynamic taint analysis can cause the remote server to time out, and consequently the analysis may not be performed as expected.
In addition to the general taint propagation rules, ALICE also considers the taint-through-pointer scenario: if a register A is tainted and a register B is assigned the referenced value of A, i.e., B := *A, then B is considered tainted. Such a rule is necessary to accurately capture the data flow in a common usage of a hash function, where the raw digest value is converted to a human-readable format via a look-up table, e.g., the use of sprintf() in line 20 of Figure 1a.
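The effect of the taint-through-pointer rule can be seen on a toy instruction sequence. The sketch below uses a made-up four-operation mini language (`mov`, `add`, `load`, `store`) rather than real x86 semantics, treats every address as one word, and models a digest word being translated through an untainted look-up table; all names and addresses are hypothetical.

```python
def run_taint(ops, regs, mem, tainted_mem, through_pointer=True):
    """Propagate taint over a list of (op, a, b) micro-operations."""
    regs, mem = dict(regs), dict(mem)
    t_regs, t_mem = set(), set(tainted_mem)

    def mark(store, key, flag):
        (store.add if flag else store.discard)(key)

    for op, a, b in ops:
        if op == "mov":        # a := b
            regs[a] = regs[b]
            mark(t_regs, a, b in t_regs)
        elif op == "add":      # a := a + b
            regs[a] = regs[a] + regs[b]
            mark(t_regs, a, a in t_regs or b in t_regs)
        elif op == "load":     # a := *b
            addr = regs[b]
            regs[a] = mem.get(addr, 0)
            mark(t_regs, a, addr in t_mem or (through_pointer and b in t_regs))
        elif op == "store":    # *a := b
            addr = regs[a]
            mem[addr] = regs[b]
            mark(t_mem, addr, b in t_regs)
        else:
            raise ValueError(op)
    return t_regs, t_mem


regs = {"rax": 0, "rbx": 0x100, "rcx": 0x200, "rdx": 0, "rdi": 0x300}
mem = {0x100: 0x0A, 0x20A: ord("a")}   # digest word and one table entry
ops = [
    ("load", "rax", "rbx"),   # read a tainted digest word
    ("add", "rcx", "rax"),    # use it as an index into the table at 0x200
    ("load", "rdx", "rcx"),   # table entry: untainted memory, tainted pointer
    ("store", "rdi", "rdx"),  # write the converted character to 0x300
]
print(run_taint(ops, regs, mem, {0x100})[1] == {0x100, 0x300})
```

With `through_pointer=False`, the output location 0x300 stays untainted, so the buffer holding the human-readable digest would be missed.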
Using these rules, ALICE’s dynamic taint analysis can determine, and assign taint labels to, all memory locations affected by the output digest. At the end of the analysis, ALICE aggregates individual tainted memory locations into unified memory buffers. Our aggregation rule is simple: ALICE considers contiguous memory locations to be a memory buffer if their address range is at least as long as the target hash function’s digest size. Lastly, in this phase, ALICE outputs the types (i.e., either stack-based, heap-based or static), locations (e.g., a stack offset or global address), and relevant instructions (e.g., an instruction address of a call to malloc) of memory buffers that are derived from the output digest.
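The aggregation rule can be written down directly. The following sketch assumes tainted locations are given as a collection of addresses at a fixed granularity (one byte by default); the function name and the example addresses are hypothetical.

```python
def aggregate_buffers(tainted_addrs, digest_size, word=1):
    """Group contiguous tainted addresses into (base, length) buffers and
    keep only those at least as long as the target digest size."""
    buffers = []
    run_start = prev = None
    for addr in sorted(set(tainted_addrs)):
        if prev is not None and addr == prev + word:
            prev = addr
            continue
        if run_start is not None and prev - run_start + word >= digest_size:
            buffers.append((run_start, prev - run_start + word))
        run_start = prev = addr
    if run_start is not None and prev - run_start + word >= digest_size:
        buffers.append((run_start, prev - run_start + word))
    return buffers


tainted = (
    list(range(0x1000, 0x1010))    # 16 bytes: a raw MD5 digest
    + list(range(0x2000, 0x2021))  # 33 bytes: a hex string plus terminator
    + [0x3000, 0x3001]             # 2 stray bytes, too short to be a buffer
)
print(aggregate_buffers(tainted, digest_size=16))
```

Short tainted runs, such as a temporary holding a single digest byte, are dropped by the length threshold.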
4.3 Augmenting and Rewriting Changes
ALICE can incorporate several rewriting approaches. Since runtime of rewritten binaries is our primary concern, we mainly use static binary rewriting that has been previously shown to have minimal impact on the runtime [bauman12superset].
To reduce the size of rewritten binaries, we rewrite at routine level rather than at section level. (We intentionally avoid rewriting at the instruction level, as this can potentially incur significant run-time overhead for the rewritten/output binaries.) If there is at least one instruction that needs to be edited in a particular routine, we rewrite the binary as follows:
(1) Create a new empty section in the binary
(2) Apply the routine-replacement and buffer-size changes to the routine
(3) Modify the routine with respect to the placement of the new section
(4) Insert the entire rewritten routine into the new section
(5) Insert a jump instruction to the new section at the original routine’s entry point
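For step (3), we only need to ensure that the rewritten routine maintains the correct control flow targets. Doing so requires editing all instruction operands in the routine that use rip-relative addressing. The displacement of such operands is recomputed based on the address of the new location, so that an operand referencing a location outside the relocated routine still resolves to its original target: $disp_{new} = disp_{old} - (addr_{new} - addr_{old})$, where $addr_{old}$ and $addr_{new}$ denote the instruction's address before and after relocation.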
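The displacement update can be checked with a short sketch. It assumes that the instruction keeps its length when moved, that the referenced target lies outside the relocated routine, and that displacements are signed 32-bit values; the function names and the example addresses are hypothetical.

```python
def rip_target(insn_addr, insn_len, disp):
    """Address referenced by a rip-relative operand: rip points to the
    next instruction, so the target is insn_addr + insn_len + disp."""
    return insn_addr + insn_len + disp


def relocate_rip_disp(old_disp, old_insn_addr, new_insn_addr):
    """Recompute a rip-relative displacement after moving an instruction
    so that it still refers to the same absolute target."""
    new_disp = old_disp - (new_insn_addr - old_insn_addr)
    if not -2**31 <= new_disp < 2**31:
        raise OverflowError("target not reachable with a 32-bit displacement")
    return new_disp


old_addr, new_addr, length, disp = 0x400616, 0x800000, 7, 0x200A00
new_disp = relocate_rip_disp(disp, old_addr, new_addr)
print(hex(rip_target(new_addr, length, new_disp)))
```

The range check matters in practice: placing the new section too far from the referenced data would make the original target unreachable with a 32-bit displacement.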
In step (2), to apply the routine-replacement change, ALICE generates a patch from user-supplied C code that implements the replacement hash function, and adds it to the new empty section of the binary. To ensure correctness of the rewritten binaries, the implementation of the user-supplied replacement hash function must have the same parameter order as that of the target hash function and must be self-contained. These requirements are, however, simple to fulfill given that the user already has access to the replacement hash function’s source code; we discuss this issue in detail in Appendix B. We also ensure that a call to the target hash routine is redirected to this new code by simply rewriting the first instruction of the target hash routine to: jmp [new_code_entry_point]. For each memory buffer identified in Section 4.2, ALICE computes the new buffer size based on the ratio of the replacement hash function's digest size to that of the target hash function, i.e., $size_{new} = size_{old} \times \frac{|digest_{replacement}|}{|digest_{target}|}$. ALICE rewrites the binary to support the expanded buffers by employing a different technique for each type of buffer:
Static Buffer. As a static buffer is allocated at a fixed address, we expand such buffer by creating another buffer at a new location and modify all instruction operands that access the original buffer to this newly allocated buffer. Specifically, ALICE first allocates a new data segment in the binary, and creates a mapping of the address of the original buffer to the address in the new data segment. To ensure that the rewritten binary uses the new address instead of the original, ALICE scans through all instructions in the original binary and edits the ones that contain an access to the original address by using information obtained from the previously computed address mapping.
Heap-based buffer. Unlike a static buffer, this type of buffer is allocated dynamically through a call to the malloc(), calloc() or realloc() routine. Fortunately, ALICE learns when this type of buffer is allocated through the dynamic taint analysis in Section 4.2. Thus, expanding a heap-based buffer only requires ALICE to trace back to the instruction allocating such a buffer, i.e., a call instruction to malloc(), calloc() or realloc(), and update the parameter register value storing the allocation size to the new buffer size.
Stack-based buffer. Figure 2 shows how ALICE modifies the main routine in the example from Figure 1b to support the expansion of stack-based buffers: digest and hexdigest by 16 and 32 bytes respectively. Intuitively, expanding a buffer allocated on the stack at the binary level requires: (i) locating the routine that uses the corresponding stack frame, (ii) enlarging the frame to be large enough to hold the new buffers, and (iii) adjusting every access to memory inside the frame accordingly. ALICE’s previous phase, in Section 4.2, provides necessary information to satisfy the first requirement (via the shadow stack). To achieve the second requirement, ALICE rewrites the instructions that are responsible for increasing and decreasing the stack pointer in the prologue/epilogue of the located routine, e.g., [40062d: sub rsp,0x60] in Figure 2a. For the third requirement, ALICE iterates through all instructions in the routine and inspects the ones that use the stack offset, i.e., via rsp or rbp registers. ALICE then recomputes the stack offset with respect to the increased frame size and rewrites those instructions if the newly computed offset differs from the original. Steps of how ALICE recomputes the new stack offset are shown in Algorithm 1 of Appendix C. In Figure 2, ALICE identifies all instructions that access a stack element and rewrites the ones highlighted in green.
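One plausible way to recompute rbp-relative offsets is sketched below. It is not Algorithm 1 from Appendix C: it assumes that locals sit at negative offsets from rbp, that an expanded buffer keeps its upper end fixed and grows toward lower addresses, and that everything located at or below the buffer moves down by the same amount. The buffer offsets are hypothetical; only the 0x60 frame size and the rbp-0x54 access are taken from the running example.

```python
def new_rbp_offset(offset, expansions):
    """Shift an rbp-relative offset to account for expanded stack buffers.

    expansions: list of (base_offset, size, delta), with base_offset
    relative to rbp (negative) and delta the number of added bytes.
    An access moves down by delta for every expanded buffer whose
    original upper end lies above the accessed offset.
    """
    shift = sum(delta for base, size, delta in expansions
                if offset < base + size)
    return offset - shift


def new_frame_size(frame_size, expansions, align=16):
    """Grow the frame by the total expansion, rounded up to `align`."""
    grown = frame_size + sum(delta for _, _, delta in expansions)
    return (grown + align - 1) // align * align


# Hypothetical layout: digest at rbp-0x20 (16 bytes, +16) and
# hexdigest at rbp-0x50 (33 bytes, +32).
expansions = [(-0x20, 16, 16), (-0x50, 33, 32)]
print(hex(new_frame_size(0x60, expansions)))   # frame set up by sub rsp,0x60
print(hex(new_rbp_offset(-0x54, expansions)))  # the access at [rbp-0x54]
```

Accesses above both buffers keep their offsets, which is why only a subset of the stack-accessing instructions needs to be rewritten.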
5 Experimental Evaluation
5.1 Experimental Setup
Goals & Datasets. The goal of this evaluation is two-fold: first, to assess whether ALICE can accurately identify and replace different implementations of hash functions, and second, to measure ALICE’s effectiveness on real-world applications. Different implementations may include different hash function structures (e.g., with or without Init()), different parameter orders, or simply different implementation details. We apply ALICE to a dataset that consists of four popular cryptographic libraries: OpenSSL, libgcrypt, mbedTLS, and FreeBL. We compile each library with different optimization levels, including O0, O1, O2, O3 and Os, into a static library. We then create a simple C application (similar to the one in Figure 1a) that calls exactly one hash function located in the static library. We compile this application without debugging/relocation symbols and link it with each individual static library. We also assess ALICE’s effectiveness on 6 real-world applications: smd5_mkpass and ssha_mkpass – GitHub projects for creating LDAP passwords (https://github.com/pellucida/ldap-passwords), md5sum and sha1sum – string/file checksum programs, lighttpd – a lightweight webserver program, and curl – a webclient command line tool. Similar to the first dataset, each program was compiled without debugging symbols and with various optimization levels, and statically linked to the cryptographic library used within the program.
Insecure Hash Functions. We consider MD2, MD4, MD5, SHA1, and RIPEMD-160 as insecure hash functions to be replaced in our experiments. Our objective is to identify implementations of such hash functions in binaries and replace them with stronger ones, i.e., SHA-256. A list of all (insecure) hash functions in each dataset is shown in Table 1.
Environment. Experiments are performed on a virtual machine with Ubuntu 16.04.5 OS, 4GB of RAM and 2 cores of 3.4GHz of CPU running on top of an Intel i7-3770 machine. The following versions of required tools are used in the experiments: gcc-5.4.0, angr-188.8.131.52 [angr], Triton-0.6 [triton] and Pin-2.14.71313 [pin].
Table 1: Insecure hash functions in each dataset (fragment).
Real-world Programs | smd5_mkpass | ✗ | ✗ | OIL | ✗ | ✗
5.2 Evaluation Results: Cryptographic Libraries
As described in Section 4.2, we only consider automated inference of required changes from the routine-replacement and buffer-size categories in this work. In order to properly evaluate ALICE, we perform manual analysis to identify all changes required in the logic category and supply them to ALICE. The manually supplied changes for this dataset consist of only a couple of instructions that typically specify loop termination condition(s). For example, in the binary from Figure 1b, we instruct ALICE to modify two instructions at addresses 0x4006dc and 0x40070d from [cmp DWORD PTR [rbp-0x54],0xf] to [cmp DWORD PTR [rbp-0x54],0x1f].
Correctness of Rewritten Binary. To simplify illustration, we first describe behaviors of the rewritten binaries that are considered incorrect in this dataset. First, if ALICE misidentifies any of necessary changes in the input binary, the resulting binary will not display the correct SHA-256 digest of the input variable. For instance, it may display nonsensical data, a digest produced by the original hash function or an incomplete version of the SHA-256 digest. Second, if ALICE’s rewriting phase does not function properly (e.g., it expands the buffer size by a different amount or adjusts memory access to the stack incorrectly), it likely results in a runtime error for the output binary. We consider the correctness of the rewritten binaries from this dataset to be the converse of the aforementioned behaviors, i.e., execution of the output binary must terminate without any errors and it must result in displaying the correct SHA-256 digest of the input variable. All binaries produced by ALICE in this dataset work as expected.
Binary Size and Execution Overhead. ALICE adds around 3-13KB to the output binaries (see the detailed breakdown in Appendix G). On further inspection, we found two main reasons for this overhead. First, ALICE statically adds a patch implementing the replacement SHA-256 hash function, which contributes around 3KB to the output binary. Second, our underlying binary rewriter, patchkit [patchkit], expects the code and data of this patch to be aligned to the page size (i.e., 4KB on our test machine), which can add up to another 8KB to the output binary.
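The alignment overhead follows from simple arithmetic: the code and the data of the patch are each padded up to the next page boundary, so each can waste just under one page. The sketch below illustrates this; the 2560-byte/512-byte split of the roughly 3KB patch is a hypothetical example and not a measured value.

```python
PAGE_SIZE = 4096  # 4KB pages, as on the test machine described above


def align_up(size, alignment=PAGE_SIZE):
    """Round `size` up to the next multiple of `alignment`."""
    return (size + alignment - 1) // alignment * alignment


def padding(size, alignment=PAGE_SIZE):
    """Number of filler bytes needed to page-align a blob of `size` bytes."""
    return align_up(size, alignment) - size


if __name__ == "__main__":
    # Hypothetical split of a ~3KB patch into code and data parts.
    code_size, data_size = 2560, 512
    wasted = padding(code_size) + padding(data_size)
    total = align_up(code_size) + align_up(data_size)
    print(f"padding: {wasted} bytes, total added: {total} bytes")
```

Since each of the two parts can require almost a full page of padding, the worst case approaches 8KB, which matches the upper bound stated above.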
In terms of execution overhead, we implement a simple Pintool [pin] to count the number of instructions executed by the output binaries. We then compare the result with the baseline, where manual editing is performed on the original binary’s source code in order to replace the insecure hash function and the modified source code is properly optimized by the standard gcc compiler. The results in Figure (a) show that the binaries produced by ALICE have low execution overhead, with an average of 300 added instructions, or an increase of only 0.3%, compared to the baseline. We also did not observe any noticeable increase in execution time for the output binaries.
Toolchain Runtime. Figure (b) shows the runtime of ALICE to produce the output binaries. The total runtime heavily depends on the size of the input binaries, and is dominated by the runtime of the identification phase. This bottleneck arises because the identification phase involves heavy-weight analysis such as disassembling the entire binary and/or recovering the binary call graph. It is also worth noting that such analysis is performed only once and ALICE re-uses the analysis results in later phases; this leads to lower runtimes in the later phases.
5.3 Evaluation Results: Real-World Binaries
Similar to the previous dataset, we manually inspect the binaries in this dataset to identify the changes that ALICE does not infer automatically, and then supply them to ALICE. Such changes are mainly related to the digest size that is hard-coded in the source code.
Correctness of Rewritten Binary. We consider rewritten binaries to be correct if changes performed by ALICE: (1) correctly implement new functionalities with respect to the target SHA-256 hash function and (2) do not interfere with the remaining functionalities. For instance, the former enforces the rewritten binary of md5sum to be able to perform sha256sum of a given input string. We realize the latter requirement by executing the binaries produced by ALICE with all test cases (except the ones that use insecure hash functions) provided in their original respective project repository. We discuss expected functionalities of each test program in Appendix E.
|Program||OFLAG||Binary Size||Correctness of|
Table 2 shows the correctness of output binaries produced by ALICE in this dataset. All output binaries pass all test cases in their original project repository while only one output binary fails to pass the expected functionality. We manually examined the failed binary and found that ALICE misidentified an insecure hash routine. Our further inspection reveals that the main culprit appears to be our underlying dynamic concrete executor, which fails to output the expected MD5 digest even if ALICE sets up a proper call stack. As a result, ALICE’s identification phase did not detect this insecure hash function in the input binary, and the output binary remained unchanged.
Binary Size and Execution Overhead. Table 2 shows the increase in binary size in this dataset. ALICE adds around 4 to 11KB to the original binary. As mentioned in Section 5.2, up to 8KB of this overhead is caused by the underlying binary rewriter performing a patch alignment. The remaining overhead stems from rewritten functions that are appended at the end of the new binary. We note that we excluded the result for the curl binary compiled with O3 in Table 2 (and subsequent figures) as ALICE could not produce the correct output binary in that case.
We did not observe any noticeable execution overhead in terms of execution time when running the output binary against the provided test cases from the project’s repository. In addition, we measure the number of executed instructions for the output binary to perform the expected functionality with respect to the SHA-256 function and compare it to the baseline, where we manually edit the source code to replace the insecure hash function. The result, shown in Figure 4, also indicates negligible () increase in this case. Note that even though execution of the rewritten curl binaries becomes faster (requires 2% fewer instructions), this improvement is still negligible. As such, we do not claim that ALICE helps produce a more efficient output binary.
Toolchain Runtime. Figure 5 illustrates ALICE’s runtime to produce each output binary. For simpler programs (e.g., md5sum or smd5_mkpass), ALICE identifies and replaces weak primitives in less than a minute. For more complex programs (e.g., lighttpd), ALICE’s runtime is longer, up to 5 minutes. Most of the runtime overhead comes from the scoping phase because it needs to instrument a large number of instructions (e.g., k instructions for lighttpd), while execution of simpler programs involves significantly fewer instructions. We consider ALICE’s runtime to be acceptable since the entire process only needs to be performed once, making the toolchain runtime not a primary concern.
|/( + )||50%||50%||60%||75%||87.5%||50%||70.17%|
Reduction in Manual Efforts. While ALICE currently does not automatically identify changes from the class, it still saves considerable manual effort. We quantify such savings in Table 3 as the number of instructions required to be rewritten in order to implement changes for each category. On average, ALICE automatically identifies and rewrites 1,266 instructions, which translates into 99.87% reduction in manual efforts. However, we acknowledge that it may be possible to use existing cryptographic identification tools (with some modifications) to locate changes from . Even when such tools exist, our toolchain still significantly reduces manual work by 70.17%. It is worth emphasizing again that no existing tools are capable of identifying changes from and .
6 Limitations and Future Work
The current version of ALICE has limitations. First, ALICE relies on some underlying (open source) building-block tools and inherits their limitations. For example, we encountered instances where the underlying x86-64 assembler, Keystone [keystone], fails to translate uncommon instructions whose operands contain fs registers or a rep instruction. Whenever we encountered such an issue, we fixed it manually by hard-coding the correct behavior into ALICE. Furthermore, the first phase of ALICE relies on the angr framework [angr] for disassembly of stripped binaries; angr does not perform static disassembly with correctness guarantees. In fact, static disassembly of stripped binaries with correctness guarantees is still an open problem [andriesse2016depth]. Thus, angr may produce incorrect results in ALICE’s first phase, which in turn affects the output binaries.
ALICE also assumes that the routine implementing an insecure hash function and necessary changes are statically included in the main application. ALICE does not currently support identifying and replacing insecure cryptographic primitives located in a dynamic library. Expanding ALICE’s functionalities to dynamic libraries is possible since most of our underlying tools are capable of locating and analyzing dynamic libraries used by the main application.
ALICE does not automatically identify or rewrite changes in binary logic and relies on the user to supply them to the toolchain. In practice, some manual effort is required to locate changes in binary logic for a large binary. Automating this process is a challenging problem and is an interesting avenue for future work. Due to space limitations, we discuss this issue in more detail in Appendix D.
Finally, we design ALICE to target non-malicious legacy binaries and assume that such binaries are not obfuscated. In practice, even legitimate software may make use of obfuscation techniques, e.g., to protect intellectual property. Extending ALICE to support obfuscated but non-malicious binaries (e.g., the software is not malware trying to evade analysis) is also an interesting future direction.
This work was sponsored by the U.S. Department of Homeland Security (DHS) Science and Technology (S&T) Directorate under Contract No. HSHQDC-16-C-00034. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DHS and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DHS or the U.S. government. The authors thank the anonymous reviewers for their valuable comments.
We have developed ALICE, a toolchain for identifying weak or broken cryptographic primitives and replacing them with secure ones without relying on source code or debugging symbols. We have implemented a prototype of ALICE that can detect several cryptographic primitives while only requiring access to the binaries containing them. Our implementation of ALICE can also automatically replace weak and/or broken implementations of cryptographic hash functions in ELF-based x86-64 binaries. We have demonstrated ALICE’s effectiveness on various open-source cryptographic libraries and real-world applications utilizing cryptographic hash functions. Our experimental results show that ALICE can successfully locate and replace insecure hash functions while preserving existing functionalities in the original binaries.
Appendix A Detectable Cryptographic Primitives
A list of cryptographic primitives that are detected by ALICE is shown in Table 4.
|Primitive||Name||Constant Type||Constant Values|
|Hash Functions||MD5||IVs||[01234567, …, 76543210]|
|Hash Functions||SHA1||IVs||[01234567, …, f0e1d2c3]|
|Hash Functions|| ||IVs||[67e6096a, …, 19cde05b]|
|Ciphers||RC5||Magic Constant (P)||6b2aed8a…|
|Ciphers||DES||S-Box or SPtrans||0e040d01… or 00080802…|
|Elliptic Curves (Signatures & Key Exchange)||NIST-P192||Prime, Base Point x||[ffffffff…, 188da80e…]|
|Elliptic Curves (Signatures & Key Exchange)||NIST-P224||Prime, Base Point x||[ffffffff…, b70e0cbd…]|
|Elliptic Curves (Signatures & Key Exchange)||NIST-P384||Prime, Base Point x||[ffffffff…, aa87ca22…]|
|Elliptic Curves (Signatures & Key Exchange)||NIST-P256||Prime, Base Point x||[ffffffff…, 6b17d1f2…]|
|Elliptic Curves (Signatures & Key Exchange)||NIST-P512||Prime, Base Point x||[ffffffff…, b70e0cbd…]|
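As an illustration of constant-based detection, the sketch below searches a byte blob for the four MD5 initialization values in little-endian byte order, which is how the first and last entries of the MD5 row above are written. The search window, the function name, and the toy blob are assumptions made for this example; ALICE itself operates on the disassembly produced by angr rather than on raw bytes.

```python
import struct

# Standard MD5 initialization values (A, B, C, D).
MD5_IVS = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)


def find_constant_vector(blob, words, window=64):
    """Return offsets of the first word where all words occur nearby.

    Each 32-bit word is encoded little-endian. A hit is reported when the
    first word is found and every remaining word also occurs within
    `window` bytes starting at that offset (an assumed locality bound).
    """
    needles = [struct.pack("<I", w) for w in words]
    hits = []
    start = blob.find(needles[0])
    while start != -1:
        region = blob[start:start + window]
        if all(needle in region for needle in needles):
            hits.append(start)
        start = blob.find(needles[0], start + 1)
    return hits


if __name__ == "__main__":
    # Toy code bytes: four "mov DWORD PTR [rdi+disp], imm32" instructions
    # that store the IVs, preceded by a few nops.
    blob = (
        b"\x90" * 8
        + b"\xc7\x07" + struct.pack("<I", MD5_IVS[0])
        + b"\xc7\x47\x04" + struct.pack("<I", MD5_IVS[1])
        + b"\xc7\x47\x08" + struct.pack("<I", MD5_IVS[2])
        + b"\xc7\x47\x0c" + struct.pack("<I", MD5_IVS[3])
    )
    print([hex(offset) for offset in find_constant_vector(blob, MD5_IVS)])
```

Matching the words individually rather than as one contiguous 16-byte string is deliberate: compilers typically embed such constants as immediates of separate instructions.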
Appendix B Implementation Details of ALICE
We describe below implementation details for each phase required in the operation of the ALICE toolchain.
Phase 1: Identifying and Locating Cryptographic Primitives. We implement this phase on top of the binary analysis platform angr [angr]. Specifically, we use angr’s CFGFast API to perform the disassembly of instructions and generate the callgraph for a given binary. ALICE uses the resulting disassembly to determine parts of the binary containing specific constant vectors. ALICE analyzes the program callgraph to locate candidate routines. Finally, to determine whether a candidate routine actually implements the target hash function, we use angr’s built-in dynamic concrete executor to invoke each candidate routine on a given test string and compare the output with the expected output.
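The final check of this phase, invoking a candidate on a test string and comparing against the expected digest, can be sketched as follows. Here plain Python callables stand in for candidate routines executed concretely inside the binary; the helper name and the test string are illustrative assumptions, and the reference digest comes from Python's hashlib rather than from ALICE.

```python
import hashlib

TEST_INPUT = b"abc"  # assumed test string; any fixed input would do


def implements_hash(candidate, algorithm, test_input=TEST_INPUT):
    """Return True if `candidate` reproduces the reference digest.

    `candidate` is any callable mapping bytes to a digest (bytes). In the
    real toolchain this would be a routine run by a concrete executor; a
    candidate that crashes is simply treated as a non-match.
    """
    expected = hashlib.new(algorithm, test_input).digest()
    try:
        return candidate(test_input) == expected
    except Exception:
        return False


if __name__ == "__main__":
    candidates = {
        "routine_a": lambda data: hashlib.sha1(data).digest(),
        "routine_b": lambda data: hashlib.md5(data).digest(),
    }
    for name, routine in candidates.items():
        print(name, implements_hash(routine, "md5"))
```

Treating a crashing candidate as a non-match is conservative: as the evaluation notes, a failure of the concrete executor leads to a missed detection rather than to an incorrect rewrite.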
Phase 2: Scoping Changes. ALICE’s second phase is built on top of a dynamic analysis framework, Triton [triton]. In particular, we used Triton’s taint engine (which is implemented as a Pintool [pin]) for the implementation of our taint propagation rules. Triton also provides an API for instrumenting a callback function to be executed at any point during the dynamic taint analysis. We register two callback functions:
The first callback is invoked immediately after the execution of each instruction. It implements our taint introduction rule by determining whether the current instruction: (1) is a call to, or a return from, the target hash routine; (2) results in any new tainted memory; (3) accesses static memory; and (4) changes the position of the stack frame.
The second callback is executed at the end of dynamic taint analysis. It is responsible for aggregating all tainted memory cells into memory buffers and outputting the necessary results to the next phase.
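The aggregation step performed by the second callback can be sketched as merging adjacent tainted byte addresses into (start, size) buffers. This is a minimal illustration under the assumption that taint is tracked per byte; the function name and the example addresses are hypothetical and do not reflect Triton's API.

```python
def aggregate_tainted_cells(addresses):
    """Merge tainted byte addresses into contiguous (start, size) buffers."""
    buffers = []
    for addr in sorted(set(addresses)):
        if buffers and addr == buffers[-1][0] + buffers[-1][1]:
            # The address directly follows the last buffer: extend it.
            start, size = buffers[-1]
            buffers[-1] = (start, size + 1)
        else:
            # A gap (or the first address): start a new buffer.
            buffers.append((addr, 1))
    return buffers


if __name__ == "__main__":
    # Hypothetical taint result: a 16-byte digest buffer plus two stray bytes.
    tainted = list(range(0x7FFC0010, 0x7FFC0020)) + [0x7FFC0040, 0x7FFC0041]
    for start, size in aggregate_tainted_cells(tainted):
        print(f"buffer at {start:#x}, {size} bytes")
```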
Phase 3: Augmenting and Rewriting Changes. There are two steps in this phase: (1) binary rewriting, and (2) patch generation.
First, we use a static binary rewriting tool, patchkit [patchkit], to edit the input binary to support new changes. After identifying and augmenting changes into the affected routines, we use patchkit’s inject API to add the modified routine into the new section of the original binary. Then, we call the hook API to redirect all calls from the original routine to this modified routine.
Second, we generate a binary patch containing an implementation of the replacement hash routine. Performing this is not trivial because we must ensure the following:
The binary patch must be self-contained, i.e., it must not rely on any external functions and/or libraries. To satisfy this, we statically include implementations of the external functions used by the patch in its source code and build the patch against these functions instead of the external functions or libraries. For example, Line 1 of Figure 6 shows the glibc strlen() statically added to the patch’s source code.
Since the patch will be injected into a new section of a different binary, it must be compiled with respect to the address of this new section. The example shell script that we used to compile a SHA-256 patch and the patch’s source code are shown in Figures 7 and 6, respectively. At the source code level, we separate the code and data of the patch by assigning them to different sections. Then, to ensure that they can be loaded from a specific address, we compile each with the --section-start flag (Line 8 in Figure 7) and store the results in separate binaries (Lines 9-10 in Figure 7).
The hash function in the patch must have the same order of parameters as the target hash function. To achieve this, we include hash implementations of all possible parameter orders in the patch’s source code and assign each implementation to a different binary section (see Lines 11-20 of Figure 6 for an example). Upon learning the parameters of the target hash function (from phase 2), we select the section containing the corresponding implementation and integrate it into the patch at load-time (Line 11 in Figure 7).
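The third requirement amounts to a lookup from the observed parameter order to a pre-built variant. The sketch below enumerates all orders and maps each to a section name; the three parameter names and the ".sha256_..." naming scheme are hypothetical and chosen only for illustration, since the actual section names appear in Figure 6.

```python
from itertools import permutations

# Assumed parameters of a typical one-shot hash routine.
PARAMETERS = ("data", "length", "digest")


def section_for(order):
    """Hypothetical naming scheme for the section holding one variant."""
    return ".sha256_" + "_".join(order)


# One pre-compiled variant per possible parameter order (3! = 6 in total).
VARIANT_SECTIONS = {order: section_for(order) for order in permutations(PARAMETERS)}


def select_section(observed_order):
    """Pick the variant whose parameter order matches the target routine."""
    key = tuple(observed_order)
    if key not in VARIANT_SECTIONS:
        raise KeyError(f"no variant for parameter order {key}")
    return VARIANT_SECTIONS[key]


if __name__ == "__main__":
    # Parameter order as it might be recovered in phase 2.
    print(select_section(["digest", "data", "length"]))
```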
Appendix C Recomputing Stack Offset
In phase 3, upon detecting a new stack frame size, ALICE determines all instructions that access memory within the target stack frame. If an access lies above any tainted buffer (with respect to the rsp register), ALICE increments the offset of that access by the sum of the size differences of all tainted buffers located below it; recall that all tainted buffers are identified in ALICE’s phase 2. Otherwise, the stack offset remains unchanged. Details of this algorithm are shown in Algorithm 1.
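The offset adjustment can be sketched in a few lines. This is not a transcription of Algorithm 1: the data class, the rsp-relative offsets, and the interpretation that a buffer is "below" an access when it ends at or before that access are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaintedBuffer:
    offset: int    # rsp-relative start of the buffer in the original frame
    old_size: int  # size before rewriting (e.g., 16 for an MD5 digest)
    new_size: int  # size after rewriting (e.g., 32 for a SHA-256 digest)


def recompute_offset(access, buffers):
    """Return the new rsp-relative offset for a stack access.

    The access is shifted by the total growth of every tainted buffer that
    lies entirely below it; accesses below all tainted buffers are unchanged.
    """
    shift = sum(
        buf.new_size - buf.old_size
        for buf in buffers
        if buf.offset + buf.old_size <= access
    )
    return access + shift


if __name__ == "__main__":
    # Hypothetical frame: a 16-byte digest and a 33-byte hex string.
    frame = [TaintedBuffer(0x10, 16, 32), TaintedBuffer(0x30, 33, 65)]
    for access in (0x08, 0x10, 0x20, 0x30, 0x60):
        print(f"{access:#x} -> {recompute_offset(access, frame):#x}")
```

Under this interpretation, an access to the start of a tainted buffer moves only by the growth of the buffers beneath it, so enlarged buffers never overlap one another.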
Appendix D Locating Changes in Binary Logic
ALICE currently does not automatically identify and rewrite changes in binary logic ( ) but instead relies on the user to supply them to the toolchain. In practice, some manual effort may be required to locate changes in binary logic for a large binary. Automating this process without access to source code is currently a difficult open problem as we illustrate through a simple example below.
Figure 8 shows the instructions corresponding to a loop (from Line 19 to 21) in the C code shown in Figure 1a, compiled with and without optimizations. Recall that making changes in binary logic in this example corresponds to extending the loop termination conditions from the original digest size (16 bytes) to the new digest size (32 bytes in the case of SHA-256) at the source-code level. Naïvely, one could employ a heuristic based on the digest size to automatically identify such changes at the binary level: find instructions containing the digest size value and rewrite them with the new digest size value. However, this heuristic would only be effective on the unoptimized binary; it would fail on the optimized binary in Figure 8b. This is because the loop in the optimized binary is transformed in such a way that it terminates based on a different condition, i.e., the hexdigest buffer (at instruction 0x400655) instead of the digest size value. Therefore, solving this problem even for this simple program would require an approach more sophisticated than a simple heuristic. Solving this problem for general programs (e.g., the ones from Figure 9) is more challenging.
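The naïve heuristic and its failure mode can be illustrated on toy listings. In the sketch below, the unoptimized cmp instruction is the one quoted in Section 5.2, while the jump targets and the text of the optimized instruction at 0x400655 are hypothetical placeholders, since only that address is given here. Matching both the digest size and the digest size minus one is also an assumption, made because compilers commonly encode a bound of 16 as a comparison against 15.

```python
import re

# Matches a trailing immediate operand such as ",0xf" or ", 16".
IMMEDIATE = re.compile(r",\s*(0x[0-9a-fA-F]+|\d+)$")


def find_digest_size_immediates(listing, digest_size):
    """Return addresses of instructions whose immediate looks like a bound.

    An instruction is flagged if its last operand is an immediate equal to
    `digest_size` or `digest_size - 1`.
    """
    candidates = {digest_size, digest_size - 1}
    hits = []
    for addr, text in sorted(listing.items()):
        match = IMMEDIATE.search(text)
        if match and int(match.group(1), 0) in candidates:
            hits.append(addr)
    return hits


if __name__ == "__main__":
    unoptimized = {
        0x4006DC: "cmp DWORD PTR [rbp-0x54],0xf",
        0x4006E0: "jle 0x4006b5",  # hypothetical jump target
    }
    optimized = {
        0x400655: "cmp rbx,rbp",   # hypothetical pointer comparison
        0x400658: "jne 0x400630",  # hypothetical jump target
    }
    print([hex(a) for a in find_digest_size_immediates(unoptimized, 16)])
    print([hex(a) for a in find_digest_size_immediates(optimized, 16)])
```

The heuristic finds the bound in the unoptimized listing but reports nothing for the optimized one, because the pointer comparison carries no digest-size immediate at all.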
Given the difficulty of addressing this class of changes in its generality, it merits a separate treatment in another paper. Instead, we only outline here a semi-automated approach that assumes access to source code and relies on minimal inputs from users. For example, assuming the user has access to the source code, they can identify obvious logic conditions that should be changed, e.g., loop terminating conditions. For example, in Figure 1, replacing MD5() with SHA-256() by just enlarging the hexdigest and digest variables does not suffice to produce the desired binary; one would also have to edit the loop terminating condition from 16 to 32 in lines 19 and 24 to reflect the replacement SHA-256. We argue that it is reasonable to expect a user (who only needs to understand the interface of the SHA-256 hash function) to know that SHA-256 produces a digest with a length of 32 bytes and that any code using such a digest should expect a buffer or variable of 32 bytes. The user can then instruct ALICE that the terminating condition of the loop should be enlarged from 16 bytes to 32 bytes. We argue that a large portion of this class of changes can be addressed via such an approach, but leave it for future work to investigate this in more detail.
Appendix E Expected Functionalities for Real-World Binaries
This section discusses expected functionalities of both original and rewritten binaries for all real-world programs used in Section 5.3. The expected functionalities of original programs are also used as test inputs in ALICE’s scoping phase in order to perform the dynamic taint analysis.
md5sum and sha1sum are popular checksum command line programs that can be used to compute the MD5 and SHA1 functions of a given file or input string. The expected functionality of the rewritten md5sum and sha1sum binaries is to be able to compute the SHA-256 function on a given input string and output it to the terminal.
The smd5_mkpass program outputs a salted MD5 password from two inputs – a user’s secret key and a salt. This password then can be used to perform user authentication in the Lightweight Directory Access Protocol (LDAP) [ldap]. The ssha_mkpass program works similarly but generates a salted SHA1 password instead of a salted MD5 password. Hence, we expect ALICE to produce the rewritten smd5_mkpass and ssha_mkpass binaries capable of computing a salted SHA256 password from a secret key and a salt.
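For reference, a salted password of this kind is commonly built by hashing the secret followed by the salt and Base64-encoding the digest with the salt appended. The sketch below follows that common LDAP convention; whether smd5_mkpass and ssha_mkpass use exactly this layout is an assumption, and the "{SSHA256}" scheme label and function name are chosen for illustration only.

```python
import base64
import hashlib


def salted_password(secret, salt, algorithm="sha256", scheme="{SSHA256}"):
    """Build a salted password string: scheme + Base64(H(secret || salt) || salt)."""
    digest = hashlib.new(algorithm, secret + salt).digest()
    return scheme + base64.b64encode(digest + salt).decode("ascii")


if __name__ == "__main__":
    salt = b"\x01\x02\x03\x04"
    # Original behavior (salted MD5) versus the expected rewritten behavior.
    print(salted_password(b"secret", salt, algorithm="md5", scheme="{SMD5}"))
    print(salted_password(b"secret", salt))
```

Note that the encoded value grows with the digest size (16 bytes for MD5, 32 bytes for SHA-256), which is why any buffer that holds the digest or its encoding must be enlarged in the rewritten binary.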
curl is a client-side command-line utility tool for transferring data using URL syntax. It supports various internet protocols such as HTTP(S), FTP(S), cookies, proxy tunneling or even access authentication. In particular, we focus on evaluating ALICE on curl’s implementation of digest access authentication (or DigestAuth) while removing other functionalities that use an insecure hash function in the curl binary.
DigestAuth provides an access control mechanism to a webserver. On the client side, DigestAuth constructs and sends the authorization token to the webserver by applying a hash function multiple times to the username, password and nonce generated by the webserver. The webserver then makes a decision based on the received token whether to grant an access to requested resources or not. Detailed description of DigestAuth is shown in Appendix F.1.
In particular, version 7.56.0 of curl implements DigestAuth based on RFC 2617 [rfc2617], which supports only MD5 as the underlying hash function. Even though there is no known attack on this construction yet, it would still be a good idea to migrate it to a more secure one. Therefore, we expect ALICE to produce an output curl binary capable of performing DigestAuth using SHA-256 instead of MD5. Also, as part of this experiment, we implemented a test webserver in Python using Flask-HTTPAuth (https://flask-httpauth.readthedocs.io), which provides the DigestAuth functionality with MD5 or SHA-256 as the underlying hash function.
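The client-side computation can be sketched as follows, using the RFC 2617 response formula with qop=auth and a selectable hash function. This is an illustrative reimplementation using Python's hashlib, not code from curl or from our test webserver; the function names are our own, and the example values are the sample request from RFC 2617.

```python
import hashlib


def _hex_hash(algorithm, text):
    """Hex digest of `text` under the chosen hash function."""
    return hashlib.new(algorithm, text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth", algorithm="md5"):
    """Compute the DigestAuth response value (RFC 2617, qop=auth)."""
    ha1 = _hex_hash(algorithm, f"{username}:{realm}:{password}")
    ha2 = _hex_hash(algorithm, f"{method}:{uri}")
    return _hex_hash(algorithm, f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


if __name__ == "__main__":
    request = dict(
        username="Mufasa", realm="testrealm@host.com", password="Circle Of Life",
        method="GET", uri="/dir/index.html",
        nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093", nc="00000001", cnonce="0a4f113b",
    )
    print("MD5:    ", digest_response(**request))
    print("SHA-256:", digest_response(**request, algorithm="sha256"))
```

Because the hash function is applied three times and its hex output is fed back into later hashes, switching from MD5 to SHA-256 doubles the length of every intermediate string as well as the final token.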
lighttpd provides a light-weight implementation of a webserver while remaining standards-compliant and flexible. In particular, we are interested in evaluating ALICE on lighttpd’s implementation of basic authentication (BasicAuth). Similar to DigestAuth, BasicAuth is a mechanism to enforce access controls to web resources. The main distinction is that a client in BasicAuth creates an authorization token by encoding the username and password with Base64. This is in contrast with the use of a hash function in DigestAuth. The webserver then validates the authorization token by comparing it with the corresponding password stored in the password file. To be secure, the password file must not store passwords in the clear; instead they should be stored in an encrypted or hashed format. lighttpd’s BasicAuth supports various password encryption formats, including the insecure SHA1 hash function. We provide further details of BasicAuth in Appendix F.2. Our goal in this experiment is to replace SHA1 with SHA-256; hence, after applying ALICE on the lighttpd binary, we expect the output webserver binary to be able to perform BasicAuth using passwords stored in the SHA-256 format, instead of the SHA1 format. For this experiment, we implemented a test webclient in Python to perform the client-side functionality of BasicAuth.
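The BasicAuth flow described above can be sketched as a token constructor for the client and a verification routine for the server. The password-file representation (a dictionary from username to the hex digest of the password) and the function names are assumptions for this illustration and do not reflect lighttpd's actual file format.

```python
import base64
import hashlib
import hmac


def make_basic_token(username, password):
    """Client side: Base64-encode 'username:password'."""
    return base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")


def check_basic_token(token, password_file, algorithm="sha256"):
    """Server side: decode the token and compare hashed passwords.

    `password_file` is an assumed in-memory stand-in for the password file,
    mapping each username to the hex digest of that user's password.
    """
    try:
        decoded = base64.b64decode(token).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return False
    username, _, password = decoded.partition(":")
    stored = password_file.get(username)
    if stored is None:
        return False
    computed = hashlib.new(algorithm, password.encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters match.
    return hmac.compare_digest(computed, stored)


if __name__ == "__main__":
    password_file = {"alice": hashlib.sha256(b"s3cret").hexdigest()}
    print(check_basic_token(make_basic_token("alice", "s3cret"), password_file))
    print(check_basic_token(make_basic_token("alice", "wrong"), password_file))
```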
Appendix F Details of Cryptographic Schemes
F.1 Digest Authentication Scheme
The Digest Authentication Scheme (DigestAuth) is depicted in detail in Figure 10. In our experiment from Section 5.3, the client communicates with the server through the curl executable, which implements DigestAuth using MD5 as the underlying hash function. The rewritten curl is considered correct if it can execute DigestAuth with SHA-256 and pass all of curl’s existing test cases.
F.2 Basic Authentication Scheme
Figure 11 shows all steps required in the Basic Authentication scheme (BasicAuth). In our experiment from Section 5.3, the server side is deployed using lighttpd and uses SHA1 as the underlying hash function. Our goal is to locate the SHA1 hash function in the lighttpd binary and replace it with SHA-256. We consider the rewritten lighttpd produced by ALICE to be correct if it can also follow the scheme in Figure 11 with SHA-256.
Appendix G Full Results for Size of Executables
|Crypto Library||Algorithm||OFLAG||Binary Size|