ARM Pointer Authentication based Forward-Edge and Backward-Edge Control Flow Integrity for Kernels

12/23/2019 ∙ by Yutian Yang, et al.

Code reuse attacks remain a major threat to software and system security. Control flow integrity (CFI) is a promising defense against such attacks, but its effectiveness has been weakened by inaccurate control flow graphs and by practical strategies that trade security for performance. In recent years, CPU vendors have integrated hardware features as countermeasures. For instance, ARM Pointer Authentication (PA for short) was introduced in the ARMv8-A architecture. It efficiently generates an authentication code for an address and encodes it in the unused bits of that address. When the address is dereferenced, the authentication code is checked to ensure its integrity. Although existing systems adopt PA to harden user programs, how to effectively use PA to protect OS kernels is still an open research question. In this paper, we shed light on how to leverage PA to protect the control flows of the Linux kernel, including function pointers and return addresses. Specifically, to protect function pointers, we embed authentication codes into them, track their propagation, and verify their values when loading them from memory or branching to targets. To further defend against the pointer substitution attack, we use the address of the function pointer as its context, and take a clean design that propagates the address by piggybacking it into the pointer value. We have implemented a prototype system with LLVM to identify function pointers, add authentication codes, and verify function pointers by emitting new machine instructions. We applied this system to the Linux kernel and solved numerous practical issues, e.g., function pointer comparison and arithmetic operations. The security analysis shows that our system can protect all function pointers and return addresses in the Linux kernel.


I Introduction

Since it first emerged in the 1990s [solar-rop], the code reuse attack has become a major threat to software and system security, especially after code injection was defeated by hardware features, including NX/SMEP/SMAP on x86 and XN/PXN/PAN on ARM. Specifically, after hijacking the control flow through memory corruption, attackers can chain existing code snippets (called code gadgets) together to perform malicious operations. This is called return-oriented programming (ROP for short) [ROP, ROPRISC]. Previous studies showed that, given a large codebase (such as the Linux kernel or libc), ROP is Turing complete [roemer2012return], making it a powerful attack.

To defend against the ROP attack, multiple solutions have been proposed, which fall roughly into two categories. The first category includes systems that make it hard for attackers to obtain the information needed to launch the attack, either by randomizing the memory layout [aslr, aslr1, Oxymoron, wartell2012binary] or by reducing the number of available gadgets [GFREE]. However, address randomization has been proven to be ineffective [aslrattack1, aslrattackT1], since the address information can be leaked or inferred. Moreover, the large codebase makes it impossible to eliminate code gadgets completely. The second category includes systems that protect control flow integrity (CFI for short) [CFI, zhang2013practical, zhang2013control]. Though CFI is a promising technique, its effectiveness has been weakened [CFB] by inaccurate control flow graphs and by practical strategies that trade security for performance.

In recent years, hardware-assisted control flow enforcement [mashtizadeh2015ccfi, mohan2015opaque] has drawn much attention. These systems mainly borrow hardware features that were designed for other purposes. Nowadays, vendors have directly embedded security features for CFI in modern CPUs. For instance, ARM introduced Pointer Authentication (PA) in ARMv8.3 [arm-pa]. Specifically, it reuses the unused bits in an ARM64 virtual address to hold an authentication code that is calculated for the pointer, hence the name Pointer Authentication Code (PAC). When the pointer is dereferenced, the hardware can use the embedded authentication code to verify its validity. To facilitate its use, multiple new instructions have been added.

Since its debut, PA has been considered a promising defense due to its powerful security guarantees and efficient pointer value verification [qualcomm-ret-addr]. However, to leverage this feature, programmers need to change and recompile their programs to use the new instructions. Though a couple of papers adopt PA to protect code and data pointers in user programs [hans2019pac, HANSCALLSTACK, HANSCANARY], there are no open implementations that leverage PA to protect privileged software, i.e., OS kernels. [Footnote 1: We are aware that Apple has adopted PA in the latest version of its iOS XNU kernel. However, its implementation details are unknown.] Due to the differences between OS kernels and user programs (for instance, user programs can assume that the underlying kernel is trusted to provide the cryptographic keys used to generate the authentication code, while OS kernels cannot make such an assumption), how to effectively use PA to protect OS kernels is still an open research question.

Our work   In this paper, we shed light on how to leverage PA to protect the control flows of OS kernels, and present the first design and implementation of such a system. Specifically, we propose PACKER, short for Pointer AuthentiCation for KERnels, which protects both function pointers and return addresses in the Linux kernel, thus providing both forward- and backward-edge control flow integrity. To the best of our knowledge, it is the first open implementation that applies PA to the Linux kernel.

In order to leverage PA to provide complete protection of function pointers, PACKER needs to append the authentication code to the value of a function pointer, track the propagation of function pointers, and verify their validity when loading their values from memory or branching to the jump target. [Footnote 2: In this paper, if not specified, we use "function pointer" to denote its value, i.e., the jump target.] Specifically, PACKER calculates an authentication code (PAC) for each function pointer before it is written into memory (we call a function pointer with an authentication code a PACed pointer). The PAC is computed from the combination of a hardware cryptographic key, the function pointer value, and a context. PACKER then tracks the propagation of the function pointer with the help of the LLVM compiler. When a PACed pointer is loaded from memory, PACKER verifies the value to ensure that it has not been modified (attackers have arbitrary memory write capability). However, this step alone is not enough, since an attacker could jump directly to a point just before the instruction that dereferences a function pointer (the blr instruction, for instance), skipping the check performed at load time. To handle this case, PACKER also verifies the jump target in indirect branch instructions before jumping to it.

When calculating the authentication code, unlike previous work that uses the function type as the context [hans2019pac], PACKER takes the address of a function pointer as its context. The reason is that the function type is not unique to a function pointer: attackers could obtain a PACed function pointer and reuse it for another function pointer of the same type. This is called a pointer substitution attack. By using the unique address of a function pointer as its context, PACKER is immune to this attack. The challenge, however, is that the location where a function pointer is dereferenced may be far from the location where it is loaded, so we need to propagate the address of a function pointer between procedures. To solve this problem, PACKER takes a clean design that piggybacks the pointer address into the pointer value (Figure 4).

To protect return addresses, PACKER generates the PAC for a return address before saving it to the stack and checks the PAC after loading it from the stack. The stack pointer is used as the context, so that signed return addresses cannot be replayed across different stack frames. As a result, PACKER defeats both return address corruption and return address replay attacks. Moreover, unlike existing works, PACKER uses a single instruction to authenticate the loaded return address and return to it atomically, which defeats time-of-check-to-time-of-use attacks.

We have implemented a prototype system with LLVM and applied it to Linux kernel v5.0.1. Specifically, we developed LLVM passes to identify function pointers, add authentication code, propagate and verify function pointers by emitting new machine instructions into the binary. To apply PACKER to Linux kernel, we also modified the kernel to patch the statically initialized function pointers, and solved multiple practical issues, including function pointer comparison, function pointer arithmetic operations and function pointers inside a union. The security analysis shows that PACKER can protect all the function pointers and return addresses in Linux kernel, with a performance overhead between % to , using the micro benchmark of system calls.

This paper has the following contributions:

  • We propose the first design of using the ARM pointer authentication to protect control flow transfers of Linux kernels. Our design protects all function pointers and return addresses in Linux kernel, thus providing both forward-edge and backward-edge control flow integrity.

  • To implement PACKER, we have proposed a series of new techniques to solve technical challenges. In particular, we proposed address-based authentication code generation to defend against pointer substitution attacks, and pointer-address piggyback to propagate the function pointer address. We also proposed methods to identify function pointers and to verify them when loading or storing their values and when branching to targets.

  • We have implemented a prototype of PACKER based on the latest Clang/LLVM and applied it to protect the latest version of the Linux kernel. PACKER successfully protects 100% of indirect call sites and return addresses.

The organization of this paper is as follows: background knowledge is given in section II. Section III discusses the threat model and assumptions. The PACKER design is presented in detail in section IV. We discuss the implementation details in section V and evaluate both the security and performance of PACKER in section VI. We compare PACKER with related works in section VII. Finally, we conclude the paper in section VIII.

Fig. 1: ARMv8.3 Pointer Format with Pointer Authentication. The Pointer Authentication Code (PAC) is embedded into the unused bits of a pointer.

II Background

In this section, we give preliminary background knowledge of the techniques used by this paper, including pointer authentication and ROP/JOP attacks.

II-A ARMv8.3 Pointer Authentication

ARM has introduced a new hardware security feature in ARMv8.3, named Pointer Authentication (PA) [ARM_PA], to protect the integrity of pointers saved in memory. The basic idea of PA is to compute a cryptographic keyed hash, truncate the hash, and embed it into the unused bits of the pointer. The cryptographic keyed hash serves the same function as a message authentication code (MAC), and is therefore termed the Pointer Authentication Code (PAC).

Figure 1 depicts the format of a PACed pointer on ARM64. VA_BIT represents the size of the virtual address space, which is usually 39 or 48 bits. The other bits of a pointer are not used for address translation, therefore can be used to hold the PAC. Note that the top 8 bits are occupied by memory tag if the ARM memory tag extension (MTE) [ARM_MTE] is enabled. Therefore, depending on the configuration, the PAC size can be 7 or 15 bits. ARMv8.3 PA uses QARMA block cipher algorithm [avanzi2017qarma] for PAC generation:

PAC = QARMA(key, pointer, context).

QARMA takes a 64-bit pointer and a 64-bit context as inputs and outputs a 64-bit cipher block. QARMA uses a 128-bit key, which is kept in dedicated registers. ARMv8.3 PA provides five key registers: APIAKey and APIBKey are designed for code pointers, APDAKey and APDBKey are designed for data pointers, and APGAKey can be used for general purposes. The output cipher block is then truncated to a suitable size and embedded into the PAC field shown in Figure 1.

ARMv8.3 also provides a new set of instructions for PA support. The pac* instructions are designed for generating and embedding the PAC. For example, pacia x0, x1 accepts x0 as the pointer and x1 as the context, generates the PAC using APIAKey, and embeds the PAC into x0. Correspondingly, the aut* instructions are designed for PAC authentication. For example, autia x0, x1 verifies the PAC embedded in x0 using APIAKey and x1 as the context. If x0 has a valid PAC, x0 is changed back into a normal pointer; otherwise its top bits are flipped, and an address translation error is triggered when the pointer is dereferenced.
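The sign and authenticate behavior described above can be illustrated with a small software model. This is only a sketch under stated assumptions: HMAC-SHA256 stands in for QARMA, the PAC is taken to be 15 bits placed directly above a 48-bit virtual address, pointers are modeled with zero upper bits, and a failed check is modeled by setting one high bit. The names pac_sign and pac_auth are hypothetical and are not part of any real toolchain.

```python
import hashlib
import hmac

VA_BITS = 48                      # assumed virtual address width
PAC_BITS = 15                     # assumed PAC width (no memory tagging)
VA_MASK = (1 << VA_BITS) - 1
POISON_BIT = 1 << 62              # assumed marker that makes the pointer invalid


def _truncated_mac(key: bytes, pointer: int, context: int) -> int:
    """Keyed hash of (pointer, context), truncated to PAC_BITS.

    HMAC-SHA256 is a stand-in for the QARMA block cipher."""
    msg = pointer.to_bytes(8, "little") + context.to_bytes(8, "little")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "little") & ((1 << PAC_BITS) - 1)


def pac_sign(key: bytes, pointer: int, context: int) -> int:
    """Model of `pacia`: embed the PAC in the unused upper bits."""
    if pointer >> VA_BITS:
        raise ValueError("model expects a pointer with empty upper bits")
    return pointer | (_truncated_mac(key, pointer, context) << VA_BITS)


def pac_auth(key: bytes, signed: int, context: int) -> int:
    """Model of `autia`: strip a valid PAC, otherwise poison the pointer."""
    pointer = signed & VA_MASK
    if pac_sign(key, pointer, context) == signed:
        return pointer
    return pointer | POISON_BIT   # would fault when dereferenced


if __name__ == "__main__":
    key = bytes(16)                               # placeholder 128-bit key
    signed = pac_sign(key, 0x10081000, 0x2000)
    print(hex(signed), hex(pac_auth(key, signed, 0x2000)))
```

In this model, flipping any PAC bit of a signed pointer always makes pac_auth return a poisoned value, while changing the pointer or the context is rejected except for a chance collision of the truncated code.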

Some instructions are designed for specific usages. For example, paciasp generates the PAC using x30 as the pointer and the stack pointer sp as the context by default. Similarly, autiasp authenticates the PAC using x30 as the pointer and the stack pointer sp as the context.

Besides the basic PAC generation and authentication instructions, ARMv8.3 PA also provides combined instructions. For example, a function pointer branch (call) is usually done by the blr instruction, which branches to the function pointer and updates the link register with the correct return address. PA now provides blraa, which authenticates the function pointer before branching to it. Similarly, for function returns, PA provides retaa, which authenticates the return address before returning to it.

With pointer authentication in place, even if the attacker can corrupt function pointers through memory corruption vulnerabilities, a corrupted function pointer cannot pass authentication without knowledge of the key. Therefore, ARMv8.3 PA can be a cornerstone for designing new control flow protection schemes. However, due to the limited number of key registers, PA is vulnerable to pointer substitution attacks if the context is not selected properly. Existing works [hans2019pac] propose to use function types as the context, which still allows substitution among function pointers of the same type.

II-B ROP and JOP Attacks

Software memory corruption bugs have existed for more than 30 years [Song13oakland], during which many attacks and defense mechanisms have been proposed. In the early years, attackers would inject assembly code (called shellcode) into the application memory and then jump to the injected instructions. However, since Data Execution Prevention (DEP) was proposed, W⊕X has been supported by almost all mainstream architectures, and injecting code into writable memory areas has become impossible.

With code injection defeated, attackers can no longer inject new code and have begun to reuse existing code to construct new attack functions. This kind of attack is termed a code reuse attack. To launch a code reuse attack, the attacker first hijacks the program's control flow to execute deliberately selected assembly instruction snippets, called gadgets. Gadgets can be chained together to construct new functions for malicious ends.

Depending on the control data that the attacker hijacks, code reuse attacks can be divided into two categories: return-oriented programming (ROP) and jump-oriented programming (JOP). In ROP attacks, the attacker controls the call stack through vulnerabilities such as buffer overflows. By injecting the gadgets' addresses as return addresses, the attacker redirects the control flow of the program to the gadgets. In ROP, each gadget ends with a return instruction ret, which is why it is called return-oriented programming. To defend against ROP attacks, security researchers proposed address space layout randomization (ASLR), which randomizes the addresses of code, stack, and heap, making it hard to predict the addresses of the code gadgets and of the overflowed buffer. However, ASLR is vulnerable to address leaks [belleville2019kald], and its design cannot fundamentally solve the return address corruption problem.

A JOP attack, on the other hand, overwrites function pointers. When the program calls the corrupted function pointer via instructions like blr or br, the program's control flow is hijacked by the attacker. Gadgets in JOP end with a jump instruction, such as blr or br on ARM and jmp on x86, hence the name jump-oriented programming. To defend against the JOP attack, researchers proposed control-flow integrity. The main idea is to check the jump target of each function pointer against the Control Flow Graph (CFG) or the function pointer type, so that a jump is allowed only if the target is in the CFG or has the same type as the function pointer.

III Threat Model and Assumptions

III-A Threat Model

The attacker in our paper is powerful and has arbitrary kernel memory read and write capability. However, the attacker cannot change existing kernel code or inject new code into the kernel. This is reasonable as W⊕X is supported by all mainstream CPU architectures. Moreover, new isolation-based designs [azab2014hypervision, azab2016skee] use trusted execution environments, such as TrustZone [arm-tz], to protect the kernel code against different kinds of kernel vulnerabilities.

Even though the attacker gains arbitrary kernel memory read and write capability via kernel vulnerabilities [zhang2019pex], he/she still cannot read or write the registers directly, such as the PA key registers. However, the attacker is able to read and write the register contents that are saved to the kernel memory.

With the arbitrary kernel memory read and write capability, the attacker tries his/her best to change control data, such as function pointers or return addresses, to gain code execution capability in kernel space. The attacker can corrupt function pointers in the kernel data section, stack, and heap, or return addresses on the kernel stack. The attacker may also try to guess the PAC value, or launch pointer substitution attacks that replace the original pointer value with a PACed pointer value of interest.

III-B Assumptions

We assume that the kernel boot-up process is trusted. This is a valid assumption, as the bootloader can easily verify the cryptographic hash of the kernel binary when loading the kernel image into memory. The boot-time verification guarantees the integrity of the kernel image, as well as the trustworthiness of the kernel boot-up. Only after the kernel has fully booted and starts accepting system calls can vulnerabilities be triggered and the kernel be attacked.

We further assume the random number generation in the kernel is trusted so that the generated random number has the expected random entropy. As a result, the attacker cannot guess the PA key easily. Finally, we assume that the hardware works as defined by the ARM specification, especially for the Pointer Authentication related hardware.

Fig. 2: PACKER Overview. PACKER contains two stages: compile time and runtime.

IV PACKER Design

IV-A Overview

As mentioned in section II-A, due to the limited number of key registers, ARMv8.3 PA is vulnerable to pointer substitution attacks. For example, even though the attacker does not have the key, he/she can trigger vulnerabilities to leak a PACed code pointer and use this leaked pointer in place of the pointer under attack. The leaked code pointer already has a valid PAC and thus passes PA authentication. In this way, the attacker can still launch JOP attacks in which the corrupted function pointers are replayed PACed function pointers. Existing works [hans2019pac] proposed to use the function type as the context when generating the PAC, which achieves the same protection as fine-grained CFI [tice2014enforcing], which only allows a function pointer to jump to the set of functions with the same function signature at runtime [qualcomm-ret-addr]. Pointer substitution attacks still exist among code pointers of the same type (function signature).

To defeat pointer substitution attacks, PACKER proposes address-based PAC, in which the virtual address of a function pointer variable is used as its context when computing the PAC. Therefore, we have

PAC = QARMA(key, pointer, address),

where key is the 128-bit encryption key, pointer is the function pointer value, and address is the virtual address of the function pointer variable. The basic idea behind address-based PAC is that all function pointers in kernel memory are within the same address space, so all of them have different virtual addresses. To defend against pointer substitution attacks, PACKER leverages the unique virtual address to generate a unique PAC for each pointer, so that a PAC is bound to one particular address and cannot be replayed at other addresses.
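To make the difference between the two context choices concrete, the following sketch models kernel memory as a table of signed function pointer slots and replays one signed value into another slot. It is a toy model under stated assumptions: HMAC-SHA256 replaces QARMA, the PAC is 15 bits above a 48-bit address, and the class Memory and the helper substitute are hypothetical names introduced only for this illustration.

```python
import hashlib
import hmac

VA_BITS = 48                       # assumed virtual address width
PAC_BITS = 15                      # assumed PAC width
VA_MASK = (1 << VA_BITS) - 1


def sign(key: bytes, pointer: int, context: int) -> int:
    """Embed a truncated keyed hash (QARMA stand-in) above the address bits."""
    msg = pointer.to_bytes(8, "little") + context.to_bytes(8, "little")
    mac = int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:8], "little")
    return pointer | ((mac & ((1 << PAC_BITS) - 1)) << VA_BITS)


def authenticate(key: bytes, signed: int, context: int):
    """Return the plain pointer if the PAC matches, otherwise None."""
    pointer = signed & VA_MASK
    return pointer if sign(key, pointer, context) == signed else None


class Memory:
    """Toy kernel memory: slot address -> signed function pointer value."""

    def __init__(self, key, context_of):
        self.key = key
        self.context_of = context_of   # picks the PAC context for a slot
        self.slots = {}
        self.types = {}

    def store(self, addr, fn_type, target):
        self.types[addr] = fn_type
        self.slots[addr] = sign(self.key, target, self.context_of(addr, fn_type))

    def load(self, addr):
        context = self.context_of(addr, self.types[addr])
        return authenticate(self.key, self.slots[addr], context)


def substitute(mem: Memory, src: int, dst: int) -> None:
    """Attacker primitive: copy a leaked signed pointer over another slot."""
    mem.slots[dst] = mem.slots[src]


def type_context(addr, fn_type):       # context used by type-based schemes
    return fn_type


def address_context(addr, fn_type):    # address-based PAC
    return addr


if __name__ == "__main__":
    key = b"\x01" * 16
    for name, ctx in (("type", type_context), ("address", address_context)):
        mem = Memory(key, ctx)
        mem.store(0x1000, 7, 0x4000)
        mem.store(0x1008, 7, 0x5000)
        substitute(mem, 0x1000, 0x1008)
        print(name, "context, load after substitution:", mem.load(0x1008))
```

With the type as context, the substituted value always authenticates, because both slots share the same type. With the address as context, it is rejected unless the truncated codes happen to collide.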

Overall, PACKER consists of two stages: the compiling stage and the run-time stage, as shown in Figure 2. During the compiling stage, PACKER relies on Clang to compile the kernel source code to LLVM IR. On the kernel IR, PACKER first analyzes the global variables and all data structures inside a module, and then identifies the function pointers inside the module (section IV-C).

After that, with the identified function pointer information, PACKER instruments the IR and the backend instructions that involve function pointer store, load, and branch operations (section IV-D). Finally, for return addresses, PACKER inserts PAC generation code in the function prologue, as well as PAC checking code in the function epilogue (section IV-E).

After the compiling stage, a PA-instrumented vmlinux binary is generated. During the runtime stage, especially the kernel boot-up process, PACKER first configures the registers and initializes the PA keys (section IV-F). Then PACKER generates PACs for the statically initialized function pointers and for the dynamic function pointer assignments that happen before PA initialization (section IV-G). After that, PACKER is fully functional and protects all code pointers inside kernel memory, including function pointers and return addresses.

Fig. 3: Code in kernel which passes FP as a function argument. The argument func of __async_schedule function is a function pointer, its address information is lost at its call site in Line 5.
Fig. 4: The pointer format with pointer-address piggyback. A piggyback pointer contains the function pointer value, the PAC, and the function pointer address.

IV-B Function Pointer Address Propagation

Pointers with a PAC should be authenticated at both the function pointer load (loading a function pointer from memory into a register) and the function pointer branch (jumping to a function pointer). In address-based PAC, authenticating at the load site is straightforward, as the pointer's address and value can be obtained easily from the load instruction.

However, for PAC authentication at the branch (call) site, determining the address of a function pointer becomes much harder. If the function pointer call is within the same function as the pointer load, we can get the function pointer address by walking the use-def chain of the function pointer. In the kernel, however, the gap between the load and the branch can span procedures. For example, the kernel has hundreds of places that load a function pointer and pass its value as a parameter to a callee function, while the actual function pointer branch is in the callee function, as shown in __async_schedule in Figure 3 and in the do_dentry_open function in Figure 6(a). In those cases, it is impossible to determine the function pointer address at the callee site. One naive solution is to change the callee function by adding the address as an additional parameter. However, changing hundreds of functions and their call sites is not practical.

In order to address the function pointer address propagation problem, we propose pointer-address piggyback. The basic idea is to piggyback the address on the function pointer, so that the function pointer always carries its address. As a result, we can always get the function pointer's address whenever we use the function pointer. To achieve pointer-address piggyback, we encode the function pointer value so that the encoded function pointer contains the pointed-to value, the PAC, and the address, as shown in Figure 4. Our key observation is that for a kernel built with the defconfig configuration, the total number of address-taken functions is less than 10k, which can be indexed with 14 bits ($2^{14} = 16384$). Therefore, we can use 14 bits to index a function pointer value (the pointed-to address). The PAC needs 7 bits, leaving 64 - 14 - 7 = 43 bits for encoding the address, as shown in Figure 4. Because ARM/ARM64 pointers are word (4-byte) aligned, the two lowest address bits are always zero, so the 43 bits can index 45 bits of virtual memory.
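The bit budget above can be checked with a small packing routine. The sketch below is illustrative only: the order of the three fields inside the 64-bit word is an assumption (the layout in Figure 4 may differ), and it further assumes that all function pointer variables live in the top 2^45 bytes of the address space so that the dropped upper bits are all ones. The names encode and decode are hypothetical.

```python
INDEX_BITS = 14    # index into the table of address-taken functions
PAC_BITS = 7       # pointer authentication code
ADDR_BITS = 43     # word-aligned address of the function pointer variable
assert INDEX_BITS + PAC_BITS + ADDR_BITS == 64

ADDR_SHIFT = 2     # 4-byte alignment: the two lowest address bits are zero
INDEX_MASK = (1 << INDEX_BITS) - 1
PAC_MASK = (1 << PAC_BITS) - 1
ADDR_MASK = (1 << ADDR_BITS) - 1
# Assumed: the upper bits that do not fit in the address field are all ones.
KERNEL_HIGH = 0xFFFF_FFFF_FFFF_FFFF & ~((1 << (ADDR_BITS + ADDR_SHIFT)) - 1)


def encode(index: int, pac: int, address: int) -> int:
    """Pack (function index, PAC, pointer address) into one 64-bit word.

    Assumed field order, high to low bits: index | PAC | address."""
    if not 0 <= index <= INDEX_MASK or not 0 <= pac <= PAC_MASK:
        raise ValueError("index or PAC does not fit its field")
    if address % 4 or (address & KERNEL_HIGH) != KERNEL_HIGH:
        raise ValueError("address must be word aligned and in the modeled range")
    field = (address >> ADDR_SHIFT) & ADDR_MASK
    return (index << (PAC_BITS + ADDR_BITS)) | (pac << ADDR_BITS) | field


def decode(word: int):
    """Recover (function index, PAC, pointer address) from a packed word."""
    index = word >> (PAC_BITS + ADDR_BITS)
    pac = (word >> ADDR_BITS) & PAC_MASK
    address = KERNEL_HIGH | ((word & ADDR_MASK) << ADDR_SHIFT)
    return index, pac, address


if __name__ == "__main__":
    packed = encode(1234, 0x55, 0xFFFF_FF80_1234_5678)
    print(hex(packed), decode(packed))
```

Because the function index replaces the raw target address, a design along these lines also needs a table that maps each index back to the function entry point before the branch; that lookup is not modeled here.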

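1:function AnalyzeModule(M)
2:     S ← AnalyzeGlobalFP(M) ▷ globals initialized with function names
3:     STI ← initial function pointer fields
4:     do
5:         remember the current S
6:         for all functions F ∈ M do
7:              AnalyzeFunction(F, S, STI)
8:         end for
9:     while S has changed
10:     return S
11:end function
12:
13:function AnalyzeFunction(F, S, STI)
14:     for all basic blocks B ∈ F, in forward order do
15:         for all instructions I ∈ B, in forward order do
16:              AnalyzeInst(I, S, STI)
17:         end for
18:     end for
19:     for all basic blocks B ∈ F, in backward order do
20:         for all instructions I ∈ B, in backward order do
21:              AnalyzeInst(I, S, STI)
22:         end for
23:     end for
24:     return
25:end function
26:
27:function AnalyzeInst(I, S, STI)
28:     for all operands op of I do
29:         if op is of function pointer type or comes from a field in STI then
30:              S ← S ∪ {op}
31:         end if
32:     end for
33:     apply the transfer function of I to its remaining operands and its output
34:     update S and STI with the newly identified function pointers and fields
35:     return
36:end function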
Algorithm 1 Precise function pointer identification

IV-C Identifying Function Pointers

To locate the instructions to be instrumented, function pointers must be identified precisely among all the variables inside a module. Any mistaken instrumentation causes unexpected behavior or even a kernel panic. Note that relying only on the function pointer type can hardly cover all function pointers, as some function pointers are typed as void* or, worse, as 64-bit integers. Only by taking into account the program semantics of these variables, e.g., that they become targets of indirect calls or are assigned function pointers, can we identify them as function pointers. We have also found that some fields inside a struct type are not of function pointer type but contain function pointers, as shown in Figure 5. Recording these fields helps us discover more corner cases.

Following these insights, we have developed an intra-procedure and field-sensitive analysis method to precisely identify all function pointers based on LLVM IR, whose details are given in Algorithm 1. Note that function pointer identification is entirely different from function pointer alias (points-to) analysis. PACKER only needs to distinguish function pointers from all other variables, while the latter tries to determine the points-to set of a function pointer.

Before diving into the details, we need to clarify the terms. M denotes a module, F denotes a function and I denotes an instruction. Set S in AnalyzeModule contains all the identified function pointers by our algorithm. We use function pointer field to denote a field that contains a function pointer inside a struct. STI stores all function pointer fields to support our analysis.

IV-C1 Global Information Collection

The first step is to analyze data structures and statically initialized global variables. The results serve as basic information for the subsequent function pointer identification and for function pointer patching during the kernel's early boot (section IV-G).

In AnalyzeGlobalFP at line 2, we first walk through the initialization lists of all global variables. The global variables that are initialized with function names, including those lying inside a struct, are used to initialize S. If they are part of a struct, the struct and the corresponding fields inside it are also stored in a separate set of struct records.

This set of struct records enables us to pick out the initial function pointer fields: if a field appears with its struct type in the set, or if it is of function pointer type, it is stored in STI. This step is done at line 3. Each record of a function pointer field consists of the type name of a struct and a sequence of indices to reach this field. These struct names and indices can be duplicated because structs can be nested in the kernel, as in task_struct. We organize STI as a directed acyclic graph so that we can store all function pointer fields with faster queries and less memory overhead. A node in the graph represents a struct type that contains a function pointer field, or, if it has no successor, a basic type that may contain function pointers. An edge from node A to node B indicates that type B is included in type A.

By analyzing the global information of a module, we obtain the initial S and STI, which hold the basic information for the analysis of each function. After the analysis of a function finishes, S and STI are updated with the analysis results. We analyze every function inside the module in one iteration, and our algorithm keeps iterating until S no longer changes.

IV-C2 Per-Function Pointer Identification

As described from line 14 to line 18, rather than considering conditional or loop relationships among basic blocks, we walk linearly through each instruction inside a function, because these relationships do not affect the analysis result. More specifically, we focus on type information and its propagation across values, so taking a loop once or multiple times makes no difference to our analysis.

For each IR instruction, all operands that can be identified as function pointers from the global information are first added to S, i.e., we check for each operand whether it is of function pointer type or comes from a function pointer field. Then a propagation rule based on the kind of the IR instruction, namely the transfer function of the instruction, is applied to decide whether the remaining operands and the instruction output should be added to S. So far we have implemented transfer functions for seven kinds of IR instructions: bitcast, icmp, phinode, store, load, getelementptr and call, the details of which are given in section V-B.

Note that our algorithm iterates twice inside a function, first in the forward and then in the backward direction, as indicated at line 16 and line 21. A single pass is not enough, since an indirect call instruction does not propagate the function-pointer attribute but generates the attribute for the called pointer. The function pointer operand added to set S by an indirect call cannot propagate to the operands of previously visited instructions. Therefore, we add an extra backward iteration to fully propagate the function-pointer attribute. Intuitively, set S becomes stable after the two rounds, as no new function-pointer attribute is generated during the second iteration: it is only propagated along the instructions.
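The need for the backward pass can be seen on a three-instruction example. The sketch below is a heavily simplified model and not the analysis implemented in PACKER: instructions are plain tuples, every copy-like instruction shares the function-pointer attribute among all of its values, and only the indirect call generates the attribute. The names analyze_inst and analyze_function are hypothetical.

```python
# Instruction kinds that move a value around without changing what it is.
COPY_LIKE = {"bitcast", "load", "store", "phi"}


def analyze_inst(inst, fps):
    """Toy transfer function over an instruction tuple (opcode, value, ...)."""
    op, *values = inst
    if op == "icall":
        # An indirect call generates the attribute for the called value.
        fps.add(values[0])
    elif op in COPY_LIKE and any(v in fps for v in values):
        # Copy-like instructions propagate the attribute to all their values.
        fps.update(values)


def analyze_function(insts, fps, backward=True):
    """Walk the instructions forward, then (optionally) backward."""
    for inst in insts:
        analyze_inst(inst, fps)
    if backward:
        for inst in reversed(insts):
            analyze_inst(inst, fps)
    return fps


if __name__ == "__main__":
    # v1 = load slot ; v2 = bitcast v1 ; call v2()   (all typed as void*)
    body = [("load", "v1", "slot"), ("bitcast", "v2", "v1"), ("icall", "v2")]
    print(sorted(analyze_function(body, set(), backward=False)))
    print(sorted(analyze_function(body, set(), backward=True)))
```

In the forward-only run, only the called value is identified, because the call is visited last. The backward pass carries the attribute from the call back through the cast and the load.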

Fig. 5: Even though both sides of the assignment at line 13 are of void* type, PACKER can identify both of them as function pointers through field-sensitive analysis.

IV-C3 Precision

Our pointer identification algorithm is able to precisely identify most of the function pointers inside the kernel, and eventually identifies 100% of them after the corner cases in the kernel are handled. These corner cases and our measures are detailed in section V-D. Generally, the precision of our algorithm comes mainly from three aspects:

Exclusiveness of function pointers: we assume a function pointer is exclusive, i.e., it should always point to executable code after initialization and should not contain data of any other type at any time. In fact, this rule holds most of the time, because mixing function pointers and other data in one variable is dangerous. For example, a variable that contains a data pointer could then be called, which leads to arbitrary code execution. The exclusiveness makes our analysis more precise because, in our algorithm, a variable either must be or must not be a function pointer, rather than may be one.

Semantic awareness: rather than depending only on type information, PACKER also uses program semantics to identify function pointers. Based on the semantics of the seven kinds of instructions we have modeled, PACKER is capable of identifying more function pointers.

Field sensitivity: PACKER takes advantage of field-sensitive analysis to achieve higher precision. Figure 5 shows a code snippet from the kernel. At line 13, both bdev->bd_holder and holder are of void* type, and they are neither called nor assigned function pointers within blkdev_get. Consequently, an algorithm with only semantic awareness will not identify them as function pointers.

However, PACKER records all the function pointer fields to assist function pointer identification. Note that the same field bd_holder in block_device has been assigned a function bd_may_claim. So PACKER considers this field as a function pointer field and records it in STI. When PACKER encounters the same field inside bdev->bd_holder, it immediately identifies the variable as a function pointer. Then, holder is also identified as a function pointer from semantics of the assignment.
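The field-sensitive step can be pictured with a small table lookup. This is only a sketch: PACKER's STI is derived from LLVM IR types, whereas the string keys and helper names below are hypothetical.

```python
# Toy field-sensitive table; keys and helper names are hypothetical.
sti = set()   # (struct type, field) pairs known to hold function pointers

def record_field_store(struct_type, field, stored_value_is_function):
    """Called for every store into a struct field anywhere in the kernel."""
    if stored_value_is_function:
        sti.add((struct_type, field))

def classify_field_assignment(struct_type, field, rhs, S):
    """Handle `obj->field = rhs`: a known function-pointer field marks rhs too."""
    if (struct_type, field) in sti:
        S.add(rhs)
    return S

# Elsewhere, a function is stored into block_device.bd_holder ...
record_field_store("block_device", "bd_holder", stored_value_is_function=True)
# ... so in blkdev_get, `bdev->bd_holder = holder` marks `holder` as well.
S = classify_field_assignment("block_device", "bd_holder", "holder", set())
```

A field that has never received a function leaves the right-hand side unmarked, which is what keeps ordinary void* data out of S.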

The precision of our algorithm ensures that instrumentation instructions are inserted at the correct locations. The inserted instructions enforce function pointer integrity and return address integrity, which are introduced in the next two sections.

(a) C code of ptmx_fops.open assignment. Line 12 shows that ptmx_open is assigned to the function pointer open inside the struct ptmx_fops.
(b) Assembly code of ptmx_fops.open assignment without PACKER instrumentation. Line 6 shows the actual function pointer store operation, in which x0 holds the function pointer value.
(c) Assembly code of ptmx_fops.open assignment with PACKER instrumentation. Line 6 shows the PAC generation instruction, in which x0 holds the pointer value, x1 holds the address of the pointer.
Fig. 6: Function pointer store operation in PACKER.

IV-D PAC Instrumentation on Function Pointer Store/Load/Branch

IV-D1 Generating PAC on Function Pointer Store

All function pointers in memory should be protected by a PAC to defeat corruption by the attacker, so they must be PACed before being stored into memory. Function pointers are stored into memory in the following three cases: 1) Statically initialized function pointers, which are loaded from the binary into memory. 2) Dynamically assigned function pointers, which are saved by store instructions. 3) Byte-object function pointers, which are treated as a sequence of bytes and copied from another address by memcpy and memmove.

Function pointers in the first case are PACed dynamically during kernel initialization, after the PAC keys are set up. In fact, some function pointers that are dynamically assigned before PAC key initialization also need to be patched; we leave the details to section IV-G.

Most kernel function pointer stores fall into the second case. For this case, PACKER gets the function pointer and the address it is stored to, and PACs the function pointer before the store instruction, as shown in Figure 6. Line 12 in Figure 6(a) shows that ptmx_open is saved into the function pointer open inside struct ptmx_fops. Line 6 in Figure 6(b) shows the actual store instruction. Figure 6(c) shows the assembly code after PACKER instrumentation. Line 7 shows the PAC generation instruction, in which x0 holds the pointer value while x1 holds the pointer’s address. By our design, some already loaded function pointers may be in piggyback form, while others may be in normal form, e.g., immediate function addresses. PACKER is able to distinguish these two kinds of pointers because normal kernel function pointers follow a fixed pattern in most cases, whereas piggybacked pointers never follow this pattern. Piggybacked pointers are decoded to normal pointers before being PACed.
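The effect of using the storage address as the PAC context can be sketched in software. The model below makes several assumptions that are not PACKER's implementation: a keyed BLAKE2b hash stands in for the hardware MAC, the tag is 16 bits wide and sits in bits 48-63 of a 64-bit word, and all addresses and names are made up for illustration.

```python
import hashlib

KEY = b"toy-pa-key"        # stands in for a PA key register
VA_MASK = (1 << 48) - 1    # toy layout: address in bits 0-47, tag in bits 48-63

def tag(ptr, ctx):
    """16-bit keyed tag over (pointer, context); BLAKE2b replaces the hardware MAC."""
    msg = ptr.to_bytes(8, "little") + ctx.to_bytes(8, "little")
    return int.from_bytes(hashlib.blake2b(msg, digest_size=2, key=KEY).digest(), "little")

def pac_sign(ptr, ctx):
    return (tag(ptr, ctx) << 48) | ptr

def pac_auth(signed, ctx):
    ptr = signed & VA_MASK
    if signed >> 48 != tag(ptr, ctx):
        raise ValueError("PAC authentication failed")
    return ptr

memory = {}

def pac_store(addr, fptr):
    memory[addr] = pac_sign(fptr, ctx=addr)   # context is the slot's own address

def pac_load(addr):
    return pac_auth(memory[addr], ctx=addr)

OPEN_SLOT, RELEASE_SLOT = 0x1000, 0x1008      # hypothetical slot addresses
PTMX_OPEN, TTY_RELEASE = 0x400000, 0x400100   # hypothetical function addresses

pac_store(OPEN_SLOT, PTMX_OPEN)
pac_store(RELEASE_SLOT, TTY_RELEASE)
loaded = pac_load(OPEN_SLOT)                  # authenticates against OPEN_SLOT

memory[OPEN_SLOT] = memory[RELEASE_SLOT]      # attacker copies a validly signed pointer
try:
    pac_load(OPEN_SLOT)
    detected = False
except ValueError:
    detected = True
```

The copied pointer carries a tag computed for RELEASE_SLOT, so authenticating it with OPEN_SLOT as the context fails (up to tag collisions, which are more likely with the much shorter PAC of real hardware).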

The third case is special because function pointers are decomposed into bytes and lose their type information during memcpy or memmove. We replace these functions with the wrappers pac_memcpy and pac_memmove. Our wrapper functions check every byte of the destination object and use the pattern of PACed function pointers to find function pointers. We do not match on the source object because it may overlap with the destination and be changed during memmove. Strict matching rules are adopted. The old PAC is stripped from a recognized function pointer, which is then patched with a new PAC calculated with its new address as the context.
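A word-granular sketch of such a wrapper is shown below. It is a simplification under stated assumptions: memory is a dictionary of 8-byte words rather than bytes, the matcher `looks_like_fp` only checks a hypothetical kernel text range (the strict matching rules of PACKER are not reproduced here), and a keyed BLAKE2b hash with a 16-bit tag stands in for the hardware PAC.

```python
import hashlib

KEY = b"toy-pa-key"
VA_MASK = (1 << 48) - 1
TEXT_START, TEXT_END = 0x400000, 0x500000   # hypothetical kernel text range

def tag(ptr, ctx):
    msg = ptr.to_bytes(8, "little") + ctx.to_bytes(8, "little")
    return int.from_bytes(hashlib.blake2b(msg, digest_size=2, key=KEY).digest(), "little")

def pac_sign(ptr, ctx):
    return (tag(ptr, ctx) << 48) | ptr

def pac_auth(signed, ctx):
    ptr = signed & VA_MASK
    if signed >> 48 != tag(ptr, ctx):
        raise ValueError("PAC authentication failed")
    return ptr

def looks_like_fp(word):
    """Placeholder matcher: the stripped value must fall inside kernel text."""
    return TEXT_START <= (word & VA_MASK) < TEXT_END

def pac_memmove(mem, dst, src, nwords):
    words = [mem[src + 8 * i] for i in range(nwords)]  # snapshot, so overlap is safe
    for i, word in enumerate(words):
        addr = dst + 8 * i
        mem[addr] = word
        if looks_like_fp(mem[addr]):                   # match on the destination
            mem[addr] = pac_sign(mem[addr] & VA_MASK, ctx=addr)  # strip, then re-sign

FN = 0x400100                                          # hypothetical function address
mem = {0x1000: pac_sign(FN, ctx=0x1000), 0x1008: 42}   # one signed pointer, one data word
pac_memmove(mem, dst=0x2000, src=0x1000, nwords=2)
```

After the copy, the pointer at 0x2000 authenticates with its new address as the context, while the plain data word is copied unchanged.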

(a) C code of load and branch operations. Line 6 shows a function pointer load while Line 8 is a function pointer branch (function pointer call).
(b) Assembly code of ptmx_fops.open without PACKER instrumentation. Line 4 loads the function pointer to register x0. Line 7 moves function pointer to register x8. Line 10 is the function pointer branch, which calls function pointer in x8.
(c) Assembly code of ptmx_fops.open with PACKER instrumentation. PACKER adds Line 6 and Line 14. Line 6 is the PAC authentication instruction, in which x0 holds the PACed pointer and x1 holds its address. Line 14 is the authenticate-and-branch instruction, which first authenticates the PACed pointer in x8; x18 holds the pointer address.
Fig. 7: Function pointer load and branch operations in PACKER.

IV-D2 Authenticating PAC on Function Pointer Load and Branch

PACKER authenticates function pointers on both function pointer load and branch. Authenticating function pointers on branch instructions is easy to understand: it prevents function pointer branch instructions from being abused as JOP gadgets that jump to arbitrary addresses. However, function pointer loads also need to be authenticated, for two reasons. First, the function pointer is loaded from memory and may therefore have been corrupted by the attacker, who has arbitrary memory write capability. Second, the function pointer is loaded into registers, so it may propagate across different registers and be used by other instructions, such as the PAC generation instruction and the store instruction. Without authentication on function pointer load, the attacker can corrupt a function pointer in memory so that it holds a malicious code address. When the function pointer is loaded, the malicious code address can propagate to the PAC generation and store instructions. As a result, these instructions can be abused by the attacker as a signing gadget. To break this signing gadget, we must authenticate the function pointer either before the PAC generation instruction or on function pointer load. As the PAC generation instructions must accept un-PACed function pointers, we cannot authenticate them before PAC generation. Therefore, PACKER authenticates all function pointer loads, so that no illegal function pointer can sneak into registers.

To authenticate function pointer loads, PACKER instruments all of them with PAC authentication. Line 6 in Figure 7(a) shows a function pointer load, which is compiled into the ldr instructions at Lines 3-4 in Figure 7(b). Line 6 in Figure 7(c) shows the PAC authentication instruction inserted by PACKER, which authenticates the PACed pointer in x0 with x1 holding the address as the context.

The other place to check the PAC is the function pointer branch instruction (indirect call site). If a function pointer passes the PAC check upon loading, it is transformed into piggyback form. A piggyback pointer is then expanded into a PACed function pointer for the second PAC check at the function pointer call site. As shown in Figure 7(c), at the function pointer call site, the original blr instruction is replaced by blraa in Line 14, which first authenticates the pointer in x8 using the address in x18 before branching.

IV-D3 Function Pointer PAC Instruction Insertion

To insert the PAC generation and authentication code shown in Figure 6(c) and Figure 7(c), PACKER performs instrumentation in both LLVM IR and the LLVM backend, as shown by the “Store/Load/Branch Instrument” block and the “FP PAC Instruction Insertion” block in Figure 2.

In LLVM IR, PACKER first inserts IR instructions that mark function pointer store, load, and branch. The inserted call instructions call different functions depending on the instrumented instruction: a call to pac_store replaces a function pointer store, and a call to pac_load replaces a function pointer load. Function pointer branches are handled slightly differently. We do not replace the function pointer call, but add a pac_call just before it. The parameters of pac_store and pac_load are the function pointer and its address, which can be fetched from the operands of the store and load instructions. The parameter of pac_call is the called function pointer, which is supposed to be in piggyback form at run-time and can be decoded into a PACed pointer and its address. The inserted instructions are then lowered from IR to LLVM machine instructions in the backend.
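The three rewriting rules can be illustrated on a toy instruction list. Only the stub names pac_store, pac_load and pac_call come from the text; the tuple encoding and the helper `instrument` are hypothetical and do not mirror the LLVM API.

```python
# Toy IR rewriting; hypothetical encoding:
#   ("store", value, addr)   ("load", result, addr)   ("icall", fptr)

def instrument(insts, S):
    """Rewrite instructions that touch values in S (the function-pointer set)."""
    out = []
    for inst in insts:
        op = inst[0]
        if op == "store" and inst[1] in S:
            out.append(("call", "pac_store", inst[1], inst[2]))   # replaces the store
        elif op == "load" and inst[1] in S:
            out.append(("call", "pac_load", inst[1], inst[2]))    # replaces the load
        elif op == "icall" and inst[1] in S:
            out.append(("call", "pac_call", inst[1]))             # added before the call
            out.append(inst)                                      # the call itself is kept
        else:
            out.append(inst)                                      # unrelated instruction
    return out

body = [("load", "%fp", "%slot"), ("store", "%n", "%counter"), ("icall", "%fp")]
result = instrument(body, S={"%fp"})
```

Here the load of `%fp` becomes a pac_load stub, the unrelated integer store is untouched, and the indirect call gains a preceding pac_call stub that the backend can later turn into the decode and blraa sequence.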

Unlike LLVM IR, LLVM machine instructions are closely tied to the target architecture and include CPU-specific instructions. Therefore, PAC-related instructions, which are only available on ARMv8.3 and newer ARMv8 architectures, are inserted in the LLVM backend. The inserted machine instructions replace the stub function calls we have added. For example, when the PACKER backend encounters a call to pac_store, it replaces it with a PAC-generating instruction followed by a store instruction that stores the PACed pointer to the address, as Figure 6(c) shows. Note that the original function pointer and the address are already in registers X0 and X1 as the parameters of our stub call. pac_load is handled in a similar way. Stub calls to pac_call are slightly different. We must decode the input piggyback pointer into a PACed pointer and its address, which need to be held in two registers for blraa. We cannot directly use X0 and X1 for them because these registers carry parameters passed to the indirect call. Instead, we save them in virtual registers provided by the LLVM backend, which automatically maps the virtual registers to registers that are not live at this point. Figure 7(c) displays the output of the PACKER backend for indirect calls.

(a) Existing design for protecting return address. Interrupt can happen between autiasp and ret.
(b) Return address protection in PACKER. Compared with existing design, retaa guarantees the atomicity.
Fig. 8: Return address protection in PACKER.

IV-E Return Address Protection

Return address PAC generation and checking reuse the existing idea [qualcomm-ret-addr]: with the stack pointer as the context, the PAC for the return address register is generated before the register is pushed to the stack and verified right after it is loaded from the stack, as shown in Figure 8(a). However, the existing design uses a separate PAC authentication instruction autiasp and return instruction ret. An interrupt may happen in between, giving the attacker a chance to launch a time-of-check-to-time-of-use (TOCTTOU) attack. For example, when the interrupt arrives, the authentication of x30 has already passed, and its value is saved on the interrupt stack. The attacker can overwrite the saved x30 value on the stack and redirect the return to an arbitrary place.

To address this problem, PACKER uses the retaa instruction, which authenticates and returns in one instruction. This guarantees atomicity and improves security by eliminating the gap between time of check and time of use.
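The difference between the split and the fused epilogue can be modeled in a few lines. This is a software caricature under explicit assumptions: a keyed BLAKE2b hash with a 16-bit tag replaces the hardware PAC, the `interrupt` callback represents x30 being spilled to memory, possibly rewritten by the attacker, and restored, and all addresses are invented.

```python
import hashlib

KEY = b"toy-pa-key"
VA_MASK = (1 << 48) - 1

def tag(ptr, ctx):
    msg = ptr.to_bytes(8, "little") + ctx.to_bytes(8, "little")
    return int.from_bytes(hashlib.blake2b(msg, digest_size=2, key=KEY).digest(), "little")

def sign(lr, sp):
    return (tag(lr, sp) << 48) | lr

def auth(signed, sp):
    ptr = signed & VA_MASK
    if signed >> 48 != tag(ptr, sp):
        raise ValueError("PAC authentication failed")
    return ptr

def epilogue_split(saved_lr, sp, interrupt=None):
    """autiasp followed by ret: x30 is a raw address between the two."""
    x30 = auth(saved_lr, sp)        # autiasp
    if interrupt:
        x30 = interrupt(x30)        # x30 is spilled, possibly rewritten, and restored
    return x30                      # ret branches to whatever x30 holds now

def epilogue_fused(saved_lr, sp, interrupt=None):
    """retaa: an interrupt can only ever see the still-signed x30."""
    x30 = saved_lr
    if interrupt:
        x30 = interrupt(x30)
    return auth(x30, sp)            # check and branch happen together

SP, RET, EVIL = 0x7000, 0x400200, 0x400ABC   # hypothetical addresses

def overwrite(spilled):
    return EVIL                     # attacker rewrites the spilled x30

signed = sign(RET, SP)
hijacked = epilogue_split(signed, SP, overwrite)
try:
    epilogue_fused(signed, SP, overwrite)
    blocked = False
except ValueError:
    blocked = True
```

With the split epilogue the overwritten value is used unchecked, whereas in the fused model the overwritten register still has to pass authentication before the branch.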

Note that in kernel space, every thread has its own dedicated kernel stack, and the stack pointer points to a stack frame within this per-thread kernel stack. In other words, the stack pointer sp is guaranteed to differ between threads, so using the stack pointer as the PAC context ensures that a PAC cannot be replayed across different kernel threads. More specifically, the stack pointer is per-thread and per-stack-frame, which makes pointer substitution attacks virtually impossible.

IV-F Pointer Authentication Initialization

Before using pointer authentication (PA) to protect kernel function pointers and return addresses, we must first set up the pointer authentication keys. The current Linux kernel only provides PA for user space, not yet for the kernel itself [kernel-pa, kernel-padoc]. Moreover, PA keys in the Linux kernel are saved in kernel memory without any protection [kernel-pakey], which makes them vulnerable to arbitrary kernel memory read/write attacks.

To set up the PA environment for the kernel, we set the TBI and TBID bits in tcr_el1 as soon as start_kernel is called to ensure that the PAC is 7 bits. Then we configure sctlr_el1 to enable the IA and IB keys for PA. After that, PACKER invokes the kernel random number generation functions to generate 128-bit random numbers and writes them into the PA key registers.

Note that randomness functionality is not available at the very beginning of kernel boot. As a result, even if the PA key registers are set up right after kernel randomness initialization, hundreds of functions that invoke function pointers have already executed before PA initialization. This poses several challenges to PACKER. First, a call through a function pointer may happen before the PA keys are set up. As mentioned before, a PAC check occurs at every function pointer call site, and the check can never pass without the keys. To avoid check failures and crashes, PACKER adds a key setup check: if the key has not been set up, PACKER does not check the PAC. PACKER adds this check only to code executed during kernel initialization, and all of that code is freed after kernel boot, so this does not jeopardize the security of PACKER.

Second, hundreds of functions are executed before PA initialization. These functions may contain pointer load and store operations. Right after key setup, PACKER patches the statically allocated function pointers with the correct PAC. However, store operations that happen before the PA key is ready store un-PACed pointer values. Therefore, PACKER must be able to trace all these store operations and patch all these locations. To address this problem, PACKER instruments all store operations. If PA is not ready, PACKER allows the operation to proceed but records the target address for later patching; details are given in section IV-G.

IV-G Statically and Dynamically Initialized Function Pointer Patch

As in the user space case [hans2019pac], statically allocated and initialized function pointers do not contain a PAC, because the pointer authentication key is not available at compile time. These function pointers need to be patched with a proper PAC after we set the PA key. However, unlike in user space, the kernel cannot rely on the loader, so it must figure out by itself which addresses need to be patched.

To address this problem, at compile time PACKER emits the addresses of all statically allocated function pointers. In particular, for statically allocated kernel structures that contain function pointer fields, PACKER emits the address of the kernel structure and the offsets of the function pointer members. For patching, PACKER maps this information to the actual pointer addresses during kernel boot. For each statically allocated and initialized function pointer variable, or function pointer member inside a statically allocated structure, PACKER first reads its value, calculates the PAC, and writes the result back to memory. Note that this is all done after the PA key is set.

As mentioned before, kernel randomness is not available at the very beginning. As a result, hundreds of functions are executed before the PA key is ready. These functions involve function pointer assignments. In other words, besides the statically initialized function pointers, PACKER also needs to patch function pointers that are initialized dynamically before the PA key is ready. To do this, as all function pointer stores are already instrumented, PACKER checks the PA key status before the PAC generation and the store. If the PA key is not ready, PACKER skips the PAC generation and just stores the raw pointer value. At the same time, PACKER records the address of this function pointer. After PA key initialization, PACKER calculates the PAC for all recorded addresses and updates each pointer value with the correct PAC.
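The record-then-patch scheme can be sketched as a small boot-time model. Everything here is illustrative: signing is symbolic (a signed pointer is the tuple `("pac", pointer, context)`), the class and method names are hypothetical, and a set is used for the recorded addresses so that a slot written twice before key setup is signed only once, which is a choice of this sketch rather than a documented property of PACKER.

```python
# Toy boot-time model with symbolic signing; all names and addresses are hypothetical.

class BootModel:
    def __init__(self, static_slots):
        self.memory = dict(static_slots)     # slot address -> raw function address
        self.to_patch = set(static_slots)    # static slots, as emitted at compile time
        self.key_ready = False

    def fp_store(self, addr, fptr):
        """Instrumented function pointer store."""
        if self.key_ready:
            self.memory[addr] = ("pac", fptr, addr)
        else:
            self.memory[addr] = fptr         # raw store, no key yet
            self.to_patch.add(addr)          # remember the slot for later patching

    def init_pa(self):
        """Runs once the key registers are set: sign every recorded slot."""
        self.key_ready = True
        for addr in self.to_patch:
            self.memory[addr] = ("pac", self.memory[addr], addr)
        self.to_patch.clear()

k = BootModel({0x1000: 0x400100})   # one statically initialized pointer
k.fp_store(0x2000, 0x400200)        # early dynamic store, before the key exists
k.fp_store(0x2000, 0x400300)        # same slot again: recorded only once
k.init_pa()
k.fp_store(0x3000, 0x400400)        # after initialization: signed immediately
```

After `init_pa`, the static slot and the early dynamic slot both hold signed values that use their own address as the context, and later stores are signed at store time.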

V Implementation

In this section, we first describe the environment settings used for our implementation. Then we discuss PACKER’s modifications to the compiler and the kernel. Finally, we present the details of the practical issues we encountered during implementation.

V-A Environment Settings

We have implemented PACKER on LLVM 10 and Linux kernel v5.0.1. The PACKER kernel is built at optimization level O2. The kernel binary runs on ARMv8-A Fixed Virtual Platforms (FVP) based on Fast Model v11.7.30. FVP is a software simulator from ARM that simulates the pointer authentication hardware. The FVP environment is set up on Ubuntu 18.04, running on an Intel i7-7700. To boot the PACKER kernel on FVP, we also wrap the kernel with a bootloader using boot-wrapper and build a minimal initram file system with buildroot.

V-B Compiler Modifications

The PACKER Clang/LLVM implementation contains 3 passes, about 2600 lines of code. LLVM organizes all its analysis and optimization functionality in units of passes; therefore, our work on both the IR level and the backend is implemented as passes. LLVM passes are executed serially, and the positions of our passes in this order can affect the result. On the IR level, the IR output of earlier passes may be altered by later optimization passes. Therefore, to prevent our inserted IR from being optimized away, we put all our IR passes together at the end of all IR passes. In the backend, LLVM machine code is lowered by passes. In this process, it loses high-level information, such as type information and virtual register information, and gets closer to the real target assembly code. Some of our backend passes use virtual registers, so they must run before the register allocation pass. The others are placed at the end of the machine code passes.

For ease of use, we have also modified the Clang/LLVM backend to support command line flags. One can pass the -mkfpi flag to enable the IR-level functionality and the -mllvm -aarch64-enable-kfpi flags to enable backend instrumentation.

V-B1 IR-level Implementation

We have added three passes to LLVM to implement function pointer identification (section IV-C) and IR instruction instrumentation (section IV-D), namely InitPass, MarkPass, and InstrumentPass.

First, InitPass analyzes global variable initializations and data structures that contain function pointers, and stores the results for subsequent use. Note that it does not change the IR code and passes the IR to MarkPass as soon as it finishes its work.

MarkPass is responsible for identifying all IR values that are function pointers, i.e., calculating the set S in Algorithm 1. Here we focus on the implementation details of the transfer functions of the seven instructions. bitcast converts the type of an IR value; we put both the output and the input value into set S if either of them is in S. icmp compares two integers and decides the result according to the input condition. Therefore, if either of the two operands belongs to S, the other is also added to S.

phinode is an implementation of the φ function in static single assignment (SSA) form. It chooses one of its inputs as its output according to the predecessor basic block. We add all the values, including the output, to S if one of them is in S. The strategy is based on our insight that in most cases no value is a function pointer under one condition but a data pointer or a normal integer under another. A special case is the union type, which may violate our assumption; details are given in section V-D.

The store instruction saves the first operand into the memory pointed to by the second operand and has no output. If the first operand is a function pointer, then the second operand must be a pointer to a function pointer. Because we have to distinguish a function pointer from a pointer to a function pointer for store, we add an extra attribute, level, to each element in set S. Level 0 means the element is a function pointer, level 1 means a pointer to a function pointer, and so on. Consequently, the second operand of store must be one level higher than the first operand. The load instruction does the opposite of store.
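The level bookkeeping for store and load can be written down as two small transfer functions. The sketch below assumes that S is a mapping from an IR value name to its level; the function names and value names are hypothetical and only mirror the rule stated above.

```python
# S maps an IR value name to its level: 0 = function pointer,
# 1 = pointer to a function pointer, and so on. Names are hypothetical.

def transfer_store(value, ptr, S):
    """store value, ptr: ptr sits one level above value."""
    if value in S:
        S[ptr] = S[value] + 1
    elif S.get(ptr, 0) >= 1:
        S[value] = S[ptr] - 1

def transfer_load(result, ptr, S):
    """result = load ptr: result sits one level below ptr."""
    if S.get(ptr, 0) >= 1:
        S[result] = S[ptr] - 1
    elif result in S:
        S[ptr] = S[result] + 1

S = {"@ptmx_open": 0}                            # a function address is level 0
transfer_store("@ptmx_open", "%open_field", S)   # the field now holds a function pointer
transfer_load("%fp", "%open_field", S)           # loading it yields level 0 again
```

Storing a level 0 value marks the destination as level 1, and a later load from that destination produces a level 0 value, so the attribute survives a round trip through memory.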

getelementptr, also known as GEP, takes a struct or array pointer and a sequence of indices as its inputs. It calculates the offset from the indices and outputs the pointer incremented by the offset. The instruction is usually used to index a field inside a struct or array. As we maintain all function pointer field information, including the struct types and indices, in a DAG STI (Algorithm 1), we can decide that the result of a GEP is a level 1 function pointer if its inputs form a path in the DAG. Finally, the call instruction simply adds the called operand to S.

MarkPass does not modify the IR code either and passes it to InstrumentPass, where the instrumentation IR code is finally added according to S. InstrumentPass also outputs the global variables initialized with function pointers, together with their offsets inside a struct, to support function pointer patching during early kernel boot.

V-B2 Backend Implementation

We have added four backend passes to the LLVM AArch64 backend: VirtRegPass, BranchPass, RAPass and MemcpyPass. VirtRegPass allocates virtual registers to hold the PACed function pointer and its context needed by blraa. BranchPass changes every blr to blraa using the two virtual registers as inputs. Both of these passes must execute before register allocation. The other two are executed at the end of all the backend passes. RAPass implements return address protection: it first locates function frame setup and destroy, and then inserts paciasp before frame setup and autiasp after frame destroy. MemcpyPass replaces all calls to __memcpy and __memmove with pac_memcpy and pac_memmove, respectively. Although this has been done once in IR, we repeat it in case a sequence of assignments to contiguous memory is optimized into __memcpy in the backend, which actually happens at optimization level O2.

V-C Kernel Patching

PACKER’s kernel modification comprises about 600 lines of code, which initialize the pointer authentication hardware and patch the un-PACed function pointers. PA initialization mainly sets up the registers that enable PA functionality. It also calls the kernel random number generator to generate the PA key. As a result, PA can only be initialized after the kernel enables random number generation. Therefore, PA initialization is done right after add_device_randomness in start_kernel.

It is worth mentioning that the PA initialization code contains the sensitive instructions that load the PA key. To guarantee security and remove all instructions that manipulate the PA key, we mark all PA initialization code as init text, so that it is freed by free_initmem as soon as kernel boot completes. As mentioned before, PACKER replaces all blr instructions with blraa. For br instructions, PACKER changes the kernel build by adding -fno-jump-tables to instruct the compiler not to use br instructions.

For function pointer patching, PACKER patches all statically initialized function pointers and the function pointers that are assigned before PA initialization. To achieve this, PACKER uses a giant array to hold the addresses of all pointers to be patched. It also inserts code after PA initialization to generate the PAC for each pointer. Again, the array is marked as init data, so that it is freed after boot to save memory.

Note that PACKER is designed only for kernel code pointer protection; it uses only one PA key register and leaves the other four PA key registers for user space PA protection. Therefore, PACKER is designed with user space PA compatibility in mind.

V-D Practical Issues

We encountered numerous practical issues during our implementation of PACKER. Due to the space limitation, here we only discuss several of them in detail.

Fig. 9: An example of function pointer comparison in kernel. Line 6 compares a function pointer with a function name.

V-D1 Function Pointer Comparison

In the kernel, function pointers are often compared directly with a function name, as shown by Line 6 in Figure 9. The comparison can be against a function name, which is the constant address of a function, or, in some rare cases, against magic numbers like 1 or 2. In our implementation, the value loaded from the pointer is transformed into piggyback form, so it would never match in such a comparison. In those cases, we need to restore the function pointer after loading it from memory. The implementation of this part is straightforward: our IR pass traverses every CmpInst and checks whether the type of an operand is function pointer; if so, it replaces the operand with the restored value.

Fig. 10: An example of function pointer arithmetic in kernel. Line 17 contains function pointer increment fn++.

V-D2 Function Pointer Arithmetic

Besides comparison, arithmetic on function pointers also exists in the Linux kernel. For instance, do_initcall_level at Line 15 of Figure 10 passes the function pointer fn to do_one_initcall, while fn is calculated from the base address and the offset, as shown in Line 17. Fortunately, this case appears only once, at the kernel boot stage. As we trust kernel boot, we consider the content of the variable initcall_levels to be benign. In our implementation, after the function pointer is calculated, we generate the PAC using a constant context and the value of the pointer, so that it passes the PAC authentication at blraa.

Fig. 11: An example of a physical address used as a function pointer. The function pointer replace_phys gets the physical address at Line 6 and is invoked at Line 9.

V-D3 Function Pointer Holding Physical Address

In Linux, for certain memory management unit (MMU) related functions, the kernel uses the physical address directly rather than the virtual address (more precisely, under the identity map, the virtual address and the physical address are the same).

As shown in Figure 11, the physical address of function idmap_cpu_replace_ttbr1 is assigned to the function pointer replace_phys at Line 7. After that, the kernel turns off memory management and branches directly to the physical address held in replace_phys at Line 11. Unfortunately, this breaks our rule that every operand of an indirect call must be a pointer in piggyback form, and the system panics when the corresponding blraa is executed. As in the previous section, we add several instructions that change the pointer into piggyback form with a constant context before the indirect call. This does not increase the attack surface because the address is a constant value from an adrp instruction, which cannot be changed by the attacker.

Fig. 12: An example of a union that contains a function pointer. _sigev_un is a union. Function pointer _function is at Line 8.

V-D4 Function Pointer in Union

A union type in the kernel can contain a field that holds either a function pointer or data such as integers, as shown in Figure 12. The data inside union variables are treated as function pointers or integers depending on the use. This case breaks our assumption that a function pointer variable cannot contain data of other types. As a result, Algorithm 1 may mistake an integer for a function pointer. To address this problem, we use the alignment size of the field to distinguish whether a union field is a function pointer or not. Our key observation is that a function pointer value is 64 bits wide and needs 64-bit alignment. Therefore, if the program uses the field as an int32, the entry is considered to be used as data, either _pad or _tid in the figure; otherwise, the field is treated as a function pointer, and we insert the PACKER code.

Note that this case is rare in the Linux kernel, and once we adopt the above rule, we can filter out all the problems caused by the union type.

VI Evaluation

In this section, we evaluate both the security and performance of PACKER.

VI-A Security Analysis

For the security evaluation, we want to examine whether PACKER can protect both function pointers and return addresses. Therefore, we first analyze the function pointer and return address coverage. We ran objdump on the generated vmlinux and extracted all the function pointer branch instructions and function return instructions.

For function pointer branch instructions, the PACKER compiler component replaces all blr instructions in C code with blraa during compilation. We also manually change the blr instructions in assembly code to blraa. As a result, 100% of function pointer branches are checked by PACKER. Compared with the iOS kernel PA implementation, which has residual blr instructions, the PACKER kernel contains no raw blr instructions and is thus more secure. Note that blraa performs the PAC authentication and the function pointer branch in one single instruction, giving the attacker no chance to launch time-of-check-to-time-of-use attacks.

For function pointer store and load, the pointer authentication hardware on ARMv8.3 does not provide atomic instructions for authenticate-store or load-authenticate. Therefore, function pointer store and load are still vulnerable to time-of-check-to-time-of-use attacks. We argue that this is a hardware limitation. Moreover, even though the store and load are not atomic, PACKER atomically authenticates the final function pointer branch, which can defeat any function pointer corruption.

For the return address, we check all functions in the vmlinux dump to verify that every function that pushes the return address in its prologue and pops it in its epilogue is protected by PACKER. We go through the whole kernel dump using a script, and the result shows that PACKER protects all return address push operations. For every return address push, PACKER inserts a PAC generation instruction. For every return address pop from the stack, PACKER changes the return to retaa so that the return address is authenticated before the actual return. Unlike existing schemes that split PAC authentication and return into two instructions [qualcomm-ret-addr, arm-pa], PACKER uses the single instruction retaa to make authentication and return atomic. Therefore, return protection in PACKER is more secure because it defeats time-of-check-to-time-of-use attacks.

Fig. 13: Performance evaluation of PACKER using Unixbench. Time overhead of the original kernel is normalized to 1.

VI-B Performance Analysis

We choose Unixbench to evaluate the performance of PACKER. Unixbench is designed for Unix-like systems and can measure the performance of a system from different aspects. We test three Linux kernels that have the same version and configuration but different security levels. One of them is compiled with the original LLVM and is used as our baseline. The other two are both protected by PACKER, but one of them does not have return address protection. For each kernel, we have conducted all the tests listed in Figure 13; these tests focus on critical system calls. Note that the syscall-unrelated arithmetic tests such as whetstone and arithoh have little connection with the performance of the PACKER kernel, so their performance overheads are averaged and reported as the userspace arithmetic test in Figure 13.

Compared with the original kernel without PACKER protection, PACKER introduces around 10%-20% performance overhead. Complex syscalls like fork and write incur more overhead because they involve more stack frames and function pointer calls. PACKER also introduces around 1%-2% overhead on userspace arithmetic because it also protects the kernel context switch and makes it slightly slower. Note that the performance overhead does not mean that the PACKER kernel is 10%-20% larger than the original kernel. In fact, the PACKER image is 7.0% larger.

We believe that the performance overhead is not low, but it is reasonable and acceptable. It is reasonable because the kernel is much more complex than most user applications: it contains many function pointers and indirect calls, and its call stacks are deeply nested. Protection of context switches and of indirect calls inside interrupt handlers also adds to the overhead. We argue that the result is acceptable because it reflects the upper bound of PACKER overhead: this evaluation measures pure syscall overhead, and a user application does not invoke complex system calls like fork all the time. For users of a PACKER system, the observed overhead should therefore be lower than our result.
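The upper-bound argument can be made concrete with a simple weighted-average model. The following Python sketch is an illustration only: the function name, the linear model, and the sample workload fractions are assumptions and not measurements from this evaluation.

```python
def end_to_end_overhead(kernel_fraction, kernel_overhead, user_overhead=0.0):
    """Relative slowdown of a whole application under a simple linear model.

    kernel_fraction: share of baseline run time spent in system calls (0..1).
    kernel_overhead: relative slowdown of that kernel time (0.20 means 20%).
    user_overhead:   relative slowdown of the remaining userspace time.
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be between 0 and 1")
    return (kernel_fraction * kernel_overhead
            + (1.0 - kernel_fraction) * user_overhead)


# Hypothetical workload: 10% of its time in syscalls, 20% syscall overhead.
mostly_user = end_to_end_overhead(0.10, 0.20)

# A pure-syscall benchmark (fraction 1.0) reproduces the syscall overhead itself,
# which is why the benchmark figure acts as an upper bound in this model.
pure_syscall = end_to_end_overhead(1.0, 0.20)

print(f"{mostly_user:.1%} vs {pure_syscall:.1%}")
```

Under this model the end-to-end overhead never exceeds the syscall overhead as long as the userspace overhead is the smaller of the two, which matches the 1%-2% userspace figure reported above.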

In future work, we plan further evaluation of PACKER, including instruction counts and a breakdown of the performance overhead. We also plan to optimize the performance of PACKER based on the evaluation results.

VII Related Work

There are various CFI mechanisms to defend against code reuse attacks. Among the defenses proposed against ROP, ASLR is widely used in modern operating systems: it randomizes the addresses of a program so that attackers cannot easily locate gadgets. Besides ASLR, function pointer encryption has been proposed with different encryption methods [EncodePointer, Shuffler, PointGuard] to defend against JOP attacks. In these methods, function pointers are encrypted with a process- or thread-specific secret key and decrypted when they are used.

CFI computes a control-flow graph in advance and ensures that control transfers stay within the precomputed graph. Most CFI mechanisms cannot protect both user programs and kernels because of the large differences between the two. Moreover, because most software control-flow protection techniques suffer from high performance overhead, hardware-assisted control-flow protection mechanisms have been proposed. These techniques [davi2014hardware, davi2015hafix, qiu2016physical, qiu2017control, zhang2018hcic, hans2019pac] leverage existing hardware features or add extra hardware modules to carry out protection operations, thereby reducing the overhead.

VII-A User CFI

VII-A1 Software-Based CFI

Compact Control Flow Integrity and Randomization (CCFIR) [zhang2013practical] protects both forward-edge and backward-edge control-flow integrity for binary executables. CCFIR implements a new code segment named Springboard, which contains stubs for all indirect targets (i.e., function pointers and return addresses). CCFIR redirects all indirect jump/call instructions and ret instructions to jump to stubs in Springboard according to specified policies. Bin-CFI [zhang2013control] provides control-flow integrity for COTS binaries. It uses an instrumentation design similar to that of CCFIR to enforce control-flow integrity. However, both CCFIR and bin-CFI have been found to be insufficient [goktas2014out, davi2014stitching].

VII-A2 Hardware-Assisted CFI

Cryptographic CFI (CCFI) [mashtizadeh2015ccfi] employs a cryptographic mechanism to protect control-flow integrity. Similar to PACKER, CCFI uses cryptographic MACs, which it produces with AES. CCFI computes and checks the MACs of function pointers and return addresses when they are loaded, and thus protects both forward-edge and backward-edge control-flow integrity. CCFI also uses the address as the context for computing the MAC, which is the same design as PACKER. Opaque control-flow integrity (O-CFI) [mohan2015opaque] protects control-flow integrity by restricting indirect branch targets to an address bound. The address bound can be derived from source code or object code and can be randomized by code layout randomization [wartell2012binary]. When a program is loaded, O-CFI randomly selects a bound pair, which delimits the legal branch address region, from a bounds lookup table. O-CFI relies on the x86 segment selector to prevent accidental leakage of the bounds lookup table. The overhead of O-CFI is 4.7%.
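The address-as-context idea shared by CCFI and PACKER can be illustrated with a small Python sketch. It is a model, not either system's implementation: HMAC-SHA256 from the standard library stands in for the AES-based MAC and for the hardware PAC algorithm, and the key, tag length, function names, and addresses are all hypothetical.

```python
import hashlib
import hmac
import struct

# Hypothetical tag length for this sketch. A hardware PAC is shorter because
# it must fit into the unused bits of the pointer itself.
TAG_BYTES = 8


def make_tag(key, pointer, slot_address):
    """MAC over the pointer value and the address of the slot that holds it."""
    message = struct.pack("<QQ", pointer, slot_address)
    return hmac.new(key, message, hashlib.sha256).digest()[:TAG_BYTES]


def verify(key, pointer, slot_address, tag):
    """Check a tag against the pointer value and the slot it is loaded from."""
    return hmac.compare_digest(make_tag(key, pointer, slot_address), tag)


KEY = b"hypothetical-kernel-key"
TARGET = 0xFFFF000008081000   # made-up function address
SLOT_A = 0xFFFF00000A000010   # made-up address of one function pointer field
SLOT_B = 0xFFFF00000A000040   # made-up address of another field

tag_a = make_tag(KEY, TARGET, SLOT_A)

# Loading from the slot the pointer was signed for succeeds.
ok_in_place = verify(KEY, TARGET, SLOT_A, tag_a)

# Pointer substitution: the attacker copies the signed pointer into SLOT_B.
# The context (slot address) no longer matches, so verification fails.
ok_after_copy = verify(KEY, TARGET, SLOT_B, tag_a)

print(ok_in_place, ok_after_copy)
```

Without the slot address in the MAC input, the copied pointer would still verify, which is exactly the substitution attack that the address context is meant to stop.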

Because ROP attacks need to execute several gadgets in succession, KBouncer [pappas2012kbouncer] and ROPecker [cheng2014ropecker] use Last Branch Recording (LBR), which records the most recently executed branches, to detect ROP attacks. ROPecker has an average overhead of 2.6%. Because LBR records only the last 16 branches, CFIMon [xia2012cfimon] uses the branch trace store to overcome this limitation. The authors argue that CFIMon prevents both JOP and ROP attacks. CFIMon has an overhead of 6.1%, which is higher than that of KBouncer and ROPecker.

VII-B Kernel CFI

[li2018fine] and [ge2016fine] propose fine-grained control-flow integrity solutions for kernels. Because computing a kernel control-flow graph is difficult, they mainly focus on reducing the number of indirect control-flow targets through static analysis, and both report a reduction of more than 99% in indirect control-flow targets. To enforce control-flow integrity, [ge2016fine] uses restricted pointer indexing [wang2010hypersafe], while [li2018fine] uses a similar scheme named indexed hooks [li2011comprehensive].
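The general idea behind restricted pointer indexing can be sketched in a few lines of Python. This is a conceptual model under stated assumptions, not the mechanism of the cited systems: the handler functions, the Device class, and call_indirect are hypothetical names, and a Python tuple stands in for a target table kept in read-only memory.

```python
def handle_read(arg):
    return ("read", arg)


def handle_write(arg):
    return ("write", arg)


# Immutable table of the only legal indirect-call targets. In a real system
# this table would live in memory that the attacker cannot write.
TARGET_TABLE = (handle_read, handle_write)


class Device:
    """Writable data stores an index into the table, never a raw code address."""

    def __init__(self, handler_index):
        self.handler_index = handler_index


def call_indirect(index, arg):
    """Dispatch through the table after checking that the index is in range."""
    if not isinstance(index, int) or not 0 <= index < len(TARGET_TABLE):
        raise ValueError(f"illegal control-flow target index: {index!r}")
    return TARGET_TABLE[index](arg)


dev = Device(handler_index=1)
result = call_indirect(dev.handler_index, "block0")

# Corrupting the stored index can at worst select another table entry;
# an out-of-range value is rejected instead of becoming a jump target.
dev.handler_index = 7
try:
    call_indirect(dev.handler_index, "block0")
    rejected = False
except ValueError:
    rejected = True

print(result, rejected)
```

The design trades flexibility for a bounded target set: even a fully attacker-controlled index cannot redirect control flow outside the table.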

KCoFI [criswell2014kcofi], which extends the secure virtual architecture (SVA) [criswell2007secure], provides a coarse-grained but complete kernel control-flow integrity solution. KCoFI needs to recompile the whole operating system kernel into a virtual instruction set, which benefits security by ensuring that security policies are not violated. The formal model of KCoFI is only partially proved. Both KCoFI and PACKER need compiler support. However, KCoFI introduces significant performance overhead due to the virtual instruction set, while PACKER employs an existing hardware feature and introduces much lower overhead.

VIII Conclusion

This paper presents PACKER, which uses ARMv8.3 pointer authentication to protect kernel code pointers. In particular, PACKER generates a PAC on every function pointer store and every return address push to the stack, and authenticates the PAC on every function pointer load and branch and on every return address pop from the stack. Moreover, to defeat pointer substitution attacks, we propose a novel address-based PAC generation scheme based on the observation that every function pointer in the kernel resides at a distinct virtual address, which can serve as the context to obtain a unique PAC. To achieve address-based PAC, we design the pointer-address piggyback for address propagation. We also propose new techniques for identifying function pointers; for authenticating pointer stores, loads, and branches; and for handling statically initialized function pointers.

We have implemented a prototype of PACKER based on Clang/LLVM and the Linux kernel. In our implementation, PACKER is able to protect 100% of indirect call sites and return addresses. We further evaluated our implementation on ARM Fixed Virtual Platforms. Across all eight tests, the performance overhead introduced by PACKER ranges from 15% to 25%.

In future work, we plan to test the performance of PACKER more thoroughly, using additional techniques such as instruction counting. Based on the evaluation results, we also plan to optimize PACKER.

References