KASR: A Reliable and Practical Approach to Attack Surface Reduction of Commodity OS Kernels

by   Zhi Zhang, et al.
Peking University
Baidu, Inc.

Commodity OS kernels have broad attack surfaces due to their large code bases and numerous features such as device drivers. For a real-world use case (e.g., an Apache server), many kernel services are unused and only a small amount of kernel code is used. Within the used code, a certain part is invoked only at runtime while the rest is executed during the startup and/or shutdown phases of the kernel's lifetime. In this paper, we propose a reliable and practical system, named KASR, which transparently reduces attack surfaces of commodity OS kernels at runtime without requiring their source code. The KASR system, residing in a trusted hypervisor, achieves the attack surface reduction through a two-step approach: (1) reliably depriving unused code of executable permissions, and (2) transparently segmenting used code and selectively activating it. We implement a prototype of KASR on the Xen-4.8.2 hypervisor and evaluate its security effectiveness on Linux kernel-4.4.0-87-generic. Our evaluation shows that KASR reduces the kernel attack surface by 64%, trims off 40% of CVEs, and detects and blocks all 6 real-world kernel rootkits. We measure its performance overhead with three benchmark tools (i.e., SPECINT, httperf and bonnie++). The experimental results indicate that KASR imposes less than 1% performance overhead (compared to an unmodified Xen hypervisor) on all the benchmarks.




1 Introduction

In order to satisfy various requirements from individuals to industries, commodity OS kernels have to support numerous features, including various file systems and numerous peripheral device drivers. These features inevitably result in a broad attack surface, and this attack surface becomes broader and broader as more services are consolidated into the kernel every year. As a consequence, the current kernel attack surface gives an adversary numerous chances to compromise the OS kernel and exploit the whole system. Although we have moved into the virtualization and cloud era, these security threats are not being addressed. Instead, they become even worse with the introduction of additional software stacks, e.g., a hypervisor layer. Recent years have witnessed many proposed approaches that recognize the severity of this issue and attempt to reduce the attack surface of the virtualized system. Specifically, schemes like NoHype [32], XOAR [7], HyperLock [35] and Min-V [24] are able to significantly reduce the attack surface of the hypervisor. In addition, several other schemes have been proposed to reduce the huge kernel attack surface, which fall into the following three categories.

Build from Scratch. The first category attempts to build a micro-kernel with a minimal attack surface [1, 12, 11, 14], among which seL4 [14] is the first OS that achieves a high degree of assurance through formal verification. Although such micro-kernel schemes retrofit security, they are incompatible with legacy applications.

Re-Construction. The second category makes changes to the current monolithic kernel. Nooks [31] and LXFI [21] isolate buggy device drivers to reduce the attack surface of the kernel. Considering that the reduced kernel is still large, Nested Kernel [9] places a small isolated kernel inside the monolithic kernel, further reducing the attack surface. Besides, strict access-control policies [8, 28] and system call restrictions [26] also help considerably. A common limitation of these approaches is that they all require modifications of the kernel source code, which is often impractical.

Customization. The last category tailors existing kernels without modifying them. Tartler [33], Kernel Tailoring [18] and Lock-in-Pop [19] require the Linux source code of either the kernel or core libraries (i.e., glibc) to restrict the user's access to the kernel. They lack OS distribution support because the source code must be re-compiled. Ktrim [17] and KRAZOR [16] rely on specific kernel features (i.e., kprobes) to binary-instrument kernel functions and remove unused ones. Face-Change [10] is a hypervisor-based technique to tailor the kernel code. It supports neither Kernel Address Space Layout Randomization (KASLR) [8] nor multiple vCPUs for the target kernel. Besides, it induces a worst-case overhead of %, impeding its deployment in practice.

Overview. In this paper, we propose a reliable and practical virtualized system, named KASR, which is able to transparently reduce the attack surface of a commodity OS kernel at runtime.

Consider a specific application workload (e.g., an Apache server), whose operations do not need all kernel services. Instead, only a subset of the services is invoked to support both the target Apache process and the kernel. For example, both of them always require code blocks related to memory management (e.g., kmalloc, kfree, get_page) and synchronization mechanisms (e.g., _spin_lock). Apart from that, certain used kernel functions are only used during a specific period of the kernel's lifetime and remain unused for the rest of the time. For instance, the initialization (e.g., kernel_init) and power-off actions (e.g., kernel_power_off) are only taken when the kernel starts up and shuts down, respectively. In contrast to this used kernel code, many other kernel services are never executed. We call them unused kernel code in this paper. The unused kernel code resides in main memory and contributes a large portion of the kernel attack surface. For example, a typical kernel vulnerability, CVE--, is exploited via a crafted system call, perf_event_open, that is never invoked in the Apache workload.

Motivated by the above observation, KASR achieves the kernel attack surface reduction in two steps. The first step is to reliably deprive unused code of executable permissions. Commodity OS kernels are designed and implemented to support all kinds of use cases (e.g., the Apache server and Network File System service), and therefore a large portion of kernel code (e.g., system call handlers) is unused for a given use case. Removing the executable permissions of this unused code effectively eliminates a large portion of the attack surface. The second step transparently segments used code and selectively activates it according to the specific execution demands of the given use case. This segmentation is inspired by the observation that certain kernel code blocks (e.g., kernel_init) only execute in a particular period, and never execute beyond that period. As a result, KASR dramatically reduces the attack surface of a running OS kernel.

We implement a KASR prototype on a private cloud platform, with Xen as the hypervisor and Ubuntu Server LTS as the commodity OS. The OS kernel is unmodified Linux version 4.4.0-87-generic with KASLR [8] enabled. KASR only adds about SLoC to the hypervisor code base. We evaluate its security effectiveness under the given use cases (e.g., a Linux, Apache, MySQL and PHP (LAMP)-based server). The experimental results indicate that KASR reduces more than 64% of the kernel attack surface at the granularity of code pages. KASR also trims off 40% of Common Vulnerabilities and Exposures (CVEs); the CVE reduction indicates the number of CVEs that KASR could avoid. In addition, KASR successfully detects and blocks all 6 real-world kernel rootkits. We also measure the performance overhead using several popular benchmark tools as given use cases, i.e., SPECint, httperf and bonnie++. The overall performance overheads are , and on average, respectively.

Contributions. In summary, we make the following key contributions:

  • Propose a novel two-step approach that reliably and practically reduces the kernel attack surface while remaining agnostic to the particular OS.

  • Design and implement a practical KASR system on a recent private cloud platform. KASR transparently “fingerprints” used kernel code and enables it to execute according to its execution phases.

  • Evaluate the security effectiveness of the KASR system in terms of the reduction of the kernel attack surface, the reduction of CVEs, and the mitigation of real-world rootkits.

  • Measure the performance overhead of the KASR system using several popular benchmark tools. The low overhead makes KASR reasonable for real-world deployment.

Organization. The rest of the paper is structured as follows. In Section 2, we briefly describe our threat model and system goals. In Section 3, we present the kernel attack surface, its measurement and the rationale of its reduction. We introduce the system architecture of KASR in detail in Section 4. Section 5 and Section 6 present the primary implementation of KASR and its performance evaluation, respectively. In Section 7 and Section 8, we discuss limitations of KASR and compare it with existing works, respectively. Finally, we conclude this paper in Section 9.

2 Threat Model and Design Goals

Before we describe our design, we specify the threat model and the design goals.

2.1 Threat Model

In this paper, we focus on reducing the attack surfaces of commodity OS kernels in a virtualized environment. Currently, most personal computers, mobile phones and even embedded devices are equipped with hardware virtualization support, such as that provided by Intel [13], AMD [2] and ARM [3]. Thus, our system can work on such devices.

We assume a hypervisor or a Virtual Machine Monitor (VMM) working beneath the OS kernel. The hypervisor is trusted and secure as the root of trust. Although some existing hypervisors have vulnerabilities, we can leverage additional security services to enhance their integrity [34, 6, 4] and reduce their attack surfaces [32, 7]. As our system relies on a training-based approach, we also assume the system is clean and trusted in the training stage, but it could be compromised at any time after that.

We consider threats coming from both remote adversaries and local adversaries. A local adversary resides in user applications, such as browsers and email clients. The kernel attack surface exposed to the local adversary includes system calls and the virtual file systems exported to user applications (e.g., the Linux proc file system). A remote adversary stays outside and communicates with the OS kernel via hardware interfaces, such as a NIC. The kernel attack surface for the remote adversary usually refers to device drivers.

2.2 Design Goals

Our goal is to design a reliable, transparent and efficient system to reduce the attack surfaces of commodity OS kernels.

G1: Reliable. The attack surface should be reliably and persistently reduced. Even if kernel rootkits can compromise the OS kernel, they cannot enlarge the reduced attack surface to facilitate subsequent attacks.

G2: Transparent. The system should work transparently for commodity OS kernels. In particular, it neither relies on the source code nor breaks the kernel code integrity through binary instrumentation. A source code requirement is difficult to satisfy in practice, and breaking the code integrity raises compatibility issues with security mechanisms such as the Integrity Measurement Architecture.

G3: Efficient. The system should minimize the performance overhead, e.g., the overall performance overhead on average is less than 1%.

Among these goals, G1 is for security guarantee, while the other two goals (G2 and G3) are for making the system practical. Every existing approach has one or more weaknesses: they either are unreliable (e.g., Lock-in-Pop [19] as per G1), or depend on the source code (e.g., seL4 [14]), or break the kernel code integrity (e.g., Ktrim [17]), or incur high performance overhead (e.g., Face-Change [10]). Our KASR system is able to achieve all the above goals at the same time.

3 Design Rationale

We first present how to measure the attack surface of a commodity OS kernel, and then illustrate how to reliably and practically reduce it.

3.1 Attack Surface Measurement

To measure the kernel attack surface, we need a security metric that reflects the system security. Generally, the attack surface of a kernel is measured by counting its source lines of code (SLoC). This metric is simple and widely used. However, it takes into account all the source code of a kernel, regardless of whether that code is actually compiled into the kernel binary. To provide a more accurate security measurement, Kurmus et al. [18] propose a fine-grained generic metric, named GENSEC, which only counts effective source code compiled into the kernel. More precisely, in the GENSEC metric, the kernel attack surface is composed of the entire running kernel, including all the Loadable Kernel Modules (LKMs).

However, the GENSEC metric only works with the kernel source code, rather than the kernel binary. Thus it is not suitable for a commodity OS that ships only a kernel binary, made up of a kernel image and numerous module binaries. To fill this gap, we apply a new KASR security metric. Specifically, instead of counting source lines of code, the KASR metric counts all executable instructions.

Similar to prior schemes that commonly use SLoC as the metric of the attack surface, the KASR metric uses the Number of Instructions (NoI). It naturally works well with instruction sets where all the instructions have an equal length (e.g., ARM instructions). However, with a variable-length instruction set (e.g., x86 instructions [13]), it is hard to count instructions accurately. In order to address this issue on such platforms, we use the Number of Instruction Pages (NoIP). NoIP is reasonable and accurate due to the following reasons. First, it is consistent with the paging mechanism that is widely deployed by all commodity OS kernels. Second, the kernel instructions are usually contiguous and organized in a page-aligned way. Finally, it could smoothly address the issue introduced by variable-length instructions without introducing any explicit security and performance side-effects. In this paper, the KASR metric depends on NoIP to measure the kernel attack surface.
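As an illustration of the NoIP metric, the following minimal sketch counts the distinct pages touched by a set of executable address ranges. The function name `count_instruction_pages`, the input format, and the 4 KB page size are assumptions made for this example and are not taken from the KASR implementation.

```python
PAGE_SIZE = 4096  # assumed page size (4 KB); adjust for the target platform


def count_instruction_pages(code_ranges, page_size=PAGE_SIZE):
    """Return the number of distinct pages covered by executable ranges.

    code_ranges is an iterable of (start_address, length_in_bytes) pairs.
    Ranges that share a page are counted once.
    """
    pages = set()
    for start, length in code_ranges:
        if length <= 0:
            continue  # an empty range contributes no instructions
        first = start // page_size
        last = (start + length - 1) // page_size
        pages.update(range(first, last + 1))
    return len(pages)
```

A range that straddles a page boundary contributes two pages, which is why the metric is insensitive to where individual variable-length instructions begin and end.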

3.2 Benefits of Hardware-assisted Virtualization

In a hardware-assisted virtualization environment, there are two levels of page tables. The first-level page table, i.e., Guest Page Table (GPT), is managed by the kernel in the guest space, and the other one, i.e., Extended Page Table (EPT), is managed by the hypervisor in the hypervisor space. The hardware checks the access permissions at both levels for a memory access. If the hypervisor removes the executable permission for a page in the EPT, then the page can never be executed, regardless of its access permissions in the GPT. These mechanisms have been widely supported by hardware processors (e.g., Intel [13], AMD [2], and ARM [3]) and commodity OSes.

With the help of the EPT, we propose to reduce the attack surface by transparently removing the executable permissions of certain kernel code pages. This approach achieves all system goals listed before. First, it is reliable (achieving G1) since an adversary in the guest space does not have the capability of modifying the EPT configurations. Second, the attack surface reduction is transparent (achieving G2), as the page-table based reduction is enforced in the hypervisor space, without requiring any modifications (e.g., instruction instrumentation) of the kernel binary. Finally, it is efficient (achieving G3) as all instructions within pages that have executable permissions are able to execute at a native speed.
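The effect of the two-level check can be modelled in a few lines. This is only a conceptual sketch: `PagePerm` and `can_execute` are hypothetical names, and on real hardware the check is performed by the MMU rather than by software.

```python
from dataclasses import dataclass


@dataclass
class PagePerm:
    """Simplified permission bits of one page-table entry (hypothetical model)."""
    read: bool = True
    write: bool = False
    execute: bool = False


def can_execute(gpt: PagePerm, ept: PagePerm) -> bool:
    """A page is executable only if both translation levels allow it."""
    return gpt.execute and ept.execute
```

Because the guest controls only the GPT entry, clearing the execute bit in the EPT entry cannot be undone from inside the virtual machine in this model.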

4 KASR Design

Figure 1: The architecture of the KASR system.

We first elaborate the design of the KASR system. As depicted in Figure 1, the general working flow of KASR proceeds in two stages: an offline training stage followed by a runtime enforcement stage. In the offline training stage, a trusted OS kernel Kern is running beneath a use case (e.g., a user application) within a virtual machine. The KASR offline training processor, residing in the hypervisor space, monitors the kernel's lifetime run, records its code usage and generates a corresponding database. The generated kernel code usage database is trusted, as the system in the offline training stage is clean. Once the generated database becomes stable and ready to use, the offline training stage is done.

In the runtime enforcement stage, the KASR module, running the same virtual machine, loads the generated database and reduces the attack surface of Kern. The kernel attack surface is made up of the kernel code from the kernel image as well as loaded LKMs. A large part of the kernel attack surface is reliably removed (the dotted square in Figure 1). Still, the remaining part (the solid shaded square in Figure 1) is able to support the running of the use case. The attack surface reduction is reliable, as the hypervisor can use the virtualization techniques to protect itself and the KASR system, meaning that no code from the virtual machine can revert the enforcement.

4.1 Offline Training Stage

Figure 2: Offline Training Stage. The KASR offline training processor working in the hypervisor space, extracts used code from the OS kernel, segments used code into three phases (i.e., startup, runtime and shutdown) and generates the kernel code usage database.

Commodity OSes are designed and implemented to support various use cases. However, for a given use case, only certain code pages within the kernel (e.g., Kern) are used while other code pages are unused. Thus, the KASR offline training processor can safely extract the used code pages from the whole kernel, which we call used code extraction. On top of that, the used code pages can be segmented into three phases (i.e., startup, runtime and shutdown). The code segmentation technique is inspired by the observation that some used code pages are only used in a particular time period. For instance, the init functions are only invoked when the kernel starts up and thus they should be in the startup phase. However, certain functions, e.g., kmalloc and kfree, are used during the kernel's whole lifetime and are owned by all three phases. The KASR offline training processor uses the used code extraction technique (Section 4.1.1) to extract the used code pages, and leverages the used code segmentation technique (Section 4.1.2) to segment used code into different phases. All the recorded code usage information will be saved into the kernel code usage database, as shown in Figure 2.

The database will become stable quickly after the KASR offline processor repeats the above steps several times. Actually, this observation has been successfully confirmed by some other research works [18, 17]. For instance, for the use case of LAMP, a typical httperf [23] training of about ten minutes is sufficient to detect all required features, although the httperf does not cover all possible paths. This observation is reasonable due to the following two reasons. First, people do not update the OS kernel frequently, and thus it will be stable within a relatively long period. Second, although the user-level operations are complex and diverse, the invoked kernel services (e.g., system calls) are relatively stable, e.g., the kernel code that handles network packets and system files is constantly the same.

4.1.1 Used Code Extraction

A key requirement of this technique is to collect all used pages for a given workload. It means that the collection should cover the whole lifetime of an OS kernel, from the very beginning of the startup phase to the last operation of the shutdown phase. A straightforward solution is to use the trace service provided by the OS kernel. For instance, the Linux kernel provides the ftrace feature to trace the kernel-level function usage. However, all existing integrated tracing schemes cannot cover the whole life cycle. For example, ftrace always misses the code usage of the startup phase [18] before it is enabled. Extending the trace feature requires modifying the kernel source code. To avoid the modification and cover the whole life cycle of the OS kernel, we propose a hypervisor-based KASR offline training processor. The offline training processor, working in the hypervisor space, starts to run before the kernel starts up and remains operational after the kernel shuts down.

In the following, we will discuss how to trace and identify the used code pages in the kernel image and loaded LKMs.

Kernel Image Tracing. Before the kernel starts to run, the offline training processor removes the executable permissions of all code pages of the kernel image. By doing so, every code execution within the kernel image will raise an exception, driving the control flow to the offline training processor. In the hypervisor space, the offline training processor maintains the database recording the kernel code usage status. When getting an exception, the offline training processor updates the corresponding record, indicating that a kernel code page is used. To avoid this kernel code page triggering any unnecessary exceptions later, the offline training processor sets it to executable. As a result, only the newly executed kernel code pages raise exceptions and the kernel continues running, thus covering the lifetime used code pages of the kernel image. Note that the offline training processor filters out the user-space code pages by checking where the exception occurs (i.e., the value of the Instruction Pointer (IP) register).
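The trap-and-mark idea can be simulated as follows. This is a toy model rather than hypervisor code: the execution trace, the set of kernel pages and the function name `trace_kernel_image` are all assumptions introduced for illustration.

```python
def trace_kernel_image(execution_trace, kernel_pages):
    """Simulate tracing: each kernel page faults once, on its first execution.

    execution_trace is the sequence of page numbers the guest executes.
    kernel_pages is the set of page numbers belonging to the kernel image.
    Returns (used pages in order of first use, number of simulated faults).
    """
    executable = set()  # pages whose execute permission has been restored
    used = []
    faults = 0
    for page in execution_trace:
        if page not in kernel_pages:
            continue  # user-space execution is filtered out
        if page not in executable:
            faults += 1              # non-executable page: exception raised
            used.append(page)        # record the page as used
            executable.add(page)     # restore permission to avoid repeat faults
    return used, faults
```

In this model the number of faults equals the number of distinct used kernel pages, which is why the tracing cost is paid only once per page.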

Kernel Modules Tracing. The above tracing mechanism works smoothly with the kernel image, but not with newly loaded LKMs. All LKMs can be dynamically installed and uninstalled into/from memory at runtime, and the newly installed kernel modules may re-use the executable pages that have already been freed by other modules in order to load their code. Thus, their page contents have totally changed and they become new code pages that ought to be traced as well. If we follow the kernel tracing mechanism, such to-be-reused pages cannot be recorded into the database. Because these pages have been traced and the processor has set them to executable, they are unable to trigger any exceptions even when they are reused by other modules.

To address this issue, we dictate that only the page currently causing the exception can gain the executable permission while other pages cannot. Specifically, when a page P1 raises an exception, the offline training processor sets it to executable so that the kernel can proceed to the next page P2. Once P2 raises an exception, it is set to executable while P1 is set back to non-executable. Likewise, the offline training processor sets P2 back to non-executable when another exception occurs. By doing so, pages like P1 or P2 can trigger new exceptions if they are re-used by newly installed modules, and thus all used code pages can be traced. This approach is also suitable for kernel image tracing.

Page Identification. The traced information is saved in the database, and the database reserves a unique identity for each code page. It is relatively easy to identify all code pages of the kernel image when its address space layout is unique and constant every time the kernel starts up. Thus, a Page Frame Number (PFN) could be used as the identification. However, recent commodity OS kernels have already enabled the KASLR technology [8] and thus the PFN of a code page is no longer constant. Likewise, this issue also occurs with the kernel modules, whose pages are dynamically allocated at runtime, and each time the kernel may assign a different set of PFNs to the same kernel module.

A possible approach is to hash every page's content as its own identity. It works for most of the code pages but fails for code pages containing instructions whose encoded bytes are determined dynamically. For example, a call instruction needs an absolute address as its operand, and this address may be different each time, causing the page identification to fail. Another alternative is to apply a fuzzy hash algorithm (e.g., ssdeep [15]) over a page and compute a similarity (expressed as a percentage) between two pages, e.g., if two pages have a similarity of over , they are identical. However, such low similarity will introduce false positives, which can be exploited by attackers to pass off malicious pages as valid ones in the runtime enforcement stage.

To address the issues, we propose a multi-hash-value approach. In this offline training stage, we trace the kernel for multiple rounds (e.g., rounds) to collect all the used pages and dump the page content of each used page. Then we build a map of what bytes are constant and what bytes are dynamic in every used page. Each used page has multiple ranges and each range is made up of consecutive constant bytes. The ranges are separated by the dynamic bytes. Based on the map, we compute a hash value for every range. If and only if two pages have the same hash value for each range, they are identical. As a result, a page’s identity is to hash everything within the page but the dynamic bytes. On top of that, we observe that the maximum byte-length of the consecutive dynamic bytes is , making it hardly possible for attackers to replace the dynamic bytes with meaningful rogue ones. Relying on the approach, the risk of abusing the false positives is minimized.
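The multi-hash-value idea can be sketched as below, assuming several dumps of the same page collected in different training rounds. The helper names `constant_ranges` and `page_id` are hypothetical, and SHA-256 is an assumption made for the example; the text does not state which hash function is used.

```python
import hashlib


def constant_ranges(dumps):
    """Return half-open (start, end) ranges whose bytes agree across all dumps.

    dumps is a non-empty list of equal-length bytes objects, one per round.
    """
    size = len(dumps[0])
    ranges = []
    start = None
    for i in range(size):
        same = all(d[i] == dumps[0][i] for d in dumps)
        if same and start is None:
            start = i                    # a constant range begins
        elif not same and start is not None:
            ranges.append((start, i))    # a dynamic byte ends the range
            start = None
    if start is not None:
        ranges.append((start, size))
    return ranges


def page_id(page, ranges):
    """Identity of a page: one hash per constant range, dynamic bytes skipped."""
    return tuple(hashlib.sha256(page[s:e]).hexdigest() for s, e in ranges)
```

Two pages then match only if every per-range hash matches, so a difference in any constant byte yields a different identity while differences in the dynamic bytes are ignored.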

4.1.2 Used Code Segmentation

This technique is used to segment the used code into several appropriate phases. By default, there are three phases: startup, runtime, and shutdown, indicating in which phases the used code has been executed. When the kernel is executing within one particular phase out of the three, the offline training processor marks the corresponding code pages with that phase. After the kernel finishes its execution, the offline training processor has marked all used code pages and saves their records into the database. To be aware of the phase switch, the offline training processor captures the phase switch events. For the switch between startup and runtime, we use the event when the first user application starts to run, while for the switch between runtime and shutdown, we choose the execution of the reboot system call as the switch event.
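A minimal sketch of the segmentation logic follows. The event encoding (`"exec"`, `"first_user_exec"`, `"reboot_syscall"`) and the function name `segment_used_pages` are assumptions chosen for this example; pages are labelled by function name only for readability.

```python
PHASES = ("startup", "runtime", "shutdown")


def segment_used_pages(events):
    """Map each executed page to the set of phases in which it was executed.

    events is a sequence of tuples:
      ("exec", page)        a kernel code page is executed
      ("first_user_exec",)  the first user application starts to run
      ("reboot_syscall",)   the reboot system call is executed
    """
    phase_index = 0  # tracing begins in the startup phase
    usage = {}
    for event in events:
        kind = event[0]
        if kind == "first_user_exec" and phase_index == 0:
            phase_index = 1          # startup -> runtime
        elif kind == "reboot_syscall" and phase_index == 1:
            phase_index = 2          # runtime -> shutdown
        elif kind == "exec":
            usage.setdefault(event[1], set()).add(PHASES[phase_index])
    return usage
```

A page such as the one holding kmalloc ends up owned by all three phases, while an init-only page is owned by startup alone.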

4.2 Runtime Enforcement Stage

Figure 3: Runtime Enforcement Stage. The KASR module residing in the hypervisor space reduces OS kernel attack surface in two consecutive steps. The first step (i.e., permission deprivation) reliably deprives unused code of executable permission, and the second step (i.e., lifetime segmentation) selectively activates corresponding used code according to their phases.

When the offline training stage is done and a stable database has been generated (see details in Section 5.2), KASR is ready for runtime enforcement. As shown in Figure 3, the KASR module loads the generated database for a specific workload, and reduces the kernel attack surface in two steps:

  1. Permission Deprivation. It keeps the executable permissions of all used code pages (the solid shaded square in Figure 3), and reliably removes the executable permissions of all unused code pages (the dotted square in Figure 3).

  2. Lifetime Segmentation. It aims to further reduce the kernel attack surface upon the permission deprivation. As shown in Figure 3, it transparently allows the used kernel code pages of a particular phase to execute while setting the remaining pages to non-executable.

All instructions within the executable pages can execute at a native speed, without any interventions from the KASR module. When the execution enters the next phase, the KASR module needs to revoke the executable permissions from the pages of the current phase, and set executable permissions to the pages of the next phase. To reduce the switch cost, the KASR module performs two optimizations. First, if a page is executable within the successive phase, the KASR module skips its permission-revocation and keeps it executable. Second, the KASR module updates the page permissions in batch, rather than updating them individually.
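The two optimizations can be expressed with set operations. In this sketch, `db` is a hypothetical in-memory view of the usage database mapping each page to the phases that own it, and the function names are assumptions for illustration.

```python
def plan_phase_switch(db, current_phase, next_phase):
    """Compute the pages to revoke and to grant when switching phases.

    Pages owned by both phases appear in neither set, so their
    permissions are left untouched (the first optimization).
    """
    current = {page for page, phases in db.items() if current_phase in phases}
    upcoming = {page for page, phases in db.items() if next_phase in phases}
    revoke = current - upcoming
    grant = upcoming - current
    return revoke, grant


def apply_batch(executable, revoke, grant):
    """Apply all permission changes in one step (the second optimization)."""
    return (executable - revoke) | grant
```

For a startup-to-runtime switch, only the startup-only pages are revoked and only the runtime-only pages are granted; shared pages such as the allocator stay executable throughout.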

5 KASR Database

This section presents the implementation details of the KASR database, including its data structure and its operations.

5.1 Data Structure

The database consists of two singly linked lists, which are used to manage the pages of the kernel image and of the loaded modules, respectively. Both lists have their own list lock to support concurrent updates. Every node of each list represents a page and is composed of a node lock, a page ID, a status flag and a pointer to the next node. The node lock avoids race conditions on a single node while still allowing other nodes to be processed in parallel.
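The layout described above might look roughly like the following. This is a user-space sketch under stated assumptions, not the hypervisor data structure: the class names are hypothetical, the status flag is modelled as a set of phase names, and Python locks stand in for hypervisor spinlocks.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class PageNode:
    """One page record: node lock, page ID, status flag and next pointer."""
    page_id: Tuple[str, ...]
    status: set = field(default_factory=set)   # phases that own the page
    lock: threading.Lock = field(default_factory=threading.Lock)
    next: Optional["PageNode"] = None


class PageList:
    """Singly linked list of page records guarded by a list lock."""

    def __init__(self):
        self.head = None
        self.lock = threading.Lock()

    def find(self, page_id):
        node = self.head
        while node is not None:
            if node.page_id == page_id:
                return node
            node = node.next
        return None

    def record(self, page_id, phase):
        """Insert the page if it is new, then mark it as used in the phase."""
        with self.lock:                 # list lock protects insertion
            node = self.find(page_id)
            if node is None:
                node = PageNode(page_id=page_id, next=self.head)
                self.head = node
        with node.lock:                 # node lock protects the status flag
            node.status.add(phase)
        return node


class UsageDatabase:
    """Two lists: one for the kernel image, one for loaded modules."""

    def __init__(self):
        self.image = PageList()
        self.modules = PageList()
```

Holding the list lock only for insertion and the node lock only for the status update keeps updates to different pages from blocking one another.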

Page ID. The page ID is used to identify a page especially during the database updates. As kernel-level randomization is enabled within the kernel, we use the multi-hash-value approach for the identification. Specifically, we trace the kernel for rounds to make sure that all the used pages are collected. Pages in different rounds are considered to be identical (i.e., a same page) if they satisfy two properties: (1) more than out of bytes (i.e., over ) are constant and the same among these pages; (2) the maximum byte-length of the consecutive different bytes (i.e., dynamic bytes) among these pages is no greater than . And then we perform a per-byte comparison of the identical pages so as to build a map of what bytes are constant and what bytes are dynamic with the pages. By doing so, each used page has multiple ranges of consecutive constant bytes and dynamic bytes are between these ranges. As a result, all the constant bytes of every range are hashed as a value and all the hash values make up the page ID.
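The two properties used to decide whether pages from different rounds are the same page can be checked as in the sketch below. The function name `same_page` is hypothetical, and both thresholds are left as required parameters because their concrete values are not reproduced here.

```python
def same_page(a, b, min_constant_bytes, max_dynamic_run):
    """Check the two identity properties for a pair of page dumps.

    (1) more than min_constant_bytes bytes are equal at the same offset;
    (2) the longest run of consecutive differing bytes is at most
        max_dynamic_run.
    """
    if len(a) != len(b):
        return False
    constant = 0
    run = 0
    longest = 0
    for x, y in zip(a, b):
        if x == y:
            constant += 1
            run = 0
        else:
            run += 1
            longest = max(longest, run)
    return constant > min_constant_bytes and longest <= max_dynamic_run
```

Bounding the run length matters as much as the overall ratio: a page could be mostly identical yet hide a long replaced region, which the second property rejects.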

Status Flag. The status flag indicates the phase status (i.e., startup, runtime and shutdown) of a used page. The flag is initialized as startup when the kernel boots up. Once the kernel switches from the startup phase to the runtime phase, or from the runtime phase to the shutdown phase, appropriate exceptions are triggered so that the offline training processor can update the flag accordingly. In our implementation, all code pages of the guest OS are deprived of executable permissions. Once the OS starts to boot, it will raise numerous EPT exceptions. In the hypervisor space, there is a handler (i.e., ept_handle_violation) responding to the exception, and thus the offline training processor can mark the beginning of the runtime phase by intercepting the first execution of the user-space code as well as its end by intercepting the execution of the reboot system call.

5.2 Database Operations

The database operations are mainly composed of three parts, i.e., populating, saving and loading.

Populate Database. To populate the database, the KASR offline training processor must trace all the used pages. It therefore dictates that only the page raising the exception becomes executable while all others stay non-executable. However, we find that this halts the kernel. The reason is that x86 instructions have variable lengths and an instruction may cross a page boundary: the first part of the instruction is at the end of one page, while the rest is at the beginning of the next page. In such situations, the instruction fetch results in an infinite loop (i.e., a trap-and-resume loop).

To address this issue, we relax this rule and implement a queue of pages that hold executable permissions. Once the queue is filled with the two pages that caused the first two exceptions (i.e., the first two used pages), it is updated in first-in, first-out order, i.e., the newest used page is pushed in while the oldest used page is popped out. Besides solving the cross-page-boundary problem, this also accelerates the tracing. In addition, we can capture all loaded modules, as all of them have no less than code pages.
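The effect of the two-entry queue can be reproduced in a small simulation. This is a sketch under stated assumptions, not hypervisor code: `trace`, `fetches` and `max_faults` are hypothetical names, each instruction fetch is modeled as the tuple of pages it touches, and a fault budget stands in for detecting the trap-and-resume loop.

```python
from collections import deque


def trace(fetches, queue_len, max_faults=100):
    """Simulate tracing with a FIFO of executable pages.

    fetches: list of tuples, each holding the page numbers one instruction fetch needs.
    Returns (used_pages, fault_count); raises RuntimeError on a trap-and-resume loop.
    """
    executable = deque(maxlen=queue_len)   # oldest page is dropped automatically
    used, faults = set(), 0
    for pages in fetches:
        while True:
            missing = [p for p in pages if p not in executable]
            if not missing:
                break                       # fetch succeeds, execution continues
            faults += 1
            if faults > max_faults:
                raise RuntimeError("trap-and-resume loop")
            used.add(missing[0])            # record the faulting page as used
            executable.append(missing[0])   # grant it executable permission
    return used, faults
```

With `queue_len=1`, a fetch spanning pages 1 and 2 never completes, because granting one page revokes the other; with `queue_len=2` both pages stay executable and the fetch succeeds after two faults.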

Finally, running the offline training stage just once is not enough to obtain all the used pages. It is therefore necessary to repeat this stage for multiple rounds until the database size becomes stable. In our experiments, rounds are enough to get a stable database (see Section 6).
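The repeat-until-stable procedure can be sketched as a simple loop. The stopping rule used here (the database size is unchanged for `stable_rounds` consecutive rounds) is an assumption of this sketch, as are the names `train_until_stable` and `run_round`; the text only states that training repeats until the size becomes stable.

```python
def train_until_stable(run_round, stable_rounds=2, max_rounds=100):
    """Accumulate used pages across rounds until the database stops growing.

    run_round(i) is a hypothetical callback returning the set of page IDs
    observed in training round i.
    """
    database = set()
    unchanged = 0
    rounds = 0
    while unchanged < stable_rounds and rounds < max_rounds:
        before = len(database)
        database |= run_round(rounds)       # merge this round's pages
        rounds += 1
        # Count consecutive rounds that added nothing new.
        unchanged = unchanged + 1 if len(database) == before else 0
    return database, rounds
```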

Save and Load Database. The database is generated in the hypervisor space and stored on the hard disk for reuse. Specifically, we have developed a tiny tool in the privileged domain that explicitly saves the database to the domain’s disk after the offline training stage, and loads the existing database into the hypervisor space during the runtime enforcement stage.

6 Evaluation

We have implemented a KASR prototype on our private cloud platform, which has a Dell Precision T PC with eight CPU cores (i.e., Intel Core Xeon-E) running at GHz. The Intel VT-x feature is enabled and supports the page size of KB. The hypervisor is Xen version 4.8.2, and the Hardware-assisted Virtual Machine (HVM) runs Ubuntu Server LTS with a KASLR-enabled Linux kernel of version 4.4.0-87-generic, four virtual CPU cores and GB physical memory. KASR only adds around K SLoC in Xen.

In the rest of this section, we measure the reduction rates of the kernel attack surface. On top of that, we characterize the reduced kernel attack surface in terms of Common Vulnerabilities and Exposures (CVEs). The use cases we choose are SPECint, httperf, bonnie++, LAMP (i.e., Linux, Apache, MySQL and PHP) and NFS (i.e., Network File System). Furthermore, we test and analyze the effectiveness of KASR in defending against real-world kernel rootkits. We also measure the performance overhead introduced by KASR through the selected use cases above. The experimental results demonstrate that KASR can effectively reduce the kernel attack surface by 64% and CVEs by 40%, safeguard the kernel against popular kernel rootkits, and impose negligible (less than 1%) performance overhead on all use cases.

6.1 Kernel Attack Surface Reduction

In the runtime enforcement stage, we measure the kernel attack surface reduction through three representative benchmark tools, namely, SPECint, httperf and bonnie++ and two real-world use cases (i.e., LAMP and NFS).

SPECint [29] is an industry standard benchmark intended for measuring the performance of the CPU and memory. In our experiment, the tool has 12 sub-benchmarks in total and they are all invoked with a specified configuration file (i.e., linux64-ia32-gcc43+.cfg).

On top of that, we measure the network I/O of HVM using httperf [23]. HVM runs an Apache Web server and Dom0 tests its I/O performance at a rate of starting from to requests per second ( connections in total).

We also test the disk I/O of the guest by running bonnie++ [5] with its default parameters. For instance, bonnie++ by default creates a file in a specified directory whose size is twice the size of memory.

In addition, we run a LAMP-based web server inside the HVM. First, we use the standard benchmark ApacheBench to continuously access a static PHP-based website for five minutes. Then the Web server scanner Nikto [30] runs to test the Web server for insecure files and outdated server software, and to perform generic and server-type-specific checks. This is followed by launching Skipfish [22], an active web application security reconnaissance tool, which operates in an extensive brute-force mode to carry out comprehensive security checks. Running these tools against the LAMP server aims to cover as many kernel code paths as possible.

Lastly, the other comprehensive application is NFS. HVM is configured to export a shared directory via NFS. To stress the NFS service, we also use bonnie++ to issue read and write accesses to the directory.

Cases | Orig.Kern Page(#) | Aft.Per.Dep. Page(#) | Aft.Per.Dep. Reduction(%) | Aft.Lif.Seg. Page(#) | Aft.Lif.Seg. Reduction(%)
SPECint | | | % | | %
httperf | | | % | | %
bonnie++ | | | % | | %
LAMP | | | % | | %
NFS | | | % | | %
Table 1: In every case, the kernel code pages are significantly tailored after each step. Generally, KASR can reduce the kernel attack surface by % after the permission deprivation, and % after the lifetime segmentation. (Orig.Kern = Original Kernel, Aft.Per.Dep. = After Permission Deprivation, Aft.Lif.Seg. = After Lifetime Segmentation)

All results are displayed in Table 1. Note that the average results for SPECint are computed over its 12 sub-benchmark tools. We observe two interesting properties of the kernel attack surface from this table. First, the attack surface reduction after each step is significant and stable across the different use cases. Generally, the attack surface is reduced by roughly % and % after the permission deprivation and lifetime segmentation, respectively, indicating that less than half of the kernel code is enough to serve all provided use cases. Second, complicated applications (i.e., LAMP and NFS) occupy more kernel code pages than the benchmarks, indicating that they have invoked more kernel functions.
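For readers who want to recompute the Reduction(%) columns, the metric is the share of original kernel code pages that are no longer executable. The helper below is a minimal sketch; the function name is hypothetical and the page counts in the usage note are made-up illustration values, not measurements from Table 1.

```python
def reduction_percent(original_pages, remaining_pages):
    """Percentage of the original kernel code pages that lost executable permission."""
    if original_pages <= 0:
        raise ValueError("original_pages must be positive")
    return 100.0 * (original_pages - remaining_pages) / original_pages
```

For example, with a hypothetical kernel of 2000 code pages of which 500 remain executable, `reduction_percent(2000, 500)` evaluates to 75.0.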

6.1.1 CVE Reduction

Although some kernel functions (e.g., architecture-specific code) contain past CVE vulnerabilities, they are never loaded into memory during the kernel’s lifetime run and do not contribute to the attack surface. As a result, we only consider the CVE-vulnerable functions that are loaded into the kernel memory. We investigate CVE bugs of the recent two years that provide a link to the Git repository commit and identify CVEs that exist in the kernel memory of all five use cases.

We observe that KASR has removed 40% of the CVEs in memory. To be specific, some CVE-vulnerable kernel functions within the unused kernel code pages are deprived of executable permissions after the permission deprivation. For example, the ecryptfs_privileged_open function in CVE-2016-1583 before Linux kernel- is unused and is thus eliminated. After the lifetime segmentation, some other vulnerable functions are also removed (e.g., icmp6_send in CVE--).

6.2 Rootkit Prevention

Even though the kernel attack surface is largely reduced by KASR, vulnerabilities may still exist in the kernel and could be exploited by rootkits. We demonstrate the effectiveness of KASR in defending against real-world kernel rootkits. Specifically, we have selected six popular real-world kernel rootkits from previous work [25] and the Internet. These rootkits work on typical Linux kernel versions ranging from to , representing the state-of-the-art kernel rootkit techniques. All these rootkits launch attacks by inserting a loadable module, and their attacks can be divided into three steps:

  1. inject malicious code into kernel allocated memory;

  2. hook the code on target kernel functions (e.g., original syscalls);

  3. transfer kernel execution flow to the code.

KASR is able to prevent the third step from being executed. Specifically, rootkits could succeed at Step-1 and Step-2, since they can utilize exposed vulnerabilities to modify critical kernel data structures, inject their code and perform target-function hooking so as to redirect the execution flow. However, they cannot execute the code in Step-3, because KASR decides whether a kernel page has executable permission. Recall that KASR reliably dictates that unused kernel code (i.e., code with no record in the database) has no right to execute in the kernel space, and this includes the run-time code injected by rootkits. Therefore, when the injected code starts to run in Step-3, EPT violations will occur and be caught by KASR. The experimental results in Table 2 show that KASR has effectively defended against all the tested rootkits. As a result, KASR is able to defend against kernel rootkits to a great extent.
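The decision that blocks Step-3 can be summarized as a database lookup. The sketch below is an illustration under assumptions, not the KASR code: the database is modeled as a dictionary from page ID to the set of phases in which the page was used, and `may_execute` and the page ID strings are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    STARTUP = "startup"
    RUNTIME = "runtime"
    SHUTDOWN = "shutdown"


def may_execute(database, page_id, phase):
    """A page may run only if it is recorded in the database for the current phase.

    Pages with no record (e.g., code injected by a rootkit) are never executable.
    """
    return phase in database.get(page_id, frozenset())
```

A page that a loadable rootkit module allocates at runtime has no entry, so the lookup fails and the corresponding EPT violation is left for KASR to handle.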

OS Kernel | Rootkit | Attack Vector | Attack Failed?
Linux - | adore-ng | LKM |
 | xingyiquan | LKM |
 | rkduck | LKM |
 | Diamorphine | LKM |
 | suterusu | LKM |
 | nurupo | LKM |
Table 2: KASR successfully defended against all kernel rootkits. (LKM = Loadable Kernel Module)

6.3 Performance Evaluation

In this section, we evaluate the performance impact of KASR on CPU computation, network I/O and disk I/O, using the same settings as in the kernel attack surface reduction measurements. The benchmark tools are run in two groups: one is called Original (HVM with an unmodified Xen), the other is KASR.

Specifically, SPECint has 12 sub-programs and the CPU overhead caused by KASR within each sub-program is quite small and stable. In particular, the maximum performance overhead is % while the average performance overhead is % for the overall system.

Httperf tests the Apache Web server inside the HVM using different request rates. Compared to the Original, the network I/O overhead introduced by KASR ranges from % to % and the average is only %.

The disk I/O results are generated by bonnie++ based on two test settings, i.e., sequential input and sequential output. For each setting, the read, write and rewrite operations are performed and their results indicate that KASR only incurs a loss of % on average.

6.4 Offline Training Efficiency

We take the LAMP server as an example to illustrate the offline training efficiency, i.e., how fast a stable database can be constructed for a given workload. Specifically, we repeat the offline training stage for several rounds to build the LAMP database from scratch. After the first round, we get code pages, % of the final page number. After that, successive offline training rounds are completed one by one, each of which updates the database produced by the previous one, ensuring that the final database records all used pages. From Figure 4, it can be seen that the database as a whole becomes steady after multiple rounds (i.e., in our experiments). This observation is also confirmed in the other cases.

Figure 4: In the case of LAMP, its database is built from scratch and keeps its size increasing until the round th.
Figure 5: Incremental offline training. Compared to that of Figure 4, only more offline training rounds based on a provided database are needed to reach the same stable state, largely reducing the offline training cost.

In fact, it is still time-consuming to build a particular database from scratch. To further accelerate this process, we start the offline training stage from an existing database. In our experiments, we integrate the databases generated separately for SPECint, httperf and bonnie++ into a larger one, and generate the LAMP database from it using incremental training. Based on the integrated database, we find that only rounds are enough to generate the stable database for LAMP, as shown in Figure 5, which significantly improves the offline training efficiency.
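Integrating several databases into a seed for incremental training can be sketched as a keyed union. The data layout (page ID mapped to a set of phase names) and the function names are assumptions of this sketch; the text does not specify how status flags are combined when the same page appears in several databases, so taking the union of phases is an illustrative choice only.

```python
def merge_databases(*databases):
    """Union several {page_id: set_of_phases} databases into one seed database."""
    merged = {}
    for db in databases:
        for page_id, phases in db.items():
            merged.setdefault(page_id, set()).update(phases)
    return merged


def pages_still_missing(seed, observed_page_ids):
    """Page IDs a new workload uses that the seed database does not yet record."""
    return set(observed_page_ids) - set(seed)
```

The fewer pages `pages_still_missing` reports for the new workload, the less the incremental rounds have left to discover, which is the intuition behind the reduced training cost.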

7 Discussion

In this section, we will discuss limitations of our approach.

Training Completeness. Similar to Ktrim [17], KRAZOR [16] and Face-Change [10], KASR uses a training-based approach. As the approach might miss some corner cases, it may cause KASR to mark certain pages that should be used as unused, resulting in an incomplete offline training database. Theoretically, such situations can occur. In practice, however, they have never been observed in our experiments so far. Interestingly, Kurmus et al. [18] found that a small offline training set is usually enough to cover all used kernel code for a given use case, implying that the corner cases usually do not increase the kernel code coverage. If the generated database is incomplete, EPT violations may be triggered at runtime. For such situations, KASR has two possible responses. One is to directly stop the execution of the guest, which is suitable for security-sensitive environments where any violation may be treated as a potential attack. The other is to generate a log message, which is friendly to applications that have high availability requirements. The generated log contains the execution context and the corresponding memory content to facilitate further analysis, e.g., system forensics.
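The two responses to a runtime EPT violation can be pictured as a policy switch. This is a minimal sketch with hypothetical names (`Policy`, `Violation`, `respond`) and an assumed log record layout; in particular, letting the guest continue in logging mode is an assumption drawn from the availability motivation, not a documented behavior.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    HALT = "halt"   # security-sensitive deployments
    LOG = "log"     # high-availability deployments


@dataclass
class Violation:
    instruction_pointer: int   # execution context at the time of the fault
    page_frame: int            # guest frame that was executed without a record
    page_bytes: bytes          # memory content kept for later forensics


def respond(violation, policy, log):
    """Return the action taken for one violation and append a record when logging."""
    if policy is Policy.HALT:
        return "guest_stopped"
    log.append({
        "ip": hex(violation.instruction_pointer),
        "frame": violation.page_frame,
        "content": violation.page_bytes.hex(),
    })
    return "guest_continues"   # assumption: execution proceeds after logging
```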

Fine-grained Segmentation. By default, we have three segmented phases (i.e., startup, runtime, and shutdown). The whole lifecycle could be segmented into more phases, corresponding to different working stages of a user application. Intuitively, a more fine-grained segmentation achieves a better kernel attack surface reduction. Nonetheless, more phases introduce more performance overhead, such as additional phase switches. They also increase the complexity of the KASR offline training processor, and consequently increase the trusted computing base (TCB). Finally, the KASR module has to deal with potential security attacks, e.g., malicious phase switches. To prevent such attacks, a state machine graph of phases should be provided, in which the predecessor, successor and switch condition of each phase are clearly defined. At runtime, the KASR module loads this graph and enforces its integrity: only the phase switches existing in the graph are legal, and any other switches are rejected.
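Such a state machine graph can be sketched for the default three phases. The class and event names below are hypothetical; the two switch conditions mirror the ones described for the status flag (first execution of user-space code, and execution of the reboot system call), and a finer-grained deployment would simply add edges to the table.

```python
# Legal switches: (current phase, next phase) -> required switch condition.
LEGAL_SWITCHES = {
    ("startup", "runtime"): "first_user_space_execution",
    ("runtime", "shutdown"): "reboot_system_call",
}


class PhaseMachine:
    """Tracks the current phase and rejects any switch not present in the graph."""

    def __init__(self, graph=LEGAL_SWITCHES, initial="startup"):
        self.graph = dict(graph)
        self.phase = initial

    def switch(self, target, event):
        """Apply a switch if its edge exists and its condition matches; else reject."""
        if self.graph.get((self.phase, target)) != event:
            return False            # illegal or malicious switch: phase unchanged
        self.phase = target
        return True
```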

8 Related Work

In this section, we provide an overview of existing approaches to enhance the kernel security that require no changes to the kernel. Specifically, the approaches are either kernel or hypervisor-dependent.

Kernel customizations [33, 18] present automatic approaches to trimming kernel configurations for specific use cases, so that the tailored configurations can be applied to re-compile the kernel source code, thus minimizing the kernel attack surface. Similarly, Seccomp [26] relies on the kernel source code to sandbox specified user processes by restricting them to a minimal set of system calls. Lock-in-Pop [19] modifies and re-compiles glibc to restrict an application’s access to certain kernel code. In contrast, both Ktrim [17] and KRAZOR [16] utilize kprobes to trim off unused kernel functions and prevent them from being executed. All of the approaches above aim at providing a minimized kernel view to a target application.

In the virtualized environment, both Secvisor [27] and NICKLE [25] only protect original kernel TCB and do nothing to reduce it. Taking a step further, unikernel [20] provides a minimal kernel API surface to specified applications but developing the applications is highly dependent on the underlying unikernel. Face-Change [10] profiles the kernel code for every target application and uses the Virtual Machine Introspection (VMI) technique to detect process context switch and thus provide a minimized kernel TCB for each application. However, Face-Change has three disadvantages: (1) Its worst-case runtime overhead for httperf testing Apache web server is %, whereas our worst overhead is % (see Section 6.3), making it impractical in the cloud environment. (2) Its design naturally does not support KASLR, which is an important kernel security feature and has been merged into the Linux kernel mainline since kernel version 3.14. In contrast, KASR is friendly to the security feature. (3) While multiple-vCPU support is critical to system performance in the cloud environment, it only supports a single vCPU within a guest VM, whereas KASR allocates four vCPUs to the VM.

9 Conclusion

Commodity OS kernels provide a large number of features to satisfy various demands from different users, exposing a huge surface to remote and local attackers. In this paper, we have presented a reliable and practical approach, named KASR, which transparently reduces the attack surfaces of commodity OS kernels at runtime without relying on their kernel source code. KASR deploys two surface reduction approaches. One is spatial, i.e., the permission deprivation marks never-used code pages as non-executable, while the other is temporal, i.e., the lifetime segmentation selectively activates appropriate used code pages. We implemented KASR on the Xen hypervisor and evaluated it using the Ubuntu OS with an unmodified Linux kernel. The experimental results showed that KASR efficiently reduced 64% of the kernel attack surface and 40% of CVEs in all given use cases. In addition, KASR defeated all the tested real-world rootkits and incurred low performance overhead (i.e., less than 1% on average) to the whole system.

In the near future, our primary goal is to apply KASR to the kernel attack surface reduction of a Windows OS, since KASR should be generic enough to protect all kinds of commodity OS kernels.


  • [1] Accetta, M., Baron, R., Bolosky, W., Golub, D., Rashid, R., Tevanian, A., Young, M.: Mach: A new kernel foundation for unix development (1986)
  • [2] AMD, Inc.: Secure virtual machine architecture reference manual (Dec 2005)
  • [3] ARM, Inc.: Armv8. https://community.arm.com/docs/DOC-10896 (2011)
  • [4] Azab, A.M., Ning, P., Wang, Z., Jiang, X., Zhang, X., Skalsky, N.C.: Hypersentry: Enabling stealthy in-context measurement of hypervisor integrity. In: Proceedings of the 17th ACM Conference on Computer and Communications Security. pp. 38–49. CCS ’10 (2010)
  • [5] Bonnie: http://www.coker.com.au/bonnie++ (1999)
  • [6] Cheng, Y., Ding, X.: Guardian: Hypervisor as security foothold for personal computers. In: Trust and Trustworthy Computing, pp. 19–36. Springer Berlin Heidelberg (2013)
  • [7] Colp, P., Nanavati, M., Zhu, J., Aiello, W., Coker, G., Deegan, T., Loscocco, P., Warfield, A.: Breaking up is hard to do: Security and functionality in a commodity hypervisor. In: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. pp. 189–202. SOSP ’11, ACM (2011)
  • [8] Cook, K.: Linux kernel aslr (kaslr). Linux Security Summit (2013)
  • [9] Dautenhahn, N., Kasampalis, T., Dietz, W., Criswell, J., Adve, V.: Nested kernel: An operating system architecture for intra-kernel privilege separation. In: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems. pp. 191–206. ASPLOS ’15 (2015)
  • [10] Gu, Z., Saltaformaggio, B., Zhang, X., Xu, D.: Face-change: Application-driven dynamic kernel view switching in a virtual machine. In: Dependable Systems and Networks (DSN), 44th Annual IEEE/IFIP International Conference. pp. 491–502. DSN’14, IEEE (2014)
  • [11] Herder, J.N., Bos, H., Gras, B., Homburg, P.: Minix 3: A highly reliable, self-repairing operating system. ACM SIGOPS Operating Systems Review 40(3), 80–89 (2006)
  • [12] Herder, J.N., Bos, H., Gras, B., Homburg, P., Tanenbaum, A.S.: Construction of a highly dependable operating system. In: Proceedings of the 6th European Dependable Computing Conference. pp. 3–12. EDCC’06, IEEE (2006)
  • [13] Intel, Inc.: Intel 64 and IA-32 architectures software developer’s manual combined volumes: 1, 2a, 2b, 2c, 3a, 3b and 3c (Oct 2011)
  • [14] Klein, G., Andronick, J., Elphinstone, K., Heiser, G., Cock, D., Derrin, P., Elkaduwe, D., Engelhardt, K., Kolanski, R., Norrish, M., Sewell, T., Tuch, H., Winwood, S.: sel4: Formal verification of an operating-system kernel. Commun. ACM 53(6), 107–115 (Jun 2010)
  • [15] Kornblum, J.: Fuzzy hashing and ssdeep (2010)
  • [16] Kurmus, A., Dechand, S., Kapitza, R.: Quantifiable run-time kernel attack surface reduction. In: Detection of Intrusions and Malware, and Vulnerability Assessment, pp. 212–234. Springer (2014)
  • [17] Kurmus, A., Sorniotti, A., Kapitza, R.: Attack surface reduction for commodity os kernels: trimmed garden plants may attract less bugs. In: Proceedings of the Fourth European Workshop on System Security. ACM (2011)
  • [18] Kurmus, A., Tartler, R., Dorneanu, D., Heinloth, B., Rothberg, V., Ruprecht, A., Schröder-Preikschat, W., Lohmann, D., Kapitza, R.: Attack surface metrics and automated compile-time os kernel tailoring. In: Proceedings of the 20th Annual Network and Distributed System Security Symposium. NDSS’13 (2013)
  • [19] Li, Y., Dolan-Gavitt, B., Weber, S., Cappos, J.: Lock-in-pop: Securing privileged operating system kernels by keeping on the beaten path. In: USENIX Annual Technical Conference. pp. 1–13. USENIX Association (2017)
  • [20] Madhavapeddy, A., Mortier, R., Rotsos, C., Scott, D., Singh, B., Gazagnaire, T., Smith, S., Hand, S., Crowcroft, J.: Unikernels: Library operating systems for the cloud. In: Acm Sigplan Notices. vol. 48, pp. 461–472. ACM (2013)
  • [21] Mao, Y., Chen, H., Zhou, D., Wang, X., Zeldovich, N., Kaashoek, M.F.: Software fault isolation with api integrity and multi-principal modules. In: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. pp. 115–128. ACM (2011)
  • [22] Michal, Z., Niels, H., Sebastian, R.: https://code.google.com/archive/p/skipfish (2010)
  • [23] Mosberger, D., Jin, T.: Httperf - a tool for measuring web server performance. SIGMETRICS Perform. Eval. Rev. 26(3), 31–37 (Dec 1998)
  • [24] Nguyen, A., Raj, H., Rayanchu, S., Saroiu, S., Wolman, A.: Delusional boot: Securing hypervisors without massive re-engineering. In: Proceedings of the 7th ACM European Conference on Computer Systems. pp. 141–154. EuroSys ’12 (2012)
  • [25] Riley, R., Jiang, X., Xu, D.: Guest-transparent prevention of kernel rootkits with vmm-based memory shadowing. In: Recent Advances in Intrusion Detection. pp. 1–20. Springer (2008)
  • [26] Seccomp: https://lwn.net/Articles/332974 (2005)
  • [27] Seshadri, A., Luk, M., Qu, N., Perrig, A.: Secvisor: A tiny hypervisor to provide lifetime kernel code integrity for commodity oses. In: ACM SIGOPS Operating Systems Review. vol. 41, pp. 335–350. ACM (2007)
  • [28] Smalley, S., Vance, C., Salamon, W.: Implementing selinux as a linux security module. NAI Labs Report 1(43),  139 (2001)
  • [29] Standard Performance Evaluation, Inc.: Specint. http://www.spec.org (2006)
  • [30] Sullo, C.: https://cirt.net/nikto (2012)
  • [31] Swift, M.M., Martin, S., Levy, H.M., Eggers, S.J.: Nooks: An architecture for reliable device drivers. In: Proceedings of the 10th workshop on ACM SIGOPS European workshop. pp. 102–107. ACM (2002)
  • [32] Szefer, J., Keller, E., Lee, R.B., Rexford, J.: Eliminating the hypervisor attack surface for a more secure cloud. In: Proceedings of the 18th ACM Conference on Computer and Communications Security. pp. 401–412. CCS ’11 (2011)
  • [33] Tartler, R., Kurmus, A., Heinloth, B., Rothberg, V., Ruprecht, A., Dorneanu, D., Kapitza, R., Schröder-Preikschat, W., Lohmann, D.: Automatic os kernel tcb reduction by leveraging compile-time configurability. In: Proceedings of the 8th Workshop on Hot Topics in System Dependability. pp. 3–3 (2012)
  • [34] Wang, Z., Jiang, X.: Hypersafe: A lightweight approach to provide lifetime hypervisor control-flow integrity. In: Proceedings of the 2010 IEEE Symposium on Security and Privacy. pp. 380–395. SP ’10 (2010)
  • [35] Wang, Z., Wu, C., Grace, M., Jiang, X.: Isolating commodity hosted hypervisors with hyperlock. In: Proceedings of the 7th ACM European Conference on Computer Systems. pp. 127–140. EuroSys ’12 (2012)