TeleHammer : A Stealthy Cross-Boundary Rowhammer Technique

12/06/2019
by Zhi Zhang, et al.

Rowhammer exploits frequently access specific DRAM rows (i.e., hammer rows) to induce bit flips in their adjacent rows (i.e., victim rows), allowing an attacker to gain privilege escalation or steal private data. A key requirement of all such attacks is that the attacker must have access to at least part of a hammer row adjacent to sensitive victim rows. We refer to these rowhammer attacks as PeriHammer. The state-of-the-art software-only defenses against PeriHammer attacks make such hammer rows inaccessible to the attacker. In this paper, we question the necessity of the above requirement and propose a new class of rowhammer attacks, termed TeleHammer. It is a paradigm shift in rowhammer attacks, since it crosses memory boundaries to stealthily hammer an inaccessible row by freeloading on inherent features of modern hardware and/or software. We propose a generic model to rigorously formalize the necessary conditions to initiate TeleHammer and PeriHammer, respectively. Compared to PeriHammer, TeleHammer can defeat advanced software-only defenses, is stealthy in hiding itself, and is hard to mitigate. To demonstrate the practicality of TeleHammer and its advantages, we have created an instance of TeleHammer, called PThammer, which leverages the address-translation feature of modern processors. We observe that a memory access can induce a fetch of a Level-1 page-table entry (PTE) from memory and thus hammer the PTE's row once. To achieve a high hammer frequency, we flush the relevant TLB and cache entries effectively and efficiently. PThammer can thus cross the user-kernel boundary to hammer rows occupied by Level-1 PTEs and induce bit flips in adjacent victim rows that also host Level-1 PTEs. We have exploited PThammer to defeat advanced software-only defenses on bare-metal systems.



I Introduction

In 2014, Kim et al. [18] discovered an infamous software-induced hardware fault, the so-called “rowhammer” bug. In particular, frequently accessing the same addresses in two DRAM (Dynamic Random Access Memory) rows (i.e., hammer rows) can cause bit flips in an adjacent row (i.e., a victim row). If sensitive structures, such as page tables, are placed in the victim row, an adversary can corrupt those structures by hammering the adjacent hammer rows, even though she has no access to the structures themselves. As such, the bug can break MMU-based memory isolation between different security domains without any software vulnerabilities, enabling a powerful class of attacks against DRAM-based systems. These attacks are so hazardous that they can either gain privilege escalation [33, 12, 5, 8, 34, 11, 35, 37] or steal private data [32, 3, 21].

To exploit the rowhammer bug, all existing rowhammer attacks require access to at least part of an exploitable hammer row (a hammer row is exploitable only when it is adjacent to sensitive victim rows [33] or when part of it is itself sensitive [21]), as shown in Figure 1. As their access to the hammer row is legitimate and within the memory boundary, we term such attacks PeriHammer.

To defeat PeriHammer-based attacks, numerous hardware- and software-based mitigation techniques have been proposed. As hardware-based mitigations require DRAM updates or upgrades and cannot be backported, recent software-only defenses including CATT [6], CTA [36] and RIP-RH [4] are practical for bare-metal systems and target different rowhammer attacks. These defenses all enforce DRAM-based memory isolation at different granularities to prevent exploitable hammer rows from being accessed by attackers. Take CATT [6] as an example: it isolates the DRAM memory of the kernel domain with guard rows such that all exploitable hammer rows that could induce bit flips in kernel structures belong to the kernel domain.

Fig. 1: PeriHammer: In all existing rowhammer attacks, an attacker B requires access to at least part of a hammer row to either gain privilege escalation or steal private data. TeleHammer: An attacker A freeloads a chain of built-in features of modern hardware and/or software, effects of which result in a hammer row being hammered by using a benign entity E.

Our contributions: we introduce a paradigm shift in rowhammer attacks through a new class of rowhammer attacks, called TeleHammer (shown in Figure 1). It freeloads on a chain of built-in features (e.g., out-of-order execution) of modern hardware and/or software to hammer inaccessible rows through a benign entity. It thus essentially eliminates the above requirement of PeriHammer and makes it possible to break the above advanced defenses again.

To rigorously formalize the necessary requirements to initiate TeleHammer, we propose a generic formal model, which can also formalize PeriHammer. The model indicates that PeriHammer is a special case of TeleHammer, and TeleHammer exhibits the following advantages over PeriHammer.

  • TeleHammer can defeat the advanced software-only defenses, since it eschews the critical requirement of PeriHammer.

  • TeleHammer is stealthy, since it hammers a hammer row not by itself but using a benign entity, making it hard to trace the real culprit.

  • TeleHammer is hard to mitigate, since abundant instances can be derived from its design. A countermeasure against specific instances cannot defeat TeleHammer itself.

To demonstrate the practicality of TeleHammer, we create a working instance, called PThammer, that satisfies the formal requirements of TeleHammer. Specifically, we observe that a memory access triggers address translation in modern OSes on the x86-64 microarchitecture. In response to the memory access, the processor first searches the Translation-Lookaside Buffer (TLB) to check whether a corresponding physical address exists. If the search fails (i.e., a TLB miss), the processor searches the paging-structure caches, which host partial address mappings for the different page-table levels [2]. If another miss occurs, it fetches the four levels of page-table entries (PTEs) from the CPU cache, or, failing that, from DRAM. Fetching a PTE from memory hammers the PTE's row once. Although PTEs reside in the kernel space and are inaccessible to users, PThammer can cross the user-kernel boundary to hammer them by freeloading on the address-translation feature.

In order to trigger exploitable rowhammer bit flips, PThammer targets Level-1 PTEs and needs to effectively and efficiently flush the address mapping from the TLB and the Level-1 PTE from the cache. However, as it is not authorized to perform such flushes by means of instructions, PThammer instead constructs a minimal eviction set for the TLB and the cache, respectively. By doing so, PThammer can implicitly flush any target entry from the TLB and any memory line from the cache, so that a subsequent memory access hammers Level-1 PTEs. On top of that, PThammer forces Level-1 PTE allocation to span hammer and victim rows so as to induce bit flips in PTEs and gain kernel privilege. By performing a stealthy cross-memory-boundary hammer, PThammer not only crosses the user-kernel boundary but also defeats all the aforementioned practical defenses (we discuss how to compromise these defenses in Section IV).

The main contributions of this paper are as follows:

  • All previous rowhammer exploits (i.e., PeriHammer) require access permissions to an exploitable hammer row. In contrast, we propose a new class of rowhammer attacks, called TeleHammer, that eschews this critical requirement.

  • We present a generic model to formally define necessary conditions to launch TeleHammer and PeriHammer, respectively. Based on the model, we summarize three advantages of TeleHammer over PeriHammer.

  • We propose an instance of TeleHammer, called PThammer, that leverages the address-translation feature of modern processors to hammer page tables and defeat the advanced software-only defenses in bare-metal systems.

The rest of the paper is structured as follows. In Section II, we briefly introduce the background information and summarize related works. In Section III, we present a formal model of TeleHammer and talk about how to instantiate TeleHammer in detail. Section IV evaluates PThammer thoroughly. In Section V, we discuss how to compromise ZebRAM [20], a defense for virtualization systems, shed light on other possible instances of TeleHammer and discuss possible mitigation against TeleHammer and PThammer. We conclude this paper in Section VI.

II Background and Related Work

II-A CPU Cache

In commodity Intel x86 microarchitecture platforms, there are three levels of CPU caches. Among them, the first-level cache (L1) is closest to the CPU. L1 consists of two caches: L1D, caching data, and L1I, caching instructions. The second-level cache (L2) is unified, caching both data and instructions, and so is the last-level cache (LLC, or L3). Generally speaking, the cache at a given level is set-associative and consists of $S$ sets. Each set contains $L$ lines, and data or code can be cached in any line of its set; this is referred to as an $L$-way set-associative cache. Each cache line stores $B$ bytes. Thus, the overall cache size of that level is $S \times L \times B$ bytes.

When an accessed variable is stored in a cache set, Intel microarchitectures use its virtual or physical address to decide the corresponding cache set of a specific cache level. For instance, an L1 cache set is indexed using bits 6 to 11 of the virtual address. The indexing scheme for L3 is more complicated. In contrast to L1 and L2, which are private to a physical core, L3 is shared among all cores. The L3 cache is therefore first partitioned into slices, with one slice serving one core at a higher priority, and each slice is further divided into cache sets as mentioned above. Some physical-address bits are XORed to decide the slice, and some bits (bits 6 to 16) are XORed to index the cache set [25].
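As an illustration, the indexing and sizing rules above can be sketched in a few lines of Python; the 64-sets/8-ways/64-byte parameters are typical L1D values assumed for illustration, not taken from the paper.

```python
# Toy model of set-associative cache indexing (parameters are
# typical L1D values: 64 sets x 8 ways x 64-byte lines = 32 KiB).
LINE_BITS = 6      # 64-byte lines -> offset is bits 0-5
L1_SETS = 64       # -> set index is bits 6-11

def l1_set_index(addr: int) -> int:
    """L1 set index: bits 6..11 of the (virtual) address."""
    return (addr >> LINE_BITS) & (L1_SETS - 1)

def cache_size(sets: int, ways: int, line_bytes: int) -> int:
    """Overall size of one cache level: S x L x B bytes."""
    return sets * ways * line_bytes

# Two addresses 64 sets * 64 bytes = 4096 bytes apart are congruent:
# they map to the same L1 set and so compete for its 8 ways.
a = 0x1234
assert l1_set_index(a) == l1_set_index(a + L1_SETS * 64)
assert cache_size(64, 8, 64) == 32 * 1024
```

Congruent addresses like these are exactly what eviction-based cache flushing (Section II-E1) relies on.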

II-B Translation-Lookaside Buffer

The Translation-Lookaside Buffer (TLB) has two levels. The first level (L1) consists of two parts: one caches translations for code pages, called the L1 instruction TLB (L1 iTLB), and the other caches translations for data pages, called the L1 data TLB (L1 dTLB). The second-level TLB (L2 sTLB) is larger and shared by translations for both code and data. Similar to the CPU cache above, the TLB at each level is also partitioned into sets and ways.

Note that a virtual address (VA) determines a TLB set at each level. Although there is no public information about the mapping between a VA and its TLB set, the mapping has been reverse-engineered on quite a few Intel commodity platforms [10].
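For instance, one reverse-engineered scheme reported for some Intel parts XORs two groups of virtual-page-number bits; the 128-set, XOR-7 parameters below are assumptions for illustration, not details from this paper.

```python
# Hypothetical L2 sTLB indexing: 128 sets, XOR-7 hash of the
# virtual page number (VPN), as reverse-engineered on some Intel CPUs.
STLB_SETS = 128

def stlb_set_index(va: int) -> int:
    vpn = va >> 12                     # virtual page number (4 KiB pages)
    return (vpn ^ (vpn >> 7)) & (STLB_SETS - 1)

# Pages whose VPNs collide under the hash land in the same sTLB set,
# which is what a TLB eviction set exploits.
base = 0x7f0000000000
collisions = [va for va in range(base, base + (1 << 26), 4096)
              if stlb_set_index(va) == stlb_set_index(base)]
assert len(collisions) > 1             # many congruent pages exist
```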

II-C Address Translation

The Memory Management Unit (MMU) enforces memory virtualization primarily by means of the paging mechanism. Paging on the x86-64 platform usually uses four levels of page tables to translate a virtual address to a physical address. Accordingly, the virtual-address bits are divided into four parts. Bits 39–47 index the Page Map Level 4 (PML4) table (whose base address is held in CR3), so these bits decide the page offset of a PML4 entry. Bits 30–38 index the selected page directory pointer table (PDPT), whose base address comes from the PML4 entry; each entry determines a physical page frame number and attributes (e.g., access rights) for accesses to the physical page. Bits 21–29 index the selected page directory (PD) table, whose base address comes from the PDPT entry. Bits 12–20 index the selected page table (PT), whose base address comes from the PD entry. The indexed PT entry points to the physical address of the corresponding page, and the remaining bits 0–11 are the offset into that page.
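The bit-slicing above can be captured in a short Python sketch of the 4-level x86-64 split, using exactly the bit ranges just listed:

```python
# Decompose a 48-bit x86-64 virtual address into its four
# page-table indices (9 bits each) and the 12-bit page offset.
def split_va(va: int):
    return {
        "pml4": (va >> 39) & 0x1ff,   # bits 39-47
        "pdpt": (va >> 30) & 0x1ff,   # bits 30-38
        "pd":   (va >> 21) & 0x1ff,   # bits 21-29
        "pt":   (va >> 12) & 0x1ff,   # bits 12-20
        "off":  va & 0xfff,           # bits 0-11
    }

parts = split_va(0x00007f1234567abc)
# Each table has 512 entries, so every index fits in 9 bits.
assert all(parts[k] < 512 for k in ("pml4", "pdpt", "pd", "pt"))
assert parts["off"] == 0xabc
```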

To speed up this process, the TLB caches the address translations, while the CPU cache stores both the accessed data and the page-table entries of all levels.

II-D Dynamic Random-Access Memory

Main memory of most modern computers uses Dynamic Random-Access Memory (DRAM). Memory modules are usually produced in the form of dual inline memory module, or DIMM, where both sides of the memory module have separate electrical contacts for memory chips. Each memory module is directly connected to the CPU’s memory controller through one of the two channels. Logically, each memory module consists of two ranks, corresponding to its two sides, and each rank consists of multiple banks. A bank is structured as arrays of memory cells with rows and columns.

Every cell of a bank stores one bit of data whose value depends on whether the cell is electrically charged or not. A row is a basic unit for memory access. Each access to a bank “opens” a row by transferring the data in all the cells of the row to the bank’s row buffer. This operation discharges all the cells of the row. To prevent data loss, the row buffer is then copied back into the cells, thus recharging the cells. Consecutive access to the same row will be fulfilled by the row buffer, while accessing another row will flush the row buffer.
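The open/copy-back behavior can be mimicked with a tiny bank model (a toy sketch, not from the paper) that counts row activations, which is the quantity rowhammer cares about:

```python
# Toy DRAM-bank model: a single row buffer caches the open row.
# Accessing a different row closes the current one and re-opens
# (activates) the new row; same-row accesses hit the row buffer.
class DramBank:
    def __init__(self):
        self.open_row = None
        self.activations = {}          # row -> activation count

    def access(self, row: int) -> str:
        if row == self.open_row:
            return "row-buffer hit"
        # Row conflict: copy the old row back, open the new one.
        self.open_row = row
        self.activations[row] = self.activations.get(row, 0) + 1
        return "activate"

bank = DramBank()
# Alternating between two rows of one bank defeats the row buffer,
# so every access re-activates a row -- the hammering pattern.
for _ in range(1000):
    bank.access(1)
    bank.access(3)
assert bank.activations[1] == 1000 and bank.activations[3] == 1000
# Repeatedly accessing one row only activates it once.
bank2 = DramBank()
for _ in range(1000):
    bank2.access(7)
assert bank2.activations[7] == 1
```

This is why Section II-E stresses clearing the row buffer between accesses: without row conflicts, the buffer absorbs the accesses and no hammering occurs.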

II-E Rowhammer Overview

Rowhammer bugs: Kim et al. [18] discovered that current DRAMs are vulnerable to disturbance errors induced by charge leakage. In particular, their experiments have shown that frequently opening the same row (i.e., hammering the row) can cause sufficient disturbance to a neighboring row and flip its bits without even accessing the neighboring row. Because the row buffer acts as a cache, another row in the same bank is accessed to flush the row buffer after each hammering so that the next hammering will re-open the hammered row, leading to bit flips of its neighboring row.

Hammering techniques: generally speaking, there are three techniques for hammering a vulnerable DRAM module.

Double-sided hammer: two adjacent rows of a victim row are hammered simultaneously and the adjacent rows are called hammer rows [18].

Single-sided hammer: Seaborn et al. [33] proposed a single-sided hammering by randomly picking multiple addresses and just hammering them with the hope that such addresses are in different rows within the same bank.

One-location hammer: one-location hammering  [11] randomly selects a single address for hammering. It exploits the fact that advanced DRAM controllers employ a more sophisticated policy to optimize performance, preemptively closing accessed rows earlier than necessary.

Key requirements: PeriHammer-based attacks need the following requirements to gain either privilege escalation or private information.

First, the CPU cache must be either flushed or bypassed. It can be invalidated by instructions such as clflush on x86. In addition, cache conflicts can evict data from the cache, since the cache is much smaller than main memory; therefore, to evict hammer rows from the cache, we can use a crafted access pattern [13] that causes cache conflicts on the hammer rows' addresses. Alternatively, we can bypass the cache by accessing uncached memory.

Second, the row buffer must be cleared between consecutive hammering DRAM rows. Both double-sided and single-sided hammering explicitly perform alternate access to two or more rows within the same bank to clear the row buffer. One-location hammering relies on the memory controller to clear the row buffer.

Third, existing rowhammer attacks require that a hammer row be accessible to the attacker in order to gain privilege escalation or steal private data, such that a victim row can be compromised by hammering the hammer row.

Fourth, either the hammer row or the victim row must contain the sensitive data objects (e.g., page tables) we target. If the victim row hosts the data objects, an attacker can either gain privilege escalation or steal private data [33, 3]. If the hammer row hosts the data objects, an attacker can steal private data [21].

II-E1 Rowhammer Attacks

In order to trigger the rowhammer bug, frequent and direct memory access is a prerequisite. We therefore classify rowhammer attacks into three categories based on how they flush or bypass the cache.

Instruction-based cache flush: the clflush instruction has been commonly used for explicit cache flushes [18, 33, 11, 32] ever since Kim et al. [18] revealed the rowhammer bug. This instruction flushes all cache levels' entries for a specific virtual address, and it can be executed by an unprivileged process on the x86 architecture.

Eviction-based cache flush: alternatively, an attacker can evict a target address by accessing congruent memory addresses which are mapped to the same cache set and same cache slice as the target address [1, 13, 5, 25, 27]. A set of congruent memory addresses is called an eviction set. Our PThammer also applies the eviction-based approach to flush Level-1 PTEs from cache.
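Conceptually, an eviction set is just a group of addresses congruent under the cache's set mapping; the sketch below (toy parameters, not the authors' code) collects one for the simple L1-style index described in Section II-A:

```python
# Build a minimal eviction set for one cache set of a toy
# 64-set, 8-way cache with 64-byte lines: 8 congruent addresses
# (one per way) suffice to evict the target line.
SETS, WAYS, LINE = 64, 8, 64

def set_index(addr: int) -> int:
    return (addr >> 6) & (SETS - 1)

def eviction_set(target: int, pool_base: int, pool_size: int):
    ev = []
    for addr in range(pool_base, pool_base + pool_size, LINE):
        if set_index(addr) == set_index(target):
            ev.append(addr)
            if len(ev) == WAYS:        # minimal: one address per way
                break
    return ev

ev = eviction_set(target=0x1040, pool_base=0x100000, pool_size=1 << 16)
assert len(ev) == WAYS
assert all(set_index(a) == set_index(0x1040) for a in ev)
```

Accessing all eight addresses fills the target set, forcing the target line out without any flush instruction. Real LLC eviction additionally has to account for slice selection and physical-address indexing.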

Uncached memory access: as direct memory access (DMA) memory is uncached, past works such as Throwhammer [34] and Nethammer [23] on the x86 microarchitecture and Drammer [35] on the ARM platform have abused DMA memory for hammering. Note that although Throwhammer and Nethammer appear similar to PThammer on the surface, unlike PThammer they can neither achieve privilege escalation nor steal private data; they can disturb the target system and potentially achieve denial of service.

III TeleHammer Overview

In this section, we first present the threat model and assumptions, and then introduce the formal model of TeleHammer, followed by an instance of TeleHammer to demonstrate its practicality.

III-A Threat Model and Assumptions

Our threat model is similar to other rowhammer attacks [37, 32, 31, 5, 12, 33]. Specifically,

  • The kernel is considered to be secure against software-only attacks. In other words, our attack does not rely on any software vulnerabilities.

  • An adversary controls an unprivileged user process that has no special privileges, such as access to pagemap, which contains the mapping between virtual and physical addresses.

  • An attacker has no knowledge about the kernel memory locations that are bit-flippable.

  • The installed DRAM modules are susceptible to rowhammer-induced bit flips. Pessl et al. [30] report that many mainstream DRAM manufacturers have vulnerable DRAM modules, including DDR3 and DDR4 memory.

III-B Formal Modeling of TeleHammer

We propose a formal model of TeleHammer to characterize its attack paradigm.

Let $\mathbb{E}$ be a set of entities, which can be any components in modern OSes that are able to initiate memory accesses. If $e \in \mathbb{E}$, then $e$ could be, for instance, an unprivileged attack process, the processor, a DMA controller, etc. A set of memory addresses is denoted by $\mathbb{M}$. Each memory address has access permissions assigned to each entity. Given a memory address $m \in \mathbb{M}$, the permission function $P(m)$ returns the set of entities that can access $m$. That is, if $e \in P(m)$, then $e$ has read/write/execute permissions to $m$.

In this model only, memory refers not just to DRAM memory rows but also to other types of high-speed memory (e.g., cache, registers, the DRAM row buffer, etc.). Generally, a memory access starts by searching the content in the fastest memory hardware first (e.g., registers); if the search fails, it proceeds to slower memory hardware, down to the slowest DRAM rows. The validity function $V(m)$ indicates whether $m$ contains valid contents (i.e., $V(m) = 1$) or not (i.e., $V(m) = 0$) to satisfy the search. The time function $T(e, m)$ returns the time latency taken by entity $e$ to access $m$.

A memory access from one entity may trigger other entities to perform subsequent memory accesses to complete a computing task. For instance, when a regular user initiates a memory access, it can trigger the processor to access page-table entries. Such situations are modeled by the directed memory graph defined below.

Definition 1 (Directed Memory Graph).

A directed memory graph (e.g., Figure 2) is a pair $G = (\mathbb{M}, \mathbb{D})$, where the memory addresses in $\mathbb{M}$ constitute the nodes of $G$, and $\mathbb{D}$ contains all the directed edges. A directed edge in $\mathbb{D}$ is represented by a quintuple such as $(m_i, e_i, m_j, e_j, t_{ij})$ in Figure 2, where $m_i, m_j \in \mathbb{M}$ and $e_i, e_j \in \mathbb{E}$, respectively. An edge has the following semantics:

  • $e_i \in P(m_i)$ and $e_j \in P(m_j)$, and

  • an access to $m_i$ by $e_i$ can potentially trigger $e_j$ to access $m_j$ within time $t_{ij}$.

Note that the time $t_{ij}$ covers triggering $e_j$ and then $e_j$ accessing $m_j$. As such, $t_{ij}$ should be greater than $T(e_j, m_j)$, given the time taken by the trigger. Since memory addresses in this model have different memory types, there exist other edges starting from $m_i$, such as $(m_i, e_i, m_k, e_k, t_{ik})$. Which edge is taken at runtime depends strongly on the edge's time: intuitively, the edge with the shorter time has the higher chance of being selected. Take the last-level cache as an example: it is shared between all the cores of the processor and partitioned into multiple slices (one per core), and each core will access its own slice rather than the others, since accessing its own slice is faster.

To exploit the rowhammer bug, an attacker must hammer a node (e.g., $m_h$ in Figure 2) that is located in a DRAM row, rather than other nodes (e.g., a copy in the cache). As such, the attacker must select the corresponding edge at runtime, and we call such an edge a memory access edge.

Definition 2 (Memory Access Edge).

An edge $(m_i, e_i, m_j, e_j, t_{ij})$ with $V(m_j) = 1$ is defined as a memory access edge, denoted by $m_i \rightarrow m_j$, if every other edge $(m_i, e_i, m_k, e_k, t_{ik})$ ($k \neq j$) satisfies the following requirements:

  • $V(m_k) = 0$, or,

  • $t_{ik} > t_{ij}$.

If $V(m_k) = 0$, such nodes are printed with a dashed circle in Figure 2; they do not contain valid content, so their edges will not be taken. Likewise, if an edge (e.g., $(m_i, e_i, m_k, e_k, t_{ik})$) takes a longer time, it will not be taken either. As such, an entity can specify the memory access to $m_j$ at runtime by setting $V(m_k) = 0$ whenever $t_{ik} < t_{ij}$.

For instance, let $m_k$ and $m_j$ be within the cache and a DRAM row, respectively, holding the same valid data. If an attacker $\alpha$ wants to hammer $m_j$ frequently to trigger the rowhammer bug, it must specify the access to $m_j$ every time it performs hammering. To this end, $\alpha$ can invoke the clflush instruction to flush $m_k$. Alternatively, $\alpha$ can achieve the same goal by leveraging the cache eviction policy [1, 13].

However, if both $m_k$ and $m_j$ store the same page-table entry while $\alpha$ has no access permissions to them, it becomes quite challenging to set $V(m_k) = 0$. Besides, the time to set $V(m_k) = 0$ must be as low as possible, because hammering requires a high frequency. We use a function $T_s(m_j)$ to denote the time cost of specifying the access to $m_j$.

To overcome the aforementioned challenges, we build a minimal eviction set to evict the page-table entry from $m_k$ and make $V(m_k) = 0$. When $m_k$ is invalid, an access to $m_0$ in Figure 2 can trigger subsequent accesses that eventually reach $m_j$, building up a communication path. Formally, the communication path is defined below.

Definition 3 (Communication Path).

As shown in Figure 2, $m_0, m_1, \dots, m_n \in \mathbb{M}$ ($n \geq 1$). A communication path $p$ is a sequence of memory access edges ($d_1, \dots, d_n$) over a sequence of distinct nodes ($m_0, \dots, m_n$) such that $d_i = m_{i-1} \rightarrow m_i$ for $i = 1, \dots, n$.

Given the path $p$, we use $d_n$ to denote the last memory access edge in the path. Let $p = p' \,\|\, d_n$. Then $p'$ is the subpath of $p$ excluding $d_n$; that is, the concatenation of $p'$ and $d_n$ is $p$. For a path $p$, its time latency is denoted by $T(p)$.

Definition 4 (Communication Latency).

Let $p$ be a communication path, $p = p' \,\|\, d_n$, and let $t_n$ denote the time of the edge $d_n$. Then, $T(p)$ is defined as:

  • $T(p) = t_n$, if $n = 1$; otherwise,

  • $T(p) = T(p') + t_n$.

Note that when $n = 1$, $p$ and $d_n$ are the same, i.e., $p$ consists of only one memory access edge.
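Definitions 3–4 amount to summing edge latencies along a path. The toy encoding below uses illustrative tuple fields (source, entity, destination, entity, time), an assumption for this sketch rather than the paper's exact notation:

```python
def latency(path):
    """T(p): recursive communication latency of Definition 4."""
    *prefix, last = path
    t_n = last[4]                      # edge time t_n
    if not prefix:                     # n == 1: single edge
        return t_n
    return latency(prefix) + t_n       # T(p) = T(p') + t_n

# Path m0 -> m1 -> m2 -> m3, e.g. a user access triggering a
# page-table walk that finally touches a PTE in DRAM.
p = [("m0", "cpu", "m1", "cpu", 10),
     ("m1", "cpu", "m2", "cpu", 40),
     ("m2", "cpu", "m3", "mem", 200)]
assert latency(p) == 250
```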

When a hammer row is being hammered, the rowhammer bug can affect either the hammer row itself [21] or a victim row that is multiple rows away (within the same DRAM bank) from the hammer row [18, 37]. As such, $r_{max}$ indicates the maximum row distance between a hammer row and a victim row. The DRAM row-index function $R(m)$ returns the row index of node $m$ in DRAM if the node is within a DRAM row, or $\perp$ otherwise.

As a minimum hammer frequency is needed to successfully hammer a DRAM row, we use $t_{max}$ to represent the maximum allowed time latency for one hammering.

Note that both $r_{max}$ and $t_{max}$ are decided not only by the DRAM module itself but also by the rowhammer technique (e.g., single-sided hammering).

Besides, a sensitivity function $S(m)$ returns $1$ if $m$ contains critical information (e.g., a page table or a cryptographic key), and $0$ otherwise. When $S(m) = 1$ and $m$ suffers from the rowhammer bug, an attacker is able to gain privilege escalation, steal private data, etc.

To this end, the following defines the necessary conditions for a TeleHammer based exploit.

Definition 5 (TeleHammer).

Let $G$ be the directed memory graph of a computing task conducted by an attack process $\alpha$, exemplified in Figure 2, where $m_0$, $m_h$ and $m_v$ ($m_h, m_v \in \mathbb{M}$) represent an attack address, a hammer address and a victim address, respectively. $\alpha$ can launch a TeleHammer-based exploit if the conditions below are satisfied:

  • $\alpha \in P(m_0)$, $\alpha \notin P(m_h) \cup P(m_v)$,

  • $S(m_v) = 1$ and $|R(m_h) - R(m_v)| \leq r_{max}$,

  • $T_s(m_h) + T(\alpha, m_0) + T(p) + T_w \leq t_{max}$ in $G$,

As shown in Figure 2, modern hardware expects to take the fastest path to handle the computing task for the attack process $\alpha$, e.g., $m_0 \rightarrow m_k$. However, the path must be changed to $p$ so as to hammer $m_h$ by using a benign entity. As such, $\alpha$ must set nodes such as $m_k$ to invalid, since accessing such nodes takes a shorter time; the time taken by this setting is $T_s(m_h)$. To successfully hammer once, $\alpha$ must also consider the time to access $m_0$ (i.e., $T(\alpha, m_0)$), the time to walk through the path (i.e., $T(p)$), and the time $\alpha$ has to wait before performing the next hammering (i.e., $T_w$). Thus, the sum of all the aforementioned time costs must be no greater than $t_{max}$, as shown in the last condition.

When $m_0$ and $m_h$ refer to the same memory address, i.e., $m_0 = m_h$, TeleHammer has access to $m_h$ and actually becomes PeriHammer. As such, we can also formally define PeriHammer based on the above formal model.

Definition 6 (PeriHammer).

PeriHammer would succeed if the following conditions are met:

  • $\alpha \in P(m_h)$, $\alpha \notin P(m_v)$,

  • $S(m_v) = 1$ and $|R(m_h) - R(m_v)| \leq r_{max}$,

  • $T_s(m_h) + T(\alpha, m_h) + T_w \leq t_{max}$,

Clearly, the last condition removes the latency caused by the path $p$, making each hammering faster. Besides, it is much easier for $\alpha$ to specify the access to $m_h$ than to other nodes, and doing so costs much less time than $T_s(m_h)$ in TeleHammer, as discussed in Definition 5.

Note that the value of $T_w$ in Definition 5 depends on whether $m_h$ is the last memory node that the computing task needs to access. If $m_h$ is $m_n$, as shown in Figure 2, then $T_w$ is negligible; otherwise, $\alpha$ has to wait for $T_w$ while the task reaches the last node $m_n$ from $m_h$. For $T_w$ in Definition 6, it is negligible, since $m_h$ is the only accessed node and $\alpha$ does not have to wait before the next hammering.
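The time conditions of the two definitions differ only by the path-walk term that PeriHammer avoids; a toy numeric check (all latencies are invented illustration values, in nanoseconds):

```python
# Per-hammer time budgets of Definitions 5 and 6 (toy numbers,
# purely illustrative, not measured values from the paper).
t_max = 500          # max latency per hammer that still flips bits
T_s, T_access, T_p, T_w = 120, 10, 250, 0

tele_hammer_time = T_s + T_access + T_p + T_w   # Definition 5
peri_hammer_time = T_s + T_access + T_w         # Definition 6: no T(p)

assert tele_hammer_time <= t_max                # TeleHammer still viable
assert peri_hammer_time < tele_hammer_time      # but PeriHammer is faster
```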

Fig. 2: Formal modeling of TeleHammer and PeriHammer. TeleHammer specifies the path from the initial memory access $m_0$ to the last node so as to hammer $m_h$ indirectly. When $m_0$ and $m_h$ refer to the same node, i.e., $m_0 = m_h$, TeleHammer can hammer $m_h$ directly and thus becomes PeriHammer. (Node $m_k$ is drawn with a dashed circle, meaning that the edge to $m_k$ has a lower latency; $\alpha$ thus has to set $m_k$ to invalid to specify the memory access edge towards $m_h$.)

A comparison of TeleHammer and PeriHammer: as shown in Figure 2, TeleHammer is effective against rowhammer defenses where $m_h$ is located in a physically isolated DRAM partition, since it requires no access to $m_h$.

On top of that, TeleHammer is stealthy and hard to trace by dynamic analysis at runtime, since it has a complicated communication path and hammers $m_h$ by using a benign entity. In contrast, a PeriHammer attacker hammers $m_h$ directly by herself.

Besides, mitigating TeleHammer is challenging due to the abundance of candidate communication paths. TeleHammer can identify as many paths as possible by leveraging built-in features of modern hardware and/or software; thus, eliminating the particular communication path we identify in the following sections cannot, by itself, defend against TeleHammer.

Clearly, TeleHammer is slower than PeriHammer, as a comparison of the time conditions in their respective definitions shows, indicating that PeriHammer is faster at inducing bit flips.

To demonstrate the practicality of TeleHammer, we have created an instance of it, called PThammer. We discuss other possible instances in detail in Section V.

III-C PThammer: Page-table based TeleHammer

PThammer is a page-table-based instance of TeleHammer. It allows an unprivileged attacker to hammer page tables by exploiting the processor, resulting in bit flips in other page tables.

In the following, we discuss how a PThammer exploit satisfies the formal conditions specified in Definition 5.

III-C1 How PThammer Satisfies Definition 5

First, page tables are critical to memory isolation and are inaccessible to an unprivileged attacker. If the attacker can compromise a memory address hosting page tables through the rowhammer effect, the first condition is satisfied; here, the addresses in question are $m_h$ and $m_v$.

Second, page tables are common and can be widely distributed in modern OS kernels. Thus, both $m_h$ and $m_v$ can be kernel addresses hosting page tables; that is, hammering the page tables at $m_h$ will flip bits in the page tables at $m_v$. To this end, we can leverage previous techniques such as memory spraying [7] and memory ambush [33] to force the kernel to create a large number of page-table pages, in the hope that some page tables are placed at hammer addresses like $m_h$ while others are within victim addresses like $m_v$. As such, we can create numerous pairs of hammer and victim addresses, making exploitable bit flips in page tables highly likely. Note that the rowhammer defense CTA [36] allocates all page tables from a reserved memory partition, which greatly increases the number of such pairs compared to page-table allocation from the whole of system memory.

Third, there must exist a communication path $p$ (see Definition 5) that allows an attacker to indirectly access page tables. To this end, we observe that a least-privileged memory access triggers an address translation in which the processor may access page tables in memory. When a user allocates a virtual memory page by malloc and then accesses the page for the first time, an address-translation process occurs: the processor performs a multi-level page-table walk, populates the corresponding page-table entries (PTEs), and allocates a physical memory page for the user. To speed up subsequent memory accesses, as shown in Figure 3, the Translation-Lookaside Buffer (TLB) stores the complete address mapping from a virtual address to a physical address, while the paging-structure caches store partial address mappings for the different page-table levels [2]. For instance, the Level-2 PD paging-structure cache translates a virtual address to the physical address of a Level-1 PT; with bits 12–20 of the virtual address [15], the physical address of the corresponding Level-1 PTE can be obtained. The CPU cache holds copies of the accessed four levels of PTEs from memory. The processor searches these hardware structures in order of priority to find a matching physical address. If the TLB, the paging-structure caches and the cache are all effectively flushed, the processor has to access the four levels of PTEs from memory. As such, an access to an address by $\alpha$ can trigger the processor to access the four levels of PTEs in memory, provided the flushing is conducted effectively.

Last, as the hammered address can reside in the cache, the time to access it is negligible. To meet the time condition in the definition, the time to walk through the identified path, the time to specify the path, and the time to wait for the next hammering must all be as low as possible.

Optimize the path-walk and wait times: we optimize the identified communication path by making the hammer address host a Level-1 PTE rather than a PTE of another level, shown as a solid line with an arrow in Figure 3. The path is optimized for the following reasons.

  • flushing the paging-structure caches is required when the hammer address hosts a PTE of another level. Directly flushing the paging-structure caches requires executing a privileged instruction such as invlpg, while indirect flushing requires reverse-engineering the mapping between a virtual address and the paging-structure index.

  • flushing PTEs of all levels (or of another single level) from the paging-structure caches and the CPU cache is intuitively more time-consuming than flushing only the Level-1 PTE, as shown in Figure 3.

  • specifying such a path consumes much less memory, making the exploit stealthier. As mentioned above, we need to allocate many page-table pages to flip exploitable bits in page tables. Creating one full PT-level page of 512 entries requires exhausting 2 MiB of memory; creating a full page at a higher page-table level requires much more memory. For example, creating a full PD-level page requires 1 GiB of memory.

  • compared to PTEs of other levels, a bit flip in a Level-1 PTE is more easily turned into an exploit for privilege escalation, since the Level-1 PTE determines the physical memory page that a user can access.

As the address hosting the Level-1 PTE is the last memory address accessed when translating a virtual address, the path-walk time is minimized.
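The memory costs quoted above follow directly from x86-64 page-table geometry (512 eight-byte entries per table, 4 KiB base pages); a quick sanity check:

```python
# Sanity-check of the spraying costs quoted above, using x86-64
# page-table geometry (512 entries per table, 4 KiB pages).
ENTRIES_PER_TABLE = 512
PAGE_SIZE = 4 * 1024

# Filling one Level-1 PT: 512 PTEs, each mapping one 4 KiB page.
pt_cost = ENTRIES_PER_TABLE * PAGE_SIZE   # 2 MiB
# Filling one Level-2 PD: 512 PDEs, each pointing to a full PT.
pd_cost = ENTRIES_PER_TABLE * pt_cost     # 1 GiB
```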

Optimize the path-specification time: to specify the path to the Level-1 PTE (shown as the red solid line in Figure 3), we only need to flush the address mapping from the TLB and the Level-1 PTE from the cache, while the PD-level paging-structure cache entry remains effective.

Intuitively, we could simply invoke invlpg to flush the whole TLB. As for the cache flush, we could perform a page-table walk, obtain the virtual address of the Level-1 PTE and flush its content from the cache by invoking clflush. By doing so, we would flush both TLB and cache as quickly as possible; however, kernel privilege is required for these operations. Alternatively, we can perform the flushing indirectly by manipulating the cache and TLB replacement states: as the sizes of TLB and cache are limited, we can create many pages as an eviction buffer and access them one by one to evict the target TLB entry and cache line. Although this approach effectively flushes both TLB and cache, it does not reduce the flushing time to its minimum.

In a nutshell, the key challenge in minimizing the flushing time is to determine two minimal eviction sets that flush the targeted TLB entry and cache line both effectively and efficiently.

Fig. 3: Address Translation. A solid line with an arrow indicates the fastest communication path that PThammer identifies to hammer a Level-1 page-table entry (PTE). When specifying the path, PThammer only flushes the TLB and the cache while keeping the paging-structure caches at all levels effective. Note that PML4E, PDPTE and PDE are the PTEs of the other three levels, respectively.

III-C2 Effective and Efficient TLB Flush

as Gras et al. [10] have revealed that there exists an explicit mapping between a virtual page number and the multi-level TLB sets, we simply create an initial eviction set containing multiple (physical) pages to flush a cached virtual address from the TLB. One subset of the pages is congruent, mapping to the same L1 dTLB set, while the other is congruent, mapping to the same L2 sTLB set, if the TLB applies a non-inclusive policy.

Take one of our test machines, a Lenovo ThinkPad T420, as an example: both its L1 dTLB and L2 sTLB are set-associative, so intuitively a number of (physical) pages equal to the combined wayness should be enough as a minimal eviction set to evict a target virtual address from the TLB. However, when we create such an eviction set and then profile the access latency of a target virtual address, the latency remains quite unstable.

To collect finer-grained information on TLB misses induced by the target address, we develop a kernel module that uses Intel Performance Counters (PMCs) to monitor a performance event related to TLB misses, namely dtlb_load_misses.miss_causes_a_walk. The experimental results show that TLB misses at both levels do not always occur during the aforementioned profiling, meaning that the target address has not always been evicted by the eviction set, rendering the TLB flush ineffective. This is probably because the TLB eviction policy is not true Least Recently Used (LRU).

How to Decide the Minimal Size of a TLB Eviction Set: to this end, we propose Algorithm 1, which decides a minimal size without knowing the eviction policy. Note that the minimal size is used to construct a minimal TLB eviction set in PThammer, while PThammer itself does not run the algorithm. Specifically, lines 2 to 8 define a function that reports the number of TLB misses induced by accessing the target virtual address: the function argument, a candidate eviction set, is write-accessed (lines 4–6) to flush the target address cached in the TLB, and the miss count of then write-accessing the target is reported in line 7. From a pre-allocated buffer, we select all pages that are indexed to the same TLB set as the target by leveraging the reverse-engineered mapping [10] (lines 9–14). Note that the buffer is large enough to effectively flush any targeted virtual address; its size is decided by the number of TLB entries that serve 4 KiB-page translation if the buffer is allocated from a 4 KiB-page list, otherwise the number of TLB entries that support 2 MiB or 1 GiB pages must be considered. The selected pages are populated and added into the eviction set (lines 10–13); populating them is necessary to trigger the address-translation feature so that the TLB caches the corresponding mappings. In line 15, we obtain a threshold for effective TLB flushes. We then tailor the set down to its minimum while retaining its effectiveness (lines 16–23).

1  Initially: vaddr is a page-aligned virtual address whose cached TLB entry needs to be flushed. A buffer (buf) is pre-allocated, the size of which is decided by the available TLB entries. An eviction set (evset) is initialized to empty. A unique number is assigned to vaddr.
2  Function TLBMissNum(set)
3         miss_num ← 0
4         foreach page in set do
5                write-access page
6         end foreach
7         miss_num is decided by write-accessing vaddr
8         return miss_num
9  foreach page in buf do
10        if page maps to the same TLB set as vaddr then
11               populate page
12               add page into evset
13        end if
14 end foreach
15 threshold ← TLBMissNum(evset)
16 while evset is not empty do
17        take one page out of evset
18        if TLBMissNum(evset) < threshold then
19               put the page back into evset
20               break
21        end if
22 end while
23 return the size of evset
Algorithm 1 Decide a minimal eviction-set size for TLB
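To illustrate how Algorithm 1 behaves, the following toy simulation shrinks an eviction set until the target stops being evicted. A fully-associative TLB with FIFO replacement stands in for the undocumented real policy, and all names are illustrative, not from the paper.

```python
class ToyTLB:
    """Toy fully-associative, W-entry TLB with FIFO replacement
    (a stand-in for the undocumented real eviction policy)."""
    def __init__(self, ways):
        self.ways, self.fifo = ways, []

    def access(self, page):
        if page in self.fifo:
            return 0                 # hit
        if len(self.fifo) == self.ways:
            self.fifo.pop(0)         # evict the oldest entry
        self.fifo.append(page)
        return 1                     # miss -> page-table walk

def tlb_miss_num(tlb, target, evset):
    """Algorithm 1's helper: access the eviction set, then report
    whether accessing the target misses (i.e., it was evicted)."""
    for p in evset:
        tlb.access(p)
    return tlb.access(target)

def minimal_evset_size(ways, pool_size):
    tlb, target = ToyTLB(ways), 0
    evset = list(range(1, pool_size + 1))   # pages congruent with target
    tlb.access(target)                      # target cached in the TLB
    threshold = tlb_miss_num(tlb, target, evset)  # large set: a miss
    while evset:
        candidate = evset.pop()             # tentatively shrink the set
        if tlb_miss_num(tlb, target, evset) < threshold:
            evset.append(candidate)         # too small: put the page back
            break
    return len(evset)
```

With this toy policy, the algorithm converges to the TLB's wayness; on real hardware with an unknown policy, the same shrink-and-verify loop finds whatever size the policy actually requires.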

III-C3 Effective and Efficient Cache Flush

now we are going to flush a cached Level-1 PTE (L1PTE) that corresponds to a target virtual address. Considering that the last-level cache (LLC) is inclusive [15], we target flushing the L1PTE from the LLC such that the L1PTE is also flushed from the L1 and L2 caches (we thus use cache and LLC interchangeably in the following section). In contrast to the TLB, which is indexed by a virtual page-frame number, the LLC is indexed by physical-address bits; the mapping from physical address to LLC slice and set has also been reverse-engineered [14, 26, 16]. Based on this mapping, we can intuitively create an eviction set consisting of many congruent memory lines (i.e., cache-line-aligned virtual addresses) that map to the same cache slice and cache set as the L1PTE. On top of that, the eviction set can also be minimized in case the eviction policy of the LLC is not publicly documented.

How to Decide the Minimal Size of a Cache Eviction Set: we extend the aforementioned kernel module to count the event of last-level cache misses (i.e., longest_lat_cache.miss) and use an algorithm similar to Algorithm 1 to decide the minimal size of a cache eviction set: construct a large enough eviction set congruent with a target virtual address, obtain a threshold for the number of cache misses induced by accessing the target address, then remove memory lines from the set one by one and verify whether the currently induced cache-miss count is below the threshold. If it is, the minimal size has been found. Note that this algorithm is also performed in an offline phase, long before PThammer is launched.

Although the eviction-set size is determined ahead of time, PThammer under our threat model cannot know the mapping between virtual and physical addresses, making it challenging to construct an eviction set for an arbitrary target virtual address during execution. Also, PThammer cannot obtain the L1PTE’s physical address, so it is difficult to learn the L1PTE’s exact location (e.g., cache set and cache slice) in the LLC. To address these two problems, PThammer first constructs a complete pool of eviction sets that can be used to flush any target data object, including the L1PTE; it then selects an eviction set from the pool to evict a target L1PTE without knowing its cache location.

How to Construct a Complete Pool of Cache Eviction Sets: the pool contains a large enough number of eviction sets, each usable to flush a memory line from a specific cache set within a cache slice. The size of each eviction set is the pre-determined minimum size. We implement the construction based on previous works [25, 9].

If a target system enables superpages, a virtual address and its corresponding physical address share the same least significant 21 bits. Thus, if we know a virtual address within a pre-allocated superpage, bits 0–20 of its physical address are leaked, and we know the cache-set index that the virtual address maps to (see Section II-A). The only unknown is the cache-slice index. Based on a past algorithm [25], we allocate a large enough memory buffer (e.g., twice the size of the LLC), select memory lines from the buffer that have the same cache-set index, and group them into different eviction sets, one per cache slice.
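A sketch of the superpage-enabled case: since bits 0–20 are shared between virtual and physical addresses, the set index computed from the virtual address matches the physical one. The LLC geometry below (64-byte lines, 2048 sets per slice) is an assumption for illustration, not the T420's measured parameters.

```python
# Illustrative LLC geometry (assumptions, not measured values).
LINE_BITS = 6    # 64-byte cache lines
SETS = 2048      # sets per slice -> set index uses address bits 6-16

def llc_set_index(addr: int) -> int:
    """Cache-set index derived from address bits 6-16."""
    return (addr >> LINE_BITS) % SETS

# With a 2 MiB superpage, VA and PA agree on bits 0-20, which cover
# the whole set-index field, so the VA alone reveals the set index.
va = 0x001FF1C0
pa = 0x9A00000000 | (va & 0x1FFFFF)   # same low 21 bits, arbitrary frame
```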

If superpages are disabled, only the least significant 12 bits (i.e., the 4 KiB-page offset) are shared between virtual and physical addresses, and consequently we know only a partial cache-set index (i.e., bits 6–11). As such, we utilize another previous work [9] to group potentially congruent memory lines into a complete pool of individual eviction sets. Compared to the grouping above, this process is much slower, since many more memory lines share the same partial cache-set bits than share the complete bits.

1  Initially: vaddr is an allocated, page-aligned virtual address whose L1PTE cache line needs to be flushed. pool is a complete pool of individual eviction sets. offset is the page offset of vaddr's L1PTE. max_lat is initialized to 0 and records the maximum latency induced by accessing vaddr. target_set represents the eviction set used for the L1PTE cache flush.
2  Function ProbeLatency(evset)
3         foreach line in evset do
4                read-access line
5         end foreach
6         flush the target TLB entry
7         latency is decided by accessing vaddr
8         return latency
9  foreach evset in pool do
10        obtain the page offset from the first memory line in evset
11        if the page offset equals offset then
12               lat ← ProbeLatency(evset)
13               if lat > max_lat then
14                      max_lat ← lat
15                      target_set ← evset
16               end if
17        end if
18 end foreach
19 return target_set
Algorithm 2 Select a minimal cache eviction set

How to Select a Target Cache Eviction Set: after completing the pool construction, we develop Algorithm 2 to select an eviction set from the pool and evict the L1PTE corresponding to a target address.

In line 9, we enumerate all the eviction sets in the pool and then collect those sets that have the same page offset as the L1PTE in line 11. This collection policy is based on an interesting property of the cache. Oren et al. [29] report that if the first memory lines of two different physical memory pages map to the same LLC cache set, then the remaining memory lines of the two pages also pairwise share (different) cache sets. This means that if we request many (physical) memory lines that have the same page offset as the L1PTE and access each memory line, we can flush the L1PTE from the LLC.

After the collection, lines 12–16 select the target eviction set from the collected ones. In line 12, we profile every collected eviction set through the function predefined in lines 2–8. Within this function, we read-access each memory line of one eviction set, which implicitly flushes the L1PTE from the cache if the eviction set is congruent with the L1PTE, and then flush the target TLB entry related to the target address to make sure that the subsequent address translation accesses the L1PTE. At last, we measure the latency induced by accessing the target address. Based on this function, we can find the targeted eviction set as the one causing the maximum latency (lines 13–16), since fetching the L1PTE from DRAM is time-consuming when accessing the target address triggers the address translation in line 7. Given that the cache is shared between page-table entries and user data, we must carefully set the target address to be page-aligned, so that its page offset is 0 and thus different from the L1PTE's. As such, they are placed into different cache sets, and the selected eviction set is ensured to flush the target L1PTE rather than the target address itself.
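The selection logic can be sketched as follows. This is a toy model with made-up latency constants; probe_latency abstracts the real timing measurement performed by lines 2–8 of Algorithm 2.

```python
# Made-up cycle counts for illustration: a cached access vs. a DRAM
# fetch of the L1PTE during address translation.
CACHED_LAT, DRAM_LAT = 40, 300

def probe_latency(evset, pte_set_index):
    """Stand-in for Algorithm 2's timing probe: if the eviction set is
    congruent with the L1PTE, the PTE is flushed and the next
    translation must fetch it from DRAM (high latency)."""
    return DRAM_LAT if evset["set_index"] == pte_set_index else CACHED_LAT

def select_evset(pool, pte_offset, pte_set_index):
    """Filter by page offset (line 11), then keep the set that induces
    the maximum probe latency (lines 13-16)."""
    best, best_lat = None, 0
    for ev in pool:
        if ev["page_offset"] != pte_offset:
            continue
        lat = probe_latency(ev, pte_set_index)
        if lat > best_lat:
            best_lat, best = lat, ev
    return best
```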

IV Evaluation

(a) The TLB miss rate remains quite stable (i.e., 100%) as the TLB eviction set shrinks down to 12 pages, and then decreases dramatically when the set is reduced further.
(b) The cache miss rate is no less than 90% until the eviction set shrinks to 12 memory lines, but decreases substantially to 62% when the set becomes smaller.
Fig. 4: Effectiveness of TLB and Cache Flush.

In this section, we test PThammer on a Lenovo ThinkPad T420 with an Intel i5-2540M processor and 8 GiB of Samsung DDR3 memory. The operating system is Ubuntu 16.04 LTS for x86-64 running a 4.8.0-generic Linux kernel. By default, the system disables the superpage feature. Whether or not the feature is enabled, we observe the first cross-boundary bit flips on the test machine within an hour. We then leverage PThammer to compromise state-of-the-art rowhammer defenses under the default system setting.

System Setting       Evic. Pool Construction    Evic. Set Selection
                     TLB        LLC             TLB        LLC
superpage enabled    11 ms      0.3 min         1 µs       285 ms
superpage disabled   11 ms      18 min          1 µs       283 ms
TABLE I: Time costs for eviction-pool construction and eviction-set selection. Note that we perform the pool construction only once at the beginning of PThammer and then select TLB and cache eviction sets to perform double-sided rowhammer.

IV-A PThammer

We first decide the minimal eviction-set size to effectively and efficiently flush TLB and last-level cache (LLC) at an offline stage. Based on the pre-determined size, we can dynamically construct a minimal TLB or LLC eviction set from a complete pool of TLB or LLC eviction sets, and corresponding time costs are presented in Table I.

IV-A1 Decide Respective Minimal Eviction-Set Size

Based on Algorithm 1 in Section III-C2, we first obtain an initial eviction set whose page count is twice the combined wayness of the L1 dTLB and L2 sTLB, with each page mapped to the same L1 dTLB set or L2 sTLB set as a target page-aligned virtual address. We then remove one page from the set at a time and check the TLB miss rate of the target virtual address, as shown in Figure 4(a). As we can see from the figure, the TLB miss rate initially remains quite stable (nearly 100%) as the eviction-set size drops one by one down to 12, and thereafter decreases dramatically. Clearly, 12 is the minimal size.

For the LLC, our test machine has a 12-way LLC, and each initial eviction set is set to 24 memory lines that map to the same LLC set as a target virtual address. Similar to the TLB case, memory lines in the eviction set are removed one by one, and the LLC miss rate for each removal is shown in Figure 4(b). Clearly, the LLC miss rate remains quite stable (almost 100%) until the set size reaches 16, and then decreases gradually, dropping below 90% after 12. For the sake of both effectiveness and efficiency, we choose 12 as the minimal size and are still able to induce bit flips using PThammer.

IV-A2 Prepare Respective Minimal Eviction Set

As mentioned in Section III-C2 and Section III-C3, preparing a minimal eviction set for either TLB or LLC consists of constructing a complete pool and selecting a target eviction set; the corresponding time costs are displayed in Table I.

For the TLB, we allocate a complete pool of 4 KiB pages whose page count is twice the number of L1 dTLB and L2 sTLB entries that support 4 KiB pages, since we target a virtual address that requires a four-level page-table walk. As can be seen from the table, construction and selection are fast in both settings.

For the LLC, we construct a complete pool of either 2 MiB pages (superpage enabled) or 4 KiB pages (superpage disabled); its size in both settings is twice the size of the LLC. As the cache-set bits are fully known in the 2 MiB setting, the eviction-pool construction is much faster (0.3 minutes) than in the 4 KiB setting (18 minutes). The number of eviction sets in each pool is almost the same as the number of LLC sets, making the selection efficiency similar in the two settings.

Fig. 5: As the time cost of one double-sided rowhammer round increases, the time to find the first bit flip also grows. When the time cost per hammer exceeds 1500 cycles, no bit flip is observed within 3 hours.

IV-A3 Double-sided PThammer

As mentioned in Section III-C, the time cost of each hammer must be no greater than the maximum latency allowed to induce bit flips. Given that double-sided hammering is the most efficient way to flip bits, we first determine the maximum latency of the machine by applying a previously published tool (https://github.com/google/rowhammer-test).

The tool embeds clflush instructions inside one round of double-sided hammering, which is the most efficient (only around 300 cycles) and effective (a 100% cache-miss rate per round) way to flush the CPU caches. To increase the time cost of each hammer round, we add a certain number of NOP instructions before the clflush instructions, producing the results in Figure 5. As shown in the figure, the time until the first bit flip occurs grows gradually as the time per hammer increases to around 1100 cycles, and thereafter increases significantly until the hammer cost reaches 1500 cycles. We cannot observe any bit flip within 3 hours once the cost exceeds 1500 cycles, and thus we use 1500 cycles as the maximum cost permitted to flip bits.

We then check whether the time cost of each hammer meets the allowed latency. Each double-sided PThammer round accesses two user virtual addresses as well as their respective TLB eviction sets (i.e., 24 virtual addresses) and cache eviction sets (i.e., 24 virtual addresses). In both system settings, we conduct double-sided PThammer for a thousand rounds and measure the time each round takes. The results show that every round takes much less than the maximum latency, which also indicates that most of the address accesses in each hammer are served by the CPU caches rather than DRAM.

To perform successful double-sided PThammer, we also need to select user virtual addresses such that their related Level-1 PTEs are in the same DRAM bank and one row apart. However, the physical address of each PTE is required to determine its location in DRAM, and as we have no access to the kernel space, we cannot obtain the physical address of a page-table page. To address this problem, we leverage the technique of Cheng et al. [7] to force the allocation of physically consecutive pages for Level-1 page tables. Combining this consecutive placement with the DRAM row size, we can select user addresses that meet the above requirement.
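Under these assumptions, the address selection reduces to simple arithmetic. The 8 KiB row size and the base address below are illustrative values; the paper does not state the machine's actual DRAM geometry.

```python
# Illustrative geometry: 8 KiB DRAM rows and physically consecutive
# Level-1 page-table pages starting at pt_phys_base (both assumptions).
ROW_SIZE = 8 * 1024
PTE_SIZE = 8
# User VA span whose Level-1 PTEs fill exactly one DRAM row:
VA_PER_ROW = (ROW_SIZE // PTE_SIZE) * 4096   # 1024 PTEs x 4 KiB pages

def pte_row(va: int, pt_phys_base: int) -> int:
    """DRAM row index of the Level-1 PTE backing the user page of va,
    assuming consecutive physical placement of the page tables."""
    pte_addr = pt_phys_base + (va >> 12) * PTE_SIZE
    return pte_addr // ROW_SIZE

# Two user addresses 2*VA_PER_ROW apart have PTEs two rows apart,
# leaving exactly one victim row of PTEs between the two hammer rows.
```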

To this end, we run PThammer three times in each system setting; the corresponding time costs are displayed in Table II. Due to the time cost of constructing a complete pool for the LLC, the average total cost in the superpage-disabled setting is about three times that of the other setting, but we observe the first bit flip in a Level-1 PTE within 30 minutes in both settings.

System Setting       Run #                 Hammer Cost Until First Flip   Total Cost
superpage enabled    run 1                 8 min                          9 min
                     run 2                 17 min                         18 min
                     run 3                 4 min                          5 min
                     averaged (all runs)   10 min                         11 min
superpage disabled   run 1                 10 min                         29 min
                     run 2                 15 min                         33 min
                     run 3                 6 min                          25 min
                     averaged (all runs)   10 min                         30 min
TABLE II: Time costs of double-sided PThammer for three runs in each system setting. On average, both settings observe the first bit flip within 10 minutes of hammering and complete PThammer within 30 minutes.

IV-B Defeat the state-of-the-art software-only defenses

To defend against rowhammer attacks, numerous software-only defenses have been proposed. Among them, CATT [6], CTA [36] and RIP-RH [4] are practical mitigations for existing rowhammer attacks in bare-metal systems. Note that RIP-RH [4] enforces DRAM-based process isolation and thus prevents attackers from hammering target user processes; however, it does not protect the kernel and its page tables. Clearly, PThammer can defeat it by inducing rowhammer bit flips in a Level-1 PTE and gaining kernel privilege. In this section, we demonstrate proof-of-concept attacks against CATT [6] and CTA [36], respectively, in the default system setting.

Compromise CATT [6]: CATT partitions each DRAM bank into a kernel part and a user part, separated by at least one unused row. When a physical-memory request is initiated, CATT allocates memory from either the kernel part or the user part according to the intended use of the memory. By doing so, CATT confines bit flips induced by the user domain to its own partition and thereby prevents rowhammer attacks from affecting the kernel domain, so-called physical kernel isolation.

Essentially, CATT prevents PeriHammer-based rowhammer attacks from positioning attacker-accessible memory adjacent to vulnerable but critical kernel memory. However, we are still able to hammer kernel memory indirectly from the user domain by leveraging PThammer and thus induce exploitable bit flips in page-table entries. The attack mainly consists of the following four steps:

  1. Rely on past works [7, 35, 21] to allocate consecutive DRAM rows for page-table pages;

  2. Perform double-sided PThammer by using a pair of selected user virtual addresses;

  3. Verify whether “exploitable” bit flips have occurred by checking if a virtual address points to a page-table page. If not, go to step 2 to restart PThammer;

  4. If yes, we have gained kernel privilege, and we can gain root privilege by changing the uid of the current process to 0.

Compromise CTA [36]: the latest software defense is CTA (i.e., Cell-Type-Aware) [36], which focuses on PTE-based privilege-escalation rowhammer attacks. In such attacks, the attacker induces bit flips in Level-1 page-table entries (PTEs) such that a flipped PTE no longer points to the attacker’s memory page but instead points to another page-table page of the same process, thereby granting illegal access to the page tables. To destroy this core property, CTA proposes CTA memory allocation, which places Level-1 page tables in DRAM true-cells above a “Low Water Mark” in physical memory. If a PTE suffers a bit flip in its physical frame number, it can only point to a physical address below the “Low Water Mark” rather than into the page-table region.

By leveraging PThammer, we can break CTA and gain the root privilege. The key steps for the exploit are listed below:

  1. We spray the physical memory under the “Low Water Mark” with a large number of security-critical structures, i.e., cred (note that cred stores the critical uid field). To this end, the attack process creates child processes by invoking the fork system call; for each child-process creation, the kernel is forced to allocate a kernel stack and multiple kernel structures including cred.

  2. Inside each child process, we first register a signal handler and then go to sleep. The registered signal lets the attack process wake up the child process when necessary.

  3. After completing the child-process creations, the attack process starts to occupy consecutive DRAM rows above the “Low Water Mark” by forcing page-table page allocations.

  4. The attack process performs double-sided PThammer;

  5. The attack process verifies whether “exploitable” bit flips have occurred by checking whether a virtual address (VA) now points to a cred structure page. As cred contains three user ids (e.g., uid and suid) and three group ids (e.g., gid and sgid) stored sequentially, the attack process can construct a unique string from the six ids and search the VA-pointed page for it. If the pointed page does not contain the string, go to step 4 to restart PThammer;

  6. If yes, the attack process has located a cred structure; it changes uid to 0 and then wakes up every child process by delivering the registered signal. Inside the signal handler, each child process checks whether it has become a root process by invoking getuid.
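Step 5's signature check can be sketched as follows. The field order and 32-bit packing are illustrative, not the exact struct cred layout, and the page contents are synthetic.

```python
import struct

def id_signature(uid, gid, suid, sgid, euid, egid):
    """Byte pattern formed by six sequentially stored 32-bit ids
    (illustrative order, not the exact struct cred layout)."""
    return struct.pack("<6I", uid, gid, suid, sgid, euid, egid)

def find_cred(page: bytes, sig: bytes) -> int:
    """Offset of the id string inside a candidate page, or -1 if the
    page the VA points to is not a cred page."""
    return page.find(sig)
```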

V Discussion

Defeat ZebRAM [20]: ZebRAM is a rowhammer defense that only works for virtualized systems; it thus does not fit our threat model, and we leave extending PThammer to defeat it for future work.

Empirically, ZebRAM observes that hammering a row can only affect its two immediately adjacent rows. Based on this observation, ZebRAM leverages the hypervisor to split the memory of a VM into safe and unsafe regions using even and odd rows in a zebra pattern: all even rows of the VM form the safe region that contains data, while all odd rows form the unsafe region used as swap space. As such, a rowhammer attack from the safe region can only incur useless bit flips in the unsafe region, and a rowhammer attack from the unsafe region is impossible, since the unsafe region is inaccessible to an unprivileged attacker.

However, Kim et al. [18] report that the above empirical observation is not correct, i.e., hammering a row can affect three rows or more in a certain number of DRAM modules. Besides, an attacker can compromise ZebRAM as follows. ZebRAM does not protect the physical memory of the hypervisor and thus extended page tables (EPTs) residing in the hypervisor space are adjacent to each other. As such, an unprivileged attacker can initiate regular memory accesses to conduct PThammer-like attacks, causing bit flips in EPT entries and escaping the VM.

Other Possible Instances of TeleHammer: besides PThammer, there might exist other instances of TeleHammer that leverage other built-in features of modern hardware/software. In particular, features that prioritize functionality and performance are potential candidates. For the hardware, we discuss two well-known CPU features: out-of-order and speculative execution, two optimizations that allow parallel execution of multiple instructions to use instruction cycles efficiently. An unprivileged attacker can leverage such features to bypass memory isolation and access kernel memory through the processor [19, 24].

For the software, we consider OS kernel features that handle local and network requests. A system call is a programmatic feature by which a user application requests a service from the kernel; by invoking a system-call handler, a user can indirectly access kernel memory through the kernel. A network I/O mechanism is also a programmatic feature that allows the OS to serve requests from the network. In particular, the network interface card (NIC) raises a hardware interrupt to notify the kernel of each network packet it receives, and within the interrupt handler the kernel accesses kernel memory. Thus, a remote user can invoke this feature to make the kernel access its own memory.

As a result, an attacker can potentially build up an exploitable communication path to a target kernel address by abusing the above features.

Mitigation: Intuitively, we might detect both TeleHammer and PeriHammer using performance counters [1]. However, such anomaly-based detection is prone to false positives and/or false negatives by nature [6].

Alternatively, we might adopt hardware defenses such as PARA [18], TRR [28, 17] and TWiCe [22] to increase the DRAM refresh rate for specific rows, which would reduce the maximum latency permitted per hammer in Definition 5 (see Section III-B) as much as possible so as to break the last time condition in the definition. Unfortunately, they require new hardware designs and thus cannot protect legacy systems.

For PThammer, PTEs might be cached in an isolated cache to eliminate the communication path identified by PThammer: with PTEs placed in a separate cache, PThammer cannot use the cache-eviction approach to evict them. However, reserving an isolated cache only for page-table pages is expensive and requires re-designing hardware. Even if CPU manufacturers released such an isolated PTE cache, there might exist other communication paths for PThammer to hammer PTEs, or other instances of TeleHammer that hammer other critical structures in kernel space. In summary, we believe that TeleHammer-based rowhammer attacks are hard to mitigate.

VI Conclusion

In this paper, we first observed a critical condition required by existing rowhammer exploits to gain privilege escalation or steal private data. We then proposed a new class of rowhammer attacks, called TeleHammer, that eschews this condition. Besides, we presented a formal model defining the key conditions for mounting TeleHammer and PeriHammer, and summarized three advantages of TeleHammer over PeriHammer. On top of that, we created an instance of TeleHammer, called PThammer, and developed a PThammer-based attack that allows an unprivileged attacker to compromise the latest software-only rowhammer defenses and gain root privilege.

References

  • [1] Z. B. Aweke, S. F. Yitbarek, R. Qiao, R. Das, M. Hicks, Y. Oren, and T. Austin (2016) ANVIL: software-based protection against next-generation rowhammer attacks. ACM SIGPLAN Notices 51 (4), pp. 743–755. Cited by: §II-E1, §III-B, §V.
  • [2] T. W. Barr, A. L. Cox, and S. Rixner (2010) Translation caching: skip, don’t walk (the page table). ACM SIGARCH Computer Architecture News, pp. 48–59. Cited by: §I, §III-C1.
  • [3] S. Bhattacharya and D. Mukhopadhyay (2016) Curious case of rowhammer: flipping secret exponent bits using timing analysis. In International Conference on Cryptographic Hardware and Embedded Systems, pp. 602–624. Cited by: §I, §II-E.
  • [4] C. Bock, F. Brasser, D. Gens, C. Liebchen, and A. Sadeghi (2019) RIP-rh: preventing rowhammer-based inter-process attacks. In Proceedings of the 2019 ACM Asia Conference on Computer and Communications Security, pp. 561–572. Cited by: §I, §IV-B.
  • [5] E. Bosman, K. Razavi, H. Bos, and C. Giuffrida (2016) Dedup est machina: memory deduplication as an advanced exploitation vector. In Security and Privacy, 2016 IEEE Symposium on, pp. 987–1004. Cited by: §I, §II-E1, §III-A.
  • [6] F. Brasser, L. Davi, D. Gens, C. Liebchen, and A. Sadeghi (2017) CAn’t touch this: software-only mitigation against rowhammer attacks targeting kernel memory. In USENIX Security Symposium, Cited by: §I, §IV-B, §IV-B, §V.
  • [7] Y. Cheng, Z. Zhang, S. Nepal, and Z. Wang (2018) Still hammerable and exploitable: on the effectiveness of software-only physical kernel isolation. arXiv preprint arXiv:1802.07060. Cited by: §III-C1, item i, §IV-A3.
  • [8] P. Frigo, C. Giuffrida, H. Bos, and K. Razavi (2018) Grand pwning unit: accelerating microarchitectural attacks with the gpu. In Security and Privacy, 2018 IEEE Symposium on, Cited by: §I.
  • [9] D. Genkin, L. Pachmanov, E. Tromer, and Y. Yarom (2018) Drive-by key-extraction cache attacks from portable code. In International Conference on Applied Cryptography and Network Security, pp. 83–102. Cited by: §III-C3, §III-C3.
  • [10] B. Gras, K. Razavi, H. Bos, and C. Giuffrida (2018) Translation leak-aside buffer: defeating cache side-channel protections with tlb attacks. In 27th USENIX Security Symposium (USENIX Security 18), pp. 955–972. Cited by: §II-B, §III-C2, §III-C2.
  • [11] D. Gruss, M. Lipp, M. Schwarz, D. Genkin, J. Juffinger, S. O’Connell, W. Schoechl, and Y. Yarom (2017) Another flip in the wall of rowhammer defenses. arXiv preprint arXiv:1710.00551. Cited by: §I, §II-E1, §II-E.
  • [12] D. Gruss, C. Maurice, and S. Mangard (2016) Rowhammer.js: a remote software-induced fault attack in javascript. In Detection of Intrusions and Malware, and Vulnerability Assessment, pp. 300–321. Cited by: §I, §III-A.
  • [13] D. Gruss, C. Maurice, and S. Mangard (2017-05) Program for testing for the dram rowhammer problem using eviction. Note: https://github.com/IAIK/rowhammerjs Cited by: §II-E1, §II-E, §III-B.
  • [14] R. Hund, C. Willems, and T. Holz (2013) Practical timing side channel attacks against kernel space aslr. In 2013 IEEE Symposium on Security and Privacy, pp. 191–205. Cited by: §III-C3.
  • [15] Intel, Inc. (2014-09) Intel 64 and IA-32 architectures optimization reference manual. Cited by: §III-C1, §III-C3.
  • [16] G. Irazoqui, T. Eisenbarth, and B. Sunar (2015) Systematic reverse engineering of cache slice selection in intel processors. In 2015 Euromicro Conference on Digital System Design, pp. 629–636. Cited by: §III-C3.
  • [17] JEDEC Solid State Technology Association. (2015) LOW power double data rate 4 (lpddr4). Note: https://www.jedec.org/standards-documents/docs/jesd209-4b Cited by: §V.
  • [18] Y. Kim, R. Daly, J. Kim, C. Fallin, J. H. Lee, D. Lee, C. Wilkerson, K. Lai, and O. Mutlu (2014) Flipping bits in memory without accessing them: an experimental study of dram disturbance errors. In ACM SIGARCH Computer Architecture News, Vol. 42, pp. 361–372. Cited by: §I, §II-E1, §II-E, §II-E, §III-B, §V, §V.
  • [19] P. Kocher, D. Genkin, D. Gruss, W. Haas, M. Hamburg, M. Lipp, S. Mangard, T. Prescher, M. Schwarz, and Y. Yarom (2018) Spectre attacks: exploiting speculative execution. arXiv preprint arXiv:1801.01203. Cited by: §V.
  • [20] R. K. Konoth, M. Oliverio, A. Tatar, D. Andriesse, H. Bos, C. Giuffrida, and K. Razavi (2018) ZebRAM: comprehensive and compatible software protection against rowhammer attacks. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pp. 697–710. Cited by: §I, §V.
  • [21] A. Kwong, D. Genkin, D. Gruss, and Y. Yarom (2020) RAMBleed: reading bits in memory without accessing them. In 41st IEEE Symposium on Security and Privacy (S&P), Cited by: §I, §II-E, §III-B, item i.
  • [22] E. Lee, I. Kang, S. Lee, G. E. Suh, and J. H. Ahn (2019) TWiCe: preventing row-hammering by exploiting time window counters. In Proceedings of the 46th International Symposium on Computer Architecture, pp. 385–396. Cited by: §V.
  • [23] M. Lipp, M. T. Aga, M. Schwarz, D. Gruss, C. Maurice, L. Raab, and L. Lamster (2018) Nethammer: inducing rowhammer faults through network requests. arXiv preprint arXiv:1805.04956. Cited by: §II-E1.
  • [24] M. Lipp, M. Schwarz, D. Gruss, T. Prescher, W. Haas, S. Mangard, P. Kocher, D. Genkin, Y. Yarom, and M. Hamburg (2018) Meltdown. arXiv preprint arXiv:1801.01207. Cited by: §V.
  • [25] F. Liu, Y. Yarom, Q. Ge, G. Heiser, and R. B. Lee (2015) Last-level cache side-channel attacks are practical. In 2015 IEEE Symposium on Security and Privacy, pp. 605–622. Cited by: §II-A, §II-E1, §III-C3, §III-C3.
  • [26] C. Maurice, N. Scouarnec, C. Neumann, O. Heen, and A. Francillon (2015) Reverse engineering intel last-level cache complex addressing using performance counters. In Proceedings of the 18th International Symposium on Research in Attacks, Intrusions, and Defenses, RAID 2015, pp. 48–65. Cited by: §III-C3.
  • [27] C. Maurice, M. Weber, M. Schwarz, L. Giner, D. Gruss, C. A. Boano, S. Mangard, and K. Römer (2017) Hello from the other side: ssh over robust cache covert channels in the cloud.. In NDSS, pp. 8–11. Cited by: §II-E1.
  • [28] Micron, Inc. (2015) DDR4 sdram mt40a2g4, mt40a1g8, mt40a512m16 data sheet. Note: https://www.micron.com/products/dram/ddr4-sdram/ Cited by: §V.
  • [29] Y. Oren, V. P. Kemerlis, S. Sethumadhavan, and A. D. Keromytis (2015) The spy in the sandbox: practical cache attacks in javascript and their implications. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1406–1418. Cited by: §III-C3.
  • [30] P. Pessl, D. Gruss, C. Maurice, M. Schwarz, and S. Mangard (2016) DRAMA: exploiting dram addressing for cross-cpu attacks. In USENIX Security Symposium, pp. 565–581. Cited by: 4th item.
  • [31] R. Qiao and M. Seaborn (2016) A new approach for rowhammer attacks. In Hardware Oriented Security and Trust (HOST), 2016 IEEE International Symposium on, pp. 161–166. Cited by: §III-A.
  • [32] K. Razavi, B. Gras, E. Bosman, B. Preneel, C. Giuffrida, and H. Bos (2016) Flip feng shui: hammering a needle in the software stack. In USENIX Security Symposium, pp. 1–18. Cited by: §I, §II-E1, §III-A.
  • [33] M. Seaborn and T. Dullien Exploiting the dram rowhammer bug to gain kernel privileges. In Black Hat’15, Cited by: §I, §II-E1, §II-E, §II-E, §III-A, §III-C1.
  • [34] A. Tatar, R. K. Konoth, E. Athanasopoulos, C. Giuffrida, H. Bos, and K. Razavi (2018) Throwhammer: rowhammer attacks over the network and defenses. In 2018 USENIX Annual Technical Conference, Cited by: §I, §II-E1.
  • [35] V. van der Veen, Y. Fratantonio, M. Lindorfer, D. Gruss, C. Maurice, G. Vigna, H. Bos, K. Razavi, and C. Giuffrida (2016) Drammer: deterministic rowhammer attacks on mobile platforms. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 1675–1689. Cited by: §I, §II-E1, item i.
  • [36] X. Wu, T. Sherwood, F. T. Chong, and Y. Li (2019) Protecting page tables from rowhammer attacks using monotonic pointers in dram true-cells. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’19, pp. 645–657. Cited by: §I, §III-C1, §IV-B, §IV-B.
  • [37] Y. Xiao, X. Zhang, Y. Zhang, and R. Teodorescu (2016) One bit flips, one cloud flops: cross-vm row hammer attacks and privilege escalation. In USENIX Security Symposium, pp. 19–35. Cited by: §I, §III-A, §III-B.