Dataplant: In-DRAM Security Mechanisms for Low-Cost Devices

02/19/2019
by Lois Orosa, et al.

Low-cost devices are now commonplace and can be found in diverse environments such as home electronics or cars. As such devices are ubiquitous and portable, manufacturers strive to minimize their cost and power consumption by eliminating many features that exist in other computing systems. Unfortunately, this results in a lack of basic security features, even though their ubiquity and ease of physical access make these devices particularly vulnerable to attacks. To address the lack of security mechanisms in low-cost devices that make use of DRAM, we propose Dataplant, a new set of low-cost, high-performance, and reliable security primitives that reside in and make use of commodity DRAM chips. The main idea of Dataplant is to slightly modify the internal DRAM timing signals to expose the inherent process variation found in all DRAM chips for generating unpredictable but reproducible values (e.g., keys, seeds, signatures) within DRAM, without affecting regular DRAM operation. We use Dataplant to build two new security mechanisms: 1) a new Dataplant-based physical unclonable function (PUF) with high throughput and high resiliency to temperature changes, and 2) a new cold boot attack prevention mechanism based on Dataplant that automatically destroys all data within DRAM on every power cycle with zero run-time energy and latency overheads. These mechanisms are very easy to integrate with current DDR memory modules. Using a combination of detailed simulations and experiments with real DRAM devices, we show that our Dataplant-based PUF has 10x higher throughput than state-of-the-art DRAM PUFs while being much more resilient to temperature changes. We also demonstrate that our Dataplant-based cold boot attack protection mechanism is 19.5x faster and consumes 2.54x less energy when compared to existing mechanisms.

1 Introduction

Low-cost devices have enabled new application domains with the advent of the Internet of Things (IoT) [1]. Example IoT application domains include home automation, intelligent transport systems, wearable computing, and implantable chips for healthcare. Due to their ubiquity and need for mobility, these domains strongly need low-cost and low-power hardware. As a result, manufacturers strive to minimize the number of features on these devices. Unfortunately, this minimalist approach often results in a lack of support for basic security features, enabling a variety of attacks, from taking over automobiles [2] and scooters [3], to orchestrated widespread compromises [4, 5].

While support for security mechanisms is commonly found in larger computing systems (e.g., laptops, servers), this support often requires dedicated hardware (e.g., AES encryption chips [6, 7, 8, 9], random number generators [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]) that, relative to the chip sizes of many IoT devices, is highly costly in terms of area and energy. As a result, there is a strong need for low-cost security mechanisms for protecting low-power computer systems.

Our goal in this paper is to develop a low-cost solution for supporting commonly-used security mechanisms. To this end, we propose Dataplant, a novel low-cost, high-performance, and reliable set of security primitives for commodity DRAM chips. The key idea of Dataplant is to take advantage of inherent DRAM behavior to generate unpredictable values, which we can use to support several security mechanisms without significant additional cost. DRAM has been optimized for decades to increase capacity and bandwidth, but existing security mechanisms (e.g., encryption) are too complex to be supported inside commodity DRAM chips. This work is the first to propose a set of security primitives that are simple enough to be supported by existing commodity DRAM chips today, largely by taking advantage of the existing circuit design and chip operation, without affecting normal DRAM operation.

Our primitives enable the efficient implementation of security mechanisms in DRAM. We analyze and evaluate two such mechanisms in this work: (1) physical unclonable functions (PUFs) [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35] for authentication and (2) cold boot attack prevention [36, 37, 38, 39].

1. Lightweight Authentication Using PUFs. Authentication is a key concern in the IoT ecosystem. A lack of proper authentication mechanisms has resulted in numerous attacks, such as spying on baby monitors [40], changing the dosage delivered by drug infusion pumps [41], producing deadly electric shocks in pacemakers [42], and hacking security cameras [43]. Even in systems that do provide strong authentication, a key with low entropy can cause cryptographic operations to fail [44].

Using Dataplant, we provide a lightweight, high-entropy source of random keys that can be used for 1) challenge-response (CR) protocols based on PUFs [45, 46] and 2) generating seeds to securely bootstrap authentication mechanisms such as the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) [47].

2. Cold Boot Attack Prevention. Physical security is critical in an ecosystem where many of the devices are autonomous or may be deployed in public places. One of the simplest and most effective physical attacks is a cold boot attack [37, 38, 39, 48, 49, 36]. Its goal is to retrieve secret information stored in DRAM from the victim’s computer system. For example, an attacker could perform a cold boot attack to retrieve sensitive tracking information from moving devices (e.g., smart watches), private health information (e.g., cardiac monitoring devices), or recent passwords (e.g., ATMs). A common approach to mitigate this attack is encrypting the entire memory [50, 51, 52, 53, 54, 55], but this is expensive in terms of hardware, performance and power consumption, making it unsuitable for low-cost and low-power devices [56]. Using Dataplant, we design a cold boot attack prevention mechanism that immediately destroys all of the data inside DRAM, automatically at power up, without incurring any latency or power overhead (Section 6).

We extensively evaluate Dataplant and the two security applications based on Dataplant using a combination of detailed circuit-level simulations, system simulations, and experiments on real DRAM chips. We obtain two key results. First, our Dataplant-based PUFs are 10x faster than state-of-the-art DRAM PUFs while achieving significantly better repeatability under changing temperatures. We also show that the unpredictable numbers generated by Dataplant pass 14 NIST tests [57], demonstrating their randomness and suitability for use as keys or seeds. Second, our Dataplant-based cold boot attack protection is 34x faster and 2.5x more energy efficient than a system implementing our prevention mechanism without Dataplant. Dataplant also enables other security mechanisms (e.g., secure deallocation [58]), and it can be used by system designers and software developers once Dataplant is available in commodity DRAM chips.

Contributions We make the following contributions:


  • We design and implement Dataplant, a new set of low-cost in-DRAM primitives for enabling security features, especially in computer systems with limited hardware capabilities and low-cost requirements.

  • We show how Dataplant enables two critical security mechanisms that are currently lacking in low-cost and low-power systems.

  • We extensively evaluate our new security mechanisms based on Dataplant, and demonstrate that these mechanisms are significantly faster and more energy-efficient than their state-of-the-art counterparts.

2 Background

We provide some background on the DRAM architecture relevant to this work. We describe the organization of a DRAM chip, the architecture of its sense amplifiers and the operations that are performed on a DRAM chip.

DRAM Organization DRAM chips are manufactured in a variety of configurations [59], with a range of capacities and data bus widths between 4 and 16 pins. Since an individual DRAM chip has a small capacity and a limited data width, multiple DRAM chips are usually grouped together in the same DRAM module to form a rank, which provides a wider data bus (usually 64 bits). Specialized DRAM for IoT can have fewer chips and a narrower bus [60, 61, 62].

Figure 1: DRAM organization, sense amplifier, and cell.

Each DRAM chip consists of multiple banks, and each bank contains multiple 2D arrays (or subarrays) of DRAM cells, as shown in Figure 1. The cells are organized in rows of 4 or 8 KB, and all cells in a row share a wordline. Each cell consists of a capacitor, which stores the data in the form of charge, and an access transistor, controlled by the wordline, that connects the cell to the sense amplifier through the bitline.

DRAM Sense Amplifier (SA) Sense amplifiers (SAs) sense the small charge of the cell capacitor and amplify it to a CMOS-readable value. The set of SAs connected to a row of cells is called the row buffer. Figure 1 shows how a cell is connected to an SA via a bitline. The operation of the SA can be summarized in three steps. First, to be able to sense the cell's charge, the SA sets the bitline to the precharge level ($V_{DD}/2$). Second, the cell (at $V_{DD}$ or 0V) shares its charge with the bitline, which produces a small change in the bitline voltage ($\delta$). Third, the SA is activated and amplifies this deviation of the bitline voltage towards the original value of the cell (either $V_{DD}$ or 0V).
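The three sensing steps can be illustrated with a small behavioral model in Python. This is a sketch under stated assumptions, not a circuit model: the cell and bitline capacitances are hypothetical round numbers, the helper names (`charge_share`, `sense`, `read_cell`) are illustrative, and the SA is reduced to a comparator whose optional `sa_offset` stands in for transistor mismatch.

```python
VDD = 1.0  # supply voltage in volts (the SPICE figures in this paper also use 1V)


def charge_share(v_cell: float, c_cell: float = 22e-15, c_bitline: float = 85e-15) -> float:
    """Step 2: bitline voltage after a cell shares its charge with a bitline at VDD/2.

    The capacitances are hypothetical values chosen only for illustration.
    """
    v_precharge = VDD / 2  # step 1: bitline held at the precharge level
    # Charge conservation: both capacitors settle to one common voltage.
    return (c_cell * v_cell + c_bitline * v_precharge) / (c_cell + c_bitline)


def sense(v_bitline: float, sa_offset: float = 0.0) -> float:
    """Step 3: amplify the deviation (delta) from VDD/2 to a full CMOS level.

    `sa_offset` is an assumed stand-in for SA mismatch; in this model a total
    of exactly zero resolves to 0V.
    """
    delta = v_bitline - VDD / 2
    return VDD if delta + sa_offset > 0 else 0.0


def read_cell(v_cell: float) -> float:
    """A regular read: charge sharing followed by amplification."""
    return sense(charge_share(v_cell))
```

With these assumed capacitances, a stored "one" moves the bitline by roughly 0.1V above $V_{DD}/2$, which the SA then amplifies to $V_{DD}$. The `sa_offset` argument becomes important when the bitline deviation is zero, which is exactly the situation the Dataplant primitives create.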

DRAM Operation The memory controller issues three basic commands as part of a DRAM read or write operation: 1) the Activation (ACT) command senses and amplifies the data from the target row into the row buffer; 2) the read/write (RD/WR) command transfers data from/to the row buffer to/from the DRAM bus; 3) the Precharge (PRE) command clears the row buffer and prepares the subarray for subsequent read/write operations (precharges the bitlines).

Figure 2 details the steps for reading a DRAM cell:

1. Initially, the bitline is precharged to $V_{DD}/2$ and the wordline is set to 0V.

2. To access data from DRAM, the memory controller first issues an ACT command, which raises the voltage of the target wordline and connects the cells of that row to their bitlines. This causes the bitline voltage to deviate in one direction (charge sharing).

3. The sense amplifier then senses and amplifies this deviation (sensing phase). Once this phase is reached, the memory controller can issue RD or WR commands. The time needed to finish the ACT command is specified by the timing parameter $t_{RCD}$.

4. The sense amplifier continues to amplify the deviation until the voltage of the cell is fully restored.

5. After that, the controller issues a PRE command to lower the wordline voltage back to 0V and drive the sense amplifier and bitline to $V_{DD}/2$. The time needed to complete a PRE command is specified by the timing parameter $t_{RP}$. Once precharged, the subarray is ready for the next access.

Figure 2: DRAM Activation (ACT), Read (RD) and Precharge (PRE) commands.
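The ACT, RD, and PRE ordering in Figure 2 can be sketched as a controller-side checker for a single bank. The timing values here are illustrative assumptions (roughly DDR4-like), and other constraints such as the minimum ACT-to-PRE time are deliberately omitted to keep the sketch short.

```python
from typing import Optional

# Illustrative timing values in ns (assumed); real values come from the device datasheet.
T_RCD_NS = 13.75  # ACT -> RD/WR
T_RP_NS = 13.75   # PRE -> next ACT


class Bank:
    """Tracks one bank's open row and checks the ACT/RD/PRE ordering of Figure 2."""

    def __init__(self) -> None:
        self.open_row: Optional[int] = None
        self.act_time = float("-inf")
        self.pre_time = float("-inf")

    def act(self, row: int, t: float) -> None:
        if self.open_row is not None:
            raise RuntimeError("row already open: issue PRE first")
        if t - self.pre_time < T_RP_NS:
            raise RuntimeError("tRP violated: bitlines not yet precharged")
        self.open_row, self.act_time = row, t

    def rd(self, t: float) -> int:
        if self.open_row is None:
            raise RuntimeError("RD with no open row")
        if t - self.act_time < T_RCD_NS:
            raise RuntimeError("tRCD violated: sensing not finished")
        return self.open_row  # stands in for data taken from the row buffer

    def pre(self, t: float) -> None:
        # Close the row; the next ACT must wait at least tRP.
        self.open_row, self.pre_time = None, t
```

For example, `act(5, 0.0)` followed by `rd(5.0)` raises a tRCD violation, while `rd(14.0)` succeeds.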

3 Overview of DATAPLANT

Dataplant is a novel set of in-DRAM primitives that enable low-cost, high-performance implementations of several commonly-used security mechanisms. The Dataplant primitives (Section 3.1) quickly and cheaply generate unpredictable yet reproducible values. In this work, we show that these primitives can be used to enable low-cost versions of two security mechanisms (Section 3.2).

3.1 Dataplant Primitives

Dataplant consists of two core primitives:


  • US-Dataplant (Unpredictable Values in the DRAM SAs) generates an unpredictable value in the SAs by exploiting inherent process variation in the SAs. US-Dataplant is implemented by simply altering the timing of a few DRAM signals. The generated value can be stored in DRAM, or it can be read by the processor from the SAs without overwriting the data in DRAM.

  • UC-Dataplant (Unpredictable Values in the DRAM Cells) generates an unpredictable value in the DRAM cells by exploiting cell process variation. Like US-Dataplant, UC-Dataplant is implemented by only altering the timing of DRAM signals. The mechanism first empties the DRAM cell, and then restores an unpredictable value during the next activation (relying on process variation). Unlike US-Dataplant, the value generated using UC-Dataplant overwrites the original contents of the cell. Since there are many more cells than SAs in a typical DRAM chip, UC-Dataplant can generate a much larger variety of unpredictable values than US-Dataplant.

The Dataplant primitives share many similarities with DRAM activation. This makes our approach easy to integrate into commodity DRAM chips and facilitates its adoption by industry and standards bodies. We describe the circuit-level implementation details of the two primitives in Section 4, and evaluate their latency, energy, and area in Section 7.1.

Appendix A describes and evaluates an alternative Dataplant implementation that generates deterministic values.

3.2 Implementing Security Mechanisms Using Dataplant Primitives

We demonstrate the effectiveness of our Dataplant primitives by using them to implement two common security mechanisms at low cost. We briefly discuss how Dataplant enables these mechanisms below, and discuss the implementation of these mechanisms in detail in Section 5 and Section 6. Appendix B describes and evaluates secure deallocation, an additional security mechanism that can be implemented with Dataplant.

Dataplant PUFs for Authentication. Existing DRAM physical unclonable functions (PUFs) [63, 64, 65, 66, 67, 68] can enable authentication, but suffer from three major shortcomings. First, they need to write a known data pattern to a targeted memory region, requiring either a dedicated part of memory for PUFs or a backup of the data currently residing in the targeted region. Both of these require Operating System (OS) support, which is infeasible in many low-cost devices. Second, most DRAM PUFs have long evaluation times. Third, many of the PUFs have unstable responses as the temperature changes, limiting their usefulness for IoT devices in the field. Using US-Dataplant and UC-Dataplant, we implement two new DRAM PUFs that overcome all three of these shortcomings. Our US-Dataplant-based PUF is the first PUF that does not need to alter the contents of DRAM cells. Both of our PUF mechanisms are orders of magnitude faster than state-of-the-art DRAM PUFs, and generate repeatable responses even with large temperature changes, as we demonstrate in Section 7.2.2.

The lack of strong keys is a source of diverse attacks in IoT devices [44, 69]. We can use our US-Dataplant and UC-Dataplant primitives to generate unpredictable numbers with high entropy, as we show in Section 7.2.4, making them suitable for generating cryptographic keys or seeds for cryptographic functions.
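As an illustration of the kind of statistical check involved in validating such values, the simplest test of the NIST SP 800-22 suite, the Frequency (Monobit) test, fits in a few lines of Python. Only this one test is shown, and the input is assumed to be raw Dataplant output read back as a sequence of 0/1 values; the function name is illustrative.

```python
import math
from typing import Sequence


def monobit_p_value(bits: Sequence[int]) -> float:
    """Frequency (Monobit) test from NIST SP 800-22.

    A sequence passes at the usual significance level if the p-value is >= 0.01.
    """
    n = len(bits)
    if n == 0:
        raise ValueError("empty bit sequence")
    s_n = sum(1 if b else -1 for b in bits)  # ones count +1, zeros count -1
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))
```

For the 10-bit worked example in SP 800-22 (1011010101), this returns about 0.527. The standard recommends much longer sequences (at least 100 bits) for the result to be meaningful, and a full evaluation runs the remaining tests of the suite as well.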

Preventing Cold Boot Attacks. Although DRAM memory is volatile, the stored data does not immediately disappear at power-off. Data can be naturally retained in DRAM cells up to minutes after a power-off [49], which enables cold boot attacks [37, 38, 39, 48, 49, 36] (i.e., the data can be read as soon as the device is powered back up). We propose a mechanism that destroys all the data in DRAM by issuing any of the Dataplant primitives when the chip is powered up. Unlike prior mechanisms to prevent cold boot attacks [50, 51, 52, 53, 54, 55], our Dataplant-based mechanism protects against even a computationally unbounded adversary, as it makes brute-force attacks impossible. Our mechanism requires no changes aside from the existence of the Dataplant primitives, incurs no latency or energy overhead at runtime as it operates only at power-up, and it is secure as it operates within DRAM automatically without any external actions (e.g., no DRAM commands).

(a) Regular Activation
(b) US-Dataplant
(c) UC-Dataplant
Figure 6: SPICE simulation of the internal DRAM signals involved in regular precharge and (a) a regular activation, (b) US-Dataplant (including the optional overwriting of the cell), and (c) UC-Dataplant (including the optional overwriting of the cell). In all cases, $V_{DD}$ = 1V and the original content of the DRAM cell is "one" ($V_{DD}$).

4 Implementing DATAPLANT

This section shows the implementation details of the two Dataplant variants, including the different trade-offs between design complexity and available features. To illustrate how they operate, we simulate a detailed sense amplifier SPICE model, as specified in [70].

For comparison, Figure 6a shows the simulation of a regular activation. As explained in Section 2, in a regular activation the wordline is first activated so that the cell shares its charge with the bitline and alters the bitline voltage. After that, the SA is triggered to sense this variation in the bitline, and the cell charge is restored towards its original value.

4.1 US-Dataplant Primitive

US-Dataplant generates unpredictable values by exploiting the SA process variation. The key idea behind US-Dataplant is to avoid the charge sharing phase by triggering the SA while the bitline is at precharge voltage level. By doing so, the SA amplifies towards an unpredictable value with no influence from the cell, so the restored value will depend on the SA process variation. The cell can be optionally overwritten by triggering the wordline.

Figure 6b shows how a value is generated (including the optional overwriting of the cell). The SA is activated (1) while the bitline is precharged (2) (i.e., at $V_{DD}/2$ = 0.5V), which drives the bitline towards a value that depends on process variation (3) (0V in the figure). At this point, the generated value is ready to be read from the SA. To store it in the DRAM cell, the wordline is triggered (4), and the generated value is moved to the DRAM cell (5), independently of the previous content of the cell.

To illustrate the effects of process variation on the values generated by US-Dataplant, we perform SPICE simulations of five instances of a common SA design with small changes in their physical characteristics, which emulate process variation. Figure 7 shows that an SA with zero variation generates a "zero" value when it is activated with a precharged bitline (bitline difference = 0V). SAs with variation $-\sigma$, $+\sigma$, or $+2\sigma$ also generate a "zero". However, the SA with variation $-2\sigma$ generates a "one". These variations arise at fabrication time and cannot be controlled, and their layout is unpredictable and unique to each device.
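The US-Dataplant behavior can be approximated by a toy model in which every SA carries a fixed offset set at "fabrication" time. This is a sketch under assumptions: `RowBuffer`, the Gaussian offsets, and the use of a random seed to stand in for one physical chip are all illustrative, and the model is noise-free, whereas real SAs also experience thermal noise.

```python
import random
from typing import List, Optional


class RowBuffer:
    """Toy row buffer whose SAs carry fixed, manufacturing-time offsets.

    `device_seed` is a hypothetical stand-in for uncontrollable process
    variation: each seed plays the role of one fabricated chip.
    """

    def __init__(self, device_seed: int, n_sas: int = 64, sigma: float = 0.01) -> None:
        rng = random.Random(device_seed)
        self.offsets = [rng.gauss(0.0, sigma) for _ in range(n_sas)]

    def us_dataplant(self, cells: Optional[List[int]] = None) -> List[int]:
        """Trigger the SAs while every bitline is still at VDD/2 (bitline difference = 0V).

        Each bit depends only on its SA's offset, so the previous contents of
        `cells` never influence the result. Passing `cells` models the optional
        wordline step that copies the generated value into the DRAM cells.
        """
        bits = [1 if off > 0 else 0 for off in self.offsets]
        if cells is not None:
            cells[:] = bits  # optional overwrite of the row
        return bits
```

In this model, the same `RowBuffer` always returns the same bits regardless of what the row held before, while two different seeds (two "chips") return different bits, mirroring the reproducible-but-unique property described in the text.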

Figure 7: Values generated with five different process variations. US-Dataplant operates at 0V bitline difference.

Our SPICE simulations show that the power consumed by US-Dataplant can vary by up to 5% depending on the initial value stored in the cell.

4.2 UC-Dataplant Primitive

UC-Dataplant generates unpredictable values by exploiting DRAM process variation in a different way than US-Dataplant. The key idea is to discharge the cell by triggering the precharge logic while the wordline is active. As a result, the next regular activation senses and amplifies a cell at voltage $V_{DD}/2$. Because both the cell and the bitline are at the precharge voltage level ($V_{DD}/2$), the cell does not disturb the bitline, so the SA senses a value that depends on process variation (similar to Figure 7).

Figure 6c shows how UC-Dataplant discharges a cell. The wordline is activated (1) at the same time as the precharge logic (2), which discharges the cell towards $V_{DD}/2$ (3). Our SPICE simulations show that UC-Dataplant consumes the same power regardless of the initial value of the cell (as the final value is always $V_{DD}/2$). In Section 7.2.2, we evaluate the feasibility of UC-Dataplant by emulating it on real DRAM chips.
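The two UC-Dataplant phases (discharge to $V_{DD}/2$, then a regular activation) can be sketched in the same toy style. The per-cell `bias` values are an assumed stand-in for the combined cell and SA process variation, and `Subarray` and its parameters are hypothetical names for illustration.

```python
import random
from typing import List

VDD = 1.0


class Subarray:
    """Toy subarray for UC-Dataplant; per-cell biases stand in for process variation."""

    def __init__(self, device_seed: int, rows: int = 4, cols: int = 32, sigma: float = 0.005) -> None:
        rng = random.Random(device_seed)
        self.bias = [[rng.gauss(0.0, sigma) for _ in range(cols)] for _ in range(rows)]
        self.v = [[VDD] * cols for _ in range(rows)]  # every cell initially stores "one"

    def uc_dataplant(self, row: int) -> None:
        """Wordline and precharge logic enabled together: the row's cells settle at VDD/2."""
        self.v[row] = [VDD / 2] * len(self.v[row])

    def activate(self, row: int) -> List[int]:
        """Regular ACT: sense each cell's deviation from VDD/2, then restore full levels."""
        bits = [
            1 if (v - VDD / 2) + b > 0 else 0
            for v, b in zip(self.v[row], self.bias[row])
        ]
        self.v[row] = [VDD if bit else 0.0 for bit in bits]  # restoration overwrites the cells
        return bits
```

Before `uc_dataplant`, `activate` returns the stored "ones"; afterwards it returns a device-specific pattern, and the original row contents are gone. This matches the trade-off discussed next: unlike US-Dataplant, reading the generated value requires the extra activation, and the cell contents are always destroyed.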

4.3 Discussion

Trade-offs Our two Dataplant implementations have different trade-offs. First, UC-Dataplant relies on destroying the previous content of the cell for generating data, while US-Dataplant can either generate data or destroy the DRAM content. Second, UC-Dataplant is our fastest implementation, but accessing the generated values from the processor requires an additional activation command. US-Dataplant, however, can generate and access the data with only one command. Third, UC-Dataplant value generation relies on the DRAM cells, which are orders of magnitude smaller than the SA. Consequently, UC-Dataplant is potentially more sensitive to technology scaling effects (e.g., data pattern dependencies [71]).

To get the best of both designs, we can merge US-Dataplant and UC-Dataplant into a hybrid implementation with low hardware overhead. We leverage one of the in-DRAM mode registers (MRs) to select which of the two primitives to use, as register MR3 has 13 unused bits.

Security US-Dataplant does not leak any information about the previous content of the cell, because it generates values that are independent of the cell content (Figure 6). UC-Dataplant, as described in Section 4.2, has a slightly different behavior, as it first discharges the DRAM cells in a row, and then activates the sense amplifiers. If an attacker manages to interfere between these two steps, she could try to bias the cells towards some particular value before the amplification (e.g., data pattern dependencies between DRAM cells [71] or row hammering [72]). However, these two steps are executed back-to-back in a few nanoseconds. Thus, there is not enough time to induce any row hammering [72] or similar attacks, which require milliseconds to succeed [73, 74].

Reliability We do not expect reliability degradation due to thermal or electrical interference, because US-Dataplant and UC-Dataplant do not consume significant extra power. The bitlines are floating and no transistors directly drive them. Neither triggering the SAs early nor enabling the precharge logic at this time causes extra power consumption.

Hardware Cost The hardware cost of implementing Dataplant in DRAM is very low. Compared to a regular activation, US-Dataplant only needs to slightly change the timing of the signals that trigger the SA and the access transistor (Section 4.1), and UC-Dataplant only needs to trigger the precharge logic instead of the SA logic (Section 4.2). Regardless of the specific circuitry and method used to generate and distribute these internal DRAM signals (these details are not disclosed by the vendors), incorporating our new Dataplant primitives has very low impact on modern commodity DRAM chips. For example, if the signals are generated with a Finite State Machine (FSM), our implementation only requires a few more states, which have negligible hardware overhead.

5 Dataplant PUFs

Establishing a trusted connection to IoT devices is critical in applications such as health, home automation, and transportation [75]. In this work, we study PUFs as a convenient building block for implementing lightweight authentication mechanisms that fit the needs of IoT [76, 77, 24, 78, 63]. A PUF is a digital fingerprint that provides a unique identifier for a semiconductor device. PUFs can be used not only to implement authentication mechanisms, but also to bind hardware to software platforms, for secure key storage, or for keyless secure communication.

There are two fundamental ways of using PUFs for authentication. First, PUF-based Challenge-Response (CR) authentication consists of a two-stage process. In the first stage, called profiling, the device's PUF is evaluated multiple times by a server (or another device). This happens the first time the device becomes operational. The second stage is the actual authentication, which happens in the field. The server asks the device to provide a previously-profiled value (challenge), and the device responds with its PUF value (response). If the response matches the value stored at the server, the device is authenticated.

Second, PUFs can be used as generators of true random numbers for public keys. Encryption schemes use public keys to achieve security, key exchange algorithms use random numbers to establish secret session keys, and commitment schemes use random numbers to hide committed values [79]. The security of these schemes relies heavily on the unpredictability of the random input values. Hence, when the inputs are not truly random, the security of the system breaks down [80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 44, 69].

The lack of entropy sources in IoT devices is a major problem that enables a wide range of attacks across the Internet [44, 69]. The main reason is, again, the low-cost nature of the hardware used in these devices. Dataplant can help with this problem: US-Dataplant and UC-Dataplant can generate high-entropy random numbers (or fingerprints) that can be used directly, as seeds to securely bootstrap other pseudo-random number generators, or in other related applications.
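One common way to turn raw, possibly biased Dataplant bits into a fixed-size seed is to condition them with a cryptographic hash. This sketch assumes SHA-256 from Python's standard `hashlib` as the conditioner; that choice and the helper names are assumptions for illustration, not part of Dataplant.

```python
import hashlib
from typing import Sequence


def pack_bits(bits: Sequence[int]) -> bytes:
    """Pack 0/1 values MSB-first into bytes (the last byte is zero-padded)."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


def seed_from_dataplant_bits(bits: Sequence[int]) -> bytes:
    """Condition raw, possibly biased bits into a 32-byte seed with SHA-256."""
    return hashlib.sha256(pack_bits(bits)).digest()
```

Hashing removes bias from the output format but cannot add entropy: the seed is only as unpredictable as the raw bits fed into it, so enough raw bits must be collected to cover the desired seed strength. The resulting seed would then feed a vetted cryptographic generator; Python's `random` module, for instance, is not suitable for key generation.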

A key feature of a good PUF is being accessible at runtime, which improves the flexibility of the authentication mechanisms built on top of it [68]. Most state-of-the-art DRAM PUFs are very slow and interfere with the regular behavior of the system, which makes them unsuitable for runtime access [63, 98, 67, 99, 100].

State-of-the-art DRAM PUFs DRAM-based PUFs are easy to fabricate but difficult to clone, because they rely on process variation to generate unique identifiers. Prior DRAM-based PUF proposals exploit variations in DRAM start-up values [63], DRAM write access latencies [64], DRAM cell retention failures [67, 101, 65], and reduced DRAM timing parameters [68]. These approaches have three main problems. First, all of these PUFs rely on the charge contained in DRAM cells, so the entire content of the memory region employed for the PUF is irreversibly overwritten. Using these PUFs requires either 1) exclusive memory regions for PUFs or 2) copying and restoring the original contents for each PUF challenge request. Second, all but the DRAM latency PUF [68] have long evaluation times. Third, most of them have problems with repeatability when the temperature changes.

PUF-based Authentication Protocols. There are a number of authentication protocols based on PUFs [102, 103, 104, 105]. Regardless of the specific protocol, the basis for secure authentication is the ability of the PUF to provide 1) repeatable responses to the same challenge and 2) unique responses to different challenges. The specific authentication protocols built on top of PUFs are out of the scope of this work.
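The challenge-response flow and the two properties above can be sketched with a simulated device. Everything here is hypothetical and only illustrates the general idea: `make_simulated_puf` mimics a PUF with a 2% bit-flip rate, profiling keeps a per-bit majority over several evaluations, a response is accepted if its fractional Hamming distance to the stored value is at most 10% (repeatability), an impostor fails because its responses differ in about half the bits (uniqueness), and each pair is discarded after use to avoid replay.

```python
import random
from typing import Callable, Dict, List

Response = List[int]
Puf = Callable[[int], Response]


def frac_hamming(a: Response, b: Response) -> float:
    return sum(x != y for x, y in zip(a, b)) / len(a)


def make_simulated_puf(device_seed: int, noise: float = 0.02, n_bits: int = 128) -> Puf:
    """Hypothetical device: a fixed per-device response per challenge plus random bit flips."""
    noise_rng = random.Random(device_seed + 1)

    def puf(challenge: int) -> Response:
        ideal = random.Random(device_seed * 1_000_003 + challenge).getrandbits(n_bits)
        return [((ideal >> i) & 1) ^ (noise_rng.random() < noise) for i in range(n_bits)]

    return puf


class Server:
    def __init__(self, threshold: float = 0.1) -> None:
        self.crps: Dict[int, Response] = {}
        self.threshold = threshold  # assumed tolerance for noisy bits

    def profile(self, puf: Puf, challenges: List[int], repeats: int = 5) -> None:
        """Profiling stage: evaluate each challenge several times, keep the majority bit."""
        for c in challenges:
            samples = [puf(c) for _ in range(repeats)]
            self.crps[c] = [int(sum(col) * 2 > repeats) for col in zip(*samples)]

    def authenticate(self, puf: Puf) -> bool:
        """Field stage: replay one stored challenge and compare the fresh response."""
        if not self.crps:
            raise RuntimeError("no unused challenge-response pairs left")
        c = random.choice(list(self.crps))
        expected = self.crps.pop(c)  # each pair is used once to avoid replay
        return frac_hamming(expected, puf(c)) <= self.threshold
```

The threshold trades false rejections of the genuine device (caused by noise, for example under temperature changes) against false acceptances of impostors; a PUF with more repeatable responses allows a tighter threshold.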

The Dataplant PUFs We propose two new PUFs based on Dataplant that are low-cost and low-latency, cause little system interference, and provide robust responses under temperature changes. These properties make our PUFs a very attractive primitive for implementing authentication mechanisms for IoT devices.

The first Dataplant PUF is based on US-Dataplant. The US-Dataplant PUF does not overwrite the memory content, and it is particularly robust given that SAs are much less sensitive to environmental conditions, interference from other elements, and scaling issues [72]. The only downside of the US-Dataplant PUF is its small PUF space, which is limited to the total capacity of the row buffers in DRAM (8MB in a 4GB DRAM). The second Dataplant PUF, based on UC-Dataplant, does not have this limitation because it can generate PUF values from every DRAM cell. The UC-Dataplant PUF is more sensitive to scaling issues than the US-Dataplant PUF; however, we experimentally demonstrate in Section 7.2.2 that it is more reliable under temperature variations than state-of-the-art solutions. We also show that the US-Dataplant PUF and the UC-Dataplant PUF can be evaluated 100x faster than state-of-the-art mechanisms (Section 7.2.3).
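The 8MB figure for the US-Dataplant PUF space follows from simple arithmetic if each subarray exposes one row buffer's worth of SAs. The sketch assumes 512 rows per subarray, a value not stated in the text but chosen because it reproduces the 8MB-in-4GB figure; the function name is illustrative.

```python
MiB = 1 << 20
GiB = 1 << 30


def us_puf_space_bytes(capacity_bytes: int, rows_per_subarray: int) -> int:
    """SA capacity available to US-Dataplant, assuming one row buffer per subarray.

    Each subarray holds rows_per_subarray rows but exposes only one row's worth
    of SAs, so the SA capacity is the DRAM capacity divided by that row count.
    """
    return capacity_bytes // rows_per_subarray


# 512 rows per subarray (assumed) gives 8 MiB of SA bits in a 4 GiB DRAM.
print(us_puf_space_bytes(4 * GiB, 512) // MiB)
```

Under the same assumption, the UC-Dataplant PUF space is the full 4GB, i.e., 512x larger, because every cell rather than every SA can produce a PUF bit.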

Accessing the Dataplant PUFs There are two ways of enabling software access to our PUFs, either by adding a new instruction to the instruction set architecture for reading the PUF [106, 107], or by using a dedicated address range to map the PUF operations to regular load instructions.
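The second option, a dedicated address range, can be sketched as a decoding step in the memory controller that turns ordinary loads inside a reserved window into PUF requests. The window base, window size, row size, and the mapping from offset to row are hypothetical values for illustration only.

```python
from typing import Tuple

# Hypothetical memory map: base address, window size, and row size are assumptions.
PUF_BASE = 0xF000_0000
PUF_SIZE = 0x0100_0000  # 16 MiB window
ROW_BYTES = 8192


def route_load(addr: int) -> Tuple[str, int]:
    """Decide whether a regular load maps to a Dataplant PUF request or a normal DRAM access."""
    if PUF_BASE <= addr < PUF_BASE + PUF_SIZE:
        # The offset inside the window selects which row's SAs answer the challenge.
        return ("PUF", (addr - PUF_BASE) // ROW_BYTES)
    return ("DRAM", addr)
```

This approach needs no new processor instruction: software issues normal loads, and the controller translates accesses in the window into the new DRAM command.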

On the DRAM side, we need to introduce a new command in the DDR specification. The new command has the same general requirements as a regular activation (US-Dataplant) or as a regular precharge (UC-Dataplant). We can integrate the new command in the JEDEC standard specification [59] without extra cost, as there is unused, reserved space as part of the standard for new commands.

Security Analysis There are two main threats. First, an attacker could change the environmental conditions of a PUF to force the generation of PUF values that differ from the ones profiled before. We demonstrate in Section 7.2.2 that this problem is minimized when using Dataplant PUFs because of their resilience to temperature changes. Second, an attacker with physical access to the device could characterize the device and compromise its security. This vulnerability is common to all memory-based PUFs (SRAM and DRAM) with a finite number of CR pairs, and it is usually minimized by disabling the PUF after the secure setup phase [108].

6 Preventing Cold Boot Attacks

This section shows how Dataplant enables an efficient and simple mechanism to prevent cold boot attacks. Cold boot attacks [36, 37, 38, 39] are possible because the data stored in DRAM is not immediately lost when the chip is powered off. This is due to the capacitive nature of DRAM cells, which can hold their data for up to several seconds [49]. This remanence effect can be even more significant if the DRAM module is cooled down. Taking advantage of this property, an attacker can either remove the victim's DRAM module and place it in a system under her control with minimal information loss, or boot a small special-purpose program from a warm or cold reset to recover the secret information.

State-of-the-Art Defenses. The existing state-of-the-art prevention mechanisms have major drawbacks for IoT and low-cost systems. There are three classes of mechanisms. First, mechanisms that rely on encrypting memory either explicitly [50, 51, 52, 53, 54, 55, 109, 36] or implicitly through CPU extensions (e.g., Intel SGX [56]). These mechanisms are effective and secure, but are too complex and expensive to be implemented in many IoT and low-cost devices. Second, memory scramblers, which are simpler but have been shown to be insecure [110]. Third, the mechanism proposed by the Trusted Computing Group (TCG) [111] to reset the DRAM content upon power-off (or power-on if the last power-off was not clean). This mechanism is implemented in the host platform firmware and depends on the OS, which makes it vulnerable to attacks [112].

Threat Model. We consider an attacker who gains physical access to a live, uncompromised machine/device for an unlimited amount of time and whose goal is to obtain information stored in the device's DRAM. Note that, for any interesting information to be present in DRAM, the attacker must obtain the memory while it is still powered on. We then assume that, as part of the attack, the DRAM chip is powered off for an arbitrarily short amount of time. This power loss occurs when transplanting the DRAM module to an attacker-controlled machine and during attacks that reboot the victim machine to load a malicious OS. Note that some computers allow warm reboots, in which the power is not cut off. Our cold boot attack prevention mechanism is not compatible with systems that allow warm reboots.

We are not aware of any method to power on the DRAM other than through the corresponding DRAM pins. Therefore, until a technique is engineered that can attach a stable external power supply to the DRAM chip while it is already powered on, transplanting a chip from one machine to another inevitably involves a power loss. Even if this were possible, we speculate that it would require expensive specialized equipment, significantly increasing the cost of an otherwise cheap attack.

In summary, to the best of our knowledge, transplanting the DRAM and rebooting to a different OS are the only ways to perform a Cold Boot Attack today, and both involve a power loss. In particular, we are not aware of any other techniques that allow measuring the charge in the DRAM capacitors, including x-ray techniques.

6.1 Destroying Data at Power Up

We observe that it is possible to protect against cold boot attacks without encryption by deleting all memory contents during DRAM initialization.

Based on this observation, we propose two new cold boot attack prevention mechanisms. First, Self-Destruction, a low-cost mechanism based on Dataplant that destroys all DRAM content without the intervention of the memory controller. Second, Command-based Destruction, a low-cost mechanism orchestrated by the memory controller that allows a more flexible implementation at the cost of weaker security guarantees.

6.1.1 Self-Destruction

The key idea of our Self-Destruction mechanism is to execute a mandatory self-refresh (SR) cycle at power-up that executes Dataplant primitives instead of regular activation commands. This way, the DRAM chip executes a destructive cycle, which can be performed autonomously without the intervention of the memory controller.

The basic principle of a DRAM refresh is to issue an activation and a precharge command to the row being refreshed. As we show in Section 4, US-Dataplant is very similar to an activation command, and UC-Dataplant is very similar to a precharge command, which makes it easy to incorporate them into refresh operations by leveraging the circuitry that launches regular SR cycles. With Self-Destruction, the data is destroyed within one complete SR window, i.e., 64ms (32ms for LPDDR). During the destructive SR, the DRAM does not accept any memory commands, to ensure the atomicity of the process.

Self-Destruction in a Burst Refresh. A burst refresh is a refresh mode available on Low-Power DDR (LPDDR) devices. The main idea of the burst refresh is to complete all the required refreshes in a single burst, with the goal of meeting the deadlines of real-time applications. Our Self-Destruction mechanism is also compatible with this refresh mode, which allows destroying data much more quickly.
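To make the Self-Destruction control flow concrete, the following is a minimal behavioral sketch, not the paper’s circuit. It assumes a hypothetical `ToySelfDestructDram` class in which rows keep their contents across power-off (modeling remanence), power-up starts a destructive refresh pass, each refresh step replaces one row with a chip-specific value independent of the old contents (a stand-in for Dataplant’s process-variation-dependent output), and all external commands are rejected until the pass completes. The per-row granularity and the seeded random values are illustrative assumptions.

```python
import random


class ToySelfDestructDram:
    """Hypothetical behavioral model of Self-Destruction (illustration only)."""

    def __init__(self, num_rows, row_bytes=8, chip_seed=1234):
        self.num_rows = num_rows
        # Stand-in for process variation: fixed per-chip, per-row values that
        # do not depend on what was stored in the row before.
        rng = random.Random(chip_seed)
        self._variation = [
            bytes(rng.getrandbits(8) for _ in range(row_bytes))
            for _ in range(num_rows)
        ]
        self.rows = [bytes(row_bytes) for _ in range(num_rows)]
        self.destroying = False
        self._next_row = 0

    def power_on(self):
        # Contents survive power-off (remanence); the power-on detector then
        # starts a mandatory destructive refresh window.
        self.destroying = True
        self._next_row = 0

    def refresh_tick(self):
        """One internal refresh step: overwrite the next row, then advance."""
        if not self.destroying:
            return
        self.rows[self._next_row] = self._variation[self._next_row]
        self._next_row += 1
        if self._next_row == self.num_rows:
            self.destroying = False  # window complete: accept commands again

    def _check_ready(self):
        if self.destroying:
            raise RuntimeError("command rejected: destructive refresh in progress")

    def read(self, row):
        self._check_ready()
        return self.rows[row]

    def write(self, row, data):
        self._check_ready()
        self.rows[row] = data


# Usage: a secret written on the live system does not survive a power cycle.
dram = ToySelfDestructDram(num_rows=4)
dram.write(0, b"SECRET!!")
dram.power_on()                  # e.g., module moved to the attacker's machine
for _ in range(dram.num_rows):   # one full (self- or burst) refresh pass
    dram.refresh_tick()
assert dram.read(0) != b"SECRET!!"
```

A burst refresh corresponds to running all `refresh_tick` calls back to back; a regular self-refresh window spreads them over 64ms. In both cases, no command is served while the old contents could still be read.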

Security Analysis. Our mechanism is triggered in DRAM automatically and without external actions when power is detected. Therefore, the security of Self-Destruction depends on the reliability of the power-on detection circuit of the DRAM module. There are two ways in which an attacker can potentially bypass this circuit. We describe them here and explain why, in practice, they do not pose a security threat.

First, an attacker could operate DRAM at low voltage on the compromised system using, for instance, Dynamic Voltage and Frequency Scaling (DVFS), with the goal of not triggering the power-on detection circuit. The power-on circuit triggers when it detects a voltage ramp-up from the powered-off level; the voltage does not need to reach the nominal supply level (the circuit triggers as long as a ramp-up starting from the powered-off level is detected). Therefore, operating the DRAM at very low voltage would not help the attacker. (The attacker might try to operate the device at a voltage so close to the powered-off level that the power-on circuit cannot detect the ramp-up; however, at such a low voltage the DRAM would not be operational.)

Second, an attacker could fry the power-up detection mechanism. In practice, however, the FSM that initializes the chip is in the same internal controller that regulates all other functions (i.e., it regulates the timing parameters for activate, precharge, and other commands). Consequently, frying that component would most likely make the whole DRAM unusable.

Hardware Cost Analysis. The hardware cost of implementing our Self-Destruction mechanism in DRAM is very low. The implementation of Dataplant has very low overhead (Section 4.3), and the logic to trigger a refresh window at power-on is negligible. Triggering Dataplant instead of regular activations in the refresh process requires only minimal modifications to the in-DRAM mechanism that generates the control signals.

6.1.2 Command-Based Destruction

The key idea of the Command-Based Destruction is to force DRAM to follow a particular sequence of commands from the memory controller that leads to the deletion of the whole memory content during the initialization procedure. The mechanism can be implemented with regular write commands, with Rowclone [106], with Lisa-clone [107] or with our Dataplant primitives.

Command-Based Destruction relaxes the security guarantees, since it is orchestrated by the memory controller. An attacker could easily bypass this procedure by using a customized or programmable memory controller [113]. We address this by adding a mechanism in DRAM that enforces the execution of the appropriate sequence of commands during the initialization procedure. A latch indicates when the DRAM is performing the initialization and, during this phase, any other command is filtered out. Implementing this method requires an FSM in DRAM, which adds hardware and energy overhead to the existing circuitry. Also, a DRAM module implementing Command-Based Destruction can only be used with compatible memory controllers.
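The latch-plus-FSM filter described above can be sketched as follows. This is a minimal illustration under assumed semantics, not the paper’s design: the class `ToyInitFilter` and the command name `"DESTROY"` are hypothetical (standing in for a Rowclone, Lisa-clone, or Dataplant command), and we assume the FSM expects one destroy command per row in ascending order. Any other command, such as a `READ` from a custom memory controller, is dropped until the latch is released.

```python
class ToyInitFilter:
    """Hypothetical in-DRAM latch + FSM for Command-Based Destruction."""

    def __init__(self, num_rows):
        self.num_rows = num_rows
        self.in_init = True       # latch set at power-up
        self._expected_row = 0    # FSM state: next row that must be destroyed

    def accept(self, cmd, row=None):
        """Return True if the command is forwarded to the DRAM array."""
        if not self.in_init:
            return True           # normal operation after initialization
        if cmd == "DESTROY" and row == self._expected_row:
            self._expected_row += 1
            if self._expected_row == self.num_rows:
                self.in_init = False  # all rows destroyed: release the latch
            return True
        return False              # filtered out during initialization


f = ToyInitFilter(num_rows=2)
print(f.accept("READ", 0))     # False: reads are blocked until destruction ends
print(f.accept("DESTROY", 0))  # True
print(f.accept("DESTROY", 1))  # True, and the latch is released
print(f.accept("READ", 0))     # True
```

Requiring a strict row order keeps the FSM to a single counter; a real design could choose a different sequence, at a different hardware cost.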

Security Analysis. Compared to Self-Destruction, Command-based Destruction has slightly weaker security guarantees. Command-based Destruction is not self-contained in DRAM, and it does not destroy the memory contents automatically at power-up. Thus, the DRAM can be moved to the attacker’s system while still containing critical data. Nevertheless, it is very challenging to bypass the DRAM FSM that disables read commands until the memory contents are destroyed by the memory controller.

Compared to TCG, Command-based Destruction provides better security guarantees, as our mechanism does not provide any software interface to control the DRAM initialization mechanism.

Hardware Cost Analysis. Compared to Self-Destruction, Command-based Destruction is more complex to integrate into current systems. It requires modifying the memory controller, and it requires more dedicated DRAM logic (an FSM that checks whether the sequence of commands sent by the memory controller is correct).

As for the PUF mechanism (Section 5), Command-based Destruction has to issue Rowclone/Lisa/Dataplant requests from the memory controller, so it requires a new DRAM command. We can integrate the new command in the DDR JEDEC standard specification [59] without extra cost, as there is unused, reserved space as part of the standard for new commands.

7 Evaluations

We evaluate the Dataplant primitive (Section 7.1), the Dataplant DRAM PUFs (Section 7.2) and our cold boot attack prevention mechanism (Section 7.3).

7.1 Dataplant: Latency, Energy, and Area

Methodology. To show the benefits of Dataplant, we study the latency and energy overhead incurred when generating and overwriting values in one DRAM row. As there are no other works that generate data within DRAM, we compare our Dataplant primitives to the state-of-the-art mechanisms for copying data within DRAM, namely Lisa-clone [107] and Rowclone [106]. Rowclone and Lisa-clone propose in-DRAM methods to initialize data to zero by copying a reserved, zero-filled row to the destination row. Both solutions need to modify the internal architecture of DRAM and slightly reduce the DRAM’s capacity, since they need helper data to work. We compare all of these mechanisms with a baseline in which we overwrite memory with regular write commands.

We estimate the latency of US-Dataplant and UC-Dataplant assuming DDR3 timing constraints, and we calculate their energy consumption using the Activation and Precharge energy consumption described in the power model of the DRAMPower simulator [114].

Latency and Energy Results. Table 1 shows the absolute latency and energy of the evaluated techniques, and their reduction relative to the baseline, when overwriting an 8KB DRAM row. We make two major observations. First, the latency and energy consumption of our two Dataplant primitives are significantly lower than those of the baseline, Lisa-clone, and Rowclone. Second, UC-Dataplant is significantly faster than US-Dataplant, mainly because it avoids activating the SA.

| Primitive | Latency (ns) | Energy (nJ) | Latency reduction | Energy reduction |
|---|---|---|---|---|
| Baseline | 546 | 2000 | 1.0x | 1.0x |
| Lisa-clone | 148.5 | 90 | 3.67x | 22.2x |
| Rowclone | 90 | 50 | 6.06x | 41.5x |
| US-Dplant | 35 | 17.3 = 7.3 + 10 | 15.6x | 116x |
| UC-Dplant | 13 | 17.2 | 42x | 116x |

Table 1: Latency and energy of different primitives for overwriting data, for a single operation of granularity 8KB.

Table 1 also shows the Dataplant energy breakdown (value generation + overwriting). The energy consumption is very similar for the two implementations, for two main reasons. First, both implementations need to route the address within DRAM, which is one of the main sources of energy consumption (around 40%). Second, the energy consumption of the sense amplifier (used in US-Dataplant) and that of the precharge logic (used in UC-Dataplant) are similar (around 40%). Note that overwriting in US-Dataplant is optional; hence, they require only 7.3nJ and 8nJ, respectively, to generate an 8KB value, whereas in UC-Dataplant both processes are indivisible, always requiring 17.2nJ for generation plus overwriting.

The latency and energy consumption of US-Dataplant are very similar to those of a regular activation, and the latency and energy of UC-Dataplant are very similar to those of a precharge.

Area Overhead. Lisa-clone has an area overhead of 1% caused by the additional isolation transistors, additional control logic, and one additional zero-filled row per bank. The overhead of Rowclone (0.2%) is caused by the additional zero-filled row per subarray. US-Dataplant and UC-Dataplant incur minimal area overhead for implementing the logic that controls the signal timings.

7.2 Evaluating the Quality of Dataplant PUFs

To show the feasibility of US-Dataplant and UC-Dataplant for implementing PUFs, we simulate the operating conditions of US-Dataplant in SPICE, and we emulate the operating conditions of UC-Dataplant in real DRAM chips with an FPGA-based infrastructure.

7.2.1 Simulating US-Dataplant PUF

We evaluate the US-Dataplant PUF with SPICE simulations. It is infeasible to conduct experiments on real DRAM chips, as US-Dataplant requires internal changes to the DRAM timing parameters, which are hard-coded in the chips.

Methodology. To show the effects of process variation on the values generated by US-Dataplant, we evaluate a detailed SA SPICE model [70] using Monte Carlo simulations. We model variations in all the affected components of the sense amplifiers (transistor length, width, and threshold voltage). Our SA model always generates ‘1’ bits in the absence of process variation. When we introduce process variation into the simulation, we observe that some of the SAs generate ‘0’ bits as well (we call them unpredictable values). We run 100,000 simulations for each variation level.
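The structure of this Monte Carlo methodology can be illustrated with a toy model. This sketch is an assumption for illustration and is not the SPICE model of [70]; it does not reproduce Table 2. We assume each SA resolves to ‘1’ because of a built-in bias, and process variation adds a Gaussian offset whose standard deviation `sigma` is expressed in units of that bias; an SA outputs ‘0’ (an unpredictable value) when the offset overcomes the bias. The function name `unpredictable_fraction` is hypothetical.

```python
import random


def unpredictable_fraction(sigma, bias=1.0, trials=100_000, seed=0):
    """Fraction of toy sense amplifiers that resolve to '0' instead of '1'.

    Toy assumption: output is '1' when bias + offset >= 0, where the offset
    models process variation as a zero-mean Gaussian with std. dev. sigma.
    """
    rng = random.Random(seed)  # same seed -> same offset draws for every sigma
    zeros = 0
    for _ in range(trials):
        offset = sigma * rng.gauss(0.0, 1.0)
        if bias + offset < 0:
            zeros += 1
    return zeros / trials


for sigma in (0.0, 0.25, 0.35, 0.5):
    print(f"sigma={sigma:.2f}: {100 * unpredictable_fraction(sigma):.3f}% unpredictable")
```

Reusing the same seed across variation levels means a larger `sigma` can only add SAs that flip to ‘0’, which mirrors the qualitative trend in Table 2 (no unpredictable values for small variations, more for larger ones) without claiming anything about its exact numbers.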

Results. Table 2 shows the percentage of SAs that generate unpredictable values for different levels of process variation and different temperatures.

| | Process variation: 2% | 3% | 4% | 5% | Temperature: 30°C | 60°C | 70°C | 85°C |
|---|---|---|---|---|---|---|---|---|
| Unpredictable SAs | 0% | 0% | 0.02% | 0.19% | 0.02% | 0.19% | 0.21% | 0.15% |

Table 2: Effect of process variation and temperature on the unpredictability of the generated values in US-Dataplant.

We make three main observations. First, small process variations (<4%) are not enough to generate unpredictable values. Second, large process variations increase the unpredictability of the generated values. As technology scales, process variation becomes more significant, which increases the unpredictability of the values generated by the US-Dataplant PUF (i.e., it would increase the PUF quality). Third, temperature changes do not cause significant variation in the unpredictability of the generated values.

7.2.2 Emulating UC-Dataplant PUF

We evaluate the feasibility of UC-Dataplant for implementing PUFs by recreating its operating conditions using 60 real DRAM chips (from 15 modules) and an FPGA-based infrastructure.

Methodology. UC-Dataplant discharges a cell with the precharge logic, and it generates an unpredictable value by amplifying this empty cell. As we do not have the resources to build a real implementation, we emulate this behavior in real DRAM chips by first disabling DRAM refresh for 48 hours, with the goal of completely discharging the DRAM cells, and then reading the content of the empty cells. This way, our experiment reproduces the output values that a real UC-Dataplant PUF implementation would produce. Recall that discharging the cells would take a few nanoseconds (not 48h) in a real implementation (Section 7.1). We perform our experiments with a customized memory controller, built with SoftMC [113] and a Xilinx ML605 FPGA, on 60 different DDR3 DRAM chips from three major vendors.

Our emulation of UC-Dataplant PUF is challenging, as DRAM cells can retain their content for a long time [71], i.e., not refreshing the DRAM does not guarantee that a cell will be completely discharged, even after a long period.

To deal with this issue, we design a custom test to determine whether a cell is empty. As discussed in Sections 4.1 and 4.2, when a cell is empty, the value that UC-Dataplant generates should always be the same, regardless of the initial value of the cell. Based on this observation, our test analyzes the final value of a DRAM cell after 48 hours without refresh in two scenarios: 1) all initial values are zero, and 2) all initial values are one. The test has two possible outcomes. First, the test passes if the final value is the same regardless of the initial value. In this case, we conclude that the cell is actually empty, and the final value should be the one that a real UC-Dataplant implementation would generate, i.e., the result of activating an empty cell. Second, the test fails if the final values differ. In that case, we cannot conclude that the cell is empty (i.e., we cannot infer the value generated by UC-Dataplant), so we exclude that cell from our UC-Dataplant PUF emulation.
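The emptiness test can be expressed as a simple filter over the two read-outs. In this sketch (function names `emulated_uc_dataplant_values` and `coverage` are hypothetical), each list holds the value read from every cell after the 48-hour no-refresh period, once for the all-zeros initialization and once for the all-ones initialization; only cells whose final value agrees in both runs are kept, and their common value is taken as the emulated UC-Dataplant output.

```python
def emulated_uc_dataplant_values(final_from_zeros, final_from_ones):
    """Keep only cells whose post-retention value ignores the initial value.

    final_from_zeros[i]: value of cell i after 48 h without refresh when it
    was initialized to 0; final_from_ones[i]: same, initialized to 1.
    Returns {cell_index: emulated UC-Dataplant value} for cells deemed empty.
    """
    if len(final_from_zeros) != len(final_from_ones):
        raise ValueError("both runs must cover the same cells")
    return {
        i: v0
        for i, (v0, v1) in enumerate(zip(final_from_zeros, final_from_ones))
        if v0 == v1  # test passes: the cell is considered empty
    }


def coverage(empty_cells, total_cells):
    """Fraction of cells the methodology managed to empty."""
    return len(empty_cells) / total_cells


zeros_run = [0, 1, 1, 0]
ones_run = [0, 1, 0, 1]
empty = emulated_uc_dataplant_values(zeros_run, ones_run)
print(empty)                                # {0: 0, 1: 1}
print(coverage(empty, len(zeros_run)))      # 0.5
```

The `coverage` helper corresponds to the coverage metric reported in the results below (the percentage of cells that pass the test).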

Results. Our experiments achieve a coverage (the percentage of cells that we are able to empty with our methodology) of between 34% and 99%. The percentage of generated values that are unpredictable because of process variation is between 0.01% and 0.22%. We evaluate the randomness of the values generated by UC-Dataplant in Section 7.2.4.

To measure the uniqueness and similarity of a PUF, we apply Jaccard indices [115], as suggested by prior works [116, 101, 117, 68]. We determine the Jaccard index by taking two sets of unpredictable values, $A$ and $B$ (i.e., two PUF responses), from two memory segments, and calculating the ratio of their shared values to the full set of unique unpredictable values, $J(A,B) = \frac{|A \cap B|}{|A \cup B|}$. A ratio close to 1 represents high similarity, and a ratio close to 0 represents uniqueness.

We use the term Intra-Jaccard for the similarity of two sets from the same memory segment, and Inter-Jaccard for the uniqueness of two sets from different memory segments. An ideal PUF should have an Intra-Jaccard index close to 1 (a unique challenge has a unique response), and an Inter-Jaccard index close to 0 (different challenges have different and random responses).
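The Jaccard index of two PUF responses can be computed directly from the sets of positions of unpredictable values. A minimal sketch follows; the function name `jaccard` is ours, and returning 1.0 for two empty responses is a convention we assume here (the text does not specify that case).

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two PUF responses.

    Each response is an iterable of positions of unpredictable values.
    """
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty responses are identical
    return len(a & b) / len(union)


# Intra-Jaccard: two responses from the same segment (ideally close to 1).
print(jaccard({3, 17, 42}, {3, 17, 42}))  # 1.0
# Inter-Jaccard: responses from different segments (ideally close to 0).
print(jaccard({3, 17, 42}, {5, 99}))      # 0.0
print(jaccard({1, 2, 3}, {2, 3, 4}))      # 0.5
```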

We compute the distribution of Intra- and Inter-Jaccard indices obtained by running experiments on 60 different DRAM chips with segments of 8KB. We calculate the Intra-Jaccard indices for 10,000 random pairs of memory segments (each pair composed of two responses from the same memory segment), and the Inter-Jaccard indices for 10,000 random pairs of memory segments (each pair composed of two responses from different memory segments) from all memory modules.

We compare the UC-Dataplant PUF with the DRAM Latency PUF [68]. The DRAM Latency PUF accesses DRAM with reduced timing parameters, which causes some read failures that provide good PUF characteristics. We implemented the DRAM Latency PUF with a reduced timing parameter value of 2.5ns, as it is the timing that yields the best results with our setup. To improve the repeatability of the responses, the DRAM Latency PUF implements a filtering mechanism that removes the cells with low failure probability from the PUF response. To this end, the mechanism reads the memory segment 100 times and composes a response that contains only the failures that repeat more than 90 times [68]. Our UC-Dataplant PUF does not implement any filtering mechanism, as it achieves consistent results with only one PUF response. While a filter-free DRAM Latency PUF can achieve an evaluation time comparable to that of the Dataplant PUFs, its PUF quality would be significantly lower (Section 7.2.3), compromising functionality and security.
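The filtering step described above amounts to counting how often each failing position appears across repeated reads of the same segment. A minimal sketch, with the hypothetical name `filtered_response`: for the DRAM Latency PUF setting, the input would be 100 reads and the threshold 90.

```python
from collections import Counter


def filtered_response(reads, min_count):
    """Keep positions that appear in more than `min_count` of the reads.

    reads: iterable of sets of failing/unpredictable positions, one set per
    evaluation of the same challenge (memory segment).
    """
    counts = Counter()
    for r in reads:
        counts.update(set(r))  # count each position at most once per read
    return {pos for pos, c in counts.items() if c > min_count}


# Position 2 fails in 95 of 100 reads and survives; position 7 (5 reads) is dropped.
reads = [{1, 2}] * 95 + [{1, 7}] * 5
print(sorted(filtered_response(reads, min_count=90)))  # [1, 2]
```

The cost of this filter is the number of reads: every additional read of the segment adds one more full PUF evaluation, which is why a 100-read filter dominates the evaluation time.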

Figure 8 shows the Intra- and Inter-Jaccard indices of the UC-Dataplant PUF and the DRAM Latency PUF [68] for 7 DDR3 modules and 8 DDR3L modules. We make four main observations. First, the DRAM Latency PUF without filter has Intra-Jaccard indices that are distributed across the entire spectrum (far from ideal), so it does not satisfy the repeatability requirements of a good-quality PUF. This problem is solved with the filtering mechanism, which shifts the Intra-Jaccard values closer to one but also increases the evaluation time (Section 7.2.3). Second, the DRAM Latency PUF has very good Inter-Jaccard indices, very close to zero. Third, the UC-Dataplant PUF has particularly good Intra-Jaccard indices (most of the responses to the same challenge are exactly the same), and still good Inter-Jaccard indices (distributed near zero). Fourth, the results from DDR3L modules are better than those from DDR3 modules, for both the DRAM Latency PUF and the UC-Dataplant PUF. We conclude that the UC-Dataplant PUF is particularly robust in the repeatability of its responses, while maintaining uniqueness between responses from different memory segments.

Figure 8: Jaccard indices obtained with the DRAM latency PUF (with and without filter) and with the UC-Dataplant PUF on both DDR3 and DDR3L modules.

Based on our results, a naive challenge-response authentication mechanism implemented with UC-Dataplant, which authenticates only when the response exactly matches the expected one, has an average false rejection rate of 0.64% and an average false acceptance rate of 0%.
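The naive exact-match scheme and its two error rates can be sketched as follows. The functions `authenticate` and `error_rates` are hypothetical helpers; genuine trials pair an enrolled (expected) response with a fresh response from the same memory segment, and impostor trials pair it with a response from a different segment.

```python
def authenticate(expected, response):
    """Naive scheme: accept only if the response exactly equals the expected one."""
    return set(response) == set(expected)


def error_rates(genuine_trials, impostor_trials):
    """Return (false rejection rate, false acceptance rate).

    genuine_trials: list of (expected, response) pairs from the same segment.
    impostor_trials: list of (expected, response) pairs from other segments.
    """
    frr = sum(not authenticate(e, r) for e, r in genuine_trials) / len(genuine_trials)
    far = sum(authenticate(e, r) for e, r in impostor_trials) / len(impostor_trials)
    return frr, far


genuine = [({1, 2}, {1, 2}), ({1, 2}, {1})]  # second response lost one value
impostor = [({1, 2}, {3}), ({1, 2}, {5, 8})]
print(error_rates(genuine, impostor))         # (0.5, 0.0)
```

Because a single missing or extra unpredictable value causes a rejection, this scheme is only viable when responses repeat exactly almost all of the time, which is the property the Intra-Jaccard results above measure.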

Temperature and Aging Effects. To demonstrate how temperature affects repeatability, we evaluate the UC-Dataplant PUF and the DRAM Latency PUF at different temperatures, ranging from 30°C to 85°C. The experimental setup is the same as in the previous experiment and additionally includes a DRAM heater and a fine-grained temperature controller. For this experiment, we only need to wait for 4 hours (instead of 48 hours), since cells discharge faster at high temperatures. Figure 9 shows the Intra-Jaccard indices between the same segments at different temperatures. Our main observation is that the UC-Dataplant PUF is very robust to temperature changes, maintaining good repeatability even for extreme temperature changes (55°C). The repeatability of the DRAM Latency PUF is much more affected by temperature changes, confirming the results of the original work [68]. In conclusion, the UC-Dataplant PUF performs much better than the DRAM Latency PUF under changing temperature conditions.

Figure 9: Intra-Jaccard vs. temperature.
Figure 10: Intra-Jaccard vs. accelerated aging for 8 hours at 125°C.

To demonstrate how aging affects repeatability, we use accelerated aging techniques to artificially age our DRAM modules [118, 100, 119, 120, 121]. We artificially age our DRAM modules by operating them at 125°C while running stress tests for 8 hours. Figure 10 shows the Intra-Jaccard indices between the same segments before and after aging. We observe that UC-Dataplant is very robust to aging (most of the Jaccard indices are 1).

7.2.3 Evaluation Time

In a real system, the evaluation time of the UC-Dataplant PUF, the US-Dataplant PUF, and the DRAM Latency PUF is dominated by factors such as the memory barrier that ensures that only one memory instruction is in flight (which takes on the order of µs), and not by the PUF response generation itself (on the order of ns). This makes the practical evaluation times of these PUFs roughly the same. Experiments with the DRAM Latency PUF show that the evaluation time is around 3.4µs for a 32B cache block [68], and 0.87ms for an 8KB memory segment [68].

As we demonstrated in Figure 8, the Dataplant PUF responses to the same challenge are very similar to each other (i.e., the Intra-Jaccard index distribution is very close to one). In particular, we observed that, in the worst DRAM chip we tested, Dataplant PUF responses to the same challenge are exactly the same in 99.72% of cases. To ensure reliable PUF behavior, we implement a filtering mechanism similar to the filter implemented in the DRAM Latency PUF, but using only 10 PUF responses. With the filtering mechanism, all responses to the same challenge are exactly the same in our experiments. Table 3 summarizes the total evaluation time of the DRAM Latency PUF, the UC-Dataplant PUF, and the US-Dataplant PUF (with and without filter).

| Latency PUF | Latency PUF (no filter) | Dataplant PUFs | Dataplant PUFs (no filter) |
|---|---|---|---|
| 87 ms | 0.87 ms | 8.7 ms | 0.87 ms |

Table 3: PUF evaluation time of the DRAM Latency PUF and the Dataplant PUF, using 8KB memory segments.
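The entries of Table 3 follow from the per-block measurement through simple arithmetic, assuming (as the reported numbers suggest) that the per-block cost adds linearly over a segment and that a filter multiplies the cost by the number of reads. The helper names below are hypothetical.

```python
def segment_time_ms(segment_bytes=8 * 1024, block_bytes=32, block_time_us=3.4):
    """Per-segment evaluation time, assuming each cache block costs block_time_us."""
    blocks = segment_bytes // block_bytes   # 8 KB / 32 B = 256 blocks
    return blocks * block_time_us / 1000.0  # us -> ms


def total_time_ms(num_reads, per_segment_ms):
    """A filter that evaluates the segment num_reads times multiplies the cost."""
    return num_reads * per_segment_ms


per_seg = segment_time_ms()  # about 0.87 ms
print(f"no filter:       {per_seg:.2f} ms")
print(f"100-read filter: {total_time_ms(100, per_seg):.1f} ms")  # Latency PUF
print(f"10-read filter:  {total_time_ms(10, per_seg):.1f} ms")   # Dataplant PUFs
```

The 10x gap in Table 3 is therefore entirely the ratio of filter sizes (100 vs. 10 reads), since the unfiltered evaluation times are the same.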

Our main observation is that the Dataplant PUFs are 10x faster than the DRAM Latency PUF, because they achieve reliable results with only 10 PUF responses, whereas the DRAM Latency PUF requires 100. Using a filtering mechanism with 10 responses in the DRAM Latency PUF causes a large degradation of PUF quality (Section 7.2.2) and compromises functionality and security.

7.2.4 Randomness Analysis

A secure key, or a seed used to initialize a pseudo-random number generator, should be random and have high entropy. Although we already demonstrated the uniqueness of the responses between different memory segments (Section 7.2.2), this does not guarantee properties such as high entropy.

Methodology. We analyze the randomness of the values generated by UC-Dataplant with real DRAM chips, using the experimental setup of Section 7.2.2. We generate a sequence of numbers composed of the relative positions of the unpredictable values in a cache line. We use the NIST statistical test suite [57] to analyze the numbers generated by UC-Dataplant for 10 different DRAM modules.

Results. We run the NIST test suite on the responses to different challenges from all the tested DRAM chips. We collect the PUF responses and form sequences of up to 250KB (depending on the test). Table 4 shows the average NIST p-values and the NIST final results for the numbers generated by UC-Dataplant.

| NIST test | P-value | Result |
|---|---|---|
| monobit | 0.681 | PASS |
| frequency_within_block | 1.000 | PASS |
| runs | 0.298 | PASS |
| longest_run_ones_in_a_block | 0.287 | PASS |
| binary_matrix_rank | 0.536 | PASS |
| dft | 0.165 | PASS |
| non_overlapping_template_matching | 0.808 | PASS |
| overlapping_template_matching | 0.005 | FAIL |
| maurers_universal | 0.987 | PASS |
| linear_complexity | 0.0185 | PASS |
| serial | 0.988 | PASS |
| approximate_entropy | 0.194 | PASS |
| cumulative_sums | 0.940 | PASS |
| random_excursion | 0.951 | PASS |
| random_excursion_variant | 0.693 | PASS |

Table 4: Dataplant average results with the NIST randomness test suite.
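To show what a single entry in Table 4 measures, the following sketch implements the simplest test of the NIST SP 800-22 suite, the frequency (monobit) test: bits are mapped to ±1 and summed, and the p-value is $\mathrm{erfc}(|S_n| / \sqrt{2n})$. It is an illustration only; the results in Table 4 come from the full NIST suite [57], and the function name `monobit_p_value` is ours.

```python
import math


def monobit_p_value(bits):
    """NIST SP 800-22 frequency (monobit) test p-value.

    bits: sequence of 0/1 values (ints or '0'/'1' characters).
    A p-value >= 0.01 is conventionally treated as a pass.
    """
    n = len(bits)
    s_n = sum(1 if int(b) else -1 for b in bits)  # map 1 -> +1, 0 -> -1
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


p = monobit_p_value("1011010101")  # worked example from SP 800-22
print(round(p, 6))                 # 0.527089
print("PASS" if p >= 0.01 else "FAIL")
```

With the conventional 0.01 significance level, the 0.005 p-value of overlapping_template_matching in Table 4 is a failure, while the 0.0185 p-value of linear_complexity still passes.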

We make two observations. First, the numbers generated by UC-Dataplant pass 14 out of 15 NIST tests, which indicates that our PUF is able to generate good-quality random numbers. Second, the numbers generated by UC-Dataplant fail the overlapping_template_matching test. We attribute this failure to the fact that UC-Dataplant depends on the process variation of both SAs and cells. As we do not know which rows share the same SAs (the layout is unknown), we cannot intelligently select rows that do not share SAs to obtain more randomness. We leave reverse engineering the subarray layout to future work.

7.3 Preventing Cold Boot Attacks

In this section, we evaluate the Command-Based Destruction and Self-Destruction mechanisms described in Section 6. We customize the memory controller to implement Command-Based Destruction with Rowclone, Lisa-clone, and our two Dataplant implementations. We implement Self-Destruction with our two Dataplant variants. We also implement the TCG specification [111] for preventing cold boot attacks.

Methodology. We customize Ramulator [122] to support the two proposed Dataplant implementations, Rowclone, and Lisa-clone. Table 5 summarizes the DRAM and memory controller configurations used in our evaluation.

Our baseline is the TCG software cold boot attack prevention mechanism. We evaluate TCG by simulating a firmware approach that overwrites the memory with zeros by issuing regular write requests. To force the data to be written back from the cache to memory, we use an instruction that flushes the data from the cache (e.g., the CLFLUSH instruction in x86). TCG does not require any hardware changes other than BIOS customization.

We implement our Self-Destruction and Command-Based Destruction mechanisms. Self-Destruction takes place entirely within DRAM and is implemented only with Dataplant. We implement the two variants of Self-Destruction described in Section 6.1.1, namely Self-Destruction using self-refresh and Self-Destruction using burst refresh. Command-Based Destruction issues commands from the memory controller that destroy data in DRAM with the state-of-the-art primitives Rowclone and Lisa-clone, and also with our two Dataplant implementations.

To calculate latency of Dataplant we use the SA design described in [70], and to calculate the energy we use a customized version of DRAMPower [114]. For US-Dataplant, we use the same timing parameters as a regular activation, and for UC-Dataplant, we use the same timing as a regular precharge (see Section 4 for details).

| Component | Configuration |
|---|---|
| Processor | in-order core, 32KB L1 D&I, 512KB L2 |
| Memory controller | 64/64-entry read/write queue, FR-FCFS [123, 124] |
| DRAM | 1-2 channels, DDR3-1600 x8 11/11/11 |

Table 5: System configuration for evaluating our cold boot attack prevention mechanism.

We have already analyzed the security and hardware cost of our mechanisms in Section 6.1, so in this evaluation we focus on latency improvements and energy savings.

Latency Results

Figure 11 shows the destruction time (in seconds, logarithmic scale) of the TCG software implementation, Command-Based Destruction (Cmd-D) using all primitives, Self-Destruction with Burst Refresh (Self-D-Burst) using Dataplant, and Self-Destruction with Self-Refresh (Self-D-SR) using Dataplant. We assume that the specification of US-Dataplant and UC-Dataplant shares some timing parameters with a regular activation (e.g., tFAW and tRRD) to meet internal DRAM power restrictions. Although we showed earlier that UC-Dataplant can perform much faster than US-Dataplant (Table 1), their energy consumption is very similar, which limits the throughput of UC-Dataplant. In practice, the latency results of US-Dataplant and UC-Dataplant are identical for the cold boot attack prevention mechanism (Cmd-D Dataplant and Self-D-SR Dataplant in the figure). We test different DRAM sizes, from 64MB, used in memories specifically designed for IoT [60], to 64GB, used in high-end servers [125]. Our simulator takes into account all timing parameters defined by the DDR standard [59]. The timing parameters for each size are taken from public datasheets released by vendors [126]. For the memories for which we do not have enough information about timing parameters (e.g., 64MB, 64GB), we extrapolate the parameters from similar memory modules.
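Why a shorter per-operation latency need not shorten destruction time can be seen from the two activation-rate constraints mentioned above. The sketch below greedily schedules back-to-back activation-like operations in a single rank, spacing consecutive operations by at least tRRD and allowing at most four in any tFAW window. It is a simplified lower-bound model under our own assumptions (one rank, no precharge or bank-conflict timing, no refresh), and the function name `last_activation_issue_ns` is hypothetical. The example values tRRD = 6ns and tFAW = 30ns are illustrative DDR3-class numbers, not the parameters used in our simulations.

```python
from collections import deque


def last_activation_issue_ns(n_acts, t_rrd_ns, t_faw_ns, acts_per_window=4):
    """Earliest issue time of the last of n back-to-back activations in one rank.

    Constraints: consecutive activations are at least tRRD apart, and at most
    `acts_per_window` activations may be issued in any tFAW window.
    """
    recent = deque(maxlen=acts_per_window)  # issue times of the latest activations
    t = 0.0
    for _ in range(n_acts):
        if recent:
            t = recent[-1] + t_rrd_ns                # tRRD spacing
            if len(recent) == acts_per_window:
                t = max(t, recent[0] + t_faw_ns)     # four-activation window
        recent.append(t)
    return t


# Illustrative values: 8 operations, tRRD = 6 ns, tFAW = 30 ns.
print(last_activation_issue_ns(8, 6.0, 30.0))  # 48.0
```

Here the fifth operation must wait until 30ns even though tRRD alone would allow it at 24ns. Once tFAW dominates, the sustained rate is four operations per tFAW regardless of whether each operation takes 35ns (US-Dataplant) or 13ns (UC-Dataplant), which is consistent with the two variants showing identical destruction latency.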

Figure 11: Time (log scale) to destroy all DRAM data using a software implementation (TCG), our Command-based Destruction (Cmd-D), our Self-Destruction using Burst refresh (Self-D-Burst Dataplant) and our Self-Destruction using Self-Refresh (Self-D-SR Dataplant).

We make four major observations. First, the software-based destruction mechanism (TCG) has a high latency, especially for large DRAM sizes, which significantly delays the boot time of the system. Second, Command-based Destruction (Cmd-D) shows tolerable latencies for small and medium memory sizes. The Command-based Dataplant implementation performs 1.5x better than Rowclone and 2.5x better than Lisa-clone. Third, Self-Destruction based on burst refresh performs 19.5x better than Rowclone and 32.6x better than Lisa-clone. Fourth, Self-Destruction based on Self-Refresh has the same latency as a regular refresh window. This implementation shows the best trade-off between performance and complexity (see Section 6.1.1).

Energy Results. Table 6 shows the energy savings of the different hardware mechanisms compared to TCG.

| Cmd-D Lisa | Cmd-D Rowclone | Self-D-Burst & Self-D-SR Dataplant |
|---|---|---|
| 25x | 45x | 114x |

Table 6: Energy savings in the destruction of all DRAM data compared to TCG.

The energy consumption of Self-Destruction is approximately the same as the Command-Based approach (excluding the energy of the bus). We can observe that our Dataplant implementations show large energy savings compared to TCG (114x), and very significant energy savings compared to Lisa-clone (4.5x) and Rowclone (2.54x).

Comparison with other State-of-the-Art Mechanisms

There exist other state-of-the-art cold boot attack protection mechanisms that are not directly comparable with our proposal, as they are radically different. This is the case with memory encryption, which provides strong security guarantees at the cost of additional energy consumption. Table 7 shows the performance, power, and area overhead of our mechanism compared to ChaCha-8 [36] and AES-128 [36], two low-cost ciphers that can be used to prevent cold boot attacks efficiently [36].

| | Self-Destruction | ChaCha-8 [36] | AES-128 [36] |
|---|---|---|---|
| Runtime performance overhead | 0% | 0% | 0% (a) |
| Runtime power overhead (b) | 0% | 17% | 12% |
| Area overhead | 0% | 0.9% | 1.25% |

(a) When there are fewer than 16 back-to-back row hits.
(b) At peak bandwidth utilization.

Table 7: Overhead of our Self-Destruction mechanism based on Dataplant compared to two other mechanisms to prevent cold boot attacks on an Intel Atom N280 processor.

We make two main observations. First, our cold boot attack prevention mechanism based on Dataplant has zero performance and power overhead at runtime, and low hardware cost, which makes it difficult to beat as a low-cost method for preventing cold boot attacks. Second, although ChaCha-8 and AES-128 can be implemented to hide the encryption latency in the common case [36], the power and area overheads of ChaCha-8 and AES-128 are significant in low-cost processors such as the Intel Atom N280. We conclude that our zero-overhead proposal is a very efficient way to protect against cold boot attacks in systems where encryption is not an option. (While AES-128 and ChaCha-8 provide additional security features, we evaluate only their ability to prevent cold boot attacks, as studied in recent literature [36].)

8 Related Work

To our knowledge, this is the first paper to propose in-DRAM primitives that can enable security mechanisms in low-cost devices. We demonstrate two applications of our primitives: (1) PUF-based authentication and (2) cold boot attack prevention.

In-Memory Operations. We have already compared Dataplant to Lisa [107] and Rowclone [106]. As discussed, Dataplant is a very low-overhead set of in-memory primitives that generate data for security mechanisms. Prior works on in-memory operations target other basic functionalities in commodity DRAM chips, such as bitwise AND/OR operations [127, 128, 129]. A number of works perform processing near memory using 3D-stacked memories, which often contain a logic layer, but such logic incurs a much higher hardware cost [130, 131, 132, 133].
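Bulk bitwise AND/OR in commodity DRAM [127] relies on activating three rows simultaneously, so that each bitline settles to the majority value of the three cells; presetting the third (control) row to all zeros yields AND, and presetting it to all ones yields OR. A minimal bit-level model of that idea follows, using Python integers as hypothetical 64-bit rows. The names `majority`, `in_dram_and`, and `in_dram_or` are illustrative, and the model ignores all analog charge-sharing effects.

```python
ROW_BITS = 64  # hypothetical row width used only for this example
ALL_ONES = (1 << ROW_BITS) - 1

def majority(a, b, c):
    """Bitwise majority of three rows: a bit is 1 if at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)

def in_dram_and(a, b):
    # Control row preset to all zeros: MAJ(A, B, 0) == A AND B
    return majority(a, b, 0)

def in_dram_or(a, b):
    # Control row preset to all ones: MAJ(A, B, 1...1) == A OR B
    # (assumes a and b fit in ROW_BITS bits)
    return majority(a, b, ALL_ONES)

if __name__ == "__main__":
    a, b = 0b1100, 0b1010
    print(bin(in_dram_and(a, b)), bin(in_dram_or(a, b)))  # 0b1000 0b1110
```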

PUFs. Many PUFs have been investigated in different components, such as SRAM [134, 135, 136, 137, 138, 139], ASIC logic [77, 140], and DRAM [63, 67, 101, 68]. Two existing DRAM PUFs can be accessed at runtime. First, the Runtime DRAM PUF [101] disables refresh in certain memory regions that are initialized with specific values; the PUF response is a function of the errors produced in the cells due to the lack of refresh after some time t. Second, the DRAM Latency PUF [68] accesses DRAM with reduced timing parameters to induce errors. The DRAM Latency PUF overwrites the memory regions used by the PUF, which could cause significant interference with the rest of the system. Our Dataplant-based PUFs have lower evaluation times than these state-of-the-art DRAM PUFs. Also, to our knowledge, our US-Dataplant PUF is the first DRAM PUF that does not overwrite values in DRAM.
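The retention-based evaluation described for the Runtime DRAM PUF [101] can be sketched as a small simulation. Everything numeric here is a hypothetical assumption: per-cell retention times are drawn once from an illustrative log-normal distribution (standing in for process variation), run-to-run noise is a small uniform relative perturbation, and responses are compared with the Jaccard index [115], one similarity metric used for DRAM PUF responses. The function names are illustrative only.

```python
import random

def make_chip(n_cells, seed):
    """Hypothetical chip: each cell gets a fixed retention time (ms), drawn once,
    standing in for manufacturing process variation."""
    rng = random.Random(seed)
    # Illustrative log-normal spread (median about 1.1 s); not measured data.
    return [rng.lognormvariate(7.0, 1.0) for _ in range(n_cells)]

def evaluate_retention_puf(retention_ms, t_ms, noise, rng):
    """Disable refresh for t_ms and return the indices of cells that lose their data.
    `noise` models run-to-run variation as a +/- relative perturbation."""
    return {
        i for i, r in enumerate(retention_ms)
        if r * (1.0 + rng.uniform(-noise, noise)) < t_ms
    }

def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two responses (sets of failing cells)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

if __name__ == "__main__":
    rng = random.Random(0)
    chip_a, chip_b = make_chip(16384, seed=1), make_chip(16384, seed=2)
    r1 = evaluate_retention_puf(chip_a, 100.0, 0.01, rng)
    r2 = evaluate_retention_puf(chip_a, 100.0, 0.01, rng)
    r3 = evaluate_retention_puf(chip_b, 100.0, 0.01, rng)
    print(f"same chip: {jaccard(r1, r2):.3f}, different chips: {jaccard(r1, r3):.3f}")
```

Two evaluations on the same simulated chip should yield a Jaccard index close to 1, while responses from different chips should overlap very little. The cell count, the threshold `t_ms`, and `noise` are illustrative values only.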

Cold Boot Attacks. Several works propose encryption mechanisms to protect data against different attacks, including cold boot attacks [51, 52, 53, 55]. These mechanisms usually introduce performance and energy overheads, which various proposals attempt to reduce [50, 52, 141, 54]. Intel’s Software Guard Extensions (SGX) [56] can create protected memory regions whose confidentiality and integrity are ensured by strong encryption (AES) and message authentication codes (MACs). Other works propose using modern stream ciphers as a fast way to encrypt memory [36, 142].

A recent work on data lifetime management [143] proposes disabling access to data in DRAM as another defense against cold boot attacks. The authors add a new flag to the DRAM decoder, set by a DRAM command, that controls whether untrusted programs can access a DRAM row. Unlike our Dataplant-based cold boot attack prevention mechanism, this prior work does not prevent an attacker with physical access from freely accessing the rows.

Seol et al. [144] propose a mechanism that initializes DRAM with a reset operation based on connecting and disconnecting power lines. This approach has higher latency than Dataplant because the reset operation must be followed by a precharge and an activation command, whereas Dataplant requires only one or two commands to empty a cell. Memory scramblers are the main protection against cold boot attacks in modern unencrypted memory systems. However, prior works have demonstrated that these scrambling mechanisms are not sufficient to protect against cold boot attacks [49, 36].
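To illustrate why scrambling alone falls short, consider a deliberately simplified, hypothetical scrambler model: each 64-byte line is XORed with a boot-time mask that depends only on the line index modulo a small period. This is not the exact scrambler design analyzed in [49, 36], but it captures roughly the kind of weakness such attacks exploit: large parts of memory hold all-zero data, and a zero-filled line scrambles to exactly its mask, so the most frequent scrambled value in each address class reveals the mask. `LINE`, `PERIOD`, `make_masks`, `scramble`, and `recover_masks` are illustrative names.

```python
import random
from collections import Counter

LINE = 64    # bytes per scrambled line (assumed)
PERIOD = 16  # hypothetical: the mask pattern repeats every PERIOD lines

def make_masks(boot_seed):
    """Hypothetical scrambler state: PERIOD random 64-byte masks chosen at boot."""
    rng = random.Random(boot_seed)
    return [bytes(rng.getrandbits(8) for _ in range(LINE)) for _ in range(PERIOD)]

def scramble(masks, line_index, data):
    """XOR a line with the mask of its address class (the same call unscrambles)."""
    mask = masks[line_index % PERIOD]
    return bytes(x ^ m for x, m in zip(data, mask))

def recover_masks(dump):
    """Attacker side: guess each mask as the most frequent scrambled line
    in its address class of the cold-boot memory image."""
    guesses = []
    for r in range(PERIOD):
        counts = Counter(dump[i] for i in range(r, len(dump), PERIOD))
        guesses.append(counts.most_common(1)[0][0])
    return guesses

if __name__ == "__main__":
    masks = make_masks(boot_seed=1234)
    memory = [bytes(LINE) for _ in range(1024)]  # mostly zero-filled memory
    secret = b"disk encryption key material".ljust(LINE, b"\x00")
    memory[5] = secret
    dump = [scramble(masks, i, line) for i, line in enumerate(memory)]  # memory image
    guessed = recover_masks(dump)
    print(scramble(guessed, 5, dump[5]) == secret)  # True
```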

Our cold boot attack prevention mechanism improves on the state of the art with a very simple design that has no runtime performance or energy overhead.

9 Conclusion

We propose Dataplant, a set of low-cost, highly efficient, and reliable primitives that can cheaply enable important security mechanisms on any device that makes use of DRAM. Our primitives are especially useful for low-cost and Internet-of-Things (IoT) devices. The main idea of Dataplant is to slightly modify the internal DRAM timing signals to expose the inherent process variation found in all DRAM chips for generating unpredictable but reproducible values (e.g., keys, seeds, signatures) within DRAM. We design two low-cost security mechanisms using Dataplant: (1) lightweight authentication using physically unclonable functions (PUFs) and (2) cold boot attack prevention. Using a combination of simulation and evaluations on real DRAM chips, we show that these two mechanisms are significantly faster and more energy-efficient than their state-of-the-art counterparts, with the same security guarantees. We conclude that Dataplant can effectively enable low-cost and low-power security mechanisms for all types of devices that make use of DRAM, from embedded and IoT devices to high-end servers. This paper is a first step towards a more secure DRAM memory. We hope and expect that the availability of Dataplant in commodity DRAM chips will enable new security features and applications.

References

  • [1] J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami, “Internet of things (iot): A vision, architectural elements, and future directions,” Future generation computer systems, vol. 29, no. 7, pp. 1645–1660, 2013.
  • [2] A. Palanca, E. Evenchick, F. Maggi, and S. Zanero, “A stealth, selective, link-layer denial-of-service attack against automotive networks,” in Detection of Intrusions and Malware, and Vulnerability Assessment (M. Polychronakis and M. Meier, eds.), (Cham), pp. 185–206, Springer International Publishing, 2017.
  • [3] “Hackers can stop or speed up xiaomi’s m365 electric scooter.” https://www.engadget.com/2019/02/14/xiaomi-m365-electric-scooter-hack-bluetooth.
  • [4] “Dyn cyber attack.” https://en.wikipedia.org/wiki/2016_Dyn_cyberattack.
  • [5] “Ukraine power grid cyber attack.” https://en.wikipedia.org/wiki/December_2015_Ukraine_power_grid_cyberattack.
  • [6] C.-C. Lu and S.-Y. Tseng, “Integrated Design of AES (Advanced Encryption Standard) Encrypter and Decrypter,” in ASAP, 2002.
  • [7] Y. Yuan, Y. Yang, L. Wu, and X. Zhang, “A High Performance Encryption System Based on AES Algorithm with Novel Hardware Implementation,” in EDSSC, 2018.
  • [8] E. Trichina, T. Korkishko, and K. H. Lee, “Small Size, Low Power, Side Channel-immune AES Coprocessor: Design and Synthesis Results,” in International Conference on Advanced Encryption Standard, 2004.
  • [9] F. Gürkaynak, A. Burg, N. Felber, W. Fichtner, D. Gasser, F. Hug, and H. Kaeslin, “A 2 Gb/s Balanced AES Crypto-chip Implementation,” in GLSVLSI, 2004.
  • [10] T. Amaki, M. Hashimoto, and T. Onoye, “An Oscillator-based True Random Number Generator with Process and Temperature Tolerance,” in DAC, 2015.
  • [11] K. Yang, D. Blaauw, and D. Sylvester, “An All-digital Edge Racing True Random Number Generator Robust Against PVT Variations,” JSSC, 2016.
  • [12] M. Bucci, L. Germani, R. Luzzi, A. Trifiletti, and M. Varanonuovo, “A High-speed Oscillator-based Truly Random Number Source for Cryptographic Applications on a Smart Card IC,” TC, 2003.
  • [13] M. Bhargava, K. Sheikh, and K. Mai, “Robust True Random Number Generator using Hot-carrier Injection Balanced Metastable Sense Amplifiers,” in HOST, 2015.
  • [14] C. S. Petrie and J. A. Connelly, “A Noise-based IC Random Number Generator for Applications in Cryptography,” Trans. Circuits Syst. I, 2000.
  • [15] S. K. Mathew, S. Srinivasan, M. A. Anders, H. Kaul, S. K. Hsu, F. Sheikh, A. Agarwal, S. Satpathy, and R. K. Krishnamurthy, “2.4 Gbps, 7 mW All-digital PVT-variation Tolerant True Random Number Generator for 45 nm CMOS High-performance Microprocessors,” JSSC, 2012.
  • [16] R. Brederlow, R. Prakash, C. Paulus, and R. Thewes, “A Low-power True Random Number Generator using Random Telegraph Noise of Single Oxide-traps,” in ISSCC, 2006.
  • [17] C. Tokunaga, D. Blaauw, and T. Mudge, “True Random Number Generator with a Metastability-based Quality Control,” JSSC, 2008.
  • [18] D. Kinniment and E. Chester, “Design of an On-chip Random Number Generator using Metastability,” in ESSCIRC, 2002.
  • [19] J. Holleman, S. Bridges, B. P. Otis, and C. Diorio, “A 3 μW CMOS True Random Number Generator with Adaptive Floating-Gate Offset Cancellation,” JSSC, 2008.
  • [20] D. E. Holcomb, W. P. Burleson, and K. Fu, “Power-Up SRAM State as an Identifying Fingerprint and Source of True Random Numbers,” in TC, 2009.
  • [21] F. Pareschi, G. Setti, and R. Rovatti, “A Fast Chaos-based True Random Number Generator for Cryptographic Applications,” in ESSCIRC, 2006.
  • [22] A. Stefanov, N. Gisin, O. Guinnard, L. Guinnard, and H. Zbinden, “Optical Quantum Random Number Generator,” J. Mod. Opt, 2000.
  • [23] R. Maes, A. Van Herrewege, and I. Verbauwhede, “Pufky: A fully functional puf-based cryptographic key generator,” in Cryptographic Hardware and Embedded Systems – CHES 2012 (E. Prouff and P. Schaumont, eds.), (Berlin, Heidelberg), pp. 302–319, Springer Berlin Heidelberg, 2012.
  • [24] G. E. Suh and S. Devadas, “Physical unclonable functions for device authentication and secret key generation,” in 2007 44th ACM/IEEE Design Automation Conference, pp. 9–14, June 2007.
  • [25] Z. Paral and S. Devadas, “Reliable and efficient puf-based key generation using pattern matching,” in 2011 IEEE International Symposium on Hardware-Oriented Security and Trust, pp. 128–133, June 2011.
  • [26] M. D. Yu, R. Sowell, A. Singh, D. M’Raïhi, and S. Devadas, “Performance metrics and empirical results of a puf cryptographic key generation asic,” in 2012 IEEE International Symposium on Hardware-Oriented Security and Trust, pp. 108–115, June 2012.
  • [27] J. Delvaux, D. Gu, D. Schellekens, and I. Verbauwhede, “Helper data algorithms for puf-based key generation: Overview and analysis,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 34, pp. 889–902, June 2015.
  • [28] M.-D. M. Yu, D. M’Raihi, R. Sowell, and S. Devadas, “Lightweight and secure puf key storage using limits of machine learning,” in Cryptographic Hardware and Embedded Systems – CHES 2011 (B. Preneel and T. Takagi, eds.), (Berlin, Heidelberg), pp. 358–373, Springer Berlin Heidelberg, 2011.
  • [29] J. W. Lee, D. Lim, B. Gassend, G. E. Suh, M. van Dijk, and S. Devadas, “A technique to build a secret key in integrated circuits for identification and authentication applications,” in 2004 Symposium on VLSI Circuits. Digest of Technical Papers (IEEE Cat. No.04CH37525), pp. 176–179, June 2004.
  • [30] G. Selimis, M. Konijnenburg, M. Ashouei, J. Huisken, H. de Groot, V. van der Leest, G. J. Schrijen, M. van Hulst, and P. Tuyls, “Evaluation of 90nm 6t-sram as physical unclonable function for secure key generation in wireless sensor nodes,” in 2011 IEEE International Symposium of Circuits and Systems (ISCAS), pp. 567–570, May 2011.
  • [31] H. Kang, Y. Hori, T. Katashita, M. Hagiwara, and K. Iwamura, “Cryptographie key generation from puf data using efficient fuzzy extractors,” in 16th International Conference on Advanced Communication Technology, pp. 23–26, Feb 2014.
  • [32] M. Bhargava and K. Mai, “An efficient reliable puf-based cryptographic key generator in 65nm cmos,” in Proceedings of the Conference on Design, Automation & Test in Europe, DATE ’14, (3001 Leuven, Belgium, Belgium), pp. 70:1–70:6, European Design and Automation Association, 2014.
  • [33] R. Maes, V. van der Leest, E. van der Sluis, and F. Willems, “Secure key generation from biased pufs,” in Cryptographic Hardware and Embedded Systems – CHES 2015 (T. Güneysu and H. Handschuh, eds.), (Berlin, Heidelberg), pp. 517–534, Springer Berlin Heidelberg, 2015.
  • [34] P. Tuyls and B. Škorić, “Secret key generation from classical physics: Physical uncloneable functions,” in AmIware Hardware Technology Drivers of Ambient Intelligence, pp. 421–447, Springer, 2006.
  • [35] M. Yu, D. M’Raïhi, S. Devadas, and I. Verbauwhede, “Security and reliability properties of syndrome coding techniques used in puf key generation,” in GOMACTech Conference, pp. 1–4, 2013.
  • [36] S. F. Yitbarek, M. T. Aga, R. Das, and T. Austin, “Cold boot attacks are still hot: Security analysis of memory scramblers in modern processors,” in 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 313–324, Feb 2017.
  • [37] J. A. Halderman, S. D. Schoen, N. Heninger, W. Clarkson, W. Paul, J. A. Calandrino, A. J. Feldman, J. Appelbaum, and E. W. Felten, “Lest we remember: Cold-boot attacks on encryption keys,” Commun. ACM, vol. 52, pp. 91–98, May 2009.
  • [38] P. Simmons, “Security through amnesia: A software-based solution to the cold boot attack on disk encryption,” in Proceedings of the 27th Annual Computer Security Applications Conference, ACSAC ’11, (New York, NY, USA), pp. 73–82, ACM, 2011.
  • [39] M. Gruhn and T. Müller, “On the practicability of cold boot attacks,” in 2013 International Conference on Availability, Reliability and Security, pp. 390–397, Sept 2013.
  • [40] “9 baby monitors wide open to hacks that expose users’ most private moments.” https://arstechnica.com/information-technology/2015/09/9-baby-monitors-wide-open-to-hacks-that-expose-users-most-private-moments/.
  • [41] “Hacker can send fatal dose to hospital drug pumps.” https://www.wired.com/2015/06/hackers-can-send-fatal-doses-hospital-drug-pumps/.
  • [42] “Hacked terminals capable of causing pacemaker deaths.” https://www.itnews.com.au/news/hacked-terminals-capable-of-causing-pacemaker-mass-murder-319508.
  • [43] “Canon IoT security cameras.” https://www.csoonline.com/article/3271086/security/im-hacked-message-left-on-dozens-of-defaced-canon-iot-security-cameras-in-japan.html.
  • [44] N. Heninger, Z. Durumeric, E. Wustrow, and J. A. Halderman, “Mining your ps and qs: Detection of widespread weak keys in network devices.,” in USENIX Security Symposium, vol. 8, p. 1, 2012.
  • [45] P. Tuyls and B. Škorić, “Strong authentication with physical unclonable functions,” in Security, Privacy, and Trust in Modern Data Management, pp. 133–148, Springer, 2007.
  • [46] J. Guajardo, S. S. Kumar, G.-J. Schrijen, and P. Tuyls, “Fpga intrinsic pufs and their use for ip protection,” in International workshop on cryptographic hardware and embedded systems, pp. 63–80, Springer, 2007.
  • [47] B. Canvel, A. Hiltgen, S. Vaudenay, and M. Vuagnoux, “Password interception in a ssl/tls channel,” in Advances in Cryptology - CRYPTO 2003 (D. Boneh, ed.), (Berlin, Heidelberg), pp. 583–599, Springer Berlin Heidelberg, 2003.
  • [48] C. Hilgers, H. Macht, T. Müller, and M. Spreitzenbarth, “Post-mortem memory analysis of cold-booted android devices,” in 2014 Eighth International Conference on IT Security Incident Management IT Forensics, pp. 62–75, May 2014.
  • [49] J. Bauer, M. Gruhn, and F. C. Freiling, “Lest we forget: Cold-boot attacks on scrambled ddr3 memory,” Digital Investigation, vol. 16, pp. S65 – S74, 2016. DFRWS 2016 Europe.
  • [50] G. E. Suh, D. Clarke, B. Gassend, M. v. Dijk, and S. Devadas, “Efficient memory integrity verification and encryption for secure processors,” in Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 36, (Washington, DC, USA), pp. 339–, IEEE Computer Society, 2003.
  • [51] T. W. Arnold and L. P. Van Doorn, “The ibm pcixcc: A new cryptographic coprocessor for the ibm eserver,” IBM Journal of Research and Development, vol. 48, no. 3.4, pp. 475–487, 2004.
  • [52] J. Yang, L. Gao, and Y. Zhang, “Improving memory encryption performance in secure processors,” IEEE Trans. Comput., vol. 54, pp. 630–640, May 2005.
  • [53] G. Duc and R. Keryell, “Cryptopage: An efficient secure architecture with memory encryption, integrity and information leakage protection,” in 2006 22nd Annual Computer Security Applications Conference (ACSAC’06), pp. 483–492, Dec 2006.
  • [54] B. Rogers, S. Chhabra, M. Prvulovic, and Y. Solihin, “Using address independent seed encryption and bonsai merkle trees to make secure processors os- and performance-friendly,” in 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007), pp. 183–196, Dec 2007.
  • [55] M. Henson and S. Taylor, “Memory encryption: A survey of existing techniques,” ACM Comput. Surv., vol. 46, pp. 53:1–53:26, Mar. 2014.
  • [56] V. Costan and S. Devadas, “Intel sgx explained.,” IACR Cryptology ePrint Archive, vol. 2016, p. 86, 2016.
  • [57] A. Rukhin, J. Soto, J. Nechvatal, M. Smid, and E. Barker, “A statistical test suite for random and pseudorandom number generators for cryptographic applications,” tech. rep., Booz-Allen and Hamilton Inc Mclean Va, 2001.
  • [58] J. Chow, B. Pfaff, T. Garfinkel, and M. Rosenblum, “Shredding your garbage: Reducing data lifetime through secure deallocation.,” in USENIX Security Symposium, pp. 22–22, 2005.
  • [59] J. S. S. T. Association et al., “JEDEC standard: DDR4 SDRAM,” JESD79-4, Sep, 2012.
  • [60] “HyperRAM.” http://www.cypress.com/file/183506/download.
  • [61] “LPDRAM.” http://www.apmemory.com/product_category2.php.
  • [62] “LPDRAM.” www.zentel.com.
  • [63] F. Tehranipoor, N. Karimian, K. Xiao, and J. Chandy, “DRAM based intrinsic physical unclonable functions for system level security,” in Proceedings of the 25th Edition on Great Lakes Symposium on VLSI, GLSVLSI ’15, (New York, NY, USA), pp. 15–20, ACM, 2015.
  • [64] M. S. Hashemian, B. Singh, F. Wolff, D. Weyer, S. Clay, and C. Papachristou, “A robust authentication methodology using physically unclonable functions in DRAM arrays,” in Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, DATE ’15, (San Jose, CA, USA), pp. 647–652, EDA Consortium, 2015.
  • [65] C. Keller, F. Gürkaynak, H. Kaeslin, and N. Felber, “Dynamic memory-based physically unclonable function for the generation of unique identifiers and true random numbers,” in 2014 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 2740–2743, June 2014.
  • [66] W. Liu, Z. Zhang, M. Li, and Z. Liu, “A trustworthy key generation prototype based on ddr3 puf for wireless sensor networks,” in 2014 International Symposium on Computer, Consumer and Control, pp. 706–709, June 2014.
  • [67] S. Sutar, A. Raha, and V. Raghunathan, “D-puf: An intrinsically reconfigurable DRAM PUF for device authentication in embedded systems,” in 2016 International Conference on Compilers, Architectures, and Synthesis of Embedded Systems (CASES), pp. 1–10, Oct 2016.
  • [68] J. S. Kim, M. Patel, H. Hassan, and O. Mutlu, “The dram latency puf: Quickly evaluating physical unclonable functions by exploiting the latency-reliability tradeoff in modern commodity dram devices,” in 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 194–207, Feb 2018.
  • [69] A. Costin, J. Zaddach, A. Francillon, D. Balzarotti, and S. Antipolis, “A large-scale analysis of the security of embedded firmwares.,” in USENIX Security Symposium, pp. 95–110, 2014.
  • [70] B. Keeth, DRAM circuit design: fundamental and high-speed topics, vol. 13. John Wiley & Sons, 2008.
  • [71] J. Liu, B. Jaiyen, Y. Kim, C. Wilkerson, and O. Mutlu, “An experimental study of data retention behavior in modern DRAM devices: Implications for retention time profiling mechanisms,” in Proceedings of the 40th Annual International Symposium on Computer Architecture, ISCA ’13, (New York, NY, USA), pp. 60–71, ACM, 2013.
  • [72] Y. Kim, R. Daly, J. Kim, C. Fallin, J. H. Lee, D. Lee, C. Wilkerson, K. Lai, and O. Mutlu, “Flipping bits in memory without accessing them: An experimental study of DRAM disturbance errors,” in Proceeding of the 41st Annual International Symposium on Computer Architecture, ISCA ’14, (Piscataway, NJ, USA), pp. 361–372, IEEE Press, 2014.
  • [73] V. van der Veen, Y. Fratantonio, M. Lindorfer, D. Gruss, C. Maurice, G. Vigna, H. Bos, K. Razavi, and C. Giuffrida, “Drammer: Deterministic rowhammer attacks on mobile platforms,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS ’16, (New York, NY, USA), pp. 1675–1689, ACM, 2016.
  • [74] Z. B. Aweke, S. F. Yitbarek, R. Qiao, R. Das, M. Hicks, Y. Oren, and T. Austin, “Anvil: Software-based protection against next-generation rowhammer attacks,” in Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’16, (New York, NY, USA), pp. 743–755, ACM, 2016.
  • [75] C. Miller and C. Valasek, “Remote exploitation of an unaltered passenger vehicle,” Black Hat USA, vol. 2015, 2015.
  • [76] B. Gassend, D. Clarke, M. van Dijk, and S. Devadas, “Silicon physical random functions,” in Proceedings of the 9th ACM Conference on Computer and Communications Security, CCS ’02, (New York, NY, USA), pp. 148–160, ACM, 2002.
  • [77] D. Lim, J. W. Lee, B. Gassend, G. E. Suh, M. van Dijk, and S. Devadas, “Extracting secret keys from integrated circuits,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 13, pp. 1200–1205, Oct 2005.
  • [78] R. Maes and I. Verbauwhede, Physically Unclonable Functions: A Study on the State of the Art and Future Research Directions, pp. 3–37. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010.
  • [79] H. Corrigan-Gibbs, W. Mu, D. Boneh, and B. Ford, “Ensuring high-quality randomness in cryptographic key generation,” in Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, CCS ’13, (New York, NY, USA), pp. 685–696, ACM, 2013.
  • [80] H. Corrigan-Gibbs and S. Jana, “Recommendations for randomness in the operating system,”
  • [81] “CVE-2001-0950: ValiCert Enterprise Validation Authority uses insufficiently random data,” January 2001.
  • [82] “CVE-2001-1141: PRNG in SSLeay and OpenSSL could be used by attackers to predict future pseudo-random numbers,” July 2011.
  • [83] “CVE-2001-1467: mkpasswd, as used by Red Hat Linux, seeds its random number generator with its process ID,” April 2001.
  • [84] “CVE-2003-1376: WinZip uses weak random number generation for password protected ZIP files,” December 2003.
  • [85] “CVE-2005-3087: SecureW2 TLS implementation uses weak random number generators during generation of the pre-master secret,” September 2005.
  • [86] “CVE-2006-1378: PasswordSafe uses a weak random number generator,” March 2006.
  • [87] “CVE-2006-1833: Intel RNG Driver in NetBSD may always generate the same random number,” April 2006.
  • [88] “CVE-2007-2453: Random number feature in Linux kernel does not properly seed pools when there is no entropy,” June 2007.
  • [89] “CVE-2008-0141: WebPortal CMS generates predictable passwords containing only the time of day,” January 2008.
  • [90] “CVE-2008-0166: OpenSSL on Debian-based operating systems uses a random number generator that generates predictable numbers,” January 2008.
  • [91] “CVE-2008-2108: GENERATE_SEED macro in php produces 24 bits of entropy and simplifies brute force attacks against the rand and mt_rand functions,” May 2008.
  • [92] “CVE-2008-5162: The arc4random function in FreeBSD does not have a proper entropy source for a short time period immediately after boot,” November 2008.
  • [93] “CVE-2009-3238: Linux kernel produces insufficiently random numbers,” September 2009.
  • [94] “CVE-2009-3278: QNAP uses rand library function to generate a certain recovery key,” September 2009.
  • [95] “CVE-2011-3599: Crypt::DSA for Perl, when /dev/random is absent, uses the Data::Random module,” October 2011.
  • [96] A. Lenstra, J. P. Hughes, M. Augier, J. W. Bos, T. Kleinjung, and C. Wachter, “Ron was wrong, whit is right,” tech. rep., IACR, 2012.
  • [97] S. Yilek, E. Rescorla, H. Shacham, B. Enright, and S. Savage, “When private keys are public: Results from the 2008 debian openssl vulnerability,” in Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement, IMC ’09, (New York, NY, USA), pp. 15–27, ACM, 2009.
  • [98] A. Rahmati, M. Hicks, D. E. Holcomb, and K. Fu, “Probable cause: The deanonymizing effects of approximate dram,” in Proceedings of the 42nd Annual International Symposium on Computer Architecture, ISCA ’15, (New York, NY, USA), pp. 604–615, ACM, 2015.
  • [99] S. Sutar, A. Raha, and V. Raghunathan, “Memory-based combination pufs for device authentication in embedded systems,” IEEE Transactions on Multi-Scale Computing Systems, vol. 4, pp. 793–810, Oct 2018.
  • [100] F. Tehranipoor, N. Karimian, W. Yan, and J. A. Chandy, “Investigation of dram pufs reliability under device accelerated aging effects,” in 2017 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1–4, May 2017.
  • [101] W. Xiong, A. Schaller, N. A. Anagnostopoulos, M. U. Saleem, S. Gabmeyer, S. Katzenbeisser, and J. Szefer, Run-Time Accessible DRAM PUFs in Commodity Devices, pp. 432–453. Berlin, Heidelberg: Springer Berlin Heidelberg, 2016.
  • [102] M. Majzoobi, M. Rostami, F. Koushanfar, D. S. Wallach, and S. Devadas, “Slender puf protocol: A lightweight, robust, and secure authentication by substring matching,” in 2012 IEEE Symposium on Security and Privacy Workshops, pp. 33–44, May 2012.
  • [103] G. Hammouri and B. Sunar, “Puf-hb: A tamper-resilient hb based authentication protocol,” in Applied Cryptography and Network Security (S. M. Bellovin, R. Gennaro, A. Keromytis, and M. Yung, eds.), (Berlin, Heidelberg), pp. 346–365, Springer Berlin Heidelberg, 2008.
  • [104] M. Rostami, M. Majzoobi, F. Koushanfar, D. S. Wallach, and S. Devadas, “Robust and reverse-engineering resilient puf authentication and key-exchange by substring matching,” IEEE Transactions on Emerging Topics in Computing, vol. 2, pp. 37–49, March 2014.
  • [105] W. Che, F. Saqib, and J. Plusquellic, “Puf-based authentication,” in 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 337–344, Nov 2015.
  • [106] V. Seshadri, Y. Kim, C. Fallin, D. Lee, R. Ausavarungnirun, G. Pekhimenko, Y. Luo, O. Mutlu, P. B. Gibbons, M. A. Kozuch, and T. C. Mowry, “Rowclone: Fast and energy-efficient in-DRAM bulk data copy and initialization,” in 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 185–197, Dec 2013.
  • [107] K. K. Chang, P. J. Nair, D. Lee, S. Ghose, M. K. Qureshi, and O. Mutlu, “Low-cost inter-linked subarrays (lisa): Enabling fast inter-subarray data movement in DRAM,” in 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 568–580, March 2016.
  • [108] U. Rührmair and D. E. Holcomb, “Pufs at a glance,” in Proceedings of the Conference on Design, Automation & Test in Europe, DATE ’14, (3001 Leuven, Belgium, Belgium), pp. 347:1–347:6, European Design and Automation Association, 2014.
  • [109] D. Kaplan, J. Powell, and T. Woller, “AMD memory encryption,” AMD White Paper, 2016.
  • [110] P. Mosalikanti, C. Mozak, and N. Kurd, “High performance ddr architecture in intel core processors using 32nm cmos high-k metal-gate process,” in Proceedings of 2011 International Symposium on VLSI Design, Automation and Test, pp. 1–4, April 2011.
  • [111] Trusted Computing Group, “TCG platform reset attack mitigation specification,” 2008. Available at: http://www.trustedcomputinggroup.org/resources/pc_client_work_group_platform_reset_attack_mitigation_specification_version_10 (accessed April 2009).
  • [112] A. Pilkey, “The chilling reality of cold boot attacks.”
  • [113] H. Hassan, N. Vijaykumar, S. Khan, S. Ghose, K. Chang, G. Pekhimenko, D. Lee, O. Ergin, and O. Mutlu, “Softmc: A flexible and practical open-source infrastructure for enabling experimental DRAM studies,” in 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 241–252, Feb 2017.
  • [114] K. Chandrasekar, C. Weis, Y. Li, B. Akesson, N. Wehn, and K. Goossens, “DRAMPower: Open-source DRAM power & energy estimation tool,” URL: http://www.drampower.info, vol. 22, 2012.
  • [115] P. Jaccard, “Étude comparative de la distribution florale dans une portion des alpes et des jura,” Bull Soc Vaudoise Sci Nat, vol. 37, pp. 547–579, 1901.
  • [116] A. Schaller, W. Xiong, N. A. Anagnostopoulos, M. U. Saleem, S. Gabmeyer, S. Katzenbeisser, and J. Szefer, “Intrinsic rowhammer pufs: Leveraging the rowhammer effect for improved security,” in 2017 IEEE International Symposium on Hardware Oriented Security and Trust (HOST), pp. 1–7, May 2017.
  • [117] A. Aysu, Y. Wang, P. Schaumont, and M. Orshansky, “A new maskless debiasing method for lightweight physical unclonable functions,” in Hardware Oriented Security and Trust (HOST), 2017 IEEE International Symposium on, pp. 134–139, IEEE, 2017.
  • [118] A. G. Sabnis and J. T. Nelson, “A physical model for degradation of drams during accelerated stress aging,” in 21st International Reliability Physics Symposium, pp. 90–95, April 1983.
  • [119] F. H. Reynolds, “Thermally accelerated aging of semiconductor components,” Proceedings of the IEEE, vol. 62, pp. 212–222, Feb 1974.
  • [120] B. Saha, J. R. Celaya, P. F. Wysocki, and K. F. Goebel, “Towards prognostics for electronics components,” in 2009 IEEE Aerospace conference, pp. 1–7, March 2009.
  • [121] G. Sonnenfeld, K. Goebel, and J. R. Celaya, “An agile accelerated aging, characterization and scenario simulation system for gate controlled power transistors,” in 2008 IEEE AUTOTESTCON, pp. 208–215, Sept 2008.
  • [122] Y. Kim, W. Yang, and O. Mutlu, “Ramulator: A fast and extensible DRAM simulator,” IEEE Computer Architecture Letters, vol. 15, pp. 45–49, Jan 2016.
  • [123] S. Rixner, W. J. Dally, U. J. Kapasi, P. Mattson, and J. D. Owens, “Memory access scheduling,” in Proceedings of the 27th Annual International Symposium on Computer Architecture, ISCA ’00, (New York, NY, USA), pp. 128–138, ACM, 2000.
  • [124] W. K. Zuravleff and T. Robinson, “Controller for a synchronous DRAM that maximizes throughput by allowing memory requests and commands to be issued out of order,” May 13 1997. US Patent 5,630,096.
  • [125] “Micron ddr4 sdram lrdimm 64gb.” https://www.micron.com/~/media/documents/products/data-sheet/modules/lrdimm/ddr4/ass72c8gx72lz.pdf.
  • [126] “Micron.” https://www.micron.com.
  • [127] V. Seshadri, K. Hsieh, A. Boroumand, D. Lee, M. A. Kozuch, O. Mutlu, P. B. Gibbons, and T. C. Mowry, “Fast bulk bitwise AND and OR in DRAM,” IEEE Computer Architecture Letters, vol. 14, pp. 127–131, July 2015.
  • [128] A. Akerib, A. Oren, E. Ehrman, and M. Meyassed, “Using storage cells to perform computation,” Aug. 7 2012. US Patent 8,238,173.
  • [129] A. Akerib and E. Ehrman, “In-memory computational device,” May 16 2017. US Patent 9,653,166.
  • [130] A. Boroumand, S. Ghose, M. Patel, H. Hassan, B. Lucia, K. Hsieh, K. T. Malladi, H. Zheng, and O. Mutlu, “Lazypim: An efficient cache coherence mechanism for processing-in-memory,” IEEE Computer Architecture Letters, vol. 16, pp. 46–50, Jan 2017.
  • [131] L. Nai, R. Hadidi, J. Sim, H. Kim, P. Kumar, and H. Kim, “Graphpim: Enabling instruction-level pim offloading in graph computing frameworks,” in 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 457–468, Feb 2017.
  • [132] K. Hsieh, E. Ebrahimi, G. Kim, N. Chatterjee, M. O’Connor, N. Vijaykumar, O. Mutlu, and S. W. Keckler, “Transparent offloading and mapping (tom): Enabling programmer-transparent near-data processing in gpu systems,” in Proceedings of the 43rd International Symposium on Computer Architecture, ISCA ’16, (Piscataway, NJ, USA), pp. 204–216, IEEE Press, 2016.
  • [133] K. Hsieh, S. Khan, N. Vijaykumar, K. K. Chang, A. Boroumand, S. Ghose, and O. Mutlu, “Accelerating pointer chasing in 3d-stacked memory: Challenges, mechanisms, evaluation,” in 2016 IEEE 34th International Conference on Computer Design (ICCD), pp. 25–32, Oct 2016.
  • [134] D. E. Holcomb, W. P. Burleson, K. Fu, et al., “Initial SRAM state as a fingerprint and source of true random numbers for RFID tags,” in Proceedings of the Conference on RFID Security, vol. 7, p. 2, 2007.
  • [135] D. E. Holcomb, W. P. Burleson, and K. Fu, “Power-up SRAM state as an identifying fingerprint and source of true random numbers,” IEEE Transactions on Computers, vol. 58, pp. 1198–1210, Sept 2009.
  • [136] M. Bhargava, C. Cakir, and K. Mai, “Reliability enhancement of bi-stable PUFs in 65nm bulk CMOS,” in 2012 IEEE International Symposium on Hardware-Oriented Security and Trust, pp. 25–30, June 2012.
  • [137] Y. Zheng, M. S. Hashemian, and S. Bhunia, “RESP: A robust physical unclonable function retrofitted into embedded SRAM array,” in 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 1–9, May 2013.
  • [138] K. Xiao, M. T. Rahman, D. Forte, Y. Huang, M. Su, and M. Tehranipoor, “Bit selection algorithm suitable for high-volume production of SRAM-PUF,” in 2014 IEEE International Symposium on Hardware-Oriented Security and Trust (HOST), pp. 101–106, May 2014.
  • [139] A. Bacha and R. Teodorescu, “Authenticache: Harnessing cache ECC for system authentication,” in Proceedings of the 48th International Symposium on Microarchitecture, MICRO-48, (New York, NY, USA), pp. 128–140, ACM, 2015.
  • [140] V. van der Leest, G.-J. Schrijen, H. Handschuh, and P. Tuyls, “Hardware intrinsic security from D flip-flops,” in Proceedings of the Fifth ACM Workshop on Scalable Trusted Computing, STC ’10, (New York, NY, USA), pp. 53–62, ACM, 2010.
  • [141] C. Yan, D. Englender, M. Prvulovic, B. Rogers, and Y. Solihin, “Improving cost, performance, and security of memory encryption and authentication,” in Proceedings of the 33rd Annual International Symposium on Computer Architecture, ISCA ’06, (Washington, DC, USA), pp. 179–190, IEEE Computer Society, 2006.
  • [142] D. J. Bernstein, “ChaCha, a variant of Salsa20,” in Workshop Record of SASC, vol. 8, pp. 3–5, 2008.
  • [143] Y. Lee, Y. Kim, J. Jeong, and J. W. Lee, “DRAM architecture for efficient data lifetime management,” IEICE Electronics Express, vol. 14, no. 10, pp. 20170309–20170309, 2017.
  • [144] H. Seol, W. Shin, J. Jang, J. Choi, J. Suh, and L. S. Kim, “In-DRAM data initialization,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, pp. 3251–3254, Nov 2017.
  • [145] M. Anikeev and F. Freiling, “Preventing malicious data harvesting from deallocated memory areas,” in Proceedings of the 6th International Conference on Security of Information and Networks, SIN ’13, (New York, NY, USA), pp. 448–449, ACM, 2013.
  • [146] L. Sha, F. Xiao, W. Chen, and J. Sun, “IIoT-SIDefender: Detecting and defense against the sensitive information leakage in industry IoT,” World Wide Web, vol. 21, pp. 59–88, Jan 2018.
  • [147] K. Harrison and S. Xu, “Protecting cryptographic keys from memory disclosure attacks,” in Dependable Systems and Networks, 2007. DSN’07. 37th Annual IEEE/IFIP International Conference on, pp. 137–143, IEEE, 2007.
  • [148] R. Geambasu, T. Kohno, A. A. Levy, and H. M. Levy, “Vanish: Increasing data privacy with self-destructing data,” in USENIX Security Symposium, vol. 9, 2009.
  • [149] T. Garfinkel, B. Pfaff, J. Chow, and M. Rosenblum, “Data lifetime is a systems problem,” in Proceedings of the 11th Workshop on ACM SIGOPS European Workshop, EW 11, (New York, NY, USA), ACM, 2004.
  • [150] R. Geambasu, T. Kohno, A. A. Levy, and H. M. Levy, “Vanish: Increasing data privacy with self-destructing data,” in Proceedings of the 18th Conference on USENIX Security Symposium, SSYM’09, (Berkeley, CA, USA), pp. 299–316, USENIX Association, 2009.
  • [151] C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood, “Pin: Building customized program analysis tools with dynamic instrumentation,” SIGPLAN Not., vol. 40, pp. 190–200, June 2005.
  • [152] Bochs: The Open Source IA-32 Emulation Project, http://bochs.sourceforge.net/.
  • [153] Transaction Processing Performance Council, “TPC Benchmarks,” http://www.tpc.org.
  • [154] J. D. McCalpin, “STREAM Benchmark.”
  • [155] Standard Performance Evaluation Corp., “SPEC CPU2006 Benchmarks,” http://www.spec.org/cpu2006.
  • [156] J. Poovey et al., “DynoGraph,” https://github.com/sirpoovey/DynoGraph.
  • [157] HPC Challenge, “RandomAccess,” http://icl.cs.utk.edu/hpcc.
  • [158] MySQL: an open source database, https://www.mysql.com.
  • [159] Memcached: a distributed memory object caching system, https://memcached.org.
  • [160] Stress-ng: a tool to load and stress a computer system, http://kernel.ubuntu.com/~cking/stress-ng.

Appendix A D-Dataplant Primitive

D-Dataplant generates deterministic values within the SAs by adding a transistor to each SA in the row buffer. The generated values can either be stored in DRAM or be read by the processor directly from the SAs without overwriting the data in DRAM.

D-Dataplant deterministically drives the bitline voltage level to zero (0V) or one (VDD) and optionally writes the generated value into the cell. The key idea is to add a path that connects a fixed voltage level to the bitline. To this end, we add a transistor controlled by a Dplant signal. Figure 12 (left) shows how this transistor is connected to the SA to generate a "zero" or a "one" value. Figure 12 (right) illustrates how the Dataplant transistor drives the cell towards a deterministic value (zero in this case).

Figure 12: Dataplant transistor placement (left), and behaviour (right).

Figure 13 shows how the value is generated, including the optional overwriting of the cell. First, the Dplant (1) and SA (2) signals are triggered to drive the bitline to the deterministic voltage level (3). Then, if the wordline is triggered (4), the generated value is moved into the DRAM cell (5), overwriting the previous content of the cell.

Figure 13: SPICE simulation of the internal DRAM signals involved in D-Dataplant. VDD = 1V, and the original content of the DRAM cell is "one".
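To make the sequence above concrete, the following Python sketch models D-Dataplant at the logic level only, not as a circuit. It assumes that each bitline can be reduced to the bit latched by its SA and that the hardwired fixed level is passed in as `fixed_value`; the class and method names (`RowBuffer`, `d_dataplant`) are hypothetical and do not correspond to any real DRAM interface.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RowBuffer:
    """Hypothetical logic-level model of a subarray: rows of cells plus one SA per bitline."""
    rows: List[List[int]]
    sense_amps: List[int] = field(default_factory=list)

    def d_dataplant(self, row: int, fixed_value: int, overwrite: bool) -> List[int]:
        """Drive every bitline to a fixed level and optionally write it into `row`.

        fixed_value models which voltage the added transistor is wired to
        (0 -> 0V, 1 -> VDD); in real hardware this choice is fixed at design time.
        """
        if fixed_value not in (0, 1):
            raise ValueError("fixed_value must be 0 or 1")
        # Steps 1-3: the Dplant and SA signals force each bitline to the fixed
        # level, regardless of the charge stored in the cells.
        self.sense_amps = [fixed_value] * len(self.rows[row])
        # Steps 4-5 (optional): raising the wordline copies the SA values into
        # the cells, overwriting their previous content.
        if overwrite:
            self.rows[row] = list(self.sense_amps)
        # In both cases the processor can read the generated value from the SAs.
        return list(self.sense_amps)


buf = RowBuffer(rows=[[1, 0, 1, 1], [0, 1, 1, 0]])
print(buf.d_dataplant(row=0, fixed_value=0, overwrite=False), buf.rows[0])  # [0, 0, 0, 0] [1, 0, 1, 1]
print(buf.d_dataplant(row=0, fixed_value=0, overwrite=True), buf.rows[0])   # [0, 0, 0, 0] [0, 0, 0, 0]
```

The first call corresponds to reading a generated value without touching DRAM contents; the second adds the optional wordline step that overwrites the row.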

A.1 Evaluation

Table 8 shows the absolute latency and energy of the evaluated techniques when overwriting an 8 KB DRAM row, together with their reduction relative to the baseline. The table also shows the energy breakdown (value generation + overwriting) of US-Dataplant (US-Dplant) and D-Dataplant (D-Dplant). The energy consumption of D-Dataplant is very similar to that of US-Dataplant and UC-Dataplant, because the additional transistor of D-Dataplant has very little effect on energy consumption. Note that overwriting the cells with D-Dataplant is optional; generating an 8 KB value without overwriting requires only 8 nJ.

Primitive    Latency (ns)   Energy (nJ)        Latency reduction   Energy reduction
Baseline     546            2000               1.0x                1.0x
Lisa-clone   148.5          90                 3.67x               22.2x
Rowclone     90             50                 6.06x               41.5x
US-Dplant    35             17.3 = 7.3 + 10    15.6x               116x
UC-Dplant    13             17.2               42x                 116x
D-Dplant     35             18 = 8 + 10        15.6x               111x
Table 8: Latency and energy of different primitives for overwriting data, for a single operation of granularity 8KB.
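The reduction columns of Table 8 are consistent with dividing the baseline value by each primitive's value. The short Python check below recomputes them from the absolute numbers; the dictionary layout and names are only illustrative. Rounded, the results agree with the reported factors except for the Rowclone energy entry, where 2000 / 50 gives 40.0x rather than the reported 41.5x, so that entry does not follow exactly from the rounded absolute values shown.

```python
# Absolute values transcribed from Table 8 (one 8 KB row overwrite).
BASELINE_NS, BASELINE_NJ = 546.0, 2000.0

PRIMITIVES = {
    # name: (latency in ns, energy in nJ)
    "Lisa-clone": (148.5, 90.0),
    "Rowclone": (90.0, 50.0),
    "US-Dplant": (35.0, 7.3 + 10.0),  # generation + overwrite
    "UC-Dplant": (13.0, 17.2),
    "D-Dplant": (35.0, 8.0 + 10.0),   # generation + overwrite
}


def reduction(baseline: float, value: float) -> float:
    """Reduction factor: how many times smaller `value` is than the baseline."""
    return baseline / value


for name, (lat_ns, energy_nj) in PRIMITIVES.items():
    print(f"{name:<10} latency {reduction(BASELINE_NS, lat_ns):5.1f}x  "
          f"energy {reduction(BASELINE_NJ, energy_nj):6.1f}x")
```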

Area Overhead. The overhead of D-Dataplant is one additional transistor per SA. Our SPICE simulations show that this transistor can be very small, but we consider a full-size transistor to avoid possible layout and fabrication issues. Considering an SA composed of 20 transistors [70], the worst-case overall area overhead is between 0.4% and 2%, depending on the DRAM design (specifically, on the number of cells per bitline in the subarray, which determines the total number of SAs in the module).
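One way to read the numbers above: adding one transistor to a 20-transistor SA grows the SA by about 5% (1/20), so the overall overhead is roughly 5% of the fraction of chip area occupied by SAs. The Python sketch below makes that proportionality explicit. It rests on a simplifying assumption made only for this sketch, namely that area inside the SA scales with transistor count; the text does not give the SA area fraction, so the 8% and 40% values printed are only what the quoted bounds would imply under that assumption. The function names are hypothetical.

```python
SA_TRANSISTORS = 20    # SA design from [70], as stated in the text
EXTRA_TRANSISTORS = 1  # one full-size D-Dataplant transistor per SA


def chip_area_overhead(sa_area_fraction: float) -> float:
    """Overall overhead, ASSUMING area inside an SA scales with transistor count.

    sa_area_fraction: share of the chip occupied by SAs (hypothetical input);
    it grows as the number of cells per bitline shrinks.
    """
    return sa_area_fraction * EXTRA_TRANSISTORS / SA_TRANSISTORS


def implied_sa_area_fraction(overhead: float) -> float:
    """Invert chip_area_overhead: SA share implied by a given overall overhead."""
    return overhead * SA_TRANSISTORS / EXTRA_TRANSISTORS


for reported in (0.004, 0.02):  # the 0.4% and 2% bounds quoted in the text
    print(f"{reported:.1%} overall -> SAs ~{implied_sa_area_fraction(reported):.0%} of chip area")
```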

Appendix B Secure Deallocation

Dataplant enables a variety of applications in the security domain. We have already described two important ones: authentication mechanisms based on PUFs (Section 5) and prevention of cold boot attacks (Section 6). In this appendix, we briefly discuss secure deallocation, an additional application that can benefit from Dataplant.

Today’s applications, especially web servers, web browsers, and word processors, do not immediately remove data from memory when it is no longer needed. Instead, the data is physically erased only when the memory is reused for other purposes. As a consequence, sensitive data can remain in memory for an indefinite amount of time, which increases the risk of exposure.

Secure deallocation [58, 145, 146, 147, 148, 149] is a technique that sets data to zero at the moment of deallocation, or as soon as the data is no longer needed. This technique reduces the time during which critical data is exposed to attacks. Vanish [150] proposes a similar idea in which old copies of data self-destruct after a specific amount of time. Dataplant enables the implementation of these techniques with very low latency, energy, and area overhead.
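As a minimal illustration of the difference between lazy erasure and secure deallocation, the Python sketch below models page-granularity memory in which a freed page either keeps its stale contents or is cleared immediately on free. The class `ToyMemory`, its methods, and the 8 KB page size (chosen only to match the row size in Table 8) are hypothetical. `clear_page` is where a software implementation writes zeros, and where an in-DRAM primitive such as D-Dataplant could instead clear the data inside the chip.

```python
PAGE_SIZE = 8 * 1024  # bytes; hypothetical, chosen to match the 8 KB row in Table 8


class ToyMemory:
    """Hypothetical page-granularity memory used only to illustrate deallocation policies.

    secure=False: freed pages keep their stale data until they are reused (lazy erasure).
    secure=True:  freed pages are cleared immediately (secure deallocation).
    """

    def __init__(self, num_pages: int, secure: bool) -> None:
        self.data = bytearray(num_pages * PAGE_SIZE)
        self.free_pages = list(range(num_pages))
        self.secure = secure

    def alloc(self) -> int:
        return self.free_pages.pop()

    def write(self, page: int, payload: bytes) -> None:
        if len(payload) > PAGE_SIZE:
            raise ValueError("payload larger than a page")
        start = page * PAGE_SIZE
        self.data[start:start + len(payload)] = payload

    def free(self, page: int) -> None:
        if self.secure:
            self.clear_page(page)
        self.free_pages.append(page)

    def clear_page(self, page: int) -> None:
        # Software baseline: the CPU writes zeros over the whole page. An in-DRAM
        # primitive could perform this clearing inside the chip instead.
        start = page * PAGE_SIZE
        self.data[start:start + PAGE_SIZE] = bytes(PAGE_SIZE)

    def contains(self, page: int, payload: bytes) -> bool:
        start = page * PAGE_SIZE
        return bytes(self.data[start:start + len(payload)]) == payload


secret = b"session key"
for secure in (False, True):
    mem = ToyMemory(num_pages=2, secure=secure)
    page = mem.alloc()
    mem.write(page, secret)
    mem.free(page)
    print(f"secure={secure}: secret still in memory after free -> {mem.contains(page, secret)}")
```

With `secure=False` the secret survives the free, which is the exposure window described above; with `secure=True` it is gone as soon as the page is released.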

B.1 Evaluation

Methodology. We simulate US-Dataplant and UC-Dataplant (described in Section 4), D-Dataplant (described in Appendix A), Rowclone, Lisa-clone, and a software secure deallocation mechanism [58]. Note that if the OS guarantees that newly allocated pages are filled with zeros, D-Dataplant must be used instead of US-Dataplant and UC-Dataplant, since only D-Dataplant generates a deterministic (all-zero) value.

We customized Ramulator [122] to support all the mechanisms on in-order cores. To generate the traces that drive our simulator, we use Pin [151] for user-level traces and the full-system emulator Bochs [152] for memory traces that include Linux kernel page allocations and deallocations.

To calculate the area, energy, and latency of the Dataplant primitives, we use the SA design described in [70]. To estimate the energy consumed by the DRAM module, we use a customized version of DRAMPower [114]. Table 9 shows the system configuration used in our evaluation, which has a single memory channel.

Processor   1-4 cores, in-order
Cache       L1: 64 KB, L2: 512 KB per core, 64 B lines
Mem. Ctr.   64/64-entry read/write queue, FR-FCFS [123, 124]
DRAM        1 channel, DDR3-1600 x8 11/11/11
Table 9: System configuration.
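For readers converting the DRAM timing in Table 9 to nanoseconds: DDR3-1600 performs 1600 million transfers per second on both clock edges, so the bus clock runs at 800 MHz and one clock cycle lasts 1.25 ns. The short Python sketch below performs the conversion. Reading "11/11/11" as CL/tRCD/tRP follows the common speed-bin convention and is an assumption here, since the table does not name the parameters.

```python
DATA_RATE_MT_S = 1600               # DDR3-1600: 1600 mega-transfers per second
BUS_CLOCK_MHZ = DATA_RATE_MT_S / 2  # double data rate: two transfers per clock cycle
T_CK_NS = 1000.0 / BUS_CLOCK_MHZ    # clock period in ns (1.25 ns)


def cycles_to_ns(cycles: int) -> float:
    """Convert a timing parameter given in DRAM clock cycles to nanoseconds."""
    return cycles * T_CK_NS


# Assumed CL/tRCD/tRP reading of the "11/11/11" entry in Table 9.
for name, cycles in (("CL", 11), ("tRCD", 11), ("tRP", 11)):
    print(f"{name}: {cycles} cycles = {cycles_to_ns(cycles):.2f} ns")  # 13.75 ns each
```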

We use benchmarks that cause intensive page allocations (Table 10). For the 4-core evaluation, we choose 50 workload mixes, each composed of two benchmarks that are intensive in memory allocations/deallocations and two that are not (Table 11 shows 5 representative mixes). Among the benchmarks that are not allocation-intensive, we include TPC-C and TPC-H [153], STREAM [154], SPEC2006 [155], DynoGraph (pagerank, bfs, stream) [156], and HPCC RandomAccess [157].

Bench. Description
mysql MySQL [158] loading the sample employeedb
mcached Memcached [159], a memory object caching system
compiler Compilation phase from the GNU C compiler
bootup Linux kernel boot-up phase
shell Script running 'find' in a directory tree with 'ls'
malloc stress-ng [160] stressing the malloc primitive
Table 10: Benchmarks for evaluating secure deallocation.
MIX1: malloc, bootup, tpcc64, libquantum
MIX2: shell, bootup, lbm, xalancbmk
MIX3: bootup, shell, pagerank, pagerank
MIX4: malloc, shell, xalancbmk, bzip2
MIX5: malloc, malloc, astar, condmat
Table 11: Five representative mixes (out of 50) used in the multicore evaluation for secure deallocation.
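The text does not say how the 50 mixes were selected. Purely to illustrate the mix structure, the Python sketch below draws reproducible four-benchmark mixes under an assumed seeded random sampling with replacement (Table 11 contains repeated benchmarks within a mix). The intensive pool comes from Table 10; the non-intensive pool here is only the subset named in Table 11, and the function name `make_mixes` is hypothetical. Its output will not match the mixes listed in Table 11.

```python
import random
from typing import List

# Allocation-intensive benchmarks (Table 10).
INTENSIVE = ["mysql", "mcached", "compiler", "bootup", "shell", "malloc"]
# Non-intensive benchmarks that appear in Table 11; the full pool used in the
# evaluation is larger (TPC, STREAM, SPEC2006, DynoGraph, RandomAccess).
NON_INTENSIVE = ["tpcc64", "libquantum", "lbm", "xalancbmk", "pagerank",
                 "bzip2", "astar", "condmat"]


def make_mixes(n: int, seed: int = 0) -> List[List[str]]:
    """Draw n four-core mixes of two intensive plus two non-intensive benchmarks.

    Draws are with replacement, because Table 11 contains mixes such as
    (malloc, malloc, ...) and (..., pagerank, pagerank).
    """
    rng = random.Random(seed)  # seeded so the set of mixes is reproducible
    return [rng.choices(INTENSIVE, k=2) + rng.choices(NON_INTENSIVE, k=2)
            for _ in range(n)]


for i, mix in enumerate(make_mixes(5), start=1):
    print(f"mix {i}: {', '.join(mix)}")
```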

Results. Figure 14 shows the performance (higher is better) and energy savings (larger is better) of Dataplant and other state-of-the-art mechanisms (based on Rowclone and Lisa-clone), normalized to a software secure deallocation implementation on a single core. Figure 15 shows the same results for a 4-core processor. We make three observations. First, compared to the software implementation, the hardware implementations improve performance by up to 21% and provide energy savings of up to 34%. Second, Dataplant outperforms Lisa-clone and Rowclone in all cases, in both performance and energy consumption. Third, for some benchmarks, the performance and energy benefits of Dataplant over Rowclone and Lisa-clone are modest. Note, however, that our approaches are much easier to integrate into commodity DRAM chips than Lisa-clone or Rowclone (Section 4).

Figure 14: Single-core speedup (larger is better) and energy savings (larger is better) of the secure deallocation hardware approaches compared to a software approach.
Figure 15: 4-core speedup (larger is better) and energy savings (larger is better) of the secure deallocation hardware approaches compared to a software approach.