LDPUF: Exploiting DRAM Latency Variations to Generate Robust Device Signatures

08/07/2018 ∙ by B. M. S. Bahar Talukder, et al. ∙ The University of Alabama in Huntsville ∙ University of Florida

Physically Unclonable Functions (PUFs) are potential security blocks to generate unique and more secure keys in low-cost cryptographic applications. Memories have been popular candidates for PUFs because of their prevalence in modern electronic systems. However, the existing techniques for generating device signatures from DRAM are very slow, destructive (they destroy the current data), and disruptive to system operation. In this paper, we propose a latency-based PUF that exploits DRAM precharge latency to generate signatures. Our proposed methodology for key generation is fast, robust, minimally disruptive, and non-destructive. Silicon results from DDR3 chips show that the proposed key generation technique is at least 4,300X faster than the existing approaches, while reliably reproducing the key under extreme operating conditions.


I Introduction

Physical unclonable functions (PUFs) play important roles in security by offering a high level of protection in cryptographic applications with the capability of strong volatile key or unique ID generation. A PUF is a circuit that generates unique fingerprints by exploiting the inherent and unavoidable manufacturing process variations during fabrication [1, 2]. Identification, authentication, secure communication, IC obfuscation to prevent IC piracy in semiconductor supply chain, detection of counterfeit ICs, etc. are a few common applications of PUFs because of their unique and unpredictable characteristics [3, 4, 5, 1, 6, 7, 8, 9]. In recent years, PUFs have also been used in IoT applications because they enable low-cost solutions with a high level of security [10, 11, 12].

In addition to being low-cost, a memory-based PUF provides an opportunity to add PUF-based schemes to an existing system [5, 2, 8, 9]. The start-up behavior of memory chips, disturbance characteristics, random decay properties, etc. are the most common sources of responses from memory chips [1]. Previous works on DRAM PUFs (DPUFs) have focused on: (i) retention-based: writing all cells to ‘1’, disabling the refresh, waiting for half the cells to discharge, and then reading the cell values [2, 13, 14, 15], (ii) start-up based: using the start-up values of the cells to generate the secret key, as in [16, 17], and (iii) disturbance-based: using the disturbance caused by rowhammer [18, 19]. Variations in activation latency have also been used to generate device signatures [20]. In this method, the signature is obtained from the errors generated at a reduced activation time during the read operation [20].

In PUF-based applications, the responses (i.e., the PUF outputs) have to be robust, fast, random, and unique [5, 21, 22, 23, 24]. Like other silicon PUFs, DRAM-based PUF responses are also impacted by external influences such as operating and environmental variations, aging, etc. [25, 26, 27, 28, 29, 30, 31, 32]. In addition, the existing signature generation schemes for DRAM do not offer impressive throughput; a retention-based DPUF requires on the order of minutes, and a start-up based DPUF needs a power cycle. The destruction of memory contents, disruption of the system, etc. are a few other major limitations of existing DRAM-based PUFs (discussed in Section II-E).

While some applications can tolerate a certain amount of errors, others, such as the generation of cryptographic keys, cannot. To make the PUF output more stable (i.e., to obtain the same response for the applied challenge to a PUF), error correcting code (ECC) and different enrollment schemes are often used but at the expense of additional cost [33, 34, 35].

In this paper, we propose PreLatPUF that exploits the precharge timing latency variations in DRAM to generate device signatures. The main contributions of this paper (i.e., to generate robust device signatures from DRAM) are summarized below.

  • We propose a precharge latency-based DRAM PUF (PreLatPUF) that generates device signatures at a much faster rate. We experimentally demonstrate that faulty read operations at a reduced precharge latency can be used to generate unique and random device signatures.

  • We characterize the errors at the reduced precharge latency to discover cells that are most suitable for robust and reliable PUFs.

  • We propose a cell selection algorithm and a registration technique to ensure that the signatures generated at the reduced precharge latency are robust, unique, and random.

  • We present a quantitative and qualitative comparison between PreLatPUF and some of the previously proposed DRAM-based PUFs. The results show that the proposed PreLatPUF outperforms existing DPUFs in several aspects.

  • We evaluate the proposed PreLatPUF using commercially available DDR3 DRAM modules.

The rest of the paper is organized as follows. In Section II, we present the background of DRAM architecture, read/write operation, existing DRAM-based PUFs and major challenges. We propose the latency-based DRAM PUF in Section III. The experimental results and discussions are presented in Section IV. We conclude the paper in Section V.

II Background and Motivation

In this section, we provide a brief background of the modern memory subsystem and its operation. We also present existing DRAM-based PUFs and their limitations.

II-A DRAM Organization

Fig. 1 illustrates the organization of a modern DRAM system, which maintains a hierarchy of channel, rank, bank, DRAM chips, DRAM cells, and memory controller. Depending on the system requirement, different electronic systems can have DRAM modules of different sizes. A DRAM module is divided into one or multiple ranks, and a rank is accessed in each read/write attempt. A rank, in turn, consists of several DRAM chips that together provide a wide databus. The same databus is shared among the ranks, and a chip select pin is used to choose a particular rank. The width of the databus is usually 64 bits, distributed equally among the chips inside a rank. Each DRAM chip consists of multiple banks to support parallelism. In a memory bank, the DRAM cells are arranged in a two-dimensional array. The rows and columns of a DRAM are known as wordlines and bitlines, respectively. A row of a DRAM is also known as a page. The bitlines are connected to the row-buffer (a row of sense-amplifiers). When a DRAM is read, the sense-amplifier senses the stored charge of each memory cell and latches it to the corresponding value (‘1’ or ‘0’). A DRAM cell, the smallest unit, stores a single bit (‘1’ or ‘0’). The cell consists of two components: a capacitor to hold the charge and an access transistor to access the capacitor. The charge state of the capacitor determines the stored value: a fully charged capacitor represents logic ‘1’, and a capacitor with no charge represents logic ‘0’.

Fig. 1: Organization of a modern memory subsystem [36], [37]. (i) DRAM system. (ii) Close view of a memory bank.

II-B DRAM Operation

II-B1 READ Operation:

Fig. 2i presents a simplified DRAM read operation, which consists of several states. In the precharge state, the memory controller issues a precharge command (PRE) to precharge all bitlines to $V_{DD}/2$ (green line). This command also deactivates the previously activated wordline. In the next state (i.e., the activation state), the ACTIVATE command (ACT) from the memory controller activates the target wordline by raising the wordline voltage (violet line). Once the pass-transistor (connected to the wordline) is ON, charge flows from the capacitor (red line) to the attached bitline if the stored value is ‘1’, and from the bitline to the capacitor if the stored value is ‘0’. In the final stage, the differential sense-amplifier senses the voltage perturbation on the bitline and amplifies the bitline voltage to a strong logic ‘1’ (or ‘0’). Then, the sense-amplifier latches the logic value from the bitline. In a DRAM system, the read operation is destructive; therefore, rewriting after reading is mandatory.

II-B2 WRITE Operation:

In the write operation, all bitlines are initially precharged to $V_{DD}/2$ with the PRE command. The ACT command is then applied to the specific wordline to be written. The sense-amplifier, driven with the desired logic value, enables the corresponding bitline to charge or discharge the connected cell capacitor. After each successful READ/WRITE operation, the bitlines must be precharged back to $V_{DD}/2$ before a new set of memory cells from a different wordline can be accessed.

II-C DRAM Timing

Timing is critical for reliable DRAM operation. All major timing parameters of a DRAM module are presented in Fig. 2ii. Initially, all bitlines are precharged to $V_{DD}/2$. To access the data from a specific wordline, the ACTIVATE (ACT) command is applied to the corresponding wordline. Once that is completed, a READ/WRITE command is sent from the memory controller to sense the voltage perturbation on the bitlines or to write data to the memory cells. The minimum required time interval between the ACT command and the READ/WRITE command is defined as the activation time, $t_{RCD}$. The Column Access Strobe (CAS) latency is the minimum waiting time to get the first data bit on the data bus after sending a READ command. After a successful READ/WRITE operation, the precharge command (PRE) is applied to deactivate the previously activated wordline (if any) and to return the bitlines to their initial precharge state (i.e., to $V_{DD}/2$). If a WRITE command was applied, the PRE command should be further delayed by a period $t_{WR}$ (write recovery time) after the end of the write data burst. The PRE command is applied for at least $t_{RP}$ (precharge time) before sending the next ACT command. The duration from the activation state to the beginning of the precharge state is called the row active time or restoration latency ($t_{RAS}$). The sum $t_{RAS} + t_{RP}$ is the total time required to access a single row of a bank and is known as the row cycle time ($t_{RC}$). Usually, $t_{RC}$ is on the order of 50ns for most modern DDR3 DRAMs.

Fig. 2: DRAM operation and timing. (i) Signal waveform at the reading cycle [38]. (ii) DRAM timing at the reading cycle [37].
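The relation between the row active time, the precharge time, and the row cycle time can be checked numerically. The sketch below uses illustrative DDR3-1600-class values that are assumed for this example only; they are not measurements from the modules studied in this paper, and the names `timing_ns` and `row_cycle_time` are hypothetical.

```python
# Illustrative DDR3 timing values in nanoseconds (assumed, not measured here).
timing_ns = {
    "tRCD": 13.75,  # ACT -> READ/WRITE (activation time)
    "tRP": 13.75,   # PRE -> next ACT (precharge time)
    "tRAS": 35.0,   # ACT -> PRE (row active time / restoration latency)
}


def row_cycle_time(t_ras: float, t_rp: float) -> float:
    """Row cycle time: tRC = tRAS + tRP."""
    return t_ras + t_rp


t_rc = row_cycle_time(timing_ns["tRAS"], timing_ns["tRP"])
print(f"tRC = {t_rc} ns")  # close to the ~50 ns figure quoted for DDR3
```

With these assumed values the row cycle time comes out just under 50ns, which matches the order of magnitude stated above.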

II-D Existing DRAM-based PUFs

II-D1 Retention-based DRAM PUFs (DPUFs):

Signatures are generated by disabling the refresh operation for a sufficient amount of time [13, 20]. DRAM cells are leaky; therefore, the DRAM contents need to be refreshed periodically, usually every 64ms or 32ms according to the JEDEC specification [39], to ensure data integrity [13]. Failing to refresh within this time interval introduces errors due to the leaky property of DRAM cells. The error pattern generated from the retention failures is unique from chip to chip and is used to generate device signatures [13, 20].

The retention-based device signature is promising but suffers from several drawbacks that hinder its use in real applications. First, the periodic refresh operation in most DRAM modules is handled internally by a memory controller. There is no efficient way to control this refresh time for an arbitrarily small region of DRAM module since the granularity for such refresh operation is predefined by the vendors. Some common control signals control memory cells under the same granularity, and therefore changing a timing parameter on one cell affects all other cells as well. On the other hand, two rows from two different granular regions can be accessed independently (but not simultaneously as they may share the same channel). For a retention-based DRAM PUF, an authentication key of sufficient length can be generated by retention failure from a small portion of a DRAM module, but the whole operation may cause unwanted data corruption of other memory cells under the same granularity [39]. Second, a key of sufficient length requires an adequate number of errors; therefore, it might need a long waiting time (order of minutes) to generate a key with desired length and quality [20]. Third, the retention time is heavily temperature dependent, which makes the key sensitive to temperature variations [13, 14, 15, 40, 41, 42]. Previous studies show that the bit error rate (BER) increases exponentially with the temperature; the key generation scheme requires a longer time interval between two refresh operations at a lower temperature [43]. The required time to generate the key is also a function of the size of the memory segment. A smaller segment requires longer evaluation time than a larger one [20]. Therefore, the designer must decide on area vs. time overhead. Several techniques can be used to address the above challenges but with a limited gain [43, 44, 25, 45, 13, 46, 47, 48].

II-D2 Latency-based DPUFs:

A reduction in $t_{RCD}$ introduces erroneous read/write operations (see Section II-C), which can be used to generate device signatures [20]. This latency-based PUF generates signatures at a much faster rate [20]. The reported result shows that the mean evaluation time is 88.2ms, which outperforms all previously proposed retention-based DPUFs [13, 14, 15]. However, it still requires multiple row cycles to evaluate the PUF response. This latency-based DPUF also needs a filtering mechanism in each access, which adds both hardware and computational overheads.

II-D3 Start-up based DPUFs:

In a start-up based DPUF [17], the device signature is generated from the start-up states of the DRAM cells. Initially, the bitlines are charged to $V_{DD}/2$. However, process variations in the storage capacitor slightly shift the bitline voltage to $V_{DD}/2 + \Delta V$ or $V_{DD}/2 - \Delta V$, where $\Delta V$ represents a small voltage. The sense-amplifier resolves the voltage difference to ‘1’ or ‘0’, accordingly. Upon power-up, the DRAM cells therefore generate ‘1’s and ‘0’s randomly. The significant challenges of a start-up based DPUF are: (i) the requirement of a power cycle and (ii) the time gap required between turn-OFF and turn-ON to avoid a strong correlation between the data before turn-OFF and the signature.

II-D4 Rowhammer DPUFs:

The errors caused by the rowhammer disturbance are used to generate device signatures [18, 19]. This technique does not require any additional power cycle. However, the average evaluation time of a rowhammer PUF is on the order of minutes and therefore might not be suitable for many applications. Besides, not all DRAMs are vulnerable to rowhammer [18].

II-E Motivations

Below, we summarize the major motivations of our proposed work.

  • Waste of DRAM Power Cycle: Start-up based key generation requires a DRAM power cycle to obtain device signatures [17]. Hence, the whole system needs a power cycle (i.e., a turn-off and a turn-on) to obtain the PUF response. Therefore, this type of PUF cannot be evaluated while the system is in operation.

  • Large Evaluation Time: Rowhammer-based and retention-based key generation techniques require on the order of minutes to generate enough bit failures and are therefore not suitable for many applications [13, 14, 15, 49, 18, 19]. On the other hand, the existing latency-based DPUF still needs multiple row cycles (reading one data burst in each cycle) to evaluate the PUF key [20], since the reduction in activation time only affects the first few bits in the cache line (see Section II-C).

  • Destructive: Retention-based key generation is destructive. The DRAM granularity causes random failed bits throughout the smallest granular region (usually a rank). Note that the DRAM refresh can be disabled only at the granularity of channels [39]. A dedicated memory might be used to overcome this problem, but at the expense of additional hardware. The start-up based and rowhammer-based DPUFs are also destructive.

  • Disruptive: DRAM granularity keeps the entire DRAM rank busy during each access. Hence, this kind of PUF evaluation blocks access to the target DRAM region by other applications for a long time. Though the existing latency-based DRAM PUF [20] solves the problems of long evaluation time and unwanted data failure (due to the granularity), it still needs a filtering mechanism to evaluate the PUF in each access, which introduces additional computational and timing overheads.

III PreLatPUF: Precharge Latency-based PUF

In this section, we present the proposed PreLatPUF, cell characterization, and cell selection algorithm.

III-A Precharge Latency and Source of Variations

The latency is defined as the time required to move charge during a read/write operation. In modern DRAM architecture, multiple DRAM cells are connected to the same bitline through access transistors. The DRAM vendor specifies the minimum timing latency required to perform a reliable read/write operation. Erroneous read/write operations are observed if the minimum timing latency is not maintained [37]. In our experiments, we made the following observations, which are consistent with [37] and [50].

  • Observation 1: A reduced $t_{RCD}$ only affects the first accessed column/cache line.

  • Observation 2: A reduced $t_{RP}$ might affect almost all cells of a row.

  • Observation 3: Almost no bit error is introduced at the reduced $t_{RAS}$.

From the above observations, we can conclude that a reduction in $t_{RCD}$ or $t_{RP}$ can be used to generate device signatures from a DRAM. The $t_{RCD}$-based PUF proposed in [20] needs an additional filtering mechanism and several row cycles (discussed in Section II-E). In this article, we use the $t_{RP}$ variations to generate device signatures.

The DRAM cell characteristics at the reduced $t_{RP}$ mostly depend on the internal structure of the DRAM module, process variations, layout variations, data dependency, etc. [51, 52, 27, 49, 46, 37, 20, 53]. Fig. 3 presents a simplified structure of the DRAM precharge circuit [54]. In a DRAM module, each DRAM cell is connected to a bitline through an access transistor, and each bitline has a corresponding $\overline{\text{bitline}}$ that provides the complementary data (see Fig. 3). Each bitline and $\overline{\text{bitline}}$ pair has a sense-amplifier and an equalization circuit. In the precharge state, transistors 1 and 2 of the equalization circuit create a conducting path to a voltage source at $V_{DD}/2$, while transistor 3 creates a conducting path between the bitline and the $\overline{\text{bitline}}$. With the proper precharge time, transistors 1 and 2 get enough time to precharge the bitline pair to $V_{DD}/2$, and transistor 3 further ensures that the voltages on the bitline pair are equalized. After the access transistor is turned ON, the bitline voltage is perturbed by the charge stored in the capacitor. The perturbed voltage is then sensed and amplified by the sense-amplifier. However, at a reduced precharge time, transistors 1 and 2 might not get enough time to precharge the bitline pair equally to $V_{DD}/2$. Therefore, the bitline and the $\overline{\text{bitline}}$ might deviate from $V_{DD}/2$. The variations in the RC path delay and the capacitance of the bitline follow a Gaussian distribution [55, 56, 57], and two different DRAM cells of the same physical length may have different $t_{RP}$s.

In addition, process variation might also introduce a slight variation in the charge storage capacity of the DRAM cells. Hence, during the READ operation, the intensity of the voltage perturbation on a bitline might vary from one memory cell to another [58]. As a result, these DRAM cells may behave differently at the reduced precharge time [38]. Moreover, different vendors may use different configurations (e.g., open bitline array structure, folded bitline array structure, etc. [54]), which may lead to different faulty outputs at the reduced $t_{RP}$.

Fig. 3: Simplified structure of precharge circuit.

Note that a minimum value of $t_{RP}$ is required to deactivate the previously activated row, in order to avoid correlation between the outputs and the contents of the previously activated row. The minimum value of $t_{RP}$ is determined empirically, and it may vary from module to module (discussed in Section IV-B).

III-B Characterization

We characterize the DRAM cells to understand the data dependency, spatial correlation, etc. in order to obtain robust PUF signatures. The characterization phase is conducted by observing the outputs for different types of input patterns (e.g., all 1’s, all 0’s, or a checkerboard pattern). The term ‘input value’ or ‘input pattern’ is used for the pattern that is written to the DRAM module with standard timing parameters. The ‘output pattern’, on the other hand, refers to the output that is read back at the reduced $t_{RP}$. A particular input pattern is applied several times (see Section IV) to study the temporal variation (i.e., measurement variation). Based on data correctness (or incorrect/faulty behavior), we divide the DRAM cells into two major categories:

  • Non-faulty Cells: These memory cells do not show any errors at the reduced $t_{RP}$ and retain correct data regardless of the input data pattern.

  • Incorrect/Corrupted/Faulty Cells: These memory cells fail to output the original data (i.e., the input pattern and output pattern are different). The errors might be independent or dependent on the input data.

Based on the temporal variations, again, we categorize the incorrect/faulty cells into the following types:

  • Noisy Cells: For these cells, the error pattern varies from measurement to measurement because of internal/external noise. Some of these cells can be useful for generating random numbers [53]. Some of them can also be used to create a PUF but might require a large ECC [59].

  • Robust/Measurement-invariant Cells: These cells do not show any temporal variation, i.e., cell outputs are independent of measurements. These cells are tolerant to internal and external noise and ideal for PUF.

In addition, the outputs at the reduced $t_{RP}$ might depend on the memory cell contents (i.e., the written patterns) due to the coupling effect of neighboring cells [60, 52]. Based on the data dependency, we categorize the DRAM cells into the following types:

  • Pattern Independent Cells: These cells exhibit the same output (at the reduced $t_{RP}$) regardless of the input pattern. The experimental results show (details in Section IV) that most of the DRAM cells from the major vendors are pattern independent. In this paper, we focus only on the ‘pattern independent’ cells for PUF implementation.

  • Pattern Dependent Cells: The output patterns for these cells are different for different input patterns. Therefore, these cells can be the ideal candidates for the PUF that possesses enhanced challenge-response pair [61, 62, 63].
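The categories above can be expressed as a small decision procedure over repeated reduced-latency reads. The following is a minimal sketch under stated assumptions: the function name `classify_cell`, the dictionary layout, and the pattern labels are hypothetical and are not taken from the authors' tooling.

```python
from typing import Dict, List


def classify_cell(written: Dict[str, int], reads: Dict[str, List[int]]) -> str:
    """Classify one DRAM cell from repeated reads at a reduced precharge time.

    written[p] is the bit written to the cell under input pattern p;
    reads[p] is the list of bits read back over repeated measurements.
    """
    # Non-faulty: every read returns the written bit for every pattern.
    if all(bit == written[p] for p, bits in reads.items() for bit in bits):
        return "non-faulty"
    # Noisy: at least one pattern gives different outputs across measurements.
    if any(len(set(bits)) > 1 for bits in reads.values()):
        return "noisy"
    # Robust from here on: exactly one stable output per pattern.
    stable_outputs = {bits[0] for bits in reads.values()}
    if len(stable_outputs) == 1:
        return "robust, pattern independent"
    return "robust, pattern dependent"


written = {"all1": 1, "all0": 0}
print(classify_cell(written, {"all1": [0, 0, 0], "all0": [0, 0, 0]}))
```

A cell that always reads ‘0’ regardless of what was written, as in the usage line, falls into the robust, pattern independent class that the rest of the paper builds on.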

III-C Cell Selection Algorithm

In this paper, we only focus on the pattern independent cells. The experimental results show that some of the pattern independent cells are strong ‘1’ and some of them are strong ‘0’. Besides reproducibility, it is important that the generated key is random and unique as well. Entropy is used to measure the randomness (i.e., the unpredictability) of a bitstream [59, 64]. A binary string of randomly distributed 0’s and 1’s with equal probability possesses high entropy [59, 64, 65]. Not all cells can be used for the PUF because some DRAM cells create deterministic outputs. We scan each row to find the most suitable cells for generating robust and random keys. We observe that the outputs generated using all pattern independent bits of every word (a word is 64 bits wide) suffer from poor entropy. As a part of the entropy test, we count the ratio between the occurrences of 1’s and 0’s. Our objective is to generate a key that has an equal number of 1’s and 0’s. The raw outputs show a considerable imbalance between the number of 0’s and 1’s if we count each failed bit from all words. Therefore, not all bits of every word are suitable for key generation. It is observed that some specific bits of every word of a row produce a predictable outcome. For example, for a particular memory bank, the first bit of every word of a specific row is always read as ‘0’ at a reduced $t_{RP}$. The binary string ($S_1$) formed with the first bits of the words cannot be used to generate keys since the Hamming weight of $S_1$ is 0% (the Hamming weight is defined as the total number of 1’s (or 0’s) in a bitstream). The explanation of this phenomenon is as follows: a 64-bit DRAM module is analogous to a combination of 64 2-D memory arrays (distributed across multiple DRAM chips), and each memory array contributes one bit to every word. For example, the $i^{th}$ memory array is responsible for the $i^{th}$ bit of every word. The impact of the reduced $t_{RP}$ may vary among memory arrays. In our proposed bit selection algorithm, we use an important metric: the Hamming weight. A Hamming weight of 50%, which is ideal for a key, means that the binary string has an equal number of 1’s and 0’s. Similar to $S_1$, we create a binary string $S_2$ with the second bit of each word in a row. Similarly, the binary string generated from the $i^{th}$ bit of each word is $S_i$. The $i^{th}$ bit of the word is considered an eligible bit if it produces a random binary string with a Hamming weight close to 50%.

To get the most suitable cells for a robust PUF, we propose an algorithm (Algorithm 1) for selecting the qualified memory cells and their locations. In practice, not all binary strings $S_i$ have a Hamming weight of 50%. Therefore, we choose only those binary strings that fall into a range of allowable Hamming weight ($HW_{min}$ to $HW_{max}$). All eligible bits (of words) from a row $r$ can be defined as in Eq. (1). Table I shows a simplified example of selecting eligible bits, where we present all memory cells from an imaginary row that has 16 words ($w_1$ to $w_{16}$), each 4 bits wide ($b_1$ to $b_4$). We produce the first string $S_1$ by taking only the first bit from each word, $S_2$ by taking only the second bit from each word, and so on. The rightmost column of Table I presents the Hamming weight ($HW$) of each string. For better randomness, the Hamming weight of each string should be 50% (8 in this case). However, the silicon results show that this is not always achievable. Therefore, we have to choose a lower limit ($HW_{min}$) and an upper limit ($HW_{max}$) of Hamming weight. Let us assume that the chosen values of $HW_{min}$ and $HW_{max}$ are 5 and 11, respectively. As a result, only the cells under $S_2$ and $S_4$ can be used for PUF operation (as the Hamming weights of $S_2$ and $S_4$ are between 5 and 11, see Table I). So, according to Eq. (1), the set of eligible bits is $E_r = \{2, 4\}$.

If the row $r$ consists of $n$ words, then we can create a binary string from each word by considering only the qualified bits (i.e., the cells that satisfy Eq. (1)). For example, if we consider the $j^{th}$ word from row $r$, then $K_{r,j}$ is the binary string formed by taking the bits of that word whose positions are elements of $E_r$. So, all allowable data bits from row $r$ can be presented as in Eq. (2). Here, $K_r$ is a single-dimensional binary string containing all eligible data bits from row $r$. According to Table I and Eq. (2), $K_r$ is formed by concatenating bits 2 and 4 of each of the 16 words.

$E_r = \{\, i \mid HW_{min} \le HW(S_i) \le HW_{max} \,\}$    (1)
$K_r = K_{r,1} \,\|\, K_{r,2} \,\|\, \cdots \,\|\, K_{r,n}$    (2)

String   w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 w11 w12 w13 w14 w15 w16   HW
S_1       1  1  1  1  1  0  1  1  1   1   1   1   1   1   1   1   15
S_2       1  0  1  1  0  1  0  0  0   1   0   0   1   0   1   0    7
S_3       0  0  0  0  0  0  0  1  0   0   0   0   0   1   0   0    2
S_4       1  0  0  0  0  1  1  0  1   1   0   1   1   1   0   1    9
TABLE I: Selecting appropriate cells with cell selection algorithm.
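The Table I example can be reproduced in a few lines. This sketch assumes that each row of Table I is the string formed from one bit position across the 16 words and that the limits are the 5 and 11 used in the text; the variable names are illustrative only.

```python
# Table I: one string per bit position, one entry per word (16 words).
strings = {
    1: [1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    2: [1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0],
    3: [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0],
    4: [1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1],
}
HW_MIN, HW_MAX = 5, 11  # allowable Hamming-weight range from the text
N_WORDS = 16

# Hamming weight of each string = number of 1's it contains.
hamming_weight = {i: sum(s) for i, s in strings.items()}

# Eq. (1): keep bit positions whose string falls inside the allowed range.
eligible = [i for i, hw in hamming_weight.items() if HW_MIN <= hw <= HW_MAX]

# Eq. (2): for each word in turn, take its eligible bits and concatenate.
key_bits = "".join(
    str(strings[i][w]) for w in range(N_WORDS) for i in eligible
)

print(hamming_weight)  # {1: 15, 2: 7, 3: 2, 4: 9}
print(eligible)        # [2, 4]
print(key_bits)
```

Under these assumptions the selected bits contain sixteen 1's out of 32, so the concatenated string is balanced even though neither selected bit position is balanced on its own.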

Input:
    mem_data: A matrix containing pattern independent data. An element of mem_data can be empty (if the corresponding memory cell is not pattern independent) or ‘0’ or ‘1’.
    HW_min & HW_max: Minimum and maximum allowable Hamming weight as described in Section III-C.
Output:
    R_q: 1D array containing the list of qualified rows that hold qualified bits for PUF generation.
    B_q: 2D array; its i-th row is associated with the i-th row of R_q. Each row of B_q contains the positions of all qualified bits in each word of the corresponding row.

// N_row, N_col: total number of rows and columns
// N_bit: word width
R_q ← [ ]
B_q ← [ ]
E ← [ ]
for r ← 1 to N_row do
   for b ← 1 to N_bit do
      ones ← 0
      total ← 0
      for c ← 1 to N_col do
         d ← mem_data(r, c, b)
         if d is not empty then
            total ← total + 1
            ones ← ones + d
         end if
      end for
      HW ← ones / max(total, 1)
      if total > 0 && HW ≥ HW_min && HW ≤ HW_max then
         append b to E
      end if
   end for
   if E is not empty then
      append r to R_q
      append E to B_q
      E ← [ ]
   end if
end for
Algorithm 1 Selecting qualified memory cells.
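A runnable sketch of the selection step described by Algorithm 1 is given below. It is an illustration rather than the authors' implementation: it assumes `mem_data[r][c][b]` holds 0, 1, or `None` (for a cell that is not pattern independent), expresses the Hamming-weight limits as fractions, and uses zero-based indices. The function and variable names are hypothetical.

```python
from typing import List, Optional, Tuple

Cell = Optional[int]  # None marks a cell that is not pattern independent


def select_qualified_cells(
    mem_data: List[List[List[Cell]]], hw_min: float, hw_max: float
) -> Tuple[List[int], List[List[int]]]:
    """Return (qualified_rows, qualified_bits) for mem_data[r][c][b]."""
    qualified_rows: List[int] = []
    qualified_bits: List[List[int]] = []
    for r, row in enumerate(mem_data):
        n_bits = len(row[0]) if row else 0
        eligible: List[int] = []
        for b in range(n_bits):
            # String formed by bit b of every word, skipping empty cells.
            values = [word[b] for word in row if word[b] is not None]
            if not values:
                continue
            hw = sum(values) / len(values)  # fraction of 1's
            if hw_min <= hw <= hw_max:
                eligible.append(b)
        if eligible:
            qualified_rows.append(r)
            qualified_bits.append(eligible)
    return qualified_rows, qualified_bits


# Toy data: 2 rows, 4 words per row, 2 bits per word.
toy = [
    [[0, 1], [0, 0], [0, 1], [0, 0]],              # bit 1 is balanced
    [[1, None], [1, None], [1, None], [1, None]],  # nothing usable
]
print(select_qualified_cells(toy, 0.3, 0.7))  # ([0], [[1]])
```

Only the first toy row qualifies, because bit 0 is stuck at one value in both rows and bit 1 of the second row has no pattern independent cells.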

However, the length of the key can be larger than the number of qualified memory cells in the binary string of a single row. In this case, we have to use more than one binary string, taken from multiple rows. Algorithm 1 is designed to select the qualified bits (i.e., the cells that satisfy Eq. (1)) from each row. For the rest of our discussion, the $b^{th}$ bit of the 64-bit data word accessed from location (r,c) is denoted as (r,c,b), where r is the row number (or page number) and c is the column number (the $c^{th}$ word of row r). In Algorithm 1, $N_{row}$, $N_{col}$, and $N_{bit}$ are the total number of rows, the total number of columns, and the word width, respectively (constant for a specific memory module). In our experiment, we used 1GB memory modules, where $N_{row}$ = 16384, $N_{col}$ = 1024, and $N_{bit}$ = 64.
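As a quick consistency check of these dimensions, the sketch below computes the capacity addressed by one (row, column, bit) space and maps an (r,c,b) triple to a flat bit index. The constant and function names are hypothetical, zero-based indices are used, and the assumption that this address space corresponds to one bank is ours, based on the 128MB bank size reported in Section IV.

```python
# Dimensions quoted in the text (assumed here to describe a single bank).
N_ROW, N_COL, N_BIT = 16384, 1024, 64

bits_per_bank = N_ROW * N_COL * N_BIT
mib_per_bank = bits_per_bank // 8 // 2**20  # bits -> bytes -> MiB


def flat_index(r: int, c: int, b: int) -> int:
    """Map a zero-based (row, column, bit) triple to a flat bit index."""
    if not (0 <= r < N_ROW and 0 <= c < N_COL and 0 <= b < N_BIT):
        raise ValueError("address out of range")
    return (r * N_COL + c) * N_BIT + b


print(mib_per_bank)          # 128
print(flat_index(1, 0, 0))   # first bit of the second row
```

Eight such banks, the bank count of a DDR3 device, add up to the 1GB module size mentioned above.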

In the proposed Algorithm 1, a one-dimensional array ($R_q$) and a two-dimensional array ($B_q$) together hold the memory locations of the qualified DRAM cells. $R_q$ holds all eligible row (or page) addresses, and $B_q$ holds the corresponding qualified bit numbers of each row. For example, the left side of Fig. 4 shows the rows (or pages) that are marked as qualified rows, and the right side of Fig. 4 shows the corresponding locations of the eligible bits; the bits listed for a row can be used, in all words of that row, to generate the key.

Fig. 4: Qualified row position and corresponding bit position in words.

III-D Registration

In the registration phase, we generate a golden data set (i.e., a challenge-response data set) using Algorithm 2, which can be used to generate robust signatures. We assume that the golden data set is created and stored in a trusted environment. In Algorithm 2, we use the qualified memory cells obtained with Algorithm 1. The goldenDataLoc holds the logical locations of the eligible memory cells, and the goldenData saves the outputs that are read from the corresponding locations at the reduced $t_{RP}$. The goldenDataLoc, the goldenData, and the reduced value of $t_{RP}$ are used as the golden data set for future authentication.

Input:
    mem_data: A matrix containing pattern independent data. An element of mem_data can be empty (if the corresponding memory cell is not pattern independent) or ‘0’ or ‘1’.
    R_q & B_q: Qualified rows and qualified bits generated by Algorithm 1.
Output:
    goldenDataLoc: A boolean matrix of size N_row × N_col × N_bit. goldenDataLoc(r,c,b) is true if the corresponding memory cell is qualified for the PUF application.
    goldenData: Matrix of size N_row × N_col × N_bit that contains the pattern independent output of those memory cells that are marked as true in the goldenDataLoc matrix.

goldenDataLoc ← false (all elements)
goldenData ← empty (all elements)
for i ← 1 to length(R_q) do
   for c ← 1 to N_col do
      for j ← 1 to length(B_q(i)) do
         (r, b) ← (R_q(i), B_q(i, j))
         if mem_data(r, c, b) is not empty then
            goldenDataLoc(r, c, b) ← true
            goldenData(r, c, b) ← mem_data(r, c, b)
         end if
      end for
   end for
end for
Algorithm 2 Generating golden data.
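The registration step can be sketched in the same style. This is an assumption-laden illustration, not the authors' code: instead of two dense matrices it stores the golden data sparsely in one dictionary whose keys play the role of goldenDataLoc and whose values play the role of goldenData. The name `build_golden_data` and the zero-based indices are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Cell = Optional[int]  # None marks a cell that is not pattern independent


def build_golden_data(
    mem_data: List[List[List[Cell]]],
    qualified_rows: List[int],
    qualified_bits: List[List[int]],
) -> Dict[Tuple[int, int, int], int]:
    """Map each qualified (r, c, b) location to its pattern independent output."""
    golden: Dict[Tuple[int, int, int], int] = {}
    for r, bits in zip(qualified_rows, qualified_bits):
        for c, word in enumerate(mem_data[r]):
            for b in bits:
                value = word[b]
                if value is not None:  # skip cells that are not pattern independent
                    golden[(r, c, b)] = value
    return golden


# Toy data: one qualified row (row 0), with bit 1 of every word qualified.
toy = [
    [[0, 1], [0, 0], [0, 1], [0, 0]],
    [[1, None], [1, None], [1, None], [1, None]],
]
golden = build_golden_data(toy, qualified_rows=[0], qualified_bits=[[1]])
print(golden)
```

A sparse mapping is used here only because qualified cells are a subset of the array; a dense boolean matrix as described in Algorithm 2 would carry the same information.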

IV Result and Analysis

Our results are based on experiments conducted with six memory banks from two commercial DDR3 memory modules from two major memory vendors, referred to as A and B (vendor A: Micron, vendor B: Samsung). We used SoftMC (Soft Memory Controller [51]) along with the Xilinx ML605 Evaluation Kit, which carries a Virtex-6 FPGA. SoftMC uses the Riffa [66] framework to establish communication between a host PC and the evaluation board through an x8 PCIe bus. To check the design reliability against voltage variation, we used a USB interface adapter evaluation module [67] to control the voltage of the memory module precisely.

The experiment was performed in two steps. First, an 8-bit pattern was written with the regular timing parameters; then it was read back with a reduced timing parameter. The read operation was done in a single row cycle, i.e., we activated one wordline at a time and then read all bitlines with consecutive bursts. Here, each data burst captured the data from 8 successive bitlines. The whole process was done at the nominal operating voltage and room temperature (i.e., 1.5V and 25°C for all modules). To obtain and analyze the error pattern, we first computed the Hamming distance between the written pattern (input pattern) and the pattern that was read out (output pattern) with the reduced timing parameter. Then, the failed bits were analyzed for additional information (e.g., spatial distribution, pattern dependency, etc.). Four 8-bit input patterns (0xFF, 0xAA, 0x55, 0x00) were used to characterize the DRAM cells. For each input pattern, we repeated our experiment five times (hence producing 20 sets of data) to study the temporal variation. The analysis was done independently on randomly chosen memory banks (four from vendor A and two from vendor B; each bank consists of 128MB of memory cells).
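The comparison between written and read-back patterns reduces to a bitwise Hamming distance. The sketch below shows one way to compute it and the resulting failed-bit percentage; the function names and the sample read-back bytes are made up for illustration and do not come from the measured data.

```python
def hamming_distance(written: bytes, read: bytes) -> int:
    """Number of bit positions in which the two byte strings differ."""
    if len(written) != len(read):
        raise ValueError("patterns must have the same length")
    return sum(bin(w ^ r).count("1") for w, r in zip(written, read))


def failed_bit_percentage(written: bytes, read: bytes) -> float:
    """Share of failed bits, in percent of all bits compared."""
    return 100.0 * hamming_distance(written, read) / (8 * len(written))


# Hypothetical example: 0xFF written to four bytes, two bytes read back wrong.
written = bytes([0xFF] * 4)
read_back = bytes([0xFF, 0xF0, 0xFF, 0x0F])
print(hamming_distance(written, read_back))       # 8
print(failed_bit_percentage(written, read_back))  # 25.0
```

Repeating this for each of the four input patterns and each of the five repetitions would yield the 20 data sets mentioned above.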

We conducted our experiments on the DRAM memory modules by changing the activation time ($t_{RCD}$), restoration time ($t_{RAS}$), and precharge time ($t_{RP}$).

IV-A Reduced Latency: Activation Time vs. Precharge Time

We read a whole row in a single row-cycle to evaluate the error patterns generated at the reduced timing parameter. Two 32-byte (double-data rate) memory chunks were read with each burst (with 8-bit burst length, i.e., eight words can be accessed at a time while each word corresponds to 64-bit data). For the rest of our discussion, we will use the notation to present the reduced timing parameter , where is the reduced value of the timing parameter in nanoseconds. At a reduced activation time (e.g., at ), failed bits were only observed at the first accessed cache line (i.e., in the first 64 bytes of data), and it therefore needs several read cycles. All memory banks from our selected manufacturers exhibit similar characteristics. Such behavior is observed because the target wordline is fully activated before accessing the second content of the cache line (see Appendix A). Note that [37] and [20] also reported similar observations.

On the other hand, the experimental results show that a sufficient reduction in $t_{RP}$ creates errors uniformly across the whole word. In addition, it requires only a single row-cycle. Fig. 5 shows the percentage of failed bits in two random banks from two vendors for different input patterns at the reduced $t_{RP}$. We observed the first error(s) at . We reduced the $t_{RP}$ to , , and to observe the behavior of failed bits. The results show that the total number of failed bits are at for vendor A. The number of failed bits increases at a much faster rate as we decrease the $t_{RP}$ further. For vendor B, the total number of failed bits are at but increase significantly at .

We also find that DRAMs from different manufacturers react differently to a given input pattern. Fig. 5 (left) shows that most of the cells produce faulty outputs when the input pattern is all 1’s, but most of the bits are faultless when the input pattern is all 0’s. On the other hand, we observe in Fig. 5 (right) that most of the bits fail when the input pattern is all 0’s, but most of the bits appear to be correct when the input pattern is all 1’s. In the left figure, the number of failed bits for the input pattern 0xFF is higher because the pattern independent ‘0’ cells (output always ‘0’ regardless of the input pattern) are dominant for this module. In the right figure, the number of failed bits for the input pattern 0x00 is higher because the pattern independent ‘1’ cells (output always ‘1’ regardless of the input pattern) are dominant for this module.

We can conclude from the results that (i) reducing the precharge time is superior to reducing the activation time for generating quality signatures in a single row-cycle, and (ii) the erroneous behavior depends on the input pattern, the DRAM architecture, process variations, the amount of reduction in $t_{RP}$, etc.

Fig. 5: $t_{RP}$ vs. % of failed bits: (i) from vendor A and (ii) from vendor B (the horizontal axis is shown in logarithmic scale).

IV-B Cell Characterization

The silicon results show that a different number of faulty outputs is generated at each reduced $t_{RP}$. In addition, we must ensure that the reduced $t_{RP}$ is also capable of closing the previously activated row (as discussed in Sec. III-A). The chosen value of $t_{RP}$ is empirical (denoted as for clarification) and is used to characterize the DRAM cells and to evaluate the PUF responses. Note that this empirical value might vary from module to module. Also, the cell characterization is done at nominal voltage and room temperature (i.e., 1.5V and 25°C for all modules).

Fig. 6: Spatial location of pattern independent cells (at 2.5ns), (i) bit ‘0’, (ii) bit ‘1’.
Fig. 7: Pattern dependent cells (at 2.5ns), (i) failed to ‘0’, and (ii) failed to ‘1’.
  1. Pattern Independent: Memory cells from this category always flip to a fixed value (either to ‘0’ or to ‘1’) regardless of the input pattern (i.e., the value originally written to the DRAM cells). Fig. 6, a 3D histogram plot, shows the spatial locality (along 16384 rows and 1024 columns) of output ‘0’ (left) and ‘1’ (right) across a random DRAM bank from vendor A. The results show that pattern independent 1’s and 0’s are uniformly distributed. All memory banks from vendor B also show a similar spatial distribution (not shown in the figure). Therefore, the reduction of $t_{RP}$ is a better candidate to generate device signatures.

  2. Pattern Dependent: The outputs of these types of cells depend on the input patterns written into the DRAM cells. The outputs are affected by the cumulative voltage of the partially precharged bitline, stored values, the coupling effect of neighboring cells, etc. We consider a memory cell pattern dependent if it provides different outputs for different inputs. These cells are also measurement invariant for at least one input pattern. Fig. 7 shows the DRAM cells that are dependent on the input pattern 0xAA. Pattern dependent cells can be used for a PUF with an enhanced challenge-response pair (CRP) space. Besides, spatial locality along both rows and columns is visible in Fig. 7. The darker lines in Fig. 7 (both horizontal and vertical) represent rows and columns with pattern dependent cells; a darker line signifies more pattern dependent cells. The spatial locality might reveal the physical to logical address mapping [46]. Fig. 7 is captured from a random bank of vendor B; a similar type of spatial locality was found in all memory banks from all vendors. The third column (from right) in Table II shows the percentage of pattern dependent cells from each bank.

  3. Noisy Cells: With partially precharged bitlines, the outputs of these cells vary from measurement to measurement. Hence, these noisy cells are not suitable for use as a PUF. The second column (from right) in Table II shows the percentage of noisy cells from each bank. In Fig. 8, we show the distribution of noisy memory cells for a random bank from vendor B. The results show that the noisy cells are not entirely random (in this case, most of the cells are biased to ‘1’). Similar characteristics were found in other memory banks from both vendors (i.e., most of the noisy cells are biased to either ‘0’ or ‘1’). A large ECC might be required if these cells are used as a PUF [33, 34] because of their poor reproducibility. We also found that the spatial locality of noisy cells from one bank to another is random. Therefore, a proper subset of such cells can also be used to generate random numbers [53].
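The three categories above can be illustrated with a small classification sketch. It is a simplification under stated assumptions, not the characterization code used for Table II: the function name `classify_cell`, the label strings, and the extra `error_free` label are hypothetical, and a cell is treated as noisy as soon as its output varies across repeated measurements of any one input pattern, which is stricter than the description in item 2.

```python
from collections import Counter


def classify_cell(reads_by_pattern: dict[int, list[int]], bit: int) -> str:
    """Classify one DRAM cell from repeated reads at a reduced precharge time.

    reads_by_pattern maps an 8-bit input pattern (e.g. 0xAA) to the bits read
    back from this cell over repeated measurements; bit is the position of the
    cell within the 8-bit pattern (0 = least significant bit).
    """
    all_reads = [r for reads in reads_by_pattern.values() for r in reads]
    if len(set(all_reads)) == 1:
        # Same output for every pattern and every measurement.
        return f"pattern_independent_{all_reads[0]}"
    if any(len(set(reads)) > 1 for reads in reads_by_pattern.values()):
        # Output changes from measurement to measurement (assumed priority).
        return "noisy"
    written = {p: (p >> bit) & 1 for p in reads_by_pattern}
    if all(reads[0] == written[p] for p, reads in reads_by_pattern.items()):
        # Hypothetical extra label: the cell always returns the written value.
        return "error_free"
    return "pattern_dependent"


# Hypothetical cells, each measured five times for the four input patterns.
stuck_at_0 = {0xFF: [0] * 5, 0xAA: [0] * 5, 0x55: [0] * 5, 0x00: [0] * 5}
noisy = {0xFF: [1, 0, 1, 1, 0], 0xAA: [1] * 5, 0x55: [0] * 5, 0x00: [0] * 5}
fails_on_aa = {0xFF: [1] * 5, 0xAA: [0] * 5, 0x55: [0] * 5, 0x00: [0] * 5}

labels = [classify_cell(c, bit=1) for c in (stuck_at_0, noisy, fails_on_aa)]
print(Counter(labels))
```

Counting the labels over all cells of a bank would give a distribution of the same kind as the one summarized in Table II, although the exact percentages depend on the classification rules actually used.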

Fig. 8: Noisy cell characteristics. Most of the cells are biased to ‘1’.
Fig. 9: Cell distribution among bitlines.

The complete distribution of these three types of DRAM cells along the bitlines is presented in Fig. 9 for a given bank of vendor A. In this figure, we present only 128 bitlines of two consecutive 64-bit words, where each bitline consists of 16384 memory cells (since the total number of rows is 16384). The figure shows that all memory cells from and (where, ) bits of a word generate ‘0’ regardless of the input patterns. One of the possible reasons is that, for these bitlines, a large voltage difference with corresponding causes the sense-amplifier to deviate towards a specific logic level (either ‘0’ or ‘1’) (see Sec. III-A). Therefore, generating a key from such memory cells reduces the overall entropy of the key. The proposed Algorithm 1 eliminates such memory cells and improves the entropy.

Table II summarizes the distribution of the cells of the two vendors (vendor A and vendor B) at the empirical $t_{RP}$. The results show that more than 90% of the cells from each bank of vendor A (except bank d) are pattern independent, while the share is lower for vendor B (see Table II). However, we found an exception for bank d of vendor A because the previously activated row fails to close at 2.5ns. To avoid this issue, we characterized the memory cells of this bank at 5ns (see Table II). For this particular memory bank, we found that the pattern independent cells are fewer in number than in the other memory banks. We also found that the number of noisy cells is significantly larger than in the other memory banks.

| Vendor | Memory Bank ID | $t_{RP}$ (ns) | Pattern Independent 0 (%) | Pattern Independent 1 (%) | Pattern Dependent (%) | Noisy (%) | Vaid bits (%) |
|---|---|---|---|---|---|---|---|
| A | a | 2.5 | 85.825 | 12.631 | 0.006 | 1.537 | 0.000 |
| A | b | 2.5 | 72.663 | 18.790 | 0.135 | 8.413 | 0.000 |
| A | c | 2.5 | 72.793 | 17.202 | 0.133 | 9.872 | 0.000 |
| A | d | 5 | 7.820 | 10.560 | 0.310 | 81.030 | 0.290 |
| B | a | 2.5 | 8.226 | 63.674 | 0.519 | 27.580 | 0.001 |
| B | b | 2.5 | 6.339 | 53.530 | 0.113 | 40.017 | 0.001 |

TABLE II: Distribution of memory cells at the partial precharge state.

IV-C PreLatPUF Evaluation

We use diffuseness, uniqueness, and reliability, three major PUF performance metrics [23, 24, 68], to quantify the quality of the proposed PreLatPUF and to compare it with other DPUFs. The proposed Algorithm 1, presented in Section III, is used to obtain the logical locations of the qualified memory cells. In this algorithm, we used = 0.25 and = 0.75 as the input parameters. Ideally, the inter Hamming distance should be 0.5 (i.e., 50%). A Hamming distance of 0 indicates that the PUF is not unique. We completed the registration (i.e., creating the golden data set) using the proposed Algorithm 2. We generated at least one 1024-bit key from each qualified row (or page). However, it is possible to generate multiple keys from each row since the number of qualified memory cells was more than 1024. To keep it simple, we obtained only one key from each row to test the PUF performance. The key generated from the golden data set is used as the reference key. We refer to the address used for generating a reference key as the key address. To evaluate the performance of our proposed PreLatPUF, we created four sets of test data under four different operating conditions (discussed in Sec. IV-C3). We measured the output for the four different input patterns (0xFF, 0xAA, 0x55, and 0x00) and took the average. The outputs from the different operating conditions were compared with the reference key to ensure the robustness of our proposed key generation methodology. We present the major performance metrics below.

IV-C1 Diffuseness

A PUF device should be able to generate distinguishable responses to different challenges. For PreLatPUF, we consider the address as the challenge and the corresponding cell content at the reduced $t_{RP}$ as the response. To check the diffuseness, we measured the inter Hamming distance (inter HD) among the reference keys from each bank (i.e., intra-bank but inter-reference key). An inter HD of 50% signifies that a unique key can be generated from each row (i.e., address). An average Hamming weight of 50% also indicates that the keys are random. Table III shows the average Hamming weight of each key and the average Hamming distance among the different keys generated from each bank. Though the average HD and Hamming weight for a few banks deviate from 50%, the silicon results from all rows of each bank show that the keys generated from distant rows of the same memory bank are not repetitive.

| Vendor | Memory Bank ID | #Qualified row (%) | Average Hamming Distance (%) | Average Hamming weight (%) |
|---|---|---|---|---|
| A | a | 100.00 | 48.87 | 54.23 |
| A | b | 92.31 | 49.35 | 53.29 |
| A | c | 92.30 | 49.24 | 49.24 |
| A | d | 67.82 | 28.97 | 53.98 |
| B | a | 74.84 | 42.28 | 68.19 |
| B | b | 63.99 | 38.06 | 70.31 |

TABLE III: Average Hamming weight and average Hamming distance among the keys generated from each bank.
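The two diffuseness quantities reported in Table III can be computed as in the following sketch. The helper names (`hamming_distance`, `hamming_weight_pct`, `mean_inter_hd_pct`) are hypothetical, keys are modeled as Python integers, and the 8-bit keys are toy values chosen only to keep the example short (the keys in our evaluation are 1024 bits long).

```python
from itertools import combinations


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two keys differ."""
    return bin(a ^ b).count("1")


def hamming_weight_pct(key: int, n_bits: int) -> float:
    """Percentage of '1' bits in a key; 50% is the ideal for a random key."""
    return 100.0 * bin(key).count("1") / n_bits


def mean_inter_hd_pct(keys: list[int], n_bits: int) -> float:
    """Average pairwise Hamming distance among keys, as a percentage of the key length."""
    pairs = list(combinations(keys, 2))
    total = sum(hamming_distance(a, b) for a, b in pairs)
    return 100.0 * total / (len(pairs) * n_bits)


# Toy 8-bit reference keys standing in for keys taken from three rows of one bank.
keys = [0b11110000, 0b00001111, 0b10101010]

print([hamming_weight_pct(k, 8) for k in keys])  # [50.0, 50.0, 50.0]
print(round(mean_inter_hd_pct(keys, 8), 2))      # 66.67
```

Applying the same functions to all reference keys of a bank yields one row of a table like Table III.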

IV-C2 Uniqueness

Responses from different devices should be unique. This metric indicates how different one PUF is from another. To quantify the uniqueness, we measured the inter Hamming distance (inter HD) of the keys from different memory banks, i.e., the HD between the two keys of two banks generated from each key address. We checked the inter HD for the following scenarios:

  • A different pair of banks that are from the same module.

  • A different pair of banks that are from different modules but from the same vendor.

  • A different pair of banks that are from two different vendors.

Fig. 10 shows the inter HD for the worst-case scenario (i.e., the largest deviation from 50% inter HD) for both vendors. In the worst case, for vendor A, the mean, minimum, and maximum inter HD are 45.78%, 37.05%, and 52.5%, respectively. For vendor B, the mean, minimum, and maximum inter HD are 51.91%, 40.92%, and 72.23%, respectively. Therefore, we can conclude that the keys generated by the proposed PreLatPUF are unique.
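The per-address comparison between two banks can be sketched as below. The function name `inter_hd_stats` and the 4-bit toy keys are hypothetical; the sketch assumes that the two lists hold the keys generated from the same key addresses in the two banks, in the same order.

```python
def inter_hd_stats(keys_bank_x: list[int], keys_bank_y: list[int], n_bits: int) -> dict[str, float]:
    """Mean, minimum, and maximum inter HD (in %) between keys of two banks, address by address."""
    hds = [
        100.0 * bin(x ^ y).count("1") / n_bits
        for x, y in zip(keys_bank_x, keys_bank_y)
    ]
    return {"mean": sum(hds) / len(hds), "min": min(hds), "max": max(hds)}


# Toy 4-bit keys for three key addresses in two different banks.
bank_x = [0b1100, 0b1010, 0b1111]
bank_y = [0b1001, 0b0101, 0b1110]

stats = inter_hd_stats(bank_x, bank_y, n_bits=4)
print(stats["min"], stats["max"])  # 25.0 100.0
```

Running this for every bank pair in the three scenarios above and keeping the pair whose mean deviates most from 50% gives the worst case of the kind shown in Fig. 10.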

Fig. 10: Inter Hamming distance (at 2.5ns) for the worst case from (i) vendor A and (ii) vendor B.

IV-C3 Reliability

The same response (i.e., PUF output) should be generated over the entire lifetime of the device under any operating condition. The reproducibility at different operating conditions is presented in Fig. 11. This figure presents only the worst results from each vendor (i.e., the memory bank with the most significant deviation from 0% intra HD). To examine the robustness of the proposed PreLatPUF at extreme operating conditions, we collected results at four different operating conditions: (i) nominal voltage and room temperature (NVRT), (ii) low voltage and room temperature (LVRT), (iii) high voltage and room temperature (HVRT), and (iv) nominal voltage and high temperature (NVHT). The results show that the memory module from vendor A is less robust than that from vendor B at the reduced operating voltage. For vendor A, we can only change the operating voltage by 20mV without causing an excessive error in the PUF response. On the other hand, the DRAM module from vendor B can tolerate a 55mV change in operating voltage. Table IV presents the intra HD at different operating conditions. Column 4 of Table IV gives the change in operating voltage from the nominal value (1.5V), and column 5 gives the change in temperature from room temperature (25°C). The results show that all memory banks from both vendors are robust against temperature variations.

The rightmost column of Table IV shows that the robustness of the PUF output improves as we increase the operating voltage for those banks that possess dominant pattern independent ‘0’ cells at the reduced $t_{RP}$. An increase in the voltage makes these cells more immune to noise. On the other hand, the banks with dominant pattern independent ‘1’ cells show the opposite behavior (i.e., the robustness of the PUF output increases as we reduce the operating voltage). A decrease in the voltage makes these cells more immune to noise. However, bank d from vendor A shows only a slight change in robustness when the voltage is changed (increased or decreased) because this bank produces noisier cells than the other banks.
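The intra HD evaluation against the reference keys can be sketched as follows. The names `intra_hd_pct` and `reliability_summary` are hypothetical, the keys are toy 8-bit values, and reporting the share of keys whose intra HD exceeds a threshold is an assumption about how the threshold columns of Table IV are defined.

```python
def intra_hd_pct(reference: int, regenerated: int, n_bits: int) -> float:
    """Intra HD (in %) between a reference key and a key regenerated from the same address."""
    return 100.0 * bin(reference ^ regenerated).count("1") / n_bits


def reliability_summary(
    reference_keys: list[int],
    test_keys: list[int],
    n_bits: int,
    threshold_pct: float,
) -> tuple[float, float]:
    """Return (mean intra HD in %, share of keys in % whose intra HD exceeds threshold_pct)."""
    hds = [intra_hd_pct(r, t, n_bits) for r, t in zip(reference_keys, test_keys)]
    mean_hd = sum(hds) / len(hds)
    share_above = 100.0 * sum(hd > threshold_pct for hd in hds) / len(hds)
    return mean_hd, share_above


# Toy example: four reference keys (golden data) and the keys regenerated
# under a different, hypothetical operating condition.
reference_keys = [0xF0, 0x0F, 0xAA, 0x55]
test_keys = [0xF0, 0x0E, 0xAA, 0x15]

print(reliability_summary(reference_keys, test_keys, n_bits=8, threshold_pct=10.0))  # (6.25, 50.0)
```

An ideal PUF would give a mean intra HD of 0% under every operating condition, so smaller values indicate better reproducibility.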

Fig. 11: Intra HD for the worst case from (a) vendor A (at 2.5ns) and (b) vendor B (at 2.5ns) with (i) NVRT, (ii) LVRT, (iii) HVRT, and (iv) NVHT.
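| Vendor | Memory Bank ID | Operating Condition | ΔV (mV) | ΔT (°C) | Intra HD | Intra HD | Key with Intra HD 1% | Key with Intra HD 30% |
|---|---|---|---|---|---|---|---|---|
| A | a | NVRT | 0 | 0 | 0.48 | 0.07 | 0.00 | 0.00 |
| A | a | LVRT | 20 | 0 | 0.05 | 0.08 | 0.00 | 0.00 |
| A | a | HVRT | 55 | 0 | 0.07 | 0.09 | 0.00 | 0.00 |
| A | a | NVHT | 0 | 20 | 0.06 | 0.09 | 0.00 | 0.00 |
| A | b | NVRT | 0 | 0 | 0.47 | 3.17 | 1.57 | 0.00 |
| A | b | LVRT | 20 | 0 | 2.94 | 10.55 | 7.81 | 2.91 |
| A | b | HVRT | 55 | 0 | 0.09 | 0.10 | 0.00 | 0.00 |
| A | b | NVHT | 0 | 20 | 0.67 | 3.84 | 2.34 | 0.01 |
| A | c | NVRT | 0 | 0 | 0.49 | 3.34 | 1.54 | 0.03 |
| A | c | LVRT | 20 | 0 | 7.77 | 12.38 | 27.95 | 0.46 |
| A | c | HVRT | 55 | 0 | 0.09 | 0.12 | 0.01 | 0.00 |
| A | c | NVHT | 0 | 20 | 0.52 | 3.38 | 1.54 | 0.02 |
| A | d | NVRT | 0 | 0 | 1.54 | 9.02 | 4.37 | 2.74 |
| A | d | LVRT | 20 | 0 | 1.69 | 8.87 | 8.87 | 2.66 |
| A | d | HVRT | 55 | 0 | 1.47 | 8.73 | 4.29 | 2.64 |
| A | d | NVHT | 0 | 20 | 4.72 | 8.36 | 9.35 | 2.62 |
| B | a | NVRT | 0 | 0 | 1.97 | 10.25 | 3.37 | 3.25 |
| B | a | LVRT | 55 | 0 | 2.11 | 10.19 | 3.36 | 3.17 |
| B | a | HVRT | 55 | 0 | 1.92 | 10.02 | 3.53 | 3.17 |
| B | a | NVHT | 0 | 20 | 2.13 | 10.23 | 3.76 | 3.26 |
| B | b | NVRT | 0 | 0 | 1.93 | 10.55 | 3.24 | 2.62 |
| B | b | LVRT | 55 | 0 | 2.22 | 10.30 | 5.68 | 2.52 |
| B | b | HVRT | 55 | 0 | 1.95 | 10.35 | 3.18 | 2.53 |
| B | b | NVHT | 0 | 20 | 1.99 | 10.55 | 3.39 | 2.74 |

TABLE IV: Intra HD at different operating conditions.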

IV-D Performance Comparison

IV-D1 Evaluation Time

We use two approaches to quantify the evaluation time and to compare it with other DPUFs. Eq. (3) (the first approach) and Eq. (4) (the second approach) are used to compare the time overhead required for key generation. The first approach measures the time required to receive the response after sending the challenge from the host. The second approach, on the other hand, is intended to measure the time required to produce the key in the evaluation board () (the current implementation does not support a separate measurement of and ).

(3)
(4)

where,
evaluation with the first approach,
evaluation with the second approach,
time required to send the command to the evaluation board from the host computer,
time required to execute the command in the evaluation board,
time required to send back the read data to the host computer from the evaluation board, and
time required to store the read data to a storage device.

With the first approach, the worst average evaluation time is 1.59ms (the worst among all banks; see Table V); with the second approach, it is 74µs. However, the evaluation time can be measured more accurately by inserting a local counter inside the FPGA. Note that we did not include the characterization phase in the evaluation time (see Eq. (3) and Eq. (4)) since the cell characterization is performed only once, during the registration. We also did not include the time to write a specific data pattern because we did not consider pattern dependent cells in this paper. However, the writing time needs to be added if pattern dependent cells are used to generate a large CRP space. The average system-level evaluation time of the reduced $t_{RCD}$-based DPUF is 88.2ms [20], which is still slower (considering the worst evaluation time with the second approach) than our proposed method. On the other hand, the retention-based DPUF takes on the order of minutes to generate a device signature with enough retention failures [13].

| Vendor | Memory Bank ID | #Required Burst (mean) | Mean Evaluation time (ms) |
|---|---|---|---|
| A | a | 9.00 | 0.51 |
| A | b | 6.43 | 0.41 |
| A | c | 7.19 | 0.47 |
| A | d | 16.10 | 0.93 |
| B | a | 28.15 | 1.59 |
| B | b | 24.18 | 1.34 |

TABLE V: Average PreLatPUF evaluation time.

IV-D2 System Level Disruption

For most DRAM modules, the granularity of refreshing the DRAM contents is a rank. Therefore, we need to increase the refresh interval for the entire memory rank to evaluate a retention-based DPUF. As a result, it causes random data corruption over the whole rank. Also, due to the long evaluation time of the retention-based DPUF, the particular DRAM rank becomes unavailable to other applications for a long time. In the proposed PreLatPUF, the reduced $t_{RP}$ only affects the cells that are being accessed. We also checked the interference with the rows neighboring the target row that is accessed for key generation. To do so, we arbitrarily selected 1000 consecutive rows from each memory bank. Then, we read the data from all odd-numbered rows at the reduced $t_{RP}$ and investigated the impact on the memory cells of the even-numbered rows at the nominal $t_{RP}$. Our results show that there is no data corruption in the adjacent rows.

However, although the latency-based DPUF of [20] at the reduced $t_{RCD}$ is fast, this type of DPUF evaluation needs a filtering mechanism upon each access, which causes both computational and hardware overheads. In our proposed mechanism, we register the eligible PUF cells only once in the device's entire lifetime (see Section III-C). Once the suitable cells for PUF operation are determined, the evaluation of our proposed PUF is straightforward (i.e., request the response by sending an address and then compare only the eligible cells’ content with the golden data). The proposed PUF has the shortest evaluation time. Therefore, the proposed PreLatPUF can be used at run-time, which is impossible in many existing DPUFs [13, 14, 2].

IV-D3 Robustness

The robustness (i.e., the effect of different operating conditions and environmental variations) of the proposed PreLatPUF is shown in Table IV. The impact of operating voltage and temperature variations on DPUFs has been explored before [20, 13]. In this paper, we compared the robustness of the proposed PreLatPUF and the retention-based DPUF at different operating conditions. To accumulate the retention-based failures, we chose a random memory segment with 1000 rows from each bank. At first, we stored logic ‘1’ in all memory cells of the segment, and then the refresh interval was prolonged until we obtained at least 2% failures at the NVRT. For a specific bank, the same refresh interval was maintained for all other operating conditions. For the proposed PreLatPUF, we measured the data errors with four input patterns (0xFF, 0xAA, 0x55, and 0x00) at the reduced $t_{RP}$ for the same 1000 rows.

We used the Jaccard Index to compare the robustness of our proposed PreLatPUF with the retention-based PUF. For the retention-based DPUF, the PUF characteristics are evaluated from the locations of the failed bits. For example, in our case, retention-based failed bits always fail from logic ‘1’ to logic ‘0’, but the locations of the failed bits differ from one device to another. For two sets of measurements ($m_1$, $m_2$), the Jaccard Index is measured as $J(m_1, m_2) = \frac{|m_1 \cap m_2|}{|m_1 \cup m_2|}$, where $|m_1 \cap m_2|$ is the total matched failed bits and $|m_1 \cup m_2|$ is the total failed bits from the two measurements $m_1$ and $m_2$ [20, 18]. For better reproducibility, the intra Jaccard Index should be 1. Table VI shows the comparison between the PreLatPUF and the retention-based PUF. The results show that the proposed PreLatPUF is more robust than the retention-based DPUF. The retention-based PUF is more susceptible to temperature variation than the PreLatPUF. This is because the retention-based failed bits are mostly governed by the charge leakage rate of the DRAM cells, which has a strong exponential dependence on the temperature [13, 14, 15, 40, 41, 42]. On the other hand, the change in $t_{RP}$ is negligible as the temperature changes. The $t_{RP}$ changes only (3%) as the temperature changes from 27°C to 85°C [69].
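The Jaccard Index between two measurements can be computed from the sets of failed-bit locations, as in this minimal sketch. The function name `jaccard_index` and the sample locations are hypothetical, and returning 1.0 when neither measurement contains a failed bit is a convention chosen here, not something taken from [20, 18].

```python
def jaccard_index(failed_a: set[int], failed_b: set[int]) -> float:
    """Jaccard Index of two measurements, each given as a set of failed-bit locations."""
    union = failed_a | failed_b
    if not union:
        # Assumed convention: two measurements without any failure are identical.
        return 1.0
    return len(failed_a & failed_b) / len(union)


# Hypothetical failed-bit locations at NVRT and at another operating condition.
nvrt = {3, 17, 42, 100}
nvht = {3, 17, 42, 256}

print(jaccard_index(nvrt, nvht))  # 0.6
```

Here three locations match out of five distinct failed locations in total, which gives 3/5; identical measurements give the ideal value of 1.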

For the $t_{RCD}$-based DPUF, the results shown in [20] suggest that it can tolerate only a small change in temperature (e.g., 5°C). On the other hand, for the PreLatPUF, we increased the temperature by 20°C and found a negligible change in the signature. The results presented in [69] also suggest that the temperature dependency of $t_{RCD}$ is stronger than that of $t_{RP}$.

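| Vendor | Memory Bank ID | Measurements compared | Jaccard Index: Proposed PreLatPUF | Jaccard Index: Retention Based DPUF |
|---|---|---|---|---|
| A | a | NVRT, LVRT | 0.997 | 0.926 |
| A | a | NVRT, HVRT | 0.997 | 0.968 |
| A | a | NVRT, NVHT | 0.997 | 0.349 |
| A | b | NVRT, LVRT | 0.980 | 0.902 |
| A | b | NVRT, HVRT | 0.997 | 0.970 |
| A | b | NVRT, NVHT | 0.986 | 0.356 |
| A | c | NVRT, LVRT | 0.929 | 0.930 |
| A | c | NVRT, HVRT | 0.997 | 0.960 |
| A | c | NVRT, NVHT | 0.985 | 0.355 |
| A | d | NVRT, LVRT | 0.994 | 0.941 |
| A | d | NVRT, HVRT | 0.983 | 0.968 |
| A | d | NVRT, NVHT | 0.996 | 0.279 |
| B | a | NVRT, LVRT | 0.968 | 0.962 |
| B | a | NVRT, HVRT | 0.961 | 0.847 |
| B | a | NVRT, NVHT | 0.968 | 0.421 |
| B | b | NVRT, LVRT | 0.965 | 0.952 |
| B | b | NVRT, HVRT | 0.968 | 0.950 |
| B | b | NVRT, NVHT | 0.971 | 0.457 |

TABLE VI: Jaccard Index at different operating conditions for the PreLatPUF and the retention-based DPUF.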

V Conclusion

In this paper, we proposed a DRAM-based PUF that exploits the precharge-latency variations in DRAM cells. We characterized the DRAM cells’ errors at the reduced precharge latency to find the most suitable DRAM cells for producing random, unique, and reliable device signatures. The silicon results from commercially available DRAM modules show that the proposed device signature scheme and algorithm can generate robust PUF outputs at a much faster rate.

Appendix A Impact of Reduced Activation Time

In Fig. 12, red spots represent the failed bits at the reduced activation time ($t_{RCD}$) for a DRAM bank. The results show that the failed bits are only observed at the first accessed cache line (i.e., just in the first column). A similar observation was reported in [20] and [37].

Fig. 12: Failed bits at the reduced $t_{RCD}$ with input pattern (i) 0x00, (ii) 0x55, (iii) 0xAA, and (iv) 0xFF.

Acknowledgment

We would like to thank Hasan Hassan (ETH Zürich) & CMU for the SoftMC software.

References

  • [1] C. Herder, M. D. Yu, F. Koushanfar and S. Devadas, “Physical Unclonable Functions and Applications: A Tutorial,” in Proceedings of the IEEE, vol. 102, no. 8, pp. 1126-1141, Aug. 2014.
  • [2] C. Keller, F. Gürkaynak, H. Kaeslin and N. Felber, “Dynamic memory-based physically unclonable function for the generation of unique identifiers and true random numbers,” 2014 IEEE International Symposium on Circuits and Systems (ISCAS), Melbourne VIC, 2014, pp. 2740-2743.
  • [3] U. Guin et al., “Counterfeit Integrated Circuits: A Rising Threat in the Global Semiconductor Supply Chain,” Proceedings of the IEEE, vol. 102, no. 8, pp. 1207-1228, 2014.
  • [4] R. Chakraborty and S. Bhunia, “HARPOON: An Obfuscation-Based SoC Design Methodology for Hardware Protection,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 28, no. 10, pp. 1493-1502, 2009.
  • [5] A. Hosey et al., “Advanced Analysis of Cell Stability for Reliable SRAM PUFs,” 2014 IEEE 23rd Asian Test Symposium, 2014.
  • [6] M. T. Rahman et al., “CSST: Preventing distribution of unlicensed and rejected ICs by untrusted foundry and assembly,” In 2014 IEEE International symposium on defect and fault tolerance in VLSI and nanotechnology systems (DFT), pp. 46-51. IEEE, 2014.
  • [7] G. K. Contreras, M. T. Rahman, and M. Tehranipoor, “Secure split-test for preventing IC piracy by untrusted foundry and assembly,” In 2013 IEEE International symposium on defect and fault tolerance in VLSI and nanotechnology systems (DFTS), pp. 196-203. IEEE, 2013.
  • [8] A. Basak, F. Zhang, and S. Bhunia, “PiRA: IC authentication utilizing intrinsic variations in pin resistance,” 2015 IEEE International Test Conference (ITC), 2015.
  • [9] A. Hennessy, Y. Zheng, and S. Bhunia, “JTAG-based robust PCB authentication for protection against counterfeiting attacks,” 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC), 2016.
  • [10] U. Chatterjee, R. S. Chakraborty, and D. Mukhopadhyay, “A PUF-based secure communication protocol for IoT,” ACM Transactions on Embedded Computing Systems (TECS) 16, no. 3 (2017): 67.
  • [11] B. Halak, M. Zwolinski, and M. S. Mispan, “Overview of PUF-based hardware security solutions for the Internet of Things,” In 2016 IEEE 59th International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 1-4. IEEE, 2016.
  • [12] U. Chatterjee et al., “Building PUF based Authentication and Key Exchange Protocol for IoT without Explicit CRPs in Verifier Database,” IEEE Transactions on Dependable and Secure Computing (2018).
  • [13] S. Sutar et al., “D-PUF: An Intrinsically Reconfigurable DRAM PUF for Device Authentication and Random Number Generation,” ACM Trans. Embed. Comput. Syst. 17, 1, Article 17 (December 2017).
  • [14] W. Xiong et al., “Run-Time Accessible DRAM PUFs in Commodity Devices,” Lecture Notes in Computer Science Cryptographic Hardware and Embedded Systems CHES 2016, pp. 432-453, 2016.
  • [15] C. Keller, F. Gürkaynak, H. Kaeslin, and N. Felber, “Dynamic memory-based physically unclonable function for the generation of unique identifiers and true random numbers,” 2014 IEEE International Symposium on Circuits and Systems (ISCAS), 2014.
  • [16] F. Tehranipoor, N. Karimian, K. Xiao, and J. Chandy, “DRAM based Intrinsic Physical Unclonable Functions for System Level Security,” In Proceedings of the 25th edition on Great Lakes Symposium on VLSI (GLSVLSI ’15). ACM, New York, NY, USA, 15-20.
  • [17] F. Tehranipoor, N. Karimian, W. Yan and J. A. Chandy, “DRAM-Based Intrinsic Physically Unclonable Functions for System-Level Security and Authentication,” in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 3, pp. 1085-1097, March 2017.
  • [18] A. Schaller et al., “Intrinsic Rowhammer PUFs: Leveraging the Rowhammer effect for improved security,” 2017 IEEE International Symposium on Hardware Oriented Security and Trust (HOST), 2017.
  • [19] N. Anagnostopoulos, T. Arul, Y. Fan, C. Hatzfeld, A. Schaller, W. Xiong, M. Jain, M. Saleem, J. Lotichius, S. Gabmeyer, J. Szefer, and S. Katzenbeisser, “Intrinsic Run-Time Row Hammer PUFs: Leveraging the Row Hammer Effect for Run-Time Cryptography and Improved Security,” Cryptography, vol. 2, no. 3, p. 13, 2018.
  • [20] J. S. Kim, M. Patel, H. Hassan, and O. Mutlu, “The DRAM Latency PUF: Quickly Evaluating Physical Unclonable Functions by Exploiting the Latency-Reliability Tradeoff in Modern DRAM Devices,” 24th International Symposium on High-Performance Computer Architecture (HPCA), Vienna, Austria, Feb. 2018.
  • [21] A. Mazady et al., “Memristor PUF - A security primitive: Theory and experiment,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems 5, no. 2 (2015): 222-229.
  • [22] K. Xiao et al., “Bit selection algorithm suitable for high-volume production of SRAM-PUF,” In 2014 IEEE International Symposium on Hardware-Oriented Security and Trust (HOST), pp. 101-106. IEEE, 2014.
  • [23] M. T. Rahman, D. Forte, J. Fahrny, and M. Tehranipoor, “ARO-PUF: an aging-resistant ring oscillator PUF design,” In Proceedings of the conference on Design, Automation & Test in Europe (DATE’14). European Design and Automation Association, 3001 Leuven, Belgium, Article 69 , 6 pages.
  • [24] M. T. Rahman, F. Rahman, D. Forte, and M. Tehranipoor, “An aging-resistant RO-PUF for reliable key generation,” IEEE Transactions on Emerging Topics in Computing 4, no. 3 (2015): 335-348.
  • [25] K. K. Chang et al., “Understanding Reduced-Voltage Operation in Modern DRAM Devices: Experimental Characterization, Analysis, and Mechanisms,” Proc. ACM Meas. Anal. Comput. Syst. 1, 1, Article 10 (June 2017).
  • [26] K. Chandrasekar et al., “Exploiting expendable process-margins in DRAMs for run-time performance optimization,” Design, Automation & Test in Europe Conference & Exhibition (DATE), 2014, 2014
  • [27] D. Lee et al., “Design-Induced Latency Variation in Modern DRAM Chips,” Proceedings of the 2017 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems - SIGMETRICS 17 Abstracts, 2017.
  • [28] F. Tehranipoor, N. Karimian, W. Yan, and J. A. Chandy, “Investigation of DRAM PUFs reliability under device accelerated aging effects,” 2017 IEEE International Symposium on Circuits and Systems (ISCAS), 2017.
  • [29] K. Kim et al., “Study on off-state hot carrier degradation and recovery of NMOSFET in SWD circuits of DRAM,” 2016 IEEE International Integrated Reliability Workshop (IIRW), 2016.
  • [30] D. Ganta and L. Nazhandali, “Study of IC aging on ring oscillator physical unclonable functions,” Fifteenth International Symposium on Quality Electronic Design, 2014.
  • [31] M. Hasanuzzaman, S. K. Islam, and L. M. Tolbert, “Effects of temperature variation ( K) in MOSFET modeling in 6H-silicon carbide,” Solid-State Electronics, vol. 48, no. 1, pp. 125-132, 2004.
  • [32] Z. Guo, et al., “SCARe: An SRAM-Based Countermeasure Against IC Recycling,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 26, no. 4, pp. 744-755, 2018.
  • [33] M. Hiller, D. Merli, F. Stumpf, and G. Sigl, “Complementary IBS: Application specific error correction for PUFs,” 2012 IEEE International Symposium on Hardware-Oriented Security and Trust, 2012.
  • [34] S. Devadas and M. Yu, “Secure and Robust Error Correction for Physical Unclonable Functions,” IEEE Design & Test, pp. 11, 2015.
  • [35] J. Kim, M. Sullivan, S.-L. Gong, and M. Erez, “Frugal ECC: Efficient and Versatile Memory Error Protection through Fine-Grained Compression,” Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC 15, 2015.
  • [36] Q. Deng et al., “Active Low-Power Modes for Main Memory with MemScale,” in IEEE Micro, vol. 32, no. 3, pp. 60-69, May-June 2012. doi: 10.1109/MM.2012.21
  • [37] K. K. Chang et al., “Understanding Latency Variation in Modern DRAM Chips: Experimental Characterization, Analysis, and Optimization,” SIGMETRICS Perform. Eval. Rev. 44, 1 (June 2016), 323-336.
  • [38] D. Lee et al., “Adaptive-latency DRAM: Optimizing DRAM timing for the common-case,” 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), 2015.
  • [39] JEDEC, “DDR3 SDRAM Standard”, 2012.
  • [40] J. Liu et al., “An experimental study of data retention behavior in modern DRAM devices,” ACM SIGARCH Computer Architecture News, vol. 41, no. 3, p. 60, 2013.
  • [41] H. Hassan et al., “ChargeCache: Reducing DRAM latency by exploiting row access locality,” 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2016.
  • [42] Y. Katayama, E. J. Stuckey, S. Morioka, and Z. Wu, “Fault-tolerant refresh power reduction of DRAMs for quasi-nonvolatile data retention,” Defect and Fault Tolerance in VLSI Systems, 1999. DFT ’99. International Symposium on, Albuquerque, NM, 1999, pp. 311–318.
  • [43] S. Govindavajhala and A. Appel, “Using memory errors to attack a virtual machine,” Proceedings 19th International Conference on Data Engineering (May 2003).
  • [44] X. Zhang, Y. Zhang, B. R. Childers, and J. Yang, “Restore truncation for performance improvement in future DRAM systems,” 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2016.
  • [45] J. Liu et al., “An experimental study of data retention behavior in modern DRAM devices: implications for retention time profiling mechanisms,” in Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA ’13), ACM, New York, NY, USA.
  • [46] S. Khan, D. Lee, and O. Mutlu, “PARBOR: An Efficient System-Level Technique to Detect Data-Dependent Failures in DRAM,” 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Toulouse, 2016, pp. 239–250.
  • [47] Y. Wang et al., “Reducing DRAM Latency via Charge-Level-Aware Look-Ahead Partial Restoration,” 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2018.
  • [48] M. Patel, J. S. Kim, and O. Mutlu, “The Reach Profiler (REAPER),” ACM SIGARCH Computer Architecture News, vol. 45, no. 2, pp. 255–268, 2017.
  • [49] M. K. Qureshi et al., “AVATAR: A Variable-Retention-Time (VRT) Aware Refresh for DRAM Systems,” 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2015.
  • [50] J. Kim, M. Patel, H. Hassan, and O. Mutlu, “Solar-DRAM: Reducing DRAM Access Latency by Exploiting the Variation in Local Bitlines,” 2018 IEEE 36th International Conference on Computer Design (ICCD), 2018.
  • [51] H. Hassan et al., “SoftMC: A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies,” 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), Austin, TX, 2017, pp. 241–252.
  • [52] S. Khan et al., “Detecting and mitigating data-dependent DRAM failures by exploiting current memory content,” Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-50 ’17), 2017.
  • [53] B. M. S. B. Talukder et al., “Exploiting DRAM Latency Variations for Generating True Random Numbers,” in 2019 IEEE International Conference on Consumer Electronics (ICCE), pp. 1–6, IEEE, 2019.
  • [54] B. Jacob, S. W. Ng, and D. T. Wang, “Memory systems: cache, DRAM, disk,” Morgan Kaufmann, 2010.
  • [55] Y. Chen, A. B. Kahng, B. Liu, and W. Wang, “Crosstalk-aware Signal Probability-based Dynamic Statistical Timing Analysis,” Sixteenth International Symposium on Quality Electronic Design, 2015.
  • [56] S. R. Sarangi et al., “VARIUS: A Model of Process Variation and Resulting Timing Errors for Microarchitects,” IEEE Transactions on Semiconductor Manufacturing, vol. 21, no. 1, pp. 3–13, 2008.
  • [57] W. Zhang et al., “An Efficient Method for Chip-Level Statistical Capacitance Extraction Considering Process Variations with Spatial Correlation,” 2008 Design, Automation and Test in Europe, 2008.
  • [58] W. Shin, J. Yang, J. Choi, and L.-S. Kim, “NUAT: A non-uniform access time memory controller,” 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA), 2014.
  • [59] M. T. Rahman et al., “Systematic Correlation and Cell Neighborhood Analysis of SRAM PUF for Robust and Unique Key Generation,” Journal of Hardware and Systems Security, vol. 1, no. 2, pp. 137–155, 2017.
  • [60] Y. Li et al., “DRAM Yield Analysis and Optimization by a Statistical Design Approach,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 58, no. 12, pp. 2906–2918, Dec. 2011.
  • [61] Q. Tang et al., “A DRAM based physical unclonable function capable of generating > Challenge Response Pairs per 1Kbit array for secure chip authentication,” 2017 IEEE Custom Integrated Circuits Conference (CICC), 2017.
  • [62] D. Ganta and L. Nazhandali, “Easy-to-build Arbiter Physical Unclonable Function with enhanced challenge/response set,” International Symposium on Quality Electronic Design (ISQED), 2013.
  • [63] A. Maiti, I. Kim, and P. Schaumont, “A Robust Physical Unclonable Function With Enhanced Challenge-Response Set,” IEEE Transactions on Information Forensics and Security, vol. 7, no. 1, pp. 333–345, 2012.
  • [64] M. T. Rahman et al., “TI-TRNG: Technology independent true random number generator,” Proceedings of the 51st Annual Design Automation Conference (DAC ’14), 2014.
  • [65] C. E. Shannon, “Prediction and Entropy of Printed English,” The Bell System Technical Journal, Jan. 1951.
  • [66] M. Jacobsen, D. Richmond, M. Hogains, and R. Kastner, “RIFFA 2.1: A Reusable Integration Framework for FPGA Accelerators,” ACM Trans. Reconfigurable Technol. Syst. 8, 4, Article 22 (September 2015).
  • [67] “USB Interface Adapter EVM USB-TO-GPIO (ACTIVE),” [Online]. Available: http://www.ti.com/tool/usb-to-gpio. [Accessed: 03-May-2019].
  • [68] A. Maiti, V. Gunreddy, and P. Schaumont, “A Systematic Method to Evaluate and Compare the Performance of Physical Unclonable Functions,” Embedded Systems Design with FPGAs, pp. 245–267, 2012.
  • [69] K. Chandrasekar et al., “Exploiting Expendable Process-Margins in DRAMs for Run-Time Performance Optimization,” Design, Automation & Test in Europe Conference & Exhibition (DATE), 2014.