Optimal Checkpointing for Secure Intermittently-Powered IoT Devices

Zahra Ghodsi, et al. · New York University · 11/04/2017

Energy harvesting is a promising solution for powering Internet of Things (IoT) devices. Due to the intermittent nature of these energy sources, one cannot guarantee forward progress of program execution. Prior work has advocated checkpointing the intermediate state to off-chip non-volatile memory (NVM). Encrypting checkpoints addresses the security concern of an attacker reading out potentially sensitive intermediate state, but significantly increases the checkpointing overheads. In this paper, we propose a new online checkpointing policy that judiciously determines when to checkpoint so as to minimize application time to completion while guaranteeing security. Compared to state-of-the-art checkpointing schemes that do not account for the overheads of encrypted checkpoints, we improve execution time by up to 1.4x.



I Introduction

The Internet of Things (IoT) is envisioned as a network of devices that operate anytime from anywhere. IoT devices are expected to be small and to guarantee perpetual and autonomous operation, even in hard-to-reach places. Energy harvesting has been proposed as the source of energy for such IoT devices [1, 2]. Ambient energy can be harvested from many sources, including solar, wind, thermal, WiFi, radio frequency, biological and chemical [3, 4, 5]. The amount of energy harvested is highly time varying and the energy supply is intermittent. For this reason, processors with off-chip non-volatile main memory (and potentially even non-volatile on-chip caches and register files) have been proposed [1, 6] to guarantee forward progress of program execution. In these approaches, the processor saves its intermediate results in non-volatile memory when the energy supply is too low for the processor to operate.

Security is another critical concern with IoT devices. An attacker with physical access to an IoT device might be able to read out potentially sensitive data from the non-volatile memory [7]. Prior work secures data in IoT devices using non-volatile main memory, i.e., by encrypting and decrypting every write to and read from the non-volatile memory [8, 9]. However, simply encrypting main memory is not sufficient. An attacker might capture an IoT device in the middle of program execution and recover a checkpoint of architectural state. In fact, prior work has shown that unencrypted intermediate program state is an even greater security threat [10]. Thus, we propose to encrypt the intermediate checkpoints of program state to guarantee confidentiality at all times during program execution.

Fig. 1: Runtime execution progress of the FFT program running on harvested energy with (a) no encryption and (b) PRINCE [11] as the block cipher. Vertical lines represent checkpoints and arrows show roll-backs due to energy failure. Encryption increases the average energy and latency of each checkpoint. Due to this overhead, the total number of roll-backs, the time spent on checkpointing, and therefore the program execution time all increase.
Fig. 2: Optimal checkpointing of secure IoT processors: System model with a finite-capacity battery and random energy arrivals. At a checkpoint, the volatile IoT system state (Program Counter (PC), Register File (RF) and dirty cache lines) is encrypted and saved into NVM.

Encryption increases the energy and latency overheads of each checkpoint, as measured for the PRINCE [11] block cipher. Finding the optimal checkpoint placement policy when the energy source is intermittent is crucial, and becomes even more significant when checkpoints have to be encrypted for security. Naively replacing unencrypted checkpoints (Fig. 1(a)) with encrypted checkpoints (Fig. 1(b)) can significantly increase the number of roll-backs and the total program execution time. This motivates the need for a smart checkpointing policy that places checkpoints judiciously. The checkpointing policy must account for multiple factors, including the energy level of the battery, the stochastic behavior of the harvested energy (which differs from one source to another), when the previous checkpoint was taken, and the expected time to program completion. Prior checkpointing policies account for only some of these effects, for instance, periodic checkpointing [12] or checkpointing when the available energy in the battery falls below a threshold [13, 14].

We formalize the online checkpointing problem as a Markov decision process, and compute the optimal checkpointing policy offline using Q-learning [15]. Online decisions are made by looking up the optimum action for the current state in a table derived from the learned Q-table (stored in memory). Our solution: (i) is the first approach that simultaneously (and explicitly) accounts for the current battery level, the time at which the previous checkpoint was taken, and the program's forward progress in informing checkpointing decisions; (ii) accounts for the stochastic nature of the energy source, checkpointing overheads, and processor energy consumption; (iii) uses a model-free approach to solving the Markov decision process that can be trained using empirically obtained and synthetically generated traces of harvested energy; and (iv) offers computationally and energy efficient hardware solutions to make online decisions.

The rest of the paper is organized as follows: Section I-A reviews recent work in this area, Section II explains the system model and the proposed Q-learning based online checkpointing policy, and Section III describes our experimental setup and results and compares the proposed approach to prior art. We conclude in Section IV with directions for future work.

I-A Related Work

Checkpointing has been used for fault tolerance in computing systems [16, 17], allowing the processor to roll back to the last valid checkpoint and recover the processor state in case of failure. Okamura et al. [18] propose a dynamic checkpointing scheme based on Q-learning for fault-tolerance, but assume random failures unrelated to energy harvesting or power failures.

Checkpointing has also been used to guarantee forward progress in intermittently powered devices [13, 14, 12, 6, 19, 20]. Mementos [20] inserts trigger points in the software at compile time and, at run time, checkpoints when the stored energy level falls below a threshold at a trigger point. QuickRecall [14] and Hibernus [13] propose a policy based on two thresholds, one that triggers checkpoints (when the energy level falls below a low threshold) and one that determines when to start re-execution (when the energy level rises above a high threshold). A similar approach is used in [6], except that they assume access to a processor with on-chip non-volatile state. Hardware support for checkpointing that uses two counters for number of instructions and number of stores is proposed in [12]. A checkpoint is performed if either of the two counters exceeds its threshold.

From a security standpoint, several proposals exist for main memory encryption [8, 9], but they do not consider battery-operated, intermittently powered processors. Finally, Q-learning has been used in other online decision making contexts such as communications and power management [21, 22, 23].

II Online Checkpointing

We describe an online checkpointing framework for secure IoT processors starting with the system model, a mathematical formulation of the optimal checkpointing problem, and a learning based solution.

II-A System Model

Our target IoT system, shown in Fig. 2, includes an in-order processor with conventional volatile caches and registers, running on harvested energy, and an off-chip non-volatile main memory. Although the proposed techniques are agnostic to the NVM technology, in Section III we empirically evaluate an RRAM-based NVM. To secure the checkpoints, our system encrypts main memory as in prior work [8, 9]. Data blocks written to and read from main memory are encrypted and decrypted, respectively, using the lightweight PRINCE block cipher [11].

The system is powered by harvested energy, which is highly intermittent. To smooth out large temporal variations in harvested energy, the IoT device has on-board energy storage (this can range from a simple super-capacitor to a battery management IC with a built-in battery tailored for ultra-low-power energy harvesting devices, such as the TI BQ25504 [24]). To guarantee forward progress of program execution, the processor state is checkpointed in NVM, and to guarantee security, all checkpoints are encrypted before being written to the NVM. As shown shaded in Fig. 2, a checkpointed IoT system state consists of: (i) the program counter (PC), (ii) the contents of the register file (RF); and (iii) all dirty cache lines in the L1 data cache (we assume a single-level cache hierarchy). The NVM has a shadow memory to store a checkpoint.

When rolling back to the last checkpoint, the system state, consisting of both volatile and non-volatile state, must remain consistent. The volatile state is stored in and recovered from the shadow memory, and therefore remains consistent. However, Ransford et al. [25] showed that checkpointing and recovering the program state can lead to inconsistency in the NVM: all writes to NVM that happen after a checkpoint change the non-volatile state of the program. In case of an energy failure, the processor rolls back to the checkpointed state and the volatile state is recovered correctly; however, the non-volatile state has changed since the checkpoint, leaving the NVM inconsistent.

To address this issue, previous work has proposed enforcing a checkpoint between every non-idempotent memory access pair [26], or versioning the inconsistent data [27] so that every non-idempotent memory access pair is eliminated. Liu et al. [12] propose a hardware solution in which all stores are regarded as speculative and are delayed; these stores are written back to non-volatile memory only when the program reaches the next checkpoint. To keep the processing overhead low and avoid placing an excessive number of checkpoints, we simply expand the shadow memory to keep track of the memory locations that are written to between two checkpoints. For each write to the non-volatile memory, the previous value in memory is copied to the shadow memory. If the next checkpoint completes successfully, these values are discarded; otherwise they are restored along with the saved volatile state.
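A minimal software sketch of this undo-logging idea (the structure and names below are ours, for illustration only; the paper's mechanism lives in the NVM shadow memory, not software):

```python
class ShadowLog:
    """Illustrative write log kept between two checkpoints."""
    def __init__(self, nvm):
        self.nvm = nvm          # dict-like non-volatile memory: address -> value
        self.undo = {}          # previous values of locations written since the last checkpoint

    def write(self, addr, value):
        # Before overwriting an NVM location, save its previous value
        # (only the first write to an address needs to be logged).
        if addr not in self.undo:
            self.undo[addr] = self.nvm.get(addr)
        self.nvm[addr] = value

    def on_checkpoint_success(self):
        # The new checkpoint is consistent with current NVM contents: discard the log.
        self.undo.clear()

    def on_rollback(self):
        # Energy failure: restore logged values so the NVM matches the last checkpoint.
        for addr, old in self.undo.items():
            if old is None:
                self.nvm.pop(addr, None)
            else:
                self.nvm[addr] = old
        self.undo.clear()
```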

Finally, the IoT processor is provisioned with a scheduler module (Fig. 2) that determines when checkpoints are performed. The design and evaluation of a security-aware checkpoint scheduler is the primary contribution of this work.

Fig. 3: State transition diagram for actions C (checkpoint) and NC (no checkpoint), with and without an energy failure event. Each state is defined as (p, c, b), where p represents the progress made so far, c is the interval at which the last checkpoint was performed, and b is the current battery level. For each action, the edge label indicates the immediate cost incurred by that action.

II-B Problem Formulation

The checkpoint scheduler decides whether or not to checkpoint after every fixed-length block of executed instructions; each such block is referred to as an interval. To track forward progress, the scheduler maintains two run-time counters: (i) a Progress Counter (PrC), which counts the number of intervals of actual forward progress that the program has made; and (ii) a Checkpoint Counter (CC), which records when the last checkpoint was taken. Both counters are initialized to zero when the program begins. PrC is incremented every time an interval of instructions executes successfully. If a checkpoint is made, CC is updated to the current PrC. Finally, on energy failure, the processor rolls back to the last checkpointed state, which resets PrC to the current value of CC.
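The counter bookkeeping described above can be summarized in a short sketch (a minimal illustration, not the authors' hardware implementation):

```python
class ProgressTracker:
    """Illustrative bookkeeping for the Progress Counter (PrC) and Checkpoint Counter (CC)."""
    def __init__(self):
        self.prc = 0   # intervals of forward progress made so far
        self.cc = 0    # PrC value at which the last checkpoint was taken

    def interval_completed(self):
        # One interval of instructions executed successfully.
        self.prc += 1

    def checkpoint_taken(self):
        # A checkpoint captures the current progress.
        self.cc = self.prc

    def energy_failure(self):
        # Roll back: progress since the last checkpoint is lost.
        self.prc = self.cc
```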

We formulate the execution of a program with online checkpointing as an instance of a Markov decision process (MDP) consisting of a set of states S, a set of actions A, a stochastic transition function T(s, a, s′) which gives the probability of moving from state s to state s′ under action a, and an immediate cost function r(s, a).

In our formulation, the state of the Markov decision process is given by the current values of the progress counter (PrC), the checkpoint counter (CC) and the battery level. Specifically, the state at the end of each control interval is s = (p, c, b), where p is the PrC value (representing the progress made so far), c is the CC value (when the last checkpoint was performed) and b is the current battery level. The set of actions is A = {C, NC}, where action C represents taking a checkpoint while action NC represents proceeding without a checkpoint.

If the scheduler decides to checkpoint (action C), a cost corresponding to the latency of checkpointing is incurred. Assuming no energy failure during the next interval, the next state is (p+1, p, b′). On the other hand, if there is an energy failure in the next interval, the next state is (p, p, b′′). In both cases, b′ and b′′ are the new battery levels, accounting for the energy cost of checkpointing and the net energy received/consumed in the interval.

On the other hand, if the scheduler decides not to checkpoint (action NC), there is no immediate cost. Assuming no energy failure during the next interval, the next state is (p+1, c, b′). However, if there is an energy failure in the next interval, the next state is (c, c, b′′), incurring a cost proportional to the lost progress (p − c). The state transition diagram is shown in Fig. 3.

The optimal policy balances the cost of checkpointing against the cost of recovering from an energy failure by rolling back to the last checkpoint. The total cost can be defined as the time overhead of checkpointing plus the re-execution time due to roll-backs. Given the Markov decision process specification, our goal is to find an optimal policy π* which minimizes the total expected cost as defined above, i.e., the expected overhead of the checkpointing policy relative to the baseline uninterrupted execution time. The optimal policy could be obtained by solving for a fixed-point of the Markov decision process given a stochastic model of the harvested energy and system statistics, i.e., the probability of energy failure in future control intervals and a distribution over next-state battery levels. However, such a model would be hard to obtain and can be highly inaccurate. Hence, the proposed system learns an optimal policy using experimentally obtained traces of harvested energy and system statistics.

II-C Offline Learning of Checkpointing Policy

The system uses the Q-learning algorithm to find the optimal policy. This algorithm assigns a Q-value Q(s, a) to each state-action pair that, once the algorithm converges, represents the expected total cost of executing action a in state s and choosing greedy actions afterwards. The algorithm starts by initializing all Q-values arbitrarily, and iteratively updates them by simulating the system. In iteration k, the algorithm chooses the action that results in the smallest Q-value for the current state, simulates the system, and observes the next state and corresponding cost. It then updates the Q-values as follows:

Q_{k+1}(s, a) = (1 − α_k) Q_k(s, a) + α_k [ r(s, a) + γ min_{a′} Q_k(s′, a′) ]    (1)

Here, r(s, a) is the checkpointing cost if a checkpoint is taken, or the roll-back cost in the event of an energy failure if no checkpoint is taken. The parameter γ discounts future costs, and α_k determines the learning rate, i.e., how strongly the Q-values are overwritten by new estimates after each iteration. Under certain constraints on the learning rate [28], the Q-values have been proven to converge to those corresponding to the optimal policy π*. Empirically, we found that using a variable learning rate worked best in practice:

α_k(s, a) = 1 / n_k(s, a)    (2)

where n_k(s, a) is the number of times the state-action pair (s, a) has been visited. Further, we use the ε-greedy algorithm, which chooses an action at random with probability ε and follows the greedy strategy with probability 1 − ε, where 0 < ε < 1.
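A compact sketch of the offline training loop under the notation above (the environment interface, cost values and hyper-parameters below are placeholders, not the authors' exact implementation):

```python
import random
from collections import defaultdict

def train_q_table(env, num_episodes, gamma=0.95, eps_start=0.5, eps_end=0.05):
    """Offline Q-learning over simulated execution traces.
    env must provide reset() -> state and step(state, action) -> (next_state, cost, done);
    actions: 0 = no checkpoint (NC), 1 = checkpoint (C). All names are illustrative."""
    Q = defaultdict(lambda: [0.0, 0.0])       # Q[state] = [Q(s, NC), Q(s, C)]
    visits = defaultdict(lambda: [0, 0])      # visit counts for the variable learning rate

    for ep in range(num_episodes):
        # Decay epsilon so exploration is highest early in training.
        eps = eps_start + (eps_end - eps_start) * ep / max(1, num_episodes - 1)
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection (costs, so greedy = argmin).
            a = random.randrange(2) if random.random() < eps else min((0, 1), key=lambda x: Q[s][x])
            s_next, cost, done = env.step(s, a)

            visits[s][a] += 1
            alpha = 1.0 / visits[s][a]                        # variable learning rate, Eq. (2)
            target = cost + (0.0 if done else gamma * min(Q[s_next]))
            Q[s][a] = (1 - alpha) * Q[s][a] + alpha * target  # Q-value update, Eq. (1)
            s = s_next
    return Q
```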

Fig. 4: Illustration of the checkpointing policy. A mandatory checkpoint is taken at the end of every super-interval (S intervals). Within a super-interval, the checkpointing decisions are made based on the learned Q-table.

II-D Online Decision Making

After the Q-table is learned, it is used online to derive the optimal policy for any state by picking the action with the smallest Q-value. Therefore, it is sufficient to store one action bit corresponding to the optimum action for each state. For a small state-space, this approach entails small memory overhead and has the benefit of low energy consumption, since it only requires one read from memory at the end of each interval.

The state-space of the Markov decision process grows quadratically with the number of intervals and hence with the dynamic execution length of the program. This can result in long offline training times for the Q-learning to converge, as well as large storage requirements for the action bit values. We therefore use a hybrid policy, as shown in Fig. 4, that checkpoints every S intervals (the super-interval). The optimal checkpointing locations within a super-interval are determined by the proposed Q-learning approach. The hybrid approach limits the size of the Markov decision process state-space to S × S × B states, where B is the number of possible battery levels. As a result, we are able to limit not only the training time, but also the maximum amount of space required to store the Q-table in non-volatile main memory.
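The online decision logic then reduces to a table lookup plus a mandatory checkpoint at super-interval boundaries. A sketch under the same illustrative naming (not the authors' hardware scheduler):

```python
def should_checkpoint(action_bits, p, c, b, S):
    """Decide whether to checkpoint at the end of the current interval.
    p: intervals completed within the current super-interval, c: interval of the last
    checkpoint within it, b: quantized battery level, S: super-interval length.
    action_bits is a lookup table derived offline as the argmin over learned Q-values."""
    if p == S - 1:
        return True                 # mandatory checkpoint at the super-interval boundary
    return bool(action_bits[(p, c, b)])
```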

III Experimental Results

III-A System Setup

For experimental evaluation, we used the TigerMIPS processor, a 32-bit, 5-stage implementation of the MIPS ISA with 8KB L1 instruction and data caches [29]. The lightweight PRINCE block cipher is used to encrypt and decrypt data written to and read from the non-volatile main memory. In order to reduce the overhead of memory writes, TigerMIPS's write-through cache is replaced with a write-back cache. An RRAM-based non-volatile main memory is assumed, and NVSim [30] is used to derive power and performance parameters for an 8 MB non-volatile memory.

The TigerMIPS RTL was modified to incorporate checkpointing and roll-back operations. On a checkpoint, the fetch stage is stalled until all instructions in the pipeline retire. The contents of the PC, RF and dirty cache lines are then encrypted and written back to main memory. The effect of an energy failure is simulated by flushing the processor's pipeline, discarding the data in the instruction and data caches, and resetting the RF and PC. When energy is restored, the checkpointed state is read from main memory, decrypted and restored in the processor. The modified TigerMIPS processor and the PRINCE block cipher are synthesized using Cadence RTL Compiler [31] with a 45 nm technology library and a target 100 KHz clock.

Fig. 5: Generating synthetic power traces based on measured harvested power from a radio frequency source. (a) measured radio frequency power trace from [6], (b) the measured power quantized to six power levels, and (c) a synthetic power trace with transition probabilities extracted from the quantized measured trace.

For Q-learning, the discount factor γ was fixed, and the exploration rate ε (i.e., the rate at which random actions are picked instead of greedy ones) was gradually reduced so as to perform more action exploration at the beginning of training. The size of an interval is set to 500 instructions, and a super-interval contains 100 intervals (i.e., a super-interval spans 50K instructions). Thus, the Progress Counter and Checkpoint Counter are small (9 bits each).

For a super-interval containing 100 intervals and 20 battery levels, the state-space has 100 × 100 × 20 = 200,000 states. For each state, one action bit needs to be stored in memory, indicating whether a checkpoint has to be placed or not. Therefore, a memory size of 25 KB is sufficient to store the Q-table information (Table I). At run time, the action bit corresponding to the current state is read and the appropriate action is performed by the scheduler.
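As an illustrative consistency check against the Action Bit Memory entry in Table I, one action bit per state gives:

100 (PrC values) × 100 (CC values) × 20 (battery levels) = 200,000 bits = 25,000 bytes ≈ 25 KB.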

Energy Penalty Analysis: The energy penalty of storing the action bits in NVM and reading them is one read from the non-volatile memory every interval, i.e., 100 reads per super-interval. This is only a small fraction of the energy consumed by the processor in a super-interval.

We model a radio frequency based energy harvesting source as described in [6]. Unfortunately, the sample power traces provided are small, while we need large traces both for training and validation. Modeling the harvested power as a Markov chain is common in the literature [32]. Thus, we developed a first-order Markov chain model for the harvested power that transitions between six discrete power levels. To obtain the transition probabilities, we quantized the measured trace to six power levels. We then construct a transition count matrix N, where N_{ij} is the number of transitions from power level i to power level j within the trace, for i, j in {1, ..., 6}. Each row of the transition matrix keeps track of the number of transitions from one power level to all other power levels; normalizing each row therefore gives an estimate of the transition probabilities between power levels. Fig. 5(a) shows the measured power from a radio frequency source. The quantized power over the six levels is shown in Fig. 5(b). Once the transition probability matrix is obtained, we can use the Markov chain model to generate new synthetic power traces. An example of a synthetic power trace generated using this model is shown in Fig. 5(c).
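A sketch of this trace-generation procedure (the quantization levels, placeholder trace and function names below are ours, for illustration):

```python
import numpy as np

def build_transition_matrix(trace, levels):
    """Quantize a measured power trace to discrete levels and count level-to-level transitions."""
    # Map each sample to the index of the nearest power level.
    q = np.argmin(np.abs(trace[:, None] - levels[None, :]), axis=1)
    n = len(levels)
    counts = np.zeros((n, n))
    for i, j in zip(q[:-1], q[1:]):
        counts[i, j] += 1
    # Normalize each row to obtain transition probabilities (self-loop if a level is unseen).
    row_sums = counts.sum(axis=1, keepdims=True)
    probs = np.where(row_sums > 0, counts / np.maximum(row_sums, 1), np.eye(n))
    return q, probs

def synthesize_trace(probs, levels, length, start=0, rng=None):
    """Generate a synthetic power trace by walking the first-order Markov chain."""
    rng = rng or np.random.default_rng()
    states = [start]
    for _ in range(length - 1):
        states.append(rng.choice(len(levels), p=probs[states[-1]]))
    return levels[np.array(states)]

# Example with hypothetical power levels (arbitrary units) and a placeholder measured trace:
levels = np.array([0.0, 1.0, 2.0, 4.0, 8.0, 16.0])
measured = np.abs(np.random.default_rng(0).standard_normal(1000)) * 4.0
_, P = build_transition_matrix(measured, levels)
synthetic = synthesize_trace(P, levels, length=10_000)
```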

Fig. 6: In-house toolflow developed for experimental validation of the proposed approach.

We assume a battery with a capacity of 2 J (Table I), which is smaller than the battery assumed in QuickRecall [14] but larger than the capacity in [6].

Table I shows the estimated energy consumption for different components of our system, the average incoming energy, and battery parameters.

Parameter   | Value        | Parameter          | Value
Processor   | nJ/inst      | Action Bit Memory  | 25 KB
PRINCE      | nJ/8B block  | Harvested (Avg)    | 6 nJ/clock
NV Read     | pJ/4B line   | Battery Capacity   | 2 J
NV Write    | pJ/4B line   | Battery Levels     | 20
TABLE I: Experimental parameters. The processor and PRINCE cipher energy consumptions are obtained from RTL synthesis. The data for non-volatile memory is from NVSim [30].

III-B Tool Flow

Our validation tool flow is shown in Fig. 6 and is based on two components: an offline component that learns the optimal policy in Matlab and obtains the action bit table from the learned Q-table, and an online component that uses detailed RTL simulations to measure execution time under our proposed and other state-of-the-art checkpointing policies.

The offline Q-learning phase takes as input the energy harvesting traces and the average processor power consumption for each benchmark. Once the optimal policy is learned offline, the corresponding action bit table is fed to the RTL simulator along with dynamic traces of processor power consumption to estimate the program execution time under the learned policy. The power traces are generated by feeding the value change dump (VCD) files from each benchmark's execution to Cadence RTL Compiler. The power traces used for offline training and online validation are different, but are obtained from the same Markov chain model. Our results are presented on several benchmarks (CHStone [33] and FFT [34]).

Fig. 7: Converged Q-values for the checkpoint (C) and no-checkpoint (NC) actions as a function of battery level, for (a) a state in which the most recent checkpoint was taken one interval before the current interval, and (b) a state in which it was taken six intervals before.

III-C Comparison with Prior Art

The Q-learning based approach is compared with two techniques from prior art. The conservative policy is based on the work in [13, 6], where checkpointing decisions are made based only on the current battery level. The conservative policy checkpoints any time the energy level in the battery falls below a threshold equal to the energy required to checkpoint the PC, RF contents and the data cache. The processor is then turned off and restored only when the battery level exceeds another threshold, which is at least the amount of energy required to read and decrypt data from the non-volatile memory plus the energy to perform a checkpoint. The policy is conservative in that it accumulates the energy required to perform a full checkpoint in the battery before starting program execution, guaranteeing that it never incurs any roll-backs.
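For reference, the conservative baseline can be sketched as follows (thresholds are symbolic; this is our paraphrase of [13, 6], not their code):

```python
def conservative_policy(battery, low_threshold, high_threshold, running):
    """Two-threshold baseline: checkpoint and power down when the battery falls below
    low_threshold; resume only once it recovers above high_threshold."""
    if running and battery < low_threshold:
        return "checkpoint_and_sleep"
    if not running and battery >= high_threshold:
        return "restore_and_run"
    return "continue"
```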

The second technique is a periodic checkpointing policy [12]. A running instruction counter is maintained, and a checkpoint is performed any time the counter exceeds a threshold. For fairness, we pick the best threshold value averaged over all benchmarks (1000 instructions).

The learned Q-values as a function of battery level are shown in Fig. 7 for the checkpoint (C) and proceed-with-no-checkpoint (NC) actions. In Fig. 7(a) and Fig. 7(b), the previous checkpoint was taken one and six intervals before the current interval, respectively.

The Q-values represent the expected cost of an action, thus the optimal policy chooses the action with the lower Q-value. We make two observations from this figure: (i) in both cases, as the battery level decreases, the checkpointing action becomes preferred to the no-checkpointing action; (ii) however, the battery-level threshold below which checkpointing is preferred is lower when the previous checkpoint was taken recently. In other words, policies based on a static battery level threshold [13, 6] are sub-optimal.

Fig. 8: Comparison of normalized execution times: Q-learning based checkpointing policy vs. the periodic [12] and conservative [13, 6] policies. For each benchmark, runtime is normalized with respect to the baseline execution time (without energy failure) for that benchmark.
Benchmark | Periodic #CPs | Periodic Total RB Cost (s) | Q-learning #CPs | Q-learning Total RB Cost (s)
DFADD     |  12           | 0.302                      |   7             | 0.290
MIPS      |  69           | 0.767                      |  38             | 0.632
ADPCM     | 237           | 6.13                       | 139             | 5.36
GSM       |  26           | 0.629                      |  20             | 0.462
MOTION    |  58           | 0.674                      |  42             | 0.769
AES       |  69           | 2.04                       |  48             | 2.01
FFT       |  17           | 0.391                      |  11             | 0.250
TABLE II: Comparison of Q-learning based checkpointing with periodic checkpointing in terms of the number of checkpoints (CP) and roll-back (RB) cost corresponding to re-execution for various benchmarks.

The normalized execution times of the proposed Q-learning based dynamic checkpointing policy, the periodic policy and the conservative policy are compared in Fig. 8. Normalization is done with respect to the baseline execution time with no energy failure. For all the benchmarks, Q-learning results in the lowest execution time, followed by periodic and finally conservative. Specifically, Q-learning is faster on average than periodic checkpointing, and the improvements are even greater when compared against the conservative policy.

From Fig. 8 we can see that the conservative policy performs poorly compared to the other methods. Although in this method no overhead is incurred from roll-backs (and consequently lost computation), a significant amount of time is lost waiting for the battery to recharge. On the other hand, the periodic policy does not adaptively determine when to checkpoint. Table II shows the number of checkpoints placed by the periodic and Q-learning policies during runtime. We can see that the Q-learning based policy places fewer checkpoints than the periodic policy, thus incurring a smaller checkpointing overhead. The other observation is that the re-execution cost due to roll-backs is smaller for the Q-learning policy, indicating that the checkpoints are placed close to energy failure points, which reduces the re-execution cost.

The execution progress of the FFT benchmark is shown in Fig 9. This figure plots the number of (useful) instructions executed, battery level and checkpoint locations for the proposed Q-learning based checkpointing policy. Observe that checkpoints can be triggered by low battery level (CP1, CP2, CP3, CP5 and CP6), or when a long time has passed since the previous checkpoint even if the battery level is high (CP4).

Fig. 9: Progress plot for the FFT benchmark using Q-learning based checkpointing. Vertical lines represent checkpoint placements. Checkpoints are triggered either by a low battery level or when a long time has passed since the previous checkpoint.

III-D Sensitivity and Security Analysis

We explored the effects of varying the number of possible battery levels B and the super-interval size S. A Matlab simulator was developed that implements program execution with checkpointing and roll-back. The simulator estimates the running time of the program by keeping track of the available energy in the battery based on the incoming harvested energy (synthetic traces) and the energy consumed by the processor.
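A stripped-down sketch of such a simulator loop (the energy parameters and interfaces are placeholders, and failure timing is simplified relative to the full Matlab simulator):

```python
def simulate(policy, harvest_trace, total_intervals,
             e_interval, e_checkpoint, t_interval, t_checkpoint, battery_max):
    """Estimate program runtime under a checkpointing policy.
    harvest_trace[k] is the energy harvested in interval k; all parameters are illustrative."""
    battery, prc, cc, elapsed, k = battery_max / 2, 0, 0, 0.0, 0
    while prc < total_intervals:
        harvested = harvest_trace[k % len(harvest_trace)]
        k += 1
        if battery < e_interval:                       # not enough energy: roll back and wait
            prc = cc
            battery = min(battery_max, battery + harvested)
            elapsed += t_interval
            continue
        battery -= e_interval                          # execute one interval
        battery = min(battery_max, battery + harvested)
        prc += 1
        elapsed += t_interval
        if policy(prc, cc, battery) and battery >= e_checkpoint:
            battery -= e_checkpoint                    # take an encrypted checkpoint
            cc = prc
            elapsed += t_checkpoint
    return elapsed
```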

In the first experiment, we fixed the size of the super-interval and trained different models while varying B. The running time of the program when the Q-learning policy is used for checkpointing was then estimated. Fig. 10(a) shows the relative speedup with respect to the performance of the baseline model with the smallest number of battery levels. We can see that training with a finer battery-level granularity results in better performance of the Q-learning policy, and the model with the largest number of battery levels achieved a noticeable speedup in program execution time. However, the size of the state space grows linearly with the number of battery levels, which is directly proportional to the size of the memory required to store the Q-values and also results in higher training times. Note that the performance gain from increasing the number of battery levels begins to saturate, after which higher granularity does not gain much in terms of speedup.

In the second experiment, the effect of changing S was examined. We trained various models by varying S while fixing B. Fig. 10(b) shows the relative speedup of Q-learning policies with different values of S compared to the baseline model. We can see that increasing the super-interval size yields further speedup in program execution time. Note that the size of the state space, and hence the memory required to store the Q-values, grows quadratically with the size of the super-interval.

Fig. 10: Sensitivity analysis of the Q-learning checkpointing policy to (a) the number of battery levels B and (b) the size of the super-interval S. The relative speedups are shown with respect to a baseline model. With higher granularity of the battery level and a larger super-interval size, the Q-learning policy performs better and program execution time decreases.

As mentioned before, the non-volatile memory has to be encrypted at all times to prevent an attacker from reading out potentially sensitive data. However, once an attacker physically captures an IoT device, they can read out an encrypted image of main memory and checkpointed state. Then, by executing their own code on the device and observing the corresponding encrypted data, the attacker might be able to carry out a chosen-plaintext attack to recover the on-chip encryption key. Although security analysis of PRINCE suggests that such an attack is impractical (requiring an impractically large number of plaintexts [35]), a cautious defender might still wish to use a stronger block cipher like AES. However, AES encryption further increases the energy overhead of checkpointing. Our experiments on the FFT benchmark showed increases in runtime when PRINCE and AES, respectively, were used for encrypting the checkpoints, with AES incurring the larger overhead. In addition to chosen-plaintext attacks, the attacker might be able to launch a denial-of-service attack if they can tamper with the energy source. While both the original and secure schemes are susceptible to such attacks, the energy overhead of secure checkpointing might increase the vulnerability to denial of service.

IV Conclusion

In this paper, we have proposed and evaluated a novel Q-learning based online checkpointing policy for secure, intermittently powered IoT devices. Compared to the current state-of-the-art, the proposed policy is the first to take into account multiple factors, including forward progress, distance to the previous checkpoint and current battery level, to judiciously determine when to checkpoint. A detailed evaluation of the scheme compared to the state-of-the-art demonstrates up to 1.4x improvement in execution time. We also examined the effects of varying model parameters such as the number of battery levels and the super-interval size on performance, and the resulting trade-offs between performance, model complexity and memory storage requirements. Our future work involves incorporating other run-time information into the framework, and experimenting with a real energy harvester and a lightweight processor.

References

  • [1] Yongpan Liu, Zewei Li, Hehe Li, Yiqun Wang, Xueqing Li, Kaisheng Ma, Shuangchen Li, Meng-Fan Chang, Sampson John, Yuan Xie, et al. Ambient energy harvesting nonvolatile processors: from circuit to system. Design Automation Conference, pages 1–6, 2015.
  • [2] R. V. Prasad, S. Devasenapathy, V. S. Rao, and J. Vazifehdan. Reincarnation in the ambiance: Devices and networks with energy harvesting. Communications Surveys Tutorials, pages 195–213, 2014.
  • [3] Canan Dagdeviren, Byung Duk Yang, Yewang Su, Phat L. Tran, Pauline Joe, Eric Anderson, Jing Xia, Vijay Doraiswamy, Behrooz Dehdashti, Xue Feng, Bingwei Lu, Robert Poston, Zain Khalpey, Roozbeh Ghaffari, Yonggang Huang, Marvin J. Slepian, and John A. Rogers. Conformal piezoelectric energy harvesting and storage from motions of the heart, lung, and diaphragm. Proceedings of the National Academy of Sciences, pages 1927–1932, 2014.
  • [4] J. A. Paradiso and T. Starner. Energy scavenging for mobile and wireless electronics. Pervasive Computing, pages 18–27, 2005.
  • [5] M. Piñuela, P. D. Mitcheson, and S. Lucyszyn. Ambient rf energy harvesting in urban and semi-urban environments. Transactions on Microwave Theory and Techniques, pages 2715–2726, 2013.
  • [6] Kaisheng Ma, Yang Zheng, Shuangchen Li, Karthik Swaminathan, Xueqing Li, Yongpan Liu, Jack Sampson, Yuan Xie, and Vijaykrishnan Narayanan. Architecture exploration for ambient energy harvesting nonvolatile processors. International Symposium on High Performance Computer Architecture, pages 526–537, 2015.
  • [7] D. Samyde, S. Skorobogatov, R. Anderson, and J. J. Quisquater. On a new way to read data from memory. Security in Storage Workshop, pages 65–69, 2002.
  • [8] Siddhartha Chhabra and Yan Solihin. i-nvmm: a secure non-volatile main memory system with incremental encryption. International Symposium on Computer Architecture, pages 177–188, 2011.
  • [9] William Enck, Kevin Butler, Thomas Richardson, and Patrick McDaniel. Securing non-volatile main memory. Technical report, Pennsylvania State University, 2008.
  • [10] J. Alex Halderman, Seth D. Schoen, Nadia Heninger, William Clarkson, William Paul, Joseph A. Calandrino, Ariel J. Feldman, Jacob Appelbaum, and Edward W. Felten. Lest we remember: Cold-boot attacks on encryption keys. Communications, pages 91–98, 2009.
  • [11] Julia Borghoff, Anne Canteaut, Tim Güneysu, Elif Bilge Kavun, Miroslav Knezevic, Lars R Knudsen, Gregor Leander, Ventzislav Nikov, Christof Paar, Christian Rechberger, et al. Prince: a low-latency block cipher for pervasive computing applications. International Conference on the Theory and Application of Cryptology and Information Security, pages 208–225, 2012.
  • [12] Qingrui Liu and Changhee Jung. Lightweight hardware support for transparent consistency-aware checkpointing in intermittent energy-harvesting systems. Non-Volatile Memory Systems and Applications Symposium, pages 1–6, 2016.
  • [13] Domenico Balsamo, Alex S Weddell, Geoff V Merrett, Bashir M Al-Hashimi, Davide Brunelli, and Luca Benini. Hibernus: Sustaining computation during intermittent supply for energy-harvesting systems. Embedded Systems Letters, pages 15–18, 2015.
  • [14] Hrishikesh Jayakumar, Arnab Raha, and Vijay Raghunathan. Quickrecall: A low overhead hw/sw approach for enabling computations across power cycles in transiently powered computers. International Conference on VLSI Design and International Conference on Embedded Systems, pages 330–335, 2014.
  • [15] Christopher J.C.H. Watkins and Peter Dayan. Technical note: Q-learning. Machine Learning, pages 279–292, 1992.
  • [16] Philip A Bernstein, Vassos Hadzilacos, and Nathan Goodman. Concurrency Control and Recovery in Database Systems. Addison-Wesley, 1987.
  • [17] James S. Plank. An overview of checkpointing in uniprocessor and distributed systems, focusing on implementation and performance. 1997.
  • [18] Hiroyuki Okamura, Yuki Nishimura, and Tadashi Dohi. A dynamic checkpointing scheme based on reinforcement learning. Pacific Rim International Symposium on Dependable Computing, pages 151–158, 2004.
  • [19] Benjamin Ransford. Transiently Powered Computers. PhD thesis, University of Massachusetts Amherst, Jan. 2013.
  • [20] Benjamin Ransford, Jacob Sorber, and Kevin Fu. Mementos: system support for long-running computation on rfid-scale devices. Sigplan Notices, pages 159–170, 2012.
  • [21] J. Ho, D. W. Engels, and S. E. Sarma. Hiq: a hierarchical q-learning algorithm to solve the reader collision problem. International Symposium on Applications and the Internet Workshops, 2006.
  • [22] Junhong Nie and S. Haykin. A q-learning-based dynamic channel assignment technique for mobile communication systems. Transactions on Vehicular Technology, pages 1676–1687, 1999.
  • [23] Ying Tan, Wei Liu, and Qinru Qiu. Adaptive power management using reinforcement learning. International Conference on Computer-Aided Design, pages 461–467, 2009.
  • [24] Texas Instruments. BQ25504 datasheet. http://www.ti.com/product/BQ25504/datasheet.
  • [25] Benjamin Ransford and Brandon Lucia. Nonvolatile memory is a broken time machine. Proceedings of the workshop on Memory Systems Performance and Correctness, page 5, 2014.
  • [26] Mimi Xie, Mengying Zhao, Chen Pan, Jingtong Hu, Yongpan Liu, and Chun Jason Xue. Fixing the broken time machine: Consistency-aware checkpointing for energy harvesting powered non-volatile processor. In Proceedings of the 52nd Annual Design Automation Conference, page 184, 2015.
  • [27] Brandon Lucia and Benjamin Ransford. A simpler, safer programming and execution model for intermittent systems. SIGPLAN Notices, pages 575–585, 2015.
  • [28] John N. Tsitsiklis. Asynchronous stochastic approximation and q-learning. Machine Learning, pages 185–202, 1994.
  • [29] S. Moore and G. Chadwick. https://www.cl.cam.ac.uk/teaching/0910/ECAD+Arch/mips.html.
  • [30] Xiangyu Dong, Cong Xu, Yuan Xie, and Norman P Jouppi. Nvsim: A circuit-level performance, energy, and area model for emerging nonvolatile memory. Computer-Aided Design of Integrated Circuits and Systems, pages 994–1007, 2012.
  • [31] Cadence RTL Compiler. https://www.cadence.com/content/cadence-www/global/en_US/home/tools.html.
  • [32] Chin Keong Ho, Pham Dang Khoa, and Pang Chin Ming. Markovian models for harvested energy in wireless communications. Communication Systems (ICCS), 2010 IEEE International Conference on, pages 311–315, 2010.
  • [33] Yuko Hara, Hiroyuki Tomiyama, Shinya Honda, and Hiroaki Takada. Proposal and quantitative analysis of the chstone benchmark program suite for practical c-based high-level synthesis. Journal of Information Processing, pages 242–254, 2009.
  • [34] LegUp High-Level Synthesis. http://legup.eecg.utoronto.ca/.
  • [35] Shahram Rasoolzadeh and Håvard Raddum. Cryptanalysis of 6-round prince using 2 known plaintexts. IACR Cryptology ePrint Archive, page 132, 2016.