Towards Energy-Efficient and Secure Edge AI: A Cross-Layer Framework

09/20/2021
by   Muhammad Shafique, et al.
NYU college
TU Wien

Security and privacy concerns, along with the amount of data that must be processed on a regular basis, have pushed processing to the edge of computing systems. Deploying advanced Neural Networks (NN), such as deep neural networks (DNNs) and spiking neural networks (SNNs), that offer state-of-the-art results on resource-constrained edge devices is challenging due to the stringent memory and power/energy constraints. Moreover, these systems are required to maintain correct functionality under diverse security and reliability threats. This paper first discusses existing approaches to address energy efficiency, reliability, and security issues at different system layers, i.e., hardware (HW) and software (SW). Afterward, we discuss how to further improve the performance (latency) and the energy efficiency of Edge AI systems through HW/SW-level optimizations, such as pruning, quantization, and approximation. To address reliability threats (like permanent and transient faults), we highlight cost-effective mitigation techniques, like fault-aware training and mapping. Moreover, we briefly discuss effective detection and protection techniques to address security threats (like model and data corruption). Towards the end, we discuss how these techniques can be combined in an integrated cross-layer framework for realizing robust and energy-efficient Edge AI systems.

I Introduction

The NN-based AI systems have achieved state-of-the-art accuracy for various applications such as image classification, object recognition, healthcare, automotive, and robotics [46]. However, current trends show that the accuracy is improved at the cost of increasing complexity of NN models (e.g., larger model size and complex operations) [7][79]. This increased complexity hinders the deployment of advanced NNs (DNNs and SNNs) on resource-constrained edge devices [58]. Therefore, optimizations at different system layers (i.e., HW and SW) are necessary to enable the use of advanced NNs at the edge [7]. Besides performance and energy efficiency, reliability and security aspects are also important to ensure the correct functionality under diverse operating conditions (e.g., in the presence of HW faults and security threats), especially for safety-critical applications like autonomous driving and healthcare [42][89]. Therefore, the important design metrics for enabling Edge AI include performance (i.e., latency), energy efficiency, reliability, and security.

I-A Key Challenges for Energy-Efficient and Secure Edge AI

We introduce the key challenges for developing Edge AI systems in the following text (see Fig. 1 for an overview of the challenges).

Fig. 1: Overview of challenges for energy-efficient and secure Edge AI.

  • Performance: Edge AI systems are expected to have high performance to provide real-time response. However, due to the memory- and compute-intensive nature of NNs, achieving high performance is not trivial. Moreover, edge devices have limited compute and memory resources, which makes it challenging to map the full NN computations simultaneously on an accelerator fabric [77].

  • Energy Efficiency: Edge AI systems should also have high energy efficiency to ensure complete processing within a restricted energy budget, especially in the case of battery-powered devices. Therefore, the energy consumption in both the off-chip and on-chip parts should be minimized. The off-chip part includes the DRAM-based off-chip memory accesses [77], while the on-chip part includes (1) the on-chip memory accesses, and (2) the neural operations like multiply-and-accumulation (MAC) [61].

  • Reliability: Edge AI systems should produce correct outputs even in the presence of different types of reliability threats [19]. The main reliability threats are as follows.

    • Process variations are the result of imprecisions in the fabrication process, as manufacturing billions of nano-scale transistors with identical electrical properties is difficult to impossible. This causes variations in the leakage power and frequency in the same chip, across different chips in the same wafer, and even across different wafers [82].

    • Soft errors are caused by high-energy particle strikes. They manifest as bit-flips at the HW layer and can propagate all the way to the application layer, where they may cause incorrect outputs [4].

    • Aging is the gradual degradation of the processing circuits over time [87]. It occurs due to physical phenomena like Hot Carrier Injection (HCI), Time-Dependent Dielectric Breakdown (TDDB), and Negative/Positive Bias Temperature Instability (NBTI/PBTI).

  • Security: Edge AI systems should offer high resilience against security vulnerabilities such as side channels and HW intrusions [19]. Moreover, NN algorithms (e.g., DNNs) have other security vulnerabilities as well that can be exploited through data poisoning to cause confidence reduction or misclassification [90].

The above discussion highlights different possible challenges for developing energy-efficient and secure Edge AI systems. To address each challenge individually, various techniques have been proposed at different layers of the computing stack. However, systematic integration of the most effective techniques from both the hardware and software levels is important to achieve ultra-efficient and secure Edge AI.

I-B Our Contributions

In the light of the above discussion, the contributions of this paper are the following.

  • We present an overview of different challenges and state-of-the-art techniques for improving performance and energy efficiency of Edge AI systems (Section II).

  • We present an overview of different challenges and state-of-the-art techniques for reliability and security of Edge AI (Section III).

  • We present a cross-layer framework that systematically integrates the most effective techniques for improving the energy efficiency and robustness of Edge AI (Section IV).

  • We discuss the challenges and recent advances in neuromorphic computing considering SNNs (Section V).

II Performance and Energy-Efficient Edge AI

In the quest for higher accuracy, the evolution of DNNs has seen a dramatic increase in complexity with respect to model size and operations, i.e., from simple Multi-Layer Perceptron (MLP) to deep and complex networks like Convolutional Neural Networks (CNNs), Transformers, and Capsule Networks (CapsNet) [85]. Hence, the advanced DNNs require specialized hardware accelerators and optimization frameworks to enable efficient and real-time data processing at the edge. To address this, a significant amount of work has been carried out in the literature. In this section, we discuss different state-of-the-art techniques for improving performance and energy efficiency of Edge AI (see overview in Fig. 2).

II-A Optimizations for DNN Models

The edge platforms typically have limited memory and power/energy budgets, hence small-sized DNN models with a limited number of operations are desired for Edge AI applications. Model compression techniques such as pruning (i.e., structured [3][24] or unstructured [48, 59, 17]) and quantization [17, 16, 32, 57] are considered to be highly effective for reducing the memory footprint of the models as well as for reducing the number of computations required per inference. Structured pruning [3] can achieve about 4x weight memory compression, while class-blind unstructured pruning (i.e., PruNet [59]) achieves up to 190x memory compression. Quantization, when combined with pruning, can further improve the compression rate. For instance, quantization in the Deep Compression [17] improves the compression rate by about 3x for the AlexNet and the VGG-16 models. The Q-CapsNets framework [57] shows that quantization is highly effective for complex DNNs such as CapsNets as well. It reduces the memory requirement of the CapsNet [85] by 6.2x with a negligible accuracy degradation of 0.15%. Since model compression techniques may result in a sub-optimal accuracy-efficiency trade-off (due to lack of information about the underlying hardware architecture used for DNN execution), HW-aware model generation and compression techniques have emerged as a potential solution. Many Neural Architecture Search (NAS) techniques [109, 52, 74, 51, 97, 1, 60] have been proposed to generate high-accuracy and efficient models. A state-of-the-art NAS technique like the APNAS framework [1] employs an analytical model and a reinforcement learning engine to quickly find DNNs with good accuracy-efficiency trade-offs for the targeted systolic array-based HW accelerators [21][33]. It reduces the compute cycles by 53% on average with negligible accuracy degradation (avg. 3%) compared to the state-of-the-art techniques. Therefore, it is suitable for generating DNNs for resource-constrained applications. Meanwhile, the NASCaps framework [60] employs an NSGA-II algorithm to find Pareto-optimal DNN models by leveraging the trade-off between different hardware characteristics (i.e., memory, latency, and energy) of a given HW accelerator. Compared to manually designed state-of-the-art CapsNets (i.e., DeepCaps), the NASCaps achieves 79% latency reduction, 88% energy reduction, and 63% memory reduction, with only 1% accuracy reduction.
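As an illustration of how pruning and quantization shrink a model, the following minimal sketch applies magnitude-based unstructured pruning followed by symmetric uniform quantization to a short weight list. It is not the method of any cited framework; the function names, the 50% sparsity, the 4-bit width, and the weight values are all assumptions chosen for the example.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    pruned, removed = [], 0
    for w in weights:
        if abs(w) <= threshold and removed < k:
            pruned.append(0.0)
            removed += 1
        else:
            pruned.append(w)
    return pruned


def quantize_uniform(weights, bits):
    """Symmetric uniform quantization: signed integer codes plus one shared scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / qmax
    codes = [round(w / scale) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real-valued weights."""
    return [c * scale for c in codes]


# Hypothetical weights of one small layer.
weights = [0.82, -0.05, 0.40, 0.01, -0.63, 0.07, -0.02, 0.25]
pruned = magnitude_prune(weights, sparsity=0.5)
codes, scale = quantize_uniform(pruned, bits=4)
restored = dequantize(codes, scale)
```

In this sketch, half of the weights become zero (and could be stored in a sparse format), while each remaining weight is stored as a 4-bit code instead of a 32-bit float. The rounding error per weight is bounded by half of `scale`.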

Fig. 2: Overview of the techniques at different system layers for improving the performance and energy efficiency of Edge AI.

II-B Optimizations for DNN Accelerators

To efficiently run the generated DNN models on accelerator fabric, optimizations should be applied across the HW architecture, i.e., in the off-chip memory, on-chip memory, and on-chip compute engine.

The Off-chip Memory (DRAM): The main challenges arise from the fact that a full DNN model usually cannot be mapped and processed at once on the accelerator fabric due to limited on-chip memory. Therefore, redundant accesses for the same data to DRAM are required, which restricts the systems from achieving high performance and energy efficiency gains, as DRAM access latency and energy are significantly higher than other operations [96]. Toward this, previous works have proposed (1) model compression through pruning [17, 48, 59, 3, 24] and quantization [17][16, 32, 57], and (2) data partitioning and scheduling schemes [104, 105, 49, 98]. However, they do not study the impact of DRAM accesses which dominate the total system latency and energy, and do not minimize redundant accesses for overlapping data in convolutional operations. To address these limitations, several SW-level techniques have been proposed (the ROMANet [77] and DRMap [75] methodologies). Our ROMANet [77] minimizes the DRAM energy consumption through a design space exploration (DSE) that finds the most effective data partitioning and scheduling while considering redundant access optimization. It minimizes the average DRAM energy-per-access by avoiding row buffer conflicts and misses through an effective DRAM mapping, as shown in Fig. 3. Our DRMap [75] further improves the DRAM latency and energy for DNN processing considering different DRAM architectures such as the low-latency DRAM with subarray-level parallelism (i.e., SALP [41]). It employs a DSE with a generic DRAM data mapping policy that maximizes DRAM row buffer hits, bank- and subarray-level parallelism to obtain minimum energy-delay-product (EDP) of DRAM accesses for the given DRAM architectures and DNN data partitioning and scheduling (see Fig. 4).
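To make the role of row buffer hits concrete, the sketch below counts hits for two access orders over the same data in a deliberately simplified DRAM model (a single bank with one open row). It is a toy model, not the ROMANet or DRMap mapping; the tile dimensions, the row size, and the function name are assumptions.

```python
def row_buffer_hits(addresses, words_per_row):
    """Count row buffer hits in a single-bank DRAM that keeps one row open."""
    open_row, hits = None, 0
    for addr in addresses:
        row = addr // words_per_row
        if row == open_row:
            hits += 1
        else:
            open_row = row  # miss or conflict: a new row must be activated
    return hits


# Hypothetical 8x16 data tile stored row-major, one tile row per DRAM row.
HEIGHT, WIDTH, WORDS_PER_ROW = 8, 16, 16

# Order 1: traverse the tile in the same order as it is laid out in DRAM.
row_major = [y * WIDTH + x for y in range(HEIGHT) for x in range(WIDTH)]
# Order 2: traverse column by column, which changes the DRAM row on every access.
col_major = [y * WIDTH + x for x in range(WIDTH) for y in range(HEIGHT)]

hits_row_major = row_buffer_hits(row_major, WORDS_PER_ROW)
hits_col_major = row_buffer_hits(col_major, WORDS_PER_ROW)
```

Both orders read the same 128 words, but only the first one reuses the open row. Since a row activation costs more latency and energy than a hit, aligning the data mapping with the access order lowers the energy-per-access, which is the effect that DRAM-aware mapping policies exploit.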

Fig. 3: Experimental results of (a) the number of DRAM accesses, and (b) the DRAM access energy for the AlexNet. The ROMANet [77] decreases the DRAM accesses and the DRAM energy compared to the state-of-the-art (i.e., the Caffeine [105], the SmartShuttle [49], and the BWA [98]).
Fig. 4: The EDP of DRAM accesses for the AlexNet across different DRAM architectures (i.e., DDR3, SALP-1, SALP-2, and SALP-MASA) and different DRAM mapping policies, which have different orders of DRAM mapping loops. The results show that the DRMap mapping (i.e., Map 3) consistently obtains the lowest EDP [75].

The On-chip Memory (Buffer): To efficiently shuttle data between the DRAM and the on-chip fabric, specialized on-chip buffer design and access management are important. Here, the scratchpad memory (SPM) design is commonly used due to its low latency and power characteristics [77][34]. For optimizing buffer access latency and energy, several SW-level techniques have been proposed (such as ROMANet [77] and DESCNet [61]). Our ROMANet framework [77] exploits the bank-level parallelism in the buffer to minimize latency and energy of the given buffer access requests, as shown in Fig. 5. Meanwhile, our DESCNet framework [61] searches for different on-chip memory architectures to reduce the energy consumption, and performs run-time memory management to power-gate the unnecessary memory blocks for non-memory-intensive operations. These optimizations provide up to 79% energy savings for CapsNet inference.

Fig. 5: Experimental results of the buffer access latency and energy across different optimization techniques and different networks. The ROMANet [77] effectively reduces the buffer latency and energy over the state-of-the-art (i.e., the Caffeine [105], SmartShuttle [49], and BWA [98] techniques).

The Compute Engine (Computational Units): The state-of-the-art HW-level optimization techniques (e.g., approximate computing) can provide significant area, performance, and energy efficiency improvements, but at the cost of output quality degradation, which cannot be tolerated in safety-critical applications. Toward this, we proposed the concept of curable approximations in [20], which ensures minimal accuracy degradation by employing approximations in a way that approximation errors from one stage are compensated in the subsequent stage(s) of the pipeline. When used for improving the efficiency of a compute engine with cascaded processing elements (PEs), like the systolic array in the TPU [33], it reduces the Power-Delay Product (PDP) of the array by about 46% and 38% compared to the conventional and approximate systolic array designs, respectively. To efficiently employ approximations in applications that can tolerate minor quality degradation, a systematic error analysis is necessary to identify the approximation knobs and the degree to which each type of approximation can be employed. Toward this, several methodologies have been proposed to analyze the error resilience of CNNs [18] and CapsNets (i.e., ReD-CaNe [62]). By modeling the effects of approximations, it is possible to identify the optimal approximate components (e.g., adders and multipliers) that offer the best accuracy-efficiency trade-off while meeting the user-defined constraints. Compared to having accurate hardware, an efficient design that employs a layer-wise selection of approximate multipliers can achieve 28% energy reduction [62]. Furthermore, to find the configurations that offer good accuracy-energy trade-offs, the ALWANN framework [68] performs a DSE with a multi-objective NSGA-II algorithm.
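The error analysis and layer-wise selection described above can be sketched as follows. The example uses a simple truncation-based approximate multiplier (an assumed stand-in, not a component from the cited works), measures its mean relative error over all 8-bit operand pairs, and then picks the most aggressive truncation per layer that respects a tolerance. The layer names and tolerance values are hypothetical.

```python
def approx_mul(a, b, dropped_bits):
    """Multiply two unsigned ints after zeroing `dropped_bits` LSBs of each operand."""
    mask = ~((1 << dropped_bits) - 1)
    return (a & mask) * (b & mask)


def mean_relative_error(dropped_bits, operand_bits=8):
    """Exhaustively measure the mean relative error over all non-zero operand pairs."""
    total, count = 0.0, 0
    for a in range(1, 1 << operand_bits):
        for b in range(1, 1 << operand_bits):
            exact = a * b
            total += abs(exact - approx_mul(a, b, dropped_bits)) / exact
            count += 1
    return total / count


def select_per_layer(tolerances, errors):
    """Per layer, pick the largest truncation whose mean error fits the tolerance."""
    return {
        layer: max(k for k, err in errors.items() if err <= tol)
        for layer, tol in tolerances.items()
    }


# Error profile of the multiplier for 0 to 3 dropped bits (0 means exact).
errors = {k: mean_relative_error(k) for k in range(4)}

# Hypothetical per-layer error tolerances obtained from a resilience analysis.
layer_tolerance = {"conv1": 0.0, "conv2": 0.05, "fc": 0.2}
choice = select_per_layer(layer_tolerance, errors)
```

A layer with zero tolerance keeps the exact multiplier, while more resilient layers receive coarser (and, in hardware, cheaper) multipliers. A real flow would measure the accuracy of the whole network rather than the arithmetic error alone.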

Run-time Optimizations: Several run-time power management techniques can be employed to further boost the efficiency, e.g., the run-time clock gating, power gating, and dynamic voltage and frequency scaling (DVFS) techniques. For instance, the DESCNet technique [61] partitions the SPM into multiple sectors, and performs sector-level power-gating based on the characteristics of CapsNet workload to get high energy savings at run time during inference. Compared to the standard memory designs, the application-driven memory organizations equipped with memory power management unit in the DESCNet can save up to 79% energy and 47% area.
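A minimal sketch of sector-level power gating is shown below: for each operation, only as many scratchpad sectors as the operation's memory footprint requires stay powered, and the rest are gated. The operation names, footprints, sector size, and sector count are assumptions made for illustration and are not taken from DESCNet.

```python
import math


def active_sectors(required_bytes, sector_bytes, total_sectors):
    """Number of sectors that must stay powered to hold `required_bytes`."""
    needed = math.ceil(required_bytes / sector_bytes)
    if needed > total_sectors:
        raise ValueError("operation does not fit in the scratchpad")
    return needed


def gating_schedule(footprints, sector_bytes, total_sectors):
    """Map each operation to a pair (powered sectors, power-gated sectors)."""
    schedule = {}
    for op, required in footprints.items():
        on = active_sectors(required, sector_bytes, total_sectors)
        schedule[op] = (on, total_sectors - on)
    return schedule


KIB = 1024
TOTAL_SECTORS = 8

# Hypothetical on-chip memory footprints of three operations.
footprints = {"conv1": 48 * KIB, "primary_caps": 120 * KIB, "routing": 20 * KIB}
schedule = gating_schedule(footprints, sector_bytes=16 * KIB, total_sectors=TOTAL_SECTORS)

# Fraction of sector slots that are gated across the three operations.
gated_fraction = sum(off for _, off in schedule.values()) / (TOTAL_SECTORS * len(schedule))
```

The memory-intensive operation keeps all sectors on, whereas the other two leave most sectors gated. This is the kind of workload-dependent variation that a run-time power management unit can exploit.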

III Improving Reliability and Security for Edge AI

Fig. 6: Overview of challenges for reliability and security aspects, and the respective solutions on different system layers.

The Edge AI systems need to continuously produce correct outputs under diverse operating conditions. This requirement is especially important for safety-critical applications such as medical data analysis and autonomous driving. There are mainly two categories of vulnerability issues that threaten the Edge AI: (1) reliability and (2) security. In this section, we discuss the state-of-the-art techniques for improving the reliability and security of Edge AI (see an overview in Fig. 6).

III-A Reliability Threats and Mitigation Techniques

Reliability threats may come from various sources like process variation, soft errors, and aging. They can manifest as permanent faults (faults that remain in the system permanently and do not disappear), transient faults (faults that occur once and can disappear), or performance degradation (e.g., in the form of delay/timing errors). To address these threats, conventional fault-mitigation techniques for VLSI can be employed, e.g., Dual Modular Redundancy (DMR) [99], Triple Modular Redundancy (TMR) [54], and Error Correction Code (ECC) [95]. However, these techniques incur huge overheads due to redundant hardware or execution. Hence, cost-effective techniques are required to mitigate the reliability threats in the Edge AI.
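The overhead of redundancy-based mitigation becomes clear from a small sketch of the two schemes: DMR runs a computation twice and can only detect a mismatch, while TMR runs it three times and corrects a single faulty copy by majority voting. The values and function names below are assumptions for illustration.

```python
def dmr_check(a, b):
    """DMR: two copies can detect a mismatch but cannot tell which copy is wrong."""
    return a == b


def tmr_vote(a, b, c):
    """TMR: bitwise majority of three copies masks a fault in any single copy."""
    return (a & b) | (a & c) | (b & c)


golden = 0b10110010          # hypothetical fault-free 8-bit result
faulty = golden ^ (1 << 5)   # the same result with one bit flipped

mismatch_detected = not dmr_check(golden, faulty)
corrected = tmr_vote(golden, faulty, golden)
```

The correction comes at the price of executing (or instantiating) the computation three times, which is the overhead that the cost-effective techniques discussed next try to avoid.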

Permanent Faults: To mitigate permanent faults in DNN accelerators, recent works have proposed techniques like fault-aware pruning (FAP) [106] and fault-aware training (FAT) [106][40]. They aim at making DNNs resilient to the faults by incorporating the information of faults in the optimization/training process. These techniques usually require minor modifications at the hardware level (i.e., additional circuitry) to bypass/disconnect the faulty components, which results in minor run-time overheads. The key limitation of FAT is that it incurs a huge retraining cost, specifically for the cases in which retraining has to be performed for a large number of faulty chips. Moreover, FAT cannot be employed if the training dataset is not available to the user. To address these limitations, we proposed SalvageDNN [22], which mitigates permanent faults in DNN accelerators without retraining. It achieves this through a significance-driven fault-aware mapping (FAM) strategy, and shuffling of parameters at the software level to avoid additional memory operations. Techniques like FT-ClipAct [27] and Ranger [8] employ range restriction functions to block large (abnormal) activation values using pre-computed thresholds. Range restriction is realized using clipped activation functions that map out-of-range values to pre-specified values within the range that have the least impact on the output. FT-ClipAct [27] shows that such techniques can improve the accuracy of the VGG-16 by 68.92% (on average) at 10 fault rate compared to the no fault mitigation case.
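The idea of range restriction can be sketched with a clipped ReLU. In the example below, a single bit flip in the float32 encoding of an activation turns a small value into a very large one; a plain ReLU lets it through, while the clipped version suppresses it. Mapping out-of-range values to zero, the threshold of 6.0, and the helper names are assumptions of this sketch; in practice the thresholds are pre-computed per layer.

```python
import struct


def flip_bit(value, bit):
    """Flip one bit in the IEEE 754 single-precision encoding of `value`."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (out,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return out


def relu(x):
    """Standard ReLU: unbounded for positive inputs."""
    return max(0.0, x)


def clipped_relu(x, threshold):
    """ReLU that maps any value outside [0, threshold] (including NaN) to 0."""
    if not (0.0 <= x <= threshold):
        return 0.0
    return x


THRESHOLD = 6.0                   # hypothetical pre-computed activation bound
clean = 0.75
faulty = flip_bit(clean, 30)      # flip the most significant exponent bit

unprotected = relu(faulty)
protected = clipped_relu(faulty, THRESHOLD)
```

Here the flipped exponent bit inflates the activation by many orders of magnitude. The clipped activation blocks it at a cost of one comparison, while fault-free values inside the range pass unchanged.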

Transient Faults (Soft Errors): Soft error rates have been increasing in HW systems [47]. To mitigate their negative impact, several techniques have been proposed [86, 70, 28, 56, 108, 8]. Some of these techniques only cover limited faults [28] and/or incur significant overheads [86][56]. For instance, techniques in [86] employ a separate network to detect the anomaly in the output. Other state-of-the-art techniques employ online SW-level range restriction functions, like Ranger [8] that rectifies the faulty outputs of DNN operations without re-computation by restricting the value ranges.

Aging: Aging may result in timing errors, and techniques like ThUnderVolt [107] and GreenTPU [72] can be employed for mitigating the effects of timing errors that occur in the computational units of DNN accelerators. Meanwhile, aging in the on-chip memory (6T-SRAM), one of the key components in DNN accelerators, has been addressed by techniques like the fixed aging balancing [43], adaptive aging balancing [88], and additional circuitry [91][103]. However, these techniques are designed for specific data distributions and/or applications, or require additional circuitry in each SRAM cell. To address this challenge, we proposed the DNN-Life framework [23], which employs novel memory-write (and read) transducers to achieve an optimal duty-cycle at run time in each cell of the on-chip weight memory to mitigate NBTI aging.

Besides the HW-induced reliability threats (i.e., permanent faults, soft errors, and aging), other works have analyzed the resilience of DNNs against other threats (e.g., input noise). For instance, the FANNet methodology [69] analyzes the DNN noise tolerance using model checking techniques for formal analysis of DNNs under different ranges of input noise. The key idea is to investigate the impact of training bias on accuracy, and study the input node sensitivity under noise.

III-B Secure ML: Attacks and Defenses

Security threats may come from different types of attacks, such as side-channel attacks, data poisoning, and hardware intrusion. These attacks can cause confidence reduction in classification accuracy, random or targeted misclassification, and IP stealing. To systematically identify the possible security attacks and defense mechanisms for Edge AI, a threat model (which defines the capabilities and goals of the attacker under realistic assumptions) is required [19]. The attacks can be categorized based on the Edge AI design cycle, i.e., during training, HW design or implementation, and inference [19] (the overview is shown in Fig. 7).

  • Training: The attacker can manipulate the DNN model, training dataset or tools, to attack the system [73].

  • HW Implementation: The attacker can steal the DNN IP through side-channel attacks, or hardware intrusion [73].

  • Inference: The attacker can perform side-channel attacks for IP stealing, or manipulate the input data to achieve random or targeted misclassification [73].

Therefore, effective defense mechanisms are required to secure Edge AI from possible attacks. Toward this, both attacks and defenses need to be explored. In this section, we discuss different security attacks and some possible defenses (countermeasures) against these attacks.

Fig. 7: An overview of security threats (attacks) in the training, HW design/implementation, and inference phases.

Data Poisoning/Manipulation: Data poisoning aims at producing incorrect output (i.e., misclassification), and it can be performed by adding crafted noise to the DNN inputs (i.e., training or test data). Toward this, SW-level methodologies (e.g., TrISec [38], FaDec [36], and CapsAttacks [63]) have been proposed to explore the impacts of different data poisoning attacks. For instance, TrISec [38] generates imperceptible attack images as the test inputs by leveraging the backpropagation algorithm on the trained DNNs without knowledge of the training dataset. The generated attacks have close correlation and structural similarity index with the clean input, thereby making them difficult to notice in both subjective and objective tests. FaDec [36] generates imperceptible decision-based attack images as the test inputs by employing fast estimation of the classification boundary and adversarial noise optimization. It results in a fast and imperceptible attack, i.e., 16x faster than the state-of-the-art decision-based attacks. Meanwhile, CapsAttacks [63] performs analysis to study the vulnerabilities of the CapsNet by adding perturbations to the test inputs. The results show that, compared to traditional DNNs with similar width and depth, the CapsNets are more robust to affine transformations and adversarial attacks. All these works demonstrated that DNNs are vulnerable to data poisoning attacks (which can be imperceptible), so effective countermeasures are required. Previous works have proposed several SW-level defense mechanisms. One idea is to employ encryption for protecting the training data [26, 14, 13, 25]. Another idea is to employ noise filters, as the FadeML methodology [39] demonstrates that the existing adversarial attacks can be nullified using noise filters, like the Local Average with Neighborhood Pixels (LAP) and Local Average with Radius (LAR) techniques. Meanwhile, the QuSecNets methodology [37] employs quantization to eliminate the attacks in the input images. It has two quantization mechanisms, i.e., constant quantization, which quantizes the intensities of input pixels based on fixed quantization levels; and trainable quantization, which learns the quantization levels during the training phase to provide a stronger protection. This technique increases the accuracy of CNNs by 50%-96% and by 10%-50% for the perturbed images from the MNIST and the CIFAR-10, respectively.
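The intuition behind constant input quantization can be sketched as follows: when pixel intensities are snapped to a few fixed levels, a small perturbation that stays within the same quantization bin is removed before the image reaches the network. This is only an illustration of the principle, not the QuSecNets implementation; the number of levels, the pixel values, and the noise are assumptions.

```python
def quantize_pixels(pixels, levels):
    """Snap 8-bit intensities to `levels` evenly spaced values in [0, 255]."""
    step = 255 / (levels - 1)
    return [round(round(p / step) * step) for p in pixels]


# Hypothetical clean pixel intensities and a small adversarial perturbation.
clean = [0, 64, 128, 200, 255]
noise = [3, -4, 5, -2, -3]
perturbed = [min(255, max(0, p + n)) for p, n in zip(clean, noise)]

LEVELS = 4  # fixed quantization levels: 0, 85, 170, 255
clean_q = quantize_pixels(clean, LEVELS)
perturbed_q = quantize_pixels(perturbed, LEVELS)
```

In this example both inputs quantize to the same values, so the perturbation has no effect downstream. A pixel that lies close to a bin boundary can still be pushed into a neighboring level, which is one motivation for learning the levels during training.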

Side Channel Attacks: These attacks aim at extracting confidential information (e.g., for data sniffing and IP stealing) without interfering with the functionality or the operation of the devices by monitoring and manipulating the side channel parameters (e.g., timing, power, temperature, etc.). The potential countermeasures are the obfuscation techniques, which aim at concealing or obscuring the functional behavior or specific information [29]. For instance, the processing HW can be designed so that the power signals of the operation are independent of the processed data values, thereby concealing the secret information [5]. Meanwhile, to protect the devices from timing attacks, designers can (1) randomize the execution delay of different operations, or (2) enforce the same execution delay for all operations, thereby obscuring the underlying operation [5].
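The second timing countermeasure, enforcing the same execution delay regardless of the data, has a well-known software analogue that can serve as an illustration. The sketch below contrasts an early-exit comparison, whose number of steps depends on the secret, with a comparison that always inspects every byte. The step counter stands in for execution time; the secret, the guesses, and the function names are hypothetical.

```python
import hmac


def leaky_equal(secret, guess):
    """Early-exit comparison: the step count reveals the first mismatching position."""
    steps = 0
    if len(secret) != len(guess):
        return False, steps
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps
    return True, steps


def constant_time_equal(secret, guess):
    """Comparison that inspects every byte, so the step count is data-independent."""
    steps = 0
    diff = len(secret) ^ len(guess)
    for s, g in zip(secret, guess):
        steps += 1
        diff |= s ^ g
    return diff == 0, steps


secret = b"key1"
early_mismatch = b"xey1"   # differs in the first byte
late_mismatch = b"kex1"    # differs in the third byte

leaky_steps = (leaky_equal(secret, early_mismatch)[1], leaky_equal(secret, late_mismatch)[1])
const_steps = (constant_time_equal(secret, early_mismatch)[1], constant_time_equal(secret, late_mismatch)[1])

# The Python standard library offers a ready-made constant-time comparison.
library_result = hmac.compare_digest(secret, secret)
```

With the early-exit version, an attacker who can observe the step count (or timing) learns how many leading bytes of a guess are correct. With the data-independent version, both guesses take the same number of steps.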

Hardware Intrusion: HW intrusion means that the attacker inserts malware or a trojan (typically in the form of circuitry modification) in the processing HW for performing attacks such as confidence reduction and misclassification. The potential countermeasures are the typical HW security techniques, like the built-in self-test (BIST) to verify the functionality of the processing HW, the side channel analysis-based monitoring [30, 31, 102] to detect and identify anomalous side channel signals, and the formal method analysis to quickly and comprehensively analyze the behavior of the processing HW (e.g., using property checker [30], mathematical model [15], SAT solver [44], and SMT solver [35]).

IV A Cross-Layer Framework for Energy-Efficient and Robust Edge AI

To develop energy-efficient and robust Edge AI systems, different aspects related to performance and energy efficiency, reliability, and security should be collectively addressed. Toward this, we propose a cross-layer framework that combines different techniques from different layers of the computing stack for achieving energy-efficient and secure Edge AI systems (see the overview in Fig. 8). Our integrated framework employs the following steps.

Fig. 8: Overview of our cross-layer framework for energy-efficient and secure Edge AI systems.

DNN Model Creation with Secure Training: DNNs for Edge AI have to meet the design constraints (e.g., accuracy, memory, power, and energy). This can be achieved in two different ways, i.e., by employing (1) model compression through pruning [59] and quantization [57] of the pre-trained DNN model, and (2) multi-objective neural architecture search (NAS) similar to the APNAS [1] and NASCaps [60] frameworks. APNAS [1] searches for a model that has good accuracy and performance considering a systolic array-based DNN accelerator [21] through reinforcement learning. Meanwhile, NASCaps [60] optimizes the accuracy and the hardware efficiency of a given accelerator for CapsNet inference. To ensure that the generated model can be trusted, the training process should be protected from attacks. To do this, several countermeasures can be employed, e.g., by comparing the redundant trained models [50], by performing local training [53] to identify if the trained model has been attacked, or by encrypting the training dataset [13, 25, 26, 14] to remove data poisoning attacks (see ① in Fig. 8).
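The multi-objective selection in this step can be illustrated with a Pareto dominance filter, which is the core notion behind algorithms such as NSGA-II (the full algorithm additionally ranks candidates into fronts and uses crowding distance, which is omitted here). The candidate models and their objective values are hypothetical, and all objectives are minimized.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the candidates that are not dominated by any other candidate."""
    return {
        name: objectives
        for name, objectives in candidates.items()
        if not any(
            dominates(other, objectives)
            for other_name, other in candidates.items()
            if other_name != name
        )
    }


# Hypothetical candidates: (error in %, latency in ms, energy in mJ).
candidates = {
    "A": (8.0, 12.0, 30.0),
    "B": (9.0, 6.0, 14.0),
    "C": (9.5, 7.0, 15.0),   # worse than B in every objective
    "D": (12.0, 3.0, 8.0),
}
front = pareto_front(candidates)
```

Candidate C is discarded because B is better in all three objectives, while A, B, and D remain as different trade-offs between accuracy and hardware cost. The designer then picks from the front according to the constraints of the target device.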

Efficient Edge AI Design: Once a trusted model is generated, further performance and energy optimizations are performed (see ② in Fig. 8). At design time, DRAM latency and energy can be improved using techniques like ROMANet [77] and DRMap [75]. Meanwhile, the buffer latency and energy can be optimized using ROMANet [77] and DESCNet [61], and the compute latency and energy can be optimized using approximation methodologies like CANN [20], ALWANN [68], and ReD-CaNe [62]. Moreover, efficiency gains of the systems can be improved at run time using run-time power management techniques like clock gating [42], power gating [61], and DVFS [42]. Furthermore, this step should ensure that the employed techniques do not violate the design specifications, thereby providing efficient Edge AI.

Resilient Edge AI Design: To improve the resiliency of Edge AI against the reliability threats, effective mitigation techniques are required (see ③ in Fig. 8). Toward this, the characteristics of DNN resiliency under the targeted reliability threats are evaluated. Recent works have studied the DNN resiliency in the presence of approximation errors [18][62] and permanent faults [22]. Based on this information, appropriate fault mitigation techniques can be identified and deployed. At design time, several techniques can be employed, such as fault-aware training (e.g., FAP [106] and FAT [40]), range restriction (e.g., FT-ClipAct [27]), and aging-aware timing error mitigation (e.g., ThUnderVolt [107] and GreenTPU [72]). Meanwhile, the fault-aware mapping (e.g., SalvageDNN [22]), the range restriction (e.g., Ranger [8]), online error monitoring, and adaptive DVFS can be performed to improve the system's resiliency at run time. Furthermore, this step needs to ensure that the employed techniques do not lead to any violation of the design constraints, thereby resulting in a resilient and energy-efficient Edge AI system.

Secure HW Design/Implementation: Since the HW side also has vulnerability issues, the HW design/implementation process should be protected. Toward this, the existing HW security techniques can be employed (see ④ in Fig. 8). For instance, the side-channel analysis-based monitoring [30, 31, 102] can monitor the side-channel signals that attackers can exploit. Then, we can leverage the information to devise defense mechanisms that block the exploitation. Another idea is to obscure the HW information from the attacker using obfuscation techniques [83][94]. Other techniques leverage the formal method-based analysis [30][15, 44, 35] to quickly identify all possible security vulnerabilities and the corresponding defense mechanisms. To evaluate the efficacy of the applied defense mechanisms, HW testing is performed. Furthermore, this step also needs to ensure that the employed defense techniques still meet the design constraints, thereby resulting in a secure HW design.

Secure Inference: Since security attacks can also target the inference phase, secure inference is required (see step 5 in Fig. 8). Most of the attacks come in the form of data manipulation. Hence, we can perform data encryption to block the insertion of perturbations into the input data. Another idea is to mitigate input data-based attacks by employing quantization-based defenses, such as QuSecNets [37], or noise filters, as in the FAdeML methodology [39].
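
To illustrate why input quantization and noise filtering can suppress small input perturbations, the following is a generic sketch. It does not reproduce QuSecNets or FAdeML; the uniform quantizer, the 1-D median filter, and all names and parameter values are assumptions chosen for illustration.

```python
from statistics import median


def quantize_input(pixels, levels=4):
    """Map pixel values in [0, 1] onto `levels` uniformly spaced values."""
    step = 1.0 / (levels - 1)
    # Clamp to [0, 1], then snap to the nearest quantization level.
    return [round(min(max(p, 0.0), 1.0) / step) * step for p in pixels]


def median_filter_1d(signal, window=3):
    """Replace each sample by the median of its local neighbourhood."""
    half = window // 2
    filtered = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        filtered.append(median(signal[lo:hi]))
    return filtered


clean = [0.30, 0.70, 0.10]
perturbed = [0.32, 0.68, 0.12]   # small, hypothetical adversarial noise
# Both inputs fall onto the same quantization levels.
print(quantize_input(clean) == quantize_input(perturbed))

# A single outlier sample is removed by the median filter.
print(median_filter_1d([0.2, 0.2, 0.9, 0.2, 0.2]))
```

Perturbations that are smaller than half a quantization step, or that affect isolated samples, are removed before the data reaches the network. Larger or structured perturbations are not covered by this simple sketch.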

Note that all the proposed steps jointly provide an end-to-end cross-layer framework that performs HW- and SW-level optimizations at design time and run time. Our proposed framework ensures that Edge AI systems have high performance and energy efficiency while providing correct outputs under diverse reliability and security threats.

V Neuromorphic Research considering SNNs

SNNs are considered the third generation of NN models, which employ spike-encoded information and computation [55]. Due to their bio-inspired operations, SNNs have a high potential to provide energy-efficient computation. Recent works have been actively exploring two research directions: SNNs with a localized learning rule, like spike-timing-dependent plasticity (STDP) [79], and SNNs obtained from DNN conversions [67].

V-A Improving the Energy Efficiency of SNNs

To improve the energy efficiency of SNNs, several HW- and SW-level optimizations have been proposed. For HW-level techniques, SNN accelerators have been designed, such as TrueNorth [2], SpiNNaker [71], PEASE [84], Loihi [9], and ODIN [12]. Recent work (i.e., the SparkXD framework [78]) optimizes the DRAM access latency and energy for SNN inference by employing reduced-voltage DRAM operations and effective DRAM mapping, leading to DRAM energy savings of up to 40% (see Fig. 9). For SW-level techniques, the FSpiNN framework [79] improves the energy efficiency of SNN processing in training (avg. 3.5x) and inference (avg. 1.8x) through the optimization of neural operations and quantization, without accuracy loss (see Fig. 10). Q-SpiNN [80] explores different precision levels, rounding schemes, and quantization schemes (i.e., post- and in-training quantization) to maximize memory savings for both weights and neuron parameters (which occupy a considerable amount of memory in the accelerator fabric). Other techniques target mapping and running SNN applications (e.g., DVS Gesture Recognition [67] and Autonomous Cars [101]) on neuromorphic hardware (i.e., Loihi) to improve the energy efficiency of their processing compared to running them on conventional platforms (e.g., CPUs, GPUs). As shown in Fig. 11, CarSNN [101] improves the N-CARS accuracy by 2% compared to the related works, while consuming only 315 mW on the Loihi Neuromorphic Chip, thus making a step forward towards ultra-low power event-based vision for autonomous cars.
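
The interplay between precision level and rounding scheme can be made concrete with a small sketch of post-training fixed-point quantization. This is not the Q-SpiNN implementation; the function names, the two rounding schemes, and the example values are assumptions used only to show how such a design space could be explored.

```python
import math


def quantize_fixed_point(value, frac_bits, scheme="nearest"):
    """Quantize a real value to a fixed-point grid with `frac_bits` fractional bits.

    scheme="truncate": round towards minus infinity (floor).
    scheme="nearest":  round to the nearest representable level.
    """
    scale = 1 << frac_bits          # number of levels per unit interval
    scaled = value * scale
    if scheme == "truncate":
        level = math.floor(scaled)
    elif scheme == "nearest":
        level = math.floor(scaled + 0.5)
    else:
        raise ValueError("scheme must be 'truncate' or 'nearest'")
    return level / scale


def memory_saving(original_bits, quantized_bits):
    """Ratio between the original and the quantized storage size."""
    return original_bits / quantized_bits


weight = 0.30                       # hypothetical trained weight
for scheme in ("truncate", "nearest"):
    q = quantize_fixed_point(weight, frac_bits=4, scheme=scheme)
    print(scheme, q, "abs. error:", abs(weight - q))

# Going from 32-bit to 8-bit storage reduces the memory footprint by 4x.
print(memory_saving(32, 8))
```

For the same bit width, the rounding scheme changes the quantization error, which is why precision and rounding are usually explored jointly against an accuracy target.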

Fig. 9: (a) The DRAM energy in an SNN inference on MNIST incurred by an SNN with accurate DRAM (the Baseline) and a SparkXD-based SNN with approximate DRAM. (b) The speed-up achieved by the SparkXD.
Fig. 10: The FSpiNN improves the energy efficiency compared to the standard unsupervised SNN (Baseline) [10] and the SL-STDP [93] across different network sizes for both training and inference phases on the MNIST workload.
Fig. 11: The CarSNN [101], although being more energy-efficient, achieves higher accuracy for the N-CARS dataset than the related works like the HATS [92], Gabor-SNN [6], and HOTS [45] techniques.

V-B Improving the Reliability of SNNs

In recent years, the reliability of SNNs has started gaining attention because it is crucial for ensuring the functionality of SNN systems. Reliability issues may come from various sources (e.g., manufacturing defects and optimization techniques). For instance, employing reduced-voltage DRAM in SNN accelerators can offer energy savings, but at the cost of increased DRAM errors, which may alter the weight values and reduce the accuracy. To address this, the SparkXD framework [78] improves the SNN reliability (preserving high accuracy) by incorporating fault information (i.e., the fault map and fault rate) into the retraining process, i.e., fault-aware training (FAT). Furthermore, the ReSpawn framework [76] mitigates the negative impact of permanent and approximation-induced faults in the off-chip and on-chip memories of SNN HW accelerators through cost-effective fault-aware mapping (FAM). It places the weight bits with higher significance on non-faulty memory cells, which enhances the reliability of SNNs without retraining and achieves up to 70% accuracy improvement over the baseline, as shown in Fig. 12. In this manner, ReSpawn can also improve the yield and reduce the per-unit cost of SNN chips. Besides HW-induced faults, SNN systems may encounter dynamically changing environments, which cause the offline-learned knowledge to become obsolete at run time. To address this, the SpikeDyn framework [81] employs an unsupervised continual learning mechanism that leverages the internal characteristics of neural dynamics and a weight decay function to enable online learning.
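
The principle of placing high-significance weight bits on non-faulty cells can be sketched as follows. This is not the ReSpawn algorithm; it assumes a hypothetical 8-bit memory word, stuck-at-0 faulty cells known from a fault map, and a simple circular rotation as the mapping. All function names are illustrative.

```python
def best_rotation(faulty_cells, word_bits=8):
    """Pick the left-rotation that puts the least significant bits on faulty cells."""
    def cost(r):
        # With rotation r, memory cell c holds weight bit (c - r) % word_bits.
        return sum(1 << ((c - r) % word_bits) for c in faulty_cells)
    return min(range(word_bits), key=cost)


def store(weight, r, word_bits=8):
    """Rotate the weight left by r bits before writing it to memory."""
    mask = (1 << word_bits) - 1
    return ((weight << r) | (weight >> (word_bits - r))) & mask


def load(word, r, word_bits=8):
    """Rotate the memory word right by r bits to recover the weight."""
    mask = (1 << word_bits) - 1
    return ((word >> r) | (word << (word_bits - r))) & mask


def apply_faults(word, faulty_cells):
    """Model stuck-at-0 cells by clearing the corresponding bits."""
    for c in faulty_cells:
        word &= ~(1 << c)
    return word


faulty_cells = [7]                  # fault map: the MSB cell is stuck at 0
weight = 0b10110101                 # 181

unmapped = apply_faults(weight, faulty_cells)
r = best_rotation(faulty_cells)
mapped = load(apply_faults(store(weight, r), faulty_cells), r)

print("no mapping:", unmapped, "error:", abs(weight - unmapped))  # 53, error 128
print("rotation", r, ":", mapped, "error:", abs(weight - mapped))  # 180, error 1
```

The same faulty cell corrupts either the most significant or the least significant bit of the weight depending on the mapping, so the error magnitude drops without any retraining.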

Fig. 12: The ReSpawn maintains higher accuracy than the fault-aware training (FAT), across different network sizes and different fault rates in memories.

V-C Improving the Security of SNNs

Previous works have shown that SNNs are vulnerable to security attacks, like data poisoning attacks on traditional image classification datasets such as MNIST [64] and on event-based datasets [66], and that SNNs behave differently under attack compared to non-spiking DNNs. Furthermore, SNNs are also vulnerable to externally triggered bit-flip attacks. The experiments conducted in [100] show that only 4 bit-flips at the most sensitive weight memory cells are sufficient for fooling SNNs on the CIFAR10 dataset. Once these memory locations are found, the attacker can trigger the malicious hardware that generates the bit-flips by inserting a specific pattern in the input images. To address the security problem, several defense techniques have been proposed. One technique exploits the structural network parameters, e.g., threshold voltage and time window, to improve the SNN robustness [11]. By fine-tuning such parameters, SNNs can be up to 85% more robust than non-spiking DNNs. Meanwhile, the R-SNN methodology [65] employs noise filtering to remove adversarial perturbations from the DVS inputs. The experiments demonstrate that such noise filtering only slightly affects the SNN outputs for clean event sequences, while a wide range of filter parameters can increase the robustness of the SNN under attack by up to 90%.
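
As an illustration of noise filtering on event-based inputs, the sketch below keeps an event only if a neighbouring pixel was recently active. This is a generic spatio-temporal filter written under our own assumptions and is not the R-SNN implementation; the event format (x, y, timestamp), the function name, and the parameter values are hypothetical.

```python
def spatio_temporal_filter(events, time_window, radius=1):
    """Drop events that have no recent activity in their spatial neighbourhood.

    `events` is assumed to be a list of (x, y, t) tuples sorted by timestamp t.
    An event is kept if any other pixel within `radius` fired at most
    `time_window` time units before it.
    """
    last_seen = {}                  # (x, y) -> timestamp of the latest event
    kept = []
    for x, y, t in events:
        supported = False
        for dx in range(-radius, radius + 1):
            for dy in range(-radius, radius + 1):
                if dx == 0 and dy == 0:
                    continue        # the pixel itself does not count
                ts = last_seen.get((x + dx, y + dy))
                if ts is not None and t - ts <= time_window:
                    supported = True
        if supported:
            kept.append((x, y, t))
        last_seen[(x, y)] = t       # record the event even if it was dropped
    return kept


events = [
    (5, 5, 0),      # first event of a moving edge (no prior support)
    (5, 6, 10),     # supported by (5, 5)
    (20, 20, 15),   # isolated event, e.g., injected noise
    (6, 6, 18),     # supported by (5, 5) and (5, 6)
]
print(spatio_temporal_filter(events, time_window=20))
```

Isolated events, which are typical for injected noise, are removed, whereas spatially and temporally correlated events produced by real motion are mostly preserved. The filter parameters trade off the removal of perturbations against the loss of legitimate events.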

VI Conclusion

The use of Edge AI and tinyML systems is expected to grow fast in the coming years. Therefore, ensuring their high energy efficiency and robustness is important. This paper provides an overview of challenges and potential solutions for improving performance, energy efficiency, and robustness (i.e., reliability and security) of Edge AI. It shows that HW/SW co-design and co-optimization techniques at the design- and run-time can be combined through a cross-layer framework to efficiently address these challenges.

Acknowledgments

This work was partly supported by Intel Corporation through Gift funding for the project “Cost-Effective Dependability for Deep Neural Networks and Spiking Neural Networks”.

References

  • [1] P. Achararit, M. A. Hanif, R. V. W. Putra, M. Shafique, and Y. Hara-Azumi (2020) APNAS: accuracy-and-performance-aware neural architecture search for neural hardware accelerators. IEEE Access 8 (), pp. 165319–165334. External Links: Document Cited by: §II-A, §IV.
  • [2] F. Akopyan et al. (2015) Truenorth: design and tool flow of a 65 mw 1 million neuron programmable neurosynaptic chip. IEEE TCAD 34 (10), pp. 1537–1557. Cited by: §V-A.
  • [3] S. Anwar, K. Hwang, and W. Sung (2017-02) Structured pruning of deep convolutional neural networks. ACM JETC 13 (3). External Links: Document Cited by: §II-A, §II-B.
  • [4] R. C. Baumann (2005) Radiation-induced soft errors in advanced semiconductor technologies. IEEE TDMR 5 (3). Cited by: 2nd item.
  • [5] S. Bhunia and M. Tehranipoor (2019) Chapter 8 - side-channel attacks. In Hardware Security, S. Bhunia and M. Tehranipoor (Eds.), pp. 193–218. External Links: ISBN 978-0-12-812477-2, Document Cited by: §III-B.
  • [6] A.C. Bovik, M. Clark, and W.S. Geisler (1990) Multichannel texture analysis using localized spatial filters. IEEE TPAMI 12 (1). Cited by: Fig. 11.
  • [7] M. Capra, B. Bussolino, A. Marchisio, G. Masera, M. Martina, and M. Shafique (2020) Hardware and software optimizations for accelerating deep neural networks: survey of current trends, challenges, and the road ahead. IEEE Access 8 (), pp. 225134–225180. External Links: Document Cited by: §I.
  • [8] Z. Chen, G. Li, and K. Pattabiraman (2021) A low-cost fault corrector for deep neural networks through range restriction. In Proc. of DSN, Cited by: §III-A, §III-A, §IV.
  • [9] M. Davies et al. (2018) Loihi: a neuromorphic manycore processor with on-chip learning. IEEE Micro 38 (1), pp. 82–99. Cited by: §V-A.
  • [10] P. U. Diehl and M. Cook (2015) Unsupervised learning of digit recognition using spike-timing-dependent plasticity. FNCOM 9, pp. 99. Cited by: Fig. 10.
  • [11] R. El-Allami et al. (2021) Securing deep spiking neural networks against adversarial attacks through inherent structural parameters. In Proc. of DATE, Vol. , pp. 774–779. External Links: Document Cited by: §V-C.
  • [12] C. Frenkel et al. (2018) A 0.086-mm² 12.7-pj/sop 64k-synapse 256-neuron online-learning digital spiking neuromorphic processor in 28-nm cmos. IEEE TBCAS 13 (1), pp. 145–158. Cited by: §V-A.
  • [13] J. Gao, W. Wang, M. Zhang, G. Chen, H. Jagadish, G. Li, T. K. Ng, B. C. Ooi, S. Wang, and J. Zhou (2018) PANDA: facilitating usable ai development. arXiv preprint arXiv:1804.09997. Cited by: §III-B, §IV.
  • [14] F. González-Serrano, A. Amor-Martín, and J. Casamayón-Antón (2018) Supervised machine learning using encrypted training data. Int. J. of Information Security 17 (4), pp. 365–377. Cited by: §III-B, §IV.
  • [15] D. Gopinath et al. (2018) Deepsafe: a data-driven approach for assessing robustness of neural networks. In Proc. of ATVA, pp. 3–19. Cited by: §III-B, §IV.
  • [16] S. Gupta, A. Agrawal, K. Gopalakrishnan, and P. Narayanan (2015) Deep learning with limited numerical precision. In Proc. of ICML, Cited by: §II-A, §II-B.
  • [17] S. Han, H. Mao, and W. J. Dally (2015) Deep compression: compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149. Cited by: §II-A, §II-B.
  • [18] M. A. Hanif, R. Hafiz, and M. Shafique (2018) Error resilience analysis for systematically employing approximate computing in convolutional neural networks. In Proc. of DATE, Vol. , pp. 913–916. External Links: Document Cited by: §II-B, §IV.
  • [19] M. A. Hanif, F. Khalid, R. V. W. Putra, S. Rehman, and M. Shafique (2018) Robust machine learning systems: reliability and security for deep neural networks. In Proc. of IOLTS, Vol. , pp. 257–260. External Links: Document Cited by: 3rd item, 4th item, §III-B.
  • [20] M. A. Hanif, F. Khalid, and M. Shafique (2019) CANN: curable approximations for high-performance deep neural network accelerators. In Proc. of DAC, Vol. , pp. 1–6. External Links: Document Cited by: §II-B, §IV.
  • [21] M. A. Hanif, R. V. W. Putra, M. Tanvir, R. Hafiz, S. Rehman, and M. Shafique (2018) MPNA: a massively-parallel neural array accelerator with dataflow optimization for convolutional neural networks. arXiv preprint arXiv:1810.12910. Cited by: §II-A, §IV.
  • [22] M. A. Hanif and M. Shafique (2020) Salvagednn: salvaging deep neural network accelerators with permanent faults through saliency-driven fault-aware mapping. RSTA 378 (2164). Cited by: §III-A, §IV.
  • [23] M. A. Hanif and M. Shafique (2021) DNN-life: an energy-efficient aging mitigation framework for improving the lifetime of on-chip weight memories in deep neural network hardware architectures. In Proc. of DATE, Vol. , pp. 729–734. External Links: Document Cited by: §III-A.
  • [24] Y. He, J. Lin, Z. Liu, H. Wang, L. Li, and S. Han (2018) Amc: automl for model compression and acceleration on mobile devices. In Proc. of ECCV, pp. 784–800. Cited by: §II-A, §II-B.
  • [25] E. Hesamifard, H. Takabi, M. Ghasemi, and R. N. Wright (2018) Privacy-preserving machine learning as a service.. Proc. Priv. Enhancing Technol. 2018 (3). Cited by: §III-B, §IV.
  • [26] E. Hesamifard, H. Takabi, and M. Ghasemi (2017) Cryptodl: deep neural networks over encrypted data. arXiv preprint arXiv:1711.05189. Cited by: §III-B, §IV.
  • [27] L. Hoang, M. A. Hanif, and M. Shafique (2020) FT-clipact: resilience analysis of deep neural networks and improving their fault tolerance using clipped activation. In Proc. of DATE, Vol. , pp. 1241–1246. External Links: Document Cited by: §III-A, §IV.
  • [28] S. Hong et al. (2019) Terminal brain damage: exposing the graceless degradation in deep neural networks under hardware fault attacks. In Proc. of USENIX, Cited by: §III-A.
  • [29] T. Hoque, R. S. Chakraborty, and S. Bhunia (2020) Hardware obfuscation and logic locking: a tutorial introduction. IEEE Design and Test 37 (3), pp. 59–77. External Links: Document Cited by: §III-B.
  • [30] X. Huang, M. Kwiatkowska, S. Wang, and M. Wu (2017) Safety verification of deep neural networks. In Proc. of CAV, pp. 3–29. Cited by: §III-B, §IV.
  • [31] T. Hunt, C. Song, R. Shokri, V. Shmatikov, and E. Witchel (2018) Chiron: privacy-preserving machine learning as a service. arXiv preprint arXiv:1803.05961. Cited by: §III-B, §IV.
  • [32] B. Jacob et al. (2018) Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proc. of CVPR, Vol. . External Links: Document Cited by: §II-A, §II-B.
  • [33] N. P. Jouppi et al. (2017) In-datacenter performance analysis of a tensor processing unit. In Proc. of ISCA, pp. 1–12. Cited by: §II-A, §II-B.
  • [34] D. Kang, D. Kang, and S. Ha (2021) Multi-bank on-chip memory management techniques for cnn accelerators. IEEE TC. Cited by: §II-B.
  • [35] G. Katz et al. (2017) Reluplex: an efficient smt solver for verifying deep neural networks. In Proc. of CAV, pp. 97–117. Cited by: §III-B, §IV.
  • [36] F. Khalid, H. Ali, M. Abdullah Hanif, S. Rehman, R. Ahmed, and M. Shafique (2020) FaDec: a fast decision-based attack for adversarial machine learning. In Proc. of IJCNN, Vol. , pp. 1–8. External Links: Document Cited by: §III-B.
  • [37] F. Khalid, H. Ali, H. Tariq, M. A. Hanif, S. Rehman, R. Ahmed, and M. Shafique (2019) QuSecNets: quantization-based defense mechanism for securing deep neural network against adversarial attacks. In Proc. of IOLTS, Vol. , pp. 182–187. External Links: Document Cited by: §III-B, §IV.
  • [38] F. Khalid, M. A. Hanif, S. Rehman, R. Ahmed, and M. Shafique (2019) TrISec: training data-unaware imperceptible security attacks on deep neural networks. In Proc. of IOLTS, Vol. , pp. 188–193. External Links: Document Cited by: §III-B.
  • [39] F. Khalid, M. A. Hanif, S. Rehman, J. Qadir, and M. Shafique (2019) FAdeML: understanding the impact of pre-processing noise filtering on adversarial machine learning. In Proc. of DATE, Vol. , pp. 902–907. External Links: Document Cited by: §III-B, §IV.
  • [40] S. Kim et al. (2018) Energy-efficient neural network acceleration in the presence of bit-level memory errors. IEEE TCASI 65 (12), pp. 4285–4298. Cited by: §III-A, §IV.
  • [41] Y. Kim, V. Seshadri, D. Lee, J. Liu, and O. Mutlu (2012) A case for exploiting subarray-level parallelism (salp) in dram. In Proc. of ISCA, Cited by: §II-B.
  • [42] F. Kriebel, S. Rehman, M. A. Hanif, F. Khalid, and M. Shafique (2018) Robustness for smart cyber physical systems and internet-of-things: from adaptive robustness methods to reliability and security for machine learning. In Proc. of ISVLSI, Vol. , pp. 581–586. External Links: Document Cited by: §I, §IV.
  • [43] S. V. Kumar, K. Kim, and S. S. Sapatnekar (2006) Impact of nbti on sram read stability and design for reliability. In Proc. of ISQED, Cited by: §III-A.
  • [44] L. Kuper et al. (2018) Toward scalable verification for safety-critical deep networks. arXiv preprint arXiv:1801.05950. Cited by: §III-B, §IV.
  • [45] X. Lagorce et al. (2017) HOTS: a hierarchy of event-based time-surfaces for pattern recognition. IEEE TPAMI 39 (7), pp. 1346–1359. Cited by: Fig. 11.
  • [46] Y. LeCun, Y. Bengio, and G. Hinton (2015) Deep learning. Nature 521 (7553), pp. 436–444. Cited by: §I.
  • [47] G. Li et al. (2017) Understanding error propagation in deep learning neural network (dnn) accelerators and applications. In Proc. of SC, Cited by: §III-A.
  • [48] H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf (2016) Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710. Cited by: §II-A, §II-B.
  • [49] J. Li et al. (2018) SmartShuttle: optimizing off-chip memory accesses for deep learning accelerators. In Proc. of DATE, pp. 343–348. Cited by: Fig. 3, Fig. 5, §II-B.
  • [50] T. Li et al. (2018) Outsourced privacy-preserving classification service over encrypted data. J. of Network and Computer Appl. 106. Cited by: §IV.
  • [51] C. Liu, B. Zoph, M. Neumann, J. Shlens, W. Hua, L. Li, L. Fei-Fei, A. Yuille, J. Huang, and K. Murphy (2018) Progressive neural architecture search. In Proc. of ECCV, pp. 19–34. Cited by: §II-A.
  • [52] H. Liu, K. Simonyan, and Y. Yang (2018) Darts: differentiable architecture search. arXiv preprint arXiv:1806.09055. Cited by: §II-A.
  • [53] Y. Liu, J. Chen, and H. Chen (2018) Less is more: culling the training set to improve robustness of deep neural networks. In Proc. of GameSec, pp. 102–114. Cited by: §IV.
  • [54] R. E. Lyons and W. Vanderkulk (1962) The use of triple-modular redundancy to improve computer reliability. IBM J. Res. Dev. 6 (2). External Links: ISSN 0018-8646, Document Cited by: §III-A.
  • [55] W. Maass (1997) Networks of spiking neurons: the third generation of neural network models. Neural networks 10 (9), pp. 1659–1671. Cited by: §V.
  • [56] A. Mahmoud et al. (2020) HarDNN: feature map vulnerability evaluation in cnns. arXiv preprint arXiv:2002.09786. Cited by: §III-A.
  • [57] A. Marchisio, B. Bussolino, A. Colucci, M. Martina, G. Masera, and M. Shafique (2020) Q-capsnets: a specialized framework for quantizing capsule networks. In Proc. of DAC, Vol. , pp. 1–6. External Links: Document Cited by: §II-A, §II-B, §IV.
  • [58] A. Marchisio, M. A. Hanif, F. Khalid, G. Plastiras, C. Kyrkou, T. Theocharides, and M. Shafique (2019) Deep learning for edge computing: current trends, cross-layer optimizations, and open research challenges. In Proc. of ISVLSI, Vol. , pp. 553–559. External Links: Document Cited by: §I.
  • [59] A. Marchisio, M. A. Hanif, M. Martina, and M. Shafique (2018) PruNet: class-blind pruning method for deep neural networks. In Proc. of IJCNN, Vol. , pp. 1–8. External Links: Document Cited by: §II-A, §II-B, §IV.
  • [60] A. Marchisio, A. Massa, V. Mrazek, B. Bussolino, M. Martina, and M. Shafique (2020) NASCaps: a framework for neural architecture search to optimize the accuracy and hardware efficiency of convolutional capsule networks. In Proc. of ICCAD, Vol. , pp. 1–9. External Links: Document Cited by: §II-A, §IV.
  • [61] A. Marchisio, V. Mrazek, M. A. Hanif, and M. Shafique (2020) DESCNet: developing efficient scratchpad memories for capsule network hardware. IEEE TCAD (), pp. 1–1. External Links: Document Cited by: 2nd item, §II-B, §II-B, §IV.
  • [62] A. Marchisio, V. Mrazek, M. A. Hanif, and M. Shafique (2020) ReD-cane: a systematic methodology for resilience analysis and design of capsule networks under approximations. In Proc. of DATE, Vol. . External Links: Document Cited by: §II-B, §IV, §IV.
  • [63] A. Marchisio, G. Nanfa, F. Khalid, M. A. Hanif, M. Martina, and M. Shafique (2019) CapsAttacks: robust and imperceptible adversarial attacks on capsule networks. CoRR abs/1901.09878. External Links: 1901.09878 Cited by: §III-B.
  • [64] A. Marchisio, G. Nanfa, F. Khalid, M. A. Hanif, M. Martina, and M. Shafique (2020) Is spiking secure? a comparative study on the security vulnerabilities of spiking and deep neural networks. In Proc. of IJCNN, Vol. , pp. 1–8. External Links: Document Cited by: §V-C.
  • [65] A. Marchisio et al. (2021) R-snn: an analysis and design methodology for robustifying spiking neural networks against adversarial attacks through noise filters for dynamic vision sensors. In Proc. of IROS, Cited by: §V-C.
  • [66] A. Marchisio, G. Pira, M. Martina, G. Masera, and M. Shafique (2021) DVS-attacks: adversarial attacks on dynamic vision sensors for spiking neural networks. In Proc. of IJCNN, Cited by: §V-C.
  • [67] R. Massa, A. Marchisio, M. Martina, and M. Shafique (2020) An efficient spiking neural network for recognizing gestures with a dvs camera on the loihi neuromorphic processor. In Proc. of IJCNN, Vol. , pp. 1–9. External Links: Document Cited by: §V-A, §V.
  • [68] V. Mrazek, Z. Vasicek, L. Sekanina, M. A. Hanif, and M. Shafique (2019) ALWANN: automatic layer-wise approximation of deep neural network accelerators without retraining. In Proc. of ICCAD, Vol. , pp. 1–8. External Links: Document Cited by: §II-B, §IV.
  • [69] M. Naseer, M. F. Minhas, F. Khalid, M. A. Hanif, O. Hasan, and M. Shafique (2020) FANNet: formal analysis of noise tolerance, training bias and input sensitivity in neural networks. In Proc. of DATE, Vol. . External Links: Document Cited by: §III-A.
  • [70] E. Ozen and A. Orailoglu (2019) Sanity-check: boosting the reliability of safety-critical deep neural network applications. In Proc. of ATS, Vol. . External Links: Document Cited by: §III-A.
  • [71] E. Painkras et al. (2013) SpiNNaker: a 1-w 18-core system-on-chip for massively-parallel neural network simulation. IEEE JSSC 48 (8), pp. 1943–1953. Cited by: §V-A.
  • [72] P. Pandey et al. (2019) GreenTPU: improving timing error resilience of a near-threshold tensor processing unit. In Proc. of DAC, pp. 1–6. Cited by: §III-A, §IV.
  • [73] N. Papernot et al. (2016) Towards the science of security and privacy in machine learning. arXiv preprint arXiv:1611.03814. Cited by: 1st item, 2nd item, 3rd item.
  • [74] H. Pham, M. Guan, B. Zoph, Q. Le, and J. Dean (2018) Efficient neural architecture search via parameters sharing. In International Conference on Machine Learning, pp. 4095–4104. Cited by: §II-A.
  • [75] R. V. W. Putra, M. A. Hanif, and M. Shafique (2020) DRMap: a generic dram data mapping policy for energy-efficient processing of convolutional neural networks. In Proc. of DAC, Vol. , pp. 1–6. External Links: Document Cited by: Fig. 4, §II-B, §IV.
  • [76] R. V. W. Putra, M. A. Hanif, and M. Shafique (2021) ReSpawn: energy-efficient fault-tolerance for spiking neural networks considering unreliable memories. In Proc. of ICCAD, Vol. , pp. 1–8. Cited by: §V-B.
  • [77] R. V. W. Putra, M. A. Hanif, and M. Shafique (2021) ROMANet: fine-grained reuse-driven off-chip memory access management and data organization for deep neural network accelerators. IEEE TVLSI 29 (4), pp. 702–715. External Links: Document Cited by: 1st item, 2nd item, Fig. 3, Fig. 5, §II-B, §II-B, §IV.
  • [78] R. V. W. Putra, M. A. Hanif, and M. Shafique (2021) SparkXD: A framework for resilient and energy-efficient spiking neural network inference using approximate dram. In Proc. of DAC, Vol. , pp. 1–6. Cited by: §V-A, §V-B.
  • [79] R. V. W. Putra and M. Shafique (2020) FSpiNN: an optimization framework for memory-efficient and energy-efficient spiking neural networks. IEEE TCAD 39 (11), pp. 3601–3613. External Links: Document Cited by: §I, §V-A, §V.
  • [80] R. V. W. Putra and M. Shafique (2021) Q-spinn: a framework for quantizing spiking neural networks. In Proc. of IJCNN, Vol. , pp. 1–8. Cited by: §V-A.
  • [81] R. V. W. Putra and M. Shafique (2021) SpikeDyn: A framework for energy-efficient spiking neural networks with continual and unsupervised learning capabilities in dynamic environments. In Proc. of DAC, Vol. . Cited by: §V-B.
  • [82] B. Raghunathan, Y. Turakhia, S. Garg, and D. Marculescu (2013) Cherry-picking: exploiting process variations in dark-silicon homogeneous chip multi-processors. In Proc. of DATE, pp. 39–44. Cited by: 1st item.
  • [83] I. Rosenberg, G. Sicard, and E. O. David (2017) DeepAPT: nation-state apt attribution using end-to-end deep neural networks. In Proc. of ICANN, pp. 91–99. Cited by: §IV.
  • [84] A. Roy, S. Venkataramani, N. Gala, S. Sen, K. Veezhinathan, and A. Raghunathan (2017) A programmable event-driven architecture for evaluating spiking neural networks. In Proc. of ISLPED, Cited by: §V-A.
  • [85] S. Sabour, N. Frosst, and G. E. Hinton (2017) Dynamic routing between capsules. In Proc. of NIPS, pp. 3859–3869. Cited by: §II-A, §II.
  • [86] C. Schorn, A. Guntoro, and G. Ascheid (2018) Efficient on-line error detection and mitigation for deep neural network accelerators. In Proc. of SAFECOMP, pp. 205–219. Cited by: §III-A.
  • [87] M. Shafique, S. Garg, J. Henkel, and D. Marculescu (2014) The eda challenges in the dark silicon era: temperature, reliability, and variability perspectives. In Proc. of DAC, pp. 1–6. Cited by: 3rd item.
  • [88] M. Shafique, M. U. K. Khan, O. Tüfek, and J. Henkel (2015) EnAAM: energy-efficient anti-aging for on-chip video memories. In Proc. of DAC, Cited by: §III-A.
  • [89] M. Shafique, M. Naseer, T. Theocharides, C. Kyrkou, O. Mutlu, L. Orosa, and J. Choi (2020) Robust machine learning systems: challenges, current trends, perspectives, and the road ahead. IEEE Design and Test 37 (2), pp. 30–57. External Links: Document Cited by: §I.
  • [90] M. Shafique, T. Theocharides, C. Bouganis, M. A. Hanif, F. Khalid, R. Hafız, and S. Rehman (2018) An overview of next-generation architectures for machine learning: roadmap, opportunities and challenges in the iot era. In Proc. of DATE, pp. 827–832. Cited by: 4th item.
  • [91] T. Siddiqua and S. Gurumurthi (2011) Enhancing nbti recovery in sram arrays through recovery boosting. IEEE TVLSI 20 (4). Cited by: §III-A.
  • [92] A. Sironi et al. (2018) HATS: histograms of averaged time surfaces for robust event-based object classification. In Proc. of CVPR, Vol. . Cited by: Fig. 11.
  • [93] G. Srinivasan et al. (2017) Spike timing dependent plasticity based enhanced self-learning for efficient pattern recognition in spiking neural networks. In Proc. of IJCNN, Cited by: Fig. 10.
  • [94] Q. Sun et al. (2018) Natural and effective obfuscation by head inpainting. In Proc. of CVPR, pp. 5050–5059. Cited by: §IV.
  • [95] H. Y. Sze (2000-July 18) Circuit and method for rapid checking of error correction codes using cyclic redundancy check. Google Patents. Note: US Patent 6,092,231 Cited by: §III-A.
  • [96] V. Sze, Y. Chen, T. Yang, and J. S. Emer (2017) Efficient processing of deep neural networks: a tutorial and survey. Proc. of the IEEE 105 (12), pp. 2295–2329. Cited by: §II-B.
  • [97] M. Tan et al. (2019) Mnasnet: platform-aware neural architecture search for mobile. In Proc. of CVPR, pp. 2820–2828. Cited by: §II-A.
  • [98] S. Tewari, A. Kumar, and K. Paul (2020) Bus width aware off-chip memory access minimization for cnn accelerators. In Proc. of ISVLSI, Cited by: Fig. 3, Fig. 5, §II-B.
  • [99] R. Vadlamani, J. Zhao, W. Burleson, and R. Tessier (2010) Multicore soft error rate stabilization using adaptive dual modular redundancy. In Proc. of DATE, Vol. , pp. 27–32. External Links: Document Cited by: §III-A.
  • [100] V. Venceslai et al. (2020) NeuroAttack: undermining spiking neural networks security through externally triggered bit-flips. In Proc. of IJCNN, Vol. . External Links: Document Cited by: §V-C.
  • [101] A. Viale, A. Marchisio, M. Martina, G. Masera, and M. Shafique (2021) CarSNN: an efficient spiking neural network for event-based autonomous cars on the loihi neuromorphic research processor. In Proc. of IJCNN, Cited by: Fig. 11, §V-A.
  • [102] L. Wei et al. (2018) I know what you see: power side-channel attack on convolutional neural network accelerators. In Proc. of ACSAC, Cited by: §III-B, §IV.
  • [103] B. Zatt et al. (2011) A low-power memory architecture with application-aware power management for motion & disparity estimation in multiview video coding. In Proc. of ICCAD, pp. 40–47. Cited by: §III-A.
  • [104] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong (2015) Optimizing fpga-based accelerator design for deep convolutional neural networks. In Proc. of FPGA, Cited by: §II-B.
  • [105] C. Zhang, G. Sun, Z. Fang, P. Zhou, P. Pan, and J. Cong (2018) Caffeine: toward uniformed representation and acceleration for deep convolutional neural networks. IEEE TCAD 38. Cited by: Fig. 3, Fig. 5, §II-B.
  • [106] J. J. Zhang, T. Gu, K. Basu, and S. Garg (2018) Analyzing and mitigating the impact of permanent faults on a systolic array based neural network accelerator. In Proc. of VTS, pp. 1–6. Cited by: §III-A, §IV.
  • [107] J. Zhang et al. (2018) Thundervolt: enabling aggressive voltage underscaling and timing error resilience for energy efficient deep learning accelerators. In Proc. of DAC, pp. 1–6. Cited by: §III-A, §IV.
  • [108] K. Zhao et al. (2021) FT-cnn: algorithm-based fault tolerance for convolutional neural networks. IEEE TPDS 32 (7). External Links: Document Cited by: §III-A.
  • [109] B. Zoph and Q. V. Le (2016) Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578. Cited by: §II-A.