NN-based AI systems have achieved state-of-the-art accuracy in various applications such as image classification, object recognition, healthcare, automotive, and robotics. However, current trends show that this accuracy is achieved at the cost of increasing complexity of NN models (e.g., larger model sizes and more complex operations). This increased complexity hinders the deployment of advanced NNs (DNNs and SNNs) on resource-constrained edge devices. Therefore, optimizations at different system layers (i.e., HW and SW) are necessary to enable the use of advanced NNs at the edge. Besides performance and energy efficiency, reliability and security are also important to ensure correct functionality under diverse operating conditions (e.g., in the presence of HW faults and security threats), especially for safety-critical applications like autonomous driving and healthcare. Therefore, the key design metrics for enabling Edge AI include performance (i.e., latency), energy efficiency, reliability, and security.
I-A Key Challenges for Energy-Efficient and Secure Edge AI
We introduce the key challenges for developing Edge AI systems in the following text (see Fig. 1 for an overview of the challenges).
Performance: Edge AI systems are expected to deliver high performance to provide real-time responses. However, due to the memory- and compute-intensive nature of NNs, achieving high performance is not trivial. Moreover, edge devices have limited compute and memory resources, which makes it challenging to map the full NN computation onto an accelerator fabric at once.
Energy Efficiency: Edge AI systems should also have high energy efficiency to ensure complete processing within a restricted energy budget, especially for battery-powered devices. Therefore, energy consumption in both the off-chip and on-chip parts should be minimized. The off-chip part comprises the DRAM-based off-chip memory accesses, while the on-chip part comprises (1) the on-chip memory accesses and (2) the neural operations like multiply-and-accumulate (MAC).
Reliability: Edge AI systems should produce correct outputs even in the presence of different types of reliability threats. The main reliability threats are as follows.
Process variations result from imprecisions in the fabrication process, as manufacturing billions of nano-scale transistors with identical electrical properties is difficult, if not impossible. They cause variations in leakage power and frequency within the same chip, across different chips in the same wafer, and even across different wafers.
Soft errors are caused by high-energy particle strikes. They manifest as bit-flips at the HW layer and can propagate all the way to the application layer, where they may cause incorrect outputs.
Aging is the gradual degradation of processing circuits over time. It occurs due to physical phenomena like Hot Carrier Injection (HCI), Time-Dependent Dielectric Breakdown (TDDB), and Negative/Positive Bias Temperature Instability (NBTI/PBTI).
Security: Edge AI systems should offer high resilience against security vulnerabilities such as side channels and HW intrusions. Moreover, NN algorithms (e.g., DNNs) have further security vulnerabilities that can be exploited through data poisoning to cause confidence reduction or misclassification.
The above discussion highlights different possible challenges for developing energy-efficient and secure Edge AI systems. To address each challenge individually, various techniques have been proposed at different layers of the computing stack. However, systematic integration of the most effective techniques from both the hardware and software levels is important to achieve ultra-efficient and secure Edge AI.
I-B Our Contributions
In light of the above discussion, the contributions of this paper are the following.
We present an overview of different challenges and state-of-the-art techniques for improving performance and energy efficiency of Edge AI systems (Section II).
We present an overview of different challenges and state-of-the-art techniques for reliability and security of Edge AI (Section III).
We present a cross-layer framework that systematically integrates the most effective techniques for improving the energy efficiency and robustness of Edge AI (Section IV).
We discuss the challenges and recent advances in neuromorphic computing considering SNNs (Section V).
II Performance and Energy-Efficient Edge AI
In the quest for higher accuracy, the evolution of DNNs has seen a dramatic increase in complexity with respect to model size and operations, i.e., from simple Multi-Layer Perceptrons (MLPs) to deep and complex networks like Convolutional Neural Networks (CNNs), Transformers, and Capsule Networks (CapsNets). Hence, advanced DNNs require specialized hardware accelerators and optimization frameworks to enable efficient and real-time data processing at the edge. To address this, a significant amount of work has been carried out in the literature. In this section, we discuss different state-of-the-art techniques for improving the performance and energy efficiency of Edge AI (see an overview in Fig. 2).
II-A Optimizations for DNN Models
Edge platforms typically have limited memory and power/energy budgets; hence, small DNN models with a limited number of operations are desired for Edge AI applications. Model compression techniques such as pruning (structured or unstructured [48, 59, 17]) and quantization [17, 16, 32, 57] are considered highly effective for reducing both the memory footprint of the models and the number of computations required per inference. Structured pruning can achieve about 4x weight memory compression, while class-blind unstructured pruning (i.e., PruNet) achieves up to 190x memory compression. Quantization, when combined with pruning, can further improve the compression rate. For instance, quantization in Deep Compression improves the compression rate by about 3x for the AlexNet and VGG-16 models. The Q-CapsNets framework shows that quantization is also highly effective for complex DNNs such as CapsNets: it reduces the memory requirement of the CapsNet by 6.2x with a negligible accuracy degradation of 0.15%. Since model compression techniques may result in a sub-optimal accuracy-efficiency trade-off (due to a lack of information about the underlying hardware architecture used for DNN execution), HW-aware model generation and compression techniques have emerged as a potential solution. Many Neural Architecture Search (NAS) techniques [109, 52, 74, 51, 97, 1, 60] have been proposed to generate accurate and efficient models. State-of-the-art NAS like the APNAS framework employs an analytical model and a reinforcement learning engine to quickly find DNNs with good accuracy-efficiency trade-offs for the targeted systolic array-based HW accelerators. It reduces the compute cycles by 53% on average with a small accuracy degradation (avg. 3%) compared to state-of-the-art techniques, making it suitable for generating DNNs for resource-constrained applications. Meanwhile, the NASCaps framework employs the NSGA-II algorithm to find Pareto-optimal DNN models by leveraging the trade-offs between different hardware characteristics (i.e., memory, latency, and energy) of a given HW accelerator. Compared to manually designed state-of-the-art CapsNets (i.e., DeepCaps), NASCaps achieves 79% latency reduction, 88% energy reduction, and 63% memory reduction, with only a 1% accuracy loss.
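As a concrete illustration of these two compression knobs, the sketch below applies magnitude-based unstructured pruning followed by symmetric uniform quantization to a random weight matrix. The sparsity level, bit-width, and quantization scheme are illustrative assumptions, not the exact settings of PruNet or Deep Compression.

```python
import numpy as np

def prune_magnitude(weights, sparsity):
    """Unstructured magnitude pruning: zero out the smallest-magnitude weights."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def quantize_uniform(weights, n_bits):
    """Symmetric uniform quantization: round each weight to the nearest level."""
    max_abs = np.max(np.abs(weights))
    if max_abs == 0.0:
        return weights.copy()
    scale = max_abs / (2 ** (n_bits - 1) - 1)
    return np.round(weights / scale) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
w_pruned = prune_magnitude(w, sparsity=0.9)       # keep only ~10% of the weights
w_compressed = quantize_uniform(w_pruned, n_bits=8)
print(f"zero fraction: {np.mean(w_compressed == 0):.3f}")  # ~0.9
```

The zeroed weights can then be stored in a sparse format, and the surviving values need only 8 bits each, which is where the memory compression reported above comes from.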
II-B Optimizations for DNN Accelerators
To efficiently run the generated DNN models on the accelerator fabric, optimizations should be applied across the HW architecture, i.e., in the off-chip memory, the on-chip memory, and the on-chip compute engine.
The Off-chip Memory (DRAM): The main challenge arises from the fact that a full DNN model usually cannot be mapped and processed at once on the accelerator fabric due to the limited on-chip memory. Therefore, redundant DRAM accesses for the same data are required, which prevents the system from achieving high performance and energy efficiency, as DRAM access latency and energy are significantly higher than those of other operations. Toward this, previous works have proposed (1) model compression through pruning [17, 48, 59, 3, 24] and quantization [16, 32, 57], and (2) data partitioning and scheduling schemes [104, 105, 49, 98]. However, they do not study the impact of DRAM accesses, which dominate the total system latency and energy, and do not minimize redundant accesses for overlapping data in convolutional operations. To address these limitations, several SW-level techniques have been proposed (the ROMANet and DRMap methodologies). Our ROMANet minimizes the DRAM energy consumption through a design space exploration (DSE) that finds the most effective data partitioning and scheduling while considering redundant access optimization. It minimizes the average DRAM energy-per-access by avoiding row buffer conflicts and misses through an effective DRAM mapping, as shown in Fig. 3. Our DRMap further improves the DRAM latency and energy for DNN processing considering different DRAM architectures, such as the low-latency DRAM with subarray-level parallelism (i.e., SALP). It employs a DSE with a generic DRAM data mapping policy that maximizes DRAM row buffer hits as well as bank- and subarray-level parallelism to obtain the minimum energy-delay product (EDP) of DRAM accesses for the given DRAM architecture and DNN data partitioning and scheduling (see Fig. 4).
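To see why data partitioning and scheduling matter, the toy model below counts off-chip traffic for a convolutional layer when the output channels are processed in tiles: each extra pass re-reads the input feature map from DRAM. The layer shape and schedule are assumptions for illustration, not ROMANet's actual cost model.

```python
# Toy DRAM-traffic model for one conv layer (shapes and schedule are assumptions).
H, W_, C_in, C_out, K = 56, 56, 64, 128, 3   # ifmap height/width, channels, kernel

ifmap_elems  = H * W_ * C_in                 # input feature map, read once per pass
weight_elems = K * K * C_in * C_out          # weights, read exactly once overall

def dram_traffic(c_out_tile):
    """Schedule: compute c_out_tile output channels per pass over the ifmap.
    Every extra pass re-reads the whole ifmap from DRAM (redundant accesses)."""
    passes = -(-C_out // c_out_tile)         # ceil(C_out / c_out_tile)
    return passes * ifmap_elems + weight_elems

# Larger tiles -> fewer passes -> fewer redundant ifmap reads.
for tile in (8, 32, 128):
    print(f"tile={tile:3d}  DRAM elements={dram_traffic(tile)}")
```

A DSE such as ROMANet's searches over exactly this kind of trade-off, additionally accounting for on-chip buffer capacity and per-access energy (row buffer hits vs. conflicts), which this sketch omits.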
The On-chip Memory (Buffer): To efficiently shuttle data between the DRAM and the on-chip fabric, specialized on-chip buffer design and access management are important. Here, a scratchpad memory (SPM) design is commonly used due to its low latency and power. To optimize buffer access latency and energy, several SW-level techniques have been proposed (such as ROMANet and DESCNet). Our ROMANet framework exploits the bank-level parallelism in the buffer to minimize the latency and energy of the given buffer access requests, as shown in Fig. 5. Meanwhile, our DESCNet framework searches across different on-chip memory architectures to reduce the energy consumption, and performs run-time memory management to power-gate unnecessary memory blocks during non-memory-intensive operations. These optimizations provide up to 79% energy savings for CapsNet inference.
The Compute Engine (Computational Units): State-of-the-art HW-level optimization techniques (e.g., approximate computing) can provide significant area, performance, and energy efficiency improvements, but at the cost of output quality degradation, which cannot be tolerated in safety-critical applications. Toward this, we proposed the concept of curable approximations, which ensures minimal accuracy degradation by employing approximations in such a way that approximation errors from one stage are compensated in the subsequent stage(s) of the pipeline. When used to improve the efficiency of a compute engine with cascaded processing elements (PEs), like the systolic array in the TPU, it reduces the Power-Delay Product (PDP) of the array by about 46% and 38% compared to the conventional and approximate systolic array designs, respectively. To efficiently employ approximations in applications that can tolerate minor quality degradation, a systematic error analysis is necessary to identify the approximation knobs and the degree to which each type of approximation can be employed. Toward this, several methodologies have been proposed to analyze the error resilience of CNNs and CapsNets (i.e., ReD-CaNe). By modeling the effects of approximations, it is possible to identify the approximate components (e.g., adders and multipliers) that offer the best accuracy-efficiency trade-off while meeting the user-defined constraints. Compared to fully accurate hardware, an efficient design that employs a layer-wise selection of approximate multipliers can achieve 28% energy reduction. Furthermore, to find configurations that offer good accuracy-energy trade-offs, the ALWANN framework performs a DSE with the multi-objective NSGA-II algorithm.
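A minimal version of such an error-resilience analysis can be sketched by modeling an approximate multiplier as the exact product perturbed by a bounded relative error, then measuring how the error propagates through a dot product. The error model and its bounds are assumptions for illustration, not the behavior of any specific approximate circuit.

```python
import numpy as np

rng = np.random.default_rng(0)

def approx_mul(a, b, rel_err):
    """Model an approximate multiplier: the exact product perturbed by a bounded
    random relative error (an illustrative stand-in for a real approximate circuit)."""
    noise = rng.uniform(-rel_err, rel_err, size=a.shape)
    return a * b * (1.0 + noise)

def dot_exact(w, x):
    return float(np.sum(w * x))

def dot_approx(w, x, rel_err):
    return float(np.sum(approx_mul(w, x, rel_err)))

w = rng.standard_normal(1024)   # weights of one neuron (illustrative)
x = rng.standard_normal(1024)   # input activations
for e in (0.01, 0.05, 0.2):
    dev = abs(dot_approx(w, x, e) - dot_exact(w, x))
    print(f"multiplier relative-error bound {e}: output deviation {dev:.4f}")
```

Repeating this per layer with different error bounds identifies which layers tolerate aggressive approximation, which is the kind of information a layer-wise multiplier selection or an NSGA-II DSE consumes.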
Run-time Optimizations: Several run-time power management techniques can be employed to further boost efficiency, e.g., clock gating, power gating, and dynamic voltage and frequency scaling (DVFS). For instance, the DESCNet technique partitions the SPM into multiple sectors and performs sector-level power gating based on the characteristics of the CapsNet workload to achieve high energy savings at run time during inference. Compared to standard memory designs, the application-driven memory organizations equipped with a memory power management unit in DESCNet save up to 79% energy and 47% area.
III Improving Reliability and Security for Edge AI
Edge AI systems need to continuously produce correct outputs under diverse operating conditions. This requirement is especially important for safety-critical applications such as medical data analysis and autonomous driving. There are two main categories of vulnerabilities that threaten Edge AI: (1) reliability and (2) security. In this section, we discuss the state-of-the-art techniques for improving the reliability and security of Edge AI (see an overview in Fig. 6).
III-A Reliability Threats and Mitigation Techniques
Reliability threats may come from various sources like process variations, soft errors, and aging. They can manifest as permanent faults (faults that remain in the system and do not disappear), transient faults (faults that occur once and then disappear), or performance degradation (e.g., in the form of delay/timing errors). To address these threats, conventional fault-mitigation techniques for VLSI can be employed, e.g., Dual Modular Redundancy (DMR), Triple Modular Redundancy (TMR), and Error Correction Codes (ECC). However, these techniques incur huge overheads due to redundant hardware or execution. Hence, cost-effective techniques are required to mitigate the reliability threats in Edge AI.
Permanent Faults: To mitigate permanent faults in DNN accelerators, recent works have proposed techniques like fault-aware pruning (FAP) and fault-aware training (FAT). They aim at making DNNs resilient to faults by incorporating the fault information in the optimization/training process. These techniques usually require minor modifications at the hardware level (i.e., additional circuitry) to bypass/disconnect the faulty components, which results in minor run-time overheads. The key limitation of FAT is that it incurs a huge retraining cost, specifically when retraining has to be performed for a large number of faulty chips. Moreover, FAT cannot be employed if the training dataset is not available to the user. To address these limitations, we proposed SalvageDNN, which mitigates permanent faults in DNN accelerators without retraining. It achieves this through a significance-driven fault-aware mapping (FAM) strategy and shuffling of parameters at the software level to avoid additional memory operations. Techniques like FT-ClipAct and Ranger employ range restriction functions to block large (abnormal) activation values using pre-computed thresholds. Range restriction is realized using clipped activation functions that map out-of-range values to pre-specified in-range values that have the least impact on the output. FT-ClipAct shows that such techniques can improve the accuracy of the VGG-16 by 68.92% (on average) at 10 fault rate compared to the case without fault mitigation.
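A range-restricted activation of this kind can be sketched as follows. Mapping out-of-range values to zero and the specific threshold value are assumptions for illustration; FT-ClipAct derives its per-layer thresholds from offline profiling.

```python
import numpy as np

def clipped_relu(x, t):
    """Range-restricted ReLU: only activations inside the profiled range (0, t]
    pass through; everything else (including fault-induced huge values) maps
    to 0. The threshold t would be profiled offline; here it is assumed."""
    return np.where((x > 0) & (x <= t), x, 0.0)

acts = np.array([0.3, 1.2, -0.5, 4.1])
faulty = acts.copy()
faulty[3] = 3.2e38           # a bit-flip in the exponent yields a huge activation
print(clipped_relu(faulty, t=6.0))   # the corrupted value is suppressed to 0
```

Because a single high-order bit-flip can turn a small activation into an astronomically large one, bounding the activation range is what prevents one faulty value from dominating the layer's output.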
Transient Faults (Soft Errors): Soft error rates have been increasing in HW systems. To mitigate their negative impact, several techniques have been proposed [86, 70, 28, 56, 108, 8]. Some of these techniques cover only a limited set of faults and/or incur significant overheads; for instance, some employ a separate network to detect anomalies in the output. Other state-of-the-art techniques employ online SW-level range restriction functions, like Ranger, which rectifies the faulty outputs of DNN operations without re-computation by restricting the value ranges.
Aging: Aging may result in timing errors, and techniques like ThUnderVolt and GreenTPU can be employed to mitigate the effects of timing errors that occur in the computational units of DNN accelerators. Meanwhile, aging in the on-chip memory (6T-SRAM), one of the key components of DNN accelerators, has been addressed by techniques like fixed aging balancing, adaptive aging balancing, and additional circuitry. However, these techniques are designed for specific data distributions and/or applications, or require additional circuitry in each SRAM cell. To address this challenge, we proposed the DNN-Life framework, which employs novel memory-write (and read) transducers to achieve an optimal duty cycle at run time in each cell of the on-chip weight memory to mitigate NBTI aging.
Besides the HW-induced reliability threats (i.e., permanent faults, soft errors, and aging), other works have analyzed the resilience of DNNs against further threats (e.g., input noise). For instance, the FANNet methodology analyzes the DNN noise tolerance using model checking techniques for the formal analysis of DNNs under different ranges of input noise. The key idea is to investigate the impact of training bias on accuracy and to study the input node sensitivity under noise.
III-B Secure ML: Attacks and Defenses
Security threats may come from different types of attacks, such as side-channel attacks, data poisoning, and hardware intrusion. These attacks can cause confidence reduction in the classification, random or targeted misclassification, and IP stealing. To systematically identify the possible security attacks and defense mechanisms for Edge AI, a threat model (which defines the capabilities and goals of the attacker under realistic assumptions) is required. The attacks can be categorized based on the Edge AI design cycle, i.e., during training, HW design or implementation, and inference (an overview is shown in Fig. 7).
Training: The attacker can manipulate the DNN model, the training dataset, or the tools to attack the system.
HW Implementation: The attacker can steal the DNN IP through side-channel attacks or hardware intrusion.
Inference: The attacker can perform side-channel attacks for IP stealing, or manipulate the input data to achieve random or targeted misclassification.
Therefore, effective defense mechanisms are required to secure Edge AI from possible attacks. Toward this, both attacks and defenses need to be explored. In this section, we discuss different security attacks and some possible defenses (countermeasures) against these attacks.
Data Poisoning/Manipulation: Data poisoning aims at producing incorrect outputs (i.e., misclassification) and can be performed by adding crafted noise to the DNN inputs (i.e., training or test data). Toward this, SW-level methodologies (e.g., TrISec, FaDec, and CapsAttacks) have been proposed to explore the impacts of different data poisoning attacks. For instance, TrISec generates imperceptible attack images as test inputs by leveraging the backpropagation algorithm on trained DNNs without knowledge of the training dataset. The generated attacks have high correlation and structural similarity with the clean input, making them difficult to notice in both subjective and objective tests. FaDec generates imperceptible decision-based attack images as test inputs by employing a fast estimation of the classification boundary and adversarial noise optimization. It results in a fast and imperceptible attack, i.e., 16x faster than state-of-the-art decision-based attacks. Meanwhile, CapsAttacks studies the vulnerabilities of the CapsNet by adding perturbations to the test inputs. The results show that, compared to traditional DNNs of similar width and depth, CapsNets are more robust to affine transformations and adversarial attacks. All these works demonstrate that DNNs are vulnerable to data poisoning attacks (which can be imperceptible); therefore, effective countermeasures are required. Previous works have proposed several SW-level defense mechanisms. One idea is to employ encryption for protecting the training data [26, 14, 13, 25]. Another idea is to employ noise filters, as the FadeML methodology demonstrates that existing adversarial attacks can be nullified using noise filters like the Local Average with Neighborhood Pixels (LAP) and Local Average with Radius (LAR) techniques. Meanwhile, the QuSecNets methodology employs quantization to eliminate the attacks in the input images. It has two quantization mechanisms: constant quantization, which quantizes the intensities of input pixels based on fixed quantization levels; and trainable quantization, which learns the quantization levels during the training phase to provide stronger protection. This technique increases the accuracy of CNNs by 50%-96% and by 10%-50% for perturbed images from MNIST and CIFAR-10, respectively.
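The constant-quantization idea can be sketched in a few lines: snapping each pixel to a small set of fixed intensity levels wipes out low-amplitude adversarial perturbations. The number of levels and the example pixel values are illustrative assumptions, not QuSecNets' exact settings.

```python
import numpy as np

def constant_quantize(img, levels):
    """Constant input quantization (sketch): snap each pixel in [0, 1] to one
    of `levels` fixed intensities, removing low-amplitude perturbations."""
    return np.round(img * (levels - 1)) / (levels - 1)

clean     = np.array([0.10, 0.48, 0.52, 0.90])
perturbed = clean + np.array([0.03, -0.02, 0.02, -0.03])   # small adversarial noise
print(constant_quantize(clean, levels=4))
print(constant_quantize(perturbed, levels=4))   # identical to the clean version
```

As long as the perturbation is smaller than half a quantization step, the perturbed pixel snaps back to the same level as the clean one, so the classifier sees an unperturbed input.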
Side-Channel Attacks: These attacks aim at extracting confidential information (e.g., for data sniffing and IP stealing) without interfering with the functionality or operation of the devices, by monitoring and manipulating side-channel parameters (e.g., timing, power, and temperature). The potential countermeasures are obfuscation techniques, which aim at concealing or obscuring the functional behavior or specific information. For instance, the processing HW can be designed so that the power signature of an operation is independent of the processed data values, thereby concealing the secret information. Meanwhile, to protect devices from timing attacks, designers can (1) randomize the execution delay of different operations, or (2) enforce the same execution delay for all operations, thereby obscuring the underlying operation.
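The second timing countermeasure, making the execution time independent of the data, is the same principle behind constant-time comparison routines in software. The sketch below uses Python's hmac.compare_digest, whose running time does not depend on how many leading bytes match; the key values are illustrative.

```python
import hmac

def check_key(stored: bytes, provided: bytes) -> bool:
    """Constant-time comparison: the running time does not depend on how many
    leading bytes match, so an attacker cannot recover the key byte-by-byte
    from response timing. The key values used below are illustrative."""
    return hmac.compare_digest(stored, provided)

print(check_key(b"secret-key", b"secret-key"))   # True
print(check_key(b"secret-key", b"secret-kez"))   # False
```

A naive `stored == provided` comparison that returns at the first mismatching byte would leak, through timing, how long the matching prefix is, which is exactly the leakage the countermeasure removes.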
Hardware Intrusion: HW intrusion means that the attacker inserts malware or a trojan (typically in the form of a circuit modification) into the processing HW to perform attacks such as confidence reduction and misclassification. The potential countermeasures are typical HW security techniques, like built-in self-test (BIST) to verify the functionality of the processing HW, side-channel analysis-based monitoring [30, 31, 102] to detect and identify anomalous side-channel signals, and formal-method analysis to quickly and comprehensively analyze the behavior of the processing HW (e.g., using property checkers, mathematical models, SAT solvers, and SMT solvers).
IV A Cross-Layer Framework for Energy-Efficient and Robust Edge AI
To develop energy-efficient and robust Edge AI systems, the aspects of performance and energy efficiency, reliability, and security should be addressed collectively. Toward this, we propose a cross-layer framework that combines techniques from different layers of the computing stack to achieve energy-efficient and secure Edge AI systems (see the overview in Fig. 8). Our integrated framework employs the following steps.
DNN Model Creation with Secure Training: DNNs for Edge AI have to meet the design constraints (e.g., accuracy, memory, power, and energy). This can be achieved in two different ways: by employing (1) model compression through pruning and quantization of a pre-trained DNN model, or (2) a multi-objective neural architecture search (NAS) similar to the APNAS and NASCaps frameworks. APNAS searches through reinforcement learning for a model that has good accuracy and performance on a systolic array-based DNN accelerator. Meanwhile, NASCaps optimizes the accuracy and the hardware efficiency of CapsNet inference on a given accelerator. To ensure that the generated model can be trusted, the training process should be protected from attacks. To do this, several countermeasures can be employed, e.g., comparing redundantly trained models, performing local training to identify whether the trained model has been attacked, or encrypting the training dataset [13, 25, 26, 14] to thwart data poisoning attacks (see Fig. 8).
Efficient Edge AI Design: Once a trusted model is generated, further performance and energy optimizations are performed (see Fig. 8). At design time, DRAM latency and energy can be improved using techniques like ROMANet and DRMap. Meanwhile, the buffer latency and energy can be optimized using ROMANet and DESCNet, and the compute latency and energy can be optimized using approximation methodologies like CANN, ALWANN, and ReD-CaNe. Moreover, the efficiency of the system can be further improved at run time using power management techniques like clock gating, power gating, and DVFS. Furthermore, this step should ensure that the employed techniques do not violate the design specifications, thereby providing an efficient Edge AI.
Resilient Edge AI Design: To improve the resiliency of Edge AI against reliability threats, effective mitigation techniques are required (see Fig. 8). Toward this, the resiliency characteristics of the DNN under the targeted reliability threats are evaluated first. Recent works have studied DNN resiliency in the presence of approximation errors and permanent faults. Based on this information, appropriate fault mitigation techniques can be identified and deployed. At design time, several techniques can be employed, such as fault-aware pruning and training (e.g., FAP and FAT), range restriction (e.g., FT-ClipAct), and aging-aware timing error mitigation (e.g., ThUnderVolt and GreenTPU). Meanwhile, fault-aware mapping (e.g., SalvageDNN), range restriction (e.g., Ranger), online error monitoring, and adaptive DVFS can be performed to improve the system's resiliency at run time. Furthermore, this step needs to ensure that the employed techniques do not violate the design constraints, thereby resulting in a resilient and energy-efficient Edge AI system.
Secure HW Design/Implementation: Since the HW side also has vulnerabilities, the HW design/implementation process should be protected. Toward this, existing HW security techniques can be employed (see Fig. 8). For instance, side-channel analysis-based monitoring [30, 31, 102] can observe the side-channel signals that attackers could exploit; this information can then be leveraged to devise defense mechanisms that block the exploitation. Another idea is to obscure the HW information from the attacker using obfuscation techniques. Other techniques leverage formal method-based analysis [15, 44, 35] to quickly identify possible security vulnerabilities and the corresponding defense mechanisms. To evaluate the efficacy of the applied defense mechanisms, HW testing is performed. Furthermore, this step also needs to ensure that the employed defense techniques still meet the design constraints, thereby resulting in a secure HW design.
Secure Inference: Since security attacks can also target the inference phase, secure inference is required (see Fig. 8). Most of the attacks come in the form of data manipulation. Hence, we can perform data encryption to block the insertion of perturbations into the input data. Another idea is to mitigate input data-based attacks by employing quantization-based defenses such as QuSecNets, or noise filters as in the FadeML methodology.
Note that all the proposed steps jointly provide an end-to-end cross-layer framework that performs HW- and SW-level optimizations at design time and run time. Our proposed framework ensures that Edge AI systems achieve high performance and energy efficiency, while providing correct outputs under diverse reliability and security threats.
V Neuromorphic Research Considering SNNs
SNNs are considered the third generation of NN models; they employ spike-encoded information and computation. Due to their bio-inspired operation, SNNs have a high potential to provide energy-efficient computation. Recent works have actively explored two research directions: SNNs with localized learning rules like spike-timing-dependent plasticity (STDP), and SNNs obtained from DNN conversion.
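To make the spike-encoded computation concrete, below is a minimal leaky integrate-and-fire (LIF) neuron, the neuron model commonly used in such SNNs; the decay factor, threshold, and input currents are illustrative assumptions.

```python
def lif_simulate(input_current, v_thresh=1.0, decay=0.9):
    """Minimal leaky integrate-and-fire neuron: the membrane potential leaks
    each timestep, integrates the input current, and emits a spike (then
    resets) once it crosses the threshold. Parameters are illustrative."""
    v, spikes = 0.0, []
    for i in input_current:
        v = decay * v + i          # leak, then integrate
        if v >= v_thresh:
            spikes.append(1)
            v = 0.0                # reset after the spike
        else:
            spikes.append(0)
    return spikes

print(lif_simulate([0.5, 0.5, 0.5, 0.0, 0.9, 0.9]))   # -> [0, 0, 1, 0, 0, 1]
```

Because computation happens only when spikes occur, sparse spike trains translate directly into fewer operations, which is the source of the energy-efficiency potential discussed above.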
V-A Improving the Energy Efficiency of SNNs
To improve the energy efficiency of SNNs, several HW- and SW-level optimizations have been proposed. On the HW side, SNN accelerators have been designed, such as TrueNorth, SpiNNaker, PEASE, Loihi, and ODIN. Recent work (i.e., the SparkXD framework) optimizes the DRAM access latency and energy for SNN inference by employing reduced-voltage DRAM operations and an effective DRAM mapping, leading to DRAM energy savings of up to 40% (see Fig. 9). On the SW side, the FSpiNN framework improves the energy efficiency of SNN processing in training (avg. 3.5x) and inference (avg. 1.8x) through the optimization of neural operations and quantization, without accuracy loss (see Fig. 10). The Q-SpiNN framework explores different precision levels, rounding schemes, and quantization schemes (i.e., post- and in-training quantization) to maximize memory savings for both weights and neuron parameters (which occupy a considerable amount of memory in the accelerator fabric). Other techniques target mapping and running SNN applications (e.g., DVS gesture recognition and autonomous cars) on neuromorphic hardware (i.e., Loihi) to improve the energy efficiency of their processing compared to running them on conventional platforms (e.g., CPUs, GPUs). As shown in Fig. 11, CarSNN improves the N-CARS accuracy by 2% compared to related works while consuming only 315 mW on the Loihi neuromorphic chip, making a step forward towards ultra-low-power event-based vision for autonomous cars.
V-B Improving the Reliability of SNNs
In recent years, the reliability of SNNs has been gaining attention, as it is crucial to ensure the functionality of SNN systems. Reliability issues may come from various sources (e.g., manufacturing defects, aggressive optimization techniques, etc.). For instance, employing reduced-voltage DRAM in SNN accelerators can offer energy savings, but at the cost of increased DRAM errors, which may alter the weight values and reduce the accuracy. Toward this, the SparkXD framework improves the SNN reliability (preserving the high accuracy) by incorporating the fault information (i.e., fault map and fault rate) in the retraining process, i.e., the so-called fault-aware training (FAT). Furthermore, the ReSpawn framework mitigates the negative impact of permanent and approximation-induced faults in the off-chip and on-chip memories of SNN HW accelerators through a cost-effective fault-aware mapping (FAM). It places the weight bits with higher significance on the non-faulty memory cells, which enhances the reliability of SNNs without retraining and achieves up to 70% accuracy improvement over the baseline, as shown in Fig. 12. In this manner, ReSpawn can also improve the yield and reduce the per-unit cost of SNN chips. Besides the HW-induced faults, SNN systems may encounter dynamically changing environments, which cause the offline-learned knowledge to become obsolete at run time. Toward this, the SpikeDyn framework employs an unsupervised continual learning mechanism by leveraging the internal characteristics of neural dynamics and a weight decay function to enable online learning.
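The core idea of such a significance-driven fault-aware mapping can be sketched in a few lines: rank the memory words by their number of faulty bits and assign the largest-magnitude weights to the least-faulty words. The fault map and weight values below are assumed examples, not ReSpawn's actual data.

```python
import numpy as np

rng = np.random.default_rng(1)
weights = rng.standard_normal(8)                           # weights to be stored
faulty_bits_per_word = np.array([0, 3, 1, 0, 2, 0, 4, 1])  # assumed fault map

# Significance-driven mapping: store the weights that matter most (largest
# magnitude) in the memory words with the fewest faulty bits.
order_w = np.argsort(-np.abs(weights))       # weight indices, most significant first
order_m = np.argsort(faulty_bits_per_word)   # word indices, least faulty first

mapping = np.empty(8, dtype=int)             # mapping[word] = weight index
mapping[order_m] = order_w
for word in range(8):
    print(f"word {word} ({faulty_bits_per_word[word]} faulty bits) "
          f"<- weight {mapping[word]} (|w|={abs(weights[mapping[word]]):.3f})")
```

Since the mapping is just a permutation applied when weights are written to memory, it improves resilience without retraining and at negligible run-time cost, which is the property that also helps chip yield.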
V-C Improving the Security of SNNs
Previous works have shown that SNNs are vulnerable to security attacks, like data poisoning attacks on traditional image classification datasets like MNIST and on event-based datasets, exhibiting different behavior under attack compared to non-spiking DNNs. Furthermore, SNNs are also vulnerable to externally triggered bit-flip attacks. Experiments show that only 4 bit-flips at the most sensitive weight memory cells are sufficient to fool SNNs on the CIFAR-10 dataset. Once these memory locations are found, the attacker can trigger the malicious hardware that generates the bit-flips by inserting a specific pattern in the input images. To address this security problem, several defense techniques have been proposed. One technique exploits the structural network parameters, e.g., the threshold voltage and the time window, to improve SNN robustness. By fine-tuning such parameters, SNNs can be made up to 85% more robust than non-spiking DNNs. Meanwhile, the R-SNN methodology employs noise filtering to remove adversarial perturbations from the DVS inputs. The experiments demonstrate that such noise filtering only slightly affects the SNN outputs for clean event sequences, while a wide range of filter parameters can increase the robustness of the SNN under attack by up to 90%.
The use of Edge AI and tinyML systems is expected to grow rapidly in the coming years; therefore, ensuring their high energy efficiency and robustness is important. This paper has provided an overview of the challenges and potential solutions for improving the performance, energy efficiency, and robustness (i.e., reliability and security) of Edge AI. It shows that HW/SW co-design and co-optimization techniques at design time and run time can be combined through a cross-layer framework to efficiently address these challenges.
This work was partly supported by Intel Corporation through Gift funding for the project "Cost-Effective Dependability for Deep Neural Networks and Spiking Neural Networks".
-  (2020) APNAS: accuracy-and-performance-aware neural architecture search for neural hardware accelerators. IEEE Access 8 (), pp. 165319–165334. External Links: Cited by: §II-A, §IV.
-  (2015) Truenorth: design and tool flow of a 65 mw 1 million neuron programmable neurosynaptic chip. IEEE TCAD 34 (10), pp. 1537–1557. Cited by: §V-A.
-  (2017-02) Structured pruning of deep convolutional neural networks. ACM JETC 13 (3). External Links: Cited by: §II-A, §II-B.
-  (2005) Radiation-induced soft errors in advanced semiconductor technologies. IEEE TDMR 5 (3). Cited by: 2nd item.
-  (2019) Chapter 8 - side-channel attacks. In Hardware Security, S. Bhunia and M. Tehranipoor (Eds.), pp. 193–218. External Links: Cited by: §III-B.
-  (1990) Multichannel texture analysis using localized spatial filters. IEEE TPAMI 12 (1). Cited by: Fig. 11.
-  (2020) Hardware and software optimizations for accelerating deep neural networks: survey of current trends, challenges, and the road ahead. IEEE Access 8 (), pp. 225134–225180. External Links: Cited by: §I.
-  (2021) A low-cost fault corrector for deep neural networks through range restriction. In Proc. of DSN, Cited by: §III-A, §III-A, §IV.
-  (2018) Loihi: a neuromorphic manycore processor with on-chip learning. IEEE Micro 38 (1), pp. 82–99. Cited by: §V-A.
-  (2015) Unsupervised learning of digit recognition using spike-timing-dependent plasticity. FNCOM 9, pp. 99. Cited by: Fig. 10.
-  (2021) Securing deep spiking neural networks against adversarial attacks through inherent structural parameters. In Proc. of DATE, Vol. , pp. 774–779. External Links: Cited by: §V-C.
-  (2019) 12.7-pj/sop 64k-synapse 256-neuron online-learning digital spiking neuromorphic processor in 28-nm cmos. IEEE TBCAS 13 (1), pp. 145–158. Cited by: §V-A.
-  (2018) PANDA: facilitating usable ai development. arXiv preprint arXiv:1804.09997. Cited by: §III-B, §IV.
-  (2018) Supervised machine learning using encrypted training data. Int. J. of Information Security 17 (4), pp. 365–377. Cited by: §III-B, §IV.
-  (2018) Deepsafe: a data-driven approach for assessing robustness of neural networks. In Proc. of ATVA, pp. 3–19. Cited by: §III-B, §IV.
-  (2015) Deep learning with limited numerical precision. In Proc. of ICML, Cited by: §II-A, §II-B.
-  (2015) Deep compression: compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149. Cited by: §II-A, §II-B.
-  (2018) Error resilience analysis for systematically employing approximate computing in convolutional neural networks. In Proc. of DATE, Vol. , pp. 913–916. External Links: Cited by: §II-B, §IV.
-  (2018) Robust machine learning systems: reliability and security for deep neural networks. In Proc. of IOLTS, Vol. , pp. 257–260. External Links: Cited by: 3rd item, 4th item, §III-B.
-  (2019) CANN: curable approximations for high-performance deep neural network accelerators. In Proc. of DAC, Vol. , pp. 1–6. External Links: Cited by: §II-B, §IV.
-  (2018) MPNA: a massively-parallel neural array accelerator with dataflow optimization for convolutional neural networks. arXiv preprint arXiv:1810.12910. Cited by: §II-A, §IV.
-  (2020) Salvagednn: salvaging deep neural network accelerators with permanent faults through saliency-driven fault-aware mapping. RSTA 378 (2164). Cited by: §III-A, §IV.
-  (2021) DNN-life: an energy-efficient aging mitigation framework for improving the lifetime of on-chip weight memories in deep neural network hardware architectures. In Proc. of DATE, Vol. , pp. 729–734. External Links: Cited by: §III-A.
-  (2018) Amc: automl for model compression and acceleration on mobile devices. In Proc. of ECCV, pp. 784–800. Cited by: §II-A, §II-B.
-  (2018) Privacy-preserving machine learning as a service. Proc. Priv. Enhancing Technol. 2018 (3). Cited by: §III-B, §IV.
-  (2017) Cryptodl: deep neural networks over encrypted data. arXiv preprint arXiv:1711.05189. Cited by: §III-B, §IV.
-  (2020) FT-clipact: resilience analysis of deep neural networks and improving their fault tolerance using clipped activation. In Proc. of DATE, Vol. , pp. 1241–1246. External Links: Cited by: §III-A, §IV.
-  (2019) Terminal brain damage: exposing the graceless degradation in deep neural networks under hardware fault attacks. In Proc. of USENIX, Cited by: §III-A.
-  (2020) Hardware obfuscation and logic locking: a tutorial introduction. IEEE Design and Test 37 (3), pp. 59–77. External Links: Cited by: §III-B.
-  (2017) Safety verification of deep neural networks. In Proc. of CAV, pp. 3–29. Cited by: §III-B, §IV.
-  (2018) Chiron: privacy-preserving machine learning as a service. arXiv preprint arXiv:1803.05961. Cited by: §III-B, §IV.
-  (2018) Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proc. of CVPR, Vol. . External Links: Cited by: §II-A, §II-B.
-  (2017) In-datacenter performance analysis of a tensor processing unit. In Proc. of ISCA, pp. 1–12. Cited by: §II-A, §II-B.
-  (2021) Multi-bank on-chip memory management techniques for cnn accelerators. IEEE TC. Cited by: §II-B.
-  (2017) Reluplex: an efficient smt solver for verifying deep neural networks. In Proc. of CAV, pp. 97–117. Cited by: §III-B, §IV.
-  (2020) FaDec: a fast decision-based attack for adversarial machine learning. In Proc. of IJCNN, Vol. , pp. 1–8. External Links: Cited by: §III-B.
-  (2019) QuSecNets: quantization-based defense mechanism for securing deep neural network against adversarial attacks. In Proc. of IOLTS, Vol. , pp. 182–187. External Links: Cited by: §III-B, §IV.
-  (2019) TrISec: training data-unaware imperceptible security attacks on deep neural networks. In Proc. of IOLTS, Vol. , pp. 188–193. External Links: Cited by: §III-B.
-  (2019) FAdeML: understanding the impact of pre-processing noise filtering on adversarial machine learning. In Proc. of DATE, Vol. , pp. 902–907. External Links: Cited by: §III-B, §IV.
-  (2018) Energy-efficient neural network acceleration in the presence of bit-level memory errors. IEEE TCASI 65 (12), pp. 4285–4298. Cited by: §III-A, §IV.
-  (2012) A case for exploiting subarray-level parallelism (salp) in dram. In Proc. of ISCA, Cited by: §II-B.
-  (2018) Robustness for smart cyber physical systems and internet-of-things: from adaptive robustness methods to reliability and security for machine learning. In Proc. of ISVLSI, Vol. , pp. 581–586. External Links: Cited by: §I, §IV.
-  (2006) Impact of nbti on sram read stability and design for reliability. In Proc. of ISQED, Cited by: §III-A.
-  (2018) Toward scalable verification for safety-critical deep networks. arXiv preprint arXiv:1801.05950. Cited by: §III-B, §IV.
-  (2017) HOTS: a hierarchy of event-based time-surfaces for pattern recognition. IEEE TPAMI 39 (7), pp. 1346–1359. Cited by: Fig. 11.
-  (2015) Deep learning. Nature 521 (7553), pp. 436–444. Cited by: §I.
-  (2017) Understanding error propagation in deep learning neural network (dnn) accelerators and applications. In Proc. of SC, Cited by: §III-A.
-  (2016) Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710. Cited by: §II-A, §II-B.
-  (2018) SmartShuttle: optimizing off-chip memory accesses for deep learning accelerators. In Proc. of DATE, pp. 343–348. Cited by: Fig. 3, Fig. 5, §II-B.
-  (2018) Outsourced privacy-preserving classification service over encrypted data. J. of Network and Computer Appl. 106. Cited by: §IV.
-  (2018) Progressive neural architecture search. In Proc. of ECCV, pp. 19–34. Cited by: §II-A.
-  (2018) Darts: differentiable architecture search. arXiv preprint arXiv:1806.09055. Cited by: §II-A.
-  (2018) Less is more: culling the training set to improve robustness of deep neural networks. In Proc. of GameSec, pp. 102–114. Cited by: §IV.
-  (1962) The use of triple-modular redundancy to improve computer reliability. IBM J. Res. Dev. 6 (2). External Links: Cited by: §III-A.
-  (1997) Networks of spiking neurons: the third generation of neural network models. Neural networks 10 (9), pp. 1659–1671. Cited by: §V.
-  (2020) HarDNN: feature map vulnerability evaluation in cnns. arXiv preprint arXiv:2002.09786. Cited by: §III-A.
-  (2020) Q-capsnets: a specialized framework for quantizing capsule networks. In Proc. of DAC, Vol. , pp. 1–6. External Links: Cited by: §II-A, §II-B, §IV.
-  (2019) Deep learning for edge computing: current trends, cross-layer optimizations, and open research challenges. In Proc. of ISVLSI, Vol. , pp. 553–559. External Links: Cited by: §I.
-  (2018) PruNet: class-blind pruning method for deep neural networks. In Proc. of IJCNN, Vol. , pp. 1–8. External Links: Cited by: §II-A, §II-B, §IV.
-  (2020) NASCaps: a framework for neural architecture search to optimize the accuracy and hardware efficiency of convolutional capsule networks. In Proc. of ICCAD, Vol. , pp. 1–9. External Links: Cited by: §II-A, §IV.
-  (2020) DESCNet: developing efficient scratchpad memories for capsule network hardware. IEEE TCAD (), pp. 1–1. External Links: Cited by: 2nd item, §II-B, §II-B, §IV.
-  (2020) ReD-cane: a systematic methodology for resilience analysis and design of capsule networks under approximations. In Proc. of DATE, Vol. . External Links: Cited by: §II-B, §IV, §IV.
-  (2019) CapsAttacks: robust and imperceptible adversarial attacks on capsule networks. CoRR abs/1901.09878. External Links: Cited by: §III-B.
-  (2020) Is spiking secure? a comparative study on the security vulnerabilities of spiking and deep neural networks. In Proc. of IJCNN, Vol. , pp. 1–8. External Links: Cited by: §V-C.
-  (2021) R-snn: an analysis and design methodology for robustifying spiking neural networks against adversarial attacks through noise filters for dynamic vision sensors. In Proc. of IROS, Cited by: §V-C.
-  (2021) DVS-attacks: adversarial attacks on dynamic vision sensors for spiking neural networks. In Proc. of IJCNN, Cited by: §V-C.
-  (2020) An efficient spiking neural network for recognizing gestures with a dvs camera on the loihi neuromorphic processor. In Proc. of IJCNN, Vol. , pp. 1–9. External Links: Cited by: §V-A, §V.
-  (2019) ALWANN: automatic layer-wise approximation of deep neural network accelerators without retraining. In Proc. ICCAD, Vol. , pp. 1–8. External Links: Cited by: §II-B, §IV.
-  (2020) FANNet: formal analysis of noise tolerance, training bias and input sensitivity in neural networks. In Proc. of DATE, Vol. . External Links: Cited by: §III-A.
-  (2019) Sanity-check: boosting the reliability of safety-critical deep neural network applications. In Proc. of ATS, Vol. . External Links: Cited by: §III-A.
-  (2013) SpiNNaker: a 1-w 18-core system-on-chip for massively-parallel neural network simulation. IEEE JSSC 48 (8), pp. 1943–1953. Cited by: §V-A.
-  (2019) GreenTPU: improving timing error resilience of a near-threshold tensor processing unit. In Proc. of DAC, pp. 1–6. Cited by: §III-A, §IV.
-  (2016) Towards the science of security and privacy in machine learning. arXiv preprint arXiv:1611.03814. Cited by: 1st item, 2nd item, 3rd item.
-  (2018) Efficient neural architecture search via parameters sharing. In International Conference on Machine Learning, pp. 4095–4104. Cited by: §II-A.
-  (2020) DRMap: a generic dram data mapping policy for energy-efficient processing of convolutional neural networks. In Proc. of DAC, Vol. , pp. 1–6. External Links: Cited by: Fig. 4, §II-B, §IV.
-  (2021) ReSpawn: energy-efficient fault-tolerance for spiking neural networks considering unreliable memories. In Proc. of ICCAD, Vol. , pp. 1–8. Cited by: §V-B.
-  (2021) ROMANet: fine-grained reuse-driven off-chip memory access management and data organization for deep neural network accelerators. IEEE TVLSI 29 (4), pp. 702–715. External Links: Cited by: 1st item, 2nd item, Fig. 3, Fig. 5, §II-B, §II-B, §IV.
-  (2021) SparkXD: A framework for resilient and energy-efficient spiking neural network inference using approximate dram. In Proc. of DAC, Vol. , pp. 1–6. Cited by: §V-A, §V-B.
-  (2020) FSpiNN: an optimization framework for memory-efficient and energy-efficient spiking neural networks. IEEE TCAD 39 (11), pp. 3601–3613. External Links: Cited by: §I, §V-A, §V.
-  (2021) Q-spinn: a framework for quantizing spiking neural networks. In Proc. of IJCNN, Vol. , pp. 1–8. Cited by: §V-A.
-  (2021) SpikeDyn: A framework for energy-efficient spiking neural networks with continual and unsupervised learning capabilities in dynamic environments. In Proc. of DAC, Vol. . Cited by: §V-B.
-  (2013) Cherry-picking: exploiting process variations in dark-silicon homogeneous chip multi-processors. In Proc. of DATE, pp. 39–44. Cited by: 1st item.
-  (2017) DeepAPT: nation-state apt attribution using end-to-end deep neural networks. In Proc. of ICANN, pp. 91–99. Cited by: §IV.
-  (2017) A programmable event-driven architecture for evaluating spiking neural networks. In Proc. of ISLPED, Cited by: §V-A.
-  (2017) Dynamic routing between capsules. In Proc. of NIPS, pp. 3859–3869. Cited by: §II-A, §II.
-  (2018) Efficient on-line error detection and mitigation for deep neural network accelerators. In Proc. of SAFECOMP, pp. 205–219. Cited by: §III-A.
-  (2014) The eda challenges in the dark silicon era: temperature, reliability, and variability perspectives. In Proc. of DAC, pp. 1–6. Cited by: 3rd item.
-  (2015) EnAAM: energy-efficient anti-aging for on-chip video memories. In Proc. of DAC, Cited by: §III-A.
-  (2020) Robust machine learning systems: challenges,current trends, perspectives, and the road ahead. IEEE Design and Test 37 (2), pp. 30–57. External Links: Cited by: §I.
-  (2018) An overview of next-generation architectures for machine learning: roadmap, opportunities and challenges in the iot era. In Proc. of DATE, pp. 827–832. Cited by: 4th item.
-  (2011) Enhancing nbti recovery in sram arrays through recovery boosting. IEEE TVLSI 20 (4). Cited by: §III-A.
-  (2018) HATS: histograms of averaged time surfaces for robust event-based object classification. In Proc. of CVPR, Vol. . Cited by: Fig. 11.
-  (2017) Spike timing dependent plasticity based enhanced self-learning for efficient pattern recognition in spiking neural networks. In Proc. of IJCNN, Cited by: Fig. 10.
-  (2018) Natural and effective obfuscation by head inpainting. In Proc. of CVPR, pp. 5050–5059. Cited by: §IV.
-  (2000-July 18) Circuit and method for rapid checking of error correction codes using cyclic redundancy check. Google Patents. Note: US Patent 6,092,231 Cited by: §III-A.
-  (2017) Efficient processing of deep neural networks: a tutorial and survey. Proc. of the IEEE 105 (12), pp. 2295–2329. Cited by: §II-B.
-  (2019) Mnasnet: platform-aware neural architecture search for mobile. In Proc. of CVPR, pp. 2820–2828. Cited by: §II-A.
-  (2020) Bus width aware off-chip memory access minimization for cnn accelerators. In Proc. of ISVLSI, Cited by: Fig. 3, Fig. 5, §II-B.
-  (2010) Multicore soft error rate stabilization using adaptive dual modular redundancy. In Proc. of DATE, Vol. , pp. 27–32. External Links: Cited by: §III-A.
-  (2020) NeuroAttack: undermining spiking neural networks security through externally triggered bit-flips. In Proc. of IJCNN, Vol. . External Links: Cited by: §V-C.
-  (2021) CarSNN: an efficient spiking neural network for event-based autonomous cars on the loihi neuromorphic research processor. In Proc. of IJCNN, Cited by: Fig. 11, §V-A.
-  (2018) I know what you see: power side-channel attack on convolutional neural network accelerators. In Proc. of ACSAC, Cited by: §III-B, §IV.
-  (2011) A low-power memory architecture with application-aware power management for motion & disparity estimation in multiview video coding. In Proc. of ICCAD, pp. 40–47. Cited by: §III-A.
-  (2015) Optimizing fpga-based accelerator design for deep convolutional neural networks. In Proc. of FPGAs, Cited by: §II-B.
-  (2018) Caffeine: toward uniformed representation and acceleration for deep convolutional neural networks. IEEE TCAD 38. Cited by: Fig. 3, Fig. 5, §II-B.
-  (2018) Analyzing and mitigating the impact of permanent faults on a systolic array based neural network accelerator. In Proc. of VTS, pp. 1–6. Cited by: §III-A, §IV.
-  (2018) Thundervolt: enabling aggressive voltage underscaling and timing error resilience for energy efficient deep learning accelerators. In Proc. of DAC, pp. 1–6. Cited by: §III-A, §IV.
-  (2021) FT-cnn: algorithm-based fault tolerance for convolutional neural networks. IEEE TPDS 32 (7). External Links: Cited by: §III-A.
-  (2016) Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578. Cited by: §II-A.