1 Introduction
The explosive growth of the processing requirements of data-driven applications has spurred intensive research into alternative computing architectures that are more energy-efficient than traditional von Neumann processors. Unlike the dominant Deep Neural Networks (DNNs), which rely on real-valued information encoding, Spiking Neural Networks (SNNs) communicate through discrete and sparse tokens in time called spikes, mimicking the operation of the brain; they are hence projected to be ideal candidates for realizing energy-efficient hardware platforms for artificial intelligence applications. Moreover, SNNs are also well suited to real-time applications, as they exploit the temporal dimension for data encoding and processing. However, SNNs lag behind DNNs in demonstrated computational capability due to the current lack of efficient learning algorithms Nandakumar and others (2018). Backpropagation techniques are ubiquitously adopted to train DNNs, but the discontinuous nature of spikes makes it non-trivial to derive such gradient-based rules for SNNs Roy et al. (2019). Here we explore a probabilistic framework for SNNs, which defines the outputs of spiking neurons as jointly distributed binary random processes. Such a definition makes it possible to apply maximum-likelihood criteria and then derive flexible learning rules without requiring backpropagation mechanisms, conversions, or other approximations from pre-trained ANNs Jang et al. (2019).

Conventional neural networks have millions of trainable parameters and are trained on von Neumann machines, where the memory and computation units are physically separated. The performance of these implementations is typically limited by the "von Neumann bottleneck" caused by the constant transfer of data between the processor and the memory. SNN implementations on these platforms become inherently slow due to the need to access data over time in order to carry out temporal processing. Hence, hardware accelerators are necessary to exploit the full potential of SNNs and also to develop efficient learning algorithms. Here, we discuss a hardware accelerator designed for implementing probabilistic SNNs that uses binary STT-RAM devices for realizing synapses and digital CMOS for neuronal computations. We also evaluate the performance of probabilistic SNNs against equivalent ANNs on standard benchmark datasets.
2 Related Work
In an effort to build large-scale neuromorphic computing systems that can emulate the energy efficiency of the human brain, several computing platforms have implemented SNNs. While the Tianjic chip Pei and others (2019), Intel's Loihi Davies and others (2018) and IBM's TrueNorth Merolla et al. (2014) realize spiking neurons and make use of static random access memory (SRAM) to store the states of synapses and neurons, recent research efforts have proposed the use of tiled crossbar arrays of two-terminal nanoscale devices for implementing large-scale DNN systems Kuzum et al. (2013); Gokmen and Vlasov (2016); Ambrogio and others (2018); Ankit et al. (2019). Though analog memristor-based neural network accelerators are estimated to provide higher throughput than GPUs, there are several challenges associated with these in-memory computing solutions Gokmen and Vlasov (2016); Ankit et al. (2019); Shafiee et al. (2016); Chi et al. (2016). For instance, the programming variability and stochasticity of device conductances, the need for accurate and area/power-hungry digital-to-analog converters (DACs) at the input and analog-to-digital converters (ADCs) at the output, and the additive noise contributed by these peripheral logic circuits pose significant challenges Ambrogio and others (2018); Babu et al. (2018); Kulkarni et al. (2019).

Among the emerging nanoscale memory devices, Spin Transfer Torque RAM (STT-RAM) has been explored for implementing synaptic weights for ANNs as well as SNNs in several previous studies Kulkarni et al. (2019); Sengupta et al. (2017); Kulkarni et al. (2020); Fukami and Ohno (2018), owing to its fast read/write characteristics, high endurance, and scalability Locatelli and others (2015); Wang and others (2015). The stochastic nature of spintronic devices has been leveraged to model neural transfer functions in a crossbar architecture Sengupta et al. (2016), and such designs for SNNs have shown energy improvements over conventional CMOS designs. Notably, an all-spin neuromorphic processor for SNNs, comprising spintronic synapses and neurons in an in-memory computing architecture, has been projected to give higher energy efficiency and speedup compared to equivalent CMOS implementations Sengupta et al. (2017). In contrast to these prior implementations using spintronic devices, we use the STT-RAM device as a binary storage unit, minimizing the area, power, and precision requirements of the peripheral ADC/DAC logic circuits; a b-bit synaptic weight is represented using b binary STT-RAM devices. While the idea of using STT-RAM as a binary storage unit has been discussed earlier in Kulkarni et al. (2019, 2020), those works used a deterministic spiking neuron model for learning and inference based on a crossbar architecture.

Instead, this work uses the STT-RAM memory as a regular storage unit, similar to SRAM storage in conventional designs. To the best of our knowledge, this work presents the first design of a hardware accelerator based on STT-RAM devices to implement probabilistic SNNs based on a generalized linear model (GLM) for neurons.
2.1 Main Contributions
The main contributions of this paper are listed as follows.

We propose a hardware accelerator for inference, the Spintronic Accelerator for Probabilistic SNNs (SpinAPS), that integrates binary spintronic devices to store the synaptic states with digital CMOS neurons for computations. The potential of hardware implementations of Generalized Linear Model (GLM)-based probabilistic SNNs trained using the energy-efficient first-to-spike rule is evaluated for the first time in this work.

We evaluate the performance of probabilistic SNNs on two benchmarks, handwritten digit recognition and human activity recognition, and show that SNN performance is comparable to that of equivalent ANNs.

SpinAPS achieves a significant performance improvement in terms of GSOPS/W/mm² when compared to an equivalent design that uses SRAM to store the synaptic states.
This paper is organized as follows: In Section 3, we review the architecture of GLM-based probabilistic SNNs and explain the first-to-spike rule for classification. Algorithm optimization strategies for an efficient hardware implementation of the first-to-spike rule are considered in Section 4. The architecture of the SpinAPS core and the mapping of the GLM kernels into the memory are discussed in Section 5. Section 6 details the relevant digital CMOS neuron logic blocks and the memory design. We then evaluate the performance of our hardware accelerator for different bit-precision choices in Section 7. Finally, we conclude the paper in Section 8.
3 Review of GLM neuron model for SNNs
The majority of the neuron models used in SNN computations are deterministic in nature, whereby the neuron emits a spike when its membrane potential crosses a threshold value. The non-differentiable nature of the spikes in SNNs makes it non-trivial to use the standard approach of stochastic gradient descent (SGD) widely used for training ANNs. Therefore, several approaches have been used to realize deterministic SNNs, such as converting pre-trained ANNs, smoothing out the membrane potential to define the derivatives, and so on Rueckauer and Liu (2018); Wu et al. (2018); Lee et al. (2016); O'Connor and Welling (2016). In contrast to deterministic models, probabilistic models for neurons are based on the linear-nonlinear Poisson model that is widely studied in the field of computational neuroscience Weber and Pillow (2017).

Generalized Linear Models (GLMs) yield a probabilistic framework for SNNs that is flexible and computationally tractable Jang et al. (2019); Weber and Pillow (2017). The basic architecture of GLM-based probabilistic SNNs used for SNN training is shown in Fig. 1. We focus on a 2-layer SNN, with presynaptic neurons encoding the input and output neurons corresponding to the output classes. Each of the input neurons receives a spike train of T samples through rate encoding: the input is normalized and spikes are issued through a Bernoulli random process. For cases where the sign of the input is vital to achieving learning, the negative sign is absorbed in the sign bit of the corresponding weights. The membrane potential u_{i,t} of output neuron i at time instant t can be expressed as
u_{i,t} = \sum_{j} \alpha_{j,i}^T x_{j,t-\tau}^{t-1} + \beta_i^T y_{i,t-\tau'}^{t-1} + \gamma_i,    (1)
where \alpha_{j,i} denotes the stimulus kernel; x_{j,t-\tau}^{t-1} represents the input spike window having \tau spike samples; \beta_i denotes the feedback kernel; y_{i,t-\tau'}^{t-1} represents the output spike window with \tau' samples; and \gamma_i is the bias parameter. Here j refers to the index of the presynaptic neuron and i to the index of the postsynaptic neuron. The stimulus and feedback kernels in the GLM are defined as weighted sums of fixed basis functions with learnable weights, expressed as shown below.
\alpha_{j,i} = A w_{j,i}, \qquad \beta_i = B v_i    (2)
The matrices A and B contain the basis vectors, defined as A = [a_1, ..., a_{K_\alpha}] and B = [b_1, ..., b_{K_\beta}]. The prior work on GLMs in Bagheri and others (2018) uses real-valued raised cosine basis vectors for the stimulus and feedback kernels. The vectors w_{j,i} and v_i are the learnable weights in the network, with K_\alpha and K_\beta denoting the numbers of basis functions. The spiking probability of an output neuron is then determined by the sigmoid nonlinearity applied to its membrane potential. GLM neurons have reproduced a wide range of spiking neuronal behaviors observed in the human brain by appropriately tuning the stimulus and feedback kernels Weber and Pillow (2017). Learning rules for GLM SNNs based on rate and first-to-spike decoding have been derived in a number of works reviewed in Jang et al. (2019); Jang and Simeone (2019); Gardner and Grüning (2016).

Two main decoding strategies have been considered for the given SNN architecture, rate decoding and first-to-spike decoding Bagheri and others (2018). When using the rate decoding scheme for inference, the network decision is based on the neuron with the maximum spike count. With the first-to-spike scheme, a decision is made as soon as one of the output neurons spikes. It has been shown in Bagheri and others (2018) that the first-to-spike rule exhibits lower inference complexity than rate decoding, owing to its ability to make decisions early. Hence, we choose the first-to-spike scheme for our hardware optimization studies.
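As a concrete illustration, the forward pass of this GLM output layer can be sketched in a few lines of NumPy. All sizes, weight values, and function names below are hypothetical, and the feedback kernels are omitted for simplicity (as the first-to-spike rule permits; see Section 3.1):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (hypothetical, not from the paper): n_in presynaptic
# neurons, n_out output neurons, a tau-sample integration window, K basis
# functions.
n_in, n_out, tau, K = 4, 3, 8, 2

A = rng.integers(0, 2, size=(tau, K)).astype(float)  # fixed binary basis vectors
w = rng.normal(size=(n_in, n_out, K))                # learnable weights w_{j,i}
gamma = np.zeros(n_out)                              # per-neuron bias gamma_i

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def glm_step(x_window):
    """One inference step of the GLM output layer.

    x_window: (n_in, tau) array holding the last tau input spikes (0/1).
    Returns membrane potentials u_i and spike probabilities g(u_i).
    """
    alpha = np.einsum('tk,jik->jit', A, w)     # stimulus kernels alpha_{j,i} = A w_{j,i}
    u = np.einsum('jit,jt->i', alpha, x_window) + gamma
    return u, sigmoid(u)

x_window = rng.integers(0, 2, size=(n_in, tau)).astype(float)
u, p = glm_step(x_window)
spikes = (rng.random(n_out) < p).astype(int)   # Bernoulli spike emission
```

Note how the binary basis choice discussed in Section 4 would reduce the kernel computation to additions only, since A then contains no real-valued entries.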
3.1 First-to-Spike Decoding
The fundamental idea of the first-to-spike scheme is illustrated in Figure 2. The kernels of the GLM-based SNN are trained using the maximum-likelihood criterion, which maximizes the probability of obtaining the first spike at the labeled neuron, with no spikes at any output neuron up to that time instant. This probability can be mathematically expressed as
p(t, c) = g(u_{c,t}) \prod_{t'=1}^{t-1} \prod_{i} \bar{g}(u_{i,t'}),    (3)
where c corresponds to the labeled neuron, u_{i,t} denotes the membrane potential, and g(·) denotes the sigmoid activation function applied to the membrane potential. Also, \bar{g}(·) = 1 − g(·). The weight update rules are derived by maximizing the log-probability in equation (3) and are discussed in detail in Bagheri and others (2018). Note that the feedback kernels are not necessary for the first-to-spike rule, as the network dynamics need not be computed after the first spike is observed.

3.2 Datasets Used in This Study
Throughout this work, we evaluate the performance of probabilistic SNNs on handwritten digit recognition and on the human activity recognition (HAR) dataset Anguita et al. (2013). Each input image in the handwritten digit database is labeled with one of the ten digits. The HAR dataset is a collection of physical-activity features extracted from the time-series outputs of the sensors embedded in a smartphone; the database contains training and test samples corresponding to six types of physical activities.

4 Hardware-Software Co-optimization
In this section, we discuss the design choices that facilitate our hardware implementation. We start from the floating-point baseline accuracy of the first-to-spike (FtS) rule in Fig. 3, where the kernels are trained using the techniques demonstrated in Bagheri and others (2018). For the purpose of designing the inference engine, we assume that the kernels are fixed and that binary basis vectors are used in training instead of the raised cosine basis vectors. The use of binary basis vectors simplifies the kernel computations, removing the need for any multipliers in hardware. We first determine the floating-point baseline test accuracy as a function of the presentation time T by training the SNN over multiple epochs.

It can be seen that the network performance improves with larger presentation times T and spike-integration windows \tau. Note that for the handwritten digit benchmark, the accuracy of the GLM SNN trained using the FtS rule is on par with that of a 2-layer artificial neural network (ANN) with the same architecture; likewise, for the HAR dataset, the maximum test accuracy achieved by the GLM SNN is comparable with the ANN test accuracy. Hence, we chose the best-performing (T, \tau) settings for the handwritten digit and HAR benchmarks as the baseline. We now discuss software optimization strategies for implementing the first-to-spike scheme in an energy-efficient manner in hardware.
4.1 Sigmoid Activation Function
As shown in Fig. 1, the basic GLM neuron architecture uses a sigmoid activation function to determine the spike probability of the output neurons. The implementation of sigmoid functions in hardware is relatively complex, as it involves division and exponentiation, incurring significant area and power
Tommiska (2003). Here we follow the piecewise linear (PWL) approximation demonstrated in Alippi and Storti-Gajani (1991) for implementing the sigmoid activation function. In the PWL approximation, the sigmoid is broken up at integer break points such that the resulting function can be expressed in powers of 2. Considering the negative axis alone, the PWL approximation for an input x can be expressed as

\sigma(x) \approx 2^{-(\lfloor|x|\rfloor + 1)} (1 - \hat{x}/2), \quad x \le 0,    (4)

where \lfloor|x|\rfloor is the integer part of |x| and \hat{x} = |x| - \lfloor|x|\rfloor is its fractional part. Using the symmetry of the sigmoid function, the values on the positive x-axis can be obtained as \sigma(x) = 1 - \sigma(-x). The output of the PWL approximation is then compared with a pseudo-random number generated from a linear feedback shift register (LFSR), and a spike is generated if the PWL value is greater than the LFSR output. A 16-bit LFSR is used and is assumed to be shared among all the output neurons.
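A minimal software model of this spike-generation path is sketched below. The PWL formula is our reconstruction of the power-of-two approximation, and the LFSR taps are a standard maximal-length choice, not a value taken from the paper:

```python
import numpy as np

def pwl_sigmoid(x):
    """Piecewise-linear, power-of-two approximation of the sigmoid.

    Our reconstruction of the Alippi/Storti-Gajani scheme: for x <= 0 the
    output is 2^-(n+1) * (1 - f/2), with n and f the integer and fractional
    parts of |x|; positive x uses the symmetry sigma(x) = 1 - sigma(-x).
    """
    if x > 0:
        return 1.0 - pwl_sigmoid(-x)
    n = int(np.floor(-x))      # integer part of |x|
    f = (-x) - n               # fractional part of |x|
    return (1.0 - f / 2.0) / 2 ** (n + 1)

def lfsr16_step(state):
    """One step of a maximal-length 16-bit Fibonacci LFSR (taps 16,14,13,11)."""
    bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

# Spike generation: scale the PWL output to 16 bits and compare with the LFSR.
state = 0xACE1                     # any nonzero seed
state = lfsr16_step(state)
p = pwl_sigmoid(-0.5)              # approximately 0.375
spike = int(p * (1 << 16)) > state
```

Only shifts, adds, and a comparison are needed, which is what makes the scheme attractive for a digital CMOS neuron.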
4.2 Quantization
We now aim to find the minimum bit precision required for the learned weights and biases in order to maintain close-to-baseline floating-point accuracy during inference. We follow a post-training quantization study to determine the minimum bit precision required for the learnable parameters. Starting with the baseline networks for the two datasets obtained using floating-point parameters, we quantize them into discrete levels and study the inference accuracy with the PWL approximation for the sigmoid. With b-bit quantization, one bit is reserved for the sign to represent both positive and negative parameters. We use a uniform quantizer with quantization steps given by

\Delta_w = \frac{\max|w|}{2^{b-1} - 1}, \qquad \Delta_\gamma = \frac{\max|\gamma|}{2^{b-1} - 1}    (5)

for the weights and biases, respectively. We also quantize the membrane potential and the output of the PWL activation to b bits. We summarize the inference performance of the GLM SNN as a function of bit precision in Fig. 4(a). Network performance degrades with lower choices of b, but we note that 8-bit resolution is sufficient to maintain close-to-baseline floating-point inference accuracy for the two benchmarks.
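The quantizer can be sketched as follows; the symmetric scaling by the maximum parameter magnitude is a common post-training choice and an assumption on our part:

```python
import numpy as np

def quantize_uniform(x, b, x_max):
    """Uniform b-bit quantizer with one sign bit.

    Step size delta = x_max / (2^(b-1) - 1); values are rounded to the
    nearest level and saturated.  The exact scaling used in the paper may
    differ; this is a common post-training quantization choice.
    """
    levels = 2 ** (b - 1) - 1          # magnitude levels, excluding sign
    delta = x_max / levels
    q = np.clip(np.round(x / delta), -levels, levels)
    return q * delta

rng = np.random.default_rng(1)
w = rng.normal(scale=0.5, size=1000)   # stand-in for trained weights
w8 = quantize_uniform(w, 8, np.abs(w).max())
```

With 8 bits, each weight lands on one of at most 255 levels, and the worst-case rounding error is half a quantization step.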
Even though the input spike pattern lasts for T algorithmic time steps, the first-to-spike rule allows a decision to be made before all the spikes have been presented to the network. For the handwritten digit benchmark, around 75% of the samples are classified within the first few algorithmic time steps. Thus the GLM SNN can leverage the ability of the first-to-spike rule to make decisions with reduced latency and fewer memory accesses.
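The early-decision behavior of first-to-spike inference can be illustrated with a short sketch; the tie-breaking rule when several neurons fire in the same step is our assumption:

```python
import numpy as np

def first_to_spike_decision(spike_prob, rng):
    """Classify at the first output spike, or return (None, T) if none occurs.

    spike_prob: (T, n_out) array of per-step spike probabilities, e.g. the
    PWL sigmoid of the membrane potentials.  If several neurons fire in the
    same step, the lowest index wins (an assumed tie-break).
    """
    T, n_out = spike_prob.shape
    for t in range(T):
        spikes = rng.random(n_out) < spike_prob[t]
        if spikes.any():
            return int(np.argmax(spikes)), t   # stop early: no further reads
    return None, T

rng = np.random.default_rng(2)
probs = np.full((16, 10), 0.2)   # hypothetical per-step spike probabilities
label, t_decide = first_to_spike_decision(probs, rng)
```

Every time step skipped after the decision saves the corresponding word-line reads and neuron updates, which is the source of the latency and memory-access savings noted above.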
5 Overview of SpinAPS architecture
As illustrated in Fig. 5, the core architecture of SpinAPS consists of binary spintronic devices to store the synaptic states and digital CMOS neurons to perform the neuronal functionality. The SpinAPS core accepts input spikes at every processor time step, reads the synapses corresponding to the spike-integration window from the memory to compute the membrane potentials, and applies the nonlinear PWL activation function to determine the spike probability of the output neurons. A pseudo-random number generated by the LFSR is then used to determine whether a spike is actually issued, as explained in Section 4. The word width, and hence the memory capacity, of the banked STT-RAM memory used in the SpinAPS core can be designed based on the bit precision b required for the synapses.
SpinAPS co-locates the dense STT-RAM memory array and the digital CMOS computations in the same core, thereby avoiding the traditional von Neumann bottleneck. Following the principles demonstrated in IBM's TrueNorth chip Merolla et al. (2014), a tiled architecture with 4,096 SpinAPS cores, each having 256 neurons, can be used to realize a system with 1 million neurons, as illustrated in Fig. 6. We next discuss the basic characteristics of the spintronic synapses and how the parameters of probabilistic SNNs can be mapped into the memory.
5.1 STT-RAM as a Synapse
The number of synapses generally scales as N², where N is the number of neurons in each layer. Recently, spintronic devices have been explored for use as nanoscale synapses Locatelli and others (2015); Sengupta et al. (2016); Wang and others (2015), mainly due to their high read and write bandwidths, low power consumption, and excellent reliability Dorrance and others (2012); Yu and others (2013); Nebashi and others (2009). These spintronic devices are compatible with CMOS technology, and fast read/write operation has been demonstrated in the sub-ns regime Dong and others (2019); Noguchi and others (2015). Here we propose binary STT-RAM devices as the nanoscale electrical synapses for the SpinAPS core illustrated in Fig. 5. Structurally, STT-RAM uses a Magnetic Tunnel Junction (MTJ), a pair of ferromagnets separated by a thin insulating layer. These devices can store either a '0' or a '1' depending on the relative magnetic orientation of the two ferromagnetic layers, as shown in Fig. 7. The memory array is typically configured as a crossbar, with an access transistor connected in series with each memory device to selectively read and program it.
5.2 Synaptic Memory Architecture
Here we discuss the mapping of the learnable parameters of the GLM-based SNN, the stimulus kernel weights w and the bias parameters \gamma, into the spintronic memory to achieve accelerated hardware performance.
5.2.1 Kernel Mapping
Here we choose the baseline (T, \tau) configuration to describe the mapping of b-bit synapses; the word width (vertical lines in Fig. 8) required for the memory is then the number of output neurons times b bits. Each input neuron generates a bit pattern of length \tau, and hence \tau unique kernel weights are provided for every input neuron, requiring \tau word lines per neuron in the array. Thus, a network with 256 input and 256 output neurons can be mapped onto a single memory array. For example, if the bit pattern generated by the first input neuron is "1010010", then the synaptic weights corresponding to its 1st, 3rd, and 6th word lines will be read sequentially. The addresses corresponding to these word lines are stored in registers, indicated as address storage registers in Figure 5. These synaptic weights are then added at the output neurons to compute the membrane potential. One word line is sufficient for mapping the biases, as the total number of bias bits equals the word width of the core. As \gamma determines the baseline firing rate of the output neurons, the word-line voltage corresponding to the bias line is kept high, indicating an "always read" condition.
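The word-line address generation described above can be illustrated as follows; the linear base-plus-offset addressing is our rendering of the mapping, not a netlist:

```python
def active_word_lines(spike_window, neuron_index, tau):
    """Translate an input neuron's spike window into word-line addresses.

    Each input neuron owns tau consecutive word lines (one per kernel tap);
    a '1' at position t of the window activates word line t of that block.
    The base-plus-offset addressing is an illustrative assumption.
    """
    base = neuron_index * tau
    return [base + t for t, s in enumerate(spike_window, start=1) if s == '1']

# The example from the text: pattern "1010010" reads word lines 1, 3 and 6
# of the first input neuron's block.
lines = active_word_lines("1010010", 0, 7)
```

The addresses returned here are what the address storage registers would hold while the corresponding word lines are read sequentially.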
6 High-level Design
6.1 Neuron Implementation
We now describe the digital CMOS implementation of the relevant blocks. The baseline design assumes 8-bit precision and is synthesized using a TSMC logic process running at a fixed clock frequency.
6.1.1 Generation of spike window
As discussed in Section 3, normalized input spikes are transformed into spike trains of length T using a Bernoulli random process. At each time instant t, incoming spikes are latched at the input neurons (marked as 'In. Reg', a serial-in, parallel-out shift register in Fig. 5), then translated into an activation pattern of length \tau and applied as read-enable signals to the word lines associated with the corresponding kernel weights. The spike-window generator circuit uses a multiplexer to select the bit pattern of length \tau from the input register; its select lines are configured by the controller at every time step. For example, at the first time step the select lines are set to "000" and, as no input spikes have arrived before, the output of the multiplexer is "0000000". At later time steps the select lines advance, and the multiplexer output contains the spikes that have arrived so far, padded with zeros, with each bit representing the presence or absence of a spike at the corresponding earlier time instant. As discussed in Section 5, the synapses associated with the input neurons are read sequentially, so the logic for the spike-window generator can be shared among all the input neurons. The synaptic weights are stored in registers, and the sign bit of the weights can optionally be flipped depending on the sign of the input before the membrane potential is computed.
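A behavioral model of the spike-window generator, with the bit ordering and padding convention as our assumptions:

```python
def spike_window(input_reg, t, tau):
    """Select the tau most recent input spikes at processor time step t.

    input_reg holds the serial-in spike history, oldest bit first.  Time
    steps before the first input contribute zeros, matching the multiplexer
    behaviour described above (names and ordering are assumptions).
    """
    history = input_reg[:t]                  # spikes that have arrived so far
    window = history[-tau:]                  # at most tau most recent of them
    return [0] * (tau - len(window)) + list(window)

# At t = 0 no spikes have arrived, so the window is all zeros.
w0 = spike_window([1, 0, 1, 1, 0, 1, 0], 0, 7)
w3 = spike_window([1, 0, 1, 1, 0, 1, 0], 3, 7)
```

Because the synapses are read sequentially, a single instance of this selection logic can be time-multiplexed across all input neurons, as noted above.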
6.1.2 Piecewise Linear Approximation (PWL) for sigmoid calculation
Our GLM neuron model uses a nonlinear sigmoid activation function to generate the spike probabilities for the output neurons based on the value of the membrane potential. Here we adopt the traditional piecewise linear approximation (shown in Equation 4) for the sigmoid, using adders and shift registers as described in Alippi and Storti-Gajani (1991). The 18-bit membrane potential is clipped to an 8-bit fixed-point number in the range [-8, 8] before being applied to the Piecewise Linear Approximation (PWL) sigmoid generator, which is shared among the output neurons. The clipped fixed-point representation assumes 1 bit for the sign, 3 bits for the integer part, and 4 bits for the fractional part. The output of the sigmoid generator is an unsigned 8-bit number whose values lie in the range [0, 1], and it is compared with the 8-bit pattern from the 16-bit LFSR (Linear Feedback Shift Register) to generate output spikes.
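The clipping of the wide accumulator value to the 8-bit fixed-point format can be sketched as follows; the 1-sign/3-integer/4-fraction split follows from the [-8, 8] range, and the function name is ours:

```python
def to_fixed_q3_4(u):
    """Clip a membrane potential to 8-bit signed fixed point (1.3.4 format).

    Range is [-8, 8) with a resolution of 2^-4; the wide (18-bit)
    accumulator value is saturated before the PWL sigmoid stage.
    """
    step = 2.0 ** -4                     # 4 fractional bits
    code = int(round(u / step))
    code = max(-128, min(127, code))     # saturate to the 8-bit range
    return code, code * step             # raw code and the value it encodes

code, val = to_fixed_q3_4(3.3)           # 3.3 rounds to 3.3125
```

Saturation rather than wrap-around is the natural choice here, since an overflowing membrane potential should still map to a spike probability near 0 or 1.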
6.2 Design of the synaptic memory
We describe the memory organization for the baseline design with 8-bit precision and the baseline (T, \tau) discussed in Section 4; the required memory capacity for any b-bit precision scales linearly with b. We design and analyze the STT-RAM based synaptic memory using DESTINY, a comprehensive tool for modeling emerging memory technologies Poremba and others (2015). An STT-RAM cell with the 1T/1MTJ configuration is simulated using the parameters listed in Table 1, with the feature size taken from Lin and others (2009). In Table 1, V_read is the read voltage, t_read is the read pulse width, I_prog is the programming current, and t_write denotes the write pulse width. Representative values for the read and write pulse widths are assumed based on experimentally reported chip-scale demonstrations of STT-RAM Dong and others (2019); Lin and others (2009).
Parameter | Cell Area | R_low | R_high | V_read | t_read | I_prog | t_write
Unit      | F²        | Ω     | Ω      | mV     | ns     | µA     | ns
Value     | 24        | 2500  | 5000   | 80     | 5      | 150    | 10
The optimized memory mapping for 8-bit precision is shown in Figure 9. The read energy per word line for the 8-bit design is on the order of picojoules, with a read latency of a few nanoseconds. The simulated STT-RAM array achieves high area efficiency; its total synaptic area and power are obtained from the DESTINY simulations.
7 Performance evaluation
We benchmark our accelerator design using a commonly used performance metric: the number of billions of synaptic operations that can be performed per second per watt (GSOPS/W) and per unit area (GSOPS/W/mm²). A synaptic operation refers to reading one active word line (b-bit synapses for all output neurons), computing the spike probability, and issuing the output spike. We compare these performance metrics of SpinAPS with those of an equivalent SRAM-based design for different bit precisions (Table 2), with the neuron logic implementation kept the same. The SRAM memory is simulated using DESTINY Poremba and others (2015) and clocked at a higher frequency than the STT-RAM array; independent of bit precision, the SRAM-based design therefore achieves a higher raw GSOPS than SpinAPS. At the same time, the SpinAPS core is substantially better in both synaptic power and synaptic area when compared to SRAM. For the values reported in Table 2, an overhead is included for the spike-routing infrastructure between the cores, and a further overhead is added to account for the area and power of the core's controller Merolla et al. (2014); Moradi and Manohar (2018); Moradi and others (2018).
Precision | GSOPS/W (SRAM) | GSOPS/W (STT-RAM) | GSOPS/W/mm² (SRAM) | GSOPS/W/mm² (STT-RAM)
5         | 353            | 474               | 177                | 559
6         | 283            | 412               | 119                | 415
7         | 230            | 366               | 83                 | 322
8         | 193            | 311               | 61                 | 239
It can be noticed that the SpinAPS core achieves a performance improvement ranging from about 3.2x (for 5-bit precision) up to about 3.9x (for 8-bit precision) in terms of GSOPS/W/mm² when compared to an equivalent SRAM-based design. The average energy per algorithmic time step of the SpinAPS core and the total design area as a function of bit precision are shown in Fig. 10.
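The relative improvement of SpinAPS over the SRAM design follows directly from the Table 2 entries; a quick check:

```python
# GSOPS/W/mm^2 values reported in Table 2 for the SRAM and SpinAPS designs.
sram_eff = {5: 177, 6: 119, 7: 83, 8: 61}
spinaps_eff = {5: 559, 6: 415, 7: 322, 8: 239}

# Relative improvement of SpinAPS over the SRAM design at each precision.
improvement = {b: spinaps_eff[b] / sram_eff[b] for b in sram_eff}
best_b = max(improvement, key=improvement.get)    # precision with largest gain
worst_b = min(improvement, key=improvement.get)   # precision with smallest gain
```

The gain grows with bit precision because the SRAM design's area and power scale less favorably with wider words.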
We now compare the performance of SpinAPS with an inference accelerator design employing STT-RAM devices for SNNs, with memristor-based inference engines for DNNs, and with GPUs. The performance improvement projected for SpinAPS is comparable to what has been recently reported in Kulkarni et al. (2019), which uses STT-RAM devices in integrated crossbar arrays. Furthermore, the SpinAPS design presented here, when extrapolated to the 14 nm technology node Bohr (2014), achieves a GSOPS/W that is comparable with recent memristor-based DNN inference engines such as those of Shafiee et al. (2016) and Ankit et al. (2019). Note that, when compared to state-of-the-art GPUs like the Tesla V100, SpinAPS can achieve substantial improvements in terms of GSOPS/W and GSOPS/mm² [29]. While implementations employing analog phase change memory (PCM) devices in crossbar arrays have been projected to provide two orders of magnitude improvement in energy efficiency when compared to GPUs through extrapolated and aggressive assumptions Ambrogio and others (2018), the SpinAPS projections are based on more realistic design choices that are representative of experimental demonstrations.
8 Conclusions
In this paper, we proposed SpinAPS, a hardware accelerator that uses binary STT-RAM devices and digital CMOS neurons to perform inference for probabilistic SNNs. Probabilistic SNNs based on the GLM neuron model are trained using the first-to-spike rule, and their performance is benchmarked on two standard datasets, where they exhibit performance comparable to equivalent ANNs. We discussed the design of the basic elements of the SpinAPS core, considering different bit-precision choices and the trade-offs between performance and hardware constraints. SpinAPS leverages the ability of the first-to-spike rule to make decisions with low latency, achieving up to approximately 4x performance improvement in terms of GSOPS/W/mm² compared to an equivalent SRAM-based design.
Acknowledgment
This research was supported in part by the National Science Foundation grant #1710009 and by the CAMPUSENSE project grant from CISCO Systems Inc. Resources of the High-Performance Computing facility at NJIT were used in this work.
References
 Simple approximation of sigmoidal functions: realistic design of digital neural networks capable of learning. In IEEE International Sympoisum on Circuits and Systems, Vol. . External Links: Document, ISSN Cited by: §4.1, §6.1.2.
 Equivalentaccuracy accelerated neuralnetwork training using analogue memory. Nature 558. External Links: ISSN 14764687, Document Cited by: §2, §7.
 Energy Efficient SmartphoneBased Activity Recognition using FixedPoint Arithmetic.. J. UCS 19 (9), pp. 1295–1314. Cited by: §3.2.

PUMA: A Programmable UltraEfficient MemristorBased Accelerator for Machine Learning Inference
. In Proceedings of the TwentyFourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’19, New York, NY, USA, pp. 715–731. External Links: ISBN 9781450362405, Document Cited by: §2, §7.  Stochastic Learning in Deep Neural Networks based on Nanoscale PCMO device characteristics. Neurocomputing 321, pp. 227 – 236. External Links: ISSN 09252312, Document Cited by: §2.
 Training Probabilistic Spiking Neural Networks with First ToSpike Decoding. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. , pp. . External Links: ISSN 2379190X Cited by: Figure 1, §3.1, §3, §3, §4.
 14 nm Process Technology: Opening New Horizons. Note: Intel Technology Development, Available online. https://www.intel.com/content/www/us/en/architectureandtechnology/bohr14nmidf2014brief.html Cited by: §7.
 PRIME: A Novel ProcessinginMemory Architecture for Neural Network Computation in ReRAMBased Main Memory. In 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), Vol. , pp. 27–39. Cited by: §2.
 Loihi: a Neuromorphic Manycore Processor with OnChip Learning. IEEE Micro 38 (1), pp. 82–99. External Links: Document, ISSN 02721732 Cited by: §2.
 A 1Mb 28nm 1T1MTJ STTMRAM with SingleCap OffsetCancelled Sense Amplifier and In Situ SelfWriteTermination. IEEE Journal of SolidState Circuits 54 (1), pp. 231–239. External Links: Document, ISSN 00189200 Cited by: §5.1, §6.2.
 Scalability and DesignSpace Analysis of a 1T1MTJ Memory Cell for STTRAMs. IEEE Transactions on Electron Devices 59 (4), pp. 878–887. External Links: Document, ISSN 00189383 Cited by: §5.1.
 Perspective: spintronic synapse for artificial neural network. Journal of Applied Physics 124 (15), pp. 151904. External Links: Document Cited by: §2.
 Supervised learning in spiking neural networks for precise temporal encoding. PLOS ONE 11 (8), pp. 1–28. External Links: Document Cited by: §3.
 Acceleration of Deep Neural Network Training with Resistive CrossPoint Devices: Design Considerations. Frontiers in Neuroscience 10, pp. 333. External Links: Document, ISSN 1662453X Cited by: §2.
 An Introduction to Probabilistic Spiking Neural Networks: Probabilistic Models, Learning Rules, and Applications. IEEE Signal Processing Magazine 36 (6), pp. 64–77. Cited by: §1, §3.
 Training Dynamic Exponential Family Models with Causal and Lateral Dependencies for Generalized Neuromorphic Computing. In ICASSP 2019  2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. , pp. . External Links: Document, ISSN 15206149 Cited by: §3.
 Neuromorphic hardware accelerator for snn inference based on sttram crossbar arrays. In 2019 26th IEEE International Conference on Electronics, Circuits and Systems (ICECS), Vol. , pp. 438–441. Cited by: §2, §2, §7.
 An OnChip Learning Accelerator for Spiking Neural Networks using STTRAM Crossbar Arrays. In 2020 Design, Automation Test in Europe Conference Exhibition (DATE), Vol. , pp. 1019–1024. Cited by: §2.
 Synaptic electronics: materials, devices and applications. Nanotechnology 24 (38), pp. . External Links: Document Cited by: §2.
 Training deep spiking neural networks using backpropagation. Frontiers in Neuroscience 10, pp. 508. External Links: Document, ISSN 1662453X Cited by: §3.
 45nm low power CMOS logic compatible embedded STT MRAM utilizing a reverseconnection 1T/1MTJ cell. In 2009 IEEE International Electron Devices Meeting (IEDM), Vol. , pp. 1–4. External Links: ISSN 01631918 Cited by: §6.2.
 Spintronic devices as key elements for energyefficient neuroinspired architectures. In 2015 Design, Automation Test in Europe Conference Exhibition (DATE), Vol. , pp. 994–999. External Links: Document, ISSN 15301591 Cited by: §2, §5.1.
 A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345 (6197), pp. 668–673. External Links: Document, ISSN 0036-8075 Cited by: §2, §5, §7.
 A Scalable Multicore Architecture with Heterogeneous Memory Structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs). IEEE Transactions on Biomedical Circuits and Systems (), pp. . External Links: Document, ISSN 1932-4545 Cited by: §7.
 The Impact of On-chip Communication on Memory Technologies for Neuromorphic Systems. Journal of Physics D: Applied Physics 52 (1), pp. 014003. External Links: Document Cited by: §7.
 Building brain-inspired computing systems: examining the role of nanoscale devices. IEEE Nanotechnology Magazine 12 (3), pp. 19–35. External Links: Document, ISSN 1932-4510 Cited by: §1.
 A 90nm 12ns 32Mb 2T1MTJ MRAM. In 2009 IEEE International Solid-State Circuits Conference - Digest of Technical Papers, Vol. , pp. 462–463, 463a. External Links: Document, ISSN 0193-6530 Cited by: §5.1.
 A 3.3ns-access-time 71.2µW/MHz 1Mb embedded STT-MRAM using physically eliminated read-disturb scheme and normally-off memory architecture. In IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, Vol. , pp. . External Links: Document, ISSN 0193-6530 Cited by: §5.1.
 NVIDIA Volta GV100 12nm FinFET GPU Detailed: Tesla V100 Specifications Include 21 Billion Transistors, 5120 CUDA Cores, 16GB HBM2 with 900 GB/s Bandwidth. Note: Available online. https://wccftech.com/nvidiavoltagv100gputeslav100architecturespecificationsdeepdive/ Cited by: §7.
 Deep Spiking Networks. CoRR abs/1602.08323. External Links: 1602.08323 Cited by: §3.
 Towards artificial general intelligence with hybrid Tianjic chip architecture. Nature 572 (7767), pp. 106–111. External Links: ISSN 1476-4687, Document Cited by: §2.
 DESTINY: a tool for modeling emerging 3D NVM and eDRAM caches. In 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE), Vol. . External Links: Document, ISSN 1530-1591 Cited by: §6.2, §7.
 Towards spike-based machine intelligence with neuromorphic computing. Nature 575 (7784), pp. 607–617. External Links: ISSN 1476-4687, Document Cited by: §1.
 Conversion of analog to spiking neural networks using sparse temporal coding. In 2018 IEEE International Symposium on Circuits and Systems (ISCAS), Vol. , pp. 1–5. Cited by: §3.
 Performance analysis and benchmarking of all-spin spiking neural networks (special session paper). In 2017 International Joint Conference on Neural Networks (IJCNN), Vol. , pp. 4557–4563. External Links: Document, ISSN 2161-4407 Cited by: §2.
 Probabilistic Deep Spiking Neural Systems Enabled by Magnetic Tunnel Junction. IEEE Transactions on Electron Devices 63 (7), pp. 2963–2970. External Links: Document, ISSN 0018-9383 Cited by: §2, §5.1.

 ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars. In 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), Vol. , pp. 14–26. Cited by: §2, §7.
 Efficient Digital Implementation of the Sigmoid Function for Reprogrammable Logic. IEE Proceedings - Computers and Digital Techniques 150 (6), pp. 403–411. External Links: Document, ISSN 1350-2387 Cited by: §4.1.
 A case of precision-tunable STT-RAM memory design for approximate neural network. In 2015 IEEE International Symposium on Circuits and Systems (ISCAS), Vol. . External Links: Document, ISSN 0271-4302 Cited by: §2, §5.1.
 Capturing the Dynamical Repertoire of Single Neurons with Generalized Linear Models. Neural Computation. External Links: Document Cited by: §3, §3.
 Spatio-Temporal Backpropagation for Training High-Performance Spiking Neural Networks. Frontiers in Neuroscience 12, pp. 331. External Links: Document, ISSN 1662-453X Cited by: §3.
 Cycling endurance optimization scheme for 1Mb STT-MRAM in 40nm technology. In 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers, Vol. , pp. 224–225. External Links: Document, ISSN 0193-6530 Cited by: §5.1.