SpinAPS: A High-Performance Spintronic Accelerator for Probabilistic Spiking Neural Networks

08/05/2020 · Anakha V Babu et al. · King's College London

We discuss a high-performance and high-throughput hardware accelerator for probabilistic Spiking Neural Networks (SNNs) based on Generalized Linear Model (GLM) neurons that uses binary STT-RAM devices as synapses and digital CMOS logic for neurons. The inference accelerator, termed "SpinAPS" for Spintronic Accelerator for Probabilistic SNNs, implements a principled direct learning rule for first-to-spike decoding without the need for conversion from pre-trained ANNs. The proposed solution is shown to achieve performance comparable with an equivalent ANN on handwritten digit and human activity recognition benchmarks. The inference engine, SpinAPS, is shown through software emulation tools to achieve a 4x performance improvement in terms of GSOPS/W/mm² when compared to an equivalent SRAM-based design. The architecture leverages probabilistic spiking neural networks that employ the first-to-spike decoding rule to make inference decisions at low latencies, achieving 75% of the test performance in as few as 4 algorithmic time steps on the handwritten digit benchmark. The accelerator also exhibits competitive performance with other memristor-based DNN/SNN accelerators and state-of-the-art GPUs.

1 Introduction

The explosive growth of the processing requirements of data-driven applications has resulted in intensive research efforts on alternative computing architectures that are more energy-efficient than traditional von Neumann processors. Unlike the dominant Deep Neural Networks (DNNs), which rely on real-valued information encoding, Spiking Neural Networks (SNNs) communicate through discrete and sparse tokens in time called spikes, mimicking the operation of the brain, and are hence projected to be ideal candidates for realizing energy-efficient hardware platforms for artificial intelligence applications. Moreover, SNNs are also well suited to real-time applications, as they take advantage of the temporal dimension for data encoding and processing. However, SNNs lag behind DNNs in terms of demonstrated computational capability due to the current lack of efficient learning algorithms Nandakumar and others (2018). Backpropagation techniques are ubiquitously adopted to train DNNs, but the discontinuous nature of spikes makes it non-trivial to derive such gradient-based rules for SNNs Roy et al. (2019). Here we explore a probabilistic framework for SNNs, which defines the outputs of spiking neurons as jointly distributed binary random processes. Such definitions help in applying maximum likelihood criteria and in deriving flexible learning rules without requiring backpropagation mechanisms, conversions, or other approximations from pre-trained ANNs Jang et al. (2019).

Conventional neural networks have millions of trainable parameters and are trained using von Neumann machines, where the memory and computation units are physically separated. The performance of these implementations is typically limited by the “von Neumann bottleneck” caused by the constant transfer of data between the processor and the memory. SNN implementations on these platforms become inherently slow due to the need to access data over time in order to carry out temporal processing. Hence, hardware accelerators are necessary to exploit the full potential of SNNs and also to develop efficient learning algorithms. Here, we discuss a hardware accelerator designed for implementing probabilistic SNNs that uses binary STT-RAM devices to realize synapses and digital CMOS for neuronal computations. We also evaluate the performance of probabilistic SNNs against equivalent ANNs on standard benchmark datasets.

2 Related Work

In an effort to build large-scale neuromorphic computing systems that can emulate the energy efficiency of the human brain, several computing platforms have implemented SNNs. While the Tianjic chip Pei and others (2019), Intel’s Loihi Davies and others (2018), and IBM’s TrueNorth Merolla et al. (2014) realize spiking neurons and make use of static random access memory (SRAM) to store the states of synapses and neurons, recent research efforts have proposed the use of tiled crossbar arrays of two-terminal nanoscale devices for implementing large-scale DNN systems Kuzum et al. (2013); Gokmen and Vlasov (2016); Ambrogio and others (2018); Ankit et al. (2019). Though analog memristor-based neural network accelerators are estimated to provide higher throughput than GPUs, there are several challenges associated with these in-memory computing solutions Gokmen and Vlasov (2016); Ankit et al. (2019); Shafiee et al. (2016); Chi et al. (2016). For instance, programming variability and stochasticity of device conductances, the need for accurate and area/power-hungry digital-to-analog converters (DACs) at the input and analog-to-digital converters (ADCs) at the output, and the additive noise contributed by these peripheral circuits pose significant challenges Ambrogio and others (2018); Babu et al. (2018); Kulkarni et al. (2019).

Among the emerging nanoscale memory devices, Spin Transfer Torque RAM (STT-RAM) has been explored for implementing synaptic weights for ANNs as well as SNNs in several previous studies Kulkarni et al. (2019); Sengupta et al. (2017); Kulkarni et al. (2020); Fukami and Ohno (2018), owing to its fast read/write characteristics, high endurance, and scalability Locatelli and others (2015); Wang and others (2015). The stochastic nature of spintronic devices has been leveraged to model neural transfer functions in a crossbar architecture Sengupta et al. (2016). These designs for SNNs have shown significant energy improvements over conventional CMOS designs. Notably, an all-spin neuromorphic processor design for SNNs, comprising spintronic synapses and neurons in an in-memory computing architecture, has been projected to give higher energy efficiency and speed compared to equivalent CMOS implementations Sengupta et al. (2017). In contrast to these prior implementations using spintronic devices, we consider the STT-RAM device as a binary storage unit, minimizing the area, power, and precision requirements of the peripheral logic circuits consisting of ADCs/DACs. Thus, each b-bit synaptic weight is represented using b binary STT-RAM devices. While the idea of using the STT-RAM device as a binary storage unit has been discussed earlier in Kulkarni et al. (2019, 2020), a deterministic spiking neuron model was used there for learning and inference based on a crossbar architecture. Instead, this work uses the STT-RAM memory as a regular storage unit, similar to the SRAM storage in conventional designs. To the best of our knowledge, this work presents the first design of a hardware accelerator based on STT-RAM devices to implement probabilistic SNNs based on a generalized linear model (GLM) for neurons.

2.1 Main Contributions

The main contributions of this paper are listed as follows.

  • We propose a hardware accelerator for inference, the Spintronic Accelerator for Probabilistic SNNs (SpinAPS), that integrates binary spintronic devices to store the synaptic states and digital CMOS neurons for computations. The potential of hardware implementations of Generalized Linear Model (GLM)-based probabilistic SNNs trained using the energy-efficient first-to-spike rule is evaluated for the first time in this work.

  • We evaluate the performance of probabilistic SNNs on two benchmarks - handwritten digit recognition and human activity recognition datasets, and show that SNN performance is comparable to that of equivalent ANNs.

  • SpinAPS achieves approximately 4x performance improvement in terms of GSOPS/W/mm² when compared to an equivalent design that uses SRAM to store the synaptic states.

This paper is organized as follows: In Section 3, we review the architecture of GLM-based probabilistic SNNs and explain the first-to-spike rule for classification. The algorithm optimization strategies for an efficient hardware implementation of the first-to-spike rule are considered in Section 4. The architecture details of the SpinAPS core and the mapping of the GLM kernels into the memory are discussed in Section 5. Section 6 details the relevant digital CMOS neuron logic blocks and the memory design. We then evaluate the performance of our hardware accelerator for different bit precision choices in Section 7. Finally, we conclude the paper in Section 8.

3 Review of GLM neuron model for SNNs

The majority of the neuron models used in SNN computations are deterministic in nature: the neuron emits a spike when its membrane potential crosses a threshold value. The non-differentiable nature of the spikes in SNNs makes it non-trivial to use the standard stochastic gradient descent (SGD) approach widely used for training ANNs. Therefore, several approaches have been used to realize deterministic SNNs, such as converting pre-trained ANNs or smoothing out the membrane potential to define the derivatives Rueckauer and Liu (2018); Wu et al. (2018); Lee et al. (2016); O’Connor and Welling (2016). In contrast to deterministic models, probabilistic neuron models are based on the linear-nonlinear Poisson model that is widely studied in the field of computational neuroscience Weber and Pillow (2017).

Figure 1: The architecture of the Generalized Linear Model (GLM) used for SNN learning in this work Bagheri and others (2018). A set of input neurons and one output neuron are shown for simplicity.

Generalized Linear Models (GLMs) yield a probabilistic framework for SNNs that is flexible and computationally tractable Jang et al. (2019); Weber and Pillow (2017). The basic architecture of the GLM-based probabilistic SNN used for training in this work is shown in Fig. 1. We focus on a 2-layer SNN, with presynaptic neurons encoding the input and one output neuron per output class. Each input neuron receives a spike train obtained from the input through rate encoding: the input is normalized and spikes are issued through a Bernoulli random process. For cases where the sign of the input is vital to learning, the negative sign is absorbed into the sign bit of the corresponding weights. The membrane potential u_{i,t} of output neuron i at time instant t can be expressed as

u_{i,t} = \sum_{j} \alpha_{j,i}^{T} x_{j,t-\tau_y}^{t-1} + \beta_{i}^{T} y_{i,t-\tau_y'}^{t-1} + \gamma_{i}        (1)

where α_{j,i} denotes the stimulus kernel; x_{j,t-τ_y}^{t-1} represents the input spike window containing the τ_y most recent input samples; β_i denotes the feedback kernel; y_{i,t-τ'_y}^{t-1} represents the output spike window with τ'_y samples; and γ_i is the bias parameter. Here j refers to the index of the pre-synaptic neuron and i refers to the index of the post-synaptic neuron. The stimulus and feedback kernels in the GLM are defined as weighted sums of fixed basis functions with learnable weights, and they are expressed as shown below.

\alpha_{j,i} = \mathbf{A}\, w_{j,i}, \qquad \beta_{i} = \mathbf{B}\, v_{i}        (2)

The matrices A = [a_1, ..., a_{K_α}] and B = [b_1, ..., b_{K_β}] collect the fixed basis vectors as columns. The prior work on GLM SNNs Bagheri and others (2018) uses real-valued raised-cosine basis vectors for the stimulus and the feedback kernels. The vectors w_{j,i} = [w_{j,i,1}, ..., w_{j,i,K_α}]^T and v_i = [v_{i,1}, ..., v_{i,K_β}]^T are the learnable weights in the network, with K_α and K_β denoting the numbers of basis functions. The spiking probability of an output neuron is then determined by applying a sigmoid non-linearity to its membrane potential. GLM neurons have reproduced a wide range of spiking neuronal behaviors observed in the human brain by appropriately tuning the stimulus and feedback kernels Weber and Pillow (2017). Learning rules for GLM SNNs based on rate and first-to-spike decoding have been derived in a number of works reviewed in Jang et al. (2019); Jang and Simeone (2019); Gardner and Grüning (2016).
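
To make these dynamics concrete, the following sketch computes the membrane potential of equation (1) and the resulting spike probabilities for one time step. It is a minimal NumPy model for illustration only; the array shapes, variable names, and random-number handling are assumptions, not the authors' implementation.

```python
import numpy as np

def glm_step(x_window, y_window, alpha, beta, gamma, rng):
    """One GLM time step for all output neurons.

    x_window : (N_x, tau_y)       binary input spikes in the stimulus window
    y_window : (N_y, tau_yp)      binary output spikes in the feedback window
    alpha    : (N_y, N_x, tau_y)  time-domain stimulus kernels alpha_{j,i}
    beta     : (N_y, tau_yp)      time-domain feedback kernels beta_i
    gamma    : (N_y,)             bias parameters gamma_i
    """
    # Membrane potential u_{i,t}: stimulus term + feedback term + bias, as in eq. (1)
    u = np.einsum('ijk,jk->i', alpha, x_window) + np.sum(beta * y_window, axis=1) + gamma
    p = 1.0 / (1.0 + np.exp(-u))                          # sigmoid spike probability
    spikes = (rng.random(p.shape) < p).astype(np.uint8)   # Bernoulli sampling
    return u, p, spikes

# Toy usage with arbitrary sizes (illustrative only)
rng = np.random.default_rng(0)
N_x, N_y, tau_y, tau_yp = 4, 3, 7, 5
u, p, s = glm_step(rng.integers(0, 2, (N_x, tau_y)),
                   rng.integers(0, 2, (N_y, tau_yp)),
                   rng.normal(size=(N_y, N_x, tau_y)),
                   rng.normal(size=(N_y, tau_yp)),
                   rng.normal(size=N_y), rng)
```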

Two main decoding strategies have been considered for the given SNN architecture - rate decoding and first-to-spike decoding Bagheri and others (2018). When using the rate decoding scheme for inference, the network decision is based on the neuron with the maximum spike count. For the first-to-spike scheme, a decision is made as soon as one of the output neurons spikes. It has been shown in Bagheri and others (2018) that the first-to-spike rule exhibits lower inference complexity than rate decoding due to its ability to make decisions early. Hence, we choose the first-to-spike scheme for our hardware optimization studies.

3.1 First-to-Spike Decoding

The fundamental idea of the first-to-spike scheme is illustrated in Fig. 2. The kernels of the GLM-based SNN are trained using the maximum likelihood criterion, which maximizes the probability of obtaining the first spike at the labeled neuron and no spikes at all other output neurons up to that time instant. This probability can be mathematically expressed as

p_{t}(c) = \sigma(u_{c,t}) \prod_{t'=1}^{t-1} \bar{\sigma}(u_{c,t'}) \prod_{i \neq c} \prod_{t'=1}^{t} \bar{\sigma}(u_{i,t'})        (3)

where c corresponds to the labeled neuron, u_{i,t} denotes the membrane potential, and σ(·) denotes the sigmoid activation function applied to the membrane potential. Also, σ̄(x) = 1 − σ(x). The weight update rules are derived by maximizing the log probability in equation (3) and are discussed in detail in Bagheri and others (2018). Note that the feedback kernels are not necessary for the first-to-spike rule, as the network dynamics need not be computed after the first spike is observed.
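
As an illustration of inference under first-to-spike decoding, the sketch below runs the stimulus-only GLM forward in time and stops as soon as any output neuron fires. It is a hypothetical NumPy model, not the SpinAPS hardware datapath; the tie-breaking rule and zero-padding convention are assumptions.

```python
import numpy as np

def fts_classify(x_spikes, alpha, gamma, rng):
    """First-to-spike inference: return (predicted class, decision time step).

    x_spikes : (N_x, T) binary input spike trains
    alpha    : (N_y, N_x, tau_y) stimulus kernels (feedback kernels are not needed)
    gamma    : (N_y,) biases
    """
    N_y, N_x, tau_y = alpha.shape
    T = x_spikes.shape[1]
    for t in range(T):
        # Spike window: the last tau_y input samples, zero-padded at the start
        lo = max(0, t + 1 - tau_y)
        window = np.zeros((N_x, tau_y))
        window[:, tau_y - (t + 1 - lo):] = x_spikes[:, lo:t + 1]
        u = np.einsum('ijk,jk->i', alpha, window) + gamma
        spikes = rng.random(N_y) < 1.0 / (1.0 + np.exp(-u))
        if spikes.any():
            # Decision at the first spike; ties broken by membrane potential (assumed)
            return int(np.argmax(np.where(spikes, u, -np.inf))), t
    # No spike observed within T steps: fall back to the highest potential (assumed)
    return int(np.argmax(u)), T - 1
```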

Figure 2: Illustration of the first-to-spike (FtS) decoding scheme for the GLM-based SNN on the handwritten digit benchmark. For example, when the trained network is presented with an image of the digit ‘7’, the output neuron corresponding to class 7 should ideally generate the first spike.

3.2 Datasets Used in This Study

Throughout this work, we evaluate the performance of probabilistic SNNs on handwritten digit recognition and human activity recognition (HAR) Anguita et al. (2013) datasets. Each input image in the handwritten digit database consists of grayscale pixels representing one of the ten digits, with separate training and test sets. The HAR dataset is a collection of features extracted from the time series of embedded sensors in a smartphone while subjects performed physical activities. The database has training and test samples corresponding to six types of physical activities.

4 Hardware-Software Co-optimization

In this section, we discuss the design choices that facilitate our hardware implementation. We start by observing the floating-point baseline accuracy of FtS decoding in Fig. 3, where the kernels are trained using the techniques demonstrated in Bagheri and others (2018). For the purpose of designing the inference engine, we assume that the kernels are fixed and that binary basis vectors are used in training instead of the cosine basis vectors. The use of binary basis vectors simplifies the kernel computations, as no multipliers are required in hardware. We first determine the floating-point baseline test accuracy as a function of the presentation time T by training the SNN for a fixed number of epochs.

Figure 3: The test accuracy of the GLM SNN on human activity recognition (HAR) and handwritten digit recognition. A comparable performance is achieved with respect to an ANN having the same architecture. Here the presentation time T is kept the same as the spike integration window τ_y.

It can be seen that the network performance improves with higher presentation times T and spike integration windows τ_y, reaching a maximum test accuracy at the largest settings considered. Note that for the handwritten digit benchmark, the accuracy of the GLM SNN trained using the FtS rule is on par with the accuracy of a 2-layer artificial neural network (ANN) with the same architecture. In the case of the HAR dataset, the GLM SNN likewise achieves a maximum test accuracy comparable with the ANN test accuracy. Hence, we chose the best-performing (T, τ_y) configurations for the handwritten digit and HAR benchmarks as the baseline. We now discuss software optimization strategies for implementing the first-to-spike scheme in an energy-efficient manner in hardware.

4.1 Sigmoid Activation Function

As shown in Fig. 1, the basic GLM neuron architecture uses a sigmoid activation function to determine the spike probability of the output neurons. The implementation of sigmoid functions in hardware is relatively complex, as it involves division and exponentiation, incurring significant area and power Tommiska (2003). Here we follow the piecewise linear (PWL) approximation demonstrated in Alippi and Storti-Gajani (1991) for implementing the sigmoid activation function. In the PWL approximation, the sigmoid is broken at integer break points such that the resulting function can be expressed in powers of 2. Considering the negative axis alone, the PWL approximation for an input x can be expressed as

\sigma(x) \approx \frac{1 - \hat{x}/2}{2^{\,z+1}}, \quad x \le 0        (4)

where z is the integer part of |x| and x̂ is the fractional part of |x|. Using the symmetry of the sigmoid function, the values on the positive x-axis can be obtained as σ(x) = 1 − σ(−x). The output of the PWL approximation is then compared with a pseudo-random number generated by a linear feedback shift register (LFSR), and a spike is generated if the PWL value is greater than the LFSR output. A 16-bit LFSR is used and is assumed to be shared among all the output neurons.
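
A minimal software model of this neuron back-end is sketched below, assuming the PWL form reconstructed in equation (4). The 16-bit LFSR polynomial is not specified in the paper, so the Galois taps used here (mask 0xB400) are an assumption; the comparison logic follows the description above.

```python
def pwl_sigmoid(x):
    """Piecewise-linear sigmoid approximation using only shifts and adds
    (negative axis per equation (4); mirrored with sigma(x) = 1 - sigma(-x))."""
    if x > 0:
        return 1.0 - pwl_sigmoid(-x)
    z = int(abs(x))          # integer part of |x|
    f = abs(x) - z           # fractional part of |x|
    return (1.0 - f / 2.0) / (2 ** (z + 1))

def lfsr16(state):
    """One step of a 16-bit Galois LFSR; the tap mask 0xB400 gives a
    maximal-length sequence (an assumed choice of polynomial)."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0xB400
    return state

def spike_from_potential(u, state):
    """Compare the PWL sigmoid of the membrane potential with a pseudo-random
    number from the LFSR; emit a spike if the PWL value is larger."""
    state = lfsr16(state)
    rand01 = state / float(1 << 16)   # scale the 16-bit value to [0, 1)
    return int(pwl_sigmoid(u) > rand01), state

# Example usage with an arbitrary nonzero LFSR seed
spike, seed = spike_from_potential(0.5, state=0xACE1)
```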

4.2 Quantization

We now aim at finding the minimum bit precision required for the learnt weights and biases in order to maintain close-to-baseline floating-point accuracy during inference. Here we follow a post-training quantization study to determine the minimum bit precision required for the learnable parameters. Starting with the baseline networks for the two datasets obtained using floating-point parameters, we quantize them into discrete levels and study the inference accuracy with the PWL approximation for the sigmoid. With b-bit quantization, one bit is reserved for the sign to represent both positive and negative parameters. We use a uniform quantizer with the quantization step given by the relation

\Delta_w = \frac{\max |w|}{2^{\,b-1} - 1}, \qquad \Delta_{\gamma} = \frac{\max |\gamma|}{2^{\,b-1} - 1}        (5)

respectively for the weights and biases. We also quantize the membrane potential and the output of the PWL activation to b bits. We summarize the inference performance of the GLM SNN as a function of bit precision in Fig. 4(a). Network performance degrades for lower choices of b, but we note that a moderate bit resolution is sufficient for maintaining close-to-baseline floating-point inference accuracy on the two benchmarks.
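
A minimal sketch of the post-training quantization step described above is given below. The symmetric rounding and clipping conventions are assumptions consistent with one sign bit and the step size of equation (5), not the authors' exact procedure; the array shapes are arbitrary.

```python
import numpy as np

def quantize_uniform(params, b):
    """Symmetric uniform quantizer with one sign bit, so magnitudes are
    represented on (b - 1) bits. Returns the dequantized values and the step."""
    levels = 2 ** (b - 1) - 1
    delta = np.max(np.abs(params)) / levels          # quantization step, cf. eq. (5)
    q = np.clip(np.round(params / delta), -levels, levels)
    return q * delta, delta

# Example: quantize trained weights and biases separately to 8 bits
rng = np.random.default_rng(1)
w_q, dw = quantize_uniform(rng.normal(size=(16, 16, 4)), b=8)   # arbitrary shapes
g_q, dg = quantize_uniform(rng.normal(size=16), b=8)
```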

Figure 4: (a) Test performance of the GLM SNN after quantization of the weights, with the PWL approximation and a 16-bit LFSR. (b) Cumulative distribution of the number of input samples classified as a function of the decision time. An early decision can be made in the first 4 algorithmic time steps for classifying 75% of the images in the handwritten digit benchmark.

Even though the input spike pattern lasts for T algorithmic time steps, the first-to-spike rule allows a decision to be made even before all the spikes have been presented to the network. It can be observed that, for the handwritten digit benchmark, around 75% of the samples are classified in the first 4 algorithmic time steps. Thus, the GLM SNN can leverage the ability of the first-to-spike rule to make decisions with reduced latency and fewer memory accesses.

5 Overview of SpinAPS architecture

As illustrated in Fig. 5, the core architecture of SpinAPS consists of binary spintronic devices to store the synaptic states and digital CMOS neurons to perform the neuronal functionality. The SpinAPS core accepts input spikes at every processor time step, reads the synapses corresponding to the spike integration window from the memory to compute the membrane potential, and applies the non-linear PWL activation function to determine the spike probability of the output neurons. A pseudo-random number generated by the LFSR is then used to determine whether a spike is issued, as explained in Section 4. The word width, and hence the memory capacity, of the banked STT-RAM memory used in the SpinAPS core can be designed based on the bit precision b required for the synapses.

Figure 5: The core architecture of SpinAPS, having binary STT-RAM devices to store the synaptic states and digital CMOS neurons. With a spike integration window of τ_y, the core can map 256 input and 256 output neurons.

SpinAPS co-locates the dense STT-RAM memory array and the digital CMOS computations in the same core, thereby avoiding the traditional von Neumann bottleneck. Following the principles demonstrated in IBM’s TrueNorth chip Merolla et al. (2014), a tiled architecture of SpinAPS cores, each having 256 neurons, can be used to realize a system with 1 million neurons, as illustrated in Fig. 6. We next discuss the basic characteristics of the spintronic synapses and how the parameters of probabilistic SNNs can be mapped into the memory.

Figure 6: The tiled architecture of SpinAPS obtained by arranging the neuro-synaptic cores, which communicate with each other using a packet-routing digital mesh network.

5.1 STT-RAM as Synapse

Figure 7: Illustration of basic cell structure of 1T/1MTJ STT-RAM with low (“0”) and high resistance (“1”) states.

The number of synapses generally scales as O(N²), where N is the number of neurons in each layer. Recently, spintronic devices have been explored for use as nanoscale synapses Locatelli and others (2015); Sengupta et al. (2016); Wang and others (2015), mainly due to their high read and write bandwidths, low power consumption, and excellent reliability Dorrance and others (2012); Yu and others (2013); Nebashi and others (2009). These spintronic devices are compatible with CMOS technology, and fast read/write operation has been demonstrated in the sub-ns regime Dong and others (2019); Noguchi and others (2015). Here we propose binary STT-RAM devices as the nanoscale electrical synapses for the SpinAPS core illustrated in Fig. 5. Structurally, STT-RAM uses a Magnetic Tunnel Junction (MTJ) with a pair of ferromagnets separated by a thin insulating layer. These devices have the ability to store either ‘0’ or ‘1’ depending on the relative magnetic orientation of the two ferromagnetic layers, as shown in Fig. 7. The memory array is typically configured based on the crossbar architecture, with an access transistor connected in series with each memory device to selectively read and program it.

5.2 Synaptic Memory Architecture

Here we discuss the mapping of the learnable parameters of the GLM-based SNN, namely the stimulus kernel weights w_{j,i} and the bias parameters γ_i, into the spintronic memory to achieve accelerated hardware performance.

5.2.1 Kernel Mapping

Figure 8: Kernel mapping of the GLM SNN to the spintronic memory of the SpinAPS core, showing that the devices on τ_y word lines are used to represent the kernel parameters of each input neuron.

Here we describe the mapping of b-bit synapses for the baseline configuration; hence the word width (vertical lines in Fig. 8) required for the memory is 256·b bits. Each input neuron generates a bit pattern of length τ_y, hence τ_y unique sets of kernel weights are provided for every input neuron, requiring 256·τ_y word lines in the array. Thus, a network with 256 input and 256 output neurons can be mapped to a memory array of 256·τ_y word lines, each storing 256 b-bit weights. For example, if the bit pattern generated by the first input neuron is “1010010”, then the synaptic weights corresponding to its 1st, 3rd, and 6th word lines are read sequentially; a hypothetical software model of this address generation is shown after this paragraph. The addresses corresponding to these word lines are stored in registers, indicated as address storage registers in Fig. 5. These synaptic weights are added at the output neurons to compute the membrane potential. One additional word line is sufficient for mapping the bias parameters, as 256·b bits is equivalent to the word width of the core. As the bias determines the baseline firing rate of the output neurons, the word line voltage corresponding to the bias line is kept high, indicating an “always read” condition.
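
The address-generation step described above can be modeled in a few lines, as sketched below. The block layout (each input neuron owning τ_y consecutive word lines) and the zero-based addressing are assumptions used for illustration; only the mapping from ‘1’ bits to word-line reads follows the text.

```python
def active_wordlines(neuron_index, spike_window, tau_y):
    """Return the word-line addresses to read for one input neuron.

    Each input neuron owns tau_y consecutive word lines (assumed layout);
    a '1' at position k of the spike window selects the k-th of them.
    For spike_window = "1010010" this yields offsets 0, 2 and 5, i.e. the
    1st, 3rd and 6th word lines of that neuron's block.
    """
    base = neuron_index * tau_y               # start of this neuron's block
    return [base + k for k, bit in enumerate(spike_window) if bit == "1"]

# Example: first input neuron (index 0) with the bit pattern from the text
addrs = active_wordlines(0, "1010010", tau_y=7)   # -> [0, 2, 5]
```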

6 High-level Design

6.1 Neuron Implementation

We now describe the digital CMOS implementation of the relevant blocks. The baseline design assumes 8-bit precision and is synthesized using a TSMC CMOS logic process running at a fixed clock frequency.

6.1.1 Generation of spike window

As discussed in Section 3, the normalized inputs are transformed into spike trains of length T using a Bernoulli random process. At each time instant, incoming spikes are latched at the input neuron (marked as ‘In. Reg’, a serial-in, parallel-out shift register in Fig. 5), translated into an activation pattern of length τ_y, and applied as read-enable signals to the word lines associated with the corresponding kernel weights. The spike window generator circuit uses a multiplexer to select the bit pattern of length τ_y from the input register, and its select lines are configured by the controller at every time step. For example, at the first time instant the select line is “000” and, as no input spikes have arrived yet, the output of the multiplexer is “0000000”. At subsequent time instants the select line is incremented, and the multiplexer output carries the bits representing the presence or absence of spikes at the preceding time instants, and so on. As discussed in Section 5, the synapses associated with the input neurons are read sequentially, and hence the logic circuit for the spike window generator can be shared among all the input neurons. The synaptic weights are stored in registers, and the sign bit of the weights can be optionally flipped depending on the sign of the input before computing the membrane potential.
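
A behavioral software model of the input register and spike window generator is sketched below; it abstracts away the multiplexer select logic and is an illustrative assumption, not an RTL description.

```python
from collections import deque

class SpikeWindowGenerator:
    """Behavioral model of the input register and spike-window generator:
    incoming spikes are shifted in serially, and at every processor time step
    the last tau_y bits are presented as read-enable signals for the word
    lines of the corresponding kernel weights."""

    def __init__(self, tau_y):
        self.window = deque([0] * tau_y, maxlen=tau_y)

    def push(self, spike_bit):
        """Latch the spike arriving at the current time instant."""
        self.window.append(spike_bit)

    def read_enables(self):
        """Bit pattern of length tau_y used as word-line read enables."""
        return list(self.window)

# Example: before any spike arrives the window is all zeros ("0000000")
gen = SpikeWindowGenerator(tau_y=7)
print(gen.read_enables())   # [0, 0, 0, 0, 0, 0, 0]
gen.push(1)                 # a spike arrives at the current time step
print(gen.read_enables())   # [0, 0, 0, 0, 0, 0, 1]
```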

6.1.2 Piecewise Linear Approximation (PWL) for sigmoid calculation

Our GLM neuron model uses a non-linear sigmoid activation function for generating the spike probabilities of the output neurons based on the value of the membrane potential. Here we adopt the piecewise linear approximation (shown in equation (4)) for the sigmoid, implemented using adders and shift registers as described in Alippi and Storti-Gajani (1991). The 18-bit membrane potential is clipped to an 8-bit fixed-point number in the range [-8, 8] before being applied to the Piecewise Linear Approximation (PWL) sigmoid generator, which is shared among the output neurons. The clipped fixed-point representation assumes 1 bit for the sign, 3 bits for the integer part, and 4 bits for the fractional part. The output of the sigmoid generator is an unsigned 8-bit number whose values lie in the range [0, 1), and it is compared with an 8-bit pattern from the 16-bit LFSR (Linear Feedback Shift Register) to generate output spikes.

6.2 Design of the synaptic memory

Figure 9: Banked STT-RAM organization for mapping the synapses of GLM SNN having 256 input and 256 output neurons with 8-b precision weights.

We describe the memory organization for the baseline design with 8-bit precision, as discussed in Section 4; the required memory capacity scales linearly with the bit precision b. We design and analyze the STT-RAM based synaptic memory using DESTINY, a comprehensive tool for modeling emerging memory technologies Poremba and others (2015). An STT-RAM cell architecture with the 1T/1MTJ configuration is simulated using the parameters listed in Table 1 Lin and others (2009). In Table 1, V_read is the read voltage, t_read is the read pulse width, I_prog is the programming current, and t_write denotes the write pulse width. Representative values for the read and write pulse widths are assumed based on experimentally reported chip-scale demonstrations of STT-RAM Dong and others (2019); Lin and others (2009).

Parameter   Cell Area (F²)   R_low (Ω)   R_high (Ω)   V_read (mV)   t_read (ns)   I_prog (µA)   t_write (ns)
Value       24               2500        5000         80            5             150           10
Table 1: Simulation parameters for the STT-RAM array simulated using DESTINY.
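
As a rough sanity check on the Table 1 parameters, the back-of-the-envelope estimate below computes per-cell read and write energies from the listed voltage, current, resistance, and pulse-width values. This is not the DESTINY simulation result; it ignores bit-line, sense-amplifier, and peripheral energy, and assumes the programming current is specified in microamperes.

```python
# Rough per-cell energies from the Table 1 parameters (illustrative estimate only).
V_READ = 80e-3                   # read voltage (V)
T_READ = 5e-9                    # read pulse width (s)
R_LOW, R_HIGH = 2500.0, 5000.0   # MTJ resistance states (ohm)
I_PROG = 150e-6                  # programming current (A), assumed microamperes
T_WRITE = 10e-9                  # write pulse width (s)

# Read: V^2 / R dissipated in the cell during the read pulse (worst case: R_low)
e_read_cell = (V_READ ** 2 / R_LOW) * T_READ        # ~1.3e-14 J per cell
# Write: I^2 * R dissipated during the write pulse (worst case: R_high)
e_write_cell = (I_PROG ** 2) * R_HIGH * T_WRITE     # ~1.1e-12 J per cell

word_width_bits = 256 * 8   # one word line of the 8-bit baseline design
print(f"read  ~{e_read_cell * word_width_bits * 1e12:.1f} pJ per word line")
print(f"write ~{e_write_cell * 1e12:.2f} pJ per cell")
```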

The optimized solution of the memory mapping for 8-bit precision is shown in Fig. 9. DESTINY reports the read energy and latency per word line for the 8-bit design, as well as the area efficiency, total synaptic area, and power of the simulated STT-RAM array.

7 Performance evaluation

We benchmark our accelerator design using a commonly used performance metric: the number of billions of synaptic operations that can be performed per second per watt (GSOPS/W), and additionally normalized per unit area (GSOPS/W/mm²). A synaptic operation refers to reading one active word line (256 b-bit synapses), computing the spike probability, and then issuing the output spike. We compare these performance metrics of SpinAPS with an equivalent SRAM-based design for different bit precisions (shown in Table 2), with the neuron logic implementation kept the same. The SRAM memory is simulated using DESTINY Poremba and others (2015) and can be clocked at a higher frequency than the STT-RAM memory. Independent of bit precision, SpinAPS sustains a fixed throughput in GSOPS, while the SRAM-based design achieves a higher throughput due to its higher clock rate. For 8-bit precision, the SpinAPS core is nevertheless better in both synaptic power and synaptic area when compared to SRAM. For the values reported in Table 2, an overhead is included for the spike routing infrastructure between the cores, and an additional overhead is added to account for the area and power of the core's controller Merolla et al. (2014); Moradi and Manohar (2018); Moradi and others (2018).

Precision (bits)   GSOPS/W (SRAM)   GSOPS/W (STT-RAM)   GSOPS/W/mm² (SRAM)   GSOPS/W/mm² (STT-RAM)
5                  353              474                 177                  559
6                  283              412                 119                  415
7                  230              366                 83                   322
8                  193              311                 61                   239
Table 2: Performance comparison of STT-RAM and SRAM-based designs.

It can be noticed that the SpinAPS core achieves a maximum performance improvement of approximately 3.9x and a minimum of approximately 3.2x in terms of GSOPS/W/mm² when compared to an equivalent SRAM-based design, for the 8-bit and 5-bit precision choices respectively. The average energy per algorithmic time step of the SpinAPS core and the total area of the design as a function of the bit precision are shown in Fig. 10.
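
The improvement factors quoted above can be recomputed directly from Table 2; the short script below simply divides the tabulated STT-RAM and SRAM GSOPS/W/mm² columns.

```python
# Recompute the STT-RAM vs. SRAM improvement in GSOPS/W/mm^2 from Table 2.
table2 = {  # precision: (GSOPS/W SRAM, GSOPS/W STT-RAM, GSOPS/W/mm^2 SRAM, GSOPS/W/mm^2 STT-RAM)
    5: (353, 474, 177, 559),
    6: (283, 412, 119, 415),
    7: (230, 366,  83, 322),
    8: (193, 311,  61, 239),
}
for b, (_, _, sram_den, stt_den) in sorted(table2.items()):
    print(f"{b}-bit: {stt_den / sram_den:.1f}x improvement in GSOPS/W/mm^2")
# -> 5-bit: 3.2x ... 8-bit: 3.9x (roughly the 4x quoted in the abstract)
```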

Figure 10: (a) Average energy per processor time step of SpinAPS for each of the bit precision choices on the handwritten digit benchmark. One processor time step corresponds to reading all the active word lines (on average), computing the membrane potentials, and generating the output spikes. (b) Area of the SpinAPS core as a function of bit precision.

We now compare the performance of SpinAPS with an inference accelerator design employing STT-RAM devices for SNNs, with memristor-based inference engines for DNNs, and with GPUs. The performance improvement projected for SpinAPS is comparable to what has been recently reported in Kulkarni et al. (2019), which uses STT-RAM devices in integrated crossbar arrays. Furthermore, the SpinAPS design presented here, when extrapolated to the 14 nm technology node Bohr (2014), achieves a GSOPS/W figure comparable with recent memristor-based DNN inference engines such as ISAAC Shafiee et al. (2016) and PUMA Ankit et al. (2019). Note that when compared to state-of-the-art GPUs like the Tesla V100, SpinAPS achieves improvements in both GSOPS/W and GSOPS/mm² [29]. While implementations employing analog phase change memory (PCM) devices in crossbar arrays have been projected to provide two orders of magnitude improvement in energy efficiency over GPUs through extrapolated and aggressive assumptions Ambrogio and others (2018), the SpinAPS projections are based on more realistic design choices that are representative of experimental demonstrations.

8 Conclusions

In this paper, we proposed SpinAPS, a hardware accelerator that uses binary STT-RAM devices and digital CMOS neurons to perform inference for probabilistic SNNs. Probabilistic SNNs based on the GLM neuron model are trained using the first-to-spike rule, and their performance is benchmarked on two standard datasets, exhibiting comparable performance with an equivalent ANN. We discussed the design of the basic elements of the SpinAPS core, considering different bit precision choices and the associated trade-offs between performance and hardware constraints. SpinAPS leverages the ability of the first-to-spike rule to make decisions with low latency, achieving approximately 4x performance improvement in terms of GSOPS/W/mm² compared to an equivalent SRAM-based design.

Acknowledgment

This research was supported in part by the National Science Foundation grant #1710009 and the CAMPUSENSE project grant from CISCO Systems Inc. Resources of the High-Performance Computing facility at NJIT were used in this work.

References

  • C. Alippi and G. Storti-Gajani (1991) Simple approximation of sigmoidal functions: realistic design of digital neural networks capable of learning. In IEEE International Sympoisum on Circuits and Systems, Vol. . External Links: Document, ISSN Cited by: §4.1, §6.1.2.
  • S. Ambrogio et al. (2018) Equivalent-accuracy accelerated neural-network training using analogue memory. Nature 558. External Links: ISSN 1476-4687, Document Cited by: §2, §7.
  • D. Anguita, A. Ghio, L. Oneto, X. Parra, and J. L. Reyes-Ortiz (2013) Energy Efficient Smartphone-Based Activity Recognition using Fixed-Point Arithmetic.. J. UCS 19 (9), pp. 1295–1314. Cited by: §3.2.
  • A. Ankit, I. E. Hajj, et al. (2019) PUMA: A Programmable Ultra-Efficient Memristor-Based Accelerator for Machine Learning Inference. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’19, New York, NY, USA, pp. 715–731. External Links: ISBN 9781450362405, Document Cited by: §2, §7.
  • A. V. Babu, S. Lashkare, U. Ganguly, and B. Rajendran (2018) Stochastic Learning in Deep Neural Networks based on Nanoscale PCMO device characteristics. Neurocomputing 321, pp. 227 – 236. External Links: ISSN 0925-2312, Document Cited by: §2.
  • A. Bagheri et al. (2018) Training Probabilistic Spiking Neural Networks with First- To-Spike Decoding. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. , pp. . External Links: ISSN 2379-190X Cited by: Figure 1, §3.1, §3, §3, §4.
  • M. Bohr (2014) 14 nm Process Technology: Opening New Horizons. Note: Intel Technology Development, Available online. https://www.intel.com/content/www/us/en/architecture-and-technology/bohr-14nm-idf-2014-brief.html Cited by: §7.
  • P. Chi, S. Li, C. Xu, T. Zhang, J. Zhao, Y. Liu, Y. Wang, and Y. Xie (2016) PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory. In 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), Vol. , pp. 27–39. Cited by: §2.
  • M. Davies et al. (2018) Loihi: a Neuromorphic Manycore Processor with On-Chip Learning. IEEE Micro 38 (1), pp. 82–99. External Links: Document, ISSN 0272-1732 Cited by: §2.
  • Q. Dong et al. (2019) A 1-Mb 28-nm 1T1MTJ STT-MRAM with Single-Cap Offset-Cancelled Sense Amplifier and In Situ Self-Write-Termination. IEEE Journal of Solid-State Circuits 54 (1), pp. 231–239. External Links: Document, ISSN 0018-9200 Cited by: §5.1, §6.2.
  • R. Dorrance et al. (2012) Scalability and Design-Space Analysis of a 1T-1MTJ Memory Cell for STT-RAMs. IEEE Transactions on Electron Devices 59 (4), pp. 878–887. External Links: Document, ISSN 0018-9383 Cited by: §5.1.
  • S. Fukami and H. Ohno (2018) Perspective: spintronic synapse for artificial neural network. Journal of Applied Physics 124 (15), pp. 151904. External Links: Document Cited by: §2.
  • B. Gardner and A. Grüning (2016) Supervised learning in spiking neural networks for precise temporal encoding. PLOS ONE 11 (8), pp. 1–28. External Links: Document Cited by: §3.
  • T. Gokmen and Y. Vlasov (2016) Acceleration of Deep Neural Network Training with Resistive Cross-Point Devices: Design Considerations. Frontiers in Neuroscience 10, pp. 333. External Links: Document, ISSN 1662-453X Cited by: §2.
  • H. Jang, O. Simeone, B. Gardner, and A. Gruning (2019) An Introduction to Probabilistic Spiking Neural Networks: Probabilistic Models, Learning Rules, and Applications. IEEE Signal Processing Magazine 36 (6), pp. 64–77. Cited by: §1, §3.
  • H. Jang and O. Simeone (2019) Training Dynamic Exponential Family Models with Causal and Lateral Dependencies for Generalized Neuromorphic Computing. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vol. , pp. . External Links: Document, ISSN 1520-6149 Cited by: §3.
  • S. R. Kulkarni, D. V. Kadetotad, S. Yin, J. Seo, and B. Rajendran (2019) Neuromorphic hardware accelerator for snn inference based on stt-ram crossbar arrays. In 2019 26th IEEE International Conference on Electronics, Circuits and Systems (ICECS), Vol. , pp. 438–441. Cited by: §2, §2, §7.
  • S. R. Kulkarni, S. Yin, J. Seo, and B. Rajendran (2020) An On-Chip Learning Accelerator for Spiking Neural Networks using STT-RAM Crossbar Arrays. In 2020 Design, Automation Test in Europe Conference Exhibition (DATE), Vol. , pp. 1019–1024. Cited by: §2.
  • D. Kuzum, S. Yu, and H. P. Wong (2013) Synaptic electronics: materials, devices and applications. Nanotechnology 24 (38), pp. . External Links: Document Cited by: §2.
  • J. H. Lee, T. Delbruck, and M. Pfeiffer (2016) Training deep spiking neural networks using backpropagation. Frontiers in Neuroscience 10, pp. 508. External Links: Document, ISSN 1662-453X Cited by: §3.
  • C. J. Lin et al. (2009) 45nm low power CMOS logic compatible embedded STT MRAM utilizing a reverse-connection 1T/1MTJ cell. In 2009 IEEE International Electron Devices Meeting (IEDM), Vol. , pp. 1–4. External Links: ISSN 0163-1918 Cited by: §6.2.
  • N. Locatelli et al. (2015) Spintronic devices as key elements for energy-efficient neuroinspired architectures. In 2015 Design, Automation Test in Europe Conference Exhibition (DATE), Vol. , pp. 994–999. External Links: Document, ISSN 1530-1591 Cited by: §2, §5.1.
  • P. A. Merolla, J. V. Arthur, et al. (2014) A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345 (6197), pp. 668–673. External Links: Document, ISSN 0036-8075 Cited by: §2, §5, §7.
  • S. Moradi et al. (2018) A Scalable Multicore Architecture with Heterogeneous Memory Structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs). IEEE Transactions on Biomedical Circuits and Systems (), pp. . External Links: Document, ISSN 1932-4545 Cited by: §7.
  • S. Moradi and R. Manohar (2018) The Impact of On-chip Communication on Memory Technologies for Neuromorphic Systems. Journal of Physics D: Applied Physics 52 (1), pp. 014003. External Links: Document Cited by: §7.
  • S. R. Nandakumar et al. (2018) Building brain-inspired computing systems: examining the role of nanoscale devices. IEEE Nanotechnology Magazine 12 (3), pp. 19–35. External Links: Document, ISSN 1932-4510 Cited by: §1.
  • R. Nebashi et al. (2009) A 90nm 12ns 32Mb 2T1MTJ MRAM. In 2009 IEEE International Solid-State Circuits Conference - Digest of Technical Papers, Vol. , pp. 462–463,463a. External Links: Document, ISSN 0193-6530 Cited by: §5.1.
  • H. Noguchi et al. (2015) A 3.3ns-access-time 71.2W/MHz 1Mb embedded STT-MRAM using physically eliminated read-disturb scheme and normally-off memory architecture. In IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers, Vol. , pp. . External Links: Document, ISSN 0193-6530 Cited by: §5.1.
  • [29] NVIDIA Volta GV100 12nm FinFET GPU Detailed-Tesla V100 Specifications Include 21 Billion Transistors, 5120CUDA Cores, 16GB HBM2 with 900 GB/s Bandwidth. Note: Available online. https://wccftech.com/nvidia-volta-gv100-gpu-tesla-v100-architecture-specifications-deep-dive/ Cited by: §7.
  • P. O’Connor and M. Welling (2016) Deep Spiking Networks. CoRR abs/1602.08323. External Links: 1602.08323 Cited by: §3.
  • J. Pei et al. (2019) Towards artificial general intelligence with hybrid tianjic chip architecture. Nature 572 (7767), pp. 106–111. External Links: ISSN 1476-4687, Document Cited by: §2.
  • M. Poremba et al. (2015) DESTINY: a tool for modeling emerging 3D NVM and eDRAM caches. In 2015 Design, Automation Test in Europe Conference Exhibition (DATE), Vol. . External Links: Document, ISSN 1530-1591 Cited by: §6.2, §7.
  • K. Roy, A. Jaiswal, and P. Panda (2019) Towards spike-based machine intelligence with neuromorphic computing. Nature 575 (7784), pp. 607–617. External Links: ISSN 1476-4687, Document Cited by: §1.
  • B. Rueckauer and S. Liu (2018) Conversion of analog to spiking neural networks using sparse temporal coding. In 2018 IEEE International Symposium on Circuits and Systems (ISCAS), Vol. , pp. 1–5. Cited by: §3.
  • A. Sengupta, A. Ankit, and K. Roy (2017) Performance analysis and benchmarking of all-spin spiking neural networks (special session paper). In 2017 International Joint Conference on Neural Networks (IJCNN), Vol. , pp. 4557–4563. External Links: Document, ISSN 2161-4407 Cited by: §2.
  • A. Sengupta, M. Parsa, B. Han, and K. Roy (2016) Probabilistic Deep Spiking Neural Systems Enabled by Magnetic Tunnel Junction. IEEE Transactions on Electron Devices 63 (7), pp. 2963–2970. External Links: Document, ISSN 0018-9383 Cited by: §2, §5.1.
  • A. Shafiee, A. Nag, N. Muralimanohar, R. Balasubramonian, J. P. Strachan, M. Hu, R. S. Williams, and V. Srikumar (2016) ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars. In 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pp. 14–26. Cited by: §2, §7.
  • M. T. Tommiska (2003) Efficient Digital Implementation of the Sigmoid function for Reprogrammable Logic. IEE Proceedings - Computers and Digital Techniques 150 (6), pp. 403–411. External Links: Document, ISSN 1350-2387 Cited by: §4.1.
  • Y. Wang et al. (2015) A case of precision-tunable STT-RAM memory design for approximate neural network. In 2015 IEEE International Symposium on Circuits and Systems (ISCAS), Vol. . External Links: Document, ISSN 0271-4302 Cited by: §2, §5.1.
  • A. I. Weber and J. W. Pillow (2017) Capturing the Dynamical Repertoire of Single Neurons with Generalized Linear Models. Neural Computation. External Links: Document Cited by: §3, §3.
  • Y. Wu, L. Deng, G. Li, J. Zhu, and L. Shi (2018) Spatio-Temporal Backpropagation for Training High-Performance Spiking Neural Networks. Frontiers in Neuroscience 12, pp. 331. External Links: Document, ISSN 1662-453X Cited by: §3.
  • H. Yu et al. (2013) Cycling endurance optimization scheme for 1Mb STT-MRAM in 40nm technology. In 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers, Vol. , pp. 224–225. External Links: Document, ISSN 0193-6530 Cited by: §5.1.