Effects of VLSI Circuit Constraints on Temporal-Coding Multilayer Spiking Neural Networks

06/18/2021 · Yusuke Sakemi, et al.

The spiking neural network (SNN) has been attracting considerable attention not only as a mathematical model for the brain, but also as an energy-efficient information processing model for real-world applications. In particular, SNNs based on temporal coding are expected to be much more efficient than those based on rate coding, because the former require substantially fewer spikes to carry out tasks. As SNNs are continuous-state and continuous-time models, it is favorable to implement them with analog VLSI circuits. However, the construction of the entire system with continuous-time analog circuits would be infeasible when the system size is very large. Therefore, mixed-signal circuits must be employed, and the time discretization of the spikes and the quantization of the synaptic weights become necessary. Moreover, the analog VLSI implementation of SNNs exhibits non-idealities, such as the effects of noise and device mismatches, as well as other constraints arising from the analog circuit operation. In this study, we investigated the effects of the time discretization and the weight quantization on the performance of SNNs. Furthermore, we elucidated the effects of the lower bound of the membrane potentials and of the temporal fluctuation of the firing threshold. Finally, we propose an approach for optimally mapping mathematical SNN models to analog circuits with discretized time.


1 Introduction

Spiking neural networks (SNNs) are expected to be energy-efficient models because they can process information in the form of spikes in an event-driven manner. Moreover, the performance of SNNs has been demonstrated to be comparable to that of artificial neural networks (ANNs) with deep learning in relatively small-scale image recognition tasks [21, 18]. The information representations in SNNs can be broadly classified into two schemes: rate coding and temporal coding. In rate coding, the information is contained in the rate of the spikes, whereas in temporal coding it is contained in their precise timing. Although rate coding is closely linked to ANNs [5, 11], in which the neuron states are represented by analog values, temporal coding can efficiently exploit the information in the temporal dynamics of the membrane potentials and synaptic currents, which are not considered in ANNs [7]. In particular, time-to-first-spike (TTFS) coding, a type of temporal coding, is expected to be the most efficient because the output of a neuron is represented by the timing of a single spike [2, 16, 3, 19].

As SNNs are expressed as continuous-state and continuous-time systems, analog VLSI implementation is favorable. However, constructing the entire system, including the input/output parts, with continuous-time analog circuits would be infeasible when the system size is very large, because of the inflexibility in system design and the difficulty of analog circuit design. Instead, the neuron activations must be discretized in the time domain and the synaptic weights must be quantized. Therefore, many studies on SNNs have adopted fully digital VLSI implementations [14, 4] or mixed-signal VLSI implementations [15].

In this study, we temporally discretized the spike timings of SNNs trained with TTFS coding and quantized the synaptic weights to facilitate VLSI implementation. Moreover, we investigated the effects of the lower bound of the membrane potentials and of the temporal fluctuation of the firing thresholds, which are problematic in analog VLSI implementation. Section 2 introduces related works, Section 3 presents the investigation into the above-mentioned effects, and Section 4 demonstrates the optimal mapping of SNNs to analog VLSI circuits.

2 Related works

Several supervised learning algorithms for multilayer SNNs based on temporal coding have been proposed in discrete-time systems [8, 10]. However, these algorithms require approximations to obtain the derivative of the spike timing with respect to the weights when deriving the backpropagation algorithms. In our approach, the SNNs are trained in continuous time and then discretized, which does not require any approximation to derive the backpropagation algorithm apart from the spike vanishing problem [2]; therefore, the training of the SNNs is easier. Moreover, as we demonstrate later, the discretized time step of the spike timing is important for mapping the SNNs to VLSI circuits optimally. In our method, once the SNNs have been trained in a continuous-time system, an arbitrary time step can be selected for the spike timing without retraining, whereas in the other approaches the optimization of the time step is difficult because the time step must be fixed when training the SNNs [8, 10]. Furthermore, we note that it is common in ANNs to quantize the activations of neurons after training [20].

Certain research groups have designed VLSI circuits for SNNs based on TTFS coding [6, 17]. Göltz et al. designed a mixed-signal circuit and demonstrated SNNs based on TTFS coding [6]. Oh et al. designed analog SNN circuits with analog memory based on TTFS coding [17]. Although Kheradpisheh et al. [9] and Sakemi et al. [19] investigated the effects of spike timing jitter on performance, the effects of the time discretization of the spikes have not been discussed. Sakemi et al. [19] and Oh et al. [17] investigated the effects of the fixed scattering of the firing thresholds owing to device mismatches. However, they did not discuss the effects of the temporal fluctuation of the firing thresholds. Oh et al. [17] investigated the effects of variations in the synaptic weights, and Kheradpisheh et al. [9] binarized the weights of SNNs with TTFS coding. However, they did not discuss the effects of weight quantization. To the best of our knowledge, the effects of the lower bound of the membrane potentials in SNNs based on TTFS coding have not been investigated in previous works [2, 16, 3, 19, 6, 17].

3 Effects of non-idealities arising from VLSI circuit constraints

3.1 Models

Figure 1: (a) Schematic diagram of the $i$th neuron in the $l$th layer receiving spikes from neurons in the $(l-1)$th layer at $t_1^{(l-1)}, t_2^{(l-1)}, \ldots$, and firing at $t_i^{(l)}$. (b) Schematic diagram of the time evolution of the membrane potential of the $i$th neuron in the $l$th layer with (red dashed line) and without (black solid line) a lower bound of the membrane potential. The upper and lower horizontal black dotted lines represent the values of $V_{\mathrm{th}}$ and $V_{\mathrm{min}}$, respectively.

We adopted a simple integrate-and-fire neuron model in a multilayer network that is suitable for VLSI implementation [13, 19]:

\frac{d v_i^{(l)}(t)}{dt} = \sum_j w_{ij}^{(l)}\, \theta\!\left(t - t_j^{(l-1)}\right),   (1)
v_i^{(l)}\!\left(t_i^{(l)}\right) = V_{\mathrm{th}},   (2)

where $v_i^{(l)}(t)$ is the membrane potential of the $i$th neuron in the $l$th layer, $\theta(\cdot)$ is the step function, $w_{ij}^{(l)}$ is the weight from the $j$th neuron in the $(l-1)$th layer to the $i$th neuron in the $l$th layer, and $t_j^{(l-1)}$ is the spike timing of the $j$th neuron in the $(l-1)$th layer. When the membrane potential exceeds the firing threshold $V_{\mathrm{th}}$, the neuron generates a spike. Then, the membrane potential is reset and does not change again. Fig. 1 shows a schematic diagram of the neuron model.
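As an illustration, the following minimal Python sketch (not the implementation used in the experiments) computes the firing time defined by (1) and (2) exactly: between consecutive input spikes the membrane potential is piecewise linear, with slope equal to the sum of the weights of the inputs received so far, so the threshold crossing can be located analytically.

```python
import numpy as np

def firing_time(in_times, weights, v_th=1.0):
    """Exact firing time of one neuron obeying (1)-(2), or np.inf if silent."""
    in_times = np.asarray(in_times, dtype=float)
    weights = np.asarray(weights, dtype=float)
    keep = np.isfinite(in_times)              # drop "no spike" inputs (np.inf)
    in_times, weights = in_times[keep], weights[keep]
    order = np.argsort(in_times)
    v, slope, t_prev = 0.0, 0.0, 0.0
    for t_k, w_k in zip(in_times[order], weights[order]):
        # Check for a threshold crossing on the linear segment [t_prev, t_k).
        if slope > 0.0 and v + slope * (t_k - t_prev) >= v_th:
            return t_prev + (v_th - v) / slope
        v += slope * (t_k - t_prev)
        slope += w_k                          # arriving spike switches its weight on
        t_prev = t_k
    if slope > 0.0:                           # crossing after the last input
        return t_prev + (v_th - v) / slope
    return np.inf                             # the neuron never fires
```

In a multilayer network, this function would be applied neuron by neuron and layer by layer, feeding the resulting firing times forward.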

For image processing in SNNs, the normalized intensity $x_i$ of the $i$th pixel of an image is converted into an input spike, with the timing given by

t_i^{(0)} = \tau_{\mathrm{in}} \left(1 - x_i\right),   (3)

where $\tau_{\mathrm{in}}$ is the time span of the input spikes; it is set to 5 ms. When $x_i = 0$, the $i$th input spike is not generated. To improve the generalization ability, Gaussian noise was introduced into the timing of the input spikes in the training phase. The SNNs were trained with the backpropagation algorithm described in [19].
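An illustrative sketch of this input coding, assuming the conversion $t_i^{(0)} = \tau_{\mathrm{in}}(1 - x_i)$ of (3) with zero-intensity pixels emitting no spike:

```python
import numpy as np

def encode_image(x, tau_in=5e-3):
    """Map normalized pixel intensities x in [0, 1] to input spike times (3).

    Brighter pixels fire earlier; pixels with x == 0 emit no spike,
    which is marked here with np.inf.
    """
    x = np.asarray(x, dtype=float)
    t_in = tau_in * (1.0 - x)
    t_in[x == 0.0] = np.inf   # the "no spike" state
    return t_in
```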

We investigated the following effects in the VLSI implementation of SNNs: (i) the discretization of the spike timing of the neurons, (ii) the quantization of the weights, (iii) the lower bound of the membrane potentials, and (iv) the temporal fluctuation of the firing threshold. In the remainder of this section, we explain the experimental setup used to investigate these effects. We note that training is performed offline, and only the inference is carried out on a VLSI circuit. Therefore, we did not consider effects (i) to (iv) when training the SNNs.

After training the SNNs, we discretized the timing of the spikes as follows:

\hat{t}_i^{(l)} = n_i^{(l)} \Delta t,   (4)
n_i^{(l)} = \left\lceil t_i^{(l)} / \Delta t \right\rceil,   (5)

where $\Delta t$ is a positive real number representing the time step of the spike timing and $n_i^{(l)}$ is an integer. The time-discretized spikes are sent to the neurons in the subsequent layer. The synaptic weights are replaced with the quantized weights $\hat{w}_{ij}^{(l)} = m_{ij}^{(l)} \Delta w$, where $\Delta w$ is a positive real number and $m_{ij}^{(l)}$ is an integer given by $m_{ij}^{(l)} = \mathrm{round}(w_{ij}^{(l)} / \Delta w)$.
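Both post-training transformations reduce to one line each; an illustrative sketch, assuming ceiling rounding for the spike times as in (5) and nearest-integer rounding for the weights:

```python
import numpy as np

def discretize_times(t, dt):
    """Snap spike times onto the grid {0, dt, 2*dt, ...} following (4)-(5)."""
    return np.ceil(np.asarray(t, dtype=float) / dt) * dt   # np.inf stays np.inf

def quantize_weights(w, dw):
    """Replace each weight by the nearest integer multiple of dw."""
    return np.round(np.asarray(w, dtype=float) / dw) * dw
```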

The membrane potential provided by (1) can take any value less than $V_{\mathrm{th}}$. However, the range of the membrane potential is limited in analog VLSI circuits. To investigate the effects of such a limited range, we modified the neuron equation (1) so that the membrane potential could not take a value less than $V_{\mathrm{min}}$:

\frac{d v_i^{(l)}(t)}{dt} = \begin{cases} \sum_j w_{ij}^{(l)}\, \theta\!\left(t - t_j^{(l-1)}\right) & \text{if } v_i^{(l)}(t) > V_{\mathrm{min}} \text{ or } \sum_j w_{ij}^{(l)}\, \theta\!\left(t - t_j^{(l-1)}\right) > 0, \\ 0 & \text{otherwise.} \end{cases}   (6)

Fig. 1(b) presents a schematic of the evolution of the membrane potential according to (6). The solid line represents the membrane potential when the lower bound is not considered. The dashed line represents the membrane potential when the lower bound is $V_{\mathrm{min}}$. In the latter case, the membrane potential stays at $V_{\mathrm{min}}$ during the interval in which the sum of the inputs is less than 0. Once the sum of the inputs becomes positive, the membrane potential begins to increase. As a result, the spike timing with the lower bound differs from the spike timing without it. We note that when $V_{\mathrm{min}}$ is sufficiently low, the membrane potentials obtained by (6) converge to those obtained by (1).
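The clamped dynamics of (6) can be reproduced with a simple forward-Euler loop; an illustrative sketch, with the simulation step chosen arbitrarily:

```python
import numpy as np

def simulate_clamped(in_times, weights, v_th=1.0, v_min=-0.5,
                     t_max=10e-3, dt_sim=1e-5):
    """Integrate (6) with lower bound v_min; return (voltage trace, firing time)."""
    in_times = np.asarray(in_times, dtype=float)
    weights = np.asarray(weights, dtype=float)
    steps = int(t_max / dt_sim)
    v, trace = 0.0, np.empty(steps)
    for k in range(steps):
        t = k * dt_sim
        slope = weights[in_times <= t].sum()   # sum of the step inputs received
        v = max(v_min, v + slope * dt_sim)     # clamp at the lower bound
        trace[k] = v
        if v >= v_th:                          # threshold crossing -> spike
            return trace[:k + 1], t
    return trace, np.inf                       # no spike within t_max
```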

Finally, to investigate the effects of the temporal fluctuation of the firing threshold, we modeled the firing threshold as a time-dependent function $V_{\mathrm{th}}(t)$. The value of $V_{\mathrm{th}}(t)$ is drawn from a Gaussian distribution as

V_{\mathrm{th}}(t) \sim \mathcal{N}\!\left(V_{\mathrm{th}}, \sigma^2\right),   (7)

where $\mathcal{N}(\mu, \sigma^2)$ is a Gaussian distribution with a mean of $\mu$ and a standard deviation of $\sigma$. The standard deviation of the firing threshold is given by $\sigma = \sigma_{\mathrm{th}} V_{\mathrm{th}}$.
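In simulation, (7) amounts to redrawing the threshold at every time step; a brief illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def fluctuating_threshold(v_th=1.0, sigma_th=0.04, steps=1000):
    """Sample V_th(t) ~ N(V_th, (sigma_th * V_th)^2) once per time step."""
    return rng.normal(loc=v_th, scale=sigma_th * v_th, size=steps)
```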

3.2 Results

In this section, we demonstrate the effects of the VLSI circuit constraints on SNNs (784-800-10) for the MNIST [12] and Fashion-MNIST [24] datasets.

Figure 2: Typical raster plots (left panels) and membrane potentials (right panels) for the MNIST dataset. $\Delta t$ = 2 ms, $V_{\mathrm{min}}$ = -0.5, and $\sigma_{\mathrm{th}}$ = 0.04. Weights are not quantized. The horizontal dashed lines in the right panels represent the values of the temporally fluctuating firing threshold $V_{\mathrm{th}}(t)$.

Fig. 2 presents the timing of the spikes (raster plots) and the time evolution of the membrane potentials for the MNIST dataset when $\Delta t$ = 2 ms, $V_{\mathrm{min}}$ = -0.5, and $\sigma_{\mathrm{th}}$ = 0.04. In the raster plots, the timing of the spikes lay only on integer multiples of $\Delta t$. The slope of the membrane potential changed only at times discretized by $\Delta t$. Moreover, the membrane potential never took values lower than $V_{\mathrm{min}}$.

Figure 3: Classification accuracy of SNNs on the MNIST and Fashion-MNIST datasets with time discretization (left panels) and weight quantization (right panels). The horizontal dashed lines in all panels represent the performance when no circuit constraints are considered.

The left panels of Fig. 3 depict the performance of the SNNs with time-discretized spikes for various values of $\Delta t$. For both the MNIST and Fashion-MNIST datasets, the effects of the time discretization of the spikes at the output layer were more significant than those at the other layers. This can be attributed to the fact that the prediction result of the network is represented by the earliest spike at the output layer. If the real timing difference between the earliest and second-earliest spikes is less than $\Delta t$, the network cannot reliably predict the correct label when the timing of the spikes is discretized. The effects of the time discretization of the spike timing were substantially lower for the input layer. This may be because the coding of the input spikes includes "no spike" states, which are inherently robust against timing discretization.

The right panels of Fig. 3 present the performance of the SNNs when the weights were quantized. In this figure, the horizontal axis represents the number of levels of the quantized weights, given by $(w_{\max} - w_{\min}) / \Delta w$, where $w_{\max}$ and $w_{\min}$ are the maximum and minimum weight values. We found that performance degradation could be avoided if the number of levels was greater than approximately 20.

Figure 4: Classification accuracy for various values of the lower bound of the membrane potential $V_{\mathrm{min}}$ for the (a) MNIST dataset and (b) Fashion-MNIST dataset. The distributions of the lowest membrane potentials in the output layer (c) and of the lowest membrane potentials before the earliest spike timing in the output layer (d) when $\Delta t$ = 0.6 ms for the above datasets. The horizontal dashed lines in (a) and (b) represent the performance when no circuit constraints are considered.

Figs. 4(a) and (b) present the classification accuracy for various values of the lower bound of the membrane potential. Fig. 4(c) shows the distribution of the minimum value of the membrane potential of the neurons at the output layer, given by

\min_{t} v_i^{(\mathrm{out})}(t),   (8)

when $V_{\mathrm{min}}$ was not considered. Because the membrane potentials were more likely to take lower values for the MNIST dataset than for the Fashion-MNIST dataset, the SNNs trained on the MNIST dataset were more vulnerable to the value of $V_{\mathrm{min}}$, as indicated in Figs. 4(a) and (b). We note that the performance was not affected when the lower bound was applied only to the membrane potentials of the neurons in the hidden layer.

One may wonder why the SNNs trained on the MNIST dataset were almost unaffected by the lower bound when $V_{\mathrm{min}}$ = -1, even though the average of the minimum voltage given by (8) was much lower than -1, as shown in Fig. 4(c). This is because the minimum values in (8) were typically attained after another neuron had fired, as shown in Fig. 2. To elucidate this, Fig. 4(d) plots the distribution of the minimum membrane potential before the timing of the earliest spike:

\min_{t \le t_{\mathrm{first}}} v_i^{(\mathrm{out})}(t), \qquad t_{\mathrm{first}} = \min_k t_k^{(\mathrm{out})}.   (9)

We found that a large portion of the minimum membrane potentials given by (9) were distributed above -0.5.
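Given recorded voltage traces, the statistics (8) and (9) are straightforward to extract; an illustrative sketch, assuming traces sampled on a common time grid:

```python
import numpy as np

def min_potentials(traces, times, fire_times):
    """Compute (8) and (9) for one input sample.

    traces:     (n_neurons, n_steps) output-layer membrane potentials
    times:      (n_steps,) common time grid
    fire_times: (n_neurons,) firing times (np.inf for silent neurons)
    """
    v_min_all = traces.min(axis=1)                 # eq. (8): minimum over all t
    t_first = fire_times.min()                     # earliest output spike
    before = times <= t_first
    v_min_before = traces[:, before].min(axis=1)   # eq. (9): minimum for t <= t_first
    return v_min_all, v_min_before
```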

Figure 5: Classification accuracy for various values of $\sigma_{\mathrm{th}}$ for the (a) MNIST dataset and (b) Fashion-MNIST dataset. The horizontal dashed lines in (a) and (b) represent the performance when no circuit constraints are considered.

Fig. 5 depicts the classification accuracy for various values of $\sigma_{\mathrm{th}}$. We found that the performance of the SNNs was not significantly impaired if $\sigma_{\mathrm{th}}$ was less than 0.05 for the MNIST and Fashion-MNIST datasets.

4 Mapping of SNNs to VLSI circuits

We demonstrate a procedure for mapping SNNs to VLSI circuits. First, we introduce a VLSI circuit model derived from Kirchhoff’s law. Second, we show how to map an SNN to the VLSI circuit optimally, assuming certain circuit parameters and constraints.

4.1 Circuit model

Figure 6: Schematic of the integrate-and-fire neuron circuit. Currents from many synapses are converted into a voltage at the capacitor $C$. A spike is generated when the voltage exceeds a threshold.

We consider an integrate-and-fire model as illustrated in Fig. 6, which is commonly employed in matrix–vector computation [23, 22, 1, 25] and SNNs [19, 17]. The time evolution of the membrane potential, represented by the voltage $V_i^{(l)}$ at a capacitor, is obtained from Kirchhoff's current law as follows:

C \frac{d V_i^{(l)}(t)}{dt} = \sum_j m_{ij}^{(l)} I_0\, \theta\!\left(t - t_j^{(l-1)}\right),   (10)

where $I_0$ represents the controllable minimum current, $m_{ij}^{(l)}$ is an integer, and $C$ is the capacitance. The timing of the firing is obtained as follows:

t_i^{(l)} = n_i^{(l)} T_c,   (11)
n_i^{(l)} = \min\left\{ n \in \mathbb{Z} \mid V_i^{(l)}(n T_c) \ge V_{\mathrm{th}} \right\},   (12)

where $T_c$ is a real number representing the time step of the time-discrete spikes in the circuits.
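A behavioral sketch of (10)-(12), with placeholder parameter values rather than those displayed in Fig. 6: each synaptic current is an integer multiple of $I_0$, the capacitor voltage is updated once per clock period $T_c$, and a spike is registered at the first clock edge at which the voltage reaches the threshold.

```python
import numpy as np

def circuit_firing_step(in_steps, m, i0=1e-9, c=100e-15, t_c=1e-6,
                        v_th=0.5, n_max=10000):
    """Clocked integrate-and-fire neuron of (10)-(12).

    in_steps: clock indices of the input spikes; m: integer weight codes.
    Returns the firing clock index n of (12), or None if the neuron is silent.
    """
    in_steps = np.asarray(in_steps)
    m = np.asarray(m)
    v = 0.0
    for n in range(1, n_max):
        current = i0 * m[in_steps < n].sum()   # Kirchhoff sum of the ON synapses
        v += current * t_c / c                 # charge integrated over one period
        if v >= v_th:
            return n                           # spike latched at this clock edge
    return None
```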

By setting $w_{ij}^{(l)} = m_{ij}^{(l)} \Delta w$ and scaling time by $T_c / \Delta t$, the condition under which (1) and (10) are equivalent can be obtained as follows:

\Delta w = \frac{I_0 T_c}{C\, \Delta t}.   (13)
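Under (13), once the circuit parameters $I_0$, $C$, and $T_c$ are fixed, $\Delta t$ is the single free mapping parameter and $\Delta w$ follows from it; an illustrative sketch with placeholder circuit values:

```python
def delta_w_from_dt(dt, i0=1e-9, c=100e-15, t_c=1e-6):
    """Weight-quantization step implied by (13): dw = I0 * Tc / (C * dt)."""
    return i0 * t_c / (c * dt)

# Sweeping dt trades weight resolution against circuit latency.
for dt in (0.2e-3, 0.4e-3, 0.8e-3):
    print(f"dt = {dt * 1e3:.1f} ms  ->  dw = {delta_w_from_dt(dt):.3e}")
```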

4.2 Optimal mapping of SNNs to circuits

We used the circuit parameters displayed in Fig. 6 to study the mapping of SNNs to circuits. These parameter values are typical of VLSI designs based on the approach of time-domain analog computing with transient states (TACT), in which the synapse currents are supplied by field-effect transistors operating in the subthreshold region [22, 25].

Figure 7: Classification accuracy on the (a) MNIST dataset and (b) Fashion-MNIST dataset with the circuit model for various values of $\Delta t$. The classification accuracy is depicted when the full circuit model is considered, (i) when only the effects of time discretization and weight quantization are considered, (ii) when only the effect of time discretization is considered, and (iii) when only the effect of weight quantization is considered. Note that the effects of the lower bound of the membrane potential and the temporal fluctuation of the firing threshold are not included in (i)-(iii). The horizontal dashed lines in (a) and (b) represent the performance when no circuit constraints are considered.

Fig. 7 presents the classification accuracy for various values of $\Delta t$. Note that the value of $\Delta w$ was determined from the value of $\Delta t$ using (13). For the MNIST dataset, the classification accuracy decreased almost monotonically as $\Delta t$ increased. For the Fashion-MNIST dataset, the classification accuracy exhibited a peak around $\Delta t$ = 0.4 ms. The performance of the SNN was mainly affected by the time discretization and the weight quantization. To elucidate this, Fig. 7 also depicts the classification accuracy when only the time discretization and weight quantization were considered, and when only the time discretization or only the weight quantization was considered.

Figure 8: Average time of the earliest spike at the output layer for various values of $\Delta t$.

As the network can cease its operation once the earliest spike has been generated in the output layer, the energy efficiency increases as the average timing of the earliest spike becomes shorter. Fig. 8 indicates that the average timing of the earliest spike at the output layer decreased as a function of $\Delta t$, because fewer time steps were required to obtain the output spikes when $\Delta t$ was large (see Fig. 2) for the same $T_c$.

The optimal choice of $\Delta t$ is determined by the trade-off between the classification accuracy (Fig. 7) and the energy efficiency (Fig. 8). For example, if an accuracy of more than 98% is required at the highest possible energy efficiency for the MNIST dataset, the optimal $\Delta t$ would be approximately 0.8 ms.
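This selection can be automated once the sweep results are available; an illustrative sketch, assuming the accuracies and average latencies corresponding to Figs. 7 and 8 are stored as arrays:

```python
import numpy as np

def pick_dt(dts, accuracies, latencies, acc_min=0.98):
    """Return the dt that meets the accuracy target with the lowest latency."""
    dts = np.asarray(dts)
    ok = np.asarray(accuracies) >= acc_min
    if not ok.any():
        raise ValueError("no dt meets the accuracy target")
    best = np.asarray(latencies)[ok].argmin()
    return dts[ok][best]
```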

5 Conclusions

In this study, we evaluated the performance of SNNs with TTFS coding on the MNIST and Fashion-MNIST datasets under circuit constraints. We found that the SNNs with TTFS coding were robust against these constraints. Although the information in such SNNs is contained in the precise spike timing, the performance degraded only slowly as the time step of the discretization increased. Only 20 or 30 quantization levels, corresponding to 4- or 5-bit precision, were sufficient for the weights to prevent performance impairment. The lower bound of the membrane potential was not critical when only the earliest spike was used as the prediction result. The temporal fluctuation of the firing threshold was also not critical.

We demonstrated the optimal mapping of SNN models with TTFS coding to VLSI circuits, assuming a reasonable set of circuit parameters. We confirmed that the optimal choice of the discretization parameters for the spike timing and the weights is obtained from a trade-off between the classification accuracy and the energy efficiency. In this study, we fixed the circuit parameters and searched for the optimal discretization parameters for the SNNs. It would be straightforward to expand this strategy so that one can also search for optimal circuit parameters, such as the clock period $T_c$ and the neuron capacitance $C$. Furthermore, although we focused on a specific neuron model (1), the proposed techniques can be applied to other neuron models, such as those adopted in [16] and [3].

Acknowledgment

The authors would like to thank Kazumasa Yanagisawa and Masatoshi Yamaguchi of Floadia Corporation for fruitful discussions.

References

  • [1] M. Bavandpour, M. R. Mahmoodi, and D. B. Strukov (2019) Energy-efficient time-domain vector-by-matrix multiplier for neurocomputing and beyond. IEEE Transactions on Circuits and Systems II: Express Briefs.
  • [2] S. M. Bohte, J. N. Kok, and H. L. Poutré (2002) Error-backpropagation in temporally encoded networks of spiking neurons. Neurocomputing 48 (1), pp. 17–37.
  • [3] I. M. Comsa et al. (2020) Temporal coding in spiking neural networks with alpha synaptic function. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8529–8533.
  • [4] M. Davies et al. (2018) Loihi: a neuromorphic manycore processor with on-chip learning. IEEE Micro 38 (1), pp. 82–99.
  • [5] P. U. Diehl et al. (2015) Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In 2015 International Joint Conference on Neural Networks (IJCNN), pp. 1–8.
  • [6] J. Göltz et al. (2020) Fast and deep neuromorphic learning with first-spike coding. In Proceedings of the Neuro-inspired Computational Elements Workshop, pp. 1–3.
  • [7] D. Huh and T. J. Sejnowski (2018) Gradient descent for spiking neural networks. In Advances in Neural Information Processing Systems, Vol. 31, pp. 1433–1443.
  • [8] S. R. Kheradpisheh and T. Masquelier (2020) Temporal backpropagation for spiking neural networks with one spike per neuron. International Journal of Neural Systems 30 (6), pp. 2050027.
  • [9] S. R. Kheradpisheh, M. Mirsadeghi, and T. Masquelier (2020) BS4NN: binarized spiking neural networks with temporal coding and learning. arXiv:2007.04039.
  • [10] J. Kim, K. Kim, and J. Kim (2020) Unifying activation- and timing-based learning rules for spiking neural networks. arXiv:2006.02642.
  • [11] S. Kim, S. Park, B. Na, and S. Yoon (2020) Spiking-YOLO: spiking neural network for energy-efficient object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, pp. 11270–11277.
  • [12] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324.
  • [13] W. Maass (1999) Computing with spiking neurons. In Pulsed Neural Networks, pp. 55–85.
  • [14] P. A. Merolla et al. (2014) A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345 (6197), pp. 668–673.
  • [15] S. Moradi, N. Qiao, F. Stefanini, and G. Indiveri (2018) A scalable multicore architecture with heterogeneous memory structures for dynamic neuromorphic asynchronous processors (DYNAPs). IEEE Transactions on Biomedical Circuits and Systems 12 (1), pp. 106–122.
  • [16] H. Mostafa (2018) Supervised learning based on temporal coding in spiking neural networks. IEEE Transactions on Neural Networks and Learning Systems 29 (7), pp. 3227–3235.
  • [17] S. Oh et al. (2020) Hardware implementation of spiking neural networks using time-to-first-spike encoding. arXiv:2006.05033.
  • [18] M. Pfeiffer and T. Pfeil (2018) Deep learning with spiking neurons: opportunities and challenges. Frontiers in Neuroscience 12 (774), pp. 1–18.
  • [19] Y. Sakemi, K. Morino, T. Morie, and K. Aihara (2020) A supervised learning algorithm for multilayer spiking neural networks based on temporal coding toward energy-efficient VLSI processor design. arXiv:2001.05348.
  • [20] V. Sze, Y. Chen, T. Yang, and J. S. Emer (2017) Efficient processing of deep neural networks: a tutorial and survey. Proceedings of the IEEE 105 (12), pp. 2295–2329.
  • [21] A. Tavanaei, M. Ghodrati, S. R. Kheradpisheh, T. Masquelier, and A. Maida (2019) Deep learning in spiking neural networks. Neural Networks 111, pp. 47–63.
  • [22] Q. Wang, H. Tamukoh, and T. Morie (2018) A time-domain analog weighted-sum calculation model for extremely low power VLSI implementation of multi-layer neural networks. arXiv:1810.06819.
  • [23] Q. Wang, H. Tamukoh, and T. Morie (2016) Time-domain weighted-sum calculation for ultimately low power VLSI neural networks. In Neural Information Processing, Vol. 9947, pp. 240–247.
  • [24] H. Xiao, K. Rasul, and R. Vollgraf (2017) Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv:1708.07747.
  • [25] M. Yamaguchi, G. Iwamoto, Y. Nishimura, H. Tamukoh, and T. Morie (2020) An energy-efficient time-domain analog CMOS BinaryConnect neural network processor based on a pulse-width modulation approach. IEEE Access 9, pp. 2644–2654.