Neuromorphic Computing for Content-based Image Retrieval

08/04/2020 · Te-Yuan Liu, et al.


Abstract

Neuromorphic computing mimics the neural activity of the brain through emulating spiking neural networks. In numerous machine learning tasks, neuromorphic chips are expected to provide superior solutions in terms of cost and power efficiency. Here, we explore the application of Loihi, a neuromorphic computing chip developed by Intel, for the computer vision task of image retrieval. We evaluated the functionalities and the performance metrics that are critical in content-based visual search and recommender systems using deep-learning embeddings. Our results show that the neuromorphic solution is about 3.2 times more energy-efficient compared with an Intel Core i7 CPU and 12.5 times more energy-efficient compared with an Nvidia T4 GPU for inference by a lightweight convolutional neural network without batching, while maintaining the same level of matching accuracy. The study validates the long-term potential of neuromorphic computing in machine learning, as a complementary paradigm to the existing von Neumann architectures.

Introduction

Neuromorphic computing is a non-von Neumann computer architecture that aims to obtain ultra-high-efficiency machines for a diverse set of information processing tasks by mimicking the temporal neural activity of the brain [7, 12, 1]. In neuromorphic computing, numerous spiking signals carry information among computing units, i.e., artificial neurons, synchronously or asynchronously [10], forming a mesh-like, nonlinear dynamical system [9]. The information can be encoded in the temporal characteristics of the signals, for example, firing rates.

In this work, we implement and analyze a low-latency computer vision model for visual search engines and recommender systems that evaluate the visual similarity between a query image and a database of product images. In conventional machine learning pipelines, this is often performed by transfer learning: a deep convolutional neural network is pre-trained on a large-scale dataset, e.g., ImageNet [3], and fine-tuned on a domain-specific image dataset, e.g., DeepFashion2 for apparel [4]. The embeddings of the images are calculated by inferring the activation values of the last few layers of the neural network. The distances between the embeddings of the query image and the database images are used to find the nearest neighbors of the query image in the embedding space, identifying the most visually similar items.

We evaluate the same visual search and recommendation technique using transfer learning and embeddings produced by a neuromorphic neural network. Our study of neuromorphic computing for machine learning begins with training spiking convolutional neural networks on general image classification datasets, followed by transfer learning to a clothing-specific dataset. The trained spiking neural networks are then used to extract embeddings for product images and sample query images. The embeddings are based on the patterns of the temporal spikes and, as with conventional convolutional neural networks, are used to find the nearest visual neighbors of the query image among the product images. Our results show considerable power efficiency in finding the most visually similar products using neuromorphic chips, particularly Loihi [2].

Methods

We use the Fashion-MNIST dataset in our experiments. We select this dataset because it is popular for benchmarking small-footprint computer vision models, and we are interested in image search on a dataset closely related to retail applications. Note that we use the dataset without data augmentation.

To explore applications of neuromorphic computing, we have built and deployed a spiking neural network (SNN) on Intel's Loihi neuromorphic chip for image search. Our image search pipeline is shown in Fig. 1. First, we convert a trained artificial neural network (ANN) into an SNN and deploy it on the Loihi chip. We then feed training and test images into the SNN and probe the neurons of the layer before the output layer to obtain image embeddings. Finally, nearest neighbor search is employed on CPU cores to find the best matches for each test image.

Figure 1: Image search pipeline by spiking neural network.

In the first step, we train different ANNs and convert them into SNNs. Then, we compare the performance of the SNNs and select our optimal SNN model. As suggested by [6] and [11], we replace lateral operations in ANNs, such as max pooling, with average pooling to reduce the feature map size, and we employ dropout for regularization.
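For concreteness, the sketch below shows what one such ANN could look like in Keras, using the 32-64-10 architecture from Table 1 (two convolutional layers and a dense output layer). The kernel sizes, pooling windows, and dropout rate are our assumptions for illustration; the paper does not specify them.

    from tensorflow import keras
    from tensorflow.keras import layers

    def build_ann(num_classes: int = 10) -> keras.Model:
        """Sketch of the 32-64-10 network; hyperparameters are assumed."""
        return keras.Sequential([
            layers.Conv2D(32, 3, padding="same", activation="relu",
                          input_shape=(28, 28, 1)),   # Fashion-MNIST input
            layers.AveragePooling2D(2),               # average pooling, not max pooling
            layers.Conv2D(64, 3, padding="same", activation="relu"),
            layers.AveragePooling2D(2),
            layers.Flatten(),
            layers.Dropout(0.5),                      # dropout for regularization
            layers.Dense(num_classes),                # 10-way classifier head
        ])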

Note that there are two constraints on the neural network architectures that can be deployed on Loihi chips. The first is that the synaptic memory, which stores the neuron weights per neuromorphic core, is 128 KB; this limits the number of parameters associated with the neurons in a core. The second is that the maximum fan-in per neuromorphic core is 4,096, which means the input size of each neuron cannot exceed 4,096. These two constraints push the neural networks deployed on the Loihi chip toward slim rather than wide layers.
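As an illustration, a hypothetical helper like the one below can pre-check a candidate layer-to-core partition against these two limits. The bytes-per-synapse figure is an assumption for the sketch, since the synaptic encoding on Loihi is configurable.

    SYNAPTIC_MEMORY_BYTES = 128 * 1024    # per-core synaptic memory
    MAX_FAN_IN = 4096                     # per-core fan-in limit

    def partition_fits(synapses_on_core: int, fan_in: int,
                       bytes_per_synapse: int = 2) -> bool:
        """Return True if a layer-to-core partition satisfies both limits."""
        fits_memory = synapses_on_core * bytes_per_synapse <= SYNAPTIC_MEMORY_BYTES
        fits_fan_in = fan_in <= MAX_FAN_IN
        return fits_memory and fits_fan_in

    # Example: a neuron fed by a 64 x 13 x 13 feature map has fan-in
    # 64 * 13 * 13 = 10,816 > 4,096, so that layer cannot map onto one core.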

Given an ANN, the conversion is done by building an SNN that has the same architecture as the ANN but changing the neuron type to a leaky integrate-and-fire (LIF) neuron with soft reset, a variant of the residual membrane potential (RMP) neuron proposed in [5]. Then, the floating-point ANN parameters are scaled to integers and transplanted to the SNN, since the Loihi chip executes operations on integer numbers. The spiking threshold of each LIF neuron is determined at the same time as the parameter scaling, using a method provided by the Loihi NxSDK [8]. The parameter scaling and threshold calculation are shown in Algorithm 1.
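To illustrate the neuron model, here is a toy simulation of a soft-reset LIF neuron driven by integer membrane increments; leak and refractory behavior are omitted for brevity, so this is a sketch of the idea rather than Loihi's exact neuron dynamics.

    import numpy as np

    def lif_soft_reset(increments: np.ndarray, threshold: int) -> int:
        """Integrate per-time-step increments; return the number of spikes."""
        v, spikes = 0, 0
        for inc in increments:        # one integer increment per time step
            v += inc                  # integrate the membrane potential
            if v >= threshold:
                v -= threshold        # soft reset: keep the residual potential
                spikes += 1
        return spikes

    # Driven by a constant increment r, the firing rate approaches
    # r / threshold, the rate approximation that Algorithm 1 relies on.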

Input: normalized images X, an N-by-P matrix with entries in [0, 1]
1:  W_max ← 2^(w_bits − 1) − 1;  b_max ← 2^(b_bits − 1) − 1;  ratio ← 1
2:  for l_ann, l_snn in zip(ann_layers, snn_layers) do
3:      if l_ann is the input layer then
4:          scale ← W_max
5:          rate ← X · W_max
6:      else
7:          W, b ← parameters(l_ann)
8:          W ← W · ratio
9:          W_absmax ← max(|W|);  b_absmax ← max(|b|)
10:         s_W ← W_max / W_absmax;  s_b ← b_max / b_absmax
11:         scale ← min(s_W, s_b)
12:         W_snn ← round(W · scale)
13:         b_snn ← round(b · scale)
14:         rate ← ReLU(spike_prob · W_snn + b_snn)
15:     end if
16:     threshold ← round(max(rate))
17:     set the spiking threshold of l_snn to threshold
18:     spike_prob ← rate / threshold
19:     ratio ← ratio · scale / threshold
20: end for
Algorithm 1: Parameter scaling and threshold calculation

Similar to the spike-norm algorithm proposed in [11], a set of images is fed into the network and the threshold at each layer is set to the maximum activation at that layer. However, the Loihi NxSDK uses a rate-based simulation of the SNN instead of performing an actual SNN forward pass to calculate the spiking thresholds.

In Algorithm 1, there are two important variables. One is scale, the factor by which we scale the ANN parameters to integers to obtain the SNN parameters. The other is threshold, the spiking threshold that governs the LIF neuron spiking activity.

Algorithm 1 requires input images, represented as an N-by-P matrix X with floating-point elements ranging between zero and one, where N is the number of images and P is the number of pixels per image. Line 1 sets the W_max, b_max, and ratio variables. If we use 9 bits to represent SNN weights on the Loihi chip, then the maximum weight W_max is 2^8 − 1 = 255. We set the maximum bias b_max in the same way. The variable ratio is the ratio between the SNN neuron output and the ANN neuron output at the current layer and is initialized to one.

In line 2, we get l_ann and its corresponding l_snn. In lines 3 to 5, if l_ann is the input layer, which encodes the input image into spike time series, we set scale to W_max and multiply X by W_max to get rate, the neuron membrane potential increment rate. In lines 6 to 14, if l_ann is not the input layer, we scale the ANN parameters and set the SNN parameters.

In line 7, we get the ANN weights W and biases b from l_ann. Then in line 8, we multiply W by ratio to account for the scaling of the previous layer. In line 9, we set W_absmax as the maximum absolute value of W and do likewise to set b_absmax. Then in line 10, we set s_W as the ratio between W_max and W_absmax to find out how many times we can scale up W without exceeding W_max, and we do the same to calculate s_b. In line 11, we set scale to the smaller of s_W and s_b. In lines 12 and 13, we use scale to scale the ANN W and b, quantizing them to integers, and set them as the parameters of l_snn. In line 14, we calculate rate by simulating the ANN neuron activation.

In lines 16 and 17, we set the threshold of the neurons at l_snn to the quantized maximum value of rate. Then, in line 18, we calculate spike_prob, an estimate of the spiking probability of the neurons, as the output of l_snn. In line 19, we update ratio by multiplying it by the ratio of scale and threshold, which is also the ratio between the outputs of l_snn and l_ann.
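The following NumPy sketch restates Algorithm 1 for dense layers, using the variable names from the walk-through above. It is an illustration under those naming assumptions, not the NxSDK implementation; convolutional layers would be handled analogously, and the bias bit width b_bits is an assumption.

    import numpy as np

    def scale_parameters(X, ann_layers, w_bits=9, b_bits=13):
        """Sketch of Algorithm 1 for dense layers; layers are dicts with
        a "type" key and, for non-input layers, "W" and "b" arrays."""
        W_max = 2 ** (w_bits - 1) - 1        # line 1: largest integer weight
        b_max = 2 ** (b_bits - 1) - 1        # line 1: largest integer bias
        ratio = 1.0                          # line 1: SNN/ANN output ratio
        snn_params, thresholds = [], []
        spike_prob = None
        for layer in ann_layers:             # line 2
            if layer["type"] == "input":     # lines 3-5: input spike encoding
                scale = W_max
                rate = X * W_max             # membrane potential increment rate
            else:                            # lines 6-14: scale ANN parameters
                W, b = layer["W"], layer["b"]             # line 7
                W = W * ratio                             # line 8: fold in previous scaling
                s_W = W_max / np.abs(W).max()             # lines 9-10
                s_b = b_max / np.abs(b).max()
                scale = min(s_W, s_b)                     # line 11
                W_snn = np.round(W * scale)               # lines 12-13: quantize
                b_snn = np.round(b * scale)
                snn_params.append((W_snn, b_snn))
                rate = np.maximum(spike_prob @ W_snn + b_snn, 0)  # line 14: ReLU
            threshold = max(int(round(rate.max())), 1)    # lines 16-17
            thresholds.append(threshold)
            spike_prob = rate / threshold                 # line 18
            ratio = ratio * scale / threshold             # line 19
        return snn_params, thresholds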

With the SNN in hand, we feed images into the network. For each image, we probe the neurons of the layer before the output layer at the last execution time step to obtain the neuron membrane potentials. This membrane potential vector is the embedding of the input image.

Our SNN takes the images in the training and test sets as inputs and outputs their embeddings. We treat the training image embeddings as a corpus. For each test image, we apply nearest neighbor search using cosine similarity to find the images in the corpus that are closest to the test image in the embedding space.
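As a concrete illustration of this retrieval step, the short sketch below performs cosine-similarity nearest neighbor search with NumPy; the function name and array shapes are ours.

    import numpy as np

    def nearest_neighbors(corpus: np.ndarray, query: np.ndarray, k: int = 3):
        """corpus: (N, D) training embeddings; query: (D,) test embedding."""
        corpus_n = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
        query_n = query / np.linalg.norm(query)
        sims = corpus_n @ query_n          # cosine similarity to every corpus item
        return np.argsort(-sims)[:k]       # indices of the k closest matches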

Results

We have implemented and tested 3-layer, 4-layer, and 5-layer SNNs for classification of the Fashion-MNIST dataset. The architectures we have experimented with are shown in Table 1. In the architecture column, the number of convolutional kernels (number of output channels) in each layer is concatenated with hyphens. Note that the last architecture in Table 1 is not deployable because it exceeds the maximum fan-in of the Loihi chip. The fourth SNN architecture in Table 1 scores the best classification test accuracy; its architecture is shown in Fig. 2. This SNN consists of three layers: two convolutional layers and one dense layer. We use this SNN to conduct image search and obtain the following results.

Architecture       ANN train    ANN validation   ANN test    SNN test
                   acc. (%)     acc. (%)         acc. (%)    acc. (%)
4-8-10             85.03        85.37            83.96       83.73
8-16-10            89.11        88.43            87.41       86.98
16-32-10           89.41        88.72            87.85       87.41
32-64-10           93.99        90.68            90.07       90.01
4-8-16-10          87.47        87.20            86.16       86.02
8-16-32-10         90.43        89.03            88.40       76.54
16-32-64-10        90.36        89.25            87.91       87.85
32-64-128-10       93.37        90.08            89.43       88.18
4-8-16-32-10       89.14        88.57            87.87       87.49
8-16-32-64-10      92.11        90.20            89.51       84.19
16-32-64-128-10    94.27        90.60            89.76       85.79
32-64-128-256-10   93.38        90.78            90.21       N/A
Table 1: Architectures we have experimented with and their performance.

Figure 2: Architecture of our best SNN.

The SNN layer partition on a Loihi chip is shown in Fig. 3. There are 128 neuromorphic cores on a Loihi chip, arranged in 8 rows of 16 cores. Each layer occupies a certain number of the neuromorphic cores. Our best-performing SNN is relatively compact, so the number of cores it occupies is small compared with the number of cores available on a Loihi chip.

Figure 3: SNN layer distribution on the Loihi chip.

An SNN has an intrinsic execution time parameter, the number of time steps, which defines how many discrete time slots the network is given to process information during inference. Intuitively, the more time steps we give our SNN to process the information, the higher the performance, but the longer the runtime. This tradeoff between performance and the number of time steps is shown in Fig. 4. The performance metrics rise sharply between 4 and 16 time steps and then plateau, showing that 16 time steps are enough to achieve a reasonable degree of performance. The error bars indicate the negligible variation among five independently trained networks, demonstrating the reproducibility of our results.

Figure 4: Tradeoff between the performance metrics and the number of time steps.

The performance comparison between the selected SNN and its ANN counterpart is shown in Table 2. Note that the number in parentheses next to the model type is the number of time steps used per sample during SNN inference. The ANN and SNN have the same network architecture but different neuron types and parameters. The SNN using 128 time steps has accuracies and a mean average precision very close to those of the ANN, indicating that the SNN is capable of performance comparable to its ANN counterpart. With fewer time steps, e.g., 16, our SNN suffers a classification accuracy degradation, but the gap is smaller than 5%. However, the top-1 and top-3 accuracies of the SNN with 16 time steps are still very close to those of the ANN. This means that the SNN with 16 time steps per inference generates reasonable embeddings, useful for visual search and image retrieval.

Model Type   Classification   Top-1          Top-3          Mean Average
             Accuracy (%)     Accuracy (%)   Accuracy (%)   Precision
ANN          90.07            87.49          94.55          0.5212
SNN (16)     85.05            85.55          93.56          0.4797
SNN (128)    90.01            86.58          93.93          0.4919
Table 2: Accuracy and mean average precision comparison.
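For reference, the retrieval metrics in Table 2 can be computed along the following lines, assuming a retrieved image counts as relevant when its class label matches the query's; the exact definitions used in the paper may differ.

    import numpy as np

    def top_k_accuracy(ranked_labels, query_labels, k):
        """ranked_labels: (Q, N) corpus labels sorted by similarity, per query."""
        hits = (ranked_labels[:, :k] == query_labels[:, None]).any(axis=1)
        return float(hits.mean())

    def mean_average_precision(ranked_labels, query_labels):
        """Average precision over the ranked corpus, averaged over queries."""
        aps = []
        for ranked, label in zip(ranked_labels, query_labels):
            rel = (ranked == label).astype(float)      # 1 where the class matches
            if rel.sum() == 0:
                aps.append(0.0)
                continue
            prec_at_i = np.cumsum(rel) / (np.arange(len(rel)) + 1)
            aps.append(float((prec_at_i * rel).sum() / rel.sum()))
        return float(np.mean(aps))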

A few image search examples are shown in Fig. 5. The first column shows ten query images, one from each class in the dataset. The next three columns present three images randomly selected from the corpus with the same class label as the query image. The following three columns show the three images retrieved from the corpus using ANN-generated image embeddings, and the last three columns show the three images retrieved using SNN-generated image embeddings. The image search results, whether using the ANN or the SNN, are clearly visually closer to the query images than the images selected randomly from the corpus. Again, our SNN on the Loihi chip demonstrates performance comparable to the ANN.

Figure 5: Image search examples.

The neural network inference latency (forward-pass runtime) comparison between the selected SNN and its ANN counterpart is shown in Table 3. Note that the Loihi chip does not support batch sizes larger than one. With batch size equal to one, the SNN on Loihi using 16 time steps has approximately 13.8x/11.3x larger runtime than the ANN on the Xeon/i7 CPU and 2.3x/2.4x larger runtime than the ANN on the V100/T4 GPU. The difference is even bigger with larger batch sizes on the CPU or GPU. SNN on Loihi clearly has no advantage in inference latency in this case, as large batch sizes are not supported and the SNN's time-step property naturally increases execution time. Reducing the runtime and enabling SNNs to be executed in batches are directions in which we look forward to neuromorphic hardware improving.

Model Type   Batch Size   Hardware                          Runtime per Sample (ms)
ANN          1            Intel Xeon CPU (Gold 6148)        0.216
ANN          262144       Intel Xeon CPU (Gold 6148)        0.0073
ANN          1            Intel CPU (i7-8750H)              0.2634
ANN          128          Intel CPU (i7-8750H)              0.013
ANN          1            Nvidia GPU (V100)                 1.296
ANN          4096         Nvidia GPU (V100)                 0.0075
ANN          1            Nvidia GPU (T4)                   1.204
ANN          4096         Nvidia GPU (T4)                   0.010
SNN (16)     1            Loihi chip (neuromorphic cores)   2.984
SNN (128)    1            Loihi chip (neuromorphic cores)   11.976
Table 3: Inference latency comparison.

The power comparison between the selected SNN and its ANN counterpart is shown in Table 4. With batch size set to one, the SNN with 16 time steps uses 217.0x/24.0x less power than the ANN on the Xeon/i7 CPU and 40.8x/31.3x less power than the ANN on the V100/T4 GPU. This is where neuromorphic hardware starts to shine, as it consumes far less power than conventional hardware. By utilizing the spiking sparsity of SNNs appropriately, we believe the neuromorphic hardware can reduce its power consumption further. Another observation from Table 4 is that static (idle) power dominates the power consumption of the Loihi chips. We think that static power should ideally be much lower than dynamic power, so that sparsely activated neuromorphic hardware consumes little power while idle.

Note that we use the energy probes provided by the Loihi NxSDK to obtain power and energy measurements for the Loihi chips. For the CPUs, we use the Intelligent Platform Management Interface (IPMI) and system profiler information to measure power, and we integrate the power readings to obtain energy. For the GPUs, we use the Nvidia System Management Interface (nvidia-smi) to measure power and likewise integrate the power readings to obtain energy.
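As an illustration of the GPU measurement procedure, the sketch below polls nvidia-smi's power reading and integrates it over the run; the sampling period and helper names are ours, and a real measurement would account for sampling jitter.

    import subprocess
    import time

    def gpu_power_watts() -> float:
        """Read the current board power draw (W) from nvidia-smi."""
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=power.draw",
             "--format=csv,noheader,nounits"])
        return float(out.decode().strip().splitlines()[0])

    def energy_joules(duration_s: float, period_s: float = 0.1) -> float:
        """Riemann-sum integration of sampled power over the run."""
        energy, t_end = 0.0, time.time() + duration_s
        while time.time() < t_end:
            energy += gpu_power_watts() * period_s
            time.sleep(period_s)
        return energy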

Model Type   Batch Size   Hardware                          Static      Dynamic     Total
                                                            Power (W)   Power (W)   Power (W)
ANN          1            Intel Xeon CPU (Gold 6148)        196         19.1        215.1
ANN          262144       Intel Xeon CPU (Gold 6148)        196         44.189      240.189
ANN          1            Intel CPU (i7-8750H)              22          1.805       23.805
ANN          128          Intel CPU (i7-8750H)              22          5.633       27.633
ANN          1            Nvidia GPU (V100)                 24          16.441      40.441
ANN          4096         Nvidia GPU (V100)                 24          20.511      44.511
ANN          1            Nvidia GPU (T4)                   17          14.049      31.049
ANN          4096         Nvidia GPU (T4)                   17          18.228      35.228
SNN (16)     1            Loihi chip (neuromorphic cores)   0.946       0.044       0.991
SNN (128)    1            Loihi chip (neuromorphic cores)   0.952       0.064       1.016
Table 4: Power comparison.

Discussion

We measured the total energy used per inference (forward pass), reported in Table 5. These results can also be estimated by combining the results of Table 3 and Table 4. As summarized in Table 5, with batch size set to one, the energy consumption of the SNN with 16 time steps is 15.6x/3.2x smaller than that of the ANN on the Xeon/i7 CPU and 17.5x/12.5x smaller than that of the ANN on the V100/T4 GPU per inference. This demonstrates the benefits of neuromorphic hardware for low-energy-budget machine learning applications, particularly image search engines and visual recommender systems. When large batch sizes are used, CPUs and GPUs achieve better energy per sample, but we believe there are use cases where inference is executed in small batches, and these are the targets for neuromorphic hardware at the current stage.

Model Type   Batch Size   Hardware                          Energy per    Energy per
                                                            Sample (mJ)   Sample (relative)
ANN          1            Intel Xeon CPU (Gold 6148)        46.787        15.6x
ANN          262144       Intel Xeon CPU (Gold 6148)        1.753         0.585x
ANN          1            Intel CPU (i7-8750H)              9.522         3.178x
ANN          128          Intel CPU (i7-8750H)              0.316         0.105x
ANN          1            Nvidia GPU (V100)                 52.399        17.5x
ANN          4096         Nvidia GPU (V100)                 0.337         0.112x
ANN          1            Nvidia GPU (T4)                   37.399        12.5x
ANN          4096         Nvidia GPU (T4)                   0.366         0.12x
SNN (16)     1            Loihi chip (neuromorphic cores)   2.996         1x
SNN (128)    1            Loihi chip (neuromorphic cores)   12.17         4.0x
Table 5: Inference energy comparison.

Another observation is that the energy consumption does not scale linearly with a small number of time steps: the energy per inference for 128 time steps is only 4.0 times larger than for 16 time steps (Table 5). This is due to a constant portion of the energy needed for each inference, which does not depend on the number of time steps, as the short calculation below illustrates.
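Modeling the per-inference energy as a fixed overhead plus a per-time-step cost, E(T) = E_fixed + e_step · T, the two Loihi rows of Table 5 give:

    # Fit E(T) = E_fixed + e_step * T to the Loihi rows of Table 5.
    E16, E128 = 2.996, 12.17              # mJ per inference at 16 and 128 steps
    e_step = (E128 - E16) / (128 - 16)    # ~0.082 mJ per time step
    E_fixed = E16 - 16 * e_step           # ~1.69 mJ of constant overhead

With E_fixed of roughly 1.69 mJ, more than half of the 16-step inference energy is independent of the number of time steps, which is why 8x more time steps costs only about 4x more energy.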

Our results confirm the energy efficiency of the Loihi neuromorphic chip. However, we noticed that the inference latency becomes impractically large when a network of Loihi chips is used. We surmise this is due to inter-chip communication latencies. Many applications today use deep neural network models with millions of parameters and billions of intermediate activations. Neuromorphic chips need to scale up, possibly by increasing the number of neuromorphic cores and the on-chip memory, to support these applications in the future.

The energy efficiency obtained by the Loihi chip in our experiments is owed to the specialized functionality of the neuromorphic cores. In this respect, it is similar to the efficiency of other specialized accelerators, e.g., graphics processing units (GPUs). However, typical ANN-to-SNN conversion methods, including Algorithm 1 used here, do not capitalize on the temporal sparsity that is possible on neuromorphic processors, as in the brain. Designing better training and conversion algorithms that exploit temporally sparse signals is therefore a promising future direction for neuromorphic machine learning.

Finally, it is worth emphasizing that to implement the complete image retrieval pipeline, we performed the nearest neighbor search on CPU cores. While Loihi incorporates three Lakemont cores, they were not powerful enough for the task. We believe that CPU cores will always be needed for some stages of a machine learning pipeline, so the role of neuromorphic computing is to accelerate particular tasks and supplement the general-purpose processors.

Conclusion

We studied the application of the Loihi chip, neuromorphic computing hardware developed by Intel, to image retrieval using deep-learning embeddings. Our results show that generating deep-learning embeddings with spiking neural networks for computer vision applications is about 3.2 times more energy-efficient than with a CPU and 12.5 times more energy-efficient than with a GPU. We confirm the long-term potential of neuromorphic computing in machine learning, not as a replacement for the predominant von Neumann architecture, but as accelerator coprocessors.

Acknowledgements

We would like to thank Hari Govind at Target and Andreas Wild at Intel for helpful suggestions and constructive comments.

References

  • [1] G. Cauwenberghs (1998) Neuromorphic learning VLSI systems: a survey. In Neuromorphic Systems Engineering, pp. 381–408.
  • [2] M. Davies, N. Srinivasa, T. Lin, G. Chinya, P. Joshi, A. Lines, A. Wild, and H. Wang (2018) Loihi: a neuromorphic manycore processor with on-chip learning. IEEE Micro 38 (1), pp. 82–99.
  • [3] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255.
  • [4] Y. Ge, R. Zhang, X. Wang, X. Tang, and P. Luo (2019) DeepFashion2: a versatile benchmark for detection, pose estimation, segmentation and re-identification of clothing images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5337–5345.
  • [5] B. Han, G. Srinivasan, and K. Roy (2020) RMP-SNN: residual membrane potential neuron for enabling deeper high-accuracy and low-latency spiking neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • [6] E. Hunsberger and C. Eliasmith (2016) Training spiking deep networks for neuromorphic hardware. arXiv preprint arXiv:1611.05141.
  • [7] C. D. James, J. B. Aimone, N. E. Miner, C. M. Vineyard, F. H. Rothganger, K. D. Carlson, S. A. Mulder, T. J. Draelos, A. Faust, M. J. Marinella, et al. (2017) A historical survey of algorithms and hardware architectures for neural-inspired and neuromorphic computing applications. Biologically Inspired Cognitive Architectures 19, pp. 49–64.
  • [8] C. Lin, A. Wild, G. N. Chinya, T. Lin, M. Davies, and H. Wang (2018) Mapping spiking neural networks onto a manycore neuromorphic architecture. ACM SIGPLAN Notices 53 (4), pp. 78–89.
  • [9] A. Neckar, S. Fok, B. V. Benjamin, T. C. Stewart, N. N. Oza, A. R. Voelker, C. Eliasmith, R. Manohar, and K. Boahen (2018) Braindrop: a mixed-signal neuromorphic architecture with a dynamical systems-based programming model. Proceedings of the IEEE 107 (1), pp. 144–164.
  • [10] C. D. Schuman, T. E. Potok, R. M. Patton, J. D. Birdwell, M. E. Dean, G. S. Rose, and J. S. Plank (2017) A survey of neuromorphic computing and neural networks in hardware. arXiv preprint arXiv:1705.06963.
  • [11] A. Sengupta, Y. Ye, R. Wang, C. Liu, and K. Roy (2019) Going deeper in spiking neural networks: VGG and residual architectures. Frontiers in Neuroscience 13.
  • [12] T. Wunderlich, A. F. Kungl, E. Müller, A. Hartel, Y. Stradmann, S. A. Aamir, A. Grübl, A. Heimbrecht, K. Schreiber, D. Stöckel, et al. (2019) Demonstrating advantages of neuromorphic computation: a pilot study. Frontiers in Neuroscience 13, pp. 260.