An event camera is a dynamic vision sensor that responds to brightness changes at each pixel location. Lower power consumption, higher dynamic range and high temporal resolution are the advantages of event cameras over conventional shutter cameras. These advantages make an event sensor suitable for real-life applications in which response time and scene illumination are critical. Such scenarios lay the groundwork for the development of artificial systems that depend on continuous learning. These aspects motivated us to use event-based data for lifelong learning. Architectures for lifelong learning that are based on a pre-trained feature extractor deliver state-of-the-art results [10, 11]. However, these architectures consider data produced by a conventional shutter camera. In our paper, we show that lifelong learning from event-based data can follow the same strategy. Furthermore, the application of event cameras opens new possibilities for the development of novel learning systems in dynamic environments.
Since each pixel in an event camera responds to brightness change independently, the output is asynchronous, which makes such data challenging to process. Event-based sequences can be processed by event-by-event methods or by methods that group events. Event-by-event methods process events sequentially. One example is Phased LSTM, an extension of LSTM that introduces a new time gate. This gate allows updates to a memory cell only during specific periods. Another approach is to group events into image-like data. A histogram is one way to convert events to an event frame. To build a histogram, the occurrences of brightness change at each pixel location over a particular period of time are counted. Specifically, the events that signal a brightness increase are stored in one histogram, and the events that signal a brightness decrease are stored in another. The two histograms then act as two input channels to a convolutional neural network (CNN). Since an event camera reacts only to brightness change, many pixel locations in a histogram contain no values. Thus, a histogram represents the edges of a scene captured by an event camera. However, a conventional convolution dilates sparse input, because every non-zero value spreads to all output locations covered by the kernel. Therefore, a sparse CNN that preserves sparseness is a more reasonable choice.
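The histogram conversion described above can be sketched as follows. This is a minimal illustration, assuming that events arrive as arrays of pixel coordinates and polarities; the function name `events_to_histogram` and its arguments are hypothetical and not taken from the paper's code.

```python
import numpy as np

def events_to_histogram(x, y, polarity, height, width):
    """Count events per pixel, with one channel per polarity.

    Channel 0 counts brightness increases (polarity > 0),
    channel 1 counts brightness decreases.
    """
    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    positive = np.asarray(polarity) > 0

    hist = np.zeros((2, height, width), dtype=np.float32)
    # np.add.at accumulates correctly when a pixel occurs several times.
    np.add.at(hist[0], (y[positive], x[positive]), 1.0)
    np.add.at(hist[1], (y[~positive], x[~positive]), 1.0)
    return hist
```

Pixels that receive no events stay at zero, which is why the resulting two-channel frame is sparse.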
Regularization-based methods restrict the updates to those model parameters that are important for encoding previous knowledge. One such method is synaptic intelligence, which estimates this importance and adds a regularization term to the total loss that penalizes changes to important parameters while a new task is learned. We present and evaluate a simpler method that uses a neuron habituation mechanism. Methods that utilize replay mechanisms either store some previous samples or learn representations of previously learned data. A generative model, in particular a variational autoencoder, can be used to learn latent representations of data. We show that the architecture for lifelong learning from event-based data can utilize the same methods that are applied to frame-based images.
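As a rough illustration of such a regularization term, the sketch below computes a quadratic penalty of the kind used by synaptic intelligence. The function and argument names are hypothetical, and the per-parameter importance values are assumed to be given; how they are estimated during training is not shown.

```python
import numpy as np

def si_penalty(params, anchor_params, importance, strength):
    """Quadratic penalty on parameter changes, weighted by importance.

    params        -- current parameter values
    anchor_params -- parameter values after the previous task
    importance    -- per-parameter importance (assumed to be precomputed)
    strength      -- scalar that trades off old and new knowledge
    """
    params = np.asarray(params, dtype=float)
    anchor_params = np.asarray(anchor_params, dtype=float)
    importance = np.asarray(importance, dtype=float)
    return strength * float(np.sum(importance * (params - anchor_params) ** 2))
```

The penalty is added to the task loss, so a parameter with high importance is expensive to move away from its previous value.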
We propose an architecture that consists of a feature extractor and a component for continuous learning, visualized in Figure 1. To the best of our knowledge, there are no approaches for direct comparison. Lungu et al. used a memory-based method for incremental learning of hand gestures. However, it is questionable whether their approach extends to scenarios in which the input is more complex.
3.1 Feature Extraction
To design the feature extraction module, we compare Phased LSTM and sparse CNN as candidate models for extracting features from events. Phased LSTM serves as the event-by-event method, whereas sparse CNN learns from histograms, i.e. from grouped events. We train Phased LSTM in a supervised way and sparse CNN in a self-supervised way, both following the batch learning strategy. Based on the results provided in Section 4, we select only the self-supervised approach using sparse CNN for the feature extraction module. Self-supervised learning is a subset of unsupervised learning, where no labels are provided during training. We follow the strategy for self-supervised learning proposed by Chen et al. However, instead of frame-based input, we provide events as histograms. The model applies random augmentations directly to the histograms and learns in a contrastive way by maximizing the agreement between two augmented representations of the same object (Figure 1, bottom).
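To make the contrastive objective more concrete, the sketch below implements the normalized temperature-scaled cross-entropy loss from Chen et al. in NumPy, together with one example augmentation. The horizontal flip, the temperature value and all function names are assumptions for illustration; the augmentations used for the histograms are not specified in this section.

```python
import numpy as np

def random_flip(hist, rng):
    """Flip a (channels, height, width) histogram horizontally with probability 0.5."""
    if rng.random() < 0.5:
        return hist[:, :, ::-1].copy()
    return hist

def nt_xent_loss(z1, z2, temperature=0.5):
    """Contrastive loss for two batches of embeddings of shape (N, D).

    Row i of z1 and row i of z2 are two augmented views of the same sample.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0).astype(float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                    # exclude self-pairs

    # Index of the positive partner for every row.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), positives].mean())
```

The loss is small when the two views of each sample are closer to each other than to the views of all other samples in the batch.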
3.2 Continuous Learning
The module for continuous learning operates on the features provided by the feature extraction module. Learning follows the incremental strategy, in which the model has access to only some object categories during a learning episode. Thus, a learning episode contains only samples from a subset of object categories, and these categories do not repeat across episodes. We base our model on the method proposed by van de Ven et al., called brain-inspired replay. It uses a variational autoencoder (VAE) that is trained together with a classifier (Figure 1, top).
Additionally, we introduce a habituation-based method to mitigate catastrophic forgetting while learning incrementally. This method utilizes the concept of habituation, which was successfully applied to self-organizing neural networks. Habituation is the reduction of responses to repeated stimuli. We equip each neuron in the last fully connected layer of the encoder with a habituation counter, which is initialized with . During training, only the fraction of neurons with the highest activation values is habituated. We slightly modify the habituation update rule presented in  and define it as follows:
Here, every neuron has its own habituation counter, and the decay rate controls the steepness of the decay. Consequently, the habituation counter decreases monotonically. To constrain the changes to the model’s parameters, the gradient of a neuron is scaled by the habituation counter of this neuron during each learning iteration:
Here, the gradient is a partial derivative of the loss function. Although the habituation-based regularization method is similar to synaptic intelligence, it can be utilized in each layer of a neural network with different hyperparameter values per layer, thus providing more plasticity during learning. We compare different combinations of brain-inspired replay, synaptic intelligence and habituation to investigate the effect of the habituation-based approach on the mitigation of catastrophic forgetting while learning incrementally.
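The habituation mechanism can be illustrated with the following sketch. It is not the update rule of the paper: the multiplicative decay `h * (1 - tau)`, the initial counter value of 1.0 in the usage note, and the names `habituate` and `scale_gradient` are all assumptions chosen only to show the two steps, namely habituating the most active neurons and scaling their gradients.

```python
import numpy as np

def habituate(counters, activations, tau, fraction):
    """Decay the counters of the most active neurons.

    counters    -- habituation counter per neuron, shape (neurons,)
    activations -- activation per neuron, e.g. averaged over a batch
    tau         -- decay rate (assumed multiplicative decay, see lead-in)
    fraction    -- share of neurons that are habituated in this iteration
    """
    k = max(1, int(round(fraction * counters.size)))
    most_active = np.argsort(activations)[-k:]
    updated = counters.copy()
    updated[most_active] *= (1.0 - tau)
    return updated

def scale_gradient(grad_weights, counters):
    """Scale the gradient rows of a (neurons, inputs) weight matrix per neuron."""
    return grad_weights * counters[:, None]
```

With counters that start at 1.0 (an assumed initial value), a frequently activated neuron receives smaller and smaller updates, whereas rarely activated neurons remain plastic.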
4 Experimental Results
We train and evaluate the proposed model on the N-Caltech101 dataset. This dataset contains event-based recordings of static images: while an image was shown on a screen, an event camera made three saccadic movements to record events. We use the same training and test sets as used in . The code is available from http://software.knowledge-technology.info. To first evaluate the feature extraction module under optimal conditions, we use the whole training set to train it following the batch learning strategy. Each sample in the dataset can contain tens of thousands of events, so training Phased LSTM on complete samples becomes intractable, since Phased LSTM processes events sequentially. Therefore, we randomly select only a fraction of the events, subject to a minimum number of events. Histograms are created from consecutive events; however, this interval of events is randomly placed over the whole sequence of events. Table 1 shows the classification accuracy of Phased LSTM and the classification accuracy of a linear classifier trained on top of sparse CNN. Phased LSTM achieves worse results, although it operates on only a portion of the events, which can lead to a drop in performance. Yet, even this reduced number of events causes a huge overhead in training time when compared to sparse CNN. Furthermore, training a feature extractor in a supervised way on the same training set that is used for the continuous learning module is not a fair condition. Thus, either a feature extractor that is trained without labels or a feature extractor that is used to extract features from different data is a more reasonable approach. Based on these conditions, we use sparse CNN as a feature extractor for the continuous learning module.
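The placement of a random interval of consecutive events can be sketched as below. The function name and the parameters `fraction` and `min_events` are hypothetical placeholders; the values used in the experiments are not reproduced here.

```python
import numpy as np

def select_event_window(num_events, fraction, min_events, rng):
    """Return (start, end) of a randomly placed window of consecutive events.

    The window covers `fraction` of the sequence, but at least `min_events`
    events, and never more than the sequence contains.
    """
    length = min(num_events, max(min_events, int(fraction * num_events)))
    start = int(rng.integers(0, num_events - length + 1))
    return start, start + length
```

The events in `events[start:end]` can then be accumulated into a histogram, so that a different part of the recording is seen each time the sample is drawn.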
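Table 1: Classification accuracy of Phased LSTM and of a linear classifier trained on top of sparse CNN. Columns: Training | Test | Training | Test Top-1 | Test Top-5.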
To evaluate the proposed habituation-based method, we combine habituation with the brain-inspired replay (BIR) and synaptic intelligence (SI) methods. All hyperparameters are the same for all methods to provide a fair comparison. The strength parameter of SI was found by a grid search and is set to . The habituation-based method (H) has two hyperparameters: a decay rate and the fraction of neurons with the highest activation values that are allowed to be habituated during each learning iteration. We set to and for the BIR+H and BIR+SI+H methods, respectively. For the strategies BIR+H and BIR+SI+H the values for are set to and , respectively. Figure 2 illustrates class-incremental learning on N-Caltech101. The number of learning episodes is set to 20. Each learning episode contains samples from 5 different non-repeating object categories. The shaded areas show the standard error of the mean. The experiment was run for three trials; in each trial, a new seed and a new random order of classes were used. After learning the data of all episodes, the BIR+H and BIR+SI methods achieve on average a classification accuracy of and , respectively. The addition of the habituation-based method to BIR+SI provides a slight but significant increase in test accuracy: .
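The class-incremental protocol, with 20 episodes of 5 non-repeating categories and a new class order per trial, can be sketched as follows. The helper names are hypothetical, and the sketch only covers the split into episodes and the standard error of the mean over trials, not the training itself.

```python
import numpy as np

def make_episodes(num_classes, classes_per_episode, seed):
    """Shuffle the class indices and split them into disjoint episodes."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_classes)
    return [
        order[i:i + classes_per_episode].tolist()
        for i in range(0, num_classes, classes_per_episode)
    ]

def standard_error(values):
    """Standard error of the mean over independent trials."""
    values = np.asarray(values, dtype=float)
    return float(values.std(ddof=1) / np.sqrt(values.size))
```

Calling `make_episodes(100, 5, seed)` with a different seed for each of the three trials yields a new random order of classes per trial.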
We presented an architecture for lifelong learning, consisting of a feature extractor and a module for continuous learning. We showed that Phased LSTM is not a favourable method for learning long event-based sequences. The sparse CNN trained in a self-supervised way achieves better results, but histograms discard short time-scale information. A combination of brain-inspired replay and synaptic intelligence with a simple habituation method, which was previously applied to self-organizing neural networks, yields the best performance in class-incremental learning of 100 classes. Furthermore, the approach presented in this paper provides useful insights into the application of event cameras for real-life scenarios in which incremental accumulation of knowledge is crucial.
-  T. Chen, S. Kornblith, M. Norouzi, and G. E. Hinton. A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 1597–1607. PMLR, 2020.
-  G. Gallego, T. Delbrück, G. Orchard, C. Bartolozzi, B. Taba, A. Censi, S. Leutenegger, A. J. Davison, J. Conradt, K. Daniilidis, and D. Scaramuzza. Event-based vision: A survey. CoRR, abs/1904.08405, 2019.
-  B. Graham, M. Engelcke, and L. van der Maaten. 3D semantic segmentation with submanifold sparse convolutional networks. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, pages 9224–9232, 2018.
-  S. Hochreiter and J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735–1780, 1997.
-  I. A. Lungu, S.-C. Liu, and T. Delbruck. Incremental learning of hand symbols using event-based cameras. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 9:690–696, 2019.
-  A. I. Maqueda, A. Loquercio, G. Gallego, N. García, and D. Scaramuzza. Event-based vision meets deep learning on steering prediction for self-driving cars. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pages 5419–5427. IEEE Computer Society, 2018.
-  N. Messikommer, D. Gehrig, A. Loquercio, and D. Scaramuzza. Event-based asynchronous sparse convolutional networks. In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part VIII, volume 12353 of Lecture Notes in Computer Science, pages 415–431. Springer, 2020.
-  D. Neil, M. Pfeiffer, and S.-C. Liu. Phased LSTM: Accelerating recurrent network training for long or event-based sequences. In Advances in Neural Information Processing Systems 29, pages 3882–3890. Curran Associates, Inc., 2016.
-  G. Orchard, A. Jayawant, G. K. Cohen, and N. Thakor. Converting static image datasets to spiking neuromorphic datasets using saccades. Frontiers in Neuroscience, 9:437, 2015.
-  G. I. Parisi, J. Tani, C. Weber, and S. Wermter. Lifelong learning of spatiotemporal representations with dual-memory recurrent self-organization. Frontiers in Neurorobotics, 12:78, 2018.
-  G. M. van de Ven, H. T. Siegelmann, and A. S. Tolias. Brain-inspired replay for continual learning with artificial neural networks. Nature Communications, 11:4069, 2020.
-  F. Zenke, B. Poole, and S. Ganguli. Continual learning through synaptic intelligence. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, volume 70 of Proceedings of Machine Learning Research, pages 3987–3995. PMLR, 2017.