Lifelong Learning from Event-based Data

by Vadym Gryshchuk, et al.

Lifelong learning is a long-standing aim for artificial agents that act in dynamic environments, in which an agent needs to accumulate knowledge incrementally without forgetting previously learned representations. We investigate methods for learning from data produced by event cameras and compare techniques to mitigate forgetting while learning incrementally. We propose a model composed of both feature extraction and continuous learning. Furthermore, we introduce a habituation-based method to mitigate forgetting. Our experimental results show that combining different techniques can help to avoid catastrophic forgetting while learning incrementally from the features provided by the extraction module.




1 Introduction

An event camera is a dynamic vision sensor that responds to changes in brightness at each pixel location. Its advantages over conventional shutter cameras are lower power consumption, higher dynamic range, and high temporal resolution. These advantages make an event sensor suitable for real-life applications in which fast responses, timing, and scene illumination matter. Such scenarios lay the groundwork for artificial systems that depend on continuous learning. These aspects motivated us to use event-based data for lifelong learning. Architectures for lifelong learning that are based on a pre-trained feature extractor deliver state-of-the-art results [10, 11]. However, these architectures consider data produced by a conventional shutter camera. In this paper, we show that lifelong learning from event-based data can follow the same strategy. Furthermore, the application of event cameras opens new possibilities for the development of novel learning systems in dynamic environments.

2 Background

Since each pixel in an event camera responds to brightness change independently, the generated asynchronous output is challenging to process. Event-based sequences can be processed by event-by-event methods or by methods that group events [2]. Event-by-event methods process events sequentially. One example is Phased LSTM [8], an extension of LSTM [4] that introduces a new time gate. This gate allows updates to a memory cell only during specific periods. Another approach is to group events into image-like data. A histogram is one way to convert events to an event frame [6]. To convert events to a histogram, the occurrences of brightness change at each pixel location over a particular period of time are counted. Specifically, the events that refer to a brightness increase are stored in one histogram, and the events that capture a brightness decrease are stored in another histogram. The two histograms then act as two input channels to a convolutional neural network (CNN). Since an event camera reacts only to brightness change, many pixel locations in a histogram can be empty. Thus, a histogram represents the edges of a scene captured by an event camera. However, a conventional convolutional operation causes dilation when the input is sparse. Therefore, a sparse CNN that preserves sparseness is a more reasonable choice [3].
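The histogram conversion described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `events_to_histogram`, the argument layout, and the polarity encoding (positive values for a brightness increase) are assumptions made for the example.

```python
import numpy as np


def events_to_histogram(x, y, polarity, height, width):
    """Count events per pixel, with one channel per polarity.

    x, y      : integer pixel coordinates of each event
    polarity  : > 0 for a brightness increase, <= 0 for a decrease (assumed encoding)
    Returns an array of shape (2, height, width).
    """
    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    positive = np.asarray(polarity) > 0

    hist = np.zeros((2, height, width), dtype=np.float32)
    # np.add.at accumulates correctly when the same pixel occurs several times.
    np.add.at(hist[0], (y[positive], x[positive]), 1.0)    # brightness increase
    np.add.at(hist[1], (y[~positive], x[~positive]), 1.0)  # brightness decrease
    return hist


# Four toy events on a 3x4 sensor: two increases at pixel (0, 0),
# one increase at (1, 1), and one decrease at (x=3, y=2).
hist = events_to_histogram(
    x=[0, 0, 3, 1], y=[0, 0, 2, 1], polarity=[1, 1, -1, 1], height=3, width=4
)
```

Most entries of `hist` stay zero, which is the sparsity that motivates a sparse CNN.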


Methods that are successfully used to mitigate catastrophic forgetting rely on regularization-based techniques or use replay mechanisms [10, 11]. Regularization-based methods restrict the updates to the model's parameters that are important for encoding previous knowledge. One method that estimates this importance is synaptic intelligence [12], which introduces a regularization term that is added to the total loss to penalize changes to important parameters while learning a new task. We will present and evaluate a simpler method using a neuron habituation mechanism. Methods that utilize replay mechanisms either store some previous samples or learn representations of previously learned data. A generative model, in particular a variational autoencoder, can be used to learn latent representations of data [11]. We show that the architecture for lifelong learning from event-based data can utilize the same methods that are applied to frame-based images.
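To make the regularization idea concrete, the sketch below adds a quadratic penalty to a task loss, in the form used by synaptic intelligence [12]: each parameter's squared deviation from its value after the previous task is weighted by an importance estimate and a global strength. How the importances are accumulated during training is omitted here; the function names and the toy numbers are assumptions for illustration only.

```python
import numpy as np


def quadratic_penalty(params, anchor_params, importance, strength):
    """strength * sum_k importance_k * (params_k - anchor_k)^2."""
    params = np.asarray(params, dtype=np.float64)
    anchor_params = np.asarray(anchor_params, dtype=np.float64)
    importance = np.asarray(importance, dtype=np.float64)
    return float(strength * np.sum(importance * (params - anchor_params) ** 2))


def regularized_loss(task_loss, params, anchor_params, importance, strength):
    """Total loss = loss on the new task + penalty for moving important parameters."""
    return task_loss + quadratic_penalty(params, anchor_params, importance, strength)


# Toy values: the first parameter is important but unchanged,
# the second is less important and has moved by 2.
penalty = quadratic_penalty(
    params=[1.0, 2.0], anchor_params=[1.0, 0.0], importance=[5.0, 0.5], strength=0.1
)
total = regularized_loss(1.0, [1.0, 2.0], [1.0, 0.0], [5.0, 0.5], 0.1)
```

A larger strength trades plasticity on the new task for stability on earlier ones.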

3 Approach

We propose an architecture that consists of a feature extractor and a component for continuous learning, visualized in Figure 1. To the best of our knowledge, there are no approaches for direct comparison. Lungu et al. used a memory-based method for incremental learning of hand gestures [5]. However, it is questionable whether their approach extends to scenarios in which the input is more complex.

Figure 1: Illustration of the proposed architecture. Sparse CNN trained in a self-supervised way extracts features from the events represented as histograms. The continuous learning component learns incrementally from the extracted features by utilizing replay through the VAE, synaptic intelligence, and habituation.

3.1 Feature Extraction

To design a feature extraction module, we compare Phased LSTM and sparse CNN as possible models to extract features from events. We train Phased LSTM in a supervised way and sparse CNN in a self-supervised way, both following the batch learning strategy. Phased LSTM serves as the event-by-event method, while sparse CNN learns from histograms, that is, from grouped events. Based on the results provided in Section 4, we select only the self-supervised approach using sparse CNN for the feature extraction module. Self-supervised learning is a subset of unsupervised learning, where no labels are provided during training. We follow the strategy for self-supervised learning proposed by Chen et al. [1]. However, instead of frame-based input, we provide events as histograms. The model applies random augmentations directly to histograms, thus learning in a contrastive way by maximizing the agreement between two augmented representations of the same object (Figure 1, bottom).
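Maximizing agreement between two augmented views is done in the framework of Chen et al. [1] with a normalized temperature-scaled cross-entropy loss. The NumPy sketch below computes that loss for a batch of paired embeddings; it is an illustration only, and the function name, the temperature value, and the toy embeddings are assumptions rather than settings from this paper.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """Contrastive loss for paired embeddings z_a[i] <-> z_b[i], each of shape (n, d)."""
    z = np.concatenate([np.asarray(z_a), np.asarray(z_b)], axis=0).astype(np.float64)
    z /= np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity via unit vectors
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                 # a sample is never its own negative

    n = len(z) // 2
    # Row i (first view) has its positive at i + n, and vice versa.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Row-wise log-softmax, computed stably.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), positives].mean())


views = np.array([[1.0, 0.0], [0.0, 1.0]])
aligned = nt_xent_loss(views, views)        # each pair agrees perfectly
swapped = nt_xent_loss(views, views[::-1])  # each pair is orthogonal
```

The loss is lower when the two views of the same sample agree (`aligned`) than when they do not (`swapped`).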

3.2 Continuous Learning

The module for continuous learning operates on the features provided by the feature extraction module. The learning process follows the incremental strategy, where a model has access only to some object categories during a learning episode. Thus, a learning episode contains only a subset of non-repeating objects that belong to the same category. We base our model on the method proposed by van de Ven et al. called brain-inspired replay [11]. It uses a variational autoencoder (VAE) that is trained together with a classifier (Figure 1, top).

Additionally, we introduce a habituation-based method to mitigate catastrophic forgetting while learning incrementally. This method utilizes the concept of habituation, which was successfully applied to self-organizing neural networks [10]. Habituation is the reduction of responses to repeated stimuli. We equip each neuron in the last fully connected layer of the encoder with a habituation counter that starts from an initial value. During training, only the fraction of neurons with the highest activation values is habituated. We slightly modify the habituation update rule presented in [10]: in each update, the habituation counter of a habituated neuron is decreased, and a decay rate controls the steepness of this decay. Consequently, a habituation counter is a monotonically decreasing function. To constrain the changes to the model's parameters, the gradient of the loss function with respect to a neuron is scaled by the habituation counter of this neuron during each learning iteration. Although the habituation-based regularization method is similar to synaptic intelligence, it can be utilized in each layer of a neural network with different hyperparameter values per layer, thus providing more plasticity during learning. We compare different combinations of brain-inspired replay, synaptic intelligence, and habituation to investigate the effect of the habituation-based approach on the mitigation of catastrophic forgetting while learning incrementally.
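The mechanism can be illustrated with a short NumPy sketch: counters of the most active neurons are decreased, and each neuron's gradient is multiplied by its counter. The update rule used here, h ← h + decay_rate · (kappa · (1 − h) − 1) with counters starting at 1, is a placeholder assumed for illustration in the spirit of habituation rules for self-organizing networks; it is not the modified rule of this paper. The function names, `kappa`, and all numeric values are likewise assumptions.

```python
import numpy as np


def habituate(counters, activations, fraction, decay_rate, kappa=1.05):
    """Decrease the counters of the most active neurons (assumed placeholder rule)."""
    counters = np.array(counters, dtype=np.float64)  # work on a copy
    k = max(1, int(round(fraction * counters.size)))
    top = np.argsort(activations)[-k:]               # indices of the k highest activations
    # Assumed update: h <- h + decay_rate * (kappa * (1 - h) - 1)
    counters[top] += decay_rate * (kappa * (1.0 - counters[top]) - 1.0)
    return np.clip(counters, 0.0, 1.0)


def scale_gradients(weight_grad, counters):
    """Scale the gradient row of each neuron by that neuron's habituation counter."""
    return np.asarray(weight_grad, dtype=np.float64) * np.asarray(counters)[:, None]


counters = np.ones(4)                                # assumed initial value of 1
activations = np.array([0.1, 0.9, 0.5, 0.7])
counters = habituate(counters, activations, fraction=0.5, decay_rate=0.1)

weight_grad = np.ones((4, 3))                        # one row per neuron
scaled = scale_gradients(weight_grad, counters)
```

Neurons that fire strongly and often accumulate small counters, so their incoming weights change less in later episodes, while rarely active neurons remain fully plastic.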

4 Experimental Results

We train and evaluate the proposed model on the N-Caltech101 dataset [9]. This dataset contains event-based representations of static images: while an image was shown on a screen, an event camera made three saccadic movements to record events. We use the same training and test sets as in [7]. The code is available online. To first evaluate the feature extraction module under optimal conditions, the whole training set is used to train it following the batch learning strategy. Each sample in the dataset can contain tens of thousands of events; because Phased LSTM processes events sequentially, its training time becomes intractable. Therefore, for Phased LSTM we randomly select only a fraction of the events of each sample, subject to a minimum number of events. Histograms are created from consecutive events; however, this interval of events is randomly placed over the whole sequence of events. Table 1 shows the classification accuracy of Phased LSTM and the classification accuracy of a linear classifier trained on top of sparse CNN. Phased LSTM achieves worse results, although it operates on only a portion of the events, which can lead to a drop in performance. Yet, using even this portion of events causes a huge overhead in training time compared to sparse CNN. Furthermore, training a feature extractor in a supervised way on the same training set that is used for the continuous learning module is not a fair condition. Thus, either a feature extractor that is trained without labels or a feature extractor that is trained on different data is a more reasonable approach. Based on these conditions, we use sparse CNN as the feature extractor for the continuous learning module.
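The two event-selection schemes described above can be sketched as follows. The helper names are hypothetical, and the fraction, minimum count, and window size in the usage lines are toy values chosen for the example, not the settings used in the experiments.

```python
import numpy as np


def subsample_events(num_events, fraction, minimum, rng):
    """Random subset of event indices (kept in temporal order) for sequential models."""
    keep = min(num_events, max(minimum, int(fraction * num_events)))
    return np.sort(rng.choice(num_events, size=keep, replace=False))


def sample_event_window(num_events, window_size, rng):
    """(start, stop) of a randomly placed run of consecutive events for a histogram."""
    window_size = min(window_size, num_events)
    start = int(rng.integers(0, num_events - window_size + 1))  # upper bound is exclusive
    return start, start + window_size


rng = np.random.default_rng(0)
indices = subsample_events(num_events=1000, fraction=0.1, minimum=200, rng=rng)
start, stop = sample_event_window(num_events=1000, window_size=100, rng=rng)
```

Sorting the sampled indices preserves the temporal order that an event-by-event model relies on.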

Phased LSTM (supervised) | sparse CNN (self-supervised)
Training, Test | Training, Test Top-1, Test Top-5
Table 1: Evaluation of the feature extraction module on N-Caltech101. (left) Classification accuracy using Phased LSTM. (right) Classification accuracy using linear classifier trained on top of frozen features from sparse CNN.

To evaluate the proposed habituation-based method, we combine habituation with the brain-inspired replay (BIR) and synaptic intelligence (SI) methods. All hyperparameters are the same for all methods to provide a fair comparison. The strength parameter of SI was found by a grid search. The habituation-based method (H) has two hyperparameters: a decay rate and the fraction of neurons with the highest activation values that are allowed to be habituated during each learning iteration. Both are set separately for the BIR+H and BIR+SI+H methods. Figure 2 illustrates class-incremental learning on N-Caltech101. The number of learning episodes is set to 20, and each learning episode contains samples from 5 different non-repeating object categories. The shaded areas show the standard error of the mean. The experiment was executed for three trials, each with a new seed and a new random order of classes. For BIR+H and BIR+SI, we report the average classification accuracy after learning the data of all episodes. The addition of the habituation-based method to BIR+SI provides a slight but significant increase in test accuracy.

Figure 2: Class-incremental learning on N-Caltech101.
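The class-incremental protocol (20 episodes of 5 previously unseen classes, repeated over seeded trials and summarized by the standard error of the mean) can be sketched as below. The helper names are hypothetical, class identifiers are plain integers, and the accuracy values are made-up toy numbers, not results from the paper.

```python
import numpy as np


def make_episodes(class_ids, classes_per_episode, num_episodes, seed):
    """Shuffle the classes with a trial-specific seed and split them into disjoint episodes."""
    order = np.random.default_rng(seed).permutation(np.asarray(list(class_ids)))
    if classes_per_episode * num_episodes > len(order):
        raise ValueError("not enough classes for the requested episodes")
    return [
        order[i * classes_per_episode:(i + 1) * classes_per_episode].tolist()
        for i in range(num_episodes)
    ]


def standard_error_of_mean(values):
    """Sample standard deviation divided by the square root of the number of trials."""
    values = np.asarray(values, dtype=np.float64)
    return float(values.std(ddof=1) / np.sqrt(values.size))


# One class order per trial; each trial uses its own seed.
trials = [make_episodes(range(100), 5, 20, seed=s) for s in range(3)]

# Toy accuracies from three trials, only to show the computation.
sem = standard_error_of_mean([1.0, 2.0, 3.0])
```

Because every trial draws a new class order, the spread across trials reflects sensitivity to the order in which categories are presented.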

5 Conclusion

We presented an architecture for lifelong learning, consisting of a feature extractor and a module for continuous learning. We showed that Phased LSTM is not a favourable method for learning from long event-based sequences. The sparse CNN trained in a self-supervised way achieves better results, but histograms discard short time-scale information. A combination of brain-inspired replay and synaptic intelligence with a simple habituation method, which was previously applied to self-organizing neural networks, yields the best performance in class-incremental learning of 100 classes. Furthermore, the presented approach provides useful insights into the application of event cameras in real-life scenarios in which incremental accumulation of knowledge is crucial.


  • [1] T. Chen, S. Kornblith, M. Norouzi, and G. E. Hinton. A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 1597–1607. PMLR, 2020.
  • [2] G. Gallego, T. Delbrück, G. Orchard, C. Bartolozzi, B. Taba, A. Censi, S. Leutenegger, A. J. Davison, J. Conradt, K. Daniilidis, and D. Scaramuzza. Event-based vision: A survey. CoRR, abs/1904.08405, 2019.
  • [3] B. Graham, M. Engelcke, and L. van der Maaten. 3D semantic segmentation with submanifold sparse convolutional networks. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9224–9232, 2018.
  • [4] S. Hochreiter and J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735–1780, 1997.
  • [5] I. A. Lungu, S.-C. Liu, and T. Delbruck. Incremental learning of hand symbols using event-based cameras. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 9:690–696, 2019.
  • [6] A. I. Maqueda, A. Loquercio, G. Gallego, N. García, and D. Scaramuzza. Event-based vision meets deep learning on steering prediction for self-driving cars. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pages 5419–5427. IEEE Computer Society, 2018.
  • [7] N. Messikommer, D. Gehrig, A. Loquercio, and D. Scaramuzza. Event-based asynchronous sparse convolutional networks. In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part VIII, volume 12353 of Lecture Notes in Computer Science, pages 415–431. Springer, 2020.
  • [8] D. Neil, M. Pfeiffer, and S.-C. Liu. Phased LSTM: Accelerating recurrent network training for long or event-based sequences. In Advances in Neural Information Processing Systems 29, pages 3882–3890. Curran Associates, Inc., 2016.
  • [9] G. Orchard, A. Jayawant, G. K. Cohen, and N. Thakor. Converting static image datasets to spiking neuromorphic datasets using saccades. Frontiers in Neuroscience, 9:437, 2015.
  • [10] G. I. Parisi, J. Tani, C. Weber, and S. Wermter. Lifelong learning of spatiotemporal representations with dual-memory recurrent self-organization. Frontiers in Neurorobotics, 12:78, 2018.
  • [11] G. M. van de Ven, H. T. Siegelmann, and A. S. Tolias. Brain-inspired replay for continual learning with artificial neural networks. Nature Communications, 11:4069, 2020.
  • [12] F. Zenke, B. Poole, and S. Ganguli. Continual learning through synaptic intelligence. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, volume 70 of Proceedings of Machine Learning Research, pages 3987–3995. PMLR, 2017.