Although artificial neural networks (ANNs) have equaled and even surpassed human performance on various tasks [1, 2], they suffer from a range of problems. First, ANNs suffer from catastrophic forgetting [3, 4]. While humans and animals can continuously learn from new information, ANNs perform well on new tasks while forgetting older tasks that are not explicitly retrained. Second, ANNs fail to generalize to multiple examples of the specific task for which they were trained [5, 6, 7]. Indeed, ANNs are usually trained with highly filtered datasets, which limits the extent to which they can generalize beyond these filtered examples. In contrast, humans robustly act in the presence of limited or altered (e.g., by noise) stimulus conditions [5, 6]. Thirdly, ANNs sometimes fail to transfer learning to the other similar tasks apart from the ones they were explicitly trained on . In contrast, humans represent information in a generalized fashion that does not depend on the exact properties or conditions of how the task was learned . This allows the mammalian brain to transfer old knowledge to unlearned tasks, while the current state-of-the-art deep learning models are unable to do so.
. During sleep, neurons are spontaneously active without external input and generate complex patterns of synchronized oscillatory activity across brain regions. Previously experienced or learned activity is believed to be replayed during sleep[13, 14]. This replay of the recently learned memories along with relevant old memories is thought to be the critical mechanism that results in memory consolidation. In this new study, we implemented the main mechanisms behind the sleep neuronal activity to benefit ANNs performance based on the relevant biophysical modeling work [15, 16, 17].
The principles of memory consolidation during sleep have previously been used to address the problem of catastrophic forgetting in ANNs. A generative model of the hippocampus and cortex was used to generate examples from a distribution of previously learned tasks in order to retrain (replay) these tasks during an off-line phase . Generative algorithms were used to generate previously experienced stimuli during the next training period in [19, 20]
. A loss function (termed elastic weight consolidation - EWC), which penalizes updates to weights deemed important for previous tasks, was introduced in making use of synaptic mechanisms of memory consolidation. Although these studies report positive results in preventing catastrophic forgetting, they have many limitations. First, EWC does not seem to work in an incremental learning framework [19, 22]. Second, generative models only focus on the replay aspect of sleep and therefore it is unclear if these models could have any benefits in addressing problems of generalization of knowledge. Further, generative models require a separate network that stores the statistics of the previously learned inputs which imposes an additional cost, while rehearsal of small examples of different classes may be sufficient to prevent catastrophic forgetting .
In this work, we propose a novel sleep algorithm which makes use of two principles observed during sleep in biology: memory reactivation and synaptic plasticity. First, we train ANN using backpropagation algorithm. After initial training, denoted awake training, we convert the ANN to SNN and perform unsupervised STDP phase with noisy input and increased intrinsic network activity to simulate sleep-like active (Up) state dynamics found during deep sleep. Finally, the weights from the SNN are converted back to the ANN and we test performance. We uncover three benefits of using this sleep algorithm.
Sleep reduces catastrophic forgetting by reactivation of the older tasks.
Sleep increases the network’s ability to generalize to noisy or alternated versions of the training data set.
Sleep allows the network to perform forward transfer learning.
To the best of our knowledge, this is the first known sleep-like algorithm that improves ANNs ability to generalize on the noisy or alternated versions of the input. While few other algorithms were previously proposed to prevent catastrophic forgetting [19, 23, 18], our approach is more scalable and it does not require storage of the previously seen inputs or using pseudo-rehearsal to regenerate and retrain those inputs. Importantly we demonstrate that ANNs retain information about (what seems to be) forgotten tasks that could be recovered during sleep. Our algorithm can be complimentary to the other approaches and, importantly, it provides a principled way to incorporate various features of sleep to the wide range of neural network architectures.
First, we describe the general components of the sleep algorithm. Briefly, a fully connected feedforward network (FCN) is trained on a task. The ANN consisted of ReLU activation units to create positive firing rates and no bias. We used a previously developed algorithm to convert the architecture in the FCN to an equivalent SNN. In short, the weights are transferred directly to the SNN, which consists of leaky integrate and fire neurons. Weights are scaled by the maximum activation in each layer during training. After building the SNN, we run a ’sleep’ phase which modifies the network connectivity based on spike-timing dependent plasticity (STDP). After running sleep phase, the weights are converted back into the FCN and testing or further training is performed.
Below, we describe the sleep phase in more details. The input layer of the SNN is activated with Poisson-distributed spike trains with mean firing rate given by the average value of each unit activation in the ANN for all tasks seen so far (during initial training). We presented either the entire average image seen during initial ANN training or randomized portions of the average image seen so far or all the active regions during any of the inputs. To apply STDP, we ran one time step of the network propagating activity. Each layer in the SNN is characterized by two important parameters that dictates its firing rate: a threshold and a synaptic scaling factor. The input to a neuron is computed as, where is the layer-specific synaptic scaling factor, is the weight matrix, and is the spiking activity (binary) of the previous layer. This input is added to the neuron’s membrane potential. If the membrane potential exceeds a threshold, the neuron fires a spike and its membrane potential is reset. Otherwise, the potential decays exponentially. After each spike, weights are updated according to a modified sigmoidal weight-dependent STDP rule. Weights are increased if a pre-synaptic spike leads to a post-synaptic spike. Weights are decreased if a post-synaptic spike fires without a pre-synaptic spike.
We tested the sleep algorithm on various datasets, including toy datasets which was used as a motivating example. This dataset, termed "Patches", consists of 4 images of binary pixels arranged in an matrix. Each of the images has varying amount of overlap with the other 4 images to test the catastrophic forgetting. Likewise, we blurred the patches so that on-pixels spillover into the neighboring pixels making the dataset slightly different from the one the network was trained on. We used this dataset to show the benefits of the sleep algorithm in a simpler setting. We also tested the sleep algorithm on the MNIST  and CUB200  datasets to ensure generalizability of our approach. For CUB200, we used the pre-trained Resnet embeddings previously used for catastrophic forgetting [18, 27].
To test catastrophic forgetting, we utilized an incremental learning framework. The FCN was trained sequentially on groups of 2 classes for patches and MNIST and groups of 100 classes for CUB200 . After training on a single task, we run the sleep algorithm as described above before training on the next task. To test generalization, we trained FCN on the entire dataset and we compared this network’s performance on classifying noisy or blurred images to the FCN performance that implemented sleep phase after training. For transfer learning, a network trained on one task was put to sleep and then tested on a new, unseen task. Dataset specific parameters for training and sleep in the catastrophic forgetting task are shown in Table 1
. For the MNIST dataset, we utilized a genetic algorithm to find optimal parameters, although this is not an absolute requirement and our summary results are based on hand-tuned parameters.
|Architecture||[100, 4]||[784, 1200, 1200, 10]||[2048, 350, 300, 200]|
|Learning Rate||0.1||0.065||0.1, 0.01|
|Epochs||1 per task||2 per task||50 per task|
|Input Rate||64||130 Hz||32|
|Thresholds||1.045||2.1772, 1.5217, 0.9599||1, 1, 1|
|Synaptic||4.25||3.4723, 25.52, 2.4186||1, 1, 1|
3.1 Sleep prevents catestrophic forgetting and lead to forward transfer for Patches
The Patches dataset represents an easily interpretable example to verify and validate our sleep algorithm. We utilized 4 binary images of size with 15 pixel overlap and 25% of pixels turned on. Thus, 10 pixels are unique for each image in the dataset (Fig. 1A). To determine if catastrophic forgetting occurs in this model, and if sleep can recover performance, we split the dataset into two tasks - task one representing two images (out of four total) and the other task comprised of the other two images. Training on task 1 resulted in the high performance on task 1 with no performance improvement on task 2. After sleep phase, performance on task 1 remained perfect, while task 2 performance sometimes revealed an increase. After training on task 2, performance on task 1 on average decreased from its perfect level, indicating forgetting of task 1. However, after sleep, performance on both task 1 and task 2 reached 100% (Fig 1B). Including only one sleep phase at the end of awake training also resurrected performance for both tasks (Fig. 1C).
To analyze why sleep prevents catastrophic forgetting in this toy example, we looked at the weights connecting to each input neuron. Since we have knowledge of all the pixels in the training data, we could measure the weights connecting from the pixels that are turned on during image presentation to the corresponding output neuron. Ideally, for a given image, the spread between weights from on-pixels and weights from off-pixels should be high, such that on-pixels drive an output neuron and off-pixels suppress the same output neuron. To measure this, we computed the average spread across output neurons and weights for on-pixels and off-pixels (Fig. 1D). Our results indicate that sleep increases the spread between weights originating from on-pixels vs those from off-pixels, validating that the sleep algorithm is acting by increasing meaningful weights and decreasing potentially irrelevant or incorrect weights.
We next observed the performance as a function of the number of overlapping pixels in the dataset for 2 cases: one with sleep implementing after each awake training period and one with only one sleep phase at the end of training. With 2 sleep phases, we observed that after the first sleep episode, the network performed well on the first task and correctly classified images from the second task about 50% of the time (Fig. 1E). This suggests that sleep may lead to increase in performance on tasks for which SNN has not seen any training data inputs. We call such an improvement on the previously unseen tasks as ’forward transfer’ similar to zero-shot learning phenomenon previously shown in other architectures, e.g. [28, 29].
After training on the second task followed by sleep, the network classified all the images correctly up to the very high level of the pixel overlap. In the later case, we observed that the sleep phase increases performance beyond that of the control network, indicating less catastrophic forgetting (Fig. 1F). Forgetting only occurs for pixel overlap greater than 15 pixels. However, for higher pixel overlap values, sleep routinely reduced the amount of forgetting. Comparing the two cases, we note that an intermediate sleep phase between task one and task two actually increases performance and reduces forgetting after normal awake training on task two. This again suggests that sleep may be useful in creating a forward transfer representation of similar, yet novel, tasks and may boost transfer learning in other domains. Overall, these results provide validation of our sleep algorithm and raise the question if the same results can be obtained for more complex datasets and network architectures, which we will discuss later in this paper.
3.2 What causes catastrophic forgetting and how does sleep help?
In this section, we consider a simple case study to examine the cause of the catastrophic failure and the role of sleep in recovering from the forgetting. While this example is not intended to model all scenarios of catastrophic forgetting, it extracts the intuition and explains the basic mechanism behind our algorithm.
Let us consider the 3-layer network trained on two categories, each with just one example. Consider 2 binary vectors (Category 1 and Category 2) with some region of overlap.
We consider ReLU activation since it was used in the rest of this work. We assume the output to be the neuron with the highest activation in the output layer. Let the network be trained on Category 1 with backpropagation using static learning rate. Following this, we trained the network on Category 2 using same approach. A 3-layer network we consider here has an input layer with neurons, hidden neuron and an output layer with neurons for the categories. Inputs are 10 bits long with 5 bit overlap. We trained with learning rate of 0.1 for 4 epochs.
Analysis of hidden layer behaviour:
We can divide the hidden neurons into four types based on their activation for the two categories: - those neurons that fire for Category 1 but not 2; - those neurons that fire for Category 2 but not 1; - those neurons that fire for Category 1 and 2; - those that fire for neither category, where firing indicates a non-zero activation. Note that these sets may change during training or sleep. Let be the weights from type to output .
Consider the case where the input of Category 1 is presented. The only hidden layer neurons that fire are and . Output neuron 1 will get the net value and output neuron 2 will get the net value . For output neuron 1 to fire, we need two conditions to be held: (1) (2) . The second condition above can be rewritten as , which separates the weights according to the hidden neurons. Using this separation, we give the following definitions: Define to be on pattern ; to be on pattern ; to be on pattern and to be on pattern . (Note that and are very closely correlated since they differ only in the activation values of neurons which are positive in both cases).
So, on the input pattern , output 1 fires only if ; on input pattern , output 2 fires only if .
Catastrophic forgetting: Following training on 2 categories, if the network can not recall Category 1, i.e., output neuron 1 activation is negative or less than that of output neuron 2, catastrophic forgetting has occurred (We confirmed this occurred 78% of times for the 3 layer network described above and 100 trials). The second phase of training ensures . This could involve reduction in which would reduce as well. (Since does not fire on input pattern 2, back-propagation does not alter ) Reducing may result in failing the condition , i.e., misclassifying input 1.
Effect of sleep: Sleep can increase difference among weights (which are different enough to begin with) as was shown in [30, 31]. So, as the difference between and increases, this decreases (as is higher, decreases). Occurrence of the same change to is prevented as follows: it is likely that at least one of the weights coming to a neuron is negative. In that case, increasing the difference would involve making the negative weight even more negative, resulting in the neuron joining either or (as it no longer fires for the pattern showing the negative weight), thus reducing . (This is explained further in the supplement)
When the neurons in remains, we have a more complex case: here, decreases, but may also decrease correspondingly; another undesirable scenario is when decreases to become less than . Typically sleep tends to drive synaptic weights of the opposite signs, or the weights of same sign but different by some threshold value, away from each other. There are conditions when the difference between weights is below threshold point to cause divergence. In those cases sleep does not improve performance.
Experiments: In our experiments, for majority of the cases, we found to be empty after sleep, thus making to become . For the instances when this was not a case, the initial values of , , and were almost, i.e., the entire work of classifying the inputs is done by shared input. In such case, the network has no hidden information that sleep could retrieve. (Evidence is provided in the supplement).
3.3 Sleep recovers tasks lost due to catastrophic forgetting in MNIST and CUB200
ANNs have been shown to suffer from catastrophic forgetting whereby they perform well on the recently learned tasks but fail at previously learned tasks for various datasets, including MNIST and CUB200 . Here, we created 5 tasks for the MNIST dataset and 2 tasks for the CUB200 dataset. Each pair of digits in MNIST was defined as a single task, and half of the classes in CUB200 was considered to be a single task. Each task was incrementally trained, followed by a sleep phase, until all tasks were trained. A baseline network trained incrementally without sleep performed poorly (Fig. 2D, black bar). However, we noted a significant improvement in the overall performance, as well as task specific performance, when sleep algorithm was incorporated into the training cycle (Fig. 2D, red bar).
For MNIST, we found that each of the five tasks revealed an increase in classification accuracy after sleep even after being completely "forgotten" during awake training (Fig. 2A). For the 1st training + sleep cycle, the "before sleep" network only classifies images for the task that was seen during last training (digits 4-5 in Fig. 2B). After sleep, performance remains high on digits 4 and 5 but there is also spillover into the other digits. For the last training + sleep cycle, we observed the same effect. Only last task performed well right after the training (Fig. 2C). After sleep, performance on almost all digits nearly recovered (Fig. 2D). On the CUB200 dataset, we found that sleep can recover task 1 performance after training on task 2, with only minimal loss to task 2 performance (Fig. 2E). In conclusion, the sleep algorithm reduces catastrophic forgetting by reducing overlap between network activity for distinct classes.
Although specific performance numbers we obtain here are not as impressive as for some generative models [19, 18], they surpass certain regularization methods, such as EWC, on incremental learning . Overall, we believe that the sleep algorithm can reduce catastrophic forgetting and interference with very little knowledge of the previously learned examples solely by utilizing STDP to reactivate forgotten weights. Ultimately, these results suggest that information about old tasks is not completely lost when catastrophic forgetting occurs from performance level perspective. Instead, information about old tasks remains present in the connectivity weights and offline STDP phase can resurrect this hidden information. To achieve higher performance, offline STDP/sleep algorithm could be combined with generative replay to utilize specific, rather than average (as we use in our study here), inputs during sleep.
3.4 Sleep promotes separation of internal representations for different inputs
As suggested by our previous analysis section, sleep could separate the neurons belonging to the different input categories and prevent catastrophic forgetting. This would also result in a change in the internal representation of the different inputs in the network. We examined this in the network trained on MNIST dataset and we compared performance before and after the sleep.
In order to examine how the internal representation of the different tasks are related and modified after sleep, we examined the correlation between ANN activation at different layers after awake training and after sleep. In particularly, we computed the average correlation between activation of examples of the class with examples of the class . We observed the correlation before sleep was higher both within the same input category and across all categories. After sleep, the correlations between different categories were reduced (Fig. 3) while the correlation within category remained high. This proposes that sleep promotes decorrelating the internal representations of the input categories, suggesting a mechanism by which sleep can prevent catastrophic forgetting.
3.5 Sleep improves generalization
Many studies in machine learning reported a failure of neural networks to generalize beyond their explicit training set. Given that sleep tends to create a more generalized representation of the stimulus within network architecture, next we tested the hypothesis that sleep algorithm could increase ANN’s ability to generalize beyond the original training data set. To do so, we created noisy and blurred versions of the MNIST and Patches samples and we tested the network before and after sleep on these distorted datasets (Fig. 4). Our results suggest that sleep can substantially increase the network’s ability to classify degraded images. Indeed, for both MNIST and Patches dataset, the "after sleep" network substantially outperformed the "before sleep" network on classifying noisy and blurred images. This is illustrated by analysis of the confusion matrices, where "before sleep" network trained on the intact MNIST images favors one class over another when tested on the degraded images. Surprisingly, sleep restored the network ability to correctly predict the classes. It is important to note that we trained MNIST network sub-optimally to illustrate the case where the network performs low on degraded images. The same network architecture can perform well without sleep even on degraded images if the training dataset is significantly expanded.
These results highlight the benefit of utilizing sleep to generalize representation of the task at hand. ANNs are normally trained on the highly filtered datasets that are identically and independently distributed. However, in a real-world scenario, inputs may not meet these assumptions. Incorporating a sleep-like phase into training of ANNs may enable a more generalized representation of the input statistics, such that distributions which are not explicitly trained may still be represented by the network after sleep.
We showed that a biologically-inspired sleep algorithm may provide several important benefits when incorporated to the neural network training. We found that sleep is able to resurrect tasks that were erased due to the catastrophic forgetting after new task training utilizing backpropagation algorithm. Our study suggests that while performance on such "forgotten" tasks was dramatically reduced after new training, the network weights retained partial information about the older tasks and sleep could reactivate the older tasks to strengthen the reduced connectivity and to recover the performance.
While proposed sleep method to prevent catastrophic forgetting currently performs below some other techniques [19, 18, 23], the other approaches either remember full set of training inputs or recreate inputs from the generator networks. Our approach does not require storing any input information and it may be complimentary to the other techniques; in that applying a sleep-like phase to the generative mechanisms may further boost overall performance.
We found that sleep algorithm can also help to generalize on the previously learning tasks. Indeed, classification accuracy increased significantly after sleep for images that incorporated Gaussian noise or were blurred. We used MNIST dataset to demonstrate this effect which improved performance from about 20% to 50%. This additional benefit of sleep likely arises from stochastic nature of the network dynamics during sleep that creates a more generalized representation of the previously learned tasks. Indeed, we found (not shown) that the same approach can be extended to increase network resistance to adversarial attacks.
Finally, we also observed that sleep improves performance on the tasks that the network has not been trained on, but that share some properties with the previously trained tasks. We refer to this effect as ’forward transfer’, similar to zero-shot learning [28, 29]. This effect again likely arises from the stochasticity of the sleep dynamics which allows for the shared features between tasks to be strengthened which are then used in the backpropagation phase to learn different tasks.
There are several current and past attempts to implement effect of sleep in ANNs or machine learning architectures [32, 18]. However, our approach significantly differs from these previous attempts, in that we used conversion method from ANN to SNN and implemented sleep at the SNN level which is relatively well understood from the neuroscience perspective [15, 16, 33, 30]. Importantly, this approach allows direct implementation of the many other brain inspired ideas. To sum, we believe that our approach provides a principled way to apply mechanisms of the biological sleep in memory consolidation to existing AI architectures.
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma,
Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al.
Imagenet large scale visual recognition challenge.
International journal of computer vision, 115(3):211–252, 2015.
David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew
Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore
Graepel, et al.
A general reinforcement learning algorithm that masters chess, shogi, and go through self-play.Science, 362(6419):1140–1144, 2018.
-  Ian J Goodfellow, Mehdi Mirza, Da Xiao, Aaron Courville, and Yoshua Bengio. An empirical investigation of catastrophic forgetting in gradient-based neural networks. arXiv preprint arXiv:1312.6211, 2013.
-  James L McClelland, Bruce L McNaughton, and Randall C O’reilly. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychological review, 102(3):419, 1995.
-  Robert Geirhos, Carlos RM Temme, Jonas Rauber, Heiko H Schütt, Matthias Bethge, and Felix A Wichmann. Generalisation in humans and deep neural networks. In Advances in Neural Information Processing Systems, pages 7538–7550, 2018.
-  Samuel Dodge and Lina Karam. A study and comparison of human and deep learning recognition performance under visual distortions. In 2017 26th international conference on computer communication and networks (ICCCN), pages 1–7. IEEE, 2017.
-  Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
-  Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on knowledge and data engineering, 22(10):1345–1359, 2009.
-  Friederike Spengler, Timothy PL Roberts, David Poeppel, Nancy Byl, Xiaoqin Wang, Howard A Rowley, and Mike M Merzenich. Learning transfer and neuronal plasticity in humans trained in tactile discrimination. Neuroscience letters, 232(3):151–154, 1997.
-  Matthew P Walker and Robert Stickgold. Sleep-dependent learning and memory consolidation. Neuron, 44(1):121–133, 2004.
-  Robert Stickgold and Matthew P Walker. Sleep-dependent memory triage: evolving generalization through selective processing. Nature neuroscience, 16(2):139, 2013.
-  Björn Rasch and Jan Born. About sleep’s role in memory. Physiological reviews, 93(2):681–766, 2013.
-  Daoyun Ji and Matthew A Wilson. Coordinated memory replay in the visual cortex and hippocampus during sleep. Nature neuroscience, 10(1):100, 2007.
-  Matthew A Wilson and Bruce L McNaughton. Reactivation of hippocampal ensemble memories during sleep. Science, 265(5172):676–679, 1994.
-  Giri P Krishnan, Sylvain Chauvette, Isaac Shamie, Sara Soltani, Igor Timofeev, Sydney S Cash, Eric Halgren, and Maxim Bazhenov. Cellular and Neurochemical Basis of Sleep Stages in the Thalamocortical Network. 5:e18607.
-  Sean Hill and Giulio Tononi. Modeling Sleep and Wakefulness in the Thalamocortical System. 93(3):1671–1698.
-  Maxim Bazhenov, Igor Timofeev, Mircea Steriade, and Terrence J Sejnowski. Model of thalamocortical slow-wave sleep oscillations and transitions to activated states. Journal of neuroscience, 22(19):8691–8704, 2002.
-  Ronald Kemker and Christopher Kanan. Fearnet: Brain-inspired model for incremental learning. arXiv preprint arXiv:1711.10563, 2017.
-  Gido M van de Ven and Andreas S Tolias. Generative replay with feedback connections as a general strategy for continual learning. arXiv preprint arXiv:1809.10635, 2018.
-  Zhizhong Li and Derek Hoiem. Learning without forgetting. IEEE transactions on pattern analysis and machine intelligence, 40(12):2935–2947, 2018.
-  James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences, 114(13):3521–3526, 2017.
Ronald Kemker, Marc McClure, Angelina Abitino, Tyler L Hayes, and Christopher
Measuring catastrophic forgetting in neural networks.
Thirty-second AAAI conference on artificial intelligence, 2018.
-  Tyler L. Hayes, Nathan D. Cahill, and Christopher Kanan. Memory Efficient Experience Replay for Streaming Learning.
-  Peter U Diehl, Daniel Neil, Jonathan Binas, Matthew Cook, Shih-Chii Liu, and Michael Pfeiffer. Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In 2015 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2015.
-  Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
-  Peter Welinder, Steve Branson, Takeshi Mita, Catherine Wah, Florian Schroff, Serge Belongie, and Pietro Perona. Caltech-ucsd birds 200. 2010.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun.
Deep residual learning for image recognition.
Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
-  Mark Palatucci, Dean Pomerleau, Geoffrey E Hinton, and Tom M Mitchell. Zero-shot learning with semantic output codes. In Advances in neural information processing systems, pages 1410–1418, 2009.
-  Richard Socher, Milind Ganjoo, Christopher D Manning, and Andrew Ng. Zero-shot learning through cross-modal transfer. In Advances in neural information processing systems, pages 935–943, 2013.
-  Yina Wei, Giri P. Krishnan, Maxim Komarov, and Maxim Bazhenov. Differential roles of sleep spindles and sleep slow oscillations in memory consolidation. page 153007.
-  Oscar C Gonzalez, Yury Sokolov, Giri Krishnan, and Maxim Bazhenov. Can sleep protect memories from catastrophic forgetting? BioRxiv, page 569038, 2019.
-  Geoffrey E Hinton, Peter Dayan, Brendan J Frey, and Radford M Neal. The" wake-sleep" algorithm for unsupervised neural networks. Science, 268(5214):1158–1161, 1995.
-  Yina Wei, Giri P Krishnan, and Maxim Bazhenov. Synaptic Mechanisms of Memory Consolidation during Sleep Slow Oscillations. The Journal of Neuroscience, 36(15):4231–4247, 2016.