After just a single night of bad sleep, we are acutely aware of the importance of sleep for orderly body and brain function. In fact, it has become clear that sleep serves multiple crucial physiological functions (siegel_sleep_2009; xie2013sleep), and growing evidence highlights its impact on cognitive processes (walker_role_2009). Yet, much remains unknown about the precise contribution of sleep, and in particular of dreams, to normal brain function.
One remarkable cognitive ability of humans and other animals lies in the extraction of general concepts and statistical regularities from sensory experience without extensive teaching (bergelson_at_2012). Such regularities in the sensorium are reflected on the neuronal level in invariant object-specific representations in high-level areas of the visual cortex (grill-spector_2001_the; hung_fast_2005; dicarlo_how_2012) on which downstream areas can operate. These so-called semantic representations are progressively constructed and enriched over an organism's lifetime (tenenbaum_how_2011; yee_semantic_2013) and hypothesized to be consolidated offline during sleep (dudai_consolidation_2015).
The observation of hippocampal activity patterns with a high degree of similarity between wakefulness and NREM sleep initially spurred the development of abstract models connecting NREM to memory consolidation. In contrast to ‘standard consolidation theories’ describing such activity replay as a transfer of declarative memories from hippocampal to cortical systems (mcclelland_why_1995; rasch_maintaining_2007), ‘transformation theories’ propose that the hippocampus mostly replays episodic memories from which semantic content is actively extracted and stored in cortical networks (nadel_memory_1997; winocur_memory_2010; lewis_overlapping_2011). This formation of cortical representations organized according to high-level concepts is typically referred to as ‘semantization’. However, the theoretical framework proposed to support these theories (lewis_overlapping_2011) lacks a mechanistic implementation compatible with cortical structures. The Wake-Sleep algorithm (hinton_wake-sleep_1995) has recently been interpreted as a model of sleep-mediated semantization (rauss2017; renno-costa_computational_2019). In contrast to alternative theories, it provides a more plausible basis for cortical implementations. However, this model is challenging to scale to complex data and produces relatively weak representations. Furthermore, none of these existing models distinguishes between different sleep states, nor do they account for the subjectively most prominent aspect of sleep, namely dreams, which typically go beyond replaying episodic memories (fosse_dreaming_2003; nir_dreaming_2010; wamsley_dreaming_2014).
Most dreams experienced during REM sleep only incorporate fragments of previous waking experience, often intermingled with past memories (schwartz_are_2003). Surprisingly, such random combinations of memory fragments often result in visual experiences which are perceived as highly structured and realistic by the dreamer. The striking similarity between the inner world of dreams and the external world of wakefulness suggests that the brain actively creates novel experiences by rearranging stored episodic patterns in a meaningful manner (nir_dreaming_2010). Several hypothetical functions have been attributed to this phenomenon, such as enhancing creative problem solving by building novel associations between unrelated memory elements (cai_rem_2009; llewellyn_crossing_2016; lewis_how_2018), forming internal prospective codes oriented toward future waking experiences (llewellyn_dream_2016), or refining a generative model by minimizing its complexity and improving generalization (hobson_virtual_2014; hoel_overfitted_2021). However, these theories do not consider the role of dreams in a more basic operation of memory consolidation, namely semantization.
Here, we propose that dreams, and in particular their creative combination of episodic memories, play an essential role in forming semantic representations. To support this hypothesis, we introduce a new, functional model of cortical semantic memory formation. The central ingredient is a creative generative process via feedback from higher to lower cortical areas which mimics dreaming during REM sleep. This generative process is trained to produce more realistic virtual sensory experience in an adversarial fashion by trying to fool an internal mechanism distinguishing low-level activities between wakefulness and REM sleep. Intuitively, generating new but realistic sensory experiences, instead of merely reconstructing previous observations, requires the brain to understand the composition of its sensorium. In line with transformation theories, this suggests that cortical representations should carry semantic, decontextualized gist information.
We implement this model in a cortical architecture with hierarchically organized forward and backward pathways, loosely inspired by generative adversarial networks (GANs; goodfellow2014generative). The connectivity of the model is adapted by gradient-based synaptic plasticity, optimizing different, but complementary objective functions depending on the brain’s global state. During wakefulness, the model learns to recognize that low-level activity is externally driven, stores high-level representations in the hippocampus, and tries to predict low-level from high-level activity (Fig. 1a). During NREM sleep, the model learns to reconstruct replayed high-level activity patterns from generated low-level activity, perturbed by virtual occlusions, referred to as perturbed dreaming (Fig. 1b). During REM sleep, the model learns to generate realistic low-level activity patterns from random combinations of high-level activities while simultaneously learning to distinguish these virtual experiences from externally driven waking experiences, referred to as adversarial dreaming (Fig. 1c). Together with wakefulness, the two sleep states, NREM and REM, jointly implement our model of Perturbed and Adversarial Dreaming (PAD).
Over the course of learning, our cortical model trained on natural images develops rich latent representations along with the capacity to generate plausible early sensory activities. We demonstrate that adversarial dreaming during REM sleep is essential for the extraction of semantic knowledge, which is improved and robustified by perturbed dreaming during NREM sleep. Together, our results demonstrate a potential role of dreams and suggest complementary functions of REM and NREM sleep in memory semantization.
2.1 Complementary objectives for wakefulness, NREM and REM sleep
We consider an abstract model of the visual ventral pathway consisting of multiple, hierarchically organized cortical areas, with a feedforward pathway, or encoder, transforming neuronal activities from lower to higher areas (Fig. 2). These high-level activities are compressed representations of low-level activities and are named latent representations. In addition to this feedforward pathway, we similarly model a feedback pathway, or generator, projecting from higher to lower areas (Fig. 2). These two pathways are supported by a simple hippocampal module which can store and replay latent representations. Three different global brain states are considered: wakefulness (Wake), non-REM sleep (NREM) and REM sleep (REM). We focus on the functional role of these phases while abstracting away dynamical features such as bursts, spindles or slow waves (leger_slow-wave_2018), in line with previous approaches based on goal-driven modeling which successfully predict physiological features along the ventral stream (yamins_performance-optimized_2014; zhuang_unsupervised_2021).
In our model, the three brain states only differ in their objective function and the presence or absence of external input. Synaptic plasticity performs stochastic gradient descent on state-specific objective functions via error backpropagation (lecun_deep_2015). We assume that efficient credit assignment is realized in the cortex, and focus on the functional consequences of our specific architecture. For potential implementations of biophysically plausible backpropagation in cortical circuits, we refer to previous work (e.g., sacramento_dendritic_2017; lillicrap_backpropagation_2020).
During Wake (Fig. 2a), sensory inputs evoke activities in lower sensory cortex which are transformed via the feedforward pathway into latent representations in higher sensory cortex. The hippocampal module stores these latent representations, mimicking the formation of episodic memories. Simultaneously, the feedback pathway generates low-level activities from these representations. Synaptic plasticity adapts the encoding and generative pathways to minimize the mismatch between externally driven and internally generated activities (Fig. 2a). Thus, the network learns to reproduce low-level activity from abstract high-level representations. Simultaneously, the encoder also acts as a ‘discriminator’ with an output that is trained to become active, reflecting that the low-level activity was driven by an external stimulus. This discriminator function is essential to drive adversarial learning during REM sleep.
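The Wake-phase objectives described above can be sketched in a toy linear model. This is a hypothetical illustration, not the paper's actual network: the weight matrices `W_E`, `W_G`, the readout `w_d`, and all dimensions are assumed names chosen for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_low, n_lat = 32, 8                         # low-level / latent dimensions (assumed)
W_E = rng.normal(0.0, 0.1, (n_lat, n_low))   # encoder (feedforward) weights
W_G = rng.normal(0.0, 0.1, (n_low, n_lat))   # generator (feedback) weights
w_d = rng.normal(0.0, 0.1, n_low)            # discriminator readout on low-level activity

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

x = rng.normal(0.0, 1.0, n_low)   # externally driven low-level activity
z = W_E @ x                       # latent representation, stored as an episodic memory
x_hat = W_G @ z                   # top-down prediction of low-level activity

recon_loss = float(np.mean((x_hat - x) ** 2))  # mismatch to be minimized by plasticity
d_real = float(sigmoid(w_d @ x))
disc_loss = -np.log(d_real)       # pushes discriminator output towards 1 ('external')
wake_loss = recon_loss + disc_loss
```

In a full training loop, both loss terms would be minimized by gradient descent on `W_E`, `W_G` and `w_d`; here only the objectives are evaluated.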
For the subsequent sleep phases, the system is disconnected from the external environment, and activity in lower sensory cortex is driven by top-down signals originating from higher areas, as previously suggested (nir_dreaming_2010; aru_apical_2020). During NREM (Fig. 2b), latent representations are recalled from the hippocampal module, corresponding to the replay of episodic memories. These representations generate low-level activities which are perturbed by suppressing early sensory neurons, modeling the observed differences between replayed and waking activities (ji_coordinated_2007). The encoder reconstructs latent representations from these activity patterns, and synaptic plasticity adjusts the feedforward pathway to make the latent representation of the perturbed generated activity similar to the original episodic memory. This process defines perturbed dreaming.
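The NREM objective can be sketched in the same toy linear setting (again a hypothetical illustration with assumed names, not the authors' implementation): a replayed latent generates a dream, part of the low-level activity is silenced, and the encoder is scored on recovering the original latent.

```python
import numpy as np

rng = np.random.default_rng(1)
n_low, n_lat = 32, 8
W_E = rng.normal(0.0, 0.1, (n_lat, n_low))  # encoder weights (assumed toy model)
W_G = rng.normal(0.0, 0.1, (n_low, n_lat))  # generator weights

z_episodic = rng.normal(0.0, 1.0, n_lat)    # latent replayed from the hippocampal module
x_dream = W_G @ z_episodic                  # internally generated low-level activity

x_perturbed = x_dream.copy()
x_perturbed[:n_low // 4] = 0.0              # virtual occlusion: silence a patch of neurons

z_rec = W_E @ x_perturbed                   # re-encode the perturbed dream
nrem_loss = float(np.mean((z_rec - z_episodic) ** 2))  # pull z_rec towards the memory
```

Minimizing `nrem_loss` with respect to the encoder weights would train the feedforward pathway to ignore the occlusion, which is the robustification effect studied later in the paper.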
REM sleep (Fig. 2c) is characterized by creative dreams which generate realistic virtual sensory experiences out of combinations of episodic memories (fosse_dreaming_2003; lewis_how_2018). In PAD, multiple episodic memories are randomly recalled from the hippocampal module and their latent representations are linearly combined. From this new, mixed representation, activity in lower sensory cortex is generated and finally passed through the feedforward pathway. Synaptic plasticity adjusts feedforward connections to silence the discriminator output, as the network should learn to distinguish this internally generated activity from externally evoked sensory activity. Simultaneously, feedback connections are adjusted adversarially to generate activity patterns which appear externally driven and thereby trick the discriminator into believing that the low-level activity was externally driven. This is achieved by inverting the sign of the errors that determine synaptic weight changes in the generative network. This process defines adversarial dreaming.
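The adversarial REM objective can likewise be sketched in the toy setting (hypothetical names again; the mixing weight `lam` and the use of exactly two memories are assumptions for the sketch): two replayed latents are convexly combined, a dream is generated, and the discriminator and generator receive sign-inverted versions of the same error.

```python
import numpy as np

rng = np.random.default_rng(2)
n_low, n_lat = 32, 8
W_G = rng.normal(0.0, 0.1, (n_low, n_lat))  # generator weights (assumed toy model)
w_d = rng.normal(0.0, 0.1, n_low)           # discriminator readout

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

z1 = rng.normal(0.0, 1.0, n_lat)            # two replayed episodic memories
z2 = rng.normal(0.0, 1.0, n_lat)
lam = 0.5
z_mix = lam * z1 + (1.0 - lam) * z2         # creative convex combination

x_dream = W_G @ z_mix                       # virtual sensory experience
d_out = float(sigmoid(w_d @ x_dream))

disc_loss = -np.log(1.0 - d_out)            # discriminator: label the dream 'internal' (0)
gen_loss = -disc_loss                       # generator: sign-inverted error, tries to fool
```

The single line `gen_loss = -disc_loss` is the sign inversion mentioned in the text: the same error signal that trains the feedforward discriminator would, with flipped sign, train the feedback pathway to make dreams appear externally driven.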
The functional differences between our proposed NREM and REM sleep phases are motivated by experimental data describing a reactivation of hippocampal memories during NREM and the occurrence of creative dreams during REM sleep. In particular, hippocampal replay has been reported during NREM sleep within sharp-wave ripples (oneill_play_2010) and has also been observed in the visual cortex (ji_coordinated_2007), where it resembles activity from wakefulness. Our REM sleep phase is built upon cognitive theories of REM dreams (llewellyn_dream_2016; lewis_how_2018) postulating that they emerge from random combinations of episodic memory elements, sometimes remote from each other, which appear realistic to the dreamer. This random coactivation could be caused by ponto-geniculo-occipital (PGO) waves (nelson_rem_1983), often associated with the generation of novel visual imagery in dreams (hobson_dreaming_2000; hobson_virtual_2014).
Within our suggested framework, ‘dreams’ arise as early sensory activity that is internally generated via feedback pathways during offline states and subsequently processed by feedforward pathways. In particular, this implies that besides REM dreams, NREM dreams exist. However, in contrast to REM dreams, which differ significantly from waking experiences (fosse_dreaming_2003), our model implies that NREM dreams are more similar to waking experiences, since they are driven by single episodic memories rather than by mixtures thereof. Furthermore, the implementation of adversarial dreaming requires an internal representation of whether early sensory activity is externally or internally generated, i.e., a distinction whether a sensory experience is real or imagined.
2.2 Dreams become more realistic over the course of learning
To illustrate learning in PAD, we consider low-level activities during NREM and during REM for a model with little learning experience and a model which has experienced many wake-sleep cycles (Fig. 3). A single wake-sleep cycle consists of Wake, NREM and REM phases. As an example, we train our model on a dataset of natural images (CIFAR-10; cifar10) and a dataset of images of house numbers (SVHN; svhn). Initially, internally-generated low-level activities during sleep do not share significant similarities with sensory-evoked activities from Wake (Fig. 3a); for example, no obvious object shapes are represented (Fig. 3b). After plasticity has organized network connectivity over many wake-sleep cycles (50 training epochs), low-level internally-generated activity patterns resemble sensory-evoked activity (Fig. 3c). NREM-generated activities reflect the sensory content of the episodic memory (sensory input from the previous day). REM-generated activities differ from the sensory activities corresponding to the original episodic memories underlying them, as they recombine features of sensory activities from the two previous days, but still exhibit a realistic structure. This increase in similarity between externally-driven and internally-generated low-level activity patterns is also reflected in a decreasing Fréchet inception distance (Fig. 3d, downward dashed arrow), a metric used to quantify the realism of generated images (heusel_gans_2018). While NREM and REM dreams both gain realism, the discrepancy between the two increases: NREM dreams become disproportionately more realistic than REM dreams (Fig. 3d, vertical solid arrow).
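The Fréchet distance underlying this metric can be sketched in plain numpy. Note that the actual Fréchet inception distance is computed on Inception-network features of images; the random feature sets below are stand-ins for illustration, and the function name is an assumption.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two (n, d) feature sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # tr(sqrtm(cov_a @ cov_b)) via the eigenvalues of the product, which are
    # real and non-negative for positive semi-definite factors
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sum(np.sqrt(np.maximum(eigvals.real, 0.0)))
    return float(np.sum((mu_a - mu_b) ** 2)
                 + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

rng = np.random.default_rng(5)
real = rng.normal(0.0, 1.0, (500, 10))        # stand-in for features of waking activity
fake_close = rng.normal(0.0, 1.0, (500, 10))  # a 'realistic' dream distribution
fake_far = rng.normal(3.0, 1.0, (500, 10))    # an 'unrealistic' dream distribution

fd_close = frechet_distance(real, fake_close)
fd_far = frechet_distance(real, fake_far)
```

A decreasing distance over wake-sleep cycles, as in Fig. 3d, would correspond to the generated feature distribution drifting from the `fake_far` regime towards the `fake_close` regime.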
The PAD training paradigm hence leads to low-level activity patterns that become more difficult to discern from externally driven activity, whether they originate from single episodic memories during NREM or from random combinations thereof during REM, mimicking creative dreaming. We will next demonstrate that the same learning process leads to the emergence of robust semantic representations.
2.3 Adversarial dreaming during REM facilitates memory semantization
Semantic knowledge is fundamental for animals to learn quickly, adapt to new environments and communicate, and is hypothesized to be held by so-called semantic representations in cortex (nadel_memory_1997). An example of such semantic representations is given by neurons in higher visual areas that carry linearly separable information about object category, invariant to other factors of variation such as background, orientation or pose (grill-spector_2001_the; hung_fast_2005; majaj_simple_2015).
Here we demonstrate that PAD, due to the specific combination of plasticity mechanisms during Wake, NREM and REM, develops such semantic representations in higher visual areas. As in the previous section, we train our model on the CIFAR-10 and SVHN datasets. To quantify the quality of inferred latent representations, we measure how easily downstream neurons can read out object identity from them. For a simple linear read-out, classification accuracy reflects the linear separability of the different contents represented in a given dataset. Technically, we train a linear classifier that distinguishes object categories based on their latent representations after different numbers of wake-sleep cycles (‘epochs’, Fig. 4a). While training the classifier, the connectivity of the network (encoder and generator) is fixed.
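The linear read-out procedure can be sketched as a small logistic regression trained by gradient descent on frozen latents. The two synthetic Gaussian clusters below are assumed stand-ins for latent representations of two object categories; the paper's classifier operates on the model's actual latents and ten classes.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
z0 = rng.normal(-1.5, 1.0, (n, 2))        # latents of class 0 (assumed stand-in)
z1 = rng.normal(+1.5, 1.0, (n, 2))        # latents of class 1
Z = np.vstack([z0, z1])                   # frozen latent representations
y = np.r_[np.zeros(n), np.ones(n)]

w, b = np.zeros(2), 0.0
for _ in range(500):                      # plain gradient descent on cross-entropy
    p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
    grad_w = Z.T @ (p - y) / len(y)
    grad_b = float(np.mean(p - y))
    w -= 0.1 * grad_w
    b -= 0.1 * grad_b

acc = float(np.mean(((Z @ w + b) > 0) == (y == 1)))  # linear separability proxy
```

Only `w` and `b` are trained; the latents `Z` stay fixed, mirroring the frozen encoder and generator during classifier training.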
The latent representation emerging from the trained network (Fig. 4b, full model) shows increasing linear separability, reaching around 58% test accuracy on CIFAR-10 (Fig. 4c, black line; for details see Supplements Table 1) and 79% on SVHN (Fig. 4d, black line), comparable to less biologically plausible machine-learning models (berthelot_understanding_2018). These results show the ability of PAD to discover semantic concepts across wake-sleep cycles in an unsupervised fashion.
Within our computational framework, we can easily consider sleep pathologies by directly interfering with the sleep phases.
To highlight the importance of REM in learning semantic representations, we consider a reduced model (PAD without REM) in which the REM phase with adversarial dreaming is suppressed and only perturbed dreaming during NREM remains (Fig. 4b, pink cross).
Without REM sleep, linear separability increases much more slowly and even after a large number of epochs remains significantly below that of the full PAD (see also Supplements Fig. 12c,d).
This suggests that the adversarial dreaming during REM, here modeled by an adversarial game between feedforward and feedback pathways, is essential for the emergence of easily readable, semantic representations in the cortex.
From a computational point of view, this result is in line with previous work showing that learning to generate virtual inputs via adversarial learning (GAN variants) forms better representations than simply learning to reproduce external inputs (radford_unsupervised_2015; donahue_adversarial_2016; berthelot_understanding_2018).
Finally, we consider a different pathology in which REM is not driven by randomly combined episodic memories, but by single episodic memories (Fig. 4b, purple cross). Similarly to removing REM, linear separability increases much more slowly across epochs, leading to worse performance of the readout (Fig. 4c,d, purple lines). For the SVHN dataset, the performance does not reach the level of the full PAD even after many wake-sleep cycles (see also Supplements Fig. 12d). This suggests that combining different, possibly unrelated episodic memories, as reported during REM dreaming (fosse_dreaming_2003), leads to significantly faster memory semantization.
Our results suggest that generating virtual sensory inputs during REM dreaming, via a high-level combination of hippocampal memories and subsequent adversarial learning, is essential for an animal to extract semantic information from its sensorium. This contrasts with transformation theories which assign semantization mainly to NREM-mediated replay of individual memories (nadel_memory_1997; lewis_overlapping_2011). Our model provides hypotheses about the effects of REM deprivation, complementing pharmacological and optogenetic studies reporting impairments in the learning of complex rules and spatial object recognition (boyce_causal_2016). For example, our model predicts limited generalization in animal models with chronically impaired REM sleep within a classical conditioning paradigm. After REM deprivation, the animal would not be able to appropriately react to novel stimuli that are semantically similar to previously encountered conditioned stimuli.
2.4 Perturbed dreaming during NREM improves robustness of semantization
Generalizing beyond previously experienced stimuli is essential for an animal’s survival. This generalization is required due to natural perturbations of sensory inputs, for example partial occlusions, noise, or varying viewing angles. These alter the stimulation pattern, but in general should not change its latent representation subsequently used to make decisions.
Here, we model such sensory perturbations by silencing patches of neurons in early sensory areas during the stimulus presentation (Fig. 5a). As before, linear separability is measured via a linear classifier that has been trained on latent representations of un-occluded images. Adding occlusions hence directly tests the out-of-distribution generalization capabilities of the learned representations. For the model trained with all phases (Fig. 5b, full model), the linear separability of latent representations decreases as occlusion intensity increases, until reaching chance level for fully occluded images (Fig. 5c,d; black line).
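The occlusion perturbation can be sketched as follows. The patch size, the seed, and the way intensity maps to a patch count are assumptions for this sketch; the paper parameterizes occlusions by an intensity but does not commit to this exact scheme.

```python
import numpy as np

def occlude(img, intensity, patch=8, seed=0):
    """Silence random square patches covering roughly `intensity` of the input."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    h, w = img.shape
    n_patches = int(intensity * h * w / patch ** 2)   # patches may overlap
    for _ in range(n_patches):
        r = int(rng.integers(0, h - patch + 1))
        c = int(rng.integers(0, w - patch + 1))
        out[r:r + patch, c:c + patch] = 0.0           # silence a block of 'neurons'
    return out

img = np.ones((32, 32))        # stand-in for early sensory activity
occluded = occlude(img, 0.5)   # roughly half of the input silenced
```

Feeding `occluded` instead of `img` through a frozen encoder, and classifying the resulting latents with a read-out trained only on un-occluded inputs, is the out-of-distribution test described above.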
We next consider a sleep pathology in which we suppress perturbed dreaming during the NREM phase while keeping adversarial dreaming during REM (PAD without NREM, Fig. 5b, orange cross).
In this condition, linear separability of partially occluded images is significantly decreased for identical occlusion levels (Fig. 5c,d; compare black and orange lines).
In particular, performance degrades much faster with increasing occlusion levels.
Note that despite the additional training objective, the full PAD develops equally good or even better latent representations of unoccluded images (0% occlusion intensity) compared to this pathological condition without perturbed dreams.
Crucially, the perturbed dreams in NREM are generated by replaying single episodic memories. If instead a convex combination of episodic memories is used during NREM, similarly as during REM, the quality of the latent representations decreases (see also Supplements Fig. 13). This suggests that only replaying single episodes, rather than their creative combination, is beneficial to robustify latent representations against input perturbations.
Our NREM memory consolidation originates from the training objective defined in the NREM phase, forcing feedforward pathways to map perturbed inputs to the latent representation corresponding to their clean, non-occluded version. This procedure is reminiscent of a regularization technique from machine learning called ‘data augmentation’ (shorten2019survey), which increases the amount of training data by adding stochastic perturbations to each input sample. However, in contrast to data augmentation methods which directly operate on samples, here the system autonomously generates augmented data in offline states, preventing interference with online cognition and avoiding storage of the original samples. Our ‘dream augmentation’ suggests that hippocampal replay not only maintains or strengthens cortical memories, but also improves the extraction of content when only partial information is available. For example, our model predicts that animals lacking such dream augmentation, potentially due to perturbed NREM sleep, fail to react reliably to partially occluded stimuli even though their responses to clean stimuli are accurate.
2.5 Latent organization in healthy and pathological models
The results so far demonstrate that perturbed and adversarial dreaming (PAD), during REM and NREM sleep states, contribute to memory semantization by increasing the linear separability of latent representations into object classes. We next investigate how the learned latent space is organized, i.e., whether representations of sensory inputs with similar semantic content are grouped together even if their low-level structure may be quite different, for example due to different viewing angles, variations among an object category, or (partial) occlusions.
We illustrate the latent organization by projecting the latent variables first linearly through the weight matrix of the trained classifier used in the previous two sections, followed by t-distributed stochastic neighbor embedding (tSNE), a non-linear dimensionality reduction method (Fig. 6a). This procedure allows tSNE to highlight more discernible object clusters.
For PAD, the obtained tSNE projection shows distinctive clusters of latent representations according to the semantic category (‘class identity’) of their corresponding images (Fig. 6b). The model thus tends to organize latent representations such that high-level, semantic clusters are discernible. Furthermore, partially occluded objects (Fig. 6b, empty circles) are represented close to their corresponding un-occluded versions (Fig. 6b, filled circles).
As shown in the previous sections, removing either REM or NREM has a negative impact on the linear separability of sensory inputs.
However, the reasons for these effects are different between REM and NREM.
If REM sleep is removed from training (PAD without REM), representations of unoccluded images are less organized according to their semantic category, but still match their corresponding occluded versions (Fig. 6c).
REM is thus necessary to organize latent representations into semantic clusters, providing an easily readable representation for downstream neurons.
In contrast, removing NREM (PAD without NREM) causes representations of occluded inputs to be remote from their non-occluded representations (Fig. 6d).
We quantify these observations by computing the average distances between latent representations from the same object category (intra-class distance) and between representations of different object categories (inter-class distance). Since the absolute distances are difficult to interpret, we focus on their ratio, the normalized intra-class distance (Fig. 6e). On both datasets, this ratio increases if the REM phase is removed from training (Fig. 6e, compare black and pink bars), reaching levels comparable to those of an untrained network. Moreover, removing NREM from training also increases the normalized intra-class distance. These observations suggest that REM and NREM jointly reorganize the latent space such that stimuli with similar semantic structure are mapped to similar latent representations. In addition, we compute the distance between the latent representations inferred from clean images and their corresponding occluded versions, also normalized by the inter-class distance (normalized clean-occluded distance, Fig. 6f). Removing NREM from training significantly increases this distance, highlighting the importance of NREM in making latent representations invariant to input perturbations.
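The normalized intra-class distance can be sketched as follows; the exact definition is assumed to be consistent with the text (mean pairwise distance within a class, divided by the mean pairwise distance across classes), and the Gaussian clusters are illustrative stand-ins for latents.

```python
import numpy as np

def normalized_intra_class_distance(Z, y):
    """Mean within-class pairwise distance over mean between-class distance."""
    dists = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    same = (y[:, None] == y[None, :]) & ~np.eye(len(y), dtype=bool)
    diff = y[:, None] != y[None, :]
    return float(dists[same].mean() / dists[diff].mean())

rng = np.random.default_rng(4)
y = np.r_[np.zeros(50), np.ones(50)]

# tight, well-separated clusters (a semantically organized latent space)
Z_good = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
ratio_good = normalized_intra_class_distance(Z_good, y)

# a single overlapping blob (an unorganized latent space)
Z_bad = rng.normal(0.0, 1.0, (100, 2))
ratio_bad = normalized_intra_class_distance(Z_bad, y)
```

A ratio well below one indicates semantic clustering, as for the full PAD model; a ratio near one indicates a latent space where class labels carry no geometric structure, as for the untrained network.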
2.6 Cortical implementation of PAD
We have shown that perturbed and adversarial dreaming (PAD) can cluster sensory inputs observed during wakefulness by content and thereby support memory semantization. Our proposed adversarial dreaming process requires a cortical representation about whether an early sensory activation is internally or externally generated. On an abstract level, such a distinction requires a ‘conductor’ that orchestrates learning. For example, from the viewpoint of a single neuron in the generator pathway, local error signals may cause potentiation during wakefulness, while an identical error during REM sleep would cause depression of synaptic weights. The PAD model suggests that this conductor, extending the classical student-teacher paradigm, is a crucial ingredient for cortical learning during wakefulness and sleep. Here we hypothesize how the associated mechanisms may be implemented in the cortex.
First, our training paradigm is orchestrated into different phases, wakefulness, NREM and REM sleep, which affect both the objective function and synaptic plasticity (Fig. 7a). Wakefulness is associated with increased activity of modulatory brainstem neurons releasing neuromodulators such as acetylcholine (ACh) and noradrenaline (NA), hypothesized to prioritize the amplification of information from external stimuli (Adamantidis2019; aru_apical_2020). In contrast, neuromodulator concentrations during NREM are reduced compared to Wake, while REM is characterized by high ACh and low NA levels (hobson_rem_2009). During wakefulness, the modulation provides a high activity target for the discriminator, which is decreased during REM sleep and entirely gated off during NREM sleep. Furthermore, plasticity in our generative pathway is suppressed during NREM sleep and sign-switched during REM sleep (Fig. 2). The NREM-related suppression of plasticity may result from a global reduction of all the involved neuromodulators, in particular NA and ACh (see, e.g., Marzo2009; Mitsushima2013). The REM switch may be induced by inhibiting backpropagating action potentials in the apical dendritic tree of the cortical pyramidal neurons representing the generator network, causing plasticity to switch sign (Sjostrom2006a; McKay2007; koch_hebbian_2013).
Second, our model requires computation and representation of (mis)matches between top-down and bottom-up activity. This information may be represented by layer 5 pyramidal neurons that compare bottom-up inputs from the feedforward pathway with top-down inputs from the feedback pathway (see Fig. 7b and Larkum2013). More explicit mismatch information between the two pathways may be represented by subclasses of layer 2/3 pyramidal neurons (Keller2018).
Third, our computational framework assumes effectively separate feedforward and feedback streams. In contrast, cortical structure shows an abundance of cross-projections (gilbert_top-down_2013). In our model, a strict separation is required to prevent ‘information shortcuts’ across early sensory cortices which would prevent learning of good representations in higher sensory areas. This suggests that for significant periods of time, intra-areal lateral interactions between our cortical feedforward and feedback pathways are effectively gated off in most of the areas. This may be achieved through additional classes of pyramidal neurons (the red ‘layer 5 pyramidal neurons’ in Fig. 7b) that cross-link the feedforward and feedback pathways and that are shunted, for instance, by parvalbumin (PV) interneurons (Bartos2012; Safari2017) in most areas, but not in the areas that calculate the match and mismatch.
Fourth, beyond the mechanisms discussed above, our model assumes that cortical circuits can efficiently perform credit assignment, similar to the classical error-backpropagation algorithm. Assuming an implementation following the dendritic error-backpropagation model proposed by sacramento_dendritic_2017, our model would require additional feedforward and feedback connections for each neuron (Fig. 7b, dotted lines). For example, neurons in the feedforward pathway would not only project to higher cortical areas to transmit signals, but additionally project back to earlier areas to allow these to compute the local errors required for credit assignment. This implementation would thus predict two subclasses of cortical pyramidal neurons (representing the encoder and generator networks) that are driven mostly by either bottom-up or top-down inputs, but also receive a feedback copy from the projection area to the apical tree which, after learning, is explained away by lateral inhibition, say via somatostatin (SST) interneurons (Fig. 7b, dotted lines; interneurons not shown).
Overall, our proposed adversarial dreaming mechanism can be mechanistically implemented in cortical networks through different classes of pyramidal neurons and interneurons, with a biological version of supervised learning based on a dendritic prediction of somatic activity (Urbanczik2014), and a corresponding global modulation by the sleep state.
3 Discussion
Memory semantization describes the process of extracting latent (hidden) structure from episodic memories (meeter_consolidation_2004).
Sleep has been hypothesized to facilitate this process through the extraction of regularities from episodic memories via replay.
However, the role of dreams in semantization remains unclear. Here we proposed that creating virtual sensory experiences by randomly combining episodic memories during REM sleep lies at the heart of semantic memory formation.
Based on a functional cortical architecture, we introduced the perturbed and adversarial dreaming (PAD) model and demonstrated that REM sleep can implement an adversarial learning process which builds semantically organized latent representations.
Perturbed dreaming, based on episodic memory replay during NREM, stabilizes the cortical representations against sensory perturbations.
Our computational framework allowed us to investigate the effects of specific sleep-related pathologies on memory semantization.
Together, our results demonstrate complementary effects of perturbed dreaming from a single episode during NREM and adversarial dreaming from mixed episodes during REM.
Our PAD model suggests that the superior generalization abilities exhibited by humans and other animals arise from distinct processes during the two sleep phases: REM dreams organize representations semantically and NREM dreams stabilize these representations against perturbations.
Finally, the model suggests how adversarial learning inspired by GANs can potentially be implemented by cortical circuits and associated plasticity mechanisms.
PAD focuses on the functional role of sleep and in particular dreams. Many dynamical features of brain states during NREM and REM sleep, such as cortical oscillations (leger_slow-wave_2018), are hence ignored here but will potentially become relevant when constructing detailed circuit models of the suggested architectures. Our proposed model of sleep is complementary to theories suggesting that sleep is important for physiological and cognitive maintenance (mcclelland_why_1995; kali_off-line_2004; tononi_sleep_2014; renno-costa_computational_2019; van_de_ven_brain-inspired_2020).
Recent advances in machine learning, such as self-supervised learning approaches, have provided powerful techniques to extract semantic information from complex datasets (liu_self-supervised_2021). Here, we mainly took inspiration from self-supervised generative models combining autoencoder and adversarial learning approaches (radford_unsupervised_2015; donahue_adversarial_2016; berthelot_understanding_2018; liu_self-supervised_2021). In contrast to these GAN variants, our model removes many optimization tricks which are challenging to implement in biological substrates, while maintaining a high quality of latent representations. As our model is relatively simple, it is amenable to implementations within frameworks approximating backpropagation in the brain (sacramento_dendritic_2017; lillicrap_backpropagation_2020). However, some components remain challenging for implementations in biological substrates, for example convolutional layers (but see pogodin_towards_2021) and batched training (but see marblestone2016toward).
To make semantic knowledge robust, a computational strategy consists of learning to map different sensory inputs containing the same object to the same latent representation. This was shown to enhance the semantic structure of latent representations (gidaris_unsupervised_2018; zbontar2021barlow) by maximizing embedding distances between unrelated images while maintaining similarity between highly related views. Our NREM phase does not require storing raw sensory inputs to create the altered inputs necessary for such data augmentation; instead it relies on (hippocampal) replay being able to regenerate similar inputs from high-level representations stored during wakefulness. Our results obtained through perturbed dreaming during NREM provide initial evidence that this dream augmentation may help during semantization. Introducing more specific modifications of the replayed activity, for example mimicking translations or rotations of objects, may further contribute to the formation of invariant representations.
In our REM phase, different mixing strategies could be considered. For instance, latent activities could be mixed by retaining some vector components of one representation and using the rest from a second one (beckham_adversarial_nodate). Moreover, more than two memory representations could be combined. As shown in our results, mixing memories during REM sleep is an important feature for finding organized latent representations, and we expect that other mixing strategies would similarly improve latent representations over replaying a single episodic memory.
Here we used a simple linear classifier to measure the quality of latent representations, which is an obvious simplification with regard to cortical processing. Note, however, that also for more complex ‘readouts’, organized latent representations allow for more efficient and faster learning (silver2017predictron; ha2018world; schrittwieser2020mastering). PAD assumed that training the linear readout does not lead to weight changes in the encoder network. However, in cortical networks, downstream task demands likely shape the encoder, which could in our model be reflected in ‘fine-tuning’ the encoder for specific tasks (compare liu_self-supervised_2021).
PAD makes several experimentally testable predictions at the neuronal and systems level. First, our NREM phase assumes that hippocampal replay generates perturbed wake-like early sensory activity (see also ji_coordinated_2007) which is subsequently processed by feedforward pathways. Moreover, our model predicts that over the course of hippocampal-cortical memory consolidation, sensory-evoked neuronal activity and internally-generated activity during sleep become more similar. In particular, we predict that NREM activity reflects patterns of wake activity, while REM activity resembles wake activity but remains distinctively different due to the creative combination of episodic memories. Future experimental studies could confirm these hypotheses by recording early sensory activity during wakefulness, NREM and REM sleep at different developmental stages and evaluating commonalities and differences between activity patterns. Previous work has already demonstrated increasing similarity between stimulus-evoked and spontaneous (generated) activity patterns during wakefulness in ferret visual cortex (berkes2011spontaneous). On a behavioral level, the improvement of internally-generated activity patterns correlates with the development of dreams in children, which are initially unstructured, simple and plain, and gradually become full-fledged, meaningful and narrative, implicating known characters and reflecting life episodes (nir_dreaming_2010).
Second, our model suggests that the extraction of semantic information from episodic memories is mainly driven by REM sleep. A few experimental paradigms have been designed to evaluate the effects of sleep on semantization (friedrich_generalization_2015; lewis_how_2018; lerner_sleep_2019), for example by measuring the formation of false memories (lutz_sleep_2017). Future experimental studies could further delineate the specific impact of sleep stages in this process by selectively depriving participants from REM sleep and testing their ability to form false memories.
On a neuronal level, one could selectively silence feedback pathways during REM sleep, for example by manipulating VIP interneurons via optogenetic tools (batista_modulation_2018).
Our model predicts that this manipulation of cortical activity would significantly impact the animal’s generalization capabilities, similar to a reduction of theta rhythm during REM sleep (boyce_rem_2017).
Finally, adversarial dreaming offers a theoretical framework to investigate neuronal correlates of normal versus lucid dreaming (Dresler2012; Baird2019).
While in normal dreaming the internally generated activity is perceived as externally caused, in lucid dreaming it is perceived as what it is, i.e. internally generated.
These are the same concepts that adversarial dreaming manipulates when teaching the generative network to produce wake-like sensory activity that is classified by the discriminator as externally caused.
In fact, lucid dreaming shares EEG patterns from both wake and non-lucid dreaming (Voss2009).
We hypothesize that the ‘neuronal conductor’ that orchestrates adversarial dreaming is also involved in lucid dreaming. Our cortical implementation suggests that the neuronal conductor could gate the discriminator teaching via apical activity of cortical pyramidal neurons.
The same apical dendrites were also speculated to be involved in conscious perception (Takahashi2020), dreaming (aru_apical_2020), and in representing the state and content of consciousness (Aru2019).
We have demonstrated that sleep, and in particular dreams, can provide significant benefits to extract semantic concepts from sensory experience. By bringing insights from modern artificial intelligence to cognitive theories of sleep function, we suggest that memory semantization is a creative process orchestrated by brain-state-regulated adversarial games between feedforward and feedback streams. Our framework unifies several views of memory consolidation during sleep by proposing that creative dreaming and hippocampal replay work in harmony for forming semantized and robust cortical representations.
4.1 Network architecture
The network consists of two separate pathways, mapping from the pixel to the latent space (‘encoder’/‘discriminator’) and from the latent to pixel space (‘generator’). The encoder/discriminator and generator architectures follow a similar structure to the DCGAN model (radford_unsupervised_2015). The encoder has four convolutional layers (lecun_deep_2015) with an increasing number of channels (Fig. 8).
Each layer uses a 4×4 kernel, a padding of 1 (0 for the last layer), and a stride of 2, i.e., the feature size is halved in each layer. All convolutional layers except the last one are followed by a LeakyReLU non-linearity (leakyrelu). We denote the activity in the last convolutional layer as $\mathbf{z}$. An additional convolutional layer followed by a sigmoid non-linearity is added on top of the second-to-last layer of the encoder and maps to a single scalar value $d$, the internal/external discrimination (with putative teaching signal $d=1$ for external or $d=0$ for internal activity). We denote the mapping from inputs $\mathbf{x}$ to $\mathbf{z}$ by $E_z$ and the mapping from $\mathbf{x}$ to $d$ by $E_d$. $E_z$ and $E_d$ thus share the first three convolutional layers. We jointly denote them by $E = (E_z, E_d)$ (Fig. 8).
Mirroring the structure of $E$, the generator $G$ has four deconvolutional layers whose channel counts mirror those of the encoder in reverse order. They all use a 4×4 kernel, a padding of 1 (0 for the first deconvolutional layer) and a stride of 2, i.e., the feature size is doubled in each layer. The first three deconvolutional layers are followed by a LeakyReLU non-linearity, and the last one by a tanh non-linearity.
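As a sanity check on these layer parameters, the feature-map sizes can be traced with standard convolution arithmetic. The sketch below assumes DCGAN-style 4×4 kernels with stride 2 and padding 1 (padding 0 in the final layer) on a 32×32 input; `conv_out` is an illustrative helper, not part of the model code.

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    """Standard convolution output-size formula: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

# Encoder: 32x32 input is halved by each of the first three stride-2 layers.
sizes = [32]
for _ in range(3):
    sizes.append(conv_out(sizes[-1]))          # 32 -> 16 -> 8 -> 4
# Final layer drops the padding, collapsing the 4x4 feature map to 1x1.
sizes.append(conv_out(sizes[-1], pad=0))       # 4 -> 1
```

The generator's deconvolutional layers invert this arithmetic, doubling the feature size at each step from 1×1 back to 32×32.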
As a detailed hippocampus model is outside the scope of this study, we mimic hippocampal storage and retrieval by storing and reading latent representations to and from memory.
We use the CIFAR-10 (cifar10) and SVHN (svhn) datasets to evaluate our model. They consist of 32×32 pixel images with three color channels. We consider their usual split into a training set and a smaller test set.
4.3 Training procedure
We train our model by performing stochastic gradient-descent with mini-batches on condition-specific objective functions, in the following also referred to as loss functions, using the Adam optimizer (kingma_adam_2017) with a learning rate of 0.0002 and a mini-batch size of 64. We rely on our model being fully differentiable. The following section describes the loss functions for the respective conditions.
4.3.1 Loss functions
In the Wake condition, we minimize an objective function composed of a loss for image encoding, a regularization term, and a real/fake (external/internal) discriminator loss,
$$\mathcal{L}^{\text{Wake}} = \mathcal{L}_{\text{img}} + \mathcal{L}_{\text{KL}} + \mathcal{L}_{d}^{\text{real}} \,,$$
and learn to reconstruct the mini-batch of images $\{\mathbf{x}_i\}$ similarly to autoencoders (bengio_representation_2013) by minimizing the image reconstruction loss defined by
$$\mathcal{L}_{\text{img}} = \frac{1}{N} \sum_{i=1}^{N} \lVert \mathbf{x}_i - G(E_z(\mathbf{x}_i)) \rVert^2 \,,$$
where $N$ denotes the size of the mini-batch. We store the latent vectors $\mathbf{z}_i = E_z(\mathbf{x}_i)$ corresponding to the current mini-batch for usage during the NREM and REM phases.
We additionally impose a Kullback-Leibler divergence loss $\mathcal{L}_{\text{KL}}$ on the encoder. This acts as a regularizer and encourages latent activities to be Gaussian with zero mean and unit variance: the distribution over the latent variables $\mathbf{z}$, parametrized by its empirical mean and standard deviation, is compared to the prior distribution $p(\mathbf{z}) = \mathcal{N}(0, I)$ over latent variables. $E_z$ is trained to minimize the following loss:
$$\mathcal{L}_{\text{KL}} = \frac{1}{2} \sum_{j=1}^{n_z} \left( \mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1 \right) \,,$$
where $n_z$ denotes the dimension of the latent space and where $\mu_j$ and $\sigma_j$ represent the $j$th elements of respectively the empirical mean and empirical standard deviation of the set of latent vectors $\{\mathbf{z}_i\}$.
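This per-dimension regularizer can be sketched in plain Python. The snippet below is a minimal illustration with the hypothetical name `kl_loss`; the model applies it to 256-dimensional mini-batch latents.

```python
import math

def kl_loss(latents):
    """KL divergence between the empirical latent statistics of a mini-batch
    and the standard normal prior N(0, I), summed over latent dimensions."""
    n = len(latents)          # mini-batch size
    n_z = len(latents[0])     # latent dimensionality
    loss = 0.0
    for j in range(n_z):
        vals = [z[j] for z in latents]
        mu = sum(vals) / n                              # empirical mean
        var = sum((v - mu) ** 2 for v in vals) / n      # empirical variance
        loss += 0.5 * (mu ** 2 + var - math.log(var) - 1.0)
    return loss
```

A batch whose latents are already zero-mean and unit-variance yields a loss of zero, and any deviation in mean or variance makes the loss strictly positive.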
As part of the adversarial game, $E_d$ is trained to classify the mini-batch of images as real. This corresponds to minimizing the loss $\mathcal{L}_{d}^{\text{real}}$, defined as a sum across the mini-batch of size $N$,
$$\mathcal{L}_{d}^{\text{real}} = -\frac{1}{N} \sum_{i=1}^{N} \log E_d(\mathbf{x}_i) \,.$$
Note that, in principle, $\mathcal{L}_{d}^{\text{real}}$ can be any GAN-specific loss function (gui_review_2020). Here we choose the binary cross-entropy loss.
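The binary cross-entropy term for real images reduces to a mean negative log of the discriminator outputs. A minimal sketch (hypothetical name `discriminator_real_loss`; inputs are sigmoid outputs in (0, 1]):

```python
import math

def discriminator_real_loss(d_outputs):
    """Binary cross-entropy pushing the discriminator output towards 1
    ('external') for a mini-batch of real images."""
    n = len(d_outputs)
    return -sum(math.log(d) for d in d_outputs) / n
```

The loss vanishes when the discriminator confidently labels all images as external (output 1) and grows without bound as outputs approach 0.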
Each Wake phase is followed by an NREM phase. During this phase we make use of the mini-batch of latent vectors $\{\mathbf{z}_i\}$ stored during the Wake phase. Starting from these latent vectors, we generate images $G(\mathbf{z}_i)$. Each obtained image is multiplied by a binary occlusion mask $\mathbf{m}_i$ of the same dimension. This mask is generated by randomly picking two occlusion parameters, occlusion intensity and square size (for details see Sec. 4.3.2). The encoder learns to reconstruct the latent vectors by minimizing the following reconstruction loss:
$$\mathcal{L}_{z}^{\text{NREM}} = \frac{1}{N} \sum_{i=1}^{N} \lVert \mathbf{z}_i - E_z\!\left( G(\mathbf{z}_i) \odot \mathbf{m}_i \right) \rVert^2 \,,$$
where $\odot$ denotes the element-wise product.
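The NREM objective can be sketched independently of any particular network: the generator and encoder enter only as callables. Below is a minimal illustration (hypothetical name `nrem_latent_loss`; images and latents are flat vectors, and the per-sample error is averaged over latent dimensions):

```python
def nrem_latent_loss(latents, generate, encode, masks):
    """Mean squared latent reconstruction error: replay each stored latent
    through the generator, occlude the dreamed image with a binary mask,
    re-encode it, and compare with the original latent."""
    total = 0.0
    for z, m in zip(latents, masks):
        x = generate(z)                                # dreamed image
        x_occ = [xi * mi for xi, mi in zip(x, m)]      # element-wise occlusion
        z_rec = encode(x_occ)                          # re-encoded latent
        total += sum((a - b) ** 2 for a, b in zip(z, z_rec)) / len(z)
    return total / len(latents)
```

With identity maps and no occlusion the loss is zero; occluding part of the dreamed image produces a latent mismatch that the encoder is trained to remove.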
In REM, each latent vector $\mathbf{z}_i$ from the mini-batch considered during Wake is combined with a randomly chosen latent vector $\tilde{\mathbf{z}}_i$ from the previous mini-batch, leading to a mini-batch of mixed latent vectors with elements $\mathbf{z}_i^{\text{mix}} = \frac{1}{2}(\mathbf{z}_i + \tilde{\mathbf{z}}_i)$. Using a uniformly random mixing coefficient in the unit interval, instead of $\frac{1}{2}$, does not significantly impact learning as long as many samples are used (data not shown). This batch of latent vectors is passed through $G$ to generate the associated images $G(\mathbf{z}_i^{\text{mix}})$. In this phase, the loss function encourages $E_d$ to classify these images as fake, while adversarially pushing $G$ to generate images which are less likely to be classified as fake, via the minimax objective
$$\min_{G} \max_{E_d} \; \frac{1}{N} \sum_{i=1}^{N} \log\!\left( 1 - E_d\!\left( G(\mathbf{z}_i^{\text{mix}}) \right) \right) \,.$$
In our model, the adversarial process is simply described by a full backpropagation of errors through $E$ and $G$ with a sign switch of the weight changes in $G$.
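The memory-mixing step itself is a plain convex combination. A minimal sketch (hypothetical name `mix_latents`; the gradient sign switch in the generator is a property of the training loop, not of the mixing, and is only noted in the comment):

```python
import random

def mix_latents(z_current, z_previous_batch, lam=0.5):
    """Convex combination of a current episodic memory with a randomly
    chosen memory from the previous mini-batch. The resulting mixed latent
    is fed to the generator; during REM the generator's weight updates use
    a switched sign relative to the discriminator's (adversarial game)."""
    z_other = random.choice(z_previous_batch)
    return [lam * a + (1.0 - lam) * b for a, b in zip(z_current, z_other)]
```

With the default `lam=0.5` both memories contribute equally; a uniformly random coefficient in (0, 1) behaves similarly given enough samples, as noted above.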
In summary, each Wake-NREM-REM cycle consists of: 1) reconstructing a mini-batch of images during Wake, 2) reconstructing a mini-batch of latent activities during NREM under perturbation of the generated images, and 3) replaying latent activities convex-combined with those from the previous cycle during REM. One training epoch is defined by the number of mini-batches necessary to cover the whole dataset. The evolution of the losses with training epochs is shown in Fig. 10 and Fig. 11. The whole training procedure is summarized in the pseudo-code of Algorithm 1.
4.3.2 Image occlusion
Following previous work (zeiler_visualizing_2013), grey squares of various sizes are applied across the image with a certain probability (Fig. 9). For each mini-batch, an occlusion probability and a square size are randomly picked from predefined ranges. We divide the image into patches of the given size and replace each patch with a constant value (here, 0) according to the defined probability.
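The patch-wise occlusion can be sketched as mask generation over a grid of squares. A minimal illustration (hypothetical name `occlusion_mask`; mask entries are 1 for kept pixels and 0 for occluded ones):

```python
import random

def occlusion_mask(height, width, square_size, p_occlude, rng=None):
    """Binary mask over an image: the image is divided into square patches,
    and each patch is independently zeroed with probability p_occlude."""
    rng = rng or random.Random(0)
    mask = [[1] * width for _ in range(height)]
    for top in range(0, height, square_size):
        for left in range(0, width, square_size):
            if rng.random() < p_occlude:
                for r in range(top, min(top + square_size, height)):
                    for c in range(left, min(left + square_size, width)):
                        mask[r][c] = 0
    return mask
```

Multiplying the generated image element-wise by this mask yields the perturbed dream used during NREM.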
4.4.1 Training of linear read-out
A linear classifier is trained on top of latent features $\mathbf{z}_i = E_z(\mathbf{x}_i)$, with $i \in \{1, \ldots, N_{\text{train}}\}$, where $N_{\text{train}}$ is the number of training dataset images. A latent feature $\mathbf{z}$ is projected via a weight matrix $W$ to the label neurons to obtain the vector $\mathbf{a} = W\mathbf{z}$.
This weight matrix is trained in a supervised fashion using a multi-class cross-entropy loss. For a feature $\mathbf{z}$ labelled with a target class $c$, the per-sample classification loss is given by
$$\mathcal{L}_{\text{class}} = -\log p(c \mid \mathbf{z}) \,.$$
Here, $p(c \mid \mathbf{z})$ is the conditional probability of the classifier defined by the linear projection and the softmax function,
$$p(c \mid \mathbf{z}) = \frac{\exp(a_c)}{\sum_{k} \exp(a_k)} \,.$$
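The per-sample read-out loss can be sketched directly from these two formulas. A minimal illustration (hypothetical name `classifier_loss`; `weights` is the matrix $W$ as a list of rows, one per label neuron):

```python
import math

def classifier_loss(latent, weights, target):
    """Multi-class cross-entropy of a linear read-out: project the latent
    through the weight matrix, apply a (stabilized) softmax, and take the
    negative log-probability of the target class."""
    logits = [sum(w * z for w, z in zip(row, latent)) for row in weights]
    m = max(logits)                          # subtract max for stability
    exps = [math.exp(a - m) for a in logits]
    p_target = exps[target] / sum(exps)
    return -math.log(p_target)
```

For a zero latent vector all logits coincide, the softmax is uniform, and the loss equals log of the number of classes, which is the chance-level baseline.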
The classifier is trained by mini-batch stochastic gradient descent on the loss $\mathcal{L}_{\text{class}}$ with a fixed learning rate for a fixed number of epochs, using the whole training dataset.
4.4.2 Linear separability
Following previous work (hjelm_learning_2019), we define linear separability as the classification accuracy of the trained classifier on latent activities inferred from a separate test dataset. Given a latent feature $\mathbf{z}$, the class prediction is made by picking the index of the maximal activity in the vector $\mathbf{a} = W\mathbf{z}$. We ran several simulations for 4 different initial parameters of the encoder and generator and report the average test accuracy and standard error of the mean over these trials. To evaluate performance on occluded data, we applied random square occlusion masks on each sample from the test set for a fixed probability of occlusion and square size. We report only results for a single occlusion size, after observing similar results with other square sizes.
4.4.3 tSNE visualization
To visualize the 256-dimensional latent representation of the trained model we used t-distributed stochastic neighborhood embedding (tSNE; tSNE). To highlight the structured representations learned by our model, we projected the latent activities through the linear layer of the trained linear classifier used for the linear separability experiments (without the softmax non-linearity), obtaining 10-dimensional vectors. We then display these vectors in a two-dimensional plot via tSNE projection.
4.4.4 Latent-space organization metrics
Intra-class distance is computed by randomly picking pairs of images of the same class, projecting them to the encoder latent space and computing their Euclidean distance. This process is repeated over all classes in order to obtain the average over classes. Similarly, inter-class distance is computed by randomly picking pairs of images of different classes, projecting them to the encoder latent space and computing their Euclidean distance. The normalized intra-class distance is obtained by dividing the mean intra-class distance by the mean inter-class distance. The clean-occluded distance is computed by randomly picking pairs of non-occluded/occluded images, projecting them to the encoder latent space and computing their Euclidean distance. The normalized clean-occluded distance is obtained by dividing the mean clean-occluded distance by the mean inter-class distance. We performed this analysis for several trained networks with different initial conditions and report the mean distances and standard error of the mean over trials.
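These metrics reduce to averaged pairwise Euclidean distances and a ratio. A minimal sketch (hypothetical names `mean_pairwise_distance` and `normalized_intra`; pairs are tuples of latent vectors):

```python
import math

def euclidean(a, b):
    """Euclidean distance between two latent vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_pairwise_distance(pairs):
    """Average Euclidean distance over a list of latent-vector pairs."""
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs)

def normalized_intra(intra_pairs, inter_pairs):
    """Mean same-class distance divided by mean different-class distance;
    values below 1 indicate class-clustered latent representations."""
    return mean_pairwise_distance(intra_pairs) / mean_pairwise_distance(inter_pairs)
```

The normalized clean-occluded distance follows the same pattern, with clean/occluded pairs in the numerator and inter-class pairs in the denominator.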
4.4.5 Fréchet inception distance
The Fréchet inception distance (FID) is computed by comparing the statistics of generated (NREM or REM) samples to real images from the training dataset, both projected through an Inception-v3 network pre-trained on ImageNet:
$$\text{FID} = \lVert \mu_r - \mu_g \rVert^2 + \text{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right) \,,$$
where $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ represent the empirical mean and covariance of the 2048-dimensional activations of the Inception-v3 pool3 layer for real data samples and generated images, respectively. Results represent the mean FID and standard error of the mean over different trained networks with different initializations.
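The full FID requires a matrix square root of the covariance product (typically via `scipy.linalg.sqrtm`). The sketch below is a simplified illustration assuming diagonal covariances, for which the trace term factorizes into per-dimension variances (hypothetical name `fid_diagonal`):

```python
import math

def fid_diagonal(mu_r, var_r, mu_g, var_g):
    """Frechet distance between two Gaussians with diagonal covariances:
    ||mu_r - mu_g||^2 + sum_j (var_r_j + var_g_j - 2*sqrt(var_r_j*var_g_j))."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    cov_term = sum(vr + vg - 2.0 * math.sqrt(vr * vg)
                   for vr, vg in zip(var_r, var_g))
    return mean_term + cov_term
```

The distance is zero exactly when both the means and the (diagonal) covariances of real and generated activations coincide, and grows with either mismatch.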
4.4.6 Modifications specific to pathological models
A few adjustments were empirically observed to be necessary in order to obtain a fair comparison between conditions. When removing the REM phase during training, we observed a decrease of linear separability after some epochs. We suspect that this decrease is a result of overfitting due to the unconstrained autoencoding objective of $E$ and $G$. Models trained without REM hence would not provide a good baseline to reveal the effect of adversarial dreaming on linear separability. For models without the REM phase, we hence added a vector of Gaussian noise $\boldsymbol{\xi}_i$ of the latent dimension to the encoded activities before feeding them to the generator. Thus, Eq. 2 becomes
$$\mathcal{L}_{\text{img}} = \frac{1}{N} \sum_{i=1}^{N} \lVert \mathbf{x}_i - G\!\left( E_z(\mathbf{x}_i) + \boldsymbol{\xi}_i \right) \rVert^2 \,,$$
which stabilizes the linear separability of latent activities around its maximal value for both CIFAR-10 and SVHN datasets until the end of training.
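The noise injection itself is a one-line modification of the autoencoding path. A minimal sketch (hypothetical name `noisy_reconstruction_targets`; `sigma` is the assumed noise scale, not a value reported here):

```python
import random

def noisy_reconstruction_targets(latents, sigma=0.1, rng=None):
    """Add i.i.d. Gaussian noise to encoded activities before they are fed
    to the generator, regularizing the otherwise unconstrained
    autoencoding loop in models trained without the REM phase."""
    rng = rng or random.Random(0)
    return [[z + rng.gauss(0.0, sigma) for z in latent] for latent in latents]
```

With `sigma=0` the Wake reconstruction objective is recovered unchanged; a positive `sigma` forces the generator to map a neighborhood of each latent back to the same image.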
Furthermore, we observed that the NREM phase impairs linear performance in the absence of REM (w/o REM condition). To overcome this issue, we reduced the effect of NREM by scaling down its loss by a constant factor. This enabled the model to benefit from NREM (recognition under image occlusion) without impairing linear separability on full images.
This work has received funding from the European Union 7th Framework Programme under grant agreement 604102 (HBP), the Horizon 2020 Framework Programme under grant agreements 720270, 785907 and 945539 (HBP), the Swiss National Science Foundation (SNSF, Sinergia grant CRSII5-180316), the Interfaculty Research Cooperation (IRC) ‘Decoding Sleep’ of the University of Bern, and the Manfred Stärk Foundation. The authors thank the IRC collaborators Paolo Favaro for inspiring discussions on related methods in AI and deep learning, and Antoine Adamantidis and Christoph Nissen for helpful discussions on REM/NREM sleep phenomena in mice and humans.
6 Supplementary material
6.1 Training losses for full and pathological models
In the following, we report the measured losses over training for the different pathological conditions. The image reconstruction and KL losses are optimized in each condition and systematically decrease with learning, while the NREM latent reconstruction loss is only reduced in models with NREM (Figs. 10, 11). The discriminator and generator losses are only optimized in models with REM, showing a progressive decrease of the discriminator loss in parallel with an increase of the generator loss, reflecting the adversarial learning between the two streams.
6.2 Linear classification performance
We report the mean and standard error of the mean (SEM) of the final linear classification performance (epoch 50) on latent representations from the PAD and pathological models in Table 1.
| Dataset | PAD | w/o memory mix | w/o REM | w/o NREM | Wake only |
We also report the linear classification performance for the full and pathological models over epochs. Linear separability for the "w/o REM" (Figs. 12c,d, pink curves) and "w/o memory mix" (Fig. 12d, purple curve) conditions does not reach the levels of the full model (Figs. 12c,d, black curves) even after many training epochs. Furthermore, without NREM (Figs. 12c,d, "w/o NREM" and "Wake only", orange and gray curves), linear separability tends to decrease after many training epochs, suggesting that NREM helps to stabilize performance with training by preventing overfitting.
6.3 Replaying multiple episodic memories during NREM sleep
While in the main text we considered NREM to replay only a single episodic memory, here we report results for a model in which NREM also uses multiple (here: two) episodic memories. In the full model (Fig. 13, black curves, same data as in Fig. 5c,d), NREM uses a single stored latent representation. Here we additionally consider a model in which two stored representations are mixed by convex combination, similar to REM sleep. The slightly better performance of single replay suggests that replay of single episodic memories, as postulated to occur during NREM sleep, is more efficient at robustifying latent representations against input perturbations.