Digital Neural Networks in the Brain: From Mechanisms for Extracting Structure in the World To Self-Structuring the Brain Itself

05/22/2020
by Alexandre Pitti, et al.


Short abstract

In order to keep track of information, the brain has to solve the problem of where information is stored and how to index new information. We propose that the neural mechanism used by the prefrontal cortex (PFC) to detect structure in temporal sequences, based on the temporal order of incoming information, has served a second purpose: the spatial ordering and indexing of brain networks. We call this process, akin to the manipulation of neural 'addresses' to organize the brain's own network, the 'digitalization' of information. Such a tool is important for information processing and preservation, but also for memory formation and retrieval.

Long abstract

What essentially differentiates the Pre-Frontal Cortex (PFC) from other parts of the brain is its ability to extract structure from raw data by separating which items to encode from where they are located.

Based on this proposal, our first hypothesis (H1) is that the conjunctive cells in the PFC are responsible for 'encoding' where information is located (the neurons' address), but not what type of information is encoded per se (what the neurons represent). This is done by representing the relative order of items in sequences, their rank order, and not their index. Our second hypothesis (H2) is that, simultaneously, the code used by these conjunctive cells is exploited to 'decode' iteratively the location of an item in the brain.

In a third hypothesis (H3), we propose that the mechanism for detecting structure in the world (H1) and for retrieving particular items and sequences (H2) also participates in the self-structuring of the brain's organization itself, and in what we call the 'digitalization' of information, for robustness purposes.

Our theory proposes to explain (H1) how the PFC can extract structures from the world and synthesize new ones (compositionality), (H2) what efficient coding mechanism is used to detect, learn, and retrieve relevant information, and (H3) how we can achieve the engineering of brain-inspired 'digital' neural networks.

We will confront our ideas with behavioral and brain results and demonstrate how the mechanisms of spiking neurons possess the computational power for indexing information hierarchically, in order to make possible the creation of very large neural networks.

Keywords: prefrontal cortex, structure learning, mirror neurons system, rank-order codes, predictive coding, Broca area, hierarchical nested trees, digital networks.

1 Introduction

More than any other area in the brain, the prefrontal cortex (PFC) plays a major role in the acquisition of models and patterns and in the manipulation of structured knowledge. For instance, the features of the PFC make it an important site for the development of logical inference and algebra, for the acquisition of language and music Dehaene et al. (2015); Koechlin (2014, 2016); Rouault and Koechlin (2018), and for the learning of task sets and the resolution of rule-based problems Romo et al. (2018); Tanji and Hoshi (2001); Wang et al. (2018). We propose that what the PFC essentially does is to separate and manipulate items and patterns; see the tables in the Annex section for a glossary of the two terms employed in the paper. By doing so, the PFC participates in what we call the digitalization of neural information in the brain. We will develop this idea throughout the paper.

Observations in the Broca area and in the pre-Supplementary Motor Area (pre-SMA) have confirmed the existence of such neurons sensitive to patterns only. For instance, some of them were found to be sensitive to the temporal order in audio sequences and to proto-grammars, but not to the particular sounds emitted Friederici (2011); Gervain et al. (2008); Benavides-Varela and Gervain (2017). Other neurons were found to be active for the syntax of actions performed in motor sequences but not for the particular motor units within them Tanji and Hoshi (2001); Shima et al. (2007); Tanji et al. (2007), and still others were observed to respond to the temporal coherence of visual scenes Fadiga et al. (2009), i.e., their semantics. Similar results were found with neurons sensitive to orders and schemata in spatial contexts Barone and J.P. (2018), and to geometrical rules in the recognition of shapes Averbeck et al. (2003a, b) or in visual sequences Wang et al. (2019). Surprisingly, these neurons were all found to be insensitive to the particular sound, action or visual information composing the presented sequence per se, responding only to the specific patterns or schemata they were encoding, such as AABB, ABAB or AAAA, to a relative order in a temporal sequence (e.g., the beginning, the second place, or the end), or to a relative location in space.

Figure 1: Global schema of the putative functional organization of the brain between the frontal and posterior areas. a) Our framework proposes that the conjunctive cells in the Prefrontal Cortex, along with the dense long-range axons connecting the PFC to other parts of the brain, are responsible for encoding where information is located (the neurons' address), using a rank-order code. b) These conjunctive cells are used to decode iteratively the location of an item. Rank-order codes can be viewed as binary trees used to perform spatial search over large intervals. This decoding can be implemented by trial-and-error corrective methods (e.g., predictive coding). We suggest that the features found in the PFC for the detection of structure in the world also participate in the self-structuring of the brain's organization itself, and in what we call the digitalization of information, for robustness and efficacy in communication and information processing. Image source: http://brain.labsolver.org/, Yeh et al. (2018).

Our core proposition is that what essentially differentiates the PFC from other parts of the brain is its ability to extract structure from raw data by separating which items to encode (i.e., their identity) from where they are located (i.e., their addresses). This proposition is implemented through two rank-order coding mechanisms (ROC1 and ROC2) that provide the item's rank and the item's temporal order Van Rullen et al. (1998); Thorpe et al. (2001); Van Rullen and Thorpe (2002). Based on this proposal, our first hypothesis (H1) is that the conjunctive cells in the PFC, along with the dense long-range axons connecting the PFC to other parts of the brain, are responsible for encoding where information is located (the neurons' address), but not what type of information is encoded (the neurons' value, or the item they represent per se); see Fig. 1. In other words, the prefrontal cortex may have learned neuronal pointers to retrieve information Zylberberg et al. (2011); Eliasmith et al. (2012). As a metaphor, we can see the PFC as acting like a mailman, having no idea about the identities of the sender or the receiver but exact knowledge of where they are located (e.g., their mailing address). Therefore, in order to drive robustly the retrieval of one durable piece of information, the PFC must use these conjunctive cells to encode its location redundantly (e.g., through population coding).

Our second hypothesis (H2) is that, simultaneously, these conjunctive cells are used to decode iteratively the location of an item. This decoding can be implemented by trial-and-error corrective methods (reinforcement learning: DECOD1) or by faster mechanisms (predictive coding: DECOD2). Used redundantly as basis functions, they may help to retrieve any exact sequence of neurons, even though none of these neurons engrams strict information about the units' identity, but only their potential membership with respect to one particular set of neurons. Since they code multiple potential memberships of units to clusters, they can be identified as neural pointers and serve the broadcast of information at the brain level, or the generation of any new sequence following the particular temporal pattern they have learned. Therefore, on the one hand, the conjunctive cells may produce a distributed code for representing the relative membership of neurons to particular clusters, or their relative order in temporal sequences (H1). On the other hand, the dense long-range axons may achieve the dynamic selection of disparate neurons off-the-shelf into one coherent cluster for the current task (H2); see Fig. 1.

In a third hypothesis (H3), we defend the idea that the features found in the PFC for the detection of structure in the world also participate in the self-structuring of the brain's organization itself, and in what we call the digitalization of information. This digitalization is used for robustness against noise, for protecting information, and for energy efficiency. We suggest that these features used for finding structure in signals might serve, by extension, for the re-organization and structuring of information in the brain itself, for robust memory recall, faster processing, and the avoidance of catastrophic forgetting. Thus, we suggest that this very essential neural mechanism for detecting patterns in incoming signals has been exploited by the brain itself for patterning and optimizing memory retrieval over large neural ensembles, giving rise to dimensionality-reduction and compositionality features. As a twist of evolution and embodiment, we therefore propose that this mechanism could have shaped information processing in the brain and its functional organization for efficiency purposes.

In summary, we propose to explain (H1) how the PFC can extract structures from the world and synthesize new ones (compositionality), (H2) what efficient coding mechanism is used to detect, learn, and retrieve relevant information, and (H3) how we can achieve the engineering of brain-inspired digital neural networks. This is the novel class of neural networks that we want to present here.

Figure 2: Coding and decoding sequences of neurons by extracting their temporal structure and the relative order of neurons. Gating operation for learning structure in sequences based on rank-order coding Pitti et al. (2020). In a), we can discriminate the items' index (rank #) from their position (order) to represent one sequence. By separating the two, we can extract the temporal pattern and arrange items in a different order. Hence, the coding of the temporal pattern makes it robust to variability and able to represent many sequences (generalization). This process is operated by a gain-modulation or gating mechanism explained later. In b), the combination of these temporal patterns can serve to compose any novel temporal pattern, in the same fashion as radial basis functions would (e.g., the Fourier Transform).

As an analogy, human-made networks also use communication strategies based on the separation between address and information, channel and source, to overcome noise and achieve information retrieval Shannon (1948); Shannon and Weaver (1963). Similar mechanisms (though implemented differently) might have developed in the brain. We propose that the brain has found this solution to overcome its own complexity in processing, protecting and backing up information, and that this solution shares some principles with current telecommunication networks.

In what follows, we will first present our model in section 2, focusing on the ROC1 and ROC2 mechanisms and then on the DECOD1 and DECOD2 mechanisms. In section 3, we will discuss the neurocognitive foundations, before drawing links with cognitive sciences and developmental studies in section 4, and finally with Information Theory and Machine Learning in section 5.

Figure 3: Rank-order algorithm for compressive rank representation. We describe the two-step process carried out with the rank-order coding algorithm to model the Spike Timing-Dependent Plasticity (STDP) rule and the gating mechanism. In a), two sequences, in cyan and magenta, are represented with different neuron indices (idx) and different timing but the same temporal structure (up-down-up-down). In b), the rank-order coding algorithm is used to quantize any sequence in the temporal domain with discrete timing, e.g., first ranked, second ranked. This is a rough approximation of the STDP rule. The indices of the neurons are kept and only the temporal information is lost. In c), we can apply the rank-order coding algorithm a second time to suppress the neurons' indices (their identity) within the sequence in order to keep only their rank (#) within the sequence. This second process yields a temporal pattern, a compressive representation of the two sequences in which only the rank order is kept. It drastically reduces the amount of information needed to encode any sequence, irrespective of the neurons' indices and their precise timing. For any sequence of N elements, the problem's dimensionality is reduced to the N! possible rank permutations.

2 Explanation of the model

We will present a neuro-computational model inspired by the prefrontal organization, using the ideas developed in the introduction, called Inferno Gate and standing for Gated Spiking Recurrent Neural Network using Iterative Free-Energy Optimization Pitti et al. (2020). This model uses the rank-order coding algorithm to learn a compact code of the relative location of neurons within a sequence in a distributed fashion, and predictive coding to retrieve the original sequence almost error-free within very large databases. In comparison to current neural networks, it demonstrates rapid learning of long sequences, even from a large database with rare events, and rapid iterative decoding of the original sequences almost without loss, even when the information provided is scarce. Both tasks are still difficult for recent machine learning algorithms, which use statistical regression to learn correlations between items and therefore require a huge amount of data. To our knowledge, it is one of the first bio-inspired neural networks that we can label as digital in our sense, due to its manipulation of relative neural memory addresses and because it can be explained from an information-theoretic perspective using general principles of error-correction decoding and redundant encoding from modern communication theory. It also reveals some computational advantages of Hebbian learning and spiking neurons never examined before, for compact coding, fast sorting, and the hierarchical representation of sequences as nested trees for encoding and search.

In order to better understand what digital neural networks are and what their capabilities are, we first introduce the coding mechanism we applied for representing neural addresses as a distributed code (ROC1 and ROC2). We will then describe how a decoding mechanism based on predictive coding can manipulate this distributed code to retrieve the neurons' identity and their position in a sequence taken from a large assembly of neurons (DECOD2).

2.1 Computational features of rank-order codes

In our model of prefrontal neurons presented in Pitti et al. (2020), we applied the rank-order algorithm (ROC1) to learn the relative order of the elements within a sequence, which corresponds to learning their rank Van Rullen et al. (1998); Thorpe et al. (2001); Van Rullen and Thorpe (2002); Botvinick and Watanabe (2007); Pitti et al. (2012); Abrossimoff et al. (2018).

For example, two time series of neurons with different indices can be represented by the same rank code for their underlying pattern, ordered from the lowest rank to the highest. This coding is not bijective, since the resulting rank code gives only interval constraints about the possible sequences and the possible indices within them; e.g., a neuron has a lower or a higher rank in comparison to the others within a set. By doing so, ROC1 permits neurons to be sensitive to temporal patterns only and to learn structure without any other type of information (i.e., the neurons' indices). During decoding (DECODx), the rank-order code can ease the task of retrieving the neurons' indices by imposing interval constraints on the search space of potential candidates. For example, when searching for a missing element of a sequence, if the rank code assigns it a lower rank than a known neuron, then its index lies below the index of that neuron and therefore has to be searched within the interval of indices below it. This ensemblist view of sequence representation, found in logic, will be described further in section 2.1.3.
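To make this concrete, here is a minimal Python sketch (our illustration, not the authors' published code) showing that two sequences with different neuron indices share the same rank code, computed with a double argsort:

    import numpy as np

    def rank_code(seq):
        """Rank of each element within the sequence (0 = lowest).
        Two sequences with the same ordinal structure share this code."""
        return np.argsort(np.argsort(seq))

    a = [12, 3, 25, 7]    # neuron indices of sequence A
    b = [50, 10, 80, 20]  # neuron indices of sequence B
    print(rank_code(a))   # [2 0 3 1]
    print(rank_code(b))   # [2 0 3 1] -- same up-down-up-down pattern

Many concrete sequences map onto one rank code, which is exactly the non-bijectivity exploited above: the code keeps the pattern and discards the identities.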

2.1.1 Computational property of stack-sortable (tree) permutations

In computation theory, the obtained rank-order codes (ROC1 and ROC2) can be seen as stack-sortable permutations and represented by hierarchical binary search trees Knott (1968, 1977); see Fig. 4 (i-iv). A stack-sortable permutation may be decoded into a tree in which the first value x of the permutation corresponds to the root of the tree, the next x-1 values are decoded recursively to give the left child of the root, and the remaining values are again decoded recursively to give the right child. These binary trees can be exploited to perform fast dichotomic search of elements in large ensembles by progressively constraining the search space; see Fig. 4 (v-vii). The computational cost of this decoding is linear in the length of the permutation.
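A minimal sketch of this decoding, directly following the recursive rule just described (illustrative code, with permutation values 1-indexed as in the definition):

    def to_tree(perm):
        """Decode a stack-sortable permutation into a binary tree.
        The first value x is the root; the next x-1 values build the
        left child recursively, the remaining values the right child.
        Returns nested (value, left, right) tuples."""
        if not perm:
            return None
        x = perm[0]
        return (x, to_tree(perm[1:x]), to_tree(perm[x:]))

    print(to_tree([3, 1, 2, 4]))
    # (3, (1, None, (2, None, None)), (4, None, None))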

Another property of stack-sortable permutations is that they form a Dyck language, i.e., strings of balanced parentheses, capable of expressing any context-free grammar, arithmetic, or algebraic expression; see Fig. 4 (v-vii). Thus, neurons with a rank-order coding mechanism potentially have the computational tools for encoding and decoding nested trees and hierarchical information, e.g., grammars and languages.
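For illustration, the tree of the previous sketch can be rendered as a balanced-parenthesis (Dyck) word; this toy function (ours) assumes the nested-tuple representation introduced above:

    def brackets(tree):
        """Render a (value, left, right) tuple tree as a Dyck word:
        each node contributes one '(...)' pair wrapping its left
        subtree, followed by its right subtree."""
        if tree is None:
            return ""
        _, left, right = tree
        return "(" + brackets(left) + ")" + brackets(right)

    print(brackets((3, (1, None, (2, None, None)), (4, None, None))))
    # prints (()())()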

In our case, the Hebbian-based rank-order coding algorithm makes it possible to avoid implementing such an algorithmic function explicitly and to have neurons directly sensitive to relative order, structure, and hierarchies in signals. A neural decoding mechanism may use these features to accelerate search in large neural populations in a structured fashion.

As the data identity is now lost, it is only through population coding that a redundant encoding of the sequence can be achieved, and that a decoding mechanism can then be used to retrieve the original sequence. Population coding permits a distributed representation of the neurons' indices similar to radial basis functions, which can universally approximate any function, as the Fourier Transform does with sinusoidal units.

This method has several advantages. For instance, removing the data content from the sequence to encode eliminates the burden of learning and recollecting it. Furthermore, in terms of sparsity, it is more costly to learn items together with their locations in a sequence than to separate the two kinds of information; e.g., if one repeating pattern exists, for example AAB or ABA, then we can reduce the number of neurons necessary for coding it at once.

Figure 4: Neural assembly recruitment as a spatial-ordering optimization process. In a), we see the activation of neural clusters in the anterior part of the cortex as a top-down optimization process done under the supervision of the PFC, see (i). The selection of the correct spatio-temporal sequences, i.e., the neurons' indices and their order in the sequence, see (ii), is done by the rank-order codes in the PFC, which represent in a compact manner the ordinal structure of sequences, not at the unit level, see (iii). These rank-order codes construct a higher-level repertoire that guides the optimization search, using free energy and prediction error for generating and retrieving neural sequences in large ensembles of neurons. In b), the rank-order codes show some original computational properties. Rank-order codes can be described as relative codes that give only ordinal information about the items' ranks in a sequence; e.g., X is bigger than Y but lower than Z. In a retrieval task, rank codes can be seen as interval sets that constrain the spatial search of items, see (v). Rank-order codes can also be seen as stack-sortable permutation trees, and can therefore represent any context-free grammar, see (vi). The bracket representation of sequences shows the cumulative constraints imposed on the interval search, see (vii).

2.1.2 Scale-invariance property of rank-order codes

Although not usually named fractal codes, rank-order codes make it possible to represent information invariantly to scale, because they are not sensitive to the particular value (or index) of the neurons but to their relative order (or location) in a sequence, as seen in section 2.1.

As with the Wavelet Transform, self-similar patterns can be advantageous for the organization of memory in neural networks, for compactness, and for information retrieval. We will develop these features of rank-order codes for encoding and decoding tasks in sections 3.1 and 3.4, in which we discuss the topology of brain networks and what it might imply, in terms of information processing, for organizing knowledge.

2.1.3 Set theory view of rank-order codes

Because rank-order codes represent the relative order of elements in sequences, they generate a discrete code representative of an order of magnitude, or of a metric over the elements within a sequence. A rank-order code also imposes interval constraints on the elements constituting the sequence: the index of the first-ranked element is necessarily lower than the index of the second, the index of the second is necessarily lower than that of the third, and so on. Thus, rank-order codes can also be viewed as restrictive set intervals with a bracket representation, as in Fig. 4. For example, the rank-order code (1, 2, 3) constrains the choice of the elements x1, x2, x3 constituting the sequence such that x1 < x2 < x3.
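As a toy illustration (our own sketch), the interval constraints carried by a rank code can prune candidate index assignments during retrieval:

    def consistent(candidates, rank_code):
        """Keep only index tuples whose relative order matches the
        rank code; rank_code[i] is the rank of position i (0 = lowest)."""
        return [c for c in candidates
                if all((c[i] < c[j]) == (rank_code[i] < rank_code[j])
                       for i in range(len(c)) for j in range(len(c)))]

    # rank code (2, 0, 1): position 0 holds the highest index,
    # position 1 the lowest
    print(consistent([(9, 2, 5), (2, 5, 9), (7, 1, 3)], (2, 0, 1)))
    # [(9, 2, 5), (7, 1, 3)]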

We therefore propose that rank-order codes and spiking neurons have the computational capability to represent logical statements similar to those of set theory, and to detect, retrieve, manipulate, or compose them. This extends what we wrote about the other computational properties for hierarchical representation and search in section 2.1.1.

2.2 Rank-order coding implementation

Our coding strategy consists in discretizing the serial order of units both in time (ROC1) and in space (ROC2) based on their rank order; see Fig. 3. ROC1 can be constructed from the Spike Timing-Dependent Plasticity mechanism in synapses Bi and Poo (1998); Abbott and Nelson (2000); Song et al. (2000); Izhikevich et al. (2004).

Here, the indices of the neurons (their identities) are no longer preserved, and it is their rank within the sequence that is taken into account, e.g., first, second, or n-th in the sequence. This strategy drastically reduces the amount of information to process, which makes possible the discovery of an abstract temporal structure disregarding the units' indices; by doing so, the sequence becomes a template. Since the units' indices are no longer present in the temporal code, it is sensitive to any novel sequence that preserves the global temporal structure. This coding mechanism is described as a compressive representation by Botvinick and Watanabe (2007).

For instance, in Fig. 3, the temporal encoding of two sequences following the same spatio-temporal pattern is constructed successively by first dismissing the temporal information (ROC1) and then the identity information (ROC2), applying the rank-coding algorithm first on the time axis and then on the index axis; a sketch of the two steps follows.
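A minimal sketch of the two steps under our reading of the algorithm (illustrative code, not the published implementation):

    import numpy as np

    def roc1(indices, times):
        """ROC1: quantize the time axis -- order the events by spike
        time, keeping neuron indices but discarding exact timings."""
        return np.asarray(indices)[np.argsort(times)]

    def roc2(ordered_indices):
        """ROC2: discard neuron identities -- keep only the rank of
        each index within the sequence (the compressive pattern)."""
        return np.argsort(np.argsort(ordered_indices))

    # two sequences with different neurons and timings, same structure
    print(roc2(roc1([12, 3, 25, 7], [0.10, 0.25, 0.31, 0.47])))  # [2 0 3 1]
    print(roc2(roc1([50, 10, 80, 20], [0.05, 0.22, 0.40, 0.61])))  # [2 0 3 1]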

The problem's dimensionality for a temporal sequence of N elements is thus reduced from a continuous space of N indices with N real-valued timings, to an intermediate representation of N indices with N discrete ranks (ROC1), and then to a discrete representation of the N ranks alone (ROC2). Although the reduction of complexity might not appear important when looking at the dimensionality of the vector quantization, all the compressive codes are now only permutations of the ranks 1..N. There are therefore N! possible permutations, each representing one particular compressive code.

At the unit level, these compressive codes permit the detection, in a compact way, of an infinity of varying spatio-temporal sequences that follow the same structure. At the population level, the weighted sum of these compressive codes then permits a descriptive representation of any particular sequence, with its particular locations and indices. We can again make the analogy with the Fourier transform, which sees signal approximation as a weighted sum of sinusoidal functions only: the number of sinusoids indicates the level of approximation, and their weights the influence of each particular unit.

Because this coding strategy encodes only the relative position of neurons, it has some interesting properties in terms of compositionality. For instance, as these neurons are sensitive to a specific rank-order code in sequences, they can be used to fill in any missing variable following their pattern, which corresponds well to the variable-binding property found in PFC neurons Kriete et al. (2013), to the time-stamp neurons in Jin et al. (2009); Wacongne et al. (2012), to the neuronal pointers in Eliasmith et al. (2012), and to the nonlinear mixed-selectivity neurons in Rigotti et al. (2013).

In terms of robustness to noise, another advantage of this coding strategy compared with the STDP one is that the temporal information is now learned separately from the inputs, which enables the network to learn long-range dependencies at an abstract level and prevents it from losing information rapidly within a temporal horizon; this corresponds to the so-called vanishing-gradient effect in classical and deep neural networks. As a remark, feed-forward (deep) networks, standard recurrent neural networks (with or without STDP) and hidden Markov models easily lose accuracy after several iterations due to accumulated errors, because any error, noise, or delay within a sequence, together with sensitivity to duration, will disrupt the sequence. One explanation for why any error introduced into the network makes conventional neural networks brittle is that the neural index and the temporal structure are coded together. This is less the case in neural models with a gating mechanism like PBWM Kriete et al. (2013), Spaun Eliasmith et al. (2012) or LSTMs, because the temporal information of a sequence can be learned in memory cells separately from the variable values, which can be retrieved online or maintained dynamically over an indefinite amount of time. Nonetheless, their coding strategy does not have compressive or mixed codes and thus cannot provide a strong compositionality feature for the learning of structure in very large ensembles of neurons.

The equations of the rank-order coding algorithm that we used are as follows. The output $y_i$ of neuron $i$ is computed by forming the dot product between the function rank(), sensitive to a specific rank ordering within the input signal vector $x$, and the synaptic weights $w_i$. For a vector signal $x$ of dimension $M$ and for a population of $N$ neurons ($M$ afferent synapses each), we have:

$y_i = \sum_{j=1}^{M} w_{ij} \, \mathrm{rank}(x)_j, \quad i = 1, \dots, N$   (1)

We implement the rank function rank() as a power law of the argsort() function normalized between $[0, 1]$, modeling the gain-modulation mechanism applied twice, on the time axis (ROC1) and on the index axis (ROC2). This guarantees that the density distribution is limited and that the weight matrix is sparse, which makes the rank-order coding neurons similar to radial basis functions. This attribute permits us to use them as receptive fields, so that the more distant the input signal is from the receptive field, the lower its activity level. The updating rule of the weights is similar to the winner-takes-all strategy in Kohonen networks Kohonen (1982) with an adaptive learning rate $\alpha$. For the best neuron $i^{*} = \arg\max_i y_i$, written in the standard Kohonen form, we have:

$\Delta w_{i^{*}j} = \alpha \, (\mathrm{rank}(x)_j - w_{i^{*}j})$   (2)

$w_{i^{*}j} \leftarrow w_{i^{*}j} + \Delta w_{i^{*}j}$   (3)
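A compact numerical sketch of equations (1)-(3) as written above; the power-law exponent, the learning rate, and the population sizes are illustrative assumptions:

    import numpy as np

    def rank_fn(x, gamma=2.0):
        """rank(): power law of the normalized argsort, in [0, 1].
        gamma is an assumed exponent for the gain modulation."""
        r = np.argsort(np.argsort(x)) / (len(x) - 1)
        return r ** gamma

    def forward(W, x):
        """Eq. (1): dot product between rank(x) and each neuron's weights."""
        return W @ rank_fn(x)

    def wta_update(W, x, alpha=0.1):
        """Eqs. (2)-(3): Kohonen-style winner-takes-all update, moving
        the best-matching neuron's weights toward rank(x)."""
        best = int(np.argmax(forward(W, x)))
        W[best] += alpha * (rank_fn(x) - W[best])
        return best

    rng = np.random.default_rng(0)
    W = rng.random((8, 5))          # 8 neurons, 5 afferent synapses
    wta_update(W, rng.random(5))    # one learning step on a random input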

2.3 Predictive decoding mechanism (DECOD2)

In order to decode and retrieve one hidden sequence, we use the predictive coding framework of free-energy minimization proposed by Friston for generating solutions through iterative optimization Friston (2003); Friston et al. (2006); Friston (2009); Friston and Kiebel (2009); Friston et al. (2016). Using intrinsic noise, we test the generated sequences, evaluate their error based on the activity of the population of rank-order coding neurons, and iteratively guide the exploration process following a reinforcement learning (RL) mechanism. We compute the error by comparing the activity level $y$ of the unit coding the sequence to its maximum amplitude level $y_{\max}$. The current input sequence is kept for the next step if and only if it diminishes the gradient $y_{\max} - y$. Over time, the input converges to its optimum sequence vector and $y$ converges to its maximal value.

We showed in Pitti et al. (2017) that this variational process is similar to an online stochastic hill-climbing algorithm performed iteratively. We added in Pitti et al. (2020) a more sophisticated hill-climbing algorithm, corresponding to simulated annealing, in order to drive the exploration process efficiently. As the compressive codes are made of permutations only, more efficient dichotomy and tree-sorting techniques could be applied to drive the decoding search in linear time. Such a computing process, nonetheless, might not be very biologically plausible, as we lack evidence for it, and different active inference mechanisms might be used by the brain, such as predictive coding or free-energy minimization.
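As a toy sketch of this decoding loop (ours; the scoring function and iteration budget are assumptions), DECOD2 reduces to online stochastic hill-climbing:

    import numpy as np

    def decode(activity, n_items, length, n_iters=20000, seed=0):
        """Perturb a candidate sequence with intrinsic noise and keep
        a change only if it raises the activity of the target unit."""
        rng = np.random.default_rng(seed)
        x = rng.integers(n_items, size=length)   # random initial guess
        y = activity(x)
        for _ in range(n_iters):
            cand = x.copy()
            cand[rng.integers(length)] = rng.integers(n_items)  # noise
            y_new = activity(cand)
            if y_new > y:                         # keep improvements only
                x, y = cand, y_new
        return x, y

    # toy usage: the score counts matches with a hidden target sequence
    hidden = np.array([4, 1, 3, 0, 2])
    x, y = decode(lambda s: np.sum(s == hidden), n_items=5, length=5)
    print(x, y)   # converges to the hidden sequence, score 5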

In experiments done with Inferno Gate in Pitti et al. (2020) on retrieval tasks with long temporal sequences (fifty neurons), ordered in time and taken from a large ensemble (fourteen thousand neurons), the results showed nearly error-free performance, thanks both to the population coding of temporal structure and to the iterative decoding mechanism. Comparatively, on the same retrieval task, deep gated LSTM networks could not retrieve the items, even with a larger amount of data provided in the learning set and a longer time to converge.

We interpret our result as follows. Because Inferno Gate has learned an intertwined coding of the sequence using basis functions, which means that it has represented the sequence as a combination of conjunctive units sensitive to different parts of its underlying structure, it has gained a way of handling compositionality by crossing inputs and templates together. In terms of Information Theory, redundancy in the channel coding permits resistance to noise disturbance and a digital-like decoding of the original information.

In comparison with machine learning algorithms, this approach has several advantages: (1) it does not learn the items per se and reduces the dimensionality of the code (sparse coding and computational gain); (2) it represents sequences of items by a combination of temporal pattern primitives only (population coding of temporal primitives); (3) it can infer sequences of new items never seen before, because the coding is based on models (model-based coding and compositionality); (4) it is less prone to noise because it separates the temporal structure from the content (data) of sequences, which differs from statistical learning.

3 Neuro-cognitive foundations

If some digital-like decoding mechanisms exist to avoid catastrophic forgetting, as we propose, they must be confronted with the evidence and results found in the brain literature.

We will detail some phenomena and mechanisms that we can reinterpret within our framework: the nonlinear encoding of patterns and locations of neurons in section 3.1, the detection of syntactic structures in speech in section 3.2 and of abstract rules in motor sequences in section 3.3, and the functional organization of networks necessary to retrieve 'lost' neurons in large assemblies in section 3.4.

3.1 Redundant nonlinear mixed selective codes

The strongest argument in favor of our proposal comes from the discovery of highly dynamic neurons in the PFC Romo et al. (2018); Machens et al. (2010); Tanji and Hoshi (2001); Shima et al. (2007) functioning with conjunctive coding of multiple stimulus features, such as sensory stimuli, task rule, or motor response. One computational description of this phenomenon is the mechanism of nonlinear mixed selectivity (NMS) proposed in Rigotti et al. (2013); Fusi et al. (2016). Within this framework, high-dimensional representations with mixed selectivity allow a simple linear readout to generate a huge number of different potential responses that depend on multiple task-relevant variables.

Recent studies have shown that the Lateral PFC hosts an abundance of these neurons with mixed selectivity Parthasarathy et al. (2017); Sarma et al. (2016); Mansouri et al. (2006). In particular, neurons with nonlinear mixed selectivity are thought to play a key role in the encoding of information Fusi et al. (2016). One recent paper explains further how it may be used by the brain to support reliable information transmission using unreliable neurons Johnston et al. (2019).

As a note, the parietal cortex also possesses conjunctive cells Salinas and Sejnowski (2001); Andersen and Buneo (2002); Blohm and Crawford (2009); Genovesio et al. (2014) that bind mutual information together for spatial transformation between different reference frames, multisensory alignment, and decision making. Comparative neuroanatomical studies attribute similar functions to the parietal cortex and to the prefrontal cortex, representing relative metrics or conjunctive representations Genovesio et al. (2014), such as order with relative duration and order with relative distance; but only the PFC is in a position to generate goal-based aims in context Genovesio (2009). Furthermore, neurons with mixed codes were also found in the hippocampus, and computational models for memory retention and regeneration have been proposed to preserve information robustly using fractal codes Tsuda et al. (2008); Yamaguti et al. (2011); Tsuda (2015), Haar/modulo codes [Gaussier, unpublished], and generative models Stoianov et al. (2018).

We would like to support the NMS mechanism with additional hypotheses and features that can help describe it from a communication theory perspective. First, in the context of an addressable memory system, we see the role of the PFC neurons as much in encoding and categorization as in decoding and retrieving, which is not the case in the original formulation of the NMS mechanism. Second, we might see in the nonlinear mixed effect of those PFC neurons the encoding of the relative memberships or locations of neurons with respect to particular clusters. Hence, we suggest that what we might observe in their activity is their sensitivity to multiple 'addresses', not to variables. Said differently, they might encode temporal or spatial structure without content (H1). Third, the linear combination of mixed codes permits the efficient representation of nested sequences, as we demonstrated in section 2.2. Thus, we suggest further that this linear combination of mixed codes can serve to decode efficiently sequences of neurons in very large assemblies based on their nonlinearity; in our framework, the nonlinearity effect is linked to the encoding of relative neural 'addresses' (H2).

3.2 Speech structure in the Broca area

Since the seminal work of Broca, we have known that circuits in the left cortical hemisphere and in the prefrontal area implement language for the perception and production of semantic and rule-based behaviors Friederici (2011).

Recently, it has been suggested that the Broca area plays a more general role as a supra-modal "Syntax Engine" in the broad sense, abstracting rules in other core domains and modalities such as music and action representation, as well as in visual scene understanding Fogassi and Ferrari (2007); Arbib (2005, 2008, 2019); Fadiga et al. (2009); Gentilucci and Corballis (2006).

Although language has long been thought to be separate from other types of cognition, syntactic rules and semantics exist in other domains as well (e.g., visual and motor), for which the Broca area also plays an integrative part in the underlying processing.

The Broca area, associated with syntax in sentences, interacts heavily with the Primary Auditory Cortex (PAC) and the Superior Temporal Gyrus (STG) located in the temporal area, which are associated with sound representation. A comparison of fMRI brain activation in sentence-processing and nonlinguistic sequence-mapping tasks Hoen et al. (2006) found that BA44 was involved both in the processing of sentences and of abstract structure in non-linguistic sequences, whereas BA45 was exclusively activated in sentence processing Arbib et al. (2014).

In speech processing, the Broca area (Brodmann areas 44/45) is found to be sensitive to the order of events in sequences, and its activity level is correlated with the syntactic complexity of the sentence Dominey et al. (2003). For instance, the two superficially similar sentences "Jean said # Marie is great" and "Jean # said Marie # is great" express different meanings and different relationships between the two persons. They can be represented by different ordering trees, to which the Broca area is sensitive Friederici (2011). Experiments have shown that a greater tree depth is correlated with a higher neural activity level in the Broca area region; see Friederici et al. (2006b, a).

In line with Dehaene et al. (2015), who support the view that the brain holds some exclusive mechanisms for manipulating symbolic nested trees, the Broca area clearly appears to hold one of those mechanisms for detecting pattern complexity in sequences Rouault and Koechlin (2018). We might suspect that the Broca area is functional very early during infancy, since babies and even neonates appear to be sensitive to syntax in proto-words Saffran et al. (1996); Nazzi et al. (1998); Marcus et al. (1999); Gervain et al. (2008); Benavides-Varela and Gervain (2017); see also the computational models of the frontal areas proposed by Dominey to explain these results in Dominey and Ramus (2000); Dominey et al. (2003, 2006). In support of this, recent experiments on sentence processing by Meyer and colleagues present results showing the complementary work and functional unity between the posterior cortical areas and the Broca area Meyer et al. (2012). Their experiment made it possible to trace the linkage between the storage of verbs, done in the posterior parts of the cortex, and the ordering of elements, done in the Broca area, the two working together. Within our framework, we propose that the encoding done in the Broca area (H1) also serves for the decoding and retrieval of the information stored in the posterior part of the cortex, and that the type of code used represents addresses (H2).

In support of this separation between storage and ordering, we demonstrated in computer simulations that a model of the Broca area could create an ordinal representation of sequences, sensitive to the temporal order of the items composing them but not to their details Pitti et al. (2020). Our network Inferno Gate easily extracted proto-words and syntax from sequences based on the STDP rule (rank-order coding), and could generate and retrieve long sound sequences (fifty iterations in length) from the ordinal representations we selected. Because of the distinction between information and addresses, the computational cost was dramatically lower, and retrieval faster, in comparison with deep networks and standard recurrent networks on retrieval tasks.

Our proposal is not the first to address how the frontal area extracts abstract structures in sequences: Dominey proposed the Abstract Temporal Recurrent Network (ATRN), based on a different mechanism, to explain how the Broca area operates as a short-term memory Dominey and Ramus (2000); Dominey et al. (2003). Although ours might be simpler to implement and permits the reconstruction of the items' order in sequences, it also gives structural information a broader role in organizing, generating, and retrieving knowledge between the frontal and posterior parts of the neocortex.

3.3 Representative motor schemata neurons, the mirror neurons

In support of a mechanism for the detection and manipulation of highly structured information in the PFC other than speech, higher-level semantic action neurons called Mirror Neurons (MN) were found Rizzolatti et al. (1996); Rizzolatti and Craighero (2004) in the pre-Supplementary Motor Area (pre-SMA). In these circuits, movement primitives coordinate to effect a wide variety of actions at a higher semantic level, representing goal-directed motions such as grasping, holding, tearing, pulling, etc. Mirror neurons have been found to fire for various action schemata depending on the type of grasp and goal Rizzolatti and Arbib (1998); Arbib (2005); Oztop et al. (2006, 2013). For instance, the same mirror neurons fired whether the monkey grasped an object with the left hand or the right hand and, more surprisingly, when someone else grasped an object. According to Arbib, this very striking result supports the idea of representative neurons as a common substrate for motor preparation and imagery Arbib (2005).

Arbib extensively develops a motor schema theory, with which we are in line, to explain language by the assemblage of motor schemata or representative neurons Arbib (1985, 2005, 2008). Jeannerod describes his theory particularly well in Jeannerod (1994): Arbib's view is that motor representations are composed of elementary schemas which are activated by object affordances and can adjust to visual input. During prehension, motor schemas for the subactions "reach," "preshape," "enclose," "rotate the forearm," or for selecting the number of fingers involved, would be available and would be selected automatically when required by object affordances.

In other experiments, Tanji and colleagues observed the sensitivity of SMA motor neurons to the structure of the motor sequence and not to the individual actions performed per se Ninokura et al. (2004); Shima et al. (2007). For instance, some of these higher-order neurons were sensitive to the motor pattern (AABB) or (ABAB), so that they could fire for any combination of action primitives following this structure, like Push-Push-Turn-Turn or Turn-Turn-Push-Push for the former, and Push-Turn-Push-Turn or Turn-Pull-Turn-Pull for the latter, as illustrated in Fig. 6. Tanji proposed that these SMA neurons encode the structure of the motor sequence, that is, the syntax of the task, but not its details (e.g., the action units) (H1).
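To make this notion of pattern sensitivity concrete, here is a toy Python sketch (ours, not from the cited experiments) that tests whether an action sequence instantiates an abstract pattern such as AABB or ABAB, regardless of the particular action units:

    def matches_pattern(seq, pattern):
        """True iff seq instantiates the abstract pattern: equal
        pattern symbols must map to equal items, distinct symbols
        to distinct items."""
        mapping, used = {}, set()
        for sym, item in zip(pattern, seq):
            if sym in mapping:
                if mapping[sym] != item:
                    return False
            else:
                if item in used:
                    return False
                mapping[sym] = item
                used.add(item)
        return len(seq) == len(pattern)

    print(matches_pattern(("push", "push", "turn", "turn"), "AABB"))  # True
    print(matches_pattern(("push", "turn", "push", "turn"), "AABB"))  # False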

In similar experiments performed by Inoue and Mikami, some PFC neurons were found to modulate their amplitude level with respect to the position of items during the sequential presentation of two visual shape cues Inoue and Mikami (2018). The PFC neurons displayed graded activity with respect to their ordinal position within the sequence and to the visual shapes, e.g., first-ranked items or second-ranked items (H1). In more complex tasks, PFC neurons were found to fire at particular moments within the sequence Tanji and Hoshi (2001), e.g., the beginning, the middle, the end, or even throughout the evolution of the sequence.

In line with this, Koechlin conducted experiments to isolate the functionalities of the Broca area and its implication in the hierarchical organization of human behavior Koechlin and Jubault (2006). He observed the formation of superordinate chunks based on the temporal structure of simpler actions. Accordingly, the Broca area processes hierarchical relations rather than cross-temporal contingencies between the elements comprising action plans. Koechlin proposes that the Broca area is sensitive to the structural complexity of those action plans but insensitive to the variability of the simple motor responses composing them (H1). In a theoretical schematic model developed in Koechlin and Jubault (2006); Koechlin and Summerfield (2007), he describes how the Broca area provides hierarchical control over lower regions in order to generate sequences of single acts.

Koechlin also uses information theory and Bayesian theory to explain the executive and hierarchical control exerted by the prefrontal area over more posterior brain areas Fuster (2001); Koechlin and Summerfield (2007); Rouault and Koechlin (2018), which supports our ideas. The difference from ours lies in our explanations and in the underlying mechanisms we propose using rank codes: how information is coded and its potentially strong impact on how the brain is functionally organized to process, keep, and retrieve information, as well as its computational cost.

Although the block view of hierarchical control based on Bayesian theory is interesting, we suggest that the details of the information encoding of super-ordinate chunks and of sequence generation are of particular importance for understanding the functional organization of the complete neural architecture he proposes. For instance, we propose that the type of coding done with rank-order neurons is not only important for learning compact hierarchical representations (ROC1 and ROC2), but is also pivotal during the decoding task, in order to make the search for sequences effective (DECOD1 and DECOD2).

Our framework may provide a comprehensive understanding of the computational features underlying these representative mirror neurons, in line with Simulation Theory Jeannerod (1994, 2001), or these super-ordinate chunks, in line with Hierarchical Control based on Information Theory Koechlin and Summerfield (2007); Koechlin (2016). In our view, we may see MN as "structure"-detector neurons insensitive to the raw action signals (H1). They may then act as the neuronal pointers we previously described and may be used to regenerate (i.e., simulate) any original sequence with respect to the varying context (grasping by oneself or by someone else, use of the left hand, right hand or a tool, type of grasp, etc.) (H2). Our theory also resembles the dissociation processes for learning 'surface' structure and 'abstract' structure in sequences proposed by Dominey and colleagues Dominey et al. (1998).

Furthermore, the temporal coding carried out in Inferno Gate extends the STDP mechanism with extra information, the position of an item within a sequence, making it more nonlinear and abstract in the sense that the neurons' receptive fields now encode structural information about the sequence and not the sequence itself.

In our computer simulations in Pitti et al. (2020), rank-order neurons are weighted so that they are more salient to particular positions in a sequence (the weights in eq. 1), whether at its beginning, middle, or end. They can be sensitive to other positions in the sequence, but with less strength.

Since they are sensitive to items at the beginning or at the end of a sequence, this behavior reflects well the neural behaviors found in the SMA in Tanji's experiments Tanji and Hoshi (2001) and in the posterior PFC in Fujii and Graybiel (2018), firing in line with retrospective or anticipatory events: retrospective neurons fire depending on previous events, whereas prospective neuronal firing depends on future events.

The striking similarity between the computational processing done in the pre-SMA for motor sequences and in the Broca area for speech sequences, described in section 3.2, may support the view that language emerged with action planning and with mirror neurons, as suggested in Rizzolatti and Arbib (1998); Rizzolatti and Craighero (2004); Arbib (2005); Fadiga et al. (2009). Brain theorists Fadiga, Arbib, Corballis and Gallese propose that motor articulation in speech is deeply rooted in the syntactic and structural organization of actions Fadiga et al. (2009). They further suggest that the neural pathways for syntactic representation in speech and action are located in the same places, respectively the Broca area in humans and the mirror neuron system in monkeys, the two potentially presenting a similar functional organization and similar mechanisms for the acquisition of conceptual knowledge and hierarchical representation Gallese and Lakoff (2005).

Our idea, that STDP-based rank-order neurons have the computational capabilities to manipulate structured and hierarchical information, and some advantages in doing so in terms of efficiency and compactness, may provide additional arguments in favor of these embodied theories of the brain, and also of the more cognitive ones based on information theory.

3.4 Small-world network organization of cortical layers

As with communication over electric wires, long-distance interaction and exchange across neural complexes between the PFC and the other parts of the brain must be influenced by intrinsic noise and information dissipation Laughlin and Sejnowski (2003). We suggest that the broadcasting of information by the PFC and the recruitment of distributed neurons in other parts of the brain must rely on efficient and redundant channel decoding for interconnection. We see the task of retrieving one neuronal cluster distributed across different brain areas as a top-down optimization process that iteratively denoises the missing neurons belonging to that cluster; that is, we suggest that the mechanism described as global synchronization in Varela et al. (2001); Engel and Singer (2001); Engel et al. (2001); Singer (2003) corresponds instead to a top-down optimization process for decoding in large-scale assemblies; see Fig. 4.

Accordingly, not all codes are equal in terms of complexity in communication theory, and some are easier to retrieve than others. It follows that, depending on whether the channel is well ordered or not (e.g., noisy), the retrieval of one code (i.e., a cluster) during the decoding process is subject to errors. Conversely, neuronal clusters are not all equal to one another, and some are more difficult to retrieve and learn, depending on their complexity, rareness, or code length.

In retrieval tasks, because source and destination are potentially the same in the brain, which differs from human-made telecommunication networks, the well-ordering of the source/destination network also influences the efficacy of the channel code in retrieving neurons within them. Thus, the topology of the source (the encoder), in terms of randomness or complexity, also has an impact on the efficacy of the channel itself (the decoder).

Using again the metaphor of the mailman, the efficacy of the mailman in distributing letters depends not only on his own competence in delivering letters but also on how well the city is organized, e.g., well ordered or randomly organized. There are therefore advantages to redesigning cities at multiple hierarchical levels into districts, streets, and building numbers, for efficacy purposes (H3).

In the brain, the small-world organization of the different regions of the neocortex, the connectome, may serve this purpose for information processing Sporns et al. (2000); Sporns and Honey (2006); Bassett and Bullmore (2006); Park and Friston (2013) (H3). We therefore suggest two ideas (H3). First, the well-ordering of the neocortex into hierarchies and small-world dynamics serves fast and robust information retrieval of codes, which is not something clearly acknowledged in the computational neuroscience literature, and no mechanism for cluster retrieval in complex systems has been proposed in that sense. For instance, most research emphasizes only the topological aspect of complex networks, such as the balance between segregation and integration Varela et al. (2001); Sporns et al. (2004); Tognoli and J.A.S. (2014), but how information is then retrieved in those networks is not examined. Second, we propose that the mechanism for structure learning and decoding in the PFC may serve, as a second feature, to shape the well-ordering of the other parts of the brain for efficacy. Detecting one structure in sequences gives the opportunity to generate, in an unsupervised manner, sequences that also possess this particular structure. Hence, we propose that the mechanisms for extracting structure in the world, performed by the PFC, serve to structure the memory networks of the brain itself, in line with the ideas of self-structuring information Lungarella and Sporns (2005); Byrge et al. (2014), neural scaffolding Changeux and Dehaene (1989) and embrainment McClelland et al. (2010) (H3).

Although most neural models support the idea of a bottom-up, local self-organization of random networks into complex small-world networks through synaptic connections and neuromodulation, we suggest that global synchrony via top-down and reentrant signals might also play an important role in re-organizing information processing within networks for efficiency in retrieval tasks, in line with top-down synchrony Engel and Singer (2001); Engel et al. (2001); Singer (2003), reentry Tononi et al. (1992); Tononi (1992) and the global workspace Dehaene et al. (1998); Dehaene and Naccache (2001); Dehaene and Changeux (2011) (H3).

In comparison to a random network, a network organized into small-world dynamics may favor the faster retrieval of neurons and clusters. Thus, we propose that the address-decoding mechanism used primarily for sequence retrieval in the PFC (H2) may serve a second purpose: to control hierarchically and re-organize the initial neural network for efficient decoding (H3).

More precisely, the top-down generation of structured sequences in the PFC can shape globally the functional organization of the neocortex in an unsupervised way, by dynamically reinforcing synaptic links through Hebbian learning and STDP, enhancing specialization and segregation. As in neural Darwinism Edelman (1987); Changeux and Dehaene (1989), on the one hand, the retrieved neurons belonging to the same cluster specialize and are kept unchanged; on the other hand, unselected neurons that cannot be reached, because of the complexity of the code to retrieve during the decoding process, are simply lost and forgotten. We suggest that only the most easily retrieved codes in the PFC and the most efficient topologies in the neocortex remain and scaffold, thanks to the PFC's ability to detect and generate structured information (H3).

To summarize the ideas developed in this section, we suggest that the abilities to detect structure, to separate information and address, and to organize information retrieval in the PFC have endowed the brain with digital-like mechanisms for information retrieval and preservation, whose effect is the scaffolding of the brain networks themselves. Thus, the digital-like reorganization of the brain might be decisive for information survival and energy efficiency against noise, and for the purpose of autonomy as well.

Figure 5: Organization of a random network structured by rank-order codes. We perform two experiments, in 1D and 2D, consisting in generating sequences of ten units each, selected randomly within a fixed interval range. In a), we colorize the rank code of two thousand units taken from the 2D sequences. We can observe a self-organization with dense overlapping between the clusters, although they remain spatially localized. In b), we plot the number of cluster memberships of each unit. The pinker and larger the unit, the more clusters it belongs to. Those units may be similar to the hubs in small-world networks. In c), we present the clusters of the units from the 1D sequences. Each unit is classified according to its rank order within the sequence. With this random generation of ten-unit sequences, there are at most ten clusters. The clusters produced by the rank-order codes self-organize to spread across the units' interval range. In d) and e), we display the histogram of the neurons' membership to their clusters and the corresponding neuron-cluster table. Within the interval range, the clusters possess many neurons and overlap largely with one another. Besides, neurons can be part of several clusters. In f), a second histogram indicates the average number of clusters to which neurons belong, from 1 to at most 4 clusters. These properties differ slightly from the organization of small-world networks, in which a few neurons are connected to many clusters. The top-down organization of clusters by rank-order codes generates more overlapping and more intermediate neurons. The rank-order codes serve to disentangle the clusters, and to retrieve individual units and their order in sequences.

3.5 Functional organization of random networks structured by rank-order coding

Does the brain employ a rank-order code for neural computation (i.e., decoding the location of neurons in an ordered manner)? If so, the topology of brain networks must be organized accordingly Kaiser (2007). We study in an experiment how a rank-order code can influence the topology of a random network.

Our experiment consists in the random generation of sequences of several items each, following no particular pattern (uniform distribution). To better apprehend the results, we present experiments in 1D and in 2D. We first display results on a 2D example, the random generation of two thousand sequences of ten items each, drawn from a uniform distribution within a fixed interval, see Fig. 5 a-b). Second, we analyze a 1D example, the random generation of one hundred sequences of ten items each, again with no particular pattern (uniform distribution over the interval), see Fig. 5 c-f). Items can represent neurons, and sequences of items can represent clusters of neurons.

The second stage consisted in extracting the rank-order code of each unit in the sequences, and categorizing the units with respect to their relative rank-order, see Fig. 5 a) for the 2D example and c) for the 1D example. Each unit is associated with a rank in a sequence, and this rank is displayed with a different color. This simple operation shows that the randomly generated clusters each follow a normal distribution and that they largely overlap with each other. We can observe in Fig. 5 a) and c) a strong overlap among the clusters, but still a spatial separation between them, due to the rank-order code.

Furthermore, each unit can be associated with different ranks in several sequences and therefore be part of multiple clusters. We display in Fig. 5 b) the spatial location of the units, whose color and size indicate the number of clusters they are part of; the more clusters a unit belongs to, the bigger the circle and the pinker the color. The graph displays the dense organization of units belonging to only a few clusters, in cyan, and the fewer units belonging to several clusters, in pink, mostly in the middle. These latter units may be similar to the neuronal 'hubs' in small-world networks.

In order to better understand this situation, we plot in Fig. 5 d) the histogram of the neurons' cluster memberships with respect to their rank (1D example). The graph shows that neurons are part of many clusters at the same time and that the clusters are intertwined with each other, especially in the center, where we can observe the overlap of four clusters out of ten. This effect differs from small-world networks, in which we would have fewer inter-connections.

We display in Fig. 5 e) the graph of the neurons' memberships relative to their clusters. The color intensity indicates the degree of membership of each neuron to the clusters. As in the previous analysis, neurons are part of several clusters. A histogram of the number of cluster memberships at the population level is displayed in Fig. 5 f). This graph reveals a small-world-like topology for this completely random network, with a majority of neurons being members of two or three clusters at the same time, up to a maximum of four. Those highly connected neurons might be assimilated to the 'rich-club' neurons described in van den Heuvel and Sporns (2011). We observe that random networks based on rank-order coding can generate them spontaneously.
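For concreteness, the 1D experiment can be sketched in a few lines of Python; the number of candidate units and the random seed below are our assumptions, as the text leaves them unspecified, so the exact counts will differ from Fig. 5:

```python
# Minimal sketch of the 1D experiment: random sequences, rank clusters,
# and the cluster-membership histogram of Fig. 5 d-f.
import numpy as np
from collections import Counter

rng = np.random.default_rng(42)
n_units, n_sequences, seq_len = 250, 100, 10    # n_units is our assumption

# One hundred random sequences of ten distinct units each (uniform draws).
sequences = [rng.choice(n_units, size=seq_len, replace=False)
             for _ in range(n_sequences)]

# Rank-order code: cluster k gathers every unit that occupied rank k
# (its relative position within a sequence) at least once.
clusters = [set() for _ in range(seq_len)]
for seq in sequences:
    for rank, unit in enumerate(seq):
        clusters[rank].add(int(unit))

# Count, for each unit, how many distinct rank clusters it belongs to;
# the most connected units play the role of the hub / 'rich-club' units.
membership = Counter(unit for cluster in clusters for unit in cluster)
print(dict(sorted(Counter(membership.values()).items())))
print(max(membership.values()), "clusters at most for a single unit")
```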

To summarize, rank-order codes can shape networks in a self-organized manner into a topology similar to, although different from, small-world networks, with the emergence of inter-neurons belonging to many clusters (in a way similar to the neural hubs or the 'rich-club' neurons), without imposing any locality rules on the synaptic length. These topologies permit the reliable retrieval of the neurons' location (their address), provided the decoding mechanism follows a rank-order code. If our framework is valid, the proposed mechanism may furnish some novel hypotheses on the ontogeny and development of the connectome in the brain Kaiser (2017); Ardesch et al. (2019).

Figure 6: Structure learning in core domains in infancy, linked to neurons in PFC sensitive to the rank order in sequences. Biological evidence and developmental data show that babies as early as 3 months of age are ready to grasp structure in data and to separate the hidden temporal pattern from the raw events. In sound processing, in I, young infants were found to be sensitive to the structure of words like the examples shown, independently of the syllables pronounced, showing some activity in the left prefrontal area. In visual scene understanding, in II, Baillargeon showed how infants were sensitive to the coherence of sequences or to its violation. She argued that infants possess a Representational Engine to extract higher-level rules. In action planning, in III, Tanji discovered in the pre-supplementary motor area of monkeys, a structure related to the Broca area in humans, the activation of neurons sensitive to specific sequential patterns but not to the motor items constituting them.

4 Links to Cognitive Sciences & Developmental studies

Several developmental observations suggest that babies at birth are capable of detecting structured information and of manipulating symbolic representations, as proposed by Gopnik et al. (2004); Meltzoff (2007). For instance, early in development, infants are keen on grasping structure in core domains Spelke (2003); Spelke and Kinzler (2007), inferring causal models and making hypotheses on problems about space, time, numerosity, language and psychology Gopnik et al. (2000); Tenenbaum et al. (2011); Baillargeon and Carey (2012).

We propose that if infants are particularly good at extracting structural information from raw data, it is due to the mechanisms found in the PFC and in the Broca area, which are responsible for gathering information in a coherent manner, for organizing knowledge efficiently, and for manipulating it at an abstract level.

4.1 Structure understanding in visual scenes

In scene understanding tasks, Baillargeon discovered that young infants reason at an abstract level about objects' categories (inanimate, animate), physics (occlusion, shadows, rigid 3D objects don't distort or disappear) and properties (soft, round) to simulate what will happen next. She proposes that a link exists between language and event representations Baillargeon (1994); Baillargeon and Carey (2012) and that infants possess a 'Physical Reasoning System' endowing them with a grasp of intuitive physics about objects.

Similar to grammar in language, this Physical Reasoning System may allow infants to reason about causal and physical events and to detect several physical violations occurring in the visual scene.

For instance, infants are surprised when an object placed behind a screen disappears and a different object reappears afterwards. Young infants therefore possess some expectations grounded in physical knowledge, such as object permanence and object occlusion.

As illustrated in Figure 6-II for object permanence, a coherent scenario presents either one or two objects occluded by a screen and reappearing afterwards, forming coherent sequences with the same structure and possible endings (object 1 → hide → object 1 and object 2 → hide → object 2). Another scenario with two impossible endings (object 1 → hide → object 2 and object 2 → hide → object 1) leads to a violation of the learned rule.

In order to solve this task, some developmental scientists have suggested that infants have to reason at a super-ordinal level of representation (e.g., the structural or symbolic level) and not at the raw pixel level, as classical Machine Learning algorithms usually do Gopnik (2017). Bayesian theory and probabilistic inference are often taken as good candidates to explain infant behaviors in these tasks, but we think they are not enough to explain how learning is done so fast and how generalization from few examples is carried out. For instance, Bayesian networks are difficult to use when it comes to manipulating probability densities over large amounts of data and in large directed graphs. Furthermore, the framework of Bayesian theory cannot explain how the compositionality of a rule can be achieved so readily and how the acquisition of novel symbols can be managed from raw data. Bayesian nets usually leave aside how symbol grounding is made, how the probability density terms are estimated, and how the correct graph is selected for belief propagation.

In comparison, we suggest that extracting super-ordinal representations in sequences is instead a very rapid mechanism, which can be used readily to infer the structure of an unseen sequence or to generate new samples based on it. This differs from the classic Bayesian approach in that the system does not need many trials to learn probability distributions before generating new sequences.
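A minimal sketch (our construction, not the model's actual implementation) makes the contrast concrete: a single observed sequence yields an abstract signature that can immediately check an unseen sequence or instantiate a new one, with no probability estimation at all:

```python
# Rank-order-style signature: abstract away identities, keep only the
# order of first occurrences, e.g. (obj1, hide, obj1) -> (0, 1, 0).
def signature(seq):
    seen = {}
    return tuple(seen.setdefault(x, len(seen)) for x in seq)

learned = signature(("object 1", "hide", "object 1"))     # one example only

print(signature(("object 2", "hide", "object 2")) == learned)  # True: coherent
print(signature(("object 1", "hide", "object 2")) == learned)  # False: violation

def instantiate(sig, items):
    """Compositionality: generate a fresh sequence with the same structure."""
    return tuple(items[i] for i in sig)

print(instantiate(learned, ("object 3", "hide")))
# ('object 3', 'hide', 'object 3')
```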

Extracting relational information with rank-order codes in raw sequences makes it possible to represent temporal and hierarchical structures and to form expectations about future events in visual scenes. Furthermore, since rank-order codes can produce context-free grammars, as proposed in section 2.1, we can potentially define rules and construct logical systems with them.

Therefore, in line with the Physical Reasoning System proposed by Baillargeon, we propose that infants have access to a "visual language" thanks to the processing performed in the PFC for extracting structures and rules in visual sequences. Such a system is in no way the implementation of a physical simulator at the pixel level, as proposed in Lake et al. (2017); it is rather a collection of structural patterns that permit logically falsifying relational information across incoming data (rule-based reasoning and fact checking), anticipating future events (causal inference), or generating a plan (compositionality and imagination).

4.2 Spatio-temporal structure in cognitive tasks and learning-to-learn

Two experiments emphasize the importance of the frontal areas for problem-solving in visual scenes: one from Harlow on task sets Harlow (1942, 1949) and the other from Piaget on the A-not-B error test Diamond (1985); Diamond and Goldman-Rakic (1989). Both experiments play on repetition and novelty, on masking, and on the use of temporally delayed information to predict either the spatial location, in the case of the A-not-B error experiment, or the temporal strategy to employ, in the case of Harlow's task sets. Importantly, neither of the two tests can be explained by simple conditioning and associative reinforcement learning, and both experiments are considered stepping-stones of infant cognitive development.

Recent simulations of the frontal areas successfully modeled these experiments using neural fields McClelland et al. (2010), LSTMs Wang et al. (2018) or spiking neurons Pitti et al. (2013). These models were based on the active maintenance of information or the inhibition of spurious information.

Since the frontal areas extract abstract structural information in temporal sequences, we propose a different mechanism, based on our current neural architecture and framework, to explain these results. First, one observation is that the A-not-B test is relatively similar to the experiments performed by Baillargeon, except that one object remains hidden in one location, A or B, and is uncovered after a delay. The temporal structure XYX is still respected, except that here the child has to predict the spatial location of the hidden object, A or B, which corresponds to the last term in the sequence XY[X], with X replaced by the A or B location.

Second, in the Harlow experiment, one successful strategy for the monkey is to open door A or door B and to learn that the same reward will be given if it chooses, for the next five trials, the same location irrespective of the input stimulus provided. One potential explanation of this experiment within our framework is that the monkey has learned two strategies, XXXXXX and XYYYYY; depending on whether it receives a reward or not on the first trial, irrespective of the door location (X=A or X=B), the first or the second strategy is selected from the second trial on.
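Under our reading, this policy can be written down directly; the sketch below (door names, reward protocol and function names are our illustrative assumptions) selects one of the two structural templates after the first trial:

```python
# Toy model of the two-strategy account of Harlow's task sets.
def choose_strategy(first_trial_rewarded):
    """After trial 1, commit to a structural template for the six trials."""
    # X = the door chosen on the first trial, Y = the other door.
    return "XXXXXX" if first_trial_rewarded else "XYYYYY"

def next_choice(strategy, trial, first_door):
    other = "B" if first_door == "A" else "A"
    return first_door if strategy[trial] == "X" else other

strategy = choose_strategy(first_trial_rewarded=False)
print([next_choice(strategy, t, "A") for t in range(6)])
# ['A', 'B', 'B', 'B', 'B', 'B']: switch once after failure, then stay
```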

4.3 Extracting structure in proto-words

In language processing, seminal works by Saffran Saffran et al. (1996); Saffran and Wilson (2003), Marcus Marcus et al. (1999, 2007), Nazzi Nazzi et al. (1998); Hoareau et al. (2019) and Gervain Gervain et al. (2008); Benavides-Varela and Gervain (2017) have shown that babies after 8 months, and even neonates, are capable of learning artificial grammars and of extracting structure in proto-words, like the AAB pattern in the words 'totobu', 'gagari', 'mimitu', although they were not familiar with the specific temporal order of the sound sequences. These works also showed that infants were sensitive to structure violations when other patterns were presented, such as the word 'pesipe' with the ABA pattern Hoareau et al. (2019); Basirat et al. (2014); Dehaene et al. (2015); see Figure 6-I.
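The same signature idea applies directly to these proto-words; in the sketch below (ours; the two-letter syllable split is a simplifying assumption), each word reduces to its structural pattern regardless of the syllables used:

```python
# Reduce a word to its structural pattern over syllables, e.g. AAB or ABA.
def pattern(word):
    syllables = [word[i:i + 2] for i in range(0, len(word), 2)]
    seen = {}
    return "".join("AB"[seen.setdefault(s, len(seen))] for s in syllables)

for w in ["totobu", "gagari", "mimitu"]:
    print(w, pattern(w))              # all AAB, despite different syllables
print("pesipe", pattern("pesipe"))    # ABA: perceived as a structural violation
```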

Despite the immaturity of the baby's brain, its performance in this task indicates that a neural mechanism is at work to grasp statistical regularities and structure within speech sequences, conforming to a set of learned grammatical rules. Interestingly, the PFC has been found active during these tasks, as well as the Broca area.

The mechanism we propose, extracting the ordinal information in sequences irrespective of the inputs' identity, may explain the processes behind these results found in babies and neonates, and how the PFC and the Broca area potentially perform them. This is in line with an earlier model by Dominey of the frontal areas extracting abstract structural information in temporal sequences Dominey and Ramus (2000); Dominey et al. (2003).

Similarly, our computational experiments with the neural architecture Inferno Gate permitted extracting temporal structure in sequences of 50 items over a large sound repertoire of 14,000 items, detecting temporal structure violations, and easily generating novel sequences following one temporal order Pitti et al. (2020). Such capabilities were not attainable by the current state-of-the-art recurrent network, the LSTM.

Recent developmental and ethological comparisons defend the idea that the emergence of language comes from the functional maturation of the human infant brain and does not have a physiological cause Boë et al. (2019), as the current dominant theory holds Lieberman (1968). It has been suggested that conjunctive cells in frontal areas play an important role in goal-based behaviors Genovesio (2009); Genovesio et al. (2014). We suggest further that rank-order coding may allow the structural learning of tree representations in temporal sequences, and that these are necessary for grammar and language.

4.4 Supra-modal structure learning and agency

The sensitivity of infants to spatio-temporal structure, in the light of the functioning of the Broca area neurons, may permit interpreting differently certain experiments and data found in the cognitive development literature. For instance, since the Broca area encodes super-ordinal information independent of the raw input, the detection of structural patterns independent of their source and original modality becomes possible.

We propose that supra-modal or amodal 'contingency' detection (which has been proposed to explain imitation Meltzoff (1997); Marshal and Meltzoff (2014), and perception and self-consciousness O'Regan et al. (2001), for instance) might be possible thanks to this brain structure. The mechanisms of the Broca area and of rank-order coding may disambiguate our understanding of how the brain can extract spatio-temporal structures from raw information and how it can manipulate structured information without representations per se.

The Broca area may create the lingua franca that makes it possible to interpret and translate the different modalities into one another Guellaï et al. (2019). For instance, a temporal structure extracted from vision might equal a similar temporal structure extracted from proprioception or from sound, passing through the Broca area. Accordingly, amodal information may serve the learning of physical causal events and also observational learning in the social domain.

In our view, the correspondence between modalities can be made possible through the detection of similar spatio-temporal structures: e.g., an ascending-descending pattern can correspond equally to a sound pitch rising in crescendo and then descending, to a light augmenting and then diminishing in luminosity, or to the hand wrist turning a button back and forth, left-right-left. The property of the PFC to extract structure from raw information may be useful to calibrate the body signals against each other at an abstract level, in the sense given by O'Regan and Noe O'Regan et al. (2001), and to associate the structures of changes between raw actions and sensory modalities when they follow similar temporal patterns.
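A small sketch (our construction) of this cross-modal equivalence: each signal is abstracted into its up/down pattern, and the comparison is made on those patterns alone, irrespective of modality and units:

```python
# Abstract a 1-D signal into its rising/falling pattern (+1 up, -1 down).
import numpy as np

def trend_signature(signal):
    return tuple(np.sign(np.diff(signal)).astype(int))

pitch      = [1, 3, 5, 4, 2]         # crescendo then decrescendo
luminosity = [10, 20, 35, 30, 15]    # light rising then falling
wrist      = [0, 1, 2, 1, 0]         # left-right-left turning

print(trend_signature(pitch) == trend_signature(luminosity)
      == trend_signature(wrist))     # True: the same amodal structure
```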

Terekhov and O'Regan explain how the amodal notions of space and geometry can be abstractly extracted from relative information about sensorimotor signals by constructing an amodal function of them Terekhov and Kevin O'Regan (2016). Accordingly, other notions taken from core domains and difficult to define can be expressed in the same way: for instance, the notions of numerosity, acceleration, mass, gravity or softness. Those qualitative properties of perceptual experiences correspond to the theory of Qualia, which is also linked to the theory of integrated information Balduzzi and Tononi (2009).

Our framework based on the mechanisms of rank-order coding may provide a neural basis to this sensorimotor approach to perception O'Regan et al. (2001), which emphasizes the central role of information transformations in perception, as well as in higher cognitive notions such as the 'self' and 'consciousness' O'Regan (2011).

For instance, most models of sensorimotor integration and agency are based on contingency detection and are seen as low-level processes, via a comparator model Watson (1966, 1994); Hiraki (2006). However, sensorimotor integration and agency can also be seen as a top-down and amodal process, via the detection of the same super-ordinal structure between two or more modalities and the generation of the corresponding sequence, with the same pattern, in another modality. Hence structure detection in frontal areas may govern the agentive hierarchical control of posterior areas and also the supervision of imitative behaviors. We propose that sensitivity to the relations between events in the visual domain may permit generating a similar spatio-temporal sequence in the motor domain in a plastic way, based on the mechanism found in the Broca area.

Sensorimotor contingency detection appears very early in infancy Nadel et al. (2005); Hiraki (2006). An outstanding question is whether the supramodal processing we propose is at work at birth for all modalities, giving rise to the notions of agency and of a bodily Self. Experiments on Mu rhythm activation and cancellation associated with body babbling, originating in the somatosensory area and in the mirror neuron system Marshal and Meltzoff (2014), may give some arguments in favor of the idea that an agentive process is at work very early Rochat (2001, 2003); Nadel et al. (2005); Rochat (2019), a theory also in line with the supra-modal representation system proposed by Meltzoff Meltzoff (1997).

5 Computational and IT considerations

There are some computational advantages, from an information-theoretic viewpoint, for the PFC to have re-organized the brain's networks (H3). We present in section 5.1 some considerations on this. We develop in section 5.2 our hypothesis on conscious access and information broadcast, in section 5.3 the differences with current machine learning, and in section 5.4 the link with embodied cognition.

5.1 Networking the brain’s network

Since the task of correctly retrieving one particular piece of information is incommensurable given the sheer number of neural units (cells) and synaptic connections (dendrites), we advance that the PFC must have literally learned to organize the roadmap of the whole brain's network; i.e., its connectome. Furthermore, in order to be computationally efficient for rapid performance and robustness, this neuronal mapping has to be organized in a structured and distributed way and has to possess some noise-correction mechanism, certainly different from those found in human-made systems, to retrieve neurons. Such a noise-cancellation problem is at the heart of the revolution of modern communication theory, which uses digital codes and error-decoding mechanisms (such as turbo codes) for fast and reliable telecommunication.

Our proposed digital coding mechanism should serve the detection and encoding of complex motifs and patterns, with redundant codes to represent information. This would have the advantages of being more robust to structural noise in incoming signals and of detecting similarities in clusters following the same structure.

Besides, its alter-ego mechanism, the digital decoding mechanism, must be particularly important to support information-processing tasks such as retrieving specific items or reassembling them into new ensembles. Therefore, such a digital decoding mechanism should serve the mapping of the brain's network itself, its 'infra-structure', to broadcast and recollect faster the information encoded into large assemblies of neurons, as shown in sections 3.4 and 3.5. Hence, our suggestion of separating the what and where information (that is, separating the communication channel from the source and destination elements) would have the advantage of being more robust to structural noise, since the learning of items is done in one place and the retrieval code kept elsewhere.
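A toy illustration of this separation (entirely our construction): item contents live in one store, while a sequence is represented only as an ordered list of addresses, so the retrieval code never depends on the items themselves:

```python
# 'What': an item store indexed by address.
contents = {0: "cat", 1: "dog", 2: "bird", 3: "fish"}

# 'Where': a memory sequence stored purely as rank-ordered addresses.
rank_code = [2, 0, 3]

def retrieve(code, store):
    """Decode the addresses in rank order, then look the items up."""
    return [store[addr] for addr in code]

print(retrieve(rank_code, contents))   # ['bird', 'cat', 'fish']

contents[0] = "wolf"                   # re-learning an item leaves the
print(retrieve(rank_code, contents))   # retrieval code untouched
```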

5.2 Information broadcast and conscious access

Because retrieval is viewed as an optimization decoding process, our framework may explain why the bandwidth limitation of memory access is all-or-none and why conscious access is constrained, time-limited and sequential. From this aspect, it is in line with the predictive processing and free-energy accounts of consciousness, in which consciousness is simply the process of optimizing beliefs through inference Hobson and Friston (2016); Clark et al. (2019); Kanai et al. (2019); Whyte and Smith (2020). Our idea is also in accordance with the current main theories of the brain that relate conscious processing, respectively, to global ignition, long-distance broadcasting, and information integration Dehaene et al. (1998); Baars (2005); Tononi (2008); Balduzzi and Tononi (2009); Kanai et al. (2019); Toker and Sommer (2019). Although Information Theory and Bayesian Inference have already been proposed to describe, at the macroscopic level, the required properties of the consciousness machinery, no clear and practical mechanisms have been proposed Whyte and Smith (2020). We emphasize here some novel ideas about the underlying biological mechanisms and neurocognitive observations that may provide additional constraints on the types of coding and communication mechanisms necessary for conscious processing, as well as potential consequences and hypotheses we can make about the brain code.

For instance, the digital encoding/decoding strategy that we propose should also explain how distal neurons can be dynamically synchronized into one coherent global assembly. Retrieving the missing neurons that are part of one coherent memory cluster means, for us, regenerating precisely this complete cluster with an error-correcting decoding mechanism, using the available distributed codes learned. The efficacy of the working memory can then be evaluated in terms of the durability and accessibility of the information kept: (1) its robustness against catastrophic forgetting and (2) the rapidity with which any information, even corrupted, can be retrieved.
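A hedged sketch of such retrieval follows (a simple nearest-code decoder standing in for the richer error-correcting decoding alluded to above; all names and data are ours):

```python
# Complete a corrupted cluster by matching it against the stored
# rank-order codes and regenerating the closest one.
stored_codes = {
    "cluster_A": (4, 1, 7, 2, 9),
    "cluster_B": (3, 8, 0, 6, 5),
}

def decode(observed):
    """Return the stored cluster agreeing with the most observed positions;
    None marks positions lost to noise."""
    def agreement(code):
        return sum(o == c for o, c in zip(observed, code) if o is not None)
    best = max(stored_codes, key=lambda k: agreement(stored_codes[k]))
    return best, stored_codes[best]

corrupted = (4, None, 7, None, 9)     # two neurons lost from cluster_A
print(decode(corrupted))              # regenerates the complete cluster
```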

5.3 Situation of current AI

Our theory may also give some hints, at the behavioral level, on why we grasp structural information so easily and why current AI systems don't. Meanwhile, it may also provide some design principles toward more human-like cognitive architectures.

For instance, current machine learning techniques (e.g., deep networks) rely extensively on big data and large networks to approximate statistical correlations over a relatively small number of classes (a few hundred), which is the reverse of how humans learn Gopnik (2017); Marcus (2018); Cortese et al. (2019). The situation in current AI is reminiscent of the early ages before digital communication, when one had to blast the signal at a higher signal-to-noise ratio in order to attain reliable communication: if you want to be heard, yell louder. In deep learning terms, if you want more computational power, pack more data into your network and add more layers.

Explicit consideration of the sequential nature of the data poses several problems for deep network techniques. The memory footprint of a neural network dedicated to sequences of items (images, speech, text) can quickly become disproportionate. Added to this are the difficulties of generalizing, of extrapolating, of catastrophic interference, and of the subsymbolic treatment of information Smolensky (1988), so that the question of the neural network architecture for problems involving item sequences and structures is far from resolved. Even the architectures of recurrent neural networks, such as the Long Short-Term Memory networks (LSTM) Hochreiter and Schmidhuber (1997); Gers et al. (2000), which were historically dedicated to time series, are gradually being replaced, in speech and text processing, by non-recurrent networks using layers of "attention".

Novel neural architectures have been proposed recently to incorporate discreteness. Digital neural networks have been proposed by Berrou and colleagues Berrou and Gripon (2012); Berrou et al. (2014), focusing on the problem of memory capacity and organization, while Graves and colleagues Graves et al. (2014); Graves (2016) investigated their computer-like features. The two types of networks show interesting features, although each lacks the advantages of the other. For instance, Berrou's algorithm takes its inspiration from Information Theory and telecom networks with LDPC codes, and Graves' neural network takes its inspiration from computer science and conventional computer architectures.

On the one hand, Berrou's neural network borrows techniques from telecom networks, showing high memory capacity and sparseness, but its use in real-case problems and its computational efficacy in real time have not been investigated. On the other hand, Graves' Neural Turing Machine and Differentiable Neural Computer show computational efficacy with the use of pointers and of the random-access memory of conventional computers. These features permit them to buffer and manipulate variables and structures. Although they gain the computational capability to manipulate symbols and structures, the problem remains of how the organization of memory and information processing can be combined, as they are in the brain, rather than separated, as they are in computers. This point also relates to the symbol grounding problem Harnad (1990), as cognition cannot be just the manipulation of symbols.

Although not categorized as digital, Bayesian networks Pearl (2009); Pearl and Mackenzie (2018) and complementary memory networks McClelland et al. (1995); Mcclelland et al. (2020) have been proposed to learn symbols, rules and hierarchical structure in data. These systems show adaptiveness and fast inference from rare events only, as infants do Tenenbaum et al. (2006, 2011); Lake et al. (2017). This competence of learning rules rapidly from raw data then helps to compose new ones off the shelf (compositionality) or to infer next events from past and current observations. These frameworks based on Bayesian probabilistic inference, symbolic processing and causality can explain some results found in childhood Gopnik et al. (2004); Tenenbaum et al. (2006). However, the problems of symbol grounding and of the organization of knowledge remain unsolved, as for the systems presented above.

In comparison, our framework combines some features of these different approaches regarding memory capacity and organization, computational capabilities, and predictive power. Our hypothesis also furnishes a comprehensive explanation of how memory might be acquired, organized and structured in the brain. For instance, the STDP rule and rank-order coding can be used for symbolic and sub-symbolic processing and for the organization of memory and knowledge. We believe that the mechanisms of structure encoding and of error-correction optimization, based on ordinal structure discovery and generation, can be relevant for inference, computation, memory access and fast retrieval. They may explain how infants capture the structure of a task so quickly and extrapolate from it to similar but still novel situations.

5.4 Digital computing and the embodied brain

Traditionally, analog computing is associated with the embodied theories of cognition and digital computing with the cognitive theories of the brain. What does digital communication have to do with embodiment? Any incoming information from the body is structured by default and fingerprinted, due to the spatial arrangement and topology of the sensors and muscles and to their precision and delays, which is not the case for offline and asynchronous databases. Information processing and topology in brain networks might then be decisively affected by the geometrical and physical morphology of the body, an idea called morphological computation Pfeifer and Bongard (1999); Pfeifer and Gomez (2009); Pfeifer and Pitti (2012). In our view, the super-ordinal treatment of body signals in the frontal areas of the brain may create a language of the body, with its grammar and rules, and might form a complete logical body theory in the mathematical sense, as it is unfalsifiable. For instance, our senses can be deceived, and we can conceive of observations that are wrong and that apparently break the laws of physics, which is not actually possible. Physical falsifiability might serve for creating any context-free grammar in the brain for agency and self-assessment, or any human-made language (e.g., geometry, alphabet, solfège, actions, computer codes).

The paradoxical conclusion we arrive at is that, to keep trace of information, brain networks may require embodiment to structure themselves and to scaffold; the brain may become digital by learning the language of the body.

6 Conclusion

In order to keep trace of information, the brain has to deal with the problems of canceling intrinsic noise and of harnessing its own complexity. It has to resolve the problem of locating where information is and of indexing new information. The neural mechanism in charge of this task has to be capable of manipulating neural addresses and of mapping the brain's own neural circuitry; its connectome. Such a tool is important for information processing and preservation, but also for memory formation and retrieval.

As a shift in evolution, we propose that the neural mechanism used by the PFC to detect structure in temporal sequences, based on the temporal order of incoming information, serves a second purpose: the spatial ordering and indexing of brain networks. This top-down self-organization is done for efficiency.

The sensitivity of PFC units to the rank order of incoming events across the brain can permit producing a compact code representing the spatial location of neurons distributed over different places but belonging to the same temporal cluster. At the unit level, rank-order codes lose the information of the exact location of the neurons (their address). At the population level, however, rank-order codes can retrieve this information back using the redundancy and orthogonality of the different rank codes. Since they manipulate relative addresses, rank-order codes may act as neuronal pointers, and the type of information processing they perform may be seen as digital, similar to a Fourier transform. In this line of thought, there should be a sufficient number of rank codes to reconstruct perfectly the original memory sequence, in a way that would satisfy the Nyquist-Shannon sampling theorem.

Our ideas are in line with recent proposals that the PFC is the brain's router, manipulating neuronal variables and pointers to construct a neuronal global workspace for conscious access Zylberberg et al. (2010); Dehaene and Changeux (2011), or that the brain manipulates integrated and differentiated information codes Balduzzi and Tononi (2009).

Our proposal that the brain may manipulate and compute a kind of digital information may remind one of the pioneering and provocative works of the founders of computers and computation, John von Neumann and Alan Turing, but also Claude Shannon. On the one hand, John von Neumann von Neumann (1958) created the standard model of computer architectures based on the separation between the operative and the operand, with memory-stored control and memory-stored programs. He also suggested that the brain might necessarily be a digital, parallel, addressable memory machine in order to avoid noise and to keep and compute information. On the other hand, Alan Turing was perfectly aware of the cost of computation that the human memory system, 'necessarily limited' Turing (1936) (p. 231), has to endure to process, retrieve and keep trace of information Turing (1950). Besides, noise, storage and transmission are at the heart of the concerns of Claude Shannon's Communication and Information Theory Shannon (1948); Shannon and Weaver (1963).

Current AI architectures (deep networks) mostly neglect that computation is physical and has an energetic cost that biological systems cannot afford, as they do not have access to a virtually unlimited amount of energy and time, and have an urge to act. Furthermore, DNNs still lack the capabilities of good old-fashioned AI for symbolic processing to solve problems involving variables, hierarchical structures, concepts and higher-level reasoning, particularly those concerning physical understanding, language comprehension and problem solving Marcus (2018). The combination of the two types of AI is nonetheless needed to move AI systems beyond item-classification problems, toward an understanding of the environment, the manipulation of meaningful information and the acquisition of common sense. We have shown that the biologically inspired mechanisms of STDP and of rank-order coding have the computational power for this leap. We suggest that this is done by the digitalization of information, for energy consumption, computational effectiveness and the preservation of information.

After all, if ’DNA is encoded digital information in the “Strong Sense”’ according to Richard Dawkins Dawkins (1995), the brain may also exploit some kind of digital processing.

Annex

Vocabulary | Element | Example | References
Item | Audio element | sound |
Item | Elementary action | turn, push |
Item | Visual | shape, color, texture |
Item | Word | |

Table 1: Glossary of the term 'items' used in the text to express various contexts and examples.

Vocabulary | Example | Structure | References
Pattern | Audio sequences | temporal order (protogrammar) |
Pattern | Motor sequence | syntax of actions | Tanji and Hoshi (2001); Shima et al. (2007); Tanji et al. (2007)
Pattern | Visual scenes | temporal coherence | Fadiga et al. (2009)
Pattern | Spatial context | schemata | Barone and J.P. (2018)
Pattern | Successive shapes | geometrical rules | Averbeck et al. (2003a, b)
Pattern | Visual sequences | | Wang et al. (2019)

Table 2: Glossary of the term 'patterns' used in the text to express various contexts and examples.

Acknowledgements

AP would like to dedicate this manuscript in memory of Chalom Pitti (1943-2019).

References

  • [1] URL https://wikivisually.com/wiki/Stack-sortable_permutation.
  • Abbott and Nelson [2000] L.F. Abbott and S.B. Nelson. Synaptic plasticity: taming the beast. Nature neuroscience, 3:1178–1182, 2000.
  • Abrossimoff et al. [2018] J. Abrossimoff, A. Pitti, and P. Gaussier. Visual learning for reaching and body-schema with gain-field networks. Joint IEEE International Conferences on Development and Learning and Epigenetic Robotics (ICDL-Epirob), pages 191–197, 2018.
  • Andersen and Buneo [2002] R.A. Andersen and C.A. Buneo. Intentional maps in posterior parietal cortex. Annu. Rev. Neurosci., 25:189–220, 2002.
  • Arbib [2019] M. Arbib. The aboutness of language and the evolution of the construction-ready brain, page (in press). 07 2019.
  • Arbib [1985] Michael A. Arbib. Schemas for the temporal organization of behavior. Human Neurobiology, 4:63 – 72, 1985.
  • Arbib [2005] Michael A. Arbib. From monkey-like action recognition to human language: An evolutionary framework for neurolinguistics. Behavioral and Brain Sciences, 28:105–167, 2005.
  • Arbib [2008] Michael A. Arbib. From grasp to language: Embodied concepts and the challenge of abstraction. Journal of Physiology-Paris, 102(1):4 – 20, 2008. ISSN 0928-4257. URL http://www.sciencedirect.com/science/article/pii/S0928425708000120. Links and Interactions Between Language and Motor Systems in the Brain.
  • Arbib et al. [2014] Michael A. Arbib, Brad Gasser, and Victor Barrès. Language is handy but is it embodied? Neuropsychologia, 55:57 – 70, 2014. ISSN 0028-3932. URL http://www.sciencedirect.com/science/article/pii/S002839321300393X. Special Issue in Honor of Marc Jeannerod.
  • Ardesch et al. [2019] D.J. Ardesch, L.H. Scholtens, and M.P. van den Heuvel. The human connectome from an evolutionary perspective. Progress in Brain Research, (250):703–717, 2019.
  • Averbeck et al. [2003a] B.B. Averbeck, M.V. Chafee, D.A. Crowe, and Georgopoulos A.P. Neural activity in prefrontal cortex during copying geometrical shapes. i. single cells encode shape, sequence, and metric parameters. Exp Brain Res., 150(2):127–41, 2003a.
  • Averbeck et al. [2003b] B.B. Averbeck, D.A. Crowe, M.V. Chafee, and Georgopoulos A.P. Neural activity in prefrontal cortex during copying geometrical shapes. ii. decoding shape segments from neural ensembles. Exp Brain Res., 150(2):143–153, 2003b.
  • Baars [2005] B.J. Baars. Global workspace theory of consciousness: toward a cognitive neuroscience of human experience. Progress in brain research, 150:45–53, 2005.
  • Baillargeon [1994] Renee Baillargeon. Physical reasoning in young infants: Seeking explanations for impossible events. British Journal of Developmental Psychology, 12(1):9–33, 1994. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/j.2044-835X.1994.tb00616.x.
  • Baillargeon and Carey [2012] Renee Baillargeon and Susan Carey. Core cognition and beyond: The acquisition of physical and numerical knowledge. Early childhood development and later outcome, 01 2012.
  • Balduzzi and Tononi [2009] D. Balduzzi and G. Tononi. Qualia: The geometry of integrated information. PLoS Comput Biol, 5(8):e1000462, 2009.
  • Barone and J.P. [2018] P. Barone and J.P. Joseph. Prefrontal cortex and spatial sequencing in macaque monkey. Exp Brain Res, 78:447–464, 2018.
  • Basirat et al. [2014] Anahita Basirat, Stanislas Dehaene, and Ghislaine Dehaene-Lambertz. A hierarchy of cortical responses to sequence violations in three-month-old infants. Cognition, 132(2):137 – 150, 2014. ISSN 0010-0277. URL http://www.sciencedirect.com/science/article/pii/S0010027714000523.
  • Bassett and Bullmore [2006] D.S. Bassett and E. Bullmore. Small-world brain networks. The Neuroscientist, 12(6):512–523, 2006.
  • Benavides-Varela and Gervain [2017] Silvia Benavides-Varela and Judit Gervain. Learning word order at birth: A nirs study. Developmental Cognitive Neuroscience, 25:198 – 208, 2017. ISSN 1878-9293. URL http://www.sciencedirect.com/science/article/pii/S1878929316301062. Sensitive periods across development.
  • Berrou et al. [2014] C. Berrou, O. Dufor, V. Gripon, and X. Jiang. Information, noise, coding, modulation: What about the brain? 8th International Symposium on Turbo Codes and Iterative Information Processing (ISTC), 44(6):167–172, 2014.
  • Berrou and Gripon [2012] C. Berrou and V. Gripon. Petite mathématique du cerveau. Odile Jacob, 2012.
  • Bi and Poo [1998] G.q. Bi and M.m. Poo. Activity-induced synaptic modifications in hippocampal culture, dependence of spike timing, synaptic strength and cell type. J. Neurscience, 18:10464–10472, 1998.
  • Blohm and Crawford [2009] G. Blohm and J.D. Crawford. Fields of gain in the brain. Neuron, 64:598–600, 2009.
  • Boë et al. [2019] Louis-Jean Boë, Thomas R. Sawallis, Joël Fagot, Pierre Badin, Guillaume Barbier, Guillaume Captier, Lucie Ménard, Jean-Louis Heim, and Jean-Luc Schwartz. Which way to the dawn of speech?: Reanalyzing half a century of debates and data in light of speech science. Science Advances, 5(12), 2019. URL https://advances.sciencemag.org/content/5/12/eaaw3916.
  • Botvinick and Watanabe [2007] M. Botvinick and T. Watanabe. From numerosity to ordinal rank a gain-field model of serial order representation in cortical working memory. The Journal of Neuroscience, 27(32):8636–8642, 2007.
  • Byrge et al. [2014] L. Byrge, O. Sporns, and L.B. Smith. Developmental process emerges from extended brain–body–behavior networks. Trends in Cognitive Sciences, 18(8):395–403, 2014.
  • Changeux and Dehaene [1989] J.P. Changeux and S. Dehaene. Neuronal models of cognitive functions. Cognition, 33:63–109, 1989.
  • Clark et al. [2019] Andy Clark, Karl Friston, and Sam Wilkinson. Bayesing qualia: consciousness as inference, not raw datum. Journal of Consciousness Studies, 26(9-10):19–33, 2019.
  • Cortese et al. [2019] Aurelio Cortese, Benedetto De Martino, and Mitsuo Kawato. The neural and cognitive architecture for learning from a small sample. Current Opinion in Neurobiology, 55:133–141, 2019.
  • Dawkins [1995] Richard Dawkins. River Out of Eden. 1995.
  • Dehaene and Changeux [2011] S. Dehaene and J.P. Changeux. Experimental and theoretical approaches to conscious processing. Neuron, 70:200–227, 2011.
  • Dehaene et al. [1998] S. Dehaene, M. Kerszberg, and J.P. Changeux. A neuronal model of a global workspace in effortful cognitive tasks. Proceedings of the national Academy of Sciences, 95(22):14529–14534, 1998.
  • Dehaene et al. [2015] S. Dehaene, F. Meyniel, C. Wacongne, L. Wang, and C. Pallier. The neural representation of sequences from transition probabilities to algebraic patterns and linguistic trees. Neuron, 88:2–19, 2015.
  • Dehaene and Naccache [2001] S. Dehaene and L. Naccache. Towards a cognitive neuroscience of consciousness: Basic evidence and a workspace framework. Cognition, 79:1–37, 2001.
  • Diamond [1985] A. Diamond. Development of the ability to use recall to guide action, as indicated by infants’ performance on a-not-b. Child Development, 74:24–40, 1985.
  • Diamond and Goldman-Rakic [1989] A. Diamond and P.S. Goldman-Rakic. Comparison of human infants and rhesus monkeys on piaget’s a-not-b task evidence for dependence on dorsolateral prefrontal cortex. Experimental Brain Research, 74:24–40, 1989.
  • Dominey et al. [2003] Peter F. Dominey, Michel Hoen, Jean-Marc Blanc, and Taı̈ssia Lelekov-Boissard. Neurological basis of language and sequential cognition: Evidence from simulation, aphasia, and erp studies. Brain and Language, 86(2):207 – 225, 2003. ISSN 0093-934X. URL http://www.sciencedirect.com/science/article/pii/S0093934X02005291. Understanding Language.
  • Dominey et al. [2006] Peter Ford Dominey, Michel Hoen, and Toshio Inui. A neurolinguistic model of grammatical construction processing. Journal of Cognitive Neuroscience, 18(12):2088–2107, 2006.
  • Dominey and Ramus [2000] Peter Ford Dominey and Franck Ramus. Neural network processing of natural language: I. sensitivity to serial, temporal and abstract structure of language in the infant. Language and Cognitive Processes, 15(1):87–127, 2000.
  • Dominey et al. [1998] P.F. Dominey, T. Lelekov, J. Ventre-Dominey, and M. Jeannerod. Dissociable processes for learning the surface and abstract structure sensorimotor sequences. Journal of Cognitive Neuroscience, 10:734–751, 1998.
  • Edelman [1987] Gerald Edelman. Neural Darwinism. The Theory of Neuronal Group Selection. Basic Books, New York, 1987.
  • Eliasmith et al. [2012] C. Eliasmith, T.C. Stewart, X. Choo, T. Bekolay, T. DeWolf, Y. Tang, and D. Rasmussen. A large-scale model of the functioning brain. Science, 338(6111):1202–1205, 2012.
  • Engel et al. [2001] A.K. Engel, P. Fries, and W. Singer. Dynamic predictions oscillations and synchrony in top-down processing. Nature Rev. Neurosci., pages 704–716, 2001.
  • Engel and Singer [2001] A.K. Engel and W. Singer. Temporal binding and the neural correlates of sensory awareness. Trends in Cognitive Sciences, 5(1):16–25, 2001.
  • Fadiga et al. [2009] L. Fadiga, L. Craighero, and A. D’Ausilio. Broca’s area in language, action, and music. The Neurosciences and Music III—Disorders and Plasticity: Ann. N.Y. Acad. Sci., 1169:448–458, 2009.
  • Fogassi and Ferrari [2007] L. Fogassi and P.F. Ferrari. Mirror neurons and the evolution of embodied language. Current Directions In Psychological Science, 16(3):136–141, 2007.
  • Friederici [2011] A.D. Friederici. The brain basis of language processing: From structure to function. Physiological Reviews, 91(4):1357–1392, 2011.
  • Friederici et al. [2006a] A.D. Friederici, J. Bahlmann, S. Heim, R.I. Schubotz, and A. Anwander. The brain differentiates human and non-human grammars: Functional localization and structural connectivity. Proc Natl Acad Sci USA, 103:2458–2463, 2006a.
  • Friederici et al. [2006b] A.D. Friederici, C.J. Fiebach, M. Schlesewsky, I.D. Bornkessel, and D.Y. von Cramon. Processing linguistic complexity and grammaticality in the left frontal cortex. Cereb Cortex, 70:1709–1717, 16 2006b.
  • Friston [2003] K. Friston. Learning and inference in the brain. Neural Networks, 16(9):1325–1352, 2003.
  • Friston et al. [2006] K. Friston, J. Kilner, and L. Harrison. A free energy principle for the brain. Journal of Physiology-Paris, 100(1-3):70–87, 2006.
  • Friston [2009] K.J. Friston. The free-energy principle a rough guide to the brain? Trends in Cognitive Science, 4(7):293–301, 2009.
  • Friston et al. [2016] K.J. Friston, T. FitzGerald, F. Rigoli, P. Schwartenbeck, J. O’Doherty, and G. Pezzulo. Active inference and learning. Neuroscience & Biobehavioral Reviews, 68:862–879, 2016.
  • Friston and Kiebel [2009] K.J. Friston and S. Kiebel. Predictive coding under the free-energy principle. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 364:1211–21, 2009.
  • Fujii and Graybiel [2018] N. Fujii and A.M. Graybiel. Representation of action sequence boundaries by macaque prefrontal cortical neurons. Science, 301:1246–1249, 2018.
  • Fusi et al. [2016] S. Fusi, E.K. Miller, and M. Rigotti. Why neurons mix: high dimensionality for higher cognition. Curr. Opin. Neurobiol., 37:66–74, 2016.
  • Fuster [2001] J. Fuster. The prefrontal cortex—an update time is of the essence. Neuron, 30:319–333, 2001.
  • Gallese and Lakoff [2005] Vittorio Gallese and George Lakoff. The brain’s concepts: the role of the sensory-motor system in conceptual knowledge. Cognitive Neuropsychology, 22(3-4):455–479, 2005. URL https://doi.org/10.1080/02643290442000310. PMID: 21038261.
  • Genovesio et al. [2014] A. Genovesio, S.P. Wise, and R.E. Passingham. Prefrontal–parietal function: from foraging to foresight. Trends in Cognitive Sciences, 18(2):72–81, 2014.
  • Genovesio [2009] A. Genovesio et al. Feature- and order-based timing representations in the frontal cortex. Neuron, 63:254–266, 2009.
  • Gentilucci and Corballis [2006] Maurizio Gentilucci and Michael Corballis. From manual gesture to speech: A gradual transition. Neuroscience and biobehavioral reviews, 30:949–60, 02 2006.
  • Gers et al. [2000] F.A. Gers, J. Schmidhuber, and F. Cummins. Learning to forget: Continual prediction with lstm. Neural Comput., 12(10):2451–2471, 2000.
  • Gervain et al. [2008] Judit Gervain, Francesco Macagno, Silvia Cogoi, Marcela Peña, and Jacques Mehler. The neonate brain detects speech structure. Proceedings of the National Academy of Sciences, 105(37):14222–14227, 2008. ISSN 0027-8424. URL https://www.pnas.org/content/105/37/14222.
  • Gopnik [2017] A. Gopnik. Making AI more human: Artificial intelligence has staged a revival by starting to incorporate what we know about how children learn. Scientific American, 316(6):60–65, 2017.
  • Gopnik et al. [2004] A. Gopnik, C. Glymour, D. Sobel, L. Schulz, T. Kushnir, and D. Danks. A theory of causal learning in children causal maps and bayes nets. Psychological Review, 111:1–31, 2004.
  • Gopnik et al. [2000] A. Gopnik, A.N. Meltzoff, and P.K. Kuhl. The scientist in the crib what early learning tells us about the mind. Publisher William Morrow Paperbacks, 2000.
  • Graves et al. [2014] A. Graves, G. Wayne, and I. Danihelka. Neural Turing Machines. arXiv, 1410.541v2:1–26, 2014.
  • Graves [2016] A. Graves et al. Hybrid computing using a neural network with dynamic external memory. Nature, 538:471–476, 2016.
  • Guellaï et al. [2019] Bahia Guellaï, Annabel Callin, Frédéric Bevilacqua, Diemo Schwarz, Alexandre Pitti, Sofiane Boucenna, and Maya Gratier. Sensus communis: Some perspectives on the origins of non-synchronous cross-sensory associations. Frontiers in Psychology, 10:523, 2019. ISSN 1664-1078. URL https://www.frontiersin.org/article/10.3389/fpsyg.2019.00523.
  • Harlow [1942] H.F. Harlow. Responses by rhesus monkeys to stimuli having multiple sign-values. Q. McNemar & MA Merrill, Studies in personality, pages 105–123, 1942.
  • Harlow [1949] H.F. Harlow. The formation of learning sets. Psychological Review, 56(1):51–65, 1949.
  • Harnad [1990] S. Harnad. The symbol grounding problem. Physica D, 42:335–346, 1990.
  • Hiraki [2006] K. Hiraki. Detecting contingency a key to understanding development of self and social cognition. Japanese Psychological Research, 48(3):204–212, 2006.
  • Hoareau et al. [2019] Mélanie Hoareau, H. Henny Yeung, and Thierry Nazzi. Infants’ statistical word segmentation in an artificial language is linked to both parental speech input and reported production abilities. Developmental Science, 22(4):e12803, 2019. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/desc.12803.
  • Hobson and Friston [2016] J Allan Hobson and Karl J Friston. A response to our theatre critics. Journal of Consciousness Studies, 23(3-4):245–254, 2016.
  • Hochreiter and Schmidhuber [1997] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Comput., 9:1735–1780, 1997.
  • Hoen et al. [2006] Michel Hoen, Mathilde Pachot-Clouard, Christoph Segebarth, and Peter Ford Dominey. When broca experiences the janus syndrome: an er-fmri study comparing sentence comprehension and cognitive sequence processing. Cortex, 42(4):605 – 623, 2006. ISSN 0010-9452. URL http://www.sciencedirect.com/science/article/pii/S0010945208703988.
  • Inoue and Mikami [2018] M. Inoue and A. Mikami. Prefrontal activity during serial probe reproduction task: encoding, mnemonic and retrieval processes. J Neurophysiol, 95:1008 –1041, 2018.
  • Izhikevich et al. [2004] E.M. Izhikevich, J.A. Gally, and G.M. Edelman. Spike-timing dynamics of neuronal groups. Cerebral Cortex, 14:933–944, 2004.
  • Jeannerod [1994] M. Jeannerod. The representing brain: Neural correlates of motor intention and imagery. Behavioral and Brain Sciences, 17(2):187–202, 1994.
  • Jeannerod [2001] M. Jeannerod. Neural simulation of action: a unifying mechanism for motor cognition. NeuroImage, 14:S103–S109, 2001.
  • Jin et al. [2009] D.Z. Jin, N. Fujii, and A.M. Graybiel. Neural representation of time in cortico-basal ganglia circuits. Proc. Natl. Acad. Sci. USA, 106:19156–19161, 2009.
  • Johnston et al. [2019] W. Jeffrey Johnston, Stephanie E. Palmer, and David J. Freedman. Nonlinear mixed selectivity supports reliable neural computation. bioRxiv, 2019. URL https://www.biorxiv.org/content/early/2019/03/14/577288.
  • Kaiser [2017] M. Kaiser. Mechanisms of connectome development. Trends Cogn. Sci., (21):703–717, 2017.
  • Kaiser [2007] Marcus Kaiser. Brain architecture: a design for natural computation. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 365(1861):3033–3045, 2007. URL https://royalsocietypublishing.org/doi/abs/10.1098/rsta.2007.0007.
  • Kanai et al. [2019] R. Kanai, A. Chang, Y. Yu, I.M. de Abril, M. Biehl, and N. Guttenberg. Information generation as a functional basis of consciousness. Neuroscience of Consciousness, 5(1):niz16, 2019.
  • Knott [1968] D.E. Knuth. The Art of Computer Programming, Vol. 1: Fundamental Algorithms. Reading, Mass.: Addison-Wesley, 1968.
  • Knott [1977] Gary D. Knott. A numbering system for binary trees. Communications of the ACM, 20(2):113–115, 1977.
  • Koechlin [2014] E. Koechlin. An evolutionary computational theory of prefrontal executive function in decision-making. Phil. Trans. R. Soc. B, 369:20130474, 2014.
  • Koechlin [2016] E. Koechlin. Prefrontal executive function and adaptive behavior in complex environments. Current Opinion in Neurobiology, 37:1–6, 2016.
  • Koechlin and Jubault [2006] E. Koechlin and T. Jubault. Broca’s area and the hierarchical organization of human behavior. Neuron, 50:963–974, 2006.
  • Koechlin and Summerfield [2007] E. Koechlin and C. Summerfield. An information theoretical approach to prefrontal executive function. Trends in Cognitive Sciences, 11(6):229–235, 2007.
  • Kohonen [1982] T. Kohonen. Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43:59–69, 1982.
  • Kriete et al. [2013] Trenton Kriete, David C. Noelle, Jonathan D. Cohen, and Randall C. O’Reilly. Indirection and symbol-like processing in the prefrontal cortex and basal ganglia. Proc Natl Acad Sci U S A., 110(41):16390–5, 2013.
  • Lake et al. [2017] Brenden M. Lake, Tomer D. Ullman, Joshua B. Tenenbaum, and Samuel J. Gershman. Building machines that learn and think like people. Behavioral and Brain Sciences, 40:e253, 2017.
  • Laughlin and Sejnowski [2003] Simon Laughlin and Terrence Sejnowski. Communication in neuronal networks. Science (New York, N.Y.), 301:1870–4, 10 2003.
  • Lieberman [1968] Philip Lieberman. Primate vocalizations and human linguistic ability. The Journal of the Acoustical Society of America, 44(6):1574–1584, 1968.
  • Lungarella and Sporns [2005] M. Lungarella and O. Sporns. Information self-structuring: Key principle for learning and development. Proc of the 4th Int Conf on Development and Learning, 11:25–30, 2005.
  • Machens et al. [2010] C.K. Machens, R. Romo, and C.D. Brody. Functional, but not anatomical, separation of “what” and “when” in prefrontal cortex. Journal of Neuroscience, 30(1):350–360, 2010.
  • Mansouri et al. [2006] F.A. Mansouri, K. Matsumoto, and K. Tanaka. Prefrontal cell activities related to monkeys’ success and failure in adapting to rule changes in a Wisconsin Card Sorting Test analog. J. Neurosci., 26:2745–2756, 2006.
  • Marcus [2018] Gary Marcus. Deep learning: A critical appraisal. arXiv preprint arXiv:1801.00631, 2018.
  • Marcus et al. [2007] G.F. Marcus, K.J. Fernandes, and S. Johnson. Infant rule learning facilitated by speech. Psychological Science, 18(5):387–391, 2007.
  • Marcus et al. [1999] G.F. Marcus, S. Vijayan, S. Bandi Rao, and P.M. Vishton. Rule learning by seven-month-old infants. Science, 283:77–80, 1999.
  • Marshall and Meltzoff [2014] P.J. Marshall and A.N. Meltzoff. Neural mirroring mechanisms and imitation in human infants. Phil. Trans. R. Soc. B, 369:20130620, 2014.
  • McClelland et al. [2020] James McClelland, Bruce McNaughton, and Andrew Lampinen. Integration of new information in memory: New insights from a complementary learning systems perspective. Phil. Trans. R. Soc. B, 375:20190637, 2020.
  • McClelland et al. [2010] J.L. McClelland, M.M. Botvinick, D.C. Noelle, D.C. Plaut, T.T. Rogers, M.S. Seidenberg, and L.B. Smith. Letting structure emerge: connectionist and dynamical systems approaches to cognition. Trends in Cognitive Sciences, 14(5):348–356, 2010.
  • McClelland et al. [1995] J.L. McClelland, B.L. McNaughton, and R.C. O’Reilly. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102(3):419–457, 1995.
  • Meltzoff [1997] A.N. Meltzoff. Explaining facial imitation: a theoretical model. Early Development and Parenting, 6:179–192, 1997.
  • Meltzoff [2007] A.N. Meltzoff. Infants’ causal learning: Intervention, observation, imitation. In A. Gopnik & L. Schulz (Eds.), Causal learning: Psychology, philosophy, and computation. Oxford: Oxford University Press, 2007.
  • Meyer et al. [2012] Lars Meyer, Jonas Obleser, Alfred Anwander, and Angela D. Friederici. Linking ordering in Broca’s area to storage in left temporo-parietal regions: The case of sentence processing. NeuroImage, 62(3):1987–1998, 2012. ISSN 1053-8119. URL http://www.sciencedirect.com/science/article/pii/S105381191200540X.
  • Nadel et al. [2005] J. Nadel, K. Prepin, and M. Okanda. Experiencing contingency and agency: first step towards self-understanding in making a mind? Interaction Studies, 6(3):447–462, 2005.
  • Nazzi et al. [1998] T. Nazzi, J. Bertoncini, and J. Mehler. Language discrimination by newborns: Towards an understanding of the role of rhythm. Journal of Experimental Psychology: Human Perception and Performance, 24(3):1–11, 1998.
  • Ninokura et al. [2004] Yoshi Ninokura, Hajime Mushiake, and Jun Tanji. Integration of temporal order and object information in the monkey lateral prefrontal cortex. Journal of Neurophysiology, 91:555–560, 2004.
  • Oztop et al. [2006] E. Oztop, M. Kawato, and M. Arbib. Mirror neurons and imitation: a computationally guided review. Neural Networks, 19(3):254–271, 2006.
  • Oztop et al. [2013] Erhan Oztop, Mitsuo Kawato, and Michael A. Arbib. Mirror neurons: Functions, mechanisms and models. Neuroscience Letters, 540:43–55, 2013. ISSN 0304-3940. URL http://www.sciencedirect.com/science/article/pii/S0304394012013183. (Special issue: The Mirror Neuron System.)
  • O’Regan and Noë [2001] J.K. O’Regan and A. Noë. A sensorimotor account of vision and visual consciousness. Behav. Brain Sci., 24:939–972, 2001.
  • O’Regan [2011] J. K. O’Regan. Why red doesn’t sound like a bell: Understanding the feel of consciousness. Oxford University Press, 2011.
  • Park and Friston [2013] H.-J. Park and K. Friston. Structural and functional brain networks: From connections to cognition. Science, 342:1238411, 2013.
  • Parthasarathy et al. [2017] A. Parthasarathy, R. Herikstad, J.H. Bong, F.S. Medina, C. Libedinsky, and S.-C. Yen. Mixed selectivity morphs population codes in prefrontal cortex. Nat. Neurosci., 20:1770–1779, 2017.
  • Pearl [2009] Judea Pearl. Causality. Cambridge university press, 2009.
  • Pearl and Mackenzie [2018] Judea Pearl and Dana Mackenzie. The book of why: the new science of cause and effect. Basic Books, 2018.
  • Pfeifer and Bongard [2006] R. Pfeifer and J.C. Bongard. How the Body Shapes the Way We Think: A New View of Intelligence. Bradford Books/MIT Press, 2006.
  • Pfeifer and Gomez [2009] R. Pfeifer and G. Gomez. Morphological computation – connecting brain, body, and environment - creating brain-like intelligence from basic principles to complex intelligent systems. LNAI; Creating Brain-Like Intelligence, 5436:66–83, 2009.
  • Pfeifer and Pitti [2012] R. Pfeifer and A. Pitti. La Révolution de L’Intelligence du Corps. Manuella Editions, 2012.
  • Pitti et al. [2012] A. Pitti, A. Blanchard, M. Cardinaux, and P. Gaussier. Gain-field modulation mechanism in multimodal networks for spatial perception. In Proc. 12th IEEE-RAS International Conference on Humanoid Robots, Osaka, Japan, pages 297–302, 2012.
  • Pitti et al. [2013] A. Pitti, R. Braud, S. Mahé, M. Quoy, and P. Gaussier. Neural model for learning-to-learn of novel task sets in the motor domain. Frontiers in Psychology, 4(771), 2013.
  • Pitti et al. [2017] A. Pitti, P. Gaussier, and M. Quoy. Iterative free-energy optimization for recurrent neural networks (INFERNO). PLoS ONE, 12(3):e0173684, 2017.
  • Pitti et al. [2020] A. Pitti, M. Quoy, C. Lavandier, and S. Boucenna. Gated spiking neural network using iterative free-energy optimization and rank-order coding for structure learning in memory sequences (INFERNO GATE). Neural Networks, 121:242–258, 2020.
  • Rigotti et al. [2013] M. Rigotti, O. Barak, M.R. Warden, X.J. Wang, N.D. Daw, E.K. Miller, and S. Fusi. The importance of mixed selectivity in complex cognitive tasks. Nature, 497(7451):585–590, 2013.
  • Rizzolatti and Arbib [1998] G. Rizzolatti and A. Arbib. Language within our grasp. Trends in Neuroscience, 21:188–194, 1998.
  • Rizzolatti and Craighero [2004] G. Rizzolatti and L. Craighero. The mirror-neuron system. Annu. Rev. Neuroscience, 27:169–192, 2004.
  • Rizzolatti et al. [1996] G. Rizzolatti, L. Fadiga, L. Fogassi, and V. Gallese. Premotor cortex and the recognition of motor actions. Cognitive Brain Research, 3:131–141, 1996.
  • Rochat [2001] P. Rochat. The Infant’s World. Harvard University Press, 2001.
  • Rochat [2003] P. Rochat. Five levels of self-awareness as they unfold early in life. Consciousness and Cognition, 12:717–731, 2003.
  • Rochat [2019] Philippe Rochat. Self-unity as ground zero of learning and development. Frontiers in Psychology, 10:414, 2019. ISSN 1664-1078. URL https://www.frontiersin.org/article/10.3389/fpsyg.2019.00414.
  • Romo et al. [1999] R. Romo, C.D. Brody, A. Hernández, and L. Lemus. Neuronal correlates of parametric working memory in the prefrontal cortex. Nature, 399(6735):470–473, 1999.
  • Rouault and Koechlin [2018] M. Rouault and E. Koechlin. Prefrontal function and cognitive control: from action to language. Current Opinion in Behavioral Sciences, 21:106–111, 2018.
  • Saffran et al. [1996] J. Saffran, R.N. Aslin, and E.L. Newport. Statistical learning by 8-month-old infants. Science, 274:1926–1928, 1996.
  • Saffran and Wilson [2003] J. Saffran and D. Wilson. From syllables to syntax: Multilevel statistical learning by 12-month-old infants. Infancy, 4:273–284, 2003.
  • Salinas and Sejnowski [2001] E. Salinas and T.J. Sejnowski. Gain modulation in the central nervous system: where behavior, neurophysiology and computation meet. The Neuroscientist, 7:430–440, 2001.
  • Sarma et al. [2016] A. Sarma, N.Y. Masse, X.-J. Wang, and D.J. Freedman. Task-specific versus generalized mnemonic representations in parietal and prefrontal cortices. Nat. Neurosci., 19:143–149, 2016.
  • Shannon [1948] C.E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423, 623–656, 1948. Available at: https://archive.org/details/bellsystemtechni27amerrich/page/379.
  • Shannon and Weaver [1963] C.E. Shannon and W. Weaver. The mathematical theory of communication. University of Illinois Press, 1963.
  • Shima et al. [2007] K. Shima, M. Isoda, H. Mushiake, and J. Tanji. Categorization of behavioural sequences in the prefrontal cortex. Nature, 445:315–318, 2007.
  • Singer [2003] W. Singer. Synchronization, binding and expectancy. pages 1136–1143, 2003.
  • Smolensky [1988] Paul Smolensky. On the proper treatment of connectionism. Behavioral and brain sciences, 11(1):1–23, 1988.
  • Song et al. [2000] S. Song, K.D. Miller, and L.F. Abbott. Competitive Hebbian learning through spike-timing-dependent synaptic plasticity. Nature Neuroscience, 3:919–926, 2000.
  • Spelke [2003] E. Spelke. What makes us smart? Core knowledge and natural language. In D. Gentner & S. Goldin-Meadow (Eds.), Language in mind: Advances in the investigation of language and thought. MIT Press, pages 277–311, 2003.
  • Spelke and Kinzler [2007] E. Spelke and K.D. Kinzler. Core knowledge. Developmental Science, 10(1):89–96, 2007.
  • Sporns et al. [2004] O. Sporns, D. R. Chialvo, M. Kaiser, and C. C. Hilgetag. Organization, development and function of complex brain networks. Trends Cogn. Sci., 8:418–425, 2004.
  • Sporns and Honey [2006] O. Sporns and C. Honey. Small worlds inside big brains. PNAS, 103(51):19219–19220, 2006.
  • Sporns et al. [2000] O. Sporns, G. Tononi, and G.M. Edelman. Connectivity and complexity: the relationship between neuroanatomy and brain dynamics. Neural Networks, 13(8-9):909–922, 2000.
  • Stoianov et al. [2018] I.P. Stoianov, C.M.A. Pennartz, C.S. Lansink, and G. Pezzulo. Model-based spatial navigation in the hippocampus-ventral striatum circuit: A computational analysis. PLoS Comput Biol, 14(9):e1006316, 2018.
  • Tanji and Hoshi [2001] J. Tanji and E. Hoshi. Behavioral planning in the prefrontal cortex. Curr. Opin. Neurobiol., 11:164–170, 2001.
  • Tanji et al. [2007] Jun Tanji, Keisetsu Shima, and Hajime Mushiake. Concept-based behavioral planning and the lateral prefrontal cortex. Trends in Cognitive Sciences, 11(12):528–534, 2007.
  • Tenenbaum et al. [2011] J.B. Tenenbaum, C. Kemp, T.L. Griffiths, and N.D. Goodman. How to grow a mind: statistics, structure, and abstraction. Science, 331(6022):1279–1285, 2011.
  • Tenenbaum et al. [2006] Joshua B. Tenenbaum, Thomas L. Griffiths, and Charles Kemp. Theory-based Bayesian models of inductive learning and reasoning. Trends in Cognitive Sciences, 10(7):309–318, 2006.
  • Terekhov and O’Regan [2016] A. Terekhov and J.K. O’Regan. Space as an invention of active agents. Frontiers in Robotics and AI, 3(4), 2016.
  • Thorpe et al. [2001] S. Thorpe, A. Delorme, and R. Van Rullen. Spike-based strategies for rapid processing. Neural Networks, 14:715–725, 2001.
  • Tognoli and Kelso [2014] E. Tognoli and J.A.S. Kelso. The metastable brain. Neuron, 81:35–48, 2014.
  • Toker and Sommer [2019] D. Toker and F.T. Sommer. Information integration in large brain networks. PLoS Comput Biol, 15(2):e1006807, 2019.
  • Tononi [1992] G. Tononi. Reentry and the integration of brain function. Scientific Contributions to General Psychology, 8:27–51, 1992.
  • Tononi [2008] G. Tononi. Consciousness as integrated information: a provisional manifesto. Biol. Bull., 215:216–242, 2008.
  • Tononi et al. [1992] G. Tononi, O. Sporns, and G.M. Edelman. Reentry and the problem of integrating multiple brain areas: Simulation of dynamic integration in the visual system. Cerebral Cortex, 2:310–335, 1992.
  • Tsuda [2015] I. Tsuda. Chaotic itinerancy and its roles in cognitive neurodynamics. Current Opinion in Neurobiology, 31:67–71, 2015.
  • Tsuda et al. [2008] I. Tsuda, Y. Yamaguchi, S. Kuroda, Y. Fukushima, and M. Tsukada. A mathematical model for the hippocampus: towards the understanding of episodic memory and imagination. Progress in Theoretical Physics Supplement, 173:99–108, 2008.
  • Turing [1936] A.M. Turing. On computable numbers, with an application to the Entscheidungsproblem. Proc. Lond. Math. Soc., s2-42:230–265, 1936.
  • Turing [1950] A.M. Turing. Letter in response to A.M. Uttley’s talk on ‘Information, Machines and Brains’, Conference on Information Theory, 26–29 September 1950.
  • van den Heuvel and Sporns [2011] M.P. van den Heuvel and O. Sporns. Rich-club organization of the human connectome. J. Neurosci., 31:15775–15786, 2011.
  • Van Rullen et al. [1998] R. Van Rullen, J. Gautrais, A. Delorme, and S. Thorpe. Face processing using one spike per neurone. BioSystems, 48:229–239, 1998.
  • Van Rullen and Thorpe [2002] R. Van Rullen and S. Thorpe. Surfing a spike wave down the ventral stream. Vision Research, 42:2593–2615, 2002.
  • Varela et al. [2001] F.J. Varela, J.P. Lachaux, E. Rodriguez, and J. Martinerie. The brainweb: phase synchronization and large-scale integration. Nat. Rev. Neuroscience, 2:229–239, 2001.
  • von Neumann [1958] J. von Neumann. The computer and the brain. New Haven, CT: Yale University Press, 1958.
  • Wacongne et al. [2012] C. Wacongne, J.P. Changeux, and S. Dehaene. A neuronal model of predictive coding accounting for the mismatch negativity. J. Neurosci., 32:3665–3678, 2012.
  • Wang et al. [2018] Jane X. Wang, Zeb Kurth-Nelson, Dharshan Kumaran, Dhruva Tirumala, Hubert Soyer, Joel Z. Leibo, Demis Hassabis, and M.M. Botvinick. Prefrontal cortex as a meta-reinforcement learning system. Nature Neuroscience, 21:860–868, 2018.
  • Wang et al. [2019] L. Wang, M. Amalric, W. Fang, X. Jiang, C. Pallier, S. Figueira, M. Sigman, and S. Dehaene. Representation of spatial sequences using nested rules in human prefrontal cortex. NeuroImage, 186:245–255, 2019.
  • Watson [1966] J. Watson. The development and generalization of ‘contingency awareness’ in early infancy: some hypotheses. Merrill-Palmer Quarterly, 12:123–135, 1966.
  • Watson [1994] J. Watson. Detection of self: the perfect algorithm. In S. Parker, R. Mitchell and M. Boccia (Eds.), Self-awareness in animals and humans: Developmental perspectives. Cambridge University Press, 1994.
  • Whyte and Smith [2020] Christopher J Whyte and Ryan Smith. The predictive global neuronal workspace: A formal active inference model of visual consciousness. bioRxiv, 2020.
  • Yamaguti et al. [2011] Yutaka Yamaguti, Shigeru Kuroda, Yasuhiro Fukushima, Minoru Tsukada, and Ichiro Tsuda. A mathematical model for cantor coding in the hippocampus. Neural Networks, 24(1):43–53, 2011.
  • Yeh et al. [2018] Fang-Cheng Yeh, Sandip Panesar, David Fernandes, Antonio Meola, Masanori Yoshino, Juan C. Fernandez-Miranda, Jean M. Vettel, and Timothy Verstynen. Population-averaged atlas of the macroscale human structural connectome and its network topology. NeuroImage, 178:57–68, 2018. ISSN 1053-8119. URL http://www.sciencedirect.com/science/article/pii/S1053811918304324.
  • Zylberberg et al. [2011] A. Zylberberg, S. Dehaene, P.R. Roelfsema, and M. Sigman. The human Turing machine: a neural framework for mental programs. Trends in Cognitive Sciences, 15(7):293–300, 2011.
  • Zylberberg et al. [2010] A. Zylberberg, D. Fernández Slezak, P.R. Roelfsema, S. Dehaene, and M. Sigman. The brain’s router: a cortical network model of serial processing in the primate brain. PLoS Comput Biol, 6(4):e1000765, 2010.