A Review on Neural Network Models of Schizophrenia and Autism Spectrum Disorder

06/24/2019 ∙ by Pablo Lanillos, et al. ∙ Technische Universität München ∙ The University of Tokyo

This survey presents the most relevant neural network models of autism spectrum disorder and schizophrenia, from the first connectionist models to recent deep network architectures. We analyzed and compared the most representative symptoms with their neural model counterparts, detailing the alteration introduced in the network that generates each of the symptoms, and identifying their strengths and weaknesses. For completeness, we additionally cross-compared Bayesian and free-energy approaches. Models of schizophrenia mainly focused on hallucinations and delusional thoughts, using neural disconnections or inhibitory imbalance as the predominant alteration. Models of autism instead focused on perceptual difficulties, mainly excessive attention to environmental details, implemented as excessive inhibitory connections or increased sensory precision. We found an excessively narrow view of the psychopathologies, centered on one specific and simplified effect and usually constrained by the technical idiosyncrasies of the network used. Recent theories and evidence on sensorimotor integration and body perception, combined with modern neural network architectures, offer a broader and novel spectrum of approaches to these psychopathologies, outlining future research on neural network computational psychiatry, a powerful asset for understanding the inner processes of the human brain.


1 Introduction

Worldwide, the prevalence of schizophrenia (SZ) ranges between four and seven per 1000 individuals (between three and five million people) saha2005systematic and the prevalence of Autism Spectrum Disorder (ASD) ranges between six and 16 per 1000 children (between 1 in 150 and 1 in 59 children) baio2018prevalence . SZ and ASD have in common that they both cause deficits in social interaction and are characterized by perceptual peculiarities. While ASD has its onset in early childhood, SZ is typically diagnosed in adults, although in very rare cases it appears during development rapoport2009autism . Similar neural bases have been observed for both disorders pinkham2008neural , which has even led to the suggestion that some SZ cases might be part of the autism spectrum king2011schizophrenia . In fact, both pathologies show atypical sensorimotor integration and perceptual interpretation. However, there are also striking differences between these disorders. A common symptom of SZ is the occurrence of hallucinations or delusions, in contrast to ASD, which is characterized by atypical non-verbal communication and emotional reciprocity. Furthermore, a few savant syndrome cases have been reported in ASD individuals with extraordinary skills such as painting treffert2009savant . Fig. 1 depicts, in an artistic way, the reality perceived by two individuals in the spectrum of these disorders.

Fig. 1: Artistic pieces representing different perceptions of the world. (a) Hunted, ©2019 Henry Cockburn, a SZ diagnosed artist. (b) Drawing by Nadia Chomyn at the age of 5, a gifted ASD diagnosed child, reprinted from selfe2012nadia , ©2012 Lorna Selfe.

For both disorders, neurological, genetic and environmental causes have been suggested, but to date the causes and underlying cognitive processes remain unclear. A major challenge in diagnosis is their heterogeneity and non-specificity. Heterogeneity means that symptoms, prognosis and treatment responses vary significantly between different subjects. Non-specificity expresses that a single biological basis can result in different phenotypes (multifinality) and different biological bases can result in a single phenotype (equifinality). Non-specificity, as a biological abnormality related to a psychiatric disorder, can be found in many other neurological disorders cross2013identification ; redish2016computational .

Computational modeling of psychopathologies, or Computational Psychiatry, is one of the potential key players wang2014computational ; montague2012computational ; redish2016computational in tackling heterogeneity and non-specificity, and in better understanding the cognitive processes underlying these disorders. Eventually, computational models might help to validate theoretical models, generate new hypotheses or even suggest new treatments. There are different levels of description or units of analysis to study these disorders, which range from genes to molecules, to cells, to circuits, to physiology, and then to behaviour. “Computational Psychiatry provides some of the tools to link these levels” adams2016computational .

In particular, neural network models serve, due to their analogy to biological neurons, as a tool to test and generate hypotheses on possible neurological causes huys2011computational . Artificial neural networks are not only useful from the data-driven point of view (e.g., fitting a model to fMRI data), but can also be used as a simplified model of the human brain to replicate human behavior and to investigate which modifications in the connectionist models cause which alterations of behavior.

1.1 Artificial neural network modeling of psychopathologies

Artificial Neural Networks (ANNs or NNs) were first introduced in the 1950s as an attempt to provide a computational model of the inner processes of the human brain rosenblatt1958perceptron . Nevertheless, their potential was not fully realized until recent decades because of limited computational power and data shortage schmidhuber2015deep . Due to their inspiration from biological processes of the brain and their connectionist nature, these technologies have also opened a door to new research fields that combine disciplines such as neuroscience and psychology with artificial intelligence and robotics. Within the field of cognitive neuroscience, neural networks are already a tool for getting insights into the complex structures of our brain and gaining a better understanding of how learning, memory or visual perception might work on a neural level crick1983function ; spitzer1995neurocomputational .

In the late 1980s and early 1990s, neural networks were applied to psychiatry for the first time, in an attempt to imitate psychological disorders hoffman1987computer ; cohen1992context . Early efforts in compiling ANN models for cognitive disorders can be found in reggia1996neural and, specifically for autism, in gustafsson2004neural . Due to immense advances in computational power, 20 years later, computational modeling using ANNs and deep learning is becoming a powerful asset to aid the investigation of these disorders. The challenge is to coherently translate the findings at different levels of description into a mathematical connectionist model.

A significant advantage of using ANN models is that they can deal with a vast amount of information and they can learn and predict complex non-linear patterns. The structure of ANNs also makes it possible to systematically test which parameter modifications cause effects similar to the symptoms of psychiatric disorders. Furthermore, these ANN models and their alterations may be directly implemented in artificial agents (e.g. robots), filling the last level: comparing the behavior of such agents with behavior observed in patients pfeifer2006body ; cheng2007cb . In this way, existing hypotheses from neuroscience and psychology could be tested, and new hypotheses on potential causes could be formulated.

1.2 Purpose and content overview

This historical review aims at serving as a reference for computational neuroscience, robotics, psychology and psychiatry researchers interested in modeling psychopathologies with neural networks. This work extends general computational modeling reviews reggia1996neural ; gustafsson2004neural ; anticevic2015bridging ; valton2017comprehensive ; moustafa2017neurocomputational by focusing on neural network models for SZ and ASD with detailed explanation of the alterations on a neural level and their associated symptoms, including their technical architectures as well as their mathematical formulation. For completeness, we also included Bayesian and predictive processing models due to their similarities to ANNs and their relevance inside the neuroscience community. Actually, we show that, conceptually, ANN and Bayesian models are taking similar approaches to model these conditions.

We start in Section 2 with a short introduction of the mentioned disorders, listing their main characteristics and symptoms mainly based on the latest Diagnostic and Statistical Manual of Mental Disorders (DSM-5) descriptions.

For readability and due to the heterogeneity of the reviewed methods, in Section 3, we first summarize and discuss the main modeling approaches and hypotheses which are referenced in literature. Afterwards, Section 4 and Section 5 present a comprehensive review of models of SZ and ASD, respectively, organized by the type of modeling approach. To help the reader, we summarized the content of Section 4 and Section 5 into two tables: Tab. 1 for SZ and Tab. 2 for ASD. Finally, in Section 6 some hints about future research on ANN for computational psychiatry particularly for ASD and SZ are described.

2 Pathologies and their symptoms

SZ and ASD are disorders that change the way we perceive and act in the world. Atypicalities in perception and in cognitive processing cause difficulties in connecting with the world, in particular for social interaction. Since the first reports of autistic symptoms kanner1943autistic , both conditions have been closely related, as subjects with ASD were often previously incorrectly diagnosed as schizophrenic. Nowadays, these two pathologies remain strongly connected, as both are associated with atypicalities in sensory processing and integration of information, and both show strong heritability sandin2017heritability .

2.1 Schizophrenia

SZ is a serious psychiatric disorder that affects a person’s feelings, social behavior and perception of reality. Its biological causes are still unknown, but genetic and environmental factors, i.e. prenatal stress, traumatic experiences or drug use, can be key factors for the development of this disorder. Its symptoms are usually divided into positive symptoms and negative symptoms sims1988symptoms . Positive symptoms generally correspond to increased function, including hallucinations and delusions. Negative symptoms, corresponding to decreased function, are a lack of the normal function such as diminished emotional expression. Positive symptoms are more apparent, but generally respond better to medication. Negative symptoms are more subtle and less responsive to pharmacological treatment. Below some of the most characteristic symptoms of SZ taken from the DSM-5 american2013diagnostic are listed.


Positive symptoms:

  1. Delusions: firmly held beliefs that are not real and cannot be changed despite clear evidence

  2. Hallucinations: perceive things that do not exist as real, without an external stimulus

  3. Disorganized thinking: difficulty to keep track of thoughts, drift between unrelated ideas during speech

  4. Disorganized or abnormal movements: difficulties to perform goal-directed tasks, catatonic (stopping movement in unconventional posture) or stereotyped (repetitive) movements

Negative symptoms:

  1. Diminished emotional expression: reduced expression of emotions through speech, facial expressions or movements

  2. Avolition: lack of interests, inaction

  3. Alogia: diminished speech output

  4. Anhedonia: diminished ability to experience pleasure

  5. Asociality: lack of interest in social interaction

Multiple reports have also associated self-other disturbances to SZ. This means that schizophrenic patients can perceive own and external actions or feelings, but may have problems differentiating them. This could be part of the explanation for auditory hallucinations and struggles during social interaction. Van der Weiden et al. published an extensive review van2015self on possible causes for this disorder. Finally, in more severe cases, motor disorders have been reported morrens2006stereotypy , such as stereotypical and catatonic behavior.

SZ is investigated by many researchers because of its prevalence and its devastating effects on patients, which can have life-changing consequences on the patient’s relationships and social situation. Moreover, its close relation with the inner workings of self-perception and self-other distinction attracts the interest of researchers from areas such as psychology, neuroscience, cognitive science and even developmental robotics.

2.2 Autism spectrum disorder

ASD is a prevalent developmental disorder that has a behavior-based diagnosis due to its still unclear biological causes. It was first introduced in the 1940s by Kanner kanner1943autistic , who presented the cases of eleven children “whose condition [differed] so markedly and uniquely from anything reported so far”, some of them being previously diagnosed as schizophrenic. Actually, the term autistic was originally used for describing symptoms in schizophrenic patients. This kind of disorder mainly affects an individual’s social interaction, communication, interests and motor abilities. It is often referred to as a heterogeneous group of disorders, as individuals show very distinct combinations of symptoms with different severity. Nevertheless, there are some characteristic attributes that are commonly associated with ASD, which we have listed from the DSM-5 american2013diagnostic .


Deficits in social communication and interaction:

  1. Impairment in socio-emotional reciprocity: struggle to share common interests and emotions, reduced response or interest in social interaction

  2. Deficits in non-verbal communication: problems integrating verbal and nonverbal communication, and using and understanding gestures or facial expressions

  3. Problems to maintain relationships: problems or absence of interest in understanding relationships and adjusting behavior

Abnormal behavior patterns, interests or activities:

  1. Stereotyped movements or behavior: repetitive motor movements or speech

  2. Attention to sameness: adherence to routines, distress because of small changes

  3. Fixated and restricted interests: strong attachment to certain objects, activities or topics

  4. Hyper- or hyporeactivity to sensory input: indifference to pain, repulsive response to certain sounds or textures, visual fascination

Early identification of individuals with ASD has focused on non-verbal communication interaction, mainly observing attention and gaze behaviours using standardized tests, such as the Autism Diagnostic Observation Schedule (ADOS) lord2012autism .

ASD is thought to be caused by genetic disorders and environmental factors and evidence points at high heritability sandin2017heritability . Furthermore, recent studies, using a computer model of the human fetus, have also highlighted the importance of intrauterine embodied interaction on the development of the human brain and in particular cortical representation of body parts yamada2016embodied . Some authors have suggested that preterm infants might have a higher risk of enduring such developmental disorders.

3 Modeling approaches and hypotheses

ASD and SZ are among the psychiatric disorders which are most commonly investigated using computational modeling. A reason might be the unclear underlying cognitive mechanisms of these disorders which computational models might help to unravel. The studies we discuss in this review often take similar approaches for modeling ASD and SZ. In fact, these two disorders share certain symptoms, such as deficits in social communication and motor impairments manifesting as decreased response or repetitive and stereotyped movements. However, core symptoms of SZ (delusions and hallucinations) are not associated with ASD. Also perceptual atypicalities in both disorders differ in that SZ involves perceptual experiences that occur without an external stimulus whereas ASD is more typically characterized by hypersensitivity to certain stimuli from the environment.

Similarities between computational models of ASD and SZ, therefore, are not so much motivated by similarities in symptoms. Actually, studies modeling SZ focused mainly on delusions and hallucinations which are not predominant in ASD. Similarities instead can be found in the suggested biological causes. Accordingly, similar parameter modifications are investigated in computational models.

There are three main biological causes that are commonly employed in computational models: neural dysconnections, imbalance of excitation and inhibition, and alterations of the precision of predictions or sensory information. (Note that disconnection usually refers to a lack of connection, whereas dysconnection describes atypical connectivity, which might include decreased as well as increased connectivity.)

3.1 Dysconnection hypotheses

Especially for SZ, one of the most discussed theories is that of functional disconnections friston1998disconnection ; lynall2010functional . The main motivation is that SZ cannot be explained by an impairment of a single brain region, but only by a (decreased) interaction between multiple brain regions friston1998disconnection . Disconnections or underconnectivity are also discussed as a potential cause of ASD frith2004autism ; just2004cortical ; anderson2010decreased , but more recent evidence also points at increased connectivity keown2013local ; supekar2013brain or a distortion of patterns of functional connectivity hahamy2015idiosyncratic .

In the discussed studies for SZ, dysconnection is primarily implemented by an increased pruning of synapses hoffman1989cortical ; hoffman1997synaptic ; hoffman2011using . Such pruning is a normal developmental process between adolescence and early adulthood huttenlocher1979synaptic . Computational models using Hopfield networks hoffman1989cortical or feed-forward networks hoffman1997synaptic ; hoffman2011using demonstrate that excessive pruning can cause fragmented recall or the recall of new patterns, which can be related to the symptom of hallucinations in SZ.

Notably, the SZ symptoms replicated with connection pruning focus solely on hallucinations or delusions and might not be appropriate for modelling ASD. In fact, in a biological context, it might be more appropriate to disturb connections between neurons instead of simply cutting them. This idea was followed by Yamashita and Tani yamashita2012spontaneous who induce noise between different hierarchies of neurons (suggested by friston1995schizophrenia ). They demonstrated that this leads to the emergence of inflexible, repetitive motor behavior similar to catatonia symptoms. As this behavior could be also present in ASD, it might be worthwhile to explore the implications of such an impairment for ASD.

Only a single study focuses on dysconnection in ASD. Park and colleagues park2019macroscopic ; ichinoselocal demonstrated that local over-connectivity, as reported especially in the prefrontal cortex courchesne2005frontal , affects which frequency patterns of neural activations emerge in spiking neural networks.

3.2 Excitation/inhibition imbalance

An excitation/inhibition (E/I) imbalance is among the most commonly referenced biological evidence for SZ as well as for ASD rubenstein2003model ; sun2012impaired ; snijders2013atypical ; canitano2017autism . E/I imbalance was found in many neurobiological studies on SZ and ASD. Although it is not clear how exactly E/I imbalance translates to changes in cognition and behavior canitano2017autism , it seems to be linked to core symptoms of both disorders such as hallucinations jardri2016hallucinations and social interaction deficits yizhar2011neocortical .

The nature of this imbalance also remains an open question. A recent review of studies regarding ASD found vast evidence for increased inhibition as well as increased excitation dickinson2016measuring . Conflicting results in various brain regions might arise from differences in measurements and their reliability. The most commonly used methods are magnetic resonance spectroscopy, which allows measuring the cortical levels of glutamate or GABA, measurements of gamma-band activity (which is hypothesized to be connected to inhibition), and the analysis of the number of glutamate or GABA receptors in post-mortem studies dickinson2016measuring . Another possible interpretation of these conflicting results is that both increases and decreases in inhibition and excitation are present in ASD. This hypothesis was put forward by Nagai and colleagues nagai2015influence , suggesting that both impairments share a common underlying mechanism. Their model showed that increased inhibition and increased excitation can simulate the local or global processing bias of ASD, respectively.

Gustafsson gustafsson1997inadequate also connects E/I imbalance to the local processing style of ASD. He implemented increased inhibition in a self-organizing map (SOM), in particular stronger inhibition in the surroundings of receptive fields, which led to over-discrimination.

For SZ, although E/I imbalance is commonly associated with the disorder in the literature, only the work from jardri2013circular explores E/I imbalance as a modeling mechanism, which was recently supported by some experimental evidence jardri2017experimental . In their model, stronger excitation or insufficient inhibition caused circular belief propagation: bottom-up and top-down information get mixed up, and might be misinterpreted and over-counted. Given the scarcity of work in this direction, future work should further investigate to what extent E/I imbalance can replicate hallucinations, as suggested in jardri2016hallucinations .

3.3 Hypo-prior theory and aberrant precision account

The increasing popularity of the Bayesian view on the brain in recent years resulted in many suggestions of how psychiatric disorders could be caused by a failure of correctly integrating bottom-up sensory information with top-down prior expectations. These approaches are inspired by diminished susceptibility of subjects with psychiatric disorders to visual illusions notredame2014visual and the well-known symptom of hypersensitivity to certain stimuli (e.g. lucker2013auditory ).

Problems in the integration of top-down and bottom-up information can be explained by an inadequate estimation of the precision of these signals. A decreased precision of the prior causes a stronger reliance on sensory input, known as the hypo-prior theory, which Pellicano and Burr suggested for ASD in 2012 pellicano2012world . Similarly, an increased precision of the bottom-up signal can account for the same consequences lawson2014aberrant . Despite some initial evidence in favor of an overrating of sensory information karvelis2018autistic , it cannot be decided to date which of these theories is correct.

For both ASD and SZ, typically a weaker influence of predictions and a higher influence of sensory information is suggested pellicano2012world ; lawson2014aberrant . Lawson et al. grounded aberrant precision in hierarchical predictive coding, where different cortical layers could account for both hypo-priors and sensory noise influence lawson2014aberrant , leaving both hypotheses open. In an endeavor to clarify how such theories differ for ASD and SZ, Karvelis and colleagues karvelis2018autistic recently investigated how healthy individuals, scored for traits of ASD and SZ, use prior information in a visual motion perception task. A correlation between autistic traits and the usage of prior information could be confirmed, whereas schizophrenic traits did not correlate. However, this evidence has to be considered preliminary. In computational models adams2013computational , stronger reliance on bottom-up information has been successfully applied to model delusions and hallucinations as well as deficits in motor behavior.

However, it is also intuitively plausible that an overrating of top-down information can account for the occurrence of hallucinations powers2016hallucinations . In a recent review, Sterzer et al. sterzer2018predictive noticed that too strong as well as too weak priors were proposed for explaining psychosis. They suggested that the way that priors are processed might differ depending on the sensory modality or the hierarchical level of processing, leading to inconsistent theories and findings.

In line with this idea, computational models for ASD often suggest that an impairment might be present in both extremes idei2017reduced ; philippsen2018understanding . In idei2017reduced , repetitive movement could be replicated by an aberrant estimation of sensory precision, leading to inflexible behavior, either due to sameness of intentional states (increased sensory variance) or due to high error signals and misrecognition (decreased sensory variance). Similarly, philippsen2018understanding suggests that too strong as well as too weak reliance on the sensory signal may impair the internal representation of recurrent neural networks.

Thus, for SZ as well as for ASD, too strong as well as too weak reliance on priors or sensory information all seem to be valid modeling approaches, and future work needs to carefully examine which of the theories applies in which context.

3.4 Alternative modeling approaches

There are alternative theories used in the discussed computational models. Synaptic gain, for instance, has been evaluated for SZ cohen1992context as well as for ASD dovgopoly2013connectionist . In fact, a reduction of synaptic gain might be related to reduced precision of prior beliefs as discussed in adams2018bayesian .

Furthermore, in particular for ASD, altered parameters often manipulate directly the generalization capabilities of the network using commonly optimized network parameters such as number of neurons cohen1994artificial , training time cohen1994artificial ; dovgopoly2013connectionist , or regularization factors dovgopoly2013connectionist ; ahmadi2017bridging . These studies are less inspired by biological evidence and focus more strongly on directly replicating observed behavior (mostly generalization deficits).

Besides, homeostasis has also been proposed as the cause of visual hallucinations within the Charles Bonnet syndrome series2010hallucinations and of tactile hallucinations deistler2019tactileHallucinations . Both were successfully implemented in a Deep Boltzmann machine. However, it has never been studied whether this mechanism is suitable for modeling SZ hallucinations.

4 ANN models of schizophrenia

| Model type | Paper | Disorder Characteristic | Biological Evidence | Approach |
|---|---|---|---|---|
| Hopfield Networks | R. E. Hoffman, T. H. McGlashan (1987) hoffman1987computer | Delusions, sense of mind being controlled by outside force | - | Storing of an excessive number of memories (memory overload) |
| Hopfield Networks | R. E. Hoffman, T. H. McGlashan (1989) hoffman1989cortical | Hallucinations, delusions, sense of mind being controlled by outside force | Reduced connectivity in prefrontal cortex and other regions | Excessive connection pruning |
| Hopfield Networks | D. Horn, E. Ruppin (1995) horn1995compensatory | Delusions and hallucinations | Reactive synaptic regeneration in frontal cortex | Weakening of external input projections, increase of internal projections and noise levels, additional Hebbian component |
| Feed-forward NNs | J. D. Cohen, D. Servan-Schreiber (1992) cohen1992context | Disturbances of attention, representation of context | Abnormal dopamine activity in prefrontal cortex | Reduction of activation function gain in context-neurons |
| Feed-forward NNs | R. E. Hoffman, T. H. McGlashan (1997) hoffman1997synaptic | Auditory hallucinations | Reduced connectivity in prefrontal cortex and other regions | Excessive connection pruning |
| Feed-forward NNs | R. E. Hoffman et al. (2011) hoffman2011using | Delusionary story reconstruction | Abnormal dopamine activity, cortical disconnections | Increased BP learning rates, excessive connection pruning in working memory |
| Predictive processing | Adams et al. (2013) adams2013computational | Delusions and hallucinations, abnormal smooth pursuit eye movement | Abnormal neuromodulation of superficial pyramidal cells in high hierarchical levels | Abnormal precision computation in the free energy minimization scheme |
| Circular inference | Jardri and Denéve (2013) jardri2013circular | Hallucinations and delusions | Disruption in the neural excitatory to inhibitory balance | Increased excitation / reduced inhibition in belief propagation |
| Recurrent NNs | Y. Yamashita, J. Tani (2012) yamashita2012spontaneous | Disturbance of self, feeling of being controlled by outside force, disorganized movements | Disconnectivities in hierarchical networks of prefrontal and posterior brain regions | Noise between context neuron hierarchies in MTRNN |

Tab. 1: Overview of neural network models of schizophrenia

In the following section, we present the most important ANN models of SZ. The majority of approaches focus on positive symptoms of SZ, such as hallucinations and delusional behavior, e.g. horn1995compensatory and hoffman1997synaptic . Nevertheless, there have been also other approaches targeting other symptoms, for instance attention characteristics cohen1992context and movement disorders yamashita2012spontaneous . An overview of the most important models is presented in Tab. 1.

4.1 Memory: Hopfield networks

4.1.1 Memory overload

In 1987, Ralph E. Hoffman, professor of psychiatry at Yale, presented the earliest neural network model of SZ hoffman1987computer , inspired by the suggestions of crick1983function , who explored the function of dreams using a neural network model. Hoffman tried to explain the causes of schizophrenic and manic disorders with simulations using a Hopfield Network, an associative memory ANN that is usually employed to simulate the inner functioning of human memory hopfield1982neural and to store binary memory patterns. It is a recurrent neural network that converges to fixed-point attractors. Its learning is usually based on the famous Hebbian rule, “cells that fire together wire together”, which means that connections between neurons that get activated with temporal causality are strengthened hebb1949organization . In order to model SZ, the author inspected the behavior of the network’s attractors after storing an increasing number of binary memories.

Results showed that by increasing the number of binary memory patterns stored, the network reaches “parasitic” states that do not correspond to previously stored memories. With higher numbers of memories or decreased storage capacity, the network’s internal energy minima that correspond to the stored memories might influence each other and create additional deep minima (attractors) that do not correspond to any previously learned pattern. These minima might either only influence the information processing course (mind being controlled by outside force) or lead to convergence to “parasitic states”, which are compared to hallucinations and delusional thoughts. This study did not use biological evidence to support its main thesis that SZ might be caused by memory overload and only compared behavioral observations. However, this model served as a stepping stone for a successor model (see Section 4.1.2).
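The overload effect can be illustrated with a small, self-contained sketch. It is not Hoffman's simulation: it assumes a standard Hopfield network with Hebbian weights and random binary patterns, and the network size, pattern counts and function names are hypothetical choices. The sketch counts how many stored patterns are still fixed points of the dynamics, since a stored memory that is no longer a fixed point forces the network to settle somewhere else.

```python
import random


def random_patterns(count, size, rng):
    """Draw `count` random binary patterns with entries in {-1, +1}."""
    return [[rng.choice((-1, 1)) for _ in range(size)] for _ in range(count)]


def hebbian_weights(patterns):
    """Hebbian rule: strengthen the connection between co-active units."""
    size = len(patterns[0])
    weights = [[0.0] * size for _ in range(size)]
    for pattern in patterns:
        for i in range(size):
            for j in range(size):
                if i != j:  # no self-connections
                    weights[i][j] += pattern[i] * pattern[j] / size
    return weights


def update(weights, state):
    """Apply the threshold rule once to every unit."""
    return [1 if sum(w * s for w, s in zip(row, state)) >= 0 else -1
            for row in weights]


def stable_fraction(count, size=100, seed=0):
    """Fraction of stored patterns that are fixed points of the network."""
    rng = random.Random(seed)
    patterns = random_patterns(count, size, rng)
    weights = hebbian_weights(patterns)
    stable = sum(update(weights, pattern) == pattern for pattern in patterns)
    return stable / count


low_load = stable_fraction(5)    # few memories in 100 units
high_load = stable_fraction(40)  # far more memories than the network can hold
print(low_load, high_load)
```

With few memories, the stored patterns are expected to remain attractors; under heavy load, the interference between memories destabilizes most of them, which is the regime in which convergence to states other than the stored memories becomes possible.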

4.1.2 Memory model with disconnections

Observations that show diminished metabolism in the prefrontal cortex (hypofrontality) of individuals with SZ led to the theory that excessive synaptic pruning might be the reason for the appearance of SZ between adolescence and early adulthood feinberg1982schizophrenia ; keshavan1994schizophrenia . A decline in synaptic density is a normal developmental process huttenlocher1979synaptic ; huttenlocher1982synaptogenesis which might have gone too far in the case of SZ. In 1989, Hoffman and Dobscha used a Hopfield network as a content-addressable memory to retrieve previously stored memories given a similar input hoffman1989cortical . A “neural Darwinism” principle was applied, which is a pruning rule that erases connections depending on their weights and length. The concrete pruning rule is shown in Eq. (1), with $T_{(x,y),(x',y')}$ being the weight of the connection between neurons in coordinates $(x,y)$ and $(x',y')$, and $\hat{p}$ the pruning coefficient. The pruning coefficient determines how many connections are pruned. Fig. 2 illustrates a possible scenario for this pruning process.

$$T_{(x,y),(x',y')} = 0 \quad \text{if} \quad \left| T_{(x,y),(x',y')} \right| < \hat{p} \, \sqrt{(x-x')^2 + (y-y')^2} \qquad (1)$$
Fig. 2: Pruning rule used for the Hopfield Network in hoffman1989cortical . The connections are pruned depending on the connection weight and the distance between the connected neurons. A: Connections before pruning. B: Connections after pruning. Reprinted from hoffman1989cortical .

For a moderate level of pruning, the network is still able to perform the memory-retrieval task, but for connection reductions of 80% the network shows fragmented retrieval. This fragmentation was compared to thought disorders observed in SZ, which lead to incoherence, attention deficits or the feeling that one’s mind is being controlled by an outside force. Furthermore, over-pruned areas sometimes converged to patterns not included in any of the stored memories. These were termed “parasitic foci”. The authors compared them to hallucinations in SZ because they contained decodable information that does not belong to any stored memory. Occasionally, these parasitic regions extended over a larger area and persisted independently of the input, which was compared to delusional thoughts observed in patients.
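The pruning idea described above can be made concrete with a small sketch. It is not the implementation from hoffman1989cortical : the function names, the 4x4 grid and the specific rule (a connection is removed when its absolute weight is smaller than the pruning coefficient times the grid distance between the two neurons) are assumptions chosen only to illustrate weight- and distance-dependent pruning in a Hopfield memory.

```python
import math

def train_hopfield(patterns):
    """Hebbian outer-product weights for a list of +1/-1 patterns."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def prune(w, side, coeff):
    """Assumed rule: zero w[i][j] when |w[i][j]| < coeff * grid distance(i, j).

    Neurons are laid out row by row on a side x side grid.
    Returns the pruned weights and the fraction of connections removed.
    """
    n = side * side
    pruned = [row[:] for row in w]
    removed = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = math.hypot(i // side - j // side, i % side - j % side)
            if abs(w[i][j]) < coeff * dist:
                pruned[i][j] = 0.0
                removed += 1
    return pruned, removed / (n * (n - 1))

def recall(w, cue, sweeps=10):
    """Asynchronous sign updates until the state stops changing."""
    s = list(cue)
    for _ in range(sweeps):
        changed = False
        for i in range(len(s)):
            h = sum(w[i][j] * s[j] for j in range(len(s)))
            new = 1 if h >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s
```

With a coefficient of zero nothing is pruned and a slightly corrupted cue is restored to the stored pattern. As the coefficient grows, long and weak connections disappear first, and for very large values the network settles into a state that matches none of the stored memories.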

4.1.3 Memory model hippocampal region

In 1995, Horn and Ruppin horn1995compensatory ; ruppin1996pathogenesis also introduced a Hopfield-based network to replicate the positive symptoms of SZ. This model was based on the hypothesis by J. R. Stevens stevens1992abnormal that schizophrenic symptoms might be caused by “reactive anomalous sprouting and synaptic reorganization taking place at the frontal lobes, subsequent to the degeneration of temporal neurons projecting at these areas”. The hypothesis takes into account observations that showed atrophic changes in the temporal lobe, and at the same time increased dendritic branching in the frontal lobe of a significant number of schizophrenic patients. Essentially, the idea is that degenerations in temporal lobe regions that are connected to the frontal lobe regions might produce a compensatory reaction in that area, namely increased receptor bindings (frontal lobe connections) and anomalous dendritic sprouting (increased influence from other cortical areas).

The work by Hoffman explained in the previous section suggested that hallucinations should always appear in combination with memory loss in patients, which is not always the case. Following the hypothesis from Stevens, the model from horn1995compensatory would make hallucinations and intact memory capabilities compatible.

The model used in this paper was a Hopfield network taken from tsodyks1988associative ; tsodyks1988enhanced , which is more appropriate for the storage of correlated patterns. This network is used for a pattern retrieval and recovery task, which means that in its original functionality, it receives an external input pattern and outputs the previously learned pattern that corresponds to it, given that a similar one was learned before.

The learning rule for the connection strength (weight) between each pair of neurons is:

(2)
(3)

Here, the internal projection parameter takes a fixed value in normal conditions. Eq. (3) describes the initial configuration of the network weights in terms of the stored memory patterns and of the probability that an element of a memory pattern is chosen to be 1.

The input of each neuron at a given time step is expressed as:

(4)

Here, the network input parameter, which takes a fixed value in normal conditions, weights the incoming memory pattern, and the neuron output is defined by a sigmoid function with a given noise level:

(5)

In order to simulate degenerated temporal lobe projections to the frontal lobe, the input is scaled down by decreasing the network input parameter in Eq. (4). In order to model increased receptor bindings and dendritic sprouting, the internal projection parameter in Eq. (3) and the noise level in Eq. (5) are increased. The internal projection parameter scales the internal weights of the network and influences the neuron activation.

After performing these modifications, the network is still able to retrieve previously stored memories, but spontaneously converges to certain memories without a specific input stimulus.
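The three manipulations can be sketched in a few lines of code. This is only an illustration under stated assumptions, not the model of horn1995compensatory : the covariance-style weight rule, its normalization, the omission of a firing threshold, and the names `c` (internal projection scale), `e` (external input scale) and `noise` (sigmoid noise level) are all hypothetical choices.

```python
import math
import random

def make_weights(patterns, p, c=1.0):
    """Covariance-rule weights for 0/1 patterns with activity level p, scaled by c."""
    n = len(patterns[0])
    scale = c / (n * p * (1.0 - p))
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = scale * sum((x[i] - p) * (x[j] - p) for x in patterns)
    return w

def local_field(w, state, cue, i, e=1.0):
    """Recurrent drive from the other neurons plus the external cue weighted by e."""
    return sum(w[i][j] * state[j] for j in range(len(state))) + e * cue[i]

def fire_probability(h, noise):
    """Sigmoid firing probability; a larger noise level flattens the curve."""
    return 1.0 / (1.0 + math.exp(-h / noise))

def step(w, state, cue, e, noise, rng):
    """One synchronous stochastic update of all neurons."""
    return [1 if rng.random() < fire_probability(local_field(w, state, cue, i, e), noise) else 0
            for i in range(len(state))]

def overlap(state, pattern, p):
    """Similarity between the current state and a stored pattern (1.0 means identical)."""
    n = len(state)
    return sum((pattern[i] - p) * state[i] for i in range(n)) / (n * p * (1.0 - p))
```

In this sketch, lowering `e` weakens the pull of the external cue, while raising `c` and `noise` strengthens the recurrent drive and the randomness of firing. Repeatedly calling `step` with a weak or empty cue and tracking `overlap` is one way to look for spontaneous retrieval.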

An alternative learning rule during pattern retrieval on a lower time scale is used to account for increased dopamine levels observed in patients with SZ:

(6)

where the activity variable only becomes 1 if the neuron in question has been active during the preceding iterations. The authors argue that this component also represents the continuous learning during a person’s life, and they imagine the associated learning coefficient being large during childhood but decreasing over time.

Fig. 3: Highlighted in red are the modifications made on the Hopfield network to imitate schizophrenic behavior: Decrease of external input projections, and increase of internal projections and external noise. Adapted from horn1995compensatory

To sum up, four network modifications were tested on the presented architecture (Fig. 3): (1) weakening of the network input parameter, (2) increase of the internal projections, (3) increase of the noise level, and (4) an additional Hebbian learning rule (Eq. (6)), whose activity variable is only 1 if neuron i has been firing during the preceding iterations.

Combining the modifications that react to a decreased network input (increased internal connections and external noise) with the described Hebbian rule (even with a small learning coefficient of 0.0025), the spontaneous retrievals are enhanced and get continuously triggered without a concrete retrieval input. This behavior is compared to long-term hallucinations or delusional beliefs characteristic of schizophrenic patients. These results would also fit with the effect of dopaminergic blocking agents (equivalent to reducing the effect of the Hebbian learning rule), which are used to reduce hallucinations in patients.

4.2 Context, language and feed-forward networks

4.2.1 Attention and context representation

In 1992 the first model based on feed-forward neural networks was introduced. The psychology professor Jonathan D. Cohen and the neuroscientist David Servan-Schreiber cohen1992context presented an extensive analysis of a possible explanation for negative symptoms in SZ. More concretely, they focused on disturbances of attention and contextualization problems in schizophrenics, which were for instance reported in garmezy1977psychology and lang1965psychological . Their main hypothesis was that schizophrenics fail to build an internal representation of context and that an abnormal amount of dopamine in the prefrontal cortex is the main cause (cf. Section 4.1.3 as a comparison). The authors refer to previous studies suggesting that the prefrontal cortex is the brain region responsible for maintaining an internal representation of context, and that patients with SZ show dysfunctions and abnormal dopamine levels in this area. In order to test the dopamine theory of SZ, three experimental tasks were simulated with three neural network models, obtaining results similar to empirical observations. They simulated reduced dopamine activity by decreasing the gain of the activation function (the activation function’s slope), described by Eq. (7), in the neurons responsible for context representations. In this equation, we use the same nomenclature as in the original paper; its terms are the summed activation of all incoming connections, the neuron bias and the gain, which is the parameter that is modified. The idea of modifying the activation function’s gain was based on studies suggesting that high dopamine levels potentiate the neurons’ activation (inhibitory and excitatory) in the prefrontal cortex. The modification of the gain has a similar effect because higher gain values increase the activation function’s slope, which means that even small neuron input values produce either very low neuron activations (equivalent to inhibitory signals) or high activations (equivalent to excitatory signals).

(7)
Fig. 4: Attention and context. (a) Stroop card test used for SZ, reprinted from henik2004schizophrenia (b) Neural network model used for the Stroop task in cohen1992context . Highlighted in red are the neurons with modified gain.
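The effect of the gain manipulation can be illustrated with a short sketch. The logistic form below, the function names and the gain values 1.0 and 0.6 are assumptions made for illustration; the exact expression used by the authors is the one given in Eq. (7).

```python
import math

def unit_activation(net, gain=1.0, bias=0.0):
    """Logistic activation with a multiplicative gain on the net input (assumed form)."""
    return 1.0 / (1.0 + math.exp(-(gain * net + bias)))

def contrast(gain, net=2.0):
    """Difference between the responses to an excitatory and an inhibitory input."""
    return unit_activation(net, gain) - unit_activation(-net, gain)

if __name__ == "__main__":
    for g in (1.0, 0.6):
        print(f"gain={g}: contrast={contrast(g):.3f}")
```

A lower gain flattens the curve, so the same excitatory and inhibitory inputs produce activations that are closer together. In the models, this makes context units less effective at biasing the rest of the network.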

The first experiment, depicted in Fig. 4, was the Stroop task stroop1935studies , in which color words printed in different ink colors are presented to the participants. These words are either congruent stimuli (color and word are the same), conflicting stimuli (color and word contradict each other) or control stimuli (color words printed in black ink or “XXX” printed in a certain color). The subjects must then either always name the ink color or always read the written word. This exercise is used to test the participant’s attention capacities, and schizophrenic subjects show overall slower reaction times and perform even worse when conflicting stimuli are shown henik2004schizophrenia . In order to feed the information into the network, the printed word’s ink color and meaning were numerically coded. By reducing the gain of the color naming and word reading units below its normal value, the authors observed a delay in the time the network needed to produce a correct answer, similar to what was observed in individuals diagnosed with SZ.

The second experiment, shown in Fig. 5, implemented the Continuous Performance Test (CPT) rosvold1956continuous in its identical pair version cornblatt1989continuous . It measures a participant’s ability to detect a repeated pattern of symbols in a longer sequence. Patterns of symbols (words or numbers, e.g. “9903”) are presented sequentially and the volunteers must detect when the same pattern appears twice consecutively. In this experiment schizophrenics usually struggle with the detection of longer patterns, where previous symbols need to be taken into account. Prior stimulus module neurons were used to save the information about previous sequence symbols. To simulate schizophrenic behavior, the authors reduced the gain of the activation function of the task context units, yielding a higher miss rate in accordance with empirical observations of schizophrenics.

Fig. 5: Continuous performance test. (a) Simplified CPT Identical Pair test used. (b) Neural network model for the CPT adapted from cohen1992context . Highlighted in red are the neurons whose gain was manually decreased.

Finally, a lexical disambiguation task depending on context was modeled based on the original work from Chapman et al. chapman1964theory (see Fig. 6). Participants had to resolve homonym conflicts (words with more than one meaning), taking into account the context of the sentence. In this case, schizophrenics show worse performance when the context needed to resolve the ambiguity comes before the word in question. An approach similar to that of the CPT experiment was taken: the gain of the context neurons was manually reduced, as in the previous experiments. This resulted in low performance for the schizophrenic model when the sentence context needed to interpret the ambiguous word stood at the beginning of the sentence.

Fig. 6: Lexical disambiguation. (a) Task with context dependent meaning word. (b) Neural network model reprinted from cohen1992context . Highlighted in red are the (context) neurons whose gain was reduced.

4.2.2 Auditory processing

During a person’s life, the density of synapses in the brain peaks during childhood and then decreases by 30% to 40% during adolescence, which is close to the period in which SZ appears most frequently (adolescence/early adulthood) huttenlocher1979synaptic . Based on this observation and post-mortem findings that suggest neural deficits in the cerebral cortex of schizophrenics keshavan1994schizophrenia ; margolis1994programmed , Hoffman and McGlashan designed a feed-forward neural network capable of translating phonetic inputs into words hoffman1997synaptic . This model was inspired by Elman’s (1990) model elman1990finding . As illustrated in Fig. 7(a) it consists of one hidden layer and a temporal storage layer that saves a copy of the hidden layer from the previous processing step.

A pruning rule was used to set the value of all connections below a certain threshold to zero. After pruning approximately 30% of the connections, the word detection capabilities of the network improved (pruning is a standard bio-inspired technique for improving the generalization of a network, although nowadays dropout approaches have gained popularity over pruning). However, with excessive pruning the network starts to struggle with detection tasks and shows spontaneous responses during periods without input (shown in Fig. 7). This last observation was associated with auditory hallucinations reported in patients with severe SZ. Furthermore, it supported the common theory that auditory hallucinations might be caused by the false identification of one’s own inner speech as externally generated.
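A threshold rule of this kind is simple to state in code. The sketch below assumes that “below a certain threshold” refers to the absolute value of the weight; the function names are hypothetical and the network of hoffman1997synaptic is not reproduced.

```python
def prune_by_magnitude(weights, threshold):
    """Return a copy of the weight matrix with entries of small magnitude set to zero."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]

def pruned_fraction(weights, threshold):
    """Fraction of connections that the given threshold would remove."""
    flat = [w for row in weights for w in row]
    return sum(1 for w in flat if abs(w) < threshold) / len(flat)

if __name__ == "__main__":
    w = [[0.5, -0.1], [0.05, -0.9]]
    print(prune_by_magnitude(w, 0.2))
    print(pruned_fraction(w, 0.2))
```

Sweeping the threshold and recording `pruned_fraction` alongside task performance is one way to reproduce the qualitative picture described above, with moderate pruning helping and excessive pruning hurting.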

Fig. 7: Auditory hallucinations (a) Neural network model used in hoffman2001book . Input of the network are simulated phonetic codes, output are semantic features of the input word. Highlighted in red are the connections the pruning rule was applied on to imitate schizophrenic symptoms. (b) Word detection results depending on connection pruning. Spontaneous detections are observed for excessive pruning. Reprinted from hoffman2001book with permission.

In subsequent tests with healthy controls and patients, schizophrenics with auditory hallucinations showed reduced word detection capabilities compared to schizophrenics without hallucinations, which fits with the previous simulations. Furthermore, a later review of this paper hoffman2001book highlighted that experiments applying active repetitive transcranial magnetic stimulation (active rTMS) to the left temporoparietal cortex, a brain region usually associated with speech perception, seemed to reduce hallucinations. This further reinforces the hypothesis of a possible correlation between speech-processing disorders and auditory hallucinations.

4.2.3 Language processing

Another feed-forward model of SZ introduced by R. E. Hoffman and collaborators hoffman2011using uses a network called DISCERN miikkulainen1991natural ; miikkulainen1993subsymbolic ; grasemann2007subsymbolic that is able to learn narrative language and reproduce learned content, e.g. learn a story and reproduce it after feeding it with a fraction of the story.

Based on previous studies about SZ, eight different network modifications were tested: (1) Working Memory (WM) disconnections by pruning connections with a weight below a certain threshold, (2) Noise addition in working memory by adding Gaussian noise to WM neuron outputs, (3) WM network gain reduction by reducing the activation function’s gain, (4) WM neuron bias shifts by increasing the neuron bias and inducing an increased overall activation, (5) Semantic network distortions by adding noise to word representations in semantic memory, (6) Excessive activation of semantic networks by increasing neuron outputs in the semantic network, (7) Increased semantic priming by blurring semantic network outputs, (8) Exaggerated prediction-error signaling (hyperlearning) by increasing back-propagation learning rates.

The resulting network behaviors were compared to empirical results using a goodness-of-fit measure (GOF), which compared factors such as story recall success (successfully retelling a story), agent confusions (switching of certain story characters), lexical errors and derailed clauses (false interpretation of certain sentences). The authors concluded that (8) hyperlearning and (1) WM disconnections with pruning produced the results closest to the empirical data. The results for WM disconnections further reinforce the previously presented theory by Hoffman and McGlashan in hoffman1997synaptic that excessive connection pruning during adolescence might be one of the causes of this disorder. Moreover, the authors also suggested that over-learning in schizophrenic brains might cause modifications in previously stored memories, which might lead to delusional or erroneous convictions.

4.3 Bayesian approaches

A number of models of psychiatric symptoms are based on the idea that the brain uses Bayesian inference as a basic principle. This Bayesian brain hypothesis describes the human brain as a generative model of the world that makes predictions about its environment and adapts its internal model depending on the observation provided by the senses. The idea was highly influenced by Hermann Helmholtz’s work in experimental psychology von1867handbuch that dealt with the brain’s capacity to process ambiguous sensory information. In his words:

“Visual perception is mediated by unconscious inferences.” H. Helmholtz.

Fig. 8: Visual illusions where the brain infers different interpretations depending on the prior information or context. (a) Ortaggi in una ciotola o l’Ortolano, © G. Arcimboldo 1590. (b) Thatcher illusion thompson1980margaret . (c) Forever Always, © Octavio Ocampo 1976. (d) Dallenbach’s illusion, 1952 kmd1951puzzle .

Fig. 8 shows puzzle images that stress that perception depends on prior knowledge as well as sensory input. For instance, if we rotate Arcimboldo’s painting by 180 degrees, instead of vegetables we will see a human face with a hat. The Thatcher illusion can be broken by rotating the upside-down images, and we will then see that the two faces are different. In particular, mouth and eyes are inverted. In Ocampo’s painting, we can see two old people from a larger distance but two mariachis when viewing the picture from close range. Finally, Dallenbach’s illusion shows that even if you know that there is an animal looking at you in the picture, it is impossible to see it until the shape of the cow is highlighted. Afterwards you cannot stop seeing it. In any case, what we perceive depends not only on the raw sensory information, but also on our prior knowledge and the predictions we have about the world.

The concept of Bayesian inference, assuming that the world is one-dimensional and can be described via Gaussian distributions, is illustrated in Fig. 9. The perception (posterior belief) is inferred from the sensory input (likelihood) and from the model prediction (prior belief) depending on the precision of prior and likelihood. For instance, in the case of a very imprecise (highly variable) prior, the perception would shift more strongly in the direction of the sensory input.
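For the one-dimensional Gaussian case this combination has a closed form: precisions (inverse variances) add, and the posterior mean is the precision-weighted average of the prior mean and the sensory observation. The following sketch illustrates this standard result; the function name and the example numbers are chosen here for illustration only.

```python
def fuse_gaussians(prior_mean, prior_var, sensory_mean, sensory_var):
    """Posterior mean and variance for a Gaussian prior and a Gaussian likelihood."""
    prior_precision = 1.0 / prior_var
    sensory_precision = 1.0 / sensory_var
    post_precision = prior_precision + sensory_precision
    post_mean = (prior_precision * prior_mean
                 + sensory_precision * sensory_mean) / post_precision
    return post_mean, 1.0 / post_precision

if __name__ == "__main__":
    # Equally precise prior and observation: the percept lies halfway between them.
    print(fuse_gaussians(0.0, 1.0, 2.0, 1.0))
    # Very imprecise prior: the percept moves close to the sensory input.
    print(fuse_gaussians(0.0, 100.0, 2.0, 1.0))
```

In this picture, a weakened prior or an overly precise sensory channel both pull the posterior toward the raw observation.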

People with SZ and ASD are often found to be more resistant to visual illusions happe1996studying ; mitchell2004visuo . Furthermore, they often show stronger sensitivity to sensory input. These findings led to the suggestion that their inference process, i.e. the way that they combine sensory inputs with prior information, might differ from people without such a disorder.

The idea of the Bayesian brain was further combined with mathematical techniques provided by Bayesian probability theory, yielding models that tried to imitate mental processes, such as the Laplace-Bayes approach described by Jaynes jaynes1988does and the Helmholtz Machine, a neural network architecture proposed by Dayan and Hinton dayan1995helmholtz . During the last decade, this approach has gained a lot of support in the neuroscience and computational modeling community. This has led to increasingly complex computational models, like the Hierarchical Temporal Memory (HTM) framework that employs a coincidence detector and a Markov chain george2009towards .

Fig. 9: Illustration of Bayesian inference: The posterior belief is generated by combining prior belief and sensory evidence. Depending on the variance (precision) of prior and sensory evidence, the posterior belief will be influenced more by one or the other. Adapted from adams2013computational .

In 1999, Rao and Ballard rao1999predictive presented a model of the visual cortex that proposed the prediction error (i.e., the error between predicted and observed sensory input) as the information passed to higher hierarchical layers. Other important studies that followed this approach were the predictive processing framework and the free-energy principle proposed by Friston friston2006free , which combined the Helmholtz machine ideas dayan1995helmholtz with the hierarchical prediction error message passing and the Bayesian mathematical framework. Furthermore, some recurrent neural network architectures have also adopted a similar principle murata2013learning ; yamashita2008emergence .

4.3.1 Free-energy model of schizophrenia

Friston’s free-energy model friston2005theory , probably inspired from his years working with schizophrenic patients, describes the brain functionality as a dynamical inference network. Despite not being implemented as an ANN model, we included it in this review because it is considered one of the most relevant models in the computational neuroscience community.

Under the free-energy principle the brain is seen as a prediction machine that progressively constructs an internal model of the world which is constantly improved, based on the received sensory feedback and the resulting prediction error. Perception (posterior belief) then results from combining the brain’s predictions (prior) with the sensory evidence (likelihood) as shown in Fig. 9. If the precision of the prior belief is higher than that of the sensory evidence, the posterior will be more similar to the prior. The same applies to the opposite case. Therefore, precision weights the influence of prior and sensory evidence on the posterior belief.

Mathematically, the internal model is updated by minimizing the free energy defined in a simplified form in Eq. (8), an upper bound on surprise; minimizing it reduces the KL divergence that quantifies the difference between the internal belief about the world and reality. It depends on the sensory input and the latent space in a hierarchical fashion.

(8)

Taking the latent variables to be the dynamical internal states of the brain, perception is then described as the adaptation of these states given the sensory observations, by minimizing the free energy using the gradient descent method described in Eq. (9):

(9)

Eq. (9) contains three ingredients: a differential matrix operator that computes the currently expected hidden state, the error between the (sensory) input predicted by the higher layer and the real input (observation), and the precision, which defines how accurate or noisy that information is for producing a correct inference. For instance, in humans, visual information would have higher precision than proprioceptive sensing for body localization hinz2018drifting .
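The role of precision in this update can be seen in a deliberately reduced sketch: a single static hidden state, an identity mapping from state to sensation, and Gaussian errors. These are simplifying assumptions made here, so the differential operator and the hierarchy of Eq. (9) are omitted, and all names are hypothetical.

```python
def infer_state(sensory, prior_mean, sensory_precision, prior_precision,
                rate=0.05, steps=2000):
    """Gradient descent on F = 0.5*pi_s*(s - mu)^2 + 0.5*pi_p*(mu - mu_prior)^2."""
    mu = prior_mean
    for _ in range(steps):
        sensory_error = sensory - mu      # bottom-up prediction error
        prior_error = mu - prior_mean     # deviation from the top-down prediction
        # Move against the gradient of F; each error is weighted by its precision.
        mu += rate * (sensory_precision * sensory_error
                      - prior_precision * prior_error)
    return mu

if __name__ == "__main__":
    print(infer_state(2.0, 0.0, 1.0, 1.0))   # balanced precisions
    print(infer_state(2.0, 0.0, 1.0, 0.1))   # reduced prior precision
```

The state settles at the precision-weighted average of prior and observation, so lowering the prior precision lets the sensory error dominate the estimate. The step size must stay small relative to the sum of the precisions for the iteration to remain stable.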

Based on these concepts, Adams et al. adams2013computational built a computational model of SZ for three different scenarios: birdsong recognition, a visual tracking task and a simulation of the force-matching illusion. Essentially, reducing the encoded precision of prior knowledge and sensory input during Bayesian inference influenced the behavior of the model. More concretely, decreases in prior precision (or, for the force-matching illusion, a failure to reduce sensory precision) lead to struggles in birdsong recognition, problems with eye-tracking under occlusion and problems with the attribution of agency (force-matching illusion). With an additional compensatory decrease of sensory precision (for the force-matching illusion, an increase of prior precision), the model showed hallucination-like behavior during the birdsong recognition task or could not distinguish self-touch from touch by others.

Fig. 10 shows the first simulated experiment, the birdsong task. A precision reduction in the prior from the second level results in prediction error increases (second row). This was interpreted as a failure to predict unsurprising events (e.g. own actions), which reflects that every event seems to be surprising. When trying to compensate for this effect with reduced sensory precision in the first level (third row), there is a complete failure of perceptual inference because of the imprecise sensory information provided by the lower level. Even without input, inference takes place, which is compared to hallucinations reported by schizophrenics.

In the remaining two experiments, the authors were also able to reproduce other schizophrenic symptoms: problems in an object eye-tracking scenario with occlusion (through prediction error precision reduction in higher levels) and reduced force-matching illusion (through reduced sensory attenuation and compensatory prediction error precision increase).

Fig. 10: Prediction of birdsong (left), prediction error with respect to stimulus (middle) and used model (right), when last three chirps are omitted. Top row: Unmodified model generates prediction error increases with the first missing chirp, which corresponds to normal behavior. Middle row: With reduced precision at second level the model is unable to predict the third chirp, and the prediction error for missing chirps is reduced. Bottom row: With compensatory sensory precision reduction in first level, there is a complete failure of perceptual inference. Despite the wrong predictions, almost no prediction error is generated due to missing precise sensory information. This behavior is compared to auditory hallucinations. Reprinted from adams2013computational with kind permission.

4.3.2 Bayesian graphical models

In jardri2013circular , Jardri and Denève presented a hierarchical Bayesian graphical model (i.e., circular inference) which used the belief propagation algorithm in order to model how excitation/inhibition imbalance can cause erroneous percepts (hallucinations) and fixed false beliefs (delusions).

Messages were passed between nodes in different hierarchical levels analogously to belief propagation. Low hierarchical levels correspond to sensory experience and high levels to top-down predictions. Message passing was expressed by the following equation:

(10)

Eq. (10) relates the message sent from one node to another at a given step to the computed belief and the connection strength, and contains two parameters that scale the inhibition of the downward and upward loops, respectively. The ordering term in the equation indicates that one node is at a higher hierarchical level than the other.

They experimented with the two parameters, adjusting them between 1 (normal level of inhibition) and 0 (no inhibition). Simulated results show that equally impaired loops (both parameters set to the same value below 1) still allow the model to arrive at a proper inference. Conversely, unbalanced impaired upward loops produce an “over-estimation of the strength of sensory evidence and an underweighting of the prior”. This is compatible with the over-interpretation of sensory evidence and the reduced susceptibility to illusions observed in schizophrenic patients.
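How a weakened inhibitory loop leads to over-counted evidence can be sketched with two nodes exchanging log-odds messages. This is a toy reduction written for illustration, not the model of jardri2013circular : the two-node chain, the transfer function, the connection strength of 0.9 and the parameter names `a_up` and `a_down` are assumptions, and no mapping to the parameters of Eq. (10) is claimed.

```python
import math

def transfer(log_odds, w):
    """Log-odds carried across a connection of reliability w (0.5 < w < 1)."""
    e = math.exp(log_odds)
    return math.log((w * e + 1.0 - w) / ((1.0 - w) * e + w))

def circular_inference(sensory, prior, w=0.9, a_up=1.0, a_down=1.0, steps=50):
    """Beliefs (low node, high node) after iterated message passing.

    a_up   scales the inhibition that removes the echoed top-down message
           from the upward message (1 = full cancellation, as in exact
           belief propagation).
    a_down scales the inhibition that removes the echoed bottom-up message
           from the downward message.
    """
    m_up = m_down = 0.0
    for _ in range(steps):
        b_low = sensory + m_down
        b_high = prior + m_up
        new_up = transfer(b_low - a_up * m_down, w)
        new_down = transfer(b_high - a_down * m_up, w)
        m_up, m_down = new_up, new_down
    return sensory + m_down, prior + m_up

if __name__ == "__main__":
    print(circular_inference(1.0, 0.0))               # intact inhibition
    print(circular_inference(1.0, 0.0, a_down=0.5))   # weakened inhibition
```

With both parameters at 1 each piece of evidence is counted once. When `a_down` is lowered, part of the sensory evidence travels up, returns with the downward message and is added to itself, so the low-level belief becomes more confident than the evidence warrants.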

Furthermore, in jardri2017experimental they show that the circular inference model fits the decisions of patients diagnosed with SZ well, using the Fisher task as the experimental paradigm. The Fisher task permits the manipulation of the prior and the likelihood, allowing comparisons with the Bayesian model predictions. Participants have to decide whether a captured fish comes from the left or the right lake. First, two boxes (left, right) of different sizes containing fish are presented (prior): a bigger box expresses a higher probability. Secondly, the two lakes (left, right) are presented with fish of two colors (red and black) inside. The proportion of red fish represents the likelihood of the observation. Finally, participants have to decide whether the red fish comes from the left or the right lake. According to the participants’ data and the proposed model, descending and ascending loops correlated with negative and positive SZ symptoms, respectively.

4.4 Recurrent neural networks

In 2012, Yamashita and Tani presented a model of SZ using a Multiple Timescale Recurrent Neural Network (MTRNN) yamashita2008emergence . Recurrent neural networks (RNN) are used for the recognition and generation of time series; in this study, they are used for sensorimotor sequence learning. The MTRNN is a special type of RNN that mimics the hierarchical structure of animal motor control systems. Biological observations have suggested that human and animal motor movements are segmented into so-called “primitives”. These primitives can then be reused and combined into more complex motor sequences. The MTRNN contains neurons working at different timescales: fast context units (neurons) learn the motion primitives and slow context units work as a sequence generator (see Fig. 11). This network is trained to perform prediction error minimization, i.e. to build an internal model of the world following the Bayesian brain idea. When the network is trained with the Backpropagation Through Time algorithm (BPTT), a robot controlled by it is able to learn multiple motions (e.g., grabbing an object) while adapting to different object positions. It is also able to combine these actions into new action sequences by only training the slow context units. The trained network works as a predictor in which the sensory input modulates the changes in the slow context units (goals) depending on the error (there is a strong parallelism between Multiple Timescale RNNs and the hierarchical model proposed by Friston).

Equation 11 describes the dynamics of each neuron at each layer:

(11)

In this formula, the membrane potential of each neuron at a given time step is updated with the neural states of the neurons connected to it, scaled by the (learnable) connection weights. The time constant determines the update frequency of the neuron: a small time constant is used for fast context units, and a large time constant for slow context units.
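The effect of the time constant can be illustrated with a leaky-integrator update of the kind commonly used for such networks. The exact form of Eq. (11) is not reproduced here; the update below, the tanh activation and the function name are assumptions made for this sketch.

```python
import math

def mtrnn_step(u, w, tau):
    """One leaky-integrator update of all membrane potentials.

    u   : current membrane potentials, one per neuron
    w   : weight matrix, w[i][j] is the connection from neuron j to neuron i
    tau : time constants, small for fast units and large for slow units
    """
    x = [math.tanh(v) for v in u]  # neural states (activations)
    n = len(u)
    return [(1.0 - 1.0 / tau[i]) * u[i]
            + (1.0 / tau[i]) * sum(w[i][j] * x[j] for j in range(n))
            for i in range(n)]

if __name__ == "__main__":
    # Two disconnected units starting at the same potential: the fast unit
    # (tau = 2) moves much further in one step than the slow unit (tau = 50).
    print(mtrnn_step([1.0, 1.0], [[0.0, 0.0], [0.0, 0.0]], [2.0, 50.0]))
```

A unit with a large time constant keeps most of its previous potential at every step, which is what lets the slow context units hold a goal or sequence position while the fast units follow the details of a movement.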

Schizophrenics can have trouble distinguishing self-generated actions from those of others, and in severe cases of SZ, patients can even have problems performing movements, and show repetitive or stereotypical behavior van2015self . Based on observations that suggest that SZ may be caused by disconnections in hierarchical brain regions, mainly between prefrontal and posterior regions friston1995schizophrenia ; banyai2011model , uniformly distributed random noise was added to the connections between fast and slow context units, highlighted in Fig. 11 with the red circle. For the evaluation of the model a humanoid robot was used. It had the task of locating an object on a table in front of it and performed different actions depending on the object’s position: if the object was located to the right, the robot was supposed to grab the object and move it back and forth three times. Otherwise, if the object was located to the left, the robot had to grab the object and move it up and down three times.

Fig. 11: (A) Tasks to be performed by the robot: when the object is on the Right move the object backward and forward, when the object is on the Left move the object up and down. (B) MTRNN network architecture. Highlighted in red are the connections between fast and slow context units that are degraded with noise to imitate schizophrenic behavior. Adapted from yamashita2012spontaneous .

They showed that for a small degree of disconnection (small noise addition) the robot had no problems performing the mentioned task. Nevertheless, increases of spontaneous prediction error were observed and abnormal state switching appeared in the intention network (slow units). The authors compared these prediction errors to patients’ problems in the attribution of agency. Schizophrenics might want to perform an action and have an internal prediction of the upcoming proprioceptive and external states. The increases of prediction error could be seen as incongruences between the intended actions and the results, which can give a person the feeling of not being able to control the consequences of their own actions, or cause problems in perceiving these actions as self-generated. For more severe disconnections, the humanoid robot clearly struggled to perform the given task and showed disorganized sequences of movements. These observations were compared to more severe cases of SZ, where cataleptic (stopping) and stereotypical (repetitive) behaviors have been observed.

4.5 Other approaches to delusions and hallucinations

Self-organizing maps (SOM) were also proposed in spitzer1995neurocomputational as a potential model for schizophrenic delusions; however, no specific implementation has been tested so far.

Finally, Deep Restricted Boltzmann Machines (Deep-RBM) have been tested for hallucinations series2010hallucinations ; deistler2019tactileHallucinations . Although these studies targeted different disorders, the similarities are interesting from the computational modeling point of view. After training, the models were able to induce hallucinations by generating spontaneous activations and producing patterns based on the learned ones.

5 ANN models of autistic spectrum disorder

Deficits in social interaction are often the most obvious symptoms of ASD. Hence, for a long time, ASD was mainly considered a disorder of theory of mind, suggesting that individuals with ASD are characterized by an absent or weakened ability to reason about the beliefs and mental states of others in social contexts baron1997mindblindness . Whereas this explanation could account for a large number of symptoms that become obvious in the development and socialization of children with ASD, it was mainly criticized for its failure to explain similarly prominent non-social symptoms such as restricted interests, desire for sameness or excellent performance in specific areas.

An alternative was suggested in the 90’s with the weak central coherence theory frith1994autism ; happe2006weak . It sees the underlying causes of ASD in the perceptual domain, namely in difficulties to integrate low-level information with higher-level constructs. This “inability to integrate pieces of information into coherent wholes (central coherence)”, stated in frith2003autism , could offer explanations for the aforementioned deficits and also be extended to an explanation of social deficits. An even broader view is provided by the Bayesian brain hypothesis which suggests general deficits in the processing of predictions and sensory information, and can be applied to non-visual perception as well as motor abilities.

Computational models of ASD mainly focus on the atypical processing style suggested by the weak central coherence theory which could be summarized as excessive attention to detail. The majority of models we discuss here replicate deficits in perception cohen1994artificial ; dovgopoly2013connectionist ; gustafsson1997inadequate ; nagai2015influence . Some also tackle atypicalities in memory structure and internal representations mcclelland2000basis ; philippsen2018understanding and inflexibility in motor behavior idei2017reduced . Although most studies suggest connections to social deficits in an indirect way, only one of the models makes a direct connection to theory of mind, by modeling weak central coherence on the level of logical reasoning o2000autism . An overview of the presented approaches is given in Tab. 2.

| Model type | Paper | Disorder characteristic | Biological evidence | Approach |
|---|---|---|---|---|
| Feed-forward and Simple Recurrent NNs | I. L. Cohen cohen1994artificial ; cohen1998neural (1994, 1998) | Generalization deficits due to excessive attention to detail | Abnormal neural density in various brain regions | Excessive or reduced number of neurons, increased training duration |
| Feed-forward and Simple Recurrent NNs | J. L. McClelland (2000) mcclelland2000basis | Hyperspecificity of memory concepts | | Excessive conjunctive coding |
| Feed-forward and Simple Recurrent NNs | Dovgopoly & Mercado dovgopoly2013connectionist (2013) | Deficits in visual categorization and generalization | Abnormalities in synaptic plasticity | Reduced learning rate, negative weight decay (anti-regularization) |
| Self-Organizing Maps | L. Gustafsson (1997) gustafsson1997inadequate | Excessive attention to detail | Lateral inhibition enhances sensory perception | Excessive inhibitory lateral feedback |
| Self-Organizing Maps | L. Gustafsson et al. (2004) gustafsson2004self | Avoidance of novelty | | Familiarity preference, higher weighting of close data points |
| Self-Organizing Maps | G. Noriega (2007) noriega2007self | Domain-based hypersensitivity | Early brain overgrowth in children with ASD | Variable (increasing) number of neurons, stronger/weaker attention to stimuli |
| Self-Organizing Maps | G. Noriega (2008) noriega2008modeling | Domain-based hypersensitivity | Early brain overgrowth in children with ASD | Propagation delays in neural weight updates |
| Convolutional NN | Y. Nagai et al. (2015) nagai2015influence | Local/global processing bias | Excitation/inhibition imbalance | Excitation/inhibition imbalance in visual processing |
| Spiking NNs | J. Park et al. (2019) park2019macroscopic | Atypical neural activity: high power in higher frequency bands and decreased signal complexity | Increased short-range connectivity in frontal cortex and atypicalities in resting-state EEG | Local over-connectivity |
| Predictive coding | Pellicano & Burr (2012) pellicano2012world | Excessive attention to detail | | Hypo-prior: lower precision of prior, stronger focus on sensory input |
| Predictive coding | Lawson et al. (2014) lawson2014aberrant | Excessive attention to detail | Stronger activation in visual cortex than in prefrontal cortex in ASD | Hypo-prior or hyper sensory input: precision imbalance that leads to excessive reliance on input |
| Recurrent NNs | H. Idei et al. (2017) idei2017reduced | Stereotypical behaviors | | Modification of variance estimation (sensory precision) |
| Recurrent NNs | Philippsen & Nagai (2018) philippsen2018understanding | Reduced generalization capability, heterogeneity among subjects | | Modification of reliance on external signal and of variance estimation (sensory precision) |
| Recurrent NNs | Ahmadi & Tani (2017) ahmadi2017bridging | Generalization deficits | | Regularization |
| Other approaches | O’Loughlin and Thagard (2000) o2000autism | Weak coherence, Theory of Mind impairment | | Impairment of coherence optimization in logical reasoning due to strong inhibition |
Tab. 2: Overview of neural network models of ASD

5.1 Feed-forward and simple recurrent neural networks

First, we describe approaches using simple connectionist models, typically feed-forward networks for classification tasks. Recurrent connections might be included at a structural level, but networks are not supposed to learn temporal sequences, which is why we refer to them as Simple Recurrent NN. These approaches mainly explored parameters of the network such as number of neurons or learning rate.

5.1.1 Generalization deficits through overfitting

The first neural network model of ASD was proposed by Ira L. Cohen in 1994 cohen1994artificial . It was a feed-forward neural network trained with back-propagation and investigated basic properties of neural networks. Based on studies that suggested that individuals with autism have either too few or too many neurons and neuronal connections (e.g. bauman1991microscopic ), the influence of increased or reduced number of hidden neurons was analyzed. The evaluated task was to classify children with ASD and children with mental retardation into two groups, using features obtained via a diagnostic interview cohen1993neural . Note that although the considered task was related to ASD, the chosen task is just taken as an example and is not crucial for the findings of this paper.

A training and a test set were used to analyze the network’s accuracy and generalization abilities. The results were compared for an increasing number of hidden units and for different numbers of training trials. A small number of hidden neurons resulted in low accuracy (high training error) and poor generalization (high testing error), and an increased number of hidden neurons improved both the network’s learning accuracy and its generalization. When the number of hidden neurons was increased much further, however, generalization ability decreased: the network learned too many details of the input data and was not able to adapt to new input data. An increased number of training trials (longer training duration) had a similar effect. On the training set, accuracy increased with longer training duration. On the test set, however, the network again showed signs of overfitting, as accuracy decreased significantly.

Cohen compared these results qualitatively to the learning and behavioral characteristics of children with ASD. In particular, many individuals with ASD show great discrimination capabilities and have no problems with already learned routines, but have problems when trying to abstract information or when confronted with new situations.

Cohen extended this approach in 1998 cohen1998neural to the generalization capability in the presence of extraneous inputs to the network (set to random values). In the task of classifying happy and sad expressions of a simplified cartoon face, generalization was strongly impaired in the presence of extraneous inputs. This might suggest that networks trained for too long tend to attend more to non-relevant input information, instead of focusing on the more informative input neurons.

5.1.2 Precision of memory representations

In mcclelland2000basis , James L. McClelland addressed the tendency of children with ASD to represent concepts in an overly specific way, which results in difficulties in recognizing two different instances of an object as belonging to the same category.

He suggested that in neural networks, this could be explained with the concept of excessive conjunctive coding. Typically, similar inputs to a neural network lead to similar neuron activation patterns. Such pattern overlaps can be useful for sharing existing knowledge and establishing associations. However, associations that are too strong can also cause interference. Conjunctive coding describes the reduction of such overlap by recoding the input patterns with neurons that only become active for particular combinations of elements. Assuming that what characterizes healthy human learning is a balance between generalization and discrimination, the representation of concepts in subjects with ASD could be characterized by excessive conjunctive coding. This would make a neural network lose the ability to generalize, as activation pattern overlaps cannot be exploited. This idea was not tested, but the author used the neural network shown in Fig. 12 to explain his reasoning.

McClelland presented the example of a semantic network used in mcclelland1995there , as a model of organization of knowledge in memory (see Fig. 12). This model was used to associate words with their meaning, e.g. “robin” and “can” trigger the outputs “grow”, “move” and “fly” because these are the actions a “robin” can perform. The internal layer of the network (highlighted in red in Fig. 12) progressively learns to code the meaning of input words during learning. This means that “robin” and “canary” should cause a very similar activation pattern because a robin has much more in common with a canary than with, for instance, a tree. The author suggests that hyperspecificity in perception and memory representations of ASD children might be caused by an abnormality during this process. Namely, excessive conjunctive coding in the internal layer is proposed as a mechanism: an excessive reduction of overlap between representations of similar concepts might cause the reported hyperspecificity, which would result in generalization deficits. No concrete network parameters are proposed, but it can be imagined that such an effect might be achieved by increasing the number of neurons in the internal layer. In this regard, the approach is similar to Cohen’s suggestion cohen1994artificial , but extended to learning of representations.
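The difference between overlapping and conjunctive codes can be made concrete with a toy example. The sketch below is not McClelland's network: the feature sets for “robin” and “canary”, the vocabulary and all function names are hypothetical and serve only to show how a code with one unit per feature combination removes the overlap that a feature-based code preserves.

```python
def distributed_code(features, vocabulary):
    """One unit per feature: similar items share active units."""
    return [1 if f in features else 0 for f in vocabulary]


def conjunctive_code(features, known_combinations):
    """One unit per complete feature combination: no sharing between items."""
    key = frozenset(features)
    return [1 if key == combo else 0 for combo in known_combinations]


def overlap(a, b):
    """Number of units that are active in both patterns."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical feature descriptions of two similar concepts.
robin = {"bird", "can_fly", "red"}
canary = {"bird", "can_fly", "yellow"}
vocabulary = ["bird", "can_fly", "red", "yellow"]
combos = [frozenset(robin), frozenset(canary)]

shared = overlap(distributed_code(robin, vocabulary), distributed_code(canary, vocabulary))
separate = overlap(conjunctive_code(robin, combos), conjunctive_code(canary, combos))
```

With the feature-based code the two concepts share two active units, so anything learned about one can transfer to the other. With the purely conjunctive code the overlap is zero, which corresponds to the loss of generalization described above.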

Fig. 12: Semantic network used to explain the conjunctive coding hypothesis. In the hidden layers, the feed-forward neural network generates internal representations of the inputs (highlighted in red). Words describing similar concepts should produce similar internal representations that overlap with each other. The author suggests that excessive conjunctive coding to avoid these overlaps could produce excessive discrimination, such as in autistic perception. Adapted from mcclelland2000basis .

5.1.3 Generalization and categorization abilities in visual perception

Dovgopoly and Mercado dovgopoly2013connectionist used an existing model of visual object perception henderson2011pdp to replicate deficits in classification and generalization in ASD. The neural network was a feed-forward network, which modeled visual input processing via two pathways: the ventral cortical pathway (for object identification, including recurrent connections), and the dorsal cortical pathway (for processing of location-relevant information).

The authors replicated behavioral data from church2010atypical and vladusich2010prototypical , separately on both visual pathways, which show deficits in generalization and prototype formation in children with high-functioning ASD. The experiment was the classification of random dot patterns as category or non-category stimuli church2010atypical , or as category A or category B stimuli vladusich2010prototypical . After adjusting the parameters for replicating typical behavior, four different parameter modifications were tested individually to replicate the data from ASD children. Following evidence for abnormalities in synaptic plasticity in individuals with ASD (e.g. bourgeron2009synaptic ; auerbach2011mutations ), the first two parameters modified how weights in the network were updated.

First, the learning rate was decreased, which corresponds to reduced synaptic plasticity in biological neurons. As a result, network training takes longer and is more prone to overfitting. Second, generalization of the network was impaired by suppressing regularization using negative weight decay. Weight decay is a method for regularizing neural networks and improving their generalization abilities by keeping the connection weights small krogh1992simple . Typically, weight decay punishes large weights by adding a term to the error function. With a negative weight decay factor, anti-regularization is performed instead, encouraging the increase of weight magnitudes and, thus, over-complex classification rules. Third, they tested the influence of increasing and decreasing the number of hidden neurons, similar to cohen1994artificial ; cohen1998neural , based on neurological evidence of an increased number of cortical minicolumns in the brain of individuals with ASD casanova2006minicolumnar . Finally, the authors adjusted the gain of the neuron’s activation function to model the increased level of noise that is hypothesized to underlie the relative increase in cortical excitation observed in ASD subjects rubenstein2003model ; yizhar2011neocortical .
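The effect of the sign of the weight decay factor can be illustrated with a minimal sketch. It assumes a plain gradient step on an error function with an added term (decay / 2) * weight², and it sets the data gradient to zero to isolate the decay term; the learning rate, the decay values and the function names are illustrative assumptions rather than the parameters used by Dovgopoly and Mercado.

```python
def sgd_step(weight, grad, lr=0.1, decay=0.0):
    """One gradient step on error + (decay / 2) * weight**2."""
    return weight - lr * (grad + decay * weight)


def run(weight, decay, steps=50):
    """Apply repeated updates with a zero data gradient to isolate the decay term."""
    for _ in range(steps):
        weight = sgd_step(weight, grad=0.0, decay=decay)
    return weight


regularized = run(1.0, decay=0.5)        # positive decay: weight shrinks toward zero
anti_regularized = run(1.0, decay=-0.5)  # negative decay: weight grows in magnitude
```

A positive factor multiplies the weight by a number slightly below one at every step, whereas a negative factor multiplies it by a number slightly above one, so weight magnitudes grow unless the data gradient counteracts them.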

The gain $g$ of the activation function, as displayed in Eq. (12), manipulates the slope of the activation function. A smaller gain reduces the slope, and makes the network more prone to pass noise instead of signal information to the next processing layers.

$$f(x) = \frac{1}{1 + e^{-g\,(x + b)}} \qquad (12)$$

where $x$ represents the input to the activation function and $b$ is a bias term.
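The role of the gain can be checked numerically. The sketch below assumes a logistic activation in which the gain multiplies the biased input; this functional form and the names `logistic` and `slope_at_center` are assumptions made for illustration and may differ from the exact activation used in the cited model.

```python
import math


def logistic(x, gain=1.0, bias=0.0):
    """Logistic activation with adjustable gain (slope) and bias."""
    return 1.0 / (1.0 + math.exp(-gain * (x + bias)))


def slope_at_center(gain, eps=1e-6):
    """Numerical slope of the activation at its midpoint (x = 0, bias = 0)."""
    return (logistic(eps, gain) - logistic(-eps, gain)) / (2 * eps)


def contrast(gain, weak=0.1, strong=2.0):
    """Output difference between a strong and a weak input."""
    return logistic(strong, gain) - logistic(weak, gain)


low_gain_contrast = contrast(0.25)
high_gain_contrast = contrast(4.0)
```

Under this assumption the slope at the midpoint equals a quarter of the gain, so a low gain compresses the difference between weak and strong inputs, which is one way to read the statement that a low gain passes less signal information to the next layer.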

Good replications of the behavioral data were achieved with a decrease of learning rate and a negative weight decay. A negative weight decay also caused a high variability of generalization abilities, depending on the initial network weights, providing a potential explanation for the heterogeneity of findings between different studies. In contrast to cohen1994artificial ; cohen1998neural , an increased number of neurons did not replicate the generalization deficit in ASD children. Moreover, the gain of the activation function could not fully account for the generalization deficit.

5.2 Self-organizing maps

Self-organizing maps (SOMs) are ANNs that are usually used for unsupervised learning and clustering tasks. They model the functionality of cortical feature maps, which are spatially organized neurons that respond to stimuli and self-organize according to the features in stimuli. They are able to learn the relation of different input data such as different sensory inputs. Approaches for modeling ASD with SOMs typically investigate the formation of higher-level representations from sensory input.

5.2.1 Increased lateral feedback inhibition

Lennart Gustafsson presented two models of ASD using SOMs in gustafsson1997inadequate and gustafsson2004self . Inspired by findings on weak central coherence in subjects with ASD and an enhanced ability to discriminate sensory stimuli frith1994autism , he suggested that alterations in the lateral feedback weights between the SOM neurons could result in atypicalities in perception mountcastle1957modality .

In a SOM, each neuron typically has excitatory connections to close neighbors and inhibitory connections to more distant neighbors. Gustafsson tuned the Mexican-hat curve (Fig. 13) to induce stronger lateral feedback inhibition. Such activation patterns are similar to receptive fields in biological cortices and have been used to model center-surround operators in the visual cortex. When the lateral connections are manipulated to achieve stronger inhibition (such that the integral of the function in Fig. 13 becomes negative), the sensory discrimination ability of the network is increased. Neural columns focus on narrower features during learning, which slows down convergence and might lead to a fragmented feature map. However, excessive lateral inhibition will degrade discriminatory power and cause instabilities in information processing. This behavior is compared to autistic over-discrimination and may also explain fascination with or fright of moving objects, due to the instability of the cortical feature maps.
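The condition that the integral of the lateral interaction function becomes negative can be illustrated with a difference-of-Gaussians profile. This is a minimal sketch: the difference-of-Gaussians form, the widths and amplitudes, and the `inhibition_scale` parameter are assumptions chosen for illustration and are not taken from Gustafsson's model.

```python
import math


def mexican_hat(distance, inhibition_scale=1.0):
    """Difference of Gaussians: narrow excitation minus broad inhibition."""
    excitation = math.exp(-distance ** 2 / (2 * 1.0 ** 2))
    inhibition = 0.3 * math.exp(-distance ** 2 / (2 * 3.0 ** 2))
    return excitation - inhibition_scale * inhibition


def integral(inhibition_scale, half_width=15.0, step=0.01):
    """Riemann-sum approximation of the area under the lateral profile."""
    n = round(2 * half_width / step)
    total = sum(mexican_hat(-half_width + i * step, inhibition_scale) for i in range(n + 1))
    return total * step


typical = integral(1.0)            # excitation slightly outweighs inhibition
strong_inhibition = integral(2.0)  # inhibition dominates, integral is negative
```

With these illustrative values the profile is excitatory near the center and inhibitory at larger distances in both cases, and doubling the inhibitory amplitude is enough to turn the overall integral negative.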

Fig. 13: Mexican-hat function of the SOM. It defines the strength of lateral connections depending on distance to current neuron. The red arrows point to the part that is modified to simulate autistic perception (excessive lateral feedback inhibition). Adapted from gustafsson1997inadequate .

5.2.2 Familiarity preference

In gustafsson2004self , Gustafsson and Papliński evaluated the effect of attention-shift impairment and avoidance of novelty on the formation of cortical feature maps. The SOM received input stimuli from two sources (compared to two “dialects of a language”), each of which produces 30 different stimuli (“speech sounds”) grouped in three clusters (“phonemes”).

The computational model was run in four different modes. In the first mode, attention was always shifted to the source producing novel input (considered as normal learning). In the second mode, an attention-shift impairment was modeled by shifting attention to novel sources with a very low probability. The third mode implements familiarity preference: attention is shifted to novel sources only if the map is familiar with that source (measured as mean distance of the current stimulus to the map nodes). Such a map develops a preference for learning from the more familiar source. Finally, a model with both familiarity preference and attention-shift impairment was applied.

The simulation results showed that familiarity preference leads to precise learning of the stimuli from one of the sources (the source with lower variability) at the expense of the other source. This is reminiscent of the tendency of individuals with ASD to learn a narrow field in great detail, which leads to increased discrimination and poor generalization. The authors also showed that this impairment can be counteracted by modifying the probabilities of stimulus presentation in response to the system, similar to early intervention in children’s learning process. Maps learned with attention-shift impairment alone were not impaired, whereas a combination of both mechanisms only sometimes led to an impairment. The authors concluded that, in contrast to speculations in previous work courchesne1994impairment , familiarity preference, rather than attention-shift impairment, is the more likely cause of ASD.

5.2.3 Unfolding of feature maps and stimuli coverage

In 2007, Gerardo Noriega noriega2007self modeled abnormalities in the feature coverage and the unfolding of feature maps in SOMs. Neurological evidence suggests abnormal brain development in children with ASD bauman2005neuroanatomic , typically reporting larger growth in young children, which gets reduced later in life courchesne2001unusual ; aylward2002effects . These abnormalities were modeled by manipulating the number of network nodes during the training of the SOM where the structure emerges. Thus, the network dimension is temporarily increased.

Results showed that such a disturbance in the physical structure of a SOM does not affect stimuli coverage, but impairs the unfolding of feature maps, which might result in sub-optimal representations. Furthermore, the author models hyper- and hyposensitivity to stimuli in a way similar to gustafsson1997inadequate , using lateral interactions between neurons. Hyper- or hyposensitivity was modeled by adjusting the neuron weights toward the winner neuron, either with a positive factor (attraction, or hypersensitivity) or with a negative factor (repulsion, or hyposensitivity). This factor converges exponentially toward zero (normal sensitivity) during map formation. The author showed that hypersensitivity to one of the input domains (stronger attention to this domain, i.e. restricted interests) improves the coverage of stimuli in this domain, but that excessive hypersensitivity or a hyposensitivity to stimuli reduces coverage. Note that hypersensitivity in gustafsson1997inadequate was implemented as increased inhibition in the neighborhood of neurons (higher specificity of perception), whereas this approach interprets hypersensitivity as a stronger attraction of neighboring signals to signals from a specific domain.

One year later, Noriega extended his approach in noriega2008modeling , investigating propagation delays between neurons. Unlike in normal SOMs where all neurons propagate the information instantaneously to all neighboring neurons, Noriega presented a biologically more realistic approach by introducing delays in the update. He shows that decreased propagation speed has a negative effect on stimuli coverage. As the delayed propagation causes the arrival of competing stimuli at the same time at a neuron, he also altered the way in which these competing stimuli are handled. In his experiments, a high dilution factor, meaning that incoming stimuli are averaged instead of being handled separately, decreased the stimuli coverage and also impaired the topological structure of the map.

5.3 Convolutional neural networks and inhibition imbalance

In 2015, Y. Nagai and colleagues presented an ANN based on Fukushima’s neocognitron (fukushima1982neocognitron , fukushima1988neocognitron , fukushima2003neocognitron ), seen as the basis for convolutional neural networks, to model visual processing in ASD nagai2015influence . The hypothesis considered was that there is an excitation/inhibition imbalance in ASD sun2012impaired ; snijders2013atypical ; yizhar2011neocortical .

The structure of the neocognitron for visual processing is illustrated in Fig. 14. The network is trained to recognize patterns by adjusting the weights between the C-cell and S-cell layers. The S-cells perform feature extraction. They receive excitatory input from the C-cells in the preceding layer, and inhibitory connections from the V-cells in the same layer. During training, the excitatory connections are updated and the inhibitory connections are calculated accordingly.

The network was trained to recognize the numbers “0” to “9” in large or small size at different positions. After training, the model was tested with compound numbers (cf. Fig. 15 left), where a larger number is created from multiple smaller numbers. The trained network is able to detect both global (large number, here “2”) and local (small numbers, here “3”) patterns, but shows a preference for the global pattern, a characteristic that corresponds to observations with healthy individuals behrmann2006configural .

Fig. 14: Left: Overview of the neocognitron’s structure. Right: Detailed view of the connections between C-cell layers and S-cell layers. Highlighted in red are the inhibitory connections that are modified to influence the ratio between inhibition and excitation. Adapted from nagai2015influence .

It is known that people with ASD perform differently in such a task, primarily focusing their attention on the details (i.e. the smaller number instead of the larger one). In order to simulate this local processing bias, an imbalance of excitatory and inhibitory connections was simulated by multiplying the inhibitory weight by a scaling factor.

The results show that a moderate increase of this factor, which corresponds to increasing inhibition, causes the network to preferentially detect local patterns, replicating the local processing bias in ASD. When the factor is reduced (increasing excitation), the network does not show any processing bias; rather, it loses its ability to differentiate patterns. These results fit with ASD symptoms of hyperesthesia (increased focus on detail) and hypoesthesia (no bias and general difficulty in pattern recognition) and suggest that excitation/inhibition imbalance could account for these symptoms.
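How a scaling factor on inhibition changes the selectivity of a feature-extracting cell can be sketched with a simplified rectified shunting unit in the spirit of the neocognitron S-cell. The formula below is a simplification and not the exact cell equation of the cited model, and the input values and the name `inhibition_scale` are hypothetical. The sketch only illustrates selectivity of a single cell, not the local versus global bias of the full network.

```python
def s_cell_response(excitation, inhibition, inhibition_scale=1.0):
    """Rectified shunting unit: excitation divided by scaled inhibition."""
    ratio = (1.0 + excitation) / (1.0 + inhibition_scale * inhibition)
    return max(0.0, ratio - 1.0)


# Hypothetical inputs: a stimulus matching the learned feature well and one
# matching it only partially, both with the same amount of inhibition.
matching, partial, inhibition = 2.0, 1.2, 1.0

for scale in (0.5, 1.0, 1.5):
    print(scale,
          s_cell_response(matching, inhibition, scale),
          s_cell_response(partial, inhibition, scale))
```

In this simplified unit, stronger inhibition silences the response to the partially matching stimulus while the well-matching stimulus still drives the cell, and weaker inhibition lets both stimuli produce sizeable responses, so the cell becomes less discriminative.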

Fig. 15: The neocognitron is fed with a visual stimulus consisting of local patterns (here 3) and global patterns (here 2), which are incongruent. In normal conditions the network should be able to detect both local and global patterns.

5.4 Spiking neural networks and local over-connectivity

In ichinose2017local and a follow-up study park2019macroscopic , spiking neural networks were proposed as computational models to investigate the consequences of local over-connectivity, which was found in the prefrontal cortex of ASD brains courchesne2005frontal . The hypothesis considered was that local over-connectivity affects frequency patterns of neural activations.

A spiking neural network is more closely inspired by natural neural networks izhikevich2003simple . Whereas in standard artificial neural networks each neuron fires every time step, neurons in a spiking network only fire if their potential (similar to the membrane potential of biological neurons) reaches a certain threshold. Therefore, more complex firing patterns occur ranging over various frequency bands, comparable to patterns visible in EEG.

The authors investigated the hypothesis that an increased number of short-range connections might cause the reduced complexity and the differences in the frequency bands that were found in resting-state EEG studies of ASD subjects bosl2011eeg . To manipulate the degree of local over-connectivity in the network, a parameter based on the small-world paradigm from watts1998collective was used. By default, neurons are connected to six neighboring neurons in a ring lattice as displayed in Fig. 16 (left). The rewiring parameter expresses the probability for each of the connections to rewire to other neurons. Thus, it determines the randomness of the network (Fig. 16), ranging from a regular lattice structure (probability 0) to random wiring (probability 1). Intermediate values describe typically developed networks with local clusters and a few longer-range connections between the clusters. Notably, the parameter from watts1998collective keeps the overall number of connections in the network intact, such that differences emerge only from the network structure, not from the total number of neurons or neural connections.
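The rewiring procedure can be sketched in plain Python. This is a minimal version of small-world rewiring on an undirected ring lattice, written for illustration only: the function names, the seed handling and the rule for choosing a new target are assumptions, and the cited studies apply the idea to connections between neuron groups rather than to a bare graph.

```python
import random


def ring_lattice(n, k):
    """Undirected ring in which each node links to its k nearest neighbours."""
    edges = set()
    for node in range(n):
        for offset in range(1, k // 2 + 1):
            edges.add(frozenset((node, (node + offset) % n)))
    return edges


def rewire(edges, n, probability, seed=0):
    """Rewire each edge with the given probability, keeping the edge count fixed."""
    rng = random.Random(seed)
    result = set(edges)
    for edge in sorted(edges, key=sorted):
        if rng.random() >= probability:
            continue
        source = min(edge)
        # Avoid self-loops and duplicate edges when picking the new target.
        candidates = [t for t in range(n)
                      if t != source and frozenset((source, t)) not in result]
        if candidates:
            result.remove(edge)
            result.add(frozenset((source, rng.choice(candidates))))
    return result


lattice = ring_lattice(100, 6)            # regular lattice, 300 edges
small_world = rewire(lattice, 100, 0.2)   # illustrative intermediate probability
random_graph = rewire(lattice, 100, 1.0)  # every edge rewired
```

Because every rewiring step removes one edge and adds exactly one new edge, the three networks differ only in structure and not in the number of connections, which is the property emphasized above.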

Fig. 16: Three different networks with different degrees of randomness. (a) is a locally over-connected network (corresponding to ASD individuals), (b) is a small-world network with many local clusters and a few longer connections (corresponding to typically developed individuals), (c) is a random network including many wide-range connections. (d) shows the structure of each single neuron group with excitatory (red) and inhibitory (blue) connections. Note that the number of nodes and edges in (a), (b) and (c) remains the same. Reprinted with permission from park2019macroscopic , originally based on watts1998collective .

Networks are formed by generating 100 groups of neurons, corresponding to the brown nodes in Fig. 16. Each group contains 1000 spiking neurons: 800 excitatory and 200 inhibitory neurons, which have an increasing or decreasing effect on the firing probability of postsynaptic neurons, respectively. Neurons are mainly connected to neurons of the same neuron group (intra-group connections), and have connections to six neighboring groups according to Fig. 16 (inter-group connections). Different rewiring probabilities are used to determine the initial inter-group connectivity of the network.

After initialization, the network updates its connections according to the rules of spike-time-dependent plasticity izhikevich2003relating : the update of connection weights occurs depending on the timing of firing of the pre- and postsynaptic neurons. If the postsynaptic neuron fires within a certain time window after the presynaptic neuron, the weight of the connection is increased (corresponding to the biological process of long term potentiation). If the presynaptic neuron fires within a time window after the postsynaptic neuron, the connection weight is weakened (long term depression). During this learning period the connection weights self-organize. Tonic random input is presented to the network. After learning, the spontaneous activity of the neurons was recorded (in the absence of input), and compared to the graph-theoretical properties of the network.
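The timing rule described above can be written as a small function. The sketch assumes a pair-based rule with exponentially decaying weight changes; the window length, amplitudes, time constant and the name `stdp_update` are hypothetical values chosen for illustration and are not the parameters of the cited model.

```python
import math


def stdp_update(weight, t_pre, t_post, window=20.0, a_plus=0.1, a_minus=0.12, tau=10.0):
    """Return the updated weight for one pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if 0 < dt <= window:        # pre fires before post: potentiation
        weight += a_plus * math.exp(-dt / tau)
    elif -window <= dt < 0:     # post fires before pre: depression
        weight -= a_minus * math.exp(dt / tau)
    return weight


potentiated = stdp_update(0.5, t_pre=10.0, t_post=15.0)
depressed = stdp_update(0.5, t_pre=15.0, t_post=10.0)
unchanged = stdp_update(0.5, t_pre=0.0, t_post=100.0)
```

Spike pairs that fall outside the time window leave the weight untouched, so only near-coincident activity shapes the connectivity during the learning period.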

The activation patterns were evaluated according to their frequency spectrum and the complexity of the time series, as measured by the multiscale entropy costa2005multiscale . This measure rates the informative content of time series at different temporal scales. High complexity corresponds to the presence of long-range correlations on multiple scales in space and time, whereas low complexity is obtained for time series with perfect regularity or randomness. The evaluation suggested that networks exhibiting local over-connectivity generate more oscillations in high-frequency bands and exhibit lower complexity in the signals than small-world networks. Findings of atypical resting-state EEG for people with ASD, thus, might be explained by local over-connectivity in their brains.
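A compact sketch of the multiscale entropy computation is given below: the series is coarse-grained at each scale and the sample entropy of the coarse-grained series is computed. The template length `m`, the absolute tolerance `r` and the function names are assumptions made for illustration; in common practice the tolerance is set relative to the standard deviation of the original series, which is omitted here for brevity.

```python
import math


def coarse_grain(series, scale):
    """Average consecutive, non-overlapping windows of length `scale`."""
    n = len(series) // scale
    return [sum(series[i * scale:(i + 1) * scale]) / scale for i in range(n)]


def sample_entropy(series, m=2, r=0.2):
    """Sample entropy -ln(A / B), where B and A count template pairs of
    length m and m + 1 whose Chebyshev distance is at most r."""
    n = len(series)

    def count_matches(length):
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if all(abs(series[i + k] - series[j + k]) <= r for k in range(length)):
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")
    return -math.log(a / b)


def multiscale_entropy(series, scales, m=2, r=0.2):
    """Sample entropy of the coarse-grained series at each temporal scale."""
    return [sample_entropy(coarse_grain(series, s), m, r) for s in scales]


regular = [0, 1] * 20                            # perfectly regular signal
regular_mse = multiscale_entropy(regular, [1, 2])
```

For the perfectly regular alternating signal every template that matches at length `m` also matches at length `m + 1`, so the entropy is zero at both scales, in line with the low complexity assigned to perfectly regular series.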

5.5 Bayesian approaches

There are promising models in the literature interpreting ASD on the basis of the Bayesian framework. However, most of these approaches are only conceptual and still lack an implementation. Nevertheless, these approaches are able to explain a wide range of different symptoms which might be caused by an atypical integration of prediction and sensory information lawson2014aberrant ; pellicano2012world .

The first approach supported by the Bayesian brain hypothesis for explaining the non-social symptoms of ASD was proposed by Pellicano and Burr in 2012 pellicano2012world . Their hypo-prior hypothesis suggests that broader or less precise priors cause people with ASD to rely less on their predictions and more strongly on sensory input, which could explain the hypersensitivity of people with ASD. J. Brock broadened this idea brock2012alternative by proposing that hypersensitivity can be caused not only by a reduced precision of the prior, but also by an increased precision of sensory input. Lawson and colleagues (2014) lawson2014aberrant summarized these ideas, arguing that both modifications, reduced prior precision or increased sensory precision, can cause the same functional consequences. They suggest that the cause could be aberrant precision in general: expected precision is an important variable that helps us to decide whether to rely more strongly on sensory information or on prior predictions. Aberrant precision, thus, would alter the way in which we determine whether or not to take the prediction error into account. People with ASD might have problems accurately estimating this precision. Thus, they might, at one extreme, try to minimize the prediction error too strongly, or, at the other extreme, fail to minimize the prediction error.
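The argument that a reduced prior precision and an increased sensory precision have the same functional consequence can be checked with the standard precision-weighted combination of a Gaussian prior and a Gaussian observation. The numerical values and variable names below are illustrative assumptions and do not come from the cited works.

```python
def posterior_mean(prior_mean, prior_precision, observation, sensory_precision):
    """Precision-weighted combination of a Gaussian prior and a Gaussian observation."""
    total = prior_precision + sensory_precision
    return (prior_precision * prior_mean + sensory_precision * observation) / total


prior_mean, observation = 0.0, 1.0
balanced = posterior_mean(prior_mean, 1.0, observation, 1.0)       # equal weighting
hypo_prior = posterior_mean(prior_mean, 0.25, observation, 1.0)    # weaker prior
hyper_sensory = posterior_mean(prior_mean, 1.0, observation, 4.0)  # sharper senses
```

In both altered cases the ratio of sensory to prior precision is four, so the estimate moves equally far toward the observation. Only the ratio of the two precisions matters for the posterior mean, which is why the two accounts are hard to distinguish at the behavioral level.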

In lawson2017adults , Lawson and colleagues present a study demonstrating that the difference in subjects with ASD seems to be that they overestimate the volatility of the environment. They model the experimental findings using Hierarchical Gaussian Filters. The model parameter that best accounts for the differences between ASD and non-ASD subjects was a meta-parameter describing the model’s predictions about predictions. These findings suggest that people with ASD might build fewer or weaker expectations about the environment and, therefore, are less surprised by extraordinary events, but constantly moderately surprised.

5.6 Recurrent neural networks

The studies presented here follow the idea of predictive coding, which can be seen as an implementation of the Bayesian brain idea: an RNN is used as an internal model of the world, and its learning corresponds to the process of adapting network weights in order to minimize prediction error. The role of the network is to learn to predict sensory consequences and to integrate these predictions with the perceived sensory information.

5.6.1 Freezing and repetitive behavior in a robotics experiment

Idei and colleagues used the stochastic continuous-time recurrent neural network (S-CTRNN) murata2013learning model with parametric bias (PB) tani2003learning to teach a robot to interact with a human in a ball-playing game. The S-CTRNN with PB learns to predict a time series of proprioceptive (joint angles) and visual features. From the current input, the network estimates the next time step (output) and its predicted precision (variance), as shown in Fig. 17. The state of the PB units reflects the intention of the network, i.e. the ball-playing pattern that the robot believes it is currently engaged in.

The S-CTRNN was trained offline to perform certain tasks depending on a yellow ball’s position, as depicted in Fig. 17 (left). Synaptic weights and biases of the network, as well as the internal states of the PB units, are updated via the backpropagation through time (BPTT) algorithm in order to maximize the likelihood in Eq. (13). This equation states that, at time step $t$ of training sequence $s$, the network output of the $i$-th neuron (a normal distribution defined by the estimated mean (output) $y_{t,i}^{(s)}$ and the estimated variance $v_{t,i}^{(s)}$) properly reflects the desired input data $\hat{y}_{t,i}^{(s)}$.

$$L_{t,i}^{(s)} = \frac{1}{\sqrt{2\pi v_{t,i}^{(s)}}} \exp\left(-\frac{\left(\hat{y}_{t,i}^{(s)} - y_{t,i}^{(s)}\right)^{2}}{2 v_{t,i}^{(s)}}\right) \tag{13}$$
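The Gaussian likelihood maximized during training can be sketched in a few lines of Python. This is a simplified illustration rather than the implementation of murata2013learning : it treats a single output dimension, omits the network and BPTT, and the function names are hypothetical. It shows that a confidently wrong prediction (small estimated variance, large error) is penalized far more than an uncertain one.

```python
import math


def gaussian_log_likelihood(target, mean, variance):
    """Log-density of `target` under a normal distribution N(mean, variance)."""
    return (-0.5 * math.log(2.0 * math.pi * variance)
            - (target - mean) ** 2 / (2.0 * variance))


def sequence_nll(targets, means, variances):
    """Negative log-likelihood summed over the time steps of one sequence."""
    return -sum(gaussian_log_likelihood(t, m, v)
                for t, m, v in zip(targets, means, variances))


# Same prediction error of 1.0, evaluated with a small and a large variance.
confident_wrong = gaussian_log_likelihood(1.0, 0.0, 0.01)
uncertain_wrong = gaussian_log_likelihood(1.0, 0.0, 1.0)
```

Minimizing `sequence_nll` by gradient descent is equivalent to maximizing the product of the per-step likelihoods.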

After training, a recognition mechanism (via adaptation of the PB units, while keeping weights and biases fixed) enables the network to switch its behavior depending on the current situation.

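To model ASD behavior, the estimated variance $v_{t,i}^{(s)}$ (the inverse of the sensory precision) is modified in the activation function of the variance units with the constant $K$ in Eq. (14), where $\epsilon$ is the minimum value, $c_{t,j}^{(s)}$ is the output of the $j$-th context unit at time step $t$ for movement sequence $s$, and $w_{ij}$ and $b_i$ are the synaptic weights and bias of the $i$-th variance unit.

$$v_{t,i}^{(s)} = \exp\left(\sum_{j} w_{ij}\, c_{t,j}^{(s)} + b_i + K\right) + \epsilon \tag{14}$$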
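A minimal sketch of such a modified variance unit is given below. It assumes that the constant is added to the internal state of the variance unit inside an exponential activation and that a small minimum value is added afterwards; this form, as well as the names `estimated_variance`, `k` and `min_variance`, are assumptions made for illustration and are not taken from idei2017reduced .

```python
import math


def estimated_variance(context, weights, bias, k=0.0, min_variance=1e-4):
    """Variance unit with exponential activation, shifted by the constant `k`."""
    # Internal state of the variance unit: weighted sum of context outputs plus bias.
    internal_state = sum(w * c for w, c in zip(weights, context)) + bias
    # k > 0 inflates the variance (lower precision), k < 0 deflates it.
    return math.exp(internal_state + k) + min_variance


context = [0.5, -0.5]   # hypothetical context unit outputs
weights = [1.0, 1.0]    # hypothetical weights
baseline = estimated_variance(context, weights, bias=0.0)
reduced_precision = estimated_variance(context, weights, bias=0.0, k=1.0)
increased_precision = estimated_variance(context, weights, bias=0.0, k=-1.0)
```

Because the exponential is strictly positive, the estimated variance in this sketch never falls below `min_variance`, whatever the value of `k`.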

Experimental results with a humanoid NAO robot showed that the robot behaved normally when the constant was zero, i.e. when the estimated variance was left unmodified. For increased variance (reduced precision), the robot seemed to ignore prediction error and showed stopping and stereotypic movements. For decreased variance (increased precision), the robot performed mistaken movement changes or concentrated on certain movements, which also led to sudden freezing and repetitive movements. These results fit with the disordered motor system reported in ASD gowen2013motor , but, in contrast to previous studies, they surprisingly suggest that increased and decreased sensory precision seem to lead to the same consequences.

Fig. 17: Left: Overview of the interactive tasks the robot must perform. Right: Overview of the ANN model used for the experiments. Highlighted in red are the variance units where a constant is added to increase or decrease the sensory precision in order to imitate autistic behavior. Adapted from idei2017reduced .

5.6.2 Impairment in internal network representations

Another study using the S-CTRNN to model ASD characteristics is philippsen2018understanding . Using an S-CTRNN murata2013learning , the authors modify two parameters which control how the network makes predictions. In contrast to the RNN model described above, which concentrates on replicating behavioral patterns, this study investigates “invisible” features characterizing the network’s learning process. More specifically, the authors evaluate how attention to sensory input and deficits in the prediction of trajectory noise influence the internal representation that a network acquires during learning.

The network as displayed in Fig. 18 is trained to recognize and draw ellipses and “eight” shapes, located at four different (overlapping) positions of the input space (cf. Fig. 19(b)). Inputs and outputs are two-dimensional and are connected through a recurrent context layer. Learning is modified in two ways. The first parameter, the external contribution parameter, determines how much the network relies on external input as opposed to its own prediction, i.e. it gradually switches between open-loop and near-closed-loop control. The second parameter is defined analogously to idei2017reduced (see Eq. (14)) and manipulates the estimated variance such that networks over- or underestimate noisy variations in the signal. Unlike its usage in idei2017reduced , this manipulation is not performed after training, but already during the training process, to account for the developmental nature of ASD.
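One simple way to realize such a gradual switch between external input and the network’s own prediction is a convex combination of the two signals. The sketch below assumes exactly this mixing rule; the rule itself, the function name `mix_input` and the parameter name `chi` are assumptions for illustration and may differ from the formulation in philippsen2018understanding .

```python
def mix_input(external, prediction, chi):
    """Blend the external signal with the network's own previous prediction."""
    if not 0.0 <= chi <= 1.0:
        raise ValueError("chi must lie in [0, 1]")
    return [chi * e + (1.0 - chi) * p for e, p in zip(external, prediction)]


external = [1.0, 2.0]      # hypothetical two-dimensional sensory input
prediction = [3.0, 4.0]    # hypothetical prediction from the previous step

open_loop = mix_input(external, prediction, 1.0)    # only external input is used
closed_loop = mix_input(external, prediction, 0.0)  # only the own prediction is used
halfway = mix_input(external, prediction, 0.5)      # equal contribution of both
```

Under this convention, values close to zero make the network rely almost entirely on its own predictions, while values close to one make it follow the external signal.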

Fig. 18: The S-CTRNN used in philippsen2018understanding with two parameter modifications. Adapted from philippsen2018understanding

After training, the network’s behavior is evaluated by its ability to reproduce the trained trajectories. The internal representations are evaluated by collecting the activations of the context layer neurons during the time course of generation. The principal components of this high-dimensional space indicated that networks tend to reuse internal representation structure for patterns located at the same position in input space. Thus, the authors define “good” internal network representations as representations which reflect these characteristics of the input data, and they demonstrate that internal representation quality and behavioral performance do not always correspond. Fig. 19(b) shows an example of how task performance (top) and internal representation quality (bottom) change depending on the external contribution parameter. The parameter value that yields the best-structured internal representation is not the one that yields the most accurate behavioral performance. These qualitative observations were also quantitatively verified in the high-dimensional space of neuron activations.
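The kind of analysis described here, projecting recorded context activations onto their principal components, can be sketched without external libraries. The code below extracts only the first principal component by power iteration, whereas the study inspects a two-dimensional principal component space; the function names and the toy activation data are hypothetical.

```python
import math


def center(rows):
    """Subtract the per-dimension mean from every recorded activation vector."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_principal_component(rows, iterations=200):
    """Return the leading principal direction and the projected scores."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    vec = [1.0 / math.sqrt(d)] * d  # fixed start vector keeps the result deterministic
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    scores = [sum(x * v for x, v in zip(row, vec)) for row in centered]
    return vec, scores


# Toy "context activations": five time steps of a three-neuron layer.
activations = [[float(t), 2.0 * t, 0.0] for t in range(5)]
component, scores = first_principal_component(activations)
```

Further components could be obtained by removing the found direction from the data and repeating the procedure.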

Performance, thus, does not always reflect internal representation quality. Interestingly, both extremes of the external contribution parameter lead to an ASD-like impairment, as schematically depicted in Fig. 19(a). Typical development could correspond to the middle. Whereas the right-hand side expresses high-functioning ASD, the left-hand side describes ASD with severe impairments also at a behavioral level. A single parameter, thus, could account for the observed heterogeneity in the ASD population.

Fig. 19: Effect of changing the external contribution parameter of the S-CTRNN from Fig. 18 on behavioral output (top) and on internal representation quality, evaluated in the two-dimensional principal components space (bottom). (a) Hypothesis. (b) Experimental results. Adapted from philippsen2018understanding .

5.6.3 Generalization ability in a variational Bayes recurrent neural network

In ahmadi2017bridging , a novel recurrent network type is introduced, the variational Bayes predictive coding RNN (VBP-RNN). It differs from the S-CTRNN in that variance is not only coded on the output level, but also in the network’s context neurons to enhance the network’s ability to represent uncertainty in the data.

We do not discuss it in detail here, as this study is not focused on modeling ASD, but on representing deterministic vs. probabilistic behavior in an RNN in a coherent way. The analogy to ASD is only an additional remark on the meta-parameter that performs a trade-off between two terms of the optimization (loss) function. This meta-parameter switches between the typically minimized reconstruction error term and a regularization term that keeps the posterior distribution of the latent variables (i.e. the context units) similar to its prior. At one extreme, the network develops deterministic dynamics and exhibits poor generalization capabilities. Higher values lead to more randomness in the network and improve generalization, but too high values result in a performance drop.

The meta-parameter could therefore model the spectrum of ASD: at one extreme, the network relies solely on its top-down intentionality and fails to generalize, whereas too high values reflect performance impairment due to excessive randomness in the network.

As this parameter controls how much regularization is performed, the approach is similar to dovgopoly2013connectionist where regularization was intentionally impaired.
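The trade-off described above can be illustrated with a weighted loss that combines a reconstruction error with a Kullback-Leibler (KL) divergence between a Gaussian posterior and its prior. The convex weighting used below is an assumption for illustration and is not necessarily the exact objective of ahmadi2017bridging ; the names `gaussian_kl`, `weighted_loss` and `w` are hypothetical.

```python
import math


def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL divergence KL(q || p) between two one-dimensional Gaussians."""
    return 0.5 * (math.log(var_p / var_q)
                  + (var_q + (mu_q - mu_p) ** 2) / var_p
                  - 1.0)


def weighted_loss(reconstruction_error, kl_to_prior, w):
    """Trade off reconstruction against regularization with a weight w in [0, 1]."""
    return (1.0 - w) * reconstruction_error + w * kl_to_prior


reconstruction = 2.0                      # hypothetical reconstruction error
kl = gaussian_kl(1.0, 1.0, 0.0, 1.0)      # posterior mean shifted away from the prior

only_reconstruction = weighted_loss(reconstruction, kl, 0.0)
only_regularization = weighted_loss(reconstruction, kl, 1.0)
mixed = weighted_loss(reconstruction, kl, 0.25)
```

With `w = 0.0` the regularization term is ignored entirely, which in this sketch corresponds to training on the reconstruction error alone.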

5.7 Other approaches

In 2000, O’Loughlin and Thagard o2000autism used a connectionist model to simulate weak coherence and to demonstrate how a failure to maximize global coherence can cause deficits in theory of mind baron1997mindblindness .

Their network model, a so-called constraint network, is hand-designed according to the task and does not strictly fit an existing network category. The network performs logical reasoning and consists of a set of neurons, each of which corresponds to a logical element such as a belief (expressed as a sentence). Connections between them are set as excitatory (positive) or inhibitory (negative), depending on whether two arguments support or contradict each other. Weights remain fixed, but the activations of neurons are updated depending on the connections to neighboring cells. A decay factor lets the network’s activation converge to a stable state after a certain amount of time. Positive activations are then interpreted as an acceptance of the corresponding belief, negative activations as a denial.

The authors show that a high level of inhibition, compared to excitation, causes early activated association nodes in the network to suppress competing hypotheses. The network, therefore, prefers more direct solutions and makes wrong predictions. The overall coherence of the network, defined as the satisfaction of most constraints in the complete network, is not optimized, which can be considered as weak coherence.
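The settling process of such a constraint network can be sketched as follows. The update rule is the one commonly used in coherence networks of this kind (decay toward zero plus a net input scaled by the remaining distance to the activation bounds); the three-node network, its weights and all names are hypothetical and do not reproduce the network of o2000autism .

```python
def settle(links, activations, clamped, decay=0.05, steps=200):
    """Synchronously update activations in [-1, 1] while weights stay fixed."""
    acts = dict(activations)
    for _ in range(steps):
        new_acts = {}
        for node, act in acts.items():
            if node in clamped:
                new_acts[node] = act  # evidence nodes keep their activation
                continue
            # Net input from all symmetric links that touch this node.
            net = 0.0
            for (a, b), weight in links.items():
                if a == node:
                    net += weight * acts[b]
                elif b == node:
                    net += weight * acts[a]
            if net > 0:
                updated = act * (1.0 - decay) + net * (1.0 - act)
            else:
                updated = act * (1.0 - decay) + net * (act + 1.0)
            new_acts[node] = max(-1.0, min(1.0, updated))
        acts = new_acts
    return acts


# The evidence supports belief_a (excitatory link); belief_a and belief_b
# contradict each other (inhibitory link).
links = {("evidence", "belief_a"): 0.1, ("belief_a", "belief_b"): -0.2}
start = {"evidence": 1.0, "belief_a": 0.01, "belief_b": 0.01}
final = settle(links, start, clamped={"evidence"})
```

After settling, `belief_a` ends with a positive activation (accepted) and `belief_b` with a negative one (rejected). Scaling up the inhibitory weights relative to the excitatory ones is the kind of manipulation the authors use to study weak coherence.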

6 Conclusion and future directions

Artificial neural network models of SZ and ASD have been presented as a great tool to fill the gap between theoretical models and biological evidence. Early works were biased by technical restrictions, but recent models are able to capture the same complexity as more conceptual methods like Bayesian models. However, they are still limited in fully addressing the heterogeneity and non-specificity challenges of these two psychiatric disorders. Overlap between SZ and ASD (and also other mental disorders) is present in particular in the biological evidence. Therefore, it could be beneficial to use computational models to explore the possible consequences of specific biological impairments in general. In fact, some of the models discussed in this review could be considered non-specific to a certain disorder and could be applied to replicate a variety of different symptoms, depending on the exact implementation and setup of the task. Overarching studies connecting ideas and results across different contexts (ASD, SZ or even other mental disorders) are clearly missing to date and should be considered more strongly in the future.

Computational models of SZ and ASD should be inspired by neurobiological evidence and coherently connect with observed behavior. Instead of focusing on a single symptom, models are required that explain the co-occurrence of multiple symptoms and suggest reasons for symptom diversity and heterogeneity among subjects. Furthermore, especially for ASD, the developmental nature of the disorder should be taken more strongly into account.

6.1 Developmental factors

Developmental factors are especially relevant for ASD as a developmental disorder. However, they can also be interesting for SZ, in particular to explain why many cases of SZ emerge during adolescence and early adulthood huttenlocher1979synaptic ; feinberg1982schizophrenia ; keshavan1994schizophrenia and to investigate developmental factors which might contribute to the onset of SZ cannon2015schizophrenia . Current models only partially take the developmental process into account and focus more on modeling existing deficits in adult subjects with ASD. For instance, existing models assume an aberrant number of neurons cohen1994artificial ; noriega2007self or differences in the neural connections ichinose2017local ; park2019macroscopic during development, or they change the way that learning proceeds by altering network regularization dovgopoly2013connectionist ; ahmadi2017bridging or how information is integrated during learning philippsen2018understanding . However, these studies still cannot answer the question of which initial causes promote the appearance of ASD during development. It might be beneficial to go one step further back, to the development of the human fetus. For instance, a recent study yamada2016embodied suggests that disordered intrauterine embodied interaction during the fetal period is a possible factor for neurodevelopmental disorders like ASD.

Developmental factors also appear to be important as they might shed light on the remarkable heterogeneity among individuals with these disorders. Individual differences might be caused by genetic as well as environmental factors. Taking development into account, thus, is crucial for gaining a better understanding of the factors that lead to these disorders. An initial study aimed at explaining the heterogeneity of symptoms in different subjects is philippsen2018understanding . The authors show that ASD might be explained as a continuous spectrum of a single parameter, as previously suggested in nagai2015influence . Different parameter values during development could cause individual differences in the emerging internal representations of a neural network, which are not always observable at a behavioral level. Internal representation quality could be related to the network’s generalization capabilities, and these invisible deficits could, thus, account for heterogeneous findings in behavioral studies. This work is, however, still too abstract and does not directly translate to how task behavior in ASD studies would be affected by such parameter changes.

For SZ, cortical pruning has been implemented in hoffman1989cortical ; hoffman1997synaptic as an attempt to account for differences in development. However, all other discussed studies of SZ do not consider developmental factors, although these could certainly be useful in the future.

6.2 SZ and ASD as disorders of the self

One of the aspects not properly addressed in computational modeling, for either SZ or ASD, is how diagnosed individuals experience their body and self in comparison with control subjects. Unlike ASD, SZ has been considered a disorder related to the self. Recently, Noel and colleagues noel2017spatial and other researchers have discussed body perception factors in both ASD and SZ individuals. Experiments on peripersonal space in body illusions show “opposite” results: whereas individuals with SZ are more prone to body illusions thakkar2011disturbances , individuals with ASD show a reduced illusory effect cascio2012rubber . Hence, the causes of these psychopathologies have a direct impact on the perception of our body and the self. In the case of patients diagnosed with SZ, this relation has been studied more intensively stanghellini2009embodiment and some treatments include embodiment therapies. Hence, NN models of the bodily or sensorimotor self lanillos2017enactive that are able to explain body illusions hinz2018drifting would help to validate the hypothesis in a common framework. In fact, the body experience spectrum could make several psychopathologies comparable and trigger new treatment directions.

6.3 Addressing multifinality and equifinality with sensorimotor integration ANN models

On the one hand, from the computational modeling point of view, the non-specificity of these disorders, where a single biological cause could result in different atypicalities, is an advantage. According to the majority of the computational models discussed in this manuscript, SZ and ASD are presented as disorders of sensory information fusion or interpretation. Thus, general ANN sensorimotor integration models are an interesting starting point for studying these types of psychopathologies, in particular ANN models that are able to fit human-like data (control and patients) in different experimental paradigms such as body perception tests or decision-making tasks.

On the other hand, different models could yield a similar symptom. Hence, ANN architectures developed for other perceptual syndromes should be systematically studied as models for other psychopathologies. For instance, Deep Boltzmann machine models of hallucinations series2010hallucinations ; deistler2019tactileHallucinations might be studied in a wider framework including schizophrenic positive symptoms.

6.4 ANN for psychopathologies

In terms of neural network architectures, there is a further need to transfer the knowledge from state-of-the-art recurrent neural networks and deep learning to neurological disorders, as was done, for instance, with the Neocognitron model of ASD nagai2015influence or the MTRNN model of SZ yamashita2012spontaneous . Theoretical ANN studies, Computational Psychiatry and Neurosciences should always be in contact to boost the feedback between those disciplines.

In contrast to Bayesian models, which rely on a high level of abstraction of the task, modern ANN approaches schmidhuber2015deep are able to cope with real sensor data such as visual information. For instance, cross-modal learning architectures combined with hierarchical representation learning provide an interesting follow-up to early ANN studies on SZ and ASD. Furthermore, ANN models of the Bayesian brain such as predictive coding yamashita2008emergence and circular inference are a basis for uniting both communities. In fact, recent advances in probabilistic NNs like Variational Autoencoders kingma2013auto and Variational-RNN fabius2014variational ; ahmadi2017bridging have provided the mathematical framework to deploy ANN versions of prominent plausible models of the brain such as the free-energy principle friston2010free .

Finally, we presented some works that employ validation on robotic systems as a useful tool for the behaviour unit/level of analysis yamashita2012spontaneous . The interesting aspect of these approaches is that the internal mechanism of the behaviour comes to light cheng2007cb . Thus, further models might be supported with experimental machine learning and robotics, closing the gap between data models and real-world embodied models.

Acknowledgements

This work was supported by SELFCEPTION project (www.selfception.eu) European Union Horizon 2020 Programme (MSCA-IF-2016) under grant agreement no. 741941, JST CREST grant no. JPMJCR16E2, and by JSPS KAKENHI grant no. JP17H06039, JP18K07597 and JP18KT0021.

References


  • (1) S. Saha, D. Chant, J. Welham, J. McGrath, A systematic review of the prevalence of schizophrenia, PLoS medicine 2 (5) (2005) e141.
  • (2) J. Baio, L. Wiggins, D. L. Christensen, M. J. Maenner, J. Daniels, Z. Warren, M. Kurzius-Spencer, W. Zahorodny, C. R. Rosenberg, T. White, et al., Prevalence of autism spectrum disorder among children aged 8 years – Autism and Developmental Disabilities Monitoring Network, 11 sites, United States, 2014, MMWR Surveillance Summaries 67 (6) (2018) 1.
  • (3) J. Rapoport, A. Chavez, D. Greenstein, A. Addington, N. Gogtay, Autism spectrum disorders and childhood-onset schizophrenia: clinical and biological contributions to a relation revisited, Journal of the American Academy of Child & Adolescent Psychiatry 48 (1) (2009) 10–18.
  • (4) A. E. Pinkham, J. B. Hopfinger, K. A. Pelphrey, J. Piven, D. L. Penn, Neural bases for impaired social cognition in schizophrenia and autism spectrum disorders, Schizophrenia research 99 (1-3) (2008) 164–175.
  • (5) B. H. King, C. Lord, Is schizophrenia on the autism spectrum?, Brain research 1380 (2011) 34–41.
  • (6) D. A. Treffert, The savant syndrome: an extraordinary condition. a synopsis: past, present, future, Philosophical Transactions of the Royal Society of London B: Biological Sciences 364 (1522) (2009) 1351–1357.
  • (7) L. Selfe, Nadia revisited: A longitudinal study of an autistic savant, Psychology Press, 2012.
  • (8) Cross-Disorder Group of the Psychiatric Genomics Consortium, Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis, The Lancet 381 (9875) (2013) 1371–1379.
  • (9) D. Redish, J. Gordon, Computational Psychiatry: New Perspectives on Mental Illness (Strungmann Forum Reports), Cambridge, MA: MIT Press, 2016.
  • (10) X.-J. Wang, J. H. Krystal, Computational psychiatry, Neuron 84 (3) (2014) 638–654.
  • (11) P. R. Montague, R. J. Dolan, K. J. Friston, P. Dayan, Computational psychiatry, Trends in cognitive sciences 16 (1) (2012) 72–80.
  • (12) R. A. Adams, Q. J. Huys, J. P. Roiser, Computational psychiatry: towards a mathematically informed understanding of mental illness, J Neurol Neurosurg Psychiatry 87 (1) (2016) 53–63.
  • (13) Q. J. Huys, M. Moutoussis, J. Williams, Are computational models of any use to psychiatry?, Neural Networks 24 (6) (2011) 544–551.
  • (14) F. Rosenblatt, The perceptron: a probabilistic model for information storage and organization in the brain, Psychological review 65 (6) (1958) 386.
  • (15) J. Schmidhuber, Deep learning in neural networks: An overview, Neural networks 61 (2015) 85–117.
  • (16) F. Crick, G. Mitchison, et al., The function of dream sleep, Nature 304 (5922) (1983) 111–114.
  • (17) M. Spitzer, A neurocomputational approach to delusions, Comprehensive Psychiatry 36.
  • (18) R. E. Hoffman, Computer simulations of neural information processing and the schizophrenia-mania dichotomy, Archives of General Psychiatry 44 (2) (1987) 178–188.
  • (19) J. D. Cohen, D. Servan-Schreiber, Context, cortex, and dopamine: a connectionist approach to behavior and biology in schizophrenia., Psychological review 99 (1) (1992) 45.
  • (20) J. A. Reggia, E. Ruppin, R. S. Berndt, Neural modeling of brain and cognitive disorders, Vol. 6, World Scientific, 1996.
  • (21) L. Gustafsson, A. P. Paplinski, Neural network modelling of autism, Recent developments in autism research (2004) 100–134.
  • (22) R. Pfeifer, J. Bongard, How the body shapes the way we think: a new view of intelligence, MIT press, 2006.
  • (23) G. Cheng, S.-H. Hyon, J. Morimoto, A. Ude, J. G. Hale, G. Colvin, W. Scroggin, S. C. Jacobsen, Cb: A humanoid research platform for exploring neuroscience, Advanced Robotics 21 (10) (2007) 1097–1114.
  • (24) A. Anticevic, J. D. Murray, D. M. Barch, Bridging levels of understanding in schizophrenia through computational modeling, Clinical psychological science 3 (3) (2015) 433–459.
  • (25) V. Valton, L. Romaniuk, J. D. Steele, S. Lawrie, P. Seriès, Comprehensive review: computational modelling of schizophrenia, Neuroscience & Biobehavioral Reviews 83 (2017) 631–646.
  • (26) A. A. Moustafa, B. Misiak, D. Frydecka, Neurocomputational models of schizophrenia, Computational Models of Brain and Behavior (2017) 73.
  • (27) L. Kanner, et al., Autistic disturbances of affective contact, Nervous child 2 (3) (1943) 217–250.
  • (28) S. Sandin, P. Lichtenstein, R. Kuja-Halkola, C. Hultman, H. Larsson, A. Reichenberg, The heritability of autism spectrum disorder, Jama 318 (12) (2017) 1182–1184.
  • (29) A. Sims, Symptoms in the mind: An introduction to descriptive psychopathology., Bailliere Tindall Publishers, 1988.
  • (30) American Psychiatric Association, Diagnostic and statistical manual of mental disorders (DSM-5®), American Psychiatric Pub, 2013.
  • (31) A. van der Weiden, M. Prikken, N. E. van Haren, Self–other integration and distinction in schizophrenia: A theoretical analysis and a review of the evidence, Neuroscience & Biobehavioral Reviews 57 (2015) 220–237.
  • (32) M. Morrens, W. Hulstijn, P. J. Lewi, M. De Hert, B. G. Sabbe, Stereotypy in schizophrenia, Schizophrenia research 84 (2-3) (2006) 397–404.
  • (33) C. Lord, M. Rutter, P. C. DiLavore, S. Risi, K. Gotham, S. Bishop, et al., Autism diagnostic observation schedule: ADOS, Western Psychological Services Los Angeles, CA, 2012.
  • (34) Y. Yamada, H. Kanazawa, S. Iwasaki, Y. Tsukahara, O. Iwata, S. Yamada, Y. Kuniyoshi, An embodied brain model of the human foetus, Scientific reports 6 (2016) 27893.
  • (35) K. J. Friston, The disconnection hypothesis, Schizophrenia research 30 (2) (1998) 115–125.
  • (36) M.-E. Lynall, D. S. Bassett, R. Kerwin, P. J. McKenna, M. Kitzbichler, U. Muller, E. Bullmore, Functional connectivity and brain networks in schizophrenia, Journal of Neuroscience 30 (28) (2010) 9477–9487.
  • (37) C. Frith, Is autism a disconnection disorder?, The Lancet Neurology 3 (10) (2004) 577.
  • (38) M. A. Just, V. L. Cherkassky, T. A. Keller, N. J. Minshew, Cortical activation and synchronization during sentence comprehension in high-functioning autism: evidence of underconnectivity, Brain 127 (8) (2004) 1811–1821.
  • (39) J. S. Anderson, T. J. Druzgal, A. Froehlich, M. B. DuBray, N. Lange, A. L. Alexander, T. Abildskov, J. A. Nielsen, A. N. Cariello, J. R. Cooperrider, et al., Decreased interhemispheric functional connectivity in autism, Cerebral cortex 21 (5) (2010) 1134–1146.
  • (40) C. L. Keown, P. Shih, A. Nair, N. Peterson, M. E. Mulvey, R.-A. Müller, Local functional overconnectivity in posterior brain regions is associated with symptom severity in autism spectrum disorders, Cell reports 5 (3) (2013) 567–572.
  • (41) K. Supekar, L. Q. Uddin, A. Khouzam, J. Phillips, W. D. Gaillard, L. E. Kenworthy, B. E. Yerys, C. J. Vaidya, V. Menon, Brain hyperconnectivity in children with autism and its links to social deficits, Cell reports 5 (3) (2013) 738–747.
  • (42) A. Hahamy, M. Behrmann, R. Malach, The idiosyncratic brain: distortion of spontaneous connectivity patterns in autism spectrum disorder, Nature neuroscience 18 (2) (2015) 302.
  • (43) R. E. Hoffman, S. K. Dobscha, Cortical pruning and the development of schizophrenia: a computer model, Schizophrenia bulletin 15 (3) (1989) 477–490.
  • (44) R. E. Hoffman, T. H. McGlashan, Synaptic elimination, neurodevelopment, and the mechanism of hallucinated “voices” in schizophrenia, American Journal of Psychiatry 154 (12) (1997) 1683–1689.
  • (45) R. E. Hoffman, U. Grasemann, R. Gueorguieva, D. Quinlan, D. Lane, R. Miikkulainen, Using computational patients to evaluate illness mechanisms in schizophrenia, Biological psychiatry 69 (10) (2011) 997–1005.
  • (46) P. R. Huttenlocher, et al., Synaptic density in human frontal cortex – developmental changes and effects of aging, Brain Res 163 (2) (1979) 195–205.
  • (47) Y. Yamashita, J. Tani, Spontaneous prediction error generation in schizophrenia, PLoS One 7 (5) (2012) e37843.
  • (48) K. J. Friston, C. D. Frith, Schizophrenia: a disconnection syndrome, Clin Neurosci 3 (2) (1995) 89–97.
  • (49) J. Park, K. Ichinose, Y. Kawai, J. Suzuki, M. Asada, H. Mori, Macroscopic cluster organizations change the complexity of neural activity, Entropy 21 (2) (2019) 214.
  • (50) K. Ichinose, J. Park, Y. Kawai, J. Suzuki, M. Asada, H. Mori, Local over-connectivity reduces the complexity of neural activity: Toward a constructive understanding of brain networks in patients with autism spectrum disorder.
  • (51) E. Courchesne, K. Pierce, Why the frontal cortex in autism might be talking only to itself: local over-connectivity but long-distance disconnection, Current opinion in neurobiology 15 (2) (2005) 225–230.
  • (52) J. Rubenstein, M. M. Merzenich, Model of autism: increased ratio of excitation/inhibition in key neural systems, Genes, Brain and Behavior 2 (5) (2003) 255–267.
  • (53) L. Sun, C. Grützner, S. Bölte, M. Wibral, T. Tozman, S. Schlitt, F. Poustka, W. Singer, C. M. Freitag, P. J. Uhlhaas, Impaired gamma-band activity during perceptual organization in adults with autism spectrum disorders: evidence for dysfunctional network activity in frontal-posterior cortices, Journal of Neuroscience 32 (28) (2012) 9563–9573.
  • (54) T. M. Snijders, B. Milivojevic, C. Kemner, Atypical excitation–inhibition balance in autism captured by the gamma response to contextual modulation, NeuroImage: clinical 3 (2013) 65–72.
  • (55) R. Canitano, M. Pallagrosi, Autism spectrum disorders and schizophrenia spectrum disorders: excitation/inhibition imbalance and developmental trajectories, Frontiers in psychiatry 8 (2017) 69.
  • (56) R. Jardri, K. Hugdahl, M. Hughes, J. Brunelin, F. Waters, B. Alderson-Day, D. Smailes, P. Sterzer, P. R. Corlett, P. Leptourgos, et al., Are hallucinations due to an imbalance between excitatory and inhibitory influences on the brain?, Schizophrenia bulletin 42 (5) (2016) 1124–1134.
  • (57) O. Yizhar, L. E. Fenno, M. Prigge, F. Schneider, T. J. Davidson, D. J. O’shea, V. S. Sohal, I. Goshen, J. Finkelstein, J. T. Paz, et al., Neocortical excitation/inhibition balance in information processing and social dysfunction, Nature 477 (7363) (2011) 171.
  • (58) A. Dickinson, M. Jones, E. Milne, Measuring neural excitation and inhibition in autism: different approaches, different findings and different interpretations, Brain research 1648 (2016) 277–289.
  • (59) Y. Nagai, T. Moriwaki, M. Asada, Influence of excitation/inhibition imbalance on local processing bias in autism spectrum disorder., in: CogSci, 2015.
  • (60) L. Gustafsson, Inadequate cortical feature maps: A neural circuit theory of autism, Biological Psychiatry 42 (12) (1997) 1138–1147.
  • (61) R. Jardri, S. Deneve, Circular inferences in schizophrenia, Brain 136 (11) (2013) 3227–3241.
  • (62) R. Jardri, S. Duverne, A. S. Litvinova, S. Denève, Experimental evidence for circular inference in schizophrenia, Nature communications 8 (2017) 14218.
  • (63) C.-E. Notredame, D. Pins, S. Deneve, R. Jardri, What visual illusions teach us about schizophrenia, Frontiers in integrative neuroscience 8 (2014) 63.
  • (64) J. R. Lucker, Auditory hypersensitivity in children with autism spectrum disorders, Focus on Autism and Other Developmental Disabilities 28 (3) (2013) 184–191.
  • (65) E. Pellicano, D. Burr, When the world becomes ‘too real’: a Bayesian explanation of autistic perception, Trends in cognitive sciences 16 (10) (2012) 504–510.
  • (66) R. P. Lawson, G. Rees, K. J. Friston, An aberrant precision account of autism, Frontiers in human neuroscience 8 (2014) 302.
  • (67) P. Karvelis, A. R. Seitz, S. M. Lawrie, P. Seriès, Autistic traits, but not schizotypy, predict increased weighting of sensory information in Bayesian visual integration, eLife 7 (2018) e34115.
  • (68) R. A. Adams, K. E. Stephan, H. R. Brown, C. D. Frith, K. J. Friston, The computational anatomy of psychosis, Frontiers in psychiatry 4 (2013) 47.
  • (69) A. R. Powers III, M. Kelley, P. R. Corlett, Hallucinations as top-down effects on perception, Biological Psychiatry: Cognitive Neuroscience and Neuroimaging 1 (5) (2016) 393–400.
  • (70) P. Sterzer, R. A. Adams, P. Fletcher, C. Frith, S. M. Lawrie, L. Muckli, P. Petrovic, P. Uhlhaas, M. Voss, P. R. Corlett, The predictive coding account of psychosis, Biological psychiatry.
  • (71) H. Idei, S. Murata, Y. Chen, Y. Yamashita, J. Tani, T. Ogata, Reduced behavioral flexibility by aberrant sensory precision in autism spectrum disorder: A neurorobotics experiment, in: Development and Learning and Epigenetic Robotics (ICDL-EpiRob), 2017 Joint IEEE International Conference on, IEEE, 2017, pp. 271–276.
  • (72) A. Philippsen, Y. Nagai, Understanding the cognitive mechanisms underlying autistic behavior: a recurrent neural network study, in: Development and Learning and Epigenetic Robotics (ICDL-EpiRob), 2018 Joint IEEE International Conference on, IEEE, 2018, pp. 84–90.
  • (73) A. Dovgopoly, E. Mercado, A connectionist model of category learning by individuals with high-functioning autism spectrum disorder, Cognitive, Affective, & Behavioral Neuroscience 13 (2) (2013) 371–389.
  • (74) R. A. Adams, Bayesian inference, predictive coding, and computational models of psychosis, in: Computational Psychiatry, Elsevier, 2018, pp. 175–195.
  • (75) I. L. Cohen, An artificial neural network analogue of learning in autism, Biological Psychiatry 36 (1) (1994) 5–20.
  • (76) A. Ahmadi, J. Tani, Bridging the gap between probabilistic and deterministic models: a simulation study on a variational Bayes predictive coding recurrent neural network model, in: International Conference on Neural Information Processing, Springer, 2017, pp. 760–769.
  • (77) P. Series, D. P. Reichert, A. J. Storkey, Hallucinations in Charles Bonnet syndrome induced by homeostasis: a deep Boltzmann machine model, in: Advances in Neural Information Processing Systems, 2010, pp. 2020–2028.
  • (78) M. Deistler, Y. Yener, F. Bergner, P. Lanillos, G. Cheng, Tactile hallucinations on artificial skin induced by homeostasis in a deep Boltzmann machine, arXiv preprint.
  • (79) D. Horn, E. Ruppin, Compensatory mechanisms in an attractor neural network model of schizophrenia, Neural Computation 7 (1) (1995) 182–205.
  • (80) J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proceedings of the national academy of sciences 79 (8) (1982) 2554–2558.
  • (81) D. O. Hebb, The organization of behavior. a neuropsychological theory.
  • (82) I. Feinberg, Schizophrenia: caused by a fault in programmed synaptic elimination during adolescence?, Journal of psychiatric research 17 (4) (1982) 319–334.
  • (83) M. S. Keshavan, S. Anderson, J. W. Pettegrew, Is schizophrenia due to excessive synaptic pruning in the prefrontal cortex? The Feinberg hypothesis revisited, Journal of psychiatric research 28 (3) (1994) 239–265.
  • (84) P. R. Huttenlocher, C. de Courten, L. J. Garey, H. Van der Loos, Synaptogenesis in human visual cortex—evidence for synapse elimination during normal development, Neuroscience letters 33 (3) (1982) 247–252.
  • (85) E. Ruppin, J. A. Reggia, D. Horn, Pathogenesis of schizophrenic delusions and hallucinations: a neural model, Schizophrenia Bulletin 22 (1) (1996) 105–121.
  • (86) J. R. Stevens, Abnormal reinnervation as a basis for schizophrenia: A hypothesis 49 (1992) 238–43.
  • (87) M. Tsodyks, Associative memory in asymmetric diluted network with low level of activity, EPL (Europhysics Letters) 7 (3) (1988) 203.
  • (88) M. V. Tsodyks, M. V. Feigel’man, The enhanced storage capacity in neural networks with low activity level, EPL (Europhysics Letters) 6 (2) (1988) 101.
  • (89) N. Garmezy, The psychology and psychopathology of attention., Schizophrenia Bulletin 3 (3) (1977) 360.
  • (90) P. J. Lang, A. H. Buss, Psychological deficit in schizophrenia: II. Interference and activation., Journal of Abnormal Psychology 70 (2) (1965) 77.
  • (91) A. Henik, R. Salo, Schizophrenia and the Stroop effect, Behavioral and cognitive neuroscience reviews 3 (1) (2004) 42–59.
  • (92) J. R. Stroop, Studies of interference in serial verbal reactions., Journal of experimental psychology 18 (6) (1935) 643.
  • (93) H. E. Rosvold, A. F. Mirsky, I. Sarason, E. D. Bransome Jr, L. H. Beck, A continuous performance test of brain damage., Journal of consulting psychology 20 (5) (1956) 343.
  • (94) B. A. Cornblatt, M. F. Lenzenweger, L. Erlenmeyer-Kimling, The continuous performance test, identical pairs version: II. Contrasting attentional profiles in schizophrenic and depressed patients, Psychiatry research 29 (1) (1989) 65–85.
  • (95) L. Chapman, J. P. Chapman, G. A. Miller, A theory of verbal behavior in schizophrenia., Progress in experimental personality research 72 (1964) 49.
  • (96) R. L. Margolis, D.-M. Chuang, R. M. Post, Programmed cell death: implications for neuropsychiatric disorders, Biological psychiatry 35 (12) (1994) 946–956.
  • (97) J. L. Elman, Finding structure in time, Cognitive science 14 (2) (1990) 179–211.
  • (98) R. E. Hoffman, T. H. McGlashan, Book review: Neural network models of schizophrenia, The Neuroscientist 7 (5) (2001) 441–454.
  • (99) R. Miikkulainen, M. G. Dyer, Natural language processing with modular PDP networks and distributed lexicon, Cognitive Science 15 (3) (1991) 343–399.
  • (100) R. Miikkulainen, Subsymbolic natural language processing: An integrated model of scripts, lexicon, and memory, MIT Press, 1993.
  • (101) U. Grasemann, R. Miikkulainen, R. Hoffman, A subsymbolic model of language pathology in schizophrenia, in: Proceedings of the Annual Meeting of the Cognitive Science Society, Vol. 29, 2007.
  • (102) H. Von Helmholtz, Handbuch der physiologischen Optik, Vol. 9, Voss, 1867.
  • (103) P. Thompson, Margaret Thatcher: a new illusion., Perception.
  • (104) K. M. Dallenbach, A puzzle-picture with a new principle of concealment, The American journal of psychology (1951) 431–433.
  • (105) F. G. Happé, Studying weak central coherence at low levels: children with autism do not succumb to visual illusions. a research note, Journal of Child Psychology and Psychiatry 37 (7) (1996) 873–877.
  • (106) P. Mitchell, D. Ropar, Visuo-spatial abilities in autism: A review, Infant and Child Development: An International Journal of Research and Practice 13 (3) (2004) 185–198.
  • (107) E. T. Jaynes, How does the brain do plausible reasoning?, in: Maximum-entropy and Bayesian methods in science and engineering, Springer, 1988, pp. 1–24.
  • (108) P. Dayan, G. E. Hinton, R. M. Neal, R. S. Zemel, The Helmholtz machine, Neural computation 7 (5) (1995) 889–904.
  • (109) D. George, J. Hawkins, Towards a mathematical theory of cortical micro-circuits, PLoS computational biology 5 (10) (2009) e1000532.
  • (110) R. P. Rao, D. H. Ballard, Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects, Nature neuroscience 2 (1) (1999) 79.
  • (111) K. Friston, J. Kilner, L. Harrison, A free energy principle for the brain, Journal of Physiology-Paris 100 (1-3) (2006) 70–87.
  • (112) S. Murata, J. Namikawa, H. Arie, S. Sugano, J. Tani, Learning to reproduce fluctuating time series by inferring their time-dependent stochastic properties: Application in robot learning via tutoring, IEEE Transactions on Autonomous Mental Development 5 (4) (2013) 298–310.
  • (113) Y. Yamashita, J. Tani, Emergence of functional hierarchy in a multiple timescale neural network model: a humanoid robot experiment, PLoS computational biology 4 (11) (2008) e1000220.
  • (114) K. Friston, A theory of cortical responses, Philosophical transactions of the Royal Society B: Biological sciences 360 (1456) (2005) 815–836.
  • (115) N.-A. Hinz, P. Lanillos, H. Mueller, G. Cheng, Drifting perceptual patterns suggest prediction errors fusion rather than hypothesis selection: replicating the rubber-hand illusion on a robot, arXiv preprint arXiv:1806.06809.
  • (116) M. Bányai, V. A. Diwadkar, P. Érdi, Model-based dynamical analysis of functional disconnection in schizophrenia, Neuroimage 58 (3) (2011) 870–877.
  • (117) S. Baron-Cohen, Mindblindness: An essay on autism and theory of mind, MIT Press, 1997.
  • (118) U. Frith, F. Happé, Autism: Beyond “theory of mind”, Cognition 50 (1-3) (1994) 115–132.
  • (119) F. Happé, U. Frith, The weak coherence account: detail-focused cognitive style in autism spectrum disorders, Journal of autism and developmental disorders 36 (1) (2006) 5–25.
  • (120) U. Frith, Autism: Explaining the enigma, Blackwell Publishing, 2003.
  • (121) J. L. McClelland, The basis of hyperspecificity in autism: A preliminary suggestion based on properties of neural nets, Journal of Autism and Developmental Disorders 30 (5) (2000) 497–502.
  • (122) C. O’Laughlin, P. Thagard, Autism and coherence: A computational model, Mind & Language 15 (4) (2000) 375–392.
  • (123) I. Cohen, Neural network analysis of learning in autism, Neural networks and psychopathology (1998) 274–315.
  • (124) L. Gustafsson, A. P. Papliński, Self-organization of an artificial neural network subjected to attention shift impairments and familiarity preference, characteristics studied in autism, Journal of autism and developmental disorders 34 (2) (2004) 189–198.
  • (125) G. Noriega, Self-organizing maps as a model of brain mechanisms potentially linked to autism, IEEE Transactions on neural systems and rehabilitation engineering 15 (2) (2007) 217–226.
  • (126) G. Noriega, Modeling propagation delays in the development of SOMs—a parallel with abnormal brain growth in autism, Neural Networks 21 (2-3) (2008) 130–139.
  • (127) M. L. Bauman, Microscopic neuroanatomic abnormalities in autism, Pediatrics 87 (5) (1991) 791–796.
  • (128) I. L. Cohen, V. Sudhalter, D. Landon-Jimenez, M. Keogh, A neural network approach to the classification of autism, Journal of autism and developmental disorders 23 (3) (1993) 443–466.
  • (129) J. L. McClelland, B. L. McNaughton, R. C. O’Reilly, Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory., Psychological review 102 (3) (1995) 419.
  • (130) C. M. Henderson, J. L. McClelland, A PDP model of the simultaneous perception of multiple objects, Connection Science 23 (2) (2011) 161–172.
  • (131) B. A. Church, M. S. Krauss, C. Lopata, J. A. Toomey, M. L. Thomeer, M. V. Coutinho, M. A. Volker, E. Mercado, Atypical categorization in children with high-functioning autism spectrum disorder, Psychonomic Bulletin & Review 17 (6) (2010) 862–868.
  • (132) T. Vladusich, O. Olu-Lafe, D.-S. Kim, H. Tager-Flusberg, S. Grossberg, Prototypical category learning in high-functioning autism, Autism Research 3 (5) (2010) 226–236.
  • (133) T. Bourgeron, A synaptic trek to autism, Current opinion in neurobiology 19 (2) (2009) 231–234.
  • (134) B. D. Auerbach, E. K. Osterweil, M. F. Bear, Mutations causing syndromic autism define an axis of synaptic pathophysiology, Nature 480 (7375) (2011) 63.
  • (135) A. Krogh, J. A. Hertz, A simple weight decay can improve generalization, in: Advances in neural information processing systems, 1992, pp. 950–957.
  • (136) M. F. Casanova, I. A. van Kooten, A. E. Switala, H. van Engeland, H. Heinsen, H. W. Steinbusch, P. R. Hof, J. Trippe, J. Stone, C. Schmitz, Minicolumnar abnormalities in autism, Acta neuropathologica 112 (3) (2006) 287.
  • (137) V. B. Mountcastle, Modality and topographic properties of single neurons of cat’s somatic sensory cortex, Journal of neurophysiology 20 (4) (1957) 408–434.
  • (138) E. Courchesne, J. Townsend, N. A. Akshoomoff, O. Saitoh, R. Yeung-Courchesne, A. J. Lincoln, H. E. James, R. H. Haas, L. Schreibman, L. Lau, Impairment in shifting attention in autistic and cerebellar patients., Behavioral neuroscience 108 (5) (1994) 848.
  • (139) M. L. Bauman, T. L. Kemper, Neuroanatomic observations of the brain in autism: a review and future directions, International journal of developmental neuroscience 23 (2-3) (2005) 183–187.
  • (140) E. Courchesne, C. Karns, H. Davis, R. Ziccardi, R. Carper, Z. Tigue, H. Chisum, P. Moses, K. Pierce, C. Lord, et al., Unusual brain growth patterns in early life in patients with autistic disorder: an MRI study, Neurology 57 (2) (2001) 245–254.
  • (141) E. H. Aylward, N. J. Minshew, K. Field, B. Sparks, N. Singh, Effects of age on brain volume and head circumference in autism, Neurology 59 (2) (2002) 175–183.
  • (142) K. Fukushima, S. Miyake, Neocognitron: A self-organizing neural network model for a mechanism of visual pattern recognition, in: Competition and cooperation in neural nets, Springer, 1982, pp. 267–285.
  • (143) K. Fukushima, Neocognitron: A hierarchical neural network capable of visual pattern recognition., Neural networks 1 (2) (1988) 119–130.
  • (144) K. Fukushima, Neocognitron for handwritten digit recognition, Neurocomputing 51 (2003) 161–180.
  • (145) M. Behrmann, G. Avidan, G. L. Leonard, R. Kimchi, B. Luna, K. Humphreys, N. Minshew, Configural processing in autism and its relationship to face processing, Neuropsychologia 44 (1) (2006) 110–129.
  • (146) K. Ichinose, J. Park, Y. Kawai, J. Suzuki, M. Asada, H. Mori, Local over-connectivity reduces the complexity of neural activity: Toward a constructive understanding of brain networks in patients with autism spectrum disorder, in: 2017 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), 2017, pp. 233–238. doi:10.1109/DEVLRN.2017.8329813.
  • (147) E. M. Izhikevich, Simple model of spiking neurons, IEEE Transactions on neural networks 14 (6) (2003) 1569–1572.
  • (148) W. Bosl, A. Tierney, H. Tager-Flusberg, C. Nelson, EEG complexity as a biomarker for autism spectrum disorder risk, BMC medicine 9 (1) (2011) 18.
  • (149) D. J. Watts, S. H. Strogatz, Collective dynamics of ‘small-world’ networks, Nature 393 (6684) (1998) 440.
  • (150) E. M. Izhikevich, N. S. Desai, Relating STDP to BCM, Neural computation 15 (7) (2003) 1511–1523.
  • (151) M. Costa, A. L. Goldberger, C.-K. Peng, Multiscale entropy analysis of biological signals, Physical review E 71 (2) (2005) 021906.
  • (152) J. Brock, Alternative Bayesian accounts of autistic perception: comment on Pellicano and Burr, Trends in cognitive sciences 16 (12) (2012) 573–574.
  • (153) R. P. Lawson, C. Mathys, G. Rees, Adults with autism overestimate the volatility of the sensory environment, Nature neuroscience 20 (9) (2017) 1293.
  • (154) J. Tani, Learning to generate articulated behavior through the bottom-up and the top-down interaction processes, Neural Networks 16 (1) (2003) 11–23.
  • (155) E. Gowen, A. Hamilton, Motor abilities in autism: a review using a computational context, Journal of autism and developmental disorders 43 (2) (2013) 323–344.
  • (156) T. D. Cannon, How schizophrenia develops: cognitive and brain mechanisms underlying onset of psychosis, Trends in cognitive sciences 19 (12) (2015) 744–756.
  • (157) J.-P. Noel, C. J. Cascio, M. T. Wallace, S. Park, The spatial self in schizophrenia and autism spectrum disorder, Schizophrenia research 179 (2017) 8–12.
  • (158) K. N. Thakkar, H. S. Nichols, L. G. McIntosh, S. Park, Disturbances in body ownership in schizophrenia: evidence from the rubber hand illusion and case study of a spontaneous out-of-body experience, PloS one 6 (10) (2011) e27089.
  • (159) C. J. Cascio, J. H. Foss-Feig, C. P. Burnette, J. L. Heacock, A. A. Cosby, The rubber hand illusion in children with autism spectrum disorders: delayed influence of combined tactile and visual input on proprioception, Autism 16 (4) (2012) 406–419.
  • (160) G. Stanghellini, Embodiment and schizophrenia, World Psychiatry 8 (1) (2009) 56–59.
  • (161) P. Lanillos, E. Dean-Leon, G. Cheng, Enactive self: a study of engineering perspectives to obtain the sensorimotor self through enaction, in: IEEE International Conference on Developmental Learning and Epigenetic Robotics, 2017.
  • (162) D. P. Kingma, M. Welling, Auto-encoding variational Bayes, arXiv preprint arXiv:1312.6114.
  • (163) O. Fabius, J. R. van Amersfoort, Variational recurrent auto-encoders, arXiv preprint arXiv:1412.6581.
  • (164) K. Friston, The free-energy principle: a unified brain theory?, Nature reviews neuroscience 11 (2) (2010) 127.