
# Hierarchical Selective Recruitment in Linear-Threshold Brain Networks - Part I: Intra-Layer Dynamics and Selective Inhibition

Goal-driven selective attention (GDSA) refers to the brain's function of prioritizing, according to one's internal goals and desires, the activity of a task-relevant subset of its overall network to efficiently process relevant information while inhibiting the effects of distractions. Despite decades of research in neuroscience, a comprehensive understanding of GDSA is still lacking. We propose a novel framework for GDSA using concepts and tools from control theory as well as insights and structures from neuroscience. Central to this framework is an information-processing hierarchy with two main components: selective inhibition of task-irrelevant activity and top-down recruitment of task-relevant activity. We analyze the internal dynamics of each layer of the hierarchy described as a network with linear-threshold dynamics and derive conditions on its structure to guarantee existence and uniqueness of equilibria, asymptotic stability, and boundedness of trajectories. We also provide mechanisms that enforce selective inhibition using the biologically-inspired schemes of feedforward and feedback inhibition. Despite their differences, both schemes lead to the same conclusion: the intrinsic dynamical properties of the (not-inhibited) task-relevant subnetworks are the sole determiner of the dynamical properties that are achievable under selective inhibition.

09/05/2018


## I Introduction

The human brain is constantly subject to an influx of sensory inputs and is responsible for integrating and interpreting them to generate appropriate decisions and actions. This influx contains not only the pieces of information relevant to the present task(s), but also a myriad of distractions. Goal-driven selective attention (GDSA) refers to the active selective processing of a subset of the information influx while suppressing the effects of the rest, and is vital for the proper function of the brain.[^1] Examples of GDSA range from selective audition in a crowded place to selective vision in cluttered environments to selective taste/smell in food. As a result, a long-standing question in neuroscience involves understanding the brain's complex mechanisms underlying selective attention [2, 3, 4, 5, 6, 7].

[^1]: Note the distinction between GDSA and stimulus-driven selective attention (the reactive shift of focus based on the saliency of stimuli), which is not the focus of this work.

A central element in addressing this question is the role played by the hierarchical organization of the brain [8]. Broadly, this organization places primary sensory and motor areas at the bottom and integrative association areas (the prefrontal cortex in particular) at the top. Accordingly, sensory information is processed while flowing up the hierarchy, where decisions are eventually made and transmitted back down the hierarchy to generate motor actions.[^2] The top-down direction is also responsible for GDSA, where the higher-order areas differentially "modulate" the activity of the lower-level areas such that only relevant information is further processed. This phenomenon, hereafter called hierarchical selective recruitment (HSR), constitutes the basis for GDSA and has been the subject of extensive experimental research in neuroscience, see, e.g., [9, 10, 11, 12, 4, 13, 14, 15, 16, 17, 18]. However, a complete understanding of how, when (how quickly), or where (within the hierarchy) it occurs is still lacking. In particular, the relationship between HSR and the dynamics of the involved neuronal networks is poorly understood. Our goal is to address this gap from a model-based perspective, resorting to control-theoretic tools to explain various aspects of HSR in terms of the synaptic network structure and the dynamics that emerge from it.[^3]

[^2]: Note that the role of memory (being distributed across the brain) is implicit in this simplified stimulus-response description. Indeed, many sensory inputs only form memories (without motor response) and many motor actions result chiefly from memory (without sensory stimulation). The hierarchical aspect is, nevertheless, always present.

[^3]: We focus primarily on GDSA because of its long history in neuroscience. Nevertheless, HSR is a generic framework for top-down discrimination between two or more groups of competing populations that may represent thoughts, memories, emotions, or motor actions in addition to sensory information.

The starting point in the development of HSR is the observation that different stimuli, in particular the task-relevant and task-irrelevant ones, are processed by different populations of neurons. With each neuronal population represented by a node in the overall neuronal network of networks, HSR primarily relies on the selective inhibition of the task-irrelevant nodes and the top-down recruitment of the task-relevant nodes of each layer by the layer immediately above. This scheme is motivated by the observation that GDSA is not confined to a single layer, but is rather distributed in a graded manner along the thalamocortical hierarchy [8]. Here, we analyze the dynamics of individual layers as well as the mechanisms for selective inhibition in a bilayer network. The results here set the basis for the study of the mechanisms for top-down recruitment in multilayer networks in our accompanying work [19].

### Literature review

In this work we use dynamical networks with linear-threshold nonlinearities (also called rectified linear units, ReLU, in machine learning) to model the activity of neuronal populations. Linear-threshold models allow for a unique combination of the tractability of linear systems and the dynamical versatility of nonlinear systems, and have thus been widely used in computational neuroscience. They were first proposed as a model for the lateral eye of the horseshoe crab in [20], and their dynamical behavior has been studied at least as early as [21]. A detailed stability analysis of symmetric (undirected) linear-threshold networks has been carried out in continuous [22] and discrete [23] time; however, this has limited relevance for biological neuronal networks, which are fundamentally asymmetric (due to the presence of excitatory and inhibitory neurons). An initial summary of the properties of general (possibly asymmetric) networks, including the existence and uniqueness of equilibria and asymptotic stability, was given in [24], with limited rigorous justification provided later in [25]. Lyapunov-based methods were used in a number of later studies of discrete-time linear-threshold networks [26, 27, 28], but the extension of these results to continuous-time dynamics, which is more relevant to biological neuronal networks, is not clear. In fact, the use of Lyapunov-based techniques in continuous-time networks has remained limited to planar dynamics [29] and restrictive conditions for the boundedness of trajectories [30, 29]. Recently, [31] presented interesting properties of competitive (i.e., fully inhibitory) linear-threshold networks, particularly regarding the emergence of limit cycles. However, the majority of neurons in biological neuronal networks are excitatory, making the implications of these results limited. Moreover, all the preceding works are limited to networks with constant exogenous inputs, whereas time-varying inputs are essential for modeling inter-layer connections in HSR.

A critical property of linear-threshold networks is that their nonlinearity, while enriching their behavior beyond that of linear systems, is piecewise linear. Accordingly, almost all theoretical analysis of these networks builds upon their formulation as switched affine systems. There exists a vast literature on the analysis of general switched linear/affine systems, see, e.g., [32, 33, 34]. Nevertheless, we have found that the conditions obtained by applying these results to linear-threshold dynamics are more conservative than the ones we obtain by direct analysis of the system dynamics. This is mainly due to the fact that such results, by the very nature of their generality, are oblivious to the particular structure of linear-threshold dynamics that can be leveraged in a direct analysis.

Selective inhibition has been the subject of extensive research in neuroscience. A number of early studies [4, 11, 12] provided evidence for a mechanism of selective visual attention based on a biased competition between the subnetwork of task-relevant nodes and the subnetwork of task-irrelevant ones. In this model, nodes belonging to these subnetworks compete at each layer by mutually suppressing each other's activity, and this competition is biased towards the task-relevant nodes by the layer immediately above. Later studies [13, 14] further supported this theory using functional magnetic resonance imaging (fMRI) and showed [35], in particular, the suppression of activity of task-irrelevant nodes as a result of GDSA. This suppression of activity has further been shown to occur in multiple layers along the hierarchy [36], to grow with increasing attention [37, 38], and to be inversely related to the power of the task-irrelevant nodes' state trajectories in the alpha frequency band [16]. Here, we use insights from this body of work in developing a theoretical framework for selective inhibition.

### Statement of contributions

The contributions of the paper are twofold. First, we analyze the internal dynamics of a single-layer linear-threshold network as a basis for our study of hierarchical structures. Our results are a combination of previously known results (for which we give simpler proofs) and novel ones, providing a comprehensive characterization of the dynamical properties of linear-threshold networks. Specifically, we show that existence and uniqueness of equilibria, asymptotic stability, and boundedness of trajectories can be characterized using simple algebraic conditions on the network structure in terms of the class of P-matrices (matrices with positive principal minors), totally-Hurwitz matrices (those with Hurwitz principal submatrices, shown to be a sub-class of P-matrices), and Schur-stable matrices, respectively. In addition to forming the basis of HSR, these results solve some long-standing open problems in the characterization of linear-threshold networks and are of independent interest. Our second contribution pertains to the problem of selective inhibition in a bilayer network composed of two subnetworks. Motivated by the mechanisms of inhibition in the brain, we study feedforward and feedback inhibition mechanisms. We provide necessary and sufficient conditions on the network structure that guarantee selective inhibition of the task-irrelevant nodes at the lower level while simultaneously guaranteeing various dynamical properties of the resulting (partly inhibited, partly active) subnetwork, including existence and uniqueness of equilibria and asymptotic stability. Interestingly, under both mechanisms, these conditions require that (i) there exist at least as many independent inhibitory control inputs as the number of nodes to be inhibited, and (ii) the (not-inhibited) task-relevant part of the lower-level subnetwork intrinsically satisfies the desired dynamical properties.
Therefore, when sufficiently many inhibitory control inputs exist, the intrinsic dynamical properties of the task-relevant part are the sole determiner of the dynamical properties achievable under feedforward and feedback inhibitory control. This is particularly important for selective inhibition, as asymptotic stability underlies it. These results unveil the important role of task-relevant nodes in constraining the dynamical properties achievable under selective inhibition, and have further implications for the number and centrality of the nodes that need to be inhibited in order for a subnetwork that is unstable in isolation to gain stability through selective inhibition. For subnetworks that are not stable as a whole, these results provide conditions on the task-relevant/irrelevant partitioning of the nodes that allow for stabilization using inhibitory control.

## II Preliminaries

Here, we introduce notational conventions and review basic concepts on matrix analysis and modeling of biological neuronal networks.

### Notation

We use ℝ, ℝ≥0, and ℝ≤0 to denote the set of reals, nonnegative reals, and nonpositive reals, respectively. We use bold-faced letters for vectors and matrices. 1ₙ, 0ₙ, 0ₘₓₙ, and Iₙ stand for the n-vector of all ones, the n-vector of all zeros, the m-by-n zero matrix, and the n-by-n identity matrix (we omit the subscripts when clear from the context). Given a vector x, xᵢ and (x)ᵢ refer to its ith component. Given a matrix A, aᵢⱼ refers to its (i, j)th entry. For block-partitioned x and A, xᵢ, Aᵢ, and Aᵢⱼ refer to the ith block of x, the ith block (e.g., block of rows) of A, and the (i, j)th block of A, respectively. In block representations of matrices, ⋆ denotes arbitrary blocks whose value is immaterial to the discussion. For A ∈ ℝᵐˣⁿ, range(A) denotes the subspace of ℝᵐ spanned by the columns of A. If x and y are vectors, x ≤ y denotes xᵢ ≤ yᵢ for all i. For symmetric A, A ≻ 0 (A ≺ 0) denotes that A is positive (negative) definite. Given A, its element-wise absolute value, determinant, spectral radius, and induced 2-norm are denoted by |A|, det(A), ρ(A), and ‖A‖, respectively. Similarly, for a vector x, ‖x‖ is its 2-norm. For x ∈ ℝⁿ, we make the convention that diag(x) denotes the diagonal matrix with the elements of x on its diagonal. Likewise, for two matrices A and B, diag(A, B) denotes the block-diagonal matrix with A and B on its diagonal. Given a subspace S of a vector space V, S⊥ denotes the orthogonal complement of S in V. For a set S, |S| denotes its cardinality. For x ∈ ℝ, [x]₊ ≜ max(x, 0), which is extended entry-wise to vectors and matrices.

In ℝⁿ, a hyperplane with normal vector c ∈ ℝⁿ passing through x₀ ∈ ℝⁿ is the (n−1)-dimensional space {x ∈ ℝⁿ | cᵀ(x − x₀) = 0}. A set of n hyperplanes is degenerate [39] if their intersection is not a single point or, equivalently, if the matrix composed of their normal vectors is singular. A set S ⊆ ℝⁿ is called
• a polytope if it has the form {x ∈ ℝⁿ | Ax ≤ b} for some A ∈ ℝᵐˣⁿ and b ∈ ℝᵐ,

• a cone if αx ∈ S for any x ∈ S and α ≥ 0,

• a translated cone apexed at x₀ if S − x₀ = {x − x₀ | x ∈ S} is a cone,

• convex if αx + (1 − α)y ∈ S for any x, y ∈ S and α ∈ [0, 1],

• solid if it has a non-empty interior.

Throughout the paper, measure-theoretic statements are meant in the Lebesgue sense.

### Matrix Analysis

Here, we define and characterize several matrix classes of interest and their inclusion relationships. These matrices play a key role in the forthcoming discussion.

###### Definition II.1.

(Matrix classes). A matrix W ∈ ℝⁿˣⁿ (not necessarily symmetric) is

1. absolutely Schur stable if ρ(|W|) < 1;

2. totally L-stable, denoted W ∈ 𝓛, if there exists P = Pᵀ ≻ 0 such that (−I + ΣW)ᵀP + P(−I + ΣW) ≺ 0 for all σ ∈ {0, 1}ⁿ and Σ = diag(σ);

3. totally Hurwitz, denoted W ∈ 𝓗, if all the principal submatrices of W are Hurwitz;

4. a P-matrix, denoted W ∈ 𝓟, if all the principal minors of W are positive.
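To make these classes concrete, the last two definitions admit the following brute-force membership tests (a numerical sketch, not part of the paper's formal development; `numpy` is assumed, and the example matrix is hypothetical):

```python
import itertools

import numpy as np

def is_P_matrix(A, tol=1e-9):
    """Definition II.1: every principal minor of A is positive."""
    n = A.shape[0]
    return all(
        np.linalg.det(A[np.ix_(idx, idx)]) > tol
        for k in range(1, n + 1)
        for idx in itertools.combinations(range(n), k)
    )

def is_totally_hurwitz(A, tol=1e-9):
    """Definition II.1: every principal submatrix of A is Hurwitz."""
    n = A.shape[0]
    return all(
        np.max(np.linalg.eigvals(A[np.ix_(idx, idx)]).real) < -tol
        for k in range(1, n + 1)
        for idx in itertools.combinations(range(n), k)
    )

# A small totally-Hurwitz example (hypothetical values); consistent with
# Lemma II.3, its negation turns out to be a P-matrix.
A = np.array([[-2.0, 0.5],
              [0.3, -1.5]])
```

Both tests enumerate all 2ⁿ − 1 principal index sets, which is exactly the exponential cost alluded to in Remark IV.5 below.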

In working with P-matrices, the principal pivot transform of a matrix plays an important role. Given

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},$$

with nonsingular A₂₂, its principal pivot transform is the matrix

$$\pi(A) \triangleq \begin{bmatrix} A_{11} - A_{12}A_{22}^{-1}A_{21} & A_{12}A_{22}^{-1} \\ -A_{22}^{-1}A_{21} & A_{22}^{-1} \end{bmatrix}.$$

Note that π(π(A)) = A. The next result formalizes several equivalent characterizations of P-matrices.
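Numerically (a sketch assuming `numpy`; the random test matrix is arbitrary, with a diagonal shift keeping the pivot block A₂₂ nonsingular), the transform and its involution property can be checked as follows:

```python
import numpy as np

def ppt(A, k):
    """Principal pivot transform of A, pivoting on the trailing block
    A22 = A[k:, k:] (assumed nonsingular), following the block formula above."""
    A11, A12 = A[:k, :k], A[:k, k:]
    A21, A22 = A[k:, :k], A[k:, k:]
    A22inv = np.linalg.inv(A22)
    return np.block([
        [A11 - A12 @ A22inv @ A21, A12 @ A22inv],
        [-A22inv @ A21, A22inv],
    ])

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5)) + 5.0 * np.eye(5)  # shift keeps A22 well-conditioned
assert np.allclose(ppt(ppt(A, 2), 2), A)       # involution: pi(pi(A)) = A
```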

###### Lemma II.2.

(Properties of P-matrices [40, 41]). A matrix A ∈ ℝⁿˣⁿ is a P-matrix if and only if any of the following holds:

1. Aᵀ is a P-matrix;

2. all real eigenvalues of all the principal submatrices of A are positive;

3. for any x ≠ 0 there is i ∈ {1, …, n} such that xᵢ(Ax)ᵢ > 0;

4. the principal pivot transform of A is a P-matrix.

The next result states inclusion relationships among the matrix classes in Definition II.1 that will be used in our ensuing discussion.

###### Lemma II.3.

(Inclusions among matrix classes). For W ∈ ℝⁿˣⁿ, we have

1. if ρ(|W|) < 1, then −I + W is totally Hurwitz;

2. if ‖W‖ < 1, then W is totally L-stable;

3. if W is totally L-stable, then −I + W is totally Hurwitz;

4. if −I + W is totally Hurwitz, then I − W is a P-matrix.

###### Proof.

(i) From [42, Fact 4.11.19], we have that ρ(Ŵ) ≤ ρ(|Ŵ|) for any principal submatrix Ŵ of W, which in turn implies ρ(Ŵ) ≤ ρ(|W|) < 1 by [42, Fact 4.11.17], implying the result.

(ii) It is straightforward to check that P = I satisfies (−I + ΣW)ᵀP + P(−I + ΣW) ≺ 0 for all σ ∈ {0, 1}ⁿ.

(iii) Pick an arbitrary σ ∈ {0, 1}ⁿ and let the permutation matrix Π be such that

$$\Pi \Sigma W \Pi^T = \begin{bmatrix} 0 & 0 \\ \hat{W}_{21} & \hat{W}_{22} \end{bmatrix},$$

where Ŵ₂₂ is the principal submatrix of W corresponding to the support of σ. Then

$$P(-I+\Sigma W) = \Pi^T \Big( \underbrace{\Pi P \Pi^T}_{\hat{P}} \begin{bmatrix} -I & 0 \\ \hat{W}_{21} & -I+\hat{W}_{22} \end{bmatrix} \Big) \Pi = \Pi^T \begin{bmatrix} \star & \star \\ \star & \hat{P}_{22}(-I+\hat{W}_{22}) \end{bmatrix} \Pi,$$

where P̂₂₂ is the corresponding block of P̂. Thus, by assumption,

$$\Pi^T \begin{bmatrix} \star & \star \\ \star & (-I+\hat{W}_{22}^T)\hat{P}_{22} + \hat{P}_{22}(-I+\hat{W}_{22}) \end{bmatrix} \Pi \prec 0 \;\Rightarrow\; (-I+\hat{W}_{22}^T)\hat{P}_{22} + \hat{P}_{22}(-I+\hat{W}_{22}) \prec 0,$$

proving that −I + Ŵ₂₂ is Hurwitz. Since σ is arbitrary, −I + W is totally Hurwitz.

(iv) The result follows from Lemma II.2(ii). ∎

For a general matrix W, neither of ‖W‖ and ρ(|W|) is bounded by the other. However, if W satisfies Dale's law (as biological neuronal networks do), i.e., each column of W is either nonnegative or nonpositive, then W = |W|D, where D is a diagonal matrix with diagonal entries ±1 (so that DDᵀ = I). Then,

$$\|W\| = \sqrt{\rho(WW^T)} = \sqrt{\rho(|W|DD^T|W|^T)} = \sqrt{\rho(|W||W|^T)} = \big\||W|\big\| \ge \rho(|W|),$$

showing that, in this case, ρ(|W|) < 1 is a less restrictive condition than ‖W‖ < 1. Figure 1 depicts a Venn diagram of the various matrix classes of interest to help visualize their relationships.
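This chain of identities is easy to confirm numerically (a sanity check with randomly generated weights, not a proof; `numpy` assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
signs = rng.choice([-1.0, 1.0], size=n)      # node types: +1 excitatory, -1 inhibitory
W = np.abs(rng.normal(size=(n, n))) * signs  # column j carries the sign of node j (Dale's law)

absW = np.abs(W)
norm_W = np.linalg.norm(W, 2)                # induced 2-norm
norm_absW = np.linalg.norm(absW, 2)
rho_absW = np.max(np.abs(np.linalg.eigvals(absW)))

assert np.isclose(norm_W, norm_absW)         # ||W|| = || |W| || under Dale's law
assert rho_absW <= norm_absW + 1e-9          # spectral radius <= induced norm
```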

### Dynamical Rate Models of Brain Networks

Here we briefly review, following [43, §7], the fundamental concepts and assumptions that underlie the linear-threshold network model used throughout the paper. In a lumped model, neural circuits are composed of neurons, each receiving an electrical signal at its dendrites from other neurons and generating an electrical response to other neurons at its axon. The transmission of activity from one neuron to another takes place at a synapse, thus the terms pre-synaptic and post-synaptic for the two neurons, respectively. Both the input and output signals mainly consist of a sequence of spikes (action-potentials), as shown in Figure 2 (top panel), which are modeled as impulse trains of the form

 ρ(t) = ∑ₖ δ(t − tₖ),

where δ(·) denotes the Dirac delta function. In many brain areas the exact timing of the spikes seems essentially random, with the information mainly encoded in the firing rate (number of spikes per second). Thus, ρ(t) is modeled as the sample path of an inhomogeneous Poisson point process with rate, say, x(t) (cf. Figure 2, bottom panel).

Now, consider a pair of pre- and post-synaptic neurons with rates x_pre and x_post, respectively. As a result of the pre-synaptic activity, an electrical current I_post forms in the post-synaptic neuron's dendrites and soma (body). Assuming fast synaptic dynamics, I_post(t) is proportional to x_pre(t). Let w denote the proportionality constant, so I_post(t) = w x_pre(t). The pre-synaptic neuron is called excitatory if w > 0 and inhibitory if w < 0. In other words, excitatory neurons increase the activity of their out-neighbors while inhibitory neurons decrease it. Notice that this is a property of neurons, not synapses, so a neuron either excites all its out-neighbors or inhibits all of them.

If the post-synaptic neuron receives input from multiple neurons, I_post(t) follows a superposition law,

 Ipost(t)=∑jwpost,jxj(t), (1)

where the sum is taken over its in-neighbors. If I_post is constant, the post-synaptic rate follows x_post = F(I_post), where F is a nonlinear “response function”. Among the two widely used response functions, namely, sigmoidal and linear-threshold, we use the latter: F(I) = [I]₊. Finally, if I_post is time-varying, x_post(t) “lags” its instantaneous response [I_post(t)]₊ with a time constant τ, i.e.,

 τ˙xpost(t)=−xpost(t)+[Ipost(t)]+. (2)

Equations (1)-(2) are the basis for our network model described next.

## III Problem Formulation

Consider a network of n neurons evolving according to (1)-(2). Since the number of neurons in a brain region is very large, it is common to consider a population of neurons with similar activation patterns as a single node. The “firing rate” of such a node is then defined as the average of the individual firing rates. This convention also has the advantage of yielding more consistent rates, as the firing pattern of individual neurons may at times be sparse. Accordingly, we use “node” and “population” interchangeably (our discussion is nevertheless valid irrespective of whether network nodes represent individual neurons or groups of them). Combining the nodal rates in a vector x and the synaptic weights in a matrix W, we obtain, according to (1)-(2), the linear-threshold network dynamics

 τ˙x(t)=−x(t)+[Wx(t)+d(t)]+,x(0)=x0. (3)

The extra term d(t) captures the external inputs to the network, including un-modeled background activity and possibly nonzero thresholds (i.e., if a node becomes active when its net input exceeds a threshold other than zero). Note that the right-hand side of (3) is a continuous (though not smooth) vector field in x, and thus solutions, in the classical sense, are well defined.
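For intuition, the dynamics (3) can be integrated with a simple forward-Euler scheme (an illustrative sketch, not part of the paper; the weights, input, and step size below are arbitrary choices, and `numpy` is assumed):

```python
import numpy as np

def simulate(W, d, x0, tau=1.0, dt=1e-3, T=20.0):
    """Forward-Euler integration of tau * dx/dt = -x + [W x + d]_+  (constant input d)."""
    x = np.array(x0, dtype=float)
    for _ in range(int(T / dt)):
        x = x + (dt / tau) * (-x + np.maximum(W @ x + d, 0.0))
    return x

# Dale-compliant toy weights: column 1 excitatory, column 2 inhibitory (hypothetical values).
W = np.array([[0.0, -0.5],
              [0.4,  0.0]])
d = np.array([1.0, 0.5])
x_inf = simulate(W, d, x0=np.zeros(2))  # settles at a fixed point of x = [W x + d]_+
```

With this choice ‖W‖ < 1, so the trajectory converges to the unique fixed point regardless of the initial condition.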

Consistent with the vision for hierarchical selective recruitment (HSR) outlined in Section I, we consider multiple interconnected layers of linear-threshold networks of the form (3), as depicted in Figure 3. We denote the overall neuronal network of networks by 𝒩 and the number of layers by N. We use 𝒩ʳ and 𝒩ⁱ to denote, respectively, the subnetworks of 𝒩 composed of task-relevant and task-irrelevant nodes. For each layer, we use the corresponding subscripts to denote that layer's subnetwork and its task-relevant and task-irrelevant sub-subnetworks, respectively, cf. Figure 3. We make the convention that layer indices increase as one moves up the hierarchy. The activity of each subnetwork is modeled by linear-threshold dynamics of the form (3), where the term Wx(t) represents the internal subnetwork connectivity and the term d(t) incorporates, in addition to background activity and nonzero thresholds, the incoming signals from the other subnetworks.

It is worth noticing that each layer of the network of networks exhibits rich dynamical behavior when considered in isolation. Indeed, simulations of the dynamics (3) with random instances of W and constant inputs d reveal that

• locally, the dynamics may have zero, one, or many equilibrium points, where each equilibrium may be stable or unstable independently of others,

• globally, the dynamics is capable of exhibiting different nonlinear phenomena such as limit cycles, multi-stability, and chaos,

• the state trajectories grow unbounded (in reality until saturation) if the excitatory subnetwork is sufficiently strong.
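The third observation is visible even in a one-node network: a toy comparison (with arbitrarily chosen constants, purely for illustration) of weak versus strong self-excitation:

```python
def simulate_scalar(w, d, x0=0.0, tau=1.0, dt=1e-2, T=20.0):
    """Forward-Euler integration of tau * dx/dt = -x + [w x + d]_+ for a single node."""
    x = x0
    for _ in range(int(T / dt)):
        x = x + (dt / tau) * (-x + max(w * x + d, 0.0))
    return x

x_stable = simulate_scalar(w=0.5, d=1.0)   # settles near x* = 2 (solves x = 0.5 x + 1)
x_growing = simulate_scalar(w=2.0, d=1.0)  # dx/dt = x + 1 while active: exponential growth
```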

This richness of behavior can only increase when the layer is subject to a time-varying external input d(t), and in particular when it is interconnected with other layers in the network of networks. Motivated by these observations, our ultimate goal in this work is to characterize the dynamics of complex networks composed of hierarchically-connected linear-threshold subnetworks and the conditions under which their collective dynamics can give rise to HSR. Specifically, we tackle the following problems:

(i) the analysis of the relationship between structure and dynamical behavior (basic properties such as existence and uniqueness of equilibria (EUE), asymptotic stability, and boundedness of trajectories) for each subnetwork when operating in isolation from the rest of the network;

(ii) the analysis of the conditions on the joint structure of each two successive layers that allow for selective inhibition of the task-irrelevant part of the lower layer by its input from the layer above, being equivalent to the stabilization of its state to the origin (inactivity);

(iii) the analysis of the conditions on the joint structure of each two successive layers that allow for top-down recruitment of the task-relevant part of the lower layer by its input from the layer above, being equivalent to the stabilization of its state toward a desired trajectory set by the layer above (activity);

(iv) the combination of (ii) and (iii) in a unified framework and its extension to the complete N-layer network of networks.

Problems (i) and (ii) are the focus of this paper, whereas we address problems (iii) and (iv) in the accompanying work [19].

## IV Internal Dynamics of Single-Layer Networks

In this section, we provide an in-depth study of the basic dynamical properties of the network dynamics (3) in isolation. In this case, the external input boils down to background activity and possibly nonzero thresholds, which are constant relative to the timescale of the network dynamics. The dynamics (3) thus simplify to

 τ˙x(t)=−x(t)+[Wx(t)+d]+,t≥0. (4)

In the following, we derive conditions in terms of the network structure for EUE, local/global asymptotic stability, and boundedness of trajectories.

### IV-A Dynamics as Switched Affine System

The nonlinear dynamics (4) is a switched affine system with 2ⁿ modes. Each mode of this system corresponds to a switching index σ ∈ {0, 1}ⁿ, where for each i, σᵢ = 1 if node i is active (i.e., (Wx(t) + d)ᵢ ≥ 0) and σᵢ = 0 if node i is inactive (i.e., (Wx(t) + d)ᵢ < 0). Clearly, the mode of the system varies with time, and within the mode corresponding to σ, we have

 [Wx(t)+d]+=Σ(Wx(t)+d),

where Σ ≜ diag(σ). This switched representation of the dynamics motivates the following assumptions on the weight matrix W.

###### Assumption 1.

Assume

1. W is nonsingular;

2. the matrices I − ΣW, Σ = diag(σ), are nonsingular for all σ ∈ {0, 1}ⁿ.

Assumption 1 is not a restriction in practice since the set of matrices W for which it is not satisfied can be expressed as a finite union of measure-zero sets, and hence has measure zero. By Assumption 1(i), the system of equations Wx + d = 0 defines a non-degenerate set of hyperplanes partitioning ℝⁿ into solid convex polytopic translated cones apexed at −W⁻¹d. For each σ ∈ {0, 1}ⁿ, let Ωσ be the associated switching region,

 Ωσ = {x ∈ ℝⁿ≥0 | (2Σ − I)(Wx + d) ≥ 0}.

The piecewise-affine dynamics (4) can be written in the equivalent form

 τ˙x=(−I+ΣW)x+Σd,∀x∈Ωσ, σ∈{0,1}n. (5)

Unlike linear systems, the existence of equilibria is not guaranteed for this system. In fact, for each σ ∈ {0, 1}ⁿ, according to (5), the point

 x∗σ=x∗σ(d)=(I−ΣW)−1Σd, (6)

is the corresponding equilibrium candidate. This equilibrium candidate is indeed an equilibrium if it belongs to the switching region where the description (5) is valid. We next identify conditions for this to be true.
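For small networks, this candidate-based characterization yields a brute-force procedure: form each candidate (6) and keep the ones that are genuine fixed points of (4) (an illustrative sketch assuming `numpy`; the example weights in the test are hypothetical):

```python
import itertools

import numpy as np

def equilibria(W, d, tol=1e-8):
    """Enumerate the candidates x*_sigma = (I - Sigma W)^{-1} Sigma d over all 2^n
    modes and keep those that are genuine fixed points of x = [W x + d]_+."""
    n = W.shape[0]
    found = []
    for bits in itertools.product([0.0, 1.0], repeat=n):
        Sigma = np.diag(bits)
        # Assumption 1(ii) makes I - Sigma W nonsingular for generic W.
        x = np.linalg.solve(np.eye(n) - Sigma @ W, Sigma @ d)
        if np.allclose(x, np.maximum(W @ x + d, 0.0), atol=tol):
            if not any(np.allclose(x, y, atol=tol) for y in found):
                found.append(x)
    return found
```

For instance, with the Dale-compliant choice W = [[0, −0.5], [0.4, 0]] and d = (1, 0.5), the procedure returns a single equilibrium, (0.625, 0.75): only one of the four mode candidates lands inside its own switching region.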

### IV-B Existence and Uniqueness of Equilibria

Here we characterize EUE for the dynamics (4). Given W, define the equilibria set-valued map h by

 h(d)≜{x∈Rn≥0|x=[Wx+d]+}. (7)

The map h can, in particular, take empty values. EUE then precisely corresponds to h being single-valued on ℝⁿ. If so, with a slight abuse of notation, we take h to be an ordinary function.

From the definition (6) of the equilibrium candidate, note that x∗σ ∈ Ωσ if and only if (2Σ − I)(Wx∗σ + d) ≥ 0. Then, using Assumption 1, and after some manipulations, we have

 Wx∗σ + d = W(I − ΣW)⁻¹Σd + d = (W⁻¹ − Σ)⁻¹Σd + d = (I − WΣ)⁻¹WΣd + d = [(I − WΣ)⁻¹WΣ + I]d = (I − WΣ)⁻¹d. (8)

Therefore,

 x∗σ ∈ h(d) ⇔ x∗σ ∈ Ωσ ⇔ Mσd ≥ 0, where Mσ ≜ (2Σ − I)(I − WΣ)⁻¹. (9)

Accordingly, for σ ∈ {0, 1}ⁿ, let

 Δσ≜{d∈Rn|Mσd≥0},

be the set of external inputs for which (4) has an equilibrium in Ωσ; this set is a closed convex polytopic cone.

Note that if d ∈ Δσ for exactly one σ, then a unique equilibrium exists according to (9). However, when d ∈ Δσ for multiple σ, the network may have either multiple equilibria or a unique one lying on the boundary between the corresponding switching regions. The next result shows that the quantities Mσd can be used to distinguish between these two latter cases.

###### Lemma IV.1.

(Existence of multiple equilibria). Assume W satisfies Assumption 1, d ∈ ℝⁿ is arbitrary, and Mσ is defined as in (9) for σ ∈ {0, 1}ⁿ. If there exist σ₁ ≠ σ₂ such that Mσ₁d ≥ 0 and Mσ₂d ≥ 0, then x∗σ₁ = x∗σ₂ if and only if (Mσ₁d)ᵢ = (Mσ₂d)ᵢ = 0 for all i such that σ₁,ᵢ ≠ σ₂,ᵢ.

###### Proof.

Clearly,

 x∗σ1=x∗σ2 ⇔Wx∗σ1+d=Wx∗σ2+d ⇔(I−WΣ1)−1d=(I−WΣ2)−1d, (10)

where we have used (8). Since both Mσ₁d and Mσ₂d are nonnegative, (10) holds if and only if (Mσ₁d)ᵢ = (Mσ₂d)ᵢ = 0 for any i such that σ₁,ᵢ ≠ σ₂,ᵢ, which is equivalent to the condition in the statement. ∎

Our next result provides an optimization-based condition for EUE that is both necessary and sufficient.

###### Proposition IV.2.

(Optimization-based condition for EUE). Let W satisfy Assumption 1 and Mσ be as defined in (9) for σ ∈ {0, 1}ⁿ. For d ∈ ℝⁿ, define μ₁(d) and μ₂(d) to be the largest and second largest elements of the set

 {mini=1,…,n(Mσd)i|σ∈{0,1}n},

respectively. Then, (4) has a unique equilibrium for each d ∈ ℝⁿ if and only if

 max∥d∥=1μ1(d)μ2(d)<0. (11)
###### Proof.

First, note that d = 0 is a degenerate case where the origin is the unique equilibrium, belonging to all Ωσ. For any d ≠ 0 and α > 0, Mσ(αd) ≥ 0 if and only if Mσd ≥ 0. Thus, EUE for all d ∈ ℝⁿ and EUE for all d with ‖d‖ = 1 are equivalent. The result then follows from Lemma IV.1 and the fact that, for any d with ‖d‖ = 1,

 μ₁(d)μ₂(d) < 0 ⇔ ∃ unique σ ∈ {0, 1}ⁿ such that Mσd ≥ 0. ∎

The optimization involved in (11) is usually highly non-convex. However, since the search space is compact, global search methods can be used to verify (11) numerically. Next, we give our main result regarding EUE, which is not only verifiable analytically but also provides insight into the class of matrices that satisfy EUE.
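Since (11) is hard to certify globally, a crude sampled surrogate is easy to code (an illustration under stated assumptions: `numpy`, randomly sampled unit inputs; a nonnegative product at any sample disproves EUE, while passing all samples is only numerical evidence, not a certificate):

```python
import itertools

import numpy as np

def mu12(W, d):
    """mu_1(d), mu_2(d): the two largest values of min_i (M_sigma d)_i over all
    modes, with M_sigma = (2 Sigma - I)(I - W Sigma)^{-1} as in (9)."""
    n = W.shape[0]
    vals = sorted(
        (np.min((2.0 * np.diag(bits) - np.eye(n))
                @ np.linalg.inv(np.eye(n) - W @ np.diag(bits)) @ d)
         for bits in itertools.product([0.0, 1.0], repeat=n)),
        reverse=True,
    )
    return vals[0], vals[1]

def eue_sampled(W, n_samples=500, seed=0):
    """Check mu_1(d) * mu_2(d) < 0 on random unit vectors d (condition (11))."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    for _ in range(n_samples):
        d = rng.normal(size=n)
        d /= np.linalg.norm(d)
        m1, m2 = mu12(W, d)
        if m1 * m2 >= 0.0:
            return False
    return True
```

The 2ⁿ-mode enumeration inside `mu12` limits this to small networks, mirroring the complexity discussion in Remark IV.5.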

###### Theorem IV.3.

(Existence and uniqueness of equilibria). Consider the network dynamics (4) and assume the weight matrix W satisfies Assumption 1. Then, (4) has a unique equilibrium for each d ∈ ℝⁿ if I − W is a P-matrix.

###### Proof.

The uniqueness can be shown as a corollary to both [46, Thm 5.3] and [47, Thm 2.2]. However, we provide in the following a novel proof technique based on (9) that establishes both existence and uniqueness. From Lemma IV.1, we know that, for any d ∈ ℝⁿ, (4) has a unique equilibrium if and only if exactly one element of {Mσd | σ ∈ {0, 1}ⁿ} is nonnegative. To check this, we need to check whether

 ∃σ1,σ2∈{0,1}ns.t.Mσ1d≥0  and  Mσ2d≥0.

Note that this is equivalent to saying that there exist x, y ∈ ℝⁿ≥0 ∖ {0} such that M⁻¹σ₁x = M⁻¹σ₂y. A more general question, which is relevant to our discussion, is whether

 M−1σ1x=M−1σ2y,x,y∈On∖{0}, (12)

for any orthant Oⁿ of ℝⁿ (including ℝⁿ≥0 as a special case). Note that σ₁ = σ₂ is a trivial case and thus excluded. This is the question we analyze in the following. Notice that for any σ ∈ {0, 1}ⁿ,

 M−1σ=(I−WΣ)(2Σ−I)=(2Σ−I)−WΣ. (13)

Since nodes can be relabeled arbitrarily, we can assume without loss of generality that σ₁ and σ₂ have the partitioned forms σ₁ = (1ₙ₁, 1ₙ₂, 0ₙ₃, 0ₙ₄) and σ₂ = (1ₙ₁, 0ₙ₂, 1ₙ₃, 0ₙ₄), where n₁ + n₂ + n₃ + n₄ = n. Then, it follows from (13) that

$$M_{\sigma_1}^{-1} = \begin{bmatrix} I_{n_1}-W_{11} & -W_{12} & 0 & 0 \\ -W_{21} & I_{n_2}-W_{22} & 0 & 0 \\ -W_{31} & -W_{32} & -I_{n_3} & 0 \\ -W_{41} & -W_{42} & 0 & -I_{n_4} \end{bmatrix}, \qquad M_{\sigma_2}^{-1} = \begin{bmatrix} I_{n_1}-W_{11} & 0 & -W_{13} & 0 \\ -W_{21} & -I_{n_2} & -W_{23} & 0 \\ -W_{31} & 0 & I_{n_3}-W_{33} & 0 \\ -W_{41} & 0 & -W_{43} & -I_{n_4} \end{bmatrix},$$

where the Wᵢⱼ's are submatrices of W with appropriate dimensions. Taking the inverse of M⁻¹σ₁ as a 2-by-2 block-triangular matrix [42, Prop 2.8.7] (with the indicated blocks), we get

$$M_{\sigma_1} = \begin{bmatrix} \begin{bmatrix} I_{n_1}-W_{11} & -W_{12} \\ -W_{21} & I_{n_2}-W_{22} \end{bmatrix}^{-1} & 0 \\[6pt] -\begin{bmatrix} W_{31} & W_{32} \\ W_{41} & W_{42} \end{bmatrix}\begin{bmatrix} I_{n_1}-W_{11} & -W_{12} \\ -W_{21} & I_{n_2}-W_{22} \end{bmatrix}^{-1} & -I_{n_3+n_4} \end{bmatrix},$$

so direct multiplication gives $M_{\sigma_1}M_{\sigma_2}^{-1} = \begin{bmatrix} B_1 & B_2 \\ B_3 & B_4 \end{bmatrix}$, with

$$B_1 = \begin{bmatrix} I_{n_1}-W_{11} & -W_{12} \\ -W_{21} & I_{n_2}-W_{22} \end{bmatrix}^{-1}\begin{bmatrix} I_{n_1}-W_{11} & 0 \\ -W_{21} & -I_{n_2} \end{bmatrix}, \qquad B_2 = \begin{bmatrix} I_{n_1}-W_{11} & -W_{12} \\ -W_{21} & I_{n_2}-W_{22} \end{bmatrix}^{-1}\begin{bmatrix} -W_{13} & 0 \\ -W_{23} & 0 \end{bmatrix},$$

$$B_3 = -\begin{bmatrix} W_{31} & W_{32} \\ W_{41} & W_{42} \end{bmatrix} B_1 + \begin{bmatrix} W_{31} & 0 \\ W_{41} & 0 \end{bmatrix}, \qquad B_4 = -\begin{bmatrix} W_{31} & W_{32} \\ W_{41} & W_{42} \end{bmatrix} B_2 - \begin{bmatrix} I_{n_3}-W_{33} & 0 \\ -W_{43} & -I_{n_4} \end{bmatrix}.$$

With this, after some computations one can show that

 (14)

Let Γ be the bracketed block in (14). Similarly, let x₂₃ and y₂₃ denote the corresponding (n₂+n₃)-dimensional sub-vectors in the decompositions of x and y, respectively. Then, according to Lemma A.1, (12) holds if and only if y₂₃ = 0 or

 y23,Γy23∈On2+n3∖{0}, (15)

for some orthant . To check (15), define

$$Q = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix} \triangleq \begin{bmatrix} I_{n_1}-W_{11} & -W_{12} \\ -W_{21} & I_{n_2}-W_{22} \end{bmatrix}^{-1}, \qquad R \triangleq I_{n_3} - W_{33} - \begin{bmatrix} W_{31} & W_{32} \end{bmatrix} Q \begin{bmatrix} W_{13} \\ W_{23} \end{bmatrix}.$$

It can be shown that

Inverting the left-hand-side matrix as a 2-by-2 block matrix [42, Prop 2.8.7] (the first block is ) and applying the matrix inversion lemma [42, Cor 2.8.8] to the first block of the result, we obtain

$$\begin{bmatrix} I_{n_1}-W_{11} & -W_{12} & -W_{13} \\ -W_{21} & I_{n_2}-W_{22} & -W_{23} \\ -W_{31} & -W_{32} & I_{n_3}-W_{33} \end{bmatrix}^{-1} = \begin{bmatrix} \star & \star & \star \\ \star & \hat{B}_1 & \hat{B}_2 \\ \star & \hat{B}_3 & \hat{B}_4 \end{bmatrix},$$

where

$$\hat{B}_1 = Q_{22} + \begin{bmatrix} Q_{21} & Q_{22} \end{bmatrix}\begin{bmatrix} W_{13} \\ W_{23} \end{bmatrix} R^{-1} \begin{bmatrix} W_{31} & W_{32} \end{bmatrix}\begin{bmatrix} Q_{12} \\ Q_{22} \end{bmatrix}, \qquad \hat{B}_2 = \begin{bmatrix} Q_{21} & Q_{22} \end{bmatrix}\begin{bmatrix} W_{13} \\ W_{23} \end{bmatrix} R^{-1},$$

$$\hat{B}_3 = R^{-1}\begin{bmatrix} W_{31} & W_{32} \end{bmatrix}\begin{bmatrix} Q_{12} \\ Q_{22} \end{bmatrix}, \qquad \hat{B}_4 = R^{-1}.$$

Therefore, the matrix composed of the blocks B̂₁, …, B̂₄ is a principal pivot transform of a principal submatrix of I − W. Since I − W is a P-matrix, Lemma II.2(iv) guarantees that Γ is a P-matrix as well, which by Lemma II.2(iii) implies that for every nonzero y₂₃ there exists an index i such that (y₂₃)ᵢ(Γy₂₃)ᵢ > 0, i.e., there does not exist any orthant such that (15) holds, in which case, by (14), y₂₃ = 0. Therefore, we have shown that, for any y,

 y,Mσ1M−1σ2y∈On for some orthant On ⇔y=Mσ1M−1σ2y⇔y23=0.

Recalling that x = Mσ₁d and y = Mσ₂d, we have

 Mσ1d, Mσ2d ∈ On for some orthant On (16)

Using Lemma IV.1, the first equivalence in (16) for the particular case of the positive orthant shows the uniqueness of the equilibrium. To prove existence, consider the following assignment procedure. For any σ ∈ {0, 1}ⁿ, let Oⁿ(σ) be the orthant that contains Mσd (if there is more than one such orthant, we pick one arbitrarily). Define

 SOn={σ∈{0,1}n|Mσd∈On}.

If |S_Oⁿ| = 1, we assign its unique element to Oⁿ. If |S_Oⁿ| > 1, it follows from (16) that

1. is the same for all ; let be this shared value,

2. for some ,

3. there exists with such that for any , is the same for all ,

4. for all .

Due to (iv), belongs to the intersection of orthants ( and other orthants that differ from along the dimensions in ). Therefore, we can assign each of the elements of to its corresponding orthant from the orthants that contain . By repeating this procedure for all unassigned , we can uniquely assign every to an orthant in such that no two are assigned to the same orthant. Therefore, one must be assigned to the positive orthant , showing the existence of an equilibrium and completing the proof. ∎

The fact that the condition in Lemma II.2(iii) is an equivalent characterization of P-matrices suggests that the sufficient condition of Theorem IV.3 is tight. Indeed, extensive simulations with random matrices did not reveal any instance of W such that I − W is not a P-matrix but for which (4) has a unique equilibrium for all d (where we checked the latter using Proposition IV.2). This leads us to the following conjecture, which was also made in [24] as a claim without proof.

###### Conjecture IV.4.

(Necessity of the P-matrix condition for EUE). Assume the weight matrix W satisfies Assumption 1. Then, (4) has a unique equilibrium for all d ∈ ℝⁿ if and only if I − W is a P-matrix.

###### Remark IV.5.

(Computational complexity of verifying the P-matrix condition). Although the problem of determining whether a matrix is a P-matrix is straightforward for small n, it is known to be co-NP-complete [48], and thus expensive for large networks. Indeed, [49] shows that all the principal minors of a matrix have to be checked to prove that it is a P-matrix (though disproving it is usually much easier). In these cases, one may need to rely on more conservative sufficient conditions such as ρ(|W|) < 1 or ‖W‖ < 1 (cf. Lemma II.3) to establish the P-matrix condition. These conditions, moreover, have the added benefit of providing intuitive connections between the distribution of synaptic weights, network size, and stability. We elaborate more on this point in Section V-C.

###### Example IV.6.

(Uniform excitatory-inhibitory networks). Consider a network of n nodes in which some are excitatory and the rest are inhibitory, and in which the synaptic weight between any pair of nodes only depends on their types (e.g., the synaptic weight of any inhibitory-to-excitatory connection is the same, and similarly for the other type combinations). Also, assume common external inputs