I. Introduction
The human brain is constantly under an influx of sensory inputs and is responsible for integrating and interpreting them to generate appropriate decisions and actions. This influx contains not only the pieces of information relevant to the present task(s), but also a myriad of distractions. Goal-driven selective attention (GDSA) refers to the active selective processing of a subset of the information influx while suppressing the effects of the rest, and is vital for the proper function of the brain. (Footnote 1: Note the distinction between this and stimulus-driven selective attention, i.e., the reactive shift of focus based on the saliency of stimuli, which is not the focus of this work.) Examples of GDSA range from selective audition in a crowded place to selective vision in cluttered environments to selective taste/smell in food. As a result, a long-standing question in neuroscience involves understanding the brain's complex mechanisms underlying selective attention [2, 3, 4, 5, 6, 7].
A central element in addressing this question is the role played by the hierarchical organization of the brain [8]. Broadly, this organization places primary sensory and motor areas at the bottom and integrative association areas (the prefrontal cortex in particular) at the top. Accordingly, sensory information is processed while flowing up the hierarchy, where decisions are eventually made and transmitted back down the hierarchy to generate motor actions. (Footnote 2: Note that the role of memory, being distributed across the brain, is implicit in this simplified stimulus-response description. Indeed, many sensory inputs only form memories (without a motor response) and many motor actions result chiefly from memory (without sensory stimulation). The hierarchical aspect is, nevertheless, always present.) The top-down direction is also responsible for GDSA, where the higher-order areas differentially "modulate" the activity of the lower-level areas such that only relevant information is further processed. This phenomenon, hereafter called hierarchical selective recruitment (HSR), constitutes the basis for GDSA and has been the subject of extensive experimental research in neuroscience, see, e.g., [9, 10, 11, 12, 4, 13, 14, 15, 16, 17, 18]. However, a complete understanding of how, when (how quickly), or where (within the hierarchy) it occurs is still lacking. In particular, the relationship between HSR and the dynamics of the involved neuronal networks is poorly understood. Our goal is to address this gap from a model-based perspective, resorting to control-theoretic tools to explain various aspects of HSR in terms of the synaptic network structure and the dynamics that emerge from it.
(Footnote 3: We focus primarily on GDSA because of its long history in neuroscience. Nevertheless, HSR is a generic framework for top-down discrimination between two or more groups of competing populations that may represent thoughts, memories, emotions, or motor actions in addition to sensory information.)

The starting point in the development of HSR is the observation that different stimuli, in particular the task-relevant and the task-irrelevant ones, are processed by different populations of neurons. With each neuronal population represented by a node in the overall neuronal network of networks, HSR primarily relies on the selective inhibition of the task-irrelevant nodes and the top-down recruitment of the task-relevant nodes of each layer by the layer immediately above. This scheme is motivated by the observation that GDSA is not confined to a single layer, but is rather distributed in a graded manner along the thalamocortical hierarchy [8]. Here, we analyze the dynamics of individual layers as well as the mechanisms for selective inhibition in a bilayer network. These results set the basis for the study of the mechanisms for top-down recruitment in multilayer networks in our accompanying work [19].
Literature review
In this work we use dynamical networks with linear-threshold nonlinearities (also called rectified linear units, or ReLUs, in machine learning) to model the activity of neuronal populations. Linear-threshold models allow for a unique combination of the tractability of linear systems and the dynamical versatility of nonlinear systems, and thus have been widely used in computational neuroscience. They were first proposed as a model for the lateral eye of the horseshoe crab in [20], and their dynamical behavior has been studied at least as early as [21]. A detailed stability analysis of symmetric (undirected) linear-threshold networks has been carried out in continuous [22] and discrete [23] time; however, this has limited relevance for biological neuronal networks, which are fundamentally asymmetric (due to the presence of excitatory and inhibitory neurons). An initial summary of the properties of general (possibly asymmetric) networks, including the existence and uniqueness of equilibria and asymptotic stability, was given in [24], with limited rigorous justification provided later in [25]. Lyapunov-based methods were used in a number of later studies for discrete-time linear-threshold networks [26, 27, 28], but the extension of these results to continuous-time dynamics, which is more relevant to biological neuronal networks, is not clear. In fact, the use of Lyapunov-based techniques in continuous-time networks has remained limited to planar dynamics [29] and restrictive conditions for boundedness of trajectories [30, 29]. Recently, [31] presented interesting properties of competitive (i.e., fully inhibitory) linear-threshold networks, particularly regarding the emergence of limit cycles. However, the majority of neurons in biological neuronal networks are excitatory, limiting the implications of these results. Moreover, all the preceding works are limited to networks with constant exogenous inputs, whereas time-varying inputs are essential for modeling interlayer connections in HSR.

A critical property of linear-threshold networks is that their nonlinearity, while enriching their behavior beyond that of linear systems, is piecewise linear. Accordingly, almost all theoretical analysis of these networks builds upon their formulation as switched affine systems. There exists a vast literature on the analysis of general switched linear/affine systems, see, e.g., [32, 33, 34]. Nevertheless, we have found that the conditions obtained by applying these results to linear-threshold dynamics are more conservative than the ones we obtain via direct analysis of the system dynamics. This is mainly because such results, by the very nature of their generality, are oblivious to the particular structure of linear-threshold dynamics that can be leveraged in direct analysis.
Selective inhibition has been the subject of extensive research in neuroscience. A number of early studies [4, 11, 12] provided evidence for a mechanism of selective visual attention based on a biased competition between the subnetwork of task-relevant nodes and the subnetwork of task-irrelevant ones. In this model, nodes belonging to these subnetworks compete at each layer by mutually suppressing each other's activity, and this competition is biased towards the task-relevant nodes by the layer immediately above. Later studies [13, 14] further supported this theory using functional magnetic resonance imaging (fMRI) and showed [35], in particular, the suppression of the activity of task-irrelevant nodes as a result of GDSA. This suppression of activity has further been shown to occur at multiple layers along the hierarchy [36], to grow with increasing attention [37, 38], and to be inversely related to the power of the task-irrelevant nodes' state trajectories in the alpha frequency band [16]. Here, we use insights from this body of work in developing a theoretical framework for selective inhibition.
Statement of contributions
The contributions of the paper are twofold. First, we analyze the internal dynamics of a single-layer linear-threshold network as a basis for our study of hierarchical structures. Our results are a combination of previously known results (for which we give simpler proofs) and novel ones, providing a comprehensive characterization of the dynamical properties of linear-threshold networks. Specifically, we show that existence and uniqueness of equilibria, asymptotic stability, and boundedness of trajectories can be characterized using simple algebraic conditions on the network structure in terms of the classes of P-matrices (matrices with positive principal minors), totally-Hurwitz matrices (those with Hurwitz principal submatrices, shown to be a subclass of P-matrices), and Schur-stable matrices, respectively. In addition to forming the basis of HSR, these results solve some long-standing open problems in the characterization of linear-threshold networks and are of independent interest. Our second contribution pertains to the problem of selective inhibition in a bilayer network composed of two subnetworks. Motivated by the mechanisms of inhibition in the brain, we study feedforward and feedback inhibition mechanisms. We provide necessary and sufficient conditions on the network structure that guarantee selective inhibition of task-irrelevant nodes at the lower level while simultaneously guaranteeing various dynamical properties of the resulting (partly inhibited, partly active) subnetwork, including existence and uniqueness of equilibria and asymptotic stability. Interestingly, under both mechanisms, these conditions require that (i) there exist at least as many independent inhibitory control inputs as there are nodes to be inhibited, and (ii) the (not-inhibited) task-relevant part of the lower-level subnetwork intrinsically satisfy the desired dynamical properties.

Therefore, when sufficiently many inhibitory control inputs exist, the intrinsic dynamical properties of the task-relevant part are the sole determinant of the dynamical properties achievable under feedforward and feedback inhibitory control. This is particularly important for selective inhibition, as asymptotic stability underlies it. These results unveil the important role of task-relevant nodes in constraining the dynamical properties achievable under selective inhibition, and have further implications for the number and centrality of the nodes that need to be inhibited in order for an unstable-in-isolation subnetwork to gain stability through selective inhibition. For subnetworks that are not stable as a whole, these results provide conditions on the task-relevant/irrelevant partitioning of the nodes that allow for stabilization using inhibitory control.
II. Preliminaries
Here, we introduce notational conventions and review basic concepts on matrix analysis and modeling of biological neuronal networks.
Notation
We use ℝ, ℝ≥0, and ℝ≤0 to denote the set of reals, nonnegative reals, and nonpositive reals, respectively. We use boldfaced letters for vectors and matrices. 1_n, 0_n, 0_{m×n}, and I_n stand for the n-vector of all ones, the n-vector of all zeros, the m-by-n zero matrix, and the n-by-n identity matrix (we omit the subscripts when clear from the context). Given a vector x, x_i and (x)_i refer to its ith component. Given a matrix A, a_{ij} refers to its (i, j)th entry. For block-partitioned x and A, x^i, A^i, and A^{ij} refer to the ith block of x, the ith block (e.g., block row) of A, and the (i, j)th block of A, respectively. In block representations of matrices, ⋆ denotes arbitrary blocks whose values are immaterial to the discussion. For A ∈ ℝ^{m×n}, range(A) denotes the subspace of ℝ^m spanned by the columns of A. If x and y are vectors, x ≤ y denotes x_i ≤ y_i for all i. For symmetric A, A ≻ 0 (A ≺ 0) denotes that A is positive (negative) definite. Given a matrix A, its elementwise absolute value, determinant, spectral radius, and induced p-norm are denoted by |A|, det(A), ρ(A), and ‖A‖_p, respectively. Similarly, for a vector x, ‖x‖_p is its p-norm. For a vector v, diag(v) denotes the diagonal matrix with the elements of v on its diagonal. Likewise, for two matrices A and B, diag(A, B) denotes the block-diagonal matrix with A and B on its diagonal. Given a subspace S of a vector space V, S^⊥ denotes the orthogonal complement of S in V. For a finite set S, |S| denotes its cardinality. For x ∈ ℝ, [x]_+ = max(x, 0), which is extended entrywise to vectors and matrices.

In ℝ^n, a hyperplane with normal vector a passing through x_0 is the (n−1)-dimensional set {x ∈ ℝ^n : a^T(x − x_0) = 0}. A set of n hyperplanes in ℝ^n is nondegenerate [39] if their intersection is a point or, equivalently, the matrix composed of their normal vectors is nonsingular. A set S ⊆ ℝ^n is called
a polytope if it has the form {x ∈ ℝ^n : A x ≤ b} for some matrix A and vector b,

a cone if α x ∈ S for every x ∈ S and α ≥ 0,

a translated cone apexed at x_0 if S − x_0 = {x − x_0 : x ∈ S} is a cone,

convex if λ x + (1 − λ) y ∈ S for every x, y ∈ S and λ ∈ [0, 1],

solid if it has a nonempty interior.
Throughout the paper, measuretheoretic statements are meant in the Lebesgue sense.
Matrix Analysis
Here, we define and characterize several matrix classes of interest and their inclusion relationships. These matrices play a key role in the forthcoming discussion.
Definition II.1.
(Matrix classes). A matrix A ∈ ℝ^{n×n} (not necessarily symmetric) is

absolutely Schur stable if ρ(|A|) < 1;

totally stable, denoted , if there exists such that for all and ;

totally Hurwitz, denoted A ∈ H, if all the principal submatrices of A are Hurwitz;

a P-matrix, denoted A ∈ P, if all the principal minors of A are positive.
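These classes can be checked numerically by brute force. The sketch below (the function names are ours, not the paper's; enumerating principal submatrices is exponential in the matrix size, so this is only meant for small matrices) tests a generic square NumPy array:

```python
import numpy as np
from itertools import combinations

def spectral_radius(A):
    return np.max(np.abs(np.linalg.eigvals(A)))

def is_absolutely_schur_stable(A):
    # rho(|A|) < 1, with |A| the elementwise absolute value
    return spectral_radius(np.abs(A)) < 1

def principal_submatrices(A):
    n = A.shape[0]
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            yield A[np.ix_(idx, idx)]

def is_totally_hurwitz(A):
    # every principal submatrix has all eigenvalues in the open left half-plane
    return all(np.max(np.linalg.eigvals(S).real) < 0
               for S in principal_submatrices(A))

def is_p_matrix(A):
    # every principal minor is positive
    return all(np.linalg.det(S) > 0 for S in principal_submatrices(A))
```

For instance, if a matrix is totally Hurwitz then its negation is a P-matrix: each k-by-k principal minor of a Hurwitz matrix has sign (−1)^k, so the corresponding minor of the negation is positive. This is consistent with the inclusion of totally-Hurwitz matrices in the P-matrices noted in the statement of contributions.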
In working with P-matrices, the principal pivot transform of a matrix plays an important role. Given the block-partitioned matrix M = [A, B; C, D] with nonsingular A, its principal pivot transform is the matrix

ppt(M) = [A^{-1}, −A^{-1}B; CA^{-1}, D − CA^{-1}B].

Note that ppt(ppt(M)) = M. The next result formalizes several equivalent characterizations of P-matrices.
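The displayed formula above did not survive extraction; the sketch below assumes the standard principal pivot transform taken with respect to the leading block, a convention under which the transform is an involution, which is precisely the property noted above:

```python
import numpy as np

def principal_pivot_transform(M, k):
    """Principal pivot transform of M with respect to its leading k-by-k block.

    For M = [[A, B], [C, D]] with A nonsingular, returns
    [[inv(A), -inv(A) B], [C inv(A), D - C inv(A) B]].
    """
    A, B = M[:k, :k], M[:k, k:]
    C, D = M[k:, :k], M[k:, k:]
    Ainv = np.linalg.inv(A)
    return np.block([[Ainv, -Ainv @ B],
                     [C @ Ainv, D - C @ Ainv @ B]])
```

Applying the transform twice (with the same pivot block) recovers the original matrix.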
Lemma II.2.
The next result states inclusion relationships among the matrix classes in Definition II.1 that will be used in our ensuing discussion.
Lemma II.3.
(Inclusions among matrix classes). For , we have

;

;

;

.
Proof.
(i). From [42, Fact 4.11.19], we have that for any principal submatrix of , which in turn implies by [42, Fact 4.11.17], implying the result.
(ii) It is straightforward to check that satisfies for all .
(iii) Pick an arbitrary and let the permutation be such that
where is the principal submatrix of corresponding to . Then
where . Thus, by assumption,
proving that is Hurwitz. Since is arbitrary, is totally Hurwitz.
(iv) The result follows from Lemma II.2(ii). ∎
For a general matrix W, neither of ρ(|W|) and ‖W‖_2 is bounded by the other. However, if W satisfies Dale's law (as biological neuronal networks do), i.e., each column of W is either nonnegative or nonpositive, then |W| = W D, where D is a diagonal matrix with diagonal entries in {−1, 1}. Since such a D is orthogonal,

ρ(|W|) ≤ ‖|W|‖_2 = ‖W D‖_2 = ‖W‖_2,

showing that, in this case, ρ(|W|) < 1 is a less restrictive condition than ‖W‖_2 < 1. Figure 1 depicts a Venn diagram of the various matrix classes of interest to help visualize their relationships.
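The inline formulas in this passage were lost in extraction; under our (hedged) reading that the comparison is between the spectral radius ρ(|W|) and the induced 2-norm ‖W‖_2, the structural fact being used can be illustrated numerically (the weight values below are made up):

```python
import numpy as np

# A Dale's-law weight matrix: columns 0-1 are excitatory (nonnegative),
# column 2 is inhibitory (nonpositive). Values are illustrative.
W = np.array([[0.3, 0.5, -0.4],
              [0.2, 0.0, -0.6],
              [0.4, 0.1, -0.2]])

# Signature matrix: -1 on inhibitory columns, +1 on excitatory ones,
# chosen so that |W| = W @ D
D = np.diag(np.where((W <= 0).all(axis=0), -1.0, 1.0))

spec_rad = np.max(np.abs(np.linalg.eigvals(np.abs(W))))   # rho(|W|)
spec_norm = np.linalg.norm(W, 2)                          # ||W||_2
```

Because D has only ±1 diagonal entries it is orthogonal, so right-multiplication by D preserves the induced 2-norm, giving ρ(|W|) ≤ ‖|W|‖_2 = ‖W‖_2.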
Dynamical Rate Models of Brain Networks
Here we briefly review, following [43, §7], the fundamental concepts and assumptions that underlie the linear-threshold network model used throughout the paper. In a lumped model, neural circuits are composed of neurons, each receiving an electrical signal at its dendrites from other neurons and generating an electrical response to other neurons at its axon. The transmission of activity from one neuron to another takes place at a synapse, hence the terms presynaptic and postsynaptic for the two neurons, respectively. Both the input and output signals mainly consist of a sequence of spikes (action potentials), as shown in Figure 2 (top panel), which are modeled as impulse trains of the form

S(t) = Σ_k δ(t − t_k),

where δ(·) denotes the Dirac delta function and {t_k} are the spike times. In many brain areas the exact timing of the spikes seems essentially random, with the information mainly encoded in the firing rate (the number of spikes per second). Thus, S(t) is modeled as the sample path of an inhomogeneous Poisson point process with a rate, say, x(t) (cf. Figure 2, bottom panel).
Now, consider a pair of pre- and postsynaptic neurons with rates x_j and x_i, respectively. As a result of the presynaptic spiking, an electrical current I_i forms in the postsynaptic neuron's dendrites and soma (body). Assuming fast synaptic dynamics, I_i is proportional to x_j. Let w_{ij} be the proportionality constant, so I_i = w_{ij} x_j. The presynaptic neuron is called excitatory if w_{ij} > 0 and inhibitory if w_{ij} < 0. In other words, excitatory neurons increase the activity of their out-neighbors while inhibitory neurons decrease it. Notice that this is a property of neurons, not synapses, so a neuron either excites all of its out-neighbors or inhibits all of them.
If the postsynaptic neuron receives input from multiple neurons, I_i follows a superposition law,

I_i(t) = Σ_j w_{ij} x_j(t),   (1)

where the sum is taken over its in-neighbors. If I_i is constant, the postsynaptic rate follows x_i = F(I_i), where F is a nonlinear "response function". Of the two widely used response functions, namely the sigmoidal and the linear-threshold, we use the latter: F(I_i) = [I_i]_+ = max(I_i, 0). Finally, if I_i is time-varying, x_i "lags" F(I_i) with a time constant τ, i.e.,

τ ẋ_i(t) = −x_i(t) + F(I_i(t)).   (2)

Equations (1)-(2) are the basis for our network model described next.
III. Problem Formulation
Consider a network of n neurons evolving according to (1)-(2). Since the number of neurons in a brain region is very large, it is common to consider a population of neurons with similar activation patterns as a single node. The "firing rate" of such a node is then defined as the average of the individual firing rates. This convention also has the advantage of yielding more consistent rates, as the firing pattern of individual neurons may at times be sparse. Accordingly, we use "node" and "population" interchangeably. (Footnote 4: Our discussion is nevertheless valid irrespective of whether network nodes represent individual neurons or groups of them.) Combining the nodal rates in a vector x and the synaptic weights in a matrix W = [w_{ij}], we obtain, according to (1)-(2), the linear-threshold network dynamics

τ ẋ(t) = −x(t) + [W x(t) + u(t)]_+.   (3)

The extra term u(t) captures the external inputs to the network, including unmodeled background activity and possibly nonzero thresholds (i.e., if a node becomes active when its net input exceeds a threshold other than zero). Note that the right-hand side of (3) is a continuous (though not smooth) vector field in x, and thus solutions, in the classical sense, are well defined.
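As a quick numerical sanity check, the dynamics can be integrated with forward Euler; the two-node network, weights, input, and step size below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def simulate_lt(W, u, x0, tau=1.0, dt=0.01, T=20.0):
    """Forward-Euler integration of tau * dx/dt = -x + [W x + u]_+ ."""
    x = np.array(x0, dtype=float)
    for _ in range(int(T / dt)):
        x += (dt / tau) * (-x + np.maximum(W @ x + u, 0.0))
    return x

# Two mutually inhibiting nodes with a common excitatory drive
W = np.array([[0.0, -0.5],
              [-0.5, 0.0]])
u = np.array([1.0, 1.0])
x_final = simulate_lt(W, u, x0=[0.0, 0.0])
# Both nodes stay active and settle at x_i = 2/3, the solution of x = W x + u
```

With mutual inhibition of strength 0.5 and unit drive, the trajectory never leaves the all-active region, so the state converges to the unique equilibrium (2/3, 2/3).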
Consistent with the vision for hierarchical selective recruitment (HSR) outlined in Section I, we consider multiple interconnected layers of linear-threshold networks of the form (3), as depicted in Figure 3. We denote by  the overall neuronal network of networks and by  the number of layers. We use  and  to denote, respectively, the subnetworks of  composed of task-relevant and task-irrelevant nodes. For each layer , we use , , and  to denote the corresponding subnetwork and its task-relevant and task-irrelevant sub-subnetworks, respectively, cf. Figure 3. We make the convention that  is higher in the hierarchy than  if . The activity of each subnetwork is modeled by linear-threshold dynamics of the form (3), where the synaptic-weight term represents the internal subnetwork connectivity and the input term incorporates, in addition to background activity and nonzero thresholds, the incoming signals from the other subnetworks.
It is worth noticing that each layer of the network of networks exhibits rich dynamical behavior when considered in isolation. Indeed, simulations of the dynamics (3) with random instances of the weight matrix and constant external inputs reveal that

locally, the dynamics may have zero, one, or many equilibrium points, where each equilibrium may be stable or unstable independently of the others,

globally, the dynamics are capable of exhibiting different nonlinear phenomena such as limit cycles, multistability, and chaos,

the state trajectories grow unbounded (in reality, until saturation) if the excitatory subnetwork is sufficiently strong.
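The boundedness dichotomy in the last item is already visible in a single self-excited node (the parameter values below are illustrative): when the self-excitation weight w exceeds 1, the scalar dynamics τ ẋ = −x + [w x + u]_+ diverge, while for w < 1 the trajectory converges to u/(1 − w).

```python
def run_scalar(w, u, x0=1.0, tau=1.0, dt=0.01, steps=2000):
    """Forward-Euler integration of tau * dx/dt = -x + [w x + u]_+ for one node."""
    x = x0
    for _ in range(steps):
        x += (dt / tau) * (-x + max(w * x + u, 0.0))
    return x

x_diverging = run_scalar(w=1.5, u=1.0)   # effective dynamics dx/dt = 0.5 x + 1
x_settling = run_scalar(w=0.5, u=1.0)    # converges to u / (1 - w) = 2
```

After 20 time constants the first trajectory has grown by several orders of magnitude while the second has settled at 2, matching the closed-form solutions of the two affine modes.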
This richness of behavior can only increase if the layer is subject to a time-varying external input, and in particular when interconnected with other layers in the network of networks. Motivated by these observations, our ultimate goal in this work is to characterize the dynamics of complex networks composed of hierarchically connected linear-threshold subnetworks and the conditions under which their collective dynamics can give rise to HSR. Specifically, we tackle the following problems:


the analysis of the relationship between structure (the synaptic weight matrix) and dynamical behavior (basic properties such as existence and uniqueness of equilibria (EUE), asymptotic stability, and boundedness of trajectories) for each subnetwork when operating in isolation from the rest of the network;

the analysis of the conditions on the joint structure of every pair of successive layers that allow for selective inhibition of the task-irrelevant part of the lower layer by its input from the layer above, which is equivalent to the stabilization of its state to the origin (inactivity);

the analysis of the conditions on the joint structure of every pair of successive layers that allow for top-down recruitment of the task-relevant part of the lower layer by its input from the layer above, which is equivalent to the stabilization of its state toward a desired trajectory set by the layer above (activity);

the combination of (ii) and (iii) in a unified framework and its extension to the complete multilayer network of networks.
Problems (i) and (ii) are the focus of this paper, whereas we address problems (iii) and (iv) in the accompanying work [19].
IV. Internal Dynamics of Single-Layer Networks
In this section, we provide an in-depth study of the basic dynamical properties of the network dynamics (3) in isolation. In this case, the external input reduces to background activity and possibly nonzero thresholds, which are constant relative to the timescale of the network dynamics. The dynamics (3) thus simplify to

τ ẋ(t) = −x(t) + [W x(t) + u]_+,   (4)

with a constant input u.
In the following, we derive conditions in terms of the network structure for EUE, local/global asymptotic stability, and boundedness of trajectories.
IV-A Dynamics as a Switched Affine System
The nonlinear dynamics (4) constitute a switched affine system with 2^n modes. Each mode of this system corresponds to a switching index σ ∈ {0, 1}^n, where, for each i, σ_i = 1 if node i is active (i.e., (W x + u)_i ≥ 0) and σ_i = 0 if node i is inactive (i.e., (W x + u)_i < 0). Clearly, the mode of the system varies with time, and within the mode corresponding to σ we have

τ ẋ = (−I + Σ^σ W) x + Σ^σ u,

where Σ^σ = diag(σ). This switched representation of the dynamics motivates the following assumptions on the weight matrix W.
Assumption 1.
Assume

(i) W is nonsingular;

(ii) I − Σ^σ W is nonsingular for all σ ∈ {0, 1}^n.
Assumption 1 is not a restriction in practice since the set of matrices for which it is not satisfied can be expressed as a finite union of measure-zero sets, and hence has measure zero. By Assumption 1(i), the system of equations (W x + u)_i = 0, i ∈ {1, …, n}, defines a nondegenerate set of hyperplanes partitioning ℝ^n into 2^n solid convex polytopic translated cones apexed at −W^{-1} u. For each σ ∈ {0, 1}^n, let Ω_σ be the associated switching region,

Ω_σ = {x ∈ ℝ^n : (W x + u)_i ≥ 0 if σ_i = 1 and (W x + u)_i ≤ 0 if σ_i = 0}.
The piecewise-affine dynamics (4) can be written in the equivalent form

τ ẋ = (−I + Σ^σ W) x + Σ^σ u,   x ∈ Ω_σ.   (5)
Unlike linear systems, the existence of equilibria is not guaranteed for this system. In fact, for each σ, according to (5), the point

x_σ = (I − Σ^σ W)^{-1} Σ^σ u   (6)

is the corresponding equilibrium candidate. This equilibrium candidate is indeed an equilibrium if it belongs to the switching region Ω_σ where the description (5) is valid. We next identify conditions for this to be true.
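For small networks this check can be carried out exhaustively. The sketch below (our function names; it assumes the dynamics τ ẋ = −x + [W x + u]_+ with constant input u) enumerates all 2^n modes, computes each candidate, and keeps those satisfying the fixed-point condition x = [W x + u]_+:

```python
import numpy as np
from itertools import product

def equilibria(W, u, tol=1e-9):
    """Enumerate all 2^n switching modes sigma and keep each candidate
    x = inv(I - diag(sigma) W) diag(sigma) u that satisfies x = [W x + u]_+ ."""
    n = W.shape[0]
    found = []
    for sigma in product([0.0, 1.0], repeat=n):
        S = np.diag(sigma)
        try:
            x = np.linalg.solve(np.eye(n) - S @ W, S @ u)
        except np.linalg.LinAlgError:
            continue  # mode matrix singular: no isolated candidate in this mode
        if np.allclose(x, np.maximum(W @ x + u, 0.0), atol=tol) and \
           not any(np.allclose(x, y, atol=tol) for y in found):
            found.append(x)
    return found
```

For two mutually inhibiting nodes with unit drive the procedure returns a single equilibrium, whereas a single node with self-excitation weight 1.5 and input −1 returns two (the origin and x = 2), illustrating how uniqueness can fail.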
IV-B Existence and Uniqueness of Equilibria
Here we characterize EUE for the dynamics (4). Given W, define the equilibria set-valued map E_W : ℝ^n ⇉ ℝ^n by

E_W(u) = {x ∈ ℝ^n : x = [W x + u]_+}.   (7)

The map E_W can, in particular, take empty values. EUE then precisely corresponds to E_W being single-valued on ℝ^n. If so, with a slight abuse of notation, we take E_W to be an ordinary function.
From the definition (6) of equilibrium candidate, note that if and only if . Then, using Assumption 1, and after some manipulations, we have
(8)  
Therefore,
(9) 
Accordingly, for , let
be the set of external inputs such that (4) has an equilibrium in , which is a closed convex polytopic cone.
Note that if this holds for exactly one σ, then a unique equilibrium exists according to (IVB). However, when it holds for multiple σ, the network may have either multiple equilibria or a unique equilibrium lying on the boundary between the corresponding switching regions. The next result shows that these quantities can be used to distinguish between the two latter cases.
Lemma IV.1.
Proof.
Our next result provides an optimizationbased condition for EUE that is both necessary and sufficient.
Proposition IV.2.
Proof.
First, note that u = 0 is a degenerate case in which the origin is the unique equilibrium, belonging to all switching regions. For any α > 0 and input u, x is an equilibrium for u if and only if αx is an equilibrium for αu. Thus, EUE for all inputs and EUE for all unit-norm inputs are equivalent. The result then follows from Lemma IV.1 and the fact that, for any input,
∎
The optimization involved in (11) is usually highly nonconvex. However, since the search space is compact, global search methods can be used to verify (11) numerically. Next, we give our main result regarding EUE, which is not only verifiable analytically but also provides insight into the class of matrices that satisfy EUE.
Theorem IV.3.
Proof.
The uniqueness can be shown as a corollary to both [46, Thm 5.3] and [47, Thm 2.2]. However, we provide in the following a novel proof technique based on (IVB) that establishes both existence and uniqueness. From Lemma IV.1, we know that, for any , (4) has a unique equilibrium if and only if exactly one element of is nonnegative. To check this, we need to check whether
Note that this is equivalent to saying that there exist such that . A more general question, which is relevant to our discussion, is whether
(12) 
for any orthant of (including as a special case). Note that is a trivial case and thus excluded. This is the question we analyze in the following. Notice that for any ,
(13) 
Since nodes can be relabeled arbitrarily, we can assume without loss of generality that and where . Then, it follows from (13) that
where the unspecified blocks are submatrices of appropriate dimensions. Taking the inverse as a 2-by-2 block-triangular matrix [42, Prop 2.8.7] (with the indicated blocks), we get
so direct multiplication gives , with
With this, after some computations one can show that
(14) 
Let be the bracketed block in . Similarly, let denote the corresponding dimensional subvectors in the decomposition of and , respectively. Then, according to Lemma A.1, (12) holds if and only if or
(15) 
for some orthant . To check (15), define
It can be shown that
Inverting the left-hand-side matrix as a 2-by-2 block matrix [42, Prop 2.8.7] and applying the matrix inversion lemma [42, Cor 2.8.8] to the first block of the result, we obtain
where
Therefore, is the principal pivot transform of . Since , Lemma II.2(v) guarantees that , which by Lemma II.2(iv) implies that for every nonzero there exists an index such that , i.e., there does not exist any such that (15) holds, in which case, by (14), . Therefore, we have shown that, for any ,
Recalling that , we have
(16)  
Using Lemma IV.1, the first equivalence in (16) in the particular case at hand shows the uniqueness of the equilibrium. To prove existence, consider the following assignment procedure. For each point under consideration, let the assigned orthant be one that contains it (if there is more than one such orthant, we pick one arbitrarily). Define
If , we assign to . If , it follows from (16) that

is the same for all ; let be this shared value,

for some ,

there exists with such that for any , is the same for all ,

for all .
Due to (iv), belongs to the intersection of orthants ( and other orthants that differ from along the dimensions in ). Therefore, we can assign each of the elements of to its corresponding orthant from the orthants that contain . By repeating this procedure for all unassigned , we can uniquely assign every to an orthant in such that no two are assigned to the same orthant. Therefore, one must be assigned to the positive orthant , showing the existence of an equilibrium and completing the proof. ∎
The fact that the condition in Lemma II.2(iv) is an equivalent characterization of P-matrices suggests that the sufficient condition of Theorem IV.3 is tight. Indeed, extensive simulations with random matrices did not reveal any instance that is not a P-matrix but for which (4) has a unique equilibrium for all inputs (we checked the latter using Proposition IV.2). This leads us to the following conjecture, which was also made in [24] as a claim without proof.
Conjecture IV.4.
Remark IV.5.
(Computational complexity of verifying the P-matrix condition). Although the problem of determining whether a matrix is a P-matrix is straightforward for small networks, it is known to be co-NP-complete [48], and thus expensive for large networks. Indeed, [49] shows that all the principal minors have to be checked to prove that a matrix is a P-matrix (though disproving it is usually much easier). In these cases, one may need to rely on more conservative sufficient conditions such as absolute Schur stability or total Hurwitzness (cf. Lemma II.3) to establish the P-matrix property. These conditions, moreover, have the added benefit of providing intuitive connections between the distribution of synaptic weights, network size, and stability. We elaborate more on this point in Section V-C.
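The asymmetry between proving and disproving the P-matrix property can be made concrete with a brute-force certificate (an illustrative sketch with our function names): certification examines all 2^n − 1 principal minors, while refutation can stop at the first nonpositive one.

```python
import numpy as np
from itertools import combinations

def p_matrix_certificate(M):
    """Return (is_P, minors_checked). Certification needs all 2^n - 1
    principal minors; refutation stops at the first nonpositive one."""
    n = M.shape[0]
    checked = 0
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            checked += 1
            if np.linalg.det(M[np.ix_(idx, idx)]) <= 0:
                return False, checked
    return True, checked
```

For a 3-by-3 identity matrix all 7 minors must be examined, whereas diag(1, −1, 1) is rejected as soon as its second 1-by-1 minor is found to be negative.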
Example IV.6.
(Uniform excitatory-inhibitory networks). Consider a network of nodes of which some are excitatory and the rest are inhibitory, and where the synaptic weight between any pair of nodes depends only on their types (the synaptic weight of any inhibitory-to-excitatory connection is the same, and similarly for the other type combinations). Also, assume common external inputs