Severability of mesoscale components and local time scales in dynamical networks

06/04/2020 ∙ by Yun William Yu, et al. ∙ 0

A major goal of dynamical systems theory is the search for simplified descriptions of the dynamics of a large number of interacting states. For overwhelmingly complex dynamical systems, deriving a reduced description of the entire dynamics at once is computationally infeasible. Other complex systems are so expansive that, despite the continual onslaught of new data, only partial information is available. To address this challenge, we define and optimise a local quality function, severability, which measures the dynamical coherency of a set of states over time. The theoretical underpinnings of severability lie in our local adaptation of the Simon-Ando-Fisher time-scale separation theorem, which formalises the intuition of local wells in the Markov landscape of a dynamical process, or the separation between microscopic and macroscopic dynamics. Finally, we demonstrate the practical relevance of severability by applying it to examples drawn from power networks, image segmentation, social networks, metabolic networks, and word association.


I Results

I.1 Severability: mixing and retention in Markov landscapes

Figure 1: (a) Left column: small excerpts from three paintings: Theo van Doesburg’s Composition in dissonances (1919) (top), Paul Klee’s Ancient Sound (1925) (middle), and Claude Monet’s The Japanese Footbridge (1920-22) (bottom). Right column: associated luminosity landscapes obtained from the transition matrix derived from the graph representation of each image. Visualizing the luminosity landscape in van Doesburg’s painting reveals coherent spatio-temporal structures insulated by high barriers, and within which no obstacle would slow down a random walker. At the other extreme, Monet’s rough landscape, when looking at luminosity (i.e. the perceived brightness), is almost featureless with no obvious components. Klee’s landscape is intermediate, with significant internal roughness yet noticeable barriers. The balance between the barrier height and the intrinsic roughness translates into the emergence of components: barrier heights are inversely related to inter-component connection strength and determine escape time, whereas the roughness is inversely related to intra-component connection strength and determines mixing time. (b) The “Markovian” landscape of a hierarchical random graph with three levels and groups of sizes 16, 64, and 256. If a pair of nodes is in the same lowest level size 16 component, they are connected with probability ; else if in the same size 64 group, they are connected with probability ; and everything else is connected with probability . This resulted in an average degree . The severability of a single node (blue circles), 16-node component (green triangles), 64-node component (red diamond), and the entire network (cyan square) are represented as a function of the time evolved for the Markov process. The succession of optimal severabilities at different time scales reveals the hierarchy of mesoscale structures containing a node of interest.

To introduce our method, we draw an analogy from energy landscapes Wales (2006). In particular, we consider the Markov landscape defined by the transition matrix of the standard random walk on a graph, where the nodes (or vertices) of the graph correspond to states and the landscape reflects the transition probabilities between them. Markov landscapes are analogous to energy landscapes although they lack a potential energy function pointing downwards to a minimum energy state. Still, the notions of wells, barriers and roughness translate easily and helpfully to the language of time scale separation. In this picture, a well is a group of states surrounded by high barriers (hence with a long escape time), whereas roughness inside the well is related to the mixing time (low roughness implies a fast mixing time). An illustration of such a landscape can be found in Figure 1a, where we present a 3D representation of the luminosity landscapes of three paintings with very different characteristics; from a well compartmentalised painting by van Doesburg to a rough, featureless excerpt of Monet. In this case, the barriers and roughness are obtained from differences in luminosity of adjacent pixels. If a random walker (e.g. De Gennes’ ant in a labyrinth de Gennes (1976)) is allowed to explore van Doesburg’s luminosity landscape, the observed dynamics will reveal the presence of severable components in the state space; on the other hand, no such components would be expected in Monet’s landscape.

Mathematically, a subset $S$ of states of a system is defined to be a severable component if it has both high barriers and low roughness, as extracted by the behaviour of the random walkers on the underlying landscape. As shown below in a precise sense (see Section III.1), such a severable component can be understood as a mesoscale dynamical structure, i.e., a set of states that behave coherently in the eyes of the external environment and which capture a relevant description of the system sitting between individual nodes and the global system. To formalise these notions, we borrow the concepts of mixing and retention from Markov chain quasi-stationarity Darroch and Seneta (1965), as follows. First, we introduce a measure of the mixing over a set of states $S$ by appealing to a random walker restricted to those states; $S$ is poorly mixing over a timespan $t$ if the random walker’s positions at Markov times $0$ and $t$ are strongly correlated. More precisely, we measure mixing by defining a quantity $\mu(S,t)$, which is derived from the total variation distance between the probability distribution over $S$ at time $t$ and the quasi-stationary distribution reached at long times, should the walkers remain in $S$ (Eq. (5) in Methods). The mixing is thus inversely related to the roughness of the landscape over $S$, since the exploration of $S$ is hindered by the roughness of the landscape. Secondly, we characterise the retention over the set $S$, which is directly related to the height of the barriers separating the set of states from the rest of the system, i.e., random walkers tend to stay within $S$ if it is hard for them to escape. This is quantified by $\rho(S,t)$, a number between $0$ and $1$ defined as the probability of a walker not escaping $S$ by time $t$ (Eq. (4) in Methods). Both $\rho(S,t)$ and $\mu(S,t)$ therefore range from 0 to 1, where the value of 1 corresponds to perfect retention or mixing, respectively.

We now simply define the severability of the set $S$ at time scale $t$ as

$$\sigma(S,t) = \frac{\rho(S,t) + \mu(S,t)}{2} \tag{1}$$

Severability can be understood as a compound function that balances mixing and retention for a given set of states $S$ over the time scale $t$. If $S$ corresponds to a mesoscale dynamical structure, its severability will peak at some time $t^*$, below which the walkers are poorly mixed and beyond which retention is degraded. In a connected network, the individual node and the entire graph will respectively have good severabilities for Markov time $t = 0$ and $t \to \infty$ for trivial reasons: at $t = 0$, retention and mixing will be perfect for any individual node because nothing has diffused, and as $t \to \infty$, the probabilities will have reached the ultimate stationary distribution, implying perfect mixing coupled with the always perfect retention of the entire graph. At intermediate time scales, severable structures are of intermediate size, based on a combination of mixing and retention; on grid graphs, these optimally severable structures slowly expand with Markov time, as longer times allow for mixing of larger-diameter regions (Figure 1). Less uniform graphs have more interesting substructures; optimally severable structures remain so over a range of Markov times before jumping in size to another plateau.

This notion is illustrated in Figure 1b, where we show how the process of diffusion of information on a very simple model of a network, given by a hierarchical random graph with three levels of 16, 64 and 256 nodes, leads to severable components of the state space. As the time of diffusion increases, the random walkers gain sufficient probability to overcome the barriers of the landscape, and hence diffuse to larger portions of the network so that the optimal severable components grow from being single nodes at very short times, through each of the intermediate levels over different time scales, to the entire network at long times.

The components of an interconnected dynamical system with high severability have a precise mathematical meaning in terms of local time scale separation (see local time scale theorem in Section III.2 in Methods). Briefly, the existence of a local time scale separation for a group of states $S$ in the dynamics of the random walker allows for a simplified model for the dynamical behaviour of the group of nodes when excited by an impulse, i.e. an arrival of probability mass into $S$. High retention and high mixing (implying high severability) at time $t$ guarantee: (i) the effect of $S$ on the rest of the system when given an impulse can be neglected altogether for time scales less than $t$, and (ii) the subsystem $S$ can be accurately approximated to first order by a single state that aggregates all the states of $S$ for all times beyond $t$.

In summary, the set $S$ can be thought of as a structure of intermediate size whose dynamical response to an impulse permits accurate simplified descriptions. Our local time scale theorem (Section III.2) is inspired by Simon and Ando’s classic result for global time scale separation, yet it differs from it in that it seeks to find the conditions under which one can reproduce correctly the behaviour of a severable component at different time scales, independently of the rest of the system. When the full interconnected system can be partitioned into components with comparable time scales, we recover Simon and Ando’s global theorem (see Supp.Inf. B), demonstrating that our local time scale theorem generalizes their result.

I.2 Mesoscale components in power networks

As a first application, we consider the synchronization dynamics of coupled nonlinear phase oscillators with Kuramoto-like sinusoidal coupling Boccaletti et al. (2006), which is found in areas as diverse as laser physics, biological synchrony of cells and animals, and power networks Strogatz (2000). For our example, we will apply severability to a standard power network benchmark. Power networks are composed of two types of nodes: generator buses, which deliver power, and load buses, which consume power. The internal state of each node is described by a voltage, which oscillates with a frequency around a nominal value (e.g. 50 or 60 Hz). The nonlinear dynamics of bus $i$ can be modelled as

$$M_i \ddot{\theta}_i + D_i \dot{\theta}_i = P_i - \sum_j a_{ij} \sin(\theta_i - \theta_j) \tag{2}$$

where $\theta_i$ is the phase angle of bus $i$, $M_i$ is an inertia (zero for some buses), $D_i$ is a damping coefficient, $P_i$ is the power being injected or withdrawn from the network at node $i$, and $a_{ij}$ indicates the strength of the (symmetric) interaction between $i$ and $j$ Dörfler et al. (2013). Given sufficient coupling strength between the nodes, and depending on properties of the coupling matrix $(a_{ij})$, the network converges to a stationary state where all angles in the system oscillate at a common constant frequency $\omega$, keeping relatively small constant angle differences with respect to one another.
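The convergence to a frequency-synchronized state can be illustrated numerically. The following is a minimal sketch, assuming the standard Kuramoto-like swing form with inertia, damping, injected power and symmetric sinusoidal coupling, and assuming strictly positive inertia at every bus (unlike the benchmark, where some inertias are zero). The function name `simulate_swing`, the 4-bus complete network and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_swing(a, power, inertia, damping, theta0, dt=0.01, steps=5000):
    """Semi-implicit Euler integration of
    inertia * theta'' + damping * theta' = power - sum_j a_ij sin(theta_i - theta_j).
    Assumption: every inertia is strictly positive."""
    theta = np.array(theta0, dtype=float)
    omega = np.zeros_like(theta)  # angular frequency deviations
    for _ in range(steps):
        coupling = (a * np.sin(theta[:, None] - theta[None, :])).sum(axis=1)
        accel = (power - damping * omega - coupling) / inertia
        omega = omega + dt * accel   # update frequencies first
        theta = theta + dt * omega   # then angles, using the new frequencies
    return theta, omega

# Hypothetical 4-bus network: complete graph, uniform coupling strength 5.
a_demo = 5.0 * (np.ones((4, 4)) - np.eye(4))
power_demo = np.array([1.0, -1.0, 0.5, -0.5])  # balanced injections (sum to zero)
theta_end, omega_end = simulate_swing(a_demo, power_demo,
                                      inertia=np.ones(4), damping=np.ones(4),
                                      theta0=np.zeros(4))
# Net power flow leaving each bus in the final state.
flow_end = (a_demo * np.sin(theta_end[:, None] - theta_end[None, :])).sum(axis=1)
```

Because the injections are balanced in this toy case, the common frequency deviation settles at zero and the final flows match the injections; with unbalanced injections the buses would instead share a nonzero common frequency.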

Although this system is inherently nonlinear, severability is of use here. In Figure 2 we show the results of the application of our analysis to linearized discrete-time random walk dynamics based on node strengths , that is equivalent to the continuous-time nonlinear dynamics for small deviations around the synchronized state (see Supp.Inf. E for details). Our example is on a classic test case for power networks, the IEEE RTS96 test system, composed of three identical copies of the RTS24 test system interlinked with a few extra edges and one extra node. Previous work has used time-scale based identification of global partitions into slow-coherent areas based on global edge-counting or spectral methods Avramovic et al. (1980); Romeres et al. (2013). These global partitions correctly recover the expected components, but by nature require information about the entire network. In contrast, severability recovers the expected components based solely on local information and provides a validation of the components in terms of their dynamical response. More precisely, Fig. 2 shows that the fully nonlinear simulations of the model (2) can be well represented by the aggregated angle variables within the components found with severability: the aggregation of angle variables within the ‘correct’ components has little effect on the dynamics of the other variables of the system, whereas aggregation of an ‘incorrect’ subgroup results in major discrepancies from the full dynamical evolution. This result justifies the simplification of using random walk dynamics in severability, even for more complicated systems.

More generally, using random walks to model higher order dynamics provides a general framework to capture central features of many other dynamics taking place on a network (Section III.4).

Figure 2: Illustration of the three component RTS96 power network test system, which is composed of three copies of the RTS24 benchmark. Above, we have highlighted the three severable components as segments (at , severability of 0.753, 0.753, and 0.807 for green, purple, and black respectively) (a), whereas below (d) we arbitrarily partitioned into two connected components (at , severability of 0.611 and 0.646 for green and black respectively). (b,e) Full dynamics of the power network with starting phase angles chosen to match within a component. (c,f) Full dynamics of the power network using instead a collapsed state representing all of the black component. When the collapsed component is highly severable (top), the reduced representation matches the original system much better than when using arbitrary partitions (bottom).

I.3 On severable components and cliques

Figure 3: (a) Ring of rings. Heavy lines (within rings) correspond to undirected links with weight 2, while light lines between rings correspond to links with weight 1. Severability is able to recover the seeded ring structure (at Markov times ). (b) The citric acid cycle Kanehisa and Goto (2000). The blue region is a stable component from Markov time and adds acetyl-CoA from .

Given a network-centric view of severable components, it may come as no surprise that there are some similarities between network community detection and the discovery of severable components. Network communities are groups of nodes with strong connectivity within the group and lower connectivity with the rest of the network. Communities are often captured with local or global metrics that relate the number of edges crossing the boundaries of a community with the edges inside the communities, such as modularity Newman and Girvan (2004), SBM maximum likelihood Karrer and Newman (2011), OSLOM Lancichinetti et al. (2011) or conductance Shi and Malik (2000), sometimes with random walk as a computational tool Andersen et al. (2007); Spielman and Teng (2013). Most of those criteria essentially capture the retention part, , of severability. A few references Kannan et al. (2004); Jeub et al. (2015) also analyse the conductance internal to the cluster, which is a combinatorial criterion capturing essentially the mixing part of severability for . Although not all networks may have an easily measured dynamical interaction taking place on it (e.g. social networks), we can endow the graph with the standard random walk and apply severability to those dynamics. Indeed, the standard random walk on a network can be used to approximate such things as opinion dynamics, information diffusion, and consensus problems, as the dynamics of the random walk is deeply related to the structure of the graph Delvenne et al. (2013); Rosvall and Bergstrom (2008); Morarescu and Girard (2009); Van Dongen (2008); Masuda et al. (2016). As shown in Fig. 1b, the expectation is that such communities are detected as meaningful mesoscale components observed during information diffusion. 
To validate this idea further, we have used the standard LFR synthetic benchmark network model for community detection, where networks are constructed as dense random Erdős-Rényi graphs interconnected by sparse random links at several levels of coarseness Girvan and Newman (2002); Lancichinetti et al. (2008). As shown in Supp.Inf. F, our greedy algorithm for finding severable components recovers the communities with high fidelity, comparable or superior to other state-of-the-art methods Lancichinetti and Fortunato (2009). We remark that severable components are found from the local diffusive dynamics without global information from the graph.

However, communities in social networks are often characterised by a clique-like structure, showing for instance a low diameter and high density of triangles Palla et al. (2005). While clique-like structures emerge as particular cases of severable components, severability may detect long-range structures that are not akin to communities. An example of this is shown in Fig. 3a, where a ring-of-rings network is correctly revealed by severability. Such non-clique-like structures are present in other areas of application, including transportation networks, images, and protein structures Schaub et al. (2012). Another illustration is provided by biochemical networks, and one canonical metabolic pathway is the citric acid cycle. When we analyze the citrate pathway schematic (map00020) in the KEGG database Kanehisa and Goto (2000) using severability, the search for high severability structures detects the Krebs citric acid cycle. These structures do not fit the standard definition of communities, and indeed are not detected as such by most community detection algorithms Schaub et al. (2012). However, as exemplified by the Krebs cycle, they are nonetheless dynamical structures of importance. Thus, although severable components and network communities share some characteristics, they are different concepts built on different ideas, the former by coherency of dynamics, and the latter by the density of clique-like structure.

I.4 Word association as a diffusion: overlaps and orphans

Figure 4: (a) The five components that the word ‘nature’ belongs to. Nodes and links are coloured by component identification; coloured ovals represent multiple component membership. (b) A broader view of the component landscape surrounding “nature”, depicting also components connected to, but not containing, ‘nature’, including three orphan nodes (see SI for details). Nodes belonging to just one of the components are combined into a single block labelled by the most central word of the component, while nodes belonging to more than one component are separately mentioned in the gray ovals. Note that in many cases, the words used to label the components possess multiple labels themselves. Communities were found by optimizing severability for Markov time and search size .

An important feature of severability as a means of analyzing interconnected systems is that it allows the possibility of overlaps (a node can belong to more than one group) and orphans (a node can belong to no group, as every group that includes it has higher severability without it). To illustrate these features, we turn to a word association network (the University of South Florida Free Association Norms dataset Nelson et al. (1998)), previously used to highlight the existence of overlapping network communities Palla et al. (2005). To build this network, researchers presented words to participants, who were then asked for the first word that came to mind. Hence each node in the network corresponds to a word, and directed links between nodes are weighted according to the proportion of responses linking those two words. For example, when cued with ‘science’, 21.4% of participants wrote ‘biology’.
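The construction of a weighted, directed association graph and of the random walk on it can be sketched as follows. This is an illustrative sketch only: the table `norms` and the helper `association_walk` are hypothetical, and every proportion except the ‘science’ to ‘biology’ value of 21.4% is made up for the example. Responses that are not themselves cue words are simply dropped here, which is an assumption of this sketch rather than a description of how the dataset was processed.

```python
import numpy as np

# Hypothetical cue -> {response: proportion} table. Only the
# 'science' -> 'biology' proportion (0.214) is taken from the text.
norms = {
    "science": {"biology": 0.214, "nature": 0.050},
    "biology": {"science": 0.300, "nature": 0.100},
    "nature":  {"science": 0.040, "biology": 0.060},
}

def association_walk(norms):
    """Return (words, W, P): W[i, j] is the proportion of participants who
    answered word j when cued with word i, and P is W row-normalised into
    the transition matrix of a random walk on the association graph."""
    words = sorted(norms)
    index = {w: i for i, w in enumerate(words)}
    W = np.zeros((len(words), len(words)))
    for cue, answers in norms.items():
        for response, proportion in answers.items():
            if response in index:  # assumption: keep only responses that are also cues
                W[index[cue], index[response]] = proportion
    P = W / W.sum(axis=1, keepdims=True)
    return words, W, P

words, W, P_words = association_walk(norms)
```

The weight matrix is not symmetric, so the direction of each association is preserved in the walk.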

The very construction of the network is reminiscent of a random walk process representing the mental association based on similarity of meaning and contextual usage, thus making severability able to incorporate the weight and directionality of the network in a natural way. As severability is a local method, it is not necessary to analyse the entire graph to find components. Rather, by analysing increasing horizons on the network an expanding view of associated meanings presents itself from a particular vantage point. Figure 4 shows the word ‘nature’ and the components it belongs to (with maximum search size and Markov time ), as well as the components and orphan nodes to which ‘nature’ is directly linked (see Supp.Inf. L for further details). By permitting overlapping components, we are able to recover the different contexts and meanings associated with a single word.

I.5 Locality in image segmentation: zooming and cropping

Figure 5: Neocortical pyramidal neurons, stained with a fluorescent dye, with resolution reduced to and converted to grayscale by luminosity. (Cyan) At Markov time , segments largely corresponding to cells were found (see S.I. for details). (Yellow) Furthermore, repeating the procedure with a cropped subregion of the image gives largely the same results, with some minor variations along the borders. This commutativity is a key feature of local methods. Despite the fact that severability is not specifically designed for image analysis, the severable components found are of good quality.

In Fig. 5 we apply severability optimisation to the identification of stained neurons in a cell-fluorescence image in order to illustrate visually a central aspect of the method; namely, that it does not rely on global information in order to detect mesoscale components faithfully. Below we show that the results are similar whether the algorithm is run on only some part of the image, or on its entirety.

Image segmentation divides images into subsets of adjacent pixels of similar color or luminosity, and is particularly used for medical and biological imaging. Some of the existing segmentation methods are based on a nominal diffusion dynamics taking place on the lattice graph of pixels Grady (2006); Schaub et al. (2012). In this view, a segment can be seen as a particular case of a severable component, as already suggested in our initial view of the paintings in Fig. 1. To carry out our analysis, we have followed a classic protocol to generate a lattice graph from the image by assigning an edge between pixels weighted by a function of the difference in luminosity and distance (up to a cutoff) Shi and Malik (2000); Browet et al. (2011); Wu and Leahy (1993) (for details see Supp.Inf. G). The severable subsets are self-consistent in that they are found robustly from diffusions starting from any of the members of the set; as these are strongly severable subsets, they tend to be found regardless of which member of the set is used as a starting point (unlike the word association communities of the last section). Both the cells and patches of the background are found as severable components. Furthermore, because severability does not depend on global information, the results do not change significantly when the algorithm is run only on a smaller section of the image: only segments that lie on the edges of the image are affected by cropping. This feature is of potential application to evolving networks, as communities are stable against perturbations and do not need to be recomputed fully when new nodes are added outside of a local neighbourhood.
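The graph construction from an image can be sketched as below. The exact weighting function used for the figures is given in the supplementary information and is not reproduced here; this sketch assumes a Gaussian kernel in both luminosity difference and pixel distance, in the spirit of Shi and Malik (2000). The function `image_to_graph` and the parameters `radius`, `sigma_lum` and `sigma_dist` are hypothetical names and values.

```python
import numpy as np

def image_to_graph(img, radius=1.5, sigma_lum=0.1, sigma_dist=1.0):
    """Weighted adjacency matrix of the pixel lattice of a grayscale image.
    Pixels closer than `radius` are linked with weight
    exp(-(luminosity difference)^2 / sigma_lum^2) * exp(-distance^2 / sigma_dist^2).
    The Gaussian form is an assumption of this sketch."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    A = np.zeros((h * w, h * w))
    r = int(np.floor(radius))
    for y in range(h):
        for x in range(w):
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    dist2 = dy * dy + dx * dx
                    yy, xx = y + dy, x + dx
                    if dist2 == 0 or dist2 > radius ** 2:
                        continue  # skip the pixel itself and pixels beyond the cutoff
                    if not (0 <= yy < h and 0 <= xx < w):
                        continue  # skip neighbours outside the image
                    dlum = img[y, x] - img[yy, xx]
                    A[y * w + x, yy * w + xx] = (
                        np.exp(-dlum ** 2 / sigma_lum ** 2)
                        * np.exp(-dist2 / sigma_dist ** 2)
                    )
    return A

# Hypothetical 4x4 test image: dark left half, bright right half.
img_demo = np.zeros((4, 4))
img_demo[:, 2:] = 1.0
A_img = image_to_graph(img_demo)
```

In this toy image, links inside each half are strong while links across the luminosity edge are vanishingly weak, so each half acts as a well separated by a high barrier in the corresponding Markov landscape.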

We note that while other completely local methods Jeub et al. (2015) exhibit a similar commutativity, ‘mostly’ local methods like OSLOM (Order Statistics Local Optimization Method) do not Lancichinetti et al. (2011). Though OSLOM is based on local order statistics, it takes some global information into account when partitioning, making it noncommutative. In Appendix J, we show that OSLOM behaves differently on a ring of small-world networks than on a single small-world network.

II Discussion

Real-life dynamics emerging from the interaction of many elementary nodes can sometimes be seen as the interconnection of mesoscale dynamical structures, whose evolution over a time scale of interest can be represented as a single aggregated state interacting with its surroundings. We have introduced a measure for the detection of such severable components, which can be well approximated by their aggregated variables. The formal theory, a generalization of Simon and Ando’s classic theory to local time scale separation, is illustrated on the particular case of Markov chains, which are representative of a larger class of dynamics, including consensus and synchronization. Severable components, which can coexist at several sizes and time scales, may overlap and leave orphan nodes. This dynamical concept is connected to other more particular notions encountered in several classes of systems, including basins (energy landscapes), slow-coherent areas (power networks), segments (image processing), communities (social network analysis), and rings (biochemical networks), which can be understood as structures with locally coherent dynamics. On the other hand, other kinds of meso-structures in complex networks (e.g., block models, or roles in ecological systems Beguerisse-Díaz et al. (2014); Cooper and Barahona (2010, 2011)) are global in nature, and do not fall under the condition of locality required in this paper. Hence, while locality is an advantage for large systems, by definition truly global characteristics cannot be thus discovered. However, we have shown in this paper that many classes of structures are in fact locally defined, demonstrating the applicability of severability.

Engineering disciplines traditionally operate by plugging together smaller components, usually seen as black boxes with simple external behaviour regardless of their internal complexity, in order to generate complex systems with controlled behaviour. One may argue that many natural systems are built similarly. In this perspective, we aim here to reverse this process: although complex systems are often too large to analyse in their entirety, our approach here is to try and find if there exist suitable intermediate dynamical components which provide a proper understanding and representation of the complex global dynamics. In this sense, severability serves the role of a local coarse-graining mechanism for the dynamics as observed from a given subset of states in the system. Appealing to the coexistence of local time scales in Markov processes as a means to reveal severable components establishes mathematical connections between diffusion processes and model reduction, linking in a precise sense good mixing and retention in a subsystem to its accurate approximation through coarse-graining while preserving the Markov property.

As Big Data continues to proliferate, severability provides a first step towards the definition of new methods able to tackle the huge wealth of data being collected in all areas of science, technology and social life, much of which comes with a naturally endowed dynamics. Undoubtedly, more challenges lie on the road ahead, such as in the treatment of more sophisticated node dynamics, for example when the dynamics are strongly nonlinear or non-Markovian Rosvall et al. (2014); Delvenne et al. (2015). Yet the importance of dynamics as a key to characterising networks will undoubtedly persist. Ultimately, we hope that the framework of severable components (code available at https://github.com/yunwilliamyu/severability) provides not only a specific solution to recovering mesoscale structures when the dynamics are roughly Markovian, but also a meaningful and practical starting point for more sophisticated methods capable of tackling these more difficult problems.

III Methods

III.1 Formal definition of severability

The definition of severability uses concepts of graph theory and Markov chains. A graph $G$ is a set of nodes (or vertices, or states in the Markov chain terminology) together with another set of links (or edges) between vertices. We assume that every node has at least one outgoing edge, and that all edges are labelled with a positive weight. The weighted, directed graph is encoded as an adjacency matrix $A$, where $A_{ij}$ is the weight of the edge going from $i$ to $j$. The (weighted) out-degree of node $i$ is the sum of weights of edges leaving $i$, $d_i = \sum_j A_{ij}$. The out-degrees can be compiled in the vector $d = A\mathbf{1}$, where $\mathbf{1}$ is the vector of ones, and $D = \mathrm{diag}(d)$ is the diagonal matrix of out-degrees.

On a given graph $G$, we define a random process in discrete time. A random walker starts from a node $i$ at time $t = 0$ and jumps at time $t = 1$ to any out-neighbour $j$ with probability $A_{ij}/d_i$, proportional to the edge weight. Successive jumps at $t = 2, 3, \ldots$ define a Markov chain, or random walk, on the graph. The probability of presence of the random walker evolves as

$$p_{t+1} = p_t D^{-1} A = p_t P \tag{3}$$

where $p_t$ is the normalised probability (row) vector and $P = D^{-1}A$ is the transition matrix, the rows of which are non-negative and sum to one. Provided that the graph is strongly connected and aperiodic (i.e. there is no integer $n > 1$ such that all cycles comprise an exact multiple of $n$ edges), any initial probability distribution converges to a unique stationary distribution $\pi$, which is a solution of the fixed-point equation $\pi = \pi P$.
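The random walk defined above can be sketched in a few lines. This is a minimal illustration, assuming the row-vector convention in which the transition matrix is the adjacency matrix with each row divided by its out-degree; the helper names `transition_matrix` and `evolve` and the 4-node example graph are hypothetical.

```python
import numpy as np

def transition_matrix(A):
    """Row-normalise a weighted adjacency matrix into a transition matrix."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)  # weighted out-degrees
    if np.any(d <= 0):
        raise ValueError("every node needs at least one outgoing edge")
    return A / d[:, None]

def evolve(p0, P, t):
    """Probability (row) vector after t steps of the walk, starting from p0."""
    return np.asarray(p0, dtype=float) @ np.linalg.matrix_power(P, t)

# Hypothetical example: a triangle (nodes 0, 1, 2) with a pendant node 3.
A_demo = np.array([[0, 1, 1, 0],
                   [1, 0, 1, 0],
                   [1, 1, 0, 1],
                   [0, 0, 1, 0]], dtype=float)
P_demo = transition_matrix(A_demo)
p_long = evolve([1, 0, 0, 0], P_demo, 200)

# For an undirected graph the stationary distribution is proportional to degree.
pi_demo = A_demo.sum(axis=1) / A_demo.sum()
```

The triangle makes this graph aperiodic, so the walk started at node 0 converges to the degree-proportional stationary distribution, which is a fixed point of the evolution.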

Given a connected subset $S$ with $k$ nodes, let $Q$ be the $k \times k$ submatrix of $P$ corresponding to the nodes in $S$. Then we define the retention of the subset $S$ over time $t$, $\rho(S,t)$, as the probability for a random walker starting with a uniform probability distribution in $S$ not to have escaped by time $t$:

$$\rho(S,t) = \frac{1}{k} \mathbf{1}^T Q^t \mathbf{1} \tag{4}$$

To define mixing, let $q_i$ be the $i$th row of the matrix $Q^t$. Note that because $S$ is connected, $q_i \mathbf{1} > 0$. Thus the normalised row of $Q^t$, $q_i / (q_i \mathbf{1})$, is the probability distribution at time $t$ for a random walker starting from node $i$, conditional upon the walker remaining in $S$ between 0 and $t$. We can then define the internal mixing as

$$\mu(S,t) = 1 - \frac{1}{2k} \sum_{i \in S} \left\| \frac{q_i}{q_i \mathbf{1}} - \bar{q} \right\|_1 \tag{5}$$

where $\bar{q} = \frac{1}{k} \sum_{i \in S} q_i / (q_i \mathbf{1})$ is the arithmetic mean over the unit-normalised rows of $Q^t$, and we have used the fact that the total variation distance norm is given by

$$\| x - y \|_{TV} = \frac{1}{2} \| x - y \|_1 = \frac{1}{2} \sum_j | x_j - y_j | \tag{6}$$

The internal mixing term approaches 1 as the probability distribution of a random walker starting somewhere uniformly random within the community approaches the quasistationary distribution on that subset of nodes Darroch and Seneta (1965).

Both the retention $\rho(S,t)$ and the mixing $\mu(S,t)$ are defined to range from 0 to 1, where the value of 1 corresponds to perfect retention or mixing, respectively. We define the severability as a compound function of both retention and mixing:

$$\sigma(S,t) = \frac{\rho(S,t) + \mu(S,t)}{2} \tag{7}$$

which can be understood as the quality of the subset $S$ to be considered as a separate dynamical mesostructure over time $t$. Severability has an intrinsic resolution parameter $t$, corresponding to the Markov horizon; as $t$ increases, the random walker will diffuse to larger parts of the graph, as reflected by the iterations of the submatrix $Q$ of the transition matrix restricted to $S$. Note that, from the above definitions, $\sigma(S,t)$ depends only upon the out-links from nodes within $S$; hence it is a purely local function.
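To make the definitions concrete, the sketch below computes retention, mixing and severability for a subset of nodes. It rests on two stated assumptions: that mixing is one minus the mean total variation distance between each normalised row of the restricted matrix power and the mean of those rows, and that severability is the arithmetic mean of retention and mixing. The function names and the test graph (two 4-cliques joined by a single edge) are hypothetical; the authors' own implementation is the one linked in the Discussion.

```python
import numpy as np

def retention(Q, t):
    """Mass still inside the subset after t steps, from a uniform start in it."""
    k = Q.shape[0]
    return np.linalg.matrix_power(Q, t).sum() / k

def mixing(Q, t):
    """One minus the mean total variation distance between each normalised
    row of Q^t and the mean of those normalised rows (assumed form)."""
    Qt = np.linalg.matrix_power(Q, t)
    mass = Qt.sum(axis=1)
    if np.any(mass <= 0):
        # This sketch does not define mixing when a start node retains no mass.
        raise ValueError("a row of Q^t has no surviving mass")
    rows = Qt / mass[:, None]
    tv = 0.5 * np.abs(rows - rows.mean(axis=0)).sum(axis=1)
    return 1.0 - tv.mean()

def severability(P, subset, t):
    """Average of retention and mixing for the nodes in `subset` (assumed form)."""
    idx = np.asarray(subset)
    Q = P[np.ix_(idx, idx)]  # submatrix restricted to the subset
    return 0.5 * (retention(Q, t) + mixing(Q, t))

# Hypothetical test graph: two 4-cliques joined by the single edge 3-4.
A_bar = np.zeros((8, 8))
A_bar[:4, :4] = 1.0
A_bar[4:, 4:] = 1.0
np.fill_diagonal(A_bar, 0.0)
A_bar[3, 4] = A_bar[4, 3] = 1.0
P_bar = A_bar / A_bar.sum(axis=1, keepdims=True)

sev_node = severability(P_bar, [0], 0)               # single node, no diffusion yet
sev_clique = severability(P_bar, [0, 1, 2, 3], 3)    # one clique
sev_mixed = severability(P_bar, [2, 3, 4, 5], 3)     # straddles the bridge
sev_all = severability(P_bar, list(range(8)), 500)   # whole graph, long time
```

On this toy graph, a single node at time zero and the whole graph at long times are both trivially severable, while at an intermediate time the clique scores higher than a subset straddling the bridge, since the latter both leaks probability and mixes poorly.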

The particular form assumed for retention, mixing and severability is justified by the mathematical properties stated in III.2, and proved in the Supp.Inf.

III.2 Local time scale separation theorem

III.2.1 Background: Simon and Ando’s global time scale separation theorem

In 1961, Simon and Ando established a time scale separation theorem, both for general linear systems and Markov chains in particular Simon and Ando (1961), which we present now.

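Given a Markov chain

$$p_{t+1} = p_t P \tag{8}$$

let us split the nodes into two sets, with a corresponding partition of $P$

$$P = \begin{pmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{pmatrix} \tag{9}$$

Fix an arbitrary $\delta > 0$, which will serve as a requested standard of approximation. Assume that $P$ is close to a perfectly decoupled transition matrix

$$P^* = \begin{pmatrix} P^*_{11} & 0 \\ 0 & P^*_{22} \end{pmatrix} \tag{10}$$

Simon and Ando proved that there is a small enough $\epsilon > 0$ and a time $T$ such that if $\| P - P^* \| < \epsilon$, then two kinds of approximations are valid for the trajectories of $p_t$:

  • On the one hand, for all times $t < T$, the decoupled approximation

    $$p^*_{t+1} = p^*_t P^*, \qquad p^*_0 = p_0 \tag{11}$$

    is within $\delta$ in norm from the actual solution $p_t$:

    $$\| p_t - p^*_t \| < \delta \tag{12}$$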
  • On the other hand, and more importantly, for all times and in particular for all , the aggregated probabilities

    are within in norm from the approximation

    (13)

    for some real values .

  • Moreover, for times , can be reconstructed as with an error bounded by , for some .

Which norms are chosen in the statement above is irrelevant, as all norms of vectors or matrices of a given size are equivalent up to a factor, making the statement true for any choice of them. For simplicity we stated here the two block case in a Markov chain dynamics, although the theory holds for general linear systems split in an arbitrary number of blocks. It is important to notice that the required depends not only on the given but also potentially on all the entries of the diagonal blocks , as is apparent for example in our own proof of Simon-Ando’s theorem (Supp.Inf. III.2.1). The theorem can therefore only be applied globally, with full knowledge of the dynamics. It is desirable to decouple this global condition into the local conditions to be satisfied by each diagonal block to satisfy the required accuracy , and severability offers one practical way to achieve this, as shown below.
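The two regimes of the theorem can be observed numerically on a small example. The sketch below is illustrative only: the nearly decoupled matrix `P`, its block-diagonal limit `P_star` and the coupling `eps` are hypothetical, and the row-vector convention for the Markov dynamics is an assumption. It tracks the 1-norm gap between the exact and decoupled trajectories, which by a telescoping argument can grow by at most `2 * eps` per step here, and the aggregated mass of the first block, which for this symmetric example follows an exact one-dimensional recursion.

```python
def step(p, P):
    """One step of the row-vector Markov dynamics p_{t+1} = p_t P (assumed convention)."""
    return [sum(p[i] * P[i][j] for i in range(len(p))) for j in range(len(P[0]))]

def l1(p, q):
    """1-norm distance between two vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))

eps = 0.01
# Hypothetical nearly decoupled chain and its perfectly decoupled limit.
P = [[0.70, 0.30 - eps, eps, 0.00],
     [0.30 - eps, 0.70, 0.00, eps],
     [eps, 0.00, 0.60, 0.40 - eps],
     [0.00, eps, 0.40 - eps, 0.60]]
P_star = [[0.70, 0.30, 0.00, 0.00],
          [0.30, 0.70, 0.00, 0.00],
          [0.00, 0.00, 0.60, 0.40],
          [0.00, 0.00, 0.40, 0.60]]

p = q = [1.0, 0.0, 0.0, 0.0]
errors = []          # gap between exact and decoupled trajectories
block1_mass = []     # aggregated probability of the first block
for t in range(1, 51):
    p, q = step(p, P), step(q, P_star)
    errors.append(l1(p, q))
    block1_mass.append(p[0] + p[1])
```

At short times the decoupled trajectory is a good proxy; at long times only the aggregated mass, here relaxing towards one half, carries slowly varying information.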

iii.2.2 Statement of the local time scale theorem

Following the same notation as above, consider the first block (denoted in the main text and Methods A), which describes the set of states with severability . The local dynamics of is described by the open dynamical system

(14)

where , defined for all , is the input into the subsystem , i.e., the in-flow of probability from the environment, and is the output of the subsystem, by which it influences the environment by an outflow of probability. By the environment we mean the rest of the state space, described by , itself governed by an open system equation of the same kind as Eq.(14). The global dynamics on can be understood as the feedback interconnection of the two systems, related by the equations: .

Equation (14) describes a relationship between the input sequence and the output sequence . An alternative way to describe an open system in linear response theory is by its impulse response. In our notation, the impulse response can be written as

for all , and zero for . The impulse response fully characterizes the input-output relationship in that the output generated by any input sequence is obtained by the convolution product (defined as for all ), assuming zero probability in initially, . A non-zero initial state can be incorporated by adding an artificial input , setting a state . To approximate the behaviour of described by Eq. (14), we need to approximate the function by another impulse response between the same input and output spaces measured with a given norm. A common metric used in the open systems literature is the one-norm

(15)

whenever it is defined. Of course, given matrices , we can choose any matrix norm for , as they all relate within a constant factor only dependent on the dimension of the matrix. If the approximation is only meant to be valid on time interval , then we can restrict the sum in Eq. (15) to , denoted . An error in the impulse response committed in replacing by will result in an error in the output in the following way, as one can show from elementary algebra:

where can be infinity.
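The equivalence between the state-space description and the impulse response can be checked directly. The sketch below assumes one particular indexing, namely a state update x_{t+1} = x_t Q + u_t with output y_t = x_t R and zero initial state, where `Q` stands for the diagonal block of the subsystem and `R` for its off-diagonal block towards the environment; the impulse response is then h(k) = Q^(k-1) R for k >= 1 and zero otherwise. The matrices, the input sequence and all names are hypothetical.

```python
def mat_mul(A, B):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def vec_mat(x, M):
    """Row vector times matrix."""
    return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]

def simulate(Q, R, inputs):
    """Iterate x_{t+1} = x_t Q + u_t, y_t = x_t R from x_0 = 0; return y_0..y_T."""
    x = [0.0] * len(Q)
    outputs = []
    for u in inputs:
        outputs.append(vec_mat(x, R))
        x = [a + b for a, b in zip(vec_mat(x, Q), u)]
    outputs.append(vec_mat(x, R))
    return outputs

def impulse_response(Q, R, T):
    """h[0] = 0 and h[k] = Q^(k-1) R for k = 1..T (assumed indexing)."""
    h = [[[0.0] * len(R[0]) for _ in Q]]
    M = [row[:] for row in R]
    for _ in range(T):
        h.append(M)
        M = mat_mul(Q, M)
    return h

def convolve(h, inputs):
    """y_t = sum over s < t of u_s h[t - s], for t = 0..T."""
    m = len(h[0][0])
    ys = []
    for t in range(len(inputs) + 1):
        y = [0.0] * m
        for s in range(t):
            y = [a + b for a, b in zip(y, vec_mat(inputs[s], h[t - s]))]
        ys.append(y)
    return ys

# Hypothetical subsystem: internal block Q and leak block R towards two outside states.
Q = [[0.5, 0.4], [0.3, 0.6]]
R = [[0.1, 0.0], [0.0, 0.1]]
inputs = [[0.2, 0.0], [0.0, 0.1], [0.0, 0.0], [0.3, 0.3]]
ys_sim = simulate(Q, R, inputs)
ys_conv = convolve(impulse_response(Q, R, len(inputs)), inputs)
```

Both routes give the same output sequence, so approximating the impulse response is equivalent to approximating the input-output behaviour of the subsystem.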

Our local time scale separation theorem makes two statements, regarding the approximability of the impulse response of the nodes , before and after an arbitrarily chosen time . The first one at short times follows directly from the high retention implied by a high severability at time , whereas the second one at long times requires a more careful analysis. The theorems are proved in Supp. Inf. A.

Local Time Scale Theorem (Short times).

The system represented by Eq. (14) can be approximated until time by the trivial response , with accuracy .

In other words, the influence of the system on its environment can be neglected altogether over short time scales.

Local Time Scale Theorem (Long times).

The system described by Eq. (14) can be approximated by a one-state system of the following form

(16)

where

is the dominant eigenvalue of

and are appropriate vectors, whose corresponding impulse response is . Vectors

are found from the dominant eigenvectors

, , and , normalised so that . The approximation is valid for all times—including obviously —and the error summed over all times is .

For any given input signal bounded by for all , the exact model described by Eq. (14) and the one-dimensional model given by Eq. (16) deliver outputs whose difference is at all times bounded by .

The constants contained inside in these statements may depend on the dimension of (number of nodes in ), but neither on the specific entries of nor on . In view of these statements, the best time scale separation is given by , at which severability peaks, and the error of the resulting approximations is .
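The long-time statement rests on replacing powers of the subsystem block by a rank-one term built from its dominant eigenvalue and eigenvectors. The following sketch illustrates this numerically under stated assumptions: `Q` is a hypothetical substochastic block, the eigen-triplet is obtained by plain power iteration, and the normalisation follows the one used in Appendix A (left eigenvector summing to one, product of left and right eigenvectors equal to one). It is not the authors' code.

```python
def mat_pow(Q, k):
    """Q raised to the integer power k >= 0 by repeated multiplication."""
    n = len(Q)
    R = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        R = [[sum(R[i][m] * Q[m][j] for m in range(n)) for j in range(n)] for i in range(n)]
    return R

def dominant_triplet(Q, iters=500):
    """Dominant eigenvalue with left (row) and right (column) eigenvectors."""
    n = len(Q)
    left = [1.0 / n] * n
    right = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        left = [sum(left[i] * Q[i][j] for i in range(n)) for j in range(n)]
        lam = sum(left)                    # growth factor: left summed to one before the step
        left = [x / lam for x in left]
        right = [sum(Q[i][j] * right[j] for j in range(n)) for i in range(n)]
        scale = max(right)
        right = [x / scale for x in right]
    dot = sum(l * r for l, r in zip(left, right))
    right = [x / dot for x in right]       # enforce left . right = 1
    return lam, left, right

# Hypothetical substochastic block (row sums 0.95 and 0.90).
Q = [[0.60, 0.35], [0.20, 0.70]]
lam, left, right = dominant_triplet(Q)
k = 20
Qk = mat_pow(Q, k)
# Largest entrywise gap between Q^k and its rank-one surrogate lam^k * right * left.
err = max(abs(Qk[i][j] - lam ** k * right[i] * left[j]) for i in range(2) for j in range(2))
```

The gap decays geometrically at the rate of the second eigenvalue, so once the block has mixed internally a single scalar state captures its outflow.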

Assuming now that the global network is split into two or several blocks, one may combine the different local approximations and obtain the following version of the classic Simon-Ando theorem: given a global dynamics described by Eqs (8) and (9), suppose that we find a common time at which both and . Then the short-term and long-term dynamics can be approximated as in Eqs (11) and (13) with error bounded by , where the hidden constant only depends on the total number of nodes (Supp. Inf. B). The generalisation to more than two components is straightforward. This version highlights the role of severability of each component and the need to find a common global time scale (possibly suboptimal for each component separately) where each component simultaneously reaches a high severability, for a global time scale separation to emerge.

See Supp. Inf. C for a toy example of comparative application of the global and local time scale separation theorems.

iii.3 Computational aspects of Severability optimisation

We apply a semi-greedy search algorithm to find the optimal component for a starting node , at a chosen Markov time and setting a search size (see Appendix D.1 in the Supp. Inf. for a detailed flowchart).

Briefly, the algorithm proceeds as follows. Without loss of generality, define . Initially, only . Aggregate nodes greedily, except let every third step be a Kernighan-Lin switch of a single node on the boundary of to maximise Kernighan and Lin (1970). After the initial semi-greedy optimisation, the intermediate component that has maximal severability is fine-tuned using Kernighan-Lin switches to find a local maximum. If is in the resulting component, the algorithm stops; otherwise, start over with a different neighbour of the starting node. If all neighbours of have been attempted without success, declare an orphan. For the word association network, every neighbour of “nature” was attempted for the first step, giving the overlapping communities.
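The search loop can be sketched in a few lines. What follows is a deliberately simplified illustration and not the authors' algorithm: it grows the component greedily, lets every third step also consider dropping a single member (a crude stand-in for the Kernighan-Lin switch), never removes the starting node, and omits both the final fine-tuning and the restart over other neighbours. The severability form (average of retention and mixing), the lazy random walk and the two-clique test graph are all assumptions made for the example.

```python
def mat_pow(Q, t):
    """Q raised to the integer power t >= 0 by repeated multiplication."""
    n = len(Q)
    R = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(t):
        R = [[sum(R[i][k] * Q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return R

def severability(P, C, t):
    """Assumed form: average of retention and mixing of subset C at time t."""
    Qt = mat_pow([[P[i][j] for j in C] for i in C], t)
    masses = [sum(row) for row in Qt]
    retention = sum(masses) / len(C)
    if min(masses) <= 0.0:
        return retention / 2
    rows = [[x / m for x in row] for row, m in zip(Qt, masses)]
    mean = [sum(col) / len(C) for col in zip(*rows)]
    tv = sum(0.5 * sum(abs(x - y) for x, y in zip(r, mean)) for r in rows) / len(C)
    return 0.5 * (retention + 1.0 - tv)

def semi_greedy(P, start, t, max_size):
    """Simplified semi-greedy search; returns the best subset visited and its score."""
    current = [start]
    best, best_score = [start], severability(P, [start], t)
    step = 0
    while len(current) < max_size:
        step += 1
        inside = set(current)
        frontier = sorted({j for i in current for j, w in enumerate(P[i]) if w > 0} - inside)
        moves = [current + [j] for j in frontier]
        if step % 3 == 0:
            # Every third step: also allow removing one member (never the start node).
            moves += [[k for k in current if k != j] for j in current if j != start]
        if not moves:
            break
        current = max(moves, key=lambda S: severability(P, S, t))
        score = severability(P, current, t)
        if score > best_score:
            best, best_score = list(current), score
    return sorted(best), best_score

# Hypothetical test graph: two 4-cliques joined by the single edge (3, 4).
edges = [(a, b) for a in range(4) for b in range(a + 1, 4)]
edges += [(a + 4, b + 4) for a, b in edges] + [(3, 4)]
n = 8
A = [[0.0] * n for _ in range(n)]
for a, b in edges:
    A[a][b] = A[b][a] = 1.0
# Lazy random walk (assumption): stay put with probability one half.
P = [[0.5 * (i == j) + 0.5 * A[i][j] / sum(A[i]) for j in range(n)] for i in range(n)]
component, score = semi_greedy(P, start=0, t=4, max_size=6)
```

The lazy walk is used so that a single node already has non-zero retention; with a non-lazy walk on a graph without self-loops the one-node starting set would have no surviving mass.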

Other computational aspects of the implementation are described in detail in Supp. Inf. D.

iii.4 Markov chain equivalence of dynamical systems

Markov chains, or random walks, are characterized by a dynamics of the form , where is any nonnegative square matrix with all rows summing to one. To every such dynamics we can associate a dual consensus dynamics acting on the column vector , the entries of which are positions, or opinions, of agents. These opinions converge to a common value if and only if the corresponding random walk converges to a unique stationary distribution.

Positive linear systems are common in economics, biology and chemistry, where variables naturally take nonnegative values. Such systems are characterized by an evolution , or , where is only required to be nonnegative. Under the same connectivity conditions on the network underlying , we know that there is a unique dominant eigenvalue with corresponding left and right eigenvectors, and respectively, all of which are positive by virtue of the Perron-Frobenius theorem.

This property allows a normalization that transforms the dynamics into a consensus, or random walk, dynamics. The new matrix is , where is the diagonal matrix associated to . It is readily observed that is a valid transition matrix, and is equivalent to except for a global scaling and a change of variable on every node. In particular, it has the same eigenvectors and acts on the same underlying network as . This transformation has an elegant information-theoretic interpretation as the random walk with maximal entropy rate (if is a zero-one matrix) or a free energy (if the nonnegative entries are interpreted as exponential energy barriers along the edges) Ruelle and Gallavotti (1978); Delvenne and Libert (2011).
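One standard way to carry out such a normalization is sketched below. It is an assumption that this matches the exact form intended here: the sketch rescales a nonnegative irreducible matrix `A` by its dominant eigenvalue and conjugates it with the diagonal matrix of its positive right eigenvector, giving entries A[i][j] * v[j] / (lam * v[i]) whose rows sum to one. The matrix `A` and the function names are hypothetical, and the eigenpair is computed by power iteration on A + I so that periodic matrices do not cause oscillation.

```python
def perron(A, iters=2000):
    """Dominant eigenvalue and positive right eigenvector of an irreducible
    nonnegative matrix, by power iteration on A + I."""
    n = len(A)
    v = [1.0] * n
    for _ in range(iters):
        w = [v[i] + sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        s = sum(w)
        v = [x / s for x in w]
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(Av) / sum(v)
    return lam, v

def to_markov(A):
    """Return (P, lam, v) with P[i][j] = A[i][j] * v[j] / (lam * v[i])."""
    lam, v = perron(A)
    n = len(A)
    P = [[A[i][j] * v[j] / (lam * v[i]) for j in range(n)] for i in range(n)]
    return P, lam, v

# Hypothetical positive linear system on a strongly connected 3-node network.
A = [[0.0, 2.0, 1.0],
     [1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0]]
P, lam, v = to_markov(A)
```

The resulting matrix has the same zero pattern as `A`, so it acts on the same underlying network while being a valid transition matrix.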

Markov chains are also defined in continuous time, following an equation of the form , where the continuous-time transition matrix has nonpositive diagonal, nonnegative off-diagonal terms, and zero-sum rows. One also has continuous-time consensus, and any positive continuous-time linear system, characterized by a matrix with nonpositive diagonal and nonnegative off-diagonal terms, can be similarly normalized to a continuous-time Markov chain, which can be sampled to a discrete-time Markov chain.
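Sampling a continuous-time chain at a fixed interval amounts to taking a matrix exponential of its rate matrix. The sketch below illustrates this with a truncated Taylor series, which is adequate only for small, well-scaled matrices and is used here to keep the example dependency-free; the rate matrix `L` and the sampling interval are hypothetical.

```python
def expm(L, tau, terms=40):
    """exp(tau * L) via a truncated Taylor series (small matrices only)."""
    n = len(L)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes (tau * L)^k / k!
        term = [[sum(term[i][m] * L[m][j] for m in range(n)) * tau / k
                 for j in range(n)] for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

# Hypothetical rate matrix: nonpositive diagonal, nonnegative off-diagonal, zero row sums.
L = [[-0.3, 0.2, 0.1],
     [0.5, -0.5, 0.0],
     [0.1, 0.4, -0.5]]
P = expm(L, tau=1.0)   # discrete-time transition matrix for sampling interval 1
```

Each row of `P` sums to one and all entries are nonnegative, so the sampled process is an ordinary discrete-time Markov chain to which the discrete-time definitions apply.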

Some non-linear systems can be linearized around a fixed point. Classic theorems such as Hartman-Grobman’s ensure that the nonlinear and linearized systems are equivalent up to a change of variables in a neighbourhood of the fixed point. Kuramoto oscillators and power network dynamics linearize to consensus dynamics.

Acknowledgements.
The authors would like to thank Antoine Delmotte, Michael Schaub, Arnaud Browet, Florian Dörfler and Renaud Lambiotte for code and/or discussions. Neuron fluorescence imagery is courtesy of Simon Schultz and Marie-Therese Vasilache. Y.W.Y. was partially supported by an Imperial Marshall Scholarship during the early years of this work. J.-C. D. is partly supported by the Flagship European Research Area Network (FLAG-ERA) Joint Transnational Call “FuturICT 2.0”. This work is partially supported by the Engineering and Physical Sciences Research Council of the United Kingdom.

Author Contributions.
M.B. and S.Y. conceived the project and guided the research. J.-C.D. developed the theoretical analyses. Y.W.Y. implemented and designed the experimental analyses. The manuscript was jointly written by all authors.

References

  • R. Andersen, F. Chung, and K. Lang (2007) Using pagerank to locally partition a graph. Internet Mathematics 4 (1), pp. 35–64. Cited by: §I.3.
  • A. Ando and F. M. Fisher (1963) Near-decomposability, partition and aggregation, and the relevance of stability discussions. International Economic Review 4 (1), pp. 53–67. Cited by: Severability of mesoscale components and local time scales in dynamical networks.
  • B. Avramovic, P. V. Kokotovic, J. R. Winkelman, and J. H. Chow (1980) Area decomposition for electromechanical models of power systems. Automatica 16 (6), pp. 637–648. Cited by: §I.2.
  • M. Beguerisse-Díaz, G. Garduño-Hernández, B. Vangelov, S. N. Yaliraki, and M. Barahona (2014) Interest communities and flow roles in directed networks: the twitter network of the uk riots. Journal of The Royal Society Interface 11 (101), pp. 20140940. Cited by: §II.
  • V. D. Blondel, J. L. Guillaume, R. Lambiotte, and E. Lefebvre (2008) Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008, pp. P10008. Cited by: Figure 7.
  • S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D. Hwang (2006) Complex networks: structure and dynamics. Physics reports 424 (4), pp. 175–308. Cited by: §I.2.
  • A. Browet, P. Absil, and P. Van Dooren (2011) Community detection for hierarchical image segmentation. Combinatorial Image Analysis, pp. 358–371. Cited by: §I.5.
  • A. Bultheel and M. Van Barel (1986) Padé techniques for model reduction in linear system theory: a survey. Journal of Computational and Applied Mathematics 14 (3), pp. 401–438. Cited by: Severability of mesoscale components and local time scales in dynamical networks.
  • F.R.K. Chung (1997) Spectral graph theory. Regional conference series in mathematics, Amer. Mathematical Society. Cited by: Appendix E.
  • M. Coderch, A. Willsky, S. Sastry, and D. Castanon (1983) Hierarchical aggregation of linear systems with multiple time scales. IEEE Transactions on Automatic Control 28 (11), pp. 1017–1030. Cited by: Severability of mesoscale components and local time scales in dynamical networks.
  • K. Cooper and M. Barahona (2010) Role-based similarity in directed networks. arXiv preprint arXiv:1012.2726. Cited by: §II.
  • K. Cooper and M. Barahona (2011) Role-similarity based comparison of directed networks. arXiv preprint arXiv:1103.5582. Cited by: §II.
  • S. P. Cornelius, W. L. Kath, and A. E. Motter (2013) Realistic control of network dynamics. Nature communications 4. Cited by: Severability of mesoscale components and local time scales in dynamical networks.
  • L. Danon, A. Díaz-Guilera, J. Duch, and A. Arenas (2005) Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment 2005 (09), pp. P09008. ISSN 1742-5468. Cited by: §D.3.
  • J. N. Darroch and E. Seneta (1965) On quasi-stationary distributions in absorbing discrete-time finite Markov chains. Journal of Applied Probability 2 (1), pp. 88–100. ISSN 0021-9002. Cited by: §I.1, §III.1.
  • P. G. de Gennes (1976) La percolation: un concept unificateur. La recherche 7 (72), pp. 919–927. Cited by: §I.1.
  • J. Delvenne, R. Lambiotte, and L. E. Rocha (2015) Diffusion on networked systems is a question of time or structure. Nature communications 6. Cited by: §II.
  • J. Delvenne and A. Libert (2011) Centrality measures and thermodynamic formalism for complex networks. Physical Review E 83 (4), pp. 046117. Cited by: §III.4.
  • J. Delvenne, M. T. Schaub, S. N. Yaliraki, and M. Barahona (2013) Dynamics on and of complex networks. In Oxford Handbook of Innovation, Vol. 2, pp. 221–242. Cited by: §I.3.
  • S. Derisavi, H. Hermanns, and W. H. Sanders (2003) Optimal state-space lumping in markov chains. Information Processing Letters 87 (6), pp. 309–315. Cited by: Severability of mesoscale components and local time scales in dynamical networks.
  • F. Dorfler and F. Bullo (2011) Topological equivalence of a structure-preserving power network model and a non-uniform kuramoto model of coupled oscillators. In Decision and Control and European Control Conference (CDC-ECC), 2011 50th IEEE Conference on, pp. 7099–7104. Cited by: Appendix E, Severability of mesoscale components and local time scales in dynamical networks.
  • F. Dörfler and F. Bullo (2012) Synchronization and transient stability in power networks and nonuniform kuramoto oscillators. SIAM Journal on Control and Optimization 50 (3), pp. 1616–1642. Cited by: Appendix E.
  • F. Dörfler, M. Chertkov, and F. Bullo (2013) Synchronization in complex oscillator networks and smart grids. Proceedings of the National Academy of Sciences 110 (6), pp. 2005–2010. Cited by: §I.2, Severability of mesoscale components and local time scales in dynamical networks.
  • M. Girvan and M. E. J. Newman (2002) Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99 (12), pp. 7821–7826. Cited by: §I.3, Severability of mesoscale components and local time scales in dynamical networks.
  • B. H. Good, Y. de Montjoye, and A. Clauset (2010) Performance of modularity maximization in practical contexts. Physical Review E 81 (4), pp. 046106. Cited by: Figure 7.
  • L. Grady (2006) Random walks for image segmentation. Pattern Analysis and Machine Intelligence, IEEE Transactions on 28 (11), pp. 1768–1783. Cited by: §I.5.
  • J. Heidemann, M. Klier, and F. Probst (2012) Online social networks: a survey of a global phenomenon. Computer Networks 56 (18), pp. 3866–3878. Cited by: Severability of mesoscale components and local time scales in dynamical networks.
  • D. Helbing (2013) Globally networked risks and how to respond. Nature 497 (7447), pp. 51–59. Cited by: Severability of mesoscale components and local time scales in dynamical networks.
  • T. Ideker and N. J. Krogan (2012) Differential network biology. Molecular systems biology 8 (1), pp. 565. Cited by: Severability of mesoscale components and local time scales in dynamical networks.
  • L. G. Jeub, P. Balachandran, M. A. Porter, P. J. Mucha, and M. W. Mahoney (2015) Think locally, act locally: detection of small, medium-sized, and large communities in large networks. Physical Review E 91 (1), pp. 012821. Cited by: §I.3, §I.5.
  • M. Kanehisa and S. Goto (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Research 28 (1), pp. 27–30. Cited by: Figure 3, §I.3.
  • R. Kannan, S. Vempala, and A. Vetta (2004) On clusterings: good, bad and spectral. Journal of the ACM 51 (3), pp. 497–515. Cited by: §I.3, Severability of mesoscale components and local time scales in dynamical networks.
  • B. Karrer and M. E. Newman (2011) Stochastic blockmodels and community structure in networks. Physical Review E 83 (1), pp. 016107. Cited by: §I.3.
  • B. W. Kernighan and S. Lin (1970) An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal 49 (2), pp. 291–307. Cited by: §III.3.
  • P. V. Kokotovic, J. O’Reilly, and H. K. Khalil (1986) Singular perturbation methods in control: analysis and design. Academic Press, Inc., Orlando, FL, USA. External Links: ISBN 0124176356 Cited by: Severability of mesoscale components and local time scales in dynamical networks.
  • D. Korenblum and D. Shalloway (2003) Macrostate data clustering. Physical Review E 67 (5), pp. 056704. Cited by: Severability of mesoscale components and local time scales in dynamical networks.
  • A. Lancichinetti, S. Fortunato, and J. Kertész (2009) Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics 11 (3), pp. 033015. ISSN 1367-2630. Cited by: §D.3.
  • A. Lancichinetti, S. Fortunato, and F. Radicchi (2008) Benchmark graphs for testing community detection algorithms. Physical Review E 78 (4), pp. 046110. Cited by: §F.1, §I.3.
  • A. Lancichinetti and S. Fortunato (2009) Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Physical Review E 80 (1), pp. 016118. Cited by: Figure 8, Figure 9, §F.2, §F.3.
  • A. Lancichinetti and S. Fortunato (2009) Community detection algorithms: a comparative analysis. Physical review E 80 (5), pp. 056117. Cited by: §F.1, §F.2, §I.3.
  • A. Lancichinetti, F. Radicchi, J. J. Ramasco, and S. Fortunato (2011) Finding statistically significant communities in networks. PloS one 6 (4), pp. e18961. Cited by: §I.3, §I.5.
  • A. Majdandzic, B. Podobnik, S. V. Buldyrev, D. Y. Kenett, S. Havlin, and H. E. Stanley (2014) Spontaneous recovery in dynamical networks. Nature Physics 10 (1), pp. 34–38. Cited by: Severability of mesoscale components and local time scales in dynamical networks.
  • N. Masuda, M. A. Porter, and R. Lambiotte (2016) Random walks and diffusion on networks. arXiv preprint arXiv:1612.03281. Cited by: §I.3.
  • B. Moore (1981) Principal component analysis in linear systems: controllability, observability, and model reduction. Automatic Control, IEEE Transactions on 26 (1), pp. 17–32. Cited by: Severability of mesoscale components and local time scales in dynamical networks.
  • I.C. Morarescu and A. Girard (2009) Opinion dynamics with decaying confidence: application to community detection in graphs. Automatic Control, IEEE Transactions on (99), pp. 1862–1873. Cited by: §I.3.
  • D. L. Nelson, C. L. McEvoy, and T. A. Schreiber (1998) The University of South Florida word association, rhyme, and word fragment norms. Cited by: §I.4.
  • M. E. J. Newman and M. Girvan (2004) Finding and evaluating community structure in networks. Physical Review E 69 (2), pp. 026113. Cited by: §I.3.
  • R. Olfati-Saber, J. A. Fax, and R. M. Murray (2007) Consensus and cooperation in networked multi-agent systems. Proceedings of the IEEE 95 (1), pp. 215–233. Cited by: Appendix E.
  • G. Palla, I. Derenyi, I. Farkas, and T. Vicsek (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435 (7043), pp. 814–818. ISSN 0028-0836. Cited by: §F.2, §I.3, §I.4.
  • L. Pernebo and L. M. Silverman (1982) Model reduction via balanced state space representations. Automatic Control, IEEE Transactions on 27 (2), pp. 382–387. Cited by: Severability of mesoscale components and local time scales in dynamical networks.
  • F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi (2004) Defining and identifying communities in networks. Proceedings of the National Academy of Sciences 101 (9), pp. 2658–2663. Cited by: §F.1.
  • J. D. Roberts (1980) Linear model reduction and solution of the algebraic riccati equation by use of the sign function†. International Journal of Control 32 (4), pp. 677–687. Cited by: Severability of mesoscale components and local time scales in dynamical networks.
  • D. Romeres, F. Dörfler, and F. Bullo (2013) Novel results on slow coherency in consensus and power networks. In European Control Conference, Zürich, Switzerland, Cited by: §I.2.
  • M. Rosvall and C. T. Bergstrom (2008) Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences 105 (4), pp. 1118–1123. Cited by: §I.3.
  • M. Rosvall, A. V. Esquivel, A. Lancichinetti, J. D. West, and R. Lambiotte (2014) Memory in network flows and its effects on spreading dynamics and community detection. Nature communications 5. Cited by: §II.
  • D. Ruelle and G. Gallavotti (1978) Thermodynamic formalism. Vol. 112, Addison-Wesley Reading. Cited by: §III.4.
  • M.T. Schaub, J.C. Delvenne, S.N. Yaliraki, and M. Barahona (2012) Markov dynamics as a zooming lens for multiscale community detection: non clique-like communities and the field-of-view limit. PloS one 7 (2), pp. e32210. Cited by: §I.3, §I.5.
  • J. Shi and J. Malik (2000) Normalized cuts and image segmentation. Pattern Analysis and Machine Intelligence, IEEE Transactions on 22 (8), pp. 888–905. Cited by: §I.3, §I.5.
  • H. A. Simon and A. Ando (1961) Aggregation of variables in dynamic systems. Econometrica: journal of the Econometric Society, pp. 111–138. Cited by: §III.2.1, Severability of mesoscale components and local time scales in dynamical networks, Severability of mesoscale components and local time scales in dynamical networks.
  • D. A. Spielman and S. Teng (2013) A local clustering algorithm for massive graphs and its application to nearly linear time graph partitioning. SIAM Journal on Computing 42 (1), pp. 1–26. Cited by: §I.3.
  • S. H. Strogatz (2001) Exploring complex networks. Nature 410 (6825), pp. 268–276. ISSN 0028-0836. Cited by: Severability of mesoscale components and local time scales in dynamical networks.
  • S. H. Strogatz (2000) From kuramoto to crawford: exploring the onset of synchronization in populations of coupled oscillators. Physica D: Nonlinear Phenomena 143 (1), pp. 1–20. Cited by: §I.2.
  • S. Van Dongen (2008) Graph clustering via a discrete uncoupling process. SIAM Journal on Matrix Analysis and Applications 30 (1), pp. 121–141. Cited by: §I.3.
  • D. J. Wales (2006) Energy landscapes: calculating pathways and rates. International Reviews in Physical Chemistry 25 (1-2), pp. 237–282. Cited by: §I.1.
  • R. A. Watson and J. B. Pollack (2005) Modular interdependency in complex dynamical systems. Artificial Life 11 (4), pp. 445–457. Cited by: Severability of mesoscale components and local time scales in dynamical networks.
  • D. J. Watts and S. H. Strogatz (1998) Collective dynamics of ‘small-world’ networks. Nature 393 (6684), pp. 440–442. Cited by: Severability of mesoscale components and local time scales in dynamical networks.
  • Z. Wu and R. Leahy (1993) An optimal graph theoretic approach to data clustering: theory and its application to image segmentation. Pattern Analysis and Machine Intelligence, IEEE Transactions on 15 (11), pp. 1101–1113. Cited by: §I.5.

Appendix A Proof of the local time scale separation theorem

Proof.

On the short time scale, the validity of the approximation is given by where for (and at ).

We notice that expresses the probabilities of escape within time . In fact retention as introduced in Eq. (4) can be expressed as

(17)

for the choice of -by- matrix norm , given that all entries of are nonnegative. The short time scale local theorem results from which follows directly from the definition of severability.

Equation (16) generates an impulse response for . The difference , decays to zero exponentially provided that has dominant eigenvalue , eigenvectors (normalised so that the entries of row vector sum to one), (normalised so that ) and . Indeed this guarantees that behaves as for high .

If were perfectly stochastic, then and we would have and . As is almost stochastic, we expect that and are in for all , which we can prove indeed in the following way.

It is well known from Perron-Frobenius theory that the dominant eigenvalue of a matrix with positive entries sits between the minimum and maximum row sums. Therefore , and hence . To evaluate , let us call the row-normalized matrix derived from , where every row is scaled so as to sum to one. Then the distance (in any norm) between any two rows of is in , by the definition of internal mixing in Eq. (5). The distance between and on the other hand is in , by definition of the retention. Therefore the distance between any two rows of is in , thus for some positive vector . Premultiplying this equality by , we get , thus . Postmultiplying instead by gives , as required.

Consider the remainder , thus from spectral decomposition properties. We find from the above that .

Choosing the matrix norm , which happens to be submultiplicative ( for all ), and using the identity , applied here to (for which the identity is valid because all eigenvalues of have absolute value ), we deduce a bound for the error on the impulse response:

using also and for any family of -by- nonnegative matrices .

We obtain the same result for , with . This is because , easily obtained from the fact . This approximation has the nice property of preserving the flow of probability: it describes the behaviour of a single super-node that aggregates all the input probability flows, expelling a small fraction of stored probability mass at every step to the nodes in the rest of the system, with weights given by . ∎

Therefore different approximations rule the short-term (where a large retention matters) and long-term behaviours (where fast mixing also matters).

The proof highlights that the theorem is robust with respect to the choice of norms in the definition of severability and in the statement of the theorems, as it changes the hidden constants in a way that only depends on , the number of nodes. The specific definition chosen for severability in this article is motivated by simplicity, convenient computation and good practical results.
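The Perron-Frobenius row-sum bound invoked in the proof above is easy to check numerically. The sketch below is illustrative: the substochastic block `Q` is hypothetical and the dominant eigenvalue is obtained by plain power iteration. It confirms that the eigenvalue lies between the smallest and largest row sums, so that its deficit from one is controlled by the largest one-step leak out of the block.

```python
def dominant_eigenvalue(Q, iters=1000):
    """Dominant eigenvalue of a primitive nonnegative matrix by power iteration."""
    n = len(Q)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(Q[i][j] * v[j] for j in range(n)) for i in range(n)]
        s = sum(w)
        lam = s / sum(v)           # converges to the dominant eigenvalue
        v = [x / s for x in w]
    return lam

# Hypothetical block with one-step leaks of 0.05, 0.05 and 0.02.
Q = [[0.60, 0.35, 0.00],
     [0.20, 0.70, 0.05],
     [0.10, 0.30, 0.58]]
row_sums = [sum(row) for row in Q]
lam = dominant_eigenvalue(Q)
leak = 1.0 - lam                   # bounded above by the largest one-step leak
```

Since the one-step row sums are the one-step retention of each starting node, a block that retains well at one step necessarily has a dominant eigenvalue close to one.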

Appendix B From local to global: a proof of Simon-Ando’s theorem

We now provide a proof of Simon and Ando’s global time scale theorem, stated in terms of severability of the components. Assume that a partition of the network into two components reveals a common time scale at which each severability is higher than . In the short run, every component can ignore the other and evolve separately, with a resulting error of order . Let us turn to the long run case.

For times , one may write (as the probability mass leaking from component , is in ). From high mixing, all rows of are -close to one another, and -close to a multiple of the dominant eigenvector (quasi-stationary distribution). The same holds for , -close to a multiple of . The full state trajectory thus remains -close to a trajectory of the form , and therefore it is enough to know the two-dimensional trajectory (in fact one-dimensional in the set of probability measures because subject to the constraint ) to reconstruct approximately . This means that , the image of the set of all probability measures under the map is invariant under , has diameter in the direction , and is ‘thin’ in that every point of is -away from that direction.

Now consider the two-dimensional dominant eigenspace of

, generated by the dominant left eigenvector (stationary distribution) of eigenvalue and the second left eigenvector (normalised to unit norm) of eigenvalue . The intersection of that space in the space of probability measures is one-dimensional, of the form . On this eigen-set, the dynamics takes the simple, exact form . Given that , we know that every point is -approximated by the projection on . Therefore, the aggregated dynamics obtained in replacing and by their approximations in terms of and induces a one-dimensional aggregated dynamics on the direction where and are replaced by their aggregation and , and the projected dynamics is given by .

The trajectory initiated by a point , and the trajectory generated from its projection by this projected dynamics, remain -close at all times.

On the other hand, any point in is -close to a point in , and those two points remain -close when both are iterated by , as contracts the 1-distance (or total variation distance ; the induced 1-norm of is 1).

Now we can conclude. The trajectory initiated by any point in (iterated by the exact dynamics ) remains -close at all times to some trajectory in the eigen-set , which itself remains -close at all times to the projected, aggregated dynamics on the direction . Therefore any trajectory in generated by the actual dynamics is -close at all times to the one-dimensional dynamics on the aggregated quantities.

A closer look would show that the projected dynamics taken from the approximation given by the Local Time Scale separation theorem on each block separately, is not strictly identical to the aggregated dynamics presented here, but the trajectories generated by the two one-dimensional dynamics are -close at all times again.

In the above, all hidden constants in the notation are dependent on the specific norms used to measure distances, thus dependent on the number of nodes in each block, but on nothing else.

This completes the proof of Simon and Ando’s global time scale theorem, as given in Methods (see Section III.2.1), since arbitrarily small perturbations from a fixed, block-diagonal transition matrix lead to arbitrarily high severability for arbitrarily large intervals of time.

The global nature of the theorem reveals itself in the fact that it needs simultaneously at time , a high mixing and a high retention in every component, thus shedding light on the conditions required for global time-scale separation to hold.

See the next Appendix for a simple example showing that and described in the text in the classic statement of Simon-Ando’s theorem indeed depend on the global information .

Appendix C Global vs Local time scale separation theorems: an example

We apply our version of Simon-Ando’s theorem (formulated in terms of severabilities) to a toy example of four nodes separated into two blocks, or mesoscale components. We then modify the example so that Simon-Ando’s global theorem does not apply any more, but our local theorem still applies.

Consider

(18)

which is -close to the block-diagonal matrix

(19)

Let us compare the trajectories generated by the two initial conditions and , which both lead to the same aggregation (probability 1 in the first block).

If and , then it is clear that their trajectories will remain very different, even at the aggregated level, for a long time, as at times of the order , the first trajectory will be concentrated mostly on the second block (and so will the aggregated trajectory), while the second trajectory will stay confined in the first block.

If then at times of the order the first trajectory will be equally split between the two blocks, while the second trajectory will again be confined in the first block.

Thus if we want to reach a given accuracy in Simon-Ando’s theorem, for instance , we need to take of the order of , which shows the global dependency of on the ‘internal details’ of both blocks. The transition between the short time regime and the long time regimes occurs at time .

In our language, the severability of each block will be high (close to one) for times between and (if indeed, otherwise the severability remains low at all times). We see indeed that these intervals will start overlapping at time . We can therefore apply our version of Simon-Ando’s theorem, as we have simultaneous high severability in each block, for some time .

This also shows the intrinsically asymptotic nature of Simon-Ando’s original theorem: as is decreased, the peak of severability for each block extends into a plateau stretching until , eventually forcing overlap of plateaus for small enough .
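The mechanism discussed above can be illustrated numerically. The following is a minimal sketch on a hypothetical four-state chain: the matrix `P`, the values of `delta` and `eps`, and the helper `block1_mass` are assumptions made purely for illustration and are not the matrix of Eq. (18). The sketch assumes that mixing inside the first block is much slower than leakage out of it, and shows that two initial conditions with the same aggregation (probability 1 in the first block) can still give very different aggregated trajectories.

```python
import numpy as np

# Hypothetical 4-state chain (NOT the matrix of Eq. 18).
# States 0 and 1 form block 1; states 2 and 3 form block 2.
delta = 1e-4  # assumed weak mixing inside block 1
eps = 1e-2    # assumed leakage between the two blocks

P = np.array([
    [1 - delta, delta,           0.0,       0.0],
    [delta,     1 - delta - eps, eps,       0.0],
    [0.0,       eps,             0.5 - eps, 0.5],
    [0.0,       0.0,             0.5,       0.5],
])
assert np.allclose(P.sum(axis=1), 1.0)  # P is row-stochastic


def block1_mass(start, t):
    """Probability of being in block 1 after t steps when starting at `start`."""
    p0 = np.zeros(4)
    p0[start] = 1.0
    pt = p0 @ np.linalg.matrix_power(P, t)
    return pt[:2].sum()


t = 300  # a few multiples of 1/eps, but far shorter than 1/delta
mass_from_0 = block1_mass(0, t)  # walker started on the node that does not leak
mass_from_1 = block1_mass(1, t)  # walker started on the node that leaks
print(mass_from_0, mass_from_1)
```

In this sketch the walker started on state 0 is still almost entirely in block 1 at time `t`, whereas most of the mass of the walker started on state 1 has moved to block 2, so the aggregated description depends on the internal details of the block.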

If we consider a slightly more complicated example:

(20)

with then Simon-Ando’s theorem cannot be formally applied, because it assumes a fixed block-diagonal structure, and an arbitrarily small perturbation of it. We find the same conclusion in the language of severability. We see that the severability of each block peaks in the interval of times between and . As these intervals do not coincide, we indeed cannot apply our version of Simon-Ando’s theorem.

Our local time scale theorem is nevertheless applicable to each block separately, and allows us to identify them as mesoscale components reaching high severability at different time scales. This shows that the local time scale theorem is of wider applicability and is a more relevant tool to identify components with dynamical coherence in a complex, heterogeneous dynamical system.

Appendix D Computational aspects of Severability

d.1 Severability optimization flowchart

Figure 6: Flowchart of the optimisation procedure to find the most severable component to which a node belongs. For clarity, the Markov time is assumed to be constant in this diagram.

d.2 Computational Complexity

Let be the number of nodes in a graph. The severability of a component of size for a Markov time can be computed in time, where the cubic term comes from schoolbook matrix multiplication. Computation of mixing and retention given are both O() operations, so the total cost is dominated by matrix exponentiation.

The cost can be reduced using fast matrix multiplication techniques; for instance, using Strassen’s method, the total cost would only be . Alternatively, for large , matrix diagonalisation can be first employed, which makes the term negligible, giving a solution.
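To make the diagonalisation route concrete, here is a minimal sketch comparing a matrix power obtained by repeated multiplication with one obtained from an eigendecomposition. The matrix `Q` and both function names are hypothetical; the sketch assumes that the sub-matrix restricted to the component is diagonalisable, which need not hold for every transition matrix.

```python
import numpy as np


def power_by_squaring(Q, t):
    """Q^t by repeated matrix multiplication (cost grows with log t)."""
    return np.linalg.matrix_power(Q, t)


def power_by_diagonalisation(Q, t):
    """Q^t from Q = V diag(w) V^{-1}; assumes Q is diagonalisable.

    After the one-off eigendecomposition, changing t only requires
    raising the eigenvalues to a different power.
    """
    w, V = np.linalg.eig(Q)
    Qt = (V * w**t) @ np.linalg.inv(V)  # V @ diag(w**t) @ V^{-1}
    return Qt.real


# Hypothetical sub-stochastic matrix: the transition matrix restricted
# to a 3-node component (row sums below 1 represent escaping walkers).
Q = np.array([
    [0.6, 0.2, 0.1],
    [0.2, 0.5, 0.2],
    [0.1, 0.2, 0.6],
])

same = np.allclose(power_by_squaring(Q, 50), power_by_diagonalisation(Q, 50))
print(same)
```

The eigendecomposition is computed once per component, so scanning many Markov times for the same component reuses it.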

However, finding good components is more involved than simply computing the severability of a single set of nodes. The cost of the component optimisation algorithm described in Appendix D.1 is more difficult to characterise, as it depends strongly on the number of nodes neighbouring the putative component throughout the procedure. In pathological cases, the cost is , where is the maximum number of nodes permitted in the component, and is the size of the graph. Luckily, this upper bound only occurs in complete graphs, and so is of little relevance as most real networks are far sparser. However, by specifying the maximum component size , one can choose the maximal computational resources one wants to spend trying to find a component.

Potential optimizations include using a random walk to highlight likely candidate neighbours; for instance, by choosing only the nodes that a random walker uniformly distributed in would most likely walk to in the next step, or for removal of nodes, the nodes in that have the least density of probability. Such an algorithm would only cost , a significant improvement.
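The candidate-selection heuristic described above can be sketched as follows. The function names, the parameter `k`, and the example graph are hypothetical, and measuring the "density of probability" after a single step of the walk is an assumption of this sketch rather than something the text specifies.

```python
import numpy as np


def _one_step(P, component):
    """Distribution after one step of a walker started uniformly on `component`."""
    inside = np.zeros(P.shape[0], dtype=bool)
    inside[list(component)] = True
    p0 = inside / inside.sum()
    return inside, p0 @ P


def candidate_additions(P, component, k):
    """The k outside nodes the walker is most likely to reach in one step."""
    inside, p1 = _one_step(P, component)
    outside = np.flatnonzero(~inside & (p1 > 0))
    ranked = outside[np.argsort(-p1[outside], kind="stable")]
    return ranked[:k].tolist()


def candidate_removals(P, component, k):
    """The k member nodes holding the least probability after one step."""
    _, p1 = _one_step(P, component)
    members = np.array(sorted(component))
    ranked = members[np.argsort(p1[members], kind="stable")]
    return ranked[:k].tolist()


# Hypothetical weighted, undirected graph on 5 nodes.
A = np.zeros((5, 5))
for u, v, w in [(0, 1, 5.0), (1, 2, 3.0), (0, 3, 1.0), (3, 4, 1.0)]:
    A[u, v] = A[v, u] = w
P = A / A.sum(axis=1, keepdims=True)  # random-walk transition matrix

additions = candidate_additions(P, {0, 1}, 2)
removals = candidate_removals(P, {0, 1}, 1)
print(additions, removals)
```

Only the `k` highest-ranked candidates are then evaluated for severability, instead of every neighbour of the putative component.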

More subtly, the computational cost of the matrix powers might also be reduced, by taking advantage of the fact that for each of the neighbouring components is effectively a rank-2 perturbation of . Furthermore, as briefly mentioned in the discussion, severability is only one way of quantifying the mixing and retention of random walkers. Alternative methods may be found that are quicker.

d.3 Benchmarking against community detection methods

Optimal component cover.

To compare against benchmarks with overlapping components, it is necessary to generate a list of components to cover the network. Simply taking the optimal components of each node is suboptimal, because then there are many duplicate components in the list. Instead, we chose the following naive method:

  1. Let be the set of components; let be the set of nodes that have been assigned to at least one component.

  2. Choose a node that is more connected to unassigned nodes than to nodes in . If no such node exists, end.

  3. Find the optimal component for , and add to and the nodes in to .

  4. Repeat from step 2.
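The four steps above can be sketched as follows. The callback `find_optimal_component` stands in for the severability optimisation and is hypothetical, as is the toy graph. This sketch further assumes that seeds are drawn from nodes not yet assigned, and that the seed always belongs to its own component, so that the loop terminates.

```python
import numpy as np


def component_cover(A, find_optimal_component):
    """Greedy cover of a weighted graph with adjacency matrix A."""
    n = A.shape[0]
    cover = []                             # step 1: list of components found
    is_assigned = np.zeros(n, dtype=bool)  # step 1: nodes in some component
    while True:
        seed = None
        for v in range(n):                 # step 2: look for a valid seed
            if is_assigned[v]:
                continue
            to_assigned = A[v, is_assigned].sum()
            to_unassigned = A[v, ~is_assigned].sum()
            if to_unassigned > to_assigned:
                seed = v
                break
        if seed is None:
            return cover
        # step 3: optimise around the seed and record the result
        component = frozenset(find_optimal_component(seed)) | {seed}
        cover.append(component)
        is_assigned[list(component)] = True
        # step 4: loop back to step 2


# Toy graph: two triangles joined by the single edge (2, 3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0


def stub_finder(node):
    """Hypothetical stand-in: return the triangle containing `node`."""
    return {0, 1, 2} if node < 3 else {3, 4, 5}


cover = component_cover(A, stub_finder)
print(cover)
```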

Partitioning.

To compare severability with partitioning methods, it is necessary to turn the optimal component cover into a partition. To do so, first order the components of the cover arbitrarily. Where a node appears in multiple components, always choose the first component it appears in. This procedure is obviously dependent upon the ordering of the components; however, in networks with well-defined partition structure, this method works sufficiently well, as demonstrated in the LFR benchmark.
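A minimal sketch of this first-come assignment is given below; the function name and the example cover are hypothetical.

```python
def cover_to_partition(cover):
    """Map each node to the index of the first component that contains it."""
    label = {}
    for index, component in enumerate(cover):
        for node in component:
            # setdefault keeps the earlier label when a node reappears
            label.setdefault(node, index)
    return label


# Node 2 belongs to both components and is kept in the first one.
partition = cover_to_partition([{0, 1, 2}, {2, 3}])
print(partition)
```

Reordering the cover changes the labels of overlapping nodes only, which is why the procedure is adequate when the overlap is small.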

Choice of Markov Time.

For hierarchical networks, Markov time serves as a useful resolution parameter, allowing for severability to pick out optimal component structure at different levels. However, existing metrics Danon et al. (2005); Lancichinetti et al. (2009) require the selection of a single time . For partitions, this can be done by choosing a Markov time to minimise the number of singleton and overlapping vertices, but other could be chosen.
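One possible reading of this selection rule is sketched below. The function names and the toy covers are hypothetical; the sketch assumes that a "singleton" vertex is one that belongs to no component of more than one node, and that an "overlapping" vertex is one that belongs to more than one such component.

```python
from collections import Counter


def count_singletons_and_overlaps(cover, nodes):
    """Number of nodes left alone plus number of nodes shared by components."""
    membership = Counter(v for comp in cover if len(comp) > 1 for v in comp)
    singletons = sum(1 for v in nodes if membership[v] == 0)
    overlapping = sum(1 for v in nodes if membership[v] > 1)
    return singletons + overlapping


def choose_markov_time(covers_by_time, nodes):
    """Markov time whose cover has the fewest singleton and overlapping nodes."""
    return min(
        sorted(covers_by_time),
        key=lambda t: count_singletons_and_overlaps(covers_by_time[t], nodes),
    )


nodes = range(6)
covers_by_time = {
    1: [{0}, {1}, {2}, {3}, {4}, {5}],  # too fine: every node is alone
    4: [{0, 1, 2}, {3, 4, 5}],          # clean partition
    16: [{0, 1, 2, 3}, {2, 3, 4, 5}],   # too coarse: nodes 2 and 3 overlap
}
best_time = choose_markov_time(covers_by_time, nodes)
print(best_time)
```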

Quantifying similarity of partitions.

To compare partitions across different methods, normalised mutual information Danon et al. (2005) has been employed. To compare component covers, a generalisation of normalised mutual information that allows for overlapping nodes has been used Lancichinetti et al. (2009). We refer to the generalised variant as simply “normalised mutual information”, without loss of precision as only the generalised variant can be used in the benchmarks with overlapping components.
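For reference, the normalised mutual information between two non-overlapping partitions can be sketched as twice the mutual information divided by the sum of the two entropies. The function name is hypothetical, and the generalised variant for overlapping covers is not implemented here.

```python
import math
from collections import Counter


def normalised_mutual_information(labels_a, labels_b):
    """NMI = 2 I(A;B) / (H(A) + H(B)) for two label sequences of equal length."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    h_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    h_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    mutual = sum(
        c / n * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    if h_a + h_b == 0:
        return 1.0  # both partitions are a single group, hence identical
    return 2 * mutual / (h_a + h_b)


nmi_same = normalised_mutual_information([0, 0, 1, 1], [5, 5, 7, 7])
nmi_unrelated = normalised_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
print(nmi_same, nmi_unrelated)
```

The score depends only on how nodes are grouped, not on the label values, so the first pair above counts as a perfect match.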

Appendix E Linearization and discretization of a network of Kuramoto Oscillators

For power networks, in a number of situations of practical relevance Dörfler and Bullo (2011); Dörfler and Bullo (2012), e.g. when operating in the regime where frequencies have almost synchronized, the term can be reasonably neglected and one may linearize around the steady state trajectory to obtain

(21)

where is defined as . The matrix is called the Laplacian of the network, as it plays the same role in graphs as the Laplace operator in continuous space. It is important to note that this equation also fully characterizes the consensus model of opinion dynamics Olfati-Saber et al. (2007), the heat equation, and random walkers diffusing through the network in continuous time Chung (1997); to wit, the represent, respectively, converging opinions, equalizing temperatures, or the expected fraction of walkers on node at any given time.
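As an illustration of Laplacian dynamics of this kind, the sketch below evolves a linear system driven by a graph Laplacian. It assumes the standard combinatorial Laplacian (degree matrix minus adjacency matrix) on an undirected graph and the sign convention dx/dt = -Lx; the precise constants and definitions of Eq. (21) are not reproduced, and the path graph and the function `evolve` are hypothetical.

```python
import numpy as np

# Hypothetical undirected path graph on 4 nodes.
A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
L = np.diag(A.sum(axis=1)) - A  # assumed combinatorial Laplacian D - A


def evolve(x0, t):
    """Solve dx/dt = -L x exactly through the eigendecomposition of L."""
    w, V = np.linalg.eigh(L)  # L is symmetric for an undirected graph
    return V @ (np.exp(-w * t) * (V.T @ x0))


x0 = np.array([1.0, 0.0, 0.0, 0.0])  # all walkers (or heat) start on node 0
x_early = evolve(x0, 1.0)
x_late = evolve(x0, 50.0)
print(x_early, x_late)
```

Under these assumptions the total is conserved and the state relaxes to the uniform value, which can be read as a consensus of opinions, an equalised temperature, or a stationary distribution of walkers.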

In order to build a discrete-time random walk to which our framework can be directly applied, we choose a timestep , where is the smallest natural number such that a modified adjacency matrix is strictly positive. We then measure the severability of random walk dynamics on the graph defined by the modified adjacency matrix .

Appendix F Variants of the LFR benchmark

f.1 Unweighted, undirected, non-overlapping LFR networks

We analyse a class of networks in which components are extremely unevenly sized, a situation in which many popular partitioning methods perform suboptimally. These multi-scale networks are randomly constructed such that both degree and component size distributions follow power laws, with exponents and , respectively. Additional parameters include the total number of nodes , the average degree , the maximum degree , and the intrinsic parameter —not to be confused with the mixing which is part of severability. The fraction of links from a node to other nodes within the same component is given by Lancichinetti et al. (2008). Graph generation parameters were chosen at values typical of real networks: , , , , and Lancichinetti et al. (2008). Severability optimisation was performed with a maximum search size , and partitions were generated from the component cover.

As can be seen in Figure 7, severability performs well, always finding the natural component structure up until around , when components are no longer defined in a strong sense Radicchi et al. (2004). That severability begins failing at is as expected and consistent with its definition, since at that point random walkers are as likely to escape during each step as to remain within any of the pre-seeded components. Recalling the definition, a component is defined as severable precisely when random walkers tend to stay and mix within it. Even so, the results are comparable to those of Infomap and modularity optimisation using simulated annealing, which have been found to be amongst the most successful methods for this benchmark Lancichinetti and Fortunato (2009).

f.2 Unweighted, undirected, overlapping LFR networks

Further extensions to the LFR benchmark were implemented to allow for components to overlap Lancichinetti and Fortunato (2009). In Figure 8, we compare the component covers from severability to the pre-seeded components. For the optimisation, the maximum search size was used for the upper and lower panels, respectively. The parameters chosen were identical to those used for the evaluation of k-clique percolation Palla et al. (2005) in figure 6 of Ref. Lancichinetti and Fortunato (2009). Comparison with those results shows that severability performs comparably for the smaller component sizes, but significantly better for larger components.

f.3 Weighted, directed, overlapping LFR networks

Severability also loses no accuracy when direction and weight are added to the benchmark Lancichinetti and Fortunato (2009) (as seen in Figure 9). This is expected, since the Markov chain formulation naturally includes both. For the optimisation shown, the maximum search size .

Figure 7: Comparison of severability with modularity and Infomap on the LFR benchmarks with exponents , , average degree , and maximum component size of 50. Severability optimisation was performed with maximum search size of 50, and Markov time (a value determined as a result of minimising the number of orphan nodes and overlapping nodes). Modularity was optimised using both simulated annealing Good et al. (2010), which is extremely slow but gives good results, and a faster heuristic by Blondel et al. (2008). Each point is an average over ten random realisations.

Figure 8: Severability at Markov time , with an unweighted, undirected, overlapping variant of the LFR benchmark Lancichinetti and Fortunato (2009). The networks have 1000 nodes; the other parameters are , , , . Each point is an average over five random realisations.

Figure 9: Severability at Markov time , with a weighted, directed, overlapping variant of the LFR benchmark Lancichinetti and Fortunato (2009). The networks have 1000 nodes; the other parameters are , , , , , , . Each point is an average over five random realisations.

Appendix G Image processing

The image in Fig. 5 of the main text was pre-processed by reducing the image resolution to a more convenient size and converting to a network using standard methods. Briefly, we connect only adjacent pixels (using the maximum metric) with link weight where is the difference in luminosity and is an adjustable parameter controlling the exponential weight decay. Here we used . Severability was optimized with Markov time , and maximum size .
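The pixel-to-network conversion can be sketched as below. Adjacency under the maximum metric means that each pixel is linked to its (up to) eight surrounding pixels. The function name is hypothetical, and the default weight `exp(-|d| / sigma)` is only an assumed example of an exponentially decaying weight; a different decay can be passed in through the `weight` argument.

```python
import math


def image_to_graph(luminosity, sigma, weight=None):
    """Weighted edges between pixels at maximum-metric distance 1."""
    if weight is None:
        # ASSUMED functional form, for illustration only
        weight = lambda d: math.exp(-abs(d) / sigma)
    rows, cols = len(luminosity), len(luminosity[0])
    edges = {}
    for r in range(rows):
        for c in range(cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) == (0, 0):
                        continue  # no self-links
                    if not (0 <= rr < rows and 0 <= cc < cols):
                        continue  # neighbour falls outside the image
                    if (rr, cc) < (r, c):
                        continue  # store each undirected edge once
                    d = luminosity[r][c] - luminosity[rr][cc]
                    edges[((r, c), (rr, cc))] = weight(d)
    return edges


# 2x2 toy image: three dark pixels and one bright pixel.
edges = image_to_graph([[0.0, 0.0], [0.0, 1.0]], sigma=1.0)
print(edges)
```

Links across a luminosity jump receive a smaller weight, so a random walker tends to stay within regions of similar brightness.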

In a post-processing step, segments with mixing and retention were removed as outliers, because at high Markov times they correspond to nearly disconnected components. If a component was completely embedded in (), we kept only the one with higher severability. Communities were then inductively merged if they overlapped by more than 20 pixels until no more merges were possible. Merging is generally relevant when a feature of the network is much larger than the maximum search size; in this case the optimisation method gives overlapping patches of the background, which can then be pieced together. The segments were ordered by average luminosity, and the darker patches were assigned to the background.
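The inductive merging step can be sketched as follows. The function name and the toy segments are hypothetical; segments are represented as sets of pixel identifiers, and the threshold of 20 pixels is taken from the text.

```python
def merge_overlapping(segments, min_overlap=20):
    """Merge segments sharing more than `min_overlap` pixels until stable."""
    merged = [set(s) for s in segments]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if len(merged[i] & merged[j]) > min_overlap:
                    merged[i] |= merged[j]
                    del merged[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan: the merged segment has grown
    return merged


# Overlapping patches chain together; the isolated patch stays separate.
patches = [range(0, 50), range(25, 80), range(70, 120), range(55, 100),
           range(200, 230)]
result = merge_overlapping(patches)
print([len(s) for s in result])
```

Restarting the scan after every merge matters: the third patch overlaps the second by only 10 pixels, but it is absorbed once the fourth patch has enlarged the merged segment.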

Appendix H Ring-of-rings

We also examined the results of running several other popular graph partitioning methods on the ring-of-rings network shown in Figure 3. Infomod, Infomap, and Modularity were all unable to recover the ring structure of the graph (Figure 10).

Figure 10: Ring of rings. As in Figure 3, heavy lines (within rings) correspond to undirected links with weight 2, while light lines between rings to links with weight 1. Severability is able to recover the seeded ring structure (at Markov times ). Infomod returns the entire network, whereas Infomap and Modularity each return only arcs from the rings.

Appendix I Square lattice

As a negative control, it is instructive to consider a network in which there is clearly no structure. For that, we chose a regular 2-D square lattice with each node connected to all 8 neighbours (including diagonal links). We visualise this using a uniformly coloured discrete image, in which each pixel is connected to all of the adjacent pixels with links of equal strength. As can be seen in the figure below, after accounting for symmetry considerations, all components found are transients, which is the expected result.

Additionally, these images strongly suggest a relationship between severability optimisation and diffusion. This is of course quite closely related both to the dependence of severability on random walk dynamics and to the optimisation procedure outlined in Appendix D.1. Along these lines, the optimisation procedure we outlined can be thought of as a modified random walk in which previously explored states are immediately accessible to the random walker, but probability barriers in the “energy landscape” are magnified.

Figure 11: (top) Correlation of Markov time with size on an square lattice. (bottom) The transient components found by severability on a regular lattice at Markov times . Each block is connected to all eight of its neighbouring blocks by a single undirected edge of weight .

Appendix J Ring of small-worlds: commutativity & locality

We further explore commutativity as in Figure 5, by looking at a ring of small-world networks and comparing against OSLOM. We first generate small-world networks using the Watts-Strogatz model. Each node is first connected to its 2 neighbors on both sides. Then every edge is rewired with independent probability , but such that multi-edges cannot exist, so a small world with a total of 5 nodes will not be rewired from a 5-clique.
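The construction just described can be sketched as follows. The function name, the `seed` argument, and the details of the rewiring (keeping one endpoint and redrawing the other uniformly among nodes that would create neither a self-loop nor a multi-edge) are assumptions of this sketch.

```python
import random


def small_world(n, p, k=2, seed=None):
    """Ring lattice with k neighbours on each side, rewired with probability p."""
    rng = random.Random(seed)
    neighbours = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k + 1):
            u = (v + step) % n
            if u != v:
                neighbours[v].add(u)
                neighbours[u].add(v)
    edges = sorted((u, v) for u in neighbours for v in neighbours[u] if u < v)
    for u, v in edges:
        if rng.random() >= p:
            continue  # this edge is not selected for rewiring
        free = [w for w in range(n) if w != u and w not in neighbours[u]]
        if not free:
            continue  # u is already linked to every node: nothing to rewire
        w = rng.choice(free)
        neighbours[u].discard(v)
        neighbours[v].discard(u)
        neighbours[u].add(w)
        neighbours[w].add(u)
    return neighbours


clique = small_world(5, 1.0, seed=1)   # stays a 5-clique whatever p is
larger = small_world(20, 0.3, seed=1)  # 40 edges, some of them rewired
print(clique)
```

With 5 nodes, every node is already linked to all others, so no rewiring is possible and the 5-clique is returned unchanged, in line with the remark above.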

Note that whereas severability gives the same results when looking at a single small-world network compared to a ring of four of them, OSLOM does not. Some of this is equivalent behaviour, as OSLOM chooses to not consider the entire network as a valid community. For the small-worlds of size 5, 10, and 20, OSLOM returns all individual nodes, which is as valid an answer as the entire network. However, for the largest of the small-world networks of size 40, OSLOM chooses to split it up into 3 pieces, which is not what it chose in the ring of 4 small-worlds. Severability always gives the small-world at the appropriate times, as it is truly local.

Additionally, OSLOM demonstrates trouble when the scales of the networks are very different. It is unable to recover the 5-clique of the smallest small-world, despite the 5-clique being recoverable when the other communities are of the same size. This comes from the imposition of the same resolution on all communities implicit in OSLOM.

Figure 12: Ring of small world networks of sizes 5, 10, 20, and 40 generated using the Watts-Strogatz model. (a) At different times, severability recovers each small-world network, as expected. Additionally, when the largest small-world is in isolation, it still correctly recovers it as a component. (b) OSLOM recovers the three larger small-worlds, but splits up the smallest one, even though in other experiments it can recover 5-cliques. Additionally, when the largest small-world is given in isolation, OSLOM breaks it up, giving three overlapping communities instead.

Appendix K Co-existence of different timescales

Figure 13: 5-clique of 5-cliques attached to a small world network of size 50 generated using the Watts-Strogatz model. (above) At Markov time 2, the 5-cliques are recovered, but not the 50-small world. At Markov time 16, the size 50 small world is recovered, but the 5-cliques aggregate into a 5-clique of 5-cliques. At no one time are both the 5-cliques and the 50 small world simultaneously recovered, because they exist on different time scales.

Appendix L Word Association Extended

Figure 4 only depicted the components including “nature” and the orphans directly connected to that word. However, this is only a small snippet of the entire network. Here, we have displayed all the other components that have at least one link to “nature”, but do not include the word itself. As with Figure 4, the maximum search size and the Markov time .