I Results
I.1 Severability: mixing and retention in Markov landscapes
To introduce our method, we draw an analogy from energy landscapes Wales (2006). In particular, we consider the Markov landscape defined by the transition matrix of the standard random walk on a graph, where the nodes (or vertices) of the graph correspond to states and the landscape reflects the transition probabilities between them. Markov landscapes are analogous to energy landscapes, although they lack a potential energy function pointing downwards to a minimum energy state. Still, the notions of wells, barriers and roughness translate easily and helpfully into the language of time scale separation. In this picture, a well is a group of states surrounded by high barriers (hence with a long escape time), whereas roughness inside the well is related to the mixing time (low roughness implies a fast mixing time). An illustration of such a landscape can be found in Figure 1a, where we present 3D representations of the luminosity landscapes of three paintings with very different characteristics: from a well-compartmentalised painting by van Doesburg to a rough, featureless excerpt of Monet. In this case, the barriers and roughness are obtained from differences in luminosity of adjacent pixels. If a random walker (e.g. De Gennes' ant in a labyrinth de Gennes (1976)) is allowed to explore van Doesburg's luminosity landscape, the observed dynamics will reveal the presence of severable components in the state space; on the other hand, no such components would be expected in Monet's landscape.
Mathematically, a subset S of the states of a system is defined to be a severable component if it has both high barriers and low roughness, as extracted from the behaviour of random walkers on the underlying landscape. As shown below in a precise sense (see Section III.1), such a severable component can be understood as a mesoscale dynamical structure, i.e., a set of states that behaves coherently in the eyes of the external environment and which captures a relevant description of the system sitting between individual nodes and the global system. To formalise these notions, we borrow the concepts of mixing and retention from Markov chain quasistationarity Darroch and Seneta (1965), as follows. First, we introduce a measure of the mixing over a set of states S by appealing to a random walker restricted to those states; S is poorly mixing over a timespan t if the random walker's positions at Markov times 0 and t are strongly correlated. More precisely, we measure mixing by defining a quantity μ(S, t), which measures the total variation distance between the probability distribution over S at time t and the quasistationary distribution reached at long times, should the walkers remain in S (Eq. (5) in Methods). The mixing is thus inversely related to the roughness of the landscape over S, since the exploration of S is hindered by the roughness of the landscape. Secondly, we characterise the retention over the set S, which is directly related to the height of the barriers separating the set of states from the rest of the system, i.e., random walkers tend to stay within S if it is hard for them to escape. This is quantified by ρ(S, t), a number between 0 and 1 defined as the probability of a walker not escaping S by time t (Eq. (4) in Methods). Both ρ(S, t) and μ(S, t) therefore range from 0 to 1, where the value of 1 corresponds to perfect retention or mixing, respectively. We now simply define the severability of the set S at time scale t as
σ(S, t) = [ρ(S, t) + μ(S, t)] / 2.    (1)
Severability can be understood as a compound function that balances mixing and retention for a given set of states over the time scale t. If S corresponds to a mesoscale dynamical structure, its severability will peak at some time t*, below which the walkers are poorly mixed and beyond which retention is degraded. In a connected network, the individual node and the entire graph will respectively have good severabilities at Markov time t = 0 and as t → ∞, for the trivial reason: at t = 0, retention and mixing will be perfect for any individual node because nothing has diffused, and as t → ∞, the probabilities will have reached the ultimate stationary distribution, implying perfect mixing coupled with the always perfect retention of the entire graph. At intermediate timescales, severable structures are of intermediate size, based on a combination of mixing and retention; on grid graphs, these optimally severable structures slowly expand with Markov time, as higher times allow for mixing of larger-diameter regions (Figure 1). Less uniform graphs have more interesting substructures; optimally severable structures remain so over a range of Markov times before jumping in size to another plateau.
This notion is illustrated in Figure 1b, where we show how the process of diffusion of information on a very simple model of a network, given by a hierarchical random graph with three levels of 16, 64 and 256 nodes, leads to severable components of the state space. As the time of diffusion increases, the random walkers gain sufficient probability to overcome the barriers of the landscape, and hence diffuse to larger portions of the network so that the optimal severable components grow from being single nodes at very short times, through each of the intermediate levels over different time scales, to the entire network at long times.
The components of an interconnected dynamical system with high severability have a precise mathematical meaning in terms of local time scale separation (see the local time scale theorem in Section III.2 in Methods). Briefly, the existence of a local time scale separation for a group of states S in the dynamics of the random walker allows for a simplified model of the dynamical behaviour of the group of nodes when excited by an impulse, i.e. an arrival of probability mass into S. High retention and high mixing (implying high severability) at time t guarantee: (i) that the effect of S on the rest of the system when given an impulse can be neglected altogether for time scales less than t, and (ii) that the subsystem S can be accurately approximated to first order by a single state that aggregates all the states of S for all times beyond t.
In summary, the set S can be thought of as a structure of intermediate size whose dynamical response to an impulse permits accurate simplified descriptions. Our local time scale theorem (Section III.2) is inspired by Simon and Ando's classic result for global time scale separation, yet it differs from it in that it seeks the conditions under which one can correctly reproduce the behaviour of a severable component at different time scales, independently of the rest of the system. When the full interconnected system can be partitioned into components with comparable time scales, we recover Simon and Ando's global theorem (see Supp. Inf. B), demonstrating that our local time scale theorem generalizes their result.
I.2 Mesoscale components in power networks
As a first application, we consider the synchronization dynamics of coupled nonlinear phase oscillators with Kuramoto-like sinusoidal coupling Boccaletti et al. (2006), which is found in areas as diverse as laser physics, biological synchrony of cells and animals, and power networks Strogatz (2000). For our example, we apply severability to a standard power network benchmark. Power networks are composed of two types of nodes: generator buses, which deliver power, and load buses, which consume power. The internal state of each node is described by a voltage, which oscillates with a frequency around a nominal value (e.g. 50 or 60 Hz). The nonlinear dynamics of the phase angle θ_i of bus i can be modelled as
m_i θ̈_i + d_i θ̇_i = P_i + Σ_j K_ij sin(θ_j − θ_i),    (2)
where m_i is an inertia (zero for some buses), d_i is a damping coefficient, P_i is the power being injected into or withdrawn from the network at node i, and K_ij indicates the strength of the (symmetric) interaction between i and j Dörfler et al. (2013). Given sufficient coupling strength between the nodes, and depending on properties of the coupling matrix K, the network converges to a stationary state where all angles in the system oscillate at a constant frequency, keeping relatively small constant angle differences with respect to one another.
Although this system is inherently nonlinear, severability is of use here. In Figure 2 we show the results of applying our analysis to a linearised discrete-time random walk dynamics based on node strengths, which is equivalent to the continuous-time nonlinear dynamics for small deviations around the synchronized state (see Supp. Inf. E for details). Our example is a classic test case for power networks, the IEEE RTS-96 test system, composed of three identical copies of the RTS-24 test system interlinked with a few extra edges and one extra node. Previous work has used timescale-based identification of global partitions into slow-coherent areas based on global edge-counting or spectral methods Avramovic et al. (1980); Romeres et al. (2013). These global partitions correctly recover the expected components, but by nature require information about the entire network. In contrast, severability recovers the expected components based solely on local information and provides a validation of the components in terms of their dynamical response. More precisely, Fig. 2 shows that the fully nonlinear simulations of the model (2) can be well represented by the aggregated angle variables within the components found with severability: the aggregation of angle variables within the 'correct' components has little effect on the dynamics of the other variables of the system, whereas aggregation of an 'incorrect' subgroup results in major discrepancies from the full dynamical evolution. This result justifies the simplification of using random walk dynamics in severability, even for more complicated systems.
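To make this kind of dynamics concrete, the sketch below integrates a zero-inertia, unit-damping version of the swing-type dynamics of Eq. (2), dθ_i/dt = P_i + Σ_j K_ij sin(θ_j − θ_i), with forward-Euler steps. The network size, all-to-all coupling matrix and injected powers are illustrative assumptions, not the RTS-96 system:

```python
import numpy as np

# Zero-inertia, unit-damping sketch: dtheta_i/dt = Pw_i + sum_j K_ij sin(theta_j - theta_i),
# integrated with forward-Euler steps on a small all-to-all network.
rng = np.random.default_rng(0)
n = 6
K = np.ones((n, n)) - np.eye(n)        # illustrative all-to-all coupling
Pw = rng.normal(0.0, 0.1, n)
Pw -= Pw.mean()                        # balanced generation and consumption
theta = rng.uniform(-0.5, 0.5, n)      # initial angles near one another
dt = 0.01
for _ in range(20000):
    diffs = theta[None, :] - theta[:, None]          # theta_j - theta_i
    theta = theta + dt * (Pw + (K * np.sin(diffs)).sum(axis=1))

spread = theta.max() - theta.min()     # angles lock at small constant offsets
```

With coupling this strong relative to the injected powers, the angles converge to a synchronized state with small constant offsets; it is around such a state that the linearised random walk picture applies.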
More generally, using random walks to model higher-order dynamics provides a framework to capture central features of many other dynamics taking place on a network (Section III.4).
I.3 On severable components and cliques
Given a network-centric view of severable components, it may come as no surprise that there are some similarities between network community detection and the discovery of severable components. Network communities are groups of nodes with strong connectivity within the group and lower connectivity with the rest of the network. Communities are often captured with local or global metrics that relate the number of edges crossing the boundaries of a community to the edges inside the community, such as modularity Newman and Girvan (2004), SBM maximum likelihood Karrer and Newman (2011), OSLOM Lancichinetti et al. (2011) or conductance Shi and Malik (2000), sometimes with random walks as a computational tool Andersen et al. (2007); Spielman and Teng (2013). Most of those criteria essentially capture the retention part, ρ, of severability. A few references Kannan et al. (2004); Jeub et al. (2015) also analyse the conductance internal to the cluster, a combinatorial criterion capturing essentially the mixing part of severability. Although not all networks may have an easily measured dynamical process taking place on them (e.g. social networks), we can endow the graph with the standard random walk and apply severability to those dynamics. Indeed, the standard random walk on a network can be used to approximate such processes as opinion dynamics, information diffusion, and consensus, as the dynamics of the random walk is deeply related to the structure of the graph Delvenne et al. (2013); Rosvall and Bergstrom (2008); Morarescu and Girard (2009); Van Dongen (2008); Masuda et al. (2016). As shown in Fig. 1b, the expectation is that such communities are detected as meaningful mesoscale components observed during information diffusion.
To validate this idea further, we have used the standard LFR synthetic benchmark network model for community detection, where networks are constructed as dense random Erdős–Rényi graphs interconnected by sparse random links at several levels of coarseness Girvan and Newman (2002); Lancichinetti et al. (2008). As shown in Supp. Inf. F, our greedy algorithm for finding severable components recovers the communities with high fidelity, comparable or superior to other state-of-the-art methods Lancichinetti and Fortunato (2009). We remark that severable components are found from the local diffusive dynamics without global information from the graph.
However, communities in social networks are often characterised by a clique-like structure, showing for instance a low diameter and a high density of triangles Palla et al. (2005). While clique-like structures emerge as particular cases of severable components, severability may also detect long-range structures that are not akin to communities. An example of this is shown in Fig. 3a, where a ring-of-rings network is correctly revealed by severability. Such non-clique-like structures are present in other areas of application, including transportation networks, images, and protein structures Schaub et al. (2012). Another illustration is provided by biochemical networks, where one canonical metabolic pathway is the citric acid cycle. When we analyse the citrate pathway schematic (map00020) in the KEGG database Kanehisa and Goto (2000) using severability, the search for high-severability structures detects the Krebs citric acid cycle. These structures do not fit the standard definition of communities, and indeed are not detected as such by most community detection algorithms Schaub et al. (2012). However, as exemplified by the Krebs cycle, they are nonetheless dynamical structures of importance. Thus, although severable components and network communities share some characteristics, they are different concepts built on different ideas: the former on coherency of dynamics, and the latter on the density of clique-like structure.
I.4 Word association as a diffusion: overlaps and orphans
An important feature of severability as a means of analysing interconnected systems is that it allows the possibility of overlaps (a node can belong to more than one group) and orphans (a node can belong to no group, as every group that includes it has higher severability without it). To illustrate these features, we turn to a word association network (the University of South Florida Free Association Norms dataset Nelson et al. (1998)), previously used to highlight the existence of overlapping network communities Palla et al. (2005). To build this network, researchers presented words to participants, who were then asked for the first word that came to mind. Hence each node in the network corresponds to a word, and directed links between nodes are weighted according to the proportion of responses linking those two words. For example, when cued with 'science', 21.4% of participants wrote 'biology'.
The very construction of the network is reminiscent of a random walk process representing mental association based on similarity of meaning and contextual usage, allowing severability to incorporate the weight and directionality of the network in a natural way. As severability is a local method, it is not necessary to analyse the entire graph to find components. Rather, by analysing increasing horizons on the network, an expanding view of associated meanings presents itself from a particular vantage point. Figure 4 shows the word 'nature' and the components it belongs to (for a given maximum search size and Markov time), as well as the components and orphan nodes to which 'nature' is directly linked (see Supp. Inf. L for further details). By permitting overlapping components, we are able to recover the different contexts and meanings associated with a single word.
I.5 Locality in image segmentation: zooming and cropping
In Fig. 5 we apply severability optimisation to the identification of stained neurons in a cell-fluorescence image in order to illustrate visually a central aspect of the method; namely, that it does not rely on global information to detect mesoscale components faithfully. Below we show that the results are similar whether the algorithm is run on only a part of the image or on its entirety.
Image segmentation divides images into subsets of adjacent pixels of similar colour or luminosity, and is widely used in medical and biological imaging. Some existing segmentation methods are based on a nominal diffusion dynamics taking place on the lattice graph of pixels Grady (2006); Schaub et al. (2012). In this view, a segment can be seen as a particular case of a severable component, as already suggested by our initial view of the paintings in Fig. 1. To carry out our analysis, we have followed a classic protocol to generate a lattice graph from the image by assigning an edge between pixels weighted by a function of the difference in luminosity and the distance (up to a cutoff) Shi and Malik (2000); Browet et al. (2011); Wu and Leahy (1993) (for details see Supp. Inf. G). The severable subsets are self-consistent: because they are strongly severable, they are found robustly regardless of which member of the set is used as a starting point for the diffusion (unlike the word association communities of the last section). Both the cells and patches of the background are found as severable components. Furthermore, because severability does not depend on global information, the results do not change significantly when the algorithm is run only on a smaller section of the image: only segments that lie on the edges of the image are affected by cropping. This feature is of potential application to evolving networks, as communities are stable against perturbations and do not need to be recomputed fully when new nodes are added outside of a local neighbourhood.
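The lattice-graph construction can be sketched as follows. The Gaussian kernel on luminosity differences and pixel distances is a common Shi–Malik-style choice; the synthetic image, bandwidths (sigma_i, sigma_x) and cutoff radius here are illustrative assumptions rather than the exact protocol of Supp. Inf. G:

```python
import numpy as np

def pixel_graph(img, sigma_i=0.1, sigma_x=2.0, radius=2):
    """Weighted pixel adjacency: w_ij = exp(-dI^2/sigma_i^2 - dx^2/sigma_x^2)
    for pixel pairs within `radius` of each other (zero otherwise)."""
    h, w = img.shape
    coords = np.array([(r, c) for r in range(h) for c in range(w)])
    lum = img.ravel()
    n = h * w
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dx2 = ((coords[i] - coords[j]) ** 2).sum()
            if dx2 <= radius ** 2:
                di2 = (lum[i] - lum[j]) ** 2
                A[i, j] = A[j, i] = np.exp(-di2 / sigma_i**2 - dx2 / sigma_x**2)
    return A

# Synthetic image: a bright square (a "cell") on a dark background.
img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0
A = pixel_graph(img)
```

Edges between pixels of similar luminosity carry weight close to one, while edges across the boundary of the bright square are exponentially suppressed, so the square behaves as a well with high barriers for the ensuing random walk.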
We note that while other completely local methods Jeub et al. (2015) exhibit a similar commutativity, 'mostly' local methods like OSLOM (Order Statistics Local Optimization Method) do not Lancichinetti et al. (2011). Though OSLOM is based on local order statistics, it takes some global information into account when partitioning, making it non-commutative. In Appendix J, we show that OSLOM behaves differently on a ring of small-world networks versus a single small-world network.
II Discussion
Real-life dynamics emerging from the interaction of many elementary nodes can sometimes be seen as the interconnection of mesoscale dynamical structures, whose evolution over a time scale of interest can be represented as a single aggregated state interacting with its surroundings. We have introduced a measure for the detection of such severable components, which can be well approximated by their aggregated variables. The formal theory, a generalization of Simon and Ando's classic theory to local time scale separation, is illustrated on the particular case of Markov chains, which are representative of a larger class of dynamics, including consensus and synchronization. Severable components can coexist at several sizes and time scales, may overlap, and may leave orphan nodes. This dynamical concept is connected to other, more particular notions encountered in several classes of systems, including basins (energy landscapes), slow-coherent areas (power networks), segments (image processing), communities (social network analysis), and rings (biochemical networks), which can all be understood as structures with a locally coherent dynamics. On the other hand, other kinds of mesostructures in complex networks (e.g., block models, or roles in ecological systems Beguerisse-Díaz et al. (2014); Cooper and Barahona (2010, 2011)) are global in nature, and do not fall under the condition of locality required in this paper. Hence, while locality is an advantage for large systems, truly global characteristics cannot, by definition, be discovered in this way. However, we have shown in this paper that many classes of structures are in fact locally defined, demonstrating the applicability of severability.
Engineering disciplines traditionally operate by plugging together smaller components, usually seen as black boxes with simple external behaviour regardless of their internal complexity, in order to generate complex systems with controlled behaviour. One may argue that many natural systems are built similarly. Here, we aim to reverse this process: although complex systems are often too large to analyse in their entirety, our approach is to ask whether there exist suitable intermediate dynamical components that provide a proper understanding and representation of the complex global dynamics. In this sense, severability serves the role of a local coarse-graining mechanism for the dynamics as observed from a given subset of states in the system. Appealing to the coexistence of local time scales in Markov processes as a means to reveal severable components establishes mathematical connections between diffusion processes and model reduction, linking in a precise sense good mixing and retention in a subsystem to its accurate approximation through coarse-graining while preserving the Markov property.
As Big Data continues to proliferate, severability provides a first step towards the definition of new methods able to tackle the huge wealth of data being collected in all areas of science, technology and social life, much of which comes with a naturally endowed dynamics. Undoubtedly, more challenges lie on the road ahead, such as the treatment of more sophisticated node dynamics, for example when the dynamics are strongly nonlinear or non-Markovian Rosvall et al. (2014); Delvenne et al. (2015). Yet the importance of dynamics as a key to characterising networks will undoubtedly persist. Ultimately, we hope that the framework of severable components (code available at https://github.com/yunwilliamyu/severability) provides not only a specific solution to recovering mesoscale structures when the dynamics are roughly Markovian, but also a meaningful and practical starting point for more sophisticated methods capable of tackling these more difficult problems.
III Methods
III.1 Formal definition of Severability
The definition of severability uses concepts from graph theory and Markov chains. A graph is a set of nodes (or vertices, or states in the Markov chain terminology) together with a set of links (or edges) between vertices. We assume that every node has at least one outgoing edge, and that all edges are labelled with a positive weight. The weighted, directed graph is encoded as an adjacency matrix A, where A_ij is the weight of the edge going from i to j. The (weighted) out-degree of node i is the sum of the weights of the edges leaving i. The out-degrees can be compiled in the vector d = A1, where 1 is the vector of ones; D = diag(d) is the diagonal matrix of out-degrees. On a given graph, we define a random process in discrete time. A random walker starts from a node i at time t = 0 and jumps at time t = 1 to any out-neighbour j with probability A_ij/d_i, proportional to the edge weight. Successive jumps at t = 2, 3, ... define a Markov chain, or random walk, on the graph. The probability of presence of the random walker evolves as
p(t+1) = p(t) P,    (3)
where p(t) is the normalised probability (row) vector and P = D^{-1}A is the transition matrix, the rows of which are nonnegative and sum to one. Provided that the graph is strongly connected and aperiodic (i.e. there is no integer k > 1 such that all cycles comprise an exact multiple of k edges), any initial probability distribution converges to a unique stationary distribution π, which is a solution of the fixed-point equation π = πP.
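As a minimal sketch of this construction, the transition matrix and the stationary distribution can be computed directly; the three-node weighted graph below is an arbitrary toy example:

```python
import numpy as np

# Small strongly connected, aperiodic weighted graph (arbitrary toy example).
A = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])

d = A.sum(axis=1)                # out-degrees d = A 1
P = A / d[:, None]               # transition matrix P = D^{-1} A

# Stationary distribution: iterate p <- p P from a uniform start
# until the fixed point pi = pi P is reached.
p = np.full(len(A), 1.0 / len(A))
for _ in range(200):
    p = p @ P
```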
Given a connected subset S with n nodes, let P_S be the n × n submatrix of P corresponding to the nodes in S. Then we define the retention of the subset over time t, ρ(t), as the probability for a random walker starting with a uniform probability distribution in S not to have escaped S by time t:
ρ(t) = (1/n) 1^T (P_S)^t 1.    (4)
To define mixing, let p_i(t) be the i-th row of the matrix (P_S)^t. Note that because S is connected, p_i(t) ≠ 0. Thus the normalised row p̂_i(t) = p_i(t)/‖p_i(t)‖_1 is the probability distribution at time t for a random walker starting from node i, conditional upon the walker remaining in S between 0 and t. We can then define the internal mixing as
μ(t) = 1 − (1/n) Σ_{i∈S} ‖ p̂_i(t) − ⟨p̂(t)⟩ ‖_TV,    (5)
where ⟨p̂(t)⟩ is the arithmetic mean over the unit-normalised rows of (P_S)^t, and we have used the fact that the total variation distance between two probability vectors x and y is given by the norm
‖x − y‖_TV = (1/2) Σ_i |x_i − y_i|.    (6)
The internal mixing term μ(t) approaches 1 as the conditional probability distribution of a random walker starting uniformly at random within the subset approaches the quasistationary distribution on that subset of nodes Darroch and Seneta (1965).
Both ρ(t) and μ(t) are defined to range from 0 to 1, where the value of 1 corresponds to perfect retention or mixing, respectively. We define the severability as a compound function of both retention and mixing:
σ(S, t) = [ρ(t) + μ(t)] / 2,    (7)
which can be understood as the quality of the subset S as a separate dynamical mesostructure over time t. Severability has an intrinsic resolution parameter t, corresponding to the Markov horizon; as t increases, the random walker diffuses to larger parts of the graph, as reflected by the iterations of the submatrix P_S. Note that, from the above definitions, σ(S, t) depends only upon the out-links from nodes within S; hence it is a purely local function.
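The quantities above can be sketched in a few lines; we assume here that the compound of retention and mixing is their arithmetic mean, and the bridged two-clique test graph is an illustrative toy example:

```python
import numpy as np

def severability(A, S, t):
    """rho, mu and sigma = (rho + mu)/2 for node subset S at Markov
    time t (the mean form of the compound is assumed here)."""
    P = A / A.sum(axis=1, keepdims=True)          # transition matrix D^-1 A
    PSt = np.linalg.matrix_power(P[np.ix_(S, S)], t)
    n = len(S)
    rho = PSt.sum() / n                           # retention, Eq. (4)
    rows = PSt / PSt.sum(axis=1, keepdims=True)   # conditional distributions
    tv = 0.5 * np.abs(rows - rows.mean(axis=0)).sum(axis=1)
    mu = 1.0 - tv.mean()                          # mixing, Eq. (5)
    return rho, mu, 0.5 * (rho + mu)

# Two 4-cliques joined by a single edge; the first clique is severable.
A = np.kron(np.eye(2), np.ones((4, 4))) - np.eye(8)
A[3, 4] = A[4, 3] = 1.0
rho, mu, sigma = severability(A, [0, 1, 2, 3], t=4)
```

A set straddling the bridge leaks probability at every step and mixes poorly, so its severability is markedly lower than that of either clique.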
The particular form assumed for retention, mixing and severability is justified by the mathematical properties stated in Section III.2 and proved in the Supp. Inf.
III.2 Local time scale separation theorem
III.2.1 Background: Simon and Ando's global time scale separation theorem
In 1961, Simon and Ando established a time scale separation theorem, both for general linear systems and for Markov chains in particular Simon and Ando (1961), which we now present.
Given a Markov chain

p(t+1) = p(t) P,    (8)

let us split the nodes into two sets S_1 and S_2, with a corresponding partition of P into blocks:

P = [ P_11  P_12 ; P_21  P_22 ].    (9)
Fix an arbitrary ε > 0, which will serve as a requested standard of approximation. Assume that P is close to a perfectly decoupled transition matrix

P* = [ P*_11  0 ; 0  P*_22 ].    (10)
Simon and Ando proved that there is a small enough δ > 0 and a time t_1 such that if ‖P − P*‖ ≤ δ, then two kinds of approximations are valid for the trajectories of p(t):

On the one hand, for all times t ≤ t_1, the decoupled approximation

p*(t) = p(0) (P*)^t    (11)

is within ε in norm from the actual solution p(t):

‖ p(t) − p*(t) ‖ ≤ ε.    (12)
On the other hand, and more importantly, for all times t ≥ t_1, the aggregated probabilities q_i(t) = Σ_{j∈S_i} p_j(t) are within ε in norm from the approximation

q̃(t+1) = q̃(t) Q,    (13)

for some aggregated transition matrix Q with real entries.

Moreover, for times t ≥ t_1, p(t) can be reconstructed from the aggregated probabilities as p_j(t) ≈ q_i(t) π_j for j ∈ S_i, with an error bounded by ε, for some within-block distributions π.
Which norms are chosen in the statement above is irrelevant, as all norms of vectors or matrices of a given size are equivalent up to a factor, making the statement true for any choice. For simplicity we have stated the two-block case for a Markov chain dynamics, although the theory holds for general linear systems split into an arbitrary number of blocks. It is important to notice that the required δ depends not only on the given ε but also potentially on all the entries of the diagonal blocks, as is apparent for example in our own proof of Simon and Ando's theorem (Supp. Inf. III.2.1). The theorem can therefore only be applied globally, with full knowledge of the dynamics. It is desirable to decouple this global condition into local conditions to be satisfied by each diagonal block in order to attain the required accuracy ε, and severability offers one practical way to achieve this, as shown below.
III.2.2 Statement of the local time scale theorem
Following the same notation as above, consider the first block P_11 (denoted P_S in the main text and Methods), which describes the set of states S with severability σ(S, t). The local dynamics of S is described by the open dynamical system
p_1(t+1) = p_1(t) P_11 + u(t),   y(t) = p_1(t) P_12,    (14)
where u(t), defined for all t, is the input into the subsystem S, i.e., the inflow of probability from the environment, and y(t) is the output of the subsystem, by which it influences the environment through an outflow of probability. By the environment we mean the rest of the state space, described by P_22, itself governed by an open-system equation of the same kind as Eq. (14). The global dynamics on P can be understood as the feedback interconnection of the two systems, related by the equations u_1 = y_2 and u_2 = y_1.
Equation (14) describes a relationship between the input sequence u and the output sequence y. An alternative way to describe an open system in linear response theory is by its impulse response. In our notation, the impulse response can be written as h(t) = P_11^{t−1} P_12 for all t ≥ 1, and zero for t ≤ 0. The impulse response characterizes the input-output relationship fully, in that the output generated by any input sequence u is obtained by the convolution product y = u ∗ h (defined as (u ∗ h)(t) = Σ_s u(s) h(t − s) for all t), assuming zero probability in S initially. A nonzero initial state can be incorporated by adding an artificial input that injects the initial probability at time zero. To approximate the behaviour of S described by Eq. (14), we need to approximate the function h by another impulse response h̃ between the same input and output spaces, measured with a given norm. A common metric used in the open systems literature is the one-norm
‖h‖_1 = Σ_t ‖h(t)‖,    (15)
whenever it is defined. Of course, given the matrices h(t), we can choose any matrix norm for ‖h(t)‖, as they all relate within a constant factor only dependent on the dimension of the matrix. If the approximation is only meant to be valid on a time interval [0, T], then we can restrict the sum in Eq. (15) to t ≤ T, denoted ‖h‖_{1,T}. As one can show from elementary algebra, an error in the impulse response committed in replacing h by h̃ results in an error in the output bounded as sup_t ‖y(t) − ỹ(t)‖ ≤ (sup_t ‖u(t)‖) ‖h − h̃‖_{1,T}, where T can be infinity.
Our local time scale separation theorem makes two statements regarding the approximability of the impulse response of the nodes in S, before and after an arbitrarily chosen time t*. The first one, at short times, follows directly from the high retention implied by a high severability at time t*, whereas the second one, at long times, requires a more careful analysis. The theorems are proved in Supp. Inf. A.
Local Time Scale Theorem (Short times).
The system represented by Eq. (14) can be approximated up to time t* by the trivial (zero) response h̃ = 0, with accuracy O(1 − ρ(t*)).
In other words, the influence of the system on its environment can be neglected altogether over short time scales.
Local Time Scale Theorem (Long times).
The system described by Eq. (14) can be approximated by a one-state system of the following form

x(t+1) = λ_1 x(t) + u(t) b,   ỹ(t) = x(t) c,    (16)

where λ_1 is the dominant eigenvalue of P_11 and b, c are appropriate vectors, whose corresponding impulse response is h̃(t) = λ_1^{t−1} b c. The vectors b and c are found from the dominant right and left eigenvectors of P_11, normalised appropriately. The approximation is valid for all times, and the error summed over all times is O(1 − σ(S, t*)). For any given input signal bounded in norm at all times, the exact model described by Eq. (14) and the one-dimensional model given by Eq. (16) deliver outputs whose difference is at all times bounded by the input bound times O(1 − σ(S, t*)).
The constants contained inside the O(·) in these statements may depend on the dimension of S (the number of nodes in S), but neither on the specific entries of P nor on t*. In view of these statements, the best time scale separation is given by the time t* at which severability peaks, and the error of the resulting approximations is O(1 − σ(S, t*)).
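The long-times statement can be probed numerically. Assuming the impulse response of the subsystem takes the form h(t) = P_11^{t-1} P_12 for t ≥ 1, the sketch below compares it against a rank-one surrogate built from the dominant eigentriple of P_11, in the spirit of the one-state reduction; the bridged two-clique graph is an illustrative toy example:

```python
import numpy as np

# Bridged two-clique toy graph; S = the first 4-clique.
A = np.kron(np.eye(2), np.ones((4, 4))) - np.eye(8)
A[3, 4] = A[4, 3] = 1.0
P = A / A.sum(axis=1, keepdims=True)
P11, P12 = P[:4, :4], P[:4, 4:]

# Dominant (Perron) eigentriple of the substochastic block P11.
evals, V = np.linalg.eig(P11)
k = np.argmax(evals.real)
lam, v = evals[k].real, V[:, k].real              # eigenvalue, right eigvec
evalsT, U = np.linalg.eig(P11.T)
u = U[:, np.argmax(evalsT.real)].real             # left eigvec
u = u / (u @ v)                                   # normalise so u . v = 1

# Exact impulse response h(t) = P11^(t-1) P12 versus the rank-one
# surrogate lam^(t-1) (v u^T) P12; the gap decays with the
# subdominant spectrum of P11 (fast internal mixing).
errs = []
h_exact = P12.copy()
h_approx = np.outer(v, u) @ P12
for t in range(1, 30):
    errs.append(np.abs(h_exact - h_approx).max())
    h_exact = P11 @ h_exact
    h_approx = lam * h_approx
```

Because the clique mixes internally much faster than it leaks probability, the rank-one surrogate tracks the exact impulse response closely after a few steps.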
Assuming now that the global network is split into two or several blocks, one may combine the different local approximations and obtain the following version of the classic Simon–Ando theorem: given a global dynamics defined by Eqs (8) and (9), suppose that we find a common time t at which both blocks have severability close to one, say σ(S_1, t) ≥ 1 − ε and σ(S_2, t) ≥ 1 − ε; then the short-term and long-term dynamics can be approximated as in Eqs (11) and (13) with error bounded by O(ε), where the hidden constant only depends on the total number of nodes (Supp. Inf. B). The generalisation to more than two components is straightforward. This version highlights the role of the severability of each component and the need to find a common global time scale (possibly suboptimal for each component separately) at which each component simultaneously reaches a high severability, for a global time scale separation to emerge.
See Supp. Inf. C for a toy example of comparative application of the global and local time scale separation theorems.
III.3 Computational aspects of Severability optimisation
We apply a semi-greedy search algorithm to find the optimal component for a starting node, at a chosen Markov time and for a set maximum search size (see Appendix D.1 in the Supp. Inf. for a detailed flowchart).
Briefly, the algorithm proceeds as follows. Without loss of generality, define . Initially, only . Aggregate nodes greedily, except let every third step be a Kernighan-Lin switch Kernighan and Lin (1970) of a single node on the boundary of to maximise . After the initial semi-greedy optimisation, the intermediate component that has maximal severability is fine-tuned using Kernighan-Lin switches to find a local maximum. If is in the resulting component, the algorithm stops; otherwise, it starts over with a different neighbour of the starting node. If all neighbours of have been attempted without success, is declared an orphan. For the word association network, every neighbour of "nature" was attempted for the first step, giving the overlapping communities.
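The search loop can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation: the function name `semi_greedy_component`, the toy quality function, and the exact form of the Kernighan-Lin switch are assumptions.

```python
def semi_greedy_component(graph, v0, quality, max_size=20, kl_every=3):
    """Grow a component around v0, maximising `quality` (severability in
    the paper; any set function here).  Every `kl_every`-th step performs a
    Kernighan-Lin-style swap of a boundary node instead of a pure addition.
    Illustrative sketch only; the exact switch rule is an assumption."""
    component = {v0}
    best, best_q = set(component), quality(component)
    step = 0
    while len(component) < max_size:
        boundary = {u for v in component for u in graph[v]} - component
        if not boundary:
            break
        step += 1
        if step % kl_every == 0 and len(component) > 1:
            # swap: remove the member whose removal hurts least,
            # then add the best boundary candidate
            u_in = max((u for u in component if u != v0),
                       key=lambda u: quality(component - {u}))
            u_out = max(boundary, key=lambda u: quality(component | {u}))
            component = (component - {u_in}) | {u_out}
        else:
            # greedy aggregation of the best boundary node
            component.add(max(boundary, key=lambda u: quality(component | {u})))
        if quality(component) > best_q:
            best, best_q = set(component), quality(component)
    return best

# Toy demo: two triangles joined by a single edge
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
         3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
edges = {(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)}

def internal_fraction(S):
    """Toy stand-in for severability: fraction of touched edges inside S."""
    internal = sum(1 for a, b in edges if a in S and b in S)
    cut = sum(1 for a, b in edges if (a in S) != (b in S))
    return internal / (internal + cut) if internal + cut else 0.0

found = semi_greedy_component(graph, 0, internal_fraction, max_size=3)
```

Starting from node 0, the sketch recovers the triangle containing it, mirroring how the real algorithm grows a component from a seed node.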
Other computational aspects of the implementation are discussed in detail in Supp. Inf. D.
iii.4 Markov chain equivalence of dynamical systems
Markov chains, or random walks, are characterized by dynamics of the form , where is any nonnegative square matrix with all rows summing to one. To every such dynamics we can associate a dual consensus dynamics acting on the column vector , whose entries are the positions, or opinions, of agents; these opinions converge to a common value if and only if the corresponding random walk converges to a unique stationary distribution.
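A minimal numerical illustration of this duality; the transition matrix below is an arbitrary example, not taken from the text.

```python
import numpy as np

# An arbitrary 3-state transition matrix (rows sum to one)
P = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.3, 0.7]])

x = np.array([1.0, 0.0, 0.0])   # a probability distribution (row vector)
y = np.array([0.0, 1.0, 3.0])   # agents' opinions (column vector)

for _ in range(500):
    x = x @ P   # random-walk dynamics
    y = P @ y   # dual consensus dynamics

# x has converged to the unique stationary distribution,
# and the opinions y have all converged to a common value.
```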
Positive linear systems are common in economics, biology, and chemistry, where variables naturally take nonnegative values. Such systems are characterized by an evolution , or , where is only required to be nonnegative. Under the same connectivity conditions on the network underlying , there is a unique dominant eigenvalue with corresponding left and right eigenvectors, and respectively, all of which are positive by virtue of the Perron-Frobenius theorem.
This property allows a normalization that transforms the dynamics into a consensus, or random walk, dynamics. The new matrix is , where is the diagonal matrix associated with . It is readily observed that is a valid transition matrix, and is equivalent to except for a global scaling and a change of variable on every node. In particular, it has the same eigenvectors and acts on the same underlying network as . This transformation has an elegant information-theoretic interpretation as the random walk with maximal entropy rate (if is a zero-one matrix) or optimal free energy (if the nonnegative entries are interpreted as exponential energy barriers along the edges) Ruelle and Gallavotti (1978); Delvenne and Libert (2011).
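The normalization can be sketched numerically as follows. The matrix `M` is an arbitrary nonnegative example, and the formula `P = inv(D_v) @ M @ D_v / lam` is the standard Perron normalisation consistent with the text.

```python
import numpy as np

# An arbitrary nonnegative (not stochastic) matrix on a strongly connected graph
M = np.array([[1.0, 2.0, 0.0],
              [0.5, 0.0, 3.0],
              [2.0, 1.0, 1.0]])

# Dominant (Perron) eigenvalue lam and positive right eigenvector v: M v = lam v
eigvals, eigvecs = np.linalg.eig(M)
k = np.argmax(eigvals.real)             # Perron eigenvalue is real and dominant
lam = eigvals[k].real
v = np.abs(eigvecs[:, k].real)          # Perron vector can be chosen positive

# Normalisation: P = D_v^{-1} M D_v / lam, where D_v = diag(v)
Dv = np.diag(v)
P = np.linalg.inv(Dv) @ M @ Dv / lam
```

Each row of `P` sums to one because `(M v)_i = lam * v_i`, so `P` is a genuine transition matrix on the same network.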
Markov chains are also defined in continuous time, following an equation of the form , where the continuous-time transition matrix has nonpositive diagonal entries, nonnegative off-diagonal entries, and rows summing to zero. Continuous-time consensus is defined analogously, and any positive continuous-time linear system, characterized by a matrix with nonpositive diagonal and nonnegative off-diagonal entries, can be similarly normalized to a continuous-time Markov chain, which can in turn be sampled to a discrete-time Markov chain.
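One standard way to carry out the last step (sampling a continuous-time chain to discrete time) is uniformization, sketched below. The text does not specify the sampling scheme, so this particular choice, and the rate matrix used, are assumptions.

```python
import numpy as np

# Continuous-time rate matrix Q: nonnegative off-diagonal, rows sum to zero
Q = np.array([[-2.0,  1.0,  1.0],
              [ 0.5, -1.0,  0.5],
              [ 1.0,  2.0, -3.0]])

# Uniformization: pick a rate q >= max_i |Q_ii| and set P = I + Q / q.
# This yields a valid discrete-time transition matrix with the same
# stationary distribution as the continuous-time chain.
q = np.max(np.abs(np.diag(Q)))
P = np.eye(3) + Q / q
```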
Some nonlinear systems can be linearized around a fixed point. Classic theorems such as the Hartman-Grobman theorem ensure that the nonlinear and linearized systems are equivalent up to a change of variables in a neighbourhood of the fixed point. Kuramoto oscillators and power network dynamics, for instance, linearize to consensus dynamics.
Acknowledgements.
The authors would like to thank Antoine Delmotte, Michael Schaub, Arnaud Browet, Florian Dörfler and Renaud Lambiotte for code and/or discussions. Neuron fluorescence imagery is courtesy of Simon Schultz and Marie-Therese Vasilache. Y.W.Y. was partially supported by an Imperial Marshall Scholarship during the early years of this work. J.-C.D. is partly supported by the Flagship European Research Area Network (FLAG-ERA) Joint Transnational Call "FuturICT 2.0". This work is partially supported by the Engineering and Physical Sciences Research Council of the United Kingdom. Author Contributions: M.B. and S.Y. conceived the project and guided the research. J.-C.D. developed the theoretical analyses. Y.W.Y. designed and implemented the experimental analyses. The manuscript was jointly written by all authors.
References
Using PageRank to locally partition a graph. Internet Mathematics 4 (1), pp. 35–64.
Near-decomposability, partition and aggregation, and the relevance of stability discussions. International Economic Review 4 (1), pp. 53–67.
Area decomposition for electromechanical models of power systems. Automatica 16 (6), pp. 637–648.
Interest communities and flow roles in directed networks: the Twitter network of the UK riots. Journal of The Royal Society Interface 11 (101), pp. 20140940.
Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008, pp. P10008.
Complex networks: structure and dynamics. Physics Reports 424 (4), pp. 175–308.
Community detection for hierarchical image segmentation. Combinatorial Image Analysis, pp. 358–371.
Padé techniques for model reduction in linear system theory: a survey. Journal of Computational and Applied Mathematics 14 (3), pp. 401–438.
Spectral graph theory. Regional Conference Series in Mathematics, American Mathematical Society.
Hierarchical aggregation of linear systems with multiple time scales. IEEE Transactions on Automatic Control 28 (11), pp. 1017–1030.
Role-based similarity in directed networks. arXiv preprint arXiv:1012.2726.
Role-similarity based comparison of directed networks. arXiv preprint arXiv:1103.5582.
Realistic control of network dynamics. Nature Communications 4.
Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment 2005 (09), pp. P09008.
On quasi-stationary distributions in absorbing discrete-time finite Markov chains. Journal of Applied Probability 2 (1), pp. 88–100.
La percolation: un concept unificateur. La Recherche 7 (72), pp. 919–927.
Diffusion on networked systems is a question of time or structure. Nature Communications 6.
Centrality measures and thermodynamic formalism for complex networks. Physical Review E 83 (4), pp. 046117.
Dynamics on and of complex networks. In Oxford Handbook of Innovation, Vol. 2, pp. 221–242.
Optimal state-space lumping in Markov chains. Information Processing Letters 87 (6), pp. 309–315.
Topological equivalence of a structure-preserving power network model and a non-uniform Kuramoto model of coupled oscillators. In 2011 50th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC), pp. 7099–7104.
Synchronization and transient stability in power networks and non-uniform Kuramoto oscillators. SIAM Journal on Control and Optimization 50 (3), pp. 1616–1642.
Synchronization in complex oscillator networks and smart grids. Proceedings of the National Academy of Sciences 110 (6), pp. 2005–2010.
Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99 (12), pp. 7821–7826.
Performance of modularity maximization in practical contexts. Physical Review E 81 (4), pp. 046106.
Random walks for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (11), pp. 1768–1783.
Online social networks: a survey of a global phenomenon. Computer Networks 56 (18), pp. 3866–3878.
Globally networked risks and how to respond. Nature 497 (7447), pp. 51–59.
Differential network biology. Molecular Systems Biology 8 (1), pp. 565.
Think locally, act locally: detection of small, medium-sized, and large communities in large networks. Physical Review E 91 (1), pp. 012821.
KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 28 (1), pp. 27–30.
On clusterings: good, bad and spectral. Journal of the ACM 51 (3), pp. 497–515.
Stochastic blockmodels and community structure in networks. Physical Review E 83 (1), pp. 016107.

An efficient heuristic procedure for partitioning graphs. The Bell System Technical Journal 49 (2), pp. 291–307.
Singular perturbation methods in control: analysis and design. Academic Press, Inc., Orlando, FL, USA. ISBN 0124176356.
Macrostate data clustering. Physical Review E 67 (5), pp. 056704.
Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics 11 (3), pp. 033015.
Benchmark graphs for testing community detection algorithms. Physical Review E 78 (4), pp. 046110.
Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Physical Review E 80 (1), pp. 016118.
Community detection algorithms: a comparative analysis. Physical Review E 80 (5), pp. 056117.
Finding statistically significant communities in networks. PLoS ONE 6 (4), pp. e18961.
Spontaneous recovery in dynamical networks. Nature Physics 10 (1), pp. 34–38.
Random walks and diffusion on networks. arXiv preprint arXiv:1612.03281.
Principal component analysis in linear systems: controllability, observability, and model reduction. IEEE Transactions on Automatic Control 26 (1), pp. 17–32.
Opinion dynamics with decaying confidence: application to community detection in graphs. IEEE Transactions on Automatic Control (99), pp. 1862–1873.
The University of South Florida word association, rhyme, and word fragment norms.
Finding and evaluating community structure in networks. Physical Review E 69 (2), pp. 026113.
Consensus and cooperation in networked multi-agent systems. Proceedings of the IEEE 95 (1), pp. 215–233.
Uncovering the overlapping community structure of complex networks in nature and society. Nature 435 (7043), pp. 814–818.
Model reduction via balanced state space representations. IEEE Transactions on Automatic Control 27 (2), pp. 382–387.
Defining and identifying communities in networks. Proceedings of the National Academy of Sciences 101 (9), pp. 2658–2663.
Linear model reduction and solution of the algebraic Riccati equation by use of the sign function. International Journal of Control 32 (4), pp. 677–687.
Novel results on slow coherency in consensus and power networks. In European Control Conference, Zürich, Switzerland.
Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences 105 (4), pp. 1118–1123.
Memory in network flows and its effects on spreading dynamics and community detection. Nature Communications 5.
Thermodynamic formalism. Vol. 112, Addison-Wesley, Reading.
Markov dynamics as a zooming lens for multiscale community detection: non-clique-like communities and the field-of-view limit. PLoS ONE 7 (2), pp. e32210.
Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (8), pp. 888–905.
Aggregation of variables in dynamic systems. Econometrica: Journal of the Econometric Society, pp. 111–138.
A local clustering algorithm for massive graphs and its application to nearly linear time graph partitioning. SIAM Journal on Computing 42 (1), pp. 1–26.
Exploring complex networks. Nature 410 (6825), pp. 268–276.
From Kuramoto to Crawford: exploring the onset of synchronization in populations of coupled oscillators. Physica D: Nonlinear Phenomena 143 (1), pp. 1–20.
Graph clustering via a discrete uncoupling process. SIAM Journal on Matrix Analysis and Applications 30 (1), pp. 121–141.
Energy landscapes: calculating pathways and rates. International Reviews in Physical Chemistry 25 (1–2), pp. 237–282.
Modular interdependency in complex dynamical systems. Artificial Life 11 (4), pp. 445–457.
Collective dynamics of 'small-world' networks. Nature 393 (6684), pp. 440–442.
An optimal graph theoretic approach to data clustering: theory and its application to image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 15 (11), pp. 1101–1113.
Appendix A Proof of the local time scale separation theorem
Proof.
On the short time scale, the validity of the approximation is given by where for (and at ).
We notice that expresses the probability of escape within time . In fact, retention as introduced in Eq. (4) can be expressed as
(17)  
for the choice of by matrix norm , given that all entries of are nonnegative. The short-time-scale local theorem then results from , which follows directly from the definition of severability.
Equation (16) generates an impulse response for . The difference decays to zero exponentially, provided that has dominant eigenvalue , eigenvectors (normalised so that the entries of the row vector sum to one), (normalised so that ) and . Indeed this guarantees that behaves as for large .
If were perfectly stochastic, then , and we would have and . As is almost stochastic, we expect and to be in for all , which we now prove as follows.
It is well known from Perron-Frobenius theory that the dominant eigenvalue of a matrix with positive entries sits between the minimum and maximum row sums. Therefore , and hence . To evaluate , let denote the row-normalised matrix derived from , where every row is scaled so as to sum to one. Then the distance (in any norm) between any two rows of is in , by the definition of internal mixing in Eq. (5). The distance between and , on the other hand, is in , by definition of the retention. Therefore the distance between any two rows of is in , thus for some positive vector . Premultiplying this equality by , we get , thus . Postmultiplying instead by yields , as required.
Consider the remainder , thus from spectral decomposition properties. We find from the above that .
Choosing the matrix norm , which happens to be submultiplicative ( for all ), and using the identity , applied here to (for which the identity is valid because all eigenvalues of have absolute value ), we deduce a bound for the error on the impulse response:
using also and for any family of by nonnegative matrices .
We obtain the same result for , with . This follows from , easily obtained from the fact that . This approximation has the nice property of preserving the flow of probability: it describes the behaviour of a single supernode that aggregates all the input probability flows, expelling a small fraction of its stored probability mass at every step to the nodes in the rest of the system, with weights given by . ∎
Different approximations thus govern the short-term behaviour (where a large retention matters) and the long-term behaviour (where fast mixing also matters).
The proof highlights that the theorem is robust with respect to the choice of norms in the definition of severability and in the statement of the theorems: changing the norms alters the hidden constants in a way that only depends on , the number of nodes. The specific definition of severability chosen in this article is motivated by simplicity, convenient computation and good practical results.
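As a numerical sanity check (not part of the proof), the Perron-Frobenius row-sum bound invoked above can be verified on a random strictly positive matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((5, 5)) + 0.01                 # strictly positive matrix
lam = np.max(np.abs(np.linalg.eigvals(M)))    # Perron (dominant) eigenvalue
row_sums = M.sum(axis=1)
# Perron-Frobenius: min row sum <= lam <= max row sum
```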
Appendix B From local to global: a proof of the Simon-Ando theorem
We now provide a proof of Simon and Ando's global time scale theorem, stated in terms of the severability of the components. Assume that a partition of the network into two components reveals a common time scale at which each severability is higher than . In the short run, each component can ignore the other and evolve separately, with a resulting error of order . Let us now turn to the long-run case.
For times , one may write (as the probability mass leaking from component is in ). From high mixing, all rows of are close to one another, and close to a multiple of the dominant eigenvector (quasi-stationary distribution). The same holds for , close to a multiple of . The full state trajectory thus remains close to a trajectory of the form , and it is therefore enough to know the two-dimensional trajectory (in fact one-dimensional in the set of probability measures, owing to the constraint ) to reconstruct approximately. This means that , the image of the set of all probability measures under the map , is invariant under , has diameter in the direction , and is 'thin' in that every point of is away from that direction.
Now consider the two-dimensional dominant eigenspace of , generated by the dominant left eigenvector (stationary distribution) of eigenvalue and the second left eigenvector (normalised to unit norm) of eigenvalue . The intersection of that space with the space of probability measures is one-dimensional, of the form . On this eigenset, the dynamics takes the simple, exact form . Given that , we know that every point is approximated by its projection on . Therefore, replacing and by their approximations in terms of and induces a one-dimensional aggregated dynamics on the direction , where and are replaced by their aggregations and , and the projected dynamics is given by . The trajectory initiated by a point , and the trajectory generated from its projection by this projected dynamics, remain close at all times.
On the other hand, any point in is close to a point in , and those two points remain close when both are iterated by , as contracts the 1-distance (or total variation distance; the induced 1-norm of is 1).
Now we can conclude. The trajectory initiated by any point in (iterated by the exact dynamics ) remains close at all times to some trajectory in the eigenset , which itself remains close at all times to the projected, aggregated dynamics on the direction . Therefore any trajectory in generated by the actual dynamics is close at all times to the one-dimensional dynamics on the aggregated quantities.
A closer look would show that the projected dynamics obtained from the approximation given by the local time scale separation theorem on each block separately is not strictly identical to the aggregated dynamics presented here, but the trajectories generated by the two one-dimensional dynamics are again close at all times.
In the above, all hidden constants in the notation depend on the specific norms used to measure distances, and thus on the number of nodes in each block, but on nothing else.
This completes the proof of Simon and Ando's global time scale theorem, as given in Methods (see Section III.2.1), since arbitrarily small perturbations of a fixed, block-diagonal transition matrix lead to arbitrarily high severability over arbitrarily large intervals of time.
The global nature of the theorem reveals itself in the fact that it requires, simultaneously at a common time , high mixing and high retention in every component, thus shedding light on the conditions required for global time scale separation to hold.
See the next Appendix for a simple example showing that the quantities and described in the classic statement of the Simon-Ando theorem indeed depend on the global information .
Appendix C Global vs Local time scale separation theorems: an example
We apply our version of the Simon-Ando theorem (formulated in terms of severabilities) to a toy example of four nodes separated into two blocks, or mesoscale components. We then modify the example so that the global Simon-Ando theorem no longer applies, while our local theorem still does.
Consider
(18) 
which is close to the block-diagonal matrix
(19) 
Let us compare the trajectories generated by the two initial conditions and , which both lead to the same aggregation (probability 1 in the first block).
If and , then it is clear that their trajectories will remain very different, even at the aggregated level, for a long time: at times of the order of , the first trajectory will be concentrated mostly on the second block (and so will the aggregated trajectory), while the second trajectory will stay confined in the first block.
If , then at times of the order of the first trajectory will be equally split between the two blocks, while the second trajectory will again be confined in the first block.
Thus if we want to reach a given accuracy in the Simon-Ando theorem, for instance , we need to take of the order of , which shows the global dependency of on the 'internal details' of both blocks. The transition between the short-time and long-time regimes occurs at time .
In our language, the severability of each block will be high (close to one) for times between and (if indeed; otherwise the severability remains low at all times). These intervals start overlapping at time . We can therefore apply our version of the Simon-Ando theorem, as we have simultaneously high severability in each block for some time .
This also shows the intrinsically asymptotic nature of the original Simon-Ando theorem: as is decreased, the peak of severability for each block extends into a plateau stretching until , eventually forcing the plateaus to overlap for small enough .
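A numerical sketch of this behaviour: the matrices of Eqs. (18)-(19) are not reproduced in this text, so the 4-node chain below, with internal coupling `d` and inter-block coupling `eps`, is an illustrative assumption with the same two-block structure.

```python
import numpy as np

# Illustrative 4-node chain: blocks {0,1} and {2,3} weakly coupled
# through the middle edge (numbers are assumptions, not Eqs. (18)-(19))
d, eps = 0.3, 1e-4
P = np.array([[1 - d, d,           0.0,         0.0],
              [d,     1 - d - eps, eps,         0.0],
              [0.0,   eps,         1 - d - eps, d],
              [0.0,   0.0,         d,           1 - d]])

x0 = np.array([1.0, 0.0, 0.0, 0.0])   # all mass on node 0
x1 = np.array([0.0, 1.0, 0.0, 0.0])   # all mass on node 1

# Short run: both initial conditions keep essentially all mass in block {0,1},
# so they agree at the aggregated level
short = np.linalg.matrix_power(P, 10)
agg0 = (x0 @ short)[:2].sum()
agg1 = (x1 @ short)[:2].sum()

# Long run (t well beyond 1/eps): mass equilibrates across the two blocks
long_ = np.linalg.matrix_power(P, 100_000)
agg_long = (x0 @ long_)[:2].sum()
```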
If we consider a slightly more complicated example:
(20) 
with , then the Simon-Ando theorem cannot be formally applied, because it assumes a fixed block-diagonal structure and an arbitrarily small perturbation of it. We reach the same conclusion in the language of severability: the severability of each block peaks in the interval of times between and . As these intervals do not coincide, we indeed cannot apply our version of the Simon-Ando theorem.
Our local time scale theorem is nevertheless applicable to each block separately, and allows us to identify them as mesoscale components reaching high severability at different time scales. This shows that the local time scale theorem is of wider applicability and is a more relevant tool to identify components with dynamical coherence in a complex, heterogeneous dynamical system.
Appendix D Computational aspects of Severability
d.1 Severability optimization flowchart
d.2 Computational Complexity
Let be the number of nodes in a graph. The severability of a component of size for a Markov time can be computed in time, where the cubic term comes from schoolbook matrix multiplication. Given , computing mixing and retention are both operations, so the total cost is dominated by the matrix exponentiation.
The cost can be reduced using fast matrix multiplication techniques; for instance, using Strassen's method, the total cost would only be . Alternatively, for large , matrix diagonalisation can be employed first, which makes the term negligible, giving a solution.
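The diagonalise-first strategy can be sketched as follows (an illustrative example; the matrix is arbitrary):

```python
import numpy as np

P = np.array([[0.90, 0.10, 0.00],
              [0.05, 0.90, 0.05],
              [0.00, 0.20, 0.80]])
t = 50

# Repeated squaring: O(n^3 log t) per Markov time
Pt_direct = np.linalg.matrix_power(P, t)

# Diagonalise once (O(n^3)); afterwards any power P^t = V diag(w^t) V^{-1}
# costs only matrix-vector-scale work per evaluation
w, V = np.linalg.eig(P)
Pt_spectral = ((V * w**t) @ np.linalg.inv(V)).real
```

The two results agree to numerical precision, so a single eigendecomposition amortises the cost of evaluating severability across many Markov times.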
However, finding good components is more involved than simply computing the severability of a single set of nodes. The cost of the component optimisation algorithm described in Appendix D.1 is more difficult to characterise, as it depends strongly on the number of nodes neighbouring the putative component throughout the procedure. In pathological cases, the cost is , where is the maximum number of nodes permitted in the component and is the size of the graph. Luckily, this upper bound is attained only in complete graphs, and so is of little relevance since most real networks are far sparser. Moreover, by specifying the maximum component size , one can cap the computational resources spent trying to find a component.
Potential optimizations include using a random walk to highlight likely candidate neighbours; for instance, by choosing only the nodes that a random walker uniformly distributed in would most likely walk to in the next step, or, for removal of nodes, the nodes in that carry the least probability density. Such an algorithm would only cost , a significant improvement. More subtly, the computational cost of the matrix powers might also be reduced by taking advantage of the fact that, for each of the neighbouring components, is effectively a rank-2 perturbation of . Furthermore, as briefly mentioned in the discussion, severability is only one way of quantifying the mixing and retention of random walkers; other, faster alternatives may exist.
d.3 Benchmarking against community detection methods
Optimal component cover.
To compare against benchmarks with overlapping components, it is necessary to generate a list of components that covers the network. Simply taking the optimal component of each node is suboptimal, because it yields many duplicate components in the list. Instead, we chose the following naive method:

1. Let be the set of components; let be the set of nodes that have been assigned to at least one component.
2. Choose a node that is more connected to unassigned nodes than to nodes in . If no such node exists, end.
3. Find the optimal component for , add to , and add the nodes of to .
4. Repeat from step 2.
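The steps above can be sketched in Python as follows; `component_cover` and `triangle_of` are hypothetical names, and the stand-in optimiser replaces the severability optimisation of step 3.

```python
def component_cover(graph, optimal_component):
    """Greedy cover following the steps above (an illustrative sketch).

    `optimal_component(v)` stands in for the severability optimisation of
    step 3 and must return a set of nodes containing v.  To guarantee
    termination we additionally require the chosen node to be unassigned,
    an assumption the text leaves implicit.
    """
    C = []      # step 1: components found so far
    A = set()   # step 1: nodes assigned to at least one component
    while True:
        # step 2: an unassigned node more connected to unassigned nodes than to A
        candidates = [v for v in graph if v not in A
                      and sum(u not in A for u in graph[v])
                      > sum(u in A for u in graph[v])]
        if not candidates:
            break
        v = candidates[0]
        S = optimal_component(v)    # step 3
        C.append(S)
        A |= S
    return C

# Toy demo: two triangles joined by one edge, with a hypothetical optimiser
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
         3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

def triangle_of(v):
    return {0, 1, 2} if v <= 2 else {3, 4, 5}

cover = component_cover(graph, triangle_of)
```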
Partitioning.
To compare severability with partitioning methods, it is necessary to turn the optimal component cover into a partition. To do so, first order the components of the cover arbitrarily. Where a node appears in multiple components, always choose the first component it appears in. This procedure obviously depends on the ordering of the components; however, in networks with well-defined partition structure, this method works sufficiently well, as demonstrated in the LFR benchmark.
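A minimal sketch of this cover-to-partition rule (the function name is ours):

```python
def cover_to_partition(cover, nodes):
    """Turn an (arbitrarily ordered) component cover into a partition by
    assigning each node to the first component in which it appears."""
    first = {}
    for i, component in enumerate(cover):
        for v in component:
            first.setdefault(v, i)      # keep only the first assignment
    return [frozenset(v for v in nodes if first.get(v) == i)
            for i in range(len(cover))]

# Node 2 appears in both components but is assigned to the first one
blocks = cover_to_partition([{0, 1, 2}, {2, 3, 4}], range(5))
```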
Choice of Markov Time.
For hierarchical networks, the Markov time serves as a useful resolution parameter, allowing severability to pick out the optimal component structure at different levels. However, existing metrics Danon et al. (2005); Lancichinetti et al. (2009) require the selection of a single time . For partitions, this can be done by choosing a Markov time that minimises the number of singleton and overlapping vertices, but other criteria could be chosen.
Quantifying similarity of partitions.
To compare partitions across different methods, normalised mutual information Danon et al. (2005) has been employed. To compare component covers, a generalisation of normalised mutual information that allows for overlapping nodes has been used Lancichinetti et al. (2009). We refer to the generalised variant simply as "normalised mutual information", without loss of precision, as only the generalised variant can be used in the benchmarks with overlapping components.
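For the standard (non-overlapping) case, normalised mutual information can be computed as below; this is the usual definition from the cited literature, not the overlapping-cover generalisation.

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalised mutual information between two partitions given as
    per-node block labels: 2 I(A;B) / (H(A) + H(B))."""
    n = len(labels_a)
    pa, pb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    h_a = -sum(c / n * math.log(c / n) for c in pa.values())
    h_b = -sum(c / n * math.log(c / n) for c in pb.values())
    mi = sum(c / n * math.log(c * n / (pa[a] * pb[b]))
             for (a, b), c in joint.items())
    return 1.0 if h_a + h_b == 0 else 2 * mi / (h_a + h_b)
```

Identical partitions (up to label permutation) score 1, and independent partitions score 0.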
Appendix E Linearization and discretization of a network of Kuramoto oscillators
For power networks, in a number of situations of practical relevance Dorfler and Bullo (2011); Dörfler and Bullo (2012), e.g. when operating in the regime where frequencies have almost synchronized, the term can be reasonably neglected and one may linearize around the steady state trajectory to obtain
(21) 
where is defined as . The matrix is called the Laplacian of the network, as it plays the same role in graphs as the Laplace operator in continuous space. It is important to note that this equation also fully characterizes the consensus model of opinion dynamics OlfatiSaber et al. (2007), the heat equation, and random walkers diffusing through the network in continuous time Chung (1997); to wit, the represent, respectively, converging opinions, equalizing temperatures, or the expected fraction of walkers on node at any given time.
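A minimal numerical sketch of such Laplacian dynamics as consensus/heat flow on a small path graph (Euler discretisation; the graph and step size are arbitrary choices, not from the text):

```python
import numpy as np

# Graph Laplacian L = D - A of a 3-node path graph
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
L = np.diag(A.sum(axis=1)) - A

# Euler-discretised dynamics dx/dt = -L x (consensus / heat flow)
x = np.array([1.0, 0.0, 0.0])
dt = 0.1                     # small enough for stability (dt * lambda_max < 2)
for _ in range(2000):
    x = x - dt * (L @ x)
# all entries relax to the average of the initial condition
```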
In order to build a discretetime random walk to which our framework can be directly applied, we choose a timestep , where is the smallest natural number such that a modified adjacency matrix is strictly positive. We then measure the severability of random walk dynamics on the graph defined by the modified adjacency matrix .
Appendix F Variants of the LFR benchmark
f.1 Unweighted, undirected, nonoverlapping LFR networks
We analyse a class of networks in which components are extremely unevenly sized, a situation in which many popular partitioning methods perform suboptimally. These multiscale networks are randomly constructed such that both the degree and component size distributions follow power laws, with exponents and , respectively. Additional parameters include the total number of nodes , the average degree , the maximum degree , and the intrinsic parameter (not to be confused with the mixing , which is part of severability); the fraction of links from a node to other nodes within the same component is given by Lancichinetti et al. (2008). Graph generation parameters were chosen at values typical of real networks: , , , , and Lancichinetti et al. (2008). Severability optimisation was performed with a maximum search size , and partitions were generated from the component cover.
As can be seen in Figure 7, severability performs well, always finding the natural component structure up until around , where components are no longer defined in a strong sense Radicchi et al. (2004). That severability begins failing at is expected and consistent with its definition, since at that point random walkers are as likely to escape a preseeded component at each step as to remain within it; recall that a component is severable precisely when random walkers tend to stay and mix within it. Even so, the results are comparable to those of Infomap and of modularity optimisation using simulated annealing, which have been found to be amongst the most successful methods for this benchmark Lancichinetti and Fortunato (2009).
f.2 Unweighted, undirected, overlapping LFR networks
Further extensions to the LFR benchmark were implemented to allow components to overlap Lancichinetti and Fortunato (2009). In Figure 8, we compare the component covers from severability to the pre-seeded components. For the optimisation, different maximum search sizes were used for the upper and lower panels, respectively. The parameters chosen were identical to those used for the evaluation of k-clique percolation Palla et al. (2005) in figure 6 of Ref. Lancichinetti and Fortunato (2009). Comparison with those results shows that severability performs comparably for the smaller component sizes, but significantly better for larger components.
f.3 Weighted, directed, overlapping LFR networks
Appendix G Image processing
The image in Fig. 5 of the main text was preprocessed by reducing the image resolution to a more convenient size and converting to a network using standard methods. Briefly, we connect only adjacent pixels (using the maximum metric) with link weight w = exp(−ΔL/λ), where ΔL is the absolute difference in luminosity between the two pixels and λ is an adjustable parameter controlling the exponential weight decay. Severability was then optimised at fixed values of the Markov time and maximum search size.
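The pixel-to-network conversion can be sketched as follows; the function name and the value of the decay parameter `lam` are illustrative, not those used in the paper:

```python
# Sketch of the pixel-to-network conversion: adjacent pixels (maximum /
# Chebyshev metric, i.e. 8 neighbours) are linked with weight
# exp(-|dL| / lam). The decay parameter lam is an illustrative choice.
import math

def image_to_edges(lum, lam=10.0):
    """Return {((r1, c1), (r2, c2)): weight} over all adjacent pixel pairs."""
    rows, cols = len(lum), len(lum[0])
    edges = {}
    for r in range(rows):
        for c in range(cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    r2, c2 = r + dr, c + dc
                    if (dr, dc) != (0, 0) and 0 <= r2 < rows and 0 <= c2 < cols:
                        if (r, c) < (r2, c2):   # count each pair once
                            w = math.exp(-abs(lum[r][c] - lum[r2][c2]) / lam)
                            edges[((r, c), (r2, c2))] = w
    return edges

edges = image_to_edges([[0, 0], [0, 255]], lam=10.0)
print(len(edges))  # 6 links in a 2x2 image (4 sides + 2 diagonals)
```

Equal-luminosity neighbours get weight 1, while a sharp luminosity edge (here 0 against 255) is suppressed to exp(−25.5), so random walkers rarely cross it.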
In a post-processing step, segments with extreme values of mixing and retention were removed as outliers, because at high Markov times such segments correspond to nearly disconnected components. If a component was completely embedded in another, we keep only the one with higher severability. Communities were then inductively merged if they overlapped by more than 20 pixels, until no more merges were possible. Merging is generally relevant when a feature of the network is much larger than the maximum search size; in this case the optimisation method gives overlapping patches of the background, which can then be pieced together. The segments were ordered by average luminosity, and the darker patches were assigned to the background.
Appendix H Ring-of-rings
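The inductive merging step can be sketched as follows; the function name is illustrative, and only the 20-pixel overlap threshold is taken from the text:

```python
# Sketch of the merging step: segments (sets of pixels) are merged
# repeatedly whenever two of them share more than `threshold` pixels,
# until no further merges are possible.

def merge_segments(segments, threshold=20):
    segs = [set(s) for s in segments]
    merged = True
    while merged:
        merged = False
        for i in range(len(segs)):
            for j in range(i + 1, len(segs)):
                if len(segs[i] & segs[j]) > threshold:
                    segs[i] |= segs[j]   # absorb segment j into segment i
                    del segs[j]
                    merged = True
                    break
            if merged:
                break                    # restart scan after each merge
    return segs

# Two overlapping background patches (25 shared pixels) plus a disjoint segment:
a = set(range(0, 50))
b = set(range(25, 75))
c = set(range(100, 110))
out = merge_segments([a, b, c])
print(len(out))  # 2 segments remain: the merged patch and the disjoint one
```

Because merging is repeated to a fixed point, chains of pairwise-overlapping patches collapse into a single large feature, which is exactly the behaviour needed when a feature exceeds the maximum search size.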
Appendix I Square lattice
As a negative control, it is instructive to consider a network in which there is clearly no structure. For that, we chose a regular 2D square lattice with each node connected to all 8 of its neighbours (including diagonal links). We visualise this using a uniformly coloured discrete image, in which each pixel is connected to all of the adjacent pixels with links of equal strength. As can be seen in the figure below, after accounting for symmetry considerations, all components found are transients, which is the expected result.
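A minimal sketch of this lattice (the 8-neighbour, or Moore, neighbourhood induced by the maximum metric), confirming that interior nodes have degree 8:

```python
# Sketch: build the 8-neighbour square lattice used as the negative control.
# Interior nodes have degree 8; corners 3; non-corner boundary nodes 5.

def moore_lattice(n):
    adj = {(r, c): set() for r in range(n) for c in range(n)}
    for r in range(n):
        for c in range(n):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if (dr, dc) != (0, 0) and 0 <= r + dr < n and 0 <= c + dc < n:
                        adj[(r, c)].add((r + dr, c + dc))
    return adj

adj = moore_lattice(5)
print(len(adj[(2, 2)]), len(adj[(0, 0)]))  # 8 3
```

All links carry equal weight, so up to boundary effects every node's neighbourhood looks identical, which is why no severable component should survive.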
Additionally, these images strongly suggest a relationship between severability optimisation and diffusion. This is of course quite closely related both to the dependence of severability on random walk dynamics and to the optimisation procedure outlined in Appendix D.1. Along these lines, the optimisation procedure we outlined can be thought of as a modified random walk in which previously explored states are immediately accessible to the random walker, but probability barriers in the “energy landscape” are magnified.
Appendix J Ring of small worlds: commutativity & locality
We further explore commutativity as in Figure 5, by looking at a ring of small-world networks and comparing against OSLOM. We first generate small-world networks using the Watts–Strogatz model. Each node is first connected to its 2 nearest neighbours on each side. Then every edge is rewired with a fixed independent probability, but such that multi-edges cannot exist, so a small world with a total of 5 nodes will not be rewired away from a 5-clique.
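This construction can be sketched as follows; the implementation details (which endpoint is kept during rewiring, the random seed) are illustrative choices, not taken from the paper:

```python
# Sketch of the Watts-Strogatz-style construction above: a ring where each
# node links to its k nearest neighbours on each side, then each edge is
# rewired with probability p unless that would create a multi-edge or
# self-loop. With n = 5 and k = 2 the ring is already the complete graph
# K5, so no rewiring is possible.
import random

def ring_small_world(n, p, k=2, seed=0):
    rng = random.Random(seed)
    edges = {frozenset((i, (i + d) % n))
             for i in range(n) for d in range(1, k + 1)}
    for e in list(edges):
        if rng.random() < p:
            u = min(e)   # illustrative choice: keep the lower endpoint
            # only rewire to targets that do not duplicate an existing edge
            candidates = [v for v in range(n)
                          if v != u and frozenset((u, v)) not in edges]
            if candidates:
                edges.remove(e)
                edges.add(frozenset((u, rng.choice(candidates))))
    return edges

k5 = ring_small_world(5, p=1.0)
print(len(k5))  # 10: the 5-node ring is already K5, nothing can be rewired
```

Rewiring preserves the edge count (n·k for n > 2k), so only the wiring pattern, not the density, distinguishes the small worlds of different sizes.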
Note that whereas severability gives the same results when looking at a single small-world network as when it is embedded in a ring of four of them, OSLOM does not. Some of this is equivalent behaviour, as OSLOM chooses not to consider the entire network as a valid community. For the small worlds of size 5, 10, and 20, OSLOM returns all individual nodes, which is as valid an answer as the entire network. However, for the largest small-world network of size 40, OSLOM chooses to split it into 3 pieces, which is not what it chose in the ring of 4 small worlds. Severability always recovers the small-world network at the appropriate Markov times, as it is truly local.
Additionally, OSLOM has trouble when the scales of the networks are very different. It is unable to recover the 5-clique of the smallest small world, despite the 5-clique being recoverable when the other communities are of the same size. This stems from the single resolution that OSLOM implicitly imposes on all communities.
Appendix K Coexistence of different timescales
Appendix L Word Association Extended
Figure 4 only depicted the components including “nature” and the orphans directly connected to that word. However, this is only a small snippet of the entire network. Here, we display all the other components that have at least one link to “nature” but do not include the word itself. As in Figure 4, the same maximum search size and Markov time were used.