The Metric Space of Networks

We study the question of reconstructing a weighted, directed network up to isomorphism from its motifs. To tackle this question, we first relax the usual (strong) notion of graph isomorphism to a notion that we call weak isomorphism. We then identify a distance on the space of all networks that is compatible with weak isomorphism. This global approach comes equipped with notions such as completeness, compactness, curves, and geodesics, which we explore throughout this paper. Furthermore, it admits global-to-local inference in the following sense: we prove that two networks are weakly isomorphic if and only if all their motif sets are identical, thus answering the network reconstruction question. Further exploiting the additional structure imposed by our network distance, we prove that two networks are weakly isomorphic if and only if certain essential associated structures, the skeleta of the respective networks, are strongly isomorphic.

1. Introduction

One of the prevalent hypotheses used in systems biology and network analysis is that complex networks are assembled from simpler subnetworks called motifs [shen2002network, sporns2004motifs, alon2007network, milo2002network, alon2006introduction]. For example, motifs have been used to characterize transcription regulation networks, protein-protein interactions, and to simulate network datasets that resemble real brain networks across a variety of structural measures [sporns2004motifs, yeger2004network, konagurthu2008origin]. These considerations motivate the following theoretical question:

Question 1.

Is it possible to reconstruct, up to isomorphism, a network from the knowledge of its subnetworks?

In this paper we provide an answer to the question above. The motivation for our answer to Question (1) is rooted in the metric space literature, specifically a construction called a curvature class due to Mikhail Gromov [gromov-book, 1.19+]. Given a metric space $(X, d_X)$ and $n \in \mathbb{N}$, the $n$th curvature class of $X$, denoted $K_n(X)$, is the collection of $n \times n$ distance matrices that can be realized by $n$-tuples of points in $X$. Gromov proved that two compact metric spaces $(X, d_X)$ and $(Y, d_Y)$ are isometric (i.e. related by a distance-preserving bijection) if and only if $K_n(X) = K_n(Y)$ for all $n \in \mathbb{N}$ [gromov-book, 3.27]. Thus the knowledge of (the countably many) curvature classes is sufficient to recover the full structure of the metric space (which may be uncountable). Our strategy is to prove an analogous result in the setting of general networks.

In order to be able to reason about and eventually answer Question (1), we first need to clarify several concepts. For example: what is a sufficiently general definition of network, what is a suitable notion of isomorphism between two networks, and how can we relate networks to metric spaces?

Networks may have asymmetric edge relations and data attached to each node, so intuitively, they should be represented as edge-weighted directed graphs with self-loops, where the edge weights are allowed to be arbitrary real numbers. Such a model for a network can alternatively be expressed as a square matrix of real values, i.e. the adjacency matrix of the graph. Thus when dealing with finite networks, a reasonable model for a network is a pair $(X, \omega_X)$, where $X$ is a finite set of nodes and $\omega_X : X \times X \to \mathbb{R}$ is a weight function, i.e. the edge weights. Real-world networks that arise in computational settings are necessarily finite, but when they are very large, they may be modeled as objects with infinite cardinality. To accommodate this possibility while still maintaining some control over the underlying node set, we choose to model a general network as follows:

Definition 1.

A network is a pair $(X, \omega_X)$ where $X$ is a first countable topological space and $\omega_X : X \times X \to \mathbb{R}$ is a continuous function. The collection of all networks is denoted $\mathcal{N}$.

We also consider the subcollection of compact networks, which satisfy the additional restriction that the underlying space $X$ is compact. We denote the collection of all compact networks by $\mathcal{CN}$, and the subcollection of finite networks by $\mathcal{FN}$. Notice that such a network model is a generalization of a metric space: for $(X, \omega_X)$ to be a metric space, $\omega_X$ needs to satisfy additional assumptions such as symmetry and the triangle inequality, and the underlying topology is assumed to be the metric topology generated by open balls in $X$.

Recall that a space is first countable if each point in the space has a countable local basis (see [counterexamples, p. 7] for more details). First countability is a technical condition guaranteeing that when the underlying topological space of a network is compact, it is also sequentially compact.

Interestingly, the model that we have just described has already appeared in the applied mathematics literature, at least in the setting of finite networks. In recent years, various authors have used the model of $\mathcal{FN}$ in applying topological data analysis methods such as hierarchical clustering and persistent homology to network data [carlsson2013axiomatic, clust-net, nets-asilo, dowker-arxiv, dowker-asilo, pph]. An additional ingredient in each of these papers was a notion of network distance $d_{\mathcal{N}}$ between objects in $\mathcal{FN}$. However, until recently the theoretical foundations of this network distance were unknown. In [dn-part1], we generalized the network distance to all of $\mathcal{N}$, studied its computational aspects, and developed a notion of isomorphism called weak isomorphism that turned out to be compatible with $d_{\mathcal{N}}$. These notions of $d_{\mathcal{N}}$ and weak isomorphism are key players in our search for a motif reconstruction theorem. In this paper, we continue laying down the foundations of $d_{\mathcal{N}}$. In particular, we complete our answer to the following question, which we had raised and partially answered in our previous work:

Question 2.

What is the “continuous limit” of a convergent sequence of finite networks?

Returning to the question about motif reconstruction, recall that one natural notion of isomorphism in the network setting is the standard notion of graph isomorphism, which we call strong isomorphism in our context. Specifically, two networks $(X, \omega_X)$ and $(Y, \omega_Y)$ are said to be strongly isomorphic, denoted $X \cong^s Y$, if they are related by a weight-preserving bijection, i.e. a bijection $\varphi : X \to Y$ such that $\omega_X(x, x') = \omega_Y(\varphi(x), \varphi(x'))$ for all $x, x' \in X$. The notion of weak isomorphism is a relaxation of this condition.

Definition 2.

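Two networks $(X, \omega_X)$ and $(Y, \omega_Y)$ are weakly isomorphic, denoted $X \cong^w Y$, if there exist a set $V$ and surjections $\varphi_X : V \to X$ and $\varphi_Y : V \to Y$ such that:

$$\omega_X(\varphi_X(v), \varphi_X(v')) = \omega_Y(\varphi_Y(v), \varphi_Y(v')) \quad \text{for all } v, v' \in V.$$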
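
For finite networks, the condition in Definition 2 can be checked directly once a candidate set $V$ and candidate surjections are given. The following is a minimal sketch under stated assumptions: networks are stored as dictionaries from node pairs to weights, and the function name `is_weak_isomorphism_witness` is hypothetical (it does not come from the paper). It only verifies a given witness; it does not search for one.

```python
from itertools import product


def is_weak_isomorphism_witness(omega_x, omega_y, phi_x, phi_y):
    """Check the condition of Definition 2 for finite networks.

    omega_x, omega_y: dicts mapping (node, node) -> weight.
    phi_x, phi_y: dicts mapping each element of a common set V to a node.
    """
    v_set = set(phi_x)
    if v_set != set(phi_y):
        return False  # both maps must be defined on the same set V
    nodes_x = {a for a, _ in omega_x}
    nodes_y = {a for a, _ in omega_y}
    # Both maps must be surjective.
    if set(phi_x.values()) != nodes_x or set(phi_y.values()) != nodes_y:
        return False
    # The pulled-back weights must agree on every pair in V x V.
    return all(
        omega_x[phi_x[v], phi_x[w]] == omega_y[phi_y[v], phi_y[w]]
        for v, w in product(v_set, repeat=2)
    )


# A one-node network with weight 1 and a two-node network with all weights 1.
omega_x = {("p", "p"): 1.0}
omega_y = {(a, b): 1.0 for a in "uv" for b in "uv"}
phi_x = {"u": "p", "v": "p"}  # V = {u, v} collapses onto the single node
phi_y = {"u": "u", "v": "v"}  # identity on the two-node network
print(is_weak_isomorphism_witness(omega_x, omega_y, phi_x, phi_y))
```

Here $V$ is taken to be the node set of the larger network, so the two networks are weakly isomorphic even though no bijection between them exists.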

With regard to subnetworks: we organize all the motifs present in a given network into motif sets. For each $n \in \mathbb{N}$, the $n$-motif set is the collection of $n \times n$ weight matrices obtained from $n$-tuples of points in $X$, possibly with repetition. We formalize this next.

Definition 3 (Motif set).

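For each $n \in \mathbb{N}$ and each $(X, \omega_X) \in \mathcal{CN}$, define $\Psi^n_X : X^n \to \mathbb{R}^{n \times n}$ to be the map $(x_1, \ldots, x_n) \mapsto ((\omega_X(x_i, x_j)))_{i,j=1}^n$, where the $((\cdot))$ notation refers to the square matrix associated with the sequence. Note that $\Psi^n_X$ is simply a map that sends each sequence of length $n$ to its corresponding weight matrix. Let $\mathcal{C}(\mathbb{R}^{n \times n})$ denote the closed subsets of $\mathbb{R}^{n \times n}$. Then let $M_n : \mathcal{CN} \to \mathcal{C}(\mathbb{R}^{n \times n})$ denote the map defined by

$$(X, \omega_X) \mapsto \{\Psi^n_X(x_1, \ldots, x_n) : x_1, \ldots, x_n \in X\}.$$

We refer to $M_n(X)$ as the $n$-motif set of $X$. Notice that $M_n(X)$ is closed in $\mathbb{R}^{n \times n}$: it is the image of the compact set $X^n$ under the continuous map $\Psi^n_X$, hence compact, and therefore closed.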
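
For a finite network, the motif set of Definition 3 can be enumerated by brute force. The sketch below assumes a dictionary representation of the weight function and uses a hypothetical helper name `motif_set`; matrices are stored as tuples of tuples so that they can be collected in a Python set. The number of sequences grows as $|X|^n$, so this is only meant for very small examples.

```python
from itertools import product


def motif_set(nodes, omega, n):
    """Return the n-motif set of a finite network.

    nodes: list of node labels.
    omega: dict mapping (node, node) -> weight.
    Each motif is the n-by-n weight matrix of a length-n node sequence
    (repetitions allowed), stored as a tuple of row tuples.
    """
    return {
        tuple(tuple(omega[a, b] for b in seq) for a in seq)
        for seq in product(nodes, repeat=n)
    }


# A two-node network with four distinct weights.
nodes = ["p", "q"]
omega = {("p", "p"): 1, ("p", "q"): 2, ("q", "p"): 3, ("q", "q"): 4}
m1 = motif_set(nodes, omega, 1)
m2 = motif_set(nodes, omega, 2)
print(sorted(m1))
print(sorted(m2))
```

The 1-motif set records the self-loop weights only, while the 2-motif set contains one constant matrix per node and one matrix per ordered pair of distinct nodes.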

It is easy to come up with examples of networks that share the same motif sets, but are not strongly isomorphic. Instead, we hypothesize that if two networks share the same motif sets, then they are weakly isomorphic, i.e. are at $d_{\mathcal{N}}$-distance zero. In pursuing this idea, we develop the theory of $d_{\mathcal{N}}$ throughout this paper, ultimately answering both Questions (1) and (2). At the same time, we find a surprising answer to the following question relating weak and strong isomorphism:

Question 3.

Does weak isomorphism between two networks imply that some essential substructures are strongly isomorphic?

1.1. Contributions and organization of the paper

In this paper, we develop the theory of the network distance $d_{\mathcal{N}}$, which lies at the core of Questions (1-3) posed above. We prove that the metric space of weak isomorphism classes of compact networks endowed with $d_{\mathcal{N}}$ is complete (§LABEL:sec:completeness). Thus Question (2) can be answered as follows: a convergent sequence of finite networks converges to a compact network, i.e. a compact, first countable topological space equipped with a continuous weight function. We show that the pseudometric space $(\mathcal{CN}, d_{\mathcal{N}})$, while not compact, contains many precompact families (§LABEL:sec:completeness), and moreover is geodesic (§LABEL:sec:geodesic).

We define a construction for any network called a “skeleton”. Using properties of skeleta, we show that for two compact networks (with some additional topology assumptions), the following are equivalent: weak isomorphism between the two networks, strong isomorphism between their skeleta, and equality of their motif sets. In other words, such networks can be recovered from their motif sets. This forms our answer to Question (1) (§LABEL:sec:compact).

1.2. Results used from prior work

We adopt our definition of a network as a first countable topological space with a continuous weight function from [dn-part1]. There we also proved the following result about the pseudometric structure of $(\mathcal{CN}, d_{\mathcal{N}})$:

Theorem 1 (Weak isomorphism in compact networks).

The collection of compact networks $\mathcal{CN}$ is a pseudometric space when equipped with $d_{\mathcal{N}}$. Moreover, for any $X, Y \in \mathcal{CN}$, we have $d_{\mathcal{N}}(X, Y) = 0$ if and only if $X$ and $Y$ are weakly isomorphic.

We already exploited motif sets to provide computable lower bounds for $d_{\mathcal{N}}$ in [dn-part1]. The main result enabling this is the stability theorem that we explain next. For each $n \in \mathbb{N}$, we write $d_n$ to denote the Hausdorff distance between closed subsets of $\mathbb{R}^{n \times n}$ equipped with the $\ell^\infty$ metric.

Theorem 2 (Stability of motif sets).

Let $(X, \omega_X), (Y, \omega_Y) \in \mathcal{CN}$. For any $n \in \mathbb{N}$,

$$d_n(M_n(X), M_n(Y)) \leq 2\, d_{\mathcal{N}}(X, Y).$$
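
Rearranged, Theorem 2 gives the lower bound $d_{\mathcal{N}}(X, Y) \geq \tfrac{1}{2}\, d_n(M_n(X), M_n(Y))$, which is computable for finite networks. The sketch below illustrates this bound only; it does not compute $d_{\mathcal{N}}$ itself. The helper names (`motif_set`, `linf`, `hausdorff`, `motif_lower_bound`) and the dictionary representation of networks are assumptions made for illustration.

```python
from itertools import product


def motif_set(nodes, omega, n):
    """n-motif set of a finite network, as a set of tuple-of-tuples matrices."""
    return {
        tuple(tuple(omega[a, b] for b in seq) for a in seq)
        for seq in product(nodes, repeat=n)
    }


def linf(a, b):
    """l-infinity distance between two matrices of the same shape."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def hausdorff(set_a, set_b):
    """Hausdorff distance between two finite nonempty sets of matrices."""
    d_ab = max(min(linf(a, b) for b in set_b) for a in set_a)
    d_ba = max(min(linf(a, b) for a in set_a) for b in set_b)
    return max(d_ab, d_ba)


def motif_lower_bound(net_x, net_y, n):
    """Half the Hausdorff distance between n-motif sets (bound from Theorem 2)."""
    return 0.5 * hausdorff(motif_set(*net_x, n), motif_set(*net_y, n))


# One-node network with weight 0 versus an asymmetric two-node network.
net_x = (["p"], {("p", "p"): 0})
net_y = (["p", "q"], {("p", "p"): 0, ("p", "q"): 1, ("q", "p"): 2, ("q", "q"): 0})
print(motif_lower_bound(net_x, net_y, 1))
print(motif_lower_bound(net_x, net_y, 2))
```

In this example the 1-motif sets coincide, so the bound for $n = 1$ is uninformative, while the 2-motif sets already separate the two networks.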

1.3. Related literature

In the graph theory literature, the problem of deciding how much information is encoded in the subgraph structure of a graph has a long history. Boutin and Kemper outline some of these efforts in [boutin2007lossless], and also prove, using combinatorial methods, that a large class of graphs can be fully determined from the distribution of their subtriangles. In our language, this is analogous to saying that $M_3(X) = M_3(Y)$ implies $X = Y$, where equality is in the sense of graph isomorphism. We move away from the combinatorial approach, and reformulate the problem to find when $M_n(X) = M_n(Y)$ for all $n \in \mathbb{N}$ implies $d(X, Y) = 0$, where $d$ is a certain (pseudo)metric on the space of all networks. This converts the content of Question (1) to a question about finding an appropriate network similarity measure.

The network distance $d_{\mathcal{N}}$ at the core of this paper is structurally based on the Gromov-Hausdorff distance [gromov-book, gromov1981structures] proposed by Mikhail Gromov in the early 1980s. Beyond its origins in metric geometry [burago, petersen2006riemannian], the Gromov-Hausdorff distance between metric spaces has found applications in the context of shape and data analysis [dgh-sgp, dghlp-focm, dgh-props, clust-um]. The close analogy with the Gromov-Hausdorff distance highlights some of the merits of our definition of $d_{\mathcal{N}}$: it yields a very natural completion of the space of weak isomorphism classes of finite networks, and admits geodesics interpolating between any two networks. The analogous results in the setting of compact metric spaces can be found in [petersen2006riemannian, ivanov2015gromov, dgh-note].

1.4. Notation and basic terminology

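We will denote the cardinality of any set $S$ by $|S|$. For any set $S$ we denote by $F(S)$ the collection of all finite subsets of $S$. For a topological space $X$, we write $\mathcal{C}(X)$ to denote the closed subsets of $X$. For a given metric space $(Z, d_Z)$, the Hausdorff distance between two nonempty subsets $A, B \subseteq Z$ is given by:

$$d_H^Z(A, B) = \max\Big\{ \sup_{a \in A} \inf_{b \in B} d_Z(a, b),\ \sup_{b \in B} \inf_{a \in A} d_Z(a, b) \Big\}.$$

We will denote the non-negative reals by $\mathbb{R}_+$. The all-ones matrix of size $n \times n$ will be denoted $\mathbb{1}_{n \times n}$. Given a function $f : X \to Y$ between two sets $X$ and $Y$, the image of $f$ will be denoted $\operatorname{im}(f)$ or $f(X)$. Given a topological space $X$ and a subset $A \subseteq X$, we will write $\overline{A}$ to denote the closure of $A$.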

2. Networks: Examples and Constructions

2.1. Examples of motif sets

We begin with some examples of networks and their motif sets. We also provide examples of infinite networks that fall within the framework of $\mathcal{CN}$ and $\mathcal{N}$.

Example 3.

We first introduce networks with one or two nodes (see Figure 1).

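• A network with one node $p$ can be specified by $\alpha \in \mathbb{R}$, and will be denoted by $N_1(\alpha)$. We have $N_1(\alpha) \cong^s N_1(\alpha')$ if and only if $\alpha = \alpha'$.

• A network with two nodes will be denoted by $N_2(\Omega)$, where $\Omega = \begin{pmatrix} \alpha & \delta \\ \gamma & \beta \end{pmatrix} \in \mathbb{R}^{2 \times 2}$. Given $\Omega, \Omega' \in \mathbb{R}^{2 \times 2}$, $N_2(\Omega) \cong^s N_2(\Omega')$ if and only if there exists a permutation matrix $P$ of size $2 \times 2$ such that $\Omega' = P\,\Omega\,P^T$.

• Any $k$-by-$k$ matrix $\Sigma \in \mathbb{R}^{k \times k}$ induces a network on $k$ nodes, which we refer to as $N_k(\Sigma)$. Notice that $N_k(\Sigma) \cong^s N_\ell(\Sigma')$ if and only if $k = \ell$ and there exists a permutation matrix $P$ of size $k \times k$ such that $\Sigma' = P\,\Sigma\,P^T$.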
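
The permutation-matrix criterion for strong isomorphism of matrix-induced networks can be tested by brute force on small examples. The sketch below assumes weight matrices are given as nested lists and uses a hypothetical function name `strongly_isomorphic`. It relies on the fact that conjugating a matrix by a permutation matrix simply relabels rows and columns by the same permutation, and it tries all $k!$ permutations, so it is only practical for small $k$.

```python
from itertools import permutations


def strongly_isomorphic(sigma, sigma_prime):
    """Return True if some permutation of the nodes maps sigma to sigma_prime.

    sigma, sigma_prime: square weight matrices as nested lists.
    """
    k = len(sigma)
    if k != len(sigma_prime):
        return False  # networks of different sizes admit no bijection
    for perm in permutations(range(k)):
        # Relabel rows and columns of sigma simultaneously by perm.
        if all(
            sigma_prime[i][j] == sigma[perm[i]][perm[j]]
            for i in range(k)
            for j in range(k)
        ):
            return True
    return False


sigma = [[1, 2], [3, 4]]
relabeled = [[4, 3], [2, 1]]   # the same network with its two nodes swapped
transposed = [[1, 3], [2, 4]]  # edge directions reversed instead
print(strongly_isomorphic(sigma, relabeled))
print(strongly_isomorphic(sigma, transposed))
```

Reversing the direction of every edge is not a relabeling of nodes, which is why the transposed matrix is rejected in this example.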

Remark 4.

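Already from Figure 1, it is evident that if $\alpha = \beta = \gamma = \delta$, then $N_1(\alpha)$ and $N_2(\Omega)$ are weakly isomorphic. This can be generalized as follows. Let $(X, \omega_X), (Y, \omega_Y) \in \mathcal{N}$ and suppose $\varphi : X \to Y$ is a surjective map such that $\omega_X(x, x') = \omega_Y(\varphi(x), \varphi(x'))$ for all $x, x' \in X$. Then $X$ and $Y$ are weakly isomorphic, i.e. $X \cong^w Y$. This result follows from Definition 2 by: (1) choosing $V = X$, (2) letting $\varphi_X$ be the identity map, and (3) letting $\varphi_Y = \varphi$.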

Example 5.

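Consider the two networks from Figure 1. Then we have $M_1(N_2(\Omega)) = \{\alpha, \beta\}$, $M_1(N_1(\alpha)) = \{\alpha\}$, and

$$M_2(N_2(\Omega)) = \left\{ \begin{pmatrix} \alpha & \alpha \\ \alpha & \alpha \end{pmatrix}, \begin{pmatrix} \beta & \beta \\ \beta & \beta \end{pmatrix}, \begin{pmatrix} \alpha & \delta \\ \gamma & \beta \end{pmatrix}, \begin{pmatrix} \beta & \gamma \\ \delta & \alpha \end{pmatrix} \right\}, \qquad M_2(N_1(\alpha)) = \left\{ \begin{pmatrix} \alpha & \alpha \\ \alpha & \alpha \end{pmatrix} \right\}.$$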

In line with our discussion in the introduction, we wish to examine the extent to which motif sets determine the structure of a network. To proceed pedagogically, we begin with the following:

Approach 1 (strong isomorphism and motif sets).

Let $X, Y \in \mathcal{CN}$. Then $M_n(X) = M_n(Y)$ for all $n \in \mathbb{N}$ if and only if $X \cong^s Y$.

This approach is not immediately fruitful: by setting $\alpha = \beta = \gamma = \delta$ in Example 5 (also see Remark 4), we see that $N_1(\alpha)$ and $N_2(\Omega)$ have the same motif sets, but are clearly not related by a bijection. The strong isomorphism approach does work with some strong additional assumptions (Theorem 6). More importantly, the strong isomorphism approach works in the setting of compact metric spaces, and making it work in the network setting provides motivation for some of our main results.
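
The failure of Approach 1 can be observed numerically on the networks of Example 5 when all four weights coincide. The sketch below assumes a dictionary representation of finite networks and a hypothetical helper `motif_set`. It compares motif sets only for the first few values of $n$, so it illustrates the counterexample without proving equality for all $n$.

```python
from itertools import product


def motif_set(nodes, omega, n):
    """n-motif set of a finite network, as a set of tuple-of-tuples matrices."""
    return {
        tuple(tuple(omega[a, b] for b in seq) for a in seq)
        for seq in product(nodes, repeat=n)
    }


alpha = 5
# One-node network with self-loop weight alpha.
one_node = (["p"], {("p", "p"): alpha})
# Two-node network in which every weight equals alpha.
two_node = (["p", "q"], {(a, b): alpha for a in "pq" for b in "pq"})

same_motifs = all(
    motif_set(*one_node, n) == motif_set(*two_node, n) for n in range(1, 5)
)
same_size = len(one_node[0]) == len(two_node[0])
print(same_motifs, same_size)
```

The motif sets agree for every tested $n$, yet the node sets have different cardinalities, so no bijection (and hence no strong isomorphism) can exist.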

The failure of motif sets in characterizing strongly isomorphic networks leads one to hope that weakly isomorphic networks might be an appropriate object of study.

Approach 2 (Weak isomorphism and motif sets).

Let $X, Y \in \mathcal{CN}$. Then $M_n(X) = M_n(Y)$ for all $n \in \mathbb{N}$ if and only if $X \cong^w Y$.

One of our main results is that the preceding statement is in fact true. The approach via weak isomorphism will be the focus of §LABEL:sec:compact.

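We conclude this section by showing that with additional assumptions of genericity, the motif sets contain all the information of a finite network up to strong isomorphism. To say that a finite network $(X, \omega_X)$ is generic means that $\omega_X(x_1, x_2) = \omega_X(x_3, x_4)$ if and only if $x_1 = x_3$ and $x_2 = x_4$; that is, all weights are distinct.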

Theorem 6.

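Let $(X, \omega_X), (Y, \omega_Y) \in \mathcal{FN}$. Suppose $X$ and $Y$ are generic, and $M_n(X) = M_n(Y)$ for each $n \in \mathbb{N}$. Then $X \cong^s Y$.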

Proof of Theorem 6.

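Since $X$ and $Y$ are generic, we have $|M_1(X)| = |X|$ and $|M_1(Y)| = |Y|$. Thus $|X| = |Y|$. Let $n := |X|$. For $X$, and analogously for $Y$, define:

$$D(X) = \{\Psi^n_X[(x_i)_{i=1}^n] : x_i \neq x_k \text{ if } i \neq k\}, \qquad R(X) = \{\Psi^n_X[(x'_i)_{i=1}^n] : \exists\, j \neq k,\ x'_j = x'_k\}.$$

Then we may write $M_n(X) = D(X) \cup R(X)$ and $M_n(Y) = D(Y) \cup R(Y)$.

In particular, $D(X) \cup R(X) = D(Y) \cup R(Y)$. We claim that $D(X) = D(Y)$. Let $M \in D(X)$. By genericity, each entry in $M$ is distinct. Also, $M \in M_n(Y)$. So $M = \Psi^n_Y[(y_i)_{i=1}^n]$ for some sequence $(y_i)_{i=1}^n$ in $Y$. Suppose the $y_i$ are not all distinct. Then there exist $j \neq k$ such that $y_j = y_k$. Thus the term $\omega_Y(y_j, y_j)$ ($= \omega_Y(y_j, y_k)$) appears in $M$ with multiplicity greater than 1. This is a contradiction, so $M \in D(Y)$. By a symmetric argument, we conclude $D(X) = D(Y)$. Next, let $(x_i)_{i=1}^n$ be a sequence of distinct elements in $X$. Note that $(x_i)_{i=1}^n$ includes each element of $X$. Since $D(X) = D(Y)$, there exists a sequence $(y_i)_{i=1}^n$ of distinct elements in $Y$ such that $\Psi^n_X[(x_i)_{i=1}^n] = \Psi^n_Y[(y_i)_{i=1}^n]$. Now define a bijection $\varphi : X \to Y$ by $x_i \mapsto y_i$. This gives us the required (strong) isomorphism. ∎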

Interested readers should look at [boutin2007lossless], where Boutin and Kemper give conditions under which complete, undirected, weighted graphs with self-loops are determined by the distributions of their three-node subgraphs. In our language, the result by Boutin and Kemper would be similar to the implication $M_3(X) = M_3(Y) \implies X \cong^s Y$.

2.2. Examples of infinite networks: the directed circles

The collections $\mathcal{N}$, $\mathcal{CN}$, and $\mathcal{FN}$ contain the collections of all metric spaces, compact metric spaces, and finite metric spaces, respectively. It is interesting to identify networks in these families that are not just metric spaces. In §2.1, we provided some examples of finite, asymmetric networks. Here we provide examples of infinite, asymmetric networks in both the compact and noncompact cases. These constructions appear in detail in [dn-part1]. See Figure 2 for an illustration.

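Define $\vec{\mathbb{S}}^1 := \{e^{i\theta} \in \mathbb{C} : \theta \in [0, 2\pi)\}$. For any $\theta_1, \theta_2 \in [0, 2\pi)$, define $\vec{d}(\theta_1, \theta_2) := (\theta_2 - \theta_1) \bmod 2\pi$, with the convention $\vec{d}(\theta_1, \theta_2) \in [0, 2\pi)$. Then $\vec{d}(\theta_1, \theta_2)$ is the counterclockwise geodesic distance along the unit circle in $\mathbb{C}$ from $e^{i\theta_1}$ to $e^{i\theta_2}$. Next for each $e^{i\theta_1}, e^{i\theta_2} \in \vec{\mathbb{S}}^1$, define

$$\omega_{\vec{\mathbb{S}}^1}(e^{i\theta_1}, e^{i\theta_2}) := \vec{d}(\theta_1, \theta_2).$$

Now fix $\rho \geq 1$. For each $e^{i\theta_1}, e^{i\theta_2} \in \vec{\mathbb{S}}^1$, define

$$\omega_{\vec{\mathbb{S}}^1, \rho}(e^{i\theta_1}, e^{i\theta_2}) := \min\big(\vec{d}(\theta_1, \theta_2),\ \rho\, \vec{d}(\theta_2, \theta_1)\big).$$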

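The pair $(\vec{\mathbb{S}}^1, \omega_{\vec{\mathbb{S}}^1})$ equipped with the discrete topology is a directed circle network, and the pair $(\vec{\mathbb{S}}^1, \omega_{\vec{\mathbb{S}}^1, \rho})$ equipped with the standard topology of $\mathbb{C}$ is a directed circle network with reversibility $\rho$. The difference is that $\omega_{\vec{\mathbb{S}}^1}$ allows for travel only in the counterclockwise direction, whereas $\omega_{\vec{\mathbb{S}}^1, \rho}$ also allows for travel in the clockwise direction (see Figure 2). It turns out that $(\vec{\mathbb{S}}^1, \omega_{\vec{\mathbb{S}}^1})$ equipped with the discrete topology is a noncompact asymmetric network, and $(\vec{\mathbb{S}}^1, \omega_{\vec{\mathbb{S}}^1, \rho})$ equipped with the standard topology on $\mathbb{C}$ is a compact asymmetric network [dn-part1].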
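
The two weight functions on the directed circle can be evaluated numerically on sample angles. The sketch below represents each point of the circle by its angle in $[0, 2\pi)$; the function names `d_ccw`, `omega_circle`, and `omega_circle_rho` are hypothetical, and the clockwise cost is taken to be the counterclockwise distance of the reverse trip scaled by the reversibility parameter, following the formula above.

```python
import math

TWO_PI = 2 * math.pi


def d_ccw(theta1, theta2):
    """Counterclockwise distance from angle theta1 to angle theta2."""
    return (theta2 - theta1) % TWO_PI


def omega_circle(theta1, theta2):
    """Directed circle weight: counterclockwise travel only."""
    return d_ccw(theta1, theta2)


def omega_circle_rho(theta1, theta2, rho):
    """Directed circle weight with reversibility rho.

    Clockwise travel is allowed, at rho times the counterclockwise
    distance of the reverse trip.
    """
    return min(d_ccw(theta1, theta2), rho * d_ccw(theta2, theta1))


a, b = 0.0, math.pi / 2
print(omega_circle(a, b), omega_circle(b, a))
print(omega_circle_rho(a, b, 2.0), omega_circle_rho(b, a, 2.0))
```

Both weight functions are asymmetric on this pair of points: going a quarter turn counterclockwise is cheaper than returning, even when clockwise travel is permitted with reversibility 2.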

2.3. Skeletons and blow-up networks

As we saw in the simple examples discussed above, strong isomorphism implies weak isomorphism, and weak isomorphism does not in general imply strong isomorphism. One may nevertheless wonder whether strong and weak isomorphism may be related in the sense of Question (3) posed above. We show that the answer to this question is positive. The following definitions enable us to formulate the appropriate statement.

Definition 4 (Automorphisms).

Let $(X, \omega_X) \in \mathcal{CN}$. We define the automorphisms of $X$ to be the collection

$$\operatorname{Aut}(X) := \{\varphi : X \to X : \varphi \text{ a weight preserving bijection}\}.$$

Definition 5 (Poset of weak isomorphism).

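Let $(X, \omega_X) \in \mathcal{CN}$. Define a set $p(X)$ as follows:

$$p(X) := \{(Y, \omega_Y) \in \mathcal{CN} : \text{there exists a surjective, weight preserving map } \varphi : X \to Y\}.$$

Next we define a partial order $\preceq$ on $p(X)$ as follows: for any $(Y, \omega_Y), (Z, \omega_Z) \in p(X)$,

$$(Y, \omega_Y) \preceq (Z, \omega_Z) \iff \text{there exists a surjective, weight preserving map } \varphi : Z \to Y.$$

Then the set $p(X)$ equipped with $\preceq$ is called the poset of weak isomorphism of $X$.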

Definition 6 (Terminal networks in CN).

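Let $(X, \omega_X) \in \mathcal{CN}$. A compact network $Z \in p(X)$ is terminal if:

1. For each $Y \in p(X)$, there exists a weight preserving surjection $\varphi : Y \to Z$.

2. Let $Y \in p(X)$. If $f : Y \to Z$ and $g : Y \to Z$ are weight preserving surjections, then there exists $\varphi \in \operatorname{Aut}(Z)$ such that $g = \varphi \circ f$.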
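
For finite networks, the ingredients of Definitions 4 to 6 (weight preserving surjections and automorphisms) can be explored by brute force. The sketch below is illustrative only: the dictionary representation and the function names `is_weight_preserving_surjection` and `automorphisms` are assumptions, and the code does not decide whether a network is terminal.

```python
from itertools import permutations


def is_weight_preserving_surjection(phi, nodes_x, omega_x, nodes_y, omega_y):
    """Check that phi: X -> Y is onto and preserves all weights."""
    onto = {phi[x] for x in nodes_x} == set(nodes_y)
    preserving = all(
        omega_x[a, b] == omega_y[phi[a], phi[b]]
        for a in nodes_x
        for b in nodes_x
    )
    return onto and preserving


def automorphisms(nodes, omega):
    """List all weight preserving bijections of a finite network."""
    autos = []
    for image in permutations(nodes):
        phi = dict(zip(nodes, image))
        if all(omega[a, b] == omega[phi[a], phi[b]] for a in nodes for b in nodes):
            autos.append(phi)
    return autos


# Y has two nodes; X is obtained from Y by duplicating the node v.
nodes_y = ["u", "v"]
omega_y = {("u", "u"): 0, ("u", "v"): 1, ("v", "u"): 2, ("v", "v"): 3}
nodes_x = ["a", "b", "c"]
phi = {"a": "u", "b": "v", "c": "v"}
omega_x = {(s, t): omega_y[phi[s], phi[t]] for s in nodes_x for t in nodes_x}

print(is_weight_preserving_surjection(phi, nodes_x, omega_x, nodes_y, omega_y))
print(len(automorphisms(nodes_x, omega_x)), len(automorphisms(nodes_y, omega_y)))
```

In this example the smaller network belongs to the poset of weak isomorphism of the larger one. Duplicating a node creates an extra automorphism (swapping the two copies) that the smaller network does not have.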