It appeared recently [10, 1, 3] that most real-world complex networks (like the internet topology, data exchanges, web graphs, social networks, or biological networks) have some non-trivial properties in common. In particular, they have a very low density, a low average distance and diameter, a heterogeneous degree distribution, and a high local density (usually captured by the clustering coefficient). Models of complex networks aim at reproducing these properties. Random graphs (throughout the paper, random means uniformly chosen in a given class) with given numbers of vertices and edges [4] fit the density and distance properties, but they have homogeneous degree distributions and a low local density. Random graphs with prescribed degree distributions [9] and the preferential attachment model [2] also fit the degree distribution, but they still have a low local density. As these models are very simple, formally and computationally tractable, and rather intuitive, there is nowadays a wide consensus on using them.
However, when one wants to capture the high local density in addition to previous properties, there is no clear solution. In particular, we are unable to construct a random graph with prescribed degree distribution and local density. As a consequence, many proposals have been made, e.g. [10, 3, 7, 6] , each with its own advantages and drawbacks. Among the most promising approaches, [6, 7] propose to model complex networks based on the properties of their clique incidence bipartite graph (see definition below). They show that generating bipartite graphs with prescribed degree distributions for bottom and top vertices and interpreting them as clique incidence graphs results in graphs fitting all the complex network properties listed above, including heterogeneous degree distribution and high local density.
However, the neighbourhoods of vertices in the clique incidence bipartite graph of a real-world complex network generally have significant intersections: cliques strongly overlap and vertices belong to many cliques in common. On the contrary, when one generates a random bipartite graph with prescribed degree distributions, the obtained bipartite graph has much smaller neighbourhood intersections, almost always limited to at most one vertex (under reasonable assumptions on the degree distributions). Indeed, generating a graph from the bipartite graph amounts to randomly choosing sets of vertices (with a prescribed size distribution) and linking all the vertices of each set together. Because of the constraints imposed on this size distribution by the low density of the graph, the probability that two such random sets share several vertices tends to zero as the graph grows. As a consequence, the bipartite model fails to capture the overlapping nature of cliques in complex networks. In particular, it leads to graphs with many more edges than the original ones (two cliques of size $k$ lead to $k(k-1)$ edges in the model graph, while the overlap between cliques makes this number much smaller in the original graph).
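Both effects can be checked numerically. The Python sketch below is illustrative only: it replaces the prescribed size distribution by a single fixed size k (a simplifying assumption), draws the two vertex sets independently and uniformly among the k-subsets of an n-vertex set, and uses hypothetical function names.

```python
from math import comb


def edges_of_two_cliques(k: int, s: int) -> int:
    """Number of edges in the union of two k-cliques sharing exactly s vertices."""
    # Each clique contributes k(k-1)/2 edges; the s shared vertices induce
    # s(s-1)/2 edges that belong to both cliques and must be counted once.
    return k * (k - 1) - s * (s - 1) // 2


def prob_overlap_at_least_two(n: int, k: int) -> float:
    """Probability that two independent uniform k-subsets of an n-set share >= 2 elements."""
    total = comb(n, k)
    # Fix the first set A; count the choices of B meeting A in 0 or 1 element
    # (hypergeometric counts), then take the complement.
    disjoint = comb(n - k, k)
    one_common = k * comb(n - k, k - 1)
    return 1 - (disjoint + one_common) / total
```

For k = 5, the probability returned for n = 100, 1000 and 10000 decreases quickly towards zero, while two 5-cliques sharing at most one vertex always yield 20 edges.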
Since the random generation process of the bipartite graph cannot produce non-trivial neighbourhood intersections (that is, intersections of cardinality at least two), a natural way to address this problem is to use a structure that explicitly encodes these intersections. This can be done using a tripartite graph instead of a bipartite one: one may encode any bipartite graph $B = (\bot, \top, E)$ into a tripartite one $T = (\bot, \top, \top', E')$ where $\top'$ is the set of non-trivial maximal bicliques of $B$ (complete bipartite graphs having at least two bottom vertices and two top vertices), and $E'$ is obtained from $E$ by adding the edges between any biclique $b \in \top'$ and all the vertices of $\bot \cup \top$ which belong to $b$, and removing the edges between vertices of $b$. This process, which we call factorisation, can be iterated to encode any graph into a multipartite one in which, hopefully, there are no non-trivial neighbourhood intersections left.
In this paper, we show that this iterated factorising process does not terminate for some graphs. We then introduce variations of this basic process and study their termination. Our main result is the design of such a process, which we call clean factorisation, that terminates on any graph. In addition, we show that the multipartite graph on which this process terminates has remarkable combinatorial properties and is strongly related to a fundamental combinatorial object. Namely, its vertices are in bijection with the chains of the inf-semilattice of intersections of maximal cliques of the graph. Finally, we give an upper bound on the size and computation time of the graph on which the iterated clean factorising process of a graph $G$ terminates, under reasonable hypotheses on the degree distributions of the clique incidence bipartite graph of $G$; this shows that this multipartite graph can be used in practice for complex network modelling.
Outline of the paper
We first give a few notations and basic definitions used throughout the paper. We then consider the most immediate generalisation of the bipartite decomposition (Section 2) and show that it leads to infinite decompositions in some cases. We propose a more restricted version in Section 3, which seems to terminate, although the question remains open. Finally, we propose another restricted version in Section 4, for which we prove that the decomposition scheme always terminates.
Notations and preliminary definitions
All graphs considered here are finite, undirected and simple (no loops and no multiple edges). A graph having vertex set $V$ and edge set $E$ will be denoted by $G = (V, E)$. We also denote by $V(G)$ the vertex set of $G$. The edge between vertices $u$ and $v$ will be indifferently denoted by $uv$ or $vu$.
A $k$-partite graph is a graph whose vertex set is partitioned into $k$ parts, with edges between vertices of different parts only (a bipartite graph is a $2$-partite graph, a tripartite graph a $3$-partite graph, etc.): $G = (V_0, \dots, V_{k-1}, E)$ with $E \subseteq \bigcup_{0 \le i < j \le k-1} V_i \times V_j$. The vertices of $V_i$, for any $i \in \{0, \dots, k-1\}$, are called the $i$-th level of $G$, and the vertices of $V_{k-1}$ are called its upper vertices.
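$\mathcal{C}(G)$ denotes the set of maximal cliques of a graph $G$, and $N^G(v)$ the neighbourhood of a vertex $v$ in $G$. When $G$ is $k$-partite, we denote by $N^G_i(v)$, where $0 \le i \le k-1$, the set of neighbours of $v$ at level $i$: $N^G_i(v) = N^G(v) \cap V_i$. When the graph referred to is clear from the context, we omit it in the exponent. A biclique of a graph is a set of vertices of the graph inducing a complete bipartite graph. We denote by $B(G)$ the clique incidence graph of $G = (V, E)$, i.e. its bipartite decomposition: $B(G) = (V, \mathcal{C}(G), E_B)$ where $E_B = \{vc : v \in V,\ c \in \mathcal{C}(G),\ v \in c\}$.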
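The clique incidence graph can be built from any maximal clique enumeration algorithm. The following Python sketch uses the Bron-Kerbosch algorithm with pivoting, which is our choice and not prescribed by the text; the adjacency-set representation and the function names are assumptions.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# A simple undirected graph stored as adjacency sets (assumed representation).
Graph = Dict[str, Set[str]]


def maximal_cliques(g: Graph) -> List[FrozenSet[str]]:
    """Enumerate the maximal cliques of g with Bron-Kerbosch and pivoting."""
    cliques: List[FrozenSet[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        # r: current clique, p: candidates extending r, x: already-explored vertices.
        if not p and not x:
            cliques.append(frozenset(r))  # r cannot be extended: it is maximal
            return
        # Pivot with the most neighbours in p; its neighbours need not be branched on.
        pivot = max(p | x, key=lambda u: len(p & g[u]))
        for v in list(p - g[pivot]):
            expand(r | {v}, p & g[v], x & g[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(g), set())
    return cliques


def clique_incidence_graph(
    g: Graph,
) -> Tuple[Set[str], List[FrozenSet[str]], Set[Tuple[str, FrozenSet[str]]]]:
    """Return (vertices, maximal cliques, edges) with an edge (v, c) iff v belongs to c."""
    cliques = maximal_cliques(g)
    edges = {(v, c) for c in cliques for v in c}
    return set(g), cliques, edges
```

On a triangle a, b, c with a pendant edge cd, this yields the two maximal cliques {a, b, c} and {c, d}, hence five incidence edges.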
An operation that we call factorisation plays a key role throughout the paper; we define it generically as follows.
Definition 1 (factorisation)
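Given a $k$-partite graph $G = (V_0, \dots, V_{k-1}, E)$ with $k \geq 2$ and a set $S$ of subsets of $V_{k-1}$, we define the factorisation of $G$ with respect to $S$ as the $(k+1)$-partite graph $G' = (V_0, \dots, V_{k-1}, V_k, E')$ where:
$V_k$ is the set of maximal (with respect to inclusion) elements of $S$,
When $V_k \neq \emptyset$, the factorisation is said to be effective.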
In the rest of the paper, we refine the notion of factorisation by using different sets on which the factorisation operation is based, and we study the termination of the graph series resulting from each of these refinements.
The converse of the factorisation operation is called projection.
Definition 2 (projection)
Given a -partite graph with , we define the projection of as the -partite graph where is the set of edges between any pair of vertices of having a common neighbour in .
It is worth noting that the projection is the converse of the factorisation operation independently of the set of subsets used in the definition of the factorisation.
2 Weak factor series
As explained before, our goal is to improve the bipartite model of [6, 7] in order to be able to encode non-trivial clique overlaps, that is overlaps whose cardinality is at least two. Since these overlaps in the graph result from the neighbourhood overlaps of the upper vertices, the purpose of the new model we propose is to encode the graph into a multipartite one by recursively eliminating all non-trivial neighbourhood overlaps of the upper vertices. We first describe this process informally, then give its formal definition and exhibit an example for which it does not terminate.
Neighbourhood overlaps of the upper vertices in a bipartite graph $B$ may be encoded as follows. For any maximal biclique $b$ of $B$ that involves at least two upper vertices and two other vertices, we introduce a new vertex $v_b$ in a new level, add all edges between $v_b$ and the elements of $b$, and delete all the edges of $b$, as depicted in Figure 1. We obtain in this way a tripartite graph $T$ which encodes $B$ (one may obtain $B$ from $T$ by the projection operation) and which has no non-trivial neighbourhood overlaps in its first two levels. (The reason for taking maximal bicliques is simply to try to encode all neighbourhood overlaps with a reduced number of new vertices. There are other ways to reduce the number of new vertices even further, for example by taking a biclique cover of the edge set of $B$; this is however out of the scope of this paper.)
This process, which we call a factorising step, may be repeated on the tripartite graph obtained (as well as on any multipartite graph) by considering the bipartite graph between the upper vertices and the other vertices of the tripartite (or multipartite) graph, see Figure 1. No $k$-partite graph obtained along this iterative factorising process has a non-trivial neighbourhood overlap between the vertices of its first $k-1$ levels. The key question is then whether the process terminates.
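A single factorising step on a bipartite graph, together with the projection that undoes it, can be sketched in Python as follows. Assumptions: edges are stored as (upper, lower) pairs, maximal bicliques with at least two vertices on each side are found by brute force over subsets of upper vertices (exponential, for illustration only), and new vertices receive hypothetical labels b0, b1, and so on. This is not the authors' implementation.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Edge = Tuple[str, str]
Biclique = Tuple[FrozenSet[str], FrozenSet[str]]


def maximal_bicliques(upper: Set[str], edges: Set[Edge]) -> List[Biclique]:
    """Maximal bicliques (X, Y) with at least two upper and two lower vertices.

    Brute force over subsets of upper vertices: exponential, for illustration only.
    Edges are (upper vertex, lower vertex) pairs.
    """
    nbrs: Dict[str, Set[str]] = {u: set() for u in upper}
    for u, v in edges:
        nbrs[u].add(v)
    found: Set[Biclique] = set()
    for size in range(2, len(upper) + 1):
        for xs in combinations(sorted(upper), size):
            common = set.intersection(*(nbrs[u] for u in xs))
            if len(common) < 2:
                continue
            # Closing X (all upper vertices adjacent to every vertex of Y) makes (X, Y) maximal.
            closed = frozenset(u for u in upper if common <= nbrs[u])
            found.add((closed, frozenset(common)))
    return sorted(found, key=lambda b: (sorted(b[0]), sorted(b[1])))


def factorise(upper: Set[str], edges: Set[Edge]) -> Tuple[Dict[str, Biclique], Set[Edge]]:
    """One factorising step: one new vertex per biclique, linked to all its elements."""
    new_level: Dict[str, Biclique] = {}
    new_edges = set(edges)
    for i, (xs, ys) in enumerate(maximal_bicliques(upper, edges)):
        name = f"b{i}"  # hypothetical label for the new vertex
        new_level[name] = (xs, ys)
        new_edges -= {(x, y) for x in xs for y in ys}  # delete the edges of the biclique
        new_edges |= {(name, w) for w in xs | ys}  # link the new vertex to its elements
    return new_level, new_edges


def project(upper: Set[str], new_vertices: Set[str], edges: Set[Edge]) -> Set[Edge]:
    """Undo the step: upper-lower pairs with a common new neighbour become edges again."""
    result = {(u, v) for (u, v) in edges if u in upper}
    for name in new_vertices:
        nbrs = {w for (a, w) in edges if a == name}
        result |= {(x, y) for x in nbrs & upper for y in nbrs - upper}
    return result
```

Because the projection only looks at the neighbours of the new vertices, the round trip also works when several bicliques share edges; removing such an edge twice in `factorise` is harmless.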
We will now formally define the factorising process and show that it may result in an infinite sequence of graphs. In the following sections, we will restrict the definition of the factorising step in order to always obtain a finite representation of the graph.
Definition 3 ( and weak factor graph)
Given a -partite graph with , we define the set as:
The weak factor graph of is the factorisation of with respect to .
The weak factorisation admits a converse operation, the projection defined in Section 1. This implies that the weak factor graph of $G$, as well as its iterated factorisations, is an encoding of $G$.
The weak factor series defined below is the series of graphs produced by recursively repeating the weak factorising step.
Definition 4 (weak factor series )
The weak factor series of a graph is the series of graphs in which is the clique incidence graph of and, for all , is the weak factor graph of : . If for some the weak factor operation is not effective then we say that the series is finite.
Figure 1 gives an illustration of this definition. In this case, the weak factor series is finite. However, this is not true in general; see Figure 2. Intuitively, this is due to the fact that a vertex may be the basis of an infinite number of factorising steps (as happens for one of the vertices in the example of Figure 2). The aim of the next sections is to avoid this situation by giving more restrictive definitions.
3 Factor series
In the previous section, we introduced weak factor series, which appear to be the most immediate extension of bipartite decompositions of graphs. We showed that, unfortunately, weak factor series are not necessarily finite. In this section, we introduce a slightly more restricted definition that forbids the repeated use of the same vertex to produce infinitely many factorisations (as observed in the example of Figure 2). However, we have no proof that it always yields finite series; this remains an open question.
Definition 5 ( and factor graph)
Given a -partite graph with , we define the set as:
The factor graph of is the factorisation of with respect to .
This new definition results from restricting the weak factor definition to the sets $X$ such that the vertices of $X$ have at least two common neighbours at level $k-2$. In this way, the creation of new vertices depends only on the edges between levels $k-1$ and $k-2$ (even though some other edges may be involved in the factorisation operation). Thus, a vertex cannot be responsible for infinitely many creations of new vertices. This restriction also plays a key role in the termination proof of the clean factor series, defined in the next section. This is why we think it may be sufficient to guarantee the termination of the factor series, but we could not prove it under this sole hypothesis.
4 Clean factor series
In the two previous sections, we studied two multipartite decompositions of graphs. The first one is very natural, but it does not always lead to finite objects. The second one remains very general, but we were unable to prove that it leads to finite objects. As a first step towards this goal, we introduce here a more restricted definition for which we prove that the decomposition is finite. This new combinatorial object has many interesting features, and we consider it worth studying in its own right. In particular, we prove that it is a decomposition of a well-known combinatorial object: the inf-semi-lattice of the intersections of maximal cliques of $G$. This correspondence makes it possible to compute some quantities of $G$ from the multipartite decomposition. One such result is an explicit formula (not presented here) giving the number of triangles in $G$, which is a very important parameter of complex networks.
The clean factor graph (defined below) is a proper restriction of the factor graph in which the vertices at level $k-1$ used to create a new vertex at level $k$ are required to have exactly the same neighbourhoods at all levels strictly below level $k-2$, except at level $0$. Intuitively, this requirement implies that the new factorisations extend the previous ones, instead of simply rewriting at a higher level a factorisation previously done. The particular role of level $0$ will allow us to differentiate vertices of the multipartite graph by assigning them sets of vertices at level $0$. Let us now formally define the clean factor graph and its corresponding series.
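The intersections of maximal cliques mentioned above can be computed as the closure of the set of maximal cliques under pairwise intersection. The following Python sketch does so by a fixpoint iteration; it keeps the empty set if it arises, which may differ from the convention of the formal definitions, and the function name is hypothetical.

```python
from typing import FrozenSet, Iterable, Set


def intersection_closure(cliques: Iterable[FrozenSet[str]]) -> Set[FrozenSet[str]]:
    """Smallest family containing the cliques and closed under pairwise intersection.

    Equivalently: all intersections of non-empty families of the given cliques.
    """
    closure: Set[FrozenSet[str]] = set(cliques)
    frontier = set(closure)
    while frontier:
        # Only pairs involving a newly added set can produce something new.
        new = {a & b for a in frontier for b in closure} - closure
        closure |= new
        frontier = new
    return closure
```

For the cliques {a, b, c}, {b, c, d} and {c, d, e}, the closure adds {b, c}, {c, d} and {c}.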
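The eligibility conditions described above can be phrased as predicates on a candidate set X of upper vertices. The Python sketch below encodes only these informal conditions, under assumed conventions (levels numbered from 0, level 0 holding the vertices of G, upper level k-1, symmetric adjacency sets plus a level map). It does not reproduce the formal sets of Definitions 5 and 6, nor the selection of maximal sets made by the factorisation, and all names are hypothetical.

```python
from typing import Dict, Sequence, Set

Adjacency = Dict[str, Set[str]]  # assumed: symmetric adjacency sets
Levels = Dict[str, int]  # assumed: level of each vertex, 0 = vertices of G


def neighbours_at(adj: Adjacency, level: Levels, v: str, i: int) -> Set[str]:
    """Neighbours of v lying at level i."""
    return {u for u in adj[v] if level[u] == i}


def factor_condition(adj: Adjacency, level: Levels, xs: Sequence[str], k: int) -> bool:
    """X (assumed to lie at level k-1) has >= 2 members sharing >= 2 neighbours at level k-2."""
    if len(xs) < 2:
        return False
    common = set.intersection(*(neighbours_at(adj, level, x, k - 2) for x in xs))
    return len(common) >= 2


def clean_condition(adj: Adjacency, level: Levels, xs: Sequence[str], k: int) -> bool:
    """Factor condition plus identical neighbourhoods at levels 1, ..., k-3."""
    if not factor_condition(adj, level, xs, k):
        return False
    first, rest = xs[0], xs[1:]
    for i in range(1, k - 2):  # levels strictly below k-2, level 0 excepted
        reference = neighbours_at(adj, level, first, i)
        if any(neighbours_at(adj, level, x, i) != reference for x in rest):
            return False
    return True
```

In this reading, two upper vertices may share two neighbours at level k-2 (factor condition) and still fail the clean condition because they differ at some intermediate level.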
Definition 6 ( and clean factor graph)
Given a -partite graph with , we define the set as:
The clean factor graph of is the factorisation of with respect to .
Definition 7 (clean factor series )
The clean factor series of a graph is the series of graphs in which is the clique incidence graph of , , and, for all , is the clean factor graph of : . If for some the clean factor operation is not effective then we say that the series is finite.
The rest of this section is devoted to proving the following theorem.
Theorem 1. For any graph $G$, the clean factor series of $G$ is finite.
Let be the clean factor series of . For any , any and any , we denote by the set and by the set .
Remark: In the rest of the paper, when referring to Definition 6, it is worth keeping in mind that for and , the sets and used in the definition are precisely the sets and .
We denote by the set ; and by the set . For any , we denote by the set . We also denote by the set .
It is clear from the definition that is closed under intersection, this is also the case for .
In all the ’s of the clean factor series, vertices at level correspond to vertices of , vertices at level correspond to the maximal cliques of , that is for any . That is why, in the following, we do not distinguish between the elements of and those of . We will show that the vertices of correspond to the elements of . Indeed, is a bijection from to . First, for any , by definition, , then belongs to . Let . Let us show that is a maximal element of . First note that and then . Now, if we augment with an element of , since , will decrease. Thus is maximal and there is a corresponding such that . Furthermore, it is straightforward to see that the maximality of implies that , which proves the uniqueness of the such that .
Let be a graph and let be its clean factor series. The characterising sequence of a vertex , with , is defined by:
is the unique element of such that . (By convention, when .)
Note that is well defined. Indeed, since is closed under intersection, a simple induction shows that for all and for all , .
Theorem 2 is our main combinatorial tool for proving the finiteness of the clean factor series (Theorem 1). Its proof is rather intricate, but it gives much more information than the finiteness of the series. By associating a sequence of sets with each vertex at the upper levels of the multipartite graph, we show that each such vertex corresponds to a chain of the inf-semi-lattice of the intersections of maximal cliques of $G$. The correspondence thus highlighted between this very natural structure and the multipartite factorisation scheme we introduced is non-trivial and of great combinatorial interest.
Theorem 2. Let $G$ be a graph and consider its clean factor series. We then have the following properties:
, , and if , and if ,
, , ,
, , .
For lack of space, we do not give the proof of Theorem 2. It proceeds by induction on . The key to our proof is that we could characterise, for any , the vertices at level involved in the creation of a new vertex at level : roughly, they are those vertices such that there exist and is such that . Then, the characterising sequence of the created vertex is . A complete version of the paper, including the proof of Theorem 2, is available on the authors' webpages.
Theorem 1 is a corollary of Theorem 2. Indeed, Theorem 2 states that the characterising sequence of any node at level is such that . The strict inclusions imply that the length of the characterising sequence, which is equal to , cannot exceed the height of the inclusion order of elements of . Since , necessarily is empty. It follows that the clean factor series is finite and stops at rank at most .
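The height of the inclusion order used in this argument is the number of sets in a longest strictly decreasing chain. The Python sketch below, assuming the family is given as a collection of frozensets (for instance, the intersections of maximal cliques), computes it by dynamic programming over the sets sorted by size; the function name is hypothetical.

```python
from typing import Dict, FrozenSet, Iterable


def inclusion_height(family: Iterable[FrozenSet[str]]) -> int:
    """Number of sets in a longest chain S1 > S2 > ... (strict inclusion) of the family."""
    # A proper subset is strictly smaller, so processing by size sees subsets first.
    best: Dict[FrozenSet[str], int] = {}
    for s in sorted(set(family), key=len):
        below = [best[t] for t in best if t < s]  # t < s: t is a proper subset of s
        best[s] = 1 + max(below, default=0)
    return max(best.values(), default=0)
```

For the family {a,b,c}, {b,c,d}, {c,d,e}, {b,c}, {c,d}, {c}, a longest chain is {a,b,c} > {b,c} > {c}, so the height is 3.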
Size of the multipartite model
The size of the multipartite graph obtained at termination of the clean factor series can in theory be exponential, as the number of maximal cliques itself may be exponential. In practice, however, its size is quite reasonable and it can be computed efficiently. Theorem 3 below shows that, under reasonable hypotheses, the size of this multipartite graph depends only linearly on the number of vertices of $G$, with a multiplicative constant reflecting how intricately the maximal cliques overlap.
If every vertex of is involved in at most maximal cliques and if every maximal clique of contains at most vertices, then .
This upper bound can be obtained by bounding the number of sequences in two different ways: either by considering sequences ending with a fixed set , which are obtained by starting from set and removing vertices one by one; or by considering sequences starting with a fixed set , which are obtained by starting from a maximal clique containing and intersecting it with one more maximal clique containing at each step.
In practice, parameters and are quite small, as they are often constrained by the context itself, independently of the size of the graph. Then, the size of the final multipartite graph is small. An important consequence is that, using algorithms enumerating the cliques or bicliques of a graph (see [5] for a recent survey), this multipartite graph can be computed efficiently, that is, in low polynomial time, since the number of maximal cliques is small.
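The second counting argument can be illustrated by enumerating the sequences it describes. In the Python sketch below we assume, as our reading of the argument, that all sets of a sequence contain a fixed vertex v: each sequence starts from a maximal clique containing v and is extended by intersecting with one more maximal clique containing v, keeping only strict decreases. Since v is never removed, a sequence contains at most K sets when every maximal clique has at most K vertices (K is our notation); the bound of Theorem 3 itself is not reproduced, and the function name is hypothetical.

```python
from typing import FrozenSet, List, Sequence


def chains_from_vertex(v: str, cliques: Sequence[FrozenSet[str]]) -> List[List[FrozenSet[str]]]:
    """All strictly decreasing sequences c1 > c1&c2 > ... built from maximal cliques containing v."""
    around_v = [c for c in cliques if v in c]
    chains: List[List[FrozenSet[str]]] = []

    def extend(chain: List[FrozenSet[str]]) -> None:
        chains.append(chain)
        last = chain[-1]
        # Intersect with one more clique containing v; keep only strict decreases.
        for nxt in {last & c for c in around_v} - {last}:
            extend(chain + [nxt])

    for start in set(around_v):
        extend([start])
    return chains
```

With the cliques {a, b, c}, {b, c, d}, {c, d, e} and v = c, the enumeration yields 13 sequences, none of them longer than 3.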
Many questions arise from our work. The first one is to find minimal restrictions of the factorising process that guarantee termination. On the other hand, for processes that do not always terminate, one may determine the classes of graphs on which they terminate. Another question of interest is the termination speed, as well as the size of the obtained encoding: proving upper bounds under weaker hypotheses would be desirable.
Finally, the use of multipartite decompositions as models of complex networks, in the spirit of the bipartite decomposition, raises several questions. In this context, the key issue is to generate a random multipartite graph while preserving the properties of the original graph. To do so, one has to express the properties to preserve as functions of basic multipartite properties (like degrees, for instance) and to generate random multipartite graphs with these properties. This is a promising direction for complex network modelling, but much remains to be done.
Acknowledgements. We warmly thank Jean-Loup Guillaume, Stefanie Kosuch and Clémence Magnien for helpful discussions.
[1] R. Albert, A.-L. Barabási (2002). Statistical mechanics of complex networks. Reviews of Modern Physics 74, 47.
[2] A.-L. Barabási, R. Albert (1999). Emergence of scaling in random networks. Science 286, pp. 509–512.
[3] S. N. Dorogovtsev, J. F. F. Mendes (2002). Evolution of networks. Advances in Physics 51.
[4] P. Erdős, A. Rényi (1959). On random graphs I. Publicationes Mathematicae Debrecen 6, pp. 290–297.
[5] A. Gély, L. Nourine, B. Sadi (2009). Enumeration aspects of maximal cliques and bicliques. Discrete Applied Mathematics 157 (7), pp. 1447–1459.
[6] J.-L. Guillaume, M. Latapy (2004). Bipartite structure of all complex networks. Information Processing Letters (IPL) 90 (5), pp. 215–221.
[7] J.-L. Guillaume, M. Latapy (2006). Bipartite graphs as models of complex networks. Physica A 371, pp. 795–813.
[8] M. Latapy, C. Magnien, N. Del Vecchio (2008). Basic notions for the analysis of large two-mode networks. Social Networks 30 (1), pp. 31–48.
[9] M. Molloy, B. Reed (1995). A critical point for random graphs with a given degree sequence. Random Structures and Algorithms.
[10] D. J. Watts, S. H. Strogatz (1998). Collective dynamics of small-world networks. Nature 393, pp. 440–442.