Community detection in networks via nonlinear modularity eigenvectors

08/18/2017 ∙ by Francesco Tudisco, et al. ∙ Universität Saarland 0

Revealing a community structure in a network or dataset is a central problem arising in many scientific areas. The modularity function Q is an established measure quantifying the quality of a community, being identified as a set of nodes having high modularity. In our terminology, a set of nodes with positive modularity is called a module and a set that maximizes Q is thus called leading module. Finding a leading module in a network is an important task, however the dimension of real-world problems makes the maximization of Q unfeasible. This poses the need of approximation techniques which are typically based on a linear relaxation of Q, induced by the spectrum of the modularity matrix M. In this work we propose a nonlinear relaxation which is instead based on the spectrum of a nonlinear modularity operator M. We show that extremal eigenvalues of M provide an exact relaxation of the modularity measure Q, however at the price of being more challenging to be computed than those of M. Thus we extend the work made on nonlinear Laplacians, by proposing a computational scheme, named generalized RatioDCA, to address such extremal eigenvalues. We show monotonic ascent and convergence of the method. We finally apply the new method to several synthetic and real-world data sets, showing both effectiveness of the model and performance of the method.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

This paper is concerned with the problem of finding leading communities in a network. A community is roughly defined as a set of nodes being highly connected inside and poorly connected with the rest of the graph. Identifying important communities in a complex network is a highly relevant problem which has applications in many disciplines, such as computer science, physics, neuroscience, social science, biology, and many others, see e.g. [43, 19, 51, 54].

In order to address this problem from the mathematical point of view one needs a quantitative definition of what a community is. To this end several merit functions have been introduced in the recent literature [24]. A very popular and successful idea is based on the concept of modularity introduced by Newman and Girvan in [44].

The modularity measure of a set of nodes  in a graph quantifies the difference between the actual and expected weight of edges in , if edges were placed at random according to a random null model. A subgraph is then identified as a community if the modularity measure of is “large enough”.

The modularity-based community detection problem thus boils down to a combinatorial optimization problem, that is reminiscent of another famous task known as graph partitioning. Graph partitioning can be roughly described as the problem of finding a

-partition of the set of vertices of , where is a given number of disjoint sets to be identified.

Modularity-based community detection does not prescribe the number of subsets into which the network is divided, and it is generally assumed that the graph is intrinsically structured into groups that are delimited to some extent. The main objective is to reveal the presence and the consistency of such groups.

As modularity-based community detection is known to be NP-hard [7], different strategies have been proposed to compute an approximate solution. Linear relaxation approaches are based on the spectrum of specific matrices (as the modularity matrix or the Laplacian matrix) and have been been widely explored and applied to various research areas, see e.g. [21, 40, 41, 52]

. Computational heuristics have been developed for optimizing directly the discrete quality function (see e.g.

[45, 36]), including for example greedy algorithms [14], simulated annealing [28] and extremal optimization [18]. Among them, the locally greedy algorithm known as Louvain method [5] is arguably the most popular one. In recent years, and mostly in the context of graph partitioning, nonlinear relaxation approaches have been proposed (see for instance [8, 9, 29, 56]). In the context of community detection, a nonlinear relaxation based on the Ginzburg-Landau functional is considered for instance in [32, 6], where it is shown to be -convergent to the discrete modularity optimum.

In this paper, we propose two nonlinear relaxations of different modularity set functions, and prove them to be exact, in the sense that the maximum values of our proposed nonlinear relaxations are equal to the maximum of the corresponding modularity set functions. More precisely, we introduce two nonlinear relaxations that are based on a nonlinear modularity operator . We associate to two different Rayleigh quotients, inducing two different notions of eigenvalues and eigenvectors of and we prove two Cheeger-type results for that show that the maximal eigenvalues of associated to such Rayleigh quotients coincide with the maxima of two different modularity measures of the graph. Interestingly enough we observe that the modularity matrix completely overlooks the difference between these two modularity measures, which instead allows to address individually.

Although nonlinearity generally prevents us to compute the eigenvalues of , the optimization framework proposed in [30] allows for an algorithm addressing the minimization of positive valued Rayleigh quotients. As the Rayleigh quotients we associate to attain positive and negative values, here we extend that method to a wider class of ratios of functions, proving monotonic descent and convergence to a nonlinear eigenvector.

The paper is organized as follows: Section 2 gives an overview of the concept of modularity measure, modularity matrix and the Newman’s spectral method for community detection, as proposed in [44]. In Section 3 we define the nonlinear modularity operator and the associated Rayleigh and dual Rayleigh quotients. We show that both ensure an exact relaxation of suitable modularity-based combinatorial optimization problems on the graph. In Section 4 we propose a nonlinear spectral method for community detection in networks through the eigenvectors of the nonlinear modularity and, finally, in Section 5 we show extensive results on synthetic and real-world networks highlighting the improvements that nonlinearity ensures over the standard linear relaxation approach.

1.1 Notation

Throughout this paper we assume that an undirected graph is given, with the following properties: is the vertex set equipped with the positive measure ; is the edge set equipped with positive weight function . The symbol denotes the set of positive numbers. The vertex set is everywhere identified with . We denote by the weighted scalar product . Similarly, for we let be the weighted norm on .

Given two subsets , the set of edges between nodes in and is denoted by . When and coincide we use the short notation . The overall weight of a set is the sum of the weights in the set, thus for , we write

Special notations are reserved to the case where is the whole vertex set. Precisely, is the degree of the node , and the volume of the set .

For a subset we write to denote the complement and we let

be the characteristic vector

if and otherwise.

2 Modularity measure

A central problem in graph mining is to look for quantitative definitions of community. Although there is no universally accepted definition and a variety of merit functions have been proposed in recent literature, the global definition based on the modularity quality function proposed by Newman and Girvan [44] is an effective and very popular one [24]. Such measure is based on the assumption that is a community of nodes if the induced subgraph contains more edges than expected, if edges were placed at random according to a random graph model (also called null-model).

Let be the expected graph of the random ensemble , with weight measure . The definition of modularity of , is as follows

(1)

so that if the actual weight of edges in exceeds the expected one in . A set of nodes is a community if it has positive modularity, and the associated subgraph is called a module. A number of different null-models and variants of the modularity measure have been considered in recent literature, see e.g. [22, 47, 2, 50].

An alternative formulation relates with a normalized version of the modularity, where the measure of the set is used as a balancing function, for different choices of the measure . We define the normalized modularity of as follows

(2)

As we discuss in Section 5, the use of such normalized version can help to identify small group of nodes as important communities in the graph, whereas it is known that the standard (unnormalized) measure tends to overlook small groups [25].

The definition of modularity of a subset is naturally extended to the measure of the modularity of a partition of , by simply looking at the sum of the modularities: given a partition of , its modularity and normalized modularity are defined respectively by

Clearly the normalization factor does not affect the community structure and is considered here for compatibility with previous works. When the partition consists of only two sets we use the shorter notation and for and , respectively.

The definition and effectiveness of the modularity measure (1) highly depends on the chosen random model . A very popular and successful one, considered originally by Newman and Girvan in [44], is based on the Chung-Lu random graph (see f.i. [12, 1, 46]) and its weighted variant [21]. For the sake of completeness, we recall hereafter the definition of weighted Chung-Lu model.

Definition 1

Let , and let

be a nonnegative random variable parametrized by the scalar parameter

, whose expectation is . We say that a graph with weight function follows the -weighted Chung-Lu random graph model if, for all , are independent random variables distributed as where .

The unweighted model coincides with the special case of where

is the Bernoulli trial with success probability

. On the other hand, if has a continuous part, then may contain graphs with generic weighted edges. In any case, as in the original Chung-Lu model, if is a random graph drawn from then the expected degree of node is .

Given the degree sequence of the actual network , we assume from now on that the null-model follows the weighted Chung-Lu random graph model above, with . Note that, under this assumption, the modularity measure (1) becomes and we have, in particular, , for any .

The main contributions we propose in this work deal with the leading module problem, that is the problem of finding a subset having maximal modularity. Due to the identity , such problem coincides with finding the bi-partition of the vertex set, having maximal modularity. Note that, for the special case of partitions consisting of two sets, the corresponding modularity and normalized modularity set functions are

(3)

2.1 The modularity matrix and the spectral method

Looking for a leading module is a major task in community detection which coincides with the discovery of an optimal bi-partition of into communities, in terms of modularity. This problem is equivalent to maximizing the modularity and normalized modularity through the set functions and , respectively, over the possible subsets of , namely computing the quantities

(4)

As both and are NP-hard optimization problems [7], a globally optimal solution for large graphs is out of reach. One of the best known techniques for an approximate solution to these problems – typically referred to as “spectral method” – relates with the modularity matrix, and its leading eigenpair. Let be the vector of the degrees of the graph, the normalized modularity matrix of , with vertex measure , is defined as follows

Note that the term is the entry of the one-rank adjacency matrix of the expected graph of a random ensemble following the weighted Chung-Lu random model.

The spectral method roughly selects a bi-partition of the vertex set accordingly with the sign of the elements in an eigenvector of , associated with its largest eigenvalue . It is proved in [22] that if is not an eigenvector of , then is a simple eigenvalue and thus is uniquely defined. If , one computes such that , then the vertex set is partitioned into and , being . If , the graph is said algebraically indivisible, i.e. it resembles no community structure (see e.g. [41, 21]). The motivations behind this technique are based on a relaxation argument, that we discuss in what follows.

The Rayleigh quotient of is the real valued function

As the matrix is symmetric with respect the weighted scalar product , its eigenvalues can be characterized as variational values of . In particular, if the eigenvalues of are enumerated in descending order, then is the global maximum of ,

(5)

The quantity can be rewritten in terms of , thus in terms of . Consider the binary vector . Using the identities , and , we get , thus

(6)

Computing the global optimum of over is NP-hard. However, this maximum can be approximated by dropping the binary constraint on and, thus, transforming the problem into the eigenvalue problem (5), which can be easily solved. This observation is one of the main motivations of the spectral method based on the modularity matrix and its largest eigenvalue , whereas the main drawback of this approach is that, in general, the eigenvalue can arbitrary differ from the actual modularity .

From Equation (6) we can see that coincides with when evaluated on binary vectors . For this reason and the fact that coincides with an eigenvalue of the linear operator we say that is a linear relaxation of .

Before concluding this section we would like to point out another drawback of the linear relaxation approach which, to our opinion, is often overlooked: as we will show in Section 5, the solutions of and are in general far from being the same, however the linear relaxation approach in principle ignores such difference. In fact, the linear relaxation of the modularity set function is also a linear relaxation of the normalized modularity set function . We show such observation via the following

Proposition 1

If the largest eigenvalue of is positive, then is a linear relaxation of both and .

Proof

We already observed that coincides with on the set of binary vectors . A similar simple argument is used for . Consider the vector . Since we get and . Note that , thus

(7)

Therefore, and coincide on the set of binary vectors such that (for suitable ). As has a positive eigenvalue by assumption, dropping the binary constraint and recalling that , we get .

3 Tight nonlinear modularity relaxation

In this section we introduce a nonlinear modularity operator , through a natural generalization of the modularity matrix . To this operator we associate a Rayleigh quotient and a dual Rayleigh quotient to which naturally correspond a notion of nonlinear eigenvalues and eigenvectors. We use the new Rayleigh quotients to derive nonlinear relaxations of the modularity and the normalized modularity set functions, respectively. Moreover, unlike the standard linear relaxation, we show that such relaxations are tight, that is we prove a Cheeger-type result showing that certain eigenvalues of coincide with the graph modularities (4).

3.1 Nonlinear modularity operator

The nonlinear modularity operator we are going to define is related with the Clarke’s subdifferential (see [13] e.g.). We recall that, for Lipschitz around , the subdifferential of at is defined as the following subset of

The subdifferential of the one norm and the infinity norm are of particular importance of us. For these particular functions explicit expressions for are available. We recall them below in (8) and (13), respectively.

As the absolute value is not differentiable at zero, the subdifferential of the -norm is the set valued map defined by

(8)

where if and if . Note that if then any component of belongs to the image of the corresponding component of . Precisely, if and only if for all .

In order to define the nonlinear modularity operator, let us first observe that, due to the identity , for , the following formula holds for the modularity matrix :

This implies the following identity

(9)

for any . Thus we define the nonlinear modularity operator as follows:

(10)

Note that, by definition, for any we have

(11)

Since the right-hand side of (11) does not depend on the choice of the vector , we write to denote the quantity in (11), in analogy with (9). Note that and coincide on binary vectors, for instance when . Also note that is strictly related with the total variation of the vector . More precisely, is the difference of two weighted total variations of , as we will discuss with more detail in Section 3.4. For completeness, we recall that the weighted total variation of is the scalar function

where are the nonnegative weights.

We now consider two Rayleigh quotients associated with , defined as follows

(12)

where and . The functions and generalize the Rayleigh quotient of the linear modularity, and we will show in the next section that the global maxima of and provide an exact nonlinear relaxation of the modularity and normalized modularity set functions, defined in (3), respectively. Here we show that the optimality conditions for and are related to a notion of eigenvalues and eigenvectors for the nonlinear modularity operator . We also briefly discuss the underlying mathematical reason why naturally generalizes into and .

3.2 Nonlinear modularity eigenvectors

As for the 1-norm, we consider the subdifferential of the infinity norm . For a vector , let be the indices such that , then the subdifferential of the infinity norm is the set valued map defined by

(13)

where, for , and denotes the convex hull.

To the subdifferentials and correspond a notion of eigenvalue and eigenvector of

Definition 2

We say that is a nonlinear eigenvalue of with eigenvector if either or .

We have

Proposition 2

Let be a critical point of , then is a nonlinear eigenvector of such that with . Similarly, if is a critical point of , then is a nonlinear eigenvector of such that with .

Proof

Let denote the subdifferential. A direct inspection reveals that , where is the diagonal matrix and is the vector with components

. Using the chain rule for

(see e.g. [13]) we get

Therefore implies . As , a similar computation shows the proof for .

Thus critical points and critical values of and satisfy generalized eigenvalue equations for the nonlinear modularity operator . Despite the linear case, where the eigenvalues of the modularity matrix coincide with the variational values of , the number of eigenvalues of defined by means of the Rayleigh quotients in (12) is much larger than just the set of variational ones. However in many situations the variational spectrum plays a central role, as for instance in the case of the nonlinear Laplacian [55, 11, 17]. This work provides a further example: in what follows we consider the dominant eigenvalues of , coinciding with suitable variational values of and , we prove two optimality Cheeger-type results and we discuss how to use these eigenvalues to locate a leading module in the network by means of a nonlinear spectral method. The task of multiple community detection can also be addressed by successive bi-partitions, as we discuss in Section 5.3. Advantages of the nonlinear spectral method over the linear one are highlighted Section 5 where extensive numerical results are shown.

3.3 On the relation between , and

We briefly discuss the mathematical reason why generalizes into and . This gives further reasoning to the definition in (12). To this end we suppose for simplicity that . Therefore for all . Given the graph , consider the linear difference operator entrywise defined by , , and let be the real valued function . Then we can write

where we use the compact notation , even though that quantity is not a norm on , as attains positive and negative values. We have, as a consequence, . A natural generalization of such quantity is therefore given by

where, for and , we are using the notation . Clearly is retrieved from for . Now, let be the Hölder conjugate of , that is the solution of the equation . As , the quantity is in fact a special case of as well. The Rayleigh quotients in (12) are obtained by plugging into and , respectively. Even though in this work we shall focus only on the case , we believe that further investigations on and for different values of would be of significant interest. Figure 1 outlines this observation and the relation between the set valued functions and and the Rayleigh quotients and , for the specific values . The next section gives further detail in this sense.

3.4 Exact relaxation via nonlinear Rayleigh quotients

From (6) and Proposition 1 we deduce that the leading eigenvalue of the modularity matrix is an upper bound for both the quantities and . This intuitively motivates the use of such eigenvalue and the corresponding eigenvectors to approximate the modularity of the graph. However is an approximation that can be arbitrarily far from the true value of the modularity. In particular, when is the degree vector, a Cheeger-type inequality showing a lower bound for in terms of has been shown in [23], whereas a lower bound for is known only for regular graphs [21], the general case being still an open problem.

In what follows we show that moving from the linear to the nonlinear modularity operator, allows to shrink the distance between the combinatorial quantities and defined in (4) and the spectrum of . More precisely, we show that the new Rayleigh quotients and , as for , coincide with the modularity and normalized modularity functions and , respectively, on suitable set of binary vectors. However, unlike the linear case, we prove that the quantities

(14)

coincide exactly with the modularity and normalized modularity , respectively. For these reasons we say that the functions and are exact nonlinear relaxations of the modularity and normalized modularity set functions, respectively. The diagram in Figure 1 summarizes these relaxation relations.

         approximate, linear

         approximate, linear

Figure 1: This diagram summarizes how the relaxation functions and (for ) are related with and and which are their main properties.

To address the case of we make use of the Lovász extension of the modularity set function. The Lovász extension, also referred to as Choquet integral, allows the extension of set valued functions to the entire space and is particularly well-suited to deal with optimization of sub-modular functions. We refer to [3] for a careful introduction to the topic. Below we recall one possible definition of the Lovász extension

Definition 3

Given the set of vertices , let be the power set of , and consider a function . For a given vector let be any permutation such that and let be the set

The Lovász extension of is defined by

We collect in the next proposition some useful properties of the Lovász extension, which will be helpful in the following. We refer to [3] for their proofs.

Proposition 3 (Some properties of the Lovász extension)

Consider two set valued functions such that . Then

  1. is the Lovász extension of , i.e. .

  2. For all it holds .

  3. is positively one-homogeneous, i.e. for all .

  4. Given a graph let denote its edge weight function and let denote the set valued function . The Lovász extension of is the weighted total variation

  5. Given and consider the level set . Then

The formula at point is actually one of the many equivalent definitions of the Lovász extension and is sometimes referred to as the co-area theorem.

Remark

From the proposition above we deduce that is the Lovász extension of the modularity function and it corresponds to the difference of two weighted total variations of .

In fact, given a graph with weight function , consider the complete graph with weight function , where and are the degree of node and the volume of , respectively. Then, for any we have

Therefore, from (1) and the identity , we can decompose the modularity of a set into , where is the set valued function defined at point 4 of Proposition 3. Combining points 1 and 4 of Proposition 3 we obtain

(15)

The following technical lemma will be useful in the proof of Theorem 3.1 below, being one of our two main theorems of the section.

Lemma 1

Let be set valued functions such that for all s.t. . If , then

Proof

Suppose w.l.o.g. that the entries of are labeled in ascending order, that is . We have

As and we get

We get as a consequence

and this proves the claim.

The above lemma allows us to show that is an exact nonlinear relaxation of the modularity function

Theorem 3.1

Let be the Rayleigh quotient defined in (12) and let . Then , for any and

Proof

For a subset , consider the vector . Then

and . Therefore and

(16)

To show the reverse inequality we use Lemma 1 and Remark 3.4. By (15) we have . Now let be the constant function . As , we can use such into Lemma 1, with , to get

where the second identity holds since is positively one-homogeneous (Proposition 3, point 3). Combining the latter inequality with (16) we conclude.

We now prove an analogous result involving and , To this end we formulate the following Lemma 2. The proof is a straightforward modification of the proof of Lemma 3.1 in [30], and is omitted for brevity.

Lemma 2

A function is positively one-homogeneous, even, convex and for any if and only if there exists such that where is a closed symmetric convex set such that for any .

The following theorem shows that is an exact nonlinear relaxation of the normalized modularity function .

Theorem 3.2

Let be the Rayleigh quotient defined in (12) and let . Then for any and