1. Introduction
Clustering in graphs has been one of the most fundamental tools for analyzing and understanding the components of complex networks. It has been used extensively in many important applications including distributed systems (Hendrickson and Leland, 1995; Simon, 1991; Van Driessche and Roose, 1995), compression (Rossi et al., 2015; Buehrer and Chellapilla, 2008), image segmentation (Shi and Malik, 2000; Felzenszwalb and Huttenlocher, 2004), and document and word clustering (Dhillon, 2001), among others. Most clustering methods focus on simple flat/homogeneous graphs where nodes and edges represent a single entity and relationship type, respectively. However, heterogeneous graphs consisting of nodes and edges of different types are ubiquitous in the real world. In fact, most real-world systems give rise to rich heterogeneous networks that consist of multiple types of diversely interdependent entities (Shi et al., 2017; Sun and Han, 2013). This heterogeneity of real systems is often due to the fact that, in applications, data usually contains semantic information. For example, in research publication networks, nodes can represent authors, papers, or venues and edges can represent co-authorships, references, or journal/conference appearances. Such heterogeneous graph data can be represented by an arbitrary number of matrices and tensors that are coupled with respect to one or more types as shown in Figure 1.
Clusters in heterogeneous graphs that contain multiple types of nodes give rise to communities that are significantly more complex. Joint analysis of multiple graphs may capture fine-grained clusters that would not be captured by clustering each graph individually, as shown in (Banerjee et al., 2007; Rossi and Zhou, 2015). For instance, one can simultaneously cluster different types of entities/nodes in the heterogeneous graph based on multiple relations, where each relation is represented as a matrix or a tensor (Figure 1). It is this complexity, and the importance of explicitly modeling how those entity types mix to form complex communities, that makes the problem of heterogeneous graph clustering significantly more challenging. Moreover, the complexity, representation, and modeling of the heterogeneous graph data itself also make this problem challenging (see Figure 1). Extensions of clustering methods for homogeneous graphs to heterogeneous graphs are often nontrivial. Many methods require complex schemas and are very specialized, allowing only for two graphs with particular structure. Furthermore, most clustering methods only consider first-order structures in graphs, i.e., edge connectivity information. However, higher-order structures play a non-negligible role in the organization of a network.
Higher-order connectivity patterns such as small induced subgraphs called graphlets (network motifs) are known to be the fundamental building blocks of simple homogeneous networks (Milo et al., 2002) and are essential for modeling and understanding the fundamental components of these networks (Ahmed et al., 2015; Ahmed et al., 2016; Benson et al., 2016). However, such (untyped) graphlets are unable to capture the rich (typed) connectivity patterns in more complex networks such as those that are heterogeneous, labeled, signed, or attributed. In heterogeneous graphs (Figure 1), nodes and edges can be of different types and explicitly modeling such types is crucial. In this work, we introduce the notion of a typed graphlet and use it to uncover the higher-order organization of rich heterogeneous networks. The notion of a typed graphlet captures both the connectivity pattern of interest and the types. We argue that typed graphlets are the fundamental building blocks of heterogeneous networks. Note homogeneous, labeled, signed, and attributed graphs are all special cases of heterogeneous graphs as shown in Section 2.
In this paper, we propose a general framework for higher-order clustering in heterogeneous graphs. The framework explicitly incorporates heterogeneous higher-order information by counting typed graphlets that explicitly capture node and edge types. Typed graphlets generalize the notion of graphlets to rich heterogeneous networks as they explicitly capture the higher-order typed connectivity patterns in such networks. Using these as a basis, we propose the notion of typed-graphlet conductance that generalizes the traditional conductance to higher-order structures in heterogeneous graphs. The proposed approach reveals the higher-order organization and composition of rich heterogeneous complex networks. Given a graph $G$ and a typed graphlet of interest $H$, the framework forms the weighted typed-graphlet adjacency matrix $\mathbf{W}_H$ by counting the frequency with which two nodes co-occur in an instance of the typed graphlet. Next, the typed-graphlet Laplacian matrix $\mathcal{L}_H$ is formed from $\mathbf{W}_H$ and the eigenvector corresponding to its second smallest eigenvalue is computed. The components of the eigenvector provide an ordering $\sigma$ of the nodes that produces nested sets $S_k = \{\sigma(1), \ldots, \sigma(k)\}$ of increasing size $k$. We demonstrate theoretically that the set $S_k$ with the minimum typed-graphlet conductance is a near-optimal higher-order cluster.
The framework provides mathematical guarantees on the optimality of the higher-order clustering obtained. The theoretical results extend to typed graphlets of arbitrary size and avoid the restrictive special cases required in prior work. Specifically, we prove a Cheeger-like inequality for typed-graphlet conductance. This gives bounds $\phi_H(G) \leq \phi^* \leq c\sqrt{\phi_H(G)}$, where $\phi_H(G)$ is the minimum typed-graphlet conductance, $\phi^*$ is the value given by Algorithm 1, and $c$ is a constant that depends on the number of edges in the chosen typed graphlet. Notably, the bounds of the method depend directly on the number of edges of the arbitrarily chosen typed graphlet (as opposed to the number of nodes) and inversely on the quality of connectivity of occurrences of the typed graphlet in a heterogeneous graph. This is notable as the formulation for homogeneous graphs and untyped graphlets proposed in (Benson et al., 2016) is in terms of nodes and requires different theory for untyped graphlets with a different number of nodes (e.g., untyped graphlets with 3 nodes vs. 4 nodes and so on). In this work, we argue that it is not the number of nodes in a graphlet that is important, but the number of edges. This leads to a more powerful, simpler, and general framework that can serve as a basis for analyzing higher-order spectral methods. Furthermore, even in the case of untyped graphlets and homogeneous graphs, the formulation in this work leads to tighter bounds for certain untyped graphlets. Consider a 4-node star and a 3-node clique (triangle): both have 3 edges, and therefore have the same bounds in our framework even though their numbers of nodes differ. However, in (Benson et al., 2016), the bounds for the 4-node star would be different (and larger) than for the 3-node clique. This makes the proposed formulation and corresponding bounds more general, and in the above case provides tighter bounds compared to (Benson et al., 2016).
The experiments demonstrate the effectiveness of the approach quantitatively for three important tasks. First, we demonstrate the approach for revealing high-quality clusters across a wide variety of graphs from different domains. In all cases, it outperforms a number of state-of-the-art methods with an overall improvement across all graphs and methods. Second, we investigate the approach for link prediction. In this task, we derive higher-order typed-graphlet node embeddings (as opposed to clusterings) and use these embeddings to learn a predictive model. Compared to state-of-the-art methods, the approach achieves overall improvements of 18.7% and 14.4%, the latter in AUC. Finally, we also demonstrate the effectiveness of the approach quantitatively for graph compression, where it is shown to achieve a mean improvement across all graphs and methods. Notably, these application tasks all leverage different aspects of the proposed framework. For instance, link prediction uses the higher-order node embeddings given by our approach whereas graph compression leverages the proposed typed-graphlet spectral ordering (Definition 11).
The paper is organized as follows. Section 2 describes the general framework for higher-order spectral clustering whereas Section 3 proves a number of important results including mathematical guarantees on the optimality of the higher-order clustering. Next, Section 4 demonstrates the effectiveness of the approach quantitatively for a variety of important applications including clustering, link prediction, and graph compression. Section 5 discusses and summarizes related work. Finally, Section 6 concludes.
2. Framework
In this work, we propose a general framework for higher-order clustering in heterogeneous graphs. Table 1 lists our notation.
2.1. Heterogeneous Graph Model
We represent a heterogeneous complex system using the following heterogeneous graph model.
Definition 1 (Heterogeneous Graph).
A heterogeneous graph is an ordered tuple $G = (V, E, \psi, \xi)$ comprised of

(i) a graph $(V, E)$ where $V$ is the node set and $E$ is the edge set,

(ii) a mapping $\psi : V \to \mathcal{T}_V$ referred to as the node-type mapping where $\mathcal{T}_V$ is a set of node types,

(iii) a mapping $\xi : E \to \mathcal{T}_E$ referred to as the edge-type mapping where $\mathcal{T}_E$ is a set of edge types.

We denote the node set of a heterogeneous graph $G$ as $V(G)$ and its edge set as $E(G)$.
A homogeneous graph can be seen as a special case of a heterogeneous graph where $|\mathcal{T}_V| = |\mathcal{T}_E| = 1$. Note that a heterogeneous graph may be unweighted or weighted, and it may be undirected or directed, depending on the underlying graph structure. Moreover, it may also be signed or labeled, where $\psi(v)$ (or $\xi(e)$) corresponds to a label assigned to node $v$ (or edge $e$).
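As a concrete illustration of Definition 1, the model can be sketched in a few lines of Python. The class and field names below (HeteroGraph, psi, xi) are illustrative stand-ins, not an implementation from the paper:

```python
from dataclasses import dataclass

@dataclass
class HeteroGraph:
    # Definition 1: a graph (V, E) with a node-type mapping psi
    # and an edge-type mapping xi.
    V: set     # node set
    E: set     # edge set: 2-element frozensets (undirected edges)
    psi: dict  # node-type mapping: node -> type
    xi: dict   # edge-type mapping: edge -> type

    def node_types(self):
        return set(self.psi.values())

    def edge_types(self):
        return set(self.xi.values())

# Toy publication network with author and paper nodes.
V = {1, 2, 3}
E = {frozenset({1, 2}), frozenset({2, 3})}
psi = {1: "author", 2: "paper", 3: "author"}
xi = {e: "writes" for e in E}
G = HeteroGraph(V, E, psi, xi)
# A homogeneous graph is the special case with one node type and one edge type.
```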
In general, a heterogeneous network can be represented by an arbitrary number of matrices and tensors that are coupled, i.e., the tensors and matrices share at least one type with each other (Acar et al., 2011; Rossi and Zhou, 2016). See Figure 1 for an example of a heterogeneous network represented as a coupled matrix-tensor.
2.2. Graphlets
Graphlets are small connected induced subgraphs (Pržulj et al., 2004; Ahmed et al., 2015). The simplest nontrivial graphlet is the first-order structure of a node pair connected by an edge. Higher-order graphlets correspond to graphlets with a greater number of nodes and edges. Most graph clustering algorithms only take into account edge connectivity (first-order graphlet structure) when determining clusters. Moreover, these methods are only applicable to homogeneous graphs. For example, spectral clustering on the normalized Laplacian of the adjacency matrix of a graph partitions it in a way that attempts to minimize the number of edges (first-order structures) cut (Kannan et al., 2004).
In this section, we introduce a more general notion of graphlet called the typed graphlet that naturally extends to both homogeneous and heterogeneous networks. In this paper, we will use $G$ to represent a graph and $H$ or $F$ to represent graphlets.
2.2.1. Untyped graphlets
We begin by defining graphlets for homogeneous graphs with a single type.
Definition 2 (Untyped Graphlet).
An untyped graphlet $H$ of a homogeneous graph $G$ is a connected, induced subgraph of $G$.
Given an untyped graphlet in some homogeneous graph, it may be the case that we can find other topologically identical "appearances" of this structure in that graph. We call these appearances untyped-graphlet instances.
Definition 3 (Untyped-Graphlet Instance).
An instance of an untyped graphlet $H$ in a homogeneous graph $G$ is an untyped graphlet $F$ of $G$ that is isomorphic to $H$.
As we shall soon see, it will be important to refer to the set of all instances of a given graphlet in a graph. Forming this set is equivalent to determining the subgraphs of a graph isomorphic to the given graphlet. Nevertheless, we usually only consider graphlets with up to four or five nodes, and fast methods exist for discovering instances of such graphlets (Ahmed et al., 2015; Ahmed et al., 2016; Ahmed et al., 2016a, b; Rossi et al., 2017).
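To make Definition 3 concrete, the following sketch enumerates all instances of the triangle graphlet by brute force over node triples; real systems would use the fast counting methods cited above. All names are illustrative:

```python
from itertools import combinations

def triangle_instances(nodes, edges):
    # Enumerate all triangle instances (Definition 3) by checking
    # every node triple; O(n^3), for illustration only.
    eset = {frozenset(e) for e in edges}
    return [frozenset({u, v, w})
            for u, v, w in combinations(sorted(nodes), 3)
            if frozenset({u, v}) in eset
            and frozenset({v, w}) in eset
            and frozenset({u, w}) in eset]

nodes = {1, 2, 3, 4}
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
tris = triangle_instances(nodes, edges)  # one instance: {1, 2, 3}
```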
2.2.2. Typed graphlets
In heterogeneous graphs, nodes and edges can be of different types and so explicitly (and jointly) modeling such types is essential (Figure 1). To generalize higher-order clustering to handle such networks, we introduce the notion of a typed graphlet that explicitly captures both the connectivity pattern of interest and the types. Notice that typed graphlets are a generalization of untyped graphlets and thus are a more powerful representation.
Definition 4 (Typed Graphlet).
A typed graphlet of a heterogeneous graph $G = (V, E, \psi, \xi)$ is a connected, induced heterogeneous subgraph $H = (V_H, E_H, \psi_H, \xi_H)$ of $G$ in the following sense:

(i) $(V_H, E_H)$ is an untyped graphlet of $(V, E)$,

(ii) $\psi_H = \psi|_{V_H}$, that is, $\psi_H$ is the restriction of $\psi$ to $V_H$,

(iii) $\xi_H = \xi|_{E_H}$, that is, $\xi_H$ is the restriction of $\xi$ to $E_H$.
We can consider the topologically identical “appearances” of a typed graphlet in a graph that preserve the type structure.
Definition 5 (Typed-Graphlet Instance).
An instance of a typed graphlet $H$ of a heterogeneous graph $G$ is a typed graphlet $F = (V_F, E_F, \psi_F, \xi_F)$ of $G$ such that:

(i) $(V_F, E_F)$ is isomorphic to $(V_H, E_H)$,

(ii) $\psi_F(V_F) = \psi_H(V_H)$ and $\xi_F(E_F) = \xi_H(E_H)$, that is, the sets of node and edge types are correspondingly equal.

The set of unique typed-graphlet instances of $H$ in $G$ is denoted as $I_G(H)$.
Note that we are not interested in preserving the type structure via the isomorphism, only its existence; that is, we are not imposing the condition that the node and edge types coincide via the graph isomorphism. This condition would be too restrictive.
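A minimal sketch of Definition 5: among the untyped instances of a graphlet, the typed instances are those whose set of node types equals that of the typed graphlet (edge types are checked analogously and omitted here for brevity). The helper names are illustrative:

```python
def typed_instances(instances, psi, graphlet_types):
    # Keep the instances whose set of node types equals the typed
    # graphlet's set of node types (Definition 5, node-type part).
    target = set(graphlet_types)
    return [inst for inst in instances
            if {psi[v] for v in inst} == target]

psi = {1: "author", 2: "paper", 3: "author", 4: "venue"}
instances = [frozenset({1, 2, 3}), frozenset({2, 3, 4})]
# Typed triangles over author/paper nodes only:
matches = typed_instances(instances, psi, {"author", "paper"})
```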
2.2.3. Motifs
Before we proceed, we briefly address some discrepancies between our definition of graphlets and that of papers such as (Benson et al., 2016; Arenas et al., 2005). Although it might be a simple matter of semantics, the differences should be noted and clarified to avoid confusion. Some papers refer to what we call graphlets as motifs. Yet, motifs usually refer to recurrent and statistically significant induced subgraphs (Pržulj et al., 2004; Milo et al., 2002).
To find the motifs of a graph, one must compare the frequency of appearances of a graphlet in the graph to the expected frequency of appearances in an ensemble of random graphs under a null model associated with the underlying graph. Current techniques for computing the expected frequency under a null model require generating graphs that follow the null distribution and then computing the graphlet frequencies in these sample graphs (Milo et al., 2002; Albert and Barabási, 2002). These tasks are computationally expensive for large networks as we have to sample many graphs from the null distribution. On the other hand, any graphlet can be arbitrarily specified in a graph and does not depend on being able to determine whether it is statistically significant. In any case, a motif is a special type of graphlet, so we prefer to work with this more general object.
Table 1. Summary of notation.
$G$  graph
$V$  node set of $G$
$E$  edge set of $G$
$H$  graphlet of $G$
$I_G(H)$  set of unique instances of $H$ in $G$
$\mathbf{W}_H$  typed-graphlet adjacency matrix of $G$ based on $H$
$\mathcal{L}_H$  typed-graphlet normalized Laplacian of $G$ based on $H$
$G_H$  weighted heterogeneous graph induced by $\mathbf{W}_H$
$S$  subset of $V$
$(S, \bar{S})$  cut of $G$ where $\bar{S} = V \setminus S$
$d_v$  degree of node $v$
$d_v^H$  typed-graphlet degree of node $v$ based on $H$
$\mathrm{vol}_{G_H}(S)$  volume of $S$ under $G_H$
$\mathrm{vol}_H(S)$  typed-graphlet volume of $S$ based on $H$ under $G$
$\mathrm{cut}_{G_H}(S, \bar{S})$  cut size of $(S, \bar{S})$ under $G_H$
$\mathrm{cut}_H(S, \bar{S})$  typed-graphlet cut size of $(S, \bar{S})$ based on $H$ under $G$
$\phi_{G_H}(S)$  conductance of $S$ under $G_H$
$\phi_H(S)$  typed-graphlet conductance of $S$ based on $H$ under $G$
$\phi(G_H)$  conductance of $G_H$
$\phi_H(G)$  typed-graphlet conductance of $G$ based on $H$
2.3. Typed-Graphlet Conductance
In this section, we introduce the measure that will score the quality of a heterogeneous graph clustering built from typed graphlets. It extends the traditional notion of conductance, defined as
$$\phi(S) = \frac{\mathrm{cut}(S, \bar{S})}{\min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))}$$
where $(S, \bar{S})$ is a cut of a graph, $\mathrm{cut}(S, \bar{S})$ is the number of edges crossing the cut, and $\mathrm{vol}(S)$ is the total degree of the vertices in cluster $S$ (Fortunato, 2010; Kannan et al., 2004). Note that its minimization achieves the sparsest cut that is balanced in terms of the total degree of a cluster.
The following definitions apply for a fixed heterogeneous graph and typed graphlet. Assume we have a heterogeneous graph $G$ and a typed graphlet $H$ of $G$.
Remark.
We denote the set of unique instances of $H$ in $G$ as $I_G(H)$.
Definition 6 (Typed-Graphlet Degree).
The typed-graphlet degree $d_v^H$ of a node $v$ based on $H$ is the total number of edges incident to $v$ over all unique instances of $H$. We denote and compute this as
$$d_v^H = \sum_{F \in I_G(H)} |\{e \in E_F : v \in e\}|.$$
Definition 7 (Typed-Graphlet Volume).
The typed-graphlet volume $\mathrm{vol}_H(S)$ of a subset of nodes $S$ based on $H$ is the total number of edges incident to any node in $S$ over all instances of $H$. In other words, it is the sum of the typed-graphlet degrees based on $H$ over all nodes in $S$. We denote and compute this as
$$\mathrm{vol}_H(S) = \sum_{v \in S} d_v^H.$$
Recall that a cut in a graph is a partition of the underlying node set $V$ into two proper, nonempty subsets $S$ and $\bar{S}$ where $\bar{S} = V \setminus S$. We denote such a cut as an ordered pair $(S, \bar{S})$. For any given cut in a graph, we can define a notion of cut size.
Definition 8 (Typed-Graphlet Cut Size).
The typed-graphlet cut size $\mathrm{cut}_H(S, \bar{S})$ of a cut $(S, \bar{S})$ in $G$ based on $H$ is the number of unique instances of $H$ crossing the cut. We denote and compute this as
$$\mathrm{cut}_H(S, \bar{S}) = |\{F \in I_G(H) : V_F \cap S \neq \emptyset \text{ and } V_F \cap \bar{S} \neq \emptyset\}|.$$
Note that a graphlet can cross a cut with any of its edges. Therefore, it has more ways in which it can add to the cut size than just an edge.
Having extended notions of volume and cut size for higherorder typed substructures, we can naturally introduce a corresponding notion of conductance.
Definition 9 (Typed-Graphlet Conductance).
The typed-graphlet conductance $\phi_H(S)$ of a cut $(S, \bar{S})$ in $G$ based on $H$ is
$$\phi_H(S) = \frac{\mathrm{cut}_H(S, \bar{S})}{\min(\mathrm{vol}_H(S), \mathrm{vol}_H(\bar{S}))}$$
and the typed-graphlet conductance of $G$ based on $H$ is defined to be the minimum typed-graphlet conductance based on $H$ over all possible cuts in $G$:
$$\phi_H(G) = \min_{S \subset V} \phi_H(S). \quad (1)$$
The cut which achieves the minimal typed-graphlet conductance corresponds to the cut that minimizes the number of instances of $H$ that are cut while still achieving a balanced partition in terms of instances of $H$ in the clusters.
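Definitions 6 through 9 can be sketched directly on top of an explicit instance list, with each instance represented by its edge set. This is a didactic stand-in, not an efficient implementation, and all helper names are illustrative:

```python
def tg_degree(v, instances):
    # Definition 6: edges incident to v, summed over all instances.
    return sum(sum(1 for e in F if v in e) for F in instances)

def tg_volume(S, instances):
    # Definition 7: sum of typed-graphlet degrees over S.
    return sum(tg_degree(v, instances) for v in S)

def tg_cut_size(S, instances):
    # Definition 8: instances with nodes on both sides of the cut.
    def nodes_of(F):
        return set().union(*F)
    return sum(1 for F in instances
               if nodes_of(F) & S and nodes_of(F) - S)

def tg_conductance(S, V, instances):
    # Definition 9: cut size over the smaller of the two volumes.
    return tg_cut_size(S, instances) / min(tg_volume(S, instances),
                                           tg_volume(V - S, instances))

# One triangle instance {1, 2, 3} in a 4-node graph:
F = {frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 3})}
V = {1, 2, 3, 4}
phi = tg_conductance({1}, V, [F])  # 1 cut instance / min(2, 4) = 0.5
```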
2.4. Typed-Graphlet Laplacian
In this section, we introduce a notion of a higher-order Laplacian of a graph. Assume we have a heterogeneous graph $G$ and a typed graphlet $H$ of $G$.
2.4.1. Typed-graphlet adjacency matrix
Suppose we have the set $I_G(H)$. Then, we can form a matrix $\mathbf{W}_H$ that has the same dimensions as the adjacency matrix of $G$ and has its entries defined by the count of unique instances of $H$ containing each edge.
Definition 10 (Typed-Graphlet Adjacency Matrix).
Suppose that $I_G(H) \neq \emptyset$. The typed-graphlet adjacency matrix of $G$ based on $H$ is a weighted matrix $\mathbf{W}_H$ defined by
$$(\mathbf{W}_H)_{ij} = |\{F \in I_G(H) : (i, j) \in E_F\}|$$
for $i, j \in V$. That is, the $(i, j)$ entry of $\mathbf{W}_H$ is equal to the number of unique instances of $H$ that contain nodes $i, j$ as an edge.
Having defined $\mathbf{W}_H$, a weighted adjacency matrix on the set of nodes $V$, we can induce a weighted graph. We refer to this graph as the graph induced by $\mathbf{W}_H$ and denote it as $G_H$.
Remark.
From the definition of $\mathbf{W}_H$, we can easily show that $(\mathbf{W}_H)_{ij} = 0$ for any $(i, j) \notin E$.
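A minimal sketch of Definition 10, assuming nodes are 0-indexed integers and each instance is given by its edge set:

```python
import numpy as np

def typed_graphlet_adjacency(n, instances):
    # (W_H)_{ij} = number of unique instances of H containing edge {i, j}.
    W = np.zeros((n, n))
    for F in instances:        # each instance: a set of 2-element frozensets
        for e in F:
            i, j = tuple(e)
            W[i, j] += 1
            W[j, i] += 1
    return W

# Two triangle instances sharing the edge {1, 2}:
F1 = {frozenset({0, 1}), frozenset({1, 2}), frozenset({0, 2})}
F2 = {frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 3})}
W = typed_graphlet_adjacency(4, [F1, F2])  # W[1, 2] == 2.0
```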
2.4.2. Typed-graphlet Laplacian
We can construct the weighted normalized Laplacian of $G_H$:
$$\mathcal{L}_H = \mathbf{I} - \mathbf{D}_H^{-1/2}\,\mathbf{W}_H\,\mathbf{D}_H^{-1/2}$$
where $\mathbf{D}_H$ is the diagonal weighted degree matrix defined by
$$(\mathbf{D}_H)_{ii} = \sum_{j} (\mathbf{W}_H)_{ij}$$
for $i \in V$. We also refer to this Laplacian as the typed-graphlet normalized Laplacian of $G$ based on $H$. The normalized typed-graphlet Laplacian is the fundamental structure for the method we present in Section 2.5.
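The construction of the normalized typed-graphlet Laplacian can be sketched with NumPy as follows, assuming the induced weighted graph is connected so that no degree is zero:

```python
import numpy as np

def normalized_laplacian(W):
    # L_H = I - D^{-1/2} W_H D^{-1/2}, with D the diagonal matrix of
    # weighted degrees of W_H (assumes every node has nonzero degree).
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(len(W)) - (W * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]

# Triangle: eigenvalues of the normalized Laplacian are 0, 1.5, 1.5.
W = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
eigvals = np.linalg.eigvalsh(normalized_laplacian(W))
```

The smallest eigenvalue is always 0 for a connected graph, which is why the second smallest eigenvalue carries the clustering information.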
2.5. Typed-Graphlet Spectral Clustering
In this section, we present an algorithm for approximating the optimal solution to the minimum typed-graphlet conductance optimization problem:
$$\min_{S \subset V} \phi_H(S). \quad (2)$$
Minimizing the typed-graphlet conductance encapsulates what we want: the solution achieves a bipartition of $V$ that minimizes the number of instances of $H$ that are cut and is balanced in terms of the total typed-graphlet degree contribution of all instances of $H$ on each partition.
The issue is that minimizing typed-graphlet conductance is hard. To see this, consider the case where the graphlet is the first-order graphlet, that is, a pair of nodes connected by an edge. In this special case, minimizing untyped-graphlet conductance is exactly minimizing the standard notion of conductance, which is known to be NP-hard (Cook, 1971). Therefore, obtaining the best graphlet-preserving clustering for large graphs is an intractable problem. We can only hope to achieve a near-optimal approximation.
2.5.1. Algorithm
We present a typed-graphlet spectral clustering algorithm for finding a provably near-optimal bipartition in Algorithm 1. We build a sweeping cluster in a greedy manner according to the typed-graphlet spectral ordering defined as follows.
Definition 11 (Typed-Graphlet Spectral Ordering).
Let $\mathbf{z}$ denote the eigenvector corresponding to the 2nd smallest eigenvalue of the normalized typed-graphlet Laplacian $\mathcal{L}_H$. The typed-graphlet spectral ordering is the permutation $\sigma$ of the coordinate indices $\{1, \ldots, |V|\}$ such that
$$z_{\sigma(1)} \leq z_{\sigma(2)} \leq \cdots \leq z_{\sigma(|V|)},$$
that is, $\sigma$ is the permutation of coordinate indices of $\mathbf{z}$ that sorts the corresponding coordinate values from smallest to largest.
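Putting the pieces together, the sweep procedure behind Algorithm 1 can be sketched as follows: compute the eigenvector for the second smallest eigenvalue, order the nodes by its coordinates (Definition 11), and keep the prefix set with minimum conductance. For simplicity, this sketch evaluates conductance on the weighted induced graph rather than typed-graphlet conductance directly (Theorem 3 relates the two); the function name is illustrative:

```python
import numpy as np

def tg_spectral_sweep(W):
    # Normalized Laplacian of the weighted graph induced by W_H.
    d = W.sum(axis=1)
    dis = 1.0 / np.sqrt(d)
    L = np.eye(len(W)) - (W * dis[:, None]) * dis[None, :]
    # Eigenvector of the second smallest eigenvalue gives the
    # typed-graphlet spectral ordering (Definition 11).
    _, vecs = np.linalg.eigh(L)
    order = np.argsort(vecs[:, 1])
    # Sweep the nested prefix sets, keeping the best-conductance cut.
    best, best_phi, total = None, np.inf, d.sum()
    for k in range(1, len(W)):
        S, rest = order[:k], order[k:]
        cut = W[np.ix_(S, rest)].sum()
        phi = cut / min(d[S].sum(), total - d[S].sum())
        if phi < best_phi:
            best, best_phi = set(S.tolist()), phi
    return best, best_phi

# Two triangles joined by one edge: the sweep cuts the bridge (2, 3).
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
S, phi = tg_spectral_sweep(W)  # one triangle, conductance 1/7
```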
2.5.2. Extensions
Algorithm 1 generalizes the spectral clustering method for standard conductance minimization (Shi and Malik, 2000) and untyped-graphlet conductance minimization. We demonstrated the reduction of standard conductance minimization above. Untyped-graphlet conductance minimization is also generalized since homogeneous graphs can be seen as heterogeneous graphs with a single node and edge type. It is straightforward to adapt the framework to other arbitrary (sparse) cut functions such as ratio cuts (Gaertler, 2005), normalized cuts (Shi and Malik, 2000), bisectors (Gaertler, 2005), and normalized association cuts (Shi and Malik, 2000), among others (Schaeffer, 2007; Fortunato, 2010; Shi and Malik, 2000).
Multiple clusters can be found through simple recursive bipartitioning (Kannan et al., 2004). We could also embed the lower eigenvectors of the normalized typed-graphlet Laplacian into a lower-dimensional Euclidean space, perform $k$-means or any other Euclidean clustering algorithm, and then associate to each node its corresponding cluster in this space (Ng et al., 2002; Kannan et al., 2004). It is also straightforward to use multiple typed graphlets for clustering or embeddings as opposed to using only a single typed graphlet independently. For instance, the higher-order typed-graphlet adjacency matrices can be combined in some fashion (e.g., summation) and may even be assigned weights based on the importance of each typed graphlet. Moreover, the typed-graphlet conductance can be adapted in a straightforward fashion to handle multiple typed graphlets.
2.5.3. Discussion
Benson et al. (Benson et al., 2016) refer to their higher-order balanced cut measure as motif conductance, and it differs from our proposed notion of typed-graphlet conductance. However, the definition used matches more closely a generalization known as edge expansion. The edge expansion of a cut $(S, \bar{S})$ is defined as
$$\alpha(S) = \frac{\mathrm{cut}(S, \bar{S})}{\min(|S|, |\bar{S}|)}. \quad (3)$$
The balancing is in terms of the number of vertices in a cluster. Motif conductance was defined with a balancing in terms of the number of vertices in any graphlet instance. To be precise, for any set of vertices $S$, let the cluster size of $S$ in $G$ based on $H$ be
$$|S|_H = \sum_{F \in I_G(H)} |V_F \cap S|. \quad (4)$$
Note that this does not take into account the degree contributions of each graphlet, only its node count contributions to a cluster $S$. In terms of our notation, untyped "motif conductance" of a cut $(S, \bar{S})$ is defined in that work as
$$\frac{\mathrm{cut}_H(S, \bar{S})}{\min(|S|_H, |\bar{S}|_H)}.$$
Since this does not take into account node degree information, it is more a generalization of edge expansion (Hoory et al., 2006; Alon, 1997), a "graphlet expansion" if you will, than of conductance. The difference is worth noting because it has been shown that conductance minimization gives better partitions than expansion minimization (Kannan et al., 2004). By only counting nodes, we give equal importance to all the vertices in a graphlet. Arguably, it is more reasonable to give greater importance to the vertices that not only participate in many graphlets but also have many neighbors within a graphlet, and lesser importance to vertices that have more neighbors that do not participate in a graphlet or do not have many neighbors within a graphlet. Our definition of typed-graphlet volume captures this idea to give an appropriately general notion of conductance.
2.6. Typed-Graphlet Node Embeddings
Algorithm 2 summarizes the method for deriving higher-order typed-graphlet node embeddings (as opposed to clusters/partitions of nodes, or an ordering for compression/analysis; see Section 4.3). In particular, given a typed-graphlet adjacency matrix, Algorithm 2 outputs a matrix $\mathbf{Z}$ of node embeddings. For graphs with many connected components, Algorithm 2 is called for each connected component of $G_H$ and the resulting embeddings are stored in the appropriate locations in the overall embedding matrix $\mathbf{Z}$.
Multiple typed graphlets can also be used to derive node embeddings. One approach that follows from (Rossi et al., 2018a) is to derive low-dimensional node embeddings for each typed graphlet of interest using Algorithm 2. After obtaining the node embeddings for each typed graphlet, we can simply concatenate them all into one single matrix. Given this matrix, we can compute another low-dimensional embedding to obtain the final node embeddings that capture the important latent features from the node embeddings of the different typed graphlets.
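A minimal sketch of the embedding step, under the assumption (not spelled out in this excerpt) that Algorithm 2 uses the eigenvectors for the smallest nontrivial eigenvalues of the normalized typed-graphlet Laplacian as node coordinates; the function name is illustrative:

```python
import numpy as np

def tg_embeddings(W, d):
    # Eigenvectors for the d smallest nontrivial eigenvalues of the
    # normalized typed-graphlet Laplacian, as d-dimensional coordinates.
    dis = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - (W * dis[:, None]) * dis[None, :]
    _, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return vecs[:, 1:d + 1]       # skip the trivial eigenvector

W = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
Z = tg_embeddings(W, 2)           # one 2-dimensional embedding per node
```

Embeddings derived from several typed graphlets can then be concatenated column-wise, e.g. with np.hstack, and reduced again as described above.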
3. Theoretical Analysis
In this section, we show the near-optimality of Algorithm 1. The idea is to translate what we know about ordinary conductance for weighted homogeneous graphs, for which substantial theory has been developed (Chung, 1997; Chung and Oden, 2000; Chung, 1996), to the new measure of typed-graphlet conductance we introduce, by relating these two quantities. Through this association, we can derive Cheeger-like results for $\phi_H(G)$ and for the approximation given by the typed-graphlet spectral clustering algorithm (Algorithm 1). As in the previous section, assume we have a heterogeneous graph $G$ and a typed graphlet $H$. Also, assume we have the weighted graph $G_H$ induced from the typed-graphlet adjacency matrix $\mathbf{W}_H$.
We prove two lemmas from which our main theorem will immediately follow. Lemma 1 shows that the typed-graphlet volume and ordinary volume measures match: the total typed-graphlet degree contributions of typed-graphlet instances match the total counts of typed-graphlet instances on edges for any given subset of nodes. In contrast, Lemma 2 shows that equality does not hold for the notions of cut size. The reason lies in the fact that, for any typed-graphlet instance, typed-graphlet cut size on $G$ only counts the number of typed-graphlet instances cut whereas ordinary cut size on $G_H$ counts the number of times typed-graphlet instances are cut. Therefore, these two measures at least match and at most differ by a factor equal to the number of edges of $H$, which is a fixed value that is small for the typed graphlets we are interested in, of size 3 or 4. Thus, we are able to reasonably bound the discrepancy between the notions of cut size.
Using these two lemmas, we immediately get our main result in Theorem 3, which shows the relationship between $\phi_H(G)$ and $\phi(G_H)$ in the form of a tight inequality that depends only on the number of edges in $H$. From this theorem, we arrive at two important corollaries. In Corollary 4, we prove Cheeger-like bounds for typed-graphlet conductance. In Corollary 5, we show that the output of Algorithm 1 gives a near-optimal solution up to a square-root factor, and it goes further to give bounds in terms of the optimal value, showing the constant of the approximation algorithm, which depends on the second smallest eigenvalue $\lambda_2$ of the normalized typed-graphlet Laplacian and the number of edges in $H$. This last result does not give a purely constant-factor approximation to the graph conductance because of its dependence on $\lambda_2$ and $|E_H|$, yet it still gives a very efficient, and nontrivial, approximation for fixed $\lambda_2$ and $|E_H|$. Moreover, the second part of Corollary 5 provides intuition as to what makes a specific typed graphlet a suitable choice for higher-order clustering. Typed graphlets that have a good balance of small edge set size and strong connectivity in the heterogeneous graph, in the sense that the second eigenvalue of the normalized typed-graphlet Laplacian is large, will have a tighter upper bound on their approximation of the minimum typed-graphlet conductance. Therefore, this last result in Corollary 5 provides a way to quickly and quantitatively measure how good a typed graphlet is for determining higher-order organization before even executing the clustering algorithm.
Remark.
In the case of the simple first-order untyped graphlet, i.e., a node pair with an interconnecting edge, we recover the results for traditional spectral clustering since $|E_H| = 1$ in this case. Furthermore, if $G$ is a homogeneous graph, i.e., $|\mathcal{T}_V| = |\mathcal{T}_E| = 1$, we get the special case of untyped-graphlet-based spectral clustering. Therefore, our framework generalizes the methods of traditional spectral clustering and untyped-graphlet spectral clustering for homogeneous graphs.
In the following analysis, we let $\mathbb{1}[\cdot]$ represent the Boolean predicate (indicator) function and let $(\mathbf{W}_H)_{uv}$ be the edge weight of edge $(u, v)$ in $G_H$.
Lemma 1.
Let $S$ be a subset of nodes in $G$. Then,
$$\mathrm{vol}_{G_H}(S) = \mathrm{vol}_H(S).$$
Proof.
(5) $\mathrm{vol}_{G_H}(S) = \sum_{v \in S} \sum_{u \in V} (\mathbf{W}_H)_{vu}$
(6) $= \sum_{v \in S} \sum_{u \in V} |\{F \in I_G(H) : (v, u) \in E_F\}|$
(7) $= \sum_{v \in S} \sum_{u \in V} \sum_{F \in I_G(H)} \mathbb{1}[(v, u) \in E_F]$
(8) $= \sum_{F \in I_G(H)} \sum_{v \in S} \sum_{u \in V} \mathbb{1}[(v, u) \in E_F]$
(9) $= \sum_{F \in I_G(H)} \sum_{v \in S} |\{e \in E_F : v \in e\}|$
(10) $= \sum_{v \in S} \sum_{F \in I_G(H)} |\{e \in E_F : v \in e\}|$
(11) $= \sum_{v \in S} d_v^H$
(12) $= \mathrm{vol}_H(S)$
∎
Lemma 2.
Let $(S, \bar{S})$ be a cut in $G_H$. Then,
$$\frac{1}{|E_H|}\,\mathrm{cut}_{G_H}(S, \bar{S}) \;\leq\; \mathrm{cut}_H(S, \bar{S}) \;\leq\; \mathrm{cut}_{G_H}(S, \bar{S}).$$
Proof.
For subsequent simplification, we define $\partial(S)$ to be the set of edges in $E$ that cross cut $(S, \bar{S})$:
(13) $\partial(S) = \{(u, v) \in E : u \in S,\ v \in \bar{S}\}$
Then,
(14) $\mathrm{cut}_{G_H}(S, \bar{S}) = \sum_{(u, v) \in \partial(S)} (\mathbf{W}_H)_{uv}$
(15) $= \sum_{(u, v) \in \partial(S)} |\{F \in I_G(H) : (u, v) \in E_F\}|$
(16) $= \sum_{(u, v) \in \partial(S)} \sum_{F \in I_G(H)} \mathbb{1}[(u, v) \in E_F]$
(17) $= \sum_{F \in I_G(H)} \sum_{(u, v) \in \partial(S)} \mathbb{1}[(u, v) \in E_F]$
(18) $= \sum_{F \in I_G(H)} |E_F \cap \partial(S)|$
Note that for an instance $F$ such that $V_F \cap S \neq \emptyset$ and $V_F \cap \bar{S} \neq \emptyset$, there exists at least one edge in $E_F$ cut by $(S, \bar{S})$ and at most all edges in $E_F$ are cut by $(S, \bar{S})$. Clearly, if $V_F \subseteq S$ or $V_F \subseteq \bar{S}$, then no edge of $F$ is cut by $(S, \bar{S})$. This shows that for such a cut instance we have
(19) $1 \leq |E_F \cap \partial(S)| \leq |E_H|$
Therefore, Equation 18 satisfies the following inequalities:
(20) $\sum_{F \in I_G(H)} \mathbb{1}[E_F \cap \partial(S) \neq \emptyset] \;\leq\; \sum_{F \in I_G(H)} |E_F \cap \partial(S)|$
(21) $\sum_{F \in I_G(H)} |E_F \cap \partial(S)| \;\leq\; |E_H| \sum_{F \in I_G(H)} \mathbb{1}[E_F \cap \partial(S) \neq \emptyset]$
Referring to Definition 8 for typed-graphlet cut size and noting that, since $H$ is a connected graph,
(22) $V_F \cap S \neq \emptyset \text{ and } V_F \cap \bar{S} \neq \emptyset \;\Longleftrightarrow\; E_F \cap \partial(S) \neq \emptyset,$
we find that
(23) $\mathrm{cut}_H(S, \bar{S}) = \sum_{F \in I_G(H)} \mathbb{1}[E_F \cap \partial(S) \neq \emptyset]$
Plugging this into Inequalities 20-21, we get
(24) $\mathrm{cut}_H(S, \bar{S}) \;\leq\; \mathrm{cut}_{G_H}(S, \bar{S}) \;\leq\; |E_H|\,\mathrm{cut}_H(S, \bar{S})$
or, equivalently,
(25) $\frac{1}{|E_H|}\,\mathrm{cut}_{G_H}(S, \bar{S}) \;\leq\; \mathrm{cut}_H(S, \bar{S}) \;\leq\; \mathrm{cut}_{G_H}(S, \bar{S})$
∎
Theorem 3.
$$\frac{1}{|E_H|}\,\phi(G_H) \;\leq\; \phi_H(G) \;\leq\; \phi(G_H).$$
Proof.
Let $(S, \bar{S})$ be any cut in $G_H$. From Lemma 2, we have that
(26) $\frac{1}{|E_H|}\,\mathrm{cut}_{G_H}(S, \bar{S}) \leq \mathrm{cut}_H(S, \bar{S}) \leq \mathrm{cut}_{G_H}(S, \bar{S})$
Lemma 1 shows that $\min(\mathrm{vol}_H(S), \mathrm{vol}_H(\bar{S})) = \min(\mathrm{vol}_{G_H}(S), \mathrm{vol}_{G_H}(\bar{S}))$. Therefore, if we divide the inequalities above by this common quantity, we get that
(27) $\frac{1}{|E_H|}\,\phi_{G_H}(S) \leq \phi_H(S) \leq \phi_{G_H}(S)$
by the definitions of conductance and typed-graphlet conductance. Since this result holds for any subset $S$, it implies that
(28) $\frac{1}{|E_H|}\,\phi(G_H) \leq \phi_H(G) \leq \phi(G_H)$
∎
Corollary 4.
Let $\lambda_2$ be the second smallest eigenvalue of $\mathcal{L}_H$. Then,
(29) $\frac{\lambda_2}{2\,|E_H|} \;\leq\; \phi_H(G) \;\leq\; \sqrt{2\lambda_2}$
Proof.
Cheeger's inequality for weighted graphs (Chung, 1996) applied to $G_H$ gives
(30) $\frac{\lambda_2}{2} \leq \phi(G_H) \leq \sqrt{2\lambda_2}$
Combining this with Theorem 3 yields $\phi_H(G) \geq \phi(G_H)/|E_H| \geq \lambda_2/(2|E_H|)$ and $\phi_H(G) \leq \phi(G_H) \leq \sqrt{2\lambda_2}$. ∎
Corollary 5.
Let $S^*$ be the cluster output of Algorithm 1 and let $\phi^* = \phi_H(S^*)$ be its corresponding typed-graphlet conductance on $G$ based on $H$. Then,
$$\phi_H(G) \;\leq\; \phi^* \;\leq\; 2\sqrt{|E_H|\,\phi_H(G)}.$$
Moreover, if we let $\lambda_2$ be the second smallest eigenvalue of $\mathcal{L}_H$, then
$$\phi^* \;\leq\; c\,\phi_H(G)$$
where
$$c = 2\,|E_H|\,\sqrt{2/\lambda_2},$$
showing that, for a fixed $H$ and $\lambda_2$, Algorithm 1 is a $c$-approximation algorithm to the typed-graphlet conductance minimization problem.
Proof.
Clearly $\phi_H(G) \leq \phi^*$ since $\phi_H(G)$ is the minimal typed-graphlet conductance. To prove the upper bound, let $(S', \bar{S}')$ be the sweep cut of the typed-graphlet spectral ordering that achieves the Cheeger bound on $G_H$. Then,
(31) $\phi^* \leq \phi_H(S')$
(32) $\phi_H(S') \leq \phi_{G_H}(S')$
(33) $\phi_{G_H}(S') \leq \sqrt{2\lambda_2}$
(34) $\sqrt{2\lambda_2} \leq 2\sqrt{|E_H|\,\phi_H(G)}$
Inequality 31 follows from the fact that $S^*$ achieves the minimal typed-graphlet conductance over the sweep cuts considered by Algorithm 1. Inequality 32 follows from Inequality 27 in Theorem 3.
Inequality 33 follows from Cheeger's inequality for weighted graphs (see (Chung, 1996) for a proof).
Inequality 34 follows from the lower bound in Corollary 4, i.e., $\lambda_2 \leq 2\,|E_H|\,\phi_H(G)$. Finally, since $\phi_H(G) \geq \lambda_2/(2|E_H|)$ implies $\sqrt{\phi_H(G)} \leq \phi_H(G)\,\sqrt{2|E_H|/\lambda_2}$, we obtain $\phi^* \leq 2\sqrt{|E_H|}\,\sqrt{\phi_H(G)} \leq 2\,|E_H|\,\sqrt{2/\lambda_2}\;\phi_H(G) = c\,\phi_H(G)$. ∎
4. Experiments
This section empirically investigates the effectiveness of the proposed approach quantitatively for typed-graphlet spectral clustering (Section 4.1), link prediction using the higher-order node embeddings from our approach (Section 4.2), and the typed-graphlet spectral ordering for graph compression (Section 4.3). Unless otherwise mentioned, we use all 3- and 4-node graphlets.
Graph statistics: nodes, edges, number of node types, and the number of unique typed graphlets (one column per 3-/4-node graphlet pattern).
Graph  $|V|$  $|E|$  $|\mathcal{T}_V|$
yahoo-msg  100.1k  739.8k  2  3  2  3  4  3  3  3  2
dbpedia  495.9k  921.7k  4  8  0  6  10  5  0  0  0
digg  283.2k  4.6M  2  4  3  4  5  4  4  4  2
movielens  28.1k  170.4k  3  7  1  6  9  6  3  3  0
citeulike  907.8k  1.4M  3  5  0  3  6  3  0  0  0
fb-CMU  6.6k  250k  3  10  10  15  15  15  15  15  15
reality  6.8k  7.7k  2  4  3  4  5  4  4  4  2
gene  1.1k  1.7k  2  4  4  5  5  5  5  5  5
citeseer  3.3k  4.5k  6  56  40  124  119  66  98  56  19
cora  2.7k  5.3k  7  82  49  202  190  76  157  73  19
webkb  262  459  5  31  21  59  59  23  51  32  8
pol-retweet  18.5k  48.1k  2  4  4  5  5  5  5  5  4
webspam  9.1k  465k  3  10  10  15  15  15  15  15  15
fb-relationship  7.3k  44.9k  6  50  47  112  109  85  106  89  77
Enzymes-g123  90  127  2  4  3  5  5  5  4  3  0
Enzymes-g279  60  107  2  4  4  5  5  5  5  5  0
Enzymes-g293  96  109  2  4  1  5  5  1  2  1  0
Enzymes-g296  125  141  2  4  1  4  5  2  1  1  0
NCI109-g4008  90  105  2  3  0  3  3  0  0  0  0
NCI109-g1709  102  106  3  5  0  5  5  1  0  0  0
NCI109-g3713  111  119  3  4  0  6  4  0  0  0  0
NCI1-g3700  111  119  3  4  0  6  4  0  0  0  0
Graph  DSH  KCoreH  LPH  LouvH  SpecH  GSpecH  TGS
yahoomsg  0.5697  0.6624  0.2339  0.3288  0.0716  0.2000  0.0588 
dbpedia  0.7414  0.5586  0.4502  0.8252  0.9714  0.9404  0.0249 
digg  0.4122  0.4443  0.7555  0.3232  0.0006  0.0004  0.0004 
movielens  0.9048  0.9659  0.7681  0.8620  0.9999  0.6009  0.5000 
citeulike  0.9898  0.9963  0.9620  0.8634  0.9982  0.9969  0.7159 
fbCMU  0.6738  0.9546  0.9905  0.8761  0.5724  0.8571  0.5000 
reality  0.7619  0.3135  0.2322  0.1594  0.6027  0.0164  0.0080 
gene  0.8108  0.9298  0.9151  0.8342  0.4201  0.1667  0.1429 
citeseer  0.5000  0.6667  0.6800  0.6220  0.0526  0.0526  0.0333 
cora  0.0800  0.9057  0.8611  0.8178  0.0870  0.0870  0.0500 
webkb  0.2222  0.9286  0.6154  0.8646  0.6667  0.3333  0.2222 
polretweet  0.5686  0.6492  0.0291  0.0918  0.6676  0.0421  0.0220 
webspam  0.8551  0.9331  0.9844  0.7382  0.9918  0.5312  0.5015 
fbrelationship  0.6249  0.9948  0.5390  0.8392  0.9999  0.5866  0.4972 
Enzymesg123  0.8667  0.8889  0.5696  0.6364  0.6768  0.5204  0.3902 
Enzymesg279  0.9999  0.4444  0.5179  0.4444  0.2929  0.3298  0.2747 
Enzymesg293  1.0000  0.4857  0.9444  0.3793  0.7677  0.5000  0.3023 
Enzymesg296  1.0000  0.7073  0.9286  0.7344  0.6406  0.5000  0.3212 
NCI109g4008  0.7619  0.4324  0.8462  0.8235  0.3500  0.4556  0.3204 
NCI109g1709  0.4000  0.3171  0.1429  0.4615  0.3922  0.3654  0.1333 
NCI109g3713  0.4074  0.3793  0.7500  0.4583  0.6667  1.0000  0.2000 
NCI1g3700  0.4074  0.3793  0.7500  0.4583  0.3333  0.6667  0.2500 
Avg. Rank  4.59  4.77  4.64  4.32  4.27  3.27  1 
4.1. Clustering
We quantitatively evaluate the proposed approach by comparing it against a wide range of state-of-the-art community detection methods on multiple heterogeneous graphs from a variety of application domains with fundamentally different structural properties (Rossi and Ahmed, 2015a).

Densest Subgraph (DSH) (Khuller and Saha, 2009): This baseline finds an approximation of the densest subgraph in G using degeneracy ordering (Erdős and Hajnal, 1966; Rossi and Ahmed, 2014). Given a graph G with n nodes, let G_i denote the subgraph induced by i nodes. At the start, G_n = G and thus i = n. At each step, the node with smallest degree is selected from G_i and removed to obtain G_{i-1}. Afterwards, we update the corresponding degrees and the density ρ(G_{i-1}). This is repeated to obtain G_n, G_{n-1}, ..., G_1. From these, we select the subgraph G_i with maximum density ρ(G_i).
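As an illustration, the peeling procedure above can be sketched in pure Python with a lazy min-heap; the function name and adjacency-dict representation are our own, and greedy min-degree peeling is the standard 2-approximation for maximum average-degree density.

```python
import heapq

def densest_subgraph(adj):
    """Greedy peeling sketch: repeatedly remove the min-degree node and keep
    the densest prefix. `adj` maps each node to a set of neighbors."""
    deg = {v: len(ns) for v, ns in adj.items()}
    m = sum(deg.values()) // 2
    n = len(adj)
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    removed, order = set(), []
    best_density, best_cut = m / n, 0
    while n > 1:
        d, v = heapq.heappop(heap)
        if v in removed or d != deg[v]:
            continue                      # stale heap entry, skip
        removed.add(v)
        order.append(v)
        m -= deg[v]                       # edges incident to v leave the graph
        n -= 1
        for u in adj[v]:
            if u not in removed:
                deg[u] -= 1
                heapq.heappush(heap, (deg[u], u))
        density = m / n
        if density > best_density:
            best_density, best_cut = density, len(order)
    keep = set(adj) - set(order[:best_cut])
    return keep, best_density
```

On a triangle with a pendant node, the whole graph already attains the maximum density (edges/nodes = 1.0), so nothing is peeled off.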

Label Propagation (LPH) (Raghavan et al., 2007): Label propagation takes a labeling of the graph (in this case, induced by the heterogeneous graph), then, for each node in some random ordering of the nodes, the node label is updated to the label with maximal frequency among its neighbors. This iterative process converges when every node carries the most frequent label among its neighbors, or the number of iterations can be fixed. The final labeling induces a partitioning of the graph, and the partition with maximum modularity is selected.
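A minimal sketch of this update rule (the function name, seeding, and random tie-breaking are our own choices; the baseline itself starts from the type-induced labeling):

```python
import random
from collections import Counter

def label_propagation(adj, labels, max_iter=100, seed=0):
    """Sketch of synchronous-in-order label propagation.
    adj: node -> set of neighbors; labels: initial node -> label."""
    rng = random.Random(seed)
    labels = dict(labels)
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)                 # random node ordering each pass
        changed = False
        for v in nodes:
            if not adj[v]:
                continue
            freq = Counter(labels[u] for u in adj[v])
            top = max(freq.values())
            # break ties uniformly at random among maximal labels
            best = rng.choice([l for l, c in freq.items() if c == top])
            if labels[v] != best:
                labels[v] = best
                changed = True
        if not changed:                    # every node agrees with its neighbors
            break
    return labels
```

For example, on a 4-clique where three nodes start with label 'x' and one with 'y', every node sees a majority of 'x' neighbors, so the process converges to all 'x' regardless of ordering.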

Louvain (LouvH) (Blondel et al., 2008): Louvain performs a greedy optimization of modularity by forming small, locally optimal communities then grouping each community into one node. It iterates over this two-phase process until modularity cannot be maximized locally. The community with maximum modularity is selected.
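The modularity objective that Louvain optimizes can be illustrated with a small scoring routine; this is only the objective, not the full two-phase optimization, and the function name and graph representation are our own.

```python
def modularity(adj, comm):
    """Newman modularity Q = (1/2m) * sum over same-community node pairs of
    (A_uv - k_u * k_v / 2m). adj: node -> set of neighbors (undirected,
    no self-loops); comm: node -> community id."""
    m2 = sum(len(ns) for ns in adj.values())   # 2m (each edge counted twice)
    q = 0.0
    for v in adj:
        for u in adj:
            if comm[v] != comm[u]:
                continue
            a = 1.0 if u in adj[v] else 0.0    # adjacency indicator A_uv
            q += a - len(adj[v]) * len(adj[u]) / m2
    return q / m2
```

For two triangles joined by a single bridge edge, splitting by triangle gives Q = 5/14 ≈ 0.357, matching the per-community formula Q = Σ_c (e_c/m − (d_c/2m)²).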

Spectral Clustering (SpecH) (Chung, 1997): This baseline executes spectral clustering on the normalized Laplacian of the adjacency matrix to greedily build the sweeping cluster that minimizes conductance.
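The sweep over prefix cuts used by these spectral baselines can be sketched as follows, assuming the node ordering (e.g., nodes sorted by the second eigenvector of the normalized Laplacian) has already been computed; the function name and representation are our own.

```python
def sweep_cut(adj, order):
    """Greedy sweep: among all prefixes S of `order`, return the one with
    minimum conductance cut(S) / min(vol(S), vol(V \\ S)).
    adj: node -> set of neighbors (undirected, no self-loops)."""
    total_vol = sum(len(ns) for ns in adj.values())
    in_set = set()
    vol_s = cut = 0
    best_phi, best_k = float('inf'), 0
    for k, v in enumerate(order[:-1], start=1):   # never take the whole graph
        in_set.add(v)
        vol_s += len(adj[v])
        # edges from v into S stop being cut edges; edges out of S become cut
        cut += sum(-1 if u in in_set else 1 for u in adj[v])
        phi = cut / min(vol_s, total_vol - vol_s)
        if phi < best_phi:
            best_phi, best_k = phi, k
    return set(order[:best_k]), best_phi
```

On two triangles joined by one bridge edge, sweeping in the natural order finds the bridge cut with conductance 1/7.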

Untyped-Graphlet Spec. Clustering (GSpecH) (Benson et al., 2016): This baseline computes the untyped-graphlet adjacency matrix and executes spectral clustering on the normalized Laplacian of this matrix to greedily build the sweeping cluster that minimizes the untyped-graphlet conductance.
Note that we append H to the original method name to indicate that it was adapted to support community detection in arbitrary heterogeneous graphs (Figure 1), since the original methods were not designed for such graph data.
Graph  DS  KC  LP  Louv  Spec  GSpec  Mean Gain
yahoomsg  9.69x  11.27x  3.98x  5.59x  1.22x  3.40x  5.86x 
dbpedia  29.78x  22.43x  18.08x  33.14x  39.01x  37.77x  30.03x 
digg  1030x  1110x  1888x  808x  1.50x  1.00x  806.75x 
movielens  1.81x  1.93x  1.54x  1.72x  2.00x  1.20x  1.70x 
citeulike  1.38x  1.39x  1.34x  1.21x  1.39x  1.39x  1.35x 
fbCMU  1.35x  1.91x  1.98x  1.75x  1.14x  1.71x  1.64x 
reality  95.24x  39.19x  29.02x  19.92x  75.34x  2.05x  43.46x 
gene  5.67x  6.51x  6.40x  5.84x  2.94x  1.17x  4.75x 
citeseer  15.02x  20.02x  20.42x  18.68x  1.58x  1.58x  12.88x 
cora  10.00x  13.33x  17.22x  16.36x  1.74x  1.74x  10.07x 
webkb  1.00x  4.18x  2.77x  3.89x  3.00x  1.50x  2.72x 
polretweet  25.85x  29.51x  1.32x  4.17x  30.35x  1.91x  15.52x 
webspam  1.71x  1.86x  1.96x  1.47x  1.98x  1.06x  1.67x 
fbrelationship  1.26x  2.00x  1.08x  1.69x  2.01x  1.18x  1.54x 
Enzymesg123  2.22x  2.28x  1.46x  1.63x  1.73x  1.33x  1.78x 
Enzymesg279  3.64x  1.62x  1.89x  1.62x  1.07x  1.20x  1.84x 
Enzymesg293  3.31x  1.61x  3.12x  1.25x  2.54x  1.65x  2.25x 
Enzymesg296  3.11x  2.20x  2.89x  2.29x  1.99x  1.56x  2.34x 
NCI109g4008  2.38x  1.35x  2.64x  2.57x  1.09x  1.42x  1.91x 
NCI109g1709  3.00x  2.38x  1.07x  3.46x  2.94x  2.74x  2.60x 
NCI109g3713  2.04x  1.90x  3.75x  2.29x  3.33x  5.00x  3.05x 
NCI1g3700  1.63x  1.52x  3.00x  1.83x  1.33x  2.67x  2.00x 
Mean Gain  56.89x  58.23x  91.62x  42.74x  8.24x  3.47x  (43.53x) 
We evaluate the quality of communities using their external conductance score (Gleich and Seshadhri, 2012; Almeida et al., 2011). This measure has been identified as one of the most important cut-based measures in a seminal survey by Schaeffer (Schaeffer, 2007) and has been extensively studied in many disciplines and applications (Shi and Malik, 2000; Voevodski et al., 2009; Chung, 1997; Gleich and Seshadhri, 2012; Almeida et al., 2011; Kannan et al., 2004; Schaeffer, 2007). Results are reported in Table 3. As an aside, all methods take as input the same heterogeneous graph. Overall, the results in Table 3 indicate that the proposed approach is able to reveal higher-quality clusters across a wide range of heterogeneous graphs. The heterogeneous network statistics and properties, including the number of unique typed motifs for each induced subgraph pattern, are shown in Table 2.
We also provide the improvement (gain) achieved by TGS clustering over the other methods in Table 4. The improvement is simply φ_m/φ_TGS, where φ_m is the external conductance of the solution given by algorithm m and TGS denotes the proposed algorithm. Values less than 1 indicate that TGS performed worse than the other method, whereas values greater than 1 indicate the improvement factor achieved by TGS. Overall, TGS achieves a mean improvement of 43.53x over all graph data and baseline methods (Table 4). Note the last column of Table 4 reports the mean improvement achieved by TGS over all methods for each graph, whereas the last row reports the mean improvement achieved by TGS over all graphs for each method. Figure 2 shows how typed-graphlet conductance (Eq. 1) changes as a function of community size for three different typed-graphlets.
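The gain ratios and the row/column means of Table 4 can be sketched as follows; the function name, data layout, and the toy conductance values in the example are our own.

```python
def improvement_gains(phi, baselines, ours='TGS'):
    """Sketch of the Table 4 computation.
    phi: {graph: {method: external conductance}}; gain_m = phi_m / phi_ours.
    Returns per-(graph, method) gains, the per-graph mean over methods
    (last column), and the per-method mean over graphs (last row)."""
    gains = {g: {m: phi[g][m] / phi[g][ours] for m in baselines} for g in phi}
    mean_per_graph = {g: sum(r.values()) / len(r) for g, r in gains.items()}
    mean_per_method = {m: sum(gains[g][m] for g in gains) / len(gains)
                       for m in baselines}
    return gains, mean_per_graph, mean_per_method
```

For instance, with hypothetical conductances {DS: 0.4, TGS: 0.2} the gain of TGS over DS on that graph is 2.0x.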
4.2. Link Prediction in Heterogeneous Graphs
This section quantitatively demonstrates the effectiveness of TGS for link prediction.
4.2.1. Higher-order Typed-Graphlet Embeddings
In Section 4.1 we used the approach for higher-order clustering and quantitatively evaluated the quality of the resulting clusters. In this section, we use the approach proposed in Section 2 to derive higher-order typed-graphlet node embeddings and quantitatively evaluate them for link prediction. Algorithm 2 summarizes the method for deriving higher-order typed motif-based node embeddings (as opposed to clusters/partitions of nodes, or an ordering for compression/analysis, see Section 4.3). In particular, given a typed-graphlet adjacency matrix, Algorithm 2 outputs a matrix of node embeddings. For graphs with many connected components, Algorithm 2 is called for each connected component and the resulting embeddings are stored in the appropriate locations in the overall embedding matrix.
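A minimal sketch of the per-component embedding step, assuming NumPy and a dense typed-graphlet adjacency matrix for a single connected component; the function name and the row-normalization step are our own simplifications, not the authors' exact Algorithm 2.

```python
import numpy as np

def higher_order_embedding(W, d):
    """Sketch: spectral node embeddings from a typed-graphlet weighted
    adjacency matrix W (symmetric, one connected component).
    Returns the d eigenvectors of the normalized Laplacian with smallest
    nontrivial eigenvalues, stacked as an (n, d) embedding matrix."""
    n = W.shape[0]
    deg = W.sum(axis=1)
    dis = 1.0 / np.sqrt(np.maximum(deg, 1e-12))          # D^{-1/2}, guarded
    L = np.eye(n) - dis[:, None] * W * dis[None, :]       # normalized Laplacian
    vals, vecs = np.linalg.eigh(L)                        # ascending eigenvalues
    Z = vecs[:, 1:d + 1]                                  # skip trivial eigenvector
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return Z / np.maximum(norms, 1e-12)                   # row-normalize
```

For example, a 4-node cycle yields a (4, 2)-dimensional embedding matrix.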
Graph  |T|  Heterogeneous Edge Types
movielens  3  user-by-movie, user-by-tag, tag-by-movie
dbpedia  4  person-by-work (produced work), person-has-occupation, work-by-genre (work-associated-with-genre)
yahoomsg  2  user-by-user (communicated with), user-by-location (communication location)
4.2.2. Experimental Setup
We evaluate the higher-order typed-graphlet node embedding approach (Algorithm 2) against the following methods: DeepWalk (DW) (Perozzi et al., 2014), LINE (Tang et al., 2015), GraRep (Cao et al., 2015), spectral embedding (untyped edge motif) (Ng et al., 2002), and spectral embedding using untyped-graphlets. All methods output D(=128)-dimensional node embeddings. For DeepWalk (DW) (Perozzi et al., 2014), we perform 10 random walks per node of length 80 as mentioned in (Grover and Leskovec, 2016). For LINE (Tang et al., 2015), we use 2nd-order proximity and perform 60 million samples. For GraRep (GR) (Cao et al., 2015), we follow the setting of the original work. In contrast, the spectral embedding methods do not have any hyperparameters besides the embedding dimension D, which is fixed for all methods. As an aside, all methods used for comparison were modified to support heterogeneous graphs (similar to how the other baseline methods from Section 4.1 were modified). In particular, we adapted the methods to allow multiple graphs as input consisting of homogeneous or bipartite graphs that all share at least one node type (see Table 5 and Figure 1); from these graphs we construct a single large graph by simply ignoring the node and edge types and relabeling the nodes to avoid conflicts.
4.2.3. Comparison
Given a partially observed graph G with a fraction of missing/unobserved edges, the link prediction task is to predict these missing edges. We generate a labeled dataset of edges. Positive examples are obtained by removing a fraction of the edges uniformly at random, whereas negative examples are generated by randomly sampling an equal number of node pairs that are not connected by an edge. For each method, we learn embeddings using the remaining graph. Using the embeddings from each method, we then learn a logistic regression (LR) model to predict whether a given edge in the test set exists in G or not. Experiments are repeated for 10 random seed initializations and the average performance is reported. All methods are evaluated against four different evaluation metrics: F1, Precision, Recall, and AUC.

DW  LINE  GR  Spec  GSpec  TGS
movielens  F1  0.8544  0.8638  0.8550  0.8774  0.8728  0.9409  
Prec.  0.9136  0.8785  0.9235  0.9409  0.9454  0.9747  
Recall  0.7844  0.8444  0.7760  0.8066  0.7930  0.9055  
AUC  0.9406  0.9313  0.9310  0.9515  0.9564  0.9900  
dbpedia  F1  0.8414  0.7242  0.7136  0.8366  0.8768  0.9640  
Prec.  0.8215  0.7754  0.7060  0.7703  0.8209  0.9555  
Recall  0.8726  0.6375  0.7323  0.9669  0.9665  0.9733  
AUC  0.8852  0.8122  0.7375  0.9222  0.9414  0.9894  
yahoo  F1  0.6927  0.6269  0.6949  0.9140  0.8410  0.9303  
Prec.  0.7391  0.6360  0.7263  0.9346  0.8226  0.9432  
Recall  0.5956  0.5933  0.6300  0.8904  0.8699  0.9158  
AUC  0.7715  0.6745  0.7551  0.9709  0.9272  0.9827  
Note DW=DeepWalk and GR=GraRep. 
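The edge-sampling protocol used to build the labeled dataset can be sketched as follows; the function name, graph representation, and the held-out fraction parameter are our own (the exact fraction is not specified here).

```python
import random

def make_link_dataset(adj, frac=0.5, seed=0):
    """Sketch of the link prediction setup: hold out `frac` of the edges
    uniformly at random as positive test examples, keep the rest as the
    training graph, and sample an equal number of non-edges as negatives.
    adj: node -> set of neighbors (undirected)."""
    rng = random.Random(seed)
    edges = sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})
    rng.shuffle(edges)
    n_pos = int(frac * len(edges))
    pos, train = edges[:n_pos], edges[n_pos:]
    edge_set = set(edges)
    nodes = sorted(adj)
    neg = set()
    while len(neg) < n_pos:                  # rejection-sample non-edges
        u, v = rng.sample(nodes, 2)
        pair = tuple(sorted((u, v)))
        if pair not in edge_set:
            neg.add(pair)
    return train, pos, sorted(neg)
```

Embeddings are then learned on the training graph only, and a classifier (e.g., logistic regression) is fit on the positive/negative edge examples.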
For link prediction (Liben-Nowell and Kleinberg, 2003; Ahmed et al., 2018), entity resolution/network alignment, recommendation, and other machine learning tasks that require edge embeddings (features) (Rossi et al., 2018a), we derive edge embedding vectors by combining the learned node embedding vectors of the corresponding nodes using an edge embedding function Φ. More formally, given D-dimensional embedding vectors z_u and z_v for nodes u and v, we derive a D-dimensional edge embedding vector z_uv = Φ(z_u, z_v), where Φ is one of a set of element-wise edge embedding functions; these include the element-wise (Hadamard) product, the Hadamard power, and the element-wise max.
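A minimal sketch of such element-wise edge embedding functions; the operator set follows common practice for node embedding methods (e.g., node2vec), and the function signature is our own.

```python
def edge_embedding(z_u, z_v, op='hadamard'):
    """Combine two D-dimensional node embedding vectors into one
    D-dimensional edge embedding vector, element-wise."""
    if op == 'mean':
        return [(a + b) / 2 for a, b in zip(z_u, z_v)]
    if op == 'hadamard':                     # element-wise product
        return [a * b for a, b in zip(z_u, z_v)]
    if op == 'max':                          # element-wise max
        return [max(a, b) for a, b in zip(z_u, z_v)]
    if op == 'l1':                           # absolute element-wise difference
        return [abs(a - b) for a, b in zip(z_u, z_v)]
    if op == 'l2':                           # squared element-wise difference
        return [(a - b) ** 2 for a, b in zip(z_u, z_v)]
    raise ValueError('unknown edge embedding function: %s' % op)
```

In the evaluation, each operator yields a different edge feature set, and the best-performing one is reported per method.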
Table 5 summarizes the heterogeneous network data used for link prediction. In particular, the types used in each of the heterogeneous networks are shown in Table 5, as well as the specific types involved in the edges that are predicted (i.e., the edge type being predicted). The results are provided in Table 6, shown for the best edge embedding function. In Table 6, TGS outperforms all other methods across all four evaluation metrics. In all cases, the higher-order typed-graphlet spectral embedding outperforms the other methods (Table 6), with an overall mean gain (improvement) in F1 of 18.7% (and up to 48.4% improvement) across all graph data. In terms of AUC, TGS achieves a mean gain of 14.4% (and up to 45.7% improvement) over all methods. We posit that an approach similar to the one proposed in (Rossi et al., 2018a) could be used with the typed-graphlet node embeddings to achieve even better predictive performance. This approach would allow us to leverage multiple typed-graphlet Laplacian matrices for learning more appropriate higher-order node embeddings.
4.3. Graph Compression
In Section 4.1 we used the approach for higher-order clustering, whereas Section 4.2 demonstrated its effectiveness for link prediction. However, the framework can be leveraged for many other important applications including graph compression (Boldi and Vigna, 2004; Boldi et al., 2011; Chierichetti et al., 2009; Rossi and Zhou, 2018; Liakos et al., 2014). In this section, we explore the proposed approach for graph compression. Compression has two key benefits. First, it reduces the amount of I/O traffic (Rossi and Zhou