Higher-Order Clustering in Heterogeneous Information Networks

11/28/2018 ∙ by Yu Shi, et al. ∙ University of Illinois at Urbana-Champaign

As one type of complex network widely seen in real-world applications, heterogeneous information networks (HINs) often encapsulate higher-order interactions that crucially reflect the complex relationships among nodes and edges in real-world data. Modeling higher-order interactions in HINs facilitates the user-guided clustering problem by providing an informative collection of signals. At the same time, network motifs have been used extensively to reveal higher-order interactions and network semantics in homogeneous networks. It is therefore natural to extend the use of motifs to HINs, and we tackle the problem of user-guided clustering in HINs by using motifs. We highlight the benefits of comprehensively modeling higher-order interactions instead of decomposing complex relationships into pairwise interactions. We propose the MoCHIN model, which is applicable to arbitrary forms of HIN motifs, a property often necessary in HINs due to the rich and diverse semantics encapsulated in their heterogeneity. Since the tensor size in our model grows exponentially with the number of nodes, we propose an efficient inference algorithm for MoCHIN to overcome this curse of dimensionality. In our experiments, MoCHIN surpasses all baselines in three evaluation tasks under different metrics. Additional experiments also demonstrate the advantage of our model when the supervision is weak.




1. Introduction

Heterogeneous information networks (HINs) have been shown to be a powerful approach to modeling linked objects in real-world scenarios with rich and informative type information (Shi et al., 2017; Sun and Han, 2013). Many HIN-based methodologies have been proposed for applications such as classification, clustering, recommendation, and outlier detection (Shi et al., 2017; Sun and Han, 2013). Meanwhile, complex real-world networks often embody mechanisms driven by underlying higher-order interactions (Sporns and Kötter, 2004; Pržulj, 2007; Ugander et al., 2013; Benson et al., 2016; Milo et al., 2002; Yaveroğlu et al., 2014; Benson et al., 2018), where the “players” in the interactions are nodes in the network. Researchers have therefore used network motifs to reveal such higher-order interactions. Leveraging motifs has been shown to be useful in tasks such as clustering (Yin et al., 2017; Benson et al., 2016), ranking (Zhao et al., 2018), and representation learning (Sankar et al., 2017; Zhang et al., 2018). Note that the term higher-order interaction is sometimes used interchangeably with high-order interaction in the literature (Zhou et al., 2017b), and the problem of clustering using signals from higher-order interactions is referred to as higher-order clustering (Yin et al., 2017; Benson et al., 2016; Yin et al., 2018). We also remark that motifs in the context of HINs are sometimes referred to as meta-graphs; we choose motif over meta-graph in this paper primarily because meta-graph has been used under a different definition in the study of clustering (Strehl and Ghosh, 2002; Mimaroglu and Erdil, 2011; Punera and Ghosh, 2007), as discussed in Section 2.

Figure 1. Overview of the proposed method MoCHIN, which directly models all players in higher-order interactions. Each node type in the HIN corresponds to a color and a shape in the figure. To leverage signals from higher-order interactions without collapsing them into pairwise interactions, MoCHIN transcribes such information into a series of tensors. The order of each tensor is identical to the number of nodes in the corresponding motif, and the tensors constructed in this way are sparse. In the task of user-guided clustering in HINs, these tensors provide a rich pool of fine-grained semantics and can thereby fit a wider spectrum of guidance provided by different users.

Clustering is a traditional and fundamental task in network mining (Han et al., 2011). In the context of HINs, the problem of user-guided clustering is of particular interest: HINs with nodes and edges of different types can have multiple semantic facets, and user guidance on the intended semantic facet is often needed to generate more specific and meaningful clustering results (Sun et al., 2012; Shi et al., 2017; Luo et al., 2014; Jiang et al., 2017; Gujral and Papalexakis, 2018). Exploiting the higher-order interactions revealed by motifs offers an opportunity to better solve this important problem, since it can generate a richer pool of signals reflecting the rich semantics of an HIN and thereby better fit different users’ guidance.

However, it is challenging to develop a principled HIN clustering method that exploits signals from higher-order interactions revealed by motifs as comprehensively as possible, because most network clustering algorithms are based on signals concerning the relatedness between pairs of nodes (Han et al., 2011). A body of research has shown that it is beneficial to derive a set of features for each node pair using motifs (Huang et al., 2016; Fang et al., 2016; Zhao et al., 2017; Zhou et al., 2017a; Jiang et al., 2017; Liu et al., 2018b), after which clustering can be obtained by traditional pairwise methods. However, this approach collapses each higher-order interaction into pairwise interactions, an irreversible process that can cause information loss. For example, consider a motif instance involving three players: nodes A, B, and C. After collapsing the higher-order interaction among A, B, and C into pairwise interactions, we can still sense the tie between A and C. However, this tie has an essential semantic facet that depends on B, which encodes the essential information about the higher-order interaction. The connection between this semantic facet and node B would be lost if only the collapsed pairwise interactions were modeled. In user-guided clustering, the information accessible only through the concrete node B can be critical in determining whether A and C should be clustered together. We will further discuss this point with a real-world example in Section 4 and experiments in Section 7.

Furthermore, while it is easy to come up with semantically meaningful HIN motifs, as with meta-paths (Huang et al., 2016; Fang et al., 2016; Sun et al., 2012), motifs in HINs can have more complex topology than motifs in homogeneous networks, which are very often restricted to be triadic (Benson et al., 2016; Yin et al., 2017; Paranjape et al., 2017). To fully unleash the power of HIN motifs and leverage the signals they extract, we are motivated to propose a method that applies to arbitrary forms of HIN motifs without additional constraints.

To tackle these challenges, we propose to directly model higher-order interactions by comprehensively transcribing them via motifs into a series of tensors. In this way, all nodes involved in a higher-order interaction contribute to the effort of finding clusters. Based on this intuition, we propose the MoCHIN model, short for Motif-based Clustering in HINs, with an overview illustrated in Figure 1. MoCHIN first transcribes the information revealed by motifs into a series of tensors and then performs clustering by joint non-negative tensor decomposition with an additional mechanism to reflect user guidance. This approach does not rely on pairwise clustering methods and can hence better retain the information captured by different motifs to suit the needs of user-guided clustering in semantic-rich HINs.

In this direction, an additional challenge arises from inducing tensors via the corresponding motifs, because the size of a tensor grows exponentially with the number of nodes involved in the motif. Fortunately, motif instances are often sparse in real-world networks, just as the number of edges is usually significantly smaller than the number of node pairs in a large real-world network; this observation is corroborated in Section 7.6. We hence develop an efficient inference algorithm that takes advantage of the sparsity of the derived tensors and the structure of the proposed MoCHIN model. Experiments on two real-world datasets and three tasks validate the effectiveness of the proposed model and the inference algorithm. We will release the code and the processed data used in the experiments once the paper is published. Lastly, we summarize our contributions as follows:

  1. We identify the utility of modeling higher-order interactions without collapsing them into pairwise interactions, thereby avoiding the loss of the rich and subtle information captured by motifs.

  2. We propose the MoCHIN model, which captures higher-order interactions via motif-based comprehensive transcription and applies to arbitrarily many, arbitrary forms of HIN motifs. We also develop an efficient inference algorithm for MoCHIN that leverages the sparse nature of motif instances in real-world networks.

  3. Experiments on two real-world HINs and three tasks demonstrate the effectiveness and efficiency of the proposed method as well as the utility of motifs and the tensor-based modeling approach in the task of HIN clustering.

2. Related Work

Network motifs. The formation of complex networks is often partially attributed to higher-order interactions among objects in real-world scenarios (Benson et al., 2016; Milo et al., 2002; Yaveroğlu et al., 2014; Benson et al., 2018). Modeling such interactions has been shown to be useful in many research areas such as neuroscience (Sporns and Kötter, 2004), biological networks (Pržulj, 2007), and social networks (Ugander et al., 2013). Network motifs, or graphlets, are usually used to identify such higher-order interactions (Yin et al., 2017; Benson et al., 2016). One popular direction of research on network motifs has centered on efficiently counting motif instances such as triangles and more complex motifs (Ahmed et al., 2015; Bressan et al., 2017; Stefani et al., 2017; Jha et al., 2015). Motifs have also found applications in tasks such as network partitioning and clustering (Yin et al., 2017; Benson et al., 2016; Klymko et al., 2014; Tsourakakis et al., 2017) as well as ranking (Zhao et al., 2018). Researchers have also studied enriching motifs with additional attributes, such as temporal information, which has been shown to be instrumental in various network mining tasks (Paranjape et al., 2017; Li et al., 2018).

Motifs in heterogeneous information networks. In the context of HINs, network motifs are sometimes referred to as meta-graphs or meta-structures and have been studied recently (Sankar et al., 2017; Huang et al., 2016; Fang et al., 2016; Jiang et al., 2017; Zhao et al., 2017; Zhang et al., 2018; Fionda and Pirrò, 2017; Zhou et al., 2017a; Liu et al., 2018b; Liu et al., 2018a). A large portion of these works studies pairwise relationships such as relevance or similarity (Huang et al., 2016; Fang et al., 2016; Zhao et al., 2017; Fionda and Pirrò, 2017; Zhou et al., 2017a; Liu et al., 2018b; Liu et al., 2018a), and others address the problem of representation learning (Sankar et al., 2017; Zhang et al., 2018). Note that some of these prior works define meta-graphs or meta-structures to be directed acyclic graphs (Zhao et al., 2017; Huang et al., 2016; Zhang et al., 2018; Zhou et al., 2017a), while we do not enforce this restriction on the definition of HIN motifs in general. We also remark that the term “meta-graph” is sometimes defined as a derived graph with indicator vectors as its vertices (Strehl and Ghosh, 2002), and the clustering problem based on this definition of meta-graph has been studied for more than a decade (Strehl and Ghosh, 2002; Mimaroglu and Erdil, 2011; Punera and Ghosh, 2007). We therefore stick to the term “motif” to refer to the higher-order structural patterns of interest in this paper.

Clustering in heterogeneous information networks. As a fundamental data mining problem, clustering has been studied for HINs (Shi et al., 2017; Sun and Han, 2013; Sun et al., 2009; Shi et al., 2014; Li et al., 2017; Sun et al., 2012). One line of HIN clustering research leverages the synergetic effect of simultaneously tackling ranking and clustering (Sun et al., 2009; Shi et al., 2014; Chen et al., 2015). Clustering on specific types of HINs, such as those with additional attributes, has also been studied (Li et al., 2017; Sun et al., 2012). As in our paper, Wu et al. (Wu et al., 2017) resort to tensors to represent HINs for clustering. Their solution employs one tensor to describe one HIN as a whole and does not model the different semantics implied by different structural patterns.

User guidance, or semi-supervision, brings significantly more potential to HIN clustering by providing a small portion of seeds (Sun et al., 2012; Shi et al., 2017). This is because HINs often carry rich semantics from different facets, and user-guided clustering enables users to inject their intention regarding the semantics of the clustering results. To reveal the different semantics in an HIN, pioneering works exploit the meta-path, a special case of the motif, and reflect user guidance by using the corresponding meta-paths (Sun et al., 2012; Luo et al., 2014).

To the best of our knowledge, there are no existing studies on motif-based HIN clustering applicable to arbitrarily many, arbitrary forms of HIN motifs. A meta-graph–guided random walk algorithm has been shown to outperform using meta-paths alone, but it requires the motifs used to have undirected edges to ensure symmetry in the transition matrix (Jiang et al., 2017). Also, due to the design of how a random walk is sampled under a motif, this method does not distinguish the semantics of motif AP4TPA from those of APTPA, as will be shown in Section 4. Sankar et al. (Sankar et al., 2017) propose a convolutional neural network method based on motifs that can potentially be used for user-guided HIN clustering; this approach restricts the motifs of concern to those with a target node, a context node, and auxiliary nodes. Gujral et al. (Gujral and Papalexakis, 2018) propose a method based on a tensor constructed by stacking a set of adjacency matrices, which can successfully reflect user guidance and different semantic aspects. While in practice one can derive an adjacency matrix by counting instances under a meta-path or a higher-order motif, this tensor-based method essentially leverages features derived for node pairs instead of directly modeling higher-order interactions among multiple nodes.

Matrix and tensor factorization for clustering. By factorizing edges that represent pairwise interactions in a network, matrix factorization has been shown to reveal the underlying composition of objects (Lee and Seung, 1999). In this direction, a large body of work has applied non-negative matrix factorization (NMF) to network clustering (Liu et al., 2013; Lee and Seung, 2001; Ding et al., 2006). As a natural extension beyond pairwise interactions, tensors have been used to model interactions among multiple objects for decades (Tucker, 1966; Harshman, 1970), and a wide range of applications have been discussed in data mining and machine learning (Papalexakis et al., 2017; Kolda and Bader, 2009).

For the study of clustering and related problems, many algorithms have been developed for homogeneous networks by factorizing a single tensor (Shashua and Hazan, 2005; Cao et al., 2016; Sheikholeslami et al., 2016; Benson et al., 2015; Cao et al., 2015). A line of work transforms a network into a 3rd-order tensor via triangles, which is essentially one specific type of network motif (Sheikholeslami et al., 2016; Benson et al., 2015). Researchers have also explored weak supervision to guide tensor-factorization-based analysis (Cao et al., 2016), and a large number of non-negative tensor factorization methods have been proposed for practical problems in computer vision (Shashua and Hazan, 2005). Besides, tensor-based approximation algorithms for clustering also exist in the literature (Sutskever et al., 2009; Cao et al., 2015). One recent work on local network clustering considering higher-order conductance shares our intuition, since it operates on a tensor transcribed by a motif without decomposing it into pairwise interactions (Zhou et al., 2017b); however, this method is designed for the scenario where a single motif is given. Different from the approach proposed in our paper, none of the above methods is designed for heterogeneous information networks, where the use of multiple motifs is usually necessary to reflect the rich semantics of HINs. Finally, we remark that, to the best of our knowledge, existing tensor-based clustering methods for HINs (Gujral and Papalexakis, 2018; Wu et al., 2017) either do not jointly model multiple motifs or essentially decompose the higher-order interactions into pairwise interactions.

3. Preliminaries

In this section, we define related concepts and notations.

Definition 3.1 (Heterogeneous information network and schema (Sun and Han, 2013)).

An information network is a directed graph $G = (V, E)$ with a node type mapping $\varphi: V \rightarrow \mathcal{T}$ and an edge type mapping $\psi: E \rightarrow \mathcal{R}$. When the number of node types $|\mathcal{T}| > 1$ or the number of edge types $|\mathcal{R}| > 1$, the network is referred to as a heterogeneous information network (HIN). The schema of an HIN is an abstraction of the meta-information of the node types and edge types of the given HIN.

As an example, Figure 2(a) illustrates the schema of the DBLP network that we use in Section 7. Additionally, we denote the set of all nodes with the same type $t \in \mathcal{T}$ by $V_t$.

(a) The schema (Shi et al., 2018).
(b) The AP4TPA motif.
(c) The APTPA motif, which is also a meta-path.
Figure 2. Examples of schema and motif in the DBLP network.
Definition 3.2 (HIN motif and HIN motif instance).

In an HIN $G$, an HIN motif is a structural pattern defined by a graph on the type level, with each of its nodes being a node type of the given HIN and each of its edges being an edge type of the given HIN. Additional constraints can optionally be added, such as requiring that two nodes in the motif cannot be simultaneously matched to the same node instance in the given HIN. Further, given an HIN motif, an HIN motif instance under this motif is a subnetwork of the HIN that matches this pattern.

Figure 2(b) gives an example of a motif in the DBLP network with four distinct terms, which we refer to as AP4TPA. If a motif is a path graph, it can also be considered a meta-path (Sun et al., 2011, 2012). The motif APTPA in Figure 2(c) is one such example.

Definition 3.3 (Tensor, $n$-mode product, mode-$n$ matricization (Papalexakis et al., 2017)).

A tensor is a multidimensional array. For an $N$-th–order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, we denote its entry by $\mathcal{X}_{i_1 \ldots i_N}$. The $n$-mode product of $\mathcal{X}$ and a matrix $M \in \mathbb{R}^{J \times I_n}$ is denoted by $\mathcal{X} \times_n M$, where $(\mathcal{X} \times_n M)_{i_1 \ldots i_{n-1} j i_{n+1} \ldots i_N} = \sum_{i_n = 1}^{I_n} \mathcal{X}_{i_1 \ldots i_N} M_{j i_n}$. We denote by $\mathcal{X}_{(n)} \in \mathbb{R}^{I_n \times \prod_{k \neq n} I_k}$ the mode-$n$ matricization, i.e., the mode-$n$ unfolding, of the tensor $\mathcal{X}$, whose columns are the mode-$n$ fibers of $\mathcal{X}$, i.e., the vectors obtained by varying the $n$-th index while fixing all other indices.

For simplicity, we denote $\mathcal{X} \times_{k \neq n} \{M_k\} := \mathcal{X} \times_1 M_1 \cdots \times_{n-1} M_{n-1} \times_{n+1} M_{n+1} \cdots \times_N M_N$. Additionally, we define $\bigotimes_{k \neq n} M_k := M_N \otimes \cdots \otimes M_{n+1} \otimes M_{n-1} \otimes \cdots \otimes M_1$, where $\otimes$ is the Kronecker product (Papalexakis et al., 2017).
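To make the mode product and matricization concrete, here is a minimal numpy sketch (illustrative only, not part of the MoCHIN implementation) that follows the standard convention in which the mode-$n$ fibers become the columns of $\mathcal{X}_{(n)}$, and checks the identity $(\mathcal{X} \times_n M)_{(n)} = M \mathcal{X}_{(n)}$:

```python
import numpy as np

def mode_n_product(T, M, n):
    """n-mode product T x_n M: contract mode n of T with the second axis of M."""
    # tensordot sums T's axis n against M's axis 1; the new axis of size
    # M.shape[0] appears last, so move it back to position n.
    return np.moveaxis(np.tensordot(T, M, axes=(n, 1)), -1, n)

def unfold(T, n):
    """Mode-n matricization: mode n indexes the rows, C-order on the rest."""
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

rng = np.random.default_rng(0)
T = rng.random((3, 4, 5))   # a 3rd-order tensor
M = rng.random((6, 4))      # a J x I_n matrix for n = 1

Y = mode_n_product(T, M, 1)
assert Y.shape == (3, 6, 5)
# Matricization identity: (T x_n M)_(n) = M @ T_(n)
assert np.allclose(unfold(Y, 1), M @ unfold(T, 1))
```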

Lastly, we introduce a useful lemma that converts the norm of the difference between two tensors to the norm of the difference between two matrices.

Lemma 3.4 (De Lathauwer et al., 2000).

For all $n \in \{1, \ldots, N\}$,

$$\| \mathcal{X} - \mathcal{S} \times_1 M_1 \cdots \times_N M_N \|_F = \| \mathcal{X}_{(n)} - M_n \mathcal{S}_{(n)} (\bigotimes_{k \neq n} M_k)^T \|_F,$$

where $\| \cdot \|_F$ is the Frobenius norm.

Figure 3. A subnetwork of DBLP containing Eric Xing, David Blei, Hualiang Zhuang, Chengkai Li, and Pascual Martinez. According to the ground truth data, Xing and Blei graduated from the same research group. While many path instances can be observed between Eric Xing and authors from other groups, motif instances can only be found between Eric Xing and David Blei. Moreover, the concrete terms and papers involved in these motif instances are also informative if users wish to cluster together authors from the same research group.

4. Higher-Order Interactions in a Real-World Dataset

In this section, we use a real-world example to motivate the design of our proposed method, which aims to comprehensively and directly model the higher-order interactions revealed by motifs.

DBLP is a bibliographical network in the computer science domain (Tang et al., 2008) that contains nodes of types author, paper, term, etc. In Figure 3, we plot a subnetwork involving five authors: Eric Xing, David Blei, Hualiang Zhuang, Chengkai Li, and Pascual Martinez. According to a set of ground truth labels, Xing and Blei graduated from the same research group, while the other three authors graduated from other groups. Under meta-path APTPA, one can find many path instances from Eric Xing to authors from multiple groups. However, if we use motif AP4TPA, motif instances can only be found over Eric Xing and David Blei, but not between Xing and authors from other groups. This implies that motifs can provide more subtle information than meta-paths, and if a user wishes to cluster authors by research group, motif AP4TPA can be very informative.

Furthermore, if we look into the motif instances matched to Xing and Blei, the involved terms, such as dirichlet, are very specific to their group’s research interest. Modeling the interactions among dirichlet and other nodes can bring in more information even if users ultimately only wish to obtain clustering results on authors. If one only used motifs to generate features for node pairs without more comprehensively modeling the higher-order interactions revealed by motifs, such information would be lost. In Section 7, we further quantitatively validate the utility of comprehensively modeling higher-order interactions.

5. The MoCHIN model

In this section, we describe the proposed MoCHIN model step by step, with an emphasis on its intention to more comprehensively model higher-order interactions while leveraging user guidance.

5.1. Revisiting Clustering by Non-Negative Matrix Factorization

Non-negative matrix factorization (NMF) has been a popular method for network clustering (Liu et al., 2013; Lee and Seung, 2001; Ding et al., 2006). While additional constraints or regularization terms are usually enforced to ensure a unique solution or certain other properties, the basic NMF-based clustering algorithm solves the following optimization problem for a given adjacency matrix $A \in \mathbb{R}^{n \times n}$:

$$\min_{U \geq 0, \, V \geq 0} \| A - U^T V \|_F^2, \quad (1)$$

where $\| \cdot \|_F$ is the Frobenius norm, $U \geq 0$ denotes that matrix $U$ is non-negative, and $U, V \in \mathbb{R}^{K \times n}$ are two matrices with $K$ being the number of clusters. In this model, the $i$-th column of $U$ or that of $V$ gives the inferred cluster membership of the $i$-th node in the network. The intuition of the model stems from using the inner product of the cluster membership of the $i$-th node and that of the $j$-th node to reconstruct the existence of the edge represented by the non-zero entry $A_{ij}$ in the adjacency matrix.
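For intuition, the following toy numpy sketch (hypothetical data; a sketch of the classical Lee–Seung-style multiplicative updates, not the authors' implementation) fits this objective on a small adjacency matrix and checks that the reconstruction error decreases:

```python
import numpy as np

rng = np.random.default_rng(42)
# Toy adjacency matrix: two 3-node cliques joined by one weak edge.
A = np.zeros((6, 6))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
A[2, 3] = A[3, 2] = 0.1
np.fill_diagonal(A, 0.0)

K, n = 2, A.shape[0]
U = rng.random((K, n))
V = rng.random((K, n))
eps = 1e-9  # guards against division by zero

def loss(A, U, V):
    return np.linalg.norm(A - U.T @ V) ** 2

before = loss(A, U, V)
for _ in range(200):
    # Multiplicative updates for min ||A - U^T V||_F^2 s.t. U, V >= 0;
    # each step keeps the factors non-negative.
    V *= (U @ A) / (U @ U.T @ V + eps)
    U *= (V @ A.T) / (V @ V.T @ U + eps)
after = loss(A, U, V)

# Multiplicative updates monotonically decrease the reconstruction error.
assert after < before
```

The columns of the learned factors can then be read as (unnormalized) cluster memberships of the corresponding nodes.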

5.2. Single-Motif–Based Clustering in HINs

Recall that an edge essentially characterizes the pairwise interaction between two nodes. To model the higher-order interactions revealed by motifs without first collapsing them into pairwise interactions, a natural solution is to use the inferred cluster memberships of all nodes involved in a motif instance to reconstruct the existence of that motif instance. This solution can be formulated as non-negative tensor factorization (NTF); a line of research on NTF itself (Papalexakis et al., 2017; Shashua and Hazan, 2005) and on clustering algorithms that factorize a single tensor (Sheikholeslami et al., 2016; Benson et al., 2015; Cao et al., 2015) can be found in the literature.

Specifically, given a single motif with $N$ nodes having node types $t_1, t_2, \ldots, t_N$ of the HIN, we transcribe the higher-order interactions revealed by this motif into an $N$-th–order tensor $\mathcal{X}$ with dimension $|V_{t_1}| \times \cdots \times |V_{t_N}|$. We set the entry $\mathcal{X}_{i_1 \ldots i_N}$ to $1$ if a motif instance is found to be matched to the following nodes: the $i_1$-th of $V_{t_1}$, the $i_2$-th of $V_{t_2}$, …, the $i_N$-th of $V_{t_N}$; and we set it to $0$ otherwise. By extending Eq. (1), whose objective can be equivalently written as $\| A - \mathcal{I} \times_1 U^T \times_2 V^T \|_F^2$ with $\mathcal{I} \in \mathbb{R}^{K \times K}$ being the identity matrix, we can approach the clustering problem by solving

$$\min_{V^{(1)}, \ldots, V^{(N)} \geq 0} \; \| \mathcal{X} - \mathcal{I}_K \times_1 V^{(1)T} \cdots \times_N V^{(N)T} \|_F^2 + \lambda \sum_{n=1}^{N} \| V^{(n)} \|_1, \quad (2)$$

where $\mathcal{I}_K$ is the $N$-th–order identity tensor with dimension $K \times \cdots \times K$, i.e., the tensor with ones on the superdiagonal and zeros elsewhere, $\| \cdot \|_1$ is the entry-wise $\ell$-1 norm introduced as regularization to avoid trivial solutions, and $\lambda$ is the regularization coefficient. We also note that this formulation is essentially the CP decomposition (Hitchcock, 1927; Papalexakis et al., 2017) together with additional $\ell$-1 regularization and non-negativity constraints. In this paper, we write the CP decomposition part of the formulation in a form different from its most common one for notational convenience in the inference section (Section 6), considering the presence of regularization and constraints.
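To see why the identity-tensor form above is the CP form, note that contracting the superdiagonal identity tensor with the factor matrices yields exactly a sum of $K$ rank-one tensors, one per cluster. A toy numpy check (illustrative only, with random factor matrices):

```python
import numpy as np

rng = np.random.default_rng(1)
K, I1, I2, I3 = 2, 3, 4, 5
V = [rng.random((K, I)) for I in (I1, I2, I3)]  # K x I_n membership matrices

# 3rd-order "identity" tensor: ones on the superdiagonal.
Iden = np.zeros((K, K, K))
for k in range(K):
    Iden[k, k, k] = 1.0

def mode_n_product(T, M, n):
    return np.moveaxis(np.tensordot(T, M, axes=(n, 1)), -1, n)

# Identity tensor contracted with the transposed factors...
That = Iden
for n, Vn in enumerate(V):
    That = mode_n_product(That, Vn.T, n)

# ...equals the sum of K rank-one terms (the CP form).
cp = sum(np.einsum('i,j,k->ijk', V[0][k], V[1][k], V[2][k]) for k in range(K))
assert That.shape == (I1, I2, I3)
assert np.allclose(That, cp)
```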

Symbol: Definition
$V$, $E$: The set of nodes and the set of edges
$\mathcal{T}$, $\mathcal{R}$: The set of node types and the set of edge types
$\varphi$, $\psi$: The node type mapping and the edge type mapping
$V_t$: The set of all nodes with type $t$
$\mathcal{M}$: The set of candidate motifs
$N_m$: The number of nodes in motif $m$
$\mathcal{X}^{(m)}$: The tensor constructed from motif $m$
$\Theta^{(t)}$: The seed mask matrix for node type $t$
$V^{(m,n)}$: The cluster membership matrix for the $n$-th node in motif $m$
$C^{(t)}$: The consensus matrix for node type $t$
$\mu$: The vector of motif weights
$K$: The number of clusters
$\lambda$, $\alpha$, $\beta$: The hyperparameters
$\times_k$: The mode-$k$ product of a tensor and a matrix
$\otimes$: The Kronecker product of two matrices
Table 1. Summary of symbols

5.3. Proposed Model for Motif-Based Clustering in HINs

Real-world HINs often contain rich and diverse semantic facets due to their heterogeneity (Sun et al., 2012; Shi et al., 2018; Sun and Han, 2013). To reflect the different semantic facets of an HIN, a set of more than one candidate motif is usually necessary for the task of user-guided clustering. With the additional clustering seeds provided by users, the MoCHIN model selects the motifs that are both meaningful and pertinent to the seeds.

To this end, we assign motif-specific weights $\mu = (\mu_m)_{m \in \mathcal{M}}$ such that $\sum_{m \in \mathcal{M}} \mu_m = 1$ and $\mu_m \geq 0$ for all $m \in \mathcal{M}$. Denote by $\mathcal{X}^{(m)}$ the tensor constructed from motif $m$, by $V^{(m,n)}$ the cluster membership matrix for the $n$-th node in motif $m$, by $N_m$ the number of nodes in motif $m$, and by $t(m,n)$ the node type of the $n$-th node in motif $m$. For each node type $t$ of the HIN, we put together the cluster membership matrices concerning this type, with motif weights considered, to construct the consensus matrix

$$C^{(t)} = \frac{\sum_{m \in \mathcal{M}} \sum_{n=1}^{N_m} \mu_m \, \mathbb{1}[t(m,n) = t] \, V^{(m,n)}}{\sum_{m \in \mathcal{M}} \sum_{n=1}^{N_m} \mu_m \, \mathbb{1}[t(m,n) = t]},$$

where $\mathbb{1}[\cdot]$ equals $1$ if its argument is true and $0$ otherwise. With this notation, $\sum_{n=1}^{N_m} \mathbb{1}[t(m,n) = t]$ is simply the number of nodes in motif $m$ that are of type $t$.

Furthermore, we intend to let (i) each cluster membership matrix be close to its corresponding node-type–specific consensus matrix and (ii) the consensus matrices not assign seed nodes to the wrong clusters. We hence propose the following overall objective for the MoCHIN model, with the third and the fourth terms modeling the two intentions above:

$$J = \sum_{m \in \mathcal{M}} \mu_m \| \mathcal{X}^{(m)} - \mathcal{I}_K \times_1 V^{(m,1)T} \cdots \times_{N_m} V^{(m,N_m)T} \|_F^2 + \lambda \sum_{m \in \mathcal{M}} \sum_{n=1}^{N_m} \| V^{(m,n)} \|_1 + \alpha \sum_{m \in \mathcal{M}} \sum_{n=1}^{N_m} \| V^{(m,n)} - C^{(t(m,n))} \|_F^2 + \beta \sum_{t \in \mathcal{T}} \| \Theta^{(t)} \ast C^{(t)} \|_F^2, \quad (3)$$

where $\ast$ is the Hadamard product and $\Theta^{(t)}$ is the seed mask matrix for node type $t$. Its entry $\Theta^{(t)}_{ki} = 1$ if the $i$-th node of type $t$ is a seed node and it should not be assigned to cluster $k$ according to the user guidance, and $\Theta^{(t)}_{ki} = 0$ otherwise.
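As a toy illustration of the seed mask term (a hypothetical setting, not from the paper's datasets; membership and mask matrices are written as $K \times n$ arrays with one column per node): seeding node 0 into cluster 1 and node 2 into cluster 0 yields a mask whose penalty vanishes exactly when each seed's membership mass lies in its intended cluster.

```python
import numpy as np

K, n_nodes = 3, 4
seeds = {0: 1, 2: 0}  # node index -> intended cluster (user guidance)

# Theta[k, i] = 1 iff node i is a seed and cluster k is NOT its intended cluster.
Theta = np.zeros((K, n_nodes))
for i, k_star in seeds.items():
    Theta[:, i] = 1.0
    Theta[k_star, i] = 0.0

# The penalty ||Theta * C||_F^2 is zero iff every seed's mass sits in its cluster.
C_good = np.zeros((K, n_nodes))
C_good[1, 0] = C_good[0, 2] = 1.0       # seeds placed as intended
C_bad = np.ones((K, n_nodes)) / K       # mass spread over all clusters

assert np.linalg.norm(Theta * C_good) == 0.0
assert np.linalg.norm(Theta * C_bad) > 0.0
```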

Finally, solving the problem of HIN clustering by modeling higher-order interactions and automatically selecting motifs can be converted to solving the following problem:

$$\min_{\mu \in \Delta^{|\mathcal{M}|-1}, \; V^{(m,n)} \geq 0} J, \quad (4)$$

where $\Delta^{|\mathcal{M}|-1}$ is the standard simplex. To the best of our knowledge, there is no method similar to ours that simultaneously models multiple motifs in an HIN without decomposing higher-order interactions into pairwise interactions.

6. The Inference Algorithm

In this section, we first describe the algorithm for solving the optimization problem in Eq. (4). Then, a series of speed-up tricks is introduced to circumvent the curse of dimensionality: direct computation on the tensors would be problematic, since a motif involving many nodes induces a tensor with a formidably large number of entries.

6.1. Updating $V^{(m,n)}$ and $\mu$

Each cluster membership matrix $V^{(m,n)}$, where $m \in \mathcal{M}$ and $n \in \{1, \ldots, N_m\}$, is subject to non-negativity constraints and is involved in all terms of the objective function (Eq. (3)). We hence develop multiplicative update rules for $V^{(m,n)}$ that guarantee a monotonic decrease of the objective at each step, accompanied by projected gradient descent (PGD) (Nesterov, 2013) to find the global optimum of $\mu$ by exploiting its convexity. Overall, we solve the optimization problem by alternating between updating the $V^{(m,n)}$'s and $\mu$.

To update $V^{(m,n)}$ when $\mu$ and all other membership matrices are fixed under the non-negativity constraints, we derive the following theorem.

Theorem 6.1.

The following update rule for $V^{(m,n)}$ monotonically decreases the objective function:

$$V^{(m,n)} \leftarrow V^{(m,n)} \ast \left[ \nabla^{(m,n)} J \right]_{-} \oslash \left[ \nabla^{(m,n)} J \right]_{+}, \quad (5)$$

where $\ast$ and $\oslash$ denote entry-wise multiplication and division, $\nabla^{(m,n)} J$ is the gradient of $J$ with respect to $V^{(m,n)}$ with all other variables fixed, and for any matrix $M$, $[M]_+ := (|M| + M)/2$ and $[M]_- := (|M| - M)/2$ denote its entry-wise positive and negative parts.

Inspired by prior art on non-negative matrix factorization (Lee and Seung, 2001), we provide the proof of this theorem on non-negative tensor factorization as follows.


Proof. With the equivalency given by Lemma 3.4, each tensor term of the objective can be rewritten with respect to the mode-$n$ matricization, so that $J$, viewed as a function of $V^{(m,n)}$ alone, is a quadratic function $\tilde{J}(V)$ with mixed-sign coefficients. Following (Lee and Seung, 2001), we construct an auxiliary function $G(V, V')$ by upper-bounding the positive-coefficient terms with separable quadratic terms and lower-bounding the negative-coefficient terms with their first-order expansions. Straightforward derivation shows that the following three relations hold: (i) $G(V, V) = \tilde{J}(V)$, (ii) $G(V, V') \geq \tilde{J}(V)$, and (iii) $G(V, V')$ is convex with respect to $V$. Therefore, by setting $\partial G(V, V') / \partial V = 0$, one finds that $G(V, V')$ is minimized at $V = \tilde{V}$, where $\tilde{V}$ is the right-hand side of Eq. (5) and $V'$ is the current value of $V^{(m,n)}$. It follows that setting $V^{(m,n)}$ to $\tilde{V}$ monotonically decreases the objective function, which is exactly the update rule in Theorem 6.1. ∎

For fixed $V^{(m,n)}$, the objective function in Eq. (3) is convex with respect to $\mu$. We therefore use PGD to update $\mu$ by projecting onto the standard simplex after each gradient descent step, where the gradient can be derived by straightforward calculation, which we omit due to space limitations.
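The projection step of PGD, the Euclidean projection onto the standard simplex, can be implemented with the well-known sort-based algorithm; a sketch of one standard choice (not necessarily the authors' implementation):

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    u = np.sort(v)[::-1]                  # sort in decreasing order
    css = np.cumsum(u) - 1.0
    # Largest index rho with u_rho - css_rho / (rho + 1) > 0 (0-based).
    rho = np.nonzero(u - css / (np.arange(len(v)) + 1.0) > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

mu = project_to_simplex(np.array([0.2, -0.1, 1.5]))
assert np.isclose(mu.sum(), 1.0) and (mu >= 0).all()
```

A point already on the simplex is mapped to itself, so repeated projection is harmless.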

Input : the tensors $\{\mathcal{X}^{(m)}\}$, the supervision $\{\Theta^{(t)}\}$, the number of clusters $K$, and the hyperparameters $\lambda$, $\alpha$, and $\beta$
Output : the cluster membership matrices $\{V^{(m,n)}\}$
1 begin
2       while not converged do
3             for $m \in \mathcal{M}$ do
4                   while not converged do
5                         for $n = 1, \ldots, N_m$ do
6                               Find a local optimum of $V^{(m,n)}$ by Eq. (5).
7             Find the global optimum of $\mu$ by PGD.
Algorithm 1 The MoCHIN inference algorithm

6.2. Computational Speed-Up

In this section, we describe a series of speed-up tricks with which the complexity is governed no longer by the dimension of the tensors but by the number of motif instances in the network.

Unlike scenarios where researchers solve the NTF problem with tensors of fixed order regardless of the dataset, our problem is specifically challenging because a motif can involve many nodes. For instance, the AP4TPA motif discussed in Section 4 is one real-world example involving 8 nodes, and using this motif in the model induces an 8-th–order tensor. Consequently, the fact that the tensor size grows exponentially with the order of the tensor poses a special challenge to conducting motif-based clustering via tensor factorization.

In the proposed inference algorithm, the direct computation of three terms has complexity subject to the size of the tensor: the first term in the numerator of Eq. (5), $\mu_m \mathcal{I}_{(n)} (\bigotimes_{k \neq n} V^{(m,k)T})^T \mathcal{X}^{(m)T}_{(n)}$; the first term in the denominator of Eq. (5), $\mu_m \mathcal{I}_{(n)} (\bigotimes_{k \neq n} V^{(m,k)T})^T (\bigotimes_{k \neq n} V^{(m,k)T}) \mathcal{I}_{(n)}^T V^{(m,n)}$; and the first term of the objective function Eq. (3), $\| \mathcal{X}^{(m)} - \mathcal{I}_K \times_1 V^{(m,1)T} \cdots \times_{N_m} V^{(m,N_m)T} \|_F^2$. Fortunately, the computation of all these terms can be significantly simplified by exploiting the sparsity of the tensor $\mathcal{X}^{(m)}$ and the composition of the dense matrix $\bigotimes_{k \neq n} V^{(m,k)T}$.

Consider an example where motif $m$ involves $N_m$ nodes, each node type has $I$ node instances, and the nodes are to be clustered into $K$ clusters. Then the induced dense matrix $\bigotimes_{k \neq n} V^{(m,k)T}$ would have $I^{N_m - 1} K^{N_m - 1}$ entries, and the tensor $\mathcal{X}^{(m)}$ would have $I^{N_m}$ entries. As a result, directly computing the first term in the numerator of Eq. (5) would involve multiplications of formidably large dense matrices. However, given the sparsity of $\mathcal{X}^{(m)}$, one may denote by $\Omega_m$ the set of indices of the non-zero entries in tensor $\mathcal{X}^{(m)}$ and derive the following equivalency:

$$\mathcal{I}_{(n)} (\bigotimes_{k \neq n} V^{(m,k)T})^T \mathcal{X}^{(m)T}_{(n)} = \sum_{(i_1, \ldots, i_{N_m}) \in \Omega_m} \left( \ast_{k \neq n} V^{(m,k)}_{\cdot i_k} \right) e_{i_n}^T,$$

where $\ast$ is the Hadamard product of a sequence, $V^{(m,k)}_{\cdot i_k}$ is the $i_k$-th column of $V^{(m,k)}$, and $e_{i_n}$ is the one-hot column vector of size $|V_{t(m,n)}|$ that has entry $1$ at index $i_n$. Computing the right-hand side of this equivalency involves a summation over Hadamard products of short sequences of small vectors, which has complexity $O(|\Omega_m| N_m K)$, with $|\Omega_m|$ being the number of non-zero entries of the tensor. In other words, the complexity decreases from manipulating a dense matrix with a number of entries exponential in $N_m$ to a magnitude linear in the number of matched motif instances.
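This equivalency can be checked numerically on a toy 3rd-order tensor (illustrative numpy sketch; the general higher-order case is analogous, and the result is written here with rows indexed by nodes, i.e., transposed). The dense side uses the Khatri-Rao structure explicitly; the sparse side iterates over the non-zero entries only:

```python
import numpy as np

rng = np.random.default_rng(7)
K, I1, I2, I3 = 2, 4, 5, 6
V = [rng.random((K, I)) for I in (I1, I2, I3)]  # K x I_n factor matrices

# Sparse binary tensor given by its set of non-zero index triples Omega.
Omega = {(0, 1, 2), (3, 0, 5), (0, 4, 4), (2, 2, 2)}
T = np.zeros((I1, I2, I3))
for idx in Omega:
    T[idx] = 1.0

# Dense computation: mode-1 unfolding times the column-wise Khatri-Rao
# of the remaining factors.
T1 = T.reshape(I1, -1)
KR = np.einsum('kj,kl->jlk', V[1], V[2]).reshape(I2 * I3, K)
dense = T1 @ KR  # I1 x K

# Sparse computation: one Hadamard product of small vectors per non-zero entry.
sparse = np.zeros((I1, K))
for (i, j, l) in Omega:
    sparse[i] += V[1][:, j] * V[2][:, l]

assert np.allclose(dense, sparse)
```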

The first term in the denominator of Eq. (5) again involves matrix multiplication over the huge dense matrix. Leveraging its composition, one can show that

As such, instead of multiplying a huge dense matrix, one may only compute Hadamard products and matrix multiplications over a few relatively small matrices. Note that in the previous example, the dense matrix is exponentially large, while the matrices on the right-hand side are only as large as the individual factor matrices.
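The identity at work here is presumably the standard one for Khatri-Rao products: the Gram matrix of a column-wise Khatri-Rao product equals the Hadamard product of the per-factor Gram matrices. A sketch (illustrative code, not the authors' implementation):

```python
import numpy as np

def khatri_rao(mats):
    """Column-wise Khatri-Rao product (the huge dense matrix, for reference)."""
    out = mats[0]
    for m in mats[1:]:
        out = np.einsum('ir,jr->ijr', out, m).reshape(-1, m.shape[1])
    return out

def gram_via_hadamard(mats):
    """Compute (KR product)^T (KR product) without ever forming it: it
    equals the Hadamard product of the small per-matrix Gram matrices."""
    rank = mats[0].shape[1]
    gram = np.ones((rank, rank))
    for m in mats:
        gram *= m.T @ m   # each Gram matrix is only rank x rank
    return gram
```

The Gram matrix has only rank-squared entries, so the right-hand side stays small no matter how many modes the motif has.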

Thirdly, evaluating the loss function Eq. (3) for determining convergence involves the computation of the Frobenius norm of its first term, which is a huge, dense tensor. Again by exploiting the desirable sparsity property of the original tensor, we can calculate this Frobenius norm as follows


This equivalency transforms the computation over a dense and potentially high-order tensor into that over a sparse tensor accompanied by a couple of matrix manipulations. Thanks to the sparsity of the original tensor, the complexities of the first and second terms in the above formula are linear in the number of non-zero entries, and the third term involves only small matrices. Considering the previous example, the overall cost of evaluating this Frobenius norm thereby decreases by many orders of magnitude compared with materializing the dense tensor.
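The three-term expansion presumably at work here — the squared norm of the sparse tensor, minus twice its inner product with the low-rank reconstruction, plus the squared norm of the reconstruction via Gram matrices — can be sketched as follows (illustrative, assuming a CP/Kruskal-style reconstruction; not the authors' code):

```python
import numpy as np

def residual_frobenius_sq(nnz_indices, nnz_values, factors):
    """||T - [[A_1, ..., A_k]]||_F^2 without materializing the dense tensor,
    where [[.]] is the Kruskal (sum-of-rank-one) tensor built from the
    factor matrices. Sketch of the three-term expansion."""
    rank = factors[0].shape[1]
    # Term 1: ||T||^2, summed over non-zero entries only.
    t_sq = float(np.sum(np.square(nnz_values)))
    # Term 2: inner product <T, [[A]]>, again only over non-zeros.
    inner = 0.0
    for idx, val in zip(nnz_indices, nnz_values):
        row = np.ones(rank)
        for j, i_j in enumerate(idx):
            row *= factors[j][i_j]
        inner += val * row.sum()
    # Term 3: ||[[A]]||^2 via Hadamard product of small Gram matrices.
    gram = np.ones((rank, rank))
    for m in factors:
        gram *= m.T @ m
    return t_sq - 2.0 * inner + float(gram.sum())
```

For two modes this reduces to the familiar matrix identity ||T - AB^T||_F^2 = ||T||^2 - 2<T, AB^T> + ||AB^T||^2, which makes the sketch easy to check against a dense computation.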

It is worth noting that the trick introduced in the last equivalency, Eq. (6), has already been proposed in the study of Matricized Tensor Times Khatri-Rao Product (MTTKRP) (Bader and Kolda, 2007; Choi and Vishwanathan, 2014; Smith et al., 2015). MTTKRP and our model share a similarity in this trick because, unlike update rule Eq. (5), evaluating the loss function Eq. (3) does not involve the non-negative constraints.

Finally, we remark that the above computation can be highly parallelized, which further improves the efficiency of the proposed algorithm in our implementation. An empirical efficiency study on two datasets is presented in Section 7.6. We summarize the algorithm in Algorithm 1.

7. Experiments

In this section, we present the quantitative evaluation results on two real-world datasets through multiple tasks and carry out case studies to analyze the performance of the proposed MoCHIN model under various circumstances.

7.1. Datasets and Evaluation Tasks

Datasets. We use two real-world HINs for experiments.

  • DBLP is a heterogeneous information network that serves as a bibliography of researchers in the computer science area (Tang et al., 2008). The network consists of 5 types of nodes: author (A), paper (P), key term (T), venue (V), and year (Y). The key terms are extracted and released by Chen et al. (Chen and Sun, 2017). The edge types include authorship, term usage, venue published, year published, and the reference relationship. The first four edge types are undirected, and the last one is directed. The schema of the DBLP network is shown in Figure 1(a). In DBLP, we select two candidate motifs for all applicable methods, APPA and AP4TPA, where APPA is also a meta-path, introduced in Section 4, representing that an author writes a paper that references another paper written by another author.

  • YAGO is a knowledge graph constructed by merging Wikipedia, GeoNames, and WordNet. The YAGO dataset consists of 7 types of nodes: person (P), organization (O), location (L), prize (R), work (W), position (S), and event (E). There are 24 types of edges in the network, with 19 undirected edge types and 5 directed edge types, as shown by the schema of the YAGO network in Figure 4. In YAGO, the candidate motifs used by all compared methods include three meta-paths (with edge types as numbered in Figure 4) together with the motifs 2P2W and 3PW, where 2P2W is the motif in which 2 people simultaneously co-created two pieces of work, and 3PW is the motif in which 3 people created, directed, and acted in a piece of work, respectively.

Figure 4. The schema of YAGO (Shi et al., 2018).

Evaluation tasks. In order to validate the proposed model’s capability in reflecting different guidance given by different users, we use two sets of labels on authors to conduct two tasks in DBLP, similar to a previous study (Sun et al., 2012). Additionally, we design another task on YAGO with labels on persons. We will release the datasets and labels used in the experiments once the paper is published.

DBLP-group – Clustering authors into the research groups where they graduated, with a label set expanded from the “four-group dataset” (Sun et al., 2012). The “four-group dataset” includes researchers from four renowned research groups led by Christos Faloutsos, Michael I. Jordan, Jiawei Han, and Dan Roth. Additionally, we add a fifth group of researchers who have collaborated with at least one of the researchers in the “four-group dataset,” with the intention to involve more subtle semantics in the original HIN. A portion of the labeled authors are randomly selected as seeds for user guidance; we did not use the same seed ratio as in the following two tasks because the number of authors to be clustered in this task is small. The resulting HIN consists of 19,500 nodes and 108,500 edges.

DBLP-area – Clustering authors into research areas, expanded from the “four-area dataset” (Sun et al., 2012), where the definition of the 14 areas is derived from the Wikipedia page List of computer science conferences (https://en.wikipedia.org/wiki/List_of_computer_science_conferences). A portion of the labeled authors are randomly selected as seeds for user guidance. The HIN processed in this way has 16,100 nodes and 30,239 edges.

YAGO – Clustering people into 10 popular countries in the YAGO dataset. We knock out all edges with edge type wasBornIn, and if a person had such an edge with one of the 10 countries, we assign this country to be the label of this person. Additionally, to avoid making the task trivial, we remove all other types of edges between person and location. A portion of the people are randomly selected as seeds for user guidance. There are 17,109 nodes and 70,251 edges in the processed HIN.

Evaluation metrics. We use three metrics to evaluate the quality of the clustering results generated by each model: Accuracy (Micro-F1), Macro-F1, and NMI. Accuracy is the fraction of correctly labeled objects, i.e., the number of correctly labeled data points divided by the total size of the dataset; note that in multi-class classification tasks, accuracy is always identical to Micro-F1. Macro-F1 is the arithmetic mean of the F1 scores across all labels in the dataset, where the F1 score is the harmonic mean of precision and recall for a specific label. NMI, short for normalized mutual information, is the mutual information divided by the arithmetic mean of the entropies of the two labelings. For all these metrics, higher values indicate better performance.
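For reproducibility, the three metrics as described above can be computed in a few lines of plain Python (a minimal sketch; library implementations such as scikit-learn's may differ in edge-case handling):

```python
from collections import Counter
import math

def accuracy(y_true, y_pred):
    """Fraction of correctly labeled objects (= Micro-F1 in multi-class)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Arithmetic mean of per-label F1, where F1 is the harmonic mean of
    precision and recall for that label."""
    labels = set(y_true) | set(y_pred)
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(labels)

def nmi(y_true, y_pred):
    """Mutual information divided by the arithmetic mean of the entropies."""
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    ct, cp = Counter(y_true), Counter(y_pred)
    mi = sum(v / n * math.log(v * n / (ct[a] * cp[b]))
             for (a, b), v in joint.items())
    h = lambda c: -sum(v / n * math.log(v / n) for v in c.values())
    denom = (h(ct) + h(cp)) / 2
    return mi / denom if denom else 0.0
```

Note that NMI is invariant to a permutation of cluster labels, which is why it suits clustering evaluation.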

Task DBLP-group DBLP-area YAGO
Metric Acc./Micro-F1 Macro-F1 NMI Acc./Micro-F1 Macro-F1 NMI Acc./Micro-F1 Macro-F1 NMI
KNN 0.4249 0.2566 0.1254 0.4107 0.4167 0.2537 0.3268 0.0921 0.0810
KNN+Motifs 0.4549 0.2769 0.1527 0.4811 0.4905 0.3296 0.3951 0.1885 0.1660
GNetMine (Ji et al., 2010) 0.5880 0.6122 0.3325 0.4847 0.4881 0.3469 0.3832 0.2879 0.1772
PathSelClus (Sun et al., 2012) 0.5622 0.5535 0.3246 0.4361 0.4520 0.3967 0.3856 0.3405 0.2864
MoCHIN 0.6910 0.6753 0.5486 0.5318 0.5464 0.4396 0.6134 0.5563 0.4607
Table 2. Quantitative evaluation on clustering results under multiple metrics in three tasks.

7.2. Baselines and Experiment setups

Baselines. We use four different baselines to obtain insight on different aspects of the performance of MoCHIN.

  • KNN is a classification algorithm in which each object in the test set is assigned the most common label among its k nearest neighbors. This is a homogeneous method that does not distinguish different node types. In our scenario, the distance between two nodes is defined as the length of the shortest path between them.

  • KNN+Motifs serves as a direct comparison to the proposed MoCHIN model, since KNN+Motifs can also use signals generated by arbitrary forms of motifs but does not directly model all players in higher-order interactions. To extract information from motifs, we construct a motif-based network for each candidate motif, where an edge is constructed if two nodes are matched to a common motif instance in the original HIN. The KNN algorithm is then applied again on each motif-based network. Finally, a linear combination is applied to the outcome probability matrices generated by KNN from the motif-based networks and the original HIN, with the combination weights tuned to the best. When using this baseline method, we additionally add further motifs into the set of candidate motifs for both DBLP tasks and for the YAGO task.

  • GNetMine is a graph-based regularization framework to address the transductive classification problem in heterogeneous information networks (Ji et al., 2010). This method only leverages edge-level information without considering structural patterns such as meta-paths or motifs.

  • PathSelClus is a probabilistic graphical model that performs clustering tasks on heterogeneous networks by integrating meta-path selection with user-guided object clustering (Sun et al., 2012). When using this baseline method, we additionally add several meta-paths into the set of candidate meta-paths for both DBLP tasks, as suggested by the original paper (Sun et al., 2012), and add others for the YAGO task.
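A minimal sketch of the first two baselines as described above — KNN with shortest-path distance, and the motif-based networks plus linear combination used by KNN+Motifs. The graph representation, helper names, and default k are illustrative assumptions:

```python
import numpy as np
from collections import deque, Counter
from itertools import combinations

def shortest_path_len(adj, source):
    """BFS distances from `source` in an unweighted graph (adjacency dict)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def knn_label(adj, node, seed_labels, k=3):
    """KNN baseline: assign the most common label among the k labeled seeds
    closest to `node`, with distance = shortest-path length."""
    dist = shortest_path_len(adj, node)
    reachable = sorted((dist[s], s) for s in seed_labels if s in dist)
    votes = Counter(seed_labels[s] for _, s in reachable[:k])
    return votes.most_common(1)[0][0] if votes else None

def motif_adjacency(num_nodes, motif_instances):
    """KNN+Motifs: build a motif-based network in which two nodes are linked
    whenever they co-occur in a matched motif instance."""
    A = np.zeros((num_nodes, num_nodes))
    for inst in motif_instances:               # inst = tuple of node ids
        for u, v in combinations(set(inst), 2):
            A[u, v] += 1
            A[v, u] += 1
    return A

def combine_probs(prob_matrices, weights):
    """Linear combination of per-network clustering probability matrices."""
    return sum(w * P for w, P in zip(weights, prob_matrices))
```

The key contrast with MoCHIN is visible in `motif_adjacency`: a motif instance over many nodes is flattened into pairwise edges before any clustering happens, which is exactly the information loss the proposed model avoids.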

Experiment setups. For MoCHIN, we use the same hyperparameter values across all tasks in our experiments. We also add each edge type as an edge-level motif into the set of candidate motifs for MoCHIN. For each baseline, we always tune its hyperparameters to achieve the best performance in each task.

7.3. Quantitative Evaluation Result

We quantitatively evaluate the effectiveness of the proposed MoCHIN model against the baselines and report the main results in Table 2. Overall, MoCHIN uniformly outperforms all baselines in all three tasks under all metrics. We remark that these three metrics measure different aspects of model performance: for example, in the DBLP-area task, PathSelClus outperforms GNetMine under Macro-F1 and NMI, while GNetMine outperforms PathSelClus under Acc./Micro-F1. Achieving superior performance uniformly under all metrics is therefore strong evidence that MoCHIN, with higher-order interactions directly modeled, has greater modeling capability in the task of user-guided HIN clustering.

MoCHIN exploits signals from motifs more comprehensively and achieves superior performance even with fewer motifs. Recall that both MoCHIN and the baseline KNN+Motifs exploit signals from motifs, while KNN+Motifs uses motifs to transform higher-order interactions into signals over node pairs. In our experiments, KNN+Motifs does not achieve evaluation results as good as our method’s, which implies that merely using motifs to generate pairwise features cannot fully exploit the signals they carry. In fact, the set of non–edge-level motifs used in the baseline is always a superset of that used in MoCHIN. We interpret this result as follows: although MoCHIN uses fewer motifs, by modeling all players in the higher-order interaction, it implicitly captures information carried by other motifs, which justifies the use of a more complex model.

KNN is disadvantaged on imbalanced data when the supervision is weak. Across the three tasks conducted in the experiments, the ground truth labels in the YAGO dataset are the most imbalanced. As presented in Table 2, KNN performs notably worse on YAGO at the low seed ratio under Macro-F1 and NMI, which are more sensitive to model performance on rare classes than Accuracy (Micro-F1). In other words, KNN tends to achieve inferior results on rare classes when supervision is weak and data is imbalanced; we recommend heterogeneous methods that consider the type information in this scenario. The results in Section 7.5, where experiments with varied seed ratio are presented, further validate this point.

(Figure: the APPA motif and the AP4TPA motif.)
Metric Acc./Micro-F1 Macro-F1 NMI Result for Eric Xing
W/o both 0.6481 0.6307 0.5048
W/ APPA 0.6652 0.6529 0.4715
W/ AP4TPA 0.6738 0.6548 0.5293
Full model 0.6910 0.6753 0.5486
Table 3. Ablation study of the MoCHIN model on the DBLP-group task, conducted by optionally removing the non–edge-level motifs, APPA and AP4TPA, from the full model.

7.4. Impact of Candidate Motif Choice

In this section, we conduct a case study on how the choice of candidate motifs impacts the performance of the proposed MoCHIN model and additionally use the concrete example in Figure 3 of Section 4 to understand the model outputs.

As introduced in Section 7.1, the candidate motifs MoCHIN uses in the DBLP tasks are all edge types plus two non–edge-level motifs: APPA and AP4TPA. We conducted an ablation study by taking out either or both of the two non–edge-level motifs and comparing with the original full model on the DBLP-group task; the results are reported in Table 3. It can be seen that the full MoCHIN model outperformed all its ablated versions. This result shows that using motifs does make a difference in clustering.

Moreover, we also looked into the concrete example in Figure 3 of Section 4 and checked how each model version output the cluster assignment for Eric Xing. The result is also included in Table 3, which shows that only the model versions equipped with AP4TPA made the correct assignment for Eric Xing. This observation echoes the intuition discussed in Section 4.

(a) Accuracy (Micro-F1).
(b) Macro-F1.
Figure 5. Quantitative evaluation on the YAGO task under varied seed ratio.

7.5. Varied Seed Ratio

In addition to the seed ratio used for the YAGO task reported in Table 2, we additionally experiment under several other seed ratios for YAGO. The results under Accuracy and Macro-F1 are reported in Figure 5; we omit NMI due to space limitations.

For all methods under all metrics, the performance increases as the seed ratio increases, a natural outcome of progressively stronger supervision. Additionally, MoCHIN still outperforms all baselines under all circumstances. Notably, the difference in performance between MoCHIN and the baselines shrinks as the seed ratio increases. This suggests that when supervision is strong enough, the pairwise edge-level signal provides progressively sufficient information to obtain reasonable results. On the other hand, MoCHIN is particularly attractive when supervision is weak, for it is able to extract more subtle information from limited data.

7.6. Efficiency Study

In this section, we empirically evaluate the efficiency of the proposed algorithm with a focus on the speed-up tricks described in Section 6.2. Specifically, we estimate the runtime for inferring all parameters involved in one motif while all other parameters are fixed, or equivalently, reaching convergence of the while-loop from line 4 to line 6 in Algorithm 1.
This study was conducted on both the DBLP dataset and the YAGO dataset for each of their respective non–edge-level motifs: APPA and AP4TPA in DBLP; 2P2W and 3PW in YAGO. The non–edge-level motifs are studied because (i) they are more complex in nature and (ii) the tensors induced by edge-level motifs are essentially matrices, the study of which degenerates to the well-studied case of non-negative matrix factorization. To downsample the HINs, we randomly knock out a portion of the papers in DBLP or persons in YAGO. The involved edges, and the nodes that become dangling after the knock-out, are also removed from the network. Node types paper and person are used because they are associated with the most diverse edge types in DBLP and YAGO, respectively. In the end, we obtain a series of HINs with varying proportions of papers or persons left.

To more accurately evaluate the efficiency of the proposed algorithm in this study, we turn off the parallelization in our implementation and use only one thread. We record the wall-clock runtime for inferring all parameters of each concerned motif while fixing the motif weights and the parameters of all other motifs. The experiment is executed on a machine with an Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz. The results are reported in Figure 6.

The proposed algorithm empirically achieves near-linear efficiency in inferring the parameters of each given motif. As presented in Figure 6, the runtime for all motifs on both datasets is approximately linear in the number of involved motif instances. This result is in line with the analysis provided in Section 6.2 and justifies the effectiveness of the speed-up tricks.

Moreover, we also report the number of motif instances against the number of nodes, regardless of type, in each downsampled network. For all four studied motifs, we observe that motif instances are sparse and do not explode quickly as the size of the network increases.

(a) APPA in DBLP.
(b) AP4TPA in DBLP.
(c) 2P2W in YAGO.
(d) 3PW in YAGO.
Figure 6. Wall-clock runtime for inferring all parameters of one motif, and the number of motif instances against the number of nodes, in a series of downsampled HINs. The proposed algorithm empirically achieves near-linear efficiency, and motif instances are indeed sparse in HINs.

8. Conclusion and future works

We studied the problem of user-guided clustering in heterogeneous information networks with the intention to model higher-order interactions. To solve this problem, we identified that it is crucial to model higher-order interactions without collapsing them into pairwise interactions in order to avoid losing the rich and subtle information. Based on this intuition, we proposed the MoCHIN model, which models higher-order interaction more comprehensively and applies to arbitrary forms of motifs. An inference algorithm with computational speed-up techniques was also developed. Experiments validated the effectiveness of the proposed model and the utility of comprehensively modeling higher-order interactions without collapsing them into pairwise relation.

Future work includes exploring further methodologies to combine signals from multiple motifs, which is currently realized by a simple linear combination in the MoCHIN model. Furthermore, as the current model takes user guidance by injecting labels of the seeds, it is also of interest to extend MoCHIN to the scenario where guidance is made available through must-link and cannot-link constraints on node pairs.


  • Ahmed et al. (2015) Nesreen K Ahmed, Jennifer Neville, Ryan A Rossi, and Nick Duffield. 2015. Efficient graphlet counting for large networks. In ICDM.
  • Bader and Kolda (2007) Brett W Bader and Tamara G Kolda. 2007. Efficient MATLAB computations with sparse and factored tensors. SIAM Journal on Scientific Computing 30, 1 (2007), 205–231.
  • Benson et al. (2018) Austin R Benson, Rediet Abebe, Michael T Schaub, Ali Jadbabaie, and Jon Kleinberg. 2018. Simplicial closure and higher-order link prediction. arXiv preprint arXiv:1802.06916 (2018).
  • Benson et al. (2015) Austin R Benson, David F Gleich, and Jure Leskovec. 2015. Tensor spectral clustering for partitioning higher-order network structures. In Proceedings of the 2015 SIAM International Conference on Data Mining. SIAM, 118–126.
  • Benson et al. (2016) Austin R Benson, David F Gleich, and Jure Leskovec. 2016. Higher-order organization of complex networks. Science 353, 6295 (2016), 163–166.
  • Bressan et al. (2017) Marco Bressan, Flavio Chierichetti, Ravi Kumar, Stefano Leucci, and Alessandro Panconesi. 2017. Counting graphlets: Space vs time. In WSDM.
  • Cao et al. (2016) Bokai Cao, Chun-Ta Lu, Xiaokai Wei, S Yu Philip, and Alex D Leow. 2016. Semi-supervised tensor factorization for brain network analysis. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 17–32.
  • Cao et al. (2015) Xiaochun Cao, Xingxing Wei, Yahong Han, and Dongdai Lin. 2015. Robust face clustering via tensor decomposition. IEEE transactions on cybernetics 45, 11 (2015), 2546–2557.
  • Chen et al. (2015) Junxiang Chen, Wei Dai, Yizhou Sun, and Jennifer Dy. 2015. Clustering and ranking in heterogeneous information networks via gamma-poisson model. In ICDM.
  • Chen and Sun (2017) Ting Chen and Yizhou Sun. 2017. Task-Guided and Path-Augmented Heterogeneous Network Embedding for Author Identification. In WSDM. ACM.
  • Choi and Vishwanathan (2014) Joon Hee Choi and S Vishwanathan. 2014. DFacTo: Distributed factorization of tensors. In Advances in Neural Information Processing Systems. 1296–1304.
  • De Lathauwer et al. (2000) Lieven De Lathauwer, Bart De Moor, and Joos Vandewalle. 2000. A multilinear singular value decomposition. SIMAX (2000).
  • Ding et al. (2006) Chris Ding, Tao Li, Wei Peng, and Haesun Park. 2006. Orthogonal nonnegative matrix t-factorizations for clustering. In KDD.
  • Fang et al. (2016) Yuan Fang, Wenqing Lin, Vincent W Zheng, Min Wu, Kevin Chen-Chuan Chang, and Xiao-Li Li. 2016. Semantic Proximity Search on Graphs with Metagraph-based Learning. In ICDE. IEEE.
  • Fionda and Pirrò (2017) Valeria Fionda and Giuseppe Pirrò. 2017. Meta Structures in Knowledge Graphs. In International Semantic Web Conference. Springer, 296–312.
  • Gujral and Papalexakis (2018) Ekta Gujral and Evangelos E Papalexakis. 2018. SMACD: Semi-supervised Multi-Aspect Community Detection. In ICDM.
  • Han et al. (2011) Jiawei Han, Jian Pei, and Micheline Kamber. 2011. Data mining: concepts and techniques. Elsevier.
  • Harshman (1970) Richard A Harshman. 1970. Foundations of the PARAFAC procedure: Models and conditions for an “explanatory” multimodal factor analysis. (1970).
  • Hitchcock (1927) Frank L Hitchcock. 1927. The expression of a tensor or a polyadic as a sum of products. Journal of Mathematics and Physics (1927).
  • Huang et al. (2016) Zhipeng Huang, Yudian Zheng, Reynold Cheng, Yizhou Sun, Nikos Mamoulis, and Xiang Li. 2016. Meta Structure: Computing Relevance in Large Heterogeneous Information Networks. In KDD. ACM.
  • Jha et al. (2015) Madhav Jha, C Seshadhri, and Ali Pinar. 2015. Path sampling: A fast and provable method for estimating 4-vertex subgraph counts. In WWW.
  • Ji et al. (2010) Ming Ji, Yizhou Sun, Marina Danilevsky, Jiawei Han, and Jing Gao. 2010. Graph regularized transductive classification on heterogeneous information networks. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 570–586.
  • Jiang et al. (2017) He Jiang, Yangqiu Song, Chenguang Wang, Ming Zhang, and Yizhou Sun. 2017. Semi-supervised learning over heterogeneous information networks by ensemble of meta-graph guided random walks. In AAAI.
  • Klymko et al. (2014) Christine Klymko, David Gleich, and Tamara G Kolda. 2014. Using triangles to improve community detection in directed networks. arXiv preprint arXiv:1404.5874 (2014).
  • Kolda and Bader (2009) Tamara G Kolda and Brett W Bader. 2009. Tensor decompositions and applications. SIAM review 51, 3 (2009), 455–500.
  • Lee and Seung (1999) Daniel D Lee and H Sebastian Seung. 1999. Learning the parts of objects by non-negative matrix factorization. Nature 401, 6755 (1999), 788.
  • Lee and Seung (2001) Daniel D Lee and H Sebastian Seung. 2001. Algorithms for non-negative matrix factorization. In Advances in neural information processing systems. 556–562.
  • Li et al. (2017) Xiang Li, Yao Wu, Martin Ester, Ben Kao, Xin Wang, and Yudian Zheng. 2017. Semi-supervised clustering in attributed heterogeneous information networks. In WWW.
  • Li et al. (2018) Yuchen Li, Zhengzhi Lou, Yu Shi, and Jiawei Han. 2018. Temporal Motifs in Heterogeneous Information Networks. In MLG.
  • Liu et al. (2013) Jialu Liu, Chi Wang, Jing Gao, and Jiawei Han. 2013. Multi-view clustering via joint nonnegative matrix factorization. In SDM, Vol. 13. SIAM, 252–260.
  • Liu et al. (2018a) Zemin Liu, Vincent W Zheng, Zhou Zhao, Zhao Li, Hongxia Yang, Minghui Wu, and Jing Ying. 2018a. Interactive Paths Embedding for Semantic Proximity Search on Heterogeneous Graphs. In KDD.
  • Liu et al. (2018b) Zemin Liu, Vincent W Zheng, Zhou Zhao, Fanwei Zhu, Kevin Chen-Chuan Chang, Minghui Wu, and Jing Ying. 2018b. Distance-aware dag embedding for proximity search on heterogeneous graphs. AAAI.
  • Luo et al. (2014) Chen Luo, Wei Pang, and Zhe Wang. 2014. Semi-supervised clustering on heterogeneous information networks. In PAKDD.
  • Milo et al. (2002) Ron Milo, Shai Shen-Orr, Shalev Itzkovitz, Nadav Kashtan, Dmitri Chklovskii, and Uri Alon. 2002. Network motifs: simple building blocks of complex networks. Science 298, 5594 (2002), 824–827.
  • Mimaroglu and Erdil (2011) Selim Mimaroglu and Ertunc Erdil. 2011. Combining multiple clusterings using similarity graph. Pattern Recognition 44, 3 (2011), 694–703.
  • Nesterov (2013) Yurii Nesterov. 2013. Introductory lectures on convex optimization: A basic course. Vol. 87. Springer Science & Business Media.
  • Papalexakis et al. (2017) Evangelos E Papalexakis, Christos Faloutsos, and Nicholas D Sidiropoulos. 2017. Tensors for data mining and data fusion: Models, applications, and scalable algorithms. TIST 8, 2 (2017), 16.
  • Paranjape et al. (2017) Ashwin Paranjape, Austin R Benson, and Jure Leskovec. 2017. Motifs in temporal networks. In WSDM.
  • Pržulj (2007) Nataša Pržulj. 2007. Biological network comparison using graphlet degree distribution. Bioinformatics 23, 2 (2007), e177–e183.
  • Punera and Ghosh (2007) Kunal Punera and Joydeep Ghosh. 2007. Soft cluster ensembles. Advances in fuzzy clustering and its applications (2007), 69–90.
  • Sankar et al. (2017) Aravind Sankar, Xinyang Zhang, and Kevin Chen-Chuan Chang. 2017. Motif-based Convolutional Neural Network on Graphs. arXiv preprint arXiv:1711.05697 (2017).
  • Shashua and Hazan (2005) Amnon Shashua and Tamir Hazan. 2005. Non-negative tensor factorization with applications to statistics and computer vision. In ICML.
  • Sheikholeslami et al. (2016) Fatemeh Sheikholeslami, Brian Baingana, Georgios B Giannakis, and Nikolaos D Sidiropoulos. 2016. Egonet tensor decomposition for community identification. In Signal and Information Processing (GlobalSIP), 2016 IEEE Global Conference on. IEEE, 341–345.
  • Shi et al. (2017) Chuan Shi, Yitong Li, Jiawei Zhang, Yizhou Sun, and S Yu Philip. 2017. A survey of heterogeneous information network analysis. TKDE 29, 1 (2017), 17–37.
  • Shi et al. (2014) Chuan Shi, Ran Wang, Yitong Li, Philip S Yu, and Bin Wu. 2014. Ranking-based clustering on general heterogeneous information networks by network projection. In CIKM.
  • Shi et al. (2018) Yu Shi, Qi Zhu, Fang Guo, Chao Zhang, and Jiawei Han. 2018. Easing Embedding Learning by Comprehensive Transcription of Heterogeneous Information Networks. In KDD.
  • Smith et al. (2015) Shaden Smith, Niranjay Ravindran, Nicholas D Sidiropoulos, and George Karypis. 2015. SPLATT: Efficient and parallel sparse tensor-matrix multiplication. In Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International. IEEE, 61–70.
  • Sporns and Kötter (2004) Olaf Sporns and Rolf Kötter. 2004. Motifs in brain networks. PLoS biology 2, 11 (2004), e369.
  • Stefani et al. (2017) Lorenzo De Stefani, Alessandro Epasto, Matteo Riondato, and Eli Upfal. 2017. Triest: Counting local and global triangles in fully dynamic streams with fixed memory size. TKDD 11, 4 (2017), 43.
  • Strehl and Ghosh (2002) Alexander Strehl and Joydeep Ghosh. 2002. Cluster ensembles—a knowledge reuse framework for combining multiple partitions. JMLR 3, Dec (2002), 583–617.
  • Sun et al. (2012) Yizhou Sun, Charu C Aggarwal, and Jiawei Han. 2012. Relation strength-aware clustering of heterogeneous information networks with incomplete attributes. Proceedings of the VLDB Endowment (2012).
  • Sun and Han (2013) Yizhou Sun and Jiawei Han. 2013. Mining heterogeneous information networks: a structural analysis approach. SIGKDD Explorations 14, 2 (2013), 20–28.
  • Sun et al. (2011) Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S Yu, and Tianyi Wu. 2011. Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. Proceedings of the VLDB Endowment 4, 11 (2011), 992–1003.
  • Sun et al. (2009) Yizhou Sun, Jiawei Han, Peixiang Zhao, Zhijun Yin, Hong Cheng, and Tianyi Wu. 2009. Rankclus: integrating clustering with ranking for heterogeneous information network analysis. In EDBT.
  • Sun et al. (2012) Yizhou Sun, Brandon Norick, Jiawei Han, Xifeng Yan, Philip S Yu, and Xiao Yu. 2012. Integrating meta-path selection with user-guided object clustering in heterogeneous information networks. In KDD.
  • Sun et al. (2009) Yizhou Sun, Yintao Yu, and Jiawei Han. 2009. Ranking-based clustering of heterogeneous information networks with star network schema. In KDD. ACM, 797–806.
  • Sutskever et al. (2009) Ilya Sutskever, Joshua B Tenenbaum, and Ruslan R Salakhutdinov. 2009. Modelling relational data using bayesian clustered tensor factorization. In Advances in neural information processing systems. 1821–1828.
  • Tang et al. (2008) Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. 2008. Arnetminer: extraction and mining of academic social networks. In KDD.
  • Tsourakakis et al. (2017) Charalampos E Tsourakakis, Jakub Pachocki, and Michael Mitzenmacher. 2017. Scalable motif-aware graph clustering. In WWW.
  • Tucker (1966) Ledyard R Tucker. 1966. Some mathematical notes on three-mode factor analysis. Psychometrika 31, 3 (1966), 279–311.
  • Ugander et al. (2013) Johan Ugander, Lars Backstrom, and Jon Kleinberg. 2013. Subgraph frequencies: Mapping the empirical and extremal geography of large graph collections. In WWW. ACM, 1307–1318.
  • Wu et al. (2017) Jibing Wu, Zhifei Wang, Yahui Wu, Lihua Liu, Su Deng, and Hongbin Huang. 2017. A Tensor CP decomposition method for clustering heterogeneous information networks via stochastic gradient descent algorithms. Scientific Programming 2017 (2017).
  • Yaveroğlu et al. (2014) Ömer Nebil Yaveroğlu, Noël Malod-Dognin, Darren Davis, Zoran Levnajic, Vuk Janjic, Rasa Karapandza, Aleksandar Stojmirovic, and Nataša Pržulj. 2014. Revealing the hidden language of complex networks. Scientific reports 4 (2014), 4547.
  • Yin et al. (2018) Hao Yin, Austin R Benson, and Jure Leskovec. 2018. Higher-order clustering in networks. Physical Review E 97, 5 (2018), 052306.
  • Yin et al. (2017) Hao Yin, Austin R Benson, Jure Leskovec, and David F Gleich. 2017. Local higher-order graph clustering. In KDD.
  • Zhang et al. (2018) Daokun Zhang, Jie Yin, Xingquan Zhu, and Chengqi Zhang. 2018. MetaGraph2Vec: Complex Semantic Path Augmented Heterogeneous Network Embedding. arXiv preprint arXiv:1803.02533 (2018).
  • Zhao et al. (2018) Huan Zhao, Xiaogang Xu, Yangqiu Song, Dik Lun Lee, Zhao Chen, and Han Gao. 2018. Ranking Users in Social Networks with Higher-Order Structures. In AAAI.
  • Zhao et al. (2017) Huan Zhao, Quanming Yao, Jianda Li, Yangqiu Song, and Dik Lun Lee. 2017. Meta-graph based recommendation fusion over heterogeneous information networks. In KDD.
  • Zhou et al. (2017b) Dawei Zhou, Si Zhang, Mehmet Yigit Yildirim, Scott Alcorn, Hanghang Tong, Hasan Davulcu, and Jingrui He. 2017b. A local algorithm for structure-preserving graph cut. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 655–664.
  • Zhou et al. (2017a) Yu Zhou, Jianbin Huang, Heli Sun, Yizhou Sun, and Hong Chong. 2017a. DMSS: A Robust Deep Meta Structure Based Similarity Measure in Heterogeneous Information Networks. arXiv preprint arXiv:1712.09008 (2017).