become time-consuming in large-scale networks. This motivates researchers to develop network embedding techniques which aim to learn a distributed representation vector for each node in a network. An effective network embedding should preserve the similarity between nodes in order to reconstruct the original network.
The word2vec idea has inspired many studies on network representation learning, most of which are in the context of homogeneous information networks, such as DeepWalk, LINE, and node2vec. A homogeneous information network is a simple structural network, where all nodes and links are considered to belong to a single class.
However, in practice, there are usually multiple types of nodes (e.g., authors and papers in DBLP) and links (e.g., cite and publish) that compose a heterogeneous information network (HIN). To measure the similarity between nodes in HINs, many customized similarity or relevance measures based on meta-paths have been proposed in recent years [16, 29]. For example, a meta-path $A \rightarrow P \rightarrow V \rightarrow P \rightarrow A$ (denoted as $APVPA$) indicates two authors having their publications in the same venue. Compared with meta-path-based relevance measures, which utilize only simple structural information, the meta-graph was recently proposed to capture complex structural information in HINs. In short, a meta-graph is a special directed acyclic graph (DAG) that contains at least two embedded meta-paths, such as a DAG containing $APVPA$ and $APTPA$ as shown in Figure 1, where $T$ is the topic of a paper.
Meta-graph is an effective tool to calculate the relevance score between nodes in HINs, where a higher score indicates that there are more meta-graph instances between two nodes, i.e., a closer relationship. How to explore meta-graphs for representation learning in HINs is still an open question. An intuitive idea for meta-graph-based representation learning is to learn the node embedding by leveraging multiple meta-graphs between nodes in HINs. However, existing meta-graph-based relevance measures only utilize the strong relations as defined by the meta-graphs themselves, and they usually ignore the weak relations as indicated by their embedded meta-paths. To address this problem, we propose to learn the node embedding by leveraging both meta-graph and its embedded meta-paths for similarity search. An effective representation learning based on a single meta-graph should contain both strong and weak relations embedded in this meta-graph. In addition, we explore a novel meta-graph-based similarity measure to compute relevance scores that can better capture the strong relations between nodes in HINs.
In summary, the contributions of this paper are three-fold: 1) We are the first to propose a meta-graph-based node embedding method for HINs. Specifically, we develop two node embedding methods based on meta-graphs, named MEGA and MEGA++. 2) We introduce GraphSim, a meta-graph-based similarity measure that outperforms previous meta-graph-based similarity measures such as StructCount and SCSE. 3) Our approaches outperform the competing methods on two real-world datasets.
II Preliminary and Problem Formulation
In this section, we first introduce some related concepts and notations from multilinear algebra. Then, we review some concepts and approaches involved in HIN analysis, including the meta-graph and relevance measures. Finally, we formulate the problem of node embedding in HINs.
II-A Multilinear Algebra
The basic mathematical object of multilinear algebra is the tensor, a higher-order generalization of vectors (first-order tensors) and matrices (second-order tensors) to multiple indices. The order of a tensor is the number of dimensions, also known as modes or ways. An $m$-th order tensor is represented as $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_m}$, where $I_i$ is the cardinality of its $i$-th mode, $i \in \{1, \ldots, m\}$. An element of a vector $\mathbf{x}$, a matrix $\mathbf{X}$, or a tensor $\mathcal{X}$ is denoted by $x_i$, $x_{i,j}$, $x_{i,j,k}$, etc., depending on the number of modes. All vectors are column vectors unless otherwise specified. For an arbitrary matrix $\mathbf{X} \in \mathbb{R}^{I \times J}$, its $i$-th row and $j$-th column vector are denoted by $\mathbf{x}^i$ and $\mathbf{x}_j$, respectively.
Definitions of outer product, partial symmetric tensor, mode-$n$ matricization, and CP factorization are given below, which will be applied to present our approach.
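(Outer Product) The outer product of vectors $\mathbf{x}^{(i)} \in \mathbb{R}^{I_i}$ for $i \in \{1, \ldots, m\}$ is an $m$-th order tensor $\mathcal{X} = \mathbf{x}^{(1)} \circ \cdots \circ \mathbf{x}^{(m)}$, defined element-wise by $x_{i_1, \ldots, i_m} = x^{(1)}_{i_1} \cdots x^{(m)}_{i_m}$ for all values of the indices.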
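As a small illustration of the element-wise definition, the following sketch builds the outer product of three vectors as nested lists. The function names `outer_product` and `scale` and the example vectors are hypothetical and only serve the illustration.

```python
def scale(nested, factor):
    """Multiply every scalar inside a nested list by factor."""
    if isinstance(nested, list):
        return [scale(item, factor) for item in nested]
    return nested * factor


def outer_product(vectors):
    """Outer product of a list of vectors, returned as nested lists.

    Entry [i1][i2]...[im] equals vectors[0][i1] * vectors[1][i2] * ... * vectors[m-1][im].
    """
    if not vectors:
        raise ValueError("need at least one vector")
    head, rest = vectors[0], vectors[1:]
    if not rest:
        return list(head)
    tail = outer_product(rest)
    # Each entry of the first vector scales a full copy of the remaining product.
    return [scale(tail, value) for value in head]


a, b, c = [1, 2], [3, 4, 5], [6, 7]
tensor = outer_product([a, b, c])   # third-order tensor of size 2 x 3 x 2
print(tensor[1][2][0])              # a[1] * b[2] * c[0] = 2 * 5 * 6 = 60
```

The result has one mode per input vector, so three vectors of lengths 2, 3, and 2 give a 2 x 3 x 2 tensor.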
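(Partial Symmetric Tensor) An $m$-th order tensor $\mathcal{X}$ is a rank-one partial symmetric tensor if it is partial symmetric on modes $i_1, \ldots, i_j \in \{1, \ldots, m\}$, and can be written as the tensor product of $m$ vectors, i.e., $\mathcal{X} = \mathbf{x}^{(1)} \circ \cdots \circ \mathbf{x}^{(m)}$, where $\mathbf{x}^{(i_1)} = \cdots = \mathbf{x}^{(i_j)}$.
(Mode-$n$ Matricization) The mode-$n$ matricization or unfolding of an $m$-th order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_m}$ is denoted by $\mathbf{X}_{(n)}$ and is of size $I_n \times J$, where $J = \prod_{i=1, i \neq n}^{m} I_i$.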
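(CP Factorization) For a general tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_m}$, its CANDECOMP/PARAFAC (CP) factorization is
$$\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{x}^{(1)}_r \circ \cdots \circ \mathbf{x}^{(m)}_r \equiv [\![\mathbf{X}^{(1)}, \ldots, \mathbf{X}^{(m)}]\!],$$
where for $i \in \{1, \ldots, m\}$, $\mathbf{X}^{(i)} = [\mathbf{x}^{(i)}_1, \ldots, \mathbf{x}^{(i)}_R]$ are factor matrices of size $I_i \times R$, $R$ is the number of factors, and $[\![\cdot]\!]$ is used for shorthand.
To obtain the CP factorization $[\![\mathbf{X}^{(1)}, \ldots, \mathbf{X}^{(m)}]\!]$, the objective is to minimize the following estimation error:
$$\mathcal{L} = \min_{\mathbf{X}^{(1)}, \ldots, \mathbf{X}^{(m)}} \left\| \mathcal{X} - [\![\mathbf{X}^{(1)}, \ldots, \mathbf{X}^{(m)}]\!] \right\|_F^2.$$
However, $\mathcal{L}$ is not jointly convex w.r.t. $\mathbf{X}^{(1)}, \ldots, \mathbf{X}^{(m)}$. A widely used optimization technique is the Alternating Least Squares (ALS) algorithm, which alternately minimizes $\mathcal{L}$ for each variable while fixing the others, that is,
$$\mathbf{X}^{(k)} \leftarrow \arg\min_{\mathbf{X}^{(k)}} \left\| \mathcal{X} - [\![\mathbf{X}^{(1)}, \ldots, \mathbf{X}^{(m)}]\!] \right\|_F^2.$$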
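The ALS idea can be sketched for a third-order tensor as follows. This is a minimal NumPy illustration under stated assumptions, not the implementation used in the paper: the helper names (`khatri_rao`, `unfold`, `reconstruct`, `cp_als`), the random initialization, and the fixed iteration count are all choices made for the example. The unfolding flattens the remaining modes in row-major order, which differs from some textbook conventions only by a column permutation.

```python
import numpy as np


def khatri_rao(a, b):
    """Column-wise Kronecker product: (I, R) and (J, R) -> (I*J, R)."""
    r = a.shape[1]
    return np.einsum("ir,jr->ijr", a, b).reshape(-1, r)


def unfold(tensor, mode):
    """Mode-n matricization with the remaining modes flattened in row-major order."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def reconstruct(factors):
    """Rebuild a third-order tensor from its three CP factor matrices."""
    a, b, c = factors
    return np.einsum("ir,jr,kr->ijk", a, b, c)


def cp_als(tensor, rank, n_iter=50, seed=0):
    """Alternating least squares for a rank-`rank` CP model of a 3-way tensor."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in tensor.shape]
    errors = []
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            # Solve unfold(tensor, mode) ~= factors[mode] @ kr.T in the least-squares sense.
            solution = np.linalg.lstsq(kr, unfold(tensor, mode).T, rcond=None)[0]
            factors[mode] = solution.T
        errors.append(np.linalg.norm(tensor - reconstruct(factors)))
    return factors, errors


# Synthetic rank-2 tensor built from known factors (hypothetical data).
rng = np.random.default_rng(1)
true_factors = [rng.standard_normal((dim, 2)) for dim in (4, 5, 3)]
tensor = reconstruct(true_factors)
factors, errors = cp_als(tensor, rank=2)
print(errors[0], errors[-1])
```

Because each inner step solves its least-squares subproblem exactly, the reconstruction error cannot increase from one sweep to the next, even though the overall problem is not jointly convex.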
II-B Meta-Graph
(Meta-Graph) A meta-graph is a directed acyclic graph (DAG) defined on a HIN schema. A meta-graph contains a single source node with in-degree 0 and a single target node with out-degree 0. Mathematically, a meta-graph is a tuple $\mathcal{G} = (N, M, n_s, n_t)$, where $N$ is a set of nodes, $M$ is a set of edges, $n_s$ is the source node, and $n_t$ is the target node.
Since a meta-graph has only one source node and one target node, not all sub-graphs of a HIN schema can be meta-graphs.
(Meta-graph-based Relevance Measure) Given a HIN and a meta-graph , the similarity of any two nodes with respect to is defined as:
where is a meta-graph instance of , and is the relevance score between and , which will be determined by the number of meta-graph instances connecting them.
Prior works provide different meta-graph-based relevance measures, such as StructCount, SCSE, and BSCSE.
II-C Problem Formulation
We study the problem of meta-graph-based node embedding in a HIN. Given a HIN, we have two goals in this study. First, we want to explore a customized meta-graph-based relevance measure that can more efficiently capture the complex structural information. Second, we aim to find an effective node embedding that better preserves the closeness between nodes in a HIN, based on the analysis of a meta-graph and its embedded meta-paths. Specifically, we integrate all the similarity information of a meta-graph and its embedded meta-paths into a symmetric matrix and a partial symmetric tensor, and perform multilinear analysis of the coupled partial symmetric tensor and symmetric matrix to find the node embedding.
In this section, we introduce a new similarity measure and the embedding technique of MEGA++. First, we introduce a meta-graph-based similarity measure named GraphSim. Then, we propose a coupled tensor-matrix decomposition to obtain a joint embedding for nodes in HINs.
III-A GraphSim: A Normalized Version of StructCount
First, we propose a new meta-graph-based similarity measure called GraphSim. In previous work, Huang et al. proposed three meta-graph-based similarity measures: StructCount, SCSE, and BSCSE, which is a mixture of the former two. GraphSim can be viewed as a normalized version of StructCount.
StructCount is a straightforward meta-graph-based similarity measure in a HIN, which counts the number of meta-graph instances in the graph that have one given object as the source and another given object as the target.
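To make the counting concrete, the sketch below counts instances of a DBLP-style meta-graph in which two authors are linked through papers that share both a venue and a topic. The meta-graph shape, the toy data, and the function name `struct_count` are assumptions made for illustration only.

```python
# Hypothetical toy HIN: each paper has a set of authors, one venue, and a set of topics.
papers = {
    "p1": {"authors": {"a1", "a2"}, "venue": "KDD", "topics": {"mining"}},
    "p2": {"authors": {"a2"}, "venue": "KDD", "topics": {"mining", "graphs"}},
    "p3": {"authors": {"a3"}, "venue": "ICML", "topics": {"graphs"}},
}


def struct_count(source, target, papers):
    """Count meta-graph instances between two authors.

    An instance is a pair of papers (one by each author) that share the same
    venue, together with one topic that both papers have in common.
    """
    count = 0
    for p in papers.values():
        if source not in p["authors"]:
            continue
        for q in papers.values():
            if target not in q["authors"]:
                continue
            if p["venue"] == q["venue"]:
                count += len(p["topics"] & q["topics"])
    return count


print(struct_count("a1", "a2", papers))  # 2
print(struct_count("a2", "a3", papers))  # 0
```

In this toy data, `a2` and `a3` share a topic but never a venue, so a topic-only meta-path would connect them while the meta-graph does not. This is the kind of weaker relation that a measure based only on meta-graph instances leaves out.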
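(GraphSim) A meta-graph-based similarity measure. Given a symmetric meta-graph $\mathcal{G}$, GraphSim between two nodes $x$ and $y$ is defined as:
$$s(x, y) = \frac{2 \times |\{g_{x \rightsquigarrow y} : g_{x \rightsquigarrow y} \in \mathcal{G}\}|}{|\{g_{x \rightsquigarrow x} : g_{x \rightsquigarrow x} \in \mathcal{G}\}| + |\{g_{y \rightsquigarrow y} : g_{y \rightsquigarrow y} \in \mathcal{G}\}|},$$
where $g_{x \rightsquigarrow y}$ is a meta-graph instance between $x$ and $y$, $g_{x \rightsquigarrow x}$ is that between $x$ and $x$, and $g_{y \rightsquigarrow y}$ is that between $y$ and $y$.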
Compared with StructCount, GraphSim is a normalized measure. The GraphSim score of two nodes is determined by two parts: first, the number of meta-graph instances between the two nodes that follow the meta-graph; second, the balance of their visibility, where the visibility of a node is defined as the number of meta-graph instances between the node and itself. A normalized relevance score better reflects the relation between nodes. For example, suppose an author $a_1$ published all four of his papers with $a_2$, while $a_2$ published five papers with $a_3$, and $a_3$ published another five papers with other authors. Without normalization, the relation between $a_2$ and $a_3$ appears closer than that between $a_1$ and $a_2$. Common sense, however, suggests that $a_1$ and $a_2$ have the closer relation, which indicates that GraphSim is a better measure.
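The effect of normalization can be checked on a small count matrix. The sketch assumes that GraphSim takes the PathSim-style form 2·count(x, y) / (count(x, x) + count(y, y)), as the definition's three instance sets suggest. The count matrix is hypothetical and loosely mirrors the co-authorship example above.

```python
def graphsim(counts, x, y):
    """PathSim-style normalization of raw meta-graph instance counts (assumed form)."""
    denom = counts[x][x] + counts[y][y]
    return 2 * counts[x][y] / denom if denom else 0.0


# Hypothetical symmetric instance counts for three authors (indices 0, 1, 2).
# The diagonal holds each author's visibility (instances from the author to itself).
counts = [
    [4, 4, 0],
    [4, 9, 5],
    [0, 5, 10],
]

raw_01, raw_12 = counts[0][1], counts[1][2]
sim_01, sim_12 = graphsim(counts, 0, 1), graphsim(counts, 1, 2)
print(raw_01, raw_12)   # raw counts favour the pair (1, 2)
print(sim_01, sim_12)   # normalized scores favour the pair (0, 1)
```

The raw count ranks the pair (1, 2) higher, while the normalized score ranks the pair (0, 1) higher because all of author 0's visibility is shared with author 1.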
III-B MEGA++: Node Embedding by CTMD
In this section, we show how to jointly consider similarity information of a meta-graph and its embedded meta-paths to learn a node embedding. The basic idea is the integration of similarity matrices and coupled embedding by joint factorization. Specifically, we first compute a meta-graph similarity matrix using the proposed GraphSim, denoted as , and for each meta-path , compute an embedded meta-path similarity matrix using the PathSim, denoted as . Next, we concatenate the embedded meta-path similarity matrices of different embedded meta-paths to form a third-order tensor comprising three modes: nodes, nodes, and paths, denoted as . Then, we introduce a novel coupled tensor-matrix decomposition (CTMD) method to find common latent features between and . Last, we use the latent features to measure the similarity between different nodes in the HIN.
In the following, we detail the CTMD method, which can be seen as a special case of the coupled tensor-matrix decomposition  with input partial symmetric tensor and symmetric matrix . Notice that since similarity matrix is symmetric, the resulting is a partial symmetric tensor, and is a symmetric matrix.
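The construction of the meta-path similarity tensor can be sketched as follows. The example computes PathSim matrices for two meta-paths from toy incidence matrices and stacks them along a third "paths" mode. All data and function names (`pathsim`, `commuting_matrix`) are hypothetical, and the sketch covers only the tensor construction, not the decomposition itself.

```python
import numpy as np


def pathsim(commuting):
    """PathSim from a symmetric commuting matrix M: 2*M[x,y] / (M[x,x] + M[y,y])."""
    diag = np.diag(commuting).astype(float)
    denom = diag[:, None] + diag[None, :]
    # Pairs whose visibilities are both zero get similarity 0.
    return np.divide(2.0 * commuting, denom, out=np.zeros_like(denom), where=denom > 0)


def commuting_matrix(author_paper, paper_mid):
    """Path counts for a symmetric meta-path A-P-X-P-A."""
    half = author_paper @ paper_mid   # authors x (venues or topics)
    return half @ half.T              # symmetric by construction


# Hypothetical incidence matrices: 3 authors, 3 papers, 2 venues, 2 topics.
author_paper = np.array([[1, 0, 0], [1, 1, 0], [0, 0, 1]])
paper_venue = np.array([[1, 0], [1, 0], [0, 1]])
paper_topic = np.array([[1, 0], [1, 1], [0, 1]])

s_apvpa = pathsim(commuting_matrix(author_paper, paper_venue))
s_aptpa = pathsim(commuting_matrix(author_paper, paper_topic))

# Third-order tensor with modes (nodes, nodes, paths).
tensor = np.stack([s_apvpa, s_aptpa], axis=2)
print(tensor.shape)
print(np.allclose(tensor, tensor.transpose(1, 0, 2)))
```

Since every frontal slice is a symmetric similarity matrix, the stacked tensor is symmetric in its first two modes, which is the partial symmetry the decomposition relies on.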
Tensors (including matrices) provide a natural and efficient representation for meta-graph data, but there is no guarantee that such a representation will be good for subsequent learning, since learning will only be successful if the regularities that underlie the data can be discerned by the model. Tensor factorization is a powerful tool for analyzing tensors. Previous work found that CP factorization (a higher-order generalization of SVD) is particularly effective at capturing the connections and finding valuable features in tensor data. Motivated by these observations, we exploit the benefits of CP and SVD factorizations to find an effective embedding with respect to the meta-path-based similarity tensor and the meta-graph similarity matrix.
Based on above analysis, we design our CTMD objective function as below:
where and are latent matrices. Specifically, is jointly learned from both meta-graph and meta-path similarity information.
The objective function in Eq. (7) is not jointly convex with respect to its two latent matrices, so there is no closed-form solution. We introduce an effective iterative method to solve this problem. The main idea is to decouple the parameters using an Alternating Direction Method of Multipliers (ADMM) approach, alternately optimizing the objective with respect to one variable while fixing the others.
Update : First, we optimize while fixing . Notice that the objective function in Eq. (7) involves a fourth-order term with respect to which is difficult to optimize directly. To obviate this problem, we use a variable substitution technique and minimize the following objective function
where is an auxiliary variable.
The augmented Lagrangian function of Eq. (8) is
where is the Lagrange multiplier, and is the penalty parameter which can be adjusted efficiently according to .
To compute , Eq. (9) can be transformed as
where is the mode-1 matricization of , and .
By setting the derivative of Eq. (10) with respect to to zero, we obtain the closed-form solution
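To compute this solution efficiently, we consider the following property of the Khatri-Rao product of two matrices $\mathbf{A}$ and $\mathbf{B}$ with the same number of columns:
$$(\mathbf{A} \odot \mathbf{B})^{\mathrm{T}} (\mathbf{A} \odot \mathbf{B}) = \mathbf{A}^{\mathrm{T}} \mathbf{A} * \mathbf{B}^{\mathrm{T}} \mathbf{B}, \quad (12)$$
where $\odot$ denotes the Khatri-Rao product and $*$ denotes the Hadamard (element-wise) product.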
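A quick numerical check of the standard Khatri-Rao Gram identity, $(\mathbf{A} \odot \mathbf{B})^{\mathrm{T}} (\mathbf{A} \odot \mathbf{B}) = \mathbf{A}^{\mathrm{T}} \mathbf{A} * \mathbf{B}^{\mathrm{T}} \mathbf{B}$, shows why it saves work. We assume this is the property intended here, since it is the one commonly used to avoid forming the tall Khatri-Rao matrix. The matrix sizes and the helper name `khatri_rao` are arbitrary choices for the example.

```python
import numpy as np


def khatri_rao(a, b):
    """Column-wise Kronecker product: (I, R) and (J, R) -> (I*J, R)."""
    r = a.shape[1]
    return np.einsum("ir,jr->ijr", a, b).reshape(-1, r)


rng = np.random.default_rng(0)
a = rng.standard_normal((50, 4))
b = rng.standard_normal((60, 4))

# Direct route: build the 3000 x 4 Khatri-Rao matrix, then its 4 x 4 Gram matrix.
kr = khatri_rao(a, b)
gram_direct = kr.T @ kr

# Shortcut: two small Gram matrices combined by an element-wise product.
gram_fast = (a.T @ a) * (b.T @ b)

print(np.allclose(gram_direct, gram_fast))
```

The shortcut never materializes the tall matrix, so its cost depends on the sum of the row counts rather than on their product.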
Then the auxiliary matrix can be optimized successively in a similar way, and the solution is
where is the mode-2 matricization of , and .
Moreover, we optimize the Lagrange multiplier using the gradient descent method by
Update : Next, we optimize while fixing and . We need to optimize the following objective function
where is the mode-3 matricization of , and .
By setting the derivative of Eq. (15) with respect to to zero, we obtain the closed-form solution as
The overall algorithm is summarized in Algorithm 1.
III-C Time Complexity
The estimate for the update of according to Eq. (11) is as follows: for the computation of the term ; for the computation of the term due to Eq. (12) and for its Cholesky decomposition; for the computation of the system solution that gives the updated value of . An analogous estimate can be derived for the update of .
Overall, the updates of model parameters and , require O() arithmetic operations in total.
In this section, we conduct extensive experiments in order to test the effectiveness of the proposed methods: GraphSim, MEGA and MEGA++. We first introduce two real-life datasets and a set of methods to be compared. Then, we evaluate the effectiveness of proposed methods on four data mining tasks: clustering, classification, parameter analysis and time analysis.
We use two real datasets (i.e., DBLP-4-Area and YAGO Movie) in the evaluation. Table I shows some statistics about them. DBLP-4-Area is a subset of the original DBLP, which contains 5,237 papers (P), 5,915 authors (A), 18 venues (V), and 4,479 topics (T). The authors and venues are from 4 areas: database, data mining, machine learning, and information retrieval. YAGO Movie is a subset of YAGO, which contains 7,332 movies (M), 10,789 actors (A), 1,741 directors (D), 3,392 producers (P), and 1,483 composers (C). The movies are divided into five genres: action, horror, adventure, sci-fi, and crime. The guided meta-graphs designed for the three tasks are shown in Figures 1 and 3.
The proposed methods are compared with meta-graph-based relevance measures (i.e., StructCount, SCSE, and BSCSE) and network embedding approaches (i.e., DeepWalk and LINE) in clustering and classification tasks. The experimental results are shown in the following sections.
|Pre. Meta-Graph Measures||Pre. Network Embedding||Our Works|
IV-A Clustering Results
We first conduct a clustering task to evaluate the performance of the compared methods on the DBLP and YAGO Movie datasets. For DBLP, we use the areas of authors as ground-truth labels for clustering authors (A2A), and use the areas of venues as labels for clustering venues (V2V). For YAGO Movie, we use the genres of movies as labels (M2M). To be specific, we use $k$-means on the derived meta-graph-based relevance matrices for the clustering task. To evaluate the results, we use NMI and purity as evaluation metrics.
Clustering results of the three tasks are shown in Table II. Compared with previous meta-graph-based relevance measures, the proposed GraphSim consistently performs best. We observe at least a 19.94% improvement in NMI for GraphSim over the previous meta-graph-based relevance measures when clustering the venues and authors in DBLP. Because clustering results can be sensitive to the initialization of centroid seeds, we run 100 random initializations. All methods perform worse on YAGO Movie than on DBLP, but the proposed methods, especially MEGA++, still outperform prior works.
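For reference, the two clustering metrics can be computed as in the sketch below. NMI has several normalization variants. This example assumes normalization by the arithmetic mean of the two entropies, which may differ from the variant used in the experiments. The labels are hypothetical.

```python
from collections import Counter
from math import log


def purity(labels, clusters):
    """Fraction of nodes that carry the majority label of their cluster."""
    total = 0
    for c in set(clusters):
        members = [l for l, k in zip(labels, clusters) if k == c]
        total += Counter(members).most_common(1)[0][1]
    return total / len(labels)


def entropy(assignments):
    """Shannon entropy (natural log) of a labelling."""
    n = len(assignments)
    return -sum((c / n) * log(c / n) for c in Counter(assignments).values())


def nmi(labels, clusters):
    """Mutual information normalized by the mean of the two entropies (assumed variant)."""
    n = len(labels)
    joint = Counter(zip(labels, clusters))
    label_counts, cluster_counts = Counter(labels), Counter(clusters)
    mutual = sum(
        (c / n) * log(c * n / (label_counts[l] * cluster_counts[k]))
        for (l, k), c in joint.items()
    )
    h_l, h_c = entropy(labels), entropy(clusters)
    if h_l + h_c == 0:
        return 1.0  # both labellings are trivial and therefore identical
    return 2 * mutual / (h_l + h_c)


areas = ["db", "db", "ml", "ml"]      # hypothetical ground-truth areas
good = [0, 0, 1, 1]                   # clustering that matches the areas
bad = [0, 1, 0, 1]                    # clustering independent of the areas
print(purity(areas, good), nmi(areas, good))
print(purity(areas, bad), nmi(areas, bad))
```

Both metrics are larger for better clusterings, and both ignore how the cluster identifiers are named.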
IV-B Classification Results
We then conduct a classification task. In contrast to the clustering task, we do not evaluate the classification of venues in DBLP, as the total number of venues is only 18. We first apply the previous methods and our methods to generate the similarity matrices or the embedding space of the original network. Then, we randomly partition the samples, using 80% of the samples as the training set and the rest as the testing set. Last, we apply a $k$-nearest neighbor ($k$-NN) classifier to evaluate the methods on the training and testing sets [29, 15]. To avoid the bias of a single random partition, we repeat the experiment with 10 different random partitions. For the multi-label classification task, we use the average Macro-F1 and Micro-F1 scores as the evaluation metrics.
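The evaluation protocol can be sketched as below: a $k$-NN vote over a precomputed similarity row, followed by Macro-F1 and Micro-F1 on the predictions. The value of `k`, the toy similarities, and the function names are hypothetical. The sketch treats the task as single-label classification, in which Micro-F1 equals accuracy.

```python
from collections import Counter


def knn_predict(similarity_row, train_ids, train_labels, k):
    """Majority label among the k training nodes most similar to the query node."""
    ranked = sorted(train_ids, key=lambda j: similarity_row[j], reverse=True)[:k]
    votes = Counter(train_labels[j] for j in ranked)
    return votes.most_common(1)[0][0]


def f1_scores(y_true, y_pred):
    """Return (Macro-F1, Micro-F1) for single-label predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1

    def f1(tp_c, fp_c, fn_c):
        denom = 2 * tp_c + fp_c + fn_c
        return 2 * tp_c / denom if denom else 0.0

    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical similarities of test node 4 to all five nodes (itself included).
row = [0.9, 0.8, 0.1, 0.2, 1.0]
train_labels = {0: "db", 1: "db", 2: "ml", 3: "ml"}
predicted = knn_predict(row, [0, 1, 2, 3], train_labels, k=3)
print(predicted)

macro, micro = f1_scores(["db", "db", "ml", "ml"], ["db", "ml", "ml", "ml"])
print(macro, micro)
```

Restricting the ranking to `train_ids` keeps the query node and other test nodes out of the vote, so the similarity of a node to itself never influences the prediction.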
GraphSim outperforms the existing relevance measures (i.e., StructCount, SCSE, and BSCSE) because it better represents the relations between objects in HINs by normalizing the presence of meta-graph structures. MEGA++ outperforms all the baselines because it captures both lower-order (i.e., meta-path) and higher-order (i.e., meta-graph) structural information, using the coupled tensor-matrix decomposition method to obtain a joint embedding for nodes in HINs.
IV-C Parameter Analysis
In this section, we first analyze the parameter sensitivity of our methods as shown in Figure 10. We use two evaluation metrics, Normalized Mutual Information(NMI), and Purity (both the larger, the better), to evaluate the performances of our methods for clustering task. In Figure 10 (a)-(b), the penalty parameters of MEGA is used for minimizing the Frobenius Norm of embedding space and in Eq. (8), and the best performance is achieved when is set as 3.2768e-04. From 10 (b)-(c), setting the embedding dimensions as 5 shows the best performance for both MEGA and MEGA++. MEGA++ outperforms MEGA with the same number of embedding dimensions. The Figure 10 (e)-(f) show the two penalty parameters and of MEGA++. The penalty parameter of MEGA++ is the same as that in MEGA. The penalty parameter of MEGA++ is used for minimizing the Frobenius Norm of meta-graph similarity matrix and its embedding space in Eq. (7). We find that and produce the best performance of clustering task.
IV-D Time Analysis
In this section, we evaluate the execution time of MEGA++. In Table IV, it shows the execution time is linear with respect to the embedding dimensions. Based on the time complexity of MEGA++, when we have a fixed size of dataset, the embedding dimensions , and the number of views are linear with respect to the execution time. Sometimes, MEGA++ can be early stopped when it is already converge, so as to the same time consuming of DBLP (A2A) with and . The same results are shown in the real testing on three tasks, which show the efficiency of MEGA++.
V Related Work
V-A Network Embedding
Network embedding aims to learn low-dimensional representations from a network. Traditional approaches usually construct an affinity graph from the feature vectors of the vertices and then compute the eigenvectors of the affinity graph. Other approaches apply matrix factorization to the adjacency matrix of the graph.
Recently, DeepWalk and LINE were proposed for learning network embeddings. Besides these two popular node embedding methods, many other network embedding methods have been proposed in recent years [6, 10, 33, 7]. Among them, [6, 33] learn the node embedding with deep learning encoders. However, none of the previous node embedding methods consider the meta-graph and its embedded meta-path information.
V-B Tensor Learning and Embedding
Like deep learning, tensor learning has become a popular topic in recent years thanks to stronger computing capability and lower computation cost [14, 19, 11, 20, 24, 5, 12]. Coupled tensor-matrix embedding tries to fuse multiple information sources, where matrices and tensors sharing some common modes are jointly embedded. A gradient-based optimization approach for joint tensor-matrix analysis was proposed by Acar et al.
V-C Multi-view Learning
Multi-view learning considers one object from different views [26, 27, 23, 13]. In this paper, we view the HIN from different perspectives, such as meta-paths and meta-graphs, and fuse the different information for node embedding. However, none of these frameworks is directly applicable to learning a joint embedding from a partial symmetric tensor and a symmetric matrix, and none of them leverages meta-path and meta-structure information for similarity search in HINs.
VI Conclusion and Future Work
In this paper, we proposed a new meta-graph-based relevance measure, i.e., GraphSim, and two node embedding methods, i.e., MEGA and MEGA++, which leverage the similarity information of a meta-graph and its embedded meta-paths. In the experiments, MEGA++ shows better performance than the other compared methods on different tasks. In the future, we can extend the proposed node embedding from a single meta-graph to multiple meta-graphs in a HIN. Meanwhile, we can utilize heterogeneous and homogeneous information together for node embedding.
This work is supported in part by NSFC through grants No. 61503253 and 61672313, NSF through grants No. IIS-1526499, IIS-1763325, and CNS-1626432, and NSF of Guangdong Province through grant No. 2017A030313339.
- [1] Evrim Acar, Tamara G Kolda, and Daniel M Dunlavy. All-at-once optimization for coupled matrix and tensor factorizations. arXiv:1105.3422, 2011.
- [2] Amr Ahmed, Nino Shervashidze, Shravan Narayanamurthy, Vanja Josifovski, and Alexander J Smola. Distributed large-scale natural graph factorization. In WWW. ACM, 2013.
- [3] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In NIPS, 2001.
- [4] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 2011.
- [5] Bokai Cao, Lifang He, Xiaokai Wei, Mengqi Xing, Philip S Yu, Heide Klumpp, and Alex D Leow. t-bne: Tensor-based brain network embedding. In SDM. SIAM, 2017.
- [6] Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C Aggarwal, and Thomas S Huang. Heterogeneous network embedding via deep architectures. In KDD. ACM, 2015.
- [7] Ting Chen and Yizhou Sun. Task-guided and path-augmented heterogeneous network embedding for author identification. In WSDM. ACM, 2017.
- [8] Beyza Ermiş, Evrim Acar, and A Taylan Cemgil. Link prediction in heterogeneous data via generalized coupled tensor factorization. DMKD, 2015.
- [9] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In KDD. ACM, 2016.
- [10] Huan Gui, Jialu Liu, Fangbo Tao, Meng Jiang, Brandon Norick, and Jiawei Han. Large-scale embedding learning in heterogeneous event data. In ICDM. IEEE, 2016.
- [11] Tengjiao Guo, Le Han, Lifang He, and Xiaowei Yang. A ga-based feature selection and parameter optimization for linear support higher-order tensor machine. Neurocomputing, 2014.
- [12] Lifang He, Xiangnan Kong, Philip S Yu, Xiaowei Yang, Ann B Ragin, and Zhifeng Hao. Dusk: A dual structure-preserving kernel for supervised tensor learning with applications to neuroimages. In SDM. SIAM, 2014.
- [13] Lifang He, Chun-Ta Lu, Hao Ding, Shen Wang, Linlin Shen, Philip S Yu, and Ann B Ragin. Multi-way multi-level kernel modeling for neuroimaging classification. In CVPR, 2017.
- [14] Lifang He, Chun-Ta Lu, Guixiang Ma, Shen Wang, Linlin Shen, Philip S Yu, and Ann B Ragin. Kernelized support tensor machines. In ICML, 2017.
- [15] Zhipeng Huang, Yudian Zheng, Reynold Cheng, Yizhou Sun, Nikos Mamoulis, and Xiang Li. Meta structure: Computing relevance in large heterogeneous information networks. In KDD. ACM, 2016.
- [16] Ni Lao and William W Cohen. Relational retrieval using a combination of path-constrained random walks. Machine Learning, 2010.
- [17] Athanasios P Liavas and Nicholas D Sidiropoulos. Parallel algorithms for constrained tensor factorization via alternating direction method of multipliers. TSP, 2015.
- [18] Zhouchen Lin, Risheng Liu, and Zhixun Su. Linearized alternating direction method with adaptive penalty for low-rank representation. In NIPS, 2011.
- [19] Xiaolan Liu, Tengjiao Guo, Lifang He, and Xiaowei Yang. A low-rank approximation-based transductive support tensor machine for semisupervised classification. TIP, 2015.
- [20] Chun-Ta Lu, Lifang He, Weixiang Shao, Bokai Cao, and Philip S Yu. Multilinear factorization machines for multi-task multi-view learning. In WSDM. ACM, 2017.
- [21] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv, 2013.
- [22] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social representations. In KDD. ACM, 2014.
- [23] Weixiang Shao, Lifang He, Chun-Ta Lu, Xiaokai Wei, and Philip S Yu. Online unsupervised multi-view feature selection. In ICDM, 2016.
- [24] Weixiang Shao, Lifang He, and Philip S Yu. Clustering on multi-source incomplete data via tensor modeling and factorization. In PAKDD. Springer, 2015.
- [25] Lichao Sun, Weiran Huang, Philip S Yu, and Wei Chen. Multi-round influence maximization. In KDD. ACM, 2018.
- [26] Lichao Sun, Yuqi Wang, Bokai Cao, Philip S Yu, Witawas Srisa-An, and Alex D Leow. Sequential keystroke behavioral biometrics for mobile user identification via multi-view deep learning. In ECML-PKDD. Springer, 2017.
- [27] Lichao Sun, Xiaokai Wei, Jiawei Zhang, Lifang He, Philip S Yu, and Witawas Srisa-an. Contaminant removal for android malware detection systems. In BigData. IEEE, 2017.
- [28] Yizhou Sun, Jiawei Han, Charu C Aggarwal, and Nitesh V Chawla. When will it happen?: relationship prediction in heterogeneous information networks. In WSDM. ACM, 2012.
- [29] Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S Yu, and Tianyi Wu. Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. VLDB, 2011.
- [30] Yizhou Sun, Brandon Norick, Jiawei Han, Xifeng Yan, Philip S Yu, and Xiao Yu. Pathselclus: Integrating meta-path selection with user-guided object clustering in heterogeneous information networks. TKDD, 2013.
- [31] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. Line: Large-scale information network embedding. In WWW. ACM, 2015.
- [32] Charles F Van Loan. Structured matrix problems from tensors. In Exploiting Hidden Structure in Matrix Computations: Algorithms and Applications. Springer, 2016.
- [33] Daixin Wang, Peng Cui, and Wenwu Zhu. Structural deep network embedding. In KDD. ACM, 2016.