1 Introduction
Graph structure is a common and flexible data structure that can represent data in a variety of fields, including social networks, biological protein-protein networks, knowledge networks, etc. Graphs let us efficiently store and access relational knowledge about interacting entities. For example, in a social network a node can represent a person, and two nodes are connected by an edge if the corresponding people know each other. In recent years, owing to the rapid increase in the amount of data, the resulting graphs have become increasingly complicated, which makes it difficult to extract valid information from them. It is therefore important to preprocess complex raw data and convert it into a form that can be exploited effectively.
High-dimensional complex data are expected to be represented as simple, easy-to-process low-dimensional representations. However, traditional manual feature extraction requires a great deal of manpower and relies on highly specialized knowledge. Representation learning has therefore played a key role in graph machine learning. Representation learning is a technique for learning the characteristics of data, transforming raw data into a form that machine learning can exploit effectively. It avoids the trouble of manually extracting features and lets the computer learn how to extract features while learning to use them, i.e., learning how to learn. Representation learning can be regarded as a kind of preprocessing: it does not directly produce the final results, but an effective representation for producing desirable results. In other words, the choice of representation usually depends on the subsequent learning tasks; a good representation should make the downstream tasks easier to learn.
The main topic of representation learning on graphs is to deal with relational or connection patterns. Effective encoding of basic structures such as nodes, edges, and subgraphs leads to a quantitative understanding of the data and helps promote learning efficiency on downstream tasks. Recently, many valuable results have been obtained. The Laplacian eigenmaps method is one of the earliest and most famous representation learning methods; its loss function weights pairs of nodes according to their proximity in the graph, and it is a direct encoding method. DeepWalk and node2vec also rely on direct encoding. However, instead of attempting to decode fixed deterministic distance metrics, these methods obtain the representation of the target objects through random walks on the graph, which makes the graph proximity measure more flexible and has led to superior performance in a number of settings. Nevertheless, these direct encoding methods have disadvantages such as too many parameters and insufficient use of the information in the graph (such as node features). Therefore, graph neural network (GNN) frameworks, which obtain the representation of the target object through deep learning, were developed.
Inspired by the parameter-sharing operation in convolutional neural networks, the graph convolutional network (GCN) was developed (Kipf et al.), so that the convolution operation can be applied to irregular graph data (as opposed to regular image data). However, all the above methods start from the binary relationships (i.e., edges) in the graph and cannot leverage the higher-order local structures (i.e., motifs), which may help extract more effective information from complex graph structures.

Present work. This paper develops a new framework that combines motifs with traditional representation learning. We first analyze the important statistics of the graph; then the graph convolutional network is chosen as the basic model and a new framework is developed. This framework, named graph convolutional multilayer networks based on motifs (mGCMN), improves task accuracy while spending only a little more time. We believe that combining motifs essentially redefines the node neighborhoods and redistributes the weights of the graph network. We apply a variety of motifs and conduct a large number of experiments, all of which obtain better test results than the baselines. At the same time, the relationship between classification accuracy and the clustering coefficient is revealed.
The rest of this article is organized as follows. Section 2 outlines related work, and Section 3 introduces mGCMN, our representation learning method. Our experiments are described in Section 4 and the results are given in Section 5. Section 6 discusses the work and concludes.
2 Related Work
Our method is related to recent advances in the concepts and applications of motifs, as well as to previous representation learning methods, such as semi-supervised methods that apply convolutional neural networks to graph-structured data. This section therefore focuses on the previous work most closely related to mGCMN.
2.1 The Concept and Application of Motif
A motif is an interconnection pattern that occurs in a complex network significantly more often than in random networks under given conditions. Motifs are generally considered the basic building blocks of complex networks. For example, Fig. 1A shows all the three-node directed motifs.
Motifs are important, and previous research has established that they provide a new perspective for identifying graph types. For example, consider two transcriptional regulatory networks (biochemical networks responsible for regulating gene expression in cells) corresponding to organisms from different domains: a eukaryote (Saccharomyces cerevisiae) and a bacterium (E. coli). The two transcription networks show the same motifs: a three-node pattern called the "feed-forward loop" and a four-node pattern called the "bi-fan". General trends in food webs are shown by a three-node pattern called the "three chain" and a four-node pattern called the "bi-parallel" (the four motifs are shown in Fig. 1B). The food web reflects energy flow, while the gene regulatory network reflects information flow, which appears to have a significantly different structure from energy flow. On the other hand, by selecting appropriate motifs we can capture important structural information (such as geographic location or urban hubs) that is difficult to capture through edges alone.
2.2 Representation Learning Method
Representation learning is a technique for learning the characteristics of data. It converts the original data into a low-dimensional, information-rich form that is convenient for machine learning to exploit effectively. From a certain perspective, it can be regarded as a dimensionality reduction method. Due to the flexibility of the graph structure, a lot of raw data is stored in the form of graphs, so the representation learning methods introduced below are all graph representation learning methods. Generally speaking, they can be divided into three categories:
2.2.1 Embedding Approaches Based on Factorization
Inspired by classic techniques for dimensionality reduction, early methods for learning node representations largely focused on matrix-factorization approaches. A representative example is the Laplacian eigenmaps method, which we can view within the encoder-decoder framework as a direct encoding approach. Following Laplacian eigenmaps, a large number of representation learning methods based on inner products appeared, such as Graph Factorization (GF), GraRep, and HOPE. Their main difference lies in the base matrix used: GF uses the original adjacency matrix of the graph, GraRep is based on various powers of the adjacency matrix, and HOPE considers more general variants of it.
2.2.2 Embedding Approaches Based on Random Walk
This type of method is also a form of direct encoding; the key innovation is how node embeddings are optimized. Instead of using deterministic graph proximity measures, these methods use flexible, stochastic measures of graph proximity (essentially, the frequency with which node pairs appear in the same random walks), which performs well in many scenarios. Representative methods include node2vec, DeepWalk, and HARP, etc. node2vec creatively uses two hyperparameters (the backtracking, or return, parameter p and the forward, or in-out, parameter q) to control the random walk, making it a compromise between depth-first and breadth-first random walks. DeepWalk is another famous approach based on random walks; it uses truncated random walks to convert the nonlinear graph structure into multiple linear sequences of nodes. HARP uses a process called graph coarsening, which merges closely related nodes of the graph into "super nodes", and then runs DeepWalk, node2vec, or another method on the resulting coarsened graph.

2.2.3 Embedding Approaches Based on Neural Networks
The above two types of node embedding methods are direct encoding methods. However, direct encoding generates a representation vector for each trained node independently, which leads to several disadvantages: i) no parameters are shared between nodes; ii) high computational complexity; iii) node attributes cannot be leveraged during encoding; iv) embeddings exist only for nodes seen during training. This led to the emergence of neural network-based node representation methods, which overcome the above disadvantages and achieve excellent results in many respects. Representative methods include Deep Neural Networks for Graph Representations (DNGR), Structural Deep Network Embeddings (SDNE), the Graph Neural Network (GNN), and the Graph Convolutional Network (GCN), etc. DNGR and SDNE reduce the computational complexity by using deep learning (autoencoders) to compress the information of a node's local neighborhood. GNN is the original graph neural network, implementing a function that maps a graph and one of its nodes to Euclidean space. GCN, a very well-known method first proposed by Kipf et al., cleverly applies the convolution operation (representing a node as a function of its neighborhood, as a convolutional neural network does in image processing) to the graph structure.

3 Method
The key idea of our method is that combining motifs essentially redefines the node neighborhoods and redistributes the weights of the graph network. We regard the graph convolutional network combined with motifs as a preprocessing tool. In general, we combine a custom motif matrix M with the graph convolutional network to process each node's local neighborhood feature information (for example, the node's text attributes or statistical properties), and pass the result into a fully connected network to get the final classification result. The process is shown in Fig. 2.
First, the custom motif matrix M converts the original edge adjacency graph into a motif adjacency graph (equivalent to redefining the edge weights); then, the graph convolution operation is performed on the motif adjacency graph to obtain an intermediate embedding of each node; finally, the intermediate embedding is fed into a fully connected network for further processing to obtain the final embedding.
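As an illustration, this three-stage pipeline can be sketched in a few lines of NumPy. This is only a minimal dense sketch under assumed details: we use GCN-style symmetric normalization of the motif matrix with added self-loops, ReLU activations, and a softmax output; the paper's exact normalization and layer sizes may differ.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def normalize(M):
    """Symmetric normalisation D^{-1/2} (M + I) D^{-1/2}, as in GCN (assumed here)."""
    M_hat = M + np.eye(M.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(M_hat.sum(axis=1))
    return (M_hat * d_inv_sqrt).T * d_inv_sqrt

def mgcmn_forward(M, X, gcn_weights, mlp_weights):
    """One forward pass: motif-based graph convolutions, then an MLP head."""
    M_norm = normalize(M)
    H = X
    for W in gcn_weights:            # graph convolution on the motif graph
        H = relu(M_norm @ H @ W)
    for W in mlp_weights[:-1]:       # fully connected layers
        H = relu(H @ W)
    return softmax(H @ mlp_weights[-1])
```

For instance, `mgcmn_forward(M, X, [W0], [W1])` runs one motif convolution followed by a linear softmax classifier over the node classes.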
Next, we introduce the custom motif matrix and various motifs in Section 3.1; then we describe the mGCMN embedding algorithm that generates node embeddings in Section 3.2; finally, in Section 3.3, we give a complexity analysis of the algorithm together with a proof.
3.1 Motif Matrix
We explain the motif matrix in detail in this section. For convenience of explanation, we first fix some notation. Formally, let G = (V, E) be a graph, where V is the set of nodes in the network and E the set of edges, with |V| = n. Given a labeled network with node feature information, let X (an n × d matrix, where n is the number of nodes and d the feature dimension) be the feature information matrix and Y (an n × c matrix, where c is the label dimension) the label information matrix. Our goal is to use the labels of some of the nodes for training and to generate a vector representation matrix of the nodes.
Then, we give the following definitions:

Definition 1: Given a graph G, a motif with central node u (the node currently being considered) is defined as M = (V_M, E_M), where V_M is the node set of M containing u, and E_M satisfies that for every pair of nodes i, j, if (i, j) is in E_M, then i, j ∈ V_M.

Definition 2: An instance of a motif M with central node u on graph G is a subgraph G_I = (V_I, E_I) of G, where V_I ⊆ V and E_I ⊆ E, satisfying (i) |V_I| = |V_M| and f(u) ∈ V_I; (ii) for every pair i, j ∈ V_M, if (i, j) ∈ E_M, then (f(i), f(j)) ∈ E_I, where f: V_M → V_I is a bijection.
After that, we can define the motif matrix. Given a motif M, the motif matrix A_M of M is defined as follows: (A_M)_{ij}, the entry in the i-th row and j-th column of A_M, is the number of times that nodes i and j appear in the same instance of M. Formally,

(A_M)_{ij} = Σ_{I ∈ I(M)} 1[i ∈ V_I] · 1[j ∈ V_I],

where I(M) denotes the set of instances of M in G and 1[·] is the indicator function.
The above is the usual definition of a motif adjacency matrix; we extend it as follows. When nodes i and j are the same node, we also record the number of times that node appears in instances of M. For example, the positional relationship of the "triangle" motif is as in Fig. 3(A), while after extending the definition, positional relationships like Fig. 3(B) are also counted. For a specific example, the triangle motif adjacency matrix of a small graph is shown in Fig. 3(C).
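For the triangle motif on an undirected graph with 0/1 adjacency matrix A, this extended matrix can be computed directly: for an adjacent pair (i, j), the entry is the number of common neighbours, A_ij · (A²)_ij, and each extended diagonal entry is the number of triangles through that node, (A³)_ii / 2. A small sketch (our own illustration, not the paper's code):

```python
import numpy as np

def triangle_motif_matrix(A):
    """Triangle motif adjacency matrix under the extended definition.

    A: symmetric 0/1 adjacency matrix with zero diagonal.
    Off-diagonal M[i, j]: number of triangles containing both i and j.
    Diagonal M[i, i]: number of triangles containing node i
    (the extension described above).
    """
    A = np.asarray(A)
    A2 = A @ A
    M = A * A2                                  # common neighbours of adjacent pairs
    np.fill_diagonal(M, np.diag(A @ A2) // 2)   # (A^3)_ii / 2 triangles per node
    return M

# Toy graph: a triangle {0, 1, 2} plus a pendant edge 2-3.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
M = triangle_motif_matrix(A)
```

Here nodes 0, 1, 2 each lie in one triangle, while the pendant edge (2, 3) lies in none, so its off-diagonal entry is zero.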
3.2 Embedding Algorithm
The specific procedure is given in the mGCMN embedding algorithm.
We first obtain the custom motif matrix A_M as defined in Section 3.1. The numbers of motif-based GCN layers and MLP layers are specified by the user in advance. In line 1, the representation of every node v is initialized as h_v^(0) = x_v. In lines 2-6, we perform the motif-based graph convolution; in the formula, AGG denotes a weighted nonlinear aggregation function whose purpose is to recombine the information of the target node and its neighbors. Formally,

h_v^(k) = σ( Σ_{u ∈ N_M(v) ∪ {v}} (A_M)_{vu} W^(k) h_u^(k-1) ),

where h_v^(k) is the hidden representation of node v in the k-th layer; (A_M)_{vu}, the entry in row v and column u of the motif matrix A_M, indicates the closeness between nodes v and u; W^(k) is the trainable parameter matrix of layer k; N_M(v) is the neighborhood of node v in the motif matrix A_M; and σ denotes the ReLU function.
In lines 7-11, the output of the motif-based graph convolution is passed to an MLP for further processing; in the formula, σ denotes a nonlinear activation unit that further processes the information of the target nodes. Formally, we have

h_v^(k) = σ( W^(k) h_v^(k-1) ),

where σ denotes the ReLU function, except in the last layer, where it denotes the Softmax function.
Then the final representation vector z_v of node v is obtained. Finally, the cross-entropy function is used as the loss function to train the parameters of our model:

L = - Σ_{v ∈ V_L} Σ_{f=1}^{c} Y_{vf} ln z_{vf},

where Y_v is the label of node v and V_L is the set of labeled training nodes.
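Since the setting is semi-supervised, the loss runs only over the labeled training nodes. A minimal NumPy version (the names, shapes, and numerical-stability constant are our own assumptions):

```python
import numpy as np

def masked_cross_entropy(Z, Y, labeled_idx):
    """Cross-entropy over the labeled nodes only.

    Z: (n, c) softmax outputs; Y: (n, c) one-hot labels;
    labeled_idx: indices of the training (labeled) nodes.
    """
    eps = 1e-12                                  # guard against log(0)
    logp = np.log(Z[labeled_idx] + eps)
    return -np.mean(np.sum(Y[labeled_idx] * logp, axis=1))
```

For a uniform prediction over c classes the loss equals ln c, and it approaches 0 as the predictions concentrate on the true labels.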
3.3 Complexity Analysis
Our method is based on GCN. From the work of Kipf et al., we know that the computational complexity of the original GCN, based on the following formula, is O(|E| C H F), where E is the edge set of the graph:

Z = softmax( Â ReLU( Â X W^(0) ) W^(1) ).

Here, A is the adjacency matrix, X is the feature matrix, and Â = D̃^(-1/2) (A + I) D̃^(-1/2) is the normalized processing matrix of the adjacency matrix A, with D̃ the degree matrix of A + I. W^(0) is an input-to-hidden weight matrix and W^(1) is a hidden-to-output weight matrix, where C is the number of input channels, H the number of feature maps in the hidden layer, and F the number of feature maps in the output layer. Next, we prove that the computational complexity of our method, with the number of hidden layers unchanged and the motif matrix used in place of the original adjacency matrix, is O(d_max |E| C H F), where d_max is the maximum node degree.
Proof.
Let d_max be the maximum degree of the nodes in graph G, n the number of nodes in G, M the motif matrix, and A the original adjacency matrix.

For the triangle motif, consider the zero elements of A. Suppose A_{ij} = 0, that is, there is no edge between nodes i and j. Then M_{ij} = 0 (nodes i and j cannot lie in the same triangle). So for the triangle motif, the computational complexity does not change.

For the wedge motif, consider A². If (A²)_{ij} = 0, node j cannot be reached from node i in 2 steps (i.e., node j is not a second-order neighbor of node i), which means nodes i and j are not in the same wedge, so M_{ij} = 0. Now consider the number of nonzero elements of A², denoted N. Since (A²)_i = Σ_{j: A_{ij}=1} A_j (where A_i denotes the i-th row of A), we have nnz((A²)_i) ≤ d_i · d_max, where nnz(·) denotes the number of nonzero elements and d_i is the degree of node i. Therefore, the total number of nonzero elements of A² satisfies

N ≤ Σ_i d_i · d_max = 2|E| · d_max.

So the number of nonzero elements of M is no more than 2|E| · d_max, and the computational complexity is O(d_max |E| C H F). ∎
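The two facts the proof relies on, that the triangle motif matrix vanishes wherever A does, and that A² has at most 2|E| · d_max nonzero entries, can be spot-checked numerically on a random graph (a sanity check of ours, not part of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = (rng.random((n, n)) < 0.1).astype(int)
A = np.triu(A, 1)
A = A + A.T                              # random symmetric 0/1 graph, zero diagonal

# Fact 1: the triangle motif matrix A * (A @ A) is zero wherever A is zero.
M_tri = A * (A @ A)
assert np.all(M_tri[A == 0] == 0)

# Fact 2: nnz(A^2) <= 2|E| * d_max, the bound used in the proof.
A2 = A @ A
d_max = A.sum(axis=1).max()
num_edges = A.sum() // 2
assert np.count_nonzero(A2) <= 2 * num_edges * d_max
```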
4 Experiments
In Section 4.1 we introduce the datasets used in the experiments, and the specific settings of the experiments are described in Section 4.2.
4.1 Datasets
Table 1. Dataset statistics.

Datasets   Types     Nodes   Edges   Features  Classes
Cora       Citation   2708    5429       1433        7
Citeseer   Citation   3327    4732       3703        6
Pubmed     Citation  19717   44338        500        3
107-Ego    Social     1045   53498        576        9
414-Ego    Social      159    3386        105        7
1684-Ego   Social      792   28048        319       16
1912-Ego   Social      755   60050        480       46
The statistics of the experimental datasets are shown in Table 1. In the citation network datasets (Citeseer, Cora, and Pubmed), nodes represent documents and edges represent citation links; in the social network dataset (Ego-Facebook), nodes represent users and edges represent interactions between users.
Citation Network Datasets: Citeseer, Cora, and Pubmed. The three citation network datasets contain a sparse feature vector for each document and a list of citation links between the documents. Citation links are treated as (undirected) edges, and each document has a category label.
Social Network Dataset: Ego-Facebook. This dataset consists of 'circles' (or 'friend lists') from Facebook, collected from survey participants. The Ego-Facebook dataset has many subsets. Take '107-Ego' as an example: it includes node features, an edge set, node category sets, and the ego network with node 107 as its core. Each user is treated as a node, each interaction as an (undirected) edge, and each user has a feature attribute vector and a category label. We choose suitable Ego-Facebook subsets for the experiments and, during preprocessing, remove the data whose information has been lost.
4.2 Experimental setup
We first add motifs in the general sense to the GCN network, to observe whether individual motifs help and, if so, which kinds of motifs work best; then the general motif is replaced by a custom motif and connected to an MLP network for further processing (i.e., the complete mGCMN algorithm), and this method is denoted mGCMN.
For the experiments on the citation network datasets (Citeseer, Cora, and Pubmed), the numbers of GCN and MLP layers are set to 1 and 0, respectively, for Citeseer and Pubmed; for Cora, they are set to 2 and 1, respectively; the other parameters follow the settings in [8]. For the experiments on the social network dataset (Ego-Facebook), the number of GCN layers is set to 2 and the number of MLP layers to 1.
In all experiments, many motifs were tried, and finally a mixture of the motif matrices and the graph adjacency matrix was chosen, which gives the best results. The mixture weights were determined by grid search. The details are as follows.
In the mGCMN method, the mixture ratios are: edge : triangle : wedge = 8 : 1 : 2 on Citeseer; edge : triangle : wedge = 8 : 1 : 3 on Cora; edge : wedge = 9 : 1 on Pubmed; edge : wedge = 9 : 1 on Facebook 107-Ego; edge : triangle = 4 : 1 on Facebook 414-Ego; edge : wedge = 1 : 1 on Facebook 1684-Ego; and edge : wedge = 4 : 1 on Facebook 1912-Ego.
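The mixed matrix can be formed as a weighted sum of the edge adjacency matrix and the motif matrices in the ratios above. The following sketch rests on our own assumptions, which the paper does not spell out: the wedge matrix is taken as the off-diagonal part of A², and the ratios are normalized to convex weights.

```python
import numpy as np

def mixed_motif_matrix(A, ratios):
    """Weighted mixture of the edge adjacency matrix and motif matrices.

    ratios, e.g. {'edge': 8, 'triangle': 1, 'wedge': 2} (the Citeseer
    setting above), is normalised to weights summing to 1 (our assumption).
    """
    A = np.asarray(A, dtype=float)
    A2 = A @ A
    parts = {
        'edge': A,
        'triangle': A * A2,                     # off-diagonal triangle counts
        'wedge': A2 - np.diag(np.diag(A2)),     # 2-step reachability, assumed
    }
    total = sum(ratios.values())
    return sum((w / total) * parts[name] for name, w in ratios.items())
```

With a single component, e.g. `{'edge': 1}`, the function reduces to the plain adjacency matrix, so the baseline GCN is recovered as a special case.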
Finally, the prediction vectors are used to compare classification accuracy on the test set; see our program for the specific procedures.
5 Results and Discussion
In this section, we introduce a variety of baseline methods and compare all experimental results, as follows:
5.1 Experimental Results on Citation Network Datasets (Citeseer, Cora, Pubmed)
First, we conduct experiments on the citation network datasets (Citeseer, Cora, and Pubmed) and compare the results with various baseline methods. The results are shown in Table 2.
Table 2. Classification accuracy (%) on the citation network datasets.

Method     Cora     Citeseer  Pubmed
ManiReg    59.5     60.1      70.7
SemiEmb    59.0     60.1      71.1
LP         68.0     45.3      63.0
DeepWalk   67.2     43.2      65.3
ICA        75.1     69.1      73.9
Planetoid  75.7     64.7      77.2
GCN        81.5     70.3      79.0
mGCMN      82.3     71.8      79.5
CC         0.09350  0.14297   0.05380
The table compares our method with label propagation (LP), semi-supervised embedding (SemiEmb), manifold regularization (ManiReg), the iterative classification algorithm (ICA), and Planetoid. DeepWalk is a method based on random walks, as stated at the beginning of the article, whose sampling strategy can be seen as a special case of node2vec with p = 1 and q = 1. GCN, the first method to achieve convolution on graphs, is the best performing baseline, and our method outperforms it on every dataset.
One interesting finding is that the global clustering coefficients (CC) of these three graph networks are 0.09350, 0.14297, and 0.05380, whose ordering is consistent with the ordering of our method's improvement (compared to GCN). A more intuitive display is shown in Fig. 4(a). In the next experiment this phenomenon appears again, and we believe it illustrates the rationality of our use of higher-order neighborhood information. The global clustering coefficient is calculated according to the following formula:

C = 3 × (number of triangles) / (number of connected triples of nodes).
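Concretely, since trace(A³) counts each triangle six times and Σ_i d_i(d_i − 1) is twice the number of connected triples, the coefficient can be computed directly from the adjacency matrix. A sketch of this standard computation:

```python
import numpy as np

def global_clustering_coefficient(A):
    """C = 3 * (# triangles) / (# connected triples of nodes).

    trace(A^3) counts each triangle 6 times, and sum_i d_i*(d_i - 1)
    equals twice the number of connected triples, so the two factors
    of 2 cancel and C = trace(A^3) / sum_i d_i*(d_i - 1).
    """
    A = np.asarray(A)
    d = A.sum(axis=1)
    twice_triples = np.sum(d * (d - 1))      # 2 * number of connected triples
    if twice_triples == 0:
        return 0.0
    return np.trace(A @ A @ A) / twice_triples

# Triangle plus a pendant edge: 1 triangle, degrees [2, 2, 3, 1], so C = 6/10.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
```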
5.2 Experimental Results on Social Network Dataset (EgoFacebook)
Now we use the social network dataset (Ego-Facebook) for the experiments and compare the results with DeepWalk, the classic random walk-based method, and GCN, the best performing baseline. The experimental results are shown in Table 3:
Table 3. Classification accuracy (%) on the Ego-Facebook subsets.

Method    107-Ego     414-Ego     1684-Ego    1912-Ego
DeepWalk  (77.5)      (79.2)      (64.4)      (66.5)
GraRep    (90.0)      (85.4)      (76.3)      (77.0)
GCN       73.1(92.5)  64.2(93.8)  59.9(81.9)  53.8(77.0)
mGCMN     80.1(95.0)  72.4(100)   66.3(88.8)  62.9(84.0)
CC        0.54431     0.67137     0.45752     0.71837
In Table 3, GraRep works by defining a more accurate loss function that allows nonlinear combinations of different local relationship information to be integrated.
Among them, GCN and mGCMN require random weight initialization, so we report the average accuracy over 100 runs with random initialization; the highest accuracy over all runs is shown in brackets. DeepWalk and GraRep do not require random weight initialization, so the best performance is reported after the hyperparameters are determined.
As can be seen from Table 3, both accuracy figures of our method are significantly higher than those of the other baselines. We again see that the ranking of the global clustering coefficients is consistent with the ranking of our method's improvement (compared to GCN); a more intuitive display is shown in Fig. 4(b). We think this phenomenon may help identify the data best suited to processing with our method.
6 Conclusion
In this paper, we have designed mGCMN, a new framework combined with motifs, which can effectively aggregate node information (we think of this as defining a new neighborhood structure) and capture higher-order features through deeper learning. The results show that mGCMN can effectively generate embeddings for nodes of unknown category and consistently outperforms the baseline methods. At the same time, the experiments also reveal the relationship between the increase in classification accuracy and the clustering coefficient.
There are many extensions and potential improvements of our method, such as further exploring the relationship between motifs and graph statistics and extending mGCMN to handle directed graphs or multigraphs. Another interesting direction for future work is to explore how to use the adjacency matrix more efficiently and flexibly.
Acknowledgments
This work is supported by the Research and Development Program of China (No. 2018AAA0101100), the Fundamental Research Funds for the Central Universities, International Cooperation Project No. 2010DFR00700, Fundamental Research of Civil Aircraft No. MJF201204, and the Beijing Natural Science Foundation (1192012, Z180005).
References
1. L. Backstrom and J. Leskovec, "Supervised random walks: predicting and recommending links in social networks," Proceedings of the Fourth ACM International Conference on Web Search and Data Mining (WSDM), 2011.
2. M. Zitnik and J. Leskovec, "Predicting multicellular function through multi-layer tissue networks," Bioinformatics, vol. 33, no. 14, 2017, pp. i190-i198.
3. X. Wang, Y. Ye, and A. Gupta, "Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs," Computer Science, 2018.
4. M. Belkin and P. Niyogi, "Laplacian eigenmaps and spectral techniques for embedding and clustering," NIPS, 2002.
5. B. Perozzi, R. Al-Rfou, and S. Skiena, "DeepWalk: Online learning of social representations," Computer Science, 2014.
6. A. Grover and J. Leskovec, "node2vec: Scalable feature learning for networks," Computer Science, 2016.
7. F. Scarselli et al., "The Graph Neural Network Model," IEEE Transactions on Neural Networks, vol. 20, no. 1, 2009, pp. 61-80.
8. T.N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," ICLR, 2016.
9. R. Milo et al., "Network Motifs: Simple Building Blocks of Complex Networks," Science, vol. 298, 2002, pp. 824-827.
10. A.R. Benson, D.F. Gleich, and J. Leskovec, "Higher-order organization of complex networks," Science, 2016, pp. 163-166.
11. W.L. Hamilton, R. Ying, and J. Leskovec, "Representation Learning on Graphs: Methods and Applications," Computer Science, 2017.
12. A. Ahmed et al., "Distributed large-scale natural graph factorization," International World Wide Web Conference (WWW), 2013.
13. S. Cao, W. Lu, and Q. Xu, "GraRep: Learning graph representations with global structural information," ACM, 2015, pp. 891-900.
14. M. Ou et al., "Asymmetric transitivity preserving graph embedding," 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2016.
15. H. Chen et al., "HARP: Hierarchical representation learning for networks," Computer Science, 2017.
16. S. Cao, W. Lu, and Q. Xu, "Deep neural networks for learning graph representations," AAAI, 2016.
17. D. Wang, P. Cui, and W. Zhu, "Structural deep network embedding," KDD, 2016.
18. G.E. Hinton and R.R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, 2006, pp. 504-507.
19. X. Zhu, Z. Ghahramani, and J.D. Lafferty, "Semi-supervised learning using gaussian fields and harmonic functions," International Conference on Machine Learning, vol. 3, 2003, pp. 912-919.
20. J. Weston et al., "Deep learning via semi-supervised embedding," Neural Networks: Tricks of the Trade, 2016, pp. 639-655.
21. M. Belkin, P. Niyogi, and V. Sindhwani, "Manifold regularization: A geometric framework for learning from labeled and unlabeled examples," Journal of Machine Learning Research, vol. 7, 2006, pp. 2399-2434.
22. Q. Lu and L. Getoor, "Link-based classification," International Conference on Machine Learning, vol. 3, 2003, pp. 496-503.
23. Z. Yang, W.W. Cohen, and R. Salakhutdinov, "Revisiting semi-supervised learning with graph embeddings," International Conference on Machine Learning, 2016.