A Literature Review of Recent Graph Embedding Techniques for Biomedical Data

01/17/2021 ∙ by Yankai Chen, et al. ∙ The Chinese University of Hong Kong

With the rapid development of biomedical software and hardware, a large amount of relational data interlinking genes, proteins, chemical components, drugs, diseases, and symptoms has been collected for modern biomedical research. Many graph-based learning methods have been proposed to analyze this type of data, giving deeper insight into the topology and knowledge behind the biomedical data, which greatly benefits both academic research and industrial applications in human healthcare. However, the main difficulty is how to handle the high dimensionality and sparsity of biomedical graphs. Recently, graph embedding methods have provided an effective and efficient way to address these issues. They convert graph-based data into a low-dimensional vector space in which the structural properties and knowledge information of the graph are well preserved. In this survey, we conduct a literature review of recent developments and trends in applying graph embedding methods to biomedical data. We also introduce important applications and tasks in the biomedical domain as well as associated public biomedical datasets.




1 Introduction

With the recent advances in biomedical technology, a large amount of relational data interlinking biomedical components, including proteins, drugs, diseases, and symptoms, has gained much attention in biomedical academic research. Relational data, also known as a graph, captures the interactions (i.e., edges) between entities (i.e., nodes) and now plays a key role in the modern machine learning domain. Analyzing these graphs gives users a deeper understanding of the topology information and knowledge behind them, and thus greatly benefits many biomedical applications such as biological graph analysis [2], network medicine [4], and clinical phenotyping and diagnosis [40].

As summarized in Figure 1, although graph analytics is of great importance, most existing graph analytics methods suffer from the high computational cost caused by the high dimensionality and sparsity of the graphs [12, 7, 36]. Furthermore, owing to the heterogeneity of biomedical graphs, i.e., the presence of multiple types of nodes and edges, traditional analyses over biomedical graphs remain challenging. Recently, graph embedding methods, which aim at learning a mapping that embeds nodes into a low-dimensional vector space, have provided an effective and efficient way to address these problems. Specifically, the goal is to optimize this mapping so that the node representations in the embedding space preserve the information and properties of the original graphs well. After this representation learning step, the learned embeddings can be used as feature inputs for many downstream machine learning tasks, which introduces enormous opportunities for biomedical data science.

Efforts to apply graph embedding to biomedical data have been made recently, but the area is still not thoroughly explored, and the capabilities of graph embedding for biomedical data have not been extensively evaluated. In addition, biomedical graphs are usually sparse, incomplete, and heterogeneous, making graph embedding more complicated than in other application domains. To address these issues, there is strong motivation to understand and compare the state-of-the-art graph embedding techniques, and to further study how these techniques can be adapted and applied to biomedical data science. Thus, in this survey, we investigate recent developments and trends of graph embedding techniques for biomedical data, which gives better insight into future directions. In this article, we introduce the general models related to biomedical data and omit the complete technical details. For a more comprehensive overview of graph embedding techniques and applications, we refer readers to previous well-summarized papers [7, 19, 43, 9].

Figure 1: Comparison between traditional graph analysis methods and graph embedding techniques for biomedical graphs.

In this article, we first give the preliminaries used in this paper. We then briefly introduce the widely used graph embedding models. After that, we introduce some related public biomedical datasets. Finally, we carefully discuss the recent developments and trends of biomedical graph embedding applications.

2 Preliminaries

Definition 1 (Homogeneous graphs)

A homogeneous graph $G_{homo} = (V, E)$ is associated with two mapping functions $\Phi: V \rightarrow A$ (node set to node type set) and $\Psi: E \rightarrow R$ (edge set to edge type set), with $|A| = |R| = 1$.

Definition 2 (Heterogeneous graphs)

A heterogeneous graph $G_{hete} = (V, E)$ is associated with a node type mapping function $\Phi: V \rightarrow A$ and an edge type mapping function $\Psi: E \rightarrow R$, with $|A| > 1$ and/or $|R| > 1$.

Definition 3 (Dynamic graphs)

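A graph $G = (V, E)$ is a dynamic graph where $V = \{(v, t_s, t_e)\}$ with $v$ a vertex and $t_s$, $t_e$ respectively the start and end timestamps for the vertex existence (with $t_s \leq t_e$); $E = \{(u, v, t_s, t_e)\}$ with $u, v \in V$ and $t_s$, $t_e$ respectively the start and end timestamps for the edge existence (with $t_s \leq t_e$).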
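The three graph settings defined above can be made concrete with a small data-structure sketch. It is only an illustration under stated assumptions: the class `TypedGraph`, the function `snapshot`, and all toy node names are hypothetical and not part of the survey. A graph counts as heterogeneous when it has more than one node type and/or edge type, and a dynamic graph is queried by keeping the vertices and edges whose existence interval covers a given time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedGraph:
    """Graph with a node-type map and an edge-type map (hypothetical helper)."""
    node_type: dict  # node -> node type
    edge_type: dict  # (u, v) -> edge type

    def is_heterogeneous(self) -> bool:
        # Heterogeneous: more than one node type and/or more than one edge type.
        n_node_types = len(set(self.node_type.values()))
        n_edge_types = len(set(self.edge_type.values()))
        return n_node_types > 1 or n_edge_types > 1


def snapshot(nodes, edges, t):
    """Return the nodes and edges of a dynamic graph that exist at time t.

    nodes: iterable of (v, t_start, t_end)
    edges: iterable of (u, v, t_start, t_end)
    """
    alive = {v for v, ts, te in nodes if ts <= t <= te}
    live_edges = {(u, v) for u, v, ts, te in edges
                  if ts <= t <= te and u in alive and v in alive}
    return alive, live_edges


# Toy data (hypothetical): a protein-only graph and a drug-protein graph.
ppi = TypedGraph(
    node_type={"p1": "protein", "p2": "protein"},
    edge_type={("p1", "p2"): "interacts"},
)
dti = TypedGraph(
    node_type={"d1": "drug", "p1": "protein", "p2": "protein"},
    edge_type={("d1", "p1"): "targets", ("p1", "p2"): "interacts"},
)
dyn_nodes = [("a", 0, 10), ("b", 0, 5), ("c", 3, 10)]
dyn_edges = [("a", "b", 1, 4), ("a", "c", 4, 9), ("b", "c", 6, 8)]
```

In this toy setup, `ppi` is homogeneous, `dti` is heterogeneous, and `snapshot(dyn_nodes, dyn_edges, 7)` drops node `b` together with every edge that touches it.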

Problem 1 (Graph embedding)

Given a graph $G = (V, E)$ and a predefined embedding dimensionality $d$, where $d \ll |V|$, graph embedding aims to convert $G$ into a $d$-dimensional space $\mathbb{R}^{d}$, in which the information and properties of $G$ are preserved as well as possible.

In the following section, we provide the taxonomy of graph embedding methods based on the graph settings and embedding techniques, respectively.

3 Taxonomy of Graph Embedding Models

As shown in Figure 2, in this section, according to the graph settings, we introduce homogeneous graph embedding models, heterogeneous graph embedding models and dynamic graph embedding models as follows.

Figure 2: Taxonomy of graph embedding models.

3.1 Homogeneous Graph Embedding Models

In the literature, there are three main types of homogeneous graph embedding methods, i.e., matrix factorization-based methods, random walk-based methods and deep learning-based methods.

Matrix factorization-based methods. Matrix factorization-based methods, inspired by classic techniques for dimensionality reduction, use the form of a matrix to represent the graph properties, e.g., node pairwise similarity. Generally, there are two types of matrix factorization to compute the node embedding, i.e., node proximity matrix and graph Laplacian eigenmaps.

Node proximity matrix factorization methods approximate the node proximity in a low-dimensional space, and the objective of preserving node proximity is to minimize the approximation loss $\min \lVert W - Y {Y^{c}}^{\top} \rVert$, where $W$ is the node proximity matrix, $Y^{c}$ is the embedding for context nodes, and the embedding $Y$ can be computed using this loss function. There are many other ways to approximate this loss function, such as low-rank matrix factorization and regularized Gaussian matrix factorization. For graph Laplacian eigenmaps factorization methods, the assumption is that the graph property can be interpreted as the similarity of pairwise nodes. Thus, to obtain a good representation, a larger penalty is given if two nodes with higher similarity are embedded far apart. The optimal embedding $Y^{*}$ can be computed by using the objective function (1):

$$Y^{*} = \arg\min_{Y^{\top} D Y = I} \operatorname{tr}\left(Y^{\top} L Y\right) \qquad (1)$$

where $L = D - W$ is the graph Laplacian and $D$ is the diagonal matrix with $D_{ii} = \sum_{j \neq i} W_{ij}$. There are many works using graph Laplacian-based methods, and they mainly differ in how they calculate the pairwise node similarity $W_{ij}$. For example, BANE [55] defines a new Weisfeiler-Lehman proximity matrix to capture the data dependence between edges and attributes; based on this matrix, BANE then learns the node embeddings by formulating a new Weisfeiler-Lehman matrix factorization. Recently, NetMF [37] unified state-of-the-art approaches into a matrix factorization framework with closed forms.
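To illustrate why the Laplacian objective penalizes similar nodes that are embedded far apart, the following minimal sketch checks, for a one-dimensional embedding $y$, that $y^{\top} L y$ equals the sum of $W_{ij}(y_i - y_j)^2$ over node pairs. It assumes a symmetric similarity matrix with a zero diagonal; the function names and the toy matrix are hypothetical, and the sketch only evaluates the objective rather than solving the eigenproblem.

```python
def laplacian(W):
    """Return L = D - W for a symmetric similarity matrix W with zero diagonal."""
    n = len(W)
    L = [[-W[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        # D_ii is the sum of similarities between node i and every other node.
        L[i][i] = sum(W[i][j] for j in range(n) if j != i)
    return L


def quadratic_form(L, y):
    """Compute y^T L y for a one-dimensional embedding y."""
    n = len(y)
    return sum(y[i] * L[i][j] * y[j] for i in range(n) for j in range(n))


def pairwise_penalty(W, y):
    """Compute the sum over unordered pairs of W_ij * (y_i - y_j)^2."""
    n = len(y)
    return sum(W[i][j] * (y[i] - y[j]) ** 2
               for i in range(n) for j in range(i + 1, n))


# Toy similarity matrix (hypothetical): nodes 0 and 1 are highly similar.
W = [[0.0, 1.0, 0.1],
     [1.0, 0.0, 0.1],
     [0.1, 0.1, 0.0]]
y_close = [0.0, 0.1, 1.0]  # similar nodes 0 and 1 embedded close together
y_far = [0.0, 1.0, 0.1]    # similar nodes 0 and 1 embedded far apart
```

With these toy values, the embedding `y_far` receives a much larger penalty than `y_close`, which is the behaviour the objective is designed to produce.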

Random walk-based methods. Random walk-based methods have been widely used to approximate many properties of a graph, including node centrality and similarity. They are especially useful when the graph can only be partially observed or is too large to measure in full. Two widely recognized random walk-based methods are DeepWalk [36] and node2vec [20]. Concretely, DeepWalk treats the sampled paths as sentences and applies an NLP model to learn node embeddings. Compared to DeepWalk, node2vec introduces a trade-off strategy between breadth-first and depth-first search to perform a biased random walk. In recent years, many random walk-based papers have continued to improve performance. For example, AWE [24] uses a recently developed method called anonymous walks, i.e., an anonymized version of random walks that provides characteristic graph traits and is capable of exactly reconstructing the network proximity of a node. AttentionWalk [1] uses the softmax to learn a free-form context distribution in a random walk; the learned attention parameters then guide the random walk by allowing it to focus more on short-term or long-term dependencies when optimizing an upstream objective. BiNE [18] proposes methods for bipartite graph embedding by performing biased random walks, which generate vertex sequences that preserve the long-tail distribution of vertices in the original bipartite graphs.
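The difference between a uniform walk in the style of DeepWalk and a biased walk in the style of node2vec can be sketched as follows. This is a simplified illustration, not the authors' implementations: the function names, the toy adjacency `toy_adj`, and the parameter values are assumptions. The biased walk weights a step back to the previous node by `1/p`, a step to a common neighbor of the previous node by `1`, and a step further away by `1/q`. The generated walks would then be fed to a skip-gram model, which is omitted here.

```python
import random


def uniform_walk(adj, start, length, rng):
    """DeepWalk-style walk: each step moves to a uniformly chosen neighbor."""
    walk = [start]
    while len(walk) < length and adj[walk[-1]]:
        walk.append(rng.choice(sorted(adj[walk[-1]])))
    return walk


def biased_walk(adj, start, length, p, q, rng):
    """node2vec-style second-order walk with return parameter p and in-out parameter q."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:
            break
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for x in nbrs:
            if x == prev:          # step back to the previous node
                weights.append(1.0 / p)
            elif x in adj[prev]:   # stay close to the previous node (BFS-like)
                weights.append(1.0)
            else:                  # move away from the previous node (DFS-like)
                weights.append(1.0 / q)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


# Toy undirected graph (hypothetical), stored as node -> set of neighbors.
toy_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
```

A small `q` encourages the walk to explore outward, while a small `p` keeps it near its starting neighborhood.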

Deep learning-based methods. Deep learning has shown outstanding performance in a wide variety of research fields. SDNE [47] applies a deep autoencoder to model non-linearity in the graph structure. DNGR [8] learns deep low-dimensional vertex representations by using stacked denoising autoencoders on the high-dimensional matrix representations. Furthermore, Graph Convolutional Network (GCN) [27] introduces a well-behaved layer-wise propagation rule for neural network models that operate directly on graphs, as shown in Equation (2):

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right) \qquad (2)$$

with $\tilde{A} = A + I_N$, where $A$ and $I_N$ are the adjacency and identity matrices and $\tilde{D}$ is the diagonal degree matrix of $\tilde{A}$. $W^{(l)}$ is a weight matrix for the $l$-th neural network layer and $\sigma(\cdot)$ is a non-linear activation function such as ReLU. $H^{(l)}$ and $H^{(l+1)}$ are the node representations at layer $l$ and layer $l+1$, i.e., the input and output of the propagation step, respectively. Another important work is Graph Attention Network (GAT) [46], which leverages masked self-attentional layers to address the shortcomings of prior graph convolution-based methods. Specifically, as shown in Equation (3):

$$h_i^{(l+1)} = \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij} W h_j^{(l)}\Big) \qquad (3)$$

where $\mathcal{N}_i$ is the set of neighbors of node $i$. GAT computes the normalized coefficients $\alpha_{ij}$ by applying the softmax function across each neighborhood to the output of an attentional mechanism over node pairs. To stabilize the learning process of self-attention, GAT uses multi-head attention to replicate the learning phase $K$ times, and the outputs are aggregated feature-wise (typically by concatenating or adding), as shown in Equation (4):

$$h_i^{(l+1)} = \Big\Vert_{k=1}^{K} \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} h_j^{(l)}\Big) \qquad (4)$$

where $\alpha_{ij}^{k}$ and $W^{k}$ are the attention coefficients and the weight matrix specifying the linear transformation of the $k$-th replica. Recently, HGCN [11] and ATTH [10] use hyperbolic models to embed hierarchical graph structures with less distortion.
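As a rough illustration of the GCN propagation rule described above (self-loops added to the adjacency matrix, symmetric degree normalization, a weight matrix, and a ReLU activation), the sketch below implements a single layer with plain Python lists. It also includes the softmax normalization that GAT applies to the attention scores within one neighborhood; here the raw scores are assumed to be given, since the attentional mechanism itself is not reproduced. All function names and toy values are hypothetical, and a practical implementation would use a tensor library instead.

```python
import math


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def gcn_layer(A, H, W):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(A)
    # Add self-loops so that each node keeps its own features.
    A_tilde = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_tilde]  # degrees of A + I, always >= 1
    A_hat = [[A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
             for i in range(n)]
    Z = matmul(matmul(A_hat, H), W)
    return [[max(0.0, z) for z in row] for row in Z]


def attention_coefficients(scores):
    """Softmax over the raw attention scores of one node's neighbors."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy inputs (hypothetical): two connected nodes with one-hot features.
A = [[0.0, 1.0], [1.0, 0.0]]
H = [[1.0, 0.0], [0.0, 1.0]]
W_layer = [[1.0, -1.0], [2.0, 0.0]]
```

For this toy graph, both nodes end up with the same representation because, after adding self-loops, they share the same normalized neighborhood.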

3.2 Heterogeneous Graph Embedding Models

The heterogeneity in both graph structures and node attributes makes it challenging for the graph embedding task to encode their diverse and rich information. In this section, we will introduce translational distance methods and semantic matching methods, which try to address the above issue by constructing different energy functions. Furthermore, we will introduce meta-path-based methods that use different strategies to capture graph heterogeneity.

Translational distance methods. The first translational distance model is TransE [6]. The basic idea of translational distance models is, for each observed fact $(h, r, t)$ representing head entity $h$ having a relation $r$ with tail entity $t$, to learn a graph representation such that $h$ and $t$ are closely connected by relation $r$ in the low-dimensional embedding space, i.e., $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$ using geometric notation. Here $\mathbf{h}$, $\mathbf{t}$ and $\mathbf{r}$ are the embedding vectors for entities $h$, $t$ and relation $r$, respectively. The energy function of TransE is defined as $f_r(h, t) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert$. The margin-based objective function of TransE is shown in Equation (5):

$$\mathcal{L} = \sum_{(h, r, t) \in D^{+}} \sum_{(h', r, t') \in D^{-}} \max\left(0, \gamma + f_r(h, t) - f_r(h', t')\right) \qquad (5)$$

where $\gamma$ is the margin, $D^{+}$ denotes the set containing the true facts, e.g., $(h, r, t)$, and $D^{-}$ is the set of false triplets, e.g., $(h', r, t')$, that are not observed in the knowledge graph. Note that the energy function $f_r(h, t)$ here can be viewed as the distance score between the embeddings of entities $h$ and $t$ in terms of relation $r$. Many recent works have been developed to further improve the TransE model and address its inadequacies. For example, RotatE [44] defines each relation as a rotation from the source entity to the target entity in the complex vector space. QuatE [56] computes node embedding vectors in the hypercomplex space with three imaginary components, as opposed to the standard complex space with one real and one imaginary component. MuRP [3] is a hyperbolic embedding method that embeds multi-relational data in the Poincaré ball model of hyperbolic space, which performs well on hierarchical and scale-free graphs.
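The translational energy and the margin-based objective described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the L2 norm is used for the energy, each true triplet is paired with exactly one corrupted triplet (a common simplification of summing over all pairs), and the function names and toy vectors are hypothetical.

```python
import math


def transe_energy(h, r, t):
    """Distance between h + r and t (L2 norm); lower means more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def margin_loss(positives, negatives, margin=1.0):
    """Sum of max(0, margin + f(true) - f(false)) over paired triplets."""
    return sum(
        max(0.0, margin + transe_energy(*pos) - transe_energy(*neg))
        for pos, neg in zip(positives, negatives)
    )


# Toy embeddings (hypothetical): the true triplet satisfies h + r = t exactly.
h, r, t = [0.0, 0.0], [1.0, 0.0], [1.0, 0.0]
t_far = [4.0, 4.0]    # corrupted tail far from h + r
t_near = [1.0, 0.5]   # corrupted tail close to h + r
```

A corrupted triplet that is already further away than the margin contributes nothing to the loss, whereas a corrupted triplet close to the true one is pushed away during training.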

Semantic matching methods. Semantic matching models exploit similarity-based scoring functions. They measure the plausibility of facts by matching the latent semantics of entities and relations embodied in their representations. Targeting the observed fact $(h, r, t)$, RESCAL [34] embeds each entity with a vector to capture its latent semantics and each relation with a matrix to model pairwise interactions between latent factors. Equation (6) defines the energy function:

$$f_r(h, t) = \mathbf{h}^{\top} M_r \mathbf{t} \qquad (6)$$

where $M_r$ is a matrix associated with the relation. HolE [33] deals with directed graphs and composes the head entity and tail entity by their circular correlation, which achieves better performance than RESCAL. Other works try to extend or simplify RESCAL, e.g., DistMult [54], ComplEx [45], and ANALOGY [30]. Another direction of semantic matching methods is to use neural network architectures by considering the embeddings as the input layer and the energy function as the output layer. For instance, the SME model [5] first inputs the embeddings of entities and relations in the input layer. The relation $\mathbf{r}$ is then combined with the head entity $\mathbf{h}$ to get $g_u(\mathbf{h}, \mathbf{r})$, and with the tail entity $\mathbf{t}$ to get $g_v(\mathbf{t}, \mathbf{r})$ in the hidden layer. The score function is defined as $f_r(h, t) = g_u(\mathbf{h}, \mathbf{r})^{\top} g_v(\mathbf{t}, \mathbf{r})$. Other semantic matching methods that use neural network architectures include NTN [42] and MLP [15].
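The bilinear scoring idea behind RESCAL, and the way DistMult simplifies it by restricting the relation matrix to a diagonal, can be sketched as follows. The function names and toy vectors are hypothetical, and training is omitted; the sketch only evaluates the score of one triplet.

```python
def rescal_score(h, M, t):
    """Bilinear score h^T M t with a full relation matrix M."""
    return sum(h[i] * M[i][j] * t[j] for i in range(len(h)) for j in range(len(t)))


def distmult_score(h, r, t):
    """DistMult restricts M to diag(r), giving sum_i h_i * r_i * t_i."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


# Toy embeddings (hypothetical).
h_vec = [1.0, 2.0]
t_vec = [3.0, 4.0]
r_diag = [2.0, -1.0]
M_diag = [[2.0, 0.0], [0.0, -1.0]]   # the same relation written as a full matrix
M_asym = [[0.0, 1.0], [0.0, 0.0]]    # a non-symmetric relation matrix
```

The diagonal restriction reduces the number of relation parameters but makes the score symmetric in head and tail, whereas a full matrix such as `M_asym` can score `(h, r, t)` and `(t, r, h)` differently.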

Meta-path-based methods. Generally, a meta-path is an ordered path that consists of node types connected via edge types defined on the graph schema, e.g., $A_1 \xrightarrow{R_1} A_2 \xrightarrow{R_2} \cdots \xrightarrow{R_l} A_{l+1}$, which describes a composite relation between node types $A_1, A_2, \cdots, A_{l+1}$ and edge types $R_1, R_2, \cdots, R_l$. Thus, meta-paths can be viewed as high-order proximity between two nodes with specific semantics. A set of recent works has been proposed. Metapath2vec [16] computes node embeddings by feeding meta-path-guided random walks to a skip-gram [32] model. HAN [51] learns meta-path-oriented node embeddings from different meta-path-based graphs converted from the original heterogeneous graph and leverages the attention mechanism to combine them into one vector representation for each node. HERec [39] learns node embeddings for recommendation by applying DeepWalk [36] to the meta-path-based homogeneous graphs. MAGNN [17] comprehensively considers three main components to achieve state-of-the-art performance: node content transformation to encapsulate node attributes, intra-metapath aggregation to incorporate intermediate semantic nodes, and inter-metapath aggregation to combine messages from multiple metapaths.
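A meta-path-guided random walk of the kind used by Metapath2vec can be sketched as below. It is a simplified illustration rather than the published implementation: the walk only constrains node types (edge types are ignored), the meta-path is assumed to start and end with the same type so that it can repeat, and the function name and toy graph are hypothetical.

```python
import random


def metapath_walk(adj, node_type, start, metapath, length, rng):
    """Walk that only steps to neighbors whose type is next in the meta-path.

    metapath is a cyclic list of node types, e.g. ["drug", "protein", "drug"];
    its first and last types must match so the pattern can repeat.
    """
    assert metapath[0] == metapath[-1] and node_type[start] == metapath[0]
    walk, pos = [start], 0
    while len(walk) < length:
        # Advance along the meta-path, wrapping back to index 1 at the end.
        pos = pos + 1 if pos + 1 < len(metapath) else 1
        wanted = metapath[pos]
        candidates = sorted(v for v in adj[walk[-1]] if node_type[v] == wanted)
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk


# Toy heterogeneous graph (hypothetical): two drugs, two proteins, one disease.
het_types = {"d1": "drug", "d2": "drug", "p1": "protein", "p2": "protein", "s1": "disease"}
het_adj = {
    "d1": {"p1", "s1"},
    "d2": {"p1", "p2"},
    "p1": {"d1", "d2"},
    "p2": {"d2"},
    "s1": {"d1"},
}
```

With the meta-path drug, protein, drug, the walk never visits the disease node, so the resulting sequences capture only the "drugs sharing a target protein" semantics.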

Other methods. LANE [23] constructs proximity matrices by incorporating label information and graph topology, and learns embeddings while preserving their correlations based on the Laplacian matrix. EOE [53] aims to embed a graph coupled from two non-attributed graphs. In EOE, latent features encode not only intra-network edges but also inter-network ones. To tackle the heterogeneity of the two graphs, EOE incorporates a harmonious embedding matrix to further embed the embeddings. Inspired by generative adversarial network models, HeGAN [21] is designed to be relation-aware in order to capture the rich semantics of heterogeneous graphs, and it trains a discriminator and a generator in a minimax game to generate robust graph embeddings.

3.3 Dynamic Graph Embedding Models

In practice, graphs often evolve over time, and graph embedding for dynamic graphs has recently received much attention. In this section, we briefly introduce some typical general models.

Probabilistic models. Among generative probabilistic models, dynamic latent space models and dynamic stochastic block models are the two main types. Latent space models represent every node with an unobserved feature vector. An edge between two nodes is then formed conditionally independent of all other pairs of nodes, and the latent features change over time. Such models are flexible, but they require fitting parameters with Markov chain Monte Carlo methods that scale up to only a few hundred nodes [25]. Stochastic block models divide nodes into blocks (classes), where nodes within a block are assumed to have identical statistical properties. An edge between two nodes is formed independently of all other pairs of nodes with a probability that depends only on the blocks of the two nodes, which gives a matrix of edge probabilities indexed by pairs of blocks.
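The block structure described above can be illustrated by sampling a graph from a static stochastic block model; a dynamic variant would additionally let the block assignments or the probabilities change between time steps, which is not shown. The function name, the block assignment, and the probability matrix are hypothetical choices for this sketch.

```python
import random


def sample_sbm(block_of, P, rng):
    """Sample an undirected graph: edge (i, j) appears with probability P[b_i][b_j]."""
    n = len(block_of)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # The edge probability depends only on the blocks of the two nodes.
            if rng.random() < P[block_of[i]][block_of[j]]:
                A[i][j] = A[j][i] = 1
    return A


# Toy setup (hypothetical): four nodes in two blocks, edges only within a block.
blocks = [0, 0, 1, 1]
P_blocks = [[1.0, 0.0],
            [0.0, 1.0]]
```

Because the toy probabilities are 0 or 1, the sampled adjacency matrix is deterministic here; realistic settings use intermediate probabilities that are estimated from data.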

Dynamic graph embedding methods. There are three main types of dynamic graph embedding methods, i.e., tensor decomposition-based methods, random walk-based methods, and deep learning-based methods, which are inspired by the corresponding methods for homogeneous graphs. Tensor decomposition is analogous to matrix factorization, with time as the additional dimension. Random walk-based methods for dynamic graphs are generally extensions of random walk-based embedding methods for static graphs, or they apply temporal random walks. Furthermore, deep learning models for dynamic graphs mainly comprise two types of models: temporal restricted Boltzmann machines and dynamic graph neural networks. For a detailed analysis, please refer to the surveys on dynamic graph embedding [41, 26].
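A temporal random walk, as mentioned above, differs from a static walk in that consecutive edges must respect time order. The following minimal sketch assumes directed, timestamped edges `(u, v, t)` and only allows steps whose timestamp is not earlier than that of the previous step; the function name and the toy edge list are hypothetical, and real methods differ in how they sample among the valid edges.

```python
import random


def temporal_walk(edges, start, length, rng):
    """Walk over timestamped edges (u, v, t) with non-decreasing timestamps."""
    out = {}
    for u, v, t in edges:
        out.setdefault(u, []).append((v, t))
    walk, times, now = [start], [], float("-inf")
    while len(walk) < length:
        # Only edges that occur at or after the current time are valid.
        options = sorted((v, t) for v, t in out.get(walk[-1], []) if t >= now)
        if not options:
            break
        nxt, now = rng.choice(options)
        walk.append(nxt)
        times.append(now)
    return walk, times


# Toy timestamped edges (hypothetical).
temporal_edges = [
    ("a", "b", 1), ("b", "c", 2), ("b", "a", 0), ("c", "a", 3), ("c", "d", 1),
]
```

In this toy example the walk from `a` is forced along `a`, `b`, `c`, `a` and then stops, because every remaining edge out of `a` lies in the past.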

4 Applications and Tasks in Biomedical Domain

4.1 Biomedical Datasets

We first summarize some commonly used biomedical datasets in Table 1, where the columns are, respectively: the average number of nodes, the average number of edges, the dimensionality of node features, the number of node classes, the number of graphs, and the graph type.

Dataset           Avg. Nodes   Avg. Edges   Features   Classes   Graphs   Graph Type
PubMed-diabetes    19,717.00    44,338.00        500         3        1   Citation Graph
PPI                 2,372.67    34,113.17         50       121       24   Bio-chemical Graph
MUTAG                  17.93        19.79          7         2      188   Bio-chemical Graph
NCI-1                  29.87        32.30         37         2    4,110   Bio-chemical Graph
NCI-33                 30.20            -         29         -    2,843   Bio-chemical Graph
NCI-83                 29.50            -         28         -    3,867   Bio-chemical Graph
NCI-109                29.60            -         38         -    4,127   Bio-chemical Graph
DD                    284.31       715.65         82         2    1,178   Bio-chemical Graph
PROTEINS               39.06        72.81          4         2    1,113   Bio-chemical Graph
ENZYMES                32.46        63.14          6         6      600   Biological Graph
Table 1: Dataset Statistics

PubMed-diabetes (https://linqs.soe.ucsc.edu/data) is a citation graph consisting of scientific publications and citations pertaining to diabetes. PPI (http://snap.stanford.edu/graphsage/ppi.zip) contains 24 graphs of protein-protein interactions in different organisms such as Homo sapiens and Mus musculus. The MUTAG dataset (https://ls11-www.cs.uni-dortmund.de/people/morris/graphkerneldatasets) contains nitro compounds that are divided into two classes according to their mutagenic effect on a bacterium. NCI-{1, 33, 83, 109} [35] contain chemical compounds that are screened for activity against non-small cell lung cancer, melanoma, breast cancer and ovarian cancer, respectively. DD and PROTEINS (both available at https://chrsmrrs.github.io/datasets/docs/datasets/) are two datasets that represent proteins as graphs whose labels are enzymes and non-enzymes, and ENZYMES [52] is a biological dataset.

4.2 Applications and Tasks

In recent years, graph embedding methods have been applied in biomedical data science. In this section, we introduce the main biomedical applications of graph embedding techniques, including pharmaceutical data analysis, multi-omics data analysis and clinical data analysis.

Pharmaceutical data analysis. Generally, there are two main types of applications for pharmaceutical data analysis, i.e., (i) drug repositioning and (ii) adverse drug reaction analysis.

(i) Drug repositioning usually aims to predict unknown drug-target or drug-disease interactions. Recently, DTINet [31] generates drug and target-protein embeddings by separately performing random walk with restart on heterogeneous biomedical graphs. DTINet then projects drugs into the embedding space of target proteins and makes predictions based on geometric proximity. Other studies on drug repositioning focus on predicting drug-disease associations. For instance, Dai et al. [14] first embed genes by applying eigenvalue decomposition to a gene-gene interaction graph and calculate genomic representations for drugs and diseases from the gene embedding vectors. Wang et al. [49] proposed to detect unknown drug-disease interactions from the medical literature by fusing NLP and graph embedding techniques. (ii) An adverse drug reaction (ADR) is defined as any undesirable drug effect beyond its desired therapeutic effects that occurs at a usual dosage. ADR analysis is now central to drug development before a drug enters clinical trials.
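Random walk with restart, which is mentioned above as the first step of DTINet, can be sketched with a simple power iteration. This is only an illustration of the general technique, not DTINet's pipeline: the function name, the restart probability, the fixed number of iterations, and the toy graph are assumptions, and every node is assumed to have at least one neighbor.

```python
def random_walk_with_restart(adj, seed, restart=0.5, iters=100):
    """Power iteration: at each step, restart at the seed with probability
    `restart`, otherwise move to a uniformly chosen neighbor."""
    nodes = sorted(adj)
    p = {v: 1.0 if v == seed else 0.0 for v in nodes}
    for _ in range(iters):
        nxt = {v: (restart if v == seed else 0.0) for v in nodes}
        for u in nodes:
            # Spread the non-restarting mass of u evenly over its neighbors.
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        p = nxt
    return p


# Toy graph (hypothetical): one drug and three target proteins.
rwr_adj = {
    "d1": {"t1", "t2"},
    "t1": {"d1", "t3"},
    "t2": {"d1"},
    "t3": {"t1"},
}
rwr_scores = random_walk_with_restart(rwr_adj, "d1")
```

The resulting distribution ranks nodes by their proximity to the seed, so a protein two hops away from the drug scores lower than the protein that links it to the drug.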

Multi-omics data analysis. The main aim of multi-omics is to study the structures, functions, and dynamics of organism molecules. Graph embedding has become a valuable tool for analyzing relational data in omics. Concretely, the computational tasks in multi-omics data analysis mainly concern (i) genomics, (ii) proteomics and (iii) transcriptomics.

(i) Graph embedding works in genomics data analysis usually try to decipher biology from genome sequences and related data. For example, based on gene-gene interaction data, a recent work [29] extends the graph embedding method LINE over two bipartite graphs, the Cell-ContexGene and Gene-ContexGene networks, and proposes SCRL to address representation learning for single cell RNA-seq data, which outperforms traditional dimensionality reduction methods according to the experimental results. (ii) Protein-protein interactions (PPIs) play key roles in most cell functions. Graph embedding has also been applied to PPI graphs for proteomics data analysis, such as assessing and predicting PPIs or predicting protein functions. Recently, ProSNet [50] has been proposed for protein function prediction. This model introduces DCA to a heterogeneous molecular graph and further uses meta-path-based methods to modify DCA so that heterogeneous structural information is preserved. Thanks to the proposed embedding methods for such heterogeneous graphs, the experimental prediction performance was greatly improved. (iii) Transcriptomics studies focus on analyzing an organism's transcriptome. For instance, identifying miRNA-disease associations has become an important topic in pathogenicity research, and graph embedding provides a useful tool for predicting these associations. To predict new associations, CMFMDA [38] applies matrix factorization methods to the bipartite miRNA-disease graph for graph embedding. In addition, Li et al. [28] proposed a method that uses DeepWalk to embed the bipartite miRNA-disease network. Their experimental results demonstrate that, by preserving both local and global graph topology, DeepWalk can yield significant improvements in association prediction for miRNA-disease graphs.

Clinical data analysis. Graph embedding techniques have been applied to clinical data, such as electronic medical records (EMRs), electronic health records (EHRs) and medical knowledge graphs, providing useful assistance and support for clinicians in recent clinical development.

EMRs and EHRs form heterogeneous graphs that comprehensively include medical and clinical information from patients, which provides opportunities for graph embedding techniques to support medical research and clinical decision making. To address the heterogeneity of EMR and EHR data, GRAM [13] learns EHR representations with the help of hierarchical information inherent to medical ontologies. VisAGE [22] constructs a biomedical knowledge graph to learn the embeddings of medical entities, and the proposed method is used to visualize a Parkinson's disease data set. Constructing medical knowledge graphs has also attracted great attention recently. For instance, analogous to TransE, Zhao et al. [58] defined an energy function by considering the relation between the symptoms of patients and diseases as a translation vector to learn the representation of medical forum data. A new method was then proposed to learn embeddings of medical entities in the medical knowledge graph, based on the energy functions of RESCAL and TransE [57]. In addition, Wang et al. [48] constructed an objective function by using both the energy function of TransR and LINE's second-order proximity measurement to learn embeddings from a heterogeneous medical knowledge graph and to recommend proper medicine to patients.

5 Conclusion

Graph embedding methods aim to learn compact and informative representations for graph analysis and thus provide a powerful opportunity to solve traditional graph-based machine learning problems both effectively and efficiently. With the rapid growth of relational data in the biomedical domain, applying graph embedding techniques now draws much attention in numerous biomedical applications. However, as we have reviewed in this survey, the capability of graph embedding for biomedical graph analysis has not been fully explored. Many issues associated with biomedical data may bring challenges to biomedical graph embedding tasks. For example, biomedical data may be poorly structured, and knowledge and information from the biomedical domain or health care records can be more complicated than in the general domain. In this survey, we introduce recent developments and trends of different graph embedding methods. By carefully summarizing biomedical applications of graph embedding methods, we provide more perspectives on this emerging research domain for the improvement of human health care.


  • [1] S. Abu-El-Haija, B. Perozzi, R. Al-Rfou, and A. A. Alemi (2018) Watch your step: learning node embeddings via graph attention. In NeurIPS, pp. 9180–9190. Cited by: §3.1.
  • [2] R. Albert (2005) Scale-free networks in cell biology. Journal of cell science. Cited by: §1.
  • [3] I. Balazevic, C. Allen, and T. Hospedales (2019) Multi-relational poincaré graph embeddings. In NeurIPS, pp. 4465–4475. Cited by: §3.2.
  • [4] A. Barabási, N. Gulbahce, and J. Loscalzo (2011) Network medicine: a network-based approach to human disease. Nature reviews genetics 12 (1), pp. 56–68. Cited by: §1.
  • [5] A. Bordes, X. Glorot, J. Weston, and Y. Bengio (2014) A semantic matching energy function for learning with multi-relational data. ML 94 (2), pp. 233–259. Cited by: §3.2.
  • [6] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko (2013) Translating embeddings for modeling multi-relational data. In NeurIPS, pp. 2787–2795. Cited by: §3.2.
  • [7] H. Cai, V. W. Zheng, and K. C. Chang (2018) A comprehensive survey of graph embedding: problems, techniques, and applications. TKDE 30 (9), pp. 1616–1637. Cited by: §1.
  • [8] S. Cao, W. Lu, and Q. Xu (2016) Deep neural networks for learning graph representations. In AAAI, Cited by: §3.1.
  • [9] I. Chami, S. Abu-El-Haija, B. Perozzi, C. Ré, and K. Murphy (2020) Machine learning on graphs: a model and comprehensive taxonomy. arXiv preprint arXiv:2005.03675. Cited by: §1.
  • [10] I. Chami, A. Wolf, D. Juan, F. Sala, S. Ravi, and C. Ré (2020) Low-dimensional hyperbolic knowledge graph embeddings. ACL. Cited by: §3.1.
  • [11] I. Chami, Z. Ying, C. Ré, and J. Leskovec (2019) Hyperbolic graph convolutional neural networks. In NeurIPS, pp. 4868–4879. Cited by: §3.1.
  • [12] Y. Chen, J. Zhang, Y. Fang, X. Cao, and I. King (2020) Efficient community search over large directed graph: an augmented index-based approach. In IJCAI, pp. 3544–3550. Cited by: §1.
  • [13] E. Choi, M. T. Bahadori, L. Song, W. F. Stewart, and J. Sun (2017) GRAM: graph-based attention model for healthcare representation learning. In SIGKDD, Cited by: §4.2.
  • [14] W. Dai, X. Liu, Y. Gao, L. Chen, J. Song, D. Chen, K. Gao, Y. Jiang, Y. Yang, J. Chen, et al. (2015) Matrix factorization-based prediction of novel drug indications by integrating genomic space. CMMM 2015. Cited by: §4.2.
  • [15] X. Dong, E. Gabrilovich, G. Heitz, W. Horn, N. Lao, K. Murphy, T. Strohmann, S. Sun, and W. Zhang (2014) Knowledge vault: a web-scale approach to probabilistic knowledge fusion. In SIGKDD, pp. 601–610. Cited by: §3.2.
  • [16] Y. Dong, N. V. Chawla, and A. Swami (2017) Metapath2vec: scalable representation learning for heterogeneous networks. In SIGKDD, pp. 135–144. Cited by: §3.2.
  • [17] X. Fu, J. Zhang, Z. Meng, and I. King (2020) MAGNN: metapath aggregated graph neural network for heterogeneous graph embedding. In WWW, pp. 2331–2341. Cited by: §3.2.
  • [18] M. Gao, L. Chen, X. He, and A. Zhou (2018) Bine: bipartite network embedding. In SIGIR, pp. 715–724. Cited by: §3.1.
  • [19] P. Goyal and E. Ferrara (2018) Graph embedding techniques, applications, and performance: a survey. Knowledge-Based Systems 151, pp. 78–94. Cited by: §1.
  • [20] A. Grover and J. Leskovec (2016) Node2vec: scalable feature learning for networks. In SIGKDD, pp. 855–864. Cited by: §3.1.
  • [21] B. Hu, Y. Fang, and C. Shi (2019) Adversarial learning on heterogeneous information networks. In SIGKDD, pp. 120–129. Cited by: §3.2.
  • [22] E. W. Huang, S. Wang, and C. Zhai (2018) VisAGE: integrating external knowledge into electronic medical record visualization.. In PSB, pp. 578–589. Cited by: §4.2.
  • [23] X. Huang, J. Li, and X. Hu (2017) Label informed attributed network embedding. In WSDM, pp. 731–739. Cited by: §3.2.
  • [24] S. Ivanov and E. Burnaev (2018) Anonymous walk embeddings. arXiv:1805.11921. Cited by: §3.1.
  • [25] R. R. Junuthula, K. S. Xu, and V. K. Devabhaktuni (2016) Evaluating link prediction accuracy in dynamic networks with added and removed edges. In BDCloud-SocialCom-SustainCom, pp. 377–384. Cited by: §3.3.
  • [26] S. M. Kazemi, R. Goel, K. Jain, I. Kobyzev, A. Sethi, P. Forsyth, and P. Poupart (2020) Representation learning for dynamic graphs: a survey. Journal of Machine Learning Research 21 (70), pp. 1–73. Cited by: §3.3.
  • [27] T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. arXiv:1609.02907. Cited by: §3.1.
  • [28] G. Li, J. Luo, Q. Xiao, C. Liang, P. Ding, and B. Cao (2017) Predicting microrna-disease associations using network topological similarity based on deepwalk. IEEE Access 5, pp. 24032–24039. Cited by: §4.2.
  • [29] X. Li, W. Chen, Y. Chen, X. Zhang, J. Gu, and M. Q. Zhang (2017) Network embedding-based representation learning for single cell rna-seq data. Nucleic acids research 45 (19), pp. e166–e166. Cited by: §4.2.
  • [30] H. Liu, Y. Wu, and Y. Yang (2017) Analogical inference for multi-relational embeddings. In ICML, pp. 2168–2178. Cited by: §3.2.
  • [31] Y. Luo, X. Zhao, J. Zhou, J. Yang, Y. Zhang, W. Kuang, J. Peng, L. Chen, and J. Zeng (2017) A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nature communications 8 (1), pp. 1–13. Cited by: §4.2.
  • [32] T. Mikolov, K. Chen, G. Corrado, and J. Dean (2013) Efficient estimation of word representations in vector space. In ICLR (Workshop Poster), Cited by: §3.2.
  • [33] M. Nickel, L. Rosasco, and T. Poggio (2016) Holographic embeddings of knowledge graphs. In AAAI, Cited by: §3.2.
  • [34] M. Nickel, V. Tresp, and H. Kriegel (2011) A three-way model for collective learning on multi-relational data. In ICML, Vol. 11, pp. 809–816. Cited by: §3.2.
  • [35] S. Pan, X. Zhu, C. Zhang, and P. S. Yu (2013) Graph stream classification using labeled and unlabeled graphs. In ICDE, pp. 398–409. Cited by: §4.1.
  • [36] B. Perozzi, R. Al-Rfou, and S. Skiena (2014) DeepWalk: online learning of social representations. In SIGKDD, pp. 701–710. Cited by: §1, §3.1, §3.2.
  • [37] J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, and J. Tang (2018) Network embedding as matrix factorization: unifying DeepWalk, LINE, PTE, and node2vec. In WSDM, Cited by: §3.1.
  • [38] Z. Shen, Y. Zhang, K. Han, A. K. Nandi, B. Honig, and D. Huang (2017) MiRNA-disease association prediction with collaborative matrix factorization. Complexity. Cited by: §4.2.
  • [39] C. Shi, B. Hu, W. X. Zhao, and P. S. Yu (2018) Heterogeneous information network embedding for recommendation. TKDE 31 (2), pp. 357–370. Cited by: §3.2.
  • [40] B. Shickel, P. J. Tighe, A. Bihorac, and P. Rashidi (2017) Deep EHR: a survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE journal of biomedical and health informatics 22 (5), pp. 1589–1604. Cited by: §1.
  • [41] J. Skarding, B. Gabrys, and K. Musial (2020) Foundations and modelling of dynamic networks using dynamic graph neural networks: a survey. arXiv:2005.07496. Cited by: §3.3.
  • [42] R. Socher, D. Chen, C. D. Manning, and A. Ng (2013) Reasoning with neural tensor networks for knowledge base completion. In NeurIPS, pp. 926–934. Cited by: §3.2.
  • [43] C. Su, J. Tong, Y. Zhu, P. Cui, and F. Wang (2020) Network embedding in biomedical data science. Briefings in bioinformatics 21 (1), pp. 182–197. Cited by: §1.
  • [44] Z. Sun, Z. Deng, J. Nie, and J. Tang (2019) RotatE: knowledge graph embedding by relational rotation in complex space. In ICLR (Poster), Cited by: §3.2.
  • [45] T. Trouillon, J. Welbl, S. Riedel, É. Gaussier, and G. Bouchard (2016) Complex embeddings for simple link prediction. Cited by: §3.2.
  • [46] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio (2017) Graph attention networks. arXiv:1710.10903. Cited by: §3.1.
  • [47] D. Wang, P. Cui, and W. Zhu (2016) Structural deep network embedding. In SIGKDD, pp. 1225–1234. Cited by: §3.1.
  • [48] M. Wang, M. Liu, J. Liu, S. Wang, G. Long, and B. Qian (2017) Safe medicine recommendation via medical knowledge graph embedding. arXiv:1710.05980. Cited by: §4.2.
  • [49] P. Wang, T. Hao, J. Yan, and L. Jin (2017) Large-scale extraction of drug–disease pairs from the medical literature. Journal of the Association for Information Science and Technology 68 (11), pp. 2649–2661. Cited by: §4.2.
  • [50] S. Wang, M. Qu, and J. Peng (2017) PROSNET: integrating homology with molecular networks for protein function prediction. In PSB, pp. 27–38. Cited by: §4.2.
  • [51] X. Wang, H. Ji, C. Shi, B. Wang, Y. Ye, P. Cui, and P. S. Yu (2019) Heterogeneous graph attention network. In WWW, pp. 2022–2032. Cited by: §3.2.
  • [52] Z. Xinyi and L. Chen (2019) Capsule graph neural network. In ICLR (Poster), Cited by: §4.1.
  • [53] L. Xu, X. Wei, J. Cao, and P. S. Yu (2017) Embedding of embedding (EOE): joint embedding for coupled heterogeneous networks. In WSDM, pp. 741–749. Cited by: §3.2.
  • [54] B. Yang, W. Yih, X. He, J. Gao, and L. Deng (2014) Embedding entities and relations for learning and inference in knowledge bases. arXiv:1412.6575. Cited by: §3.2.
  • [55] H. Yang, S. Pan, P. Zhang, L. Chen, D. Lian, and C. Zhang (2018) Binarized attributed network embedding. In ICDM, pp. 1476–1481. Cited by: §3.1.
  • [56] S. Zhang, Y. Tay, L. Yao, and Q. Liu (2019) Quaternion knowledge graph embeddings. In NeurIPS, pp. 2731–2741. Cited by: §3.2.
  • [57] C. Zhao, J. Jiang, Y. Guan, X. Guo, and B. He (2018) EMR-based medical knowledge representation and inference via Markov random fields and distributed representation learning. Artificial intelligence in medicine 87, pp. 49–59. Cited by: §4.2.
  • [58] S. Zhao, M. Jiang, Q. Yuan, B. Qin, T. Liu, and C. Zhai (2017) ContextCare: incorporating contextual information networks to representation learning on medical forum data. In IJCAI, pp. 3497–3503. Cited by: §4.2.