1 Introduction
Knowledge graphs [2] are knowledge bases that represent domain knowledge as interlinked entities forming the nodes of a graph. Driven by the recent 'explosion' of data, many corporations and academic institutions rely on knowledge graphs to model and analyse large amounts of data. In this work, we use graph embeddings [7,12] to predict possible relationships between drugs in a drug knowledge base represented as a knowledge graph. We rely on graph embeddings to provide relevant information and predictions on the given database, performing relation prediction via link prediction and drug-drug similarity via the node similarity concepts introduced in this work. Because it relies on expensive medical equipment and on medical professionals to deal with the nuances of that equipment, the field of drug similarity and analysis is time-consuming and costly. We aim to employ knowledge graphs to compute drug similarity and predictions in less time and at lower cost than previous techniques. The resulting drug similarity model supports drug similarity discovery, which can help reduce the side effects [1] caused by the use of alternative similar drugs.
2 Related Work
Knowledge graphs [2] currently support many medical applications, since graphs are a practical resource for many real-world applications [3]. Typically, biological knowledge graphs are built using manually curated datasets such as MIMIC-III, ICD-9, and others; alternatively, natural language processing can lessen the work of manual information collection. Today, knowledge graphs [2] are used in a variety of biomedical applications such as genomics, proteomics, drug side effects, drug repurposing, and safe drug recommendation, indicating their popularity in this field. The research in [10] explores representation learning on knowledge graphs for the study of drug target prediction and drug-drug interactions, while [11] uses knowledge graph embeddings to perform link prediction and drug target discovery, also on the KEGG database. The work in [4] uses an LSTM and knowledge graph embedding based model to predict drug-drug interactions. A section of our work is inspired by works such as [5], which deals with making inferences using the learned models. The various graph embedding models explored in this research are described and compared in [6], [7] and [8].
3 KEGG Database
KEGG (Kyoto Encyclopedia of Genes and Genomes) [9] is a knowledge base that contains genetic, chemical, and functional information. It contains entities such as disease, gene, network, pathway, and drug, and each entity type is linked to the others via specific forms of relationship. KEGG was developed by Kanehisa Laboratories (https://www.kanehisa.jp/) and is structured as a network of interconnected entities that resembles the biological ecosystem at the molecular level [11]. The database comprises five types of entities. The first is drugs: a comprehensive drug information resource that exclusively includes pharmaceuticals approved in Europe, Japan, and the United States, with information such as molecular interactions, drug metabolism, and chemical structure. The second is genes: an amalgamation of genes and proteins, containing gene sequences and their interactions with other biological entities [11]. The next is diseases: a collection of single-gene disorders, multifactorial diseases, and infectious diseases with a focus on perturbation, covering disease interactions with other entities. Pathways comprise manually curated pathway maps containing data on metabolism, biological processes, human diseases, drug development, and other topics; each pathway is linked to entities such as diseases, drugs, and genes [11]. Finally, by linking distinct entities in the other KEGG databases, the KEGG Network database represents information about drugs and disorders in the form of molecular networks.
4 Methodology
4.1 Knowledge Graph and Graph Embeddings
A knowledge graph is defined as "a graph of data intended to accumulate and convey knowledge of the real world, whose nodes represent entities and whose edges represent relations between these entities" [2]. Formally, a knowledge graph consists of triples, each connecting two entities (head, tail) through an edge (relationship). A triple is written $(head, relationship, tail)$ or $(h, r, t)$ with $h, t \in E$ and $r \in R$, where $E$ is the set of entities and $R$ is the set of all possible relationships that can exist between any two entities. For example, KEGG contains instances like (D11034, DRUG_EFFICACY_DISEASE, H00409), which states that the drug and the disease are connected by that specific relationship in the database. Embedding learning is an efficient way to tackle data sparseness by representing a knowledge graph's entities and relationships as low-dimensional real-valued vectors that store the structural characteristics of the graph within themselves
[13]. In research [7], the authors divide graph embedding models into three families: translation distance based models, which use distance-based scoring functions; semantic matching based models, which rely on similarity-based scoring functions; and neural network based models.
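As a minimal sketch of the representation just described, the following hypothetical snippet maps a few KEGG-style identifiers to low-dimensional random vectors via lookup tables (the identifiers are from the example above; the dictionaries, dimension and function names are illustrative, not part of any library):

```python
import numpy as np

# Hypothetical sketch: entities and relations of a knowledge graph become
# low-dimensional real-valued vectors held in simple lookup tables.
rng = np.random.default_rng(0)

entities = ["D11034", "H00409", "D04905"]            # example KEGG identifiers
relations = ["DRUG_EFFICACY_DISEASE", "DRUG_TARGET_PATHWAY"]
dim = 4                                              # toy embedding dimension

entity_emb = {e: rng.normal(size=dim) for e in entities}
relation_emb = {r: rng.normal(size=dim) for r in relations}

def embed_triple(head, rel, tail):
    """Return the (h, r, t) triple as a tuple of embedding vectors."""
    return entity_emb[head], relation_emb[rel], entity_emb[tail]

h, r, t = embed_triple("D11034", "DRUG_EFFICACY_DISEASE", "H00409")
```

In a trained model these vectors would be learned so that a scoring function over $(h, r, t)$ assigns high scores to true triples.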
4.2 Link Prediction
The task of predicting a connection between two nodes based on their node properties is referred to as link prediction. Given a graph G in which each node carries some information, the link prediction task is to develop a model that predicts whether two nodes are related by an edge [14, 15]. This can be used to find additional information for an existing dataset and can also be considered an information extraction task. For example, a link prediction job on a drug-drug interaction knowledge graph can reveal new information by suggesting probable links between any new pair of medications. For the task of link prediction, where every entity is considered as a target entity for a triple in the testing data [10], metrics such as Mean Reciprocal Rank (MRR) and Hits@n are used for evaluation. Ranks are obtained by scoring correct triples against corrupted ones and sorting the scores [16].
MRR: Mean Reciprocal Rank calculates the mean of the reciprocals of the ranks of the true entities. MRR is less sensitive to outliers than Mean Rank [17]. MRR scores range from 0 to 1, with 1 representing perfect ranking:

$$\mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{rank_i} \quad (1)$$

Here, $rank_i$ refers to the rank position of the first relevant element for the $i$-th query [11].

Hits@n: Represents the model's likelihood of ranking the relevant (true) fact within the top n scores, reporting the fraction of ranks that made it into the top n predictions [11].
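The two evaluation metrics above can be sketched directly from a list of ranks; the following is a minimal illustrative implementation (the toy ranks are invented, not from the KEGG experiments):

```python
import numpy as np

def mrr(ranks):
    """Mean Reciprocal Rank: mean of 1/rank over all test queries."""
    return float(np.mean(1.0 / np.asarray(ranks, dtype=float)))

def hits_at_n(ranks, n):
    """Hits@n: fraction of test triples whose true entity ranks in the top n."""
    return float(np.mean(np.asarray(ranks) <= n))

# Toy ranks of the true entity for five hypothetical test triples.
ranks = [1, 3, 2, 10, 50]
```

A perfect model (all ranks equal to 1) gives MRR = 1 and Hits@n = 1 for every n.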
4.3 Graph Embedding Algorithms
The key characteristic of graph embeddings is that they hold the complex graph structure and interactions within themselves, with the distance between latent dimensions representing a metric for similarities between distinct graph elements [18]. The following graph embedding models has been used:

TransE: An additive model which uses a distance-based scoring function for the link prediction task, treating edges of the graph as translations [17]. The scoring function measures how close the embedding of the head, translated by the embedding of the relationship, is to the embedding of the tail. Mathematically, using the $L_1$ or $L_2$ norm:

$$f_{\mathrm{TransE}}(h, r, t) = -\|\mathbf{h} + \mathbf{r} - \mathbf{t}\|_{1/2} \quad (2)$$

As TransE performs a simple translation that forces one-to-one mappings, it does not perform well on multi-relational graphs.
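The TransE score above can be sketched in a few lines; the vectors below are illustrative toy values chosen so that t = h + r exactly:

```python
import numpy as np

def transe_score(h, r, t, p=1):
    """TransE score: negative Lp distance between h + r and t.
    Higher (closer to 0) means a more plausible triple."""
    return -np.linalg.norm(h + r - t, ord=p)

h = np.array([0.1, 0.2])
r = np.array([0.3, -0.1])
t = np.array([0.4, 0.1])            # t == h + r: a perfect translation

perfect = transe_score(h, r, t)     # 0.0 up to floating-point rounding
worse = transe_score(h, r, t + 0.5) # a shifted tail scores strictly lower
```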

DistMult: The algorithm uses a bilinear diagonal model with the trilinear dot product scoring function [5], defined as follows:

$$f_{\mathrm{DistMult}}(h, r, t) = \langle \mathbf{h}, \mathbf{r}, \mathbf{t} \rangle \quad (3)$$

$$\langle \mathbf{h}, \mathbf{r}, \mathbf{t} \rangle = \sum_{i} h_i \, r_i \, t_i \quad (4)$$

Here h, r and t are the embeddings of head, relationship and tail respectively. The scoring function has a limitation: it can only model symmetric relationships, since it captures only the pairwise interaction between components of h and t along the same dimensions [6].
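The symmetry limitation is easy to see in code: swapping head and tail leaves the trilinear product unchanged. A minimal sketch with toy vectors:

```python
import numpy as np

def distmult_score(h, r, t):
    """DistMult trilinear dot product: sum_i h_i * r_i * t_i."""
    return float(np.sum(h * r * t))

h = np.array([0.5, -1.0, 2.0])
r = np.array([1.0, 0.5, -0.5])
t = np.array([2.0, 1.0, 1.0])

s_hrt = distmult_score(h, r, t)
s_trh = distmult_score(t, r, h)   # identical: the score is symmetric in h and t
```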

ComplEx: This model represents entities and relationships as complex vector embeddings, consisting of a real and an imaginary part [19]. A fact (h, r, t) is asserted using the asymmetric score function:

$$f_{\mathrm{ComplEx}}(h, r, t) = \mathrm{Re}\left(\langle \mathbf{h}, \mathbf{r}, \bar{\mathbf{t}} \rangle\right) \quad (5)$$

$$= \mathrm{Re}\left(\sum_i h_i \, r_i \, \bar{t}_i\right) \quad (6)$$

Here $\bar{\mathbf{t}}$ is the complex conjugate of t, and $\mathrm{Re}(\cdot)$ extracts the real part of the complex value. The algorithm extends DistMult with complex embeddings, whose asymmetric scoring function allows better modelling of asymmetric relations. The embeddings of h, r, and t no longer live in real space but in complex space $\mathbb{C}^d$. DistMult performs well with symmetric relations, whereas ComplEx also works well with antisymmetric relations [20].
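A short sketch shows the asymmetry: with a relation embedding that has a nonzero imaginary part, swapping head and tail changes the score (the complex toy vectors are illustrative):

```python
import numpy as np

def complex_score(h, r, t):
    """ComplEx score: Re(sum_i h_i * r_i * conj(t_i)), with complex embeddings."""
    return float(np.real(np.sum(h * r * np.conj(t))))

h = np.array([1 + 1j, 0.5 - 0.5j])
r = np.array([0.5 + 0.5j, 1 + 0j])   # nonzero imaginary part -> asymmetry
t = np.array([1 - 1j, 0.5 + 0.5j])

s_hrt = complex_score(h, r, t)
s_trh = complex_score(t, r, h)       # differs from s_hrt in general
```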

HolE: Stands for Holographic Embeddings [21], which associate each entity with a vector to capture its latent semantics [6]. The scoring function for HolE is defined as:

$$f_{\mathrm{HolE}}(h, r, t) = \mathbf{r}^{\top} (\mathbf{h} \star \mathbf{t}) \quad (7)$$

where the circular correlation operator $\star$ is defined as:

$$[\mathbf{a} \star \mathbf{b}]_k = \sum_{i=0}^{d-1} a_i \, b_{(k+i) \bmod d} \quad (8)$$

$$\mathbf{a} \star \mathbf{b} = \mathcal{F}^{-1}\left(\overline{\mathcal{F}(\mathbf{a})} \odot \mathcal{F}(\mathbf{b})\right) \quad (9)$$

where $\mathcal{F}$ denotes the discrete Fourier transform. HolE combines the simplicity of TransE with the expressive power of the tensor product to generate better embeddings. It is able to deal with asymmetric relationships because circular correlation is not commutative, i.e. $\mathbf{h} \star \mathbf{t} \neq \mathbf{t} \star \mathbf{h}$ [6].
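The FFT form of circular correlation makes the operator cheap to compute; the sketch below (toy vectors, illustrative names) also demonstrates its non-commutativity:

```python
import numpy as np

def circular_correlation(a, b):
    """Circular correlation via FFT: [a * b]_k = sum_i a_i * b_{(k+i) mod d}."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))

def hole_score(h, r, t):
    """HolE score: dot product of r with the circular correlation of h and t."""
    return float(np.dot(r, circular_correlation(h, t)))

h = np.array([1.0, 2.0, 3.0])
t = np.array([0.5, -1.0, 0.25])
r = np.array([1.0, 0.0, -1.0])

ht = circular_correlation(h, t)
th = circular_correlation(t, h)   # differs from ht: the operator is not commutative
```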
ConvE: Due to its neural network structure it performs better at non-linear transformations [22]. It uses 2D convolutions, which extract interactions between embeddings better than a 1D convolution network. The scoring function is defined as:

$$f_{\mathrm{ConvE}}(h, r, t) = g\left(\mathrm{vec}\left(g\left([\bar{\mathbf{h}}; \bar{\mathbf{r}}] * \omega\right)\right) \mathbf{W}\right) \mathbf{t} \quad (10)$$

Here, g is a non-linear activation function, vec denotes the flattening of the 2D feature map, $*$ is the linear convolution operator and W is a weight matrix. $\bar{\mathbf{h}}$ and $\bar{\mathbf{r}}$ are 2D reshapings of the head entity embedding and the relationship embedding respectively. The model is trained with the binary cross-entropy (BCE) loss:

$$\mathcal{L} = -\frac{1}{N} \sum_{i} \left(y_i \log(p_i) + (1 - y_i) \log(1 - p_i)\right) \quad (11)$$
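The reshape-stack-convolve-project pipeline of equation (10) can be sketched without any deep learning library; this is a hand-rolled toy version (random weights, ReLU standing in for g, all shapes illustrative), not the actual ConvE implementation:

```python
import numpy as np

def conv2d_valid(x, kernel):
    """Minimal 2D cross-correlation with 'valid' padding."""
    kh, kw = kernel.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def conve_score(h, r, t, kernel, W_proj, shape=(2, 2)):
    """ConvE-style score: reshape h and r to 2D, stack, convolve, flatten,
    project back to embedding space, then dot with t. ReLU stands in for g."""
    stacked = np.concatenate([h.reshape(shape), r.reshape(shape)], axis=0)  # 4x2
    feat = np.maximum(conv2d_valid(stacked, kernel), 0.0)                   # g([h;r]*w)
    vec = feat.reshape(-1) @ W_proj                                         # vec(.) W
    return float(np.maximum(vec, 0.0) @ t)

rng = np.random.default_rng(1)
d = 4
h, r, t = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)
kernel = rng.normal(size=(2, 2))
W_proj = rng.normal(size=(3, d))   # flattened 3x1 feature map -> d dimensions
score = conve_score(h, r, t, kernel, W_proj)
```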
ConvKB: Unlike ConvE it uses 1D convolutions and models the relationships among the same-dimensional entries of the embeddings [23], which lets it generalise the translational characteristics of the translation-based models. Each triple is represented as a matrix $\mathbf{A} = [\mathbf{h}, \mathbf{r}, \mathbf{t}] \in \mathbb{R}^{k \times 3}$ whose columns are the k-dimensional embeddings. A filter $\omega \in \mathbb{R}^{1 \times 3}$ is applied to each row $\mathbf{A}_i$ of the matrix to examine the global relationships among the entries, which enhances the translational characteristics of the algorithm. The feature map v is generated as follows:

$$v_i = g\left(\omega \cdot \mathbf{A}_i + b\right) \quad (12)$$

Finally, these feature maps are concatenated and used in the scoring function of ConvKB, where $*$ denotes the convolution operation and the filter set $\Omega$ and weight vector $\mathbf{w}$ are shared parameters:

$$f_{\mathrm{ConvKB}}(h, r, t) = \mathrm{concat}\left(g\left([\mathbf{h}, \mathbf{r}, \mathbf{t}] * \Omega\right)\right) \cdot \mathbf{w} \quad (13)$$
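A single-filter sketch makes the "translational characteristics" concrete: with the filter (1, 1, -1) each feature is exactly ReLU(h_i + r_i - t_i), i.e. a per-dimension TransE translation, vanishing when t = h + r. The toy vectors and parameters below are illustrative:

```python
import numpy as np

def convkb_score(h, r, t, omega, w):
    """ConvKB-style score with one 1x3 filter: view the triple as a k x 3
    matrix, apply the filter to each row to build the feature map v,
    then return v . w. ReLU stands in for the non-linearity g."""
    A = np.stack([h, r, t], axis=1)      # k x 3, one row per embedding dimension
    v = np.maximum(A @ omega, 0.0)       # feature map, one value per row
    return float(v @ w)

h = np.array([1.0, -1.0, 0.5])
r = np.array([0.5, 0.5, 0.5])
t = np.array([1.5, -0.5, 1.0])          # here t == h + r exactly
omega = np.array([1.0, 1.0, -1.0])      # filter recovering h_i + r_i - t_i
w = np.array([1.0, 1.0, 1.0])           # shared weight vector

score_perfect = convkb_score(h, r, t, omega, w)        # features vanish
score_other = convkb_score(h, r, t - 1.0, omega, w)    # shifted tail: nonzero
```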
4.4 Visualizing Knowledge Graph
The visualization below shows a section of the original KEGG knowledge graph, where different entity types are denoted by different colours. It can be observed how a few entities act as bridges connecting multiple clusters in the graph. The visualization makes clear that the number of connections per entity is highly imbalanced, with a few nodes connected to a large fraction of the other nodes and forming clusters of their own.
4.5 Novel Node Similarity Measure Using Graph Embedding and Link Prediction Techniques
We use the cosine similarity between embeddings together with information gathered from the link prediction task, improving the similarity measure by capturing different aspects of similarity. The link prediction outputs are transformed into probabilities using calibration, where a higher link prediction probability indicates a higher chance of a link existing between the two nodes. We exploit the fact that if two drugs are similar, their interactions with the other entities in the dataset should also be similar, i.e. their chances of linking with other entities should follow similar trends. Our aim is a measure with a real-valued output reflecting the extent of similarity between two entities; in our use case the two entities are drugs, so the system acts as a drug similarity system. The cosine similarity between two graph embeddings $A = (A_1, \dots, A_m)$ and $B = (B_1, \dots, B_m)$, each of length m, is defined as follows:

$$\cos(A, B) = \frac{\sum_{i=1}^{m} A_i B_i}{\sqrt{\sum_{i=1}^{m} A_i^2}\,\sqrt{\sum_{i=1}^{m} B_i^2}} \quad (14)$$
A loss function measures the difference between the link prediction results of any two drugs with respect to all other possible entities in the dataset:

$$\mathrm{MSE}(A, B) = \frac{1}{n} \sum_{i=1}^{n} \left(P_{A,i} - P_{B,i}\right)^2 \quad (15)$$

Here $P_{A,i}$ represents the link prediction probability of drug A with entity i, and n stands for the count of entities taken into account. The MSE gives a sense of the difference in interaction between the two drugs and the other entities. The final similarity measure combines the embedding-based cosine similarity with the link prediction loss; it is defined in the next equation.
$$\mathrm{Sim}(A, B) = \frac{\cos(E_A, E_B)}{\mathrm{MSE}(A, B)} \quad (16)$$

Here, MSE is the link prediction loss, $E_A$ and $E_B$ are the embeddings of drugs A and B, and Sim represents the overall similarity score between the pair of drugs.
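The full measure combines the two signals; the sketch below uses invented toy embeddings and link-prediction probabilities (the function names are illustrative, not from any library):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def link_prediction_mse(p_a, p_b):
    """Mean squared difference between the link-prediction probabilities of
    two drugs against the same set of other entities."""
    p_a, p_b = np.asarray(p_a), np.asarray(p_b)
    return float(np.mean((p_a - p_b) ** 2))

def drug_similarity(emb_a, emb_b, p_a, p_b):
    """Proposed measure: cosine similarity of the embeddings divided by the
    link-prediction MSE (higher = more similar)."""
    return cosine_similarity(emb_a, emb_b) / link_prediction_mse(p_a, p_b)

# Toy embeddings and link-prediction probabilities (hypothetical values).
emb_a = np.array([0.9, 0.1, 0.4])
emb_b = np.array([0.8, 0.2, 0.5])
p_a = [0.90, 0.10, 0.50, 0.30]
p_b = [0.85, 0.15, 0.45, 0.35]
sim = drug_similarity(emb_a, emb_b, p_a, p_b)
```

Dividing by the MSE means a pair of drugs with nearly identical link-prediction behaviour gets a large score even when the cosine similarity is only moderately high.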
5 Evaluation
5.1 Model Settings
Like any other machine learning model, the performance of embeddings depends on the quantity and quality of the data used and on the hyperparameters. Our experiments use embedding sizes of 100, 200 and 300 with Multiclass NLL Loss and Binary Cross Entropy Loss as loss functions. Adam (Adaptive Moment Estimation) and Stochastic Gradient Descent are considered as optimizers, with learning rates of 1e-1 and 1e-3. The tests include the regularization techniques L1 (Lasso Regression), L2 (Ridge Regression) and L3 (Nuclear 3-norm), the last proposed in the ComplEx-N3 paper [24], with regularization constants 1e-5, 1e-2 and 1e-1. Additional hyperparameters apply to the convolutional neural network based models ConvE and ConvKB; these are kept fixed: 32 feature maps per convolution kernel, a convolution kernel size of 3, and dropout at the embedding, convolution and dense layers of 0.2, 0.3 and 0.2 respectively.
5.2 Performance of Algorithms on Link Prediction Task
The six graph embedding models are trained on the KEGG data, split 80:20 into train/test sets, and evaluated on the link prediction task using MRR and Hits@k.
ComplEx performed better than all other algorithms on the link prediction task, providing the best MRR and Hits@k scores. Surprisingly, the convolutional graph neural network models ConvE and ConvKB performed poorly on the KEGG dataset. Overall, the ComplEx algorithm with an embedding size of 300, the Adam optimizer with a learning rate of 1e-3, Multiclass NLL Loss, and L3 regularization with 1e-2 as the regularization constant provides the best scores: 0.46 MRR, and 0.64, 0.49 and 0.37 for Hits@10, Hits@3 and Hits@1 respectively.
5.3 Visualizing Different Entity Type in 2D Embedding Space
The dimensionality reduction method PCA (Principal Component Analysis) [12] is used to reduce the high-dimensional embeddings to a smaller space that is easier to visualise. The embeddings are transformed into 2D space and plotted, marking the different entity types present in the dataset, namely drug, disease, gene, network and pathway, with different colours for a better understanding of the relationships between entities. It can be observed that the graph embedding algorithm generated the embeddings such that the different entity types occupy different sections of the plot, encoding the entity type within the embedding itself.
5.4 Insights from Link Predictions
The learned model is used to find possible new, unseen relationships between entities and provide meaningful insights. For each statement the model outputs a rank and a score, which is normalized to the range 0 to 1 to give comparative probabilities across statements. The transformation of the real-valued scores to probabilities in (0, 1) is performed using the expit transformation, which maps any real number x to a value in (0, 1). The model used for the predictions below is the ComplEx model with the best scores: 0.46 MRR, and 0.64, 0.49 and 0.37 for Hits@10, Hits@3 and Hits@1 respectively.
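The expit (logistic sigmoid) transformation is one line of code; the sketch below applies it to scores of the same magnitude as those in the predictions, showing how positive scores map above 0.5 and negative scores below it:

```python
import numpy as np

def expit(x):
    """Logistic sigmoid: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative scores of the same magnitude as the model outputs.
p_high = expit(4.851221)   # a strongly positive score -> probability near 1
p_low = expit(-0.92776)    # a negative score -> probability below 0.5
```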
Statement | Rank | Score | Prob
hsa04024 PATHWAY_GENE HSA:51196 | 236 | 3.704783 | 0.975985
D11034 DRUG_EFFICACY_DISEASE H00409 | 2 | 4.851221 | 0.992242
D04905 DRUG_TARGET_PATHWAY hsa05010 | 1 | 4.979891 | 0.993172
N00060 NETWORK_GENE HSA:23401 | 1 | 5.399962 | 0.995504
hsa04024 PATHWAY_GENE HSA:6336 | 19814 | -0.21121 | 0.447393
D11056 DRUG_TARGET_GENE HSA:7388 | 7636 | 0.089212 | 0.522288
D11056 DRUG_TARGET_GENE HSA:3352 | 133 | 2.823472 | 0.943931
N00399 NETWORK_GENE HSA:9217 | 27812 | -0.44892 | 0.389616
D04905 DRUG_TARGET_PATHWAY hsa04728 | 25 | 4.369776 | 0.987504
N00399 NETWORK_GENE N00399 | 16476 | -0.05818 | 0.485458
H00242 DISEASE_GENE D11034 | 28037 | -0.20788 | 0.448214
N00060 DRUG_TARGET_PATHWAY hsa04380 | 32017 | -0.92776 | 0.283378
The table shows several assertions to which the model assigned fairly high probabilities. The probability of a relationship between D11056 (drug name: Mirtazapine hydrate) and HSA:3352 (gene name: HTR1D, 5HT1D, HT1DA, HTR1DA, HTRL, RDC4) is quite high at 0.94, which suggests a possible connection between these two entities. Similarly, the relationship between D04905 (drug name: Memantine hydrochloride) and pathway hsa04728 has a high connection probability of 0.987 under the model.
5.5 Drug Similarity System Using Graph Embeddings and Link Prediction
Graph embeddings generated by the trained graph model are used to determine the possible similarity of two drugs. A loss function, mean squared error, measures the difference between the link prediction scores of drug A with other entities and of drug B with the same entities. The similarity measure combines the cosine similarity between the embeddings of the pair of drugs with the link prediction mean squared error, as expressed in section 4.5. The top 10 drugs most similar to D00043 (drug name: Isoflurophate Fluostigmine), sorted by the novel similarity measure, are listed below:
No.  Drug  Cosine  MSE  Ratio  Drug Name 
1  D01228  0.933001  0.000289  3223.509127  Distigmine bromide (JP18/INN) 
2  D00196  0.939603  0.000306  3075.463914  Physostigmine (USP) 
3  D03751  0.943170  0.000324  2915.059617  Icopezil maleate (USAN) 
4  D02418  0.944881  0.000328  2884.443301  Physostigmine salicylate (JAN/USP) 
5  D06288  0.918739  0.000339  2706.522993  Velnacrine maleate (USAN) 
6  D00469  0.928133  0.000347  2677.510252  Pralidoxime chloride (USP) 
7  D05981  0.934112  0.000366  2555.506934  Suronacrine maleate (USAN) 
8  D02558  0.957066  0.000386  2479.764194  Rivastigmine tartrate 
9  D03826  0.934139  0.000388  2404.652320  Physostigmine sulfate (USP) 
10  D02068  0.915335  0.000388  2357.032571  Tacrine hydrochloride (USP) 
It can be observed that D01228 (drug name: Distigmine bromide (JP18/INN)) has the highest ratio score with respect to drug D00043 (drug name: Isoflurophate Fluostigmine) due to its high cosine similarity and low link prediction mean squared error, indicating a higher probability of behaving similarly to D00043 when interacting with other entities. Both drugs belong to the neuropsychiatric agent class, are part of drug groups DG01595 (drug group name: Cholinesterase inhibitor) and DG01593 (drug group name: Acetylcholinesterase inhibitor), and target similar genes such as HSA:43 (gene name: ACHE, ACEE, ARACHE, NACHE, YT), indicating high similarity between the pair of drugs. For evaluation we use the Tanimoto coefficient, which measures the chemical similarity between molecules. It is defined below, where S represents the molecular similarity between A and B, a is the number of on-bits in A, b the number of on-bits in B, and c the number of on-bits shared by A and B:
$$S = \frac{c}{a + b - c} \quad (17)$$
The Tanimoto coefficient values for the top drugs are 0.049, 0.056, 0.01, 0.046, 0.014, 0.021, 0.012, 0.77, 0.53. Chemical structure similarity is not capable of generalizing this trend, as there are drugs that treat the same clinical problems but differ in structure, such as Miglitol and Glipizide, which are both used for diabetes but have completely different structures [25]. Finally, clinical drug similarity experiments would be a good choice for evaluation, but they are kept out of the scope of this research.
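Equation (17) operates on binary molecular fingerprints; a minimal sketch with invented toy bit vectors (not real drug fingerprints):

```python
import numpy as np

def tanimoto(a, b):
    """Tanimoto coefficient on binary fingerprints: c / (a + b - c),
    where a and b count the on-bits of each vector and c the shared on-bits."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    c = np.sum(a & b)
    return float(c / (np.sum(a) + np.sum(b) - c))

# Toy fingerprints: 4 on-bits each, 3 shared -> S = 3 / (4 + 4 - 3) = 0.6
fp1 = [1, 1, 0, 1, 0, 0, 1, 0]
fp2 = [1, 0, 0, 1, 1, 0, 1, 0]
```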
6 Conclusion
Overall, graph embedding models are an excellent choice for tackling the problem of drug similarity, and the approach scales easily by incorporating other medical datasets to provide the graph embedding model with larger and more connected databases. More data would eventually lead our models to generalize better by producing less overfitted models, and so provide better performance on the link prediction task. Graph embedding models based on neural networks, convolutional neural networks and attention models form an exciting field that can teach us more about complex graph structures, their nodes and their edges. Representing the entities and relationships of a knowledge graph as low-dimensional embeddings can be applied to various graph structures to find new insights using operations like link prediction. This work contributes a machine learning pipeline that uses graph embeddings and link prediction algorithms to uncover drug similarity and capture unique relationships and possibilities in a biomedical database, with a systematic comparison of different graph embedding algorithms.
References
 [1] Joanne Bowes and Andrew J. Brown and Jacques Hamon and Wolfgang Jarolimek and Arun Sridhar and Gareth Waldron and Steven Whitebread: Reducing safety-related drug attrition: the use of in vitro pharmacological profiling. Nature Reviews Drug Discovery 11(12):909-922, 2012.
 [2] Aidan Hogan and Eva Blomqvist and Michael Cochez and Claudia d'Amato and Gerard De Melo and Claudio Gutierrez and Sabrina Kirrane et al: Knowledge graphs. ACM Computing Surveys (CSUR) 54(4):1-37, 2021.
 [3] David N. Nicholson and Casey S. Greene: Constructing knowledge graphs and their biomedical applications. Computational and Structural Biotechnology Journal 18:1414-1428, 2020.
 [4] Md. Rezaul Karim and Michael Cochez and Joao Bosco Jares and Mamtaz Uddin and Oya Beyan and Stefan Decker: Drug-Drug Interaction Prediction Based on Knowledge Graph Embeddings and Convolutional-LSTM Network. Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 2019.
 [5] Bishan Yang and Wen-tau Yih and Xiaodong He and Jianfeng Gao and Li Deng: Embedding Entities and Relations for Learning and Inference in Knowledge Bases. arXiv preprint arXiv:1412.6575, 2014.
 [6] Quan Wang and Zhendong Mao and Bin Wang and Li Guo: Knowledge Graph Embedding: A Survey of Approaches and Applications. IEEE Transactions on Knowledge and Data Engineering, 29 (12):2724–2743, 2017.
 [7] Meihong Wang and Linling Qiu and Xiaoli Wang.: A Survey on Knowledge Graph Embeddings for Link Prediction. Symmetry 13, no. 3: 485, 2021.
 [8] Ilya Makarov and Dmitrii Kiselev and Nikita Nikitinsky and Lovro Subelj: Survey on graph embeddings and their applications to machine learning problems on graphs. PeerJ Computer Science 7:e357, https://doi.org/10.7717/peerj-cs.357, 2021.
 [9] Minoru Kanehisa and Miho Furumichi and Mao Tanabe and Yoko Sato and Kanae Morishima: KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Research 45(D1):D353-D361, 2017.
 [10] Rajeev Verma and Preetam Kumar: Knowledge Graph Representation Learning Based Drug Informatics. Proceedings 2019 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT), pages 1-4, 2019.
 [11] Sameh K. Mohamed and Aayah Nounu and Vít Novácek: Drug target discovery using knowledge graph embeddings. Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing (SAC 2019), 2019.
 [12] Ian T. Jolliffe and Jorge Cadima. Principal component analysis: a review and recent developments. Phil. Trans. R. Soc. A. 374:20150202, 2016.
 [13] Alberto García-Durán and Antoine Bordes and Nicolas Usunier and Yves Grandvalet: Combining Two- and Three-Way Embeddings Models for Link Prediction in Knowledge Bases. Journal of Artificial Intelligence Research 55:715-742, 2016.
 [14] Muhan Zhang and Yixin Chen: Link Prediction Based on Graph Neural Networks. Proceedings 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), 2018.
 [15] Zecheng Zhang and Danni Ma and Xiaohan Li: Link Prediction with Graph Neural Networks and Knowledge Extraction. CS230: Deep Learning, Spring 2020, Stanford University, CA.
 [16] Zhen Tan and Xiang Zhao and Yang Fang and Bin Ge and Weidong Xiao: Knowledge Graph Representation via Similarity-Based Embedding. Scientific Programming 2018:1-12, 2018.
 [17] Antoine Bordes and Nicolas Usunier and Alberto Garcia-Duran and Jason Weston and Oksana Yakhnenko: Translating Embeddings for Modeling Multi-relational Data. Proceedings Advances in Neural Information Processing Systems 26 (NIPS 2013), 2013.
 [18] Bryan Perozzi and Rami Al-Rfou and Steven Skiena: DeepWalk: Online Learning of Social Representations. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014.
 [19] Théo Trouillon and Johannes Welbl and Sebastian Riedel and Eric Gaussier and Guillaume Bouchard: Complex Embeddings for Simple Link Prediction. Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:2071-2080, 2016.
 [20] Xiaofei Shi and Yanghua Xiao: Modeling Multi-mapping Relations for Precise Cross-lingual Entity Alignment. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019.
 [21] Maximilian Nickel and Lorenzo Rosasco and Tomaso Poggio: Holographic Embeddings of Knowledge Graphs. Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 30, No. 1, 2016.
 [22] Pasquale Minervini and Pontus Stenetorp and Sebastian Riedel: Convolutional 2D Knowledge Graph Embeddings. Proceedings Thirty-second AAAI Conference on Artificial Intelligence, 2018.
 [23] Dai Quoc Nguyen and Tu Dinh Nguyen and Dat Quoc Nguyen and Dinh Phung: A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics, 2018.
 [24] Timothée Lacroix and Nicolas Usunier and Guillaume Obozinski: Canonical Tensor Decomposition for Knowledge Base Completion. Proceedings of the International Conference on Machine Learning, PMLR, 2018.
 [25] Xian Zeng and Zheng Jia and Zhiqiang He and Weihong Chen and Xudong Lu and Huilong Duan and Haomin Li: Measure Clinical Drug-Drug Similarity Using Electronic Medical Records. International Journal of Medical Informatics 124:97-103, 2019.