Interactions between structured entities such as chemicals underlie many applications in chemistry, biology, material science, medical science, and environmental science. For example, knowledge of chemical interactions guides toxicity prediction, new material design, and pollutant removal [xu2019mr]. In medical science, understanding the interactions between drugs is vital for drug discovery and side effect prediction, which can save millions of lives every year [menche2015uncovering].
One immediate way to investigate the interaction between two structured entities is to conduct experiments in the laboratory or clinic. However, due to the enormous number of structured entities, it is infeasible in terms of both time and resources to examine all possible interactions. Thanks to advances in computational approaches for structured entity interaction prediction, a variety of techniques have been proposed to predict these interactions effectively and efficiently using deep neural networks or graph neural networks (GNNs), such as DeepCCI [kwon2017deepcci] for chemical-chemical interaction prediction and DeepDDI [ryu2018deep] for drug-drug interaction prediction.
We observe that the interactions among structured entities can be naturally modeled by a graph of graphs (a.k.a. network of networks), where each structured entity is a local graph and the interactions between the entities form a global graph. In Figure 1, we take chemical-chemical interactions as an example. Each chemical molecule is a structured entity and can be represented by a local graph (i.e., a molecule graph) in which nodes represent atoms and the bonds between the atoms are the edges. The interactions (edges) among the structured entities (nodes), in turn, form a global graph. However, existing studies on structured entity interaction prediction do not make full use of the graph-of-graphs model and only consider partial information. For instance, MR-GNN [xu2019mr] only considers the local structure information of entities and their pairwise similarity; Decagon [zitnik2018modeling] focuses on the interaction graph and treats each structured entity as a simple node. Other works such as DeepCCI and DeepDDI do not consider the graph structure information at all.
These limitations motivate us to develop a new approach that fully exploits the graph-of-graphs (GoG) model to predict structured entity interactions. In particular, we propose a novel model called Graph of Graphs Neural Network (GoGNN). Our model builds a graph neural network with attention-based pooling over the local graphs and attention-based neighbor aggregation on the global graph, so that GoGNN captures broader information that improves the prediction performance. Furthermore, the GNNs on the two levels of graphs work synergistically to improve the representation quality of GoGNN. The contributions of our model can be summarized as follows:
To the best of our knowledge, this is the first work to systematically apply graph neural networks to the graph-of-graphs model, namely Graph of Graphs Neural Network (GoGNN), for the problem of structured entity interaction prediction.
The proposed GoGNN mines features from both the local entity graphs and the global interaction graph hierarchically and synergistically. We design a dual-attention architecture to capture the significance of the substructures in the local graphs while preserving the importance of the interactions within the global graph.
Extensive experiments on real-life benchmark datasets show that GoGNN outperforms the state-of-the-art structured entity interaction prediction methods in two representative applications: chemical-chemical interaction prediction and drug-drug interaction prediction.
2 Related Work
In this section, we introduce the closely related works.
2.1 Structured Entities Interaction Prediction
In many real-life applications such as chemistry, biology, material science, and medical science, we need to understand the interactions between the structured entities. In recent years, a variety of techniques have been proposed for structured entity interaction prediction in some specific applications. In this paper, we focus on two representative applications: chemical-chemical interaction prediction and drug-drug interaction prediction.
Many computational methods have been proposed for these two applications. DeepCCI and DeepDDI [kwon2017deepcci, ryu2018deep] apply a conventional convolutional neural network and PCA to the chemical data. Some models are based on graph neural networks. For example, Decagon [zitnik2018modeling] performs GCN on a drug-protein interaction graph; MR-GNN [xu2019mr] proposes a model with dual graph-state LSTMs that extracts local features of molecule graphs; and MLRDA [chu2019mlrda] is a multitask, semi-supervised model for DDI prediction.
2.2 Graph Neural Networks
Node-level applications. Most GNNs are designed for node-level applications such as node classification and link prediction [DBLP:conf/iclr/KipfW17, velivckovic2017graph, zhang2018link, hamilton2017inductive, liu2019geniepath, 10.1145/3366423.3380151, 10.1145/3366423.3380187]. They rely on node embedding techniques such as skip-gram and autoencoders, and on neighbor aggregation methods such as GCN and GraphSAGE. These methods focus on the node relations within a graph and use low-dimensional representations to preserve the structural and attribute information.
Graph-level applications. Recently, some GNN works have been proposed for graph-level applications such as graph classification [zhang2018end, lee2018graph] and graph matching [li2019graph]. These works learn the representation of each graph individually or in a pairwise manner, without considering the interactions between the graphs.
2.3 Graph of Graphs
In most real-world systems, an individual network is one component within a much larger, complex multi-level network. Applying the graph theory paradigm to these networks has led to the concept of “Graph of Graphs” (also known as “Network of Networks”). [d2014networks] introduces the theoretical developments [dong2013robustness], applications [DBLP:conf/kdd/NiTFZ14] and phenomenological models [rome2014federated] of the network of networks. These works enable us to understand and model inter-dependent critical infrastructures. SEAL [DBLP:conf/www/LiRCMHH19] proposes a graph neural network from a hierarchical graph perspective for the graph classification task. GoGNN differs significantly from SEAL in tasks, loss functions and optimizers, and it is the first work to develop a graph neural network technique on the graph of graphs for the structured entity interaction prediction problem.
3.1 Problem Definition
For ease of understanding, in this paper we focus on two representative applications of structured entity interaction prediction: chemical-chemical interaction (CCI) prediction and drug-drug interaction (DDI) prediction. In the CCI graph, there is only one type of interaction, and our goal is to estimate the reaction probability score of a given chemical pair. The DDI graph has multiple types of interactions, so we aim to estimate the occurrence probability of a side effect type given a triplet that consists of two drugs and that side effect type.
3.2 Input Graph of Graphs
Overall, the input interaction graph is regarded as a graph of graphs as follows.
Molecule Graph. In both the CCI and DDI prediction tasks, the local graphs are molecule graphs, each of which can be modeled as a heterogeneous graph with multiple types of nodes and edges. In particular, a molecule graph consists of atoms as nodes and bonds between pairs of atoms as edges. Each atom (i.e., node) is encoded as a vector. For each bond (i.e., edge), we assign a weight to the corresponding edge depending on the type of the bond. For example, the bond between the carbon atoms in the ethylene molecule is a double bond, so the edge between the two carbon atoms receives the weight assigned to double bonds.
Interaction Graph. The global interaction graph is formed by the molecule graphs and the interactions between them: its node set consists of the molecule graphs, and its edge set consists of the interaction edges between the molecule graphs. Note that in the CCI graph there is only one type of interaction between two nodes, whereas in the DDI graph there are multiple types of side effects caused by the combination of two drugs. In the DDI graph, an attribute vector is assigned to each edge based on its side effect type.
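The two-level input described in this section can be held in a small container structure. The following is a minimal sketch rather than the authors' data pipeline: the class names, the one-hot atom encoding, and the numeric bond weights in `BOND_WEIGHT` are all assumptions made for illustration, and the interaction edge in the example is a placeholder.

```python
from dataclasses import dataclass, field

import numpy as np

# Assumed bond-type weights, for illustration only.
BOND_WEIGHT = {"single": 1.0, "double": 2.0, "triple": 3.0}


@dataclass
class MoleculeGraph:
    atoms: list            # atom symbols, one per node
    features: np.ndarray   # (num_atoms, num_elements) one-hot atom vectors
    adjacency: np.ndarray  # (num_atoms, num_atoms) symmetric bond weights


@dataclass
class InteractionGraph:
    molecules: list                                  # nodes are MoleculeGraph objects
    edges: list = field(default_factory=list)        # (i, j) pairs of molecule indices
    edge_attrs: dict = field(default_factory=dict)   # (i, j) -> attribute vector (DDI only)


def build_molecule(atoms, bonds, elements=("C", "H", "O")):
    """Build a local graph from atom symbols and (u, v, bond_type) triples."""
    n = len(atoms)
    features = np.zeros((n, len(elements)))
    for idx, symbol in enumerate(atoms):
        features[idx, elements.index(symbol)] = 1.0
    adjacency = np.zeros((n, n))
    for u, v, kind in bonds:
        adjacency[u, v] = adjacency[v, u] = BOND_WEIGHT[kind]
    return MoleculeGraph(list(atoms), features, adjacency)


# Ethylene (C2H4): the two carbon atoms share a double bond.
ethylene = build_molecule(
    ["C", "C", "H", "H", "H", "H"],
    [(0, 1, "double"), (0, 2, "single"), (0, 3, "single"),
     (1, 4, "single"), (1, 5, "single")],
)
water = build_molecule(["O", "H", "H"], [(0, 1, "single"), (0, 2, "single")])

# Global graph: molecules are nodes, one placeholder interaction edge.
gog = InteractionGraph(molecules=[ethylene, water], edges=[(0, 1)])
```

For a DDI-style graph, `edge_attrs` would additionally map each edge to the attribute vector of its side effect type.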
4 Graph of Graphs Neural Network
In this section, we introduce our Graph of Graphs Neural Network model.
4.1 Framework of GoGNN
The framework of GoGNN is illustrated in Figure 2. GoGNN contains a molecule graph neural network, which takes the atom features as input, and an interaction graph neural network, which produces the graph representations for the prediction task. The two parts of GoGNN work synergistically to improve the performance. The hidden features learned by the molecule-level GNN provide the interaction-level GNN with a representative initial input, and the feature aggregation in the interaction-level GNN improves the ability of the molecule-level GNN to find key substructures through back-propagation.
4.2 Molecule Graph Neural Network
In organic chemistry, functional groups (i.e., substructures) in molecules are responsible for the characteristic chemical reactions between these molecules. For example, the reaction between benzoic acid and ethanol in Figure 1 is the esterification between two functional groups -COOH in benzoic acid and -OH in ethanol.
The model can achieve better prediction performance if it can identify the functional groups in the molecules and represent each molecule with these functional groups. Therefore, we design our molecule graph neural network as a combination of a multi-resolution architecture [xu2019mr], which preserves the information of multi-hop substructures, and attention-based graph pooling [lee2019self, gao2019graph], which selects the substructures that represent the molecules.
As shown in previous work [xu2018powerful], a single graph convolution layer can only aggregate the features of a node and its immediate neighbors. To obtain features of the multi-scale substructures of the molecule graph, we apply multiple layers of graph convolution to the input graphs. The graph convolution operation at layer $l$ is summarized as follows:

$$H^{l+1} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{l} W^{l}\right)$$

where $H^{l}$ is the hidden feature matrix of the molecule graph at layer $l$, $\tilde{A}$ is the adjacency matrix of the molecule graph with self-connections, $\tilde{D}$ is the diagonal degree matrix of $\tilde{A}$, $W^{l}$ is the trainable weight matrix of layer $l$, and $\sigma$ denotes the activation function.
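The layer described above can be sketched in NumPy. This is a minimal sketch, assuming the standard symmetrically normalized propagation rule of GCN and a ReLU activation (the text does not name the activation); the function names are hypothetical.

```python
import numpy as np


def normalized_adjacency(adjacency):
    """Add self-connections and apply symmetric degree normalization."""
    a_tilde = adjacency + np.eye(adjacency.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adjacency, hidden, weight):
    """One graph convolution: aggregate each node with its direct neighbors."""
    return np.maximum(normalized_adjacency(adjacency) @ hidden @ weight, 0.0)


# Three atoms in a chain: 0 - 1 - 2. Identity features make the reach visible.
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
one_hop = gcn_layer(path, np.eye(3), np.eye(3))
two_hop = gcn_layer(path, one_hop, np.eye(3))
```

After one layer, atom 0 carries no information about atom 2 (`one_hop[0, 2]` is zero); after two layers it does, which is why stacking layers exposes multi-hop substructures.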
Unlike MR-GNN [xu2019mr], which applies dual graph-state LSTMs to the subgraph representations, GoGNN learns the graph representation through graph pooling, which preserves the substructure information while significantly reducing the time and space complexity. As shown in Figure 2, a self-attention graph pooling layer takes the output of each graph convolution layer as input and learns a self-attention score for every atom of the molecule graph at that layer, using an attention weight matrix that belongs to the pooling layer. To select the most representative substructures (functional groups), the pooling layer keeps the atoms with the highest attention scores. A hyperparameter, the pooling ratio, determines the number of nodes that are selected to represent the molecule graph. The indices of the top-scoring atoms are returned by a ranking function as in [DBLP:conf/icml/GaoJ19]; a mask vector determined by the attention scores is then applied through a column-wise product, which yields the feature matrix of the selected atoms. Afterward, a readout layer, which contains mean and sum pooling, is applied to the embeddings of the selected atoms to produce the hidden feature of the molecule graph. After multiple graph convolution and self-attention graph pooling layers, we obtain several graph hidden features, and we concatenate the outputs of the graph pooling layers to form the hidden feature vector of the molecule graph. Because the hierarchical graph pooling architecture is applied, the graph representation preserves the multi-hop substructure information effectively. Hence, GoGNN can identify the functional groups that play key roles in molecule interactions and use these functional groups to represent the molecule graph.
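The pooling and readout steps of this subsection can be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: the attention score is computed with a graph convolution followed by `tanh`, in the style of self-attention graph pooling [lee2019self], and all function and parameter names are hypothetical.

```python
import numpy as np


def normalized_adjacency(adjacency):
    """Add self-connections and apply symmetric degree normalization."""
    a_tilde = adjacency + np.eye(adjacency.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def self_attention_pool(adjacency, hidden, att_weight, ratio):
    """Keep the ceil(ratio * N) atoms with the highest attention scores."""
    scores = np.tanh(normalized_adjacency(adjacency) @ hidden @ att_weight).ravel()
    keep = int(np.ceil(ratio * hidden.shape[0]))
    idx = np.argsort(-scores)[:keep]           # indices of the top-scoring atoms
    pooled = hidden[idx] * scores[idx][:, None]  # mask the selected features by score
    return idx, pooled


def readout(pooled):
    """Concatenate mean pooling and sum pooling over the selected atoms."""
    return np.concatenate([pooled.mean(axis=0), pooled.sum(axis=0)])


# Four atoms in a chain with 2-dimensional hidden features.
chain = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
hidden = np.array([[1.0, 0.0], [3.0, 1.0], [2.0, 2.0], [0.5, 0.0]])
att_weight = np.array([[1.0], [0.0]])

idx, pooled = self_attention_pool(chain, hidden, att_weight, ratio=0.5)
layer_vector = readout(pooled)
```

In a multi-layer setting, one such `layer_vector` would be produced per convolution layer and the vectors concatenated with `np.concatenate` to form the molecule's hidden feature vector.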
4.3 Interaction Graph Neural Network
Most existing CCI and DDI prediction models are trained on pairs of molecule graphs and ignore the molecule interaction graph. However, the information in the interaction graph is crucial for interaction prediction because it enables the model to capture high-order interaction relationships and, synergistically, improves the model's ability to capture the representative molecular substructures.
Two observations motivate us to apply a graph neural network to the interaction graph. First, the type of interaction depends on the types of the molecules involved. As mentioned in Section 4.2, esterification is the reaction between -OH in alcohols and -COOH in carboxylic acids. The neighbor aggregation of a GNN gathers neighbor information that helps to summarize the types of chemicals that interact with the selected one. Second, it is necessary to assign importance scores to the neighbors of each molecule in the interaction graph, since chemical interactions differ in significance and frequency. For example, vitamin C has two main properties: reducibility and acidity. Therefore, vitamin C cannot be prescribed with oxidizing drugs like vitamin K1 or alkaline drugs like omeprazole. In an uncommon case, vitamin C reduces the therapeutic effect of inosine because of their complex physical and chemical reactions. Therefore, we apply the graph attention network in order to preserve the frequencies of the chemical reactions and reduce the influence of biased observation of the interaction graph. For the DDI graph, which has edge attributes, an edge-aggregation graph neural network is applied.
Graph Attention Network. The attention-based graph neural network [velivckovic2017graph] is applied to the interaction graph without edge attributes. Taking the learned molecule hidden feature vectors and the interaction graph as input, the molecule graph representations are computed by multi-head attention-based neighbor aggregation on the interaction graph. Each attention head in each layer has its own weight matrix, each molecule graph aggregates over the set of its neighbor molecule graphs in the interaction graph, and a nonlinearity is applied to the aggregated result. The attention coefficient between a molecule graph and one of its neighbors is computed from a learnable attention weight vector applied to the concatenation of their representations.
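The attention-based aggregation on the interaction graph can be sketched as below. This is a minimal sketch that follows the general recipe of the graph attention network [velivckovic2017graph]; the LeakyReLU slope, the averaging of heads, the `tanh` nonlinearity, and all names are assumptions rather than details confirmed by the text.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def gat_head(reps, neighbors, weight, att_vec):
    """One attention head: softmax-weighted sum of transformed neighbors."""
    proj = reps @ weight
    out = np.zeros_like(proj)
    for i, nbrs in neighbors.items():
        if not nbrs:
            continue  # isolated molecule: nothing to aggregate
        logits = np.array(
            [leaky_relu(att_vec @ np.concatenate([proj[i], proj[j]])) for j in nbrs]
        )
        alpha = np.exp(logits - logits.max())
        alpha /= alpha.sum()          # attention coefficients over the neighbors
        out[i] = alpha @ proj[nbrs]
    return out


def gat_layer(reps, neighbors, weights, att_vecs):
    """Average several heads, then apply a nonlinearity (assumed: tanh)."""
    heads = [gat_head(reps, neighbors, w, a) for w, a in zip(weights, att_vecs)]
    return np.tanh(np.mean(heads, axis=0))


# Three molecules; molecule 0 interacts with molecules 1 and 2.
reps = np.eye(3)
neighbors = {0: [1, 2], 1: [0], 2: [0]}
head_out = gat_head(reps, neighbors, np.eye(3), np.zeros(6))
layer_out = gat_layer(reps, neighbors, [np.eye(3)], [np.zeros(6)])
```

With a zero attention vector every neighbor receives the same coefficient, so the head reduces to a plain neighbor mean; a trained attention vector would instead weight frequent or significant interactions more heavily.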
Edge Aggregation Network. In the DDI graph, each edge has an attribute vector that is determined by the side effect type of the drug combination. To capture the edge attributes [schlichtkrull2018modeling], we propose an edge aggregation network that aggregates the neighbor information together with the edge attributes. An MLP layer with a linear transformation matrix transforms each edge attribute vector into a real number, so that GoGNN aggregates a node's neighbor information together with the edge attributes. Unlike Decagon [zitnik2018modeling], which uses side-effect-specific parameters, GoGNN shares the parameters across all side effect types in order to improve the robustness and generalization of the model.
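One way to read the edge aggregation described above is as a neighbor sum in which each message is scaled by a real number derived from the edge attribute. The sketch below follows that reading; the exact form of the aggregation, the ReLU activation, the absence of normalization, and the names `w_edge` and `w_node` are assumptions.

```python
import numpy as np


def edge_aggregate(reps, edges, edge_attrs, w_edge, w_node):
    """Aggregate neighbors, scaling each message by a scalar edge gate.

    reps:       (n, d) molecule representations
    edges:      list of undirected (i, j) pairs
    edge_attrs: (num_edges, d_e) side effect attribute vectors
    w_edge:     (d_e,) linear map from an edge attribute to a real number
    w_node:     (d, d_out) node transformation shared by all side effect types
    """
    proj = reps @ w_node
    gates = edge_attrs @ w_edge       # one real number per edge
    out = np.zeros_like(proj)
    for (i, j), gate in zip(edges, gates):
        out[i] += gate * proj[j]
        out[j] += gate * proj[i]
    return np.maximum(out, 0.0)


reps = np.eye(2)
edges = [(0, 1)]
edge_attrs = np.array([[1.0, 2.0]])
aggregated = edge_aggregate(reps, edges, edge_attrs, np.array([0.5, 0.25]), np.eye(2))
```

Because `w_edge` and `w_node` do not depend on the side effect type, the same parameters serve every type, which matches the parameter sharing stated in the text.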
4.4 GoGNN Model Training
We optimize the parameters with the task-specific loss functions.
Chemical Interaction Prediction. Since there is no edge attribute in the graph, we regard chemical interaction prediction as a link prediction problem. The link probability of two graphs is the dot product of their graph representations passed through an activation function such as the sigmoid, which ensures that the probability lies between 0 and 1. To encourage the model to assign higher probabilities to the observed edges than to random non-edges, we follow the previous study and estimate the model through negative sampling. For each positive edge between two molecule graphs, a random negative edge is sampled by replacing one endpoint with a randomly chosen molecule graph. We optimize the model with a cross-entropy loss function over the positive and negative edges.
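The link probability and the negative-sampling loss for CCI prediction can be sketched as follows. This is a sketch under assumptions: negatives are drawn uniformly over all molecules (so a sampled negative may occasionally coincide with a true edge), the loss is averaged over positive edges, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def link_probability(g_i, g_j):
    """Sigmoid of the dot product of two graph representations."""
    return sigmoid(np.sum(g_i * g_j, axis=-1))


def cci_loss(reps, pos_edges, rng, eps=1e-12):
    """Cross-entropy with one uniformly sampled negative per positive edge."""
    pos = np.asarray(pos_edges)
    neg_tail = rng.integers(0, reps.shape[0], size=len(pos))
    p_pos = link_probability(reps[pos[:, 0]], reps[pos[:, 1]])
    p_neg = link_probability(reps[pos[:, 0]], reps[neg_tail])
    return float(np.mean(-np.log(p_pos + eps) - np.log(1.0 - p_neg + eps)))


rng = np.random.default_rng(0)
untrained = np.zeros((4, 3))   # all-zero representations give probability 0.5
loss = cci_loss(untrained, [(0, 1), (2, 3)], rng)
```

With all-zero representations every probability is 0.5, so the loss equals 2 ln 2; training should push the positive probabilities toward 1 and the negative ones toward 0.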
Drug Interaction Prediction. The drug-drug interaction prediction task is regarded as a multirelational link prediction problem. Inspired by the loss design in [zitnik2018modeling], we train the parameters with a cross-entropy loss function in which a side-effect-specific weight applies a linear transformation to the graph representation with respect to the side effect type. Given an observed triplet of two molecule graphs and a side effect type, the negative sample is chosen by replacing one of the two molecule graphs with a randomly selected graph according to the sampling distribution of [mikolov2013distributed].
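A multirelational variant of the link-prediction loss can be sketched as below. The bilinear score with one weight matrix per side effect type is an assumption about how the side-effect-specific weight enters the probability, and uniform negative sampling is used here in place of the sampling distribution of [mikolov2013distributed]; all names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def ddi_probability(g_i, g_j, m_r):
    """Assumed bilinear score: transform g_i with the side-effect weight m_r."""
    return sigmoid(g_i @ m_r @ g_j)


def ddi_loss(reps, triplets, rel_weights, rng, eps=1e-12):
    """Cross-entropy over (i, r, j) triplets with one corrupted tail each."""
    total = 0.0
    for i, r, j in triplets:
        n = int(rng.integers(0, reps.shape[0]))   # uniform negative (assumption)
        p_pos = ddi_probability(reps[i], reps[j], rel_weights[r])
        p_neg = ddi_probability(reps[i], reps[n], rel_weights[r])
        total += -np.log(p_pos + eps) - np.log(1.0 - p_neg + eps)
    return float(total / len(triplets))


rng = np.random.default_rng(0)
rel_weights = {0: np.eye(2), 1: np.array([[0.0, 2.0], [0.0, 0.0]])}
prob = ddi_probability(np.array([1.0, 0.0]), np.array([0.0, 1.0]), rel_weights[1])
loss = ddi_loss(np.zeros((3, 2)), [(0, 0, 1), (1, 1, 2)], rel_weights, rng)
```

The same pair of drugs can receive different probabilities under different side effect types, because only `rel_weights[r]` changes between them.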
In this section, we present extensive experimental results that demonstrate the effectiveness and robustness of GoGNN.
To test the performance of our model on the chemical-chemical interaction and drug-drug interaction prediction tasks, the following datasets are chosen for the experiments:
CCI. The CCI dataset (http://stitch.embl.de/download/chemical_chemical.links.detailed.v5.0.tsv.gz) assigns a score from 0 to 999 to describe the interaction probability, where a higher score indicates a higher interaction probability. According to the threshold score, we obtain two datasets with chemical interaction probability scores over 900 and 950: CCI900 and CCI950. CCI900 has 14343 chemicals and 110078 chemical interaction edges, and CCI950 has 7606 chemicals and 34412 chemical interaction edges.
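The thresholding step that yields CCI900 and CCI950 can be sketched with the standard library. This is an illustrative sketch on toy rows: the column names `chemical1`, `chemical2` and `combined_score`, the use of `>=` for "over", and the identifiers in the sample are assumptions, not values taken from the dataset.

```python
import csv
import io


def filter_edges(tsv_text, threshold, score_column="combined_score"):
    """Keep the interaction edges whose score reaches the threshold."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    edges, chemicals = [], set()
    for row in reader:
        if int(row[score_column]) >= threshold:
            edge = (row["chemical1"], row["chemical2"])
            edges.append(edge)
            chemicals.update(edge)
    return chemicals, edges


# Toy rows with made-up identifiers and scores.
sample = (
    "chemical1\tchemical2\tcombined_score\n"
    "chemA\tchemB\t960\n"
    "chemA\tchemC\t905\n"
    "chemB\tchemC\t700\n"
)
chemicals_900, edges_900 = filter_edges(sample, 900)
chemicals_950, edges_950 = filter_edges(sample, 950)
```

Raising the threshold shrinks both the edge set and the set of chemicals that still have at least one edge, which is why CCI950 is smaller than CCI900 on both counts.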
DDI. For the drug-drug interaction prediction problem, the DDI dataset (https://www.pnas.org/content/suppl/2018/04/14/1803294115.DCSupplemental) and the side effect dataset SE (http://snap.stanford.edu/decagon) [zitnik2018modeling] are used. The DDI dataset, proposed by DeepDDI [ryu2018deep], contains 86 types of side effects, 1704 drugs and 191400 drug interaction edges. The SE dataset integrates the SIDER (Side Effect Resource), OFFSIDES and TWOSIDES databases. To facilitate the comparison, we use the preprocessed data used by Decagon [zitnik2018modeling]. The resulting SE dataset contains 645 drugs, 964 types of side effects and 4651131 drug-drug interaction edges. Each side effect type is assigned a vector representation produced by a pre-trained BERT model [devlin2018bert].
The molecules are transformed from SMILES strings [weininger1989smiles] into graphs by the open-source toolkit RDKit [landrum2013rdkit]. An initial feature vector is assigned to every atom. The edges in the molecule graphs are weighted by the type of the bonds.
The proposed GoGNN is compared with the following state-of-the-art models:
DeepCCI [kwon2017deepcci] is a CNN-based model for predicting the interactions between chemicals.
DeepDDI [ryu2018deep] is a model that designs a feature called the structural similarity profile (SSP) and combines it with a traditional MLP for DDI prediction.
Decagon [zitnik2018modeling] is a GCN model on drug and protein interaction graphs that predicts the polypharmacy side effects caused by drug combinations.
MR-GNN [xu2019mr] is an end-to-end graph neural network with a multi-resolution architecture that predicts the interaction between pairs of chemical graphs.
MLRDA [chu2019mlrda] is a multitask, semi-supervised model for DDI prediction.
SEAL [DBLP:conf/www/LiRCMHH19] is a neural network on hierarchical graphs for graph classification.
We use the public code of the baselines and keep the model settings the same as in the original papers. We reimplemented SEAL for CCI and DDI prediction.
To investigate how the graph of graphs architecture and dual-attention mechanism improve the performance of the proposed model, we conduct the ablation study on the following variants of GoGNN:
GoGNN-M is the variant which only learns the representations for the molecule-level graphs without the graph convolution on the interaction graph. An MLP layer is applied with the input of molecule-level graph representations for the graph interaction prediction task.
GoGNN-I only conducts graph convolution operation on the chemical interaction graphs. The initial molecule representations are the sum pooling of the atom representations within the molecule.
GoGNN-noPool replaces the self-attention pooling on the molecule graph by the concatenation of conventional mean pooling and sum pooling.
GoGNN-noAttn replaces the attention-based neural network on the interaction graph by a conventional GCN.
5.3 CCI Prediction Results
Settings. Following the previous study, we divide the CCI datasets into training and testing sets with ratio 9:1, and randomly choose 10% of the data for validation. The dimensions of the molecule graph hidden feature and the output molecule graph representation are set to 384 and 256, respectively. We set the learning rate to 0.01 and the pooling ratio to 0.5. To evaluate the performance, we choose the area under the ROC curve (AUC) and the average precision score (AP) as metrics.
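The two evaluation metrics can be computed directly from labels and predicted probabilities. The following is a small self-contained sketch of the usual definitions (AUC as the fraction of correctly ordered positive-negative pairs, AP as the mean precision at the rank of each positive); it is not the evaluation script used in the experiments, and the quadratic pairwise AUC is only suitable for small inputs.

```python
import numpy as np


def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    diff = scores[labels == 1][:, None] - scores[labels == 0][None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = np.asarray(labels)[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits == 1].mean())


labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
auc = roc_auc(labels, scores)            # 3 of 4 pairs ordered correctly
ap = average_precision(labels, scores)   # (1/1 + 2/3) / 2
```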
Results. As shown in Table 1, GoGNN outperforms all the other state-of-the-art baseline methods on the CCI prediction task. The improvement indicates that, compared with the methods that only train the parameters with pair-wise or individual chemical inputs, GoGNN can preserve more useful information on different scales through feature extraction and aggregation over the graph of graphs. The dual-attention mechanism also helps the model to learn higher-quality graph representations by identifying and preserving the importance of molecular substructures and chemical interactions.
5.4 DDI Prediction Results
Settings. To facilitate the comparison, we divide the DDI dataset into training, testing and validation sets with ratio 6:2:2, and divide the SE dataset with ratio 8:1:1. The dimensions of the molecule graph hidden feature and the output molecule graph representation are set to 384 and 256, respectively. We set the learning rate to 0.001, pooling ratio . We choose AUC and average precision (AP) for evaluation.
Notes to Table 2: a marked entry indicates that the result is the output of the baseline after two weeks' training. The DDI dataset has no protein data, which is required by Decagon.
Results. The experiment results for DDI prediction are listed in Table 2. Compared with the baseline methods, GoGNN improves the performance by a large margin: it improves the AUC and AP by 1.18% and 1.19% respectively on the DDI dataset, and by 6.65% and 11.42% respectively on the SE dataset. We attribute the improvement to the abundant information brought by the graph-of-graphs architecture and the edge-filtered aggregation.
5.5 Ablation Experiments
The ablation experiment results on both tasks are shown in Table 1 and Table 2. The results indicate that the graph-of-graphs architecture, the attention-based pooling, and the attention-based and edge-filtered aggregation are all effective for the interaction prediction tasks. Among all the variants, GoGNN-M and GoGNN-I show the largest performance gaps relative to the full GoGNN, which indicates that the graph-of-graphs view contributes the most to capturing the structural information that improves the prediction accuracy.
5.6 Parameter Sensitivity Analysis
In this experiment, we test the impact of the hyper-parameters of GoGNN.
Settings. We conduct the parameter sensitivity experiment on the CCI950 dataset by changing the tested hyper-parameter while keeping other settings the same as mentioned in Section 5.3. We test the following hyper-parameters: the dimensions of the output representation and hidden feature, learning rate and pooling ratio.
Results. As shown in Figure 3, overall, the impact of hyperparameter variation is insignificant. Figure 3(a) shows that GoGNN reaches the best performance with representation dimension 128. Figure 3(b) indicates that the salient point for the hidden feature size is 384. As for the learning rate and pooling ratio, the best point appears at and , respectively.
In this paper, we focus on structured entity interaction prediction. This task requires the model to capture both the structure of the entities and the interactions between them. However, previous works represent the entities with insufficient information. To address this limitation, we propose a novel model, GoGNN, which leverages a dual-attention mechanism in the graph-of-graphs view to capture the information from both the entity graphs and the entity interaction graph hierarchically. The experiments on real-life datasets demonstrate that our model improves the performance on the chemical-chemical interaction prediction and drug-drug interaction prediction tasks. GoGNN can be naturally extended to applications on other graphs of graphs, such as financial networks and electrical networks. We leave this extension for future work.