1 Introduction
A large variety of applications require understanding the interactions between structured entities. For example, when one medicine is taken together with another, each medicine's intended efficacy may be altered substantially (see Fig. 1). Understanding their interactions is important to minimize side effects and maximize synergistic benefits [Ryu et al.2018]. In chemistry, understanding what chemical reactions will occur between two chemicals is helpful in designing new materials with desired properties [Kwon and Yoon2017]. Despite its importance, examining all interactions through clinical or laboratory experiments is impractical due to the potential harm to patients and the high time and monetary costs.
Recently, machine learning methods have been proposed to address this problem, and they have been demonstrated to be effective in many tasks
[Duvenaud et al.2015, Li et al.2017, Tian et al.2016, Ryu et al.2018]. These methods use features extracted from entities to train a classifier to predict entity interactions. However, the features have to be carefully provided by domain experts
[Ryu et al.2018, Tian et al.2016], which is labor-intensive. To automate feature extraction, graph convolutional neural networks (GCNs) have been proposed
[Alex et al.2017, Kwon and Yoon2017, Zitnik et al.2018]. GCNs represent structured entities as graphs, and use graph convolution operators to extract features. One of the state-of-the-art GCN models, proposed by Alex et al. [Alex et al.2017], extracts features from the three-hop neighborhood of each node. We thus say that their model uses a fixed-sized receptive field (RF). However, using a fixed-sized RF to extract features may have limitations, which can be illustrated by the following example.
Example 1.
Figure 2 shows two weak acids, i.e., Hydroquinone and Acetic acid. They are weak acids due to the existence of the substructures phenolic hydroxyl (ArOH) and carboxyl (COOH), respectively. Representing these two chemical compounds as graphs, we need a three-hop neighborhood to accurately extract ArOH from Hydroquinone, and a two-hop neighborhood to accurately extract COOH from Acetic acid. Using a fixed-sized neighborhood instead results in either incomplete substructures being extracted (i.e., the RF is too small) or useless substructures being included (i.e., the RF is too large).
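For concreteness, the k-hop neighborhood that a stack of k graph convolution layers effectively sees can be enumerated by a bounded breadth-first search. A minimal Python sketch (the adjacency-dict representation and the function name are illustrative, not from the paper):

```python
from collections import deque

def k_hop_neighborhood(adj, start, k):
    """Return the set of nodes reachable from `start` within k hops,
    i.e. the k-hop receptive field of a k-layer graph convolution.

    adj: dict mapping each node to a list of its neighbors.
    """
    seen = {start: 0}          # node -> hop distance from start
    q = deque([start])
    while q:
        v = q.popleft()
        if seen[v] == k:       # do not expand beyond k hops
            continue
        for u in adj[v]:
            if u not in seen:
                seen[u] = seen[v] + 1
                q.append(u)
    return set(seen)
```

On a path graph 0-1-2-3-4, the one-hop field of node 2 is {1, 2, 3}, matching the intuition that a too-small RF misses atoms of a larger substructure.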
Another limitation of existing GCNs is that they learn each graph's representation independently and model the interactions only in the final prediction process. However, for different entities, the interaction also occurs between substructures of different sizes. Take Fig. 2 as an example again: when these two weak acids are neutralized with the same strong base, the interaction can be accurately modeled by features of the second convolution layer for Acetic acid, because the key substructure COOH can be accurately extracted there. But for Hydroquinone, the best choice is to model the interaction by features of the third convolution layer. Thus, modeling the interactions only in the final process may introduce considerable noise into the prediction.
To address these limitations, this work presents a novel GCN model named Multi-Resolution RF based Graph Neural Network (MR-GNN), which leverages different-sized local features and models interactions during the procedure of feature extraction to predict structured entity interactions.
Overview of our approach.
MR-GNN uses a multi-resolution RF, which consists of multiple graph convolution layers with different RFs, to extract local structure features effectively (see Fig. 2). When aggregating these multi-resolution local features, MR-GNN uses two key dual graph-state LSTMs. One is the Summary-LSTM (S-LSTM), which aggregates the multi-resolution local features of each graph. Compared with the straightforward method that simply sums all multi-resolution features up, the S-LSTM learns additional effective features by modeling the diffusion process of node information in graphs, which greatly enriches the graph representation. The other is the Interaction-LSTM (I-LSTM), which extracts interaction features between pairwise graphs during the procedure of feature extraction.
Our contributions are as follows:

In MR-GNN, we design a multi-resolution based architecture that mines features from multi-scale substructures to predict graph interactions. It is more effective than considering only fixed-sized RFs.

We develop two dual graph-state LSTMs: one summarizes subgraph features of multi-sized RFs while modeling the diffusion process of node information, and the other extracts interaction features for pairwise graphs during feature extraction.

Experimental results on two benchmark datasets show that MR-GNN outperforms the state-of-the-art methods.
2 Problem Definition
Notations. We denote a structured entity by a graph G = (V, E), where V is the node set and E is the edge set. Each node v ∈ V is associated with a d-dimensional feature vector x_v. The feature vectors can be low-dimensional latent representations/embeddings of nodes, or explicit features that intuitively reflect node attributes. Meanwhile, let N(v) denote v's neighbors, and d(v) denote v's degree.
Entity Interaction Prediction. Let Y denote a set of interaction labels between two entities. The entity interaction prediction task is formulated as a supervised learning problem: given a training dataset D = {((G_x^i, G_y^i), y_i)}, where (G_x^i, G_y^i) is an input entity pair and y_i ∈ Y is the corresponding interaction label, we want to accurately predict the interaction label of an unseen entity pair.
3 Method
In this section, we propose a graph neural network, MR-GNN, to address the entity interaction prediction problem.
3.1 Overview
Figure 3 depicts the architecture of MR-GNN, which mainly consists of three parts: 1) multiple weighted graph convolution layers, which extract structure features from receptive fields of different sizes; 2) dual graph-state LSTMs, which summarize multi-resolution structure features and extract interaction features; and 3) fully connected layers, which predict the entity interaction labels.
3.2 Weighted graph convolution layers
Before introducing the motivation and design of our weighted graph convolution operator in detail, we first review the standard graph convolution operator.
Standard Graph Convolution Operator. Inspired by the convolution operator on images, for a specific node in a graph, the general spatial graph convolution [Duvenaud et al.2015] aggregates the features of a node as well as its one-hop neighbors' features as the node's new features. Taking the node v as an example, the formula is:

h_v^{(l+1)} = σ( W^{(l)} ( h_v^{(l)} + Σ_{u∈N(v)} h_u^{(l)} ) ),   (1)

where h_v^{(l)} denotes the feature vector of v in the l-th graph convolution layer, W^{(l)} is the weight matrix shared by the center node and its neighbors, and σ is the tanh activation function. Note that h_v^{(0)} = x_v.
Because the output graph of each graph convolution layer is exactly the same as the input graph, MR-GNN can conveniently learn structural characteristics of different resolutions through different iterations of the graph convolution layer. Take the node A in Fig. 3 as an example: after three iterations of the graph convolution layer, the receptive field in the third graph convolution layer is a three-hop neighborhood centered on it.
However, since graphs are not regular grids like images, it is difficult for the existing graph convolution operator to distinguish weights by spatial position the way the convolution operator on grid-like data does; e.g., in image processing, the right neighbor and the left neighbor of a pixel can be treated with different weights in each convolution kernel. Inspired by the fact that the degree of a node can well reflect its importance in a network for many applications, we modify the graph convolution operator by adding weights according to the node degree d(v). (Other metrics such as betweenness centrality can also work well; in this paper we choose the node degree for its simplicity of calculation.) Furthermore, Sukhbaatar et al. [Sukhbaatar et al.2016] treat different agents with different weights in order to distinguish the features of the original node from the features of neighboring nodes. We likewise treat each node and its neighbors with different weight matrices, W_c and W_n. Our improved weighted graph convolution is as follows:
h_v^{(l+1)} = σ( α_{d(v)} W_c^{(l)} h_v^{(l)} + Σ_{u∈N(v)} α_{d(u)} W_n^{(l)} h_u^{(l)} + b^{(l)} ),   (2)

where α_{d(v)} denotes the weight of a node v with degree d(v), n_l denotes the dimension of the feature vectors in the l-th graph convolution layer, and b^{(l)} is a bias. We let h_v^{(0)} = x_v.
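The weighted convolution above can be sketched in a few lines of NumPy on a dense adjacency matrix. This is a minimal sketch under stated assumptions: the function name, the dense representation, and the particular choice of degree weights α are illustrative, not the paper's implementation.

```python
import numpy as np

def weighted_graph_conv(H, adj, Wc, Wn, b, alpha):
    """One weighted graph convolution step (a sketch of Eq. 2).

    H     : (n, d_in)     node features at the current layer
    adj   : (n, n)        binary adjacency matrix, no self-loops
    Wc/Wn : (d_in, d_out) weight matrices for center node / neighbors
    b     : (d_out,)      bias
    alpha : (n,)          per-node weights derived from node degree
    """
    center = (alpha[:, None] * H) @ Wc         # alpha_{d(v)} Wc h_v
    neigh = adj @ (alpha[:, None] * H) @ Wn    # sum_u alpha_{d(u)} Wn h_u
    return np.tanh(center + neigh + b)         # tanh activation
```

With alpha set to all ones and Wc = Wn, this reduces to the standard operator of Eq. (1).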
After each convolution operation, similar to the classical CNN, we use a graph pooling operation to summarize the information within neighborhoods (i.e., a center node and its neighbors). For a specific node, the graph pooling operation [AltaeTran et al.2017] returns a new feature vector in which each element is the maximum activation of the corresponding element over the node's one-hop neighborhood. We denote this operation by the following formula and obtain the feature vectors of the next layer:
h̃_v^{(l)} = max_{u ∈ N(v) ∪ {v}} h_u^{(l)},   (3)

where the maximum is taken element-wise.
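The neighborhood max-pooling of Eq. (3) amounts to an element-wise maximum over each node's closed one-hop neighborhood. A small NumPy sketch (naming is illustrative):

```python
import numpy as np

def graph_max_pool(H, adj):
    """Element-wise max over each node's closed one-hop neighborhood.

    H   : (n, d) node features
    adj : (n, n) binary adjacency matrix, no self-loops
    """
    n = H.shape[0]
    closed = adj.astype(bool) | np.eye(n, dtype=bool)  # N(v) ∪ {v}
    # per-feature maximum over each node's neighborhood rows
    return np.stack([H[closed[v]].max(axis=0) for v in range(n)])
```

On a path graph 0-1-2 with features [1], [0], [3], node 1 pools over all three nodes while the end nodes pool over two.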
3.3 Graph-gather layers
Graph interaction prediction is a graph-level problem rather than a node-level problem. To learn graph-level features of different-sized receptive fields, we aggregate the node representations of each convolution layer's graph into a graph-state by a graph-gather layer. Graph-gather layers compute a weighted sum of all node vectors in the connected graph convolution layers. The formula is:
g^{(l)} = Σ_{v ∈ V} β_{d(v)}^{(l)} h_v^{(l)} + b^{(l)},   (4)

where β_{d(v)}^{(l)} is the graph-gather weight of nodes with degree d(v) in the l-th graph convolution layer, g^{(l)} is the graph-state vector of the l-th convolution layer, m denotes the dimension of graph-states, |V| is the number of nodes in the graph, and b^{(l)} is a bias. Specially, the first graph-state g^{(0)} only includes all individual nodes' information.
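The graph-gather of Eq. (4) is a degree-dependent weighted sum of node vectors. A minimal sketch, assuming the gather weights are stored as a lookup table indexed by degree (the names and the table representation are assumptions):

```python
import numpy as np

def graph_gather(H, degrees, beta, b):
    """Collapse node features into one graph-state vector.

    H       : (n, d) node features of one convolution layer
    degrees : (n,)   integer degree of each node
    beta    : (D,)   gather weight for each possible degree value
    b       : (d,)   bias
    """
    w = beta[degrees]      # per-node scalar weight, looked up by degree
    return w @ H + b       # weighted sum over nodes -> (d,) graph-state
```

One such graph-state is produced per convolution layer, giving the sequence g^{(0)}, g^{(1)}, ... consumed by the LSTMs below.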
3.4 Dual graph-state LSTMs
To solve graph-level tasks, existing graph convolution network (GCN) methods [AltaeTran et al.2017] generally choose the graph-state of the last convolution layer, which has the largest receptive field, as the input for subsequent prediction. But such a state may lose many important features.
In CNNs on images, there are multiple convolution kernels for extracting different features in each convolution layer, which ensures that the hidden representation of the final convolution layer can fully learn the features of the input images. However, a GCN is equivalent to a CNN with only one kernel in each layer. It is difficult for the output of the final graph convolution layer to fully learn all features in the large receptive field, especially structure features of small receptive fields. The straightforward remedy is to design multiple graph convolution kernels and aggregate their outputs; however, this is computationally expensive.
To solve the above problem, we propose a multi-resolution based architecture in our model, in which the graph-state of each graph convolution layer is leveraged to learn the final representation. We propose a Summary-LSTM (S-LSTM) to aggregate the graph-states of different-sized receptive fields so as to learn the final features comprehensively. Instead of the straightforward method that directly sums all graph-states up, the S-LSTM models the node information diffusion process of graphs by sequentially receiving the graph-states with receptive fields from small to large as inputs. It is inspired by the idea that a representation that encapsulates graph diffusion can provide a better basis for prediction than the graph itself. The formula of the S-LSTM is:
(h_s^{(l)}, c_s^{(l)}) = S-LSTM(g^{(l)}, h_s^{(l-1)}, c_s^{(l-1)}),   (5)

where h_s^{(l)} is the hidden vector of the S-LSTM. To further enhance the global information of graphs, we concatenate the final hidden output h_s^{(L)} of the S-LSTM and the output of the global graph pooling layer as the final graph-state of the input graph:

G = [h_s^{(L)} ; g_pool],   (6)

where g_pool is the result of global graph pooling on the final graph convolution layer.
In addition, to extract the interaction features of pairwise graphs, we propose an Interaction-LSTM (I-LSTM) which takes the concatenation of the dual graph-states as input:
(h_i^{(l)}, c_i^{(l)}) = I-LSTM([g_x^{(l)} ; g_y^{(l)}], h_i^{(l-1)}, c_i^{(l-1)}),   (7)

where h_i^{(l)} is the hidden vector of the I-LSTM. We initialize the hidden and cell vectors h^{(0)} and c^{(0)} as all-zero vectors, and the S-LSTM is shared by both input graphs.
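The dual LSTMs can be sketched with a hand-rolled LSTM cell in NumPy. Everything below (gate ordering, names, the way graph-states are fed in) is an illustrative assumption; in practice one would use a framework LSTM. The S-LSTM consumes the per-layer graph-states g^{(0)}, g^{(1)}, ... in order of growing receptive field; the I-LSTM is the same kind of cell fed the concatenation [g_x^{(l)} ; g_y^{(l)}] at each step.

```python
import numpy as np

def lstm_cell(x, h, c, W, U, b):
    """A plain LSTM cell; gates stacked as [i, f, o, g] in W, U, b."""
    d = h.size
    z = W @ x + U @ h + b
    i = 1 / (1 + np.exp(-z[:d]))        # input gate
    f = 1 / (1 + np.exp(-z[d:2*d]))     # forget gate
    o = 1 / (1 + np.exp(-z[2*d:3*d]))   # output gate
    g = np.tanh(z[3*d:])                # candidate state
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new

def run_s_lstm(graph_states, W, U, b, d):
    """S-LSTM sketch: feed graph-states from small to large RF (Eq. 5)."""
    h, c = np.zeros(d), np.zeros(d)
    for g in graph_states:              # g^{(0)}, g^{(1)}, ...
        h, c = lstm_cell(g, h, c, W, U, b)
    return h                            # final summary hidden state
```

The returned hidden state plays the role of h_s^{(L)} in Eq. (6); concatenating the two graphs' states before each cell step yields the I-LSTM instead.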
3.5 Fully connected layers
For the interaction prediction, we simply concatenate the final graph representations and interaction features of the input graphs (i.e., G_x, G_y, and the final I-LSTM hidden vector h_i) and use fully connected layers for prediction. Formally, we have:
z = σ_r( W_1 [G_x ; G_y ; h_i] + b_1 ),   (8)

ŷ = softmax( W_2 z + b_2 ),   (9)

where W_1 and W_2 are the trainable weight matrices of the two linear operations, h is the dimension of the hidden vector z, and K is the number of interaction labels. The activation function σ_r is a rectified linear unit (ReLU), i.e., σ_r(x) = max(0, x). ŷ is the output of the softmax function, whose k-th element is computed as ŷ_k = exp(o_k) / Σ_j exp(o_j) with o = W_2 z + b_2. At last, we choose the cross-entropy function as the loss function, that is:
L = − Σ_{k=1}^{K} y_k log ŷ_k,   (10)

where y is the ground-truth (one-hot) label vector.
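Equations (8)-(10) form a standard classification head. A NumPy sketch under stated assumptions (weight shapes and names are illustrative):

```python
import numpy as np

def predict_and_loss(gx, gy, hi, W1, b1, W2, b2, y):
    """Concatenate graph and interaction features, apply a ReLU layer,
    a softmax over interaction labels, and cross-entropy against one-hot y."""
    z = np.concatenate([gx, gy, hi])            # [G_x ; G_y ; h_i]
    hidden = np.maximum(0.0, W1 @ z + b1)       # Eq. 8: ReLU layer
    logits = W2 @ hidden + b2
    e = np.exp(logits - logits.max())           # numerically stable softmax
    p = e / e.sum()                             # Eq. 9: label distribution
    loss = -np.sum(y * np.log(p + 1e-12))       # Eq. 10: cross-entropy
    return p, loss
```

Subtracting the maximum logit before exponentiating does not change the softmax output but avoids overflow for large logits.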
4 Experiment
In this section, we conduct experiments to validate our method (code available at https://github.com/prometheusXN/MRGNN). We consider two prediction tasks: 1) predicting whether there is an interaction between two chemicals (i.e., binary classification), and 2) predicting the interaction label between two drugs (i.e., multi-class classification).
4.1 Dataset
CCI Dataset. For the binary classification task, we use the CCI dataset (http://stitch.embl.de/download/chemical_chemical.links.detailed.v5.0.tsv.gz). This dataset uses a score to describe the interaction level between two compounds: the higher the score, the larger the probability that the interaction occurs. Using the threshold scores 900, 800, and 700, we obtain the positive samples of three datasets: CCI900, CCI800, and CCI700. As for negative samples, we choose the chemical pairs whose scores indicate no interaction. For each pair of chemicals, we assign a label "1" or "0" to indicate whether an interaction occurs between them. We use a publicly available API, DeepChem (https://deepchem.io/), to convert compounds to graphs, in which each node has a 75-dimensional feature vector.
DDI Dataset. For the multi-class classification task, we use the DDI dataset (http://www.pnas.org/content/suppl/2018/04/14/1803294115.DCSupplemental). This dataset contains a set of interaction labels, and each drug is represented by a SMILES string [Weininger1988]. In our preprocessing, we remove the data items that cannot be converted into graphs from SMILES strings.
Dataset  Graph Meaning  #Graphs  #Pairs 

CCI900  Chemical Compounds  11990  19624 
CCI800  Chemical Compounds  73602  151796 
CCI700  Chemical Compounds  114734  343277 
DDI  Drug Molecule Graphs  1704  191400 
4.2 Baselines
CCI900  CCI800  CCI700  

AUC  accuracy  recall  F1  AUC  accuracy  recall  F1  AUC  accuracy  recall  F1  
PIP  
SNR  
DGCNN  
DeepDDI  
DeepCCI  
MRGNN 
We compare our method with the following stateoftheart models:

DeepCCI [Kwon and Yoon2017] is one of the state-of-the-art methods on the CCI datasets. It represents the SMILES strings of chemicals as one-hot vector matrices and uses a classical CNN to predict interaction labels.

DeepDDI [Ryu et al.2018] is one of the state-of-the-art methods on the DDI dataset. DeepDDI designs a feature called the structural similarity profile (SSP), combined with a multilayer perceptron (MLP), for prediction.

PIP [Alex et al.2017] is proposed to predict protein interfaces. It extracts features from the fixed three-hop neighborhood of each node to learn a node representation. In this paper, when building this model, we use our graph-gather layer to aggregate node representations into the graph representation.

DGCNN [Zhang et al.2018a] uses the standard graph convolution operator as described in Section 3. It concatenates the node vectors of each graph convolution layer and applies a CNN with a node-ordering scheme to generate a graph representation.

SNR [Li et al.2017] uses a graph convolution layer similar to ours. The difference is that this work introduces an additional node that sums all node features up into a graph representation.
4.3 Binary classification
Settings. We divide each CCI dataset into a training dataset and a testing dataset, and randomly choose part of the training dataset as a validation dataset. The three graph convolution layers each have their own number of output units, and we set the output units of the graph-gather layers to the same number as the LSTM layers. The fully connected layer is followed by a softmax layer as the output layer, and the learning rate is fixed. To evaluate the experimental results, we choose four metrics: area under the ROC curve (AUC), accuracy, recall, and F1.
Results. Table 2 shows the performance of the different methods. MR-GNN performs the best in terms of all of the evaluation metrics, improving accuracy, F1, recall, and AUC over the state-of-the-art method DeepCCI. As for the comparatively small improvement in AUC, we ascribe it to the fact that the base value is already so high that little room for improvement remains; measured against that remaining room, the AUC improvement is substantial. The performance improvement shows that the feature extraction of MR-GNN, which represents structured entities as graphs, is more effective than that of DeepCCI, which treats the SMILES string as a character sequence without considering the topological information of structured entities. Compared with PIP, the performance of MR-GNN demonstrates that the multi-resolution based architecture is more effective than the fixed-sized RF based framework. In addition, compared with SNR, which directly sums all node features to get the graph representation, the experimental results show that our S-LSTM summarizes the local features more effectively and more comprehensively. We attribute this improvement to the diffusion process and the interaction that our graph-state LSTMs model during the procedure of feature extraction, which is effective for the prediction.
4.4 Multi-class classification
Settings. To make a direct comparison with DeepDDI, we split the dataset into training, validation, and testing sets in the same proportions as that work. All hyperparameter selections are the same as in the binary classification task. To evaluate the experimental results, we choose five metrics for the multi-class classification problem: AUPRC, Micro average, Macro recall, Macro precision, and Macro F1. (In particular, we choose the AUPRC metric due to the imbalance of the DDI dataset.) We show the results on the DDI dataset in Table 3.
Mi_avg  Ma_recall  Ma_pre  Ma_F1  AUPRC  

PIP  
SNR  
DGCNN  
DeepCCI  
DeepDDI  
MRGNN  
no ILSTM  
no SLSTM  
no wGCL  
no LSTMs 
Results. We observe that MR-GNN performs the best in terms of all five evaluation metrics. Compared with the state-of-the-art method DeepDDI, the performance improvement of MR-GNN is attributed to the higher-quality representations learned by end-to-end training instead of the human-designed representation called SSP. In addition, we also conduct experiments on the CCI and DDI datasets, and we observe that MR-GNN indeed improves performance.
Ablation experiment. We also conducted ablation experiments on the DDI dataset to study the effects of three components in our model (namely the S-LSTM, the I-LSTM, and the weighted GCLs). We find that each of these three components improves performance. Among them, the weighted GCLs contribute most significantly, followed by the S-LSTM and the I-LSTM.
4.5 Efficiency and robustness
In the third experiment, we analyze the efficiency and robustness of MR-GNN.
Effects of training dataset size. We carried out a comparative experiment with training datasets of different sizes on the CCI900 dataset. In each comparative experiment, we kept the same portion of the dataset as the test dataset to evaluate the performance of all six methods. Figure 4(a) shows that MR-GNN always performs the best under the different training dataset sizes. In particular, as the training dataset proportion increases, the improvement of MR-GNN increases significantly, demonstrating that MR-GNN has better robustness. This is because MR-GNN is good at learning subgraph information of different-sized receptive fields, especially subgraphs of small receptive fields that often appear in various graphs.
Training efficiency. Figure 4(b) shows that the training time of MR-GNN is at a moderate level among all methods. Although the graph-state LSTMs take additional time, the training of MR-GNN is still fast and acceptable.
Effects of hyperparameter variation. In this experiment, we consider the impact of the hyperparameters of MR-GNN, including the numbers of output units of the GCLs and the LSTMs and the number of hidden units of the fully connected layer. The results are shown in Fig. 5. We see that the impact of hyperparameter variation is insignificant. Fig. 5(a) shows that more GCL output units provide better performance up to a salient point, and Fig. 5(b) shows a similar result for the LSTMs: the performance increases quickly at first and slightly declines past the salient point. The remaining hyperparameters likewise each have a clear best setting.
Result on CCI900: a) accuracy under different training set proportions; b) training time per epoch.
5 Related Work
Node-level Applications. Many neural network based methods have been proposed to solve node-level tasks such as node classification [Henaff et al.2015, Li et al.2015, Defferrard et al.2016, Kipf and Welling2016, Velic̈kovic et al.2018] and link prediction [Zhang and Chen2018, Zhang et al.2018b]. They rely on node embedding techniques, including skip-gram based methods like DeepWalk [Perozzi et al.2014] and LINE [Tang et al.2015], autoencoder based methods like SDNE [Wang et al.2016], and neighbor aggregation based methods like GCN [Defferrard et al.2016, Thomas and Welling2017] and GraphSAGE [Hamilton et al.2017a].
Single Graph Based Applications. Attention has also been paid to graph-level tasks. Most existing works focus on classifying graphs and predicting graphs' properties [Duvenaud et al.2015, Atwood and Towsley2016, Li et al.2017, Zhang et al.2018a], and they compute one embedding per graph. To learn graph representations, the most straightforward way is to aggregate node embeddings, including average-based methods (simple average and weighted average) [Li et al.2017, Duvenaud et al.2015, Zhao et al.2018], sum-based methods [Hamilton et al.2017b], and more sophisticated schemes such as aggregating nodes via histograms [Kearnes et al.2016] or learning a node ordering to make graphs suitable for CNNs [Zhang et al.2018a].
Pairwise Graph Based Applications. To date, few neural network based works address pairwise graph based tasks, whose input is a pair of graphs. Those that do mostly focus on learning a "similarity" relation between graphs [Bai et al.2018, Yanardag and Vishwanathan2015] or links between nodes across graphs [Alex et al.2017]. In this work, we study the prediction of universal graph interactions.
6 Conclusion
In this paper, we propose a novel graph neural network, MR-GNN, to predict the interactions between structured entities. MR-GNN can learn comprehensive and effective features by leveraging a multi-resolution architecture. We empirically analyze the performance of MR-GNN on different interaction prediction tasks, and the results demonstrate the effectiveness of our model. Moreover, MR-GNN can easily be extended to large graphs by assigning node weights to node groups based on the distribution of node degrees. In the future, we will apply it to other domains.
Acknowledgments
The research presented in this paper is supported in part by the National Key R&D Program of China (2018YFC0830500), the National Natural Science Foundation of China (U1736205, 61603290), the Shenzhen Basic Research Grant (JCYJ20170816100819428), the Natural Science Basic Research Plan in Shaanxi Province of China (2019JM159), and the Natural Science Basic Research Plan in Zhejiang Province of China (LGG18F020016).
References
 [Alex et al.2017] Alex Fout, Jonathon Byrd, Basir Shariat, and Asa Ben-Hur. Protein interface prediction using graph convolutional networks. In NIPS, pages 6530–6539, 2017.
 [AltaeTran et al.2017] Han AltaeTran, Bharath Ramsundar, Aneesh S Pappu, and Vijay Pande. Low data drug discovery with oneshot learning. ACS CENTRAL SCI, 3(4):283–293, 2017.
 [Atwood and Towsley2016] James Atwood and Don Towsley. Diffusion-convolutional neural networks. In NIPS, pages 1993–2001, 2016.
 [Bai et al.2018] Yunsheng Bai, Hao Ding, Song Bian, Ting Chen, Yizhou Sun, and Wei Wang. Graph edit distance computation via graph neural networks. arXiv:1808.05689, 2018.
 [Defferrard et al.2016] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS, pages 3844–3852, 2016.
 [Duvenaud et al.2015] David K Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P Adams. Convolutional networks on graphs for learning molecular fingerprints. In NIPS, pages 2224–2232, 2015.
 [Hamilton et al.2017a] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In NIPS, pages 1024–1034, 2017.
 [Hamilton et al.2017b] William L Hamilton, Rex Ying, and Jure Leskovec. Representation learning on graphs: Methods and applications. IEEE Data Engineering Bulletin, 2017.
 [Henaff et al.2015] Mikael Henaff, Joan Bruna, and Yann LeCun. Deep convolutional networks on graphstructured data. arXiv:1506.05163, 2015.
 [Kearnes et al.2016] Steven Kearnes, Kevin McCloskey, Marc Berndl, Vijay Pande, and Patrick Riley. Molecular graph convolutions: moving beyond fingerprints. J COMPUT AID MOL DES, 30(8):1–14, 2016.
 [Kipf and Welling2016] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv:1609.02907, 2016.
 [Kwon and Yoon2017] Sunyoung Kwon and Sungroh Yoon. DeepCCI: End-to-end deep learning for chemical-chemical interaction prediction. arXiv:1704.08432, 2017.
 [Li et al.2015] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gated graph sequence neural networks. arXiv:1511.05493, 2015.
 [Li et al.2017] Junying Li, Deng Cai, and Xiaofei He. Learning graphlevel representation for drug discovery. arXiv:1709.03741, 2017.
 [Perozzi et al.2014] Bryan Perozzi, Rami AlRfou, and Steven Skiena. Deepwalk: Online learning of social representations. In SIGKDD, pages 701–710, 2014.
 [Ryu et al.2018] J. Y. Ryu, H. U. Kim, and S. Y. Lee. Deep learning improves prediction of drugdrug and drugfood interactions. PNAS, 115(18):E4304, 2018.

 [Sukhbaatar et al.2016] Sainbayar Sukhbaatar, Rob Fergus, et al. Learning multiagent communication with backpropagation. In NIPS, pages 2244–2252, 2016.
 [Tang et al.2015] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. Line: Large-scale information network embedding. In WWW, pages 1067–1077, 2015.
 [Thomas and Welling2017] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.
 [Tian et al.2016] K. Tian, M. Shao, Y. Wang, J. Guan, and S. Zhou. Boosting compoundprotein interaction prediction by deep learning. Methods, 110:64–72, 2016.
 [Velic̈kovic et al.2018] Petar Velic̈kovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. arXiv:1806.03536, 2018.
 [Wang et al.2016] Daixin Wang, Peng Cui, and Wenwu Zhu. Structural deep network embedding. In SIGKDD, pages 1225–1234, 2016.
 [Weininger1988] David Weininger. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci., 28(1):31–36, 1988.
 [Yanardag and Vishwanathan2015] Pinar Yanardag and S. V. N. Vishwanathan. Deep graph kernels. In SIGKDD, pages 1365–1374, 2015.
 [Zhang and Chen2018] Muhan Zhang and Yixin Chen. Link prediction based on graph neural networks. arXiv:1802.09691, 2018.
 [Zhang et al.2018a] Muhan Zhang, Zhicheng Cui, Marion Neumann, and Yixin Chen. An end-to-end deep learning architecture for graph classification. In AAAI, 2018.
 [Zhang et al.2018b] Yuhao Zhang, Peng Qi, and Christopher D Manning. Graph convolution over pruned dependency trees improves relation extraction. arXiv:1809.10185, 2018.
 [Zhao et al.2018] Xiaohan Zhao, Bo Zong, Ziyu Guan, Kai Zhang, and Wei Zhao. Substructure assembling network for graph classification. AAAI, 2018.
 [Zitnik et al.2018] Marinka Zitnik, Monica Agrawal, and Jure Leskovec. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics, 34(13):i457–i466, 2018.