
SHINE: SubHypergraph Inductive Neural nEtwork

Hypergraph neural networks can model multi-way connections among nodes of a graph, which are common in real-world applications such as genetic medicine. In particular, genetic pathways or gene sets encode molecular functions driven by multiple genes, naturally represented as hyperedges. Thus, hypergraph-guided embedding can capture functional relations in learned representations. Existing hypergraph neural network models often focus on node-level or graph-level inference. There is an unmet need for learning powerful representations of subgraphs of hypergraphs in real-world applications. For example, a cancer patient can be viewed as a subgraph of genes harboring mutations in the patient, while all the genes are connected by hyperedges that correspond to pathways representing specific molecular functions. For accurate inductive subgraph prediction, we propose SubHypergraph Inductive Neural nEtwork (SHINE). SHINE uses informative genetic pathways that encode molecular functions as hyperedges to connect genes as nodes. SHINE jointly optimizes the objectives of end-to-end subgraph classification and hypergraph nodes' similarity regularization. SHINE simultaneously learns representations for both genes and pathways using strongly dual attention message passing. The learned representations are aggregated via a subgraph attention layer and used to train a multilayer perceptron for inductive subgraph inferencing. We evaluated SHINE against a wide array of state-of-the-art (hyper)graph neural networks, XGBoost, NMF and polygenic risk score models, using large-scale NGS and curated datasets. SHINE outperformed all comparison models significantly and yielded interpretable disease models with functional insights.



1 Introduction

Hypergraph neural networks have recently emerged as a series of successful methods to model multi-way connections that go beyond pairwise associations among nodes of a graph. Multi-way connections are common in many real-world applications and, in particular, genetic medicine. From genetic medicine's perspective, pathways or, broadly speaking, gene sets encode the relationship among multiple genes that collectively correspond to a molecular function (Liberzon et al., 2015), which can be used in machine learning models to account for disease mechanisms more intuitively and accurately than individual genes (Luo and Mao, 2021). Genetic pathways or gene sets encode functional relations among multiple genes (see Appendix for detailed explanations), which can be naturally modeled as hyperedges connecting all involved nodes (e.g., genes). Thus, hypergraph-guided embedding can capture functional relations in learned representations.

Existing hypergraph neural network models often adopt the semi-supervised learning (SSL) paradigm to assign labels to initially unlabeled nodes in a hypergraph (Feng et al., 2019; Yadati et al., 2018; Yadati, 2020). Other methods have focused on learning graph representations (Zhang et al., 2020; Ding et al., 2020). Node-level and graph-level representations give either local or overarching views of a graph, i.e., the two extremes of hypergraph topological structure. There is an unmet need for learning powerful representations of subgraphs in hypergraphs. Such capabilities are important in real-world applications such as genetic medicine. For example, cancer patients can be viewed as subgraphs of genes that harbor mutations, while all the genes are connected by hyperedges that correspond to pathways or gene sets representing specific molecular functions. Powerful subgraph representations will enable more accurate accounting of a patient's pathophysiology. For regular graphs where edges connect node pairs, several subgraph representation learning algorithms have been proposed, including methods that use the learned representations to make predictions for subgraphs of fixed size (Meng et al., 2018) or varying sizes (Alsentzer et al., 2020). There is currently little if any work on inductive inference for varying-sized subhypergraphs. In this work, we propose a new framework named SHINE: SubHypergraph Inductive Neural nEtwork. We share our source code at https://github.com/luoyuanlab/SHINE. Our contributions are as follows:

  • To the best of our knowledge, SHINE is the first model to effectively learn subgraph representations for hypergraphs, use the learned representations (for seen subgraphs) and inductively infer representations (for unseen subgraphs) for downstream subgraph predictions.

  • Novel applications in the field of genetic medicine on Next Generation Sequencing (NGS) datasets across diverse diseases show significant performance improvements by SHINE over a wide array of state-of-the-art baselines.

  • In addition to learning and inductively inferring subgraph representations, SHINE simultaneously learns the representations of nodes and hyperedges. This brings interpretation advantages, allowing assessment of pathway (hyperedge) correlations and reasoning about multiple molecular functions that interact and collectively contribute to disease onset and progression.

2 Related Work

Graph Neural Networks.

Graph representation learning maps graphs or their components to vector representations and has attracted growing attention over the past decade. Recently, graph neural networks (GNNs), which can learn a distributed representation for a graph or a node in a graph, have been widely applied to a variety of areas including computer vision and image processing (Wei et al., 2020; Mao et al., 2022), molecular structure inference (Duvenaud et al., 2015; Gilmer et al., 2017), natural language processing (Yao et al., 2019; Peng et al., 2018; Li et al., 2019), and healthcare (Zitnik et al., 2018; Mao et al., 2019). A GNN recursively updates the representation of a node in a graph by aggregating the feature vectors of its neighbors and itself, e.g., (Kipf and Welling, 2016). Graph-level representations can then be obtained through set pooling (e.g., (Vinyals et al., 2015)) or graph coarsening (e.g., (Ying et al., 2018)) to aggregate the node representations in the graph. The reader is referred to a comprehensive book (Hamilton, 2020) on the topic of graph neural networks.

Hypergraph neural network. Hypergraph neural networks (Zhang et al., 2020; Yadati et al., 2018; Feng et al., 2019; Ding et al., 2020) have become a popular approach for learning on multi-way relations from data. Early work on hypergraph learning, e.g., (Zhou et al., 2006), formulated hypergraph message passing using the spectral theory of hypergraphs. This formulation and its variants (Feng et al., 2019; Yadati et al., 2018; Jin et al., 2019; Jiang et al., 2019; Satchidanand et al., 2015; Feng et al., 2018) essentially adopted clique expansion to extend graph convolutional networks (GCNs) to hypergraph learning. Other methods applied attention mechanisms to aggregate information across the hypergraph (Zhang et al., 2020; Ding et al., 2020) or directly learned node representations that preserve the proximity of nodes sharing a hyperedge or having similar neighborhoods (Tu et al., 2018). In both formulations, messages are passed to the node of interest from its immediate neighbors, and added layers allow propagation of messages to a farther neighborhood. A very recent model implements hypergraph neural network layers in a generalized way as compositions of two multiset functions that are approximated by neural networks (Chien et al., 2021).

Subgraph representation learning and prediction. Recent studies on subgraph embedding and prediction start with learning representations of small subgraphs. Meng et al. (2018) encoded small fixed-sized subgraphs for subgraph evolution prediction. SubGNN (Alsentzer et al., 2020) learned representations for varying-sized subgraphs through neighborhood, position and structure channels using random patches distributed throughout the graph. Huang and Zitnik (2020) and Sun et al. (2021) learned subgraph representations by pooling local structures to aid predictions for entire graphs. Note that modeling hyperedges as another type of node turns a hypergraph into a bipartite (and heterogeneous) graph, making SubGNN a potential baseline for subhypergraph inferencing. On the other hand, most existing general heterogeneous graph neural network models do not support subgraph inferencing (Wang et al., 2019; Zhang et al., 2019; Fu et al., 2019, 2020; Hu et al., 2020).

The intersection of hypergraph neural networks and subgraph representation learning is currently underexplored. While the above methods focus on either hypergraph learning or subgraph learning, none of them consider subgraph prediction for hypergraphs. Technically, a subgraph can be viewed as a hyperedge, and studies on link prediction could predict the existence of such a hyperedge (Zhang et al., 2018). However, few if any such studies addressed the problem of differentiating the classes of the subgraphs, which is especially important in genetic medicine, where subgraphs and hyperedges have different real-world meanings. For example, a hyperedge corresponds to a genetic pathway from curated knowledge, while a subgraph corresponds to a patient with mutated genes as its nodes.

In sum, there is a major unmet need for varying-sized subgraph inference on hypergraphs, and even more so in the inductive learning setting. Our proposed SHINE is an end-to-end framework that operates on hypergraphs and performs inductive subgraph inferencing.

3 Methods

We first outline the workflow of SHINE; see Table 1 for symbol definitions. We develop a strongly dual attention message passing algorithm to propagate information between nodes and hyperedges, and across layers. We develop a weighted subgraph attention mechanism to learn the subgraph representation by integrating the representations of its hypergraph nodes. We next explain each step.

Symbol Definition
G = (V, E) An undirected hypergraph
V, n Set of hypergraph nodes and its size
E, m Set of hypergraph hyperedges and its size
H Hyperedge incidence matrix
d Hidden layer size
N Number of subgraphs
∘ Operation composition
* Element-wise multiplication
Table 1: Common notations used throughout the paper.

3.1 Collecting Genetic Pathways

We use the Molecular Signatures Database (MSigDB) (Liberzon et al., 2015) and focus on MSigDB's curated pathway (gene set) collection, which contains human gene sets that are canonical representations of biological processes compiled by domain experts. There are 21,587 genes in MSigDB Pathways. Pathways may overlap with one another; the collection has been filtered by MSigDB to remove inter-set redundancy. Genes with unknown functions are not included in the pathways and not used for classification, as our focus here is on interpretable modeling through inference with known molecular functions. Adding genes with unknown functions to study their impact will be future work.
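As a concrete illustration, a minimal sketch of loading such a collection is shown below; it assumes the gene sets are in MSigDB's tab-separated GMT export (one line per set: set name, description, then member genes), and the filename is a hypothetical placeholder.

```python
# Sketch: parse a GMT file into a pathway -> member-genes mapping.
def load_gmt(path):
    pathways = {}
    with open(path) as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            name, _description, genes = fields[0], fields[1], fields[2:]
            pathways[name] = set(genes)   # one hyperedge per gene set
    return pathways

# pathways = load_gmt("msigdb_curated_pathways.gmt")  # hypothetical filename
```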

3.2 Hypergraph Learning

We first review the basics of hypergraph analysis. Different from a simple graph, a hyperedge in a hypergraph connects two or more vertices. A hypergraph is defined as $G = (V, E)$, which includes a set of nodes $V$ and a set of hyperedges $E$. In the case of genetic medicine, we model genes as hypergraph nodes and pathways as hyperedges, where $n = |V|$ and $m = |E|$ are the numbers of nodes and hyperedges, respectively. The hypergraph's topological structure can be denoted by an incidence matrix $H \in \{0, 1\}^{n \times m}$, whose entries are defined as

$h(v, e) = 1$ if $v \in e$, and $h(v, e) = 0$ otherwise.     (1)

Generally speaking, each node in the hypergraph may be accompanied by a $d$-dimensional node feature/embedding matrix $X \in \mathbb{R}^{n \times d}$, where each row corresponds to a node's feature/embedding. The hypergraph with its topological structure and node features can be represented succinctly as $G = (V, E, X)$.
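A minimal sketch (not the authors' code) of building the incidence matrix of equation (1) from a pathway membership mapping follows; the pathway and gene names are hypothetical placeholders.

```python
import numpy as np

pathways = {                      # hyperedges: pathway -> member genes
    "PATHWAY_A": {"TP53", "BRCA1", "EGFR"},
    "PATHWAY_B": {"EGFR", "KRAS"},
}
genes = sorted(set.union(*pathways.values()))   # hypergraph nodes V
gene_idx = {g: i for i, g in enumerate(genes)}

H = np.zeros((len(genes), len(pathways)), dtype=np.float32)  # n x m
for j, members in enumerate(pathways.values()):
    for g in members:
        H[gene_idx[g], j] = 1.0                  # h(v, e) = 1 iff v in e

print(H)  # rows: genes, columns: pathways
```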

Fig. 1 (a) shows a schematic of the constructed genome hypergraph, with nodes denoted by circles and hyperedges denoted by colored arcs. While a pathway can contain multiple genes, a gene can also contribute to multiple pathways. That is, we can have multiple hyperedges incident on the same node (gene), as can be seen for several nodes in Fig. 1 (a).

Figure 1: SHINE’s strongly dual attention mechanism for message passing for the genome hypergraph, and its use of subgraph attention to integrate gene nodes in the feature learning for subgraphs.

3.3 Strongly Dual Attention Message Passing

Hyperedge attention over nodes. For a hyperedge $e$, in order to update its hidden representation $h_e^{(l)}$ at layer $l$, we aggregate the information from its incident nodes using the following attention mechanism. We first calculate the hyperedge attention over nodes as in

$\alpha_{ve} = \dfrac{\exp(u^\top z_{ve})}{\sum_{v' \in e} \exp(u^\top z_{v'e})}$     (2)

where $u$ is a trainable context vector and the attention ready state $z_{ve}$ for a hyperedge-node pair is calculated from the $(l-1)$th layer as in

$z_{ve} = (W_V h_v^{(l-1)} + b_V) * (W_E h_e^{(l-1)} + b_E)$     (3)

where $*$ denotes element-wise product, and $W_V$ and $b_V$ ($W_E$ and $b_E$) are the transformation weights and bias of the nodes (the hyperedges) for the attention ready state. This is motivated by the observation that different nodes (genes) contribute differently to the hyperedges (pathways); thus we need proper attention across the nodes to up- or down-weight their contributions when aggregating their representations to compute the representation of the hyperedge. Once we have the hyperedge attention over nodes, we calculate the hyperedge's representation in layer $l$ from the nodes' representations in layer $l-1$ as in equation (4), where $\sigma$ is the nonlinearity layer (ReLU in our experiments):

$h_e^{(l)} = \sigma\Big( \sum_{v \in e} \alpha_{ve} \, W_V h_v^{(l-1)} \Big)$     (4)

Node attention over hyperedges. For a node $v$, in order to update its hidden representation $h_v^{(l)}$ at layer $l$, we aggregate the information from its incident hyperedges using the following attention mechanism. We first calculate the node attention over hyperedges as in

$\beta_{ve} = \dfrac{\exp(u^\top z_{ve})}{\sum_{e' \ni v} \exp(u^\top z_{ve'})}$     (5)

where $u$ is the same trainable context vector as used in the hyperedge attention calculation and the attention ready state $z_{ve}$ for a hyperedge-node pair is calculated as in equation (3). This gives the node's attention over the hyperedges, i.e., we can weight the hyperedges' contributions when aggregating their representations to compute the representation of the node. We calculate the node's representation in layer $l$ from the hyperedges' representations in layer $l$ as in

$h_v^{(l)} = \sigma\Big( \sum_{e \ni v} \beta_{ve} \, W_E h_e^{(l)} \Big)$     (6)

Note that, different from HyperGAT, here the calculations of the hyperedge and node attentions share the same underlying dual-attention matrix $Z = (z_{ve})$, as shown in Fig. 1 (b), which is essentially an unstandardized covariance matrix. Such parameter sharing across hyperedges and nodes allows us to cross-regulate the learning of their mutual attentions to prevent overfitting. This difference not only simplifies the model, but is also more consistent with the notion of duality. The dual $G^*$ of the hypergraph $G$ is a hypergraph with $G$'s vertices and edges interchanged, and we should have $(G^*)^* = G$. It is easily provable that the dual attentions for $G^*$ are the same as those for $G$. Such a self-dual statement is generally not true for the attentions proposed in HyperGAT due to their asymmetric way of calculating the node-level and edge-level attentions, even though the HyperGAT attention was termed "dual" attention. For this reason, we term our attention message passing scheme strongly dual attention message passing.
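To make the parameter sharing concrete, below is a minimal PyTorch sketch of one strongly dual attention layer under our reading of equations (2)-(6); it is not the authors' released implementation, it keeps the |V| x |E| score matrix dense for clarity, and it assumes every node belongs to at least one hyperedge.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StronglyDualAttentionLayer(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.W_v = nn.Linear(d, d)              # node transform (W_V, b_V)
        self.W_e = nn.Linear(d, d)              # hyperedge transform (W_E, b_E)
        self.u = nn.Parameter(torch.randn(d))   # shared context vector u

    def forward(self, X_v, X_e, H):
        # X_v: |V| x d node states; X_e: |E| x d hyperedge states;
        # H: |V| x |E| binary incidence matrix.
        Z = self.W_v(X_v).unsqueeze(1) * self.W_e(X_e).unsqueeze(0)  # eq. (3)
        S = Z @ self.u                               # shared scores, |V| x |E|
        S = S.masked_fill(H == 0, float("-inf"))     # attend only within incidences
        A_edge = F.softmax(S, dim=0)                 # eq. (2): per-hyperedge weights
        X_e_new = F.relu(A_edge.t() @ self.W_v(X_v))     # eq. (4)
        A_node = F.softmax(S, dim=1)                 # eq. (5): per-node weights
        X_v_new = F.relu(A_node @ self.W_e(X_e_new))     # eq. (6)
        return X_v_new, X_e_new
```

Both attention directions are read off the same score matrix S, which is what makes the scheme self-dual: transposing H (i.e., taking the hypergraph dual) simply transposes S.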

Hypergraph regularization. One important intuition behind graph and hypergraph convolutional networks is that the learned representations of nodes with similar contexts of (hyper)edges should be similar. In the case of a simple graph $G = (V, E)$, this amounts to minimizing the summed distance $\sum_{(u,v) \in E} \|x_u - x_v\|^2$ or its weighted variants. Instead of using this as an explicit regularizer, graph or hypergraph convolutional networks leverage an appropriately defined graph or hypergraph Laplacian. As noted in (Zhou et al., 2006), the hypergraph Laplacian is $\Delta = I - \Theta$, where $I$ is the identity matrix and $\Theta$ is defined as (let $W$ be a diagonal matrix with diagonal entries as hyperedge weights, and $D_v$ and $D_e$ the diagonal node and hyperedge degree matrices)

$\Theta = D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2}$     (7)

Here, different from hypergraph convolutional networks, we use explicit regularization on the similarity of the representations of nodes with similar hyperedge contexts. Let $X^{(K)}$ be the matrix of the learned nodes' representations, where row $v$ is node $v$'s representation and the $K$th layer is the last hypergraph message passing layer. We can define the regularizer as

$\Omega = \mathrm{tr}\big( (X^{(K)})^\top \Delta X^{(K)} \big)$     (8)

Intuitively, the more hyperedges are incident on a node pair $(u, v)$, the more we should penalize their representational differences. On the other hand, the regularizer down-weights the penalization if a hyperedge connects many nodes or if a node has many incident hyperedges, indicating a lack of specificity for the hyperedge or the node, respectively.
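A minimal sketch of this regularizer, assuming unit hyperedge weights and the trace form of equation (8):

```python
import torch

def hypergraph_regularizer(X, H, w=None):
    # X: |V| x d learned node representations; H: |V| x |E| incidence matrix;
    # w: optional hyperedge weights (defaults to 1, i.e., W = I).
    V, E = H.shape
    w = torch.ones(E) if w is None else w
    Dv = (H * w).sum(dim=1)                 # node degrees d(v)
    De = H.sum(dim=0)                       # hyperedge degrees delta(e)
    Dv_inv_sqrt = torch.diag(Dv.clamp(min=1).rsqrt())   # clamp avoids div by 0
    Theta = Dv_inv_sqrt @ H @ torch.diag(w / De.clamp(min=1)) @ H.t() @ Dv_inv_sqrt
    Delta = torch.eye(V) - Theta            # hypergraph Laplacian, eq. (7)
    return torch.trace(X.t() @ Delta @ X)   # eq. (8): penalize dissimilar neighbors
```

This term is added to the classification loss, so the two objectives are optimized jointly.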

3.4 Weighted Subgraph Attention

The multiple layers of strongly dual attention message passing allow learning the nodes' and hyperedges' representations. However, the instance for the classification algorithm is a subgraph (e.g., a patient, who has mutations in multiple genes (nodes)). From the hypergraph perspective, a patient can be considered as a subhypergraph whose nodes (genes) harbor mutations in the patient and are a subset of the nodes of $G$. This is shown in Fig. 1 (c), where different node colors in the hypergraph denote different patients. In order to calculate the subgraph's representation from its component nodes' representations at the $K$th layer, we use the following weighted subgraph attention (WSA) mechanism, inspired by Li et al. (2015). In fact, none of the previous hypergraph methods support subgraph inferencing, and we had to add our WSA module to those models for subgraph inferencing as well. We first compute the subgraph attention over each patient's nodes as in

$\gamma_{sv} = \dfrac{\exp\big(M_{sv} \, c^\top h_v^{(K)}\big)}{\sum_{v' \in s} \exp\big(M_{sv'} \, c^\top h_{v'}^{(K)}\big)}$     (9)

where $c$ is a trainable context vector and $M$ is the mutation rate feature matrix with each row corresponding to a patient and each column corresponding to a gene. Thus, equation (9) is a mutation rate weighted subgraph attention mechanism. This choice conforms to the intuition that the rate of a mutation is more informative than a categorical indicator of the mutation's occurrence. With these subgraph-level attentions, we compute the patient (subgraph) representation from the $K$th layer's gene representations as in

$x_s = \sum_{v \in s} \gamma_{sv} \, h_v^{(K)}$     (10)

We then stack the learned patient representations to form the new patient feature matrix, as in

$X_S = [x_{s_1}, \dots, x_{s_N}]^\top$     (11)

where each row is a patient (subgraph) embedding.
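A minimal sketch of the WSA step under our reading of equations (9)-(11); the variable names are ours:

```python
import torch
import torch.nn.functional as F

def weighted_subgraph_attention(X_v, M, c):
    # X_v: |V| x d final-layer (K-th) gene representations.
    # M:   N x |V| mutation rate matrix; M[s, v] = 0 when gene v is not
    #      mutated in patient s, so patient s attends only to its own genes.
    # c:   d-dimensional trainable context vector.
    scores = M * (X_v @ c)                              # rate-weighted logits
    scores = scores.masked_fill(M == 0, float("-inf"))  # restrict to subgraph
    gamma = F.softmax(scores, dim=1)                    # eq. (9)
    return gamma @ X_v                                  # eq. (10): N x d
```

The returned N x d matrix is exactly the stacked patient feature matrix of equation (11).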

3.5 Inductive Classification on Subgraphs

Let the learned feature matrix be $X_S$ and feed it into a softmax classifier

$Z = \mathrm{softmax}\big( \mathrm{FC}^{(2)}(X_S) \big)$     (12)

where (2) in the superscript indicates two MLP layers (FC = fully connected layer). The loss function is defined as the cross-entropy error over all labeled subjects in all classes as in

$\mathcal{L} = - \sum_{s \in \mathcal{Y}_S} \sum_{c=1}^{C} Y_{sc} \ln Z_{sc}$     (13)

where $\mathcal{Y}_S$ is the training set of subjects that have labels and $C$ is the dimension of the output labels, which is equal to the number of classes. $Y$ is the label indicator matrix. Note that the subgraph attention layer allows us to compute any patient's representation, which effectively eliminates the need for access to test set patient features during training, making the model inductive. Existing models such as HyperGCN and HGNN are transductive; we cascade our subgraph attention layer on top of these models to make them inductive and serve as comparison models.
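A sketch of this classification head and loss (equations 12-13); the hidden size and class count below are illustrative:

```python
import torch
import torch.nn as nn

d, n_classes = 300, 25
head = nn.Sequential(                  # FC^(2): two fully connected layers
    nn.Linear(d, d), nn.ReLU(),
    nn.Linear(d, n_classes),           # softmax is folded into the loss below
)
loss_fn = nn.CrossEntropyLoss()        # cross-entropy over labeled subjects

X_s = torch.randn(8, d)                # patient (subgraph) embeddings from WSA
y = torch.randint(0, n_classes, (8,))  # cancer type labels
loss = loss_fn(head(X_s), y)
loss.backward()
```

For the multi-label DisGeNet task, the analogous head would use a per-class sigmoid with binary cross-entropy instead.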

4 Experiments

We conducted experiments on real-world datasets in genetic medicine. Both datasets have more than 20 classes, indicating significant complexity of the prediction tasks. These datasets are different in nature, e.g., curated from literature vs. obtained directly from high-throughput sequencing, and multi-class vs. multi-class multi-label. Our experiments are motivated by the fact that massive genomic data call for novel methods and present unique technical challenges, in this case inductive subgraph inferencing on hypergraphs. The summary statistics for each dataset are shown in Table 2, and descriptions of the datasets follow. Most pathways have small to medium sizes; see the pathway size IQRs in Table 2. In fact, even at the 95th percentile, the pathway size is just over 200. On the other hand, we observed that the larger the pathway (hyperedge), the more subgraphs it is incident on, and the less attention our model gives it as a discriminative feature. The DisGeNet and TCGA-MC3 datasets are publicly available (DisGeNet: https://www.disgenet.org/; TCGA-MC3: https://gdc.cancer.gov/about-data/publications/mc3-2017), and this study was approved by the Northwestern University Institutional Review Board.

4.1 Disease Type Prediction with DisGeNet Data

In this experiment, we used the DisGeNet dataset (Piñero et al., 2016), a collection of mutated genes involved in human diseases compiled from expert curated repositories, GWAS catalogs, animal models and the scientific literature. In the following text, we abuse terminology and use "gene" to mean "variants in the gene". We model genes as hypergraph nodes and diseases as hyperedges. Each disease is labeled with one or more of 22 MeSH codes, and the task is a multi-class multi-label classification problem. We used a 6:2:2 train:validation:test partition, and the split distribution is shown in the Appendix. The DisGeNet dataset has 6226 pathways and 9133 genes involved in 8383 diseases.

4.2 Cancer Type Prediction with NGS Somatic Mutations Data

In this experiment, we used the consensus somatic mutations for TCGA subjects produced by the Multi-Center Mutation Calling in Multiple Cancers (MC3) project (Ellrott et al., 2018). Aiming to enable robust cross-tumor-type analyses, the MC3 approach applied an ensemble of 7 mutation-calling algorithms and assigned a PASS identifier to a mutation that was called by 2 or more of the 7 variant callers (Ellrott et al., 2018). The MC3 approach accounted for variance and batch effects introduced by the rapid advancement of DNA extraction, hybridization-capture, and sequencing over time. Following this approach, we restricted our analysis to PASS calls in order to maintain sample sizes and uniformity in mutation calling. Each subject is labeled with one of 25 cancer types, and the task is a multi-class classification problem. We used a 6:2:2 train:validation:test partition, stratified by cancer type, and the split distribution is shown in the Appendix. The TCGA-MC3 dataset has 6229 pathways and 18059 genes involved in 9012 subjects in total.

Dataset # hypernodes # hyperedges Hyperedge size # classes # subgraphs
DisGeNet 9133 6226 25 (12 - 57) 22 8383
TCGA-MC3 18059 6229 33 (15 - 77) 25 9012
Table 2: Real-world hypergraph datasets for subgraph inference used in our work. For hyperedge sizes, medians and interquartile ranges are shown; the hyperedge size distribution is positively skewed.

4.3 Baselines

We compared SHINE with the following state-of-the-art baselines. We used the validation datasets to tune parameters and hyperparameters; please see the Appendix for details.

  • Hypergraph neural networks (HGNN) (Feng et al., 2019) use clique expansion to transform the hypergraph into a graph and use Chebyshev approximation to derive a simplified hypergraph convolution operation.

  • HyperGCN (Yadati et al., 2018) represents a hyperedge by a selected pairwise simple edge connecting two most unlike nodes, and adds the remaining nodes in the hyperedge as mediators.

  • HyperGAT (Ding et al., 2020) learns node representations by aggregating information from nodes to edges and vice versa. Different from SHINE, HyperGAT uses alternating attention instead of strongly dual attention and has no regularization on nodes with similar contexts of hyperedges.

  • AllSetTransformer and AllDeepSets are two variants (attention-based and MLP-based, respectively) of set-based methods (i.e., compositions of two multiset functions that are permutation invariant on their input multisets) for exploiting hyperedges in hypergraphs (Chien et al., 2021).

  • SubGNN (Alsentzer et al., 2020) was applied to the hypergraph by viewing the nodes and hyperedges as two types of vertices of a bipartite graph (Wang et al., 2022).

  • The multilayer perceptron (MLP) baseline evaluates how a simple feed-forward neural network with hypergraph regularization and subgraph attention performs, by replacing the dual attention with an MLP.

  • Polygenic risk score (PRS) is a widely used standard practice in genetic medicine that calculates disease risk from a genotype profile using regression (Choi et al., 2020).

  • Non-negative matrix factorization (NMF) discovers low-dimensional structure from high-dimensional multi-omic data and enables inference of complex biological processes (Stein-O’Brien et al., 2018).

  • XGBoost is an end-to-end tree boosting system and a state-of-the-art machine learning method (Chen and Guestrin, 2016) that frequently achieves the top results on many machine learning challenges.

To assess whether performance changes are due to added information (e.g., pathway information) and/or better utilization of the added information, we run PRS, NMF, and XGBoost in the following three settings: gene features only, pathway features only, and both gene and pathway features.

5 Results

The held-out test set micro-averaged F1 scores (micro-F1) for our proposed method SHINE and all comparison models are in Table 3. Comparing all the models, we can see that SHINE clearly outperforms a comprehensive array of state-of-the-art baselines in various configurations, with non-overlapping standard deviation intervals. PRS is indeed a competitive baseline, as can be seen from its performance being close to that of XGBoost, which frequently tops machine learning challenge leaderboards. Previously state-of-the-art hypergraph neural network models (HyperGCN, HGNN, HyperGAT) do not always outperform PRS and XGBoost (e.g., on the TCGA-MC3 dataset). On the other hand, pathways as features can improve performance if used properly, whether alone or jointly with genes. This comparison shows that genetic pathway information is useful for disease type classification, consistent with the intuition that pathways encode the molecular functional mechanisms that underlie disease etiology. However, properly utilizing such information is non-trivial, as evidenced by the difficulty that the NMF models and the hypergraph models (HyperGCN, HGNN and HyperGAT) have in outperforming the PRS and XGBoost models. Given that difficulty, SHINE still attained the best performance on each dataset. Intuitively speaking, HyperGCN and HGNN focus on similarity regularization: hypergraph nodes with similar contexts of hyperedges should have similar representations. HyperGAT's attention mechanism gears more towards minimizing the classification loss. SHINE, in an attempt to balance the similarity regularization with the end-to-end classification task via its strongly dual attention mechanism, achieved a better trade-off between the two objectives and effectively integrated the functional pathway (hyperedge) information with the individual gene (node) information.

The importance of the strong duality follows naturally from the fact that SHINE outperforms SubGNN (bipartite) with non-overlapping standard deviation intervals and wide separation. In addition, although MLPs have frequently been used to approximate target functions, in the setting of a large hypergraph (e.g., both hypergraphs have hyperedges with a few thousand nodes), it can still be quite challenging to approximate an ideal target function, and the explicit dual attention formulation wins out. The results from both AllDeepSets and AllSetTransformer have non-overlapping standard deviation intervals, in fact wide separation, with their counterparts from SHINE. These results echo our observation that strongly dual attention explores hypergraph propagation from a different angle than both AllDeepSets and AllSetTransformer, and suggest that effectively combining both angles could be an interesting future direction. Also note that, in general, we see some performance drop when moving from the DisGeNet dataset to the TCGA-MC3 dataset, likely because the former uses genetic features from the curated literature while the latter comes from high throughput sequencing intended for data-driven discovery. Both complex classification tasks (>20 classes) are uniquely challenging because diseases may have overlapping disrupted molecular functions (genetic pathways, hyperedges), especially for the TCGA-MC3 experiment, which distinguishes subcategories of similar diseases that are all, loosely speaking, cancers. In addition, both tasks exhibit class distributional shift between the train and test datasets, as shown in Appendix Tables 5 and 6, and have been designed to require inductive inference on subgraphs with highly variable hyperedge sizes. The strong performance of SHINE on these tasks thus suggests that our model can leverage its relational inductive biases for more robust generalization. Ablation studies further confirmed the contributions of each of the key components, including strongly dual attention message passing, weighted subgraph attention and hypergraph regularization (see Appendix for details).

Model Feature DisGeNet Dataset TCGA-MC3 Dataset
Metrics Test Micro F1 Test Micro F1
PRS gene 0.6303 0.4981
PRS pathway 0.6461 0.5047
PRS gene+pathway 0.6512 0.5042
XGBoost gene 0.6259 ± 0.0012 0.4927 ± 0.0058
XGBoost pathway 0.6467 ± 0.0035 0.4936 ± 0.0092
XGBoost gene+pathway 0.6486 ± 0.0036 0.5117 ± 0.0084
NMF gene 0.6167 ± 0.0040 0.4181 ± 0.0125
NMF pathway 0.5867 ± 0.0039 0.4842 ± 0.0057
NMF gene+pathway 0.5847 ± 0.0045 0.4839 ± 0.0032
SubGNN (bipartite) gene+pathway 0.6137 ± 0.0097 0.4025 ± 0.0049
HyperGCN gene+pathway 0.6638 ± 0.0028 0.4384 ± 0.0095
HGNN gene+pathway 0.6809 ± 0.0027 0.4504 ± 0.0042
HyperGAT gene+pathway 0.6495 ± 0.0050 0.4721 ± 0.0032
MLP gene+pathway 0.6331 ± 0.0056 0.4249 ± 0.0165
AllDeepSets gene+pathway 0.6309 ± 0.0147 0.4324 ± 0.0220
AllSetTransformer gene+pathway 0.6355 ± 0.0160 0.4904 ± 0.0158
SHINE gene+pathway 0.6955 ± 0.0034 0.5319 ± 0.0049
Table 3: Held-out test set micro-F1 on real-world datasets. Standard deviations are provided from runs with 10 random seeds. SHINE significantly outperforms all the state-of-the-art comparison models. PRS: polygenic risk score. NMF: non-negative matrix factorization. Best model in bold.

Model interpretation.

SHINE simultaneously learns the representations of nodes and hyperedges, which are then used to learn and inductively infer subgraph representations. This brings interpretation advantages, as it allows assessing pathway (hyperedge) correlations and reasoning about multiple molecular functions that interact and collectively contribute to disease onset and progression. In addition, SHINE has built-in measures to prevent or discourage genes belonging to the same functional class (e.g., promoting immune reactions) from having drastically different representations (e.g., opposite directions), a phenomenon that poses interpretation difficulties for other models that do not employ SHINE's hypergraph regularization. We identify the top pathways that are enriched in different cancers using the attention weights learned by SHINE, as shown in Table 4. From the table, we see that many of the listed pathways reflect key events in the development of individual or multiple types of cancers, consistent with genetic and medical knowledge from the wet lab (e.g., TNF/Stress Related Signaling (Mercogliano et al., 2020)). We showcase interpretations for breast cancer and lung cancer here, and refer the reader to the Appendix for the full interpretation of Table 4. For breast cancer, TNF is not only closely involved in onset, progression and metastasis formation, but is also linked to therapy resistance (Mercogliano et al., 2020). Regarding the 4-1BB pathway, studies have suggested a HER2/4-1BB bispecific molecule as a candidate alternative therapeutic strategy for patients with HER2-positive breast cancer (Hinner et al., 2019). VIP/PACAP and their receptors have prominent roles in transactivation of the epidermal growth factor (EGF) family and growth effects in breast cancer (Moody et al., 2016). For lung cancer, ErbB3 receptor recycling controlled by neuregulin receptor degradation protein-1 is linked to lung cancer, and small inhibitory RNA (siRNA) against ErbB3 shows promise as a therapeutic approach to lung adenocarcinoma (Sithanandam and Anderson, 2008). Lung cancer is also modulated by multiple miRNAs interacting with the TFAP2 family (Kołat et al., 2019).

BRCA: Stress pathway; 4-1BB pathway; VIP pathway; CD40 pathway; TOLL pathway
LUAD: PTK6 stabilizes HIF1; ErbB3 pathway; Hypertrophic cardiomyopathy; Diseases of metabolism; TFAP2 regulates growth factors transcription
LGG: Citrate cycle (TCA cycle); Cytosine methylation; TCA cycle and deficiency of pyruvate dehydrogenase; Glutathione metabolism; Digestion of dietary carbohydrate
HNSC: Apoptotic factor response; Programmed cell death; MECP2 regulates neuronal receptors and channels; FRA pathway; Caspase activation via extrinsic apoptotic signalling
Table 4: Top enriched genetic pathways associated with different cancer risks. In the original typeset table, text color indicated the source database for each pathway among those MSigDB integrates: BioCarta, Reactome, WikiPathways, Pathway Interaction Database, KEGG.

6 Discussion, Limitation and Future Work

In addition to being significantly more accurate and interpretable, SHINE uses inductive subgraph inferencing that works well with minibatching and scales well to large-scale problems, as showcased by the real-world experiments. It is known that GNNs suffer from over-smoothing as the number of layers increases, since increasingly uniform node representations may develop globally. Attention can limit this phenomenon by restricting aggregation to a relevant set of nodes. The effect of hypergraph regularization, while also smoothing, happens on a local scale as part of a direct optimization objective and does not accumulate with an increasing number of layers. Such decoupling between attention and local smoothing allows SHINE to better explore the optimization landscape.

Our work has limitations. We assumed that the hyperedges are known in advance. However, in reality, as our domain knowledge increases and evolves, we need to account for unknown hyperedges and, better, simultaneously discover novel hyperedges from data while predicting disease classes. Such a task has important clinical utilities in genetic medicine to discover new genetic pathways that may underlie disease etiology, and will be our future work. Moreover, strongly dual attention explores the hypergraph propagation from a different angle than both AllDeepSets and AllSetTransformer, and effectively combining both angles could be an interesting future direction. Another line of future work is to derive a hypergraph coarsening model on top of SHINE. SHINE currently has flat hypergraph layout and does not learn hierarchical representations of hypergraphs. The emerging technique of spatial transcriptomics can enable discovery of localized and hierarchical gene expression patterns (Zeng et al., 2022; Rao et al., 2021). A flexible hypergraph coarsening model that can effectively learn hierarchical network structure out of the hypergraphs can shed light on the organizations of the hyperedges (e.g., pathways representing synergistic molecular functions in certain tissue context).

From the application point of view, detecting tumor subtypes is often of interest, and we expect to extend our method to such detection using multi-modal data when large shared datasets become available (Kline et al., 2022). To a certain extent, the TCGA labels we used reflect subtypes of organ-specific primary tumors, e.g., LUAD vs. LUSC in lung cancer and KIRC vs. KIRP in kidney cancer. On the other hand, identifying driver genes and pathways for cancer types and other disease subtypes continues to be biologically important (Bailey et al., 2018) and will become increasingly fruitful with simultaneously collected deep genetic and phenotypic data on the same patients (Luo et al., 2019; Ritchie et al., 2015).

The field of genetic medicine encompasses molecular biology and clinical phenotyping to explore new relationships between disease susceptibility and human genetics. Though a single field, it is revolutionizing the practice of medicine in preventing, modifying and treating many diseases such as cardiovascular disease and cancer (Green et al., 2020). We expect SHINE to be a useful tool in the quest to broadly advance knowledge of disease susceptibility. In these real-world applications, a subject's genetic profile may contain individually identifying information. Thus, this work should never be used in violation of an individual's privacy, and the necessary steps of IRB review and execution of a data use agreement need to be properly completed prior to any study.

7 Conclusions

We proposed a novel framework termed SubHypergraph Inductive Neural nEtwork (SHINE) for inductive subgraph inferencing on hypergraphs, designed to jointly optimize the objectives of end-to-end subgraph classification and similarity regularization for representations of hypergraph nodes with similar contexts of hyperedges. We showed that SHINE improved the performance (micro-F1) of the learned model for disease type prediction on complex (>20 classes) genetic medicine datasets of different characteristics and under different settings (e.g., multi-class and/or multi-label). Genetic pathways directly correspond to molecular mechanisms and functions, which are more informative than individual genes and are represented as hyperedges in SHINE. The novel formulation of disease classification as a subgraph inferencing problem allows a hypergraph neural network to link correlated pathways, i.e., interacting molecular mechanisms, to disease etiology. This leads to better performance with added interpretability. We compared SHINE with a wide array of state-of-the-art (hyper)graph neural networks, XGBoost, NMF, and PRS models with different configurations of genes and pathways as features. SHINE consistently and significantly outperformed all state-of-the-art baselines on both the disease classification and cancer classification tasks. Analysis of the pathway groups that SHINE automatically identifies in a data-driven fashion offered significant clinical insights about multiple molecular mechanisms that interact and are associated with disease types and status.

This work was supported in part by NIH grants R01LM013337 and U01TR003528.

Appendix A Appendix for SHINE: SubHypergraph Inductive Neural nEtwork

Dataset details.

In this section, we give additional details on the datasets used in this paper. The DisGeNet dataset is a collection of mutated genes involved in human diseases compiled from expert curated repositories, GWAS catalogs, animal models and the scientific literature. Each disease is labeled with one or more of 22 MeSH codes, and the task is a multi-class multi-label classification problem. We used a 6:2:2 train:validation:test partition, and the split distribution is shown in Table 5. The DisGeNet dataset has 6226 pathways and 9133 genes involved in 8383 diseases in total. The TCGA-MC3 dataset records somatic mutations for subjects in The Cancer Genome Atlas (TCGA). The genetic variants are stored in a specially formatted file: a row in the file specifies a particular variant (e.g., single nucleotide polymorphism or insertion/deletion), its chromosomal location, and what proportion of the sequencing reads covering that chromosomal location have that variant, among other characteristics. Each subject is labeled with one of 25 cancer types, and the task is a multi-class classification problem. We used a 6:2:2 train:validation:test partition, stratified by cancer type, and the split distribution is shown in Table 6. The TCGA-MC3 dataset has 6229 pathways and 18059 genes involved in 9012 subjects in total.

Genetic pathways.

Genetic pathways are a valuable tool to assist in representing, understanding, and analyzing the complex interactions between molecular functions. The pathways contain multiple genes (can be modeled using hyperedges) and correspond to genetic functions, including regulations, genetic signaling, and metabolic interactions. They have a wide range of applications, including predicting cellular activity and inferring disease types and status (Alon, 2006). For a simplified and illustrative example, a signaling pathway p1 (having 20 genes) sensing the environment may govern (the governing function embodied as a pathway p2 having 15 genes) the expression of transcription factors in another signaling pathway p3 (having 23 genes), which then controls (the controlling function embodied as a pathway p4 having 34 genes) the expression of proteins that play roles as enzymes in a metabolic pathway p5 (having 57 genes). In general, there will be partial overlap between pathways p1 and p2, p2 and p3, p3 and p4, p4 and p5, and other potential partial overlaps corresponding to partial overlaps between their corresponding hyperedges.
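As a toy illustration (with hypothetical gene identifiers) of the partial overlaps between such pathways, the overlap between two hyperedges is simply the intersection of their gene sets:

```python
# Hypothetical pathways p1-p3, mirroring the sizes in the example above.
p1 = {f"g{i}" for i in range(0, 20)}    # signaling pathway, 20 genes
p2 = {f"g{i}" for i in range(15, 30)}   # governing pathway, 15 genes
p3 = {f"g{i}" for i in range(27, 50)}   # downstream pathway, 23 genes

print(len(p1 & p2), len(p2 & p3))  # shared genes, i.e., overlapping hyperedges
```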

MeSH Description Total Train Val Test
C01 Infections 221 135 45 41
C04 Neoplasms 1010 626 190 194
C05 Musculoskeletal Diseases 1266 765 239 262
C06 Digestive System Diseases 430 238 91 101
C07 Stomatognathic Diseases 242 156 50 36
C08 Respiratory Tract Diseases 235 137 52 46
C09 Otorhinolaryngologic Diseases 299 188 55 56
C10 Nervous System Diseases 2960 1769 619 572
C11 Eye Diseases 756 470 150 136
C12 Male Urogenital Diseases 537 337 102 98
C13 Female Urogenital Diseases and Pregnancy Complications 640 402 118 120
C14 Cardiovascular Diseases 746 441 147 158
C15 Hemic and Lymphatic Diseases 624 392 108 124
C16 Congenital, Hereditary, and Neonatal Diseases and Abnormalities 3648 2168 725 755
C17 Skin and Connective Tissue Diseases 789 459 142 188
C18 Nutritional and Metabolic Diseases 1277 725 271 281
C19 Endocrine System Diseases 535 327 107 101
C20 Immune System Diseases 415 249 87 79
C23 Pathological Conditions, Signs and Symptoms 1795 1065 387 343
C25 Chemically-Induced Disorders 135 80 29 26
F01 Behavior and Behavior Mechanisms 267 164 62 41
F03 Mental Disorders 501 295 123 83
Table 5: Statistics of DisGeNet experiment data. The table includes the distribution of the 22 MeSH categories with more than 100 diseases. The dataset is split into a training set, a validation set and a test set according to a 6:2:2 ratio.
Cancer Description Total Train Val Test
BLCA Bladder Urothelial Carcinoma 411 247 82 82
BRCA Breast invasive carcinoma 791 475 158 158
CESC Cervical squamous cell carcinoma and endocervical adenocarcinoma 289 173 58 58
COAD Colon adenocarcinoma 288 173 57 58
ESCA Esophageal carcinoma 184 110 37 37
GBM Glioblastoma multiforme 309 185 62 62
HNSC Head and Neck squamous cell carcinoma 507 304 102 101
KIRC Kidney renal clear cell carcinoma 368 220 74 74
KIRP Kidney renal papillary cell carcinoma 281 169 56 56
LAML Acute Myeloid Leukemia 137 83 27 27
LGG Brain Lower Grade Glioma 510 306 102 102
LIHC Liver hepatocellular carcinoma 363 217 73 73
LUAD Lung adenocarcinoma 512 307 103 102
LUSC Lung squamous cell carcinoma 480 288 96 96
OV Ovarian serous cystadenocarcinoma 409 245 82 82
PAAD Pancreatic adenocarcinoma 175 105 35 35
PCPG Pheochromocytoma and Paraganglioma 178 107 35 36
PRAD Prostate adenocarcinoma 493 295 99 99
SARC Sarcoma 236 142 47 47
SKCM Skin Cutaneous Melanoma 466 280 93 93
STAD Stomach adenocarcinoma 438 262 88 88
TGCT Testicular Germ Cell Tumors 128 77 25 26
THCA Thyroid carcinoma 490 294 98 98
THYM Thymoma 122 74 24 24
UCEC Uterine Corpus Endometrial Carcinoma 447 268 90 89
Table 6: Statistics of TCGA-MC3 experiment data. The table includes the distribution of the 25 cancer types with more than 100 subjects. The dataset is split into a training set, a validation set and a test set according to a 6:2:2 ratio.

Genetic variant calling and filtering for TCGA-MC3 dataset.

The variants are usually of high dimensionality. For example, in the TCGA-MC3 dataset, even after we retain only the variants that received PASS identifiers, there are still around 3 million variants. Thus, we aggregate their counts by affected gene to avoid impractically large matrices: we sum all the alternative allele counts and reference allele counts within a gene. We calculate the mutation rate for a gene $g$ as in equation (14),

$r_g = \dfrac{\sum_{i \in g} t_i^{alt}}{\sum_{i \in g} \big( t_i^{alt} + t_i^{ref} \big)}$     (14)

where variant $i$ belongs to the gene $g$, $t_i^{alt}$ is the read depth supporting the variant (alternative) allele in the tumor sequencing data, and $t_i^{ref}$ is the read depth supporting the reference (non-mutated) allele in the tumor sequencing data.
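A minimal sketch of this aggregation; the MAF-style column names (t_alt_count, t_ref_count) are an assumption about the input format:

```python
import pandas as pd

def gene_mutation_rates(variants: pd.DataFrame) -> pd.Series:
    # variants: one row per PASS variant, with columns 'gene',
    # 't_alt_count' (tumor alt-allele read depth) and
    # 't_ref_count' (tumor ref-allele read depth).
    by_gene = variants.groupby("gene")[["t_alt_count", "t_ref_count"]].sum()
    return by_gene["t_alt_count"] / (
        by_gene["t_alt_count"] + by_gene["t_ref_count"]
    )  # eq. (14): per-gene mutation rate
```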

Parameter and hyperparameter tuning for models.

For SHINE and the other hypergraph methods, the hidden dimension hyperparameter is tuned on the validation dataset with choices from 100 to 1000, at increments of 100. For the comparison hypergraph neural network models, we used the implementations by the original authors. The hyperparameters were tuned on the validation set using choice grids from the respective papers or, when unspecified, from the same default grids as our proposed method (learning rate, weight decay, dropout rate). For PRS, the regularization coefficient is tuned on the validation dataset with choices from a geometric sequence from 0.001 to 1000 at a multiplying ratio of 10. For NMF, the number of factors is tuned on the validation dataset with choices from 100 to 1000, at increments of 100. For XGBoost, we tuned the max tree depth (3, 5, 10), the number of estimators (from 100 to 1000, at increments of 100), and the min child weight (0.01, 0.1, 1, 10, 100), using the validation set. For models requiring random initialization, we run initializations 10 times with different seeds and report the averages and standard deviations. We varied the number K of layers from 1 to 4 and found 2 layers to give the best results for SHINE.

Regarding sensitivity to the hidden dimension, the performance is in general less sensitive to the hidden dimension once it is sufficiently large (≥300), with <0.05 change in micro-F1 score. Smaller hidden dimensions (100-200) can lead to a >0.05 micro-F1 drop, likely due to insufficient representation power. The optimal hidden dimension is 300 for the TCGA-MC3 dataset and 600 for the DisGeNet dataset. The performance also shows <0.05 change in micro-F1 score when varying the other hyperparameters, including learning rate, weight decay and dropout rate, in their respective grids as specified above.
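A sketch of the grid search procedure described above; the grids for learning rate and dropout are illustrative placeholders (the paper's exact default grids are not reproduced here), and train_and_eval is a hypothetical stand-in for one training run returning validation micro-F1:

```python
from itertools import product
import random

def train_and_eval(cfg, split="val"):
    # Hypothetical stand-in: train SHINE with cfg, return validation micro-F1.
    return random.random()

grid = {
    "hidden_dim": list(range(100, 1001, 100)),  # grid from the text
    "lr": [1e-3, 1e-2],                         # illustrative values only
    "dropout": [0.1, 0.3, 0.5],                 # illustrative values only
}
best = max(
    (dict(zip(grid, combo)) for combo in product(*grid.values())),
    key=lambda cfg: train_and_eval(cfg, split="val"),
)
print(best)
```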

Computational complexity.

The complexity of SHINE scales as the following factors grow: the numbers of layers and nodes, the number and size of hyperedges, the size of the hidden dimensions, and the number and size of the subhypergraphs. We implement SHINE in PyTorch and run it on NVIDIA V100 GPUs. We train SHINE for up to 6000 epochs using Adam (Kingma and Ba, 2015) and stop training if the validation loss does not decrease for 10 consecutive epochs. The TCGA-MC3 dataset's training times are: MLP 5 min, HyperGCN 7 min, AllSetTransformer 20 min, AllDeepSet 20 min, SHINE 30 min, HGNN 30 min, HyperGAT 30 min, SubGNN >1 day (excluding prebuild time). The DisGeNet dataset's training times are: MLP 5 min, HyperGCN 6 min, AllDeepSet 13 min, HGNN 15 min, HyperGAT 15 min, AllSetTransformer 16 min, SHINE 20 min, SubGNN >1 day (excluding prebuild time).

Ablation study.

To investigate the contributions of the key components of the proposed algorithm (e.g., the strongly dual attention message passing and the hypergraph regularization) to the overall method, we performed an ablation analysis. The previous state-of-the-art hypergraph neural network models in fact serve as some of the steps in the ablation. For example, HyperGAT does not have strongly dual attention message passing and does not employ hypergraph regularization. HGNN and HyperGCN apply hypergraph convolution instead of attention message passing. HyperGCN, compared to HGNN, applies approximate hypergraph convolution by representing a hyperedge by a selected pairwise simple edge connecting its two most unlike nodes and adding the remaining nodes in the hyperedge as mediators. To evaluate the efficacy of the weighted subgraph attention (WSA), we consider a subgraph representation that is simply the sum of the representations of the nodes (genes) that are of interest (i.e., mutated) for each patient (subgraph). Finally, we added SHINE with no hypergraph regularization to evaluate the regularization's effectiveness. The ablation analysis results are shown in Table 7. From the results, it is clear that SHINE's strongly dual attention message passing outperforms HyperGAT, which lacks it. We can see that adding hypergraph regularization further improves performance, in fact with improvement beyond the standard deviation intervals of the regularization-ablated model on both datasets. The weighted subgraph attention (WSA) ablation leads to a larger performance drop than the hypergraph regularization ablation, which corroborates the importance of the WSA step. We also notice that the performance drop due to WSA ablation on the TCGA-MC3 dataset is larger than that on the DisGeNet dataset. This is consistent with the fact that the TCGA-MC3 dataset has a denser hypergraph and larger subgraphs than the DisGeNet dataset, and with the fact that differentiating among cancer subtypes is a more complex and nuanced task than differentiating among disease categories. These observations collectively argue for the benefits of weighted subgraph attention over direct aggregation such as summing, increasingly so for larger datasets and more complex tasks.

Model DisGeNet Dataset TCGA-MC3 Dataset
Metrics Test Micro F1 Test Micro F1
HyperGCN (approx. hypergraph convolution) 0.6638 ± 0.0028 0.4384 ± 0.0095
HGNN (hypergraph convolution) 0.6809 ± 0.0027 0.4504 ± 0.0042
HyperGAT (not strongly dual attention) 0.6495 ± 0.0050 0.4721 ± 0.0032
SHINE without weighted subgraph attention 0.6472 ± 0.0053 0.4388 ± 0.0091
SHINE without hypergraph regularization 0.6829 ± 0.0059 0.5247 ± 0.0048
SHINE 0.6955 ± 0.0034 0.5319 ± 0.0049
Table 7: Ablation analysis: held-out test set micro-F1 on real-world datasets. Standard deviations are provided from runs with 10 random seeds. SHINE outperforms all ablated variants and prior models. Best model in bold.

Model interpretation.

SHINE simultaneously learns the representations of nodes and hyperedges, which are then used to learn and inductively infer subgraph representations. This brings model interpretation advantages, as it allows assessing pathway (hyperedge) correlations and reasoning about multiple molecular functions that interact and collectively contribute to disease onset. We identify the top pathways that are enriched in different cancers using the attention weights learned by SHINE, as shown in Table 4. From the table, we see that many of the listed pathways reflect key events in the development of individual or multiple types of cancers, consistent with genetic and medical knowledge from the wet lab (e.g., TNF/Stress Related Signaling (Mercogliano et al., 2020)).

For breast cancer, TNF is not only closely involved in onset, progression and metastasis formation, but is also linked to therapy resistance (Mercogliano et al., 2020). Regarding the 4-1BB pathway, studies have suggested a HER2/4-1BB bispecific molecule as a candidate alternative therapeutic strategy for patients with HER2-positive breast cancer (Hinner et al., 2019). VIP/PACAP and their receptors have prominent roles in transactivation of the epidermal growth factor (EGF) family and growth effects in breast cancer (Moody et al., 2016). For lung cancer, ErbB3 receptor recycling controlled by neuregulin receptor degradation protein-1 is linked to lung cancer, and small inhibitory RNA (siRNA) against ErbB3 shows promise as a therapeutic approach to lung adenocarcinoma (Sithanandam and Anderson, 2008). Lung cancer is also modulated by multiple miRNAs interacting with the TFAP2 family (Kołat et al., 2019). For lower-grade gliomas, recent studies have reported an association between DNA demethylation and their malignant progression (Nomura et al., 2019). Emerging evidence has also linked the citric acid (TCA) cycle, used for energy production, to the development of certain cancer types, especially those with deregulated oncogene and tumor suppressor expression (Anderson et al., 2018). For head and neck cancer, studies have reported a high percentage of cases with MECP2 copy-number gain in combination with RAS mutation or amplification (Neupane et al., 2016). The apoptotic signaling and response pathways involving the mitochondrial pro-apoptotic protein SMAC/Diablo have also been suggested to regulate the lipid synthesis that is essential for cancer growth and development (Paul et al., 2018).

Of note, the pathways listed in Table 4 for each cancer type play roles in different phases of cancer onset, growth or metastasis, and likely function together in tumorigenesis and progression, as discovered by SHINE. These analyses suggest that besides providing useful and discriminative features, SHINE integrates gene and pathway data to provide insights into functional and molecular mechanisms by linking together multiple pathways that may function together and contribute to cancer development and progression.

Relevance and impact.

The techniques and results presented in the paper could apply to many diseases by informing genetic medicine practice. In these real-world applications, a subject's genetic profile may contain individually identifying information. Thus, this work, or derivatives of it, should never be used in violation of an individual's privacy. When using individual-level datasets such as TCGA-MC3, the proper steps of IRB review and execution of a data use agreement need to be completed prior to the study, as was done for this study.

It is important for the machine learning (ML) community to stay informed about the problems arising in critical application domains such as healthcare and biomedicine that can guide model design. More specifically, explicitly treating hyperedges as first-class citizens in GNN modeling is important, since in this way hyperedges can be the subjects of notions of regularization or attention. This article demonstrated the feasibility of addressing those needs through SHINE's practical design and implementation choices to advance modern genetic medicine studies. We have demonstrated successful applications of SHINE on large-scale genetic medicine datasets, including the TCGA-MC3 dataset, one of the largest NIH dbGaP datasets. Genetic medicine is revolutionizing the practice of medicine in preventing, modifying and treating many diseases such as cardiovascular disease and cancer. In the future, as even larger genetic datasets are collected through NIH programs such as All of Us and TOPMed, we expect SHINE to be a useful tool in the quest to broadly advance knowledge of disease susceptibility.

References

  • U. Alon (2006) An introduction to systems biology: design principles of biological circuits. Chapman and Hall/CRC. Cited by: Appendix A.
  • E. Alsentzer, S. G. Finlayson, M. M. Li, and M. Zitnik (2020) Subgraph neural networks. arXiv preprint arXiv:2006.10538. Cited by: §1, §2, 5th item.
  • N. M. Anderson, P. Mucka, J. G. Kern, and H. Feng (2018) The emerging role and targetability of the TCA cycle in cancer metabolism. Protein & Cell 9 (2), pp. 216–237. Cited by: Appendix A.
  • M. H. Bailey, C. Tokheim, E. Porta-Pardo, S. Sengupta, D. Bertrand, A. Weerasinghe, A. Colaprico, M. C. Wendl, J. Kim, B. Reardon, et al. (2018) Comprehensive characterization of cancer driver genes and mutations. Cell 173 (2), pp. 371–385. Cited by: §6.
  • T. Chen and C. Guestrin (2016) XGBoost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794. Cited by: 9th item.
  • E. Chien, C. Pan, J. Peng, and O. Milenkovic (2021) You are allset: a multiset function framework for hypergraph neural networks. arXiv preprint arXiv:2106.13264. Cited by: §2, 4th item.
  • S. W. Choi, T. S. Mak, and P. F. O’Reilly (2020) Tutorial: a guide to performing polygenic risk score analyses. Nature Protocols 15 (9), pp. 2759–2772. Cited by: 7th item.
  • K. Ding, J. Wang, J. Li, D. Li, and H. Liu (2020) Be more with less: hypergraph attention networks for inductive text classification. arXiv preprint arXiv:2011.00387. Cited by: §1, §2, 3rd item.
  • D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams (2015) Convolutional networks on graphs for learning molecular fingerprints. In NeurIPS, pp. 2224–2232. Cited by: §2.
  • K. Ellrott, M. H. Bailey, G. Saksena, K. R. Covington, C. Kandoth, C. Stewart, J. Hess, S. Ma, K. E. Chiotti, M. McLellan, et al. (2018) Scalable open science approach for mutation calling of tumor exomes using multiple genomic pipelines. Cell systems 6 (3), pp. 271–281. Cited by: §4.2.
  • F. Feng, X. He, Y. Liu, L. Nie, and T. Chua (2018) Learning on partial-order hypergraphs. In Proceedings of the 2018 World Wide Web Conference, pp. 1523–1532. Cited by: §2.
  • Y. Feng, H. You, Z. Zhang, R. Ji, and Y. Gao (2019) Hypergraph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 3558–3565. Cited by: §1, §2, 1st item.
  • X. Fu, J. Zhang, Z. Meng, and I. King (2020) MAGNN: metapath aggregated graph neural network for heterogeneous graph embedding. In Proceedings of The Web Conference 2020, pp. 2331–2341. Cited by: §2.
  • Y. Fu, Y. Xiong, S. Y. Philip, T. Tao, and Y. Zhu (2019) Metapath enhanced graph attention encoder for hins representation learning. In 2019 IEEE International Conference on Big Data (Big Data), pp. 1103–1110. Cited by: §2.
  • J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl (2017) Neural message passing for quantum chemistry. In ICML, pp. 1263–1272. Cited by: §2.
  • E. D. Green, C. Gunter, L. G. Biesecker, V. Di Francesco, C. L. Easter, E. A. Feingold, A. L. Felsenfeld, D. J. Kaufman, E. A. Ostrander, W. J. Pavan, et al. (2020) Strategic vision for improving human health at the forefront of genomics. Nature 586 (7831), pp. 683–692. Cited by: §6.
  • W. L. Hamilton (2020) Graph representation learning. Synthesis Lectures on Artifical Intelligence and Machine Learning 14 (3), pp. 1–159. Cited by: §2.
  • M. J. Hinner, R. S. B. Aiba, T. J. Jaquin, S. Berger, M. C. Dürr, C. Schlosser, A. Allersdorfer, A. Wiedenmann, G. Matschiner, J. Schüler, et al. (2019) Tumor-localized costimulatory T-cell engagement by the 4-1BB/HER2 bispecific antibody-Anticalin fusion PRS-343. Clinical Cancer Research 25 (19), pp. 5878–5889. Cited by: Appendix A, §5.
  • Z. Hu, Y. Dong, K. Wang, and Y. Sun (2020) Heterogeneous graph transformer. In Proceedings of The Web Conference 2020, pp. 2704–2710. Cited by: §2.
  • K. Huang and M. Zitnik (2020) Graph meta learning via local subgraphs. Advances in Neural Information Processing Systems 33. Cited by: §2.
  • J. Jiang, Y. Wei, Y. Feng, J. Cao, and Y. Gao (2019) Dynamic hypergraph neural networks. In IJCAI, pp. 2635–2641. Cited by: §2.
  • T. Jin, L. Cao, B. Zhang, X. Sun, C. Deng, and R. Ji (2019) Hypergraph induced convolutional manifold networks. In IJCAI, pp. 2670–2676. Cited by: §2.
  • D. Kingma and J. Ba (2015) Adam: a method for stochastic optimization. In ICLR, Cited by: Appendix A.
  • T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §2.
  • A. Kline, H. Wang, Y. Li, S. Dennis, M. Hutch, Z. Xu, F. Wang, F. Cheng, and Y. Luo (2022) Multimodal machine learning in precision health. arXiv preprint arXiv:2204.04777. Cited by: §6.
  • D. Kołat, Ż. Kałuzińska, A. K. Bednarek, and E. Płuciennik (2019) The biological characteristics of transcription factors AP-2α and AP-2γ and their importance in various types of cancers. Bioscience Reports 39 (3). Cited by: Appendix A, §5.
  • Y. Li, R. Jin, and Y. Luo (2019) Classifying relations in clinical narratives using segment graph convolutional and recurrent neural networks (Seg-GCRNs). Journal of the American Medical Informatics Association 26 (3), pp. 262–268. Cited by: §2.
  • Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel (2015) Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493. Cited by: §3.4.
  • A. Liberzon, C. Birger, H. Thorvaldsdóttir, M. Ghandi, J. P. Mesirov, and P. Tamayo (2015) The molecular signatures database hallmark gene set collection. Cell systems 1 (6), pp. 417–425. Cited by: §1, §3.1.
  • Y. Luo, C. Mao, Y. Yang, F. Wang, F. Ahmad, D. Arnett, M. Irvin, and S. Shah (2019) Integrating hypertension phenotype and genotype with hybrid non-negative matrix factorization. Bioinformatics (Oxford, England) 35 (8), pp. 1395–1403. Cited by: §6.
  • Y. Luo and C. Mao (2021) PANTHER: pathway augmented nonnegative tensor factorization for higher-order feature learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, pp. 371–380. Cited by: §1.
  • C. Mao, L. Yao, and Y. Luo (2019) MedGCN: graph convolutional networks for multiple medical tasks. arXiv preprint arXiv:1904.00326. Cited by: §2.
  • C. Mao, L. Yao, and Y. Luo (2022) ImageGCN: multi-relational image graph convolutional networks for disease identification with chest X-rays. IEEE Transactions on Medical Imaging. Cited by: §2.
  • C. Meng, S. C. Mouli, B. Ribeiro, and J. Neville (2018) Subgraph pattern neural networks for high-order graph evolution prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32. Cited by: §1, §2.
  • M. F. Mercogliano, S. Bruni, P. V. Elizalde, and R. Schillaci (2020) Tumor necrosis factor blockade: an opportunity to tackle breast cancer. Frontiers in oncology 10. Cited by: Appendix A, Appendix A, §5.
  • T. W. Moody, B. Nuche-Berenguer, and R. T. Jensen (2016) VIP/PACAP, and their receptors and cancer. Current Opinion in Endocrinology, Diabetes, and Obesity 23 (1), pp. 38. Cited by: Appendix A, §5.
  • M. Neupane, A. P. Clark, S. Landini, N. J. Birkbak, A. C. Eklund, E. Lim, A. C. Culhane, W. T. Barry, S. E. Schumacher, R. Beroukhim, et al. (2016) MECP2 is a frequently amplified oncogene with a novel epigenetic mechanism that mimics the role of activated RAS in malignancy. Cancer Discovery 6 (1), pp. 45–58. Cited by: Appendix A.
  • M. Nomura, K. Saito, K. Aihara, G. Nagae, S. Yamamoto, K. Tatsuno, H. Ueda, S. Fukuda, T. Umeda, S. Tanaka, et al. (2019) DNA demethylation is associated with malignant progression of lower-grade gliomas. Scientific reports 9 (1), pp. 1–12. Cited by: Appendix A.
  • A. Paul, Y. Krelin, T. Arif, R. Jeger, and V. Shoshan-Barmatz (2018) A new role for the mitochondrial pro-apoptotic protein SMAC/Diablo in phospholipid synthesis associated with tumorigenesis. Molecular Therapy 26 (3), pp. 680–694. Cited by: Appendix A.
  • H. Peng, J. Li, Y. He, Y. Liu, M. Bao, L. Wang, Y. Song, and Q. Yang (2018) Large-scale hierarchical text classification with recursively regularized deep graph-cnn. In WWW, pp. 1063–1072. Cited by: §2.
  • J. Piñero, À. Bravo, N. Queralt-Rosinach, A. Gutiérrez-Sacristán, J. Deu-Pons, E. Centeno, J. García-García, F. Sanz, and L. I. Furlong (2016) DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic Acids Research 45 (D1), pp. D833–D839. Cited by: §4.1.
  • A. Rao, D. Barkley, G. S. França, and I. Yanai (2021) Exploring tissue architecture using spatial transcriptomics. Nature 596 (7871), pp. 211–220. Cited by: §6.
  • M. D. Ritchie, E. R. Holzinger, R. Li, S. A. Pendergrass, and D. Kim (2015) Methods of integrating data to uncover genotype–phenotype interactions. Nature Reviews Genetics 16 (2), pp. 85–97. Cited by: §6.
  • S. N. Satchidanand, H. Ananthapadmanaban, and B. Ravindran (2015) Extended discriminative random walk: a hypergraph approach to multi-view multi-relational transductive learning. In IJCAI, pp. 3791–3797. Cited by: §2.
  • G. Sithanandam and L. Anderson (2008) The ErbB3 receptor in cancer and cancer gene therapy. Cancer Gene Therapy 15 (7), pp. 413–448. Cited by: Appendix A, §5.
  • G. L. Stein-O’Brien, R. Arora, A. C. Culhane, A. V. Favorov, L. X. Garmire, C. S. Greene, L. A. Goff, Y. Li, A. Ngom, M. F. Ochs, et al. (2018) Enter the matrix: factorization uncovers knowledge from omics. Trends in Genetics 34 (10), pp. 790–805. Cited by: 8th item.
  • Q. Sun, J. Li, H. Peng, J. Wu, Y. Ning, P. S. Yu, and L. He (2021) SUGAR: subgraph neural network with reinforcement pooling and self-supervised mutual information mechanism. arXiv preprint arXiv:2101.08170. Cited by: §2.
  • K. Tu, P. Cui, X. Wang, F. Wang, and W. Zhu (2018) Structural deep embedding for hyper-networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32. Cited by: §2.
  • O. Vinyals, S. Bengio, and M. Kudlur (2015) Order matters: sequence to sequence for sets. arXiv preprint arXiv:1511.06391. Cited by: §2.
  • X. Wang, D. Bo, C. Shi, S. Fan, Y. Ye, and S. Y. Philip (2022) A survey on heterogeneous graph embedding: methods, techniques, applications and sources. IEEE Transactions on Big Data. Cited by: 5th item.
  • X. Wang, H. Ji, C. Shi, B. Wang, Y. Ye, P. Cui, and P. S. Yu (2019) Heterogeneous graph attention network. In The world wide web conference, pp. 2022–2032. Cited by: §2.
  • X. Wei, R. Yu, and J. Sun (2020) View-GCN: view-based graph convolutional network for 3D shape analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1850–1859. Cited by: §2.
  • N. Yadati, M. Nimishakavi, P. Yadav, V. Nitin, A. Louis, and P. Talukdar (2018) HyperGCN: a new method of training graph convolutional networks on hypergraphs. arXiv preprint arXiv:1809.02589. Cited by: §1, §2, 2nd item.
  • N. Yadati (2020) Neural message passing for multi-relational ordered and recursive hypergraphs. Advances in Neural Information Processing Systems 33. Cited by: §1.
  • L. Yao, C. Mao, and Y. Luo (2019) Graph convolutional networks for text classification. In AAAI, Cited by: §2.
  • R. Ying, J. You, C. Morris, X. Ren, W. L. Hamilton, and J. Leskovec (2018) Hierarchical graph representation learning with differentiable pooling. arXiv preprint arXiv:1806.08804. Cited by: §2.
  • Z. Zeng, Y. Li, Y. Li, and Y. Luo (2022) Statistical and machine learning methods for spatially resolved transcriptomics data analysis. Genome biology 23 (1), pp. 1–23. Cited by: §6.
  • C. Zhang, D. Song, C. Huang, A. Swami, and N. V. Chawla (2019) Heterogeneous graph neural network. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp. 793–803. Cited by: §2.
  • M. Zhang, Z. Cui, S. Jiang, and Y. Chen (2018) Beyond link prediction: predicting hyperlinks in adjacency space. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32. Cited by: §2.
  • R. Zhang, Y. Zou, and J. Ma (2020) Hyper-SAGNN: a self-attention based graph neural network for hypergraphs. In International Conference on Learning Representations, Cited by: §1, §2.
  • D. Zhou, J. Huang, and B. Schölkopf (2006) Learning with hypergraphs: clustering, classification, and embedding. Advances in neural information processing systems 19, pp. 1601–1608. Cited by: §2, §3.3.
  • M. Zitnik, M. Agrawal, and J. Leskovec (2018) Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics 34 (13), pp. i457–i466. Cited by: §2.