1 Introduction
Graphs provide a mathematical structure for describing relationships between objects in a system. Owing to their intuitive representation, well-understood theoretical properties, and the wealth of algorithmic methodology and available code, graphs have become a major framework for modeling biological systems. Protein-protein interaction networks, protein 3D structures, drug-target interaction networks, metabolic networks and gene regulatory networks are some of the major representations of biological systems. Unfortunately, molecular and cellular systems are only partially observable and may contain a significant amount of noise due to their inherent stochastic nature as well as the limitations of both low-throughput and high-throughput experimental techniques. This highlights the need for the development and application of computational approaches for predictive modeling (e.g., inferring novel interactions) and for identifying interesting patterns in such data.
Learning on graphs can generally be seen as supervised or unsupervised. Under a supervised setting, typical tasks involve graph classification, i.e., the assignment of class labels to entire graphs [55]; vertex or edge classification, i.e., the assignment of class labels to vertices or edges in a single graph [30]; or link prediction, i.e., the prediction of the existence of edges in graphs [34]. Alternatively, frequent subgraph mining [27], motif finding [38], clustering [1], and community detection [17] are traditional unsupervised approaches. Regardless of the category, the development of techniques that capture local/global network structure, measure graph similarity, and incorporate domain-specific knowledge in a principled manner lies at the core of all these problems.
The focus of this study is on classification problems across various biological networks. A straightforward approach to this problem is the use of topological and other descriptors (e.g., vertex degree, clustering coefficient, betweenness centrality) that summarize graph neighborhoods. These descriptors lead directly to vector-space representations of vertices or edges in the graph, after which standard machine learning algorithms can be applied to learn a target function [11, 62]. Another approach involves the use of kernel functions on graphs [58]. Kernels are mappings of pairs of objects from an input space to an output space with special properties, such as symmetry and positive semi-definiteness, that lead to efficient learning. Graph kernels often exploit ideas similar to traditional vector-space approaches. Finally, classification on graphs can be approached using probabilistic graphical models such as Markov random fields [30] and related label-propagation [66] or flow-based [40] methods. These "global" formulations are generally well adjusted to learning smooth functions over neighboring nodes.

Despite the success and wide adoption of these methods in machine learning and computational biology, it is well understood that graph representations suffer from information loss, since every edge can only encode pairwise relationships [29]. A protein complex, for instance, cannot be distinguished from a set of proteins that interact only pairwise. Such disambiguation, however, is important in order to understand the biological activity of these molecules. Hypergraphs, a generalization of graphs, naturally capture these higher-order relationships [5]. As we show later, they also provide a representation that can be used to unify several conventional classification problems on (hyper)graphs as a single vertex classification approach on hypergraphs.
In this paper, we present and evaluate a kernel-based framework for the problems of vertex classification, edge classification and link prediction in graphs and hypergraphs. We first use the concepts of hypergraph duality to demonstrate that all such classification problems can be unified through the use of hypergraphs. We then describe the development of edit-distance hypergraphlet kernels for vertex classification in hypergraphs and combine them with support vector machines into a semi-supervised predictive methodology. Finally, we use sixteen biological network data sets, eleven assembled specifically for this work, to provide evidence that the proposed approaches compare favorably to previously established methods.
2 Background
2.1 Graphs and hypergraphs
Graphs. A graph is a pair G = (V, E), where V is a set of vertices (nodes) and E ⊆ V × V is a set of edges. In a vertex-labeled graph, a labeling function is defined as ℓ : V → Σ, where Σ is a finite alphabet. Similarly, in an edge-labeled graph, another labeling function is given as λ : E → Ξ, where Ξ is also a finite set. A rooted graph is a graph together with one distinguished vertex called the root; we denote such graphs as G_v, where v is the root. A neighborhood graph of a vertex v is a rooted graph constructed from G such that all nodes at distance greater than a fixed radius from v (and the corresponding edges) are removed.
In this work we focus on undirected graphs (the order of the vertices in each edge can be ignored) that are simple (without self-loops). Additionally, for simplicity of presentation, we ignore weighted graphs; i.e., graphs where a non-negative number is associated with each vertex. Generalization of our approach and terminology to directed and weighted graphs is straightforward.
A walk of length k in a graph G is a sequence of nodes v_0, v_1, …, v_k such that (v_{i-1}, v_i) ∈ E for i = 1, …, k. If v_0 = v_k, the walk is called a cycle of length k. A path in G is a walk in which all nodes are distinct. A connected graph is a graph with a path between any two nodes.
Hypergraphs. A hypergraph is a pair H = (V, E), where V is the vertex set as previously defined and E is a family of non-empty subsets of V called hyperedges. As in the case of graphs, one can define vertex-labeled, edge-labeled, rooted, and neighborhood hypergraphs. A hyperedge e is said to be incident with a vertex v if v ∈ e, and two vertices are called adjacent if there is a hyperedge that contains both. The neighbors of a vertex v in a hypergraph are the vertices adjacent to v. Two hyperedges are said to be adjacent if their intersection is non-empty. Finally, the degree of a vertex v in a hypergraph is given by d(v) = |{e ∈ E : v ∈ e}|, whereas the degree of a hyperedge e is defined as its cardinality; that is, δ(e) = |e|.
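These definitions translate directly into a small data structure. The following is a minimal sketch (class and method names are ours, not from the paper) implementing vertex degree, hyperedge degree, and adjacency as defined above:

```python
class Hypergraph:
    """A hypergraph H = (V, E): V is a set of vertices and E is a family
    of non-empty subsets of V (the hyperedges)."""

    def __init__(self, vertices, hyperedges):
        self.V = set(vertices)
        self.E = [frozenset(e) for e in hyperedges]
        assert all(e and e <= self.V for e in self.E), "invalid hyperedge"

    def vertex_degree(self, v):
        # d(v): number of hyperedges incident with v
        return sum(1 for e in self.E if v in e)

    def hyperedge_degree(self, e):
        # delta(e): cardinality of the hyperedge
        return len(frozenset(e))

    def neighbors(self, v):
        # vertices adjacent to v, i.e., co-members of some hyperedge
        return {u for e in self.E if v in e for u in e} - {v}


# A 3-way hyperedge (e.g., a protein complex) plus a pairwise interaction.
H = Hypergraph({'a', 'b', 'c', 'd'}, [{'a', 'b', 'c'}, {'c', 'd'}])
```

Note that a single 3-way hyperedge is distinct from three pairwise edges, which is exactly the disambiguation motivating hypergraphs in the Introduction.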
A walk of length k in a hypergraph is a sequence of vertices and hyperedges v_0, e_1, v_1, e_2, …, e_k, v_k such that {v_{i-1}, v_i} ⊆ e_i for each i = 1, …, k. If v_0 = v_k, the walk is called a cycle of length k. A path in a hypergraph is a walk in which all vertices and hyperedges are distinct. A connected hypergraph is a hypergraph with a path between any two vertices.
Isomorphism. Consider two graphs, G = (V, E) and G′ = (V′, E′). We say that G and G′ are isomorphic, denoted G ≅ G′, if there exists a bijection f : V → V′ such that (u, v) ∈ E if and only if (f(u), f(v)) ∈ E′ for all u, v ∈ V. If H = (V, E) and H′ = (V′, E′) are hypergraphs, an isomorphism is defined as a pair of interrelated bijections f : V → V′ and g : E → E′ such that v ∈ e if and only if f(v) ∈ g(e) for all hyperedges e ∈ E. Isomorphic graphs (hypergraphs) are structurally identical. An automorphism is an isomorphism of a graph (hypergraph) to itself.
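For structures as small as the hypergraphlets used later, isomorphism can be tested by brute force over vertex bijections. An illustrative sketch (function name ours), ignoring labels for brevity:

```python
from itertools import permutations
from collections import Counter

def hypergraphs_isomorphic(V1, E1, V2, E2):
    """Brute-force isomorphism test: look for a bijection f: V1 -> V2 that
    maps the hyperedge family E1 exactly onto E2. Exponential in |V1|, so
    only practical for very small structures such as hypergraphlets."""
    V1, V2 = list(V1), list(V2)
    if len(V1) != len(V2) or len(E1) != len(E2):
        return False
    E1 = [frozenset(e) for e in E1]
    target = Counter(frozenset(e) for e in E2)
    for perm in permutations(V2):
        f = dict(zip(V1, perm))
        if Counter(frozenset(f[v] for v in e) for e in E1) == target:
            return True
    return False
```

A triangle of pairwise edges and a single 3-way hyperedge are not isomorphic, which is the running example of information loss in graph representations.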
Edit distance. Consider two vertex- and hyperedge-labeled hypergraphs H and H′. The edit distance between these hypergraphs corresponds to the minimum number of edit operations necessary to transform H into H′, where edit operations are defined as insertions/deletions of vertices/hyperedges and substitutions of vertex and hyperedge labels. Any sequence of edit operations that transforms H into H′ is referred to as an edit path; hence, the hypergraph edit distance between H and H′ corresponds to the length of the shortest edit path between them. This concept can be generalized to the case where each edit operation is assigned a cost; hypergraph edit distance then corresponds to the cost of the minimum-cost edit path.
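In general, computing hypergraph edit distance requires a search over vertex correspondences. In the restricted case of two hypergraphs over the same, already-aligned vertex set with unit-cost operations, the minimum-cost edit path can be read off directly; a hedged sketch of that special case only (naming ours):

```python
def edit_distance_fixed_vertices(labels1, E1, labels2, E2):
    """Unit-cost edit distance between two vertex-labeled hypergraphs that
    share the SAME vertex set: each differing vertex label needs one
    substitution, and each hyperedge present in only one of the two
    hypergraphs needs one insertion or deletion. These operations are
    independent, so their sum is the minimum edit-path cost in this
    restricted setting (no search over vertex correspondences needed)."""
    substitutions = sum(1 for v in labels1 if labels1[v] != labels2[v])
    indels = len(set(map(frozenset, E1)) ^ set(map(frozenset, E2)))
    return substitutions + indels
```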
2.2 Hypergraph duality
Let H = (V, E) be a hypergraph, where V = {v_1, v_2, …, v_n} and E = {e_1, e_2, …, e_m}. The dual hypergraph of H, denoted H* = (V*, E*), is obtained by constructing the set of vertices as V* = {e*_1, e*_2, …, e*_m}, one vertex per hyperedge of H, and the set of hyperedges as E* = {v*_1, v*_2, …, v*_n} such that v*_i = {e*_j : v_i ∈ e_j}. Figure 1A-B shows two examples of a hypergraph H and its dual hypergraph representation H*. Observe that the hyperedges of the original hypergraph H are the vertices of the dual hypergraph H*, whereas the hyperedges of H* are constructed from the hyperedges of H that are incident with the respective vertices.
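The construction of the dual is mechanical and easily checked in code; a small sketch (function name ours):

```python
def dual_hypergraph(E):
    """Construct the dual H* of a hypergraph with hyperedges E = [e_0, ...,
    e_{m-1}]: the dual vertices are the hyperedge indices 0..m-1, and there
    is one dual hyperedge per original vertex v, collecting the indices of
    all hyperedges incident with v."""
    E = [frozenset(e) for e in E]
    dual_vertices = list(range(len(E)))
    original_vertices = {v for e in E for v in e}
    dual_edges = {v: frozenset(j for j, e in enumerate(E) if v in e)
                  for v in original_vertices}
    return dual_vertices, dual_edges


# Hyperedges e_0 = {a, b}, e_1 = {b, c}, e_2 = {c}: vertex b is incident
# with e_0 and e_1, so its dual hyperedge is {0, 1}.
dv, de = dual_hypergraph([{'a', 'b'}, {'b', 'c'}, {'c'}])
```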
2.3 Classification on hypergraphs
We are interested in binary classification on hypergraphs. The following paragraphs briefly define three distinct classification problems, formulated here so as to naturally lead to the methodology proposed in the next section.
Vertex classification. Given a set of rooted hypergraphs {H_v}_{v ∈ V}, where each H_v corresponds to the same, possibly disconnected, hypergraph H rooted at a different vertex of interest v ∈ V, one aims to learn a classifier function f : V → {0, 1} using a labeled training set {(v_i, y_i)}_i, where y_i ∈ {0, 1}, as a means of assigning class labels to each unlabeled vertex in H. A number of classical problems in computational biology map straightforwardly to vertex classification; e.g., protein function prediction [50], disease gene prioritization [39], and so on.

Hyperedge classification. Given a possibly disconnected hypergraph H = (V, E), the objective is to learn a discriminant function f : E → {0, 1} from a labeled training set {(e_i, y_i)}_i, where y_i ∈ {0, 1}, and infer class annotations for every unlabeled hyperedge in H. An example of edge classification is the prediction of types of macromolecular interactions, such as positive vs. negative regulation.
Link prediction. Let H = (V, E) be a hypergraph with some missing hyperedges and let Ē be the set of all nonexistent hyperedges in H; i.e., Ē = P(V) \ E, where P(V) represents all possible hyperedges over V. The goal is to learn a target function f : Ē → {0, 1} and infer the existence of all missing hyperedges. Examples of link prediction include predicting protein-protein interactions, predicting drug-target interactions, and so on.
2.4 Positive-unlabeled learning
A number of prediction problems in computational biology can be considered within a semi-supervised framework, where a set of labeled and a set of unlabeled examples are used to construct classifiers that discriminate between positive and negative examples. A special category of semi-supervised learning occurs when the labeled data contain only positive examples; i.e., the negative examples are either unavailable or ignored, say, because the set of available negatives is small or biased. Such problems are generally referred to as learning from positive and unlabeled data, or positive-unlabeled learning [14]. Many prediction problems in molecular biology belong to the open world category; i.e., due to various experimental reasons, the absence of evidence of class labels is not evidence of absence. Such problems lend themselves naturally to the positive-unlabeled setting.

Research in machine learning has recently established tight connections between traditional supervised learning and (non-traditional) positive-unlabeled learning. Under mild conditions, a classifier that optimizes ranking performance (e.g., area under the ROC curve [16]) in the non-traditional setting has been shown to also optimize performance in the traditional setting [15, 6, 37]. Similar relationships have been established in approximating posterior distributions [24, 26] as well as in recovering the true performance of a classifier evaluated in a non-traditional setting [25]. The latter two problems require estimation of class priors; i.e., the fractions of positive and negative examples in (representative) unlabeled data [24, 26, 48].
3 Methods
3.1 Problem formulation
We consider binary classification problems on graphs and hypergraphs and propose to unify all such learning problems through semi-supervised vertex classification on hypergraphs. First, vertex classification falls trivially into this framework. Second, the problems of edge classification in graphs and hyperedge classification in hypergraphs are equivalent to vertex classification on dual hypergraphs. As discussed in Section 2.2, both graphs and hypergraphs give rise to dual hypergraph representations and, thus, (hyper)edge classification on a (hyper)graph H straightforwardly translates into vertex classification on its dual hypergraph H*. We note here that vertices of degree one in H give rise to self-loops in the dual hypergraph H*. To account for them, we add one dummy node per self-loop with the same vertex label as the original vertex and connect the two with an appropriately labeled edge. Third, one can similarly see link prediction as vertex classification on dual hypergraphs, where the set of existing links is treated as positive data, the set of known non-existing links is treated as negative data, and the remaining set of missing links is treated as unlabeled data. This formulation further requires an extension of dual hypergraph representations, as follows. Consider a particular negative or missing link e in the original hypergraph H with its dual hypergraph H* (Fig. 1C). To make a prediction on this edge e, we must first introduce a new vertex e* in the dual hypergraph as well as modify those hyperedges in H* that correspond to the vertices in e (Fig. 1C). We denote this extended hypergraph as H*_e. It now easily follows that the sets of negative and unlabeled examples can be created by considering a collection of extended hypergraphs H*_e, one at a time, for all non-existing links or a subset thereof.
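Building the extended dual hypergraph for a single candidate link amounts to a small amount of bookkeeping on top of the dual construction. A hedged sketch (naming ours; details may differ from the authors' implementation):

```python
def extended_dual(E, candidate):
    """Extended dual hypergraph for link prediction: append the candidate
    (missing) link as one extra hyperedge, then build the dual. The new
    dual vertex represents the candidate, and the dual hyperedges of its
    endpoints are enlarged accordingly; classifying that vertex decides
    the link."""
    E = [frozenset(e) for e in E] + [frozenset(candidate)]
    new_idx = len(E) - 1  # dual vertex corresponding to the candidate link
    vertices = {v for e in E for v in e}
    dual_edges = {v: frozenset(j for j, e in enumerate(E) if v in e)
                  for v in vertices}
    return new_idx, dual_edges


# Known edges {u,a} and {v,b}; querying the candidate link {u,v}.
idx, de = extended_dual([{'u', 'a'}, {'v', 'b'}], {'u', 'v'})
```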
Since most graph data in biological networks lack large sets of representative negative examples, we approach vertex classification, (hyper)edge classification, and link prediction as instances of vertex classification on (extended, dual) hypergraphs in a positive-unlabeled setting. We believe this is a novel and useful attempt at generalizing three distinct graph classification problems in a common kernel-based semi-supervised setting. The following sections introduce the hypergraphlet kernels that are the core of our classification approach.
3.2 Hypergraphlets
Hypergraphlets. Inspired by graphlets [44, 43], we define hypergraphlets as small, simple, connected, rooted hypergraphs. A hypergraphlet with n vertices is called an n-hypergraphlet; the i-th hypergraphlet of order n is denoted h_{n,i}. We consider hypergraphlets up to isomorphism and refer to these isomorphisms as root- and label-preserving isomorphisms when the hypergraphs are rooted and labeled. Figure 2 displays all non-isomorphic unlabeled hypergraphlets with up to three vertices. There is only one hypergraphlet of order 1 (h_{1,1}; Fig. 2A) and one hypergraphlet of order 2 (h_{2,1}; Fig. 2B). On the other hand, there are nine hypergraphlets of order 3 (h_{3,1}-h_{3,9}; Fig. 2C) and considerably more hypergraphlets of order 4 (not shown). We refer to all of these as base hypergraphlets, since they correspond to the case when the vertex-label and hyperedge-label alphabets each have size one.
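One way to make the enumeration concrete is to list, for a given root, the vertex subsets whose induced sub-hypergraph is connected. This is a plausible reading of rooted hypergraphlet occurrence counting, sketched below (the paper's exact enumeration may differ in details):

```python
from itertools import combinations

def connected(edges, S):
    # Breadth-first search over S using hyperedge adjacency.
    seen, frontier = set(), {next(iter(S))}
    while frontier:
        seen |= frontier
        frontier = {u for e in edges if e & frontier for u in e} - seen
    return seen == S

def rooted_hypergraphlet_subsets(E, root, n):
    """Enumerate vertex subsets S of size n containing `root` whose induced
    sub-hypergraph (the hyperedges of E fully contained in S) is connected.
    Each such occurrence would then be mapped to its isomorphism type."""
    E = [frozenset(e) for e in E]
    V = {v for e in E for v in e}
    found = []
    for extra in combinations(sorted(V - {root}), n - 1):
        S = frozenset((root,) + extra)
        induced = [e for e in E if e <= S]
        if connected(induced, S):
            found.append((S, tuple(induced)))
    return found
```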
Consider now a vertex- and hyperedge-labeled (or fully labeled, for short) hypergraphlet with n vertices and m hyperedges, where Σ_V and Σ_E denote the vertex-label and hyperedge-label alphabets, respectively. If |Σ_V| > 1 and/or |Σ_E| > 1, then automorphic structures with respect to the same base hypergraphlet may exist; hence, the number of fully labeled hypergraphlets per base structure is generally smaller than |Σ_V|^n · |Σ_E|^m. For example, if one only considers vertex-labeled 3-hypergraphlets, then there are |Σ_V|^3 vertex-labeled hypergraphlets corresponding to each asymmetric base hypergraphlet, but fewer corresponding to the base hypergraphlets with non-trivial symmetries; see Table 5. This is a result of symmetries in the base hypergraphlets that give rise to automorphisms among vertex-labeled structures. Similarly, if |Σ_E| > 1, then new symmetries may exist with respect to the base hypergraphlets that give rise to different automorphisms among hyperedge-labeled structures. In Section 8.1, we provide a more detailed discussion of these symmetries. The relevance of these symmetries and enumeration steps relates to the dimensionality of the Hilbert space in which the prediction is carried out.
3.3 Hypergraphlet kernels
Motivated by the case for graphs [52, 56, 35], we introduce hypergraphlet kernels. Let H = (V, E) be a fully labeled hypergraph, where ℓ : V → Σ_V is a vertex-labeling function and λ : E → Σ_E is a hyperedge-labeling function. The vertex- and hyperedge-labeled n-hypergraphlet count vector for any vertex v ∈ V is defined as

    φ_n(v) = (φ_{n,1}(v), φ_{n,2}(v), …, φ_{n,N}(v)),   (1)

where φ_{n,i}(v) is the count of the i-th fully labeled n-hypergraphlet rooted at v and N is the total number of vertex- and hyperedge-labeled n-hypergraphlets. A kernel function between the n-hypergraphlet counts for vertices u and v is defined as an inner product between φ_n(u) and φ_n(v); i.e.,

    k_n(u, v) = ⟨φ_n(u), φ_n(v)⟩.   (2)
The hypergraphlet kernel function incorporating all hypergraphlets up to order n is given by

    K(u, v) = Σ_{t=1}^{n} k_t(u, v),   (3)

where n is a small integer. In this work we keep n small due to the exponential growth of the number of base hypergraphlets.
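Given precomputed count vectors, Equations (1)-(3) reduce to sparse inner products, and the cosine normalization mentioned at the end of Section 3.4 is one extra line. A sketch using dictionaries as sparse vectors (the hypergraphlet identifiers are placeholders):

```python
import math
from collections import Counter

def k_inner(phi_u, phi_v):
    """Eq. (2): inner product of sparse count vectors stored as
    Counter({hypergraphlet_id: count}); missing entries count as zero."""
    if len(phi_u) > len(phi_v):
        phi_u, phi_v = phi_v, phi_u
    return sum(c * phi_v[g] for g, c in phi_u.items())

def k_cosine(phi_u, phi_v):
    """Cosine-normalized kernel: k(u,v) / sqrt(k(u,u) * k(v,v))."""
    denom = math.sqrt(k_inner(phi_u, phi_u) * k_inner(phi_v, phi_v))
    return k_inner(phi_u, phi_v) / denom if denom else 0.0

def K_total(phis_u, phis_v):
    """Eq. (3): sum of the per-order kernels k_t over orders t = 1..n."""
    return sum(k_inner(pu, pv) for pu, pv in zip(phis_u, phis_v))


# Toy count vectors over placeholder hypergraphlet identifiers.
u = Counter({'h3_1': 2, 'h3_2': 1})
v = Counter({'h3_1': 1, 'h3_3': 4})
```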
3.4 Edit-distance hypergraphlet kernels
Consider a fully labeled hypergraph H = (V, E). Given a vertex v ∈ V, we define the vector of counts for a generalized edit-distance hypergraphlet representation as

    ψ_n(v) = (ψ_{n,1}(v), ψ_{n,2}(v), …, ψ_{n,N}(v)),   (4)

where

    ψ_{n,i}(v) = Σ_{j : h_{n,j} ∈ E_τ(h_{n,i})} σ(h_{n,i}, h_{n,j}) · φ_{n,j}(v).   (5)

Here, E_τ(h_{n,i}) is the set of all hypergraphlets h_{n,j} such that there exists an edit path of total cost at most τ that transforms h_{n,i} into h_{n,j}, where τ is a user-defined constant. In words, the counts for each hypergraphlet h_{n,i} are updated by also counting all other hypergraphlets that are in the vicinity of h_{n,i}. The function σ can be used to adjust the weights of these pseudocounts. We set σ(h_{n,i}, h_{n,j}) = 1 for all i and j, and the cost of all edit operations was also set to 1; this restricts τ to the non-negative integers.
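With unit weights and unit costs, Equation (5) simply spreads each observed hypergraphlet count to every hypergraphlet within edit distance τ. A sketch, assuming a precomputed table of pairwise edit distances between hypergraphlet types (names and data layout ours):

```python
from collections import Counter

def pseudo_counts(phi, dist, tau):
    """Eq. (5) with sigma = 1 and unit edit costs: psi_i receives the count
    of every hypergraphlet whose edit distance to hypergraphlet i is at
    most tau. `dist` maps frozenset({gi, gj}) to a nonnegative integer
    distance; absent pairs are treated as farther away than any tau."""
    def d(a, b):
        return 0 if a == b else dist.get(frozenset((a, b)), float('inf'))
    types = set(phi) | {g for pair in dist for g in pair}
    return Counter({gi: sum(c for gj, c in phi.items() if d(gi, gj) <= tau)
                    for gi in types})
```

Setting tau = 0 recovers the plain counts of Equation (1), so the edit-distance kernel strictly generalizes the hypergraphlet kernel.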
The order-n edit-distance hypergraphlet kernel between vertices u and v can be computed as an inner product between the respective count vectors ψ_n(u) and ψ_n(v); i.e.,

    k_n^τ(u, v) = ⟨ψ_n(u), ψ_n(v)⟩.   (6)

Finally, the edit-distance hypergraphlet kernel function incorporating all hypergraphlets up to order n is given by

    K^τ(u, v) = Σ_{t=1}^{n} k_t^τ(u, v).   (7)
The edit operations considered here incorporate substitutions of vertex labels, substitutions of hyperedge labels, and insertions/deletions (indels) of hyperedges. Given these edit operations, we also define three subclasses of edit-distance hypergraphlet kernels, referred to as vertex label-substitution, hyperedge label-substitution, and hyperedge-indel kernels.
Although the functions from Equations (2) and (6) are defined as inner products, other formulations such as radial basis functions can be similarly considered [51]. We also note that the combined kernels from Equations (3) and (7) can be generalized beyond linear combinations [51]. For simplicity, however, we only explore equal-weight linear combinations and normalize the functions from Equations (2) and (6) using a cosine transformation.

3.5 Computational complexity
The implementation and analysis of hypergraphlet kernels is an extension of available solutions for string kernels [49]. Let H_v be a neighborhood hypergraph, as defined in Section 2.1, and suppose it is significantly smaller than the original hypergraph H. The running time of the hypergraphlet counting algorithm is governed by the maximum degree of a vertex, and the generation of the minimum-cost edit path incurs an additional cost per single hypergraphlet edit operation; the total number of operations per vertex is dominated by the number of possible hypergraphlets in H_v. Note that the possible number of hyperedges in a hypergraph can be significantly larger than the possible number of edges in a standard graph. Hence, in a practical setting, the edit-distance hypergraphlet kernels could greatly benefit from effective sampling techniques or the exploitation of special types of hypergraphlets. Our implementation computes hypergraphlet kernel functions in time linear in the number of non-zero elements of the count vectors.
4 Experiment design
In this section we summarize the classification problems, data sets, and evaluation methodology. The hypergraphlet kernels were evaluated on the problems of edge classification and link prediction, both of which require generation of dual hypergraphs followed by our vertex classification approach.
4.1 Data sets
Protein-protein interaction data. The protein-protein interaction (PPI) data was used for both edge classification and link prediction. In the context of edge classification, we are given a PPI network where each interaction is annotated as either a direct physical interaction or a co-membership in a complex. The objective is to predict the type of each interacting protein pair as physical vs. complex (PC). For this task, we used the budding yeast S. cerevisiae PPI network assembled by Ben-Hur and Noble [4].
Another important task in PPI networks is discovering whether two proteins interact. Despite the existence of high-throughput experimental methods for determining interactions between proteins, the PPI network data of all organisms are incomplete [59]. Furthermore, high-throughput PPI data contain a potentially large fraction of false positive interactions [59, 19, 33]. Therefore, there is a continued need for computational methods to help guide experiments for identifying novel interactions. Under this scenario, there are two classes of link prediction algorithms: (1) prediction of direct physical interactions [18, 47, 36, 4] and (2) prediction of co-membership in a protein complex [64, 46]. In this paper, we focused on the former task and assembled nine species-specific data sets comprised solely of direct protein-protein interaction data derived from public databases (BIND, BioGRID, DIP, HPRD, and IntAct) as of January 2017. We considered only one protein isoform per gene and used the experimental evidence types described by Lewis et al. [33]. Specifically, we constructed link prediction tasks for: (1) the bacterium E. coli (EC), (2) the budding yeast S. cerevisiae (SC), (3) the nematode worm C. elegans (CE), (4) the thale cress A. thaliana (AT), (5) the fruit fly D. melanogaster (DM), (6) the human H. sapiens (HS), (7) the fission yeast S. pombe (SP), (8) the brown rat R. norvegicus (RN), and (9) the house mouse M. musculus (MM).
Drug-target interaction data. Identification of interactions between drugs and target proteins is an area of growing interest in drug design and therapy [63, 61]. In a drug-target interaction (DTI) network, nodes correspond to either drugs or proteins, and edges indicate that a protein is a known target of the drug. Here we used DTI data for both edge classification and link prediction. In the context of edge labeling, we are given a DTI network where each interaction is annotated as direct (binding) or indirect, as well as assigned a mode of action as activating or inhibiting. The objective is to predict the type of each interaction between proteins and drug compounds. For this task, we derived two data sets: (1) indirect vs. direct (ID) binding, derived from MATADOR, and (2) activation vs. inhibition (AI), assembled from STITCH. Under the link prediction setting, the learning task is to predict drug-target protein interactions. In particular, we focus on four drug-target classes: (1) enzymes (EZ), (2) ion channels (IC), (3) G protein-coupled receptors (GR), and (4) nuclear receptors (NR), originally assembled by Yamanishi et al. [63]. Table 1 summarizes all data sets used in this work.
Table 1. Summary of the data sets used in this work.ᵃ Edge classification: PPI (PC) and DTI (ID and AI, each listing drugs and targets). Link prediction: PPI (EC, SC, CE, AT, DM, HS, SP, RN, MM) and DTI (EZ, IC, GR, NR, each listing drugs and targets). [Numeric entries are not shown.]
ᵃ The size of V and E is given for each data set.
4.2 Integrating domain knowledge via vertex alphabet
To incorporate domain knowledge into the PPI networks, we exploited the fact that each vertex (protein) in the graph is associated with its amino acid sequence. Two methods were used to develop the vertex alphabet. First, we mapped each protein into a vector of k-mer counts and then applied hierarchical clustering on these count vectors. The result of the clustering step assigned one of the resulting
vertex labels to each node. Second, we used protein sequences to predict their molecular and biological functions (Gene Ontology terms) using the FANN-GO algorithm [12]. Hierarchical clustering was subsequently applied to the predicted term scores to group proteins into broad functional categories. In the case of the DTI data, target proteins were annotated in a similar manner. For labeling drug compounds, we used the chemical structure similarity matrix computed with SIMCOMP [20], transformed it into a dissimilarity matrix, and applied hierarchical clustering to group compounds into structural categories.

4.3 Evaluation methodology
For each data set, we compared all hypergraphlet kernels to two in-house implementations of random walk kernels on hypergraphs. The random walk kernels were implemented as follows: given a hypergraph and two vertices u and v, simultaneous random walks were generated from u and v using random restarts. In contrast to random walks on standard graphs, a random walk on a hypergraph is a two-step process: at each step, one must (1) pick hyperedges incident with the current vertices and (2) pick destination vertices within those hyperedges. This process is repeated until a predefined number of steps is reached. In the conventional random walk implementation on hypergraphs, a walk was scored as 1 if the entire sequences of vertex and hyperedge labels encountered from u and from v matched; otherwise, the walk was scored as 0. After 10,000 steps, the scores over all walks were summed to produce a kernel value between u and v. In order to construct a random walk kernel similar to the edit-distance hypergraphlet approach, a cumulative random walk kernel was also implemented. Here, any match between the labels of the vertices, or of the hyperedges, in the i-th step of each walk was scored as 1, while a mismatch was scored as 0; a walk could therefore contribute anywhere from zero up to its full length to the total count. In each of the random walks, the probability of restart was selected from a small set of values and the result with the highest accuracy is reported. On the PPI data sets we also evaluated the performance of pairwise spectrum kernels [4]; the k-mer size was varied and the result with the highest accuracy is reported. Finally, in the case of the edit-distance kernels, we computed the set of normalized hypergraphlet kernel matrices for all parameter combinations obtained from a grid search, and the result with the highest accuracy is reported.

The performance of each method was evaluated through 10-fold cross-validation. In each iteration, 10% of the nodes in the network are selected for the test set, whereas the remaining 90% are used for training. Support vector machine (SVM) classifiers were used to construct all predictors and perform comparative evaluation; we used the default value for the capacity parameter [28]. Once each predictor was trained, we used Platt's correction to adjust its outputs to the [0, 1] range [41]. Finally, we estimated the area under the ROC curve (AUC), where the ROC curve plots the true positive rate (sensitivity) as a function of the false positive rate (1 - specificity).
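The AUC reported throughout can be computed directly from the classifier scores, without tracing the full ROC curve; a self-contained sketch equivalent to the normalized Mann-Whitney U statistic:

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive example is scored above a randomly chosen
    negative one (ties count as one half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

This quadratic-time form is the easiest to verify; a rank-based implementation achieves the same result in O(m log m) for large score sets.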
5 Results
Table 2. AUC estimates for each method on the edge classification data sets (PC, ID, AI): hypergraphlet kernels with and without domain information, random walk, and cumulative random walk kernels. [Most numeric entries are not shown; preserved values include 0.834 for the cumulative random walk and 0.781 and 0.845 for the edit-distance hypergraphlet kernel.]
5.1 Performance analysis on edge classification
We first evaluated the performance of hypergraphlet kernels in the task of predicting the types of interactions between pairs of proteins in a PPI network, as well as interaction types and modes of action in the DTI data. As described in Section 3.1, we first converted the input hypergraph to its dual hypergraph and then used the dual hypergraph for vertex classification. Table 2 lists the AUC estimates for each method and data set, and Figure 3 shows ROC curves for one representative data set from each classification task and network type. Observe that the edit-distance kernel outperformed the traditional hypergraphlet kernel on all data sets, and achieved the highest AUCs over random walk kernels on two of the three data sets. These results provide evidence of the feasibility of this alternative approach to edge classification via hypergraph duality.
Table 3. AUC estimates for each method on the PPI link prediction data sets (EC, SC, CE, AT, DM, HS, SP, RN, MM): hypergraphlet kernels with and without domain information, random walk, cumulative random walk, pairwise spectrum, and combined hypergraphlet + pairwise spectrum kernels. [Most numeric entries are not shown; preserved values include 0.900 and 0.911 (pairwise spectrum), 0.742 and 0.830 (hypergraphlet kernels), and 0.883, 0.858, 0.878, 0.818, and 0.847 for the combined hypergraphlet + pairwise spectrum kernel.]
Table 4. AUC estimates for each method on the DTI link prediction data sets (EZ, IC, GR, NR): hypergraphlet kernels with and without domain information, random walk, and cumulative random walk kernels. The best-performing hypergraphlet kernel variant achieved AUCs of 0.922 (EZ), 0.863 (IC), 0.858 (GR), and 0.941 (NR). [Remaining numeric entries are not shown.]
5.2 Performance analysis on link prediction
The performance of hypergraphlet kernels was further evaluated on the problem of link prediction on multiple PPI and DTI network data sets. Tables 3 and 4 show the performance accuracies for each hypergraph-based method across all link prediction data sets. These results demonstrate good performance of our methods, with edit-distance kernels generally performing best. The primary objective of our study was to present a new approach whose value will increase as biological data become more frequently modeled by hypergraphs; at this time, such data sets are not readily available.
5.3 Estimating interactome sizes
We used the AlphaMax algorithm [24] for estimating class priors in positive-unlabeled learning to estimate the number of missing links and misannotated (false positive) interactions in each PPI network. For example, if we assume a tissue- and cellular-component-agnostic model (i.e., any two proteins can interact), we obtain that the fraction of missing interactions in the largest component of the human PPI network (see Table 1) is about 5% (i.e., approximately 2.5 million interactions), while the fraction of misannotated interactions is close to 11%, which translates to about 4,985 interactions. In the case of yeast, we computed that less than 1% of the potential protein interactions are missing, which is close to 95,000, and that the fraction of misannotated interactions is close to 13%, or about 3,400 misannotated protein pairs. Some of these numbers fall within the ranges suggested by previous studies, which place the size of the yeast interactome between 13,500 [53] and 137,000 [22] interactions and the size of the human interactome between 130,000 [57] and 650,000 [53] interactions. A recent paper by Lewis et al. [33] presents a scenario in which the yeast and human interactome sizes could reach 400,000 and over two million interactions, respectively. In any case, we note that these estimates were made as a proof of concept for the proposed methodology, under the assumption of representative positive data. They can, however, serve as further validation of the usefulness of our problem formulation and underlying methodology. Additional tests and experiments, potentially involving exhaustive classifier and parameter optimization, will be necessary for more accurate and reliable estimates, especially for understanding the influence of potential biases within the PPI network data.
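The arithmetic behind such estimates is straightforward once the class priors are available (AlphaMax estimates the priors themselves; the inputs below are illustrative, not the paper's exact values):

```python
def interactome_estimates(n_labeled, n_unlabeled, alpha, beta):
    """Given alpha, the estimated fraction of true interactions among the
    unlabeled pairs, and beta, the estimated fraction of misannotated
    (false positive) pairs among the labeled interactions, return the
    implied numbers of missing and misannotated interactions."""
    return alpha * n_unlabeled, beta * n_labeled


# Illustrative inputs only: 45,000 labeled interactions, 50 million
# candidate pairs, alpha = 5%, beta = 11%.
missing, misannotated = interactome_estimates(45_000, 50_000_000, 0.05, 0.11)
```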
6 Related work
The literature on similarity-based measures for learning on hypergraphs is relatively scarce. Most studies revolve around the use of random walks for clustering, first used in the field of circuit design [13]. Historically, typical hypergraph-based learning approaches can be divided into (1) tensor-based approaches, which extend traditional matrix (spectral) methods on graphs to higher-order relations for hypergraph clustering [13, 10, 32], and (2) approximation-based approaches, which convert hypergraphs into standard weighted graphs and then exploit conventional graph clustering and (semi-)supervised learning [2, 65]. The methods from the first category provide a direct and mathematically rigorous treatment of hypergraph learning, although most tensor problems are NP-hard. As a consequence, this line of research remains largely unexplored despite a renewed interest in tensor decomposition approaches [21, 45]. Regarding the second category, there are two commonly used transformations for graph-based hypergraph approximation: (1) the star expansion and (2) the clique expansion. These methods are reviewed and compared by Agarwal et al. [1].

Under a supervised learning framework, Wachman and Khardon [60] propose random walk-based hypergraph kernels on ordered hypergraphs, while Sun et al. [54] present a hypergraph spectral learning formulation for multi-label classification. More recently, Bai et al. [3] introduced a hypergraph kernel that transforms a hypergraph into a directed line graph and computes a Weisfeiler-Lehman isomorphism test between directed graphs. A major drawback of most such approaches is that no graph representation fully captures the hypergraph structure. For instance, Ihler et al. [23] have shown that it is impossible to represent a hypergraph exactly via a graph while retaining its cut properties. Therefore, there is a need for a robust methodology for learning directly on hypergraph data.
7 Conclusions
This paper presents a learning framework for the problems of vertex classification, (hyper)edge classification, and link prediction in graphs and hypergraphs. The key to our approach is the use of hypergraph duality to cast each classification problem as an instance of vertex classification. This work also presents a new family of kernel functions defined directly on hypergraphs. In the terminology of Bleakley et al. [7], our method belongs to the category of “local” techniques; that is, it captures the structure of local neighborhoods rooted at the vertex of interest, and should be distinguished from “global” models such as Markov random fields or diffusion kernels [31]. The body of literature on graph learning is vast; we therefore chose to perform extensive comparisons against a limited set of methods that are most relevant to ours.
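The hypergraph duality used in our formulation can be sketched in a few lines, assuming hyperedges are given as vertex sets; the function name is hypothetical and labels are omitted for brevity.

```python
def dual_hypergraph(hyperedges):
    """Construct the dual hypergraph: each original hyperedge becomes a vertex
    of the dual (identified by its index), and each original vertex v induces a
    dual hyperedge containing the indices of the hyperedges incident on v.
    Classifying hyperedges of the original hypergraph thereby becomes vertex
    classification on the dual."""
    incident = {}
    for i, e in enumerate(hyperedges):
        for v in e:
            incident.setdefault(v, set()).add(i)
    return list(incident.values())

# Three hyperedges over vertices {a, b, c}; the dual has three vertices (0, 1, 2)
# and one dual hyperedge per original vertex.
dual = dual_hypergraph([{"a", "b"}, {"b", "c"}, {"c"}])
```

The appeal of this construction is that any vertex-classification machinery, including the hypergraphlet kernels introduced here, applies unchanged to the dual.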
The development of hypergraphlet kernels derives from the graph reconstruction conjecture, the idea of using small graphs to probe large graphs [8, 9]. Hypergraphlet kernels prioritize accuracy over run time and, it may be argued, do not follow recent trends in machine learning that generally trade accuracy for improved scalability and real-time performance. We therefore propose that hypergraphlet kernel approaches, in particular those based on edit distances, be used predominantly on sparse graphs of moderate size. Fortunately, all graphs used in this work fall into that category. Increased accuracy generally benefits experimental biologists, who typically use predictions to prioritize targets for experimental validation.
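The idea of probing a large graph with small rooted subgraphs can be sketched for plain graphs as follows. This toy counter handles only unlabeled rooted subgraphs on up to three vertices (an edge, a root-centered path, and a triangle), whereas the kernels in this work use a much richer labeled (hyper)graphlet family; the names are illustrative.

```python
from itertools import combinations

def rooted_graphlet_counts(adj, root):
    """Count small rooted subgraphs around `root` in a simple graph given as
    an adjacency dict mapping each vertex to the set of its neighbors."""
    nbrs = adj[root]
    counts = {"edge": len(nbrs), "path": 0, "triangle": 0}
    for u, v in combinations(sorted(nbrs), 2):
        if v in adj[u]:
            counts["triangle"] += 1  # root and both neighbors mutually connected
        else:
            counts["path"] += 1      # two neighbors joined only through the root
    return counts

# Root r with neighbors x, y, z, where x and y are also adjacent to each other.
adj = {"r": {"x", "y", "z"}, "x": {"r", "y"}, "y": {"r", "x"}, "z": {"r"}}
c = rooted_graphlet_counts(adj, "r")
```

Count vectors of this kind, compared across two rooted neighborhoods, are what a graphlet-style kernel reduces similarity computation to.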
The proposed methodology was evaluated on multiple data sets for edge classification and link prediction in biological networks. The results show that hypergraphlet kernels are competitive with other approaches and readily deployable in practice. Through limited tests, we also find that combining hypergraphlet kernels with pairwise spectrum kernels achieves better accuracy than either method does individually.
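Kernel combination of the kind reported above can be sketched as a convex sum of cosine-normalized kernel matrices. This is a standard construction, not necessarily the exact protocol used in our experiments, and the names are illustrative.

```python
import math

def normalize_kernel(K):
    """Cosine-normalize a kernel matrix (list of lists) so that every
    self-similarity K[i][i] equals 1."""
    d = [math.sqrt(K[i][i]) for i in range(len(K))]
    return [[K[i][j] / (d[i] * d[j]) for j in range(len(K))] for i in range(len(K))]

def combine_kernels(K1, K2, w=0.5):
    """Convex combination of two normalized kernels. A nonnegative weighted
    sum of valid (positive semidefinite) kernels is itself a valid kernel,
    so the result can be fed directly to any kernel classifier."""
    A, B = normalize_kernel(K1), normalize_kernel(K2)
    n = len(A)
    return [[w * A[i][j] + (1 - w) * B[i][j] for j in range(n)] for i in range(n)]

# Two toy 2x2 kernel matrices on the same pair of examples.
K1 = [[4.0, 2.0], [2.0, 9.0]]
K2 = [[1.0, 0.5], [0.5, 1.0]]
K = combine_kernels(K1, K2)
```

Normalizing before summing prevents one kernel's scale from dominating the combination, which matters when the two kernels count very different feature families.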
8 Acknowledgments
We thank Matthew Carey for his help in implementing hyperedge-indel kernels. This work was partially supported by National Science Foundation (NSF) grant DBI-1458477, National Institutes of Health (NIH) grant R01 MH105524, and the Indiana University Precision Health Initiative.
References
 [1] S. Agarwal, K. Branson, and S. Belongie. Higher order learning with graphs. In Proc. 23rd International Conference on Machine Learning, ICML ’06, pp. 17–24, 2006.

[2] S. Agarwal, J. Lim, L. Zelnik-Manor, P. Perona, D. Kriegman, and S. Belongie. Beyond pairwise clustering. In Proc. 18th Conference on Computer Vision and Pattern Recognition, CVPR ’05, pp. 838–845, 2005.

[3] L. Bai, P. Ren, and E. R. Hancock. A hypergraph kernel from isomorphism tests. In Proc. 22nd International Conference on Pattern Recognition, ICPR ’14, pp. 3880–3885, 2014.
[4] A. Ben-Hur and W. S. Noble. Kernel methods for predicting protein-protein interactions. Bioinformatics, 21(Suppl 1):i38–i46, 2005.
[5] C. Berge. Graphs and Hypergraphs. North-Holland, 1973.

[6] G. Blanchard, G. Lee, and C. Scott. Semi-supervised novelty detection. J Mach Learn Res, 11:2973–3009, 2010.

[7] K. Bleakley, G. Biau, and J.-P. Vert. Supervised reconstruction of biological networks with local models. Bioinformatics, 23(13):i57, 2007.
[8] J. A. Bondy and R. L. Hemminger. Graph reconstruction - a survey. J Graph Theory, 1(3):227–268, 1977.
 [9] C. Borgs, J. Chayes, L. Lovász, V. T. Sós, and K. Vesztergombi. Counting graph homomorphisms. In Topics Discrete Math, Algorithms and Combinatorics, pp. 315–371. Springer Berlin Heidelberg, 2006.
[10] S. R. Bulò and M. Pelillo. A game-theoretic approach to hypergraph clustering. In Proc. 22nd Advances in Neural Information Processing Systems, NIPS ’09, pp. 1571–1579, 2009.
[11] F. Chung-Graham. Spectral Graph Theory. CBMS Regional Conference Series in Mathematics, 1997.
 [12] W. T. Clark and P. Radivojac. Analysis of protein function and its prediction from amino acid sequence. Proteins, 79(7):2086–2096, 2011.
 [13] J. Cong, L. Hagen, and A. Kahng. Random walks for circuit clustering. In Proc. 4th International ASIC Conference, ASIC ’91, pp. P14–2.1–P14–2.4, 1991.
 [14] F. Denis, R. Gilleron, and F. Letouzey. Learning from positive and unlabeled examples. Theor Comput Sci, 348(1):70–83, 2005.
 [15] C. Elkan and K. Noto. Learning classifiers from only positive and unlabeled data. In Proc. 14th International Conference on Knowledge Discovery and Data Mining, KDD ’08, pp. 213–220, 2008.
 [16] T. Fawcett. An introduction to ROC analysis. Pattern Recogn Lett, 27:861–874, 2006.
[17] S. Fortunato. Community detection in graphs. Phys Rep, 486(3–5):75–174, 2010.
[18] S. M. Gomez, W. S. Noble, and A. Rzhetsky. Learning to predict protein-protein interactions from protein sequences. Bioinformatics, 19(15):1875–1881, 2003.
[19] G. T. Hart, A. K. Ramani, and E. M. Marcotte. How complete are current yeast and human protein-interaction networks? Genome Biol, 7(11):120, 2006.
[20] M. Hattori, Y. Okuno, S. Goto, and M. Kanehisa. Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. J Am Chem Soc, 125(39):11853–11865, 2003.
[21] M. Hein, S. Setzer, L. Jost, and S. S. Rangapuram. The total variation on hypergraphs - learning on hypergraphs revisited. In Proc. 26th Advances in Neural Information Processing Systems, NIPS ’13, pp. 2427–2435, 2013.
[22] H. Huang, B. M. Jedynak, and J. S. Bader. Where have all the interactions gone? Estimating the coverage of two-hybrid protein interaction maps. PLoS Comput Biol, 3(11):1–20, 2007.
 [23] E. Ihler, D. Wagner, and F. Wagner. Modeling hypergraphs by graphs with the same mincut properties. Inform Process Lett, 45(4):171–175, 1993.
 [24] S. Jain, M. White, and P. Radivojac. Estimating the class prior and posterior from noisy positives and unlabeled data. In Proc. 30th Advances in Neural Information Processing Systems, NIPS ’16, pp. 2693–2701, 2016.

[25] S. Jain, M. White, and P. Radivojac. Recovering true classifier performance in positive-unlabeled learning. In Proc. 31st AAAI Conference on Artificial Intelligence, AAAI ’17, 2017.

[26] S. Jain, M. White, M. W. Trosset, and P. Radivojac. Nonparametric semi-supervised learning of class proportions. arXiv preprint arXiv:1601.01944, 2016.
[27] C. Jiang, F. Coenen, and M. Zito. A survey of frequent subgraph mining algorithms. Knowl Eng Rev, 28(1):75–105, 2013.
 [28] T. Joachims. Learning to classify text using support vector machines: methods, theory, and algorithms. Kluwer Academic Publishers, 2002.
[29] S. Klamt, U.-U. Haus, and F. Theis. Hypergraphs and cellular networks. PLoS Comput Biol, 5(5):1–6, 2009.
 [30] D. Koller and N. Friedman. Probabilistic graphical models: Principles and Techniques. MIT Press, 2009.
 [31] R. I. Kondor and J. D. Lafferty. Diffusion kernels on graphs and other discrete structures. In Proc. 19th International Conference on Machine Learning, ICML ’02, pp. 315–322, 2002.
 [32] M. Leordeanu and C. Sminchisescu. Efficient hypergraph clustering. In Proc. 15th International Conference on Artificial Intelligence and Statistics, volume 22 of AISTATS ’12, pp. 676–684, 2012.
[33] A. C. F. Lewis, N. S. Jones, M. A. Porter, and C. M. Deane. What evidence is there for the homology of protein-protein interactions? PLoS Comput Biol, 8(9):1–14, 2012.
[34] D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. J Am Soc Inf Sci Technol, 58(7):1019–1031, 2007.
[35] J. Lugo-Martinez and P. Radivojac. Generalized graphlet kernels for probabilistic inference in sparse graphs. Network Science, 2(2):254–276, 2014.
[36] S. Martin, D. Roe, and J.-L. Faulon. Predicting protein-protein interactions using signature products. Bioinformatics, 21(2):218–226, 2005.
 [37] A. K. Menon, B. van Rooyen, C. S. Ong, and R. C. Williamson. Learning from corrupted binary labels via classprobability estimation. In Proc. 32nd International Conference on Machine Learning, ICML ’15, pp. 125–134, 2015.
[38] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: simple building blocks of complex networks. Science, 298:824–827, 2002.
[39] Y. Moreau and L.-C. Tranchevent. Computational tools for prioritizing candidate genes: boosting disease gene discovery. Nat Rev Genet, 13(8):523–536, 2012.
[40] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh. Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps. Bioinformatics, 21(Suppl 1):i302–i310, 2005.
 [41] J. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In Advances in Large Margin Classifiers, MIT Press, pp. 61–74, 2000.
 [42] G. Pólya. Kombinatorische anzahlbestimmungen für gruppen, graphen und chemische verbindungen. Acta Math, 68:145–254, 1937.
 [43] N. Przulj. Biological network comparison using graphlet degree distribution. Bioinformatics, 23(2):e177–e183, 2007.
[44] N. Przulj, D. G. Corneil, and I. Jurisica. Modeling interactome: scale-free or geometric? Bioinformatics, 20(18):3508–3515, 2004.
[45] P. Purkait, T.-J. Chin, H. Ackermann, and D. Suter. Clustering with hypergraphs: the case for large hyperedges. In Proc. 13th European Conference on Computer Vision, ECCV ’14, pp. 672–687, 2014.
[46] J. Qiu and W. S. Noble. Predicting co-complexed protein pairs from heterogeneous data. PLoS Comput Biol, 4(4):e1000054, 2008.
 [47] A. K. Ramani and E. M. Marcotte. Exploiting the coevolution of interacting proteins to discover interaction specificity. J Mol Biol, 327(1):273–284, 2003.
 [48] H. G. Ramaswamy, C. Scott, and A. Tewari. Mixture proportion estimation via kernel embedding of distributions. arXiv preprint arXiv:1603.02501, 2016.
[49] K. Rieck and P. Laskov. Linear-time computation of similarity measures for sequential data. J Mach Learn Res, 9:23–48, 2008.
[50] R. Sharan, I. Ulitsky, and R. Shamir. Network-based prediction of protein function. Mol Syst Biol, 3:88, 2007.
 [51] J. ShaweTaylor and N. Cristianini. Kernel methods for pattern analysis. Cambridge University Press, Cambridge CB2 8RU, UK, 4th edition, 2001.
 [52] N. Shervashidze, S. V. N. Vishwanathan, T. H. Petri, K. Mehlhorn, and K. M. Borgwardt. Efficient graphlet kernels for large graph comparison. In Proc. 12th International Conference on Artificial Intelligence and Statistics, AISTATS ’09, pp. 488–495, 2009.
 [53] M. P. H. Stumpf, T. Thorne, E. de Silva, R. Stewart, H. J. An, M. Lappe, and C. Wiuf. Estimating the size of the human interactome. Proc Natl Acad Sci USA, 105(19):6959–6964, 2008.
[54] L. Sun, S. Ji, and J. Ye. Hypergraph spectral learning for multi-label classification. In Proc. 14th International Conference on Knowledge Discovery and Data Mining, KDD ’08, pp. 668–676, 2008.
 [55] K. Tsuda and H. Saigo. Graph classification. In Managing and Mining Graph Data, volume 40 of Advances in Database Systems, pp. 337–363, 2010.
 [56] V. Vacic, L. M. Iakoucheva, S. Lonardi, and P. Radivojac. Graphlet kernels for prediction of functional residues in protein structures. J Comput Biol, 17(1):55–72, 2010.
 [57] K. Venkatesan et al. An empirical framework for binary interactome mapping. Nat Methods, 6:83–90, 2009.
 [58] S. V. N. Vishwanathan, N. N. Schraudolph, R. I. Kondor, and K. M. Borgwardt. Graph kernels. J Mach Learn Res, 11:1201–1242, 2010.
[59] C. von Mering, R. Krause, B. Snel, M. Cornell, S. G. Oliver, S. Fields, and P. Bork. Comparative assessment of large-scale data sets of protein-protein interactions. Nature, 417(6887):399–403, 2002.
 [60] G. Wachman and R. Khardon. Learning from interpretations: a rooted kernel for ordered hypergraphs. In Proc. 24th International Conference on Machine Learning, ICML ’07, pp. 943–950, 2007.

[61] Y. Wang and J. Zeng. Predicting drug-target interactions using restricted Boltzmann machines. Bioinformatics, 29(13):i126, 2013.

[62] J. Xu and Y. Li. Discovering disease genes by topological features in human protein-protein interaction network. Bioinformatics, 22(22):2800–2805, 2006.
[63] Y. Yamanishi, M. Araki, A. Gutteridge, W. Honda, and M. Kanehisa. Prediction of drug-target interaction networks from the integration of chemical and genomic spaces. Bioinformatics, 24(13):i232–i240, 2008.
[64] L. V. Zhang, S. L. Wong, O. D. King, and F. P. Roth. Predicting co-complexed protein pairs using genomic and proteomic data integration. BMC Bioinformatics, 5(1):38, 2004.
 [65] D. Zhou, J. Huang, and B. Schölkopf. Learning with hypergraphs: Clustering, classification, and embedding. In Proc. 19th Advances in Neural Information Processing Systems, NIPS ’06, pp. 1601–1608, 2006.
[66] X. Zhu and Z. Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University, 2002.
Appendix
8.1 Enumeration of labeled hypergraphlets
Here we characterize the feature space of fully labeled hypergraphlets by describing the dimensionality of the count vectors. We are interested in the order of growth of this dimensionality as a function of the hypergraphlet order, the number of hyperedges, and the sizes of the vertex- and hyperedge-label alphabets.
Suppose two base hypergraphlets have the same number of vertices and hyperedges. We say that they belong to the same equivalence class if and only if the total numbers of (non-isomorphic) fully labeled hypergraphlets corresponding to the two base cases are equal for any vertex- and hyperedge-label alphabets. The total counts of labeled hypergraphlets over all alphabet sizes thus induce a partition of base hypergraphlets into equivalence classes. For example, the set of vertex- and hyperedge-labeled hypergraphlets can be partitioned into two symmetry classes in one case and into seven symmetry classes in another. Table 5 summarizes the equivalence classes induced by partitioning base hypergraphlets up to the orders considered in this work, along with the cardinality of each set. Overall, observe that these cardinalities can be significantly larger than those reported for graphlets [35], because the possible number of hyperedges in a hypergraphlet is generally much larger than the possible number of edges in a graphlet. Additionally, hyperedge labels require two base hypergraphlets in the same equivalence class to have an equal number of hyperedges.
This approach can be generalized to hypergraphlets labeled by vertex and hyperedge alphabets of any size: the total dimensionality of the count vectors decomposes as a sum over equivalence classes, in which each class contributes its cardinality multiplied by the number of (non-isomorphic) fully labeled hypergraphlets corresponding to any one base hypergraphlet from that class. We use this decomposition to compute the total dimensionality by first finding the equivalence classes of the base hypergraphlets and then counting the number of labeled hypergraphlets for any one member of each class.
The counts are derived separately for vertex-labeled hypergraphlets and for fully labeled hypergraphlets.
In the case of undirected fully labeled hypergraphlets, the dimensionality can also be computed by applying the theory of enumeration developed by Pólya [42]. To derive the complete generating function for each equivalence class, we first define the automorphism group A(H) of a given vertex- and hyperedge-labeled hypergraph H; that is, in the case of fully labeled hypergraphs, A(H) is a collection of permutations (automorphisms) of the vertex set and the hyperedge set. The counting problem can then be reformulated as follows. Let H be a base hypergraphlet with n vertices and m hyperedges, and let A(H) be the automorphism group of H over its vertices and hyperedges. Each permutation a in A(H) can be written uniquely as the product of disjoint cycles, and for each integer k (1 ≤ k ≤ n) and l (1 ≤ l ≤ m) we define j_k(a) and c_l(a) as the number of vertex cycles of length k and hyperedge cycles of length l, respectively, in the disjoint cycle expansion of a. The cycle index of A(H), denoted Z(A(H)), is the polynomial

Z(A(H)) = (1/|A(H)|) Σ_{a ∈ A(H)} Π_{k=1..n} x_k^{j_k(a)} Π_{l=1..m} y_l^{c_l(a)}.

By applying Pólya’s theorem in the context of enumerating vertex- and hyperedge-labeled hypergraphlets corresponding to any base hypergraphlet in a given equivalence class, the count is determined by substituting the vertex-alphabet size for each variable x_k and the hyperedge-alphabet size for each variable y_l in Z(A(H)), where A(H) is the automorphism group of a base hypergraphlet from that class. As an example, consider the equivalence class whose unlabeled representative is illustrated in Figure 2; writing out the cycle index of its automorphism group and performing these substitutions immediately yields the number of fully labeled hypergraphlets for that class.
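The Pólya-style counting above can be checked by brute force on a tiny example via Burnside's lemma, which the alphabet-size substitution instantiates: the number of distinct labelings equals the group average of the alphabet size raised to the number of cycles. The three-vertex path and its automorphism group below are purely illustrative and are not one of the hypergraphlet classes from this work.

```python
def count_labelings_burnside(n, automorphisms, k):
    """Count distinct vertex labelings of an n-vertex structure over a
    k-letter alphabet, up to the given automorphism group. By Burnside's
    lemma this is the average, over the group, of k raised to the number of
    cycles in each permutation (given as a tuple mapping index -> image)."""
    total = 0
    for perm in automorphisms:
        seen, cycles = set(), 0
        for i in range(n):          # count cycles in the disjoint cycle expansion
            if i not in seen:
                cycles += 1
                j = i
                while j not in seen:
                    seen.add(j)
                    j = perm[j]
        total += k ** cycles
    return total // len(automorphisms)

# Path a-b-c: the automorphism group is the identity plus the endpoint swap,
# so with a binary alphabet there are (2^3 + 2^2) / 2 = 6 distinct labelings.
auts = [(0, 1, 2), (2, 1, 0)]
n_distinct = count_labelings_burnside(3, auts, 2)
```

The same computation with the cycle index would substitute k for each cycle variable, illustrating why the substitution in the text yields the class counts directly.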