In many real-world network datasets such as co-authorship, co-citation, email communication, etc., relationships are complex and go beyond pairwise associations. Hypergraphs provide a flexible and natural framework for modelling such complex relationships. For example, in a co-authorship network an author (hyperedge) can be a co-author of more than two documents (vertices).
The prevalence of such complex relationships in real-world networks naturally motivates the problem of learning with hypergraphs Zhou et al. (2007); Hein et al. (2013); Zhang et al. (2017); Feng et al. (2019). A popular learning paradigm is graph-based / hypergraph-based semi-supervised learning (SSL), where the goal is to assign labels to initially unlabelled vertices in a graph / hypergraph Chapelle et al. (2010); Zhu et al. (2009); Subramanya and Talukdar (2014). While many techniques have used explicit Laplacian regularisation in the objective Zhou et al. (2003); Zhu et al. (2003); Chapelle et al. (2003); Weston et al. (2008), the state-of-the-art neural methods encode the graph / hypergraph structure implicitly via a neural network Kipf and Welling (2017); Atwood and Towsley (2016); Feng et al. (2019) (the data matrix contains the initial features on the vertices, for example, text attributes for documents).
While explicit Laplacian regularisation assumes similarity among vertices in each edge / hyperedge, the implicit regularisation of graph convolutional networks (GCNs) Kipf and Welling (2017) avoids this restriction and enables application to a broader range of problems, e.g., in combinatorial optimisation Gong et al. (2019); Lemos et al. (2019); Prates et al. (2019); Li et al. (2018c, 2019); Norcliffe-Brown et al. (2018); Wang et al. (2018, 2019a); Yao et al. (2019); Marcheggiani and Titov (2017). In this work, we propose HyperGCN, a novel training scheme for a GCN on hypergraphs, and show its effectiveness not only in SSL, where hyperedges encode similarity, but also in combinatorial optimisation, where hyperedges do not encode similarity. Combinatorial optimisation on hypergraphs has recently been highlighted as crucial for real-world network analysis Amburg et al. (2019); Nguyen et al. (2019).
Methodologically, HyperGCN approximates each hyperedge of the hypergraph by a set of pairwise edges connecting the vertices of the hyperedge and treats the learning problem as a graph learning problem on the approximation. While the state-of-the-art hypergraph neural network (HGNN) Feng et al. (2019) approximates each hyperedge by a clique and hence requires a quadratic number of edges, i.e., |e|(|e| − 1)/2, for each hyperedge e, our method, HyperGCN, requires only a linear number of edges (i.e., 2|e| − 3) for each hyperedge. The advantage of this linear approximation is evident in Table 1, where a faster variant of our method has lower training time on synthetic data (with higher density as well) for the densest k-subhypergraph problem and for SSL on real-world hypergraphs (DBLP and Pubmed). In summary, we make the following contributions:
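To make the size difference concrete, the two approximation costs can be compared with a small sketch (assuming, as described later in Section 4.3, that the mediator-based approximation uses 2|e| − 3 edges per hyperedge; the function names are ours):

```python
def clique_edges(size):
    """Edges needed when a hyperedge is expanded into a clique (HGNN)."""
    return size * (size - 1) // 2

def mediator_edges(size):
    """Edges needed by the mediator-based approximation (HyperGCN): 2|e| - 3."""
    return 2 * size - 3

# For a hyperedge of size 10, the clique needs 45 edges, the mediator graph 17.
```

The gap grows quadratically with the hyperedge size, which is why the linear approximation pays off on datasets with large hyperedges.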
Table 1: Average training time of an epoch (lower is better). Columns: Model / Metric, Training time, Density, DBLP, Pubmed.
We propose HyperGCN, a novel method of training a graph convolutional network (GCN) on hypergraphs using existing tools from spectral theory of hypergraphs (Section 4).
While the motivation of HyperGCN is based on similarity of vertices in a hyperedge, we show that it can be used effectively for combinatorial optimisation where hyperedges do not encode similarity.
2 Related work
In this section, we discuss related work and then the background in the next section.
Deep learning on graphs: Geometric deep learning Bronstein et al. (2017) is an umbrella phrase for emerging techniques attempting to generalise (structured) deep neural network models to non-Euclidean domains such as graphs and manifolds. Graph convolutional network (GCN) Kipf and Welling (2017) defines the convolution using a simple linear function of the graph Laplacian and is shown to be effective on semi-supervised classification on attributed graphs. The reader is referred to a comprehensive literature review Bronstein et al. (2017) and extensive surveys Hamilton et al. (2017); Battaglia et al. (2018); Zhang et al. (2018); Sun et al. (2018); Wu et al. (2019)
on this topic of deep learning on graphs.
Learning on hypergraphs: The clique expansion of a hypergraph was introduced in a seminal work Zhou et al. (2007) and has become popular Agarwal et al. (2006); Satchidanand et al. (2015); Feng et al. (2018). Hypergraph neural networks Feng et al. (2019)
use the clique expansion to extend GCNs to hypergraphs. Another line of work uses mathematically appealing tensor methods Shashua et al. (2006); Bulò and Pelillo (2009); Kolda and Bader (2009), but they are limited to uniform hypergraphs. Recent developments, however, work for arbitrary hypergraphs and fully exploit the hypergraph structure Hein et al. (2013); Zhang et al. (2017); Chan and Liang (2018); Li and Milenkovic (2018b); Chien et al. (2019).
Graph-based SSL: Researchers have shown that using unlabelled data in training can improve learning accuracy significantly. This topic is so popular that it has influential books Chapelle et al. (2010); Zhu et al. (2009); Subramanya and Talukdar (2014).
Graph neural networks for combinatorial optimisation: Graph-based deep models have recently been shown to be effective as learning-based approaches for NP-hard problems such as maximal independent set, minimum vertex cover, etc. Li et al. (2018c), the decision version of the traveling salesman problem Prates et al. (2019), graph colouring Lemos et al. (2019), and clique optimisation Gong et al. (2019).
3 Background: Graph convolutional network
Let G = (V, E), with |V| = n, be a simple undirected graph with adjacency matrix A, and let X be the data matrix containing a p-dimensional real-valued feature vector for each node.
The basic formulation of graph convolution Kipf and Welling (2017) stems from the convolution theorem Mallat (1999), and it can be shown that the convolution of a real-valued graph signal S with a filter signal is approximately (w1 + w2 L̃) S, where w1 and w2 are learned weights and L̃ = (2/λmax) L − I is the scaled graph Laplacian; λmax is the largest eigenvalue of the symmetrically-normalised graph Laplacian L = I − D^(−1/2) A D^(−1/2), where D is the diagonal degree matrix with elements D_ii = Σ_j A_ij. The filter depends on the structure of the graph (the graph Laplacian L). The detailed derivation from the convolution theorem uses existing tools from graph signal processing Shuman et al. (2013); Hammond et al. (2011); Bronstein et al. (2017) and is provided in the supplementary material. The key point here is that the convolution of two graph signals is a linear function of the graph Laplacian L.
Table 2: Notation used in this work — graph (left) and hypergraph (right) counterparts.
|an undirected simple graph||an undirected hypergraph|
|set of nodes||set of hypernodes|
|set of edges||set of hyperedges|
|number of nodes||number of hypernodes|
|graph Laplacian||hypergraph Laplacian|
|graph adjacency matrix||hypergraph incidence matrix|
The graph convolution for the p different graph signals contained in the data matrix X, with learned weights Θ for a layer with h hidden units, is Â X Θ, where Â is the adjacency matrix after the renormalisation trick Kipf and Welling (2017); the derivation is in the supplementary.
GCN Kipf and Welling (2017)
The forward model for a simple two-layer GCN takes the following simple form:
Z = softmax( Â ReLU( Â X Θ(1) ) Θ(2) ),
where Θ(1) is an input-to-hidden weight matrix for a hidden layer with h hidden units and Θ(2) is a hidden-to-output weight matrix. The softmax activation function, softmax(x_i) = exp(x_i) / Σ_j exp(x_j), is applied row-wise.
GCN training for SSL: For multi-class classification with q classes, we minimise the cross-entropy loss over the set of labelled examples, i.e., the sum of −ln Z_{i,y_i} over all labelled nodes i with label y_i. The weights Θ(1) and Θ(2) are trained using gradient descent.
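The two-layer forward model and the masked loss above can be sketched in a few lines of NumPy; this is an illustrative re-implementation under the definitions given, not the authors' code, and the toy matrices in the usage are arbitrary:

```python
import numpy as np

def normalised_adjacency(A):
    """Renormalisation trick: A_hat = D^{-1/2} (A + I) D^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def gcn_forward(A, X, W1, W2):
    """Two-layer GCN: softmax(A_hat @ ReLU(A_hat @ X @ W1) @ W2), row-wise softmax."""
    A_hat = normalised_adjacency(A)
    H = np.maximum(A_hat @ X @ W1, 0.0)            # hidden layer with ReLU
    logits = A_hat @ H @ W2
    Z = np.exp(logits - logits.max(axis=1, keepdims=True))
    return Z / Z.sum(axis=1, keepdims=True)

def masked_cross_entropy(Z, Y, labelled):
    """Cross-entropy computed over the labelled nodes only."""
    return -np.mean([np.log(Z[i, Y[i]] + 1e-12) for i in labelled])
```

In practice the weights would be trained with gradient descent (e.g., via an autodiff framework); the sketch only shows the forward pass and the masked loss.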
A summary of the notations used throughout our work is shown in Table 2.
4 HyperGCN: Hypergraph Convolutional Network
We consider semi-supervised hypernode classification on an undirected hypergraph H = (V, E), with |V| = n, and a small set of labelled hypernodes. Each hypernode is also associated with a feature vector of dimension p, given by the data matrix X. The task is to predict the labels of all the unlabelled hypernodes, that is, all the hypernodes not in the labelled set.
Overview: The crucial working principle here is that hypernodes in the same hyperedge are similar and hence are likely to share the same label Zhang et al. (2017). Suppose we use S to denote some real-valued representation of the hypernodes; then, for any hyperedge e ∈ E, the function max_{i,j ∈ e} (S_i − S_j)² will be "small" only if the representations of the hypernodes in e are "close" to each other. Therefore, the sum of this quantity over all hyperedges, used as a regulariser, is likely to achieve the objective of hypernodes in the same hyperedge having similar representations. However, instead of using it as an explicit regulariser, we can achieve the same goal by using a GCN over an appropriately defined Laplacian of the hypergraph. In other words, we use the notion of a hypergraph Laplacian as an implicit regulariser which achieves this objective.
A hypergraph Laplacian with the same underlying motivation as stated above was proposed in prior works Chan et al. (2018); Louis (2015). We present this Laplacian first. Then we run a GCN over the simple graph associated with this hypergraph Laplacian. We call the resulting method 1-HyperGCN (as each hyperedge is approximated by exactly one pairwise edge). One epoch of 1-HyperGCN is shown in Figure 1.
4.1 Hypergraph Laplacian
As explained before, the key element for a GCN is the graph Laplacian of the given graph. Thus, in order to develop a GCN-based SSL method for hypergraphs, we first need to define a Laplacian for hypergraphs. One such Laplacian Chan et al. (2018) (see also Louis (2015)) is a non-linear function S ↦ L(S) (the Laplacian matrix for graphs can be viewed as the linear function S ↦ LS).
Definition 1 (Hypergraph Laplacian, Chan et al. (2018); Louis (2015)). (Footnote 1: The problem of breaking ties in choosing i_e (resp. j_e) is non-trivial, as shown in Chan et al. (2018). Breaking ties randomly was proposed in Louis (2015), but Chan et al. (2018) showed that this might not work for all applications (see Chan et al. (2018) for more details). Chan et al. (2018) gave a way to break ties and proved the correctness of their tie-breaking rule for the problems they studied. We chose to break ties randomly because of its simplicity and efficiency.)
Given a real-valued signal S defined on the hypernodes, L(S) is computed as follows.
For each hyperedge e ∈ E, let (i_e, j_e) := argmax_{i,j ∈ e} |S_i − S_j|, breaking ties randomly (footnote 1).
A weighted graph G_S on the vertex set V is constructed by adding the edges {i_e, j_e} with weights w(e) to it, where w(e) is the weight of the hyperedge e. Next, self-loops are added to each vertex v such that the degree of v in G_S equals its degree in the hypergraph. Let A_S denote the weighted adjacency matrix of the graph G_S.
The symmetrically normalised hypergraph Laplacian is L(S) = (I − D^(−1/2) A_S D^(−1/2)) S.
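The edge-selection step of the construction above can be sketched as follows (a minimal illustration: the self-loop step is omitted for brevity, tie-breaking is deterministic rather than random, and the function name is ours):

```python
import numpy as np

def hypergraph_to_graph(n, hyperedges, weights, S):
    """Build the weighted adjacency matrix of the graph G_S for a signal S.

    For each hyperedge e, pick (i_e, j_e) = argmax_{i,j in e} |S_i - S_j|
    (ties broken by first occurrence here; the paper breaks them randomly)
    and add the edge {i_e, j_e} with the hyperedge's weight w(e).
    """
    A = np.zeros((n, n))
    for e, w in zip(hyperedges, weights):
        i_e, j_e = max(((i, j) for i in e for j in e if i < j),
                       key=lambda p: abs(S[p[0]] - S[p[1]]))
        A[i_e, j_e] += w
        A[j_e, i_e] += w
    return A
```

Only one pair per hyperedge enters the graph, which is exactly what makes this Laplacian a non-linear function of the signal S.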
By following the Laplacian construction steps outlined in Section 4.1, we end up with the simple graph G_S with normalised adjacency matrix Ā_S. We now perform GCN over this simple graph. The graph convolution operation in Equation (1), when applied to a hypernode v, in the neural message-passing framework Gilmer et al. (2017) is h_v^(τ+1) = σ( Σ_{u ∈ N(v)} ā_{v,u} (h_u^(τ) Θ^(τ)) ). Here, τ is the epoch number, h_v^(τ+1) is the new hidden layer representation of node v, σ is a non-linear activation function, Θ is a matrix of learned weights, N(v) is the set of neighbours of v, ā_{v,u} is the weight on the edge {v, u} after normalisation, and h_u^(τ) is the previous hidden layer representation of the neighbour u. We note that, along with the embeddings of the hypernodes, the adjacency matrix A_S is also re-estimated in each epoch.
Figure 1 shows a hypernode v with five hyperedges incident on it. We consider exactly one representative simple edge {i_e, j_e} for each hyperedge e, chosen to maximise |S_{i_e} − S_{j_e}| in the given epoch. Because of this, the hypernode v may not be a part of all representative simple edges (only three are shown in the figure). We then use the traditional graph convolution operation on v, considering only the simple edges incident on it. Note that we apply the operation to each hypernode in each epoch of training until convergence.
Connection to total variation on hypergraphs: Our 1-HyperGCN model can be seen as performing implicit regularisation based on the total variation on hypergraphs Hein et al. (2013). In that prior work, explicit regularisation and only the hypergraph structure is used for hypernode classification in the SSL setting. HyperGCN, on the other hand, can use both the hypergraph structure and also exploit any available features on the hypernodes, e.g., text attributes for documents.
4.3 HyperGCN: Enhancing 1-HyperGCN with mediators
One peculiar aspect of the hypergraph Laplacian discussed above is that each hyperedge e is represented by a single pairwise simple edge {i_e, j_e} (with this simple edge potentially changing from epoch to epoch). This hypergraph Laplacian ignores the hypernodes in e other than i_e and j_e in the given epoch. Recently, it has been shown that a generalised hypergraph Laplacian in which these remaining hypernodes act as "mediators" Chan and Liang (2018) satisfies all the properties satisfied by the above Laplacian of Chan et al. (2018). The two Laplacians are compared pictorially in Figure 2. Note that if the hyperedge is of size 2, we simply connect i_e and j_e with an edge. We also run a GCN on the simple graph associated with the hypergraph Laplacian with mediators Chan and Liang (2018) (right in Figure 2). It has been suggested that the weights on the edges for each hyperedge in the hypergraph Laplacian (with mediators) sum to 1 Chan and Liang (2018). We chose each weight to be 1/(2|e| − 3), as there are 2|e| − 3 edges for a hyperedge e.
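A sketch of the mediator-based expansion of a single hyperedge, assuming each hyperedge contributes 2|e| − 3 edges with uniform weight 1/(2|e| − 3); the helper name and the deterministic tie-breaking are ours:

```python
def mediator_expansion(hyperedge, S):
    """Approximate a hyperedge by the Laplacian-with-mediators subgraph
    (Chan and Liang, 2018): connect i_e and j_e, and connect every
    remaining hypernode (mediator) to both i_e and j_e.  Each of the
    2|e| - 3 edges gets weight 1 / (2|e| - 3), so the weights sum to 1."""
    e = list(hyperedge)
    if len(e) == 2:                      # size-2 hyperedge: a single edge
        return [(e[0], e[1], 1.0)]
    i_e, j_e = max(((i, j) for i in e for j in e if i < j),
                   key=lambda p: abs(S[p[0]] - S[p[1]]))
    w = 1.0 / (2 * len(e) - 3)
    edges = [(i_e, j_e, w)]
    for k in e:
        if k not in (i_e, j_e):          # k acts as a mediator
            edges += [(i_e, k, w), (j_e, k, w)]
    return edges
```

For a hyperedge of size s this yields 1 + 2(s − 2) = 2s − 3 edges, matching the linear edge count claimed in the introduction.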
We use just the initial features X (without the learned weights) to construct the hypergraph Laplacian matrix (with mediators), and we call this method FastHyperGCN. Because the matrix is computed only once before training (and not in each epoch), the training time of FastHyperGCN is much less than that of the other methods. We have provided the algorithms for all three methods in the supplementary.
5 Experiments for semi-supervised learning
We conducted experiments not only on real-world datasets but also on categorical data (results in the supplementary), which is standard practice in hypergraph-based learning Zhou et al. (2007); Hein et al. (2013); Zhang et al. (2017); Li and Milenkovic (2018b, a); Li et al. (2018a).
We compared HyperGCN, 1-HyperGCN and FastHyperGCN against the following baselines:
Multi-layer perceptron + explicit hypergraph Laplacian regularisation (MLP + HLR): regularises the MLP by training it with the supervised loss plus an explicit Laplacian regularisation term, using the hypergraph Laplacian with mediators. We used a held-out part of the test set (the same for all the above models) to tune the regularisation weight for this baseline.
Confidence Interval-based method (CI) Zhang et al. (2017) uses a subgradient-based method Zhang et al. (2017). We note that this method has consistently been shown to be superior to the primal dual hybrid gradient (PDHG) of Hein et al. (2013) and also Zhou et al. (2007). Hence, we did not use these other previous methods as baselines, and directly compared HyperGCN against CI.
The task for each dataset is to predict the topic to which a document belongs (multi-class classification). Statistics are summarised in Table 3. For more details about the datasets, please refer to the supplementary. We trained all methods for the same number of epochs and used the same hyperparameters as prior work Kipf and Welling (2017). We report the mean test error and standard deviation over different train-test splits. We sampled equally sized sets of labelled hypernodes from each class to obtain a balanced train split.
Table 3: Real-world hypergraph datasets used in our work. The distribution of hyperedge sizes is not symmetric about the mean and has a strong positive skewness.
6 Analysis of results
The results on real-world datasets are shown in Table 4. We now attempt to explain them.
Given a hypergraph H = (V, E) and signals S on the vertices, consider, for each hyperedge e ∈ E, its clique expansion and its mediator expansion. Let A_C and A_M denote the normalised adjacency matrices of the clique expansion (i.e., the graph of HGNN) and the mediator expansion (i.e., the graph of HyperGCN / FastHyperGCN), respectively. A sufficient condition for A_C = A_M is that each hyperedge is approximated by the same subgraph in both expansions.
Observe that we consider hypergraphs in which the size of each hyperedge is at least 2. It follows from the definitions that the clique expansion uses |e|(|e| − 1)/2 edges while the mediator expansion uses 2|e| − 3 edges for each hyperedge e. Clearly, the two expansions coincide when |e|(|e| − 1)/2 = 2|e| − 3 for each e ∈ E. Solving the resulting quadratic equation |e|² − 5|e| + 6 = 0 gives us |e| = 2 or |e| = 3 for each e ∈ E.
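The arithmetic in this sufficient condition is easy to verify mechanically:

```python
# The clique expansion uses |e|(|e|-1)/2 edges, the mediator expansion 2|e|-3.
# They coincide exactly when |e|^2 - 5|e| + 6 = 0, i.e. |e| in {2, 3}.
matching_sizes = [s for s in range(2, 50)
                  if s * (s - 1) // 2 == 2 * s - 3]
# matching_sizes == [2, 3]
```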
Comparable performance on Cora and Citeseer co-citation: We note that HGNN is the most competitive baseline. The proposition above states that the graphs of HGNN, FastHyperGCN, and HyperGCN are the same, irrespective of the signal values, whenever the maximum size of a hyperedge is 3. This explains why the three methods have comparable accuracies on the Cora co-citation and Citeseer co-citation hypergraphs: the mean hyperedge sizes are close to 3 (with comparatively low deviations), as shown in Table 3, and hence the graphs of the three methods are more or less the same.
Superior performance on Pubmed, DBLP, and Cora co-authorship
We see that HyperGCN performs statistically significantly better (p-value of a Welch t-test less than 0.0001) than HGNN on the other three datasets. We believe this is due to large noisy hyperedges in real-world hypergraphs: an author can write papers on different topics in a co-authorship network, and a paper typically cites papers of different topics in co-citation networks.
Average sizes in Table 3 show the presence of large hyperedges (note the large standard deviations). Clique expansion has edges on all pairs and hence potentially a larger number of hypernode pairs of different labels than the mediator graph of Figure 2, thus accumulating more noise.
Preference of HyperGCN and FastHyperGCN over HGNN: To further illustrate superiority over HGNN on noisy hyperedges, we conducted experiments on synthetic hypergraphs, each consisting of hypernodes divided equally between two classes and randomly sampled hyperedges. For each synthetic hypergraph, some hyperedges were "pure", i.e., all their hypernodes were from the same class, while the other hyperedges contained hypernodes from both classes. The ratio, r, of hypernodes of one class to the other within a mixed hyperedge was varied from less noisy to most noisy values in equal steps.
Table 5 shows the results on synthetic data. We initialise the hypernode features to random Gaussian vectors. We report the mean error and deviation over different synthetically generated hypergraphs. As we can see in the table, for mostly pure hyperedges (small r), HGNN is the superior model. However, as r (the noise) increases, our methods begin to outperform HGNN.
Subset of DBLP: We also trained all three models on a subset of DBLP (we call it sDBLP) by removing all hyperedges of size and . The resulting hypergraph has around hyperedges with an average size of . We report mean error over different train-test splits in Table 5.
Conclusion: From the above analysis, we conclude that our proposed methods (HyperGCN and FastHyperGCN) should be preferred to HGNN for hypergraphs with large noisy hyperedges. This is also the case on experiments in combinatorial optimisation (Table 6) which we discuss next.
7 HyperGCN for combinatorial optimisation
Inspired by the recent successes of deep graph models as learning-based approaches for NP-hard problems Li et al. (2018c); Prates et al. (2019); Lemos et al. (2019); Gong et al. (2019), we have used HyperGCN as a learning-based approach for the densest k-subhypergraph problem Chlamtác et al. (2018). NP-hard problems on hypergraphs have recently been highlighted as crucial for real-world network analysis Amburg et al. (2019); Nguyen et al. (2019). Our problem is, given a hypergraph H = (V, E), to find a subset W of k hypernodes so as to maximise the number of hyperedges contained in W, i.e., we wish to maximise the density of W, given by the number of hyperedges e ∈ E with e ⊆ W.
A greedy heuristic for the problem is to select the k hypernodes of maximum degree; we call this "MaxDegree". Another greedy heuristic is to iteratively remove from the current (residual) hypergraph all hyperedges containing a hypernode of minimum degree. We repeat this procedure until k hypernodes remain and consider the density of the remaining hypernodes; we call this "RemoveMinDegree".
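The two heuristics can be sketched directly from their descriptions; these are our own illustrative implementations (RemoveMinDegree here removes one minimum-degree hypernode and its incident hyperedges per step until k hypernodes remain):

```python
def max_degree(hyperedges, n, k):
    """MaxDegree: pick the k hypernodes with the highest degree."""
    deg = [0] * n
    for e in hyperedges:
        for v in e:
            deg[v] += 1
    return set(sorted(range(n), key=lambda v: -deg[v])[:k])

def remove_min_degree(hyperedges, n, k):
    """RemoveMinDegree: repeatedly delete a minimum-degree hypernode and
    every hyperedge containing it, until k hypernodes remain."""
    alive = set(range(n))
    edges = [set(e) for e in hyperedges]
    while len(alive) > k:
        deg = {v: sum(v in e for e in edges) for v in alive}
        v_min = min(alive, key=lambda v: deg[v])
        alive.remove(v_min)
        edges = [e for e in edges if v_min not in e]
    return alive

def density(hyperedges, W):
    """Number of hyperedges fully contained in W."""
    return sum(set(e) <= set(W) for e in hyperedges)
```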
Experiments: Table 6 shows the results. We trained all the learning-based models with a synthetically generated dataset. More details on the approach and the synthetic data are in the supplementary. As seen in Table 6, our proposed HyperGCN outperforms all the other approaches except for the pubmed dataset which contains a small number of vertices with large degrees and a large number of vertices with small degrees. The RemoveMinDegree baseline is able to recover all the hyperedges here.
Qualitative analysis: Figure 3 shows the visualisations given by RemoveMinDegree and HyperGCN on the Cora co-authorship hypergraph. We used Gephi’s Force Atlas to space out the vertices. In general, a cluster of nearby vertices has multiple hyperedges connecting them. Clusters of only green vertices indicate the method has likely included all vertices within the hyperedges induced by the cluster. The figure of HyperGCN has more dense green clusters than that of RemoveMinDegree.
8 Comparison of training time
We compared the average training time of an epoch of FastHyperGCN and HGNN in Table 1. Both were run on a GeForce GTX 1080 Ti GPU machine. We observe that FastHyperGCN is faster than HGNN because it uses a linear number of edges for each hyperedge while HGNN uses a quadratic number. FastHyperGCN is also superior in terms of performance on hypergraphs with large noisy hyperedges.
We have proposed HyperGCN, a new method of training a GCN on hypergraphs using tools from the spectral theory of hypergraphs. We have shown HyperGCN's effectiveness in SSL and combinatorial optimisation. Approaches that assign importance to nodes Veličković et al. (2018); Monti et al. (2018); Vashishth et al. (2019b) have improved results on SSL; HyperGCN may be augmented with such approaches for further improved performance.
Supplementary: Hypergraph convolutional network
10 Algorithms of our proposed methods
The forward propagation of an l-layer graph convolutional network (GCN) Kipf and Welling (2017) is H^(l+1) = σ( D̃^(−1/2) Ã D̃^(−1/2) H^(l) Θ^(l) ), where Ã = A + I and D̃ is the diagonal degree matrix with elements D̃_ii = Σ_j Ã_ij. We provide algorithms for our three proposed methods:
10.1 Time complexity
Given an attributed hypergraph , let be the number of initial features, be the number of hidden units, and be the number of labels. Further, let be the total number of epochs of training. Define
HyperGCN takes time
1-HyperGCN takes time
FastHyperGCN takes time
HGNN takes time
11 HyperGCN for combinatorial optimisation
Inspired by the recent successes of deep graph models as learning-based approaches for NP-hard problems Li et al. (2018c); Prates et al. (2019); Lemos et al. (2019); Gong et al. (2019), we have used HyperGCN as a learning-based approach for the densest k-subhypergraph problem Chlamtác et al. (2018), an NP-hard hypergraph problem. The problem is: given a hypergraph H = (V, E), find a subset W of k hypernodes so as to maximise the number of hyperedges contained in (induced by) W, i.e., we intend to maximise the density of W, given by the number of hyperedges e ∈ E with e ⊆ W.
One natural greedy heuristic approach for the problem is to select the k hypernodes of maximum degree; we call this approach "MaxDegree". Another greedy heuristic approach is to iteratively remove from the current (residual) hypergraph all the hyperedges containing a hypernode of minimum degree. We repeat the procedure until k hypernodes remain and consider the density of the remaining hypernodes; we call this approach "RemoveMinDegree".
11.1 Our approach
A natural approach to the problem is to train HyperGCN to perform the labelling. In other words, HyperGCN takes a hypergraph as input and outputs a binary labelling of the hypernodes. A natural output representation is a probability map in [0, 1]^n that indicates how likely each hypernode is to belong to W.
Let {(H_i, W_i)} be a training set, where each H_i is an input hypergraph and W_i is one of the optimal solutions of the NP-hard hypergraph problem on H_i. The HyperGCN model learns its parameters by being trained to predict W_i given H_i. During training, we minimise the binary cross-entropy loss for each training sample. Additionally, we generate m different probability maps and minimise the hindsight loss, i.e., the minimum over the m maps of the cross-entropy loss corresponding to each probability map. Generating multiple probability maps has the advantage of producing diverse solutions Li et al. (2018c).
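A minimal sketch of the hindsight loss over m probability maps (function names ours):

```python
import numpy as np

def bce(p, y):
    """Binary cross-entropy of a probability map p against binary labels y."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def hindsight_loss(prob_maps, y):
    """Hindsight loss (Li et al., 2018c): only the best of the m
    probability maps is penalised, which encourages diverse solutions."""
    return min(bce(p, y) for p in prob_maps)
```

Because the minimum is taken over the maps, gradients flow only through the currently best map, letting the other maps explore alternative solutions.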
11.2 Experiments: Training data
To generate a sample in the training set, we fix a vertex set W of k vertices chosen uniformly at random. We generate each hyperedge so that, with high probability, it is contained in W, and otherwise it is drawn from the full vertex set. We give the algorithm to generate a sample below.
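A sketch of such a generator; the probability p_planted and the hyperedge-size bound are illustrative placeholders, not the values used in the paper:

```python
import random

def generate_sample(n, k, num_hyperedges, max_size=10, p_planted=0.9, seed=None):
    """Planted densest-k-subhypergraph instance: a hidden set W of k vertices
    is fixed; each hyperedge is drawn from within W with probability p_planted
    (an assumed placeholder value) and from all n vertices otherwise."""
    rng = random.Random(seed)
    W = rng.sample(range(n), k)
    hyperedges = []
    for _ in range(num_hyperedges):
        pool = W if rng.random() < p_planted else list(range(n))
        size = rng.randint(2, min(max_size, len(pool)))
        hyperedges.append(tuple(rng.sample(pool, size)))
    return hyperedges, set(W)
```

The planted set W then serves as the (near-)optimal solution used as the supervision signal.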
11.3 Experiments: Results
We generated training samples with the number of hypernodes chosen uniformly at random. We fixed the hyperedge sizes and the ratio of k to the number of hypernodes to values typical of real-world hypergraphs. We compared all our proposed approaches, viz. 1-HyperGCN, HyperGCN, and FastHyperGCN, against the baselines MLP and MLP + HLR and the state-of-the-art HGNN. We also compared against the greedy heuristics MaxDegree and RemoveMinDegree. We trained all the deep models using the same hyperparameters as Li et al. (2018c) and report the results in Table 7. We tested all the models on a synthetically generated test set of hypergraphs and on the five real-world hypergraphs used for the SSL experiments. As we can see in the table, our proposed HyperGCN outperforms all the other approaches except on the Pubmed dataset, which contains a small number of vertices with large degrees and a large number of vertices with small degrees; the RemoveMinDegree baseline is able to recover all the hyperedges there. Moreover, FastHyperGCN is competitive with HyperGCN as the number of hypergraphs in the training data is large.
11.4 Qualitative analysis
Figure 4 shows the visualisations given by RemoveMinDegree and HyperGCN on the Cora co-authorship hypergraph. We used Gephi’s Force Atlas to space out the vertices. In general, a cluster of nearby vertices has multiple hyperedges connecting them. Clusters of only green vertices indicate the method has likely included all vertices within the hyperedges induced by the cluster. The figure of HyperGCN has more dense green clusters than that of RemoveMinDegree. Figure 5 shows the results of HGNN vs. HyperGCN.
12 Sources of the real-world datasets
Co-authorship data: All documents co-authored by an author are in one hyperedge. We used the author data (https://people.cs.umass.edu/~mccallum/data.html) to get the co-authorship hypergraph for Cora. We manually constructed the DBLP dataset from Arnetminer (https://aminer.org/lab-datasets/citation/DBLP-citation-Jan8.tar.bz).
Co-citation data: All documents cited by a document are connected by a hyperedge. We used Cora, Citeseer, and Pubmed from https://linqs.soe.ucsc.edu/data for co-citation relationships. We removed hyperedges which had exactly one hypernode, as our focus in this work is on hyperedges with two or more hypernodes. Each hypernode (document) is represented by bag-of-words features (the feature matrix X).
12.1 Construction of the DBLP dataset
We downloaded the entire dblp data from https://aminer.org/lab-datasets/citation/DBLP-citation-Jan8.tar.bz. The steps for constructing the dblp dataset used in the paper are as follows:
We defined a set of conference categories (classes for the SSL task) as “algorithms", “database", “programming", “datamining", “intelligence", and “vision"
For a total of venues in the entire dblp dataset we took papers from only a subset of venues from https://en.wikipedia.org/wiki/List_of_computer_science_conferences corresponding to the above conferences
From the venues of the above conference categories, we got authors publishing at least two documents for a total of
We took the abstracts of all these documents, constructed a dictionary of the most frequent words (words with frequency more than ) and this gave us a dictionary size of
13 Experiments on datasets with categorical attributes
Table 8: Properties of the datasets: number of hypernodes, number of hyperedges, number of edges in the clique expansion, and number of classes.
We closely followed the experimental setup of the baseline model Zhang et al. (2017). We experimented on three different datasets, viz. mushroom, covertype45, and covertype67, from the UCI machine learning repository Dheeru and Karra Taniskidou (2017). Properties of the datasets are summarised in Table 8. The task for each of the three datasets is to predict one of two labels (binary classification) for each unlabelled instance (hypernode). The datasets contain instances with categorical attributes. To construct the hypergraph, we treat each attribute value as a hyperedge, i.e., all instances (hypernodes) with the same attribute value are contained in a hyperedge. Because of this particular definition of a hyperedge, clique expansion is destined to produce an almost fully connected graph, and hence GCN on the clique expansion would be unfair to compare against. Having shown that HyperGCN is superior to 1-HyperGCN in the relational experiments, we compare only the former and the non-neural baseline Zhang et al. (2017). We call HyperGCN here HyperGCN_with_mediators. We used the incidence matrix (which encodes the hypergraph structure) as the data matrix X. We trained HyperGCN_with_mediators for the full number of epochs and used the same hyperparameters as in Kipf and Welling (2017). We find that the HyperGCN_with_mediators model generally does better than the baselines. We believe this is because of the powerful feature extraction capability of HyperGCN_with_mediators.
13.1 GCN on clique expansion
We reiterate that the clique expansion, i.e., HGNN Feng et al. (2019), produces almost fully connected graphs for all three datasets and hence does not carry any useful information. So, GCN on the clique expansion is unfair to compare against (HGNN does not learn any useful weights for classification because of the fully connected nature of the graph).
13.2 Relevance of SSL
The main reason for performing these experiments, as pointed out in the publicly accessible NIPS reviews555https://papers.nips.cc/paper/4914-the-total-variation-on-hypergraphs-learning-on-hypergraphs-revisited of the total variation on hypergraphs Hein et al. (2013), is to show that the proposed method (the primal-dual hybrid gradient method in their case and the HyperGCN_with_mediators method in our case) has improved results on SSL, even if SSL is not very relevant in the first place.
We do not claim that SSL with HyperGCN_with_mediators is the best way to go about handling these categorical data but we do claim that, given this built hypergraph albeit from non-relational data, it has superior results compared to the previous best non-neural hypergraph-based SSL method Zhang et al. (2017) in the literature and that is why we have followed their experimental setup.
14.1 Graph signal processing
We now briefly review essential concepts of graph signal processing that are important in the construction of ChebNet and graph convolutional networks. We need convolutions on graphs defined in the spectral domain. Similar to regular 1-D or 2-D signals, real-valued graph signals can be efficiently analysed via harmonic analysis and processed in the spectral domain Shuman et al. (2013). To define spectral convolution, we note that the convolution theorem Mallat (1999) generalises from classical discrete signal processing to arbitrary graphs Sandryhaila and Moura (2013).
Informally, the convolution theorem says that the convolution of two signals in one domain (say, the time domain) equals point-wise multiplication of the signals in the other domain (the frequency domain). More formally, given a graph signal x ∈ R^n and a filter signal g ∈ R^n, both defined in the vertex domain (time domain), the convolution of the two signals, g ∗ x, satisfies (g ∗ x)^ = ĝ ⊙ x̂, where ĝ and x̂ are the graph signals in the spectral domain (frequency domain) corresponding, respectively, to g and x.
An essential operator for computing graph signals in the spectral domain is the symmetrically normalised graph Laplacian of G, defined as L = I − D^(−1/2) A D^(−1/2), where D is the diagonal degree matrix with elements D_ii = Σ_j A_ij. As this graph Laplacian L is a real, symmetric, and positive semidefinite matrix, it admits a spectral eigendecomposition of the form L = Φ Λ Φ^T, where Φ = [φ_1, …, φ_n] forms an orthonormal basis of eigenvectors and Λ = diag(λ_1, …, λ_n) is the diagonal matrix of the corresponding eigenvalues with 0 = λ_1 ≤ ⋯ ≤ λ_n.
The eigenvectors form a Fourier basis, and the eigenvalues carry a notion of frequencies as in classical Fourier analysis. The graph Fourier transform of a graph signal x is thus defined as x̂ = Φ^T x, and the inverse graph Fourier transform turns out to be x = Φ x̂. The convolution theorem generalised to graph signals can thus be rewritten as Φ^T (g ∗ x) = ĝ ⊙ (Φ^T x). It follows that g ∗ x = Φ (ĝ ⊙ Φ^T x), which is the same as g ∗ x = Φ diag(ĝ) Φ^T x.
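The spectral filtering identity g ∗ x = Φ diag(ĝ) Φ^T x can be checked numerically on a small graph (the triangle graph, the signal x, and the filter ĝ below are arbitrary examples; ĝ is specified directly in the spectral domain):

```python
import numpy as np

def normalised_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.eye(A.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt

# Triangle graph on three vertices.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
L = normalised_laplacian(A)
lams, Phi = np.linalg.eigh(L)          # eigendecomposition L = Phi diag(lams) Phi^T

x = np.array([1.0, -2.0, 0.5])         # a graph signal in the vertex domain
g_hat = np.array([1.0, 0.5, 0.25])     # a filter given in the spectral domain

# Spectral filtering: g * x = Phi diag(g_hat) Phi^T x
conv = Phi @ np.diag(g_hat) @ Phi.T @ x
```

Transforming x to the spectral domain, multiplying point-wise by ĝ, and transforming back gives the same vector, which is precisely the convolution theorem on graphs.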
14.2 ChebNet convolution
We could use a non-parametric filter ĝ, but there are two limitations: (i) it is not localised in space, and (ii) its learning complexity is O(n), the size of the input. These two limitations contrast with traditional CNNs, where the filters are localised in space and the learning complexity is independent of the input size. Defferrard et al. (2016) proposed a polynomial filter to overcome these limitations. A polynomial filter is defined as a polynomial in the Laplacian eigenvalues with learnable coefficients.
The graph convolution provided by Eq. 9 uses the monomial basis to learn filter weights. Monomial bases are not optimal for training and not stable under perturbations because they do not form an orthogonal basis. It is proposed by Defferrard et al. (2016) to use the orthogonal Chebyshev polynomials Hammond et al. (2011) (and hence the name ChebNet) to recursively compute the powers of the graph Laplacian.
A Chebyshev polynomial of order k can be computed recursively by the stable recurrence relation T_k(x) = 2x T_{k−1}(x) − T_{k−2}(x), with T_0(x) = 1 and T_1(x) = x.
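The recurrence can be sketched for scalar arguments (ChebNet applies the same recurrence with the scaled Laplacian matrix in place of x; the function name is ours):

```python
def chebyshev(K, x):
    """Chebyshev polynomials T_0(x), ..., T_K(x) via the stable
    recurrence T_k(x) = 2x * T_{k-1}(x) - T_{k-2}(x)."""
    T = [1.0, x]                       # T_0 = 1, T_1 = x
    for _ in range(2, K + 1):
        T.append(2 * x * T[-1] - T[-2])
    return T[:K + 1]
```

Because each T_k is computed from the previous two, a degree-K filter needs only K matrix-vector products with the scaled Laplacian, and no explicit eigendecomposition.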