1 Introduction
In many real-world network datasets, such as co-authorship, co-citation, and email communication networks, relationships are complex and go beyond pairwise associations. Hypergraphs provide a flexible and natural way to model such complex relationships. For example, in a co-authorship network, an author (hyperedge) can co-author more than two documents (vertices).
The prevalence of such complex relationships in many real-world networks naturally motivates the problem of learning with hypergraphs Zhou et al. (2007); Hein et al. (2013); Zhang et al. (2017); Feng et al. (2019). A popular learning paradigm is graph-based / hypergraph-based semi-supervised learning (SSL), where the goal is to assign labels to initially unlabelled vertices in a graph / hypergraph Chapelle et al. (2010); Zhu et al. (2009); Subramanya and Talukdar (2014). While many techniques have used explicit Laplacian regularisation in the objective Zhou et al. (2003); Zhu et al. (2003); Chapelle et al. (2003); Weston et al. (2008), the state-of-the-art neural methods encode the graph / hypergraph structure implicitly via a neural network Kipf and Welling (2017); Atwood and Towsley (2016); Feng et al. (2019) (the data matrix contains the initial features on the vertices, for example, text attributes for documents).
While explicit Laplacian regularisation assumes similarity among vertices in each edge / hyperedge, the implicit regularisation of graph convolutional networks (GCNs) Kipf and Welling (2017) avoids this restriction and enables application to a broader range of problems, e.g., in combinatorial optimisation Gong et al. (2019); Lemos et al. (2019); Prates et al. (2019); Li et al. (2018c) and several other applications Chen et al. (2019); Norcliffe-Brown et al. (2018); Wang et al. (2018); Vashishth et al. (2019a); Yao et al. (2019); Marcheggiani and Titov (2017). In this work, we propose HyperGCN, a novel training scheme for a GCN on hypergraphs, and show its effectiveness not only in SSL, where hyperedges encode similarity, but also in combinatorial optimisation, where hyperedges do not encode similarity. Combinatorial optimisation on hypergraphs has recently been highlighted as crucial for real-world network analysis Amburg et al. (2019); Nguyen et al. (2019). Methodologically, HyperGCN approximates each hyperedge of the hypergraph by a set of pairwise edges connecting the vertices of the hyperedge and treats the learning problem as a graph learning problem on the approximation. While the state-of-the-art hypergraph neural network (HGNN) Feng et al. (2019) approximates each hyperedge by a clique, and hence requires a quadratic number of edges for each hyperedge, our method, HyperGCN, requires only a linear number of edges per hyperedge. The advantage of this linear approximation is evident in Table 1, where a faster variant of our method has lower training time on synthetic data (with higher density as well) for the densest subhypergraph problem and for SSL on real-world hypergraphs (DBLP and Pubmed). In summary, we make the following contributions:
Table 1: Average training time of an epoch (lower is better) for HGNN and FastHyperGCN on synthetic data (densest subhypergraph, with density), DBLP, and Pubmed.
We propose HyperGCN, a novel method of training a graph convolutional network (GCN) on hypergraphs using existing tools from spectral theory of hypergraphs (Section 4).
While the motivation of HyperGCN is based on similarity of vertices in a hyperedge, we show that it can be used effectively for combinatorial optimisation where hyperedges do not encode similarity.
2 Related work
In this section, we discuss related work; the necessary background is presented in the next section.
Deep learning on graphs: Geometric deep learning Bronstein et al. (2017) is an umbrella phrase for emerging techniques that attempt to generalise (structured) deep neural network models to non-Euclidean domains such as graphs and manifolds. The graph convolutional network (GCN) Kipf and Welling (2017) defines the convolution using a simple linear function of the graph Laplacian and has been shown to be effective for semi-supervised classification on attributed graphs. The reader is referred to a comprehensive literature review Bronstein et al. (2017) and extensive surveys Hamilton et al. (2017); Battaglia et al. (2018); Zhang et al. (2018); Sun et al. (2018); Wu et al. (2019) on deep learning on graphs.
Learning on hypergraphs: The clique expansion of a hypergraph was introduced in a seminal work Zhou et al. (2007) and has since become popular Agarwal et al. (2006); Satchidanand et al. (2015); Feng et al. (2018). Hypergraph neural networks Feng et al. (2019) use the clique expansion to extend GCNs to hypergraphs. Another line of work uses mathematically appealing tensor methods Shashua et al. (2006); Bulò and Pelillo (2009); Kolda and Bader (2009), but these are limited to uniform hypergraphs. Recent developments, however, work for arbitrary hypergraphs and fully exploit the hypergraph structure Hein et al. (2013); Zhang et al. (2017); Chan and Liang (2018); Li and Milenkovic (2018b); Chien et al. (2019).
Graph-based SSL: Researchers have shown that using unlabelled data in training can improve learning accuracy significantly. The topic is popular enough to have influential books devoted to it Chapelle et al. (2010); Zhu et al. (2009); Subramanya and Talukdar (2014).
Graph neural networks for combinatorial optimisation: Graph-based deep models have recently been shown to be effective as learning-based approaches for NP-hard problems such as maximal independent set and minimum vertex cover Li et al. (2018c), the decision version of the travelling salesman problem Prates et al. (2019), graph colouring Lemos et al. (2019), and clique optimisation Gong et al. (2019).
3 Background: Graph convolutional network
Let G = (V, E), with |V| = n, be a simple undirected graph with adjacency matrix A, and let X be the data matrix, which contains a p-dimensional real-valued vector representation for each node v in V. The basic formulation of graph convolution Kipf and Welling (2017) stems from the convolution theorem Mallat (1999), and it can be shown that the convolution of a real-valued graph signal S and a filter is approximately (w_0 + w_1 L~)S, where w_0 and w_1 are learned weights and L~ = (2 / lambda_max) L - I is the scaled graph Laplacian; lambda_max is the largest eigenvalue of the symmetrically-normalised graph Laplacian L = I - D^{-1/2} A D^{-1/2}, where D is the diagonal degree matrix with elements D_ii = sum_j A_ij. The filter thus depends on the structure of the graph (the graph Laplacian L). The detailed derivation from the convolution theorem uses existing tools from graph signal processing Shuman et al. (2013); Hammond et al. (2011); Bronstein et al. (2017) and is provided in the supplementary material. The key point here is that the convolution of two graph signals is a linear function of the graph Laplacian L.

Symbol  Description  Symbol  Description
G = (V, E)  an undirected simple graph  (V, E)  an undirected hypergraph
V  set of nodes  V  set of hypernodes
E  set of edges  E  set of hyperedges
n = |V|  number of nodes  n = |V|  number of hypernodes
L  graph Laplacian  L(.)  hypergraph Laplacian
A  graph adjacency matrix  H  hypergraph incidence matrix
The graph convolution for the p different graph signals contained in the data matrix X, with learned weights Theta for a layer with h hidden units, is A_bar X Theta, where A_bar = D~^{-1/2} (A + I) D~^{-1/2} is the renormalised adjacency matrix. The derivation involves a renormalisation trick Kipf and Welling (2017) and is given in the supplementary.
GCN Kipf and Welling (2017): The forward model for a simple two-layer GCN takes the following simple form:

Z = softmax( A_bar ReLU( A_bar X Theta^(0) ) Theta^(1) ),   (1)

where Theta^(0) is an input-to-hidden weight matrix for a hidden layer with h hidden units and Theta^(1) is a hidden-to-output weight matrix. The softmax activation function, defined as softmax(x_i) = exp(x_i) / sum_j exp(x_j), is applied row-wise.
GCN training for SSL: For multi-class classification with q classes, we minimise the cross-entropy,

L = - sum over v in V_L of sum_{f=1}^{q} Y_vf ln Z_vf,   (2)

over the set of labelled examples V_L. The weights Theta^(0) and Theta^(1) are trained using gradient descent.
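As a concrete sketch of Equations (1) and (2), a minimal NumPy implementation of the two-layer forward pass and the labelled-set cross-entropy might look as follows (function and variable names are ours, not from the paper):

```python
import numpy as np

def normalise_adjacency(A):
    """Renormalisation trick: A_bar = D~^{-1/2} (A + I) D~^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(A_bar, X, theta0, theta1):
    """Two-layer GCN: Z = softmax(A_bar ReLU(A_bar X theta0) theta1)."""
    H = np.maximum(A_bar @ X @ theta0, 0.0)       # hidden layer with ReLU
    logits = A_bar @ H @ theta1
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)       # row-wise softmax

def cross_entropy(Z, Y, labelled):
    """Cross-entropy over the labelled set only; Y is one-hot."""
    return -np.sum(Y[labelled] * np.log(Z[labelled] + 1e-12))
```

In practice the weights would be trained by gradient descent on this loss, as in the text.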
A summary of the notations used throughout our work is shown in Table 2.
4 HyperGCN: Hypergraph Convolutional Network
We consider semi-supervised hypernode classification on an undirected hypergraph H = (V, E) with |V| = n and a small set V_L of labelled hypernodes. Each hypernode v in V is also associated with a feature vector of dimension p, given by the corresponding row of the data matrix X. The task is to predict the labels of all the unlabelled hypernodes, that is, all the hypernodes in the set V \ V_L.
Overview: The crucial working principle here is that the hypernodes in the same hyperedge are similar and hence are likely to share the same label Zhang et al. (2017). Suppose we use S to denote some representation of the hypernodes in V; then, for any hyperedge e in E, the quantity max over i, j in e of (S_i - S_j)^2 will be "small" only if the representations corresponding to the hypernodes in e are "close" to each other. Therefore, the sum of this quantity over all hyperedges, used as a regulariser, is likely to achieve the objective of hypernodes in the same hyperedge having similar representations. However, instead of using it as an explicit regulariser, we can achieve the same goal by using a GCN over an appropriately defined Laplacian of the hypergraph. In other words, we use the notion of a hypergraph Laplacian as an implicit regulariser which achieves this objective.
A hypergraph Laplacian with the same underlying motivation as stated above was proposed in prior works Chan et al. (2018); Louis (2015). We present this Laplacian first. We then run a GCN over the simple graph associated with this hypergraph Laplacian. We call the resulting method 1-HyperGCN (as each hyperedge is approximated by exactly one pairwise edge). One epoch of 1-HyperGCN is shown in Figure 1.
4.1 Hypergraph Laplacian
As explained before, the key element for a GCN is the graph Laplacian of the given graph. Thus, in order to develop a GCN-based SSL method for hypergraphs, we first need to define a Laplacian for hypergraphs. One such Laplacian Chan et al. (2018) (see also Louis (2015)) is a non-linear function L: R^n -> R^n (the Laplacian matrix for graphs can be viewed as a linear function L: R^n -> R^n).
Definition 1 (Hypergraph Laplacian Chan et al. (2018); Louis (2015)). [Footnote 1: The problem of breaking ties in choosing i_e (resp. j_e) is non-trivial, as shown in Chan et al. (2018). Breaking ties randomly was proposed in Louis (2015), but Chan et al. (2018) showed that this might not work for all applications (see Chan et al. (2018) for more details). Chan et al. (2018) gave a way to break ties, together with a proof of correctness of their tie-breaking rule for the problems they studied. We chose to break ties randomly because of its simplicity and efficiency.]
Given a real-valued signal S defined on the hypernodes, L(S) is computed as follows.

1. For each hyperedge e in E, let (i_e, j_e) := argmax over i, j in e of |S_i - S_j|, breaking ties randomly (see Footnote 1).

2. A weighted graph G_S on the vertex set V is constructed by adding the edges {i_e, j_e} for all e in E, with weight w(e) on the edge for hyperedge e, where w(e) is the weight of the hyperedge. Next, to each vertex v, self-loops are added such that the degree of the vertex in G_S equals its hypergraph degree d_v. Let A_S denote the weighted adjacency matrix of the graph G_S.

3. The symmetrically normalised hypergraph Laplacian is L(S) := (I - D^{-1/2} A_S D^{-1/2}) S.
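The construction above can be sketched in a few lines of NumPy. This is a simplified reading of Definition 1: for illustration we add unit self-loops rather than the degree-matching self-loops of step 2, and the function names are ours:

```python
import numpy as np
from itertools import combinations

def one_hypergcn_adjacency(n, hyperedges, S, weights=None, rng=None):
    """For each hyperedge pick the pair (i_e, j_e) maximising |S_i - S_j|
    (ties broken randomly), connect it by a weighted edge, and return the
    symmetrically normalised adjacency of the resulting simple graph."""
    rng = rng or np.random.default_rng()
    A = np.zeros((n, n))
    for k, e in enumerate(hyperedges):
        w = 1.0 if weights is None else weights[k]
        pairs = list(combinations(e, 2))
        gaps = np.array([abs(S[i] - S[j]) for i, j in pairs])
        best = np.flatnonzero(gaps == gaps.max())
        i, j = pairs[rng.choice(best)]        # random tie-break
        A[i, j] += w
        A[j, i] += w
    A += np.eye(n)                            # simplified self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
```

Note that the representative pair depends on the signal S, so the graph changes as the representations change.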
4.2 1-HyperGCN
By following the Laplacian construction steps outlined in Section 4.1, we end up with a simple graph with normalised adjacency matrix A_bar_S. We now perform GCN over this simple graph. The graph convolution operation of Equation (1), when applied to a hypernode v in the neural message-passing framework Gilmer et al. (2017), is h_v^(tau+1) = sigma( (Theta^(tau))^T sum over u in N(v) of [A_bar_S]_{v,u} h_u^(tau) ). Here, tau is the epoch number, h_v^(tau+1) is the new hidden-layer representation of node v, sigma is a non-linear activation function, Theta^(tau) is a matrix of learned weights, N(v) is the set of neighbours of v, [A_bar_S]_{v,u} is the weight on the edge {v, u} after normalisation, and h_u^(tau) is the previous hidden-layer representation of the neighbour u. We note that, along with the embeddings of the hypernodes, the adjacency matrix is also re-estimated in each epoch.
Figure 1 shows a hypernode v with five hyperedges incident on it. We consider exactly one representative simple edge for each hyperedge e, given by {i_e, j_e}, where (i_e, j_e) maximises the distance between the hidden representations within e for epoch tau. Because of this, the hypernode v may not be a part of all representative simple edges (only three are shown in the figure). We then apply the traditional graph convolution operation on v, considering only the simple edges incident on it. Note that we apply the operation to each hypernode in each epoch of training until convergence.
Connection to total variation on hypergraphs: Our 1-HyperGCN model can be seen as performing implicit regularisation based on the total variation on hypergraphs Hein et al. (2013). In that prior work, explicit regularisation is used, and only the hypergraph structure is exploited for hypernode classification in the SSL setting. 1-HyperGCN, on the other hand, can use both the hypergraph structure and any available features on the hypernodes, e.g., text attributes for documents.
4.3 HyperGCN: Enhancing 1-HyperGCN with mediators
One peculiar aspect of the hypergraph Laplacian discussed above is that each hyperedge e is represented by a single pairwise simple edge {i_e, j_e} (with this simple edge potentially changing from epoch to epoch). This hypergraph Laplacian ignores the hypernodes in e \ {i_e, j_e} in the given epoch. Recently, it has been shown that a generalised hypergraph Laplacian in which the hypernodes in e \ {i_e, j_e} act as "mediators" Chan and Liang (2018) satisfies all the properties satisfied by the above Laplacian of Chan et al. (2018). The two Laplacians are compared pictorially in Figure 2. Note that if the hyperedge is of size 2, we simply connect i_e and j_e with an edge. We run a GCN on the simple graph associated with the hypergraph Laplacian with mediators Chan and Liang (2018) (right in Figure 2). It has been suggested that the weights on the edges for each hyperedge e in the hypergraph Laplacian with mediators should sum to 1 Chan and Liang (2018). We chose each weight to be 1/(2|e| - 3), as there are 2|e| - 3 edges for a hyperedge e.
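The mediator expansion of a single hyperedge can be sketched as follows (our reading of Figure 2: the extreme pair is connected, and each of i_e and j_e is linked to every mediator, giving 2|e| - 3 edges of weight 1/(2|e| - 3) each; names are ours):

```python
def mediator_edges(e, S):
    """Weighted edges of the mediator expansion of hyperedge e, given
    signal values S: connect the extreme pair (i_e, j_e) and link each of
    them to every mediator; each edge gets weight 1/(2|e| - 3)."""
    nodes = sorted(e, key=lambda v: S[v])
    i_e, j_e = nodes[0], nodes[-1]        # argmin / argmax of the signal
    if len(e) == 2:
        return [(i_e, j_e, 1.0)]
    w = 1.0 / (2 * len(e) - 3)
    edges = [(i_e, j_e, w)]
    for v in nodes[1:-1]:                 # the mediators
        edges += [(i_e, v, w), (j_e, v, w)]
    return edges
```

By construction the edge weights for each hyperedge sum to 1, matching the suggestion of Chan and Liang (2018).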
4.4 FastHyperGCN
We use just the initial features X (without the learned weights) to construct the hypergraph Laplacian matrix (with mediators), and we call the resulting method FastHyperGCN. Because the matrix is computed only once before training (and not in each epoch), the training time of FastHyperGCN is much lower than that of the other methods. We provide the algorithms for all three methods in the supplementary.
5 Experiments for semisupervised learning
We conducted experiments not only on real-world datasets but also on categorical data (results in the supplementary), which is standard practice in hypergraph-based learning Zhou et al. (2007); Hein et al. (2013); Zhang et al. (2017); Li and Milenkovic (2018b, a); Li et al. (2018a).
5.1 Baselines
We compared 1-HyperGCN, HyperGCN, and FastHyperGCN against the following baselines:

Multi-layer perceptron (MLP): treats each instance (hypernode) as an independent and identically distributed (i.i.d.) instance. In other words, A_bar = I in Equation (1). We note that this baseline does not use the hypergraph structure to make predictions.

Multi-layer perceptron + explicit hypergraph Laplacian regularisation (MLP + HLR): regularises the MLP by training it with an added explicit hypergraph Laplacian regularisation term, using the hypergraph Laplacian with mediators. We used a fraction of the test set used for all the above models to tune the regularisation weight for this baseline.

Confidence-interval-based method (CI) Zhang et al. (2017): uses a subgradient-based method Zhang et al. (2017). We note that this method has consistently been shown to be superior to the primal-dual hybrid gradient (PDHG) of Hein et al. (2013) and also to Zhou et al. (2007). Hence, we did not use these earlier methods as baselines and directly compared HyperGCN against CI.
The task for each dataset is to predict the topic to which a document belongs (multi-class classification). Statistics are summarised in Table 3; for more details about the datasets, please refer to the supplementary. We trained all methods for the same number of epochs and used the same hyperparameters as prior work Kipf and Welling (2017). We report the mean test error and standard deviation over different train-test splits. We sampled sets of the same size of labelled hypernodes from each class to obtain a balanced train split.

(Table 3 layout: columns DBLP (co-authorship), Pubmed (co-citation), Cora (co-authorship), Cora (co-citation), Citeseer (co-citation); rows: # hypernodes, # hyperedges, avg. hyperedge size, # features, # classes, label rate)
Table 3: Real-world hypergraph datasets used in our work. The distribution of hyperedge sizes is not symmetric about the mean and has a strong positive skew.
(Table 4 layout: columns DBLP (co-authorship), Pubmed (co-citation), Cora (co-authorship), Cora (co-citation), Citeseer (co-citation); methods: CI, MLP, MLP + HLR, HGNN, 1-HyperGCN, FastHyperGCN, HyperGCN)
6 Analysis of results
The results on realworld datasets are shown in Table 4. We now attempt to explain them.
Proposition 1:
Given a hypergraph H = (V, E) and signals S on the vertices, let, for each hyperedge e in E, i_e and j_e denote the vertices of e with the extreme signal values. Define A_c and A_m to be, respectively, the normalised clique expansion (i.e., the graph of HGNN) and the normalised mediator expansion (i.e., the graph of HyperGCN / FastHyperGCN). A sufficient condition for A_c = A_m is that |e| <= 3 for each e in E.
(Table 5 layout: column sDBLP; methods: HGNN, FastHyperGCN, HyperGCN)
Proof:
Observe that we consider hypergraphs in which the size of each hyperedge is at least 2. It follows from the definitions that the clique expansion uses |e|(|e| - 1)/2 edges per hyperedge, while the mediator expansion uses 2|e| - 3. Clearly, a sufficient condition for A_c = A_m is that each hyperedge is approximated by the same subgraph in both expansions, i.e., |e|(|e| - 1)/2 = 2|e| - 3 for each e in E. Solving the resulting quadratic equation |e|^2 - 5|e| + 6 = 0 gives |e| = 2 or |e| = 3 for each e in E.
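The edge-count comparison in the proof is easy to verify numerically (a small check we add for illustration):

```python
def clique_edge_count(k):
    """Edges used by the clique expansion for a hyperedge of size k."""
    return k * (k - 1) // 2

def mediator_edge_count(k):
    """Edges used by the mediator expansion (2|e| - 3, valid for k >= 2)."""
    return 2 * k - 3

# solving k(k-1)/2 = 2k - 3, i.e. k^2 - 5k + 6 = 0, gives k = 2 or k = 3
equal_sizes = [k for k in range(2, 12)
               if clique_edge_count(k) == mediator_edge_count(k)]
```

For larger hyperedges the clique expansion grows quadratically while the mediator expansion stays linear, which is the source of HyperGCN's speed advantage.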
Comparable performance on Cora and Citeseer co-citation: We note that HGNN is the most competitive baseline. Note that the signal S is given by the initial features for FastHyperGCN and by the hidden representations for HyperGCN. The proposition states that the graphs of HGNN, FastHyperGCN, and HyperGCN are the same, irrespective of the signal values, whenever the maximum size of a hyperedge is 3.
This explains why the three methods have comparable accuracies on the Cora co-citation and Citeseer co-citation hypergraphs. The mean hyperedge sizes are close to 3 (with comparatively low deviations), as shown in Table 3; hence the graphs of the three methods are more or less the same.
Superior performance on Pubmed, DBLP, and Cora co-authorship:
We see that HyperGCN performs statistically significantly better (p-value of a Welch t-test less than 0.0001) than HGNN on the other three datasets. We believe this is due to large noisy hyperedges in real-world hypergraphs. An author can write papers on different topics in a co-authorship network, and a paper typically cites papers on different topics in co-citation networks.
The average sizes in Table 3 show the presence of large hyperedges (note the large standard deviations). The clique expansion has edges on all pairs and hence potentially a larger number of hypernode pairs with different labels than the mediator graph of Figure 2, thus accumulating more noise.
Preference of HyperGCN and FastHyperGCN over HGNN: To further illustrate superiority over HGNN on noisy hyperedges, we conducted experiments on synthetic hypergraphs, each consisting of a fixed number of hypernodes, randomly sampled hyperedges, and two classes with an equal number of hypernodes in each class. For each synthetic hypergraph, some hyperedges were "pure", i.e., all their hypernodes were from the same class, while the other hyperedges contained hypernodes from both classes. The ratio of hypernodes of one class to the other in a noisy hyperedge was varied in steps from less noisy to most noisy.
Table 5 shows the results on synthetic data. We initialise the hypernode features to random Gaussian vectors. We report the mean error and deviation over different synthetically generated hypergraphs. As we can see in the table, for mostly pure hyperedges, HGNN is the superior model. However, as the noise increases, our methods begin to outperform HGNN.
Subset of DBLP: We also trained all three models on a subset of DBLP (which we call sDBLP) obtained by removing all small hyperedges. The resulting hypergraph has fewer hyperedges with a larger average size. We report the mean error over different train-test splits in Table 5.
Conclusion: From the above analysis, we conclude that our proposed methods (HyperGCN and FastHyperGCN) should be preferred to HGNN for hypergraphs with large noisy hyperedges. This is also the case in the experiments on combinatorial optimisation (Table 6), which we discuss next.
7 HyperGCN for combinatorial optimisation
Inspired by the recent successes of deep graph models as learning-based approaches for NP-hard problems Li et al. (2018c); Prates et al. (2019); Lemos et al. (2019); Gong et al. (2019), we have used HyperGCN as a learning-based approach for the densest-subhypergraph problem Chlamtác et al. (2018). NP-hard problems on hypergraphs have recently been highlighted as crucial for real-world network analysis Amburg et al. (2019); Nguyen et al. (2019). Our problem is, given a hypergraph H = (V, E), to find a subset W of hypernodes so as to maximise the number of hyperedges contained in W, i.e., we wish to maximise the density of W (the number of contained hyperedges relative to |W|).
A greedy heuristic for the problem is to select the hypernodes of maximum degree. We call this "MaxDegree". Another greedy heuristic is to iteratively remove all hyperedges of the current (residual) hypergraph containing a hypernode of minimum degree. We repeat the procedure a fixed number of times and consider the density of the remaining hypernodes. We call this "RemoveMinDegree".

(Table 6 layout: columns synthetic test set, DBLP (co-authorship), Pubmed (co-citation), Cora (co-authorship), Cora (co-citation), Citeseer (co-citation); rows: MaxDegree, RemoveMinDegree, MLP, MLP + HLR, HGNN, 1-HyperGCN, FastHyperGCN, HyperGCN, # hyperedges)
Experiments: Table 6 shows the results. We trained all the learning-based models on a synthetically generated dataset. More details on the approach and the synthetic data are in the supplementary. As seen in Table 6, our proposed HyperGCN outperforms all the other approaches except on the Pubmed dataset, which contains a small number of vertices with large degrees and a large number of vertices with small degrees; the RemoveMinDegree baseline is able to recover all the hyperedges there.
Qualitative analysis: Figure 3 shows the visualisations given by RemoveMinDegree and HyperGCN on the Cora coauthorship hypergraph. We used Gephi’s Force Atlas to space out the vertices. In general, a cluster of nearby vertices has multiple hyperedges connecting them. Clusters of only green vertices indicate the method has likely included all vertices within the hyperedges induced by the cluster. The figure of HyperGCN has more dense green clusters than that of RemoveMinDegree.
8 Comparison of training time
We compared the average training time of an epoch of FastHyperGCN and HGNN in Table 1. Both were run on a GeForce GTX 1080 Ti GPU. We observe that FastHyperGCN is faster than HGNN because it uses a linear number of edges for each hyperedge while HGNN uses a quadratic number. FastHyperGCN is also superior in terms of performance on hypergraphs with large noisy hyperedges.
9 Conclusion
We have proposed HyperGCN, a new method for training a GCN on hypergraphs using tools from the spectral theory of hypergraphs. We have shown HyperGCN's effectiveness in SSL and in combinatorial optimisation. Approaches that assign importance to nodes Veličković et al. (2018); Monti et al. (2018); Vashishth et al. (2019b) have improved results on SSL; HyperGCN may be augmented with such approaches for further improved performance.

Supplementary: Hypergraph convolutional network
10 Algorithms of our proposed methods
The forward propagation rule of a graph convolutional network (GCN) Kipf and Welling (2017) at layer l is h^(l+1) = sigma( A_bar h^(l) Theta^(l) ), where A_bar = D~^{-1/2} (A + I) D~^{-1/2} and D~ is the diagonal degree matrix of A + I. We provide algorithms for our three proposed methods:
10.1 Time complexity
Given an attributed hypergraph , let be the number of initial features, be the number of hidden units, and be the number of labels. Further, let be the total number of epochs of training. Define

HyperGCN takes time

1HyperGCN takes time

FastHyperGCN takes time

HGNN takes time
11 HyperGCN for combinatorial optimisation
Inspired by the recent successes of deep graph models as learning-based approaches for NP-hard problems Li et al. (2018c); Prates et al. (2019); Lemos et al. (2019); Gong et al. (2019), we have used HyperGCN as a learning-based approach for the densest-subhypergraph problem Chlamtác et al. (2018), an NP-hard hypergraph problem. The problem is: given a hypergraph H = (V, E), find a subset W of hypernodes so as to maximise the number of hyperedges contained in (induced by) W, i.e., we intend to maximise the density of W.
One natural greedy heuristic approach for the problem is to select the hypernodes of maximum degree. We call this approach "MaxDegree". Another greedy heuristic approach is to iteratively remove all the hyperedges of the current (residual) hypergraph containing a hypernode of minimum degree. We repeat the procedure a fixed number of times and consider the density of the remaining hypernodes. We call this approach "RemoveMinDegree".
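A minimal sketch of the RemoveMinDegree heuristic described above (our reading of the procedure; the number of rounds k and the tie-breaking rule are assumptions, and the density is reported as surviving hyperedges over surviving hypernodes):

```python
def remove_min_degree(hyperedges, n, k):
    """Greedy heuristic: repeatedly find a hypernode of minimum degree in
    the residual hypergraph and drop every hyperedge containing it; after
    k rounds, report the density of the remaining hypernodes."""
    alive = set(range(n))
    edges = [set(e) for e in hyperedges]
    for _ in range(k):
        degree = {v: 0 for v in alive}
        for e in edges:
            for v in e:
                degree[v] += 1
        if not degree:
            break
        v_min = min(degree, key=lambda v: degree[v])
        edges = [e for e in edges if v_min not in e]  # drop its hyperedges
        alive.discard(v_min)
    return len(edges) / max(len(alive), 1)
```

MaxDegree is simpler still: sort hypernodes by degree and keep the top ones.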
11.1 Our approach
A natural approach to the problem is to train HyperGCN to perform the labelling. In other words, HyperGCN takes a hypergraph as input and outputs a binary labelling of the hypernodes. A natural output representation is a probability map over the hypernodes that indicates how likely each hypernode is to belong to the densest subhypergraph. Let the training set consist of pairs (H, Y), where H is an input hypergraph and Y is one of the optimal solutions of the NP-hard hypergraph problem. The HyperGCN model learns its parameters by being trained to predict Y given H. During training, we minimise the binary cross-entropy loss for each training sample. Additionally, we generate multiple probability maps and minimise the hindsight loss, i.e., the minimum over the maps of the cross-entropy loss corresponding to each probability map. Generating multiple probability maps has the advantage of producing diverse solutions Li et al. (2018c).
11.2 Experiments: Training data
To generate a sample in the training set, we fix a vertex subset W of vertices chosen uniformly at random. We generate each hyperedge such that, with high probability, it is contained in W, and with the remaining probability it is not. We give the algorithm used to generate a sample.
11.3 Experiments: Results
We generated training samples with the number of hypernodes chosen uniformly at random from a fixed range, and with hyperedge sizes chosen to be small, as is mostly the case for real-world hypergraphs. We compared all our proposed approaches, viz. 1-HyperGCN, HyperGCN, and FastHyperGCN, against the baselines MLP and MLP + HLR and the state-of-the-art HGNN. We also compared against the greedy heuristics MaxDegree and RemoveMinDegree. We trained all the deep models using the same hyperparameters as Li et al. (2018c) and report the results in Table 7. We tested all the models on a synthetically generated test set of hypergraphs, and also on the five real-world hypergraphs used in the SSL experiments. As we can see in the table, our proposed HyperGCN outperforms all the other approaches except on the Pubmed dataset, which contains a small number of vertices with large degrees and a large number of vertices with small degrees; the RemoveMinDegree baseline is able to recover all the hyperedges in Pubmed. Moreover, FastHyperGCN is competitive with HyperGCN, as the number of hypergraphs in the training data is large.
11.4 Qualitative analysis
Figure 4 shows the visualisations given by RemoveMinDegree and HyperGCN on the Cora coauthorship hypergraph. We used Gephi’s Force Atlas to space out the vertices. In general, a cluster of nearby vertices has multiple hyperedges connecting them. Clusters of only green vertices indicate the method has likely included all vertices within the hyperedges induced by the cluster. The figure of HyperGCN has more dense green clusters than that of RemoveMinDegree. Figure 5 shows the results of HGNN vs. HyperGCN.
(Table 7 layout: columns synthetic test set, DBLP (co-authorship), Pubmed (co-citation), Cora (co-authorship), Cora (co-citation), Citeseer (co-citation); rows: MaxDegree, RemoveMinDegree, MLP, MLP + HLR, HGNN, 1-HyperGCN, FastHyperGCN, HyperGCN, # hyperedges)
12 Sources of the realworld datasets
Co-authorship data: All documents co-authored by an author are in one hyperedge. We used the author data (https://people.cs.umass.edu/~mccallum/data.html) to get the co-authorship hypergraph for Cora. We manually constructed the DBLP dataset from Aminer (https://aminer.org/lab-datasets/citation/DBLP-citation-Jan8.tar.bz).
Co-citation data: All documents cited by a document are connected by a hyperedge. We used Cora, Citeseer, and Pubmed from https://linqs.soe.ucsc.edu/data for co-citation relationships. We removed hyperedges with exactly one hypernode, as our focus in this work is on hyperedges with two or more hypernodes. Each hypernode (document) is represented by bag-of-words features (the feature matrix X).
12.1 Construction of the DBLP dataset
We downloaded the entire DBLP data from https://aminer.org/lab-datasets/citation/DBLP-citation-Jan8.tar.bz. The steps for constructing the DBLP dataset used in the paper are as follows:

We defined a set of conference categories (classes for the SSL task): "algorithms", "database", "programming", "data mining", "intelligence", and "vision".

Of all the venues in the entire DBLP dataset, we took papers only from the subset of venues listed at https://en.wikipedia.org/wiki/List_of_computer_science_conferences corresponding to the above categories.

From the venues of the above conference categories, we kept authors who published at least two documents.

We took the abstracts of all these documents and constructed a dictionary of the most frequent words (words with frequency above a threshold); this determined the dictionary size.
13 Experiments on datasets with categorical attributes
(Table 8 layout: columns mushroom, covertype45, covertype67; rows: number of hypernodes, number of hyperedges, number of edges in clique expansion, number of classes)
We closely followed the experimental setup of the baseline model Zhang et al. (2017). We experimented on three different datasets, viz. mushroom, covertype45, and covertype67, from the UCI machine learning repository Dheeru and Karra Taniskidou (2017). Properties of the datasets are summarised in Table 8. The task for each of the three datasets is to predict one of two labels (binary classification) for each unlabelled instance (hypernode). The datasets contain instances with categorical attributes. To construct the hypergraph, we treat each attribute value as a hyperedge, i.e., all instances (hypernodes) with the same attribute value are contained in one hyperedge. Because of this particular definition of a hyperedge, the clique expansion is destined to produce an almost fully connected graph, and hence a GCN on the clique expansion would be unfair to compare against. Having shown that HyperGCN is superior to 1-HyperGCN in the relational experiments, we compare only the former and the non-neural baseline Zhang et al. (2017). We refer to HyperGCN here as HyperGCN_with_mediators. We used the incidence matrix (which encodes the hypergraph structure) as the data matrix X. We trained HyperGCN_with_mediators for the full number of epochs and used the same hyperparameters as in Kipf and Welling (2017). As in Zhang et al. (2017), we performed multiple trials for each setting and report the mean accuracy (averaged over the trials). The results are shown in Figure 6. We find that the HyperGCN_with_mediators model generally does better than the baselines. We believe that this is because of the powerful feature-extraction capability of HyperGCN_with_mediators.
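The hyperedge construction described above (one hyperedge per attribute value) can be sketched as follows (a minimal helper of our own, keeping only hyperedges with two or more hypernodes, as in our datasets):

```python
def hypergraph_from_categorical(rows):
    """Build hyperedges from categorical data: all instances sharing an
    attribute value form one hyperedge (one hyperedge per (column, value)
    pair)."""
    groups = {}
    for idx, row in enumerate(rows):
        for col, val in enumerate(row):
            groups.setdefault((col, val), []).append(idx)
    # keep only hyperedges with two or more hypernodes
    return [members for members in groups.values() if len(members) >= 2]
```

This also makes clear why the clique expansion is nearly fully connected here: frequent attribute values create very large hyperedges, whose cliques connect most pairs of instances.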
13.1 GCN on clique expansion
We reiterate that the clique expansion, i.e., HGNN Feng et al. (2019), produces almost fully connected graphs for all three datasets, and hence the clique expansion does not carry any useful information. So, a GCN on the clique expansion is unfair to compare against (HGNN does not learn any useful weights for classification because of the fully connected nature of the graph).
13.2 Relevance of SSL
The main reason for performing these experiments, as pointed out in the publicly accessible NIPS reviews (https://papers.nips.cc/paper/4914-the-total-variation-on-hypergraphs-learning-on-hypergraphs-revisited) of the total variation on hypergraphs Hein et al. (2013), is to show that the proposed method (the primal-dual hybrid gradient method in their case and the HyperGCN_with_mediators method in our case) has improved results on SSL, even if SSL is not very relevant in the first place.
We do not claim that SSL with HyperGCN_with_mediators is the best way to handle these categorical data, but we do claim that, given this hypergraph built from non-relational data, it achieves superior results compared to the previous best non-neural hypergraph-based SSL method Zhang et al. (2017) in the literature; that is why we followed their experimental setup.
14 Derivations
We show how the graph convolutional network (GCN) Kipf and Welling (2017) has its roots in the convolution theorem Mallat (1999).
(Table layout: column available data; methods: CI, MLP, MLP + HLR, HGNN, 1-HyperGCN, FastHyperGCN, HyperGCN)
14.1 Graph signal processing
We now briefly review essential concepts of graph signal processing that are important in the construction of ChebNet and graph convolutional networks. We need convolutions on graphs defined in the spectral domain. Similar to regular 1D or 2D signals, real-valued graph signals can be efficiently analysed via harmonic analysis and processed in the spectral domain Shuman et al. (2013). To define spectral convolution, we note that the convolution theorem Mallat (1999) generalises from classical discrete signal processing to arbitrary graphs Sandryhaila and Moura (2013).
Informally, the convolution theorem says that the convolution of two signals in one domain (say, the time domain) equals pointwise multiplication of the signals in the other domain (the frequency domain). More formally, given a graph signal x and a filter signal f, both defined in the vertex domain (time domain), the convolution of the two signals, x * f, satisfies

hat(x * f) = hat(x) ⊙ hat(f),   (3)

where hat(x), hat(f), and hat(x * f) are the graph signals in the spectral domain (frequency domain) corresponding, respectively, to x, f, and x * f.
An essential operator for computing graph signals in the spectral domain is the symmetrically normalised graph Laplacian operator of G, defined as

L = I - D^{-1/2} A D^{-1/2},   (4)

where D is the diagonal degree matrix with elements D_ii = sum_j A_ij. As the above graph Laplacian operator, L, is a real symmetric and positive semi-definite matrix, it admits a spectral eigendecomposition of the form L = U Lambda U^T, where U = [u_1, ..., u_n] forms an orthonormal basis of eigenvectors and Lambda = diag(lambda_1, ..., lambda_n) is the diagonal matrix of the corresponding eigenvalues, with 0 <= lambda_1 <= ... <= lambda_n. The eigenvectors form a Fourier basis, and the eigenvalues carry a notion of frequency as in classical Fourier analysis. The graph Fourier transform of a graph signal x is thus defined as hat(x) = U^T x, and the inverse graph Fourier transform turns out to be x = U hat(x), which is the same as

x = sum_{i=1}^{n} hat(x)_i u_i.   (5)

The convolution theorem generalised to graph signals (3) can thus be rewritten as U^T (x * f) = (U^T x) ⊙ (U^T f). It follows that x * f = U ( (U^T x) ⊙ (U^T f) ), which is the same as

x * f = U diag(hat(f)_1, ..., hat(f)_n) U^T x.   (6)
(Table layout: column available data; methods: CI, MLP, MLP + HLR, HGNN, 1-HyperGCN, FastHyperGCN, HyperGCN)
14.2 ChebNet convolution
We could use a non-parametric filter hat(f), but this has two limitations: (i) such filters are not localised in space, and (ii) their learning complexity is O(n). These limitations contrast with traditional CNNs, where the filters are localised in space and the learning complexity is independent of the input size. Defferrard et al. (2016) proposed to use a polynomial filter to overcome these limitations. A polynomial filter is defined as

hat(f)(lambda) = sum_{k=0}^{K} w_k lambda^k.   (7)

Using (7) in (6), we get x * f = U ( sum_{k=0}^{K} w_k Lambda^k ) U^T x. From the definition of an eigenvalue, we have L u = lambda u, and hence L^k u = lambda^k u for a positive integer k. Therefore,

L^k = U Lambda^k U^T.   (8)

Hence,

x * f = sum_{k=0}^{K} w_k L^k x.   (9)

The graph convolution provided by Eq. (9) uses the monomial basis to learn the filter weights. Monomial bases are not optimal for training and are not stable under perturbations, because they do not form an orthogonal basis. Defferrard et al. (2016) proposed to use the orthogonal Chebyshev polynomials Hammond et al. (2011) (hence the name ChebNet) to recursively compute the powers of the graph Laplacian.
A Chebyshev polynomial of order k can be computed recursively via the stable recurrence relation T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x), with T_0(x) = 1 and T_1(x) = x.
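The recurrence can be applied directly to the (scaled) Laplacian, avoiding any eigendecomposition. A minimal sketch of a Chebyshev filter of this kind (our own helper; it agrees with the explicit polynomial T_0 = I, T_1 = L~, T_2 = 2 L~^2 - I):

```python
import numpy as np

def chebyshev_filter(L_scaled, x, coeffs):
    """Apply sum_k w_k T_k(L~) x using the stable recurrence
    T_0 x = x, T_1 x = L~ x, T_k x = 2 L~ T_{k-1} x - T_{k-2} x."""
    t_prev, t_curr = x, L_scaled @ x
    out = coeffs[0] * t_prev
    if len(coeffs) > 1:
        out = out + coeffs[1] * t_curr
    for w in coeffs[2:]:
        t_prev, t_curr = t_curr, 2 * (L_scaled @ t_curr) - t_prev
        out = out + w * t_curr
    return out
```

Each step costs one sparse matrix-vector product, which is what makes ChebNet (and its first-order truncation, the GCN) scale to large graphs.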