ResMHGNN
Source code for the paper Residual Enhanced Multi-Hypergraph Neural Network.
Hypergraphs are a generalization of graphs that model higher-order correlations among entities, and have been successfully adopted in various research domains. Meanwhile, the HyperGraph Neural Network (HGNN) is currently the de-facto method for hypergraph representation learning. However, HGNN aims at single hypergraph learning and uses a pre-concatenation approach when confronting multi-modal datasets, which leads to sub-optimal exploitation of the inter-correlations of multi-modal hypergraphs. HGNN also suffers from the over-smoothing issue, that is, its performance drops significantly when layers are stacked up. To resolve these issues, we propose the Residual enhanced Multi-Hypergraph Neural Network, which can not only fuse multi-modal information from each hypergraph effectively, but also circumvent the over-smoothing issue associated with HGNN. We conduct experiments on two 3D benchmarks, the NTU and ModelNet40 datasets, and compare against multiple state-of-the-art methods. Experimental results demonstrate that both the residual hypergraph convolutions and the multi-fusion architecture improve the performance of the base model, and the combined model achieves a new state-of-the-art. Code is available at <https://github.com/OneForward/ResMHGNN>.
The hypergraph structure consists of a set of vertices and a set of hyperedges, where a hyperedge can contain any number of vertices. In recent years, hypergraphs have attracted increasing attention in various computer vision tasks, including 3D object classification [7], 3D pose estimation [10], person re-identification [1], cross-modal retrieval [6], hypergraph-based image processing [17], and video segmentation [12].

Meanwhile, multi-modal datasets are becoming more and more common due to the increasing availability of datasets acquired from different sources while describing the same category. When the dataset for each modality can be represented by a hypergraph, this leads to the multi-hypergraph learning problem. Traditional multi-hypergraph learning models, such as tMHL [7] and iMHL [7, 18], utilize the hypergraph Laplacian to design optimization schemes, which are shallow and incur high computational cost.
HyperGraph Neural Network (HGNN) [5] is currently the de-facto method for hypergraph representation learning, whose expressive embeddings are employed in diverse downstream tasks. Despite its success, HGNN fails to handle multi-modal datasets directly: it aims at single hypergraph learning and uses a pre-concatenation approach when confronting multi-modal datasets, which leads to information loss since the inter-correlations between multi-modal hypergraphs are ignored.

HGNN also suffers from the over-smoothing issue, a phenomenon in which model performance drops significantly as the number of layers increases [3]. This degradation of learning limits HGNN to a 2-layer model, which cannot maximally exploit the hypergraph structures.
In this paper, we propose the Residual enhanced Multi-HyperGraph Neural Network (ResMultiHGNN) to address the above issues. We summarize our contributions as follows: (a) We present the first Multi-Hypergraph Neural Network for direct and parallel learning on multi-modal datasets. (b) We enhance the traditional hypergraph convolution with initial residual connections and identity mapping, which help circumvent the over-smoothing issue in deep hypergraph models. (c) Combining both techniques, we present the first deep multi-hypergraph neural network for deep multi-modal learning. On the view-based 3D object classification task, our model achieves a new state-of-the-art.
Let the triplet $\mathcal{G} = (\mathcal{V}, \mathcal{E}, X)$ denote a hypergraph, where $\mathcal{V}$ is a set of vertices, each hyperedge $e \in \mathcal{E}$ is a non-empty subset of $\mathcal{V}$, and $X \in \mathbb{R}^{|\mathcal{V}| \times d}$ is the feature matrix of the vertices. Each row of $X$ denotes the $d$-dimensional feature of a vertex. The degree of a hyperedge $e$ is $\delta(e) = |e|$, while the degree of a vertex $v$ is defined as the number of hyperedges containing $v$. Let $D_v$ and $D_e$ denote the diagonal matrices of the vertex degrees and hyperedge degrees, respectively. The hypergraph can also be characterized by $H \in \{0, 1\}^{|\mathcal{V}| \times |\mathcal{E}|}$, the incidence matrix whose nonzero entries $H_{v,e} = 1$ denote $v \in e$.
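As a minimal sketch of these definitions, the incidence matrix and the two degree matrices can be built as follows (the toy hypergraph is hypothetical, not from the paper):

```python
import numpy as np

# Toy hypergraph (hypothetical): 4 vertices, 2 hyperedges
# e0 = {0, 1, 2}, e1 = {2, 3}
edges = [{0, 1, 2}, {2, 3}]
n_vertices, n_edges = 4, len(edges)

# Incidence matrix H: H[v, e] = 1 iff vertex v belongs to hyperedge e.
H = np.zeros((n_vertices, n_edges))
for e, members in enumerate(edges):
    for v in members:
        H[v, e] = 1.0

# Vertex degrees: number of hyperedges containing each vertex.
Dv = np.diag(H.sum(axis=1))
# Hyperedge degrees: number of vertices in each hyperedge.
De = np.diag(H.sum(axis=0))

print(Dv.diagonal())  # [1. 1. 2. 1.]
print(De.diagonal())  # [3. 2.]
```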
Dataset | #Objects | MVCNN feat. dim | GVCNN feat. dim | #Classes | Training ratio
---|---|---|---|---|---
ModelNet40 | 12,311 | 4,096 | 2,048 | 40 | 80%
NTU | 2,012 | 4,096 | 2,048 | 67 | 81%
HGNN utilizes the hypergraph Laplacian $\Delta$ to design the hypergraph convolution, where $\Delta$ is defined as

$$\Delta = D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2}, \tag{1}$$

where $W$ is the diagonal matrix of hyperedge weights. Let $\Theta^{(l)}$ and $X^{(l)}$ denote the learnable parameters and the input features in the $l$-th layer, respectively; then the message passing process of HGNN is formulated as

$$X^{(l+1)} = \sigma\left(\Delta X^{(l)} \Theta^{(l)}\right), \tag{2}$$

where $\sigma$ is the activation function.
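Equations (1) and (2) can be sketched in NumPy as follows; the function names `hypergraph_laplacian` and `hgnn_layer` are illustrative (not from the released code), and the toy data is hypothetical:

```python
import numpy as np

def hypergraph_laplacian(H, w=None):
    """Normalized hypergraph Laplacian: Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}."""
    n_v, n_e = H.shape
    W = np.diag(np.ones(n_e) if w is None else w)  # diagonal hyperedge weights
    dv = H @ W.diagonal()                          # (weighted) vertex degrees
    de = H.sum(axis=0)                             # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    De_inv = np.diag(1.0 / de)
    return Dv_inv_sqrt @ H @ W @ De_inv @ H.T @ Dv_inv_sqrt

def hgnn_layer(Delta, X, Theta):
    """One HGNN message-passing step with ReLU activation: sigma(Delta X Theta)."""
    return np.maximum(Delta @ X @ Theta, 0.0)

# Hypothetical usage: 4 vertices, 2 hyperedges, 8-d features, 16-d output.
H = np.array([[1, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
Delta = hypergraph_laplacian(H)
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
Theta = rng.standard_normal((8, 16))
out = hgnn_layer(Delta, X, Theta)
```

Note that $\Delta$ is symmetric because $W$ and $D_e^{-1}$ are diagonal and the normalization is applied on both sides.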
Full training labels:

Dataset | Method | 2 | 4 | 8 | 16 | 32 | 64
---|---|---|---|---|---|---|---
ModelNet40 | HGNN | 96.88 | 96.68 | 96.43 | 79.58 | 4.34 | 4.13
ModelNet40 | MultiHGNN | 97.45 | 97.16 | 96.56 | 86.31 | 4.05 | 4.05
ModelNet40 | ResHGNN | 97.49 | 97.49 | 97.53 | 97.49 | 97.49 | 97.41
ModelNet40 | ResMultiHGNN | 98.02 | 97.81 | 97.93 | 97.97 | 97.85 | 97.69
NTU2012 | HGNN | 83.65 | 82.57 | 81.50 | 55.50 | 5.09 | 5.09
NTU2012 | MultiHGNN | 85.26 | 84.18 | 82.31 | 70.78 | 6.70 | 4.56
NTU2012 | ResHGNN | 84.99 | 85.52 | 85.52 | 85.26 | 85.26 | 85.26
NTU2012 | ResMultiHGNN | 85.79 | 86.86 | 85.79 | 86.06 | 85.26 | 85.26

Balanced training labels:

Dataset | Method | 2 | 4 | 8 | 16 | 32 | 64
---|---|---|---|---|---|---|---
ModelNet40 | HGNN | 97.79 | 97.65 | 97.55 | 88.27 | 5.65 | 4.15
ModelNet40 | MultiHGNN | 98.49 | 98.29 | 97.49 | 93.51 | 4.15 | 2.77
ModelNet40 | ResHGNN | 98.14 | 98.14 | 98.19 | 98.19 | 98.16 | 98.29
ModelNet40 | ResMultiHGNN | 98.67 | 98.66 | 98.74 | 98.68 | 98.64 | 98.71
NTU2012 | HGNN | 90.31 | 89.63 | 87.67 | 41.87 | 2.30 | 6.10
NTU2012 | MultiHGNN | 90.58 | 90.52 | 88.55 | 77.98 | 20.66 | 3.18
NTU2012 | ResHGNN | 91.80 | 91.33 | 91.26 | 91.40 | 91.67 | 91.46
NTU2012 | ResMultiHGNN | 91.94 | 91.80 | 92.14 | 91.73 | 91.80 | 92.01
Given a multi-modal dataset containing $K$ hypergraphs, HGNN pre-concatenates these hypergraphs into a larger hypergraph and then conducts single hypergraph learning.

To better exploit the inter-relations between different modalities, we propose the Multi-Hypergraph Neural Network (MultiHGNN), which includes $K$ distinct branches to extract high-level information from each modality in parallel. As illustrated in Fig. 1, the embeddings from the multi-modal hypergraphs are combined and then output for the downstream tasks. Notice that the gradients in each branch are back-propagated separately, so the inner structure of each modality can be fully explored. In practice we use the simple Mean function to combine the multiple embeddings.
Formally, the output of MultiHGNN is formulated as

$$Z = \frac{1}{K} \sum_{k=1}^{K} \mathrm{HGNN}\left(X_k, \Delta_k\right), \tag{3}$$

where $\Delta_k$ is the hypergraph Laplacian of $\mathcal{G}_k$.
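The per-branch learning and Mean fusion of Eq. (3) can be sketched as below; `multi_hgnn` and its argument names are our own, and the toy usage is hypothetical:

```python
import numpy as np

def multi_hgnn(X_list, Delta_list, Theta_lists, act=lambda x: np.maximum(x, 0.0)):
    """Run one HGNN branch per modality, then fuse the embeddings by the mean.
    X_list[k]:      feature matrix of modality k
    Delta_list[k]:  hypergraph Laplacian of modality k
    Theta_lists[k]: list of weight matrices (one per layer) for branch k
    """
    outs = []
    for X, Delta, Thetas in zip(X_list, Delta_list, Theta_lists):
        Z = X
        for Theta in Thetas:          # propagate through this branch's layers
            Z = act(Delta @ Z @ Theta)
        outs.append(Z)
    return np.mean(outs, axis=0)      # simple Mean fusion, as in the paper

# Hypothetical usage: two modalities, identity Laplacians, one layer per branch.
rng = np.random.default_rng(0)
X_list = [rng.standard_normal((4, 8)) for _ in range(2)]
Delta_list = [np.eye(4), np.eye(4)]
Theta_lists = [[rng.standard_normal((8, 16))] for _ in range(2)]
Z = multi_hgnn(X_list, Delta_list, Theta_lists)
```

Because each branch has its own parameters, the gradients of each modality flow through its own hypergraph structure, matching the parallel design described above.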
In computer vision, it is well known that residual connections [8] are key components for building powerful and extremely deep networks. Drawing inspiration from GCNII [3], we enhance the vanilla hypergraph convolution with an initial residual and identity mapping. Given the single hypergraph Laplacian $\Delta$, we define the propagation process of the residual enhanced hypergraph convolution in the $l$-th layer as

$$X^{(l+1)} = \sigma\left( \left( (1-\alpha)\, \Delta X^{(l)} + \alpha X^{(0)} \right) \left( (1-\beta_l)\, I + \beta_l\, \Theta^{(l)} \right) \right), \tag{4}$$

where $\alpha$ and $\beta_l$ are hyperparameters and $I$ is the identity matrix.
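A minimal sketch of one layer of Eq. (4), assuming ReLU as the activation; the function name `res_hgnn_layer` and the default hyperparameter values are our assumptions, not taken from the paper:

```python
import numpy as np

def res_hgnn_layer(Delta, X_l, X_0, Theta, alpha=0.1, beta=0.5):
    """Residual enhanced hypergraph convolution (GCNII-style):
    the initial residual mixes the input features X_0 back in,
    and the identity mapping shrinks Theta toward the identity matrix.
    Theta must be square (d x d) so the identity mapping is well defined."""
    d = X_l.shape[1]
    P = (1.0 - alpha) * (Delta @ X_l) + alpha * X_0   # initial residual
    M = (1.0 - beta) * np.eye(d) + beta * Theta       # identity mapping
    return np.maximum(P @ M, 0.0)                     # ReLU activation

# Hypothetical usage: identity Laplacian, 4 vertices, 8-d features.
rng = np.random.default_rng(1)
Delta = np.eye(4)
X0 = rng.standard_normal((4, 8))
X1 = rng.standard_normal((4, 8))
Theta = rng.standard_normal((8, 8))
out = res_hgnn_layer(Delta, X1, X0, Theta)
```

As a sanity check, with `alpha=1.0` and `beta=0.0` the layer reduces to `ReLU(X_0)`, i.e. the input features pass straight through, which is what makes deep stacks resistant to over-smoothing.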
We present ResHGNN by simply stacking multiple blocks of residual hypergraph convolution, which empirically circumvents the over-smoothing issue associated with HGNN.

We combine both techniques to build a very deep multi-hypergraph neural network, denoted ResMultiHGNN. In each branch, additional linear transformations are added in the first and last layers; then the residual hypergraph convolutions are employed to propagate information over the distinct hypergraph structures. The deep embeddings from each branch are finally combined for the downstream tasks.
We evaluate the performance of the proposed method on view-based 3D object classification task. We use the Princeton ModelNet dataset [16] and National Taiwan University (NTU) 3D model dataset as testing benchmarks. The ModelNet40 dataset is composed of 12,311 3D CAD models from 40 popular object categories, which is the publicly used subset of Princeton ModelNet dataset [16]. We use the same training/testing splits as [4], where the training split contains 9,843 objects and the testing split contains 2,468 objects. The NTU dataset [2] includes 2012 3D objects from 67 categories, such as boat, motorcycle, train, and truck. We closely follow [4] and use the same training/testing splits. Details of the datasets are listed in Table 1.
For a fair comparison, we employ the same shape features as [4], which are extracted from two multi-view based 3D shape descriptors, i.e., Multi-View Convolutional Neural Network (MVCNN) [15] features and Group-View Convolutional Neural Network (GVCNN) [5] features. The hypergraph of each modality is constructed from its respective features: each hyperedge contains the 10 nearest neighbors of the central vertex/object. The multi-hypergraphs can be directly fed into the MultiHGNNs, whereas HGNNs only accept the pre-concatenated single hypergraph as input.

Note that the original training splits in both datasets contain unbalanced labels. We further conduct experiments using only a balanced subset of the original training labels and predicting the results on the remaining samples. We implement all models with PyTorch [13] and release our code publicly on GitHub for reproducible experiments.

Table 2 reports the performance of the proposed methods against HGNN at different depths, which can also be viewed as an ablation study of the multi-fusion structure and the residual hypergraph convolution. We visualize the results in Fig. 2 for better comparison. Based on the table and the figure, we summarize our observations as follows: (1) Regardless of whether the residual hypergraph convolution is added, MultiHGNNs are consistently better than HGNNs, which verifies the effectiveness of the proposed multi-fusion architecture. (2) Residual connections consistently enhance the performance of HGNN and MultiHGNN. It is worth noting that as the number of layers increases, the residual enhanced models maintain stable performance, whereas the performance of the vanilla models deteriorates significantly. (3) ResMultiHGNN outperforms all other methods on both datasets.
We also point out that using only the balanced subset of training labels and testing on all the remaining samples produces much better results than using the full training labels. This indicates the importance of a balanced label distribution in the training samples for hypergraph learning.
Table 3 summarizes the classification accuracy of the proposed ResMultiHGNN model against multiple recent state-of-the-art methods on the ModelNet40 dataset. We see that ResMultiHGNN achieves a new state-of-the-art among both the object recognition methods and the hypergraph learning methods, which demonstrates the power of multi-modal fusion and residual enhanced deep structures.

We further investigate the stability of our method against HGNN by modifying the ratio of training labels. We conduct all experiments with 8 different seeds and report the best-performing models with optimal layers, as visualized in Fig. 3. We observe that ResMultiHGNN consistently performs better than HGNN at all training ratios, with gains of around 2% to 3%. HGNN exhibits higher variance, whereas ResMultiHGNN is more stable, especially when the training ratios are small.
In this paper, we propose the first Multi-Hypergraph Neural Network for multi-modal learning, and enhance it with residual connections to build deep structures. Extensive experiments demonstrate the effectiveness of the proposed multi-fusion architecture and residual hypergraph convolution. ResMultiHGNN enjoys both gains, showing stable and better results than HGNN.
References

- D.-Y. Chen, X.-P. Tian, Y.-T. Shen, and M. Ouhyoung. On visual similarity based 3D model retrieval. Comput. Graph. Forum 22(3), pp. 223–232, 2003.
- M. Chen, Z. Wei, Z. Huang, B. Ding, and Y. Li. Simple and deep graph convolutional networks. In International Conference on Machine Learning (ICML), 2020. arXiv:2007.02133.
- Y. Feng, H. You, Z. Zhang, R. Ji, and Y. Gao. Hypergraph neural networks. In The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019), pp. 3558–3565.
- Y. Feng, Z. Zhang, X. Zhao, R. Ji, and Y. Gao. GVCNN: Group-view convolutional neural networks for 3D shape recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), pp. 264–272.
- Semi-dynamic hypergraph neural network for 3D pose estimation. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20).
- A. Paszke et al. PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems (NeurIPS), 2019.