Residual Enhanced Multi-Hypergraph Neural Network

05/02/2021 ∙ by Jing Huang, et al.

Hypergraphs generalize graphs to model higher-order correlations among entities and have been successfully adopted in various research domains. Meanwhile, the HyperGraph Neural Network (HGNN) is currently the de-facto method for hypergraph representation learning. However, HGNN targets single-hypergraph learning and uses a pre-concatenation approach when confronting multi-modal datasets, which leads to sub-optimal exploitation of the inter-correlations among multi-modal hypergraphs. HGNN also suffers from the over-smoothing issue: its performance drops significantly as layers are stacked up. To resolve these issues, we propose the Residual enhanced Multi-Hypergraph Neural Network, which can not only fuse multi-modal information from each hypergraph effectively, but also circumvent the over-smoothing issue associated with HGNN. We conduct experiments on two 3D benchmarks, the NTU and ModelNet40 datasets, and compare against multiple state-of-the-art methods. Experimental results demonstrate that both the residual hypergraph convolutions and the multi-fusion architecture improve the performance of the base model, and that the combined model achieves a new state-of-the-art. Code is available at <https://github.com/OneForward/ResMHGNN>.


1 Introduction

The hypergraph structure consists of a set of vertices and hyperedges, where a hyperedge can contain any number of vertices. In recent years, hypergraphs have attracted increasing attention in various computer vision tasks, including 3D object classification [7], 3D pose estimation [10], person re-identification [1], cross-modal retrieval [6], hypergraph-based image processing [17] and video segmentation [12].

In the meantime, multi-modal datasets are becoming more and more common, owing to the increasing availability of data acquired from different sources while describing the same objects. When the dataset of each modality can be represented by a hypergraph, this leads to the multi-hypergraph learning problem. Traditional multi-hypergraph learning models, such as tMHL [7] and iMHL [7, 18], utilize the hypergraph Laplacian to design optimization schemes, which are shallow and require a high computational cost.

The HyperGraph Neural Network (HGNN) [4] is currently the de-facto method for hypergraph representation learning, and its expressive embeddings are employed in diverse downstream tasks. Despite its success, HGNN fails to handle multi-modal datasets directly. It targets single-hypergraph learning and uses a pre-concatenation approach when confronting multi-modal datasets, which leads to information loss since the inter-correlations between the multi-modal hypergraphs are ignored.

HGNN also suffers from the over-smoothing issue, a phenomenon in which model performance drops significantly as the number of layers increases [3]. This degradation limits HGNN to a 2-layer model, which cannot maximally exploit the hypergraph structure.

In this paper, we propose the Residual enhanced Multi-HyperGraph Neural Network (ResMultiHGNN) to address the above issues. We summarize our contributions as follows: (a) We present the first Multi-Hypergraph Neural Network for direct and parallel learning on multi-modal datasets. (b) We enhance the traditional hypergraph convolution with an initial residual connection and identity mapping, which together circumvent the over-smoothing issue in deep hypergraph models. (c) Combining these techniques, we present the first deep multi-hypergraph neural network for deep multi-modal learning. On the view-based 3D object classification task, our model achieves a new state-of-the-art.

Figure 1: An illustration of the Residual enhanced Multi-HyperGraph Neural Network.

2 Background

2.1 Hypergraph

Let the triplet $\mathcal{G} = (\mathcal{V}, \mathcal{E}, X)$ denote a hypergraph, where $\mathcal{V}$ is a set of vertices, each hyperedge $e \in \mathcal{E}$ is a non-empty subset of $\mathcal{V}$, and $X \in \mathbb{R}^{|\mathcal{V}| \times d}$ is the feature matrix of the vertices. Each row of $X$ denotes the $d$-dimensional feature of a vertex. The degree of a hyperedge $e$ is $\delta(e) = |e|$, while the degree of a vertex $v$ is defined as the number of hyperedges containing $v$. Let $D_v$ and $D_e$ denote the diagonal matrices of the vertex degrees and hyperedge degrees, respectively. The hypergraph can also be characterized by $\mathcal{G} = (H, X)$, where $H \in \{0,1\}^{|\mathcal{V}| \times |\mathcal{E}|}$ is the incidence matrix with each nonzero entry $H_{v,e} = 1$ denoting $v \in e$.
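To make the notation concrete, here is a minimal PyTorch sketch (our own illustration; `build_incidence` is a hypothetical helper, not from the released code) that constructs $H$, $D_v$ and $D_e$ from a list of hyperedges:

```python
import torch

def build_incidence(num_vertices, hyperedges):
    """Incidence matrix H with H[v, e] = 1 iff vertex v belongs to hyperedge e."""
    H = torch.zeros(num_vertices, len(hyperedges))
    for e, members in enumerate(hyperedges):
        H[members, e] = 1.0
    return H

# Toy hypergraph: 5 vertices, 2 hyperedges.
H = build_incidence(5, [[0, 1, 2], [2, 3, 4]])
Dv = torch.diag(H.sum(dim=1))  # vertex degrees (number of incident hyperedges)
De = torch.diag(H.sum(dim=0))  # hyperedge degrees (number of member vertices)
```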

Dataset      $N$     $d_1$   $d_2$   $C$   $r$
ModelNet40   12311   4096    2048    40    80%
NTU          2012    4096    2048    67    81%

Table 1: Details of the ModelNet40 and NTU datasets. $N$ is the number of vertices; $d_1$ and $d_2$ are the dimensions of the MVCNN and GVCNN features, respectively; $C$ is the number of classes and $r$ is the label rate.

2.2 Hypergraph Neural Network

HGNN utilizes the hypergraph Laplacian $\hat{\Delta}$ to design the hypergraph convolution, where $\hat{\Delta}$ is defined as

\hat{\Delta} = D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2},    (1)

where $W$ is the diagonal matrix of hyperedge weights (the identity matrix when all hyperedges are weighted equally).

Let $\Theta^{(l)}$ and $X^{(l)}$ denote the learnable parameters and the input features of the $l$-th layer, respectively; the message passing process of HGNN is then formulated as

X^{(l+1)} = \sigma\big(\hat{\Delta} X^{(l)} \Theta^{(l)}\big),    (2)

where $\sigma$ is the activation function.
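As a hedged illustration of Eqs. (1) and (2), the following sketch assumes unit hyperedge weights ($W = I$); the function and class names are ours, not from the released code:

```python
import torch
import torch.nn as nn

def hypergraph_laplacian(H):
    """Eq. (1) with W = I: Dv^{-1/2} H De^{-1} H^T Dv^{-1/2}."""
    dv = H.sum(dim=1).clamp(min=1)         # vertex degrees
    de = H.sum(dim=0).clamp(min=1)         # hyperedge degrees
    Dv_inv_sqrt = torch.diag(dv.pow(-0.5))
    De_inv = torch.diag(de.pow(-1.0))
    return Dv_inv_sqrt @ H @ De_inv @ H.t() @ Dv_inv_sqrt

class HGNNConv(nn.Module):
    """One HGNN layer, Eq. (2): X^{l+1} = sigma(Laplacian @ X^l @ Theta^l)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, laplacian, X):
        return torch.relu(self.theta(laplacian @ X))
```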

                          ------------------ Full ------------------    ---------------- Balanced ----------------
Dataset      Method        2      4      8      16     32     64         2      4      8      16     32     64
ModelNet40   HGNN          96.88  96.68  96.43  79.58   4.34   4.13      97.79  97.65  97.55  88.27   5.65   4.15
             MultiHGNN     97.45  97.16  96.56  86.31   4.05   4.05      98.49  98.29  97.49  93.51   4.15   2.77
             ResHGNN       97.49  97.49  97.53  97.49  97.49  97.41      98.14  98.14  98.19  98.19  98.16  98.29
             ResMultiHGNN  98.02  97.81  97.93  97.97  97.85  97.69      98.67  98.66  98.74  98.68  98.64  98.71
NTU2012      HGNN          83.65  82.57  81.50  55.50   5.09   5.09      90.31  89.63  87.67  41.87   2.30   6.10
             MultiHGNN     85.26  84.18  82.31  70.78   6.70   4.56      90.58  90.52  88.55  77.98  20.66   3.18
             ResHGNN       84.99  85.52  85.52  85.26  85.26  85.26      91.80  91.33  91.26  91.40  91.67  91.46
             ResMultiHGNN  85.79  86.86  85.79  86.06  85.26  85.26      91.94  91.80  92.14  91.73  91.80  92.01

Table 2: Summary of classification accuracy (%) at different depths (columns give the number of layers). Full denotes experiments with the full training labels, while Balanced denotes experiments with a balanced subset of the original training labels. The best result for each dataset is achieved by ResMultiHGNN.
Figure 2: Performance of the different hypergraph neural networks vs. the number of layers on the 3D object classification task (Balanced).
Figure 3: Stability analysis: performance of HGNN and ResMultiHGNN vs. different ratios of training labels.

3 Proposed Methods

3.1 Multi-Hypergraph Neural Network

Given a multi-modal dataset represented by $M$ hypergraphs, HGNN pre-concatenates these hypergraphs into a single larger hypergraph and then conducts single-hypergraph learning.

To better exploit the inter-relations between different modalities, we propose the Multi-Hypergraph Neural Network (MultiHGNN), which includes $M$ distinct branches to extract high-level information from each modality in parallel. As illustrated in Fig. 1, the embeddings from the multi-modal hypergraphs are combined and then output for downstream tasks. Note that the gradients in each branch are back-propagated separately, so the inner structure of each modality can be fully explored. In practice we use a simple mean to combine the multiple embeddings.

Formally, for a dataset with $M$ modalities, the output of MultiHGNN is formulated as

Z = \frac{1}{M} \sum_{m=1}^{M} \mathrm{HGNN}_m\big(X_m, \hat{\Delta}_m\big),    (3)

where $\hat{\Delta}_m$ is the hypergraph Laplacian of the $m$-th hypergraph $\mathcal{G}_m$.
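A minimal sketch of this mean fusion, reusing the hypothetical `HGNNConv` layer from the sketch in Section 2.2 (the two-layer branch depth is our assumption):

```python
import torch
import torch.nn as nn

class MultiHGNN(nn.Module):
    """One independent HGNN branch per modality; outputs are mean-fused, cf. Eq. (3)."""
    def __init__(self, in_dims, hidden_dim, num_classes):
        super().__init__()
        # A two-layer HGNN branch for each modality (HGNNConv as sketched in Sec. 2.2).
        self.branches = nn.ModuleList(
            nn.ModuleList([HGNNConv(d, hidden_dim), HGNNConv(hidden_dim, num_classes)])
            for d in in_dims
        )

    def forward(self, laplacians, features):
        outs = []
        for branch, laplacian, X in zip(self.branches, laplacians, features):
            for conv in branch:
                X = conv(laplacian, X)
            outs.append(X)
        return torch.stack(outs).mean(dim=0)  # mean fusion across modalities
```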

3.2 Residual Hypergraph Convolution

In computer vision, it is well known that residual connections [8] are key components for building powerful and extremely deep networks. Drawing inspiration from GCNII [3], we enhance the vanilla hypergraph convolution with an initial residual and an identity mapping. Given the single hypergraph Laplacian $\hat{\Delta}$, we define the propagation process of the residual enhanced hypergraph convolution in the $l$-th layer as

X^{(l+1)} = \sigma\Big( \big( (1-\alpha) \hat{\Delta} X^{(l)} + \alpha X^{(0)} \big) \big( (1-\beta_l) I + \beta_l \Theta^{(l)} \big) \Big),    (4)

where $\alpha$ and $\beta_l$ are hyperparameters, $X^{(0)}$ is the initial (mapped) input feature, and $I$ is the identity matrix.

We obtain ResHGNN by naively stacking multiple blocks of this residual hypergraph convolution, which empirically circumvents the over-smoothing issue associated with HGNN.
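A sketch of one such block under Eq. (4); the schedule $\beta_l = \log(\lambda/l + 1)$ and the default $\alpha$ follow GCNII [3] and are our assumption for this paper's setting:

```python
import math
import torch
import torch.nn as nn

class ResHGNNConv(nn.Module):
    """Residual hypergraph convolution, Eq. (4): initial residual + identity mapping."""
    def __init__(self, dim, layer_idx, alpha=0.1, lam=0.5):
        super().__init__()
        self.theta = nn.Linear(dim, dim, bias=False)
        self.alpha = alpha
        self.beta = math.log(lam / layer_idx + 1)  # beta_l decays with depth, as in GCNII [3]

    def forward(self, laplacian, X, X0):
        # (1 - alpha) * Laplacian @ X^l + alpha * X^0 : initial residual to the mapped input
        support = (1 - self.alpha) * (laplacian @ X) + self.alpha * X0
        # (1 - beta_l) * I + beta_l * Theta^l : identity mapping on the weights
        return torch.relu((1 - self.beta) * support + self.beta * self.theta(support))
```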

3.3 Deep Multi-Hypergraph Neural Network

We combine both techniques to build a very deep multi-hypergraph neural network, denoted ResMultiHGNN. In each branch, additional linear transforms are added as the first and last layers, and the residual hypergraph convolutions in between propagate information over that branch's distinct hypergraph structure. The deep embeddings from each branch are finally combined for downstream tasks.
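Putting the pieces together, a hedged sketch of the overall forward pass (branch depth, fusion, and activation choices are our assumptions, reusing the hypothetical `ResHGNNConv` from Section 3.2; consult the released code for the exact implementation):

```python
import torch
import torch.nn as nn

class ResMultiHGNN(nn.Module):
    """Per-modality branch: Linear in, stacked residual hypergraph convolutions, Linear out;
    branch outputs are mean-fused, mirroring Fig. 1."""
    def __init__(self, in_dims, hidden_dim, num_classes, num_layers):
        super().__init__()
        self.fc_in = nn.ModuleList(nn.Linear(d, hidden_dim) for d in in_dims)
        self.convs = nn.ModuleList(
            nn.ModuleList(ResHGNNConv(hidden_dim, l + 1) for l in range(num_layers))
            for _ in in_dims
        )
        self.fc_out = nn.ModuleList(nn.Linear(hidden_dim, num_classes) for _ in in_dims)

    def forward(self, laplacians, features):
        outs = []
        for m, (laplacian, X) in enumerate(zip(laplacians, features)):
            X0 = torch.relu(self.fc_in[m](X))   # input mapping (first linear layer)
            X = X0
            for conv in self.convs[m]:
                X = conv(laplacian, X, X0)      # residual to the mapped input X0
            outs.append(self.fc_out[m](X))      # last linear layer
        return torch.stack(outs).mean(dim=0)    # mean fusion across modalities
```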

4 Experiments

4.1 Datasets and Experimental Setup

We evaluate the performance of the proposed methods on the view-based 3D object classification task, using the Princeton ModelNet dataset [16] and the National Taiwan University (NTU) 3D model dataset [2] as benchmarks. The ModelNet40 dataset, the publicly used subset of the Princeton ModelNet dataset [16], is composed of 12,311 3D CAD models from 40 popular object categories. We use the same training/testing split as [4], with 9,843 training objects and 2,468 testing objects. The NTU dataset [2] includes 2,012 3D objects from 67 categories, such as boat, motorcycle, train, and truck; we again closely follow [4] and use the same training/testing split. Details of the datasets are listed in Table 1.

For a fair comparison, we employ the same shape features as [4], which are extracted from two multi-view-based 3D shape descriptors: Multi-View Convolutional Neural Network (MVCNN) features [15] and Group-View Convolutional Neural Network (GVCNN) features [5]. The hypergraph of each modality is constructed from the respective features: each hyperedge contains the 10 nearest neighbors of its central vertex/object. The multi-hypergraphs can be fed directly into the MultiHGNN variants, whereas the HGNN variants only accept the pre-concatenated single hypergraph as input.
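For reference, a hedged sketch of this 10-nearest-neighbor hypergraph construction (Euclidean distance is our assumption; `knn_incidence` is a hypothetical name):

```python
import torch

def knn_incidence(features, k=10):
    """One hyperedge per object, containing that object and its k nearest
    neighbors in feature space; returns the incidence matrix H."""
    dist = torch.cdist(features, features)     # pairwise Euclidean distances
    _, idx = dist.topk(k + 1, largest=False)   # self (distance 0) + k neighbors
    n = features.size(0)
    H = torch.zeros(n, n)                      # column e = hyperedge centered at object e
    H.scatter_(0, idx.t(), 1.0)                # mark each hyperedge's member vertices
    return H
```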

Note that the original training splits of both datasets contain unbalanced labels. We therefore also conduct experiments using only a balanced subset of the original training labels, predicting on the rest of the samples. We implement all models with PyTorch [13] and release our code publicly on GitHub for reproducible experiments.

4.2 Experimental Results

Table 2 reports the performance of the proposed methods against HGNN at different depths, which can also be viewed as an ablation study of the multi-fusion structure and the residual hypergraph convolution. We visualize the results in Fig. 2 for easier comparison. Based on the table and the figure, we summarize our observations as follows: (1) Whether or not the residual hypergraph convolution is added, the MultiHGNN variants are consistently better than their HGNN counterparts, which verifies the effectiveness of the proposed multi-fusion architecture. (2) The residual connection consistently improves the performance of HGNN and MultiHGNN. It is worth noting that as the number of layers increases, the residual enhanced models maintain stable performance, whereas the performance of the vanilla models deteriorates significantly. (3) ResMultiHGNN outperforms all other methods on both datasets.

We also point out that using only the balanced subset of the training labels and testing on all remaining samples produces much better results than using the full training labels. This indicates the importance of a balanced label distribution in the training samples for hypergraph learning.

Table 3 summarizes the classification accuracy of the proposed ResMultiHGNN against multiple recent state-of-the-art methods on the ModelNet40 dataset. ResMultiHGNN achieves a new state-of-the-art over both the object recognition methods and the hypergraph learning methods, which demonstrates the power of multi-modal fusion and residual enhanced deep structures.

4.3 Stability Analysis

We further investigate the stability of our method against HGNN by varying the ratio of training labels. We run all experiments with 8 different seeds and report the best performing models with their optimal number of layers, as visualized in Fig. 3. We observe that ResMultiHGNN consistently outperforms HGNN at all training ratios, with gains of around 2% to 3%. HGNN exhibits higher variance, whereas ResMultiHGNN is slightly more stable, especially when the training ratio is small.

Model              Accuracy (%)
PointNet++ [14]    90.7
LP-3DCNN [9]       92.1
RS-CNN [11]        93.6
tMHL [18]          96.2
HGNN [4]           96.9
iMHL [18, 7]       97.2
ResMultiHGNN       98.0

Table 3: Experimental results of multiple recent state-of-the-art methods on the ModelNet40 dataset.

5 Conclusion

In this paper, we propose the first Multi-Hypergraph Neural Network for multi-modal learning and enhance it with residual connections to build deep structures. Extensive experiments demonstrate the effectiveness of both the proposed multi-fusion architecture and the residual hypergraph convolution. ResMultiHGNN enjoys the gains of both techniques and shows stable, superior results compared with HGNN.

References

  • [1] L. An, X. Chen, S. Yang, and X. Li (2017) Person re-identification by multi-hypergraph fusion. IEEE Transactions on Neural Networks and Learning Systems 28 (11), pp. 2763–2774.
  • [2] D. Chen, X. Tian, Y. Shen, and M. Ouhyoung (2003) On visual similarity based 3D model retrieval. Computer Graphics Forum 22 (3), pp. 223–232.
  • [3] M. Chen, Z. Wei, Z. Huang, B. Ding, and Y. Li (2020) Simple and deep graph convolutional networks. In International Conference on Machine Learning.
  • [4] Y. Feng, H. You, Z. Zhang, R. Ji, and Y. Gao (2019) Hypergraph neural networks. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, pp. 3558–3565.
  • [5] Y. Feng, Z. Zhang, X. Zhao, R. Ji, and Y. Gao (2018) GVCNN: group-view convolutional neural networks for 3D shape recognition. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, pp. 264–272.
  • [6] J. Gao, W. Zhang, Z. Chen, and F. Zhong (2020) HDMFH: hypergraph based discrete matrix factorization hashing for multimodal retrieval. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, pp. 1923–1927.
  • [7] Y. Gao, Z. Zhang, H. Lin, X. Zhao, S. Du, and C. Zou (2020) Hypergraph learning: methods and practices. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • [8] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
  • [9] S. Kumawat and S. Raman (2019) LP-3DCNN: unveiling local phase in 3D convolutional neural networks. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, pp. 4903–4912.
  • [10] S. Liu, P. Lv, Y. Zhang, J. Fu, J. Cheng, W. Li, B. Zhou, and M. Xu (2020) Semi-dynamic hypergraph neural network for 3D pose estimation. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20.
  • [11] Y. Liu, B. Fan, S. Xiang, and C. Pan (2019) Relation-shape convolutional neural network for point cloud analysis. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, pp. 8895–8904.
  • [12] X. Lv, L. Wang, Q. Zhang, N. Zheng, and G. Hua (2018) Video object co-segmentation from noisy videos by a multi-level hypergraph model. In 25th IEEE International Conference on Image Processing, ICIP 2018, pp. 2207–2211.
  • [13] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala (2019) PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems.
  • [14] C. R. Qi, L. Yi, H. Su, and L. J. Guibas (2017) PointNet++: deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems, pp. 5099–5108.
  • [15] H. Su, S. Maji, E. Kalogerakis, and E. G. Learned-Miller (2015) Multi-view convolutional neural networks for 3D shape recognition. In IEEE International Conference on Computer Vision, ICCV 2015, pp. 945–953.
  • [16] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao (2015) 3D ShapeNets: a deep representation for volumetric shapes. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, pp. 1912–1920.
  • [17] S. Zhang, S. Cui, and Z. Ding (2020) Hypergraph-based image processing. In IEEE International Conference on Image Processing, ICIP 2020, pp. 216–220.
  • [18] Z. Zhang, H. Lin, X. Zhao, R. Ji, and Y. Gao (2018) Inductive multi-hypergraph learning and its application on view-based 3D object classification. IEEE Transactions on Image Processing 27 (12), pp. 5957–5968.