Fully Hyperbolic Graph Convolution Network for Recommendation

08/10/2021 · by Liping Wang et al.

Recently, Graph Convolution Network (GCN) based methods have achieved outstanding performance for recommendation. These methods embed users and items in Euclidean space and perform graph convolution on user-item interaction graphs. However, real-world datasets usually exhibit tree-like hierarchical structures, which make Euclidean space less effective at capturing user-item relationships. In contrast, hyperbolic space, as a continuous analogue of a tree, provides a promising alternative. In this paper, we propose a fully hyperbolic GCN model for recommendation, in which all operations are performed in hyperbolic space. Utilizing the advantages of hyperbolic space, our method is able to embed users/items with less distortion and capture user-item interaction relationships more accurately. Extensive experiments on public benchmark datasets show that our method outperforms both Euclidean and hyperbolic counterparts and requires a far lower embedding dimensionality to achieve comparable performance.


1. Introduction

In the information era, recommendation systems have been widely adopted to perform personalized information filtering (Covington et al., 2016; Ying et al., 2018). Even though there are many recommendation paradigms, collaborative filtering (Ebesu et al., 2018; Wang et al., 2019), which generates recommendations from available historical interactions, remains a fundamental and challenging task. The core idea behind collaborative filtering is to learn compact user/item embeddings and infer a user's preference for an item according to the distance between their embeddings.

From a graph perspective, user-item interactions can be viewed as a bipartite graph (Bollobás, 2013), where nodes represent users/items and edges represent their interactions. As a powerful tool for analyzing graph-structured data, Graph Neural Networks (GNNs) (Kipf and Welling, 2017; Veličković et al., 2018; Hamilton et al., 2017) have recently demonstrated great success across various domains, including recommendation systems. Employing multiple layers of neighborhood aggregation, GNN-based methods (Wang et al., 2019; He et al., 2020) have achieved state-of-the-art performance on diverse public benchmarks.

Although GNN-based methods have achieved outstanding performance, it might not be appropriate to adopt Euclidean space to embed users and items. In real-world scenarios, user-item bipartite graphs usually exhibit tree-like hierarchical structures (Adcock et al., 2013), in which the number of a node's neighbors grows exponentially with the number of hops. Ideally, the neighbors of a node $v$ should be embedded in a ball centered at $v$'s embedding, and the distance between embeddings should reflect the number of hops between nodes. Nevertheless, in Euclidean space, the volume of a ball grows only polynomially as a function of its radius. Hence, embedding an exponentially growing number of neighbors into a polynomially growing volume makes the distance between embeddings an inaccurate reflection of the distance between nodes in the graph; this mismatch is called distortion (Chami et al., 2019). Such distortion makes it difficult to infer a user's preference for a target item according to the distance between their embeddings.

In contrast, hyperbolic space (Chami et al., 2019), in which the volume of a ball grows exponentially with its radius, offers a promising alternative. Compared with Euclidean space, hyperbolic space is more suitable for modeling user-item interaction graphs that exhibit strong tree-like hierarchical structures. Accordingly, it is a natural choice to conduct user/item embedding and graph convolution in hyperbolic space for recommendation. To the best of our knowledge, the only work adopting a similar idea is HGCF (Sun et al., 2021). However, resorting to tangent space to realize graph convolution makes the performance of HGCF inferior, for the following two reasons. On the one hand, tangent space is only a local linear approximation of hyperbolic space. During message propagation, approximation errors accumulate and spread over the whole graph. As a result, influence from high-order neighbors cannot be captured accurately. On the other hand, tangent space is, by definition, a Euclidean space (Boothby, 1986). Hence, the advantage of hyperbolic space in modeling user-item interaction relationships with less distortion cannot be fully utilized.

To overcome the limitations of Euclidean space and obtain more accurate user/item embeddings, we design a novel fully hyperbolic GCN framework specifically for collaborative filtering. All operations are conducted in hyperbolic space, more specifically in the Lorentz model, and we name the method Lorentz Graph Collaborative Filtering (LGCF).

The main contributions of this work are summarized as follows:

  • We propose a fully hyperbolic graph convolution network for recommendation.

  • We conduct extensive experiments on multiple public benchmark datasets, and the results demonstrate the superiority of our method.

2. Preliminaries

Problem Formulation. In this paper, the standard collaborative filtering setup is considered. Let $\mathcal{U}$ be the set of users and $\mathcal{I}$ be the set of items. Historical interactions between users and items are represented as a binary matrix $R \in \{0, 1\}^{|\mathcal{U}| \times |\mathcal{I}|}$, where $R_{ui} = 1$ if the $u$-th user has interacted with the $i$-th item and $R_{ui} = 0$ otherwise. Given the historical interactions $R$, the goal is to predict potential future interactions.
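To make the setup concrete, the following toy sketch (purely illustrative, not drawn from any of the benchmark datasets) builds such a binary interaction matrix in NumPy:

```python
import numpy as np

# Toy interaction matrix R for 3 users and 4 items:
# R[u, i] = 1 iff the u-th user has interacted with the i-th item.
R = np.array([
    [1, 0, 0, 1],   # user 0 interacted with items 0 and 3
    [0, 1, 0, 0],   # user 1 interacted with item 1
    [1, 1, 1, 0],   # user 2 interacted with items 0, 1, and 2
])

# The collaborative filtering goal: rank the zero entries of each row
# by how likely they are to become future interactions.
```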

GNN-based Recommendation Methods. GNN-based recommendation methods have received increasing attention for their ability to learn rich node representations. In the collaborative filtering setting, GCN has superseded matrix factorization (Rendle et al., 2009) and shown leading performance. To capture the influence of high-order neighbors, Wang et al. (Wang et al., 2019) proposed NGCF, a GCN framework operating on the user-item interaction bipartite graph. Subsequently, He et al. (He et al., 2020) empirically found that two of the most common designs in GCNs, feature transformation and nonlinear activation, contribute little to the performance of collaborative filtering. Hence, they proposed a simplified architecture, LightGCN (He et al., 2020).

3. Our Method

Figure 1. Illustration of LGCF. First, users and items are embedded into the Lorentz model of hyperbolic space. Then, multiple graph convolution layers (only one shown in the figure for simplicity) are adopted to aggregate information from neighbors. In each layer, embeddings are first mapped by $\pi_{\mathbb{H} \to \mathbb{K}}$ to the Klein model, in which graph convolution is performed. After that, $\pi_{\mathbb{K} \to \mathbb{H}}$ maps the refined embeddings back to the Lorentz model. Finally, LGCF infers a user's preference for an item according to the distance between their embeddings.

As illustrated in Figure 1, there are three components in LGCF: (1) an embedding layer that provides and initializes user/item embeddings in hyperbolic space; (2) multiple graph convolution layers that propagate user/item embeddings over the graph; and (3) a prediction layer that estimates a user's preference for an item by computing the distance between their embeddings.

3.1. Embed Users/Items in Hyperbolic Space

Existing GNN-based recommendation methods usually embed users and items in the same Euclidean space. To model user-item interaction relationships more accurately, we investigate the use of hyperbolic space. There are several models of hyperbolic space, such as the Lorentz model, the Klein model and the Poincaré ball model. In this paper, we choose the Lorentz model due to its simplicity and numerical stability. Formally, the Lorentz model of $d$-dimensional hyperbolic space is defined as:

$\mathbb{H}^{d} = \{\, x \in \mathbb{R}^{d+1} \;:\; \langle x, x \rangle_{\mathcal{L}} = -1,\ x_0 > 0 \,\}$    (1)

where $\langle \cdot, \cdot \rangle_{\mathcal{L}}$ is the Lorentz inner product, defined as $\langle x, y \rangle_{\mathcal{L}} = -x_0 y_0 + \sum_{i=1}^{d} x_i y_i$. In addition, at an arbitrary point $x \in \mathbb{H}^d$, hyperbolic space can be locally approximated by a linear Euclidean space; this approximating space is termed the tangent space $\mathcal{T}_x \mathbb{H}^d$, in which the norm is well defined. In LGCF, both users and items are embedded in the same Lorentz model of hyperbolic space.
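As a minimal NumPy sketch (our illustration, not code from the paper), the Lorentz inner product and the constraint of Eq. (1) can be checked as follows, assuming each point is stored as a length-$(d+1)$ array with the time-like coordinate $x_0$ first:

```python
import numpy as np

def lorentz_inner(x, y):
    """Lorentz inner product: <x, y>_L = -x_0 * y_0 + sum_{i>=1} x_i * y_i."""
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)

def on_hyperboloid(x, atol=1e-6):
    """Check the Eq. (1) constraints: <x, x>_L = -1 and x_0 > 0."""
    return bool(np.isclose(lorentz_inner(x, x), -1.0, atol=atol)) and x[0] > 0

# The "origin" of 2-dimensional hyperbolic space (d = 2, so 3 coordinates).
origin = np.array([1.0, 0.0, 0.0])
assert on_hyperboloid(origin)
```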

It is well known that random initialization can have a significant impact on optimization during training (Sutskever et al., 2013). A common practice in Euclidean space is initialization from a Gaussian distribution. Similarly, we design an initialization strategy for embeddings based on the wrapped normal distribution (Nagano et al., 2019), a generalization of the Gaussian distribution to hyperbolic space.
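A simplified sketch of this initialization is shown below. The full wrapped normal construction of Nagano et al. (2019) supports arbitrary means via parallel transport; at the hyperboloid origin it reduces to sampling a Gaussian in the tangent space and pushing it onto the manifold through the exponential map. The function names and the std parameter are our own illustrative choices:

```python
import numpy as np

def exp_map_origin(u):
    """Exponential map at the origin o = (1, 0, ..., 0) of the hyperboloid.
    u is a batch of Euclidean vectors, lifted to tangent vectors (0, u)."""
    norm = np.clip(np.linalg.norm(u, axis=-1, keepdims=True), 1e-9, None)
    return np.concatenate([np.cosh(norm), np.sinh(norm) * u / norm], axis=-1)

def wrapped_normal_init(n, d, std=0.1, seed=0):
    """Draw n embeddings on the d-dimensional hyperboloid: sample a Gaussian
    in the tangent space at the origin, then push it onto the manifold."""
    u = std * np.random.default_rng(seed).standard_normal((n, d))
    return exp_map_origin(u)

# Example: 1000 user/item embeddings in 50-dimensional hyperbolic space.
emb = wrapped_normal_init(1000, 50)
```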

3.2. Graph Convolution Layer

The basic idea of GCN-based recommendation models is to learn representations for users and items by iteratively aggregating neighbors' information over the interaction graph. To apply GCN for recommendation where users and items are embedded in hyperbolic space, we design the graph convolution layers carefully, since a naïve generalization would drive embeddings out of hyperbolic space. Before presenting our design, we briefly review existing graph convolution layers in Euclidean space.

In Euclidean space, a graph convolution operation is composed of three steps: feature transformation, neighborhood aggregation and non-linear activation. Among them, feature transformation is performed through a linear transformation. In LGCF, we discard the feature transformation operation for two reasons. On the one hand, unlike attributed graphs (e.g., citation networks) whose nodes carry rich feature information, nodes in user-item interaction graphs carry no semantics beyond one-hot IDs. In this case, feature transformation may provide no benefit and could even hinder training. On the other hand, linear transformation (matrix-vector multiplication) is not well defined in hyperbolic space, since hyperbolic space is not a vector space.

Neighborhood Aggregation. Existing neighborhood aggregation in the $l$-th layer can be summarized as:

$h_i^{(l+1)} = \frac{1}{|\widetilde{\mathcal{N}}(i)|} \sum_{j \in \widetilde{\mathcal{N}}(i)} h_j^{(l)}$    (2)

in which $h_i^{(l)}$ represents the $i$-th node's embedding in the $l$-th layer and $\widetilde{\mathcal{N}}(i)$ denotes the set consisting of node $i$ and its neighbors. A natural generalization of the mean to hyperbolic space is the Einstein midpoint (Ungar, 2008). However, it is defined in the Klein model $\mathbb{K}^d$, while user/item embeddings lie in the Lorentz model. Fortunately, there are isometric maps between the Klein model and the Lorentz model, defined as follows:

$\pi_{\mathbb{H} \to \mathbb{K}}(x) = \frac{(x_1, \ldots, x_d)}{x_0}, \qquad \pi_{\mathbb{K} \to \mathbb{H}}(y) = \frac{(1, y)}{\sqrt{1 - \|y\|^2}}$    (3)

in which $x = (x_0, x_1, \ldots, x_d) \in \mathbb{H}^d$ and $y \in \mathbb{K}^d$.

Hence, we propose a neighborhood aggregation strategy that uses the Klein model as an intermediate bridge. Specifically, neighborhood aggregation is divided into three steps. First, current embeddings are mapped from the Lorentz model to the Klein model by $\pi_{\mathbb{H} \to \mathbb{K}}$. Then, neighborhood aggregation is conducted in the Klein model as follows:

$y_i^{(l+1)} = \frac{\sum_{j \in \widetilde{\mathcal{N}}(i)} \gamma_j\, y_j^{(l)}}{\sum_{j \in \widetilde{\mathcal{N}}(i)} \gamma_j}, \qquad \gamma_j = \frac{1}{\sqrt{1 - \|y_j^{(l)}\|^2}}$    (4)

Here, we obtain the aggregated user/item embeddings in the Klein model. Last, the aggregated embeddings are mapped back to the Lorentz model through $\pi_{\mathbb{K} \to \mathbb{H}}$.
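Putting the three steps together, a hedged NumPy sketch of this Lorentz-Klein-Lorentz aggregation might look as follows (the function names and the neighbor-set representation are illustrative assumptions, not the paper's code):

```python
import numpy as np

def lorentz_to_klein(x):
    """pi_{H -> K}: divide the spatial coordinates by the time-like x_0."""
    return x[..., 1:] / x[..., :1]

def klein_to_lorentz(y):
    """pi_{K -> H}: lift a Klein-model point back onto the hyperboloid."""
    gamma = 1.0 / np.sqrt(1.0 - np.sum(y * y, axis=-1, keepdims=True))
    return np.concatenate([gamma, gamma * y], axis=-1)

def klein_aggregate(emb, neighbor_sets):
    """Einstein-midpoint neighborhood aggregation, as in Eq. (4).

    emb: (n, d+1) array of Lorentz-model embeddings.
    neighbor_sets: list where neighbor_sets[i] contains i and its neighbors.
    """
    y = lorentz_to_klein(emb)                            # step 1: to Klein
    gamma = 1.0 / np.sqrt(1.0 - np.sum(y * y, axis=-1))  # Lorentz factors
    out = np.empty_like(y)
    for i, nbrs in enumerate(neighbor_sets):             # step 2: midpoint
        idx = list(nbrs)
        w = gamma[idx]
        out[i] = (w[:, None] * y[idx]).sum(axis=0) / w.sum()
    return klein_to_lorentz(out)                         # step 3: back to Lorentz
```

Because the Einstein midpoint is a convex combination of points inside the unit ball, the aggregated point stays in the Klein model, so the lift back to the hyperboloid is always well defined.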

Nonlinear Activation Layer. In Euclidean space, nonlinear activation has proven to be a key component of modern neural networks. However, applying it directly would drive the result out of hyperbolic space. To fix this problem, we design a calibration strategy that follows the activation. Formally, let $\tilde{x} = \sigma(x)$ be the output of a general activation function $\sigma$, e.g., ReLU. The first element of $\tilde{x}$ is calibrated while the other elements remain unchanged, pulling the activated embedding back onto the hyperboloid:

$\tilde{x}_0 \leftarrow \sqrt{1 + \sum_{i=1}^{d} \tilde{x}_i^2}$    (5)
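A sketch of this calibration is below. For simplicity it applies the activation only to the spatial coordinates and then recomputes the first coordinate; this yields the same result as activating the whole vector and calibrating element 0, since that element is overwritten either way:

```python
import numpy as np

def hyperbolic_activation(x, act=lambda z: np.maximum(z, 0.0)):
    """Apply an activation (ReLU by default) to the spatial coordinates,
    then recompute the first coordinate so that <x, x>_L = -1 holds again,
    as in Eq. (5)."""
    h = act(x[..., 1:])
    x0 = np.sqrt(1.0 + np.sum(h * h, axis=-1, keepdims=True))
    return np.concatenate([x0, h], axis=-1)
```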

3.3. Prediction Layer

After propagating through the graph convolution layers, we obtain multiple representations for users and items. Representations generated in different layers emphasize messages passed through different connections, and they reflect users' preferences or items' attributes from different perspectives. Recommendation models operating in hyperbolic space usually estimate a user's preference for a target item according to a distance or similarity metric between their representations. In the Lorentz model, the generalization of a Euclidean straight line is the geodesic, which gives the shortest path between two points $x, y \in \mathbb{H}^d$. Formally, the geodesic distance between $x$ and $y$ is defined as:

$d_{\mathcal{L}}(x, y) = \operatorname{arcosh}\left( -\langle x, y \rangle_{\mathcal{L}} \right)$    (6)
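In code, Eq. (6) is nearly a one-liner under the same storage convention as the earlier sketches; the clip guards against floating-point rounding pushing $-\langle x, y \rangle_{\mathcal{L}}$ slightly below 1:

```python
import numpy as np

def lorentz_distance(x, y):
    """Geodesic distance on the hyperboloid, Eq. (6): arcosh(-<x, y>_L)."""
    inner = -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)
    return np.arccosh(np.clip(-inner, 1.0, None))
```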

Hence, in LGCF, we infer the preference of user $u$ for item $i$ based on the geodesic distance between their corresponding representations. Further, to utilize the different semantics captured by different layers, we take the representations learned by all layers into consideration simultaneously. In summary, LGCF estimates the preference of user $u$ for target item $i$ as:

$s(u, i) = -\sum_{l=0}^{L} d_{\mathcal{L}}\!\left( e_u^{(l)}, e_i^{(l)} \right)$    (7)

in which $e_u^{(l)}$ and $e_i^{(l)}$ are the representations generated by the $l$-th layer. Even though there are multiple choices for layer aggregation, such as weighted average, max pooling and LSTM, we find that the simple summation adopted here works well empirically.
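A sketch of Eq. (7), assuming per-layer user and item representations are collected in two parallel lists (the negation, which makes larger scores correspond to closer embeddings, is our sign convention; lorentz_distance is repeated from the earlier sketch for self-containment):

```python
import numpy as np

def lorentz_distance(x, y):  # as in the sketch for Eq. (6)
    inner = -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)
    return np.arccosh(np.clip(-inner, 1.0, None))

def score(user_layers, item_layers):
    """Preference score of Eq. (7): the negated sum of per-layer geodesic
    distances, so larger scores mean closer embeddings."""
    return -sum(lorentz_distance(u, i)
                for u, i in zip(user_layers, item_layers))
```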

3.4. Margin Ranking Loss

Margin ranking loss (Tay et al., 2018) has been a competitive choice for distance-based recommendation systems, since it encourages positive and negative user-item pairs to be separated by a given margin. Once the gap between a negative user-item pair and a positive one exceeds the margin, the two pairs no longer contribute to the loss. In this way, hard pairs that violate the margin remain the focus throughout training, which makes optimization much easier. We extend the margin ranking loss to hyperbolic space based on the geodesic distance. Given a sampled positive user-item pair $(u, i)$ and a negative one $(u, j)$, the geodesic margin loss is defined as:

$\mathcal{L}(u, i, j) = \sum_{l=0}^{L} \max\!\left( 0,\; m + d_{\mathcal{L}}\big(e_u^{(l)}, e_i^{(l)}\big) - d_{\mathcal{L}}\big(e_u^{(l)}, e_j^{(l)}\big) \right)$    (8)

where $m$ is a non-negative margin hyper-parameter. Note that the representations obtained by different layers contribute to the loss simultaneously. This not only makes it possible to utilize the different semantics captured by different layers, but also eases optimization, in the spirit of residual connections (He et al., 2016).
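A corresponding sketch of Eq. (8), summing the per-layer hinge terms (whether plain or squared distances appear inside the hinge is not stated here, so plain geodesic distances are assumed):

```python
import numpy as np

def lorentz_distance(x, y):  # as in the sketch for Eq. (6)
    inner = -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)
    return np.arccosh(np.clip(-inner, 1.0, None))

def margin_loss(user_layers, pos_layers, neg_layers, m=0.1):
    """Geodesic margin ranking loss of Eq. (8): in every layer, the positive
    pair should be closer than the negative pair by at least the margin m."""
    return sum(np.maximum(0.0, m + lorentz_distance(u, i)
                               - lorentz_distance(u, j))
               for u, i, j in zip(user_layers, pos_layers, neg_layers))
```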

3.5. Optimization

The only trainable parameter of LGCF is the embedding matrix of users and items. These embeddings lie in the Lorentz model of hyperbolic space, which is outside the scope of common optimization algorithms such as SGD (Robbins and Monro, 1951) and Adam (Kingma and Ba, 2014). Hence, we employ RSGD (Bonnabel, 2013), a generalization of SGD to Riemannian manifolds, which mimics SGD's behavior while taking the geometry of hyperbolic space into account.
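For intuition, one RSGD step on the hyperboloid can be sketched as follows (a generic Riemannian SGD update under our storage convention, not the exact implementation used in LGCF): rescale the Euclidean gradient by the Lorentz metric, project it onto the tangent space at the current point, and move along the geodesic via the exponential map:

```python
import numpy as np

def rsgd_step(x, egrad, lr=1e-3):
    """One Riemannian SGD step on the Lorentz model (a generic sketch).

    x:     point(s) on the hyperboloid, shape (..., d+1).
    egrad: Euclidean gradient of the loss w.r.t. the ambient coordinates.
    """
    # 1. Rescale by the Lorentz metric (flip the sign of the time-like part).
    h = egrad.copy()
    h[..., 0] = -h[..., 0]
    # 2. Project onto the tangent space at x: proj_x(h) = h + <x, h>_L * x.
    inner = -x[..., :1] * h[..., :1] + np.sum(
        x[..., 1:] * h[..., 1:], axis=-1, keepdims=True)
    rgrad = h + inner * x
    # 3. Move along the geodesic via the exponential map at x.
    v = -lr * rgrad
    vnorm_sq = -v[..., :1] ** 2 + np.sum(v[..., 1:] ** 2, axis=-1, keepdims=True)
    vnorm = np.sqrt(np.clip(vnorm_sq, 1e-18, None))
    return np.cosh(vnorm) * x + np.sinh(vnorm) * v / vnorm
```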

4. Experiments

4.1. Set Up

Dataset #User #Item #Interactions
Amazon-CD 22,947 18,395 422,301
Amazon-Book 52,406 41,264 1,861,118
Yelp2020 91,174 45,063 1,940,014
Table 1. Dataset statistics.

Datasets and Baselines. Following HGCF (Sun et al., 2021), we employ the Amazon-CD (Ni et al., 2019), Amazon-Book (Ni et al., 2019) and Yelp2020 (24) datasets. Dataset statistics are provided in Table 1. Each dataset is split 80/20 into train and test sets. Competitive baseline methods from three categories are compared: BPRMF (Rendle et al., 2009), NGCF (Wang et al., 2019), LightGCN (He et al., 2020), HVAE (Mirvakhabova et al., 2020), and HGCF (Sun et al., 2021). Among them, BPRMF optimizes matrix factorization with the Bayesian personalized ranking (BPR) loss (Rendle et al., 2009). NGCF and LightGCN employ GNNs in Euclidean space. HVAE combines a variational auto-encoder (VAE) with hyperbolic geometry. Finally, HGCF applies hyperbolic GCN (Chami et al., 2019) to recommendation systems.

Implementation. For a fair comparison, the embedding dimensionality is set to 50 for all methods, and the same negative sampling strategy is adopted. For all baseline methods, the settings suggested in the original papers are followed. Grid search over hyper-parameters is conducted following HGCF (Sun et al., 2021). For LGCF, the number of GCN layers is set to 3. We set the learning rate to 0.001 and the weight decay to 0.005, and the model is trained for 1000 epochs. The margin hyper-parameter $m$ is tuned by grid search.

4.2. Overall Results

Datasets Metrics BPRMF NGCF LightGCN HVAE HGCF LGCF
Amazon-CD R@10 0.0779 0.0758 0.0929 0.0781 0.0962 0.0996
R@20 0.1200 0.1150 0.1404 0.1147 0.1455 0.1503
Amazon-Book R@10 0.0611 0.0658 0.0799 0.0774 0.0867 0.0899
R@20 0.0794 0.1050 0.1248 0.1125 0.1318 0.1360
Yelp2020 R@10 0.0325 0.0458 0.0522 0.0421 0.0543 0.0573
R@20 0.0556 0.0764 0.0866 0.0691 0.0884 0.0946
Table 2. Recall results for all datasets.
Datasets Metrics BPRMF NGCF LightGCN HVAE HGCF LGCF
Amazon-CD N@10 0.0610 0.0591 0.0726 0.0629 0.0751 0.0780
N@20 0.0974 0.0718 0.0881 0.0749 0.0909 0.0945
Amazon-Book N@10 0.0594 0.0655 0.0780 0.0778 0.0869 0.0906
N@20 0.0971 0.0791 0.0938 0.0901 0.1022 0.1063
Yelp2020 N@10 0.0283 0.0405 0.0461 0.0371 0.0458 0.0485
N@20 0.0512 0.0513 0.0582 0.0465 0.0585 0.0612
Table 3. NDCG results for all datasets.

Recall and NDCG results for all datasets are reported in Table 2 and Table 3, respectively. LGCF consistently outperforms the other methods on all three datasets. Compared with LightGCN, LGCF achieves an improvement of up to 16.15% in NDCG@10 on the Amazon-Book dataset, demonstrating the superiority of hyperbolic space over Euclidean space in modeling real-world user-item interactions.

Among the baseline methods, HGCF is the most competitive counterpart of LGCF. Even though HGCF adopts hyperbolic space, it resorts to tangent space to conduct aggregation operations, which brings inevitable distortion. In contrast, LGCF performs all graph convolution operations in hyperbolic space. LGCF outperforms HGCF by wide margins on all datasets, reflecting the information loss introduced by HGCF's reliance on tangent space.

4.3. Ablation Study

R@10 R@20 N@10 N@20
LGCF 0.0573 0.0946 0.0485 0.0612
LGCF-tangent 0.0545 0.0895 0.0463 0.0586
Table 4. Results for LGCF and LGCF-tangent on Yelp2020.

To further analyze the effect of the fully hyperbolic graph convolution network, we conduct an ablation study on Yelp2020, the largest of the three datasets. Since simply replacing fully hyperbolic graph convolution with regular Euclidean graph convolution would drive user/item embeddings out of hyperbolic space, we instead conduct the convolution operations in tangent space and name this variant LGCF-tangent. From the experimental results shown in Table 4, we observe a wide margin between the performance of LGCF and LGCF-tangent. This is because, in LGCF-tangent, errors caused by the tangent space approximation accumulate and spread over the whole graph. As a result, influence from neighbors, especially high-order neighbors, cannot be captured accurately.

4.4. Effect of Embedding Dimensionality

Figure 2. Recall@20 and NDCG@20 on the Amazon-CD dataset with embedding dimensionality varying from 20 to 50.

To validate the advantage of hyperbolic space in learning compact representations, we compare LGCF and LightGCN under different embedding dimensionalities. From Figure 2, we observe that LGCF outperforms LightGCN consistently across all dimensionality values, with the largest margin at the lowest dimensionality. LGCF requires a far lower embedding dimensionality to achieve performance comparable to its Euclidean analogue. This indicates that LGCF's advantage is most prominent when the embedding dimensionality must stay small due to limited computing and storage resources.

5. Conclusion

In this paper, we propose LGCF, a fully hyperbolic GCN model for recommendation. Utilizing the advantages of hyperbolic space, LGCF is able to embed users/items with less distortion and capture user-item interaction relationships more accurately. Extensive experiments on public benchmark datasets show that LGCF outperforms both its Euclidean and hyperbolic counterparts and requires a far lower embedding dimensionality to achieve comparable performance.

References

  • A. B. Adcock, B. D. Sullivan, and M. W. Mahoney (2013) Tree-like structure in large social and information networks. In IEEE International Conference on Data Mining, Cited by: §1.
  • B. Bollobás (2013) Modern graph theory. Cited by: §1.
  • S. Bonnabel (2013) Stochastic gradient descent on riemannian manifolds. IEEE Transactions on Automatic Control. Cited by: §3.5.
  • W. M. Boothby (1986) An introduction to differentiable manifolds and riemannian geometry. Cited by: §1.
  • I. Chami, R. Ying, C. Ré, and J. Leskovec (2019) Hyperbolic graph convolutional neural networks. Advances in Neural Information Processing Systems. Cited by: §1, §1, §4.1.
  • P. Covington, J. Adams, and E. Sargin (2016) Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, Cited by: §1.
  • T. Ebesu, B. Shen, and Y. Fang (2018) Collaborative memory network for recommendation systems. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, Cited by: §1.
  • W. L. Hamilton, R. Ying, and J. Leskovec (2017) Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, Cited by: §1.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §3.4.
  • X. He, K. Deng, X. Wang, Y. Li, Y. Zhang, and M. Wang (2020) Lightgcn: simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Cited by: §1, §2, §4.1.
  • D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §3.5.
  • T. N. Kipf and M. Welling (2017) Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations, Cited by: §1.
  • L. Mirvakhabova, E. Frolov, V. Khrulkov, I. Oseledets, and A. Tuzhilin (2020) Performance of hyperbolic geometry models on top-n recommendation tasks. In Fourteenth ACM Conference on Recommender Systems, Cited by: §4.1.
  • Y. Nagano, S. Yamaguchi, Y. Fujita, and M. Koyama (2019) A wrapped normal distribution on hyperbolic space for gradient-based learning. In International Conference on Machine Learning, Cited by: §3.1.
  • J. Ni, J. Li, and J. McAuley (2019) Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Cited by: §4.1.
  • S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme (2009) BPR: bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, Cited by: §2, §4.1.
  • H. Robbins and S. Monro (1951) A stochastic approximation method. The Annals of Mathematical Statistics. Cited by: §3.5.
  • J. Sun, Z. Cheng, S. Zuberi, F. Pérez, and M. Volkovs (2021) HGCF: hyperbolic graph convolution networks for collaborative filtering. In Proceedings of the Web Conference 2021, Cited by: §1, §4.1, §4.1.
  • I. Sutskever, J. Martens, G. Dahl, and G. Hinton (2013) On the importance of initialization and momentum in deep learning. In International Conference on Machine Learning, Cited by: §3.1.
  • Y. Tay, L. Anh Tuan, and S. C. Hui (2018) Latent relational metric learning via memory-based attention for collaborative ranking. In Proceedings of the 2018 World Wide Web Conference, Cited by: §3.4.
  • A. A. Ungar (2008) A gyrovector space approach to hyperbolic geometry. Synthesis Lectures on Mathematics and Statistics. Cited by: §3.2.
  • P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio (2018) Graph Attention Networks. International Conference on Learning Representations. Cited by: §1.
  • X. Wang, X. He, M. Wang, F. Feng, and T. Chua (2019) Neural graph collaborative filtering. In Proceedings of the 42nd international ACM SIGIR Conference on Research and Development in Information Retrieval, Cited by: §1, §1, §2, §4.1.
  • [24] (2020) Yelp2020 dataset. Website. Cited by: §4.1.
  • R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec (2018) Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Cited by: §1.