1. Introduction
In the information era, recommendation systems have been widely adopted to perform personalized information filtering (Covington et al., 2016; Ying et al., 2018). Even though there are many recommendation paradigms, collaborative filtering (Ebesu et al., 2018; Wang et al., 2019), which generates recommendations from available historical interactions, remains a fundamental and challenging task. The core idea behind collaborative filtering is to learn compact user/item embeddings and infer a user's preference for an item from the distance between their embeddings.
From a graph perspective, user-item interactions can be viewed as a bipartite graph (Bollobás, 2013), where nodes represent users/items and edges represent their interactions. As a powerful tool for analyzing graph-structured data, Graph Neural Networks (GNNs) (Kipf and Welling, 2017; Veličković et al., 2018; Hamilton et al., 2017) have recently demonstrated great success across various domains, including recommendation systems. Employing multiple layers of neighborhood aggregation, GNN-based methods (Wang et al., 2019; He et al., 2020) have achieved state-of-the-art performance on diverse public benchmarks.
Although GNN-based methods have achieved outstanding performance, it might not be appropriate to adopt Euclidean space to embed users and items. In real-world scenarios, user-item bipartite graphs usually exhibit tree-like hierarchical structures (Adcock et al., 2013), in which the number of a node's neighbors grows exponentially with the number of hops. Ideally, the neighbors of a node $v$ should be embedded in a ball centered at $v$'s embedding, and the distance between embeddings should reflect the number of hops between nodes. Nevertheless, in Euclidean space, the volume of a ball grows only polynomially as a function of its radius. Hence, embedding an exponentially growing number of neighbors into a polynomially growing volume makes the distance between embeddings a less accurate reflection of the distance between nodes in the graph; this phenomenon is called distortion (Chami et al., 2019). Such distortion makes it difficult to infer a user's preference for a target item from the distance between their embeddings.
In contrast, hyperbolic space (Chami et al., 2019), in which the volume of a ball grows exponentially with its radius, offers a promising alternative. Compared with Euclidean space, hyperbolic space is more suitable for modeling user-item interaction graphs that exhibit strong tree-like hierarchical structures. Accordingly, it is a natural choice to conduct user/item embedding and graph convolution in hyperbolic space for recommendation. To the best of our knowledge, the only work adopting a similar idea is HGCF (Sun et al., 2021). However, resorting to tangent space to realize graph convolution makes the performance of HGCF inferior, for the two following reasons. On the one hand, tangent space is only a local linear approximation of hyperbolic space. During message propagation, the errors caused by this approximation accumulate and spread over the whole graph; as a result, the influence of high-order neighbors cannot be captured accurately. On the other hand, tangent space is, by definition, a Euclidean space (Boothby, 1986). Hence, the advantage of hyperbolic space in modeling user-item interaction relationships with less distortion cannot be fully exploited.
To overcome the limitations of Euclidean space and obtain more accurate user/item embeddings, we design a novel fully hyperbolic GCN framework specifically for collaborative filtering. All operations are conducted in hyperbolic space, more specifically in the Lorentz model, and we name the resulting method Lorentz Graph Collaborative Filtering (LGCF).
The main contributions of this work are summarized as follows:

We propose a fully hyperbolic graph convolution network for recommendation.

We conduct extensive experiments on multiple public benchmark datasets, and the results demonstrate the superiority of our method.
2. Preliminaries
Problem Formulation. In this paper, the standard collaborative filtering setup is considered. Let $\mathcal{U}$ be the set of users and $\mathcal{I}$ be the set of items. Historical interactions between users and items are represented as a binary matrix $R \in \{0, 1\}^{|\mathcal{U}| \times |\mathcal{I}|}$, where $R_{ij} = 1$ if the $i$-th user has interacted with the $j$-th item, and $R_{ij} = 0$ otherwise. Given the historical interactions $R$, the goal is to predict potential interactions.
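For concreteness, a minimal NumPy sketch (with made-up toy numbers, not data from the paper) of how such a binary interaction matrix can be materialized from observed user-item pairs:

```python
import numpy as np

# Hypothetical toy data: 3 users, 4 items, and the observed (user, item) pairs.
interactions = [(0, 1), (0, 3), (1, 0), (2, 2)]
num_users, num_items = 3, 4

# R[u, i] = 1 if user u interacted with item i, 0 otherwise.
R = np.zeros((num_users, num_items), dtype=np.int8)
for u, i in interactions:
    R[u, i] = 1

print(R)
```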
GNN-based Recommendation Methods. GNN-based recommendation methods have received increasing attention for their ability to learn rich node representations. In the collaborative filtering setting, GCNs have surpassed matrix factorization (Rendle et al., 2009) and shown leading performance. In order to capture the influence of high-order neighbors, Wang et al. (Wang et al., 2019) proposed NGCF, a GCN framework operating on user-item interaction bipartite graphs. Subsequently, He et al. (He et al., 2020) empirically found that the two most common designs in GCNs, feature transformation and nonlinear activation, contribute little to the performance of collaborative filtering. Hence, they proposed a simplified architecture, LightGCN (He et al., 2020).
3. Our Method
As illustrated in Figure 1, there are three components in LGCF: (1) an embedding layer that provides and initializes user/item embeddings in hyperbolic space; (2) multiple graph convolution layers that propagate user/item embeddings over the graph; and (3) a prediction layer that estimates a user's preference for an item by computing the distance between their embeddings.
3.1. Embed Users/Items in Hyperbolic Space
Existing GNN-based recommendation methods usually embed users and items in the same Euclidean space. To model user-item interaction relationships more accurately, we investigate the utilization of hyperbolic space. There are several models of hyperbolic space, such as the Lorentz model, the Klein model and the Poincaré ball model. In this paper, we choose the Lorentz model for its simplicity and numerical stability. Formally, the Lorentz model of $d$-dimensional hyperbolic space is defined as:
(1) $\mathbb{L}^d = \{\, x \in \mathbb{R}^{d+1} : \langle x, x \rangle_{\mathcal{L}} = -1,\ x_0 > 0 \,\}$

where $\langle \cdot, \cdot \rangle_{\mathcal{L}}$ is the Lorentz inner product, defined as $\langle x, y \rangle_{\mathcal{L}} = -x_0 y_0 + \sum_{i=1}^{d} x_i y_i$. In addition, at an arbitrary point $x \in \mathbb{L}^d$, hyperbolic space can be locally approximated by a linear Euclidean space. This approximating Euclidean space is termed the tangent space, in which the norm is well defined. In LGCF, both users and items are embedded in the same Lorentz model of hyperbolic space.
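To make the geometry concrete, the following NumPy sketch (our own illustration, not code from the paper; curvature fixed to $-1$) implements the Lorentz inner product and the exponential map at the origin $o = (1, 0, \ldots, 0)$, which is also how random Gaussian tangent vectors can be pushed onto the manifold:

```python
import numpy as np

def lorentz_inner(x, y):
    """Lorentz inner product: <x, y>_L = -x0*y0 + sum_i x_i * y_i."""
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)

def expmap_origin(v):
    """Map Euclidean tangent vectors v in R^d at the origin onto L^d:
    exp_o((0, v)) = (cosh||v||, sinh(||v||) * v / ||v||)."""
    n = np.linalg.norm(v, axis=-1, keepdims=True)
    direction = v / np.maximum(n, 1e-12)  # guard against zero vectors
    return np.concatenate([np.cosh(n), np.sinh(n) * direction], axis=-1)

rng = np.random.default_rng(0)
v = 0.1 * rng.standard_normal((5, 4))  # 5 tangent samples in R^4
x = expmap_origin(v)                   # 5 points on L^4 (vectors in R^5)
# Every mapped point satisfies <x, x>_L = -1 and x_0 > 0.
```

Since $-\cosh^2 \|v\| + \sinh^2 \|v\| = -1$, the mapped points always land exactly on the manifold.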
It is well known that random initialization can have a significant impact on optimization (Sutskever et al., 2013). A common practice in Euclidean space is initialization from a Gaussian distribution. Similarly, we design an initialization strategy for embeddings based on the wrapped normal distribution (Nagano et al., 2019), a generalization of the Gaussian distribution to hyperbolic space.

3.2. Graph Convolution Layer
The basic idea of GCN-based recommendation models is to learn representations for users and items by iteratively aggregating neighbors' information over the interaction graph. In order to apply GCNs for recommendation where users and items are embedded in hyperbolic space, we design dedicated graph convolution layers, since a naïve generalization would drive embeddings out of hyperbolic space. Before presenting them, we briefly review existing graph convolution layers in Euclidean space.
In Euclidean space, a graph convolution operation is composed of three steps: feature transformation, neighborhood aggregation and nonlinear activation. Among them, feature transformation is performed through a linear transformation. In LGCF, we discard the feature transformation operation for two reasons. On the one hand, different from attributed graphs (e.g., citation networks) whose nodes carry rich feature information, nodes in user-item interaction graphs contain no semantics beyond one-hot IDs. In this case, feature transformation may provide no benefit and may even complicate training. On the other hand, linear transformation (matrix-vector multiplication) is not well defined in hyperbolic space, since hyperbolic space is not a vector space.
Neighborhood Aggregation. Existing neighborhood aggregation at the $l$-th layer can be summarized as:

(2) $h_i^{(l)} = \frac{1}{|\hat{\mathcal{N}}_i|} \sum_{j \in \hat{\mathcal{N}}_i} h_j^{(l-1)}$

in which $h_i^{(l)}$ represents the $i$-th node's embedding at layer $l$ and $\hat{\mathcal{N}}_i$ denotes the set consisting of node $i$ and its neighbors. A natural generalization of the mean to hyperbolic space is the Einstein midpoint (Ungar, 2008). However, it is defined in the Klein model $\mathbb{K}^d$, while user/item embeddings lie in the Lorentz model. Luckily, there are isometric maps between the Klein model and the Lorentz model, defined as follows:
(3) $y = p_{\mathbb{L} \to \mathbb{K}}(x) = \frac{x_{1:d}}{x_0}, \qquad x = p_{\mathbb{K} \to \mathbb{L}}(y) = \frac{(1, y)}{\sqrt{1 - \|y\|^2}}$

in which $x = (x_0, x_{1:d}) \in \mathbb{L}^d$ and $y \in \mathbb{K}^d$.
Hence, we propose a neighborhood aggregation strategy that utilizes the Klein model as an intermediate bridge. Specifically, neighborhood aggregation is divided into three steps. First, current embeddings are mapped from the Lorentz model to the Klein model by $p_{\mathbb{L} \to \mathbb{K}}$. Then, neighborhood aggregation is conducted in the Klein model as follows:
(4) $h_i = \frac{\sum_{j \in \hat{\mathcal{N}}_i} \gamma_j y_j}{\sum_{j \in \hat{\mathcal{N}}_i} \gamma_j}, \qquad \gamma_j = \frac{1}{\sqrt{1 - \|y_j\|^2}}$

where $y_j$ is the Klein-model embedding of node $j$ and $\gamma_j$ is its Lorentz factor. In this way, we obtain aggregated user/item embeddings in the Klein model. Last, the aggregated embeddings are mapped back to the Lorentz model through $p_{\mathbb{K} \to \mathbb{L}}$.
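As a sanity check on this three-step scheme, the following NumPy sketch (function names are ours, not from the paper) implements the maps of Eq. (3) and the Einstein-midpoint aggregation of Eq. (4), and verifies that the result stays on the manifold:

```python
import numpy as np

def lorentz_to_klein(x):
    """Eq. (3), forward direction: project to the Klein ball, y = x_{1:d} / x0."""
    return x[..., 1:] / x[..., :1]

def klein_to_lorentz(y):
    """Eq. (3), inverse direction: lift back, x = (1, y) / sqrt(1 - ||y||^2)."""
    sq = np.sum(y * y, axis=-1, keepdims=True)
    ones = np.ones_like(y[..., :1])
    return np.concatenate([ones, y], axis=-1) / np.sqrt(1.0 - sq)

def aggregate(neighbors):
    """Einstein-midpoint aggregation (Eq. (4)) of Lorentz embeddings."""
    y = lorentz_to_klein(neighbors)                     # step 1: Lorentz -> Klein
    gamma = 1.0 / np.sqrt(1.0 - np.sum(y * y, axis=-1, keepdims=True))
    mid = (gamma * y).sum(axis=0) / gamma.sum(axis=0)   # step 2: weighted midpoint
    return klein_to_lorentz(mid)                        # step 3: Klein -> Lorentz

# Toy neighborhood: 4 Lorentz points built from random spatial parts.
rng = np.random.default_rng(1)
spatial = 0.3 * rng.standard_normal((4, 5))
x0 = np.sqrt(1.0 + np.sum(spatial**2, axis=-1, keepdims=True))
neighbors = np.concatenate([x0, spatial], axis=-1)

agg = aggregate(neighbors)
# The aggregate satisfies <agg, agg>_L = -1, i.e., it stays on the manifold.
```

The midpoint is a convex combination of points inside the unit Klein ball, so it remains inside the ball and the lift back to the Lorentz model is always defined.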
Nonlinear Activation Layer. In Euclidean space, nonlinear activation has proven to be a key component of modern neural networks. However, applying it directly would drive the computation result out of hyperbolic space. To fix this problem, we design a calibration strategy that follows a general activation. Formally, let $\tilde{x}$ be the output of a general activation function $\sigma$, e.g., ReLU. The first element of $\tilde{x}$ is calibrated while the other elements remain unchanged, pulling the activated embedding back to hyperbolic space:

(5) $\tilde{x}_0 \leftarrow \sqrt{1 + \|\tilde{x}_{1:d}\|^2}$
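A minimal sketch of this calibration (our own illustration, assuming ReLU and the curvature $-1$ convention): after the activation, the first coordinate is recomputed from the remaining ones via Eq. (5), which restores $\langle x, x \rangle_{\mathcal{L}} = -1$ exactly.

```python
import numpy as np

def calibrated_relu(x):
    """Apply ReLU, then recompute the first coordinate via Eq. (5) so that
    the activated embedding satisfies <x, x>_L = -1 again."""
    act = np.maximum(x, 0.0)  # ordinary ReLU on all coordinates
    x0 = np.sqrt(1.0 + np.sum(act[..., 1:] ** 2, axis=-1, keepdims=True))
    return np.concatenate([x0, act[..., 1:]], axis=-1)

# A valid Lorentz point with one negative spatial coordinate.
spatial = np.array([-0.7, 0.9, 0.3])
x = np.concatenate([[np.sqrt(1.0 + np.sum(spatial**2))], spatial])

z = calibrated_relu(x)
# z[1] is clipped to 0 by ReLU; z[0] is recomputed so z lies on the manifold.
```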
3.3. Prediction Layer
After propagating through the graph convolution layers, we obtain multiple representations for users and items. Representations generated at different layers emphasize messages passed through different connections, and they reflect users' preferences or items' attributes from different perspectives. Recommendation models operating in hyperbolic space usually estimate a user's preference for a target item according to a distance or similarity metric between their representations. In the Lorentz model, the generalization of a straight line in Euclidean space is a geodesic, which gives the shortest path between two points. Formally, the geodesic distance between $x$ and $y$ is defined as:

(6) $d_{\mathbb{L}}(x, y) = \operatorname{arcosh}\left(-\langle x, y \rangle_{\mathcal{L}}\right)$
Hence, in LGCF, we infer the preference of user $u$ for item $i$ based on the geodesic distance between their corresponding representations. Further, to utilize the different semantics captured by different layers, we take the representations learned by all layers into consideration simultaneously. In summary, LGCF estimates the preference of user $u$ for target item $i$ as:

(7) $s(u, i) = -\sum_{l=0}^{L} d_{\mathbb{L}}\left(e_u^{(l)}, e_i^{(l)}\right)$

in which $e_u^{(l)}$ and $e_i^{(l)}$ are the representations generated by the $l$-th layer. Even though there are multiple choices for layer aggregation, such as weighted average, max pooling, and LSTM, we find that the simple summation adopted here works well empirically.
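The prediction pipeline can be sketched in NumPy as follows (our own illustration; the negated-sum form of the layer-wise combination is our reading of the summation described above, and `lift` is a hypothetical helper for building valid Lorentz points):

```python
import numpy as np

def lorentz_distance(x, y):
    """Geodesic distance in the Lorentz model, Eq. (6): arcosh(-<x, y>_L)."""
    inner = -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)
    return np.arccosh(np.maximum(-inner, 1.0))  # clamp guards against rounding

def score(user_layers, item_layers):
    """Multi-layer preference score: negated sum of per-layer geodesic
    distances, so closer embeddings across all layers mean a higher score."""
    return -sum(lorentz_distance(u, i) for u, i in zip(user_layers, item_layers))

def lift(spatial):
    """Build a Lorentz point from its spatial part: x0 = sqrt(1 + ||s||^2)."""
    spatial = np.asarray(spatial, dtype=float)
    return np.concatenate([[np.sqrt(1.0 + np.sum(spatial**2))], spatial])

rng = np.random.default_rng(2)
user = [lift(v) for v in 0.2 * rng.standard_normal((3, 4))]  # 3 layers
item = [lift(v) for v in 0.2 * rng.standard_normal((3, 4))]
print(score(user, item))  # less negative = stronger predicted preference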
3.4. Margin Ranking Loss
Margin ranking loss (Tay et al., 2018) has been a competitive choice for distance-based recommendation systems, since it encourages positive and negative user-item pairs to be separated by at least a given margin. Once the gap between a negative user-item pair and a positive one exceeds the margin, the two pairs make no contribution to the loss. In this way, optimization keeps focusing on the hard pairs that violate the margin, which makes it much easier. We extend margin ranking loss to hyperbolic space based on the geodesic distance. Given a sampled positive user-item pair $(u, i)$ and a negative one $(u, j)$, the geodesic margin loss is defined as:

(8) $\mathcal{L} = \sum_{l} \max\left(0,\ d_{\mathbb{L}}\left(e_u^{(l)}, e_i^{(l)}\right)^2 - d_{\mathbb{L}}\left(e_u^{(l)}, e_j^{(l)}\right)^2 + m\right)$
where $m$ is a nonnegative margin hyperparameter. Note that the representations obtained at different layers contribute to the loss simultaneously. This not only makes it possible to utilize the different semantics captured by different layers, but also decreases the difficulty of optimization, thanks to the resulting residual connections (He et al., 2016).

3.5. Optimization
The only parameters of LGCF are the embedding matrix of users and items. These embeddings lie in the Lorentz model of hyperbolic space, which is outside the scope of common optimization algorithms such as SGD (Robbins and Monro, 1951) and Adam (Kingma and Ba, 2014). Hence, we employ RSGD (Bonnabel, 2013), a generalization of SGD to Riemannian manifolds, which mimics SGD's behavior while taking the geometry of hyperbolic space into account.
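Putting the training signal together, a minimal sketch of the hinge-style margin objective of Eq. (8) (our own illustration; the use of squared geodesic distances is an assumption in line with common distance-based ranking losses, and `margin_loss` is a hypothetical helper name):

```python
import numpy as np

def margin_loss(d_pos, d_neg, margin=0.1):
    """Hinge-style margin ranking loss on squared geodesic distances (sketch
    of Eq. (8)): zero once the negative pair is farther than the positive
    pair by at least `margin`, so only margin-violating pairs contribute."""
    return np.maximum(d_pos ** 2 - d_neg ** 2 + margin, 0.0)

print(margin_loss(0.2, 1.0))  # negative item already far enough: loss is 0.0
print(margin_loss(1.0, 0.5))  # margin violated: positive loss
```

In practice this loss would be averaged over sampled triples per layer, and the embeddings updated with the Riemannian SGD step described above.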
4. Experiments
4.1. Setup
Table 1. Dataset statistics.

Dataset      #Users  #Items  #Interactions
Amazon-CD    22,947  18,395    422,301
Amazon-Book  52,406  41,264  1,861,118
Yelp2020     91,174  45,063  1,940,014
Datasets and Baselines. Following HGCF (Sun et al., 2021), we employ the Amazon-CD (Ni et al., 2019), Amazon-Book (Ni et al., 2019) and Yelp2020 (24) datasets. Dataset statistics are provided in Table 1. Each dataset is split into 80/20 train and test sets. Multiple competitive baseline methods from three categories are compared: BPRMF (Rendle et al., 2009), NGCF (Wang et al., 2019), LightGCN (He et al., 2020), HVAE (Mirvakhabova et al., 2020), and HGCF (Sun et al., 2021). Among them, BPRMF optimizes matrix factorization with the Bayesian personalized ranking (BPR) loss (Rendle et al., 2009). NGCF and LightGCN employ GNNs in Euclidean space. HVAE combines a variational autoencoder (VAE) with hyperbolic geometry. Last, HGCF applies the latest hyperbolic GCN (Chami et al., 2019) to recommendation systems.
Implementation. For a fair comparison, the embedding dimensionality is set to 50 for all methods, and the same negative sampling strategy is adopted. For all baseline methods, the settings suggested in the original papers are followed, and grid search over hyperparameters is conducted following HGCF (Sun et al., 2021). For LGCF, the number of GCN layers is set to 3. We set the learning rate to 0.001 and the weight decay to 0.005, and the model is trained for 1000 epochs. The margin hyperparameter $m$ is tuned by grid search.

4.2. Overall Results
Table 2. Recall results on all datasets.

Dataset      Metric  BPRMF   NGCF    LightGCN  HVAE    HGCF    LGCF
Amazon-CD    R@10    0.0779  0.0758  0.0929    0.0781  0.0962  0.0996
Amazon-CD    R@20    0.1200  0.1150  0.1404    0.1147  0.1455  0.1503
Amazon-Book  R@10    0.0611  0.0658  0.0799    0.0774  0.0867  0.0899
Amazon-Book  R@20    0.0794  0.1050  0.1248    0.1125  0.1318  0.1360
Yelp2020     R@10    0.0325  0.0458  0.0522    0.0421  0.0543  0.0573
Yelp2020     R@20    0.0556  0.0764  0.0866    0.0691  0.0884  0.0946

Table 3. NDCG results on all datasets.

Dataset      Metric  BPRMF   NGCF    LightGCN  HVAE    HGCF    LGCF
Amazon-CD    N@10    0.0610  0.0591  0.0726    0.0629  0.0751  0.0780
Amazon-CD    N@20    0.0974  0.0718  0.0881    0.0749  0.0909  0.0945
Amazon-Book  N@10    0.0594  0.0655  0.0780    0.0778  0.0869  0.0906
Amazon-Book  N@20    0.0971  0.0791  0.0938    0.0901  0.1022  0.1063
Yelp2020     N@10    0.0283  0.0405  0.0461    0.0371  0.0458  0.0485
Yelp2020     N@20    0.0512  0.0513  0.0582    0.0465  0.0585  0.0612
Recall and NDCG results for all datasets are reported in Table 2 and Table 3, respectively. We can see that LGCF consistently outperforms the other methods on all three datasets. Compared with LightGCN, LGCF achieves an improvement of up to 16.15% in NDCG@10 on the Amazon-Book dataset, demonstrating the superiority of hyperbolic space over Euclidean space in modeling real-world user-item interactions.
Among the baseline methods, HGCF is the most competitive counterpart of LGCF. Even though HGCF adopts hyperbolic space, it resorts to tangent space to conduct its aggregation operations, which brings inevitable distortion. In contrast, LGCF performs all graph convolution operations in hyperbolic space. LGCF outperforms HGCF by wide margins on all datasets, evidencing the information loss introduced by the tangent space in HGCF.
4.3. Ablation Study
Table 4. Ablation results on Yelp2020.

Model         R@10    R@20    N@10    N@20
LGCF          0.0573  0.0946  0.0485  0.0612
LGCF-tangent  0.0545  0.0895  0.0463  0.0586
To further analyze the effect of the fully hyperbolic graph convolution network, we conduct an ablation study on Yelp2020, the largest of the three datasets. Since simply replacing fully hyperbolic graph convolution with regular Euclidean graph convolution would drive user/item embeddings out of hyperbolic space, we instead conduct the convolution operations in tangent space, and name this model variant LGCF-tangent. From the experimental results in Table 4, we observe a wide margin between the performance of LGCF and LGCF-tangent. This is because, in LGCF-tangent, errors caused by the tangent-space approximation accumulate and spread over the whole graph; as a result, the influence of neighbors, especially high-order neighbors, cannot be captured accurately.
4.4. Effect of Embedding Dimensionality
In order to validate the advantage of hyperbolic space for learning compact representations, we compare LGCF and LightGCN under different embedding dimensionalities. From Figure 2, we observe that LGCF outperforms LightGCN consistently at all dimensionalities, with the greatest margin at lower dimensionality. LGCF requires a far lower embedding dimensionality to achieve performance comparable to its Euclidean analogue. This indicates that LGCF's advantage is most prominent when the embedding dimensionality must be kept small due to limited computing and storage resources.
5. Conclusion
In this paper, we propose LGCF, a fully hyperbolic GCN model for recommendation. Utilizing the advantages of hyperbolic space, LGCF is able to embed users/items with less distortion and capture user-item interaction relationships more accurately. Extensive experiments on public benchmark datasets show that LGCF outperforms both its Euclidean and hyperbolic counterparts and requires a far lower embedding dimensionality to achieve comparable performance.
References

Adcock et al. (2013). Tree-like structure in large social and information networks. In IEEE International Conference on Data Mining.
Bollobás (2013). Modern graph theory.
Bonnabel (2013). Stochastic gradient descent on Riemannian manifolds. IEEE Transactions on Automatic Control.
Boothby (1986). An introduction to differentiable manifolds and Riemannian geometry.
Chami et al. (2019). Hyperbolic graph convolutional neural networks. In Advances in Neural Information Processing Systems.
Covington et al. (2016). Deep neural networks for YouTube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems.
Ebesu et al. (2018). Collaborative memory network for recommendation systems. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval.
Hamilton et al. (2017). Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems.
He et al. (2016). Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition.
He et al. (2020). LightGCN: simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval.
Kingma and Ba (2014). Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Kipf and Welling (2017). Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations.
Mirvakhabova et al. (2020). Performance of hyperbolic geometry models on top-N recommendation tasks. In Fourteenth ACM Conference on Recommender Systems.
Nagano et al. (2019). A wrapped normal distribution on hyperbolic space for gradient-based learning. In International Conference on Machine Learning.
Ni et al. (2019). Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing.
Rendle et al. (2009). BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence.
Robbins and Monro (1951). A stochastic approximation method. The Annals of Mathematical Statistics.
Sun et al. (2021). HGCF: hyperbolic graph convolution networks for collaborative filtering.
Sutskever et al. (2013). On the importance of initialization and momentum in deep learning. In International Conference on Machine Learning.
Tay et al. (2018). Latent relational metric learning via memory-based attention for collaborative ranking. In Proceedings of the 2018 World Wide Web Conference.
Ungar (2008). A gyrovector space approach to hyperbolic geometry. Synthesis Lectures on Mathematics and Statistics.
Veličković et al. (2018). Graph attention networks. In International Conference on Learning Representations.
Wang et al. (2019). Neural graph collaborative filtering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval.
[24] (2020). Yelp2020 dataset (website).
Ying et al. (2018). Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.