Hyperbolic Recommender Systems

Many well-established recommender systems are based on representation learning in Euclidean space. In these models, matching functions such as the Euclidean distance or the inner product are typically used to compute similarity scores between user and item embeddings. This paper investigates learning user and item representations in Hyperbolic space, which we argue is more suitable for learning user-item embeddings in the recommendation domain. Unlike Euclidean spaces, Hyperbolic spaces are intrinsically equipped to handle hierarchical structure, owing to their property that distances increase exponentially away from the origin. We propose HyperBPR (Hyperbolic Bayesian Personalized Ranking), a conceptually simple but highly effective model for the task at hand. HyperBPR not only outperforms its Euclidean counterparts, but also achieves state-of-the-art performance on multiple benchmark datasets, demonstrating the effectiveness of personalized recommendation in Hyperbolic space.




Introduction

The pervasive impact that recommender systems have on the web is evident. This ubiquity is understandable, given the growth of data in recent years, whereby users are commonly plagued with over-choice. After all, interaction data (clicks, purchases, etc.) lives at the heart of many web applications such as content streaming sites and e-commerce. To this end, recommender systems not only serve as an effective mitigation strategy, but also create an overall better user experience on the web. This paper is concerned with the task of personalized (or collaborative) ranking, in which a ranked list of prospective candidate items is served to each user.

Learning representations of user and item pairs forms the crux of the personalized ranking problem. Across the literature, a wide range of machine learning models has been proposed [Rendle et al.2009, Rendle2010, Mnih and Salakhutdinov2008, He et al.2017]. A variety of matching functions have traditionally been adopted, such as the inner product (Bayesian Personalized Ranking [Rendle et al.2009]), the Euclidean distance (Collaborative Metric Learning [Hsieh et al.2017]) and/or neural networks [He et al.2017]. Notably, a common denominator is that all of these models operate in Euclidean space, which may be sub-optimal for interaction data.

This paper investigates learning user-item representations in Hyperbolic space, in which distances increase exponentially with the distance from the origin. Hyperbolic representation learning has recently demonstrated great promise across a diverse range of applications, such as learning entity hierarchies [Nickel and Kiela2017] and natural language processing [Tay, Tuan, and Hui2018, Dhingra et al.2018]. In a similar vein, we hypothesize that a non-Euclidean space provides a more suitable inductive bias for the interaction data that is commonplace in recommender systems. Intuitively, Hyperbolic spaces induce a tree-structured (hierarchical) embedding space, which is inherently more suitable for modeling hierarchical structure. We show that a conceptually simple Hyperbolic adaptation of the popular Bayesian Personalized Ranking (BPR) algorithm not only achieves very competitive results, but also outperforms more complex neural models on multiple personalized ranking benchmarks.

It is intuitive that hierarchical structure is one of the predominant patterns in recommender systems. Items generally exhibit hierarchical structure (e.g., movies and products tend to follow a category hierarchy). Similarly, implicit user interactions may also exhibit hierarchical qualities due to the intrinsic power-law nature of the problem domain. The notion of exploiting hierarchical structure has been established in many existing works in the literature [Wang et al.2015, Zhao et al.2017, Wang et al.2018]. However, this is the first work to explore a hierarchical inductive bias for training machine learning models for recommender systems. Our experiments show that our proposed model, trained with this inductive bias, achieves considerable improvements in ranking performance.

The use of the Hyperbolic distance qualifies our model as a metric learning approach, albeit in Hyperbolic space as opposed to Euclidean space. Metric learning models such as Collaborative Metric Learning (CML) [Hsieh et al.2017] have demonstrated reasonable empirical success. However, [Tay, Anh Tuan, and Hui2018] argue that CML suffers from instability due to its inability to fit a large number of interactions with a fixed set of parameters. To this end, we argue that Hyperbolic space can be interpreted as being larger than Euclidean space, in the sense that the norm (distance from the origin) captures additional information. Because distances grow with the distance from the origin, the embedding space has a greater representation capability than Euclidean space. This reinforces the key intuition of modeling user-item pairs in Hyperbolic space, while maintaining the simplicity and effectiveness of the CML model.

Our Contributions

All in all, the key contributions of this work are summarized as follows:

  • We investigate the notion of training recommender systems in Hyperbolic space as opposed to Euclidean space. We propose Hyperbolic Bayesian Personalized Ranking (HyperBPR), a strong competitive model for one-class collaborative filtering (i.e., personalized ranking). To the best of our knowledge, this is the first work that explores the use of Hyperbolic space for the recommender systems domain.

  • We conduct extensive experiments on eight benchmark datasets. Our proposed HyperBPR demonstrates the effectiveness of the Hyperbolic space, outperforming not only its Euclidean counterparts but also a suite of competitive baselines. Notably, HyperBPR outperforms the state-of-the-art neural collaborative filtering (NCF) and collaborative metric learning (CML) models on all benchmarks, with a consistent gain over competitors on standard ranking metrics.

  • We conduct extensive qualitative and visualization experiments, delving into the inner workings of our proposed HyperBPR.

Related Work

Across the rich history of recommender systems research, a myriad of machine learning models have been proposed [Rendle2010, Mnih and Salakhutdinov2008, Rendle et al.2009, He et al.2016, Koren2008, He et al.2017, Hsieh et al.2017]. Traditionally, many works have focused on factorizing the interaction matrix, i.e., Matrix Factorization [Mnih and Salakhutdinov2008, Koren, Bell, and Volinsky2009], learning latent factors for users and items based on their preferences. The formulation of matrix factorization is equivalent to combining the user and item embeddings using the inner product [He et al.2017]. [Hsieh et al.2017] argued that this formulation lacks expressiveness because the inner product violates the triangle inequality. As a result, the authors proposed Collaborative Metric Learning (CML), a strong recommendation baseline based on the Euclidean distance. Notably, many recent works have moved to neural models [He et al.2017, Zhang et al.2018], in which stacked nonlinear transformations are used to approximate the interaction function.

Our work is concerned with recommendation with implicit feedback (i.e., clicks, likes of binary nature). In this task, the Bayesian Personalized Ranking (BPR) model [Rendle et al.2009] remains a strong competitive baseline. BPR has seen widespread success across a myriad of domains and applications [Dave et al.2018b, Zhang et al.2016, He and McAuley2016b, Dave et al.2018a]. Our work trains the BPR model in Hyperbolic space, by incorporating the Hyperbolic distance as the similarity function between user and item.

Our work is inspired by recent advances in Hyperbolic representation learning [Nickel and Kiela2017, Cho et al.2018, Nickel and Kiela2018, Ganea, Bécigneul, and Hofmann2018, Sala et al.2018, Davidson et al.2018]. For instance, [Tay, Tuan, and Hui2018] proposed training a question answering system in Hyperbolic space. [Dhingra et al.2018] proposed learning word embeddings using a Hyperbolic neural network. [Gülçehre et al.2018] proposed a Hyperbolic variation of self-attention and the transformer network, and applied it to tasks such as visual question answering and neural machine translation. While the advantages of Hyperbolic space seem evident across a wide variety of application domains, no existing work investigates this embedding space within the context of recommender systems and implicit interaction data. This constitutes the key novelty of our work. A detailed primer on Hyperbolic spaces is given in the technical exposition of the paper.

Hyperbolic Recommender Systems

This section outlines the overall architecture of our proposed model. The key motivation behind our architecture is to embed the two user-item pairs into the hyperbolic space and then maximize the margin between the scores of the positive user-item pair and the negative user-item pair through pairwise learning. Figure 2 depicts the overall model architecture.

Input Encoding

Our proposed model takes a user (denoted as $u$), a positive (observed) item (denoted as $i$) and a negative (unobserved) item (denoted as $j$) as input. Each user and item is represented as a one-hot vector, which is mapped onto a dense low-dimensional vector by indexing into a user/item embedding matrix. Our model then leverages Bayesian Personalized Ranking (BPR) to optimize the pairwise ranking between the positive and negative item.
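The input encoding described above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the helper names (`init_embeddings`, `one_hot`, `lookup_by_matmul`, `lookup_by_index`) and the table sizes are hypothetical, and the Gaussian initialization with standard deviation 0.01 is borrowed from the implementation details reported later in the paper. It shows that multiplying a one-hot vector by the embedding matrix is the same as reading out one row.

```python
import random


def init_embeddings(num_rows, dim, std=0.01, seed=0):
    """Build an embedding table with entries drawn from N(0, std^2)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(num_rows)]


def one_hot(index, size):
    """One-hot encoding of a user or item id."""
    return [1.0 if k == index else 0.0 for k in range(size)]


def lookup_by_matmul(one_hot_vec, table):
    """Dense vector obtained as (one-hot row vector) x (embedding matrix)."""
    dim = len(table[0])
    return [
        sum(one_hot_vec[r] * table[r][c] for r in range(len(table)))
        for c in range(dim)
    ]


def lookup_by_index(index, table):
    """Same dense vector, obtained by directly indexing the table."""
    return list(table[index])


# Hypothetical sizes: 5 users, 3-dimensional embeddings.
user_table = init_embeddings(num_rows=5, dim=3)
via_matmul = lookup_by_matmul(one_hot(2, 5), user_table)
via_index = lookup_by_index(2, user_table)
```

In practice only the indexing form is used, since it avoids materializing the one-hot vector.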

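(a) M. C. Escher’s Circle Limit III, 1959 (b) Lines through a given point and parallel to a given line, illustrated in the Poincaré disk model
Figure 1: Visualizations of Hyperbolic space.

| Property | Euclidean | Spherical | Hyperbolic |
| Curvature $K$ | $0$ | $>0$ | $<0$ |
| A line | no finite length; unbounded | finite length; unbounded | no finite length; unbounded |
| Two distinct lines | do not enclose a finite area | enclose a finite area | do not enclose a finite area |
| Parallel lines | $1$ | $0$ | $\infty$ |
| Sum of triangle angles | $\pi$ | $>\pi$ | $<\pi$ |
| Circle length | $2\pi r$ | $\frac{2\pi}{\zeta}\sin(\zeta r)$ | $\frac{2\pi}{\zeta}\sinh(\zeta r)$ |
| Disk area | $\pi r^2$ | $\frac{2\pi}{\zeta^2}\left(1-\cos(\zeta r)\right)$ | $\frac{2\pi}{\zeta^2}\left(\cosh(\zeta r)-1\right)$ |
Table 1: Some properties of Euclidean, spherical and hyperbolic geometry, in which $r$ is the radius and $\zeta = \sqrt{|K|}$.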

Hyperbolic Geometry & Poincaré Embeddings

The hyperbolic space is uniquely defined as a complete and simply connected Riemannian manifold with constant negative curvature [Krioukov et al.2010], as visualized in Figure 1 (images taken from https://en.wikipedia.org/wiki/Hyperbolic_geometry). In fact, there are only three types of Riemannian manifolds of constant curvature: Euclidean geometry (constant vanishing sectional curvature), spherical geometry (constant positive sectional curvature) and hyperbolic geometry (constant negative sectional curvature). Some properties of the three geometries are listed in Table 1. In this paper, we focus on Euclidean and hyperbolic spaces because of the key difference in how they expand: hyperbolic spaces expand exponentially, whereas Euclidean spaces expand only polynomially. For instance, in the two-dimensional hyperbolic space of constant curvature $K = -\zeta^2 < 0$ (with $\zeta > 0$), a circle and a disk of hyperbolic radius $r$ satisfy:

$$L(r) = \frac{2\pi}{\zeta}\sinh(\zeta r) \qquad (1)$$

$$A(r) = \frac{2\pi}{\zeta^2}\left(\cosh(\zeta r) - 1\right) \qquad (2)$$

in which $L(r)$ is the length of the circle and $A(r)$ is the area of the disk. Hence, both Eqn. (1) and (2) illustrate the exponential expansion of the hyperbolic space with respect to the radius $r$, since both grow as $e^{\zeta r}$.
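To make the difference in growth rates concrete, the following sketch compares the circumference of a circle of radius $r$ in the Euclidean plane with that in the hyperbolic plane. It assumes the standard closed forms $L(r) = \frac{2\pi}{\zeta}\sinh(\zeta r)$ and $A(r) = \frac{2\pi}{\zeta^2}(\cosh(\zeta r) - 1)$ for curvature $-\zeta^2$; the function names are illustrative only.

```python
import math


def euclidean_circle_length(r):
    """Circumference of a Euclidean circle: grows linearly in r."""
    return 2.0 * math.pi * r


def hyperbolic_circle_length(r, zeta=1.0):
    """Circumference in the hyperbolic plane of curvature -zeta**2."""
    return 2.0 * math.pi * math.sinh(zeta * r) / zeta


def hyperbolic_disk_area(r, zeta=1.0):
    """Disk area in the hyperbolic plane of curvature -zeta**2."""
    return 2.0 * math.pi * (math.cosh(zeta * r) - 1.0) / zeta ** 2


# Ratio of hyperbolic to Euclidean circumference at a few radii.
ratios = {
    r: hyperbolic_circle_length(r) / euclidean_circle_length(r)
    for r in (1.0, 5.0, 10.0)
}
```

For very small radii the two geometries agree (the hyperbolic disk area approaches $\pi r^2$), while at radius 10 the hyperbolic circumference is already more than a thousand times the Euclidean one.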

Although hyperbolic space cannot be isometrically embedded into Euclidean space, there exist multiple models of hyperbolic geometry that can be formulated as subsets of Euclidean space, each of which is convenient for different tasks. Amongst these models, we prefer the Poincaré ball model, as used by [Nickel and Kiela2017], due to its conformality (i.e., angles are preserved between hyperbolic and Euclidean space) and convenient parameterization.

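The Poincaré ball model is the Riemannian manifold $(\mathcal{B}^d, g_x)$, in which $\mathcal{B}^d = \{x \in \mathbb{R}^d : \|x\| < 1\}$ is the open $d$-dimensional unit ball, equipped with the metric:

$$g_x = \left(\frac{2}{1 - \|x\|^2}\right)^2 g^E \qquad (3)$$

where $x \in \mathcal{B}^d$, and $g^E$ is the Euclidean metric tensor of $\mathbb{R}^d$, with components $\mathbf{I}_d$.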

The distance between two points $x, y \in \mathcal{B}^d$ is given by:

$$d(x, y) = \operatorname{arcosh}\left(1 + 2\,\frac{\|x - y\|^2}{(1 - \|x\|^2)(1 - \|y\|^2)}\right) \qquad (4)$$

We adopt the hyperbolic distance function to model the relationships between users and items. Specifically, the hyperbolic distance $d(u, v)$ between a user $u$ and an item $v$ is calculated based on Eqn. (4). It is worth mentioning that $d(u, v)$ helps to discover latent hierarchies automatically, as the distance within the Poincaré ball changes smoothly with respect to the norms of $u$ and $v$. Notably, the distance between points grows exponentially as the norm of the vectors approaches 1. Geometrically, if we place the root node of a tree at the origin of $\mathcal{B}^d$, the child nodes spread out exponentially with their distance to the root, towards the boundary of the ball, because of this property.
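The Poincaré distance can be sketched directly from its standard closed form, $d(x, y) = \operatorname{arcosh}\left(1 + 2\|x - y\|^2 / ((1 - \|x\|^2)(1 - \|y\|^2))\right)$. The snippet below is a minimal illustration in plain Python rather than the authors' implementation; the function name and the example points are assumptions. It shows that two points separated by the same Euclidean gap are much farther apart in hyperbolic terms when they lie near the boundary of the ball.

```python
import math


def poincare_distance(x, y):
    """Hyperbolic distance between two points strictly inside the unit ball."""
    sq_norm_x = sum(a * a for a in x)
    sq_norm_y = sum(b * b for b in y)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    arg = 1.0 + 2.0 * sq_dist / ((1.0 - sq_norm_x) * (1.0 - sq_norm_y))
    return math.acosh(arg)


# Same Euclidean gap (0.1) at two different locations in the disk.
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.85, 0.0], [0.95, 0.0])
```

A useful sanity check is that the distance from the origin to a point of norm $r$ equals $\ln\frac{1 + r}{1 - r}$, which diverges as $r$ approaches 1.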

Learning Hyperbolic Representations of User-Item Pairs

Inspired by [Gülçehre et al.2018], the hyperbolic distance is then passed into an extra layer, called the hyperbolic matching layer, for matching pairs of users and items. Given a user $u$ and an item $v$ that both lie in $\mathcal{B}^d$, we take:

$$\alpha(u, v) = f(d(u, v)) \qquad (5)$$

where $f(\cdot)$ is simply chosen to be a linear function $f(x) = \beta x + c$, in which $\beta \in \mathbb{R}$ and $c \in \mathbb{R}$ are scalar parameters learned along with the network.

Figure 2: Illustration of our proposed HyperBPR architecture.

Optimization and Learning

This section illustrates the optimization and learning process of HyperBPR.

BPR Triplet Loss.

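HyperBPR leverages BPR pairwise learning to minimize the pairwise ranking loss between the positive and negative items. The objective function is defined as follows:

$$\arg\min_{\Theta} \sum_{(u, i, j) \in \mathcal{D}} -\ln \sigma\left(\alpha(u, i) - \alpha(u, j)\right) + \lambda \|\Theta\|_2^2 \qquad (6)$$

where $(u, i, j)$ is a triplet that belongs to the set $\mathcal{D}$, which contains all pairs of positive and negative items for each user; $\alpha(\cdot, \cdot)$ is the matching score produced by the hyperbolic matching layer; $\sigma$ is the logistic sigmoid function; $\Theta$ represents the model parameters; and $\lambda$ is the regularization parameter.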
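As a rough illustration of the objective, the sketch below evaluates the unregularized BPR term $-\ln \sigma(\alpha(u, i) - \alpha(u, j))$ for a single triplet, with the score taken as a linear function $\beta d + c$ of the Poincaré distance. All function names, the example vectors and the choice $\beta = -1$, $c = 0$ are assumptions made for illustration; in the model these scalars are learned, and the regularization term is omitted here.

```python
import math


def poincare_distance(x, y):
    """Hyperbolic distance between two points strictly inside the unit ball."""
    sq_norm_x = sum(a * a for a in x)
    sq_norm_y = sum(b * b for b in y)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.acosh(1.0 + 2.0 * sq_dist / ((1.0 - sq_norm_x) * (1.0 - sq_norm_y)))


def match_score(u, v, beta, c):
    """Linear matching layer applied to the hyperbolic distance."""
    return beta * poincare_distance(u, v) + c


def bpr_triplet_loss(u, i, j, beta, c):
    """-ln(sigmoid(score(u, i) - score(u, j))), computed in a stable form."""
    margin = match_score(u, i, beta, c) - match_score(u, j, beta, c)
    # -ln(sigmoid(m)) = ln(1 + exp(-m)) = max(-m, 0) + ln(1 + exp(-|m|))
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


user = [0.1, 0.0]
positive_item = [0.2, 0.0]    # close to the user
negative_item = [-0.8, 0.1]   # far from the user
loss_correct_order = bpr_triplet_loss(user, positive_item, negative_item, beta=-1.0, c=0.0)
loss_wrong_order = bpr_triplet_loss(user, negative_item, positive_item, beta=-1.0, c=0.0)
```

With a negative $\beta$, a smaller distance yields a higher score, so the loss drops below $\ln 2$ when the positive item is closer to the user than the negative item and rises above it otherwise.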

Gradient Conversion.

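The parameters of our model are learned using Riemannian stochastic gradient descent (RSGD) [Bonnabel2013]. Similar to [Nickel and Kiela2017], the parameter updates have the form:

$$\theta_{t+1} = \mathfrak{R}_{\theta_t}\left(-\eta_t \nabla_R \mathcal{L}(\theta_t)\right) \qquad (7)$$

where $\mathfrak{R}_{\theta_t}$ denotes a retraction onto $\mathcal{B}^d$ at $\theta_t$; $\eta_t$ is the learning rate at time $t$; and $\nabla_R \mathcal{L}(\theta_t)$ is the Riemannian gradient of the loss $\mathcal{L}$ with respect to $\theta_t$.

The Riemannian gradient is then calculated from the Euclidean gradient $\nabla_E$ by rescaling it with the inverse of the Poincaré ball metric tensor:

$$\nabla_R = \frac{\left(1 - \|\theta_t\|^2\right)^2}{4} \nabla_E \qquad (8)$$

For the details of the gradient conversion, we refer to [Nickel and Kiela2017, Tay, Tuan, and Hui2018].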
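A single RSGD update can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the retraction is taken to be the simple first-order form $\theta - \eta \nabla_R$, and points that leave the ball are rescaled to norm $1 - \epsilon$, a projection in the spirit of [Nickel and Kiela2017]. The constant `EPS` and all function names are hypothetical.

```python
import math

EPS = 1e-5  # assumed safety margin that keeps points strictly inside the ball


def riemannian_grad(theta, euclid_grad):
    """Rescale the Euclidean gradient by the inverse Poincare metric tensor."""
    sq_norm = sum(t * t for t in theta)
    scale = (1.0 - sq_norm) ** 2 / 4.0
    return [scale * g for g in euclid_grad]


def project_to_ball(theta):
    """Pull a point back inside the open unit ball if it has left it."""
    norm = math.sqrt(sum(t * t for t in theta))
    if norm >= 1.0:
        return [t / norm * (1.0 - EPS) for t in theta]
    return list(theta)


def rsgd_step(theta, euclid_grad, lr):
    """One update: rescale the gradient, take a step, then project."""
    grad = riemannian_grad(theta, euclid_grad)
    return project_to_ball([t - lr * g for t, g in zip(theta, grad)])


# Near the boundary the rescaling shrinks the step considerably.
updated = rsgd_step([0.9, 0.0], [1.0, 0.0], lr=1.0)
```

Because the scaling factor $(1 - \|\theta\|^2)^2 / 4$ vanishes near the boundary, points close to the edge of the ball move very little in Euclidean terms for a given Euclidean gradient.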


Experimental Setup

In this section, we introduce the overall experimental setup.


| Dataset | Interactions | # Users | # Items | % Density |
| Clothing | 235,906 | 7,917 | 171,760 | 1.74 |
| Sports | 113,119 | 3,740 | 54,744 | 5.53 |
| Cell Phones | 32,885 | 1,141 | 18,797 | 15.33 |
| Toys & Games | 111,301 | 3,143 | 61,733 | 5.74 |
| Tools & Home | 64,182 | 2,047 | 35,793 | 8.76 |
| Automotive | 34,167 | 1,211 | 26,096 | 10.81 |
| Patio/Lawn | 10,702 | 374 | 7,293 | 39.24 |
| Musical | 16,501 | 471 | 12,206 | 28.70 |
Table 2: Statistics of all datasets used in our experimental evaluation.

Figure 3: Two-dimensional hyperbolic embedding of 8 Amazon datasets in the Poincaré disk, one panel per dataset (Clothing, Shoes and Jewelry; Sports and Outdoors; Cell Phones and Accessories; Toys and Games; Tools & Home Improvements; Automotive; Patio, Lawn and Garden; Musical Instruments). The images illustrate the embedding of user and item pairs after convergence.

For our experimental evaluation, we adopt eight of the Amazon datasets [He and McAuley2016a]. The selection aims to promote diversity in dataset size and domain, and we ensure the inclusion of both large and small datasets across various domains. The datasets can be obtained at http://jmcauley.ucsd.edu/data/amazon/; their domain names are truncated in the interest of space. The statistics of the datasets are reported in Table 2.

(a). Intermediate embedding of HyperBPR after 10 epochs, 100 epochs and embedding after convergence.

(b). Intermediate embedding of CML after 10 epochs, 100 epochs and embedding after convergence.
Figure 4: Comparison between two-dimensional Poincaré embedding and Euclidean embedding on Automotive dataset. The images illustrate the intermediate embedding of HyperBPR and CML after 10 epochs, 100 epochs and the embedding after convergence.
| Dataset | BPR | MLP | MF | NCF | CML | HyperBPR |
| Clothing | 0.039 / 0.024 | 0.058 / 0.035 | 0.051 / 0.032 | 0.059 / 0.035 | 0.066 / 0.040 | 0.120 / 0.074 |
| Sports | 0.149 / 0.100 | 0.120 / 0.071 | 0.148 / 0.100 | 0.118 / 0.076 | 0.159 / 0.107 | 0.193 / 0.132 |
| Cell Phones | 0.186 / 0.128 | 0.147 / 0.092 | 0.200 / 0.130 | 0.157 / 0.101 | 0.203 / 0.127 | 0.243 / 0.158 |
| Toys & Games | 0.274 / 0.209 | 0.255 / 0.178 | 0.288 / 0.216 | 0.236 / 0.167 | 0.292 / 0.212 | 0.360 / 0.272 |
| Tools & Home | 0.139 / 0.095 | 0.134 / 0.087 | 0.161 / 0.115 | 0.146 / 0.086 | 0.167 / 0.112 | 0.198 / 0.135 |
| Automotive | 0.034 / 0.023 | 0.047 / 0.030 | 0.048 / 0.031 | 0.048 / 0.030 | 0.059 / 0.037 | 0.121 / 0.074 |
| Patio/Lawn | 0.175 / 0.116 | 0.164 / 0.102 | 0.208 / 0.126 | 0.151 / 0.092 | 0.156 / 0.099 | 0.290 / 0.183 |
| Musical | 0.055 / 0.037 | 0.037 / 0.018 | 0.055 / 0.033 | 0.050 / 0.023 | 0.059 / 0.043 | 0.116 / 0.068 |
Table 3: Experimental results (HR@10 / nDCG@10) on 8 Amazon datasets. Our proposed HyperBPR achieves very competitive results, outperforming strong Euclidean baselines such as CML and BPR.

Evaluation Setup and Metrics

We experiment on the collaborative ranking (or one-class collaborative filtering) setup. We adopt the Hit Ratio (HR@10) and nDCG@10 (normalized discounted cumulative gain) evaluation metrics, which are well-established ranking metrics for the task at hand. Following [He et al.2017, Tay, Anh Tuan, and Hui2018], we randomly select a fixed number of negative samples that the user has not interacted with and rank the ground truth amongst these negative samples. We empirically found this number of negatives to be sufficient for probing differences in relative performance amongst the compared baselines. For all datasets, the last item the user has interacted with is withheld as the test set, while the penultimate item serves as the validation set. We report the test scores of the model at the point of its best validation scores.
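For a single test user with one held-out ground-truth item, the two metrics reduce to simple functions of the rank of that item among the sampled negatives. The sketch below follows the common leave-one-out convention, in which the ideal DCG equals 1; the function names are illustrative, and resolving score ties in favor of the positive item is an assumption.

```python
import math


def rank_of_positive(pos_score, neg_scores):
    """0-based rank: number of negatives scored strictly above the positive."""
    return sum(1 for s in neg_scores if s > pos_score)


def hit_ratio_at_k(rank, k=10):
    """1 if the ground-truth item appears in the top-k list, else 0."""
    return 1.0 if rank < k else 0.0


def ndcg_at_k(rank, k=10):
    """With a single relevant item, nDCG is 1 / log2(rank + 2) inside the top k."""
    return 1.0 / math.log2(rank + 2) if rank < k else 0.0


# Example: the positive item is outscored by exactly two negatives.
example_rank = rank_of_positive(0.7, [0.9, 0.8, 0.1, 0.3, 0.5])
example_hr = hit_ratio_at_k(example_rank)
example_ndcg = ndcg_at_k(example_rank)
```

The reported HR@10 and nDCG@10 are then averages of these per-user values over all test users.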

Compared Baselines

In our experiments, we compare with five well-established and competitive baselines.

  • Bayesian Personalized Ranking (BPR) [Rendle et al.2009] is a strong collaborative filtering (CF) baseline that takes three inputs: users, positive items and negative items. The triplet objective is to rank the positive item higher than the negative item for that user.

  • Multi-layered Perceptron (MLP) is a feedforward neural network that applies multiple layers of nonlinearities to capture the relationship between users and items. Following [He et al.2017], we use a three-layered MLP with a pyramid structure.

  • Matrix Factorization (MF) is the standard baseline for recommender systems. It models the user-item representation using the inner product.

  • Neural Collaborative Filtering (NCF) [He et al.2017] is the state-of-the-art method for collaborative filtering. The key idea of NCF is to fuse the last hidden representations of MF and MLP together into a joint model.

  • Collaborative Metric Learning (CML) [Hsieh et al.2017] is a strong metric learning baseline that learns user-item similarity using the Euclidean distance. CML can be considered a key ablative baseline in our experiments, signifying the difference between Hyperbolic and Euclidean metric spaces.

(a) After 10 epochs (b) After 100 epochs (c) After convergence
Figure 5: Transformation of the user/item embeddings on Musical dataset with respect to the number of epochs.

Implementation Details

We implement all models in Tensorflow. All models are trained using Adam [Kingma and Ba2014], with the learning rate, the embedding size and the number of batches each tuned over a set of candidate values. For models that optimize the hinge loss, the margin is tuned as well. The NCF and MLP models are implemented following the configuration and architecture in [He et al.2017]; however, the pretrained MF and MLP are not applied to NCF, for a fair comparison. All embeddings and parameters are randomly initialized from a Gaussian distribution with a mean of 0 and a standard deviation of 0.01. For most datasets and baselines, we use 10 batches and a margin of 0.1.

Figure 6: Effects of the embedding size on 8 Amazon datasets, one panel per dataset (Clothing, Shoes and Jewelry; Sports and Outdoors; Cell Phones and Accessories; Toys and Games; Tools & Home Improvements; Automotive; Patio, Lawn and Garden; Musical Instruments).

Experimental Results

This section presents our experimental results on all datasets. For all obtained results, the best result is in boldface whereas the second best is underlined. As reported in Table 3, our proposed model significantly outperforms all the baselines on both HR@10 and nDCG@10 across all datasets.

Pertaining to the baselines, CML outperforms the other baselines on most of the datasets. We observe that MF and CML are both extremely competitive, i.e., both consistently achieve good results across the datasets. The performance gain of CML on the datasets is approximately 1%-2%. Notably, MF performs much better than CML on the Patio dataset. One possible reason is that for small datasets with high density (e.g., Patio with a density of 39.24%), a simple model such as MF is the preferable choice. In addition, the performance of NCF is often only comparable to vanilla MLP and MF. A likely explanation is its use of dual embedding spaces (since NCF combines MLP and MF), which can lead to overfitting if the dataset is not large enough [Tay, Anh Tuan, and Hui2018].

Remarkably, our proposed model HyperBPR significantly outperforms the best baseline method. The improvements in terms of nDCG on the eight datasets (in the same order as reported in Table 3) are +3.39%, +2.50%, +2.83%, +5.54%, +2.00%, +3.76%, +5.72% and +2.45% respectively. We also observe similarly high performance gains on the hit ratio (HR@10). Note that the Amazon datasets follow a power-law distribution due to their rich and detailed category hierarchy [McAuley et al.2015]. This enables HyperBPR, operating in hyperbolic space, to achieve very competitive results over strong Euclidean baselines. Informally, trees require exponential space for branching, and only hyperbolic geometry has this characteristic, so trees are more naturally embedded in hyperbolic space than in Euclidean space. In other words, trees can be considered as discrete hyperbolic spaces [Krioukov et al.2010]. Our experimental evidence shows the strong recommendation results of HyperBPR on a variety of datasets, and the advantage of hyperbolic space over Euclidean space in handling hierarchical data structure.
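The reported nDCG gains can be cross-checked against the rounded values in Table 3. The sketch below assumes that the second number of each pair in Table 3 is nDCG@10, that the last pair in each row belongs to HyperBPR, and that the rows follow the dataset order of Table 2. Under these assumptions, the gains correspond to absolute differences in percentage points over the best baseline, up to rounding.

```python
# nDCG@10 per dataset, copied from Table 3:
# (five baseline values, HyperBPR value assumed to be the last column).
NDCG_AT_10 = {
    "Clothing": ([0.024, 0.035, 0.032, 0.035, 0.040], 0.074),
    "Sports": ([0.100, 0.071, 0.100, 0.076, 0.107], 0.132),
    "Cell Phones": ([0.128, 0.092, 0.130, 0.101, 0.127], 0.158),
    "Toys & Games": ([0.209, 0.178, 0.216, 0.167, 0.212], 0.272),
    "Tools & Home": ([0.095, 0.087, 0.115, 0.086, 0.112], 0.135),
    "Automotive": ([0.023, 0.030, 0.031, 0.030, 0.037], 0.074),
    "Patio/Lawn": ([0.116, 0.102, 0.126, 0.092, 0.099], 0.183),
    "Musical": ([0.037, 0.018, 0.033, 0.023, 0.043], 0.068),
}

# Gains as stated in the text, in the same dataset order.
REPORTED_GAIN = dict(
    zip(NDCG_AT_10, [3.39, 2.50, 2.83, 5.54, 2.00, 3.76, 5.72, 2.45])
)


def gain_in_points(baselines, ours):
    """Absolute nDCG gain over the best baseline, in percentage points."""
    return 100.0 * (ours - max(baselines))


computed_gain = {
    name: gain_in_points(baselines, ours)
    for name, (baselines, ours) in NDCG_AT_10.items()
}
```

Each computed value agrees with the reported gain to within 0.1 points, the precision allowed by the three-decimal table entries.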

Qualitative Analysis

This section presents a qualitative analysis of our proposed model to understand the behavior of the embeddings in hyperbolic space.

Hyperbolic convergence

Figure 3 shows the two-dimensional hyperbolic embedding on the test set of the 8 Amazon datasets after convergence. We observe that the item embeddings form a sphere around the user embeddings. Moreover, since we conduct the analysis on the test set, the visualization of the user/item embeddings in Figure 3 demonstrates the ability of HyperBPR to self-organize and automatically detect the hierarchical structure in the user/item embeddings, similar to [Tay, Anh Tuan, and Hui2018].

On a side note, we observe that smaller datasets with high density, such as Patio and Musical, tend to force the user-item pair embeddings to the boundary of the ball at convergence. We take the Musical dataset as an example to visualize the transformation of the embeddings. Figure 5 illustrates how the user/item embeddings change with the number of epochs. After the first 100 epochs, the user and item embeddings begin to converge and form linked pairs. At convergence, the embeddings are pushed toward the boundary, which also suggests a lack of hierarchical structure in the dataset.

Convergence comparison

Figure 4 compares the two-dimensional Poincaré embedding (HyperBPR) with the Euclidean embedding (CML) on the Automotive dataset. For CML, we clip the norm, i.e., the norm of the embeddings is constrained to be at most 1, for an analogous comparison.
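Norm clipping of this kind can be sketched as a projection onto the closed unit ball. The snippet is only an illustration of the idea and assumes that clipping rescales any embedding whose norm exceeds 1 back onto the unit sphere; the function name is hypothetical.

```python
import math


def clip_to_unit_ball(vec):
    """Rescale a vector to norm 1 if its norm exceeds 1; otherwise keep it."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm > 1.0:
        return [x / norm for x in vec]
    return list(vec)


clipped = clip_to_unit_ball([3.0, 4.0])    # norm 5, gets rescaled
untouched = clip_to_unit_ball([0.3, 0.4])  # norm 0.5, left unchanged
```

With this constraint, both models embed users and items inside the same unit disk, which makes the two visualizations directly comparable.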

At first glance, the two types of embedding differ in how the user and item embeddings are distributed in the space as the number of epochs grows. While the item embeddings of HyperBPR gradually assemble as the number of epochs increases, the item embeddings of CML move in the opposite direction. The reason is that the learned metric of CML pulls the positive items closer while simultaneously pushing the negative items further apart; thus, the item embeddings are pushed toward the boundary. In addition, the converged CML embedding shows no hint of hierarchy, which is a deficiency compared to HyperBPR.

Effect of Embedding Size

In this section, we study the effect of the embedding size on the performance of our proposed model and the baselines. Figure 6 shows the effect of the embedding size on the 8 Amazon datasets in terms of nDCG@10. In general, we observe that HyperBPR significantly outperforms the baselines regardless of the embedding size. While NCF maintains a stable performance across embedding sizes, the performance of the other baselines fluctuates slightly. Additionally, we notice that the nDCG@10 of HyperBPR decreases only slightly at one embedding size and otherwise maintains strong performance as the embedding size increases.


Conclusion

In this paper, we introduce a new and effective recommendation model called HyperBPR. To the best of our knowledge, HyperBPR is the first model to explore hyperbolic space in recommender systems. Through extensive experiments on 8 datasets, we demonstrate the effectiveness of HyperBPR over baselines in Euclidean space, including state-of-the-art models such as CML and NCF. The promising results of HyperBPR may inspire future work to explore hyperbolic space for solving recommendation problems.